
Real And Abstract Analysis

Kuttler

March 21, 2014


Contents

I Prerequisite Material 11
1 Some Fundamental Concepts 13
1.1 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.2 The Schroder Bernstein Theorem . . . . . . . . . . . . . . . . 16
1.1.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . 19
1.2 lim sup And lim inf . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Double Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2 Row reduced echelon form 27


2.1 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 The row reduced echelon form of a matrix . . . . . . . . . . . . . . . 35
2.3 Finding the inverse of a matrix . . . . . . . . . . . . . . . . . . . . . 39
2.4 The rank of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3 Sequences 43
3.1 The Inner Product And Dot Product . . . . . . . . . . . . . . . . . . 43
3.1.1 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Vector Valued Sequences And Their Limits . . . . . . . . . . . . . . 46
3.3 Sequential Compactness . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Closed And Open Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5 Cauchy Sequences And Completeness . . . . . . . . . . . . . . . . . 55
3.6 Shrinking Diameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4 Continuous Functions 61
4.1 Continuity And The Limit Of A Sequence . . . . . . . . . . . . . . . 64
4.2 The Extreme Values Theorem . . . . . . . . . . . . . . . . . . . . . . 66
4.3 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4 Uniform Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Sequences And Series Of Functions . . . . . . . . . . . . . . . . . . . 72
4.6 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.7 Sequences Of Polynomials, Weierstrass Approximation . . . . . . . . 77
4.7.1 The Tietze Extension Theorem . . . . . . . . . . . . . . . . . 81


4.8 The Operator Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . 84


4.9 Ascoli Arzela Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5 The Mathematical Theory Of Determinants, Basic Linear Algebra 99


5.1 The Function sgnn . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.1 The Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.2 Permuting Rows Or Columns . . . . . . . . . . . . . . . . . . 102
5.2.3 A Symmetric Denition . . . . . . . . . . . . . . . . . . . . . 103
5.2.4 The Alternating Property Of The Determinant . . . . . . . . 104
5.2.5 Linear Combinations And Determinants . . . . . . . . . . . . 105
5.2.6 The Determinant Of A Product . . . . . . . . . . . . . . . . . 105
5.2.7 Cofactor Expansions . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.8 Formula For The Inverse . . . . . . . . . . . . . . . . . . . . . 108
5.2.9 Cramer's Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2.10 Upper Triangular Matrices . . . . . . . . . . . . . . . . . . . 110
5.2.11 The Determinant Rank . . . . . . . . . . . . . . . . . . . . . 110
5.2.12 Determining Whether A Is One To One Or Onto . . . . . . . 112
5.2.13 Schur's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2.14 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . 116
5.3 The Right Polar Factorization . . . . . . . . . . . . . . . . . . . . . . 117

6 The Derivative 121


6.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3 The Matrix Of The Derivative . . . . . . . . . . . . . . . . . . . . . 123
6.4 A Mean Value Inequality . . . . . . . . . . . . . . . . . . . . . . . . 126
6.5 Existence Of The Derivative, C^1 Functions . . . . . . . . . . . . 127
6.6 Higher Order Derivatives . . . . . . . . . . . . . . . . . . . . . . . . 131
6.7 C^k Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.7.1 Some Standard Notation . . . . . . . . . . . . . . . . . . . . . 134
6.8 The Derivative And The Cartesian Product . . . . . . . . . . . . . . 134
6.9 Mixed Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . 138
6.10 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . 140
6.10.1 More Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.10.2 The Case Of Rn . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.11 Taylor's Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.11.1 Second Derivative Test . . . . . . . . . . . . . . . . . . . . . . 149
6.12 The Method Of Lagrange Multipliers . . . . . . . . . . . . . . . . . . 151
6.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

II Integration And Measure 159


7 Measures And Measurable Functions 161
7.1 Open Coverings And Compactness . . . . . . . . . . . . . . . . . . . 161
7.2 An Outer Measure On P (R) . . . . . . . . . . . . . . . . . . . . . . 164
7.3 Measures And Measure Spaces . . . . . . . . . . . . . . . . . . . . . 167
7.4 Measures From Outer Measures . . . . . . . . . . . . . . . . . . . . . 169
7.5 Measurable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.6 One Dimensional Lebesgue Stieltjes Measure . . . . . . . . . . . . . 181
7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

8 The Abstract Lebesgue Integral 189


8.1 Definition For Nonnegative Measurable Functions . . . . . . . . . 189
8.1.1 Riemann Integrals For Decreasing Functions . . . . . . . . . . 189
8.1.2 The Lebesgue Integral For Nonnegative Functions . . . . . . 190
8.2 The Lebesgue Integral For Nonnegative Simple Functions . . . . . . 191
8.3 The Monotone Convergence Theorem . . . . . . . . . . . . . . . . . 193
8.4 Other Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.5 Fatou's Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.6 The Integral's Righteous Algebraic Desires . . . . . . . . . . . . . 195
8.7 The Lebesgue Integral, L1 . . . . . . . . . . . . . . . . . . . . . . . . 196
8.8 The Dominated Convergence Theorem . . . . . . . . . . . . . . . . . 201
8.9 The One Dimensional Lebesgue Stieltjes Integral . . . . . . . . . . . 203
8.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

9 The Lebesgue Integral For Functions Of n Variables 213


9.1 Completion Of Measure Spaces, Approximation . . . . . . . . . . . . 213
9.2 Dynkin Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
9.3 n Dimensional Lebesgue Measure And Integrals . . . . . . . . . . . . 220
9.3.1 Iterated Integrals . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.3.2 n Dimensional Lebesgue Measure And Integrals . . . . . . . . 221
9.3.3 The Sigma Algebra Of Lebesgue Measurable Sets . . . . . . . 225
9.3.4 Fubini's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 226
9.4 Product Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.4.1 General Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.4.2 Completion Of Product Measure Spaces . . . . . . . . . . . . 231
9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

10 Lebesgue Measurable Sets 237


10.1 The σ Algebra Of Lebesgue Measurable Sets . . . . . . . . . . . . 237
10.2 Change Of Variables, Linear Maps . . . . . . . . . . . . . . . . . . . 241
10.3 Covering Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
10.4 Differentiable Functions And Measurability . . . . . . . . . . . . . 248
10.5 Change Of Variables, Nonlinear Maps . . . . . . . . . . . . . . . . . 250
10.6 The Mapping Is Only One To One . . . . . . . . . . . . . . . . . . . 254
10.7 Mappings Which Are Not One To One . . . . . . . . . . . . . . . . . 256

10.8 Spherical Coordinates In p Dimensions . . . . . . . . . . . . . . . . . 258


10.9 Brouwer Fixed Point Theorem . . . . . . . . . . . . . . . . . . . . . 261
10.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

11 Approximation Theorems 273


11.1 Bernstein Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . 273
11.2 Stone's Generalization . . . . . . . . . . . . . . . . . . . . . . . . 275
11.3 The Case Of Complex Valued Functions . . . . . . . . . . . . . . . . 278
11.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

12 The Lp Spaces 283


12.1 Basic Inequalities And Properties . . . . . . . . . . . . . . . . . . . . 283
12.2 Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.3 Minkowski's Inequality . . . . . . . . . . . . . . . . . . . . . . . . 289
12.4 Density Simple Functions . . . . . . . . . . . . . . . . . . . . . . . . 290
12.5 Density Of Cc (Ω) . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
12.6 Continuity Of Translation . . . . . . . . . . . . . . . . . . . . . . . . 292
12.7 Separability, Some Special Functions . . . . . . . . . . . . . . . . . . 293
12.8 Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.9 Mollifiers And Density Of Cc∞ (Rn) . . . . . . . . . . . . . . . . . 296
12.10 L∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
12.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

13 Fourier Transforms 309


13.1 Fourier Transforms Of Functions In G . . . . . . . . . . . . . . . . . 309
13.2 Fourier Transforms Of Just About Anything . . . . . . . . . . . . . . 311
13.2.1 Fourier Transforms Of G . . . . . . . . . . . . . . . . . . . . 311
13.2.2 Fourier Transforms Of Functions In L1 (Rn ) . . . . . . . . . . 315
13.2.3 Fourier Transforms Of Functions In L2 (Rn ) . . . . . . . . . . 317
13.2.4 The Schwartz Class . . . . . . . . . . . . . . . . . . . . . . . 322
13.2.5 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
13.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

14 Fourier Series 331


14.1 Denition And Basic Properties . . . . . . . . . . . . . . . . . . . . . 331
14.2 The Riemann Lebesgue Lemma . . . . . . . . . . . . . . . . . . . . . 335
14.3 Dini's Criterion For Convergence . . . . . . . . . . . . . . . . . . . 335
14.4 Jordan's Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
14.5 Integrating And Differentiating Fourier Series . . . . . . . . . . . . 342
14.6 Differentiating Fourier Series . . . . . . . . . . . . . . . . . . . . . 344
14.7 Ways Of Approximating Functions . . . . . . . . . . . . . . . . . . . 346
14.7.1 Uniform Approximation With Trig. Polynomials . . . . . . . 346
14.7.2 Mean Square Approximation . . . . . . . . . . . . . . . . . . 350
14.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

III Further Topics 361

15 Metric Spaces And General Topological Spaces 363


15.1 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
15.2 Compactness In Metric Space . . . . . . . . . . . . . . . . . . . . . . 365
15.3 Some Applications Of Compactness . . . . . . . . . . . . . . . . . . . 368
15.4 Ascoli Arzela Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 370
15.5 General Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . 374
15.6 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
15.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383

16 Measure Theory And Topology 389


16.1 Borel Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
16.2 Regular Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
16.3 Locally Compact Hausdorff Spaces . . . . . . . . . . . . . . . . . . 398
16.4 Positive Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . 402
16.5 Lebesgue Measure And Its Properties . . . . . . . . . . . . . . . . . 410
16.6 Change Of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.7 Fubini's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
16.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

17 Extension Theorems 421


17.1 Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
17.2 Caratheodory Extension Theorem . . . . . . . . . . . . . . . . . . . 423
17.3 The Tychonoff Theorem . . . . . . . . . . . . . . . . . . . . . . . 426
17.4 Kolmogorov Extension Theorem . . . . . . . . . . . . . . . . . . . . 428
17.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
17.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439

18 Banach Spaces 441


18.1 Theorems Based On Baire Category . . . . . . . . . . . . . . . . . . 441
18.1.1 Baire Category Theorem . . . . . . . . . . . . . . . . . . . . . 441
18.1.2 Uniform Boundedness Theorem . . . . . . . . . . . . . . . . . 445
18.1.3 Open Mapping Theorem . . . . . . . . . . . . . . . . . . . . . 446
18.1.4 Closed Graph Theorem . . . . . . . . . . . . . . . . . . . . . 448
18.2 Hahn Banach Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 450
18.3 Uniform Convexity Of Lp . . . . . . . . . . . . . . . . . . . . . . . . 458
18.4 Weak And Weak∗ Topologies . . . . . . . . . . . . . . . . . . . . . 464
18.4.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 464
18.4.2 Banach Alaoglu Theorem . . . . . . . . . . . . . . . . . . . . 465
18.4.3 Eberlein Smulian Theorem . . . . . . . . . . . . . . . . . . . 467
18.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470

19 Hilbert Spaces 477


19.1 Basic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
19.2 Approximations In Hilbert Space . . . . . . . . . . . . . . . . . . . . 483
19.3 Orthonormal Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
19.4 Fourier Series, An Example . . . . . . . . . . . . . . . . . . . . . . . 488
19.5 General Theory Of Continuous Semigroups . . . . . . . . . . . . . . 490
19.5.1 An Evolution Equation . . . . . . . . . . . . . . . . . . . . . 499
19.5.2 Adjoints, Hilbert Space . . . . . . . . . . . . . . . . . . . . . 502
19.5.3 Adjoints, Reflexive Banach Space . . . . . . . . . . . . . . . . 506
19.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510

20 Representation Theorems 515


20.1 Radon Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . 515
20.2 Vector Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
20.3 Representation Theorems For The Dual Space Of Lp . . . . . . . . . 529
20.4 The Dual Space Of L∞ (Ω) . . . . . . . . . . . . . . . . . . . . . . 537
20.5 Non σ Finite Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
20.6 The Dual Space Of C0 (X) . . . . . . . . . . . . . . . . . . . . . . . . 546
20.7 The Dual Space Of C0 (X), Another Approach . . . . . . . . . . . . 551
20.8 More Attractive Formulations . . . . . . . . . . . . . . . . . . . . . . 553
20.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554

21 Differentiation With Respect To General Radon Measures 559


21.1 Besicovitch Covering Theorem . . . . . . . . . . . . . . . . . . . . . 559
21.2 Fundamental Theorem Of Calculus For Radon Measures . . . . . . . 563
21.3 Slicing Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
21.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572

22 The Bochner Integral 577


22.1 Strong And Weak Measurability . . . . . . . . . . . . . . . . . . . . 577
22.2 The Essential Bochner Integral . . . . . . . . . . . . . . . . . . . . . 585
22.3 The Spaces Lp (Ω; X) . . . . . . . . . . . . . . . . . . . . . . . . . 590
22.4 Measurable Representatives . . . . . . . . . . . . . . . . . . . . . . . 596
22.5 Vector Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
22.6 The Riesz Representation Theorem . . . . . . . . . . . . . . . . . . . 603

23 Hausdorff Measure 611


23.1 Definition Of Hausdorff Measures . . . . . . . . . . . . . . . . . . 611
23.1.1 Properties Of Hausdorff Measure . . . . . . . . . . . . . . . . 612
23.2 Hn And mn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
23.3 Technical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 617
23.3.1 Steiner Symmetrization . . . . . . . . . . . . . . . . . . . . . 619
23.3.2 The Isodiametric Inequality . . . . . . . . . . . . . . . . . . . 621
23.4 The Proper Value Of β (n) . . . . . . . . . . . . . . . . . . . . . . 622
23.4.1 A Formula For β (n) . . . . . . . . . . . . . . . . . . . . . . . 623
23.4.2 Hausdorff Measure And Linear Transformations . . . . . . . . 625

A The Hausdorff Maximal Theorem 627


A.1 The Hamel Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
Copyright © 2010
Part I

Prerequisite Material

Some Fundamental Concepts

1.1 Set Theory


1.1.1 Basic Denitions
A set is a collection of things called elements of the set. For example, the set of
integers, the collection of signed whole numbers such as 1,2,-4, etc. This set whose
existence will be assumed is denoted by Z. Other sets could be the set of people in
a family or the set of donuts in a display case at the store. Sometimes braces,
{ }, specify a set by listing the things which are in the set between the braces.
For example, the set of integers between -1 and 2, including these numbers, could
be denoted as {−1, 0, 1, 2}. The notation signifying x is an element of a set S is
written as x ∈ S. Thus, 1 ∈ {−1, 0, 1, 2, 3}. Here are some axioms about sets.
Axioms are statements which are accepted, not proved.

1. Two sets are equal if and only if they have the same elements.

2. To every set A, and to every condition S (x) there corresponds a set B, whose
elements are exactly those elements x of A for which S (x) holds.

3. For every collection of sets there exists a set that contains all the elements
that belong to at least one set of the given collection.

4. The Cartesian product of a nonempty family of nonempty sets is nonempty.

5. If A is a set there exists a set P (A) such that P (A) is the set of all subsets
of A. This is called the power set.

These axioms are referred to as the axiom of extension, axiom of specification,


axiom of unions, axiom of choice, and axiom of powers respectively.
It seems fairly clear you should want to believe in the axiom of extension. It is
merely saying, for example, that {1, 2, 3} = {2, 3, 1} since these two sets have the
same elements in them. Similarly, it would seem you should be able to specify a
new set from a given set using some condition which can be used as a test to


determine whether the element in question is in the set. For example, the set of all
integers which are multiples of 2. This set could be specified as follows.

{x ∈ Z : x = 2y for some y ∈ Z} .

In this notation, the colon is read as "such that" and in this case the condition is
being a multiple of 2.
Another example of political interest, could be the set of all judges who are not
judicial activists. I think you can see this last is not a very precise condition since
there is no way to determine to everyone's satisfaction whether a given judge is an
activist. Also, just because something is grammatically correct does not
mean it makes any sense. For example, consider the following nonsense.

S = {x ∈ set of dogs : it is colder in the mountains than in the winter} .

So what is a condition?
We will leave these sorts of considerations and assume our conditions make sense.
The axiom of unions states that for any collection of sets, there is a set consisting
of all the elements in each of the sets in the collection. Of course this is also open to
further consideration. What is a collection? Maybe it would be better to say "set
of sets" or, given a set whose elements are sets, "there exists a set whose elements
consist of exactly those things which are elements of at least one of these sets." If S
is such a set whose elements are sets,

∪ {A : A ∈ S} or ∪S

signify this union.


Something is in the Cartesian product of a set or family of sets if it consists
of a single thing taken from each set in the family. Thus (1, 2, 3) ∈ {1, 4, .2} ×
{1, 2, 7} × {4, 3, 7, 9} because it consists of exactly one element from each of the sets
which are separated by ×. Also, this is the notation for the Cartesian product of
finitely many sets. If S is a set whose elements are sets,

∏_{A∈S} A

signifies the Cartesian product.


The Cartesian product is the set of choice functions, a choice function being a
function which selects exactly one element of each set of S. You may think the axiom
of choice, stating that the Cartesian product of a nonempty family of nonempty sets
is nonempty, is innocuous but there was a time when many mathematicians were
ready to throw it out because it implies things which are very hard to believe, things
which never happen without the axiom of choice.
A is a subset of B, written A ⊆ B, if every element of A is also an element of
B. This can also be written as B ⊇ A. A is a proper subset of B, written A ⊊ B
or B ⊋ A, if A is a subset of B but A is not equal to B, A ≠ B. A ∩ B denotes the
intersection of the two sets A and B and it means the set of elements of A which

are also elements of B. The axiom of specification shows this is a set. The empty
set is the set which has no elements in it, denoted as ∅. A ∪ B denotes the union
of the two sets A and B and it means the set of all elements which are in either of
the sets. It is a set because of the axiom of unions.
The complement of a set, (the set of things which are not in the given set) must
be taken with respect to a given set called the universal set which is a set which
contains the one whose complement is being taken. Thus, the complement of A,
denoted as A^C (or more precisely as X \ A) is a set obtained from using the axiom
of specification to write

A^C ≡ {x ∈ X : x ∉ A}

The symbol ∉ means: is not an element of. Note the axiom of specification takes
place relative to a given set. Without this universal set it makes no sense to use
the axiom of specification to obtain the complement.
Words such as "all" or "there exists" are called quantifiers and they must be
understood relative to some given set. For example, the set of all integers larger
than 3. Or there exists an integer larger than 7. Such statements have to do with a
given set, in this case the integers. Failure to have a reference set when quantifiers
are used turns out to be illogical even though such usage may be grammatically
correct. Quantifiers are used often enough that there are symbols for them. The
symbol ∀ is read as "for all" or "for every" and the symbol ∃ is read as "there
exists". Thus ∀∀∃∃ could mean for every upside down A there exists a backwards
E.
DeMorgan's laws are very useful in mathematics. Let S be a set of sets each of
which is contained in some universal set U. Then

∪ {A^C : A ∈ S} = (∩ {A : A ∈ S})^C

and
∩ {A^C : A ∈ S} = (∪ {A : A ∈ S})^C .

These laws follow directly from the definitions. Also following directly from the
definitions are:
Let S be a set of sets then

B ∪ ∩ {A : A ∈ S} = ∩ {B ∪ A : A ∈ S} .

and: Let S be a set of sets show

B ∩ ∪ {A : A ∈ S} = ∪ {B ∩ A : A ∈ S} .

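As a quick sanity check of DeMorgan's laws on finite sets, the following short Python sketch (added here for illustration and not part of the original text; the universal set and the collection of sets are made up) verifies both identities for a small collection.

    # A small numerical check of DeMorgan's laws on finite sets.
    U = set(range(10))                         # a made-up universal set
    S = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7, 9}]   # a made-up collection of subsets of U

    def complement(A, U=U):
        """Complement of A relative to the universal set U."""
        return U - A

    # Union of complements equals complement of intersection.
    assert set.union(*[complement(A) for A in S]) == complement(set.intersection(*S))
    # Intersection of complements equals complement of union.
    assert set.intersection(*[complement(A) for A in S]) == complement(set.union(*S))
    print("DeMorgan's laws hold for this collection.")
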
Unfortunately, there is no single universal set which can be used for all sets.
Here is why: Suppose there were. Call it S. Then you could consider A the set
of all elements of S which are not elements of themselves, this from the axiom of
specification. If A is an element of itself, then it fails to qualify for inclusion in A.
Therefore, it must not be an element of itself. However, if this is so, it qualifies for
inclusion in A so it is an element of itself and so this can't be true either. Thus

the most basic of conditions you could imagine, that of being an element of, is
meaningless and so allowing such a set causes the whole theory to be meaningless.
The solution is to not allow a universal set. As mentioned by Halmos in Naive
set theory, "Nothing contains everything. Always beware of statements involving
quantifiers wherever they occur, even this one." This little observation described
above is due to Bertrand Russell and is called Russell's paradox.

1.1.2 The Schroder Bernstein Theorem


It is very important to be able to compare the size of sets in a rational way. The
most useful theorem in this context is the Schroder Bernstein theorem which is the
main result to be presented in this section. The Cartesian product is discussed
above. The next definition reviews this and defines the concept of a function.

Definition 1.1.1 Let X and Y be sets.

X × Y ≡ {(x, y) : x ∈ X and y ∈ Y }

A relation is defined to be a subset of X × Y. A function f, also called a mapping,
is a relation which has the property that if (x, y) and (x, y1) are both elements of
f, then y = y1. The domain of f is defined as

D (f ) ≡ {x : (x, y) ∈ f } ,

written as f : D (f ) → Y .

It is probably safe to say that most people do not think of functions as a type
of relation which is a subset of the Cartesian product of two sets. A function is like
a machine which takes inputs, x and makes them into a unique output, f (x). Of
course, that is what the above definition says with more precision. An ordered pair,
(x, y) which is an element of the function or mapping has an input, x and a unique
output, y, denoted as f (x) while the name of the function is f. "Mapping" is often
a noun meaning function. However, it also is a verb as in "f is mapping A to B".
That which a function is thought of as doing is also referred to using the word
"maps" as in: f maps X to Y. However, a set of functions may be called a set of
maps so this word might also be used as the plural of a noun. There is no help for
it. You just have to suffer with this nonsense.
The following theorem which is interesting for its own sake will be used to prove
the Schroder Bernstein theorem.

Theorem 1.1.2 Let f : X → Y and g : Y → X be two functions. Then there exist
sets A, B, C, D, such that

A ∪ B = X, C ∪ D = Y, A ∩ B = ∅, C ∩ D = ∅,

f (A) = C, g (D) = B.

The following picture illustrates the conclusion of this theorem.

[Diagram: under f, the set A ⊆ X is carried onto C = f (A) ⊆ Y, while under g, the
set D = Y \ C is carried onto B = g (D) = X \ A.]

Proof: Consider the empty set, ∅ ⊆ X. If y ∈ Y \ f (∅), then g (y) ∉ ∅ because
∅ has no elements. Also, if A, B, C, and D are as described above, A also would
have this same property that the empty set has. However, A is probably larger.
Therefore, say A0 ⊆ X satisfies P if whenever y ∈ Y \ f (A0), g (y) ∉ A0.

𝒜 ≡ {A0 ⊆ X : A0 satisfies P}.

Let A = ∪𝒜. If y ∈ Y \ f (A), then for each A0 ∈ 𝒜, y ∈ Y \ f (A0) and so
g (y) ∉ A0. Since g (y) ∉ A0 for all A0 ∈ 𝒜, it follows g (y) ∉ A. Hence A satisfies
P and is the largest subset of X which does so. Now define

C ≡ f (A), D ≡ Y \ C, B ≡ X \ A.

It only remains to verify that g (D) = B.
Suppose x ∈ B = X \ A. Then A ∪ {x} does not satisfy P and so there exists
y ∈ Y \ f (A ∪ {x}) ⊆ D such that g (y) ∈ A ∪ {x}. But y ∉ f (A) and so since A
satisfies P, it follows g (y) ∉ A. Hence g (y) = x and so x ∈ g (D). ∎

Theorem 1.1.3 (Schroder Bernstein) If f : X → Y and g : Y → X are one to
one, then there exists h : X → Y which is one to one and onto.

Proof: Let A, B, C, D be the sets of Theorem 1.1.2 and define

h (x) ≡ f (x) if x ∈ A, and h (x) ≡ g⁻¹ (x) if x ∈ B.

Then h is the desired one to one and onto mapping.


Recall that the Cartesian product may be considered as the collection of choice
functions.

Definition 1.1.4 Let I be a set and let Xi be a set for each i ∈ I. f is a choice
function written as

f ∈ ∏_{i∈I} Xi

if f (i) ∈ Xi for each i ∈ I.

The axiom of choice says that if Xi ≠ ∅ for each i ∈ I, for I a set, then

∏_{i∈I} Xi ≠ ∅.

Sometimes the two functions, f and g are onto but not one to one. It turns out
that with the axiom of choice, a similar conclusion to the above may be obtained.

Corollary 1.1.5 If f : X → Y is onto and g : Y → X is onto, then there exists
h : X → Y which is one to one and onto.

Proof: For each y ∈ Y, f⁻¹ (y) ≡ {x ∈ X : f (x) = y} ≠ ∅. Therefore, by the
axiom of choice, there exists f₀⁻¹ ∈ ∏_{y∈Y} f⁻¹ (y) which is the same as saying that
for each y ∈ Y, f₀⁻¹ (y) ∈ f⁻¹ (y). Similarly, there exists g₀⁻¹ (x) ∈ g⁻¹ (x) for all
x ∈ X. Then f₀⁻¹ is one to one because if f₀⁻¹ (y1) = f₀⁻¹ (y2), then

y1 = f (f₀⁻¹ (y1)) = f (f₀⁻¹ (y2)) = y2.

Similarly g₀⁻¹ is one to one. Therefore, by the Schroder Bernstein theorem, there
exists h : X → Y which is one to one and onto.

Definition 1.1.6 A set S is finite if there exists a natural number n and a map
θ which maps {1, · · · , n} one to one and onto S. S is infinite if it is not finite. A
set S is called countable if there exists a map θ mapping N one to one and onto
S. (When θ maps a set A to a set B, this will be written as θ : A → B in the future.)
Here N ≡ {1, 2, · · · }, the natural numbers. S is at most countable if there exists a
map θ : N → S which is onto.

The property of being at most countable is often referred to as being countable
because the question of interest is normally whether one can list all elements of the
set, designating a first, second, third etc. in such a way as to give each element of
the set a natural number. The possibility that a single element of the set may be
counted more than once is often not important.

Theorem 1.1.7 If X and Y are both at most countable, then X × Y is also at most
countable. If either X or Y is countable, then X × Y is also countable.

Proof: It is given that there exists a mapping θ : N → X which is onto. Define
θ (i) ≡ xi and consider X as the set {x1, x2, x3, · · · }. Similarly, consider Y as the
set {y1, y2, y3, · · · }. It follows the elements of X × Y are included in the following
rectangular array.

(x1, y1)  (x1, y2)  (x1, y3)  · · ·   Those which have x1 in first slot.
(x2, y1)  (x2, y2)  (x2, y3)  · · ·   Those which have x2 in first slot.
(x3, y1)  (x3, y2)  (x3, y3)  · · ·   Those which have x3 in first slot.
   ...       ...       ...

Follow a path through this array as follows: start at (x1, y1), move to (x1, y2), sweep
back along the anti-diagonal through (x2, y1), move down to (x3, y1), and continue
through each successive anti-diagonal in this manner.

Thus the first element of X × Y is (x1, y1), the second element of X × Y is (x1, y2),
the third element of X × Y is (x2, y1) etc. This assigns a number from N to each
element of X × Y. Thus X × Y is at most countable.
It remains to show the last claim. Suppose without loss of generality that X
is countable. Then there exists θ : N → X which is one to one and onto. Let
δ : X × Y → N be defined by δ ((x, y)) ≡ θ⁻¹ (x). Thus δ is onto N. By the first
part there exists a function from N onto X × Y. Therefore, by Corollary 1.1.5,
there exists a one to one and onto mapping from X × Y to N. ∎
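The diagonal path used in this proof is easy to realize concretely. The following short Python sketch (an illustration added here, not part of the original text) enumerates the pairs of positive integers along successive anti-diagonals, in the spirit of the array above, though the order within each diagonal differs slightly from the snake path in the picture.

    from itertools import islice

    def diagonal_pairs():
        """Enumerate all pairs (i, j) of positive integers, one anti-diagonal at a time."""
        s = 2                      # i + j is constant on each anti-diagonal
        while True:
            for i in range(1, s):
                yield (i, s - i)
            s += 1

    # First few terms: (1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), ...
    print(list(islice(diagonal_pairs(), 10)))

Every pair (i, j) appears exactly once, on diagonal number i + j, which is the content of the first claim of the theorem for X = Y = N.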

Theorem 1.1.8 If X and Y are at most countable, then X ∪ Y is at most countable.
If either X or Y are countable, then X ∪ Y is countable.

Proof: As in the preceding theorem,

X = {x1, x2, x3, · · · }

and
Y = {y1, y2, y3, · · · } .

Consider the following array consisting of X ∪ Y and a path through it.

x1   x2   x3   · · ·
y1   y2   · · ·

Thus the first element of X ∪ Y is x1, the second is x2, the third is y1, the fourth is
y2 etc.
Consider the second claim. By the first part, there is a map from N onto X ∪ Y.
Suppose without loss of generality that X is countable and θ : N → X is one to one
and onto. Then define η (y) ≡ 1 for all y ∈ Y, and η (x) ≡ θ⁻¹ (x). Thus η maps
X ∪ Y onto N and this shows there exist two onto maps, one mapping X ∪ Y onto N
and the other mapping N onto X ∪ Y. Then Corollary 1.1.5 yields the conclusion. ∎


1.1.3 Equivalence Relations


There are many ways to compare elements of a set other than to say two elements
are equal or the same. For example, in the set of people let two people be equiv-
alent if they have the same weight. This would not be saying they were the same

person, just that they weighed the same. Often such relations involve considering
one characteristic of the elements of a set and then saying the two elements are
equivalent if they are the same as far as the given characteristic is concerned.

Definition 1.1.9 Let S be a set. ∼ is an equivalence relation on S if it satisfies
the following axioms.

1. x ∼ x for all x ∈ S. (Reflexive)

2. If x ∼ y then y ∼ x. (Symmetric)

3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive)

Definition 1.1.10 [x] denotes the set of all elements of S which are equivalent to
x and [x] is called the equivalence class determined by x or just the equivalence class
of x.

With the above definition one can prove the following simple theorem.

Theorem 1.1.11 Let ∼ be an equivalence relation defined on a set S and let H denote
the set of equivalence classes. Then if [x] and [y] are two of these equivalence classes,
either x ∼ y and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅.

1.2 lim sup And lim inf


It is assumed in all that is done that R is complete. There are two ways to describe
completeness of R. One is to say that every bounded set has a least upper bound and
a greatest lower bound. The other is to say that every Cauchy sequence converges.
These two equivalent notions of completeness will be taken as given.
The symbol F will mean either R or C. The symbol [−∞, ∞] will mean all real
numbers along with +∞ and −∞ which are points which we pretend are at the
right and left ends of the real line respectively. The inclusion of these make believe
points makes the statement of certain theorems less trouble.

Definition 1.2.1 For A ⊆ [−∞, ∞], A ≠ ∅, sup A is defined as the least upper
bound in case A is bounded above by a real number and equals ∞ if A is not bounded
above. Similarly inf A is defined to equal the greatest lower bound in case A is
bounded below by a real number and equals −∞ in case A is not bounded below.

Lemma 1.2.2 If {An} is an increasing sequence in [−∞, ∞], then

sup {An} = lim_{n→∞} An.

Similarly, if {An} is decreasing, then

inf {An} = lim_{n→∞} An.

Proof: Let sup ({An : n ∈ N}) = r. In the first case, suppose r < ∞. Then
letting ε > 0 be given, there exists n such that An ∈ (r − ε, r]. Since {An} is
increasing, it follows if m > n, then r − ε < An ≤ Am ≤ r and so limn→∞ An = r
as claimed. In the case where r = ∞, then if a is a real number, there exists n
such that An > a. Since {Ak} is increasing, it follows that if m > n, Am > a. But
this is what is meant by limn→∞ An = ∞. The other case is that r = −∞. But
in this case, An = −∞ for all n and so limn→∞ An = −∞. The case where An is
decreasing is entirely similar. ∎
Sometimes the limit of a sequence does not exist. For example, if an = (−1)ⁿ,
then limn→∞ an does not exist. This is because the terms of the sequence are a
distance of 2 apart. Therefore there can't exist a single number such that all the
terms of the sequence are ultimately within 1/4 of that number. The nice thing
about lim sup and lim inf is that they always exist. First here is a simple lemma
and definition.
Definition 1.2.3 Denote by [−∞, ∞] the real line along with symbols ∞ and −∞.
It is understood that ∞ is larger than every real number and −∞ is smaller than
every real number. Then if {An} is an increasing sequence of points of [−∞, ∞],
limn→∞ An equals ∞ if the only upper bound of the set {An} is ∞. If {An} is
bounded above by a real number, then limn→∞ An is defined in the usual way and
equals the least upper bound of {An}. If {An} is a decreasing sequence of points of
[−∞, ∞], limn→∞ An equals −∞ if the only lower bound of the sequence {An} is
−∞. If {An} is bounded below by a real number, then limn→∞ An is defined in the
usual way and equals the greatest lower bound of {An}. More simply, if {An} is
increasing,
lim_{n→∞} An = sup {An}
and if {An} is decreasing then
lim_{n→∞} An = inf {An} .

Lemma 1.2.4 Let {an} be a sequence of real numbers and let Un ≡ sup {ak : k ≥ n}.
Then {Un} is a decreasing sequence. Also if Ln ≡ inf {ak : k ≥ n}, then {Ln} is
an increasing sequence. Therefore, limn→∞ Ln and limn→∞ Un both exist.

Proof: Let Wn be an upper bound for {ak : k ≥ n}. Then since these sets are
getting smaller, it follows that for m < n, Wm is an upper bound for {ak : k ≥ n}.
In particular if Wm = Um, then Um is an upper bound for {ak : k ≥ n} and so Um
is at least as large as Un, the least upper bound for {ak : k ≥ n}. The claim that
{Ln} is increasing is similar. ∎
From the lemma, the following definition makes sense.
Definition 1.2.5 Let {an} be any sequence of points of [−∞, ∞].

lim sup_{n→∞} an ≡ lim_{n→∞} sup {ak : k ≥ n}

lim inf_{n→∞} an ≡ lim_{n→∞} inf {ak : k ≥ n} .
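To make the definition concrete, here is a small Python sketch (added for illustration; it is not part of the original text) which computes the tail suprema Un and tail infima Ln of Lemma 1.2.4 for the sequence an = (−1)ⁿ. They are constantly 1 and −1, so lim sup an = 1 and lim inf an = −1 even though lim an does not exist.

    # Tail suprema and infima for a_n = (-1)^n (finite truncation of the sequence).
    N = 20
    a = [(-1) ** n for n in range(1, N + 1)]

    # U_n = sup{a_k : k >= n} and L_n = inf{a_k : k >= n}, for the first few n.
    U = [max(a[n:]) for n in range(10)]
    L = [min(a[n:]) for n in range(10)]

    print(U)  # [1, 1, ..., 1]   -> the decreasing sequence U_n is constantly 1
    print(L)  # [-1, -1, ..., -1] -> the increasing sequence L_n is constantly -1
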

Theorem 1.2.6 Suppose {an} is a sequence of real numbers and that

lim sup_{n→∞} an

and
lim inf_{n→∞} an

are both real numbers. Then limn→∞ an exists if and only if

lim inf_{n→∞} an = lim sup_{n→∞} an

and in this case,

lim_{n→∞} an = lim inf_{n→∞} an = lim sup_{n→∞} an .

Proof: First note that

sup {ak : k ≥ n} ≥ inf {ak : k ≥ n}

and so from Theorem 3.2.7,

lim sup_{n→∞} an ≡ lim_{n→∞} sup {ak : k ≥ n} ≥ lim_{n→∞} inf {ak : k ≥ n} ≡ lim inf_{n→∞} an.

Suppose first that limn→∞ an exists and is a real number. Then by Theorem 3.5.3
{an} is a Cauchy sequence. Therefore, if ε > 0 is given, there exists N such that if
m, n ≥ N, then
|an − am| < ε/3.
From the definition of sup {ak : k ≥ N}, there exists n1 ≥ N such that

sup {ak : k ≥ N} ≤ a_{n1} + ε/3.

Similarly, there exists n2 ≥ N such that

inf {ak : k ≥ N} ≥ a_{n2} − ε/3.

It follows that

sup {ak : k ≥ N} − inf {ak : k ≥ N} ≤ |a_{n1} − a_{n2}| + 2ε/3 < ε.

Since the sequence {sup {ak : k ≥ N}}_{N=1}^∞ is decreasing and {inf {ak : k ≥ N}}_{N=1}^∞
is increasing, it follows from Theorem 3.2.7

0 ≤ lim_{N→∞} sup {ak : k ≥ N} − lim_{N→∞} inf {ak : k ≥ N} ≤ ε.

Since ε is arbitrary, this shows

lim_{N→∞} sup {ak : k ≥ N} = lim_{N→∞} inf {ak : k ≥ N}     (1.2.1)

Next suppose 1.2.1. Then

lim_{N→∞} (sup {ak : k ≥ N} − inf {ak : k ≥ N}) = 0.

Since sup {ak : k ≥ N} ≥ inf {ak : k ≥ N} it follows that for every ε > 0, there
exists N such that
sup {ak : k ≥ N} − inf {ak : k ≥ N} < ε.
Thus if m, n > N, then
|am − an| < ε
which means {an} is a Cauchy sequence. Since R is complete, it follows that
limn→∞ an ≡ a exists. By the squeezing theorem, it follows

a = lim inf_{n→∞} an = lim sup_{n→∞} an. ∎


With the above theorem, here is how to define the limit of a sequence of points
in [−∞, ∞].

Definition 1.2.7 Let {an} be a sequence of points of [−∞, ∞]. Then limn→∞ an
exists exactly when
lim inf_{n→∞} an = lim sup_{n→∞} an
and in this case
lim_{n→∞} an ≡ lim inf_{n→∞} an = lim sup_{n→∞} an .

The significance of lim sup and lim inf, in addition to what was just discussed,
is contained in the following theorem which follows quickly from the definition.

Theorem 1.2.8 Suppose {an} is a sequence of points of [−∞, ∞]. Let

λ = lim sup_{n→∞} an.

Then if b > λ, it follows there exists N such that whenever n ≥ N,

an ≤ b.

If c < λ, then an > c for infinitely many values of n. Let

γ = lim inf_{n→∞} an.

Then if d < γ, it follows there exists N such that whenever n ≥ N,

an ≥ d.

If e > γ, it follows an < e for infinitely many values of n.

The proof of this theorem is left as an exercise for you. It follows directly from
the definition and it is the sort of thing you must do yourself. Here is one other
simple proposition.

Proposition 1.2.9 Let limn→∞ an = a > 0. Then

lim sup_{n→∞} an bn = a lim sup_{n→∞} bn.

Proof: This follows from the definition. Let λn = sup {ak bk : k ≥ n}. For all n
large enough, an > a − ε where ε is small enough that a − ε > 0. Therefore,

λn ≥ sup {bk : k ≥ n} (a − ε)

for all n large enough. Then

lim sup_{n→∞} an bn = lim_{n→∞} λn
              ≥ lim_{n→∞} (sup {bk : k ≥ n} (a − ε))
              = (a − ε) lim sup_{n→∞} bn

Similar reasoning shows

lim sup_{n→∞} an bn ≤ (a + ε) lim sup_{n→∞} bn.

Now since ε > 0 is arbitrary, the conclusion follows.
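For instance, with an = 2 + 1/n and bn = (−1)ⁿ one has a = 2 and lim sup bn = 1, so the proposition predicts lim sup an bn = 2. A tiny Python check of this one instance (added here as an illustration, not part of the original text):

    # Numerical illustration of Proposition 1.2.9 with a_n = 2 + 1/n, b_n = (-1)^n.
    N = 10_000
    a = [2 + 1 / n for n in range(1, N + 1)]
    b = [(-1) ** n for n in range(1, N + 1)]
    ab = [x * y for x, y in zip(a, b)]

    # Approximate lim sup by the supremum of a far tail of the truncated sequence.
    print(max(ab[N // 2:]))   # close to 2 = a * lim sup b_n
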

1.3 Double Series


Sometimes it is required to consider double series which are of the form

∑_{k=m}^∞ ∑_{j=m}^∞ a_{jk} ≡ ∑_{k=m}^∞ ( ∑_{j=m}^∞ a_{jk} ).

In other words, first sum on j yielding something which depends on k and then sum
these. The major consideration for these double series is the question of when

∑_{k=m}^∞ ∑_{j=m}^∞ a_{jk} = ∑_{j=m}^∞ ∑_{k=m}^∞ a_{jk}.

In other words, when does it make no difference which subscript is summed over
first? In the case of finite sums there is no issue here. You can always write

∑_{k=m}^M ∑_{j=m}^N a_{jk} = ∑_{j=m}^N ∑_{k=m}^M a_{jk}

because addition is commutative. However, there are limits involved with infinite
sums and the interchange in order of summation involves taking limits in a different
order. Therefore, it is not always true that it is permissible to interchange the
two sums. A general rule of thumb is this: If something involves changing the
order in which two limits are taken, you may not do it without agonizing over
the question. In general, limits foul up algebra and also introduce things which are
counter intuitive. Here is an example. This example is a little technical. It is placed
here just to prove conclusively there is a question which needs to be considered.

Example 1.3.1 Consider the following picture which depicts some of the ordered
pairs (m, n) where m, n are positive integers.

0 0 0 0 0 c 0 -c
0 0 0 0 c 0 -c 0
0 0 0 c 0 -c 0 0
0 0 c 0 -c 0 0 0
0 c 0 -c 0 0 0 0
b 0 -c 0 0 0 0 0
0 a 0 0 0 0 0 0

The numbers next to the point are the values of amn . You see ann = 0 for all
n, a21 = a, a12 = b, amn = c for (m, n) on the line y = 1 + x whenever m > 1, and
amn = −c for all (m, n) on the line y = x − 1 whenever m > 2.

Then ∑_{m=1}^∞ amn = a if n = 1, ∑_{m=1}^∞ amn = b − c if n = 2 and if n > 2,
∑_{m=1}^∞ amn = 0. Therefore,

∑_{n=1}^∞ ∑_{m=1}^∞ amn = a + b − c.

Next observe that ∑_{n=1}^∞ amn = b if m = 1, ∑_{n=1}^∞ amn = a + c if m = 2, and
∑_{n=1}^∞ amn = 0 if m > 2. Therefore,

∑_{m=1}^∞ ∑_{n=1}^∞ amn = b + a + c

and so the two sums are different. Moreover, you can see that by assigning different
values of a, b, and c, you can get an example for any two different numbers desired.
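The following short Python sketch (added for illustration, not part of the original text) encodes the array of this example with a = 1, b = 2, c = 3 and computes the two iterated sums. Since every row and every column contains at most two nonzero entries, the inner sums are exact once the inner index runs far enough, and the two orders of summation give a + b − c = 0 and a + b + c = 6 respectively, in agreement with the computation above.

    def a_mn(m, n, a=1, b=2, c=3):
        """Entries of the array in Example 1.3.1, with specific values of a, b, c."""
        if (m, n) == (2, 1):
            return a
        if (m, n) == (1, 2):
            return b
        if n == m + 1 and m > 1:
            return c
        if n == m - 1 and m > 2:
            return -c
        return 0

    N = 50
    # Sum over m first (exact inner sums), then over n; and the other way around.
    sum_m_first = sum(sum(a_mn(m, n) for m in range(1, 2 * N)) for n in range(1, N + 1))
    sum_n_first = sum(sum(a_mn(m, n) for n in range(1, 2 * N)) for m in range(1, N + 1))
    print(sum_m_first, sum_n_first)   # prints 0 6, i.e. a + b - c versus a + b + c
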
It turns out that if aij ≥ 0 for all i, j, then you can always interchange the order
of summation. This is shown next and is based on the following lemma. First, some
notation should be discussed.

Definition 1.3.2 Let f (a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are
sets which means that f (a, b) is either a number, ∞, or −∞. The symbol, +∞ is
interpreted as a point out at the end of the number line which is larger than every
real number. Of course there is no such number. That is why it is called ∞. The
symbol, −∞ is interpreted similarly. Then sup_{a∈A} f (a, b) means sup (Sb) where
Sb ≡ {f (a, b) : a ∈ A}.

Unlike limits, you can take the sup in different orders.

Lemma 1.3.3 Let f (a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are sets.
Then
sup_{a∈A} sup_{b∈B} f (a, b) = sup_{b∈B} sup_{a∈A} f (a, b).

Proof: Note that for all a, b, f (a, b) ≤ sup_{b∈B} sup_{a∈A} f (a, b) and therefore, for
all a, sup_{b∈B} f (a, b) ≤ sup_{b∈B} sup_{a∈A} f (a, b). Therefore,

sup_{a∈A} sup_{b∈B} f (a, b) ≤ sup_{b∈B} sup_{a∈A} f (a, b).

Repeat the same argument interchanging a and b, to get the conclusion of the
lemma. ∎
Theorem 1.3.4 Let aij ≥ 0. Then

∑_{i=1}^∞ ∑_{j=1}^∞ aij = ∑_{j=1}^∞ ∑_{i=1}^∞ aij.

Proof: First note there is no trouble in defining these sums because the aij are
all nonnegative. If a sum diverges, it only diverges to ∞ and so ∞ is the value of
the sum. Next note that

∑_{j=r}^∞ ∑_{i=r}^∞ aij ≥ sup_n ∑_{j=r}^∞ ∑_{i=r}^n aij

because for all j,
∑_{i=r}^∞ aij ≥ ∑_{i=r}^n aij.
Therefore,

∑_{j=r}^∞ ∑_{i=r}^∞ aij ≥ sup_n ∑_{j=r}^∞ ∑_{i=r}^n aij = sup_n lim_{m→∞} ∑_{j=r}^m ∑_{i=r}^n aij
    = sup_n lim_{m→∞} ∑_{i=r}^n ∑_{j=r}^m aij = sup_n ∑_{i=r}^n lim_{m→∞} ∑_{j=r}^m aij
    = sup_n ∑_{i=r}^n ∑_{j=r}^∞ aij = lim_{n→∞} ∑_{i=r}^n ∑_{j=r}^∞ aij = ∑_{i=r}^∞ ∑_{j=r}^∞ aij

Interchanging the i and j in the above argument proves the theorem. ∎


Row reduced echelon form

2.1 Elementary Matrices


The elementary matrices result from doing a row operation to the identity matrix.

Definition 2.1.1 The row operations consist of the following

1. Switch two rows.

2. Multiply a row by a nonzero number.

3. Replace a row by the same row added to a multiple of another row.

We refer to these as the row operations of type 1,2, and 3 respectively.

The elementary matrices are given in the following denition.

Definition 2.1.2 The elementary matrices consist of those matrices which result
by applying a row operation to an identity matrix. Those which involve switching
rows of the identity are called permutation matrices.¹

As an example of why these elementary matrices are interesting, consider the


following.

\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} a & b & c & d \\ x & y & z & w \\ f & g & h & i \end{pmatrix}
=
\begin{pmatrix} x & y & z & w \\ a & b & c & d \\ f & g & h & i \end{pmatrix}
\]

A 3 × 4 matrix was multiplied on the left by an elementary matrix which was
obtained from row operation 1 applied to the identity matrix. This resulted in
applying the operation 1 to the given matrix. This is what happens in general.
¹ More generally, a permutation matrix is a matrix which comes by permuting the rows of the
identity matrix, not just switching two rows.


Now consider what these elementary matrices look like. First P^{ij}, which involves
switching row i and row j of the identity where i < j. We write

\[
I = \begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ r_j \\ \vdots \\ r_n \end{pmatrix}
\]

where

r_j = (0 · · · 1 · · · 0)

with the 1 in the jth position from the left.
This matrix P^{ij} is of the form

\[
P^{ij} = \begin{pmatrix} r_1 \\ \vdots \\ r_j \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix}
\]

Now consider what this does to a column vector.

\[
\begin{pmatrix} r_1 \\ \vdots \\ r_j \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} v_1 \\ \vdots \\ v_j \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}
\]

Now we try multiplication of a matrix on the left by this elementary matrix P^{ij}.
Consider

\[
P^{ij}
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]

From the way you multiply matrices this is a matrix which has the indicated
columns.

\[
\begin{pmatrix}
P^{ij} \begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix},
P^{ij} \begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{n2} \end{pmatrix},
\cdots,
P^{ij} \begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{np} \end{pmatrix}
\end{pmatrix}
=
\begin{pmatrix}
\begin{pmatrix} a_{11} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{n1} \end{pmatrix},
\begin{pmatrix} a_{12} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{n2} \end{pmatrix},
\cdots,
\begin{pmatrix} a_{1p} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{np} \end{pmatrix}
\end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]

This has established the following lemma.

Lemma 2.1.3 Let P^{ij} denote the elementary matrix which involves switching the
ith and the jth rows of I. Then if P^{ij}, A are conformable, we have

P^{ij} A = B

where B is obtained from A by switching the ith and the jth rows.


Next consider the row operation which involves multiplying the ith row by a
nonzero constant, c. We write

\[
I = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix}
\]

where

r_j = (0 · · · 1 · · · 0)

with the 1 in the jth position from the left. The elementary matrix which results
from applying this operation to the ith row of the identity matrix is of the form

\[
E (c, i) = \begin{pmatrix} r_1 \\ \vdots \\ c r_i \\ \vdots \\ r_n \end{pmatrix}
\]

Now consider what this does to a column vector.

\[
\begin{pmatrix} r_1 \\ \vdots \\ c r_i \\ \vdots \\ r_n \end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} v_1 \\ \vdots \\ c v_i \\ \vdots \\ v_n \end{pmatrix}
\]

Denote by E (c, i) this elementary matrix which multiplies the ith row of the identity
by the nonzero constant, c. Then from what was just discussed and the way matrices
are multiplied,

\[
E (c, i)
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]

equals a matrix having the columns indicated below.

\[
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
c a_{i1} & c a_{i2} & \cdots & c a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]

This proves the following lemma.

Lemma 2.1.4 Let E (c, i) denote the elementary matrix corresponding to the row
operation in which the ith row is multiplied by the nonzero constant c. Thus E (c, i)
involves multiplying the ith row of the identity matrix by c. Then

E (c, i) A = B

where B is obtained from A by multiplying the ith row of A by c.

Finally consider the third of these row operations. Letting r_j be the jth row of
the identity matrix, denote by E (c × i + j) the elementary matrix obtained from
the identity matrix by replacing r_j with r_j + c r_i. In case i < j this will be of the
form

\[
E (c \times i + j) = \begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ c r_i + r_j \\ \vdots \\ r_n \end{pmatrix}
\]

Now consider what this does to a column vector.

\[
\begin{pmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ c r_i + r_j \\ \vdots \\ r_n \end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ c v_i + v_j \\ \vdots \\ v_n \end{pmatrix}
\]

Now from this and the way matrices are multiplied,

\[
E (c \times i + j)
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]

equals a matrix of the following form having the indicated columns.

\[
\begin{pmatrix}
E (c \times i + j) \begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix},
E (c \times i + j) \begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{n2} \end{pmatrix},
\cdots,
E (c \times i + j) \begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{np} \end{pmatrix}
\end{pmatrix}
\]
\[
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} + c a_{i1} & a_{j2} + c a_{i2} & \cdots & a_{jp} + c a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]

The case where i > j is similar. This proves the following lemma in which, as above,
the ith row of the identity is r_i.

Lemma 2.1.5 Let E (c × i + j) denote the elementary matrix obtained from I by
replacing the jth row of the identity r_j with c r_i + r_j. Letting the kth row of A be
a_k,
E (c × i + j) A = B
where B has the same rows as A except the jth row of B is c a_i + a_j.
The above lemmas are summarized in the following theorem.
Theorem 2.1.6 To perform any of the three row operations on a matrix A it suffices
to do the row operation on the identity matrix, obtaining an elementary matrix
E, and then take the product, EA. In addition to this, the following identities hold
for the elementary matrices described above.

E (−c × i + j) E (c × i + j) = E (c × i + j) E (−c × i + j) = I          (2.1.1)

E (c, i) E (c⁻¹, i) = E (c⁻¹, i) E (c, i) = I          (2.1.2)

P^{ij} P^{ij} = I          (2.1.3)

Proof: Consider 2.1.1. Starting with I and taking −c times the ith row added
to the jth yields E (−c × i + j) which differs from I only in the jth row. Now
multiplying on the left by E (c × i + j) takes c times the ith row and adds to the jth
thus restoring the jth row to its original state. Thus E (c × i + j) E (−c × i + j) =
I. Similarly E (−c × i + j) E (c × i + j) = I. The reasoning is similar for 2.1.2 and
2.1.3. ∎
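The effect of the three elementary matrices, together with the inverse identities 2.1.1 - 2.1.3, is easy to check numerically. The following NumPy sketch (an added illustration, not part of the original text; the particular 3 × 4 matrix is made up) builds P^{12}, E(5, 2) and E(3 × 1 + 3) by applying the corresponding row operations to the identity and verifies that left multiplication performs the same operations on A.

    import numpy as np

    A = np.arange(12.0).reshape(3, 4)              # an arbitrary 3 x 4 matrix

    P12 = np.eye(3); P12[[0, 1]] = P12[[1, 0]]     # switch rows 1 and 2 of I
    E5_2 = np.eye(3); E5_2[1] *= 5.0               # multiply row 2 of I by 5
    E3_13 = np.eye(3); E3_13[2] += 3.0 * E3_13[0]  # add 3 times row 1 to row 3 of I

    B = A.copy(); B[[0, 1]] = B[[1, 0]]
    assert np.allclose(P12 @ A, B)                 # P^{12} A switches rows 1 and 2 of A
    assert np.allclose(E5_2 @ A, np.vstack([A[0], 5 * A[1], A[2]]))
    assert np.allclose(E3_13 @ A, np.vstack([A[0], A[1], A[2] + 3 * A[0]]))

    # The identities (2.1.1) - (2.1.3):
    E_minus = np.eye(3); E_minus[2] -= 3.0 * E_minus[0]     # E(-3 x 1 + 3)
    E_inv = np.eye(3); E_inv[1] /= 5.0                      # E(1/5, 2)
    assert np.allclose(E3_13 @ E_minus, np.eye(3))
    assert np.allclose(E5_2 @ E_inv, np.eye(3))
    assert np.allclose(P12 @ P12, np.eye(3))
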

Definition 2.1.7 For an n × n matrix A, an n × n matrix B which has the property
that AB = BA = I is denoted by A⁻¹. Such a matrix is called an inverse. When
A has an inverse, it is called invertible.
The following lemma says that if a matrix acts like an inverse, then it is the
inverse. Also, the product of invertible matrices is invertible.
Lemma 2.1.8 If B, C are both inverses of A, then B = C. That is, there exists at
most one inverse of a matrix. If A1, · · · , Am are each invertible m × m matrices,
then the product A1 A2 · · · Am is also invertible and

(A1 A2 · · · Am)⁻¹ = Am⁻¹ A_{m−1}⁻¹ · · · A1⁻¹

Proof. From the definition and associative law of matrix multiplication,

B = BI = B (AC) = (BA) C = IC = C.

This proves the uniqueness of the inverse.
Next suppose A, B are invertible. Then

AB (B⁻¹ A⁻¹) = A (B B⁻¹) A⁻¹ = A I A⁻¹ = A A⁻¹ = I

and also
(B⁻¹ A⁻¹) AB = B⁻¹ (A⁻¹ A) B = B⁻¹ I B = B⁻¹ B = I

It follows from Definition 2.1.7 that AB has an inverse and it is B⁻¹ A⁻¹. Thus the
case of m = 1, 2 in the claim of the lemma is true. Suppose this claim is true for k.
Then
A1 A2 · · · Ak A_{k+1} = (A1 A2 · · · Ak) A_{k+1}

By induction, the two matrices (A1 A2 · · · Ak), A_{k+1} are both invertible and

(A1 A2 · · · Ak)⁻¹ = Ak⁻¹ · · · A2⁻¹ A1⁻¹

By the case of the product of two invertible matrices shown above,

((A1 A2 · · · Ak) A_{k+1})⁻¹ = A_{k+1}⁻¹ (A1 A2 · · · Ak)⁻¹ = A_{k+1}⁻¹ Ak⁻¹ · · · A2⁻¹ A1⁻¹. ∎


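A quick numerical sanity check of the product formula, added here as an illustration with made-up matrices (not part of the original text):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((3, 3)) + 3 * np.eye(3)   # well conditioned, hence invertible
    B = rng.random((3, 3)) + 3 * np.eye(3)

    lhs = np.linalg.inv(A @ B)
    rhs = np.linalg.inv(B) @ np.linalg.inv(A)
    print(np.allclose(lhs, rhs))             # True: (AB)^{-1} = B^{-1} A^{-1}
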
We will discuss methods for finding the inverse later. For now, observe that
Theorem 2.1.6 says that elementary matrices are invertible and that the inverse of
such a matrix is also an elementary matrix. The major conclusion of the above
Lemma and Theorem is the following lemma about linear relationships.
Definition 2.1.9 Let v1, · · · , vk, u be vectors. Then u is said to be a linear com-
bination of the vectors {v1, · · · , vk} if there exist scalars c1, · · · , ck such that

u = ∑_{i=1}^k ci vi.

We also say that when the above holds for some scalars c1, · · · , ck, there exists a
linear relationship between the vector u and the vectors {v1, · · · , vk}.

We will discuss this more later, but the following picture illustrates the geometric
significance of the vectors which have a linear relationship with two vectors u, v
pointing in different directions.

[Figure: two vectors u and v pointing in different directions from the origin of the
xy plane; their linear combinations fill out the plane containing them.]

The following lemma states that linear relationships between columns in a ma-
trix are preserved by row operations. This simple lemma is the main result in
understanding all the major questions related to the row reduced echelon form as
well as many other topics.
Lemma 2.1.10 Let A and B be two m × n matrices and suppose B results from a
row operation applied to A. Then the kth column of B is a linear combination of the
i1, · · · , ir columns of B if and only if the kth column of A is a linear combination of
the i1, · · · , ir columns of A. Furthermore, the scalars in the linear combinations are
the same. (The linear relationship between the kth column of A and the i1, · · · , ir
columns of A is the same as the linear relationship between the kth column of B
and the i1, · · · , ir columns of B.)

Proof. Let A be the following matrix in which the ak are the columns

( a1  a2  · · ·  an )

and let B be the following matrix in which the columns are given by the bk

( b1  b2  · · ·  bn )

Then by Theorem 2.1.6 on Page 32, bk = E ak where E is an elementary matrix.
Suppose then that one of the columns of A is a linear combination of some other
columns of A. Say

ak = c1 a_{i1} + · · · + cr a_{ir}

Then multiplying by E,

bk = E ak = c1 E a_{i1} + · · · + cr E a_{ir} = c1 b_{i1} + · · · + cr b_{ir}. ∎

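A concrete illustration of this lemma (added here; the matrix and the row operation are made up): in the matrix below, column 3 equals 2·(column 1) + 1·(column 2), and after adding 4 times row 1 to row 2 the same relationship, with the same scalars, still holds.

    import numpy as np

    A = np.array([[1., 0., 2.],
                  [0., 1., 1.],
                  [3., 2., 8.]])
    assert np.allclose(A[:, 2], 2 * A[:, 0] + 1 * A[:, 1])   # linear relationship in A

    E = np.eye(3); E[1] += 4 * E[0]      # the elementary matrix E(4 x 1 + 2)
    B = E @ A                            # apply the row operation to A
    assert np.allclose(B[:, 2], 2 * B[:, 0] + 1 * B[:, 1])   # same scalars still work
    print(B)
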
This is really just an extension of the technique for finding solutions to a linear
system of equations. In solving a system of equations earlier, row operations were
used to exhibit the last column of an augmented matrix as a linear combination of
the preceding columns. The row reduced echelon form makes obvious all linear
relationships between all columns, not just the last column and those preceding it.

2.2 The row reduced echelon form of a matrix


When you do row operations on a matrix, there is an ultimate conclusion. It is
called "the row reduced echelon form." We show here that every matrix has such
a row reduced echelon form and that this row reduced echelon form is unique. The
significance is that it becomes possible to use the definite article in referring to the
row reduced echelon form. Hence important conclusions about the original matrix
may be logically deduced from an examination of its unique row reduced echelon
form. First we need the following definition.

Definition 2.2.1 Define special column vectors ei as follows.

\[
e_i = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}
\]

Thus ei is the column vector which has all zero entries except for a 1 in the ith
position down from the top.

Now here is the description of the row reduced echelon form.

Definition 2.2.2 An m × n matrix is said to be in row reduced echelon form if,
in viewing successive columns from left to right, the first nonzero column encoun-
tered is e1 and if, in viewing the columns of the matrix from left to right, you have
encountered e1, e2, · · · , ek, the next column is either e_{k+1} or this next column is a
linear combination of the vectors e1, e2, · · · , ek.

The n × n matrix

\[
I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}
\quad \text{(the identity matrix)}
\]

is row reduced. So too are, for example,

\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

Definition 2.2.3 Given a matrix A, row reduction produces one and only one row
reduced matrix B with A ∼ B. See Corollary 2.2.9.

Theorem 2.2.4 Let A be an m × n matrix. Then A has a row reduced echelon


form determined by a simple process.

Proof. Viewing the columns of A from left to right, take the first nonzero
column. Pick a nonzero entry in this column and switch the row containing this
entry with the top row of A. Now divide this new top row by the value of this
nonzero entry to get a 1 in this position and then use row operations to make all
entries below this equal to zero. Thus the first nonzero column is now e1. Denote
the resulting matrix by A1. Consider the sub-matrix of A1 to the right of this
column and below the first row. Do exactly the same thing for this sub-matrix that
was done for A. This time the e1 will refer to F^{m−1}. Use the first 1 obtained by the
above process which is in the top row of this sub-matrix and row operations, to zero
out every entry above it in the rows of A1. Call the resulting matrix A2. Thus A2
satisfies the conditions of the above definition up to the column just encountered.
Continue this way till every column has been dealt with and the result must be in
row reduced echelon form. ∎
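The process described in this proof is easy to carry out mechanically. The sketch below is an added illustration, not taken from the text; it implements a compact variant of the process (clearing above and below each pivot in a single pass) using exact rational arithmetic, and prints the result as Fractions.

    from fractions import Fraction

    def rref(rows):
        """Row reduced echelon form, following the process of Theorem 2.2.4."""
        A = [[Fraction(x) for x in row] for row in rows]
        m, n = len(A), len(A[0])
        pivot_row = 0
        for col in range(n):
            # Find a row at or below pivot_row with a nonzero entry in this column.
            pivot = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
            if pivot is None:
                continue                                  # nothing to do in this column
            A[pivot_row], A[pivot] = A[pivot], A[pivot_row]            # switch rows
            A[pivot_row] = [x / A[pivot_row][col] for x in A[pivot_row]]  # scale to get a 1
            for r in range(m):                            # clear the rest of the column
                if r != pivot_row and A[r][col] != 0:
                    c = A[r][col]
                    A[r] = [a - c * b for a, b in zip(A[r], A[pivot_row])]
            pivot_row += 1
            if pivot_row == m:
                break
        return A

    print(rref([[1, 2, 1], [2, 4, 5], [1, 2, 4]]))
    # rows come out as (1, 2, 0), (0, 0, 1), (0, 0, 0): pivot columns 1 and 3
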
Now here is some terminology which is often used.
Definition 2.2.5 The first pivot column of A is the first nonzero column of A
which becomes e1 in the row reduced echelon form. The next pivot column is the
first column after this which becomes e2 in the row reduced echelon form. The third
is the next column which becomes e3 in the row reduced echelon form and so forth.

The algorithm just described for obtaining a row reduced echelon form shows
that these columns are well defined, but we will deal with this issue more carefully
in Corollary 2.2.9 where we show that every matrix corresponds to exactly one row
reduced echelon form.
Example 2.2.6 Determine the pivot columns for the matrix
$$A = \begin{pmatrix} 2 & 1 & 3 & 6 & 2 \\ 1 & 7 & 8 & 4 & 0 \\ 1 & 3 & 4 & -2 & 2 \end{pmatrix} \tag{2.2.4}$$
As described above, the k^{th} pivot column of A is the column in which e_k appears
in the row reduced echelon form for the first time in reading from left to right. A
row reduced echelon form for A is
$$\begin{pmatrix} 1 & 0 & 1 & 0 & 64/35 \\ 0 & 1 & 1 & 0 & -4/35 \\ 0 & 0 & 0 & 1 & -9/35 \end{pmatrix}.$$
It follows that columns 1, 2, and 4 in A are pivot columns.


Note that from Lemma 2.1.10 the last column of A has a linear relationship to
the first four columns. Namely
$$\begin{pmatrix} 2 \\ 0 \\ 2 \end{pmatrix} = \frac{64}{35}\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} - \frac{4}{35}\begin{pmatrix} 1 \\ 7 \\ 3 \end{pmatrix} - \frac{9}{35}\begin{pmatrix} 6 \\ 4 \\ -2 \end{pmatrix}$$
This linear relationship is revealed by the row reduced echelon form but it was not
apparent from the original matrix.
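As a quick check of the example above, a computer algebra system produces the same row
reduced echelon form and the same pivot columns. This is an illustrative sketch only,
assuming the third-party sympy library is available; it is not part of the text.

    from sympy import Matrix

    A = Matrix([[2, 1, 3, 6, 2],
                [1, 7, 8, 4, 0],
                [1, 3, 4, -2, 2]])
    R, pivots = A.rref()        # row reduced echelon form and pivot column indices
    print(pivots)               # (0, 1, 3): columns 1, 2 and 4 of A
    print(R[:, 4].T)            # the last column: 64/35, -4/35, -9/35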

Definition 2.2.7 Two matrices A, B are said to be row equivalent if B can be
obtained from A by a sequence of row operations. When A is row equivalent to B,
we write A ∼ B.

Proposition 2.2.8 In the notation of Definition 2.2.7, A ∼ A. If A ∼ B, then
B ∼ A. If A ∼ B and B ∼ C, then A ∼ C.

Proof: That A ∼ A is obvious. Consider the second claim. By Theorem 2.1.6,
there exist elementary matrices E_1, E_2, ..., E_m such that
$$B = E_1 E_2 \cdots E_m A.$$
It follows from Lemma 2.1.8 that (E_1 E_2 \cdots E_m)^{-1} exists and equals the product of
the inverses of these matrices in the reverse order. Thus
$$E_m^{-1} E_{m-1}^{-1} \cdots E_1^{-1} B = (E_1 E_2 \cdots E_m)^{-1} B = (E_1 E_2 \cdots E_m)^{-1} (E_1 E_2 \cdots E_m) A = A.$$
By Theorem 2.1.6, each E_k^{-1} is an elementary matrix. By Theorem 2.1.6 again, the
above shows that A results from a sequence of row operations applied to B. The
last claim is left for an exercise. ∎
There are three choices for row operations at each step in Theorem 2.2.4. A
natural question is whether the same row reduced echelon matrix always results in
the end from following any sequence of row operations.
We have already made use of the following observation in finding a linear rela-
tionship between the columns of the matrix A in 2.2.4, but here it is stated more
formally. Now
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + \cdots + x_n e_n,$$
so to say two column vectors are equal is to say the column vectors are the same
linear combination of the special vectors e_j.

Corollary 2.2.9 The row reduced echelon form is unique. That is, if B, C are two
matrices in row reduced echelon form and both are obtained from A by a sequence
of row operations, then B = C.

Proof: Suppose B and C are both row reduced echelon forms for the matrix
A. It follows that B and C have zero columns in the same positions because row
operations do not affect zero columns. By Proposition 2.2.8, B and C are row
equivalent. Suppose e_1, ..., e_r occur in B for the first time, reading from left to
right, in positions i_1, ..., i_r respectively. Then from the description of the row
reduced echelon form, each of these columns of B, in positions i_1, ..., i_r, is not a
linear combination of the preceding columns. Since C is row equivalent to B, it
follows from Lemma 2.1.10 that each column of C in positions i_1, ..., i_r is not a

linear combination of the preceding columns of C. By the description of the row
reduced echelon form, e_1, ..., e_r occur for the first time in C in positions i_1, ..., i_r
respectively. Therefore, both B and C have the sequence e_1, e_2, ..., e_r occurring for
the first time in the positions i_1, i_2, ..., i_r. Since these matrices are row equivalent,
it follows from Lemma 2.1.10 that the columns between the i_k and i_{k+1} positions in
the two matrices are linear combinations, involving the same scalars, of the columns
in the i_1, ..., i_k positions. Similarly, the columns after the i_r position are linear
combinations of the columns in the i_1, ..., i_r positions involving the same scalars
in both matrices. This is equivalent to the assertion that each of these columns is
identical in B and C. ∎
Now with the above corollary, here is a very fundamental observation. The
number of nonzero rows in the row reduced echelon form is the same as the number
of pivot columns. Namely, this number is r in both cases where e_1, ..., e_r are the
pivot columns in the row reduced echelon form. Now consider a matrix which looks
like this: (More columns than rows.)

Corollary 2.2.10 Suppose A is an m × n matrix and that m < n. That is, the
number of rows is less than the number of columns. Then one of the columns of A
is a linear combination of the preceding columns of A. Also, there exists x ∈ F^n
such that x ≠ 0 and Ax = 0.

Proof: Since m < n, not all the columns of A can be pivot columns. In reading
from left to right, pick the first one which is not a pivot column. Then from the
description of the row reduced echelon form, this column is a linear combination of
the preceding columns. Say
$$a_j = x_1 a_1 + \cdots + x_{j-1} a_{j-1}.$$

Therefore, from the way we multiply a matrix times a vector,
$$A \begin{pmatrix} x_1 \\ \vdots \\ x_{j-1} \\ -1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} a_1 & \cdots & a_{j-1} & a_j & \cdots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_{j-1} \\ -1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = 0. \; \blacksquare$$
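A small numerical illustration of Corollary 2.2.10: for a matrix with more columns than
rows, a nonzero vector in the null space can be read off from the row reduced echelon form.
The sketch below uses sympy, and the particular 2 × 3 matrix is chosen only for illustration.

    from sympy import Matrix

    A = Matrix([[1, 2, 3],
                [4, 5, 6]])      # 2 x 3, so m < n
    x = A.nullspace()[0]         # a nonzero vector with A x = 0
    print(x.T)                   # Matrix([[1, -2, 1]])
    print((A * x).T)             # Matrix([[0, 0]])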



2.3 Finding the inverse of a matrix


We have already discussed the idea of an inverse of a matrix in Definition 2.1.7.
Recall that the inverse of an n × n matrix A is a matrix B such that
$$AB = BA = I$$
where I is the identity matrix discussed earlier. In Theorem 2.1.6, it was shown that an
elementary matrix is invertible and that its inverse is also an elementary matrix.
We also showed in Lemma 2.1.8 that the product of invertible matrices is invertible
and that the inverse of this product is the product of the inverses in the reverse
order. In this section, we consider the problem of finding an inverse for a given
n × n matrix. Recall that A has an inverse, denoted by A^{-1}, if AA^{-1} = A^{-1}A = I.
The procedure for finding the inverse is called the Gauss-Jordan procedure.

Procedure 2.3.1 Suppose A is an n × n matrix. To find A^{-1}, if it exists, form the
augmented n × 2n matrix
$$(A|I)$$
and then, if possible, do row operations until you obtain an n × 2n matrix of the
form
$$(I|B). \tag{2.3.5}$$
When this has been done, B = A^{-1}. If it is impossible to row reduce to a matrix of
the form (I|B), then A has no inverse.
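The procedure is mechanical enough to hand to a computer algebra system: form (A|I),
row reduce, and read off the right half when the left half becomes I. The following is a
hedged sketch using sympy; the particular invertible matrix is an arbitrary example, not
one from the text.

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 0],
                [0, 1, 3],
                [1, 0, 1]])
    aug = A.row_join(eye(3))     # the augmented matrix (A|I)
    R, _ = aug.rref()            # row reduce
    if R[:, :3] == eye(3):       # the left half must become I
        B = R[:, 3:]             # then B = A^(-1)
        print(A * B == eye(3))   # True
    else:
        print("A has no inverse")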

The procedure just described actually yields a right inverse. This is a matrix
B such that AB = I.
As mentioned earlier, what you have really found in the above algorithm is a
right inverse. Is this right inverse matrix, which we have called the inverse, really
the inverse, the matrix which when multiplied on both sides gives the identity?

Theorem 2.3.2 Suppose A, B are n × n matrices and AB = I. Then it follows that
BA = I also and so B = A^{-1}. For n × n matrices, the left inverse, right inverse
and inverse are all the same thing.

Proof. If AB = I for A, B n × n matrices, is BA = I? If AB = I, there exists
a unique solution x to the equation
$$Bx = y$$
for any choice of y. In fact,
$$x = A(Bx) = Ay.$$
This means the row reduced echelon form of B must be I. Thus every column is a
pivot column. Otherwise, there exists a free variable and the solution, if it exists,

would not be unique, contrary to what was just shown must happen if AB = I. It
follows that a right inverse B^{-1} for B exists. The above procedure yields
$$(B|I) \rightarrow (I|B^{-1}).$$
Now multiply both sides of the equation AB = I on the right by B^{-1}. Then
$$A = A(BB^{-1}) = (AB)B^{-1} = B^{-1}.$$
Thus A is the right inverse of B and so BA = I. This shows that if AB = I, then
BA = I also. Exchanging roles of A and B, we see that if BA = I, then AB = I.
This has shown that in the context of n × n matrices, right inverses, left inverses
and inverses are all the same, and this matrix is called A^{-1}. ∎
The following corollary is also of interest.
The following corollary is also of interest.

Corollary 2.3.3 An n × n matrix A has an inverse if and only if the row reduced
echelon form of A is I.

Proof. First suppose the row reduced echelon form of A is I. Then Procedure
2.3.1 yields a right inverse for A. By Theorem 2.3.2 this is the inverse. Next suppose
A has an inverse. Then there exists a unique solution x to the equation
$$Ax = y,$$
given by x = A^{-1}y. It follows that in the augmented matrix
$$(A|0)$$
there are no free variables, and so every column to the left of | is a pivot column.
Therefore, the row reduced echelon form of A is I. ∎
This also shows the following major theorem.
This also shows the following major theorem.

Theorem 2.3.4 An n × n matrix is invertible if and only if there is a sequence of
elementary matrices {E_1, ..., E_s} such that A = E_1 \cdots E_s.

Proof: If A is invertible, then there exist elementary matrices E_i such that
$$E_1 \cdots E_{s-1} E_s A = I.$$
Hence, multiplying by the inverses of these elementary matrices,
$$A = E_s^{-1} \cdots E_2^{-1} E_1^{-1},$$
and as shown above, the inverse of an elementary matrix is an elementary matrix.
This proves the theorem in one direction. The other direction is obvious because,
as pointed out above, the product of invertible matrices is invertible. ∎

2.4 The rank of a matrix


With the existence and uniqueness of the row reduced echelon form, it is a natural
step to define the rank of a matrix.

Definition 2.4.1 Let A be an m × n matrix. Then the rank of A is defined to
be the number of pivot columns. From the description of the row reduced echelon
form, this is also equal to the number of nonzero rows in the row reduced echelon
form. The nullity of A is the number of non-pivot columns. This is the same as
the number of free variables in the augmented matrix (A|0).

The rank is important because of the following proposition.

Proposition 2.4.2 Let A be an m × n matrix which has rank r. Then there exists
a set of r columns of A such that every other column is a linear combination of
these columns. Furthermore, none of these columns is a linear combination of the
other r - 1 columns in the set. The rank of A is no larger than the minimum of m
and n. Also the rank added to the nullity equals n.

Proof. Since the rank of A is r, it follows that A has exactly r pivot columns.
Thus, in the row reduced echelon form, every column is a linear combination of these
pivot columns and none of the pivot columns is a linear combination of the other
pivot columns. By Lemma 2.1.10 the same is true of the columns in the original
matrix A. There are at most min(m, n) pivot columns (nonzero rows). Therefore,
the rank of A is no larger than min(m, n) as claimed. Since every column is either
a pivot column or is not a pivot column, this shows that the rank added to the
nullity equals n. ∎
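For the matrix of Example 2.2.6, the rank and nullity are easy to confirm. An illustrative
sketch with sympy (not part of the text):

    from sympy import Matrix

    A = Matrix([[2, 1, 3, 6, 2],
                [1, 7, 8, 4, 0],
                [1, 3, 4, -2, 2]])
    r = A.rank()                     # the number of pivot columns
    nullity = A.cols - r             # the number of non-pivot columns
    print(r, nullity, r + nullity)   # 3 2 5, and n = 5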
Sequences

3.1 The Inner Product And Dot Product


The inner product is defined for vectors in C^n or R^n. To avoid making a distinction,
I will use the symbol F. Vectors are denoted by bold face letters. Thus a will denote
the vector
$$a = (a_1, \ldots, a_n).$$
Scalars are denoted by letters which are not in bold face.

Definition 3.1.1 Let a, b ∈ F^n. Define (a, b) as
$$(a, b) \equiv \sum_{k=1}^{n} a_k \overline{b_k}.$$
This is called the inner product.

With this definition, there are several important properties satisfied by the inner
product. In the statement of these properties, α and β will denote scalars and a, b, c
will denote vectors or, in other words, points in F^n. The following proposition comes
directly from the definition of the inner product.

Proposition 3.1.2 The inner product satisfies the following properties.
$$(a, b) = \overline{(b, a)} \tag{3.1.1}$$
$$(a, a) \geq 0 \text{ and equals zero if and only if } a = 0 \tag{3.1.2}$$
$$((\alpha a + \beta b), c) = \alpha(a, c) + \beta(b, c) \tag{3.1.3}$$
$$(c, (\alpha a + \beta b)) = \overline{\alpha}(c, a) + \overline{\beta}(c, b) \tag{3.1.4}$$
$$|a|^2 \equiv (a, a) \tag{3.1.5}$$

Example 3.1.3 Find ((1, 2, 0, 1), (0, i, 2, 3)).

This equals 0 + 2(-i) + 0 + 3 = 3 - 2i.
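The same computation can be checked numerically. A small sketch with numpy (illustrative
only); note that the conjugate is applied to the second vector, matching the definition above.

    import numpy as np

    a = np.array([1, 2, 0, 1], dtype=complex)
    b = np.array([0, 1j, 2, 3], dtype=complex)
    inner = np.dot(a, np.conj(b))   # sum of a_k * conjugate(b_k)
    print(inner)                    # (3-2j)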


For any inner product there is always the Cauchy Schwarz inequality.

Theorem 3.1.4 The inner product satisfies the inequality
$$|(a, b)| \leq |a| \, |b|. \tag{3.1.6}$$
Furthermore, equality is obtained if and only if one of a or b is a scalar multiple of
the other.

Proof: First define θ ∈ C such that
$$\theta(a, b) = |(a, b)|, \quad |\theta| = 1,$$
and define a function of t ∈ R,
$$f(t) = ((a + t\overline{\theta}b), (a + t\overline{\theta}b)).$$
Then by 3.1.2, f(t) ≥ 0 for all t ∈ R. Also from 3.1.3, 3.1.4, 3.1.1, and 3.1.5,
$$f(t) = (a, (a + t\overline{\theta}b)) + (t\overline{\theta}b, (a + t\overline{\theta}b))$$
$$= (a, a) + t\theta(a, b) + t\overline{\theta}(b, a) + t^2|\theta|^2(b, b)$$
$$= |a|^2 + 2t\operatorname{Re}(\theta(a, b)) + |b|^2 t^2 = |a|^2 + 2t|(a, b)| + |b|^2 t^2.$$
Now if |b| = 0, it must be the case that (a, b) = 0 because otherwise you could
pick large negative values of t and violate f(t) ≥ 0. Therefore, in this case, the
Cauchy Schwarz inequality holds.
In the case that |b| ≠ 0, y = f(t) is a polynomial which opens up and therefore,
if it is always nonnegative, the quadratic formula requires that
$$\overbrace{4|(a, b)|^2 - 4|a|^2|b|^2}^{\text{the discriminant}} \leq 0,$$
since otherwise the function f(t) would have two real zeros and would necessarily
have a graph which dips below the t axis. This proves 3.1.6.
It is clear from the axioms of the inner product that equality holds in 3.1.6
whenever one of the vectors is a scalar multiple of the other. It only remains to
verify that this is the only way equality can occur. If either vector equals zero,
then equality is obtained in 3.1.6, so it can be assumed both vectors are nonzero.
Then if equality is achieved, it follows f(t) has exactly one real zero because the
discriminant vanishes. Therefore, for some value of t, a + t\overline{\theta}b = 0, showing that a is
a multiple of b. ∎
You should note that the entire argument was based only on the properties
of the inner product listed in 3.1.1 - 3.1.5. This means that whenever something

satisfies these properties, the Cauchy Schwarz inequality holds. There are many
other instances of these properties besides vectors in F^n.
The Cauchy Schwarz inequality allows a proof of the triangle inequality for
distances in F^n in much the same way as the triangle inequality for the absolute
value.
Theorem 3.1.5 (Triangle inequality) For a, b ∈ F^n,
$$|a + b| \leq |a| + |b| \tag{3.1.7}$$
and equality holds if and only if one of the vectors is a nonnegative scalar multiple
of the other. Also
$$||a| - |b|| \leq |a - b|. \tag{3.1.8}$$
Proof: By properties of the inner product and the Cauchy Schwarz inequality,
$$|a + b|^2 = ((a + b), (a + b)) = (a, a) + (a, b) + (b, a) + (b, b)$$
$$= |a|^2 + 2\operatorname{Re}(a, b) + |b|^2 \leq |a|^2 + 2|(a, b)| + |b|^2$$
$$\leq |a|^2 + 2|a||b| + |b|^2 = (|a| + |b|)^2.$$
Taking square roots of both sides, you obtain 3.1.7.
It remains to consider when equality occurs. If either vector equals zero, then
that vector equals zero times the other vector and the claim about when equality
occurs is verified. Therefore, it can be assumed both vectors are nonzero. To get
equality in the second inequality above, Theorem 3.1.4 implies one of the vectors
must be a multiple of the other. Say b = αa. Also, to get equality in the first
inequality, (a, b) must be a nonnegative real number. Thus
$$0 \leq (a, b) = (a, \alpha a) = \overline{\alpha}(a, a) = \overline{\alpha}|a|^2.$$
Therefore, α must be a real number which is nonnegative.
To get the other form of the triangle inequality, write
$$a = a - b + b,$$
so
$$|a| = |a - b + b| \leq |a - b| + |b|.$$
Therefore,
$$|a| - |b| \leq |a - b|. \tag{3.1.9}$$
Similarly,
$$|b| - |a| \leq |b - a| = |a - b|. \tag{3.1.10}$$
It follows from 3.1.9 and 3.1.10 that 3.1.8 holds. This is because ||a| - |b|| equals
the left side of either 3.1.9 or 3.1.10 and either way, ||a| - |b|| ≤ |a - b|. ∎
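Both the Cauchy Schwarz inequality and the triangle inequality are easy to observe
numerically. A short sketch with numpy, using randomly chosen complex vectors (illustrative
only; the seed and dimension are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    b = rng.standard_normal(5) + 1j * rng.standard_normal(5)

    inner = np.dot(a, np.conj(b))                                            # the inner product (a, b)
    print(abs(inner) <= np.linalg.norm(a) * np.linalg.norm(b))               # True (3.1.6)
    print(np.linalg.norm(a + b) <= np.linalg.norm(a) + np.linalg.norm(b))    # True (3.1.7)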

3.1.1 The Dot Product

If you forget about the conjugate, the resulting product will be referred to as the dot
product. Thus
$$a \cdot b = \sum_{i=1}^{n} a_i b_i.$$
This dot product satisfies the following properties.
$$a \cdot b = b \cdot a \tag{3.1.11}$$
$$((a + b) \cdot c) = (a \cdot c) + (b \cdot c) \tag{3.1.12}$$
$$(c \cdot (a + b)) = (c \cdot a) + (c \cdot b) \tag{3.1.13}$$
Note that it is the same thing as the inner product if the vectors are in R^n. However,
in case the vectors are in C^n, this dot product will not satisfy a · a ≥ 0. For example,
(1, 2i) · (1, 2i) = 1 - 4 = -3. However, ((1, 2i), (1, 2i)) = 1 + 4 = 5. Usually we are
considering R^n so it makes no difference.
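The difference between the dot product and the inner product shows up immediately on the
example just given. A short numeric check with numpy (illustrative only):

    import numpy as np

    v = np.array([1, 2j])
    print(np.dot(v, v))             # (-3+0j): the dot product, no conjugate
    print(np.dot(v, np.conj(v)))    # (5+0j): the inner product ((1,2i),(1,2i)) = 5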

3.2 Vector Valued Sequences And Their Limits


Functions dened on the set of integers larger than a given integer which have
values in a vector space are called vector valued sequences. I will always assume
the vector space is a normed vector space. Actually, it will specialized even more
to Fn , although everything can be done for an arbitrary vector space and when it
creates no diculties, I will state certain denitions and easy theorems in the more
general context and use the symbol |||| to refer to the norm. Other than this, the
notation is almost the same as it was when the sequences had values in C. The
main dierence is that certain variables are placed in bold face to indicate they are
vectors. Even this is not really necessary but it is conventional to do it.The concept
of subsequence is also the same as it was for sequences of numbers. To review,

Denition 3.2.1 Let {an } be a sequence and let n1 < n2 < n3 , be any strictly
increasing list of integers such that n1 is at least as large as the rst number in the
domain of the function. Then if bk ank , {bk } is called a subsequence of {an } .
( ( ))
Example 3.2.2 Let an = n + 1, sin n1 . Then {an }n=1 is a vector valued se-
quence.

The denition of a limit of a vector valued sequence is given next. It is just like
the denition given for sequences of scalars. However, here the symbol || refers to
the usual norm in Fn . In a general normed vector space, it will be denoted by |||| .

Denition 3.2.3 A vector valued sequence {an }n=1 converges to a in a normed
vector space V, written as

lim an = a or an a
n

if and only if for every > 0 there exists n such that whenever n n ,

||an a|| < .



In words, the definition says that given any measure of closeness ε, the terms of
the sequence are eventually this close to a. Here, the word "eventually" refers to n
being sufficiently large.

Theorem 3.2.4 If lim_{n→∞} a_n = a and lim_{n→∞} a_n = a_1, then a_1 = a.

Proof: Suppose a_1 ≠ a. Then let 0 < ε < ||a_1 - a||/2 in the definition of
the limit. It follows there exists n_ε such that if n ≥ n_ε, then ||a_n - a|| < ε and
||a_n - a_1|| < ε. Therefore, for such n,
$$||a_1 - a|| \leq ||a_1 - a_n|| + ||a_n - a|| < \varepsilon + \varepsilon < ||a_1 - a||/2 + ||a_1 - a||/2 = ||a_1 - a||,$$
a contradiction. ∎

Theorem 3.2.5 Suppose {a_n} and {b_n} are vector valued sequences and that
$$\lim_{n \to \infty} a_n = a \quad \text{and} \quad \lim_{n \to \infty} b_n = b.$$
Also suppose x and y are scalars in F. Then
$$\lim_{n \to \infty} (x a_n + y b_n) = x a + y b. \tag{3.2.14}$$
Also,
$$\lim_{n \to \infty} (a_n \cdot b_n) = (a \cdot b). \tag{3.2.15}$$
If {x_n} is a sequence of scalars in F converging to x and if {a_n} is a sequence of
vectors in F^n converging to a, then
$$\lim_{n \to \infty} x_n a_n = x a. \tag{3.2.16}$$
Also, if {x_k} is a sequence of vectors in F^n, then x_k → x if and only if for each j,
$$\lim_{k \to \infty} x_k^j = x^j, \tag{3.2.17}$$
where here
$$x_k = (x_k^1, \ldots, x_k^n), \quad x = (x^1, \ldots, x^n).$$

Proof: Consider the first claim. By the triangle inequality,
$$||x a + y b - (x a_n + y b_n)|| \leq |x| \, ||a - a_n|| + |y| \, ||b - b_n||.$$
By definition, there exists n_ε such that if n ≥ n_ε,
$$||a - a_n||, \; ||b - b_n|| < \frac{\varepsilon}{2(1 + |x| + |y|)},$$
so for n > n_ε,
$$||x a + y b - (x a_n + y b_n)|| < |x| \frac{\varepsilon}{2(1 + |x| + |y|)} + |y| \frac{\varepsilon}{2(1 + |x| + |y|)} \leq \varepsilon.$$

Now consider the second claim. Let ε > 0 be given and choose n_1 such that if n ≥ n_1,
then
$$|a_n - a| < 1.$$
For such n, it follows from the Cauchy Schwarz inequality and properties of the
inner product that
$$|a_n \cdot b_n - a \cdot b| \leq |(a_n \cdot b_n) - (a_n \cdot b)| + |(a_n \cdot b) - (a \cdot b)|$$
$$\leq |a_n| \, |b_n - b| + |b| \, |a_n - a| \leq (|a| + 1)|b_n - b| + |b| \, |a_n - a|.$$
Now let n_2 be large enough that for n ≥ n_2,
$$|b_n - b| < \frac{\varepsilon}{2(|a| + 1)}, \quad \text{and} \quad |a_n - a| < \frac{\varepsilon}{2(|b| + 1)}.$$
Such a number exists because of the definition of limit. Therefore, let
$$n_\varepsilon > \max(n_1, n_2).$$
For n ≥ n_ε,
$$|a_n \cdot b_n - a \cdot b| \leq (|a| + 1)|b_n - b| + |b| \, |a_n - a| < (|a| + 1)\frac{\varepsilon}{2(|a| + 1)} + |b|\frac{\varepsilon}{2(|b| + 1)} \leq \varepsilon.$$
This proves 3.2.15. The claim 3.2.16 is left for you to do.
Finally consider the last claim. If 3.2.17 holds, then from the definition of
distance in F^n,
$$\lim_{k \to \infty} |x - x_k| \leq \lim_{k \to \infty} \sqrt{\sum_{j=1}^{n} \left|x^j - x_k^j\right|^2} = 0.$$
On the other hand, if lim_{k→∞} |x - x_k| = 0, then since |x_k^j - x^j| ≤ |x - x_k|, it
follows from the squeezing theorem that
$$\lim_{k \to \infty} \left|x_k^j - x^j\right| = 0. \; \blacksquare$$


An important theorem is the one which states that if a sequence converges, so
does every subsequence. You should review Denition 3.2.1 at this point. The proof
is identical to the one involving sequences of numbers.

Theorem 3.2.6 Let {xn } be a vector valued sequence with limn xn = x and let
{xnk } be a subsequence. Then limk xnk = x.

Proof: Let > 0 be given. Then there exists n such that if n > n , then
||xn x|| < . Suppose k > n . Then nk k > n and so

||xnk x|| <

showing limk xnk = x as claimed.

Theorem 3.2.7 Let {xn } be a sequence of real numbers and suppose each xn l
( l)and limn xn = x. Then x l ( l) . More generally, suppose {xn } and {yn }
are two sequences such that limn xn = x and limn yn = y. Then if xn yn
for all n suciently large, then x y.

Proof: Let > 0 be given. Then for n large enough,

l xn > x

and so
l + x.
Since > 0 is arbitrary, this requires l x. The other case is entirely similar or else
you could consider l and {xn } and apply the case just considered.
Consider the last claim. There exists N such that if n N then xn yn and

|x xn | + |y yn | < /2.

Then considering n > N in what follows,

x y xn + /2 (yn /2) = xn yn + .

Since was arbitrary, it follows x y 0. 

Theorem 3.2.8 Let {xn } be a sequence vectors and suppose each ||xn || l ( l)and
limn xn = x. Then x l ( l) . More generally, suppose {xn } and {yn } are two
sequences such that limn xn = x and limn yn = y. Then if ||xn || ||yn || for
all n suciently large, then ||x|| ||y|| .

Proof: It suces to just prove the second part since the rst part is similar.
By the triangle inequality,

|||xn || ||x||| ||xn x||

and for large n this is given to be small. Thus {||xn ||} converges to ||x|| . Similarly
{||yn ||} converges to ||y||. Now the desired result follows from Theorem 3.2.7. 

3.3 Sequential Compactness


The following is the denition of sequential compactness. It is a very useful notion
which can be used to prove existence theorems.

Denition 3.3.1 A set K V, a normed vector space is sequentially compact if


whenever {an } K is a sequence, there exists a subsequence, {ank } such that this
subsequence converges to a point of K.

First of all, it is convenient to consider the sequentially compact sets in F.


Lemma 3.3.2 Let I_k = [a^k, b^k] and suppose that for all k = 1, 2, ⋯,
$$I_k \supseteq I_{k+1}.$$
Then there exists a point c ∈ R which is an element of every I_k.

Proof: Since I_k ⊇ I_{k+1}, this implies
$$a^k \leq a^{k+1}, \quad b^k \geq b^{k+1}. \tag{3.3.18}$$
Consequently, if k ≤ l,
$$a^k \leq a^l \leq b^l \leq b^k. \tag{3.3.19}$$
Now define
$$c \equiv \sup\{a^l : l = 1, 2, \cdots\}.$$
By the first inequality in 3.3.18, and 3.3.19,
$$a^k \leq c = \sup\{a^l : l = k, k+1, \cdots\} \leq b^k \tag{3.3.20}$$
for each k = 1, 2, ⋯. Thus c ∈ I_k for every k. ∎
If this went too fast, the reason for the last inequality in 3.3.20 is that from
3.3.19, b^k is an upper bound to {a^l : l = k, k+1, ⋯}. Therefore, it is at least as
large as the least upper bound.

Theorem 3.3.3 Every closed interval [a, b] is sequentially compact.

Proof: Let {x_n} ⊆ [a, b] ≡ I_0. Consider the two intervals [a, (a+b)/2] and [(a+b)/2, b],
each of which has length (b - a)/2. At least one of these intervals contains x_n for
infinitely many values of n. Call this interval I_1. Now do for I_1 what was done for I_0.
Split it in half and let I_2 be the interval which contains x_n for infinitely many values
of n. Continue this way, obtaining a sequence of nested intervals I_0 ⊇ I_1 ⊇ I_2 ⊇ I_3 ⊇ ⋯,
where the length of I_n is (b - a)/2^n. Now pick n_1 such that x_{n_1} ∈ I_1, n_2 such that
n_2 > n_1 and x_{n_2} ∈ I_2, n_3 such that n_3 > n_2 and x_{n_3} ∈ I_3, etc. (This can be done
because in each case the intervals contained x_n for infinitely many values of n.) By
the nested interval lemma there exists a point c contained in all these intervals.
Furthermore,
$$|x_{n_k} - c| \leq (b - a)2^{-k},$$
and so lim_{k→∞} x_{n_k} = c ∈ [a, b]. ∎

Theorem 3.3.4 Let



n
I= Kk
k=1
where Kk is a sequentially compact set in F. Then I is a sequentially compact set
in Fn .

Proof: Let {xk }k=1 be a sequence of points in I. Let
( )
xk = x1k , , xnk
{ }
Thus xik k=1 is a sequence of points in Ki . Since Ki is sequentially compact, there
{ }
exists a subsequence of {xk }k=1 denoted by {x1k } such that x11k converges{to x}1
for some x1 K1 . Now there exists a further subsequence, {x2k } such that x12k
converges to x1 , because by Theorem 3.2.6, subsequences of convergent sequences
{ }
converge to the same limit as the convergent sequence, and in addition, x22k

} x K2 . Continue taking subsequences such that for {xjk }k=1 ,
2
converges {to some

it follows xrjk converges to some xr Kr for all r j. Then {xnk }k=1 is the
desired subsequence such that the sequence of numbers in F obtained by taking
the j th component of this ) converges to some x Kj .It follows from
j
( subsequence
Theorem 3.2.5 that x x , , x I and is the limit of {xnk }k=1 . 
1 n

Corollary 3.3.5 Any box of the form


[a, b] + i [c, d] {x + iy : x [a, b] , y [c, d]}
is sequentially compact in C.
Proof: The given box is essentially [a, b] [c, d] .

{xk + iyk }k=1 [a, b] + i [c, d]
is the same as saying (xk , yk ) [a, b] [c, d] . Therefore, there exists (x, y) [a, b]
[c, d] such that xk x and yk y. In other words xk + iyk x + iy and
x + iy [a, b] + i [c, d]. 

3.4 Closed And Open Sets


The denition of open and closed sets is next.
Denition 3.4.1 Let U be a set of points in a normed vector space, V . A point,
p U is said to be an interior point if whenever ||x p|| is suciently small, it
follows x U also. The set of points, x which are closer to p than is denoted by
B (p, ) {x V : ||x p|| < } .
This symbol, B (p, ) is called an open ball of radius . Thus a point, p is an interior
point of U if there exists > 0 such that p B (p, ) U . An open set is one for
which every point of the set is an interior point. Closed sets are those which are
complements of open sets. Thus H is closed means H C is open.

Theorem 3.4.2 The intersection of any nite collection of open sets is open. The
union of any collection of open sets is open. The intersection of any collection of
closed sets is closed and the union of any nite collection of closed sets is closed.

Proof: To see that any union of open sets is open, note that every point of the
union is in at least one of the open sets. Therefore, it is an interior point of that
set and hence an interior point of the entire union.
Now let {U1 , , Um } be some open sets and suppose p m k=1 Uk . Then there
exists rk > 0 such that B (p, rk ) Uk . Let 0 < r min (r1 , r2 , , rm ) . Then
B (p, r) m k=1 Uk and so the nite intersection is open. Note that if the nite
intersection is empty, there is nothing to prove because it is certainly true in this
case that every point in the intersection is an interior point because there arent
any such points.
Suppose {H1 , , Hm } is a nite set of closed sets. Then mk=1 Hk is closed if
its complement is open. However, from DeMorgans laws,
C
k=1 Hk ) = k=1 Hk ,
(m m C

a nite intersection of open sets which is open by what was just shown.
Next let C be some collection of closed sets. Then
C { }
(C) = H C : H C ,

a union of open sets which is therefore open by the rst part of the proof. Thus C
is closed. 
Next there is the concept of a limit point which gives another way of character-
izing closed sets.

Denition 3.4.3 Let A be any nonempty set and let x be a point. Then x is said
to be a limit point of A if for every r > 0, B (x, r) contains a point of A which is
not equal to x.

Example 3.4.4 Consider A = B(x, δ), an open ball in a normed vector space.

Then every point of B(x, δ) is a limit point. There are more general situations
than normed vector spaces in which this assertion is false.

If z ∈ B(x, δ), consider z + (1/k)(x - z) ≡ w_k for k ∈ N. Then
$$||w_k - x|| = \left|\left| z + \frac{1}{k}(x - z) - x \right|\right| = \left|\left| \left(1 - \frac{1}{k}\right)z - \left(1 - \frac{1}{k}\right)x \right|\right| = \frac{k-1}{k}||z - x|| < \delta,$$
and also
$$||w_k - z|| \leq \frac{1}{k}||x - z|| < \delta/k,$$
so w_k → z. Furthermore, the w_k are distinct. Thus z is a limit point of A as
claimed. This is because every ball containing z contains infinitely many of the w_k
and since they are all distinct, they can't all be equal to z.
Similarly, the following holds in any normed vector space.

Theorem 3.4.5 Let A be a nonempty set in V, a normed vector space. A point a is


a limit point of A if and only if there exists a sequence of distinct points of A, {an }
which converges to a. Also a nonempty set A is closed if and only if it contains all
its limit points.

Proof: Suppose rst a is a limit point of A. There exists a1 B (a, 1) A such


that a1 = a. Now supposing distinct points, a1 , , an have been chosen such that
none are equal to a and for each k n, ak B (a, 1/k) , let
{ }
1
0 < rn+1 < min , ||a a1 || , , ||a an || .
n+1
Then there exists an+1 B (a, rn+1 ) A with an+1 = a. Because of the denition of
rn+1 , an+1 is not equal to any of the other ak for k < n + 1. Also since ||a am || <
1/m, it follows limm am = a. Conversely, if there exists a sequence of distinct
points of A converging to a, then B (a, r) contains all an for n large enough. Thus
B (a, r) contains innitely many points of A since all are distinct. Thus at least one
of them is not equal to a. This establishes the rst part of the theorem.
Now consider the second claim. If A is closed then it is the complement of an
open set. Since AC is open, it follows that if a AC , then there exists > 0 such
that B (a, ) AC and so no point of AC can be a limit point of A. In other words,
every limit point of A must be in A. Conversely, suppose A contains all its limit
points. Then AC does not contain any limit points of A. It also contains no points
of A. Therefore, if a AC , since it is not a limit point of A, there exists > 0 such
that B (a, ) contains no points of A dierent than a. However, a itself is not in A
because a AC . Therefore, B (a, ) is entirely contained in AC . Since a AC was
arbitrary, this shows every point of AC is an interior point and so AC is open. 
Closed subsets of sequentially compact sets are sequentially compact.
Theorem 3.4.6 If K is a sequentially compact set in a normed vector space and
if H is a closed subset of K then H is sequentially compact.

Proof: Let {xn } H. Then since K is sequentially compact, there is a sub-


sequence, {xnk } which converges to a point, x K. If x / H, then since H C is
open, it follows there exists B (x, r) such that this open ball contains no points of
H. However, this is a contradiction to having xnk x which requires xnk B (x, r)
for all k large enough. Thus x H and this has shown H is sequentially compact.

Denition 3.4.7 A set S V, a normed vector space is bounded if there is some


r > 0 such that S B (0, r) .

Theorem 3.4.8 Every closed and bounded set in Fn is sequentially compact. Con-
versely, every sequentially compact set in Fn is closed and bounded.

Proof: Let H be a closed and bounded set in F^n. Then H ⊆ B(0, r) for some r.
Therefore, if x ∈ H, x = (x_1, ⋯, x_n), it must be that \sum_{i=1}^{n} |x_i|^2 < r^2 and so each
x_i ∈ [-r, r] + i[-r, r] ≡ R_r, a sequentially compact set by Corollary 3.3.5. Thus H
is a closed subset of \prod_{i=1}^{n} R_r, which is a sequentially compact set by Theorem 3.3.4.
Therefore, by Theorem 3.4.6 it follows H is sequentially compact.
Conversely, suppose K is a sequentially compact set in F^n. If it is not bounded,
then there exists a sequence {k_m} such that k_m ∈ K but k_m ∉ B(0, m) for m =
1, 2, ⋯. However, this sequence cannot have any convergent subsequence, because if
k_{m_k} → k, then for large enough m, k ∈ B(0, m) and k_{m_k} ∈ B(0, m) for all k large
enough, and this is a contradiction because there can only be finitely many points
of the sequence in B(0, m). If K is not closed, then it is missing a limit point. Say
k is a limit point of K which is not in K. Pick k_m ∈ B(k, 1/m) ∩ K.
Then {k_m} converges to k and so every subsequence also converges to k by
Theorem 3.2.6. Thus there is no point of K which is a limit of some subsequence
of {k_m}, a contradiction. ∎
What are some examples of closed and bounded sets in a general normed vector
space and, more specifically, F^n?
Proposition 3.4.9 Let D(z, r) denote the set of points
$$D(z, r) \equiv \{w \in V : ||w - z|| \leq r\}.$$
Then D(z, r) is closed and bounded. Also, let S(z, r) denote the set of points
$$S(z, r) \equiv \{w \in V : ||w - z|| = r\}.$$
Then S(z, r) is closed and bounded. It follows that if V = F^n, then these sets are
sequentially compact.
Proof: First note D(z, r) is bounded because
$$D(z, r) \subseteq B(0, ||z|| + 2r).$$
Here is why. Let x ∈ D(z, r). Then ||x - z|| ≤ r and so
$$||x|| \leq ||x - z|| + ||z|| \leq r + ||z|| < 2r + ||z||.$$
It remains to verify it is closed. Suppose then that y ∉ D(z, r). This means
||y - z|| > r. Consider the open ball B(y, ||y - z|| - r). If x ∈ B(y, ||y - z|| - r),
then
$$||x - y|| < ||y - z|| - r,$$
and so by the triangle inequality,
$$||z - x|| \geq ||z - y|| - ||y - x|| > ||x - y|| + r - ||x - y|| = r.$$
Thus the complement of D(z, r) is open and so D(z, r) is closed.
For the second type of set, note S(z, r)^C = B(z, r) ∪ D(z, r)^C, the union of two
open sets which by Theorem 3.4.2 is open. Therefore, S(z, r) is a closed set which
is clearly bounded because S(z, r) ⊆ D(z, r). ∎

3.5 Cauchy Sequences And Completeness


The concept of completeness is that every Cauchy sequence converges. Cauchy
sequences are those sequences which have the property that ultimately the terms
of the sequence are bunching up. More precisely,

Denition 3.5.1 {an } is a Cauchy sequence in a normed vector space, V if for all
> 0, there exists n such that whenever n, m n ,

||an am || < .

Theorem 3.5.2 The set of terms (values) of a Cauchy sequence in a normed vector
space V is bounded.

Proof: Let = 1 in the denition of a Cauchy sequence and let n > n1 . Then
from the denition,
||an an1 || < 1.
It follows that for all n > n1 ,

||an || < 1 + ||an1 || .

Therefore, for all n,



n1
||an || 1 + ||an1 || + ||ak || .
k=1

Theorem 3.5.3 If a sequence {an } in V, a normed vector space converges, then


the sequence is a Cauchy sequence.

Proof: Let > 0 be given and suppose an a. Then from the denition of
convergence, there exists n such that if n > n , it follows that

||an a|| <
2
Therefore, if m, n n + 1, it follows that

||an am || ||an a|| + ||a am || < + =
2 2
showing that, since > 0 is arbitrary, {an } is a Cauchy sequence.
The following theorem is very useful. It is identical to an earlier theorem. All
that is required is to put things in bold face to indicate they are vectors.

Theorem 3.5.4 Suppose {an } is a Cauchy sequence in any normed vector space
and there exists a subsequence, {ank } which converges to a. Then {an } also con-
verges to a.

Proof: Let > 0 be given. There exists N such that if m, n > N, then

||am an || < /2.

Also there exists K such that if k > K, then

||a ank || < /2.

Then let k > max (K, N ) . Then for such k,

||ak a|| ||ak ank || + ||ank a||


< /2 + /2 = .

Denition 3.5.5 If V is a normed vector space having the property that every
Cauchy sequence converges, then V is called complete. It is also referred to as a
Banach space.

Example 3.5.6 R is given to be complete. This is a fundamental axiom on which


calculus is developed.

Given R is complete, the following lemma is easily obtained.

Lemma 3.5.7 C is complete.

Proof: Let {x_k + iy_k}_{k=1}^∞ be a Cauchy sequence in C. This requires that {x_k} and
{y_k} are both Cauchy sequences in R. This follows from the obvious estimates
$$|x_k - x_m|, \; |y_k - y_m| \leq |(x_k + iy_k) - (x_m + iy_m)|.$$
By completeness of R there exists x ∈ R such that x_k → x, and similarly there exists
y ∈ R such that y_k → y. Therefore, since
$$|(x_k + iy_k) - (x + iy)| = \sqrt{(x_k - x)^2 + (y_k - y)^2} \leq |x_k - x| + |y_k - y|,$$
it follows (x_k + iy_k) → (x + iy). ∎


A simple generalization of this idea yields the following theorem.

Theorem 3.5.8 F^n is complete.

Proof: By Lemma 3.5.7, F is complete. Now let {a_m} be a Cauchy sequence in F^n.
Then by the definition of the norm,
$$\left| a_m^j - a_k^j \right| \leq |a_m - a_k|,$$
where a_m^j denotes the j^{th} component of a_m. Thus for each j = 1, 2, ⋯, n, {a_m^j}_{m=1}^∞
is a Cauchy sequence. It follows from Lemma 3.5.7, the completeness of F, that there
exists a^j such that
$$\lim_{m \to \infty} a_m^j = a^j.$$
Theorem 3.2.5 implies that lim_{m→∞} a_m = a where
$$a = (a^1, \ldots, a^n). \; \blacksquare$$

3.6 Shrinking Diameters


It is useful to consider another version of the nested interval lemma. This involves
a sequence of sets such that set (n + 1) is contained in set n and such that their
diameters converge to 0. It turns out that if the sets are also closed, then often
there exists a unique point in all of them.

Definition 3.6.1 Let S be a nonempty set in a normed vector space V. Then
diam(S) is defined as
$$\operatorname{diam}(S) \equiv \sup\{||x - y|| : x, y \in S\}.$$
This is called the diameter of S.



Theorem 3.6.2 Let {F_n}_{n=1}^∞ be a sequence of closed sets in F^n such that
$$\lim_{n \to \infty} \operatorname{diam}(F_n) = 0$$
and F_n ⊇ F_{n+1} for each n. Then there exists a unique p ∈ ∩_{k=1}^∞ F_k.

Proof: Pick p_k ∈ F_k. This is always possible because by assumption each set is
nonempty. Then {p_k}_{k=m}^∞ ⊆ F_m, and since the diameters converge to 0, it follows
{p_k} is a Cauchy sequence. Therefore, it converges to a point p by completeness
of F^n discussed in Theorem 3.5.8. Since each F_k is closed, p ∈ F_k for all k. This is
because it is a limit of a sequence of points, only finitely many of which are not in
the closed set F_k. Therefore, p ∈ ∩_{k=1}^∞ F_k. If q ∈ ∩_{k=1}^∞ F_k, then since both p, q ∈ F_k,
$$|p - q| \leq \operatorname{diam}(F_k).$$
It follows, since these diameters converge to 0, that |p - q| ≤ ε for every ε. Hence p = q. ∎



A sequence of sets {Gn } which satises Gn Gn+1 for all n is called a nested
sequence of sets.

3.7 Exercises
1. For a nonempty set S in a normed vector space V, define a function
$$x \mapsto \operatorname{dist}(x, S) \equiv \inf\{||x - y|| : y \in S\}.$$
Show
$$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| \leq ||x - y||.$$

2. Let A be a nonempty set in Fn or more generally in a normed vector space.


Dene the closure of A to equal the intersection of all closed sets which contain
A. This is usually denoted by A. Show A = A A where A consists of the
set of limit points of A. Also explain why A is closed.

3. The interior of a set was dened above. Tell why the interior of a set is always
an open set. The interior of a set A is sometimes denoted by A0 .

4. Given an example of a set A whose interior is empty but whose closure is all
of Rn .

5. A point, p is said to be in the boundary of a nonempty set A if for every r > 0,


B (p, r) contains points of A as well as points of AC . Sometimes this is denoted
as A. In a normed vector space, is it always the case that A A = A? Prove
or disprove.

6. Give an example of a nite dimensional normed vector space where the eld
of scalars is the rational numbers which is not complete.

7. Explain why as far as the theorems of this chapter are concerned, Cn is es-
sentially the same as R2n .

8. A set A Rn is said to be convex if whenever x, y A it follows tx+ (1 t) y


A whenever t [0, 1]. Show B (z, r) is convex. Also show D (z,r) is convex.
If A is convex, does it follow A is convex? Explain why or why not.

9. Let A be any nonempty subset of Rn . The convex hull of A, usually denoted by


co (A) is dened as the set of
all convex combinations of pointsin A. A convex
p
combination is of the form k=1 tk ak where each tk 0 and k tk = 1. Note
that p can be any nite number. Show co (A) is convex.
p
10. Suppose A Rn and z co (A) . Thus z = k=1 tk ak for tk 0 and k tk =
1. Show there exists n + 1 of the points {a1 , , ap } such that z is a convex
combination of these n+1 points. Hint: Show that if p > n+1 then the vectors
p
{ak a1 }k=2 must be linearly dependent. Conclude from this p the existence of
p
scalars {i } such that i=1 i ai = 0. Now for s R, z = k=1 (tk + si ) ak .
Consider small s and adjust till one or more of the tk + sk vanish. Now you
are in the same situation as before but with only p 1 of the ak . Repeat the
argument till you end up with only n+1 at which time you cant repeat again.

11. Show that any uncountable set of points in Fn must have a limit point.
12. Let V be any nite dimensional vector space having a basis {v1 , , vn } . For
x V, let
n
x= xk vk
k=1
so that the scalars, xk are the components of x with respect to the given basis.
Dene for x, y V

n
(x y) xi yi
i=1
Show this is a dot product for V satisfying all the axioms of a dot product
presented earlier.
13. In the context of Problem 12, let |x| denote the norm of x which is produced
by this inner product and suppose ||·|| is some other norm on V. Thus
$$|x| \equiv \left(\sum_i |x_i|^2\right)^{1/2},$$
where
$$x = \sum_k x_k v_k. \tag{3.7.21}$$
Show there exist positive numbers δ < Δ, independent of x, such that
$$\delta|x| \leq ||x|| \leq \Delta|x|.$$
This is referred to by saying the two norms are equivalent. Hint: The top half
is easy using the Cauchy Schwarz inequality. The bottom half is somewhat
harder. Argue that if it is not so, there exists a sequence {x_k} such that
|x_k| = 1 but k^{-1}|x_k| = k^{-1} ≥ ||x_k||, and then note the vector of components
of x_k is on S(0, 1) which was shown to be sequentially compact. Pass to
a limit in 3.7.21 and use the assumed inequality to get a contradiction to
{v_1, ⋯, v_n} being a basis.
14. It was shown above that in Fn , the sequentially compact sets are exactly those
which are closed and bounded. Show that in any nite dimensional normed
vector space, V the closed and bounded sets are those which are sequentially
compact.
15. Two norms on a nite dimensional vector space, ||||1 and ||||2 are said to be
equivalent if there exist positive numbers < such that
||x||1 ||x||2 ||x1 ||1 .
Show the statement that two norms are equivalent is an equivalence rela-
tion. Explain using the result of Problem 13 why any two norms on a nite
dimensional vector space are equivalent.


16. A normed vector space, V is separable if there is a countable set {wk }k=1
such that whenever B (x, ) is an open ball in V, there exists some wk in this
open ball. Show that Fn is separable. This set of points is called a countable
dense set.

17. Let V be any normed vector space with norm ||||. Using Problem 13 show
that V is separable.

18. Suppose V is a normed vector space. Show there exists a countable set of open

balls B {B (xk , rk )}k=1 having the remarkable property that any open set U
is the union of some subset of B. This collection of balls is called a countable
basis. Hint: Use Problem 17 to get a countable ( dense
) dense set of points,

{xk }k=1 and then consider balls of the form B xk , 1r where r N. Show this
collection of balls is countable and then show it has the remarkable property
mentioned.

19. Suppose S is any nonempty set in V a nite dimensional normed vector space.
Suppose C is a set of open sets such that C S. (Such a collection of sets is
called an open cover.) Show using Problem 18 that there are countably many

sets from C, {Uk }k=1 such that S k=1 Uk . This is called the Lindelo
property when every open cover can be reduced to a countable sub cover.
20. A set H in a normed vector space is said to be compact if whenever C is a set
of open sets such that C H, there are nitely many sets of C, {U1 , , Up }
such that
H pi=1 Ui .
Show using Problem 19 that if a set in a normed vector space is sequentially
compact, then it must be compact. Next show using Problem 14 that a set
in a normed vector space is compact if and only if it is closed and bounded.
Explain why the sets which are compact, closed and bounded, and sequentially
compact are the same sets in any nite dimensional normed vector space
Continuous Functions

Continuous functions are defined as they are for a function of one variable.

Definition 4.0.1 Let V, W be normed vector spaces. A function f : D(f) ⊆ V →
W is continuous at x ∈ D(f) if for each ε > 0 there exists δ > 0 such that whenever
y ∈ D(f) and
$$||y - x||_V < \delta,$$
it follows that
$$||f(x) - f(y)||_W < \varepsilon.$$
A function f is continuous if it is continuous at every point of D(f).

There is a theorem which makes it easier to verify certain functions are contin-
uous without having to always go to the above definition. The statement of this
theorem is purposely just a little vague. Some of these things tend to hold in almost
any context, certainly for any normed vector space.

Theorem 4.0.2 The following assertions are valid.

1. The function af + bg is continuous at x when f, g are continuous at x ∈
D(f) ∩ D(g) and a, b ∈ F.

2. If f and g have values in F^n and they are each continuous at x, then f · g
is continuous at x. If g has values in F and g(x) ≠ 0 with g continuous, then
f/g is continuous at x.

3. If f is continuous at x, f(x) ∈ D(g), and g is continuous at f(x), then g ∘ f
is continuous at x.

4. If V is any normed vector space, the function f : V → R given by f(x) = ||x||
is continuous.

5. f is continuous at every point of V if and only if whenever U is an open set
in W, f^{-1}(U) is open.


Proof: First consider 1.). Let ε > 0 be given. By assumption, there exists δ_1 > 0
such that whenever |x - y| < δ_1, it follows |f(x) - f(y)| < ε/(2(|a| + |b| + 1)), and there
exists δ_2 > 0 such that whenever |x - y| < δ_2, it follows that |g(x) - g(y)| <
ε/(2(|a| + |b| + 1)). Then let 0 < δ ≤ min(δ_1, δ_2). If |x - y| < δ, then everything happens
at once. Therefore, using the triangle inequality,
$$|af(x) + bg(x) - (af(y) + bg(y))| \leq |a||f(x) - f(y)| + |b||g(x) - g(y)|$$
$$< |a|\left(\frac{\varepsilon}{2(|a| + |b| + 1)}\right) + |b|\left(\frac{\varepsilon}{2(|a| + |b| + 1)}\right) < \varepsilon.$$

Now consider 2.) There exists 1 > 0 such that if |y x| < 1 , then |f (x) f (y)| <
1. Therefore, for such y,
|f (y)| < 1 + |f (x)| .
It follows that for such y,

|f g (x) f g (y)| |f (x) g (x) g (x) f (y)| + |g (x) f (y) f (y) g (y)|

|g (x)| |f (x) f (y)| + |f (y)| |g (x) g (y)|


(1 + |g (x)| + |f (y)|) [|g (x) g (y)| + |f (x) f (y)|]
(2 + |g (x)| + |f (x)|) [|g (x) g (y)| + |f (x) f (y)|]

Now let > 0 be given. There exists 2 such that if |x y| < 2 , then

|g (x) g (y)| < ,
2 (2 + |g (x)| + |f (x)|)

and there exists 3 such that if |x y| < 3 , then



|f (x) f (y)| <
2 (2 + |g (x)| + |f (x)|)

Now let 0 < min ( 1 , 2 , 3 ) . Then if |x y| < , all the above hold at once
and so
|f g (x) f g (y)|

(2 + |g (x)| + |f (x)|) [|g (x) g (y)| + |f (x) f (y)|]


( )

< (2 + |g (x)| + |f (x)|) + = .
2 (2 + |g (x)| + |f (x)|) 2 (2 + |g (x)| + |f (x)|)

This proves the first part of 2.). To obtain the second part, let δ_1 be as described
above and let δ_0 > 0 be such that for |x - y| < δ_0,
$$|g(x) - g(y)| < |g(x)|/2,$$
and so by the triangle inequality,
$$-|g(x)|/2 \leq |g(y)| - |g(x)| \leq |g(x)|/2,$$
which implies |g(y)| ≥ |g(x)|/2, and |g(y)| < 3|g(x)|/2.
Then if |x - y| < min(δ_0, δ_1),
$$\left| \frac{f(x)}{g(x)} - \frac{f(y)}{g(y)} \right| = \left| \frac{f(x)g(y) - f(y)g(x)}{g(x)g(y)} \right| \leq \frac{|f(x)g(y) - f(y)g(x)|}{\left(\frac{|g(x)|^2}{2}\right)}$$
$$= \frac{2}{|g(x)|^2}|f(x)g(y) - f(y)g(x)|$$
$$\leq \frac{2}{|g(x)|^2}\left[|f(x)g(y) - f(y)g(y) + f(y)g(y) - f(y)g(x)|\right]$$
$$\leq \frac{2}{|g(x)|^2}\left[|g(y)||f(x) - f(y)| + |f(y)||g(y) - g(x)|\right]$$
$$\leq \frac{2}{|g(x)|^2}\left[\frac{3}{2}|g(x)||f(x) - f(y)| + (1 + |f(x)|)|g(y) - g(x)|\right]$$
$$\leq \frac{2}{|g(x)|^2}(1 + 2|f(x)| + 2|g(x)|)\left[|f(x) - f(y)| + |g(y) - g(x)|\right]$$
$$\equiv M\left[|f(x) - f(y)| + |g(y) - g(x)|\right],$$
where M is defined by
$$M \equiv \frac{2}{|g(x)|^2}(1 + 2|f(x)| + 2|g(x)|).$$
Now let δ_2 be such that if |x - y| < δ_2, then
$$|f(x) - f(y)| < \frac{\varepsilon}{2}M^{-1},$$
and let δ_3 be such that if |x - y| < δ_3, then
$$|g(y) - g(x)| < \frac{\varepsilon}{2}M^{-1}.$$
Then if 0 < δ ≤ min(δ_0, δ_1, δ_2, δ_3) and |x - y| < δ, everything holds and
$$\left| \frac{f(x)}{g(x)} - \frac{f(y)}{g(y)} \right| \leq M\left[|f(x) - f(y)| + |g(y) - g(x)|\right] < M\left[\frac{\varepsilon}{2}M^{-1} + \frac{\varepsilon}{2}M^{-1}\right] = \varepsilon.$$

This completes the proof of the second part of 2.)


Note that in these proofs no eort is made to nd some sort of best . The
problem is one which has a yes or a no answer. Either it is or it is not continuous.
Now consider 3.). If f is continuous at x, f (x) D (g) , and g is continuous at
f (x) ,then g f is continuous at x. Let > 0 be given. Then there exists > 0 such
that if |y f (x)| < and y D (g) , it follows that |g (y) g (f (x))| < . From
continuity of f at x, there exists > 0 such that if |x z| < and z D (f ) , then
|f (z) f (x)| < . Then if |x z| < and z D (g f ) D (f ) , all the above hold
and so
|g (f (z)) g (f (x))| < .
This proves part 3.)
To verify part 4.), let > 0 be given and let = . Then if ||x y|| < , the
triangle inequality implies

|f (x) f (y)| = |||x|| ||y|||


||x y|| < = .

This proves part 4.)


Next consider 5.) Suppose rst f is continuous. Let U be open and let x
f 1 (U ) . This means f (x) U. Since U is open, there exists > 0 such that
B (f (x) , ) U. By continuity, there exists > 0 such that if y B (x, ) , then
f (y) B (f (x) , ) and so this shows B (x,) f 1 (U ) which implies f 1 (U ) is
open since x is an arbitrary point of f 1 (U ) . Next suppose the condition about
inverse images of open sets are open. Then apply this condition to the open set
B (f (x) , ) . The condition says f 1 (B (f (x) , )) is open, and since

x f 1 (B (f (x) , )) ,

it follows x is an interior point of f 1 (B (f (x) , )) so there exists > 0 such that


B (x, ) f 1 (B (f (x) , )) . This says f (B (x, )) B (f (x) , ) . In other words,
whenever ||y x|| < , ||f (y) f (x)|| < which is the condition for continuity at
the point x. Since x is arbitrary, 

4.1 Continuity And The Limit Of A Sequence


There is a very useful way of thinking of continuity in terms of limits of sequences
found in the following theorem. In words, it says a function is continuous if it takes
convergent sequences to convergent sequences whenever possible.

Theorem 4.1.1 A function f : D (f ) W is continuous at x D (f ) if and only


if, whenever xn x with xn D (f ) , it follows f (xn ) f (x) .

Proof: Suppose rst that f is continuous at x and let xn x. Let > 0 be given.
By continuity, there exists > 0 such that if ||y x|| < , then ||f (x) f (y)|| < .

However, there exists n such that if n n , then ||xn x|| < and so for all n
this large,
||f (x) f (xn )|| <
which shows f (xn ) f (x) .
Now suppose the condition about taking convergent sequences to convergent
sequences holds at x. Suppose f fails to be continuous at x. Then there exists > 0
and xn D (f ) such that ||x xn || < n1 , yet

||f (x) f (xn )|| .

But this is clearly a contradiction because, although xn x, f (xn ) fails to converge


to f (x) . It follows f must be continuous after all. 

Theorem 4.1.2 Suppose f : D (f ) R is continuous at x D (f ) and suppose

||f (xn )|| l ( l)

where {xn } is a sequence of points of D (f ) which converges to x. Then

||f (x)|| l ( l) .

Proof: Since ||f (xn )|| l and f is continuous at x, it follows from the triangle
inequality, Theorem 3.2.8 and Theorem 4.1.1,

||f (x)|| = lim ||f (xn )|| l.


n

The other case is entirely similar. 


Another very useful idea involves the automatic continuity of the inverse function
under certain conditions.

Theorem 4.1.3 Let K be a sequentially compact set and suppose f : K f (K) is


continuous and one to one. Then f 1 must also be continuous.

Proof: Suppose f (kn ) f (k) . Does it follow kn k? If this does not happen,
then there exists > 0 and a subsequence still denoted as {kn } such that

|kn k| (4.1.1)

Now since K is compact, there exists a further subsequence, still denoted as {kn }
such that
kn k K
However, the continuity of f requires

f (kn ) f (k )

and so f (k ) = f (k). Since f is one to one, this requires k = k, a contradiction to


4.1.1. 

4.2 The Extreme Values Theorem


The extreme values theorem says continuous functions achieve their maximum and
minimum provided they are dened on a sequentially compact set.
The next theorem is known as the max min theorem or extreme value theorem.

Theorem 4.2.1 Let K Fn be sequentially compact. Thus K is closed and


bounded, and let f : K R be continuous. Then f achieves its maximum and
its minimum on K. This means there exist, x1 , x2 K such that for all x K,

f (x1 ) f (x) f (x2 ) .

Proof: Let = sup {f (x) : x K} . Next let {k } be an increasing sequence


which converges to but each k < . Therefore, for each k, there exists xk K
such that
f (xk ) > k .
Since K is sequentially compact, there exists a subsequence, {xkl } such that

lim xkl = x K.
l

Then by continuity of f,

f (x) = lim f (xkl ) lim kl =


l l

which shows f achieves its maximum on K. To see it achieves its minimum, you
could repeat the argument with a minimizing sequence or else you could consider
f and apply what was just shown to f , f having its minimum when f has its
maximum. 

4.3 Connected Sets


Stated informally, connected sets are those which are in one piece. In order to dene
what is meant by this, I will rst consider what it means for a set to not be in one
piece.

Denition 4.3.1 Let A be a nonempty subset of V a normed vector space. Then


A is dened to be the intersection of all closed sets which contain A. Note the whole
space, V is one such closed set which contains A.

Lemma 4.3.2 Let A be a nonempty set in a normed vector space V. Then A is a


closed set and
A = A A
where A denotes the set of limit points of A.

Proof: First of all, denote by C the set of closed sets which contain A. Then

A = C

and this will be closed if its complement is open. However,


C { }
A = HC : H C .

Each H C is open and so the union of all these open sets must also be open. This
is because if x is in this union, then it is in at least one of them. Hence it is an
interior point of that one. But this implies it is an interior point of the union of
them all which is an even larger set. Thus A is closed.
The interesting part is the next claim. First note that from the denition, A A
so if x A, then x A. Now consider y A but y / A. If y
/ A, a closed set, then
C
there exists B (y, r) A . Thus y cannot be a limit point of A, a contradiction.
Therefore,
A A A
Next suppose x A and suppose x / A. Then if B (x, r) contains no points of
A dierent than x, since x itself is not in A, it would follow that B (x,r) A =
C
and so recalling that open balls are open, B (x, r) is a closed set containing A so
from the denition, it also contains A which is contrary to the assertion that x A.
/ A, then x A and so
Hence if x

A A A


Now that the closure of a set has been dened it is possible to dene what is
meant by a set being separated.

Denition 4.3.3 A set S in a normed vector space is separated if there exist sets
A, B such that

S = A B, A, B = , and A B = B A = .

In this case, the sets A and B are said to separate S. A set is connected if it is not
separated. Remember A denotes the closure of the set A.

Note that the concept of connected sets is dened in terms of what it is not. This
makes it somewhat dicult to understand. One of the most important theorems
about connected sets is the following.

Theorem 4.3.4 Suppose U and V are connected sets having nonempty intersec-
tion. Then U V is also connected.

Proof: Suppose U V = A B where A B = B A = . Consider the sets


A U and B U. Since
( )
(A U ) (B U ) = (A U ) B U = ,
68 CONTINUOUS FUNCTIONS

It follows one of these sets must be empty since otherwise, U would be separated.
It follows that U is contained in either A or B. Similarly, V must be contained in
either A or B. Since U and V have nonempty intersection, it follows that both V
and U are contained in one of the sets A, B. Therefore, the other must be empty
and this shows U V cannot be separated and is therefore, connected. 
The intersection of connected sets is not necessarily connected as is shown by
the following picture.

Theorem 4.3.5 Let f : X Y be continuous where Y is a normed vector space


and X is connected. Then f (X) is also connected.
Proof: To do this you show f (X) is not separated. Suppose to the contrary
that f (X) = A B where A and B separate f (X) . Then consider the sets f 1 (A)
and f 1 (B) . If z f 1 (B) , then f (z) B and so f (z) is not a limit point of
A. Therefore, there exists an open set U containing f (z) such that U A = .
But then, the continuity of f and Theorem 4.0.2 implies that f 1 (U ) is an open
set containing z such that f 1 (U ) f 1 (A) = . Therefore, f 1 (B) contains no
limit points of f 1 (A) . Similar reasoning implies f 1 (A) contains no limit points
of f 1 (B). It follows that X is separated by f 1 (A) and f 1 (B) , contradicting
the assumption that X was connected. 
An arbitrary set can be written as a union of maximal connected sets called
connected components. This is the concept of the next denition.
Denition 4.3.6 Let S be a set and let p S. Denote by Cp the union of all
connected subsets of S which contain p. This is called the connected component
determined by p.
Theorem 4.3.7 Let Cp be a connected component of a set S in a normed vector
space. Then Cp is a connected set and if Cp Cq = , then Cp = Cq .
Proof: Let C denote the connected subsets of S which contain p. If Cp = A B
where
A B = B A = ,
then p is in one of A or B. Suppose without loss of generality p A. Then every
set of C must also be contained in A since otherwise, as in Theorem 4.3.4, the set

would be separated. But this implies B is empty. Therefore, Cp is connected. From


this, and Theorem 4.3.4, the second assertion of the theorem is proved. 
This shows the connected components of a set are equivalence classes and par-
tition the set.
A set I is an interval in R if and only if whenever x, y I then (x, y) I. The
following theorem is about the connected sets in R.

Theorem 4.3.8 A set C in R is connected if and only if C is an interval.

Proof: Let C be connected. If C consists of a single point p, there is nothing
to prove. The interval is just [p, p]. Suppose p < q and p, q ∈ C. You need to show
(p, q) ⊆ C. If
x ∈ (p, q) \ C,
let C ∩ (−∞, x) ≡ A, and C ∩ (x, ∞) ≡ B. Then C = A ∪ B and the sets A and B
separate C contrary to the assumption that C is connected.
Conversely, let I be an interval. Suppose I is separated by A and B. Pick x ∈ A
and y ∈ B. Suppose without loss of generality that x < y. Now define the set

S ≡ {t ∈ [x, y] : [x, t] ⊆ A}

and let l be the least upper bound of S. Then l ∈ Ā so l ∉ B which implies l ∈ A.
But if l ∉ B̄, then for some δ > 0,

(l, l + δ) ∩ B = ∅,

contradicting the definition of l as an upper bound for S. Therefore, l ∈ B̄ which
implies l ∉ A after all, a contradiction. It follows I must be connected. ∎
This yields a generalization of the intermediate value theorem from one variable
calculus.

Corollary 4.3.9 Let E be a connected set in a normed vector space and suppose
f : E → R and that y ∈ (f (e₁), f (e₂)) where eᵢ ∈ E. Then there exists e ∈ E such
that f (e) = y.

Proof: From Theorem 4.3.5, f (E) is a connected subset of R. By Theorem
4.3.8 f (E) must be an interval. In particular, it must contain y. ∎
The following theorem is a very useful description of the open sets in R.

Theorem 4.3.10 Let U be an open set in R. Then there exist countably many
disjoint open sets {(aᵢ, bᵢ)}^∞_{i=1} such that U = ∪^∞_{i=1} (aᵢ, bᵢ).

Proof: Let p ∈ U and let z ∈ Cp , the connected component determined by p.
Since U is open, there exists δ > 0 such that (z − δ, z + δ) ⊆ U. It follows from
Theorem 4.3.4 that
(z − δ, z + δ) ⊆ Cp .

This shows Cp is open. By Theorem 4.3.8, this shows Cp is an open interval, (a, b)
where a, b ∈ [−∞, ∞]. There are therefore at most countably many of these con-
nected components because each must contain a rational number and the rational
numbers are countable. Denote by {(aᵢ, bᵢ)}^∞_{i=1} the set of these connected compo-
nents. ∎
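
When an open set U is presented as a union of finitely many open intervals, the
components (aᵢ, bᵢ) of the theorem can be produced by merging overlapping intervals.
The Python sketch below is an added illustration of that decomposition (the finite
input list and the helper name merge_open_intervals are assumptions for the example;
the theorem itself covers arbitrary open subsets of R).

```python
def merge_open_intervals(intervals):
    """Given open intervals (a, b) whose union is U, return the
    connected components of U as a list of disjoint open intervals."""
    # Sort by left endpoint, then sweep, merging intervals that overlap.
    components = []
    for a, b in sorted(intervals):
        if components and a < components[-1][1]:    # overlaps the current component
            components[-1][1] = max(components[-1][1], b)
        else:                                        # starts a new component
            components.append([a, b])
    return [(a, b) for a, b in components]

# Example: U = (0, 1) ∪ (0.5, 2) ∪ (3, 4) has components (0, 2) and (3, 4).
print(merge_open_intervals([(0, 1), (0.5, 2), (3, 4)]))   # [(0, 2), (3, 4)]
```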

Definition 4.3.11 A set E in a normed vector space is arcwise connected if for
any two points p, q ∈ E, there exists a closed interval [a, b] and a continuous
function γ : [a, b] → E such that γ (a) = p and γ (b) = q.

An example of an arcwise connected topological space would be any subset of
Rⁿ which is the continuous image of an interval. Arcwise connected is not the same
as connected. A well known example is the following.

{(x, sin (1/x)) : x ∈ (0, 1]} ∪ {(0, y) : y ∈ [−1, 1]}                      (4.3.2)

You can verify that this set of points in the normed vector space R² is not arcwise
connected but is connected.

Lemma 4.3.12 In a normed vector space, B (z, r) is arcwise connected.

Proof: This is easy from the convexity of the set. If x, y ∈ B (z, r), then let
γ (t) = x + t (y − x) for t ∈ [0, 1].

‖x + t (y − x) − z‖ = ‖(1 − t) (x − z) + t (y − z)‖
                    ≤ (1 − t) ‖x − z‖ + t ‖y − z‖
                    < (1 − t) r + tr = r

showing γ (t) stays in B (z, r). ∎

Proposition 4.3.13 If X is arcwise connected, then it is connected.

Proof: Let X be an arcwise connected set and suppose it is separated. Then
X = A ∪ B where A, B are two separated sets. Pick p ∈ A and q ∈ B. Since
X is given to be arcwise connected, there must exist a continuous function γ :
[a, b] → X such that γ (a) = p and γ (b) = q. But then γ ([a, b]) = (γ ([a, b]) ∩ A) ∪
(γ ([a, b]) ∩ B) and the two sets γ ([a, b]) ∩ A and γ ([a, b]) ∩ B are separated thus
showing that γ ([a, b]) is separated and contradicting Theorem 4.3.8 and Theorem
4.3.5. It follows that X must be connected as claimed. ∎

Theorem 4.3.14 Let U be an open subset of a normed vector space. Then U is
arcwise connected if and only if U is connected. Also the connected components of
an open set are open sets.

Proof: By Proposition 4.3.13 it is only necessary to verify that if U is connected
and open in the context of this theorem, then U is arcwise connected. Pick p ∈ U.
Say x ∈ U satisfies P if there exists a continuous function γ : [a, b] → U such that
γ (a) = p and γ (b) = x.

A ≡ {x ∈ U such that x satisfies P.}

If x ∈ A, then Lemma 4.3.12 implies B (x, r) ⊆ U is arcwise connected for
small enough r. Thus letting y ∈ B (x, r), there exist intervals [a, b] and [c, d] and
continuous functions having values in U, γ, η, such that γ (a) = p, γ (b) = x, η (c) =
x, and η (d) = y. Then let γ₁ : [a, b + d − c] → U be defined as

γ₁ (t) ≡ { γ (t) if t ∈ [a, b]
         { η (t + c − b) if t ∈ [b, b + d − c]

Then it is clear that γ₁ is a continuous function mapping p to y and showing that
B (x, r) ⊆ A. Therefore, A is open. A ≠ ∅ because since U is open there is an open
set B (p, δ) containing p which is contained in U and is arcwise connected.
Now consider B ≡ U \ A. I claim this is also open. If B is not open, there exists a
point z ∈ B such that every open set containing z is not contained in B. Therefore,
letting B (z, δ) be such that z ∈ B (z, δ) ⊆ U, there exist points of A contained in
B (z, δ). But then, a repeat of the above argument shows z ∈ A also. Hence B is
open and so if B ≠ ∅, then U = B ∪ A and so U is separated by the two sets B and
A contradicting the assumption that U is connected.
It remains to verify the connected components are open. Let z ∈ Cp where Cp is
the connected component determined by p. Then picking B (z, δ) ⊆ U, Cp ∪ B (z, δ)
is connected and contained in U and so it must also be contained in Cp . Thus z is
an interior point of Cp . ∎
As an application, consider the following corollary.
Corollary 4.3.15 Let f : Ω → Z be continuous where Ω is a connected open set
in a normed vector space. Then f must be a constant.

Proof: Suppose not. Then it achieves two different values, k and l ≠ k. Then
Ω = f⁻¹(l) ∪ f⁻¹({m ∈ Z : m ≠ l}) and these are disjoint nonempty open sets
which separate Ω. To see they are open, note

f⁻¹({m ∈ Z : m ≠ l}) = f⁻¹( ∪_{m≠l} (m − 1/6, m + 1/6) )

which is the inverse image of an open set while f⁻¹(l) = f⁻¹((l − 1/6, l + 1/6)), also
an open set. ∎

4.4 Uniform Continuity

The concept of uniform continuity is also similar to the one dimensional concept.

Definition 4.4.1 Let f be a function. Then f is uniformly continuous if for every
ε > 0, there exists a δ depending only on ε such that if ‖x − y‖ < δ then
‖f (x) − f (y)‖ < ε.

Theorem 4.4.2 Let f : K → F be continuous where K is a sequentially compact
set in Fⁿ or more generally a normed vector space. Then f is uniformly continuous
on K.

Proof: If this is not true, there exists ε > 0 such that for every δ > 0
there exists a pair of points, x_δ and y_δ, such that even though ‖x_δ − y_δ‖ < δ,
‖f (x_δ) − f (y_δ)‖ ≥ ε. Taking a succession of values for δ equal to 1, 1/2, 1/3, ···,
and letting the exceptional pair of points for δ = 1/n be denoted by xₙ and yₙ,

‖xₙ − yₙ‖ < 1/n,   ‖f (xₙ) − f (yₙ)‖ ≥ ε.

Now since K is sequentially compact, there exists a subsequence {x_{n_k}} such that
x_{n_k} → z ∈ K. Now n_k ≥ k and so

‖x_{n_k} − y_{n_k}‖ < 1/k.

Hence

‖y_{n_k} − z‖ ≤ ‖y_{n_k} − x_{n_k}‖ + ‖x_{n_k} − z‖ < 1/k + ‖x_{n_k} − z‖

Consequently, y_{n_k} → z also. By continuity of f and Theorem 4.1.2,

0 = ‖f (z) − f (z)‖ = lim_{k→∞} ‖f (x_{n_k}) − f (y_{n_k})‖ ≥ ε,

an obvious contradiction. Therefore, the theorem must be true. ∎

Recall the closed and bounded subsets of Fⁿ are those which are sequentially
compact.

4.5 Sequences And Series Of Functions

Now it is an easy matter to consider sequences of vector valued functions.

Definition 4.5.1 A sequence of functions is a map defined on N or some set of
integers larger than or equal to a given integer, m, which has values which are func-
tions. It is written in the form {fₙ}^∞_{n=m} where fₙ is a function. It is assumed also
that the domain of all these functions is the same.

Here the functions have values in some normed vector space.
The definition of uniform convergence is exactly the same as earlier only now it
is not possible to draw representative pictures so easily.

Definition 4.5.2 Let {fₙ} be a sequence of functions. Then the sequence converges
pointwise to a function f if for all x ∈ D, the domain of the functions in the
sequence,
f (x) = lim_{n→∞} fₙ (x)

Thus you consider for each x ∈ D the sequence of numbers {fₙ (x)} and if this
sequence converges for each x ∈ D, the thing it converges to is called f (x).

Definition 4.5.3 Let {fₙ} be a sequence of functions defined on D. Then {fₙ} is
said to converge uniformly to f if it converges pointwise to f and for every ε > 0
there exists N such that for all n ≥ N

‖f (x) − fₙ (x)‖ < ε

for all x ∈ D.

Theorem 4.5.4 Let {fₙ} be a sequence of continuous functions defined on D and
suppose this sequence converges uniformly to f. Then f is also continuous on D. If
each fₙ is uniformly continuous on D, then f is also uniformly continuous on D.

Proof: Let ε > 0 be given and pick z ∈ D. By uniform convergence, there exists
N such that if n > N, then for all x ∈ D,

‖f (x) − fₙ (x)‖ < ε/3.                                                    (4.5.3)

Pick such an n. By assumption, fₙ is continuous at z. Therefore, there exists δ > 0
such that if ‖z − x‖ < δ then

‖fₙ (x) − fₙ (z)‖ < ε/3.

It follows that for ‖x − z‖ < δ,

‖f (x) − f (z)‖ ≤ ‖f (x) − fₙ (x)‖ + ‖fₙ (x) − fₙ (z)‖ + ‖fₙ (z) − f (z)‖
              < ε/3 + ε/3 + ε/3 = ε

which shows that since ε was arbitrary, f is continuous at z.
In the case where each fₙ is uniformly continuous, and using the same fₙ for
which 4.5.3 holds, there exists a δ > 0 such that if ‖y − z‖ < δ, then

‖fₙ (z) − fₙ (y)‖ < ε/3.

Then for ‖y − z‖ < δ,

‖f (y) − f (z)‖ ≤ ‖f (y) − fₙ (y)‖ + ‖fₙ (y) − fₙ (z)‖ + ‖fₙ (z) − f (z)‖
              < ε/3 + ε/3 + ε/3 = ε

This shows uniform continuity of f. ∎
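
To see why uniformity matters in Theorem 4.5.4, consider fₙ (x) = xⁿ on [0, 1]. The
pointwise limit is 0 for x < 1 and 1 at x = 1, which is discontinuous, so by the theorem
the convergence cannot be uniform. The short Python check below (an added
illustration, not part of the text) makes this concrete: at the point xₙ = (1/2)^{1/n}
the error never drops below 1/2, no matter how large n is.

```python
def f_n(x, n):
    return x ** n

# The pointwise limit is discontinuous, so convergence is not uniform.
# At x_n = (1/2)**(1/n) the value of f_n is exactly 1/2 for every n,
# while the pointwise limit there is 0, so sup_x |f_n(x) - f(x)| >= 1/2.
for n in (1, 10, 100, 1000):
    x_n = 0.5 ** (1.0 / n)
    print(n, x_n, f_n(x_n, n))    # prints 0.5 in the last column for every n
```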

Definition 4.5.5 Let {fₙ} be a sequence of functions defined on D. Then the
sequence is said to be uniformly Cauchy if for every ε > 0 there exists N such that
whenever m, n ≥ N,
‖fₘ (x) − fₙ (x)‖ < ε
for all x ∈ D.

Then the following theorem follows easily.

Theorem 4.5.6 Let {fₙ} be a uniformly Cauchy sequence of functions defined on
D having values in a complete normed vector space such as Fⁿ for example. Then
there exists f defined on D such that {fₙ} converges uniformly to f.

Proof: For each x ∈ D, {fₙ (x)} is a Cauchy sequence. Therefore, it converges
to some vector f (x). Let ε > 0 be given and let N be such that if n, m ≥ N,

‖fₘ (x) − fₙ (x)‖ < ε/2

for all x ∈ D. Then for any x ∈ D, pick n ≥ N and it follows from Theorem 3.2.8

‖f (x) − fₙ (x)‖ = lim_{m→∞} ‖fₘ (x) − fₙ (x)‖ ≤ ε/2 < ε. ∎

Corollary 4.5.7 Let {fₙ} be a uniformly Cauchy sequence of functions continuous
on D having values in a complete normed vector space like Fⁿ. Then there exists f
defined on D such that {fₙ} converges uniformly to f and f is continuous. Also, if
each fₙ is uniformly continuous, then so is f.

Proof: This follows from Theorem 4.5.6 and Theorem 4.5.4. ∎


Here is one more fairly obvious theorem.

Theorem 4.5.8 Let {fₙ} be a sequence of functions defined on D having values in
a complete normed vector space like Fⁿ. Then it converges pointwise if and only if
the sequence {fₙ (x)} is a Cauchy sequence for every x ∈ D. It converges uniformly
if and only if {fₙ} is a uniformly Cauchy sequence.

Proof: If the sequence converges pointwise, then by Theorem 3.5.3 the se-
quence {fₙ (x)} is a Cauchy sequence for each x ∈ D. Conversely, if {fₙ (x)} is a
Cauchy sequence for each x ∈ D, then {fₙ (x)} converges for each x ∈ D because
of completeness.
Now suppose {fₙ} is uniformly Cauchy. Then from Theorem 4.5.6 there ex-
ists f such that {fₙ} converges uniformly on D to f. Conversely, if {fₙ} converges
uniformly to f on D, then if ε > 0 is given, there exists N such that if n ≥ N,

‖f (x) − fₙ (x)‖ < ε/2

for every x ∈ D. Then if m, n ≥ N and x ∈ D,

‖fₙ (x) − fₘ (x)‖ ≤ ‖fₙ (x) − f (x)‖ + ‖f (x) − fₘ (x)‖ < ε/2 + ε/2 = ε.

Thus {fₙ} is uniformly Cauchy. ∎


Once you understand sequences, it is no problem to consider series.

Definition 4.5.9 Let {fₙ} be a sequence of functions defined on D. Then

Σ^∞_{k=1} fₖ (x) ≡ lim_{n→∞} Σ^n_{k=1} fₖ (x)                              (4.5.4)

whenever the limit exists. Thus there is a new function denoted by

Σ^∞_{k=1} fₖ                                                               (4.5.5)

and its value at x is given by the limit of the sequence of partial sums in 4.5.4. If
for all x ∈ D, the limit in 4.5.4 exists, then 4.5.5 is said to converge pointwise.
Σ^∞_{k=1} fₖ is said to converge uniformly on D if the sequence of partial sums,

{ Σ^n_{k=1} fₖ }^∞_{n=1}

converges uniformly. If the indices for the functions start at some other value than
1, you make the obvious modification to the above definition.

Theorem 4.5.10 Let {fₙ} be a sequence of functions defined on D which have
values in a complete normed vector space like Fⁿ. The series Σ^∞_{k=1} fₖ converges
pointwise if and only if for each ε > 0 and x ∈ D, there exists N_{ε,x} which may
depend on x as well as ε such that when q > p ≥ N_{ε,x},

‖ Σ^q_{k=p} fₖ (x) ‖ < ε

The series Σ^∞_{k=1} fₖ converges uniformly on D if for every ε > 0 there exists N
such that if q > p ≥ N then

‖ Σ^q_{k=p} fₖ (x) ‖ < ε                                                   (4.5.6)

for all x ∈ D.

Proof: The first part follows from Theorem 4.5.8. The second part follows
from observing the condition is equivalent to the sequence of partial sums forming a
uniformly Cauchy sequence and then by Theorem 4.5.6, these partial sums converge
uniformly to a function which is the definition of Σ^∞_{k=1} fₖ. ∎
Is there an easy way to recognize when 4.5.6 happens? Yes, there is. It is called
the Weierstrass M test.

Theorem 4.5.11 Let {fₙ} be a sequence of functions defined on D having values
in a complete normed vector space like Fⁿ. Suppose there exists Mₙ such that
sup {‖fₙ (x)‖ : x ∈ D} < Mₙ and Σ^∞_{n=1} Mₙ converges. Then Σ^∞_{n=1} fₙ converges
uniformly on D.

Proof: Let z ∈ D. Then letting m < n and using the triangle inequality,

‖ Σ^n_{k=1} fₖ (z) − Σ^m_{k=1} fₖ (z) ‖ ≤ Σ^n_{k=m+1} ‖fₖ (z)‖ ≤ Σ^∞_{k=m+1} Mₖ < ε

whenever m is large enough because of the assumption that Σ^∞_{n=1} Mₙ converges.
Therefore, the sequence of partial sums is uniformly Cauchy on D and therefore,
converges uniformly to Σ^∞_{k=1} fₖ on D. ∎

Theorem 4.5.12 If {fₙ} is a sequence of continuous functions defined on D and
Σ^∞_{k=1} fₖ converges uniformly, then the function Σ^∞_{k=1} fₖ must also be continuous.

Proof: This follows from Theorem 4.5.4 applied to the sequence of partial sums
of the above series which is assumed to converge uniformly to the function Σ^∞_{k=1} fₖ. ∎
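
As a concrete illustration of the M test (added here; the particular series is an
assumption for the example), consider Σ^∞_{k=1} sin (kx) / k² on D = R. Since
|sin (kx) / k²| ≤ 1/k² and Σ 1/k² converges, the series converges uniformly and, by
Theorem 4.5.12, defines a continuous function. The Python sketch below checks the
uniform Cauchy estimate 4.5.6 numerically on a grid.

```python
import math

def partial_sum(x, n):
    """n-th partial sum of the series  sum_{k>=1} sin(k x) / k^2."""
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))

# M test: |sin(kx)/k^2| <= 1/k^2 = M_k and sum M_k < infinity, so the tail
# sup_x |S_q(x) - S_p(x)| is at most sum_{k=p+1}^q 1/k^2, uniformly in x.
grid = [i * 0.01 for i in range(-300, 301)]
for p, q in [(10, 20), (100, 200)]:
    tail_sup = max(abs(partial_sum(x, q) - partial_sum(x, p)) for x in grid)
    bound = sum(1.0 / k**2 for k in range(p + 1, q + 1))
    print(p, q, tail_sup, bound)    # observed tail stays below the M test bound
```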


4.6 Polynomials
General considerations about what a function is have already been considered ear-
lier. For functions of one variable, the special kind of functions known as a polyno-
mial has a corresponding version when one considers a function of many variables.
This is found in the next definition.

Definition 4.6.1 Let α be an n dimensional multi-index. This means

α = (α₁, ···, αₙ)

where each αᵢ is a positive integer or zero. Also, let

|α| ≡ Σ^n_{i=1} |αᵢ|

Then x^α means

x^α ≡ x₁^{α₁} x₂^{α₂} ··· xₙ^{αₙ}

where each xⱼ ∈ F. An n dimensional polynomial of degree m is a function of the
form
p (x) = Σ_{|α| ≤ m} d_α x^α

where the d_α are complex or real numbers. Rational functions are defined as the
quotient of two polynomials. Thus these functions are defined on Fⁿ.

For example, f (x) = x₁ x₂² + 7 x₃⁴ x₁ is a polynomial of degree 5 and

(x₁ x₂² + 7 x₃⁴ x₁ + x₂³) / (4 x₁³ x₂² + 7 x₃² x₁ − x₂³)

is a rational function.
Note that in the case of a rational function, the domain of the function might
not be all of Fⁿ. For example, if

f (x) = (x₁ x₂² + 7 x₃⁴ x₁ + x₂³) / (x₂² + 3 x₁² − 4),

the domain of f would be all complex numbers such that x₂² + 3 x₁² ≠ 4.
By Theorem 4.0.2 all polynomials are continuous. To see this, note that the
function,
πₖ (x) ≡ xₖ
is a continuous function because of the inequality

|πₖ (x) − πₖ (y)| = |xₖ − yₖ| ≤ |x − y| .

Polynomials are simple sums of scalars times products of these functions. Similarly,
by this theorem, rational functions, quotients of polynomials, are continuous at
points where the denominator is non zero. More generally, if V is a normed vector
space, consider a V valued function of the form

f (x) ≡ Σ_{|α| ≤ m} d_α x^α

where d_α ∈ V, sort of a V valued polynomial. Then such a function is continuous
by application of Theorem 4.0.2 and the above observation about the continuity of
the functions πₖ.
Thus there are lots of examples of continuous functions. However, it is even
better than the above discussion indicates. As in the case of a function of one
variable, an arbitrary continuous function can typically be approximated uniformly
by a polynomial. This is the n dimensional version of the Weierstrass approximation
theorem.

4.7 Sequences Of Polynomials, Weierstrass Approx-
imation

Just as an arbitrary continuous function defined on an interval can be approximated
uniformly by a polynomial, there exists a similar theorem which is just a general-
ization of the earlier one which will hold for continuous functions defined on a box
or more generally a closed and bounded set. The proof is based on the following
lemma. In what follows, C(m, k) denotes the binomial coefficient m! / (k! (m − k)!).

Lemma 4.7.1 The following estimate holds for x ∈ [0, 1] and m ≥ 2.

Σ^m_{k=0} (k − mx)² C(m, k) x^k (1 − x)^{m−k} ≤ (1/4) m

Proof: First of all, from the binomial theorem

(1 − x + eᵗ x)^m = Σ^m_{k=0} C(m, k) e^{kt} x^k (1 − x)^{m−k}

and so, taking the derivative on both sides with respect to t and then letting t = 0,

mx = Σ^m_{k=0} k C(m, k) x^k (1 − x)^{m−k}

Next take the second derivative of both sides with respect to t and then let t = 0.
Thus after doing the computations,

m²x² − x²m + mx = Σ^m_{k=0} k² C(m, k) x^k (1 − x)^{m−k}

It follows from the above and the binomial theorem,

Σ^m_{k=0} (k − mx)² C(m, k) x^k (1 − x)^{m−k}

  = Σ^m_{k=0} k² C(m, k) x^k (1 − x)^{m−k} + m²x² Σ^m_{k=0} C(m, k) x^k (1 − x)^{m−k}
    − 2mx Σ^m_{k=0} k C(m, k) x^k (1 − x)^{m−k}

  = m²x² − x²m + mx + m²x² − 2m²x² = −x²m + mx

and this achieves its maximum value when x = 1/2. Plugging this in, the above is
no larger than (1/4) m. ∎
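
The identity just derived is easy to check numerically. The Python sketch below
(an added illustration, not part of the original argument) evaluates the left side of
Lemma 4.7.1 and compares it with mx (1 − x) = −x²m + mx and with the bound m/4.

```python
from math import comb

def second_moment(m, x):
    """Left side of Lemma 4.7.1:  sum_k (k - m x)^2 C(m,k) x^k (1-x)^(m-k)."""
    return sum((k - m * x) ** 2 * comb(m, k) * x**k * (1 - x)**(m - k)
               for k in range(m + 1))

for m in (2, 10, 50):
    for x in (0.1, 0.5, 0.9):
        s = second_moment(m, x)
        # The sum equals m x (1 - x), which is at most m / 4.
        print(m, x, round(s, 10), round(m * x * (1 - x), 10), m / 4)
```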
Now let f be a continuous function defined on [0, 1]. Let pₙ be the polynomial
defined by

pₙ (x) ≡ Σ^n_{k=0} C(n, k) f (k/n) x^k (1 − x)^{n−k} .                      (4.7.7)

Now for f a continuous function defined on [0, 1]ⁿ and for x = (x₁, ···, xₙ), consider
the polynomial,

pₘ (x) ≡ Σ^m_{k₁=0} ··· Σ^m_{kₙ=0} C(m, k₁) C(m, k₂) ··· C(m, kₙ) x₁^{k₁} (1 − x₁)^{m−k₁} x₂^{k₂} (1 − x₂)^{m−k₂}

          ··· xₙ^{kₙ} (1 − xₙ)^{m−kₙ} f (k₁/m, ···, kₙ/m) .                 (4.7.8)

Also define, if I is a set in Rⁿ,

‖h‖_I ≡ sup {|h (x)| : x ∈ I} .

Thus pₘ converges uniformly to f on a set I if

lim_{m→∞} ‖pₘ − f‖_I = 0.

To simplify the notation, let k = (k₁, ···, kₙ) where each kᵢ ∈ [0, m],
k/m ≡ (k₁/m, ···, kₙ/m), and let

C(m, k) ≡ C(m, k₁) C(m, k₂) ··· C(m, kₙ) .

Also define

‖k‖∞ ≡ max {kᵢ , i = 1, 2, ···, n}

x^k (1 − x)^{m−k} ≡ x₁^{k₁} (1 − x₁)^{m−k₁} x₂^{k₂} (1 − x₂)^{m−k₂} ··· xₙ^{kₙ} (1 − xₙ)^{m−kₙ} .

Thus in terms of this notation,

pₘ (x) = Σ_{‖k‖∞ ≤ m} C(m, k) x^k (1 − x)^{m−k} f (k/m)

This is the n dimensional version of the Bernstein polynomials which is what results
in the case where n = 1.

Lemma 4.7.2 For x ∈ [0, 1]ⁿ, f a continuous F valued function defined on [0, 1]ⁿ,
and pₘ given in 4.7.8, pₘ converges uniformly to f on [0, 1]ⁿ as m → ∞.

Proof: The function f is uniformly continuous because it is continuous on a
sequentially compact set [0, 1]ⁿ. Therefore, there exists δ > 0 such that if |x − y| <
δ, then
|f (x) − f (y)| < ε/2.

Denote by G the set of k such that (kᵢ − mxᵢ)² < η² m² for each i, where
η ≡ δ/√n. Note this condition is equivalent to saying that for each i, |kᵢ/m − xᵢ| < η.
A short computation shows that by the binomial theorem,

Σ_{‖k‖∞ ≤ m} C(m, k) x^k (1 − x)^{m−k} = 1

and so for x ∈ [0, 1]ⁿ,

|pₘ (x) − f (x)| ≤ Σ_{‖k‖∞ ≤ m} C(m, k) x^k (1 − x)^{m−k} |f (k/m) − f (x)|

  ≤ Σ_{k ∈ G} C(m, k) x^k (1 − x)^{m−k} |f (k/m) − f (x)|

    + Σ_{k ∈ G^C} C(m, k) x^k (1 − x)^{m−k} |f (k/m) − f (x)|               (4.7.9)

Now for k ∈ G it follows that for each i

|kᵢ/m − xᵢ| < δ/√n                                                         (4.7.10)

and so |f (k/m) − f (x)| < ε/2 because the above implies |k/m − x| < δ. Therefore, the
first sum on the right in 4.7.9 is no larger than

Σ_{k ∈ G} C(m, k) x^k (1 − x)^{m−k} (ε/2) ≤ Σ_{‖k‖∞ ≤ m} C(m, k) x^k (1 − x)^{m−k} (ε/2) = ε/2.

Letting M ≥ max {|f (x)| : x ∈ [0, 1]ⁿ} it follows

|pₘ (x) − f (x)|

  ≤ ε/2 + 2M Σ_{k ∈ G^C} C(m, k) x^k (1 − x)^{m−k}

  ≤ ε/2 + 2M (1/(η²m²))ⁿ Σ_{k ∈ G^C} ∏^n_{j=1} (kⱼ − mxⱼ)² C(m, k) x^k (1 − x)^{m−k}

  ≤ ε/2 + 2M (1/(η²m²))ⁿ Σ_{‖k‖∞ ≤ m} ∏^n_{j=1} (kⱼ − mxⱼ)² C(m, k) x^k (1 − x)^{m−k}

because on G^C,

(kⱼ − mxⱼ)² / (η²m²) ≥ 1 for at least one j ∈ {1, ···, n}.

Now by Lemma 4.7.1,

|pₘ (x) − f (x)| ≤ ε/2 + 2M (1/(η²m²))ⁿ (m/4)ⁿ .

Therefore, since the right side does not depend on x, it follows that for all m
sufficiently large,
‖pₘ − f‖_{[0,1]ⁿ} ≤ 2ε

and since ε is arbitrary, this shows lim_{m→∞} ‖pₘ − f‖_{[0,1]ⁿ} = 0. ∎
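
The one dimensional Bernstein polynomial 4.7.7 is easy to compute directly. The
Python sketch below (an added illustration) approximates f (x) = |x − 1/2| on [0, 1]
and estimates ‖pₙ − f‖_{[0,1]} on a grid; the sup error shrinks as n grows, as Lemma
4.7.2 predicts.

```python
from math import comb

def bernstein(f, n, x):
    """Value at x of the Bernstein polynomial p_n of f on [0, 1], as in 4.7.7."""
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)
grid = [i / 200 for i in range(201)]
for n in (5, 20, 80):
    sup_err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    print(n, sup_err)    # the sup norm error decreases as n increases
```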

Theorem 4.7.3 Let f be a continuous function defined on

R ≡ ∏^n_{k=1} [aₖ, bₖ] .

Then there exists a sequence of polynomials {pₘ} converging uniformly to f on R.

Proof: Let gₖ : [0, 1] → [aₖ, bₖ] be linear, one to one, and onto and let

x = g (y) ≡ (g₁ (y₁), g₂ (y₂), ···, gₙ (yₙ)) .

Thus g : [0, 1]ⁿ → ∏^n_{k=1} [aₖ, bₖ] is one to one, onto, and each component function
is linear. Then f ∘ g is a continuous function defined on [0, 1]ⁿ. It follows from
Lemma 4.7.2 there exists a sequence of polynomials {pₘ (y)}, each defined on [0, 1]ⁿ,
which converges uniformly to f ∘ g on [0, 1]ⁿ. Therefore, {pₘ (g⁻¹ (x))} converges
uniformly to f (x) on R. But

y = (y₁, ···, yₙ) = (g₁⁻¹ (x₁), ···, gₙ⁻¹ (xₙ))

and each gₖ⁻¹ is linear. Therefore, {pₘ (g⁻¹ (x))} is a sequence of polynomials. ∎
There is a more general version of this theorem which is easy to get. It depends
on the Tietze extension theorem, a wonderful little result which is interesting for
its own sake.

4.7.1 The Tietze Extension Theorem

To generalize the Weierstrass approximation theorem I will give a special case of
the Tietze extension theorem, a very useful result in topology. When this is done,
it will be possible to prove the Weierstrass approximation theorem for functions
defined on a closed and bounded subset of Rⁿ rather than a box.

Lemma 4.7.4 Let S ⊆ Rⁿ be a nonempty subset. Define

dist (x, S) ≡ inf {|x − y| : y ∈ S} .

Then x → dist (x, S) is a continuous function satisfying the inequality,

|dist (x, S) − dist (y, S)| ≤ |x − y| .                                    (4.7.11)

Proof: The continuity of x → dist (x, S) is obvious if the inequality 4.7.11
is established. So let x, y ∈ Rⁿ. Without loss of generality, assume dist (x, S) ≥
dist (y, S) and pick z ∈ S such that |y − z| − ε < dist (y, S). Then

|dist (x, S) − dist (y, S)| = dist (x, S) − dist (y, S)
  ≤ |x − z| − (|y − z| − ε)
  ≤ |z − y| + |x − y| − |y − z| + ε = |x − y| + ε.

Since ε is arbitrary, this proves 4.7.11. ∎

Lemma 4.7.5 Let H, K be two nonempty disjoint closed subsets of Rⁿ. Then there
exists a continuous function g : Rⁿ → [−1, 1] such that g (H) = −1/3, g (K) =
1/3, g (Rⁿ) ⊆ [−1/3, 1/3] .

Proof: Let
f (x) ≡ dist (x, H) / (dist (x, H) + dist (x, K)) .

The denominator is never equal to zero because if dist (x, H) = 0, then x ∈ H
because H is closed. (To see this, pick hₖ ∈ B (x, 1/k) ∩ H. Then hₖ → x and since
H is closed, x ∈ H.) Similarly, if dist (x, K) = 0, then x ∈ K and so the denominator
is never zero as claimed. Hence f is continuous and from its definition, f = 0 on H
and f = 1 on K. Now let g (x) ≡ (2/3) (f (x) − 1/2). Then g has the desired properties. ∎
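
The function built in Lemma 4.7.5 is completely explicit, so it is easy to evaluate.
The Python sketch below (an added illustration; taking H and K to be finite point
sets in R² is an assumption made only for the example) computes g: it returns −1/3
on H, 1/3 on K, and values in [−1/3, 1/3] everywhere else.

```python
from math import dist as euclid    # Euclidean distance between two points

def dist_to_set(x, S):
    """dist(x, S) = inf{|x - y| : y in S}, for a finite set S."""
    return min(euclid(x, y) for y in S)

def g(x, H, K):
    """The separating function of Lemma 4.7.5: -1/3 on H, 1/3 on K."""
    dH, dK = dist_to_set(x, H), dist_to_set(x, K)
    f = dH / (dH + dK)              # 0 on H, 1 on K, continuous since H, K are disjoint
    return (2.0 / 3.0) * (f - 0.5)

H = [(0.0, 0.0), (1.0, 0.0)]
K = [(5.0, 5.0), (6.0, 4.0)]
print(g((0.0, 0.0), H, K))          # -1/3, the point lies in H
print(g((5.0, 5.0), H, K))          #  1/3, the point lies in K
print(g((2.0, 2.0), H, K))          #  a value strictly between -1/3 and 1/3
```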


Definition 4.7.6 For f a real or complex valued bounded continuous function de-
fined on M ⊆ Rⁿ,
‖f‖_M ≡ sup {|f (x)| : x ∈ M} .

Lemma 4.7.7 Suppose M is a closed set in Rⁿ and suppose f : M → [−1, 1] is
continuous at every point of M. Then there exists a function g which is
defined and continuous on all of Rⁿ such that ‖f − g‖_M < 2/3, g (Rⁿ) ⊆ [−1/3, 1/3] .

Proof: Let H = f⁻¹ ([−1, −1/3]), K = f⁻¹ ([1/3, 1]). Thus H and K are
disjoint closed subsets of M. Suppose first H, K are both nonempty. Then by
Lemma 4.7.5 there exists g such that g is a continuous function defined on all of Rⁿ
and g (H) = −1/3, g (K) = 1/3, and g (Rⁿ) ⊆ [−1/3, 1/3]. It follows ‖f − g‖_M <
2/3. If H = ∅, then f has all its values in [−1/3, 1] and so letting g ≡ 1/3, the
desired condition is obtained. If K = ∅, let g ≡ −1/3. ∎

Lemma 4.7.8 Suppose M is a closed set in Rⁿ and suppose f : M → [−1, 1] is
continuous at every point of M. Then there exists a function g which is defined and
continuous on all of Rⁿ such that g = f on M and g has its values in [−1, 1] .

Proof: Using Lemma 4.7.7, let g₁ be such that g₁ (Rⁿ) ⊆ [−1/3, 1/3] and

‖f − g₁‖_M < 2/3.

Suppose g₁, ···, gₘ have been chosen such that gⱼ (Rⁿ) ⊆ [−1/3, 1/3] and

‖ f − Σ^m_{i=1} (2/3)^{i−1} gᵢ ‖_M < (2/3)^m .                             (4.7.12)

This has been done for m = 1. Then

‖ (3/2)^m ( f − Σ^m_{i=1} (2/3)^{i−1} gᵢ ) ‖_M ≤ 1

and so (3/2)^m ( f − Σ^m_{i=1} (2/3)^{i−1} gᵢ ) can play the role of f in the first step of the
proof. Therefore, there exists g_{m+1} defined and continuous on all of Rⁿ such that
its values are in [−1/3, 1/3] and

‖ (3/2)^m ( f − Σ^m_{i=1} (2/3)^{i−1} gᵢ ) − g_{m+1} ‖_M < 2/3.

Hence

‖ f − ( Σ^m_{i=1} (2/3)^{i−1} gᵢ + (2/3)^m g_{m+1} ) ‖_M < (2/3)^{m+1} .

It follows there exists a sequence {gᵢ} such that each has its values in [−1/3, 1/3]
and for every m 4.7.12 holds. Then let

g (x) ≡ Σ^∞_{i=1} (2/3)^{i−1} gᵢ (x) .

It follows

|g (x)| ≤ Σ^∞_{i=1} (2/3)^{i−1} |gᵢ (x)| ≤ Σ^∞_{i=1} (2/3)^{i−1} (1/3) = 1

and

| (2/3)^{i−1} gᵢ (x) | ≤ (2/3)^{i−1} (1/3)

so the Weierstrass M test applies and shows convergence is uniform. Therefore g
must be continuous. The estimate 4.7.12 implies f = g on M. ∎
The following is the Tietze extension theorem.

Theorem 4.7.9 Let M be a closed nonempty subset of Rⁿ and let f : M → [a, b]
be continuous at every point of M. Then there exists a function g continuous on
all of Rⁿ which coincides with f on M such that g (Rⁿ) ⊆ [a, b] .

Proof: Let f₁ (x) = 1 + (2/(b − a)) (f (x) − b). Then f₁ satisfies the conditions of
Lemma 4.7.8 and so there exists g₁ : Rⁿ → [−1, 1] such that g₁ is continuous on Rⁿ
and equals f₁ on M. Let g (x) = (g₁ (x) − 1) ((b − a)/2) + b. This works. ∎
With the Tietze extension theorem, here is a better version of the Weierstrass
approximation theorem.

Theorem 4.7.10 Let K be a closed and bounded subset of Rⁿ and let f : K → R
be continuous. Then there exists a sequence of polynomials {pₘ} such that

lim_{m→∞} (sup {|f (x) − pₘ (x)| : x ∈ K}) = 0.

In other words, the sequence of polynomials converges uniformly to f on K.

Proof: By the Tietze extension theorem, there exists an extension of f to a
continuous function g defined on all Rⁿ such that g = f on K. Now since K is
bounded, there exist intervals [aₖ, bₖ] such that

K ⊆ ∏^n_{k=1} [aₖ, bₖ] ≡ R

Then by the Weierstrass approximation theorem, Theorem 4.7.3, there exists a se-
quence of polynomials {pₘ} converging uniformly to g on R. Therefore, this se-
quence of polynomials converges uniformly to g = f on K as well. ∎
By considering the real and imaginary parts of a function which has values in C
one can generalize the above theorem.

Corollary 4.7.11 Let K be a closed and bounded subset of Rⁿ and let f : K → F
be continuous. Then there exists a sequence of polynomials {pₘ} such that

lim_{m→∞} (sup {|f (x) − pₘ (x)| : x ∈ K}) = 0.

In other words, the sequence of polynomials converges uniformly to f on K.

4.8 The Operator Norm

It is important to be able to measure the size of a linear operator. The most
convenient way is described in the next definition.

Definition 4.8.1 Let V, W be two finite dimensional normed vector spaces having
norms ‖·‖_V and ‖·‖_W respectively. Let L ∈ L (V, W). Then the operator norm of
L, denoted by ‖L‖, is defined as

‖L‖ ≡ sup {‖Lx‖_W : ‖x‖_V ≤ 1} .

Then the following theorem discusses the main properties of this norm. In the
future, I will dispense with the subscript on the symbols for the norm because it is
clear from the context which norm is meant. Here is a useful lemma.

Lemma 4.8.2 Let V be a normed vector space having a basis {v₁, ···, vₙ}. Let

A ≡ { a ∈ Fⁿ : ‖ Σ^n_{k=1} aₖ vₖ ‖ ≤ 1 }

where a = (a₁, ···, aₙ). Then A is a closed and bounded subset of Fⁿ.

Proof: First suppose a ∉ A. Then

‖ Σ^n_{k=1} aₖ vₖ ‖ > 1.

Then for b = (b₁, ···, bₙ), and using the triangle inequality,

‖ Σ^n_{k=1} bₖ vₖ ‖ = ‖ Σ^n_{k=1} (aₖ − (aₖ − bₖ)) vₖ ‖

  ≥ ‖ Σ^n_{k=1} aₖ vₖ ‖ − Σ^n_{k=1} |aₖ − bₖ| ‖vₖ‖

and now it is apparent that if |a − b| is sufficiently small so that each |aₖ − bₖ| is
small enough, this expression is larger than 1. Thus there exists δ > 0 such that
B (a, δ) ⊆ A^C showing that A^C is open. Therefore, A is closed.
Next consider the claim that A is bounded. Suppose this is not so. Then there
exists a sequence {aₖ} of points of A,

aₖ = (a¹ₖ, ···, aⁿₖ),

such that limₖ→∞ |aₖ| = ∞. Then from the definition of A,

‖ Σ^n_{j=1} (a^j_k / |aₖ|) vⱼ ‖ ≤ 1 / |aₖ| .                               (4.8.13)

Let
bₖ = (a¹ₖ / |aₖ|, ···, aⁿₖ / |aₖ|)

Then |bₖ| = 1 so bₖ is contained in the closed and bounded set S (0, 1) which is
sequentially compact in Fⁿ. It follows there exists a subsequence, still denoted by
{bₖ}, such that it converges to b ∈ S (0, 1). Passing to the limit in 4.8.13 using the
following inequality,

‖ Σ^n_{j=1} (a^j_k / |aₖ|) vⱼ − Σ^n_{j=1} bⱼ vⱼ ‖ ≤ Σ^n_{j=1} | a^j_k / |aₖ| − bⱼ | ‖vⱼ‖

to see that the sum converges to Σ^n_{j=1} bⱼ vⱼ, it follows

Σ^n_{j=1} bⱼ vⱼ = 0

and this is a contradiction because {v₁, ···, vₙ} is a basis and not all the bⱼ can
equal zero. Therefore, A must be bounded after all. ∎

Theorem 4.8.3 The operator norm has the following properties.

1. ‖L‖ < ∞

2. For all x ∈ X, ‖Lx‖ ≤ ‖L‖ ‖x‖ and if L ∈ L (V, W) while M ∈ L (W, Z),
then ‖ML‖ ≤ ‖M‖ ‖L‖.

3. ‖·‖ is a norm. In particular,

(a) ‖L‖ ≥ 0 and ‖L‖ = 0 if and only if L = 0, the linear transformation
which sends every vector to 0.
(b) ‖aL‖ = |a| ‖L‖ whenever a ∈ F
(c) ‖L + M‖ ≤ ‖L‖ + ‖M‖

4. If L ∈ L (V, W) for V, W normed vector spaces, L is continuous, meaning that
L⁻¹ (U) is open whenever U is an open set in W.

Proof: First consider 1.). Let A be as in the above lemma. Then

‖L‖ ≡ sup { ‖ L ( Σ^n_{j=1} aⱼ vⱼ ) ‖ : a ∈ A }

     = sup { ‖ Σ^n_{j=1} aⱼ L (vⱼ) ‖ : a ∈ A } < ∞

because a → ‖ Σ^n_{j=1} aⱼ L (vⱼ) ‖ is a real valued continuous function defined on a
sequentially compact set and so it achieves its maximum.
Next consider 2.). If x = 0 there is nothing to show. Assume x ≠ 0. Then from
the definition of ‖L‖,
‖ L (x / ‖x‖) ‖ ≤ ‖L‖

and so, since L is linear, you can multiply on both sides by ‖x‖ and conclude
‖L (x)‖ ≤ ‖L‖ ‖x‖.
For the other claim,

‖ML‖ ≡ sup {‖ML (x)‖ : ‖x‖ ≤ 1}
      ≤ ‖M‖ sup {‖Lx‖ : ‖x‖ ≤ 1} ≡ ‖M‖ ‖L‖ .

Finally consider 3.). If ‖L‖ = 0 then from 2.), ‖Lx‖ ≤ 0 and so Lx = 0 for
every x which is the same as saying L = 0. If Lx = 0 for every x, then ‖L‖ = 0 by
definition. Let a ∈ F. Then from the properties of the norm in the vector space,

‖aL‖ ≡ sup {‖aLx‖ : ‖x‖ ≤ 1}
      = sup {|a| ‖Lx‖ : ‖x‖ ≤ 1}
      = |a| sup {‖Lx‖ : ‖x‖ ≤ 1} ≡ |a| ‖L‖

Finally consider the triangle inequality.

‖L + M‖ ≡ sup {‖Lx + Mx‖ : ‖x‖ ≤ 1}
         ≤ sup {‖Mx‖ + ‖Lx‖ : ‖x‖ ≤ 1}
         ≤ sup {‖Lx‖ : ‖x‖ ≤ 1} + sup {‖Mx‖ : ‖x‖ ≤ 1}

because ‖Lx‖ ≤ sup {‖Lx‖ : ‖x‖ ≤ 1} with a similar inequality holding for M.
Therefore, by definition,
‖L + M‖ ≤ ‖L‖ + ‖M‖ .

Finally consider 4.). Let L ∈ L (V, W) and let U be open in W and v ∈ L⁻¹ (U).
Thus since U is open, there exists δ > 0 such that

L (v) ∈ B (L (v), δ) ⊆ U.

Then if w ∈ V,

‖L (v) − L (w)‖ = ‖L (v − w)‖ ≤ ‖L‖ ‖v − w‖

and so if ‖v − w‖ is sufficiently small, ‖v − w‖ < δ / ‖L‖, then L (w) ∈ B (L (v), δ)
which shows B (v, δ / ‖L‖) ⊆ L⁻¹ (U) and since v ∈ L⁻¹ (U) was arbitrary, this
shows L⁻¹ (U) is open. ∎
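
For matrices acting between Euclidean spaces the operator norm is concrete: with
the 2-norms on the domain and range it equals the largest singular value of the
matrix. The NumPy sketch below is an added illustration (it assumes NumPy is
available and that random test matrices suffice for the check); it computes ‖L‖ this
way and verifies the inequalities ‖Lx‖ ≤ ‖L‖ ‖x‖ and ‖ML‖ ≤ ‖M‖ ‖L‖ from
Theorem 4.8.3.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 4))
M = rng.standard_normal((2, 3))

op = lambda A: np.linalg.norm(A, 2)   # operator norm for Euclidean norms = largest singular value

# Property 2: ||Lx|| <= ||L|| ||x|| for every x, and ||ML|| <= ||M|| ||L||.
for _ in range(5):
    x = rng.standard_normal(4)
    assert np.linalg.norm(L @ x) <= op(L) * np.linalg.norm(x) + 1e-12
assert op(M @ L) <= op(M) * op(L) + 1e-12

# The sup in Definition 4.8.1 is taken over the unit ball; sampling unit vectors
# gives values close to, and never above, ||L||.
samples = rng.standard_normal((10000, 4))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
print(op(L), max(np.linalg.norm(L @ x) for x in samples))
```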
The operator norm will be very important in the chapter on the derivative.
Part 1.) of Theorem 4.8.3 says that if L ∈ L (V, W) where V and W are two
normed vector spaces, then there exists K such that for all v ∈ V,

‖Lv‖_W ≤ K ‖v‖_V

An obvious case is to let L = id, the identity map on V, and let there be two
different norms on V, ‖·‖₁ and ‖·‖₂. Thus (V, ‖·‖₁) is a normed vector space and
so is (V, ‖·‖₂). Then Theorem 4.8.3 implies that

‖v‖₂ = ‖id (v)‖₂ ≤ K₂ ‖v‖₁                                                 (4.8.14)

while the same reasoning implies there exists K₁ such that

‖v‖₁ ≤ K₁ ‖v‖₂ .                                                           (4.8.15)

This leads to the following important theorem.

Theorem 4.8.4 Let V be a finite dimensional vector space and let ‖·‖₁ and ‖·‖₂ be
two norms for V. Then these norms are equivalent which means there exist constants
δ, Δ such that for all v ∈ V

δ ‖v‖₁ ≤ ‖v‖₂ ≤ Δ ‖v‖₁

A set K is sequentially compact if and only if it is closed and bounded. Also every
finite dimensional normed vector space is complete. Also any closed and bounded
subset of a finite dimensional normed vector space is sequentially compact.

Proof: From 4.8.14 and 4.8.15

‖v‖₁ ≤ K₁ ‖v‖₂ ≤ K₁ K₂ ‖v‖₁

and so
(1/K₁) ‖v‖₁ ≤ ‖v‖₂ ≤ K₂ ‖v‖₁ .

Next consider the claim that all closed and bounded sets in a normed vector
space are sequentially compact. Let L : Fⁿ → V be defined by

L (a) ≡ Σ^n_{k=1} aₖ vₖ

where {v₁, ···, vₙ} is a basis for V. Thus L ∈ L (Fⁿ, V) and so by Theorem 4.8.3
this is a continuous function. Hence if K is a closed and bounded subset of V it
follows

L⁻¹ (K) = Fⁿ \ L⁻¹ (K^C) = Fⁿ \ (an open set) = a closed set.

Also L⁻¹ (K) is bounded. To see this, note that L is one to one onto V and so
L⁻¹ ∈ L (V, Fⁿ). Therefore,

‖L⁻¹ (v)‖ ≤ ‖L⁻¹‖ ‖v‖ ≤ ‖L⁻¹‖ r

where K ⊆ B (0, r). Since K is bounded, such an r exists. Thus L⁻¹ (K) is a closed
and bounded subset of Fⁿ and is therefore sequentially compact. It follows that if
{vₖ}^∞_{k=1} ⊆ K, there is a subsequence {v_{k_l}}^∞_{l=1} such that {L⁻¹ v_{k_l}} converges to a
point a ∈ L⁻¹ (K). Hence by continuity of L,

v_{k_l} = L (L⁻¹ (v_{k_l})) → La ∈ K.

Conversely, suppose K is sequentially compact. I need to verify it is closed and
bounded. If it is not closed, then it is missing a limit point k₀. Since k₀ is a limit
point, there exists kₙ ∈ B (k₀, 1/n) ∩ K such that kₙ ≠ k₀. Therefore, {kₙ} has no limit
point in K because k₀ ∉ K. It follows K must be closed. If K is not bounded,
then you could pick kₘ ∈ K such that kₘ ∉ B (0, m) and it follows {kₘ} cannot
have a subsequence which converges because if k ∈ K, then for large enough m,
k ∈ B (0, m/2) and so if {k_{m_j}} is any subsequence, k_{m_j} ∉ B (0, m) for all but finitely
many j. In other words, for any k ∈ K, it is not the limit of any subsequence. Thus
K must also be bounded.
Finally consider the claim about completeness. Let {vₖ}^∞_{k=1} be a Cauchy se-
quence in V. Since L⁻¹, defined above, is in L (V, Fⁿ), it follows {L⁻¹ vₖ}^∞_{k=1} is a
Cauchy sequence in Fⁿ. This follows from the inequality,

‖L⁻¹ vₖ − L⁻¹ vₗ‖ ≤ ‖L⁻¹‖ ‖vₖ − vₗ‖ .

Therefore, there exists a ∈ Fⁿ such that L⁻¹ vₖ → a and since L is continuous,

vₖ = L (L⁻¹ (vₖ)) → L (a) .

Next suppose K is a closed and bounded subset of V and let {xₖ}^∞_{k=1} be a
sequence of vectors in K. Let {v₁, ···, vₙ} be a basis for V and let

xₖ = Σ^n_{j=1} x^j_k vⱼ

Define a norm for V according to

‖x‖² ≡ Σ^n_{j=1} |x^j|²,   x = Σ^n_{j=1} x^j vⱼ

It is clear most axioms of a norm hold. The triangle inequality also holds because
by the triangle inequality for Fⁿ,

‖x + y‖ ≡ ( Σ^n_{j=1} |x^j + y^j|² )^{1/2}

        ≤ ( Σ^n_{j=1} |x^j|² )^{1/2} + ( Σ^n_{j=1} |y^j|² )^{1/2} ≡ ‖x‖ + ‖y‖ .

By the first part of this theorem, this norm is equivalent to the norm on V. Thus
K is closed and bounded with respect to this new norm. It follows that for each
j, {x^j_k}^∞_{k=1} is a bounded sequence in F and so by the theorems about sequential
compactness in F it follows upon taking subsequences n times, there exists a sub-
sequence x_{k_l} such that for each j,

lim_{l→∞} x^j_{k_l} = x^j

for some x^j. Hence

lim_{l→∞} x_{k_l} = lim_{l→∞} Σ^n_{j=1} x^j_{k_l} vⱼ = Σ^n_{j=1} x^j vⱼ ∈ K

because K is closed. ∎

Example 4.8.5 Let V be a vector space and let {v₁, ···, vₙ} be a basis. Define a
norm on V as follows. For v = Σ^n_{k=1} aₖ vₖ,

‖v‖ ≡ max {|aₖ| : k = 1, ···, n}

In the above example, this is a norm on the vector space V. It is clear ‖av‖ =
|a| ‖v‖ and that ‖v‖ ≥ 0 and equals 0 if and only if v = 0. The hard part is the
triangle inequality. Let v = Σ^n_{k=1} aₖ vₖ and w = Σ^n_{k=1} bₖ vₖ.

‖v + w‖ = maxₖ {|aₖ + bₖ|} ≤ maxₖ {|aₖ| + |bₖ|}
        ≤ maxₖ |aₖ| + maxₖ |bₖ| = ‖v‖ + ‖w‖ .

This shows this is indeed a norm.

4.9 Ascoli Arzela Theorem

Let {fₖ}^∞_{k=1} be a sequence of functions defined on a compact set which have values
in a finite dimensional normed vector space V. The following definition will be of
the norm of such a function.

Definition 4.9.1 Let f : K → V be a continuous function which has values in a
finite dimensional normed vector space V where here K is a compact set contained
in some normed vector space. Define

‖f‖ ≡ sup {‖f (x)‖_V : x ∈ K} .

Denote the set of such functions by C (K; V).

Proposition 4.9.2 The above definition yields a norm and in fact C (K; V) is a
complete normed linear space.

Proof: This is obviously a vector space. Just verify the axioms. The main
thing to show is that the above is a norm. First note that ‖f‖ = 0 if and only if
f = 0 and ‖αf‖ = |α| ‖f‖ whenever α ∈ F, the field of scalars, C or R. As to the
triangle inequality,

‖f + g‖ ≡ sup {‖(f + g) (x)‖ : x ∈ K}
        ≤ sup {‖f (x)‖_V : x ∈ K} + sup {‖g (x)‖_V : x ∈ K}
        = ‖f‖ + ‖g‖

Furthermore, the function x → ‖f (x)‖_V is continuous thanks to the triangle in-
equality which implies

| ‖f (x)‖_V − ‖f (y)‖_V | ≤ ‖f (x) − f (y)‖_V .

Therefore, ‖f‖ is a well defined nonnegative real number.
It remains to verify completeness. Suppose then {fₖ} is a Cauchy sequence with
respect to this norm. Then from the definition it is a uniformly Cauchy sequence
and since by Theorem 4.8.4 V is a complete normed vector space, it follows from
Theorem 4.5.6, there exists f ∈ C (K; V) such that {fₖ} converges uniformly to f.
That is,
lim_{k→∞} ‖f − fₖ‖ = 0.

This proves the proposition. ∎
Theorem 4.8.4 says that closed and bounded sets in a finite dimensional normed
vector space V are sequentially compact. This theorem typically doesn't apply to
C (K; V) because generally this is not a finite dimensional vector space although
as shown above it is a complete normed vector space. It turns out you need more
than closed and bounded in order to have a subset of C (K; V) be sequentially
compact.

Definition 4.9.3 Let F ⊆ C (K; V). Then F is equicontinuous if for every ε > 0
there exists δ > 0 such that whenever |x − y| < δ, it follows

|f (x) − f (y)| < ε

for all f ∈ F. F is bounded if there exists C such that

‖f‖ ≤ C

for all f ∈ F.

Lemma 4.9.4 Let K be a sequentially compact nonempty subset of a finite dimen-
sional normed vector space. Then there exists a countable set D ≡ {kᵢ}^∞_{i=1} such
that for every ε > 0 and for every x ∈ K,

B (x, ε) ∩ D ≠ ∅.

Proof: Let n ∈ N. Pick k₁ⁿ ∈ K. If B (k₁ⁿ, 1/n) ⊇ K, stop. Otherwise pick

k₂ⁿ ∈ K \ B (k₁ⁿ, 1/n)

Continue this way till the process ends. It must end because if it didn't, there would
exist a convergent subsequence which would imply two of the kⱼⁿ would have to be
closer than 1/n which is impossible from the construction. Denote this collection
of points by Dₙ. Then D ≡ ∪^∞_{n=1} Dₙ. This must work because if ε > 0 is given
and x ∈ K, let 1/n < ε/3 and the construction implies x ∈ B (kᵢⁿ, 1/n) for some
kᵢⁿ ∈ Dₙ ⊆ D. Then
kᵢⁿ ∈ B (x, ε) .

D is countable because it is the countable union of finite sets. ∎

Definition 4.9.5 More generally, if K is any subset of a normed vector space and
there exists D such that D is countable and for all x ∈ K and ε > 0,

B (x, ε) ∩ D ≠ ∅

then K is called separable.

Now here is another remarkable result about equicontinuous functions.

Lemma 4.9.6 Suppose {fₖ}^∞_{k=1} is equicontinuous and the functions are defined on
a sequentially compact set K. Suppose also for each x ∈ K,

lim_{k→∞} fₖ (x) = f (x) .

Then in fact f is continuous and the convergence is uniform. That is

lim_{k→∞} ‖fₖ − f‖ = 0.

Proof: Uniform convergence would say that for every ε > 0, there exists n_ε
such that if k, l ≥ n_ε, then for all x ∈ K,

‖fₖ (x) − fₗ (x)‖ < ε.

Thus if the given sequence does not converge uniformly, there exists ε > 0 such that
for all n, there exist k, l ≥ n and xₙ ∈ K such that

‖fₖ (xₙ) − fₗ (xₙ)‖ ≥ ε

Since K is sequentially compact, there exists a subsequence, still denoted by {xₙ},
such that limₙ→∞ xₙ = x ∈ K. Then letting k, l be associated with n as just
described,

ε ≤ ‖fₖ (xₙ) − fₗ (xₙ)‖_V ≤ ‖fₖ (xₙ) − fₖ (x)‖_V
    + ‖fₖ (x) − fₗ (x)‖_V + ‖fₗ (x) − fₗ (xₙ)‖_V

By equicontinuity, if n is large enough, this implies

ε < ε/3 + ‖fₖ (x) − fₗ (x)‖_V + ε/3

and now taking n still larger if necessary, the middle term on the right in the above
is also less than ε/3 which yields a contradiction. Hence convergence is uniform and
so it follows from Theorem 4.5.6 the function f is actually continuous and

lim_{k→∞} ‖f − fₖ‖ = 0. ∎


The Ascoli Arzela theorem is the following.

Theorem 4.9.7 Let K be a closed and bounded subset of a finite dimensional
normed vector space and let F ⊆ C (K; V) where V is a finite dimensional normed
vector space. Suppose also that F is bounded and equicontinuous. Then if {fₖ}^∞_{k=1} ⊆
F, there exists a subsequence {f_{k_l}}^∞_{l=1} which converges to a function f ∈ C (K; V)
in the sense that
lim_{l→∞} ‖f − f_{k_l}‖ = 0.

Proof: Denote by {f_{(k,n)}}^∞_{n=1} a subsequence of {f_{(k−1,n)}}^∞_{n=1} where the index
denoted by (k − 1, k − 1) is always less than the index denoted by (k, k). Also let
the countable dense subset of Lemma 4.9.4 be D = {dₖ}^∞_{k=1}. Then consider the
following diagram.

f_{(1,1)}, f_{(1,2)}, f_{(1,3)}, f_{(1,4)}, ···     d₁
f_{(2,1)}, f_{(2,2)}, f_{(2,3)}, f_{(2,4)}, ···     d₁, d₂
f_{(3,1)}, f_{(3,2)}, f_{(3,3)}, f_{(3,4)}, ···     d₁, d₂, d₃
f_{(4,1)}, f_{(4,2)}, f_{(4,3)}, f_{(4,4)}, ···     d₁, d₂, d₃, d₄
  ⋮

The meaning is as follows. {f_{(1,k)}}^∞_{k=1} is a subsequence of the original sequence
which converges at d₁. Such a subsequence exists because {fₖ (d₁)}^∞_{k=1} is contained
in a bounded set so a subsequence converges by Theorem 4.8.4. (It is given to be in
a bounded set and so the closure of this bounded set is both closed and bounded,
hence sequentially compact.) Now {f_{(2,k)}}^∞_{k=1} is a subsequence of the first subsequence
which converges at d₂. Then by Theorem 3.2.6 this new subsequence continues to
converge at d₁. Thus, as indicated by the diagram, it converges at both d₁ and
d₂. Continuing this way explains the meaning of the diagram. Now consider the
subsequence of the original sequence {f_{(k,k)}}^∞_{k=1}. For k ≥ n, this subsequence is a
subsequence of the subsequence {f_{(n,k)}}^∞_{k=1} and so it converges at d₁, d₂, ···, dₙ.
This being true for all n, it follows {f_{(k,k)}}^∞_{k=1} converges at every point of D. To
save on notation, I shall simply denote this as {fₖ}.
Then letting d ∈ D,

‖fₖ (x) − fₗ (x)‖_V ≤ ‖fₖ (x) − fₖ (d)‖_V
    + ‖fₖ (d) − fₗ (d)‖_V + ‖fₗ (d) − fₗ (x)‖_V

Picking d close enough to x and applying equicontinuity,

‖fₖ (x) − fₗ (x)‖_V < 2ε/3 + ‖fₖ (d) − fₗ (d)‖_V

Thus for k, l large enough, the right side is less than ε. This shows that for each x ∈
K, {fₖ (x)}^∞_{k=1} is a Cauchy sequence and so by completeness of V this converges. Let
f (x) be the thing to which it converges. Then f is continuous and the convergence
is uniform by Lemma 4.9.6. ∎

4.10 Exercises

1. In Theorem 4.7.3 it is assumed f has values in F. Show there is no change if
f has values in V, a normed vector space, provided you redefine the definition
of a polynomial to be something of the form Σ_{|α| ≤ m} a_α x^α where a_α ∈ V.

2. How would you generalize the conclusion of Corollary 4.7.11 to include the
situation where f has values in a finite dimensional normed vector space?

3. If {fₙ} and {gₙ} are sequences of Fⁿ valued functions defined on D which con-
verge uniformly, show that if a, b are constants, then afₙ + bgₙ also converges
uniformly. If there exists a constant M such that |fₙ (x)|, |gₙ (x)| < M for all
n and for all x ∈ D, show {fₙ gₙ} converges uniformly. Let fₙ (x) ≡ 1/ |x|
for x ∈ B (0, 1) and let gₙ (x) ≡ (n − 1) /n. Show {fₙ} converges uniformly on
B (0, 1) and {gₙ} converges uniformly but {fₙ gₙ} fails to converge uniformly.

4. Formulate a theorem for series of functions of n variables which will allow you
to conclude the infinite series is uniformly continuous based on reasonable
assumptions about the functions in the sum.

5. If f and g are real valued functions which are continuous on some set D, show
that
min (f, g), max (f, g)
are also continuous. Generalize this to any finite collection of continuous func-
tions. Hint: Note max (f, g) = (|f − g| + f + g) / 2. Now recall the triangle inequality
which can be used to show |·| is a continuous function.

6. Find an example of a sequence of continuous functions defined on Rⁿ such
that each function is nonnegative and each function has a maximum value
equal to 1 but the sequence of functions converges to 0 pointwise on Rⁿ \ {0},
that is, the set of vectors in Rⁿ excluding 0.
7. Theorem 4.3.14 says an open subset U of Rⁿ is arcwise connected if and only
if U is connected. Consider the usual Cartesian coordinates relative to axes
x₁, ···, xₙ. A square curve is one consisting of a succession of straight line
segments each of which is parallel to some coordinate axis. Show an open
subset U of Rⁿ is connected if and only if every two points can be joined by
a square curve.
8. Let x → h (x) be a bounded continuous function. Show the function f (x) =
Σ^∞_{n=1} h (nx) / n² is continuous.
9. Let S be any countable subset of Rⁿ. Show there exists a function f
defined on Rⁿ which is discontinuous at every point of S but continuous
everywhere else. Hint: This is real easy if you do the right thing. It involves
the Weierstrass M test.
10. By Theorem 4.7.10 there exists a sequence of polynomials converging uni-
formly to f (x) = |x| on R ≡ ∏^n_{k=1} [−M, M]. Show there exists a sequence of
polynomials {pₙ} converging uniformly to f on R which has the additional
property that for all n, pₙ (0) = 0.
11. If f is any continuous function defined on K, a sequentially compact subset
of Rⁿ, show there exists a series of the form Σ^∞_{k=1} pₖ, where each pₖ is a
polynomial, which converges uniformly to f on K. Hint: You should use
the Weierstrass approximation theorem to obtain a sequence of polynomials.
Then arrange it so the limit of this sequence is an infinite sum.
12. A function f is Holder continuous if there exists a constant K such that

|f (x) − f (y)| ≤ K |x − y|^α

for some α ≤ 1, for all x, y. Show every Holder continuous function is uniformly
continuous.
13. Consider f (x) ≡ dist (x, S) where S is a nonempty subset of Rⁿ. Show f is
uniformly continuous.
14. Let K be a sequentially compact set in a normed vector space V and let
f : V → W be continuous where W is also a normed vector space. Show f (K)
is also sequentially compact.
15. If f is uniformly continuous, does it follow that |f | is also uniformly continu-
ous? If |f | is uniformly continuous does it follow that f is uniformly continu-
ous? Answer the same questions with uniformly continuous replaced with
continuous. Explain why.

16. Let f : D → R be a function. This function is said to be lower semicontinuous¹
at x ∈ D if for any sequence {xₙ} ⊆ D which converges to x it follows

f (x) ≤ lim inf_{n→∞} f (xₙ) .

Suppose D is sequentially compact and f is lower semicontinuous at every
point of D. Show that then f achieves its minimum on D.

17. Let f : D → R be a function. This function is said to be upper semicontinuous
at x ∈ D if for any sequence {xₙ} ⊆ D which converges to x it follows

f (x) ≥ lim sup_{n→∞} f (xₙ) .

Suppose D is sequentially compact and f is upper semicontinuous at every
point of D. Show that then f achieves its maximum on D.

18. Show that a real valued function defined on D ⊆ Rⁿ is continuous if and only
if it is both upper and lower semicontinuous.

19. Show that a real valued lower semicontinuous function defined on a sequen-
tially compact set achieves its minimum and that an upper semicontinuous
function defined on a sequentially compact set achieves its maximum.

20. Give an example of a lower semicontinuous function defined on Rⁿ which is
not continuous and an example of an upper semicontinuous function which is
not continuous.

21. Suppose {f_α : α ∈ Λ} is a collection of continuous functions. Let

F (x) ≡ inf {f_α (x) : α ∈ Λ}

Show F is an upper semicontinuous function. Next let

G (x) ≡ sup {f_α (x) : α ∈ Λ}

Show G is a lower semicontinuous function.

22. Let f be a function. epi (f) is defined as

{(x, y) : y ≥ f (x)} .

It is called the epigraph of f. We say epi (f) is closed if whenever (xₙ, yₙ) ∈
epi (f) and xₙ → x and yₙ → y, it follows (x, y) ∈ epi (f). Show f is lower
semicontinuous if and only if epi (f) is closed. What would be the correspond-
ing result equivalent to upper semicontinuous?
¹ The notion of lower semicontinuity is very important for functions which are defined on infinite
dimensional sets. In more general settings, one formulates the concept differently.

23. The operator norm was defined for L (V, W) above. This is the usual norm
used for this vector space of linear transformations. Show that any other norm
used on L (V, W) is equivalent to the operator norm. That is, show that if
‖·‖₁ is another norm, there exist scalars δ, Δ such that

δ ‖L‖ ≤ ‖L‖₁ ≤ Δ ‖L‖

for all L ∈ L (V, W) where here ‖·‖ denotes the operator norm.

24. One alternative norm which is very popular is as follows. Let L ∈ L (V, W)
and let (lᵢⱼ) denote the matrix of L with respect to some bases. Then the
Frobenius norm is defined by

‖L‖_F ≡ ( Σᵢⱼ |lᵢⱼ|² )^{1/2} .

Show this is a norm. Other norms are of the form

( Σᵢⱼ |lᵢⱼ|^p )^{1/p}

where p ≥ 1, or even

‖L‖ = maxᵢⱼ |lᵢⱼ| .

Show these are also norms.

25. Explain why L (V, W) is always a complete normed vector space whenever
V, W are finite dimensional normed vector spaces for any choice of norm for
L (V, W). Also explain why every closed and bounded subset of L (V, W) is
sequentially compact for any choice of norm on this space.

26. Let L ∈ L (V, V) where V is a finite dimensional normed vector space. Define

e^L ≡ Σ^∞_{k=0} L^k / k!

Explain the meaning of this infinite sum and show it converges in L (V, V) for
any choice of norm on this space. Now tell how to define sin (L).

27. Let X be a finite dimensional normed vector space, real or complex. Show
that X is separable. Hint: Let {vᵢ}^n_{i=1} be a basis and define a map θ from Fⁿ to
X as follows. θ ( Σ^n_{k=1} xₖ eₖ ) ≡ Σ^n_{k=1} xₖ vₖ. Show θ is continuous and has
a continuous inverse. Now let D be a countable dense set in Fⁿ and consider
θ (D).

28. Let B (X; Rⁿ) be the space of functions f, mapping X to Rⁿ such that

sup {|f (x)| : x ∈ X} < ∞.

Show B (X; Rⁿ) is a complete normed linear space if we define

‖f‖ ≡ sup {|f (x)| : x ∈ X}.

29. Let γ ∈ (0, 1]. Define, for X a compact subset of Rᵖ,

C^γ (X; Rⁿ) ≡ {f ∈ C (X; Rⁿ) : ρ_γ (f) + ‖f‖ ≡ ‖f‖_γ < ∞}

where
‖f‖ ≡ sup {|f (x)| : x ∈ X}
and
ρ_γ (f) ≡ sup { |f (x) − f (y)| / |x − y|^γ : x, y ∈ X, x ≠ y }.

Show that (C^γ (X; Rⁿ), ‖·‖_γ) is a complete normed linear space. This is
called a Holder space. What would this space consist of if γ > 1?
30. Let {fₙ}^∞_{n=1} ⊆ C^γ (X; Rⁿ) where X is a compact subset of Rᵖ and suppose

‖fₙ‖_γ ≤ M

for all n. Show there exists a subsequence, nₖ, such that f_{nₖ} converges in
C (X; Rⁿ). The given sequence is precompact when this happens. (This also
shows the embedding of C^γ (X; Rⁿ) into C (X; Rⁿ) is a compact embedding.)
Hint: You might want to use the Ascoli Arzela theorem.
31. This problem is for those who know about the derivative and the integral of
a function of one variable. Let f : R × Rⁿ → Rⁿ be continuous and bounded
and let x₀ ∈ Rⁿ. If
x : [0, T] → Rⁿ
and h > 0, let

τ_h x (s) ≡ { x₀ if s ≤ h,
            { x (s − h), if s > h.

For t ∈ [0, T], let

x_h (t) = x₀ + ∫₀ᵗ f (s, τ_h x_h (s)) ds.

Show using the Ascoli Arzela theorem that there exists a sequence h → 0 such
that
x_h → x
in C ([0, T]; Rⁿ). Next argue

x (t) = x₀ + ∫₀ᵗ f (s, x (s)) ds

and conclude the following theorem. If f : R × Rⁿ → Rⁿ is continuous and
bounded, and if x₀ ∈ Rⁿ is given, there exists a solution to the following
initial value problem.

x′ = f (t, x), t ∈ [0, T]
x (0) = x₀.

This is the Peano existence theorem for ordinary differential equations.


32. Let D (x0 , r) be the closed ball in Rn ,

{x : |x x0 | r}

where this is the usual norm coming from the dot product. Let P : Rn
D (x0 , r) be dened by
{
x if x D (x0 , r)
P (x) xx0
x0 + r |xx0|
if x / D (x0 , r)

Show that |P x P y| |x y| for all x Rn .

33. Use Problem 31 to obtain local solutions to the initial value problem where
f is not assumed to be bounded. It is only assumed to be continuous. This
means there is a small interval whose length is perhaps not T such that the
solution to the differential equation exists on this small interval.
The Mathematical Theory Of
Determinants, Basic Linear
Algebra

5.1 The Function sgnₙ

It is easiest to give a different definition of the determinant which is clearly well
defined and then prove the earlier one in terms of Laplace expansion. Let (i₁, ···, iₙ)
be an ordered list of numbers from {1, ···, n}. This means the order is important
so (1, 2, 3) and (2, 1, 3) are different. There will be some repetition between this
section and the earlier section on determinants. The main purpose is to give all
the missing proofs. Two books which give a good introduction to determinants are
Apostol [2] and Rudin [39]. A recent book which also has a good introduction is
Baker [4].
The following Lemma will be essential in the definition of the determinant.

Lemma 5.1.1 There exists a unique function sgnₙ which maps each list of numbers
from {1, ···, n} to one of the three numbers, 0, 1, or −1 which also has the following
properties.

sgnₙ (1, ···, n) = 1                                                       (5.1.1)

sgnₙ (i₁, ···, p, ···, q, ···, iₙ) = − sgnₙ (i₁, ···, q, ···, p, ···, iₙ)    (5.1.2)

In words, the second property states that if two of the numbers are switched, the
value of the function is multiplied by −1. Also, in the case where n > 1 and
{i₁, ···, iₙ} = {1, ···, n} so that every number from {1, ···, n} appears in the or-
dered list (i₁, ···, iₙ),

sgnₙ (i₁, ···, i_{θ−1}, n, i_{θ+1}, ···, iₙ) ≡

(−1)^{n−θ} sgnₙ₋₁ (i₁, ···, i_{θ−1}, i_{θ+1}, ···, iₙ)                      (5.1.3)

where n = i_θ in the ordered list (i₁, ···, iₙ).


Proof: To begin with, it is necessary to show the existence of such a function.
This is clearly true if n = 1. Define sgn₁ (1) ≡ 1 and observe that it works.
No switching is possible. In the case where n = 2, it is also clearly true. Let
sgn₂ (1, 2) = 1 and sgn₂ (2, 1) = −1 while sgn₂ (2, 2) = sgn₂ (1, 1) = 0 and verify it
works. Assuming such a function exists for n, sgnₙ₊₁ will be defined in terms of
sgnₙ. If there are any repeated numbers in (i₁, ···, iₙ₊₁), sgnₙ₊₁ (i₁, ···, iₙ₊₁) ≡ 0.
If there are no repeats, then n + 1 appears somewhere in the ordered list. Let θ
be the position of the number n + 1 in the list. Thus, the list is of the form
(i₁, ···, i_{θ−1}, n + 1, i_{θ+1}, ···, iₙ₊₁). From 5.1.3 it must be that

sgnₙ₊₁ (i₁, ···, i_{θ−1}, n + 1, i_{θ+1}, ···, iₙ₊₁) ≡

(−1)^{n+1−θ} sgnₙ (i₁, ···, i_{θ−1}, i_{θ+1}, ···, iₙ₊₁) .

It is necessary to verify this satisfies 5.1.1 and 5.1.2 with n replaced with n + 1. The
first of these is obviously true because

sgnₙ₊₁ (1, ···, n, n + 1) ≡ (−1)^{n+1−(n+1)} sgnₙ (1, ···, n) = 1.

If there are repeated numbers in (i₁, ···, iₙ₊₁), then it is obvious 5.1.2 holds because
both sides would equal zero from the above definition. It remains to verify 5.1.2 in
the case where there are no numbers repeated in (i₁, ···, iₙ₊₁). Consider

sgnₙ₊₁ (i₁, ···, p, ···, q, ···, iₙ₊₁),   (p in position r, q in position s)

where p is in the r-th position and q is in the s-th position. Suppose first that
r < θ < s. Then

sgnₙ₊₁ (i₁, ···, p, ···, n + 1, ···, q, ···, iₙ₊₁) ≡

(−1)^{n+1−θ} sgnₙ (i₁, ···, p, ···, q, ···, iₙ₊₁)   (p in position r, q in position s − 1)

while

sgnₙ₊₁ (i₁, ···, q, ···, n + 1, ···, p, ···, iₙ₊₁) =

(−1)^{n+1−θ} sgnₙ (i₁, ···, q, ···, p, ···, iₙ₊₁)   (q in position r, p in position s − 1)

and so, by induction, a switch of p and q introduces a minus sign in the result.
Similarly, if θ > s or if θ < r it also follows that 5.1.2 holds. The interesting case
is when θ = r or θ = s. Consider the case where θ = r and note the other case is
entirely similar.

sgnₙ₊₁ (i₁, ···, n + 1, ···, q, ···, iₙ₊₁) =

(−1)^{n+1−r} sgnₙ (i₁, ···, q, ···, iₙ₊₁)   (q in position s − 1)            (5.1.4)

while

sgnₙ₊₁ (i₁, ···, q, ···, n + 1, ···, iₙ₊₁) =

(−1)^{n+1−s} sgnₙ (i₁, ···, q, ···, iₙ₊₁) .   (q in position r)              (5.1.5)

By making s − 1 − r switches, move the q which is in the (s − 1)-th position in 5.1.4 to
the r-th position as in 5.1.5. By induction, each of these switches introduces a factor
of −1 and so

sgnₙ (i₁, ···, q, ···, iₙ₊₁)   (q in position s − 1)
= (−1)^{s−1−r} sgnₙ (i₁, ···, q, ···, iₙ₊₁) .   (q in position r)

Therefore,

sgnₙ₊₁ (i₁, ···, n + 1, ···, q, ···, iₙ₊₁) = (−1)^{n+1−r} sgnₙ (i₁, ···, q, ···, iₙ₊₁)   (q in position s − 1)
= (−1)^{n+1−r} (−1)^{s−1−r} sgnₙ (i₁, ···, q, ···, iₙ₊₁)   (q in position r)
= (−1)^{n+s} sgnₙ (i₁, ···, q, ···, iₙ₊₁)
= (−1)^{2s−1} (−1)^{n+1−s} sgnₙ (i₁, ···, q, ···, iₙ₊₁)
= − sgnₙ₊₁ (i₁, ···, q, ···, n + 1, ···, iₙ₊₁) .

This proves the existence of the desired function.
To see this function is unique, note that you can obtain any ordered list of
distinct numbers from a sequence of switches. If there exist two functions f and
g both satisfying 5.1.1 and 5.1.2, you could start with f (1, ···, n) = g (1, ···, n)
and applying the same sequence of switches, eventually arrive at f (i₁, ···, iₙ) =
g (i₁, ···, iₙ). If any numbers are repeated, then 5.1.2 gives both functions are equal
to zero for that ordered list. ∎
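
A direct way to compute sgnₙ is to count inversions: for a list with no repeats, the
function of Lemma 5.1.1 equals (−1) raised to the number of out-of-order pairs, and
it is 0 when a number repeats. The Python sketch below is an added illustration of
this (the inversion-count description, not the recursion 5.1.3, is what it implements;
by uniqueness the two agree).

```python
def sgn(lst):
    """sgn_n of an ordered list: 0 if any entry repeats, otherwise
    (-1)**(number of inversions), which satisfies 5.1.1 and 5.1.2."""
    if len(set(lst)) != len(lst):
        return 0
    inversions = sum(1 for i in range(len(lst))
                       for j in range(i + 1, len(lst))
                       if lst[i] > lst[j])
    return (-1) ** inversions

print(sgn([1, 2, 3, 4]))    # 1, property 5.1.1
print(sgn([2, 1, 3, 4]))    # -1, one switch changes the sign, property 5.1.2
print(sgn([2, 2, 3, 4]))    # 0, repeated entries
```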

5.2 The Determinant

5.2.1 The Definition

In what follows sgn will often be used rather than sgnₙ because the context supplies
the appropriate n.

Definition 5.2.1 Let f be a real valued function which has the set of ordered lists
of numbers from {1, ···, n} as its domain. Define

Σ_{(k₁, ···, kₙ)} f (k₁ ··· kₙ)

to be the sum of all the f (k₁ ··· kₙ) for all possible choices of ordered lists (k₁, ···, kₙ)
of numbers of {1, ···, n}. For example,

Σ_{(k₁, k₂)} f (k₁, k₂) = f (1, 2) + f (2, 1) + f (1, 1) + f (2, 2) .

Definition 5.2.2 Let (aᵢⱼ) = A denote an n × n matrix. The determinant of A,
denoted by det (A), is defined by

det (A) ≡ Σ_{(k₁, ···, kₙ)} sgn (k₁, ···, kₙ) a_{1k₁} ··· a_{nkₙ}

where the sum is taken over all ordered lists of numbers from {1, ···, n}. Note it
suffices to take the sum over only those ordered lists in which there are no repeats
because if there are, sgn (k₁, ···, kₙ) = 0 and so that term contributes 0 to the sum.

Let A be an n × n matrix, A = (aᵢⱼ), and let (r₁, ···, rₙ) denote an ordered list
of n numbers from {1, ···, n}. Let A (r₁, ···, rₙ) denote the matrix whose k-th row
is the rₖ-th row of the matrix A. Thus

det (A (r₁, ···, rₙ)) = Σ_{(k₁, ···, kₙ)} sgn (k₁, ···, kₙ) a_{r₁k₁} ··· a_{rₙkₙ}   (5.2.6)

and
A (1, ···, n) = A.
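
The defining formula is directly computable for small n, although it involves n!
terms. The Python sketch below (an added illustration) evaluates det (A) by summing
sgn (k₁, ···, kₙ) a_{1k₁} ··· a_{nkₙ} over all lists with no repeats; the printed value
agrees with the cofactor expansion of the chosen 3 × 3 matrix.

```python
from itertools import permutations
from math import prod

def sgn(lst):
    """Sign of a permutation via inversion counting (lists with repeats never occur below)."""
    return (-1) ** sum(1 for i in range(len(lst))
                         for j in range(i + 1, len(lst)) if lst[i] > lst[j])

def det(A):
    """det(A) as in Definition 5.2.2: a sum over all ordered lists with no repeats."""
    n = len(A)
    return sum(sgn(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det(A))    # -3, matching the cofactor expansion of this 3 x 3 matrix
```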

5.2.2 Permuting Rows Or Columns


Proposition 5.2.3 Let
(r_1, …, r_n)
be an ordered list of numbers from {1, …, n}. Then

sgn(r_1, …, r_n) det(A)
= \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) a_{r_1 k_1} ⋯ a_{r_n k_n}   (5.2.7)
= det(A(r_1, …, r_n)).   (5.2.8)

Proof: Let (1, …, n) = (1, …, r, …, s, …, n) so r < s.

det(A(1, …, r, …, s, …, n)) =   (5.2.9)

\sum_{(k_1, …, k_n)} sgn(k_1, …, k_r, …, k_s, …, k_n) a_{1k_1} ⋯ a_{rk_r} ⋯ a_{sk_s} ⋯ a_{nk_n},

and renaming the variables, calling k_s, k_r and k_r, k_s, this equals

= \sum_{(k_1, …, k_n)} sgn(k_1, …, k_s, …, k_r, …, k_n) a_{1k_1} ⋯ a_{rk_s} ⋯ a_{sk_r} ⋯ a_{nk_n}

= \sum_{(k_1, …, k_n)} (−1) sgn(k_1, …, k_r, …, k_s, …, k_n) a_{1k_1} ⋯ a_{sk_r} ⋯ a_{rk_s} ⋯ a_{nk_n}
(these got switched)

= −det(A(1, …, s, …, r, …, n)).   (5.2.10)

Consequently,
det(A(1, …, s, …, r, …, n)) = −det(A(1, …, r, …, s, …, n)) = −det(A).

Now letting A(1, …, s, …, r, …, n) play the role of A, and continuing in this way,
switching pairs of numbers,
det(A(r_1, …, r_n)) = (−1)^p det(A)
where it took p switches to obtain (r_1, …, r_n) from (1, …, n). By Lemma 5.1.1,
this implies
det(A(r_1, …, r_n)) = (−1)^p det(A) = sgn(r_1, …, r_n) det(A)
and proves the proposition in the case when there are no repeated numbers in the
ordered list (r_1, …, r_n). However, if there is a repeat, say the r-th row equals the
s-th row, then the reasoning of 5.2.9–5.2.10 shows that det(A(r_1, …, r_n)) = 0 and also
sgn(r_1, …, r_n) = 0 so the formula holds in this case also. ∎
Observation 5.2.4 There are n! ordered lists of distinct numbers from {1, …, n}.
To see this, consider n slots placed in order. There are n choices for the first
slot. For each of these choices, there are n − 1 choices for the second. Thus there
are n(n − 1) ways to fill the first two slots. Then for each of these ways there are
n − 2 choices left for the third slot. Continuing this way, there are n! ordered lists
of distinct numbers from {1, …, n} as stated in the observation.

5.2.3 A Symmetric Denition


With the above, it is possible to give a more symmetric description of the determi-
nant from which it will follow that det(A) = det(A^T).

Corollary 5.2.5 The following formula for det(A) is valid.

det(A) = (1/n!) \sum_{(r_1, …, r_n)} \sum_{(k_1, …, k_n)} sgn(r_1, …, r_n) sgn(k_1, …, k_n) a_{r_1 k_1} ⋯ a_{r_n k_n}.   (5.2.11)

And also det(A^T) = det(A) where A^T is the transpose of A. (Recall that for
A^T = (a^T_{ij}), a^T_{ij} = a_{ji}.)

Proof: From Proposition 5.2.3, if the r_i are distinct,

det(A) = \sum_{(k_1, …, k_n)} sgn(r_1, …, r_n) sgn(k_1, …, k_n) a_{r_1 k_1} ⋯ a_{r_n k_n}.

Summing over all ordered lists (r_1, …, r_n) where the r_i are distinct (if the r_i are
not distinct, sgn(r_1, …, r_n) = 0 and so there is no contribution to the sum),

n! det(A) = \sum_{(r_1, …, r_n)} \sum_{(k_1, …, k_n)} sgn(r_1, …, r_n) sgn(k_1, …, k_n) a_{r_1 k_1} ⋯ a_{r_n k_n}.

This proves the corollary since the formula gives the same number for A as it does
for A^T. ∎

5.2.4 The Alternating Property Of The Determinant


Corollary 5.2.6 If two rows or two columns in an n × n matrix A are switched,
the determinant of the resulting matrix equals (−1) times the determinant of the
original matrix. If A is an n × n matrix in which two rows are equal or two columns
are equal then det(A) = 0. Suppose the i-th row of A equals (xa_1 + yb_1, …, xa_n + yb_n).
Then
det(A) = x det(A_1) + y det(A_2)
where the i-th row of A_1 is (a_1, …, a_n) and the i-th row of A_2 is (b_1, …, b_n), all
other rows of A_1 and A_2 coinciding with those of A. In other words, det is a linear
function of each row of A. The same is true with the word "row" replaced with the
word "column".

Proof: By Proposition 5.2.3 when two rows are switched, the determinant of the
resulting matrix is (−1) times the determinant of the original matrix. By Corollary
5.2.5 the same holds for columns because the columns of the matrix equal the rows
of the transposed matrix. Thus if A_1 is the matrix obtained from A by switching
two columns,
det(A) = det(A^T) = −det(A_1^T) = −det(A_1).
If A has two equal columns or two equal rows, then switching them results in the
same matrix. Therefore, det(A) = −det(A) and so det(A) = 0.
It remains to verify the last assertion.

det(A) ≡ \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) a_{1k_1} ⋯ (x a_{k_i} + y b_{k_i}) ⋯ a_{nk_n}

= x \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) a_{1k_1} ⋯ a_{k_i} ⋯ a_{nk_n}
+ y \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) a_{1k_1} ⋯ b_{k_i} ⋯ a_{nk_n}

≡ x det(A_1) + y det(A_2).

The same is true of columns because det(A^T) = det(A) and the rows of A^T are
the columns of A. ∎

5.2.5 Linear Combinations And Determinants


Definition 5.2.7 A vector w is a linear combination of the vectors {v_1, …, v_r} if
there exist scalars c_1, …, c_r such that w = \sum_{k=1}^r c_k v_k. This is the same as saying
w ∈ span{v_1, …, v_r}.

The following corollary is also of great use.

Corollary 5.2.8 Suppose A is an n × n matrix and some column (row) is a linear
combination of r other columns (rows). Then det(A) = 0.

Proof: Let A = (a_1 ⋯ a_n) be the columns of A and suppose the condition
that one column is a linear combination of r of the others is satisfied. Then by using
Corollary 5.2.6 you may rearrange the columns to have the n-th column a linear
combination of the first r columns. Thus a_n = \sum_{k=1}^r c_k a_k and so

det(A) = det(a_1 ⋯ a_r ⋯ a_{n−1} \sum_{k=1}^r c_k a_k).

By Corollary 5.2.6

det(A) = \sum_{k=1}^r c_k det(a_1 ⋯ a_r ⋯ a_{n−1} a_k) = 0.

The case for rows follows from the fact that det(A) = det(A^T). ∎

5.2.6 The Determinant Of A Product


Recall the following denition of matrix multiplication.

Definition 5.2.9 If A and B are n × n matrices, A = (a_{ij}) and B = (b_{ij}), then AB =
(c_{ij}) where

c_{ij} ≡ \sum_{k=1}^n a_{ik} b_{kj}.

One of the most important rules about determinants is that the determinant of
a product equals the product of the determinants.

Theorem 5.2.10 Let A and B be n n matrices. Then

det (AB) = det (A) det (B) .


Proof: Let c_{ij} be the ij-th entry of AB. Then by Proposition 5.2.3,

det(AB) = \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) c_{1k_1} ⋯ c_{nk_n}

= \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) (\sum_{r_1} a_{1r_1} b_{r_1 k_1}) ⋯ (\sum_{r_n} a_{nr_n} b_{r_n k_n})

= \sum_{(r_1, …, r_n)} \sum_{(k_1, …, k_n)} sgn(k_1, …, k_n) b_{r_1 k_1} ⋯ b_{r_n k_n} (a_{1r_1} ⋯ a_{nr_n})

= \sum_{(r_1, …, r_n)} sgn(r_1, …, r_n) a_{1r_1} ⋯ a_{nr_n} det(B) = det(A) det(B). ∎
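A quick numerical illustration of the product rule, assuming numpy is available; this is only a sanity check, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
# det(AB) = det(A) det(B), up to rounding error
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
```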

5.2.7 Cofactor Expansions


Lemma 5.2.11 Suppose a matrix is of the form

M = \begin{pmatrix} A & ∗ \\ 0 & a \end{pmatrix}   (5.2.12)

or

M = \begin{pmatrix} A & 0 \\ ∗ & a \end{pmatrix}   (5.2.13)

where a is a number and A is an (n−1) × (n−1) matrix and ∗ denotes either a
column or a row having length n−1 and the 0 denotes either a column or a row of
length n−1 consisting entirely of zeros. Then

det(M) = a det(A).

Proof: Denote M by (m_{ij}). Thus in the first case, m_{nn} = a and m_{ni} = 0 if
i ≠ n while in the second case, m_{nn} = a and m_{in} = 0 if i ≠ n. From the definition
of the determinant,

det(M) ≡ \sum_{(k_1, …, k_n)} sgn_n(k_1, …, k_n) m_{1k_1} ⋯ m_{nk_n}.

Letting θ denote the position of n in the ordered list (k_1, …, k_n), then using the
earlier conventions used to prove Lemma 5.1.1, det(M) equals

\sum_{(k_1, …, k_n)} (−1)^{n−θ} sgn_{n−1}(k_1, …, k_{θ−1}, k_{θ+1}, …, k_n) m_{1k_1} ⋯ m_{nk_n}.

Now suppose 5.2.13. Then if k_n ≠ n, the term involving m_{nk_n} in the above expres-
sion equals zero. Therefore, the only terms which survive are those for which θ = n,
or in other words, those for which k_n = n. Therefore, the above expression reduces
to

a \sum_{(k_1, …, k_{n−1})} sgn_{n−1}(k_1, …, k_{n−1}) m_{1k_1} ⋯ m_{(n−1)k_{n−1}} = a det(A).

To get the assertion in the situation of 5.2.12 use Corollary 5.2.5 and 5.2.13 to write

det(M) = det(M^T) = det\begin{pmatrix} A^T & 0 \\ ∗ & a \end{pmatrix} = a det(A^T) = a det(A). ∎


In terms of the theory of determinants, arguably the most important idea is
that of Laplace expansion along a row or a column. This will follow from the above
denition of a determinant.

Definition 5.2.12 Let A = (a_{ij}) be an n × n matrix. Then a new matrix called
the cofactor matrix, cof(A), is defined by cof(A) = (c_{ij}) where to obtain c_{ij} delete
the i-th row and the j-th column of A, take the determinant of the (n−1) × (n−1)
matrix which results (this is called the ij-th minor of A), and then multiply this
number by (−1)^{i+j}. To make the formulas easier to remember, cof(A)_{ij} will denote
the ij-th entry of the cofactor matrix.

The following is the main result. Earlier this was given as a denition and the
outrageous totally unjustied assertion was made that the same number would be
obtained by expanding the determinant along any row or column. The following
theorem proves this assertion.

Theorem 5.2.13 Let A be an n × n matrix where n ≥ 2. Then

det(A) = \sum_{j=1}^n a_{ij} cof(A)_{ij} = \sum_{i=1}^n a_{ij} cof(A)_{ij}.

The first formula consists of expanding the determinant along the i-th row and the
second expands the determinant along the j-th column.

Proof: Let (a_{i1}, …, a_{in}) be the i-th row of A. Let B_j be the matrix obtained
from A by leaving every row the same except the i-th row which in B_j equals
(0, …, 0, a_{ij}, 0, …, 0). Then by Corollary 5.2.6,

det(A) = \sum_{j=1}^n det(B_j).

Denote by A^{ij} the (n−1) × (n−1) matrix obtained by deleting the i-th row and
the j-th column of A. Thus cof(A)_{ij} ≡ (−1)^{i+j} det(A^{ij}). At this point, recall
that from Proposition 5.2.3, when two rows or two columns in a matrix M are
switched, this results in multiplying the determinant of the old matrix by −1 to get
the determinant of the new matrix. Therefore, by Lemma 5.2.11,

det(B_j) = (−1)^{n−j}(−1)^{n−i} det\begin{pmatrix} A^{ij} & ∗ \\ 0 & a_{ij} \end{pmatrix}
= (−1)^{i+j} det\begin{pmatrix} A^{ij} & ∗ \\ 0 & a_{ij} \end{pmatrix} = a_{ij} cof(A)_{ij}.

Therefore,

det(A) = \sum_{j=1}^n a_{ij} cof(A)_{ij}

which is the formula for expanding det(A) along the i-th row. Also,

det(A) = det(A^T) = \sum_{j=1}^n a^T_{ij} cof(A^T)_{ij} = \sum_{j=1}^n a_{ji} cof(A)_{ji}

which is the formula for expanding det(A) along the i-th column. ∎
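The theorem translates directly into a recursive computation. The sketch below is illustrative only (numpy assumed, helper name my own); it expands along a chosen row, and since expanding A^T along a row is the same as expanding A along a column, both calls should agree.

```python
import numpy as np

def cofactor_det(A, row=0):
    # expand det(A) along the given row: det(A) = sum_j a_{ij} (-1)^{i+j} det(A^{ij})
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, row, axis=0), j, axis=1)
        total += A[row, j] * (-1) ** (row + j) * cofactor_det(minor)
    return total

A = np.array([[1., 2., 3.], [0., 4., 5.], [1., 0., 6.]])
print(cofactor_det(A, 0), cofactor_det(A.T, 2), np.linalg.det(A))  # all about 22
```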
Note that this gives an easy way to write a formula for the inverse of an n n
matrix.

5.2.8 Formula For The Inverse


Theorem 5.2.14 A^{−1} exists if and only if det(A) ≠ 0. If det(A) ≠ 0, then A^{−1} =
(a^{−1}_{ij}) where
a^{−1}_{ij} = det(A)^{−1} cof(A)_{ji}
for cof(A)_{ij} the ij-th cofactor of A.

Proof: By Theorem 5.2.13, and letting (a_{ir}) = A, if det(A) ≠ 0,

\sum_{i=1}^n a_{ir} cof(A)_{ir} det(A)^{−1} = det(A) det(A)^{−1} = 1.

Now consider

\sum_{i=1}^n a_{ir} cof(A)_{ik} det(A)^{−1}

when k ≠ r. Replace the k-th column with the r-th column to obtain a matrix B_k
whose determinant equals zero by Corollary 5.2.6. However, expanding this matrix
along the k-th column yields

0 = det(B_k) det(A)^{−1} = \sum_{i=1}^n a_{ir} cof(A)_{ik} det(A)^{−1}.

Summarizing,

\sum_{i=1}^n a_{ir} cof(A)_{ik} det(A)^{−1} = δ_{rk}.

Using the other formula in Theorem 5.2.13, and similar reasoning,

\sum_{j=1}^n a_{rj} cof(A)_{kj} det(A)^{−1} = δ_{rk}.

This proves that if det(A) ≠ 0, then A^{−1} exists with A^{−1} = (a^{−1}_{ij}), where

a^{−1}_{ij} = cof(A)_{ji} det(A)^{−1}.

Now suppose A^{−1} exists. Then by Theorem 5.2.10,

1 = det(I) = det(AA^{−1}) = det(A) det(A^{−1})

so det(A) ≠ 0. ∎
The next corollary points out that if an n × n matrix A has a right or a left
inverse, then it has an inverse.

Corollary 5.2.15 Let A be an n × n matrix and suppose there exists an n × n
matrix B such that BA = I. Then A^{−1} exists and A^{−1} = B. Also, if there exists
an n × n matrix C such that AC = I, then A^{−1} exists and A^{−1} = C.

Proof: Since BA = I, Theorem 5.2.10 implies

det B det A = 1

and so det A ≠ 0. Therefore from Theorem 5.2.14, A^{−1} exists. Therefore,

B = (BA) A^{−1} = B (A A^{−1}) = A^{−1}.

The case where AC = I is handled similarly. ∎

The conclusion of this corollary is that left inverses, right inverses and inverses
are all the same in the context of n × n matrices.
Theorem 5.2.14 says that to find the inverse, take the transpose of the cofactor
matrix and divide by the determinant. The transpose of the cofactor matrix is
called the adjugate or sometimes the classical adjoint of the matrix A. In words,
A^{−1} is equal to one over the determinant of A times the adjugate matrix of A.
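As an illustrative sketch (names and library use are my own, not the text's), the formula a^{−1}_{ij} = cof(A)_{ji}/det(A) can be coded directly; it is useful for small matrices and for checking hand computations.

```python
import numpy as np

def adjugate_inverse(A):
    # A^{-1} = adjugate(A) / det(A), where adjugate(A) = cof(A)^T
    n = A.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("det(A) = 0, so A has no inverse")
    return cof.T / d

A = np.array([[2., 1.], [5., 3.]])
print(adjugate_inverse(A))        # [[ 3. -1.] [-5.  2.]]
print(adjugate_inverse(A) @ A)    # the identity, up to rounding
```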

5.2.9 Cramer's Rule

In case you are solving a system of equations Ax = y for x, it follows that if A^{−1}
exists,
x = (A^{−1}A) x = A^{−1}(Ax) = A^{−1} y,
thus solving the system. Now in the case that A^{−1} exists, there is a formula for
A^{−1} given above. Using this formula,

x_i = \sum_{j=1}^n a^{−1}_{ij} y_j = \sum_{j=1}^n (1/det(A)) cof(A)_{ji} y_j.

By the formula for the expansion of a determinant along a column,

x_i = (1/det(A)) det\begin{pmatrix} ∗ & ⋯ & y_1 & ⋯ & ∗ \\ ⋮ & & ⋮ & & ⋮ \\ ∗ & ⋯ & y_n & ⋯ & ∗ \end{pmatrix},

where here the i-th column of A is replaced with the column vector (y_1, …, y_n)^T,
and the determinant of this modified matrix is taken and divided by det(A). This
formula is known as Cramer's rule.
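A direct transcription of Cramer's rule, assuming numpy; this is an illustration, and for large systems factorization methods are preferred in practice.

```python
import numpy as np

def cramer_solve(A, y):
    # x_i = det(A with its i-th column replaced by y) / det(A)
    d = np.linalg.det(A)
    x = np.empty(len(y))
    for i in range(len(y)):
        Ai = A.copy()
        Ai[:, i] = y
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2., 1.], [1., 3.]])
y = np.array([3., 5.])
print(cramer_solve(A, y), np.linalg.solve(A, y))  # both [0.8, 1.4]
```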

5.2.10 Upper Triangular Matrices


Definition 5.2.16 A matrix M is upper triangular if M_{ij} = 0 whenever i > j.
Thus such a matrix equals zero below the main diagonal, the entries of the form M_{ii},
as shown:

\begin{pmatrix} ∗ & ∗ & ⋯ & ∗ \\ 0 & ∗ & ⋱ & ⋮ \\ ⋮ & ⋱ & ⋱ & ∗ \\ 0 & ⋯ & 0 & ∗ \end{pmatrix}.

A lower triangular matrix is defined similarly as a matrix for which all entries above
the main diagonal are equal to zero.

With this denition, here is a simple corollary of Theorem 5.2.13.

Corollary 5.2.17 Let M be an upper (lower) triangular matrix. Then det (M ) is


obtained by taking the product of the entries on the main diagonal.
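A one-line numerical check of this corollary (numpy assumed; the example matrix is arbitrary):

```python
import numpy as np

M = np.triu(np.arange(1.0, 17.0).reshape(4, 4))   # an upper triangular matrix
# det(M) equals the product of the diagonal entries 1 * 6 * 11 * 16
print(np.isclose(np.linalg.det(M), np.prod(np.diag(M))))
```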

5.2.11 The Determinant Rank


Denition 5.2.18 A submatrix of a matrix A is the rectangular array of numbers
obtained by deleting some rows and columns of A. Let A be an m n matrix. The
determinant rank of the matrix equals r where r is the largest number such that
some r r submatrix of A has a non zero determinant.

Theorem 5.2.19 If A, an m n matrix has determinant rank r, then there exist


r rows (columns) of the matrix such that every other row (column) is a linear
combination of these r rows (columns).

Proof: Suppose the determinant rank of A = (aij ) equals r. Thus some r r


submatrix has non zero determinant and there is no larger square submatrix which
has non zero determinant. Suppose such a submatrix is determined by the r columns
whose indices are
j1 < < jr
and the r rows whose indices are

i1 < < ir

I want to show that every row is a linear combination of these rows. Consider the
l-th row and let p be an index between 1 and n. Form the following (r+1) × (r+1)
matrix

\begin{pmatrix} a_{i_1 j_1} & ⋯ & a_{i_1 j_r} & a_{i_1 p} \\ ⋮ & & ⋮ & ⋮ \\ a_{i_r j_1} & ⋯ & a_{i_r j_r} & a_{i_r p} \\ a_{l j_1} & ⋯ & a_{l j_r} & a_{l p} \end{pmatrix}

Of course you can assume l ∉ {i_1, …, i_r} because there is nothing to prove if the
l-th row is one of the chosen ones. The above matrix has determinant 0. This is
because if p ∉ {j_1, …, j_r} then the above would be a submatrix of A which is too
large to have nonzero determinant. On the other hand, if p ∈ {j_1, …, j_r} then the
above matrix has two columns which are equal so its determinant is still 0.
Expand the determinant of the above matrix along the last column. Let C_k
denote the cofactor associated with the entry a_{i_k p}. This is not dependent on the
choice of p. Remember, you delete the column and the row the entry is in and take
the determinant of what is left and multiply by −1 raised to an appropriate power.
Let C denote the cofactor associated with a_{lp}. This is given to be nonzero, it being
the determinant of the matrix

\begin{pmatrix} a_{i_1 j_1} & ⋯ & a_{i_1 j_r} \\ ⋮ & & ⋮ \\ a_{i_r j_1} & ⋯ & a_{i_r j_r} \end{pmatrix}

Thus

0 = a_{lp} C + \sum_{k=1}^r C_k a_{i_k p}

which implies

a_{lp} = −\sum_{k=1}^r (C_k / C) a_{i_k p} ≡ \sum_{k=1}^r m_k a_{i_k p}.

Since this is true for every p and since mk does not depend on p, this has shown the
lth row is a linear combination of the i1 , i2 , , ir rows. The determinant rank does
not change when you replace A with AT . Therefore, the same conclusion holds for
the columns. 

5.2.12 Determining Whether A Is One To One Or Onto


The following theorem is of fundamental importance and ties together many of the
ideas presented above.

Theorem 5.2.20 Let A be an n n matrix. Then the following are equivalent.

1. det (A) = 0.
2. A, AT are not one to one.
3. A is not onto.

Proof: Suppose det(A) = 0. Then the determinant rank of A equals r < n.
Therefore, there exist r columns such that every other column is a linear com-
bination of these columns by Theorem 5.2.19. In particular, it follows that for
some m, the m-th column is a linear combination of all the others. Thus letting
A = (a_1 ⋯ a_m ⋯ a_n) where the columns are denoted by a_i, there exist
scalars α_i such that
a_m = \sum_{k≠m} α_k a_k.
Now consider the column vector x ≡ (α_1 ⋯ −1 ⋯ α_n)^T, the −1 being in the m-th
position. Then
Ax = −a_m + \sum_{k≠m} α_k a_k = 0.
Since also A0 = 0, it follows A is not one to one. Similarly, A^T is not one to one
by the same argument applied to A^T. This verifies that 1.) implies 2.).
Now suppose 2.). Then since A^T is not one to one, it follows there exists x ≠ 0
such that
A^T x = 0.
Taking the transpose of both sides yields
x^T A = 0^T
where the 0^T is a 1 × n matrix or row vector. Now if Ay = x, then
|x|² = x^T (Ay) = (x^T A) y = 0^T y = 0
contrary to x ≠ 0. Consequently there can be no y such that Ay = x and so A is
not onto. This shows that 2.) implies 3.).
Finally, suppose 3.). If 1.) does not hold, then det(A) ≠ 0 but then from
Theorem 5.2.14 A^{−1} exists and so for every y ∈ F^n there exists a unique x ∈ F^n
such that Ax = y. In fact x = A^{−1} y. Thus A would be onto contrary to 3.). This
shows 3.) implies 1.). ∎

Corollary 5.2.21 Let A be an n n matrix. Then the following are equivalent.


1. det(A) ≠ 0.
2. A and AT are one to one.
3. A is onto.

Proof: This follows immediately from the above theorem. 

5.2.13 Schur's Theorem

Consider the following system of equations for x_1, x_2, …, x_n:

\sum_{j=1}^n a_{ij} x_j = 0,  i = 1, 2, …, m   (5.2.14)

where m < n. Then the following theorem is a fundamental observation.

Theorem 5.2.22 Let the system of equations be as just described in 5.2.14 where
m < n. Then letting
x^T ≡ (x_1, x_2, …, x_n) ∈ F^n,
there exists x ≠ 0 such that the components satisfy each of the equations of 5.2.14.
Here F is a field of scalars. Think R or C for example.

Proof: The above system is of the form

Ax = 0

where A is an m × n matrix with m < n. Therefore, if you form the n × n matrix

\begin{pmatrix} A \\ 0 \end{pmatrix},

having n − m rows of zeros on the bottom, it follows this matrix has
determinant equal to 0. Therefore, from Theorem 5.2.20, there exists x ≠ 0 such
that Ax = 0. ∎

Definition 5.2.23 A set of vectors {x_1, …, x_k} in F^n, F = R or C, is called an
orthonormal set of vectors if

x_i · x_j = (x_i, x_j) = δ_{ij} ≡ { 1 if i = j, 0 if i ≠ j }.

Theorem 5.2.24 Let v1 be a unit vector (|v1 | = 1) in Fn , n > 1. Then there exist
vectors {v2 , , vn } such that

{v1 , , vn }

is an orthonormal set of vectors.


Proof: The equation in x

v_1 · x = 0

has a nonzero solution x by Theorem 5.2.22. Pick such a solution and divide by
its magnitude to get v_2, a unit vector such that v_1 · v_2 = 0. Hence {v_1, v_2} is an
orthonormal set of vectors. Now suppose v_1, …, v_k have been chosen such that
{v_1, …, v_k} is an orthonormal set of vectors. Then consider the equations

v_j · x = 0,  j = 1, 2, …, k.

This amounts to the situation of Theorem 5.2.22 in which there are more variables
than equations. Therefore, by this theorem, there exists a nonzero x solving all
these equations. Divide by its magnitude and this gives v_{k+1}. ∎
The argument gives the following generalization.

Corollary 5.2.25 Let {v1 , , vk } be orthonormal in Fn where k < n. Then there


exist vectors {vk+1 , , vn } such that {v1 , , vn } is an orthonormal basis for Fn .
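The proof is constructive: each new basis vector is a nonzero solution of the equations v_j · x = 0. The following sketch mirrors that idea (numpy assumed, function name my own); the SVD is used here only as a convenient way to produce a nonzero solution of the homogeneous system at each step.

```python
import numpy as np

def extend_to_orthonormal_basis(V):
    # V: k x n matrix whose rows are orthonormal, k < n
    rows = [V[i] for i in range(V.shape[0])]
    n = V.shape[1]
    while len(rows) < n:
        _, _, Vt = np.linalg.svd(np.vstack(rows))
        x = Vt[-1]                      # a nonzero solution of (current rows) @ x = 0
        rows.append(x / np.linalg.norm(x))
    return np.vstack(rows)

v1 = np.array([3., 0., 4.]) / 5.0
B = extend_to_orthonormal_basis(v1[None, :])
print(np.allclose(B @ B.T, np.eye(3)))   # True: the rows form an orthonormal basis
```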

Denition 5.2.26 If U is an n n matrix whose columns form an orthonormal


set of vectors, then U is called an orthogonal matrix if it is real and a unitary
matrix if it is complex. Note that from the way we multiply matrices,

U T U = U U T = I.

Thus U^{−1} = U^T. If U is only unitary, then using the dot product in C^n, we replace
the above with
U^∗ U = U U^∗ = I,
where the ∗ indicates the conjugate transpose.

Note the product of orthogonal or unitary matrices is orthogonal or unitary
because

(U_1 U_2)^T (U_1 U_2) = U_2^T U_1^T U_1 U_2 = I,
(U_1 U_2)^∗ (U_1 U_2) = U_2^∗ U_1^∗ U_1 U_2 = I.

Two matrices A and B are similar if there is some invertible matrix S such
that A = S^{−1} B S. Note that similar matrices have the same characteristic equation
because by Theorem 5.2.10, which says the determinant of a product is the product
of the determinants,

det(λI − A) = det(λI − S^{−1}BS) = det(S^{−1}(λI − B)S)
= det(S^{−1}) det(λI − B) det(S) = det(S^{−1}S) det(λI − B) = det(λI − B).

With this preparation, here is a case of Schur's theorem.



Theorem 5.2.27 Let A be a real or complex n × n matrix. Then there exists a
unitary matrix U such that

U^∗ A U = T,   (5.2.15)

where T is an upper triangular matrix having the eigenvalues of A on the main
diagonal, listed according to multiplicity as zeros of the characteristic equation. If A
has all real entries and eigenvalues, then U can be chosen to be orthogonal.

Proof: The theorem is clearly true if A is a 1 × 1 matrix. Just let U = 1, the
1 × 1 matrix which has 1 down the main diagonal and zeros elsewhere. Suppose it
is true for (n−1) × (n−1) matrices and let A be an n × n matrix. Then let v_1 be
a unit eigenvector for A. There exists λ_1 such that

A v_1 = λ_1 v_1,  |v_1| = 1.

By Theorem 5.2.24 there exists {v_1, …, v_n}, an orthonormal set in F^n. Let U_0
be a matrix whose i-th column is v_i. Then from the above, it follows U_0 is unitary.
Then from the way you multiply matrices (remember that to get the ij-th entry of
a product, you take the dot product of the i-th row of the matrix on the left with
the j-th column of the matrix on the right), U_0^∗ A U_0 is of the form

\begin{pmatrix} λ_1 & ∗ \\ 0 & A_1 \end{pmatrix}

where A_1 is an (n−1) × (n−1) matrix. The above matrix is similar to A, so it has the
same eigenvalues and indeed the same characteristic equation. Now by induction,
there exists an (n−1) × (n−1) unitary matrix Ũ_1 such that

Ũ_1^∗ A_1 Ũ_1 = T_{n−1},

an upper triangular matrix. Consider

U_1 ≡ \begin{pmatrix} 1 & 0 \\ 0 & Ũ_1 \end{pmatrix}.

From the way we multiply matrices, this is a unitary matrix and

U_1^∗ U_0^∗ A U_0 U_1 = \begin{pmatrix} 1 & 0 \\ 0 & Ũ_1^∗ \end{pmatrix} \begin{pmatrix} λ_1 & ∗ \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Ũ_1 \end{pmatrix}
= \begin{pmatrix} λ_1 & ∗ \\ 0 & T_{n−1} \end{pmatrix} ≡ T

where T is upper triangular. Then let U = U_0 U_1. Then U^∗ A U = T. If A is real
having real eigenvalues, all of the above can be accomplished using the real dot
product and using real eigenvectors. Thus U can be orthogonal. ∎
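Schur's theorem is implemented in standard numerical libraries. A small check, assuming scipy and numpy are available (the random test matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T, U = schur(A, output="complex")           # A = U T U*, U unitary, T upper triangular
print(np.allclose(U.conj().T @ A @ U, T))   # True
print(np.allclose(np.tril(T, -1), 0))       # True: T is upper triangular
# the diagonal of T lists the eigenvalues of A, with multiplicity
```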

5.2.14 Symmetric Matrices


Recall a real matrix A is symmetric if A = AT .

Lemma 5.2.28 A real symmetric matrix has all real eigenvalues.

Proof: Recall the eigenvalues are solutions to

det(λI − A) = 0

and so by Theorem 5.2.20, there exists a vector x such that

Ax = λx,  x ≠ 0.

Of course if A is real, it is still possible that the eigenvalue could be complex and if
this is the case, then the vector x will also end up being complex. I wish to show
that the eigenvalues are all real. Suppose then that λ is an eigenvalue and let x be
the corresponding eigenvector described above. Then letting x̄ denote the complex
conjugate of x,

λ̄ x̄^T x = (A x̄)^T x = x̄^T A^T x = x̄^T A x = λ x̄^T x

and so, canceling x̄^T x, it follows λ̄ = λ showing λ is real. ∎

Theorem 5.2.29 Let A be a real symmetric matrix. Then there exists a diago-
nal matrix D consisting of the eigenvalues of A down the main diagonal and an
orthogonal matrix U such that
U T AU = D.

Proof: Since A has all real eigenvalues, it follows from Theorem 5.2.27, there
exists an orthogonal matrix U such that

U T AU = T

where T is upper triangular. Now

T^T = U^T A^T U = U^T A U = T

and so in fact T is a diagonal matrix having the eigenvalues of A down the diagonal. ∎
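Numerically, this is the symmetric eigenvalue decomposition. A small check, assuming numpy (the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                              # a real symmetric matrix
w, U = np.linalg.eigh(A)                       # real eigenvalues w and orthogonal U
print(np.allclose(U.T @ A @ U, np.diag(w)))    # U^T A U = D
print(np.allclose(U.T @ U, np.eye(4)))         # U is orthogonal
```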


Theorem 5.2.30 Let A be a real symmetric matrix which has all positive eigen-
values, 0 < λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n. Then

(Ax · x) ≡ x^T A x ≥ λ_1 |x|².

Proof: Let U be the orthogonal matrix of Theorem 5.2.29, so that A = U D U^T. Then

(Ax · x) = x^T A x = x^T (U D U^T) x = (U^T x)^T D (U^T x) = \sum_i λ_i |(U^T x)_i|²

≥ λ_1 \sum_i |(U^T x)_i|² = λ_1 (U^T x · U^T x)

= λ_1 (U^T x)^T U^T x = λ_1 x^T U U^T x = λ_1 x^T I x = λ_1 |x|². ∎

5.3 The Right Polar Factorization


The right polar factorization involves writing a matrix as a product of two other
matrices, one which preserves distances and the other which stretches and distorts.
First here are some lemmas which review and add to many of the topics discussed
so far about adjoints and orthonormal sets and such things.

Lemma 5.3.1 Let A be a symmetric matrix such that all its eigenvalues are non-
negative. Then there exists a symmetric matrix A^{1/2} such that A^{1/2} has all non-
negative eigenvalues and (A^{1/2})² = A.

Proof: Since A is symmetric, it follows from Theorem 5.2.29 there exists a
diagonal matrix D having all real nonnegative entries and an orthogonal matrix
U such that A = U^T D U. Then denote by D^{1/2} the matrix which is obtained by
replacing each diagonal entry of D with its square root. Thus D^{1/2} D^{1/2} = D. Then
define

A^{1/2} ≡ U^T D^{1/2} U.

Then

(A^{1/2})² = U^T D^{1/2} U U^T D^{1/2} U = U^T D U = A.

Since D^{1/2} is real and diagonal,

(U^T D^{1/2} U)^T = U^T (D^{1/2})^T (U^T)^T = U^T D^{1/2} U

so A^{1/2} is symmetric. ∎
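The construction in the proof, replacing each eigenvalue by its square root, can be sketched as follows (numpy assumed; the clip guards against tiny negative eigenvalues produced by rounding):

```python
import numpy as np

def symmetric_sqrt(A):
    # A symmetric with nonnegative eigenvalues; returns the symmetric square root
    w, U = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    return U @ np.diag(np.sqrt(w)) @ U.T

A = np.array([[2., 1.], [1., 2.]])
S = symmetric_sqrt(A)
print(np.allclose(S @ S, A), np.allclose(S, S.T))   # True True
```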
There is also a useful observation about orthonormal sets of vectors which is
stated in the next lemma.

Lemma 5.3.2 Suppose {x_1, x_2, …, x_r} is an orthonormal set of vectors. Then if
c_1, …, c_r are scalars,

|\sum_{k=1}^r c_k x_k|² = \sum_{k=1}^r |c_k|².
Proof: This follows from the definition. From the properties of the dot product
and using the fact that the given set of vectors is orthonormal,

|\sum_{k=1}^r c_k x_k|² = (\sum_{k=1}^r c_k x_k, \sum_{j=1}^r c_j x_j)
= \sum_{k,j} c_k \bar{c}_j (x_k, x_j) = \sum_{k=1}^r |c_k|². ∎
Here is another lemma about preserving distance.

Lemma 5.3.3 Suppose R is an m × n matrix with m > n and R preserves distances.
Then R^T R = I.

Proof: Since R preserves distances, |Rx| = |x| for every x. Therefore from the
axioms of the dot product,

|x|² + |y|² + (x, y) + (y, x)
= |x + y|²
= (R(x + y), R(x + y))
= (Rx, Rx) + (Ry, Ry) + (Rx, Ry) + (Ry, Rx)
= |x|² + |y|² + (R^T R x, y) + (y, R^T R x)

and so for all x, y,

(R^T R x − x, y) + (y, R^T R x − x) = 0.

Hence for all x, y,

Re(R^T R x − x, y) = 0.

Now for given x, y, choose α ∈ {−1, 1} such that

α (R^T R x − x, y) = |(R^T R x − x, y)|.

Then

0 = Re(R^T R x − x, α y) = Re α (R^T R x − x, y) = |(R^T R x − x, y)|.

Thus (R^T R x − x, y) = 0 for all x, y because the given x, y were arbitrary. Let
y = R^T R x − x to conclude that for all x,

R^T R x − x = 0,

which says R^T R = I since x is arbitrary. ∎


With this preparation, here is the big theorem about the right polar factoriza-
tion.
Theorem 5.3.4 Let F be an m × n matrix where m ≥ n. Then there exists a
symmetric n × n matrix U which has all nonnegative eigenvalues and an m × n
matrix R which preserves distances and satisfies R^T R = I such that

F = RU.

Proof: Consider F^T F. This is a symmetric matrix because

(F^T F)^T = F^T (F^T)^T = F^T F.

Also the eigenvalues of the n × n matrix F^T F are all nonnegative. This is because
if λ is an eigenvalue with eigenvector x,

λ (x, x) = (F^T F x, x) = (F x, F x) ≥ 0.

Therefore, by Lemma 5.3.1, there exists an n × n symmetric matrix U having all
nonnegative eigenvalues such that

U² = F^T F.

Consider the subspace U(F^n). Let {U x_1, …, U x_r} be an orthonormal basis for
U(F^n) ⊆ F^n. Note that U(F^n) might not be all of F^n. Using Corollary 5.2.25,
extend to an orthonormal basis for all of F^n,

{U x_1, …, U x_r, y_{r+1}, …, y_n}.

Next observe that {F x_1, …, F x_r} is also an orthonormal set of vectors in F^m.
This is because

(F x_k, F x_j) = (F^T F x_k, x_j) = (U² x_k, x_j)
= (U x_k, U^T x_j) = (U x_k, U x_j) = δ_{jk}.

Therefore, from Corollary 5.2.25 again, this orthonormal set of vectors can be ex-
tended to an orthonormal basis for F^m,

{F x_1, …, F x_r, z_{r+1}, …, z_m}.

Thus there are at least as many z_k as there are y_j. Now for x ∈ F^n, since

{U x_1, …, U x_r, y_{r+1}, …, y_n}

is an orthonormal basis for F^n, there exist unique scalars

c_1, …, c_r, d_{r+1}, …, d_n

such that

x = \sum_{k=1}^r c_k U x_k + \sum_{k=r+1}^n d_k y_k.

Define

R x ≡ \sum_{k=1}^r c_k F x_k + \sum_{k=r+1}^n d_k z_k.   (5.3.16)

Then also there exist scalars b_k such that

U x = \sum_{k=1}^r b_k U x_k

and so from 5.3.16,

R U x = \sum_{k=1}^r b_k F x_k = F(\sum_{k=1}^r b_k x_k).

Is F(\sum_{k=1}^r b_k x_k) = F(x)?

(F(\sum_{k=1}^r b_k x_k) − F(x), F(\sum_{k=1}^r b_k x_k) − F(x))

= (F^T F(\sum_{k=1}^r b_k x_k − x), \sum_{k=1}^r b_k x_k − x)

= (U²(\sum_{k=1}^r b_k x_k − x), \sum_{k=1}^r b_k x_k − x)

= (U(\sum_{k=1}^r b_k x_k − x), U(\sum_{k=1}^r b_k x_k − x))

= (\sum_{k=1}^r b_k U x_k − U x, \sum_{k=1}^r b_k U x_k − U x) = 0.

Therefore, F(\sum_{k=1}^r b_k x_k) = F(x) and this shows

R U x = F x.

From 5.3.16 and Lemma 5.3.2, R preserves distances. Therefore, by Lemma 5.3.3,
R^T R = I. ∎
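A sketch of the factorization in the simplest case where U turns out to be invertible (numpy assumed; the general case, handled in the proof above, needs the extension argument rather than U^{−1}):

```python
import numpy as np

def right_polar(F):
    # F is m x n with m >= n; U = (F^T F)^{1/2}; when U is invertible, R = F U^{-1}
    w, V = np.linalg.eigh(F.T @ F)
    U = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    R = F @ np.linalg.inv(U)
    return R, U

F = np.array([[1., 2.], [0., 1.], [1., 0.]])
R, U = right_polar(F)
print(np.allclose(R @ U, F), np.allclose(R.T @ R, np.eye(2)))   # True True
```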
The Derivative

6.1 Basic Denitions


The concept of derivative generalizes right away to functions of many variables.
However, no attempt will be made to consider derivatives from one side or another.
This is because when you consider functions of many variables, there isnt a well
dened side. However, it is certainly the case that there are more general notions
which include such things. I will present a fairly general notion of the derivative of
a function which is dened on a normed vector space which has values in a normed
vector space. The case of most interest is that of a function which maps Fn to Fm
but it is no more trouble to consider the extra generality and it is sometimes useful
to have this extra generality because sometimes you want to consider functions
dened, for example on subspaces of Fn and it is nice to not have to trouble with
ad hoc considerations. Also, you might want to consider Fn with some norm other
than the usual one.
In what follows, X, Y will denote normed vector spaces. Thanks to Theorem
4.8.4 all the denitions and theorems given below work the same for any norm given
on the vector spaces.
Let U be an open set in X, and let f : U → Y be a function.

Definition 6.1.1 A function g is o(v) if

lim_{||v||→0} g(v)/||v|| = 0.   (6.1.1)

A function f : U → Y is differentiable at x ∈ U if there exists a linear transforma-
tion L ∈ L(X, Y) such that

f(x + v) = f(x) + Lv + o(v).

This linear transformation L is the definition of Df(x). This derivative is often
called the Frechet derivative.

Note that from Theorem 4.8.4 the question whether a given function is dieren-
tiable is independent of the norm used on the nite dimensional vector space. That


is, a function is dierentiable with one norm if and only if it is dierentiable with
another norm.
The definition 6.1.1 means the error,
f(x + v) − f(x) − Lv,
converges to 0 faster than ||v||. Thus the above definition is equivalent to saying

lim_{||v||→0} ||f(x + v) − f(x) − Lv|| / ||v|| = 0   (6.1.2)

or equivalently,

lim_{y→x} ||f(y) − f(x) − Df(x)(y − x)|| / ||y − x|| = 0.   (6.1.3)

The symbol o(v) should be thought of as an adjective. Thus, if t and k are
constants,
o(v) = o(v) + o(v),  o(tv) = o(v),  k o(v) = o(v)
and other similar observations hold.

Theorem 6.1.2 The derivative is well defined.

Proof: First note that for a fixed vector v, o(tv) = o(t). This is because

lim_{t→0} o(tv)/|t| = lim_{t→0} (o(tv)/||tv||) ||v|| = 0.

Now suppose both L_1 and L_2 work in the above definition. Then let v be any vector
and let t be a real scalar which is chosen small enough that tv + x ∈ U. Then

f(x + tv) = f(x) + L_1 tv + o(tv),  f(x + tv) = f(x) + L_2 tv + o(tv).

Therefore, subtracting these two yields (L_2 − L_1)(tv) = o(tv) = o(t). There-
fore, dividing by t yields (L_2 − L_1)(v) = o(t)/t. Now let t → 0 to conclude that
(L_2 − L_1)(v) = 0. Since this is true for all v, it follows L_2 = L_1. ∎

Lemma 6.1.3 Let f be differentiable at x. Then f is continuous at x and in fact,
there exists K > 0 such that whenever ||v|| is small enough,

||f(x + v) − f(x)|| ≤ K ||v||.

Proof: From the definition of the derivative,
f(x + v) − f(x) = Df(x) v + o(v).
Let ||v|| be small enough that ||o(v)||/||v|| < 1 so that ||o(v)|| ≤ ||v||. Then for such
v,
||f(x + v) − f(x)|| ≤ ||Df(x) v|| + ||v|| ≤ (||Df(x)|| + 1) ||v||.
This proves the lemma with K = ||Df(x)|| + 1. ∎
Here ||Df(x)|| is the operator norm of the linear transformation Df(x).

6.2 The Chain Rule


With the above lemma, it is easy to prove the chain rule.

Theorem 6.2.1 (The chain rule) Let U and V be open sets, U ⊆ X and V ⊆ Y.
Suppose f : U → V is differentiable at x ∈ U and suppose g : V → F^q is
differentiable at f(x) ∈ V. Then g ∘ f is differentiable at x and

D(g ∘ f)(x) = D(g(f(x))) D(f(x)).

Proof: This follows from a computation. Let B(x, r) ⊆ U and let r also be
small enough that for ||v|| ≤ r, it follows that f(x + v) ∈ V. Such an r exists
because f is continuous at x. For ||v|| < r, the definition of differentiability of g
and f implies

g(f(x + v)) − g(f(x)) =
Dg(f(x)) (f(x + v) − f(x)) + o(f(x + v) − f(x))
= Dg(f(x)) [Df(x) v + o(v)] + o(f(x + v) − f(x))
= D(g(f(x))) D(f(x)) v + o(v) + o(f(x + v) − f(x)).   (6.2.4)

It remains to show o(f(x + v) − f(x)) = o(v).

By Lemma 6.1.3, with K given there, letting ε > 0, it follows that for ||v|| small
enough,

||o(f(x + v) − f(x))|| ≤ (ε/K) ||f(x + v) − f(x)|| ≤ (ε/K) K ||v|| = ε ||v||.

Since ε > 0 is arbitrary, this shows o(f(x + v) − f(x)) = o(v) because whenever
||v|| is small enough,

||o(f(x + v) − f(x))|| / ||v|| ≤ ε.

By 6.2.4, this shows

g(f(x + v)) − g(f(x)) = D(g(f(x))) D(f(x)) v + o(v)

which proves the theorem. ∎

6.3 The Matrix Of The Derivative


Let X, Y be normed vector spaces, a basis for X being {v_1, …, v_n} and a basis for
Y being {w_1, …, w_m}. First note that if π_i : X → F is defined by

π_i v ≡ x_i where v = \sum_k x_k v_k,

then π_i ∈ L(X, F) and so by Theorem 4.8.3, it follows that π_i is continuous and if
lim_{s→t} g(s) = L, then |π_i g(s) − π_i L| ≤ ||π_i|| ||g(s) − L|| and so the i-th components
converge also.

Suppose that f : U → Y is differentiable. What is the matrix of Df(x) with
respect to the given bases? That is, if

Df(x) = \sum_{ij} J_{ij}(x) w_i v_j,

what is J_{ij}(x)?

D_{v_k} f(x) ≡ lim_{t→0} (f(x + t v_k) − f(x))/t = lim_{t→0} (Df(x)(t v_k) + o(t v_k))/t

= Df(x)(v_k) = \sum_{ij} J_{ij}(x) w_i v_j (v_k) = \sum_{ij} J_{ij}(x) w_i δ_{jk}

= \sum_i J_{ik}(x) w_i.

It follows

lim_{t→0} π_j ((f(x + t v_k) − f(x))/t)
= lim_{t→0} (f_j(x + t v_k) − f_j(x))/t ≡ D_{v_k} f_j(x)
= π_j (\sum_i J_{ik}(x) w_i) = J_{jk}(x).

Thus J_{ik}(x) = D_{v_k} f_i(x).

In the case where X = R^n and Y = R^m and v is a unit vector, D_v f_i(x) is the
familiar directional derivative in the direction v of the function f_i.
Of course the case where X = F^n and f : U ⊆ F^n → F^m is differentiable and the
basis vectors are the usual basis vectors is the case most commonly encountered.
What is the matrix of Df(x) taken with respect to the usual basis vectors? Let
e_i denote the vector of F^n which has a one in the i-th entry and zeroes elsewhere.
This is the standard basis for F^n. Denote by J_{ij}(x) the matrix with respect to these
basis vectors. Thus

Df(x) = \sum_{ij} J_{ij}(x) e_i e_j.

Then from what was just shown,

J_{ik}(x) = D_{e_k} f_i(x) ≡ lim_{t→0} (f_i(x + t e_k) − f_i(x))/t
≡ ∂f_i/∂x_k (x) ≡ f_{i,x_k}(x) ≡ f_{i,k}(x)

where the last several symbols are just the usual notations for the partial derivative
of the function f_i with respect to the k-th variable, where

f(x) ≡ \sum_{i=1}^m f_i(x) e_i.
In other words, the matrix of Df(x) is nothing more than the matrix of partial
derivatives. The k-th column of the matrix (J_{ij}) is

∂f/∂x_k (x) = lim_{t→0} (f(x + t e_k) − f(x))/t ≡ D_{e_k} f(x).

Thus the matrix of Df(x) with respect to the usual basis vectors is the matrix
of the form

\begin{pmatrix} f_{1,x_1}(x) & f_{1,x_2}(x) & ⋯ & f_{1,x_n}(x) \\ ⋮ & ⋮ & & ⋮ \\ f_{m,x_1}(x) & f_{m,x_2}(x) & ⋯ & f_{m,x_n}(x) \end{pmatrix}

where the notation g_{,x_k} denotes the k-th partial derivative given by the limit

lim_{t→0} (g(x + t e_k) − g(x))/t ≡ ∂g/∂x_k.

The above discussion is summarized in the following theorem.
Theorem 6.3.1 Let f : F^n → F^m and suppose f is differentiable at x. Then all the
partial derivatives ∂f_i(x)/∂x_j exist, and if Jf(x) is the matrix of the linear transforma-
tion Df(x) with respect to the standard basis vectors, then the ij-th entry is given
by ∂f_i/∂x_j (x), also denoted as f_{i,j} or f_{i,x_j}.
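In practice the matrix of Df(x) can be approximated column by column from the definition of the partial derivatives. A sketch (numpy assumed; the example function and step size are purely illustrative):

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    # column k approximates (f(x + h e_k) - f(x)) / h, the k-th partial derivative
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.empty((fx.size, x.size))
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        J[:, k] = (np.asarray(f(x + e)) - fx) / h
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[0]) + x[1] ** 2])
x0 = np.array([1.0, 2.0])
exact = np.array([[2.0, 1.0], [np.cos(1.0), 4.0]])
print(np.allclose(numerical_jacobian(f, x0), exact, atol=1e-4))   # True
```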
Definition 6.3.2 In general, the symbol
D_v f(x)
is defined by
lim_{t→0} (f(x + tv) − f(x))/t
where t ∈ F. This is often called the Gateaux derivative.

What if all the partial derivatives of f exist? Does it follow that f is differen-
tiable? Consider the following function f : R² → R,

f(x, y) = { xy/(x² + y²) if (x, y) ≠ (0, 0),  0 if (x, y) = (0, 0) }.

Then from the definition of partial derivatives,

lim_{h→0} (f(h, 0) − f(0, 0))/h = lim_{h→0} (0 − 0)/h = 0

and

lim_{h→0} (f(0, h) − f(0, 0))/h = lim_{h→0} (0 − 0)/h = 0.

However f is not even continuous at (0, 0), which may be seen by considering the
behavior of the function along the line y = x and along the line x = 0. By Lemma
6.1.3 this implies f is not differentiable. Therefore, it is necessary to consider the
correct definition of the derivative given above if you want to get a notion which
generalizes the concept of the derivative of a function of one variable in such a way
as to preserve continuity whenever the function is differentiable.

6.4 A Mean Value Inequality


The following theorem will be very useful in much of what follows. It is a version
of the mean value theorem as is the next lemma.

Lemma 6.4.1 Let Y be a normed vector space and suppose h : [0, 1] → Y is differ-
entiable and satisfies
||h′(t)|| ≤ M.
Then
||h(1) − h(0)|| ≤ M.

Proof: Let ε > 0 be given and let

S ≡ {t ∈ [0, 1] : for all s ∈ [0, t], ||h(s) − h(0)|| ≤ (M + ε) s}.

Then 0 ∈ S. Let t = sup S. Then by continuity of h it follows

||h(t) − h(0)|| = (M + ε) t.   (6.4.5)

Suppose t < 1. Then there exist positive numbers h_k decreasing to 0 such that

||h(t + h_k) − h(0)|| > (M + ε)(t + h_k)

and now it follows from 6.4.5 and the triangle inequality that

(M + ε)(t + h_k) < ||h(t + h_k) − h(0)|| ≤ ||h(t + h_k) − h(t)|| + ||h(t) − h(0)||
= ||h(t + h_k) − h(t)|| + (M + ε) t

and so
||h(t + h_k) − h(t)|| > (M + ε) h_k.
Now dividing by h_k and letting k → ∞,

||h′(t)|| ≥ M + ε,

a contradiction. ∎

Theorem 6.4.2 Suppose U is an open subset of X and f : U → Y has the property
that Df(x) exists for all x in U and that x + t(y − x) ∈ U for all t ∈ [0, 1]. (The
line segment joining the two points lies in U.) Suppose also that for all points on
this line segment,
||Df(x + t(y − x))|| ≤ M.
Then
||f(y) − f(x)|| ≤ M ||y − x||.

Proof: Let
h(t) ≡ f(x + t(y − x)).
Then by the chain rule,

h′(t) = Df(x + t(y − x))(y − x)

and so

||h′(t)|| = ||Df(x + t(y − x))(y − x)|| ≤ M ||y − x||.

By Lemma 6.4.1,

||h(1) − h(0)|| = ||f(y) − f(x)|| ≤ M ||y − x||. ∎

6.5 Existence Of The Derivative, C 1 Functions


There is a way to get the dierentiability of a function from the existence and con-
tinuity of the Gateaux derivatives. This is very convenient because these Gateaux
derivatives are taken with respect to a one dimensional variable. The following
theorem is the main result.

Theorem 6.5.1 Let X be a normed vector space having basis {v1 , , vn } and let
Y be another normed vector space having basis {w1 , , wm } . Let U be an open
set in X and let f : U Y have the property that the Gateaux derivatives,

f (x + tvk ) f (x)
Dvk f (x) lim
t0 t

exist and are continuous functions of x. Then Df (x) exists and


n
Df (x) v = Dvk f (x) ak
k=1

where

n
v= ak vk .
k=1

Furthermore, x Df (x) is continuous; that is

lim ||Df (y) Df (x)|| = 0.


yx

n
Proof: Let v = k=1 ak vk . Then
( )

n
f (x + v) f (x) = f x+ ak vk f (x) .
k=1

0
Then letting k=1 0, f (x + v) f (x) is given by


n
k
k1
f x+ aj vj f x+ a j v j
k=1 j=1 j=1


n
= [f (x + ak vk ) f (x)] +
k=1


n
k
k1
f x+ aj vj f (x + ak vk ) f x+ aj vj f (x)
k=1 j=1 j=1
(6.5.6)
Consider the k th term in 6.5.6. Let


k1
h (t) f x+ aj vj + tak vk f (x + tak vk )
j=1

for t [0, 1] . Then



1
k1
h (t) = ak lim f x+ aj vj + (t + h) ak vk f (x + (t + h) ak vk )
h0 ak h
j=1


k1
f x+ aj vj + tak vk f (x + tak vk )
j=1

and this equals




k1
Dvk f x+ aj vj + tak vk Dvk f (x + tak vk ) ak (6.5.7)
j=1

Now without loss of generality, it can be assumed the norm on X is given by that
of Example 4.8.5,

n
||v|| max |ak | : v = ak vk

j=1

because by Theorem 4.8.4 all norms on X are equivalent. Therefore, from 6.5.7 and
the assumption that the Gateaux derivatives are continuous,


k1


||h (t)|| = Dvk f x+

aj vj + tak vk Dvk f (x + tak vk ) ak
j=1
|ak | ||v||
provided ||v|| is suciently small. Since is arbitrary, it follows from Lemma 6.4.1
the expression in 6.5.6 is o (v) because this expression equals a nite sum of terms
of the form h (1) h (0) where ||h (t)|| ||v|| . Thus

n
f (x + v) f (x) = [f (x + ak vk ) f (x)] + o (v)
k=1


n
n
= Dvk f (x) ak + [f (x + ak vk ) f (x) Dvk f (x) ak ] + o (v) .
k=1 k=1

Consider the k th term in the second sum.


( )
f (x + ak vk ) f (x)
f (x + ak vk ) f (x) Dvk f (x) ak = ak Dvk f (x)
ak
where the expression in the parentheses converges to 0 as ak 0. Thus whenever
||v|| is suciently small,
||f (x + ak vk ) f (x) Dvk f (x) ak || |ak | ||v||
which shows the second sum is also o (v). Therefore,

n
f (x + v) f (x) = Dvk f (x) ak + o (v) .
k=1

Dening

n
Df (x) v Dvk f (x) ak
k=1

where v = k ak vk , it follows Df (x) L (X, Y ) and is given by the above formula.
It remains to verify x Df (x) is continuous.
||(Df (x) Df (y)) v||
n
||(Dvk f (x) Dvk f (y)) ak ||
k=1

n
max {|ak | , k = 1, , n} ||Dvk f (x) Dvk f (y)||
k=1

n
= ||v|| ||Dvk f (x) Dvk f (y)||
k=1

and so

n
||Df (x) Df (y)|| ||Dvk f (x) Dvk f (y)||
k=1

which proves the continuity of Df because of the assumption the Gateaux derivatives
are continuous. 
This motivates the following denition of what it means for a function to be C 1 .

Denition 6.5.2 Let U be an open subset of a normed nite dimensional vector


space, X and let f : U Y another nite dimensional normed vector space. Then
f is said to be C 1 if there exists a basis for X, {v1 , , vn } such that the Gateaux
derivatives,
Dvk f (x)
exist on U and are continuous.

Here is another denition of what it means for a function to be C 1 .

Denition 6.5.3 Let U be an open subset of a normed nite dimensional vector


space, X and let f : U Y another nite dimensional normed vector space. Then
f is said to be C 1 if f is dierentiable and x Df (x) is continuous as a map from
U to L (X, Y ).

Now the following major theorem states these two denitions are equivalent.

Theorem 6.5.4 Let U be an open subset of a normed nite dimensional vector


space, X and let f : U Y another nite dimensional normed vector space. Then
the two denitions above are equivalent.

Proof: It was shown in Theorem 6.5.1 that Denition 6.5.2 implies 6.5.3. Sup-
pose then that Denition 6.5.3 holds. Then if v is any vector,

f (x + tv) f (x) Df (x) tv + o (tv)


lim = lim
t0 t t0 t
o (tv)
= Df (x) v+ lim = Df (x) v
t0 t
Thus Dv f (x) exists and equals Df (x) v. By continuity of x Df (x) , this estab-
lishes continuity of x Dv f (x) and proves the theorem. 
Note that the proof of the theorem also implies the following corollary.

Corollary 6.5.5 Let U be an open subset of a normed nite dimensional vector


space, X and let f : U Y another nite dimensional normed vector space. Then if
there is a basis of X, {v1 , , vn } such that the Gateaux derivatives, Dvk f (x) exist
and are continuous. Then all Gateaux derivatives, Dv f (x) exist and are continuous
for all v X.

From now on, whichever denition is more convenient will be used.



6.6 Higher Order Derivatives


If f : U X Y for U an open set, then

x Df (x)

is a mapping from U to L (X, Y ), a normed vector space. Therefore, it makes


perfect sense to ask whether this function is also dierentiable.

Denition 6.6.1 The following is the denition of the second derivative.

D2 f (x) D (Df (x)) .

Thus,
Df (x + v) Df (x) = D2 f (x) v + o (v) .
This implies
D2 f (x) L (X, L (X, Y )) , D2 f (x) (u) (v) Y,
and the map
(u, v) D2 f (x) (u) (v)
is a bilinear map having values in Y . In other words, the two functions,

u D2 f (x) (u) (v) , v D2 f (x) (u) (v)

are both linear.


The same pattern applies to taking higher order derivatives. Thus,
( )
D3 f (x) D D2 f (x)

and D3 f (x) may be considered as a trilinear map having values in Y . In general


Dk f (x) may be considered a k linear map. This means the function

(u1 , , uk ) Dk f (x) (u1 ) (uk )

has the property


uj Dk f (x) (u1 ) (uj ) (uk )
is linear.
Also, instead of writing

D2 f (x) (u) (v) , or D3 f (x) (u) (v) (w)

the following notation is often used.

D2 f (x) (u, v) or D3 f (x) (u, v, w)

with similar conventions for higher derivatives than 3. Another convention which
is often used is the notation
Dk f (x) vk

instead of
Dk f (x) (v, , v) .
Note that for every k, Dk f maps U to a normed vector space. As mentioned
above, Df (x) has values in L (X, Y ) , D2 f (x) has values in L (X, L (X, Y )) , etc.
Thus it makes sense to consider whether Dk f is continuous. This is described in
the following denition.

Denition 6.6.2 Let U be an open subset of X, a normed vector space and let
f : U Y. Then f is C k (U ) if f and its rst k derivatives are all continuous. Also,
Dk f (x) when it exists can be considered a Y valued multilinear function.

6.7 C k Functions
Recall that for a C 1 function, f

Df (x) v = Dvj f (x) aj = Dvj fi (x) wi aj
j ij
( )

= Dvj fi (x) wi vj ak vk = Dvj fi (x) wi vj (v)
ij k ij

where ak vk = v and
k
f (x) = fi (x) wi . (6.7.8)
i

This is because ( )

wi vj ak vk ak wi jk = wi aj .
k k

Thus
Df (x) = Dvj fi (x) wi vj
ij

I propose to iterate this observation, starting with f and then going to Df and
then D2 f and so forth. Hopefully it will yield a rational way to understand higher
order derivatives in the same way that matrices can be used to understand linear
transformations. Thus beginning with the derivative,

Df (x) = Dvj1 fi (x) wi vj1 .
ij1

Then letting wi vj1 play the role of wi in 6.7.8,


( )
D2 f (x) = Dvj2 Dvj1 fi (x) wi vj1 vj2
ij1 j2

Dvj1 vj2 fi (x) wi vj1 vj2
ij1 j2

Then letting wi vj1 vj2 play the role of wi in 6.7.8,


( )
D3 f (x) = Dvj3 Dvj1 vj2 fi (x) wi vj1 vj2 vj3
ij1 j2 j3

Dvj1 vj2 vj3 fi (x) wi vj1 vj2 vj3
ij1 j2 j3

etc. In general, the notation,

wi vj1 vj2 vjn

denes an appropriate linear transformation given by

wi vj1 vj2 vjn (vk ) = wi vj1 vj2 vjn1 kjn .

The following theorem is important.

Theorem 6.7.1 The function x Dk f (x) exists and is continuous for k p if


and only if there exists a basis for X, {v1 , , vn } and a basis for Y, {w1 , , wm }
such that for

f (x) fi (x) wi ,
i

it follows that for each i = 1, 2, , m all Gateaux derivatives,

Dvj1 vj2 vjk fi (x)

for any choice of vj1 vj2 vjk and for any k p exist and are continuous.

Proof: This follows from a repeated application of Theorems 6.5.1 and 6.5.4 at
each new dierentiation. 

Denition 6.7.2 Let X, Y be nite dimensional normed vector spaces and let U
be an open set in X and f : U Y be a function,

f (x) = fi (x) wi
i

where {w1 , , wm } is a basis for Y. Then f is said to be a C n (U ) function if for


every k n, Dk f (x) exists for all x U and is continuous. This is equivalent to the
other condition which states that for each i = 1, 2, , m all Gateaux derivatives,

Dvj1 vj2 vjk fi (x)

for any choice of vj1 vj2 vjk where {v1 , , vn } is a basis for X and for any
k n exist and are continuous.

6.7.1 Some Standard Notation


In the case where X = Rn and the basis chosen is the standard basis, these Gateaux
derivatives are just the partial derivatives. Recall the notation for partial derivatives
in the following denition.

Denition 6.7.3 Let g : U X. Then


g g (x + hek ) g (x)
gxk (x) (x) lim
xk h0 h
Higher order partial derivatives are dened in the usual way.
2g
gxk xl (x) (x)
xl xk
and so forth.

A convenient notation which is often used which helps to make sense of higher
order partial derivatives is presented in the following denition.

Definition 6.7.4 α = (α_1, …, α_n) for α_1, …, α_n nonnegative integers is called a multi-
index. For a multi-index α, |α| ≡ α_1 + ⋯ + α_n and if x ∈ X,

x = (x_1, …, x_n),

then
x^α ≡ x_1^{α_1} x_2^{α_2} ⋯ x_n^{α_n},

and for f a function, define

D^α f(x) ≡ ∂^{|α|} f(x) / (∂x_1^{α_1} ∂x_2^{α_2} ⋯ ∂x_n^{α_n}).

Then in this special case, the following denition is equivalent to the above as
a denition of what is meant by a C k function.

Denition 6.7.5 Let U be an open subset of Rn and let f : U Y. Then for k a


nonnegative integer, f is C k if for every || k, D f exists and is continuous.

6.8 The Derivative And The Cartesian Product


There are theorems which can be used to get dierentiability of a function based
on existence and continuity of the partial derivatives. A generalization of this was
given above. Here a function dened on a product space is considered. It is very
much like what was presented above and could be obtained as a special case but
to reinforce the ideas, I will do it from scratch because certain aspects of it are
important in the statement of the implicit function theorem.
The following is an important abstract generalization of the concept of partial
derivative presented above. Insead of taking the derivative with respect to one
variable, it is taken with respect to several but not with respect to others. This
vague notion is made precise in the following denition. First here is a lemma.

Lemma 6.8.1 Suppose U is an open set in X Y. Then the set Uy dened by

Uy {x X : (x, y) U }

is an open set in X. Here X Y is a nite dimensional vector space in which the


vector space operations are dened componentwise. Thus for a, b F,

a (x1 , y1 ) + b (x2 , y2 ) = (ax1 + bx2 , ay1 + by2 )

and the norm can be taken to be

||(x, y)|| max (||x|| , ||y||)

Proof: Recall by Theorem 4.8.4 it does not matter how this norm is dened
and the denition above is convenient. It obviously satises most axioms of a norm.
The only one which is not obvious is the triangle inequality. I will show this now.

||(x, y) + (x1 , y1 )|| = ||(x + x1 , y + y1 )|| max (||x + x1 || , ||y + y1 ||)


max (||x|| + ||x1 || , ||y|| + ||y1 ||)

suppose then that ||x|| + ||x1 || ||y|| + ||y1 || . Then the above equals

||x|| + ||x1 || max (||x|| , ||y||) + max (||x1 || , ||y1 ||) ||(x, y)|| + ||(x1 , y1 )||

In case ||x|| + ||x1 || < ||y|| + ||y1 || , the argument is similar.


Let x Uy . Then (x, y) U and so there exists r > 0 such that

B ((x, y) , r) U.

This says that if (u, v) X Y such that ||(u, v) (x, y)|| < r, then (u, v) U.
Thus if
||(u, y) (x, y)|| = ||u x|| < r,
then (u, y) U. This has just said that B (x,r), the ball taken in X is contained in
Uy . 
Or course one could also consider

Ux {y : (x, y) U }

in the same way and conclude this set is open in Y . Also, the n generalization to
many factors yields the same conclusion. In this case, for x i=1 Xi , let
( )
||x|| max ||xi ||Xi : x = (x1 , , xn )
n
Then a similar argument to the above shows this is a norm on i=1 Xi .
n
Corollary 6.8.2 Let U i=1 Xi and let
{ ( ) }
U(x1 , ,xi1 ,xi+1 , ,xn ) x Fri : x1 , , xi1 , x, xi+1 , , xn U .

Then U(x1 , ,xi1 ,xi+1 , ,xn ) is an open set in Fri .



The proof is similar to the above.


n
Denition 6.8.3 Let g : U i=1 Xi Y , where U is an open set. Then the
map ( )
z g x1 , , xi1 , z, xi+1 , , xn
is a function from the open set in Xi ,
{ ( ) }
z : x = x1 , , xi1 , z, xi+1 , , xn U

to Y . When this map is dierentiable,its derivative is denoted by Di g (x). To aid


n
in the notation, for v Xi , let i v
n i=1 Xi be the vector (0, , v, , 0) th
where
the v is in the i slot and for v i=1 Xi , let vi denote the entry in the i slot
th

of v. Thus, by saying
( )
z g x1 , , xi1 , z, xi+1 , , xn

is dierentiable is meant that for v Xi suciently small,

g (x + i v) g (x) = Di g (x) v + o (v) .

Note Di g (x) L (Xi , Y ) .

Denition 6.8.4 Let U X be an open set. Then f : U Y is C 1 (U ) if f is


dierentiable and the mapping
x Df (x) ,
is continuous as a function from U to L (X, Y ).

With this denition of partial derivatives, here is the major theorem.


n
Theorem 6.8.5 Let g, U, i=1 Xi , be given as in Denition 6.8.3. Then g is
C 1 (U ) if and only if Di g exists and is continuous on U for each i. In this case, g
is dierentiable and
Dg (x) (v) = Dk g (x) vk (6.8.9)
k

where v = (v1 , , vn ) .

Proof: Suppose then that Di g exists and is continuous for each i. Note that


k
j vj = (v1 , , vk , 0, , 0) .
j=1

n 0
Thus j=1 j vj = v and dene j=1 j vj 0. Therefore,


n
k
k1
g (x + v) g (x) = g x+ j vj g x + j vj (6.8.10)
k=1 j=1 j=1

Consider the terms in this sum.



k
k1
g x+ j v j g x + j vj = g (x+k vk ) g (x) + (6.8.11)
j=1 j=1


k
k1
g x+ j vj g (x+k vk ) g x + j vj g (x) (6.8.12)
j=1 j=1

and the expression in 6.8.12 is of the form h (vk ) h (0) where for small w Xk ,


k1
h (w) g x+ j vj + k w g (x + k w) .
j=1

Therefore,


k1
Dh (w) = Dk g x+ j vj + k w Dk g (x + k w)
j=1

and by continuity, ||Dh (w)|| < provided ||v|| is small enough. Therefore, by
Theorem 6.4.2, whenever ||v|| is small enough,
||h (vk ) h (0)|| ||vk || ||v||
which shows that since is arbitrary, the expression in 6.8.12 is o (v). Now in 6.8.11
g (x+k vk ) g (x) = Dk g (x) vk + o (vk ) = Dk g (x) vk + o (v) .
Therefore, referring to 6.8.10,

n
g (x + v) g (x) = Dk g (x) vk + o (v)
k=1

which shows Dg (x) exists and equals the formula given in 6.8.9.
Next suppose g is C 1 . I need to verify that Dk g (x) exists and is continuous.
Let v Xk suciently small. Then
g (x + k v) g (x) = Dg (x) k v + o (k v)
= Dg (x) k v + o (v)
since ||k v|| = ||v||. Then Dk g (x) exists and equals
Dg (x) k
Now x Dg (x) nis continuous. Since k is linear, it follows from Theorem 4.8.3
that k : Xk i=1 Xi is also continuous, 
The way this is usually used is in the following corollary, a case of Theorem 6.8.5
obtained by letting Xi = F in the above theorem.

Corollary 6.8.6 Let U be an open subset of Fn and let f :U Fm be C 1 in


the sense that all the partial derivatives of f exist and are continuous. Then f is
dierentiable and

n
f
f (x + v) = f (x) + (x) vk + o (v) .
xk
k=1

6.9 Mixed Partial Derivatives


Continuing with the special case where f is dened on an open set in Fn , I will next
consider an interesting result due to Euler in 1734 about mixed partial derivatives.
It turns out that the mixed partial derivatives, if continuous will end up being equal.
Recall the notation
f
fx = = De1 f
x
and
2f
fxy = = De1 e2 f.
yx
Theorem 6.9.1 Suppose f : U ⊆ F² → R where U is an open set on which f_x, f_y,
f_{xy} and f_{yx} exist. Then if f_{xy} and f_{yx} are continuous at the point (x, y) ∈ U, it
follows
f_{xy}(x, y) = f_{yx}(x, y).

Proof: Since U is open, there exists r > 0 such that B((x, y), r) ⊆ U. Now let
|t|, |s| < r/2, t, s real numbers, and consider

Δ(s, t) ≡ (1/(st)) {f(x + t, y + s) − f(x + t, y) − (f(x, y + s) − f(x, y))}.   (6.9.13)

Note that (x + t, y + s) ∈ U because

|(x + t, y + s) − (x, y)| = |(t, s)| = (t² + s²)^{1/2} ≤ (r²/4 + r²/4)^{1/2} = r/√2 < r.

As implied above, let h(t) ≡ f(x + t, y + s) − f(x + t, y). Therefore, by the mean
value theorem and the (one variable) chain rule,

Δ(s, t) = (1/(st)) (h(t) − h(0)) = (1/(st)) h′(αt) t
= (1/s) (f_x(x + αt, y + s) − f_x(x + αt, y))

for some α ∈ (0, 1). Applying the mean value theorem again,

Δ(s, t) = f_{xy}(x + αt, y + βs)

where α, β ∈ (0, 1).

If the terms f(x + t, y) and f(x, y + s) are interchanged in 6.9.13, Δ(s, t) is
unchanged and the above argument shows there exist γ, δ ∈ (0, 1) such that

Δ(s, t) = f_{yx}(x + γt, y + δs).

Letting (s, t) → (0, 0) and using the continuity of f_{xy} and f_{yx} at (x, y),

lim_{(s,t)→(0,0)} Δ(s, t) = f_{xy}(x, y) = f_{yx}(x, y). ∎
(s,t)(0,0)


The following is obtained from the above by simply xing all the variables except
for the two of interest.
Corollary 6.9.2 Suppose U is an open subset of X and f : U R has the property
that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and fxl xk
are both continuous at x U. Then fxk xl (x) = fxl xk (x) .
By considering the real and imaginary parts of f in the case where f has values
in C you obtain the following corollary.
Corollary 6.9.3 Suppose U is an open subset of Fn and f : U F has the property
that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and fxl xk
are both continuous at x U. Then fxk xl (x) = fxl xk (x) .
Finally, by considering the components of f you get the following generalization.
Corollary 6.9.4 Suppose U is an open subset of Fn and f : U F m has the
property that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and
fxl xk are both continuous at x U. Then fxk xl (x) = fxl xk (x) .
It is necessary to assume the mixed partial derivatives are continuous in order
to assert they are equal. The following is a well known example [3].
Example 6.9.5 Let

f(x, y) = { xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0),  0 if (x, y) = (0, 0) }.

From the definition of partial derivatives it follows immediately that f_x(0, 0) =
f_y(0, 0) = 0. Using the standard rules of differentiation, for (x, y) ≠ (0, 0),

f_x = y (x⁴ − y⁴ + 4x²y²)/(x² + y²)²,  f_y = x (x⁴ − y⁴ − 4x²y²)/(x² + y²)².

Now

f_{xy}(0, 0) ≡ lim_{y→0} (f_x(0, y) − f_x(0, 0))/y = lim_{y→0} (−y⁴)/(y²)² = −1

while

f_{yx}(0, 0) ≡ lim_{x→0} (f_y(x, 0) − f_y(0, 0))/x = lim_{x→0} x⁴/(x²)² = 1

showing that although the mixed partial derivatives do exist at (0, 0) , they are not
equal there.
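A rough numerical confirmation of this example can be made with nested difference quotients; the sketch below assumes Python with numpy, and the step sizes are illustrative, chosen only so the two limits are visible.

```python
import numpy as np

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y * (x**2 - y**2) / (x**2 + y**2)

def fx(x, y, h=1e-6):   # forward-difference approximation of the partial in x
    return (f(x + h, y) - f(x, y)) / h

def fy(x, y, h=1e-6):   # forward-difference approximation of the partial in y
    return (f(x, y + h) - f(x, y)) / h

h = 1e-4
print((fx(0.0, h) - fx(0.0, 0.0)) / h)   # approximately -1, i.e. f_xy(0, 0)
print((fy(h, 0.0) - fy(0.0, 0.0)) / h)   # approximately  1, i.e. f_yx(0, 0)
```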

6.10 Implicit Function Theorem


The following lemma is very useful.

Lemma 6.10.1 Let A ∈ L(X, X) where X is a finite dimensional normed vector
space and suppose ||A|| ≤ r < 1. Then

(I − A)^{−1} exists   (6.10.14)

and

||(I − A)^{−1}|| ≤ (1 − r)^{−1}.   (6.10.15)

Furthermore, if

\mathcal{I} ≡ {A ∈ L(X, X) : A^{−1} exists},

the map A ↦ A^{−1} is continuous on \mathcal{I} and \mathcal{I} is an open subset of L(X, X).

Proof: Let $||A|| \le r < 1$. If $(I - A)x = 0$, then $x = Ax$ and so if $x \ne 0$,
$$||x|| = ||Ax|| \le ||A||\, ||x|| < r\, ||x||,$$
which is a contradiction. Therefore, $(I - A)$ is one to one. Hence it maps a basis of
$X$ to a basis of $X$ and is therefore onto. Here is why. Let $\{v_1, \cdots, v_n\}$ be a basis
for $X$ and suppose
$$\sum_{k=1}^{n} c_k (I - A) v_k = 0.$$
Then
$$(I - A)\left(\sum_{k=1}^{n} c_k v_k\right) = 0$$
and since $(I - A)$ is one to one, it follows
$$\sum_{k=1}^{n} c_k v_k = 0,$$
which requires each $c_k = 0$ because the $\{v_k\}$ are independent. Hence $\{(I - A) v_k\}_{k=1}^{n}$
is a basis for $X$ because there are $n$ of these vectors and every basis has the same
size. Therefore, if $y \in X$, there exist scalars $c_k$ such that
$$y = \sum_{k=1}^{n} c_k (I - A) v_k = (I - A)\left(\sum_{k=1}^{n} c_k v_k\right),$$
so $(I - A)$ is onto as claimed. Thus $(I - A)^{-1} \in L(X, X)$ and it remains to estimate
its norm.
$$||x - Ax|| \ge ||x|| - ||Ax|| \ge ||x|| - ||A||\, ||x|| \ge ||x||\,(1 - r).$$
Letting $y = x - Ax$ so $x = (I - A)^{-1} y$, this shows, since $(I - A)$ is onto, that for
all $y \in X$,
$$||y|| \ge \left|\left|(I - A)^{-1} y\right|\right| (1 - r),$$
and so $\left|\left|(I - A)^{-1}\right|\right| \le (1 - r)^{-1}$. This proves the first part.
To verify the continuity of the inverse map, let $A \in \mathcal{I}$. Then
$$B = A\left(I - A^{-1}(A - B)\right)$$
and so if $\left|\left|A^{-1}(A - B)\right|\right| < 1$ which, by Theorem 4.8.3, happens if
$$||A - B|| < 1/\left|\left|A^{-1}\right|\right|,$$
it follows from the first part of this proof that $\left(I - A^{-1}(A - B)\right)^{-1}$ exists and so
$$B^{-1} = \left(I - A^{-1}(A - B)\right)^{-1} A^{-1},$$
which shows $\mathcal{I}$ is open. Also, if
$$\left|\left|A^{-1}(A - B)\right|\right| \le r < 1, \tag{6.10.16}$$
then
$$\left|\left|B^{-1}\right|\right| \le \left|\left|A^{-1}\right|\right| (1 - r)^{-1}.$$
Now for such $B$ this close to $A$ such that 6.10.16 holds,
$$\left|\left|B^{-1} - A^{-1}\right|\right| = \left|\left|B^{-1}(A - B) A^{-1}\right|\right| \le ||A - B||\, \left|\left|B^{-1}\right|\right|\, \left|\left|A^{-1}\right|\right|
\le ||A - B||\, \left|\left|A^{-1}\right|\right|^2 (1 - r)^{-1},$$
which shows the map which takes a linear transformation in $\mathcal{I}$ to its inverse is
continuous. $\blacksquare$
The next theorem is a very useful result in many areas. It will be used in this
section to give a short proof of the implicit function theorem but it is also useful
in studying differential equations and integral equations. It is sometimes called the
uniform contraction principle.

Theorem 6.10.2 Let $X, Y$ be finite dimensional normed vector spaces. Also let $E$
be a closed subset of $X$ and $F$ a closed subset of $Y$. Suppose for each $(x, y) \in E \times F$,
$T(x, y) \in E$ and $T$ satisfies
$$||T(x, y) - T(x', y)|| \le r\, ||x - x'|| \tag{6.10.17}$$
where $0 < r < 1$ and also
$$||T(x, y) - T(x, y')|| \le M\, ||y - y'||. \tag{6.10.18}$$
Then for each $y \in F$ there exists a unique fixed point for $T(\cdot, y)$, $x \in E$, satisfying
$$T(x, y) = x, \tag{6.10.19}$$
and also if $x(y)$ is this fixed point,
$$||x(y) - x(y')|| \le \frac{M}{1 - r}\, ||y - y'||. \tag{6.10.20}$$
Proof: First consider the claim there exists a fixed point for the mapping
$T(\cdot, y)$. For a fixed $y$, let $g(x) \equiv T(x, y)$. Now pick any $x_0 \in E$ and consider the
sequence
$$x_1 = g(x_0), \quad x_{k+1} = g(x_k).$$
Then by 6.10.17,
$$||x_{k+1} - x_k|| = ||g(x_k) - g(x_{k-1})|| \le r\, ||x_k - x_{k-1}||
\le r^2 ||x_{k-1} - x_{k-2}|| \le \cdots \le r^k ||g(x_0) - x_0||.$$
Now by the triangle inequality,
$$||x_{k+p} - x_k|| \le \sum_{i=1}^{p} ||x_{k+i} - x_{k+i-1}||
\le \sum_{i=1}^{p} r^{k+i-1} ||g(x_0) - x_0|| \le \frac{r^k ||g(x_0) - x_0||}{1 - r}.$$
Since $0 < r < 1$, this shows that $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence. Therefore, by
completeness of $E$ it converges to a point $x \in E$. To see $x$ is a fixed point, use the
continuity of $g$ to obtain
$$x \equiv \lim_{k \to \infty} x_k = \lim_{k \to \infty} x_{k+1} = \lim_{k \to \infty} g(x_k) = g(x).$$
This proves 6.10.19. To verify 6.10.20,
$$||x(y) - x(y')|| = ||T(x(y), y) - T(x(y'), y')||$$
$$\le ||T(x(y), y) - T(x(y), y')|| + ||T(x(y), y') - T(x(y'), y')||
\le M\, ||y - y'|| + r\, ||x(y) - x(y')||.$$
Thus
$$(1 - r)\, ||x(y) - x(y')|| \le M\, ||y - y'||.$$
This also shows the fixed point for a given $y$ is unique. $\blacksquare$
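
The proof above is constructive: the fixed point is the limit of the iterates $x_{k+1} = T(x_k, y)$. The following sketch is not from the text; the particular map $T$, starting point, and tolerance are illustrative assumptions chosen so the hypotheses of Theorem 6.10.2 hold.

```python
# A minimal sketch of the iteration used in the proof of Theorem 6.10.2:
# starting from any x0, iterate x_{k+1} = T(x_k, y) until the step size is tiny.
import math

def fixed_point(T, y, x0, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_next = T(x, y)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# Example: T(x, y) = 0.5*cos(x) + y is a contraction in x with r = 0.5 and is
# Lipschitz in y with M = 1, so 6.10.20 gives ||x(y) - x(y')|| <= 2|y - y'|.
x_of_y1 = fixed_point(lambda x, y: 0.5 * math.cos(x) + y, 0.30, x0=0.0)
x_of_y2 = fixed_point(lambda x, y: 0.5 * math.cos(x) + y, 0.31, x0=0.0)
print(abs(x_of_y1 - x_of_y2) <= 2 * abs(0.31 - 0.30) + 1e-9)   # True
```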
The implicit function theorem deals with the question of solving $f(x, y) = 0$
for $x$ in terms of $y$ and how smooth the solution is. It is one of the most important
theorems in mathematics. The proof I will give holds with no change in the context
of infinite dimensional complete normed vector spaces when suitable modifications
are made on what is meant by $L(X, Y)$. There are also versions of this theorem
which are even more general than the normed vector space setting.
Recall that for $X, Y$ normed vector spaces, the norm on $X \times Y$ is of the form
$$||(x, y)|| = \max(||x||, ||y||).$$

Theorem 6.10.3 (implicit function theorem) Let $X, Y, Z$ be finite dimensional normed
vector spaces and suppose $U$ is an open set in $X \times Y$. Let $f : U \to Z$ be in $C^1(U)$
and suppose
$$f(x_0, y_0) = 0, \quad D_1 f(x_0, y_0)^{-1} \in L(Z, X). \tag{6.10.21}$$
Then there exist positive constants $\delta, \eta$ such that for every $y \in B(y_0, \eta)$ there
exists a unique $x(y) \in B(x_0, \delta)$ such that
$$f(x(y), y) = 0. \tag{6.10.22}$$
Furthermore, the mapping $y \to x(y)$ is in $C^1(B(y_0, \eta))$.


Proof: Let $T(x, y) \equiv x - D_1 f(x_0, y_0)^{-1} f(x, y)$. Therefore,
$$D_1 T(x, y) = I - D_1 f(x_0, y_0)^{-1} D_1 f(x, y). \tag{6.10.23}$$
By continuity of the derivative, which implies continuity of $D_1 T$, it follows there
exists $\delta > 0$ such that if $||(x - x_0, y - y_0)|| < \delta$, then
$$||D_1 T(x, y)|| < \frac{1}{2}. \tag{6.10.24}$$
Also, it can be assumed $\delta$ is small enough that
$$\left|\left|D_1 f(x_0, y_0)^{-1}\right|\right|\, ||D_2 f(x, y)|| < M \tag{6.10.25}$$
where $M > \left|\left|D_1 f(x_0, y_0)^{-1}\right|\right|\, ||D_2 f(x_0, y_0)||$. By Theorem 6.4.2, whenever $x, x' \in
B(x_0, \delta)$ and $y \in B(y_0, \delta)$,
$$||T(x, y) - T(x', y)|| \le \frac{1}{2}\, ||x - x'||. \tag{6.10.26}$$
Solving 6.10.23 for $D_1 f(x, y)$,
$$D_1 f(x, y) = D_1 f(x_0, y_0)\left(I - D_1 T(x, y)\right).$$

By Lemma 6.10.1 and the assumption that $D_1 f(x_0, y_0)^{-1}$ exists, it follows that $D_1 f(x, y)^{-1}$
exists and equals
$$\left(I - D_1 T(x, y)\right)^{-1} D_1 f(x_0, y_0)^{-1}.$$
By the estimate of Lemma 6.10.1 and 6.10.24,
$$\left|\left|D_1 f(x, y)^{-1}\right|\right| \le 2 \left|\left|D_1 f(x_0, y_0)^{-1}\right|\right|. \tag{6.10.27}$$
Next more restrictions are placed on $y$ to make it even closer to $y_0$. Let
$$0 < \eta < \min\left(\delta, \frac{\delta}{3M}\right).$$
Then suppose $x \in B(x_0, \delta)$ and $y \in B(y_0, \eta)$. Consider
$$x - D_1 f(x_0, y_0)^{-1} f(x, y) - x_0 = T(x, y) - x_0 \equiv g(x, y).$$
Then
$$D_1 g(x, y) = I - D_1 f(x_0, y_0)^{-1} D_1 f(x, y) = D_1 T(x, y),$$
and
$$D_2 g(x, y) = -D_1 f(x_0, y_0)^{-1} D_2 f(x, y).$$
Also note that $T(x, y) = x$ is the same as saying $f(x, y) = 0$, and also $g(x_0, y_0) =
0$. Thus by 6.10.25 and Theorem 6.4.2, it follows that for such $(x, y) \in B(x_0, \delta) \times
B(y_0, \eta)$,
$$||T(x, y) - x_0|| = ||g(x, y)|| = ||g(x, y) - g(x_0, y_0)||$$
$$\le ||g(x, y) - g(x, y_0)|| + ||g(x, y_0) - g(x_0, y_0)||$$
$$\le M\, ||y - y_0|| + \frac{1}{2}\, ||x - x_0|| < \frac{\delta}{3} + \frac{\delta}{2} = \frac{5\delta}{6} < \delta. \tag{6.10.28}$$
Also for such $(x, y_i)$, $i = 1, 2$, Theorem 6.4.2 and 6.10.25 imply
$$||T(x, y_1) - T(x, y_2)|| = \left|\left|D_1 f(x_0, y_0)^{-1}\left(f(x, y_2) - f(x, y_1)\right)\right|\right|
\le M\, ||y_2 - y_1||. \tag{6.10.29}$$
From now on assume $||x - x_0|| < \delta$ and $||y - y_0|| < \eta$ so that 6.10.29, 6.10.27,
6.10.28, 6.10.26, and 6.10.25 all hold. By 6.10.29, 6.10.26, 6.10.28, and the uniform
contraction principle, Theorem 6.10.2 applied to the closed sets $E \equiv \overline{B\left(x_0, \frac{5\delta}{6}\right)}$ and $F \equiv \overline{B(y_0, \eta)}$
implies that for each $y \in B(y_0, \eta)$, there exists a unique $x(y) \in B(x_0, \delta)$ (actually
in $B\left(x_0, \frac{5\delta}{6}\right)$) such that $T(x(y), y) = x(y)$, which is equivalent to
$$f(x(y), y) = 0.$$
Furthermore,
$$||x(y) - x(y')|| \le 2M\, ||y - y'||. \tag{6.10.30}$$

This proves the implicit function theorem except for the verification that $y \to
x(y)$ is $C^1$. This is shown next. Letting $v$ be sufficiently small, Theorem 6.8.5 and
Theorem 6.4.2 imply
$$0 = f(x(y + v), y + v) - f(x(y), y)$$
$$= D_1 f(x(y), y)\left(x(y + v) - x(y)\right) + D_2 f(x(y), y)\, v + o\left(\left(x(y + v) - x(y), v\right)\right).$$
The last term in the above is $o(v)$ because of 6.10.30. Therefore, using 6.10.27,
solve the above equation for $x(y + v) - x(y)$ and obtain
$$x(y + v) - x(y) = -D_1 f(x(y), y)^{-1} D_2 f(x(y), y)\, v + o(v),$$
which shows that $y \to x(y)$ is differentiable on $B(y_0, \eta)$ and
$$Dx(y) = -D_1 f(x(y), y)^{-1} D_2 f(x(y), y). \tag{6.10.31}$$
Now it follows from the continuity of $D_2 f$, $D_1 f$, the inverse map, 6.10.30, and this
formula for $Dx(y)$ that $x(\cdot)$ is $C^1(B(y_0, \eta))$. $\blacksquare$
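
The proof also suggests a practical way to compute $x(y)$: iterate the map $T$ from the proof. The sketch below is an illustration only; the specific $f$, base point, and tolerance are assumptions, not material from the text.

```python
# Sketch of the construction in the proof of Theorem 6.10.3: the solution x(y) of
# f(x, y) = 0 is the fixed point of T(x, y) = x - D1f(x0, y0)^{-1} f(x, y).
# Illustrative scalar example: f(x, y) = x^3 + x - y with (x0, y0) = (1, 2).
def f(x, y):
    return x**3 + x - y

x0, y0 = 1.0, 2.0
D1f_inv = 1.0 / (3 * x0**2 + 1)          # (d/dx f)(x0, y0)^{-1}, a scalar here

def x_of_y(y, tol=1e-14):
    x = x0
    for _ in range(200):
        x_new = x - D1f_inv * f(x, y)    # the contraction T(., y)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

y = 2.1
x = x_of_y(y)
print(abs(f(x, y)) < 1e-10)              # True: f(x(y), y) = 0
# Formula 6.10.31 predicts Dx(y) = -D1f(x, y)^{-1} D2f(x, y) = 1/(3x^2 + 1) here.
```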
The next theorem is a very important special case of the implicit function the-
orem known as the inverse function theorem. Actually one can also obtain the
implicit function theorem from the inverse function theorem. It is done this way in
[33] and in [2].

Theorem 6.10.4 (inverse function theorem) Let $x_0 \in U$, an open set in $X$, and
let $f : U \to Y$ where $X, Y$ are finite dimensional normed vector spaces. Suppose
$$f \text{ is } C^1(U), \text{ and } Df(x_0)^{-1} \in L(Y, X). \tag{6.10.32}$$
Then there exist open sets $W$ and $V$ such that
$$x_0 \in W \subseteq U, \tag{6.10.33}$$
$$f : W \to V \text{ is one to one and onto,} \tag{6.10.34}$$
$$f^{-1} \text{ is } C^1. \tag{6.10.35}$$

Proof: Apply the implicit function theorem to the function
$$F(x, y) \equiv f(x) - y$$
where $y_0 \equiv f(x_0)$. Thus the function $y \to x(y)$ defined in that theorem is $f^{-1}$.
Now let
$$W \equiv B(x_0, \delta) \cap f^{-1}(B(y_0, \eta))$$
and
$$V \equiv B(y_0, \eta). \;\blacksquare$$

6.10.1 More Derivatives

In the implicit function theorem, suppose $f$ is $C^k$. Will the implicitly defined func-
tion also be $C^k$? It was shown above that this is the case if $k = 1$. In fact it holds
for any positive integer $k$.
First of all, consider $D_2 f(x(y), y) \in L(Y, Z)$. Let $\{w_1, \cdots, w_m\}$ be a basis
for $Y$ and let $\{z_1, \cdots, z_n\}$ be a basis for $Z$. Then $D_2 f(x(y), y)$ has a matrix
with respect to these bases. Thus, conserving on notation, denote this matrix by
$\left(D_2 f(x(y), y)_{ij}\right)$. Thus
$$D_2 f(x(y), y) = \sum_{ij} D_2 f(x(y), y)_{ij}\, z_i w_j.$$
The scalar valued entries of the matrix of $D_2 f(x(y), y)$ have the same differen-
tiability as the function $y \to D_2 f(x(y), y)$. This is because the linear projection
map $\pi_{ij}$ mapping $L(Y, Z)$ to $F$ given by $\pi_{ij} L \equiv L_{ij}$, the $ij^{th}$ entry of the matrix of
$L$ with respect to the given bases, is continuous thanks to Theorem 4.8.3. Similar
considerations apply to $D_1 f(x(y), y)$ and the entries of its matrix, $D_1 f(x(y), y)_{ij}$,
taken with respect to suitable bases. From the formula for the inverse of a matrix,
Theorem 5.2.14, the $ij^{th}$ entries of the matrix of $D_1 f(x(y), y)^{-1}$, $D_1 f(x(y), y)^{-1}_{ij}$,
also have the same differentiability as $y \to D_1 f(x(y), y)$.
Now consider the formula for the derivative of the implicitly defined function in
6.10.31,
$$Dx(y) = -D_1 f(x(y), y)^{-1} D_2 f(x(y), y). \tag{6.10.36}$$
The above derivative is in $L(Y, X)$. Let $\{w_1, \cdots, w_m\}$ be a basis for $Y$ and let
$\{v_1, \cdots, v_n\}$ be a basis for $X$. Letting $x_i$ be the $i^{th}$ component of $x$ with respect
to the basis for $X$, it follows from Theorem 6.7.1 that $y \to x(y)$ will be $C^k$ if all such
Gateaux derivatives, $D_{w_{j_1} w_{j_2} \cdots w_{j_r}} x_i(y)$, exist and are continuous for $r \le k$ and for
any $i$. Consider what is required for this to happen. By 6.10.36,
$$D_{w_j} x_i(y) = \sum_{k} -\left(D_1 f(x(y), y)^{-1}\right)_{ik} \left(D_2 f(x(y), y)\right)_{kj}
\equiv G_1(x(y), y) \tag{6.10.37}$$
where $(x, y) \to G_1(x, y)$ is $C^{k-1}$ because it is assumed $f$ is $C^k$ and one derivative
has been taken to write the above. If $k \ge 2$, then another Gateaux derivative can
be taken:
$$D_{w_j w_k} x_i(y) \equiv \lim_{t \to 0} \frac{G_1(x(y + t w_k), y + t w_k) - G_1(x(y), y)}{t}$$
$$= D_1 G_1(x(y), y)\, Dx(y)\, w_k + D_2 G_1(x(y), y)\, w_k
\equiv G_2(x(y), y, Dx(y)).$$
Since a similar result holds for all $i$ and any choice of $w_j, w_k$, this shows $x$ is at
least $C^2$. If $k \ge 3$, then another Gateaux derivative can be taken because then
$(x, y, z) \to G_2(x, y, z)$ is $C^1$ and it has been established $Dx$ is $C^1$. Continuing this
way shows $D_{w_{j_1} w_{j_2} \cdots w_{j_r}} x_i(y)$ exists and is continuous for $r \le k$. This proves the
following corollary to the implicit and inverse function theorems.

Corollary 6.10.5 In the implicit and inverse function theorems, you can replace
C 1 with C k in the statements of the theorems for any k N.

6.10.2 The Case Of $\mathbb{R}^n$

In many applications of the implicit function theorem,
$$f : U \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$$
and $f(x_0, y_0) = 0$ while $f$ is $C^1$. How can you recognize the condition of the implicit
function theorem which says $D_1 f(x_0, y_0)^{-1}$ exists? This is really not hard. You
recall the matrix of the transformation $D_1 f(x_0, y_0)$ with respect to the usual basis
vectors is
$$\begin{pmatrix} f_{1,x_1}(x_0, y_0) & \cdots & f_{1,x_n}(x_0, y_0) \\ \vdots & & \vdots \\ f_{n,x_1}(x_0, y_0) & \cdots & f_{n,x_n}(x_0, y_0) \end{pmatrix}$$
and so $D_1 f(x_0, y_0)^{-1}$ exists exactly when the determinant of the above matrix is
nonzero. This is the condition to check. In the general case, you just need to verify
$D_1 f(x_0, y_0)$ is one to one and this can also be accomplished by looking at the matrix
of the transformation with respect to some bases on $X$ and $Z$.
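
The determinant check is easy to carry out in practice. The sketch below is only an illustration of the procedure just described; the particular $f$ and point $(x_0, y_0)$ are assumptions, and SymPy is assumed available.

```python
# For f : R^2 x R -> R^2 (illustrative example), verify det D1f(x0, y0) != 0,
# so the implicit function theorem applies at (x0, y0).
import sympy as sp

x1, x2, y = sp.symbols('x1 x2 y', real=True)
f = sp.Matrix([x1**2 + x2 - y, x1 - x2**3])   # f(x, y) with x = (x1, x2)

D1f = f.jacobian([x1, x2])                    # the n x n block of partials in x
x0 = {x1: 1, x2: 1}
y0 = {y: 2}                                    # f(x0, y0) = 0 for these values

print(f.subs({**x0, **y0}))                    # Matrix([[0], [0]])
print(D1f.subs({**x0, **y0}).det())            # -7, which is nonzero
```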

6.11 Taylor's Formula

First recall the Taylor formula with the Lagrange form of the remainder. It will
only be needed on $[0, 1]$ so that is what I will show.

Theorem 6.11.1 Let $h : [0, 1] \to \mathbb{R}$ have $m + 1$ derivatives. Then there exists
$t \in (0, 1)$ such that
$$h(1) = h(0) + \sum_{k=1}^{m} \frac{h^{(k)}(0)}{k!} + \frac{h^{(m+1)}(t)}{(m+1)!}.$$

Proof: Let $K$ be a number chosen such that
$$h(1) - \left(h(0) + \sum_{k=1}^{m} \frac{h^{(k)}(0)}{k!} + K\right) = 0.$$
Now the idea is to find $K$. To do this, let
$$F(t) = h(1) - \left(h(t) + \sum_{k=1}^{m} \frac{h^{(k)}(t)}{k!}(1 - t)^k + K(1 - t)^{m+1}\right).$$

Then $F(1) = F(0) = 0$. Therefore, by Rolle's theorem there exists $t$ between $0$ and
$1$ such that $F'(t) = 0$. Thus,
$$0 = F'(t) = -h'(t) - \sum_{k=1}^{m} \frac{h^{(k+1)}(t)}{k!}(1 - t)^k
+ \sum_{k=1}^{m} \frac{h^{(k)}(t)}{k!}\, k\,(1 - t)^{k-1} + K(m + 1)(1 - t)^m.$$
And so
$$0 = -h'(t) - \sum_{k=1}^{m} \frac{h^{(k+1)}(t)}{k!}(1 - t)^k + \sum_{k=0}^{m-1} \frac{h^{(k+1)}(t)}{k!}(1 - t)^k
+ K(m + 1)(1 - t)^m$$
$$= -h'(t) - \frac{h^{(m+1)}(t)}{m!}(1 - t)^m + h'(t) + K(m + 1)(1 - t)^m,$$
and so
$$K = \frac{h^{(m+1)}(t)}{(m + 1)!}. \;\blacksquare$$

Now let $f : U \to \mathbb{R}$ where $U \subseteq X$, a normed vector space, and suppose $f \in
C^m(U)$. Let $x \in U$ and let $r > 0$ be such that
$$B(x, r) \subseteq U.$$
Then for $||v|| < r$ consider
$$f(x + tv) - f(x) \equiv h(t)$$
for $t \in [0, 1]$. Then by the chain rule,
$$h'(t) = Df(x + tv)(v), \quad h''(t) = D^2 f(x + tv)(v)(v),$$
and continuing in this way,
$$h^{(k)}(t) = D^{(k)} f(x + tv)(v)(v) \cdots (v) \equiv D^{(k)} f(x + tv)\, v^k.$$
It follows from Taylor's formula for a function of one variable given above that
$$f(x + v) = f(x) + \sum_{k=1}^{m} \frac{D^{(k)} f(x)\, v^k}{k!} + \frac{D^{(m+1)} f(x + tv)\, v^{m+1}}{(m + 1)!}. \tag{6.11.38}$$
This proves the following theorem.

Theorem 6.11.2 Let $f : U \to \mathbb{R}$ and let $f \in C^{m+1}(U)$. Then if
$$B(x, r) \subseteq U$$
and $||v|| < r$, there exists $t \in (0, 1)$ such that 6.11.38 holds.

6.11.1 Second Derivative Test

Now consider the case where $U \subseteq \mathbb{R}^n$ and $f : U \to \mathbb{R}$ is $C^2(U)$. Then from Taylor's
theorem, if $v$ is small enough, there exists $t \in (0, 1)$ such that
$$f(x + v) = f(x) + Df(x)\, v + \frac{D^2 f(x + tv)\, v^2}{2}. \tag{6.11.39}$$
Consider
$$D^2 f(x + tv)(e_i)(e_j) \equiv D\left(D\left(f(x + tv)\right) e_i\right) e_j
= D\left(\frac{\partial f(x + tv)}{\partial x_i}\right) e_j
= \frac{\partial^2 f(x + tv)}{\partial x_j\, \partial x_i}$$
where $e_i$ are the usual basis vectors. Letting
$$v = \sum_{i=1}^{n} v_i e_i,$$
the second derivative term in 6.11.39 reduces to
$$\frac{1}{2} \sum_{i,j} D^2 f(x + tv)(e_i)(e_j)\, v_i v_j = \frac{1}{2} \sum_{i,j} H_{ij}(x + tv)\, v_i v_j$$
where
$$H_{ij}(x + tv) = D^2 f(x + tv)(e_i)(e_j) = \frac{\partial^2 f(x + tv)}{\partial x_j\, \partial x_i}.$$

Definition 6.11.3 The matrix whose $ij^{th}$ entry is $\frac{\partial^2 f(x)}{\partial x_j \partial x_i}$ is called the Hessian ma-
trix, denoted as $H(x)$.

From Theorem 6.9.1, this is a symmetric real matrix, thus self adjoint. By the
continuity of the second partial derivative,
$$f(x + v) = f(x) + Df(x)\, v + \frac{1}{2} v^T H(x)\, v + \frac{1}{2}\left(v^T\left(H(x + tv) - H(x)\right) v\right), \tag{6.11.40}$$
where the last two terms involve ordinary matrix multiplication and
$$v^T = (v_1 \cdots v_n)$$
for $v_i$ the components of $v$ relative to the standard basis.

Definition 6.11.4 Let $f : D \to \mathbb{R}$ where $D$ is a subset of some normed vector
space. Then $f$ has a local minimum at $x \in D$ if there exists $\delta > 0$ such that for all
$y \in B(x, \delta)$,
$$f(y) \ge f(x).$$
$f$ has a local maximum at $x \in D$ if there exists $\delta > 0$ such that for all $y \in B(x, \delta)$,
$$f(y) \le f(x).$$

Theorem 6.11.5 If $f : U \to \mathbb{R}$ where $U$ is an open subset of $\mathbb{R}^n$ and $f$ is $C^2$,
suppose $Df(x) = 0$. Then if $H(x)$ has all positive eigenvalues, $x$ is a local min-
imum. If the Hessian matrix $H(x)$ has all negative eigenvalues, then $x$ is a local
maximum. If $H(x)$ has a positive eigenvalue, then there exists a direction in which
$f$ has a local minimum at $x$, while if $H(x)$ has a negative eigenvalue, there exists
a direction in which $f$ has a local maximum at $x$.

Proof: Since $Df(x) = 0$, formula 6.11.40 holds and by continuity of the second
derivative, $H(x)$ is a symmetric matrix. Thus $H(x)$ has all real eigenvalues. Sup-
pose first that $H(x)$ has all positive eigenvalues and that all are larger than $\delta^2 > 0$.
Then by Theorem 5.2.30,
$$u^T H(x)\, u \ge \delta^2 |u|^2.$$
From 6.11.40 and the continuity of $H$, if $v$ is small enough,
$$f(x + v) \ge f(x) + \frac{1}{2}\delta^2 |v|^2 - \frac{1}{4}\delta^2 |v|^2 = f(x) + \frac{\delta^2}{4} |v|^2.$$
This shows the first claim of the theorem. The second claim follows from similar
reasoning. Suppose $H(x)$ has a positive eigenvalue $\lambda^2$. Then let $v$ be an eigenvector
for this eigenvalue. Then from 6.11.40,
$$f(x + tv) = f(x) + \frac{1}{2} t^2 v^T H(x)\, v + \frac{1}{2} t^2\left(v^T\left(H(x + tv) - H(x)\right) v\right),$$
which implies
$$f(x + tv) = f(x) + \frac{1}{2} t^2 \lambda^2 |v|^2 + \frac{1}{2} t^2\left(v^T\left(H(x + tv) - H(x)\right) v\right)
\ge f(x) + \frac{1}{4} t^2 \lambda^2 |v|^2$$
whenever $t$ is small enough. Thus in the direction $v$ the function has a local mini-
mum at $x$. The assertion about the local maximum in some direction follows simi-
larly. $\blacksquare$
This theorem is an analogue of the second derivative test for higher dimensions.
As in one dimension, when there is a zero eigenvalue, it may be impossible to de-
termine from the Hessian matrix what the local qualitative behavior of the function
is. For example, consider
$$f_1(x, y) = x^4 + y^2, \quad f_2(x, y) = -x^4 + y^2.$$
Then $Df_i(0, 0) = 0$ and for both functions, the Hessian matrix evaluated at
$(0, 0)$ equals
$$\begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix},$$
but the behavior of the two functions is very different near the origin. The second
has a saddle point while the first has a minimum there.
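
In practice the test amounts to computing the eigenvalues of the Hessian at the critical point and checking their signs. The sketch below is only an illustration; the example function and the use of NumPy are assumptions, not part of the text.

```python
# Sketch of the second derivative test: classify a critical point by the signs of
# the Hessian eigenvalues.
import numpy as np

# f(x, y) = x^2 + 3y^2 - 2xy has a critical point at (0, 0) with Hessian:
H = np.array([[2.0, -2.0],
              [-2.0, 6.0]])

eigvals = np.linalg.eigvalsh(H)   # H is symmetric, so the eigenvalues are real
if np.all(eigvals > 0):
    print("local minimum")        # this case: 4 - 2*sqrt(2) and 4 + 2*sqrt(2), both > 0
elif np.all(eigvals < 0):
    print("local maximum")
elif np.any(eigvals > 0) and np.any(eigvals < 0):
    print("saddle point")
else:
    print("inconclusive (a zero eigenvalue)")
```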

6.12 The Method Of Lagrange Multipliers

As an application of the implicit function theorem, consider the method of Lagrange
multipliers from calculus. Recall the problem is to maximize or minimize a function
subject to equality constraints. Let $f : U \to \mathbb{R}$ be a $C^1$ function where $U \subseteq \mathbb{R}^n$ and
let
$$g_i(x) = 0, \quad i = 1, \cdots, m \tag{6.12.41}$$
be a collection of equality constraints with $m < n$. Now consider the system of
nonlinear equations
$$f(x) = a,$$
$$g_i(x) = 0, \quad i = 1, \cdots, m.$$
$x_0$ is a local maximum if $f(x_0) \ge f(x)$ for all $x$ near $x_0$ which also satisfies the
constraints 6.12.41. A local minimum is defined similarly. Let $F : U \times \mathbb{R} \to \mathbb{R}^{m+1}$
be defined by
$$F(x, a) \equiv \begin{pmatrix} f(x) - a \\ g_1(x) \\ \vdots \\ g_m(x) \end{pmatrix}. \tag{6.12.42}$$
Now consider the $(m + 1) \times n$ Jacobian matrix, the matrix of the linear transformation
$D_1 F(x, a)$ with respect to the usual bases for $\mathbb{R}^n$ and $\mathbb{R}^{m+1}$:
$$\begin{pmatrix} f_{x_1}(x_0) & \cdots & f_{x_n}(x_0) \\ g_{1x_1}(x_0) & \cdots & g_{1x_n}(x_0) \\ \vdots & & \vdots \\ g_{mx_1}(x_0) & \cdots & g_{mx_n}(x_0) \end{pmatrix}.$$
If this matrix has rank $m + 1$ then some $(m + 1) \times (m + 1)$ submatrix has nonzero
determinant. It follows from the implicit function theorem that there exist $m + 1$
variables, $x_{i_1}, \cdots, x_{i_{m+1}}$, such that the system
$$F(x, a) = 0 \tag{6.12.43}$$
specifies these $m + 1$ variables as a function of the remaining $n - (m + 1)$ variables
and $a$ in an open set of $\mathbb{R}^{n-m}$. Thus there is a solution $(x, a)$ to 6.12.43 for some $x$

close to $x_0$ whenever $a$ is in some open interval. Therefore, $x_0$ cannot be either a
local minimum or a local maximum. It follows that if $x_0$ is either a local maximum
or a local minimum, then the above matrix must have rank less than $m + 1$ which
requires the rows to be linearly dependent. Thus, there exist $m$ scalars,
$$\lambda_1, \cdots, \lambda_m,$$
and a scalar $\mu$, not all zero, such that
$$\mu \begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1 \begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix}. \tag{6.12.44}$$
If the column vectors
$$\begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix}, \cdots, \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{6.12.45}$$
are linearly independent, then $\mu \ne 0$ and dividing by $\mu$ yields an expression of the
form
$$\begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1 \begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{6.12.46}$$
at every point $x_0$ which is either a local maximum or a local minimum. This proves
the following theorem.

Theorem 6.12.1 Let $U$ be an open subset of $\mathbb{R}^n$ and let $f : U \to \mathbb{R}$ be a $C^1$
function. Then if $x_0 \in U$ is either a local maximum or local minimum of $f$ subject
to the constraints 6.12.41, then 6.12.44 must hold for some scalars $\mu, \lambda_1, \cdots, \lambda_m$
not all equal to zero. If the vectors in 6.12.45 are linearly independent, it follows
that an equation of the form 6.12.46 holds.
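
A small worked instance of 6.12.46 with one constraint may help before the exercises; the example function and constraint below are illustrative assumptions, and SymPy is assumed available.

```python
# Condition 6.12.46 for m = 1: grad f = lambda * grad g at a constrained extremum.
# Example: minimize f = x^2 + y^2 subject to g = x + y - 2 = 0.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x**2 + y**2
g = x + y - 2

eqs = [sp.diff(f, x) - lam*sp.diff(g, x),
       sp.diff(f, y) - lam*sp.diff(g, y),
       g]
print(sp.solve(eqs, [x, y, lam]))   # [(1, 1, 2)]: the closest point to the origin on the line
```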

6.13 Exercises

1. Suppose $L \in L(X, Y)$ and suppose $L$ is one to one. Show there exists $r > 0$
such that for all $x \in X$,
$$||Lx|| \ge r\, ||x||.$$
Hint: You might argue that $|||x||| \equiv ||Lx||$ is a norm.

2. Show every polynomial, $\sum_{|\alpha| \le k} d_\alpha x^\alpha$, is $C^k$ for every $k$.

3. If $f : U \to \mathbb{R}$ where $U$ is an open set in $X$ and $f$ is $C^2$, show the mixed
Gateaux derivatives, $D_{v_1 v_2} f(x)$ and $D_{v_2 v_1} f(x)$, are equal.

4. Give an example of a function which is dierentiable everywhere but at some


point it fails to have continuous partial derivatives. Thus this function will
be an example of a dierentiable function which is not C 1 .

5. The existence of partial derivatives does not imply continuity as was shown
in an example. However, much more can be said than this. Consider
$$f(x, y) = \begin{cases} \dfrac{\left(x^2 - y^4\right)^2}{\left(x^2 + y^4\right)^2} & \text{if } (x, y) \ne (0, 0), \\ 1 & \text{if } (x, y) = (0, 0). \end{cases}$$
Show each Gateaux derivative, $D_v f(0)$, exists and equals $0$ for every $v$. Also
show each Gateaux derivative exists at every other point in $\mathbb{R}^2$. Now consider
the curve $x^2 = y^4$ and the curve $y = 0$ to verify the function fails to be con-
tinuous at $(0, 0)$. This is an example of an everywhere Gateaux differentiable
function which is not differentiable and not continuous.

6. Let $f$ be a real valued function defined on $\mathbb{R}^2$ by
$$f(x, y) \equiv \begin{cases} \dfrac{x^3 - y^3}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0) \\ 0 & \text{if } (x, y) = (0, 0) \end{cases}$$
Determine whether $f$ is continuous at $(0, 0)$. Find $f_x(0, 0)$ and $f_y(0, 0)$. Are
the partial derivatives of $f$ continuous at $(0, 0)$? Find
$$D_{(u,v)} f((0, 0)) \equiv \lim_{t \to 0} \frac{f(t(u, v))}{t}.$$
Is the mapping $(u, v) \to D_{(u,v)} f((0, 0))$ linear? Is $f$ differentiable at $(0, 0)$?

7. Let f : V R where V is a nite dimensional normed vector space. Suppose


f is convex which means

f (tx + (1 t) y) tf (x) + (1 t) f (y)

whenever t [0, 1]. Suppose also that f is dierentiable. Show then that for
every x, y V,
(Df (x) Df (y)) (x y) 0.

8. Suppose f : U V F where U is an open subset of V, a nite dimensional


inner product space with the inner product denoted by (, ) . Suppose f is
dierentiable. Show there exists a unique vector v (x) V such that

(u v (x)) = Df (x) u.

This special vector is called the gradient and is usually denoted by f (x) .
Hint: You might review the Riesz representation theorem presented earlier.

9. Suppose f : U Y where U is an open subset of X, a nite dimensional


normed vector space. Suppose that for all v X, Dv f (x) exists. Show
that whenever a F Dav f (x) = aDv f (x). Explain why if x Dv f (x) is
continuous then v Dv f (x) is linear. Show that if f is dierentiable at x,
then Dv f (x) = Df (x) v.
10. Suppose B is an open ball in X and f : B Y is dierentiable. Suppose also
there exists L L (X, Y ) such that

||Df (x) L|| < k

for all x B. Show that if x1 , x2 B,

|f (x1 ) f (x2 ) L (x1 x2 )| k |x1 x2 | .

Hint: Consider T x = f (x) Lx and argue ||DT (x)|| < k. Then consider
Theorem 6.4.2.
11. Let U be an open subset of X, f : U Y where X, Y are nite dimensional
normed vector spaces and suppose f C 1 (U ) and Df (x0 ) is one to one. Then
show f is one to one near x0 . Hint: Show using the assumption that f is C 1
that there exists > 0 such that if

x1 , x2 B (x0 , ) ,

then
r
|f (x1 ) f (x2 ) Df (x0 ) (x1 x2 )| |x1 x2 | (6.13.47)
2
then use Problem 1.
12. Suppose M L (X, Y ) and suppose M is onto. Show there exists L L (Y, X)
such that
LM x =P x
where P L (X, X), and P 2 = P . Also show L is one to one and onto. Hint:
Let {y1 , , ym } be a basis of Y and let M xi = yi . Then dene

m
m
Ly = i xi where y = i yi .
i=1 i=1

Show {x1 , , xm } is a linearly independent set and show you can obtain
{x1 , , xm , , xn }, a basis for X in which M xj = 0 for j > m. Then let

m
Px i xi
i=1

where

m
x= i xi .
i=1

13. This problem depends on the result of Problem 12. Let f : U X Y, f


is C 1 , and Df (x) is onto for each x U . Then show f maps open subsets
of U onto open sets in Y . Hint: Let P = LDf (x) as in Problem 12. Ar-
gue L maps open sets from Y to open sets of the vector space X1 P X
and L1 maps open sets from X1 to open sets of Y. Then Lf (x + v) =
Lf (x) + LDf (x) v + o (v) . Now for z X1 , let h (z) = Lf (x + z) Lf (x) .
Then h is C 1 on some small open subset of X1 containing 0 and Dh (0) =
LDf (x) which is seen to be one to one and onto and in L (X1 , X1 ) . There-
fore, if r is small enough, h (B (0,r)) equals an open set in X1 , V. This is by
the inverse function theorem. Hence L (f (x + B (0,r)) f (x)) = V and so
f (x + B (0,r)) f (x) = L1 (V ) , an open set in Y.

14. Suppose U R2 is an open set and f : U R3 is C 1 . Suppose Df (s0 , t0 )


has rank two and
x0
f (s0 , t0 ) = y0 .
z0
Show that for (s, t) near (s0 , t0 ), the points f (s, t) may be realized in one of
the following forms.

{(x, y, (x, y)) : (x, y) near (x0 , y0 )},

{( (y, z) , y, z) : (y, z) near (y0 , z0 )},


or
{(x, (x, z) , z, ) : (x, z) near (x0 , z0 )}.
This shows that parametrically dened surfaces can be obtained locally in a
particularly simple form.

15. Let $f : U \to Y$, let $Df(x)$ exist for all $x \in U$, let $B(x_0, \delta) \subseteq U$, and suppose there exists
$L \in L(X, Y)$ such that $L^{-1} \in L(Y, X)$ and for all $x \in B(x_0, \delta)$,
$$||Df(x) - L|| < \frac{r}{||L^{-1}||}, \quad r < 1.$$
Show that there exists $\varepsilon > 0$ and an open subset $V$ of $B(x_0, \delta)$ such that
$f : V \to B(f(x_0), \varepsilon)$ is one to one and onto. Also $Df^{-1}(y)$ exists for each
$y \in B(f(x_0), \varepsilon)$ and is given by the formula
$$Df^{-1}(y) = \left[Df\left(f^{-1}(y)\right)\right]^{-1}.$$
Hint: Let
$$T_y(x) \equiv T(x, y) \equiv x - L^{-1}(f(x) - y)$$
for $|y - f(x_0)| < \frac{(1 - r)\delta}{2\,||L^{-1}||}$, and consider $\left\{T_y^n(x_0)\right\}$. This is a version of the inverse
function theorem for $f$ only differentiable, not $C^1$.



16. Recall the $n^{th}$ derivative can be considered a multilinear function defined on
$X^n$ with values in some normed vector space. Now define a function, denoted
as $w_i v_{j_1} \cdots v_{j_n}$, which maps $X^n \to Y$ in the following way:
$$w_i v_{j_1} \cdots v_{j_n}(v_{k_1}, \cdots, v_{k_n}) \equiv w_i\, \delta_{j_1 k_1} \delta_{j_2 k_2} \cdots \delta_{j_n k_n}, \tag{6.13.48}$$
and $w_i v_{j_1} \cdots v_{j_n}$ is to be linear in each variable. Thus, for
$$\left(\sum_{k_1=1}^{n} a_{k_1} v_{k_1}, \cdots, \sum_{k_n=1}^{n} a_{k_n} v_{k_n}\right) \in X^n,$$
$$w_i v_{j_1} \cdots v_{j_n}\left(\sum_{k_1=1}^{n} a_{k_1} v_{k_1}, \cdots, \sum_{k_n=1}^{n} a_{k_n} v_{k_n}\right)
\equiv \sum_{k_1 k_2 \cdots k_n} w_i\left(a_{k_1} a_{k_2} \cdots a_{k_n}\right) \delta_{j_1 k_1} \delta_{j_2 k_2} \cdots \delta_{j_n k_n}
= w_i\, a_{j_1} a_{j_2} \cdots a_{j_n}. \tag{6.13.49}$$
Show each $w_i v_{j_1} \cdots v_{j_n}$ is an $n$ linear $Y$ valued function. Next show the set
of $n$ linear $Y$ valued functions is a vector space and these special functions,
$w_i v_{j_1} \cdots v_{j_n}$ for all choices of $i$ and the $j_k$, form a basis of this vector space. Find
the dimension of the vector space.
17. Minimize $\sum_{j=1}^{n} x_j$ subject to the constraint $\sum_{j=1}^{n} x_j^2 = a^2$. Your answer
should be some function of $a$ which you may assume is a positive number.
18. Find the point, (x, y, z) on the level surface, 4x2 + y 2 z 2 = 1which is closest
to (0, 0, 0) .
19. A curve is formed from the intersection of the plane, 2x + 3y + z = 3 and the
cylinder x2 + y 2 = 4. Find the point on this curve which is closest to (0, 0, 0) .
20. A curve is formed from the intersection of the plane, 2x + 3y + z = 3 and
the sphere x2 + y 2 + z 2 = 16. Find the point on this curve which is closest to
(0, 0, 0) .
21. Find the point on the plane, 2x + 3y + z = 4 which is closest to the point
(1, 2, 3) .
22. Let $A = (A_{ij})$ be an $n \times n$ matrix which is symmetric. Thus $A_{ij} = A_{ji}$,
and recall $(Ax)_i = A_{ij} x_j$ where as usual one sums over the repeated index. Show
$\frac{\partial}{\partial x_i}\left(A_{ij} x_j x_i\right) = 2A_{ij} x_j$. Show that when you use the method of Lagrange mul-
tipliers to maximize the function $A_{ij} x_j x_i$ subject to the constraint $\sum_{j=1}^{n} x_j^2 =
1$, the value of $\lambda$ which corresponds to the maximum value of this function is
such that $A_{ij} x_j = \lambda x_i$. Thus $Ax = \lambda x$, so $\lambda$ is an eigenvalue of the matrix
$A$.

23. Let x1 , , x5 be 5 positive numbers. Maximize their product subject to the


constraint that
x1 + 2x2 + 3x3 + 4x4 + 5x5 = 300.

24. Let $f(x_1, \cdots, x_n) = x_1^n x_2^{n-1} \cdots x_n^1$. Then $f$ achieves a maximum on the set
$$S \equiv \left\{x \in \mathbb{R}^n : \sum_{i=1}^{n} i\, x_i = 1 \text{ and each } x_i \ge 0\right\}.$$
If $x \in S$ is the point where this maximum is achieved, find $x_1 / x_n$.


25. Let (x, y) be a point on the ellipse, x2 /a2 + y 2 /b2 = 1 which is in the rst
quadrant. Extend the tangent line through (x, y) till it intersects the x and
y axes and let A (x, y) denote the area of the triangle formed by this line and
the two coordinate axes. Find the minimum value of the area of this triangle
as a function of a and b.
26. Maximize $\prod_{i=1}^{n} x_i^2$ $\left(\equiv x_1^2 x_2^2 x_3^2 \cdots x_n^2\right)$ subject to the constraint
$\sum_{i=1}^{n} x_i^2 = r^2$. Show the maximum is $\left(r^2/n\right)^n$. Now show from this that
$$\left(\prod_{i=1}^{n} x_i^2\right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i^2$$
and finally, conclude that if each number $x_i \ge 0$, then
$$\left(\prod_{i=1}^{n} x_i\right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i$$
and there exist values of the $x_i$ for which equality holds. This says the geo-
metric mean is always smaller than the arithmetic mean.
27. Maximize $x^2 y^2$ subject to the constraint
$$\frac{x^{2p}}{p} + \frac{y^{2q}}{q} = r^2$$
where $p, q$ are real numbers larger than $1$ which have the property that
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Show the maximum is achieved when $x^{2p} = y^{2q}$ and equals $r^2$. Now conclude
that if $x, y > 0$, then
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}$$
and there are values of $x$ and $y$ where this inequality is an equation.
Part II

Integration And Measure

Measures And Measurable
Functions

The integral to be discussed next is the Lebesgue integral. This integral is more
general than the Riemann integral of beginning calculus. It is not as easy to define
as this integral but it is vastly superior in every application. In fact, the Riemann
integral has been obsolete for over 100 years. There exist convergence theorems
for this integral which are not available for the Riemann integral and unlike the
Riemann integral, the Lebesgue integral generalizes readily to abstract settings used
in probability theory. Much of the analysis done in the last 100 years applies to the
Lebesgue integral. For these reasons, and because it is very easy to generalize the
Lebesgue integral to functions of many variables, I will present the Lebesgue integral
here. First it is convenient to discuss outer measures, measures, and measurable
functions in a general setting.

7.1 Open Coverings And Compactness

This is a good place to put an important theorem about compact sets. The definition
of what is meant by a compact set follows.

Definition 7.1.1 Let $\mathcal{U}$ denote a collection of open sets in a normed vector space.
Then $\mathcal{U}$ is said to be an open cover of a set $K$ if $K \subseteq \cup\, \mathcal{U}$. Let $K$ be a subset of
a normed vector space. Then $K$ is compact if whenever $\mathcal{U}$ is an open cover of $K$
there exist finitely many sets of $\mathcal{U}$, $\{U_1, \cdots, U_m\}$, such that
$$K \subseteq \cup_{k=1}^{m} U_k.$$
In words, every open cover admits a finite subcover.

It was shown earlier that in any finite dimensional normed vector space the closed
and bounded sets are those which are sequentially compact. The next theorem says
that in any normed vector space, sequentially compact and compact are the same.1

1 Actually, this is true more generally than for normed vector spaces. It is also true for metric
spaces, those on which there is a distance defined.

First here is a very interesting lemma about the existence of something called a
Lebesgue number, the number $r$ in the next lemma.

Lemma 7.1.2 Let $K$ be a sequentially compact set in a normed vector space and
let $\mathcal{U}$ be an open cover of $K$. Then there exists $r > 0$ such that if $x \in K$, then
$B(x, r)$ is a subset of some set of $\mathcal{U}$.

Proof: Suppose no such $r$ exists. Then in particular, $r = 1/n$ does not work for each
$n \in \mathbb{N}$. Therefore, there exists $x_n \in K$ such that $B\left(x_n, \frac{1}{n}\right)$ is not a subset of any of
the sets of $\mathcal{U}$. Since $K$ is sequentially compact, there exists a subsequence $\{x_{n_k}\}$
converging to a point $x$ of $K$. Then there exists $r > 0$ such that $B(x, r) \subseteq U \in \mathcal{U}$
because $\mathcal{U}$ is an open cover. Also $x_{n_k} \in B(x, r/2)$ for all $k$ large enough and also,
for all $k$ large enough, $1/n_k < r/2$. Therefore, there exists $x_{n_k} \in B(x, r/2)$ with
$1/n_k < r/2$. But this is a contradiction because
$$B(x_{n_k}, 1/n_k) \subseteq B(x, r) \subseteq U,$$
contrary to the choice of $x_{n_k}$ which required $B(x_{n_k}, 1/n_k)$ is not contained in any
set of $\mathcal{U}$. $\blacksquare$

Theorem 7.1.3 Let $K$ be a set in a normed vector space. Then $K$ is compact if
and only if $K$ is sequentially compact. In particular, if $K$ is a closed and bounded
subset of a finite dimensional normed vector space, then $K$ is compact.

Proof: Suppose first $K$ is sequentially compact and let $\mathcal{U}$ be an open cover.
Let $r$ be a Lebesgue number as described in Lemma 7.1.2. Pick $x_1 \in K$. Then
$B(x_1, r) \subseteq U_1$ for some $U_1 \in \mathcal{U}$. Suppose $\{B(x_i, r)\}_{i=1}^{m}$ have been chosen such that
$$B(x_i, r) \subseteq U_i \in \mathcal{U}.$$
If their union contains $K$ then $\{U_i\}_{i=1}^{m}$ is a finite subcover of $\mathcal{U}$. If $\{B(x_i, r)\}_{i=1}^{m}$
does not cover $K$, then there exists $x_{m+1} \notin \cup_{i=1}^{m} B(x_i, r)$ and so $B(x_{m+1}, r) \subseteq
U_{m+1} \in \mathcal{U}$. This process must stop after finitely many choices of $B(x_i, r)$ because
if not, $\{x_k\}_{k=1}^{\infty}$ would have a subsequence which converges to a point of $K$, which
cannot occur because whenever $i \ne j$,
$$||x_i - x_j|| > r.$$
Therefore, eventually
$$K \subseteq \cup_{k=1}^{m} B(x_k, r) \subseteq \cup_{k=1}^{m} U_k.$$
This proves one half of the theorem.
Now suppose $K$ is compact. I need to show it is sequentially compact. Suppose
it is not. Then there exists a sequence $\{x_k\}$ which has no convergent subsequence.
This requires that $\{x_k\}$ have no limit point, for if it did have a limit point $x$, then
$B(x, 1/n)$ would contain infinitely many distinct points of $\{x_k\}$ and so a subse-
quence of $\{x_k\}$ converging to $x$ could be obtained. Also no $x_k$ is repeated infinitely
often because if there were such, a convergent subsequence could be obtained. Hence
$\cup_{k=m}^{\infty} \{x_k\} \equiv C_m$ is a closed set, closed because it contains all its limit points. (It
has no limit points so it contains them all.) Then letting $U_m = C_m^C$, it follows $\{U_m\}$
is an open cover of $K$ which has no finite subcover. Thus $K$ must be sequentially
compact after all.
If $K$ is a closed and bounded set in a finite dimensional normed vector space,
then $K$ is sequentially compact by Theorem 4.8.4. Therefore, by the first part of
this theorem, it is compact. $\blacksquare$
Summarizing the above theorem along with Theorem 4.8.4 yields the following
corollary which is often called the Heine Borel theorem.

Corollary 7.1.4 Let $X$ be a finite dimensional normed vector space and let $K \subseteq X$.
Then the following are equivalent.

1. $K$ is closed and bounded.

2. $K$ is sequentially compact.

3. $K$ is compact.

There is also a very interesting result having to do with coverings of an arbitrary
set. The message is that you can always reduce to a countable subcover. Here $F = \mathbb{C}$
or $\mathbb{R}$.

Lemma 7.1.5 Let $||x|| \equiv \max\{|x_i|, i = 1, 2, \cdots, n\}$ for $x \in F^n$. Then every set $U$
which is open in $F^n$ is the countable union of balls of the form $B(x, r)$ where the
open ball is defined in terms of the above norm. Also, if $\mathcal{C}$ is any collection of open
sets, then there exists a countable subset $\widetilde{\mathcal{C}} \subseteq \mathcal{C}$ such that
$$\cup\, \mathcal{C} = \cup\, \widetilde{\mathcal{C}}.$$

Proof: By Theorem 4.8.3, if you consider the two normed vector spaces $(F^n, |\cdot|)$
and $(F^n, ||\cdot||)$, the identity map is continuous in both directions. Therefore, if a set
$U$ is open with respect to $|\cdot|$, it follows it is open with respect to $||\cdot||$ and the other
way around. The other thing to notice is that there exists a countable dense subset
of $F$. The rationals will work if $F = \mathbb{R}$ and if $F = \mathbb{C}$, then you use $\mathbb{Q} + i\mathbb{Q}$. Letting
$D$ be a countable dense subset of $F$, $D^n$ is a countable dense subset of $F^n$. It is
countable because it is a finite Cartesian product of countable sets and you can use
Theorem 1.1.7 of Page 18 repeatedly. It is dense because if $x \in F^n$, then by density
of $D$, there exists $d_j \in D$ such that
$$|d_j - x_j| < \varepsilon;$$
then $d \equiv (d_1, \cdots, d_n)$ is such that $||d - x|| < \varepsilon$.
Now consider the set of open balls,
$$\mathcal{B} \equiv \{B(d, r) : d \in D^n, r \in \mathbb{Q}\}.$$
This collection of open balls is countable by Theorem 1.1.7 of Page 18. I claim
every open set is the union of balls from $\mathcal{B}$. Let $U$ be an open set in $F^n$ and $x \in U$.
Then there exists $\delta > 0$ such that $B(x, \delta) \subseteq U$. There exists $d \in D^n \cap B(x, \delta/5)$.
Then pick a rational number $\delta/5 < r < 2\delta/5$. Consider the set of $\mathcal{B}$, $B(d, r)$. Then
$x \in B(d, r)$ because $r > \delta/5$. However, it is also the case that $B(d, r) \subseteq B(x, \delta)$
because if $y \in B(d, r)$ then
$$||y - x|| \le ||y - d|| + ||d - x|| < \frac{2\delta}{5} + \frac{\delta}{5} < \delta.$$
Let $\mathcal{B}'$ denote those sets of $\mathcal{B}$ which contain some point of $\cup\, \mathcal{C}$ and which are
contained in some set of $\mathcal{C}$. For each $B \in \mathcal{B}'$, let $U_B$ be a single set of $\mathcal{C}$ which
contains $B$. Consider a point $x$ of $\cup\, \mathcal{C}$. Then $x$ is in some set of $\mathcal{B}'$ just described.
Therefore,
$$x \in \cup\{U_B : B \in \mathcal{B}'\}.$$
Let $\widetilde{\mathcal{C}} \equiv \{U_B : B \in \mathcal{B}'\}$. Since $x$ is arbitrary, this shows that
$$\cup\, \mathcal{C} \subseteq \cup\, \widetilde{\mathcal{C}} \subseteq \cup\, \mathcal{C}. \;\blacksquare$$

The last condition, which says you can reduce to a countable subcovering, is
called the Lindelöf property.

7.2 An Outer Measure On $\mathcal{P}(\mathbb{R})$

A measure on $\mathbb{R}$ is like length. I will present something more general than length
because it is no trouble to do so and the generalization is useful in many areas of
mathematics such as probability.

Definition 7.2.1 The following definition is important.
$$F(x+) \equiv \lim_{y \to x+} F(y), \qquad F(x-) \equiv \lim_{y \to x-} F(y).$$
Thus one of these is the limit from the left and the other is the limit from the right.

Definition 7.2.2 $\mathcal{P}(S)$ denotes the set of all subsets of $S$.

Theorem 7.2.3 Let $F$ be an increasing function defined on $\mathbb{R}$. This will be called
an integrator function. There exists a function $\mu : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ which satisfies
the following properties.

1. If $A \subseteq B$, then $0 \le \mu(A) \le \mu(B)$, $\mu(\emptyset) = 0$.

2. $\mu\left(\cup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} \mu(A_i)$

3. $\mu([a, b]) = F(b+) - F(a-)$,

4. $\mu((a, b)) = F(b-) - F(a+)$,

5. $\mu((a, b]) = F(b+) - F(a+)$,

6. $\mu([a, b)) = F(b-) - F(a-)$.

Proof: First it is necessary to define the function $\mu$. This is contained in the
following definition.

Definition 7.2.4 For $A \subseteq \mathbb{R}$,
$$\mu(A) = \inf\left\{\sum_{i=1}^{\infty}\left(F(b_i-) - F(a_i+)\right) : A \subseteq \cup_{i=1}^{\infty}(a_i, b_i)\right\}.$$
In words, you look at all coverings of $A$ with open intervals. For each of these
open coverings, you add the lengths of the individual open intervals and you take
the infimum of all such numbers obtained.

Then 1.) is obvious because if a countable collection of open intervals covers $B$,
then it also covers $A$. Thus the set of numbers obtained for $B$ is smaller than the set
of numbers for $A$. Why is $\mu(\emptyset) = 0$? Pick a point of continuity of $F$. Such points exist
because $F$ is increasing and so it has only countably many points of discontinuity.
Let $a$ be this point. Then $\emptyset \subseteq (a - \delta, a + \delta)$ and so $\mu(\emptyset) \le F(a + \delta) - F(a - \delta)$
for every $\delta > 0$. Letting $\delta \to 0$, it follows that $\mu(\emptyset) = 0$.
Consider 2.). If any $\mu(A_i) = \infty$, there is nothing to prove. The assertion simply
is $\infty \le \infty$. Assume then that $\mu(A_i) < \infty$ for all $i$. Then for each $m \in \mathbb{N}$ there
exists a countable set of open intervals, $\left\{\left(a_i^m, b_i^m\right)\right\}_{i=1}^{\infty}$, such that
$$\mu(A_m) + \frac{\varepsilon}{2^m} > \sum_{i=1}^{\infty}\left(F(b_i^m-) - F(a_i^m+)\right).$$
Then using Theorem 1.3.4 on Page 26,
$$\mu\left(\cup_{m=1}^{\infty} A_m\right) \le \sum_{i,m}\left(F(b_i^m-) - F(a_i^m+)\right) = \sum_{m=1}^{\infty}\sum_{i=1}^{\infty}\left(F(b_i^m-) - F(a_i^m+)\right)$$
$$\le \sum_{m=1}^{\infty}\left(\mu(A_m) + \frac{\varepsilon}{2^m}\right) = \sum_{m=1}^{\infty} \mu(A_m) + \varepsilon,$$
and since $\varepsilon$ is arbitrary, this establishes 2.).
Next consider 3.). By definition, there exists a sequence of open intervals,
$\{(a_i, b_i)\}_{i=1}^{\infty}$, whose union contains $[a, b]$, such that
$$\mu([a, b]) + \varepsilon \ge \sum_{i=1}^{\infty}\left(F(b_i-) - F(a_i+)\right).$$
By Theorem 7.1.3, finitely many of these intervals also cover $[a, b]$. It follows there
exist finitely many of these intervals, denoted as $\{(a_i, b_i)\}_{i=1}^{n}$, which overlap, such
that $a \in (a_1, b_1)$, $b_1 \in (a_2, b_2)$, $\cdots$, $b \in (a_n, b_n)$. Therefore,
$$\mu([a, b]) + \varepsilon \ge \sum_{i=1}^{n}\left(F(b_i-) - F(a_i+)\right).$$
It follows, since these overlapping intervals cover $[a, b]$,
$$\sum_{i=1}^{n}\left(F(b_i-) - F(a_i+)\right) \ge F(b+) - F(a-),$$
and so $\mu([a, b]) \ge F(b+) - F(a-) - \varepsilon$. On the other hand, $[a, b] \subseteq (a - \delta, b + \delta)$, so from the
definition of $\mu$,
$$\mu([a, b]) \le F((b + \delta)-) - F((a - \delta)+) \le F(b + \delta) - F(a - \delta).$$
Therefore,
$$F(b + \delta) - F(a - \delta) \ge \mu([a, b]) \ge F(b+) - F(a-) - \varepsilon.$$
Letting $\delta \to 0$,
$$F(b+) - F(a-) \ge \mu([a, b]) \ge F(b+) - F(a-) - \varepsilon.$$
Since $\varepsilon$ is arbitrary, this shows
$$\mu([a, b]) = F(b+) - F(a-),$$
which establishes 3.).
Consider 4.). For small $\delta > 0$,
$$\mu([a + \delta, b - \delta]) \le \mu((a, b)) \le \mu([a, b]).$$
Therefore, from 3.) and the definition of $\mu$,
$$F(b - \delta) - F(a + \delta) \le F((b - \delta)+) - F((a + \delta)-)
= \mu([a + \delta, b - \delta]) \le \mu((a, b)) \le F(b-) - F(a+).$$
Now letting $\delta$ decrease to $0$ it follows
$$F(b-) - F(a+) \le \mu((a, b)) \le F(b-) - F(a+).$$
This shows 4.).
Consider 5.). From 3.) and 4.), for small $\delta > 0$,
$$F(b+) - F(a+) \le F(b+) - F((a + \delta)-) = \mu([a + \delta, b]) \le \mu((a, b])$$
$$\le \mu((a, b + \delta)) = F((b + \delta)-) - F(a+) \le F(b + \delta) - F(a+).$$
Now let $\delta$ converge to $0$ from above to obtain
$$F(b+) - F(a+) \le \mu((a, b]) \le F(b+) - F(a+).$$
This establishes 5.), and 6.) is entirely similar to 5.). $\blacksquare$
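
To make the roles of the one sided limits concrete (this worked example is an addition, not part of the original text), take the integrator function $F(x) = x$ for $x < 0$ and $F(x) = x + 1$ for $x \ge 0$, which jumps by $1$ at $0$, so $F(0-) = 0$ and $F(0+) = 1$. The theorem then gives
$$\mu([0, 1]) = F(1+) - F(0-) = 2, \qquad \mu((0, 1]) = F(1+) - F(0+) = 1,$$
$$\mu((0, 1)) = F(1-) - F(0+) = 1, \qquad \mu(\{0\}) = \mu([0, 0]) = F(0+) - F(0-) = 1,$$
so the single point $0$ carries measure $1$, exactly the size of the jump in $F$.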


The first two conditions of the above theorem are so important that we give
something satisfying them a special name.

Definition 7.2.5 Let $\Omega$ be a nonempty set. A function $\mu$ mapping $\mathcal{P}(\Omega) \to [0, \infty]$ is
called an outer measure if it satisfies

1. If $A \subseteq B$, then $0 \le \mu(A) \le \mu(B)$, $\mu(\emptyset) = 0$.

2. $\mu\left(\cup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} \mu(A_i)$.

7.3 Measures And Measure Spaces

First here is a definition of a measure. This is what we want to eventually get.
What has been obtained above on $\mathcal{P}(\mathbb{R})$ is merely an outer measure along with its
values on intervals.

Definition 7.3.1 $\mathcal{S} \subseteq \mathcal{P}(\Omega)$ is called a $\sigma$ algebra, pronounced "sigma algebra," if
$$\emptyset, \Omega \in \mathcal{S},$$
$$\text{if } E \in \mathcal{S} \text{ then } E^C \in \mathcal{S},$$
and
$$\text{if } E_i \in \mathcal{S} \text{ for } i = 1, 2, \cdots, \text{ then } \cup_{i=1}^{\infty} E_i \in \mathcal{S}.$$
A function $\mu : \mathcal{S} \to [0, \infty]$ where $\mathcal{S}$ is a $\sigma$ algebra is called a measure if whenever
$\{E_i\}_{i=1}^{\infty} \subseteq \mathcal{S}$ and the $E_i$ are disjoint, then it follows
$$\mu\left(\cup_{j=1}^{\infty} E_j\right) = \sum_{j=1}^{\infty} \mu(E_j).$$
The triple $(\Omega, \mathcal{S}, \mu)$ is often called a measure space. Sometimes people refer to
$(\Omega, \mathcal{S})$ as a measurable space, making no reference to the measure. Sometimes $(\Omega, \mathcal{S})$
may also be called a measure space.

The following lemma is obvious and its proof is left for you.

Lemma 7.3.2 Let $\mathcal{G}$ denote a set whose elements are $\sigma$ algebras of subsets of $\Omega$. Then $\cap\, \mathcal{G}$ is also
a $\sigma$ algebra.

For $\mathcal{S} \subseteq \mathcal{P}(\Omega)$, $\sigma(\mathcal{S})$ denotes the smallest $\sigma$ algebra which contains $\mathcal{S}$. Such a
smallest $\sigma$ algebra exists because $\mathcal{P}(\Omega)$ is a $\sigma$ algebra. Hence, letting $\mathcal{G}$ denote the
set of all $\sigma$ algebras which contain $\mathcal{S}$, it follows $\mathcal{G} \ne \emptyset$. Therefore, $\cap\, \mathcal{G}$ is nonempty
and $\sigma(\mathcal{S}) \equiv \cap\, \mathcal{G}$.

Definition 7.3.3 In any topological space, the Borel sets consist of the smallest
$\sigma$ algebra which contains the open sets. Thus, denoting by $\tau$ the collection of open
sets, the Borel sets are written $\mathcal{B}(\tau) \equiv \sigma(\tau)$.

Theorem 7.3.4 Let $\{E_m\}_{m=1}^{\infty}$ be a sequence of measurable sets in a measure space
$(\Omega, \mathcal{F}, \mu)$. Then if $E_n \subseteq E_{n+1} \subseteq E_{n+2} \subseteq \cdots$,
$$\mu\left(\cup_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mu(E_n), \tag{7.3.1}$$
and if $E_n \supseteq E_{n+1} \supseteq E_{n+2} \supseteq \cdots$ and $\mu(E_1) < \infty$, then
$$\mu\left(\cap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mu(E_n). \tag{7.3.2}$$
Stated more succinctly, $E_k \uparrow E$ implies $\mu(E_k) \uparrow \mu(E)$ and $E_k \downarrow E$ with $\mu(E_1) <
\infty$ implies $\mu(E_k) \downarrow \mu(E)$.

Proof: First note that $\cap_{i=1}^{\infty} E_i = \left(\cup_{i=1}^{\infty} E_i^C\right)^C \in \mathcal{F}$, so $\cap_{i=1}^{\infty} E_i$ is measurable.
Also note that for $A$ and $B$ sets of $\mathcal{F}$, $A \setminus B \equiv A \cap B^C \in \mathcal{F}$. To show 7.3.1, note
that 7.3.1 is obviously true if $\mu(E_k) = \infty$ for any $k$. Therefore, assume $\mu(E_k) < \infty$
for all $k$. Thus
$$\mu(E_{k+1} \setminus E_k) + \mu(E_k) = \mu(E_{k+1})$$
and so
$$\mu(E_{k+1} \setminus E_k) = \mu(E_{k+1}) - \mu(E_k).$$
Also,
$$\cup_{k=1}^{\infty} E_k = E_1 \cup \bigcup_{k=1}^{\infty}\left(E_{k+1} \setminus E_k\right)$$
and the sets in the above union are disjoint. Hence
$$\mu\left(\cup_{i=1}^{\infty} E_i\right) = \mu(E_1) + \sum_{k=1}^{\infty} \mu(E_{k+1} \setminus E_k) = \mu(E_1)
+ \sum_{k=1}^{\infty}\left(\mu(E_{k+1}) - \mu(E_k)\right)$$
$$= \mu(E_1) + \lim_{n \to \infty} \sum_{k=1}^{n}\left(\mu(E_{k+1}) - \mu(E_k)\right) = \lim_{n \to \infty} \mu(E_{n+1}).$$
This shows part 7.3.1.
To verify 7.3.2,
$$\mu(E_1) = \mu\left(\cap_{i=1}^{\infty} E_i\right) + \mu\left(E_1 \setminus \cap_{i=1}^{\infty} E_i\right);$$
since $\mu(E_1) < \infty$, it follows $\mu\left(\cap_{i=1}^{\infty} E_i\right) < \infty$. Also, $E_1 \setminus \cap_{i=1}^{n} E_i \uparrow E_1 \setminus \cap_{i=1}^{\infty} E_i$ and
so by 7.3.1,
$$\mu(E_1) - \mu\left(\cap_{i=1}^{\infty} E_i\right) = \mu\left(E_1 \setminus \cap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mu\left(E_1 \setminus \cap_{i=1}^{n} E_i\right)$$
$$= \mu(E_1) - \lim_{n \to \infty} \mu\left(\cap_{i=1}^{n} E_i\right) = \mu(E_1) - \lim_{n \to \infty} \mu(E_n).$$
Hence, subtracting $\mu(E_1)$ from both sides,
$$\lim_{n \to \infty} \mu(E_n) = \mu\left(\cap_{i=1}^{\infty} E_i\right). \;\blacksquare$$

The following definition is important.

Definition 7.3.5 If something happens except for on a set of measure zero, then
it is said to happen a.e., "almost everywhere." For example, $\{f_k(x)\}$ is said to
converge to $f(x)$ a.e. if there is a set of measure zero, $N$, such that if $x \notin N$, then
$f_k(x) \to f(x)$.

7.4 Measures From Outer Measures

Earlier an outer measure on $\mathcal{P}(\mathbb{R})$ was constructed. This can be used to obtain a
measure defined on $\mathbb{R}$. However, the procedure for doing so is a special case of a
general approach due to Caratheodory in about 1918.

Definition 7.4.1 Let $\Omega$ be a nonempty set and let $\mu : \mathcal{P}(\Omega) \to [0, \infty]$ be an outer
measure. For $E \subseteq \Omega$, $E$ is $\mu$ measurable if for all $S \subseteq \Omega$,
$$\mu(S) = \mu(S \setminus E) + \mu(S \cap E). \tag{7.4.3}$$

To help in remembering 7.4.3, think of a measurable set $E$ as a process which
divides a given set into two pieces, the part in $E$ and the part not in $E$ as in 7.4.3.
In the Bible, there are several incidents recorded in which a process of division
resulted in more stuff than was originally present.2 Measurable sets are exactly
those which are incapable of such a miracle. You might think of the measurable
sets as the nonmiraculous sets. The idea is to show that they form a $\sigma$ algebra on
which the outer measure is a measure.
First here is a definition and a lemma.

Definition 7.4.2 $(\mu\lfloor S)(A) \equiv \mu(S \cap A)$ for all $A \subseteq \Omega$. Thus $\mu\lfloor S$ is the name of a
new outer measure, called $\mu$ restricted to $S$.

The next lemma indicates that the property of measurability is not lost by
considering this restricted measure.

Lemma 7.4.3 If $A$ is $\mu$ measurable, then $A$ is $\mu\lfloor S$ measurable.

2 1 Kings 17, 2 Kings 4, Matthew 14, and Matthew 15 all contain such descriptions. The stuff
involved was either oil, bread, flour or fish. In mathematics such things have also been done with
sets. In the book by Bruckner, Bruckner and Thompson there is an interesting discussion of the
Banach Tarski paradox which says it is possible to divide a ball in $\mathbb{R}^3$ into five disjoint pieces and
assemble the pieces to form two disjoint balls of the same size as the first. The details can be
found in: The Banach Tarski Paradox by Wagon, Cambridge University Press, 1985. It is known
that all such examples must involve the axiom of choice.

Proof: Suppose $A$ is $\mu$ measurable. It is desired to show that for all $T \subseteq \Omega$,
$$(\mu\lfloor S)(T) = (\mu\lfloor S)(T \cap A) + (\mu\lfloor S)(T \setminus A).$$
Thus it is desired to show
$$\mu(S \cap T) = \mu(T \cap A \cap S) + \mu(T \cap S \cap A^C). \tag{7.4.4}$$
But 7.4.4 holds because $A$ is $\mu$ measurable. Apply Definition 7.4.1 to $S \cap T$ instead
of $S$. $\blacksquare$
If $A$ is $\mu\lfloor S$ measurable, it does not follow that $A$ is $\mu$ measurable. Indeed, if
you believe in the existence of non measurable sets, you could let $A = S$ for such a
non measurable set and verify that $S$ is $\mu\lfloor S$ measurable.
The next theorem is the main result on outer measures which shows that starting
with an outer measure you can obtain a measure.

Theorem 7.4.4 Let $\Omega$ be a set and let $\mu$ be an outer measure on $\mathcal{P}(\Omega)$. The
collection of $\mu$ measurable sets $\mathcal{S}$ forms a $\sigma$ algebra and
$$\text{if } F_i \in \mathcal{S}, \; F_i \cap F_j = \emptyset, \text{ then } \mu\left(\cup_{i=1}^{\infty} F_i\right) = \sum_{i=1}^{\infty} \mu(F_i). \tag{7.4.5}$$
If $F_n \subseteq F_{n+1}$, then if $F = \cup_{n=1}^{\infty} F_n$ and $F_n \in \mathcal{S}$, it follows that
$$\mu(F) = \lim_{n \to \infty} \mu(F_n). \tag{7.4.6}$$
If $F_n \supseteq F_{n+1}$, and if $F = \cap_{n=1}^{\infty} F_n$ for $F_n \in \mathcal{S}$, then if $\mu(F_1) < \infty$,
$$\mu(F) = \lim_{n \to \infty} \mu(F_n). \tag{7.4.7}$$
This measure space is also complete, which means that if $\mu(F) = 0$ for some $F \in \mathcal{S}$,
then if $G \subseteq F$, it follows $G \in \mathcal{S}$ also.

Proof: First note that $\emptyset$ and $\Omega$ are obviously in $\mathcal{S}$. Now suppose $A, B \in \mathcal{S}$. I
will show $A \setminus B \equiv A \cap B^C$ is in $\mathcal{S}$. To do so, consider the following picture.

[Figure: a Venn diagram of a set $S$ together with $A$ and $B$, showing the four regions
$S \cap A \cap B$, $S \cap A \cap B^C$, $S \cap A^C \cap B$, and $S \cap A^C \cap B^C$.]

First note that
$$S \setminus (A \setminus B) = (S \setminus A) \cup (S \cap B) = (S \setminus A) \cup (A \cap B \cap S).$$
Since $\mu$ is subadditive and $A, B \in \mathcal{S}$,
$$\mu(S) \le \mu(S \setminus (A \setminus B)) + \mu(S \cap (A \setminus B))$$
$$\le \mu(S \setminus A) + \mu(S \cap B \cap A) + \mu\left(S \cap A \cap B^C\right)
= \mu(S \setminus A) + \mu(S \cap A) = \mu(S),$$
and so all the inequalities are equal signs. Therefore, since $S$ is arbitrary, this shows
$A \setminus B \in \mathcal{S}$.
Since $\Omega \in \mathcal{S}$, this shows that $A \in \mathcal{S}$ if and only if $A^C \in \mathcal{S}$. Now if $A, B \in \mathcal{S}$,
$A \cup B = (A^C \cap B^C)^C = (A^C \setminus B)^C \in \mathcal{S}$. By induction, if $A_1, \cdots, A_n \in \mathcal{S}$, then so
is $\cup_{i=1}^{n} A_i$. If $A, B \in \mathcal{S}$, with $A \cap B = \emptyset$,
$$\mu(A \cup B) = \mu((A \cup B) \cap A) + \mu((A \cup B) \setminus A) = \mu(A) + \mu(B).$$
By induction, if $A_i \cap A_j = \emptyset$ and $A_i \in \mathcal{S}$,
$$\mu\left(\cup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mu(A_i). \tag{7.4.8}$$
Now let $A = \cup_{i=1}^{\infty} A_i$ where $A_i \cap A_j = \emptyset$ for $i \ne j$.
$$\sum_{i=1}^{\infty} \mu(A_i) \ge \mu(A) \ge \mu\left(\cup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mu(A_i).$$
Since this holds for all $n$, you can take the limit as $n \to \infty$ and conclude
$$\sum_{i=1}^{\infty} \mu(A_i) = \mu(A),$$
which establishes 7.4.5.
Consider part 7.4.6. Without loss of generality $\mu(F_k) < \infty$ for all $k$ since
otherwise there is nothing to show. Suppose $\{F_k\}$ is an increasing sequence of sets
of $\mathcal{S}$. Then letting $F_0 \equiv \emptyset$, $\{F_{k+1} \setminus F_k\}_{k=0}^{\infty}$ is a sequence of disjoint sets of $\mathcal{S}$ since
it was shown above that the difference of two sets of $\mathcal{S}$ is in $\mathcal{S}$. Also note that from
7.4.8,
$$\mu(F_{k+1} \setminus F_k) + \mu(F_k) = \mu(F_{k+1})$$
and so if $\mu(F_k) < \infty$, then
$$\mu(F_{k+1} \setminus F_k) = \mu(F_{k+1}) - \mu(F_k).$$
Therefore, letting
$$F \equiv \cup_{k=1}^{\infty} F_k,$$

which also equals
$$\cup_{k=0}^{\infty}\left(F_{k+1} \setminus F_k\right),$$
it follows from part 7.4.5 just shown that
$$\mu(F) = \sum_{k=0}^{\infty} \mu(F_{k+1} \setminus F_k) = \lim_{n \to \infty} \sum_{k=0}^{n} \mu(F_{k+1} \setminus F_k)
= \lim_{n \to \infty} \sum_{k=0}^{n}\left(\mu(F_{k+1}) - \mu(F_k)\right) = \lim_{n \to \infty} \mu(F_{n+1}).$$
In order to establish 7.4.7, let the $F_n$ be as given there. Then, since $(F_1 \setminus F_n)$
increases to $(F_1 \setminus F)$, 7.4.6 implies
$$\lim_{n \to \infty}\left(\mu(F_1) - \mu(F_n)\right) = \mu(F_1 \setminus F).$$
Now $\mu(F_1 \setminus F) + \mu(F) \ge \mu(F_1)$ and so $\mu(F_1 \setminus F) \ge \mu(F_1) - \mu(F)$. Hence
$$\lim_{n \to \infty}\left(\mu(F_1) - \mu(F_n)\right) = \mu(F_1 \setminus F) \ge \mu(F_1) - \mu(F),$$
which implies
$$\lim_{n \to \infty} \mu(F_n) \le \mu(F).$$
But since $F \subseteq F_n$,
$$\mu(F) \le \lim_{n \to \infty} \mu(F_n),$$
and this establishes 7.4.7. Note that it was assumed $\mu(F_1) < \infty$ because $\mu(F_1)$ was
subtracted from both sides.
It remains to show $\mathcal{S}$ is closed under countable unions. Recall that if $A \in \mathcal{S}$, then
$A^C \in \mathcal{S}$ and $\mathcal{S}$ is closed under finite unions. Let $A_i \in \mathcal{S}$, $A = \cup_{i=1}^{\infty} A_i$, $B_n = \cup_{i=1}^{n} A_i$.
Then
$$\mu(S) = \mu(S \cap B_n) + \mu(S \setminus B_n) \tag{7.4.9}$$
$$= (\mu\lfloor S)(B_n) + (\mu\lfloor S)(B_n^C).$$
By Lemma 7.4.3, $B_n$ is $(\mu\lfloor S)$ measurable and so is $B_n^C$. I want to show $\mu(S) \ge
\mu(S \setminus A) + \mu(S \cap A)$. If $\mu(S) = \infty$, there is nothing to prove. Assume $\mu(S) < \infty$.
Then apply Parts 7.4.7 and 7.4.6 to the outer measure $\mu\lfloor S$ in 7.4.9 and let $n \to \infty$.
Thus
$$B_n \uparrow A, \quad B_n^C \downarrow A^C,$$
and this yields $\mu(S) = (\mu\lfloor S)(A) + (\mu\lfloor S)(A^C) = \mu(S \cap A) + \mu(S \setminus A)$.
Therefore $A \in \mathcal{S}$ and this proves Parts 7.4.5, 7.4.6, and 7.4.7.
It only remains to verify the assertion about completeness. Letting $G$ and $F$ be
as described above, let $S \subseteq \Omega$. I need to verify
$$\mu(S) \ge \mu(S \cap G) + \mu(S \setminus G).$$

However,
$$\mu(S \cap G) + \mu(S \setminus G) \le \mu(S \cap F) + \mu(S \setminus F) + \mu(F \setminus G)
= \mu(S \cap F) + \mu(S \setminus F) = \mu(S)$$
because by assumption, $\mu(F \setminus G) \le \mu(F) = 0$. $\blacksquare$

Corollary 7.4.5 Completeness is the same as saying that if $(E \setminus E') \cup (E' \setminus E) \subseteq
N \in \mathcal{F}$ and $\mu(N) = 0$, then if $E \in \mathcal{F}$, it follows that $E' \in \mathcal{F}$ also.

Proof: If the new condition holds, then suppose $G \subseteq F$ where $\mu(F) = 0$, $F \in \mathcal{F}$.
Then
$$(G \setminus F) \cup (F \setminus G) = \overbrace{(G \setminus F)}^{=\,\emptyset} \cup (F \setminus G) \subseteq F$$
and $\mu(F)$ is given to equal $0$. Therefore, $G \in \mathcal{F}$.
Now suppose the earlier version of completeness and let
$$(E \setminus E') \cup (E' \setminus E) \subseteq N \in \mathcal{F}$$
where $\mu(N) = 0$ and $E \in \mathcal{F}$. Then we know
$$(E \setminus E'), \; (E' \setminus E) \in \mathcal{F}$$
and both have measure zero. It follows $E \setminus (E \setminus E') = E \cap E' \in \mathcal{F}$. Hence
$$E' = (E \cap E') \cup (E' \setminus E) \in \mathcal{F}. \;\blacksquare$$

The measure which results from the outer measure of Theorem 7.2.3 is called
the Lebesgue Stieltjes measure associated with the integrator function $F$. Its prop-
erties will be discussed more later.

7.5 Measurable Functions

The integral will be defined on measurable functions, which is the next topic con-
sidered. It is sometimes convenient to allow functions to take the value $+\infty$. You
should think of $+\infty$, usually referred to as $\infty$, as something out at the right end
of the real line and its only importance is the notion of sequences converging to
it. $x_n \to \infty$ exactly when for all $l \in \mathbb{R}$, there exists $N$ such that if $n \ge N$, then
$x_n > l$. This is what it means for a sequence to converge to $\infty$. Don't think of $\infty$
as a number. It is just a convenient symbol which allows the consideration of
some limit operations more simply. Similar considerations apply to $-\infty$ but this
value is not of very great interest. In fact the set of most interest for the values of
a function $f$ is the complex numbers or, more generally, some normed vector space.
Recall the notation
$$f^{-1}(A) \equiv \{\omega : f(\omega) \in A\} \equiv [f(\omega) \in A] \equiv [f \in A]$$
in whatever context the notation occurs.

Lemma 7.5.1 Let $f : \Omega \to (-\infty, \infty]$ where $\mathcal{F}$ is a $\sigma$ algebra of subsets of $\Omega$. Then
the following are equivalent.
$$f^{-1}((d, \infty]) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}((-\infty, d)) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}([d, \infty]) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}((-\infty, d]) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}((a, b)) \in \mathcal{F} \text{ for all } a < b, \; -\infty < a < b < \infty.$$

Proof: First note that the first and the third are equivalent. To see this, observe
$$f^{-1}([d, \infty]) = \cap_{n=1}^{\infty} f^{-1}((d - 1/n, \infty]),$$
and so if the first condition holds, then so does the third.
$$f^{-1}((d, \infty]) = \cup_{n=1}^{\infty} f^{-1}([d + 1/n, \infty]),$$
and so if the third condition holds, so does the first.
Similarly, the second and fourth conditions are equivalent. Now
$$f^{-1}((-\infty, d]) = \left(f^{-1}((d, \infty])\right)^C,$$
so the first and fourth conditions are equivalent. Thus the first four conditions are
equivalent and if any of them hold, then for $-\infty < a < b < \infty$,
$$f^{-1}((a, b)) = f^{-1}((-\infty, b)) \cap f^{-1}((a, \infty]) \in \mathcal{F}.$$
Finally, if the last condition holds,
$$f^{-1}([d, \infty]) = \left(\cup_{k=1}^{\infty} f^{-1}((-k + d, d))\right)^C \in \mathcal{F},$$
and so the third condition holds. Therefore, all five conditions are equivalent. $\blacksquare$
This lemma allows for the following definition of a measurable function having
values in $(-\infty, \infty]$.

Definition 7.5.2 Let $(\Omega, \mathcal{F})$ be a measure space and let $f : \Omega \to (-\infty, \infty]$. Then $f$
is said to be $\mathcal{F}$ measurable if any of the equivalent conditions of Lemma 7.5.1 hold.
More generally, when you have a measure space $(\Omega, \mathcal{F})$ and a function $g : \Omega \to X$
where $X$ is a topological space, we say that $g$ is measurable if $g^{-1}(U) \in \mathcal{F}$ for all
open sets $U$.

In the case of $(-\infty, \infty]$, you can start with the general definition of measurability
just given and conclude that all of the equivalent conclusions in the above lemma
hold. This was contained in the above proof.

Theorem 7.5.3 Let $f_n$ and $f$ be functions mapping $\Omega$ to $(-\infty, \infty]$ where $\mathcal{F}$ is a $\sigma$
algebra of measurable sets of $\Omega$. Then if each $f_n$ is measurable, and $f(\omega) = \lim_{n \to \infty} f_n(\omega)$,
it follows that $f$ is also measurable. (Pointwise limits of measurable functions are
measurable.)

Proof: The idea is to show $f^{-1}((a, \infty]) \in \mathcal{F}$. For each fixed $r \in \mathbb{N}$,
$$\cup_{k=1}^{\infty} \cap_{n=k}^{\infty} f_n^{-1}\left(\left(a + \tfrac{1}{r}, \infty\right]\right) \subseteq f^{-1}\left(\left[a + \tfrac{1}{r}, \infty\right]\right) \subseteq f^{-1}((a, \infty])$$
because if $f_n(\omega) > a + \frac{1}{r}$ for all $n$ large enough, then $f(\omega) \in \left[a + \frac{1}{r}, \infty\right]$. Also,
$f(\omega) > a$ implies that for some $r$ and all $n$ large enough, $f_n(\omega) > a + \frac{1}{r}$. Now simply take a
union over all $r \in \mathbb{N}$. Thus
$$f^{-1}((a, \infty]) \subseteq \cup_{r=1}^{\infty} \cup_{k=1}^{\infty} \cap_{n=k}^{\infty} f_n^{-1}\left(\left(a + \tfrac{1}{r}, \infty\right]\right)
\subseteq \cup_{r=1}^{\infty} f^{-1}\left(\left[a + \tfrac{1}{r}, \infty\right]\right) = f^{-1}((a, \infty]).$$
The expression in the middle is measurable because each $f_n$ is. $\blacksquare$

Notation 7.5.4 For $E$ a set, let $\mathcal{X}_E(\omega)$ be defined by
$$\mathcal{X}_E(\omega) = \begin{cases} 1 & \text{if } \omega \in E \\ 0 & \text{if } \omega \notin E \end{cases}$$

Let $(\Omega, \mathcal{F})$ be a measure space. A simple function is one which is measurable and
has finitely many values. Thus every simple function is of the form
$$s(\omega) \equiv \sum_{k=1}^{m} c_k \mathcal{X}_{E_k}(\omega),$$
the $c_k$ being the distinct values of $s$, $E_k$ being the set on which $s = c_k$. A nonnegative
simple function is one in which the $c_k \in [0, \infty]$.

Lemma 7.5.5 Let $s(\omega) = \sum_{k=1}^{m} c_k \mathcal{X}_{E_k}(\omega)$ where the $c_k$ are the distinct values of
$s$, the $E_k$ being disjoint. Then $s$ is measurable if and only if each $E_k$ is measurable.

Proof: Suppose first $s$ is measurable. Let $\delta$ be small enough that $E_k =
s^{-1}((c_k - \delta, c_k + \delta))$. That is, no other values of the $c_j$ are in this interval. Then
it follows $E_k$ is measurable because $s$ is given to be measurable.
Next suppose each $E_k$ is measurable. Then $[s > \alpha]$ consists of a finite union of the
$E_k$ where $c_k > \alpha$. $\blacksquare$
Note that $f^{-1}([a, b)) = f^{-1}((-\infty, a))^C \cap f^{-1}((-\infty, b))$, which is measurable.

Theorem 7.5.6 A function $f \ge 0$ is measurable with respect to the measure space
$(\Omega, \mathcal{F})$ if and only if there exists a sequence of nonnegative simple functions $\{s_n\}$
satisfying
$$0 \le s_n(\omega), \tag{7.5.10}$$
$$s_n(\omega) \le s_{n+1}(\omega),$$
$$f(\omega) = \lim_{n \to \infty} s_n(\omega) \text{ for all } \omega. \tag{7.5.11}$$
If $f$ is bounded and measurable, the convergence is actually uniform.

Proof: First suppose that $f$ is measurable.
$$f^{-1}([a, b)) = f^{-1}((-\infty, a))^C \cap f^{-1}((-\infty, b)) \in \mathcal{F}.$$
Letting $I \equiv \{\omega : f(\omega) = \infty\}$, define
$$t_n(\omega) = \sum_{k=0}^{2^n} \frac{k}{n}\, \mathcal{X}_{f^{-1}([k/n, (k+1)/n))}(\omega) + n\, \mathcal{X}_I(\omega).$$
Then $t_n(\omega) \le f(\omega)$ for all $\omega$ and $\lim_{n \to \infty} t_n(\omega) = f(\omega)$ for all $\omega$. This is because
$t_n(\omega) = n$ for $\omega \in I$ and if $f(\omega) \in \left[0, \frac{2^n + 1}{n}\right)$, then
$$0 \le f(\omega) - t_n(\omega) \le \frac{1}{n}. \tag{7.5.12}$$
Thus whenever $\omega \notin I$, the above inequality will hold for all $n$ large enough. Let
$$s_1 = t_1, \quad s_2 = \max(t_1, t_2), \quad s_3 = \max(t_1, t_2, t_3), \cdots.$$
Then the sequence $\{s_n\}$ satisfies 7.5.10-7.5.11.
In case $f$ is bounded, the term $n\mathcal{X}_I(\omega)$ is not present. Therefore, for all $n$ large
enough that $\frac{2^n}{n} \ge f(\omega)$ for all $\omega$, 7.5.12 holds for all $\omega$. Thus the convergence is
uniform.
Conversely, if $f$ is the limit of nonnegative simple functions, then by Theorem
7.5.3, $f$ is measurable because each of these simple functions is measurable. $\blacksquare$
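
The construction in this proof is easy to visualize numerically. The sketch below is an addition, not part of the text; the sample points, the particular bounded $f$, and the use of NumPy are illustrative assumptions.

```python
# Sketch of the approximation in Theorem 7.5.6 for a bounded f >= 0: the simple
# function t_n takes the value k/n wherever f lies in [k/n, (k+1)/n).
import numpy as np

def t_n(f_values, n):
    # floor(f*n)/n rounds f down to the grid {0, 1/n, 2/n, ...}, which is exactly
    # sum_k (k/n) * X_{f^{-1}([k/n,(k+1)/n))} evaluated pointwise.
    return np.floor(f_values * n) / n

omega = np.linspace(0.0, 1.0, 1001)          # sample "points of Omega"
f_vals = np.sqrt(omega)                      # a bounded measurable f >= 0

for n in [1, 2, 4, 8, 16]:
    err = np.max(f_vals - t_n(f_vals, n))
    print(n, err <= 1.0 / n)                 # True: 0 <= f - t_n <= 1/n everywhere
```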
Now specialize to the case where f has values in R.

Definition 7.5.7 Let $f : \Omega \to \mathbb{R}$. Then
$$f^+ \equiv \max(f, 0) = \frac{|f| + f}{2}, \qquad
f^- \equiv f^+ - f = -\min(f, 0) = \frac{|f| - f}{2}.$$
Thus $f^+ - f^- = f$ and both $f^+$ and $f^-$ are nonnegative.

Observation 7.5.8 Note that $f^- < \infty$ since $f > -\infty$. Also note that $f = f^+$
when $f^+ > 0$ and $f = -f^-$ when $f^- > 0$.

Theorem 7.5.9 Let $(\Omega, \mathcal{F})$ be a measurable space. Then $f : \Omega \to \mathbb{R}$ is measurable
if and only if both $f^+$ and $f^-$ are measurable, if and only if each of these are
pointwise limits of an increasing sequence of simple functions.
Proof: Suppose first that $f$ is measurable and $a \ge 0$. Then from the definition
and the above observation,
$$\left(f^+\right)^{-1}((a, \infty]) = f^{-1}((a, \infty]) \in \mathcal{F}.$$
If $a < 0$, then from the definition,
$$\left(f^+\right)^{-1}((a, \infty]) = \left(f^+\right)^{-1}([0, \infty]) = \Omega \in \mathcal{F}.$$
Thus from Lemma 7.5.1, $f^+$ is measurable.
Next consider $f^-$. First let $a \ge 0$.
$$\left(f^-\right)^{-1}((a, \infty]) = f^{-1}((-\infty, -a)) \in \mathcal{F}.$$
If $a < 0$, then
$$\left(f^-\right)^{-1}((a, \infty]) = \left(f^-\right)^{-1}([0, \infty]) = \Omega \in \mathcal{F},$$
and so $f^+, f^-$ are both measurable. The rest, involving the approximating sequences
of simple functions, follows from Theorem 7.5.6.
Conversely, if both $f^+$ and $f^-$ are measurable, then consider the following pic-
ture.

[Figure: the open half plane $\{(x, y) : y > a + x\}$ together with one of the open
rectangles $R_i$ used to fill it.]

The top half can be filled with a countable union of open rectangles of the sort
shown, $R_i = (a_i, b_i) \times (c_i, d_i)$. See Lemma 7.1.5. Then clearly $f(\omega) > a$ if and only
if $\left(f^-(\omega), f^+(\omega)\right)$ is in the top half plane determined in the above picture, which
is the union of countably many open rectangles. Hence
$$f^{-1}((a, \infty]) = \cup_{i=1}^{\infty} \left(f^+\right)^{-1}((c_i, d_i)) \cap \left(f^-\right)^{-1}((a_i, b_i)) \in \mathcal{F}.$$
You could also consider the following equation in which the right side is a count-
able union of measurable sets:
$$[f > a] = \cup_{r \in \mathbb{Q}}\left(\left[f^+ > a + r\right] \cap \left[f^- < r\right]\right).$$
The equation is true because if $\omega \in \left[f^+ > a + r\right] \cap \left[f^- < r\right]$ for some $r$, then $f(\omega) =
f^+(\omega) - f^-(\omega) > a + r - r = a$. Thus the right side is contained in the left. If
$f(\omega) > a$, then $f^+(\omega) > a + f^-(\omega)$ and so you can choose $r$ a little bigger than
$f^-(\omega)$ such that also $f^+(\omega) > a + r$. Hence, for that $r$, $\omega \in \left[f^+ > a + r\right] \cap \left[f^- < r\right]$. $\blacksquare$
178 MEASURES AND MEASURABLE FUNCTIONS

Definition 7.5.10 Let $(\Omega, \mathcal{F})$ be a measurable space and let $f : \Omega \to \mathbb{C}$, the complex numbers. Then $f$ is said to be measurable if the real and imaginary parts of $f$ are measurable. Complex simple functions are defined as those which are complex valued, measurable, and have finitely many values. Thus by similar reasoning to the above, every complex simple function is of the form
$$s(\omega) = \sum_{k=1}^{n} c_k \mathcal{X}_{E_k}(\omega)$$
where the $c_k$ are the distinct values of $s$ and the $E_k$ are disjoint sets.

Lemma 7.5.11 Let $s(\omega) = \sum_{k=1}^{n} (a_k + i b_k)\, \mathcal{X}_{E_k}(\omega)$ where the $a_k + i b_k$ are the distinct values of $s$ and the $E_k$ are the disjoint sets on which $s$ achieves these values. Then $s$ is measurable if and only if each $E_k$ is measurable.

Proof: First suppose $s$ is measurable. Then both $\sum_{k=1}^{n} a_k \mathcal{X}_{E_k}(\omega)$ and $\sum_{k=1}^{n} b_k \mathcal{X}_{E_k}(\omega)$ are measurable. Then $E_k = [\operatorname{Re} s = a_k] \cap [\operatorname{Im} s = b_k] \in \mathcal{F}$. Conversely, assume each $E_k$ is measurable. Then $\sum_k a_k \mathcal{X}_{E_k}$ and $\sum_k b_k \mathcal{X}_{E_k}$ are both measurable because if $h$ is the first, then
$$[h > \alpha] = \cup\{E_k : a_k > \alpha\} \in \mathcal{F}.$$
Similar considerations hold for the second. $\blacksquare$
Now it is not hard to generalize the above to complex valued functions.

Theorem 7.5.12 $f : \Omega \to \mathbb{C}$ is measurable if and only if there exists a sequence of complex simple functions $\{s_j\}_{j=1}^{\infty}$ such that $|s_j| \leq |f|$ and $s_j(\omega) \to f(\omega)$ for all $\omega$.

Proof: If $f$ is measurable, the conclusion follows right away from Theorem 7.5.9 applied to the real and imaginary parts of $f$. Letting $h$ denote one of these real or imaginary parts, there exist increasing sequences of simple functions $\{s_k\}, \{t_k\}$ converging pointwise to $h^+$ and $h^-$ respectively. If $h(\omega) \geq 0$, then $h^-(\omega) = 0$ so $t_k(\omega) = 0$. Hence
$$|h(\omega)| = h^+(\omega) \geq s_k(\omega) = |s_k(\omega) - t_k(\omega)|.$$
If $h(\omega) \leq 0$, then $|h(\omega)| = h^-(\omega)$ and so $h^+(\omega) = 0$. Hence $s_k(\omega) = 0$ and
$$|h(\omega)| = h^-(\omega) \geq t_k(\omega) = |s_k(\omega) - t_k(\omega)|.$$
Either way, $|h| \geq |s_k - t_k|$.

Thus there exist real simple functions $\{p_k\}, \{q_k\}$ converging to $\operatorname{Re} f$ and $\operatorname{Im} f$ respectively such that $|p_k| \leq |\operatorname{Re} f|$, $|q_k| \leq |\operatorname{Im} f|$. Hence the complex simple function $p_k + i q_k$ converges pointwise to $\operatorname{Re} f + i \operatorname{Im} f$ and
$$|p_k + i q_k|^2 = p_k^2 + q_k^2 \leq (\operatorname{Re} f)^2 + (\operatorname{Im} f)^2.$$
Conversely, if there exists a sequence of complex simple functions which do what is expressed above, then their real parts are simple functions converging to the real part of $f$ and their imaginary parts are simple functions converging to the imaginary part of $f$. Then it follows from Theorem 7.5.3 that $f$ is measurable because its real and imaginary parts are. $\blacksquare$
Now the following is a major result about continuous functions of complex valued measurable functions.

Corollary 7.5.13 Let $(\Omega, \mathcal{F})$ be a measurable space, let $g : \mathbb{C}^n \to \mathbb{C}$ be continuous and let $f_i : \Omega \to \mathbb{C}$ be measurable. Then $g \circ \mathbf{f}$ is also measurable where $\mathbf{f} = (f_1, \cdots, f_n)$.

Proof: Let $\{s_j^i\}_{j=1}^{\infty}$ be a sequence of complex simple functions converging pointwise to $f_i$. Let $\mathbf{s}_j \equiv (s_j^1, \cdots, s_j^n)$. Then by continuity of $g$,
$$\lim_{j\to\infty} g \circ \mathbf{s}_j = g \circ \mathbf{f}.$$
It follows that if $g \circ \mathbf{s}_j$ is measurable, then so is $g \circ \mathbf{f}$. However, since each $\mathbf{s}_j$ has a simple function in each component, $g \circ \mathbf{s}_j$ has finitely many values, corresponding to the possible values of the $s_j^i$. Furthermore, these finitely many values of $\mathbf{s}_j$ are achieved on a measurable set resulting from the finite intersection of the sets on which the $s_j^i$ are constant. Therefore, $g \circ \mathbf{s}_j$ is measurable because it has finitely many values which are achieved on measurable sets, and consequently $g \circ \mathbf{f}$ is also measurable because it is a limit of these measurable functions. $\blacksquare$
What this says is that you can do pretty much any reasonable algebraic manipulation to measurable functions and you end up with a measurable function. In particular, linear combinations, products, absolute values, etc. of measurable functions are all measurable.

Theorem 7.5.14 (Egoroff) Let $(\Omega, \mathcal{F}, \mu)$ be a finite measure space
$$(\mu(\Omega) < \infty)$$
and let $f_n, f$ be complex valued functions such that $\operatorname{Re} f_n$, $\operatorname{Im} f_n$ are all measurable and
$$\lim_{n\to\infty} f_n(\omega) = f(\omega)$$
for all $\omega \notin E$ where $\mu(E) = 0$. Then for every $\varepsilon > 0$, there exists a set
$$F \supseteq E,\quad \mu(F) < \varepsilon,$$
such that $f_n$ converges uniformly to $f$ on $F^C$.

Proof: First suppose $E = \emptyset$ so that convergence is pointwise everywhere. It follows then that $\operatorname{Re} f$ and $\operatorname{Im} f$ are pointwise limits of measurable functions and are therefore measurable. Let $E_{km} = \{\omega \in \Omega : |f_n(\omega) - f(\omega)| \geq 1/m$ for some $n > k\}$. Note that
$$|f_n(\omega) - f(\omega)| = \sqrt{(\operatorname{Re} f_n(\omega) - \operatorname{Re} f(\omega))^2 + (\operatorname{Im} f_n(\omega) - \operatorname{Im} f(\omega))^2}$$
and so by Corollary 7.5.13,
$$\left[\, |f_n - f| \geq \frac{1}{m} \,\right]$$
is measurable. Hence $E_{km}$ is measurable because
$$E_{km} = \cup_{n=k+1}^{\infty} \left[\, |f_n - f| \geq \frac{1}{m} \,\right].$$
For fixed $m$, $\cap_{k=1}^{\infty} E_{km} = \emptyset$ because $f_n$ converges to $f$. Therefore, if $\omega \in \Omega$, there exists $k$ such that if $n > k$, $|f_n(\omega) - f(\omega)| < \frac{1}{m}$, which means $\omega \notin E_{km}$. Note also that
$$E_{km} \supseteq E_{(k+1)m}.$$
Since $\mu(E_{1m}) < \infty$, Theorem 7.3.4 on Page 168 implies
$$0 = \mu(\cap_{k=1}^{\infty} E_{km}) = \lim_{k\to\infty} \mu(E_{km}).$$
Let $k(m)$ be chosen such that $\mu(E_{k(m)m}) < \varepsilon 2^{-m}$ and let
$$F = \bigcup_{m=1}^{\infty} E_{k(m)m}.$$
Then $\mu(F) < \varepsilon$ because
$$\mu(F) \leq \sum_{m=1}^{\infty} \mu\left(E_{k(m)m}\right) < \sum_{m=1}^{\infty} \varepsilon 2^{-m} = \varepsilon.$$
Now let $\eta > 0$ be given and pick $m_0$ such that $m_0^{-1} < \eta$. If $\omega \in F^C$, then
$$\omega \in \bigcap_{m=1}^{\infty} E_{k(m)m}^C.$$
Hence $\omega \in E_{k(m_0)m_0}^C$ so
$$|f_n(\omega) - f(\omega)| < 1/m_0 < \eta$$
for all $n > k(m_0)$. This holds for all $\omega \in F^C$ and so $f_n$ converges uniformly to $f$ on $F^C$.

Now if $E \neq \emptyset$, consider $\{\mathcal{X}_{E^C} f_n\}_{n=1}^{\infty}$. Each $\mathcal{X}_{E^C} f_n$ has real and imaginary parts measurable and the sequence converges pointwise to $\mathcal{X}_{E^C} f$ everywhere. Therefore, from the first part, there exists a set $F$ of measure less than $\varepsilon$ such that on $F^C$, $\{\mathcal{X}_{E^C} f_n\}$ converges uniformly to $\mathcal{X}_{E^C} f$. Therefore, on $(E \cup F)^C$, $\{f_n\}$ converges uniformly to $f$. $\blacksquare$
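Egoroff's theorem is easy to see in a concrete case. A minimal numerical sketch (not from the text): $f_n(x) = x^n \to 0$ pointwise on $[0,1)$ but not uniformly; discarding the small set $F = (1-\varepsilon, 1)$ makes the convergence uniform, since the supremum over $[0, 1-\varepsilon]$ is $(1-\varepsilon)^n \to 0$. The value of $\varepsilon$ and the grid are illustrative choices.

```python
import numpy as np

# f_n(x) = x**n -> 0 pointwise on [0, 1), but not uniformly.
# Removing the small set F = (1 - eps, 1) leaves uniform convergence on F^C.
eps = 0.05
x = np.linspace(0.0, 1.0 - eps, 10001)    # grid standing in for F^C = [0, 1 - eps]
for n in (10, 50, 200, 1000):
    print(n, float(np.max(x ** n)))        # sup over F^C equals (1 - eps)**n -> 0
```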

7.6 One Dimensional Lebesgue Stieltjes Measure


Now with these major results about measures, it is time to specialize to the outer
measure of Theorem 7.2.3. The next theorem gives Lebesgue Stieltjes measure on
R. The conditions 7.6.13 and 7.6.14 given below are known respectively as inner
and outer regularity.

Theorem 7.6.1 Let $\mathcal{S}$ denote the $\sigma$ algebra of Theorem 7.4.4, associated with the outer measure $\mu$ in Theorem 7.2.3, on which $\mu$ is a measure. Then every open interval is in $\mathcal{S}$. So are all open and closed sets. Furthermore, if $E$ is any set in $\mathcal{S}$,
$$\mu(E) = \sup\{\mu(K) : K \text{ is a closed and bounded set, } K \subseteq E\} \tag{7.6.13}$$
$$\mu(E) = \inf\{\mu(V) : V \text{ is an open set, } V \supseteq E\} \tag{7.6.14}$$

Proof: The first task is to show $(a,b) \in \mathcal{S}$. I need to show that for every $S \subseteq \mathbb{R}$,
$$\mu(S) \geq \mu(S \cap (a,b)) + \mu\left(S \cap (a,b)^C\right) \tag{7.6.15}$$
Suppose first $S$ is an open interval, $(c,d)$. If $(c,d)$ has empty intersection with $(a,b)$ or is contained in $(a,b)$ there is nothing to prove. The above expression reduces to nothing more than $\mu(S) = \mu(S)$. Suppose next that $(c,d) \supseteq (a,b)$. In this case, the right side of the above reduces to
$$\mu((a,b)) + \mu((c,a] \cup [b,d))$$
$$\leq F(b-) - F(a+) + F(a+) - F(c+) + F(d-) - F(b-) = F(d-) - F(c+) \equiv \mu((c,d))$$
The only other cases are $c \leq a < d \leq b$ or $a \leq c < d \leq b$. Consider the first of these cases. Then the right side of 7.6.15 for $S = (c,d)$ is
$$\mu((a,d)) + \mu((c,a]) = F(d-) - F(a+) + F(a+) - F(c+) = F(d-) - F(c+) = \mu((c,d))$$
The last case is entirely similar. Thus 7.6.15 holds whenever $S$ is an open interval.
Now it is clear 7.6.15 also holds if $\mu(S) = \infty$. Suppose then that $\mu(S) < \infty$ and let
$$S \subseteq \bigcup_{k=1}^{\infty} (a_k, b_k)$$
such that
$$\mu(S) + \varepsilon > \sum_{k=1}^{\infty} \left(F(b_k-) - F(a_k+)\right) = \sum_{k=1}^{\infty} \mu((a_k, b_k)).$$
Then since $\mu$ is an outer measure, and using what was just shown,
$$\mu(S \cap (a,b)) + \mu\left(S \cap (a,b)^C\right)$$
$$\leq \mu\left(\cup_{k=1}^{\infty} (a_k,b_k) \cap (a,b)\right) + \mu\left(\cup_{k=1}^{\infty} (a_k,b_k) \cap (a,b)^C\right)$$
$$\leq \sum_{k=1}^{\infty} \mu((a_k,b_k) \cap (a,b)) + \mu\left((a_k,b_k) \cap (a,b)^C\right) \leq \sum_{k=1}^{\infty} \mu((a_k,b_k)) \leq \mu(S) + \varepsilon.$$
Since $\varepsilon$ is arbitrary, this shows 7.6.15 holds for any $S$ and so any open interval is in $\mathcal{S}$.
It follows any open set is in $\mathcal{S}$. This follows from Theorem 4.3.10 which implies that if $U$ is open, it is the countable union of disjoint open intervals. Since each of these open intervals is in $\mathcal{S}$ and $\mathcal{S}$ is a $\sigma$ algebra, their union is also in $\mathcal{S}$. It follows every closed set is in $\mathcal{S}$ also. This is because $\mathcal{S}$ is a $\sigma$ algebra and if a set is in $\mathcal{S}$ then so is its complement. The closed sets are those which are complements of open sets.
The assertion of outer regularity is not hard to get. Letting $E$ be any set with $\mu(E) < \infty$, there exist open intervals covering $E$, denoted by $\{(a_i,b_i)\}_{i=1}^{\infty}$, such that
$$\mu(E) + \varepsilon > \sum_{i=1}^{\infty} F(b_i-) - F(a_i+) = \sum_{i=1}^{\infty} \mu((a_i,b_i)) \geq \mu(V)$$
where $V$ is the union of the open intervals just mentioned. Thus
$$\mu(E) \leq \mu(V) \leq \mu(E) + \varepsilon.$$
This shows outer regularity. If $\mu(E) = \infty$, there is nothing to show.

Now consider the assertion of inner regularity 7.6.13. Suppose $I$ is a closed and bounded interval and $E \subseteq I$ with $E \in \mathcal{S}$. By outer regularity, there exists open $V$ containing $I \cap E^C$ such that
$$\mu\left(I \cap E^C\right) + \varepsilon > \mu(V)$$
Then since $\mu$ is additive on $\mathcal{S}$, it follows that $\mu\left(V \setminus \left(I \cap E^C\right)\right) < \varepsilon$. Then $K \equiv V^C \cap I$ is a compact subset of $E$. Also,
$$E \setminus \left(V^C \cap I\right) = E \cap V = V \setminus E^C \subseteq V \setminus \left(I \cap E^C\right),$$
a set of measure less than $\varepsilon$. Therefore,
$$\mu\left(V^C \cap I\right) + \varepsilon \geq \mu\left(V^C \cap I\right) + \mu\left(E \setminus \left(V^C \cap I\right)\right) = \mu(E),$$
so the desired conclusion holds in the case where $E$ is contained in a compact interval.

Now suppose $E$ is arbitrary and let $l < \mu(E)$. Then choosing $\varepsilon$ small enough, $l + \varepsilon < \mu(E)$ also. Letting $E_n \equiv E \cap [-n,n]$, it follows from Theorem 7.3.4 that for $n$ large enough, $\mu(E_n) > l + \varepsilon$. Now from what was just shown, there exists $K \subseteq E_n$ such that $\mu(K) + \varepsilon > \mu(E_n)$. Hence $\mu(K) > l$. This shows 7.6.13. $\blacksquare$

Denition 7.6.2 When the integrator function is F (x) = x, the Lebesgue Stielt-
jes measure just discussed is known as one dimensional Lebesgue measure and is
denoted as m.

Proposition 7.6.3 For m Lebesgue measure, m ([a, b]) = m ((a, b)) = b a. Also
m is translation invariant in the sense that if E is any Lebesgue measurable set,
then m (x + E) = m (E).

Proof: The formula for the measure of an interval comes right away from Theorem 7.2.3. From this, it follows right away that whenever $E$ is an interval, $m(x + E) = m(E)$. Every open set is the countable disjoint union of open intervals, so if $E$ is an open set, then $m(x + E) = m(E)$. What about closed sets? First suppose $H$ is a closed and bounded set. Then letting $(-n,n) \supseteq H$,
$$m(((-n,n) \setminus H) + x) + m(H + x) = m((-n,n) + x)$$
Hence, from what was just shown about open sets,
$$m(H) = m((-n,n)) - m((-n,n) \setminus H) = m((-n,n) + x) - m(((-n,n) \setminus H) + x) = m(H + x)$$
Therefore, the translation invariance holds for closed and bounded sets. If $H$ is an arbitrary closed set, then
$$m(H + x) = \lim_{n\to\infty} m(H \cap [-n,n] + x) = \lim_{n\to\infty} m(H \cap [-n,n]) = m(H).$$
It follows right away that if $G$ is the countable intersection of open sets ($G_\delta$ set, pronounced "g delta set"), then
$$m(G \cap (-n,n) + x) = m(G \cap (-n,n))$$
Now taking $n \to \infty$, $m(G + x) = m(G)$. Similarly, if $F$ is the countable union of compact sets ($F_\sigma$ set, pronounced "F sigma set"), then $m(F + x) = m(F)$. Now using Theorem 7.6.1, if $E$ is an arbitrary measurable set, there exist an $F_\sigma$ set $F$ and a $G_\delta$ set $G$ such that $F \subseteq E \subseteq G$ and $m(F) = m(G) = m(E)$. Then
$$m(F) = m(x + F) \leq m(x + E) \leq m(x + G) = m(G) = m(E) = m(F).\ \blacksquare$$

7.7 Exercises
1. Let $\mathcal{C}$ be a set whose elements are $\sigma$ algebras of subsets of $\Omega$. Show $\cap\, \mathcal{C}$ is a $\sigma$ algebra also.

2. Let be any set. Show P () , the set of all subsets of is a algebra. Now
let L denote some subset of P () . Consider all algebras which contain L.
Show the intersection of all these algebras which contain L is a algebra
containing L and it is the smallest algebra containing L, denoted by (L).
When is a normed vector space, and L consists of the open sets (L) is
called the algebra of Borel sets.
3. Show that for (, F) a measurable space, f : C is measurable if and
only if f 1 (open sets) F . Hint: Note that any open set is composed of
countably many rectangles of the form (a, b) + i (c, d).
4. Consider = [0, 1] and let S denote all subsets of [0, 1] , F such that either
F C or F is countable. Note the empty set must be countable. Show S is a
algebra. (This is a sick algebra.) Now let : S [0, ] be dened by
(F ) = 1 if F C is countable and (F ) = 0 if F is countable. Show is a
measure on S.
5. Let = N, the positive integers and let a algebra be given by F = P (N),
the set of all subsets of N. What are the measurable functions having values
in C? Let (E) be the number of elements of E where E is a subset of N.
Show is a measure.
6. Let F be a algebra of subsets of and suppose F has innitely many
elements. Show that F is uncountable. Hint: You might try to show there
exists a countable sequence of disjoint sets of F, {Ai }. It might be easiest to
verify this by contradiction if it doesnt exist rather than a direct construction
however, I have seen this done several ways. Once this has been done, you
can dene a map , from P (N) into F which is one to one by (S) = iS Ai .
Then argue P (N) is uncountable and so F is also uncountable.
7. A probability space is a measure space, (, F, P ) where the measure P has
the property that P () = 1. Such a measure is called a probability measure.
Random vectors are measurable functions, X, mapping a probability space,
(, F, P ) to Rn . Thus X () Rn for each and P is a probability
measure dened on the sets of F, a algebra of subsets of . For E a Borel
set in Rn , dene
( )
(E) P X1 (E) probability that X E.
Show this is a well dened probability measure on the Borel sets of Rn . Thus
(E) = P (X () E) . It is called the distribution.
8. Let E be a countable subset of R. Show m(E) = 0. Hint: Let the set be

{ei }i=1 and let ei be the center of an open interval of length /2i .
9. If S is an uncountable set of irrational numbers, is it necessary that S has
a rational number as a limit point? Hint: Consider the proof of Problem 8
when applied to the rational numbers. (This problem was shown to me by
Lee Erlebach.)

10. Let be a nite measure on the Borel sets of Rn . Show that must be
regular. Hint: You might let F denote those Borel sets for which is regular,
show the open sets are in F and that F is a algebra.
11. If K is a compact subset of an open set V where K, V are in Rn , show there
exists a continuous function f : Rn [0, 1] such that f (x) = 1 on K and
spt (f ) {x : f (x) = 0} referred to as the support of f is a compact subset
of V .
12. Suppose
(, F, ) is a measure space and {Ei } F. Suppose also that
i=1 (Ei ) < . Let N consist of all the points of which are in innitely
many of the sets Ei . Show that (N ) = 0.
13. Suppose is a nite regular measure dened on B (Rn ) and E is a Borel
set. Let XE denote the indicator function of E dened as
{
1 if E
XE (x)
0 if
/E
Show there exists a sequence of continuous functions having compact supports
{fk } such that fk (x) XE (x) for a.e. x. Recall from the above problems,
the support of f is dened as
spt (f ) {x : f (x) = 0}.

14. Consider the following nested sequence of compact sets $\{P_n\}$. We let $P_1 = [0,1]$, $P_2 = \left[0, \frac{1}{3}\right] \cup \left[\frac{2}{3}, 1\right]$, etc. To go from $P_n$ to $P_{n+1}$, delete the open interval which is the middle third of each closed interval in $P_n$. Let $P = \cap_{n=1}^{\infty} P_n$. Since $P$ is the intersection of nested nonempty compact sets, it follows from advanced calculus that $P \neq \emptyset$. Show $m(P) = 0$. Show there is a one to one onto mapping of $[0,1]$ to $P$. The set $P$ is called the Cantor set. Thus, although $P$ has measure zero, it has the same number of points in it as $[0,1]$ in the sense that there is a one to one and onto mapping from one to the other. Hint: There are various ways of doing this last part but the most enlightenment is obtained by exploiting the construction of the Cantor set. You might also work the next problem to get a way to do it.

15. Consider the sequence of functions defined in the following way. Let $f_1(x) = x$ on $[0,1]$. To get from $f_n$ to $f_{n+1}$, let $f_{n+1} = f_n$ on all intervals where $f_n$ is constant. If $f_n$ is nonconstant on $[a,b]$, let $f_{n+1}(a) = f_n(a)$, $f_{n+1}(b) = f_n(b)$, $f_{n+1}$ is piecewise linear and equal to $\frac{1}{2}(f_n(a) + f_n(b))$ on the middle third of $[a,b]$. Sketch a few of these and you will see the pattern. The process of modifying a nonconstant section of the graph of this function is illustrated in the following picture (a short code sketch of the limit function also follows this problem).

[Figure: one step of the modification of a nonconstant section of the graph]

Show $\{f_n\}$ converges uniformly on $[0,1]$. If $f(x) = \lim_{n\to\infty} f_n(x)$, show that $f(0) = 0$, $f(1) = 1$, $f$ is continuous, and $f'(x) = 0$ for all $x \notin P$ where $P$ is the Cantor set. This function is called the Cantor function. It is a very important example to remember. Note it has derivative equal to zero a.e. and yet it succeeds in climbing from 0 to 1. Thus it would seem that $\int_0^1 f'(t)\, dt = 0 \neq f(1) - f(0)$. Is this somehow contradictory to the fundamental theorem of calculus? Hint: This isn't too hard if you focus on getting a careful estimate on the difference between two successive functions in the list, considering only a typical small interval in which the change takes place. The above picture should be helpful.
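For those who want to see the limit function of Problem 15 concretely, here is a minimal sketch (not from the text) that evaluates the Cantor function by following the ternary expansion of $x$: it is constant on middle thirds and splits its remaining increase between the two outer thirds. The recursion depth and sample points are illustrative choices.

```python
import numpy as np

def cantor_function(x, depth=30):
    """Approximate value of the Cantor function at x in [0, 1]."""
    value, scale = 0.0, 0.5
    for _ in range(depth):
        if x < 1.0 / 3.0:
            x = 3.0 * x                   # left third: contributes nothing
        elif x > 2.0 / 3.0:
            value += scale                # right third: contributes the current scale
            x = 3.0 * x - 2.0
        else:
            return value + scale          # middle third: f is constant here
        scale *= 0.5
    return value

xs = np.linspace(0.0, 1.0, 7)
print([round(cantor_function(t), 4) for t in xs])   # increases from 0 to (nearly) 1
```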

16. Let m(W ) > 0, W is measurable, W [a, b]. Show there exists a nonmea-
surable subset of W . Hint: Let x y if x y Q. Observe that is an
equivalence relation on R. See Denition 1.1.9 on Page 20 for a review of this
terminology. Let C be the set of equivalence classes and let D {CW : C C
and C W = }. By the axiom of choice, there exists a set A, consisting of
exactly one point from each of the nonempty sets which are the elements of
D. Show
W rQ A + r (a.)
A + r1 A + r2 = if r1 = r2 , ri Q. (b.)
Observe that since A [a, b], then A + r [a 1, b + 1] whenever |r| < 1. Use
this to show that if m(A) = 0, or if m(A) > 0 a contradiction results.Show
there exists some set S such that m (S) < m (S A) + m (S \ A) where m is
the outer measure determined by m.

17. This problem gives a very interesting example found in the book by
McShane [36]. Let g(x) = x + f (x) where f is the strange function of Problem
15. Let P be the Cantor set of Problem 14. Let [0, 1] \ P = j=1 Ij where
Ij is open and Ij Ik = if j = k. These intervals are the connected
components of the complement
of the
Cantor set. Show m(g(Ij )) = m(Ij )

so m(g( j=1 I j )) = j=1 m(g(I j )) = j=1 m(Ij ) = 1. Thus m(g(P )) = 1
because g([0, 1]) = [0, 2]. By Problem 16 there exists a set A g (P ) which is
non measurable. Dene (x) = XA (g(x)). Thus (x) = 0 unless x P . Tell
why is measurable. (Recall m(P ) = 0 and Lebesgue measure is complete.)
Now show that XA (y) = (g 1 (y)) for y [0, 2]. Tell why g 1 is continuous
but g 1 is not measurable. (This is an example of measurable continuous
= measurable.) Show there exist Lebesgue measurable sets which are not
Borel measurable. Hint: The function, is Lebesgue measurable. Now show
that Borel measurable = measurable.

18. Let K be a compact subset of R having no isolated points. Show that there
exists an increasing continuous function g such that g is constant on every
connected component of K C and has values between 0 and 1. If J, L are two
components, J < L, then the value of g on J is strictly less than its value on
L. Hint: Let the components be {(ak , bk )}. Let a be the rst point of K and

b be the last. Let g0 be piecewise linear, increasing and continuous going from
0 to the left of a to 1 to the right of b. Let g1 equal 12 (g0 (a1 ) + g0 (b1 )) on
(a1 , b1 ) and adjust to make piecewise linear and increasing going from 0 to 1.
Next adjust g1 in a similar way to make it constant on (a2 , b2 ). Continue this
way. Estimate ||gk gk1 || in terms of gk1 (bk ) gk1 (ak ) and observe
and use that the intervals (gk1 (ak ) , gk1 (bk )) are disjoint.

19. Show that if K is any compact subset of R which has no isolated points,
there exists a Borel measure which has the properties (K) = 1, (E) =
(E K) , if H is a proper compact subset of K, then (H) < 1. Also,
(p) = 0 whenever p is a point.

20. Let K be an arbitrary compact subset of R. Then there exists a Borel


measure which has the properties (K) = 1, (E) = (E K) , if H is a
compact subset of K, then (H) < 1.

21. Suppose you consider the closed upper half plane determined by the line y =
x. Can it be covered with countably many rectangles of the form [a, b] [c, d]
each of which is contained in the upper half plane? Hint: You must cover
the points on the line y = x.
The Abstract Lebesgue
Integral

The general Lebesgue integral requires a measure space, (, F, ) and, to begin with,
a nonnegative measurable function. I will use Lemma 1.3.3 about interchanging two
supremums frequently. Also, I will use the observation that if {an } is an increasing
sequence of points of [0, ] , then supn an = limn an which is obvious from the
denition of sup.

8.1 Definition For Nonnegative Measurable Functions
8.1.1 Riemann Integrals For Decreasing Functions
First of all, the notation
$$[g < f]$$
is short for
$$\{\omega \in \Omega : g(\omega) < f(\omega)\}$$
with other variants of this notation being similar. Also, the convention $0 \cdot \infty = 0$ will be used to simplify the presentation whenever it is convenient to do so.

Definition 8.1.1 Let $f : [a,b] \to [0,\infty]$ be decreasing. Define
$$\int_a^b f(\lambda)\, d\lambda \equiv \lim_{M\to\infty} \int_a^b M \wedge f(\lambda)\, d\lambda = \sup_M \int_a^b M \wedge f(\lambda)\, d\lambda$$
where $a \wedge b$ means the minimum of $a$ and $b$. Note that for $f$ bounded,
$$\sup_M \int_a^b M \wedge f(\lambda)\, d\lambda = \int_a^b f(\lambda)\, d\lambda$$
where the integral on the right is the usual Riemann integral because eventually $M > f$. For $f$ a nonnegative decreasing function defined on $[0,\infty)$,
$$\int_0^\infty f\, d\lambda \equiv \lim_{R\to\infty} \int_0^R f\, d\lambda = \sup_{R>1} \int_0^R f\, d\lambda = \sup_R \sup_{M>0} \int_0^R f \wedge M\, d\lambda$$
Since decreasing bounded functions are Riemann integrable, the above definition is well defined. Now here are some obvious properties.

Lemma 8.1.2 Let $f$ be a decreasing nonnegative function defined on an interval $[a,b]$. Then if $[a,b] = \cup_{k=1}^{m} I_k$ where $I_k \equiv [a_k, b_k]$ and the intervals $I_k$ are non overlapping, it follows
$$\int_a^b f\, d\lambda = \sum_{k=1}^{m} \int_{a_k}^{b_k} f\, d\lambda.$$
Proof: This follows from the computation,
$$\int_a^b f\, d\lambda \equiv \lim_{M\to\infty} \int_a^b f \wedge M\, d\lambda = \lim_{M\to\infty} \sum_{k=1}^{m} \int_{a_k}^{b_k} f \wedge M\, d\lambda = \sum_{k=1}^{m} \int_{a_k}^{b_k} f\, d\lambda$$
Note both sides could equal $+\infty$. $\blacksquare$
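The truncation in Definition 8.1.1 is easy to observe numerically. The following is a crude sketch (not from the text) for the decreasing, unbounded function $f(\lambda) = 1/\sqrt{\lambda}$ on $[0,1]$: the truncated Riemann integrals $\int_0^1 f \wedge M\, d\lambda$ increase toward the exact value 2 as $M$ grows, which is what the supremum in the definition captures. The grid size and the helper name are illustrative.

```python
import numpy as np

def truncated_integral(f, R, M, n=200000):
    # Midpoint Riemann sum for the ordinary integral of min(f, M) over [0, R]
    lam = np.linspace(0.0, R, n, endpoint=False) + R / (2 * n)
    return float(np.mean(np.minimum(f(lam), M)) * R)

f = lambda lam: 1.0 / np.sqrt(lam)            # decreasing, unbounded near 0
for M in (1, 10, 100, 1000):
    print(M, truncated_integral(f, 1.0, M))   # roughly 2 - 1/M, increasing toward 2
```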

8.1.2 The Lebesgue Integral For Nonnegative Functions


Here is the definition of the Lebesgue integral of a function which is measurable and has values in $[0,\infty]$.

Definition 8.1.3 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and suppose $f : \Omega \to [0,\infty]$ is measurable. Then define
$$\int f\, d\mu \equiv \int_0^\infty \mu([f > \lambda])\, d\lambda$$
which makes sense because $\lambda \mapsto \mu([f > \lambda])$ is nonnegative and decreasing.

Note that if $f \leq g$, then $\int f\, d\mu \leq \int g\, d\mu$ because $\mu([f > \lambda]) \leq \mu([g > \lambda])$.

For convenience, $\sum_{i=1}^{0} a_i \equiv 0$.

Lemma 8.1.4 In the situation of the above definition,
$$\int f\, d\mu = \sup_{h>0} \sum_{i=1}^{\infty} \mu([f > ih])\, h$$
Proof: Let $m(h,R) \in \mathbb{N}$ satisfy $R - h < h\, m(h,R) \leq R$. Then $\lim_{R\to\infty} m(h,R) = \infty$ and so
$$\int f\, d\mu \equiv \int_0^\infty \mu([f > \lambda])\, d\lambda = \sup_M \sup_R \int_0^R \mu([f > \lambda]) \wedge M\, d\lambda = \sup_M \sup_{R>0} \sup_{h>0} \sum_{k=1}^{m(h,R)} \left(\mu([f > kh]) \wedge M\right) h$$
because the sum is just a lower sum for the integral $\int_0^{h\, m(h,R)} \mu([f > \lambda]) \wedge M\, d\lambda$ and, the integrand being decreasing, such lower sums converge to this integral as $h \to 0$. Hence, switching the order of the sups, this equals
$$\sup_{h>0} \sup_R \sup_M \sum_{k=1}^{m(R,h)} \left(\mu([f > kh]) \wedge M\right) h = \sup_{h>0} \sum_{k=1}^{\infty} \mu([f > kh])\, h.\ \blacksquare$$
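The formula of Lemma 8.1.4 suggests a direct "layer cake" scheme for computing $\int f\, d\mu$ of a nonnegative function: sum $h\,\mu([f > kh])$ over $k$ for a small $h$. The sketch below (not from the text; the step size, grid, and function names are illustrative, and Lebesgue measure is approximated by equal weights on a fine grid) recovers $\int_0^1 x^2\, dx = 1/3$.

```python
import numpy as np

def lebesgue_integral(f_vals, weights, h=1e-3, k_max=100000):
    """Approximate the integral of f d(mu) via sum over k of mu([f > k h]) * h."""
    total = 0.0
    for k in range(1, k_max + 1):
        level = float(np.sum(weights[f_vals > k * h]))   # mu([f > k h])
        if level == 0.0:
            break
        total += level * h
    return total

x = np.linspace(0.0, 1.0, 20001)[:-1]                    # grid standing in for [0,1)
print(lebesgue_integral(x ** 2, np.full(x.size, 1.0 / x.size)))   # about 1/3
```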

8.2 The Lebesgue Integral For Nonnegative Simple Functions
To begin with, here is a useful lemma.
Lemma 8.2.1 If $f(\lambda) = 0$ for all $\lambda > a$, where $f$ is a decreasing nonnegative function, then
$$\int_0^\infty f(\lambda)\, d\lambda = \int_0^a f(\lambda)\, d\lambda.$$
Proof: From the definition,
$$\int_0^\infty f(\lambda)\, d\lambda = \lim_{R\to\infty} \int_0^R f(\lambda)\, d\lambda = \sup_{R>1} \int_0^R f(\lambda)\, d\lambda = \sup_{R>1} \sup_M \int_0^R f(\lambda) \wedge M\, d\lambda$$
$$= \sup_M \sup_{R>1} \int_0^R f(\lambda) \wedge M\, d\lambda = \sup_M \sup_{R>1} \int_0^a f(\lambda) \wedge M\, d\lambda = \sup_M \int_0^a f(\lambda) \wedge M\, d\lambda \equiv \int_0^a f(\lambda)\, d\lambda.\ \blacksquare$$

Now that the Lebesgue integral for a nonnegative function has been defined, what does it do to a nonnegative simple function? Recall a nonnegative simple function is one which has finitely many nonnegative real values which it assumes on measurable sets. Thus a simple function can be written in the form
$$s(\omega) = \sum_{i=1}^{n} c_i \mathcal{X}_{E_i}(\omega)$$
where the $c_i$ are each nonnegative, the distinct values of $s$.

Lemma 8.2.2 Let $s(\omega) = \sum_{i=1}^{p} a_i \mathcal{X}_{E_i}(\omega)$ be a nonnegative simple function where the $E_i$ are distinct but the $a_i$ might not be. Then
$$\int s\, d\mu = \sum_{i=1}^{p} a_i \mu(E_i). \tag{8.2.1}$$

Proof: Without loss of generality, assume $0 \equiv a_0 < a_1 \leq a_2 \leq \cdots \leq a_p$ and that $\mu(E_i) < \infty$ for $i > 0$. Here is why. If $\mu(E_i) = \infty$, then letting $a \in (a_{i-1}, a_i)$, by Lemma 8.2.1, the left side would be
$$\int_0^{a_p} \mu([s > \lambda])\, d\lambda \geq \int_{a_0}^{a_i} \mu([s > \lambda])\, d\lambda \geq \sup_M \int_0^{a_i} \mu([s > \lambda]) \wedge M\, d\lambda \geq \sup_M M a = \infty$$
and so both sides are equal to $\infty$. Thus it can be assumed that for each $i$, $\mu(E_i) < \infty$. Then it follows from Lemma 8.2.1 and Lemma 8.1.2,
$$\int_0^\infty \mu([s > \lambda])\, d\lambda = \int_0^{a_p} \mu([s > \lambda])\, d\lambda = \sum_{k=1}^{p} \int_{a_{k-1}}^{a_k} \mu([s > \lambda])\, d\lambda$$
$$= \sum_{k=1}^{p} (a_k - a_{k-1}) \sum_{i=k}^{p} \mu(E_i) = \sum_{i=1}^{p} \mu(E_i) \sum_{k=1}^{i} (a_k - a_{k-1}) = \sum_{i=1}^{p} a_i \mu(E_i)\ \blacksquare$$

Lemma 8.2.3 If $a, b \geq 0$ and if $s$ and $t$ are nonnegative simple functions, then
$$\int (as + bt)\, d\mu = a\int s\, d\mu + b\int t\, d\mu.$$
Proof: Let
$$s(\omega) = \sum_{i=1}^{n} \alpha_i \mathcal{X}_{A_i}(\omega), \quad t(\omega) = \sum_{j=1}^{m} \beta_j \mathcal{X}_{B_j}(\omega)$$
where the $\alpha_i$ are the distinct values of $s$ and the $\beta_j$ are the distinct values of $t$. Clearly $as + bt$ is a nonnegative simple function because it has finitely many values on measurable sets. In fact,
$$(as + bt)(\omega) = \sum_{j=1}^{m} \sum_{i=1}^{n} (a\alpha_i + b\beta_j)\, \mathcal{X}_{A_i \cap B_j}(\omega)$$
where the sets $A_i \cap B_j$ are disjoint and measurable. By Lemma 8.2.2,
$$\int (as + bt)\, d\mu = \sum_{j=1}^{m} \sum_{i=1}^{n} (a\alpha_i + b\beta_j)\, \mu(A_i \cap B_j)$$
$$= a\sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \mu(A_i \cap B_j) + b\sum_{j=1}^{m} \sum_{i=1}^{n} \beta_j \mu(A_i \cap B_j)$$
$$= a\sum_{i=1}^{n} \alpha_i \mu(A_i) + b\sum_{j=1}^{m} \beta_j \mu(B_j) = a\int s\, d\mu + b\int t\, d\mu.\ \blacksquare$$

8.3 The Monotone Convergence Theorem


The following is called the monotone convergence theorem. This theorem and re-
lated convergence theorems are the reason for using the Lebesgue integral.

Theorem 8.3.1 (Monotone Convergence theorem) Let $f$ have values in $[0,\infty]$ and suppose $\{f_n\}$ is a sequence of nonnegative measurable functions having values in $[0,\infty]$ and satisfying
$$\lim_{n\to\infty} f_n(\omega) = f(\omega) \text{ for each } \omega,$$
$$\cdots\ f_n(\omega) \leq f_{n+1}(\omega)\ \cdots$$
Then $f$ is measurable and
$$\int f\, d\mu = \lim_{n\to\infty} \int f_n\, d\mu.$$
Proof: By Lemma 8.1.4
$$\lim_{n\to\infty} \int f_n\, d\mu = \sup_n \int f_n\, d\mu = \sup_n \sup_{h>0} \sum_{k=1}^{\infty} \mu([f_n > kh])\, h = \sup_{h>0} \sup_N \sup_n \sum_{k=1}^{N} \mu([f_n > kh])\, h$$
$$= \sup_{h>0} \sup_N \sum_{k=1}^{N} \mu([f > kh])\, h = \sup_{h>0} \sum_{k=1}^{\infty} \mu([f > kh])\, h = \int f\, d\mu\ \blacksquare$$
(The second to last equality holds because the sets $[f_n > kh]$ increase to $[f > kh]$ as $n \to \infty$, so $\sup_n \mu([f_n > kh]) = \mu([f > kh])$.)
To illustrate what goes wrong without the Lebesgue integral, consider the following example.

Example 8.3.2 Let $\{r_n\}$ denote the rational numbers in $[0,1]$ and let
$$f_n(t) \equiv \begin{cases} 1 & \text{if } t \notin \{r_1, \cdots, r_n\} \\ 0 & \text{otherwise} \end{cases}$$
Then $f_n(t) \to f(t)$ where $f$ is the function which is one on the irrationals and zero on the rationals. Each $f_n$ is Riemann integrable (why?) but $f$ is not Riemann integrable. Therefore, you can't write $\int f\, dx = \lim_{n\to\infty} \int f_n\, dx$.
A meta-mathematical observation related to this type of example is this. If you can choose your functions, you don't need the Lebesgue integral. The Riemann Darboux integral is just fine. It is when you can't choose your functions and they come to you as pointwise limits that you really need the superior Lebesgue integral, or at least something more general than the Riemann integral. The Riemann integral is entirely adequate for evaluating the seemingly endless lists of boring problems found in calculus books. It is shown later that the two integrals coincide when the Lebesgue integral is taken with respect to Lebesgue measure and the function being integrated is Riemann integrable.
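One concrete setting where the monotone convergence theorem is entirely transparent is counting measure on $\mathbb{N}$ (see also the exercises in Section 8.10), where the integral of a nonnegative function is its sum over $\mathbb{N}$. A minimal sketch, not from the text: $f_n = f\cdot\mathcal{X}_{\{1,\dots,n\}}$ increases pointwise to $f$, so the partial sums must converge to the full sum; the choice $f(k) = 1/k^2$ is illustrative.

```python
import math

# Counting measure on N: integral of f = sum of f(k).  With f_n = f * indicator({1..n}),
# monotone convergence says the partial sums converge to the sum, pi**2/6 here.
f = lambda k: 1.0 / k ** 2
for n in (10, 100, 1000, 10000):
    print(n, sum(f(k) for k in range(1, n + 1)))   # integral of f_n
print("limit should be", math.pi ** 2 / 6)
```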

8.4 Other Definitions

To review and summarize the above, if $f \geq 0$ is measurable,
$$\int f\, d\mu \equiv \int_0^\infty \mu([f > \lambda])\, d\lambda \tag{8.4.2}$$
Another way to get the same thing for $\int f\, d\mu$ is to take an increasing sequence of nonnegative simple functions $\{s_n\}$ with $s_n(\omega) \to f(\omega)$ and then by the monotone convergence theorem,
$$\int f\, d\mu = \lim_{n\to\infty} \int s_n\, d\mu$$
where if $s_n(\omega) = \sum_{i=1}^{m} c_i \mathcal{X}_{E_i}(\omega)$,
$$\int s_n\, d\mu = \sum_{i=1}^{m} c_i \mu(E_i).$$
Similarly this also shows that for such a nonnegative measurable function,
$$\int f\, d\mu = \sup\left\{ \int s : 0 \leq s \leq f,\ s \text{ simple} \right\}$$
Here is an equivalent definition of the integral of a nonnegative measurable function. The fact it is well defined has been discussed above.

Definition 8.4.1 For $s$ a nonnegative simple function,
$$s(\omega) = \sum_{k=1}^{n} c_k \mathcal{X}_{E_k}(\omega), \qquad \int s = \sum_{k=1}^{n} c_k \mu(E_k).$$
For $f$ a nonnegative measurable function,
$$\int f\, d\mu = \sup\left\{ \int s : 0 \leq s \leq f,\ s \text{ simple} \right\}.$$

8.5 Fatou's Lemma

The next theorem, known as Fatou's lemma, is another important theorem which justifies the use of the Lebesgue integral.

Theorem 8.5.1 (Fatou's lemma) Let $f_n$ be a nonnegative measurable function with values in $[0,\infty]$. Let $g(\omega) = \liminf_{n\to\infty} f_n(\omega)$. Then $g$ is measurable and
$$\int g\, d\mu \leq \liminf_{n\to\infty} \int f_n\, d\mu.$$
In other words,
$$\int \left(\liminf_{n\to\infty} f_n\right) d\mu \leq \liminf_{n\to\infty} \int f_n\, d\mu$$
Proof: Let $g_n(\omega) = \inf\{f_k(\omega) : k \geq n\}$. Then
$$g_n^{-1}([a,\infty]) = \cap_{k=n}^{\infty} f_k^{-1}([a,\infty]) = \left(\cup_{k=n}^{\infty} f_k^{-1}([a,\infty])^C\right)^C \in \mathcal{F}.$$
Thus $g_n$ is measurable by Lemma 7.5.1. Also $g(\omega) = \lim_{n\to\infty} g_n(\omega)$ so $g$ is measurable because it is the pointwise limit of measurable functions. Now the functions $g_n$ form an increasing sequence of nonnegative measurable functions so the monotone convergence theorem applies. This yields
$$\int g\, d\mu = \lim_{n\to\infty} \int g_n\, d\mu \leq \liminf_{n\to\infty} \int f_n\, d\mu.$$
The last inequality holding because
$$\int g_n\, d\mu \leq \int f_n\, d\mu.$$
(Note that it is not known whether $\lim_{n\to\infty} \int f_n\, d\mu$ exists.) $\blacksquare$
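Fatou's inequality can be strict. A minimal numerical sketch (not from the text; the grid is an illustrative stand-in for Lebesgue measure on $[0,2)$): a bump that slides back and forth has $\liminf_n f_n = 0$ pointwise, yet every $\int f_n\, d\mu = 1$, so $0 = \int \liminf f_n\, d\mu < \liminf \int f_n\, d\mu = 1$.

```python
import numpy as np

# Moving bump: f_n = indicator of [n mod 2, n mod 2 + 1) on [0, 2).
x = np.linspace(0.0, 2.0, 20000, endpoint=False)
w = 2.0 / x.size                                    # measure of each grid cell
fs = [((x >= n % 2) & (x < n % 2 + 1)).astype(float) for n in range(1, 11)]
print("integrals:", [round(float(np.sum(f) * w), 3) for f in fs])    # all about 1
print("integral of liminf:", float(np.sum(np.min(fs, axis=0)) * w))  # 0.0
```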

8.6 The Integral's Righteous Algebraic Desires

The monotone convergence theorem shows the integral wants to be linear. This is the essential content of the next theorem.

Theorem 8.6.1 Let $f, g$ be nonnegative measurable functions and let $a, b$ be nonnegative numbers. Then $af + bg$ is measurable and
$$\int (af + bg)\, d\mu = a\int f\, d\mu + b\int g\, d\mu. \tag{8.6.3}$$
Proof: By Theorem 7.5.6 on Page 175 there exist increasing sequences of nonnegative simple functions, $s_n \to f$ and $t_n \to g$. Then $af + bg$, being the pointwise limit of the simple functions $as_n + bt_n$, is measurable. Now by the monotone convergence theorem and Lemma 8.2.3,
$$\int (af + bg)\, d\mu = \lim_{n\to\infty} \int (as_n + bt_n)\, d\mu = \lim_{n\to\infty} \left( a\int s_n\, d\mu + b\int t_n\, d\mu \right) = a\int f\, d\mu + b\int g\, d\mu.\ \blacksquare$$
As long as you are allowing functions to take the value $+\infty$, you cannot consider something like $f + (-g)$ and so you can't very well expect a satisfactory statement about the integral being linear until you restrict yourself to functions which have values in a vector space. This is discussed next.

8.7 The Lebesgue Integral, $L^1$

The functions considered here have values in $\mathbb{C}$, which is a vector space. A function $f$ with values in $\mathbb{C}$ is of the form $f = \operatorname{Re} f + i \operatorname{Im} f$ where $\operatorname{Re} f$ and $\operatorname{Im} f$ are real valued functions. In fact
$$\operatorname{Re} f = \frac{f + \overline{f}}{2}, \qquad \operatorname{Im} f = \frac{f - \overline{f}}{2i}.$$
Definition 8.7.1 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space and suppose $f : \Omega \to \mathbb{C}$. Then $f$ is said to be measurable if both $\operatorname{Re} f$ and $\operatorname{Im} f$ are measurable real valued functions.

As is always the case for complex numbers, $|z|^2 = (\operatorname{Re} z)^2 + (\operatorname{Im} z)^2$. Also, for $g$ a real valued function, one can consider its positive and negative parts defined respectively as
$$g^+(x) \equiv \frac{g(x) + |g(x)|}{2}, \qquad g^-(x) = \frac{|g(x)| - g(x)}{2}.$$
Thus $|g| = g^+ + g^-$ and $g = g^+ - g^-$ and both $g^+$ and $g^-$ are measurable nonnegative functions if $g$ is measurable.
Then the following is the definition of what it means for a complex valued function $f$ to be in $L^1(\Omega)$.

Definition 8.7.2 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Then a complex valued measurable function $f$ is in $L^1(\Omega)$ if
$$\int |f|\, d\mu < \infty.$$
For a function in $L^1(\Omega)$, the integral is defined as follows.
$$\int f\, d\mu \equiv \left[\int (\operatorname{Re} f)^+\, d\mu - \int (\operatorname{Re} f)^-\, d\mu\right] + i\left[\int (\operatorname{Im} f)^+\, d\mu - \int (\operatorname{Im} f)^-\, d\mu\right]$$
I will show that with this definition, the integral is linear and well defined. First note that it is clearly well defined because all the above integrals are of nonnegative functions and are each equal to a nonnegative real number because for $h$ equal to any of the functions, $|h| \leq |f|$ and $\int |f|\, d\mu < \infty$.
Here is a lemma which will make it possible to show the integral is linear.
Here is a lemma which will make it possible to show the integral is linear.

Lemma 8.7.3 Let g, h, g , h be nonnegative measurable functions in L1 () and


suppose that
g h = g h .

Then
gd hd = g d h d.

Proof: By assumption, g + h = g + h. Then from the Lebesgue integrals


righteous algebraic desires, Theorem 8.6.1,


gd + h d = g d + hd

which implies the claimed result. 


Lemma 8.7.4 Let $\operatorname{Re}\left(L^1(\Omega)\right)$ denote the vector space of real valued functions in $L^1(\Omega)$ where the field of scalars is the real numbers. Then $\int d\mu$ is linear on $\operatorname{Re}\left(L^1(\Omega)\right)$.

Proof: First observe that from the denition of the positive and negative parts
of a function,
( )
(f + g) (f + g) = f + + g + f + g
+

because both sides equal f + g. Therefore from Lemma 8.7.3 and the denition, it
follows from Theorem 8.6.1 that


(f + g) (f + g) d = f + + g + d f + g d
+
f + gd
( )

= f d + g d
+ +
f d + g d = f d + gd.

what about taking out scalars? First note that if a is real and nonnegative, then

(af ) = af + and (af ) = af while if a < 0, then (af ) = af and (af ) =
+ +

af . These claims follow immediately from the above denitions of positive and
+

negative parts of a function. Thus if a < 0 and f L1 () , it follows from Theorem


8.6.1 that

+
af d (af ) d (af ) d = (a) f d (a) f + d
( )

= a f d + a f d = a +
f d f d a f d.
+

The case where a 0 works out similarly but easier. 


Now here is the main result.

Theorem 8.7.5 $\int d\mu$ is linear on $L^1(\Omega)$ and $L^1(\Omega)$ is a complex vector space. If $f \in L^1(\Omega)$, then $\operatorname{Re} f$, $\operatorname{Im} f$, and $|f|$ are all in $L^1(\Omega)$. Furthermore, for $f \in L^1(\Omega)$,
$$\int f\, d\mu \equiv \left[\int (\operatorname{Re} f)^+\, d\mu - \int (\operatorname{Re} f)^-\, d\mu\right] + i\left[\int (\operatorname{Im} f)^+\, d\mu - \int (\operatorname{Im} f)^-\, d\mu\right] = \int \operatorname{Re} f\, d\mu + i\int \operatorname{Im} f\, d\mu$$
and the triangle inequality holds,
$$\left|\int f\, d\mu\right| \leq \int |f|\, d\mu. \tag{8.7.4}$$
Also, for every $f \in L^1(\Omega)$ it follows that for every $\varepsilon > 0$ there exists a simple function $s$ such that $|s| \leq |f|$ and
$$\int |f - s|\, d\mu < \varepsilon.$$
Also $L^1(\Omega)$ is a vector space.

Proof: First consider the claim


( that) the integral is linear. It was shown above
that the integral is linear on Re L1 () . Then letting a + ib, c + id be scalars and
f, g functions in L1 () ,

(a + ib) f + (c + id) g = (a + ib) (Re f + i Im f ) + (c + id) (Re g + i Im g)

= c Re (g)b Im (f )d Im (g)+a Re (f )+i (b Re (f ) + c Im (g) + a Im (f ) + d Re (g))


It follows from the denition that

(a + ib) f + (c + id) gd = (c Re (g) b Im (f ) d Im (g) + a Re (f )) d


+i (b Re (f ) + c Im (g) + a Im (f ) + d Re (g)) (8.7.5)

Also, from the denition,


( )
(a + ib) f d + (c + id) gd = (a + ib) Re f d + i Im f d
( )
+ (c + id) Re gd + i Im gd

which equals

= a Re f d b Im f d + ib Re f d + ia Im f d

+c Re gd d Im gd + id Re gd d Im gd.

Using Lemma 8.7.4 and collecting terms, it follows that this reduces to 8.7.5. Thus
the integral is linear as claimed.
Consider the claim about approximation with a simple function. Letting h equal
any of
+ +
(Re f ) , (Re f ) , (Im f ) , (Im f ) , (8.7.6)
It follows from the monotone convergence theorem and Theorem 7.5.6 on Page 175
there exists a nonnegative simple function s h such that


|h s| d < .
4
Therefore, letting s1 , s2 , s3 , s4 be such simple functions, approximating respectively
the functions listed in 8.7.6, and s s1 s2 + i (s3 s4 ) ,

+
|f s| d (Re f ) s1 d + (Re f ) s2 d

+
+ (Im f ) s3 d + (Im f ) s4 d <

It is clear from the construction that |s| |f |.


What about 8.7.4? Let C be such that || = 1 and f d = f d . Then
from what was shown above about the integral being linear,


f d = f d = f d = Re (f ) d |f | d.

If f, g L1 () , then it is known that for a, b scalars, it follows that af + bg is


measurable. See Corollary 7.5.13. Also

|af + bg| d |a| |f | + |b| |g| d < . 

The following corollary follows from this. The conditions of this corollary are
sometimes taken as a denition of what it means for a function f to be in L1 ().

Corollary 8.7.6 $f \in L^1(\Omega)$ if and only if there exists a sequence of complex simple functions $\{s_n\}$ such that
$$s_n(\omega) \to f(\omega) \text{ for all } \omega, \qquad \lim_{m,n\to\infty} \int |s_n - s_m|\, d\mu = 0. \tag{8.7.7}$$
When $f \in L^1(\Omega)$,
$$\int f\, d\mu \equiv \lim_{n\to\infty} \int s_n\, d\mu. \tag{8.7.8}$$

Proof: From the above theorem, if f L1 there exists a sequence of simple


functions {sn } such that

|f sn | d < 1/n, sn () f () for all

Then
1 1
|sn sm | d |sn f | d + |f sm | d + .
n m
Next suppose the existence of the approximating sequence of simple functions.
Then f is measurable because its real and imaginary parts are the limit of measur-
able functions. By Fatous lemma,

|f | d lim inf |sn | d <
n

because

|sn | d |sm | d |sn sm | d

{ }
which is given to converge to 0. Hence |sn | d is a Cauchy sequence and is
therefore, bounded.
In case f L1 () , letting {sn } be the approximating sequence, Fatous lemma
implies


f d sn d |f sn | d lim inf |sm sn | d <
m

provided n is large enough. Hence 8.7.8 follows. 


This is a good time to make the following fundamental observation, which follows from a repeat of the above arguments.

Theorem 8.7.7 Suppose $\Lambda(f) \in [0,\infty]$ for all nonnegative measurable functions $f$ and suppose that for $a, b \geq 0$ and $f, g$ nonnegative measurable functions,
$$\Lambda(af + bg) = a\Lambda(f) + b\Lambda(g).$$
In other words, $\Lambda$ wants to be linear. Then $\Lambda$ has a unique linear extension to the complex valued measurable functions.

8.8 The Dominated Convergence Theorem


One of the major theorems in this theory is the dominated convergence theorem.
Before presenting it, here is a technical lemma about lim sup and lim inf which is
really pretty obvious from the denition.
Lemma 8.8.1 Let {an } be a sequence in [, ] . Then limn an exists if and
only if
lim inf an = lim sup an
n n
and in this case, the limit equals the common value of these two numbers.
Proof: Suppose rst limn an = a R. Then, letting > 0 be given, an
(a , a + ) for all n large enough, say n N. Therefore, both inf {ak : k n}
and sup {ak : k n} are contained in [a , a + ] whenever n N. It follows
lim supn an and lim inf n an are both in [a , a + ] , showing


lim inf an lim sup an < 2.
n n

Since is arbitrary, the two must be equal and they both must equal a. Next suppose
limn an = . Then if l R, there exists N such that for n N,
l an
and therefore, for such n,
l inf {ak : k n} sup {ak : k n}
and this shows, since l is arbitrary that
lim inf an = lim sup an = .
n n

The case for is similar.


Conversely, suppose lim inf n an = lim supn an = a. Suppose rst that
a R. Then, letting > 0 be given, there exists N such that if n N,
sup {ak : k n} inf {ak : k n} <
therefore, if k, m > N, and ak > am ,
|ak am | = ak am sup {ak : k n} inf {ak : k n} <
showing that {an } is a Cauchy sequence. Therefore, it converges to a R, and
as in the rst part, the lim inf and lim sup both equal a. If lim inf n an =
lim supn an = , then given l R, there exists N such that for n N,
inf an > l.
n>N

Therefore, limn an = . The case for is similar. 


Here is the dominated convergence theorem.

Theorem 8.8.2 (Dominated Convergence theorem) Let $f_n \in L^1(\Omega)$ and suppose
$$f(\omega) = \lim_{n\to\infty} f_n(\omega),$$
and there exists a measurable function $g$, with values in $[0,\infty]$,¹ such that
$$|f_n(\omega)| \leq g(\omega) \text{ and } \int g(\omega)\, d\mu < \infty.$$
Then $f \in L^1(\Omega)$ and
$$0 = \lim_{n\to\infty} \int |f_n - f|\, d\mu = \lim_{n\to\infty} \left|\int f\, d\mu - \int f_n\, d\mu\right|$$
Proof: $f$ is measurable by Theorem 7.5.3. Since $|f| \leq g$, it follows that
$$f \in L^1(\Omega) \text{ and } |f - f_n| \leq 2g.$$
By Fatou's lemma (Theorem 8.5.1),
$$\int 2g\, d\mu \leq \liminf_{n\to\infty} \int \left(2g - |f - f_n|\right) d\mu = \int 2g\, d\mu - \limsup_{n\to\infty} \int |f - f_n|\, d\mu.$$
Subtracting $\int 2g\, d\mu$,
$$0 \leq -\limsup_{n\to\infty} \int |f - f_n|\, d\mu.$$
Hence
$$0 \geq \limsup_{n\to\infty} \left(\int |f - f_n|\, d\mu\right) \geq \liminf_{n\to\infty} \left(\int |f - f_n|\, d\mu\right) \geq \liminf_{n\to\infty} \left|\int f\, d\mu - \int f_n\, d\mu\right| \geq 0.$$
This proves the theorem by Lemma 8.8.1 because the lim sup and lim inf are equal. $\blacksquare$
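A small numerical sketch of the theorem's conclusion (not from the text; the interval and the family of functions are illustrative): $f_n(x) = x^n$ on $[0,1]$ is dominated by the integrable function $g = 1$ and converges to $0$ almost everywhere, so the integrals $1/(n+1)$ must tend to the integral of the limit, namely $0$.

```python
import numpy as np

# Dominated convergence on [0, 1]: f_n(x) = x**n -> 0 a.e., |f_n| <= g = 1.
x = np.linspace(0.0, 1.0, 100001)
for n in (1, 5, 25, 125, 625):
    print(n, float(np.mean(x ** n)))   # grid average approximates 1/(n+1) -> 0
```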


Corollary 8.8.3 Suppose $f_n \in L^1(\Omega)$ and $f(\omega) = \lim_{n\to\infty} f_n(\omega)$. Suppose also there exist measurable functions $g_n, g$ with values in $[0,\infty]$ such that $\lim_{n\to\infty} \int g_n\, d\mu = \int g\, d\mu$, $g_n(\omega) \to g(\omega)$ a.e., and both $\int g_n\, d\mu$ and $\int g\, d\mu$ are finite. Also suppose $|f_n(\omega)| \leq g_n(\omega)$. Then
$$\lim_{n\to\infty} \int |f - f_n|\, d\mu = 0.$$

¹ Note that, since $g$ is allowed to have the value $\infty$, it is not known that $g \in L^1(\Omega)$.

Proof: It is just like the above. This time $(g_n + g) - |f - f_n| \geq 0$ and so by Fatou's lemma and the assumption that $\lim_{n\to\infty} \int g_n\, d\mu = \int g\, d\mu$,
$$\int 2g\, d\mu \leq \liminf_{n\to\infty} \int \left((g_n + g) - |f - f_n|\right) d\mu = \lim_{n\to\infty} \int (g_n + g)\, d\mu - \limsup_{n\to\infty} \int |f - f_n|\, d\mu = \int 2g\, d\mu - \limsup_{n\to\infty} \int |f - f_n|\, d\mu$$
and so $\limsup_{n\to\infty} \int |f - f_n|\, d\mu \leq 0$. Thus
$$0 \geq \limsup_{n\to\infty} \left(\int |f - f_n|\, d\mu\right) \geq \liminf_{n\to\infty} \left(\int |f - f_n|\, d\mu\right) \geq 0$$
and so $\lim_{n\to\infty} \int |f - f_n|\, d\mu = 0$. $\blacksquare$

Definition 8.8.4 Let $E$ be a measurable subset of $\Omega$.
$$\int_E f\, d\mu \equiv \int f \mathcal{X}_E\, d\mu.$$
If $L^1(E)$ is written, the $\sigma$ algebra is defined as
$$\{E \cap A : A \in \mathcal{F}\}$$
and the measure is $\mu$ restricted to this smaller $\sigma$ algebra. Clearly, if $f \in L^1(\Omega)$, then
$$f\mathcal{X}_E \in L^1(E)$$
and if $f \in L^1(E)$, then letting $\tilde{f}$ be the 0 extension of $f$ off of $E$, it follows $\tilde{f} \in L^1(\Omega)$.

8.9 The One Dimensional Lebesgue Stieltjes Integral

Let $F$ be an increasing function defined on $\mathbb{R}$. Let $\mu$ be the Lebesgue Stieltjes measure defined in Theorems 7.6.1 and 7.2.3. The conclusions of these theorems are reviewed here.

Theorem 8.9.1 Let $F$ be an increasing function defined on $\mathbb{R}$, an integrator function. There exists a function $\mu : \mathcal{P}(\mathbb{R}) \to [0,\infty]$ which satisfies the following properties.

1. If $A \subseteq B$, then $0 \leq \mu(A) \leq \mu(B)$, $\mu(\emptyset) = 0$.

2. $\mu\left(\cup_{k=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} \mu(A_i)$

3. $\mu([a,b]) = F(b+) - F(a-)$,

4. $\mu((a,b)) = F(b-) - F(a+)$

5. $\mu((a,b]) = F(b+) - F(a+)$

6. $\mu([a,b)) = F(b-) - F(a-)$ where
$$F(b+) \equiv \lim_{t\to b+} F(t),\qquad F(a-) \equiv \lim_{t\to a-} F(t).$$

There also exists a $\sigma$ algebra $\mathcal{S}$ of measurable sets on which $\mu$ is a measure which contains the open sets and also satisfies the regularity conditions,
$$\mu(E) = \sup\{\mu(K) : K \text{ is a closed and bounded set, } K \subseteq E\} \tag{8.9.9}$$
$$\mu(E) = \inf\{\mu(V) : V \text{ is an open set, } V \supseteq E\} \tag{8.9.10}$$
whenever $E$ is a set in $\mathcal{S}$.
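The interval formulas in Theorem 8.9.1 only need the one-sided limits of $F$. Here is a minimal sketch (not from the text) for an illustrative integrator with a unit jump at 0, namely $F(x) = x$ for $x < 0$ and $F(x) = x + 1$ for $x \geq 0$; the four interval measures differ exactly when an endpoint carries a jump, and the jump itself is the measure of the singleton. The small epsilon is a numerical stand-in for the one-sided limits.

```python
def F(x):
    return x + (1.0 if x >= 0 else 0.0)   # increasing, unit jump at 0

EPS = 1e-9
Fp = lambda x: F(x + EPS)                  # F(x+)
Fm = lambda x: F(x - EPS)                  # F(x-)

a, b = 0.0, 1.0
print("mu([0,1]) =", Fp(b) - Fm(a))        # about 2: includes the jump at 0
print("mu((0,1)) =", Fm(b) - Fp(a))        # about 1: excludes it
print("mu((0,1]) =", Fp(b) - Fp(a))        # about 1
print("mu([0,1)) =", Fm(b) - Fm(a))        # about 2
print("mu({0})   =", Fp(0.0) - Fm(0.0))    # about 1: the size of the jump
```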

The Lebesgue integral taken with respect to this measure is called the Lebesgue Stieltjes integral. Note that any real valued continuous function is measurable with respect to $\mathcal{S}$. This is because if $f$ is continuous, inverse images of open sets are open and open sets are in $\mathcal{S}$. Thus $f$ is measurable because $f^{-1}((a,b)) \in \mathcal{S}$. Similarly, if $f$ has complex values this argument applied to its real and imaginary parts yields the conclusion that $f$ is measurable.
For $f$ a continuous function, how does the Lebesgue Stieltjes integral compare with the Darboux Stieltjes integral? To answer this question, here is a technical lemma.

Lemma 8.9.2 Let $D$ be a countable subset of $\mathbb{R}$ and suppose $a, b \notin D$. Also suppose $f$ is a continuous function defined on $[a,b]$. Then there exists a sequence of functions $\{s_n\}$ of the form
$$s_n(x) \equiv \sum_{k=1}^{m_n} f\left(z_{k-1}^n\right) \mathcal{X}_{[z_{k-1}^n, z_k^n)}(x)$$
such that each $z_k^n \notin D$ and
$$\sup\{|s_n(x) - f(x)| : x \in [a,b]\} < 1/n.$$


Proof: First note that D contains no intervals. To see this let D = {dk }k=1 . If
D has an interval of length 2, let Ik be an interval centered at dk which has length
/2k . Therefore, the sum of the lengths of these intervals is no more than


= .
2k
k=1

Thus D cannot contain an interval of length 2. Since is arbitrary, D cannot


contain any interval.
Since f is continuous, it follows from Theorem 4.4.2 on Page 72 that f is uni-
formly continuous. Therefore, there exists > 0 such that if |x y| 3, then

|f (x) f (y)| < 1/n

Now let {x0 , , xmn } be a partition of [a, b] such that |xi xi1 | < for each i.
/ D and |zkn xk | < . Then
For k = 1, 2, , mn 1, let zkn
n
zk zk1 n
|zkn xk | + |xk xk1 | + xk1 zk1
n
< 3.

It follows that for each x [a, b]


m
n
( n )

f zk1 X[zk1
n n ) (x) f (x) < 1/n. 
,zk

k=1

Proposition 8.9.3 Let $f$ be a continuous function defined on $\mathbb{R}$. Also let $F$ be an increasing function defined on $\mathbb{R}$. Then whenever $c, d$ are not points of discontinuity of $F$ and $[a,b] \supseteq [c,d]$,
$$\int_a^b f\mathcal{X}_{[c,d]}\, dF = \int f\mathcal{X}_{[c,d]}\, d\mu$$
Here $\mu$ is the Lebesgue Stieltjes measure defined above.

Proof: Since F is an increasing function it can have only countably many dis-
continuities. The reason for this is that the only kind of discontinuity it can have is
where F (x+) > F (x) . Now since F is increasing, the intervals (F (x) , F (x+))
for x a point of discontinuity are disjoint and so since each must contain a rational
number and the rational numbers are countable, and therefore so are these intervals.
Let D denote this countable set of discontinuities of F . Then if l, r
/ D, [l, r]
[a, b] , it follows quickly from the denition of the Darboux Stieltjes integral that
b
X[l,r) dF = F (r) F (l) = F (r) F (l)
a

= ([l, r)) = X[l,r) d.

Now let {sn } be the sequence of step functions of Lemma 8.9.2 such that these step
functions converge uniformly to f on [c, d] , say maxx |f (x) sn (x)| < 1/n. Then

( ) 1
X[c,d] f X[c,d] sn d X[c,d] (f sn ) d ([c, d])
n
and

b( ) b
1
X[c,d] f X[c,d] sn dF X[c,d] |f sn | dF < (F (b) F (a)) .
a a n

Also if sn is given by the formula of Lemma 8.9.2,


mn

mn
( n ) ( n )
X[c,d] sn d = f zk1 X[zk1
n n ) d =
,zk f zk1 X[zk1
n ,zkn ) d
k=1 k=1


mn
( n ) ( n ) mn
( n )( ( n ))
= f zk1 [zk1 , zkn ) = f zk1 F (zkn ) F zk1
k=1 k=1


mn
( n )( ( n ))
= f zk1 F (zkn ) F zk1
k=1
mn b

( n ) b
= f zk1 X[zk1
n n ) dF =
,zk sn dF.
k=1 a a

Therefore,
b


X[c,d] f d X[c,d] f dF
X[c,d] f d X s
[c,d] n d
a
b b
b

+ X[c,d] sn d sn dF + sn dF X[c,d] f dF
a a a

1 1
([c, d]) + (F (b) F (a))

n n
and since n is arbitrary, this shows
b
f d f dF = 0. 
a

In particular, in the special case where $F$ is continuous and $f$ is continuous,
$$\int_a^b f\, dF = \int \mathcal{X}_{[a,b]} f\, d\mu.$$
Thus, if $F(x) = x$ so the Darboux Stieltjes integral is the usual integral from calculus,
$$\int_a^b f(t)\, dt = \int \mathcal{X}_{[a,b]} f\, d\mu$$
where $\mu$ is the measure which comes from $F(x) = x$ as described above. This measure is often denoted by $m$. Thus when $f$ is continuous
$$\int_a^b f(t)\, dt = \int \mathcal{X}_{[a,b]} f\, dm$$
and so there is no problem in writing
$$\int_a^b f(t)\, dt$$
for either the Lebesgue or the Riemann integral. Furthermore, when $f$ is continuous, you can compute the Lebesgue integral by using the fundamental theorem of calculus because in this case, the two integrals are equal.

8.10 Exercises
1. Let $\Omega = \mathbb{N} = \{1, 2, \cdots\}$. Let $\mathcal{F} = \mathcal{P}(\mathbb{N})$, the set of all subsets of $\mathbb{N}$, and let $\mu(S) =$ number of elements in $S$. Thus $\mu(\{1\}) = 1 = \mu(\{2\})$, $\mu(\{1,2\}) = 2$, etc. Show $(\Omega, \mathcal{F}, \mu)$ is a measure space. It is called counting measure. What functions are measurable in this case? For a nonnegative function $f$ defined on $\mathbb{N}$, show
$$\int_{\mathbb{N}} f\, d\mu = \sum_{k=1}^{\infty} f(k)$$
What do the monotone convergence and dominated convergence theorems say about this example?
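With counting measure the integral is an infinite series, so the dominated convergence theorem becomes a statement about interchanging a limit with a sum. A minimal sketch (not from the text; the particular sequence and the truncation length are illustrative): $f_n(k) = \sin(k/n)/k^2$ is dominated by the summable $g(k) = 1/k^2$ and tends to 0 pointwise, so the series must tend to 0.

```python
import math

def series(n, terms=200000):
    # truncation of the sum over k of sin(k/n)/k**2; the tail is negligible
    return sum(math.sin(k / n) / k ** 2 for k in range(1, terms))

for n in (1, 10, 100, 1000):
    print(n, series(n))        # tends to 0 as n grows, as dominated convergence predicts
```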

2. For the measure space of Problem 1, give an example of a sequence of


nonnegative measurable functions {fn } converging pointwise to a function f ,
such that inequality is obtained in Fatous lemma.

3. If (, F, ) is a measure space
and f
0 is measurable, show that 1if g () =
f () a.e. and g 0, then gd = f d. Show that if f, g L () and
g () = f () a.e. then gd = f d.

4. An algebra A of subsets of is a subset of the power set such that is in the



algebra and for A, B A, A \ B and A B are both in A. Let C {Ei }i=1

be a countable collection of sets and let 1 i=1 Ei . Show there exists an
algebra of sets A, such that A C and A is countable. Note the dierence
between this problem and Problem 6. Hint: Let C1 denote all nite unions
of sets of C and 1 . Thus C1 is countable. Now let B1 denote all complements
with respect to 1 of sets of C1 . Let C2 denote all nite unions of sets of
B1 C1 . Continue in this way, obtaining an increasing sequence Cn , each of
which is countable. Let
A i=1 Ci .

5. Let A P () where P () denotes the set of all subsets of . Let (A)


denote the intersection of all algebras which contain A, one of these being
P (). Show (A) is also a algebra.

6. We say a function g mapping a normed vector space, to a normed vector


space is Borel measurable if whenever U is open, g 1 (U ) is a Borel set. (The
Borel sets are those sets in the smallest algebra which contains the open
sets.) Let f : X and let g : X Y where X is a normed vector space
and Y equals C, R, or (, ] and F is a algebra of sets of . Suppose f
is measurable and g is Borel measurable. Show g f is measurable.

7. Let (, F, ) be a measure space. Dene : P() [0, ] by

(A) = inf{(B) : B A, B F}.

Show satises

() = 0, if A B, (A) (B),



(i=1 Ai ) (Ai ), (A) = (A) if A F .
i=1

If satises these conditions, it is called an outer measure. This shows every


measure determines an outer measure on the power set.
8. Suppose (, S, ) is a measure space which may not be complete. Show that
one can obtain a larger algebra and extend the measure to this larger
algebra in such a way that the resulting measure space is complete. Do it by
considering the outer measure determined by and then using Caratheodorys
procedure to get a possibly larger algebra such that on this algebra, this
outer measure is a measure.

9. Let {Ei } be a sequence of measurable sets with the property that




(Ei ) < .
i=1

Let S = { such that Ei for innitely many values of i}. Show


(S) = 0 and S is measurable. This is part of the Borel Cantelli lemma.
Hint: Write S in terms of intersections and unions. Something is in S means
that for every n there exists k > n such that it is in Ek . Remember the tail
of a convergent series is small.

10. Let {fn } , f be measurable functions with values in C. {fn } converges in


measure if
lim (x : |f (x) fn (x)| ) = 0
n

for each xed > 0. Prove the theorem of F. Riesz. If fn converges to f


in measure, then there exists a subsequence {fnk } which converges to f a.e.
Hint: Choose n1 such that

(x : |f (x) fn1 (x)| 1) < 1/2.



Choose n2 > n1 such that

(x : |f (x) fn2 (x)| 1/2) < 1/22,

n3 > n2 such that

(x : |f (x) fn3 (x)| 1/3) < 1/23,

etc. Now consider what it means for fnk (x) to fail to converge to f (x). Then
use Problem 9.
11. Suppose (, ) is a nite measure space ( () < ) and S L1 (). Then
S is said to be uniformly integrable if for every > 0 there exists > 0 such
that if E is a measurable set satisfying (E) < , then

|f | d <
E

for all f S. Show S is uniformly integrable and bounded in L1 () if there


exists an increasing function h which satises
{ }
h (t)
lim = , sup h (|f |) d : f S < .
t t

S is bounded if there is some number, M such that



|f | d M

for all f S.
12. Let (, F, ) be a measure space and suppose f, g : (, ] are
measurable. Prove the sets

{ : f () < g()} and { : f () = g()}

are measurable. Hint: The easy way to do this is to write

{ : f () < g()} = rQ [f < r] [g > r] .

Note that l (x, y) = x y is not continuous on (, ] so the obvious idea


doesnt work.
13. Let {fn } be a sequence of real or complex valued measurable functions. Let

S = { : {fn ()} converges}.

Show S is measurable. Hint: You might try to exhibit the set where fn
converges in terms of countable unions and intersections using the denition
of a Cauchy sequence.

14. Suppose un (t) is a dierentiable function for t (a, b) and suppose that for
t (a, b),
|un (t)|, |un (t)| < Kn

where n=1 Kn < . Show



( un (t)) = un (t).
n=1 n=1

Hint: This is an exercise in the use of the dominated convergence theorem


and the mean value theorem.

15. Suppose {fn } is a sequence of nonnegative measurable functions dened on


a measure space, (, S, ). Show that



fk d = fk d.
k=1 k=1

Hint: Use the monotone convergence theorem along with the fact the integral
is linear.

16. The integral $\int_0^\infty f(t)\, dt$ will denote the Lebesgue integral taken with respect to one dimensional Lebesgue measure as discussed earlier. Show that for $a > 0$, $t \mapsto e^{-at^2}$ is in $L^1(\mathbb{R})$. The gamma function is defined for $x > 0$ as
$$\Gamma(x) \equiv \int_0^\infty e^{-t} t^{x-1}\, dt$$
Show $t \mapsto e^{-t} t^{x-1}$ is in $L^1(\mathbb{R})$ for all $x > 0$. Also show that
$$\Gamma(x+1) = x\,\Gamma(x),\qquad \Gamma(1) = 1.$$
How does $\Gamma(n)$ for $n$ an integer compare with $(n-1)!$?

17. This problem outlines a treatment of Stirling's formula, which is a very useful approximation to $n!$, based on a section in [39]. It is an excellent application of the monotone convergence theorem. Follow and justify the following steps using the convergence theorems for the Lebesgue integral as needed. Here $x > 0$.
$$\Gamma(x+1) = \int_0^\infty e^{-t} t^x\, dt$$
First change the variables letting $t = x(1+u)$ to get
$$\Gamma(x+1) = e^{-x} x^{x+1} \int_{-1}^{\infty} \left(e^{-u}(1+u)\right)^x du$$
Next make the change of variables $u = s\sqrt{\frac{2}{x}}$ to obtain
$$\Gamma(x+1) = \sqrt{2}\, e^{-x} x^{x+(1/2)} \int_{-\sqrt{x/2}}^{\infty} \left( e^{-s\sqrt{2/x}} \left(1 + s\sqrt{\tfrac{2}{x}}\right)\right)^x ds$$
The integrand is increasing in $x$. This is most easily seen by taking $\ln$ of the integrand and then taking the derivative with respect to $x$. This derivative is positive. Next show the limit of the integrand as $x \to \infty$ is $e^{-s^2}$. This isn't too bad if you take $\ln$ and then use L'Hospital's rule. Consider the integral. Explain why it must be increasing in $x$. Next justify the following assertion. Remember the monotone convergence theorem applies to a sequence of functions.
$$\lim_{x\to\infty} \int_{-\sqrt{x/2}}^{\infty} \left( e^{-s\sqrt{2/x}} \left(1 + s\sqrt{\tfrac{2}{x}}\right)\right)^x ds = \int_{-\infty}^{\infty} e^{-s^2}\, ds$$
Now Stirling's formula is
$$\lim_{x\to\infty} \frac{\Gamma(x+1)}{\sqrt{2}\, e^{-x} x^{x+(1/2)}} = \int_{-\infty}^{\infty} e^{-s^2}\, ds$$
where this last improper integral equals a well defined constant (why?). It is very easy, when you know something about multiple integrals of functions of more than one variable, to verify this constant is $\sqrt{\pi}$ but the necessary mathematical machinery has not yet been presented. It can also be done through much more difficult arguments in the context of functions of only one variable. See [39] for these clever arguments.
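If you want to see the limit in this problem numerically, here is a small sketch (not from the text) checking that $\Gamma(x+1)/\left(\sqrt{2}\, e^{-x} x^{x+1/2}\right)$ approaches $\sqrt{\pi}$; it works in logarithms so large $x$ does not overflow. The sample values of $x$ are illustrative.

```python
import math

# Ratio Gamma(x+1) / (sqrt(2) * exp(-x) * x**(x + 1/2)) should tend to sqrt(pi).
for x in (5.0, 10.0, 50.0, 100.0, 1000.0):
    log_ratio = math.lgamma(x + 1) - (0.5 * math.log(2) - x + (x + 0.5) * math.log(x))
    print(x, math.exp(log_ratio))
print("sqrt(pi) =", math.sqrt(math.pi))
```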
18. To show you the power of Stirlings formula, nd whether the series

n!en
n=1
nn

converges. The ratio test falls at but you can try it if you like. Now explain
why, if n is large enough
( )
1 n n+(1/2)
es ds c 2en nn+(1/2) .
2
n! 2e n
2

Use this.
19. Give a theorem in which the improper Riemann integral coincides with a
suitable Lebesgue integral. (There are many such situations just nd one.)

20. Note that 0 sinx x dx is a valid improper Riemann integral dened by
R
sin x
lim dx
R 0 x
but this function, sin x/x is not in L1 ([0, )). Why?

21. Let f be a nonnegative strictly decreasing function dened on [0, ). For


0 y f (0), let f 1 (y) = x where y [f (x+) , f (x)]. (Draw a picture. f
could have jump discontinuities.) Show that f 1 is nonincreasing and that
f (0)
f (t) dt = f 1 (y) dy.
0 0

22. If A is mS measurable, it does not follow that A is m measurable. Give an


example to show this is the case. You can use the existence of non measurable
sets.
23. If f is a nonnegative Lebesgue measurable function, show there exists g a
Borel measurable function such that g (x) = f (x) a.e.
The Lebesgue Integral For Functions Of n Variables

9.1 Completion Of Measure Spaces, Approximation

Suppose $(\Omega, \mathcal{F}, \mu)$ is a measure space. Then it is always possible to enlarge the $\sigma$ algebra and define a new measure $\lambda$ on this larger $\sigma$ algebra $\mathcal{G}$ such that $(\Omega, \mathcal{G}, \lambda)$ is a complete measure space. Recall this means that if
$$(E \setminus E') \cup (E' \setminus E) \subseteq N$$
where $\lambda(N) = 0$ with $N \in \mathcal{G}$ and $E \in \mathcal{G}$, then $E' \in \mathcal{G}$ also. The following theorem is the main result. The new measure space is called the completion of the measure space.

Definition 9.1.1 A measure space $(\Omega, \mathcal{F}, \mu)$ is called $\sigma$ finite if there exists a sequence $\{\Omega_n\} \subseteq \mathcal{F}$ such that $\cup_n \Omega_n = \Omega$ and $\mu(\Omega_n) < \infty$.

For example, if $X$ is a finite dimensional normed vector space and $\mu$ is a measure defined on $\mathcal{B}(X)$ which is finite on compact sets, then you could take $\Omega_n = B(0,n)$.

Theorem 9.1.2 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Then there exists a measure space $(\Omega, \mathcal{G}, \lambda)$ satisfying

1. $(\Omega, \mathcal{G}, \lambda)$ is a complete measure space.

2. $\lambda = \mu$ on $\mathcal{F}$

3. $\mathcal{G} \supseteq \mathcal{F}$

4. For every $E \in \mathcal{G}$ there exists $G \in \mathcal{F}$ such that $G \supseteq E$ and $\mu(G) = \lambda(E)$.

In addition to this, if $(\Omega, \mathcal{F}, \mu)$ is $\sigma$ finite, then the following approximation result holds.

5. For every $E \in \mathcal{G}$ there exists $F \in \mathcal{F}$ and $G \in \mathcal{F}$ such that $F \subseteq E \subseteq G$ and
$$\mu(G \setminus F) = \lambda(G \setminus F) = 0 \tag{9.1.1}$$

There is a unique complete measure space $(\Omega, \mathcal{G}, \lambda)$ extending $(\Omega, \mathcal{F}, \mu)$ which satisfies 9.1.1.

Proof: Dene the outer measure

(A) inf { (E) : E F} , () 0.

Denote by G the algebra of measurable sets. Then I claim that = on F. It


is clear that on F directly from the denition. Now if A F and (A) = ,
there is nothing to prove. Therefore, assume (A) < and suppose A E such
that E F and
(A) + > (E) (A)
Then since is arbitrary, also.
From the denition, there exists E S such that if (S) < ,

(S) + > (E) (S)

Let En S, be a decreasing sequence of sets of F such that

(S) (En ) (S) + 2n

Now let G = n=1 En . It follows that G S and (S) = (G).


Why is F G? Letting (S) < , (There is nothing to prove if (S) = .)
let G F be such that G S and (S) = (G) . Then if A F ,
( ) ( )
(S) (S A) + S AC (G A) + G AC
( )
= (G A) + G AC = (G) = (S) .

Thus F G..
Finally suppose is nite. Let =
n=1 n where the n are disjoint sets of
F and (n ) < . Letting A G, consider An A n . From(what was) just
shown, there exists Gn AC n , Gn n such that (Gn ) = AC n .

Gn
n AC

n A

Since (n ) < , this implies


( ( )) ( )
Gn \ AC n = (Gn ) AC n = 0.

n A n and so Fn Gn n An and (An \ Fn ) =


Now GC C

( ( )) ( )
A n \ GC
n n = (A n Gn ) = (A Gn ) = Gn \ AC
( ( ))
Gn \ AC n = 0.

Letting F = n Fn , it follows that F F and




(A \ F ) (Ak \ Fk ) = 0.
k=1

Also, there exists Gn An such that (Gn ) = (Gn ) = (An ) . Since the measures
are nite, it follows that (Gn \ An ) = 0. Then letting G =
n=1 Gn , it follows that
G A and

(G \ A) = (
n=1 Gn \ n=1 An )


(
n=1 Gn \ An ) (Gn \ An ) = 0.
n=1

Thus (G \ F )(= (G ) \ F ) = (G \ A) + (A \ F ) = 0.
If you have , G complete and satisfying 9.1.1, then letting E G , it follows
from 5, there exist F, G F such that

F E G, (G \ F ) = 0 = (G \ F ) .

Therefore, by completeness of the two measure spaces, E G. The opposite in-


clusion is similar. Hence G = G . If E G, let F E G where (G \ F ) = 0.
Then
(E) (G) = (F ) = (F ) (E)

The opposite inequality holds by the same reasoning. Hence = . 

Theorem 9.1.3 Let (, F, ) be a complete measure space and let f g h be


functions having values in [0, ] . Suppose also that f () = h () a.e. and that f
and h are measurable. Then g is also measurable. If (, G, ) is the completion of
a nite measure space (, F, ) as described above in Theorem 9.1.2, then if f is
measurable with respect to G having values in [0, ] , it follows there exist functions
h, g measurable with respect to F, h f g, such that g = h a.e. Also if f has
complex values and is G measurable, there exist h, g which are F measurable such
that |h| |f | |g| and h = f = g a.e.

Proof: Consider the rst claim. g = f +X[g>f ] (g f ) . f [is given to be measur-


]
able. The second term on the right is also measurable because X[g>f ] (g f ) > =
if < 0 while if 0, the set is contained in [h f > 0] , which is a measurable
set of measure zero.

Now consider the last assertion. By Theorem 7.5.6 on Page 175 there exists an
increasing sequence of nonnegative simple functions, {sn } measurable with respect
to G which converges pointwise to f . Letting

mn
sn () = cnk XEkn () (9.1.2)
k=1

be one of these simple functions, it follows from Theorem 9.1.2, there exist sets,
Fkn F such that Fkn Ekn Gnk and (Gnk \ Fkn ) = 0. Then let

mn
mn
bn () cnk XFkn () , tn () = cnk XGnk () .
k=1 k=1

Hence bn sn tn , and tn = bn a.e. Let

g () lim sup tn ()
n

and let
h () = lim inf bn () .
n

Then h, g are F measurable, h f g, and both h, g equal f o N n [tn bn > 0].


Because o this set of measure zero, both functions equal sn which converges to f .
The last claim follows from considering the positive and negative parts of real
and imaginary parts. 
There is a certain property called regularity which is very important. Recall the definition of the Borel sets in Definition 7.3.3.

Definition 9.1.4 Let (X, F, µ) be a measure space where the σ algebra F contains B (X) , the Borel sets. This is called a regular measure space if for every E ∈ F,

    µ (E) = sup {µ (K) : K ⊆ E, K compact}

and

    µ (E) = inf {µ (V) : V ⊇ E, V open}

Note that this is equivalent to the existence of a set F ⊆ E which is the countable union of compact sets and a set G ⊇ E which is the countable intersection of open sets such that

    µ (E) = µ (F) = µ (G) .

Sets which are the countable intersection of open sets are called G_δ sets and those which are the countable union of compact sets are called F_σ sets.¹

In the context of a σ finite measure space, this can be strengthened to mean the same as µ (G \ F) = 0 and it is in this form that regularity will usually be referred to. The following theorem is very interesting in this regard.

¹ Actually, F_σ usually means that the set is a countable union of closed sets, but in the situations considered in this book it is usually the same thing.

Theorem 9.1.5 Let µ be a measure defined on the Borel sets of R^n which is finite on compact sets. Then if E is any Borel set, there exists F a countable union of compact sets and G, a countable intersection of open sets, such that F ⊆ E ⊆ G and µ (G \ F) = 0.

Proof: First suppose that µ is a finite measure. Let F denote those Borel sets E for which the following two conditions hold. These will be referred to as inner regularity on E and outer regularity on E.

    µ (E) = sup {µ (K) : K compact and K ⊆ E}
    µ (E) = inf {µ (V) : V open and V ⊇ E}

Then F contains the open sets because every open set is the union of an increasing sequence of compact sets. If U is open, simply let K_n = {x : dist (x, U^C) ≥ 1/n} ∩ B̄ (0, n). Therefore, µ (U) = lim_{n→∞} µ (K_n). It is obvious that if U is open, then there exists V open containing U such that µ (V) = µ (U) . Just take V = U . Similar reasoning shows that the closed sets are in F.
Next suppose E ∈ F. Then there exists K compact, K ⊆ E, such that

    µ (E) < µ (K) + ε.

But since µ is a measure, µ (E) − µ (K) = µ (E \ K) . Thus µ (E \ K) < ε. Therefore, K^C is an open set which contains E^C . Since µ (K^C \ E^C) = µ (E \ K) , it follows that µ is outer regular on E^C .
By similar reasoning, there exists V ⊇ E, V open, such that µ (V \ E) < ε. It follows that V^C is a closed subset of E^C and µ (E^C \ V^C) = µ (V \ E) < ε. Now

    V^C = ∪_{n=1}^∞ V^C ∩ B̄ (0, n) ≡ ∪_{n=1}^∞ K_n ,

the union of an increasing sequence of compact sets. Therefore, for all n large enough,

    µ (E^C \ K_n) ≤ µ (E^C \ V^C) + µ (V^C \ K_n) < ε.

This has shown that F is closed with respect to complements.
Next consider E_i ∈ F. There exist open sets V_i ⊇ E_i such that µ (V_i \ E_i) < ε/2^i . Then

    µ ((∪_i V_i) \ (∪_i E_i)) ≤ µ (∪_i (V_i \ E_i)) ≤ Σ_{i=1}^∞ µ (V_i \ E_i) < ε.

Thus µ is outer regular on ∪_i E_i .
Next, there exists K_i ⊆ E_i such that µ (K_i) + ε/2^i > µ (E_i). Then

    µ (∪_{i=1}^∞ E_i \ ∪_{i=1}^∞ K_i) ≤ µ (∪_i (E_i \ K_i)) ≤ Σ_{i=1}^∞ µ (E_i \ K_i) < ε.

It follows that

    µ (∪_{i=1}^∞ E_i \ ∪_{i=1}^n K_i) ≤ µ (∪_{i=1}^∞ E_i \ ∪_{i=1}^∞ K_i) + µ (∪_{i=1}^∞ K_i \ ∪_{i=1}^n K_i) < ε

provided n is sufficiently large.
Hence F is a σ algebra which contains the open sets. Therefore, F equals the Borel sets. It follows that for any E Borel, there exists F a countable union of compact sets and G a countable intersection of open sets such that G ⊇ E ⊇ F and µ (G \ F) = 0.
Next consider the case where µ is only known to be finite on compact sets. Then let A_1 ≡ B (0, 1) , A_2 ≡ B (0, 2) \ B (0, 1) , ⋯ , A_n ≡ B (0, n) \ B (0, n − 1) , ⋯ . Thus µ is finite on each of these Borel sets A_i . Let µ_n (E) ≡ µ (E ∩ A_n). Thus µ_n is a finite Borel measure and for E Borel,

    µ (E) = Σ_{n=1}^∞ µ_n (E) .

Applying the above to each µ_n , there exist F_n , a countable union of compact sets, and G_n , a countable intersection of open sets, such that G_n ⊇ E ⊇ F_n and µ_n (G_n \ F_n) = 0. Let G ≡ ∩_n G_n and F ≡ ∪_{n=1}^∞ F_n . Then G is a countable intersection of open sets, F is a countable union of compact sets, G ⊇ E ⊇ F, and

    µ (G \ F) ≤ Σ_{n=1}^∞ µ_n (G_n \ F_n) = 0. ∎

The following little theorem shows that if you have a measure which is regular on B (X) , then the completion is also regular on the enlarged σ algebra coming from the completion.

Theorem 9.1.6 Suppose µ is a regular measure defined on B (Ω), the Borel sets of Ω, a topological space. Also suppose µ is σ finite. Then denoting by (Ω, B̄ (Ω), µ̄) the completion of (Ω, B (Ω) , µ) , it follows µ̄ is also regular.

Proof: By Theorem 9.1.2, for F ∈ B̄ (Ω) there exist sets of B (Ω) , H, G, such that H ⊆ F ⊆ G and µ (G \ H) = 0. By regularity of µ, there exist G′ a G_δ set and H′ an F_σ set such that G′ ⊇ G, H′ ⊆ H, and µ (H \ H′) = µ (G′ \ G) = 0. Then

    µ̄ (G′ \ H′) = µ (G′ \ H′) ≤ µ (G′ \ G) + µ (G \ H) + µ (H \ H′) = 0. ∎

A repeat of the above argument yields the following corollary.

Corollary 9.1.7 The conclusion of the above theorem holds for Ω replaced with Y where Y is a closed subset of Ω.
9.2 Dynkin Systems

The approach to n dimensional Lebesgue measure will be based on a very elegant idea due to Dynkin.

Definition 9.2.1 Let Ω be a set and let K be a collection of subsets of Ω. Then K is called a π system if ∅, Ω ∈ K and whenever A, B ∈ K, it follows A ∩ B ∈ K.

For example, if R^n = Ω, an example of a π system would be the set of all open sets. Another example would be sets of the form ∩_{k=1}^n A_k where A_k is a Lebesgue measurable set.
The following is the fundamental lemma which shows these π systems are useful.

Lemma 9.2.2 Let K be a π system of subsets of Ω, a set. Also let G be a collection of subsets of Ω which satisfies the following three properties.

1. K ⊆ G

2. If A ∈ G, then A^C ∈ G

3. If {A_i}_{i=1}^∞ is a sequence of disjoint sets from G then ∪_{i=1}^∞ A_i ∈ G.

Then G ⊇ σ (K) , where σ (K) is the smallest σ algebra which contains K.

Proof: First note that if

    H ≡ {G : 1 – 3 all hold}

then ∩H yields a collection of sets which also satisfies 1 – 3. Therefore, I will assume in the argument that G is the smallest collection of sets satisfying 1 – 3, the intersection of all such collections. Let A ∈ K and define

    G_A ≡ {B ∈ G : A ∩ B ∈ G} .

I want to show G_A satisfies 1 – 3 because then it must equal G since G is the smallest collection of subsets of Ω which satisfies 1 – 3. This will give the conclusion that for A ∈ K and B ∈ G, A ∩ B ∈ G. This information will then be used to show that if A, B ∈ G then A ∩ B ∈ G. From this it will follow very easily that G is a σ algebra which will imply it contains σ (K). Now here are the details of the argument.
Since K is given to be a π system, K ⊆ G_A . Property 3 is obvious because if {B_i} is a sequence of disjoint sets in G_A , then

    A ∩ ∪_{i=1}^∞ B_i = ∪_{i=1}^∞ A ∩ B_i ∈ G

because A ∩ B_i ∈ G and the property 3 of G.
It remains to verify Property 2, so let B ∈ G_A . I need to verify that B^C ∈ G_A . In other words, I need to show that A ∩ B^C ∈ G. However, consider the following picture in which the shaded area is A \ B = A ∩ B^C .

[Figure: two overlapping sets A and B; the part of A outside B is shaded.]

Thus it is easy to see that

    A ∩ B^C = (A^C ∪ (A ∩ B))^C ∈ G

Here is why. Since B ∈ G_A , A ∩ B ∈ G, and since A ∈ K ⊆ G it follows A^C ∈ G. It follows the union of the disjoint sets A^C and (A ∩ B) is in G and then from 2 the complement of their union is in G. Thus G_A satisfies 1 – 3 and this implies, since G is the smallest such collection, that G_A ⊇ G. However, G_A is constructed as a subset of G and so G = G_A . This proves that for every B ∈ G and A ∈ K, A ∩ B ∈ G. Now pick B ∈ G and consider

    G_B ≡ {A ∈ G : A ∩ B ∈ G} .

I just proved K ⊆ G_B . The other arguments are identical to show G_B satisfies 1 – 3 and is therefore equal to G. This shows that whenever A, B ∈ G it follows A ∩ B ∈ G.
This implies G is a σ algebra. To show this, all that is left is to verify G is closed under countable unions because then it follows G is a σ algebra. Let {A_i} ⊆ G. Then let A′_1 = A_1 and

    A′_{n+1} ≡ A_{n+1} \ (∪_{i=1}^n A_i) = A_{n+1} ∩ (∩_{i=1}^n A_i^C) = ∩_{i=1}^n (A_{n+1} ∩ A_i^C) ∈ G

because it was just shown that finite intersections of sets of G are in G. Since the A′_i are disjoint, it follows

    ∪_{i=1}^∞ A_i = ∪_{i=1}^∞ A′_i ∈ G

Therefore, G ⊇ σ (K) because it is a σ algebra which contains K. ∎

9.3 n Dimensional Lebesgue Measure And Integrals

9.3.1 Iterated Integrals

Let m denote one dimensional Lebesgue measure. That is, it is the Lebesgue Stieltjes measure which comes from the integrator function, F (x) = x. Also let the σ algebra of measurable sets be denoted by F. Recall this σ algebra contained the open sets. Also from the construction given above,

    m ([a, b]) = m ((a, b)) = b − a

Definition 9.3.1 Let f be a function of n variables and consider the symbol

    ∫ ⋯ ∫ f (x_1 , ⋯, x_n) dx_{i_1} ⋯ dx_{i_n}    (9.3.3)

where (i_1 , ⋯, i_n) is a permutation of the integers {1, 2, ⋯, n} . The symbol means to first do the Lebesgue integral

    ∫ f (x_1 , ⋯, x_n) dx_{i_1}

yielding a function of the other n − 1 variables given above. Then you do

    ∫ ( ∫ f (x_1 , ⋯, x_n) dx_{i_1} ) dx_{i_2}

and continue this way. The iterated integral is said to make sense if the process just described makes sense at each step. Thus, to make sense, it is required that

    x_{i_1} → f (x_1 , ⋯, x_n)

can be integrated. Either the function has values in [0, ∞] and is measurable or it is a function in L¹ . Then it is required that

    x_{i_2} → ∫ f (x_1 , ⋯, x_n) dx_{i_1}

can be integrated, and so forth. The symbol in 9.3.3 is called an iterated integral.

With the above explanation of iterated integrals, it is now time to define n dimensional Lebesgue measure.
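To make the symbol concrete, here is a small numerical sketch (not part of the book's development) of an iterated integral of a continuous nonnegative function of two variables, computed by nesting one dimensional quadratures in exactly the order the definition prescribes. It assumes NumPy and SciPy are available; scipy.integrate.quad stands in for the one dimensional integral, and the integrand f below is just an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad

def f(x1, x2):
    # a nonnegative continuous integrand on [0, 1] x [0, 1]
    return np.exp(-(x1 + x2)) * (1.0 + x1 * x2)

# Step 1: integrate out x1, leaving a function of x2 alone.
def inner(x2):
    val, _ = quad(lambda x1: f(x1, x2), 0.0, 1.0)
    return val

# Step 2: integrate the resulting function of x2.
I_12, _ = quad(inner, 0.0, 1.0)            # corresponds to the order dx1 dx2

# The reversed permutation: integrate out x2 first, then x1.
def inner_swapped(x1):
    val, _ = quad(lambda x2: f(x1, x2), 0.0, 1.0)
    return val

I_21, _ = quad(inner_swapped, 0.0, 1.0)    # corresponds to the order dx2 dx1

print(I_12, I_21)  # the two orders agree for this integrand
```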

9.3.2 n Dimensional Lebesgue Measure And Integrals

With the Lemma about π systems given above and the monotone convergence theorem, it is possible to give a very elegant and fairly easy definition of the Lebesgue integral of a function of n real variables. This is done in the following proposition.

Notation 9.3.2 A set is called F_σ if it is a countable union of compact sets. A set is called G_δ if it is a countable intersection of open sets.

Proposition 9.3.3 There exists a σ algebra F^n of sets of R^n which contains the open sets and a measure m_n defined on this σ algebra such that if f : R^n → [0, ∞) is measurable with respect to F^n then for any permutation (i_1 , ⋯, i_n) of {1, ⋯, n} it follows

    ∫_{R^n} f dm_n = ∫ ⋯ ∫ f (x_1 , ⋯, x_n) dx_{i_1} ⋯ dx_{i_n}    (9.3.4)

In particular, this implies that if A_i is Lebesgue measurable for each i = 1, ⋯, n then

    m_n (∏_{i=1}^n A_i) = ∏_{i=1}^n m (A_i) .

Also for every Borel set E, there exist F an F_σ set and G a G_δ set such that

    F ⊆ E ⊆ G

and m_n (G \ F) = 0.

Proof: Dene a system as


{ n }

K Ai : Ai is Lebesgue measurable
i=1

n
Also let Rp [p, p] , the n dimensional rectangle having sides [p, p]. A set F
Rn will be said to satisfy property P if for every p N and any two permutations
of {1, 2, , n}, (i1 , , in ) and (j1 , , jn ) the two iterated integrals

XRp F dxi1 dxin , XRp F dxj1 dxjn

make sense and are equal. Now dene G to be those subsets of Rn which have
property P.
Thus K G because if (i1 , , in ) is any permutation of {1, 2, , n} and


n
A= Ai K
i=1

then
n
XRp A dxi1 dxin = m ([p, p] Ai ) .
i=1

Now suppose F G and let (i1 , , in ) and (j1 , , jn ) be two permutations. Then
( )
Rp = Rp F C (Rp F )

and so

( )
XRp F C dxi1 dxin = XRp XRp F dxi1 dxin .

Since Rp G, the iterated integrals on the right and hence on the left make
sense. Then continuing with the expression on the right and using that F G,

( )
XRp XRp F dxi1 dxin =

n n
(2p) XRp F dxi1 dxin = (2p) XRp F dxj1 dxjn

( )
= XRp XRp F dxj1 dxjn = XRp F C dxj1 dxjn

which shows that if F G then so is F C .



Next suppose {Fi }i=1 is a sequence of disjoint sets in G. Let F =
i=1 Fi . I need
to show F G. Since the sets are disjoint,


XRp F dxi1 dxin = XRp Fk dxi1 dxin
k=1


N
= lim XRp Fk dxi1 dxin
N
k=1

Do
the iterated integrals make sense? Note that the iterated integral makes sense
N
for k=1 XRp Fk as the integrand because it is just a nite sum of functions for
which the iterated integral makes sense. Therefore,


xi1 XRp Fk (x)
k=1

is measurable and by the monotone convergence theorem,




N
XRp Fk (x) dxi1 = lim XRp Fk dxi1
N
k=1 k=1

Now each of the functions,



N
xi2 XRp Fk dxi1
k=1

is measurable and so the limit of these functions,




XRp Fk (x) dxi1
k=1

is also measurable. Therefore, one can do another integral to this function. Con-
tinuing this way using the monotone convergence theorem, it follows the iterated
integral makes sense. The same reasoning shows the iterated integral makes sense
for any other permutation.
Now applying the monotone convergence theorem as needed,


XRp F dxi1 dxin = XRp Fk dxi1 dxin
k=1


N N

= lim XRp Fk dxi1 dxin = lim XRp Fk dxi1 dxin
N N
k=1 k=1

N

= lim XRp Fk dxi1 dxin
N
k=1
N

= lim XRp Fk dxi1 dxin
N
k=1
N

= lim XRp Fk dxj1 dxjn
N
k=1

the last step holding because each Fk G. Then repeating the steps above in the
opposite order, this equals


XRp Fk dxj1 dxjn = XRp F dxj1 dxjn
k=1

Thus F G. By Lemma 9.2.2 G (K).


n
Let F n = (K). Each set of the form k=1 Uk where Uk is an open set is in
K. Also every open set in Rn is a countable union of open sets of this form. This
follows from Lemma 7.1.5 on Page 163. Therefore, every open set is in F n . Thus
(K) B (Rn ) , the Borel sets of Rn .
For F F n dene

mn (F ) lim XRp F dxj1 dxjn
p

where (j1 , , jn ) is a permutation of {1, , n} . It doesnt matter which one. It


was shown above they all give the same result. I need to verify mn is a measure.
Let {Fk } be a sequence of disjoint sets of F n .


mn (
k=1 Fk ) = lim XRp Fk dxj1 dxjn
p
k=1

m
= lim lim XRp Fk dxj1 dxjn
p m
k=1

Using the monotone convergence theorem repeatedly as in the rst part of the
argument, this equals



lim XRp Fk dxj1 dxjn mn (Fk ) .
p
k=1 k=1

Thus mn is a measure. Now letting Ak be a Lebesgue measurable set


( n )
n
mn Ak = lim X[p,p]Ak (xk ) dxj1 dxjn
p
k=1 k=1

n
n
= lim m ([p, p] Ak ) = m (Ak ) .
p
k=1 k=1

Next consider 9.3.4.


It was shown above that for F F it follows

XF dmn = lim XRp F dxj1 dxjn
Rn p

Applying the monotone convergence theorem repeatedly on the right, this yields
that the iterated integral makes sense and

XF dmn = XF dxj1 dxjn
Rn

It follows 9.3.4 holds for every nonnegative simple function in place of f because
these are just linear combinations of functions, XF . Now taking an increasing
sequence of nonnegative simple functions, {sk } which converges to a measurable
nonnegative function f

f dmn = lim sk dmn = lim sk dxj1 dxjn
Rn k Rn k

= f dxj1 dxjn

The assertion about regularity on the Borel sets follows from Theorem 9.1.5
because the measure is nite on compact sets. 

9.3.3 The Sigma Algebra Of Lebesgue Measurable Sets

In the above section, Lebesgue measure was defined on a σ algebra F^n which was the smallest σ algebra which contains K, where K was the set of measurable rectangles, sets of the form ∏_{i=1}^n E_i for E_i a Lebesgue measurable set. However, this is not a complete measure space. To see this, consider A × B where m_1 (B) = 0 and A is a non measurable set. Then A × B is a subset of R × B and m_2 (R × B) = 0. However, A × B cannot be in σ (K) because all sets E in σ (K) have the property that E_y ≡ {x : (x, y) ∈ E} is a measurable set. In this case, pick y ∈ B and (A × B)_y = A which is not measurable. This leads to the following definition.

Definition 9.3.4 The measure space for Lebesgue measure is (R^n , F_n , m_n) where F_n is the completion of (R^n , B (R^n) , m_n) , where B (R^n) denotes the Borel sets and m_n is n dimensional measure defined on B (R^n).

The important thing about Lebesgue measure is that the measure space is com-
plete and it is a regular measure space.
Theorem 9.3.5 The measure space (Rn , Fn , mn ) is complete and regular. This
means that for every E Fn , there exists an F set F and a G set G such that
G E F and mn (G \ F ) = 0. Also if f 0 is Fn measurable, then there exists
g f such that g is Borel measurable and g = f a.e.
Proof: Let E Fn . From Theorem 9.1.2 there exist Borel sets A, B such that
A E B and mn (B \ A) = 0. Now from Theorem 9.1.5, there exists a G set G
containing B and an F set F contained in A such that mn (A \ F ) = mn (G \ B) =
0. Then
mn (G \ F ) = mn (G \ B) + mn (B \ A) + mn (A \ F ) = 0.
Now consider the last claim. Let sk be an increasing
mk sequence of simple functions
which converges pointwise to f . Say sk (x) = i=1 ci XEi (x) where Ei Fn .
Then let Fi Ei such that Fi is Borel measurable and mn (Ei \ Fi ) = 0. Letting

Nk = m i=1 (Ei \ Fi ) , and N = k=1 Nk , it follows mn (N ) = 0. Now let N be
k

a Borel measurable set such that N has measure zero and N N . Then each
function in the sequence of simple functions given by sk XN i=1
mk
ci XFi is Borel
measurable, and the sequence converges to a Borel measurable function which equals
f o the exceptional set of measure zero N and equals 0 on N. 
I dened (Fn , mn ) as the completion of (B (Rn ) , mn ) but the same thing would
have been obtained if I had used (F n , mn ). This can be shown from using the
regularity of one dimensional Lebesgue measure. You might try and show this.

9.3.4 Fubini's Theorem

Formula 9.3.4 is often called Fubini's theorem. So is the following theorem. In general, people tend to refer to theorems about the equality of iterated integrals as Fubini's theorem, and in fact Fubini did produce such theorems, but so did Tonelli, and some of the theorems presented here and above should be called Tonelli's theorem.

Theorem 9.3.6 Let m_n be defined in Proposition 9.3.3 on the σ algebra of sets F^n given there. Suppose f ∈ L¹ (R^n , F^n , m_n) , (f is F^n measurable.) Then if (i_1 , ⋯, i_n) is any permutation of {1, ⋯, n} ,

    ∫_{R^n} f dm_n = ∫ ⋯ ∫ f (x) dx_{i_1} ⋯ dx_{i_n} .

In particular, iterated integrals for any permutation of {1, ⋯, n} are all equal.

Proof: It suffices to prove this for f having real values because if this is shown the general case is obtained by taking real and imaginary parts. Since f ∈ L¹ (R^n) ,

    ∫_{R^n} |f| dm_n < ∞

and so both ½ (|f| + f) and ½ (|f| − f) are in L¹ (R^n) and are each nonnegative. Hence from Proposition 9.3.3,

    ∫_{R^n} f dm_n = ∫_{R^n} [ ½ (|f| + f) − ½ (|f| − f) ] dm_n
    = ∫_{R^n} ½ (|f| + f) dm_n − ∫_{R^n} ½ (|f| − f) dm_n
    = ∫ ⋯ ∫ ½ (|f (x)| + f (x)) dx_{i_1} ⋯ dx_{i_n} − ∫ ⋯ ∫ ½ (|f (x)| − f (x)) dx_{i_1} ⋯ dx_{i_n}
    = ∫ ⋯ ∫ [ ½ (|f (x)| + f (x)) − ½ (|f (x)| − f (x)) ] dx_{i_1} ⋯ dx_{i_n}
    = ∫ ⋯ ∫ f (x) dx_{i_1} ⋯ dx_{i_n} ∎
The following corollary is a convenient way to verify the hypotheses of the above theorem.

Corollary 9.3.7 Suppose f is measurable with respect to F^n and suppose for some permutation, (i_1 , ⋯, i_n) ,

    ∫ ⋯ ∫ |f (x)| dx_{i_1} ⋯ dx_{i_n} < ∞

Then f ∈ L¹ (R^n) .

Proof: By Proposition 9.3.3,

    ∫_{R^n} |f| dm_n = ∫ ⋯ ∫ |f (x)| dx_{i_1} ⋯ dx_{i_n} < ∞

and so f is in L¹ (R^n). ∎

In using Proposition 9.3.3 or Corollary 9.3.7 or Theorem 9.3.6 when f is only known to be F_n measurable, one typically uses Theorem 9.3.5 to get a function g which is equal to f m_n a.e., but g is Borel and hence F^n measurable. (You use this theorem on the positive and negative parts of the real and imaginary parts of f .) Then every Lebesgue integral mentioned above can be computed by using the iterated integral of g. However, if you are interested in a much fussier result, read the section on completion of product measure spaces below.
Since F^n contains the Borel sets, all the above formulas pertain to the case where f is Borel measurable.

Example 9.3.8 Find the iterated integral

    ∫_0^1 ∫_x^1 (sin (y) / y) dy dx

Notice the limits. The iterated integral equals

    ∫_{R²} X_A (x, y) (sin (y) / y) dm_2

where

    A = {(x, y) : x ≤ y ≤ 1 where x ∈ [0, 1]}

Fubini's theorem can be applied because the function (x, y) → sin (y) / y is continuous except at y = 0 and can be redefined to be continuous there. The function is also bounded, so

    (x, y) → X_A (x, y) sin (y) / y

clearly is in L¹ (R²) . Therefore,

    ∫_{R²} X_A (x, y) (sin (y) / y) dm_2 = ∫ ∫ X_A (x, y) (sin (y) / y) dx dy
    = ∫_0^1 ∫_0^y (sin (y) / y) dx dy
    = ∫_0^1 sin (y) dy = 1 − cos (1)
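As a quick sanity check of this example (an illustration only, assuming NumPy and SciPy are available), both iterated integrals can be approximated numerically and compared with 1 − cos(1); scipy.integrate.dblquad integrates its first argument as a function of the inner variable.

```python
import numpy as np
from scipy.integrate import dblquad

def g(y):
    # sin(y)/y, extended continuously by 1 at y = 0 (np.sinc(t) = sin(pi t)/(pi t))
    return np.sinc(y / np.pi)

# dy dx order: for each x in [0, 1], y runs from x to 1; func is called as func(y, x).
val_dydx, _ = dblquad(lambda y, x: g(y), 0.0, 1.0, lambda x: x, lambda x: 1.0)

# dx dy order: for each y in [0, 1], x runs from 0 to y; func is called as func(x, y).
val_dxdy, _ = dblquad(lambda x, y: g(y), 0.0, 1.0, lambda y: 0.0, lambda y: y)

print(val_dydx, val_dxdy, 1.0 - np.cos(1.0))  # all three agree to quadrature accuracy
```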

9.4 Product Measures

9.4.1 General Theory

The treatment of Lebesgue measure given above is a special case of something called product measure. You can take the product measure for any two finite or σ finite measures. A measure space (Ω, F, µ) is called σ finite if there are measurable subsets Ω_n such that µ (Ω_n) < ∞ and Ω = ∪_{n=1}^∞ Ω_n .
Given two finite measure spaces, (X, F, µ) and (Y, S, ν) , there is a way to define a σ algebra of subsets of X × Y , denoted by F × S, and a measure, denoted by µ × ν, defined on this σ algebra such that

    (µ × ν) (A × B) = µ (A) ν (B)

whenever A ∈ F and B ∈ S.

Definition 9.4.1 Let (X, F, µ) and (Y, S, ν) be two measure spaces. A measurable rectangle is a set of the form A × B where A ∈ F and B ∈ S.

With this lemma, it is easy to define product measure.
Let (X, F, µ) and (Y, S, ν) be two finite measure spaces. Define K to be the set of measurable rectangles, A × B, A ∈ F and B ∈ S. Let

    G ≡ { E ⊆ X × Y : ∫_Y ∫_X X_E dµ dν = ∫_X ∫_Y X_E dν dµ }    (9.4.5)

where in the above, part of the requirement is for all integrals to make sense.
Then K ⊆ G. This is obvious.
Next I want to show that if E ∈ G then E^C ∈ G. Observe X_{E^C} = 1 − X_E and so

    ∫_Y ∫_X X_{E^C} dµ dν = ∫_Y ∫_X (1 − X_E) dµ dν = ∫_X ∫_Y (1 − X_E) dν dµ = ∫_X ∫_Y X_{E^C} dν dµ

which shows that if E ∈ G, then E^C ∈ G.
Next I want to show G is closed under countable unions of disjoint sets of G. Let {A_i} be a sequence of disjoint sets from G. Then

    ∫_Y ∫_X X_{∪_{i=1}^∞ A_i} dµ dν = ∫_Y ∫_X Σ_{i=1}^∞ X_{A_i} dµ dν = ∫_Y Σ_{i=1}^∞ ∫_X X_{A_i} dµ dν
    = Σ_{i=1}^∞ ∫_Y ∫_X X_{A_i} dµ dν = Σ_{i=1}^∞ ∫_X ∫_Y X_{A_i} dν dµ
    = ∫_X Σ_{i=1}^∞ ∫_Y X_{A_i} dν dµ = ∫_X ∫_Y Σ_{i=1}^∞ X_{A_i} dν dµ
    = ∫_X ∫_Y X_{∪_{i=1}^∞ A_i} dν dµ,    (9.4.6)

the interchanges between the summation and the integral depending on the monotone convergence theorem. Thus G is closed with respect to countable disjoint unions.
From Lemma 9.2.2, G ⊇ σ (K) . Also the computation in 9.4.6 implies that on σ (K) one can define a measure, denoted by µ × ν, and that for every E ∈ σ (K) ,

    (µ × ν) (E) = ∫_Y ∫_X X_E dµ dν = ∫_X ∫_Y X_E dν dµ.    (9.4.7)

Now here is Fubini's theorem.

Theorem 9.4.2 Let f : X × Y → [0, ∞] be measurable with respect to the σ algebra σ (K) just defined and let µ × ν be the product measure of 9.4.7 where µ and ν are finite measures on (X, F) and (Y, S) respectively. Then

    ∫_{X×Y} f d (µ × ν) = ∫_Y ∫_X f dµ dν = ∫_X ∫_Y f dν dµ.

Proof: Let {s_n} be an increasing sequence of σ (K) measurable simple functions which converges pointwise to f . The above equation holds for s_n in place of f from what was shown above. The final result follows from passing to the limit and using the monotone convergence theorem. ∎
The symbol F × S denotes σ (K).
Of course one can generalize right away to measures which are only σ finite.

Theorem 9.4.3 Let f : X × Y → [0, ∞] be measurable with respect to the σ algebra σ (K) just defined and let µ × ν be the product measure of 9.4.7 where µ and ν are σ finite measures on (X, F) and (Y, S) respectively. Then

    ∫_{X×Y} f d (µ × ν) = ∫_Y ∫_X f dµ dν = ∫_X ∫_Y f dν dµ.

Proof: Since the measures are σ finite, there exist increasing sequences of sets, {X_n} and {Y_n} , such that µ (X_n) < ∞ and ν (Y_n) < ∞. Then µ and ν restricted to X_n and Y_n respectively are finite. Then from Theorem 9.4.2,

    ∫_{Y_n} ∫_{X_n} f dµ dν = ∫_{X_n} ∫_{Y_n} f dν dµ

Passing to the limit yields

    ∫_Y ∫_X f dµ dν = ∫_X ∫_Y f dν dµ

whenever f is as above. In particular, you could take f = X_E where E ∈ F × S and define

    (µ × ν) (E) ≡ ∫_Y ∫_X X_E dµ dν = ∫_X ∫_Y X_E dν dµ.

Then just as in the proof of Theorem 9.4.2, the conclusion of this theorem is obtained. ∎

It is also useful to note that all the above holds for ∏_{i=1}^n X_i in place of X × Y. You would simply modify the definition of G in 9.4.5, including all permutations for the iterated integrals, and for K you would use sets of the form ∏_{i=1}^n A_i where A_i is measurable. Everything goes through exactly as above. Thus the following is obtained.

Theorem 9.4.4 Let {(X_i , F_i , µ_i)}_{i=1}^n be σ finite measure spaces and let ∏_{i=1}^n F_i denote the smallest σ algebra which contains the measurable boxes of the form ∏_{i=1}^n A_i where A_i ∈ F_i . Then there exists a measure λ defined on ∏_{i=1}^n F_i such that if f : ∏_{i=1}^n X_i → [0, ∞] is ∏_{i=1}^n F_i measurable, and (i_1 , ⋯, i_n) is any permutation of (1, ⋯, n) , then

    ∫ f dλ = ∫_{X_{i_n}} ⋯ ∫_{X_{i_1}} f dµ_{i_1} ⋯ dµ_{i_n}
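The defining identity (µ × ν)(A × B) = µ(A) ν(B) and the equality of the iterated integrals are easy to see numerically when both measures are supported on finitely many points, since every integral reduces to a finite sum. The following sketch is an illustration only; the point masses and the function f are made up.

```python
# Two finite measures given by point masses (weights are illustrative).
mu = {0: 0.5, 1: 1.5, 2: 1.0}          # measure mu on X = {0, 1, 2}
nu = {'a': 2.0, 'b': 0.25}             # measure nu on Y = {'a', 'b'}

def product_measure(E):
    """(mu x nu)(E) for E a finite set of pairs (x, y)."""
    return sum(mu[x] * nu[y] for (x, y) in E)

# A measurable rectangle A x B.
A, B = {0, 2}, {'b'}
rect = {(x, y) for x in A for y in B}
lhs = product_measure(rect)
rhs = sum(mu[x] for x in A) * sum(nu[y] for y in B)
print(lhs, rhs)   # equal: (mu x nu)(A x B) = mu(A) nu(B)

# Tonelli for a nonnegative f: both iterated "integrals" (sums) agree with the
# integral against the product measure.
f = lambda x, y: x + (1.0 if y == 'a' else 3.0)
int_dmu_dnu = sum(nu[y] * sum(f(x, y) * mu[x] for x in mu) for y in nu)
int_dnu_dmu = sum(mu[x] * sum(f(x, y) * nu[y] for y in nu) for x in mu)
int_product = sum(f(x, y) * mu[x] * nu[y] for x in mu for y in nu)
print(int_dmu_dnu, int_dnu_dmu, int_product)  # all three coincide
```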

9.4.2 Completion Of Product Measure Spaces


Using Theorem 9.1.3 it is easy to give a generalization to yield a theorem for the
completion of product spaces.
n n
Theorem 9.4.5 Let {(Xi , Fi , i )}i=1 be nite measure spaces and let i=1 Fi
denote
n the smallest algebra which contains the measurable boxes n of the form
A
i=1 i where A i F i . Then
n there exists a measure dened on i=1 Fi such that
n
if f : i=1 Xi [0, ] is i=1 Fi measurable, and (i1 , , in ) is any permutation
of (1, , n) , then
f d = f di1 din
Xin Xi1
( n )
n
Let i=1 Xi , i=1 Fi , denote the completion of this product measure space and
let

n
f: Xi [0, ]
i=1
n n
be i=1 Fi measurable. Then there exists N i=1 Fi such that (N ) = 0 and a
n
nonnegative function, f1 measurable with respect to i=1 Fi such that f1 = f o
N and if (i1 , , in ) is any permutation of (1, , n) , then

f d = f1 di1 din .
Xin Xi1

Furthermore, f1 may be chosen to satisfy either f1 f or f1 f.


Proof: This follows immediately from Theorem 9.4.4 and Theorem 9.1.3. By
the second theorem, thereexists a function f1 f such that f1 = f for all
n
/ N, a set of i=1 Fi having measure zero. Then by Theorem 9.1.2
(x1 , , xn )
and Theorem 9.4.4

f d = f1 d = f1 di1 din .
X in Xi1


Since f1 = f o a set of measure zero, I will dispense with the subscript. Also
it is customary to write
= 1 n
and
= 1 n .
Thus in more standard notation, one writes

f d (1 n ) = f di1 din
Xin Xi1

This theorem is often referred to as Fubinis theorem. The next theorem is also
called this.

( n )
n
Corollary 9.4.6 Suppose f L1 i=1 Xi , i=1 F i , 1 n where each Xi
is a nite measure space. Then if (i1 , , in ) is any permutation of (1, , n) ,
it follows
f d (1 n ) = f di1 din .
Xin Xi1

Proof: Just apply Theorem 9.4.5 to the positive and negative parts of the real
and imaginary parts of f. 
Here is another easy corollary.
Corollary
n 9.4.7 Suppose in the situation of Corollary 9.4.6, f = f1 o N, a set of
i=1 F i having 1 nmeasure zero and that f1 is a complex valued function
n
measurable with respect to i=1 Fi . Suppose also that for some permutation of
(1, 2, , n) , (j1 , , jn )

|f1 | dj1 djn < .
Xjn Xj1

Then ( )

n
n
f L 1
Xi , Fi , 1 n
i=1 i=1
and the conclusion of Corollary 9.4.6 holds.
n
Proof: Since |f1 | is i=1 Fi measurable, it follows from Theorem 9.4.4 that

> |f1 | dj1 djn
Xjn Xj1

= |f1 | d (1 n )

= |f1 | d (1 n )

= |f | d (1 n ) .
( n )
n
Thus f L1 i=1 Xi , i=1 Fi , 1 n as claimed and the rest follows from
Corollary 9.4.6. 
The following lemma is also useful.
Lemma 9.4.8 Let (X, F, ) and (Y, S, ) be nite complete measure spaces and
suppose f 0 is F S measurable. Then for a.e. x,
y f (x, y)
is S measurable. Similarly for a.e. y,
x f (x, y)
is F measurable.

Proof: By Theorem 9.1.3, there exist F S measurable functions, g and h and


a set N F S of measure zero such that g f h and for (x, y) / N, it
follows that g (x, y) = h (x, y) . Then

gdd = hdd
X Y X Y

and so for a.e. x,


gd = hd.
Y Y

Then it follows that for these values of x, g (x, y) = h (x, y) and so by Theorem 9.1.3
again and the assumption that (Y, S, ) is complete, y f (x, y) is S measurable.
The other claim is similar. 

9.5 Exercises
2 62z 3z ( )
1. Find 0 0 1 (3 z) cos y 2 dy dx dz.
2x

1 183z 6z ( )
2. Find 0 0 1 (6 z) exp y 2 dy dx dz.
3x

2 244z 6z ( )
3. Find 0 0 1 (6 z) exp x2 dx dy dz.
4y

1 124z 3z sin x
4. Find 0 0 1
x dx dy dz.
4y

20 1 5z 25 5 1 y 5z
5. Find 0 0 1 y sinx x dx dz dy+ 20 0 5 1 y sin x
x dx dz dy. Hint: You might
5 5
try doing it in the order, dy dx dz

6. Explain why for each t > 0, x → e^{−tx} is a function in L¹ (R) and

    ∫_0^∞ e^{−tx} dx = 1/t.

   Thus

    ∫_0^R (sin (t) / t) dt = ∫_0^R ∫_0^∞ sin (t) e^{−tx} dx dt

   Now explain why you can change the order of integration in the above iterated integral. Then compute what you get. Next pass to a limit as R → ∞ and show

    ∫_0^∞ (sin (t) / t) dt = π/2

   (A brief numerical sketch related to this problem appears after these exercises.)

7. Explain why ∫_a^∞ f (t) dt ≡ lim_{r→∞} ∫_a^r f (t) dt whenever f ∈ L¹ (a, ∞) ; that is, f X_{[a,∞)} ∈ L¹ (R).

1/2
8. Let f (y) = g (y) = |y| if y (1, 0) (0, 1) and f (y) = g (y) = 0 if
y / (1,0) (0, 1). For which values of x does it make sense to write the
integral R f (x y) g (y) dy?
n
9. Let Ei be a Borel set in R. Show that i=1 Ei is a Borel set in Rn .
10. Let {a_n} be an increasing sequence of numbers in (0, 1) which converges to 1. Let g_n be a nonnegative function which equals zero outside (a_n , a_{n+1}) such that ∫ g_n dx = 1. Now for (x, y) ∈ [0, 1) × [0, 1) define

    f (x, y) ≡ Σ_{n=1}^∞ g_n (y) (g_n (x) − g_{n+1} (x)) .

    Explain why this is actually a finite sum for each such (x, y) so there are no convergence questions in the infinite sum. Explain why f is a continuous function on [0, 1) × [0, 1). You can extend f to equal zero off [0, 1) × [0, 1) if you like. Show the iterated integrals exist but are not equal. In fact, show

    ∫_0^1 ∫_0^1 f (x, y) dy dx = 1 ≠ 0 = ∫_0^1 ∫_0^1 f (x, y) dx dy.

    Does this example contradict the Fubini theorem? Explain why or why not.
11. Let f : [a, b] R be Rieman integrable. Thus f is a bounded function and
by Darbouxs theorem, there exists a unique number between all the upper
sums and lower sums of f , this number being the Riemann integral. Show
that f is Lebesgue measurable and
b
f (x) dx = f dm
a [a,b]

where the second integral in the above is the Lebesgue integral taken with
respect to one dimensional Lebesgue measure and the rst is the ordinary
Riemann integral.
12. Let (, F, ) be a nite measure space and let f : [0, ) be mea-
surable. Also let : [0, ) R be increasing with (0) = 0 and a C 1
function. Show that

f d = (t) ([f > t]) dt.
0

Hint: This can be done using the following steps. Let tni = i2n . Show that


X[f >t] () = lim X[f >tn ] () X[tni ,tni+1 ) (t)
n i+1
i=0

Now this is a countable sum of F B ([0, )) measurable functions and so it


follows that (t, ) X[f >t] () is F B ([0, )) measurable. Consequently,

so is X[f >t] () (t) . Note that it is important in the argument to have f > t.
Now observe
f ()

f d = (t) dtd = X[f >t] () (t) dtd
0 0

Use Fubinis theorem. For your information, this does not require the measure
space to be nite. You can use a dierent argument which ties in to the
rst denition of the Lebesgue integral. The function t ([f > t]) is called
the distribution function.
13. Give a dierent proof of the above as follows. First suppose f is a simple
function,
n
f () = ak XEk ()
k=1
where the ak are strictly increasing, (a0 ) = a0 0. Then explain carefully
the steps to the following argument.
n (ai ) n (ai )
n
f d = ([ f > t]) dt = (Ek ) dt
i=1 (ai1 ) i=1 (ai1 ) k=i
n n ai n ai
n
= (Ek ) (t) dt =
(t) (Ek ) dt
i=1 k=i ai1 i=1 ai1 k=i
n ai
= (t) ([f > t]) dt = (t) ([f > t]) dt
i=1 ai1 0

Note that this did not require the measure space to be nite and comes
directly from the denition of the integral.
14. Give another argument for the above result as follows.

([ ])
f d = ([ f > t]) dt = f > 1 (t) dt
0 0

and now change the variable in the last integral, letting (s) = t. Justify the
easy manipulations.
15. Let (x) C (x) for all > 1, (0) = 0, is strictly increasing on
[0, ), is C 1 , and suppose (, F, ) is a nite measure space. Also suppose
f, g are nonnegative measurable functions Suppose there exists > 1 such
that for all > 0 and 1 > > 0,
([f > ] [g ]) () ([f > ])
where lim0+ () = 0 where is increasing. Show there exists a constant
C depending only on , such that

f d C gd


This is called the good lambda inequality2 . Hint: Use the above problems.
Fill in the details.


(f ) d = (t) ([f > t]) dt = () ([f > ]) d
0 0


= () ([f > ] [g ]) d
0

+ () ([f > ] [g > ]) d
0

() () ([f > ]) d + () ([g > ]) d
0 0
( )
t
= () () ([f > ]) d + ([g > t]) dt
0 0

) (

= () (f ) d + g d




() C (f ) d + C/ (g) d


Now adjust . This yields the desired result in the case that (f ) d < .
What about the case where (f ) d = ? Does the good lambda estimate
hold if f is replaced with f m for m a positive constant? Recall () < .

2 I have no idea why it is called the good lambda inequality. I am also not sure if there is a bad

lambda inequality. It is a remarkable result however.
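As mentioned in Problem 6, here is a rough numerical sketch (not a solution, and assuming SciPy's quad is available) suggesting that the truncated integrals ∫_0^R sin(t)/t dt approach π/2 as R → ∞.

```python
import numpy as np
from scipy.integrate import quad

def sinc(t):
    # sin(t)/t, extended continuously by 1 at t = 0
    return np.sinc(t / np.pi)

for R in (10.0, 50.0, 200.0):
    # limit=... raises the subinterval count so quad can resolve the oscillation
    val, _ = quad(sinc, 0.0, R, limit=400)
    print(R, val, abs(val - np.pi / 2))   # the gap shrinks roughly like 1/R
```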


Lebesgue Measurable Sets

10.1 The σ Algebra Of Lebesgue Measurable Sets

The σ algebra of Lebesgue measurable sets is larger than the above σ algebra of Borel sets or of the earlier σ algebra which came from an application of the π system lemma. It is convenient to use this larger σ algebra, especially when considering change of variables formulas, although it is certainly true that one can do most interesting theorems with the Borel sets only. However, it is in some ways easier to consider the more general situation and this will be done next.

Definition 10.1.1 The completion of (R^p , B (R^p) , m_p) is the Lebesgue measure space. Denote this by (R^p , F_p , m_p) .

Thus for each E ∈ F_p ,

    m_p (E) = inf {m_p (F) : F ⊇ E and F ∈ B (R^p)}

It follows that for each E ∈ F_p there exists F ∈ B (R^p) such that F ⊇ E and m_p (E) = m_p (F) .

Theorem 10.1.2 m_p is regular on F_p . In fact, if E ∈ F_p , there exist sets F, G in B (R^p) such that

    F ⊆ E ⊆ G,

F is a countable union of compact sets, G is a countable intersection of open sets, and

    m_p (G \ F) = 0.

If A_k is Lebesgue measurable then ∏_{k=1}^p A_k ∈ F_p and

    m_p (∏_{k=1}^p A_k) = ∏_{k=1}^p m (A_k) .

In addition to this, m_p is translation invariant. This means that if E ∈ F_p and x ∈ R^p , then

    m_p (x + E) = m_p (E) .

The expression x + E means {x + e : e ∈ E} .

Proof: The regularity of mp on the Borel sets follows from Theorem 9.1.5.
Then Theorem 9.1.6 implies mp is regular. The assertion about the measure of the
Cartesian product in the case where each Ak is Borel follows from the fact that mp
is the extension of a measure for which the assertion does hold. See Theorem 9.1.2
on the completion of a measure space. In case the Ak are only Lebesgue measurable,
the equation can be obtained from regularity considerations.
It only remains to consider the claim about translation invariance. Let K denote
all sets of the form
p
Uk
k=1

where each Uk is an open set in R. Thus K is a system.


p
p
x+ Uk = (xk + Uk )
k=1 k=1

which is also a nite Cartesian product of nitely many open sets. Also,
( ) ( p )
p
mp x + Uk = mp (xk + Uk )
k=1 k=1

p
= m (xk + Uk )
k=1
( )
p
p
= m (Uk ) = mp Uk
k=1 k=1

The step to the last line is obvious because an arbitrary open set in R is the disjoint
union of open intervals, and the lengths of these intervals are unchanged when they
are slid to another location.
Now let G denote those sets of F p (Recallthat F p was the smallest algebra
p
which contains all the measurable rectangles i=1 Ei , Ei Lebesgue measurable.) E
with the property that for each n N
p p
mp (x + E (n, n) ) = mp (E (n, n) )
p
and the set x + E (n, n) is in F p . Thus K G. If E G, then
( p) p p
x + E C (n, n) (x + E (n, n) ) = x + (n, n)
p
which implies x + E C (n, n) is in F p since it equals a dierence of two sets in
F p . Now consider the following.
( p) p
mp x + E C (n, n) + mp (E (n, n) )
( p ) p
= mp x + E C (n, n) + mp (x + E (n, n) )
p p
= mp (x + (n, n) ) = mp ((n, n) )
( p) p
= mp E C (n, n) + mp (E (n, n) )
which shows ( p) ( p)
mp x + E C (n, n) = mp E C (n, n)
showing that E C G.
If {Ek } is a sequence of disjoint sets of G,

mp (x + (
p p
k=1 Ek ) (n, n) ) = mp (k=1 (x + Ek ) (n, n) )
n
Now the sets {(x + Ek ) (p, p) } are also disjoint and so the above equals
p
p
mp (x + Ek (n, n) ) = mp (Ek (n, n) )
k k
= mp (
p
k=1 Ek (n, n) )

Thus G is also closed with respect to countable disjoint unions. It follows from the
lemma on systems that G = (K) = B (Rp ) .
I have just shown that for every E B (Rp ) , and any n N,
p p
mp (x + E (n, n) ) = mp (E (n, n) )

Taking the limit as n yields

mp (x + E) = mp (E) .

This proves translation invariance on sets in B (Rp ).


By Regularity of mp , for E Fp , there exists G a G set and F an F set such
that F E G and mp (G \ F ) = 0. Also, it is obvious that the translation of a
G set is a G set and the translation of an F set is an F set. Therefore, from
what was just shown,
F +xE+xG+x
and mp ((G + x) \ (F + x)) = mp (G \ F + x) = mp (G \ F ) = 0. Therefore, by
completeness, E + x is Lebesgue measurable and

mp (E + x) mp (G + x) = mp (G) = mp (E)
= mp (F ) = mp (F + x) mp (E + x) 
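Translation invariance is easy to test numerically for a simple set. The following Monte Carlo sketch (an illustration only; the set E, the translation x, and the sampling box are made up) estimates m_2(E) and m_2(x + E) for E a disc and compares both with π.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_disc(pts, center, radius=1.0):
    return np.sum((pts - center) ** 2, axis=1) <= radius ** 2

# E = unit disc centered at the origin; x + E = the same disc slid by x.
x = np.array([2.5, -1.0])
lo, hi = np.array([-2.0, -3.0]), np.array([4.0, 2.0])   # a box containing both sets
box_area = np.prod(hi - lo)

pts = rng.uniform(lo, hi, size=(2_000_000, 2))
est_E  = box_area * np.mean(in_disc(pts, np.zeros(2)))
est_xE = box_area * np.mean(in_disc(pts, x))
print(est_E, est_xE, np.pi)   # both estimates are near pi, and near each other
```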

The following lemma is about the existence of a Borel measurable representative for any Lebesgue measurable function. It follows right away from the above definition of Lebesgue measure as the completion of (R^p , B (R^p) , m_p) and Theorem 9.1.3.

Lemma 10.1.3 Let f be measurable with respect to F_p . Then there exists a Borel measurable function g such that g = f m_p a.e.

What is the main importance of regularity of Lebesgue measure? It is related to the density of continuous functions which have compact support in L¹ (R^n).

Definition 10.1.4 Let f be a function. spt (f) ≡ the closure of {x : f (x) ≠ 0} . In words, the closure of the set where f is not zero. spt (f) is called the support of f . A function is said to be in C_c (R^p) if it is continuous and has compact support.

The following remarkable theorem on approximation follows from regularity of the measure.

Theorem 10.1.5 Let f ∈ L¹ (R^n) . Then there exists g ∈ C_c (R^n) such that

    ∫_{R^n} |f − g| dm_n < ε.

Proof: First note that every open set V is the countable union of compact sets. In fact,

    V = ∪_{k=1}^∞ B (0, k) ∩ {x ∈ V : dist (x, V^C) ≥ 1/k} .

Now if K ⊆ V where V is an open set and K is a compact set, it follows from the finite intersection property of compact sets that for all k large enough,

    K ⊆ B (0, k) ∩ {x ∈ V : dist (x, V^C) > 1/k} ≡ W
    ⊆ W̄ ⊆ B̄ (0, k) ∩ {x ∈ V : dist (x, V^C) ≥ 1/k} ,

a compact subset of V . Let h (x) = dist (x, W^C) / (dist (x, W^C) + dist (x, K)) . Then since W is an open set containing the compact set K, it follows that the denominator is never equal to 0. Therefore, h is continuous. Also, if x ∈ K, then h = 1, and if x ∉ W, then h = 0. It follows that h is in C_c (R^n) , equals 1 on K, and vanishes off a compact set which is contained in V . We denote this situation with the notation of Rudin,

    K ≺ h ≺ W.

Now let f ≥ 0 and ∫ f dm_n < ∞. By Theorem 8.7.5, there exists a nonnegative simple function, s (x) = Σ_{k=1}^m c_k X_{E_k} (x) , m_n (E_k) < ∞, such that

    ∫ |f − s| dm_n < ε/2.

By regularity of the measure, there exist a compact set K_k and an open set V_k such that

    K_k ⊆ E_k ⊆ V_k ,  m_n (V_k \ K_k) < ε / (2 Σ_{k=1}^m c_k)

Then from what was just shown, there exists h_k such that K_k ≺ h_k ≺ V_k . Then let h (x) = Σ_{k=1}^m c_k h_k (x) . Thus h ∈ C_c (R^n) and

    ∫ |s − h| dm_n ≤ Σ_{k=1}^m c_k m_n (V_k \ K_k) < ε/2.

Therefore,

    ∫ |f − h| dm_n ≤ ∫ |f − s| dm_n + ∫ |s − h| dm_n < ε/2 + ε/2 = ε.

For an arbitrary f ∈ L¹ (R^n) , you simply apply the above result to positive and negative parts of real and imaginary parts. ∎
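The function h constructed in the proof is completely explicit, so it is easy to evaluate. Below is a one dimensional sketch (an illustration only) with K = [0, 1] and W = (−1, 2), the sets being represented by dense finite samples of points; the same formula works verbatim in R^n with any norm.

```python
import numpy as np

def dist_to_set(x, pts):
    """Distance from each entry of x to a finite sample of points representing a set."""
    return np.min(np.abs(x[:, None] - pts[None, :]), axis=1)

# K = [0, 1] (compact), W = (-1, 2) (open, contains K); W^C = R \ (-1, 2).
K_sample  = np.linspace(0.0, 1.0, 2001)
Wc_sample = np.concatenate([np.linspace(-50.0, -1.0, 2001),
                            np.linspace(2.0, 50.0, 2001)])

x = np.linspace(-3.0, 4.0, 1401)
d_Wc, d_K = dist_to_set(x, Wc_sample), dist_to_set(x, K_sample)
h = d_Wc / (d_Wc + d_K)            # denominator > 0 since K and W^C are disjoint

# h = 1 on K, h = 0 off W, and 0 <= h <= 1 in between.
print(h[(x >= 0) & (x <= 1)].min())     # 1.0 (up to rounding)
print(h[(x <= -1) | (x >= 2)].max())    # 0.0 (up to rounding)
```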

10.2 Change Of Variables, Linear Maps

The change of variables formula for linear maps is implied by the following major theorem. First, here is some notation. Let dx̂_i denote

    dx_1 ⋯ dx_{i−1} dx_{i+1} ⋯ dx_p .

Theorem 10.2.1 Let E be any Lebesgue measurable set and let A be a p × p matrix. Then AE is Lebesgue measurable and m_p (AE) = |det (A)| m_p (E). Also, if E is any Lebesgue measurable set, then

    ∫ X_{A(E)} (y) dy = ∫ X_E (x) |det (A)| dx.

Proof: Let Q denote ∏_{i=1}^p (a_i , b_i) , an open rectangle. First suppose A is an elementary matrix which is obtained from I by adding α times row i to row j. Then det (A) = 1 and

    m_p (AQ) ≡ ∫_{∏_{i≠j} (a_i , b_i)} ∫ X_{AQ} dx_j dx̂_j = ∫_{∏_{i≠j} (a_i , b_i)} ∫_{a_j + αx_i}^{b_j + αx_i} dx_j dx̂_j
    = ∏_{i=1}^p |b_i − a_i| = m_p (Q) = |det (A)| m_p (Q)

The linear transformation determined by A just shears Q in the direction of xj . It


is clear in this case that AQ is an open, hence measurable set.
Next suppose A is an elementary matrix which comes from multiplying the j th
row of I with = 0. Then this changes the length of one of the sides of the box by
the factor , resulting in

mp (AQ) = || |bj aj | |bi ai | = || mp (Q) = |det (A)| mp (Q)
i=j

In case A is an elementary matrix which comes from switching two rows of


the identity, then it maps Q to a possibly dierent Q which has the lengths of its
sides the same set of numbers as the set of lengths of the sides of Q. Therefore,
|det (A)| = 1 and so
mp (AQ) = mp (Q) = |det (A)| mp (Q) .
p
Let Rn = i=1 (n, n) and let K consist of all nite intersections of such open
boxes as just described. This is clearly a system. Now let
G {E B (Rp ) : mp (A (E Rn )) = |det (A)| mp (E Rn ) , and AE is Borel}
where A is any of the above elementary matrices. It is clear that G is closed with
respect to countable disjoint unions. If E G, is E C G? Since A is onto,
AE C = Rn \ AE which is Borel. What about the estimate?
( ( ))
mp (A (E Rn )) + mp A E C Rn = mp (A (Rn )) = |det (A)| mp (Rn )

and so
( ( ))
mp A E C Rn = |det (A)| mp (Rn ) mp (A (E Rn ))
= |det (A)| (mp (Rn ) mp (E Rn ))
( )
= |det (A)| mp E C Rn

It was shown above that G contains K. By Lemma 9.2.2, it follows that G (K) =
B (Rp ). Therefore, for any A elementary and E a Borel set,

|det (A)| mp (E) = lim |det (A)| mp (E Rn )


n
= lim mp (A (E Rn )) = mp (A (E)) .
n

Now consider A an arbitrary invertible matrix. Then A is the product of ele-


mentary matrices A1 Am and so if E is Borel, so is AE. Also


m
m1
|det (A)| mp (E) = |det (Ai )| mp (E) = |det (Ai )| mp (Am E)
i=1 i=1


m2
= |det (Ai )| mp (Am1 Am E) = mp (A1 Am E) = mp (AE) .
i=1

In case A is an arbitrary matrix which has rank less than p, there exists a
sequence of elementary matrices such that

A = E1 E2 Es B

where B is in row reduced echelon form and has at least one row of zeros. Thus if
E is any Lebesgue measurable set,

AE E1 E2 Es (BRp )

However, BRn is a Borel set of measure zero because it is contained in a set of the
form
F {x Rp : xk , , xp = 0}
and this has measure zero. Therefore, E1 E2 Es (BRp ) is a Borel set of measure
zero because

mp (E1 E2 Es (BRp )) = |det (E1 E2 Es )| mp (BRp ) = 0.

It follows from completeness of the measure that AE is Lebesgue measurable and

mp (AE) = |det (A)| mp (E) = 0.

It has now been shown that for invertible A, and E any Borel set,

mp (AE) = |det (A)| mp (E)



and for any Lebesgue measurable set E and A not invertible, the above formula
holds. It only remains to verify the formula holds for A invertible and E only
Lebesgue measurable. However, in this case, A maps open sets to open sets because
its inverse is continuous and maps compact sets to compact sets because x Ax
is continuous. Hence A takes G sets to G sets and F sets to F sets. Let E be
Lebesgue measurable. By regularity of the measure, there exists G and F, G and F
sets respectively such that F E G and mp (G \ F ) = 0. Then AF AE AG
and
mp (AG \ AF ) = mp (A (G \ F )) = |det (A)| mp (G \ F ) = 0.
By completeness, AE is Lebesgue measurable. Also

|det (A)| mp (F ) = mp (AF ) mp (AE) mp (AG)


= |det (A)| mp (G) = |det (A)| mp (E) = |det (A)| mp (F ) 

The above theorem also implies easily the following version of the change of variables formula for linear mappings.

Theorem 10.2.2 Let f ≥ 0 and suppose it is Lebesgue measurable. Then if A is a p × p matrix,

    ∫ X_{A(R^p)} (y) f (y) dy = ∫ f (A (x)) |det (A)| dx.    (10.2.1)

Proof: From Theorem 10.2.1, the equation is true if det (A) = 0. It follows that it suffices to consider only the case where A^{−1} exists. First suppose f (y) = X_E (y) where E is a Lebesgue measurable set. In this case, A (R^p) = R^p . Then from Theorem 10.2.1,

    ∫ X_{A(R^p)} (y) f (y) dy = m_p (E) = |det (A)| m_p (A^{−1} E)
    = |det (A)| ∫_{R^p} X_{A^{−1}E} (x) dx = |det (A)| ∫_{R^p} X_E (Ax) dx = ∫ f (A (x)) |det (A)| dx

It follows from this that 10.2.1 holds whenever f is a nonnegative simple function. Finally, the general result follows from approximating the Lebesgue measurable function with nonnegative simple functions using Theorem 7.5.6 and then applying the monotone convergence theorem. ∎
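Here is a quick numerical illustration (a sketch only, not part of the text) of m_p(AE) = |det(A)| m_p(E): take E = [0, 1)² in R², a fixed invertible matrix A of my own choosing, and compare a Monte Carlo estimate of m_2(AE) with |det(A)|.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])           # an invertible 2 x 2 matrix (illustrative)

# E = [0,1)^2.  AE is a parallelogram; the claim is m_2(AE) = |det A| * m_2(E) = |det A|.
corners = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
img_corners = corners @ A.T
lo, hi = img_corners.min(axis=0), img_corners.max(axis=0)   # bounding box of AE

# Monte Carlo: sample the bounding box and test membership, y in AE <=> A^{-1} y in E.
N = 1_000_000
y = rng.uniform(lo, hi, size=(N, 2))
x = y @ np.linalg.inv(A).T
inside = np.all((x >= 0.0) & (x < 1.0), axis=1)
estimate = np.prod(hi - lo) * inside.mean()

print(estimate, abs(np.linalg.det(A)))   # both close to 2.5
```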

10.3 Covering Theorems

The Vitali covering theorem is a profound result about coverings of a set in R^n with open balls. The balls can be defined in terms of any norm for R^n . For example, the norm could be

    ||x|| ≡ max {|x_k| : k = 1, ⋯, n}

or the usual norm

    |x| = (Σ_k |x_k|²)^{1/2}

or any other. The balls can be either open or closed or neither. The proof given here is from Basic Analysis [32].

Lemma 10.3.1 Let F be a countable collection of balls satisfying

    ∞ > M ≡ sup {r : B (p, r) ∈ F} > 0

and let k ∈ (0, ∞) . Then there exists G ⊆ F such that

    If B (p, r) ∈ G then r > k,    (10.3.2)

    If B_1 , B_2 ∈ G then B_1 ∩ B_2 = ∅,    (10.3.3)

    G is maximal with respect to 10.3.2 and 10.3.3.    (10.3.4)

By this is meant that if H is a collection of balls satisfying 10.3.2 and 10.3.3, then H cannot properly contain G.

Proof: If no ball of F has radius larger than k, let G = ∅. Assume therefore that some balls have radius larger than k. Let F ≡ {B_i}_{i=1}^∞ . Now let B_{n_1} be the first ball in the list which has radius greater than k. If every ball having radius larger than k intersects this one, then stop. The maximal set is {B_{n_1}} . Otherwise, let B_{n_2} be the next ball having radius larger than k which is disjoint from B_{n_1} . Continue this way obtaining {B_{n_i}}_{i=1}^∞ , a finite or infinite sequence of disjoint balls having radius larger than k. Then let G ≡ {B_{n_i}} . To see G is maximal with respect to 10.3.2 and 10.3.3, suppose B ∈ F, B has radius larger than k, and G ∪ {B} satisfies 10.3.2 and 10.3.3. Then at some point in the process, B would have been chosen because it would be the ball of radius larger than k which has the smallest index. Therefore, B ∈ G and this shows G is maximal with respect to 10.3.2 and 10.3.3. ∎
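The proof just given is really a greedy algorithm: walk down the list of balls and keep a ball exactly when its radius exceeds k and it misses every ball kept so far. The following sketch (an illustration only, with made-up balls in R² and the Euclidean norm) carries out that procedure.

```python
import numpy as np

def select_maximal_disjoint(balls, k):
    """Greedy selection from Lemma 10.3.1: balls is a list of (center, radius);
    keep a ball iff its radius exceeds k and it is disjoint from all kept balls."""
    kept = []
    for c, r in balls:
        c = np.asarray(c, dtype=float)
        if r <= k:
            continue
        # two closed balls are disjoint iff their centers are farther apart
        # than the sum of the radii (for open balls, >= would suffice)
        if all(np.linalg.norm(c - c0) > r + r0 for c0, r0 in kept):
            kept.append((c, r))
    return kept

balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 0.8), ((4.0, 0.0), 1.2),
         ((4.5, 0.5), 0.9), ((0.0, 5.0), 0.3), ((0.0, -4.0), 2.0)]
G = select_maximal_disjoint(balls, k=0.5)
print([(tuple(c), r) for c, r in G])
# kept: the first, third and last balls; every skipped ball of radius > 0.5
# meets one of the kept balls, so the kept family is maximal.
```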
For a ball B = B (x, r) , denote by B̃ the open ball B (x, 4r), also written B_0 (x, 4r) .

Lemma 10.3.2 Let F be a countable collection of balls, and let

    A ≡ ∪ {B : B ∈ F} .

Suppose

    ∞ > M ≡ sup {r : B (p, r) ∈ F} > 0.

Then there exists G ⊆ F such that G consists of disjoint balls and

    A ⊆ ∪ {B̃ : B ∈ G}.

Proof: By Lemma 10.3.1, there exists G_1 ⊆ F which satisfies 10.3.2, 10.3.3, and 10.3.4 with k = (2/3) M .
Suppose G_1 , ⋯, G_{m−1} have been chosen for m ≥ 2. Let

    F_m = {B ∈ F : B ⊆ R^n \ ∪ {G_1 ∪ ⋯ ∪ G_{m−1}} }

(the union being of the balls in these G_j), and using Lemma 10.3.1, let G_m be a maximal collection of disjoint balls from F_m with the property that each ball has radius larger than (2/3)^m M. Let G ≡ ∪_{k=1}^∞ G_k .
Let x ∈ B (p, r) ∈ F. Choose m such that

    (2/3)^m M < r ≤ (2/3)^{m−1} M

Then B (p, r) must have nonempty intersection with some ball from G_1 ∪ ⋯ ∪ G_m because if it didn't, then G_m would fail to be maximal. Denote by B (p_0 , r_0) a ball in G_1 ∪ ⋯ ∪ G_m which has nonempty intersection with B (p, r) . Thus

    r_0 > (2/3)^m M.

Consider the picture, in which w ∈ B (p_0 , r_0) ∩ B (p, r) .

[Figure: the balls B (p_0 , r_0) and B (p, r), with w in their intersection and x ∈ B (p, r).]

Then, since |w − p_0| < r_0 and r ≤ (2/3)^{m−1} M < (3/2) r_0 ,

    |x − p_0| ≤ |x − p| + |p − w| + |w − p_0| < r + r + r_0 ≤ 2 (2/3)^{m−1} M + r_0 < 2 (3/2) r_0 + r_0 = 4r_0 .

This proves the lemma since it shows B (p, r) ⊆ B_0 (p_0 , 4r_0). ∎
You don't need to assume the set of balls is countable. Let F be any collection of balls having bounded radii. Let F̃ result from replacing each ball B (x, r) ∈ F with the open ball B_0 (x, r (1 + ε)). Thus, letting A_ε denote the union of these slightly enlarged balls, it follows from Problem 19 on Page 60 or Lemma 7.1.5 on Page 163 that a countable subset of F̃, denoted by F̃′, has the property that ∪F̃′ = A_ε . Thus, letting ε = 1/4, the above conclusion follows for these enlarged balls, and the enlarged covering balls are of the form B_0 (x, 5r) where B (x, r) ∈ F. Note that if B_0 (x, r (1 + ε)) ∩ B_0 (x_1 , r_1 (1 + ε)) = ∅, then B (x, r) ∩ B (x_1 , r_1) = ∅. This proves the following proposition.

Proposition 10.3.3 Let F be a collection of balls, and let

    A ≡ ∪ {B : B ∈ F} .

Suppose

    ∞ > M ≡ sup {r : B (p, r) ∈ F} > 0.

Then there exists G ⊆ F such that G consists of disjoint balls whose closures are also disjoint and

    A ⊆ ∪ {B̂ : B ∈ G}

where for B = B (x, r) a ball, B̂ denotes the open ball B_0 (x, 5r).

Here is the concept of a Vitali covering.

Definition 10.3.4 Let S be a set and let C be a covering of S, meaning that every point of S is contained in a set of C. This covering is said to be a Vitali covering if for each ε > 0 and x ∈ S, there exists a set B ∈ C containing x, the diameter of B is less than ε, and there exists an upper bound to the set of diameters of sets of C.

Theorem 10.3.5 Let E ⊆ R^p be a bounded measurable set and let F be a collection of balls, open or not, of bounded radii such that F covers E in the sense of Vitali. Then there exists a countable collection of balls from F whose closures are disjoint, denoted by {B_j}_{j=1}^∞ , such that m_p (E \ ∪_{j=1}^∞ B_j) = m_p (E \ ∪_{j=1}^∞ B̄_j) = 0.

Proof: From the change of variables theorem for linear transformations,

mp (B (x,r)) = mp (B (0,r))
= p mp (0, r) = p mp (B (x,r)) ,

Let S (x,r) {y : |y x| = r}. Then for each < r,

mp (S (x, r)) mp (B (x, r + )) mp (B (x, r ))


= mp (B (0, r + )) mp (B (0, r ))
(( )p ( )p )
r+ r
= (mp (B (0, r)))
r r

Hence mp (S (x, r)) = 0.


If mp (E) = 0, there is nothing to prove, so assume the measure of this set is
positive. By outer regularity of Lebesgue measure, there exists U , an open set which
satises
mp (E) > (1 10p )mp (U ), U E.


Each point of E is contained in balls of F of arbitrarily small radii and so there


exists a covering of E with balls of F whose closures are contained in U . Therefore,

by Proposition 10.3.3, there exist balls, {Bi }i=1 F such that their closures are
disjoint and
E c
j=1 Bj , Bj U.

Therefore,
( ) ( )
mp E \
j=1 Bj mp (U ) mp
j=1 Bj


( )1 ( )
< 1 10p mp (E) mp Bj
j=1

( )
( )
p 1 cj
= 1 10 mp (E) 5p mp B
j=1
( )
p 1 p
1 10 mp (E) 5 mp (E)
= mp (E) p

where ( )1
p 1 10p 5p < 1
Thus, there exists m1 large enough that
( )
mp E \ mj=1 Bj p mp (E)
1

Now consider E \ m j=1 Bj and apply the same reasoning to it that was done to E.
1

Thus there exists m2 > m1 such that


( ) ( )
mp E \ m j=1 Bj p mp E \ j=1 Bj p mp (E)
2 m1 2

Continuing this way, there exists an increasing subsequence mk such that


( )
mp E \ m j=1 Bj p mp (E)
k k

( )
and since p < 1, and mp (E) < , this implies mp E \
j=1 Bj = 0. 
You dont need to assume that E has nite measure in order to draw the above
conclusion.

Corollary 10.3.6 Let E Rp be a measurable set and let F be a collection of balls


which come from some norm, open or not, but having bounded radii such that F
covers E in the sense of Vitali. Then there exists a countable collection of balls from
F having disjoint closures, denoted by {Bj }
j=1 , such that mp (E \ j=1 Bj ) = 0.

Proof: Consider An = B (0, n) \ B (0, n 1), n = 1, 2, . Let En = E


An . Then n=1 En N = E where N is a set of measure zero. From Theorem

10.3.5, there exist balls of F, having disjoint closures denoted by {Bin }i=1 , such

that mp (En \ i=1 Bi ) = 0 and each Bi An . Then {Bi , (i, n) N N} is a
n n n

suitable collection of balls having disjoint closures. 

10.4 Differentiable Functions And Measurability

To begin with, certain kinds of functions map measurable sets to measurable sets. It will be assumed that U is an open set in R^p and that h : U → R^p satisfies

    Dh (x) exists for all x ∈ U.    (10.4.5)

Note that if

    h (x) = Lx

where L ∈ L (R^p , R^p) , then L is included in 10.4.5 because

    L (x + v) = L (x) + L (v) + o (v)

In fact, o (v) = 0. Thus multiplication by a p × p matrix satisfies 10.4.5.
It is convenient in the following lemma to use the norm on R^p given by

    ||x|| = max {|x_k| : k = 1, 2, ⋯, p} .

Thus B (x, r) is the open box,


p
(xk r, xk + r)
k=1

p
and so mp (B (x, r)) = (2r) . Also for a linear transformation A L (Rp , Rp ) ,

||A|| sup ||Ax|| .


||x||1

It is appropriate to consider the measurability of a certain set in the following


lemma. Note that Dh (x) is a matrix whose ith column is hxi , the partial derivative
with respect to xi given by

h (x+tei ) h (x)
lim
t0 t

Each component of the above dierence quotient is a continuous function and so


it follows that each component of the above limit is Borel measurable. Therefore,
x ||Dh (x)|| is also a Borel measurable function because it equals

sup ||Dh (x) v||


||v||1,vD

where D is a dense countable subset of the unit ball. Since it is the sup of countably
many Borel measurable functions, it must also be Borel measurable. It follows the
set
Tk {x T : ||Dh (x)|| < k}
for T a measurable set, must be measurable as well.

Lemma 10.4.1 Let h satisfy 10.4.5. If T U and mp (T ) = 0, then mp (h (T )) =


0.

Proof: Let
Tk {x T : ||Dh (x)|| < k}
and let > 0 be given. Now by outer regularity, there exists an open set V ,
containing Tk which is contained in U such that mp (V ) < . Let x Tk . Then by
dierentiability,
h (x + v) = h (x) + Dh (x) v + o (v)
and so there exist arbitrarily small rx < 1 such that B (x,5rx ) V and whenever
||v|| 5rx , ||o (v)|| < k ||v|| . Thus

h (B (x, 5rx )) B (h (x) , 6krx ) .

From the Vitali covering theorem, there exists a countable { }disjoint sequence of

these balls, {B (xi , ri )}i=1 such that {B (xi , 5ri )}i=1 = B ci covers Tk Then
i=1
letting mp denote the outer measure determined by mp ,
( ( ))
mp (h (Tk )) mp h B bi
i=1


( ( ))
bi
mp h B mp (B (h (xi ) , 6krxi ))
i=1 i=1




p
= mp (B (xi , 6krxi )) = (6k) mp (B (xi , rxi ))
i=1 i=1
p p
(6k) mp (V ) (6k) .

Since > 0 is arbitrary, this shows mp (h (Tk )) = 0. Now

mp (h (T )) = lim mp (h (Tk )) = 0. 
k

Lemma 10.4.2 Let h satisfy 10.4.5. If S is a Lebesgue measurable subset of U ,


then h (S) is Lebesgue measurable.
Proof: By Theorem 10.1.2 there exists F which is a countable union of compact
sets F =
k=1 Kk such that

F S, mp (S \ F ) = 0.
Then since h is continuous
h (F ) = k h (Kk ) B (Rp )
because the continuous image of a compact set is compact. Also, h (S \ F ) is a set
of measure zero by Lemma 10.4.1 and so
h (S) = h (F ) h (S \ F ) Fp
because it is the union of two sets which are in Fp . 
In particular, this proves most of the following theorem from a dierent point
of view to that done before.
Theorem 10.4.3 Let A be a p p matrix. Then if E is a Lebesgue measurable set,
it follows that A (E) is also a Lebesgue measurable set.
Proof: By Theorem 10.1.2, there exists F a countable union of compact sets
{Ki }, and a set of measure zero N disjoint from F such that E = F N, with
mp (N ) = 0. Since x Ax is a continuous map, each AKi is compact and so AF
is a countable union of compact sets. Therefore, it is a Borel set and is therefore,
Lebesgue measurable. Thus A (F N ) = A (F ) A (N ) and from what was just
shown, A (N ) is of measure zero and so it is measureable. 

10.5 Change Of Variables, Nonlinear Maps

In this section theorems are proved which yield change of variables formulas for C¹ functions. More general versions can be seen in Kuttler [32], Kuttler [33], and Rudin [40]. You can obtain more by exploiting the Radon Nikodym theorem and the Lebesgue fundamental theorem of calculus, two topics which are best studied in a more advanced course. Instead, I will present some good theorems using the Vitali covering theorem directly.
A basic version of the theorems to be presented is the following. If you like, let the balls be defined in terms of the norm

    ||x|| ≡ max {|x_k| : k = 1, ⋯, p}

Lemma 10.5.1 Let U and V be bounded open sets in R^p and let h, h^{−1} be C¹ functions such that h (U) = V . Also let f ∈ C_c (V) . Then

    ∫_V f (y) dm_p = ∫_U f (h (x)) |det (Dh (x))| dm_p

Proof: First note h^{−1} (spt (f)) is a closed subset of the bounded set U and so it is compact. Thus x → f (h (x)) |det (Dh (x))| is bounded and continuous.
Let x ∈ U. By the assumption that h and h^{−1} are C¹,

    h (x + v) − h (x) = Dh (x) v + o (v) = Dh (x) (v + Dh^{−1} (h (x)) o (v)) = Dh (x) (v + o (v))

and so if r > 0 is small enough then B (x, r) is contained in U and

    h (B (x, r)) − h (x) = h (x + B (0, r)) − h (x) ⊆ Dh (x) (B (0, (1 + ε) r)) .    (10.5.6)

Making r still smaller if necessary, one can also obtain

    |f (y) − f (h (x))| < ε    (10.5.7)

for any y ∈ h (B (x, r)) and also

    |f (h (x_1)) |det (Dh (x_1))| − f (h (x)) |det (Dh (x))|| < ε    (10.5.8)

whenever x_1 ∈ B (x, r) . The collection of such balls is a Vitali cover of U. By Corollary 10.3.6 there is a sequence of disjoint closed balls {B_i} such that U = ∪_{i=1}^∞ B_i ∪ N where m_p (N) = 0. Denote by x_i the center of B_i and r_i the radius. Then by Lemma 10.4.1, the monotone convergence theorem, 10.5.6 – 10.5.8, and the change of variables formula for linear maps, Theorem 10.2.1,

    ∫_V f (y) dm_p = Σ_{i=1}^∞ ∫_{h(B_i)} f (y) dm_p
    ≤ ε m_p (V) + Σ_{i=1}^∞ ∫_{h(B_i)} f (h (x_i)) dm_p
    = ε m_p (V) + Σ_{i=1}^∞ f (h (x_i)) m_p (h (B_i))
    ≤ ε m_p (V) + Σ_{i=1}^∞ f (h (x_i)) m_p (Dh (x_i) (B (0, (1 + ε) r_i)))
    = ε m_p (V) + (1 + ε)^p Σ_{i=1}^∞ ∫_{B_i} f (h (x_i)) |det (Dh (x_i))| dm_p
    ≤ ε m_p (V) + (1 + ε)^p Σ_{i=1}^∞ ( ∫_{B_i} f (h (x)) |det (Dh (x))| dm_p + ε m_p (B_i) )
    ≤ ε m_p (V) + (1 + ε)^p Σ_{i=1}^∞ ∫_{B_i} f (h (x)) |det (Dh (x))| dm_p + (1 + ε)^p ε m_p (U)
    = ε m_p (V) + (1 + ε)^p ∫_U f (h (x)) |det (Dh (x))| dm_p + (1 + ε)^p ε m_p (U)

Since ε > 0 is arbitrary, this shows

    ∫_V f (y) dm_p ≤ ∫_U f (h (x)) |det (Dh (x))| dm_p    (10.5.9)

whenever f ∈ C_c (V) . Now x → f (h (x)) |det (Dh (x))| is in C_c (U) and so using the same argument with U and V switching roles and replacing h with h^{−1},

    ∫_U f (h (x)) |det (Dh (x))| dm_p ≤ ∫_V f (h (h^{−1} (y))) |det (Dh (h^{−1} (y)))| |det (Dh^{−1} (y))| dm_p = ∫_V f (y) dm_p

by the chain rule. This with 10.5.9 proves the lemma. ∎
The next task is to relax the assumption that f is continuous. This will make
use of the following simple lemma.
Lemma 10.5.2 Let S be a nonempty set in Rp and let
dist (x, S) inf {|x y| : y S}
Then this function of x is continuous. If K is a compact subset of an open set G,
there exists a function f : G [0, 1] such that spt (f ) {x : f (x) = 0} is a compact
subset of G and f (x) = 1 for all x K. This situation is written as K f G.
Proof: First consider the claim about the function. Let x1 , x2 be two points.
Say dist (x1 , S) dist (x2 , S). Then let y S be such that dist (x2 , S) > |x2 y|.
Then
|dist (x1 , S) dist (x2 , S)| = dist (x1 , S) dist (x2 , S)
|x1 y| (|x2 y| ) |x1 x2 | +
Since is arbitrary, |dist (x1 , S) dist (x2 , S)| |x1 x2 |.
Now consider the second claim about f . Since K is compact, there exists a posi-
tive distance, 2 between K and GC . (Why?) Now let D1 , , Dm be closed balls of
radius which cover K. Hence K m i=1 Di H is a compact { set which
( contains
) }K
and is contained in G. Now consider the open sets B (0, k) x : dist x, GC > k1 .
These open sets cover H and are increasing so there exists one of them W which
contains H. Note that the closure of this set is compact because it is closed and
bounded. Hence K H W G and W is compact. Let
( )
dist x, W C
f (x) =
dist (x, W C ) + dist (x,H)
Then since H is compact, it has positive distance to W C and so the denominator
is nonzero. Hence the function given has the desired properties. 
Note that the lemma proves a little more than needed by having the function
equal 1 on an open set containing K.
Corollary 10.5.3 Let U and V be bounded open sets in Rp and let h, h1 be C 1
functions such that h (U ) = V . Also let E V be measurable. Then

XE (y) dmp = XE (h (x)) |det (Dh (x))| dmp .
V U
10.5. CHANGE OF VARIABLES, NONLINEAR MAPS 253

Proof: First suppose E H V where H is compact. By regularity, there


exist compact sets Kk and a decreasing sequence of open sets Gk V such that

Kk E Gk

and mp (Gk \ Kk ) < 2k . By Lemma 10.5.2, there exist fk such that Kk fk Gk .


Then fk (y) XE (y) a.e. because if y is such that convergence
fails, it must be
the case that y is in Gk \ Kk for innitely many k and k mp (Gk \ Kk ) < . This
set equals
N =
m=1 k=m Gk \ Kk

and so for each m N

mp (N ) mp (
k=m Gk \ Kk )


mp (Gk \ Kk ) < 2k = 2(m1)
k=m k=m

showing mp (N ) = 0.
/ h1 (N ), a set of measure
Then fk (h (x)) must converge to XE (h (x)) for all x
zero by Lemma 10.4.1. Thus XE (h (x)) = limk fk (h (x)) o h1 (N ) and so by
completeness of Lebesgue measure, x XE (h (x)) is measurable. Then

fk (y) dmp = fk (h (x)) |det (Dh (x))| dmp .
V U

Since V is bounded, G1 is compact. Therefore, |det (Dh (x))| is bounded inde-


pendent of k and so, by the dominated convergence theorem, using a dominating
function, XV in the integral on the left and XG1 |det (Dh)| on the right, it follows

XE (y) dmp = XE (h (x)) |det (Dh (x))| dmp .
V U

For an arbitrary measurable E, let V = k=1 Hk where Hk is compact and Hk


Hk+1 . Let Ek = Hk E replace E in the above and use the monotone convergence
theorem letting k . 
You dont need to assume the open sets are bounded.

Corollary 10.5.4 Let U and V be open sets in Rp and let h, h1 be C 1 functions


such that h (U ) = V . Also let E V be measurable. Then

XE (y) dmp = XE (h (x)) |det (Dh (x))| dmp .
V U

Proof: Since both h, h1 are continuous, h maps open sets to open sets. Let
Un = B (0, n) U where n is large enough that this intersection is nonempty. Let
Vn = h (Un ) , an open set. Then if E is a Lebesgue measurable set, the above
implies
XEVn (y) dmp = XEVn (h (x)) |det Dh (x)| dmp
Vn Un
254 LEBESGUE MEASURABLE SETS

Hence
XEVn (y) dmp = XEVn (h (x)) |det Dh (x)| dmp
V U
Now let n and use the monotone convergence theorem. 
With this corollary, the main theorem follows.

Theorem 10.5.5 Let U and V be open sets in Rp and let h, h1 be C 1 functions


such that h (U ) = V. Then if g is a nonnegative Lebesgue measurable function,

g (y) dmp = g (h (x)) |det (Dh (x))| dmp . (10.5.10)
V U

Proof: From Corollary 10.5.4, 10.5.10 holds for any nonnegative simple function
in place of g. In general, let {sk } be an increasing sequence of simple functions which
converges to g pointwise. Then from the monotone convergence theorem

g (y) dmp = lim sk dmp = lim sk (h (x)) |det (Dh (x))| dmp
k V k U
V

= g (h (x)) |det (Dh (x))| dmp .
U


Of course this theorem implies the following corollary by splitting up the function
into the positive and negative parts of the real and imaginary parts.

Corollary 10.5.6 Let U and V be open sets in Rp and let h, h1 be C 1 functions


such that h (U ) = V. Let g L1 (V ) . Then

g (y) dmp = g (h (x)) |det (Dh (x))| dmp .
V U

This is a pretty good theorem but it isnt too hard to generalize it. In particular,
it is not necessary to assume h1 is C 1 .
In what follows, it may be convenient to take

||x|| = max {|xi | , i = 1, , p} ||x||

and use the operator norm for A L (Rp , Rp ).

10.6 The Mapping Is Only One To One


The following is Sards lemma. In the proof, it does not matter which norm you use
in dening balls but it may be easiest to consider the norm ||x|| max {|xi | , i = 1, , p}.

Lemma 10.6.1 (Sard) Let U be an open set in Rp and let h : U Rp be dieren-


tiable. Let
Z {x U : det Dh (x) = 0} .
Then mp (h (Z)) = 0.
10.6. THE MAPPING IS ONLY ONE TO ONE 255

Proof: For convenience, assume the balls in the following argument come from
|||| . First note that Z is a Borel set because h is continuous and so the component
functions of the Jacobian matrix are each Borel measurable. Hence the determinant
is also Borel measurable.
Suppose that U is a bounded open set. Let > 0 be given. Also let V Z with
V U open, and
mp (Z) + > mp (V ) .
Now let x Z. Then since h is dierentiable at x, there exists x > 0 such that if
r < , then B (x, r) V and also,

h (B (x,r)) h (x) + Dh (x) (B (0,r)) + B (0,r) , < 1.

Regard Dh (x) as an n n matrix, the matrix of the linear transformation Dh (x)


with respect to the usual coordinates. Since x Z, it follows that there exists an
invertible matrix A such that ADh (x) is in row reduced echelon form with a row
of zeros on the bottom. Therefore,

mp (A (h (B (x,r)))) = mp (ADh (x) (B (0,r)) + AB (0,r)) (10.6.11)

The diameter of ADh (x) (B (0,r)) is no larger than ||A|| ||Dh (x)|| 2r and it lies in
Rp1 {0} . The diameter of AB (0,r) is no more than ||A|| (2r) .Therefore, the
measure of the right side in 10.6.11 is no more than
p1
[(||A|| ||Dh (x)|| 2r + ||A|| (2)) r] (r)
p
C (||A|| , ||Dh (x)||) (2r)

Hence from the change of variables formula for linear maps,

C (||A|| , ||Dh (x)||)


mp (h (B (x,r))) mp (B (x, r))
|det (A)|

Then letting x be still smaller if necessary, corresponding to suciently small ,

mp (h (B (x,r))) mp (B (x, r))

The balls of this form constitute a Vitali cover of Z. Hence, by the Vitali covering

theorem, there exists {Bi }i=1 , Bi = Bi (xi , ri ) , a collection of disjoint balls, each of
which is contained in V, such that mp (h (Bi )) mp (Bi ) and mp (Z \ i Bi ) = 0.
Hence from Lemma 10.4.1,

mp (h (Z) \ i h (Bi )) mp (h (Z \ i Bi )) = 0

Therefore,

mp (h (Z)) mp (h (Bi )) mp (Bi )
i i
(mp (V )) (mp (Z) + ) .
256 LEBESGUE MEASURABLE SETS

Since is arbitrary, this shows mp (h (Z)) = 0. What if U is not bounded? Then


consider Zn = Z B (0, n) . From what was just shown, h (Zn ) has measure 0 and
so it follows that h (Z) also does, being the countable union of sets of measure zero.

With this important lemma, here is a generalization of Theorem 10.5.5.

Theorem 10.6.2 Let U be an open set and let h be a 1 1, C 1 (U ) function with


values in Rp . Then if g is a nonnegative Lebesgue measurable function,

g (y) dmp = g (h (x)) |det (Dh (x))| dmp . (10.6.12)
h(U ) U

Proof: Let Z = {x : det (Dh (x)) = 0} , a closed set. Then by the inverse
function theorem, h1 is C 1 on h (U \ Z) and h (U \ Z) is an open set. Therefore,
from Lemma 10.6.1, h (Z) has measure zero and so by Theorem 10.5.5,

g (y) dmp = g (y) dmp = g (h (x)) |det (Dh (x))| dmp
h(U ) h(U \Z) U \Z

= g (h (x)) |det (Dh (x))| dmp . 
U

10.7 Mappings Which Are Not One To One


Now suppose h is only C 1 , not necessarily one to one. For

U+ {x U : |det Dh (x)| > 0}

and Z the set where |det Dh (x)| = 0, Lemma 10.6.1 implies mp (h(Z)) = 0. For
x U+ , the inverse function theorem implies there exists an open set Bx U+ ,
such that h is one to one on Bx .
Let {Bi } be a countable subset of {Bx }xU+ such that U+ = i=1 Bi . Let
E1 = B1 . If E1 , , Ek have been chosen, Ek+1 = Bk+1 \ ki=1 Ei . Thus


i=1 Ei = U+ , h is one to one on Ei , Ei Ej = ,

and each Ei is a Borel set contained in the open set Bi . Now dene


n(y) Xh(Ei ) (y) + Xh(Z) (y).
i=1

The set h (Ei ) , h (Z) are measurable by Lemma 10.4.2. Thus n () is measurable.

Lemma 10.7.1 Let F h(U ) be measurable. Then



n(y)XF (y)dmp = XF (h(x))| det Dh(x)|dmp .
h(U ) U
10.7. MAPPINGS WHICH ARE NOT ONE TO ONE 257

Proof: Using Lemma 10.6.1 and the Monotone Convergence Theorem




mp (h(Z))=0
z }| {

n(y)XF (y)dmp = Xh(Ei ) (y) + Xh(Z) (y) XF (y)dmp
h(U ) h(U ) i=1



= Xh(Ei ) (y)XF (y)dmp
i=1 h(U )

= Xh(Ei ) (y)XF (y)dmp
i=1 h(Bi )

= XEi (x)XF (h(x))| det Dh(x)|dmp
i=1 Bi

= XEi (x)XF (h(x))| det Dh(x)|dmp
i=1 U


= XEi (x)XF (h(x))| det Dh(x)|dmp
U i=1


= XF (h(x))| det Dh(x)|dmp = XF (h(x))| det Dh(x)|dmp . 
U+ U

Denition 10.7.2 For y h(U ), dene a function, #, according to the formula

#(y) number of elements in h1 (y).

Observe that
#(y) = n(y) a.e. (10.7.13)
because n(y) = #(y) if y
/ h(Z), a set of measure 0. Therefore, # is a measurable
function because of completeness of Lebesgue measure.

Theorem 10.7.3 Let g 0, g measurable, and let h be C 1 (U ). Then



#(y)g(y)dmp = g(h(x))| det Dh(x)|dmp . (10.7.14)
h(U ) U

Proof: From 10.7.13 and Lemma 10.7.1, 10.7.14 holds for all g, a nonnegative
simple function. Approximating an arbitrary measurable nonnegative function, g,
with an increasing pointwise convergent sequence of simple functions and using
the monotone convergence theorem, yields 10.7.14 for an arbitrary nonnegative
measurable function, g. 
258 LEBESGUE MEASURABLE SETS

10.8 Spherical Coordinates In p Dimensions


Sometimes there is a need to deal with spherical coordinates in more than three
dimensions. In this section, this concept is dened and formulas are derived for
these coordinate systems. Recall polar coordinates are of the form

y1 = cos
y2 = sin

where > 0 and R. Thus these transformation equations are not one to one
but they are one to one on (0, ) [0, 2). Here I am writing in place of r to
emphasize a pattern which is about to emerge. I will consider polar coordinates as
spherical coordinates in two dimensions. I will also simply refer to such coordinate
systems as polar coordinates regardless of the dimension. This is also the reason I
am writing y1 and y2 instead of the more usual x and y. Now consider what happens
when you go to three dimensions. The situation is depicted in the following picture.

R (x1 , x2 , x3 )

1

R2

From this picture, you see that y3 = cos 1 . Also the distance between (y1 , y2 )
and (0, 0) is sin (1 ) . Therefore, using polar coordinates to write (y1 , y2 ) in terms
of and this distance,
y1 = sin 1 cos ,
y2 = sin 1 sin ,
y3 = cos 1 .

where 1 R and the transformations are one to one if 1 is restricted to be in


[0, ] . What was done is to replace with sin 1 and then to add in y3 = cos 1 .
Having done this, there is no reason to stop with three dimensions. Consider the
following picture:

R (x1 , x2 , x3 , x4 )

2

R3

From this picture, you see that y4 = cos 2 . Also the distance between (y1 , y2 , y3 )
and (0, 0, 0) is sin (2 ) . Therefore, using polar coordinates to write (y1 , y2 , y3 ) in
10.8. SPHERICAL COORDINATES IN P DIMENSIONS 259

terms of , 1 , and this distance,

y1 = sin 2 sin 1 cos ,


y2 = sin 2 sin 1 sin ,
y3 = sin 2 cos 1 ,
y4 = cos 2

where 2 R and the transformations will be one to one if

2 , 1 (0, ) , (0, 2) , (0, ) .

Continuing this way, given spherical coordinates in Rp , to get the spherical


coordinates in Rp+1 , you let yp+1 = cos p1 and then replace every occurance of
with sin p1 to obtain y1 yp in terms of 1 , 2 , , p1 ,, and .
It is always the case that measures the distance from the point in Rp to the
origin in Rp , 0. Each i R and the transformations
( ) will be one to one if each
i (0, ) , and (0, 2) . Denote by hp , , the above transformation.

It can be shown
p2 using math induction and geometric reasoning that these co-
ordinates map i=1 (0, ) (0, 2) (0, ) one to one onto an open subset of Rp
which is everything except for the set of measure zero p (N ) where N results from
having some i equal to 0 or or for = 0 or for equal to either 2 or 0. Each of
these are sets of Lebesgue measure
( zero and so their union) is also a set of measure
p2
zero. You can see that hp i=1 (0, ) (0, 2) (0, ) omits the union of the
coordinate axes except for maybe one of them. This is not important to the integral
because it is just a set of measure zero.
( )
Theorem 10.8.1 Let y = hp , , be the spherical coordinate transformations
p2
in Rp . Then letting A = i=1 (0, ) (0, 2) , it follows h maps A (0, ) one to
one onto all of Rp except a set of measure zero given by hp (N ) where N is the set
of measure zero ( )
A [0, ) \ (A (0, ))
( )
, will always be of the form
Also det Dhp ,
( ) ( )
, = p1 ,
.
det Dhp , (10.8.15)

and .1 Then if f is nonnegative and Lebesgue


where is a continuous function of
measurable,
( ( )) ( )
f (y) dmp = f (y) dmp = f hp , , p1 , dmp
Rp hp (A) A
(10.8.16)
1 Actually it is only a function of the rst but this is not important in what follows.
260 LEBESGUE MEASURABLE SETS

Furthermore whenever f is Borel measurable and nonnegative, one can apply


Fubinis theorem and write
( ( )) ( )
f (y) dy = p1 , ,
f h , d
dd (10.8.17)
Rp 0 A

d denotes dmp1 on A. The same formulas hold if f L1 (Rp ) .


where here d
Proof: Formula 10.8.15 is obvious from the denition of the spherical coordi-
nates because in the matrix of the derivative, there will be a in p 1 columns.
The rst claim is also clear from the denition and math induction or from the
geometry of the above description. It remains to verify 10.8.16 and 10.8.17. It is
clear hp maps A [0, ) onto Rp . Since hp is dierentiable, it maps sets of measure
zero to sets of measure zero. Then
Rp = hp (N A (0, )) = hp (N ) hp (A (0, )) ,
the union of a set of measure zero with hp (A (0, )) . Therefore, from the change
of variables formula,
( ( )) ( )
f (y) dmp = f (y) dmp = f hp , , p1 , dmp
Rp hp (A(0,)) A(0,)

which proves 10.8.16. This formula continues to hold if f is in L1 (Rp ). Finally,


if f 0 or in L1 (Rn ) and is Borel measurable, then it is F p measurable as well.
Recall that F p includes the smallest algebra which contains products of open
intervals. Hence F p includes the Borel sets B (Rp ). Thus from the denition of mp
( ( )) ( )
f hp , , p1 , dmp
A(0,)

( ( )) ( )
= , p1 ,
f hp , dmp1 dm
(0,) A
( ( )) ( )
= p1
f hp , , , dmp1 dm
(0,) A

Now the claim about f L1 follows routinely from considering the positive and
negative parts of the real and imaginary parts of f in the usual way. 
Note that the above equals
( ( )) ( )
f hp , , p1 , dmp

A[0,)

and the iterated integral is also equal to


( ( )) ( )
p1 f hp , , , dmp1 dm
[0,)
A

because the dierence is just a set of measure zero.


10.9. BROUWER FIXED POINT THEOREM 261

Notation 10.8.2 Often this is written dierently. Note that from the spherical co-
ordinate formulas, f (h (, , )) = f () where || = 1. Letting S p1 denote the
unit sphere, { Rp : || = 1} , the inside integral in the above formula is some-
times written as
f () d
S p1

where is a measure on S p1 . See [32] for another description of this measure. It


isnt an important issue here. Either 10.8.17 or the formula
( )
p1 f () d d
0 S p1

will be referred
( ) toas polar coordinates and is very useful in establishing estimates.
Here S p1 A (, ) dmp1 .

( )s
2
Example 10.8.3 For what values of s is the integral B(0,R)
1 + |x| dy bounded
independent of R? Here B (0, R) is the ball, {x R : |x| R} . p

I think you can see immediately that s must be negative but exactly how neg-
ative? It turns out it depends on p and using polar coordinates, you can nd just
exactly what is needed. From the polar coordinates formula above,
( )s
2
R ( )s
1 + |x| dy = 1 + 2 p1 dd
B(0,R) 0 S p1
R ( )s
= Cp 1 + 2 p1 d
0

Now the very hard problem has been reduced to considering an easy one variable
problem of nding when
R
( )s
p1 1 + 2 d
0

is bounded independent of R. You need 2s + (p 1) < 1 so you need s < p/2.

10.9 Brouwer Fixed Point Theorem


The Brouwer xed point theorem is one of the most signicant theorems in math-
ematics. There exist relatively easy proofs of this important theorem. The proof I
am giving here is the one given in Evans [17]. I think it is one of the shortest and
easiest proofs of this important theorem. It is based on the following lemma which
is an interesting result about cofactors of a matrix.
Recall that for A an p p matrix, cof (A)ij is the determinant of the matrix
which results from deleting the ith row and the j th column and multiplying by
262 LEBESGUE MEASURABLE SETS

i+j
(1) . In the proof and in what follows, I am using Dg to equal the matrix of
the linear transformation Dg taken with respect to the usual basis on Rp . Thus

Dg (x) = (Dg)ij ei ej
ij

and recall that (Dg)ij = gi /xj where g = i gi ei .

Lemma 10.9.1 Let g : U Rp be C 2 where U is an open subset of Rp . Then


p
cof (Dg)ij,j = 0,
j=1

det(Dg)
where here (Dg)ij gi,j gi
xj . Also, cof (Dg)ij = gi,j .

Proof: From the cofactor expansion theorem,


p
det (Dg) = gi,j cof (Dg)ij
i=1

and so
det (Dg)
= cof (Dg)ij (10.9.18)
gi,j
which shows the last claim of the lemma. Also

kj det (Dg) = gi,k (cof (Dg))ij (10.9.19)
i

because if k = j this is just the cofactor expansion of the determinant of a matrix


in which the k th and j th columns are equal. Dierentiate 10.9.19 with respect to
xj and sum on j. This yields
(det Dg)
kj gr,sj = gi,kj (cof (Dg))ij + gi,k cof (Dg)ij,j .
r,s,j
gr,s ij ij

Hence, using kj = 0 if j = k and 10.9.18,



(cof (Dg))rs gr,sk = gr,ks (cof (Dg))rs + gi,k cof (Dg)ij,j .
rs rs ij

Subtracting the rst sum on the right from both sides and using the equality of
mixed partials,

gi,k (cof (Dg))ij,j = 0.
i j
10.9. BROUWER FIXED POINT THEOREM 263


If det (gi,k ) = 0 so that (gi,k ) is invertible, this shows j (cof (Dg))ij,j = 0. If
det (Dg) = 0, let
gk (x) = g (x) + k x
where k 0 and det (Dg + k I) det (Dgk ) = 0. Then

(cof (Dg))ij,j = lim (cof (Dgk ))ij,j = 0 
k
j j

( ) 10.9.2 Let h be a function dened on an open set U R . Then


p
Denition
h C U if there exists a function g dened on an open set W containng U such
k

that g = h on U and g is C k (W ) .

In the following lemma, you could use any norm in dening the balls and every-
thing would work the same but I have in mind the usual norm.
( )
Lemma 10.9.3 There does not exist h C 2 B (0, R) such that h :B (0, R)
B (0, R) which also has the property that h (x) = x for all x B (0, R) . Such a
function is called a retraction.

Proof: Suppose such an h exists. Let [0, 1] and let p (x) x+ (h (x) x) .
This function, p is called a homotopy of the identity map and the retraction, h.
Let
I () det (Dp (x)) dx.
B(0,R)

Then using the dominated convergence theorem,


det (Dp (x)) pij (x)
I () = dx
B(0,R) i.j pi,j
det (Dp (x))
= (hi (x) xi ),j dx
B(0,R) i j
pi,j

= cof (Dp (x))ij (hi (x) xi ),j dx
B(0,R) i j

Now by assumption, hi (x) = xi on B (0, R) and so one can form iterated integrals
and integrate by parts in each of the one dimensional integrals to obtain

I () = cof (Dp (x))ij,j (hi (x) xi ) dx = 0.
i B(0,R) j

Therefore, I () equals a constant. However,

I (0) = mp (B (0, R)) > 0

but
I (1) = det (Dh (x)) dmp = # (y) dmp = 0
B(0,1) B(0,1)
264 LEBESGUE MEASURABLE SETS

because from polar coordinates or other elementary reasoning, mp (B (0, 1)) = 0.



The following is the Brouwer xed point theorem for C 2 maps.
( )
Lemma 10.9.4 If h C 2 B (0, R) and h : B (0, R) B (0, R), then h has a
xed point, x such that h (x) = x.

Proof: Suppose the lemma is not true. Then for all x, |x h (x)| = 0. Then
dene
x h (x)
g (x) = h (x) + t (x)
|x h (x)|
where t (x) is nonnegative and is chosen such that g (x) B (0, R) . This mapping
is illustrated in the following picture.

f (x)
x

g(x)

If x t (x) is C 2 near B (0, R), it will follow g is a C 2 retraction onto B (0, R)


contrary to Lemma 10.9.3. Now t (x) is the nonnegative solution, t to
( )
2 x h (x)
H (x, t) = |h (x)| + 2 h (x) , t + t2 = R 2 (10.9.20)
|x h (x)|

Then ( )
x h (x)
Ht (x, t) = 2 h (x) , + 2t.
|x h (x)|
If this is nonzero for all x near B (0, R), it follows from the implicit function theorem
that t is a C 2 function of x. From 10.9.20
( )
x h (x)
2t = 2 h (x) ,
|x h (x)|
( )2
x h (x) ( )
2
4 h (x) , 4 |h (x)| R2
|x h (x)|

and so
( )
x h (x)
Ht (x, t) = 2t + 2 h (x) ,
|x h (x)|
( )2
( ) x h (x)
2
= 4 R |h (x)| + 4 h (x) ,
2
|x h (x)|
10.10. EXERCISES 265

If |h (x)| < R, this is nonzero. If |h (x)| = R, then it is still nonzero unless


(h (x) , x h (x)) = 0.
But this cannot happen because the angle between h (x) and x h (x) cannot be
/2. Alternatively, if the above equals zero, you would need
2
(h (x) , x) = |h (x)| = R2
which cannot happen unless x = h (x) which is assumed not to happen. Therefore,
x t (x) is C 2 near B (0, R) and so g (x) given above contradicts Lemma 10.9.3.

Now it is easy to prove the Brouwer xed point theorem.
Theorem 10.9.5 Let f : B (0, R) B (0, R) be continuous. Then f has a xed
point.
Proof: If this is not so, there exists > 0 such that for all x B (0, R),
|x f (x)| > .
By the Weierstrass approximation theorem, there exists h, a polynomial such that
{ }
max |h (x) f (x)| : x B (0, R) < .
2
Then for all x B (0, R),

|x h (x)| |x f (x)| |h (x) f (x)| > =
2 2
contradicting Lemma 10.9.4. 

10.10 Exercises
1. Recall the denition of fy . Prove that if f L1 (Rp ) , then

lim |f fy | dmp = 0
y0 Rp

This is known as continuity of translation. Hint: Use the theorem about


being able to approximate an arbitrary function in L1 (Rp ) with a function in
Cc (Rp ).
2. Show that if a, b 0 and if p, q > 0 such that
1 1
+ =1
p q
then
ap bq
ab +
p q
ap bq
Hint: You might consider for xed a 0, the function h (b) p + q ab
and nd its minimum.
266 LEBESGUE MEASURABLE SETS

3. In the context of the previous problem, prove Holders inequality. If f, g


measurable functions, then
( )1/p ( )1/q
p q
|f | |g| d |f | d |g| d

Hint: If either of the factors on the right equals 0, explain why there is nothing
( p )1/p ( q )1/q
to show. Now let a = |f | / |f | d and b = |g| / |g| d . Apply
the inequality of the previous problem.
4. If f L1 (Rp ) , show there exists g L1 (Rp ) such that g is also Borel
measurable such that g (x) = f (x) for a.e. x.
5. Suppose f, g L1 (Rp ) . Dene f g (x) by

f (x y) g (y) dmp (y) .

Show this makes sense for a.e. x and that in fact for a.e. x

|f (x y)| |g (y)| dmp (y)

Next show
|f g (x)| dmp (x) |f | dmp |g| dmp .

Hint: Use Problem 4. Show rst there is no problem if f, g are Borel mea-
surable. The reason for this is that you can use Fubinis theorem to write

|f (x y)| |g (y)| dmp (y) dmp (x)

= |f (x y)| |g (y)| dmp (x) dmp (y)

= |f (z)| dmp |g (y)| dmp .

Explain. Then explain why if f and g are replaced by functions which are
equal to f and g a.e. but are Borel measurable, the convolution is unchanged.
6. In the situation of Problem 5 Show x f g (x) is continuous whenever g
is also bounded. Hint: Use Problem 1.
7. Let f : [0, ) R be in L1 (R, m). The Laplace transform is given by
x
fb(x) = 0 ext f (t)dt. Let f, g be in L1 (R, m), and let h(x) = 0 f (x
t)g(t)dt. Show h L1 , and b
h = fbgb.
8. Suppose A is covered by a nite collection of Balls, F. Show that then
p bi
there exists a disjoint collection of these balls, {Bi }i=1 , such that A pi=1 B
b
where Bi has the same center as Bi but 3 times the radius. Hint: Since the
collection of balls is nite, they can be arranged in order of decreasing radius.
10.10. EXERCISES 267

9. Let f be a function dened on an interval, (a, b). The Dini derivates are
dened as
f (x + h) f (x)
D+ f (x) lim inf ,
h0+ h
f (x + h) f (x)
D+ f (x) lim sup
h0+ h

f (x) f (x h)
D f (x) lim inf ,
h0+ h
f (x) f (x h)
D f (x) lim sup .
h0+ h

Suppose f is continuous on (a, b) and for all x (a, b), D+ f (x) 0. Show
that then f is increasing on (a, b). Hint: Consider the function, H (x)
f (x) (d c) x (f (d) f (c)) where a < c < d < b. Thus H (c) = H (d).
Also it is easy to see that H cannot be constant if f (d) < f (c) due to the
assumption that D+ f (x) 0. If there exists x1 (a, b) where H (x1 ) > H (c),
then let x0 (c, d) be the point where the maximum of f occurs. Consider
D+ f (x0 ). If, on the other hand, H (x) < H (c) for all x (c, d), then consider
D+ H (c).
10. Suppose in the situation of the above problem we only know

D+ f (x) 0 a.e.

Does the conclusion still follow? What if we only know D+ f (x) 0 for every
x outside a countable set? Hint: In the case of D+ f (x) 0,consider the
bad function in the exercises for the chapter on the construction of measures
which was based on the Cantor set. In the case where D+ f (x) 0 for all but
countably many x, by replacing f (x) with fe(x) f (x) + x, consider the
situation where D+ fe(x) > 0 for all but
( countably )many x. If in this situation,
f (c) > f (d) for some c < d, and y fe(d) , fe(c) ,let
e e
{ }
z sup x [c, d] : fe(x) > y0 .

Show that fe(z) = y0 and D+ fe(z) 0. Conclude that if fe fails to be in-


creasing, then D+ fe(z) 0 for uncountably many points, z. Now draw a
conclusion about f .
11. Let f : [a, b] R be increasing. Show
Npq

z[ }| {]

m D+ f (x) > q > p > D+ f (x) = 0 (10.10.21)
268 LEBESGUE MEASURABLE SETS

and conclude that aside from a set of measure zero, D+ f (x) = D+ f (x).
Similar reasoning will show D f (x) = D f (x) a.e. and D+ f (x) = D f (x)
a.e. and so o some set of measure zero, we have

D f (x) = D f (x) = D+ f (x) = D+ f (x)

which implies the derivative exists and equals this common value. Hint: To
show 10.10.21, let U be an open set containing Npq such that m (Npq ) + >
m (U ). For each x Npq there exist y > x arbitrarily close to x such that

f (y) f (x) < p (y x) .

Thus the set of such intervals, {[x, y]} which are contained in U constitutes a
Vitali cover of Npq . Let {[xi , yi ]} be disjoint and

m (Npq \ i [xi , yi ]) = 0.

Now let V i (xi , yi ). Then also we have



=V
z }| {
m Npq \ i (xi , yi ) = 0.

and so m (Npq V ) = m (Npq ). For each x Npq V , there exist y > x


arbitrarily close to x such that

f (y) f (x) > q (y x) .

Thus the set of such intervals, {[x , y ]} which are contained in V is a Vitali
cover of Npq V . Let {[xi , yi ]} be disjoint and

m (Npq V \ i [xi , yi ]) = 0.

Then verify the following:



f (yi ) f (xi ) > q (yi xi ) qm (Npq V ) = qm (Npq )
i i

pm (Npq ) > p (m (U ) ) p (yi xi ) p
i

(f (yi ) f (xi )) p f (yi ) f (xi ) p
i i

and therefore, (q p) m (Npq ) p. Since > 0 is arbitrary, this proves that


there is a right derivative a.e. A similar argument does the other cases.
12. Suppose f is a function in L1 (R) and f is dierentiable. Does it follow
that f L1 (R)? Hint: What if is C 1 and vanishes inside (0, 1) (Give an
example.) and f (x) = (2p (x p)) for x (p, p + 1) , f (x) = 0 if x < 0?
10.10. EXERCISES 269

13. Why is it that if f L1 (R) , then there exists g C 1 (R) which vanishes o
some nite interval? Consider g Cc (R) which is close to f in L1 (R) and
x+h
then consider gh (x) 2h
1
xh
g (t) dt.

14. Prove Lemma 10.4.1 which says a C 1 function maps a set of measure zero to
a set of measure zero using Theorem 10.7.3.
r
15. For this problem dene a f (t) dt limr a f (t) dt. Note this coincides
with the Lebesgue integral when f L1 (a, ). Show
sin(u)
(a) 0 u du = 2

(b) limr sin(ru)
u du= 0 whenever > 0.

(c) If f L (R), then limr R sin (ru) f (u) du = 0.
1


Hint: For the rst two, use u1 = 0 eut dt and apply Fubinis theorem to
R
0
sin u R eut dtdu. For the last part, rst establish it for f a C 1 function
which vanishes o a nite interval and then use the density of this set in
L1 (R) to obtain the result. This is called the Riemann Lebesgue lemma.

16. Suppose that g L1 (R) and that at some x > 0, g is locally Holder
continuous from the right and from the left. This means

lim g (x + r) g (x+)
r0+

exists,
lim g (x r) g (x)
r0+

exists and there exist constants K, > 0 and r (0, 1] such that for |x y| <
,
r
|g (x+) g (y)| < K |x y|
for y > x and
r
|g (x) g (y)| < K |x y|
for y < x. Show that under these conditions,
( )
2 sin (ur) g (x u) + g (x + u)
lim du
r 0 u 2

g (x+) + g (x)
= .
2

17. Let f L1 (R) . Then the Fourier transform of f is given by



1
F f (t) = f (s) eist ds.
2 R
270 LEBESGUE MEASURABLE SETS

Let g L1 (R) and suppose g is locally Holder continuous from the right and
from the left at x. Show that then
R
1 g (x+) + g (x)
lim eixt eity g (y) dydt = .
R 2 R 2
This is very interesting. Hint: Show the left side of the above equation
reduces to ( )
2 sin (ur) g (x u) + g (x + u)
du
0 u 2
and then use Problem 16 to obtain the result.
18. A measurable function g dened on (0, ) has exponential growth if
|g (t)| Cet for some . For Re (s) > , dene the Laplace Transform by

Lg (s) esu g (u) du.
0

Assume that g has exponential growth as above and is Holder continuous from
the right and from the left at t. Pick > . Show that
R
1 g (t+) + g (t)
lim et eiyt Lg ( + iy) dy = .
R 2 R 2
This formula is sometimes written in the form
+i
1
est Lg (s) ds
2i i
and is called the complex inversion integral for Laplace transforms. It can be
used to nd inverse Laplace transforms. Hint:
R
1
et eiyt Lg ( + iy) dy =
2 R
R
1
et eiyt e(+iy)u g (u) dudy.
2 R 0
Now use Fubinis theorem and do the integral from R to R to get this equal
to
et u sin (R (t u))
e g (u) du
tu
where g is the zero extension of g o [0, ). Then this equals

et (tu) sin (Ru)
e g (t u) du
u
which equals

2et g (t u) e(tu) + g (t + u) e(t+u) sin (Ru)
du
0 2 u
and then apply the result of Problem 16.
10.10. EXERCISES 271

19. Let K be a nonempty closed and convex subset of Rp . Recall K is convex


means that if x, y K, then for all t [0, 1] , tx + (1 t) y K. Show that
if x Rp there exists a unique z K such that
|x z| = min {|x y| : y K} .
This z will be denoted as P x. Hint: First note you do not know K is compact.
Establish the parallelogram identity if you have not already done so,
2 2 2 2
|u v| + |u + v| = 2 |u| + 2 |v| .
Then let {zk } be a minimizing sequence,
2
lim |zk x| = inf {|x y| : y K} .
k

Now using convexity, explain why


2 2 2
zk zm 2
+ x zk + zm = 2 x zk + 2 x zm
2 2 2 2

and then use this to argue {zk } is a Cauchy sequence. Then if zi works for
i = 1, 2, consider (z1 + z2 ) /2 to get a contradiction.
20. In Problem 19 show that P x satises the following variational inequality.
(xP x) (yP x) 0
for all y K. Then show that |P x1 P x2 | |x1 x2 |. Hint: For the rst
2
part note that if y K, the function t |x (P x + t (yP x))| achieves its
minimum on [0, 1] at t = 0. For the second part,
(x1 P x1 ) (P x2 P x1 ) 0, (x2 P x2 ) (P x1 P x2 ) 0.
Explain why
(x2 P x2 (x1 P x1 )) (P x2 P x1 ) 0
and then use a some manipulations and the Cauchy Schwarz inequality to get
the desired inequality.
21. Establish the Brouwer xed point theorem for any convex compact set in Rp .
Hint: If K is a compact and convex set, let R be large enough that the closed
ball, D (0, R) K. Let P be the projection onto K as in Problem 20 above.
If f is a continuous map from K to K, consider f P . You want to show f has
a xed point in K.
22. In the situation of the implicit function theorem, suppose f (x0 , y0 ) = 0 and
assume f is C 1 . Show that for (x, y) B (x0 , ) B (y0 , r) where , r are
small enough, the mapping
1
x Ty (x) xD1 f (x0 , y0 ) f (x, y)
is continuous and maps B (x0 , ) to B (x0 , /2) B (x0 , ). Apply the Brouwer
xed point theorem to obtain a shorter proof of the implicit function theorem.
272 LEBESGUE MEASURABLE SETS

23. Here is a really interesting little theorem which depends on the Brouwer xed
point theorem. It plays a prominent role in the treatment of the change of
variables formula in Rudins book, [40] and is useful in other contexts as well.
The idea is that if a continuous function mapping a ball in Rk to Rk doesnt
move any point very much, then the image of the ball must contain a slightly
smaller ball.
Lemma: Let B = B (0, r), a ball in Rk and let F : B Rk be continuous
and suppose for some < 1,
|F (v) v| < r (10.10.22)
for all v B. Then
F (B) B (0, r (1 )) .
Hint: Suppose a B (0, r (1 )) \ F (B) so it didnt work. First explain
why a = F (v) for all v B. Now letting G :B B, be dened by G (v)
r(aF(v))
|aF(v)| ,it follows G is continuous. Then by the Brouwer xed point theorem,
G (v) = v for some v B. Explain why |v| = r. Then take the inner product
with v and explain the following steps.
2 r
(G (v) , v) = |v| = r2 = (a F (v) , v)
|a F (v)|
r
= (a v + v F (v) , v)
|a F (v)|
r
= [(a v, v) + (v F (v) , v)]
|a F (v)|
r [ ]
2
= (a, v) |v| + (v F (v) , v)
|a F (v)|
r [ 2 ]
r (1 ) r2 +r2 = 0.
|a F (v)|
24. Using Problem 23 establish the following interesting result. Suppose f : U
Rp is dierentiable. Let
S = {x U : det Df (x) = 0}.
Show f (U \ S) is an open set.
25. Let K be a closed, bounded and convex set in Rp and let f : K Rp be
continuous and let y Rp . Show using the Brouwer xed point theorem
there exists a point x K such that P (y f (x) + x) = x. Next show that
(y f (x) , z x) 0 for all z K. The existence of this x is known as
Browders lemma and it has great signicance in the study of certain types of
nolinear operators. Now suppose f : Rp Rp is continuous and satises
(f (x) , x)
lim = .
|x| |x|
Show using Browders lemma that f is onto.
Approximation Theorems

11.1 Bernstein Polynomials


In this chapter, is the Stone Weierstrass approximation theorem. To begin with,
a very special case will be presented which has to do with functions dened on
intervals. First here is a little lemma.

Lemma 11.1.1 Let x [0, 1] . Then


m (
)
m mk 2 1
xk (1 x) (k mx) m.
k 2
k=0

Proof: Let
( )
( )m
m
m mk
(t) 1 x + et x = ekt xk (1 x)
k
k=0

Then
( )m1
(t) = mxet xet x + 1
( )m2 ( )
(t) = mxet xet x + 1 mxet x + 1

Then
m ( )
m mk
(0) = mx = kxk (1 x) ,
k
k=0
m (
)
m
(0) =
mk
mx (mx x + 1) = k 2 xk (1 x)
k
k=0

Therefore,
m (
) m (
)
m mk 2 m mk 2
x (1 x)
k
(k mx) xk (1 x) k
k k
k=0 k=0

273
274 APPROXIMATION THEOREMS

m (
)
m mk
2mx kxk (1 x) + m2 x2
k
k=0

= mx (mx x + 1) 2mx (mx) + m2 x2 = mx (1 x)


and this has a maximum when x = 1/2 which yields (1/4) m. 
With this preparation, here is the rst version of the Weierstrass approximation
theorem.

Theorem 11.1.2 Let f C ([0, 1]) and let


m (
) ( )
m mk k
pm (x) x (1 x)
k
f .
k m
k=0

Then these polynomials converge uniformly to f on [0, 1].

Proof: Let ||f || denote the largest value of |f |. By uniform continuity of f ,


there exists a > 0 such that if |x x | < , then |f (x) f (x )| < /2. By the
binomial theorem,
m (
) ( )
m mk k
|pm (x) f (x)| x (1 x)
k
f f (x)
k m
k=0

( ) ( )


m
xk (1 x)
mk f k f (x) +
k m
|m
k
x|<

( )
m mk
2 ||f || xk (1 x)
k
|m
k
x|

Therefore,
m (
) ( )
m mk m mk
xk (1 x) + 2 ||f || xk (1 x)
k 2 k
k=0 (kmx)2 m2 2

m ( )
1 m 2 mk
+ 2 ||f || 2 2 (k mx) xk (1 x)
2 m k=0 k
1 1
+ 2 ||f || m 2 2 <
2 4 m
provided m is large enough. 

Denition 11.1.3 For a continuous function f dened on some set S, let

||f || sup {|f (x)| : x S} .


11.2. STONES GENERALIZATION 275

Thus the above theorem can be stated as follows. There exists a sequence of polyno-
mials {pm } such that
lim ||pm f || = 0.
m

Thus |||| is a norm.

The classical form of the Weierstrass approximation theorem follows.

Corollary 11.1.4 If f C ([a, b]) , then there exists a sequence of polynomials


which converge uniformly to f on [a, b].

Proof: Let l : [0, 1] [a, b] be one to one, linear and onto. Then f l is
continuous on [0, 1] and so if > 0 is given, there exists a polynomial p such that
for all x [0, 1] ,
|p (x) f l (x)| <
Therefore, letting y = l (x) , it follows that for all y [a, b] ,

|p (y) f (y)| < . 

As another corollary, here is the version which will be used in Stones general-
ization.

Corollary 11.1.5 Let f be a continuous function dened on [M, M ] with f (0) =


0. Then there exists a sequence of polynomials {pm } such that pm (0) = 0 and
limm ||pm f || = 0.

Proof: From Corollary 11.1.4 there exists a sequence of polynomials {pc


m } such
that ||pc
m f || 0. Simply consider pm = c
pm c
pm (0). 

11.2 Stones Generalization


Stones generalization of this theorem must be one of the most elegant things to be
found. This is next. First it will be done for a compact subset of Rn and then it
will be done for Rn itself. However, it can be generalized much further to locally
compact Hausdor spaces. However, it is my intent to mainly feature Rn and so a
simplied version which will be sucient for the needs of this book will be presented.

Denition 11.2.1 A is called a real algebra of functions if whenever f, g A, so


is f g and A is also a real vector space. The closure of A, denoted by A consists
of all functions f such that for some sequence {gk } A, limk ||gk f || = 0.
An algebra A of functions dened on K is said to separate the points if for every
x = y, there exists A such that (x) = (y) . It is said to annihilate no point
if for every x K, there exists A such that (x) = 0.

Note that if A is an algebra which annihilates no point and separates the points,
then A is also an algebra which has these same properties.
276 APPROXIMATION THEOREMS

Lemma 11.2.2 Suppose A is an algebra of continuous functions which are dened


on K, a compact subset of some Rn and that the algebra annihilates no point and
separates the points. Then for f A it follows that |f | is in A. If f, g are in A, so
is max (f, g) and min (f, g).

Proof: Say ||f || M. Let pm (t) |t| uniformly on [M, M ] with pm (0) = 0.
Thus pm (f ) A and limm |||f | pm (f )|| = 0. Hence |f | A. Now note that

|g f | + (f + g) (f + g) |g f |
max (f, g) = , min (f, g) =
2 2

Since A is an algebra, these are in A. 


Now here is a fundamental observation. If p, q are two points of K, then there
exists a function of A which will achieve a desired value at these two points.

Lemma 11.2.3 Let A be an algebra of functions dened on K which separates


points and annihilates no point. Then if p, q are two dierent points of K and if
c, d R, then there exists a function pq A such that pq (p) = c, pq (q) = d.

Proof: By assumption, there exist , , A such that (p) = (q) , (p) =


0, (q) = 0. Let

(x) (q) c (x) (p) d


pq (x) (x) + (x)
(p) (q) (p) (q) (p) (q)

This clearly works in the sense that it gives the right values at p, q. Why is it in
A? This is obvious when you expand it and see that you are adding products of
functions in A and scalar multiples of functions in A. 
Now here is the rst step to Stones generalization.

Theorem 11.2.4 Let K be a compact set and let A be a real algebra of continuous
functions dened on K such that A separates the points and annihilates no point.
Then for every f C (K) and > 0, there exists A such that ||f || .

Proof: Let f C (K) . I will show there exists A such that ||f || .
This will prove the desired result thanks to the denition of A.
Pick p K and let pq A be such that pq (p) = f (p) and pq (q) = f (q).
Then for each q K, there exists an open set Uq containing q such that pq (z)+ >
f (z) for all z Uq . Since K is compact, there }mnitely many, Uq1 , , Uqm which
{ are
cover K. Therefore, letting p (x) min pqi i=1 , it follows that p (p) = f (p)
and also p (x) + > f (x) for all x. Now of course p was arbitrary and for each p,
there exists an open set Vp containing p such that f (z) > p (z) for all z Vp .
n
Since K is compact, there exist nitely many of these Vp which cover K, {Vpi }i=1 .
Then it follows that for all x,
{ }n
f (x) > max pi i=1
11.2. STONES GENERALIZATION 277
{ }n
Let = max pi i=1 . Then (x) + > f (x) > (x) and so ||f || < .

The next step is to generalize this to algebras in C0 (Rn ) or more generally
to C0 (X) where X is a locally compact Hausdor space. I will ignore the more
abstract case at this time.

Denition 11.2.5 A function f dened on Rn is said to be in C0 (Rn ) if f is


continuous and if lim|x| f (x) = 0. The same denitions involving an algebra of
functions still hold. The only dierence is that here the functions are not dened on
a compact set. Because of the limit condition, ||f || sup {|f (x)| : x Rn } < .

The generalization to this case depends on the ideas illustrated in the following
picture.
xn+1
P

K y

sphere : x21 + + x2n + (xn+1 1)2 = 1

Rn
x = (y)
The sphere is labeled K and there is a mapping, denoted by which takes the
surface of K \ {P } , to Rn as implied by the picture. Obviously this mapping and
its inverse is continuous. Now make the following denition for f C0 (Rn ) .
{
f (y) if y =P
f (y)
0 if y = P

Then f is a continuous function dened on K. This is because if yk P, this is


the same thing as |yk | and so, by assumption f (yk ) 0 = lim|x| f (x)
f (P ). Now suppose A is an algebra of functions contained in C0 (Rn ) which sepa-
rates the points of Rn and annihilates no point of Rn .
Let Ab denote the algebra of functions consisting of all functions of the form
g + c where c is a constant and g A. Then Ab is an algebra of functions on
C (K) which annihilates no point of K and separates the points of K. Since K is
a compact subset of Rn+1 , the conclusion of Theorem 11.2.4 applies. Therefore, if
f C0 (Rn ) , there exists gk Ab such that


f gk 0

Since f (P ) = 0, it can be assumed that gk is of the form h


k for hk A, the original
algebra of functions in R . Therefore, simply ignoring P,
n

lim ||f hk || = 0.
k
278 APPROXIMATION THEOREMS

This proves the following theorem which is the real form of the Stone Weierstrass
approximation theorem.

Theorem 11.2.6 Let A be a real algebra of functions in C0 (Rn ) which separates


points and annihilates no point of Rn . Then if f is any function of C0 (Rn ) , there

exists a sequence of functions of A {hk }k=1 such that

lim ||hk f || = 0.
k

11.3 The Case Of Complex Valued Functions


What about the general case where C0 (Rn ) consists of complex valued functions
and the eld of scalars is C rather than R? The following is the version of the Stone
Weierstrass theorem which applies to this case. You have to assume that for f A
it follows f A. Such an algebra is called self adjoint.

Theorem 11.3.1 Suppose A is an algebra of functions in C0 (Rn ) which separates


the points, annihilates no point, and has the property that if f A, then f A.
Then A is dense in C0 (X).

Proof: Let Re A {Re f : f A}, Im A {Im f : f A}. First I will show


that A = Re A + i Im A = Im A + i Re A. Let f A. Then
1( ) 1( )
f= f +f + f f = Re f + i Im f Re A + i Im A
2 2
and so A Re A + i Im A. Also
1 ( ) i( )
f= if + if if + (if ) = Im (if ) + i Re (if ) Im A + i Re A
2i 2
This proves one half of the desired equality. Now suppose h Re A + i Im A. Then
h = Re g1 + i Im g2 where gi A. Then since Re g1 = 12 (g1 + g1 ) , it follows Re g1
A. Similarly Im g2 A. Therefore, h A. The case where h Im A + i Re A is
similar. This establishes the desired equality.
Now Re A and Im A are both real algebras. I will show this now. First consider
Im A. It is obvious this is a real vector space. It only remains to verify that
the product of two functions in Im A is in Im A. Note that from the rst part,
Re A, Im A are both subsets of A because, for example, if u Im A then u + 0
Im A + i Re A = A. Therefore, if v, w Im A, both iv and w are in A and so
Im (ivw) = vw and ivw A. Similarly, Re A is an algebra.
Both Re A and Im A must separate the points. Here is why: If x1 = x2 , then
there exists f A such that f (x1 ) = f (x2 ) . If Im f (x1 ) = Im f (x2 ) , this shows
there is a function in Im A, Im f which separates these two points. If Im f fails
to separate the two points, then Re f must separate the points and so you could
consider Im (if ) to get a function in Im A which separates these points. This shows
Im A separates the points. Similarly Re A separates the points.
11.4. EXERCISES 279

Neither Re A nor Im A annihilate any point. This is easy to see because if x


is a point there exists f A such that f (x) = 0. Thus either Re f (x) = 0 or
Im f (x) = 0. If Im f (x) = 0, this shows this point is not annihilated by Im A.
If Im f (x) = 0, consider Im (if ) (x) = Re f (x) = 0. Similarly, Re A does not
annihilate any point.
It follows from Theorem 11.2.6 that Re A and Im A are dense in the real valued
functions of C0 (X). Let f C0 (X) . Then there exists {hn } Re A and {gn }
Im A such that hn Re f uniformly and gn Im f uniformly. Therefore, hn +
ign A and it converges to f uniformly. 
A repeat of the above argument using Theorem 11.2.4 directly leads to the
following corollary.

Corollary 11.3.2 Suppose A is an algebra of functions in C (K) where K is a


compact set, which separates the points of K, annihilates no point, and has the
property that if f A, then f A. Then A is dense in C (K), the complex valued
continuous functions dened on K.

11.4 Exercises
1. Consider polynomials in x3 on [0, b] . Such a polynomial is of the form p (x) =
a0 + a1 x3 + a2 x6 + + an x3n . Show these polynomials are dense in the space
of continuous functions dened on [0, b].

2. Consider polynomials in ln (1 + x) on [0, b] . Such a polynomial is of the form


1 n
p (x) = a0 + a1 (ln (1 + x)) + + an (ln (1 + x)) . Show these functions are
dense in the space of continuous functions dened on [0, b].

3. Recall the formula for the Bernstein polynomials.


n ( ) ( )
n k nk
pn (x) = f xk (1 x)
k n
k=0

These converged to the continuous function on [0, 1]. Now what if f is


C 1 ([0, 1])? Is it true that pn (x) yields a sequence of polynomials which con-
verges uniformly to f (x) on [0, 1]? The answer is yes. Prove it. What
happens in general when f is C n ([0, 1])? This sort of { thing }
is in general false.


Consider the sequence of functions {fn (x)}n=1 = x
1+nx2 and show it
n=1
converges uniformly to 0 but fn
(0) converges to 1. Thus the Bernstein poly-
nomials are really remarkable.

4. Let S 1 denote the unit circle centered at (0, 0). For convenience, consider
S 1 to be in the complex plane. Thus the points of S 1 are of the form eix for
x R. Show that linear combinations
( 1) of the functions einx for n Z is an
algebra and is dense in C S . That is, show that if f is continuous on S 1
280 APPROXIMATION THEOREMS

and complex valued, there exist complex scalars ck such that


{ }
( )
n
ikx
sup f e ix
ck e : x R < .

k=n

5. Suppose g is a continuous function dened on R which is( 2) periodic,


g (x + 2) = g (x) for all x. Then dene f on S 1 as( follows.
) f eix g (x) .
Explain why this is well dened and yields f C S . Hint: If eix = eiy
1

then x y is a multiple of 2. If eixn eix , you might argue that each xn


and x can be considered in [0, 2) and that the xn can be chosen such that
xn x. Now prove the following theorem. If g is continuous on R and 2
periodic, then for every > 0 there exist complex numbers ck such that
{ }
n

sup g (x) ck eikx : x R < .

k=n

6. This problem gives an outline of the way Weierstrass originally


1 ( proved
) the
2 n
Weierstrass approximation theorem. Choose an such that 1 1 x an dx =
1. Show an < n+12 or something like this. Now show that for (0, 1) ,
( )
1( )n ( )n
lim 1 x2 an + an 1 x2 dx = 0.
n 1

Next for f a continuous function dened on R, dene the polynomial, pn (x)


by
x+1 ( )n 1
2 ( )n
pn (x) an 1 (x t) f (t) dt = an f (x t) 1 t2 dt.
x1 1

Then show limn ||pn f || = 0. where ||f || = max {|f (x)| : x [1, 1]}.
7. Suppose f C0 ([0, )) and also |f (t)| Cert . Let A denote the algebra
of linear combinations of functions of the form est for s suciently large.
Thus A is dense in C0 ([0, )) . Show that if

est f (t) dt = 0
0

for each s suciently large, then f (t) = 0. Next consider only |f (t)| Cert
for some r. That is f has exponential growth. Show the same conclusion holds
for f if
est f (t) dt = 0
0
for all s suciently large. This justies the Laplace transform procedure of
dierential equations where if the Laplace transforms of two functions are
equal, then the two functions are considered to be equal. More can be said
about this. Hint: For the last part, consider g (t) e2rt f (t) and apply the
rst part to g. If g (t) = 0 then so is f (t).
11.4. EXERCISES 281

n
8. Consider linear combinations of functions of the form i=1 i (xi ) where
i Cc (R) . Show that this is an algebra A of functions of Cc (Rn ) . Explain
why A is dense in C0 (Rn ).
9. In the context of the above problem, if f Cc (Rn ) , dene

f dx lim pk dx1 dxn
k

where ||pk f || 0, pk a function of the sort described in Problem 8. You


must show that the limit in the above exists and that this limit is independent
of the choice of approximating sequence from A. Now give an unbelievably
short
and easy proof that the order of iterated integrals can be interchanged
for f dx where f Cc (Rn ).

10. Suppose is a nite outer regular Borel measure dened on a compact


Hausdor space . Show that there exists K a compact subset of such
that (K) = () but if H is any compact subset of K, then (H) < (K).
Hint: This uses outer regularity and the nite intersection property.
282 APPROXIMATION THEOREMS
The Lp Spaces

12.1 Basic Inequalities And Properties


One of the main applications of the Lebesgue integral is to the study of various
sorts of functions space. These are vector spaces whose elements are functions of
various types. One of the most important examples of a function space is the space
of measurable functions whose absolute values are pth power integrable where p 1.
These spaces, referred to as Lp spaces, are very useful in applications. In the chapter
(, S, ) will be a measure space.

Denition 12.1.1 Let 1 p < . Dene



Lp () {f : f is measurable and |f ()|p d < }

For each p > 1 dene q by


1 1
+ = 1.
p q
Often one uses p instead of q in this context.
Lp () is a vector space and has a norm. This is similar to the situation for Rn
but the proof requires the following fundamental inequality. .

Theorem 12.1.2 (Holders inequality) If f and g are measurable functions, then


if p > 1,
( ) p1 ( ) q1
|f | |g| d |f | d
p
|g| d .
q
(12.1.1)

Proof: First here is a proof of Youngs inequality .

ap bq
Lemma 12.1.3 If p > 1, and 0 a, b then ab p + q .

Proof: Consider the following picture:

283
284 THE LP SPACES

x
b

x = tp1
t = xq1

t
a
From this picture, the sum of the area between the x axis and the curve added to
the area between the t axis and the curve is at least as large as ab. Using beginning
calculus, this is equivalent to the following inequality.
a b
ap bq
ab t p1
dt + xq1 dx = + .
0 0 p q

The above picture represents the situation which occurs when p > 2 because the
graph of the function is concave up. If 2 p > 1 the graph would be concave down
or a straight line. You should verify that the same argument holds in these cases
just as well. In fact, the only thing which matters in the above inequality is that
the function x = tp1 be strictly increasing.
Note equality occurs when ap = bq .
Here is an alternate proof.

Lemma 12.1.4 For a, b 0,


ap bq
ab +
p q
and equality occurs when if and only if ap = bq .

Proof: If b = 0, the inequality is obvious. Fix b > 0 and consider f (a)


ap q

p + bq ab. Then f (a) = ap1 b. This is negative when a < b1/(p1) and is
positive when a > b1/(p1) . Therefore, f has a minimum when a = b1/(p1) . In other
words, when ap = bp/(p1) = bq since 1/p + 1/q = 1. Thus the minimum value of f
is
bq bq
+ b1/(p1) b = bq bq = 0.
p q
It follows f 0 and this yields the desired inequality.

Proof of Holders inequality: If either |f |p d or |g|p d equals , the
inequality
p 12.1.1 is obviously valid because anything. If either |f |p
d or
|g| d equals 0, then f = 0 a.e. or that g = 0 a.e. and so in this case the left side of
the inequality equals 0 and so the inequality is therefore true. Therefore assume both
( )1/p
|f |p d and |g|p d are less than and not equal to 0. Let |f |p d = I (f )
( p )1/q
and let |g| d = I (g). Then using the lemma,

|f | |g| 1 |f |p 1 |g|q
d p d + q d = 1.
I (f ) I (g) p I (f ) q I (g)
12.1. BASIC INEQUALITIES AND PROPERTIES 285

Hence,
( )1/p ( )1/q
|f | |g| d I (f ) I (g) = |f | d
p
|g| d
q
.

This proves Holders inequality.


The following lemma will be needed.
Lemma 12.1.5 Suppose x, y C. Then
p p p
|x + y| 2p1 (|x| + |y| ) .
Proof: The function f (t) = tp is concave up for t 0 because p > 1. Therefore,
the secant line joining two points on the graph of this function must lie above the
graph of the function. This is illustrated in the following picture.

(|x| + |y|)/2 = m

|x| m |y|
Now as shown above,
( )p p p
|x| + |y| |x| + |y|

2 2
which implies
p p p p
|x + y| (|x| + |y|) 2p1 (|x| + |y| )

Note that if y = (x) is any function for which the graph of is concave up,
you could get a similar inequality by the same argument.
Corollary 12.1.6 (Minkowski inequality) Let 1 p < . Then
( )1/p ( )1/p ( )1/p
p p p
|f + g| d |f | d + |g| d . (12.1.2)

Proof: If p = 1, this is obvious because it is just the triangle inequality. Let


p > 1. Without loss of generality, assume
( )1/p ( )1/p
p p
|f | d + |g| d <

( p )1/p
and |f + g| d = 0 or there is nothing to prove. Therefore, using the above
lemma, ( )
|f + g| d 2
p p1
|f | + |g| d < .
p p
286 THE LP SPACES

p p1
Now |f () + g ()| |f () + g ()| (|f ()| + |g ()|). Also, it follows from the
denition of p and q that p 1 = pq . Therefore, using this and Holders inequality,

|f + g|p d


|f + g|p1 |f |d + |f + g|p1 |g|d

p p
= |f + g| q |f |d + |f + g| q |g|d

1 1 1 1
( |f + g|p d) q ( |f |p d) p + ( |f + g|p d) q ( |g|p d) p.

1
Dividing both sides by ( |f + g|p d) q yields 12.1.2. 
The following follows immediately from the above.

Corollary 12.1.7 Let fi Lp () for i = 1, 2, , n. Then


( n p )1/p n ( )1/p

p
fi d |fi | .

i=1 i=1

This shows that if f, g L , then f + g Lp . Also, it is clear that if a is a


p

constant and f Lp , then af Lp because



p p p
|af | d = |a| |f | d < .

Thus Lp is a vector space and


( p )1/p ( p )1/p
a.) |f | d 0, |f | d = 0 if and only if f = 0 a.e.
( p )1/p ( p )1/p
b.) |af | d = |a| |f | d if a is a scalar.
( p )1/p ( p )1/p ( p )1/p
c.) |f + g| d |f | d + |g| d .
( p )1/p ( p )1/p
f |f | d would dene a norm if |f | d = 0 implied f = 0.
Unfortunately, this is not so because if f = 0 a.e. but is nonzero on a set of
( p )1/p
measure zero, |f | d = 0 and this is not allowed. However, all the other
properties of a norm are available and so a little thing like a set of measure zero
will not prevent the consideration of Lp as a normed vector space if two functions
in Lp which dier only on a set of measure zero are considered the same. That is,
an element of Lp is really an equivalence class of functions where two functions are
equivalent if they are equal a.e. With this convention, here is a denition.

Denition 12.1.8 Let f Lp (). Dene


( )1/p
p
||f ||p ||f ||Lp |f | d .
12.2. COMPLETENESS 287

Then with this denition and using the convention that elements in Lp are
considered to be the same if they dier only on a set of measure zero, || ||p is a
norm on Lp () because if ||f ||p = 0 then f = 0 a.e. and so f is considered to be
the zero function because it diers from 0 only on a set of measure zero.

12.2 Completeness
The following is an important denition.
Denition 12.2.1 A complete normed linear space is called a Banach1 space.
Lp is a Banach space. This is the next big theorem.
Theorem 12.2.2 The following hold for Lp ()
a.) Lp () is complete.
b.) If {fn } is a Cauchy sequence in Lp (), then there exists f Lp () and a
subsequence which converges a.e. to f Lp (), and ||fn f ||p 0.
Proof: Let $\{f_n\}$ be a Cauchy sequence in $L^{p}(\Omega)$. This means that for every $\varepsilon>0$ there exists $N$ such that if $n,m\ge N$, then $||f_n-f_m||_{p}<\varepsilon$. Now select a subsequence as follows. Let $n_1$ be such that $||f_n-f_m||_{p}<2^{-1}$ whenever $n,m\ge n_1$. Let $n_2$ be such that $n_2>n_1$ and $||f_n-f_m||_{p}<2^{-2}$ whenever $n,m\ge n_2$. If $n_1,\cdots,n_k$ have been chosen, let $n_{k+1}>n_k$ and whenever $n,m\ge n_{k+1}$, $||f_n-f_m||_{p}<2^{-(k+1)}$. The subsequence just mentioned is $\{f_{n_k}\}$. Thus, $||f_{n_k}-f_{n_{k+1}}||_{p}<2^{-k}$. Let
$$g_{k+1}=f_{n_{k+1}}-f_{n_k}.$$
Then by the corollary to Minkowski's inequality,
$$\infty>\sum_{k=1}^{\infty}||g_{k+1}||_{p}\ge \sum_{k=1}^{m}||g_{k+1}||_{p}\ge \Big|\Big|\sum_{k=1}^{m}|g_{k+1}|\Big|\Big|_{p}$$
for all $m$. It follows that
$$\int\Big(\sum_{k=1}^{m}|g_{k+1}|\Big)^{p}d\mu\le \Big(\sum_{k=1}^{\infty}||g_{k+1}||_{p}\Big)^{p}<\infty \quad (12.2.3)$$
¹These spaces are named after Stefan Banach, 1892-1945. Banach spaces are the basic item of study in the subject of functional analysis and will be considered later in this book.
There is a recent biography of Banach, R. Kaluza, The Life of Stefan Banach, (A. Kostant and W. Woyczynski, translators and editors) Birkhauser, Boston (1996). More information on Banach can also be found in a recent short article written by Douglas Henderson who is in the department of chemistry and biochemistry at BYU.
Banach was born in Austria, worked in Poland, and died in the Ukraine but never moved. This is because borders kept changing. There is a rumor that he died in a German concentration camp which is apparently not true. It seems he died after the war of lung cancer.
He was an interesting character. He hated taking examinations so much that he did not receive his undergraduate university degree. Nevertheless, he did become a professor of mathematics due to his important research. He and some friends would meet in a cafe called the Scottish cafe where they wrote on the marble table tops until Banach's wife supplied them with a notebook which became the Scottish notebook and was eventually published.

for all $m$ and so the monotone convergence theorem implies that the sum up to $m$ in 12.2.3 can be replaced by a sum up to $\infty$. Thus,
$$\int\Big(\sum_{k=1}^{\infty}|g_{k+1}|\Big)^{p}d\mu<\infty$$
which requires
$$\sum_{k=1}^{\infty}|g_{k+1}(x)|<\infty \text{ a.e. } x.$$
Therefore, $\sum_{k=1}^{\infty}g_{k+1}(x)$ converges for a.e. $x$ because the functions have values in a complete space, $\mathbb{C}$, and this shows the partial sums form a Cauchy sequence. Now let $x$ be such that this sum is finite. Then define
$$f(x)\equiv f_{n_1}(x)+\sum_{k=1}^{\infty}g_{k+1}(x)=\lim_{m\to\infty}f_{n_m}(x)$$
since $\sum_{k=1}^{m}g_{k+1}(x)=f_{n_{m+1}}(x)-f_{n_1}(x)$. Therefore there exists a set $E$ having measure zero such that
$$\lim_{k\to\infty}f_{n_k}(x)=f(x)$$
for all $x\notin E$. Redefine $f_{n_k}$ to equal $0$ on $E$ and let $f(x)=0$ for $x\in E$. It then follows that $\lim_{k\to\infty}f_{n_k}(x)=f(x)$ for all $x$. By Fatou's lemma, and the Minkowski inequality,
$$||f-f_{n_k}||_{p}=\left(\int |f-f_{n_k}|^{p}\,d\mu\right)^{1/p}$$
$$\le \liminf_{m\to\infty}\left(\int |f_{n_m}-f_{n_k}|^{p}\,d\mu\right)^{1/p}=\liminf_{m\to\infty}||f_{n_m}-f_{n_k}||_{p}$$
$$\le \liminf_{m\to\infty}\sum_{j=k}^{m-1}\big|\big|f_{n_{j+1}}-f_{n_j}\big|\big|_{p}\le \sum_{i=k}^{\infty}\big|\big|f_{n_{i+1}}-f_{n_i}\big|\big|_{p}\le 2^{-(k-1)}. \quad (12.2.4)$$

Therefore, $f\in L^{p}(\Omega)$ because
$$||f||_{p}\le ||f-f_{n_k}||_{p}+||f_{n_k}||_{p}<\infty,$$
and $\lim_{k\to\infty}||f_{n_k}-f||_{p}=0$. This proves b.).
This has shown $f_{n_k}$ converges to $f$ in $L^{p}(\Omega)$. It follows the original Cauchy sequence also converges to $f$ in $L^{p}(\Omega)$. This is a general fact that if a subsequence of a Cauchy sequence converges, then so does the original Cauchy sequence. You should give a proof of this or you could see Theorem 3.5.4. $\blacksquare$

12.3 Minkowski's Inequality

In working with the $L^{p}$ spaces, the following inequality, also known as Minkowski's inequality, is very useful. It is similar to the Minkowski inequality for sums. To see this, replace the integral $\int_X$ with a finite summation sign and you will see the usual Minkowski inequality or rather the version of it given in Corollary 12.1.7.
To prove this theorem first consider a special case of it in which technical considerations which shed no light on the proof are excluded.

Lemma 12.3.1 Let $(X,\mathcal{S},\mu)$ and $(Y,\mathcal{F},\lambda)$ be finite complete measure spaces and let $f$ be product measurable and uniformly bounded. Then the following inequality is valid for $p\ge 1$.
$$\int_X\left(\int_Y |f(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu\ge \left(\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda\right)^{\frac{1}{p}}. \quad (12.3.5)$$

Proof: Since $f$ is bounded and $\mu(X),\lambda(Y)<\infty$,
$$\left(\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda\right)^{\frac{1}{p}}<\infty.$$
Let
$$J(y)=\int_X |f(x,y)|\,d\mu.$$
Note there is no problem in writing this for a.e. $y$ because $f$ is measurable and Lemma 9.4.8 on Page 232. Then by Fubini's theorem,
$$\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda=\int_Y J(y)^{p-1}\int_X |f(x,y)|\,d\mu\,d\lambda$$
$$=\int_X\int_Y J(y)^{p-1}|f(x,y)|\,d\lambda\,d\mu$$
Now apply Holder's inequality in the last integral above and recall $p-1=\frac{p}{q}$. This yields
$$\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda$$
$$\le \int_X\left(\int_Y J(y)^{p}\,d\lambda\right)^{\frac{1}{q}}\left(\int_Y |f(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu$$
$$=\left(\int_Y J(y)^{p}\,d\lambda\right)^{\frac{1}{q}}\int_X\left(\int_Y |f(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu$$
$$=\left(\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda\right)^{\frac{1}{q}}\int_X\left(\int_Y |f(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu. \quad (12.3.6)$$

Therefore, dividing both sides by the first factor in the above expression,
$$\left(\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda\right)^{\frac{1}{p}}\le \int_X\left(\int_Y |f(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu. \quad (12.3.7)$$
Note that 12.3.7 holds even if the first factor of 12.3.6 equals zero. $\blacksquare$
Now consider the case where $f$ is not assumed to be bounded and where the measure spaces are $\sigma$ finite.

Theorem 12.3.2 Let $(X,\mathcal{S},\mu)$ and $(Y,\mathcal{F},\lambda)$ be $\sigma$-finite measure spaces and let $f$ be product measurable. Then the following inequality is valid for $p\ge 1$.
$$\int_X\left(\int_Y |f(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu\ge \left(\int_Y\Big(\int_X |f(x,y)|\,d\mu\Big)^{p}d\lambda\right)^{\frac{1}{p}}. \quad (12.3.8)$$

Proof: Since the two measure spaces are $\sigma$-finite, there exist measurable sets, $X_m$ and $Y_k$ such that $X_m\subseteq X_{m+1}$ for all $m$, $Y_k\subseteq Y_{k+1}$ for all $k$, and $\mu(X_m),\lambda(Y_k)<\infty$. Now define
$$f_n(x,y)\equiv\begin{cases} f(x,y) & \text{if } |f(x,y)|\le n\\ n & \text{if } |f(x,y)|>n.\end{cases}$$
Thus $f_n$ is uniformly bounded and product measurable. By the above lemma,
$$\int_{X_m}\left(\int_{Y_k} |f_n(x,y)|^{p}\,d\lambda\right)^{\frac{1}{p}}d\mu\ge \left(\int_{Y_k}\Big(\int_{X_m} |f_n(x,y)|\,d\mu\Big)^{p}d\lambda\right)^{\frac{1}{p}}. \quad (12.3.9)$$
Now observe that $|f_n(x,y)|$ increases in $n$ and the pointwise limit is $|f(x,y)|$. Therefore, using the monotone convergence theorem in 12.3.9 yields the same inequality with $f$ replacing $f_n$. Next let $k\to\infty$ and use the monotone convergence theorem again to replace $Y_k$ with $Y$. Finally let $m\to\infty$ in what is left to obtain 12.3.8. $\blacksquare$
Note that the proof of this theorem depends on two manipulations, the interchange of the order of integration and Holder's inequality. Note that there is nothing to check in the case of double sums. Thus if $a_{ij}\ge 0$, it is always the case that
$$\left(\sum_{j}\Big(\sum_{i}a_{ij}\Big)^{p}\right)^{1/p}\le \sum_{i}\left(\sum_{j}a_{ij}^{p}\right)^{1/p}$$
because the integrals in this case are just sums and $(i,j)\to a_{ij}$ is measurable.
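The discrete case just mentioned is easy to experiment with. Here is a small Python/NumPy sketch (for illustration only; the random matrix and exponents are arbitrary) checking the double-sum inequality.

```python
# Check: for a_ij >= 0,  ( sum_j ( sum_i a_ij )^p )^(1/p)  <=  sum_i ( sum_j a_ij^p )^(1/p).
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((30, 40))                                 # a[i, j] >= 0
for p in [1.0, 1.7, 2.0, 5.0]:
    lhs = (np.sum(np.sum(a, axis=0) ** p)) ** (1 / p)    # inner sum over i, outer over j
    rhs = np.sum((np.sum(a ** p, axis=1)) ** (1 / p))    # inner sum over j, outer over i
    assert lhs <= rhs + 1e-10
    print(p, lhs, rhs)
```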
The Lp spaces have many important properties.

12.4 Density Of Simple Functions

Theorem 12.4.1 Let $p\ge 1$ and let $(\Omega,\mathcal{S},\mu)$ be a measure space. Then the simple functions are dense in $L^{p}(\Omega)$.

Proof: Recall that a function, $f$, having values in $\mathbb{R}$ can be written in the form $f=f^{+}-f^{-}$ where
$$f^{+}=\max(0,f),\ \ f^{-}=\max(0,-f).$$
Therefore, an arbitrary complex valued function, $f$ is of the form
$$f=\operatorname{Re}f^{+}-\operatorname{Re}f^{-}+i\left(\operatorname{Im}f^{+}-\operatorname{Im}f^{-}\right).$$
If each of these nonnegative functions is approximated by a simple function, it follows $f$ is also approximated by a simple function. Therefore, there is no loss of generality in assuming at the outset that $f\ge 0$.
Since $f$ is measurable, Theorem 7.5.6 implies there is an increasing sequence of simple functions, $\{s_n\}$, converging pointwise to $f(x)$. Now
$$|f(x)-s_n(x)|\le |f(x)|.$$
By the Dominated Convergence theorem,
$$0=\lim_{n\to\infty}\int |f(x)-s_n(x)|^{p}\,d\mu.$$
Thus simple functions are dense in $L^{p}$. $\blacksquare$
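To see this approximation in action, the following sketch (illustration only; the sample function, the exponent, and the grid are arbitrary choices) builds the standard increasing simple functions $s_n=\min(\lfloor 2^{n}f\rfloor/2^{n},n)$ on $[0,1]$ and watches the $L^{p}$ error shrink.

```python
# Approximate a nonnegative f in L^p by simple functions; integrals are Riemann sums on [0,1].
import numpy as np

x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]
f = np.sqrt(x) * (1 + np.sin(10 * x) ** 2)    # any nonnegative measurable f works here
p = 2.0
for n in range(1, 8):
    s_n = np.minimum(np.floor((2 ** n) * f) / (2 ** n), n)
    err = (np.sum(np.abs(f - s_n) ** p) * dx) ** (1 / p)
    print(n, err)    # errors decrease toward 0
```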

12.5 Density Of $C_c(\Omega)$

For $\Omega$ a topological space, $C_c(\Omega)$ is the space of continuous functions with compact support in $\Omega$. If you have never heard of a topological space, think $\mathbb{R}^{p}$. Also recall the following definition.

Definition 12.5.1 Let $(\Omega,\mathcal{S},\mu)$ be a measure space and suppose $(\Omega,\tau)$ is also a topological space. Then $(\Omega,\mathcal{S},\mu)$ is called a regular measure space if the $\sigma$ algebra of Borel sets is contained in $\mathcal{S}$, and for all $E\in\mathcal{S}$,
$$\mu(E)=\inf\{\mu(V):V\supseteq E \text{ and } V \text{ open}\}$$
and if $\mu(E)<\infty$,
$$\mu(E)=\sup\{\mu(K):K\subseteq E \text{ and } K \text{ is compact}\}$$
and $\mu(K)<\infty$ for any compact set $K$.

For example $m_p$, Lebesgue measure on $\mathbb{R}^{p}$, is an example of such a measure by Theorem 10.1.2. However, there are many other examples of regular measure spaces. However, if the extra generality has no interest, just let $\Omega=\mathbb{R}^{p}$.

Lemma 12.5.2 Let $\Omega$ be a metric space in which the closed balls are compact and let $K$ be a compact subset of $V$, an open set. Then there exists a continuous function $f:\Omega\to[0,1]$ such that $f(x)=1$ for all $x\in K$ and $\operatorname{spt}(f)$ is a compact subset of $V$. That is, $K\prec f\prec V$.


Proof: First note that $V$ is the increasing union of open sets $\{W_m\}_{m=1}^{\infty}$ having compact closures. Therefore, there exists an $m$ such that $K\subseteq W_m$ since otherwise, you could obtain a nested sequence of nonempty compact sets of the form $K\setminus W_m$ which would have a point in common contrary to the assertion that $K\subseteq V$. Pick such an $m$. Then let
$$f(x)=\frac{\operatorname{dist}\left(x,W_m^{C}\right)}{\operatorname{dist}\left(x,W_m^{C}\right)+\operatorname{dist}(x,K)}$$
This is clearly equal to $1$ on $K$ and equals $0$ off $W_m$ so $\operatorname{spt}(f)$ is compact. It is continuous because the functions are all continuous and the denominator cannot vanish because if $x\in K$, then it is at positive distance from $W_m^{C}$. $\blacksquare$
It is not necessary to be in a metric space to do this. You can accomplish the same thing using Urysohn's lemma but this is a topic for another course.
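Here is a concrete one-dimensional instance of the function constructed in Lemma 12.5.2 (an illustration only; the choices $K=[0,1]$ and $W_m=(-1,2)$ are arbitrary and any open $V\supseteq[-1,2]$ would do).

```python
# f(x) = dist(x, W^C) / ( dist(x, W^C) + dist(x, K) ) with K = [0,1], W = (-1, 2).
import numpy as np

def dist_to_Wc(x):                  # distance to the complement of (-1, 2)
    return np.clip(np.minimum(x + 1.0, 2.0 - x), 0.0, None)

def dist_to_K(x):                   # distance to [0, 1]
    return np.maximum(0.0, np.maximum(-x, x - 1.0))

x = np.linspace(-2.0, 3.0, 1001)
f = dist_to_Wc(x) / (dist_to_Wc(x) + dist_to_K(x))
assert np.all(f[(x >= 0) & (x <= 1)] == 1.0)     # f = 1 on K
assert np.all(f[(x <= -1) | (x >= 2)] == 0.0)    # f = 0 off W, so spt(f) is compact
assert np.all((0.0 <= f) & (f <= 1.0))
```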

Theorem 12.5.3 Let $(\Omega,\mathcal{S},\mu)$ be a regular measure space as in Definition 12.5.1 (for example $m_p$ and $\Omega=\mathbb{R}^{p}$) where the conclusion of Lemma 12.5.2 holds. Then $C_c(\Omega)$ is dense in $L^{p}(\Omega)$.

Proof: First consider a measurable set $E$ where $\mu(E)<\infty$. Let $K\subseteq E\subseteq V$ where $\mu(V\setminus K)<\varepsilon$. Now let $K\prec h\prec V$. Then
$$\int |h-\mathcal{X}_E|^{p}\,d\mu\le \int \mathcal{X}_{V\setminus K}^{p}\,d\mu=\mu(V\setminus K)<\varepsilon.$$
It follows that for each $s$ a simple function in $L^{p}(\Omega)$, there exists $h\in C_c(\Omega)$ such that $||s-h||_{p}<\varepsilon$. This is because if
$$s(x)=\sum_{i=1}^{m}c_i\mathcal{X}_{E_i}(x)$$
is a simple function in $L^{p}$ where the $c_i$ are the distinct nonzero values of $s$, each $\mu(E_i)<\infty$ since otherwise $s\notin L^{p}$ due to the inequality
$$\int |s|^{p}\,d\mu\ge |c_i|^{p}\mu(E_i).$$
By Theorem 12.4.1, simple functions are dense in $L^{p}(\Omega)$. Therefore, the set of functions $C_c(\Omega)$ is also dense in $L^{p}(\Omega)$. $\blacksquare$

12.6 Continuity Of Translation

Definition 12.6.1 Let $f$ be a function defined on $U\subseteq\mathbb{R}^{n}$ and let $w\in\mathbb{R}^{n}$. Then $f_w$ will be the function defined on $w+U$ by
$$f_w(x)=f(x-w).$$

Theorem 12.6.2 (Continuity of translation in $L^{p}$) Let $f\in L^{p}(\mathbb{R}^{n})$ with the measure being Lebesgue measure. Then
$$\lim_{||w||\to 0}||f_w-f||_{p}=0.$$

Proof: Let $\varepsilon>0$ be given and let $g\in C_c(\mathbb{R}^{n})$ with $||g-f||_{p}<\frac{\varepsilon}{3}$. Since Lebesgue measure is translation invariant ($m_n(w+E)=m_n(E)$),
$$||g_w-f_w||_{p}=||g-f||_{p}<\frac{\varepsilon}{3}.$$
You can see this from looking at simple functions and passing to the limit or you could use the change of variables formula to verify it.
Therefore
$$||f-f_w||_{p}\le ||f-g||_{p}+||g-g_w||_{p}+||g_w-f_w||_{p}$$
$$<\frac{2\varepsilon}{3}+||g-g_w||_{p}. \quad (12.6.10)$$
But $\lim_{|w|\to 0}g_w(x)=g(x)$ uniformly in $x$ because $g$ is uniformly continuous. Now let $B$ be a large ball containing $\operatorname{spt}(g)$ and let $\delta_1$ be small enough that $B(x,\delta_1)\subseteq B$ whenever $x\in\operatorname{spt}(g)$. If $\varepsilon>0$ is given, there exists $\delta<\delta_1$ such that if $|w|<\delta$, it follows that $|g(x-w)-g(x)|<\varepsilon/3\left(1+m_n(B)^{1/p}\right)$. Therefore,
$$||g-g_w||_{p}=\left(\int_B |g(x)-g(x-w)|^{p}\,dm_n\right)^{1/p}$$
$$\le \frac{\varepsilon\, m_n(B)^{1/p}}{3\left(1+m_n(B)^{1/p}\right)}<\frac{\varepsilon}{3}.$$
Therefore, whenever $|w|<\delta$, it follows $||g-g_w||_{p}<\frac{\varepsilon}{3}$ and so from 12.6.10, $||f-f_w||_{p}<\varepsilon$. $\blacksquare$
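A numerical illustration of the theorem (not a proof; the sample function, grid, and shift sizes are arbitrary, and the translation is done by shifting grid values):

```python
# ||f_w - f||_p -> 0 as w -> 0, tested for a discontinuous f on a fine grid.
import numpy as np

x = np.linspace(-5.0, 5.0, 200001)
dx = x[1] - x[0]
f = np.where(np.abs(x) < 1.0, 1.0, 0.0) + np.where(x > 0, np.exp(-x), 0.0)
p = 3.0

def translate(w):
    # f_w(x) = f(x - w); edge wrap-around is negligible since f is ~0 near the ends
    k = int(round(w / dx))
    return np.roll(f, k)

for w in [0.5, 0.1, 0.02, 0.004]:
    err = (np.sum(np.abs(translate(w) - f) ** p) * dx) ** (1 / p)
    print(w, err)    # decreases toward 0
```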

12.7 Separability, Some Special Functions

Here the measure space will be $(\mathbb{R}^{n},m_n,\mathcal{F}_n)$, familiar Lebesgue measure.
First recall the following definition of a polynomial.

Definition 12.7.1 $\alpha=(\alpha_1,\cdots,\alpha_n)$ for $\alpha_1,\cdots,\alpha_n$ nonnegative integers is called a multi-index. For $\alpha$ a multi-index, $|\alpha|\equiv \alpha_1+\cdots+\alpha_n$ and if $x\in\mathbb{R}^{n}$,
$$x=(x_1,\cdots,x_n),$$
and $f$ a function, define
$$x^{\alpha}\equiv x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}.$$

A polynomial in $n$ variables of degree $m$ is a function of the form
$$p(x)=\sum_{|\alpha|\le m}a_{\alpha}x^{\alpha}.$$
Here $\alpha$ is a multi-index as just described and $a_{\alpha}\in\mathbb{C}$. Also define for $\alpha=(\alpha_1,\cdots,\alpha_n)$ a multi-index
$$D^{\alpha}f(x)\equiv \frac{\partial^{|\alpha|}f}{\partial x_1^{\alpha_1}\partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}.$$

Definition 12.7.2 Define $\mathcal{G}_1$ to be the functions of the form $p(x)e^{-a|x|^{2}}$ where $a>0$ is rational and $p(x)$ is a polynomial having all rational coefficients, $a_{\alpha}$ being rational if it is of the form $a+ib$ for $a,b\in\mathbb{Q}$. Let $\mathcal{G}$ be all finite sums of functions in $\mathcal{G}_1$. Thus $\mathcal{G}$ is an algebra of functions which has the property that if $f\in\mathcal{G}$ then $\overline{f}\in\mathcal{G}$.

Thus there are countably many functions in $\mathcal{G}_1$. This is because, for each $m$, there are countably many choices for $a_{\alpha}$ for $|\alpha|\le m$ since there are finitely many $\alpha$ for $|\alpha|\le m$ and for each such $\alpha$, there are countably many choices for $a_{\alpha}$ since $\mathbb{Q}+i\mathbb{Q}$ is countable. (Why?) Thus there are countably many polynomials having degree no more than $m$. This is true for each $m$ and so the number of different polynomials is a countable union of countable sets which is countable. Now there are countably many choices of $e^{-a|x|^{2}}$ and so there are countably many functions in $\mathcal{G}_1$ because the Cartesian product of countable sets is countable.
Now $\mathcal{G}$ consists of finite sums of functions in $\mathcal{G}_1$. Therefore, it is countable because for each $m\in\mathbb{N}$, there are countably many such sums which are possible.
I will show now that $\mathcal{G}$ is dense in $L^{p}(\mathbb{R}^{n})$ but first, here is a lemma which follows from the Stone Weierstrass theorem.

Lemma 12.7.3 $\mathcal{G}$ is dense in $C_0(\mathbb{R}^{n})$ with respect to the norm,
$$||f||_{\infty}\equiv \sup\{|f(x)|:x\in\mathbb{R}^{n}\}$$

Proof: By the Weierstrass approximation theorem, it suffices to show $\mathcal{G}$ separates the points and annihilates no point. It was already observed in the above definition that $\overline{f}\in\mathcal{G}$ whenever $f\in\mathcal{G}$. If $y_1\ne y_2$ suppose first that $|y_1|\ne |y_2|$. Then in this case, you can let $f(x)\equiv e^{-|x|^{2}}$. Then $f\in\mathcal{G}$ and $f(y_1)\ne f(y_2)$. If $|y_1|=|y_2|$, then suppose $y_{1k}\ne y_{2k}$. This must happen for some $k$ because $y_1\ne y_2$. Then let $f(x)\equiv x_k e^{-|x|^{2}}$. Thus $\mathcal{G}$ separates points. Now $e^{-|x|^{2}}$ is never equal to zero and so $\mathcal{G}$ annihilates no point of $\mathbb{R}^{n}$. $\blacksquare$

These functions are clearly quite specialized. Therefore, the following theorem is somewhat surprising.

Theorem 12.7.4 For each $p\ge 1$, $p<\infty$, $\mathcal{G}$ is dense in $L^{p}(\mathbb{R}^{n})$. Since $\mathcal{G}$ is countable, this shows that $L^{p}(\mathbb{R}^{n})$ is separable.

Proof: Let $f\in L^{p}(\mathbb{R}^{n})$. Then there exists $g\in C_c(\mathbb{R}^{n})$ such that $||f-g||_{p}<\varepsilon$. Now let $b>0$ be large enough that
$$\int_{\mathbb{R}^{n}}\left(e^{-b|x|^{2}}\right)^{p}dx<\varepsilon^{p}.$$
Then $x\to g(x)e^{b|x|^{2}}$ is in $C_c(\mathbb{R}^{n})\subseteq C_0(\mathbb{R}^{n})$. Therefore, from Lemma 12.7.3 there exists $\psi\in\mathcal{G}$ such that
$$\left|\left|ge^{b|\cdot|^{2}}-\psi\right|\right|_{\infty}<1$$
Therefore, letting $\phi(x)\equiv e^{-b|x|^{2}}\psi(x)$ it follows that $\phi\in\mathcal{G}$ and for all $x\in\mathbb{R}^{n}$,
$$|g(x)-\phi(x)|<e^{-b|x|^{2}}$$
Therefore,
$$\left(\int_{\mathbb{R}^{n}} |g(x)-\phi(x)|^{p}\,dx\right)^{1/p}\le \left(\int_{\mathbb{R}^{n}}\left(e^{-b|x|^{2}}\right)^{p}dx\right)^{1/p}<\varepsilon.$$
It follows
$$||f-\phi||_{p}\le ||f-g||_{p}+||g-\phi||_{p}<2\varepsilon.$$
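The density of these polynomial-times-Gaussian functions can also be seen experimentally. The following least-squares sketch (an illustration under arbitrary choices of target function, degree, and grid, not the argument of the theorem) fits coefficients of $p(x)e^{-a|x|^{2}}$ to a tent function and shows the $L^{2}$ error shrinking as the degree grows.

```python
# Fit p(x) e^{-a x^2} to a compactly supported function by least squares on a grid.
import numpy as np

x = np.linspace(-4.0, 4.0, 4001)
dx = x[1] - x[0]
f = np.where(np.abs(x) < 1.0, 1.0 - np.abs(x), 0.0)    # a tent function, f in L^p
a, p = 1.0, 2.0
for degree in [2, 6, 10, 16]:
    # columns x^k e^{-a x^2}, k = 0..degree
    A = np.stack([x ** k * np.exp(-a * x ** 2) for k in range(degree + 1)], axis=1)
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    err = (np.sum(np.abs(A @ coef - f) ** p) * dx) ** (1 / p)
    print(degree, err)   # the L^2 error should decrease as the degree grows
```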

From now on, we can drop the restriction that the coefficients of the polynomials in $\mathcal{G}$ are rational. We also drop the restriction that $a$ is rational. Thus $\mathcal{G}$ will be finite sums of functions which are of the form $p(x)e^{-a|x|^{2}}$ where the coefficients of $p$ are complex and $a>0$.


The following lemma is also interesting even if it is obvious.

Lemma 12.7.5 For $\psi\in\mathcal{G}$, $p$ a polynomial, and $\alpha,\beta$ multi-indices, $D^{\alpha}\psi\in\mathcal{G}$ and $p\psi\in\mathcal{G}$. Also
$$\sup\{|x^{\beta}D^{\alpha}\psi(x)|:x\in\mathbb{R}^{n}\}<\infty$$

Thus these special functions are infinitely differentiable (smooth). They also have the property that they and all their derivatives vanish as $|x|\to\infty$.

12.8 Convolutions

An important construction is the convolution of two functions. This is defined as follows.

Definition 12.8.1 Let $f,g$ be functions defined on $\mathbb{R}^{n}$. Then $f\ast g(x)$ equals the following integral provided it exists.
$$g\ast f(x)\equiv \int_{\mathbb{R}^{n}}g(x-y)f(y)\,dy$$

We have the following fundamental result about convolutions in the context of Lebesgue measure.

Theorem 12.8.2 Let $f\in L^{1}(\mathbb{R}^{n})$ and $g\in L^{p}(\mathbb{R}^{n})$ for $1\le p<\infty$. Then $f\ast g(x)$ makes sense for a.e. $x$. Furthermore, $f\ast g(x)=g\ast f(x)$ a.e., $f\ast g\in L^{p}(\mathbb{R}^{n})$ and
$$||f\ast g||_{L^{p}(\mathbb{R}^{n})}\le ||f||_{L^{1}(\mathbb{R}^{n})}||g||_{L^{p}(\mathbb{R}^{n})}$$
Proof: By Lemma 10.1.3, both $f$ and $g$ have Borel measurable representatives. Thus these representatives are equal to $f$ and $g$ respectively in $L^{1}(\mathbb{R}^{n})$ and $L^{p}(\mathbb{R}^{n})$. Thus I will assume without loss of generality that both $f$ and $g$ are Borel measurable. Then using the Minkowski inequality for integrals and translation invariance of Lebesgue measure,
$$\left(\int_{\mathbb{R}^{n}}\Big(\int_{\mathbb{R}^{n}}|f(x-y)g(y)|\,dy\Big)^{p}dx\right)^{1/p}=$$
$$\left(\int\Big(\int |f(y)||g(x-y)|\,dy\Big)^{p}dx\right)^{1/p}\le \int |f(y)|\left(\int |g(x-y)|^{p}\,dx\right)^{1/p}dy$$
$$=\int |f(y)|\left(\int |g(x)|^{p}\,dx\right)^{1/p}dy=||f||_{L^{1}(\mathbb{R}^{n})}||g||_{L^{p}(\mathbb{R}^{n})}$$
the last step following from translation invariance of Lebesgue measure, Theorem 10.1.2 or from the change of variables formulas. It follows from this inequality that for a.e. $x$,
$$\int |f(x-y)||g(y)|\,dy<\infty$$
and so, for those values of $x$, it follows $y\to f(x-y)g(y)$ is in $L^{1}(\mathbb{R}^{n})$, and so
$$\int f(x-y)g(y)\,dy\equiv f\ast g(x)$$
exists. Also note that the above also implies, $f\ast g=g\ast f$. $\blacksquare$
To emphasize something which was observed in the proof, $\int f(y)g(x-y)\,dy$, at those values of $x$ where it makes sense, is unchanged when you change the representative of $f$ and $g$.
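The estimate $||f\ast g||_{p}\le ||f||_{1}||g||_{p}$ is also easy to test numerically. The following sketch (illustration only; the sample functions and grid are arbitrary, and the convolution is a Riemann sum) does so for a few values of $p$.

```python
# Check ||f * g||_p <= ||f||_1 ||g||_p with the convolution computed by a discrete sum.
import numpy as np

x = np.linspace(-10.0, 10.0, 8001)
dx = x[1] - x[0]
f = np.exp(-np.abs(x)) * (1 + np.sign(np.sin(3 * x)))     # f in L^1
g = np.where(np.abs(x) < 2.0, np.cos(5 * x), 0.0)          # g in L^p

conv = np.convolve(f, g, mode="same") * dx                 # (f*g)(x) ~ sum f(x-y) g(y) dy
for p in [1.0, 2.0, 4.0]:
    lhs = (np.sum(np.abs(conv) ** p) * dx) ** (1 / p)
    rhs = (np.sum(np.abs(f)) * dx) * (np.sum(np.abs(g) ** p) * dx) ** (1 / p)
    assert lhs <= rhs + 1e-8
    print(p, lhs, rhs)
```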

12.9 Mollifiers And Density Of $C_c^{\infty}(\mathbb{R}^{n})$

The special functions defined above are dense in each $L^{p}(\mathbb{R}^{n})$. This is remarkable. However, there is a possibly even more remarkable theorem about infinitely differentiable functions which have compact support.

Definition 12.9.1 Let $U$ be an open subset of $\mathbb{R}^{n}$. $C_c^{\infty}(U)$ is the vector space of all infinitely differentiable functions which equal zero for all $x$ outside of some compact set contained in $U$. Similarly, $C_c^{m}(U)$ is the vector space of all functions which are $m$ times continuously differentiable and whose support is a compact subset of $U$.

Example 12.9.2 Let $U=B(\mathbf{z},2r)$
$$\psi(x)=\begin{cases}\exp\left[\left(|x-\mathbf{z}|^{2}-r^{2}\right)^{-1}\right] & \text{if } |x-\mathbf{z}|<r,\\ 0 & \text{if } |x-\mathbf{z}|\ge r.\end{cases}$$
Then a little work shows $\psi\in C_c^{\infty}(U)$. The following also is easily obtained.

Lemma 12.9.3 Let $U$ be any open set. Then $C_c^{\infty}(U)\ne\emptyset$.

Proof: Pick $\mathbf{z}\in U$ and let $r$ be small enough that $B(\mathbf{z},2r)\subseteq U$. Then let $\psi\in C_c^{\infty}(B(\mathbf{z},2r))\subseteq C_c^{\infty}(U)$ be the function of the above example. $\blacksquare$
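For reference, here is the bump function of Example 12.9.2 written out in one dimension (an illustrative sketch only; the default center and radius are arbitrary).

```python
# psi(x) = exp( 1 / (|x - z|^2 - r^2) ) for |x - z| < r, and 0 otherwise.
import numpy as np

def bump(x, z=0.0, r=1.0):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x - z) < r
    out[inside] = np.exp(1.0 / ((x[inside] - z) ** 2 - r ** 2))
    return out

x = np.linspace(-2.0, 2.0, 9)
print(bump(x))     # vanishes identically outside (z - r, z + r)
```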

Definition 12.9.4 Let $U=\{x\in\mathbb{R}^{n}:|x|<1\}$. A sequence $\{\psi_m\}\subseteq C_c^{\infty}(U)$ is called a mollifier (This is often called an approximate identity if the differentiability is not included.) if
$$\psi_m(x)\ge 0,\ \ \psi_m(x)=0 \text{ if } |x|\ge \frac{1}{m},$$
and $\int \psi_m(x)\,dx=1$. Sometimes it may be written as $\{\psi_{\varepsilon}\}$ where $\psi_{\varepsilon}$ satisfies the above conditions except $\psi_{\varepsilon}(x)=0$ if $|x|\ge\varepsilon$. In other words, $\varepsilon$ takes the place of $1/m$ and in everything that follows $\varepsilon\to 0$ instead of $m\to\infty$.

As before, $\int f(x,y)\,d\mu(y)$ will mean $x$ is fixed and the function $y\to f(x,y)$ is being integrated. To make the notation more familiar, $dx$ is written instead of $dm_n(x)$.

Example 12.9.5 Let
$$\psi\in C_c^{\infty}(B(0,1))\quad (B(0,1)=\{x:|x|<1\})$$
with $\psi(x)\ge 0$ and $\int \psi\,dm=1$. Let $\psi_m(x)=c_m\psi(mx)$ where $c_m$ is chosen in such a way that $\int \psi_m\,dm=1$. By the change of variables theorem $c_m=m^{n}$.

Definition 12.9.6 A function, $f$, is said to be in $L^{1}_{loc}(\mathbb{R}^{n},\mu)$ if $f$ is $\mu$ measurable and if $|f|\mathcal{X}_K\in L^{1}(\mathbb{R}^{n},\mu)$ for every compact set $K$. Here $\mu$ is a Radon measure (A Borel measure which is complete and regular like $m_n$.) on $\mathbb{R}^{n}$. Usually $\mu=m_n$, Lebesgue measure. When this is so, write $L^{1}_{loc}(\mathbb{R}^{n})$ or $L^{p}(\mathbb{R}^{n})$, etc. If $f\in L^{1}_{loc}(\mathbb{R}^{n},\mu)$, and $g\in C_c(\mathbb{R}^{n})$,
$$f\ast g(x)\equiv \int f(y)g(x-y)\,d\mu.$$
Note that here, a specific order is mandated, unlike the earlier treatment given above for convolutions in the context of Lebesgue measure where the order does not matter.

The following lemma will be useful in what follows. It says that one of these very irregular functions in $L^{1}_{loc}(\mathbb{R}^{n},\mu)$ is smoothed out by convolving with a mollifier.

Lemma 12.9.7 Let $f\in L^{1}_{loc}(\mathbb{R}^{n},\mu)$, and $g\in C_c^{\infty}(\mathbb{R}^{n})$. Then $f\ast g$ is an infinitely differentiable function. Here $\mu$ is a Radon measure on $\mathbb{R}^{n}$. (Complete, regular, and finite on compact sets.)

Proof: Consider the difference quotient for calculating a partial derivative of $f\ast g$.
$$\frac{f\ast g(x+te_j)-f\ast g(x)}{t}=\int f(y)\,\frac{g(x+te_j-y)-g(x-y)}{t}\,d\mu(y).$$
Using the fact that $g\in C_c^{\infty}(\mathbb{R}^{n})$, the quotient,
$$\frac{g(x+te_j-y)-g(x-y)}{t},$$
is uniformly bounded. To see this easily, use Theorem 6.4.2 on Page 126 to get the existence of a constant, $M$ depending on
$$\max\{||Dg(x)||:x\in\mathbb{R}^{n}\}$$
such that
$$|g(x+te_j-y)-g(x-y)|\le M|t|$$
for any choice of $x$ and $y$. Therefore, there exists a dominating function for the integrand of the above integral which is of the form $C|f(y)|\mathcal{X}_K$ where $K$ is a compact set depending on the support of $g$. It follows the limit of the difference quotient above passes inside the integral as $t\to 0$ and
$$\frac{\partial}{\partial x_j}(f\ast g)(x)=\int f(y)\,\frac{\partial}{\partial x_j}g(x-y)\,d\mu(y).$$
Now letting $\frac{\partial g}{\partial x_j}$ play the role of $g$ in the above argument, partial derivatives of all orders exist. A similar use of the dominated convergence theorem shows all these partial derivatives are also continuous. $\blacksquare$
The following theorem is a lot like Lemma 10.5.2 except the function is infinitely differentiable.

Theorem 12.9.8 Let $K$ be a compact subset of an open set $U$. Then there exists a function, $h\in C_c^{\infty}(U)$, such that $h(x)=1$ for all $x\in K$ and $h(x)\in[0,1]$ for all $x$.

Proof: Let $r>0$ be small enough that $K+B(0,3r)\subseteq U$. The symbol, $K+B(0,3r)$ means
$$\{k+x:k\in K \text{ and } x\in B(0,3r)\}.$$
Thus this is simply a way to write
$$\cup\{B(k,3r):k\in K\}.$$

Think of it as fattening up the set $K$. Let $K_r=K+B(0,r)$. A picture of what is happening follows.

[Figure: the nested sets $K\subseteq K_r\subseteq U$.]

Consider $\mathcal{X}_{K_r}\ast\psi_m$ where $\psi_m$ is a mollifier. Let $m$ be so large that $\frac{1}{m}<r$. Then from the definition of what is meant by a convolution, and using that $\psi_m$ has support in $B\left(0,\frac{1}{m}\right)$, $\mathcal{X}_{K_r}\ast\psi_m=1$ on $K$ and its support is in $K+B(0,3r)$. Now using Lemma 12.9.7, $\mathcal{X}_{K_r}\ast\psi_m$ is also infinitely differentiable. Therefore, let $h=\mathcal{X}_{K_r}\ast\psi_m$. $\blacksquare$

Theorem 12.9.9 For each $p\ge 1$, $C_c^{\infty}(\mathbb{R}^{n})$ is dense in $L^{p}(\mathbb{R}^{n})$. Here the measure is Lebesgue measure.

Proof: Let $f\in L^{p}(\mathbb{R}^{n})$ and let $\varepsilon>0$ be given. Choose $g\in C_c(\mathbb{R}^{n})$ such that $||f-g||_{p}<\frac{\varepsilon}{2}$. This can be done by using Theorem 12.5.3. Now let
$$g_m(x)=g\ast\psi_m(x)\equiv\int g(x-y)\psi_m(y)\,dm_n(y)=\int g(y)\psi_m(x-y)\,dm_n(y)$$
where $\{\psi_m\}$ is a mollifier. It follows from Lemma 12.9.7 $g_m\in C_c^{\infty}(\mathbb{R}^{n})$. It vanishes if $x\notin\operatorname{spt}(g)+B(0,\frac{1}{m})$.
$$||g-g_m||_{p}=\left(\int\Big|g(x)-\int g(x-y)\psi_m(y)\,dm_n(y)\Big|^{p}dm_n(x)\right)^{\frac{1}{p}}$$
$$\le \left(\int\Big(\int |g(x)-g(x-y)|\psi_m(y)\,dm_n(y)\Big)^{p}dm_n(x)\right)^{\frac{1}{p}}$$
$$\le \int\left(\int |g(x)-g(x-y)|^{p}\,dm_n(x)\right)^{\frac{1}{p}}\psi_m(y)\,dm_n(y)$$
$$=\int_{B(0,\frac{1}{m})}||g-g_y||_{p}\,\psi_m(y)\,dm_n(y)<\frac{\varepsilon}{2}$$
whenever $m$ is large enough. This follows from the uniform continuity of $g$. Theorem 12.3.2 was used to obtain the third inequality. There is no measurability problem because the function
$$(x,y)\to |g(x)-g(x-y)|\,\psi_m(y)$$
is continuous. Thus when $m$ is large enough,
$$||f-g_m||_{p}\le ||f-g||_{p}+||g-g_m||_{p}<\frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon.\ \blacksquare$$


This is a very remarkable result. Functions in Lp (Rn ) dont need to be continu-
ous anywhere and yet every such function is very close in the Lp norm to one which
is innitely dierentiable having compact support.

Corollary 12.9.10 Let $U$ be an open set. For each $p\ge 1$, $C_c^{\infty}(U)$ is dense in $L^{p}(U)$. Here the measure is Lebesgue measure.

Proof: Let $f\in L^{p}(U)$ and let $\varepsilon>0$ be given. Choose $g\in C_c(U)$ such that $||f-g||_{p}<\frac{\varepsilon}{2}$. This is possible because Lebesgue measure restricted to the open set $U$ is regular. Thus the existence of such a $g$ follows from Theorem 12.5.3. Now let
$$g_m(x)=g\ast\psi_m(x)\equiv\int g(x-y)\psi_m(y)\,dm_n(y)=\int g(y)\psi_m(x-y)\,dm_n(y)$$
where $\{\psi_m\}$ is a mollifier. It follows from Lemma 12.9.7 $g_m\in C_c^{\infty}(U)$ for all $m$ sufficiently large. It vanishes if $x\notin\operatorname{spt}(g)+B(0,\frac{1}{m})$. Then
$$||g-g_m||_{p}=\left(\int\Big|g(x)-\int g(x-y)\psi_m(y)\,dm_n(y)\Big|^{p}dm_n(x)\right)^{\frac{1}{p}}$$
$$\le \left(\int\Big(\int |g(x)-g(x-y)|\psi_m(y)\,dm_n(y)\Big)^{p}dm_n(x)\right)^{\frac{1}{p}}$$
$$\le \int\left(\int |g(x)-g(x-y)|^{p}\,dm_n(x)\right)^{\frac{1}{p}}\psi_m(y)\,dm_n(y)$$
$$=\int_{B(0,\frac{1}{m})}||g-g_y||_{p}\,\psi_m(y)\,dm_n(y)<\frac{\varepsilon}{2}$$
whenever $m$ is large enough. This follows from the uniform continuity of $g$. Theorem 12.3.2 was used to obtain the third inequality. There is no measurability problem because the function
$$(x,y)\to |g(x)-g(x-y)|\,\psi_m(y)$$
is continuous. Thus when $m$ is large enough,
$$||f-g_m||_{p}\le ||f-g||_{p}+||g-g_m||_{p}<\frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon.\ \blacksquare$$

Another thing should probably be mentioned. If you have had a course in complex analysis, you may be wondering whether these infinitely differentiable functions having compact support have anything to do with analytic functions which also have infinitely many derivatives. The answer is no! Recall that if an analytic function has a limit point in the set of zeros then it is identically equal to zero. Thus these functions in $C_c^{\infty}(\mathbb{R}^{n})$ are not analytic. This is a strictly real analysis phenomenon and has absolutely nothing to do with the theory of functions of a complex variable.

12.10 $L^{\infty}$

Formally the conjugate index to $1$ would be $\infty$, regarding $1/\infty$ as $0$. This is also an important space. Sometimes we call it the space of essentially bounded functions meaning that they are bounded off a set of measure zero.

Definition 12.10.1 Let $(\Omega,\mathcal{S},\mu)$ be a measure space. $L^{\infty}(\Omega)$ is the vector space of measurable functions such that for some $M>0$, $|f(x)|\le M$ for all $x$ outside of some set of measure zero ($|f(x)|\le M$ a.e.). Define $f=g$ when $f(x)=g(x)$ a.e. and $||f||_{\infty}\equiv\inf\{M:|f(x)|\le M \text{ a.e.}\}$.

Theorem 12.10.2 $L^{\infty}(\Omega)$ is a Banach space.

Proof: It is clear that $L^{\infty}(\Omega)$ is a vector space. Is $||\cdot||_{\infty}$ a norm?
Claim: If $f\in L^{\infty}(\Omega)$, then $|f(x)|\le ||f||_{\infty}$ a.e.
Proof of the claim: $\left\{x:|f(x)|\ge ||f||_{\infty}+\frac{1}{n}\right\}\equiv E_n$ is a set of measure zero according to the definition of $||f||_{\infty}$. Furthermore, $\{x:|f(x)|>||f||_{\infty}\}=\cup_n E_n$ and so it is also a set of measure zero. This verifies the claim.
Now if $||f||_{\infty}=0$ it follows that $f(x)=0$ a.e. Also if $f,g\in L^{\infty}(\Omega)$,
$$|f(x)+g(x)|\le |f(x)|+|g(x)|\le ||f||_{\infty}+||g||_{\infty}$$
a.e. and so $||f||_{\infty}+||g||_{\infty}$ serves as one of the constants, $M$ in the definition of $||f+g||_{\infty}$. Therefore,
$$||f+g||_{\infty}\le ||f||_{\infty}+||g||_{\infty}.$$
Next let $c$ be a number. Then $|cf(x)|=|c||f(x)|\le |c|\,||f||_{\infty}$ and so $||cf||_{\infty}\le |c|\,||f||_{\infty}$. Therefore since $c$ is arbitrary, $||f||_{\infty}=||c(1/c)f||_{\infty}\le \frac{1}{|c|}||cf||_{\infty}$ which implies $|c|\,||f||_{\infty}\le ||cf||_{\infty}$. Thus $||\cdot||_{\infty}$ is a norm as claimed.
To verify completeness, let $\{f_n\}$ be a Cauchy sequence in $L^{\infty}(\Omega)$ and use the above claim to get the existence of a set of measure zero, $E_{nm}$ such that for all $x\notin E_{nm}$,
$$|f_n(x)-f_m(x)|\le ||f_n-f_m||_{\infty}$$
Let $E=\cup_{n,m}E_{nm}$. Thus $\mu(E)=0$ and for each $x\notin E$, $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence in $\mathbb{C}$. Let
$$f(x)=\begin{cases}0 & \text{if } x\in E\\ \lim_{n\to\infty}f_n(x) & \text{if } x\notin E\end{cases}=\lim_{n\to\infty}\mathcal{X}_{E^{C}}(x)f_n(x).$$
Then $f$ is clearly measurable because it is the limit of measurable functions. If
$$F_n=\{x:|f_n(x)|>||f_n||_{\infty}\}$$
and $F=\cup_{n=1}^{\infty}F_n$, it follows $\mu(F)=0$ and that for $x\notin F\cup E$,
$$|f(x)|\le \liminf_{n\to\infty}|f_n(x)|\le \liminf_{n\to\infty}||f_n||_{\infty}<\infty$$
because $\{||f_n||_{\infty}\}$ is a Cauchy sequence. ($\big|\,||f_n||_{\infty}-||f_m||_{\infty}\big|\le ||f_n-f_m||_{\infty}$ by the triangle inequality.) Thus $f\in L^{\infty}(\Omega)$. Let $n$ be large enough that whenever $m>n$,
$$||f_m-f_n||_{\infty}<\varepsilon.$$
Then, if $x\notin E$,
$$|f(x)-f_n(x)|=\lim_{m\to\infty}|f_m(x)-f_n(x)|\le \liminf_{m\to\infty}||f_m-f_n||_{\infty}<\varepsilon.$$
Hence $||f-f_n||_{\infty}<\varepsilon$ for all $n$ large enough. $\blacksquare$

12.11 Exercises
1. Let $E$ be a Lebesgue measurable set in $\mathbb{R}$. Suppose $m(E)>0$. Consider the set
$$E-E=\{x-y:x\in E,\ y\in E\}.$$
Show that $E-E$ contains an interval. Hint: Let
$$f(x)=\int \mathcal{X}_E(t)\mathcal{X}_E(x+t)\,dt.$$
Note $f$ is continuous at $0$ and $f(0)>0$ and use continuity of translation in $L^{p}$.
2. Give an example of a sequence of functions in $L^{p}(\mathbb{R})$ which converges to zero in $L^{p}$ but does not converge pointwise to $0$. Does this contradict the proof of the theorem that $L^{p}$ is complete? You don't have to be real precise, just describe it.
3. Let $g$ be in $L^{2}([0,2\pi])$ and extend it to be periodic of period $2\pi$. The $n$th partial sum of the Fourier series of $g$ is defined as
$$\sum_{k=-n}^{n}c_k e^{ikx}$$
where $c_k$ is the Fourier coefficient given by
$$c_k=\frac{1}{2\pi}\int_0^{2\pi}e^{-iky}g(y)\,dy.$$
Show that this particular choice of $c_k$ has the property that if $a_k$ is any other choice of these constants, then
$$\Big|\Big|g-\sum_{k=-n}^{n}c_k e^{ikx}\Big|\Big|_{L^{2}(0,2\pi)}\le \Big|\Big|g-\sum_{k=-n}^{n}a_k e^{ikx}\Big|\Big|_{L^{2}(0,2\pi)}$$
Thus, the partial sums of the Fourier series give the optimal approximation of $g$ in $L^{2}(0,2\pi)$!

4. Now show that whenever $g$ is as above,
$$\lim_{n\to\infty}\Big|\Big|g-\sum_{k=-n}^{n}c_k e^{ikx}\Big|\Big|_{L^{2}(0,2\pi)}=0$$
Thus the Fourier series converges in $L^{2}(0,2\pi)$ to the function it came from. Hint: This is not too hard if you use Problem 5 on Page 280. This does not show anything about pointwise convergence of the partial sums of the Fourier series! This will be done later. In fact the question of pointwise convergence of Fourier series to such a function in $L^{2}(0,2\pi)$ was an unsolved problem till the middle 1960s.
5. Now suppose that f is a continuous 2 periodic function. Suppose also
that for t [0, 2] ,
t
f (t) = f (0) + g (s) ds
0
n
where g L (0, 2) . Letting Sn g (x)
2
k=n gk e
ikx
for
2
1
gk = g (s) eiks ds
2 0

the Fourier coecient of g, it follows that Sn g g in L2 (0, 2).

(a) Show that this convergence implies that for each t [0, 2] ,
t
f (t) = f (0) + lim Sn g (s) ds
n 0

Now use the integral of the sum equals the sum of the integrals on the
right to nd that

n
gk ( ikt )
f (t) = f (0) + lim e 1
n ik
k=n, k=0

(b) Show that the Fourier coecient of f is given by fk = gk


ik whenever k = 0.
Also verify that

n
gk
n
lim = lim fk C
n ik n
k=n, k=0 k=n, k=0

exists. Explain from this why for all t [0, 2] ,


n
f (t) = f (0) C + lim fk eikt
n
k=n, k=0

(c) Finally, use the proof of the completeness of L2 to argue that for some
subsequence nm , limm nm = ,

nm
f (t) = lim fk eikt + f0
m
k=nm , k=0

2
for a.e. t [0, 2], where f0 is the Fourier coecient of f, f0 = 2
1
0
f (t) dt.
Explain why f (0) C = f0 . Explain why this gives a result for pointwise
convergence of the Fourier series of f and describe this result carefully.

A much better result on pointwise convergence will be given later.


6. The Marcinkiewicz interpolation theorem is a very amazing and useful the-
orem. Here is a denition.

Denition 12.11.1 Lp ()+L1 () will denote the space of measurable func-


tions, f , such that f is the sum of a function in Lp () and L1 (). Also, if
T : Lp () + L1 () space of measurable functions, T is subadditive if

|T (f + g) (x)| |T f (x)| + |T g (x)|.

T is of type (p, p) if there exists a constant independent of f Lp () such


that
||T f ||p A f p , f Lp ().
T is weak type (p, p) if there exists a constant A independent of f such that
( )p
A
([x : |T f (x)| > ]) ||f ||p , f Lp ().

Now here is the Marcinkiewicz interpolation theorem.

Theorem 12.11.2 Let (, , S) be a nite measure space, 1 < r < , and


let
T : L1 () + Lr () space of measurable functions
be subadditive, weak (r, r), and weak (1, 1). Then T is of type (p, p) for every
p (1, r) and
||T f ||p Ap ||f ||p
where the constant Ap depends only on p and the constants in the denition
of weak (1, 1) and weak (r, r).

This problem is to prove this wonderful theorem. Recall rst Problem 13 on


Page 235. From this problem or the previous problem,

p
|f | d = ptp1 ([|f | > t]) dt (12.11.11)
0

There is nothing to prove if f / Lp so always assume f Lp . Now for > 0,


let
{ {
f (x) if |f (x)| f (x) if |f (x)| >
f1 (x) , f2 (x)
0 if |f (x)| > 0 if |f (x)|

Thus f = f1 + f2 .

(a) Show that f1 Lr and f2 L1 .


(b) Explain why
[ ] [ ]
[|T f | > ] |T f1 | > |T f2 | > .
2 2
(c) Using the weak type estimates and 12.11.11, explain why

p
|T f | d = pp1 ([|T f | > ]) d
0
([

]) ([ ])
p |T f1 | >
p1
d + pp1 |T f2 | > d
0 2 0 2
( )r
2Ar 2A1
p p1 ||f1 ||Lr d + p p1 ||f2 ||1 d.
0 0
(d) Now explain from the denition of f1 and f2 why the right side equals

r r
= p (2Ar ) p1r |f1 | dd + 2A1 p p2 |f2 | dd
0 0
|f (x)|
r r
= p (2Ar ) |f | p1r dd + 2A1 p |f | p2 dd
|f (x)| 0
( )
2r Arr p 2pA1 p
max , ||f ||Lp ()
rp p1

(e) Note how Fubinis theorem was used. Why were the functions of interest
product measurable? After all, fi is a function of although this has
been suppressed. You might consider the proof of measurability which
led to Problem 12 on Page 234.

7. Let K be a bounded subset of Lp (Rn ) and suppose that for each > 0
there exists G such that G is compact with

p
|u (x)| dx < p
Rn \G

and for all > 0, there exist a > 0 and such that if |h| < , then

p
|u (x + h) u (x)| dx < p

for all u K. Show that K is precompact in Lp (Rn ). Hint: Let k be a


mollier and consider
Kk {u k : u K} .
Verify the conditions of the Ascoli Arzela theorem for these functions dened
on G and show there is an net for each > 0. Can you modify this to let
an arbitrary open set take the place of Rn ? This is a very important result.
8. Let (, d) be a metric space and suppose also that (, S, ) is a regular
measure space such that () < and let f L1 () where f has complex
values. Show that for every > 0, there exists an open set of measure less
than , denoted here by V and a continuous function, g dened on such
that f = g on V C . Thus, aside from a set of small measure, f is continuous.
If |f ()| M , show that we can also assume |g ()| M . This is called
Lusins theorem. Hint: Use Theorems 12.5.3 and 12.2.2 to obtain a sequence
of functions in Cc () , {gn } which converges pointwise a.e. to f and then use
Egoros theorem to obtain a small set W of measure less than /2 such that
convergence
( C is uniform
) on W C . Now let F be a closed subset of W C such
that W \ F < /2. Let V = F C . Thus (V ) < and on F = V C ,
the convergence of {gn } is uniform showing that the restriction of f to V C is
continuous. Now use the Tietze extension theorem.
9. Let : R R be convex. This means

(x + (1 )y) (x) + (1 )(y)


(y)(x) (z)(y)
whenever [0, 1]. Verify that if x < y < z, then yx zy
and that (z)(x)
zx (z)(y)
zy . Show if s R there exists such that
(s) (t) + (s t) for all t. Show that if is convex, then is continuous.

10. Prove Jensens inequality.


If : R R is convex, () = 1,
and f : R
is in L1 (), then ( f du) (f )d. Hint: Let s = f d and use
Problem 9.

11. Let p1 + p1 = 1, p > 1, let f Lp (R), g Lp (R). Show f g is uniformly
continuous on R and |(f g)(x)| ||f ||Lp ||g||Lp . Hint: You need to consider
why f g exists and then this follows from the denition of convolution and
continuity of translation in Lp .
1
12. B(p, q) = 0 xp1 (1 x)q1 dx, (p) = 0 et tp1 dt for p, q > 0. The rst
of these is called the beta function, while the second is the gamma function.
Show a.) (p + 1) = p(p); b.) (p)(q) = B(p, q)(p + q).
x
13. Let f Cc (0, ) and dene F (x) = x1 0 f (t)dt. Show
p
||F ||Lp (0,) ||f ||Lp (0,) whenever p > 1.
p1

Hint: Argue there is no loss of generality in assuming f 0 and then assume



this is so. Integrate 0 |F (x)|p dx by parts as follows:


show = 0
z }| {
F dx = xF p |
p
0 p xF p1 F dx.
0 0

Now show xF = f F and use this in the last integral. Complete the
argument by using Holders inequality and p 1 = p/q.

14. Now supposef Lp (0, ), p > 1, and f not necessarily in Cc (0, ). Show
x
that F (x) = x1 0 f (t)dt still makes sense for each x > 0. Show the inequality
of Problem 13 is still valid. This inequality is called Hardys inequality. Hint:
To show this, use the above inequality along with the density of Cc (0, ) in
Lp (0, ).

15. Suppose f, g 0. When does equality hold in Holders inequality?

16. Prove Vitalis Convergence theorem: Let {fn } be uniformly integrable and
complex valued, () < , fn (x) f (x) a.e. where f is measurable. Then
f L1 and limn |fn f |d = 0. Hint: Use Egoros theorem to show
{fn } is a Cauchy sequence in L1 ().

17. Show the Vitali Convergence theorem implies the Dominated Convergence
theorem for nite measure spaces but there exist examples where the Vitali
convergence theorem works and the dominated convergence theorem does not.

18. Suppose f L L1 . Show limp ||f ||Lp = ||f || . Hint:



p p
(||f || ) ([|f | > ||f || ]) |f | d
[|f |>||f || ]

p p1 p1
|f | d = |f | |f | d ||f || |f | d.

Now raise both ends to the 1/p power and take lim inf and lim sup as p .
You should get ||f || lim inf ||f ||p lim sup ||f ||p ||f ||

19. Suppose () < . Show that if 1 p < q, then Lq () Lp (). Hint


Use Holders inequality.

20. Show L1 (R)* L2 (R) and L2 (R) * L1 (R) if Lebesgue measure is used. Hint:
Consider 1/ x and 1/x.

21. Suppose that [0, 1] and r, s, q > 0 with

1 1
= + .
q r s

show that

( |f | d)
q 1/q
(( |f | d) ) (( |f |s d)1/s )1.
r 1/r

If q, r, s 1 this says that

||f ||q ||f ||r ||f ||1


s .

Using this, show that


( )
ln ||f ||q ln (||f ||r ) + (1 ) ln (||f ||s ) .

Hint:
|f |q d = |f |q |f |q(1) d.

q q(1)
Now note that 1 = r + s and use Holders inequality.
22. Suppose f is a function in L1 (R) and f is innitely dierentiable. Does it
follow that f L1 (R)? Is f measurable? Hint: What if Cc (0, 1) and
f (x) = (2n (x n)) for x (n, n + 1) , f (x) = 0 if x < 0?

23. For a function f L1 (Rp ), the Fourier transform, F f is given by



1
F f (t) ( )n eitx f (x) dx
2 Rp

and the so called inverse Fourier transform, F 1 f is dened by



1
F f (t) ( )n eitx f (x) dx
2 R p

Show that if f L1 (Rp ) , then lim|x| F f (x) = 0. Hint: You might try to
show this rst for f Cc (Rp ).
Fourier Transforms

13.1 Fourier Transforms Of Functions In G


Let $\mathcal{G}$ be the functions of Definition 12.7.2 except, for the sake of convenience, remove all references to rational numbers. Thus $\mathcal{G}$ consists of finite sums of polynomials having coefficients in $\mathbb{C}$ times $e^{-a|x|^{2}}$ for some $a>0$. The idea is to first understand the Fourier transform on these very specialized functions.

Definition 13.1.1 For $\psi\in\mathcal{G}$ define the Fourier transform, $F$ and the inverse Fourier transform, $F^{-1}$ by
$$F\psi(t)\equiv (2\pi)^{-n/2}\int_{\mathbb{R}^{n}}e^{-it\cdot x}\psi(x)\,dx,$$
$$F^{-1}\psi(t)\equiv (2\pi)^{-n/2}\int_{\mathbb{R}^{n}}e^{it\cdot x}\psi(x)\,dx,$$
where $t\cdot x\equiv \sum_{i=1}^{n}t_i x_i$. Note there is no problem with this definition because $\psi$ is in $L^{1}(\mathbb{R}^{n})$ and therefore,
$$\left|e^{it\cdot x}\psi(x)\right|\le |\psi(x)|,$$
an integrable function.

One reason for using the functions $\mathcal{G}$ is that it is very easy to compute the Fourier transform of these functions. The first thing to do is to verify $F$ and $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$ and that $F^{-1}\circ F(\psi)=\psi$.

Lemma 13.1.2 The following hold. ($c>0$)
$$\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^{n}}e^{-c|t|^{2}}e^{-is\cdot t}\,dt=\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^{n}}e^{-c|t|^{2}}e^{is\cdot t}\,dt$$
$$=\left(\frac{1}{2\pi}\right)^{n/2}\left(\sqrt{\frac{\pi}{c}}\right)^{n}e^{-\frac{|s|^{2}}{4c}}=\left(\frac{1}{2c}\right)^{n/2}e^{-\frac{|s|^{2}}{4c}}. \quad (13.1.1)$$


Proof: Consider first the case of one dimension. Let $H(s)$ be given by
$$H(s)\equiv\int_{\mathbb{R}}e^{-ct^{2}}e^{-ist}\,dt=\int_{\mathbb{R}}e^{-ct^{2}}\cos(st)\,dt$$
Then using the dominated convergence theorem to differentiate,
$$H'(s)=\int_{\mathbb{R}}-e^{-ct^{2}}t\sin(st)\,dt=\left(\frac{e^{-ct^{2}}}{2c}\sin(st)\Big|_{-\infty}^{\infty}-\frac{s}{2c}\int_{\mathbb{R}}e^{-ct^{2}}\cos(st)\,dt\right)=-\frac{s}{2c}H(s).$$
Also $H(0)=\int_{\mathbb{R}}e^{-ct^{2}}\,dt$. Thus $H(0)=\int_{\mathbb{R}}e^{-cx^{2}}\,dx\equiv I$ and so
$$I^{2}=\int_{\mathbb{R}^{2}}e^{-c(x^{2}+y^{2})}\,dxdy=\int_0^{\infty}\int_0^{2\pi}e^{-cr^{2}}r\,d\theta\,dr=\frac{\pi}{c}.$$
For another proof of this which does not use change of variables and polar coordinates, see Problem 14 below. Hence
$$H'(s)+\frac{s}{2c}H(s)=0,\ \ H(0)=\sqrt{\frac{\pi}{c}}.$$
It follows that $H(s)=\sqrt{\frac{\pi}{c}}\,e^{-\frac{s^{2}}{4c}}$. Hence
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}e^{-ct^{2}}e^{-ist}\,dt=\sqrt{\frac{\pi}{c}}\,e^{-\frac{s^{2}}{4c}}\frac{1}{\sqrt{2\pi}}=\left(\frac{1}{2c}\right)^{1/2}e^{-\frac{s^{2}}{4c}}.$$
This proves the formula in the case of one dimension. The case of the inverse Fourier transform is similar. The $n$ dimensional formula follows from Fubini's theorem. $\blacksquare$
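The one dimensional formula is easy to confirm numerically. The following sketch (an illustration only; the value of $c$, the grid, and the sample points $s$ are arbitrary) compares a Riemann-sum evaluation of the defining integral with the closed form just derived.

```python
# Check (1/sqrt(2 pi)) * integral e^{-c t^2} e^{-i s t} dt  =  (1/sqrt(2c)) e^{-s^2/(4c)}.
import numpy as np

c = 0.7
t = np.linspace(-30.0, 30.0, 200001)
dt = t[1] - t[0]
for s in [0.0, 0.5, 1.0, 2.0, 4.0]:
    lhs = np.sum(np.exp(-c * t ** 2) * np.exp(-1j * s * t)) * dt / np.sqrt(2 * np.pi)
    rhs = np.exp(-s ** 2 / (4 * c)) / np.sqrt(2 * c)
    print(s, abs(lhs - rhs))    # agreement to many digits
```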
With these formulas, it is easy to verify $F,F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$ and $F\circ F^{-1}=F^{-1}\circ F=\operatorname{id}$.

Theorem 13.1.3 Each of $F$ and $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$. Also $F^{-1}\circ F(\psi)=\psi$ and $F\circ F^{-1}(\psi)=\psi$.

Proof: To make the notation simpler, $\int$ will symbolize $(2\pi)^{-n/2}\int_{\mathbb{R}^{n}}$. Also, $f_b(x)\equiv e^{-b|x|^{2}}$. Then from the above
$$Ff_b=(2b)^{-n/2}f_{(4b)^{-1}}$$
The first claim will be shown if it is shown that $F\psi\in\mathcal{G}$ for $\psi(x)=x^{\alpha}e^{-b|x|^{2}}$ because an arbitrary function of $\mathcal{G}$ is a finite sum of scalar multiples of functions such as $\psi$. Using Lemma 13.1.2,
$$F\psi(t)\equiv\int e^{-it\cdot x}x^{\alpha}e^{-b|x|^{2}}\,dx$$
$$=(i)^{|\alpha|}D_t^{\alpha}\left(\int e^{-it\cdot x}e^{-b|x|^{2}}\,dx\right)$$
$$=(i)^{|\alpha|}D_t^{\alpha}\left(\left(\frac{1}{2b}\right)^{n/2}e^{-\frac{|t|^{2}}{4b}}\right)$$
and this is clearly in $\mathcal{G}$ because it equals a polynomial times $e^{-\frac{|t|^{2}}{4b}}$. Similarly, $F^{-1}:\mathcal{G}\to\mathcal{G}$. Now consider $F^{-1}F(\psi)(s)$. From the above, and integrating by parts,
$$F^{-1}F(\psi)(s)=\int e^{is\cdot t}(i)^{|\alpha|}D_t^{\alpha}\left(\int e^{-it\cdot x}e^{-b|x|^{2}}\,dx\right)dt$$
$$=\int (i)^{|\alpha|}(-i)^{|\alpha|}s^{\alpha}e^{is\cdot t}\left(\int e^{-it\cdot x}e^{-b|x|^{2}}\,dx\right)dt$$
$$=s^{\alpha}F^{-1}(F(f_b))(s)$$
$$F^{-1}(F(f_b))(s)=F^{-1}\left((2b)^{-n/2}f_{(4b)^{-1}}\right)(s)=(2b)^{-n/2}F^{-1}\left(f_{(4b)^{-1}}\right)(s)$$
$$=(2b)^{-n/2}\left(2(4b)^{-1}\right)^{-n/2}f_{(4(4b)^{-1})^{-1}}(s)=f_b(s)$$
Hence $F^{-1}F(\psi)(s)=s^{\alpha}f_b(s)=\psi(s)$. $\blacksquare$

13.2 Fourier Transforms Of Just About Anything

13.2.1 Fourier Transforms Of $\mathcal{G}^{\ast}$

Definition 13.2.1 Let $\mathcal{G}^{\ast}$ denote the vector space of linear functions defined on $\mathcal{G}$ which have values in $\mathbb{C}$. Thus $T\in\mathcal{G}^{\ast}$ means $T:\mathcal{G}\to\mathbb{C}$ and $T$ is linear,
$$T(a\psi+b\phi)=aT(\psi)+bT(\phi) \text{ for all } a,b\in\mathbb{C},\ \psi,\phi\in\mathcal{G}$$
Let $\psi\in\mathcal{G}$. Then we can regard $\psi$ as an element of $\mathcal{G}^{\ast}$ by defining
$$\psi(\phi)\equiv\int_{\mathbb{R}^{n}}\psi(x)\phi(x)\,dx.$$
Then we have the following important lemma.

Lemma 13.2.2 The following is obtained for all $\phi,\psi\in\mathcal{G}$.
$$F\psi(\phi)=\psi(F\phi),\ \ F^{-1}\psi(\phi)=\psi\left(F^{-1}\phi\right)$$
Also if $\psi\in\mathcal{G}$ and $\psi=0$ in $\mathcal{G}^{\ast}$ so that $\psi(\phi)=0$ for all $\phi\in\mathcal{G}$, then $\psi=0$ as a function.

Proof:
$$F\psi(\phi)\equiv\int_{\mathbb{R}^{n}}F\psi(t)\phi(t)\,dt$$
$$=\int_{\mathbb{R}^{n}}\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^{n}}e^{-it\cdot x}\psi(x)\,dx\,\phi(t)\,dt$$
$$=\int_{\mathbb{R}^{n}}\psi(x)\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^{n}}e^{-it\cdot x}\phi(t)\,dt\,dx$$
$$=\int_{\mathbb{R}^{n}}\psi(x)F\phi(x)\,dx\equiv\psi(F\phi)$$

The other claim is similar.
Suppose now $\psi(\phi)=0$ for all $\phi\in\mathcal{G}$. Then
$$\int_{\mathbb{R}^{n}}\psi\phi\,dx=0$$
for all $\phi\in\mathcal{G}$. Therefore, this is true for $\phi=\overline{\psi}$ and so $\psi=0$. $\blacksquare$


This lemma suggests a way to dene the Fourier transform of something in G .

Denition 13.2.3 For T G , dene F T, F 1 T G by


( )
F T () T (F ) , F 1 T () T F 1

Lemma 13.2.4 F and F 1 are both one to one, onto, and are inverses of each
other.

Proof: First note $F$ and $F^{-1}$ are both linear. This follows directly from the definition. Suppose now $FT=0$. Then $FT(\phi)=T(F\phi)=0$ for all $\phi\in\mathcal{G}$. But $F$ and $F^{-1}$ map $\mathcal{G}$ onto $\mathcal{G}$ because if $\psi\in\mathcal{G}$, then as shown above, $\psi=F\left(F^{-1}(\psi)\right)$. Therefore, $T=0$ and so $F$ is one to one. Similarly $F^{-1}$ is one to one. Now
$$F^{-1}(FT)(\phi)\equiv (FT)\left(F^{-1}\phi\right)\equiv T\left(F\left(F^{-1}(\phi)\right)\right)=T\phi.$$
Therefore, $F^{-1}\circ F(T)=T$. Similarly, $F\circ F^{-1}(T)=T$. Thus both $F$ and $F^{-1}$ are one to one and onto and are inverses of each other as suggested by the notation. $\blacksquare$
Probably the most interesting things in $\mathcal{G}^{\ast}$ are functions of various kinds. The following lemma will be useful in considering this situation.

Lemma 13.2.5 If $f\in L^{1}_{loc}(\mathbb{R}^{n})$ and $\int_{\mathbb{R}^{n}}f\phi\,dx=0$ for all $\phi\in C_c(\mathbb{R}^{n})$, then $f=0$ a.e.

Proof: Let E be bounded and Lebesgue measurable. By regularity, there exists


a compact set Kk E and an open set Vk E such that mn (Vk \ Kk ) < 2k .
Let hk equal 1 on Kk , vanish on VkC , and take values between 0 and 1. Then
hk converges to XE o
k=1 l=k (Vl \ Kl ) , a set of measure zero. Hence, by the
dominated convergence theorem,

f XE dmn = lim f hk dmn = 0.
k

It follows that for E an arbitrary Lebesgue measurable set,



f XB(0,R) XE dmn = 0.

Let {
f
if |f | = 0
|f |
sgn f =
0 if |f | = 0

By Theorem 9.1.3, there exists {sk }, a sequence of simple functions converging


pointwise to sgn f such that |sk | 1. Then by the dominated convergence theorem
again,
|f | XB(0,R) dmn = lim f XB(0,R) sk dmn = 0.
k

Since R is arbitrary, |f | = 0 a.e. 

Corollary 13.2.6 Let f L1 (Rn ) and suppose



f (x) (x) dx = 0
Rn

for all G. Then f = 0 a.e.

Proof: Let Cc (Rn ) . Then by the Stone Weierstrass approximation theo-


rem, there exists a sequence of functions, {k } G such that k uniformly.
Then by the dominated convergence theorem,

f dx = lim f k dx = 0.
k

By Lemma 13.2.5 f = 0. 
The next theorem is the main result of this sort.

Theorem 13.2.7 Let f Lp (Rn ) , p 1, or suppose f is measurable and has


polynomial growth, ( )m
2
|f (x)| K 1 + |x|

for some m N. Then if


f dx = 0

for all G, then it follows f = 0.

Proof: First note that if f Lp (Rn ) or has polynomial growth, then it makes
sense to write the integral f dx described above. This is obvious in the case of
polynomial growth. In the case where f Lp (Rn ) it also makes sense because
( )1/p ( )1/p
p p
|f | || dx |f | dx || dx <

due to the fact mentioned above that all these functions in G are in Lp (Rn ) for
every p 1. Suppose now that f Lp , p 1. The case where f L1 (Rn ) was
dealt with in Corollary 13.2.6. Suppose f Lp (Rn ) for p > 1. Then
( )
p2 p 1 1
|f | f L (R ) , p = q, + = 1
n
p q


and by density of G in Lp (Rn ) (Theorem 12.7.4), there exists a sequence {gk } G
such that
p2
gk |f | f 0.
p

Then
( )
p p2
|f | dx = f |f | f gk dx + f gk dx
Rn Rn Rn
( )
p2
= f |f | f gk dx
Rn

p2
||f ||Lp gk |f | f
p

which converges to 0. Hence f = 0.


It remains to consider the case where f has polynomial growth. Thus x
f (x) e|x| L1 (Rn ) . Therefore, for all G,
2


0 = f (x) e|x| (x) dx
2

because e|x| (x) G. Therefore, by the rst part, f (x) e|x| = 0 a.e. 
2 2

Note that polynomial growth could be replaced with a condition of the form
( )m
2
|f (x)| K 1 + |x| ek|x| , < 2

and the same proof would yield that these functions are in G . The main thing to
observe is that almost all functions of interest are in G .

Theorem 13.2.8 Let f be a measurable function with polynomial growth,


( )N
2
|f (x)| C 1 + |x| for some N,

or let f Lp (Rn ) for some p [1, ]. Then f G if



f () f dx.

Proof: Let f have polynomial growth rst. Then the above integral is clearly
well dened and so in this case, f G .
Next suppose f Lp (Rn ) with > p 1. Then it is clear again that the
above integral is well dened because of the fact that is a sum of polynomials

times exponentials of the form ec|x| and these are in Lp (Rn ). Also f () is
2

clearly linear in both cases. 


This has shown that for nearly any reasonable function, you can dene its Fourier
transform as described above. You could also dene the Fourier transform of a nite
Borel measure because for such a measure

d
Rn

is a linear functional on G. This includes the very important case of probability


distribution measures. The theoretical basis for this assertion will be given a little
later.

13.2.2 Fourier Transforms Of Functions In L1 (Rn )


First suppose f L1 (Rn ) .

Theorem 13.2.9 Let $f\in L^{1}(\mathbb{R}^{n})$. Then $Ff(\phi)=\int_{\mathbb{R}^{n}}g\phi\,dt$ where
$$g(t)=\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^{n}}e^{-it\cdot x}f(x)\,dx$$
and $F^{-1}f(\phi)=\int_{\mathbb{R}^{n}}g\phi\,dt$ where $g(t)=\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^{n}}e^{it\cdot x}f(x)\,dx$. In short,
$$Ff(t)\equiv (2\pi)^{-n/2}\int_{\mathbb{R}^{n}}e^{-it\cdot x}f(x)\,dx,$$
$$F^{-1}f(t)\equiv (2\pi)^{-n/2}\int_{\mathbb{R}^{n}}e^{it\cdot x}f(x)\,dx.$$

Proof: From the denition and Fubinis theorem,


( )n/2
1
F f () f (t) F (t) dt = f (t) eitx (x) dxdt
Rn Rn 2 Rn
(( )n/2 )
1 itx
= f (t) e dt (x) dx.
Rn 2 Rn

Since G is arbitrary, it follows from Theorem 13.2.7 that F f (x) is given by the
claimed formula. The case of F 1 is identical. 
Here are interesting properties of these Fourier transforms of functions in L1 .

Theorem 13.2.10 If $f\in L^{1}(\mathbb{R}^{n})$ and $||f_k-f||_{1}\to 0$, then $Ff_k$ and $F^{-1}f_k$ converge uniformly to $Ff$ and $F^{-1}f$ respectively. If $f\in L^{1}(\mathbb{R}^{n})$, then $F^{-1}f$ and $Ff$ are both continuous and bounded. Also,
$$\lim_{|x|\to\infty}F^{-1}f(x)=\lim_{|x|\to\infty}Ff(x)=0. \quad (13.2.2)$$
Furthermore, for $f\in L^{1}(\mathbb{R}^{n})$ both $Ff$ and $F^{-1}f$ are uniformly continuous.

Proof: The rst claim follows from the following inequality.



itx
|F fk (t) F f (t)| (2) n/2 e fk (x) eitx f (x) dx
R
n

n/2
= (2) |fk (x) f (x)| dx
Rn
n/2
= (2) ||f fk ||1 .

which a similar argument holding for F 1 .


Now consider the second claim of the theorem.

itx
|F f (t) F f (t )| (2)n/2 e eit x |f (x)| dx
Rn

The integrand is bounded by 2 |f (x)|, a function in L1 (Rn ) and converges to 0 as


t t and so the dominated convergence theorem implies F f is continuous. To see
F f (t) is uniformly bounded,

|F f (t)| (2)n/2 |f (x)| dx < .
Rn

A similar argument gives the same conclusions for F 1 .


It remains to verify 13.2.2 and the claim that F f and F 1 f are uniformly
continuous.


|F f (t)| (2) n/2
eitx
f (x)dx
n R
n/2
Now let > 0 be given and let g Cc n
(R ) such that (2) ||g f ||1 < /2.
Then

|F f (t)| (2)n/2 |f (x) g (x)| dx
Rn



+ (2) n/2
e itx
g(x)dx
Rn



/2 + (2) n/2
e itx
g(x)dx .
Rn

Now integrating by parts, it follows that for ||t|| max {|tj | : j = 1, , n} > 0

n
1
|F f (t)| /2 + (2)n/2 g (x) dx (13.2.3)

||t|| Rn j=1 xj

and this last expression converges to zero as ||t|| . The reason for this is that
if tj = 0, integration by parts with respect to xj gives

1 g (x)
(2)n/2 eitx g(x)dx = (2)n/2 eitx dx.
Rn it j R n xj
Therefore, choose the j for which ||t|| = |tj | and the result of 13.2.3 holds. There-
fore, from 13.2.3, if ||t|| is large enough, |F f (t)| < . Similarly, lim||t|| F 1 (t) =
0. Consider the claim about uniform continuity. Let > 0 be given. Then there
exists R such that if ||t|| > R, then |F f (t)| < 2 . Since F f is continuous, it is
n
uniformly continuous on the compact set [R 1, R + 1] . Therefore, there exists
n
1 such that if ||t t || < 1 for t , t [R 1, R + 1] , then

|F f (t) F f (t )| < /2. (13.2.4)



Now let 0 < < min ( 1 , 1) and suppose ||t t || < . If both t, t are contained
in [R, R] , then 13.2.4 holds. If t [R, R] and t
n n n
/ [R, R] , then both are
n
contained in [R 1, R + 1] and so this veries 13.2.4 in this case. The other case
n
is that neither point is in [R, R] and in this case,

|F f (t) F f (t )| |F f (t)| + |F f (t )|

< + = .
2 2
There is a very interesting relation between the Fourier transform and convolutions.

Theorem 13.2.11 Let $f,g\in L^{1}(\mathbb{R}^{n})$. Then $f\ast g\in L^{1}$ and $F(f\ast g)=(2\pi)^{n/2}Ff\,Fg$.

Proof: Consider
|f (x y) g (y)| dydx.
Rn Rn

The function, (x, y) |f (x y) g (y)| is Lebesgue measurable and so by Fubinis


theorem,

|f (x y) g (y)| dydx = |f (x y) g (y)| dxdy = ||f ||1 ||g||1 < .
Rn Rn Rn Rn

x, Rn |f (x y) g (y)| dy < and for each of these values
It follows that for a.e.
of x, it follows that Rn f (x y) g (y) dy exists and equals a function of x which is
in L1 (Rn ) , f g (x). Now

n/2
F (f g) (t) (2) eitx f g (x) dx
Rn

n/2 itx
= (2) e f (x y) g (y) dydx
R Rn

n

n/2
= (2) eity g (y) eit(xy) f (x y) dxdy
Rn Rn
n/2
= (2) F f (t) F g (t) . 

There are many other considerations involving Fourier transforms of functions


in L1 (Rn ). Some others are in the exercises.

13.2.3 Fourier Transforms Of Functions In L2 (Rn )


Consider F f and F 1 f for f L2 (Rn ). First note that the formula given for F f
and F 1 f when f L1 (Rn ) will not work for f L2 (Rn ) unless f is also in L1 (Rn ).
Recall that a + ib = a ib.

Theorem 13.2.12 For $\psi\in\mathcal{G}$, $||F\psi||_{2}=||F^{-1}\psi||_{2}=||\psi||_{2}$.



Proof: First note that for G,


F () = F 1 () , F 1 () = F (). (13.2.5)
This follows from the denition. For example,

n/2
F (t) = (2) eitx (x) dx
Rn

= (2)n/2 eitx (x) dx
Rn

Let , G. It was shown above that



(F )(t)dt = (F )dx.
Rn Rn

Similarly,
(F 1 )dx = (F 1 )dt. (13.2.6)
Rn Rn
Now, 13.2.5 - 13.2.6 imply

||2 dx = dx = F 1 (F )dx = F (F )dx
Rn
R Rn
Rn
n

= F (F )dx = |F |2 dx.
Rn Rn

Similarly
||||2 = ||F 1 ||2 . 
Lemma 13.2.13 Let f L2 (Rn ) and let k f in L2 (Rn ) where k G. (Such
a sequence exists because of density of G in L2 (Rn ).) Then F f and F 1 f are both
in L2 (Rn ) and the following limits take place in L2 .
lim F (k ) = F (f ) , lim F 1 (k ) = F 1 (f ) .
k k

Proof: Let G be given. Then



F f () f (F ) f (x) F (x) dx
Rn

= lim k (x) F (x) dx = lim F k (x) (x) dx.
k Rn k Rn

Also by Theorem 13.2.12 {F k }k=1 is Cauchy in L2 (Rn ) and so it converges to
some h L2 (Rn ). Therefore, from the above,

F f () = h (x) (x)
Rn

which shows that F (f ) L2 (Rn ) and h = F (f ) . The case of F 1 is entirely


similar. 
Since F f and F 1 f are in L2 (Rn ) , this also proves the following theorem.

Theorem 13.2.14 If f L2 (Rn ), F f and F 1 f are the unique elements of L2 (Rn )


such that for all G,

F f (x)(x)dx = f (x)F (x)dx, (13.2.7)
Rn Rn

F 1 f (x)(x)dx = f (x)F 1 (x)dx. (13.2.8)
Rn Rn

Theorem 13.2.15 (Plancherel)
$$||f||_{2}=||Ff||_{2}=||F^{-1}f||_{2}. \quad (13.2.9)$$

Proof: Use the density of $\mathcal{G}$ in $L^{2}(\mathbb{R}^{n})$ to obtain a sequence, $\{\phi_k\}$ converging to $f$ in $L^{2}(\mathbb{R}^{n})$. Then by Lemma 13.2.13
$$||Ff||_{2}=\lim_{k\to\infty}||F\phi_k||_{2}=\lim_{k\to\infty}||\phi_k||_{2}=||f||_{2}.$$
Similarly,
$$||f||_{2}=||F^{-1}f||_{2}.\ \blacksquare$$
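Plancherel's identity is also pleasant to check numerically in one dimension. The following sketch (an illustration only; the test function, grids, and truncation to a finite interval are arbitrary choices) evaluates $Ff$ by a Riemann sum and compares $||f||_2$ with $||Ff||_2$.

```python
# ||f||_2 = ||F f||_2 in one dimension, with F f(t) = (2 pi)^{-1/2} integral e^{-i t x} f(x) dx.
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
f = (x ** 3 - x) * np.exp(-x ** 2)            # a function in G

t = np.linspace(-10.0, 10.0, 1201)
dt = t[1] - t[0]
Ff = (np.exp(-1j * np.outer(t, x)) @ f) * dx / np.sqrt(2 * np.pi)   # one row per value of t

norm_f = np.sqrt(np.sum(np.abs(f) ** 2) * dx)
norm_Ff = np.sqrt(np.sum(np.abs(Ff) ** 2) * dt)
print(norm_f, norm_Ff)    # the two numbers agree closely
```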
The following corollary is a simple generalization of this. To prove this corollary,
use the following simple lemma which comes as a consequence of the Cauchy Schwarz
inequality.

Lemma 13.2.16 Suppose fk f in L2 (Rn ) and gk g in L2 (Rn ). Then



lim fk gk dx = f gdx
k Rn Rn

Proof:


fk gk dx f gdx fk gk dx fk gdx +

Rn R n Rn Rn


fk gdx f gdx

Rn Rn

||fk ||2 ||g gk ||2 + ||g||2 ||fk f ||2 .


Now ||fk ||2 is a Cauchy sequence and so it is bounded independent of k. Therefore,
the above expression is smaller than whenever k is large enough. 

Corollary 13.2.17 For f, g L2 (Rn ),



f gdx = F f F gdx = F 1 f F 1 gdx.
Rn Rn Rn

Proof: First note the above formula is obvious if f, g G. To see this, note

1
F f F gdx = F f (x) n/2
eixt g (t) dtdx
Rn Rn (2) Rn


1 ( )
= n/2
eixt F f (x) dxg (t)dt = F 1 F f (t) g (t)dt
Rn (2) Rn Rn

= f (t) g (t)dt.
Rn

The formula with F 1 is exactly similar.


Now to verify the corollary, let k f in L2 (Rn ) and let k g in L2 (Rn ).
Then by Lemma 13.2.13

F f F gdx = lim F k F k dx = lim k k dx = f gdx
Rn k Rn k Rn Rn

A similar argument holds for F 1 . 


How does one compute F f and F 1 f ?
Theorem 13.2.18 For f L2 (Rn ), let fr = f XEr where Er is a bounded measur-
able set with Er Rn . Then the following limits hold in L2 (Rn ) .
F f = lim F fr , F 1 f = lim F 1 fr .
r r

Proof: ||f fr ||2 0 and so ||F f F fr ||2 0 and ||F 1 f F 1 fr ||2 0 by


Plancherels Theorem. 
What are F fr and F 1 fr ? Let G

F fr dx = fr F dx
Rn Rn

n
= (2) 2 fr (x)eixy (y)dydx
Rn Rn

n
= [(2) 2 fr (x)eixy dx](y)dy.
Rn Rn

Since this holds for all G, a dense subset of L2 (Rn ), it follows that

n
F fr (y) = (2) 2 fr (x)eixy dx.
Rn

Similarly
1 n
F fr (y) = (2) 2 fr (x)eixy dx.
Rn
This shows that to take the Fourier transform of a function
in L2 (Rn ), it suces
2 n n
to take the limit as r in L (R ) of (2) 2 Rn fr (x)eixy dx. A similar
procedure works for the inverse Fourier transform.
Note this reduces to the earlier denition in case f L1 (Rn ). Now consider the
convolution of a function in L2 with one in L1 .

Theorem 13.2.19 Let h L2 (Rn ) and let f L1 (Rn ). Then h f L2 (Rn ),

F 1 (h f ) = (2) F 1 hF 1 f,
n/2

n/2
F (h f ) = (2) F hF f,
and
||h f ||2 ||h||2 ||f ||1 . (13.2.10)

Proof: An application of Minkowskis inequality yields


( ( )2 )1/2
|h (x y)| |f (y)| dy dx ||f ||1 ||h||2 . (13.2.11)
Rn Rn


Hence |h (x y)| |f (y)| dy < a.e. x and

x h (x y) f (y) dy

is in L2 (Rn ). Let Er Rn , m (Er ) < . Thus,

hr XEr h L2 (Rn ) L1 (Rn ),

and letting G,
F (hr f ) () dx


(hr f ) (F ) dx

n/2
= (2) hr (x y) f (y) eixt (t) dtdydx
( )
n/2
= (2) hr (x y) ei(xy)t dx f (y) eiyt dy (t) dt

n/2
= (2) F hr (t) F f (t) (t) dt.

Since is arbitrary and G is dense in L2 (Rn ),


n/2
F (hr f ) = (2) F hr F f.

Now by Minkowskis Inequality, hr f h f in L2 (Rn ) and also it is clear that


hr h in L2 (Rn ) ; so, by Plancherels theorem, you may take the limit in the above
and conclude
n/2
F (h f ) = (2) F hF f.
The assertion for F 1 is similar and 13.2.10 follows from 13.2.11. 

13.2.4 The Schwartz Class


The problem with G is that it does not contain Cc (Rn ). I have used it in presenting
the Fourier transform because the functions in G have a very specic form which
made some technical details work out easier than in any other approach I have
seen. The Schwartz class is a larger class of functions which does contain Cc (Rn )
and also has the same nice properties as G. The functions in the Schwartz class
are innitely dierentiable and they vanish very rapidly as |x| along with all
their partial derivatives. This is the description of these functions, not a specic
form involving polynomials times e|x| . To describe this precisely requires some
2

notation.

Definition 13.2.20 $f\in\mathcal{S}$, the Schwartz class, if $f\in C^{\infty}(\mathbb{R}^{n})$ and for all positive integers $N$,
$$\rho_N(f)<\infty$$
where
$$\rho_N(f)=\sup\{(1+|x|^{2})^{N}|D^{\alpha}f(x)|:x\in\mathbb{R}^{n},\ |\alpha|\le N\}.$$
Thus $f\in\mathcal{S}$ if and only if $f\in C^{\infty}(\mathbb{R}^{n})$ and
$$\sup\{|x^{\beta}D^{\alpha}f(x)|:x\in\mathbb{R}^{n}\}<\infty \quad (13.2.12)$$
for all multi indices $\alpha$ and $\beta$.

Also note that if f S, then p(f ) S for any polynomial, p with p(0) = 0 and
that
S Lp (Rn ) L (Rn )
for any p 1. To see this assertion about the p (f ), it suces to consider the case
of the product of two elements of the Schwartz class. If f, g S, then D (f g) is
a nite sum of derivatives of f times derivatives of g. Therefore, N (f g) < for
all N . You may wonder about examples of things in S. Clearly any function in
Cc (Rn ) is in S. However there are other functions in S. For example e|x| is in
2

S as you can verify for yourself and so is any function from G. Note also that the
density of Cc (Rn ) in Lp (Rn ) shows that S is dense in Lp (Rn ) for every p.
Recall the Fourier transform of a function in L1 (Rn ) is given by

F f (t) (2)n/2 eitx f (x)dx.
Rn

Therefore, this gives the Fourier transform for f S. The nice property which S
has in common with G is that the Fourier transform and its inverse map S one to
one onto S. This means I could have presented the whole of the above theory in
terms of S rather than in terms of G. However, it is more technical.

Theorem 13.2.21 If f S, then F f and F 1 f are also in S.



Proof: To begin with, let = ej = (0, 0, , 1, 0, , 0), the 1 in the j th slot.



F 1 f (t + hej ) F 1 f (t) eihxj 1
= (2)n/2 eitx f (x)( )dx. (13.2.13)
h Rn h

Consider the integrand in 13.2.13.


i(h/2)x
itx ihxj
1 e ei(h/2)xj
e f (x)( e
j
) = |f (x)| ( )
h h

i sin ((h/2) xj )
= |f (x)| |f (x)| |xj |

(h/2)

and this is a function in L1 (Rn ) because f S. Therefore by the Dominated


Convergence Theorem,

F 1 f (t)
= (2)n/2 eitx ixj f (x)dx
tj
R
n

n/2
= i(2) eitx xej f (x)dx.
Rn

Now xej f (x) S and so one can continue in this way and take derivatives inde-
nitely. Thus F 1 f C (Rn ) and from the above argument,

1 n/2
D F f (t) =(2) eitx (ix) f (x)dx.
Rn

To complete showing F 1 f S,

t D F 1 f (t) =(2)n/2
a
eitx t (ix) f (x)dx.
Rn

Integrate this integral by parts to get



1 n/2
i|| eitx D ((ix) f (x))dx.
a
t D F f (t) =(2) (13.2.14)
Rn

Here is how this is done.



eitj xj j
t (ix) f (x) |

eitj xj tj j (ix) f (x)dxj = +
R itj j

1
i eitj xj tj j Dej ((ix) f (x))dxj
R

where the boundary term vanishes because f S. Returning to 13.2.14, use the
fact that |eia | = 1 to conclude

|t D F 1 f (t)| C
a
|D ((ix) f (x))|dx < .
Rn

It follows F 1 f S. Similarly F f S whenever f S. 


Of course S can be considered a subset of G as follows. For S,

() dx
Rn

Theorem 13.2.22 Let S. Then (F F 1 )() = and (F 1 F )() =


whenever S. Also F and F 1 map S one to one and onto S.

Proof: The rst claim follows from the fact that F and F 1 are inverses of

( 1on)G which was established above. For the second,1let S. Then
each other
= F F . Thus F maps S onto S. If F = 0, then do F to both sides to
conclude = 0. Thus F is one to one and onto. Similarly, F 1 is one to one and
onto. 

13.2.5 Convolution

To begin with it is necessary to discuss the meaning of φf where f ∈ G∗ and φ ∈ G.
What should it mean? First suppose f ∈ L^p(R^n) or measurable with polynomial
growth. Then φf also has these properties. Hence, it should be the case that
φf(ψ) = ∫_{R^n} φ f ψ dx = ∫_{R^n} f (φψ) dx. This motivates the following definition.

Definition 13.2.23  Let T ∈ G∗ and let φ ∈ G. Then φT ≡ Tφ ∈ G∗ will be defined
by
    φT(ψ) ≡ T(φψ).

The next topic is that of convolution. It was just shown that

    F(f∗φ) = (2π)^{n/2} Fφ Ff,   F^{−1}(f∗φ) = (2π)^{n/2} F^{−1}φ F^{−1}f

whenever f ∈ L^2(R^n) and φ ∈ G, so the same definition is retained in the general
case because it makes perfect sense and agrees with the earlier definition.

Definition 13.2.24  Let f ∈ G∗ and let φ ∈ G. Then define the convolution of f
with an element of G as follows.

    f∗φ ≡ (2π)^{n/2} F^{−1}(Fφ Ff) ∈ G∗

There is an obvious question. With this definition, is it true that F^{−1}(f∗φ) =
(2π)^{n/2} F^{−1}φ F^{−1}f as it was earlier?

Theorem 13.2.25  Let f ∈ G∗ and let φ ∈ G. Then

    F(f∗φ) = (2π)^{n/2} Fφ Ff,    (13.2.15)

    F^{−1}(f∗φ) = (2π)^{n/2} F^{−1}φ F^{−1}f.    (13.2.16)

Proof: Note that 13.2.15 follows from Definition 13.2.24 and both assertions
hold for f ∈ G. Consider 13.2.16. Here is a simple formula involving a pair of
functions in G.

    (ψ ∗ F^{−1}F^{−1}φ)(x)
      = ∫ ψ(x − y) ( ∫∫ e^{iy·y_1} e^{iy_1·z} φ(z) dz dy_1 ) dy (2π)^{−n}
      = ∫ ψ(x − y) ( ∫∫ e^{−iy·y_1} e^{−iy_1·z} φ(z) dz dy_1 ) dy (2π)^{−n}
      = (ψ ∗ FFφ)(x),

where the middle step comes from replacing y_1 with −y_1. Now for ψ ∈ G,

    (2π)^{n/2} F(F^{−1}φ F^{−1}f)(ψ) ≡ (2π)^{n/2} (F^{−1}φ F^{−1}f)(Fψ)
      ≡ (2π)^{n/2} F^{−1}f((F^{−1}φ)(Fψ)) ≡ f( (2π)^{n/2} F^{−1}((F^{−1}φ)(Fψ)) )
      = f( F^{−1}F^{−1}φ ∗ ψ ) = f( ψ ∗ FFφ ).    (13.2.17)

Also

    (2π)^{n/2} F^{−1}(Fφ Ff)(ψ) ≡ (2π)^{n/2} (Fφ Ff)(F^{−1}ψ)
      ≡ (2π)^{n/2} Ff((Fφ)(F^{−1}ψ)) ≡ f( (2π)^{n/2} F((Fφ)(F^{−1}ψ)) )
      = f( FFφ ∗ FF^{−1}ψ ) = f( FFφ ∗ ψ ) = f( ψ ∗ FFφ ).    (13.2.18)

The last line follows from the following.

    ∫ FFφ(x − y) ψ(y) dy = ∫ Fφ(x − y) Fψ(y) dy
                          = ∫ φ(x − y) FFψ(y) dy.

From 13.2.18 and 13.2.17, since ψ was arbitrary,

    (2π)^{n/2} F(F^{−1}φ F^{−1}f) = (2π)^{n/2} F^{−1}(Fφ Ff) ≡ f∗φ

which shows 13.2.16. ∎
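As an aside, here is a numerical sketch of the scalar identity F(f∗φ) = (2π)^{n/2} Fφ Ff in one dimension for two ordinary rapidly decreasing functions, using Riemann sums rather than the distributional definitions above; the particular f, φ, grid, and test frequency are arbitrary choices made only for illustration.

```python
import numpy as np

x = np.linspace(-30, 30, 3001)
dx = x[1] - x[0]
f   = np.exp(-x**2)                 # two rapidly decreasing functions
phi = np.exp(-(x - 1)**2 / 2)

def ft(g, t):
    """(2*pi)^(-1/2) * integral e^{-i t x} g(x) dx, by a Riemann sum."""
    return (2*np.pi)**-0.5 * np.sum(np.exp(-1j*t*x) * g) * dx

def conv(t):
    """(f * phi)(t) = integral f(t - y) phi(y) dy, by a Riemann sum."""
    return np.sum(np.exp(-(t - x)**2) * phi) * dx

t = 0.7                              # any fixed frequency
lhs = (2*np.pi)**-0.5 * np.sum(np.exp(-1j*t*x) * np.array([conv(xi) for xi in x])) * dx
rhs = (2*np.pi)**0.5 * ft(phi, t) * ft(f, t)
print(abs(lhs - rhs))                # small, of the order of the quadrature error
```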



13.3 Exercises

1. For f ∈ L^1(R^n), show that if F^{−1}f ∈ L^1 or F f ∈ L^1, then f equals a
   continuous bounded function a.e.

2. Suppose f, g ∈ L^1(R) and F f = F g. Show f = g a.e.

3. Show that if f ∈ L^1(R^n), then lim_{|x|→∞} F f(x) = 0.

4. Suppose f ∗ f = f or f ∗ f = 0 and f ∈ L^1(R). Show f = 0.

5. For this problem define ∫_a^∞ f(t) dt ≡ lim_{r→∞} ∫_a^r f(t) dt. Note this coincides
   with the Lebesgue integral when f ∈ L^1(a, ∞). Show

   (a) ∫_0^∞ (sin u / u) du = π/2,
   (b) lim_{r→∞} ∫_δ^∞ (sin(ru) / u) du = 0 whenever δ > 0,
   (c) if f ∈ L^1(R), then lim_{r→∞} ∫_R sin(ru) f(u) du = 0.

   Hint: For the first two, use 1/u = ∫_0^∞ e^{−ut} dt and apply Fubini's theorem to
   ∫_0^R sin u ∫_0^∞ e^{−ut} dt du. For the last part, first establish it for f ∈ C_c^∞(R) and
   then use the density of this set in L^1(R) to obtain the result. This is sometimes
   called the Riemann Lebesgue lemma.

6. Suppose that g ∈ L^1(R) and that at some x > 0, g is locally Holder
   continuous from the right and from the left. This means

       lim_{r→0+} g(x + r) ≡ g(x+)

   exists,

       lim_{r→0+} g(x − r) ≡ g(x−)

   exists, and there exist constants K, δ > 0 and r ∈ (0, 1] such that for |x − y| < δ,

       |g(x+) − g(y)| < K |x − y|^r

   for y > x and

       |g(x−) − g(y)| < K |x − y|^r

   for y < x. Show that under these conditions,

       lim_{r→∞} (2/π) ∫_0^∞ (sin(ur)/u) ((g(x − u) + g(x + u))/2) du = (g(x+) + g(x−))/2.

7. Let g ∈ L^1(R) and suppose g is locally Holder continuous from the right
   and from the left at x. Show that then

       lim_{R→∞} (1/2π) ∫_{−R}^{R} e^{ixt} ∫_R e^{−ity} g(y) dy dt = (g(x+) + g(x−))/2.

   This is very interesting. If g ∈ L^2(R), this shows F^{−1}(F g)(x) = (g(x+) + g(x−))/2,
   the midpoint of the jump in g at the point x. In particular, if g ∈ G,
   F^{−1}(F g) = g. Hint: Show the left side of the above equation reduces to

       (2/π) ∫_0^∞ (sin(ur)/u) ((g(x − u) + g(x + u))/2) du

   and then use Problem 6 to obtain the result.
8. A measurable function g defined on (0, ∞) has exponential growth if |g(t)| ≤
   Ce^{ηt} for some η. For Re(s) > η, define the Laplace Transform by

       Lg(s) ≡ ∫_0^∞ e^{−su} g(u) du.

   Assume that g has exponential growth as above and is Holder continuous from
   the right and from the left at t. Pick γ > η. Show that

       lim_{R→∞} (1/2π) ∫_{−R}^{R} e^{γt} e^{iyt} Lg(γ + iy) dy = (g(t+) + g(t−))/2.

   This formula is sometimes written in the form

       (1/2πi) ∫_{γ−i∞}^{γ+i∞} e^{st} Lg(s) ds

   and is called the complex inversion integral for Laplace transforms. It can be
   used to find inverse Laplace transforms. Hint:

       (1/2π) ∫_{−R}^{R} e^{γt} e^{iyt} Lg(γ + iy) dy =
       (1/2π) ∫_{−R}^{R} e^{γt} e^{iyt} ∫_0^∞ e^{−(γ+iy)u} g(u) du dy.

   Now use Fubini's theorem and do the integral from −R to R to get this equal
   to

       (e^{γt}/π) ∫_{−∞}^{∞} e^{−γu} g(u) (sin(R(t − u))/(t − u)) du

   where g is the zero extension of g off [0, ∞). Then this equals

       (e^{γt}/π) ∫_{−∞}^{∞} e^{−γ(t−u)} g(t − u) (sin(Ru)/u) du

   which equals

       (2e^{γt}/π) ∫_0^∞ ((g(t − u) e^{−γ(t−u)} + g(t + u) e^{−γ(t+u)})/2) (sin(Ru)/u) du

   and then apply the result of Problem 6.
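The inversion formula in this exercise is easy to test numerically. The following is a rough sketch (not part of the exercise) for the particular choice g(u) = e^{−u}, whose Laplace transform is 1/(s + 1); the values of t, γ, and the truncation level R are arbitrary, and a plain Riemann sum is used for the y integral.

```python
import numpy as np

# Take g(u) = e^{-u}, so Lg(s) = 1/(s + 1) for Re(s) > -1.
def Lg(s):
    return 1.0 / (s + 1.0)

t, gamma, R = 2.0, 0.5, 4000.0            # evaluate g(t); gamma exceeds eta = -1
y = np.linspace(-R, R, 400001)
dy = y[1] - y[0]

# (1/2pi) * integral_{-R}^{R} e^{gamma t} e^{i y t} Lg(gamma + i y) dy
approx = np.real(np.exp(gamma*t) * np.sum(np.exp(1j*y*t) * Lg(gamma + 1j*y)) * dy / (2*np.pi))
print(approx, np.exp(-t))                 # both close to e^{-2}
```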

9. Suppose f ∈ G. Show F(f_{x_j})(t) = i t_j F f(t).

10. Let f ∈ G and let k be a positive integer. Define

        ||f||_{k,2} ≡ ( ||f||_2^2 + Σ_{|α|≤k} ||D^α f||_2^2 )^{1/2}.

    One could also define

        |||f|||_{k,2} ≡ ( ∫_{R^n} |F f(x)|^2 (1 + |x|^2)^k dx )^{1/2}.

    Show both ||·||_{k,2} and |||·|||_{k,2} are norms on G and that they are equivalent.
    These are Sobolev space norms. For which values of k does the second norm
    make sense? How about the first norm?

11. Define H^k(R^n), k ≥ 0, as the set of f ∈ L^2(R^n) such that

        ( ∫ |F f(x)|^2 (1 + |x|^2)^k dx )^{1/2} < ∞,

        |||f|||_{k,2} ≡ ( ∫ |F f(x)|^2 (1 + |x|^2)^k dx )^{1/2}.

    Show H^k(R^n) is a Banach space, and that if k is a positive integer, H^k(R^n)
    = { f ∈ L^2(R^n) : there exists {u_j} ⊆ G with ||u_j − f||_2 → 0 and {u_j} is a
    Cauchy sequence in ||·||_{k,2} of Problem 10 }. This is one way to define Sobolev
    Spaces. Hint: One way to do the second part of this is to define a new
    measure μ by

        μ(E) ≡ ∫_E (1 + |x|^2)^k dx.

    Then show μ is a Borel measure which is inner and outer regular and show
    there exists {g_m} such that g_m ∈ G and g_m → F f in L^2(μ). Thus g_m =
    F f_m, f_m ∈ G because F maps G onto G. Then by Problem 10, {f_m} is
    Cauchy in the norm ||·||_{k,2}.

12. If 2k > n, show that if f ∈ H^k(R^n), then f equals a bounded continuous
    function a.e. Hint: Show that for k this large, F f ∈ L^1(R^n), and then use
    Problem 1. To do this, write

        |F f(x)| = |F f(x)| (1 + |x|^2)^{k/2} (1 + |x|^2)^{−k/2},

    so

        ∫ |F f(x)| dx = ∫ |F f(x)| (1 + |x|^2)^{k/2} (1 + |x|^2)^{−k/2} dx.

    Use the Cauchy Schwarz inequality. This is an example of a Sobolev imbedding
    theorem.

13. Let u ∈ G. Then F u ∈ G and so, in particular, it makes sense to form the
    integral

        ∫_R F u(x', x_n) dx_n

    where (x', x_n) = x ∈ R^n. For u ∈ G, define γu(x') ≡ u(x', 0). Find a
    constant such that F(γu)(x') equals this constant times the above integral.
    Hint: By the dominated convergence theorem

        ∫_R F u(x', x_n) dx_n = lim_{ε→0} ∫_R e^{−(ε x_n)^2} F u(x', x_n) dx_n.

    Now use the definition of the Fourier transform and Fubini's theorem as
    required in order to obtain the desired relationship.
14. Let h(x) = ( ∫_0^x e^{−t^2} dt )^2 + ∫_0^1 ( e^{−x^2(1+t^2)} / (1 + t^2) ) dt. Show that h'(x) = 0 and
    h(0) = π/4. Then let x → ∞ to conclude that ∫_0^∞ e^{−t^2} dt = √π / 2. Show that
    ∫_{−∞}^{∞} e^{−t^2} dt = √π and that ∫_{−∞}^{∞} e^{−ct^2} dt = √(π/c).
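For a quick check of the values in Problem 14 (a sketch only; it assumes SciPy is available for the adaptive quadrature):

```python
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda t: np.exp(-t**2), -np.inf, np.inf)
print(val, np.sqrt(np.pi))                 # both ~ 1.7724538509

c = 3.0
val_c, _ = quad(lambda t: np.exp(-c*t**2), -np.inf, np.inf)
print(val_c, np.sqrt(np.pi / c))           # both ~ 1.0233
```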

15. Recall that for f a function, f_y(x) = f(x − y). Find a relationship between
    F f_y(t) and F f(t) given that f ∈ L^1(R^n).

16. For f ∈ L^1(R^n), simplify F f(t + y).

17. For f ∈ L^1(R^n) and c a nonzero real number, show F f(ct) = F g(t) where
    g(x) = f(x/c).

18. Suppose that f ∈ L^1(R) and that ∫ |x| |f(x)| dx < ∞. Find a way to use the
    Fourier transform of f to compute ∫ x f(x) dx.
Fourier Series

14.1 Definition And Basic Properties

A Fourier series is an expression of the form

    Σ_{k=−∞}^{∞} c_k e^{ikx}

where this means

    lim_{n→∞} Σ_{k=−n}^{n} c_k e^{ikx}.

Obviously such a sequence of partial sums may or may not converge at a particular
value of x.
These series have been important in applied math since the time of Fourier who
was an officer in Napoleon's army. He was interested in studying the flow of heat in
cannons and invented the concept to aid him in his study. Since that time, Fourier
series and the mathematical problems related to their convergence have motivated
the development of modern methods in analysis. As recently as the mid 1960s a
problem related to convergence of Fourier series was solved for the first time and
the solution of this problem was a big surprise.^1 This chapter is on the classical
theory of convergence of Fourier series.
If you can approximate a function f with an expression of the form

    Σ_{k=−∞}^{∞} c_k e^{ikx}

then the function must have the property f(x + 2π) = f(x) because this is true of
every term in the above series. More generally, here is a definition.

^1 The question was whether the Fourier series of a function in L^2 converged a.e. to the function.
It turned out that it did, to the surprise of many because it was known that the Fourier series of
a function in L^1 does not necessarily converge to the function a.e. The problem was solved by
Carleson in 1965.

Definition 14.1.1  A function f defined on R is a periodic function of period T if
f(x + T) = f(x) for all x.

As just explained, Fourier series are useful for representing periodic functions and
no other kind of function. There is no loss of generality in studying only functions
which are periodic of period 2π. Indeed, if f is a function which has period T, you
can study this function in terms of the function g(x) ≡ f(Tx/2π) where g is periodic
of period 2π.

Definition 14.1.2  For f ∈ L^1([−π, π]) (f measurable and ∫_{−π}^{π} |f(t)| dt < ∞) and
f periodic on R, define the Fourier series of f as

    Σ_{k=−∞}^{∞} c_k e^{ikx},    (14.1.1)

where

    c_k ≡ (1/2π) ∫_{−π}^{π} f(y) e^{−iky} dy.    (14.1.2)

Also define the nth partial sum of the Fourier series of f by

    S_n(f)(x) ≡ Σ_{k=−n}^{n} c_k e^{ikx}.    (14.1.3)

It may be interesting to see where this formula came from. Suppose then that

    f(x) = Σ_{k=−∞}^{∞} c_k e^{ikx},

multiply both sides by e^{−imx} and take the integral ∫_{−π}^{π}, so that

    ∫_{−π}^{π} f(x) e^{−imx} dx = ∫_{−π}^{π} Σ_{k=−∞}^{∞} c_k e^{ikx} e^{−imx} dx.

Now switch the sum and the integral on the right side even though there is absolutely
no reason to believe this makes any sense. Then

    ∫_{−π}^{π} f(x) e^{−imx} dx = Σ_{k=−∞}^{∞} c_k ∫_{−π}^{π} e^{ikx} e^{−imx} dx
                              = c_m ∫_{−π}^{π} 1 dx = 2π c_m

because ∫_{−π}^{π} e^{ikx} e^{−imx} dx = 0 if k ≠ m. It is formal manipulations of the sort just
presented which suggest that Definition 14.1.2 might be interesting.
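The coefficients in 14.1.2 are easy to compute numerically. The following sketch (grid size is arbitrary) approximates c_k by a Riemann sum for f(x) = x and compares with the closed form (−1)^k i/k, which is worked out in Example 14.5.3 below.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]
f = x.copy()                                    # f(x) = x on [-pi, pi)

def c(k):
    """Fourier coefficient c_k = (1/2pi) * integral f(y) e^{-iky} dy (Riemann sum)."""
    return np.sum(f * np.exp(-1j*k*x)) * dx / (2*np.pi)

# For f(x) = x one can check by hand that c_k = (-1)^k * i / k for k != 0.
for k in (1, 2, 3):
    print(k, c(k), (-1)**k * 1j / k)            # numerical and exact values agree closely
```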

In case f is real valued, c_{−k} = \overline{c_k} and so

    S_n f(x) = (1/2π) ∫_{−π}^{π} f(y) dy + Σ_{k=1}^{n} 2 Re( c_k e^{ikx} ).

Letting c_k ≡ α_k + i β_k,

    S_n f(x) = (1/2π) ∫_{−π}^{π} f(y) dy + 2 Σ_{k=1}^{n} [ α_k cos kx − β_k sin kx ]

where

    c_k = (1/2π) ∫_{−π}^{π} f(y) e^{−iky} dy = (1/2π) ∫_{−π}^{π} f(y) (cos ky − i sin ky) dy

which shows that

    α_k = (1/2π) ∫_{−π}^{π} f(y) cos(ky) dy,   β_k = −(1/2π) ∫_{−π}^{π} f(y) sin(ky) dy.

Therefore, letting a_k = 2α_k and b_k = −2β_k,

    a_k = (1/π) ∫_{−π}^{π} f(y) cos(ky) dy,   b_k = (1/π) ∫_{−π}^{π} f(y) sin(ky) dy

and

    S_n f(x) = a_0/2 + Σ_{k=1}^{n} ( a_k cos kx + b_k sin kx ).    (14.1.4)

This is often the way Fourier series are presented in elementary courses where it is
only real functions which are to be approximated. However it is easier to stick with
Definition 14.1.2.
The partial sums of a Fourier series can be written in a particularly simple form
which is presented next.

    S_n f(x) = Σ_{k=−n}^{n} c_k e^{ikx}
             = Σ_{k=−n}^{n} ( (1/2π) ∫_{−π}^{π} f(y) e^{−iky} dy ) e^{ikx}
             = ∫_{−π}^{π} ( (1/2π) Σ_{k=−n}^{n} e^{ik(x−y)} ) f(y) dy
             ≡ ∫_{−π}^{π} D_n(x − y) f(y) dy.    (14.1.5)

The function

    D_n(t) ≡ (1/2π) Σ_{k=−n}^{n} e^{ikt}

is called the Dirichlet Kernel.

Theorem 14.1.3  The function D_n satisfies the following:

1. ∫_{−π}^{π} D_n(t) dt = 1

2. D_n is periodic of period 2π

3. D_n(t) = (2π)^{−1} sin((n + 1/2)t) / sin(t/2).

Proof: Part 1 is obvious because (1/2π) ∫_{−π}^{π} e^{iky} dy = 0 whenever k ≠ 0 and it
equals 1 if k = 0. Part 2 is also obvious because t → e^{ikt} is periodic of period 2π.
It remains to verify Part 3. Note

    2π D_n(t) = Σ_{k=−n}^{n} e^{ikt} = 1 + 2 Σ_{k=1}^{n} cos(kt).

Therefore,

    2π D_n(t) sin(t/2) = sin(t/2) + 2 Σ_{k=1}^{n} sin(t/2) cos(kt)
                       = sin(t/2) + Σ_{k=1}^{n} [ sin((k + 1/2)t) − sin((k − 1/2)t) ]
                       = sin((n + 1/2)t)

where the easily verified trig. identity cos(a) sin(b) = (1/2)(sin(a + b) − sin(a − b)) is
used to get to the second line. ∎
Here is a picture of the Dirichlet kernels for n = 1, 2, and 3.

[Figure: graphs of D_n(x) for n = 1, 2, 3 on [−π, π].]

Note they are not nonnegative but there is a large central positive bump which
gets larger as n gets larger.
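A short numerical check of the closed form in Theorem 14.1.3 (a sketch only; the sample points avoid t = 0 where the quotient must be defined by its limit):

```python
import numpy as np

def Dn_sum(t, n):
    """D_n(t) = (1/2pi) * sum_{k=-n}^{n} e^{ikt}."""
    k = np.arange(-n, n + 1)
    return np.real(np.sum(np.exp(1j*np.outer(t, k)), axis=1)) / (2*np.pi)

def Dn_closed(t, n):
    """Closed form (2pi)^{-1} sin((n + 1/2)t) / sin(t/2)."""
    return np.sin((n + 0.5)*t) / (2*np.pi*np.sin(t/2))

t = np.linspace(0.1, 3.0, 50)             # stay away from t = 0
print(np.max(np.abs(Dn_sum(t, 5) - Dn_closed(t, 5))))    # ~ 1e-15
```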
It is not reasonable to expect a Fourier series to converge to the function at
every point. To see this, change the value of the function at a single point in
(−π, π) and extend to keep the modified function periodic. Then the Fourier series
of the modified function is the same as the Fourier series of the original function and
so if pointwise convergence did take place, it no longer does. However, it is possible
to prove an interesting theorem about pointwise convergence of Fourier series. This
is done next.

14.2 The Riemann Lebesgue Lemma

The Riemann Lebesgue lemma is the basic result which makes possible the study of
pointwise convergence of Fourier series. It is also a major result in other contexts
and serves as a useful example.

Lemma 14.2.1 (Riemann Lebesgue)  Let f ∈ L^1(R). Then

    lim_{λ→∞} ∫_R f(t) sin(λt + β) dt = 0.    (14.2.6)

Proof: By Theorem 12.9.9, there exists g ∈ C_c^∞(R) such that

    ||f − g||_{L^1(R)} < ε/2.

Then for spt(g) ⊆ (a, b),

    ∫_R g(t) sin(λt + β) dt = −g(t) (cos(λt + β)/λ) |_a^b + (1/λ) ∫_a^b g'(t) cos(λt + β) dt,

the boundary term is 0, and this converges to 0 as λ → ∞ since g' is bounded. Therefore,

    | ∫ f(t) sin(λt + β) dt | ≤ | ∫ f(t) sin(λt + β) dt − ∫ g(t) sin(λt + β) dt |
                                + | ∫_R g(t) sin(λt + β) dt |
                              ≤ ∫ |f(t) − g(t)| dt + ε/2 < ε/2 + ε/2 = ε

provided λ is large enough. ∎
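A small numerical illustration of 14.2.6 (a sketch only; the Gaussian, the phase β, and the grid are arbitrary choices, and a fine Riemann sum stands in for the integral):

```python
import numpy as np

t = np.linspace(-20, 20, 2_000_001)        # fine grid; f is negligible outside [-20, 20]
dt = t[1] - t[0]
f = np.exp(-t**2)                          # any fixed L^1 function
beta = 0.3

for lam in (1, 10, 100, 1000):
    print(lam, np.sum(f*np.sin(lam*t + beta)) * dt)   # tends to 0 as lam grows
```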

14.3 Dini's Criterion For Convergence

Fourier series like to converge to the midpoint of the jump of a function under
certain conditions. The condition given for convergence in the following theorem
is due to Dini. It is a generalization of the usual theorem presented in elementary
books on Fourier series methods. [3].
Recall

    lim_{t→x+} f(t) ≡ f(x+),  and  lim_{t→x−} f(t) ≡ f(x−).

Theorem 14.3.1  Let f be a periodic function of period 2π which is in L^1([−π, π]).
Suppose at some x, f(x+) and f(x−) both exist and that the function

    y → (f(x − y) − f(x−) + f(x + y) − f(x+)) / y ≡ h(y)    (14.3.7)

is in L^1([−π, π]). Then

    lim_{n→∞} S_n f(x) = (f(x+) + f(x−)) / 2.    (14.3.8)

Proof:

    S_n f(x) = ∫_{−π}^{π} D_n(x − y) f(y) dy.

Change variables x − y → y and use the periodicity of f and D_n along with the
formula for D_n(y) to write this as

    S_n f(x) = ∫_{−π}^{π} D_n(y) f(x − y) dy
             = ∫_{−π}^{0} D_n(y) f(x − y) dy + ∫_{0}^{π} D_n(y) f(x − y) dy
             = ∫_{0}^{π} D_n(y) [ f(x − y) + f(x + y) ] dy
             = (1/π) ∫_{0}^{π} ( sin((n + 1/2)y) / sin(y/2) ) ( (f(x − y) + f(x + y)) / 2 ) dy.    (14.3.9)

Note the function

    y → sin((n + 1/2)y) / sin(y/2),

while it is not defined at 0, is at least bounded and by L'Hospital's rule,

    lim_{y→0} sin((n + 1/2)y) / sin(y/2) = 2n + 1,

so defining it to equal this value at 0 yields a continuous, hence Riemann integrable
function, and so the above integral at least makes sense. Also from the property
that ∫_{−π}^{π} D_n(t) dt = 1,

    f(x+) + f(x−) = ∫_{−π}^{π} D_n(y) [ f(x+) + f(x−) ] dy
                  = 2 ∫_{0}^{π} D_n(y) [ f(x+) + f(x−) ] dy
                  = (1/π) ∫_{0}^{π} ( sin((n + 1/2)y) / sin(y/2) ) [ f(x+) + f(x−) ] dy.

Therefore,

    S_n f(x) − (f(x+) + f(x−))/2 =

    (1/π) ∫_{0}^{π} ( sin((n + 1/2)y) / sin(y/2) ) ( (f(x − y) − f(x−) + f(x + y) − f(x+)) / 2 ) dy.    (14.3.10)
Now the function

    y → (f(x − y) − f(x−) + f(x + y) − f(x+)) / (2 sin(y/2))    (14.3.11)

is in L^1([0, π]) and so, extending it to equal 0 off this interval produces a function
which is in L^1(R). I need to verify this assertion. When this is done, the desired
result will follow right away from the Riemann Lebesgue lemma. There is clearly
no measurability difficulty with the above function. Also, by the assumption given,
the function equals

    ( (f(x − y) − f(x−) + f(x + y) − f(x+)) / y ) · ( y / (2 sin(y/2)) ).

The function y → y / (2 sin(y/2)) is bounded on (0, π) so the result is in L^1([0, π]). ∎
The following corollary is obtained immediately from the above proof with minor
modifications.

Corollary 14.3.2  Let f be a periodic function of period 2π which is an element of
L^1([−π, π]). Suppose at some x, the function

    y → (f(x − y) + f(x + y) − 2s) / y    (14.3.12)

is in L^1([0, π]). Then

    lim_{n→∞} S_n f(x) = s.    (14.3.13)

The following corollary gives an easy to check condition for the Fourier series to
converge to the midpoint of the jump.

Corollary 14.3.3  Let f be a periodic function of period 2π which is an element
of L^1([−π, π]). Suppose at some x, f(x+) and f(x−) both exist and there exist
positive constants K and δ such that whenever 0 < y < δ,

    |f(x − y) − f(x−)| ≤ K y^θ,   |f(x + y) − f(x+)| < K y^θ    (14.3.14)

where θ ∈ (0, 1]. Then

    lim_{n→∞} S_n f(x) = (f(x+) + f(x−)) / 2.    (14.3.15)

Proof: The condition 14.3.14 clearly implies Dini's condition, 14.3.7. This is
because for 0 < y < δ,

    |f(x − y) − f(x−) + f(x + y) − f(x+)| / y ≤ 2K y^{θ−1}
and so

    ∫_{0}^{π} ( |f(x − y) − f(x−) + f(x + y) − f(x+)| / y ) dy

    ≤ ∫_{0}^{δ} 2K y^{θ−1} dy + (1/δ) ∫_{δ}^{π} |f(x − y) − f(x−) + f(x + y) − f(x+)| dy.

Now

    ∫_{ε}^{δ} 2K y^{θ−1} dy = (2K/θ)(δ^θ − ε^θ)

which converges to (2K/θ) δ^θ as ε → 0. Thus

    lim_{ε→0+} ∫_{ε}^{π} ( |f(x − y) − f(x−) + f(x + y) − f(x+)| / y ) dy

exists and so, from the monotone convergence theorem the function

    y → (f(x − y) − f(x−) + f(x + y) − f(x+)) / y

is in L^1([0, π]). This is the Dini condition. ∎
As pointed out by Apostol [3], where you can read these theorems presented in
the context of the Riemann integral, this is a very surprising result because even
though the Fourier coefficients depend on the values of the function on all of [−π, π],
the convergence properties depend in this theorem on very local behavior of the
function.

14.4 Jordan's Criterion

There is a different condition which implies the Fourier series converges to the mid-
point of the jump. In order to prove the theorem, there are some interesting lemmas
which are needed.

Lemma 14.4.1  Let G be an increasing function defined on [a, b]. Thus G(x) ≤
G(y) whenever x < y. Then G(x) = G(x+) = G(x−) for every x except for a
countable set of exceptions.

Proof: Let S ≡ {x ∈ [a, b] : G(x+) > G(x−)}. Then there is a rational number
in each interval (G(x−), G(x+)) and also, since G is increasing, these intervals are
disjoint. It follows that there are only countably many such intervals. Therefore,
S is countable and if x ∉ S, G(x+) = G(x−), showing that G is continuous on S^C
and the claimed equality holds. ∎

The next lemma is called the second mean value theorem for integrals.

Lemma 14.4.2  Let G be an increasing function defined on [a, b] and let f be a
continuous function defined on [a, b]. Then there exists t_0 ∈ [a, b] such that

    ∫_a^b G(s) f(s) ds = G(a) ( ∫_a^{t_0} f(s) ds ) + G(b) ( ∫_{t_0}^b f(s) ds ).    (14.4.16)
Proof: Letting h > 0, define

    G_h(t) ≡ (1/h^2) ∫_{t−h}^{t} ∫_{s−h}^{s} G(r) dr ds

where G(x) ≡ G(a) for all x < a. Thus G_h(a) = G(a). Also, from the fundamental
theorem of calculus, G_h'(t) ≥ 0 and G_h' is a continuous function of t. Also it is clear
that lim_{h→0} G_h(t) = G(t−) for all t ∈ [a, b]. Letting F(t) ≡ ∫_a^t f(s) ds,

    ∫_a^b G_h(s) f(s) ds = F(t) G_h(t) |_a^b − ∫_a^b F(t) G_h'(t) dt.    (14.4.17)

Now letting m = min{F(t) : t ∈ [a, b]} and M = max{F(t) : t ∈ [a, b]}, since
G_h'(t) ≥ 0,

    m ∫_a^b G_h'(t) dt ≤ ∫_a^b F(t) G_h'(t) dt ≤ M ∫_a^b G_h'(t) dt.

Therefore, if ∫_a^b G_h'(t) dt ≠ 0,

    m ≤ ( ∫_a^b F(t) G_h'(t) dt ) / ( ∫_a^b G_h'(t) dt ) ≤ M

and so by the intermediate value theorem from calculus,

    ( ∫_a^b G_h'(t) dt ) F(t_h) = ∫_a^b F(t) G_h'(t) dt

for some t_h ∈ [a, b]. This is true even if ∫_a^b G_h'(t) dt = 0 because in this case, the
left side equals 0. Since G_h' ≥ 0 and is continuous, it must equal 0 and so the right
side is also 0. Therefore, substituting for

    ∫_a^b F(t) G_h'(t) dt

in 14.4.17,

    ∫_a^b G_h(s) f(s) ds = F(t) G_h(t) |_a^b − F(t_h) ∫_a^b G_h'(t) dt
                        = F(b) G_h(b) − F(t_h) G_h(b) + F(t_h) G_h(a)
                        = ( ∫_{t_h}^b f(s) ds ) G_h(b) + ( ∫_a^{t_h} f(s) ds ) G_h(a).

Now selecting a convergent subsequence, still denoted by h, which converges to zero,
let t_h → t_0 ∈ [a, b]. Therefore, using the dominated convergence theorem or simply
the continuity of f and the above lemma,

    ∫_a^b G(s) f(s) ds = ∫_a^b G(s−) f(s) ds = lim_{h→0} ∫_a^b G_h(s) f(s) ds
        = lim_{h→0} [ ( ∫_{t_h}^b f(s) ds ) G_h(b) + ( ∫_a^{t_h} f(s) ds ) G_h(a) ]
        = ( ∫_{t_0}^b f(s) ds ) G(b) + ( ∫_a^{t_0} f(s) ds ) G(a). ∎

The above lemma will be used in the following lemma from Apostol [3].

Lemma 14.4.3  Let G be increasing on [0, δ]. Then for δ > 0,

    lim_{λ→∞} ∫_0^δ G(y) (sin(λy)/y) dy = (π/2) G(0+).

Proof: Let 0 < h < δ. Then ∫_0^δ G(y) (sin(λy)/y) dy =

    ∫_0^h (G(y) − G(0+)) (sin(λy)/y) dy + G(0+) ∫_0^h (sin(λy)/y) dy + ∫_h^δ G(y) (sin(λy)/y) dy.

From the mean value theorem above, the first integral equals

    (G(h) − G(0+)) ∫_{t_0}^{h} (sin(λy)/y) dy

for some t_0 ∈ [0, h] (redefining G(0) ≡ G(0+), which does not change the integral).
After changing the variable, this last integral is bounded uniformly in λ because
∫_0^R (sin u / u) du converges to π/2 as R → ∞. Use Problem 6 on Page 233; see also
Problem 19 on Page 358 below. Therefore, if h is chosen small enough, the first
term is bounded by ε/3. Fix such an h. Then as λ → ∞ the second term converges
to (π/2) G(0+). The last term converges to 0 by the Riemann Lebesgue lemma.
Therefore, fixing h as described,

    | ∫_0^δ G(y) (sin(λy)/y) dy − (π/2) G(0+) | ≤ | ∫_0^h (G(y) − G(0+)) (sin(λy)/y) dy |
        + | G(0+) ∫_0^h (sin(λy)/y) dy − (π/2) G(0+) | + | ∫_h^δ G(y) (sin(λy)/y) dy |
        < ε/3 + ε/3 + ε/3 = ε

provided λ is large enough. ∎
Definition 14.4.4  Let f : [a, b] → C be a function. Then f is of bounded variation
if

    sup{ Σ_{i=1}^{n} |f(t_i) − f(t_{i−1})| : a = t_0 < ⋯ < t_n = b } ≡ V(f, [a, b]) < ∞

where the sums are taken over all possible lists {a = t_0 < ⋯ < t_n = b}. The sym-
bol V(f, [a, b]) is known as the total variation on [a, b].
Lemma 14.4.5  A real valued function f, defined on an interval [a, b], is of bounded
variation if and only if there are increasing functions H and G defined on [a, b]
such that f(t) = H(t) − G(t). A complex valued function is of bounded variation
if and only if the real and imaginary parts are of bounded variation.

Proof: For f a real valued function of bounded variation, define an increasing
function H(t) ≡ V(f, [a, t]) and then note that

    f(t) = H(t) − [H(t) − f(t)]

where the bracketed term is G(t). It is routine to verify that G(t) is increasing.
Conversely, if f(t) = H(t) − G(t) where H and G are increasing, the total variation
for H is just H(b) − H(a) and the total variation for G is G(b) − G(a). Therefore,
the total variation for f is bounded by the sum of these.
The last claim follows from the observation that

    |f(t_i) − f(t_{i−1})| ≥ max( |Re f(t_i) − Re f(t_{i−1})|, |Im f(t_i) − Im f(t_{i−1})| )

and

    |Re f(t_i) − Re f(t_{i−1})| + |Im f(t_i) − Im f(t_{i−1})| ≥ |f(t_i) − f(t_{i−1})|. ∎

Since a bounded variation function is the difference of increasing functions, this
immediately implies the following corollary.

Corollary 14.4.6  If f is of bounded variation on [0, δ], then

    lim_{λ→∞} ∫_0^δ f(y) (sin(λy)/y) dy = (π/2) f(0+).
With this corollary, here is the main theorem, the Jordan criterion for pointwise
convergence of the Fourier series.

Theorem 14.4.7  Suppose f is 2π periodic and is in L^1(−π, π). Suppose also that
for some δ > 0, f is of bounded variation on [x − δ, x + δ]. Then

    lim_{n→∞} S_n f(x) = (f(x+) + f(x−)) / 2.    (14.4.18)

Proof: First note that from Definition 14.4.4, lim_{y→x−} Re f(y) exists because
Re f is the difference of two increasing functions. Similarly this limit will exist for
Im f by the same reasoning, and limits of the form lim_{y→x+} will also exist. Then

    S_n f(x) − (f(x+) + f(x−))/2
      = ∫_{−π}^{π} D_n(y) f(x − y) dy − (f(x+) + f(x−))/2
      = ∫_{0}^{π} D_n(y) [ (f(x + y) − f(x+)) + (f(x − y) − f(x−)) ] dy.

Now the Dirichlet kernel D_n(y) is a constant multiple of

    sin((n + 1/2)y) / sin(y/2)

and so the Riemann Lebesgue lemma implies

    lim_{n→∞} ∫_{δ}^{π} D_n(y) [ (f(x + y) − f(x+)) + (f(x − y) − f(x−)) ] dy = 0.

Thus it suffices to show that

    lim_{n→∞} ∫_{0}^{δ} D_n(y) [ (f(x + y) − f(x+)) + (f(x − y) − f(x−)) ] dy = 0.    (14.4.19)

Now y → (f(x + y) − f(x+)) + (f(x − y) − f(x−)) ≡ h(y) is of bounded variation
for y ∈ [0, δ] and lim_{y→0+} h(y) = h(0+) = 0. The above limit equals

    lim_{n→∞} (1/2π) ∫_{0}^{δ} ( sin((n + 1/2)y) / y ) ( (y / sin(y/2)) h(y) ) dy
        = (1/2π) (π/2) lim_{y→0+} (y / sin(y/2)) h(y) = 0

by Corollary 14.4.6. ∎
It is known that neither the Jordan criterion nor the Dini criterion implies the
other. See Problem 21.

14.5 Integrating And Differentiating Fourier Series

You can typically integrate Fourier series term by term and things will work out
according to your expectations. More precisely, here is the main theorem.

Theorem 14.5.1  Let f be 2π periodic and in L^1([−π, π]). Then for x ∈ [−π, π],

    ∫_{−π}^{x} f(t) dt = ∫_{−π}^{x} a_0 dt + lim_{n→∞} Σ_{k=−n, k≠0}^{n} a_k ∫_{−π}^{x} e^{ikt} dt

where a_k are the Fourier coefficients of f.

To prove this theorem, here is a lemma.

Lemma 14.5.2  Suppose f ∈ L^1(−π, π) and ∫_{−π}^{π} f dx = 0. Then

    ∫_{−π}^{π} ∫_{−π}^{x} f(t) dt dx = −∫_{−π}^{π} t f(t) dt,

    ∫_{−π}^{π} e^{−ikx} ∫_{−π}^{x} f(t) dt dx = ∫_{−π}^{π} f(t) ( e^{−ikt} / (ik) ) dt.
Proof: This comes from a use of Fubini's theorem.

    ∫_{−π}^{π} ∫_{−π}^{x} f(t) dt dx = ∫_{−π}^{π} f(t) ∫_{t}^{π} dx dt = ∫_{−π}^{π} f(t)(π − t) dt = −∫_{−π}^{π} t f(t) dt.

    ∫_{−π}^{π} e^{−ikx} ∫_{−π}^{x} f(t) dt dx = ∫_{−π}^{π} f(t) ∫_{t}^{π} e^{−ikx} dx dt
        = ∫_{−π}^{π} f(t) ( (e^{−ikt} − e^{−ikπ}) / (ik) ) dt = ∫_{−π}^{π} f(t) ( e^{−ikt} / (ik) ) dt. ∎

Proof of the theorem: First suppose a_0 ≡ (1/2π) ∫_{−π}^{π} f(t) dt = 0. Of course this
happens if and only if ∫_{−π}^{π} f(t) dt = 0. Then letting F(x) ≡ ∫_{−π}^{x} f(t) dt, it follows
that F(−π) = F(π) = 0 and that if F is extended to be 2π periodic, the resulting
function is continuous and of bounded variation on any interval. What is the Fourier
series of F? Denote its Fourier coefficients by A_k and the Fourier coefficients of f
by a_k. Then from the lemma,

    A_0 = (1/2π) ∫_{−π}^{π} ∫_{−π}^{x} f(t) dt dx = −(1/2π) ∫_{−π}^{π} t f(t) dt,

    A_k = (1/2π) ∫_{−π}^{π} e^{−ikx} ∫_{−π}^{x} f(t) dt dx = (1/2π) ∫_{−π}^{π} f(t) ( e^{−ikt} / (ik) ) dt = (1/ik) a_k.

Since F is continuous and of bounded variation, it follows that for all x,

    F(x) = −(1/2π) ∫_{−π}^{π} t f(t) dt + Σ_{k≠0} (1/ik) a_k e^{ikx}.

In particular, this is true when x = −π and then

    0 = −(1/2π) ∫_{−π}^{π} t f(t) dt + Σ_{k≠0} (1/ik) a_k e^{−ikπ}.

It follows that

    F(x) = Σ_{k≠0} (1/ik) a_k ( e^{ikx} − e^{−ikπ} ) = ∫_{−π}^{x} a_0 dt + lim_{n→∞} Σ_{k=−n, k≠0}^{n} a_k ∫_{−π}^{x} e^{ikt} dt

which proves the theorem when a_0 = 0. Now in general, consider g ≡ f − a_0. For
k ≠ 0,

    ∫_{−π}^{π} g(t) e^{−ikt} dt = ∫_{−π}^{π} f(t) e^{−ikt} dt ≡ 2π a_k.

Then from what was just shown,

    ∫_{−π}^{x} g dt = ∫_{−π}^{x} f dt − a_0 (x + π) = Σ_{k≠0} a_k ∫_{−π}^{x} e^{ikt} dt

showing that for all x,

    ∫_{−π}^{x} f dt = ∫_{−π}^{x} a_0 dt + lim_{n→∞} Σ_{k=−n, k≠0}^{n} a_k ∫_{−π}^{x} e^{ikt} dt. ∎

Example 14.5.3  Let f(x) = x for x ∈ [−π, π) and extend f to make it 2π periodic.
Then the Fourier coefficients of f are

    a_0 = 0,   a_k = (1/2π) ∫_{−π}^{π} t e^{−ikt} dt = (i/k) cos kπ = ((−1)^k i) / k.

Therefore,

    ∫_{−π}^{x} t dt = x^2/2 − π^2/2
        = lim_{n→∞} Σ_{k=−n, k≠0}^{n} ( (−1)^k i / k ) ∫_{−π}^{x} e^{ikt} dt
        = lim_{n→∞} Σ_{k=−n, k≠0}^{n} ( (−1)^k i / k ) ( (sin xk)/k + i ((−1)^k − cos xk)/k ).

For fun, let x = 0 and conclude

    −π^2/2 = lim_{n→∞} Σ_{k=−n, k≠0}^{n} ( (−1)^k i / k ) ( i ((−1)^k − 1)/k )
           = lim_{n→∞} Σ_{k=−n, k≠0}^{n} ( (−1)^{k+1} / k ) ( ((−1)^k − 1)/k )
           = lim_{n→∞} 2 Σ_{k=1}^{n} ( (−1)^k − 1 ) / k^2 = −4 Σ_{k=1}^{∞} 1 / (2k − 1)^2

and so

    π^2/8 = Σ_{k=1}^{∞} 1 / (2k − 1)^2.
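The identity just obtained is easy to test numerically (a quick sketch; the truncation point is arbitrary):

```python
import numpy as np

k = np.arange(1, 200001)
partial = np.sum(1.0 / (2*k - 1)**2)
print(partial, np.pi**2 / 8)          # both ~ 1.2337
```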

14.6 Differentiating Fourier Series

Of course it is not reasonable to suppose you can differentiate a Fourier series term
by term and get good results.
Consider the series for f(x) = 1 if x ∈ (0, π] and f(x) = −1 on (−π, 0) with
f(0) = 0. In this case a_0 = 0, and

    a_k = (1/2π) ( ∫_0^{π} e^{−ikt} dt − ∫_{−π}^{0} e^{−ikt} dt ) = i (cos kπ − 1) / (πk),
so the Fourier series is

    Σ_{k≠0} ( ((−1)^k − 1) / (πk) ) i e^{ikx}.

What happens if you differentiate it term by term? It gives

    Σ_{k≠0} ( (1 − (−1)^k) / π ) e^{ikx}

which fails to converge anywhere because the k th term fails to converge to 0. This
is in spite of the fact that f has a derivative away from 0.
However, it is possible to prove some theorems which let you differentiate a
Fourier series term by term. Here is one such theorem.

Theorem 14.6.1  Suppose for x ∈ [−π, π],

    f(x) = ∫_{−π}^{x} f'(t) dt + f(−π)

where f is also periodic of period 2π and f' is in L^1([−π, π]). Then if

    f(x) = Σ_{k=−∞}^{∞} a_k e^{ikx},

it follows the Fourier series of f' is

    Σ_{k=−∞}^{∞} a_k ik e^{ikx}.

Proof: Since f' is 2π periodic (why?), it follows from Theorem 14.5.1 that

    f(x) − f(−π) = Σ_{k=−∞}^{∞} b_k ∫_{−π}^{x} e^{ikt} dt

where b_k is the k th Fourier coefficient of f'. Thus

    b_k = (1/2π) ∫_{−π}^{π} f'(t) e^{−ikt} dt.

If k = 0, b_0 = (1/2π)(f(π) − f(−π)) = 0 by periodicity of f. For k ≠ 0,

    b_k = (1/2π) ∫_{−π}^{π} f'(t) ( −ik ∫_{−π}^{t} e^{−iks} ds + (−1)^k ) dt.

Since ∫_{−π}^{π} f'(t) dt = 0, this equals

    (1/2π) ∫_{−π}^{π} f'(t) ( −ik ∫_{−π}^{t} e^{−iks} ds ) dt
and now using Fubini's theorem

    = −ik (1/2π) ∫_{−π}^{π} ∫_{s}^{π} f'(t) dt e^{−iks} ds = −ik (1/2π) ∫_{−π}^{π} e^{−iks} ( f(π) − f(s) ) ds

    = ik (1/2π) ∫_{−π}^{π} f(s) e^{−iks} ds = ik a_k

because ∫_{−π}^{π} e^{−iks} ds = 0. It follows the Fourier series for f' is

    Σ_{k=−∞}^{∞} ik a_k e^{ikx}

as claimed. ∎
Note the conclusion of this theorem is only about the Fourier series of f'. It does
not say the Fourier series of f' converges pointwise to f'. However, if f' satisfies a
Dini condition, then this will also occur. For example, if f' has a bounded derivative
at every point, then by the mean value theorem |f'(x) − f'(y)| ≤ K |x − y|, and
this is enough to show the Fourier series converges to f'(x).

14.7 Ways Of Approximating Functions

Given above is a theorem about Fourier series converging pointwise to a periodic
function or more generally to the mid point of the jump of the function. Notice
that some sort of smoothness of the function approximated was required, the Dini
condition or the Jordan condition. It can be shown that if this sort of thing is not
present, the Fourier series of a continuous periodic function may fail to converge
to it in a very spectacular manner. In fact, Fourier series don't do very well at
converging pointwise. However, there is another way of converging at which Fourier
series cannot be beat. It is mean square convergence. This is the same as converging
in L^2([−π, π]).

Definition 14.7.1  Let f be a function defined on an interval [a, b]. Then a se-
quence {g_n} of functions is said to converge uniformly to f on [a, b] if

    lim_{n→∞} sup{ |f(x) − g_n(x)| : x ∈ [a, b] } ≡ lim_{n→∞} ||f − g_n||_∞ = 0.

The sequence is said to converge mean square to f if

    lim_{n→∞} ||f − g_n||_2 ≡ lim_{n→∞} ( ∫_a^b |f − g_n|^2 dx )^{1/2} = 0.

14.7.1 Uniform Approximation With Trig. Polynomials

It turns out that if you don't insist the a_k be the Fourier coefficients, then every
continuous 2π periodic function f(θ) can be approximated uniformly with a
Trig. polynomial of the form

    p_n(θ) ≡ Σ_{k=−n}^{n} a_k e^{ikθ}.

This means that for all ε > 0 there exists a p_n(θ) such that

    ||f − p_n||_∞ < ε.

Definition 14.7.2  Recall the nth partial sum of the Fourier series S_n f(x) is given
by

    S_n f(x) = ∫_{−π}^{π} D_n(x − y) f(y) dy = ∫_{−π}^{π} D_n(t) f(x − t) dt

where D_n(t) is the Dirichlet kernel,

    D_n(t) = (2π)^{−1} sin((n + 1/2)t) / sin(t/2).

The Fejer mean σ_{n+1} f(x) is the average of the first n + 1 partial sums S_0 f(x), ⋯, S_n f(x).
Thus

    σ_{n+1} f(x) ≡ (1/(n+1)) Σ_{k=0}^{n} S_k f(x) = ∫_{−π}^{π} ( (1/(n+1)) Σ_{k=0}^{n} D_k(t) ) f(x − t) dt.

The Fejer kernel is

    F_{n+1}(t) ≡ (1/(n+1)) Σ_{k=0}^{n} D_k(t).

As was the case with the Dirichlet kernel, the Fejer kernel has some properties.

Lemma 14.7.3  The Fejer kernel has the following properties.

1. F_{n+1}(t) = F_{n+1}(t + 2π)

2. ∫_{−π}^{π} F_{n+1}(t) dt = 1

3. ∫_{−π}^{π} F_{n+1}(t) f(x − t) dt = Σ_{k=−n}^{n} b_k e^{ikx} for a suitable choice of b_k.

4. F_{n+1}(t) = (1 − cos((n + 1)t)) / (4π (n + 1) sin^2(t/2)),  F_{n+1}(t) ≥ 0,  F_{n+1}(t) = F_{n+1}(−t).

5. For every δ > 0,

       lim_{n→∞} sup{ F_{n+1}(t) : π ≥ |t| ≥ δ } = 0.

   In fact, for |t| ≥ δ,

       F_{n+1}(t) ≤ 2 / ( (n + 1) sin^2(δ/2) 4π ).
Proof: Part 1.) is obvious because F_{n+1} is the average of functions for which
this is true.
Part 2.) is also obvious for the same reason as Part 1.). Part 3.) is obvious
because it is true for D_n in place of F_{n+1} and then taking the average yields the
same sort of sum.
The last statements in 4.) are obvious from the formula which is the only hard
part of 4.).

    F_{n+1}(t) = (1 / ((n + 1) 2π sin(t/2))) Σ_{k=0}^{n} sin((k + 1/2)t)
              = (1 / ((n + 1) 2π sin^2(t/2))) Σ_{k=0}^{n} sin((k + 1/2)t) sin(t/2).

Using the identity sin(a) sin(b) = (cos(a − b) − cos(a + b))/2 with a = (k + 1/2)t and
b = t/2, it follows

    F_{n+1}(t) = (1 / ((n + 1) 4π sin^2(t/2))) Σ_{k=0}^{n} ( cos(kt) − cos((k + 1)t) )
              = (1 − cos((n + 1)t)) / ((n + 1) 4π sin^2(t/2))

which completes the demonstration of 4.).
Next consider 5.). Since F_{n+1} is even it suffices to show

    lim_{n→∞} sup{ F_{n+1}(t) : π ≥ t ≥ δ } = 0.

For the given t, π ≥ |t| ≥ δ,

    F_{n+1}(t) = (1 − cos((n + 1)t)) / ((n + 1) 4π sin^2(t/2)) ≤ 2 / ((n + 1) 4π sin^2(δ/2))

which shows 5.). ∎


Here is a picture of the Fejer kernels for n = 2, 4, 6.

[Figure: graphs of F_n(t) for n = 2, 4, 6 on [−π, π].]

Note how these kernels are nonnegative, unlike the Dirichlet kernels. Also there
is a large bump in the center which gets increasingly large as n gets larger. The
fact these kernels are nonnegative is what is responsible for the superior ability of
the Fejer means to approximate a continuous function.
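One can see this difference numerically. The following sketch compares partial sums and Fejer means for the square wave of Section 14.6 (its coefficients give the sine series Σ_{odd k} (4/πk) sin kx); the truncation n = 30 is an arbitrary choice.

```python
import numpy as np

# f is the 2*pi periodic square wave: 1 on (0, pi), -1 on (-pi, 0).
x = np.linspace(-np.pi, np.pi, 4001)

def S(n, x):
    """n-th partial sum; only odd k contribute, with coefficient 4/(pi k)."""
    out = np.zeros_like(x)
    for k in range(1, n + 1):
        if k % 2 == 1:
            out += 4/(np.pi*k) * np.sin(k*x)
    return out

def sigma(n, x):
    """Fejer mean: the average of S_0 f, ..., S_n f."""
    return sum(S(k, x) for k in range(n + 1)) / (n + 1)

n = 30
print(np.max(np.abs(S(n, x))), np.max(np.abs(sigma(n, x))))
# The partial sums overshoot 1 (Gibbs phenomenon); the Fejer means do not.
```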
Theorem 14.7.4  Let f be a continuous and 2π periodic function. Then

    lim_{n→∞} ||f − σ_{n+1} f||_∞ = 0.

Proof: Let ε > 0 be given. Then by part 2. of Lemma 14.7.3,

    |f(x) − σ_{n+1} f(x)| = | f(x) ∫_{−π}^{π} F_{n+1}(y) dy − ∫_{−π}^{π} F_{n+1}(y) f(x − y) dy |

    = | ∫_{−π}^{π} ( f(x) − f(x − y) ) F_{n+1}(y) dy |

    ≤ ∫_{−π}^{π} |f(x) − f(x − y)| F_{n+1}(y) dy

    = ∫_{−δ}^{δ} |f(x) − f(x − y)| F_{n+1}(y) dy + ∫_{δ}^{π} |f(x) − f(x − y)| F_{n+1}(y) dy
      + ∫_{−π}^{−δ} |f(x) − f(x − y)| F_{n+1}(y) dy.

Since F_{n+1} is even and |f| is continuous and periodic, hence bounded by some
constant M, the above is dominated by

    ∫_{−δ}^{δ} |f(x) − f(x − y)| F_{n+1}(y) dy + 4M ∫_{δ}^{π} F_{n+1}(y) dy.

Now choose δ such that for all x, it follows that if |y| < δ then

    |f(x) − f(x − y)| < ε/2.

This can be done because f is uniformly continuous on [−π, π]. Since f is periodic,
it must also be uniformly continuous on R. (why?) Therefore, for this δ, this has
shown that for all x,

    |f(x) − σ_{n+1} f(x)| ≤ ε/2 + 4M ∫_{δ}^{π} F_{n+1}(y) dy

and now by Lemma 14.7.3 it follows

    ||f − σ_{n+1} f||_∞ ≤ ε/2 + 8πM / ((n + 1) sin^2(δ/2) 4π) < ε

provided n is large enough. ∎


14.7.2 Mean Square Approximation

The partial sums of the Fourier series of f do a better job approximating f in
the mean square sense than any other linear combination of the functions e^{ikθ} for
|k| ≤ n. This will be shown next. It is nothing but a simple computation. Recall
the Fourier coefficients are

    a_k = (1/2π) ∫_{−π}^{π} f(θ) e^{−ikθ} dθ.

Then using this fact as needed, consider the following computation.

    ∫_{−π}^{π} | f(θ) − Σ_{k=−n}^{n} b_k e^{ikθ} |^2 dθ

    = ∫_{−π}^{π} ( f(θ) − Σ_{k=−n}^{n} b_k e^{ikθ} ) ( \overline{f(θ)} − Σ_{l=−n}^{n} \overline{b_l} e^{−ilθ} ) dθ

    = ∫_{−π}^{π} ( |f(θ)|^2 + Σ_{k=−n}^{n} Σ_{l=−n}^{n} b_k \overline{b_l} e^{ikθ} e^{−ilθ}
        − \overline{f(θ)} Σ_{k=−n}^{n} b_k e^{ikθ} − f(θ) Σ_{l=−n}^{n} \overline{b_l} e^{−ilθ} ) dθ

    = ∫_{−π}^{π} |f(θ)|^2 dθ + Σ_{k,l} b_k \overline{b_l} ∫_{−π}^{π} e^{ikθ} e^{−ilθ} dθ − 2π Σ_l \overline{b_l} a_l − 2π Σ_k b_k \overline{a_k}

    = ∫_{−π}^{π} |f(θ)|^2 dθ + 2π Σ_k |b_k|^2 − 2π Σ_l \overline{b_l} a_l − 2π Σ_k b_k \overline{a_k}.

Then adding and subtracting 2π Σ_k |a_k|^2, this equals

    ∫_{−π}^{π} |f(θ)|^2 dθ − 2π Σ_k |a_k|^2 + 2π Σ_k ( |b_k|^2 − \overline{b_k} a_k − b_k \overline{a_k} + |a_k|^2 )

    = ∫_{−π}^{π} |f(θ)|^2 dθ − 2π Σ_k |a_k|^2 + 2π Σ_k (b_k − a_k) ( \overline{b_k} − \overline{a_k} )

    = ∫_{−π}^{π} |f(θ)|^2 dθ − 2π Σ_k |a_k|^2 + 2π Σ_k |b_k − a_k|^2.
Therefore, to make

    ∫_{−π}^{π} | f(θ) − Σ_{k=−n}^{n} b_k e^{ikθ} |^2 dθ

as small as possible for all choices of b_k, one should let b_k = a_k, the k th Fourier
coefficient. Stated another way,

    ∫_{−π}^{π} | f(θ) − Σ_{k=−n}^{n} b_k e^{ikθ} |^2 dθ ≥ ∫_{−π}^{π} |f(θ) − S_n f(θ)|^2 dθ

for any choice of b_k. In particular,

    ∫_{−π}^{π} |f(θ) − σ_{n+1} f(θ)|^2 dθ ≥ ∫_{−π}^{π} |f(θ) − S_n f(θ)|^2 dθ.    (14.7.20)

Also, since ∫_{−π}^{π} \overline{f(θ)} e^{ikθ} dθ = 2π \overline{a_k},

    ∫_{−π}^{π} \overline{f(θ)} S_n f(θ) dθ = Σ_{k=−n}^{n} a_k ∫_{−π}^{π} \overline{f(θ)} e^{ikθ} dθ
        = Σ_{k=−n}^{n} a_k · 2π \overline{a_k} = 2π Σ_{k=−n}^{n} |a_k|^2.

Similarly,

    ∫_{−π}^{π} f(θ) \overline{S_n f(θ)} dθ = 2π Σ_{k=−n}^{n} |a_k|^2

and a simple computation of the above sort shows that also

    ∫_{−π}^{π} S_n f(θ) \overline{S_n f(θ)} dθ = 2π Σ_{k=−n}^{n} |a_k|^2.

Therefore,

    0 ≤ ∫_{−π}^{π} ( f(θ) − S_n f(θ) ) ( \overline{f(θ)} − \overline{S_n f(θ)} ) dθ
      = ∫_{−π}^{π} |f(θ)|^2 + |S_n f(θ)|^2 − f(θ) \overline{S_n f(θ)} − \overline{f(θ)} S_n f(θ) dθ
      = ∫_{−π}^{π} |f(θ)|^2 dθ − ∫_{−π}^{π} |S_n f(θ)|^2 dθ,

showing

    2π Σ_{k=−n}^{n} |a_k|^2 = ∫_{−π}^{π} |S_n f(θ)|^2 dθ ≤ ∫_{−π}^{π} |f(θ)|^2 dθ.    (14.7.21)

Now it is easy to prove the following fundamental theorem.


Theorem 14.7.5  Let f ∈ L^2([−π, π]) and periodic of period 2π. Then

    lim_{n→∞} ∫_{−π}^{π} |f − S_n f|^2 dx = 0.

Proof: First assume f is continuous and 2π periodic. Then by 14.7.20

    ∫_{−π}^{π} |f − S_n f|^2 dx ≤ ∫_{−π}^{π} |f − σ_{n+1} f|^2 dx

    ≤ ∫_{−π}^{π} ||f − σ_{n+1} f||_∞^2 dx = 2π ||f − σ_{n+1} f||_∞^2

and the last expression converges to 0 by Theorem 14.7.4.
Next suppose f is only in L^2([−π, π]). By Theorem 12.5.3, there exists a con-
tinuous function g which has compact support such that

    ||gX_{[−π,π]} − fX_{[−π,π]}||_{L^2(R)} ≤ ||g − fX_{[−π,π]}||_{L^2(R)} < ε.

Since (−π, π) is a countable union of compact sets, without loss of generality, it can
be assumed that spt(g) ⊆ (−π, π). This is because, letting K_n = [−π + 1/n, π − 1/n],
it follows from the dominated convergence theorem that

    lim_{n→∞} ||gX_{[−π,π]} − gX_{K_n}||_{L^2(R)} = 0

and so gX_{[−π,π]} could be replaced with gX_{K_n} if necessary in the above inequality
involving ε.
Extending g to be 2π periodic, it follows

    ||f − S_n f||_{L^2([−π,π])} ≤ ||f − σ_{n+1} f||_{L^2([−π,π])}

    ≤ ||f − g||_{L^2([−π,π])} + ||g − σ_{n+1} g||_{L^2([−π,π])} + ||σ_{n+1}(g − f)||_{L^2([−π,π])}.    (14.7.22)

The last term is no larger than

    ( ∫_{−π}^{π} ( ∫_{−π}^{π} F_{n+1}(y) |g(x − y) − f(x − y)| dy )^2 dx )^{1/2}

    ≤ ( ∫_{−π}^{π} ∫_{−π}^{π} F_{n+1}(y) |g(x − y) − f(x − y)|^2 dx dy )^{1/2}

    = ( ∫_{−π}^{π} F_{n+1}(y) dy )^{1/2} ||g − f||_{L^2([−π,π])} = ||g − f||_{L^2([−π,π])}.

Therefore, 14.7.22 is no larger than 3ε where ε was arbitrary. ∎
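The decrease of the mean square error is easy to observe numerically. The following sketch (the test function |x|, the grid, and the values of n are arbitrary choices) computes truncated Fourier sums from numerically evaluated coefficients and prints the L^2 error.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]
f = np.abs(x)                               # a continuous 2*pi periodic function

def c(k):
    """Fourier coefficient (1/2pi) * integral f e^{-ik theta} d theta (Riemann sum)."""
    return np.sum(f*np.exp(-1j*k*x)) * dx / (2*np.pi)

def Sn(n):
    return sum(c(k)*np.exp(1j*k*x) for k in range(-n, n + 1))

for n in (1, 2, 4, 8, 16, 32):
    err = np.sqrt(np.sum(np.abs(f - Sn(n))**2) * dx)
    print(n, err)                            # the L^2 error decreases toward 0
```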



14.8 Exercises

1. Suppose f has infinitely many derivatives and is also periodic with period
   2π. Let the Fourier series of f be

       Σ_{k=−∞}^{∞} a_k e^{ikθ}.

   Show that

       lim_{k→∞} k^m a_k = lim_{k→∞} k^m a_{−k} = 0

   for every m ∈ N.
2. Let f be a continuous function defined on [−π, π]. Show there exists a poly-
   nomial p such that ||p − f||_∞ < ε where

       ||g||_∞ ≡ sup{ |g(x)| : x ∈ [−π, π] }.

   Extend this result to an arbitrary interval. This is another approach to the
   Weierstrass approximation theorem. Hint: First find a linear function ax +
   b = y such that f − y has the property that it has the same value at both
   ends of [−π, π]. Therefore, you may consider this as the restriction to [−π, π]
   of a continuous periodic function F. Now find a trig polynomial

       ψ(x) ≡ a_0 + Σ_{k=1}^{n} ( a_k cos kx + b_k sin kx )

   such that ||ψ − F||_∞ < ε/3. Recall 14.1.4. Now consider the power series of
   the trig functions making use of the error estimate for the remainder after m
   terms.

3. The inequality established above,

       2π Σ_{k=−n}^{n} |a_k|^2 = ∫_{−π}^{π} |S_n f(θ)|^2 dθ ≤ ∫_{−π}^{π} |f(θ)|^2 dθ,

   is called Bessel's inequality. Use this inequality to give an easy proof that for
   all f ∈ L^2([−π, π]),

       lim_{n→∞} ∫_{−π}^{π} f(x) e^{inx} dx = 0.

   Recall that in the Riemann Lebesgue lemma |f| ∈ L^1((a, b]) so while this
   exercise is easier, it lacks the generality of the earlier proof. Explain why this
   is less general.
4. Let f(x) = x for x ∈ (−π, π) and extend to make the resulting function
   defined on R and periodic of period 2π. Find the Fourier series of f. Verify
   the Fourier series converges to the midpoint of the jump and use this series
   to find a nice formula for π/4. Hint: For the last part consider x = π/2.

5. Let f(x) = x^2 on (−π, π) and extend to form a 2π periodic function defined
   on R. Find the Fourier series of f. Now obtain a famous formula for π^2/6 by
   letting x = π.

6. Let f(x) = cos x for x ∈ (0, π) and define f(x) ≡ −cos x for x ∈ (−π, 0).
   Now extend this function to make it 2π periodic. Find the Fourier series of f.

7. Suppose f is piecewise continuous and 2π periodic on R. This means the
   function is periodic and it is continuous on [−π, π] except for finitely many
   points. Also suppose that at every point, f(x+) and f(x−) exist. Show
   that the Fejer means satisfy σ_n f(x) → (f(x+) + f(x−))/2. Thus there is no
   requirement of smoothness of f from one side at all and you still get
   convergence to the midpoint of the jump.

8. Suppose f, g ∈ L^2([−π, π]). Show

       (1/2π) ∫_{−π}^{π} f \overline{g} dx = Σ_{k=−∞}^{∞} α_k \overline{β_k},

   where α_k are the Fourier coefficients of f and β_k are the Fourier coefficients
   of g.

9. Recall the partial summation formula, called the Dirichlet formula, which says
   that

       Σ_{k=p}^{q} a_k b_k = A_q b_q − A_{p−1} b_p + Σ_{k=p}^{q−1} A_k (b_k − b_{k+1}).

   Here A_q ≡ Σ_{k=1}^{q} a_k. Also recall Dirichlet's test which says that if
   lim_{k→∞} b_k = 0, the A_k are bounded, and Σ |b_k − b_{k+1}| converges, then
   Σ a_k b_k converges. Show the partial sums of Σ_k sin kx are bounded for each
   x ∈ R. Using this fact and the Dirichlet test above, obtain some theorems
   which will state that Σ_k a_k sin kx converges for all x.

10. Let {a_n} be a sequence of positive numbers having the property that

        lim_{n→∞} n a_n = 0

    and for all n ∈ N, n a_n ≥ (n + 1) a_{n+1}. Show that if this is so, it follows that
    the series Σ_{k=1}^{∞} a_k sin kx converges uniformly on R. This is a variation of a
    very interesting problem found in Apostol's book, [3]. Hint: Use the Dirichlet
    formula of Problem 9 on Σ k a_k (sin kx)/k and show the partial sums of
    Σ (sin kx)/k are bounded independent of x. To do this, you might argue the
    maximum value of the partial sums of this series occur when Σ_{k=1}^{n} cos kx = 0.
    Sum this series by considering the real part of the geometric series Σ_{k=1}^{q} (e^{ix})^k
    and then show the partial sums of Σ (sin kx)/k are Riemann sums for a certain
    finite integral.
11. The problem in Apostol's book mentioned in Problem 10 does not require
    na_n to be decreasing and is as follows. Let {a_k}_{k=1}^{∞} be a decreasing sequence
    of nonnegative numbers which satisfies lim_{n→∞} n a_n = 0. Then

        Σ_{k=1}^{∞} a_k sin(kx)

    converges uniformly on R. You can find this problem worked out completely
    in Jones [30]. Fill in the details to the following argument or something like
    it to obtain a proof. First show that for p ≤ q, and x ∈ (0, π),

        | Σ_{k=p}^{q} a_k sin(kx) | ≤ 3 a_p csc(x/2).    (14.8.23)

    To do this, use summation by parts using the formula

        Σ_{k=p}^{q} sin(kx) = ( cos((p − 1/2)x) − cos((q + 1/2)x) ) / (2 sin(x/2)),

    which you can establish by taking the imaginary part of a geometric series of
    the form Σ_{k=1}^{q} (e^{ix})^k or else the approach used above to find a formula for
    the Dirichlet kernel. Now define

        b(p) ≡ sup{ n a_n : n ≥ p }.

    Thus b(p) → 0, b(p) is decreasing in p, and if k ≥ n, a_k ≤ b(n)/k. Then
    from 14.8.23 and the assumption {a_k} is decreasing (the second sum below is
    0 if m = q),

        | Σ_{k=p}^{q} a_k sin(kx) | ≤ | Σ_{k=p}^{m} a_k sin(kx) | + | Σ_{k=m+1}^{q} a_k sin(kx) |

        ≤ { Σ_{k=p}^{m} (b(k)/k) |sin(kx)| + 3 a_{m+1} csc(x/2)   if m < q
          { Σ_{k=p}^{q} (b(k)/k) |sin(kx)|                        if m = q

        ≤ { Σ_{k=p}^{m} (b(k)/k) kx + 3 a_{m+1} (2π/x)   if m < q      (14.8.24)
          { Σ_{k=p}^{q} (b(k)/k) kx                       if m = q

    where this uses the inequalities

        sin(x/2) ≥ x/(2π),   |sin(x)| ≤ |x|   for x ∈ (0, π).

    There are two cases to consider depending on whether x ≤ 1/q. First suppose
    that x ≤ 1/q. Then let m = q and use the bottom line of 14.8.24 to write
    that in this case,

        | Σ_{k=p}^{q} a_k sin(kx) | ≤ (1/q) Σ_{k=p}^{q} b(k) ≤ b(p).

    If x > 1/q, then q > 1/x and you use the top line of 14.8.24 picking m such
    that

        q ≥ 1/x ≥ m ≥ (1/x) − 1.

    Then in this case,

        | Σ_{k=p}^{q} a_k sin(kx) | ≤ Σ_{k=p}^{m} (b(k)/k) kx + 3 a_{m+1} (2π/x)

        ≤ b(p) x (m − p) + 6π a_{m+1} (m + 1)

        ≤ b(p) x (1/x) + 6π (b(p)/(m + 1)) (m + 1) ≤ 25 b(p).

    Therefore, the partial sums of the series Σ a_k sin kx form a uniformly Cauchy
    sequence and must converge uniformly on (0, π). Now explain why this implies
    the series converges uniformly on R.

12. Suppose f(x) = Σ_{k=1}^{∞} a_k sin kx and that the convergence is uniform. Recall
    something like this holds for power series. Is it reasonable to suppose that
    f'(x) = Σ_{k=1}^{∞} a_k k cos kx? Explain.
13. Suppose |u_k(x)| ≤ K_k for all x ∈ D where

        Σ_{k=−∞}^{∞} K_k = lim_{n→∞} Σ_{k=−n}^{n} K_k < ∞.

    Show that Σ_{k=−∞}^{∞} u_k(x) converges uniformly on D in the sense
    that for all ε > 0, there exists N such that whenever n > N,

        | Σ_{k=−∞}^{∞} u_k(x) − Σ_{k=−n}^{n} u_k(x) | < ε

    for all x ∈ D. This is called the Weierstrass M test.


14. Suppose f is a differentiable function of period 2π and suppose that both
    f and f' are in L^2([−π, π]) such that for all x ∈ (−π, π) and y sufficiently
    small,

        f(x + y) − f(x) = ∫_x^{x+y} f'(t) dt.

    Show that the Fourier series of f converges uniformly to f. Hint: First show
    using the Dini criterion that S_n f(x) → f(x) for all x. Next let Σ_{k=−∞}^{∞} a_k e^{ikx}
    be the Fourier series for f. Then from the definition of a_k, show that for
    k ≠ 0, a_k = (1/ik) a_k' where a_k' is the Fourier coefficient of f'. Now use
    Bessel's inequality to argue that Σ_{k=−∞}^{∞} |a_k'|^2 < ∞ and then show this implies
    Σ |a_k| < ∞. You might want to use the Cauchy Schwarz inequality to do this
    part. Then using the version of the Weierstrass M test given in Problem 13,
    obtain uniform convergence of the Fourier series to f.

15. Let f be a function defined on R. Then f is even if f(θ) = f(−θ) for all
    θ ∈ R. Also f is called odd if for all θ ∈ R, f(−θ) = −f(θ). Now using the
    Weierstrass approximation theorem show directly that if h is a continuous even
    2π periodic function, then for every ε > 0 there exists an m and constants
    a_0, ⋯, a_m such that

        | h(θ) − Σ_{k=0}^{m} a_k cos^k(θ) | < ε

    for all θ ∈ R. Hint: Note the function arccos is continuous and maps [−1, 1]
    onto [0, π]. Using this, show you can define g, a continuous function on [−1, 1],
    by g(cos θ) = h(θ) for θ ∈ [0, π]. Now use the Weierstrass approximation
    theorem on [−1, 1].

16. Show that if f is any odd 2π periodic function, then its Fourier series can
    be simplified to an expression of the form

        Σ_{n=1}^{∞} b_n sin(nx)

    and also f(mπ) = 0 for all m ∈ N.



17. Consider the symbol Σ_{k=1}^{∞} a_k. The infinite sum might not converge. Summa-
    bility methods are systematic ways of assigning a number to such a symbol.
    The nth Cesaro mean σ_n is defined as the average of the first n partial sums
    of the series. Thus

        σ_n ≡ (1/n) Σ_{k=1}^{n} S_k

    where

        S_k ≡ Σ_{j=1}^{k} a_j.

    Show that if Σ_{k=1}^{∞} a_k converges then lim_{n→∞} σ_n also exists and equals the
    same thing. Next find an example where, although Σ_{k=1}^{∞} a_k fails to converge,
    lim_{n→∞} σ_n does exist. This summability method is called Cesaro summa-
    bility. Recall the Fejer means were obtained in just this way.
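A tiny numerical sketch of the second part of this exercise, using the classical example a_k = (−1)^{k+1} (the truncation length is arbitrary):

```python
import numpy as np

a = np.array([(-1)**(k+1) for k in range(1, 2001)])   # 1 - 1 + 1 - 1 + ... diverges
S = np.cumsum(a)                                      # partial sums: 1, 0, 1, 0, ...
sigma = np.cumsum(S) / np.arange(1, len(S) + 1)       # Cesaro means
print(S[-4:], sigma[-1])                              # partial sums oscillate; sigma_n -> 1/2
```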
18. Let 0 < r < 1 and for f a continuous periodic function of period 2π consider

        A_r f(θ) ≡ Σ_{k=−∞}^{∞} r^{|k|} a_k e^{ikθ}

    where the a_k are the Fourier coefficients of f. Show that

        lim_{r→1−} A_r f(θ) = f(θ).

    Hint: You need to find a kernel and write A_r f(θ) as the integral of the kernel
    convolved with f. Then consider properties of this kernel as was done with the
    Fejer kernel.

19. Recall the Dirichlet kernel is

        D_n(t) ≡ (2π)^{−1} sin((n + 1/2)t) / sin(t/2)

    and it has the property that ∫_{−π}^{π} D_n(t) dt = 1. Show first that this implies

        (1/2π) ∫_{−π}^{π} ( sin(nt) cos(t/2) / sin(t/2) ) dt = 1

    and this implies

        (1/π) ∫_0^{π} ( sin(nt) cos(t/2) / sin(t/2) ) dt = 1.

    Next change the variable to show the integral equals

        (1/π) ∫_0^{nπ} ( sin(u) cos(u/2n) / sin(u/2n) ) (1/n) du.

    Now show that

        lim_{n→∞} ( sin(u) cos(u/2n) / sin(u/2n) ) (1/n) = 2 (sin u)/u.

    Next show that

        lim_{n→∞} (1/π) ∫_0^{nπ} 2 (sin u)/u du = lim_{n→∞} (1/π) ∫_0^{nπ} ( sin(u) cos(u/2n) / sin(u/2n) ) (1/n) du = 1.

    Finally show

        lim_{R→∞} ∫_0^{R} (sin u)/u du = π/2 ≡ ∫_0^{∞} (sin u)/u du.

    This is a very important improper integral.
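A quick numerical check of the value of this improper integral (a sketch only; the truncation point and grid are arbitrary, and a composite trapezoid rule stands in for the integral):

```python
import numpy as np

u = np.linspace(1e-12, 20000.0, 2_000_001)
du = u[1] - u[0]
g = np.sin(u)/u
val = (np.sum(g) - 0.5*(g[0] + g[-1])) * du      # trapezoid rule on [0, 20000]
print(val, np.pi/2)                              # both ~ 1.5708 (tail beyond R decays like 1/R)
```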
20. Recall the Fourier series of a function in L^2(−π, π) converges to the func-
    tion in L^2(−π, π). Prove a similar theorem with L^2(−π, π) replaced by
    L^2(−mπ, mπ) and the functions

        { (2π)^{−1/2} e^{inx} }_{n∈Z}

    used in the Fourier series replaced with

        { (2mπ)^{−1/2} e^{i(n/m)x} }_{n∈Z}.

    Now suppose f is a function in L^2(R) satisfying F f(t) = 0 if |t| > mπ. Show
    that if this is so, then

        f(x) = (1/π) Σ_{n∈Z} f(−n/m) ( sin(π(mx + n)) / (mx + n) ).

    Here m is a positive integer. This is sometimes called the Shannon sampling
    theorem. Hint: First note that since F f ∈ L^2 and is zero off a finite interval,
    it follows F f ∈ L^1. Also

        f(t) = (1/√(2π)) ∫_{−mπ}^{mπ} e^{itx} F f(x) dx

    and you can conclude from this that f has all derivatives and they are all
    bounded. Thus f is a very nice function. You can replace F f with its Fourier
    series. Then consider carefully the Fourier coefficient of F f. Argue it equals
    f(−n/m) or at least an appropriate constant times this. When you get this the
    rest will fall quickly into place if you use the fact that F f is zero off [−mπ, mπ].
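Here is a rough numerical sketch of the sampling formula in the case m = 1, rewritten in the equivalent form f(x) = Σ_n f(n) sin(π(x − n)) / (π(x − n)); the particular band limited function and the truncation of the sum are arbitrary choices.

```python
import numpy as np

# f(x) = (sin x / x)^2 has Fourier transform supported in [-2, 2],
# so the hypothesis F f(t) = 0 for |t| > m*pi holds with m = 1.
def f(x):
    return np.sinc(x/np.pi)**2             # np.sinc(t) = sin(pi t)/(pi t)

n = np.arange(-200, 201)                   # samples f(n/m) = f(n)
x = np.linspace(-5, 5, 1001)
recon = sum(f(k) * np.sinc(x - k) for k in n)   # sum_n f(n) sin(pi(x-n))/(pi(x-n))
print(np.max(np.abs(recon - f(x))))        # small; truncating the sum causes the error
```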

21. Show that neither the Jordan nor the Dini criterion for pointwise convergence
    implies the other criterion. That is, find an example of a function for which
    Jordan's condition implies pointwise convergence but not Dini's, and then find
    a function for which Dini works but Jordan does not. Hint: You might try
    considering something like y = [ln(x)]^{−1} for x > 0, x near 0, to get something
    for which Jordan works but Dini does not. For the other part, try something
    like x sin(1/x).
Part III

Further Topics

Metric Spaces And General
Topological Spaces

15.1 Metric Space

Definition 15.1.1  A metric space is a set X and a function d : X × X → [0, ∞)
which satisfies the following properties.

    d(x, y) = d(y, x)
    d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y
    d(x, y) ≤ d(x, z) + d(z, y).

You can check that R^n and C^n are metric spaces with d(x, y) = |x − y|. How-
ever, there are many others. The definitions of open and closed sets are the same
for a metric space as they are for R^n.

Definition 15.1.2  A set U in a metric space is open if whenever x ∈ U, there
exists r > 0 such that B(x, r) ⊆ U. As before, B(x, r) ≡ {y : d(x, y) < r}. Closed
sets are those whose complements are open. A point p is a limit point of a set S
if for every r > 0, B(p, r) contains infinitely many points of S. A sequence {x_n}
converges to a point x if for every ε > 0 there exists N such that if n ≥ N, then
d(x, x_n) < ε. {x_n} is a Cauchy sequence if for every ε > 0 there exists N such that
if m, n ≥ N, then d(x_n, x_m) < ε.

Lemma 15.1.3  In a metric space X every ball B(x, r) is open. A set is closed
if and only if it contains all its limit points. If p is a limit point of S, then there
exists a sequence of distinct points of S, {x_n}, such that lim_{n→∞} x_n = p.

Proof: Let z ∈ B(x, r). Let δ = r − d(x, z). Then if w ∈ B(z, δ),

    d(w, x) ≤ d(x, z) + d(z, w) < d(x, z) + r − d(x, z) = r.

Therefore, B(z, δ) ⊆ B(x, r) and this shows B(x, r) is open.

The properties of balls are presented in the following theorem.


Theorem 15.1.4  Suppose (X, d) is a metric space. Then the sets {B(x, r) : r > 0, x ∈ X} satisfy

    ∪{B(x, r) : r > 0, x ∈ X} = X.    (15.1.1)

If p ∈ B(x, r_1) ∩ B(z, r_2), there exists r > 0 such that

    B(p, r) ⊆ B(x, r_1) ∩ B(z, r_2).    (15.1.2)

Proof: Observe that the union of these balls includes the whole space X so
15.1.1 is obvious. Consider 15.1.2. Let p ∈ B(x, r_1) ∩ B(z, r_2). Consider

    r ≡ min( r_1 − d(x, p), r_2 − d(z, p) )

and suppose y ∈ B(p, r). Then

    d(y, x) ≤ d(y, p) + d(p, x) < r_1 − d(x, p) + d(x, p) = r_1

and so B(p, r) ⊆ B(x, r_1). By similar reasoning, B(p, r) ⊆ B(z, r_2). This proves
the theorem.
Let K be a closed set. This means K^C ≡ X \ K is an open set. Let p be a
limit point of K. If p ∈ K^C, then since K^C is open, there exists B(p, r) ⊆ K^C. But
this contradicts p being a limit point because there are no points of K in this ball.
Hence all limit points of K must be in K.
Suppose next that K contains its limit points. Is K^C open? Let p ∈ K^C.
Then p is not a limit point of K. Therefore, there exists B(p, r) which contains at
most finitely many points of K. Since p ∉ K, it follows that by making r smaller if
necessary, B(p, r) contains no points of K. That is B(p, r) ⊆ K^C, showing K^C is
open. Therefore, K is closed.
Suppose now that p is a limit point of S. Let x_1 ∈ (S \ {p}) ∩ B(p, 1). If
x_1, ⋯, x_k have been chosen, let

    r_{k+1} ≡ min{ d(p, x_i), i = 1, ⋯, k, 1/(k+1) }.

Let x_{k+1} ∈ (S \ {p}) ∩ B(p, r_{k+1}). This proves the lemma.

Lemma 15.1.5  If {x_n} is a Cauchy sequence in a metric space X and if some
subsequence {x_{n_k}} converges to x, then {x_n} converges to x. Also if a sequence
converges, then it is a Cauchy sequence.

Proof: Note first that n_k ≥ k because in a subsequence, the indices n_1, n_2, ⋯
are strictly increasing. Let ε > 0 be given and let N be such that for k > N,
d(x, x_{n_k}) < ε/2 and for m, n ≥ N, d(x_m, x_n) < ε/2. Pick k > N. Then if n > N,

    d(x_n, x) ≤ d(x_n, x_{n_k}) + d(x_{n_k}, x) < ε/2 + ε/2 = ε.

Finally, suppose lim_{n→∞} x_n = x. Then there exists N such that if n > N, then
d(x_n, x) < ε/2. It follows that for m, n > N,

    d(x_n, x_m) ≤ d(x_n, x) + d(x, x_m) < ε/2 + ε/2 = ε.

This proves the lemma.

15.2 Compactness In Metric Space

Many existence theorems in analysis depend on some set being compact. Therefore,
it is important to be able to identify compact sets. The purpose of this section is
to describe compact sets in a metric space.

Definition 15.2.1  Let A be a subset of X. A is compact if whenever A is con-
tained in the union of a set of open sets, there exists finitely many of these open sets
whose union contains A. (Every open cover admits a finite subcover.) A is se-
quentially compact means every sequence has a convergent subsequence converging
to an element of A.

In a metric space compact is not the same as closed and bounded!

Example 15.2.2  Let X be any infinite set and define d(x, y) = 1 if x ≠ y while
d(x, y) = 0 if x = y.

You should verify the details that this is a metric space because it satisfies the
axioms of a metric. The set X is closed and bounded because its complement is
∅ which is clearly open because every point of ∅ is an interior point. (There are
none.) Also X is bounded because X = B(x, 2). However, X is clearly not compact
because {B(x, 1/2) : x ∈ X} is a collection of open sets whose union contains X but
since they are all disjoint and nonempty, there is no finite subset of these whose
union contains X. In fact B(x, 1/2) = {x}.
From this example it is clear something more than closed and bounded is needed.
If you are not familiar with the issues just discussed, ignore them and continue.
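The discrete metric of Example 15.2.2 is easy to experiment with. The following sketch checks the metric axioms on a small finite sample and shows that each ball B(x, 1/2) is the singleton {x}; for an infinite X the same observation is exactly why the cover by these balls has no finite subcover. The choice of sample set is arbitrary.

```python
import itertools

def d(x, y):
    """The discrete metric of Example 15.2.2."""
    return 0 if x == y else 1

X = list(range(10))   # a small sample set

# Verify the three metric space axioms on this finite sample.
assert all(d(x, y) == d(y, x) for x, y in itertools.product(X, X))
assert all((d(x, y) == 0) == (x == y) for x, y in itertools.product(X, X))
assert all(d(x, y) <= d(x, z) + d(z, y) for x, y, z in itertools.product(X, X, X))

# Each ball B(x, 1/2) is the singleton {x}.
ball = lambda x, r: {y for y in X if d(x, y) < r}
print([ball(x, 0.5) for x in X[:3]])      # [{0}, {1}, {2}]
```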

Definition 15.2.3  In any metric space, a set E is totally bounded if for every ε > 0
there exists a finite set of points {x_1, ⋯, x_n} such that

    E ⊆ ∪_{i=1}^{n} B(x_i, ε).

This finite set of points is called an ε net.

The following proposition tells which sets in a metric space are compact. First
here is an interesting lemma.

Lemma 15.2.4  Let X be a metric space and suppose D is a countable dense subset
of X. In other words, it is being assumed X is a separable metric space. Consider
the open sets of the form B(d, r) where r is a positive rational number and d ∈ D.
Denote this countable collection of open sets by B. Then every open set is the union
of sets of B. Furthermore, if C is any collection of open sets, there exists a countable
subset {U_n} ⊆ C such that ∪_n U_n = ∪C.

Proof: Let U be an open set and let x ∈ U. Let B(x, ε) ⊆ U. Then by density of
D, there exists d ∈ D ∩ B(x, ε/4). Now pick r ∈ Q ∩ (ε/4, 3ε/4) and consider B(d, r).
Clearly, B(d, r) contains the point x because r > ε/4. Is B(d, r) ⊆ B(x, ε)? If so,
this proves the lemma because x was an arbitrary point of U. Suppose z ∈ B(d, r).
Then

    d(z, x) ≤ d(z, d) + d(d, x) < r + ε/4 < 3ε/4 + ε/4 = ε.

Now let C be any collection of open sets. Each set in this collection is the union
of countably many sets of B. Let B' denote the sets of B which are contained in
some set of C. Thus ∪B' = ∪C. Then for each B ∈ B', pick U_B ∈ C such that
B ⊆ U_B. Then {U_B : B ∈ B'} is a countable collection of sets of C whose union
equals ∪C. Therefore, this proves the lemma.

Proposition 15.2.5 Let (X, d) be a metric space. Then the following are equiva-
lent.
(X, d) is compact, (15.2.3)
(X, d) is sequentially compact, (15.2.4)
(X, d) is complete and totally bounded. (15.2.5)

Proof: Suppose 15.2.3 and let {xk } be a sequence. Suppose {xk } has no
convergent subsequence. If this is so, then no value of the sequence is repeated
more than nitely many times. Also {xk } has no limit point because if it did,
there would exist a subsequence which converges. To see this, suppose p is a limit
point of {xk } . Then in B (p, 1) there are innitely many points of {xk } . Pick one
called xk1 . Now if xk1 , xk2 , , xkn have been picked with xki B (p, 1/i) , consider
B (p, 1/ (n + 1)) . There are innitely many points of {xk } in this ball also. Pick

xkn+1 such that kn+1 > kn . Then {xkn }n=1 is a subsequence which converges to p
and it is assumed this does not happen. Thus {xk } has no limit points. It follows
the set
Cn = {xk : k n}
is a closed set because it has no limit points and if

Un = CnC ,

then
X =
n=1 Un

but there is no nite subcovering, because no value of the sequence is repeated more
than nitely many times. Note xk is not in Un whenever k > n. This contradicts
compactness of (X, d). This shows 15.2.3 implies 15.2.4.
Now suppose 15.2.4 and let {xn } be a Cauchy sequence. Is {xn } convergent?
By sequential compactness xnk x for some subsequence. By Lemma 15.1.5 it
follows that {xn } also converges to x showing that (X, d) is complete. If (X, d) is
not totally bounded, then there exists > 0 for which there is no net. Hence there
exists a sequence {xk } with d (xk , xl ) for all l = k. By Lemma 15.1.5 again,
this contradicts 15.2.4 because no subsequence can be a Cauchy sequence and so no
subsequence can converge. This shows 15.2.4 implies 15.2.5.

Now suppose 15.2.5. What about 15.2.4? Let {pn } be a sequence and let
n
{xni }m
i=1 be a 2
n
net for n = 1, 2, . Let
( )
Bn B xnin , 2n

be such that Bn contains pk for innitely many values of k and Bn Bn+1 = .


To do this, suppose Bn contains ( pk for innitely
) many values of k. Then one of
(n+1)
the sets which intersect Bn , B xn+1
i , 2 must contain pk for innitely many
values of k because all these indices of points(from {pn } contained
) in Bn must be
(n+1)
accounted for in one of nitely many sets, B xn+1 i , 2 . Thus there exists a
strictly increasing sequence of integers, nk such that

pnk Bk .

Then if k l,


k1
( )
d (pnk , pnl ) d pni+1 , pni
i=l

k1
< 2(i1) < 2(l2).
i=l

Consequently {pnk } is a Cauchy sequence. Hence it converges because the metric


space is complete. This proves 15.2.4.
Now suppose 15.2.4 and 15.2.5 which have now been shown to be equivalent.
Let Dn be a n1 net for n = 1, 2, and let

D =
n=1 Dn .

Thus D is a countable dense subset of (X, d).


Now let C be any set of open sets such that C X. By Lemma 15.2.4, there
exists a countable subset of C,
Ce = {Un }
n=1

such that Ce = C. If C admits no nite subcover, then neither does Ce and there ex-
ists pn X \ nk=1 Uk . Then since X is sequentially compact, there is a subsequence
{pnk } such that {pnk } converges. Say

p = lim pnk .
k

All but nitely many points of {pnk } are in X \ nk=1 Uk . Therefore p X \ nk=1 Uk
for each n. Hence
p/ k=1 Uk

contradicting the construction of {Un }


n=1 which required that n=1 Un X. Hence
X is compact. This proves the proposition.
Consider Rn . In this setting totally bounded and bounded are the same. This
will yield a proof of the Heine Borel theorem from advanced calculus.

Lemma 15.2.6 A subset of ℝ^n is totally bounded if and only if it is bounded.

Proof: Let A be totally bounded. Is it bounded? Let x_1, ⋯, x_p be a 1 net for A. Now consider the ball B(0, r + 1) where r > max(|x_i| : i = 1, ⋯, p). If z ∈ A, then z ∈ B(x_j, 1) for some j and so by the triangle inequality,

|z − 0| ≤ |z − x_j| + |x_j| < 1 + r.

Thus A ⊆ B(0, r + 1) and so A is bounded.

Now suppose A is bounded and suppose A is not totally bounded. Then there exists ε > 0 such that there is no ε net for A. Therefore, there exists a sequence of points {a_i} with |a_i − a_j| ≥ ε if i ≠ j. Since A is bounded, there exists r > 0 such that

A ⊆ [−r, r)^n.

(x ∈ [−r, r)^n means x_i ∈ [−r, r) for each i.) Now define S to be all cubes of the form

∏_{k=1}^n [a_k, b_k)

where

a_k = −r + i 2^{−p} r,  b_k = −r + (i + 1) 2^{−p} r,

for i ∈ {0, 1, ⋯, 2^{p+1} − 1}. Thus S is a collection of (2^{p+1})^n non overlapping cubes whose union equals [−r, r)^n and whose diameters are all equal to 2^{−p} r √n. Now choose p large enough that the diameter of these cubes is less than ε. This yields a contradiction because one of the cubes must contain infinitely many points of {a_i}. This proves the lemma.
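The cube construction in this proof is easy to realize concretely. The following Python sketch is an illustration only (the helper name grid_net is mine, not the text's): for a finite sample of points of a bounded set contained in [−r, r)^n, it returns the centers of the half-open cubes of side 2^{−p} r that meet the sample, which form a finite ε net once the cube diameter is below ε.

    # Illustrative sketch of the cube construction in Lemma 15.2.6.  The cube side
    # 2^(-p) r is shrunk until the cube diameter 2^(-p) r sqrt(n) is less than eps,
    # and the centers of the cubes containing sample points form an eps net.
    import math

    def grid_net(points, r, eps):
        n = len(points[0])
        p = 0
        while 2.0 ** (-p) * r * math.sqrt(n) >= eps:   # shrink until diameter < eps
            p += 1
        side = 2.0 ** (-p) * r
        centers = set()
        for x in points:
            index = tuple(int((xi + r) // side) for xi in x)   # cube containing x
            centers.add(tuple(-r + (i + 0.5) * side for i in index))
        return list(centers)

    sample = [(0.3, -0.7), (0.31, -0.69), (0.9, 0.2)]
    net = grid_net(sample, r=1.0, eps=0.25)
    print(len(net), "net points; every sample point is within eps of some net point")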
The next theorem is called the Heine Borel theorem and it characterizes the
compact sets in Rn .

Theorem 15.2.7 A subset of Rn is compact if and only if it is closed and bounded.

Proof: Since a set in Rn is totally bounded if and only if it is bounded, this


theorem follows from Proposition 15.2.5 and the observation that a subset of Rn is
closed if and only if it is complete. This proves the theorem.

15.3 Some Applications Of Compactness


The following corollary is an important existence theorem which depends on com-
pactness.

Corollary 15.3.1 Let X be a compact metric space and let f : X → ℝ be continuous. Then max {f(x) : x ∈ X} and min {f(x) : x ∈ X} both exist.

Proof: First it is shown f(X) is compact. Suppose C is a set of open sets whose union contains f(X). Then since f is continuous, f^{-1}(U) is open for all U ∈ C. Therefore, {f^{-1}(U) : U ∈ C} is a collection of open sets whose union contains X. Since X is compact, it follows finitely many of these, f^{-1}(U_1), ⋯, f^{-1}(U_p), contain X in their union. Therefore, f(X) ⊆ ∪_{k=1}^p U_k, showing f(X) is compact as claimed.

Now since f(X) is compact, Theorem 15.2.7 implies f(X) is closed and bounded. Therefore, it contains its inf and its sup. Thus f achieves both a maximum and a minimum.

Definition 15.3.2 Let X, Y be metric spaces and f : X → Y a function. f is uniformly continuous if for all ε > 0 there exists δ > 0 such that whenever x_1 and x_2 are two points of X satisfying d(x_1, x_2) < δ, it follows that d(f(x_1), f(x_2)) < ε.

A very important theorem is the following.

Theorem 15.3.3 Suppose f : X → Y is continuous and X is compact. Then f is uniformly continuous.

Proof: Suppose this is not true and that f is continuous but not uniformly continuous. Then there exists ε > 0 such that for all δ > 0 there exist points p_δ and q_δ such that d(p_δ, q_δ) < δ and yet d(f(p_δ), f(q_δ)) ≥ ε. Let p_n and q_n be the points which go with δ = 1/n. By Proposition 15.2.5, {p_n} has a convergent subsequence {p_{n_k}} converging to a point x ∈ X. Since d(p_n, q_n) < 1/n, it follows that q_{n_k} → x also. Therefore,

ε ≤ d(f(p_{n_k}), f(q_{n_k})) ≤ d(f(p_{n_k}), f(x)) + d(f(x), f(q_{n_k}))

but by continuity of f, both d(f(p_{n_k}), f(x)) and d(f(x), f(q_{n_k})) converge to 0 as k → ∞, contradicting the above inequality. This proves the theorem.
Another important property of compact sets in a metric space concerns the finite intersection property.

Definition 15.3.4 If every finite subset of a collection of sets has nonempty intersection, the collection has the finite intersection property.

Theorem 15.3.5 Suppose F is a collection of compact sets in a metric space X which has the finite intersection property. Then each of these sets is closed and there exists a point in their intersection. (∩F ≠ ∅.)

Proof: First I show each compact set is closed. Let K be a nonempty compact set and suppose p ∉ K. Then for each x ∈ K, let V_x = B(x, d(p, x)/3) and U_x = B(p, d(p, x)/3) so that U_x and V_x have empty intersection. Then since K is compact, there are finitely many V_x which cover K, say V_{x_1}, ⋯, V_{x_n}. Then let U = ∩_{i=1}^n U_{x_i}. It follows p ∈ U and U has empty intersection with K. In fact U has empty intersection with ∪_{i=1}^n V_{x_i}. Since U is an open set and p ∈ K^C is arbitrary, it follows K^C is an open set.

Consider now the claim about the intersection. If this were not so, ∪{F^C : F ∈ F} = X and so, in particular, picking some F_0 ∈ F, {F^C : F ∈ F} would be an open cover of F_0. Since F_0 is compact, some finite subcover F_1^C, ⋯, F_m^C exists. But then F_0 ⊆ ∪_{k=1}^m F_k^C, which means ∩_{k=0}^m F_k = ∅, contrary to the finite intersection property. To see this, note that if x ∈ F_0, then it must fail to be in some F_k and so it is not in ∩_{k=0}^m F_k. Since this is true for every x it follows ∩_{k=0}^m F_k = ∅.

Theorem 15.3.6 Let X_i be a compact metric space with metric d_i. Then ∏_{i=1}^m X_i is also a compact metric space with respect to the metric d(x, y) ≡ max_i (d_i(x_i, y_i)).

Proof: This is most easily seen from sequential compactness. Let {x^k}_{k=1}^∞ be a sequence of points in ∏_{i=1}^m X_i. Consider the i-th component of x^k, x_i^k. It follows {x_i^k} is a sequence of points in X_i and so it has a convergent subsequence. Compactness of X_1 implies there exists a subsequence of {x^k}, denoted by {x^{k_1}}, such that

lim_{k_1→∞} x_1^{k_1} → x_1 ∈ X_1.

Now there exists a further subsequence, denoted by {x^{k_2}}, such that in addition to this, x_2^{k_2} → x_2 ∈ X_2. After taking m such subsequences, there exists a subsequence {x^l} such that lim_{l→∞} x_i^l = x_i ∈ X_i for each i. Therefore, letting x ≡ (x_1, ⋯, x_m), x^l → x in ∏_{i=1}^m X_i. This proves the theorem.

15.4 Ascoli Arzela Theorem


Definition 15.4.1 Let (X, d) be a complete metric space. Then it is said to be locally compact if B̅(x, r), the closure of B(x, r), is compact for each r > 0.

Thus if you have a locally compact metric space, then if {a_n} is a bounded sequence, it must have a convergent subsequence.

Let K be a compact subset of ℝ^n and consider the continuous functions which have values in a locally compact metric space (X, d), where d denotes the metric on X. Denote this space as C(K, X).

Definition 15.4.2 For f, g ∈ C(K, X), where K is a compact subset of ℝ^n and X is a locally compact complete metric space, define

ρ_K(f, g) ≡ sup {d(f(x), g(x)) : x ∈ K}.

Then ρ_K provides a distance which makes C(K, X) into a metric space. The Ascoli Arzela theorem is a major result which tells which subsets of C(K, X) are sequentially compact.

Definition 15.4.3 Let A ⊆ C(K, X) for K a compact subset of ℝ^n. Then A is said to be uniformly equicontinuous if for every ε > 0 there exists a δ > 0 such that whenever x, y ∈ K with |x − y| < δ and f ∈ A,

d(f(x), f(y)) < ε.

The set A is said to be uniformly bounded if for some M < ∞ and a ∈ X,

f(x) ∈ B(a, M)

for all f ∈ A and x ∈ K.

Uniform equicontinuity is like saying that the whole set of functions A is uniformly continuous on K uniformly for f ∈ A. The version of the Ascoli Arzela theorem I will present here is the following.

Theorem 15.4.4 Suppose K is a nonempty compact subset of ℝ^n and A ⊆ C(K, X) is uniformly bounded and uniformly equicontinuous. Then if {f_k} ⊆ A, there exists a function f ∈ C(K, X) and a subsequence f_{k_l} such that

lim_{l→∞} ρ_K(f_{k_l}, f) = 0.

To give a proof of this theorem, I will first prove some lemmas.


Lemma 15.4.5 If K is a compact subset of Rn , then there exists D {xk }k=1 K
such that D is dense in K. Also, for every > 0 there exists a nite set of points,
{x1 , , xm } K, called an net such that

m
i=1 B (xi , ) K.

1 K. If every point of K is within 1/m of x1 , stop.


Proof: For m N, pick xm m

Otherwise, pick
2 K \ B (x1 , 1/m) .
xm m

1 , 1/m) B (x2 , 1/m) , stop. Otherwise, pick


If every point of K contained in B (xm m

3 K \ (B (x1 , 1/m) B (x2 , 1/m)) .


xm m m

1 , 1/m) B (x2 , 1/m) B (x3 , 1/m) , stop.


If every point of K is contained in B (xm m m

Otherwise, pick

4 K \ (B (x1 , 1/m) B (x2 , 1/m) B (x3 , 1/m))


xm m m m

Continue this way until the process stops, say at N (m). It must stop because
if it didnt, there would be a convergent subsequence due to the compactness of
K. Ultimately all terms of this convergent subsequence would be closer than 1/m,
violating the manner in which they are chosen. Then D =
N (m)
m=1 k=1 {xk } . This
m

is countable because it is a countable union of countable sets. If y K and > 0,


then for some m, 2/m < and so B (y, ) must contain some point of {xm k } since
otherwise, the process stopped too soon. You could have picked y. This proves the
lemma.
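The stopping argument in this proof is a greedy packing procedure, which can be imitated numerically. Below is a small Python sketch, purely illustrative and under the assumption that the compact set is replaced by a finite sample of points (the helper name greedy_net is mine): points are accepted one at a time, each at distance at least 1/m from all previously accepted points, and the result is a 1/m net of the sample.

    # Greedy construction of a 1/m net on a finite sample standing in for K,
    # mimicking the proof of Lemma 15.4.5.  Illustrative only.
    import math

    def greedy_net(points, radius):
        net = []
        for x in points:
            # keep x only if it is not already within `radius` of a chosen net point
            if all(math.dist(x, y) >= radius for y in net):
                net.append(x)
        return net

    K_sample = [(i / 100.0, (i / 100.0) ** 2) for i in range(101)]   # points on a parabola
    net = greedy_net(K_sample, radius=0.1)
    print("net size:", len(net))
    # every sample point lies within 0.1 of some net point:
    print(all(any(math.dist(x, y) < 0.1 for y in net) for x in K_sample))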

Lemma 15.4.6 Suppose D is defined above and {g_m} is a sequence of functions of A having the property that for every x_k ∈ D,

lim_{m→∞} g_m(x_k) exists.

Then there exists g ∈ C(K, X) such that

lim_{m→∞} ρ_K(g_m, g) = 0.

Proof: Define g first on D.

g(x_k) ≡ lim_{m→∞} g_m(x_k).

Next I show that {g_m} converges at every point of K. Let x ∈ K and let ε > 0 be given. Choose x_k ∈ D such that for all f ∈ A,

d(f(x_k), f(x)) < ε/3.

I can do this by the equicontinuity. Now if p, q are large enough, say p, q ≥ M,

d(g_p(x_k), g_q(x_k)) < ε/3.

Therefore, for p, q ≥ M,

d(g_p(x), g_q(x)) ≤ d(g_p(x), g_p(x_k)) + d(g_p(x_k), g_q(x_k)) + d(g_q(x_k), g_q(x)) < ε/3 + ε/3 + ε/3 = ε.

It follows that {g_m(x)} is a Cauchy sequence having values in X. Therefore, it converges. Let g(x) be the name of the thing it converges to.

Let ε > 0 be given and pick δ > 0 such that whenever x, y ∈ K and |x − y| < δ, it follows d(f(x), f(y)) < ε/3 for all f ∈ A. Now let {x_1, ⋯, x_m} be a δ net for K as in Lemma 15.4.5. Since there are only finitely many points in this net, it follows that there exists N such that for all p, q ≥ N,

d(g_q(x_i), g_p(x_i)) < ε/3

for all x_i ∈ {x_1, ⋯, x_m}. Therefore, for arbitrary x ∈ K, pick x_i ∈ {x_1, ⋯, x_m} such that |x_i − x| < δ. Then

d(g_q(x), g_p(x)) ≤ d(g_q(x), g_q(x_i)) + d(g_q(x_i), g_p(x_i)) + d(g_p(x_i), g_p(x)) < ε/3 + ε/3 + ε/3 = ε.

Since N does not depend on the choice of x, it follows this sequence {g_m} is uniformly Cauchy. That is, for every ε > 0, there exists N such that if p, q ≥ N, then

ρ_K(g_p, g_q) < ε.

Next, I need to verify that the function g is a continuous function. Let N be large enough that whenever p, q ≥ N, the above holds. Then for all x ∈ K,

d(g(x), g_p(x)) ≤ ε/3    (15.4.6)

whenever p ≥ N. This follows from observing that for p, q ≥ N,

d(g_q(x), g_p(x)) < ε/3

and then taking the limit as q → ∞ to obtain 15.4.6. In passing to the limit, you can use the following simple claim.

Claim: In a metric space, if a_n → a, then d(a_n, b) → d(a, b).
Proof of the claim: You note that by the triangle inequality, d(a_n, b) − d(a, b) ≤ d(a_n, a) and d(a, b) − d(a_n, b) ≤ d(a_n, a) and so

|d(a_n, b) − d(a, b)| ≤ d(a_n, a).

Now let p satisfy 15.4.6 for all x whenever p > N. Also pick δ > 0 such that if |x − y| < δ, then

d(g_p(x), g_p(y)) < ε/3.

Then if |x − y| < δ,

d(g(x), g(y)) ≤ d(g(x), g_p(x)) + d(g_p(x), g_p(y)) + d(g_p(y), g(y)) < ε/3 + ε/3 + ε/3 = ε.

Since ε was arbitrary, this shows that g is continuous.

It only remains to verify that ρ_K(g, g_k) → 0. But this follows from 15.4.6. This proves the lemma.
With these lemmas, it is time to prove Theorem 15.4.4.

Proof of Theorem 15.4.4: Let D = {x_k} be the countable dense subset of K guaranteed by Lemma 15.4.5 and let {(1,1), (1,2), (1,3), (1,4), (1,5), ⋯} be a subsequence of ℕ such that

lim_{k→∞} f_{(1,k)}(x_1) exists.

This is where the local compactness of X is being used. Now let

{(2,1), (2,2), (2,3), (2,4), (2,5), ⋯}

be a subsequence of {(1,1), (1,2), (1,3), (1,4), (1,5), ⋯} which has the property that

lim_{k→∞} f_{(2,k)}(x_2) exists.

Thus it is also the case that

f_{(2,k)}(x_1) converges to lim_{k→∞} f_{(1,k)}(x_1)

because every subsequence of a convergent sequence converges to the same thing as the convergent sequence. Continue this way and consider the array

f_{(1,1)}, f_{(1,2)}, f_{(1,3)}, f_{(1,4)}, ⋯   converges at x_1
f_{(2,1)}, f_{(2,2)}, f_{(2,3)}, f_{(2,4)}, ⋯   converges at x_1 and x_2
f_{(3,1)}, f_{(3,2)}, f_{(3,3)}, f_{(3,4)}, ⋯   converges at x_1, x_2, and x_3
⋮

Now let g_k ≡ f_{(k,k)}. Thus {g_k} is ultimately a subsequence of {f_{(m,k)}} whenever k > m and therefore, {g_k} converges at each point of D. By Lemma 15.4.6 it follows there exists g ∈ C(K; X) such that

lim_{k→∞} ρ_K(g, g_k) = 0.

This proves the Ascoli Arzela theorem.

Actually there is an if and only if version of it but the most useful case is what is presented here. The process used to get the subsequence in the proof is called the Cantor diagonalization procedure.
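The diagonal selection can be imitated on finite data. The sketch below is illustrative only: everything is truncated to finitely many functions and finitely many points, and a crude monotone-subsequence step stands in for Bolzano-Weierstrass; all names (monotone_subsequence, diagonal_subsequence, fam) are mine, not the text's.

    # A finite, purely illustrative rendition of the Cantor diagonalization procedure:
    # refine the index set at x_1, then x_2, ..., then take the diagonal entries.
    import math

    def monotone_subsequence(indices, values):
        # crude stand-in for Bolzano-Weierstrass: keep a monotone run of the values
        def greedy(keep):
            kept = [indices[0]]
            for i in indices[1:]:
                if keep(values[i], values[kept[-1]]):
                    kept.append(i)
            return kept
        up = greedy(lambda a, b: a >= b)
        down = greedy(lambda a, b: a <= b)
        return up if len(up) >= len(down) else down

    def diagonal_subsequence(f, points, N):
        indices = list(range(N))          # indices of the functions f_0, ..., f_{N-1}
        rows = []
        for x in points:
            values = {k: f(k, x) for k in indices}
            indices = monotone_subsequence(indices, values)   # refine at this point
            rows.append(indices)
        # diagonal choice: the j-th member of the j-th refinement, when it exists
        return [rows[j][j] for j in range(len(rows)) if j < len(rows[j])]

    fam = lambda k, x: math.sin(x + 1.0 / (k + 1))   # equicontinuous, uniformly bounded
    print(diagonal_subsequence(fam, [0.0, 0.5, 1.0], N=200))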

15.5 General Topological Spaces

It turns out that metric spaces are not sufficiently general for some applications. This section is a brief introduction to general topology. In making this generalization, the properties of balls which are the conclusion of Theorem 15.1.4 on Page 364 are stated as axioms for a subset of the power set of a given set which will be known as a basis for the topology.

Definition 15.5.1 Let X be a nonempty set and suppose B ⊆ P(X). Then B is a basis for a topology if it satisfies the following axioms.
1.) Whenever p ∈ A ∩ B for A, B ∈ B, it follows there exists C ∈ B such that p ∈ C ⊆ A ∩ B.
2.) ∪B = X.
Then a subset U of X is an open set if for every point x ∈ U, there exists B ∈ B such that x ∈ B ⊆ U. Thus the open sets are exactly those which can be obtained as a union of sets of B. Denote these subsets of X by the symbol τ and refer to τ as the topology or the set of open sets.

Note that this is simply the analog of saying a set is open exactly when every point is an interior point.

Proposition 15.5.2 Let X be a set and let B be a basis for a topology as defined above and let τ be the set of open sets determined by B. Then

∅ ∈ τ, X ∈ τ,   (15.5.7)
If C ⊆ τ, then ∪C ∈ τ,   (15.5.8)
If A, B ∈ τ, then A ∩ B ∈ τ.   (15.5.9)

Proof: If p ∈ ∅ then there exists B ∈ B such that p ∈ B ⊆ ∅ because there are no points in ∅. Therefore, ∅ ∈ τ. Now if p ∈ X, then by part 2.) of Definition 15.5.1, p ∈ B ⊆ X for some B ∈ B and so X ∈ τ.

If C ⊆ τ, and if p ∈ ∪C, then there exists a set B ∈ C such that p ∈ B. However, B is itself a union of sets from B and so there exists C ∈ B such that p ∈ C ⊆ B ⊆ ∪C. This verifies 15.5.8.

Finally, if A, B ∈ τ and p ∈ A ∩ B, then since A and B are themselves unions of sets of B, it follows there exist A_1, B_1 ∈ B such that A_1 ⊆ A, B_1 ⊆ B, and p ∈ A_1 ∩ B_1. Therefore, by 1.) of Definition 15.5.1 there exists C ∈ B such that p ∈ C ⊆ A_1 ∩ B_1 ⊆ A ∩ B, showing that A ∩ B ∈ τ as claimed. Of course if A ∩ B = ∅, then A ∩ B ∈ τ. This proves the proposition.

Definition 15.5.3 A set X together with such a collection τ of its subsets satisfying 15.5.7-15.5.9 is called a topological space. τ is called the topology or set of open sets of X.

Definition 15.5.4 A topological space is said to be Hausdorff if whenever p and q are distinct points of X, there exist disjoint open sets U, V such that p ∈ U, q ∈ V. In other words, points can be separated with open sets.

(Picture: Hausdorff — the points p and q lie in disjoint open sets U and V.)

Definition 15.5.5 A subset of a topological space is said to be closed if its complement is open. Let p be a point of X and let E ⊆ X. Then p is said to be a limit point of E if every open set containing p contains a point of E distinct from p.

Note that if the topological space is Hausdorff, then this definition is equivalent to requiring that every open set containing p contains infinitely many points from E. Why?

Theorem 15.5.6 A subset E of X is closed if and only if it contains all its limit points.

Proof: Suppose first that E is closed and let x be a limit point of E. Is x ∈ E? If x ∉ E, then E^C is an open set containing x which contains no points of E, a contradiction. Thus x ∈ E.

Now suppose E contains all its limit points. Is the complement of E open? If x ∈ E^C, then x is not a limit point of E because E has all its limit points, and so there exists an open set U containing x such that U contains no point of E other than x. Since x ∉ E, it follows that x ∈ U ⊆ E^C which implies E^C is an open set because this shows E^C is the union of open sets.

Theorem 15.5.7 If (X, τ) is a Hausdorff space and if p ∈ X, then {p} is a closed set.

Proof: If x ≠ p, there exist open sets U and V such that x ∈ U, p ∈ V and U ∩ V = ∅. Therefore, {p}^C is an open set, so {p} is closed.

Note that the Hausdorff axiom was stronger than needed in order to draw the conclusion of the last theorem. In fact it would have been enough to assume that if x ≠ y, then there exists an open set containing x which does not contain y.

Definition 15.5.8 A topological space (X, τ) is said to be regular if whenever C is a closed set and p is a point not in C, there exist disjoint open sets U and V such that p ∈ U, C ⊆ V. Thus a closed set can be separated from a point not in the closed set by two disjoint open sets.

(Picture: Regular — the point p and the closed set C lie in disjoint open sets U and V.)

Definition 15.5.9 The topological space (X, τ) is said to be normal if whenever C and K are disjoint closed sets, there exist disjoint open sets U and V such that C ⊆ U, K ⊆ V. Thus any two disjoint closed sets can be separated with open sets.

(Picture: Normal — the disjoint closed sets C and K lie in disjoint open sets U and V.)

Definition 15.5.10 Let E be a subset of X. E̅ is defined to be the smallest closed set containing E.

Lemma 15.5.11 The above definition is well defined.

Proof: Let C denote all the closed sets which contain E. Then C is nonempty because X ∈ C.

(∩{A : A ∈ C})^C = ∪{A^C : A ∈ C},

an open set, which shows that ∩C is a closed set and is the smallest closed set which contains E.

Theorem 15.5.12 E̅ = E ∪ {limit points of E}.



Proof: Let x ∈ E̅ and suppose that x ∉ E. If x is not a limit point either, then there exists an open set U containing x which does not intersect E. But then U^C is a closed set which contains E and does not contain x, contrary to the definition that E̅ is the intersection of all closed sets containing E. Therefore, x must be a limit point of E after all.

Now E̅ ⊇ E, so suppose x is a limit point of E. Is x ∈ E̅? If H is a closed set containing E which does not contain x, then H^C is an open set containing x which contains no points of E other than x, negating the assumption that x is a limit point of E.

The following is the definition of continuity in terms of general topological spaces. It is really just a generalization of the ε-δ definition of continuity given in calculus.

Definition 15.5.13 Let (X, τ) and (Y, η) be two topological spaces and let f : X → Y. f is continuous at x ∈ X if whenever V is an open set of Y containing f(x), there exists an open set U ∈ τ such that x ∈ U and f(U) ⊆ V. f is continuous if f^{-1}(V) ∈ τ whenever V ∈ η.

You should prove the following.

Proposition 15.5.14 In the situation of Definition 15.5.13, f is continuous if and only if f is continuous at every point of X.

Definition 15.5.15 Let (X_i, τ_i) be topological spaces. ∏_{i=1}^n X_i is the Cartesian product. Define a product topology as follows. Let B consist of all sets ∏_{i=1}^n A_i where A_i ∈ τ_i. Then B is a basis for the product topology.

Theorem 15.5.16 The set B of Definition 15.5.15 is a basis for a topology.

Proof: Suppose x ∈ ∏_{i=1}^n A_i ∩ ∏_{i=1}^n B_i where A_i and B_i are open sets. Say

x = (x_1, ⋯, x_n).

Then x_i ∈ A_i ∩ B_i for each i. Therefore, x ∈ ∏_{i=1}^n A_i ∩ B_i ∈ B and ∏_{i=1}^n A_i ∩ B_i ⊆ ∏_{i=1}^n A_i ∩ ∏_{i=1}^n B_i.

The definition of compactness is also considered for a general topological space. This is given next.

Definition 15.5.17 A subset E of a topological space (X, τ) is said to be compact if whenever C ⊆ τ and E ⊆ ∪C, there exists a finite subset of C, {U_1, ⋯, U_n}, such that E ⊆ ∪_{i=1}^n U_i. (Every open covering admits a finite subcovering.) E is precompact if E̅ is compact. A topological space is called locally compact if it has a basis B with the property that B̅ is compact for each B ∈ B.

In general topological spaces there may be no concept of bounded. Even if there is, closed and bounded is not necessarily the same as compactness. However, in any Hausdorff space every compact set must be a closed set.

Theorem 15.5.18 If (X, τ) is a Hausdorff space, then every compact subset must also be a closed set.

Proof: Suppose p ∉ K. For each x ∈ K, there exist open sets U_x and V_x such that

x ∈ U_x, p ∈ V_x,

and

U_x ∩ V_x = ∅.

Since K is assumed to be compact, there are finitely many of these sets U_{x_1}, ⋯, U_{x_m} which cover K. Then let V ≡ ∩_{i=1}^m V_{x_i}. It follows that V is an open set containing p which has empty intersection with each of the U_{x_i}. Consequently, V contains no points of K and therefore p is not a limit point of K. This proves the theorem.

A useful construction when dealing with locally compact Hausdorff spaces is the notion of the one point compactification of the space.
Definition 15.5.19 Suppose (X, τ) is a locally compact Hausdorff space. Then let X̃ ≡ X ∪ {∞} where ∞ is just the name of some point which is not in X which is called the point at infinity. A basis for the topology τ̃ for X̃ is

τ ∪ {K^C where K is a compact subset of X}.

The complement is taken with respect to X̃ and so the open sets K^C are basic open sets which contain ∞.

The reason this is called a compactification is contained in the next lemma.
Lemma 15.5.20 If (X, τ) is a locally compact Hausdorff space, then (X̃, τ̃) is a compact Hausdorff space. Also if U is an open set of τ̃, then U \ {∞} is an open set of τ.

Proof: Since (X, τ) is a locally compact Hausdorff space, it follows (X̃, τ̃) is a Hausdorff topological space. The only case which needs checking is the one of p ∈ X and ∞. Since (X, τ) is locally compact, there exists an open set of τ, U, having compact closure which contains p. Then p ∈ U and ∞ ∈ (U̅)^C and these are disjoint open sets containing the points p and ∞ respectively. Now let C be an open cover of X̃ with sets from τ̃. Then ∞ must be in some set U_∞ from C, which must contain a set of the form K^C where K is a compact subset of X. Then there exist sets from C, U_1, ⋯, U_r, which cover K. Therefore, a finite subcover of X̃ is U_1, ⋯, U_r, U_∞.

To see the last claim, suppose U contains ∞ since otherwise there is nothing to show. Notice that if C is a compact set, then X \ C is an open set. Therefore, if x ∈ U \ {∞}, and if X̃ \ C is a basic open set contained in U containing ∞, then if x is in this basic open set of X̃, it is also in the open set X \ C ⊆ U \ {∞}. If x is not in any basic open set of the form X̃ \ C then x is contained in an open set of τ which is contained in U \ {∞}. Thus U \ {∞} is indeed open in τ. This proves the lemma.

Definition 15.5.21 If every finite subset of a collection of sets has nonempty intersection, the collection has the finite intersection property.

Theorem 15.5.22 Let K be a set whose elements are compact subsets of a Hausdorff topological space (X, τ). Suppose K has the finite intersection property. Then ∅ ≠ ∩K.

Proof: Suppose to the contrary that ∅ = ∩K. Then consider

C ≡ {K^C : K ∈ K}.

It follows C is an open cover of K_0 where K_0 is any particular element of K. But then there are finitely many K ∈ K, K_1, ⋯, K_r, such that K_0 ⊆ ∪_{i=1}^r K_i^C, implying that ∩_{i=0}^r K_i = ∅, contradicting the finite intersection property.

Lemma 15.5.23 Let (X, τ) be a topological space and let B be a basis for τ. Then K is compact if and only if every open cover of basic open sets admits a finite subcover.

Proof: Suppose first that X is compact. Then if C is an open cover consisting of basic open sets, it follows it admits a finite subcover because these are open sets in C.

Next suppose that every basic open cover admits a finite subcover and let C be an open cover of X. Then define C̃ to be the collection of basic open sets which are contained in some set of C. It follows C̃ is a basic open cover of X and so it admits a finite subcover, {U_1, ⋯, U_p}. Now each U_i is contained in an open set of C. Let O_i be a set of C which contains U_i. Then {O_1, ⋯, O_p} is an open cover of X. This proves the lemma.

In fact, much more can be said than Lemma 15.5.23. However, this is all which I will present here.

15.6 Connected Sets

Stated informally, connected sets are those which are in one piece. More precisely,

Definition 15.6.1 A set S in a general topological space is separated if there exist sets A, B such that

S = A ∪ B, A, B ≠ ∅, and A ∩ B̅ = B ∩ A̅ = ∅.

In this case, the sets A and B are said to separate S. A set is connected if it is not separated.

One of the most important theorems about connected sets is the following.

Theorem 15.6.2 Suppose U and V are connected sets having nonempty intersection. Then U ∪ V is also connected.

Proof: Suppose U ∪ V = A ∪ B where A ∩ B̅ = B ∩ A̅ = ∅. Consider the sets A ∩ U and B ∩ U. Since

(A ∩ U) ∩ (closure of (B ∩ U)) ⊆ (A ∩ U) ∩ B̅ = ∅,

and similarly (closure of (A ∩ U)) ∩ (B ∩ U) = ∅, it follows one of these sets must be empty since otherwise U would be separated. It follows that U is contained in either A or B. Similarly, V must be contained in either A or B. Since U and V have nonempty intersection, it follows that both V and U are contained in one of the sets A, B. Therefore, the other must be empty and this shows U ∪ V cannot be separated and is therefore connected. ■

The intersection of connected sets is not necessarily connected as is shown by the following picture.

Theorem 15.6.3 Let f : X → Y be continuous where X and Y are topological spaces and X is connected. Then f(X) is also connected.

Proof: To do this you show f(X) is not separated. Suppose to the contrary that f(X) = A ∪ B where A and B separate f(X). Then consider the sets f^{-1}(A) and f^{-1}(B). If z ∈ f^{-1}(B), then f(z) ∈ B and so f(z) is not a limit point of A. Therefore, there exists an open set U containing f(z) such that U ∩ A = ∅. But then, the continuity of f implies that f^{-1}(U) is an open set containing z such that f^{-1}(U) ∩ f^{-1}(A) = ∅. Therefore, f^{-1}(B) contains no limit points of f^{-1}(A). Similar reasoning implies f^{-1}(A) contains no limit points of f^{-1}(B). It follows that X is separated by f^{-1}(A) and f^{-1}(B), contradicting the assumption that X was connected. ■

An arbitrary set can be written as a union of maximal connected sets called connected components. This is the concept of the next definition.

Definition 15.6.4 Let S be a set and let p ∈ S. Denote by C_p the union of all connected subsets of S which contain p. This is called the connected component determined by p.

Theorem 15.6.5 Let C_p be a connected component of a set S in a general topological space. Then C_p is a connected set and if C_p ∩ C_q ≠ ∅, then C_p = C_q.

Proof: Let C denote the connected subsets of S which contain p. If C_p = A ∪ B where

A ∩ B̅ = B ∩ A̅ = ∅,

then p is in one of A or B. Suppose without loss of generality p ∈ A. Then every set of C must also be contained in A since otherwise, as in Theorem 15.6.2, the set would be separated. But this implies B is empty. Therefore, C_p is connected. From this, and Theorem 15.6.2, the second assertion of the theorem is proved. ■

This shows the connected components of a set are equivalence classes and partition the set.

A set I is an interval in ℝ if and only if whenever x, y ∈ I then (x, y) ⊆ I. The following theorem is about the connected sets in ℝ.

Theorem 15.6.6 A set C in ℝ is connected if and only if C is an interval.

Proof: Let C be connected. If C consists of a single point p, there is nothing to prove. The interval is just [p, p]. Suppose p < q and p, q ∈ C. You need to show (p, q) ⊆ C. If

x ∈ (p, q) \ C,

let C ∩ (−∞, x) ≡ A, and C ∩ (x, ∞) ≡ B. Then C = A ∪ B and the sets A and B separate C contrary to the assumption that C is connected.

Conversely, let I be an interval. Suppose I is separated by A and B. Pick x ∈ A and y ∈ B. Suppose without loss of generality that x < y. Now define the set

S ≡ {t ∈ [x, y] : [x, t] ⊆ A}

and let l be the least upper bound of S. Then l ∈ A̅ so l ∉ B which implies l ∈ A. But if l ∉ B̅, then for some δ > 0,

(l, l + δ) ∩ B = ∅,

contradicting the definition of l as an upper bound for S. Therefore, l ∈ B̅ which implies l ∉ A after all, a contradiction. It follows I must be connected. ■
The following theorem is a very useful description of the open sets in ℝ.

Theorem 15.6.7 Let U be an open set in ℝ. Then there exist countably many disjoint open sets {(a_i, b_i)}_{i=1}^∞ such that U = ∪_{i=1}^∞ (a_i, b_i).

Proof: Let p ∈ U and let z ∈ C_p, the connected component determined by p. Since U is open, there exists δ > 0 such that (z − δ, z + δ) ⊆ U. It follows from Theorem 15.6.2 that

(z − δ, z + δ) ⊆ C_p.

This shows C_p is open. By Theorem 15.6.6, this shows C_p is an open interval (a, b) where a, b ∈ [−∞, ∞]. There are therefore at most countably many of these connected components because each must contain a rational number and the rational numbers are countable. Denote by {(a_i, b_i)}_{i=1}^∞ the set of these connected components. ■
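When an open set is handed to us concretely as a finite union of open intervals, the connected components can be computed by merging overlapping intervals. A small illustrative Python sketch (the function name components is mine):

    # Compute the maximal disjoint open intervals (connected components) of a finite
    # union of open intervals in R, in the spirit of Theorem 15.6.7.  Illustrative only.
    def components(intervals):
        merged = []
        for a, b in sorted(intervals):
            if merged and a < merged[-1][1]:      # overlaps the last component
                merged[-1] = (merged[-1][0], max(merged[-1][1], b))
            else:
                merged.append((a, b))
        return merged

    print(components([(0, 1), (0.5, 2), (3, 4), (3.5, 3.8)]))   # [(0, 2), (3, 4)]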

Definition 15.6.8 A topological space E is arcwise connected if for any two points p, q ∈ E, there exists a closed interval [a, b] and a continuous function γ : [a, b] → E such that γ(a) = p and γ(b) = q. E is locally connected if it has a basis of connected open sets. E is locally arcwise connected if it has a basis of arcwise connected open sets.

An example of an arcwise connected topological space would be any subset of ℝ^n which is the continuous image of an interval. Locally connected is not the same as connected. A well known example is the following.

{(x, sin(1/x)) : x ∈ (0, 1]} ∪ {(0, y) : y ∈ [−1, 1]}    (15.6.10)

You can verify that this set of points considered as a metric space with the metric from ℝ^2 is not locally connected or arcwise connected but is connected.

Proposition 15.6.9 If a topological space is arcwise connected, then it is connected.

Proof: Let X be an arcwise connected space and suppose it is separated. Then X = A ∪ B where A, B are two separated sets. Pick p ∈ A and q ∈ B. Since X is given to be arcwise connected, there must exist a continuous function γ : [a, b] → X such that γ(a) = p and γ(b) = q. But then we would have γ([a, b]) = (γ([a, b]) ∩ A) ∪ (γ([a, b]) ∩ B) and the two sets γ([a, b]) ∩ A and γ([a, b]) ∩ B are separated, thus showing that γ([a, b]) is separated and contradicting Theorem 15.6.6 and Theorem 15.6.3. It follows that X must be connected as claimed. ■

Theorem 15.6.10 Let U be an open subset of a locally arcwise connected topological space X. Then U is arcwise connected if and only if U is connected. Also the connected components of an open set in such a space are open sets, hence arcwise connected.

Proof: By Proposition 15.6.9 it is only necessary to verify that if U is connected and open in the context of this theorem, then U is arcwise connected. Pick p ∈ U. Say x ∈ U satisfies P if there exists a continuous function γ : [a, b] → U such that γ(a) = p and γ(b) = x.

A ≡ {x ∈ U such that x satisfies P}.

If x ∈ A, there exists, according to the assumption that X is locally arcwise connected, an open set V containing x and contained in U which is arcwise connected. Thus letting y ∈ V, there exist intervals [a, b] and [c, d] and continuous functions γ, η having values in U such that γ(a) = p, γ(b) = x, η(c) = x, and η(d) = y. Then let γ_1 : [a, b + d − c] → U be defined as

γ_1(t) ≡ γ(t) if t ∈ [a, b],  γ_1(t) ≡ η(t + c − b) if t ∈ [b, b + d − c].

Then it is clear that γ_1 is a continuous function mapping p to y, showing that V ⊆ A. Therefore, A is open. A ≠ ∅ because there is an open set V containing p which is contained in U and is arcwise connected.

Now consider B ≡ U \ A. This is also open. If B is not open, there exists a point z ∈ B such that every open set containing z is not contained in B. Therefore, letting V be one of the basic open sets chosen such that z ∈ V ⊆ U, there exist points of A contained in V. But then, a repeat of the above argument shows z ∈ A also. Hence B is open and so if B ≠ ∅, then U = B ∪ A and so U is separated by the two sets B and A, contradicting the assumption that U is connected.

It remains to verify the connected components are open. Let z ∈ C_p where C_p is the connected component determined by p. Then picking V an arcwise connected open set which contains z and is contained in U, C_p ∪ V is connected and contained in U and so it must also be contained in C_p. ■
As an application, consider the following corollary.

Corollary 15.6.11 Let f : Ω → ℤ be continuous where Ω is a connected open set. Then f must be a constant.

Proof: Suppose not. Then it achieves two different values, k and l ≠ k. Then Ω = f^{-1}(l) ∪ f^{-1}({m ∈ ℤ : m ≠ l}) and these are disjoint nonempty open sets which separate Ω. To see they are open, note

f^{-1}({m ∈ ℤ : m ≠ l}) = f^{-1}(∪_{m≠l} (m − 1/6, m + 1/6)),

which is the inverse image of an open set. ■

15.7 Exercises
1. Let V be an open set in ℝ^n. Show there is an increasing sequence of open sets {U_m} such that for all m ∈ ℕ, U̅_m ⊆ U_{m+1}, U̅_m is compact, and V = ∪_{m=1}^∞ U_m.

2. Completeness of ℝ is an axiom. Using this, show ℝ^n and ℂ^n are complete metric spaces with respect to the distance given by the usual norm.
3. Let X be a metric space. Can we conclude that the closure of B(x, r) equals {y : d(x, y) ≤ r}? Hint: Try letting X consist of an infinite set and let d(x, y) = 1 if x ≠ y and d(x, y) = 0 if x = y.
4. The usual version of completeness in R involves the assertion that a nonempty
set which is bounded above has a least upper bound. Show this is equivalent
to saying that every Cauchy sequence converges.
5. If (X, d) is a metric space, prove that whenever K, H are disjoint non empty
closed sets, there exists f : X [0, 1] such that f is continuous, f (K) = {0},
and f (H) = {1}.

6. Consider ℝ with the usual metric, d(x, y) = |x − y|, and the metric

ρ(x, y) = |arctan x − arctan y|.

Thus we have two metric spaces here although they involve the same sets of points. Show the identity map is continuous and has a continuous inverse. Show that ℝ with the metric ρ is not complete while ℝ with the usual metric is complete. The first part of this problem shows the two metric spaces are homeomorphic. (That is what it is called when there is a one to one onto continuous map having continuous inverse between two topological spaces.) Thus completeness is not a topological property although it will likely be referred to as such.

7. If M is a separable metric space and T ⊆ M, then T is also a separable metric space with the same metric.

metric space with the same metric.

8. Prove the Heine Borel theorem as follows. First show [a, b] is compact in ℝ. Next show that ∏_{i=1}^n [a_i, b_i] is compact. Use this to verify that compact sets are exactly those which are closed and bounded.

9. Give an example of a metric space in which closed and bounded subsets are not necessarily compact. Hint: Let X be any infinite set and let d(x, y) = 1 if x ≠ y and d(x, y) = 0 if x = y. Show this is a metric space. What about B(x, 2)?

10. If f : [a, b] → ℝ is continuous, show that f is Riemann integrable. Hint: Use the theorem that a function which is continuous on a compact set is uniformly continuous.

11. Give an example of a set X ⊆ ℝ^2 which is connected but not arcwise connected. Recall arcwise connected means for every two points p, q ∈ X there exists a continuous function f : [0, 1] → X such that f(0) = p, f(1) = q.

12. Let (X, d) be a metric space where d is a bounded metric. Let C denote the collection of closed subsets of X. For A, B ∈ C, define

ρ(A, B) ≡ inf {δ > 0 : A_δ ⊇ B and B_δ ⊇ A}

where for a set S,

S_δ ≡ {x : dist(x, S) ≡ inf {d(x, s) : s ∈ S} ≤ δ}.

Show S_δ is a closed set containing S. Also show that ρ is a metric on C. This is called the Hausdorff metric.
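For finite sets (which are closed), the Hausdorff metric of this exercise reduces to a simple max-min computation. An illustrative Python sketch (the function name hausdorff is mine, assuming the ambient metric is the Euclidean distance on the plane):

    # Hausdorff distance between two finite (hence closed) subsets of the plane:
    # rho(A, B) = max( sup_{a in A} dist(a, B), sup_{b in B} dist(b, A) ).
    import math

    def hausdorff(A, B):
        d = lambda x, S: min(math.dist(x, s) for s in S)   # dist(x, S)
        return max(max(d(a, B) for a in A), max(d(b, A) for b in B))

    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 0.1), (1.0, 0.0), (2.0, 0.0)]
    print(hausdorff(A, B))   # 1.0, driven by the point (2, 0) of B far from A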

13. Using 12, suppose (X, d) is a compact metric space. Show (C, ρ) is a complete metric space. Hint: Show first that if W_n ↓ W where W_n is closed, then ρ(W_n, W) → 0. Now let {A_n} be a Cauchy sequence in C. Then if ε > 0

there exists N such that when m, n N , then (An , Am ) < . Therefore, for
each n N ,
(An )
k=n Ak .

Let A
n=1 k=n Ak . By the rst part, there exists N1 > N such that for
n N1 , ( )

k=n Ak , A < , and (An ) k=n Ak .

Therefore, for such n, A Wn An and (Wn ) (An ) A because

(An )
k=n Ak A.

14. In the situation of the last two problems, let X be a compact metric space. Show (C, ρ) is compact. Hint: Let D_n be a 2^{−n} net for X. Let K_n denote finite unions of sets of the form B(p, 2^{−n}) where p ∈ D_n. Show K_n is a 2^{−(n−1)} net for (C, ρ).

15. Suppose U is an open connected subset of ℝ^n and f : U → ℕ is continuous. That is, f has values only in ℕ. Also ℕ is a metric space with respect to the usual metric on ℝ. Show that f must actually be constant.

16. Explain why L (V, W ) is always a complete normed vector space whenever
V, W are nite dimensional normed vector spaces for any choice of norm for
L (V, W ). Also explain why every closed and bounded subset of L (V, W ) is
sequentially compact for any choice of norm on this space.

17. Let L ∈ L(V, V) where V is a finite dimensional normed vector space. Define

e^L ≡ Σ_{k=0}^∞ L^k / k!

Explain the meaning of this infinite sum and show it converges in L(V, V) for any choice of norm on this space. Now tell how to define sin(L).
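For a concrete finite dimensional example, the series can be summed numerically until the remaining terms are negligible. An illustrative sketch (all names are mine), using plain-list matrix arithmetic for a 2×2 matrix; sin(L) would be handled the same way with the alternating series.

    # Truncated power series for e^L for a small matrix L, illustrating the series
    # in Exercise 17.  Plain-list matrix arithmetic; illustrative only.
    def mat_mult(A, B):
        n = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

    def mat_add(A, B, c=1.0):
        return [[A[i][j] + c * B[i][j] for j in range(len(A))] for i in range(len(A))]

    def exp_series(L, terms=30):
        n = len(L)
        result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity, the k = 0 term
        power, fact = [row[:] for row in result], 1.0
        for k in range(1, terms):
            power = mat_mult(power, L)      # L^k
            fact *= k                       # k!
            result = mat_add(result, power, 1.0 / fact)
        return result

    L = [[0.0, 1.0], [-1.0, 0.0]]           # rotation generator
    print(exp_series(L))                    # approximately [[cos 1, sin 1], [-sin 1, cos 1]]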

18. Let X be a finite dimensional normed vector space, real or complex. Show that X is separable. Hint: Let {v_i}_{i=1}^n be a basis and define a map θ from F^n to X as follows: θ(Σ_{k=1}^n x_k e_k) ≡ Σ_{k=1}^n x_k v_k. Show θ is continuous and has a continuous inverse. Now let D be a countable dense set in F^n and consider θ(D).

19. Let B(X; ℝ^n) be the space of functions f mapping X to ℝ^n such that

sup{|f(x)| : x ∈ X} < ∞.

Show B(X; ℝ^n) is a complete normed linear space if we define

||f|| ≡ sup{|f(x)| : x ∈ X}.



20. Let γ ∈ (0, 1]. Define, for X a compact subset of ℝ^p,

C^γ(X; ℝ^n) ≡ {f ∈ C(X; ℝ^n) : ρ_γ(f) + ||f|| ≡ ||f||_γ < ∞}

where

||f|| ≡ sup{|f(x)| : x ∈ X}

and

ρ_γ(f) ≡ sup{ |f(x) − f(y)| / |x − y|^γ : x, y ∈ X, x ≠ y}.

Show that (C^γ(X; ℝ^n), ||·||_γ) is a complete normed linear space. This is called a Hölder space. What would this space consist of if γ > 1?
21. Let {f_n}_{n=1}^∞ ⊆ C^γ(X; ℝ^n) where X is a compact subset of ℝ^p and suppose

||f_n||_γ ≤ M

for all n. Show there exists a subsequence n_k such that f_{n_k} converges in C(X; ℝ^n). The given sequence is precompact when this happens. (This also shows the embedding of C^γ(X; ℝ^n) into C(X; ℝ^n) is a compact embedding.) Hint: You might want to use the Ascoli Arzela theorem.
22. Let f : ℝ × ℝ^n → ℝ^n be continuous and bounded and let x_0 ∈ ℝ^n. If

x : [0, T] → ℝ^n

and h > 0, let

τ_h x(s) ≡ x_0 if s ≤ h,  τ_h x(s) ≡ x(s − h) if s > h.

For t ∈ [0, T], let

x_h(t) = x_0 + ∫_0^t f(s, τ_h x_h(s)) ds.

Show using the Ascoli Arzela theorem that there exists a sequence h → 0 such that

x_h → x

in C([0, T]; ℝ^n). Next argue

x(t) = x_0 + ∫_0^t f(s, x(s)) ds

and conclude the following theorem. If f : ℝ × ℝ^n → ℝ^n is continuous and bounded, and if x_0 ∈ ℝ^n is given, there exists a solution to the following initial value problem.

x′ = f(t, x), t ∈ [0, T],
x(0) = x_0.

This is the Peano existence theorem for ordinary differential equations.
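The delayed approximation x_h in this exercise can be computed explicitly, because on [0, h] the integrand uses the constant x_0, on (h, 2h] it uses values already computed, and so on. A rough illustrative Python sketch (left-endpoint quadrature on a grid of step h; all names are mine):

    # Rough numerical realization of the delayed approximation of Exercise 22:
    # x_h(t) = x_0 + integral_0^t f(s, tau_h x_h(s)) ds, where tau_h x(s) = x_0 for
    # s <= h and x(s - h) for s > h.  Left-endpoint quadrature; illustrative only.
    def delayed_approximation(f, x0, T, h):
        steps = int(round(T / h))
        x = [x0]                                    # x[i] approximates x_h(i*h)
        for i in range(steps):
            s = i * h
            delayed = x0 if s <= h else x[i - 1]    # tau_h x_h(s) on the grid
            x.append(x[i] + h * f(s, delayed))      # one quadrature step
        return x

    # Example: x' = x with x(0) = 1; as h -> 0 the values approach e^t.
    approx = delayed_approximation(lambda t, x: x, x0=1.0, T=1.0, h=0.01)
    print(approx[-1])   # approximately e for small h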



23. Let D(x_0, r) be the closed ball in ℝ^n,

{x : |x − x_0| ≤ r},

where this is the usual norm coming from the dot product. Let P : ℝ^n → D(x_0, r) be defined by

P(x) ≡ x if x ∈ D(x_0, r),  P(x) ≡ x_0 + r (x − x_0)/|x − x_0| if x ∉ D(x_0, r).

Show that |P x − P y| ≤ |x − y| for all x, y ∈ ℝ^n.
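An illustrative sketch of this projection together with a quick numerical spot check of the claimed inequality (all names are mine):

    # Projection onto the closed ball D(x0, r) of Exercise 23, with a random spot
    # check that |Px - Py| <= |x - y|.  Illustrative only.
    import math, random

    def project(x, x0, r):
        dist = math.dist(x, x0)
        if dist <= r:
            return x
        return tuple(x0i + r * (xi - x0i) / dist for xi, x0i in zip(x, x0))

    x0, r = (0.0, 0.0), 1.0
    random.seed(0)
    for _ in range(1000):
        x = (random.uniform(-3, 3), random.uniform(-3, 3))
        y = (random.uniform(-3, 3), random.uniform(-3, 3))
        assert math.dist(project(x, x0, r), project(y, x0, r)) <= math.dist(x, y) + 1e-12
    print("nonexpansive on all sampled pairs")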


24. Use Problem 22 to obtain local solutions to the initial value problem where f is not assumed to be bounded. It is only assumed to be continuous. This means there is a small interval whose length is perhaps not T such that the solution to the differential equation exists on this small interval.
25. Recall that if X is a compact Hausdorff space and A is a real algebra of functions in C(X) which separates the points and annihilates no point, then A was dense in C(X). Use the one point compactification to show that if X is a locally compact Hausdorff space the same result holds if C(X) is replaced with C_0(X), where f ∈ C_0(X) means that f is continuous and for every ε > 0, there exists a compact set K ⊆ X such that |f(x)| < ε whenever x ∉ K. (f vanishes at ∞.)
Measure Theory And Topology

16.1 Borel Measures


Definition 16.1.1 A Hausdorff space is a topological space which has the property that if p, q are two distinct points, then there exist open sets U, V such that p ∈ U, q ∈ V, and U ∩ V = ∅.

Definition 16.1.2 Let μ be a measure on a σ algebra S of subsets of Ω, where (Ω, τ) is a topological space. μ is a Borel measure if S contains all Borel sets. Recall that the Borel sets consist of the smallest σ algebra which contains the open sets.

Lemma 16.1.3 In a Hausdorff space, every compact set is closed.

Proof: Let K be compact. If x ∉ K, then for each k ∈ K, there exists an open set B_k containing k and an open set U_k containing x such that B_k ∩ U_k = ∅. Then finitely many of these B_k cover K because of compactness. Say B_{k_1}, ⋯, B_{k_n}. It follows that ∩_{i=1}^n U_{k_i} is an open set containing x which does not intersect K. Therefore, K^C is open and K is closed. ■
In the case of a Hausdorff topological space, the following lemma gives conditions under which the σ algebra of μ measurable sets for an outer measure μ contains the Borel sets. In words, it assumes the outer measure is inner regular on open sets and outer regular on all sets.

Lemma 16.1.4 Let Ω be a Hausdorff space and suppose μ is an outer measure satisfying: μ is finite on compact sets and the following conditions hold.

1. μ(E) = inf {μ(V), V ⊇ E, V open} for all E. (Outer regularity.)

2. For every open set V, μ(V) = sup {μ(K) : K ⊆ V, K compact}. (Inner regularity on open sets.)

3. If A, B are compact disjoint sets, then μ(A ∪ B) = μ(A) + μ(B).


Then the following hold.

1. If ε > 0 and if K is compact, there exists V open such that V ⊇ K and

μ(V \ K) < ε.

2. If ε > 0 and if V is open with μ(V) < ∞, there exists a compact subset K of V such that

μ(V \ K) < ε.

3. The μ measurable sets S contain the Borel sets, and μ is also inner regular on every open set and for every E ∈ S with μ(E) < ∞.

Proof: First we establish 1 and 2 and use them to establish the last assertion. Consider 2. Suppose it is not true. Then there exists an open set V having μ(V) < ∞ but for all K ⊆ V, μ(V \ K) ≥ ε for some ε > 0. By inner regularity on open sets, there exists K_1 ⊆ V, K_1 compact, such that μ(K_1) ≥ ε/2. Now by assumption, μ(V \ K_1) ≥ ε and so by inner regularity on open sets again, there exists compact K_2 ⊆ V \ K_1 such that μ(K_2) ≥ ε/2. Continuing this way, there is a sequence {K_i} of disjoint compact sets contained in V such that μ(K_i) ≥ ε/2.

(Picture: disjoint compact sets K_1, K_2, K_3, K_4 inside V.)

Now this is an obvious contradiction because by 3,

μ(V) ≥ μ(∪_{i=1}^n K_i) = Σ_{i=1}^n μ(K_i) ≥ n ε/2

for each n, contradicting μ(V) < ∞.


Next consider 1. By outer regularity, there exists an open set W K such
that (W ) < (K) + 1. By 2, there exists compact K1 W \ K such that
((W \ K) \ K1 ) < . Then consider V W \ K1 . This is an open set contain-
ing K and from what was just shown,

((W \ K1 ) \ K) = ((W \ K) \ K1 ) < .

Now consider the last assertion.



Define

S_1 = {E ∈ P(Ω) : E ∩ K ∈ S for all compact K}.

First it will be shown the compact sets are in S. From this it will follow the closed sets are in S_1. Then you show S_1 = S. Thus S_1 = S is a σ algebra and so it contains the Borel sets. Finally you show the inner regularity assertion.

Claim 1: Compact sets are in S.
Proof of claim: Let V be an open set with μ(V) < ∞. I will show that for C compact,

μ(V) ≥ μ(V \ C) + μ(V ∩ C).

Here is a diagram to help keep things straight.

(Diagram: a compact set H inside the open set V, and the set C intersecting V.)

By 2, there exists a compact set K ⊆ V \ C such that

μ((V \ C) \ K) < ε,

and a compact set H ⊆ V such that

μ(V \ H) < ε.

Then

μ(V) ≤ μ(H) + ε ≤ μ(H ∩ C) + μ(H \ C) + ε ≤ μ(V ∩ C) + μ(V \ C) + ε ≤ μ(H ∩ C) + μ(K) + 3ε.

By 3,

= μ((H ∩ C) ∪ K) + 3ε ≤ μ(V) + 3ε.

Since ε is arbitrary, this shows that

μ(V) = μ(V \ C) + μ(V ∩ C).    (16.1.1)

Of course 16.1.1 is exactly what needs to be shown for arbitrary S in place of V. It suffices to consider only S having μ(S) < ∞. If S ⊆ Ω, with μ(S) < ∞, let

V ⊇ S, μ(S) + ε > μ(V).

Then from what was just shown, if C is compact,

ε + μ(S) > μ(V) = μ(V \ C) + μ(V ∩ C) ≥ μ(S \ C) + μ(S ∩ C).

Since ε is arbitrary, this shows the compact sets are in S. This proves the claim.

As discussed above, this verifies the closed sets are in S_1. If S_1 is a σ algebra, this will show that S_1 contains the Borel sets. Thus I first show S_1 is a σ algebra. To see that S_1 is closed with respect to taking complements, let E ∈ S_1 and K a compact set.

K = (E^C ∩ K) ∪ (E ∩ K).

Then from the fact, just established, that the compact sets are in S,

E^C ∩ K = K \ (E ∩ K) ∈ S.

S_1 is closed under countable unions because if K is a compact set and E_n ∈ S_1,

K ∩ ∪_{n=1}^∞ E_n = ∪_{n=1}^∞ K ∩ E_n ∈ S

because it is a countable union of sets of S. Thus S_1 is a σ algebra.

Therefore, if E ∈ S and K is a compact set, just shown to be in S, it follows K ∩ E ∈ S because S is a σ algebra which contains the compact sets, and so S ⊆ S_1. It remains to verify S_1 ⊆ S. Recall that

S_1 ≡ {E : E ∩ K ∈ S for all K compact}.

Let E ∈ S_1 and let V be an open set with μ(V) < ∞ and choose compact K ⊆ V such that μ(V \ K) < ε. Then since E ∈ S_1, it follows E ∩ K, E^C ∩ K ∈ S and so (the two sets K \ E and K ∩ E being disjoint and in S)

μ(V) ≤ μ(V \ E) + μ(V ∩ E) ≤ μ(K \ E) + μ(K ∩ E) + 2ε = μ(K) + 2ε ≤ μ(V) + 3ε.

Since ε is arbitrary, this shows

μ(V) = μ(V \ E) + μ(V ∩ E),

which would show E ∈ S if V were an arbitrary set.

Now let S be such an arbitrary set. If μ(S) = ∞, then

μ(S) = μ(S ∩ E) + μ(S \ E)

holds trivially. If μ(S) < ∞, let

V ⊇ S, μ(S) + ε ≥ μ(V).

Then

μ(S) + ε ≥ μ(V) = μ(V \ E) + μ(V ∩ E) ≥ μ(S \ E) + μ(S ∩ E).

Since ε is arbitrary, this shows that E ∈ S and so S_1 = S. Thus S ⊇ Borel sets as claimed.

From 2, μ is inner regular on all open sets. It remains to show that

μ(F) = sup{μ(K) : K ⊆ F}

for all F ∈ S with μ(F) < ∞. It might help to refer to the following crude picture to keep things straight. It also might not help. I am not sure.

(Picture: an open set U containing F, a compact K ⊆ U, and an open set V ⊇ U \ F; the compact set K ∩ V^C sits inside F.)

Let μ(F) < ∞ and let U be an open set, U ⊇ F, μ(U) < ∞. Let V be open, V ⊇ U \ F, and

μ(V \ (U \ F)) < ε.

(This can be obtained as follows, because μ is a measure on S:

μ(V) = μ(U \ F) + μ(V \ (U \ F)).

Thus from the outer regularity of μ, 1 above, there exists V such that it contains U \ F and

μ(U \ F) + ε > μ(V),

and so

μ(V \ (U \ F)) = μ(V) − μ(U \ F) < ε.)

Also,

V \ (U \ F) = V ∩ (U ∩ F^C)^C = V ∩ [U^C ∪ F] = (V ∩ F) ∪ (V ∩ U^C) ⊇ V ∩ F,

and so

μ(V ∩ F) ≤ μ(V \ (U \ F)) < ε.

Since V ⊇ U ∩ F^C, V^C ⊆ U^C ∪ F, so U ∩ V^C ⊆ U ∩ F ⊆ F. Hence U ∩ V^C is a subset of F. Now let K ⊆ U be compact with μ(U \ K) < ε. Thus K ∩ V^C is a compact subset of F and

μ(F) = μ(V ∩ F) + μ(F \ V) < ε + μ(F \ V) ≤ ε + μ(U ∩ V^C) ≤ 2ε + μ(K ∩ V^C).

Since ε is arbitrary, this proves the second part of the lemma. ■


Where do outer measures come from? One way to obtain an outer measure is
to start with a measure , dened on a algebra of sets S, and use the following
denition of the outer measure induced by the measure.

Definition 16.1.5 Let μ be a measure defined on a σ algebra of sets S ⊆ P(Ω). Then the outer measure induced by μ, denoted by μ̄, is defined on P(Ω) as

μ̄(E) = inf{μ(F) : F ∈ S and F ⊇ E}.

A measure space (Ω, S, μ) is σ finite if there exist measurable sets Ω_i with μ(Ω_i) < ∞ and Ω = ∪_{i=1}^∞ Ω_i.

You should prove the following lemma.

Lemma 16.1.6 If (Ω, S, μ) is σ finite then there exist disjoint measurable sets {B_n} such that μ(B_n) < ∞ and ∪_{n=1}^∞ B_n = Ω.

The following lemma deals with the outer measure generated by a measure which is σ finite. It says that if the given measure is σ finite and complete, then no new measurable sets are gained by going to the induced outer measure and then considering the measurable sets in the sense of Caratheodory.

Lemma 16.1.7 Let (Ω, S, μ) be any measure space and let μ̄ : P(Ω) → [0, ∞] be the outer measure induced by μ. Then μ̄ is an outer measure as claimed and if S̄ is the set of μ̄ measurable sets in the sense of Caratheodory, then S̄ ⊇ S and μ̄ = μ on S. Furthermore, if μ is σ finite and (Ω, S, μ) is complete, then S̄ = S.

Proof: It is easy to see that μ̄ is an outer measure. Let E ∈ S. The plan is to show E ∈ S̄ and μ̄(E) = μ(E). To show this, let S ⊆ Ω and then show

μ̄(S) ≥ μ̄(S ∩ E) + μ̄(S \ E).    (16.1.2)

This will verify that E ∈ S̄. If μ̄(S) = ∞, there is nothing to prove, so assume μ̄(S) < ∞. Thus there exists T ∈ S, T ⊇ S, and

μ̄(S) + ε > μ(T) = μ(T ∩ E) + μ(T \ E) ≥ μ̄(T ∩ E) + μ̄(T \ E) ≥ μ̄(S ∩ E) + μ̄(S \ E).

Since ε is arbitrary, this proves 16.1.2 and verifies S ⊆ S̄. Now if E ∈ S and V ⊇ E with V ∈ S, μ(E) ≤ μ(V). Hence, taking inf, μ(E) ≤ μ̄(E). But also μ̄(E) ≤ μ(E) since E ∈ S and E ⊇ E. Hence

μ̄(E) ≤ μ(E) ≤ μ̄(E).

Next consider the claim about not getting any new sets from the outer measure in the case the measure space is σ finite and complete.

Suppose first F ∈ S̄ and μ̄(F) < ∞. Then there exists E ∈ S such that E ⊇ F and μ(E) = μ̄(F). Since μ̄(F) < ∞,

μ̄(E \ F) = μ(E) − μ̄(F) = 0.

Then there exists D ⊇ E \ F such that D ∈ S and μ(D) = μ̄(E \ F) = 0. Then by completeness of S, it follows E \ F ∈ S and so

E = (E \ F) ∪ F.

Hence F = E \ (E \ F) ∈ S. In the general case where μ̄(F) is not known to be finite, let μ(B_n) < ∞, with B_n ∩ B_m = ∅ for all n ≠ m and ∪_n B_n = Ω. Apply what was just shown to F ∩ B_n, obtaining each of these is in S. Then F = ∪_n F ∩ B_n ∈ S.


16.2 Regular Measures

If (Ω, F, μ) is a measure space and if Ω is also a topological space, the following gives the definition of what it means to say that μ is regular.

Definition 16.2.1 Let μ be a measure on a σ algebra S of subsets of Ω, where (Ω, τ) is a topological space. μ is a Borel measure if S contains all Borel sets. μ is called outer regular if μ is Borel and for all E ∈ S,

μ(E) = inf{μ(V) : V is open and V ⊇ E}.

μ is called inner regular if μ is Borel and

μ(E) = sup{μ(K) : K ⊆ E, and K is compact}.

If the measure is both outer and inner regular, it is called regular. Such a measure, if it is complete, is referred to as a Radon measure.

The following is a very interesting result.

Theorem 16.2.2 Let Ω be a separable complete metric space, and let μ be a finite measure (μ(Ω) < ∞) defined on the Borel sets B(Ω). Then μ is regular.

Proof: Also denote by μ the outer measure generated by μ. There is no loss of generality in doing this thanks to Lemma 16.1.7. For convenience, let's say that a measure is almost regular on F if for all A ∈ F,

μ(A) = inf {μ(V), V ⊇ A, V open}

and for all A ∈ F,

μ(A) = sup {μ(C), C ⊆ A, C closed}.

First I will show that μ is almost regular. This will not require completeness of the metric space. I will also say μ is almost regular on A if the above conditions hold for A.

Claim 1: μ is almost regular.

Proof: It is obvious that

μ(U) = sup {μ(C) : C ⊆ U, C closed}

whenever U is open. This is because

U = ∪_{n=1}^∞ {x ∈ U : dist(x, U^C) ≥ 1/n}.

It is also obvious that if H is closed, then

μ(H) = inf {μ(V) : V ⊇ H, V open}

because

H = ∩_{n=1}^∞ {x : dist(x, H) < 1/n}.

In words, μ is almost regular on open and closed sets. Let

G ≡ {F ∈ B(Ω) : μ is almost regular on F}.

Then G is closed with respect to complements. If F ∈ G, there exists a closed set H ⊆ F such that μ(F \ H) = μ(H^C \ F^C) < ε. Thus the inner almost regularity of μ on F implies the outer regularity of μ on F^C. Similarly, there exists an open set V, V ⊇ F, such that μ(V \ F) = μ(F^C \ V^C) < ε and so the outer almost regularity of μ on F implies the inner almost regularity of μ on F^C. Thus G is indeed closed with respect to complements.

If {F_j}_{j=1}^∞ ⊆ G, there exist H_j ⊆ F_j with H_j closed and μ(F_j \ H_j) < ε/2^j. Then

μ(∪_{j=1}^∞ F_j \ ∪_{j=1}^∞ H_j) ≤ Σ_{j=1}^∞ μ(F_j \ H_j) < ε.

It follows that since μ is a finite measure, then for all n large enough,

μ(∪_{j=1}^∞ F_j \ ∪_{j=1}^n H_j) < ε.

Thus μ is inner almost regular on ∪_{j=1}^∞ F_j. Similarly, there exist open sets V_j ⊇ F_j with μ(V_j \ F_j) < ε/2^j. Then

μ(∪_{j=1}^∞ V_j \ ∪_{j=1}^∞ F_j) ≤ Σ_{j=1}^∞ μ(V_j \ F_j) < ε.

Thus G is a σ algebra. It contains the open sets because μ is inner almost regular on open sets and obviously outer regular on an open set. Therefore G contains B(Ω).

Next we use the separability and completeness of the metric space to go from almost regular to regular.

Claim 2: Let C be a closed set. Then

μ(C) = sup {μ(K) : K ⊆ C and K compact}.



Proof: Let {a_k}_{k=1}^∞ be a dense subset of Ω. Thus the balls {B(a_k, 1/n)}_{k,n} cover C. It follows there exists m_n such that

μ(C \ ∪_{k=1}^{m_n} B(a_k, 1/n)) ≡ μ(C \ C_n) < ε/2^n.

Let

K ≡ C ∩ (∩_{n=1}^∞ C_n).

Then K is a compact set because it is totally bounded and a closed subset of a complete space.

μ(C \ K) = μ(∪_{n=1}^∞ (C \ C_n)) ≤ Σ_{n=1}^∞ μ(C \ C_n) < Σ_{n=1}^∞ ε/2^n = ε.

This proves the second claim.

Now it follows that μ is inner regular as well as outer regular. ■

Corollary 16.2.3 Let Ω be a complete metric space in which the closures of balls are compact, Ω ≠ ∅. Also let μ be a Borel measure which is finite on compact sets. Then μ must be regular.

Proof: Let μ_K(E) ≡ μ(K ∩ E). Then this is a finite measure if K is contained in a compact set and is therefore regular. Let

A_n ≡ B(x_0, n) \ B(x_0, n − 1), n ∈ ℕ, where x_0 ∈ Ω, and let

B_n = B(x_0, n + 1) \ B̅(x_0, n − 2).

Thus the A_n are disjoint and have union equal to Ω, and the B_n are open sets having finite measure which contain the respective A_n. Also, for E ⊆ A_n,

μ(E) = μ_{B_n}(E).

By the above theorem, each μ_{B_n} is regular. Let E be any Borel set with l < μ(E). Then for n large enough,

l < Σ_{k=1}^n μ(E ∩ A_k) = Σ_{k=1}^n μ_{B_k}(E ∩ A_k).

Choose r < 1 such that also

l < r Σ_{k=1}^n μ_{B_k}(E ∩ A_k).

There exists a compact set K_k contained in E ∩ A_k such that

μ_{B_k}(K_k) > r μ_{B_k}(E ∩ A_k).

Then letting K be the union of these, K ⊆ E and

μ(K) = Σ_{k=1}^n μ(K_k) = Σ_{k=1}^n μ_{B_k}(K_k) > r Σ_{k=1}^n μ_{B_k}(E ∩ A_k) > l.

Thus μ is inner regular.

To show μ is outer regular, it suffices to assume μ(E) < ∞ since otherwise there is nothing to prove. There exists an open V_n containing E ∩ A_n which is contained in B_n such that

μ_{B_n}(E ∩ A_n) + ε/2^n > μ_{B_n}(V_n).

Then let V be the union of all these V_n.

μ(V \ E) = μ(∪_k V_k \ ∪_k (E ∩ A_k)) ≤ Σ_{k=1}^∞ μ(V_k \ (E ∩ A_k)) = Σ_{k=1}^∞ μ_{B_k}(V_k \ (E ∩ A_k)) < Σ_{k=1}^∞ ε/2^k = ε.

It follows that

μ(E) + ε > μ(V). ■

16.3 Locally Compact Hausdorff Spaces

Definition 16.3.1 A Hausdorff space is a topological space which has the property that if p, q are two distinct points, then there exist open sets U, V such that p ∈ U, q ∈ V, and U ∩ V = ∅. A locally compact Hausdorff space is one which has a basis U of open sets with the property that for U ∈ U, U̅ is compact.

The fundamental property of locally compact Hausdorff spaces which will be of use here is the following.

Lemma 16.3.2 Let X be a locally compact Hausdorff space, and let K be a compact subset of the open set V. Then there exists an open set U such that U̅ is compact and K ⊆ U ⊆ U̅ ⊆ V.

Proof: To begin with, here is a claim. This claim is obvious in the case of a metric space but requires some proof in this more general case.

Claim: If k ∈ K then there exists an open set U_k containing k such that U̅_k is contained in V.

Proof of claim: Since X is locally compact, there exists a basis U of open sets whose closures are compact. Denote by C the set of all U ∈ U which contain

k, and let C′ denote the set of all closures of these sets of C intersected with the closed set V^C. Thus C′ is a collection of compact sets. I will argue that there are finitely many of the sets of C′ which have empty intersection. If not, then C′ has the finite intersection property and so there exists a point p in all of them. Since X is a Hausdorff space, there exist disjoint basic open sets from U, A, B, such that k ∈ A and p ∈ B. Therefore, p ∉ A̅, contrary to the above requirement that p be in all such sets. It follows there are sets A_1, ⋯, A_m in C such that

V^C ∩ A̅_1 ∩ ⋯ ∩ A̅_m = ∅.

Let U_k = A_1 ∩ ⋯ ∩ A_m. Then U̅_k ⊆ A̅_1 ∩ ⋯ ∩ A̅_m and so it has empty intersection with V^C. Thus it is contained in V. Also U̅_k is a closed subset of the compact set A̅_1 so it is compact. This proves the claim.

Now to complete the proof of the lemma, since K is compact, there are finitely many U_k of the sort just described which cover K, U_{k_1}, ⋯, U_{k_r}. Let

U = ∪_{i=1}^r U_{k_i},

so it follows

U̅ = ∪_{i=1}^r U̅_{k_i},

and so K ⊆ U ⊆ U̅ ⊆ V and U̅ is a compact set. ■
Urysohn's lemma is a fundamental result. This lemma really applies to normal topological spaces, but we will need a version of it which holds for locally compact Hausdorff spaces.

Theorem 16.3.3 (Urysohn) Let (X, τ) be a locally compact Hausdorff space and let K ⊆ V where K is compact and V is open. Then there exists g : X → [0, 1] such that g is continuous, g(x) = 1 on K, and there exists an open set U having compact closure such that g(x) = 0 if x ∉ U and K ⊆ U ⊆ U̅ ⊆ V.

Proof: By Lemma 16.3.2, there exists an open set U having compact closure which also contains K such that U̅ ⊆ V. Let D ≡ {r_n}_{n=1}^∞ be the rational numbers in (0, 1). Using Lemma 16.3.2 again, choose V_{r_1} an open set such that

K ⊆ V_{r_1} ⊆ V̅_{r_1} ⊆ U.

Suppose V_{r_1}, ⋯, V_{r_k} have been chosen, and list the rational numbers r_1, ⋯, r_k in order,

r_{l_1} < r_{l_2} < ⋯ < r_{l_k} for {l_1, ⋯, l_k} = {1, ⋯, k}.

If r_{k+1} > r_{l_k}, then letting p = r_{l_k}, let V_{r_{k+1}} satisfy

V̅_p ⊆ V_{r_{k+1}} ⊆ V̅_{r_{k+1}} ⊆ U.

If r_{k+1} ∈ (r_{l_i}, r_{l_{i+1}}), let p = r_{l_i} and let q = r_{l_{i+1}}. Then let V_{r_{k+1}} satisfy

V̅_p ⊆ V_{r_{k+1}} ⊆ V̅_{r_{k+1}} ⊆ V_q.

If r_{k+1} < r_{l_1}, let p = r_{l_1} and let V_{r_{k+1}} satisfy

K ⊆ V_{r_{k+1}} ⊆ V̅_{r_{k+1}} ⊆ V_p.

Thus there exist open sets V_r for each r ∈ ℚ ∩ (0, 1) with the property that if r < s are two rational numbers,

K ⊆ V_r ⊆ V̅_r ⊆ V_s ⊆ V̅_s ⊆ U.

Now let

f(x) = min(inf{t ∈ D : x ∈ V_t}, 1),  f(x) ≡ 1 if x ∉ ∪_{t∈D} V_t.

(Recall D = ℚ ∩ (0, 1).) I claim f is continuous.

f^{-1}([0, a)) = ∪{V_t : t < a, t ∈ D},

an open set.

Next consider x ∈ f^{-1}([0, a]) so f(x) ≤ a. If t > a, then x ∈ V_t because if not, then

inf{t ∈ D : x ∈ V_t} > a.

Hence f^{-1}([0, a]) ⊆ ∩{V_t : t > a}. It is also clear that ∩{V_t : t > a} ⊆ f^{-1}([0, a]). Thus

f^{-1}([0, a]) = ∩{V_t : t > a} = ∩{V̅_t : t > a},

which is a closed set. If a = 1, f^{-1}([0, 1]) = f^{-1}([0, a]) = X. Therefore,

f^{-1}((a, 1]) = X \ f^{-1}([0, a]) = an open set.

It follows f is continuous. Clearly f(x) = 0 on K. If x ∈ U^C, then x ∉ V_t for any t ∈ D, so f(x) = 1 on U^C. Let g(x) = 1 − f(x). ■
In any metric space there is a much easier proof of the conclusion of Urysohn's lemma which applies.

Lemma 16.3.4 Let S be a nonempty subset of a metric space (X, d). Define

f(x) ≡ dist(x, S) ≡ inf {d(x, y) : y ∈ S}.

Then f is continuous.

Proof: Consider |f(x) − f(x_1)| and suppose without loss of generality that f(x_1) ≥ f(x). Then choose y ∈ S such that f(x) + ε > d(x, y). Then

|f(x_1) − f(x)| = f(x_1) − f(x) ≤ f(x_1) − d(x, y) + ε
≤ d(x_1, y) − d(x, y) + ε
≤ d(x, x_1) + d(x, y) − d(x, y) + ε
= d(x_1, x) + ε.

Since ε is arbitrary, it follows that |f(x_1) − f(x)| ≤ d(x_1, x). ■
Theorem 16.3.5 (Urysohn's lemma for metric spaces) Let H be a closed subset of
an open set U in a metric space (X, d). Then there exists a continuous function
g : X → [0, 1] such that g(x) = 1 for all x ∈ H and g(x) = 0 for all x ∉ U.

Proof: If x ∉ C, a closed set, then dist(x, C) > 0, because if not, there would
exist a sequence of points of C converging to x and it would follow that x ∈ C.
Therefore, dist(x, H) + dist(x, U^C) > 0 for all x ∈ X. Now define a continuous
function g by

g(x) ≡ dist(x, U^C) / (dist(x, H) + dist(x, U^C)).

It is easy to see this verifies the conclusions of the theorem. ∎
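For a concrete feel for this formula, here is a small numerical sketch (my own, not
part of the text) of the function g in the special case X = R, H = [0, 1], U = (−1, 2).
The two distance functions below are simply what dist(·, H) and dist(·, U^C) work
out to be for these particular sets; the choice of H and U is an assumption made
only for the illustration.

def dist_to_H(x):
    # distance from x to the closed set H = [0, 1]
    if x < 0:
        return -x
    if x > 1:
        return x - 1
    return 0.0

def dist_to_U_complement(x):
    # distance from x to U^C = (-inf, -1] union [2, inf)
    return max(0.0, min(x + 1, 2 - x))

def g(x):
    # g(x) = dist(x, U^C) / (dist(x, H) + dist(x, U^C)); the denominator is
    # positive because H and U^C are disjoint closed sets
    return dist_to_U_complement(x) / (dist_to_H(x) + dist_to_U_complement(x))

# g = 1 on H, g = 0 off U, and 0 <= g <= 1 in between
assert g(0.5) == 1.0 and g(-1.0) == 0.0 and 0.0 < g(1.5) < 1.0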

Definition 16.3.6 Define spt(f) (the support of f) to be the closure of the set
{x : f(x) ≠ 0}. If V is an open set, Cc(V) will be the set of continuous functions f,
defined on Ω, having spt(f) ⊆ V. Thus in Theorem 16.3.3, f ∈ Cc(V).

Definition 16.3.7 If K is a compact subset of an open set V, then K ≺ φ ≺ V if

φ ∈ Cc(V), φ(K) = {1}, φ(Ω) ⊆ [0, 1],

where Ω denotes the whole topological space considered. Also for φ ∈ Cc(Ω), K ≺ φ
if

φ(Ω) ⊆ [0, 1] and φ(K) = 1,

and φ ≺ V if

φ(Ω) ⊆ [0, 1] and spt(φ) ⊆ V.

Theorem 16.3.8 (Partition of unity) Let K be a compact subset of a locally compact
Hausdorff topological space satisfying Theorem 16.3.3 and suppose

K ⊆ V = ∪_{i=1}^n V_i,  V_i open.

Then there exist ψ_i ≺ V_i with

Σ_{i=1}^n ψ_i(x) = 1

for all x ∈ K.

Proof: Let K_1 = K \ ∪_{i=2}^n V_i. Thus K_1 is compact and K_1 ⊆ V_1. Let
K_1 ⊆ W_1 ⊆ W̅_1 ⊆ V_1 with W̅_1 compact. To obtain W_1, use Theorem 16.3.3 to get f
such that K_1 ≺ f ≺ V_1 and let W_1 ≡ {x : f(x) ≠ 0}. Thus W_1, V_2, ⋯, V_n covers K and
W̅_1 ⊆ V_1. Let K_2 = K \ (∪_{i=3}^n V_i ∪ W_1). Then K_2 is compact and K_2 ⊆ V_2. Let
K_2 ⊆ W_2 ⊆ W̅_2 ⊆ V_2 with W̅_2 compact. Continue this way, finally obtaining
W_1, ⋯, W_n with K ⊆ W_1 ∪ ⋯ ∪ W_n and W̅_i ⊆ V_i, W̅_i compact. Now let
W̅_i ⊆ U_i ⊆ U̅_i ⊆ V_i with U̅_i compact, so that

W_i ⊆ U_i ⊆ V_i.

By Theorem 16.3.3, let U̅_i ≺ φ_i ≺ V_i and let ∪_{i=1}^n W̅_i ≺ γ ≺ ∪_{i=1}^n U_i. Define

ψ_i(x) = γ(x)φ_i(x) / Σ_{j=1}^n φ_j(x)   if Σ_{j=1}^n φ_j(x) ≠ 0,
ψ_i(x) = 0                               if Σ_{j=1}^n φ_j(x) = 0.

If x is such that Σ_{j=1}^n φ_j(x) = 0, then x ∉ ∪_{i=1}^n U̅_i. Consequently γ(y) = 0 for
all y near x and so ψ_i(y) = 0 for all y near x. Hence ψ_i is continuous at such x.
If Σ_{j=1}^n φ_j(x) ≠ 0, this situation persists near x and so ψ_i is continuous at such
points. Therefore ψ_i is continuous. If x ∈ K, then γ(x) = 1 and so Σ_{j=1}^n ψ_j(x) = 1.
Clearly 0 ≤ ψ_i(x) ≤ 1 and spt(ψ_j) ⊆ V_j. ∎
The following corollary won't be needed immediately but is very interesting just
the same.

Corollary 16.3.9 If H is a compact subset of V_i, then there exists a partition of
unity such that ψ_i(x) = 1 for all x ∈ H, in addition to the conclusion of Theorem
16.3.8.

Proof: Keep V_i the same but, for j ≠ i, replace V_j with Ṽ_j ≡ V_j \ H. Now in the
proof above, applied to this modified collection of open sets, if j ≠ i then ψ_j(x) = 0
whenever x ∈ H. Therefore, ψ_i(x) = 1 on H. ∎
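To see the normalization step of the proof in a concrete setting, here is a small
numerical sketch (my own illustration, not the book's construction) of a partition of
unity on K = [0, 3] subordinate to the cover V_1 = (−1, 2), V_2 = (1, 4), built from the
metric-space Urysohn functions of Theorem 16.3.5. The particular intervals and the
inner sets on which each bump equals 1 are hypothetical choices made only for this
example.

def dist(x, a, b):
    # distance from x to the closed interval [a, b]
    return max(a - x, 0.0, x - b)

def urysohn(x, H, U):
    # equals 1 on H = [H[0], H[1]], 0 outside U = (U[0], U[1]), continuous between
    d_H = dist(x, *H)
    d_Uc = max(0.0, min(x - U[0], U[1] - x))   # distance to U^C
    return d_Uc / (d_H + d_Uc)

def psi1(x): return urysohn(x, (0.0, 1.5), (-1.0, 2.0))   # 1 on [0, 1.5], spt in V1
def psi2(x): return urysohn(x, (1.5, 3.0), (1.0, 4.0))    # 1 on [1.5, 3], spt in V2

def phi(x):
    s = psi1(x) + psi2(x)
    # on K the sum is at least 1, so the normalized functions are continuous there
    return (psi1(x) / s, psi2(x) / s) if s > 0 else (0.0, 0.0)

# on K the two normalized functions sum to 1
for x in [0.0, 0.7, 1.5, 2.4, 3.0]:
    p1, p2 = phi(x)
    assert abs(p1 + p2 - 1.0) < 1e-12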

16.4 Positive Linear Functionals

Definition 16.4.1 Let (Ω, τ) be a topological space. L : Cc(Ω) → C is called a
positive linear functional if L is linear,

L(af_1 + bf_2) = aLf_1 + bLf_2,

and if Lf ≥ 0 whenever f ≥ 0.

It is easy to find examples of positive linear functionals of the above sort.

Example 16.4.2 Let Ω be N, the positive integers with the usual metric space
topology, d(x, y) ≡ |x − y|. Then every function defined on Ω is continuous. Thus
Cc(Ω) consists of those functions f which vanish for all n large enough. For such
functions, let Lf ≡ Σ_{k=1}^∞ f(k), where the sum is well defined because all but finitely
many terms equal 0.
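A sketch of this example in code follows; the dictionary encoding of finitely
supported functions is my own device, not the book's.

def L(f):
    # positive linear functional of Example 16.4.2: Lf = sum over the finite support
    return sum(f.values())

f = {1: 2.0, 3: 0.5, 7: 1.0}          # f(n) = 0 for every other n
g = {1: 1.0, 2: 4.0}

assert L(f) == 3.5
# linearity: L(2f + 3g) = 2 Lf + 3 Lg
h = {n: 2 * f.get(n, 0.0) + 3 * g.get(n, 0.0) for n in set(f) | set(g)}
assert abs(L(h) - (2 * L(f) + 3 * L(g))) < 1e-12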

Example 16.4.3 Let Ω = R^n with the usual metric space topology given by
|x − y| ≡ (Σ_{i=1}^n |x_i − y_i|^2)^{1/2}. For f ∈ Cc(R^n), define

Lf ≡ ∫ ⋯ ∫ f(x_1, ⋯, x_n) dx_1 ⋯ dx_n,

this being the ordinary iterated Riemann integral used in beginning calculus.

The Riesz representation theorem shows that positive linear functionals of this
sort correspond to measures, and the Lebesgue integrals which result extend the
functionals. In the second example, the measure which results will be Lebesgue
measure.

Theorem 16.4.4 (Riesz representation theorem) Let (Ω, τ) be a locally compact
Hausdorff space and let L be a positive linear functional on Cc(Ω). Then there
exists a σ algebra S containing the Borel sets and a unique measure μ, defined on
S, such that

μ is complete,                                             (16.4.3)
μ(K) < ∞ for all K compact,                                (16.4.4)

μ(F) = sup{μ(K) : K ⊆ F, K compact}
for all F open and for all F ∈ S with μ(F) < ∞,

μ(F) = inf{μ(V) : V ⊇ F, V open}

for all F ∈ S, and

∫ f dμ = Lf for all f ∈ Cc(Ω).                             (16.4.5)

The plan is to define an outer measure and then to show that it, together with the
σ algebra of sets measurable in the sense of Caratheodory, satisfies the conclusions
of the theorem. Always, K will be a compact set and V will be an open set.

Definition 16.4.5 μ(V) ≡ sup{Lf : f ≺ V} for V open, μ(∅) ≡ 0. For an arbitrary
set E, μ(E) ≡ inf{μ(V) : V ⊇ E, V open}.

Lemma 16.4.6 μ is a well-defined outer measure.

Proof: First it is necessary to verify μ is well defined, because there are two
descriptions of it on open sets. Suppose then that μ_1(V) ≡ inf{μ(U) : U ⊇ V
and U is open}. It is required to verify that μ_1(V) = μ(V), where μ(V) is given as
sup{Lf : f ≺ V}. If U ⊇ V, then μ(U) ≥ μ(V) directly from the definition. Hence,
from the definition of μ_1, it follows μ_1(V) ≥ μ(V). On the other hand, V ⊇ V and
so μ_1(V) ≤ μ(V). This verifies μ is well defined.


It remains to show that is an outer measure. nLet V = i=1 Vi and let f V .
Then spt(f ) ni=1 Vi for some n. Let i Vi , i=1 i = 1 on spt(f ).


n
n

Lf = L(f i ) (Vi ) (Vi ).
i=1 i=1 i=1

Hence


(V ) (Vi )
i=1

since f V is arbitrary. Now let E = i=1 Ei . Is (E) i=1 (Ei )? Without
loss of generality, it can be assumed (Ei ) < for each i since if not so, there is
nothing to prove. Let Vi Ei with (Ei ) + 2i > (Vi ).



(E) (
i=1 Vi ) (Vi ) + (Ei ).
i=1 i=1

Since was arbitrary, (E) i=1 (Ei ). 

Lemma 16.4.7 Let K be compact, g ≥ 0, g ∈ Cc(Ω), and g = 1 on K. Then
μ(K) ≤ Lg. Also μ(K) < ∞ whenever K is compact.

Proof: Let α ∈ (0, 1), let V_α = {x : g(x) > α}, so V_α ⊇ K, and let h ≺ V_α.
Then h ≤ 1 on V_α while gα^{-1} ≥ 1 on V_α, and so gα^{-1} ≥ h, which implies
L(gα^{-1}) ≥ Lh and therefore, since L is linear,

Lg ≥ αLh.

Since h ≺ V_α is arbitrary, and K ⊆ V_α,

Lg ≥ αμ(V_α) ≥ αμ(K).

Letting α ↑ 1 yields Lg ≥ μ(K). This proves the first part of the lemma. The
second assertion follows from this and Theorem 16.3.3. If K is given, let

K ≺ g,

and from what was just shown, μ(K) ≤ Lg < ∞. ∎

Lemma 16.4.8 If A and B are disjoint compact subsets of Ω, then μ(A ∪ B) =
μ(A) + μ(B).

Proof: By Theorem 16.3.3, there exists h ∈ Cc(Ω) such that A ≺ h ≺ B^C. Let
U_1 = h^{-1}((1/2, 1]), V_1 = h^{-1}([0, 1/2)). Then A ⊆ U_1, B ⊆ V_1, and U_1 ∩ V_1 = ∅.
From Lemma 16.4.7, μ(A ∪ B) < ∞ and so there exists an open set W such that

W ⊇ A ∪ B,  μ(A ∪ B) + ε > μ(W).

Now let U = U_1 ∩ W and V = V_1 ∩ W. Then

U ⊇ A, V ⊇ B, U ∩ V = ∅, and μ(A ∪ B) + ε > μ(W) ≥ μ(U ∪ V).

Let A ≺ f ≺ U and B ≺ g ≺ V. Then by Lemma 16.4.7,

μ(A ∪ B) + ε ≥ μ(U ∪ V) ≥ L(f + g) = Lf + Lg ≥ μ(A) + μ(B).

Since ε > 0 is arbitrary, this proves the lemma. ∎


From Lemma 16.4.7 the following lemma is obtained.

Lemma 16.4.9 Let f ∈ Cc(Ω), f(Ω) ⊆ [0, 1]. Then μ(spt(f)) ≥ Lf. Also, every
open set V satisfies

μ(V) = sup{μ(K) : K ⊆ V, K compact}.

Proof: Let V ⊇ spt(f) and let spt(f) ≺ g ≺ V. Then Lf ≤ Lg ≤ μ(V) because
f ≤ g. Since this holds for all open V ⊇ spt(f), Lf ≤ μ(spt(f)) by definition of μ.
Finally, let V be open and let l < μ(V). Then from the definition of μ, there
exists f ≺ V such that L(f) > l. Therefore, l < μ(spt(f)) ≤ μ(V), and since spt(f) is
a compact subset of V, this shows the claim about inner regularity of the measure on
an open set. ∎
This has now verified the conditions of Lemma 16.1.4. It follows μ is inner regular
on sets of finite measure and outer regular on all sets, and also that the σ algebra of
measurable sets contains the Borel sets.
It remains to show μ satisfies 16.4.5.

Lemma 16.4.10 ∫ f dμ = Lf for all f ∈ Cc(Ω).

Proof: Let f ∈ Cc(Ω), f real-valued, and suppose f(Ω) ⊆ [a, b]. Choose t_0 < a
and let t_0 < t_1 < ⋯ < t_n = b with t_i − t_{i−1} < ε. Let

E_i = f^{-1}((t_{i−1}, t_i]) ∩ spt(f).                      (16.4.6)

Note that ∪_{i=1}^n E_i is a closed set, and in fact

∪_{i=1}^n E_i = spt(f)                                      (16.4.7)

since Ω = ∪_{i=1}^n f^{-1}((t_{i−1}, t_i]). Let V_i ⊇ E_i with V_i open and let V_i satisfy

f(x) < t_i + ε for all x ∈ V_i,                             (16.4.8)
μ(V_i \ E_i) < ε/n.

By Theorem 16.3.8 there exist h_i ∈ Cc(Ω) such that

h_i ≺ V_i,  Σ_{i=1}^n h_i(x) = 1 on spt(f).

Now note that for each i,

f(x)h_i(x) ≤ h_i(x)(t_i + ε).

(If x ∈ V_i, this follows from 16.4.8. If x ∉ V_i, both sides equal 0.) Therefore,

Lf = L(Σ_{i=1}^n f h_i) ≤ L(Σ_{i=1}^n h_i(t_i + ε))
   = Σ_{i=1}^n (t_i + ε)L(h_i)
   = Σ_{i=1}^n (|t_0| + t_i + ε)L(h_i) − |t_0| L(Σ_{i=1}^n h_i).

Now note that |t_0| + t_i + ε ≥ 0, each h_i ≺ V_i, and Σ h_i = 1 on spt(f), so from the
definition of μ and Lemma 16.4.7 this is no larger than

Σ_{i=1}^n (|t_0| + t_i + ε)μ(V_i) − |t_0|μ(spt(f))

≤ Σ_{i=1}^n (|t_0| + t_i + ε)(μ(E_i) + ε/n) − |t_0|μ(spt(f))

= |t_0| Σ_{i=1}^n μ(E_i) + Σ_{i=1}^n (t_i + ε)μ(E_i) + (ε/n) Σ_{i=1}^n (|t_0| + t_i + ε) − |t_0|μ(spt(f)).

From 16.4.7 and 16.4.6, the E_i are disjoint measurable sets whose union is spt(f),
so Σ_{i=1}^n μ(E_i) = μ(spt(f)) and the first and last terms cancel. Since t_i ≤ |b|, the
above is therefore no larger than

Σ_{i=1}^n (t_i + ε)μ(E_i) + ε(|t_0| + |b| + ε)

≤ Σ_{i=1}^n t_{i−1} μ(E_i) + 2ε μ(spt(f)) + ε(|t_0| + |b| + ε)

≤ ∫ f dμ + ε(2μ(spt(f)) + |t_0| + |b| + ε),

the second inequality because t_i < t_{i−1} + ε, and the last because f(x) > t_{i−1} on E_i.
Since ε > 0 is arbitrary,

Lf ≤ ∫ f dμ                                                 (16.4.9)

for all real f ∈ Cc(Ω). Hence equality holds in 16.4.9, because L(−f) ≤ ∫ (−f) dμ, so
L(f) ≥ ∫ f dμ. Thus Lf = ∫ f dμ for all real f ∈ Cc(Ω). For complex-valued f, just
apply the result for real functions to the real and imaginary parts of f. ∎
This gives the existence part of the Riesz representation theorem.
It only remains to prove uniqueness. Suppose both μ_1 and μ_2 are measures on
S satisfying the conclusions of the theorem. Then if K is compact and V ⊇ K, let
K ≺ f ≺ V. Then

μ_1(K) ≤ ∫ f dμ_1 = Lf = ∫ f dμ_2 ≤ μ_2(V).

Thus μ_1(K) ≤ μ_2(K) for all compact K. Similarly, the inequality can be reversed
and so it follows the two measures are equal on compact sets. By the assumption of
inner regularity on open sets, the two measures are also equal on all open sets. By
outer regularity, they are equal on all sets of S. ∎
An important example of a locally compact Hausdorff space is any metric space
in which the closures of balls are compact. For example, R^n with the usual metric
is an example of this. Not surprisingly, more can be said in this important special
case.

Theorem 16.4.11 Let (Ω, d) be a metric space in which the closures of the balls
are compact and let L be a positive linear functional defined on Cc(Ω). Then there
exists a measure μ representing the positive linear functional which satisfies all the
conclusions of Theorem 16.4.4 and, in addition, the property that μ is regular. The
same conclusion follows if (Ω, τ) is a compact Hausdorff space.

Proof: Let μ and S be as described in Theorem 16.4.4. The outer regularity
comes automatically as a conclusion of Theorem 16.4.4. It remains to verify inner
regularity. Let F ∈ S and let l < k < μ(F). Now let z ∈ Ω and Ω_n = B(z, n) for
n ∈ N. Thus F ∩ Ω_n ↑ F. It follows that for n large enough,

k < μ(F ∩ Ω_n) ≤ μ(F).
Since μ(F ∩ Ω_n) < ∞, it follows from inner regularity on sets of finite measure that
there exists a compact set K such that K ⊆ F ∩ Ω_n ⊆ F and

l < μ(K) ≤ μ(F).

This proves inner regularity. In case (Ω, τ) is a compact Hausdorff space, the
conclusion of inner regularity follows from Theorem 16.4.4. ∎

The proof of the above yields the following corollary.

Corollary 16.4.12 Let (Ω, τ) be a locally compact Hausdorff space and suppose μ,
defined on a σ algebra S, represents the positive linear functional L, where L is
defined on Cc(Ω) in the sense of Theorem 16.4.4. Suppose also that there exist
Ω_n ∈ S such that Ω = ∪_{n=1}^∞ Ω_n and μ(Ω_n) < ∞. Then μ is regular.

The following is on the uniqueness of the σ algebra in some cases.

Definition 16.4.13 Let (Ω, τ) be a locally compact Hausdorff space and let L be a
positive linear functional defined on Cc(Ω) such that the complete measure μ defined
by the Riesz representation theorem for positive linear functionals is inner regular.
Then this μ is called a Radon measure. Thus a Radon measure is complete and
regular.

Corollary 16.4.14 Let (Ω, τ) be a locally compact Hausdorff space which is also
σ compact, meaning

Ω = ∪_{n=1}^∞ Ω_n,  Ω_n compact,

and let L be a positive linear functional defined on Cc(Ω). Then if (μ_1, S_1) and
(μ_2, S_2) are two Radon measures, together with their σ algebras, which represent L,
then the two σ algebras are equal and the two measures are equal.

Proof: Suppose (μ_1, S_1) and (μ_2, S_2) both work. It will be shown the two
measures are equal on every compact set. Let K be compact and let V be an open
set containing K. Then let K ≺ f ≺ V. Then

μ_1(K) = ∫_K dμ_1 ≤ ∫ f dμ_1 = L(f) = ∫ f dμ_2 ≤ μ_2(V).

Therefore, taking the infimum over all open V containing K implies μ_1(K) ≤ μ_2(K).
Reversing the argument shows μ_1(K) = μ_2(K). This also implies the two measures
are equal on all open sets because they are both inner regular on open sets. (It is
being assumed the two measures are regular.) Now let F ∈ S_1 with μ_1(F) < ∞.
Then there exist sets H, G with H ⊆ F ⊆ G such that H is the countable union of
compact sets, G is a countable intersection of open sets, and μ_1(G) = μ_1(H), which
implies μ_1(G \ H) = 0. Now G \ H can be written as the countable intersection of
sets of the form V_k \ K_k where V_k is open, μ_1(V_k) < ∞, and K_k is compact. From
what was just shown, μ_2(V_k \ K_k) = μ_1(V_k \ K_k), so it follows μ_2(G \ H) = 0 also.
Since μ_2 is complete, and G and H are in S_2, it follows F ∈ S_2 and μ_2(F) = μ_1(F).
Now for arbitrary F, possibly having μ_1(F) = ∞, consider F ∩ Ω_n. From what was
just shown, this set is in S_2 and μ_2(F ∩ Ω_n) = μ_1(F ∩ Ω_n). Taking the union of
these F ∩ Ω_n gives F ∈ S_2 and also μ_1(F) = μ_2(F). This shows S_1 ⊆ S_2. Similarly,
S_2 ⊆ S_1. ∎
In a sense, the Riesz representation theorem can be thought of as a way to
extend a given Borel measure on R^n to a complete measure.

Lemma 16.4.15 Suppose ν is a Borel measure on R^n which is finite on compact
sets. Let Lf = ∫ f dν be a positive linear functional and let μ be the measure
representing this functional as in the Riesz representation theorem. Then μ = ν on
the Borel sets.

Proof: Let V be open and let f_n ≺ V be such that {f_n} is an increasing sequence
which converges pointwise to X_V. Then by the monotone convergence theorem,

ν(V) = lim_{n→∞} ∫ f_n dν = lim_{n→∞} ∫ f_n dμ = μ(V).

Let B be an open bounded set and let ν_B(E) ≡ ν(B ∩ E), μ_B(E) ≡ μ(B ∩ E).
Then ν_B and μ_B are finite Borel measures defined on a separable complete metric
space. From what was just shown, they are equal on open sets. By Theorem 16.2.2
these finite measures are both regular measures. Therefore, by outer regularity, for
E Borel,

ν_B(E) = inf{ν_B(V) : V ⊇ E, V open} = inf{μ_B(V) : V ⊇ E, V open} = μ_B(E).

Now let k ∈ N and B_k = B(0, k). It was just shown that for E Borel,

ν(B_k ∩ E) = μ(B_k ∩ E).

Let k → ∞. This proves the lemma. ∎


The following lemma is often useful.

Lemma 16.4.16 Let (Ω, F, μ) be a measure space where Ω is a topological space.
Suppose μ is a Radon measure and f is measurable with respect to F. Then there
exists a Borel measurable function g such that g = f a.e.

Proof: Assume without loss of generality that f ≥ 0. Then let s_n ↑ f pointwise,
where each s_n is simple. Say

s_n(ω) = Σ_{k=1}^{P_n} c_k^n X_{E_k^n}(ω),

where E_k^n ∈ F. By the outer regularity of μ, there exists a Borel set F_k^n ⊇ E_k^n such
that μ(F_k^n) = μ(E_k^n). In fact F_k^n can be taken to be a G_δ set. Let

t_n(ω) ≡ Σ_{k=1}^{P_n} c_k^n X_{F_k^n}(ω).

Then t_n is Borel measurable and t_n(ω) = s_n(ω) for all ω ∉ N_n, where N_n ∈ F is a
set of measure zero. Now let N ≡ ∪_{n=1}^∞ N_n. Then N is a set of measure zero and
if ω ∉ N, then t_n(ω) → f(ω). By outer regularity, there exists a Borel set N′ ⊇ N
(it could be taken to be a G_δ set) with μ(N′) = 0. Then t_n X_{(N′)^C} converges
pointwise to a Borel measurable function g, and g(ω) = f(ω) for all ω ∉ N′.
Therefore, g = f a.e. ∎

16.5 Lebesgue Measure And Its Properties

Definition 16.5.1 Define the following positive linear functional for f ∈ Cc(R^n):

Λf ≡ ∫ ⋯ ∫ f(x) dx_1 ⋯ dx_n.

Then the measure representing this functional is Lebesgue measure, denoted m_n.

The following lemma will help in understanding Lebesgue measure.

Lemma 16.5.2 Every open set in R^n is the countable disjoint union of half open
boxes of the form

∏_{i=1}^n (a_i, a_i + 2^{-k}]

where a_i = l 2^{-k} for some integers l, k. The sides of these boxes are of equal length.
One could also have half open boxes of the form

∏_{i=1}^n [a_i, a_i + 2^{-k})

and the conclusion would be unchanged.

Proof: Let

C_k = {all half open boxes ∏_{i=1}^n (a_i, a_i + 2^{-k}] where a_i = l 2^{-k} for some integer l}.

Thus C_k consists of a countable disjoint collection of boxes whose union is R^n. This
is sometimes called a tiling of R^n. Think of tiles on the floor of a bathroom and
you will get the idea. Note that each box has diameter no larger than 2^{-k} √n. This
is because if

x, y ∈ ∏_{i=1}^n (a_i, a_i + 2^{-k}],

then |x_i − y_i| ≤ 2^{-k}. Therefore,

|x − y| ≤ (Σ_{i=1}^n (2^{-k})^2)^{1/2} = 2^{-k} √n.

Let U be open and let B_1 ≡ all sets of C_1 which are contained in U. If B_1, ⋯, B_k
have been chosen, let B_{k+1} ≡ all sets of C_{k+1} contained in

U \ ∪(∪_{i=1}^k B_i).

Let B_∞ = ∪_{i=1}^∞ B_i. In fact ∪B_∞ = U. Clearly ∪B_∞ ⊆ U because every box of
every B_i is contained in U. If p ∈ U, let k be the smallest integer such that p is
contained in a box from C_k which is also a subset of U. Thus

p ∈ ∪B_k ⊆ ∪B_∞.

Hence B_∞ is the desired countable disjoint collection of half open boxes whose union
is U. The last assertion about the other type of half open rectangle is obvious. ∎
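The decomposition in the proof is entirely constructive. The following one-dimensional
sketch (my own, not from the text) carries it out for a particular open set U and
checks that the total length of the chosen dyadic intervals approaches m_1(U); the
stopping level max_k is an artifact of terminating the construction after finitely many
stages. The "covered" test by midpoints is equivalent to taking boxes inside
U \ ∪(∪B_i), because a finer dyadic box is either contained in or disjoint from each
coarser dyadic box.

def dyadic_decomposition(U, max_k=12):
    # U: finite union of disjoint open intervals [(c1, d1), ...]; returns pairwise
    # disjoint half-open dyadic intervals (a, a + 2**-k] contained in U
    chosen = []
    lo = min(c for c, _ in U)
    hi = max(d for _, d in U)
    for k in range(1, max_k + 1):
        step = 2.0 ** (-k)
        l = int(lo // step) - 1
        while l * step < hi:
            a, b = l * step, (l + 1) * step
            mid = (a + b) / 2
            inside = any(c <= a and b < d for c, d in U)     # (a, b] contained in U
            covered = any(p < mid <= q for p, q in chosen)   # already taken earlier
            if inside and not covered:
                chosen.append((a, b))
            l += 1
    return chosen

U = [(0.0, 1.0), (2.0, 2.7)]
boxes = dyadic_decomposition(U)
total = sum(b - a for a, b in boxes)
# the total length tends to m_1(U) = 1.0 + 0.7 as max_k grows
assert abs(total - 1.7) < 1e-2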
Now what does Lebesgue measure do to a rectangle ∏_{i=1}^n (a_i, b_i]?

Lemma 16.5.3 Let R = ∏_{i=1}^n [a_i, b_i] and R_0 = ∏_{i=1}^n (a_i, b_i). Then

m_n(R_0) = m_n(R) = ∏_{i=1}^n (b_i − a_i).

Proof: Let k be large enough that

a_i + 1/k < b_i − 1/k

for i = 1, ⋯, n, and consider continuous functions f_i^k and g_i^k with values in [0, 1]
of the following sort: f_i^k equals 1 on [a_i + 1/k, b_i − 1/k] and vanishes off (a_i, b_i);
g_i^k equals 1 on [a_i, b_i] and vanishes off (a_i − 1/k, b_i + 1/k). Let

g^k(x) = ∏_{i=1}^n g_i^k(x_i),  f^k(x) = ∏_{i=1}^n f_i^k(x_i).

Then by elementary calculus along with the definition of Λ,

∏_{i=1}^n (b_i − a_i + 2/k) ≥ Λg^k = ∫ g^k dm_n ≥ m_n(R) ≥ m_n(R_0)
                           ≥ ∫ f^k dm_n = Λf^k ≥ ∏_{i=1}^n (b_i − a_i − 2/k).

Letting k → ∞, it follows that

m_n(R) = m_n(R_0) = ∏_{i=1}^n (b_i − a_i). ∎

Lemma 16.5.4 Let U be an open or closed set. Then m_n(U) = m_n(x + U).

Proof: By Lemma 16.5.2 there is a sequence of disjoint half open rectangles {R_i}
such that ∪_i R_i = U. Therefore, x + U = ∪_i (x + R_i) and the x + R_i are also
disjoint rectangles which are identical to the R_i but translated. From Lemma 16.5.3,
m_n(U) = Σ_i m_n(R_i) = Σ_i m_n(x + R_i) = m_n(x + U).
It remains to verify the lemma for a closed set. Let H be a closed bounded set
first. Then H ⊆ B(0, R) for some R large enough. First note that x + H is a closed
set. Thus, from what was just shown,

m_n(B(0, R)) = m_n(B(0, R) + x)
             = m_n(H + x) + m_n((B(0, R) \ H) + x)
             = m_n(H + x) + m_n(B(0, R) \ H),

and so

m_n(H) = m_n(B(0, R)) − m_n(B(0, R) \ H) = m_n(H + x).

Therefore, m_n(x + H) = m_n(H) as claimed. If H is not bounded, consider H_m ≡
B(0, m) ∩ H. Then m_n(x + H_m) = m_n(H_m). Passing to the limit as m → ∞ yields
the result in general. ∎

Theorem 16.5.5 Lebesgue measure is translation invariant. That is,

m_n(E) = m_n(x + E)

for all Lebesgue measurable E.

Proof: Suppose m_n(E) < ∞. By regularity of the measure, there exist sets
G, H such that G is a countable intersection of open sets, H is a countable union of
compact sets, m_n(G \ H) = 0, and H ⊆ E ⊆ G. Now m_n(G) = m_n(G + x) and
m_n(H) = m_n(H + x), which follows from Lemma 16.5.4 applied to the sets which
are either intersected to form G or unioned to form H. Now

x + H ⊆ x + E ⊆ x + G,

and both x + H and x + G are measurable because they are either countable unions
or countable intersections of measurable sets. Furthermore,

m_n((x + G) \ (x + H)) = m_n(x + G) − m_n(x + H) = m_n(G) − m_n(H) = 0,

and so by completeness of the measure, x + E is measurable. It follows

m_n(E) = m_n(H) = m_n(x + H) ≤ m_n(x + E)
       ≤ m_n(x + G) = m_n(G) = m_n(E).

If m_n(E) is not necessarily less than ∞, consider E_m ≡ B(0, m) ∩ E. Then
m_n(E_m) = m_n(E_m + x) by the above. Letting m → ∞, it follows m_n(E) =
m_n(E + x). ∎

16.6 Change Of Variables

If E is any Lebesgue measurable set in R^n and L is an n × n matrix on R^n, it will
be shown that m_n(LE) = |det(L)| m_n(E) and that, in fact, LE is Lebesgue
measurable.
First consider the case where det(L) ≠ 0, so that L is one to one and onto. Then
L is a homeomorphism, and so if F is any Borel set, LF must also be a Borel set.
Therefore, it at least makes sense to speak of m_n(LF) for F Borel.
Recall the elementary matrices are those which result from doing a row operation
to the identity matrix. The important thing about elementary matrices is that a
given matrix is always a product of elementary matrices times the row reduced
echelon form of the given matrix. Now one of the row operations involves adding a
multiple of one row to another. It is convenient to simplify this by always letting
this multiple equal 1. This changes nothing, because you can simply use a
combination of two row operations to achieve the same thing as what was obtained
by adding an arbitrary multiple of a row to another. This will be done below.
Thus the row operations involve switching rows, multiplying a row by a nonzero
number, and replacing a row with another row added to that row.

Lemma 16.6.1 Let L be an n × n matrix. Then for all Lebesgue measurable F,

m_n(LF) = |det(L)| m_n(F).

Proof: First suppose L is an elementary matrix. From the above description of
Lebesgue measure, it follows that m_n(LQ) = |det(L)| m_n(Q) for Q = (0, 1]^n. This
is obvious for all elementary matrices except the one which comes from adding one
row of the identity to another row. Say L comes from adding the k-th row to the
j-th row. Then det(L) = 1 because L is a triangular matrix with all ones down the
main diagonal. Letting Q denote (0, 1]^n, consider LQ. For x ∈ Q, Lx has x_i in the
i-th component and x_k + x_j in the j-th component, so LQ is a sheared box.
Denote by S_1 those points of L(Q) such that also x_j < 1. This is a Borel set
because it is the intersection of an open set with a Borel set. Let S_2 denote those
points of L(Q) for which x_j ≥ 1. Again, this is the intersection of Borel sets and is
therefore Borel. Then, as the picture of the sheared square suggests, it follows from
translation invariance of Lebesgue measure, proved above, and elementary geometry
that

m_n(L(Q)) = m_n(S_1) + m_n(S_2) = m_n(S_1) + m_n(S_2 − e_j)
          = m_n(Q) = |det(L)| m_n(Q).
Now consider a half open rectangle R having all sides of equal length 2^{-m}, of
the sort described in Lemma 16.5.2. Then there exist a vector v and an integer m
such that

R − v = 2^{-m} Q.

Hence, from translation invariance and the formula det(aL) = a^n det(L),

m_n(L(R)) = m_n(L(R − v)) = m_n(2^{-m} LQ)
          = 2^{-mn} |det(L)| m_n(Q) = |det(L)| m_n(R).

It follows from Lemma 16.5.2 that whenever L is an elementary matrix and V is
open,

m_n(LV) = |det(L)| m_n(V).                                  (16.6.10)

Formula 16.6.10 is also true for any elementary L if V is replaced with a compact
set K. Let K be compact and contained in an open set V having finite measure.
Then

m_n(LK) + |det(L)| m_n(V \ K) = m_n(LK) + m_n(L(V \ K))
                              = m_n(LV) = |det(L)| m_n(V),

and so

m_n(LK) = |det(L)| m_n(V) − |det(L)| m_n(V \ K) = |det(L)| m_n(K).

Now let E be an arbitrary Lebesgue measurable set with m_n(E) < ∞. Then there
exist F, a countable union of an increasing sequence of compact sets {K_k} contained
in E, and G, a countable intersection of a decreasing sequence {V_k} of open sets
containing E, such that

m_n(G \ F) = 0,  F ⊆ E ⊆ G.
Then it also follows from the above that L(G \ F) is Lebesgue measurable, because
it is the countable intersection of the open sets {L(V_k \ K_k)}, and that

m_n(L(G \ F)) = lim_{k→∞} m_n(L(V_k \ K_k)) = |det(L)| lim_{k→∞} m_n(V_k \ K_k)
             = |det(L)| m_n(G \ F) = 0.

It follows that L(F) ⊆ L(E) ⊆ L(G) and the two ends are Lebesgue measurable with

m_n(L(G \ F)) = m_n(LG \ LF) = 0,

and so L(E) is Lebesgue measurable. Also the desired formula must hold for G and
F, and therefore

|det(L)| m_n(E) = |det(L)| m_n(F) = m_n(L(F)) ≤ m_n(L(E))
               ≤ m_n(L(G)) = |det(L)| m_n(G) = |det(L)| m_n(E),

and so all the inequalities are equal signs and

|det(L)| m_n(E) = m_n(L(E)).

If E is an arbitrary Lebesgue measurable set, apply the above result to E ∩ B(0, k)
and then let k → ∞. Since every invertible matrix is a product of elementary
matrices, it follows that the above formula holds for any invertible L.
Can we relax the requirement that L be invertible? If L is an arbitrary n × n
matrix, then there are elementary matrices E_j such that L = E_1 ⋯ E_r R where R
is in row reduced echelon form. If R = I, then this just exhibits L as a product of
elementary matrices. Otherwise, R maps R^n into span(e_1, ⋯, e_{n−1}), which is clearly
a closed set of Lebesgue measure 0.
Consider this second case, that L is not invertible. Then if E is any Lebesgue
measurable set, LE ⊆ E_1 ⋯ E_r(R^{n−1}) and this second set is closed with Lebesgue
measure zero because, from the above,

m_n(E_1 ⋯ E_r(R^{n−1})) = ∏_{i=1}^r |det(E_i)| m_n(R^{n−1}) = 0.

Therefore LE, being a subset of this set of measure zero, is also Lebesgue measurable
and has measure 0. Also m_n(E) |det(L)| = 0, so the formula continues to hold for
all L, invertible or not. ∎
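As a sanity check on the formula m_n(LE) = |det(L)| m_n(E), here is a small Monte
Carlo sketch (my own, not part of the text) with E = (0, 1]^2 and a particular
invertible 2 × 2 matrix L; the bounding box and sample size are arbitrary choices
made for the illustration.

import random

L = [[2.0, 1.0],
     [0.5, 1.5]]
detL = L[0][0] * L[1][1] - L[0][1] * L[1][0]          # = 2.5

def L_inverse(x, y):
    # inverse of the 2x2 matrix applied to (x, y)
    return ((L[1][1] * x - L[0][1] * y) / detL,
            (-L[1][0] * x + L[0][0] * y) / detL)

# L((0,1]^2) lies in the box [0,3] x [0,2] (image of the corners spans it)
random.seed(0)
N, hits = 200_000, 0
for _ in range(N):
    x, y = random.uniform(0, 3), random.uniform(0, 2)
    u, v = L_inverse(x, y)
    if 0 < u <= 1 and 0 < v <= 1:
        hits += 1
estimate = (3.0 * 2.0) * hits / N             # approximates m_2(LE)
assert abs(estimate - abs(detL) * 1.0) < 0.05  # |det L| m_2(E) = 2.5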

16.7 Fubini's Theorem

First of all, here is a simple observation.

Lemma 16.7.1 There exists a Borel measure ν defined by

ν(E) ≡ ∫ ⋯ ∫ X_E dx_{i_1} ⋯ dx_{i_n},  dx_{i_j} = dm_1,

where (i_1, ⋯, i_n) is any fixed permutation of (1, ⋯, n).

Proof: First, why does the iterated integral make sense? It obviously makes sense
for any E of the form

E = ∏_{i=1}^n U_i

where each U_i is an open set. Let G denote the Borel sets E for which

∫ ⋯ ∫ X_{E ∩ R_p} dx_{i_1} ⋯ dx_{i_n}

makes sense. Here R_p ≡ (−p, p)^n. If you have a finite disjoint union ∪_{i=1}^k E_i with
each E_i ∈ G, then the iterated integral makes sense because it is just the integral of
a finite sum of indicator functions X_{R_p ∩ E_i}, and for each of these the iterated
integral makes sense. It follows from the monotone convergence theorem that the
iterated integral makes sense for ∪_{i=1}^∞ E_i where the E_i ∈ G are disjoint. Now if
E ∈ G, then E^C ∩ R_p = R_p \ (E ∩ R_p). Each iterated integral makes sense for
X_{E ∩ R_p} and each makes sense for X_{R_p}. Therefore, each makes sense for the
difference X_{R_p} − X_{E ∩ R_p} = X_{E^C ∩ R_p}. Thus G contains the open rectangles, and
if K denotes these open rectangles, G ⊇ σ(K) by Dynkin's lemma. However, σ(K)
equals the Borel sets. Hence G equals the Borel sets. Now define

ν(E) ≡ ∫ ⋯ ∫ X_E dx_{i_1} ⋯ dx_{i_n}.

It is obviously a measure, by the monotone convergence theorem. It also makes
sense, by the monotone convergence theorem applied to X_{E ∩ R_p} on letting
p → ∞. ∎
This Borel measure ν is finite on compact sets. If f ≥ 0 and f ∈ Cc(R^n), then
since it is Borel measurable, there exists an increasing sequence of simple Borel
functions s_n ↑ f. Then

∫ f dν = lim_{n→∞} ∫ s_n dν = lim_{n→∞} ∫ ⋯ ∫ s_n dx_{i_1} ⋯ dx_{i_n}
       = ∫ ⋯ ∫ f dx_{i_1} ⋯ dx_{i_n},

the last step following from a repeated application of the monotone convergence
theorem. Thus the functional which gives Lebesgue measure is f → ∫ f dν, and so by
Lemma 16.4.15, m_n = ν on all Borel sets. Therefore, whenever E is Borel,

∫ X_E dm_n = ∫ ⋯ ∫ X_E(x) dx_{i_1} ⋯ dx_{i_n}.

It follows that if s is a nonnegative Borel measurable simple function, then

∫ s dm_n = ∫ ⋯ ∫ s dx_{i_1} ⋯ dx_{i_n},

and passing to a limit using a sequence of nonnegative simple functions and using
the monotone convergence theorem multiple times, we obtain Fubini's theorem.
Theorem 16.7.2 Let f be a nonnegative Borel measurable function and let
(i_1, ⋯, i_n) be any permutation of the integers 1, ⋯, n. Then

∫ f dm_n = ∫ ⋯ ∫ f dx_{i_1} ⋯ dx_{i_n},

where the iterated integral makes sense because each iterate is measurable.
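A quick numerical illustration (my own) of the theorem for a simple nonnegative
function on (0, 1]^2: the two iterated midpoint-rule sums agree with each other and,
up to discretization error, with the exact value of the integral. The grid size is an
arbitrary choice.

def f(x, y):
    return x * x * y + 1.0          # a nonnegative Borel measurable function

n = 400
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]       # midpoints of a uniform grid on (0, 1]

# iterate first over x then y, and then in the opposite order
I_xy = sum(sum(f(x, y) * h for x in grid) * h for y in grid)
I_yx = sum(sum(f(x, y) * h for y in grid) * h for x in grid)

assert abs(I_xy - I_yx) < 1e-9            # the order of iteration does not matter
assert abs(I_xy - 7.0 / 6.0) < 1e-3       # exact value: (1/3)(1/2) + 1 = 7/6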

16.8 Exercises
1. Let Ω = N, the natural numbers, and let d(p, q) = |p − q|, the usual distance in
R. Show that in (Ω, d) the closures of the balls are compact. Now let
Λf ≡ Σ_{k=1}^∞ f(k) whenever f ∈ Cc(Ω). Show this is a well defined positive
linear functional on the space Cc(Ω). Describe the measure of the Riesz
representation theorem which results from this positive linear functional. What
if Λ(f) = f(1)? What measure would result from this functional? Which
functions are measurable?
2. Verify that μ, defined in Lemma 16.1.7, is an outer measure.

3. Let F : R → R be increasing and right continuous. Let Λf ≡ ∫ f dF, where
the integral is the Riemann Stieltjes integral of f ∈ Cc(R). Show the measure μ
from the Riesz representation theorem satisfies

μ([a, b]) = F(b) − F(a−),  μ((a, b]) = F(b) − F(a),
μ([a, a]) = F(a) − F(a−).

Hint: You might want to review the material on Riemann Stieltjes integrals
presented in the Preliminary part of the notes.
4. Suppose Ω is a metric space and μ, ν are two Borel measures with the property
that they are finite on every ball and that they are equal on every open set. Show
they must be equal on every Borel set. Hint: Let G denote those Borel sets E
such that μ(E ∩ B) = ν(E ∩ B) for B an open ball. Show G is closed with respect
to countable disjoint unions and complements and contains the π system
consisting of the open sets. Then consider the lemma on π systems. Let
B = B(p, n), n = 1, 2, ⋯.
5. Let Ω be a metric space with the closed balls compact and suppose μ is a
measure defined on the Borel sets of Ω which is finite on compact sets. Show
there exists a unique Radon measure μ̄ which equals μ on the Borel sets.
6. Random vectors (variables) are measurable functions X mapping a probability
space (Ω, P, F) to R^n (sometimes, although not in this problem, a Banach
space). Thus X(ω) ∈ R^n for each ω ∈ Ω and P is a probability measure defined
on the sets of F, a σ algebra of subsets of Ω. For E a Borel set in R^n, define

μ(E) ≡ P(X^{-1}(E)) ≡ probability that X ∈ E.

Show this is a well defined measure on the Borel sets of R^n and use Problem 5
to obtain a Radon measure λ_X defined on a σ algebra of sets of R^n including
the Borel sets such that for E a Borel set, λ_X(E) = probability that X ∈ E.

7. For X a random variable defined above and λ_X the Radon measure just
defined, suppose h : R^p → R is Borel measurable and in L^1(R^p, λ_X). Then

∫_Ω h(X(ω)) dP = ∫_{R^p} h(x) dλ_X.

8. Let X, Y be random vectors with values in R^p and suppose that

∫_{R^p} e^{it·x} dλ_X = ∫_{R^p} e^{it·y} dλ_Y

for all t ∈ R^p. Show that it follows λ_X = λ_Y. The function φ_X(t) ≡
∫_{R^p} e^{it·x} dλ_X is called the characteristic function of the random variable X.
This is a major result in probability which says that the distribution measures
are determined by the characteristic function. Hint: Letting G be the special
space used earlier to define the Fourier transform, show that

∫_{R^p} ∫_{R^p} e^{it·y} ψ(t) dt dλ_Y = ∫_{R^p} ∫_{R^p} e^{it·x} ψ(t) dt dλ_X.

Since F^{-1} maps G onto G, it follows that

∫_{R^p} φ dλ_Y = ∫_{R^p} φ dλ_X

for all φ ∈ G. Then explain why this also holds for all φ ∈ Cc(R^p). Now apply
the Riesz representation theorem to conclude that λ_Y = λ_X.

9. Suppose X and Y are metric spaces having compact closed balls. Show that

(X × Y, d_{X×Y})

is also a metric space which has the closures of balls compact. Here

d_{X×Y}((x_1, y_1), (x_2, y_2)) ≡ max(d(x_1, x_2), d(y_1, y_2)).

Let

A ≡ {E × F : E is a Borel set in X, F is a Borel set in Y}.

Show σ(A), the smallest σ algebra containing A, contains the Borel sets. Hint:
Show every open set in a metric space which has closed balls compact can be
obtained as a countable union of compact sets. Next show this implies every
open set can be obtained as a countable union of open sets of the form U × V
where U is open in X and V is open in Y.

10. Suppose (Ω, S, μ) is a measure space which may not be complete. Could you
obtain a complete measure space (Ω, S̄, μ_1) by simply letting S̄ consist of all sets
of the form E for which there exists F ∈ S such that (F \ E) ∪ (E \ F) ⊆ N for
some N ∈ S which has measure zero, and then letting μ_1(E) = μ(F)? Explain.

11. Let (Ω, S, μ) be a σ finite measure space and let f : Ω → [0, ∞) be measurable.
Define

A ≡ {(x, y) : y < f(x)}.

Verify that A is μ × m measurable. Show that

∫ f dμ = ∫ ∫ X_A(x, y) dμ dm = ∫ X_A d(μ × m).

12. For f a nonnegative measurable function, it was shown that

∫ f dμ = ∫_0^∞ μ([f > t]) dt.

Would it work the same if you used ∫_0^∞ μ([f ≥ t]) dt? Explain.

13. The Riemann integral is only defined for functions which are bounded and which
are defined on a bounded interval. If either of these two criteria is not satisfied,
then the integral is not the Riemann integral. Suppose f is Riemann integrable
on a bounded interval [a, b]. Show that it must also be Lebesgue integrable with
respect to one dimensional Lebesgue measure and that the two integrals coincide.
Give a theorem in which the improper Riemann integral coincides with a suitable
Lebesgue integral. (There are many such situations; just find one.) Note that
∫_0^∞ (sin x)/x dx is a valid improper Riemann integral but is not a Lebesgue
integral. Why?

14. Suppose μ is a finite measure defined on the Borel subsets of X where X is a
separable complete metric space. Show that μ is necessarily regular. Hint:
First show μ is outer regular on closed sets in the sense that for H closed,

μ(H) = inf{μ(V) : V ⊇ H and V is open}.

Then show that for every open set V,

μ(V) = sup{μ(H) : H ⊆ V and H is closed}.

Next let F consist of those sets for which μ is outer regular and also inner
regular, with closed replacing compact in the definition of inner regular. Finally
show that if C is a closed set, then

μ(C) = sup{μ(K) : K ⊆ C and K is compact}.

To do this, consider a countable dense subset {a_n} of C and let

C_n = ∪_{k=1}^{m_n} B(a_k, 1/n) ∩ C.

Show you can choose m_n such that

μ(C \ C_n) < ε/2^n.

Then consider K ≡ ∩_n C_n.

15. Let (Ω, F, μ) be a finite measure space and suppose {f_n} is a sequence of
nonnegative functions which satisfy f_n(ω) ≤ C, with C independent of n and ω.
Suppose also this sequence converges to 0 in measure. That is, for all ε > 0,

lim_{n→∞} μ([f_n ≥ ε]) = 0.

Show that then

lim_{n→∞} ∫_Ω f_n(ω) dμ = 0.
16. Let K be a compact subset of R having no isolated points. Show that there
exists an increasing continuous function g which is constant on every connected
component of K^C and has values between 0 and 1. If J, L are two components
with J < L, then the value of g on J is strictly less than its value on L. Hint:
Let the components be {(a_k, b_k)}. Let a be the first point of K and b the last.
Let g_0 be piecewise linear, increasing and continuous, going from 0 to the left of
a to 1 to the right of b. Let g_1 equal (1/2)(g_0(a_1) + g_0(b_1)) on (a_1, b_1) and
adjust to make it piecewise linear and increasing, going from 0 to 1. Next adjust
g_1 in a similar way to obtain g_2, constant on (a_2, b_2). Continue this way.
Estimate ||g_k − g_{k−1}||_∞ in terms of g_{k−1}(b_k) − g_{k−1}(a_k), and observe and
use that the intervals (g_{k−1}(a_k), g_{k−1}(b_k)) are disjoint.
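In the special case where K is the middle-thirds Cantor set, the function this
exercise asks for is, one can check, the classical Cantor function. The following
sketch (my own, intended only to make the hint concrete) computes it from the
ternary expansion; the digit cutoff is an arbitrary numerical choice.

def cantor_function(x, digits=40):
    # g(x) for x in [0, 1]: read ternary digits of x until a 1 appears,
    # then map the digits 0, 2 to the binary digits 0, 1
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(digits):
        x *= 3.0
        d = int(x)
        x -= d
        if d == 1:                 # x fell in a removed middle third
            value += scale
            break
        value += scale * (d // 2)
        scale /= 2.0
    return value

# constant (= 1/2) on the removed interval (1/3, 2/3), increasing, g(0)=0, g(1)=1
assert cantor_function(0.4) == cantor_function(0.6) == 0.5
assert cantor_function(0.2) < 0.5 < cantor_function(0.8)
assert cantor_function(0.0) == 0.0 and cantor_function(1.0) == 1.0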
17. Show that if K is any compact subset of R which has no isolated points, there
exists a Radon measure μ which has the properties μ(K) = 1, μ(E) = μ(E ∩ K),
and if H is a proper compact subset of K, then μ(H) < 1. Also, μ({p}) = 0
whenever p is a point.

18. Let K be an arbitrary compact subset of R. Then there exists a Radon measure
μ which has the properties μ(K) = 1, μ(E) = μ(E ∩ K), and if H is a proper
compact subset of K, then μ(H) < 1.
Extension Theorems

17.1 Algebras

First of all, here is the definition of an algebra and theorems which tell how to
recognize one when you see it. An algebra is like a σ algebra except it is only closed
with respect to finite unions.

Definition 17.1.1 A is said to be an algebra of subsets of a set Z if Z ∈ A, ∅ ∈ A,
and whenever E, F ∈ A, E ∪ F and E \ F are both in A.

It is important to note that if A is an algebra, then it is also closed under finite
intersections. This is because E ∩ F = (E^C ∪ F^C)^C ∈ A, since E^C = Z \ E ∈ A and
F^C = Z \ F ∈ A. Note that every σ algebra is an algebra but not the other way
around.
Something satisfying the above definition is called an algebra because union is
like addition, set difference is like subtraction, and intersection is like multiplication.
Furthermore, only finitely many operations are done at a time, and so there is
nothing like a limit involved.
How can you recognize an algebra when you see one? The answer to this question
is the purpose of the following lemma.

Lemma 17.1.2 Suppose R and E are subsets of P(Z)¹ such that E is defined as
the set of all finite disjoint unions of sets of R. Suppose also that

∅, Z ∈ R,

A ∩ B ∈ R whenever A, B ∈ R,

A \ B ∈ E whenever A, B ∈ R.

Then E is an algebra of sets of Z.

¹ P(Z) denotes the set of all subsets of Z.

Proof: Note first that if A ∈ R, then A^C ∈ E because A^C = Z \ A.
Now suppose that E_1 and E_2 are in E,

E_1 = ∪_{i=1}^m R_i,  E_2 = ∪_{j=1}^n R̃_j,

where the R_i are disjoint sets in R and the R̃_j are disjoint sets in R. Then

E_1 ∩ E_2 = ∪_{i=1}^m ∪_{j=1}^n R_i ∩ R̃_j,

which is clearly an element of E because no two of the sets in the union can intersect
and, by assumption, they are all in R. Thus by induction, finite intersections of sets
of E are in E. Consider the difference of two elements of E next.
If E = ∪_{i=1}^n R_i ∈ E, then

E^C = ∩_{i=1}^n R_i^C = a finite intersection of sets of E,

which was just shown to be in E. Now, if E_1, E_2 ∈ E,

E_1 \ E_2 = E_1 ∩ E_2^C ∈ E

from what was just shown about finite intersections.
Finally consider finite unions of sets of E. Let E_1 and E_2 be sets of E. Then

E_1 ∪ E_2 = (E_1 \ E_2) ∪ E_2 ∈ E

because E_1 \ E_2 consists of a finite disjoint union of sets of R, and these sets must
be disjoint from the sets of R whose union yields E_2, because (E_1 \ E_2) ∩ E_2 = ∅.
This proves the lemma. ∎
The following corollary is particularly helpful in verifying the conditions of the
above lemma.

Corollary 17.1.3 Let (Z_1, R_1, E_1) and (Z_2, R_2, E_2) be as described in Lemma
17.1.2. Then (Z_1 × Z_2, R, E) also satisfies the conditions of Lemma 17.1.2 if R is
defined as

R ≡ {R_1 × R_2 : R_i ∈ R_i}

and

E ≡ {finite disjoint unions of sets of R}.

Consequently, E is an algebra of sets.

Proof: It is clear that ∅, Z_1 × Z_2 ∈ R. Let A × B and C × D be two elements of
R. Then

(A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D) ∈ R

by assumption. Next,

(A × B) \ (C × D) = (A × (B \ D)) ∪ ((A \ C) × (D ∩ B)),

where B \ D ∈ E_2, A \ C ∈ E_1, and D ∩ B ∈ R_2.
Thus the difference has the form

(A × Q) ∪ (P × R),

where Q ∈ E_2, P ∈ E_1, and R ∈ R_2.
Since A × Q and P × R do not intersect, it follows the above expression is in E
because each of these terms is. This proves the corollary. ∎

17.2 Caratheodory Extension Theorem

The Caratheodory extension theorem is a fundamental result which makes possible
the consideration of measures on infinite products, among other things. The idea is
that if a finite measure defined only on an algebra is trying to be a measure, then
in fact it can be extended to a measure.

Definition 17.2.1 Let E be an algebra of sets of Ω and let μ_0 be a finite measure
on E. This means μ_0 is finitely additive and if E_i, E are sets of E with the E_i
disjoint and

E = ∪_{i=1}^∞ E_i,

then

μ_0(E) = Σ_{i=1}^∞ μ_0(E_i),

while μ_0(Ω) < ∞.

In this definition, μ_0 is trying to be a measure and acts like one whenever possible.
Under these conditions, μ_0 can be extended uniquely to a complete measure μ,
defined on a σ algebra of sets containing E, such that μ agrees with μ_0 on E. The
following is the main result.

Theorem 17.2.2 Let μ_0 be a measure on an algebra of sets E which satisfies
μ_0(Ω) < ∞. Then there exists a complete measure space (Ω, S, μ) such that

μ(E) = μ_0(E)

for all E ∈ E. Also, if ν is any such measure which agrees with μ_0 on E, then ν = μ
on σ(E), the σ algebra generated by E.
Proof: Define an outer measure as follows:

μ(S) ≡ inf { Σ_{i=1}^∞ μ_0(E_i) : S ⊆ ∪_{i=1}^∞ E_i, E_i ∈ E }.

Claim 1: μ is an outer measure.
Proof of Claim 1: Let S ⊆ ∪_{i=1}^∞ S_i and let S_i ⊆ ∪_{j=1}^∞ E_{ij}, where

μ(S_i) + ε/2^i ≥ Σ_{j=1}^∞ μ_0(E_{ij}).

Then

μ(S) ≤ Σ_i Σ_j μ_0(E_{ij}) ≤ Σ_i (μ(S_i) + ε/2^i) = Σ_i μ(S_i) + ε.

Since ε is arbitrary, this shows μ is an outer measure as claimed.
By the Caratheodory procedure, there exists a unique σ algebra S, consisting of
the μ measurable sets, such that

(Ω, S, μ)

is a complete measure space. It remains to show μ extends μ_0.
Claim 2: If S is the σ algebra of μ measurable sets, then S ⊇ E and μ = μ_0 on E.
Proof of Claim 2: First observe that if A ∈ E, then μ(A) ≤ μ_0(A) by definition.
Letting

μ(A) + ε > Σ_{i=1}^∞ μ_0(E_i),  ∪_{i=1}^∞ E_i ⊇ A,  E_i ∈ E,

it follows

μ(A) + ε > Σ_{i=1}^∞ μ_0(E_i ∩ A) ≥ μ_0(A)

since A = ∪_{i=1}^∞ E_i ∩ A. Therefore, μ = μ_0 on E.
Consider the assertion that E ⊆ S. Let A ∈ E and let S be any set. There exist
sets {E_i} ⊆ E such that ∪_{i=1}^∞ E_i ⊇ S but

μ(S) + ε > Σ_{i=1}^∞ μ(E_i).

Then

μ(S ∩ A) + μ(S \ A)
  ≤ μ(∪_{i=1}^∞ (E_i ∩ A)) + μ(∪_{i=1}^∞ (E_i \ A))
  ≤ Σ_{i=1}^∞ μ(E_i ∩ A) + Σ_{i=1}^∞ μ(E_i \ A) = Σ_{i=1}^∞ μ(E_i) < μ(S) + ε,

the last equality because μ = μ_0 on E and μ_0 is finitely additive.
Since ε is arbitrary, this shows A ∈ S.
This has proved the existence part of the theorem. To verify uniqueness, let

G ≡ {E ∈ σ(E) : μ(E) = ν(E)}.

Then G is given to contain E and is obviously closed with respect to countable
disjoint unions and complements. Therefore by Lemma 9.2.2, G = σ(E), and this
proves the theorem. ∎
The following lemma is also very significant.

Lemma 17.2.3 Let M be a metric space with the closed balls compact and suppose
λ is a measure defined on the Borel sets of M which is finite on compact sets. Then
there exists a unique Radon measure λ̄ which equals λ on the Borel sets. In
particular, λ must be both inner and outer regular on all Borel sets.

Proof: Define a positive linear functional Λ(f) = ∫ f dλ. Let λ̄ be the Radon
measure which comes from the Riesz representation theorem for positive linear
functionals (Theorem 16.4.11). Thus for all continuous f ∈ Cc(M),

∫ f dλ̄ = ∫ f dλ.

If V is an open set, let {f_n} be a sequence of continuous functions which is increasing
and converges to X_V pointwise. Then applying the monotone convergence theorem,

∫ X_V dλ̄ = λ̄(V) = ∫ X_V dλ = λ(V),

and so the two measures coincide on all open sets. Every compact set is a countable
intersection of open sets and so the two measures coincide on all compact sets. Now
let B(a, n) be a ball of radius n and let E be a Borel set contained in this ball. Then
by regularity of λ̄ there exist sets F, G such that G is a countable intersection of
open sets and F is a countable union of compact sets such that F ⊆ E ⊆ G and
λ̄(G \ F) = 0. Now λ(G) = λ̄(G) and λ(F) = λ̄(F). Thus

λ̄(G \ F) + λ̄(F) = λ̄(G) = λ(G) = λ(G \ F) + λ(F),

and so λ(G \ F) = λ̄(G \ F) = 0. It follows

λ(E) = λ(F) = λ̄(F) = λ̄(G) = λ̄(E).

If E is an arbitrary Borel set, then

λ(E ∩ B(a, n)) = λ̄(E ∩ B(a, n)),

and letting n → ∞, this yields λ(E) = λ̄(E). ∎


17.3 The Tychonoff Theorem

Sometimes it is necessary to consider infinite Cartesian products of topological
spaces. When you have finitely many topological spaces in the product and each is
compact, it can be shown that the Cartesian product is compact with the product
topology. It turns out that the same thing holds for infinite products, but you have
to be careful how you define the topology. The first thing likely to come to mind by
analogy with finite products is not the right way to do it.
First recall the Hausdorff maximal principle.

Theorem 17.3.1 (Hausdorff maximal principle) Let F be a nonempty partially
ordered set. Then there exists a maximal chain.

The main tool in the study of products of compact topological spaces is the
Alexander subbasis theorem, which is presented next. Recall a set is compact if
every basic open cover admits a finite subcover. This was pretty easy to prove.
However, there is a much smaller set of open sets called a subbasis which has this
property. The proof of this result is much harder.

Definition 17.3.2 S ⊆ τ is called a subbasis for the topology τ if the set B of finite
intersections of sets of S is a basis for the topology τ.

Theorem 17.3.3 Let (X, τ) be a topological space and let S ⊆ τ be a subbasis for
τ. Then if H ⊆ X, H is compact if and only if every open cover of H consisting
entirely of sets of S admits a finite subcover.

Proof: The only if part is obvious because the subbasic sets are themselves open.
If every basic open cover admits a finite subcover, then the set in question is compact.
Suppose then that H is a subset of X having the property that subbasic open covers
admit finite subcovers. Is H compact? Assume this is not so. Then what was just
observed about basic covers implies there exists a basic open cover of H, O, which
admits no finite subcover. Let F be defined as

{O : O is a basic open cover of H which admits no finite subcover}.

The assumption is that F is nonempty. Partially order F by set inclusion and use
the Hausdorff maximal principle to obtain a maximal chain C of such open covers,
and let

D = ∪C.

If D admits a finite subcover, then since C is a chain and the finite subcover has only
finitely many sets, some element of C would also admit a finite subcover, contrary to
the definition of F. Therefore, D admits no finite subcover. If D′ properly contains
D and D′ is a basic open cover of H, then D′ has a finite subcover of H, since
otherwise C would fail to be a maximal chain, being properly contained in C ∪ {D′}.
Every set of D is of the form

U = ∩_{i=1}^m B_i,  B_i ∈ S,

because they are all basic open sets. If it is the case that for every U ∈ D one of the
B_i is found in D, then replace each such U with the subbasic set from D containing
it. But then this would be a subbasic open cover of H which by assumption would
admit a finite subcover, contrary to the properties of D. Therefore, one of the sets of
D, denoted by U, has the property that

U = ∩_{i=1}^m B_i,  B_i ∈ S,

and no B_i is in D. Thus D ∪ {B_i} admits a finite subcover for each of the above B_i,
because it is strictly larger than D. Let this finite subcover corresponding to B_i be
denoted by

V_1^i, ⋯, V_{m_i}^i, B_i.

Consider

{U, V_j^i : j = 1, ⋯, m_i, i = 1, ⋯, m}.

If p ∈ H \ ∪{V_j^i}, then p ∈ B_i for each i and so p ∈ U. This is therefore a finite
subcover of D, contradicting the properties of D. Therefore, F must be empty and
this proves the theorem. ∎
Let I be a set and suppose for each i ∈ I, (X_i, τ_i) is a nonempty topological
space. The Cartesian product of the X_i, denoted by ∏_{i∈I} X_i, consists of the set of
all choice functions defined on I which select a single element of each X_i. Thus
f ∈ ∏_{i∈I} X_i means for every i ∈ I, f(i) ∈ X_i. The axiom of choice says ∏_{i∈I} X_i
is nonempty. Let

P_j(A) = ∏_{i∈I} B_i

where B_i = X_i if i ≠ j and B_j = A. A subbasis for a topology on the product space
consists of all sets P_j(A) where A ∈ τ_j. (These sets have an open set from the
topology of X_j in the j-th slot and the whole space in the other slots.) Thus a basis
consists of finite intersections of these sets. Note that the intersection of two of these
basic sets is another basic set and their union yields ∏_{i∈I} X_i. Therefore, they
satisfy the condition needed for a collection of sets to serve as a basis for a topology.
This topology is called the product topology and is denoted by ∏ τ_i.
It is tempting to define a basis for a topology to be sets of the form ∏_{i∈I} A_i
where A_i is open in X_i. This is not the same thing at all. Note that the basis just
described has at most finitely many slots filled with an open set which is not the
whole space. The thing just mentioned, in which every slot may be filled by a proper
open set, is called the box topology, and there exist people who are interested in it.
The Alexander subbasis theorem is used to prove the Tychonoff theorem, which
says that if each X_i is a compact topological space, then in the product topology,
∏_{i∈I} X_i is also compact.

Theorem 17.3.4 If (X_i, τ_i) is compact for each i ∈ I, then so is (∏_{i∈I} X_i, ∏ τ_i).

Proof: By the Alexander subbasis theorem, the theorem will be proved if every
subbasic open cover admits a finite subcover. Therefore, let O be a subbasic open
cover of ∏_{i∈I} X_i. Let

O_j = {Q ∈ O : Q = P_j(A) for some A ∈ τ_j}.

Thus O_j consists of those sets of O which have a possibly proper open subset of X_j
only in the slot i = j. Let

π_j O_j = {A : P_j(A) ∈ O_j}.

Thus π_j O_j picks out those proper open subsets of X_j which occur in O_j.
If no π_j O_j covers X_j, then by the axiom of choice, there exists

f ∈ ∏_{i∈I} (X_i \ ∪ π_i O_i).

Therefore, f(j) ∉ ∪ π_j O_j for each j ∈ I. Now f is a point of ∏_{i∈I} X_i and so
f ∈ P_k(A) ∈ O for some k. However, this is a contradiction, because it was shown
that f(k) is not an element of A. (A is one of the sets whose union makes up
∪ π_k O_k.) This contradiction shows that for some j, π_j O_j covers X_j. Thus

X_j = ∪ π_j O_j,

and so by compactness of X_j, there exist A_1, ⋯, A_m, sets in π_j O_j, such that
X_j ⊆ ∪_{i=1}^m A_i and P_j(A_i) ∈ O. Therefore, {P_j(A_i)}_{i=1}^m covers ∏_{i∈I} X_i. By
the Alexander subbasis theorem this proves ∏_{i∈I} X_i is compact. ∎

17.4 Kolmogorov Extension Theorem

Let a subbasis for the topology of [−∞, ∞] be sets of the form [−∞, a) and (a, ∞].
Thus, with this subbasis, [−∞, ∞] is a compact Hausdorff space. Also let
M_t ≡ [−∞, ∞]^{n_t}, where n_t is a positive integer, and endow this product with the
product topology, so that M_t is also a compact Hausdorff space.
I will denote by I a totally ordered index set (like R), and the interest will be in
building a measure on the product space ∏_{t∈I} M_t. By the well ordering principle,
you can always put an order on any index set, so this order is no restriction, but we
do not insist on a well order and in fact, index sets of great interest are R or [0, ∞).
Also, for X a topological space, B(X) will denote the Borel sets.

Notation 17.4.1 The symbol J will denote a finite subset of I, J = (t_1, ⋯, t_n),
the t_i taken in order. E_J will denote a set which has a set E_t of B(M_t) in the t-th
position for t ∈ J and, for t ∉ J, the set in the t-th position will be M_t. K_J will
denote a set which has a compact set in the t-th position for t ∈ J and, for t ∉ J, the
set in the t-th position will be M_t. Thus K_J is compact in the product topology of
∏_{t∈I} M_t. Also denote by R_J the sets E_J and by R the union of all the R_J. Let E_J
denote finite disjoint unions of sets of R_J and let E denote finite disjoint unions of
sets of R. Thus if F is a set of E, there exists J such that F is a finite disjoint union
of sets of R_J. For F = ∏_{t∈I} F_t, denote by π_J(F) the set ∏_{t∈J} F_t.
Lemma 17.4.2 The sets E, E_J defined above form algebras of sets of ∏_{t∈I} M_t.

Proof: First consider R_J. If A, B ∈ R_J, then A ∩ B ∈ R_J also. Is A \ B a finite
disjoint union of sets of R_J? It suffices to verify that π_J(A \ B) is a finite disjoint
union of sets of π_J(R_J). Let |J| denote the number of indices in J. If |J| = 1, then it
is obvious that π_J(A \ B) is a finite disjoint union of sets of π_J(R_J). In fact, letting
J = (t), the t-th entry of A being A and the t-th entry of B being B, the t-th entry of
A \ B is A \ B, a Borel set of M_t, hence trivially a finite disjoint union of Borel sets
of M_t.
Suppose then that for A, B sets of R_J, π_J(A \ B) is a finite disjoint union of sets
of π_J(R_J) whenever |J| ≤ n, and consider J = (t_1, ⋯, t_n, t_{n+1}). Let the t_i-th entries
of A and B be respectively A_i and B_i. It follows that π_J(A \ B) has the following in
the entries for J:

(A_1 × A_2 × ⋯ × A_n × A_{n+1}) \ (B_1 × B_2 × ⋯ × B_n × B_{n+1}).

Letting A represent A_1 × A_2 × ⋯ × A_n and B represent B_1 × B_2 × ⋯ × B_n, this is
of the form

(A × (A_{n+1} \ B_{n+1})) ∪ ((A \ B) × (A_{n+1} ∩ B_{n+1})).

By induction, A \ B is the finite disjoint union of sets of R_{(t_1,⋯,t_n)}. Therefore, the
above is the finite disjoint union of sets of R_J. It follows that E_J is an algebra.
Now suppose A, B ∈ R. Then for some finite set J, both are in R_J. Then from
what was just shown,

A \ B ∈ E_J ⊆ E,  A ∩ B ∈ R.

By Lemma 17.1.2 on Page 421, this shows E is an algebra. ∎
With this preparation, here is the Kolmogorov extension theorem. In the statement
and proof of the theorem, F_i, G_i, and E_i will denote Borel sets. Any list of indices
from I will always be assumed to be taken in order. Thus, if J ⊆ I and
J = (t_1, ⋯, t_n), it will always be assumed t_1 < t_2 < ⋯ < t_n.

Theorem 17.4.3 For each finite set

J = (t_1, ⋯, t_n) ⊆ I,

suppose there exists a Borel probability measure ν_J = ν_{t_1⋯t_n} defined on the Borel
sets of ∏_{t∈J} M_t such that the following consistency condition holds. If

(t_1, ⋯, t_n) ⊆ (s_1, ⋯, s_p),

then

ν_{t_1⋯t_n}(F_{t_1} × ⋯ × F_{t_n}) = ν_{s_1⋯s_p}(G_{s_1} × ⋯ × G_{s_p})     (17.4.1)

where if s_i = t_j, then G_{s_i} = F_{t_j}, and if s_i is not equal to any of the indices t_k,
then G_{s_i} = M_{s_i}. Then for E defined in Notation 17.4.1, there exists a probability
measure P and a σ algebra F = σ(E) such that

(∏_{t∈I} M_t, P, F)

is a probability space. Also there exist measurable functions X_s : ∏_{t∈I} M_t → M_s,
defined as

X_s(x) ≡ x_s

for each s ∈ I, such that for each (t_1, ⋯, t_n) ⊆ I,

ν_{t_1⋯t_n}(F_{t_1} × ⋯ × F_{t_n}) = P([X_{t_1} ∈ F_{t_1}] ∩ ⋯ ∩ [X_{t_n} ∈ F_{t_n}])
   = P((X_{t_1}, ⋯, X_{t_n}) ∈ ∏_{j=1}^n F_{t_j}) = P(∏_{t∈I} F_t)     (17.4.2)

where F_t = M_t for every t ∉ {t_1, ⋯, t_n} and F_{t_i} is a Borel set. Also, if f is a
nonnegative function of finitely many variables x_{t_1}, ⋯, x_{t_n}, measurable with respect
to B(∏_{j=1}^n M_{t_j}), then f is also measurable with respect to F and

∫_{M_{t_1}×⋯×M_{t_n}} f(x_{t_1}, ⋯, x_{t_n}) dν_{t_1⋯t_n} = ∫_{∏_{t∈I} M_t} f(x_{t_1}, ⋯, x_{t_n}) dP.     (17.4.3)

Proof: Let E be the algebra of sets defined in Notation 17.4.1. I want to define a
measure on E. For F ∈ E, there exists J such that F is a finite disjoint union of sets
of R_J. Define

P_0(F) ≡ ν_J(π_J(F)).

Then P_0 is well defined because of the consistency condition on the measures ν_J.
P_0 is clearly finitely additive because the ν_J are measures and one can pick J as
large as desired to include all t where there may be something other than M_t. Also,
from the definition,

P_0(Ω) ≡ P_0(∏_{t∈I} M_t) = ν_{t_1}(M_{t_1}) = 1.

Next I will show P_0 is a finite measure on E. After this, it is only a matter of using
the Caratheodory extension theorem to get the existence of the desired probability
measure P.
Claim: Suppose E_n ∈ E and E_n ↓ ∅. Then P_0(E_n) ↓ 0.
Proof of the claim: If not, there exists a sequence such that although E_n ↓ ∅,
P_0(E_n) ≥ ε for some ε > 0 and all n. Let E_n ∈ E_{J_n}. Thus E_n is a finite disjoint
union of sets of R_{J_n}. By regularity of the measures ν_J, there exists a compact set
K_{J_n} ⊆ E_n such that

ν_{J_n}(π_{J_n}(K_{J_n})) + ε/2^{n+2} > ν_{J_n}(π_{J_n}(E_n)).

Thus

P_0(K_{J_n}) + ε/2^{n+2} ≡ ν_{J_n}(π_{J_n}(K_{J_n})) + ε/2^{n+2}
                        > ν_{J_n}(π_{J_n}(E_n)) ≡ P_0(E_n).
The interesting thing about these K_{J_n} is that they have the finite intersection
property. Here is why:

ε ≤ P_0(E_m) ≤ P_0(∩_{k=1}^m K_{J_k}) + P_0(E_m \ ∩_{k=1}^m K_{J_k})
            ≤ P_0(∩_{k=1}^m K_{J_k}) + P_0(∪_{k=1}^m (E_k \ K_{J_k}))
            < P_0(∩_{k=1}^m K_{J_k}) + Σ_{k=1}^m ε/2^{k+2}
            < P_0(∩_{k=1}^m K_{J_k}) + ε/2,

and so P_0(∩_{k=1}^m K_{J_k}) > ε/2. Now this yields a contradiction, because this finite
intersection property implies the intersection of all the compact sets K_{J_k} is
nonempty, contradicting E_n ↓ ∅, since each K_{J_n} is contained in E_n.
With the claim, it follows P_0 is a measure on E. Here is why: If E = ∪_{k=1}^∞ E_k
where E, E_k ∈ E, then (E \ ∪_{k=1}^n E_k) ↓ ∅ and so

P_0(∪_{k=1}^n E_k) → P_0(E).

Hence if the E_k are disjoint, Σ_{k=1}^n P_0(E_k) = P_0(∪_{k=1}^n E_k) → P_0(E). Thus for
disjoint E_k having ∪_k E_k = E ∈ E,

P_0(∪_{k=1}^∞ E_k) = Σ_{k=1}^∞ P_0(E_k).

Now to conclude the proof, apply the Caratheodory extension theorem to obtain a
probability measure P which extends P_0 to a σ algebra containing σ(E), the sigma
algebra generated by E, with P = P_0 on E. Thus for E_J ∈ E, P(E_J) = P_0(E_J) =
ν_J(π_J(E_J)).
Next, let (∏_{t∈I} M_t, F, P) be the probability space and for x ∈ ∏_{t∈I} M_t let
X_t(x) = x_t, the t-th entry of x. It follows X_t is measurable (also continuous),
because if U is open in M_t, then X_t^{-1}(U) has U in the t-th slot and M_s everywhere
else for s ≠ t. Thus inverse images of open sets are measurable. Also, letting J be a
finite subset of I, J = (t_1, ⋯, t_n), and F_{t_1}, ⋯, F_{t_n} Borel sets in M_{t_1}, ⋯, M_{t_n}
respectively, it follows F_J, where F_J has F_{t_i} in the t_i-th entry, is in E and therefore

P([X_{t_1} ∈ F_{t_1}] ∩ [X_{t_2} ∈ F_{t_2}] ∩ ⋯ ∩ [X_{t_n} ∈ F_{t_n}])
  = P([(X_{t_1}, X_{t_2}, ⋯, X_{t_n}) ∈ F_{t_1} × ⋯ × F_{t_n}]) = P(F_J) = P_0(F_J)
  = ν_{t_1⋯t_n}(F_{t_1} × ⋯ × F_{t_n}).

Finally consider the claim about the integrals. Suppose f(x_{t_1}, ⋯, x_{t_n}) = X_F,
where F is a Borel set of ∏_{t∈J} M_t, J = (t_1, ⋯, t_n). To begin with, suppose

F = F_{t_1} × ⋯ × F_{t_n}                                   (17.4.4)

where each F_{t_j} is in B(M_{t_j}). Then

∫_{M_{t_1}×⋯×M_{t_n}} X_F(x_{t_1}, ⋯, x_{t_n}) dν_{t_1⋯t_n} = ν_{t_1⋯t_n}(F_{t_1} × ⋯ × F_{t_n})
= P(∏_{t∈I} F_t) = ∫ X_{∏_{t∈I} F_t}(x) dP = ∫ X_F(x_{t_1}, ⋯, x_{t_n}) dP     (17.4.5)

where F_t = M_t if t ∉ J. Let K denote the sets F of the sort in 17.4.4. It is clearly a
π system. Now let G denote those sets F in B(∏_{t∈J} M_t) such that 17.4.5 holds.
Thus G ⊇ K. It is clear that G is closed with respect to countable disjoint unions
and complements. Hence G ⊇ σ(K), but σ(K) = B(∏_{t∈J} M_t), because every open
set in ∏_{t∈J} M_t is the countable union of rectangles like 17.4.4 in which each F_{t_i}
is open. Therefore, 17.4.5 holds for every F ∈ B(∏_{t∈J} M_t).
Passing to simple functions and then using the monotone convergence theorem
yields the final claim of the theorem. ∎
The next task is to consider the case where M_t = (−∞, ∞)^{n_t}. To consider this
case, here is a lemma which will allow it to be deduced from the above theorem. In
this lemma, M̄_t ≡ [−∞, ∞]^{n_t}.

Lemma 17.4.4 Let J be a finite subset of I. Then U is a Borel set in ∏_{t∈J} M_t if
and only if there exists a Borel set Ū in ∏_{t∈J} M̄_t such that U = Ū ∩ ∏_{t∈J} M_t.

Proof: A subbasis for the topology of [−∞, ∞] is sets of the form [−∞, a) and
(a, ∞]. Hence a subbasis for the topology of [−∞, ∞]^n is sets of the form
∏_{i=1}^n [−∞, a_i) and ∏_{i=1}^n (a_i, ∞]. Similarly, a subbasis for the topology of
(−∞, ∞)^n consists of sets of the form ∏_{i=1}^n (−∞, a_i) and ∏_{i=1}^n (a_i, ∞). Thus
the basic open sets of ∏_{t∈J} M_t are of the form Ū ∩ ∏_{t∈J} M_t where Ū is a basic
open set in ∏_{t∈J} M̄_t. It follows the open sets of ∏_{t∈J} M_t are of the form
Ū ∩ ∏_{t∈J} M_t where Ū is open in ∏_{t∈J} M̄_t. Now let G denote those Borel sets of
∏_{t∈J} M_t which are of the desired form, that is, of the form Ū ∩ ∏_{t∈J} M_t for Ū a
Borel set in ∏_{t∈J} M̄_t. Then, as just shown, G contains the π system of open sets of
∏_{t∈J} M_t. It is clearly closed with respect to complements and countable disjoint
unions. Hence G equals the Borel sets of ∏_{t∈J} M_t. ∎
It may help, in keeping the argument straight, to picture M_t sitting inside M̄_t
with Ū an open subset of M̄_t whose trace on M_t is U.

Now here is the Kolmogorov extension theorem in the desired form.

Theorem 17.4.5 (Kolmogorov extension theorem) For each finite set

J = (t_1, ⋯, t_n) ⊆ I,

suppose there exists a Borel probability measure ν_J = ν_{t_1⋯t_n} defined on the Borel
sets of ∏_{t∈J} M_t, for M_t = R^{n_t} with n_t an integer, such that the following
consistency condition holds. If

(t_1, ⋯, t_n) ⊆ (s_1, ⋯, s_p),

then

ν_{t_1⋯t_n}(F_{t_1} × ⋯ × F_{t_n}) = ν_{s_1⋯s_p}(G_{s_1} × ⋯ × G_{s_p})     (17.4.6)

where if s_i = t_j, then G_{s_i} = F_{t_j}, and if s_i is not equal to any of the indices t_k,
then G_{s_i} = M_{s_i}. Then for E defined as in Notation 17.4.1, adjusted so that ±∞
never appears as any endpoint of any interval, there exists a probability measure P
and a σ algebra F = σ(E) such that

(∏_{t∈I} M_t, P, F)

is a probability space. Also there exist measurable functions X_s : ∏_{t∈I} M_t → M_s,
defined as

X_s(x) ≡ x_s

for each s ∈ I, such that for each (t_1, ⋯, t_n) ⊆ I,

ν_{t_1⋯t_n}(F_{t_1} × ⋯ × F_{t_n}) = P([X_{t_1} ∈ F_{t_1}] ∩ ⋯ ∩ [X_{t_n} ∈ F_{t_n}])
   = P((X_{t_1}, ⋯, X_{t_n}) ∈ ∏_{j=1}^n F_{t_j}) = P(∏_{t∈I} F_t)     (17.4.7)

where F_t = M_t for every t ∉ {t_1, ⋯, t_n} and F_{t_i} is a Borel set. Also, if f is a
nonnegative function of finitely many variables x_{t_1}, ⋯, x_{t_n}, measurable with respect
to B(∏_{j=1}^n M_{t_j}), then f is also measurable with respect to F and

∫_{M_{t_1}×⋯×M_{t_n}} f(x_{t_1}, ⋯, x_{t_n}) dν_{t_1⋯t_n} = ∫_{∏_{t∈I} M_t} f(x_{t_1}, ⋯, x_{t_n}) dP.     (17.4.8)

Proof: Using Lemma 17.4.4, extend each measure ν_J to ∏_{t∈J} M̄_t, defined by
adding in the points ±∞ at the ends, by letting ν̄_J(E) ≡ ν_J(E ∩ ∏_{t∈J} M_t) for all
E ∈ B(∏_{t∈J} M̄_t). Then apply Theorem 17.4.3 to these extended measures and use
the definition of the extensions of each ν_J to replace each M̄_t with M_t everywhere it
occurs. ∎
As a special case, you can obtain a version of product measure for possibly infinitely many factors. Suppose in the context of the above theorem that ν_t is a probability measure defined on the Borel sets of M_t ≡ R^{n_t} for n_t a positive integer, and let the measures ν_{t_1 ⋯ t_n} be defined on the Borel sets of ∏_{i=1}^n M_{t_i} by

    ν_{t_1 ⋯ t_n} (E) ≡ (ν_{t_1} × ⋯ × ν_{t_n}) (E),

the product measure. Then these measures satisfy the necessary consistency condition and so the Kolmogorov extension theorem given above can be applied to obtain a measure P defined on (∏_{t∈I} M_t, F) and measurable functions X_s : ∏_{t∈I} M_t → M_s such that for F_{t_i} a Borel set in M_{t_i},

    P ((X_{t_1}, ⋯, X_{t_n}) ∈ ∏_{i=1}^n F_{t_i}) = ν_{t_1 ⋯ t_n} (F_{t_1} × ⋯ × F_{t_n})
    = ν_{t_1} (F_{t_1}) ⋯ ν_{t_n} (F_{t_n}) .    (17.4.9)

In particular, P (X_t ∈ F_t) = ν_t (F_t). Then P in the resulting probability space

    (∏_{t∈I} M_t, F, P)

will be denoted as ∏_{t∈I} ν_t. This proves the following theorem which describes an infinite product measure.

Theorem 17.4.6 Let M_t for t ∈ I be given as in Theorem 17.4.5 and let ν_t be a Borel probability measure defined on the Borel sets of M_t. Then there exists a measure P and a σ algebra F = σ (E) where E is given in Definition 17.4.1 such that (∏_t M_t, F, P) is a probability space satisfying 17.4.9 whenever each F_{t_i} is a Borel set of M_{t_i}. This probability measure is sometimes denoted as ∏_t ν_t.
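For example, take I = N and let each ν_t be the standard normal distribution on M_t = R (so n_t = 1). Then P ≡ ∏_{t∈N} ν_t is a Borel probability measure on the space of all real sequences, and for any finitely many indices t_1 < ⋯ < t_n and Borel sets F_{t_1}, ⋯, F_{t_n},

    P (X_{t_1} ∈ F_{t_1}, ⋯, X_{t_n} ∈ F_{t_n}) = ∏_{i=1}^n (1/√(2π)) ∫_{F_{t_i}} e^{−x²/2} dx,

so the coordinate maps X_t form an infinite sequence of independent standard normal random variables. This is exactly the construction requested in Exercise 1 below.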

17.5 Independence
The concept of independence is probably the main idea which separates probability
from analysis and causes some of us to struggle to understand what is going on. In
what follows, recall that a Banach space is a complete normed vector space. These
are discussed more elsewhere in the book.

Definition 17.5.1 Let (Ω, F, P) be a probability space, P (Ω) = 1. The sets in F are called events. A set of events {A_i}_{i∈I} is called independent if whenever {A_{i_k}}_{k=1}^m is a finite subset,

    P (∩_{k=1}^m A_{i_k}) = ∏_{k=1}^m P (A_{i_k}) .

Each of these events defines a rather simple σ algebra, {A_i, A_i^C, ∅, Ω}, denoted by F_i. Now the following lemma is interesting because it motivates a more general notion of independent σ algebras.

Lemma 17.5.2 Suppose B_i ∈ F_i for i ∈ I. Then for any m ∈ N,

    P (∩_{k=1}^m B_{i_k}) = ∏_{k=1}^m P (B_{i_k}) .

Proof: The proof is by induction on the number l of the B_{i_k} which are not equal to A_{i_k}. First suppose l = 0. Then the above assertion is true by assumption. Suppose it is so for some l and there are l + 1 sets not equal to A_{i_k}. If any equals ∅, there is nothing to show. Both sides equal 0. If any equals Ω, there is also nothing to show. You can ignore that set in both sides and then you have by induction the two sides are equal because you have no more than l sets different than A_{i_k}. The only remaining case is where some B_{i_k} = A_{i_k}^C. Say B_{i_{m+1}} = A_{i_{m+1}}^C for simplicity.

    P (∩_{k=1}^{m+1} B_{i_k}) = P (A_{i_{m+1}}^C ∩ ∩_{k=1}^m B_{i_k})
    = P (∩_{k=1}^m B_{i_k}) − P (A_{i_{m+1}} ∩ ∩_{k=1}^m B_{i_k})

Then by induction,

    = ∏_{k=1}^m P (B_{i_k}) − P (A_{i_{m+1}}) ∏_{k=1}^m P (B_{i_k}) = ∏_{k=1}^m P (B_{i_k}) (1 − P (A_{i_{m+1}}))
    = P (A_{i_{m+1}}^C) ∏_{k=1}^m P (B_{i_k}) = ∏_{k=1}^{m+1} P (B_{i_k})

thus proving it for l + 1. ∎


This motivates a more general notion of independence in terms of σ algebras.

Definition 17.5.3 If {F_i}_{i∈I} is any set of σ algebras contained in F, they are said to be independent if whenever A_{i_k} ∈ F_{i_k} for k = 1, 2, ⋯, m, then

    P (∩_{k=1}^m A_{i_k}) = ∏_{k=1}^m P (A_{i_k}) .

A set of random variables {X_i}_{i∈I} is independent if the σ algebras {σ (X_i)}_{i∈I} are independent σ algebras. Here σ (X) denotes the smallest σ algebra such that X is measurable. Thus σ (X) = {X^{-1} (U) : U is a Borel set}. More generally, σ (X_i : i ∈ I) is the smallest σ algebra such that each X_i is measurable.

Observation 17.5.4 Recall that σ (X) was the smallest σ algebra such that X is measurable with respect to σ (X). That is, σ (X) must contain X^{-1} (U) for every open U. If S denotes the Borel sets B such that X^{-1} (B) ∈ σ (X), it follows easily that S is a σ algebra and also contains the open sets. Hence S must contain the Borel sets. Hence

    {X^{-1} (U) : U is a Borel set}

must be contained in σ (X). However, such sets described above also constitute a σ algebra and X is measurable with respect to this σ algebra. Hence it contains σ (X).

Note that by Lemma 17.5.2 you can consider independent events in terms of independent σ algebras. That is, a set of independent events can always be considered as events taken from a set of independent σ algebras. This is a more general notion because here the σ algebras might have infinitely many sets in them.
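For example, model two fair coin tosses by Ω = {HH, HT, TH, TT} with P the uniform measure, and let A_1 be the event that the first toss is heads and A_2 the event that the second toss is heads. Then P (A_1 ∩ A_2) = 1/4 = P (A_1) P (A_2), and Lemma 17.5.2 shows the full σ algebras F_1 = {A_1, A_1^C, ∅, Ω} and F_2 = {A_2, A_2^C, ∅, Ω} are independent; for instance P (A_1^C ∩ A_2) = 1/4 = P (A_1^C) P (A_2).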

Lemma 17.5.5 Suppose the set of random variables {X_i}_{i∈I} is independent. Also suppose I_1 ⊆ I and j ∉ I_1. Then the σ algebras σ (X_i : i ∈ I_1), σ (X_j) are independent σ algebras.

Proof: Let B ∈ σ (X_j). I want to show that for any A ∈ σ (X_i : i ∈ I_1), it follows that P (A ∩ B) = P (A) P (B). Let K consist of finite intersections of sets of the form X_k^{-1} (B_k) where B_k is a Borel set and k ∈ I_1. Thus K is a π system and σ (K) = σ (X_i : i ∈ I_1). Now if you have one of these sets of the form A = ∩_{k=1}^m X_k^{-1} (B_k) where without loss of generality, it can be assumed the k are distinct since X_k^{-1} (B_k) ∩ X_k^{-1} (B_k') = X_k^{-1} (B_k ∩ B_k'), then

    P (A ∩ B) = P (∩_{k=1}^m X_k^{-1} (B_k) ∩ B) = P (B) ∏_{k=1}^m P (X_k^{-1} (B_k))
    = P (B) P (∩_{k=1}^m X_k^{-1} (B_k)) .

Thus K is contained in

    G ≡ {A ∈ σ (X_i : i ∈ I_1) : P (A ∩ B) = P (A) P (B)} .

Now G is closed with respect to complements and countable disjoint unions. Here is why: If each A_i ∈ G and the A_i are disjoint,

    P ((∪_{i=1}^∞ A_i) ∩ B) = P (∪_{i=1}^∞ (A_i ∩ B))
    = Σ_i P (A_i ∩ B) = Σ_i P (A_i) P (B)
    = P (B) Σ_i P (A_i) = P (B) P (∪_{i=1}^∞ A_i)

If A ∈ G,

    P (A^C ∩ B) + P (A ∩ B) = P (B)

and so

    P (A^C ∩ B) = P (B) − P (A ∩ B)
    = P (B) − P (A) P (B)
    = P (B) (1 − P (A)) = P (B) P (A^C) .

Therefore, from the lemma on π systems, Lemma 9.2.2 on Page 219, it follows G ⊇ σ (K) = σ (X_i : i ∈ I_1). ∎

Notation 17.5.6 In probability, it is standard to write E (X) in place of ∫_Ω X dP. This is also referred to as the expectation.
Lemma 17.5.7 If {X_k}_{k=1}^r are independent random variables having values in Z, a separable metric space, and if g_k is a Borel measurable function, then {g_k (X_k)}_{k=1}^r is also independent. Furthermore, if the random variables have values in R, and they are all bounded, then

    E (∏_{i=1}^r X_i) = ∏_{i=1}^r E (X_i) .

More generally, the above formula holds if it is only known that each X_i ∈ L^1 (Ω; R) and ∏_{i=1}^r X_i ∈ L^1 (Ω; R).

Proof: First consider the claim about {g_k (X_k)}_{k=1}^r. Letting O be an open set in Z,

    (g_k ∘ X_k)^{-1} (O) = X_k^{-1} (g_k^{-1} (O)) = X_k^{-1} (Borel set) ∈ σ (X_k) .

It follows (g_k ∘ X_k)^{-1} (E) is in σ (X_k) whenever E is Borel because the sets whose inverse images are measurable include the Borel sets. Thus σ (g_k ∘ X_k) ⊆ σ (X_k) and this proves the first part of the lemma.
Let X_1 = Σ_{i=1}^m c_i X_{E_i}, X_2 = Σ_{j=1}^m d_j X_{F_j} where P (E_i ∩ F_j) = P (E_i) P (F_j). Then

    ∫ X_1 X_2 dP = Σ_{i,j} c_i d_j P (E_i) P (F_j) = (∫ X_1 dP) (∫ X_2 dP)

In general for X_1, X_2 independent, there exist sequences of bounded simple functions {s_n}, {t_n} measurable with respect to σ (X_1) and σ (X_2) respectively such that s_n → X_1 pointwise and t_n → X_2 pointwise. Then from the above and the dominated convergence theorem,

    ∫ X_1 X_2 dP = lim_{n→∞} ∫ s_n t_n dP = lim_{n→∞} (∫ s_n dP) (∫ t_n dP)
    = (∫ X_1 dP) (∫ X_2 dP)

Next suppose there are m of these independent bounded random variables. Then ∏_{i=2}^m X_i is σ (X_2, ⋯, X_m) measurable and by Lemma 17.5.5 the two random variables X_1 and ∏_{i=2}^m X_i are independent. Hence from the above and induction,

    ∫ ∏_{i=1}^m X_i dP = ∫ X_1 ∏_{i=2}^m X_i dP = ∫ X_1 dP ∫ ∏_{i=2}^m X_i dP = ∏_{i=1}^m ∫ X_i dP

Now consider the last claim. Replace each X_i with X_i^n where this is just a truncation of the form

    X_i^n ≡  X_i if |X_i| ≤ n,   n if X_i > n,   −n if X_i < −n.

Then by the first part,

    E (∏_{i=1}^r X_i^n) = ∏_{i=1}^r E (X_i^n)

Now |∏_{i=1}^r X_i^n| ≤ |∏_{i=1}^r X_i| ∈ L^1 and so by the dominated convergence theorem, you can pass to the limit in both sides to get the desired result. ∎
Proposition 17.5.8 If X : Ω → Z is measurable, then σ (X) equals the smallest σ algebra such that X is measurable with respect to it. Also if the X_i are random variables having values in separable Banach spaces Z_i, then σ (X) = σ (X_1, ⋯, X_n) where X is the vector mapping Ω to ∏_{i=1}^n Z_i and σ (X_1, ⋯, X_n) is the smallest σ algebra such that each X_i is measurable with respect to it.

Proof: Let G denote the smallest σ algebra such that X is measurable with respect to this σ algebra. By definition X^{-1} (open) ∈ G. Furthermore, the set of all E such that X^{-1} (E) ∈ G is a σ algebra. Hence it includes all the Borel sets. Hence X^{-1} (Borel) ⊆ G and so G ⊇ σ (X). However, σ (X) defined above is a σ algebra such that X is measurable with respect to σ (X). Therefore, G = σ (X).
Letting B_i be a Borel set in Z_i, ∏_{i=1}^n B_i is a Borel set. To see this, consider the projection maps π_j : ∏_{i=1}^n Z_i → Z_j defined by π_j (x) ≡ x_j. Then each π_j is continuous relative to the product topology, hence Borel measurable, and if B_j is a Borel set in Z_j, then π_j^{-1} (B_j) is also a Borel set. But it is clear that

    ∏_{i=1}^n B_i = ∩_{i=1}^n π_i^{-1} (B_i), a Borel set.

Thus

    X^{-1} (∏_{i=1}^n B_i) = ∩_{i=1}^n X_i^{-1} (B_i) ∈ σ (X_1, ⋯, X_n)

If G denotes the Borel sets F ⊆ ∏_{i=1}^n Z_i such that X^{-1} (F) ∈ σ (X_1, ⋯, X_n), then G is clearly a σ algebra which contains the open sets. Hence G = B, the Borel sets of ∏_{i=1}^n Z_i. This shows that σ (X) ⊆ σ (X_1, ⋯, X_n). Next we observe that σ (X) is a σ algebra with the property that each X_i is measurable with respect to σ (X). This follows from X_i^{-1} (B_i) = X^{-1} (∏_{j=1}^n A_j) ∈ σ (X), where each A_j = Z_j except for A_i = B_i. Since σ (X_1, ⋯, X_n) is defined as the smallest such σ algebra, it follows that σ (X) ⊇ σ (X_1, ⋯, X_n). ∎
Maybe this would be a good place to put a really interesting result known as the Doob Dynkin lemma. The setting is this: X = (X_1, ⋯, X_m) maps (Ω, σ (X)) to (∏_{i=1}^m E_i, B (∏_{i=1}^m E_i)), and by Proposition 17.5.8, σ (X) = σ (X_1, ⋯, X_m). You start with a σ (X) measurable map of Ω into a Banach space F and the lemma says you can write it as the composition g ∘ X for a Borel measurable g defined on ∏_{i=1}^m E_i.
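For instance, if Y ≡ X_1² + X_2 with X_1, X_2 real valued, then Y is σ (X_1, X_2) measurable and the lemma produces g (x_1, x_2) = x_1² + x_2, so Y = g (X_1, X_2). The content of the lemma is the converse direction: every σ (X_1, ⋯, X_m) measurable map into F arises in this way from some Borel measurable g.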

Lemma 17.5.9 Let (Ω, F) be a measure space and let X_i : Ω → E_i where E_i is a separable Banach space. Suppose also that X : Ω → F where F is a Banach space. Then X is σ (X_1, ⋯, X_m) measurable if and only if there exists a Borel measurable function g : ∏_{i=1}^m E_i → F such that X = g (X_1, ⋯, X_m).

Proof: First suppose X (ω) = f X_W (ω) where f ∈ F and W ∈ σ (X_1, ⋯, X_m). Then by Proposition 17.5.8, W is of the form (X_1, ⋯, X_m)^{-1} (B) where B is Borel in ∏_{i=1}^m E_i. Therefore,

    X (ω) = f X_{(X_1,⋯,X_m)^{-1}(B)} (ω) = f X_B ((X_1, ⋯, X_m) (ω)) .

Now suppose X is measurable with respect to σ (X_1, ⋯, X_m). Then there exist simple functions

    X_n (ω) = Σ_{k=1}^{m_n} f_k X_{B_k} ((X_1, ⋯, X_m) (ω)) ≡ g_n ((X_1, ⋯, X_m) (ω))

where the B_k are Borel sets in ∏_{i=1}^m E_i, such that X_n (ω) → X (ω), each g_n being Borel. Thus g_n converges on (X_1, ⋯, X_m) (Ω). Furthermore, the set on which g_n does converge is a Borel set equal to

    ∩_{n=1}^∞ ∪_{m=1}^∞ ∩_{p,q ≥ m} [ ||g_p − g_q|| < 1/n ]

which contains (X_1, ⋯, X_m) (Ω). Therefore, modifying g_n by multiplying it by the indicator function of this Borel set, we can conclude that g_n converges to a Borel function g and, passing to a limit in the above,

    X (ω) = g ((X_1, ⋯, X_m) (ω)) .

Conversely, suppose X (ω) = g ((X_1, ⋯, X_m) (ω)). Why is X σ (X_1, ⋯, X_m) measurable?

    X^{-1} (open) = (X_1, ⋯, X_m)^{-1} (g^{-1} (open)) = (X_1, ⋯, X_m)^{-1} (Borel) ∈ σ (X_1, ⋯, X_m) . ∎

17.6 Exercises

1. A random variable X : (Ω, F, P) → R is said to be normally distributed with mean μ and variance σ² > 0 if for all E a Borel set in R,

       P ([X ∈ E]) = (1/(σ√(2π))) ∫_E e^{−(x−μ)²/(2σ²)} dx.

   Use the Kolmogorov extension theorem to show that there exists a probability space and random variables {ξ_i}_{i=1}^∞ defined on this space such that each ξ_i is normally distributed with mean 0 and variance 1 and such that also the random variables ξ_i are independent. Hint: For i_1 < ⋯ < i_m and E a Borel set of R^m, define

       ν_{i_1 ⋯ i_m} (E) ≡ (1/(2π)^{m/2}) ∫_E e^{−(x_1² + ⋯ + x_m²)/2} dm_m.

   Show that this satisfies the necessary consistency condition for the Kolmogorov extension theorem.

2. Show using the Hausdor maximal theorem that there exists a maximal
sequence of orthonormal functions {gi } in L2 (0, ) which is countable. Then
show that if f L2 (0, ) ,


2 2
||f ||L2 (0,) = |(f, gi )| .
i=1

3. With the above problems, show there exists { i } a countable sequence


of independent normally distributed random variables having mean 0 and
variance 1. Also let {gi } be a maximal orthonormal sequence of functions of
L2 (0, ). Show that for each t, the series

( )
W (t) = i X(0,t) , gi
k=1

converges in L2 () . Thus the above series yields a random variable for each
t. W (t) is called the Wiener process or one dimensional Brownian motion.
Much more can be said about it.
Banach Spaces

18.1 Theorems Based On Baire Category


18.1.1 Baire Category Theorem
Some examples of Banach spaces that have been discussed up to now are Rn , Cn ,
and Lp (). Theorems about general Banach spaces are proved in this chapter.
The main theorems to be presented here are the uniform boundedness theorem, the
open mapping theorem, the closed graph theorem, and the Hahn Banach Theorem.
The rst three of these theorems come from the Baire category theorem which is
about to be presented. They are topological in nature. The Hahn Banach theorem
has nothing to do with topology. Banach spaces are all normed linear spaces and as
such, they are all metric spaces because a normed linear space may be considered
as a metric space with d (x, y) ||x y||. You can check that this satises all the
axioms of a metric. As usual, if every Cauchy sequence converges, the metric space
is called complete.

Denition 18.1.1 A complete normed linear space is called a Banach space.

The following remarkable result is called the Baire category theorem. To get an
idea of its meaning, imagine you draw a line in the plane. The complement of this
line is an open set and is dense because every point, even those on the line, are limit
points of this open set. Now draw another line. The complement of the two lines
is still open and dense. Keep drawing lines and looking at the complements of the
union of these lines. You always have an open set which is dense. Now what if there
were countably many lines? The Baire category theorem implies the complement
of the union of these lines is dense. In particular it is nonempty. Thus you cannot
write the plane as a countable union of lines. This is a rather rough description of
this very important theorem. The precise statement and proof follow.

Theorem 18.1.2 Let (X, d) be a complete metric space and let {U_n}_{n=1}^∞ be a sequence of open subsets of X satisfying \overline{U_n} = X (U_n is dense). Then D ≡ ∩_{n=1}^∞ U_n is a dense subset of X.


Proof: Let p ∈ X and let r_0 > 0. I need to show D ∩ B(p, r_0) ≠ ∅. Since U_1 is dense, there exists p_1 ∈ U_1 ∩ B(p, r_0), an open set. Let p_1 ∈ B(p_1, r_1) ⊆ \overline{B(p_1, r_1)} ⊆ U_1 ∩ B(p, r_0) with r_1 < 2^{-1}. This is possible because U_1 ∩ B (p, r_0) is an open set and so there exists r_1 such that B (p_1, 2r_1) ⊆ U_1 ∩ B (p, r_0). But

    B (p_1, r_1) ⊆ \overline{B (p_1, r_1)} ⊆ B (p_1, 2r_1)

because \overline{B (p_1, r_1)} ⊆ {x ∈ X : d (x, p_1) ≤ r_1}. (Why?)
(The picture here shows the ball B(p, r_0) with the smaller closed ball centered at p_1 inside it.)
There exists p_2 ∈ U_2 ∩ B(p_1, r_1) because U_2 is dense. Let

    p_2 ∈ B(p_2, r_2) ⊆ \overline{B(p_2, r_2)} ⊆ U_2 ∩ B(p_1, r_1) ⊆ U_1 ∩ U_2 ∩ B(p, r_0)

and let r_2 < 2^{-2}. Continue in this way. Thus

    r_n < 2^{-n},
    B(p_n, r_n) ⊆ U_1 ∩ U_2 ∩ ... ∩ U_n ∩ B(p, r_0),
    B(p_n, r_n) ⊆ B(p_{n-1}, r_{n-1}).

The sequence {p_n} is a Cauchy sequence because all terms of {p_k} for k ≥ n are contained in B (p_n, r_n), a set whose diameter is no larger than 2^{-n+1}. Since X is complete, there exists p_∞ such that

    lim_{n→∞} p_n = p_∞.

Since all but finitely many terms of {p_n} are in \overline{B(p_m, r_m)}, it follows that p_∞ ∈ \overline{B(p_m, r_m)} for each m. Therefore,

    p_∞ ∈ ∩_{m=1}^∞ \overline{B(p_m, r_m)} ⊆ ∩_{i=1}^∞ U_i ∩ B(p, r_0).

This proves the theorem.


The following corollary is also called the Baire category theorem.

Corollary 18.1.3 Let X be a complete metric space and suppose X = ∪_{i=1}^∞ F_i where each F_i is a closed set. Then for some i, interior F_i ≠ ∅.

Proof: If all F_i have empty interior, then each F_i^C would be a dense open set. Therefore, from Theorem 18.1.2, it would follow that

    ∅ = (∪_{i=1}^∞ F_i)^C = ∩_{i=1}^∞ F_i^C ≠ ∅,

a contradiction.
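As one application of this corollary, an infinite dimensional Banach space X cannot be written as the union of countably many finite dimensional subspaces; in particular, X cannot have a countable Hamel basis. Indeed, a finite dimensional subspace is closed and, being proper, contains no ball, so it has empty interior, and if X were such a countable union the corollary would be violated.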

The set D of Theorem 18.1.2 is called a G set because it is the countable


intersection of open sets. Thus D is a dense G set.
Recall that a norm satises:
a.) ||x|| 0, ||x|| = 0 if and only if x = 0.
b.) ||x + y|| ||x|| + ||y||.
c.) ||cx|| = |c| ||x|| if c is a scalar and x X.
From the denition of continuity, it follows easily that a function is continuous
if
lim xn = x
n

implies
lim f (xn ) = f (x).
n

Theorem 18.1.4 Let X and Y be two normed linear spaces and let L : X → Y be linear (L(ax + by) = aL(x) + bL(y) for a, b scalars and x, y ∈ X). The following are equivalent
a.) L is continuous at 0
b.) L is continuous
c.) There exists K > 0 such that ||Lx||_Y ≤ K ||x||_X for all x ∈ X (L is bounded).

Proof: a.)⇒b.) Let x_n → x. It is necessary to show that Lx_n → Lx. But (x_n − x) → 0 and so from continuity at 0, it follows

    L (x_n − x) = Lx_n − Lx → 0

so Lx_n → Lx. This shows a.) implies b.).
b.)⇒c.) Since L is continuous, L is continuous at 0. Hence ||Lx||_Y ≤ 1 whenever ||x||_X ≤ δ for some δ > 0. Therefore, suppressing the subscript on the || ||,

    ||L (δx / ||x||)|| ≤ 1.

Hence

    ||Lx|| ≤ (1/δ) ||x||.

c.)⇒a.) follows from the inequality given in c.).

Denition 18.1.5 Let L : X Y be linear and continuous where X and Y are


normed linear spaces. Denote the set of all such continuous linear maps by L(X, Y )
and dene
||L|| = sup{||Lx|| : ||x|| 1}. (18.1.1)
This is called the operator norm.

Note that from Theorem 18.1.4 ||L|| is well dened because of part c.) of that
Theorem.
The next lemma follows immediately from the denition of the norm and the
assumption that L is linear.

Lemma 18.1.6 With ||L|| defined in 18.1.1, L(X, Y) is a normed linear space. Also ||Lx|| ≤ ||L|| ||x||.

Proof: Let x ≠ 0. Then x / ||x|| has norm equal to 1 and so

    ||L (x / ||x||)|| ≤ ||L||.

Therefore, multiplying both sides by ||x||, ||Lx|| ≤ ||L|| ||x||. This is obviously a linear space. It remains to verify the operator norm really is a norm. First of all, if ||L|| = 0, then Lx = 0 for all ||x|| ≤ 1. It follows that for any x ≠ 0, 0 = L (x / ||x||) and so Lx = 0. Therefore, L = 0. Also, if c is a scalar,

    ||cL|| = sup_{||x|| ≤ 1} ||cL (x)|| = |c| sup_{||x|| ≤ 1} ||Lx|| = |c| ||L||.

It remains to verify the triangle inequality. Let L, M ∈ L (X, Y).

    ||L + M|| ≡ sup_{||x|| ≤ 1} ||(L + M) (x)|| ≤ sup_{||x|| ≤ 1} (||Lx|| + ||Mx||)
    ≤ sup_{||x|| ≤ 1} ||Lx|| + sup_{||x|| ≤ 1} ||Mx|| = ||L|| + ||M||.

This shows the operator norm is really a norm as hoped. This proves the lemma.
For example, consider the space of linear transformations dened on Rn having
values in Rm . The fact the transformation is linear automatically imparts conti-
nuity to it. You should give a proof of this fact. Recall that every such linear
transformation can be realized in terms of matrix multiplication.
Thus, in nite dimensions the algebraic condition that an operator is linear is
sucient to imply the topological condition that the operator is continuous. The
situation is not so simple in innite dimensional spaces such as C (X; Rn ). This
explains the imposition of the topological condition of continuity as a criterion for
membership in L (X, Y ) in addition to the algebraic condition of linearity.
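To see how the infinite dimensional situation differs, here is a standard example. Let D ⊆ C ([0, 1]) be the subspace of polynomials with the uniform norm and let L = d/dx on D. Then L is linear, but taking p_n (x) = x^n gives ||p_n||_∞ = 1 while ||Lp_n||_∞ = ||n x^{n−1}||_∞ = n, so no constant K with ||Lp|| ≤ K ||p|| can exist and L is not continuous.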

Theorem 18.1.7 If Y is a Banach space, then L(X, Y ) is also a Banach space.

Proof: Let {Ln } be a Cauchy sequence in L(X, Y ) and let x X.

||Ln x Lm x|| ||x|| ||Ln Lm ||.

Thus {Ln x} is a Cauchy sequence. Let

Lx = lim Ln x.
n

Then, clearly, L is linear because if x1 , x2 are in X, and a, b are scalars, then

L (ax1 + bx2 ) = lim Ln (ax1 + bx2 )


n
= lim (aLn x1 + bLn x2 )
n
= aLx1 + bLx2 .

Also L is continuous. To see this, note that {||Ln ||} is a Cauchy sequence of real
numbers because |||Ln || ||Lm ||| ||Ln Lm ||. Hence there exists K > sup{||Ln || :
n N}. Thus, if x X,

||Lx|| = lim ||Ln x|| K||x||.


n

This proves the theorem.

18.1.2 Uniform Boundedness Theorem


The next big result is sometimes called the Uniform Boundedness theorem, or the
Banach-Steinhaus theorem. This is a very surprising theorem which implies that for
a collection of bounded linear operators, if they are bounded pointwise, then they are
also bounded uniformly. As an example of a situation in which pointwise bounded
does not imply uniformly bounded, consider the functions f (x) X(,1) (x) x1
for (0, 1). Clearly each function is bounded and the collection of functions is
bounded at each point of (0, 1), but there is no bound for all these functions taken
together. One problem is that (0, 1) is not a Banach space. Therefore, the functions
cannot be linear.

Theorem 18.1.8 Let X be a Banach space and let Y be a normed linear space. Let {L_λ}_{λ∈Λ} be a collection of elements of L(X, Y). Then one of the following happens.
a.) sup{||L_λ|| : λ ∈ Λ} < ∞
b.) There exists a dense G_δ set D such that for all x ∈ D,

    sup{||L_λ x|| : λ ∈ Λ} = ∞.

Proof: For each n ∈ N, define

    U_n = {x ∈ X : sup{||L_λ x|| : λ ∈ Λ} > n}.

Then U_n is an open set because if x ∈ U_n, then there exists λ ∈ Λ such that

    ||L_λ x|| > n.

But then, since L_λ is continuous, this situation persists for all y sufficiently close to x, say for all y ∈ B (x, δ). Then B (x, δ) ⊆ U_n which shows U_n is open.
Case b.) is obtained from Theorem 18.1.2 if each U_n is dense.
The other case is that for some n, U_n is not dense. If this occurs, there exists x_0 and r > 0 such that for all x ∈ B(x_0, r), ||L_λ x|| ≤ n for all λ ∈ Λ. Now if y ∈ B(0, r), x_0 + y ∈ B(x_0, r). Consequently, for all such y, ||L_λ (x_0 + y)|| ≤ n. This implies that for all λ ∈ Λ and ||y|| < r,

    ||L_λ y|| ≤ n + ||L_λ (x_0)|| ≤ 2n.

Therefore, if ||y|| ≤ 1, then ||(r/2) y|| < r and so for all λ ∈ Λ,

    ||L_λ ((r/2) y)|| ≤ 2n.

Now multiplying by 2/r it follows that whenever ||y|| ≤ 1, ||L_λ (y)|| ≤ 4n/r. Hence case a.) holds.
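A typical use of this theorem: if {L_n} ⊆ L(X, Y) and lim_{n→∞} L_n x exists for each x ∈ X, then for each x the sequence {||L_n x||} is bounded, so case b.) cannot occur and sup_n ||L_n|| < ∞. Defining Lx ≡ lim_{n→∞} L_n x then gives a linear map with ||Lx|| ≤ (sup_n ||L_n||) ||x||, so the pointwise limit of a sequence of bounded operators on a Banach space is again a bounded operator.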

18.1.3 Open Mapping Theorem


Another remarkable theorem which depends on the Baire category theorem is the
open mapping theorem. Unlike Theorem 18.1.8 it requires both X and Y to be
Banach spaces.

Theorem 18.1.9 Let X and Y be Banach spaces, let L L(X, Y ), and suppose L
is onto. Then L maps open sets onto open sets.

To aid in the proof, here is a lemma.

Lemma 18.1.10 Let a and b be positive constants and suppose

    B(0, a) ⊆ \overline{L(B(0, b))}.

Then

    \overline{L(B(0, b))} ⊆ L(B(0, 2b)).

Proof of Lemma 18.1.10: Let y ∈ \overline{L(B(0, b))}. There exists x_1 ∈ B(0, b) such that ||y − Lx_1|| < a/2. Now this implies

    2y − 2Lx_1 ∈ B(0, a) ⊆ \overline{L(B(0, b))}.

Thus 2y − 2Lx_1 ∈ \overline{L(B(0, b))} just like y was. Therefore, there exists x_2 ∈ B(0, b) such that ||2y − 2Lx_1 − Lx_2|| < a/2. Hence ||4y − 4Lx_1 − 2Lx_2|| < a, and there exists x_3 ∈ B (0, b) such that ||4y − 4Lx_1 − 2Lx_2 − Lx_3|| < a/2. Continuing in this way, there exist x_1, x_2, x_3, x_4, ... in B(0, b) such that

    ||2^n y − Σ_{i=1}^n 2^{n−(i−1)} L(x_i)|| < a

which implies

    ||y − Σ_{i=1}^n 2^{−(i−1)} L(x_i)|| = ||y − L (Σ_{i=1}^n 2^{−(i−1)} x_i)|| < 2^{−n} a    (18.1.2)

Now consider the partial sums of the series Σ_{i=1}^∞ 2^{−(i−1)} x_i.

    || Σ_{i=m}^n 2^{−(i−1)} x_i || ≤ b Σ_{i=m}^∞ 2^{−(i−1)} = b 2^{−m+2}.

Therefore, these partial sums form a Cauchy sequence and so since X is complete, there exists x = Σ_{i=1}^∞ 2^{−(i−1)} x_i. Letting n → ∞ in 18.1.2 yields ||y − Lx|| = 0. Now

    ||x|| = lim_{n→∞} || Σ_{i=1}^n 2^{−(i−1)} x_i ||
    ≤ lim_{n→∞} Σ_{i=1}^n 2^{−(i−1)} ||x_i|| < Σ_{i=1}^∞ 2^{−(i−1)} b = 2b.

This proves the lemma.


Proof of Theorem 18.1.9: Y = ∪_{n=1}^∞ \overline{L(B(0, n))}. By Corollary 18.1.3, the set \overline{L(B(0, n_0))} has nonempty interior for some n_0. Thus B(y, r) ⊆ \overline{L(B(0, n_0))} for some y and some r > 0. Since L is linear, B(−y, r) ⊆ \overline{L(B(0, n_0))} also. Here is why. If z ∈ B(−y, r), then −z ∈ B(y, r) and so there exists x_n ∈ B (0, n_0) such that Lx_n → −z. Therefore, L (−x_n) → z and −x_n ∈ B (0, n_0) also. Therefore z ∈ \overline{L(B(0, n_0))}. Then it follows that

    B(0, r) ⊆ B(y, r) + B(−y, r)
    ≡ {y_1 + y_2 : y_1 ∈ B (y, r) and y_2 ∈ B (−y, r)}
    ⊆ \overline{L(B(0, 2n_0))}

The reason for the last inclusion is that from the above, if y_1 ∈ B (y, r) and y_2 ∈ B (−y, r), there exist x_n, z_n ∈ B (0, n_0) such that

    Lx_n → y_1,   Lz_n → y_2.

Therefore,

    ||x_n + z_n|| ≤ 2n_0

and so (y_1 + y_2) ∈ \overline{L(B(0, 2n_0))}.
By Lemma 18.1.10, \overline{L(B(0, 2n_0))} ⊆ L(B(0, 4n_0)) which shows

    B(0, r) ⊆ L(B(0, 4n_0)).

Letting a = r(4n_0)^{-1}, it follows, since L is linear, that B(0, a) ⊆ L(B(0, 1)). It follows since L is linear,

    L(B(0, r)) ⊇ B(0, ar).    (18.1.3)

Now let U be open in X and let x + B(0, r) = B(x, r) ⊆ U. Using 18.1.3,

    L(U) ⊇ L(x + B(0, r)) = Lx + L(B(0, r)) ⊇ Lx + B(0, ar) = B(Lx, ar).

Hence

    Lx ∈ B(Lx, ar) ⊆ L(U)

which shows that every point Lx ∈ LU is an interior point of LU and so LU is open. This proves the theorem.
This theorem is surprising because it implies that if || and |||| are two norms
with respect to which a vector space X is a Banach space such that || K ||||,
then there exists a constant k, such that |||| k || . This can be useful because
sometimes it is not clear how to compute k when all that is needed is its existence.
To see the open mapping theorem implies this, consider the identity map id x = x.
Then id : (X, ||||) (X, ||) is continuous and onto. Hence id is an open map which
implies id1 is continuous. Theorem 18.1.4 gives the existence of the constant k.

18.1.4 Closed Graph Theorem


Denition 18.1.11 Let f : D E. The set of all ordered pairs of the form
{(x, f (x)) : x D} is called the graph of f .

Denition 18.1.12 If X and Y are normed linear spaces, make X Y into


a normed linear space by using the norm ||(x, y)|| = max (||x||, ||y||) along with
component-wise addition and scalar multiplication. Thus a(x, y) + b(z, w) (ax +
bz, ay + bw).

There are other ways to give a norm for X Y . For example, you could dene
||(x, y)|| = ||x|| + ||y||

Lemma 18.1.13 The norm dened in Denition 18.1.12 on X Y along with


the denition of addition and scalar multiplication given there make X Y into a
normed linear space.

Proof: The only axiom for a norm which is not obvious is the triangle inequality.
Therefore, consider

||(x1 , y1 ) + (x2 , y2 )|| = ||(x1 + x2 , y1 + y2 )||


= max (||x1 + x2 || , ||y1 + y2 ||)
max (||x1 || + ||x2 || , ||y1 || + ||y2 ||)

max (||x1 || , ||y1 ||) + max (||x2 || , ||y2 ||)


= ||(x1 , y1 )|| + ||(x2 , y2 )|| .

It is obvious X Y is a vector space from the above denition. This proves the
lemma.

Lemma 18.1.14 If X and Y are Banach spaces, then X Y with the norm and
vector space operations dened in Denition 18.1.12 is also a Banach space.

Proof: The only thing left to check is that the space is complete. But this
follows from the simple observation that {(xn , yn )} is a Cauchy sequence in X Y
if and only if {xn } and {yn } are Cauchy sequences in X and Y respectively. Thus
if {(xn , yn )} is a Cauchy sequence in X Y , it follows there exist x and y such that
xn x and yn y. But then from the denition of the norm, (xn , yn ) (x, y).

Lemma 18.1.15 Every closed subspace of a Banach space is a Banach space.

Proof: If F X where X is a Banach space and {xn } is a Cauchy sequence


in F , then since X is complete, there exists a unique x X such that xn x.
However this means x F = F since F is closed.

Denition 18.1.16 Let X and Y be Banach spaces and let D X be a subspace.


A linear map L : D Y is said to be closed if its graph is a closed subspace of
X Y . Equivalently, L is closed if xn x and Lxn y implies x D and
y = Lx.

Note the distinction between closed and continuous. If the operator is closed
the assertion that y = Lx only follows if it is known that the sequence {Lxn }
converges. In the case of a continuous operator, the convergence of {Lxn } follows
from the assumption that xn x. It is not always the case that a mapping which
is closed is necessarily continuous. Consider the function f (x) = tan (x) if x is not an odd multiple of π/2 and f (x) ≡ 0 at every odd multiple of π/2. Then the graph is closed and the function is defined on R but it clearly fails to be continuous. Of course this function is not linear. You could also consider the map

    d/dx : {y ∈ C¹ ([0, 1]) : y (0) = 0} ≡ D → C ([0, 1]),

where the norm is the uniform norm on C ([0, 1]), ||y||_∞. If y ∈ D, then

    y (x) = ∫_0^x y′ (t) dt.

Therefore, if dy_n/dx → f ∈ C ([0, 1]) and if y_n → y in C ([0, 1]), it follows that

    y_n (x) = ∫_0^x (dy_n (t)/dx) dt
    y (x) = ∫_0^x f (t) dt

and so by the fundamental theorem of calculus f (x) = y′ (x) and so the mapping is closed. It is obviously not continuous because it takes y (x) and y (x) + (1/n) sin (nx) to two functions which are far from each other even though these two functions are very close in C ([0, 1]). Furthermore, it is not defined on the whole space, C ([0, 1]).
The next theorem, the closed graph theorem, gives conditions under which closed
implies continuous.

Theorem 18.1.17 Let X and Y be Banach spaces and suppose L : X Y is


closed and linear. Then L is continuous.

Proof: Let G be the graph of L. G = {(x, Lx) : x X}. By Lemma 18.1.15


it follows that G is a Banach space. Dene P : G X by P (x, Lx) = x. P maps
the Banach space G onto the Banach space X and is continuous and linear. By the
open mapping theorem, P maps open sets onto open sets. Since P is also one to
one, this says that P 1 is continuous. Thus ||P 1 x|| K||x||. Hence

||Lx|| max (||x||, ||Lx||) K||x||

By Theorem 18.1.4 on Page 443, this shows L is continuous and proves the theorem.
The following corollary is quite useful. It shows how to obtain a new norm on
the domain of a closed operator such that the domain with this new norm becomes
a Banach space.

Corollary 18.1.18 Let L : D X Y where X, Y are a Banach spaces, and L


is a closed operator. Then dene a new norm on D by

||x||D ||x||X + ||Lx||Y .

Then D with this new norm is a Banach space.

Proof: If {xn } is a Cauchy sequence in D with this new norm, it follows both
{xn } and {Lxn } are Cauchy sequences and therefore, they converge. Since L is
closed, xn x and Lxn Lx for some x D. Thus ||xn x||D 0.

18.2 Hahn Banach Theorem


The closed graph, open mapping, and uniform boundedness theorems are the three
major topological theorems in functional analysis. The other major theorem is the
Hahn-Banach theorem which has nothing to do with topology. Before presenting
this theorem, here are some preliminaries about partially ordered sets.

Denition 18.2.1 Let F be a nonempty set. F is called a partially ordered set if


there is a relation, denoted here by , such that

x x for all x F .

If x y and y z then x z.

C F is said to be a chain if every two elements of C are related. This means that
if x, y C, then either x y or y x. Sometimes a chain is called a totally ordered
set. C is said to be a maximal chain if whenever D is a chain containing C, D = C.

The most common example of a partially ordered set is the power set of a given
set with being the relation. It is also helpful to visualize partially ordered sets
as trees. Two points on the tree are related if they are on the same branch of
the tree and one is higher than the other. Thus two points on dierent branches
would not be related although they might both be larger than some point on the
trunk. You might think of many other things which are best considered as partially
ordered sets. Think of food for example. You might nd it dicult to determine
which of two favorite pies you like better although you may be able to say very
easily that you would prefer either pie to a dish of lard topped with whipped cream
and mustard. The following theorem is equivalent to the axiom of choice. For a
discussion of this, see the appendix on the subject.
Theorem 18.2.2 (Hausdor Maximal Principle) Let F be a nonempty partially
ordered set. Then there exists a maximal chain.
Definition 18.2.3 Let X be a real vector space. A function ρ : X → R is called a gauge function if

    ρ (x + y) ≤ ρ (x) + ρ (y),
    ρ (ax) = aρ (x) if a ≥ 0.    (18.2.4)
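For example, any seminorm is a gauge function, and so is x ↦ K ||x|| for K ≥ 0; this last choice is the one used in Corollary 18.2.6 below. A gauge function need not be symmetric: on X = R, ρ (x) = max (x, 0) satisfies 18.2.4 although ρ (−1) ≠ ρ (1).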
Suppose M is a subspace of X and z ∉ M. Suppose also that f is a linear real-valued function having the property that f (x) ≤ ρ (x) for all x ∈ M. Consider the problem of extending f to M ⊕ Rz such that if F is the extended function, F (y) ≤ ρ (y) for all y ∈ M ⊕ Rz and F is linear. Since F is to be linear, it suffices to determine how to define F (z). Letting a > 0, it is required to define F (z) such that the following hold for all x, y ∈ M (note F (x) = f (x) and F (y) = f (y) since x, y ∈ M):

    F (x) + aF (z) = F (x + az) ≤ ρ (x + az),
    F (y) − aF (z) = F (y − az) ≤ ρ (y − az).    (18.2.5)

Now if these inequalities hold when x, y are replaced by x/a, y/a, they hold for all x, y because M is given to be a subspace. Therefore, dividing by a and using 18.2.4, what is needed is to choose F (z) such that for all x, y ∈ M,

    f (x) + F (z) ≤ ρ (x + z),   f (y) − ρ (y − z) ≤ F (z),

and if F (z) can be chosen in this way, this will satisfy 18.2.5 for all x, y and the problem of extending f will be solved. Hence it is necessary to choose F (z) such that for all x, y ∈ M

    f (y) − ρ (y − z) ≤ F (z) ≤ ρ (x + z) − f (x).    (18.2.6)

Is there any such number between f (y) − ρ (y − z) and ρ (x + z) − f (x) for every pair x, y ∈ M? This is where f (x) ≤ ρ (x) on M and that f is linear is used. For x, y ∈ M,

    ρ (x + z) − f (x) − [f (y) − ρ (y − z)]
    = ρ (x + z) + ρ (y − z) − (f (x) + f (y))
    ≥ ρ (x + y) − f (x + y) ≥ 0.

Therefore there exists a number between

    sup {f (y) − ρ (y − z) : y ∈ M}

and

    inf {ρ (x + z) − f (x) : x ∈ M}.

Choose F (z) to satisfy 18.2.6. This has proved the following lemma.

Lemma 18.2.4 Let M be a subspace of X, a real linear space, and let be a gauge
function on X. Suppose f : M R is linear, z / M , and f (x) (x) for all
x M . Then f can be extended to M Rz such that, if F is the extended function,
F is linear and F (x) (x) for all x M Rz.

With this lemma, the Hahn Banach theorem can be proved.

Theorem 18.2.5 (Hahn Banach theorem) Let X be a real vector space, let M be a
subspace of X, let f : M R be linear, let be a gauge function on X, and suppose
f (x) (x) for all x M . Then there exists a linear function, F : X R, such
that
a.) F (x) = f (x) for all x M
b.) F (x) (x) for all x X.

Proof: Let F = {(V, g) : V M, V is a subspace of X, g : V R is linear,


g(x) = f (x) for all x M , and g(x) (x) for x V }. Then (M, f ) F so F = .
Dene a partial order by the following rule.

(V, g) (W, h)

means
V W and h(x) = g(x) if x V.
By Theorem 18.2.2, there exists a maximal chain, C F. Let Y = {V : (V, g) C}
and let h : Y R be dened by h(x) = g(x) where x V and (V, g) C. This
is well dened because if x V1 and V2 where (V1 , g1 ) and (V2 , g2 ) are both in the
chain, then since C is a chain, the two element related. Therefore, g1 (x) = g2 (x).
Also h is linear because if ax + by Y , then x V1 and y V2 where (V1 , g1 )
and (V2 , g2 ) are elements of C. Therefore, letting V denote the larger of the two Vi ,
and g be the function that goes with V , it follows ax + by V where (V, g) C.
Therefore,

h (ax + by) = g (ax + by)


= ag (x) + bg (y)
= ah (x) + bh (y) .

Also, h(x) = g (x) (x) for any x Y because for such x, x V where (V, g) C.
Is Y = X? If not, there exists z X \ Y and there exists an extension of h to
Y Rz using Lemma 18.2.4. Letting( h denote this ) extended function, contradicts
the maximality of C. Indeed, C { Y Rz, h } would be a longer chain. This
proves the Hahn Banach theorem.
This is the original version of the theorem. There is also a version of this theorem
for complex vector spaces which is based on a trick.

Corollary 18.2.6 (Hahn Banach) Let M be a subspace of a complex normed linear


space, X, and suppose f : M C is linear and satises |f (x)| K||x|| for all
x M . Then there exists a linear function, F , dened on all of X such that
F (x) = f (x) for all x M and |F (x)| K||x|| for all x.

Proof: First note f (x) = Re f (x) + i Im f (x) and so

Re f (ix) + i Im f (ix) = f (ix) = if (x) = i Re f (x) Im f (x).

Therefore, Im f (x) = Re f (ix), and

f (x) = Re f (x) i Re f (ix).

This is important because it shows it is only necessary to consider Re f in under-


standing f . Now it happens that Re f is linear with respect to real scalars so the
above version of the Hahn Banach theorem applies. This is shown next.
If c is a real scalar

Re f (cx) i Re f (icx) = cf (x) = c Re f (x) ic Re f (ix).

Thus Re f (cx) = c Re f (x). Also,

Re f (x + y) i Re f (i (x + y)) = f (x + y)
= f (x) + f (y)

= Re f (x) i Re f (ix) + Re f (y) i Re f (iy).


Equating real parts, Re f (x + y) = Re f (x) + Re f (y). Thus Re f is linear with
respect to real scalars as hoped.
Consider X as a real vector space and let (x) K||x||. Then for all x M ,

| Re f (x)| |f (x)| K||x|| = (x).

From Theorem 18.2.5, Re f may be extended to a function, h which satises

h(ax + by) = ah(x) + bh(y) if a, b R


h(x) K||x|| for all x X.

Actually, |h (x)| ≤ K ||x||. The reason for this is that −h (x) = h (−x) ≤ K ||−x|| = K ||x|| and therefore, h (x) ≥ −K ||x||. Let

    F (x) ≡ h(x) − ih(ix).



By arguments similar to the above, F is linear.

F (ix) = h (ix) ih (x)


= ih (x) + h (ix)
= i (h (x) ih (ix)) = iF (x) .

If c is a real scalar,

F (cx) = h(cx) ih(icx)


= ch (x) cih (ix) = cF (x)

Now

F (x + y) = h(x + y) ih(i (x + y))


= h (x) + h (y) ih (ix) ih (iy)
= F (x) + F (y) .

Thus

F ((a + ib) x) = F (ax) + F (ibx)


= aF (x) + ibF (x)
= (a + ib) F (x) .

This shows F is linear as claimed.


Now wF (x) = |F (x)| for some complex w with |w| = 1. Therefore, since |F (x)| is real, the imaginary part h(iwx) must equal zero and

    |F (x)| = wF (x) = F (wx) = h(wx) − ih(iwx) = h(wx)
    = |h(wx)| ≤ K||wx|| = K ||x||.

This proves the corollary.

Definition 18.2.7 Let X be a Banach space. Denote by X′ the space of continuous linear functions which map X to the field of scalars. Thus X′ = L(X, F). By Theorem 18.1.7 on Page 444, X′ is a Banach space. Remember with the norm defined on L (X, F),

    ||f|| = sup{|f (x)| : ||x|| ≤ 1}.

X′ is called the dual space.

Definition 18.2.8 Let X and Y be Banach spaces and suppose L ∈ L(X, Y). Then define the adjoint map in L(Y′, X′), denoted by L*, by

    L* y* (x) ≡ y* (Lx)

for all y* ∈ Y′.

The following diagram is a good one to help remember this definition.

    X′  ←— L* —  Y′
    X   —— L →   Y

This is a generalization of the adjoint of a linear transformation on an inner product space. Recall

    (Ax, y) = (x, A* y).
What is being done here is to generalize this algebraic concept to arbitrary Banach
spaces. There are some issues which need to be discussed relative to the above
denition. First of all, it must be shown that L y X . Also, it will be useful to
have the following lemma which is a useful application of the Hahn Banach theorem.
Lemma 18.2.9 Let X be a normed linear space and let x ∈ X. Then there exists x* ∈ X′ such that ||x*|| = 1 and x* (x) = ||x||.

Proof: Let f : Fx → F be defined by f (αx) = α ||x||. Then for y = αx ∈ Fx,

    |f (y)| = |f (αx)| = |α| ||x|| = ||y||.

By the Hahn Banach theorem, there exists x* ∈ X′ such that x* (x) = f (x) and ||x*|| ≤ 1. Since x* (x) = ||x|| it follows that ||x*|| = 1 because

    ||x*|| ≥ |x* (x / ||x||)| = ||x|| / ||x|| = 1.

This proves the lemma.
Theorem 18.2.10 Let L L(X, Y ) where X and Y are Banach spaces. Then
a.) L L(Y , X ) as claimed and ||L || = ||L||.
b.) If L maps one to one onto a closed subspace of Y , then L is onto.
c.) If L maps onto a dense subset of Y , then L is one to one.
Proof: It is routine to verify L y and L are both linear. This follows imme-
diately from the denition. As usual, the interesting thing concerns continuity.
||L y || = sup |L y (x)| = sup |y (Lx)| ||y || ||L|| .
||x||1 ||x||1

Thus L is continuous as claimed and ||L || ||L|| .


By Lemma 18.2.9, there exists yx Y such that ||yx || = 1 and yx (Lx) =
||Lx|| .Therefore,
||L || = sup ||L y || = sup sup |L y (x)|
||y ||1 ||y ||1 ||x||1

= sup sup |y (Lx)| = sup sup |y (Lx)|


||y ||1 ||x||1 ||x||1 ||y ||1

sup |yx (Lx)| = sup ||Lx|| = ||L||


||x||1 ||x||1

showing that ||L || ||L|| and this shows part a.).


If L is one to one and onto a closed subset of Y , then L (X) being a closed
subspace of a Banach space, is itself a Banach space and so the open mapping
theorem implies L1 : L(X) X is continuous. Hence

||x|| = ||L1 Lx|| L1 ||Lx||
Now let x X be given. Dene f L(L(X), C) by f (Lx) = x (x). The function,
f is well dened because if Lx1 = Lx2 , then since L is one to one, it follows x1 = x2
and so f (L (x1 )) = x (x1 ) = x (x2 ) = f (L (x1 )). Also, f is linear because
f (aL (x1 ) + bL (x2 )) = f (L (ax1 + bx2 ))
x (ax1 + bx2 )
= ax (x1 ) + bx (x2 )
= af (L (x1 )) + bf (L (x2 )) .
In addition to this,

|f (Lx)| = |x (x)| ||x || ||x|| ||x || L1 ||Lx||

and so the norm of f on L (X) is no larger than ||x || L1 . By the Hahn Banach

theorem, exists an extension of f to an element y Y such that ||y ||
there
1
||x || L . Then
L y (x) = y (Lx) = f (Lx) = x (x)
so L y = x because this holds for all x. Since x was arbitrary, this shows L is
onto and proves b.).
Consider the last assertion. Suppose L y = 0. Is y = 0? In other words
is y (y) = 0 for all y Y ? Pick y Y . Since L (X) is dense in Y, there exists
a sequence, {Lxn } such that Lxn y. But then by continuity of y , y (y) =
limn y (Lxn ) = limn L y (xn ) = 0. Since y (y) = 0 for all y, this implies
y = 0 and so L is one to one.
Corollary 18.2.11 Suppose X and Y are Banach spaces, L L(X, Y ), and L is
one to one and onto. Then L is also one to one and onto.
There exists a natural mapping, called the James map from a normed linear
space, X, to the dual of the dual space which is described in the following denition.
Denition 18.2.12 Dene J : X X by J(x)(x ) = x (x).
Theorem 18.2.13 The map, J, has the following properties.
a.) J is one to one and linear.
b.) ||Jx|| = ||x|| and ||J|| = 1.
c.) J(X) is a closed subspace of X if X is complete.
Also if x X ,
||x || = sup {|x (x )| : ||x || 1, x X } .

Proof:

J (ax + by) (x ) x (ax + by)


= ax (x) + bx (y)
= (aJ (x) + bJ (y)) (x ) .

Since this holds for all x X , it follows that

J (ax + by) = aJ (x) + bJ (y)

and so J is linear. If Jx = 0, then by Lemma 18.2.9 there exists x such that


x (x) = ||x|| and ||x || = 1. Then

0 = J(x)(x ) = x (x) = ||x||.

This shows a.).


To show b.), let x X and use Lemma 18.2.9 to obtain x X such that
x (x) = ||x|| with ||x || = 1. Then

||x|| sup{|y (x)| : ||y || 1}


= sup{|J(x)(y )| : ||y || 1} = ||Jx||
|J(x)(x )| = |x (x)| = ||x||

Therefore, ||Jx|| = ||x|| as claimed. Therefore,

||J|| = sup{||Jx|| : ||x|| 1} = sup{||x|| : ||x|| 1} = 1.

This shows b.).


To verify c.), use b.). If Jxn y X then by b.), xn is a Cauchy sequence
converging to some x X because

||xn xm || = ||Jxn Jxm ||

and {Jxn } is a Cauchy sequence. Then Jx = limn Jxn = y .


Finally, to show the assertion about the norm of x , use what was just shown
applied to the James map from X to X still referred to as J.

||x || = sup {|x (x)| : ||x|| 1} = sup {|J (x) (x )| : ||Jx|| 1}

sup {|x (x )| : ||x || 1} = sup {|J (x ) (x )| : ||x || 1}


||Jx || = ||x ||.
This proves the theorem.

Definition 18.2.14 When J maps X onto X′′, X is called reflexive.

It happens the L^p spaces are reflexive whenever p > 1.
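On the other hand, not every Banach space is reflexive. For example, c_0, the space of sequences converging to 0 with the sup norm, has dual ℓ¹ and bidual ℓ^∞, and the James map carries c_0 onto a proper closed subspace of ℓ^∞, so c_0 is not reflexive. Similarly L¹ and L^∞ generally fail to be reflexive.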



18.3 Uniform Convexity Of Lp


These terms refer roughly to how round the unit ball is. Here is the denition.

Definition 18.3.1 A Banach space is uniformly convex if whenever ||x_n||, ||y_n|| ≤ 1 and ||x_n + y_n|| → 2, it follows that ||x_n − y_n|| → 0.
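For example, every Hilbert space is uniformly convex: the parallelogram law gives

    ||x_n − y_n||² = 2 ||x_n||² + 2 ||y_n||² − ||x_n + y_n||² ≤ 4 − ||x_n + y_n||² → 0.

By contrast, L¹ (0, 1) is not uniformly convex: taking f_n ≡ 2 X_{(0,1/2)} and g_n ≡ 2 X_{(1/2,1)} gives ||f_n||_{L¹} = ||g_n||_{L¹} = 1 and ||f_n + g_n||_{L¹} = 2 while ||f_n − g_n||_{L¹} = 2 for every n.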

You can show that uniform convexity implies strict convexity. There are various
other things which can also be shown. See the exercises for some of these. In this
section, it will be shown that the Lp spaces are examples of uniformly convex spaces.
This involves some inequalities known as Clarksons inequalities. Before present-
ing these, here are the backwards Holder inequality and the backwards Minkowski
inequality.

Lemma 18.3.2 Let 0 < p < 1 and let f, g be measurable functions. Also suppose

    ∫ |g|^{p/(p−1)} dμ < ∞,   ∫ |f|^p dμ < ∞.

Then the following backwards Holder inequality holds.

    ∫ |f g| dμ ≥ (∫ |f|^p dμ)^{1/p} (∫ |g|^{p/(p−1)} dμ)^{(p−1)/p}

Proof: If ∫ |f g| dμ = ∞, there is nothing to prove. Hence assume this is finite. Then

    ∫ |f|^p dμ = ∫ |g|^{−p} |f g|^p dμ.

This makes sense because, due to the hypothesis on g, it must be the case that g equals 0 only on a set of measure zero, since p/(p − 1) < 0. Then by the usual Holder inequality with the exponents 1/p and 1/(1 − p),

    ∫ |f|^p dμ ≤ (∫ |f g| dμ)^p (∫ (1/|g|^p)^{1/(1−p)} dμ)^{1−p}
    = (∫ |f g| dμ)^p (∫ |g|^{p/(p−1)} dμ)^{1−p}.

Now divide and then take the pth root. ∎


Here is the backwards Minkowski inequality.
Corollary 18.3.3 Let 0 < p < 1 and suppose ∫ |h|^p dμ < ∞ for h = f, g. Then

    (∫ (|f| + |g|)^p dμ)^{1/p} ≥ (∫ |f|^p dμ)^{1/p} + (∫ |g|^p dμ)^{1/p}

Proof: If ∫ (|f| + |g|)^p dμ = 0 then there is nothing to prove so assume this is not zero.

    ∫ (|f| + |g|)^p dμ = ∫ (|f| + |g|)^{p−1} (|f| + |g|) dμ.

Now (|f| + |g|)^p ≤ |f|^p + |g|^p and so

    ∫ ((|f| + |g|)^{p−1})^{p/(p−1)} dμ < ∞.

Hence the backwards Holder inequality applies and it follows that

    ∫ (|f| + |g|)^p dμ = ∫ (|f| + |g|)^{p−1} |f| dμ + ∫ (|f| + |g|)^{p−1} |g| dμ
    ≥ (∫ ((|f| + |g|)^{p−1})^{p/(p−1)} dμ)^{(p−1)/p} [(∫ |f|^p dμ)^{1/p} + (∫ |g|^p dμ)^{1/p}]
    = (∫ (|f| + |g|)^p dμ)^{(p−1)/p} [(∫ |f|^p dμ)^{1/p} + (∫ |g|^p dμ)^{1/p}]

and so, dividing gives the desired inequality. ∎


Consider the easy Clarkson inequalities.
Lemma 18.3.4 For any p ≥ 2 the following inequality holds for any t ∈ [0, 1]:

    |(1 + t)/2|^p + |(1 − t)/2|^p ≤ (1/2)(|t|^p + 1)

Proof: It is clear that, since p ≥ 2, the inequality holds for t = 0 and t = 1. Thus it suffices to consider only t ∈ (0, 1). Let x = 1/t. Then, dividing by t^p, the inequality holds if and only if

    ((x + 1)/2)^p + ((x − 1)/2)^p ≤ (1/2)(1 + x^p)

for all x ≥ 1. Let

    f (x) = (1/2)(1 + x^p) − (((x + 1)/2)^p + ((x − 1)/2)^p)

Then f (1) = 0 and

    f′ (x) = (p/2) x^{p−1} − ((p/2) ((x + 1)/2)^{p−1} + (p/2) ((x − 1)/2)^{p−1})

Since p − 1 ≥ 1 and the nonnegative numbers (x + 1)/2 and (x − 1)/2 sum to x,

    ((x + 1)/2)^{p−1} + ((x − 1)/2)^{p−1} ≤ ((x + 1)/2 + (x − 1)/2)^{p−1} = x^{p−1},

because for q ≥ 1 and a, b ≥ 0, a^q + b^q ≤ (a + b)^{q−1} a + (a + b)^{q−1} b = (a + b)^q. Hence f′ (x) ≥ 0 for all x ≥ 1 and so f (x) ≥ f (1) = 0 for all x ≥ 1. ∎

Corollary 18.3.5 If z, w ∈ C and p ≥ 2, then

    |(z + w)/2|^p + |(z − w)/2|^p ≤ (1/2)(|z|^p + |w|^p)    (18.3.7)

Proof: One of |w|, |z| is larger. Say |z| ≥ |w|. Then dividing both sides of the proposed inequality by |z|^p, it suffices to verify that for all complex t having |t| ≤ 1,

    |(1 + t)/2|^p + |(1 − t)/2|^p ≤ (1/2)(|t|^p + 1)

Say t = re^{iθ} where r ≤ 1. Then consider the expression

    |(1 + re^{iθ})/2|^p + |(1 − re^{iθ})/2|^p

It is 2^{−p} times

    ((1 + r cos θ)² + r² sin² θ)^{p/2} + ((1 − r cos θ)² + r² sin² θ)^{p/2}
    = (1 + r² + 2r cos θ)^{p/2} + (1 + r² − 2r cos θ)^{p/2},

a continuous periodic function of θ ∈ R which achieves its maximum value when θ = 0. This follows from the first derivative test from calculus. Therefore, for |t| ≤ 1,

    |(1 + t)/2|^p + |(1 − t)/2|^p ≤ |(1 + |t|)/2|^p + |(1 − |t|)/2|^p ≤ (1/2)(1 + |t|^p)

by the above lemma. ∎


With this corollary, here is the easy Clarkson inequality.

Theorem 18.3.6 Let p ≥ 2. Then

    ||(f + g)/2||^p_{L^p} + ||(f − g)/2||^p_{L^p} ≤ (1/2)(||f||^p_{L^p} + ||g||^p_{L^p})

Proof: This follows right away from the above corollary.

    ∫ |(f + g)/2|^p dμ + ∫ |(f − g)/2|^p dμ ≤ (1/2) ∫ (|f|^p + |g|^p) dμ. ∎
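Although the argument above is complete, a quick numerical check can make an inequality like this concrete. The following short Python sketch is only an illustration (the helper names are invented here); finite sums over a vector play the role of ∫ |·|^p dμ with counting measure, and random vectors stand in for f and g.

    import random

    def int_abs_p(v, p):
        # plays the role of the integral of |v|^p with respect to counting measure
        return sum(abs(x) ** p for x in v)

    def easy_clarkson_holds(p, trials=1000, dim=10):
        # checks ||(f+g)/2||_p^p + ||(f-g)/2||_p^p <= (1/2)(||f||_p^p + ||g||_p^p)
        for _ in range(trials):
            f = [random.uniform(-1.0, 1.0) for _ in range(dim)]
            g = [random.uniform(-1.0, 1.0) for _ in range(dim)]
            lhs = int_abs_p([(a + b) / 2 for a, b in zip(f, g)], p) + \
                  int_abs_p([(a - b) / 2 for a, b in zip(f, g)], p)
            rhs = 0.5 * (int_abs_p(f, p) + int_abs_p(g, p))
            if lhs > rhs + 1e-12:  # tolerance for floating point rounding
                return False
        return True

    print(easy_clarkson_holds(2))  # expected: True
    print(easy_clarkson_holds(4))  # expected: True

Of course such a check verifies nothing beyond the finitely many cases it samples; the theorem is what guarantees the inequality in general.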

Now it remains to consider the hard Clarkson inequalities. These pertain to


p < 2. First is the following elementary inequality.

Lemma 18.3.7 For 1 < p < 2, the following inequality holds for all t ∈ [0, 1]:

    |(1 + t)/2|^q + |(1 − t)/2|^q ≤ ((1/2) + (1/2)|t|^p)^{q/p}

where here 1/p + 1/q = 1 so q > 2.

Proof: First note that if t = 0 or 1, the inequality holds. Next observe that the
map s 1s
1+s maps (0, 1) onto (0, 1). Replace t with (1 s) / (1 + s). Then you get

( p )q/p
1 q s q
+ 1 + 1 1 s
s + 1 s + 1 2 2 s + 1
q
Multiplying both sides by (1 + s) , this is equivalent to showing that for all s
(0, 1) ,
( p )q/p
p q/p 1 1 1 s
1+s q
((1 + s) ) +
2 2 s + 1
( )q/p
1 p p q/p
= ((1 + s) + (1 s) )
2

This is the same as establishing


1 p p p1
((1 + s) + (1 s) ) (1 + sq ) 0 (18.3.8)
2
where p 1 = p/q due to the denition of q above.
( )
p p (p 1) (p k + 1)
, l1
l l!
( ) ( )
p p
and 1. What is the sign of ? Recall that 1 < p < 2 so the sign
0 l ( )
p
is positive if l = 0, l = 1, l = 2. What about l = 3? = p(p1)(p2)
3! so this
( ) 3
p
is negative. Then is positive. Thus these alternate between positive and
( )4 ( )
p p1
negative with > 0 for all k. What about ? When k = 0 it is
2k k
(p1)(p2)
positive. When k = ( 1 it is )also positive.( When k) = 2 it equals 2! < 0.
p1 p1
Then when k = 3, > 0. Thus is positive when k is odd and
3 k
is negative when k is even.
Now return to 18.3.8. The left side equals
( ( ) ( ) ) ( )
1 p p k
p1
k
s + (s) sqk .
2 k k k
k=0 k=0 k=0

The rst term equals 0. Then this reduces to


(
) ( ) ( )
p p1 p1
s
2k
sq2k
sq(2k1)
2k 2k 2k 1
k=1

From the above observation about the binomial coecients, the above is larger than
(
) ( )
p p1
s
2k
sq(2k1)
2k 2k 1
k=1

It remains to show the k th term in the above sum is nonnegative. Now q (2k 1) >
2k for all k 1 because q > 2. Then since 0 < s < 1
( ) ( ) (( ) ( ))
p p1 p p1
s
2k
sq(2k1)
s2k

2k 2k 1 2k 2k 1

However, this is nonnegative because it equals


>0
z }| {
p (p 1) (p 2k + 1) (p 1) (p 2) (p 2k + 1)
s2k


(2k)! (2k 1)!
( )
p (p 1) (p 2k + 1) (p 1) (p 2) (p 2k + 1)
s2k
(2k)! (2k)!
(p 1) (p 2) (p 2k + 1)
= s2k (p 1) > 0. 
(2k)!

As before, this leads to the following corollary.

Corollary 18.3.8 Let z, w ∈ C. Then for p ∈ (1, 2),

    |(z + w)/2|^q + |(z − w)/2|^q ≤ ((1/2)|z|^p + (1/2)|w|^p)^{q/p}

Proof: One of |w|, |z| is larger. Say |w| ≥ |z|. Then dividing by |w|^q, for t = z/w, showing the above inequality is equivalent to showing that for all t ∈ C, |t| ≤ 1,

    |(t + 1)/2|^q + |(1 − t)/2|^q ≤ ((1/2)|t|^p + (1/2))^{q/p}

Now q > 2 and so by the same argument given in proving Corollary 18.3.5, for t = re^{iθ}, the left side of the above inequality is maximized when θ = 0. Hence, from Lemma 18.3.7,

    |(t + 1)/2|^q + |(1 − t)/2|^q ≤ |(|t| + 1)/2|^q + |(1 − |t|)/2|^q ≤ ((1/2)|t|^p + (1/2))^{q/p}.
From this the hard Clarkson inequality follows. The two Clarkson inequalities
are summarized in the following theorem.

Theorem 18.3.9 Let 2 ≤ p. Then

    ||(f + g)/2||^p_{L^p} + ||(f − g)/2||^p_{L^p} ≤ (1/2)(||f||^p_{L^p} + ||g||^p_{L^p})

Let 1 < p < 2. Then for 1/p + 1/q = 1,

    ||(f + g)/2||^q_{L^p} + ||(f − g)/2||^q_{L^p} ≤ ((1/2)||f||^p_{L^p} + (1/2)||g||^p_{L^p})^{q/p}

Proof: The first was established above.

    ||(f + g)/2||^q_{L^p} + ||(f − g)/2||^q_{L^p}
    = (∫ |(f + g)/2|^p dμ)^{q/p} + (∫ |(f − g)/2|^p dμ)^{q/p}
    = (∫ (|(f + g)/2|^q)^{p/q} dμ)^{q/p} + (∫ (|(f − g)/2|^q)^{p/q} dμ)^{q/p}

Now p/q < 1 and so the backwards Minkowski inequality applies. Thus the above is

    ≤ (∫ (|(f + g)/2|^q + |(f − g)/2|^q)^{p/q} dμ)^{q/p}

From Corollary 18.3.8, this is

    ≤ (∫ (((1/2)|f|^p + (1/2)|g|^p)^{q/p})^{p/q} dμ)^{q/p}
    = (∫ ((1/2)|f|^p + (1/2)|g|^p) dμ)^{q/p} = ((1/2)||f||^p_{L^p} + (1/2)||g||^p_{L^p})^{q/p}. ∎

Now with these Clarkson inequalities, it is not hard to show that all the Lp
spaces are uniformly convex.

Theorem 18.3.10 The L^p spaces are uniformly convex for 1 < p < ∞.

Proof: First suppose p ≥ 2. Suppose ||f_n||_{L^p}, ||g_n||_{L^p} ≤ 1 and ||(f_n + g_n)/2||_{L^p} → 1. Then from the first Clarkson inequality,

    ||(f_n + g_n)/2||^p_{L^p} + ||(f_n − g_n)/2||^p_{L^p} ≤ (1/2)(||f_n||^p_{L^p} + ||g_n||^p_{L^p}) ≤ 1

and so ||f_n − g_n||_{L^p} → 0.
Next suppose 1 < p < 2 and ||(f_n + g_n)/2||_{L^p} → 1. Then from the second Clarkson inequality,

    ||(f_n + g_n)/2||^q_{L^p} + ||(f_n − g_n)/2||^q_{L^p} ≤ ((1/2)||f_n||^p_{L^p} + (1/2)||g_n||^p_{L^p})^{q/p} ≤ 1

which shows that ||f_n − g_n||_{L^p} → 0. ∎

18.4 Weak And Weak Topologies


18.4.1 Basic Denitions
Let X be a Banach space and let X′ be its dual space.¹ For A′ a finite subset of X′, denote by ρ_{A′} the function defined on X by

    ρ_{A′} (x) ≡ max_{x* ∈ A′} |x* (x)|    (18.4.9)

and also let B_{A′} (x, r) be defined by

    B_{A′} (x, r) ≡ {y ∈ X : ρ_{A′} (y − x) < r}    (18.4.10)

Then certain things are obvious. First of all, if a ∈ F and x, y ∈ X,

    ρ_{A′} (x + y) ≤ ρ_{A′} (x) + ρ_{A′} (y),
    ρ_{A′} (ax) = |a| ρ_{A′} (x).

Similarly, letting A be a finite subset of X, denote by ρ_A the function defined on X′ by

    ρ_A (x*) ≡ max_{x ∈ A} |x* (x)|    (18.4.11)

and let B_A (x*, r) be defined by

    B_A (x*, r) ≡ {y* ∈ X′ : ρ_A (y* − x*) < r}.    (18.4.12)

It is also clear that

    ρ_A (x* + y*) ≤ ρ_A (x*) + ρ_A (y*),
    ρ_A (ax*) = |a| ρ_A (x*).

Lemma 18.4.1 The sets B_{A′} (x, r) where A′ is a finite subset of X′ and x ∈ X form a basis for a topology on X known as the weak topology. The sets B_A (x*, r) where A is a finite subset of X and x* ∈ X′ form a basis for a topology on X′ known as the weak* topology.

¹ Actually, all this works in much more general settings than this.

Proof: The two assertions are very similar. I will verify the one for the weak topology. The union of these sets B_{A′} (x, r) for x ∈ X and r > 0 is all of X. Now suppose z is contained in the intersection of two of these sets. Say

    z ∈ B_{A′} (x, r) ∩ B_{A′_1} (x_1, r_1)

Then let C′ = A′ ∪ A′_1 and let

    0 < δ ≤ min (r − ρ_{A′} (z − x), r_1 − ρ_{A′_1} (z − x_1)).

Consider y ∈ B_{C′} (z, δ). Then

    r − ρ_{A′} (z − x) ≥ δ > ρ_{C′} (y − z) ≥ ρ_{A′} (y − z)

and so

    r > ρ_{A′} (y − z) + ρ_{A′} (z − x) ≥ ρ_{A′} (y − x)

which shows y ∈ B_{A′} (x, r). Similar reasoning shows y ∈ B_{A′_1} (x_1, r_1) and so

    B_{C′} (z, δ) ⊆ B_{A′} (x, r) ∩ B_{A′_1} (x_1, r_1).

Therefore, the weak topology consists of the union of all sets of the form B_{A′} (x, r). ∎
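To see how much weaker this topology is than the norm topology, consider the standard example in ℓ² (or any infinite dimensional Hilbert space): for the orthonormal sequence {e_n}, every x* ∈ (ℓ²)′ is represented by some y ∈ ℓ² via the Riesz representation theorem, and Σ_n |(e_n, y)|² ≤ ||y||² < ∞ forces x* (e_n) = (e_n, y) → 0. Thus e_n → 0 in the weak topology even though ||e_n|| = 1 for all n, so the sequence does not converge in norm.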

18.4.2 Banach Alaoglu Theorem


Why does anyone care about these topologies? The short answer is that in the
weak topology, closed unit ball in X is compact. This is not true in the normal
topology. This wonderful result is the Banach Alaoglu theorem. First recall the
notion of the product topology, and the Tychono theorem, Theorem 17.3.4 on
Page 427 which are stated here for convenience.

Denition 18.4.2 Let I be a set and suppose for each i I, (Xi , i ) is a nonempty
topological space. The Cartesian product of the Xi , denoted by iI Xi , consists
of the set of allchoice functions dened on I which select a single element of each
i . Thus f iI Xi means for every i I, f (i) Xi . The axiom of choice says
X
iI Xi is nonempty. Let
Pj (A) = Bi
iI

where Bi = Xi if i = j and Bj = A. A subbasis for a topology on the product


space consists of all sets Pj (A) where A j . (These sets have an open set from
the topology of Xj in the j th slot and the whole space in the other slots.) Thus a
basis consists of nite intersections of these sets. Note that theintersection of two
of these basic sets is another basic set and their union yields iI Xi . Therefore,
they satisfy the condition needed for a collection of sets to serve as a basis for a
topology. This topology is called the product topology and is denoted by i .

Theorem 18.4.3 If (Xi , i ) is compact, then so is ( iI Xi , i ).

The Banach Alaoglu theorem is as follows.

Theorem 18.4.4 Let B′ be the closed unit ball in X′. Then B′ is compact in the weak* topology.

Proof: By the Tychonoff theorem, Theorem 18.4.3,

    P ≡ ∏_{x∈X} \overline{B (0, ||x||)}

is compact in the product topology where the topology on \overline{B (0, ||x||)} is the usual topology of F. Recall P is the set of functions which map a point x ∈ X to a point in \overline{B (0, ||x||)}. Therefore, B′ ⊆ P. Also the basic open sets in the weak* topology on B′ are obtained as the intersection of basic open sets in the product topology of P with B′ and so it suffices to show B′ is a closed subset of P. Suppose then that f ∈ P \ B′. Since |f (x)| ≤ ||x|| for each x, it follows f cannot be linear. There are two ways this can happen. One way is that

    f (x + y) ≠ f (x) + f (y)

for some x, y ∈ X. However, if g is close enough to f at the three points x + y, x, and y, the above inequality will hold for g in place of f. In other words there is a basic open set containing f such that for all g in this basic open set, g ∉ B′. A similar consideration applies in case f (λx) ≠ λf (x) for some scalar λ and x. Since P \ B′ is open, it follows B′ is a closed subset of P and is therefore, compact. ∎
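For a concrete instance, X = ℓ¹ has dual X′ = ℓ^∞, so the closed unit ball of ℓ^∞ is compact in the weak* topology coming from ℓ¹, even though it is far from compact in the norm topology: the sequences δ^{(n)}, with a 1 in the n-th slot and zeros elsewhere, satisfy ||δ^{(n)} − δ^{(m)}||_∞ = 1 for n ≠ m, so no subsequence converges in norm.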
Sometimes one can consider the weak topology in terms of a metric space.

Theorem 18.4.5 If K ⊆ X′ is compact in the weak* topology and X is separable in the weak topology, then there exists a metric d on K such that if τ_d is the topology on K induced by d and if τ is the topology on K induced by the weak* topology of X′, then τ = τ_d. Thus one can consider K with the weak* topology as a metric space.

Proof: Let D ≡ {x_n} be the dense countable subset in X. The metric is

    d (f, g) ≡ Σ_{n=1}^∞ 2^{−n} ρ_{x_n} (f − g) / (1 + ρ_{x_n} (f − g))

where ρ_{x_n} (f) ≡ |f (x_n)|. Clearly d (f, g) = d (g, f) ≥ 0. If d (f, g) = 0, then this requires f (x_n) = g (x_n) for all x_n ∈ D. Is it the case that f = g? Given x ∈ X and r > 0, the weakly open set B_{\{f,g\}} (x, r) contains some x_n ∈ D. Hence

    max {|f (x_n) − f (x)|, |g (x_n) − g (x)|} < r

and f (x_n) = g (x_n). It follows that |f (x) − g (x)| < 2r. Since r is arbitrary, this implies f (x) = g (x). It is routine to verify the triangle inequality from the easy to establish inequality

    x/(1 + x) + y/(1 + y) ≥ (x + y)/(1 + x + y),

valid whenever x, y ≥ 0. Therefore this is a metric.


Thus there are two topological spaces, (K, τ) and (K, τ_d), the first being K with
the weak ∗ topology and the second being K with this metric. It is clear that if i
is the identity map, i : (K, τ) → (K, τ_d), then i is continuous. Therefore, sets which
are open in (K, τ_d) are open in (K, τ). Letting τ_d denote those sets which are open
with respect to the metric, this says τ_d ⊆ τ.
Now suppose U ∈ τ. Is U in τ_d? Since K is compact with respect to τ, it follows
from the above that K is compact with respect to τ_d. Hence K \ U is compact
with respect to τ_d and so it is closed with respect to τ_d. Thus U is open with
respect to τ_d. ∎
The fact that this set with the weak ∗ topology can be considered a metric
space is very signicant because if a point is a limit point in a metric space, one
can extract a convergent sequence.
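As a small illustration of how the metric of Theorem 18.4.5 behaves, here is a numerical sketch, not part of the original text. It works in the toy case X = R³, where the xₙ are vectors, functionals are identified with vectors, and ρ_{xₙ}(f) is just |f · xₙ|; the finite sample standing in for the countable dense set D is a hypothetical choice for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 3))           # surrogate for x_1, ..., x_20 from a dense set

    def d(f, g):
        # d(f,g) = sum_n 2^{-n} |x_n(f-g)| / (1 + |x_n(f-g)|)
        vals = np.abs(D @ (f - g))
        weights = 2.0 ** -np.arange(1, len(D) + 1)
        return np.sum(weights * vals / (1.0 + vals))

    f = np.array([0.3, -0.1, 0.5])
    g = np.array([0.2, 0.4, -0.3])
    h = np.array([-0.5, 0.1, 0.2])
    print(d(f, g), d(g, f))                    # symmetry
    print(d(f, h) <= d(f, g) + d(g, h))        # triangle inequality holds here
    print(d(f, f))                             # zero on the diagonal

The elementary inequality x/(1+x) + y/(1+y) ≥ (x+y)/(1+x+y) used in the proof is what makes the triangle inequality come out term by term in the series.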
Note that if a Banach space is separable, then it is weakly separable.

Corollary 18.4.6 If X is weakly separable and K ⊆ X′ is compact in the weak ∗
topology, then K is sequentially compact. That is, if {fₙ}_{n=1}^{∞} ⊆ K, then there exists
a subsequence f_{n_k} and f ∈ K such that for all x ∈ X,

lim_{k→∞} f_{n_k}(x) = f(x).

Proof: By Theorem 18.4.5, K is a metric space for the metric described there
and it is compact. Therefore by the characterization of compact metric spaces,
Proposition 15.2.5 on Page 366, K is sequentially compact. This proves the corol-
lary. 

18.4.3 Eberlein Smulian Theorem


Next consider the weak topology. The most interesting results have to do with a
reflexive Banach space. The following lemma ties together the weak and weak ∗
topologies in the case of a reflexive Banach space.

Lemma 18.4.7 Let J : X → X′′ be the James map

Jx(f) ≡ f(x)

and let X be reflexive so that J is onto. Then J is a homeomorphism of (X, weak topology)
and (X′′, weak ∗ topology). This means J is one to one, onto, and both J and J⁻¹
are continuous.

Proof: Let f ∈ X′ and let

B_f(x, r) ≡ {y : |f(x) − f(y)| < r}.

Thus B_f(x, r) is a subbasic set for the weak topology on X. I claim that

J B_f(x, r) = B_f(Jx, r)

where B_f(Jx, r) is a subbasic set for the weak ∗ topology on X′′. If y ∈ B_f(x, r), then
|Jy(f) − Jx(f)| = |f(y) − f(x)| < r and so J B_f(x, r) ⊆ B_f(Jx, r). Now if x′′ ∈ B_f(Jx, r),
then since X is reflexive, there exists y ∈ X such that Jy = x′′ and so

|f(y) − f(x)| = |Jy(f) − Jx(f)| < r

showing that J B_f(x, r) = B_f(Jx, r). A typical subbasic set in the weak ∗ topology
is of the form B_f(Jx, r). Thus J maps the subbasic sets of the weak topology to the
subbasic sets of the weak ∗ topology. Therefore, J is a homeomorphism as claimed. ∎

The following is an easy corollary.

Corollary 18.4.8 If X is a reexive Banach space, then the closed unit ball is
weakly compact.

Proof: Let B be the closed unit ball. Then B = J⁻¹(B′′) where B′′ is the
unit ball in X′′ which is compact in the weak ∗ topology. Therefore B is weakly
compact because J⁻¹ is continuous. ∎

Corollary 18.4.9 Let X be a reflexive Banach space. If K ⊆ X is compact in the
weak topology and X′ is separable in the weak ∗ topology, then there exists a metric
d, on K such that if τ_d is the topology on K induced by d and if τ is the topology
on K induced by the weak topology of X, then τ = τ_d. Thus one can consider K
with the weak topology as a metric space.

Proof: This follows from Theorem 18.4.5 and Lemma 18.4.7. Lemma 18.4.7
implies J(K) is compact in X′′ with the weak ∗ topology. Then since X′ is separable
in the weak ∗ topology, X′ is separable in the weak topology (these two topologies
coincide on X′ because X is reflexive) and so there is a metric, d̃ on J(K) which
delivers the weak ∗ topology on J(K). Let d(x, y) ≡ d̃(Jx, Jy). Then

(K, d) → (J(K), d̃) → (J(K), weak ∗) → (K, weak),

the maps being J, the identity, and J⁻¹ respectively, and all the maps are
homeomorphisms. ∎


Here is a useful lemma.

Lemma 18.4.10 Let Y be a closed subspace of a Banach space X and let y ∈ X \ Y.
Then there exists x′ ∈ X′ such that x′(Y) = 0 but x′(y) ≠ 0.

Proof: Define f(x + βy) ≡ β||y|| for x ∈ Y and β ∈ F. Thus f is linear on Y ⊕ Fy.
I claim that f is also continuous on this subspace of X. If not, then there exists
xₙ + βₙy → 0 but |f(xₙ + βₙy)| ≥ ε > 0 for all n. First suppose {βₙ} is bounded.
Then, taking a further subsequence, we can assume βₙ → β. It follows then that {xₙ}
must also converge to some x ∈ Y since Y is closed. Therefore, in this case, x + βy = 0
and so β = 0 since otherwise, y ∈ Y. But then f(xₙ + βₙy) = βₙ||y|| → 0, a contradiction.
In the other case when {βₙ} is unbounded, you have (xₙ/βₙ + y) → 0 and so it would
require that y ∈ Y which cannot happen because Y is closed. Hence f is continuous
as claimed. It follows that for some k,

|f(x + βy)| ≤ k ||x + βy||.

Now apply the Hahn Banach theorem to extend f to x′ ∈ X′. ∎


Next is the Eberlein Smulian theorem which states that a Banach space is re-
flexive if and only if the closed unit ball is weakly sequentially compact. Actually,
only half the theorem is proved here, the more useful only if part. The book by
Yoshida [42] has the complete theorem discussed. First here is an interesting lemma
for its own sake.

Lemma 18.4.11 A closed subspace of a reflexive Banach space is reflexive.

Proof: Let Y be the closed subspace of the reflexive space, X. Consider the
following diagram, in which i is the inclusion map:

i∗∗ : Y′′ → X′′  (one to one)
i∗  : X′ → Y′   (onto)
i   : Y → X

This diagram follows from Theorem 18.2.10 on Page 455, the theorem on adjoints.
Now let y′′ ∈ Y′′. Then i∗∗ y′′ = J_X(y) for some y ∈ X because X is reflexive. I want
to show that y ∈ Y. If it is not in Y then since Y is closed, there exists x′ ∈ X′ such
that x′(y) ≠ 0 but x′(Y) = 0. Then i∗ x′ = 0. Hence

0 = y′′(i∗ x′) = i∗∗ y′′(x′) = J_X(y)(x′) = x′(y) ≠ 0,

a contradiction. Hence y ∈ Y. Letting J_Y denote the James map from Y to Y′′
and x′ ∈ X′,

y′′(i∗ x′) = i∗∗ y′′(x′) = J_X(y)(x′)
 = x′(y) = x′(iy) = i∗ x′(y) = J_Y(y)(i∗ x′).

Since i∗ is onto, this shows y′′ = J_Y(y). ∎

Theorem 18.4.12 (Eberlein Smulian) The closed unit ball in a reflexive Banach
space X, is weakly sequentially compact. By this is meant that if {xₙ} is contained
in the closed unit ball, there exists a subsequence, {x_{n_k}} and x ∈ X such that for
all x′ ∈ X′,

x′(x_{n_k}) → x′(x).

Proof: Let {xₙ} ⊆ B ≡ B(0, 1). Let Y be the closure of the linear span of
{xₙ}. Thus Y is separable. It is reflexive because it is a closed subspace of a
reflexive space so the above lemma applies. By the Banach Alaoglu theorem, the
closed unit ball B′ in Y′ is weak ∗ compact. Also by Theorem 18.4.5, B′ is a metric
space with a suitable metric, because Y is separable, hence weakly separable. The
relevant diagram is

i∗∗ : Y′′ → X′′  (one to one),  B′′ the closed unit ball in Y′′,
i∗  : X′ → Y′   (onto),        Y′ to be shown weak ∗ separable,  B′ the closed unit ball in Y′,
i   : Y → X,                   Y separable,  B the closed unit ball in Y.

Thus B′ is complete and totally bounded with respect to this metric and it
follows that B′ with the weak ∗ topology is separable. This implies Y′ is also
separable in the weak ∗ topology. To see this, let {y′ₙ} ≡ D be a weak ∗ dense
set in B′ and let y′ ∈ Y′. Let p be a large enough positive rational number that
y′/p ∈ B′. Then if A is any finite set from Y, there exists y′ₙ ∈ D such that
ρ_A(y′/p − y′ₙ) < ε/p. It follows p y′ₙ ∈ B_A(y′, ε) showing that rational multiples of
D are weak ∗ dense in Y′. Since Y is reflexive, the weak and weak ∗ topologies on
Y′ coincide and so Y′ is weakly separable. Since Y′ is weakly separable, Corollary
18.4.6 implies B′′, the closed unit ball in Y′′, is weak ∗ sequentially compact. Then
by Lemma 18.4.7, B, the unit ball in Y, is weakly sequentially compact. It follows
there exists a subsequence x_{n_k}, of the sequence {xₙ} and a point x ∈ Y, such that
for all f ∈ Y′,

f(x_{n_k}) → f(x).

Now if x′ ∈ X′, and i is the inclusion map of Y into X,

x′(x_{n_k}) = i∗ x′(x_{n_k}) → i∗ x′(x) = x′(x),

which shows x_{n_k} converges weakly and this shows the unit ball in X is weakly
sequentially compact. ∎

Corollary 18.4.13 Let {xₙ} be any bounded sequence in a reflexive Banach space
X. Then there exists x ∈ X and a subsequence, {x_{n_k}} such that for all x′ ∈ X′,

lim_{k→∞} x′(x_{n_k}) = x′(x).

Proof: If a subsequence, x_{n_k} has ||x_{n_k}|| → 0, then the conclusion follows.
Simply let x = 0. Suppose then that ||xₙ|| is bounded away from 0. That is,
||xₙ|| ∈ [δ, C]. Take a subsequence such that ||x_{n_k}|| → a. Then consider x_{n_k}/||x_{n_k}||.
By the Eberlein Smulian theorem, this subsequence has a further subsequence,
x_{n_{k_j}}/||x_{n_{k_j}}|| which converges weakly to x ∈ B where B is the closed unit ball.
It follows from routine considerations that x_{n_{k_j}} → a x weakly. This proves the
corollary. ∎

18.5 Exercises
1. Is N a Gδ set? What about Q? What about a countable dense subset of a
   complete metric space?
2. Let f : R → C be a function. Define the oscillation of a function in B(x, r)
   by ω_r f(x) = sup{|f(z) − f(y)| : y, z ∈ B(x, r)}. Define the oscillation of the
   function at the point, x by ωf(x) = lim_{r→0} ω_r f(x). Show f is continuous
   at x if and only if ωf(x) = 0. Then show the set of points where f is
   continuous is a Gδ set (try Uₙ = {x : ωf(x) < 1/n}). Does there exist a
   function continuous at only the rational numbers? Does there exist a function
   continuous at every irrational and discontinuous elsewhere? Hint: Suppose
   D is any countable set, D = {dᵢ}_{i=1}^{∞}, and define the function, fₙ(x) to equal
   zero for every x ∉ {d₁, ⋯, dₙ} and 2^{−n} for x in this finite set. Then consider
   g(x) ≡ ∑_{n=1}^{∞} fₙ(x). Show that this series converges uniformly.
3. Let f ∈ C([0, 1]) and suppose f′(x) exists. Show there exists a constant, K,
   such that |f(x) − f(y)| ≤ K|x − y| for all y ∈ [0, 1]. Let Uₙ = {f ∈ C([0, 1])
   such that for each x ∈ [0, 1] there exists y ∈ [0, 1] such that |f(x) − f(y)| >
   n|x − y|}. Show that Uₙ is open and dense in C([0, 1]) where for f ∈ C([0, 1]),

   ||f|| ≡ sup{|f(x)| : x ∈ [0, 1]}.

   Show that ∩ₙ Uₙ is a dense Gδ set of nowhere differentiable continuous func-
   tions. Thus every continuous function is uniformly close to one which is
   nowhere differentiable.

4. Suppose f(x) = ∑_{k=1}^{∞} u_k(x) where the convergence is uniform and each
   u_k is a polynomial. Is it reasonable to conclude that f′(x) = ∑_{k=1}^{∞} u′_k(x)?
   The answer is no. Use Problem 3 and the Weierstrass approximation theorem
   to show this.
5. Let X be a normed linear space. A ⊆ X is weakly bounded if for each
   x′ ∈ X′, sup{|x′(x)| : x ∈ A} < ∞, while A is bounded if sup{||x|| : x ∈
   A} < ∞. Show A is weakly bounded if and only if it is bounded.
6. Let f be a 2π periodic locally integrable function on R. The Fourier series
   for f is given by

   ∑_{k=−∞}^{∞} a_k e^{ikx} ≡ lim_{n→∞} ∑_{k=−n}^{n} a_k e^{ikx} ≡ lim_{n→∞} Sₙf(x)

   where

   a_k = (1/2π) ∫_{−π}^{π} e^{−ikx} f(x) dx.

   Show

   Sₙf(x) = ∫_{−π}^{π} Dₙ(x − y) f(y) dy

   where

   Dₙ(t) = sin((n + ½)t) / (2π sin(t/2)).

   Verify that ∫_{−π}^{π} Dₙ(t) dt = 1. Also show that if g ∈ L¹(R), then

   lim_{a→∞} ∫_R g(x) sin(ax) dx = 0.

   This last is called the Riemann Lebesgue lemma. Hint: For the last part,
   assume first that g ∈ C_c^∞(R) and integrate by parts. Then exploit density of
   this set of functions in L¹(R).
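A quick numerical check of the two assertions in this problem may be helpful; the following sketch is not part of the original text, and the test function g is a hypothetical choice for illustration.

    import numpy as np
    from scipy.integrate import quad

    def dirichlet(n, t):
        # D_n(t) = sin((n + 1/2) t) / (2 pi sin(t/2)); at t = 0 the value is (2n + 1)/(2 pi)
        if abs(np.sin(t / 2)) < 1e-12:
            return (2 * n + 1) / (2 * np.pi)
        return np.sin((n + 0.5) * t) / (2 * np.pi * np.sin(t / 2))

    n = 7
    integral, _ = quad(lambda t: dirichlet(n, t), -np.pi, np.pi, limit=200)
    print(integral)                      # close to 1, as the problem asserts

    # Riemann Lebesgue lemma for a smooth, essentially compactly supported g
    g = lambda x: np.exp(-x ** 2)
    for a in (1.0, 10.0, 100.0):
        val, _ = quad(g, -5, 5, weight='sin', wvar=a)
        print(a, val)                    # tends to 0 as a grows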
472 BANACH SPACES

7. It turns out that the Fourier series sometimes converges to the func-
   tion pointwise. Suppose f is 2π periodic and Holder continuous. That is
   |f(x) − f(y)| ≤ K|x − y|^θ where θ ∈ (0, 1]. Show that if f is like this, then
   the Fourier series converges to f at every point. Next modify your argument
   to show that if at every point, x, |f(x+) − f(y)| ≤ K|x − y|^θ for y close
   enough to x and larger than x and |f(x−) − f(y)| ≤ K|x − y|^θ for every
   y close enough to x and smaller than x, then Sₙf(x) → (f(x+) + f(x−))/2, the
   midpoint of the jump of the function. Hint: Use Problem 6.
8. Let Y = {f such that f is continuous, defined on R, and 2π periodic}. Define
   ||f||_Y = sup{|f(x)| : x ∈ [−π, π]}. Show that (Y, || ||_Y) is a Banach space. Let
   x ∈ R and define Lₙ(f) = Sₙf(x). Show Lₙ ∈ Y′ but limₙ→∞ ||Lₙ|| = ∞.
   Show that for each x ∈ R, there exists a dense Gδ subset of Y such that for f
   in this set, |Sₙf(x)| is unbounded. Finally, show there is a dense Gδ subset of
   Y having the property that |Sₙf(x)| is unbounded on the rational numbers.
   Hint: To do the first part, let f(y) approximate sgn(Dₙ(x − y)). Here sgn r =
   1 if r > 0, −1 if r < 0 and 0 if r = 0. This rules out one possibility of the
   uniform boundedness principle. After this, show the countable intersection of
   dense Gδ sets must also be a dense Gδ set.
9. Let θ ∈ (0, 1]. Define, for X a compact subset of R^p,

   C^θ(X; Rⁿ) ≡ {f ∈ C(X; Rⁿ) : ρ_θ(f) + ||f|| ≡ ||f||_θ < ∞}

   where

   ||f|| ≡ sup{|f(x)| : x ∈ X}

   and

   ρ_θ(f) ≡ sup{ |f(x) − f(y)| / |x − y|^θ : x, y ∈ X, x ≠ y }.

   Show that (C^θ(X; Rⁿ), || ||_θ) is a complete normed linear space. This is
   called a Holder space. What would this space consist of if θ > 1?
10. Let X be the Holder functions which are periodic of period 2π. Define
    Lₙf(x) = Sₙf(x) where Lₙ : X → Y for Y given in Problem 8. Show ||Lₙ||
    is bounded independent of n. Conclude that Lₙf → f in Y for all f ∈ X. In
    other words, for the Holder continuous and 2π periodic functions, the Fourier
    series converges to the function uniformly. Hint: Lₙf(x) is given by

    Lₙf(x) = ∫_{−π}^{π} Dₙ(y) f(x − y) dy

    where f(x − y) = f(x) + g(x, y) with |g(x, y)| ≤ C|y|^θ. Use the fact the
    Dirichlet kernel integrates to one to write

    |∫_{−π}^{π} Dₙ(y) f(x − y) dy| ≤ |∫_{−π}^{π} Dₙ(y) f(x) dy|   ( = |f(x)| )
      + C |∫_{−π}^{π} sin((n + ½)y) (g(x, y) / sin(y/2)) dy|.

    Show the functions, y → g(x, y)/sin(y/2) are bounded in L¹ independent of
    x and get a uniform bound on ||Lₙ||. Now use a similar argument to show
    {Lₙf} is equicontinuous in addition to being uniformly bounded. In doing
    this you might proceed as follows. Show

    |Lₙf(x) − Lₙf(x′)| ≤ |∫_{−π}^{π} Dₙ(y) (f(x − y) − f(x′ − y)) dy|
      ≤ ||f||_θ |x − x′|^θ
      + |∫_{−π}^{π} sin((n + ½)y) ((f(x − y) − f(x)) − (f(x′ − y) − f(x′))) / sin(y/2) dy|.

    Then split this last integral into two cases, one for |y| < δ and one where
    |y| ≥ δ. If Lₙf fails to converge to f uniformly, then there exists ε > 0 and a
    subsequence, n_k such that ||L_{n_k}f − f||_∞ ≥ ε where this is the norm in Y or
    equivalently the sup norm on [−π, π]. By the Arzela Ascoli theorem, there is
    a further subsequence, L_{n_{k_l}}f which converges uniformly on [−π, π]. But by
    Problem 7, Lₙf(x) → f(x).

11. Let X be a normed linear space and let M be a convex open set containing
    0. Define

    ρ(x) = inf{t > 0 : x/t ∈ M}.

    Show ρ is a gauge function defined on X. This particular example is called a
    Minkowski functional. It is of fundamental importance in the study of locally
    convex topological vector spaces. A set, M, is convex if λx + (1 − λ)y ∈ M
    whenever λ ∈ [0, 1] and x, y ∈ M.

12. The Hahn Banach theorem can be used to establish separation theorems.
    Let M be an open convex set containing 0. Let x ∉ M. Show there exists
    x′ ∈ X′ such that Re x′(x) ≥ 1 > Re x′(y) for all y ∈ M. Hint: If y ∈
    M, ρ(y) < 1. Show this. If x ∉ M, ρ(x) ≥ 1. Try f(αx) = αρ(x) for α ∈ R.
    Then extend f to the whole space using the Hahn Banach theorem and call
    the result F, show F is continuous, then fix it so F is the real part of x′ ∈ X′.

13. A Banach space is said to be strictly convex if whenever ||x|| = ||y|| and
    x ≠ y, then

    ||(x + y)/2|| < ||x||.

    F : X → X′ is said to be a duality map if it satisfies the following: a.)
    ||F(x)|| = ||x||. b.) F(x)(x) = ||x||². Show that if X is strictly convex, then
    such a duality map exists. The duality map is an attempt to duplicate some
    of the features of the Riesz map in Hilbert space. This Riesz map is the map
    which takes a Hilbert space to its dual defined as follows.

    R(x)(y) = (y, x)

    The Riesz representation theorem for Hilbert space says this map is onto.
    Hint: For an arbitrary Banach space, let

    F(x) ≡ { x′ : ||x′|| ≤ ||x|| and x′(x) = ||x||² }

    Show F(x) ≠ ∅ by using the Hahn Banach theorem on f(αx) = α||x||².
    Next show F(x) is closed and convex. Finally show that you can replace
    the inequality in the definition of F(x) with an equal sign. Now use strict
    convexity to show there is only one element in F(x).

14. Prove the following theorem which is an improved version of the open mapping
theorem, [12]. Let X and Y be Banach spaces and let A L (X, Y ). Then
the following are equivalent.
AX = Y,
A is an open map.
Note this gives the equivalence between A being onto and A being an open
map. The open mapping theorem says that if A is onto then it is open.

15. Suppose D ⊆ X and D is dense in X. Suppose L : D → Y is linear and
    ||Lx|| ≤ K||x|| for all x ∈ D. Show there is a unique extension of L, L̃, defined
    on all of X with ||L̃x|| ≤ K||x|| and L̃ is linear. You do not get uniqueness
    when you use the Hahn Banach theorem. Therefore, in the situation of this
    problem, it is better to use this result.

16. A Banach space is uniformly convex if whenever ||xₙ||, ||yₙ|| ≤ 1 and
    ||xₙ + yₙ|| → 2, it follows that ||xₙ − yₙ|| → 0. Show uniform convexity
    implies strict convexity (See Problem 13). Hint: Suppose it is not strictly
    convex. Then there exist ||x|| and ||y|| both equal to 1 and ||(x + y)/2|| = 1.
    Consider xₙ ≡ x and yₙ ≡ y, and use the conditions for uniform convexity to
    get a contradiction. It can be shown that L^p is uniformly convex whenever
    ∞ > p > 1. See Hewitt and Stromberg [28] or Ray [38].

17. Show that a closed subspace of a reflexive Banach space is reflexive. This is
    done in the chapter. However, try to do it yourself.

18. xₙ converges weakly to x if for every x′ ∈ X′, x′(xₙ) → x′(x). Here xₙ ⇀ x
    denotes weak convergence. Show that if ||xₙ − x|| → 0, then xₙ ⇀ x.

19. Show that if X is uniformly convex, then if xₙ ⇀ x and ||xₙ|| → ||x||, it
    follows ||xₙ − x|| → 0. Hint: Use Lemma 18.2.9 to obtain f ∈ X′ with ||f|| = 1
    and f(x) = ||x||. See Problem 16 for the definition of uniform convexity.
    Now by the weak convergence, you can argue that if x ≠ 0, f(xₙ/||xₙ||) →
    f(x/||x||). You also might try to show this in the special case where ||xₙ|| =
    ||x|| = 1.

20. Suppose L ∈ L(X, Y) and M ∈ L(Y, Z). Show M L ∈ L(X, Z) and that
    (M L)∗ = L∗ M∗.
Hilbert Spaces

19.1 Basic Theory


Definition 19.1.1 Let X be a vector space. An inner product is a mapping from
X × X to C if X is complex and from X × X to R if X is real, denoted by (x, y)
which satisfies the following.

(x, x) ≥ 0,  (x, x) = 0 if and only if x = 0,   (19.1.1)

(x, y) = \overline{(y, x)}.   (19.1.2)

For a, b ∈ C and x, y, z ∈ X,

(ax + by, z) = a(x, z) + b(y, z).   (19.1.3)

Note that 19.1.2 and 19.1.3 imply (x, ay + bz) = \overline{a}(x, y) + \overline{b}(x, z). Such a vector
space is called an inner product space.

The Cauchy Schwarz inequality is fundamental for the study of inner product
spaces.

Theorem 19.1.2 (Cauchy Schwarz) In any inner product space

|(x, y)| ≤ ||x|| ||y||.

Proof: Let θ ∈ C, |θ| = 1, be such that (x, θy) = |(x, y)| = Re(x, θy). Let

F(t) = (x + tθy, x + tθy).

If y = 0 there is nothing to prove because

(x, 0) = (x, 0 + 0) = (x, 0) + (x, 0)

and so (x, 0) = 0. Thus, it can be assumed y ≠ 0. Then from the axioms of the
inner product,

F(t) = ||x||² + 2t Re(x, θy) + t²||y||² ≥ 0.

This yields

||x||² + 2t|(x, y)| + t²||y||² ≥ 0.

Since this inequality holds for all t ∈ R, it follows from the quadratic formula that

4|(x, y)|² − 4||x||² ||y||² ≤ 0.

This yields the conclusion and proves the theorem. ∎


Proposition 19.1.3 For an inner product space, ||x|| ≡ (x, x)^{1/2} does specify a
norm.

Proof: All the axioms are obvious except the triangle inequality. To verify this,

||x + y||² ≡ (x + y, x + y) ≡ ||x||² + ||y||² + 2 Re(x, y)
  ≤ ||x||² + ||y||² + 2|(x, y)|
  ≤ ||x||² + ||y||² + 2||x|| ||y|| = (||x|| + ||y||)². ∎

The following lemma is called the parallelogram identity.

Lemma 19.1.4 In an inner product space,

||x + y||² + ||x − y||² = 2||x||² + 2||y||².

The proof, a straightforward application of the inner product axioms, is left to


the reader.

Lemma 19.1.5 For x ∈ H, an inner product space,

||x|| = sup_{||y|| ≤ 1} |(x, y)|.   (19.1.4)

Proof: By the Cauchy Schwarz inequality, if x ≠ 0,

||x|| ≥ sup_{||y|| ≤ 1} |(x, y)| ≥ (x, x/||x||) = ||x||.

It is obvious that 19.1.4 holds in the case that x = 0. ∎

Denition 19.1.6 A Hilbert space is an inner product space which is complete.


Thus a Hilbert space is a Banach space in which the norm comes from an inner
product as described above.

In Hilbert space, one can dene a projection map onto closed convex nonempty
sets.

Definition 19.1.7 A set, K, is convex if whenever λ ∈ [0, 1] and x, y ∈ K, λx +
(1 − λ)y ∈ K.

Theorem 19.1.8 Let K be a closed convex nonempty subset of a Hilbert space, H,
and let x ∈ H. Then there exists a unique point Px ∈ K such that ||Px − x|| ≤
||y − x|| for all y ∈ K.

Proof: Consider uniqueness. Suppose that z₁ and z₂ are two elements of K
such that for i = 1, 2,

||zᵢ − x|| ≤ ||y − x||   (19.1.5)

for all y ∈ K. Also, note that since K is convex,

(z₁ + z₂)/2 ∈ K.

Therefore, by the parallelogram identity,

||z₁ − x||² ≤ ||(z₁ + z₂)/2 − x||² = ||(z₁ − x)/2 + (z₂ − x)/2||²
  = 2( ||(z₁ − x)/2||² + ||(z₂ − x)/2||² ) − ||(z₁ − z₂)/2||²
  = ½||z₁ − x||² + ½||z₂ − x||² − ||(z₁ − z₂)/2||²
  ≤ ||z₁ − x||² − ||(z₁ − z₂)/2||²,

where the last inequality holds because of 19.1.5 letting zᵢ = z₂ and y = z₁. Hence
z₁ = z₂ and this shows uniqueness.
Now let λ = inf{||x − y|| : y ∈ K} and let yₙ be a minimizing sequence. This
means {yₙ} ⊆ K satisfies limₙ→∞ ||x − yₙ|| = λ. Now the following follows from
properties of the norm.

||yₙ − x + y_m − x||² = 4 ||(yₙ + y_m)/2 − x||²

Then by the parallelogram identity, and convexity of K, (yₙ + y_m)/2 ∈ K, and so

||(yₙ − x) − (y_m − x)||² = 2( ||yₙ − x||² + ||y_m − x||² ) − 4 ||(yₙ + y_m)/2 − x||²
  ≤ 2( ||yₙ − x||² + ||y_m − x||² ) − 4λ².

Since ||x − yₙ|| → λ, this shows {yₙ − x} is a Cauchy sequence. Thus also {yₙ} is
a Cauchy sequence. Since H is complete, yₙ → y for some y ∈ H which must be in
K because K is closed. Therefore

||x − y|| = limₙ→∞ ||x − yₙ|| = λ.

Let Px = y. ∎

Corollary 19.1.9 Let K be a closed, convex, nonempty subset of a Hilbert space,
H, and let x ∈ H. Then for z ∈ K, z = Px if and only if

Re(x − z, y − z) ≤ 0   (19.1.6)

for all y ∈ K.
Before proving this, consider what it says in the case where the Hilbert space is
Rⁿ.

[Figure: the point x outside K, its projection z ∈ K, another point y ∈ K, and the
angle θ between x − z and y − z.]

Condition 19.1.6 says the angle, θ, shown in the diagram is always obtuse. Re-
member from calculus, the sign of x · y is the same as the sign of the cosine of the
included angle between x and y. Thus, in finite dimensions, the conclusion of this
corollary says that z = Px exactly when the indicated angle is obtuse. Surely the
picture suggests this is reasonable.
The inequality 19.1.6 is an example of a variational inequality and this corollary
characterizes the projection of x onto K as the solution of this variational inequality.
Proof of Corollary: Let z ∈ K and let y ∈ K also. Since K is convex, it
follows that if t ∈ [0, 1],

z + t(y − z) = (1 − t)z + ty ∈ K.

Furthermore, every point of K can be written in this way. (Let t = 1 and y ∈ K.)
Therefore, z = Px if and only if for all y ∈ K and t ∈ [0, 1],

||x − (z + t(y − z))||² = ||(x − z) − t(y − z)||² ≥ ||x − z||²

for all t ∈ [0, 1] and y ∈ K if and only if for all t ∈ [0, 1] and y ∈ K

||x − z||² + t²||y − z||² − 2t Re(x − z, y − z) ≥ ||x − z||²

if and only if for all t ∈ [0, 1],

t²||y − z||² − 2t Re(x − z, y − z) ≥ 0.   (19.1.7)

Now this is equivalent to 19.1.7 holding for all t ∈ (0, 1). Therefore, dividing by
t ∈ (0, 1), 19.1.7 is equivalent to

t||y − z||² − 2 Re(x − z, y − z) ≥ 0

for all t ∈ (0, 1) which is equivalent to 19.1.6. This proves the corollary. ∎

Proof: Let x, x′ ∈ H. Then by Corollary 19.1.9,

Re(x′ − Px′, Px − Px′) ≤ 0,  Re(x − Px, Px′ − Px) ≤ 0.

Hence

0 ≤ Re(x − Px, Px − Px′) − Re(x′ − Px′, Px − Px′)
  = Re(x − x′, Px − Px′) − ||Px − Px′||²

and so

||Px − Px′||² ≤ |x − x′| |Px − Px′|.

This proves the corollary. ∎
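The next two facts, the variational inequality 19.1.6 and the nonexpansiveness just proved, are easy to see numerically in a finite dimensional example. The following sketch is not from the text; the convex set K = [0, 1]³ is a hypothetical choice whose Euclidean projection is coordinatewise clipping.

    import numpy as np

    def P(x):
        # Euclidean projection onto the closed convex box K = [0,1]^3
        return np.clip(x, 0.0, 1.0)

    rng = np.random.default_rng(1)
    x = rng.standard_normal(3) * 3
    xp = rng.standard_normal(3) * 3
    print(np.linalg.norm(P(x) - P(xp)) <= np.linalg.norm(x - xp))   # Corollary 19.1.10

    # Variational inequality of Corollary 19.1.9: (x - Px, y - Px) <= 0 for y in K
    for _ in range(5):
        y = rng.random(3)                 # a point of K
        print(np.dot(x - P(x), y - P(x)) <= 1e-12)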


The next corollary is a more general form for the Brouwer fixed point theorem.

Corollary 19.1.11 Let f : K → K where K is a convex compact subset of Rⁿ.
Then f has a fixed point.

Proof: Let K ⊆ B(0, R) and let P be the projection map onto K. Then
consider the map f ∘ P which maps B(0, R) to B(0, R) and is continuous. By the
Brouwer fixed point theorem for balls, this map has a fixed point. Thus there exists
x such that

f ∘ P(x) = x.

Now the equation also requires x ∈ K and so P(x) = x. Hence f(x) = x. ∎

Definition 19.1.12 Let H be a vector space and let U and V be subspaces. U ⊕ V =
H if every element of H can be written as a sum of an element of U and an element
of V in a unique way.

The case where the closed convex set is a closed subspace is of special importance
and in this case the above corollary implies the following.

Corollary 19.1.13 Let K be a closed subspace of a Hilbert space, H, and let x ∈ H.
Then for z ∈ K, z = Px if and only if

(x − z, y) = 0   (19.1.8)

for all y ∈ K. Furthermore, H = K ⊕ K⊥ where

K⊥ ≡ {x ∈ H : (x, k) = 0 for all k ∈ K}

and

||x||² = ||x − Px||² + ||Px||².   (19.1.9)

Proof: Since K is a subspace, the condition 19.1.6 implies Re(x − z, y) ≤ 0
for all y ∈ K. Replacing y with −y, it follows Re(x − z, −y) ≤ 0 which implies
Re(x − z, y) ≥ 0 for all y. Therefore, Re(x − z, y) = 0 for all y ∈ K. Now let
|α| = 1 and α(x − z, y) = |(x − z, y)|. Since K is a subspace, it follows ᾱy ∈ K for
all y ∈ K. Therefore,

0 = Re(x − z, ᾱy) = (x − z, ᾱy) = α(x − z, y) = |(x − z, y)|.

This shows that z = Px, if and only if 19.1.8.
For x ∈ H, x = x − Px + Px and from what was just shown, x − Px ∈ K⊥
and Px ∈ K. This shows that K⊥ + K = H. Is there only one way to write
a given element of H as a sum of a vector in K with a vector in K⊥? Suppose
y + z = y₁ + z₁ where z, z₁ ∈ K⊥ and y, y₁ ∈ K. Then (y − y₁) = (z₁ − z) and
so from what was just shown, (y − y₁, y − y₁) = (y − y₁, z₁ − z) = 0 which shows
y₁ = y and consequently z₁ = z. Finally, letting z = Px,

||x||² = (x − z + z, x − z + z) = ||x − z||² + (x − z, z) + (z, x − z) + ||z||²
  = ||x − z||² + ||z||².

This proves the corollary. ∎
The following theorem is called the Riesz representation theorem for the dual of
a Hilbert space. If z ∈ H then define an element f ∈ H′ by the rule (x, z) ≡ f(x). It
follows from the Cauchy Schwarz inequality and the properties of the inner product
that f ∈ H′. The Riesz representation theorem says that all elements of H′ are of
this form.

Theorem 19.1.14 Let H be a Hilbert space and let f ∈ H′. Then there exists a
unique z ∈ H such that

f(x) = (x, z)   (19.1.10)

for all x ∈ H.

Proof: Letting y, w ∈ H the assumption that f is linear implies

f(yf(w) − f(y)w) = f(w) f(y) − f(y) f(w) = 0

which shows that yf(w) − f(y)w ∈ f⁻¹(0), which is a closed subspace of H since
f is continuous. If f⁻¹(0) = H, then f is the zero map and z = 0 is the unique
element of H which satisfies 19.1.10. If f⁻¹(0) ≠ H, pick u ∉ f⁻¹(0) and let
w ≡ u − Pu ≠ 0. Thus Corollary 19.1.13 implies (y, w) = 0 for all y ∈ f⁻¹(0). In
particular, let y = xf(w) − f(x)w where x ∈ H is arbitrary. Therefore,

0 = (f(w)x − f(x)w, w) = f(w)(x, w) − f(x)||w||².

Thus, solving for f(x) and using the properties of the inner product,

f(x) = (x, \overline{f(w)} w / ||w||²).

Let z = \overline{f(w)} w / ||w||². This proves the existence of z. If f(x) = (x, zᵢ), i = 1, 2,
for all x ∈ H, then (x, z₁ − z₂) = 0 for all x ∈ H which implies, upon taking
x = z₁ − z₂ that z₁ = z₂. This proves the theorem. ∎
If R : H → H′ is defined by Rx(y) ≡ (y, x), the Riesz representation theorem
above states this map is onto. This map is called the Riesz map. It is routine to
show R is linear and |Rx| = |x|.
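The construction in the proof is easy to see concretely in finite dimensions. The following sketch is not part of the text; H = R⁴ and the functional f(x) = (x, c) for a fixed vector c are hypothetical choices, and the representer z is recovered exactly as in the proof from w = u − Pu for any u outside ker f.

    import numpy as np

    c = np.array([1.0, -2.0, 0.5, 3.0])
    f = lambda x: x @ c

    # Orthogonal projection onto ker f = {x : (x, c) = 0}
    Pker = lambda x: x - (x @ c) / (c @ c) * c

    u = np.array([1.0, 1.0, 1.0, 1.0])    # f(u) = 2.5 != 0, so u lies outside ker f
    w = u - Pker(u)
    z = f(w) * w / (w @ w)                # the formula from the proof (real case)
    print(np.allclose(z, c))              # the representer is c, as it must be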

19.2 Approximations In Hilbert Space


The Gram Schmidt process applies in any Hilbert space.

Theorem 19.2.1 Let {x₁, ⋯, xₙ} be a basis for M a subspace of H a Hilbert
space. Then there exists an orthonormal basis for M, {u₁, ⋯, uₙ} which has
the property that for each k ≤ n, span(x₁, ⋯, x_k) = span(u₁, ⋯, u_k). Also if
{x₁, ⋯, xₙ} ⊆ H, then

span(x₁, ⋯, xₙ)

is a closed subspace.

Proof: Let {x₁, ⋯, xₙ} be a basis for M. Let u₁ ≡ x₁/|x₁|. Thus for k = 1,
span(u₁) = span(x₁) and {u₁} is an orthonormal set. Now suppose for some k < n,
u₁, ⋯, u_k have been chosen such that (u_j, u_l) = δ_{jl} and span(x₁, ⋯, x_k) =
span(u₁, ⋯, u_k). Then define

u_{k+1} ≡ ( x_{k+1} − ∑_{j=1}^{k} (x_{k+1}, u_j) u_j ) / || x_{k+1} − ∑_{j=1}^{k} (x_{k+1}, u_j) u_j ||,   (19.2.11)

where the denominator is not equal to zero because the x_j form a basis and so

x_{k+1} ∉ span(x₁, ⋯, x_k) = span(u₁, ⋯, u_k).

Thus by induction,

u_{k+1} ∈ span(u₁, ⋯, u_k, x_{k+1}) = span(x₁, ⋯, x_k, x_{k+1}).

Also, x_{k+1} ∈ span(u₁, ⋯, u_k, u_{k+1}) which is seen easily by solving 19.2.11 for x_{k+1}
and it follows

span(x₁, ⋯, x_k, x_{k+1}) = span(u₁, ⋯, u_k, u_{k+1}).

If l ≤ k,

(u_{k+1}, u_l) = C ( (x_{k+1}, u_l) − ∑_{j=1}^{k} (x_{k+1}, u_j)(u_j, u_l) )
  = C ( (x_{k+1}, u_l) − ∑_{j=1}^{k} (x_{k+1}, u_j) δ_{lj} )
  = C ( (x_{k+1}, u_l) − (x_{k+1}, u_l) ) = 0.

The vectors, {u_j}_{j=1}^{n}, generated in this way are therefore an orthonormal basis
because each vector has unit length.
Consider the second claim about finite dimensional subspaces. Without loss of
generality, assume {x₁, ⋯, xₙ} is linearly independent. If it is not, delete vectors
until a linearly independent set is obtained. Then by the first part, span(x₁, ⋯, xₙ) =
span(u₁, ⋯, uₙ) ≡ M where the uᵢ are an orthonormal set of vectors. Suppose
{y_k} ⊆ M and y_k → y ∈ H. Is y ∈ M? Let

y_k ≡ ∑_{j=1}^{n} c_j^k u_j.

Then let c^k ≡ (c₁^k, ⋯, cₙ^k)^T. Then

|c^k − c^l|² ≡ ∑_{j=1}^{n} |c_j^k − c_j^l|² = ( ∑_{j=1}^{n} (c_j^k − c_j^l) u_j, ∑_{j=1}^{n} (c_j^k − c_j^l) u_j ) = ||y_k − y_l||²

which shows {c^k} is a Cauchy sequence in Fⁿ and so it converges to c ∈ Fⁿ. Thus

y = lim_{k→∞} y_k = lim_{k→∞} ∑_{j=1}^{n} c_j^k u_j = ∑_{j=1}^{n} c_j u_j ∈ M.

This completes the proof.
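The Gram Schmidt process of 19.2.11 is easy to carry out numerically. The following sketch is not part of the text; the vectors x₁, x₂, x₃ in R⁵ are random hypothetical data, and any linearly independent set would do.

    import numpy as np

    def gram_schmidt(X):
        # Columns of X are x_1, ..., x_n; returns U with orthonormal columns such that
        # span(x_1, ..., x_k) = span(u_1, ..., u_k) for each k, as in formula 19.2.11.
        U = []
        for k in range(X.shape[1]):
            v = X[:, k].copy()
            for u in U:
                v -= (X[:, k] @ u) * u        # subtract the components along earlier u_j
            U.append(v / np.linalg.norm(v))
        return np.column_stack(U)

    rng = np.random.default_rng(2)
    X = rng.standard_normal((5, 3))
    U = gram_schmidt(X)
    print(np.allclose(U.T @ U, np.eye(3)))    # (u_j, u_l) = delta_{jl}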

Theorem 19.2.2 Let M be the span of {u₁, ⋯, uₙ} in a Hilbert space, H and let
y ∈ H. Then Py is given by

Py = ∑_{k=1}^{n} (y, u_k) u_k   (19.2.12)

and the distance is given by

( |y|² − ∑_{k=1}^{n} |(y, u_k)|² )^{1/2}.   (19.2.13)

Proof:

( y − ∑_{k=1}^{n} (y, u_k) u_k, u_p ) = (y, u_p) − ∑_{k=1}^{n} (y, u_k)(u_k, u_p)
  = (y, u_p) − (y, u_p) = 0.

It follows that

( y − ∑_{k=1}^{n} (y, u_k) u_k, u ) = 0

for all u ∈ M and so by Corollary 19.1.13 this verifies 19.2.12.
The square of the distance, d is given by

d² = ( y − ∑_{k=1}^{n} (y, u_k) u_k, y − ∑_{k=1}^{n} (y, u_k) u_k )
  = |y|² − 2 ∑_{k=1}^{n} |(y, u_k)|² + ∑_{k=1}^{n} |(y, u_k)|²

and this shows 19.2.13.
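Formulas 19.2.12 and 19.2.13 can be checked directly in a small example. The following sketch is not from the text; the orthonormal columns in R⁶ are hypothetical data produced by a QR factorization.

    import numpy as np

    rng = np.random.default_rng(3)
    U, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # columns u_1, u_2, u_3 are orthonormal
    y = rng.standard_normal(6)

    coeffs = U.T @ y
    Py = U @ coeffs                                    # formula 19.2.12
    dist = np.sqrt(y @ y - np.sum(coeffs ** 2))        # formula 19.2.13
    print(np.allclose(dist, np.linalg.norm(y - Py)))   # the two agree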


What if the subspace is the span of vectors which are not orthonormal? There
is a very interesting formula for the distance between a point of a Hilbert space and
a nite dimensional subspace spanned by an arbitrary basis.

Definition 19.2.3 Let {x₁, ⋯, xₙ} ⊆ H, a Hilbert space. Define G(x₁, ⋯, xₙ) to be
the n × n matrix whose ij th entry is (xᵢ, x_j),

G(x₁, ⋯, xₙ) ≡ [ (xᵢ, x_j) ]_{i,j=1}^{n}.   (19.2.14)

This is sometimes called the Gram matrix. Also define G(x₁, ⋯, xₙ) as the determinant
of this matrix, also called the Gram determinant,

G(x₁, ⋯, xₙ) ≡ det [ (xᵢ, x_j) ]_{i,j=1}^{n}.   (19.2.15)

The theorem is the following.

Theorem 19.2.4 Let M = span(x₁, ⋯, xₙ) ⊆ H, a real Hilbert space where
{x₁, ⋯, xₙ} is a basis and let y ∈ H. Then letting d be the distance from y to M,

d² = G(x₁, ⋯, xₙ, y) / G(x₁, ⋯, xₙ).   (19.2.16)

Proof: By Theorem 19.2.1 M is a closed subspace of H. Let ∑_{k=1}^{n} α_k x_k be the
element of M which is closest to y. Then by Corollary 19.1.13,

( y − ∑_{k=1}^{n} α_k x_k, x_p ) = 0

for each p = 1, 2, ⋯, n. This yields the system of equations,

(y, x_p) = ∑_{k=1}^{n} (x_p, x_k) α_k,  p = 1, 2, ⋯, n.   (19.2.17)

Also by Corollary 19.1.13,

||y||² = || y − ∑_{k=1}^{n} α_k x_k ||² + || ∑_{k=1}^{n} α_k x_k ||²,

the first term on the right being d², and so, using 19.2.17,

||y||² = d² + ∑_j ( ∑_k α_k (x_k, x_j) ) α_j
  = d² + ∑_j (y, x_j) α_j   (19.2.18)
  ≡ d² + y_x^T α   (19.2.19)

in which

y_x^T ≡ ((y, x₁), ⋯, (y, xₙ)),  α^T ≡ (α₁, ⋯, αₙ).

Then 19.2.17 and 19.2.18 imply the following system

[ G(x₁, ⋯, xₙ)  0 ] [ α  ]   [ y_x    ]
[ y_x^T         1 ] [ d² ] = [ ||y||² ]

By Cramers rule,

d² = det [ G(x₁, ⋯, xₙ)  y_x ; y_x^T  ||y||² ] / det [ G(x₁, ⋯, xₙ)  0 ; y_x^T  1 ]
  = det [ G(x₁, ⋯, xₙ)  y_x ; y_x^T  ||y||² ] / det( G(x₁, ⋯, xₙ) )
  = det( G(x₁, ⋯, xₙ, y) ) / det( G(x₁, ⋯, xₙ) ) = G(x₁, ⋯, xₙ, y) / G(x₁, ⋯, xₙ)

and this proves the theorem.
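A quick numerical check of 19.2.16 is possible in finite dimensions. The sketch below is not from the text; the basis x₁, x₂, x₃ in R⁵ and the point y are hypothetical random data, and the Gram determinant ratio is compared against the distance computed by orthogonal projection.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.standard_normal((5, 3))                    # columns x_1, x_2, x_3
    y = rng.standard_normal(5)

    def gram_det(*vecs):
        V = np.column_stack(vecs)
        return np.linalg.det(V.T @ V)                  # the Gram determinant of 19.2.15

    d2 = gram_det(X[:, 0], X[:, 1], X[:, 2], y) / gram_det(X[:, 0], X[:, 1], X[:, 2])

    Q, _ = np.linalg.qr(X)                             # orthonormal basis of span(X)
    d2_direct = np.linalg.norm(y - Q @ (Q.T @ y)) ** 2
    print(np.allclose(d2, d2_direct))                  # formula 19.2.16 agrees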

19.3 Orthonormal Sets


The concept of an orthonormal set of vectors is a generalization of the notion of the
standard basis vectors of Rn or Cn .
Definition 19.3.1 Let H be a Hilbert space. S ⊆ H is called an orthonormal set
if ||x|| = 1 for all x ∈ S and (x, y) = 0 if x, y ∈ S and x ≠ y. For any set, D,

D⊥ ≡ {x ∈ H : (x, d) = 0 for all d ∈ D}.

If S is a set, span(S) is the set of all finite linear combinations of vectors from S.

You should verify that D⊥ is always a closed subspace of H.

Theorem 19.3.2 In any separable Hilbert space, H, there exists a countable or-
thonormal set, S = {xᵢ} such that the span of these vectors is dense in H. Further-
more, if span(S) is dense, then for x ∈ H,

x = ∑_{i=1}^{∞} (x, xᵢ) xᵢ ≡ lim_{n→∞} ∑_{i=1}^{n} (x, xᵢ) xᵢ.   (19.3.20)

Proof: Let F denote the collection of all orthonormal subsets of H. F is
nonempty because {x} ∈ F where ||x|| = 1. The set, F is a partially ordered set
with the order given by set inclusion. By the Hausdorff maximal theorem, there
exists a maximal chain, C in F. Then let S ≡ ∪C. It follows S must be a maximal
orthonormal set of vectors. Why? It remains to verify that S is countable, span(S)
is dense, and the condition, 19.3.20 holds. To see S is countable note that if x, y ∈ S,
then

||x − y||² = ||x||² + ||y||² − 2 Re(x, y) = ||x||² + ||y||² = 2.

Therefore, the open sets, B(x, ½) for x ∈ S are disjoint and cover S. Since H is
assumed to be separable, there exists a point from a countable dense set in each of
these disjoint balls showing there can only be countably many of the balls and that
consequently, S is countable as claimed.
It remains to verify 19.3.20 and that span(S) is dense. If span(S) is not dense,
then \overline{span(S)} is a closed proper subspace of H and letting y ∉ \overline{span(S)},

z ≡ (y − Py) / ||y − Py|| ∈ \overline{span(S)}⊥.

But then S ∪ {z} would be a larger orthonormal set of vectors contradicting the
maximality of S.

It remains to verify 19.3.20. Let S = {xᵢ}_{i=1}^{∞} and consider the problem of
choosing the constants, c_k in such a way as to minimize the expression

|| x − ∑_{k=1}^{n} c_k x_k ||² =
  ||x||² + ∑_{k=1}^{n} |c_k|² − ∑_{k=1}^{n} c_k \overline{(x, x_k)} − ∑_{k=1}^{n} \overline{c_k} (x, x_k).

This equals

||x||² + ∑_{k=1}^{n} |c_k − (x, x_k)|² − ∑_{k=1}^{n} |(x, x_k)|²

and therefore, this minimum is achieved when c_k = (x, x_k) and equals

||x||² − ∑_{k=1}^{n} |(x, x_k)|².

Now since span(S) is dense, there exists n large enough that for some choice of
constants, c_k,

|| x − ∑_{k=1}^{n} c_k x_k ||² < ε.

However, from what was just shown,

|| x − ∑_{i=1}^{n} (x, xᵢ) xᵢ ||² ≤ || x − ∑_{k=1}^{n} c_k x_k ||² < ε

showing that lim_{n→∞} ∑_{i=1}^{n} (x, xᵢ) xᵢ = x as claimed. This proves the theorem.
The proof of this theorem contains the following corollary.

Corollary 19.3.3 Let S be any orthonormal set of vectors and let

{x₁, ⋯, xₙ} ⊆ S.

Then if x ∈ H,

|| x − ∑_{k=1}^{n} c_k x_k ||² ≥ || x − ∑_{i=1}^{n} (x, xᵢ) xᵢ ||²

for all choices of constants, c_k. In addition to this, Bessels inequality holds,

||x||² ≥ ∑_{k=1}^{n} |(x, x_k)|².

If S is countable and span(S) is dense, then letting {xᵢ}_{i=1}^{∞} = S, 19.3.20 follows.

19.4 Fourier Series, An Example


In this section consider the Hilbert space, L²(0, 2π) with the inner product,

(f, g) ≡ ∫_0^{2π} f ḡ dm.

This is a Hilbert space because of the theorem which states the L^p spaces are
complete, Theorem 12.2.2 on Page 287. An example of an orthonormal set of
functions in L²(0, 2π) is

φₙ(x) ≡ (1/√(2π)) e^{inx}

for n an integer. Is it true that the span of these functions is dense in L²(0, 2π)?

Theorem 19.4.1 Let S = {φₙ}_{n∈Z}. Then span(S) is dense in L²(0, 2π).

Proof: By regularity of Lebesgue measure, it follows from Theorem 12.5.3 that
C_c(0, 2π) is dense in L²(0, 2π). Therefore, it suffices to show that for g ∈ C_c(0, 2π)
and for every ε > 0 there exists h ∈ span(S) such that ||g − h||_{L²(0,2π)} < ε.
Let T denote the points of C which are of the form e^{it} for t ∈ R. Let A denote
the algebra of functions consisting of polynomials in z and 1/z for z ∈ T. Thus a
typical such function would be one of the form

∑_{k=−m}^{m} c_k z^k

for m chosen large enough. This algebra separates the points of T because it contains
the function, p(z) = z. It annihilates no point of T because it contains the constant
function 1. Furthermore, it has the property that for f ∈ A, f̄ ∈ A. By the Stone
Weierstrass approximation theorem, Theorem 11.3.1 on Page 278, A is dense in
C(T). Now for g ∈ C_c(0, 2π), extend g to all of R to be 2π periodic. Then letting
G(e^{it}) ≡ g(t), it follows G is well defined and continuous on T. Therefore, there
exists H ∈ A such that for all t ∈ R,

|H(e^{it}) − G(e^{it})| < ε/√(2π).

Thus H(e^{it}) is of the form

H(e^{it}) = ∑_{k=−m}^{m} c_k (e^{it})^k = ∑_{k=−m}^{m} c_k e^{ikt} ∈ span(S).

Let h(t) = ∑_{k=−m}^{m} c_k e^{ikt}. Then

( ∫_0^{2π} |g − h|² dx )^{1/2} ≤ ( ∫_0^{2π} max{|g(t) − h(t)| : t ∈ [0, 2π]}² dx )^{1/2}
  = ( ∫_0^{2π} max{|G(e^{it}) − H(e^{it})| : t ∈ [0, 2π]}² dx )^{1/2}
  < ( ∫_0^{2π} ε²/(2π) dx )^{1/2} = ε.

This proves the theorem.

Corollary 19.4.2 For f ∈ L²(0, 2π),

lim_{m→∞} || f − ∑_{k=−m}^{m} (f, φ_k) φ_k ||_{L²(0,2π)} = 0.

Proof: This follows from Theorem 19.3.2 on Page 487.
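The L² convergence asserted here is easy to observe numerically. The following sketch is not part of the text; the function f(x) = x on (0, 2π) is a hypothetical choice, and the inner products are approximated by Riemann sums on a grid.

    import numpy as np

    x = np.linspace(0, 2 * np.pi, 4001)[1:-1]
    dx = x[1] - x[0]
    f = x.copy()                                        # f(x) = x, a function in L^2(0, 2 pi)

    def partial_sum(m):
        S = np.zeros_like(x, dtype=complex)
        for k in range(-m, m + 1):
            phi = np.exp(1j * k * x) / np.sqrt(2 * np.pi)
            ck = np.sum(f * np.conj(phi)) * dx          # (f, phi_k) as a Riemann sum
            S += ck * phi
        return S

    for m in (2, 8, 32, 128):
        err = np.sqrt(np.sum(np.abs(f - partial_sum(m)) ** 2) * dx)
        print(m, err)                                   # the L^2 error decreases with m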



19.5 General Theory Of Continuous Semigroups


Much more on semigroups is available in Yosida [42]. This is just an introduction
to the subject.

Definition 19.5.1 A strongly continuous semigroup defined on H, a Banach space,
is a function S : [0, ∞) → L(H, H) which satisfies the following for all x₀ ∈ H.

S(t) ∈ L(H, H),  S(t + s) = S(t) S(s),

t → S(t)x₀ is continuous,  lim_{t→0+} S(t)x₀ = x₀.

Sometimes such a semigroup is said to be C₀. It is said to have the linear operator
A as its generator if

D(A) ≡ { x : lim_{h→0} (S(h)x − x)/h exists }

and for x ∈ D(A), A is defined by

lim_{h→0} (S(h)x − x)/h ≡ Ax.

The assertion that t → S(t)x₀ is continuous and that S(t) ∈ L(H, H) is not
sufficient to say there is a bound on ||S(t)|| for all t. Also the assertion that for
each x₀,

lim_{t→0+} S(t)x₀ = x₀

is not the same as saying that S(t) → I in L(H, H). It is a much weaker assertion.
The next theorem gives information on the growth of ||S(t)||. It turns out it has
exponential growth.

Lemma 19.5.2 Let M ≡ sup{||S(t)|| : t ∈ [0, T]}. Then M < ∞.

Proof: If this is not true, then there exists tₙ ∈ [0, T] such that ||S(tₙ)|| ≥ n.
That is, the operators S(tₙ) are not uniformly bounded. By the uniform bound-
edness principle, Theorem 18.1.8, there exists x ∈ H such that {||S(tₙ)x||} is not
bounded. However, this is impossible because it is given that t → S(t)x is con-
tinuous on [0, T] and so t → ||S(t)x|| must achieve its maximum on this compact
set. ∎
Now here is the main result for growth of ||S (t)||.

Theorem 19.5.3 For M described in Lemma 19.5.2, there exists γ such that

||S(t)|| ≤ M e^{γt}.

In fact, γ can be chosen such that M^{1/T} = e^{γ}.

Proof: Let t ≥ 0 be arbitrary. Then t = mT + r(t) where 0 ≤ r(t) < T. Then by
the semigroup property

||S(t)|| = ||S(mT + r(t))|| = ||S(r(t)) S(T)^m|| ≤ M^{m+1}.

Now mT ≤ t ≤ mT + r(t) ≤ (m + 1)T and so

m ≤ t/T ≤ m + 1.

Therefore,

||S(t)|| ≤ M^{(t/T)+1} = M (M^{1/T})^t.

Let M^{1/T} ≡ e^{γ} and then

||S(t)|| ≤ M e^{γt}.

This proves the theorem.
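The bound M e^{γt} with M^{1/T} = e^{γ} can be observed in a matrix example, where S(t) = e^{tA}. The following sketch is not part of the text; the generator A and the interval [0, T] are hypothetical choices, and M is approximated by sampling the norm on a grid.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 4.0], [-1.0, 0.5]])
    T = 1.0
    ts = np.linspace(0, T, 201)
    M = max(np.linalg.norm(expm(t * A), 2) for t in ts)   # sup of ||S(t)|| on [0, T]
    gamma = np.log(M) / T                                  # chosen so that M^{1/T} = e^{gamma}

    for t in (0.5, 2.0, 5.0, 10.0):
        print(np.linalg.norm(expm(t * A), 2) <= M * np.exp(gamma * t) + 1e-9)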
Definition 19.5.4 Let S(t) be a continuous semigroup as described above. It is
called a contraction semigroup if for all t ≥ 0,

||S(t)|| ≤ 1.

It is called a bounded semigroup if there exists M such that for all t ≥ 0,

||S(t)|| ≤ M.

Note that for S(t) an arbitrary continuous semigroup satisfying

||S(t)|| ≤ M e^{γt},

it follows that the semigroup,

T(t) = e^{−γt} S(t)

is a bounded semigroup which satisfies

||T(t)|| ≤ M.
Proposition 19.5.5 Given a continuous semigroup S(t), its generator A exists
and is a closed densely defined operator. Furthermore, for

||S(t)|| ≤ M e^{γt}

and λ > γ, λI − A is onto and (λI − A)⁻¹ maps H onto D(A) and is in L(H, H).
Also for these values of λ,

(λI − A)⁻¹ x = ∫_0^{∞} e^{−λt} S(t) x dt.

For λ > γ, the following estimate holds.

||(λI − A)⁻¹|| ≤ M / (λ − γ).

Proof: First note D(A) ≠ ∅. In fact 0 ∈ D(A). It follows from Theorem 19.5.3
that for all λ large enough, one can define a Laplace transform,

R(λ)x ≡ ∫_0^{∞} e^{−λt} S(t) x dt ∈ H.

Here the integral is the ordinary improper Riemann integral. I claim each of these
is in D(A). Consider the difference quotient

( S(h) ∫_0^{∞} e^{−λt} S(t) x dt − ∫_0^{∞} e^{−λt} S(t) x dt ) / h.

Using the semigroup property and changing the variables in the first of the above
integrals, this equals

(1/h) ( e^{λh} ∫_h^{∞} e^{−λt} S(t) x dt − ∫_0^{∞} e^{−λt} S(t) x dt )
 = (1/h)(e^{λh} − 1) ∫_0^{∞} e^{−λt} S(t) x dt − e^{λh} (1/h) ∫_0^{h} e^{−λt} S(t) x dt.

The limit as h → 0 exists and equals

λ R(λ)x − x.

Thus R(λ)x ∈ D(A) as claimed and

A R(λ)x = λ R(λ)x − x.

Hence

x = (λI − A) R(λ)x.   (19.5.21)

Since x is arbitrary, this shows that for λ large enough, λI − A is onto.
Why is D(A) dense? It was shown above that R(λ)x, and therefore λR(λ)x, is in
D(A). Then for λ > γ where ||S(t)|| ≤ M e^{γt},

||λ R(λ)x − x|| = || λ ∫_0^{∞} e^{−λt} S(t) x dt − λ ∫_0^{∞} e^{−λt} x dt ||
 ≤ λ ∫_0^{∞} e^{−λt} ||S(t)x − x|| dt
 = λ ∫_0^{h} e^{−λt} ||S(t)x − x|| dt + λ ∫_h^{∞} e^{−λt} ||S(t)x − x|| dt
 ≤ λ ∫_0^{h} e^{−λt} ||S(t)x − x|| dt + λ ∫_h^{∞} e^{−(λ−γ)t} dt (M + 1)||x||.

Now since S(t)x − x → 0 as t → 0, it follows that for h sufficiently small this is

 ≤ (ε/2) λ ∫_0^{h} e^{−λt} dt + (λ/(λ − γ)) e^{−(λ−γ)h} (M + 1)||x||
 ≤ ε/2 + (λ/(λ − γ)) e^{−(λ−γ)h} (M + 1)||x|| < ε

whenever λ is large enough. Thus D(A) is dense as claimed.


Let x ∈ D(A). Then for y∗ ∈ H′,

y∗ ( ∫_0^{t} S(s) Ax ds ) = ∫_0^{t} y∗ ( S(s) lim_{h→0+} (S(h)x − x)/h ) ds.

The difference quotient is given to have a limit and so the difference quotients are
bounded. Therefore, one can use the dominated convergence theorem to take the
limit outside the integral and write the above equals

lim_{h→0+} ∫_0^{t} y∗ ( S(s) (S(h)x − x)/h ) ds
 = lim_{h→0+} y∗ ( (1/h) ( ∫_h^{t+h} S(s) x ds − ∫_0^{t} S(s) x ds ) )
 = lim_{h→0+} y∗ ( (1/h) ∫_t^{t+h} S(s) x ds − (1/h) ∫_0^{h} S(s) x ds )
 = y∗ ( S(t)x − x ).

Thus since y∗ is arbitrary, for x ∈ D(A)

S(t)x = x + ∫_0^{t} S(s) Ax ds.

Why is A closed? Suppose xₙ → x and xₙ ∈ D(A) while Axₙ → z. From what
was just shown

S(t)xₙ = xₙ + ∫_0^{t} S(s) Axₙ ds

and so, passing to the limit this yields

S(t)x = x + ∫_0^{t} S(s) z ds

which implies

lim_{t→0+} (S(t)x − x)/t = lim_{t→0+} (1/t) ∫_0^{t} S(s) z ds = z

which shows Ax = z and x ∈ D(A). Thus A is closed.


Because of 19.5.21 it follows R(λ)x = (λI − A)⁻¹ x. Also

||R(λ)x|| ≤ ∫_0^{∞} e^{−λt} ||S(t)x|| dt ≤ ∫_0^{∞} e^{−λt} M e^{γt} dt ||x|| ≤ (M/(λ − γ)) ||x||

so R(λ) = (λI − A)⁻¹ ∈ L(H, H) and this also proves the last estimate. Also from
19.5.21, R(λ) maps H onto D(A). This proves the proposition. ∎
The linear mapping (λI − A)⁻¹ is called the resolvent.
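The Laplace transform formula for the resolvent can be checked in a matrix example, where the semigroup is the matrix exponential. The following sketch is not part of the text; the matrix A, the vector x, and the value of λ are hypothetical choices with λ larger than the real parts of the eigenvalues of A.

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import quad

    A = np.array([[-1.0, 2.0], [0.0, -0.5]])
    x = np.array([1.0, 3.0])
    lam = 2.0

    resolvent_x = np.linalg.solve(lam * np.eye(2) - A, x)        # (lambda I - A)^{-1} x

    laplace_x = np.array([
        quad(lambda t: np.exp(-lam * t) * (expm(t * A) @ x)[i], 0, np.inf)[0]
        for i in range(2)
    ])                                                           # integral of e^{-lambda t} S(t) x
    print(np.allclose(resolvent_x, laplace_x, atol=1e-6))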
The above proof contains an argument which implies the following corollary.

Corollary 19.5.6 Let S(t) be a continuous semigroup and let A be its generator.
Then for 0 < a < b and x ∈ D(A),

S(b)x − S(a)x = ∫_a^{b} S(t) Ax dt

and also for t > 0 you can take the derivative from the left,

lim_{h→0+} (S(t)x − S(t − h)x)/h = S(t) Ax.

Proof: Letting y∗ ∈ H′,

y∗ ( ∫_a^{b} S(t) Ax dt ) = ∫_a^{b} y∗ ( S(t) lim_{h→0} (S(h)x − x)/h ) dt.

The difference quotients are bounded because they converge to Ax. Therefore, from
the dominated convergence theorem,

y∗ ( ∫_a^{b} S(t) Ax dt ) = lim_{h→0} ∫_a^{b} y∗ ( S(t) (S(h)x − x)/h ) dt
 = lim_{h→0} y∗ ( (1/h) ∫_{a+h}^{b+h} S(t) x dt − (1/h) ∫_a^{b} S(t) x dt )
 = lim_{h→0} y∗ ( (1/h) ∫_b^{b+h} S(t) x dt − (1/h) ∫_a^{a+h} S(t) x dt )
 = y∗ ( S(b)x − S(a)x ).

Since y∗ is arbitrary, this proves the first part. Now from what was just shown, if
t > 0 and h is small enough,

(S(t)x − S(t − h)x)/h = (1/h) ∫_{t−h}^{t} S(s) Ax ds

which converges to S(t) Ax as h → 0+. This proves the corollary. ∎


Given a closed densely dened operator, when is it the generator of a bounded
semigroup? This is answered in the following theorem which is called the Hille
Yosida theorem.

Theorem 19.5.7 Suppose A is a densely defined linear operator which has the
property that for all λ > 0,

(λI − A)⁻¹ ∈ L(H, H),

which means that λI − A : D(A) → H is one to one and onto with continuous
inverse. Suppose also that for all n ∈ N,

|| ((λI − A)⁻¹)ⁿ || ≤ M / λⁿ.   (19.5.22)

Then there exists a continuous semigroup, S(t) which has A as its generator and
satisfies ||S(t)|| ≤ M, and A is closed. In fact letting

S_λ(t) ≡ exp( t ( −λI + λ²(λI − A)⁻¹ ) ),

it follows lim_{λ→∞} S_λ(t)x = S(t)x uniformly on finite intervals. Conversely, if A
is the generator of S(t), a bounded continuous semigroup having ||S(t)|| ≤ M, then
(λI − A)⁻¹ ∈ L(H, H) for all λ > 0 and 19.5.22 holds.

Proof: Consider the operator

λ(λI − A)⁻¹ A.

On D(A), this equals

−λI + λ²(λI − A)⁻¹,   (19.5.23)

which makes sense on all of H, not just on D(A). Also this last expression equals

λ A (λI − A)⁻¹

on all of H because λI − A is given to be onto. Denote this operator as A_λ to save
notation. Thus on D(A),

λ A (λI − A)⁻¹ = λ(λI − A)⁻¹ A.

For x ∈ D(A),

|| λ(λI − A)⁻¹ x − x || = || (λI − A)⁻¹ ( λx − (λI − A)x ) ||
 = || (λI − A)⁻¹ Ax || ≤ (M/λ) ||Ax||

which converges to 0. Therefore, for x ∈ D(A),

|| A_λ x − Ax || = || λ(λI − A)⁻¹ Ax − Ax ||   (19.5.24)

so it also converges to 0. Because of 19.5.23, the operator λ A (λI − A)⁻¹ is contin-
uous. Now using 19.5.23 define an approximate semigroup

S_λ(t) ≡ e^{−λt} ∑_{k=0}^{∞} t^k λ^{2k} ((λI − A)⁻¹)^k / k!.

The sum converges in L (H, H) because it converges absolutely and L (H, H) is


complete. Here is why it converges absolutely. It follows from the assumption in
the lemma.
( )k
2
k
( ) k
t (I A)
1 tk 2 (I A)1



k! k!
k=0 k=0
k k
t M
= M et
k!
k=0

Thus
||S (t)|| et M et = M

The series converges uniformly on any nite interval thanks to the Weierstrass M
test. Thus t S (t) is continuous and it is also routine to verify the semigroup
identity. Clearly limt0 S (t) x = x. It is also the case that S (t) is generated by
1
+ 2 (I A) = A . This is easy to show from dierentiating the power series
which has a continuous derivative. Thus
( )k ( )k+1
tk 2 (I A)1
x tk 2 (I A)1
x
() et + et
k! k!
k=0 k=0
( ) ( )
1 1
= + 2 (I A) S (t) x = S (t) + 2 (I A) x
( )
1
Now let t 0+ to obtain + 2 (I A) x = A x.
Claim: For λ, μ > 0, (μI − A)⁻¹ and (λI − A)⁻¹ commute.
Proof of claim: Suppose

y = (λI − A)⁻¹ (μI − A)⁻¹ x   (19.5.25)
z = (μI − A)⁻¹ (λI − A)⁻¹ x   (19.5.26)

I need to show y = z. First note z ∈ D(A) and

(μI − A) z = (λI − A)⁻¹ x ∈ D(A).

Hence

(λI − A) z = (λ − μ) z + (μI − A) z ∈ D(A).

Similarly

(μI − A) y, (λI − A) y ∈ D(A).

From 19.5.25

(μI − A)(λI − A) y = x

and using 19.5.26,

x = (λI − A)(μI − A) z
 = ( (λ − μ)I + (μI − A) )(μI − A) z
 = (λ − μ)(μI − A) z + (μI − A)² z
 = (μI − A)(λ − μ) z + (μI − A)(μI − A) z
 = (μI − A)(λ − μ) z + (μI − A)( (μ − λ)I + (λI − A) ) z
 = (μI − A)(λ − μ) z + (μI − A)( (μ − λ) z + (λI − A) z )
 = (μI − A)(λ − μ) z + (μI − A)(μ − λ) z + (μI − A)(λI − A) z
 = (μI − A)(λI − A) z

Thus

x = (μI − A)(λI − A) z = (μI − A)(λI − A) y

and so z = y. This proves the claim.
It follows from the description of S_λ(t) that S_λ(t) and S_μ(s) commute and also
A_μ commutes with S_λ(t) for any t.
I want to show that for each x ∈ D(A),

lim_{λ→∞} S_λ(t)x ≡ S(t)x

where S(t) is the desired semigroup. Let x ∈ D(A).

||S_λ(t)x − S_μ(t)x|| = || ∫_0^{t} (d/dr)( S_μ(t − r) S_λ(r) ) x dr ||

Since A_μ commutes with S_λ(r), the following formula follows from 19.5.24.

 = || ∫_0^{t} ( S_μ(t − r) S_λ(r) A_λ x − S_μ(t − r) A_μ S_λ(r) x ) dr ||
 ≤ ∫_0^{t} || S_μ(t − r) S_λ(r) ( A_λ x − A_μ x ) || dr
 ≤ M² t ||A_λ x − A_μ x|| ≤ M² t ( ||A_λ x − Ax|| + ||Ax − A_μ x|| )
 ≤ M² t ( M||Ax||/λ + M||Ax||/μ ).

Hence whenever λ, μ are large enough, ||S_λ(t)x − S_μ(t)x|| is small. Thus S_λ(t)x
converges uniformly on finite intervals to something denoted by S(t)x. Therefore,
t → S(t)x is continuous for each x ∈ D(A) and also

||S(t)x|| = lim_{λ→∞} ||S_λ(t)x|| ≤ M ||x||




so that S(t) can be extended to a continuous linear map, still called S(t), defined
on all of H which also satisfies ||S(t)|| ≤ M since D(A) is dense in H. If x is
arbitrary, let y ∈ D(A) be close to x. Then

||S_λ(t)x − S(t)x|| ≤ ||S_λ(t)x − S_λ(t)y|| + ||S_λ(t)y − S(t)y|| + ||S(t)y − S(t)x||
 ≤ 2M ||x − y|| + ||S_λ(t)y − S(t)y||

and so lim_{λ→∞} S_λ(t)x = S(t)x for all x, uniformly on finite intervals. Thus
t → S(t)x is continuous for any x ∈ H.
It remains to verify A generates S(t) and for all x, S(t)x − x → 0. From the
above,

S_λ(t)x = x + ∫_0^{t} S_λ(s) A_λ x ds   (19.5.27)

and so

lim_{t→0+} ||S_λ(t)x − x|| = 0.

By the uniform convergence just shown, there exists λ large enough that for all
t ∈ [0, δ],

||S_λ(t)x − S(t)x|| < ε.

Then

lim sup_{t→0+} ||S(t)x − x|| ≤ lim sup_{t→0+} ( ||S(t)x − S_λ(t)x|| + ||S_λ(t)x − x|| )
 ≤ lim sup_{t→0+} ( ε + ||S_λ(t)x − x|| ).

It follows lim_{t→0+} S(t)x = x because ε is arbitrary.
Next, lim_{λ→∞} A_λ x = Ax for all x ∈ D(A) by 19.5.24. Therefore, passing to the
limit in 19.5.27 yields from the uniform convergence

S(t)x = x + ∫_0^{t} S(s) Ax ds

and by continuity of s → S(s) Ax, it follows

lim_{h→0+} (S(h)x − x)/h = lim_{h→0} (1/h) ∫_0^{h} S(s) Ax ds = Ax.

Thus letting B denote the generator of S(t), D(A) ⊆ D(B) and A = B on D(A).
It only remains to verify D(A) = D(B).
To do this, let λ > 0 and consider the following where y ∈ H is arbitrary.

(λI − B)⁻¹ y = (λI − B)⁻¹ ( (λI − A)(λI − A)⁻¹ y )

Now (λI − A)⁻¹ y ∈ D(A) ⊆ D(B) and A = B on D(A) and so

(λI − A)(λI − A)⁻¹ y = (λI − B)(λI − A)⁻¹ y

which implies,

(λI − B)⁻¹ y = (λI − B)⁻¹ ( (λI − B)(λI − A)⁻¹ y ) = (λI − A)⁻¹ y.

Recall from Proposition 19.5.5 that an arbitrary element of D(B) is of the form
(λI − B)⁻¹ y and this has shown every such vector is in D(A), in fact it equals
(λI − A)⁻¹ y. Hence D(B) ⊆ D(A) which shows A generates S(t) and this proves
the first half of the theorem.
Next suppose A is the generator of a semigroup S(t) having ||S(t)|| ≤ M. Then
by Proposition 19.5.5 for all λ > 0, (λI − A) is onto and

(λI − A)⁻¹ = ∫_0^{∞} e^{−λt} S(t) dt,

thus

((λI − A)⁻¹)ⁿ = ∫_0^{∞} ⋯ ∫_0^{∞} e^{−λ(t₁+⋯+tₙ)} S(t₁ + ⋯ + tₙ) dt₁ ⋯ dtₙ

and so

|| ((λI − A)⁻¹)ⁿ || ≤ ∫_0^{∞} ⋯ ∫_0^{∞} e^{−λ(t₁+⋯+tₙ)} M dt₁ ⋯ dtₙ = M/λⁿ.

This proves the theorem.
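The approximation S_λ(t) used in this proof can be watched converging in a matrix example, where everything is bounded. The sketch below is not part of the text; the matrix A and the time t are hypothetical choices, and A_λ = −λI + λ²(λI − A)⁻¹ is the bounded operator of 19.5.23.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [-2.0, -0.3]])
    t = 1.5
    S_t = expm(t * A)                                   # the semigroup S(t) generated by A
    I = np.eye(2)

    for lam in (5.0, 50.0, 500.0):
        A_lam = -lam * I + lam ** 2 * np.linalg.inv(lam * I - A)   # formula 19.5.23
        S_lam_t = expm(t * A_lam)                                   # the approximate semigroup
        print(lam, np.linalg.norm(S_lam_t - S_t))       # the error shrinks as lambda grows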

19.5.1 An Evolution Equation


When Λ generates a continuous semigroup, one can consider a very interesting
theorem about evolution equations of the form

y′ − Λy = g(t),

provided t → g(t) is C¹.

Theorem 19.5.8 Let Λ be the generator of S(t), a continuous semigroup on H,
a Banach space and let t → g(t) be in C¹(0, ∞; H). Then there exists a unique
solution to the initial value problem

y′ − Λy = g,  y(0) = y₀ ∈ D(Λ),

and it is given by

y(t) = S(t)y₀ + ∫_0^{t} S(t − s) g(s) ds.   (19.5.28)

This solution is continuous having continuous derivative and has values in D(Λ).

Proof: First I show the following claim.
Claim: ∫_0^{t} S(t − s) g(s) ds ∈ D(Λ) and

Λ ∫_0^{t} S(t − s) g(s) ds = S(t)g(0) − g(t) + ∫_0^{t} S(t − s) g′(s) ds.

Proof of the claim:

(1/h) ( S(h) ∫_0^{t} S(t − s) g(s) ds − ∫_0^{t} S(t − s) g(s) ds )
 = (1/h) ( ∫_0^{t} S(t − s + h) g(s) ds − ∫_0^{t} S(t − s) g(s) ds )
 = (1/h) ( ∫_{−h}^{t−h} S(t − s) g(s + h) ds − ∫_0^{t} S(t − s) g(s) ds )
 = (1/h) ∫_{−h}^{0} S(t − s) g(s + h) ds + ∫_0^{t−h} S(t − s) (g(s + h) − g(s))/h ds
   − (1/h) ∫_{t−h}^{t} S(t − s) g(s) ds.

Using the estimate in Theorem 19.5.3 on Page 490 and the dominated convergence
theorem, the limit as h → 0 of the above equals

S(t)g(0) − g(t) + ∫_0^{t} S(t − s) g′(s) ds

which proves the claim.
Since y₀ ∈ D(Λ),

S(t)Λy₀ = S(t) lim_{h→0} (S(h)y₀ − y₀)/h
 = lim_{h→0} (S(t + h) − S(t))y₀ / h
 = lim_{h→0} (S(h)S(t)y₀ − S(t)y₀)/h   (19.5.29)

Since this limit exists, the last limit in the above exists and equals

Λ S(t)y₀   (19.5.30)

and so S(t)y₀ ∈ D(Λ). Now consider 19.5.28.

(y(t + h) − y(t))/h = (S(t + h) − S(t))/h y₀
 + (1/h) ( ∫_0^{t+h} S(t − s + h) g(s) ds − ∫_0^{t} S(t − s) g(s) ds )
 = (S(t + h) − S(t))/h y₀ + (1/h) ∫_t^{t+h} S(t − s + h) g(s) ds
 + (1/h) ( S(h) ∫_0^{t} S(t − s) g(s) ds − ∫_0^{t} S(t − s) g(s) ds ).

From the claim and 19.5.29, 19.5.30 the limit of the right side is

Λ S(t)y₀ + g(t) + Λ ∫_0^{t} S(t − s) g(s) ds
 = Λ ( S(t)y₀ + ∫_0^{t} S(t − s) g(s) ds ) + g(t).

Hence

y′(t) = Λy(t) + g(t)

and from the formula, y′ is continuous since by the claim and 19.5.30 it also equals

Λ S(t)y₀ + g(t) + S(t)g(0) − g(t) + ∫_0^{t} S(t − s) g′(s) ds

which is continuous. The claim and 19.5.30 also show y(t) ∈ D(Λ). This proves
the existence part of the theorem.
It remains to prove the uniqueness part. It suffices to show that if

y′ − Λy = 0,  y(0) = 0,

and y is C¹ having values in D(Λ), then y = 0. Suppose then that y is this way.
Letting 0 < s < t,

(d/ds)( S(t − s) y(s) )
 ≡ lim_{h→0} ( S(t − s − h) (y(s + h) − y(s))/h − (S(t − s) y(s) − S(t − s − h) y(s))/h )

provided the limit exists. Since y′ exists and y(s) ∈ D(Λ), this equals

S(t − s) y′(s) − S(t − s) Λ y(s) = 0.

Let y∗ ∈ H′. This has shown that on the open interval (0, t) the function s →
y∗(S(t − s) y(s)) has a derivative equal to 0. Also from continuity of S and y, this
function is continuous on [0, t]. Therefore, it is constant on [0, t] by the mean value
theorem. At s = 0, this function equals 0. Therefore, it equals 0 on [0, t]. Thus
for fixed s > 0 and letting t > s, y∗(S(t − s) y(s)) = 0. Now let t decrease toward
s. Then y∗(y(s)) = 0 and since y∗ was arbitrary, it follows y(s) = 0. This proves
uniqueness.
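Formula 19.5.28 is the variation of constants (Duhamel) formula, and it can be checked against a direct numerical integration in a matrix example. The sketch below is not part of the text; the generator Λ, the forcing g, the initial value y₀, and the final time are hypothetical choices.

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp, quad

    L = np.array([[0.0, 1.0], [-1.0, -0.2]])                      # plays the role of Lambda
    y0 = np.array([1.0, 0.0])
    g = lambda s: np.array([np.sin(s), np.cos(2 * s)])
    t_final = 2.0

    # y(t) = S(t) y0 + integral_0^t S(t - s) g(s) ds, with S(t) = e^{t Lambda}
    duhamel = expm(t_final * L) @ y0 + np.array([
        quad(lambda s: (expm((t_final - s) * L) @ g(s))[i], 0, t_final)[0]
        for i in range(2)
    ])

    # Direct numerical solution of y' = Lambda y + g, y(0) = y0
    sol = solve_ivp(lambda s, y: L @ y + g(s), (0, t_final), y0, rtol=1e-10, atol=1e-12)
    print(np.allclose(duhamel, sol.y[:, -1], atol=1e-6))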

19.5.2 Adjoints, Hilbert Space


In Hilbert space, there are some special things which are true.

Definition 19.5.9 Let A be a densely defined closed operator on H a real Hilbert
space. Then A∗ is defined as follows.

D(A∗) ≡ {y ∈ H : |(Ax, y)| ≤ C|x| for all x ∈ D(A)}

Then since D(A) is dense, there exists a unique element of H denoted by A∗y such
that

(Ax, y) = (x, A∗y)

for all x ∈ D(A).

Lemma 19.5.10 Let A be closed and densely defined on D(A) ⊆ H, a Hilbert
space. Then A∗ is also closed and densely defined. Also (A∗)∗ = A. In addition to
this, if (λI − A)⁻¹ ∈ L(H, H), then (λI − A∗)⁻¹ ∈ L(H, H) and

( ((λI − A)⁻¹)ⁿ )∗ = ( (λI − A∗)⁻¹ )ⁿ.

Proof: Denote by [x, y] an ordered pair in H × H. Define τ : H × H → H × H
by

τ[x, y] ≡ [−y, x].

Then the definition of adjoint implies that for G(B) equal to the graph of B,

G(A∗) = (τ G(A))⊥.   (19.5.31)

In this notation the inner product on H × H with respect to which ⊥ is defined is
given by

([x, y], [a, b]) ≡ (x, a) + (y, b).

Here is why this is so. For [x, A∗x] ∈ G(A∗) it follows that for all y ∈ D(A)

([x, A∗x], [−Ay, y]) = −(Ay, x) + (y, A∗x) = 0

and so [x, A∗x] ∈ (τ G(A))⊥ which shows

G(A∗) ⊆ (τ G(A))⊥.

To obtain the other inclusion, let [a, b] ∈ (τ G(A))⊥. This means that for all x ∈
D(A),

([a, b], [−Ax, x]) = 0.

In other words, for all x ∈ D(A),

(Ax, a) = (x, b)

and so |(Ax, a)| ≤ C|x| for all x ∈ D(A) which shows a ∈ D(A∗) and

(x, A∗a) = (x, b)

for all x ∈ D(A). Therefore, since D(A) is dense, it follows b = A∗a and so
[a, b] ∈ G(A∗). This shows the other inclusion.
Note that if V is any subspace of the Hilbert space H × H,

(V⊥)⊥ = V̄

and S⊥ is always a closed subspace. Also τ and ⊥ commute. The reason for this is
that [x, y] ∈ (τV)⊥ means that

−(x, b) + (y, a) = 0

for all [a, b] ∈ V, and [x, y] ∈ τ(V⊥) means [y, −x] ∈ V⊥ so for all [a, b] ∈ V,

(y, a) − (x, b) = 0

which says the same thing. It is also clear that τ ∘ τ has the effect of multiplication
by −1.
It follows from the above description of the graph of A∗ that even if G(A) were
not closed it would still be the case that G(A∗) is closed.
Why is D(A∗) dense? Suppose z ∈ D(A∗)⊥. Then for all y ∈ D(A∗), so that
[y, A∗y] ∈ G(A∗), it follows [z, 0] ∈ G(A∗)⊥ = ((τ G(A))⊥)⊥ = τ G(A), but this
implies

[0, −z] ∈ G(A)

and so −z = A0 = 0. Thus D(A∗) must be dense since there is no nonzero vector
in D(A∗)⊥.
Since A is a closed operator, meaning G(A) is closed in H × H, it follows from
the above formula that

G((A∗)∗) = (τ G(A∗))⊥ = (τ (τ G(A))⊥)⊥
 = ((τ τ G(A))⊥)⊥ = ((−G(A))⊥)⊥ = G(A)

and so (A∗)∗ = A.
Now consider the final claim. First let y ∈ D(A∗) = D(λI − A∗). Then letting
x ∈ H be arbitrary,

( x, ((λI − A)⁻¹)∗ (λI − A∗) y )
 = ( (λI − A)⁻¹ x, (λI − A∗) y ) = ( (λI − A)(λI − A)⁻¹ x, y ) = (x, y).

Thus

((λI − A)⁻¹)∗ (λI − A∗) = I   (19.5.32)

on D(A∗). Next let x ∈ D(A) = D(λI − A) and y ∈ H arbitrary.

(x, y) = ( (λI − A)⁻¹ (λI − A) x, y ) = ( (λI − A) x, ((λI − A)⁻¹)∗ y ).

Now it follows |((λI − A) x, ((λI − A)⁻¹)∗ y)| ≤ |y| |x| for any x ∈ D(A) and so

((λI − A)⁻¹)∗ y ∈ D(A∗).

Hence

(x, y) = ( x, (λI − A∗) ((λI − A)⁻¹)∗ y ).

Since x ∈ D(A) is arbitrary and D(A) is dense, it follows

(λI − A∗) ((λI − A)⁻¹)∗ = I.   (19.5.33)

From 19.5.32 and 19.5.33 it follows

(λI − A∗)⁻¹ = ((λI − A)⁻¹)∗

and (λI − A∗) is one to one and onto with continuous inverse. Finally, from the
above,

( (λI − A∗)⁻¹ )ⁿ = ( ((λI − A)⁻¹)∗ )ⁿ = ( ((λI − A)⁻¹)ⁿ )∗.

This proves the lemma.


With this preparation, here is an interesting result about the adjoint of the
generator of a continuous bounded semigroup. I found this in Balakrishnan [5].

Theorem 19.5.11 Suppose A is a densely defined closed operator which generates
a continuous semigroup, S(t). Then A∗ is also a closed densely defined operator
which generates S(t)∗ and t → S(t)∗ is also a continuous semigroup.

Proof: First suppose S(t) is also a bounded semigroup, ||S(t)|| ≤ M. From
Lemma 19.5.10 A∗ is closed and densely defined. It follows from the Hille Yosida
theorem, Theorem 19.5.7 that

|| ((λI − A)⁻¹)ⁿ || ≤ M/λⁿ.

From Lemma 19.5.10 and the fact the adjoint of a bounded linear operator preserves
the norm,

|| ((λI − A∗)⁻¹)ⁿ || = || ( ((λI − A)⁻¹)ⁿ )∗ || = || ((λI − A)⁻¹)ⁿ || ≤ M/λⁿ

and so by Theorem 19.5.7 again it follows A∗ generates a continuous semigroup,
T(t) which satisfies ||T(t)|| ≤ M. I need to identify T(t) with S(t)∗. However, from
the proof of Theorem 19.5.7 and Lemma 19.5.10, it follows that for x ∈ D(A∗) and
a suitable sequence {λₙ},

(T(t)x, y) = lim_{n→∞} ( e^{−λₙt} ∑_{k=0}^{∞} (t^k λₙ^{2k} ((λₙI − A∗)⁻¹)^k / k!) x, y )
 = lim_{n→∞} ( e^{−λₙt} ∑_{k=0}^{∞} (t^k λₙ^{2k} ( ((λₙI − A)⁻¹)^k )∗ / k!) x, y )
 = lim_{n→∞} ( x, e^{−λₙt} ∑_{k=0}^{∞} (t^k λₙ^{2k} ((λₙI − A)⁻¹)^k / k!) y )
 = (x, S(t)y) = (S(t)∗x, y).

Therefore, since y is arbitrary, S(t)∗ = T(t) on D(A∗), a dense set, and this
shows the two are equal. This proves the theorem in the case where S(t) is also
bounded.
Next only assume S(t) is a continuous semigroup. Then there exists γ > 0 such
that

||S(t)|| ≤ M e^{γt}.

Then consider the operator −γI + A and the bounded semigroup e^{−γt} S(t). For
x ∈ D(A),

lim_{h→0+} ( e^{−γh} S(h)x − x )/h = lim_{h→0+} ( e^{−γh} (S(h)x − x)/h + ((e^{−γh} − 1)/h) x )
 = −γx + Ax.

Thus −γI + A generates e^{−γt} S(t) and it follows from the first part that −γI + A∗
generates e^{−γt} S(t)∗. Thus

−γx + A∗x = lim_{h→0+} ( e^{−γh} S(h)∗x − x )/h
 = lim_{h→0+} ( e^{−γh} (S(h)∗x − x)/h + ((e^{−γh} − 1)/h) x )
 = −γx + lim_{h→0+} (S(h)∗x − x)/h

showing that A∗ generates S(t)∗. It follows from Proposition 19.5.5 that A∗ is closed
and densely defined. It is obvious t → S(t)∗ is a semigroup. Why is it continuous? This
also follows from the first part of the argument which establishes that

t → e^{−γt} S(t)∗

is continuous. This proves the theorem.
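In the finite dimensional (real) case the conclusion is simply that the transpose of the matrix exponential is the exponential of the transpose. The one-line sketch below, not part of the text, uses a hypothetical generator to illustrate this.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 2.0, -1.0], [1.0, -0.5, 0.0], [0.0, 3.0, -2.0]])
    t = 0.7
    # S(t)^* coincides with the semigroup generated by A^*: (e^{tA})^T = e^{t A^T}
    print(np.allclose(expm(t * A).T, expm(t * A.T)))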

19.5.3 Adjoints, Reexive Banach Space


Here the adjoint of a generator of a semigroup is considered. I will show that the
adjoint of the generator generates the adjoint of the semigroup in a reflexive Banach
space. This is about as far as you can go although a general but less satisfactory
result is given in Yosida [42].

Definition 19.5.12 Let A be a densely defined closed operator on H a real Banach
space. Then A∗ is defined as follows.

D(A∗) ≡ {y∗ ∈ H′ : |y∗(Ax)| ≤ C||x|| for all x ∈ D(A)}

Then since D(A) is dense, there exists a unique element of H′ denoted by A∗y∗ such
that

A∗(y∗)(x) = y∗(Ax)

for all x ∈ D(A).

Lemma 19.5.13 Let A be closed and densely defined on D(A) ⊆ H, a Banach space. Then A* is also closed and densely defined. Also (A*)* = A. In addition to this, if (λI − A)⁻¹ ∈ L(H, H), then (λI − A*)⁻¹ ∈ L(H′, H′) and
 (((λI − A)⁻¹)ⁿ)* = ((λI − A*)⁻¹)ⁿ

Proof: Denote by [x, y] an ordered pair in H × H. Define τ : H × H → H × H by
 τ[x, y] ≡ [−y, x]
A similar notation will apply to H′ × H′. Then the definition of adjoint implies that for G(B) equal to the graph of B,
 G(A*) = (τG(A))^⊥   (19.5.34)
For S ⊆ H × H, define S^⊥ ⊆ H′ × H′ by
 S^⊥ ≡ {[a*, b*] ∈ H′ × H′ : a*(x) + b*(y) = 0 for all [x, y] ∈ S}
If S ⊆ H′ × H′, a similar definition holds:
 S^⊥ ≡ {[x, y] ∈ H × H : a*(x) + b*(y) = 0 for all [a*, b*] ∈ S}
Here is why 19.5.34 is so. For [x*, A*x*] ∈ G(A*) it follows that for all y ∈ D(A),
 x*(Ay) = A*x*(y)
and so for all [y, Ay] ∈ G(A),
 x*(−Ay) + A*x*(y) = 0
which is what it means to say [x*, A*x*] ∈ (τG(A))^⊥. This shows
 G(A*) ⊆ (τG(A))^⊥
To obtain the other inclusion, let [a*, b*] ∈ (τG(A))^⊥. This means that for all [x, Ax] ∈ G(A),
 a*(−Ax) + b*(x) = 0
In other words, for all x ∈ D(A),
 |a*(Ax)| ≤ ||b*|| ||x||
which means by definition, a* ∈ D(A*) and A*a* = b*. Thus [a*, b*] ∈ G(A*). This shows the other inclusion.
Note that if V is any subspace of H × H,
 (V^⊥)^⊥ = V̄ (the closure of V)
and S^⊥ is always a closed subspace. Also τ and ⊥ commute. The reason for this is that [x*, y*] ∈ (τV)^⊥ means that
 −x*(b) + y*(a) = 0
for all [a, b] ∈ V, and [x*, y*] ∈ τ(V^⊥) means [y*, −x*] ∈ V^⊥, so for all [a, b] ∈ V,
 y*(a) − x*(b) = 0
which says the same thing. It is also clear that τ ∘ τ has the effect of multiplication by −1. If V ⊆ H′ × H′, the argument for commuting τ and ⊥ is similar.
It follows from the above description of the graph of A* that even if G(A) were not closed it would still be the case that G(A*) is closed.
Why is D(A*) dense? If it is not dense, then by a typical application of the Hahn Banach theorem, there exists y** ∈ H′′ such that y**(D(A*)) = 0 but y** ≠ 0. Since H is reflexive, there exists y ∈ H, y ≠ 0, such that x*(y) = 0 for all x* ∈ D(A*). Thus
 [y, 0] ∈ G(A*)^⊥ = ((τG(A))^⊥)^⊥ = τG(A)
and so [0, −y] ∈ G(A), which means −y = A0 = 0, a contradiction. Thus D(A*) is indeed dense. Note this is where it was important to assume the space is reflexive. If you consider C([0, 1]), it is not dense in L^∞([0, 1]), but if f ∈ L¹([0, 1]) satisfies ∫₀¹ f g dm = 0 for all g ∈ C([0, 1]), then f = 0. Hence there is no nonzero f ∈ L¹([0, 1]) perpendicular to C([0, 1]).

Since A is a closed operator, meaning G(A) is closed in H × H, it follows from the above formula that
 G((A*)*) = (τ′G(A*))^⊥ = (τ′((τG(A))^⊥))^⊥ = ((ττG(A))^⊥)^⊥
          = (G(A)^⊥)^⊥ = G(A)‾ = G(A)
and so (A*)* = A.
Now consider the final claim. First let y* ∈ D(A*) = D(λI − A*). Then letting x ∈ H be arbitrary,
 y*(x) = y*((λI − A)(λI − A)⁻¹x)
Since y* ∈ D(A*) and (λI − A)⁻¹x ∈ D(A), this equals
 ((λI − A*)y*)((λI − A)⁻¹x)
Now by definition, this equals
 (((λI − A)⁻¹)*(λI − A*)y*)(x)
It follows that for y* ∈ D(A*),
 ((λI − A)⁻¹)*(λI − A*)y* = y*   (19.5.35)
Next let y* ∈ H′ be arbitrary and x ∈ D(A).
 y*(x) = y*((λI − A)⁻¹(λI − A)x)
       = (((λI − A)⁻¹)*y*)((λI − A)x)
       = ((λI − A*)((λI − A)⁻¹)*y*)(x)
In going from the second to the third line, the first line shows ((λI − A)⁻¹)*y* ∈ D(A*) and so the third line follows. Since D(A) is dense, it follows
 (λI − A*)((λI − A)⁻¹)* = I   (19.5.36)
Then 19.5.35 and 19.5.36 show λI − A* is one to one and onto from D(A*) to H′ and
 (λI − A*)⁻¹ = ((λI − A)⁻¹)*.
Finally, from the above,
 ((λI − A*)⁻¹)ⁿ = (((λI − A)⁻¹)*)ⁿ = (((λI − A)⁻¹)ⁿ)*.

This proves the lemma.


With this preparation, here is an interesting result about the adjoint of the
generator of a continuous bounded semigroup.

Theorem 19.5.14 Suppose A is a densely defined closed operator which generates a continuous semigroup S(t). Then A* is also a densely defined closed operator which generates S(t)*, and S(t)* is also a continuous semigroup.

Proof: First suppose S(t) is also a bounded semigroup, ||S(t)|| ≤ M. From Lemma 19.5.13, A* is closed and densely defined. It follows from the Hille Yosida theorem, Theorem 19.5.7, that
 ||((λI − A)⁻¹)ⁿ|| ≤ M / λⁿ
From Lemma 19.5.13 and the fact that the adjoint of a bounded linear operator preserves the norm,
 ||(((λI − A)⁻¹)ⁿ)*|| = ||(((λI − A)⁻¹)*)ⁿ|| = ||((λI − A*)⁻¹)ⁿ|| ≤ M / λⁿ
and so by Theorem 19.5.7 again it follows A* generates a continuous semigroup T(t) which satisfies ||T(t)|| ≤ M. I need to identify T(t) with S(t)*. However, from the proof of Theorem 19.5.7 and Lemma 19.5.13, it follows that for x* ∈ D(A*) and a suitable sequence {λₙ},
 T(t)x*(y) = lim_{n→∞} e^{−λₙt} Σ_{k=0}^∞ (tᵏλₙ^{2k}/k!) ((λₙI − A*)⁻¹)ᵏ x*(y)
           = lim_{n→∞} e^{−λₙt} Σ_{k=0}^∞ (tᵏλₙ^{2k}/k!) (((λₙI − A)⁻¹)ᵏ)* x*(y)
           = lim_{n→∞} x*( e^{−λₙt} Σ_{k=0}^∞ (tᵏλₙ^{2k}/k!) ((λₙI − A)⁻¹)ᵏ y )
           = x*(S(t)y) = S(t)*x*(y).
Therefore, since y is arbitrary, S(t)* = T(t) on D(A*), a dense set, and this shows the two are equal. In particular, S(t)* is a semigroup because T(t) is. This proves the proposition in the case where S(t) is also bounded.
Next only assume S(t) is a continuous semigroup. Then by Proposition 19.5.5 there exists α > 0 such that
 ||S(t)|| ≤ Me^{αt}.
Then consider the operator −αI + A and the bounded semigroup e^{−αt}S(t). For x ∈ D(A),
 lim_{h→0+} (e^{−αh}S(h)x − x)/h = lim_{h→0+} ( e^{−αh}(S(h)x − x)/h + ((e^{−αh} − 1)/h)x ) = −αx + Ax
Thus −αI + A generates e^{−αt}S(t), and it follows from the first part that −αI + A* generates the semigroup e^{−αt}S(t)*. Thus
 −αx* + A*x* = lim_{h→0+} (e^{−αh}S(h)*x* − x*)/h
             = lim_{h→0+} ( e^{−αh}(S(h)*x* − x*)/h + ((e^{−αh} − 1)/h)x* )
             = −αx* + lim_{h→0+} (S(h)*x* − x*)/h
showing that A* generates S(t)*. It follows from Proposition 19.5.5 that A* is closed and densely defined. It is obvious S(t)* is a semigroup. Why is it continuous? This also follows from the first part of the argument which establishes that
 t ↦ e^{−αt}S(t)*x*
is continuous. This proves the theorem.

19.6 Exercises
1. For f, g ∈ C([0, 1]) let (f, g) = ∫₀¹ f(x)g(x) dx. Is this an inner product space? Is it a Hilbert space? What does the Cauchy Schwarz inequality say in this context?
2. Suppose the following conditions hold.

(x, x) 0, (19.6.37)

(x, y) = (y, x). (19.6.38)


For a, b C and x, y, z X,

(ax + by, z) = a(x, z) + b(y, z). (19.6.39)



These are the same conditions for an inner product except it is no longer
required that (x, x) = 0 if and only if x = 0. Does the Cauchy Schwarz
inequality hold in the following form?
 |(x, y)| ≤ (x, x)^{1/2} (y, y)^{1/2}.

3. Let S denote the unit sphere in a Banach space, X,

S {x X : ||x|| = 1} .

Show that if Y is a Banach space, then A ∈ L(X, Y) is compact if and only if A(S) is precompact, that is, \overline{A(S)} is compact. A ∈ L(X, Y) is said to be compact if whenever B is a bounded subset of X, it follows \overline{A(B)} is a compact subset of Y. In words, A takes bounded sets to precompact sets.
4. Show that A L (X, Y ) is compact if and only if A is compact. Hint: Use
the result of 3 and the Ascoli Arzela theorem to argue that for S the unit ball
in X , there is a subsequence, {yn } S such that yn converges uniformly
on the compact set, A (S). Thus {A yn } is a Cauchy sequence in X . To get
the other implication, apply the result just obtained for the operators A and
A . Then use results about the embedding of a Banach space into its double
dual space.
5. Prove the parallelogram identity,
2 2 2 2
|x + y| + |x y| = 2 |x| + 2 |y| .

Next suppose (X, || ||) is a real normed linear space and the parallelogram
identity holds. Can it be concluded there exists an inner product (, ) such
that ||x|| = (x, x)1/2 ?
6. Let K be a closed, bounded and convex set in Rn and let f : K Rn be
continuous and let y Rn . Show using the Brouwer xed point theorem
there exists a point x K such that P (y f (x) + x) = x. Next show that
(y f (x) , z x) 0 for all z K. The existence of this x is known as
Browders lemma and it has great signicance in the study of certain types of
nonlinear operators. Now suppose f : ℝⁿ → ℝⁿ is continuous and satisfies
 lim_{|x|→∞} (f(x), x)/|x| = ∞.
Show using Browders lemma that f is onto.
7. Show that every inner product space is uniformly convex. This means that if
xn , yn are vectors whose norms are no larger than 1 and if ||xn + yn || 2,
then ||xn yn || 0.
8. Let H be separable and let S be an orthonormal set. Show S is countable.
Hint: How far apart are two elements of the orthonormal set?

9. Suppose {x1 , , xm } is a linearly independent set of vectors in a normed


linear space. Show span (x1 , , xm ) is a closed subspace. Also show every
orthonormal set of vectors is linearly independent.
10. Show every Hilbert space, separable or not, has a maximal orthonormal set
of vectors.
11. Prove Bessels inequality, which says that if {xₙ}ₙ₌₁^∞ is an orthonormal set in H, then for all x ∈ H, ||x||² ≥ Σ_{k=1}^∞ |(x, xₖ)|². Hint: Show that if M = span(x₁, ..., xₙ), then Px = Σ_{k=1}^n (x, xₖ)xₖ. Then observe ||x||² = ||x − Px||² + ||Px||².

12. Show S is a maximal orthonormal set if and only if span (S) is dense in H,
where span (S) is dened as
span(S) {all nite linear combinations of elements of S}.

13. Suppose {xₙ}ₙ₌₁^∞ is a maximal orthonormal set. Show that
 x = Σ_{n=1}^∞ (x, xₙ)xₙ ≡ lim_{N→∞} Σ_{n=1}^N (x, xₙ)xₙ
and ||x||² = Σ_{i=1}^∞ |(x, xᵢ)|². Also show (x, y) = Σ_{n=1}^∞ (x, xₙ)\overline{(y, xₙ)}. Hint: For the last part of this, you might proceed as follows. Show that
 ((x, y)) ≡ Σ_{n=1}^∞ (x, xₙ)\overline{(y, xₙ)}
is a well defined inner product on the Hilbert space which delivers the same norm as the original inner product. Then you could verify that there exists a formula for the inner product in terms of the norm and conclude the two inner products, (·, ·) and ((·, ·)), must coincide.
14. Suppose X is an infinite dimensional Banach space and suppose {x₁, ..., xₙ} are linearly independent with ||xᵢ|| = 1. By Problem 9, span(x₁, ..., xₙ) ≡ Xₙ is a closed linear subspace of X. Now let z ∉ Xₙ and pick y ∈ Xₙ such that ||z − y|| ≤ 2 dist(z, Xₙ) and let
 xₙ₊₁ = (z − y)/||z − y||.
Show the sequence {xₖ} satisfies ||xₙ − xₖ|| ≥ 1/2 whenever k < n. Now show the unit ball {x ∈ X : ||x|| ≤ 1} in a normed linear space is compact if and only if X is finite dimensional. Hint:
 || (z − y)/||z − y|| − xₖ || = || z − y − xₖ||z − y|| || / ||z − y||.

15. Show that if A is a self adjoint operator on a Hilbert space and Ay = y for
a complex number and y = 0, then must be real. Also verify that if A is
self adjoint and Ax = x while Ay = y, then if = , it must be the case
that (x, y) = 0.

16. Theorem 19.5.8 gives the existence and uniqueness for an evolution equation of the form
 y′ − Λy = g, y(0) = y₀ ∈ H
where g ∈ C¹(0, ∞; H) for H a Banach space. Recall Λ was the generator of a continuous semigroup, S(h). Generalize this to an equation of the form
 y′ − Λy = g + Ly, y(0) = y₀ ∈ H
where L is a continuous linear map. Hint: You might consider Λ + L and show it generates a continuous semigroup. Then apply the theorem.
Representation Theorems

20.1 Radon Nikodym Theorem


This chapter is on various representation theorems. The first theorem, the Radon Nikodym Theorem, is a representation theorem for one measure in terms of another. The approach given here is due to Von Neumann and depends on the Riesz
representation theorem for Hilbert space, Theorem 19.1.14 on Page 482.
Definition 20.1.1 Let μ and λ be two measures defined on a σ-algebra S of subsets of a set Ω. λ is absolutely continuous with respect to μ, written as λ ≪ μ, if λ(E) = 0 whenever μ(E) = 0.
It is not hard to think of examples which should be like this. For example,
suppose one measure is volume and the other is mass. If the volume of something
is zero, it is reasonable to expect the mass of it should also be equal to zero. In
this case, there is a function called the density which is integrated over volume to
obtain mass. The Radon Nikodym theorem is an abstract version of this notion.
Essentially, it gives the existence of the density function.
Theorem 20.1.2 (Radon Nikodym) Let λ and μ be finite measures defined on a σ-algebra S of subsets of Ω. Suppose λ ≪ μ. Then there exists a unique f ∈ L¹(Ω, μ) such that f(x) ≥ 0 and
 λ(E) = ∫_E f dμ.
If it is not necessarily the case that λ ≪ μ, there are two measures, λ⊥ and λ||, such that λ = λ⊥ + λ||, λ|| ≪ μ, and there exists a set of μ measure zero, N, such that for all E measurable, λ⊥(E) = λ(E ∩ N) = λ⊥(E ∩ N). In this case the two measures λ⊥ and λ|| are unique and the representation λ = λ⊥ + λ|| is called the Lebesgue decomposition of λ. The measure λ|| is the absolutely continuous part of λ and λ⊥ is called the singular part of λ.
Proof: Let Λ : L²(Ω, μ + λ) → C be defined by
 Λg ≡ ∫_Ω g dλ.
By Holders inequality,
 |Λg| ≤ (∫ 1² dλ)^{1/2} (∫ |g|² d(μ + λ))^{1/2} = λ(Ω)^{1/2} ||g||₂
where ||g||₂ is the L² norm of g taken with respect to μ + λ. Therefore, since Λ is bounded, it follows from Theorem 18.1.4 on Page 443 that Λ ∈ (L²(Ω, μ + λ))′, the dual space of L²(Ω, μ + λ). By the Riesz representation theorem in Hilbert space, Theorem 19.1.14, there exists a unique h ∈ L²(Ω, μ + λ) with
 Λg = ∫_Ω g dλ = ∫ hg d(μ + λ).   (20.1.1)
The plan is to show h is real and nonnegative at least a.e. Therefore, consider the
set where Im h is positive.

E = {x : Im h(x) > 0} ,

Now let g = X_E and use 20.1.1 to get
 λ(E) = ∫_E (Re h + i Im h) d(μ + λ).   (20.1.2)
Since the left side of 20.1.2 is real, this shows
 0 = ∫_E (Im h) d(μ + λ) ≥ ∫_{Eₙ} (Im h) d(μ + λ) ≥ (1/n)(μ + λ)(Eₙ)
where
 Eₙ ≡ {x : Im h(x) ≥ 1/n}.
Thus (μ + λ)(Eₙ) = 0 and since E = ∪_{n=1}^∞ Eₙ, it follows (μ + λ)(E) = 0. A similar argument shows that for
 E = {x : Im h(x) < 0},
(μ + λ)(E) = 0. Thus there is no loss of generality in assuming h is real-valued.
The next task is to show h is nonnegative. This is done in the same manner as above. Define the set where it is negative and then show this set has measure zero. Let E ≡ {x : h(x) < 0} and let Eₙ ≡ {x : h(x) < −1/n}. Then let g = X_{Eₙ}. Since E = ∪ₙ Eₙ, it follows that if (μ + λ)(E) > 0 then this is also true for (μ + λ)(Eₙ) for all n large enough. Then from 20.1.2
 λ(Eₙ) = ∫_{Eₙ} h d(μ + λ) ≤ −(1/n)(μ + λ)(Eₙ) < 0,

a contradiction. Thus it can be assumed h ≥ 0.
At this point the argument splits into two cases.
Case where λ ≪ μ. In this case, h < 1 a.e.
Let E = [h ≥ 1] and let g = X_E. Then
 λ(E) = ∫_E h d(μ + λ) ≥ μ(E) + λ(E).
Therefore μ(E) = 0. Since λ ≪ μ, it follows that λ(E) = 0 also. Thus it can be assumed
 0 ≤ h(x) < 1
for all x.
From 20.1.1, whenever g ∈ L²(Ω, μ + λ),
 ∫ g(1 − h) dλ = ∫ hg dμ.   (20.1.3)
Now let E be a measurable set and define
 g(x) ≡ Σ_{i=0}^n hⁱ(x) X_E(x)
in 20.1.3. This yields
 ∫_E (1 − h^{n+1}(x)) dλ = ∫_E Σ_{i=1}^{n+1} hⁱ(x) dμ.   (20.1.4)
Let f(x) = Σ_{i=1}^∞ hⁱ(x) and use the Monotone Convergence theorem in 20.1.4 to let n → ∞ and conclude
 λ(E) = ∫_E f dμ.
f ∈ L¹(Ω, μ) because λ is finite.
The function f is unique μ a.e. because, if g is another function which also serves to represent λ, consider for each n ∈ ℕ the set
 Eₙ ≡ [f − g > 1/n]
and conclude that
 0 = ∫_{Eₙ} (f − g) dμ ≥ (1/n) μ(Eₙ).
Therefore, μ(Eₙ) = 0. It follows that
 μ([f − g > 0]) ≤ Σ_{n=1}^∞ μ(Eₙ) = 0
Similarly, the set where g is larger than f has measure zero. This proves the theorem in the case where λ ≪ μ.
Case where it is not necessarily true that λ ≪ μ.
In this case, let N = [h ≥ 1] and let g = X_N. Then
 λ(N) = ∫_N h d(μ + λ) ≥ μ(N) + λ(N)
and so μ(N) = 0. Now define a measure λ⊥ by
 λ⊥(E) ≡ λ(E ∩ N)
so λ⊥(E ∩ N) = λ(E ∩ N ∩ N) ≡ λ⊥(E), and let λ|| ≡ λ − λ⊥. Therefore,
 λ||(E) = λ(E) − λ⊥(E) = λ(E) − λ(E ∩ N) = λ(E ∩ N^C).
Suppose λ||(E) > 0. Therefore, since h < 1 on N^C,
 λ||(E) = λ(E ∩ N^C) = ∫_{E ∩ N^C} h d(μ + λ)
        < μ(E ∩ N^C) + λ(E ∩ N^C) ≤ μ(E) + λ||(E),
which is a contradiction unless μ(E) > 0. Therefore, λ|| ≪ μ because if μ(E) = 0, the above inequality cannot hold.
It only remains to verify the two measures λ⊥ and λ|| are unique. Suppose then that ν₁ and ν₂ play the roles of λ⊥ and λ|| respectively. Let N₁ play the role of N in the definition of ν₁ and let f₁ play the role of f for ν₂. I will show that f = f₁ μ a.e. Let Eₖ ≡ [f₁ − f > 1/k] for k ∈ ℕ. Then on observing that λ⊥ − ν₁ = ν₂ − λ||,
 0 = (λ⊥ − ν₁)(Eₖ ∩ (N₁ ∪ N)^C) = ∫_{Eₖ ∩ (N₁ ∪ N)^C} (f₁ − f) dμ
   ≥ (1/k) μ(Eₖ ∩ (N₁ ∪ N)^C) = (1/k) μ(Eₖ)
and so μ(Eₖ) = 0. Therefore, μ([f₁ − f > 0]) = 0 because [f₁ − f > 0] = ∪_{k=1}^∞ Eₖ. It follows f₁ ≤ f μ a.e. Similarly, f ≤ f₁ μ a.e. Therefore, ν₂ = λ|| and so ν₁ = λ⊥ also. ∎
The f in the theorem for the absolutely continuous case is sometimes denoted by dλ/dμ and is called the Radon Nikodym derivative.
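In the purely atomic case the Radon Nikodym derivative can be computed by hand, which may help fix the ideas. The following sketch (illustrative only; the measures are simply arrays of point masses on a finite set) produces f = dλ/dμ and checks that λ(E) = ∫_E f dμ for every E.

import numpy as np
from itertools import combinations, chain

# Finite set Ω = {0,...,4}; a measure is just its vector of point masses.
mu = np.array([0.5, 0.0, 1.0, 2.0, 0.25])
f_true = np.array([2.0, 0.0, 3.0, 0.5, 4.0])      # a density
lam = f_true * mu                                  # λ(E) = ∫_E f dμ, so λ ≪ μ

# Radon Nikodym derivative: f = dλ/dμ on the set where μ > 0 (μ a.e. unique).
f = np.divide(lam, mu, out=np.zeros_like(lam), where=mu > 0)

# Check λ(E) = ∫_E f dμ for every subset E of Ω.
subsets = chain.from_iterable(combinations(range(5), k) for k in range(6))
assert all(np.isclose(lam[list(E)].sum(), (f * mu)[list(E)].sum()) for E in subsets)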
The next corollary is a useful generalization to nite measure spaces.

Corollary 20.1.3 Suppose λ ≪ μ and there exist sets Sₙ ∈ S with
 Sₙ ∩ Sₘ = ∅, ∪_{n=1}^∞ Sₙ = Ω,
and λ(Sₙ), μ(Sₙ) < ∞. Then there exists f ≥ 0, where f is measurable, and
 λ(E) = ∫_E f dμ
for all E ∈ S. The function f is μ + λ a.e. unique.

Proof: Define the σ algebra of subsets of Sₙ,
 Sₙ ≡ {E ∩ Sₙ : E ∈ S}.
Then both λ and μ are finite measures on Sₙ, and λ ≪ μ. Thus, by Theorem 20.1.2, there exists a nonnegative Sₙ measurable function fₙ with λ(E) = ∫_E fₙ dμ for all E ∈ Sₙ. Define f(x) ≡ fₙ(x) for x ∈ Sₙ. Since the Sₙ are disjoint and their union is all of Ω, this defines f on all of Ω. The function f is measurable because
 f⁻¹((a, ∞]) = ∪_{n=1}^∞ fₙ⁻¹((a, ∞]) ∈ S.
Also, for E ∈ S,
 λ(E) = Σ_{n=1}^∞ λ(E ∩ Sₙ) = Σ_{n=1}^∞ ∫ X_{E∩Sₙ}(x) fₙ(x) dμ = Σ_{n=1}^∞ ∫ X_{E∩Sₙ}(x) f(x) dμ
By the monotone convergence theorem
 Σ_{n=1}^∞ ∫ X_{E∩Sₙ}(x) f(x) dμ = lim_{N→∞} Σ_{n=1}^N ∫ X_{E∩Sₙ}(x) f(x) dμ
  = lim_{N→∞} ∫ Σ_{n=1}^N X_{E∩Sₙ}(x) f(x) dμ = ∫ Σ_{n=1}^∞ X_{E∩Sₙ}(x) f(x) dμ = ∫_E f dμ.
This proves the existence part of the corollary.
To see f is unique, suppose f₁ and f₂ both work and consider for k ∈ ℕ
 Eₖ ≡ [f₁ − f₂ > 1/k].
Then
 0 = λ(Eₖ ∩ Sₙ) − λ(Eₖ ∩ Sₙ) = ∫_{Eₖ ∩ Sₙ} (f₁(x) − f₂(x)) dμ.
Hence μ(Eₖ ∩ Sₙ) = 0 for all n, so
 μ(Eₖ) = Σ_{n=1}^∞ μ(Eₖ ∩ Sₙ) = 0.
Hence μ([f₁ − f₂ > 0]) ≤ Σ_{k=1}^∞ μ(Eₖ) = 0. Therefore, λ([f₁ − f₂ > 0]) = 0 also. Similarly
 (μ + λ)([f₁ − f₂ < 0]) = 0. ∎
This version of the Radon Nikodym theorem will suce for most applications,
but more general versions are available. To see one of these, one can read the
treatment in Hewitt and Stromberg [28]. This involves the notion of decomposable
measure spaces, a generalization of nite.
Not surprisingly, there is a simple generalization of the Lebesgue decomposition part of Theorem 20.1.2.

Corollary 20.1.4 Let (Ω, S) be a set with a σ algebra of sets. Suppose λ and μ are two measures defined on the sets of S and suppose there exists a sequence of disjoint sets of S, {Ωᵢ}ᵢ₌₁^∞ with ∪ᵢ Ωᵢ = Ω, such that λ(Ωᵢ), μ(Ωᵢ) < ∞. Then there is a set of μ measure zero, N, and measures λ⊥ and λ|| such that
 λ⊥ + λ|| = λ, λ|| ≪ μ, λ⊥(E) = λ(E ∩ N) = λ⊥(E ∩ N).

Proof: Let Sᵢ ≡ {E ∩ Ωᵢ : E ∈ S} and for E ∈ Sᵢ, let λᵢ(E) ≡ λ(E) and μᵢ(E) ≡ μ(E). Then by Theorem 20.1.2 there exist unique measures λᵢ⊥ and λᵢ|| such that λᵢ = λᵢ⊥ + λᵢ||, a set of μᵢ measure zero Nᵢ ∈ Sᵢ such that for all E ∈ Sᵢ, λᵢ⊥(E) = λᵢ(E ∩ Nᵢ), and λᵢ|| ≪ μᵢ. Define for E ∈ S
 λ⊥(E) ≡ Σᵢ λᵢ⊥(E ∩ Ωᵢ), λ||(E) ≡ Σᵢ λᵢ||(E ∩ Ωᵢ), N ≡ ∪ᵢ Nᵢ.
First observe that λ⊥ and λ|| are measures.
 λ⊥(∪_{j=1}^∞ Eⱼ) ≡ Σᵢ λᵢ⊥((∪_{j=1}^∞ Eⱼ) ∩ Ωᵢ) = Σᵢ Σⱼ λᵢ⊥(Eⱼ ∩ Ωᵢ)
  = Σⱼ Σᵢ λᵢ⊥(Eⱼ ∩ Ωᵢ) = Σⱼ Σᵢ λ(Eⱼ ∩ Ωᵢ ∩ Nᵢ) = Σⱼ Σᵢ λᵢ⊥(Eⱼ ∩ Ωᵢ) = Σⱼ λ⊥(Eⱼ).
The argument for λ|| is similar. Now
 μ(N) = Σᵢ μ(N ∩ Ωᵢ) = Σᵢ μᵢ(Nᵢ) = 0
and
 λ⊥(E) ≡ Σᵢ λᵢ⊥(E ∩ Ωᵢ) = Σᵢ λ(E ∩ Ωᵢ ∩ Nᵢ) = Σᵢ λ(E ∩ Ωᵢ ∩ N) = λ(E ∩ N).
Also if μ(E) = 0, then μᵢ(E ∩ Ωᵢ) = 0 and so λᵢ||(E ∩ Ωᵢ) = 0. Therefore,
 λ||(E) = Σᵢ λᵢ||(E ∩ Ωᵢ) = 0.
The decomposition is unique because of the uniqueness of the λᵢ|| and λᵢ⊥ and the observation that some other decomposition must coincide with the given one on the Ωᵢ. ∎

20.2 Vector Measures


The next topic will use the Radon Nikodym theorem. It is the topic of vector and
complex measures. The main interest is in complex measures although a vector
measure can have values in any topological vector space. Whole books have been
written on this subject. See for example the book by Diestal and Uhl [11] titled
Vector measures.

Definition 20.2.1 Let (V, ||·||) be a normed linear space and let (Ω, S) be a measure space. A function μ : S → V is a vector measure if μ is countably additive. That is, if {Eᵢ}ᵢ₌₁^∞ is a sequence of disjoint sets of S,
 μ(∪ᵢ₌₁^∞ Eᵢ) = Σᵢ₌₁^∞ μ(Eᵢ).

Note that it makes sense to take nite sums because it is given that has
values in a vector space in which vectors can be summed. In the above, (Ei ) is a
vector. It might be a point in Rn or in any other vector space. In many of the most
important applications, it is a vector in some sort of function space which may be
innite dimensional. The innite sum has the usual meaning. That is


n
(Ei ) = lim (Ei )
n
i=1 i=1

where the limit takes place relative to the norm on V .

Definition 20.2.2 Let (Ω, S) be a measure space and let μ be a vector measure defined on S. A subset π(E) of S is called a partition of E if π(E) consists of finitely many disjoint sets of S and ∪π(E) = E. Let
 |μ|(E) ≡ sup{ Σ_{F ∈ π(E)} ||μ(F)|| : π(E) is a partition of E }.
|μ| is called the total variation of μ.
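For a complex measure on a finite set the supremum in this definition is attained by the partition into singletons, so the total variation can be computed directly. A brief illustrative sketch (all data made up):

import numpy as np

# Complex measure on Ω = {0,1,2,3}: ν(E) = sum of the point "masses" in E.
point_mass = np.array([1 + 1j, -2.0, 0.5j, -1 - 3j])
nu = lambda E: point_mass[list(E)].sum()

# |ν|(Ω): sup over finite partitions of Σ |ν(F)|.  For an atomic measure the
# partition into singletons is optimal (triangle inequality), giving Σ |masses|.
total_variation = np.abs(point_mass).sum()

# A coarser partition can only give a smaller sum.
coarse = [(0, 1), (2, 3)]
assert sum(abs(nu(F)) for F in coarse) <= total_variation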

The next theorem may seem a little surprising. It states that, if nite, the total
variation is a nonnegative measure.

Theorem 20.2.3 If |μ|(Ω) < ∞, then |μ| is a measure on S. Even if |μ|(Ω) = ∞,
 |μ|(∪ᵢ₌₁^∞ Eᵢ) ≤ Σᵢ₌₁^∞ |μ|(Eᵢ).
That is, |μ| is subadditive and |μ|(A) ≤ |μ|(B) whenever A, B ∈ S with A ⊆ B.

Proof: Consider the last claim. Let a < || (A) and let (A) be a partition of
A such that
a< || (F )|| .
F (A)

Then (A) {B \ A} is a partition of B and



|| (B) || (F )|| + || (B \ A)|| > a.
F (A)

Since this is true for all such a, it follows || (B) || (A) as claimed.

Let {Ej }j=1 be a sequence of disjoint sets of S and let E = j=1 Ej . Then
letting a < || (E ) , it follows from the denition of total variation there exists a
partition of E , (E ) = {A1 , , An } such that


n
a< ||(Ai )||.
i=1

Also,
Ai = j=1 Ai Ej

and so by the triangle inequality, ||(Ai )|| j=1 ||(Ai Ej )||. Therefore, by the
above, and either Fubinis theorem or Lemma 1.3.4 on Page 26
||(Ai )||
z }| {

n
n

a< ||(Ai Ej )|| = ||(Ai Ej )|| ||(Ej )
i=1 j=1 j=1 i=1 j=1

n
because {Ai Ej }i=1 is a partition of Ej .
Since a is arbitrary, this shows


||(
j=1 Ej ) ||(Ej ).
j=1

If the sets, Ej are not disjoint, let F1 = E1 and if Fn has been chosen, let Fn+1
En+1 \ ni=1 Ei . Thus the sets, Fi are disjoint and
i=1 Fi = i=1 Ei . Therefore,


( ) ( )
||
j=1 Ej = || j=1 Fj || (Fj ) || (Ej )
j=1 j=1

and proves || is always subadditive as claimed regardless of whether || () < .



Now suppose || () < and let E1 and E2 be sets of S such that E1 E2 =


and let {Ai1 Aini } = (Ei ), a partition of Ei which is chosen such that

ni
|| (Ei ) < ||(Aij )|| i = 1, 2.
j=1

Such a partition exists because of the denition of the total variation. Consider the
sets which are contained in either of (E1 ) or (E2 ) , it follows this collection of
sets is a partition of E1 E2 denoted by (E1 E2 ). Then by the above inequality
and the denition of total variation,

||(E1 E2 ) ||(F )|| > || (E1 ) + || (E2 ) 2,
F (E1 E2 )

which shows that since > 0 was arbitrary,

||(E1 E2 ) ||(E1 ) + ||(E2 ). (20.2.5)


n
Then 20.2.5 implies that whenever the Ei are disjoint, ||(nj=1 Ej ) j=1 ||(Ej ).
Therefore,


n
||(Ej ) ||(
j=1 Ej ) ||(nj=1 Ej ) ||(Ej ).
j=1 j=1

Since n is arbitrary,


||(
j=1 Ej ) = ||(Ej )
j=1

which shows that || is a measure as claimed. 


The following corollary is interesting. It concerns the case that is only nitely
additive.

Corollary 20.2.4 Suppose (Ω, F) is a set with a σ algebra of subsets F and suppose μ : F → C is only finitely additive. That is,
 μ(∪ᵢ₌₁ⁿ Eᵢ) = Σᵢ₌₁ⁿ μ(Eᵢ)
whenever the Eᵢ are disjoint. Then |μ|, defined in the same way as above, is also finitely additive provided |μ| is finite.

Proof: Say E F = for E, F F. Let (E) , (F ) suitable partitions for


which the following holds.

|| (E F ) | (A)| + | (B)| || (E) + || (F ) 2.
A(E) B(F )

Similar considerations apply to any nite union.


Now let E = ni=1 Ei where the Ei are disjoint. Then letting (E) be a partition
of E,
|| (E) | (F )| ,
F (E)

it follows that
n


|| (E) + | (F )| = + (F Ei )

F (E) F (E) i=1

n
n
+ | (F Ei )| + || (Ei )
i=1 F (E) i=1

which shows || is nitely additive. 


In the case that is a complex measure, it is always the case that || () < .

Theorem 20.2.5 Suppose λ is a complex measure on (Ω, S), where S is a σ algebra of subsets of Ω. That is, whenever {Eᵢ} is a sequence of disjoint sets of S,
 λ(∪ᵢ₌₁^∞ Eᵢ) = Σᵢ₌₁^∞ λ(Eᵢ).
Then |λ|(Ω) < ∞.

Proof: First here is a claim.


Claim: Suppose || (E) = . Then there are disjoint subsets of E, A and B
such that E = A B, | (A)| , | (B)| > 1 and || (B) = .
Proof of the claim: From the denition of || , there exists a partition of
E, (E) such that
| (F )| > 20 (1 + | (E)|) . (20.2.6)
F (E)

Here 20 is just a nice sized number. No eort is made to be delicate in this argument.
Also note that (E) C because it is given that is a complex measure. Consider
the following picture consisting of two lines in the complex plane having slopes 1
and -1 which intersect at the origin, dividing the complex plane into four closed
sets, R₁, R₂, R₃, and R₄ as shown.
[Figure: the two lines of slope 1 and −1 through the origin divide the complex plane into four closed sectors, R₁ on the right, R₂ on top, R₃ on the left, and R₄ on the bottom.]

Let i consist of those sets, A of (E) for which (A) Ri . Thus, some sets,
A of (E) could be in two of the i if (A) is on one of the intersecting lines. This
is not important. The thing which is important is that if (A) R1 or R3 , then

2
2
| (A)| |Re ( (A))| and if (A) R2 or R4 then 22 | (A)| |Im ( (A))| and
Re (z) has the same sign for z in R1 and R3 while Im (z) has the same sign for z in
R2 or R4 . Then by 20.2.6, it follows that for some i,

| (F )| > 5 (1 + | (E)|) . (20.2.7)
F i

Suppose i equals 1 or 3. A similar argument using the imaginary part applies if i


equals 2 or 4. Then,



(F ) Re ( (F )) = |Re ( (F ))|

F i F i F i

2 2
| (F )| > 5 (1 + | (E)|) .
2 2
F i

Now letting C be the union of the sets in i ,



5

| (C)| = (F ) > (1 + | (E)|) > 1. (20.2.8)
2
F i

Dene D E \ C.

Then (C) + (E \ C) = (E) and so


5
(1 + | (E)|) < | (C)| = | (E) (E \ C)|
2
= | (E) (D)| | (E)| + | (D)|

and so
5 3
1< + | (E)| < | (D)| .
2 2
Now since || (E) = , it follows from Theorem 20.2.5 that = || (E) || (C)+
|| (D) and so either || (C) = or || (D) = . If || (C) = , let B = C and
A = D. Otherwise, let B = D and A = C. This proves the claim.

Now suppose || () = . Then from the claim, there exist A1 and B1 such that
|| (B1 ) = , | (B1 )| , | (A1 )| > 1, and A1 B1 = . Let B1 \ A play the same
role as and obtain A2 , B2 B1 such that || (B2 ) = , | (B2 )| , | (A2 )| > 1,
and A2 B2 = B1 . Continue in this way to obtain a sequence of disjoint sets, {Ai }
such that | (Ai )| > 1. Then since is a measure,


(
i=1 Ai ) = (Ai )
i=1

but this is impossible because limi (Ai ) = 0. 

Theorem 20.2.6 Let (Ω, S) be a measure space and let λ : S → C be a complex vector measure. Thus |λ|(Ω) < ∞. Let μ : S → [0, μ(Ω)] be a finite measure such that λ ≪ μ. Then there exists a unique f ∈ L¹(μ) such that for all E ∈ S,
 ∫_E f dμ = λ(E).

Proof: It is clear that Re and Im are real-valued vector measures on S.


Since ||() < , it follows easily that | Re |() and | Im |() < . This is clear
because
| (E)| |Re (E)| , |Im (E)| .

Therefore, each of

| Re | + Re | Re | Re() | Im | + Im | Im | Im()
, , , and
2 2 2 2
are nite measures on S. It is also clear that each of these nite measures are abso-
lutely continuous with respect to and so there exist unique nonnegative functions
in L1 (), f1, f2 , g1 , g2 such that for all E S,

1
(| Re | + Re )(E) = f1 d,
2
E
1
(| Re | Re )(E) = f2 d,
2
E
1
(| Im | + Im )(E) = g1 d,
2
E
1
(| Im | Im )(E) = g2 d.
2 E

Now let f = f1 f2 + i(g1 g2 ). 


The following corollary is about representing a vector measure in terms of its
total variation. It is like representing a complex number in the form rei . The proof
requires the following lemma.

Lemma 20.2.7 Suppose (Ω, S, μ) is a measure space and f is a function in L¹(Ω, μ) with the property that
 |∫_E f dμ| ≤ μ(E)
for all E ∈ S. Then |f| ≤ 1 a.e.

Proof of the lemma: Consider the following picture.
[Figure: the closed unit ball B(0, 1) centered at the origin and a small ball B(p, r) centered at a point p with |p| > 1.]
where B(p, r) ∩ B(0, 1) = ∅. Let E = f⁻¹(B(p, r)). In fact μ(E) = 0. If μ(E) ≠ 0 then
 | (1/μ(E)) ∫_E f dμ − p | = | (1/μ(E)) ∫_E (f − p) dμ | ≤ (1/μ(E)) ∫_E |f − p| dμ < r
because on E, |f(x) − p| < r. Hence
 | (1/μ(E)) ∫_E f dμ | > 1
because it is closer to p than r. (Refer to the picture.) However, this contradicts the assumption of the lemma. It follows μ(E) = 0. Since the set of complex numbers z such that |z| > 1 is an open set, it equals the union of countably many balls, {Bᵢ}ᵢ₌₁^∞. Therefore,
 μ(f⁻¹({z ∈ C : |z| > 1})) = μ(∪_{k=1}^∞ f⁻¹(Bₖ)) ≤ Σ_{k=1}^∞ μ(f⁻¹(Bₖ)) = 0.

Thus |f (x)| 1 a.e. as claimed. 

Corollary 20.2.8 Let λ be a complex vector measure with |λ|(Ω) < ∞.¹ Then there exists a unique f ∈ L¹(|λ|) such that λ(E) = ∫_E f d|λ|. Furthermore, |f| = 1 for |λ| a.e. This is called the polar decomposition of λ.
¹ As proved above, the assumption that |λ|(Ω) < ∞ is redundant.

Proof: First note that || and so such an L1 function exists and is unique.
It is required to show |f | = 1 a.e. If ||(E) = 0,

(E) 1
= f d|| 1.
||(E) ||(E)
E

Therefore by Lemma 20.2.7, |f | 1, || a.e. Now let


[ ]
1
En = |f | 1 .
n

Let {F1 , , Fm } be a partition of En . Then

m
m
m

| (Fi )| = f d || |f | d ||

i=1 i=1 Fi i=1 Fi

m ( ) m ( )
1 1
1 d || = 1 || (Fi )
i=1 Fi
n i=1
n
( )
1
= || (En ) 1 .
n

Then taking the supremum over all partitions,


( )
1
|| (En ) 1 || (En )
n

which shows || (En ) = 0. Hence || ([|f | < 1]) = 0 because [|f | < 1] =
n=1 En .This
proves Corollary 20.2.8.

Corollary 20.2.9 Suppose (Ω, S) is a measure space and μ is a finite nonnegative measure on S. Then for h ∈ L¹(μ), define a complex measure λ by
 λ(E) ≡ ∫_E h dμ.
Then
 |λ|(E) = ∫_E |h| dμ.
Furthermore, |h| = ḡh where
 λ(E) = ∫_E g d|λ|
is the polar decomposition of λ.

Proof: From Corollary 20.2.8 there exists g such that |g| = 1, || a.e. and for
all E S
(E) = gd || = hd.
E E

Let sn be a sequence of simple functions converging pointwise to g. Then from the


above,
gsn d || = sn hd.
E E

Passing to the limit using the dominated convergence theorem,



d || = ghd.
E E

It follows gh 0 a.e. and |g| = 1. Therefore, |h| = |gh| = gh. It follows from the
above, that
|| (E) = d || = ghd = d || = |h| d
E E E E

and this proves the corollary.
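The corollary is easy to see concretely for atomic measures. A brief sketch (illustrative data only): with λ(E) = ∫_E h dμ one has |λ|(E) = ∫_E |h| dμ, and the polar decomposition density is g = h/|h| wherever h ≠ 0, so |g| = 1 |λ| a.e.

import numpy as np

mu = np.array([1.0, 0.5, 2.0, 1.5])                 # finite nonnegative measure
h = np.array([1 + 1j, -3.0, 0.0, 2j])               # h in L^1(mu)
lam = h * mu                                        # λ(E) = ∫_E h dμ (point masses)

abs_lam = np.abs(h) * mu                            # |λ|(E) = ∫_E |h| dμ
g = np.divide(h, np.abs(h), out=np.zeros_like(h), where=np.abs(h) > 0)

# λ(E) = ∫_E g d|λ| and |g| = 1 wherever |λ| puts mass.
assert np.allclose(lam, g * abs_lam)
assert np.all(np.isclose(np.abs(g)[abs_lam > 0], 1.0))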

20.3 Representation Theorems For The Dual Space Of Lᵖ

Recall the concept of the dual space of a Banach space from the chapter on Banach spaces starting on Page 441. The next topic deals with the dual space of Lᵖ for p ≥ 1 in the case where the measure space is finite or σ finite. In what follows, q = ∞ if p = 1 and otherwise 1/p + 1/q = 1.

Theorem 20.3.1 (Riesz representation theorem) Let p > 1 and let (Ω, S, μ) be a finite measure space. If Λ ∈ (Lᵖ(Ω))′, then there exists a unique h ∈ L^q(Ω) (1/p + 1/q = 1) such that
 Λf = ∫ hf dμ.
This function satisfies ||h||_q = ||Λ||, where ||Λ|| is the operator norm of Λ.

Proof: (Uniqueness) If h₁ and h₂ both represent Λ, consider
 f = |h₁ − h₂|^{q−2}(h̄₁ − h̄₂),
where h̄ denotes complex conjugation. By Holders inequality, it is easy to see that f ∈ Lᵖ(Ω). Thus
 0 = Λf − Λf = ∫ h₁|h₁ − h₂|^{q−2}(h̄₁ − h̄₂) − h₂|h₁ − h₂|^{q−2}(h̄₁ − h̄₂) dμ = ∫ |h₁ − h₂|^q dμ.
Therefore h₁ = h₂ and this proves uniqueness.



Now let (E) = (XE ). Since this is a nite measure space XE is an element
of Lp () and so it makes sense to write (XE ). In fact is a complex measure
having nite total variation. Let A1 , , An be a partition of .

|XAi | = wi (XAi ) = (wi XAi )

for some wi C, |wi | = 1. Thus



n
n
n
|(Ai )| = |(XAi )| = ( wi XAi )
i=1 i=1 i=1
n
1 1 1
||||( | wi XAi | d) = ||||( d) p = ||||() p.
p p

i=1

This is because if x , x is contained in exactly one of the Ai and so the absolute


value of the sum in the rst integral above is equal to 1. Therefore ||() <
because this was an arbitrary partition. Also, if {Ei } i=1 is a sequence of disjoint
sets of S, let
Fn = ni=1 Ei , F = i=1 Ei .

Then by the Dominated Convergence theorem,

||XFn XF ||p 0.

Therefore, by continuity of ,

n

(F ) = (XF ) = lim (XFn ) = lim (XEk ) = (Ek ).
n n
k=1 k=1

This shows is a complex measure with || nite.


It is also clear from the denition of that . Therefore, by the Radon
Nikodym theorem, there exists h L1 () with

(E) = hd = (XE ).
E
m
Actually h L and satises the other conditions above. Let s =
q
i=1 ci XEi be a
simple function. Then since is linear,
m
m
(s) = ci (XEi ) = ci hd = hsd. (20.3.9)
i=1 i=1 Ei

Claim: If f is uniformly bounded and measurable, then



(f ) = hf d.

Proof of claim: Since f is bounded and measurable, there exists a sequence of


simple functions, {sn } which converges to f pointwise and in Lp (). This follows

from Theorem 7.5.6 on Page 175 upon breaking f up into positive and negative
parts of real and complex parts. In fact this theorem gives uniform convergence.
Then
(f ) = lim (sn ) = lim hsn d = hf d,
n n

the rst equality holding because of continuity of , the second following from 20.3.9
and the third holding by the dominated convergence theorem.
This is a very nice formula, but it still has not been shown that h ∈ L^q(μ).
Let Eₙ = {x : |h(x)| ≤ n}. Thus |hX_{Eₙ}| ≤ n. Then
 |hX_{Eₙ}|^{q−2}(h̄X_{Eₙ}) ∈ Lᵖ(μ).
By the claim, it follows that
 ||hX_{Eₙ}||_q^q = ∫ h |hX_{Eₙ}|^{q−2}(h̄X_{Eₙ}) dμ = Λ(|hX_{Eₙ}|^{q−2}(h̄X_{Eₙ}))
  ≤ ||Λ|| · || |hX_{Eₙ}|^{q−2}(h̄X_{Eₙ}) ||_p = ||Λ|| ||hX_{Eₙ}||_q^{q/p},
the last equality holding because q − 1 = q/p and so
 ( ∫ | |hX_{Eₙ}|^{q−2}(h̄X_{Eₙ}) |^p dμ )^{1/p} = ( ∫ (|hX_{Eₙ}|^{q/p})^p dμ )^{1/p} = ||hX_{Eₙ}||_q^{q/p}.
Therefore, since q − q/p = 1, it follows that
 ||hX_{Eₙ}||_q ≤ ||Λ||.
Letting n → ∞, the Monotone Convergence theorem implies
 ||h||_q ≤ ||Λ||.   (20.3.10)

Now that h has been shown to be in L^q(μ), it follows from 20.3.9 and the density of the simple functions, Theorem 12.4.1 on Page 290, that
 Λf = ∫ hf dμ
for all f ∈ Lᵖ(μ).
It only remains to verify the last claim.
 ||Λ|| = sup{ |∫ hf dμ| : ||f||_p ≤ 1 } ≤ ||h||_q ≤ ||Λ||
by 20.3.10 and Holders inequality. ∎
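On a finite measure space with finitely many atoms the theorem can be checked by direct computation. In the sketch below (illustrative data), the functional Λf = ∫ hf dμ attains its operator norm at the normalized extremal function f = |h|^{q−2}h̄ / ||·||_p, and that norm equals ||h||_q.

import numpy as np

p, q = 3.0, 1.5                                     # 1/p + 1/q = 1
mu = np.array([0.2, 1.0, 0.8, 0.5])                 # finite measure (atoms)
h = np.array([1.5, -0.3, 2.0, -1.0])                # representing function, h in L^q

def Lp_norm(f, r):
    return (np.abs(f) ** r * mu).sum() ** (1 / r)

Lam = lambda f: (h * f * mu).sum()                  # Λf = ∫ h f dμ

# The extremal f = |h|^{q-2} h̄ (h is real here), normalized in L^p,
# gives |Λf| = ||h||_q, so the operator norm is ||h||_q.
f_star = np.abs(h) ** (q - 2) * h
f_star = f_star / Lp_norm(f_star, p)
assert np.isclose(abs(Lam(f_star)), Lp_norm(h, q))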


To represent elements of the dual space of L1 (), another Banach space is
needed.

Definition 20.3.2 Let (Ω, S, μ) be a measure space. L^∞(Ω) is the vector space of measurable functions such that for some M > 0, |f(x)| ≤ M for all x outside of some set of measure zero (|f(x)| ≤ M a.e.). Define f = g when f(x) = g(x) a.e. and
 ||f||_∞ ≡ inf{M : |f(x)| ≤ M a.e.}.

Theorem 20.3.3 L^∞(Ω) is a Banach space.

Proof: It is clear that L () is a vector space. Is || || a norm?


Claim: If f L (),{ then |f (x)| ||f || a.e. }
Proof of the claim: x : |f (x)| ||f || + n1 En is a set of measure zero
according to the denition of ||f || . Furthermore, {x : |f (x)| > ||f || } = n En
and so it is also a set of measure zero. This veries the claim.
Now if ||f || = 0 it follows that f (x) = 0 a.e. Also if f, g L (),

|f (x) + g (x)| |f (x)| + |g (x)| ||f || + ||g||

a.e. and so ||f || + ||g|| serves as one of the constants, M in the denition of
||f + g|| . Therefore,
||f + g|| ||f || + ||g|| .
Next let c be a number. Then |cf (x)| = |c| |f (x)| |c| ||f || and so ||cf ||
|c| ||f || . Therefore since c is arbitrary, ||f || = ||c (1/c) f || 1c ||cf || which
implies |c| ||f || ||cf || . Thus || || is a norm as claimed.
To verify completeness, let {fn } be a Cauchy sequence in L () and use the
above claim to get the existence of a set of measure zero, Enm such that for all
x / Enm ,
|fn (x) fm (x)| ||fn fm ||
Let E = n,m Enm . Thus (E) = 0 and for each x / E, {fn (x)}n=1 is a Cauchy
sequence in C. Let
{
0 if x E
f (x) = = lim XE C (x)fn (x).
limn fn (x) if x
/E n

Then f is clearly measurable because it is the limit of measurable functions. If

Fn = {x : |fn (x)| > ||fn || }

and F =
n=1 Fn , it follows (F ) = 0 and that for x
/ F E,

|f (x)| lim inf |fn (x)| lim inf ||fn || <


n n

because {||fn || } is a Cauchy sequence. (|||fn || ||fm || | ||fn fm || by the


triangle inequality.) Thus f L (). Let n be large enough that whenever m > n,

||fm fn || < .

Then, if x ∉ E,
 |f(x) − fₙ(x)| = lim_{m→∞} |f_m(x) − fₙ(x)| ≤ lim inf_{m→∞} ||f_m − fₙ||_∞ < ε.
Hence ||f − fₙ||_∞ ≤ ε for all n large enough. This proves the theorem.
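The distinction between the pointwise supremum and the essential supremum in this norm is easy to see with an atomic measure that ignores one point. A tiny illustrative sketch:

import numpy as np

mu = np.array([1.0, 2.0, 0.0, 0.5])        # the third point has measure zero
f  = np.array([0.3, -1.2, 100.0, 0.7])     # huge value only on the null set

sup_norm = np.abs(f).max()                 # pointwise supremum: 100
ess_sup  = np.abs(f)[mu > 0].max()         # ||f||_∞ in L^∞(μ): 1.2, null sets ignored

assert sup_norm == 100.0 and ess_sup == 1.2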
( )
The next theorem is the Riesz representation theorem for L1 () .

Theorem 20.3.4 (Riesz representation theorem) Let (Ω, S, μ) be a finite measure space. If Λ ∈ (L¹(μ))′, then there exists a unique h ∈ L^∞(μ) such that
 Λ(f) = ∫ hf dμ
for all f ∈ L¹(μ). If h is the function in L^∞(μ) representing Λ ∈ (L¹(μ))′, then ||h||_∞ = ||Λ||.

Proof: Just as in the proof of Theorem 20.3.1, there exists a unique h L1 ()


such that for all simple functions, s,

(s) = hs d. (20.3.11)

To show h L (), let > 0 be given and let

E = {x : |h(x)| |||| + }.

Let |k| = 1 and hk = |h|. Since the measure space is nite, k L1 (). As
in Theorem 20.3.1 let {sn } be a sequence of simple functions converging to k in
L1 (), and pointwise. It follows from the construction in Theorem 7.5.6 on Page
175 that it can be assumed |sn | 1. Therefore

(kXE ) = lim (sn XE ) = lim hsn d = hkd
n n E E

where the last equality holds by the Dominated Convergence theorem. Therefore,

||||(E) |(kXE )| = | hkXE d| = |h|d
E
(|||| + )(E).

It follows that (E) = 0. Since > 0 was arbitrary, |||| ||h|| . Since h L (),
the density of the simple functions in L1 () and 20.3.11 imply

f = hf d , |||| ||h|| . (20.3.12)


This proves the existence part of the theorem. To verify uniqueness, suppose h1
and h2 both represent and let f L1 () be such that |f | 1 and f (h1 h2 ) =
|h1 h2 |. Then

0 = f f = (h1 h2 )f d = |h1 h2 |d.

Thus h1 = h2 . Finally,

|||| = sup{| hf d| : ||f ||1 1} ||h|| ||||

by 20.3.12. 
Next these results are extended to the nite case.

Lemma 20.3.5 Let (, S, ) be a measure space and suppose there exists a mea-
r such that r (x) > 0 for all x, there exists M such that |r (x)| < M
surable function,
for all x, and rd < . Then for

(Lp (, )) , p 1,

there exists h Lq (, ), L (, ) if p = 1 such that



f = hf d.

Also ||h|| = ||||. (||h|| = ||h||q if p > 1, ||h|| if p = 1). Here

1 1
+ = 1.
p q

e, according to the rule


Proof: Dene a new measure

e (E)
rd. (20.3.13)
E


e is a nite measure on S. For (Lp ()) , dene
Thus

e (Lp (e
))

by ( )
e (g) r1/p g


This really is in (Lp (e
)) because

( ) ( )1/p
e 1/p p
(g) r g ||||
1/p
r g d = |||| ||g||Lp (e)


Therefore, by Theorems 20.3.4 and 20.3.1 there exists a unique h Lq (e) which
e
represents . Here q = if p = 1 and satises 1/q + 1/p = 1 otherwise. Thus for
g Lp (e
) ,
( ) ( )( )
1/p e
r g (g) = hgrd = r1/q h r1/p g d

( )
For f Lp () , it follows f = r1/p r1/p f = r1/p g and r1/p f Lp (e ). Thus
from the above,
( ( )) ( ) ( ) ( )
1/p 1/p 1/q 1/p 1/p
(f ) = r r f = r h r r f d = r1/q h f d

Since h Lq (e
) , it follows r1/q h Lq (). This is true even in the case that p = 1
so q = because r is bounded. It follows
q2
1/q q 1/q 1/q

r h q
= r h r h 1/q
r h =
L ()

( q2 ) ( ( )p )1/p

r1/q h r 1/q
h ||||
1/q q/p
r h d = ||||
1/q q/p
r h q
L ()

and so
1/q
r h |||| .
Lq ()

Now (
) 1/q
|||| sup r 1/q
h f d r h ||||
Lq ()
||f ||Lp () 1

and so all the conclusions of Theorems 20.3.4 and 20.3.1 hold. 


A situation in which the conditions of the lemma are satised is the case where
the measure space is nite. In fact, you should show this is the only case in which
the conditions of the above lemma hold.

Theorem 20.3.6 (Riesz representation theorem) Let (Ω, S, μ) be σ finite and let
 Λ ∈ (Lᵖ(Ω, μ))′, p ≥ 1.
Then there exists a unique h ∈ L^q(Ω, μ) (h ∈ L^∞(Ω, μ) if p = 1) such that
 Λf = ∫ hf dμ.
Also ||h|| = ||Λ||. (||h|| = ||h||_q if p > 1, ||h||_∞ if p = 1.) Here
 1/p + 1/q = 1.

Proof: Without loss of generality, assume μ(Ω) = ∞. Then let {Ωₙ} be a sequence of disjoint elements of S having the property that
 1 < μ(Ωₙ) < ∞, ∪_{n=1}^∞ Ωₙ = Ω.
Define
 r(x) = Σ_{n=1}^∞ (1/n²) X_{Ωₙ}(x) (1/μ(Ωₙ)), μ̃(E) ≡ ∫_E r dμ.
Thus
 μ̃(Ω) = ∫_Ω r dμ = Σ_{n=1}^∞ 1/n² < ∞
so μ̃ is a finite measure. The above lemma gives the existence part of the conclusion
of the theorem. Uniqueness is done as before. This proves the theorem.
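The weight r in the proof can be written down explicitly once the Ωₙ are chosen. The following sketch is illustrative only (it takes Ω = [0, ∞) with Lebesgue measure and Ωₙ = [2(n − 1), 2n), so μ(Ωₙ) = 2); it just confirms numerically that r is positive, bounded, and has finite integral Σ 1/n², which is all Lemma 20.3.5 needs.

import numpy as np

N = 2000                                       # truncate the series for the illustration
n = np.arange(1, N + 1)
mu_of_Omega_n = 2.0 * np.ones(N)               # μ(Ω_n) = length of [2(n-1), 2n) = 2

r_on_Omega_n = 1.0 / (n ** 2 * mu_of_Omega_n)          # value of r on Ω_n
integral_r = (r_on_Omega_n * mu_of_Omega_n).sum()      # ∫ r dμ = Σ 1/n²

assert r_on_Omega_n.min() > 0 and r_on_Omega_n.max() <= 1
assert abs(integral_r - np.pi ** 2 / 6) < 1e-3          # Σ 1/n² = π²/6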
With the Riesz representation theorem, it is easy to show that

Lp (), p > 1

is a reexive Banach space. Recall Denition 18.2.14 on Page 457 for the denition.

Theorem 20.3.7 For (, S, ) a nite measure space and p > 1, Lp () is re-


exive.

Proof: Let r : (Lr ()) Lr () be dened for 1
r + 1
r
= 1 by

( r )g d = g

for all g Lr (). From Theorem 20.3.6 r is one to one, onto, continuous and
linear. By the open map theorem, 1 r is also one to one, onto, and continuous
( r equals the representor of ). Thus r is also one to one, onto, and continuous
by Corollary 18.2.11. Now observe that J = p 1 q
q . To see this, let z (L ) ,
p
y (L ) ,

p 1
q ( q z )(y ) = ( p z )(y )

= z ( p y )

= ( q z )( p y )d,

J( q z )(y ) = y ( q z )

= ( p y )( q z )d.

Therefore p 1 q p
q = J on q (L ) = L . But the two maps are onto and so J is
also onto.

20.4 The Dual Space Of L^∞(Ω)

What about the dual space of L^∞(Ω)? This will involve the following lemma. Also recall the notion of total variation defined in Definition 20.2.2.
Lemma 20.4.1 Let (Ω, F) be a measure space. Denote by BV(Ω) the space of finitely additive complex measures λ such that |λ|(Ω) < ∞. Then defining ||λ|| ≡ |λ|(Ω), it follows that BV(Ω) is a Banach space.
Proof: It is obvious that BV () is a vector space with the obvious conventions
involving scalar multiplication. Why is |||| a norm? All the axioms are obvious
except for the triangle inequality. However, this is not too hard either.


|| + || | + | () = sup | (A) + (A)|
()
A()


sup | (A)| + sup | (A)|
() ()
A() A()

|| () + || () = |||| + |||| .
Suppose now that { n } is a Cauchy sequence. For each E F,
| n (E) m (E)| || n m ||
and so the sequence of complex numbers n (E) converges. That to which it con-
verges is called (E) . Then it is obvious that (E) is nitely additive. Why is ||
nite? Since |||| is a norm, it follows that there exists a constant C such that for
all n,
| n | () < C
Let () be any partition. Then

| (A)| = lim | n (A)| C.
n
A() A()

Hence BV (). Let > 0 be given and let N be such that if n, m > N, then
|| n m || < /2.
Pick any such n. Then choose () such that

| n | () /2 < | (A) n (A)|
A()

= lim | m (A) n (A)| < lim inf | n m | () /2
m m
A()

It follows that
lim || n || = 0. 
n

Corollary 20.4.2 Suppose (, F) is a measure space as above and suppose is


a measure dened on F. Denote by BV (; ) those nitely additive measures of
BV () such that in the usual sense that if (E) = 0, then (E) = 0.
Then BV (; ) is a closed subspace of BV ().
Proof: It is clear that it is a subspace. Is it closed? Suppose n and each
n is in BV (; ) . Then if (E) = 0, it follows that n (E) = 0 and so (E) = 0
also, being the limit of 0. 
n
Denition 20.4.3 For s a simple function s () = k=1 ck XEk () and BV () ,
dene an integral with respect to as follows.
n
sd ck (Ek ) .
k=1

For f function which is in L (; ) , dene f d as follows. Applying Theorem
7.5.6, to the positive and negative parts of real and imaginary parts of f, there exists
a sequence of simple functions {sn } which converges uniformly to f o a set of
measure zero. Then
f d lim sn d
n

Lemma 20.4.4 The above denition of the integral with respect to a nitely addi-
tive measure in BV (; ) is well dened.
Proof: First consider the claim about the integral being well dened on the
simple functions. This is clearly true if it is required that the ck are disjoint and
the Ek also disjoint having union equal to . Thus dene the integral of a simple
function in this manner. First write the simple function as

n
ck XEk
k=1

where the ck are the values of the simple function. Then use the above formula to
dene the integral. Next suppose the Ek are disjoint but the ck are not necessarily
distinct. Let the distinct values of the ck be a1 , , am


ck XEk = aj XEi = aj Ei
k j i:ci =aj j i:ci =aj

= aj (Ei ) = ck (Ek )
j i:ci =aj k

and so the same formula for the integral of a simple function is obtained in this case
also. Now consider two simple functions

n
m
s= ak XEk , t = bj XFj
k=1 j=1

where the ak and bj are the distinct values of the simple functions. Then from what
was just shown,

n m m n
(s + t) d = ak XEk Fj + bj XEk Fj d
k=1 j=1 j=1 k=1


= ak XEk Fj + bj XEk Fj d
j,k

= (ak + bj ) (Ek Fj )
j,k


n
m
m
n
= ak (Ek Fj ) + bj (Ek Fj )
k=1 j=1 j=1 k=1
n
m
= ak (Ek ) + bj (Fj )
k=1 j=1

= sd + td

Thus the integral is linear on simple functions so, in particular, the formula given
in the above denition is well dened regardless.
So what about the denition for f L (; )? Since f L , there is a set
of measure zero N such that on N C there exists a sequence of simple functions
which converges uniformly to f on N C . Consider sn and sm . As in the above, they
can be written as
p p
ck XEk ,
n
k X Ek
cm
k=1 k=1

respectively, where the Ek are disjoint having union equal to . Then by uniform
convergence, if m, n are suciently large, |cnk cm
k | < or else the corresponding
Ek is contained in N C a set of measure 0 thanks to . Hence
p

sn d sm d =
(ck ck ) (Ek )
n m

k=1

p
|cnk cm
k | | (Ek )| ||||
k=1

and so the integrals of these simple functions converge. Similar reasoning shows
that the denition is not dependent on the choice of approximating sequence. 
Note also that for s simple,
 |∫ s dλ| ≤ ||s||_{L^∞} |λ|(Ω) = ||s||_{L^∞} ||λ||.

Next the dual space of L (; ) will be identied with BV (; ). First here is a


simple observation. Let BV (; ) . Then dene the following for f L (; ) .

T (f ) f d

Lemma 20.4.5 For T just dened,

|T f | ||f ||L ||||

Proof: As noted above, the conclusion true if f is simple. Now if f is in L ,


then it is the uniform limit of simple functions o a set of measure zero. Therefore,
by the denition of the T ,

|T f | = lim |T sn | lim inf ||sn ||L |||| = ||f ||L |||| . 


n n


Thus each T is in (L (; )) .
Here is the representation theorem, due to Kantorovitch, for the dual of L (; ).

Theorem 20.4.6 Let : BV (; ) (L (; )) be given by () T . Then
is one to one, onto and preserves norms.

Proof: It was shown in the above lemma that maps into (L (; )) . It is
obvious that is linear. Why does it preserve norms? From the above lemma,

|||| sup |T f | ||||


||f || 1

It remains to turn the inequality around. Let () be a partition. Then



| (A)| = sgn ( (A)) (A) f d
A() A()

where sgn ( (A)) is dened to be a complex number of modulus 1 such that sgn ( (A)) (A) =
| (A)| and
f () = sgn ( (A)) XA () .
A()

Therefore, choosing () suitably, since ||f || 1,



|||| = || () | (A)| = T (f )
A()

= |T (f )| = | () (f )| || ()|| ||||

Thus preserves norms. Hence it is one to one also. Why is onto?



Let (L (; )) . Then dene

(E) (XE ) (20.4.14)



This is obviously nitely additive because is linear. Also, if (E) = 0, then


XE = 0 in L and so (XE ) = 0. If () is any partition of , then

| (A)| = | (XA )| = sgn ( (XA )) (XA )
A() A() A()


= sgn ( (XA )) XA ||||
A()

and so |||| |||| showing that BV (; ). Also from 20.4.14, if s =


n
k=1 ck XEk is a simple function,

( n )
n
n
sd = ck (Ek ) = ck (XEk ) = ck XEk = (s)
k=1 k=1 k=1

Then letting f L (; ) , there exists a sequence of simple functions converging


to f uniformly o a set of measure zero and so passing to a limit in the above
with s replaced with sn it follows that

(f ) = f d

and so is onto. 

20.5 Non σ Finite Case

It is not necessary to assume μ is either finite or σ finite to establish the Riesz representation theorem for 1 < p < ∞. This involves the notion of uniform convexity. First we recall Clarksons inequalities.

Lemma 20.5.1 Let 2 ≤ p. Then
 || (f + g)/2 ||_{Lᵖ}^p + || (f − g)/2 ||_{Lᵖ}^p ≤ (1/2)( ||f||_{Lᵖ}^p + ||g||_{Lᵖ}^p )
Let 1 < p < 2. Then for 1/p + 1/q = 1,
 || (f + g)/2 ||_{Lᵖ}^q + || (f − g)/2 ||_{Lᵖ}^q ≤ ( (1/2)||f||_{Lᵖ}^p + (1/2)||g||_{Lᵖ}^p )^{q/p}
Recall also the following denition of uniform convexity.

Definition 20.5.2 A Banach space, X, is said to be uniformly convex if whenever ||xₙ|| ≤ 1 and ||(xₙ + x_m)/2|| → 1 as n, m → ∞, then {xₙ} is a Cauchy sequence and xₙ → x where ||x|| = 1.

Observe that Clarksons inequalities imply Lp is uniformly convex for all p >
1. Uniformly convex spaces have a very nice property which is described in the
following lemma. Roughly, this property is that any element of the dual space
achieves its norm at some point of the closed unit ball.

Lemma 20.5.3 Let X be uniformly convex and let X . Then there exists
x X such that
||x|| = 1, (x) = ||||.

Proof: Let ||e


xn || 1 and | (e
xn ) | ||||. Let xn = wn x
en where |wn | = 1 and

xn = |e
wn e xn |.

Thus (xn ) = | (xn ) | = | (e


xn ) | ||||.

(xn ) ||||, ||xn || 1.

We can assume, without loss of generality, that

||||
(xn ) = | (xn ) |
2
and = 0.
Claim || xn +x 2
m
|| 1 as n, m .
Proof of Claim: Let n, m be large enough that (xn ) , (xm ) |||| 2
where 0 < . Then ||xn + xm || = 0 because if it equals 0, then xn = xm so
(xn ) = (xm ) but both (xn ) and (xm ) are positive. Therefore consider
xn +xm
||xn +xm || , a vector of norm 1. Thus,
( )
(xn + xm ) 2||||
|||| | | .
||xn + xm || ||xn + xm ||

Hence
||xn + xm |||||| 2|||| .
Since > 0 is arbitrary, limn,m ||xn + xm || = 2. This proves the claim.
By uniform convexity, {xn } is Cauchy and xn x, ||x|| = 1. Thus (x) =
limn (xn ) = ||||. 
The proof of the Riesz representation theorem will be based on the following
lemma which says that if you can show a directional derivative exists, then it can
be used to represent a functional.

Lemma 20.5.4 (McShane) Let X be a complex normed linear space and let X .
Suppose there exists x X, ||x|| = 1 with x = |||| = 0. Let y X and let
y (t) = ||x + ty|| for t R. Suppose y (0) exists for each y X. Then for all
y X,
y (0) + i iy (0) = ||||1 (y) .

Proof: Suppose rst that |||| = 1. Then since (x) = 1, (y (y)x) = 0 and
so
(x + t(y (y)x)) = (x) = 1 = ||||.
Therefore, ||x + t(y (y)x)|| 1 since otherwise ||x + t(y (y)x)|| = r < 1 and
so ( )
1 1 1
(x + t(y (y)x)) = (x) =
r r r
which would imply that |||| > 1.
Also for small t, |(y)t| < 1, and so

1 ||x + t (y (y)x)|| = ||(1 (y)t)x + ty||



t
|1 (y) t| x + y .
1 (y) t
This implies
1
|1 + t (y) + o (t)| =
|1 t(y)|

t
x + y = ||x + ty + o (t)|| (20.5.15)
1 (y)t
where limt0 o (t) (t1 ) = 0. Thus for t > 0,
|1 + t (y)| 1 ||x + ty|| ||x|| o (t)
Re (y) |Re (y)| +
t t t
and for t < 0,
|1 + t (y)| 1 ||x + ty|| ||x|| o (t)
Re (y) +
t t t
By assumption, letting t 0+ and t 0,
||x + ty|| ||x||
Re (y) = lim = y (0) .
t0 t
Now
(y) = Re (y) + i Im (y)
so
(iy) = i(y) = i Re (y) + Im (y)
and
(iy) = Re (iy) + i Im (iy).
Hence
Re (iy) = Im (y).
Consequently,

(y) = Re (y) + i Im (y) = Re (y) + i Re (iy)



= y (0) + i iy (0).
This proves the lemma when |||| = 1. For arbitrary = 0, let (x) = ||||, ||x|| = 1.
1
Then from above, if 1 (y) |||| (y) , ||1 || = 1 and so from what was just
shown,
(y)
1 (y) = = y (0) + i iy (0) 
||||
Now here are some short observations. For t ∈ R, p > 1, and x, y ∈ C, x ≠ 0,
 lim_{t→0} ( |x + ty|^p − |x|^p ) / t = p|x|^{p−2}(Re x Re y + Im x Im y) = p|x|^{p−2} Re(x̄y)   (20.5.16)
Also from convexity of f(r) = r^p, for |t| < 1,
 |x + ty|^p − |x|^p ≤ ( |x| + |t||y| )^p − |x|^p
  = (1 + |t|)^p [ (|x| + |t||y|)/(1 + |t|) ]^p − |x|^p
  ≤ (1 + |t|)^p [ |x|^p/(1 + |t|) + |t||y|^p/(1 + |t|) ] − |x|^p
  ≤ (1 + |t|)^{p−1}( |x|^p + |t||y|^p ) − |x|^p
  ≤ ( (1 + |t|)^{p−1} − 1 )|x|^p + 2^{p−1}|t||y|^p
Now for f(t) ≡ (1 + t)^{p−1}, f′(t) is uniformly bounded, depending on p, for t ∈ [0, 1]. Hence the above is dominated by an expression of the form
 C_p( |x|^p + |y|^p )|t|   (20.5.17)
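Formula 20.5.16 is a one-variable calculus fact, and it is easy to test numerically. A small illustrative sketch comparing a difference quotient with p|x|^{p−2}Re(x̄y):

import numpy as np

p = 2.7
x, y = 1.0 - 2.0j, 0.5 + 1.5j
t = 1e-6

difference_quotient = (abs(x + t * y) ** p - abs(x) ** p) / t
derivative_formula = p * abs(x) ** (p - 2) * (x.conjugate() * y).real   # 20.5.16

assert abs(difference_quotient - derivative_formula) < 1e-4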

The above lemma and uniform convexity of Lp can be used to prove a general
version of the Riesz representation theorem next. This version makes no assumption
that the measure space is nite. Let p > 1 and let : Lq (Lp ) be dened by

(g)(f ) = gf d. (20.5.18)

Theorem 20.5.5 (Riesz representation theorem p > 1) The map is 1-1, onto,
continuous, and
||g|| = ||g||, |||| = 1.

Proof: Obviously is linear. Suppose g = 0. Then 0 = gf d for all
f Lp . Let f = |g|q2 g. Then f Lp and so 0 = |g|q d. Hence g = 0 and is
1-1. That g (Lp ) is obvious from the Holder inequality. In fact,

|(g)(f )| ||g||q ||f ||p ,



and so ||(g)|| ||g||q. To see that equality holds, let


f = |g|q2 g ||g||1q
q .
Then ||f ||p = 1 and

(g)(f ) = |g|q d||g||1q
q = ||g||q .

Thus |||| = 1.
It remains to show is onto. Let (Lp ) . Is = g for some g Lq ? Without
loss of generality, assume = 0. By uniform convexity of Lp , Lemma 20.5.3, there
exists g such that
g = ||||, g Lp , ||g|| = 1.
p
For f Lp , dene f (t) |g + tf | d. Thus
1
f (t) ||g + tf ||p f (t) p .
Does f (0) exist? Let [g = 0] denote the set {x : g (x) = 0}.

f (t) f (0) (|g + tf |p |g|p )
= d
t t
p p
From 20.5.17, the integrand is bounded by Cp (|f | + |g| ) . Therefore, using 20.5.16,
the dominated convergence theorem applies and it follows f (0) =
[ ]
f (t) f (0) p1 (|g + tf |p |g|p )
lim = lim |t| |f | d +
p
d
t0 t t0 [g=0] [g=0] t

p2 p2
=p |g| Re (g f ) d = p |g| Re (g f ) d
[g=0]
Hence
p
f (0) = ||g|| q |g(x)|p2 Re(g(x)f(x))d.

Note 1
p 1 = 1q . Therefore,

p
if (0) = ||g|| q |g(x)|p2 Re(ig(x)f(x))d.

But Re(ig f) = Im(g f) and so by the McShane lemma,



p
(f ) = |||| ||g|| q |g(x)|p2 [Re(g(x)f(x)) + i Re(ig(x)f(x))]d

p
= |||| ||g|| q |g(x)|p2 [Re(g(x)f(x)) + i Im(g(x)f(x))]d

p
= |||| ||g|| q |g(x)|p2 g(x)f (x)d.

This shows that p


= (|||| ||g|| q |g|p2 g)
and veries is onto. 

20.6 The Dual Space Of C0 (X)


Consider the dual space of C₀(X) where X is a locally compact Hausdorff space. It will turn out to be a space of measures. To show this, the following lemma will be convenient. Recall this space is defined as follows.

Definition 20.6.1 f ∈ C₀(X) means that for every ε > 0 there exists a compact set K such that |f(x)| < ε whenever x ∉ K. Recall the norm on this space is
 ||f||_∞ ≡ ||f|| ≡ sup{|f(x)| : x ∈ X}

Lemma 20.6.2 Suppose is a mapping which has nonnegative values which is


dened on the nonnegative functions in C0 (X) such that

(af + bg) = a (f ) + b (g) (20.6.19)

whenever a, b 0 and f, g 0. Then there exists a unique extension of to all of


C0 (X), such that whenever f, g C0 (X) and a, b C, it follows

(af + bg) = a (f ) + b (g) .

If
| (f )| C ||f ||
then
|f | C ||f ||

Proof: Let C0 (X; R) be the real-valued functions in C0 (X) and dene

R (f ) = f + f

for f C0 (X; R). Use the identity

(f1 + f2 )+ + f1 + f2 = f1+ + f2+ + (f1 + f2 )

and 20.6.19 to write

(f1 + f2 )+ (f1 + f2 ) = f1+ f1 + f2+ f2 ,

it follows that R (f1 + f2 ) = R (f1 ) + R (f2 ). To show that R is linear, it is


necessary to verify that R (cf ) = cR (f ) for all c R. But

(cf ) = cf ,

if c 0 while
(cf )+ = c(f ),
if c < 0 and
(cf ) = (c)f +,

if c < 0. Thus, if c < 0,


( ) ( )
R (cf ) = (cf )+ (cf ) = (c) f (c)f +
= c(f ) + c(f + ) = c((f + ) (f )) = cR (f ) .
A similar formula holds more easily if c 0. Now let
f = R (Re f ) + iR (Im f )
for arbitrary f C0 (X). This is linear as desired.
Here is why. It is obvious that (f + g) = (f ) + (g) from the fact that
taking the real and imaginary parts are linear operations. The only thing to check
is whether you can factor out a complex scalar.
((a + ib) f ) = (af ) + (ibf )
R (a Re f ) + iR (a Im f ) + R (b Im f ) + iR (b Re f )
because ibf = ib Re f b Im f and so Re (ibf ) = b Im f and Im (ibf ) = b Re f .
Therefore, the above equals
= (a + ib) R (Re f ) + i (a + ib) R (Im f )
= (a + ib) (R (Re f ) + iR (Im f )) = (a + ib) f
The extension is obviously unique because all the above is required in order for
to be linear.
It remains to verify the claim about continuity of . From the denition of ,
if 0 g f, then
(f ) = (f g + g) = (f g) + (g) (g)
( )
|R f | f + f max f + , f (|f |) C ||f ||
Then letting f = |f | , || = 1, and using the above,
|f | = f = (f ) R (Re (f )) = |R (Re (f ))|
C ||Re (f )|| C ||f || 
Let L C0 (X) . Also denote by C0+ (X) the set of nonnegative continuous
functions dened on X. Dene for f C0+ (X)
(f ) = sup{|Lg| : |g| f }.
Note that (f ) < because |Lg| ||L||||g|| ||L||||f || for |g| f . Then the
following lemma is important.
Lemma 20.6.3 If c 0, (cf ) = c(f ), f1 f2 implies f1 f2 , and
(f1 + f2 ) = (f1 ) + (f2 ).
Also
0 (f ) ||L|| ||f ||

Proof: The rst two assertions are easy to see so consider the third.
For fj C0+ (X) , there exists gi C0 (X) such that |gi | fi and

(f1 ) + (f2 ) |L (g1 )| + |L (g2 )| + 2


= L ( 1 g1 ) + L ( 2 g2 ) + 2
= L ( 1 g1 + 2 g2 ) + 2
= |L ( 1 g1 + 2 g2 )| + 2

where |gi | fi and | i | = 1 and i L (gi ) = |L (gi )|. Now

| 1 g1 + 2 g2 | |g1 | + |g2 | f1 + f2

and so the above shows

(f1 ) + (f2 ) (f1 + f2 ) + 2.

Since is arbitrary, (f1 ) + (f2 ) (f1 + f2 ) . It remains to verify the other


inequality.
Now let |g| f1 + f2 , |Lg| (f1 + f2 ) . Let
{
fi (x)g(x)
hi (x) = f1 (x)+f2 (x) if f1 (x) + f2 (x) > 0,
0 if f1 (x) + f2 (x) = 0.

Then hi is continuous and h1 (x) + h2 (x) = g(x), |hi | fi . The reason it is


continuous at a point where f1 (x) + f2 (x) = 0 is that at every point y where
f1 (y) + f2 (y) > 0, the top description of the function gives

fi (y) g (y)

f1 (y) + f2 (y) |g (y)|

Therefore,

+ (f1 + f2 ) |Lg| |Lh1 + Lh2 | |Lh1 | + |Lh2 |


(f1 ) + (f2 ).

Since > 0 is arbitrary, this shows that

(f1 + f2 ) (f1 ) + (f2 ) (f1 + f2 )

The last assertion follows from

(f ) = sup{|Lg| : |g| f } sup ||L|| ||g|| ||L|| ||f ||


||g|| ||f ||

which proves the lemma.


Let be dened in Lemma 20.6.2. Then is linear by this lemma and also
satises
|f | ||L|| ||f || . (20.6.20)

Also, if $f \ge 0$,

$$\Lambda f = \Lambda_{\mathbb{R}} f = \lambda(f) \ge 0.$$

Therefore, $\Lambda$ is a positive linear functional on $C_0(X)$. In particular, it is a positive linear functional on $C_c(X)$. By Theorem 16.4.4 on Page 403, there exists a unique measure $\mu$ such that

$$\Lambda f = \int_X f \, d\mu$$

for all $f \in C_c(X)$. This measure is inner regular on all open sets and on all measurable sets having finite measure. In fact, it is actually a finite measure.

Lemma 20.6.4 Let $L \in C_0(X)'$ as above. Then letting $\mu$ be the Radon measure just described, it follows that $\mu$ is finite and

$$\mu(X) = \|\Lambda\| = \|L\|.$$

Proof: First of all, why is $\|\Lambda\| = \|L\|$? From 20.6.20 it follows $\|\Lambda\| \le \|L\|$. But also

$$|Lg| \le \lambda(|g|) = \Lambda(|g|) \le \|\Lambda\|\,\|g\|_\infty$$

and so by definition of the operator norm, $\|L\| \le \|\Lambda\|$.

Now $X$ is an open set and so

$$\mu(X) = \sup\{\mu(K) : K \subseteq X,\ K \text{ compact}\}$$

and so letting $K \prec f \prec X$ for one of these $K$, it also follows

$$\mu(X) = \sup\{\Lambda f : f \prec X\}.$$

However, for such $f \prec X$,

$$0 \le \Lambda f = \Lambda_{\mathbb{R}} f = \lambda(f) \le \|L\|\,\|f\|_\infty = \|L\|$$

and so

$$\mu(X) \le \|L\|.$$

Now since $C_c(X)$ is dense in $C_0(X)$, there exists $f \in C_c(X)$ such that $\|f\|_\infty \le 1$ and

$$|\Lambda f| + \varepsilon > \|\Lambda\| = \|L\|.$$

Then also $|f| \prec X$ and so

$$\|L\| - \varepsilon < |\Lambda f| \le \Lambda|f| \le \mu(X).$$

Since $\varepsilon$ is arbitrary, this shows $\|L\| = \mu(X)$. This proves the lemma.

What follows is the Riesz representation theorem for $C_0(X)'$.

Theorem 20.6.5 Let $L \in (C_0(X))'$ for $X$ a locally compact Hausdorff space. Then there exist a finite Radon measure $\mu$ and a function $\sigma \in L^\infty(X, \mu)$ such that for all $f \in C_0(X)$,

$$L(f) = \int_X f \sigma \, d\mu.$$

Furthermore,

$$\mu(X) = \|L\|, \qquad |\sigma| = 1 \text{ a.e.},$$

and if

$$\nu(E) \equiv \int_E \sigma \, d\mu,$$

then $\mu = |\nu|$.

Proof: From the above there exists a unique Radon measure $\mu$ such that for all $f \in C_c(X)$,

$$\Lambda f = \int_X f \, d\mu.$$

Then for $f \in C_c(X)$,

$$|Lf| \le \lambda(|f|) = \Lambda(|f|) = \int_X |f|\, d\mu = \|f\|_{L^1(\mu)}.$$

Since $\mu$ is both inner and outer regular thanks to it being finite, $C_c(X)$ is dense in $L^1(X, \mu)$. (See Theorem 12.5.3 for more than is needed.) Therefore $L$ extends uniquely to an element of $(L^1(X, \mu))'$, denoted $\widetilde{L}$. By the Riesz representation theorem for $L^1$ for $\sigma$ finite measure spaces, there exists a unique $\sigma \in L^\infty(X, \mu)$ such that for all $f \in L^1(X, \mu)$,

$$\widetilde{L} f = \int_X f \sigma \, d\mu.$$

In particular, for all $f \in C_0(X)$,

$$L f = \int_X f \sigma \, d\mu,$$

and it follows from Lemma 20.6.4 that $\mu(X) = \|L\|$.

It remains to verify $|\sigma| = 1$ a.e. For any $f \ge 0$,

$$\Lambda f = \int_X f \, d\mu \ge |Lf| = \left|\int_X f \sigma \, d\mu\right|.$$

Now if $E$ is measurable, the regularity of $\mu$ implies there exists a sequence of bounded functions $f_n \in C_c(X)$ such that $f_n(x) \to \mathcal{X}_E(x)$ $\mu$ a.e. Then using the dominated convergence theorem in the above,

$$\int_E d\mu = \lim_{n \to \infty} \int_X f_n \, d\mu \ge \lim_{n \to \infty} \left|\int_X f_n \sigma \, d\mu\right| = \left|\int_E \sigma \, d\mu\right|$$

and so if $\mu(E) > 0$,

$$1 \ge \left|\frac{1}{\mu(E)} \int_E \sigma \, d\mu\right|,$$

which shows from Lemma 20.2.7 that $|\sigma| \le 1$ a.e. But also, choosing $f_1$ appropriately, $\|f_1\|_\infty \le 1$, and letting $\overline{\theta}\, L f_1 = |L f_1|$,

$$\mu(X) = \|L\| = \sup_{\|f\|_\infty \le 1} |Lf| \le |L f_1| + \varepsilon
= \int_X \overline{\theta} f_1 \sigma \, d\mu + \varepsilon
= \int_X \operatorname{Re}\bigl(\overline{\theta} f_1 \sigma\bigr) \, d\mu + \varepsilon
\le \int_X |\sigma| \, d\mu + \varepsilon,$$

and since $\varepsilon$ is arbitrary,

$$\mu(X) \le \int_X |\sigma| \, d\mu,$$

which requires $|\sigma| = 1$ a.e., since $|\sigma|$ was shown to be no larger than $1$, and if it were smaller than $1$ on a set of positive measure, then the above could not hold.

It only remains to verify $\mu = |\nu|$. By Corollary 20.2.9,

$$|\nu|(E) = \int_E |\sigma| \, d\mu = \int_E 1 \, d\mu = \mu(E)$$

and so $\mu = |\nu|$. This proves the theorem.

Sometimes people write

$$\int_X f \, d\mu \equiv \int_X f \sigma \, d|\mu|$$

where $\sigma \, d|\mu|$ is the polar decomposition of the complex measure $\mu$. Then with this convention, the above representation is

$$L(f) = \int_X f \, d\nu, \qquad |\nu|(X) = \|L\|.$$

Also note that at most one $\nu$ can represent $L$. If there were two of them $\nu_i$, $i = 1, 2$, then $\nu_1 - \nu_2$ would represent $0$ and so $|\nu_1 - \nu_2|(X) = 0$. Hence $\nu_1 = \nu_2$, at least on every Borel set.

20.7 The Dual Space Of C0(X), Another Approach

It is possible to obtain the above theorem by a slick trick after first proving it for the special case where $X$ is a compact Hausdorff space. For $X$ a locally compact Hausdorff space, $\widetilde{X}$ denotes the one point compactification of $X$. Thus, $\widetilde{X} = X \cup \{\infty\}$ and the topology of $\widetilde{X}$ consists of the usual topology of $X$ along with all complements of compact sets, which are defined as the open sets containing $\infty$.

Also $C_0(X)$ will denote the space of continuous functions $f$ defined on $X$ such that, in the topology of $\widetilde{X}$, $\lim_{x \to \infty} f(x) = 0$. For this space of functions,

$$\|f\|_\infty \equiv \sup\{|f(x)| : x \in X\}$$

is a norm which makes this into a Banach space. Then the generalization is the following corollary.


Corollary 20.7.1 Let $L \in (C_0(X))'$ where $X$ is a locally compact Hausdorff space. Then there exist $\sigma \in L^\infty(X, \mu)$ for $\mu$ a finite Radon measure such that for all $f \in C_0(X)$,

$$L(f) = \int_X f \sigma \, d\mu.$$

Proof: Let

$$\widetilde{D} \equiv \bigl\{ f \in C\bigl(\widetilde{X}\bigr) : f(\infty) = 0 \bigr\}.$$

Thus $\widetilde{D}$ is a closed subspace of the Banach space $C\bigl(\widetilde{X}\bigr)$. Let $\theta : C_0(X) \to \widetilde{D}$ be defined by

$$\theta f(x) = \begin{cases} f(x) & \text{if } x \in X, \\ 0 & \text{if } x = \infty. \end{cases}$$

Then $\theta$ is an isometry of $C_0(X)$ and $\widetilde{D}$ ($\|\theta u\| = \|u\|$). The following diagram is obtained:

$$\begin{array}{ccccc}
C_0(X)' & \xleftarrow{\ \theta^*\ } & \widetilde{D}' & \xleftarrow{\ i^*\ } & C\bigl(\widetilde{X}\bigr)' \\
C_0(X) & \xrightarrow{\ \theta\ } & \widetilde{D} & \xrightarrow{\ i\ } & C\bigl(\widetilde{X}\bigr)
\end{array}$$

By the Hahn Banach theorem, there exists $L_1 \in C\bigl(\widetilde{X}\bigr)'$ such that $\theta^* i^* L_1 = L$. Now apply Theorem 20.6.5 to get the existence of a finite Radon measure $\mu_1$ on $\widetilde{X}$ and a function $\sigma \in L^\infty\bigl(\widetilde{X}, \mu_1\bigr)$ such that

$$L_1 g = \int_{\widetilde{X}} g \sigma \, d\mu_1.$$

Letting the $\sigma$ algebra of $\mu_1$ measurable sets be denoted by $\mathcal{S}_1$, define

$$\mathcal{S} \equiv \{E \setminus \{\infty\} : E \in \mathcal{S}_1\}$$

and let $\mu$ be the restriction of $\mu_1$ to $\mathcal{S}$. If $f \in C_0(X)$,

$$L f = \theta^* i^* L_1 f \equiv L_1 i \theta f = L_1 \theta f = \int_{\widetilde{X}} \theta f \, \sigma \, d\mu_1 = \int_X f \sigma \, d\mu.$$

This proves the corollary.



20.8 More Attractive Formulations

In this section, Corollary 20.7.1 will be refined and placed in an arguably more attractive form. The measures involved will always be complex Borel measures defined on a $\sigma$ algebra of subsets of $X$, a locally compact Hausdorff space.

Definition 20.8.1 Let $\nu$ be a complex measure. Then $\int f \, d\nu \equiv \int f h \, d|\nu|$ where $h \, d|\nu|$ is the polar decomposition of $\nu$ described above. The complex measure $\nu$ is called regular if $|\nu|$ is regular.

The following lemma says that the difference of regular complex measures is also regular.

Lemma 20.8.2 Suppose $\nu_i$, $i = 1, 2$, is a complex Borel measure with total variation finite² defined on $X$, a locally compact Hausdorff space. Then $\nu_1 - \nu_2$ is also a regular measure on the Borel sets.

Proof: Let $E$ be a Borel set. That way it is in the $\sigma$ algebras associated with both $\nu_i$. Then by regularity of $\nu_i$, there exist $K$ and $V$, compact and open respectively, such that $K \subseteq E \subseteq V$ and $|\nu_i|(V \setminus K) < \varepsilon/2$. Therefore, summing over any finite disjoint collection of Borel sets $A \subseteq V \setminus K$,

$$\sum_{A} |(\nu_1 - \nu_2)(A)| = \sum_{A} |\nu_1(A) - \nu_2(A)|
\le \sum_{A} \bigl(|\nu_1(A)| + |\nu_2(A)|\bigr)
\le |\nu_1|(V \setminus K) + |\nu_2|(V \setminus K) < \varepsilon.$$

Therefore, $|\nu_1 - \nu_2|(V \setminus K) \le \varepsilon$ and this shows $\nu_1 - \nu_2$ is regular as claimed.

Theorem 20.8.3 Let $L \in C_0(X)'$. Then there exists a unique complex measure $\nu$, with $|\nu|$ regular and Borel, such that for all $f \in C_0(X)$,

$$L(f) = \int_X f \, d\nu.$$

Furthermore, $\|L\| = |\nu|(X)$.

Proof: By Corollary 20.7.1 there exists $\sigma \in L^\infty(X, \mu)$, where $\mu$ is a finite Radon measure, such that for all $f \in C_0(X)$,

$$L(f) = \int_X f \sigma \, d\mu.$$

Let a complex Borel measure $\nu$ be given by

$$\nu(E) \equiv \int_E \sigma \, d\mu.$$

²Recall this is automatic for a complex measure.

This is a well defined complex measure because $\mu$ is a finite measure. By Corollary 20.2.9,

$$|\nu|(E) = \int_E |\sigma| \, d\mu \qquad (20.8.21)$$

and $\nu = g|\nu|$ where $g \, d|\nu|$ is the polar decomposition for $\nu$. Therefore, for $f \in C_0(X)$,

$$L(f) = \int_X f \sigma \, d\mu = \int_X f g\, |\sigma| \, d\mu = \int_X f g \, d|\nu| \equiv \int_X f \, d\nu. \qquad (20.8.22)$$

From 20.8.21 and the regularity of $\mu$, it follows that $|\nu|$ is also regular.

What of the claim about $\|L\|$? By the regularity of $|\nu|$, it follows that $C_0(X)$ (in fact, $C_c(X)$) is dense in $L^1(X, |\nu|)$. Since $|\nu|$ is finite, $\overline{g} \in L^1(X, |\nu|)$. Therefore, there exists a sequence of functions $\{f_n\}$ in $C_0(X)$ such that $f_n \to \overline{g}$ in $L^1(X, |\nu|)$. Therefore, there exists a subsequence, still denoted by $\{f_n\}$, such that $f_n(x) \to \overline{g}(x)$ $|\nu|$ a.e. also. But since $|g(x)| = 1$ a.e., it follows that $h_n(x) \equiv \dfrac{f_n(x)}{|f_n(x)| + \frac{1}{n}}$ also converges pointwise $|\nu|$ a.e. to $\overline{g}(x)$, and $|h_n| \le 1$. Then from the dominated convergence theorem and 20.8.22,

$$\|L\| \ge \lim_{n \to \infty} \left|\int_X h_n g \, d|\nu|\right| = |\nu|(X).$$

Also, if $\|f\|_{C_0(X)} \le 1$, then

$$|L(f)| = \left|\int_X f g \, d|\nu|\right| \le \int_X |f| \, d|\nu| \le |\nu|(X)\, \|f\|_{C_0(X)},$$

and so $\|L\| \le |\nu|(X)$. This proves everything but uniqueness.

Suppose $\nu$ and $\nu_1$ both work. Then for all $f \in C_0(X)$,

$$0 = \int_X f \, d(\nu - \nu_1) = \int_X f h \, d|\nu - \nu_1|$$

where $h \, d|\nu - \nu_1|$ is the polar decomposition for $\nu - \nu_1$. By Lemma 20.8.2, $\nu - \nu_1$ is regular and so, as above, there exists $\{f_n\}$ such that $|f_n| \le 1$ and $f_n \to \overline{h}$ pointwise $|\nu - \nu_1|$ a.e. Therefore, $\int_X d|\nu - \nu_1| = 0$, so $\nu = \nu_1$. This proves the theorem.
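To see the representation in the simplest possible case, take $X$ to be a finite set with the discrete topology, so that $C_0(X)$ is just $\mathbb{C}^n$ with the sup norm and every complex measure is a finite list of point masses. The following Python sketch (an illustration under these simplifying assumptions, not part of the text) checks numerically that the operator norm of $L(f) = \sum_k f(x_k)\,\nu(\{x_k\})$ equals $|\nu|(X) = \sum_k |\nu(\{x_k\})|$; the optimizing $f$ multiplies each point mass by a unimodular scalar, in the spirit of the approximation argument above.

import numpy as np

# X = {0,...,n-1} with the discrete topology; C_0(X) is C^n with the sup norm.
# A functional L(f) = sum_k f(k) * nu[k] is represented by a complex vector nu.
rng = np.random.default_rng(0)
n = 5
nu = rng.normal(size=n) + 1j * rng.normal(size=n)   # the representing complex measure

def L(f):
    return np.sum(f * nu)

# Total variation |nu|(X): points form the finest partition, so it is sum of |nu({k})|.
total_variation = np.sum(np.abs(nu))

# ||L|| = sup{|L(f)| : ||f||_inf <= 1} is attained at f(k) = conj(nu[k]) / |nu[k]|.
f_opt = np.conj(nu) / np.abs(nu)
print(abs(L(f_opt)), total_variation)               # the two numbers agree (up to rounding)

# A random competitor with ||f||_inf <= 1 never does better:
f_rand = rng.normal(size=n) + 1j * rng.normal(size=n)
f_rand /= np.max(np.abs(f_rand))
print(abs(L(f_rand)) <= total_variation + 1e-12)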

20.9 Exercises
1. Suppose $\lambda$ is a vector measure having values in $\mathbb{R}^n$ or $\mathbb{C}^n$. Can you show that $|\lambda|$ must be finite? Hint: You might define for each $e_i$, one of the standard basis vectors, the real or complex measure $\lambda_{e_i}$ given by $\lambda_{e_i}(E) \equiv e_i \cdot \lambda(E)$. Why would this approach not yield anything for an infinite dimensional normed linear space in place of $\mathbb{R}^n$?
2. The Riesz representation theorem of the $L^p$ spaces can be used to prove a very interesting inequality. Let $r, p, q \in (1, \infty)$ satisfy

$$\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1.$$

Then

$$\frac{1}{q} = 1 + \frac{1}{r} - \frac{1}{p} > \frac{1}{r}$$

and so $r > q$. Let $\theta \in (0, 1)$ be chosen so that $\theta r = q$. Then, using $\frac{1}{p} + \frac{1}{p'} = 1$,

$$\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1 = \frac{1}{q} - \frac{1}{p'}$$

and so

$$\frac{1 - \theta}{q} = \frac{1}{q} - \frac{\theta}{q} = \frac{1}{q} - \frac{1}{r} = \frac{1}{p'},$$

which implies $p'(1 - \theta) = q$. Now let $f \in L^p(\mathbb{R}^n)$, $g \in L^q(\mathbb{R}^n)$, $f, g \ge 0$. Justify the steps in the following argument, using what was just shown, that $\theta r = q$ and $p'(1 - \theta) = q$. Let

$$h \in L^{r'}(\mathbb{R}^n), \qquad \frac{1}{r} + \frac{1}{r'} = 1.$$

Then

$$\int f * g(x)\, h(x)\, dx = \int\!\!\int f(y)\, g(x - y)\, h(x)\, dx\, dy$$

$$\le \int\!\!\int |f(y)|\, |g(x - y)|^{\theta}\, |g(x - y)|^{1 - \theta}\, |h(x)|\, dy\, dx$$

$$\le \int \left(\int \bigl(|g(x - y)|^{1 - \theta} |h(x)|\bigr)^{r'} dx\right)^{1/r'} \left(\int \bigl(|f(y)|\, |g(x - y)|^{\theta}\bigr)^{r} dx\right)^{1/r} dy$$

$$\le \left[\int \left(\int \bigl(|g(x - y)|^{1 - \theta} |h(x)|\bigr)^{r'} dx\right)^{p'/r'} dy\right]^{1/p'}
\left[\int \left(\int \bigl(|f(y)|\, |g(x - y)|^{\theta}\bigr)^{r} dx\right)^{p/r} dy\right]^{1/p}$$

$$\le \left[\int \left(\int \bigl(|g(x - y)|^{1 - \theta} |h(x)|\bigr)^{p'} dy\right)^{r'/p'} dx\right]^{1/r'}
\left[\int |f(y)|^{p} \left(\int |g(x - y)|^{\theta r} dx\right)^{p/r} dy\right]^{1/p}$$

$$= \left[\int |h(x)|^{r'} \left(\int |g(x - y)|^{(1 - \theta) p'} dy\right)^{r'/p'} dx\right]^{1/r'} \|g\|_q^{q/r}\, \|f\|_p$$

$$= \|g\|_q^{q/p'}\, \|h\|_{r'}\, \|g\|_q^{q/r}\, \|f\|_p = \|g\|_q\, \|f\|_p\, \|h\|_{r'}. \qquad (20.9.23)$$

Young's inequality says that

$$\|f * g\|_r \le \|g\|_q\, \|f\|_p. \qquad (20.9.24)$$

How does this inequality follow from the above computation? Does 20.9.23 continue to hold if $r, p, q$ are only assumed to be in $[1, \infty]$? Explain. Does 20.9.24 hold even if $r, p$, and $q$ are only assumed to lie in $[1, \infty]$? (A small numerical check of 20.9.24 is sketched after Problem 12 below.)
3. Suppose $(\Omega, \mu, \mathcal{S})$ is a finite measure space and that $\{f_n\}$ is a sequence of functions which converge weakly to $0$ in $L^p(\Omega)$. This means that

$$\int_\Omega f_n g \, d\mu \to 0$$

for every $g \in L^{p'}(\Omega)$. Suppose also that $f_n(x) \to 0$ a.e. Show that then $f_n \to 0$ in $L^{p - \varepsilon}(\Omega)$ for every $p > \varepsilon > 0$.

4. Give an example of a sequence of functions in $L^\infty(\Omega, \mu)$ which converges weak$^*$ to zero but which does not converge pointwise a.e. to zero. Convergence weak$^*$ to $0$ means that for every $g \in L^1(\Omega, \mu)$, $\int g(t) f_n(t) \, dt \to 0$. Hint: First consider $g \in C_c^\infty(\Omega)$ and maybe try something like $f_n(t) = \sin(nt)$. Do integration by parts.

5. Let $\lambda$ be a real vector measure on the measure space $(\Omega, \mathcal{F})$. That is, $\lambda$ has values in $\mathbb{R}$. The Hahn decomposition says there exist measurable sets $P, N$ such that

$$P \cup N = \Omega, \qquad P \cap N = \emptyset,$$

and for each $F \subseteq P$, $\lambda(F) \ge 0$, and for each $F \subseteq N$, $\lambda(F) \le 0$. These sets $P, N$ are called the positive set and the negative set respectively. Show the existence of the Hahn decomposition. Also explain how this decomposition is unique in the sense that if $P', N'$ is another Hahn decomposition, then $(P \setminus P') \cup (P' \setminus P)$ has measure zero, a similar formula holding for $N, N'$. When you have the Hahn decomposition, as just described, you define $\lambda^+(E) \equiv \lambda(E \cap P)$, $\lambda^-(E) \equiv -\lambda(E \cap N)$. This is sometimes called the Hahn Jordan decomposition. Hint: This is pretty easy if you use the polar decomposition above.
6. The Hahn decomposition holds for measures which have values in $(-\infty, \infty]$. Let $\lambda$ be such a measure which is defined on a $\sigma$ algebra of sets $\mathcal{F}$. This is not a vector measure because the set on which it has values is not a vector space. Thus this case is not included in the above discussion. $N \in \mathcal{F}$ is called a negative set if $\lambda(B) \le 0$ for all $B \subseteq N$. $P \in \mathcal{F}$ is called a positive set if for all $F \subseteq P$, $\lambda(F) \ge 0$. (Here it is always assumed you are only considering sets of $\mathcal{F}$.) Show that if $\lambda(A) \le 0$, then there exists $N \subseteq A$ such that $N$ is a negative set and $\lambda(N) \le \lambda(A)$. Hint: This is done by subtracting off disjoint sets having positive measure. Let $A \equiv N_0$ and suppose $N_n \subseteq A$ has been obtained. Tell why

$$t_n \equiv \sup\{\lambda(E) : E \subseteq N_n\} \ge 0.$$

Let $B_n \subseteq N_n$ be such that

$$\lambda(B_n) > \frac{t_n}{2}.$$

Then $N_{n+1} \equiv N_n \setminus B_n$. Thus the $N_n$ are decreasing in $n$ and the $B_n$ are disjoint. Explain why $\lambda(N_n) \le \lambda(N_0)$. Let $N = \cap_n N_n$. Argue $t_n$ must converge to $0$, since otherwise $\lambda(N) = -\infty$, which is not an allowed value. Explain why this requires $N$ to be a negative set in $A$ which has measure no larger than that of $A$.
7. Using Problem 6, complete the Hahn decomposition for $\lambda$ having values in $(-\infty, \infty]$. Now the Hahn Jordan decomposition for the measure $\lambda$ is

$$\lambda^+(E) \equiv \lambda(E \cap P), \qquad \lambda^-(E) \equiv -\lambda(E \cap N).$$

Explain why $\lambda^-$ is a finite measure. Hint: Let $N_0 = \emptyset$. For $N_n$ a given negative set, let

$$t_n \equiv \inf\{\lambda(E) : E \cap N_n = \emptyset\}.$$

Explain why you can assume that for all $n$, $t_n < 0$. Let $E_n \subseteq N_n^C$ be such that

$$\lambda(E_n) < t_n / 2 < 0$$

and from Problem 6 let $A_n \subseteq E_n$ be a negative set such that $\lambda(A_n) \le \lambda(E_n)$. Then $N_{n+1} \equiv N_n \cup A_n$. If $t_n$ does not converge to $0$, explain why there exists a set having measure $-\infty$, which is not allowed. Thus $t_n \to 0$. Let $N = \cup_{n=1}^\infty N_n$ and explain why $P \equiv N^C$ must be positive due to $t_n \to 0$.

8. What if $\lambda$ has values in $[-\infty, \infty)$? Prove there exists a Hahn decomposition for $\lambda$ as in the above problem. Why do we not allow $\lambda$ to have values in $[-\infty, \infty]$? Hint: You might want to consider $-\lambda$.
9. Suppose $X$ is a Banach space and let $X'$ denote its dual space. A sequence $\{x_n^*\}_{n=1}^\infty$ in $X'$ is said to converge weak$^*$ to $x^* \in X'$ if for every $x \in X$,

$$\lim_{n \to \infty} x_n^*(x) = x^*(x).$$

Let $\{\phi_n\}$ be a mollifier. Also let $\delta$ be the measure defined by

$$\delta(E) = 1 \text{ if } 0 \in E, \qquad \delta(E) = 0 \text{ if } 0 \notin E.$$

Explain how $\phi_n \to \delta$ weak$^*$.



10. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathbf{X} : \Omega \to \mathbb{R}^n$ be a random variable. This means $\mathbf{X}^{-1}(\text{open set}) \in \mathcal{F}$. Define a measure $\lambda_{\mathbf{X}}$ on the Borel sets of $\mathbb{R}^n$ as follows. For $E$ a Borel set,

$$\lambda_{\mathbf{X}}(E) \equiv P\bigl(\mathbf{X}^{-1}(E)\bigr).$$

Explain why this is well defined. Next explain why $\lambda_{\mathbf{X}}$ can be considered a Radon probability measure by completion. Explain why $\lambda_{\mathbf{X}} \in \mathcal{G}^*$ if

$$\lambda_{\mathbf{X}}(\psi) \equiv \int_{\mathbb{R}^n} \psi \, d\lambda_{\mathbf{X}},$$

where $\mathcal{G}$ is the collection of functions used to define the Fourier transform.

11. Using the above problem, the characteristic function of this measure (random variable) is

$$\phi_{\mathbf{X}}(\mathbf{y}) \equiv \int_{\mathbb{R}^n} e^{i \mathbf{x} \cdot \mathbf{y}} \, d\lambda_{\mathbf{X}}.$$

Show this always exists for any such random variable and is continuous. Next show that for two random variables $\mathbf{X}, \mathbf{Y}$, $\lambda_{\mathbf{X}} = \lambda_{\mathbf{Y}}$ if and only if $\phi_{\mathbf{X}}(\mathbf{y}) = \phi_{\mathbf{Y}}(\mathbf{y})$ for all $\mathbf{y}$. In other words, show the distribution measures are the same if and only if the characteristic functions are the same. A lot more can be concluded by looking at characteristic functions of this sort. The important thing about these characteristic functions is that they always exist, unlike moment generating functions.
12. It was shown above that if $\phi \in X'$ where $X$ is a uniformly convex Banach space, then there exists $x \in X$, $\|x\| = 1$, such that $\phi(x) = \|\phi\|$. Show that this $x$ must be unique. Hint: Recall that uniform convexity implies strict convexity.
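Returning to Problem 2: the following Python sketch (illustrative only; it uses a crude Riemann-sum discretization of the convolution on a bounded grid with arbitrarily chosen test functions) provides a quick numerical sanity check of 20.9.24 for one choice of exponents.

import numpy as np

# Check ||f*g||_r <= ||f||_p ||g||_q when 1/r = 1/p + 1/q - 1, on a grid in one dimension.
p, q = 1.5, 2.0
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)

dx = 0.01
x = np.arange(-10, 10, dx)
f = np.exp(-np.abs(x)) * (1 + np.sin(3 * x))      # arbitrary nonnegative test functions
g = np.exp(-x ** 2)

conv = np.convolve(f, g, mode="same") * dx        # Riemann-sum approximation of (f*g)(x)

norm = lambda h, s: (np.sum(np.abs(h) ** s) * dx) ** (1.0 / s)
print(norm(conv, r), norm(f, p) * norm(g, q))     # the left number does not exceed the right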
Differentiation With Respect To General Radon Measures

This is a brief chapter on certain important topics in the differentiation theory for general Radon measures. For different proofs and some results which are not discussed here, a good source is [16], which is where I first read some of these things.

21.1 Besicovitch Covering Theorem

When dealing with probability distribution functions or some other Radon measure, it is necessary to have a better covering theorem than the Vitali covering theorem, which works well for Lebesgue measure. However, for a Radon measure, if you enlarge a ball by making the radius larger, you do not know what happens to the measure of the enlarged ball except that its measure does not get smaller. Thus the thing required is a covering theorem which does not depend on enlarging balls.

The first fundamental observation is found in the following lemma, which holds for the context illustrated by the following picture. This picture is drawn as though the balls come from the usual Euclidean norm, but the norm could be any norm on $\mathbb{R}^n$.

[Figure: a ball $B_a$ with center $a$ and radius $r$, together with two larger balls $B_x$, $B_y$ with centers $x, y$ and radii $r_x, r_y$; the points $P_x, P_y$ lie on the sphere bounding $B_a$.]

Lemma 21.1.1 Let the balls $B_a$, $B_x$, $B_y$ be as shown, having radii $r$, $r_x$, $r_y$ respectively. Suppose the centers of $B_x$ and $B_y$ are not both in any of the balls shown, and suppose $r_y \ge r_x \ge \alpha r$ where $\alpha$ is a number larger than $1$. Also let $P_x \equiv a + r\,\dfrac{x - a}{\|x - a\|}$, with $P_y$ being defined similarly. Then it follows that

$$\|P_x - P_y\| \ge \frac{\alpha - 1}{\alpha + 1}\, r.$$

There exists a constant $L(n, \alpha)$, depending on $\alpha$ and the dimension, such that if $B_1, \cdots, B_m$ are all balls such that any pair are in the same situation relative to $B_a$ as $B_x$ and $B_y$, then $m \le L(n, \alpha)$.

Proof: From the denition,



x a y a
||Px Py || = r
||x a|| ||y a||

(x a) ||y a|| (y a) ||x a||

= r
||x a|| ||y a||

||y a|| (x y) + (y a) (||y a|| ||x a||)
= r

||x a|| ||y a||
||x y|| r
r |||y a|| ||x a||| . (21.1.1)
||x a|| ||x a||
There are two cases. First suppose that ||y a|| ||x a|| 0. Then this reduces
to
||x y|| r
r ||y a|| + r.
||x a|| ||x a||
From the assumptions, this is no larger than
( ) ( ) ( )
ry r + ry r r
r +1 r 1 r 1
||x a|| ||x a|| ||x a|| rx
( ) ( )
1 1
r 1 =r .

The other case is that ||y a|| ||x a|| < 0. Then in this case 21.1.1 reduces
to
( )
||x y|| 1
r (||x a|| ||y a||)
||x a|| ||x a||
( )
||x y|| ||y a||
= r 1+
||x a|| ||x a||
r
(||x y|| ||x a|| + ||y a||)
||x a||

r r
(ry (r + rx ) + ry ) (ry r)
rx + r r +r
( x )
r r 1 1
(rx r) 1 rx rx = r
rx + r rx + rx +1

This proves the estimate between Px and Py .



Finally, in the case of the balls Bi having centers at xi , let Pxi be the expression
a+r ||xxii a
a|| . Then (Pxi a) r
1
is on the unit sphere having center 0. Furthermore,


(Pxi a) r1 (Pyi a) r1 = r1 ||Pxi Pyi || r1 r 1 = 1 .
+1 +1

How many points on the unit sphere can be pairwise this far apart? This set is compact and so there exists a $\frac{\alpha - 1}{\alpha + 1}$ net having $L(n, \alpha)$ points. Thus $m$ cannot be any larger than $L(n, \alpha)$.

The above lemma has to do with balls which are relatively large intersecting a given ball. Next is a lemma which has to do with relatively small balls intersecting a given ball.

Lemma 21.1.2 Let $B$ be a ball having radius $r$ and suppose $B$ has nonempty intersection with the balls $B_1, \cdots, B_m$ having radii $r_1, \cdots, r_m$ respectively. Suppose $\alpha, \beta > 1$ and the $r_i$ are comparable with $r$ in the sense that $\frac{1}{\beta} r \le r_i \le \alpha r$. Let $\widehat{B}_i$ have the same center as $B_i$ with radius equal to $\widehat{r}_i = \delta r_i$ for some $\delta < 1$. If the $\widehat{B}_i$ are disjoint, then there exists a constant $M(n, \alpha, \beta, \delta)$ such that $m \le M(n, \alpha, \beta, \delta)$. Letting $\alpha = 10$, $\delta = 1/3$, $\beta = 4/3$, it follows that $m \le 60^n$.

Proof: Let the volume of a ball of radius r be given by (n) rn where (n)
depends on the norm used and on the dimension n as indicated. The idea is to
enlarge B, till it swallows all the Bi . Then, since they are disjoint and their radii
are not too small, there cant be too many of them.
This can be done for a single Bi by enlarging the radius of B to r + ri + ri .


[Figure: the ball $B$ together with one of the balls $B_i$ and its shrunken copy $\widehat{B}_i$.]

Then to get all the Bi , you would just enlarge the radius of B to r + r + r =
(1 + + ) r. Then, using the inequality which makes ri comparable to r, it follows
that
m ( )n m
n n
(n) r (n) (ri ) (n) (1 + + ) rn
i=1
i=1

Therefore,
( )n
n
m (1 + + )

( )n
n
and so m (1 + + ) M (n, , , ).

From now on, let = 10 and let = 1/3 and = 4/3. Then
( )n
172
M (n, , , ) 60n
3

Thus m 60n . 
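Lemma 21.1.2 is a packing bound, and it can be probed experimentally: keep proposing balls that meet a fixed ball $B$ and have comparable radii, accepting a proposal only when its shrunken copy stays disjoint from the shrunken copies already accepted. The sketch below (illustrative only; the plane case, parameters as in the lemma, proposals generated at random) shows the accepted count staying far below the bound $60^2$.

import numpy as np

rng = np.random.default_rng(6)
r = 1.0                                  # radius of the fixed ball B = B(0, r) in the plane
alpha, beta, delta = 10.0, 4.0 / 3.0, 1.0 / 3.0

kept_c = np.empty((0, 2))                # centers of accepted balls
kept_r = np.empty(0)                     # their radii
for _ in range(20_000):
    ri = rng.uniform(r / beta, alpha * r)                    # comparable radius
    ang, dist = rng.uniform(0, 2 * np.pi), rng.uniform(0, r + ri)
    c = dist * np.array([np.cos(ang), np.sin(ang)])          # so B(c, ri) meets B
    # accept only if B(c, delta*ri) is disjoint from every accepted shrunken ball
    if np.all(np.linalg.norm(kept_c - c, axis=1) >= delta * (kept_r + ri)):
        kept_c = np.vstack([kept_c, c])
        kept_r = np.append(kept_r, ri)

print(len(kept_r), "<=", 60 ** 2)        # the greedy count stays far below 60^n for n = 2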
Now here is the Besicovitch covering theorem.

Theorem 21.1.3 There exists a constant $N_n$, depending only on $n$, with the following property. If $\mathcal{F}$ is any collection of nonempty balls in $\mathbb{R}^n$ with

$$\sup\{\operatorname{diam}(B) : B \in \mathcal{F}\} < D < \infty$$

and if $A$ is the set of centers of the balls in $\mathcal{F}$, then there exist subsets of $\mathcal{F}$, $\mathcal{H}_1, \cdots, \mathcal{H}_{N_n}$, such that each $\mathcal{H}_i$ is a countable collection of disjoint balls from $\mathcal{F}$ (possibly empty) and

$$A \subseteq \cup_{i=1}^{N_n} \cup \{B : B \in \mathcal{H}_i\}.$$

Lemma 21.1.4 In the situation of Theorem 21.1.3, suppose the set of centers $A$ is bounded. Define a sequence of balls from $\mathcal{F}$, $\{B_j\}_{j=1}^J$ where $J \le \infty$, such that

$$r(B_1) \ge \frac{3}{4} \sup\{r(B) : B \in \mathcal{F}\} \qquad (21.1.2)$$

and if

$$A_m \equiv A \setminus (\cup_{i=1}^m B_i) \ne \emptyset, \qquad (21.1.3)$$

then $B_{m+1} \in \mathcal{F}$ is chosen with center in $A_m$ such that

$$r_{m+1} \equiv r(B_{m+1}) \ge \frac{3}{4} \sup\{r : B(a, r) \in \mathcal{F},\ a \in A_m\}. \qquad (21.1.4)$$

Then letting $B_j = B(a_j, r_j)$, this sequence satisfies

$$A \subseteq \cup_{i=1}^J B_i, \qquad r(B_k) \le \frac{4}{3}\, r(B_j) \text{ for } j < k, \qquad \{B(a_j, r_j/3)\}_{j=1}^J \text{ are disjoint.} \qquad (21.1.5)$$

Proof: Consider the second inequality. First note the sets Am form a decreasing
sequence. Thus, from the denition, of Bj , for j < k,

r (Bk ) sup {r : B (a,r) F , a Ak1 }


4
sup {r : B (a,r) F , a Aj1 } r (Bj )
3
Next consider the third claim. If $x \in B(a_j, r_j/3) \cap B(a_i, r_i/3)$ where $j > i$, then from what was just shown,

$$\|a_j - a_i\| \le \|a_j - x\| + \|x - a_i\| \le \frac{r_j}{3} + \frac{r_i}{3} \le \left(\frac{4}{9} + \frac{1}{3}\right) r_i = \frac{7}{9}\, r_i < r_i,$$

and this contradicts the construction because aj is not covered by B (ai , ri ).


Finally consider the claim that $A \subseteq \cup_{i=1}^J B_i$. Pick $B_1$ satisfying 21.1.2. If $B_1, \cdots, B_m$ have been chosen and $A_m$ is given in 21.1.3, then if $A_m = \emptyset$, it follows $A \subseteq \cup_{i=1}^m B_i$; set $J = m$. Now let $a$ be the center of $B_a \in \mathcal{F}$. If $a \in A_m$ for all $m$ (that is, $a$ does not get covered by the $B_i$), then $r_{m+1} \ge \frac{3}{4} r(B_a)$ for all $m$, a contradiction since the balls $B(a_j, r_j/3)$ are disjoint and $A$ is bounded, implying that $r_j \to 0$. 
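The construction in Lemma 21.1.4 is a greedy selection: at each stage pick a ball whose center is still uncovered and whose radius is at least $3/4$ of the largest available. The Python sketch below (illustrative only; it uses a finite family of Euclidean balls in the plane, one ball per center, which is not required by the lemma) carries out 21.1.2 - 21.1.4 and checks the two conclusions of 21.1.5 that involve the radii.

import numpy as np

rng = np.random.default_rng(2)
centers = rng.uniform(0, 10, size=(60, 2))          # the set A of centers
radii = rng.uniform(0.2, 2.0, size=60)              # one ball of F per center, for simplicity

covered = np.zeros(len(centers), dtype=bool)
chosen = []
while not covered.all():
    available = np.where(~covered)[0]               # centers in A_m
    sup_r = radii[available].max()
    # pick an available ball with radius >= (3/4) * sup  (rule 21.1.4)
    j = available[np.argmax(radii[available] >= 0.75 * sup_r)]
    chosen.append(j)
    # mark every center lying in the chosen ball as covered
    covered |= np.linalg.norm(centers - centers[j], axis=1) < radii[j]

# 21.1.5: later radii exceed earlier ones by at most the factor 4/3, and the
# third-radius balls are pairwise disjoint.
r_seq = radii[chosen]
print(all(r_seq[k] <= (4 / 3) * r_seq[j]
          for j in range(len(chosen)) for k in range(j + 1, len(chosen))))
d = np.linalg.norm(centers[chosen][:, None, :] - centers[chosen][None, :, :], axis=2)
print(all(d[i, j] >= (r_seq[i] + r_seq[j]) / 3
          for i in range(len(chosen)) for j in range(i + 1, len(chosen))))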
Continuing with the proof of the theorem, let L (n, 10) be the constant of Lemma
21.1.1 and let Mn = L (n, 10)+60n +1. Dene the following sequence of sets consist-
ing of balls of F, G1 , G2 , , GMn . Referring to the sequence {Bk } just considered,
let B1 G1 and if B1 , , Bm have been assigned, each to a Gi , place Bm+1 in the
rst Gj such that it intersects no set already in Gj . The existence of such a j follows
from Lemmas 21.1.1 and 21.1.2. Here is why. Bm+1 can intersect at most L (n, 10)
sets of {B1 , , Bm } which have radii at least as large as 10Bm+1 thanks to Lemma
21.1.1. It can intersect at most 60n sets of {B1 , , Bm } which have radius smaller
than 10Bm+1 thanks to Lemma 21.1.2. Thus each Gj consists of disjoint sets of F
and the set of centers is covered by the union of these Gj . This proves the theorem
in case the set of centers is bounded.
Now let R1 = B (0, 5D) and if Rm has been chosen, let

Rm+1 = B (0, (m + 1) 5D) \ Rm

Thus, if |k m| 2, no ball from F having nonempty intersection with Rm can


intersect any ball from F which has nonempty intersection with Rk . This is because
all these balls have radius less than D. Now let Am A Rm and apply the above
result for a bounded set of centers to those balls of F which intersect Rm to obtain
sets of disjoint balls G1 (Rm ) , G2 (Rm ) , , GMn (Rm ) covering Am . Then simply
define

$$G_j \equiv \cup_{k=1}^\infty G_j(R_{2k}), \qquad G_j' \equiv \cup_{k=1}^\infty G_j(R_{2k-1}).$$

Let $N_n = 2M_n$ and

$$\{\mathcal{H}_1, \cdots, \mathcal{H}_{N_n}\} \equiv \{G_1, \cdots, G_{M_n}, G_1', \cdots, G_{M_n}'\}. \ \blacksquare$$

21.2 Fundamental Theorem Of Calculus For Radon Measures

In this section the Besicovitch covering theorem will be used to give a generalization of the Lebesgue differentiation theorem to general Radon measures. In what follows, $\mu$ will be a Radon measure and

$$Z \equiv \{x \in \mathbb{R}^n : \mu(B(x, r)) = 0 \text{ for some } r > 0\}.$$

Lemma 21.2.1 $Z$ is measurable and $\mu(Z) = 0$.

Proof: For each $x \in Z$, there exists a ball $B(x, r)$ with $\mu(B(x, r)) = 0$. Let $\mathcal{C}$ be the collection of these balls. Since $\mathbb{R}^n$ has a countable basis, a countable subset $\widetilde{\mathcal{C}}$ of $\mathcal{C}$ also covers $Z$. Let

$$\widetilde{\mathcal{C}} = \{B_i\}_{i=1}^\infty.$$

Then letting $\overline{\mu}$ denote the outer measure determined by $\mu$,

$$\overline{\mu}(Z) \le \sum_{i=1}^\infty \overline{\mu}(B_i) = \sum_{i=1}^\infty \mu(B_i) = 0.$$

Therefore, $Z$ is measurable and has measure zero as claimed.


For $x \notin Z$, the above set of measure zero, define the maximal function $Mf : \mathbb{R}^n \to [0, \infty]$ by

$$Mf(x) \equiv \sup_{r \le 1} \frac{1}{\mu(B(x, r))} \int_{B(x, r)} |f| \, d\mu.$$

Theorem 21.2.2 Let $\mu$ be a Radon measure and let $f \in L^1(\mathbb{R}^n, \mu)$. Then for $\mu$ a.e. $x \notin Z$,

$$\lim_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x, r)} |f(y) - f(x)| \, d\mu(y) = 0.$$

Proof: First consider the following claim, which is a weak type estimate of the same sort used when differentiating with respect to Lebesgue measure.

Claim 1: The following inequality holds for $N_n$ the constant of the Besicovitch covering theorem:

$$\mu([Mf > \varepsilon]) \le N_n\, \varepsilon^{-1}\, \|f\|_1.$$

Proof: First note $[Mf > \varepsilon] \cap Z = \emptyset$ and, without loss of generality, you can assume $\mu([Mf > \varepsilon]) > 0$. Next, for each $x \in [Mf > \varepsilon]$ there exists a ball $B_x = B(x, r_x)$ with $r_x \le 1$ and

$$\frac{1}{\mu(B_x)} \int_{B(x, r_x)} |f| \, d\mu > \varepsilon.$$

Let F be this collection of balls so that [M f > ] is the set of centers of balls of F.
By the Besicovitch covering theorem,

[M f > ] N
i=1 {B : B Gi }
n

where Gi is a collection of disjoint balls of F. Now for some i,

([M f > ]) /Nn ( {B : B Gi })

because if this is not so, then


Nn
([M f > ]) ( {B : B Gi })
i=1

Nn
([M f > ])
< = ([M f > ]),
i=1
Nn

a contradiction. Therefore for this i,



([M f > ]) 1
( {B : B Gi }) = (B) |f | d
Nn B
BGi BGi

1 |f | d = 1 ||f ||1 .
Rn

This shows Claim 1.


Claim 2: If g is any continuous function dened on Rn , then for x
/ Z,

1
lim |g (y) g (x)| d (y) = 0
r0 (B (x, r)) B(x,r)

and
1
lim g (y) d (y) = g (x). (21.2.6)
r0 (B (x,r)) B(x,r)

Proof: Since g is continuous at x, whenever r is small enough,



1 1
|g (y) g (x)| d (y) d (y) = .
(B (x, r)) B(x,r) (B (x,r)) B(x,r)

21.2.6 follows from the above and the triangle inequality. This proves the claim.
Now let g Cc (Rn ) and x / Z. Then from the above observations about
continuous functions,
([ ])
1
x/ Z : lim sup |f (y) f (x)| d (y) > (21.2.7)
r0 (B (x, r)) B(x,r)

([ ])
1
x/ Z : lim sup |f (y) g (y)| d (y) >
r0 (B (x, r)) B(x,r) 2
([ ])
+ x / Z : |g (x) f (x)| > .
2
([ ]) ([ ])
M (f g) > + |f g| > (21.2.8)
2 2
Now
([ ])
|f g| d |f g| >
[|f g|> 2 ] 2 2

and so from Claim 1 21.2.8 and hence 21.2.7 is dominated by


( )
2 Nn
+ ||f g||L1 (Rn ,) .


But by regularity of Radon measures, Cc (Rn ) is dense in L1 (Rn , ) , and so since


g in the above is arbitrary, this shows 21.2.7 equals 0. Now
([ ])
1
x/ Z : lim sup |f (y) f (x)| d (y) > 0
r0 (B (x, r)) B(x,r)

([ ])
1 1
x/ Z : lim sup |f (y) f (x)| d (y) > =0
r0 (B (x, r)) B(x,r) k
k=1

By completeness of this implies


[ ]
1
x/ Z : lim sup |f (y) f (x)| d (y) > 0
r0 (B (x, r)) B(x,r)

is a set of measure zero. 
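The conclusion of Theorem 21.2.2 can be watched numerically. The following sketch (illustrative; the Radon measure is replaced by a large empirical sample so that ball averages become sample averages, and the particular density, atom, and test function are arbitrary choices) shows the averages $\frac{1}{\mu(B(x,r))}\int_{B(x,r)} f\, d\mu$ settling down to $f(x)$ as $r$ shrinks, at a point $x$ where every ball about $x$ has positive mass.

import numpy as np

rng = np.random.default_rng(3)
# Approximate a Radon measure mu on R: an absolutely continuous part (normal samples)
# plus an atom at 2.0 carrying roughly 20 percent of the mass.
pts = np.concatenate([rng.standard_normal(80_000), np.full(20_000, 2.0)])

f = lambda t: np.sin(t) + t ** 2
x = 0.7                                             # a point with mu(B(x, r)) > 0 for all r

for r in [1.0, 0.3, 0.1, 0.03, 0.01]:
    in_ball = np.abs(pts - x) < r
    avg = f(pts[in_ball]).mean()                    # (1/mu(B(x,r))) * integral of f over B(x,r)
    print(r, avg, f(x))                             # the average approaches f(x)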


The following corollary is the main result, referred to as the Lebesgue Besicovitch Differentiation theorem.

Corollary 21.2.3 If $f \in L^1_{loc}(\mathbb{R}^n, \mu)$, then for $\mu$ a.e. $x \notin Z$,

$$\lim_{r \to 0} \frac{1}{\mu(B(x, r))} \int_{B(x, r)} |f(y) - f(x)| \, d\mu(y) = 0. \qquad (21.2.9)$$

Proof: If f is replaced by f XB(0,k) then the conclusion 21.2.9 holds for all
/ k where Fk is a set of measure 0. Letting k = 1, 2, , and F
x F k=1 Fk , it
follows that F is a set of measure zero and for any x / F , and k {1, 2, }, 21.2.9
holds if f is replaced by f XB(0,k) . Picking any such x, and letting k > |x| + 1, this
shows
1
lim |f (y) f (x)| d (y)
r0 (B (x, r)) B(x,r)

1
= lim f XB(0,k) (y) f XB(0,k) (x) d (y) = 0. 
r0 (B (x, r)) B(x,r)

21.3 Slicing Measures

Let $\mu$ be a finite Radon measure. I will show here that a formula of the following form holds:

$$\mu(F) = \int_F d\mu = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} \mathcal{X}_F(x, y) \, d\nu_x(y) \, d\alpha(x)$$

where $\alpha(E) = \mu(E \times \mathbb{R}^m)$. When this is done, the measures $\nu_x$ are called slicing measures, and this shows that an integral with respect to $\mu$ can be written as an iterated integral in terms of the measure $\alpha$ and the slicing measures $\nu_x$. This is like going backwards in the construction of product measure. One starts with a measure $\mu$ defined on the Cartesian product and produces $\alpha$ and an infinite family of slicing measures from it, whereas in the construction of product measure, one starts with two measures and obtains a new measure on a $\sigma$ algebra of subsets of the Cartesian product of two spaces. First here are two technical lemmas.
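When $\mu$ is supported on finitely many points, the slicing measures are nothing but conditional distributions, and the displayed identity can be checked by hand. The following sketch (illustrative assumptions: $\mu$ is a probability measure on a finite grid in $\mathbb{R} \times \mathbb{R}$, with weights chosen at random) computes $\alpha$ and $\nu_x$ and verifies the identity for a rectangle $F$.

import numpy as np

rng = np.random.default_rng(4)
w = rng.random((4, 3))
w /= w.sum()                      # mu({(x, y)}) = w[x, y]: a probability measure on {0..3} x {0..2}

alpha = w.sum(axis=1)             # alpha({x}) = mu({x} x R^m)
nu = w / alpha[:, None]           # nu_x({y}) = w[x, y] / alpha({x}): the slicing measures

# Check mu(F) = integral of nu_x(F_x) d alpha(x) for F = {x <= 1} x {y >= 1}.
F = np.zeros((4, 3), dtype=bool)
F[:2, 1:] = True
lhs = w[F].sum()
rhs = sum(alpha[i] * nu[i, F[i]].sum() for i in range(4))
print(lhs, rhs)                   # the two numbers coincide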

Lemma 21.3.1 The space $C_c(\mathbb{R}^m)$ with the norm

$$\|f\| \equiv \sup\{|f(y)| : y \in \mathbb{R}^m\}$$

is separable.

Proof: Let $\mathcal{D}_l$ consist of all functions which are of the form

$$\sum_{|\alpha| \le N} a_\alpha\, y^\alpha\, \bigl(\operatorname{dist}\bigl(y, B(0, l + 1)^C\bigr)\bigr)^n$$

where $a_\alpha \in \mathbb{Q}$, $\alpha$ is a multi-index, and $n$ is a positive integer. Consider $\mathcal{D} \equiv \cup_l \mathcal{D}_l$. Then $\mathcal{D}$ is countable. If $f \in C_c(\mathbb{R}^m)$, then choose $l$ large enough that $\operatorname{spt}(f) \subseteq B(0, l + 1)$, a locally compact space, so $f \in C_0(B(0, l + 1))$. Then since $\mathcal{D}_l$ separates the points of $B(0, l + 1)$, is closed with respect to conjugates, and annihilates no point, it is dense in $C_0(B(0, l + 1))$. Alternatively, $\mathcal{D}$ is dense in $C_0(\mathbb{R}^m)$ by Stone Weierstrass and $C_c(\mathbb{R}^m)$ is a subspace, so it is also separable. So is $C_c^+(\mathbb{R}^m)$, the set of nonnegative functions in $C_c(\mathbb{R}^m)$.

From the regularity of Radon measures, the following lemma follows.

Lemma 21.3.2 If $\mu$ and $\nu$ are two Radon measures defined on $\sigma$ algebras $\mathcal{S}_\mu$ and $\mathcal{S}_\nu$ of subsets of $\mathbb{R}^n$, and if $\mu(V) = \nu(V)$ for all $V$ open, then $\mu = \nu$ and $\mathcal{S}_\mu = \mathcal{S}_\nu$.
Proof: Every compact set is a countable intersection of open sets, so the two measures agree on every compact set. Hence it is routine that the two measures agree on every $G_\delta$ and $F_\sigma$ set. (Recall $G_\delta$ sets are countable intersections of open sets and $F_\sigma$ sets are countable unions of closed sets.) Now suppose $E \in \mathcal{S}_\mu$ is a bounded set. Then by regularity of $\mu$ there exist $G$, a $G_\delta$ set, and $F$, an $F_\sigma$ set, such that $F \subseteq E \subseteq G$ and $\mu(G \setminus F) = 0$. Then it is also true that $\nu(G \setminus F) = 0$. Hence $E = F \cup (E \setminus F)$ and $E \setminus F$ is a subset of $G \setminus F$, a set of measure zero. By completeness of $\nu$, it follows $E \in \mathcal{S}_\nu$ and

$$\nu(E) = \nu(F) = \mu(F) = \mu(E).$$

If $E \in \mathcal{S}_\mu$ is not necessarily bounded, let $E_m = E \cap B(0, m)$; then $E_m \in \mathcal{S}_\nu$ and $\nu(E_m) = \mu(E_m)$. Letting $m \to \infty$, $E \in \mathcal{S}_\nu$ and $\nu(E) = \mu(E)$. Similarly, $\mathcal{S}_\nu \subseteq \mathcal{S}_\mu$ and the two measures are equal on $\mathcal{S}_\nu$.
The main result in the section is the following theorem.

Theorem 21.3.3 Let $\mu$ be a finite Radon measure on $\mathbb{R}^{n+m}$ defined on a $\sigma$ algebra $\mathcal{F}$. Then there exists a unique finite Radon measure $\alpha$, defined on a $\sigma$ algebra $\mathcal{S}$ of sets of $\mathbb{R}^n$, which satisfies

$$\alpha(E) = \mu(E \times \mathbb{R}^m) \qquad (21.3.10)$$

for all $E$ Borel. There also exists a Borel set of $\alpha$ measure zero $N$, such that for each $x \notin N$, there exists a Radon probability measure $\nu_x$ such that if $f$ is a nonnegative $\mu$ measurable function or a $\mu$ measurable function in $L^1(\mu)$, then

$$y \to f(x, y) \text{ is } \nu_x \text{ measurable } \alpha \text{ a.e.},$$

$$x \to \int_{\mathbb{R}^m} f(x, y) \, d\nu_x(y) \text{ is } \alpha \text{ measurable}, \qquad (21.3.11)$$

and

$$\int_{\mathbb{R}^{n+m}} f(x, y) \, d\mu = \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^m} f(x, y) \, d\nu_x(y) \right) d\alpha(x). \qquad (21.3.12)$$

If $\widehat{\nu}_x$ is any other collection of Radon measures satisfying 21.3.11 and 21.3.12, then $\widehat{\nu}_x = \nu_x$ for $\alpha$ a.e. $x$.

Proof:

Existence and uniqueness of

First consider the uniqueness of . Suppose 1 is another Radon measure sat-


isfying 21.3.10. Then in particular, 1 and agree on open sets and so the two
measures are the same by Lemma 21.3.2.
To establish the existence of , dene 0 on Borel sets by

0 (E) = (E Rm ).

Thus 0 is a nite Borel measure and so it is nite on compact sets. Lemma 17.2.3
on Page 17.2.3 implies the existence of the Radon measure extending 0 .

Uniqueness of x

Next consider the uniqueness of x . Suppose x and bx satisfy all conclusions


of the theorem with exceptional sets denoted by N and N b respectively. Then,
b
enlarging N and N , one may also assume, using Lemma 21.2.1, that for x / N N b,
(B (x,r)) > 0 whenever r > 0. Now let

m
A= (ai , bi ]
i=1

where ai and bi are rational. Thus there are countably many such sets. Then from
the conclusion of the theorem, if x0 / N N b,

1
XA (y) d x (y) d
(B (x0 , r)) B(x0 ,r) Rm

1
= XA (y) db
x (y) d,
(B (x0 , r)) B(x0 ,r) Rm
and by the Lebesgue Besicovitch Dierentiation theorem, there exists a set of
measure zero, EA , such that if x0 / EA N N b , then the limit in the above exists
as r 0 and yields
x0 (A) = bx0 (A).
Letting E denote the union of all the sets EA for A as described above, it follows
that E is a set of measure zero and if x0
/ EN N b then x (A) = bx (A) for
0 0

all such sets A. But every open set can be written as a disjoint union of sets of this
form and so for all such x0 , x0 (V ) = bx0 (V ) for all V open. By Lemma 21.3.2
this shows the two measures are equal and proves the uniqueness assertion for x .
It remains to show the existence of the measures x .

Existence of x

For f 0, f, g Cc (Rm ) and Cc (Rn ) respectively, dene



(g, f ) g (x) f (y) d
Rn+m

Since f 0, g (g, f ) is a positive linear functional on Cc (Rn ). Therefore, there


exists a unique Radon measure f such that for all g Cc (Rn ) ,

g (x) f (y) d = g (x) d f .
Rn+m Rn

I claim that f , the two being considered as measures on B (Rn ) . Suppose


then that (E) = (E Rm ) = 0. By regularity of f and , it follows that if
f (E) = 0, then there exists compact K and open V such that for some > 0,

K E V, (V Rm ) = (V ) < ,
2 (||f || + 1)

and f (K) > . Then let K g V. It follows then that



< f (K) = XK (x) d f (x) g (x) d f (x) =
Rn Rn


g (x) f (y) d XV Rm (x, y) f (y) d ||f || (V Rm ) < ,
Rn+m Rm+m 2
a contradiction. Hence f as claimed. It follows from the Radon Nikodym
theorem the existence of a function hf L1 () such that for all g Cc (Rn ) ,

g (x) f (y) d = g (x) d f = g (x) hf (x) d. (21.3.13)
Rn+m Rn Rn

It is obvious from the formula that the map from f Cc (Rm ) to L1 () given
by f hf is linear. However, this is not suciently specic because functions
in L1 () are only determined a.e. However, for hf L1 () , you can specify a
particular representative a.e. By the fundamental theorem of calculus,

c 1
hf (x) lim hf (z) d (z) (21.3.14)
r0 (B (x, r)) B(x,r)

exists o some set of measure zero Zf . Note that since this involves the integral
over a ball, it does not matter which representative of hf is placed in the formula.

cf (x) is well dened pointwise for all x not in some set of measure zero.
Therefore, h
c
Since hf = hf a.e. it follows that hcf is well dened and will work in the formula
21.3.13. Let
Z = {Zf : f D}

where D is the countable dense subset of Cc (Rm ). For f an arbitrary function in


Cc (Rm ) and f D, a dense countable subset of Cc (Rn ) , it follows from 21.3.13,
+ +




g (x) (hf (x) hf (x)) d ||f f || |g (x)| d

Rn Rn+m

Let gk (x) XB(z,r) (x) where z / Zf Zf . Then by the dominated convergence


theorem, the above implies



(hf (x) hf (x)) d ||f f || d = ||f f || (B (z, r)) .
B(z,r) B(z,r)Rm

Dividing by (B (z, r)) , and then taking a limit as r 0, it follows that for all
z
/ Zf Zf ,
|hf (z) hf (z)| ||f f || ,

Also, if (B (z, r)) > 0 for all r > 0, then for all r > 0,


1
(hf (x) hf (x)) d ||f f ||
(B (z, r)) B(z,r)

+
It follows that for f Cc (Rm ) arbitrary and z / Z,

1 1
lim sup hf (x) d lim inf hf (x) d
r0 (B (z, r)) B(x,r) r0 (B (z, r)) B(x,r)

1
= lim sup (hf (x) hf (x)) d (x)
r0 (B (z, r)) B(x,r)

1
lim inf (hf (x) hf (x)) d (x)
r0 (B (z, r)) B(x,r)


1
lim sup (hf (x) hf (x)) d (x)
r0 (B (z, r)) B(x,r)


1
+ lim inf (hf (x) hf (x)) d (x)
r0 (B (z, r)) B(x,r)
2 ||f f ||

and since f is arbitrary, it follows that the limit of 21.3.14 holds for all f Cc (Rm )
whenever z / Z, the above set of measure zero.

Now for f an arbitrary real valued function of Cc (Rn ) , simply apply the above
cf hd
result to positive and negative parts to obtain hf hf + hf and h d
f + hf .
Then it follows that for all f Cc (R ) and g Cc (R )
m m


g (x) f (y) d = cf (x) d.
g (x) h
Rn+m Rn

It is obvious that for each x


/ Z, (Z) = 0, that f h cf (x) is a positive linear
functional. Hence by the Riesz representation theorem, there exists a unique x
such that
cf (x) =
h f (y) d x (y) .
Rm
It follows that for x not in a set of measure zero,

g (x) f (y) d = g (x) f (y) d x d (21.3.15)
Rn+m Rn Rm

and x Rm f (y) d x is measurable and x is a Radon measure. Now let
fk XRm and g 0. Then by monotone convergence theorem,

g (x) d = g (x) d x d
Rn+m Rn Rm

If gk XRn , the monotone convergence theorem shows that x Rm d x is L1 ().
Next let gk XB(x,r) and use monotone convergence theorem to write

(B (x, r)) d = d x d
B(x,r)Rm B(x,r) Rm

Then dividing by (B (x, r)) and taking a limit as r 0, it follows that for
a.e. x, 1 = x (Rm ) , so these x are probability measures. Letting gk (x)
XA (x) , fk (y) XB (y) for A, B open, it follows that 21.3.15 is valid for g (x)
replaced with XA (x) and f (y) replaced with XB (y).
Now let G denote the Borel sets F of Rn+m such that

XF (x, y) d (x, y) = XF (x, y) d x (y) d (x)
Rn+m Rn Rm

and that all the integrals make sense. As just explained, this includes all Borel
sets of the form F = A B where A, B are open. It is clear that G is closed with
respect to countable disjoint unions and complements, while sets of the form A B
for A, B open form a system. Therefore, by Lemma 9.2.2, G contains the Borel
sets which is the smallest algebra which contains such products of open sets. It
follows from the usual approximation with simple functions that if f 0 and is
Borel measurable, then

f (x, y) d (x, y) = f (x, y) d x (y) d (x)
Rn+m Rn Rm

with all the integrals making sense.


This proves the theorem in the case where f is Borel measurable and non-
negative. It just remains to extend this to the case where f is only measurable.
However, from regularity of there exist Borel measurable functions g, h, g f h
such that

f (x, y) d (x, y) = g (x, y) d (x, y)
Rn+m Rn+m

= h (x, y) d (x, y)
Rn+m

It follows

g (x, y) d x (y) d (x) = h (x, y) d x (y) d (x)
Rn Rm Rn Rm

and so, since for a.e. x, y g (x, y) and y h (x, y) are x measurable with

0= (h (x, y) g (x, y)) d x (y)
Rm

and x is a Radon measure, hence complete, it follows for a.e. x, y f (x, y)


must be x measurable because it is equal to y g (x, y) , x a.e. Therefore, for
a.e. x, it makes sense to write

f (x, y) d x (y) .
Rm

Similar reasoning applies to the above function of x being measurable due to


being complete. It follows

f (x, y) d (x, y) = g (x, y) d (x, y)
Rn+m
R
n+m

= g (x, y) d x (y) d (x)


R R
n m

= f (x, y) d x (y) d (x)


Rn Rm

with everything making sense. 

21.4 Exercises
1. Suppose U is an open set in Rn and h : U Rn is a function which satises
the following conditions.

$$h \text{ is continuous and one to one} \qquad (21.4.16)$$

$$Dh(x) \text{ exists for all } x \in A \qquad (21.4.17)$$

$$m_n(h(U \setminus A)) = 0 \qquad (21.4.18)$$
Here $m_n$ is Lebesgue measure. Show, using the Besicovitch covering theorem, that if $T \subseteq U$ and $m_n(T) = 0$, then $m_n(h(T)) = 0$ also. If $m_n$ were replaced with an arbitrary Radon measure, would this result still hold? Explain. Hint: You might want to first consider $T_k \equiv \{x \in T : \|Dh(x)\| \le k\}$, show $h(T_k)$ has measure zero, and then let $k \to \infty$.

2. If S is a Lebesgue measurable subset of U, show that h (S) is also Lebesgue


measurable.

3. For E a Lebesgue measurable set contained in U , dene (E) mn (h (E)) .


Show that this is a measure dened on the Lebesgue measurable sets and that
mn .Explain why there exists a nonnegative Lebesgue measurable func-
tion J in L1loc such that
(E) = Jdmn
E

4. Let B = B (0, r) Rn and let F : B Rn be continuous and also suppose


that for some < 1,
|F (v) v| < r
for all v B. Then show that F (B) B (0, r (1 )) . This amazing result
says that if F does not move any point too far, then F (B) contains a ball
centered at 0. It is in the book by Rudin, Real and Complex Analysis. Hint:
First show that if a B (0, r (1 ))\F (B) , then a = F (v) for all v B. To
do this, note that if |v| < r, then a = F (v) and so the only possibilities are to
have |v| = r. Now use the given condition on F and the triangle inequality to
verify that then |F (v) v| > r, contrary to the property of F. Now dene
G : B B by
r (a F (v))
G (v)
|a F (v)|
Then G is continuous. By the Brouwer xed point theorem, Theorem 10.9.5,
G (v) = v for some v B. Explain why |v| = r. Next consider r2 = (v, v) =
(G (v) , v) . Show that this expression is 0. To do this, note that
r
r2 = (G (v) , v) = ((a F (v)) , v)
|a F (v)|
r
= ((a v + v F (v)) , v)
|a F (v)|
r
= [(a v, v) + (v F (v) , v)]
|a F (v)|

Now use that (a, v) r2 (1 ) and |(v F (v) , v)| r2 .

5. Now you want to identify J. To do this, assume that at every point of


1
A, Dh (x) exists. Verify the following. For (0, 1) , then for each x A,

there exists rx such that whenever r < rx ,

h (B (x, r)) h (x) + Dh (x) B (0,r (1 + )) ,


h (B (x, r)) h (x) + Dh (x) B (0, r (1 )) .

To do the rst, it just follows from the denition of dierentiability. To do


the second, let
1 1
F (v) Dh (x) h (x + v) Dh (x) h (x)

Then explain why


1 1
F (v) v =Dh (x) (h (x) + Dh (x) v + o (v)) Dh (x) h (x) = o (v) .

Now show that the conditions of Problem 4 are veried.


6. With the above, it is time to identify J. Explain, using Theorem 10.2.1
why
n
(1 + ) |det Dh (x)| mn (B (0,r))
= |det Dh (x)| mn (B (0,r (1 + )))
= mn (Dh (x) B (0,r (1 + ))) mn (h (B (x, r)))


J (x) dmn mn (Dh (x) B (0, r (1 )))
B(x,r)
= |det Dh (x)| mn (B (0,r (1 )))
n
= (1 ) |det Dh (x)| mn (B (0,r))

Now divide by mn (B (0,r)) and use the fundamental theorem of calculus.


Next use the fact that was arbitrary. Explain why J (x) = |det Dh (x)| for
a.e. x. Explain also why |det Dh (x)| is Borel measurable.
7. In the above problem you have shown that when h satises 21.4.16 -
1
21.4.18 with Dh (x) existing for all x A,

mn (h (E)) = XE (x) |det Dh (x)| dmn .
U

Explain why this implies that for all F Borel measurable,



mn (F ) = XF (h (x)) |det Dh (x)| dmn
U

Next show the change of variables formula is valid for any f 0 and Borel
measurable,

f (y) dmn = f (h (x)) |det Dh (x)| dmn
V U

1
under the assumptions 21.4.16 - 21.4.18 with Dh (x) existing for all x A.
Finally extend this result to only require f is Lebesgue measurable. Note that
x f (h (x)) is not known to be measurable but x f (h (x)) |det Dh (x)|
will be measurable. This last part will involve completeness of Lebesgue mea-
sure along with regularity.
8. Suppose h : U Rn is a function which satises the following conditions.

h is one to one (21.4.19)

Dh (x) exists for all x U (21.4.20)


Let A {x : |det Dh (x)| = 0} . Use Lemma 10.6.1 to verify that mn (h (U \ A)) =
0. Then explain why the above change of variables formula holds in this con-
text. There is a version of the covering theorem, upon which the proof of this
lemma was based, which follows from the Besicovitch covering theorem. See
the following problems.
9. Suppose you have a Radon measure . Let S (x,r) denote the set {y : |x y| = r} .
Show that (S (x, r)) = 0 for at most countably many values of r. Hint: You
might consider rst r n. Recall that (K) < for all K compact since
is a Radon measure.
10. Let be a Radon measure on a algebra of subsets of Rn and let E be a
bounded measurable set. Let F be a collection of balls having centers at the
points of E such that there is an upper bound for the radii of these balls. Then
say that F is a Vitali cover of E if for each x E, there exists some Br (x) F
whenever r is suciently small. Here Br (x) will be any set satisfying

B (x,r) Br (x) B (x,r)

where B (x, r) is the usual open ball.

(a) Using Problem 9, explain why for each x E, there exist Br (x) F
such that ( )
(S (x, r)) B (x,r) \ B (x,r) = 0

for arbitrarily small r > 0. Denote by F1 the collection of all Br (x) F


such that (S (x, r)) = 0 and also r 1.
(b) By the Besicovitch covering theorem, there are Nn collections of count-
ably many disjoint sets of F1 , H1 , , HNn , such that

E N
i=1 {B : B Hi }
n

Then explain why, for some i, there are nitely many sets of Hi , B1 , , Bm1
such that

m1
( ) m1
(E)
Bi E = (Bi E) > .
i=1 i=1
Nn + 1

( 1 )
(c) Letting E1 = E \ m
i=1 Bi , explain why

Nn
(E1 ) (E)
Nn + 1

{ let F2 be the
(d) Now } sets of F1 , if any, which have empty intersection with
Bi : Bi Hi . Then let E1 play the role of E in the above argument
and let F2 play the role of F. Thus there exist nitely many sets of
F2 , Bm1 +1 , , Bm2 which are disjoint and if E2 consists of those points
of E1 which are not covered by these balls, then
( )2
Nn Nn
(E2 ) (E1 ) (E)
Nn + 1 Nn + 1

Continuing this way, explain why (En ) 0 and why the disjoint balls
just constructed have the property that (E \
i=1 Bi ) = 0.

11. In the above problem, you dont need to have E bounded. Explain why
you can eliminate this assumption. Hint: Let rn be an increasing sequence of
positive real numbers
( ) rn )) = 0. Then let E0 = E B (0, r0 )
such that (S (0,
and En = E B (0, rn ) \ B (0, rn1 ) . Also let Fn be those sets of F which
are contained in B (0, rn ) \ B (0, rn1 ). Then apply the above result.
12. For X a random variable having values in Rn , denote by X the Radon mea-
sure satisfying X (E) P (X E) for every Borel E. Now suppose X, Y
are random variables having values in Rn and Rm respectively. First explain
why there exist unique probability measures, denoted as X|y and X|x such
that whenever E is a Borel set in Rn+m ,

XE d (X, Y) = XE dX|y dY = XE dY|x dX
Rn+m Rn Rm Rn Rm

Next explain why X, Y are independent if and only if dX|y = dX and


dY|y = dY . Recall that two random variables X, Y are independent means
that when A (X) and B (Y) , then P (A B) = P (A) P (B).
The Bochner Integral

22.1 Strong And Weak Measurability

In this chapter $(\Omega, \mathcal{S}, \mu)$ will be a $\sigma$ finite measure space and $X$ will be a Banach space which contains the values of either a function or a measure. The Banach space will be either a real or a complex Banach space, but the field of scalars does not matter and so it is denoted by $\mathbb{F}$ with the understanding that $\mathbb{F} = \mathbb{C}$ unless otherwise stated. The theory presented here includes the case where $X = \mathbb{R}^n$ or $\mathbb{C}^n$, but it does not include the situation where $f$ could have values in a space like $[0, \infty]$. To begin with, here is a definition.
Definition 22.1.1 A function $x : \Omega \to X$, for $X$ a Banach space, is a simple function if it is of the form

$$x(s) = \sum_{i=1}^n a_i \mathcal{X}_{B_i}(s)$$

where $B_i \in \mathcal{S}$ and $\mu(B_i) < \infty$ for each $i$. A function $x$ from $\Omega$ to $X$ is said to be strongly measurable if there exists a sequence of simple functions $\{x_n\}$ converging pointwise to $x$. The function $x$ is said to be weakly measurable if, for each $f \in X'$,

$$f \circ x$$

is a scalar valued measurable function.
Earlier, a function was measurable if inverse images of open sets were measurable. Something similar holds here. The difference is that another condition needs to hold.

Theorem 22.1.2 $x$ is strongly measurable if and only if $x^{-1}(U)$ is measurable for all $U$ open in $X$ and $x(\Omega)$ is separable.
Proof: Suppose rst x1 (U ) is measurable for all U open in X and x ()
is separable. Let {an }
n=1 be the dense subset of x (). It follows x
1
(B) is
measurable for all B Borel because
{B : x1 (B) is measurable}


is a algebra containing the open sets. Let


n
Ukn {z X : ||z ak || min{||z al ||l=1 }}.

In words, Ukm is the set of points of X which are as close to ak as they are to any
of the al for l n.
( )
Bkn x1 (Ukn ) , Dkn Bkn \ k1
i=1 Bi , D1 B1 ,
n n n

and

n
xn (s) ak XDkn (s).
k=1
n
Thus xn (s) is a closest approximation to x (s) from {ak }k=1 and so xn (s) x (s)
because {an }n=1 is dense in x (). Furthermore, xn is measurable because each Dk
n

is measurable.
Since (, S, ) is nite, there exists n with (n ) < . Let

yn (s) Xn (s) xn (s).

Then yn (s) x (s) for each s because for any s, s n if n is large enough. Also
yn is a simple function because it equals 0 o a set of nite measure.
Now suppose that x is strongly measurable. Then some sequence of simple
functions, {xn }, converges pointwise to x. Then x1 n (W ) is measurable for every
open set W because it is just a nite union of measurable sets. Thus, x1 n (W ) is
measurable for every Borel set W . This follows by considering
{ }
W : x1
n (W ) is measurable

and observing this is a algebra which contains the open sets. Since X is a metric
space, it follows that if U is an open set in X , there exists a sequence of open sets,
{Vn } which satises

V n U, V n Vn+1 , U =
n=1 Vn .

Then ( )
x1 (Vm ) x1
k (Vm ) x
1
Vm .
n< kn

This implies
x1 (U ) = x1 (Vm )
m<
( )
x1
k (Vm ) x1 V m x1 (U ).
m< n< kn m<

Since
x1 (U ) = x1
k (Vm ),
m< n< kn

it follows that x1 (U ) is measurable for every open U . It remains to show x () is


separable. Let
D all values of the simple functions xn
which converge to x pointwise. Then D is clearly countable and dense in D, a set
which contains x ().
Claim: x () is separable. { }
Proof of claim: For n N, let Bn B (d, r) : 0 < r < n1 , r rational, d D .
( 1)
( B1 n) is countable. Let z(1 D.1 )Consider B z, n . Then there exists d D
Thus
B z, 3n . Now pick r Q 3n , n so that B (d, r) Bn . Now z B (d, r) and so
this shows that x () D Bn for each n. Now let Bn denote those sets of Bn

which have nonempty intersection with x () . Say Bn = {Bkn }n,k=1 . By the axiom
of choice, there exists xnk Bkn x () . Then if z x () , z is contained in some set

of Bn which also contains a point of {xnk }n,k=1 . Therefore, z is at least as close as
n
2/n to some point of {xk }n,k=1 which shows {xnk }n,k=1 is a countable dense subset
of x () . Therefore x () is separable. 
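The simple functions built in the first half of the proof send each $s$ to whichever of the first $n$ points of a countable dense subset of $x(\Omega)$ is closest to $x(s)$. The sketch below (illustrative; $X$ is taken to be $\mathbb{R}^2$ and $\Omega = [0,1]$, with the dense set enumerated in a shuffled order, none of which is prescribed by the text) carries out this nearest-point construction and reports the resulting approximation error.

import numpy as np

rng = np.random.default_rng(5)

# x : [0, 1] -> R^2, a continuous curve standing in for a Banach-space-valued function.
x = lambda s: np.stack([np.cos(2 * np.pi * s), np.sin(4 * np.pi * s)], axis=-1)

s_grid = np.linspace(0, 1, 2001)                    # sample points of Omega for comparison
values = x(s_grid)

# {a_k}: a countable set dense in x(Omega); here, values of x in a shuffled enumeration.
dense = x(rng.permutation(np.linspace(0, 1, 5000)))

for n in [2, 10, 100, 1000]:
    a = dense[:n]                                               # first n candidate values
    # x_n(s) = the a_k (k <= n) closest to x(s), ties resolved by smallest k
    idx = np.argmin(np.linalg.norm(values[:, None, :] - a[None, :, :], axis=2), axis=1)
    err = np.max(np.linalg.norm(values - a[idx], axis=1))       # sup_s ||x_n(s) - x(s)||
    print(n, err)                                               # decreases toward 0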
The last part also shows that a subset of a separable metric space is also sepa-
rable. Therefore, the following simple corollary is obtained.

Corollary 22.1.3 If $X$ is a separable Banach space, then $x$ is strongly measurable if and only if $x^{-1}(U)$ is measurable for all $U$ open in $X$.

The next lemma is interesting for its own sake. Roughly it says that if a Banach space is separable, then the unit ball in the dual space is weak$^*$ separable. This will be used to prove Pettis's theorem, one of the major theorems in this subject, which relates weak measurability to strong measurability.

Lemma 22.1.4 If $X$ is a separable Banach space with $B'$ the closed unit ball in $X'$, then there exists a sequence $\{f_n\}_{n=1}^\infty \equiv D' \subseteq B'$ with the property that for every $x \in X$,

$$\|x\| = \sup_{f \in D'} |f(x)|.$$

If $H$ is a dense subset of $X'$, then $D'$ may be chosen to be contained in $H$.

Proof: Let {ak } be a countable dense set in X, and consider the mapping

n : B Fn

given by
n (f ) (f (a1 ) , , f (an )) .

Then n (B ) is contained in a compact subset of Fn because |f (ak )| ||ak || .

Therefore, there exists a countable
densek subset
{ ) ,({kn)(fk )}}k=1 .Then pick
of n (B
hj H B such that limj fk hj = 0. Then n hj , k, j must also be
k
{ }
dense in n (B ) . Let Dn = hkj , k, j . Dene

D
k=1 Dk .

Note that for each x X, there exists fx B such that fx (x) = ||x||. From the
construction,
||am || = sup {|f (am )| : f D }

because fam (am ) is the limit of numbers f (am ) for f Dm D . Therefore, for x
arbitrary,

||x|| ||x am || + ||am || = sup {|f (am )| : f D } + ||x am ||


sup {|f (am x) + f (x)| : f D } + ||x am ||
sup {|f (x)| : f D } + 2 ||x am || ||x|| + 2 ||x am || .

Since am is arbitrary and the {am }m=1 are dense, this establishes the claim of the
lemma. 
The next theorem is one of the most important results in the subject. It is due
to Pettis and appeared in 1938.

Theorem 22.1.5 If $x$ has values in a separable Banach space $X$, then $x$ is weakly measurable if and only if $x$ is strongly measurable.

Proof: It is necessary to show x1 (U ) is measurable whenever U is open.


Since every open set is a countable union of balls, it suces to show x1 (B (a, r))
is measurable for any ball, B (a, r) . (
Since every
) open ball is the countable union of
closed balls, it suces to verify x1 B (a, r) is measurable. From Lemma 22.1.4
( )
x1 B (a, r) = {s : ||x (s) a|| r}
{ }
= s : sup |f (x (s) a)| r
f D

= f D {s : |f (x (s) a)| r}
= f D {s : |f (x (s)) f (a)| r}
1
= f D (f x) B (f (a) , r)

which equals a countable union of measurable sets because it is assumed that f x


is measurable for all f X .
Next suppose x is strongly measurable. Then there exists a sequence of simple
functions xn which converges to x pointwise. Hence for all f X , f xn is
measurable and f xn f x pointwise. Thus x is weakly measurable. 
The same method of proof yields the following interesting corollary.

Corollary 22.1.6 Let $X$ be a separable Banach space and let $\mathcal{B}(X)$ denote the $\sigma$ algebra of Borel sets. Let $H$ be a dense subset of $X'$. Then $\mathcal{B}(X) = \sigma(H) \equiv \mathcal{F}$, the smallest $\sigma$ algebra of subsets of $X$ which has the property that every function $x^* \in H$ is measurable.

Proof: First I need to show F contains open balls because then F will contain
the open sets and hence the Borel sets. As noted above, it suces to show F
contains closed balls. Let D be those functionals in B dened in Lemma 22.1.4.
Then
{ }

{x : ||x a|| r} = x : sup |x (x a)| r
x D
= x D {x : |x (x a)| r}
= x D {x : |x (x) x (a)| r}
( )
= x D x1 B (x (a) , r) (H)

which is measurable because this is a countable intersection of measurable sets.


Thus F contains open sets so (H) F B (X) .
To show the other direction for the inclusion, note that each x is B (X) mea-
surable because x1 (open set) = open set. Therefore, B (X) (H) . 
It is important to verify that the limit of strongly measurable functions is itself strongly measurable. This happens under very general conditions. Suppose $X$ is any separable metric space and let $\tau$ denote the open sets of $X$. Then it is routine to see that

$$\tau \text{ has a countable basis } \mathcal{B}. \qquad (22.1.1)$$

Whenever $U \in \mathcal{B}$, there exists a sequence of open sets $\{V_m\}_{m=1}^\infty$ such that

$$V_m \subseteq \overline{V}_m \subseteq V_{m+1}, \qquad U = \cup_{m=1}^\infty V_m. \qquad (22.1.2)$$

Theorem 22.1.7 Let $f_n$ and $f$ be functions mapping $\Omega$ to $X$, where $\mathcal{F}$ is a $\sigma$ algebra of measurable sets of $\Omega$ and $(X, \tau)$ is a topological space satisfying 22.1.1 - 22.1.2. Then if $f_n$ is measurable and $f(\omega) = \lim_{n \to \infty} f_n(\omega)$, it follows that $f$ is also measurable. (Pointwise limits of measurable functions are measurable.)
Proof: Let B be the countable basis of 22.1.1 and let U B. Let {Vm } be the
sequence of 22.1.2. Since f is the pointwise limit of fn ,
f 1 (Vm ) { : fk () Vm for all k large enough} f 1 (Vm ).
Therefore,
1
f 1 (U ) =
m=1 f
1
(Vm )
m=1 n=1 k=n fk (Vm )


m=1 f
1
(Vm ) = f 1 (U ).
It follows f 1 (U ) F because it equals the expression in the middle which is
measurable. Now let W . Since B is countable, W = n=1 Un for some sets
Un B. Hence
f 1 (W ) =
n=1 f
1
(Un ) F . 
Note that the same conclusion would hold for any topological space with the
property that for any open set U, it has such a sequence of Vk attached to it as in
22.1.2.

Corollary 22.1.8 $x$ is strongly measurable if and only if $x(\Omega)$ is separable and $x$ is weakly measurable.
Proof: Strong measurability clearly implies weak measurability. If xn (s)
x (s) where xn is simple, then f (xn (s)) f (x (s)) for all f X . Hence f x is
measurable by Theorem 22.1.7 because it is the limit of a sequence of measurable
functions. Let D denote the set of all values of xn . Then D is a separable set
containing x (). Thus D is a separable metric space. Therefore x () is separable
also by the last part of the proof of Theorem 22.1.2.
Now suppose D is a countable dense subset of x () and x is weakly measurable.
Let Z be the subset consisting of all nite linear combinations of D with the scalars
coming from the set of rational points of F. Thus, Z is countable. Letting Y = Z,
Y is a separable Banach space containing x (). If f Y , f can be extended to an
element of X by the Hahn Banach theorem. Therefore, x is a weakly measurable
Y valued function. Now use Theorem 22.1.5 to conclude x is strongly measurable.

Weakly measurable, as defined above, means $s \to x^*(x(s))$ is measurable for every $x^* \in X'$. The next lemma ties this weak measurability to the usual version of measurability, in which a function is measurable when inverse images of open sets are measurable.

Lemma 22.1.9 Let $X$ be a Banach space and let $x : (\Omega, \mathcal{F}) \to K \subseteq X$ where $K$ is weakly compact and $X'$ is separable. Then $x$ is weakly measurable if and only if $x^{-1}(U) \in \mathcal{F}$ whenever $U$ is a weakly open set.
Proof: By Corollary 18.4.9 on Page 468, there exists a metric d, such that the
metric space topology with respect to d coincides with the weak topology. Since
K is compact, it follows that K is also separable. Hence it is completely separable
and so there exists a countable basis of open sets B for the weak topology on K. It
follows that if U is any weakly open set, covered by basic sets of the form BA (x, r)
where A is a nite subset of X , there exists a countable collection of these sets of
the form BA (x, r) which covers U .
Suppose now that x is weakly measurable. To show x1 (U ) F whenever U
is weakly open, it suces to verify x1 (BA (z, r)) F for any set, BA (z, r) . Let
A = {x1 , , xm } . Then
x1 (BA (z, r)) = {s : A (x (s) z) < r}
{ }

s : max

|x (x (s) z)| < r
x A

= m
i=1 {s : |xi (x (s) z)| < r}
= m
i=1 {s : |xi (x (s)) xi (z)| < r}
which is measurable because each xi x is given to be measurable.
Next suppose x1 (U ) F whenever U is weakly open. Then in particular this
holds when U = Bx (z, r) for arbitrary x . Hence
{s : x (s) Bx (z, r)} F.

But this says the same as

{s : |x (x (s)) x (z)| < r} F

Since x (z) can be a completely arbitrary element of F, it follows x x is an F


valued measurable function. In other words, x is weakly measurable according to
the former definition.

One can also define weak$^*$ measurability and prove a theorem just like the Pettis theorem above. The next lemma is the analogue of Lemma 22.1.4.

Lemma 22.1.10 Let $B$ be the closed unit ball in $X$. If $X'$ is separable, there exists a sequence $\{x_m\}_{m=1}^\infty \equiv D \subseteq B$ with the property that for all $y^* \in X'$,

$$\|y^*\| = \sup_{x \in D} |y^*(x)|.$$

Proof: Let $\{x_k^*\}_{k=1}^\infty$ be a countable dense subset of $X'$. Define $\theta_n : B \to \mathbb{F}^n$ by
$$\theta_n(x) \equiv (x_1^*(x), \cdots, x_n^*(x)).$$
Then $|x_k^*(x)| \le \|x_k^*\|$, and so $\theta_n(B)$ is contained in a compact subset of $\mathbb{F}^n$. Therefore, there exists a countable set $D_n \subseteq B$ such that $\theta_n(D_n)$ is dense in $\theta_n(B)$. Let
$$D \equiv \cup_{n=1}^\infty D_n.$$
It remains to verify this works. Let $y^* \in X'$ and let $\varepsilon > 0$. Then there exists $y \in B$ such that
$$|y^*(y)| > \|y^*\| - \varepsilon.$$
By density, there exists one of the $x_k^*$ from the countable dense subset of $X'$ such that also
$$|x_k^*(y)| > \|y^*\| - \varepsilon, \qquad \|x_k^* - y^*\| < \varepsilon.$$
Now $x_k^*(y)$ is a coordinate of $\theta_k(y) \in \theta_k(B)$ and so there exists $x \in D_k \subseteq D$ such that
$$|x_k^*(x)| > \|y^*\| - \varepsilon.$$
Then since $\|x_k^* - y^*\| < \varepsilon$, this implies
$$|y^*(x)| \ge |x_k^*(x)| - \varepsilon > \|y^*\| - 2\varepsilon.$$
Since $\varepsilon > 0$ is arbitrary,
$$\|y^*\| \ge \sup_{x \in D} |y^*(x)| \ge \|y^*\|. \;\blacksquare$$

The next theorem is another version of the Pettis theorem. First here is a definition.

Definition 22.1.11 A function $y$ having values in $X'$ is weak$^*$ measurable when, for each $x \in X$, $y(\cdot)(x)$ is a measurable scalar valued function.

Theorem 22.1.12 If $X'$ is separable and $y : \Omega \to X'$ is weak$^*$ measurable, then $y$ is strongly measurable.

Proof: It is necessary to show $y^{-1}(B(a^*, r))$ is measurable. This will suffice because the separability of $X'$ implies every open set is the countable union of such balls of the form $B(a^*, r)$. It also suffices to verify inverse images of closed balls are measurable, because every open ball is the countable union of closed balls. From Lemma 22.1.10,
$$y^{-1}\big(\overline{B(a^*, r)}\big) = \{s : \|y(s) - a^*\| \le r\} = \Big\{ s : \sup_{x \in D} |(y(s) - a^*)(x)| \le r \Big\}$$
$$= \Big\{ s : \sup_{x \in D} |y(s)(x) - a^*(x)| \le r \Big\} = \bigcap_{x \in D} \big( y(\cdot)(x) \big)^{-1}\big(\overline{B(a^*(x), r)}\big),$$
which is a countable intersection of measurable sets by hypothesis. ■


The following are interesting consequences of the theory developed so far and
are of interest independent of the theory of integration of vector valued functions.

Theorem 22.1.13 If $X'$ is separable, then so is $X$.

Proof: Let $D = \{x_m\} \subseteq B$, the closed unit ball of $X$, be the sequence promised by Lemma 22.1.10. Let $V$ be the set of all finite linear combinations of elements of $\{x_m\}$ with rational scalars. Thus $\overline{V}$ is a separable subspace of $X$. The claim is that $\overline{V} = X$. If not, there exists
$$x_0 \in X \setminus \overline{V}.$$
But by the Hahn Banach theorem there exists $x_0^* \in X'$ satisfying $x_0^*(x_0) \ne 0$ but $x_0^*(v) = 0$ for every $v \in \overline{V}$. Hence
$$\|x_0^*\| = \sup_{x \in D} |x_0^*(x)| = 0,$$
a contradiction. ■

Corollary 22.1.14 If $X$ is reflexive, then $X$ is separable if and only if $X'$ is separable.

Proof: From the above theorem, if $X'$ is separable, then so is $X$. Now suppose $X$ is separable with a countable dense subset $D$. Then since $X$ is reflexive, $J(D)$ is dense in $X''$, where $J$ is the James map satisfying $Jx(x^*) \equiv x^*(x)$. Then since $X''$ is separable, it follows from the above theorem that $X'$ is also separable. ■

22.2 The Essential Bochner Integral


Definition 22.2.1 Let $a_k \in X$, a Banach space, and let
$$x(s) = \sum_{k=1}^n a_k \mathcal{X}_{E_k}(s) \tag{22.2.3}$$
where for each $k$, $E_k$ is measurable and $\mu(E_k) < \infty$. Then define
$$\int_\Omega x(s)\, d\mu \equiv \sum_{k=1}^n a_k\, \mu(E_k).$$

Proposition 22.2.2 Definition 22.2.1 is well defined.

Proof: It suffices to verify that if
$$\sum_{k=1}^n a_k \mathcal{X}_{E_k}(s) = 0,$$
then
$$\sum_{k=1}^n a_k\, \mu(E_k) = 0.$$
Let $f \in X'$. Then
$$f\Big(\sum_{k=1}^n a_k \mathcal{X}_{E_k}(s)\Big) = \sum_{k=1}^n f(a_k)\, \mathcal{X}_{E_k}(s) = 0$$
and, therefore,
$$0 = \int_\Omega \sum_{k=1}^n f(a_k)\, \mathcal{X}_{E_k}(s)\, d\mu = \sum_{k=1}^n f(a_k)\, \mu(E_k) = f\Big(\sum_{k=1}^n a_k\, \mu(E_k)\Big).$$
Since $f \in X'$ is arbitrary and $X'$ separates the points of $X$, it follows that
$$\sum_{k=1}^n a_k\, \mu(E_k) = 0$$
as claimed. ■
It follows easily from this proposition that $\int_\Omega \cdot\, d\mu$ is well defined and linear on simple functions.
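To make the formula concrete, here is a small numerical sketch (illustrative only, not part of the text; it assumes numpy is available and takes $X = \mathbb{R}^2$ with the Euclidean norm). It evaluates the integral of a simple function directly from Definition 22.2.1 and checks the triangle inequality, proved in Theorem 22.2.4 below, that the norm of the integral is at most the integral of the norm.

import numpy as np

# A simple function x(s) = sum_k a_k X_{E_k}(s): values a_k in X = R^2 and the
# measures mu(E_k) of the disjoint measurable sets E_k (all data here is made up).
a = [np.array([1.0, -2.0]), np.array([0.5, 0.5]), np.array([-3.0, 1.0])]
mu_E = [0.25, 1.0, 0.5]

# Definition 22.2.1: integral of the simple function = sum_k a_k * mu(E_k).
integral = sum(ak * m for ak, m in zip(a, mu_E))

# Integral of the norm: sum_k ||a_k|| * mu(E_k).
integral_of_norm = sum(np.linalg.norm(ak) * m for ak, m in zip(a, mu_E))

print(integral, np.linalg.norm(integral), integral_of_norm)
assert np.linalg.norm(integral) <= integral_of_norm + 1e-12  # triangle inequality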
Definition 22.2.3 A strongly measurable function $x$ is Bochner integrable if there exists a sequence of simple functions $x_n$ converging to $x$ pointwise and satisfying
$$\int_\Omega \|x_n(s) - x_m(s)\|\, d\mu \to 0 \text{ as } m, n \to \infty. \tag{22.2.4}$$
If $x$ is Bochner integrable, define
$$\int_\Omega x(s)\, d\mu \equiv \lim_{n\to\infty} \int_\Omega x_n(s)\, d\mu. \tag{22.2.5}$$

Theorem 22.2.4 The Bochner integral is well defined, and if $x$ is Bochner integrable and $f \in X'$,
$$f\Big(\int_\Omega x(s)\, d\mu\Big) = \int_\Omega f(x(s))\, d\mu \tag{22.2.6}$$
and
$$\Big\|\int_\Omega x(s)\, d\mu\Big\| \le \int_\Omega \|x(s)\|\, d\mu. \tag{22.2.7}$$
Also, the Bochner integral is linear. That is, if $a, b$ are scalars and $x, y$ are two Bochner integrable functions, then
$$\int_\Omega (a x(s) + b y(s))\, d\mu = a \int_\Omega x(s)\, d\mu + b \int_\Omega y(s)\, d\mu. \tag{22.2.8}$$
Also, $\int_\Omega \|x(s)\|\, d\mu < \infty$ when $x$ is Bochner integrable.

Proof: First it is shown that the triangle inequality holds on simple functions and that the limit in 22.2.5 exists. Thus, if $x$ is given by 22.2.3 (simple) with the $E_k$ disjoint,
$$\Big\|\int_\Omega x(s)\, d\mu\Big\| = \Big\|\sum_{k=1}^n a_k\, \mu(E_k)\Big\| \le \sum_{k=1}^n \|a_k\|\, \mu(E_k) = \int_\Omega \sum_{k=1}^n \|a_k\|\, \mathcal{X}_{E_k}(s)\, d\mu = \int_\Omega \|x(s)\|\, d\mu,$$
which shows the triangle inequality holds on simple functions. This implies
$$\Big\|\int_\Omega x_n(s)\, d\mu - \int_\Omega x_m(s)\, d\mu\Big\| = \Big\|\int_\Omega (x_n(s) - x_m(s))\, d\mu\Big\| \le \int_\Omega \|x_n(s) - x_m(s)\|\, d\mu,$$
which verifies the existence of the limit in 22.2.5. Also
$$\Big|\int_\Omega \|x_n(s)\|\, d\mu - \int_\Omega \|x_m(s)\|\, d\mu\Big| \le \int_\Omega \big|\, \|x_n(s)\| - \|x_m(s)\|\, \big|\, d\mu \le \int_\Omega \|x_n(s) - x_m(s)\|\, d\mu,$$
which is given to converge to $0$ as $m, n \to \infty$. Therefore, $\big\{\int_\Omega \|x_n(s)\|\, d\mu\big\}$ is a Cauchy sequence, so it is bounded. Now from Fatou's lemma,
$$\int_\Omega \|x(s)\|\, d\mu \le \liminf_{n\to\infty} \int_\Omega \|x_n(s)\|\, d\mu < \infty.$$
This completes the first part of the argument.


Next it is shown the integral does not depend on the choice of the sequence satisfying 22.2.4, so that the integral is well defined. Suppose $y_n$, $x_n$ both satisfy 22.2.4 and converge to $x$ pointwise. By Fatou's lemma,
$$\Big\|\int_\Omega y_n\, d\mu - \int_\Omega x_m\, d\mu\Big\| \le \int_\Omega \|y_n - x\|\, d\mu + \int_\Omega \|x - x_m\|\, d\mu$$
$$\le \liminf_{k\to\infty} \int_\Omega \|y_n - y_k\|\, d\mu + \liminf_{k\to\infty} \int_\Omega \|x_k - x_m\|\, d\mu \le \varepsilon/2 + \varepsilon/2$$
if $m$ and $n$ are chosen large enough. Since $\varepsilon$ is arbitrary, this shows the limit is the same for both sequences and demonstrates the Bochner integral is well defined.
It remains to verify the triangle inequality on Bochner integrable functions and the claim about passing a continuous linear functional inside the integral. Let $x$ be Bochner integrable and let $x_n$ be a sequence which satisfies the conditions of the definition. Define
$$y_n(s) \equiv \begin{cases} x_n(s) & \text{if } \|x_n(s)\| \le 2\|x(s)\|, \\ 0 & \text{if } \|x_n(s)\| > 2\|x(s)\|. \end{cases} \tag{22.2.9}$$
Thus
$$y_n(s) = x_n(s)\, \mathcal{X}_{[\|x_n\| \le 2\|x\|]}(s).$$
If $x(s) = 0$ then $y_n(s) = 0$ for all $n$. If $\|x(s)\| > 0$ then for all $n$ large enough,
$$y_n(s) = x_n(s).$$
Thus, $y_n(s) \to x(s)$ and
$$\|y_n(s)\| \le 2\|x(s)\|. \tag{22.2.10}$$
By Fatou's lemma,
$$\int_\Omega \|x\|\, d\mu \le \liminf_{n\to\infty} \int_\Omega \|x_n\|\, d\mu. \tag{22.2.11}$$
Also, from 22.2.4 and the triangle inequality on simple functions, $\big\{\int_\Omega \|x_n\|\, d\mu\big\}_{n=1}^\infty$ is a Cauchy sequence and so it must be bounded. Therefore, by 22.2.10, 22.2.11, and the dominated convergence theorem,
$$0 = \lim_{n,m\to\infty} \int_\Omega \|y_n - y_m\|\, d\mu \tag{22.2.12}$$
and it follows $x_n$ can be replaced with $y_n$ in Definition 22.2.3.
From Definition 22.2.1,
$$f\Big(\int_\Omega y_n\, d\mu\Big) = \int_\Omega f(y_n)\, d\mu.$$


Thus,
$$f\Big(\int_\Omega x\, d\mu\Big) = \lim_{n\to\infty} f\Big(\int_\Omega y_n\, d\mu\Big) = \lim_{n\to\infty} \int_\Omega f(y_n)\, d\mu = \int_\Omega f(x)\, d\mu,$$
the last equation holding from the dominated convergence theorem together with 22.2.10 and 22.2.11. This shows 22.2.6. To verify 22.2.7,
$$\Big\|\int_\Omega x(s)\, d\mu\Big\| = \lim_{n\to\infty} \Big\|\int_\Omega y_n(s)\, d\mu\Big\| \le \lim_{n\to\infty} \int_\Omega \|y_n(s)\|\, d\mu = \int_\Omega \|x(s)\|\, d\mu,$$
where the last equation follows from the dominated convergence theorem and 22.2.10, 22.2.11.
It remains to verify 22.2.8. Let $f \in X'$. Then from 22.2.6,
$$f\Big(\int_\Omega (a x(s) + b y(s))\, d\mu\Big) = \int_\Omega (a f(x(s)) + b f(y(s)))\, d\mu$$
$$= a \int_\Omega f(x(s))\, d\mu + b \int_\Omega f(y(s))\, d\mu = f\Big(a \int_\Omega x(s)\, d\mu + b \int_\Omega y(s)\, d\mu\Big).$$
Since $X'$ separates the points of $X$, it follows
$$\int_\Omega (a x(s) + b y(s))\, d\mu = a \int_\Omega x(s)\, d\mu + b \int_\Omega y(s)\, d\mu,$$
and this proves 22.2.8. ■


Theorem 22.2.5 An $X$ valued function $x$ is Bochner integrable if and only if $x$ is strongly measurable and
$$\int_\Omega \|x(s)\|\, d\mu < \infty. \tag{22.2.13}$$
In this case there exists a sequence of simple functions $\{y_n\}$ satisfying 22.2.4, with $y_n(s)$ converging pointwise to $x(s)$,
$$\|y_n(s)\| \le 2\|x(s)\| \tag{22.2.14}$$
and
$$\lim_{n\to\infty} \int_\Omega \|x(s) - y_n(s)\|\, d\mu = 0. \tag{22.2.15}$$

Proof: Suppose $x$ is strongly measurable and condition 22.2.13 holds. Since $x$ is strongly measurable, there exists a sequence of simple functions $\{x_n\}$ converging pointwise to $x$. As before, let
$$y_n(s) = \begin{cases} x_n(s) & \text{if } \|x_n(s)\| \le 2\|x(s)\|, \\ 0 & \text{if } \|x_n(s)\| > 2\|x(s)\|. \end{cases} \tag{22.2.16}$$
Then 22.2.14 holds for $y_n$ and $y_n(s) \to x(s)$. Also
$$0 = \lim_{m,n\to\infty} \int_\Omega \|y_n(s) - y_m(s)\|\, d\mu,$$
since otherwise there would exist $\varepsilon > 0$ and subsequences $n_k, m_k \to \infty$ such that
$$\int_\Omega \|y_{n_k}(s) - y_{m_k}(s)\|\, d\mu \ge \varepsilon.$$
But then taking a limit as $k \to \infty$ and using the dominated convergence theorem together with 22.2.14 and 22.2.13, this would imply $0 \ge \varepsilon$. Therefore, $x$ is Bochner integrable. 22.2.15 follows from the dominated convergence theorem and 22.2.14.
Now suppose $x$ is Bochner integrable. Then it is strongly measurable and there exists a sequence of simple functions $\{x_n\}$ such that $x_n(s)$ converges pointwise to $x$ and
$$\lim_{m,n\to\infty} \int_\Omega \|x_n(s) - x_m(s)\|\, d\mu = 0.$$
Therefore, as before, since $\big\{\int_\Omega x_n\, d\mu\big\}_{n=1}^\infty$ is a Cauchy sequence, it follows
$$\Big\{\int_\Omega \|x_n\|\, d\mu\Big\}_{n=1}^\infty$$
is also a Cauchy sequence because
$$\Big|\int_\Omega \|x_n\|\, d\mu - \int_\Omega \|x_m\|\, d\mu\Big| \le \int_\Omega \big|\, \|x_n\| - \|x_m\|\, \big|\, d\mu \le \int_\Omega \|x_n - x_m\|\, d\mu.$$
Thus
$$\int_\Omega \|x\|\, d\mu \le \liminf_{n\to\infty} \int_\Omega \|x_n\|\, d\mu < \infty.$$
Using 22.2.16, it follows $y_n$ satisfies 22.2.14 and converges pointwise to $x$; then from the dominated convergence theorem, 22.2.15 holds. ■
Here is a simple corollary.

Corollary 22.2.6 Let an $X$ valued function $x$ be Bochner integrable and let $L \in \mathcal{L}(X, Y)$ where $Y$ is another Banach space. Then $Lx$ is a $Y$ valued Bochner integrable function and
$$L\Big(\int_\Omega x(s)\, d\mu\Big) = \int_\Omega L x(s)\, d\mu.$$
Proof: From Theorem 22.2.5 there is a sequence of simple functions $\{y_n\}$ having the properties listed in that theorem. Then consider $\{L y_n\}$, which converges pointwise to $Lx$. Since $L$ is continuous and linear,
$$\int_\Omega \|L y_n - L x\|_Y\, d\mu \le \|L\| \int_\Omega \|y_n - x\|_X\, d\mu,$$
which converges to $0$. This implies
$$\lim_{m,n\to\infty} \int_\Omega \|L y_n - L y_m\|\, d\mu = 0,$$
and so by definition $Lx$ is Bochner integrable. Also
$$\int_\Omega x(s)\, d\mu = \lim_{n\to\infty} \int_\Omega y_n(s)\, d\mu, \qquad \int_\Omega L x(s)\, d\mu = \lim_{n\to\infty} \int_\Omega L y_n(s)\, d\mu = \lim_{n\to\infty} L \int_\Omega y_n(s)\, d\mu.$$
Hence
$$\Big\|L\Big(\int_\Omega x(s)\, d\mu\Big) - \int_\Omega L x(s)\, d\mu\Big\|_Y \le \Big\|L\Big(\int_\Omega x(s)\, d\mu\Big) - L \int_\Omega y_n(s)\, d\mu\Big\|_Y + \Big\|\int_\Omega L y_n(s)\, d\mu - \int_\Omega L x(s)\, d\mu\Big\|_Y < \varepsilon/2 + \varepsilon/2 = \varepsilon$$
whenever $n$ is large enough. This proves the corollary. ■

22.3 The Spaces $L^p(\Omega; X)$


Definition 22.3.1 $x \in L^p(\Omega; X)$ for $p \in [1, \infty)$ if $x$ is strongly measurable and
$$\int_\Omega \|x(s)\|^p\, d\mu < \infty.$$
Also
$$\|x\|_{L^p(\Omega;X)} \equiv \|x\|_p \equiv \Big(\int_\Omega \|x(s)\|^p\, d\mu\Big)^{1/p}. \tag{22.3.17}$$

As in the case of scalar valued functions, two functions in $L^p(\Omega; X)$ are considered equal if they are equal a.e. With this convention, and using the same arguments found in the presentation of scalar valued functions, it is clear that $L^p(\Omega; X)$ is a normed linear space with the norm given by 22.3.17. In fact, $L^p(\Omega; X)$ is a Banach space. This is the main contribution of the next theorem.
Lemma 22.3.2 If $\{x_n\}$ is a Cauchy sequence in $L^p(\Omega; X)$ satisfying
$$\sum_{n=1}^\infty \|x_{n+1} - x_n\|_p < \infty,$$
then there exists $x \in L^p(\Omega; X)$ such that $x_n(s) \to x(s)$ a.e. and
$$\|x - x_n\|_p \to 0.$$

Proof: Let
$$g_N(s) \equiv \sum_{n=1}^N \|x_{n+1}(s) - x_n(s)\|_X.$$
Then by the triangle inequality,
$$\Big(\int_\Omega g_N(s)^p\, d\mu\Big)^{1/p} \le \sum_{n=1}^N \Big(\int_\Omega \|x_{n+1}(s) - x_n(s)\|^p\, d\mu\Big)^{1/p} \le \sum_{n=1}^\infty \|x_{n+1} - x_n\|_p < \infty.$$
Let
$$g(s) = \lim_{N\to\infty} g_N(s) = \sum_{n=1}^\infty \|x_{n+1}(s) - x_n(s)\|_X.$$
By the monotone convergence theorem,
$$\Big(\int_\Omega g(s)^p\, d\mu\Big)^{1/p} = \lim_{N\to\infty} \Big(\int_\Omega g_N(s)^p\, d\mu\Big)^{1/p} < \infty.$$
Therefore, there exists a set of measure $0$, $E$, such that for $s \notin E$, $g(s) < \infty$. Hence, for $s \notin E$,
$$\lim_{N\to\infty} x_{N+1}(s)$$
exists because
$$x_{N+1}(s) = x_{N+1}(s) - x_1(s) + x_1(s) = \sum_{n=1}^N (x_{n+1}(s) - x_n(s)) + x_1(s).$$
Thus, if $N > M$, and $s$ is a point where $g(s) < \infty$,
$$\|x_{N+1}(s) - x_{M+1}(s)\|_X \le \sum_{n=M+1}^N \|x_{n+1}(s) - x_n(s)\|_X \le \sum_{n=M+1}^\infty \|x_{n+1}(s) - x_n(s)\|_X,$$
which shows that $\{x_{N+1}(s)\}_{N=1}^\infty$ is a Cauchy sequence. Now let
$$x(s) \equiv \begin{cases} \lim_{N\to\infty} x_N(s) & \text{if } s \notin E, \\ 0 & \text{if } s \in E. \end{cases}$$
By Theorem 22.1.2, $x_n(\Omega)$ is separable for each $n$. Therefore, $x(\Omega)$ is also separable. Also, if $f \in X'$, then
$$f(x(s)) = \lim_{N\to\infty} f(x_N(s))$$
if $s \notin E$, and $f(x(s)) = 0$ if $s \in E$. Therefore, $f \circ x$ is measurable because it is the limit of the measurable functions
$$f \circ x_N\, \mathcal{X}_{E^C}.$$
Since $x$ is weakly measurable and $x(\Omega)$ is separable, Corollary 22.1.8 shows that $x$ is strongly measurable. By Fatou's lemma,
$$\int_\Omega \|x(s) - x_N(s)\|^p\, d\mu \le \liminf_{M\to\infty} \int_\Omega \|x_M(s) - x_N(s)\|^p\, d\mu.$$

But if $N$ and $M$ are large enough with $M > N$,
$$\Big(\int_\Omega \|x_M(s) - x_N(s)\|^p\, d\mu\Big)^{1/p} \le \sum_{n=N}^M \|x_{n+1} - x_n\|_p \le \sum_{n=N}^\infty \|x_{n+1} - x_n\|_p < \varepsilon,$$
and this shows, since $\varepsilon$ is arbitrary, that
$$\lim_{N\to\infty} \int_\Omega \|x(s) - x_N(s)\|^p\, d\mu = 0.$$
It remains to show $x \in L^p(\Omega; X)$. This follows from the above and the triangle inequality. Thus, for $N$ large enough,
$$\Big(\int_\Omega \|x(s)\|^p\, d\mu\Big)^{1/p} \le \Big(\int_\Omega \|x_N(s)\|^p\, d\mu\Big)^{1/p} + \Big(\int_\Omega \|x(s) - x_N(s)\|^p\, d\mu\Big)^{1/p} \le \Big(\int_\Omega \|x_N(s)\|^p\, d\mu\Big)^{1/p} + \varepsilon < \infty.$$
This proves the lemma. ■

Theorem 22.3.3 $L^p(\Omega; X)$ is complete. Also every Cauchy sequence has a subsequence which converges pointwise.

Proof: If $\{x_n\}$ is Cauchy in $L^p(\Omega; X)$, extract a subsequence $\{x_{n_k}\}$ satisfying
$$\|x_{n_{k+1}} - x_{n_k}\|_p \le 2^{-k}$$
and apply Lemma 22.3.2. The pointwise convergence of this subsequence was established in the proof of this lemma. This proves the theorem because if a subsequence of a Cauchy sequence converges, then the Cauchy sequence must also converge. ■

Observation 22.3.4 If the measure space is Lebesgue measure, then you have continuity of translation in $L^p(\mathbb{R}^n; X)$ in the usual way. More generally, for a Radon measure on a locally compact Hausdorff space $\Omega$, $C_c(\Omega; X)$ is dense in $L^p(\Omega; X)$. Here $C_c(\Omega; X)$ is the space of continuous $X$ valued functions which have compact support in $\Omega$. The proof of this little observation follows immediately from approximating with simple functions and then applying the appropriate considerations to the simple functions.

Clearly Fatou's lemma and the monotone convergence theorem make no sense for functions with values in a Banach space, but the dominated convergence theorem holds in this setting.
Theorem 22.3.5 If $x$ is strongly measurable and $x_n(s) \to x(s)$ a.e. with
$$\|x_n(s)\| \le g(s) \text{ a.e.},$$
where $g \in L^1(\Omega)$, then $x$ is Bochner integrable and
$$\int_\Omega x(s)\, d\mu = \lim_{n\to\infty} \int_\Omega x_n(s)\, d\mu.$$

Proof: $\|x_n(s) - x(s)\| \le 2g(s)$ a.e., so by the usual dominated convergence theorem,
$$0 = \lim_{n\to\infty} \int_\Omega \|x_n(s) - x(s)\|\, d\mu.$$
Also,
$$\int_\Omega \|x_n(s) - x_m(s)\|\, d\mu \le \int_\Omega \|x_n(s) - x(s)\|\, d\mu + \int_\Omega \|x_m(s) - x(s)\|\, d\mu,$$
and so $\{x_n\}$ is a Cauchy sequence in $L^1(\Omega; X)$. Therefore, by Theorem 22.3.3, there exists $y \in L^1(\Omega; X)$ and a subsequence $x_{n'}$ satisfying
$$x_{n'}(s) \to y(s) \text{ a.e. and in } L^1(\Omega; X).$$
But $x(s) = \lim_{n\to\infty} x_n(s)$ a.e., and so $x(s) = y(s)$ a.e. Hence
$$\int_\Omega \|x(s)\|\, d\mu = \int_\Omega \|y(s)\|\, d\mu < \infty,$$
which shows that $x$ is Bochner integrable. Finally, since the integral is linear,
$$\Big\|\int_\Omega x_n(s)\, d\mu - \int_\Omega x(s)\, d\mu\Big\| = \Big\|\int_\Omega (x_n(s) - x(s))\, d\mu\Big\| \le \int_\Omega \|x_n(s) - x(s)\|\, d\mu,$$
and this last integral converges to $0$. ■


One can also prove a version of the Vitali convergence theorem. To do this, here is a more general version of Egoroff's theorem.

Theorem 22.3.6 (Egoroff) Let $(\Omega, \mathcal{F}, \mu)$ be a finite measure space ($\mu(\Omega) < \infty$) and let $f_n$, $f$ be $X$ valued measurable functions, where $X$ is a separable metric space, such that for all $\omega \notin E$ where $\mu(E) = 0$,
$$f_n(\omega) \to f(\omega).$$
Then for every $\varepsilon > 0$, there exists a set
$$F \supseteq E, \quad \mu(F) < \varepsilon,$$
such that $f_n$ converges uniformly to $f$ on $F^C$.

Proof: First suppose $E = \emptyset$ so that convergence is pointwise everywhere. Let
$$E_{km} \equiv \{\omega : d(f_n(\omega), f(\omega)) \ge 1/m \text{ for some } n > k\}.$$
Claim: $\big[\omega : d(f_n(\omega), f(\omega)) \ge \frac{1}{m}\big]$ is measurable.
Proof of claim: Let $\{x_k\}_{k=1}^\infty$ be a countable dense subset of $X$ and let $r$ denote a positive rational number, $r \in \mathbb{Q}^+$. Then
$$\bigcup_{k \in \mathbb{N},\, r \in \mathbb{Q}^+} f_n^{-1}(B(x_k, r)) \cap f^{-1}\Big(B\Big(x_k, \frac{1}{m} - r\Big)\Big) = \Big[d(f, f_n) < \frac{1}{m}\Big]. \tag{22.3.18}$$
Here is why. If $\omega$ is in the set on the left, then $d(f_n(\omega), x_k) < r$ and $d(f(\omega), x_k) < \frac{1}{m} - r$. Therefore,
$$d(f(\omega), f_n(\omega)) < r + \frac{1}{m} - r = \frac{1}{m}.$$
Thus the left side is contained in the right. Now let $\omega$ be in the right side; that is, $d(f_n(\omega), f(\omega)) < \frac{1}{m}$. Choose $2r < \frac{1}{m} - d(f_n(\omega), f(\omega))$ and pick $x_k \in B(f_n(\omega), r)$. Then
$$d(f(\omega), x_k) \le d(f(\omega), f_n(\omega)) + d(f_n(\omega), x_k) < \frac{1}{m} - 2r + r = \frac{1}{m} - r.$$
Thus $\omega \in f_n^{-1}(B(x_k, r)) \cap f^{-1}\big(B\big(x_k, \frac{1}{m} - r\big)\big)$, and so $\omega$ is in the left side. Thus the two sets are equal. Now the set on the left in 22.3.18 is measurable because it is a countable union of measurable sets. This proves the claim, since
$$\Big[\omega : d(f_n(\omega), f(\omega)) \ge \frac{1}{m}\Big]$$
is the complement of this measurable set. Hence $E_{km}$ is measurable because
$$E_{km} = \bigcup_{n=k+1}^\infty \Big[\omega : d(f_n(\omega), f(\omega)) \ge \frac{1}{m}\Big].$$
For fixed $m$, $\bigcap_{k=1}^\infty E_{km} = \emptyset$ because $f_n(\omega)$ converges to $f(\omega)$. Therefore, if $\omega \in \Omega$ there exists $k$ such that if $n > k$, then $d(f_n(\omega), f(\omega)) < \frac{1}{m}$, which means $\omega \notin E_{km}$. Note also that
$$E_{km} \supseteq E_{(k+1)m}.$$
Since $\mu(E_{1m}) < \infty$,
$$0 = \mu\big(\cap_{k=1}^\infty E_{km}\big) = \lim_{k\to\infty} \mu(E_{km}).$$
Let $k(m)$ be chosen such that $\mu(E_{k(m)m}) < \varepsilon 2^{-m}$ and let
$$F = \bigcup_{m=1}^\infty E_{k(m)m}.$$

Then $\mu(F) < \varepsilon$ because
$$\mu(F) \le \sum_{m=1}^\infty \mu\big(E_{k(m)m}\big) < \sum_{m=1}^\infty \varepsilon 2^{-m} = \varepsilon.$$
Now let $\eta > 0$ be given and pick $m_0$ such that $\frac{1}{m_0} < \eta$. If $\omega \in F^C$, then
$$\omega \in \bigcap_{m=1}^\infty E_{k(m)m}^C.$$
Hence $\omega \in E_{k(m_0)m_0}^C$, so
$$d(f(\omega), f_n(\omega)) < 1/m_0 < \eta$$
for all $n > k(m_0)$. This holds for all $\omega \in F^C$, and so $f_n$ converges uniformly to $f$ on $F^C$.
Now if $E \ne \emptyset$, consider $\{\mathcal{X}_{E^C} f_n\}_{n=1}^\infty$. Then $\mathcal{X}_{E^C} f_n$ is measurable and the sequence converges pointwise to $\mathcal{X}_{E^C} f$ everywhere. Therefore, from the first part, there exists a set $F$ of measure less than $\varepsilon$ such that on $F^C$, $\{\mathcal{X}_{E^C} f_n\}$ converges uniformly to $\mathcal{X}_{E^C} f$. Therefore, on $(E \cup F)^C$, $\{f_n\}$ converges uniformly to $f$. ■
Now here is a definition, followed by the Vitali convergence theorem.
Definition 22.3.7 Let $\mathcal{A} \subseteq L^1(\Omega; X)$. Then $\mathcal{A}$ is said to be uniformly integrable if for every $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $\mu(E) < \delta$, it follows
$$\int_E \|f\|_X\, d\mu < \varepsilon$$
for all $f \in \mathcal{A}$. It is bounded if
$$\sup_{f \in \mathcal{A}} \int_\Omega \|f\|_X\, d\mu < \infty.$$

Theorem 22.3.8 Let $(\Omega, \mathcal{F}, \mu)$ be a finite measure space and let $X$ be a separable Banach space. Let $\{f_n\} \subseteq L^1(\Omega; X)$ be uniformly integrable and bounded such that $f_n(\omega) \to f(\omega)$ for each $\omega$. Then $f \in L^1(\Omega; X)$ and
$$\lim_{n\to\infty} \int_\Omega \|f_n - f\|_X\, d\mu = 0.$$

Proof: Let $\varepsilon > 0$ be given. Then by uniform integrability there exists $\delta > 0$ such that if $\mu(E) < \delta$ then
$$\int_E \|f_n\|\, d\mu < \varepsilon/3.$$
By Fatou's lemma the same inequality holds for $f$. Also Fatou's lemma shows $f \in L^1(\Omega; X)$, $f$ being measurable because of Theorem 22.1.7.
By Egoroff's theorem, Theorem 22.3.6, there exists a set $E$ of measure less than $\delta$ such that the convergence of $\{f_n\}$ to $f$ is uniform off $E$. Therefore,
$$\int_\Omega \|f - f_n\|\, d\mu \le \int_E (\|f\|_X + \|f_n\|_X)\, d\mu + \int_{E^C} \|f - f_n\|_X\, d\mu < \frac{2\varepsilon}{3} + \int_{E^C} \frac{\varepsilon}{3(\mu(\Omega) + 1)}\, d\mu < \varepsilon$$
if $n$ is large enough. ■
Note that a convenient way to achieve uniform integrability is to simply say $\{f_n\}$ is bounded in $L^p(\Omega; X)$ for some $p > 1$. This follows from Holder's inequality:
$$\int_E \|f_n\|\, d\mu \le \Big(\int_E 1\, d\mu\Big)^{1/p'} \Big(\int_E \|f_n\|^p\, d\mu\Big)^{1/p} \le \mu(E)^{1/p'}\, \|f_n\|_{L^p(\Omega;X)}.$$

22.4 Measurable Representatives


In this section consider the special case where $X = L^1(B, \nu)$, where $(B, \mathcal{F}, \nu)$ is a $\sigma$ finite measure space, and $x \in L^1(\Omega; X)$. Thus for each $s \in \Omega$, $x(s) \in L^1(B, \nu)$. In general, the map
$$(s, t) \mapsto x(s)(t)$$
will not be product measurable, but one can obtain a measurable representative. This is important because it allows the use of Fubini's theorem on the measurable representative.
By Theorem 22.2.5, there exists a sequence of simple functions $\{x_n\}$ of the form
$$x_n(s) = \sum_{k=1}^m a_k \mathcal{X}_{E_k}(s) \tag{22.4.19}$$
where $a_k \in L^1(B, \nu)$, which satisfy the conditions of Definition 22.2.3 and
$$\|x_n - x_m\|_{L^1(\Omega, L^1(B))} \to 0 \text{ as } m, n \to \infty. \tag{22.4.20}$$



For such a simple function, you can assume the $E_k$ are disjoint, and then
$$\|x_n\|_{L^1(\Omega, L^1(B))} = \sum_{k=1}^m \|a_k\|_{L^1(B)}\, \mu(E_k) = \sum_{k=1}^m \int_B |a_k|\, d\nu\, \mu(E_k) = \int_\Omega \int_B |x_n(s)(t)|\, d\nu\, d\mu.$$
Also, each $x_n$ is product measurable. Thus from 22.4.20,
$$\|x_n - x_m\|_{L^1(\Omega, L^1(B))} = \int_\Omega \int_B |x_n - x_m|\, d\nu\, d\mu,$$
which shows that $\{x_n\}$ is a Cauchy sequence in $L^1(\Omega \times B, \mu \times \nu)$. Then there exists $y \in L^1(\Omega \times B, \mu \times \nu)$ and a subsequence, still called $\{x_n\}$, such that
$$\lim_{n\to\infty} \int_\Omega \int_B |x_n - y|\, d\nu\, d\mu = \lim_{n\to\infty} \int_\Omega \|x_n - y\|_{L^1(B)}\, d\mu = \lim_{n\to\infty} \|x_n - y\|_{L^1(\Omega, L^1(B))} = 0.$$

Now consider 22.4.20. Since $\lim_{m\to\infty} x_m(s) = x(s)$ in $L^1(B)$, it follows from Fatou's lemma that
$$\|x_n - x\|_{L^1(\Omega, L^1(B))} \le \liminf_{m\to\infty} \|x_n - x_m\|_{L^1(\Omega, L^1(B))} < \varepsilon$$
for all $n$ large enough. Hence
$$\lim_{n\to\infty} \|x_n - x\|_{L^1(\Omega, L^1(B))} = 0,$$
and so
$$x(s) = y(s, \cdot) \text{ in } L^1(B) \text{ for a.e. } s.$$
In particular, for a.e. $s$, it follows that
$$x(s)(t) = y(s, t) \text{ for a.e. } t.$$


Now $\int_\Omega x(s)\, d\mu \in X = L^1(B, \nu)$, so it makes sense to ask for $\big(\int_\Omega x(s)\, d\mu\big)(t)$, at least for a.e. $t$. To find what this is, note
$$\Big\|\int_\Omega x_n(s)\, d\mu - \int_\Omega x(s)\, d\mu\Big\|_X \le \int_\Omega \|x_n(s) - x(s)\|_X\, d\mu.$$
Therefore, since the right side converges to $0$,
$$\lim_{n\to\infty} \Big\|\int_\Omega x_n(s)\, d\mu - \int_\Omega x(s)\, d\mu\Big\|_X = \lim_{n\to\infty} \int_B \Big|\Big(\int_\Omega x_n(s)\, d\mu\Big)(t) - \Big(\int_\Omega x(s)\, d\mu\Big)(t)\Big|\, d\nu = 0.$$
But
$$\Big(\int_\Omega x_n(s)\, d\mu\Big)(t) = \int_\Omega x_n(s, t)\, d\mu \text{ for a.e. } t.$$
Therefore
$$\lim_{n\to\infty} \int_B \Big|\int_\Omega x_n(s, t)\, d\mu - \Big(\int_\Omega x(s)\, d\mu\Big)(t)\Big|\, d\nu = 0. \tag{22.4.21}$$
Also, since $x_n \to y$ in $L^1(\Omega \times B)$,
$$0 = \lim_{n\to\infty} \int_B \int_\Omega |x_n(s, t) - y(s, t)|\, d\mu\, d\nu \ge \lim_{n\to\infty} \int_B \Big|\int_\Omega x_n(s, t)\, d\mu - \int_\Omega y(s, t)\, d\mu\Big|\, d\nu. \tag{22.4.22}$$
From 22.4.21 and 22.4.22,
$$\int_\Omega y(s, t)\, d\mu = \Big(\int_\Omega x(s)\, d\mu\Big)(t) \text{ for a.e. } t.$$
This proves the following theorem.

Theorem 22.4.1 Let $X = L^1(B)$ where $(B, \mathcal{F}, \nu)$ is a $\sigma$ finite measure space, and let $x \in L^1(\Omega; X)$. Then there exists a measurable representative $y \in L^1(\Omega \times B)$ such that
$$x(s) = y(s, \cdot) \text{ for a.e. } s \in \Omega, \text{ the equation holding in } L^1(B),$$
and
$$\int_\Omega y(s, t)\, d\mu = \Big(\int_\Omega x(s)\, d\mu\Big)(t) \text{ for a.e. } t.$$
22.5 Vector Measures


There is also a concept of vector measures.

Definition 22.5.1 Let $(\Omega, \mathcal{S})$ be a set and a $\sigma$ algebra of subsets of $\Omega$. A mapping
$$F : \mathcal{S} \to X$$
is said to be a vector measure if
$$F(\cup_{i=1}^\infty E_i) = \sum_{i=1}^\infty F(E_i)$$
whenever $\{E_i\}_{i=1}^\infty$ is a sequence of disjoint elements of $\mathcal{S}$. For $F$ a vector measure,
$$|F|(A) \equiv \sup\Big\{ \sum_{F \in \pi(A)} \|F(F)\| : \pi(A) \text{ is a partition of } A \Big\}.$$

This is the same definition that was given in the case where $F$ has values in $\mathbb{C}$, the only difference being the fact that now $F$ has values in a general Banach space $X$ as the space of values of the vector measure. Recall that a partition of $A$ is a finite set $\{F_1, \cdots, F_m\} \subseteq \mathcal{S}$ such that $\cup_{i=1}^m F_i = A$. The same theorem about $|F|$ proved in the case of complex valued measures holds in this context with the same proof. For completeness, it is included here.
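For a concrete example (not stated in the text at this point, but it follows from Theorem 22.2.4): if $g \in L^1(\Omega; X)$, then $F(E) \equiv \int_E g\, d\mu$ defines a vector measure, since for disjoint $E_i$ the partial sums $\sum_{i=1}^n F(E_i) = \int_{\cup_{i=1}^n E_i} g\, d\mu$ converge to $F(\cup_{i=1}^\infty E_i)$ by the dominated convergence theorem, and for any partition $\pi(A)$,
$$\sum_{F' \in \pi(A)} \|F(F')\| \le \sum_{F' \in \pi(A)} \int_{F'} \|g\|\, d\mu = \int_A \|g\|\, d\mu,$$
so $|F|(A) \le \int_A \|g\|\, d\mu < \infty$. The Radon Nikodym property discussed below asks when every vector measure of finite total variation which is absolutely continuous with respect to $\mu$ arises in this way.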

Theorem 22.5.2 If $|F|(\Omega) < \infty$, then $|F|$ is a measure on $\mathcal{S}$.

Proof: Let E1 and E2 be sets of S such that E1 E2 = and let {Ai1 Aini } =
(Ei ), a partition of Ei which is chosen such that

ni
|F |(Ei ) < ||F (Aij )|| i = 1, 2.
j=1

Consider the sets which are contained in either of (E1 ) or (E2 ) , it follows this
collection of sets is a partition of E1 E2 which is denoted here by (E1 E2 ).
Then by the above inequality and the denition of total variation,

|F |(E1 E2 ) ||F (F )|| > |F |(E1 ) + |F |(E2 ) 2,
F (E1 E2 )

which shows that since > 0 was arbitrary,

|F |(E1 E2 ) |F |(E1 ) + |F |(E2 ). (22.5.23)



Let {Ej }j=1 be a sequence of disjoint sets of S and let E = j=1 Ej . Then by the
denition of total variation there exists a partition of E , (E ) = {A1 , , An }
such that
n
|F |(E ) < ||F (Ai )||.
i=1

Also,
Ai = j=1 Ai Ej

and so by the triangle inequality, ||F (Ai )|| j=1 ||F (Ai Ej )||. Therefore, by
the above,
||F (Ai )||
z }| {

n
|F |(E ) < ||F (Ai Ej )||
i=1 j=1
n
= ||F (Ai Ej )||
j=1 i=1

|F |(Ej )
j=1

n
because {Ai Ej }i=1 is a partition of Ej .
Since > 0 is arbitrary, this shows


|F |(
j=1 Ej ) |F |(Ej ).
j=1
n
Also, 22.5.23 implies that whenever the Ei are disjoint, |F |(nj=1 Ej ) j=1 |F |(Ej ).
Therefore,


n
|F |(Ej ) |F |(
j=1 Ej ) |F |(j=1 Ej )
n
|F |(Ej ).
j=1 j=1

Since n is arbitrary,


|F |(
j=1 Ej ) = |F |(Ej )
j=1

which shows that |F | is a measure as claimed. 

Definition 22.5.3 A Banach space $X$ is said to have the Radon Nikodym property if whenever
$$(\Omega, \mathcal{S}, \mu) \text{ is a finite measure space,}$$
$$F : \mathcal{S} \to X \text{ is a vector measure with } |F|(\Omega) < \infty,$$
$$F \ll \mu,$$
then one may conclude there exists $g \in L^1(\Omega; X)$ such that
$$F(E) = \int_E g(s)\, d\mu$$
for all $E \in \mathcal{S}$.

Some Banach spaces have the Radon Nikodym property and some don't. No attempt is made to give a complete answer to the question of which Banach spaces have this property, but the next theorem gives examples of many spaces which do.
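For example, it is known that reflexive spaces and separable dual spaces have the Radon Nikodym property (the latter is Theorem 22.5.4 below), while $c_0$ and $L^1([0,1], m_1)$ do not; these standard facts, mentioned here only as orientation, can be found in Diestel and Uhl [11].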

Theorem 22.5.4 Suppose $X'$ is a separable dual space. Then $X'$ has the Radon Nikodym property.

Proof: Let F and let |F | () < for F : S X , a vector measure. Pick


x X and consider the map
E F (E) (x)
for E S. This denes a complex measure which is absolutely continuous with
respect to |F |. Therefore, by the Radon Nikodym theorem, there exists fx
L1 (, |F |) such that
F (E) (x) = fx (s) d |F |. (22.5.24)
E

Claim: |fx (s)| ||x|| for |F | a.e. s.


Proof of claim: Consider the closed ball in F, B (0, ||x||) and let B B (p, r)
be an open ball contained in its complement. Let fx1 (B) E S. I want to
argue that |F | (E) = 0 so suppose |F | (E) > 0. then

|F | (E) ||x|| ||F (E)|| ||x|| |F (E) (x)|

and so from 22.5.24,


1
fx (s) d |F | ||x|| . (22.5.25)

|F | (E) E

But on E, |fx (s) p| < r and so



1
fx (s) d |F | p < r
|F | (E)
E

which contradicts 22.5.25 because B (p, r) was given to have empty intersection with
B (0, ||x||). Therefore, |F | (E) = 0 as hoped.
( Now F \)B (0, ||x||) can be covered by
countably many such balls and so |F | F \ B (0, ||x||) = 0.
Denote the exceptional set of measure zero by Nx . By Theorem 22.1.13, X is
separable. Letting D be a dense, countable subset of X, dene

N1 xD Nx .

Thus
|F | (N1 ) = 0.
For any E S, x, y D, and a, b F,

fax+by (s) d |F | = F (E) (ax + by) = aF (E) (x) + bF (E) (y)
E

= (afx (s) + bfy (s)) d |F |. (22.5.26)
E
Since 22.5.26 holds for all E S, it follows

fax+by (s) = afx (s) + bfy (s)

for |F
| a.e. s and x, y D. Let D consist of all nite linear combinations of the
m
form i=1 ai xi where ai is a rational point of F and xi D. If


m
ai xi D,

i=1

the above argument implies



m
fm
i=1 ai xi
(s) = ai fxi (s) a.e.
i=1

is countable, there exists a set, N2 , with


Since D

|F | (N2 ) = 0

such that for s


/ N2 ,

m
fm
i=1 ai xi
(s) = ai fxi (s) (22.5.27)
i=1
m
whenever i=1 ai xi D.
Let
N = N1 N2
and let
x (s) XN C (s) fx (s)
h
for all x D.
Now for x X dene

hx (s) lim x (s) : x D}.


{h
x x

This is well dened because if x and y are elements of D, the above claim and
22.5.27 imply
y (s) = h
(x y ) (s) ||x y ||.
hx (s) h
Using 22.5.27, the dominated convergence theorem may be applied to conclude that
for xn x, with xn D,

hx (s) d |F | = lim x (s) d |F | = lim F (E) (xn ) = F (E) (x). (22.5.28)
h n
E n E n

that for all x, y X and a, b F,


It follows from the density of D

|hx (s)| ||x|| , hax+by (s) = ahx (s) + bhy (s), (22.5.29)

for all s because if s N , both sides of the equation in 22.5.29 equal 0.


Let (s) be given by
(s) (x) = hx (s).
By 22.5.29 it follows that (s) X for each s. Also

(s) (x) = hx (s) L1 ()

so () is weak measurable. Since X is separable, Theorem 22.1.12 implies that


is strongly measurable. Furthermore, by 22.5.29,

|| (s)|| sup | (s) (x)| sup |hx (s)| 1.


||x||1 ||x||1

Therefore,
|| (s)|| d |F | <


so L1 (; X ). By 22.2.6, if E S,
( )
hx (s) d |F | = (s) (x) d |F | = (s) d |F | (x). (22.5.30)
E E E

From 22.5.28 and 22.5.30,


( )
(s) d |F | (x) = F (E) (x)
E

for all x X and therefore,



(s) d |F | = F (E).
E

Finally, since F , |F | also and so there exists k L1 () such that



|F | (E) = k (s) d
E

for all E S, by the Radon Nikodym Theorem. It follows



F (E) = (s) d |F | = (s) k (s) d.
E E

Letting g (s) = (s) k (s), this has proved the theorem.

Corollary 22.5.5 Any separable reflexive Banach space has the Radon Nikodym property.

It is not necessary to assume separability in the above corollary. For the proof of a more general result, consult Vector Measures by Diestel and Uhl, [11].

22.6 The Riesz Representation Theorem


The Riesz representation theorem for the spaces $L^p(\Omega; X)$ holds under certain conditions. The proof follows the proofs given earlier for scalar valued functions.

Definition 22.6.1 If $X$ and $Y$ are two Banach spaces, $X$ is isometric to $Y$ if there exists $\theta \in \mathcal{L}(X, Y)$ such that
$$\|\theta x\|_Y = \|x\|_X.$$
This will be written as $X \cong Y$. The map $\theta$ is called an isometry.

The next theorem says that $L^{p'}(\Omega; X')$ is always isometric to a subspace of $(L^p(\Omega; X))'$ for any Banach space $X$.

Theorem 22.6.2 Let $X$ be any Banach space and let $(\Omega, \mathcal{S}, \mu)$ be a finite measure space. Let $p \ge 1$ and let $1/p + 1/p' = 1$ (if $p = 1$, $p' \equiv \infty$). Then $L^{p'}(\Omega; X')$ is isometric to a subspace of $(L^p(\Omega; X))'$. Also, for $g \in L^{p'}(\Omega; X')$,
$$\sup_{\|f\|_p \le 1} \int_\Omega g(s)(f(s))\, d\mu = \|g\|_{p'}.$$

Proof: First observe that for f Lp (; X) and g Lp (; X ),

s g (s) (f (s))

is a function in L1 (). (To obtain measurability, write f as a limit of simple


functions. Holders inequality then yields the function is in L1 ().) Dene

: Lp (; X ) (Lp (; X))

by
g (f ) g (s) (f (s)) d.

Holders inequality implies
||g|| ||g||p (22.6.31)
and it is also clear that is linear. Next it is required to show

||g|| = ||g||.

This will rst be veried for simple functions. Let



m
g (s) = ci XEi (s)
i=1

where ci X , the Ei are disjoint and

m
i=1 Ei = .

Then ||g|| Lp (). Let > 0 be given. By the scalar Riesz representation
theorem, there exists h Lp () such that ||h||p = 1 and

||g (s)||X h (s) d ||g||Lp (;X ) .

Now let di be chosen such that

ci (di ) ||ci ||X / ||h||L1 ()

and ||di ||X 1. Let



m
f (s) di h (s) XEi (s).
i=1

Thus f Lp (; X) and ||f ||Lp (;X) 1. This follows from



m
p p p
||f ||p = ||di ||X |h (s)| XEi (s) d
i=1
m (
)
p p p
= |h (s) | d ||di ||X |h| d = 1.
i=1 Ei

Also

||g|| |g (f )| = g (s) (f (s)) d

m
( )

||ci ||X / ||h||L1 () h (s) XEi (s) d

i=1


||g (s)||X h (s) d h (s) / ||h||L1 () d

||g||Lp (;X ) 2.
Since was arbitrary,
||g|| ||g|| (22.6.32)
and from 22.6.31 this shows equality holds in 22.6.32 whenever g is a simple function.

In general, let g Lp (; X ) and let gn be a sequence of simple functions
p
converging to g in L (; X ). Then

||g|| = lim ||gn || = lim ||gn || = ||g||.


n n

This proves the theorem and shows is the desired isometry.

Theorem 22.6.3 If $X$ is a Banach space and $X'$ has the Radon Nikodym property, then if $(\Omega, \mathcal{S}, \mu)$ is a finite measure space,
$$(L^p(\Omega; X))' \cong L^{p'}(\Omega; X'),$$
and in fact the mapping $\theta$ of Theorem 22.6.2 is onto.
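As a concrete instance (an illustration, not stated in the text): if $H$ is a separable Hilbert space, then $H$ is reflexive, so $H' \cong H$ is a separable dual space and has the Radon Nikodym property by Theorem 22.5.4; the theorem then gives $(L^2(\Omega; H))' \cong L^2(\Omega; H') \cong L^2(\Omega; H)$, exactly as in the scalar case.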



Proof: Let $l \in (L^p(\Omega; X))'$ and define $F(E) \in X'$ by
$$F(E)(x) \equiv l(\mathcal{X}_E(\cdot)\, x).$$

Lemma 22.6.4 $F$ defined above is a vector measure with values in $X'$ and $|F|(\Omega) < \infty$.

Proof of the lemma: Clearly F (E) is linear. Also

||F (E)|| = sup ||F (E) (x)||


||x||1

1/p
||l|| sup ||XE () x||Lp (;X) ||l|| (E) .
||x||1

Let {Ei }
i=1 be a sequence of disjoint elements of S and let E = n< En .

n n

F (E) (x) F (Ek ) (x) = l (XE () x) l (XEi () x) (22.6.33)

k=1 i=1

n

||l|| XE () x XEi () x
p
i=1 L (;X)
( )1/p

||l|| Ek ||x||.
k>n

Since () < ,
( )1/p

lim Ek =0
n
k>n

and so inequality 22.6.33 shows that




n

lim F (E) F (Ek ) = 0.
n
k=1 X

To show |F | () < , let > 0 be given, let {H1 , , Hn } be a partition of ,


and let ||xi || 1 be chosen in such a way that

F (Hi ) (xi ) > ||F (Hi )|| /n.

Thus
n

n
n

+ ||F (Hi )|| < l (XHi () xi ) ||l|| XHi () xi

i=1 i=1 i=1 Lp (;X)

( )1/p

n
1/p
||l|| XHi (s) d = ||l|| () .
i=1

Since > 0 was arbitrary,



n
1/p
||F (Hi )|| < ||l|| () .
i=1

1/p
Since the partition was arbitrary, this shows |F | () ||l|| () and this proves
the lemma.
Continuing with the proof of Theorem 22.6.3, note that

F .

Since X has the Radon Nikodym property, there exists g L1 (; X ) such that

F (E) = g (s) d.
E

Also, from the denition of F (E) ,


( n )

n
l xi XEi () = l (XEi () xi )
i=1 i=1


n n

= F (Ei ) (xi ) = g (s) (xi ) d. (22.6.34)
i=1 i=1 Ei

It follows from 22.6.34 that whenever h is a simple function,



l (h) = g (s) (h (s)) d. (22.6.35)

Let
Gn {s : ||g (s)||X n}
and let
j : Lp (Gn ; X) Lp (; X)
be given by {
h (s) if s Gn ,
jh (s) =
0 if s
/ Gn .
Thus j is the zero extension o of Gn . Letting h be a simple function in Lp (Gn ; X),

j l (h) = l (jh) = g (s) (h (s)) d. (22.6.36)
Gn


Since the simple functions are dense in Lp (Gn ; X), and g Lp (Gn ; X ), it follows
22.6.36 holds for all h Lp (Gn ; X). By Theorem 22.6.2,

||g||Lp (Gn ;X ) = ||j l||(Lp (Gn ;X)) ||l||(Lp (;X)) .

By the monotone convergence theorem,

||g||Lp (;X ) = lim ||g||Lp (Gn ;X ) ||l||(Lp (;X)) .


n


Therefore g Lp (; X ) and since simple functions are dense in Lp (; X), 22.6.35
holds for all h Lp (; X) . Thus l = g and the theorem is proved because, by
Theorem 22.6.2, ||l|| = ||g|| and the mapping is onto because l was arbitrary. 
As in the scalar case, everything generalizes to the case of $\sigma$ finite measure spaces. The proof is almost identical.

Lemma 22.6.5 Let $(\Omega, \mathcal{S}, \mu)$ be a $\sigma$ finite measure space and let $X$ be a Banach space such that $X'$ has the Radon Nikodym property. Then there exists a measurable function $r$ such that $r(\omega) > 0$ for all $\omega$, $|r(\omega)| < M$ for all $\omega$, and $\int_\Omega r\, d\mu < \infty$. For
$$\Lambda \in (L^p(\Omega; X))', \quad p \ge 1,$$
there exists a unique $h \in L^{p'}(\Omega; X')$ ($L^\infty(\Omega; X')$ if $p = 1$) such that
$$\Lambda f = \int_\Omega h(f)\, d\mu.$$
Also $\|h\| = \|\Lambda\|$ ($\|h\| = \|h\|_{p'}$ if $p > 1$, $\|h\|_\infty$ if $p = 1$). Here
$$\frac{1}{p} + \frac{1}{p'} = 1.$$
Proof: First suppose r exists as described. Also, to save on notation and to
emphasize the similarity with the scalar case, denote the norm in the various spaces
by ||. Dene a new measure e, according to the rule

e (E)
rd. (22.6.37)
E

Thus e is a nite measure on S. Now dene a mapping, : Lp (; X, )


e) by
Lp (; X,
f = r p f.
1

Then
p p1 p p
||f ||Lp (e) = r f rd = ||f ||Lp ()

and so is one to one and in fact preserves norms. I claim that also is onto. To
1
see this, let g Lp (; X, e) and consider the function, r p g. Then

p1 p p p
r g d = |g| rd = |g| de <

1
( 1 )
Thus r p g Lp (; X, ) and r p g = g showing that is onto as claimed. Thus
is one to one, onto, and preserves norms. Consider the diagram below which is
descriptive of the situation in which must be one to one and onto.


p e
h, L (e
) L (ep
) , Lp () ,

Lp (e
) Lp ()

e Lp (e
Then for Lp () , there exists a unique
e = ,
) such that e =
|||| . By the Riesz representation theorem for nite measure spaces, there exists


a unique h Lp (e

) Lp (; X , e
e) which represents in
the
manner described
e
in the Riesz representation theorem. Thus ||h||Lp (e) = = |||| and for all
f Lp () ,
( 1 )
e e (f ) =
(f ) = (f ) h (f ) de
= rh r p f d

1
= r p hf d.

Now
p1 p p p
r h d = |h| rd = ||h||Lp (e) < .
1
e
Thus r p h = ||h||Lp (e) = = |||| and represents in the appropriate
Lp ()
way. If p = 1, then 1/p 0. Now consider the existence of r. Since the measure
space is nite, there exist {n } disjoint, each having positive measure and their
union equals . Then dene

1
r () 2
(n )1 Xn ()
n=1
n
This proves the Lemma.
Theorem 22.6.6 (Riesz representation theorem) Let $(\Omega, \mathcal{S}, \mu)$ be $\sigma$ finite and let $X'$ have the Radon Nikodym property. Then for
$$\Lambda \in (L^p(\Omega; X, \mu))', \quad p \ge 1,$$
there exists a unique $h \in L^q(\Omega, X', \mu)$ ($L^\infty(\Omega, X', \mu)$ if $p = 1$) such that
$$\Lambda f = \int_\Omega h(f)\, d\mu.$$
Also $\|h\| = \|\Lambda\|$ ($\|h\| = \|h\|_q$ if $p > 1$, $\|h\|_\infty$ if $p = 1$). Here
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Proof: The above lemma gives the existence part of the conclusion of the theorem. Uniqueness is done as before. ■

Corollary 22.6.7 If $X'$ is separable, then for $(\Omega, \mathcal{S}, \mu)$ a $\sigma$ finite measure space,
$$(L^p(\Omega; X))' \cong L^{p'}(\Omega; X').$$

Corollary 22.6.8 If $X$ is separable and reflexive, then for $(\Omega, \mathcal{S}, \mu)$ a $\sigma$ finite measure space,
$$(L^p(\Omega; X))' \cong L^{p'}(\Omega; X').$$

Corollary 22.6.9 If $X$ is separable and reflexive and $(\Omega, \mathcal{S}, \mu)$ is a $\sigma$ finite measure space, then if $p \in (1, \infty)$, $L^p(\Omega; X)$ is reflexive.

Proof: This is just like the scalar valued case. ■
Hausdorff Measure

23.1 Definition Of Hausdorff Measures

This chapter is on Hausdorff measures. First I will discuss some outer measures. In all that is done here, $\alpha(n)$ will be the volume of the ball in $\mathbb{R}^n$ which has radius $1$.

Definition 23.1.1 For a set $E$, denote by $r(E)$ the number which is half the diameter of $E$. Thus
$$r(E) \equiv \frac{1}{2} \sup\{|x - y| : x, y \in E\} \equiv \frac{1}{2}\, \mathrm{diam}(E).$$
Let $E \subseteq \mathbb{R}^n$.
$$\mathcal{H}^s_\delta(E) \equiv \inf\Big\{ \sum_{j=1}^\infty \beta(s)(r(C_j))^s : E \subseteq \cup_{j=1}^\infty C_j,\ \mathrm{diam}(C_j) \le \delta \Big\},$$
$$\mathcal{H}^s(E) \equiv \lim_{\delta \to 0} \mathcal{H}^s_\delta(E).$$

In the above definition, $\beta(s)$ is an appropriate positive constant depending on $s$. It will turn out that for $n$ an integer, $\beta(n) = \alpha(n)$ where $\alpha(n)$ is the Lebesgue measure of the unit ball $B(0,1)$, where the usual norm is used to determine this ball.
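As a quick illustration of the definition (not part of the text): if $E = \{x\}$ is a single point and $s > 0$, then for every $\delta > 0$ the single set $C_1 = \{x\}$ covers $E$ and has $r(C_1) = 0$, so $\mathcal{H}^s_\delta(E) = 0$ and hence $\mathcal{H}^s(E) = 0$. By countable subadditivity (Lemma 23.1.2 below), every countable set likewise has $\mathcal{H}^s$ measure zero when $s > 0$.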

Lemma 23.1.2 $\mathcal{H}^s$ and $\mathcal{H}^s_\delta$ are outer measures.

Proof: It is clear that $\mathcal{H}^s_\delta(\emptyset) = 0$ and if $A \subseteq B$, then $\mathcal{H}^s_\delta(A) \le \mathcal{H}^s_\delta(B)$, with similar assertions valid for $\mathcal{H}^s$. Suppose $E = \cup_{i=1}^\infty E_i$ and $\mathcal{H}^s_\delta(E_i) < \infty$ for each $i$. Let $\{C^i_j\}_{j=1}^\infty$ be a covering of $E_i$ with
$$\sum_{j=1}^\infty \beta(s)(r(C^i_j))^s - \varepsilon/2^i < \mathcal{H}^s_\delta(E_i)$$
and $\mathrm{diam}(C^i_j) \le \delta$. Then
$$\mathcal{H}^s_\delta(E) \le \sum_{i=1}^\infty \sum_{j=1}^\infty \beta(s)(r(C^i_j))^s \le \sum_{i=1}^\infty \big(\mathcal{H}^s_\delta(E_i) + \varepsilon/2^i\big) \le \varepsilon + \sum_{i=1}^\infty \mathcal{H}^s_\delta(E_i).$$
It follows that, since $\varepsilon > 0$ is arbitrary,
$$\mathcal{H}^s_\delta(E) \le \sum_{i=1}^\infty \mathcal{H}^s_\delta(E_i),$$
which shows $\mathcal{H}^s_\delta$ is an outer measure. Now notice that $\mathcal{H}^s_\delta(E)$ is increasing as $\delta \to 0$. Picking a sequence $\delta_k$ decreasing to $0$, the monotone convergence theorem implies
$$\mathcal{H}^s(E) \le \sum_{i=1}^\infty \mathcal{H}^s(E_i).\ \blacksquare$$

The outer measure $\mathcal{H}^s$ is called $s$ dimensional Hausdorff measure when restricted to the $\sigma$ algebra of $\mathcal{H}^s$ measurable sets.
Next I will show the $\sigma$ algebra of $\mathcal{H}^s$ measurable sets includes the Borel sets. This is done by the following very interesting condition known as Caratheodory's criterion.

23.1.1 Properties Of Hausdorff Measure


Definition 23.1.3 For two sets $A, B$ in a metric space, we define
$$\mathrm{dist}(A, B) \equiv \inf\{d(x, y) : x \in A,\ y \in B\}.$$

Theorem 23.1.4 Let $\mu$ be an outer measure on the subsets of $(X, d)$, a metric space. If
$$\mu(A \cup B) = \mu(A) + \mu(B)$$
whenever $\mathrm{dist}(A, B) > 0$, then the $\sigma$ algebra of measurable sets contains the Borel sets.

Proof: It suces to show that closed sets are in S, the -algebra of measurable
sets, because then the open sets are also in S and consequently S contains the Borel
sets. Let K be closed and let S be a subset of . Is (S) (S K) + (S \ K)?
It suces to assume (S) < . Let
1
Kn {x : dist(x, K) }
n
Since, x dist (x, K) is continuous, it follows Kn is closed. By the assumption of
the theorem,

(S) ((S K) (S \ Kn )) = (S K) + (S \ Kn ) (23.1.1)



since S K and S \ Kn are a positive distance apart. Now

(S \ Kn ) (S \ K) (S \ Kn ) + ((Kn \ K) S). (23.1.2)

If limn ((Kn \ K) S) = 0 then the theorem will be proved because this limit
along with 23.1.2 implies limn (S \ Kn ) = (S \ K) and then taking a limit
in 23.1.1, (S) (S K) + (S \ K) as desired. Therefore, it suces to establish
this limit.
Since K is closed, a point, x
/ K must be at a positive distance from K and so

Kn \ K =
k=n Kk \ Kk+1 .

Therefore


(S (Kn \ K)) (S (Kk \ Kk+1 )). (23.1.3)
k=n

If


(S (Kk \ Kk+1 )) < , (23.1.4)
k=1

then (S (Kn \ K)) 0 because it is dominated by the tail of a convergent series


so it suces to show 23.1.4.


M
(S (Kk \ Kk+1 )) =
k=1


(S (Kk \ Kk+1 )) + (S (Kk \ Kk+1 )). (23.1.5)
k even, kM k odd, kM

By the construction, the distance between any pair of sets, S (Kk \ Kk+1 ) for
dierent even values of k is positive and the distance between any pair of sets,
S (Kk \ Kk+1 ) for dierent odd values of k is positive. Therefore,

(S (Kk \ Kk+1 )) + (S (Kk \ Kk+1 ))
k even, kM k odd, kM


( S (Kk \ Kk+1 )) + ( S (Kk \ Kk+1 )) 2 (S) <
k even k odd
M
and so for all M, k=1 (S (Kk \ Kk+1 )) 2 (S) showing 23.1.4 
With the above theorem, the following theorem is easy to obtain.

Theorem 23.1.5 The $\sigma$ algebra of $\mathcal{H}^s$ measurable sets contains the Borel sets, and $\mathcal{H}^s$ has the property that for all $E \subseteq \mathbb{R}^n$, there exists a Borel set $F \supseteq E$ such that $\mathcal{H}^s(F) = \mathcal{H}^s(E)$.

Proof: Let dist(A, B) = 2 0 > 0. Is it the case that


Hs (A) + Hs (B) = Hs (A B)?
This is what is needed to use Caratheodorys criterion.
Let {Cj }
j=1 be a covering of A B such that diam(Cj ) < 0 for each j and



Hs (A B) + > (s)(r (Cj ))s.
j=1

Thus
Hs (A B ) + > (s)(r (Cj ))s + (s)(r (Cj ))s
jJ1 jJ2

where
J1 = {j : Cj A = }, J2 = {j : Cj B = }.
Recall dist(A, B) = 2 0 , J1 J2 = . It follows
Hs (A B) + > Hs (A) + Hs (B).
Letting 0, and noting > 0 was arbitrary, yields
Hs (A B) Hs (A) + Hs (B).
Equality holds because Hs is an outer measure. By Caratheodorys criterion, Hs is
a Borel measure.
To verify the second assertion, note rst there is no loss of generality in letting
Hs (E) < . Let
E j=1 Cj , r(Cj ) < ,

and


Hs (E) + > (s)(r (Cj ))s.
j=1

Let
F =
j=1 Cj .

Thus F E and


()s
Hs (E) Hs (F ) (s)(r Cj ) = (s)(r (Cj ))s < + Hs (E).
j=1 j=1

Let k 0 and let F =


k=1 F k . Then F E and

Hsk (E) Hsk (F ) Hsk (F ) k + Hsk (E).


Letting k ,
Hs (E) Hs (F ) Hs (E) 
A measure satisfying the conclusion of Theorem 23.1.5 is called a Borel regular
measure.

23.2 $\mathcal{H}^n$ And $m_n$

Next I will compare $\mathcal{H}^n$ and $m_n$. To do this, recall the following covering theorem, Corollary 10.3.6 on Page 248.

Theorem 23.2.1 Let $E \subseteq \mathbb{R}^n$ and let $\mathcal{F}$ be a collection of balls of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^\infty$, such that $m_n(E \setminus \cup_{j=1}^\infty B_j) = 0$.

In the next lemma, the balls are the usual balls taken with respect to the usual distance in $\mathbb{R}^n$.

Lemma 23.2.2 If $m_n(S) = 0$, then $\mathcal{H}^n(S) = \mathcal{H}^n_\delta(S) = 0$. Also, there exists a constant $k$ such that $\mathcal{H}^n(E) \le k\, m_n(E)$ for all $E$ Borel. Also, if $Q_0 \equiv [0,1)^n$, the unit cube, then $\mathcal{H}^n([0,1)^n) > 0$.

Proof: Suppose rst mn (S) = 0. Without loss of generality, S is bounded.


Then by outer regularity, there exists a bounded open V containing S (and m
) n (V ) <
c c
. For each x S, there exists a ball Bx such that Bx V and > r Bx . By the
{ }
Vitali covering theorem there is a sequence of disjoint balls {Bk } such that B ck
covers S. Then letting (n) be the Lebesgue measure of the unit ball in Rn
( )n
Hn (S) ck = (n) 5n
(n) r B (n) r (Bk )
n
(n)
k k
(n) n (n) n
5 mn (V ) < 5
(n) (n)

Since is arbitrary, this shows Hn (S) = 0 and now it follows Hn (S) = 0.


Letting U be an open set and > 0, consider all balls, B contained in U which
have diameters less than . This is a Vitali covering of U and therefore by Theorem
23.2.1, there exists {Bi } , a sequence of disjoint balls of radii less than contained
in U such that i=1 Bi diers from U by a set of Lebesgue measure zero. Let (n)
be the Lebesgue measure of the unit ball in Rn . Then from what was just shown,


n (n) n
Hn (U ) = Hn (i Bi ) (n) r (Bi ) = (n) r (Bi )
i=1
(n) i=1

(n) (n)
= mn (Bi ) = mn (U ) kmn (U ) .
(n) i=1
(n)

Now letting E be Borel, it follows from the outer regularity of mn there exists
a decreasing sequence of open sets, {Vi } containing E such such that mn (Vi )
mn (E) . Then from the above,

Hn (E) lim Hn (Vi ) lim kmn (Vi ) = kmn (E) .


i i

Since > 0 is arbitrary, it follows that also

Hn (E) kmn (E) .

This proves the rst part of the lemma.


To verify the second part, note that it is obvious Hn and Hn are translation
invariant because diameters of sets do not change when translated. Therefore, if
Hn ([0, 1)n ) = 0, it follows Hn (Rn ) = 0 because Rn is the countable union of
translates of Q0 [0, 1)n . Since each Hn is no larger than Hn , the same must hold
for Hn . Therefore, there exists a sequence of sets, {Ci } each having diameter less
than such that the union of these sets equals Rn but

n
1> (n) r (Ci ) .
i=1

Now let Bi be a ball having radius equal to diam (Ci ) = 2r (Ci ) which contains Ci .
It follows
n (n) 2n n
mn (Bi ) = (n) 2n r (Ci ) = (n) r (Ci )
(n)
which implies


n (n)
1> (n) r (Ci ) = mn (Bi ) = ,
i=1 i=1
(n) 2n

a contradiction. 

Theorem 23.2.3 By choosing $\beta(n)$ properly, one can obtain $\mathcal{H}^n = m_n$ on all Lebesgue measurable sets.

Proof: I will show $\mathcal{H}^n$ is a positive multiple of $m_n$ for any choice of $\beta(n)$. Define
$$k = \frac{m_n(Q_0)}{\mathcal{H}^n(Q_0)}$$
where $Q_0 = [0,1)^n$ is the half open unit cube in $\mathbb{R}^n$. I will show $k\mathcal{H}^n(E) = m_n(E)$ for any Lebesgue measurable set. When this is done, it will follow that by adjusting $\beta(n)$ the multiple can be taken to be $1$.
Let $Q = \prod_{i=1}^n [a_i, a_i + 2^{-k})$ be a half open box where $a_i = l 2^{-k}$. Thus $Q_0$ is the union of $\big(2^k\big)^n$ of these identical half open boxes. By translation invariance of $\mathcal{H}^n$ and $m_n$,
$$\big(2^k\big)^n \mathcal{H}^n(Q) = \mathcal{H}^n(Q_0) = \frac{1}{k} m_n(Q_0) = \frac{1}{k}\big(2^k\big)^n m_n(Q).$$
Therefore, $k\mathcal{H}^n(Q) = m_n(Q)$ for any such half open box and, by translation invariance, for the translation of any such half open box. It follows from Lemma 16.5.2 that $k\mathcal{H}^n(U) = m_n(U)$ for all open sets. It follows immediately, since every compact set is the countable intersection of open sets, that $k\mathcal{H}^n = m_n$ on compact

sets. Therefore, they are also equal on all closed sets because every closed set is
the countable union of compact sets. Now let F be an arbitrary Lebesgue measur-
able set. I will show that F is Hn measurable and that kHn (F ) = mn (F ). Let
Fl = B (0, l) F. Then there exists H a countable union of compact sets and G a
countable intersection of open sets such that

H Fl G (23.2.6)

and mn (G \ H) = 0 which implies by Lemma 23.2.2

mn (G \ H) = kHn (G \ H) = 0. (23.2.7)

To do this, let {Gi } be a decreasing sequence of bounded open sets containing Fl


and let {Hi } be an increasing sequence of compact sets contained in Fl such that

kHn (Gi \ Hi ) = mn (Gi \ Hi ) < 2i

Then letting G = i Gi and H = i Hi this establishes 23.2.6 and 23.2.7. Then by


completeness of Hn it follows Fl is Hn measurable and

kHn (Fl ) = kHn (H) = mn (H) = mn (Fl ) .

Now taking l , it follows F is Hn measurable and kHn (F ) = mn (F ). There-


fore, adjusting (n) it can be assumed the constant, k is 1. 
The exact determination of (n) is more technical.

23.3 Technical Considerations


Let $\alpha(n)$ be the volume of the unit ball in $\mathbb{R}^n$. Thus the volume of $B(0, r)$ in $\mathbb{R}^n$ is $\alpha(n) r^n$, from the change of variables formula. There is a very important and interesting inequality known as the isodiametric inequality, which says that if $A$ is any set in $\mathbb{R}^n$, then
$$m(A) \le \alpha(n)\big(2^{-1}\mathrm{diam}(A)\big)^n = \alpha(n)\, r(A)^n.$$
This inequality may seem obvious at first but it is not really. The reason it is not is that there are sets which are not subsets of any ball having the same diameter as the set. For example, consider an equilateral triangle.

Lemma 23.3.1 Let $f : \mathbb{R}^{n-1} \to [0, \infty)$ be Borel measurable and let
$$S = \{(\mathbf{x}, y) : |y| < f(\mathbf{x})\}.$$
Then $S$ is a Borel set in $\mathbb{R}^n$.

Proof: Set sk be an increasing sequence of Borel measurable functions converg-


ing pointwise to f .

Nk
sk (x) = ckm XEmk (x).

m=1

Let
Sk = N
m=1 Em (cm , cm ).
k k k k

Then (x,y) Sk if and only if f (x) > 0 and |y| < sk (x) f (x). It follows that
Sk Sk+1 and
S = k=1 Sk .

But each Sk is a Borel set and so S is also a Borel set. 


Let Pi be the projection onto

span (e1 , , ei1 , ei+1 , , en )

Rn , ek being the vector having a 1


where the ek are the standard basis vectors in
in the k slot and a 0 elsewhere. Thus Pi x j=i xj ej . Also let
th

APi x {xi : (x1 , , xi , , xn ) A}

APi x x

Pi x span{e1 , , ei1 ei+1 , , en }.

Lemma 23.3.2 Let A Rn be a Borel set. Then Pi x m(APi x ) is a Borel


measurable function dened on Pi (Rn ).
n
Proof: Let K be the system consisting of sets of the form j=1 Aj where Ai
is Borel. Also let G denote those Borel sets of Rn such that if A G then

Pi x m((A Rk )Pi x ) is Borel measurable.

where Rk = (k, k)n . Thus K G. If A G


(( ) )
Pi x m AC Rk Pi x

is Borel measurable because it is of the form


( ) ( )
m (Rk )Pi x m (A Rk )Pi x

and these are Borel measurable functions of Pi x. Also, if {Ai } is a disjoint sequence
of sets in G then
( ) ( )
m (i Ai Rk )Pi x = m (Ai Rk )Pi x
i

and each function of Pi x is Borel measurable. Thus by the lemma on systems,


Lemma 9.2.2, G = B (Rn ) . 

Now let A Rn be Borel. Let Pi be the projection onto


( )
span e1 , , ei1 , ei+1 , , en

and as just described,

APi x = {y R : Pi x + yei A}

Thus for x = (x1 , , xn ),

APi x = {y R : (x1 , , xi1 , y, xi+1 , , xn ) A}.

Since A is Borel, it follows from Lemma 23.3.1 that

Pi x m(APi x )

is a Borel measurable function on Pi Rn = Rn1.

23.3.1 Steiner Symmetrization


Dene
S(A, ei ) {x =Pi x + yei : |y| < 21 m(APi x )}

Lemma 23.3.3 Let A be a Borel subset of Rn . Then S(A, ei ) satises

Pi x + yei S(A, ei ) if and only if Pi x yei S(A, ei ),

S(A, ei ) is a Borel set in Rn,


mn (S(A, ei )) = mn (A), (23.3.8)
diam(S(A, ei )) diam(A). (23.3.9)

Proof : The rst assertion is obvious from the denition. The Borel measur-
ability of S(A, ei ) follows from the denition and Lemmas 23.3.2 and 23.3.1. To
show Formula 23.3.8,
21 m(APi x )
mn (S(A, ei )) = dxi dx1 dxi1 dxi+1 dxn
P i Rn 21 m(APi x )

= m(APi x )dx1 dxi1 dxi+1 dxn = m(A).
P i Rn

Now suppose x1 and x2 S(A, ei )

x1 = Pi x1 + y1 ei , x2 = Pi x2 + y2 ei .

For x A dene
l(x) = sup{y : Pi x+yei A}.
g(x) = inf{y : Pi x+yei A}.

Then it is clear that

l(x1 ) g(x1 ) m(APi x1 ) 2|y1 |, (23.3.10)

l(x2 ) g(x2 ) m(APi x2 ) 2|y2 |. (23.3.11)


Claim: |y1 y2 | |l(x1 ) g(x2 )| or |y1 y2 | |l(x2 ) g(x1 )|.
Proof of Claim: If not,

2|y1 y2 | > |l(x1 ) g(x2 )| + |l(x2 ) g(x1 )|

|l(x1 ) g(x1 ) + l(x2 ) g(x2 )|


= l(x1 ) g(x1 ) + l(x2 ) g(x2 ).
2 |y1 | + 2 |y2 |
by 23.3.10 and 23.3.11 contradicting the triangle inequality.
Now suppose |y1 y2 | |l(x1 ) g(x2 )|. From the claim,

|x1 x2 | = (|Pi x1 Pi x2 |2 + |y1 y2 |2 )1/2


(|Pi x1 Pi x2 |2 + |l(x1 ) g(x2 )|2 )1/2
(|Pi x1 Pi x2 |2 + (|z1 z2 | + 2)2 )1/2

diam(A) + O( )

where z1 and z2 are such that Pi x1 + z1 ei A, Pi x2 + z2 ei A, and

|z1 l(x1 )| < and |z2 g(x2 )| < .

If |y1 y2 | |l(x2 ) g(x1 )|, then we use the same argument but let

|z1 g(x1 )| < and |z2 l(x2 )| < ,

Since x1 , x2 are arbitrary elements of S(A, ei ) and is arbitrary, this proves 23.3.9.

The next lemma says that if A is already symmetric with respect to the j th
direction, then this symmetry is not destroyed by taking S (A, ei ).

Lemma 23.3.4 Suppose A is a Borel set in Rn such that Pj x + ej xj A if and


only if Pj x+(xj )ej A. Then if i = j, Pj x + ej xj S(A, ei ) if and only if
Pj x+(xj )ej S(A, ei ).

Proof : By denition,
Pj x + ej xj S(A, ei )
if and only if
|xi | < 21 m(APi (Pj x+ej xj ) ).
Now
xi APi (Pj x+ej xj )

if and only if
xi APi (Pj x+(xj )ej )

by the assumption on A which says that A is symmetric in the ej direction. Hence

Pj x + ej xj S(A, ei )

if and only if
|xi | < 21 m(APi (Pj x+(xj )ej ) )

if and only if
Pj x+(xj )ej S(A, ei ). 

23.3.2 The Isodiametric Inequality

The next theorem is called the isodiametric inequality. It is the key result used to compare Lebesgue and Hausdorff measures.

Theorem 23.3.5 Let $A$ be any Lebesgue measurable set in $\mathbb{R}^n$. Then
$$m_n(A) \le \alpha(n)(r(A))^n.$$

Proof: Suppose first that $A$ is Borel. Let $A_1 = S(A, e_1)$ and let $A_k = S(A_{k-1}, e_k)$. Then by the preceding lemmas, $A_n$ is a Borel set, $\mathrm{diam}(A_n) \le \mathrm{diam}(A)$, $m_n(A_n) = m_n(A)$, and $A_n$ is symmetric. Thus $\mathbf{x} \in A_n$ if and only if $-\mathbf{x} \in A_n$. It follows that
$$A_n \subseteq \overline{B(0, r(A_n))}.$$
(If $\mathbf{x} \in A_n \setminus \overline{B(0, r(A_n))}$, then $-\mathbf{x} \in A_n \setminus \overline{B(0, r(A_n))}$ and so $\mathrm{diam}(A_n) \ge 2|\mathbf{x}| > \mathrm{diam}(A_n)$.) Therefore,
$$m_n(A_n) \le \alpha(n)(r(A_n))^n \le \alpha(n)(r(A))^n.$$
It remains to establish this inequality for arbitrary measurable sets. Letting $A$ be such a set, let $\{K_k\}$ be an increasing sequence of compact subsets of $A$ such that
$$m(A) = \lim_{k\to\infty} m(K_k).$$
Then
$$m(A) = \lim_{k\to\infty} m(K_k) \le \limsup_{k\to\infty} \alpha(n)(r(K_k))^n \le \alpha(n)(r(A))^n.\ \blacksquare$$
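A numerical illustration of why the inequality is not obvious (a hypothetical sketch, not from the text; it assumes numpy): for the equilateral triangle with side $1$, the diameter is $1$, so the isodiametric bound is $\alpha(2)(1/2)^2 = \pi/4 \approx 0.785$, comfortably above the area $\sqrt{3}/4 \approx 0.433$; yet the smallest ball containing the triangle has radius $1/\sqrt{3} > 1/2$, so the triangle is contained in no ball of the same diameter.

import numpy as np

side = 1.0
area = np.sqrt(3) / 4 * side**2             # Lebesgue measure of the triangle
diam = side                                  # diameter of an equilateral triangle
isodiametric_bound = np.pi * (diam / 2)**2   # alpha(2) * r(A)^2 with alpha(2) = pi
circumradius = side / np.sqrt(3)             # radius of the smallest enclosing ball

print(area, isodiametric_bound, circumradius)
assert area <= isodiametric_bound            # Theorem 23.3.5
assert circumradius > diam / 2               # no ball of radius diam/2 contains the triangle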

23.4 The Proper Value Of $\beta(n)$

I will show that the proper determination of $\beta(n)$ is $\alpha(n)$, the volume of the unit ball. Since $\beta(n)$ has been adjusted such that $k = 1$, $m_n(B(0,1)) = \mathcal{H}^n(B(0,1))$.

There exists a covering of B (0,1) of sets of radii less than , {Ci }i=1 such that
n
Hn (B (0, 1)) + > (n) r (Ci )
i

Then by Theorem 23.3.5, the isodiametric inequality,


n (n) ( )n
Hn (B (0, 1)) + > (n) r (Ci ) = (n) r C i
i
(n) i
(n) ( ) (n)
mn C i mn (B (0, 1))
(n) i (n)
(n) n
= H (B (0, 1))
(n)
Now taking the limit as 0,
(n) n
Hn (B (0, 1)) + H (B (0, 1))
(n)

and since > 0 is arbitrary, this shows (n) (n).


By the Vitali covering theorem, there exists a sequence of disjoint balls, {Bi }
such that B (0, 1) = (i=1 Bi )N where mn (N ) = 0. Then H (N ) = 0 can be con-
n

cluded because H H and Lemma 23.2.2. Using mn (B (0, 1)) = Hn (B (0, 1))
n n

again,

n
Hn (B (0, 1)) = Hn (i Bi ) (n) r (Bi )
i=1

(n) n (n)
= (n) r (Bi ) = mn (Bi )
(n) i=1
(n) i=1
(n) (n) (n) n
= mn (i Bi ) = mn (B (0, 1)) = H (B (0, 1))
(n) (n) (n)

which implies $\beta(n) \le \alpha(n)$, and so the two are equal. This proves that if $\beta(n) = \alpha(n)$, then $\mathcal{H}^n = m_n$ on the measurable sets of $\mathbb{R}^n$.
This gives another way to think of Lebesgue measure, which is a particularly nice way because it is coordinate free, depending only on the notion of distance.
For $s < n$, note that $\mathcal{H}^s$ is not a Radon measure because it will not generally be finite on compact sets. For example, let $n = 2$ and consider $\mathcal{H}^1(L)$ where $L$ is a line segment joining $(0, 0)$ to $(1, 0)$. Then $\mathcal{H}^1(L)$ is no smaller than $\mathcal{H}^1(L)$ when $L$ is considered a subset of $\mathbb{R}^1$, $n = 1$. Thus by what was just shown, $\mathcal{H}^1(L) \ge 1$. Hence $\mathcal{H}^1([0,1] \times [0,1]) = \infty$. The situation is this: $L$ is a one-dimensional object inside $\mathbb{R}^2$ and $\mathcal{H}^1$ is giving a one-dimensional measure of this object. In fact, Hausdorff measures can make such heuristic remarks as these precise. Define the Hausdorff dimension of a set $A$ as
$$\dim(A) = \inf\{s : \mathcal{H}^s(A) = 0\}.$$
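For instance (an illustration consistent with the discussion above, not a claim from the text): covering the segment $L$ by $k$ intervals of length $1/k$ gives $\mathcal{H}^s(L) \le \lim_{k\to\infty} k\, \beta(s)(1/2k)^s = 0$ for every $s > 1$, while $\mathcal{H}^1(L) \ge 1$ was shown above; hence $\dim(L) = 1$, matching the heuristic that $L$ is a one-dimensional object.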

23.4.1 A Formula For $\beta(n)$

What is $\beta(n)$? Recall the gamma function, which makes sense for all $p > 0$:
$$\Gamma(p) \equiv \int_0^\infty e^{-t} t^{p-1}\, dt.$$

Lemma 23.4.1 The following identities hold:
$$p\,\Gamma(p) = \Gamma(p+1),$$
$$\Gamma(p)\Gamma(q) = \Big(\int_0^1 x^{p-1}(1-x)^{q-1}\, dx\Big)\, \Gamma(p+q),$$
$$\Gamma\Big(\frac{1}{2}\Big) = \sqrt{\pi}.$$

Proof: Using integration by parts,
$$\Gamma(p+1) = \int_0^\infty e^{-t} t^p\, dt = -e^{-t} t^p\big|_0^\infty + p \int_0^\infty e^{-t} t^{p-1}\, dt = p\,\Gamma(p).$$
Next,
$$\Gamma(p)\Gamma(q) = \int_0^\infty e^{-t} t^{p-1}\, dt \int_0^\infty e^{-s} s^{q-1}\, ds = \int_0^\infty \int_0^\infty e^{-(t+s)} t^{p-1} s^{q-1}\, dt\, ds$$
$$= \int_0^\infty \int_s^\infty e^{-u} (u-s)^{p-1} s^{q-1}\, du\, ds = \int_0^\infty \int_0^u e^{-u} (u-s)^{p-1} s^{q-1}\, ds\, du$$
$$= \int_0^\infty \int_0^1 e^{-u} (u - ux)^{p-1} (ux)^{q-1} u\, dx\, du = \int_0^\infty \int_0^1 e^{-u} u^{p+q-1} (1-x)^{p-1} x^{q-1}\, dx\, du$$
$$= \Gamma(p+q)\int_0^1 x^{q-1}(1-x)^{p-1}\, dx = \Gamma(p+q)\int_0^1 x^{p-1}(1-x)^{q-1}\, dx.$$
It remains to find $\Gamma\big(\frac{1}{2}\big)$. Substituting $t = u^2$,
$$\Gamma\Big(\frac{1}{2}\Big) = \int_0^\infty e^{-t} t^{-1/2}\, dt = \int_0^\infty e^{-u^2} \frac{1}{u}\, 2u\, du = 2\int_0^\infty e^{-u^2}\, du.$$
Now
$$\Big(\int_0^\infty e^{-x^2}\, dx\Big)^2 = \int_0^\infty e^{-x^2}\, dx \int_0^\infty e^{-y^2}\, dy = \int_0^\infty \int_0^\infty e^{-(x^2+y^2)}\, dx\, dy = \int_0^{\pi/2} \int_0^\infty e^{-r^2} r\, dr\, d\theta = \frac{\pi}{4},$$
and so
$$\Gamma\Big(\frac{1}{2}\Big) = 2\int_0^\infty e^{-u^2}\, du = \sqrt{\pi}.\ \blacksquare$$
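These identities are easy to check numerically (an illustrative sketch, not part of the text; it uses only the Python standard library):

import math

# Gamma(1/2) = sqrt(pi)
assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12

# p * Gamma(p) = Gamma(p + 1), checked for a few values of p
for p in (0.5, 1.0, 2.3, 7.0):
    assert abs(p * math.gamma(p) - math.gamma(p + 1)) < 1e-9 * math.gamma(p + 1)

# Gamma(p)Gamma(q) = B(p, q) Gamma(p + q), with the Beta integral done crudely
def beta_integral(p, q, n=200000):
    h = 1.0 / n
    # midpoint rule; the integrand is bounded here since p, q > 1
    return sum(((k + 0.5) * h)**(p - 1) * (1 - (k + 0.5) * h)**(q - 1) for k in range(n)) * h

p, q = 1.5, 2.5
lhs = math.gamma(p) * math.gamma(q)
rhs = beta_integral(p, q) * math.gamma(p + q)
assert abs(lhs - rhs) < 1e-3 * lhs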
Next let n be a positive integer.

Theorem 23.4.2 $\alpha(n) = \pi^{n/2}(\Gamma(n/2 + 1))^{-1}$, where $\Gamma(s)$ is the gamma function
$$\Gamma(s) = \int_0^\infty e^{-t} t^{s-1}\, dt.$$

Proof: First let $n = 1$.
$$\Gamma\Big(\frac{3}{2}\Big) = \frac{1}{2}\Gamma\Big(\frac{1}{2}\Big) = \frac{\sqrt{\pi}}{2}.$$
Thus
$$\pi^{1/2}(\Gamma(1/2 + 1))^{-1} = \frac{2\sqrt{\pi}}{\sqrt{\pi}} = 2 = \alpha(1),$$
and this shows the theorem is true if $n = 1$.
Assume the theorem is true for $n$ and let $B_{n+1}$ be the unit ball in $\mathbb{R}^{n+1}$. Then by the result in $\mathbb{R}^n$,
$$m_{n+1}(B_{n+1}) = \int_{-1}^1 \alpha(n)(1 - x_{n+1}^2)^{n/2}\, dx_{n+1} = 2\alpha(n)\int_0^1 (1 - t^2)^{n/2}\, dt.$$
Doing an integration by parts and using Lemma 23.4.1, this equals
$$2\alpha(n)\, n \int_0^1 t^2 (1 - t^2)^{(n-2)/2}\, dt = 2\alpha(n)\, n\, \frac{1}{2} \int_0^1 u^{1/2}(1 - u)^{n/2 - 1}\, du$$
$$= n\,\alpha(n) \int_0^1 u^{3/2 - 1}(1 - u)^{n/2 - 1}\, du = n\,\alpha(n)\, \Gamma(3/2)\,\Gamma(n/2)\, (\Gamma((n+3)/2))^{-1}$$
$$= n\, \pi^{n/2}\, (\Gamma(n/2 + 1))^{-1} (\Gamma((n+3)/2))^{-1} \Gamma(3/2)\,\Gamma(n/2)$$
$$= n\, \pi^{n/2}\, ((n/2)\Gamma(n/2))^{-1} (\Gamma((n+1)/2 + 1))^{-1} \Gamma(3/2)\,\Gamma(n/2)$$
$$= 2\, \pi^{n/2}\, \Gamma(3/2)\, (\Gamma((n+1)/2 + 1))^{-1} = \pi^{(n+1)/2}(\Gamma((n+1)/2 + 1))^{-1}.\ \blacksquare$$

From now on, in the definition of Hausdorff measure, it will always be the case that $\beta(s) = \alpha(s)$. As shown above, this is the right thing for $\beta(s)$ to equal if $s$ is a positive integer, because this yields the important result that Hausdorff measure is the same as Lebesgue measure. Note that the formula $\pi^{s/2}(\Gamma(s/2 + 1))^{-1}$ makes sense for any $s \ge 0$.
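As a numerical sanity check of Theorem 23.4.2 (an illustrative sketch, not part of the text; standard library only), the formula reproduces the familiar volumes $\alpha(1) = 2$, $\alpha(2) = \pi$, $\alpha(3) = 4\pi/3$:

import math

def alpha(s):
    # pi^(s/2) / Gamma(s/2 + 1); makes sense for any real s >= 0
    return math.pi**(s / 2) / math.gamma(s / 2 + 1)

assert abs(alpha(0) - 1) < 1e-12                  # a zero dimensional "ball" has measure 1
assert abs(alpha(1) - 2) < 1e-12                  # length of [-1, 1]
assert abs(alpha(2) - math.pi) < 1e-12            # area of the unit disk
assert abs(alpha(3) - 4 * math.pi / 3) < 1e-12    # volume of the unit ball in R^3
print([alpha(n) for n in range(6)])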

23.4.2 Hausdorff Measure And Linear Transformations

Hausdorff measure makes possible a unified development of $n$ dimensional area. As in the case of Lebesgue measure, the first step in this is to understand basic considerations related to linear transformations. Recall that for $L \in \mathcal{L}(\mathbb{R}^k, \mathbb{R}^l)$, $L^*$ is defined by
$$(Lu, v) = (u, L^* v).$$
Also recall Theorem 5.3.4 on Page 119, which is stated here for convenience. This theorem says you can write a linear transformation as the composition of two linear transformations, one which preserves length and the other which distorts: the right polar decomposition. The one which distorts is the one which will have a nontrivial interaction with Hausdorff measure, while the one which preserves lengths does not change Hausdorff measure. These ideas are behind the following theorems and lemmas.
Theorem 23.4.3 Let $F$ be an $m \times n$ matrix where $m \ge n$. Then there exists an $m \times n$ matrix $R$ and an $n \times n$ matrix $U$ such that
$$F = RU, \quad U = U^*,$$
all eigenvalues of $U$ are nonnegative,
$$U^2 = F^*F, \quad R^*R = I,$$
and $|Rx| = |x|$.
Lemma 23.4.4 Let $R \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, $n \le m$, with $R^*R = I$. Then if $A \subseteq \mathbb{R}^n$,
$$\mathcal{H}^n(RA) = \mathcal{H}^n(A).$$
In fact, if $P : \mathbb{R}^n \to \mathbb{R}^m$ satisfies $|Px - Py| = |x - y|$, then
$$\mathcal{H}^n(PA) = \mathcal{H}^n(A).$$
Proof: Note that
$$|R(x - y)|^2 = (R(x - y), R(x - y)) = (R^*R(x - y), x - y) = |x - y|^2.$$
Thus $R$ preserves lengths.


Now let P be an arbitrary mapping which preserves lengths and let A be
bounded, P (A) j=1 Cj , r(Cj ) < , and


Hn (P A) + > (n)(r(Cj ))n .
j=1

Since P preserves lengths, it follows P is one to one on P (Rn ) and P 1 also preserves
lengths on P (Rn ) . Replacing each Cj with Cj (P A),


( )n
Hn (P A) + > (n)r(Cj (P A))n = (n)r P 1 (Cj (P A)) Hn (A).
j=1 j=1

Thus Hn (P A) Hn (A).
Now let A j=1 Cj , diam(Cj ) , and


n
Hn (A) + (n) (r (Cj ))
j=1

Then



n n
Hn (A) + (n) (r (Cj )) = (n) (r (P Cj )) Hn (P A).
j=1 j=1

Hence Hn (P A) = Hn (A). Letting 0 yields the desired conclusion in the


case where A is bounded. For the general case, let Ar = A B (0, r). Then
Hn (P Ar ) = Hn (Ar ). Now let r . 

Lemma 23.4.5 Let $F \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, $n \le m$, and let $F = RU$ where $R$ and $U$ are described in Theorem 5.3.4 on Page 119. Then if $A \subseteq \mathbb{R}^n$ is Lebesgue measurable,
$$\mathcal{H}^n(FA) = \det(U)\, m_n(A).$$

Proof: Using Theorem 5.3.4 on Page 119 and Theorem 23.2.3,
$$\mathcal{H}^n(FA) = \mathcal{H}^n(RUA) = \mathcal{H}^n(UA) = m_n(UA) = \det(U)\, m_n(A).\ \blacksquare$$

Definition 23.4.6 Define $J$ to equal $\det(U)$. Thus
$$J = \det\big((F^*F)^{1/2}\big) = (\det(F^*F))^{1/2}.$$
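A small numerical sketch of this definition (illustrative only, not from the text; it assumes numpy), computing $J$ both as $\det(U)$ from the polar factor and as $(\det(F^*F))^{1/2}$:

import numpy as np

F = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])        # a 3 x 2 matrix, so m = 3 >= n = 2

FtF = F.T @ F                      # F*F, a symmetric positive semidefinite 2 x 2 matrix
# U = (F*F)^(1/2) via the spectral decomposition
w, V = np.linalg.eigh(FtF)
U = V @ np.diag(np.sqrt(w)) @ V.T

J_from_U = np.linalg.det(U)
J_from_FtF = np.sqrt(np.linalg.det(FtF))
print(J_from_U, J_from_FtF)
assert abs(J_from_U - J_from_FtF) < 1e-10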


The Hausdorff Maximal Theorem

The purpose of this appendix is to prove the equivalence between the axiom of choice, the Hausdorff maximal theorem, and the well-ordering principle. The Hausdorff maximal theorem and the well-ordering principle are very useful but a little hard to believe; so, it may be surprising that they are equivalent to the axiom of choice. First it is shown that the axiom of choice implies the Hausdorff maximal theorem, a remarkable theorem about partially ordered sets.
A nonempty set is partially ordered if there exists a partial order $\le$ satisfying
$$x \le x$$
and
$$\text{if } x \le y \text{ and } y \le z \text{ then } x \le z.$$
An example of a partially ordered set is the set of all subsets of a given set, ordered by $\subseteq$. Note that two elements in a partially ordered set may not be related. In other words, just because $x, y$ are in the partially ordered set, it does not follow that either $x \le y$ or $y \le x$. A subset $\mathcal{C}$ of a partially ordered set is called a chain if $x, y \in \mathcal{C}$ implies that either $x \le y$ or $y \le x$. If either $x \le y$ or $y \le x$, then $x$ and $y$ are described as being comparable. A chain is also called a totally ordered set. $\mathcal{C}$ is a maximal chain if, whenever $\widetilde{\mathcal{C}}$ is a chain containing $\mathcal{C}$, it follows the two chains are equal. In other words, $\mathcal{C}$ is a maximal chain if there is no strictly larger chain.
Lemma A.0.7 Let $\mathcal{F}$ be a nonempty partially ordered set with partial order $\leq$.
Then assuming the axiom of choice, there exists a maximal chain in $\mathcal{F}$.

Proof: Let $\mathcal{X}$ be the set of all chains from $\mathcal{F}$. For $C \in \mathcal{X}$, let
$$S_C = \{x \in \mathcal{F} \text{ such that } C \cup \{x\} \text{ is a chain strictly larger than } C\}.$$
If $S_C = \emptyset$ for any $C$, then $C$ is maximal. Thus, assume $S_C \neq \emptyset$ for all $C \in \mathcal{X}$. Let
$f(C) \in S_C$. (This is where the axiom of choice is being used.) Let
$$g(C) = C \cup \{f(C)\}.$$


Thus $g(C) \supsetneq C$ and $g(C) \setminus C = \{f(C)\} = \{\text{a single element of } \mathcal{F}\}$. A subset $\mathcal{T}$ of $\mathcal{X}$
is called a tower if
$$\emptyset \in \mathcal{T},$$
$$C \in \mathcal{T} \text{ implies } g(C) \in \mathcal{T},$$
and if $\mathcal{S} \subseteq \mathcal{T}$ is totally ordered with respect to set inclusion, then
$$\cup \mathcal{S} \in \mathcal{T}.$$
Here $\mathcal{S}$ is a chain with respect to set inclusion whose elements are chains.
Note that $\mathcal{X}$ is a tower. Let $\mathcal{T}_0$ be the intersection of all towers. Thus, $\mathcal{T}_0$ is a
tower, the smallest tower. Are any two sets in $\mathcal{T}_0$ comparable in the sense of set
inclusion so that $\mathcal{T}_0$ is actually a chain? Let $C_0$ be a set of $\mathcal{T}_0$ which is comparable
to every set of $\mathcal{T}_0$. Such sets exist, $\emptyset$ being an example. Let
$$\mathcal{B} \equiv \{D \in \mathcal{T}_0 : D \supsetneq C_0 \text{ and } f(C_0) \notin D\}.$$

The picture represents sets of $\mathcal{B}$. As illustrated in the picture, $D$ is a set of $\mathcal{B}$ when
$D$ is larger than $C_0$ but fails to be comparable to $g(C_0)$. Thus there would be more
than one chain ascending from $C_0$ if $\mathcal{B} \neq \emptyset$, rather like a tree growing upward in
more than one direction from a fork in the trunk. It will be shown this can't take
place for any such $C_0$ by showing $\mathcal{B} = \emptyset$.

[Picture: a set $D$ of $\mathcal{B}$, properly containing $C_0$ but not containing $f(C_0)$.]

This will be accomplished by showing $\widetilde{\mathcal{T}}_0 \equiv \mathcal{T}_0 \setminus \mathcal{B}$ is a tower. Since $\mathcal{T}_0$ is the
smallest tower, this will require that $\widetilde{\mathcal{T}}_0 = \mathcal{T}_0$ and so $\mathcal{B} = \emptyset$.
Claim: $\widetilde{\mathcal{T}}_0$ is a tower and so $\mathcal{B} = \emptyset$.
Proof of the claim: It is clear that $\emptyset \in \widetilde{\mathcal{T}}_0$ because for $\emptyset$ to be contained in $\mathcal{B}$
it would be required to properly contain $C_0$, which is not possible. Suppose $D \in \widetilde{\mathcal{T}}_0$.
The plan is to verify $g(D) \in \widetilde{\mathcal{T}}_0$.
Case 1: $f(D) \in C_0$. If $D \subseteq C_0$, then since both $D$ and $\{f(D)\}$ are contained in
$C_0$, it follows $g(D) \subseteq C_0$ and so $g(D) \notin \mathcal{B}$. On the other hand, if $D \supsetneq C_0$, then since
$D \in \widetilde{\mathcal{T}}_0$, $f(C_0) \in D$ and so $g(D)$ also contains $f(C_0)$, implying $g(D) \notin \mathcal{B}$. These are
the only two cases to consider because $C_0$ is comparable to every set of $\mathcal{T}_0$.
Case 2: $f(D) \notin C_0$. If $D \subsetneq C_0$, it can't be the case that $f(D) \notin C_0$ because if
this were so, $g(D)$, which is a set of $\mathcal{T}_0$, would not compare to $C_0$.

[Picture: a set $D$ properly contained in $C_0$ with $f(D)$ outside $C_0$, so that $g(D)$ would not compare to $C_0$.]

Hence if $f(D) \notin C_0$, then $D \supseteq C_0$. If $D = C_0$, then $f(D) = f(C_0) \in g(D)$ so
$g(D) \notin \mathcal{B}$. Therefore, assume $D \supsetneq C_0$. Then, since $D$ is in $\widetilde{\mathcal{T}}_0$, $f(C_0) \in D$ and so
$f(C_0) \in g(D)$. Therefore, $g(D) \in \widetilde{\mathcal{T}}_0$.
Now suppose $\mathcal{S}$ is a totally ordered subset of $\widetilde{\mathcal{T}}_0$ with respect to set inclusion.
Then if every element of $\mathcal{S}$ is contained in $C_0$, so is $\cup \mathcal{S}$ and so $\cup \mathcal{S} \in \widetilde{\mathcal{T}}_0$. If, on
the other hand, some chain from $\mathcal{S}$, $C$, contains $C_0$ properly, then since $C \notin \mathcal{B}$,
$f(C_0) \in C \subseteq \cup \mathcal{S}$, showing that $\cup \mathcal{S} \notin \mathcal{B}$ also. This has proved $\widetilde{\mathcal{T}}_0$ is a tower and
since $\mathcal{T}_0$ is the smallest tower, it follows $\widetilde{\mathcal{T}}_0 = \mathcal{T}_0$. This has shown roughly that no
splitting into more than one ascending chain can occur at any $C_0$ which is comparable
to every set of $\mathcal{T}_0$. Next it is shown that every element of $\mathcal{T}_0$ has the property that
it is comparable to all other elements of $\mathcal{T}_0$. This is done by showing that these
elements which possess this property form a tower.
Define $\mathcal{T}_1$ to be the set of all elements of $\mathcal{T}_0$ which are comparable to every
element of $\mathcal{T}_0$. (Recall the elements of $\mathcal{T}_0$ are chains from the original partial order.)
Claim: $\mathcal{T}_1$ is a tower.
Proof of the claim: It is clear that $\emptyset \in \mathcal{T}_1$ because $\emptyset$ is a subset of every set.
Suppose $C_0 \in \mathcal{T}_1$. It is necessary to verify that $g(C_0) \in \mathcal{T}_1$. Let $D \in \mathcal{T}_0$ (thus $D \subseteq C_0$
or else $D \supsetneq C_0$) and consider $g(C_0) \equiv C_0 \cup \{f(C_0)\}$. If $D \subseteq C_0$, then $D \subseteq g(C_0)$
so $g(C_0)$ is comparable to $D$. If $D \supsetneq C_0$, then $D \supseteq g(C_0)$ by what was just shown
($\mathcal{B} = \emptyset$). Hence $g(C_0)$ is comparable to $D$. Since $D$ was arbitrary, it follows $g(C_0)$
is comparable to every set of $\mathcal{T}_0$. Now suppose $\mathcal{S}$ is a chain of elements of $\mathcal{T}_1$ and
let $D$ be an element of $\mathcal{T}_0$. If every element in the chain $\mathcal{S}$ is contained in $D$, then
$\cup \mathcal{S}$ is also contained in $D$. On the other hand, if some set, $C$, from $\mathcal{S}$ contains $D$
properly, then $\cup \mathcal{S}$ also contains $D$. Thus $\cup \mathcal{S} \in \mathcal{T}_1$ since it is comparable to every
$D \in \mathcal{T}_0$.
This shows $\mathcal{T}_1$ is a tower and proves therefore, that $\mathcal{T}_0 = \mathcal{T}_1$. Thus every set of
$\mathcal{T}_0$ compares with every other set of $\mathcal{T}_0$, showing $\mathcal{T}_0$ is a chain in addition to being a
tower.
Now $\cup \mathcal{T}_0,\, g(\cup \mathcal{T}_0) \in \mathcal{T}_0$. Hence, because $g(\cup \mathcal{T}_0)$ is an element of $\mathcal{T}_0$, and $\mathcal{T}_0$ is a
chain of these, it follows $g(\cup \mathcal{T}_0) \subseteq \cup \mathcal{T}_0$. Thus
$$\cup \mathcal{T}_0 \supseteq g(\cup \mathcal{T}_0) \supsetneq \cup \mathcal{T}_0,$$
a contradiction. Hence there must exist a maximal chain after all. This proves the
lemma.
If $X$ is a nonempty set, $\leq$ is an order on $X$ if
$$x \leq x,$$
and if $x, y \in X$, then
$$\text{either } x \leq y \text{ or } y \leq x,$$
and
$$\text{if } x \leq y \text{ and } y \leq z \text{ then } x \leq z.$$
The order $\leq$ is called a well order, and $(X, \leq)$ is said to be a well-ordered set, if every nonempty subset
of $X$ has a smallest element. More precisely, if $S \neq \emptyset$ and $S \subseteq X$ then there exists
an $x \in S$ such that $x \leq y$ for all $y \in S$. A familiar example of a well-ordered set is
the natural numbers.
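By way of contrast, here is a small added illustration. The integers with their usual order are not well-ordered, since $\mathbb{Z}$ itself has no smallest element. However, listing the integers as $0, 1, -1, 2, -2, \dots$ and declaring $x \leq' y$ whenever $x$ appears no later than $y$ in this list defines an order under which every nonempty subset of $\mathbb{Z}$ has a smallest element, namely the member appearing earliest in the list, so $(\mathbb{Z}, \leq')$ is well-ordered.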

Lemma A.0.8 The Hausdorff maximal principle implies every nonempty set can
be well-ordered.

Proof: Let $X$ be a nonempty set and let $a \in X$. Then $\{a\}$ is a well-ordered
subset of $X$. Let
$$\mathcal{F} = \{S \subseteq X : \text{there exists a well order for } S\}.$$
Thus $\mathcal{F} \neq \emptyset$. For $S_1, S_2 \in \mathcal{F}$, define $S_1 \leq S_2$ if $S_1 \subseteq S_2$ and there exists a well
order for $S_2$, $\leq_2$, such that
$$(S_2, \leq_2) \text{ is well-ordered}$$
and if
$$y \in S_2 \setminus S_1 \text{ then } x \leq_2 y \text{ for all } x \in S_1,$$
and if $\leq_1$ is the well order of $S_1$ then the two orders are consistent on $S_1$. Then
observe that $\leq$ is a partial order on $\mathcal{F}$. By the Hausdorff maximal principle, let $\mathcal{C}$
be a maximal chain in $\mathcal{F}$ and let
$$X_\infty \equiv \cup \mathcal{C}.$$
Define an order, $\leq$, on $X_\infty$ as follows. If $x, y$ are elements of $X_\infty$, pick $S \in \mathcal{C}$ such
that $x, y$ are both in $S$. Then if $\leq_S$ is the order on $S$, let $x \leq y$ if and only if $x \leq_S y$.
This definition is well defined because of the definition of the order, $\leq$. Now let $U$
be any nonempty subset of $X_\infty$. Then $S \cap U \neq \emptyset$ for some $S \in \mathcal{C}$. Because of the
definition of $\leq$, if $y \in S_2 \setminus S_1$, $S_i \in \mathcal{C}$, then $x \leq y$ for all $x \in S_1$. Thus, if $y \in X_\infty \setminus S$
then $x \leq y$ for all $x \in S$ and so the smallest element of $S \cap U$ exists and is the
smallest element in $U$. Therefore $X_\infty$ is well-ordered. Now suppose there exists
$z \in X \setminus X_\infty$. Define the following order, $\leq_1$, on $X_\infty \cup \{z\}$:
$$x \leq_1 y \text{ if and only if } x \leq y \text{ whenever } x, y \in X_\infty,$$
$$x \leq_1 z \text{ whenever } x \in X_\infty.$$
Then let
$$\widetilde{\mathcal{C}} = \{S : S \in \mathcal{C} \text{ or } S = X_\infty \cup \{z\}\}.$$
Then $\widetilde{\mathcal{C}}$ is a strictly larger chain than $\mathcal{C}$, contradicting maximality of $\mathcal{C}$. Thus
$X \setminus X_\infty = \emptyset$ and this shows $X$ is well-ordered by $\leq$. This proves the lemma.
With these two lemmas the main result follows.

Theorem A.0.9 The following are equivalent.

The axiom of choice

The Hausdorff maximal principle


The well-ordering principle.

Proof: It only remains to prove that the well-ordering principle implies the
axiom of choice. Let $I$ be a nonempty set and let $X_i$ be a nonempty set for each
$i \in I$. Let $X = \cup \{X_i : i \in I\}$ and well order $X$. Let $f(i)$ be the smallest element
of $X_i$. Then
$$f \in \prod_{i \in I} X_i. \;\blacksquare$$

A.1 The Hamel Basis


A Hamel basis is nothing more than the correct generalization of the notion of a
basis for a finite dimensional vector space to vector spaces which are possibly not
of finite dimension.
Definition A.1.1 Let $X$ be a vector space. A Hamel basis is a subset $\Lambda$ of $X$ such
that every vector of $X$ can be written as a finite linear combination of vectors of
$\Lambda$ and the vectors of $\Lambda$ are linearly independent in the sense that if $\{x_1, \dots, x_n\} \subseteq \Lambda$
and
$$\sum_{k=1}^n c_k x_k = 0$$
then each $c_k = 0$.
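For example, as an added illustration, in the vector space of all polynomials with real coefficients the set $\{1, x, x^2, x^3, \dots\}$ is a Hamel basis: every polynomial is by definition a finite linear combination of these monomials, and a finite linear combination of them which equals the zero polynomial must have all coefficients equal to zero. By contrast, $\mathbb{R}$ regarded as a vector space over the field $\mathbb{Q}$ has a Hamel basis by the theorem below, although no explicit example of such a basis is known; its existence rests on the axiom of choice.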
The main result is the following theorem.
Theorem A.1.2 Let X be a nonzero vector space. Then it has a Hamel basis.
Proof: Let $x_1 \in X$ and $x_1 \neq 0$. Let $\mathcal{F}$ denote the collection of subsets of $X$
containing $x_1$ whose vectors are linearly independent as described in Definition A.1.1,
partially ordered by set inclusion. By the Hausdorff maximal theorem, there exists a
maximal chain, $\mathcal{C}$. Let $\Lambda = \cup \mathcal{C}$. Since $\mathcal{C}$ is a chain, it follows that if
$\{x_1, \dots, x_n\} \subseteq \Lambda$ then there exists a single set of $\mathcal{C}$ containing all
these vectors. Therefore, if
$$\sum_{k=1}^n c_k x_k = 0$$
it follows each $c_k = 0$. Thus the vectors of $\Lambda$ are linearly independent. Is every
vector of $X$ a finite linear combination of vectors of $\Lambda$?
Suppose not. Then there exists $z$ which is not equal to a finite linear combination
of vectors of $\Lambda$. Consider $\Lambda \cup \{z\}$. If
$$cz + \sum_{k=1}^m c_k x_k = 0,$$
where the $x_k$ are vectors of $\Lambda$, then if $c \neq 0$ this contradicts the condition that $z$
is not a finite linear combination of vectors of $\Lambda$. Therefore, $c = 0$ and now all the
$c_k$ must equal zero because it was just shown $\Lambda$ is linearly independent. It follows
$\mathcal{C} \cup \{\Lambda \cup \{z\}\}$ is a strictly larger chain than $\mathcal{C}$ and this is a contradiction. Therefore,
$\Lambda$ is a Hamel basis as claimed. This proves the theorem.


Index

C0 (Rn ), 277 Borel regular, 614


Cc , 296 Borel Sets, 168
Ccm , 296 Borel sets, 168
F , 216 bounded, 53
F , 183 bounded continuous linear functions,
G , 443 443
G , 216 bounded variation, 341
G , 183 box topology, 427
Lp (; X), 590 Brouwer xed point theorem, 481
Lp Browders lemma, 272, 511
compactness, 306
systems, 219 Cantor diagonalization procedure, 374
algebra, 167 Cantor function, 186
algebra, 167 Cantor set, 185
Caratheodory extension theorem, 423
a.e., 169 Caratheodorys criterion, 612
adjugate, 109 Caratheodorys procedure, 170
Alexander subbasis theorem, 426 Cauchy Schwarz inequality, 477
algebra of sets, 421 Ceasaro summability, 357
almost everywhere, 169 chain rule, 123
approximate identity, 297 change of variables general case, 257
arithmetic mean, 157 characteristic function, 418
at most countable, 18 Clarkson
axiom of choice, 13, 18, 186 inequalities, 462
axiom of extension, 13 closed graph theorem, 449
axiom of specification, 13 closed set, 375
axiom of unions, 13 closure of a set, 376
cofactor, 107
Banach Alaoglu theorem, 465 compact, 161, 365
Banach space, 287 compact set, 377
Banach Steinhaus theorem, 445 completion of measure space, 213
Bessels inequality, 353, 488, 512 complex measurable functions, 178
Bochner integrable, 585 connected, 67, 379
Borel Cantelli lemma, 208 connected
Borel measurable, 186 arcwise connected, 382
Borel measure, 389, 395 continuous image, 380


locally arcwise, 382 equivalent norms, 59


locally connected, 382 even, 357
connected component, 68, 380 events, 434
connected components, 68, 380 evolution equation
connected set continuous semigroup, 499
real numbers, 381 expectation, 436
continuous function, 377 exponential growth, 270, 327
convergence in measure, 208 extreme value theorem, 66
convex
set, 478 Fatous lemma, 195
convex nite intersection property, 369, 379
functions, 306 Fourier series, 332
convex hull, 58 pointwise convergence, 335
convolution, 324 uniform convergence, 357, 473
countable, 18 Fourier transform L1 , 315
countable basis, 60 Frechet derivative, 121
Cramers rule, 110 Fubinis theorem, 226, 231
function, 16
derivatives, 122 uniformly continuous, 71
determinant, 102 fundamental theorem of calculus
product, 105 general Radon measures, 566
transpose, 103 Gamma function, 306, 624
diameter of a set, 57 gamma function, 210
dierential equations Gateaux derivative, 125
Peano existence theorem, 98, 386 gauge function, 451
Dini derivates, 267 Gauss Jordan method for inverses, 39
Dirichlet kernel, 333 general spherical coordinates, 259
dominated convergence theorem, 202, geometric mean, 157
593 good lambda inequality, 236
Doob Dynkin , 438 gradient, 153
Doob Dynkin lemma, 438 Gram determinant, 485
dot product, 43 Gram matrix, 485
dual space, 454
duality maps, 474 Hahn Banach theorem, 452
Hahn decomposition, 556
Eberlein Smulian theorem, 469 Hahn Jordan decomposition, 556
Egoroff theorem, 179, 594 Hamel basis, 631
eigenvalue, 156 Hardys inequality, 307
elementary matrices, 27 Hausdorff measures, 611
epsilon net, 365, 371 Hausdorff and Lebesgue measure, 622,
equality of mixed partial derivatives, 624
140 Hausdorff dimension, 623
equicontinuous, 90 Hausdorff maximal principle, 426, 451
equivalence class, 20 Hausdorff maximal theorem, 627
equivalence relation, 20 Hausdorff measure

translation invariant, 616 Lagrange multipliers, 151, 152


Hausdorff measures, 611 Laplace expansion, 107
Hausdorff metric, 384 Laplace transform, 266, 270, 280, 327
Hausdorff space, 375, 389 Lebesgue decomposition, 515
Heine Borel, 50 Lebesgue measure, 410
Heine Borel theorem, 163, 368, 384 one dimensional, 183
Hessian matrix, 149 Lebesgue number, 162
higher order derivatives, 131 left inverse, 40
Hilbert space, 477 limit point, 52, 375
Hölder, 94 Lindelöf property, 164
Hölder inequality linear combination, 33, 105
backwards, 458 local maximum, 150
Hölders inequality, 283 local minimum, 150
homotopy, 263 locally compact, 370
locally compact , 377
implicit function theorem, 143 locally compact Hausdorff space, 398
independence lower semicontinuous, 95
algebras, 435 Lusins theorem, 306
independent
events, 434 Marcinkiewicz
random variables, 435 interpolation, 304
independent events, 434 matrix
independent random vectors, 435 left inverse, 109
independent sigma algebras, 435 lower triangular, 110
inner product, 43 right inverse, 109
inner product space, 477 upper triangular, 110
inner regular measure, 395 max. min.theorem, 66
inner regularity, 181 McShanes lemma, 542
Integral
mean square convergence, 346
Riemann and Lebesgue, 234
measurable, 169
interior point, 51
measurable function, 174
inverse, 33
pointwise limits, 175
left, 109
measurable functions
right, 109
Borel, 208
inverse function theorem, 145, 155
measurable representative, 598
inverses and determinants, 108
measurable sets, 169
invertible, 33
measurable space, 167
isodiametric inequality, 617, 621
measure, 167
isometric, 603
measure space, 167
iterated integral, 221
Minkowski functional, 473
James map, 456 Minkowski inequality
Jensens inequality, 306 backwards, 458
Minkowskis inequality, 289
Kolmogorov extension theorem, 429, minor, 107
432 mixed partial derivatives, 138

mollifier, 297 positive linear functional, 402


monotone convergence theorem, 193 positive part, 196
monotone functions power set, 13
differentiable, 268 precompact, 97, 377, 386
multi - index, 76 probability measure, 184
multi-index, 134, 293 probability space, 184
product topology, 377
negative part, 196 projection in Hilbert space, 480
nested interval lemma, 50
nonmeasurable set, 186 Radon measure, 395
nonnegative simple function, 175 Radon Nikodym derivative, 518
normal topological space, 376 Radon Nikodym property, 600
nowhere differentiable functions, 471 Radon Nikodym Theorem
nullity, 41 nite measures, 518
nite measures, 515
odd, 357
random variable
one point compactication, 378
distribution measure, 418
open cover, 161, 377
rank
open mapping theorem, 446
matrix, 41
open set, 51
rational function, 77
open sets, 375
real algebra of functions, 275
operator norm, 443
orthogonal matrix, 114 reflexive Banach Space, 457
orthonormal, 113 reflexive Banach space, 536
orthonormal set, 486 regular
outer measure, 167, 208 measure space, 216
outer regular measure, 395 regular measure, 395
outer regularity, 181 regular topological space, 376
resolvent, 493
parallelogram identity, 511 retraction, 263
partial derivatives, 125 Riemann integrable
partial order, 450 continuous, 384
partially ordered set, 627 Riemann Lebesgue lemma, 335
partition of unity, 401 Riesz map, 482
periodic function, 331 Riesz representation theorem, 605
permutation matrices, 27 C0 (X), 552
pi systems, 219 Hilbert space, 482
pivot column, 36 locally compact Hausdor space,
Plancherel theorem, 319 locally compact Hausdorff space,
pointwise convergence Riesz Representation theorem
sequence, 72 C (X), 549
series, 75 Riesz representation theorem Lp
pointwise limits of measurable func- nite measures, 529
tions, 581 Riesz representation theorem Lp
polar decomposition, 527 nite case, 535, 609
polynomial, 294 Riesz representation theorem for L1

nite measures, 533 Tietze extension theorem, 83


right inverse, 39, 40 topological space, 375
right polar factorization, 117 total variation, 521
row equivalent, 37 totally bounded set, 365
row operations, 27 totally ordered set, 627
row reduced echelon form, 35 translation invariant, 183, 412
Russells paradox, 16 triangle inequality, 45
Tychono theorem, 427
Sards lemma, 254
scalar product, 43 uniform boundedness theorem, 445
Schroder Bernstein theorem, 17 uniform contractions, 141
second derivative test, 150 uniform convergence, 346
second mean value theorem, 338 sequence, 73
semigroup series, 75
adjoint, 504 uniform convexity, 474
contraction uniformly bounded, 371
bounded, 491 uniformly Cauchy
generator, 490 sequence, 73
growth estimate, 490 uniformly continuous, 71
Hille Yosida theorem, 494 uniformly equicontinuous, 370
strongly continuous, 490 uniformly integrable, 209
separable, 60 unitary matrix, 114
separated, 67, 379 upper semicontinuous, 95
separation theorem, 473 Urysohns lemma, 399
sequential weak* compactness, 467
variational inequality, 480
sets, 13
vector measures, 521, 598
Shannon sampling theorem, 359
Vitali
sigma algebra, 167
convergence theorem, 596
sigma nite
Vitali convergence theorem, 307
measure space, 228
Vitali cover, 575
simple function, 175, 577
Vitali covering, 246
Sobolev Space
Vitali covering theorem, 248
embedding theorem, 329 volume of unit ball, 624
equivalent norms, 328
Sobolev spaces, 328 weak convergence, 474
span, 105 weak topology, 464
Steiner symmetrization, 619 weak measurable, 584
Stirlings formula, 210 weak* topology, 464
strict convexity, 474 weakly measurable, 577
strongly measurable, 577 Weierstrass approximation theorem, 275
subbasis, 426 Weierstrass M test, 75, 356
support, 240 well ordered sets, 629
support of a function, 401
Youngs inequality, 283, 556
Taylors formula, 148
