
Bruce K. Driver

Math 280 (Probability Theory) Lecture Notes


February 23, 2007. File: prob.tex

Contents

Part: Homework Problems

-1 Math 280B Homework Problems
   -1.1 Homework 1. Due Monday, January 22, 2007
   -1.2 Homework 2. Due Monday, January 29, 2007
   -1.3 Homework #3. Due Monday, February 5, 2007
   -1.4 Homework #4 Solutions (Due Friday, February 16, 2007)
   -1.5 Homework #5 Solutions (Due Friday, February 23, 2007)
   -1.6 Homework #6 Solutions (Due Friday, March 2, 2007)

0 Math 280A Homework Problems
   0.1 Homework 1. Due Friday, September 29, 2006
   0.2 Homework 2. Due Friday, October 6, 2006
   0.3 Homework 3. Due Friday, October 13, 2006
   0.4 Homework 4. Due Friday, October 20, 2006
   0.5 Homework 5. Due Friday, October 27, 2006
   0.6 Homework 6. Due Friday, November 3, 2006
   0.7 Homework 7. Due Monday, November 13, 2006
       0.7.1 Corrections and comments on Homework 7 (280A)
   0.8 Homework 8. Due Monday, November 27, 2006
   0.9 Homework 9. Due Noon, on Wednesday, December 6, 2006

Part I Background Material

1 Limsups, Liminfs and Extended Limits
2 Basic Probabilistic Notions


Part II Formal Development

3 Preliminaries
   3.1 Set Operations
   3.2 Exercises
   3.3 Algebraic sub-structures of sets
4 Finitely Additive Measures
   4.1 Finitely Additive Measures
   4.2 Examples of Measures
   4.3 Simple Integration
   4.4 Simple Independence and the Weak Law of Large Numbers
   4.5 Constructing Finitely Additive Measures
5 Countably Additive Measures
   5.1 Distribution Function for Probability Measures on (R, B_R)
   5.2 Construction of Premeasures
   5.3 Regularity and Uniqueness Results
   5.4 Construction of Measures
   5.5 Completions of Measure Spaces
   5.6 A Baby Version of Kolmogorov's Extension Theorem
6 Random Variables
   6.1 Measurable Functions
   6.2 Factoring Random Variables
7 Independence
   7.1 π-λ and Monotone Class Theorems
       7.1.1 The Monotone Class Theorem
   7.2 Basic Properties of Independence
       7.2.1 An Example of Ranks
   7.3 Borel-Cantelli Lemmas
   7.4 Kolmogorov and Hewitt-Savage Zero-One Laws
8 Integration Theory
   8.1 A Quick Introduction to Lebesgue Integration Theory
   8.2 Integrals of positive functions
   8.3 Integrals of Complex Valued Functions
   8.4 Densities and Change of Variables Theorems
   8.5 Measurability on Complete Measure Spaces
   8.6 Comparison of the Lebesgue and the Riemann Integral
   8.7 Exercises



       8.7.1 Laws of Large Numbers Exercises
9 Functional Forms of the π-λ Theorem
10 Multiple and Iterated Integrals
   10.1 Iterated Integrals
   10.2 Tonelli's Theorem and Product Measure
   10.3 Fubini's Theorem
   10.4 Fubini's Theorem and Completions
   10.5 Lebesgue Measure on R^d and the Change of Variables Theorem
   10.6 The Polar Decomposition of Lebesgue Measure
   10.7 More Spherical Coordinates
   10.8 Exercises
11 Lp spaces
   11.1 Modes of Convergence
   11.2 Jensen's, Hölder's and Minkowski's Inequalities
   11.3 Completeness of Lp spaces
   11.4 Relationships between different Lp spaces
       11.4.1 Summary
   11.5 Uniform Integrability
   11.6 Exercises
   11.7 Appendix: Convex Functions

Part III Convergence Results

12 Laws of Large Numbers
   12.1 Main Results
   12.2 Examples
       12.2.1 Random Series Examples
       12.2.2 A WLLN Example
   12.3 Strong Law of Large Number Examples
   12.4 More on the Weak Laws of Large Numbers
   12.5 Maximal Inequalities
   12.6 Kolmogorov's Convergence Criteria and the SLLN
   12.7 Strong Law of Large Numbers
   12.8 Necessity Proof of Kolmogorov's Three Series Theorem
13 Weak Convergence Results
   13.1 Total Variation Distance
   13.2 Weak Convergence
   13.3 Derived Weak Convergence



   13.4 Skorohod and the Convergence of Types Theorems
   13.5 Weak Convergence Examples
   13.6 Compactness and Tightness
   13.7 Weak Convergence in Metric Spaces
14 Characteristic Functions (Fourier Transform)
   14.1 Basic Properties of the Characteristic Function
   14.2 Examples
   14.3 Continuity Theorem
   14.4 A Fourier Transform Inversion Formula
   14.5 Exercises
   14.6 Appendix: Bochner's Theorem
   14.7 Appendix: A Multi-dimensional Weierstrass Approximation Theorem
   14.8 Appendix: Some Calculus Estimates
15 Weak Convergence of Random Sums
   15.1 Infinitely Divisible and Stable Symmetric Distributions
       15.1.1 Stable Laws

Part IV Conditional Expectations and Martingales

16 Hilbert Space Basics
17 The Radon-Nikodym Theorem
18 Conditional Expectation
   18.1 Examples
   18.2 Additional Properties of Conditional Expectations
   18.3 Regular Conditional Distributions
   18.4 Appendix: Standard Borel Spaces

References


Part

Homework Problems

-1 Math 280B Homework Problems


-1.1 Homework 1. Due Monday, January 22, 2007

Hand in from p. 114: 4.27
Hand in from p. 196: 6.5, 6.7
Hand in from p. 234-246: 7.12, 7.16, 7.33, 7.36 (assume each Xn is integrable!), 7.42

Hints and comments.

1. For 6.7, observe that Xn is equal in distribution to √n · N(0, 1).
2. For 7.12, let {Un : n = 0, 1, 2, ...} be i.i.d. random variables uniformly distributed on (0, 1), take X0 = U0, and then define Xn inductively so that Xn+1 = Xn · Un+1.
3. For 7.36, use the assumptions to bound E[Xn] in terms of E[Xn : Xn ≤ x]. Then use the two series theorem.

-1.2 Homework 2. Due Monday, January 29, 2007

Resnick Chapter 7: Hand in 7.9, 7.13.
Resnick Chapter 7: look at 7.28. (For 28b, assume E[Xi Xj] ≤ ρ(i - j) for i ≤ j. Also you may find it easier to show S̄n → 0 in L² rather than in the weaker notion of in probability.)
Hand in Exercise 13.2 from these notes.
Resnick Chapter 8: Hand in 8.4a-d, 8.13 (assume Var(Nn) > 0 for all n).

-1.3 Homework #3. Due Monday, February 5, 2007

Resnick Chapter 8: Look at 8.14, 8.20, 8.36.
Resnick Chapter 8: Hand in 8.7, 8.17, 8.31, 8.30* (do 8.31 first), 8.34.
*Ignore the part of the question referring to the moment generating function. Hint: use problem 8.31 and the convergence of types theorem.
Also hand in Exercise 13.3 from these notes.

-1.4 Homework #4 Solutions (Due Friday, February 16, 2007)

Resnick Chapter 9: Look at 9.22, 9.33.
Resnick Chapter 9: Hand in 9.5, 9.6, 9.9 a-e, 9.10.
Also hand in Exercises 14.2, 14.3, and 14.4 from these notes.

-1.5 Homework #5 Solutions (Due Friday, February 23, 2007)

Resnick Chapter 9: Look at 8.
Resnick Chapter 9: Hand in 11, 28, 34 (assume σn² > 0 for all n), 35 (hint: show P[Xn = 0 i.o.] = 0), 38 (Hint: make use of Proposition 7.25.)

-1.6 Homework #6 Solutions (Due Friday, March 2, 2007)

Look at Resnick Chapter 10: 11.
Do the following Exercises from the Lecture Notes: 12.1, 18.1, 18.2, 18.3, 18.4.
Resnick Chapter 10: Hand in 2, 5*, 7, 8**. In part 2b, please explain what convention you are using when the denominator is 0.

*A Poisson process, {N(t)}_{t≥0}, with parameter λ satisfies (by definition): (i) N has independent increments, so that N(s) and N(t) - N(s) are independent; (ii) if 0 ≤ u < v then N(v) - N(u) has the Poisson distribution with parameter λ(v - u).
**Hint: use Exercise 12.1.
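The starred definition of a Poisson process above is easy to experiment with. The following is a minimal simulation sketch, not part of the assignment: it builds the process from i.i.d. exponential interarrival times (a standard construction, assumed here rather than taken from these notes; the seed, rate, and window are illustration choices) and checks that N(v) - N(u) has mean approximately λ(v - u).

```python
# Hedged sketch: simulate a rate-lam Poisson process via exponential
# interarrival times and check E[N(v) - N(u)] is approximately lam*(v - u).
import random

random.seed(2)
lam, u, v = 2.0, 1.0, 3.0

def N(t, arrivals):
    # number of arrival times that have occurred by time t
    return sum(1 for a in arrivals if a <= t)

diffs = []
for _ in range(20_000):
    t, arrivals = 0.0, []
    while t < v:
        t += random.expovariate(lam)   # next interarrival gap
        if t < v:
            arrivals.append(t)
    diffs.append(N(v, arrivals) - N(u, arrivals))

print(sum(diffs) / len(diffs), lam * (v - u))  # both close to 4.0
```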

0 Math 280A Homework Problems


Unless otherwise noted, all problems are from Resnick, S., A Probability Path, Birkhäuser, 1999.

0.1 Homework 1. Due Friday, September 29, 2006

p. 20-27: Look at: 9, 12, 19, 27, 30, 36
p. 20-27: Hand in: 5, 17, 18, 23, 40, 41

0.2 Homework 2. Due Friday, October 6, 2006

p. 63-70: Look at: 18
p. 63-70: Hand in: 3, 6, 7, 11, 13 and the following problem.

Exercise 0.1 (280A-2.1). Referring to the setup in Problem 7 on p. 64 of Resnick, compute the expected number of different coupons collected after buying n boxes of cereal.

0.3 Homework 3. Due Friday, October 13, 2006

Look at from p. 63-70: 5, 14, 19
Look at lecture notes: Exercise 4.4 and read Section 5.5
Hand in from p. 63-70: 16
Hand in lecture note Exercises: 4.1 - 4.3, 5.1 and 5.2.

0.4 Homework 4. Due Friday, October 20, 2006

Look at from p. 85-90: 3, 7, 12, 17, 21
Hand in from p. 85-90: 4, 6, 8, 9, 15
Also hand in the following exercise.

Exercise 0.2 (280A-4.1). Suppose {fn}_{n=1}^∞ is a sequence of random variables on some measurable space. Let B be the set of ω such that fn(ω) is convergent as n → ∞. Show the set B is measurable, i.e. B is in the σ-algebra.

0.5 Homework 5. Due Friday, October 27, 2006

Look at from p. 110-116: 3, 5
Hand in from p. 110-116: 1, 6, 8, 18, 19

0.6 Homework 6. Due Friday, November 3, 2006

Look at from p. 110-116: 3, 5, 28, 29
Look at from p. 155-166: 6, 34
Hand in from p. 110-116: 9, 11, 15, 25
Hand in from p. 155-166: 7
Hand in lecture note Exercise 7.1.

0.7 Homework 7. Due Monday, November 13, 2006

Look at from p. 155-166: 13, 16, 37
Hand in from p. 155-166: 11, 21, 26
Hand in lecture note Exercises: 8.1, 8.2, 8.19, 8.20.

0.7.1 Corrections and comments on Homework 7 (280A)

Problem 21 in Section 5.10 of Resnick should read,

  (d/ds) P(s) = Σ_{k=1}^∞ k p_k s^{k-1} for s ∈ [0, 1].

Note that P(s) = Σ_{k=0}^∞ p_k s^k is well defined and continuous (by DCT) for s ∈ [-1, 1]. So the derivative makes sense to compute for s ∈ (-1, 1) with no qualifications. When s = 1 you should interpret the derivative as the one-sided derivative

  (d/ds)|_{s=1} P(s) := lim_{h↓0} [P(1) - P(1 - h)] / h

and you will need to allow for this limit to be infinite in case Σ_{k=1}^∞ k p_k = ∞. In computing (d/ds)|_{s=1} P(s), you may wish to use the fact (draw a picture or give a calculus proof) that (1 - s^k)/(1 - s) increases to k as s ↑ 1.

Hint for Exercise 8.20: Start by observing that (with μ := E X_k)

  E[(S_n/n - μ)^4] = E[((1/n) Σ_{k=1}^n (X_k - μ))^4]
                   = (1/n^4) Σ_{k,j,l,p=1}^n E[(X_k - μ)(X_j - μ)(X_l - μ)(X_p - μ)].

Then analyze for which groups of indices (k, j, l, p), E[(X_k - μ)(X_j - μ)(X_l - μ)(X_p - μ)] ≠ 0. (A numerical sanity check of this expansion is sketched after Homework 9 below.)

0.8 Homework 8. Due Monday, November 27, 2006

Look at from p. 155-166: 19, 34, 38
Look at from p. 195-201: 19, 24
Hand in from p. 155-166: 14, 18 (Hint: see picture given in class.), 22a-b
Hand in from p. 195-201: 1a, b, d, 12, 13, 33 and 18 (also assume E Xn = 0)*
Hand in lecture note Exercise 9.1.

*For Problem 18, please add the missing assumption that the random variables should have mean zero. (The assertion to prove is false without this assumption.) With this assumption, Var(X) = E[X²]. Also note that Cov(X, Y) = 0 is equivalent to E[XY] = E X · E Y.

0.9 Homework 9. Due Noon, on Wednesday, December 6, 2006

Look at from p. 195-201: 3, 4, 14, 16, 17, 27, 30
Hand in from p. 195-201: 15 (Hint: |a - b| = 2(a - b)⁺ - (a - b).)
Hand in from p. 234-246: 1, 2 (Hint: it is just as easy to prove a.s. convergence), 15
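As promised, here is a numerical sanity check of the fourth-moment expansion in the Hint for Exercise 8.20. This is a hedged Monte Carlo sketch, not part of any assignment: the fair Bernoulli choice of X_k, the sample sizes, and all names are illustration choices. Only index groups (k, j, l, p) that pair up contribute, so E[(S_n/n - μ)^4] should decay like a constant times n^{-2}.

```python
# Sketch (assumes NumPy is available): estimate E[(S_n/n - mu)^4] for fair
# coin tosses and confirm the n**-2 decay predicted by the pairing argument.
import numpy as np

rng = np.random.default_rng(0)
mu = 0.5  # mean of one fair Bernoulli toss

for n in [10, 100, 1000]:
    tosses = rng.integers(0, 2, size=(20_000, n))      # 20,000 copies of S_n
    fourth = np.mean((tosses.mean(axis=1) - mu) ** 4)  # E[(S_n/n - mu)^4] estimate
    print(n, fourth, fourth * n ** 2)                  # last column roughly constant
```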

Part I

Background Material

1 Limsups, Liminfs and Extended Limits


Notation 1.1 The extended real numbers is the set R̄ := R ∪ {±∞}, i.e. it is R with two new points called ∞ and -∞. We use the following conventions: ±∞ · 0 = 0, ±∞ · a = ±∞ if a ∈ R with a > 0, ±∞ · a = ∓∞ if a ∈ R with a < 0, ±∞ + a = ±∞ for any a ∈ R, ∞ + ∞ = ∞ and -∞ - ∞ = -∞, while ∞ - ∞ is not defined. A sequence a_n ∈ R̄ is said to converge to ∞ (-∞) if for all M ∈ R there exists m ∈ N such that a_n ≥ M (a_n ≤ M) for all n ≥ m.

Lemma 1.2. Suppose {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ are convergent sequences in R̄, then:

1. If a_n ≤ b_n for¹ a.a. n, then lim_{n→∞} a_n ≤ lim_{n→∞} b_n.
2. If c ∈ R, then lim_{n→∞} (c a_n) = c lim_{n→∞} a_n.
3. {a_n + b_n}_{n=1}^∞ is convergent and

  lim_{n→∞} (a_n + b_n) = lim_{n→∞} a_n + lim_{n→∞} b_n   (1.1)

provided the right side is not of the form ∞ - ∞.
4. {a_n b_n}_{n=1}^∞ is convergent and

  lim_{n→∞} (a_n b_n) = lim_{n→∞} a_n · lim_{n→∞} b_n   (1.2)

provided the right hand side is not of the form 0 · ∞ or 0 · (-∞).

Before going to the proof consider the simple example where a_n = n and b_n = -n^α with α > 0. Then

  lim_{n→∞} (a_n + b_n) = ∞ if α < 1, 0 if α = 1, and -∞ if α > 1,

while lim_{n→∞} a_n + lim_{n→∞} b_n "=" ∞ - ∞. This shows that the requirement that the right side of Eq. (1.1) is not of the form ∞ - ∞ is necessary in Lemma 1.2. Similarly, considering the examples a_n = n and b_n = n^{-α} with α > 0 shows the necessity for assuming the right hand side of Eq. (1.2) is not of the form ∞ · 0.

Proof. The proofs of items 1. and 2. are left to the reader.

Proof of Eq. (1.1). Let a := lim_{n→∞} a_n and b = lim_{n→∞} b_n. Case 1. Suppose b = ∞, in which case we must assume a > -∞. In this case, for every M > 0, there exists N such that b_n ≥ M and a_n ≥ a - 1 for all n ≥ N and this implies a_n + b_n ≥ M + a - 1 for all n ≥ N. Since M is arbitrary it follows that a_n + b_n → ∞ as n → ∞. The cases where b = -∞ or a = ±∞ are handled similarly. Case 2. If a, b ∈ R, then for every ε > 0 there exists N ∈ N such that |a - a_n| ≤ ε and |b - b_n| ≤ ε for all n ≥ N. Therefore,

  |a + b - (a_n + b_n)| = |a - a_n + b - b_n| ≤ |a - a_n| + |b - b_n| ≤ 2ε

for all n ≥ N. Since ε is arbitrary, it follows that lim_{n→∞} (a_n + b_n) = a + b.

Proof of Eq. (1.2). It will be left to the reader to prove the case where lim a_n and lim b_n exist in R. I will only consider the case where a = lim_{n→∞} a_n ≠ 0 and lim_{n→∞} b_n = ∞ here. Let us also suppose that a > 0 (the case a < 0 is handled similarly) and let ε := min(a/2, 1). Given any M < ∞, there exists N ∈ N such that a_n ≥ ε and b_n ≥ M for all n ≥ N, and for this choice of N, a_n b_n ≥ M ε for all n ≥ N. Since ε > 0 is fixed and M is arbitrary it follows that lim_{n→∞} (a_n b_n) = ∞ as desired. ∎

For any subset Λ ⊂ R̄, let sup Λ and inf Λ denote the least upper bound and greatest lower bound of Λ respectively. The convention being that sup Λ = ∞ if ∞ ∈ Λ or Λ is not bounded from above, and inf Λ = -∞ if -∞ ∈ Λ or Λ is not bounded from below. We will also use the conventions that sup ∅ = -∞ and inf ∅ = +∞.

Notation 1.3 Suppose that {x_n}_{n=1}^∞ ⊂ R̄ is a sequence of numbers. Then

  lim inf_{n→∞} x_n = lim_{n→∞} inf{x_k : k ≥ n} and   (1.3)
  lim sup_{n→∞} x_n = lim_{n→∞} sup{x_k : k ≥ n}.   (1.4)

¹ Here we use "a.a. n" as an abbreviation for "almost all n". So a_n ≤ b_n for a.a. n iff there exists N < ∞ such that a_n ≤ b_n for all n ≥ N.
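Notation 1.3 can be made concrete in a few lines of code. The following sketch (the sequence x_n = (-1)^n (1 + 1/n) and the finite truncation are illustration choices, not from the notes) computes the tail infima and suprema whose limits define lim inf and lim sup; here lim inf x_n = -1 and lim sup x_n = +1 while lim x_n does not exist.

```python
# Sketch: tail infima/suprema for x_n = (-1)**n * (1 + 1/n). A finite tail
# only approximates inf/sup over {x_k : k >= n}, but the monotone behavior
# noted in Remark 1.4 below is already visible.
x = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]

for n in [1, 10, 100, 1000]:
    tail = x[n - 1:]                 # {x_k : k >= n}, truncated at k = 2000
    print(n, min(tail), max(tail))   # inf increases to -1, sup decreases to +1
```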

We will also write lim̲ for lim inf_{n→∞} and lim̄ for lim sup_{n→∞}.

Remark 1.4. Notice that if a_n := inf{x_k : k ≥ n} and b_n := sup{x_k : k ≥ n}, then {a_n} is an increasing sequence while {b_n} is a decreasing sequence. Therefore the limits in Eq. (1.3) and Eq. (1.4) always exist in R̄ and

  lim inf_{n→∞} x_n = sup_n inf{x_k : k ≥ n} and
  lim sup_{n→∞} x_n = inf_n sup{x_k : k ≥ n}.

The following proposition contains some basic properties of liminfs and limsups.

Proposition 1.5. Let {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ be two sequences of real numbers. Then

1. lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n and lim_{n→∞} a_n exists in R̄ iff lim inf_{n→∞} a_n = lim sup_{n→∞} a_n ∈ R̄.
2. There is a subsequence {a_{n_k}}_{k=1}^∞ of {a_n}_{n=1}^∞ such that lim_{k→∞} a_{n_k} = lim sup_{n→∞} a_n. Similarly, there is a subsequence {a_{n_k}}_{k=1}^∞ of {a_n}_{n=1}^∞ such that lim_{k→∞} a_{n_k} = lim inf_{n→∞} a_n.
3.

  lim sup_{n→∞} (a_n + b_n) ≤ lim sup_{n→∞} a_n + lim sup_{n→∞} b_n   (1.5)

whenever the right side of this equation is not of the form ∞ - ∞.
4. If a_n ≥ 0 and b_n ≥ 0 for all n ∈ N, then

  lim sup_{n→∞} (a_n b_n) ≤ lim sup_{n→∞} a_n · lim sup_{n→∞} b_n,   (1.6)

provided the right hand side of (1.6) is not of the form 0 · ∞ or ∞ · 0.

Proof. Item 1. will be proved here leaving the remaining items as an exercise to the reader. Since

  inf{a_k : k ≥ n} ≤ sup{a_k : k ≥ n} for all n,

lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n. Now suppose that lim inf_{n→∞} a_n = lim sup_{n→∞} a_n = a ∈ R. Then for all ε > 0, there is an integer N such that

  a - ε ≤ inf{a_k : k ≥ N} ≤ sup{a_k : k ≥ N} ≤ a + ε,

i.e. a - ε ≤ a_k ≤ a + ε for all k ≥ N. Hence by the definition of the limit, lim_{k→∞} a_k = a. If lim inf_{n→∞} a_n = ∞, then we know for all M ∈ (0, ∞) there is an integer N such that M ≤ inf{a_k : k ≥ N} and hence lim_{n→∞} a_n = ∞. The case where lim sup_{n→∞} a_n = -∞ is handled similarly.

Conversely, suppose that lim_{n→∞} a_n = A ∈ R̄ exists. If A ∈ R, then for every ε > 0 there exists N(ε) ∈ N such that |A - a_n| ≤ ε for all n ≥ N(ε), i.e. A - ε ≤ a_n ≤ A + ε for all n ≥ N(ε). From this we learn that

  A - ε ≤ lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n ≤ A + ε.

Since ε > 0 is arbitrary, it follows that

  A ≤ lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n ≤ A,

i.e. that A = lim inf_{n→∞} a_n = lim sup_{n→∞} a_n. If A = ∞, then for all M > 0 there exists N = N(M) such that a_n ≥ M for all n ≥ N. This shows that lim inf_{n→∞} a_n ≥ M and since M is arbitrary it follows that

  ∞ ≤ lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n.

The proof for the case A = -∞ is analogous to the A = ∞ case. ∎

Proposition 1.6 (Tonelli's theorem for sums). If {a_{kn}}_{k,n=1}^∞ is any sequence of non-negative numbers, then

  Σ_{k=1}^∞ Σ_{n=1}^∞ a_{kn} = Σ_{n=1}^∞ Σ_{k=1}^∞ a_{kn}.

Here we allow for one and hence both sides to be infinite.


Proof. Let

  M := sup{ Σ_{k=1}^K Σ_{n=1}^N a_{kn} : K, N ∈ N } = sup{ Σ_{n=1}^N Σ_{k=1}^K a_{kn} : K, N ∈ N }

and

  L := Σ_{k=1}^∞ Σ_{n=1}^∞ a_{kn}.

Since

  L = Σ_{k=1}^∞ Σ_{n=1}^∞ a_{kn} = lim_{K→∞} Σ_{k=1}^K Σ_{n=1}^∞ a_{kn} = lim_{K→∞} lim_{N→∞} Σ_{k=1}^K Σ_{n=1}^N a_{kn}

and Σ_{k=1}^K Σ_{n=1}^N a_{kn} ≤ M for all K and N, it follows that L ≤ M. Conversely,

  Σ_{k=1}^K Σ_{n=1}^N a_{kn} ≤ Σ_{k=1}^K Σ_{n=1}^∞ a_{kn} ≤ Σ_{k=1}^∞ Σ_{n=1}^∞ a_{kn} = L

and therefore taking the supremum of the left side of this inequality over K and N shows that M ≤ L. Thus we have shown

  Σ_{k=1}^∞ Σ_{n=1}^∞ a_{kn} = M.

By symmetry (or by a similar argument), we also have that Σ_{n=1}^∞ Σ_{k=1}^∞ a_{kn} = M and hence the proof is complete. ∎
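As a quick numerical illustration of Proposition 1.6 (a hedged sketch: the summand a_{kn} = 1/(k² n²) and the truncation K = N = 400 are my choices, not from the notes), both iterated sums of a non-negative array agree, here converging to (π²/6)²:

```python
# Sketch: the two iterated sums of a non-negative double array agree.
# Truncation to 400 x 400 terms only approximates the infinite sums.
import math

K = N = 400
a = [[1.0 / (k * k * n * n) for n in range(1, N + 1)] for k in range(1, K + 1)]

sum_k_then_n = sum(sum(row) for row in a)
sum_n_then_k = sum(sum(a[k][n] for k in range(K)) for n in range(N))
print(sum_k_then_n, sum_n_then_k, (math.pi ** 2 / 6) ** 2)  # all approximately equal
```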

2 Basic Probabilistic Notions


Definition 2.1. A sample space Ω is a set which is to represent all possible outcomes of an experiment.

Example 2.2.

1. The sample space for flipping a coin one time could be taken to be Ω = {0, 1}.
2. The sample space for flipping a coin N times could be taken to be Ω = {0, 1}^N and, for flipping an infinite number of times, Ω = {ω = (ω₁, ω₂, ...) : ωᵢ ∈ {0, 1}} = {0, 1}^ℕ.
3. If we have a roulette wheel with 40 entries, then we might take Ω = {00, 0, 1, 2, ..., 36} for one spin, Ω = {00, 0, 1, 2, ..., 36}^N for N spins, and Ω = {00, 0, 1, 2, ..., 36}^ℕ for an infinite number of spins.
4. If we throw darts at a board of radius R, we may take Ω = D_R := {(x, y) ∈ R² : x² + y² ≤ R} for one throw, Ω = D_R^N for N throws, and Ω = D_R^ℕ for an infinite number of throws.
5. Suppose we release a perfume particle at location x ∈ R³ and follow its motion for all time, 0 ≤ t < ∞. In this case, we might take Ω = {ω ∈ C([0, ∞), R³) : ω(0) = x}.

Definition 2.3. An event A is a subset of Ω.

Example 2.4. Suppose that Ω = {0, 1}^ℕ is the sample space for flipping a coin an infinite number of times. Here ωₙ = 1 represents the fact that a head was thrown on the n-th toss, while ωₙ = 0 represents a tail on the n-th toss.

1. A = {ω : ω₃ = 1} represents the event that the third toss was a head.
2. A = ∪_{i=1}^∞ {ω : ωᵢ = ωᵢ₊₁ = 1} represents the event that (at least) two heads are tossed in a row at some time.
3. A = ∩_{N=1}^∞ ∪_{n≥N} {ω : ωₙ = 1} is the event where there are infinitely many heads tossed in the sequence.
4. A = ∪_{N=1}^∞ ∩_{n≥N} {ω : ωₙ = 1} is the event where heads occurs from some time onwards, i.e. ω ∈ A iff there exists N = N(ω) such that ωₙ = 1 for all n ≥ N.

Ideally we would like to assign a probability P(A) to all events A ⊂ Ω. Given a physical experiment, we think of assigning this probability as follows. Run the experiment many times to get sample points ω(n) ∈ Ω for each n ∈ N, then try to define P(A) by

  P(A) = lim_{N→∞} (1/N) #{1 ≤ k ≤ N : ω(k) ∈ A}.   (2.1)

That is, we think of P(A) as being the long term relative frequency that the event A occurred for the sequence of experiments {ω(k)}_{k=1}^∞.
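Eq. (2.1) is easy to experiment with. The sketch below (the coin encoding, seed, and sample size are illustration choices, not notation fixed by the notes) estimates P(A) for the event A = {ω : ω₃ = 1} of Example 2.4 by long run relative frequency:

```python
# Sketch: relative-frequency estimate of P(A) for A = "third toss is heads".
import random

random.seed(0)
trials, count = 100_000, 0
for _ in range(trials):
    omega = tuple(random.randint(0, 1) for _ in range(5))  # first 5 tosses
    if omega[2] == 1:  # omega_3 = 1, i.e. heads on the third toss
        count += 1
print(count / trials)  # close to P(A) = 1/2
```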


Similarly, suppose that A and B are two events and we wish to know how likely the event A is given that we know that B has occurred. Thus we would like to compute

  P(A|B) = lim_{n→∞} #{k : 1 ≤ k ≤ n and ωₖ ∈ A ∩ B} / #{k : 1 ≤ k ≤ n and ωₖ ∈ B},

which represents the frequency that A occurs given that we know that B has occurred. This may be rewritten as

  P(A|B) = lim_{n→∞} [(1/n) #{k : 1 ≤ k ≤ n and ωₖ ∈ A ∩ B}] / [(1/n) #{k : 1 ≤ k ≤ n and ωₖ ∈ B}] = P(A ∩ B) / P(B).

Definition 2.5. If B is a non-null event, i.e. P(B) > 0, define the conditional probability of A given B by

  P(A|B) := P(A ∩ B) / P(B).

There are of course a number of problems with this definition of P in Eq. (2.1), including the fact that it is not mathematical nor necessarily well defined. For example the limit may not exist. But ignoring these technicalities for the moment, let us point out three key properties that P should have.

1. P(A) ∈ [0, 1] for all A ⊂ Ω.
2. P(∅) = 0 and P(Ω) = 1.
3. Additivity. If A and B are disjoint events, i.e. A ∩ B = AB = ∅, then

  P(A ∪ B) = lim_{N→∞} (1/N) #{1 ≤ k ≤ N : ω(k) ∈ A ∪ B}
           = lim_{N→∞} (1/N) [#{1 ≤ k ≤ N : ω(k) ∈ A} + #{1 ≤ k ≤ N : ω(k) ∈ B}]
           = P(A) + P(B).

Example 2.6. Let us consider the tossing of a coin N times with a fair coin. In this case we would expect that every ω ∈ Ω is equally likely, i.e. P({ω}) = 1/2^N. Assuming this we are then forced to define

  P(A) = (1/2^N) #(A).

Observe that this probability has the following property. Suppose that σ ∈ {0, 1}^k is a given sequence, then

  P({ω : (ω₁, ..., ωₖ) = σ}) = (1/2^N) · 2^{N-k} = 1/2^k.

That is, if we ignore the flips after time k, the resulting probabilities are the same as if we only flipped the coin k times.

Example 2.7. The previous example suggests that if we flip a fair coin an infinite number of times, so that now Ω = {0, 1}^ℕ, then we should define

  P({ω ∈ Ω : (ω₁, ..., ωₖ) = σ}) = 1/2^k   (2.2)

for any k ≥ 1 and σ ∈ {0, 1}^k. Assuming there exists a probability P : 2^Ω → [0, 1] such that Eq. (2.2) holds, we would like to compute, for example, the probability of the event B where an infinite number of heads are tossed. To try to compute this, let

  Aₙ = {ω ∈ Ω : ωₙ = 1} = {heads at time n},
  B_N := ∪_{n≥N} Aₙ = {at least one heads at time N or later},

and

  B = ∩_{N=1}^∞ B_N = {Aₙ i.o.} = ∩_{N=1}^∞ ∪_{n≥N} Aₙ.

Since

  B_N^c = ∩_{n≥N} Aₙ^c ⊂ ∩_{M≥n≥N} Aₙ^c = {ω ∈ Ω : ω_N = ··· = ω_M = 0},

we see that

  P(B_N^c) ≤ 1/2^{M-N} → 0 as M → ∞.

Therefore P(B_N) = 1 for all N. If we assume that P is continuous under taking decreasing limits, we may conclude, using B_N ↓ B, that

  P(B) = lim_{N→∞} P(B_N) = 1.

Without this continuity assumption we would not be able to compute P(B).
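The computation of P(B_N^c) above can also be checked by simulation within a finite horizon M. In the sketch below (the seed, horizon, and trial count are illustration choices) the frequency of seeing no head at any of the times N, ..., M matches the exact value 2^{-(M-N+1)}, which is what forces P(B_N) = 1 in the limit:

```python
# Sketch: P(no head at times N, ..., M) = 2**-(M - N + 1), so P(B_N) -> 1.
import random

random.seed(3)
N_, trials = 3, 200_000
for M in [5, 10, 20]:
    no_head = sum(
        all(random.randint(0, 1) == 0 for _ in range(N_, M + 1))
        for _ in range(trials)
    )
    print(M, no_head / trials, 2.0 ** -(M - N_ + 1))  # the two columns agree
```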

The unfortunate fact is that we can not always assign a desired probability function P(A) for all A ⊂ Ω. For example we have the following negative theorem.

Theorem 2.8 (No-Go Theorem). Let S = {z ∈ C : |z| = 1} be the unit circle. Then there is no probability function P : 2^S → [0, 1] such that P(S) = 1, P is invariant under rotations, and P is continuous under taking decreasing limits.

Proof. We are going to use the fact, proved below, that the continuity condition on P is equivalent to the σ-additivity of P. For z ∈ S and N ⊂ S let

  zN := {zn ∈ S : n ∈ N},   (2.3)

that is to say e^{iθ}N is the set N rotated counterclockwise by angle θ. By assumption, we are supposing that

  P(zN) = P(N) for all z ∈ S and N ⊂ S.   (2.4)

Let

  R := {z = e^{i2πt} : t ∈ Q} = {z = e^{i2πt} : t ∈ [0, 1) ∩ Q},

a countable subgroup of S. As above, R acts on S by rotations and divides S up into equivalence classes, where z, w ∈ S are equivalent if z = rw for some r ∈ R. Choose (using the axiom of choice) one representative point n from each of these equivalence classes and let N ⊂ S be the set of these representative points. Then every point z ∈ S may be uniquely written as z = nr with n ∈ N and r ∈ R. That is to say

  S = ∑_{r∈R} (rN)   (2.5)

where ∑_α A_α is used to denote the union of pairwise disjoint sets {A_α}. By Eqs. (2.4) and (2.5),

  1 = P(S) = Σ_{r∈R} P(rN) = Σ_{r∈R} P(N).   (2.6)

We have thus arrived at a contradiction, since the right side of Eq. (2.6) is either equal to 0 or to ∞ depending on whether P(N) = 0 or P(N) > 0. ∎

To avoid this problem, we are going to have to relinquish the idea that P should necessarily be defined on all of 2^Ω. So we are going to only define P on particular subsets B ⊂ 2^Ω. We will develop this below.

Part II

Formal Development

3 Preliminaries
3.1 Set Operations
Let N denote the positive integers, N₀ := N ∪ {0} be the non-negative integers and Z = N₀ ∪ (-N) the positive and negative integers including 0, Q the rational numbers, R the real numbers, and C the complex numbers. We will also use F to stand for either of the fields R or C.

Notation 3.1 Given two sets X and Y, let Y^X denote the collection of all functions f : X → Y. If X = N, we will say that f ∈ Y^N is a sequence with values in Y and often write fₙ for f(n) and express f as {fₙ}_{n=1}^∞. If X = {1, 2, ..., N}, we will write Y^N in place of Y^{1,2,...,N} and denote f ∈ Y^N by f = (f₁, f₂, ..., f_N) where fₙ = f(n).

Notation 3.2 More generally if {X_α : α ∈ A} is a collection of non-empty sets, let X_A = ∏_{α∈A} X_α and π_α : X_A → X_α be the canonical projection map defined by π_α(x) = x_α. If X_α = X for some fixed space X, then we will write ∏_{α∈A} X_α as X^A rather than X_A. Recall that an element x ∈ X^A is a choice function, i.e. an assignment x_α := x(α) ∈ X_α for each α ∈ A. The axiom of choice states that X_A ≠ ∅ provided that X_α ≠ ∅ for each α ∈ A.

Notation 3.3 Given a set X, let 2^X denote the power set of X, the collection of all subsets of X including the empty set. The reason for writing the power set of X as 2^X is that if we think of 2 as meaning {0, 1}, then an element a ∈ 2^X = {0, 1}^X is completely determined by the set A := {x ∈ X : a(x) = 1} ⊂ X. In this way elements in {0, 1}^X are in one to one correspondence with subsets of X.

For A ∈ 2^X let A^c := X \ A = {x ∈ X : x ∉ A} and more generally if A, B ⊂ X let B \ A := {x ∈ B : x ∉ A} = B ∩ A^c. We also define the symmetric difference of A and B by A △ B := (B \ A) ∪ (A \ B). As usual if {A_α}_{α∈I} is an indexed collection of subsets of X we define the union and the intersection of this collection by

  ∪_{α∈I} A_α := {x ∈ X : there exists α ∈ I with x ∈ A_α} and
  ∩_{α∈I} A_α := {x ∈ X : x ∈ A_α for all α ∈ I}.

Notation 3.4 We will also write ∑_{α∈I} A_α for ∪_{α∈I} A_α in the case that {A_α}_{α∈I} are pairwise disjoint, i.e. A_α ∩ A_β = ∅ if α ≠ β.

Notice that ∪ is closely related to "there exists" and ∩ is closely related to "for all". For example let {Aₙ}_{n=1}^∞ be a sequence of subsets from X and define

  inf_{k≥n} A_k := ∩_{k≥n} A_k,
  sup_{k≥n} A_k := ∪_{k≥n} A_k,
  lim sup_{n→∞} Aₙ := {Aₙ i.o.} := {x ∈ X : #{n : x ∈ Aₙ} = ∞}

and

  lim inf_{n→∞} Aₙ := {Aₙ a.a.} := {x ∈ X : x ∈ Aₙ for all n sufficiently large}.

(One should read {Aₙ i.o.} as "Aₙ infinitely often" and {Aₙ a.a.} as "Aₙ almost always".) Then x ∈ {Aₙ i.o.} iff for all N ∈ N there exists n ≥ N such that x ∈ Aₙ, and this may be expressed as

  {Aₙ i.o.} = ∩_{N=1}^∞ ∪_{n≥N} Aₙ.

Similarly, x ∈ {Aₙ a.a.} iff there exists N ∈ N such that x ∈ Aₙ for all n ≥ N, which may be written as

  {Aₙ a.a.} = ∪_{N=1}^∞ ∩_{n≥N} Aₙ.
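These two formulas can be exercised on a finite example. In the sketch below (the periodic sets Aₙ = {x : x ≤ n mod 4} and the truncation at 100 sets are my choices, and a finite truncation only approximates the infinite intersections and unions), {Aₙ i.o.} comes out as {0, 1, 2, 3} while {Aₙ a.a.} is {0}:

```python
# Sketch: truncated versions of {A_n i.o.} = cap_N cup_{n>=N} A_n and
# {A_n a.a.} = cup_N cap_{n>=N} A_n for the periodic sets A_n below.
from functools import reduce

A = [frozenset(x for x in range(10) if x <= n % 4) for n in range(1, 101)]

def union(sets):
    return reduce(frozenset.union, sets, frozenset())

def inter(sets):
    return reduce(frozenset.intersection, sets)

i_o = inter(union(A[N:]) for N in range(50))  # -> {0, 1, 2, 3}
a_a = union(inter(A[N:]) for N in range(50))  # -> {0}
print(sorted(i_o), sorted(a_a))
```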


Definition 3.5. Given a set A ⊂ X, let

  1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A

be the characteristic function of A.

Lemma 3.6. We have:

1. {Aₙ i.o.}^c = {Aₙ^c a.a.},
2. lim sup_{n→∞} Aₙ = {x ∈ X : Σ_{n=1}^∞ 1_{Aₙ}(x) = ∞},
3. lim inf_{n→∞} Aₙ = {x ∈ X : Σ_{n=1}^∞ 1_{Aₙ^c}(x) < ∞},
4. sup_{k≥n} 1_{A_k}(x) = 1_{∪_{k≥n} A_k} = 1_{sup_{k≥n} A_k},
5. inf_{k≥n} 1_{A_k}(x) = 1_{∩_{k≥n} A_k} = 1_{inf_{k≥n} A_k},
6. 1_{lim sup_{n→∞} Aₙ} = lim sup_{n→∞} 1_{Aₙ}, and
7. 1_{lim inf_{n→∞} Aₙ} = lim inf_{n→∞} 1_{Aₙ}.

Definition 3.7. A set X is said to be countable if it is empty or there is an injective function f : X → N; otherwise X is said to be uncountable.

Lemma 3.8 (Basic Properties of Countable Sets).

1. If A ⊂ X is a subset of a countable set X then A is countable.
2. Any infinite subset Λ ⊂ N is in one to one correspondence with N.
3. A non-empty set X is countable iff there exists a surjective map g : N → X.
4. If X and Y are countable then X × Y is countable.
5. Suppose for each m ∈ N that Aₘ is a countable subset of a set X; then A = ∪_{m=1}^∞ Aₘ is countable. In short, the countable union of countable sets is still countable.
6. If X is an infinite set and Y is a set with at least two elements, then Y^X is uncountable. In particular 2^X is uncountable for any infinite set X.

Proof.

1. If f : X → N is an injective map then so is the restriction, f|_A, of f to the subset A.
2. Let f(1) = min Λ and define f inductively by f(n + 1) = min(Λ \ {f(1), ..., f(n)}). Since Λ is infinite the process continues indefinitely. The function f : N → Λ defined this way is a bijection.
3. If g : N → X is a surjective map, let f(x) = min g^{-1}({x}) = min{n ∈ N : g(n) = x}. Then f : X → N is injective, which combined with item 2. (taking Λ = f(X)) shows X is countable. Conversely if f : X → N is injective let x₀ ∈ X be a fixed point and define g : N → X by g(n) = f^{-1}(n) for n ∈ f(X) and g(n) = x₀ otherwise.
4. Let us first construct a bijection, h, from N to N × N. To do this put the elements of N × N into an array of the form

  (1,1) (1,2) (1,3) ...
  (2,1) (2,2) (2,3) ...
  (3,1) (3,2) (3,3) ...
   ...

and then count these elements by counting the sets {(i, j) : i + j = k} one at a time. For example let h(1) = (1,1), h(2) = (2,1), h(3) = (1,2), h(4) = (3,1), h(5) = (2,2), h(6) = (1,3) and so on. If f : N → X and g : N → Y are surjective functions, then the function (f × g) ∘ h : N → X × Y is surjective where (f × g)(m, n) := (f(m), g(n)) for all (m, n) ∈ N × N.
5. If A = ∅ then A is countable by definition, so we may assume A ≠ ∅. Without loss of generality we may assume A₁ ≠ ∅ and, by replacing Aₘ by A₁ if necessary, we may also assume Aₘ ≠ ∅ for all m. For each m ∈ N let aₘ : N → Aₘ be a surjective function and then define f : N × N → ∪_{m=1}^∞ Aₘ by f(m, n) := aₘ(n). The function f is surjective and hence so is the composition f ∘ h : N → ∪_{m=1}^∞ Aₘ, where h : N → N × N is the bijection defined above.
6. Let us begin by showing 2^N = {0, 1}^N is uncountable. For sake of contradiction suppose f : N → {0, 1}^N is a surjection and write f(n) as (f₁(n), f₂(n), f₃(n), ...). Now define a ∈ {0, 1}^N by aₙ := 1 - fₙ(n). By construction fₙ(n) ≠ aₙ for all n and so a ∉ f(N). This contradicts the assumption that f is surjective and shows 2^N is uncountable. For the general case, since Y₀^X ⊂ Y^X for any subset Y₀ ⊂ Y, if Y₀^X is uncountable then so is Y^X. In this way we may assume Y₀ is a two point set which may as well be Y₀ = {0, 1}. Moreover, since X is an infinite set we may find an injective map x : N → X and use this to set up an injection, i : 2^N → 2^X, by setting i(A) := {xₙ : n ∈ A} ⊂ X for all A ⊂ N. If 2^X were countable we could find an injective map f : 2^X → N, in which case f ∘ i : 2^N → N would be injective as well. However this is impossible since we have already seen that 2^N is uncountable. ∎

We end this section with some notation which will be used frequently in the sequel.

Notation 3.9 If f : X → Y is a function and E ⊂ 2^Y let

  f^{-1}E := f^{-1}(E) := {f^{-1}(E) | E ∈ E}.

If G ⊂ 2^X, let

  f∗G := {A ∈ 2^Y | f^{-1}(A) ∈ G}.
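The diagonal counting used in the proof of item 4. of Lemma 3.8 is easy to put into code. This minimal sketch (the generator form and the name h are mine) reproduces h(1), ..., h(6) exactly as listed above:

```python
# Sketch: enumerate N x N by walking the anti-diagonals {(i, j) : i + j = k}.
def h():
    k = 2
    while True:
        for i in range(k - 1, 0, -1):  # walk the diagonal i + j = k
            yield (i, k - i)
        k += 1

gen = h()
print([next(gen) for _ in range(6)])
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
```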


Definition 3.10. Let E ⊂ 2^X be a collection of sets, A ⊂ X, and i_A : A → X be the inclusion map (i_A(x) = x for all x ∈ A), and let

  E_A = i_A^{-1}(E) = {A ∩ E : E ∈ E}.

3.2 Exercises

Let f : X → Y be a function and {A_i}_{i∈I} be an indexed family of subsets of Y; verify the following assertions.

Exercise 3.1. (∩_{i∈I} A_i)^c = ∪_{i∈I} A_i^c.

Exercise 3.2. Suppose that B ⊂ Y; show that B \ (∪_{i∈I} A_i) = ∩_{i∈I} (B \ A_i).

Exercise 3.3. f^{-1}(∪_{i∈I} A_i) = ∪_{i∈I} f^{-1}(A_i).

Exercise 3.4. f^{-1}(∩_{i∈I} A_i) = ∩_{i∈I} f^{-1}(A_i).

Exercise 3.5. Find a counterexample which shows that f(C ∩ D) = f(C) ∩ f(D) need not hold.

Example 3.11. Let X = {a, b, c} and Y = {1, 2} and define f(a) = f(b) = 1 and f(c) = 2. Then ∅ = f({a} ∩ {b}) ≠ f({a}) ∩ f({b}) = {1} and {1, 2} = f({a}^c) ≠ f({a})^c = {2}.

3.3 Algebraic sub-structures of sets

Definition 3.12. A collection of subsets A of a set X is a π-system or multiplicative system if A is closed under taking finite intersections.

Definition 3.13. A collection of subsets A of a set X is an algebra (field) if

1. ∅, X ∈ A,
2. A ∈ A implies that A^c ∈ A,
3. A is closed under finite unions, i.e. if A₁, ..., Aₙ ∈ A then A₁ ∪ ··· ∪ Aₙ ∈ A.

In view of conditions 1. and 2., 3. is equivalent to
3'. A is closed under finite intersections.

Definition 3.14. A collection of subsets B of X is a σ-algebra (or sometimes called a σ-field) if B is an algebra which is also closed under countable unions, i.e. if {A_i}_{i=1}^∞ ⊂ B, then ∪_{i=1}^∞ A_i ∈ B. (Notice that since B is also closed under taking complements, B is also closed under taking countable intersections.)

Example 3.15. Here are some examples of algebras.

1. B = 2^X; then B is a σ-algebra.
2. B = {∅, X} is a σ-algebra called the trivial σ-field.
3. Let X = {1, 2, 3}; then A = {∅, X, {1}, {2, 3}} is an algebra while S := {∅, X, {2, 3}} is not an algebra but is a π-system.

Proposition 3.16. Let E be any collection of subsets of X. Then there exists a unique smallest algebra A(E) and σ-algebra σ(E) which contains E.

Proof. Simply take

  A(E) := ∩{A : A is an algebra such that E ⊂ A}

and

  σ(E) := ∩{M : M is a σ-algebra such that E ⊂ M}. ∎

Example 3.17. Suppose X = {1, 2, 3} and E = {∅, X, {1, 2}, {1, 3}}; see Figure 3.1 (a diagram of this collection of subsets). Then A(E) = σ(E) = 2^X. On the other hand if E = {{1, 2}}, then A(E) = {∅, X, {1, 2}, {3}}.

Exercise 3.6. Suppose that E_i ⊂ 2^X for i = 1, 2. Show that A(E₁) = A(E₂) iff E₁ ⊂ A(E₂) and E₂ ⊂ A(E₁). Similarly show σ(E₁) = σ(E₂) iff E₁ ⊂ σ(E₂) and E₂ ⊂ σ(E₁). Give a simple example where A(E₁) = A(E₂) while E₁ ≠ E₂.
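Example 3.11 above can also be verified mechanically. In this sketch (the dictionary encoding of f is an illustration choice) the image of an intersection is strictly smaller than the intersection of the images, and f(A^c) differs from f(A)^c:

```python
# Check of Example 3.11: f(a) = f(b) = 1, f(c) = 2 on X = {a, b, c}, Y = {1, 2}.
f = {'a': 1, 'b': 1, 'c': 2}
img = lambda S: {f[x] for x in S}

print(img({'a'} & {'b'}), img({'a'}) & img({'b'}))  # set() versus {1}
print(img({'b', 'c'}))  # f({a}^c) = {1, 2}, while f({a})^c = {2}
```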



Definition 3.18. Let X be a set. We say that a family of sets F ⊂ 2^X is a partition of X if distinct members of F are disjoint and if X is the union of the sets in F.

Example 3.19. Let X be a set and E = {A₁, ..., Aₙ} where A₁, ..., Aₙ is a partition of X. In this case

  A(E) = σ(E) = {∪_{i∈Λ} A_i : Λ ⊂ {1, 2, ..., n}}

where ∪_{i∈Λ} A_i := ∅ when Λ = ∅. Notice that

  #(A(E)) = #(2^{1,2,...,n}) = 2^n.

Example 3.20. Suppose that X is a finite set and that A ⊂ 2^X is an algebra. For each x ∈ X let

  A_x = ∩{A ∈ A : x ∈ A} ∈ A,

wherein we have used that A is finite to insure A_x ∈ A. Hence A_x is the smallest set in A which contains x. Let C = A_x ∩ A_y ∈ A. I claim that if C ≠ ∅, then A_x = A_y. To see this, let us first consider the case where {x, y} ⊂ C. In this case we must have A_x ⊂ C and A_y ⊂ C and therefore A_x = A_y. Now suppose either x or y is not in C. For definiteness, say x ∉ C, i.e. x ∉ A_y. Then x ∈ A_x \ A_y ∈ A, from which it follows that A_x = A_x \ A_y, i.e. A_x ∩ A_y = ∅.

Let us now define {B_i}_{i=1}^k to be an enumeration of {A_x}_{x∈X}. It is now a straightforward exercise to show

  A = {∪_{i∈Λ} B_i : Λ ⊂ {1, 2, ..., k}}.

Proposition 3.21. Suppose that B ⊂ 2^X is a σ-algebra and B is at most a countable set. Then there exists a unique finite partition F of X such that F ⊂ B and every element B ∈ B is of the form

  B = ∪{A ∈ F : A ⊂ B}.   (3.1)

In particular B is actually a finite set and #(B) = 2^n for some n ∈ N.

Proof. We proceed as in Example 3.20. For each x ∈ X let

  A_x = ∩{A ∈ B : x ∈ A} ∈ B,

wherein we have used that B is a countable σ-algebra to insure A_x ∈ B. Just as above, either A_x ∩ A_y = ∅ or A_x = A_y and therefore F = {A_x : x ∈ X} ⊂ B is a (necessarily countable) partition of X for which Eq. (3.1) holds for all B ∈ B.

Enumerate the elements of F as F = {P_n}_{n=1}^N where N ∈ N or N = ∞. If N = ∞, then the correspondence

  a ∈ {0, 1}^ℕ → A_a = ∪{P_n : a_n = 1} ∈ B

is bijective and therefore, by Lemma 3.8, B is uncountable. Thus any countable σ-algebra is necessarily finite. This finishes the proof modulo the uniqueness assertion which is left as an exercise to the reader. ∎

Example 3.22 (Countable/Co-countable σ-Field). Let X = R and E := {{x} : x ∈ R}. Then σ(E) consists of those subsets A ⊂ R such that A is countable or A^c is countable. Similarly, A(E) consists of those subsets A ⊂ R such that A is finite or A^c is finite. More generally we have the following exercise.

Exercise 3.7. Let X be a set, I be an infinite index set, and E = {A_i}_{i∈I} be a partition of X. Prove that the algebra, A(E), and the σ-algebra, σ(E), generated by E are given by

  A(E) = {∪_{i∈Λ} A_i : Λ ⊂ I with #(Λ) < ∞ or #(Λ^c) < ∞}

and

  σ(E) = {∪_{i∈Λ} A_i : Λ ⊂ I with Λ countable or Λ^c countable}

respectively. Here we are using the convention that ∪_{i∈Λ} A_i := ∅ when Λ = ∅.

Proposition 3.23. Let X be a set and E ⊂ 2^X. Let E^c := {A^c : A ∈ E} and E_c := E ∪ {X, ∅} ∪ E^c. Then

  A(E) := {finite unions of finite intersections of elements from E_c}.   (3.2)

Proof. Let A denote the right member of Eq. (3.2). From the definition of an algebra, it is clear that E ⊂ A ⊂ A(E). Hence to finish the proof it suffices to show A is an algebra. The proof of these assertions are routine except for possibly showing that A is closed under complementation. To check A is closed under complementation, let Z ∈ A be expressed as

  Z = ∪_{i=1}^N ∩_{j=1}^K A_{ij}

where A_{ij} ∈ E_c. Therefore, writing B_{ij} = A_{ij}^c ∈ E_c, we find that

  Z^c = ∩_{i=1}^N ∪_{j=1}^K B_{ij} = ∪_{j₁,...,j_N=1}^K (B_{1j₁} ∩ B_{2j₂} ∩ ··· ∩ B_{Nj_N}) ∈ A

wherein we have used the fact that B_{1j₁} ∩ B_{2j₂} ∩ ··· ∩ B_{Nj_N} is a finite intersection of sets from E_c. ∎

Page: 22

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

3.3 Algebraic sub-structures of sets

23

Remark 3.24. One might think that in general (E ) may be described as the countable unions of countable intersections of sets in E c . However this is in general false, since if

Proof. By Exercise 3.10, f 1 ( (E )) is a algebra and since E F , (E ) f 1 ( (E )). It now follows that (f 1 (E )) f 1 ( (E )).

Z=
i=1 j =1

Aij For the reverse inclusion, notice that f f 1 (E ) := B Y : f 1 (B ) f 1 (E )

with Aij Ec , then


Zc =
j1 =1,j2 =1,...jN =1,... =1

Ac,j

is a algebra which contains E and thus (E ) f f 1 (E ) . Hence for every B (E ) we know that f 1 (B ) f 1 (E ) , i.e. f 1 ( (E )) f 1 (E ) . Applying Eq. (3.3) with X = A and f = iA being the inclusion map implies
1 1 ( (E ))A = i A ( (E )) = (iA (E )) = (EA ).

which is now an uncountable union. Thus the above description is not correct. In general it is complicated to explicitly describe (E ), see Proposition 1.23 on page 39 of Folland for details. Also see Proposition 3.21. Exercise 3.8. Let be a topology on a set X and A = A( ) be the algebra generated by . Show A is the collection of subsets of X which may be written as nite union of sets of the form F V where F is closed and V is open. Solution to Exercise (3.8). In this case c is the collection of sets which are either open or closed. Now if Vi o X and Fj X for each j, then (n i=1 Vi ) m F is simply a set of the form V F where V X and F X. Therefore j o j =1 the result is an immediate consequence of Proposition 3.23. Denition 3.25. The Borel eld, B = BR = B (R) , on R is the smallest -eld containing all of the open subsets of R. Exercise 3.9. Verify the algebra, BR , is generated by any of the following collection of sets: 1. {(a, ) : a R} , 2. {(a, ) : a Q} or 3. {[a, ) : a Q} . Hint: make use of Exercise 3.6. Exercise 3.10. Suppose f : X Y is a function, F 2Y and B 2X . Show f 1 F and f B (see Notation 3.9) are algebras ( algebras) provided F and B are algebras ( algebras). Lemma 3.26. Suppose that f : X Y is a function and E 2Y and A Y then f 1 (E ) = f 1 ( (E )) and ( (E ))A = (EA ), (3.3) (3.4)

Example 3.27. Let E = {(a, b] : < a < b < } and B = (E ) be the Borel eld on R. Then E(0,1] = {(a, b] : 0 a < b 1} and we have B(0,1] = E(0,1] . In particular, if A B such that A (0, 1], then A E(0,1] . Denition 3.28. A function, f : Y is said to be simple if f ( ) Y is a nite set. If A 2 is an algebra, we say that a simple function f : Y is measurable if {f = y } := f 1 ({y }) A for all y Y. A measurable simple function, f : C, is called a simple random variable relative to A. Notation 3.29 Given an algebra, A 2 , let S(A) denote the collection of simple random variables from to C. For example if A A, then 1A S (A) is a measurable simple function. Lemma 3.30. For every algebra A 2 , the set simple random variables, S (A) , forms an algebra. Proof. Let us observe that 1 = 1 and 1 = 0 are in S (A) . If f, g S (A) and c C\ {0} , then {f + cg = } =
a,bC:a+cb=

where BA := {B A : B B} . (Similar assertion hold with () being replaced by A () .)


Page: 23 job: prob

({f = a} {g = b}) A

(3.5)

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

24

3 Preliminaries

and {f g = } =
a,bC:ab=

({f = a} {g = b}) A

(3.6)

from which it follows that f + cg and f g are back in S (A) . Denition 3.31. A simple function algebra, S, is a subalgebra of the bounded complex functions on X such that 1 S and each function, f S, is a simple function. If S is a simple function algebra, let A (S) := {A X : 1A S} . (It is easily checked that A (S) is a sub-algebra of 2X .) Lemma 3.32. Suppose that S is a simple function algebra, f S and f (X ) . Then {f = } A (S) . Proof. Let {i }i=0 be an enumeration of f (X ) with 0 = . Then
n 1 n n

3. If f : C is a simple function such that 1{f =} S for all C, then f = C 1{f =} S. Conversely, by Lemma 3.32, if f S then 1{f =} S for all C. Therefore, a simple function, f : X C is in S i 1{f =} S for all C. With this preparation, we are now ready to complete the verication. First o, A A (S (A)) 1A S (A) A A which shows that A (S (A)) = A. Similarly, f S (A (S)) {f = } A (S) C 1{f =} S C f S which shows S (A (S)) = S.

g :=
i=1

( i )
i=1

(f i 1) S.

Moreover, we see that g = 0 on n i=1 {f = i } while g = 1 on {f = } . So we have shown g = 1{f =} S and therefore that {f = } A. Exercise 3.11. Continuing the notation introduced above: 1. Show A (S) is an algebra of sets. 2. Show S (A) is a simple function algebra. 3. Show that the map A Algebras 2X S (A) {simple function algebras on X } is bijective and the map, S A (S) , is the inverse map. Solution to Exercise (3.11). 1. Since 0 = 1 , 1 = 1X S, it follows that and X are in A (S) . If A A (S) , then 1Ac = 1 1A S and so Ac A (S) . Finally, if A, B A (S) then 1AB = 1A 1B S and thus A B A (S) . 2. If f, g S (A) and c F, then {f + cg = } =
a,bF:a+cb=

({f = a} {g = b}) A

and {f g = } =
a,bF:ab=

({f = a} {g = b}) A

from which it follows that f + cg and f g are back in S (A) .


Page: 24 job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

4 Finitely Additive Measures


Denition 4.1. Suppose that E 2X is a collection of subsets of X and : E [0, ] is a function. Then 1. is monotonic if (A) (B ) for all A, B E with A B. 2. is sub-additive (nitely sub-additive) on E if
n

1. ( is monotone) (E ) (F ) if E F. 2. For A, B A, the following strong additivity formula holds; (A B ) + (A B ) = (A) + (B ) . 3. ( is nitely subbadditive) (n j =1 Ej ) 4. is sub-additive on A i
n j =1

(4.3)

(Ej ).

(E )
i=1 n

(Ei )

whenever E = i=1 Ei E with n N {} (n N). 3. is super-additive (nitely super-additive) on E if


n

(A)
i=1

(Ai ) for A =
i=1

Ai

(4.4)

(E )
i=1 n i=1

(Ei )

(4.1)

where A A and {Ai }i=1 A are pairwise disjoint sets. 5. ( is countably superadditive) If A = i=1 Ai with Ai , A A, then

whenever E = Ei E with n N {} (n N). 4. is additive or nitely additive on E if


n

i=1

Ai

i=1

(Ai ) .

6. A nitely additive measure, , is a premeasure i is sub-additve. (Ei ) (4.2) Proof. 1. Since F is the disjoint union of E and (F \ E ) and F \ E = F E c A it follows that (F ) = (E ) + (F \ E ) (E ). 2. Since A B = [A \ (A B )] [B \ (A B )] A B,

(E ) =
i=1 n

whenever E = i=1 Ei E with Ei E for i = 1, 2, . . . , n < . 5. If E = A is an algebra, () = 0, and is nitely additive on A, then is said to be a nitely additive measure. 6. is additive (or countable additive) on E if item 4. holds even when n = . 7. If E = A is an algebra, () = 0, and is additive on A then is called a premeasure on A. 8. A measure is a premeasure, : B [0, ] , where B is a algebra. We say that is a probability measure if (X ) = 1.

(A B ) = (A B \ (A B )) + (A B ) = (A \ (A B )) + (B \ (A B )) + (A B ) . Adding (A B ) to both sides of this equation proves Eq. (4.3). j s are pair-wise disjoint and 3. Let Ej = Ej \ (E1 Ej 1 ) so that the E n E = j =1 Ej . Since Ej Ej it follows from the monotonicity of that (E ) = (Ej ) (Ej ).

4.1 Finitely Additive Measures


Proposition 4.2 (Basic properties of nitely additive measures). Suppose is a nitely additive measure on an algebra, A 2X , E, F A with n E F and {Ej }j =1 A, then :

26

4 Finitely Additive Measures


4. If A = i=1 Bi with A A and Bi A, then A = i=1 Ai where Ai := Bi \ (B1 . . . Bi1 ) A and B0 = . Therefore using the monotonicity of and Eq. (4.4)

It is clear that 2 = 4 and that 3 = 5. To nish the proof we will show 5 = 2 and 5 = 3. 5 = 2. If An A such that An A A, then A \ An and therefore
n

(A)
i=1

(Ai )
i=1

(Bi ).
n

lim [P (A) P (An )] = lim P (A \ An ) = 0.


n

5. Suppose that A = i=1 Ai with Ai , A A, then i=1 Ai A for all n n and so by the monotonicity and nite additivity of , i=1 (Ai ) (A) . Letting n in this equation shows is superadditive. 6. This is a combination of items 5. and 6.

5 = 3. If An A such that An A A, then An \ A . Therefore,


n

lim [P (An ) P (A)] = lim P (An \ A) = 0.


n

Proposition 4.3. Suppose that P is a nitely additive probability measure on an algebra, A 2 . Then the following are equivalent: 1. P is additive on 2. For all An A such 3. For all An A such 4. For all An A such 5. For all An A such A. that that that that An An An An A A, P (An ) P (A) . A A, P (An ) P (A) . , P (An ) 1. , P (An ) 1.

Remark 4.4. Observe that the equivalence of items 1. and 2. in the above proposition hold without the restriction that P ( ) = 1 and in fact P ( ) = may be allowed for this equivalence. Denition 4.5. Let (, B ) be a measurable space, i.e. B 2 is a algebra. A probability measure on (, B ) is a nitely additive probability measure, P : B [0, 1] such that any and hence all of the continuity properties in Proposition 4.3 hold. We will call (, B , P ) a probability space. Lemma 4.6. Suppose that (, B , P ) is a probability space, then P is countably sub-additive. Proof. Suppose that An B and let A1 := A1 and for n 2, let An := An \ (A1 . . . An1 ) B . Then
P ( n=1 An ) = P (n=1 An ) = n=1

Proof. We will start by showing 1 2 3. 1 = 2. Suppose An A such that An A A. Let An := An \ An1 with A0 := . Then {An }n=1 are disjoint, An = n k=1 Ak and A = k=1 Ak . Therefore,
n

P (An )
n=1

P (An ) .

P (A) =
k=1

P (Ak ) = lim

P (Ak ) = lim P (n k=1 Ak ) = lim P (An ) .


k=1 n n

= 1. If {An }n=1 A are disjoint and A := n=1 An A, then N n=1 An A. Therefore, 2


N

4.2 Examples of Measures


Most algebras and -additive measures are somewhat dicult to describe and dene. However, there are a few special cases where we can describe explicitly what is going on. Example 4.7. Suppose that is a nite set, B := 2 , and p : [0, 1] is a function such that p ( ) = 1.

P (A) = lim P
N

N n=1 An

= lim

P (An ) =
n=1 n=1

P (An ) .

c 2 = 3. If An A such that An A A, then Ac n A and therefore, c lim (1 P (An )) = lim P (Ac n ) = P (A ) = 1 P (A) . n

3 = 2. If An A such that An A A, then again have,


n n

Ac n

A and therefore we

Then P (A) :=
A

p ( ) for all A

c lim (1 P (An )) = lim P (Ac n ) = P (A ) = 1 P (A) .

denes a measure on 2 .
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 26

job: prob

4.2 Examples of Measures

27

Example 4.8. Suppose that X is any set and x X is a point. For A X, let x (A) = 1 if x A 0 if x / A.

(A) =
F A

() =
F

()1A

Then = x is a measure on X called the Dirac delta measure at x. Example 4.9. Suppose that is a measure on X and > 0, then is also a measure on X. Moreover, if {j }j J are all measures on X, then = j =1 j , i.e.

where 1A is one if A and zero otherwise. We may check that is a measure on B . Indeed, if A = i=1 Ai and F , then A i Ai for one and hence exactly one Ai . Therefore 1A = i=1 1Ai and hence

(A) =
F

()1A =
F

()
i=1

1Ai

(A) =
j =1

j (A) for all A X

=
i=1 F

()1Ai =
i=1

(Ai )

is a measure on X. (See Section 3.1 for the meaning of this sum.) To prove this we must show that is countably additive. Suppose that {Ai }i=1 is a collection of pair-wise disjoint subsets of X, then

as desired. Thus we have shown that there is a one to one correspondence between measures on B and functions : F [0, ]. The following example explains what is going on in a more typical case of interest to us in the sequel. Example 4.12. Suppose that = R, A consists of those sets, A R which may be written as nite disjoint unions from S := {(a, b] R : a b } . We will show below the following:

( i=1 Ai )

(Ai ) =
i=1 i=1 j =1

j (Ai ) j ( i=1 Ai )
j =1

=
j =1 i=1

j (Ai ) = ( i=1 Ai )

wherein the third equality we used Theorem 1.6 and in the fourth we used that fact that j is a measure. Example 4.10. Suppose that X is a set : X [0, ] is a function. Then :=
xX

1. A is an algebra. (Recall that BR = (A) .) 2. To every increasing function, F : R [0, 1] such that F () := lim F (x) = 0 and
x x

(x)x

F (+) := lim F (x) = 1 there exists a nitely additive probability measure, P = PF on A such that

is a measure, explicitly (A) =


xA

(x)

P ((a, b] R) = F (b) F (a) for all a b . 3. P is additive on A i F is right continuous. 4. P extends to a probability measure on BR i F is right continuous. Let us observe directly that if F (a+) := limxa F (x) = F (a) , then (a, a + 1/n] while P ((a, a + 1/n]) = F (a + 1/n) F (a) F (a+) F (a) > 0. Hence P can not be additive on A in this case.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

for all A X. Example 4.11. Suppose that F 2X is a countable or nite partition of X and B 2X is the algebra which consists of the collection of sets A X such that A = { F : A} . (4.5) Any measure : B [0, ] is determined uniquely by its values on F . Conversely, if we are given any function : F [0, ] we may dene, for A B ,

Page: 27

job: prob

28

4 Finitely Additive Measures

4.3 Simple Integration


Proof. Denition 4.13 (Simple Integral). Suppose now that P is a nitely additive probability measure on an algebra A 2X . For f S (A) the integral or expectation, E(f ) = EP (f ), is dened by EP (f ) =
y C

1. If = 0, then E(f ) =
y C{}

y P (f = y ) =
y C{}

y P (f = y/)

yP (f = y ).

(4.6) =

z P (f = z ) = E(f ).
z C{}

Example 4.14. Suppose that A A, then E1A = 0 P (Ac ) + 1 P (A) = P (A) . (4.7)

The case = 0 is trivial. 2. Writing {f = a, g = b} for f 1 ({a}) g 1 ({b}), then E(f + g ) =


z C

Remark 4.15. Let us recall that our intuitive notion of P (A) was given as in Eq. (2.1) by 1 P (A) = lim # {1 k N : (k ) A} N N where (k ) was the result of the k th independent experiment. If we use this interpretation back in Eq. (4.6), we arrive at E(f ) =
y C

z P (f + g = z ) z P (a+b=z {f = a, g = b})
z C

= =
z C

z
a+b=z

P ({f = a, g = b}) (a + b) P ({f = a, g = b})

yP (f = y ) = lim 1 N N 1 N N
N

1 N

y # {1 k N : f ( (k )) = y }
y C

=
z C a+b=z

= lim

y
y C N k=1

1f ((k))=y = lim

1 N N

f ( (k )) 1f ((k))=y
k=1 y C

=
a,b

(a + b) P ({f = a, g = b}) .

But f ( (k )) . aP ({f = a, g = b}) =


a,b a

= lim

k=1

a
b

P ({f = a, g = b})

Thus informally, Ef should represent the average of the values of f over many independent experiments. Proposition 4.16. The expectation operator, E = EP , satises: 1. If f S(A) and C, then E(f ) = E(f ). 2. If f, g S (A) , then E(f + g ) = E(g ) + E(f ). (4.9) 3. E is positive, i.e. E(f ) 0 if f is a non-negative measurable simple function. 4. For all f S (A) , |Ef | E |f | . (4.10)
Page: 28 job: prob

=
a

aP (b {f = a, g = b}) aP ({f = a}) = Ef


a

= and similarly,

(4.8)

bP ({f = a, g = b}) = Eg.


a,b

Equation (4.9) is now a consequence of the last three displayed equations. 3. If f 0 then E(f ) = aP (f = a) 0.
a0

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

4.3 Simple Integration

29

4. First observe that |f | =


C

|| 1f =

1 1A = 1Ac =
n=1 M

1 Ac = n
n=1 k

(1 1An ) 1An1 1Ank

and therefore, E |f | = E
C

= || 1f = =
C

(1)
k=0 M

|| E1f = =
C

|| P (f = ) max |f | . =

0n1 <n2 <<nk M

(1)
k=0

k 0n1 <n2 <<nk M

1An1 Ank

On the other hand, |Ef | =


C

from which it follows that P (f = )


C

|| P (f = ) = E |f | . 1M = 1A = n=1 An

(1)
k=1

k+1 1n1 <n2 <<nk M

1An1 Ank .

(4.13)

Remark 4.17. Every simple measurable function, f : C, may be written as N f = j =1 j 1Aj for some j C and some Aj C. Moreover if f is represented this way, then
N N N

Taking expectations of this equation then gives Eq. (4.12). Remark 4.20. Here is an alternate proof of Eq. (4.13). Let and by relabeling the sets {An } if necessary, we may assume that A1 Am and / Am+1 AM for some 0 m M. (When m = 0, both sides of Eq. (4.13) are zero and so we will only consider the case where 1 m M.) With this notation we have
M

Ef = E
j =1

j 1 Aj =
j =1

j E1Aj =
j =1

j P (Aj ) .

Remark 4.18 (Chebyshevs Inequality). Suppose that f S(A), > 0, and p > 0, then P ({|f | }) = E 1|f | E Observe that |f | =
C p

(1)
k=1 m

k+1 1n1 <n2 <<nk M

1An1 Ank ( ) 1An1 Ank ( )


1n1 <n2 <<nk m

|f | p 1|f | p E |f | . p || 1{f =}
|| p

(4.11)

=
k=1 m

(1)

k+1

=
k=1

(1)
m

k+1

m k
k nk

is a simple random variable and {|f | } = Therefore,


|f |p p 1|f |

{f = } A as well.

=1
k=0

(1) (1)
m

is still a simple random variable.

m k

Lemma 4.19 (Inclusion Exclusion Formula). If An A for n = 1, 2, . . . , M such that M n=1 An < , then
M

= 1 (1 1)

= 1.

This veries Eq. (4.13) since 1M ( ) = 1. n=1 An Example 4.21 (Coincidences). Let be the set of permutations (think of card A) shuing), : {1, 2, . . . , n} {1, 2, . . . , n} , and dene P (A) := #( n! to be the uniform distribution (Haar measure) on . We wish to compute the probability of the event, B, that a random permutation xes some index i. To do this, let Ai := { : (i) = i} and observe that B = n i=1 Ai . So by the Inclusion Exclusion Formula, we have
macro: svmonob.cls date/time: 23-Feb-2007/15:20

M n=1 An =
k=1

(1)

k+1 1n1 <n2 <<nk M

(An1 Ank ) .

(4.12)

Proof. This may be proved inductively from Eq. (4.3). We will give a different and perhaps more illuminating proof here. Let A := M n=1 An . c M c Since Ac = M A = A , we have n n=1 n=1 n
Page: 29 job: prob

30

4 Finitely Additive Measures


n

P (B ) =
k=1

(1)

k+1 1i1 <i2 <i3 <<ik n

P (Ai1 Aik ) .

and so P ( a xed point) = while


3

4 2 = 6 3

Since P (Ai1 Aik ) = P ({ : (i1 ) = i1 , . . . , (ik ) = ik }) = and # {1 i1 < i2 < i3 < < ik n} = we nd P (B ) =
k=1 n

(1)
k=1

k+1

1 1 2 1 =1 + = k! 2 6 3

(n k )! n! n , k
k+1

and EN =

1 (3 + 1 + 1 + 0 + 0 + 1) = 1. 6

(1)

k+1

n (n k )! = k n!

(1)
k=1

1 . k!

4.4 Simple Independence and the Weak Law of Large Numbers


For the next two problems, let be a nite set, n N, = n , and Xi : be dened by Xi ( ) = i for and i = 1, 2, . . . , n. We further suppose p : [0, 1] is a function such that p ( ) = 1

For large n this gives,


n

P (B ) =
k=1

(1)

1 = e1 1 = 0.632. k!

Example 4.22. Continue the notation in Example 4.21. We now wish to compute the expected number of xed points of a random permutation, , i.e. how many cards in the shued stack have not moved on average. To this end, let X i = 1 Ai and observe that
n n

and P : 2 [0, 1] is the probability measure dened by P (A) :=


A

p ( ) for all A 2 .

(4.14)

N ( ) =
i=1

Xi ( ) =
i=1

1(i)=i = # {i : (i) = i} .

Exercise 4.1 (Simple Independence 1.). Suppose qi : [0, 1] are funcn tions such that qi () = 1 for i = 1, 2, . . . , n and If p ( ) = i=1 qi (i ) . Show for any functions, fi : R that
n n n

denote the number of xed points of . Hence we have


n n n

EP
i=1

fi (Xi ) =
i=1

EP [fi (Xi )] =
i=1

EQi fi

EN =
i=1

EXi =
i=1

P (Ai ) =
i=1

(n 1)! = 1. n!

where Qi ( ) =

qi () for all .

Let us check the above formula when n = 6. In this case we have 1 1 2 2 3 3


Page: 30

2 3 1 3 1 2

3 2 3 1 2 1

N ( ) 3 1 1 0 0 1
job: prob

Exercise 4.2 (Simple Independence 2.). Prove the converse of the previous exercise. Namely, if
n n

EP
i=1

fi (Xi ) =
i=1

EP [fi (Xi )]

(4.15)

for any functions, fi : R, then there exists functions qi : [0, 1] with n qi () = 1, such that p ( ) = i=1 qi (i ) .
macro: svmonob.cls date/time: 23-Feb-2007/15:20

4.4 Simple Independence and the Weak Law of Large Numbers

31

Exercise 4.3 (A Weak Law of Large Numbers). Suppose that R n is a nite set, n N, = n , p ( ) = i=1 q (i ) where q : [0, 1] such that q () = 1, and let P : 2 [0, 1] be the probability measure dened as in Eq. (4.14). Further let Xi ( ) = i for i = 1, 2, . . . , n, := EXi , 2 2 := E (Xi ) , and Sn = 1. Show, =

Proof. Let x [0, 1] , = {0, 1} , q (0) = 1 x, q (1) = x, = n , and P Pn 1 n i=1 i Px ({ }) = q (1 ) . . . q (n ) = x i=1 i (1 x) . As above, let Sn =
1 n

(X1 + + Xn ) , where Xi ( ) = i and observe that Px Sn = k n = n k nk x (1 x) . k

1 (X1 + + Xn ) . n

q () and
2

Therefore, writing Ex for EPx , we have


2

( ) q () =

q () .

(4.16) Ex [f (Sn )] =

f
k=0

2. Show, ESn = . 3. Let ij = 1 if i = j and ij = 0 if i = j. Show E [(Xi ) (Xj )] = ij . 4. Using Sn may be expressed as,
1 n 2 n i=1 2

k n

n k nk x (1 x) = pn (x) . k

Hence we nd |pn (x) f (x)| = |Ex f (Sn ) f (x)| = |Ex [f (Sn ) f (x)]| Ex |f (Sn ) f (x)| = Ex [|f (Sn ) f (x)| : |Sn x| ] + Ex [|f (Sn ) f (x)| : |Sn x| < ] 2M Px (|Sn x| ) + () where M := max |f (y )| and
y [0,1]

(Xi ) , show (4.17)

E (Sn ) =

1 2 . n 1 2 . n2

5. Conclude using Eq. (4.17) and Remark 4.18 that P (|Sn | ) (4.18)

So for large n, Sn is concentrated near = EXi with probability approaching 1 for n large. This is a version of the weak law of large numbers. Exercise 4.4 (Bernoulli Random Variables). Let = {0, 1} , , X : R be dened by X (0) = 0 and X (1) = 1, x [0, 1] , and dene Q = x1 + (1 x) 0 , i.e. Q ({0}) = 1 x and Q ({1}) = x. Verify, (x) := EQ X = x and 2 (x) := EQ (X x) = (1 x) x 1/4. Theorem 4.23 (Weierstrass Approximation Theorem via Bernsteins Polynomials.). Suppose that f C ([0, 1] , C) and
n 2

() := sup {|f (y ) f (x)| : x, y [0, 1] and |y x| } is the modulus of continuity of f. Now by the above exercises, Px (|Sn x| ) and hence we may conclude that max |pn (x) f (x)| M + () 2n2 1 4n2 (see Figure 4.1)

x[0,1]

pn (x) :=
k=0

n f k

k n

xk (1 x)

nk

and therefore, that lim sup max |pn (x) f (x)| () .


n x[0,1]

Then
n x[0,1]

lim

sup |f (x) pn (x)| = 0.

This completes the proof, since by uniform continuity of f, () 0 as 0.

(See Theorem 14.42 for a multi-dimensional generalization of this theorem.)


Page: 31 job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

32

4 Finitely Additive Measures

Proof. Let A denote the collection of sets which may be written as nite disjoint unions of sets from S . Clearly S A A(S ) so it suces to show A is an algebra since A(S ) is the smallest algebra containing S . By the properties of S , we know that , X A. Now suppose that Ai = F i F A where, for i = 1, 2, . . . , n, i is a nite collection of disjoint sets from S . Then
n n

Ai =
i=1 i=1 F i

=
(F1 ,,...,Fn )1 n

(F1 F2 Fn )

and this is a disjoint (you check) union of elements from S . Therefore A is closed under nite intersections. Similarly, if A = F F with being a nite collection of disjoint sets from S , then Ac = F F c . Since by assumption F c A for F S and A is closed under nite intersections, it follows that Ac A.
Fig. 4.1. Plots of Px (Sn = k/n) versus k/n for n = 100 with x = 1/4 (black), x = 1/2 (red), and x = 5/6 (green).

be as in Example Example 4.27. Let X = R and S := (a, b] R : a, b R 4.25. Then A(S ) may be described as being those sets which are nite disjoint unions of sets from S . Proposition 4.28 (Construction of Finitely Additive Measures). Suppose S 2X is a semi-algebra (see Denition 4.24) and A = A(S ) is the algebra generated by S . Then every additive function : S [0, ] such that () = 0 extends uniquely to an additive measure (which we still denote by ) on A. Proof. Since (by Proposition 4.26) every element A A is of the form A = i Ei for a nite collection of Ei S , it is clear that if extends to a measure then the extension is unique and must be given by (A) = (Ei ).
i

4.5 Constructing Finitely Additive Measures


Denition 4.24. A set S 2X is said to be an semialgebra or elementary class provided that S S is closed under nite intersections if E S , then E c is a nite disjoint union of sets from S . (In particular X = c is a nite disjoint union of elements from S .) Example 4.25. Let X = R, then S := (a, b] R : a, b R = {(a, b] : a [, ) and a < b < } {, R} is a semi-eld

(4.19)

To prove existence, the main point is to show that (A) in Eq. (4.19) is well dened; i.e. if we also have A = j Fj with Fj S , then we must show (Ei ) =
i j

(Fj ).

(4.20) (Ei

Exercise 4.5. Let A 2X and B 2Y be semi-elds. Show the collection E := {A B : A A and B B} is also a semi-eld. Proposition 4.26. Suppose S 2X is a semi-eld, then A = A(S ) consists of sets which may be written as nite disjoint unions of sets from S .
i

But Ei = j (Ei Fj ) and the additivity of on S implies (Ei ) = Fj ) and hence (Ei ) =
i j

(Ei Fj ) =
i,j

(Ei Fj ).

Similarly,
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 32

job: prob

(Fj ) =
j i,j

(Ei Fj )

which combined with the previous equation shows that Eq. (4.20) holds. It is now easy to verify that extended to A as in Eq. (4.19) is an additive measure on A. Proposition 4.29. Let X = R, S be a semi-algebra S = {(a, b] R : a b }, (4.21)

and A = A(S ) be the algebra formed by taking nite disjoint unions of elements from S , see Proposition 4.26. To each nitely additive probability measures : [0, 1] such that A [0, ], there is a unique increasing function F : R F () = 0, F () = 1 and . ((a, b] R) = F (b) F (a) a b in R (4.22)

[0, 1] such that F () = 0, Conversely, given an increasing function F : R F () = 1 there is a unique nitely additive measure = F on A such that the relation in Eq. (4.22) holds. Proof. Given a nitely additive probability measure , let . F (x) := ((, x] R) for all x R Then F () = 1, F () = 0 and for b > a, F (b) F (a) = ((, b] R) ((, a]) = ((a, b] R) . [0, 1] as in the statement of the theorem is Conversely, suppose F : R given. Dene on S using the formula in Eq. (4.22). The argument will be completed by showing is additive on S and hence, by Proposition 4.28, has a unique extension to a nitely additive measure on A. Suppose that
n

(a, b] =
i=1

(ai , bi ].

By reordering (ai , bi ] if necessary, we may assume that a = a1 < b1 = a2 < b2 = a3 < < bn1 = an < bn = b. Therefore, by the telescoping series argument,
n n

((a, b] R) = F (b) F (a) =


i=1

[F (bi ) F (ai )] =
i=1

((ai , bi ] R).

5 Countably Additive Measures


5.1 Distribution Function for Probability Measures on (R, BR )
Denition 5.1. Given a probability measure, P on BR , the cumulative distribution function (CDF) of P is dened as the function, F = FP : R [0, 1] given as F (x) := P ((, x]) . Example 5.2. Suppose that P = p1 + q1 + r with p, q, r > 0 and p + q + r = 1. In this case, 0 for x < 1 p for 1 x < 1 . F (x) = p + q for 1 x < 1 for x < Lemma 5.3. If F = FP : R [0, 1] is a distribution function for a probability measure, P, on BR , then: 1. F 2. F 3. F 4. F () := limx F (x) = 0, () := limx F (x) = 1, is non-decreasing, and is right continuous. 0 for x 0 F (x) := x for 0 x < 1 , 1 for 1 x < is the distribution function for a measure, m on BR which is concentrated on (0, 1]. The measure, m is called the uniform distribution or Lebesgue measure on (0, 1]. Recall from Denition 3.14 that B 2X is a algebra on X if B is an algebra which is closed under countable unions and intersections.

5.2 Construction of Premeasures


Proposition 5.6. Suppose that S 2X is a semi-algebra, A = A(S ) and : A [0, ] is a nitely additive measure. Then is a premeasure on A i is sub-additive on S . Proof. Clearly if is a premeasure on A then is - additive and hence sub-additive on S . Because of Proposition 4.2, to prove the converse it suces to show that the sub-additivity of on S implies the sub-additivity of on A.

So suppose A =
n=1

An with A A and each An A which we express as


Nn i=1

A=

k j =1

Ej with Ej S and An =

En,i with En,i S . Then


Nn

Theorem 5.4. To each function F : R [0, 1] satisfying properties 1. 4. in Lemma 5.3, there exists a unique probability measure, PF , on BR such that PF ((a, b]) = F (b) F (a) for all < a b < . Proof. The uniqueness assertion in the theorem is covered in Exercise 5.1 below. The existence portion of the Theorem follows from Proposition 5.7 and Theorem 5.19 below. Example 5.5 (Uniform Distribution). The function,

Ej = A Ej =
n=1

An Ej =
n=1 i=1

En,i Ej

which is a countable union and hence by assumption,


Nn

(Ej )
n=1 i=1

(En,i Ej ) .

Summing this equation on j and using the nite additivity of shows

36

5 Countably Additive Measures


k k Nn N N

(A) =
j =1

(Ej )
j =1 n=1 i=1 Nn k

(En,i Ej )
Nn

II
n=1

o J n
n=1

n . J

=
n=1 i=1 j =1

(En,i Ej ) =
n=1 i=1

(En,i ) =
n=1

(An ) ,

Hence by nite sub-additivity of ,


N

which proves (using Proposition 4.2) the sub-additivity of on A. Now suppose that F : R R be an increasing function, F () := limx F (x) and = F be the nitely additive measure on (R, A) described in Proposition 4.29. If happens to be a premeasure on A, then, letting An = (a, bn ] with bn b as n , implies F (bn ) F (a) = ((a, bn ]) ((a, b]) = F (b) F (a). Since was an arbitrary sequence such that bn b, we have shown limyb F (y ) = F (b), i.e. F is right continuous. The next proposition shows the converse is true as well. Hence premeasures on A which are nite on bounded sets are in one to one correspondences with right continuous increasing functions which vanish at 0. Proposition 5.7. To each right continuous increasing function F : R R there exists a unique premeasure = F on A such that F ((a, b]) = F (b) F (a) < a < b < . Proof. As above, let F () := limx F (x) and = F be as in Proposition 4.29. Because of Proposition 5.6, to nish the proof it suces to show is sub-additive on S . First suppose that < a < b < , J = (a, b], Jn = (an , bn ] such that
{bn }n=1

F (b) F ( a) = (I )
n=1

n ) (J
n=1

n ). (J

Using the right continuity of F and letting a a in the above inequality,

(J ) = ((a, b]) = F (b) F (a)


n=1

n J (5.2)

=
n=1

(Jn ) +
n=1

n \ Jn ). (J

Given > 0, we may use the right continuity of F to choose bn so that n \ Jn ) = F ( (J bn ) F (bn ) 2n n N. Using this in Eq. (5.2) shows

(J ) = ((a, b])
n=1

(Jn ) +

J=
n=1

Jn . We wish to show

which veries Eq. (5.1) since > 0 was arbitrary. The hard work is now done but we still have to check the cases where a = or b = . For example, suppose that b = so that

(J )
n=1

(Jn ).

(5.1) with Jn = (an , bn ] R. Then

J = (a, ) =
n=1

Jn

To do this choose numbers a > a, bn > bn in which case I := ( a, b] J,


o n := (an , n J bn ] J := (an , bn ) J n . n=1

IM := (a, M ] = J IM =
1 n=1

Jn IM

= [ J Since I a, b] is compact and I


1

o J n

there exists N < such that

o n To see this, let c := sup x b : [ a, x] is nitely covered by J . If c < b, n=1 o o m for some m and there exists x J m such that [ then c J a, x] is nitely covered o o n n by , say by J . We would then have that J n=1 n=1 n=1 o m covers [a, c ] for all c J . But this contradicts the denition of c.

n o o

and so by what we have already proved,


n o
o n J

n oN

n omax(m,N )

F (M ) F (a) = (IM )
n=1

(Jn IM )
n=1

(Jn ).

nitely

Now let M in this last inequality to nd that


macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 36

job: prob

5.3 Regularity and Uniqueness Results

37

((a, )) = F () F (a)
n=1

(Jn ). Therefore,

C \ A = [ i=1 Ci ] \ A = i=1 [Ci \ A] i=1 [Ci \ Ai ] .

The other cases where a = and b R and a = and b = are handled similarly. Before continuing our development of the existence of measures, we will pause to show that measures are often uniquely determined by their values on a generating sub-algebra. This detour will also have the added benet of motivating Carathoedorys existence proof to be given below.

(C \ A) =

( i=1

[Ci \ A])
i=1

(Ci \ A)
i=1

(Ci \ Ai ) < .

5.3 Regularity and Uniqueness Results


Denition 5.8. Given a collection of subsets, E , of X, let E denote the collection of subsets of X which are nite or countable unions of sets from E . Similarly let E denote the collection of subsets of X which are nite or countable intersections of sets from E . We also write E = (E ) and E = (E ) , etc. Lemma 5.9. Suppose that A 2X is an algebra. Then: 1. A is closed under taking countable unions and nite intersections. 2. A is closed under taking countable intersections and nite unions. 3. {Ac : A A } = A and {Ac : A A } = A . Proof. By construction A is closed under countable unions. Moreover if A = i=1 Ai and B = j =1 Bj with Ai , Bj A, then A B = i,j =1 Ai Bj A , which shows that A is also closed under nite intersections. Item 3. is straight forward and item 2. follows from items 1. and 3. Theorem 5.10 (Finite Regularity Result). Suppose A 2X is an algebra, B = (A) and : B [0, ) is a nite measure, i.e. (X ) < . Then for every > 0 and B B there exists A A and C A such that A B C and (C \ A) < . Proof. Let B0 denote the collection of B B such that for every > 0 there here exists A A and C A such that A B C and (C \ A) < . It is now clear that A B0 and that B0 is closed under complementation. Now suppose that Bi B0 for i = 1, 2, . . . and > 0 is given. By assumption there exists Ai A and Ci A such that Ai Bi Ci and (Ci \ Ai ) < 2i . N Let A := := N i=1 Ai , A i=1 Ai A , B := i=1 Bi , and C := i=1 Ci N A . Then A A B C and
Page: 37 job: prob

Since C \ AN C \ A, it also follows that C \ AN < for suciently large N and this shows B = i=1 Bi B0 . Hence B0 is a sub- -algebra of B = (A) which contains A which shows B0 = B . Many theorems in the sequel will require some control on the size of a measure . The relevant notion for our purposes (and most purposes) is that of a nite measure dened next. Denition 5.11. Suppose X is a set, E B 2X and : B [0, ] is a function. The function is nite on E if there exists En E such that (En ) < and X = n=1 En . If B is a algebra and is a measure on B which is nite on B we will say (X, B , ) is a nite measure space. The reader should check that if is a nitely additive measure on an algebra, B , then is nite on B i there exists Xn B such that Xn X and (Xn ) < . Corollary 5.12 ( Finite Regularity Result). Theorem 5.10 continues to hold under the weaker assumption that : B [0, ] is a measure which is nite on A. Proof. Let Xn A such that n=1 Xn = X and (Xn ) < for all n.Since A B n (A) := (Xn A) is a nite measure on A B for each n, by Theorem 5.10, for every B B there exists Cn A such that B Cn and (Xn [Cn \ B ]) = n (Cn \ B ) < 2n . Now let C := n=1 [Xn Cn ] A and observe that B C and (C \ B ) = ( n=1 ([Xn Cn ] \ B ))

n=1

([Xn Cn ] \ B ) =
n=1

(Xn [Cn \ B ]) < .

Applying this result to B c shows there exists D A such that B c D and (B \ Dc ) = (D \ B c ) < . So if we let A := Dc A , then A B C and (C \ A) = ([B \ A] [(C \ B ) \ A]) (B \ A) + (C \ B ) < 2 and the result is proved.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

38

5 Countably Additive Measures

Exercise 5.1. Suppose A 2X is an algebra and and are two measures on B = (A) . a. Suppose that and are nite measures such that = on A. Show = . b. Generalize the previous assertion to the case where you only assume that and are nite on A. Corollary 5.13. Suppose A 2X is an algebra and : B = (A) [0, ] is a measure which is nite on A. Then for all B B, there exists A A and C A such that A B C and (C \ A) = 0. Proof. By Theorem 5.10, given B B , we may choose An A and Cn A such that An B Cn and (Cn \ B ) 1/n and (B \ An ) 1/n. N By replacing AN by N n=1 An and CN by n=1 Cn , we may assume that An and Cn as n increases. Let A = An A and C = Cn A , then A B C and (C \ A) = (C \ B ) + (B \ A) (Cn \ B ) + (B \ An ) 2/n 0 as n .

Proposition 5.15. Let be a premeasure on an algebra A, then has a unique extension (still called ) to a function on A satisfying the following properties. 1. (Continuity) If An A and An A A , then (An ) (A) as n . 2. (Monotonicity) If A, B A with A B then (A) (B ) . 3. (Strong Additivity) If A, B A , then (A B ) + (A B ) = (A) + (B ) . (5.3)

4. (Sub-Additivity on A ) The function is sub-additive on A , i.e. if {An }n=1 A , then

( n=1 An )
n=1

(An ) .

(5.4)

5. ( - Additivity on A ) The function is countably additive on A . Proof. Let A, B be sets in A such that A B and suppose {An }n=1 and {Bn }n=1 are sequences in A such that An A and Bn B as n . Since Bm An An as m , the continuity of on A implies, (An ) = lim (Bm An ) lim (Bm ) .
m m

Exercise 5.2. Let B = BRn = ({open subsets of Rn }) be the Borel algebra on Rn and be a probability measure on B . Further, let B0 denote those sets B B such that for every > 0 there exists F B V such that F is closed, V is open, and (V \ F ) < . Show: 1. B0 contains all closed subsets of B . Hint: given a closed subset, F R and k N, let Vk := xF B (x, 1/k ) , where B (x, ) := {y Rn : |y x| < } . Show, Vk F as k . 2. Show B0 is a algebra and use this along with the rst part of this exercise to conclude B = B 0 . Hint: follow closely the method used in the rst step of the proof of Theorem 5.10. 3. Show for every > 0 and B B , there exist a compact subset, K Rn , such that K B and (B \ K ) < . Hint: take K := F {x Rn : |x| n} for some suciently large n.
n

We may let n in this inequality to nd,


n

lim (An ) lim (Bm ) .


m

(5.5)

Using this equation when B = A, implies, limn (An ) = limm (Bm ) whenever An A and Bn A. Therefore it is unambiguous to dene (A) by; (A) = lim (An )
n

for any sequence A such that An A. With this denition, the continuity of is clear and the monotonicity of follows from Eq. (5.5). Suppose that A, B A and {An }n=1 and {Bn }n=1 are sequences in A such that An A and Bn B as n . Then passing to the limit as n in the identity, (An Bn ) + (An Bn ) = (An ) + (Bn )

{An }n=1

5.4 Construction of Measures


Remark 5.14. Let us recall from Proposition 4.3 and Remark 4.4 that a nitely additive measure : A [0, ] is a premeasure on A i (An ) (A) for all {An }n=1 A such that An A A. Furthermore if (X ) < , then is a premeasure on A i (An ) 0 for all {An }n=1 A such that An .
Page: 38 job: prob

proves Eq. (5.3). In particular, it follows that is nitely additive on A . Let {An }n=1 be any sequence in A and choose {An,i }i=1 A such that An,i An as i . Then we have,
N N

N n=1 An,N
n=1

(An,N )
n=1

(An )
n=1

(An ) .

(5.6)

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

5.4 Construction of Measures


Since A N n=1 An,N n=1 An A , we may let N in Eq. (5.6) to conclude Eq. (5.4) holds. If we further assume that {An }n=1 A is a disjoint sequence, by the nite additivity and monotonicity of on A , we have N

39

(An ) = lim
n=1

(An ) = lim N n=1 An (n=1 An ) . n=1 N

Denition 5.17 (Measurable Sets). Suppose is a nite premeasure on an algebra A 2X . We say that B X is measurable if for all > 0 there exists A A and C A such that A B C and (C \ A) < . We will denote the collection of measurable subsets of X by B = B () . We also dene : B [0, (X )] by (B ) = inf { (C ) : B C A } . (5.8)

The previous two inequalities show is additive on A . Suppose is a nite premeasure on an algebra, A 2X , and A A A . Since A, Ac A and X = A Ac , it follows that (X ) = (A) + (Ac ) . From this observation we may extend to a function on A A by dening (A) := (X ) (Ac ) for all A A . (5.7)

Remark 5.18. If B B, > 0, A A and C A are such that A B C and (C \ A) < , then (A) (B ) (C ) and in particular, 0 (B ) (A) < , and 0 (C ) (B ) < . Indeed, if C A with B C , then A C and so by Lemma 5.16, (A) (C \ A) + (A) = (C ) from which it follows that (A) (B ) . The fact that (B ) (C ) follows directly from Eq. (5.8). Theorem 5.19 (Finite Premeasure Extension Theorem). Suppose is a nite premeasure on an algebra A 2X . Then B is a algebra on X which contains A and is a additive measure on B . Moreover, is the unique measure on B such that |A = . Proof. It is clear that A B and that B is closed under complementation. Now suppose that Bi B for i = 1, 2 and > 0 is given. We may then choose Ai Bi Ci such that Ai A , Ci A , and (Ci \ Ai ) < for i = 1, 2. Then with A = A1 A2 , B = B1 B2 and C = C1 C2 , we have A A B C A . Since C \ A = (C1 \ A) (C2 \ A) (C1 \ A1 ) (C2 \ A2 ) , it follows from the sub-additivity of that with (C \ A) (C1 \ A1 ) + (C2 \ A2 ) < 2. Since > 0 was arbitrary, we have shown that B B . Hence we now know that B is an algebra. Because B is an algebra, to verify that B is a algebra it suces to show that B = n=1 Bn B whenever {Bn }n=1 is a disjoint sequence in B . To prove B B , let > 0 be given and choose Ai Bi Ci such that Ai A , Ci A , and (Ci \ Ai ) < 2i for all i. Since the {Ai }i=1 are pairwise disjoint we may use Lemma 5.16 to show,
macro: svmonob.cls date/time: 23-Feb-2007/15:20

(5.9)

Lemma 5.16. Suppose is a nite premeasure on an algebra, A 2X , and has been extended to A A as described in Proposition 5.15 and Eq. (5.7) above. 1. If A A and An A such that An A, then (A) = limn (An ) . 2. is additive when restricted to A . 3. If A A and C A such that A C, then (C \ A) = (C ) (A) . Proof.
c 1. Since Ac n A A , by the denition of (A) and Proposition 5.15 it follows that

(A) = (X ) (Ac ) = (X ) lim (Ac n)


n

= lim [ (X ) (Ac n )] = lim (An ) .


n n

2. Suppose A, B A are disjoint sets and An , Bn A such that An A and Bn B, then An Bn A B and therefore, (A B ) = lim (An Bn ) = lim [ (An ) + (Bn ) (An Bn )]
n n

= (A) + (B ) wherein the last equality we have used Proposition 4.3. 3. By assumption, X = Ac C. So applying the strong additivity of on A in Eq. (5.3) with A Ac A and B C A shows (X ) + (C \ A) = (Ac C ) + (Ac C ) = (Ac ) + (C ) = (X ) (A) + (C ) .

Page: 39

job: prob

40

5 Countably Additive Measures


n n

(Ci ) =
i=1 i=1

( (Ai ) + (Ci \ Ai ))
n n

Theorem 5.20. Suppose that is a nite premeasure on an algebra A. Then (B ) := inf { (C ) : B C A } B (A) (5.11) 2i . denes a measure on (A) and this measure is the unique extension of on A to a measure on (A) . Proof. Let {Xn }n=1 A be chosen so that (Xn ) < for all n and Xn X as n and let (5.10) n (A) := n (A Xn ) for all A A.
n i=1

(n i=1 Ai )

+
i=1

(Ci \ Ai ) (X ) +
i=1

Passing to the limit, n , in this equation then shows

(Ci ) (X ) + < .
i=1 n Let B = i=1 Bi , C := i=1 Ci A and for n N let A := n n Then A A B C A , C \ A A and

Ai A .

n n C \ An = i=1 (Ci \ A ) [i=1 (Ci \ Ai )] i=n+1 Ci A .

Therefore, using the sub-additivity of on A and the estimate (5.10),


n

(C \ An )
i=1

(Ci \ Ai ) +
i=n+1

(Ci )

Each n is a premeasure (as is easily veried) on A and hence by Theorem 5.19 each n has an extension, n , to a measure on (A) . Since the measure n are increasing, := limn n is a measure which extends . The proof will be completed by verifying that Eq. (5.11) holds. Let B (A) , Bm = Xm B and > 0 be given. By Theorem 5.19, there exists Cm A such that Bm Cm Xm and (Cm \ Bm ) = m (Cm \ Bm ) < 2n . Then C := m=1 Cm A and

(C \ B ) (Ci ) as n .
m=1

(Cm \ B )

m=1

(Cm \ B )
m=1

(Cm \ Bm ) < .

+
i=n+1

Thus (B ) (C ) = (B ) + (C \ B ) (B ) + which, since > 0 is arbitrary, shows satises Eq. (5.11). The uniqueness of the extension is proved in Exercise 5.1. Example 5.21. If F (x) = x for all x R, we denote F by m and call m Lebesgue measure on (R, BR ) . 2i < .
i=1

Since > 0 is arbitrary, it follows that B B . Moreover by repeated use of Remark 5.18, we nd

| (B ) (An )| < +
i=n+1 n n

(Ci ) and
n n

(Bi ) (An ) =
i=1 i=1

[ (Bi ) (Ai )]
i=1

| (Bi ) (Ai )|

Combining these estimates shows


n

Theorem 5.22. Lebesgue measure m is invariant under translations, i.e. for B BR and x R, m(x + B ) = m(B ). (5.12) Moreover, m is the unique measure on BR such that m((0, 1]) = 1 and Eq. (5.12) holds for B BR and x R. Moreover, m has the scaling property m(B ) = || m(B ) (5.13)

(B )
i=1

(Bi ) < 2 +
i=n+1

(Ci )

which upon letting n gives,

where R, B BR and B := {x : x B }. (Bi ) 2. Proof. Let mx (B ) := m(x + B ), then one easily shows that mx is a measure on BR such that mx ((a, b]) = b a for all a < b. Therefore, mx = m by the uniqueness assertion in Exercise 5.1. For the converse, suppose that m is translation invariant and m((0, 1]) = 1. Given n N, we have
macro: svmonob.cls date/time: 23-Feb-2007/15:20

(B )
i=1

Since > 0 is arbitrary, we have shown (B ) = i=1 (Bi ) . This completes the proof that B is a - algebra and that is a measure on B .
Page: 40 job: prob

5.5 Completions of Measure Spaces

41

(0, 1] = n k=1 ( Therefore,

k1 k , ] = n k=1 n n

k1 1 + (0, ] . n n

Denition 5.24. A measure space (X, B , ) is complete if every subset of a null set is in B , i.e. for all F X such that F E B with (E ) = 0 implies that F B . Proposition 5.25 (Completion of a Measure). Let (X, B , ) be a measure space. Set N = N := {N X : F B such that N F and (F ) = 0} , := {A N : A B and N N } and B=B (A N ) := (A) for A B and N N , is a algebra, , see Fig. 5.1. Then B is a well dened measure on B is the which extends on B , and (X, B , unique measure on B ) is complete measure , is called the completion of B relative to and space. The -algebra, B , is called the completion of . . Let A B and N N and choose F B such Proof. Clearly X, B

1 = m((0, 1]) =
k=1 n

1 k1 + (0, ] n n

=
k=1

1 1 m((0, ]) = n m((0, ]). n n

That is to say 1 ]) = 1/n. n l ]) = l/n for all l, n N and therefore by the translation Similarly, m((0, n invariance of m, m((0, m((a, b]) = b a for all a, b Q with a < b. Finally for a, b R such that a < b, choose an , bn Q such that bn b and an a, then (an , bn ] (a, b] and thus m((a, b]) = lim m((an , bn ]) = lim (bn an ) = b a,
n n

i.e. m is Lebesgue measure. To prove Eq. (5.13) we may assume that = 0 1 since this case is trivial to prove. Now let m (B ) := || m(B ). It is easily checked that m is again a measure on BR which satises m ((a, b]) = 1 m ((a, b]) = 1 (b a) = b a if > 0 and m ((a, b]) = || if < 0. Hence m = m.
1

Fig. 5.1. Completing a algebra.

m ([b, a)) = ||

(b a) = b a that N F and (F ) = 0. Since N c = (F \ N ) F c , (A N )c = Ac N c = Ac (F \ N F c ) = [Ac (F \ N )] [Ac F c ] is closed under where [Ac (F \ N )] N and [Ac F c ] B . Thus B complements. If Ai B and Ni Fi B such that (Fi ) = 0 then since Ai B and Ni Fi and (Ai Ni ) = (Ai ) (Ni ) B (Fi ) (Fi ) = 0. Therefore, B is a algebra. Suppose A N1 = B N2 with A, B B and N1 , N2 , N . Then A A N1 A N1 F2 = B F2 which shows that (A) (B ) + (F2 ) = (B ).
macro: svmonob.cls date/time: 23-Feb-2007/15:20

5.5 Completions of Measure Spaces


Denition 5.23. A set E X is a null set if E B and (E ) = 0. If P is some property which is either true or false for each x X, we will use the terminology P a.e. (to be read P almost everywhere) to mean E := {x X : P is false for x} is a null set. For example if f and g are two measurable functions on (X, B , ), f = g a.e. means that (f = g ) = 0.
Page: 41 job: prob

42

5 Countably Additive Measures

Similarly, we show that (B ) (A) so that (A) = (B ) and hence (A N ) := (A) is well dened. It is left as an exercise to show is a measure, i.e. that it is countable additive.

Theorem 5.27 (Kolmogorovs Extension Theorem I.). Continuing the notation above, every nitely additive probability measure, P : A [0, 1] , has a unique extension to a probability measure on (A) . Proof. From Theorem 5.19, it suces to show limn P (An ) = 0 whenever {An }n=1 A with An . However, by Lemma 5.26, if An A and An , we must have that An = for a.a. n and in particular P (An ) = 0 for a.a. n. This certainly implies limn P (An ) = 0. Given a probability measure, P : (A) [0, 1] and n N and (1 , . . . , n ) n , let pn (1 , . . . , n ) := P ({ : 1 = 1 , . . . , n = n }) . (5.15)

5.6 A Baby Version of Kolmogorovs Extension Theorem


For this section, let be a nite set, := := N , and let A denote the collection of cylinder subsets of , where A is a cylinder set i there exists n N and B n such that A = B := { : (1 , . . . , n ) B } . Observe that we may also write A as A = B where B = B k n+k for any k 0. Exercise 5.3. Show A is an algebra. Lemma 5.26. Suppose {An }n=1 A is a decreasing sequence of non-empty cylinder sets, then n=1 An = . Proof. Since An A, we may nd Nn N and Bn Nn such that An = Bn . Using the observation just prior to this Lemma, we may assume that {Nn }n=1 is a strictly increasing sequence. By assumption, there exists (n) = (1 (n) , 2 (n) , . . . ) such that (n) An for all n. Moreover, since (n) An Ak for all k n, it follows that (1 (n) , 2 (n) , . . . , Nk (n)) Bk for all k n. (5.14) Since is a nite set, we may nd a 1 and an innite subset, 1 N such that 1 (n) = 1 for all n 1. Similarly, there exists 2 and an innite set, 2 1 , such that 2 (n) = 2 for all n 2 . Continuing this procedure inductively, there exists (for all j N) innite subsets, j N and points j such that 1 2 3 . . . and j (n) = j for all n j . We are now going to complete the proof by showing that := (1 , 2 , . . . ) is in n=1 An . By the construction above, for all N N we have (1 (n) , . . . , N (n)) = (1 , . . . , N ) for all n N . Taking N = Nk and n Nk with n k, we learn from Eq. (5.14) that (1 , . . . , Nk ) = (1 (n) , . . . , Nk (n)) Bk . But this is equivalent to showing Ak . Since k N was arbitrary it follows that n=1 An .
Page: 42 job: prob

Exercise 5.4 (Consistency Conditions). If pn is dened as above, show: 1. p1 () = 1 and 2. for all n N and (1 , . . . , n ) n , pn (1 , . . . , n ) =

pn+1 (1 , . . . , n , ) .

Exercise 5.5 (Converse to 5.4). Suppose for each n N we are given functions, pn : n [0, 1] such that the consistency conditions in Exercise 5.4 hold. Then there exists a unique probability measure, P on (A) such that Eq. (5.15) holds for all n N and (1 , . . . , n ) n . Example 5.28 (Existence of iid simple R.V.s). Suppose now that q : [0, 1] is a function such that q () = 1. Then there exists a unique probability measure P on (A) such that, for all n N and (1 , . . . , n ) n , we have P ({ : 1 = 1 , . . . , n = n }) = q (1 ) . . . q (n ) . This is a special case of Exercise 5.5 with pn (1 , . . . , n ) := q (1 ) . . . q (n ) .

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

6 Random Variables
6.1 Measurable Functions
Denition 6.1. A measurable space is a pair (X, M), where X is a set and M is a algebra on X. To motivate the notion of a measurable function, suppose (X, M, ) is a measure space and f : X R+ is a function. Roughly speaking, we are going to dene f d as a certain limit of sums of the form,
X

Proof. If f is M/F measurable, then f 1 (E ) f 1 (F ) M. Conversely if f 1 (E ) M, then, using Lemma 3.26, f 1 (F ) = f 1 ( (E )) = f 1 (E ) M.

Corollary 6.7. Suppose that (X, M) is a measurable space. Then the following conditions on a function f : X R are equivalent: 1. f is (M, BR ) measurable, 2. f 1 ((a, )) M for all a R, 3. f 1 ((a, )) M for all a Q, 4. f 1 ((, a]) M for all a R. Exercise 6.1. Prove Corollary 6.7. Hint: See Exercise 3.9. Exercise 6.2. If M is the algebra generated by E 2X , then M is the union of the algebras generated by countable subsets F E . Exercise 6.3. Let (X, M) be a measure space and fn : X R be a sequence of measurable functions on X. Show that {x : limn fn (x) exists in R} M. Exercise 6.4. Show that every monotone function f : R R is (BR , BR ) measurable. Denition 6.8. Given measurable spaces (X, M) and (Y, F ) and a subset A X. We say a function f : A Y is measurable i f is MA /F measurable. Proposition 6.9 (Localizing Measurability). Let (X, M) and (Y, F ) be measurable spaces and f : X Y be a function. 1. If f is measurable and A X then f |A : A Y is measurable. 2. Suppose there exist An M such that X = n=1 An and f |An is MAn measurable for all n, then f is M measurable. Proof. 1. If f : X Y is measurable, f 1 (B ) M for all B F and therefore 1 1 f | (B ) MA for all B F . A (B ) = A f

ai (f 1 (ai , ai+1 ]).


0<a1 <a2 <a3 <...

For this to make sense we will need to require f 1 ((a, b]) M for all a < b. Because of Corollary 6.7 below, this last condition is equivalent to the condition f 1 (BR ) M. Denition 6.2. Let (X, M) and (Y, F ) be measurable spaces. A function f : X Y is measurable of more precisely, M/F measurable or (M, F ) measurable, if f 1 (F ) M, i.e. if f 1 (A) M for all A F . Remark 6.3. Let f : X Y be a function. Given a algebra F 2 , the algebra M := f 1 (F ) is the smallest algebra on X such that f is (M, F ) - measurable . Similarly, if M is a - algebra on X then F = f M ={A 2Y |f 1 (A) M} is the largest algebra on Y such that f is (M, F ) - measurable. Example 6.4 (Characteristic Functions). Let (X, M) be a measurable space and 1 A X. Then 1A is (M, BR ) measurable i A M. Indeed, 1 A (W ) is either 1 c , X, A or A for any W R with 1A ({1}) = A. Example 6.5. Suppose f : X Y with Y being a nite set and F = 2 . Then f is measurable i f 1 ({y }) M for all y Y. Proposition 6.6. Suppose that (X, M) and (Y, F ) are measurable spaces and further assume E F generates F , i.e. F = (E ) . Then a map, f : X Y is measurable i f 1 (E ) M.
Y

44

6 Random Variables

2. If B F , then
1 1 f 1 (B ) = (B ) An = n=1 f n=1 f |An (B ).

Since each An M, MAn M and so the previous displayed equation shows f 1 (B ) M. The proof of the following exercise is routine and will be left to the reader. Proposition 6.10. Let (X, M, ) be a measure space, (Y, F ) be a measurable space and f : X Y be a measurable map. Dene a function : F [0, ] by (A) := (f 1 (A)) for all A F . Then is a measure on (Y, F ) . (In the future we will denote by f or f 1 and call f the push-forward of by f or the law of f under . Theorem 6.11. Given a distribution function, F : R [0, 1] let G : (0, 1) R be dened (see Figure 6.1) by, G (y ) := inf {x : F (x) y } . Then G : (0, 1) R is Borel measurable and G m = F where F is the unique measure on (R, BR ) such that F ((a, b]) = F (b) F (a) for all < a < b < . To give a formal proof of Eq. (6.1), G (y ) = inf {x : F (x) y } x0 , there exists xn x0 with xn x0 such that F (xn ) y. By the right continuity of F, it follows that F (x0 ) y. Thus we have shown {G x0 } (0, F (x0 )] (0, 1) . For the converse, if y F (x0 ) then G (y ) = inf {x : F (x) y } x0 , i.e. y {G x0 } . Indeed, y G1 ((, x0 ]) i G (y ) x0 . Observe that G (F (x0 )) = inf {x : F (x) F (x0 )} x0 and hence G (y ) x0 whenever y F (x0 ) . This shows that (0, F (x0 )] (0, 1) G1 ((0, x0 ]) . As a consequence we have G m = F . Indeed, (G m) ((, x]) = m G1 ((, x]) = m ({y (0, 1) : G (y ) x}) = m ((0, F (x)] (0, 1)) = F (x) .
Fig. 6.1. A pictorial denition of G.

Fig. 6.2. As can be seen from this picture, G (y ) x0 i y F (x0 ) and similalry, G (y ) x1 i y x1 .

See section 2.5.2 on p. 61 of Resnick for more details. Theorem 6.12 (Durrets Version). Given a distribution function, F : R [0, 1] let Y : (0, 1) R be dened (see Figure 6.3) by, Y (x) := sup {y : F (y ) < x} . Then Y : (0, 1) R is Borel measurable and Y m = F where F is the unique measure on (R, BR ) such that F ((a, b]) = F (b) F (a) for all < a < b < .
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Proof. Since G : (0, 1) R is a non-decreasing function, G is measurable. We also claim that, for all x0 R, that G1 ((0, x0 ]) = {y : G (y ) x0 } = (0, F (x0 )] R, see Figure 6.2.
Page: 44 job: prob

(6.1)

Lemma 6.13 (Composing Measurable Functions). Suppose that $(X, \mathcal{M})$, $(Y, \mathcal{F})$ and $(Z, \mathcal{G})$ are measurable spaces. If $f : (X, \mathcal{M}) \to (Y, \mathcal{F})$ and $g : (Y, \mathcal{F}) \to (Z, \mathcal{G})$ are measurable functions then $g \circ f : (X, \mathcal{M}) \to (Z, \mathcal{G})$ is measurable as well.

Proof. By assumption $g^{-1}(\mathcal{G}) \subset \mathcal{F}$ and $f^{-1}(\mathcal{F}) \subset \mathcal{M}$ so that
$$(g \circ f)^{-1}(\mathcal{G}) = f^{-1}\left(g^{-1}(\mathcal{G})\right) \subset f^{-1}(\mathcal{F}) \subset \mathcal{M}.$$

Definition 6.14 ($\sigma$-Algebras Generated by Functions). Let $X$ be a set and suppose there is a collection of measurable spaces $\{(Y_\alpha, \mathcal{F}_\alpha) : \alpha \in A\}$ and functions $f_\alpha : X \to Y_\alpha$ for all $\alpha \in A$. Let $\sigma(f_\alpha : \alpha \in A)$ denote the smallest $\sigma$-algebra on $X$ such that each $f_\alpha$ is measurable, i.e.
$$\sigma(f_\alpha : \alpha \in A) = \sigma\left(\cup_\alpha f_\alpha^{-1}(\mathcal{F}_\alpha)\right).$$

Example 6.15. Suppose that $Y$ is a finite set, $\mathcal{F} = 2^Y$, and $X = Y^N$ for some $N \in \mathbb{N}$. Let $\pi_i : Y^N \to Y$ be the projection maps, $\pi_i(y_1, \dots, y_N) = y_i$. Then, as the reader should check,
$$\sigma(\pi_1, \dots, \pi_n) = \left\{A \times Y^{N-n} : A \subset Y^n\right\}.$$

Proposition 6.16. Assuming the notation in Definition 6.14 and additionally let $(Z, \mathcal{M})$ be a measurable space and $g : Z \to X$ be a function. Then $g$ is $(\mathcal{M}, \sigma(f_\alpha : \alpha \in A))$-measurable iff $f_\alpha \circ g$ is $(\mathcal{M}, \mathcal{F}_\alpha)$-measurable for all $\alpha \in A$.

Proof. ($\Rightarrow$) If $g$ is $(\mathcal{M}, \sigma(f_\alpha : \alpha \in A))$-measurable, then the composition $f_\alpha \circ g$ is $(\mathcal{M}, \mathcal{F}_\alpha)$-measurable by Lemma 6.13. ($\Leftarrow$) Let
$$\mathcal{G} = \sigma(f_\alpha : \alpha \in A) = \sigma\left(\cup_{\alpha\in A} f_\alpha^{-1}(\mathcal{F}_\alpha)\right).$$
If $f_\alpha \circ g$ is $(\mathcal{M}, \mathcal{F}_\alpha)$-measurable for all $\alpha$, then
$$g^{-1}\left(f_\alpha^{-1}(\mathcal{F}_\alpha)\right) \subset \mathcal{M} \text{ for all } \alpha \in A$$
and therefore
$$g^{-1}\left(\cup_{\alpha\in A} f_\alpha^{-1}(\mathcal{F}_\alpha)\right) = \cup_{\alpha\in A}\, g^{-1}\left(f_\alpha^{-1}(\mathcal{F}_\alpha)\right) \subset \mathcal{M}.$$
Hence
$$g^{-1}(\mathcal{G}) = g^{-1}\left(\sigma\left(\cup_{\alpha\in A} f_\alpha^{-1}(\mathcal{F}_\alpha)\right)\right) = \sigma\left(g^{-1}\left(\cup_{\alpha\in A} f_\alpha^{-1}(\mathcal{F}_\alpha)\right)\right) \subset \mathcal{M},$$
which shows that $g$ is $(\mathcal{M}, \mathcal{G})$-measurable.

Definition 6.17. A function $f : X \to Y$ between two topological spaces is Borel measurable if $f^{-1}(\mathcal{B}_Y) \subset \mathcal{B}_X$.

Proposition 6.18. Let $X$ and $Y$ be two topological spaces and $f : X \to Y$ be a continuous function. Then $f$ is Borel measurable.

Proof. Using Lemma 3.26 and $\mathcal{B}_Y = \sigma(\tau_Y)$,
$$f^{-1}(\mathcal{B}_Y) = f^{-1}(\sigma(\tau_Y)) = \sigma\left(f^{-1}(\tau_Y)\right) \subset \sigma(\tau_X) = \mathcal{B}_X.$$

Example 6.19. For $i = 1, 2, \dots, n$, let $\pi_i : \mathbb{R}^n \to \mathbb{R}$ be defined by $\pi_i(x) = x_i$. Then each $\pi_i$ is continuous and therefore $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_{\mathbb{R}}$-measurable.

Lemma 6.20. Let $\mathcal{E}$ denote the collection of open rectangles in $\mathbb{R}^n$; then $\mathcal{B}_{\mathbb{R}^n} = \sigma(\mathcal{E})$. We also have that $\mathcal{B}_{\mathbb{R}^n} = \sigma(\pi_1, \dots, \pi_n)$ and in particular, $A_1 \times \cdots \times A_n \in \mathcal{B}_{\mathbb{R}^n}$ whenever $A_i \in \mathcal{B}_{\mathbb{R}}$ for $i = 1, 2, \dots, n$. Therefore $\mathcal{B}_{\mathbb{R}^n}$ may be described as the $\sigma$-algebra generated by $\{A_1 \times \cdots \times A_n : A_i \in \mathcal{B}_{\mathbb{R}}\}$.

Proof. Assertion 1. Since $\mathcal{E} \subset \mathcal{B}_{\mathbb{R}^n}$, it follows that $\sigma(\mathcal{E}) \subset \mathcal{B}_{\mathbb{R}^n}$. Let
$$\mathcal{E}_0 := \{(a,b) : a, b \in \mathbb{Q}^n \text{ with } a < b\},$$
where, for $a, b \in \mathbb{R}^n$, we write $a < b$ iff $a_i < b_i$ for $i = 1, 2, \dots, n$ and let
$$(a,b) = (a_1, b_1) \times \cdots \times (a_n, b_n). \tag{6.3}$$
Since every open set, $V \subset \mathbb{R}^n$, may be written as a (necessarily) countable union of elements from $\mathcal{E}_0$, we have $V \in \sigma(\mathcal{E}_0) \subset \sigma(\mathcal{E})$, i.e. $\sigma(\mathcal{E}_0)$ and hence $\sigma(\mathcal{E})$ contains all open subsets of $\mathbb{R}^n$. Hence we may conclude that
$$\mathcal{B}_{\mathbb{R}^n} = \sigma(\text{open sets}) \subset \sigma(\mathcal{E}_0) \subset \sigma(\mathcal{E}) \subset \mathcal{B}_{\mathbb{R}^n}.$$
Assertion 2. Since each $\pi_i$ is $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_{\mathbb{R}}$-measurable, it follows that $\sigma(\pi_1, \dots, \pi_n) \subset \mathcal{B}_{\mathbb{R}^n}$. Moreover, if $(a,b)$ is as in Eq. (6.3), then
$$(a,b) = \cap_{i=1}^n \pi_i^{-1}((a_i, b_i)) \in \sigma(\pi_1, \dots, \pi_n).$$
Therefore, $\mathcal{E} \subset \sigma(\pi_1, \dots, \pi_n)$ and $\mathcal{B}_{\mathbb{R}^n} = \sigma(\mathcal{E}) \subset \sigma(\pi_1, \dots, \pi_n)$.
Assertion 3. If $A_i \in \mathcal{B}_{\mathbb{R}}$ for $i = 1, 2, \dots, n$, then
$$A_1 \times \cdots \times A_n = \cap_{i=1}^n \pi_i^{-1}(A_i) \in \sigma(\pi_1, \dots, \pi_n) = \mathcal{B}_{\mathbb{R}^n}.$$

Corollary 6.21. If $(X, \mathcal{M})$ is a measurable space, then $f = (f_1, f_2, \dots, f_n) : X \to \mathbb{R}^n$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}^n})$-measurable iff $f_i : X \to \mathbb{R}$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable for each $i$. In particular, a function $f : X \to \mathbb{C}$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{C}})$-measurable iff $\operatorname{Re} f$ and $\operatorname{Im} f$ are $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable.

Proof. This is an application of Lemma 6.20 and Proposition 6.16.

Corollary 6.22. Let $(X, \mathcal{M})$ be a measurable space and $f, g : X \to \mathbb{C}$ be $(\mathcal{M}, \mathcal{B}_{\mathbb{C}})$-measurable functions. Then $f \pm g$ and $f \cdot g$ are also $(\mathcal{M}, \mathcal{B}_{\mathbb{C}})$-measurable.

Proof. Define $F : X \to \mathbb{C} \times \mathbb{C}$, $A_\pm : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$ and $M : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$ by $F(x) = (f(x), g(x))$, $A_\pm(w,z) = w \pm z$ and $M(w,z) = wz$. Then $A_\pm$ and $M$ are continuous and hence $(\mathcal{B}_{\mathbb{C}^2}, \mathcal{B}_{\mathbb{C}})$-measurable. Also $F$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{C}^2})$-measurable since $\pi_1 \circ F = f$ and $\pi_2 \circ F = g$ are $(\mathcal{M}, \mathcal{B}_{\mathbb{C}})$-measurable. Therefore $A_\pm \circ F = f \pm g$ and $M \circ F = f \cdot g$, being the composition of measurable functions, are also measurable.

As an example of this material, let us give another proof of the existence of i.i.d. simple random variables — see Example 5.28 above.

Theorem 6.23 (Existence of i.i.d simple R.V.s). This theorem has been moved to Theorem 7.22 below.

Corollary 6.24 (Independent variables on product spaces). This corollary has been moved to Corollary 7.23 below.

Lemma 6.25. Let $(X, \mathcal{M})$ be a measurable space and $f : X \to \mathbb{C}$ be a $(\mathcal{M}, \mathcal{B}_{\mathbb{C}})$-measurable function. Then
$$F(x) := \begin{cases} \frac{1}{f(x)} & \text{if } f(x) \ne 0 \\ 0 & \text{if } f(x) = 0 \end{cases}$$
is measurable.

Proof. Define $i : \mathbb{C} \to \mathbb{C}$ by
$$i(z) = \begin{cases} \frac{1}{z} & \text{if } z \ne 0 \\ 0 & \text{if } z = 0. \end{cases}$$
For any open set $V \subset \mathbb{C}$ we have
$$i^{-1}(V) = i^{-1}(V \setminus \{0\}) \cup i^{-1}(V \cap \{0\}).$$
Because $i$ is continuous except at $z = 0$, $i^{-1}(V \setminus \{0\})$ is an open set and hence in $\mathcal{B}_{\mathbb{C}}$. Moreover, $i^{-1}(V \cap \{0\}) \in \mathcal{B}_{\mathbb{C}}$ since $i^{-1}(V \cap \{0\})$ is either the empty set or the one point set $\{0\}$. Therefore $i^{-1}(\tau_{\mathbb{C}}) \subset \mathcal{B}_{\mathbb{C}}$ and hence $i^{-1}(\mathcal{B}_{\mathbb{C}}) = i^{-1}(\sigma(\tau_{\mathbb{C}})) = \sigma(i^{-1}(\tau_{\mathbb{C}})) \subset \mathcal{B}_{\mathbb{C}}$, which shows that $i$ is Borel measurable. Since $F = i \circ f$ is the composition of measurable functions, $F$ is also measurable.

Remark 6.26. For the real case of Lemma 6.25, define $i$ as above but now take $z$ to be real. From the plot of $i$, the reader may easily verify that $i^{-1}((-\infty, a])$ is an infinite half interval for all $a$ and therefore $i$ is measurable.

We will often deal with functions $f : X \to \bar{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$. When talking about measurability in this context we will refer to the $\sigma$-algebra on $\bar{\mathbb{R}}$ defined by
$$\mathcal{B}_{\bar{\mathbb{R}}} := \sigma(\{[a,\infty] : a \in \mathbb{R}\}). \tag{6.4}$$

Proposition 6.27 (The Structure of $\mathcal{B}_{\bar{\mathbb{R}}}$). Let $\mathcal{B}_{\mathbb{R}}$ and $\mathcal{B}_{\bar{\mathbb{R}}}$ be as above, then
$$\mathcal{B}_{\bar{\mathbb{R}}} = \{A \subset \bar{\mathbb{R}} : A \cap \mathbb{R} \in \mathcal{B}_{\mathbb{R}}\}. \tag{6.5}$$
In particular $\{\infty\}, \{-\infty\} \in \mathcal{B}_{\bar{\mathbb{R}}}$ and $\mathcal{B}_{\mathbb{R}} \subset \mathcal{B}_{\bar{\mathbb{R}}}$.

Proof. Let us first observe that
$$\{-\infty\} = \cap_{n=1}^\infty [-\infty, -n) = \cap_{n=1}^\infty [-n, \infty]^c \in \mathcal{B}_{\bar{\mathbb{R}}}, \quad \{\infty\} = \cap_{n=1}^\infty [n, \infty] \in \mathcal{B}_{\bar{\mathbb{R}}}$$
and $\mathbb{R} = \bar{\mathbb{R}} \setminus \{\pm\infty\} \in \mathcal{B}_{\bar{\mathbb{R}}}$. Letting $i : \mathbb{R} \to \bar{\mathbb{R}}$ be the inclusion map,
$$i^{-1}(\mathcal{B}_{\bar{\mathbb{R}}}) = \sigma\left(i^{-1}(\{[a,\infty] : a \in \mathbb{R}\})\right) = \sigma\left(\{i^{-1}([a,\infty]) : a \in \mathbb{R}\}\right) = \sigma(\{[a,\infty] \cap \mathbb{R} : a \in \mathbb{R}\}) = \sigma(\{[a,\infty) : a \in \mathbb{R}\}) = \mathcal{B}_{\mathbb{R}}.$$
Thus we have shown
$$\mathcal{B}_{\mathbb{R}} = i^{-1}(\mathcal{B}_{\bar{\mathbb{R}}}) = \{A \cap \mathbb{R} : A \in \mathcal{B}_{\bar{\mathbb{R}}}\}.$$
This implies:
1. $A \in \mathcal{B}_{\bar{\mathbb{R}}} \implies A \cap \mathbb{R} \in \mathcal{B}_{\mathbb{R}}$ and
2. if $A \subset \bar{\mathbb{R}}$ is such that $A \cap \mathbb{R} \in \mathcal{B}_{\mathbb{R}}$, there exists $B \in \mathcal{B}_{\bar{\mathbb{R}}}$ such that $A \cap \mathbb{R} = B \cap \mathbb{R}$. Because $A \Delta B \subset \{\pm\infty\}$ and $\{\infty\}, \{-\infty\} \in \mathcal{B}_{\bar{\mathbb{R}}}$ we may conclude that $A \in \mathcal{B}_{\bar{\mathbb{R}}}$ as well.
This proves Eq. (6.5).

Corollary 6.28. Let $(X, \mathcal{M})$ be a measurable space and $f : X \to \bar{\mathbb{R}}$ be a function. Then the following are equivalent:
1. $f$ is $(\mathcal{M}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable,
2. $f^{-1}((a,\infty]) \in \mathcal{M}$ for all $a \in \mathbb{R}$,
3. $f^{-1}([-\infty, a]) \in \mathcal{M}$ for all $a \in \mathbb{R}$,
4. $f^{-1}(\{-\infty\}) \in \mathcal{M}$, $f^{-1}(\{\infty\}) \in \mathcal{M}$ and $f^0 : X \to \mathbb{R}$ defined by
$$f^0(x) := 1_{\mathbb{R}}(f(x))\,f(x) = \begin{cases} f(x) & \text{if } f(x) \in \mathbb{R} \\ 0 & \text{if } f(x) \in \{\pm\infty\} \end{cases}$$
is measurable.

Corollary 6.29. Let $(X, \mathcal{M})$ be a measurable space, $f, g : X \to \bar{\mathbb{R}}$ be functions and define $f \cdot g : X \to \bar{\mathbb{R}}$ and $(f + g) : X \to \bar{\mathbb{R}}$ using the conventions, $0 \cdot \infty = 0$ and $(f+g)(x) = 0$ if $f(x) = \infty$ and $g(x) = -\infty$ or $f(x) = -\infty$ and $g(x) = \infty$. Then $f \cdot g$ and $f + g$ are measurable functions on $X$ if both $f$ and $g$ are measurable.

Exercise 6.5. Prove Corollary 6.28 noting that the equivalence of items 1.–3. is a direct analogue of Corollary 6.7. Use Proposition 6.27 to handle item 4.

Exercise 6.6. Prove Corollary 6.29.

Proposition 6.30 (Closure under sups, infs and limits). Suppose that $(X, \mathcal{M})$ is a measurable space and $f_j : (X, \mathcal{M}) \to \bar{\mathbb{R}}$ for $j \in \mathbb{N}$ is a sequence of $\mathcal{M}/\mathcal{B}_{\bar{\mathbb{R}}}$-measurable functions. Then
$$\sup_j f_j, \quad \inf_j f_j, \quad \limsup_{j\to\infty} f_j \quad \text{and} \quad \liminf_{j\to\infty} f_j$$
are all $\mathcal{M}/\mathcal{B}_{\bar{\mathbb{R}}}$-measurable functions. (Note that this result is in general false when $(X, \mathcal{M})$ is a topological space and measurable is replaced by continuous in the statement.)

Proof. Define $g_+(x) := \sup_j f_j(x)$; then
$$\{x : g_+(x) \le a\} = \{x : f_j(x) \le a \ \forall j\} = \cap_j \{x : f_j(x) \le a\} \in \mathcal{M}$$
so that $g_+$ is measurable. Similarly if $g_-(x) = \inf_j f_j(x)$ then $\{x : g_-(x) \ge a\} = \cap_j \{x : f_j(x) \ge a\} \in \mathcal{M}$. Since
$$\limsup_{j\to\infty} f_j = \inf_n \sup\{f_j : j \ge n\} \quad \text{and} \quad \liminf_{j\to\infty} f_j = \sup_n \inf\{f_j : j \ge n\},$$
we are done by what we have already proved.

Definition 6.31. Given a function $f : X \to \bar{\mathbb{R}}$ let $f_+(x) := \max\{f(x), 0\}$ and $f_-(x) := \max(-f(x), 0) = -\min(f(x), 0)$. Notice that $f = f_+ - f_-$.

Corollary 6.32. Suppose $(X, \mathcal{M})$ is a measurable space and $f : X \to \bar{\mathbb{R}}$ is a function. Then $f$ is measurable iff $f_\pm$ are measurable.

Proof. If $f$ is measurable, then Proposition 6.30 implies $f_\pm$ are measurable. Conversely if $f_\pm$ are measurable then so is $f = f_+ - f_-$.

Definition 6.33. Let $(X, \mathcal{M})$ be a measurable space. A function $\varphi : X \to \mathbb{F}$ ($\mathbb{F}$ denotes either $\mathbb{R}$, $\mathbb{C}$ or $[0,\infty] \subset \bar{\mathbb{R}}$) is a simple function if $\varphi$ is $\mathcal{M}$–$\mathcal{B}_{\mathbb{F}}$ measurable and $\varphi(X)$ contains only finitely many elements. Any such simple function can be written as
$$\varphi = \sum_{i=1}^n \lambda_i 1_{A_i} \text{ with } A_i \in \mathcal{M} \text{ and } \lambda_i \in \mathbb{F}. \tag{6.6}$$
Indeed, take $\lambda_1, \lambda_2, \dots, \lambda_n$ to be an enumeration of the range of $\varphi$ and $A_i = \varphi^{-1}(\{\lambda_i\})$. Note that this argument shows that any simple function may be written intrinsically as
$$\varphi = \sum_{y \in \mathbb{F}} y\, 1_{\varphi^{-1}(\{y\})}. \tag{6.7}$$

The next theorem shows that simple functions are pointwise dense in the space of measurable functions.

Theorem 6.34 (Approximation Theorem). Let $f : X \to [0,\infty]$ be measurable and define, see Figure 6.4,
$$\varphi_n(x) := \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} 1_{f^{-1}\left(\left(\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right)}(x) + 2^n 1_{f^{-1}((2^n,\infty])}(x) = \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} 1_{\left\{\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right\}}(x) + 2^n 1_{\{f > 2^n\}}(x);$$
then $\varphi_n \le f$ for all $n$, $\varphi_n(x) \uparrow f(x)$ for all $x \in X$ and $\varphi_n \to f$ uniformly on the sets $X_M := \{x \in X : f(x) \le M\}$ with $M < \infty$. Moreover, if $f : X \to \mathbb{C}$ is a measurable function, then there exist simple functions $\varphi_n$ such that $\lim_{n\to\infty} \varphi_n(x) = f(x)$ for all $x$ and $|\varphi_n| \uparrow |f|$ as $n \to \infty$.

Fig. 6.4. Constructing simple functions approximating a function, $f : X \to [0,\infty]$.

Proof. Since
$$\left(\frac{k}{2^n}, \frac{k+1}{2^n}\right] = \left(\frac{2k}{2^{n+1}}, \frac{2k+1}{2^{n+1}}\right] \cup \left(\frac{2k+1}{2^{n+1}}, \frac{2k+2}{2^{n+1}}\right],$$
if $x \in f^{-1}\left(\left(\frac{2k}{2^{n+1}}, \frac{2k+1}{2^{n+1}}\right]\right)$ then $\varphi_n(x) = \varphi_{n+1}(x) = \frac{2k}{2^{n+1}}$ and if $x \in f^{-1}\left(\left(\frac{2k+1}{2^{n+1}}, \frac{2k+2}{2^{n+1}}\right]\right)$ then $\varphi_n(x) = \frac{2k}{2^{n+1}} < \frac{2k+1}{2^{n+1}} = \varphi_{n+1}(x)$. Similarly
$$(2^n, \infty] = (2^n, 2^{n+1}] \cup (2^{n+1}, \infty],$$
and so for $x \in f^{-1}((2^{n+1}, \infty])$, $\varphi_n(x) = 2^n < 2^{n+1} = \varphi_{n+1}(x)$ and for $x \in f^{-1}((2^n, 2^{n+1}])$, $\varphi_{n+1}(x) \ge 2^n = \varphi_n(x)$. Therefore $\varphi_n \le \varphi_{n+1}$ for all $n$. It is clear by construction that $\varphi_n(x) \le f(x)$ for all $x$ and that $0 \le f(x) - \varphi_n(x) \le 2^{-n}$ if $x \in X_{2^n}$. Hence we have shown that $\varphi_n(x) \uparrow f(x)$ for all $x \in X$ and $\varphi_n \to f$ uniformly on bounded sets.

For the second assertion, first assume that $f : X \to \mathbb{R}$ is a measurable function and choose $\varphi_n^\pm$ to be simple functions such that $\varphi_n^\pm \uparrow f_\pm$ as $n \to \infty$ and define $\varphi_n = \varphi_n^+ - \varphi_n^-$. Then
$$|\varphi_n| = \varphi_n^+ + \varphi_n^- \le \varphi_{n+1}^+ + \varphi_{n+1}^- = |\varphi_{n+1}|$$
and clearly $|\varphi_n| = \varphi_n^+ + \varphi_n^- \uparrow f_+ + f_- = |f|$ and $\varphi_n = \varphi_n^+ - \varphi_n^- \to f_+ - f_- = f$ as $n \to \infty$. Now suppose that $f : X \to \mathbb{C}$ is measurable. We may now choose simple functions $u_n$ and $v_n$ such that $|u_n| \uparrow |\operatorname{Re} f|$, $|v_n| \uparrow |\operatorname{Im} f|$, $u_n \to \operatorname{Re} f$ and $v_n \to \operatorname{Im} f$ as $n \to \infty$. Let $\varphi_n = u_n + iv_n$; then
$$|\varphi_n|^2 = u_n^2 + v_n^2 \uparrow |\operatorname{Re} f|^2 + |\operatorname{Im} f|^2 = |f|^2$$
and $\varphi_n = u_n + iv_n \to \operatorname{Re} f + i\operatorname{Im} f = f$ as $n \to \infty$.
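The dyadic construction in Theorem 6.34 is easy to compute pointwise. The following sketch (our own illustration, not part of the notes; the function name is ours) evaluates $\varphi_n$ at a point where $f$ takes the value `fx` and displays the convergence $\varphi_n \uparrow f$.

```python
import math

def phi(n, fx):
    """phi_n(x) from Theorem 6.34, evaluated at a point where f(x) = fx.

    Rounds fx down to the dyadic grid k/2^n using the half-open cells
    (k/2^n, (k+1)/2^n], and caps the value at 2^n."""
    if fx > 2 ** n:
        return float(2 ** n)
    k = math.ceil(fx * 2 ** n) - 1       # k satisfies k/2^n < fx <= (k+1)/2^n
    return max(k, 0) / 2 ** n

# phi_n increases to f pointwise, with 0 <= f - phi_n <= 2^{-n} once f <= 2^n.
fx = math.pi
for n in range(1, 8):
    print(n, phi(n, fx), fx - phi(n, fx))
```

Running this shows the error halving at each step once $2^n$ exceeds the value of $f$, which is exactly the uniform bound $0 \le f - \varphi_n \le 2^{-n}$ on $X_{2^n}$ from the proof.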

6.2 Factoring Random Variables

Lemma 6.35. Suppose that $(Y, \mathcal{F})$ is a measurable space and $F : X \to Y$ is a map. Then to every $(\sigma(F), \mathcal{B}_{\bar{\mathbb{R}}})$-measurable function, $H : X \to \bar{\mathbb{R}}$, there is a $(\mathcal{F}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable function $h : Y \to \bar{\mathbb{R}}$ such that $H = h \circ F$.

Proof. First suppose that $H = 1_A$ where $A \in \sigma(F) = F^{-1}(\mathcal{F})$. Let $B \in \mathcal{F}$ such that $A = F^{-1}(B)$; then $1_A = 1_{F^{-1}(B)} = 1_B \circ F$ and hence the lemma is valid in this case with $h = 1_B$. More generally if $H = \sum a_i 1_{A_i}$ is a simple function, then there exist $B_i \in \mathcal{F}$ such that $1_{A_i} = 1_{B_i} \circ F$ and hence $H = h \circ F$ with $h := \sum a_i 1_{B_i}$, a simple function on $Y$. For a general $(\sigma(F), \mathcal{B}_{\bar{\mathbb{R}}})$-measurable function, $H$, from $X \to \bar{\mathbb{R}}$, choose simple functions $H_n$ converging to $H$. Let $h_n$ be simple functions on $Y$ such that $H_n = h_n \circ F$. Then it follows that
$$H = \lim_{n\to\infty} H_n = \limsup_{n\to\infty} H_n = \limsup_{n\to\infty} h_n \circ F = h \circ F$$
where $h := \limsup_{n\to\infty} h_n$, a measurable function from $Y$ to $\bar{\mathbb{R}}$.

The following is an immediate corollary of Proposition 6.16 and Lemma 6.35.

Corollary 6.36. Let $X$ and $A$ be sets, and suppose for $\alpha \in A$ we are given a measurable space $(Y_\alpha, \mathcal{F}_\alpha)$ and a function $f_\alpha : X \to Y_\alpha$. Let $Y := \prod_{\alpha\in A} Y_\alpha$, $\mathcal{F} := \otimes_{\alpha\in A} \mathcal{F}_\alpha$ be the product $\sigma$-algebra on $Y$ and $\mathcal{M} := \sigma(f_\alpha : \alpha \in A)$ be the smallest $\sigma$-algebra on $X$ such that each $f_\alpha$ is measurable. Then the function $F : X \to Y$ defined by $[F(x)]_\alpha := f_\alpha(x)$ for each $\alpha \in A$ is $(\mathcal{M}, \mathcal{F})$-measurable and a function $H : X \to \bar{\mathbb{R}}$ is $(\mathcal{M}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable iff there exists a $(\mathcal{F}, \mathcal{B}_{\bar{\mathbb{R}}})$-measurable function $h$ from $Y$ to $\bar{\mathbb{R}}$ such that $H = h \circ F$.

7 Independence

7.1 π–λ and Monotone Class Theorems

Definition 7.1. Let $\mathcal{C} \subset 2^X$ be a collection of sets.
1. $\mathcal{C}$ is a monotone class if it is closed under countable increasing unions and countable decreasing intersections,
2. $\mathcal{C}$ is a $\pi$-class if it is closed under finite intersections and
3. $\mathcal{C}$ is a $\lambda$-class if $\mathcal{C}$ satisfies the following properties:
a) $X \in \mathcal{C}$
b) If $A, B \in \mathcal{C}$ and $A \subset B$, then $B \setminus A \in \mathcal{C}$. (Closed under proper differences.)
c) If $A_n \in \mathcal{C}$ and $A_n \uparrow A$, then $A \in \mathcal{C}$. (Closed under countable increasing unions.)

Remark 7.2. If $\mathcal{C}$ is a collection of subsets of $\Omega$ which is both a $\lambda$-class and a $\pi$-system then $\mathcal{C}$ is a $\sigma$-algebra. Indeed, since $A^c = X \setminus A$, we see that any $\lambda$-system is closed under complementation. If $\mathcal{C}$ is also a $\pi$-system, it is closed under intersections and therefore $\mathcal{C}$ is an algebra. Since $\mathcal{C}$ is also closed under increasing unions, $\mathcal{C}$ is a $\sigma$-algebra.

Lemma 7.3 (Alternate Axioms for a λ-System*). Suppose that $\mathcal{L} \subset 2^\Omega$ is a collection of subsets of $\Omega$. Then $\mathcal{L}$ is a $\lambda$-class iff $\mathcal{L}$ satisfies the following postulates:
1. $X \in \mathcal{L}$
2. $A \in \mathcal{L}$ implies $A^c \in \mathcal{L}$. (Closed under complementation.)
3. If $\{A_n\}_{n=1}^\infty \subset \mathcal{L}$ are disjoint, then $\cup_{n=1}^\infty A_n \in \mathcal{L}$. (Closed under disjoint unions.)

Proof. Suppose that $\mathcal{L}$ satisfies a.–c. above. Clearly then postulates 1. and 2. hold. Suppose that $A, B \in \mathcal{L}$ such that $A \cap B = \emptyset$, then $A \subset B^c$ and
$$A^c \cap B^c = B^c \setminus A \in \mathcal{L}.$$
Taking complements of this result shows $A \cup B \in \mathcal{L}$ as well. So by induction, $B_m := \cup_{n=1}^m A_n \in \mathcal{L}$. Since $B_m \uparrow \cup_{n=1}^\infty A_n$, it follows from postulate c. that $\cup_{n=1}^\infty A_n \in \mathcal{L}$.
Now suppose that $\mathcal{L}$ satisfies postulates 1.–3. above. Notice that $\emptyset \in \mathcal{L}$ and by postulate 3., $\mathcal{L}$ is closed under finite disjoint unions. Therefore if $A, B \in \mathcal{L}$ with $A \subset B$, then $B^c \in \mathcal{L}$ and $A \cap B^c = \emptyset$ allows us to conclude that $A \cup B^c \in \mathcal{L}$. Taking complements of this result shows $B \setminus A = A^c \cap B \in \mathcal{L}$ as well, i.e. postulate b. holds. If $A_n \in \mathcal{L}$ with $A_n \uparrow A$, then $B_n := A_n \setminus A_{n-1} \in \mathcal{L}$ for all $n$, where by convention $A_0 = \emptyset$. Hence it follows by postulate 3 that $\cup_{n=1}^\infty A_n = \cup_{n=1}^\infty B_n \in \mathcal{L}$.

Theorem 7.4 (Dynkin's π–λ Theorem). If $\mathcal{L}$ is a $\lambda$-class which contains a $\pi$-class, $\mathcal{P}$, then $\sigma(\mathcal{P}) \subset \mathcal{L}$.

Proof. We start by proving the following assertion; for any element $C \in \mathcal{L}$, the collection of sets,
$$\mathcal{L}_C := \{D \in \mathcal{L} : C \cap D \in \mathcal{L}\},$$
is a $\lambda$-system. To prove this claim, observe that: a. $X \in \mathcal{L}_C$, b. if $A \subset B$ with $A, B \in \mathcal{L}_C$, then $A \cap C, B \cap C \in \mathcal{L}$ with $A \cap C \subset B \cap C$ and
$$(B \setminus A) \cap C = [B \cap C] \setminus A = [B \cap C] \setminus [A \cap C] \in \mathcal{L}.$$
Therefore $\mathcal{L}_C$ is closed under proper differences. Finally, c. if $A_n \in \mathcal{L}_C$ with $A_n \uparrow A$, then $A_n \cap C \in \mathcal{L}$ and $A_n \cap C \uparrow A \cap C \in \mathcal{L}$, i.e. $A \in \mathcal{L}_C$. Hence we have verified $\mathcal{L}_C$ is still a $\lambda$-system.
For the rest of the proof, we may assume without loss of generality that $\mathcal{L}$ is the smallest $\lambda$-class containing $\mathcal{P}$ — if not, just replace $\mathcal{L}$ by the intersection of all $\lambda$-classes containing $\mathcal{P}$. Then for $C \in \mathcal{P}$ we know that $\mathcal{L}_C \subset \mathcal{L}$ is a $\lambda$-class containing $\mathcal{P}$ and hence $\mathcal{L}_C = \mathcal{L}$. Since $C \in \mathcal{P}$ was arbitrary, we have shown, $C \cap D \in \mathcal{L}$ for all $C \in \mathcal{P}$ and $D \in \mathcal{L}$. We may now conclude that if $C \in \mathcal{L}$, then $\mathcal{P} \subset \mathcal{L}_C \subset \mathcal{L}$ and hence again $\mathcal{L}_C = \mathcal{L}$. Since $C \in \mathcal{L}$ is arbitrary, we have shown $C \cap D \in \mathcal{L}$ for all $C, D \in \mathcal{L}$, i.e. $\mathcal{L}$ is a $\pi$-system. So by Remark 7.2, $\mathcal{L}$ is a $\sigma$-algebra. Since $\sigma(\mathcal{P})$ is the smallest $\sigma$-algebra containing $\mathcal{P}$ it follows that $\sigma(\mathcal{P}) \subset \mathcal{L}$.

As an immediate corollary, we have the following uniqueness result.

Proposition 7.5. Suppose that $\mathcal{P} \subset 2^\Omega$ is a $\pi$-system. If $P$ and $Q$ are two probability¹ measures on $\sigma(\mathcal{P})$ such that $P = Q$ on $\mathcal{P}$, then $P = Q$ on $\sigma(\mathcal{P})$.

¹ More generally, $P$ and $Q$ could be two measures such that $P(\Omega) = Q(\Omega) < \infty$.

Proof. Let $\mathcal{L} := \{A \in \sigma(\mathcal{P}) : P(A) = Q(A)\}$. One easily shows $\mathcal{L}$ is a $\lambda$-class which contains $\mathcal{P}$ by assumption. Indeed, $\Omega \in \mathcal{L}$; if $A, B \in \mathcal{L}$ with $A \subset B$, then
$$P(B \setminus A) = P(B) - P(A) = Q(B) - Q(A) = Q(B \setminus A)$$
so that $B \setminus A \in \mathcal{L}$; and if $A_n \in \mathcal{L}$ with $A_n \uparrow A$, then $P(A) = \lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} Q(A_n) = Q(A)$, which shows $A \in \mathcal{L}$. Therefore $\sigma(\mathcal{P}) \subset \mathcal{L} \subset \sigma(\mathcal{P})$ and the proof is complete.

Example 7.6. Let $\Omega := \{a, b, c, d\}$ and let $\mu$ and $\nu$ be the probability measures on $2^\Omega$ determined by, $\mu(\{x\}) = \frac{1}{4}$ for all $x \in \Omega$ and $\nu(\{a\}) = \nu(\{d\}) = \frac{1}{8}$ and $\nu(\{b\}) = \nu(\{c\}) = 3/8$. In this example,
$$\mathcal{L} := \left\{A \in 2^\Omega : \mu(A) = \nu(A)\right\}$$
is a $\lambda$-system which is not an algebra. Indeed, $A = \{a,b\}$ and $B = \{a,c\}$ are in $\mathcal{L}$ but $A \cap B \notin \mathcal{L}$.

Exercise 7.1. Suppose that $\mu$ and $\nu$ are two measures on a measure space, $(\Omega, \mathcal{B})$, such that $\mu = \nu$ on a $\pi$-system, $\mathcal{P}$. Further assume $\mathcal{B} = \sigma(\mathcal{P})$ and there exist $\Omega_n \in \mathcal{P}$ such that; i) $\mu(\Omega_n) = \nu(\Omega_n) < \infty$ for all $n$ and ii) $\Omega_n \uparrow \Omega$ as $n \to \infty$. Show $\mu = \nu$ on $\mathcal{B}$. Hint: Consider the measures, $\mu_n(A) := \mu(A \cap \Omega_n)$ and $\nu_n(A) := \nu(A \cap \Omega_n)$.

Solution to Exercise (7.1). Let $\mu_n(A) := \mu(A \cap \Omega_n)$ and $\nu_n(A) = \nu(A \cap \Omega_n)$ for all $A \in \mathcal{B}$. Then $\mu_n$ and $\nu_n$ are finite measures such that $\mu_n(\Omega) = \nu_n(\Omega)$ and $\mu_n = \nu_n$ on $\mathcal{P}$. Therefore by Proposition 7.5, $\mu_n = \nu_n$ on $\mathcal{B}$. So by the continuity properties of $\mu$ and $\nu$, it follows that
$$\mu(A) = \lim_{n\to\infty} \mu(A \cap \Omega_n) = \lim_{n\to\infty} \mu_n(A) = \lim_{n\to\infty} \nu_n(A) = \lim_{n\to\infty} \nu(A \cap \Omega_n) = \nu(A)$$
for all $A \in \mathcal{B}$.

Corollary 7.7. A probability measure, $P$, on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is uniquely determined by its distribution function, $F(x) := P((-\infty, x])$.

Definition 7.8. Suppose that $\{X_i\}_{i=1}^n$ is a sequence of random variables on a probability space, $(\Omega, \mathcal{B}, P)$. The measure, $\mu = P \circ (X_1, \dots, X_n)^{-1}$ on $\mathcal{B}_{\mathbb{R}^n}$ is called the joint distribution of $(X_1, \dots, X_n)$. To be more explicit,
$$\mu(B) := P((X_1, \dots, X_n) \in B) := P(\{\omega \in \Omega : (X_1(\omega), \dots, X_n(\omega)) \in B\})$$
for all $B \in \mathcal{B}_{\mathbb{R}^n}$.

Corollary 7.9. The joint distribution, $\mu$, is uniquely determined from the knowledge of
$$P((X_1, \dots, X_n) \in A_1 \times \cdots \times A_n) \text{ for all } A_i \in \mathcal{B}_{\mathbb{R}}$$
or from the knowledge of
$$P(X_1 \le x_1, \dots, X_n \le x_n) \text{ for all } x = (x_1, \dots, x_n) \in \mathbb{R}^n.$$

Proof. Apply Proposition 7.5 with $\mathcal{P}$ being the $\pi$-systems defined by
$$\mathcal{P} := \{A_1 \times \cdots \times A_n \in \mathcal{B}_{\mathbb{R}^n} : A_i \in \mathcal{B}_{\mathbb{R}}\}$$
for the first case and
$$\mathcal{P} := \{(-\infty, x_1] \times \cdots \times (-\infty, x_n] \in \mathcal{B}_{\mathbb{R}^n} : x_i \in \mathbb{R}\}$$
for the second case.

Definition 7.10. Suppose that $\{X_i\}_{i=1}^n$ and $\{Y_i\}_{i=1}^n$ are two finite sequences of random variables on two probability spaces, $(\Omega, \mathcal{B}, P)$ and $(X, \mathcal{F}, Q)$ respectively. We write $(X_1, \dots, X_n) \stackrel{d}{=} (Y_1, \dots, Y_n)$ if $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ have the same distribution, i.e. if
$$P((X_1, \dots, X_n) \in B) = Q((Y_1, \dots, Y_n) \in B) \text{ for all } B \in \mathcal{B}_{\mathbb{R}^n}.$$
More generally, if $\{X_i\}_{i=1}^\infty$ and $\{Y_i\}_{i=1}^\infty$ are two sequences of random variables on two probability spaces, $(\Omega, \mathcal{B}, P)$ and $(X, \mathcal{F}, Q)$, we write $\{X_i\}_{i=1}^\infty \stackrel{d}{=} \{Y_i\}_{i=1}^\infty$ iff $(X_1, \dots, X_n) \stackrel{d}{=} (Y_1, \dots, Y_n)$ for all $n \in \mathbb{N}$.

Exercise 7.2. Let $\{X_i\}_{i=1}^\infty$ and $\{Y_i\}_{i=1}^\infty$ be two sequences of random variables such that $\{X_i\}_{i=1}^\infty \stackrel{d}{=} \{Y_i\}_{i=1}^\infty$. Let $\{S_n\}_{n=1}^\infty$ and $\{T_n\}_{n=1}^\infty$ be defined by, $S_n := X_1 + \cdots + X_n$ and $T_n := Y_1 + \cdots + Y_n$. Prove the following assertions.
1. Suppose that $f : \mathbb{R}^n \to \mathbb{R}^k$ is a $\mathcal{B}_{\mathbb{R}^n}/\mathcal{B}_{\mathbb{R}^k}$-measurable function, then $f(X_1, \dots, X_n) \stackrel{d}{=} f(Y_1, \dots, Y_n)$.
2. Use your result in item 1. to show $\{S_n\}_{n=1}^\infty \stackrel{d}{=} \{T_n\}_{n=1}^\infty$. Hint: apply item 1. with $k = n$ and a judiciously chosen function, $f : \mathbb{R}^n \to \mathbb{R}^n$.
3. Show $\limsup_{n\to\infty} X_n \stackrel{d}{=} \limsup_{n\to\infty} Y_n$ and similarly that $\liminf_{n\to\infty} X_n \stackrel{d}{=} \liminf_{n\to\infty} Y_n$. Hint: with the aid of the set identity,

$$\left\{\limsup_{n\to\infty} X_n \ge x\right\} = \{X_n \ge x \text{ i.o.}\},$$
show
$$P\left(\limsup_{n\to\infty} X_n \ge x\right) = \lim_{n\to\infty} \lim_{m\to\infty} P\left(\cup_{k=n}^m \{X_k \ge x\}\right).$$
To use this identity you will also need to find $B \in \mathcal{B}_{\mathbb{R}^m}$ such that $\cup_{k=n}^m \{X_k \ge x\} = \{(X_1, \dots, X_m) \in B\}$.

Exercise 7.3. Suppose that $\mathcal{A} \subset 2^\Omega$ is an algebra, $\mathcal{B} := \sigma(\mathcal{A})$, and $P$ is a probability measure on $\mathcal{B}$. Show, using the π–λ theorem, that for every $B \in \mathcal{B}$ and $\varepsilon > 0$ there exists $A \in \mathcal{A}$ such that $P(A \Delta B) < \varepsilon$. Here
$$A \Delta B := (A \setminus B) \cup (B \setminus A)$$
is the symmetric difference of $A$ and $B$.
Hints:
1. It may be useful to observe that
$$1_{A\Delta B} = |1_A - 1_B|$$
so that $P(A \Delta B) = \mathbb{E}|1_A - 1_B|$.
2. Also observe that if $B = \cup_i B_i$ and $A = \cup_i A_i$, then
$$B \setminus A \subset \cup_i (B_i \setminus A_i) \subset \cup_i (A_i \Delta B_i) \text{ and } A \setminus B \subset \cup_i (A_i \setminus B_i) \subset \cup_i (A_i \Delta B_i)$$
so that $A \Delta B \subset \cup_i (A_i \Delta B_i)$.
3. We also have
$$(B_2 \setminus B_1) \setminus (A_2 \setminus A_1) = B_2 \cap B_1^c \cap (A_2 \setminus A_1)^c = B_2 \cap B_1^c \cap (A_2 \cap A_1^c)^c = B_2 \cap B_1^c \cap (A_2^c \cup A_1) = [B_2 \cap B_1^c \cap A_2^c] \cup [B_2 \cap B_1^c \cap A_1] \subset (B_2 \setminus A_2) \cup (A_1 \setminus B_1)$$
and similarly,
$$(A_2 \setminus A_1) \setminus (B_2 \setminus B_1) \subset (A_2 \setminus B_2) \cup (B_1 \setminus A_1)$$
so that
$$(A_2 \setminus A_1)\,\Delta\,(B_2 \setminus B_1) \subset (B_2 \setminus A_2) \cup (A_1 \setminus B_1) \cup (A_2 \setminus B_2) \cup (B_1 \setminus A_1) = (A_1 \Delta B_1) \cup (A_2 \Delta B_2).$$
4. Observe that if $A_n \in \mathcal{B}$ and $A_n \uparrow A$, then
$$P(B \Delta A_n) = P(B \setminus A_n) + P(A_n \setminus B) \to P(B \setminus A) + P(A \setminus B) = P(A \Delta B).$$
5. Let $\mathcal{L}$ be the collection of sets $B$ for which the assertion of the theorem holds. Show $\mathcal{L}$ is a $\lambda$-system which contains $\mathcal{A}$.

7.1.1 The Monotone Class Theorem

This subsection may be safely skipped!

Lemma 7.11 (Monotone Class Theorem*). Suppose $\mathcal{A} \subset 2^X$ is an algebra and $\mathcal{C}$ is the smallest monotone class containing $\mathcal{A}$. Then $\mathcal{C} = \sigma(\mathcal{A})$.

Proof. For $C \in \mathcal{C}$ let
$$\mathcal{C}(C) = \{B \in \mathcal{C} : C \cap B,\ C \cap B^c,\ B \cap C^c \in \mathcal{C}\};$$
then $\mathcal{C}(C)$ is a monotone class. Indeed, if $B_n \in \mathcal{C}(C)$ and $B_n \uparrow B$, then $B_n^c \downarrow B^c$ and so
$$\mathcal{C} \ni C \cap B_n \uparrow C \cap B, \quad \mathcal{C} \ni C \cap B_n^c \downarrow C \cap B^c \quad \text{and} \quad \mathcal{C} \ni B_n \cap C^c \uparrow B \cap C^c.$$
Since $\mathcal{C}$ is a monotone class, it follows that $C \cap B, C \cap B^c, B \cap C^c \in \mathcal{C}$, i.e. $B \in \mathcal{C}(C)$. This shows that $\mathcal{C}(C)$ is closed under increasing limits and a similar argument shows that $\mathcal{C}(C)$ is closed under decreasing limits. Thus we have shown that $\mathcal{C}(C)$ is a monotone class for all $C \in \mathcal{C}$.
If $A \in \mathcal{A} \subset \mathcal{C}$, then $A \cap B, A \cap B^c, B \cap A^c \in \mathcal{A} \subset \mathcal{C}$ for all $B \in \mathcal{A}$ and hence it follows that $\mathcal{A} \subset \mathcal{C}(A) \subset \mathcal{C}$. Since $\mathcal{C}$ is the smallest monotone class containing $\mathcal{A}$ and $\mathcal{C}(A)$ is a monotone class containing $\mathcal{A}$, we conclude that $\mathcal{C}(A) = \mathcal{C}$ for any $A \in \mathcal{A}$.
Let $B \in \mathcal{C}$ and notice that $A \in \mathcal{C}(B)$ happens iff $B \in \mathcal{C}(A)$. This observation and the fact that $\mathcal{C}(A) = \mathcal{C}$ for all $A \in \mathcal{A}$ implies $\mathcal{A} \subset \mathcal{C}(B) \subset \mathcal{C}$ for all $B \in \mathcal{C}$. Again since $\mathcal{C}$ is the smallest monotone class containing $\mathcal{A}$ and $\mathcal{C}(B)$ is a monotone class we conclude that $\mathcal{C}(B) = \mathcal{C}$ for all $B \in \mathcal{C}$. That is to say, if $A, B \in \mathcal{C}$ then $A \in \mathcal{C} = \mathcal{C}(B)$ and hence $A \cap B, A \cap B^c, A^c \cap B \in \mathcal{C}$. So $\mathcal{C}$ is closed under complements (since $X \in \mathcal{A} \subset \mathcal{C}$) and finite intersections and increasing unions, from which it easily follows that $\mathcal{C}$ is a $\sigma$-algebra.

Solution to Exercise (7.3). Since $\mathcal{L}$ contains the $\pi$-system, $\mathcal{A}$, it suffices by the π–λ theorem to show $\mathcal{L}$ is a $\lambda$-system. Clearly, $\Omega \in \mathcal{L}$ since $\mathcal{A} \subset \mathcal{L}$. If $B_1 \subset B_2$ with $B_i \in \mathcal{L}$ and $\varepsilon > 0$, there exist $A_i \in \mathcal{A}$ such that $P(B_i \Delta A_i) = \mathbb{E}|1_{A_i} - 1_{B_i}| < \varepsilon/2$ and therefore,
$$P((B_2 \setminus B_1)\,\Delta\,(A_2 \setminus A_1)) \le P((A_1 \Delta B_1) \cup (A_2 \Delta B_2)) \le P(A_1 \Delta B_1) + P(A_2 \Delta B_2) < \varepsilon.$$
Also if $B_n \uparrow B$ with $B_n \in \mathcal{L}$, there exist $A_n \in \mathcal{A}$ such that $P(B_n \Delta A_n) < \varepsilon 2^{-n}$ and therefore,
$$P\left([\cup_n B_n]\,\Delta\,[\cup_n A_n]\right) \le \sum_{n=1}^\infty P(B_n \Delta A_n) < \varepsilon.$$
Moreover, if we let $B := \cup_n B_n$ and $A^N := \cup_{n=1}^N A_n$, then
$$P(B \Delta A^N) = P(B \setminus A^N) + P(A^N \setminus B) \to P(B \setminus A) + P(A \setminus B) = P(B \Delta A)$$
where $A := \cup_n A_n$. Hence it follows for $N$ large enough that $P(B \Delta A^N) < \varepsilon$.

7.2 Basic Properties of Independence

For this section we will suppose that $(\Omega, \mathcal{B}, P)$ is a probability space.

Definition 7.12. We say that $A$ is independent of $B$ if $P(A|B) = P(A)$ or equivalently that
$$P(A \cap B) = P(A)P(B).$$
We further say a finite sequence of collections of sets, $\{\mathcal{C}_i\}_{i=1}^n$, are independent if
$$P(\cap_{j\in J} A_j) = \prod_{j\in J} P(A_j)$$
for all $A_i \in \mathcal{C}_i$ and $J \subset \{1, 2, \dots, n\}$.

Observe that if $\{\mathcal{C}_i\}_{i=1}^n$ are independent classes then so are $\{\mathcal{C}_i \cup \{X\}\}_{i=1}^n$. Moreover, if we assume that $X \in \mathcal{C}_i$ for each $i$, then $\{\mathcal{C}_i\}_{i=1}^n$ are independent iff
$$P(\cap_{j=1}^n A_j) = \prod_{j=1}^n P(A_j) \text{ for all } (A_1, \dots, A_n) \in \mathcal{C}_1 \times \cdots \times \mathcal{C}_n.$$

Theorem 7.13. Suppose that $\{\mathcal{C}_i\}_{i=1}^n$ is a finite sequence of independent $\pi$-classes. Then $\{\sigma(\mathcal{C}_i)\}_{i=1}^n$ are also independent.

Proof. As mentioned above, we may always assume without loss of generality that $X \in \mathcal{C}_i$. Fix $A_j \in \mathcal{C}_j$ for $j = 2, 3, \dots, n$. We will begin by showing that
$$P(A \cap A_2 \cap \cdots \cap A_n) = P(A)P(A_2)\dots P(A_n) \text{ for all } A \in \sigma(\mathcal{C}_1). \tag{7.1}$$
Since it is clear that this identity holds if $P(A_j) = 0$ for some $j = 2, \dots, n$, we may assume that $P(A_j) > 0$ for $j \ge 2$. In this case we may define,
$$Q(A) = \frac{P(A \cap A_2 \cap \cdots \cap A_n)}{P(A_2)\dots P(A_n)} = \frac{P(A \cap A_2 \cap \cdots \cap A_n)}{P(A_2 \cap \cdots \cap A_n)} = P(A\,|\,A_2 \cap \cdots \cap A_n) \text{ for all } A \in \sigma(\mathcal{C}_1).$$
Then equation Eq. (7.1) is equivalent to $P(A) = Q(A)$ on $\sigma(\mathcal{C}_1)$. But this is true by Proposition 7.5 using the fact that $Q = P$ on the $\pi$-system, $\mathcal{C}_1$. Since $(A_2, \dots, A_n) \in \mathcal{C}_2 \times \cdots \times \mathcal{C}_n$ were arbitrary we may now conclude that $\sigma(\mathcal{C}_1), \mathcal{C}_2, \dots, \mathcal{C}_n$ are independent.
By applying the result we have just proved to the sequence, $\mathcal{C}_2, \dots, \mathcal{C}_n, \sigma(\mathcal{C}_1)$, it follows that $\sigma(\mathcal{C}_2), \mathcal{C}_3, \dots, \mathcal{C}_n, \sigma(\mathcal{C}_1)$ are independent. Similarly we show inductively that
$$\sigma(\mathcal{C}_j), \mathcal{C}_{j+1}, \dots, \mathcal{C}_n, \sigma(\mathcal{C}_1), \dots, \sigma(\mathcal{C}_{j-1})$$
are independent for each $j = 1, 2, \dots, n$. The desired result occurs at $j = n$.

Definition 7.14. A collection of subsets of $\mathcal{B}$, $\{\mathcal{C}_t\}_{t\in T}$, is said to be independent iff $\{\mathcal{C}_t\}_{t\in\Lambda}$ are independent for all finite subsets, $\Lambda \subset T$. More explicitly, we are requiring
$$P(\cap_{t\in\Lambda} A_t) = \prod_{t\in\Lambda} P(A_t)$$
whenever $\Lambda$ is a finite subset of $T$ and $A_t \in \mathcal{C}_t$ for all $t \in \Lambda$.

Corollary 7.15. If $\{\mathcal{C}_t\}_{t\in T}$ is a collection of independent classes such that each $\mathcal{C}_t$ is a $\pi$-system, then $\{\sigma(\mathcal{C}_t)\}_{t\in T}$ are independent as well.

Example 7.16. Suppose that $\Omega = \Lambda^n$ where $\Lambda$ is a finite set, $\mathcal{B} = 2^\Omega$, $P(\{\omega\}) = \prod_{j=1}^n q_j(\omega_j)$ where $q_j : \Lambda \to [0,1]$ are functions such that $\sum_{\lambda\in\Lambda} q_j(\lambda) = 1$. Let $\mathcal{C}_i := \{\Lambda^{i-1} \times A \times \Lambda^{n-i} : A \subset \Lambda\}$. Then $\{\mathcal{C}_i\}_{i=1}^n$ are independent. Indeed, if $B_i := \Lambda^{i-1} \times A_i \times \Lambda^{n-i}$, then $\cap_i B_i = A_1 \times A_2 \times \cdots \times A_n$ and we have
$$P(\cap_i B_i) = \sum_{\omega \in A_1 \times \cdots \times A_n} \prod_{i=1}^n q_i(\omega_i) = \prod_{i=1}^n \sum_{\lambda \in A_i} q_i(\lambda)$$

while
$$P(B_i) = \sum_{\omega \in \Lambda^{i-1} \times A_i \times \Lambda^{n-i}} \prod_{j=1}^n q_j(\omega_j) = \sum_{\lambda \in A_i} q_i(\lambda),$$
so that $P(\cap_i B_i) = \prod_i P(B_i)$.

Definition 7.17. A collection of random variables, $\{X_t : t \in T\}$, are independent iff $\{\sigma(X_t) : t \in T\}$ are independent.

Theorem 7.18. Let $X := \{X_t : t \in T\}$ be a collection of random variables. Then the following are equivalent:
1. The collection $X$ is independent,
2. $P(\cap_{t\in\Lambda}\{X_t \in A_t\}) = \prod_{t\in\Lambda} P(X_t \in A_t)$ for all finite subsets, $\Lambda \subset T$, and all $A_t \in \mathcal{B}_{\mathbb{R}}$ for $t \in \Lambda$.
3. $P(\cap_{t\in\Lambda}\{X_t \le x_t\}) = \prod_{t\in\Lambda} P(X_t \le x_t)$ for all finite subsets, $\Lambda \subset T$, and all $x_t \in \mathbb{R}$ for $t \in \Lambda$.

Proof. The equivalence of 1. and 2. follows almost immediately from the definition of independence and the fact that $\sigma(X_t) = \{\{X_t \in A\} : A \in \mathcal{B}_{\mathbb{R}}\}$. Clearly 2. implies 3. holds. Finally, 3. implies 2. is an application of Corollary 7.15 with $\mathcal{C}_t := \{\{X_t \le a\} : a \in \mathbb{R}\}$ and making use of the observations that $\mathcal{C}_t$ is a $\pi$-system for all $t$ and that $\sigma(\mathcal{C}_t) = \sigma(X_t)$.

Example 7.19. Continue the notation of Example 7.16 and further assume that $\Lambda \subset \mathbb{R}$ and let $X_i : \Omega \to \Lambda$ be defined by, $X_i(\omega) = \omega_i$. Then $\{X_i\}_{i=1}^n$ are independent random variables. Indeed, $\sigma(X_i) = \mathcal{C}_i$ with $\mathcal{C}_i$ as in Example 7.16. Alternatively, from Exercise 4.1, we know that
$$\mathbb{E}_P\left[\prod_{i=1}^n f_i(X_i)\right] = \prod_{i=1}^n \mathbb{E}_P[f_i(X_i)]$$
for all $f_i : \Lambda \to \mathbb{R}$. Taking $A_i \subset \Lambda$ and $f_i := 1_{A_i}$ in the above identity shows that
$$P(X_1 \in A_1, \dots, X_n \in A_n) = \mathbb{E}_P\left[\prod_{i=1}^n 1_{A_i}(X_i)\right] = \prod_{i=1}^n \mathbb{E}_P[1_{A_i}(X_i)] = \prod_{i=1}^n P(X_i \in A_i)$$
as desired.

Corollary 7.20. A sequence of random variables, $\{X_j\}_{j=1}^k$, with countable ranges are independent iff
$$P\left(\cap_{j=1}^k \{X_j = x_j\}\right) = \prod_{j=1}^k P(X_j = x_j) \tag{7.2}$$
for all $x_j \in \mathbb{R}$.

Proof. Observe that both sides of Eq. (7.2) are zero unless $x_j$ is in the range of $X_j$ for all $j$. Hence it suffices to verify Eq. (7.2) for those $x_j \in \operatorname{Ran}(X_j) =: R_j$ for all $j$. Now if $\{X_j\}_{j=1}^k$ are independent, then $\{X_j = x_j\} \in \sigma(X_j)$ for all $x_j \in \mathbb{R}$ and therefore Eq. (7.2) holds. Conversely if Eq. (7.2) holds and $V_j \in \mathcal{B}_{\mathbb{R}}$, then
$$P\left(\cap_{j=1}^k \{X_j \in V_j\}\right) = P\left(\cap_{j=1}^k \cup_{x_j \in V_j \cap R_j} \{X_j = x_j\}\right) = P\left(\cup_{(x_1,\dots,x_k) \in \prod_{j=1}^k V_j \cap R_j} \cap_{j=1}^k \{X_j = x_j\}\right)$$
$$= \sum_{(x_1,\dots,x_k) \in \prod_{j=1}^k V_j \cap R_j} P\left(\cap_{j=1}^k \{X_j = x_j\}\right) = \sum_{(x_1,\dots,x_k) \in \prod_{j=1}^k V_j \cap R_j} \prod_{j=1}^k P(X_j = x_j) = \prod_{j=1}^k \sum_{x_j \in V_j \cap R_j} P(X_j = x_j) = \prod_{j=1}^k P(X_j \in V_j).$$

Definition 7.21. A sequence of random variables, $\{X_n\}_{n=1}^\infty$, on a probability space, $(\Omega, \mathcal{B}, P)$, are i.i.d. (= independent and identically distributed) if they are independent and $(X_n)_*P = (X_k)_*P$ for all $k, n$. That is we should have
$$P(X_n \in A) = P(X_k \in A) \text{ for all } k, n \in \mathbb{N} \text{ and } A \in \mathcal{B}_{\mathbb{R}}.$$
Observe that $\{X_n\}_{n=1}^\infty$ are i.i.d. random variables iff
$$P(X_1 \in A_1, \dots, X_n \in A_n) = \prod_{j=1}^n P(X_j \in A_j) = \prod_{j=1}^n P(X_1 \in A_j) = \prod_{j=1}^n \mu(A_j) \tag{7.3}$$
where $\mu = (X_1)_*P$. The identity in Eq. (7.3) is to hold for all $n \in \mathbb{N}$ and all $A_i \in \mathcal{B}_{\mathbb{R}}$.
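The factorization in Eq. (7.2) is easy to probe empirically. The following toy check (ours, not from the notes) simulates two independent dice and compares the joint frequencies against the product of the marginal frequencies.

```python
import random
from collections import Counter

random.seed(0)
N = 200_000
pairs = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

joint = Counter(pairs)
mx = Counter(x for x, _ in pairs)
my = Counter(y for _, y in pairs)

# For independent dice, P(X=x, Y=y) should match P(X=x) P(Y=y) ~ 1/36.
worst = max(abs(joint[(x, y)] / N - (mx[x] / N) * (my[y] / N))
            for x in range(1, 7) for y in range(1, 7))
print(worst)   # small, of order 1/sqrt(N)
```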

Theorem 7.22 (Existence of i.i.d simple R.V.s). Suppose that $\{q_i\}_{i=0}^n$ is a sequence of positive numbers such that $\sum_{i=0}^n q_i = 1$. Then there exists a sequence $\{X_k\}_{k=1}^\infty$ of simple random variables taking values in $\Lambda = \{0, 1, 2, \dots, n\}$ on $((0,1], \mathcal{B}, m)$ such that
$$m(\{X_1 = i_1, \dots, X_k = i_k\}) = q_{i_1} \cdots q_{i_k}$$
for all $i_1, i_2, \dots, i_k \in \{0, 1, 2, \dots, n\}$ and all $k \in \mathbb{N}$.

Proof. For $i = 0, 1, \dots, n$, let $\sigma_{-1} = 0$ and $\sigma_j := \sum_{i=0}^j q_i$ and for any interval, $(a,b]$, let
$$T_i((a,b]) := (a + \sigma_{i-1}(b-a),\ a + \sigma_i(b-a)].$$
Given $i_1, i_2, \dots, i_k \in \{0, 1, 2, \dots, n\}$, let
$$J_{i_1, i_2, \dots, i_k} := T_{i_k}\left(T_{i_{k-1}}\left(\dots T_{i_1}((0,1])\right)\right)$$
and define $\{X_k\}_{k=1}^\infty$ on $(0,1]$ by
$$X_k := \sum_{i_1, i_2, \dots, i_k \in \{0,1,2,\dots,n\}} i_k 1_{J_{i_1,i_2,\dots,i_k}}, \tag{7.4}$$
see Figure 7.1. Repeated applications of Corollary 6.22 show the functions, $X_k : (0,1] \to \mathbb{R}$, are measurable. Observe that
$$m(T_i((a,b])) = q_i(b-a) = q_i m((a,b]),$$
and so by induction,
$$m(J_{i_1,i_2,\dots,i_k}) = q_{i_k} q_{i_{k-1}} \cdots q_{i_1}.$$
The reader should convince herself/himself that
$$\{X_1 = i_1, \dots, X_k = i_k\} = J_{i_1,i_2,\dots,i_k}$$
and therefore, we have
$$m(\{X_1 = i_1, \dots, X_k = i_k\}) = m(J_{i_1,i_2,\dots,i_k}) = q_{i_k} q_{i_{k-1}} \cdots q_{i_1}$$
as desired.

Fig. 7.1. Here we suppose that $p_0 = 2/3$ and $p_1 = 1/3$ and then we construct $J_l$ and $J_{l,k}$ for $l, k \in \{0,1\}$.

Corollary 7.23 (Independent variables on product spaces). Suppose $\Lambda = \{0, 1, 2, \dots, n\}$, $q_i > 0$ with $\sum_{i=0}^n q_i = 1$, $\Omega = \Lambda^{\mathbb{N}}$, and for $i \in \mathbb{N}$, let $Y_i : \Omega \to \mathbb{R}$ be defined by $Y_i(\omega) = \omega_i$ for all $\omega \in \Omega$. Further let $\mathcal{B} := \sigma(Y_1, Y_2, \dots, Y_n, \dots)$. Then there exists a unique probability measure, $P : \mathcal{B} \to [0,1]$, such that
$$P(\{Y_1 = i_1, \dots, Y_k = i_k\}) = q_{i_1} \cdots q_{i_k}.$$

Proof. Let $\{X_i\}_{i=1}^\infty$ be as in Theorem 7.22 and define $T : (0,1] \to \Omega$ by
$$T(x) = (X_1(x), X_2(x), \dots, X_k(x), \dots).$$
Observe that $T$ is measurable since $Y_i \circ T = X_i$ is measurable for all $i$. We now define, $P := T_*m$. Then we have
$$P(\{Y_1 = i_1, \dots, Y_k = i_k\}) = m\left(T^{-1}(\{Y_1 = i_1, \dots, Y_k = i_k\})\right) = m(\{Y_1 \circ T = i_1, \dots, Y_k \circ T = i_k\}) = m(\{X_1 = i_1, \dots, X_k = i_k\}) = q_{i_1} \cdots q_{i_k}.$$
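The nested-interval construction of Theorem 7.22 can be carried out numerically: given $x \in (0,1]$, each $X_k(x)$ is read off by locating $x$ within the current subdivision and rescaling into the chosen subinterval. The sketch below is our own illustration of that scheme (the function names are ours), checked against the product formula $m(J_{i_1,\dots,i_k}) = q_{i_1}\cdots q_{i_k}$.

```python
import itertools
import random
from collections import Counter

def iid_digits(x, q, k):
    """Evaluate X_1(x), ..., X_k(x) from the proof of Theorem 7.22.

    q = (q_0, ..., q_n) are the point masses; at each step we find the
    subinterval T_i of the current interval (a, b] containing x, record i,
    and continue inside it."""
    a, b = 0.0, 1.0
    out = []
    for _ in range(k):
        sigma, i = 0.0, 0
        while a + (sigma + q[i]) * (b - a) < x:    # x lies to the right of T_i
            sigma += q[i]
            i += 1
        a, b = a + sigma * (b - a), a + (sigma + q[i]) * (b - a)
        out.append(i)
    return tuple(out)

random.seed(1)
q = (2 / 3, 1 / 3)                  # the p_0, p_1 of Figure 7.1
counts = Counter(iid_digits(random.random(), q, 3) for _ in range(100_000))
for pattern in itertools.product(range(2), repeat=3):
    expected = 1.0
    for i in pattern:
        expected *= q[i]
    print(pattern, counts[pattern] / 100_000, expected)
```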

Theorem 7.24. Given a finite subset, $\Lambda \subset \mathbb{R}$, and a function $q : \Lambda \to [0,1]$ such that $\sum_{\lambda\in\Lambda} q(\lambda) = 1$, there exists a probability space, $(\Omega, \mathcal{B}, P)$, and an independent sequence of random variables, $\{X_n\}_{n=1}^\infty$, such that $P(X_n = \lambda) = q(\lambda)$ for all $\lambda \in \Lambda$.

Proof. Use Corollary 7.20 to show that the random variables constructed in Example 5.28 or Theorem 7.22 fit the bill.

Proposition 7.25. Suppose that $\{X_n\}_{n=1}^\infty$ is a sequence of i.i.d. random variables with distribution, $P(X_n = 0) = P(X_n = 1) = \frac{1}{2}$. If we let $U := \sum_{n=1}^\infty 2^{-n} X_n$, then $P(U \le x) = (0 \vee x) \wedge 1$, i.e. $U$ has the uniform distribution on $[0,1]$.

Proof. Let us recall that $P(X_n = 0 \text{ a.a.}) = 0 = P(X_n = 1 \text{ a.a.})$. Hence we may, by shrinking $\Omega$ if necessary, assume that $\{X_n = 0 \text{ a.a.}\} = \emptyset = \{X_n = 1 \text{ a.a.}\}$. With this simplification, we have
$$\left\{U < \tfrac{1}{2}\right\} = \{X_1 = 0\}, \quad \left\{U < \tfrac{1}{4}\right\} = \{X_1 = 0, X_2 = 0\} \quad \text{and} \quad \left\{\tfrac{1}{2} \le U < \tfrac{3}{4}\right\} = \{X_1 = 1, X_2 = 0\}$$
and hence that
$$\left\{U < \tfrac{3}{4}\right\} = \left\{U < \tfrac{1}{2}\right\} \cup \left\{\tfrac{1}{2} \le U < \tfrac{3}{4}\right\} = \{X_1 = 0\} \cup \{X_1 = 1, X_2 = 0\}.$$
From these identities, it follows that
$$P(U < 0) = 0, \quad P\left(U < \tfrac{1}{4}\right) = \tfrac{1}{4}, \quad P\left(U < \tfrac{1}{2}\right) = \tfrac{1}{2}, \quad \text{and} \quad P\left(U < \tfrac{3}{4}\right) = \tfrac{3}{4}.$$
More generally, we claim that if $x = \sum_{j=1}^n \varepsilon_j 2^{-j}$ with $\varepsilon_j \in \{0,1\}$, then
$$P(U < x) = x. \tag{7.5}$$
The proof is by induction on $n$. Indeed, we have already verified (7.5) when $n = 1, 2$. Suppose we have verified (7.5) up to some $n \in \mathbb{N}$ and let $x = \sum_{j=1}^n \varepsilon_j 2^{-j}$ and consider
$$P\left(U < x + 2^{-(n+1)}\right) = P(U < x) + P\left(x \le U < x + 2^{-(n+1)}\right) = x + P\left(x \le U < x + 2^{-(n+1)}\right).$$
Since
$$\left\{x \le U < x + 2^{-(n+1)}\right\} = \cap_{j=1}^n \{X_j = \varepsilon_j\} \cap \{X_{n+1} = 0\},$$
we see that $P\left(x \le U < x + 2^{-(n+1)}\right) = 2^{-(n+1)}$ and hence $P\left(U < x + 2^{-(n+1)}\right) = x + 2^{-(n+1)}$, which completes the induction argument.
Since $x \mapsto P(U < x)$ is left continuous we may now conclude that $P(U < x) = x$ for all $x \in (0,1)$ and since $x \mapsto x$ is continuous we may also deduce that $P(U \le x) = x$ for all $x \in (0,1)$. Hence we may conclude that $P(U \le x) = (0 \vee x) \wedge 1$.
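Proposition 7.25 says that the binary digits of a uniform point are i.i.d. fair coin flips; read backwards, it builds a uniform variable from coins. A quick sanity check (our own sketch, truncating the series at machine precision):

```python
import random

random.seed(2)

def uniform_from_coins(n_bits=53):
    """U = sum_{n >= 1} 2^{-n} X_n with X_n i.i.d. fair Bernoulli,
    truncated at n_bits terms (Proposition 7.25)."""
    return sum(random.randint(0, 1) * 2.0 ** (-n) for n in range(1, n_bits + 1))

us = [uniform_from_coins() for _ in range(100_000)]
# The empirical CDF at a few points should be close to x itself.
for x in (0.1, 0.25, 0.5, 0.9):
    print(x, sum(u <= x for u in us) / len(us))
```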

Lemma 7.26. Suppose that $\{\mathcal{B}_t : t \in T\}$ is an independent family of $\sigma$-fields. And further assume that $T = \sum_{s\in S} T_s$ (a disjoint union) and let
$$\mathcal{B}_{T_s} = \vee_{t\in T_s} \mathcal{B}_t = \sigma\left(\cup_{t\in T_s} \mathcal{B}_t\right).$$
Then $\{\mathcal{B}_{T_s}\}_{s\in S}$ is an independent family of $\sigma$-fields.

Proof. Let $\mathcal{C}_s = \{\cap_{t\in K} B_t : B_t \in \mathcal{B}_t,\ K \subset\subset T_s\}$, where $K \subset\subset T_s$ denotes a finite subset of $T_s$. It is now easily checked that $\{\mathcal{C}_s\}_{s\in S}$ is an independent family of $\pi$-systems. Therefore $\{\mathcal{B}_{T_s} = \sigma(\mathcal{C}_s)\}_{s\in S}$ is an independent family of $\sigma$-algebras.

We may now show the existence of independent random variables with arbitrary distributions.

Theorem 7.27. Suppose that $\{\mu_n\}_{n=1}^\infty$ are a sequence of probability measures on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. Then there exists a probability space, $(\Omega, \mathcal{B}, P)$, and a sequence $\{Y_n\}_{n=1}^\infty$ of independent random variables with $\operatorname{Law}(Y_n) := P \circ Y_n^{-1} = \mu_n$ for all $n$.

Proof. By Theorem 7.24, there exists a sequence of i.i.d. random variables, $\{Z_n\}_{n=1}^\infty$, such that $P(Z_n = 1) = P(Z_n = 0) = \frac{1}{2}$. These random variables may be put into a two dimensional array, $\{X_{i,j} : i, j \in \mathbb{N}\}$ — see the proof of Lemma 3.8. For each $i$, let $U_i := \sum_{j=1}^\infty 2^{-j} X_{i,j}$ — a $\sigma(\{X_{i,j}\}_{j=1}^\infty)$-measurable random variable. According to Proposition 7.25, $U_i$ is uniformly distributed on $[0,1]$. Moreover by the grouping Lemma 7.26, $\{\sigma(\{X_{i,j}\}_{j=1}^\infty)\}_{i=1}^\infty$ are independent $\sigma$-algebras and hence $\{U_i\}_{i=1}^\infty$ is a sequence of i.i.d. random variables with the uniform distribution. Finally, let $F_i(x) := \mu_i((-\infty, x])$ for all $x \in \mathbb{R}$ and let $G_i(y) = \inf\{x : F_i(x) \ge y\}$. Then according to Theorem 6.11, $Y_i := G_i(U_i)$ has $\mu_i$ as its distribution. Moreover each $Y_i$ is $\sigma(\{X_{i,j}\}_{j=1}^\infty)$-measurable and therefore the $\{Y_i\}_{i=1}^\infty$ are independent random variables.

7.2.1 An Example of Ranks

Let $\{X_n\}_{n=1}^\infty$ be i.i.d. with common continuous distribution function, $F$. In this case we have, for any $i \ne j$, that
$$P(X_i = X_j) = \mu_F \otimes \mu_F(\{(x,x) : x \in \mathbb{R}\}) = 0.$$
This may be proved directly with some work or will be an easy consequence of Fubini's theorem to be considered later, see Example 10.11 below. For the direct proof, let $\{a_l\}_{l=-\infty}^\infty$ be a sequence such that, $a_l < a_{l+1}$ for all $l \in \mathbb{Z}$, $\lim_{l\to\infty} a_l = \infty$ and $\lim_{l\to-\infty} a_l = -\infty$. Then
$$\{(x,x) : x \in \mathbb{R}\} \subset \cup_{l\in\mathbb{Z}} \left[(a_l, a_{l+1}] \times (a_l, a_{l+1}]\right]$$
and therefore, since $\sum_{l\in\mathbb{Z}} [F(a_{l+1}) - F(a_l)] = 1$,
$$P(X_i = X_j) \le \sum_{l\in\mathbb{Z}} P(X_i \in (a_l, a_{l+1}],\ X_j \in (a_l, a_{l+1}]) = \sum_{l\in\mathbb{Z}} [F(a_{l+1}) - F(a_l)]^2 \le \sup_{l\in\mathbb{Z}} [F(a_{l+1}) - F(a_l)] \cdot \sum_{l\in\mathbb{Z}} [F(a_{l+1}) - F(a_l)] = \sup_{l\in\mathbb{Z}} [F(a_{l+1}) - F(a_l)].$$
Since $F$ is continuous and $F(+\infty) = 1$ and $F(-\infty) = 0$, it is easily seen that $F$ is uniformly continuous on $\mathbb{R}$. Therefore, if we choose $a_l = \frac{l}{N}$, we have
$$P(X_i = X_j) \le \limsup_{N\to\infty} \sup_{l\in\mathbb{Z}} \left[F\left(\frac{l+1}{N}\right) - F\left(\frac{l}{N}\right)\right] = 0.$$

Let $R_n$ denote the rank of $X_n$ in the list $(X_1, \dots, X_n)$, i.e.
$$R_n := \sum_{j=1}^n 1_{X_j \ge X_n} = \#\{j \le n : X_j \ge X_n\}.$$
For example if $(X_1, X_2, X_3, X_4, X_5, \dots) = (9, -8, 3, 7, 23, \dots)$, we have $R_1 = 1$, $R_2 = 2$, $R_3 = 2$, $R_4 = 2$ and $R_5 = 1$. Observe that the rank order, from lowest to highest, of $(X_1, X_2, X_3, X_4, X_5)$ is $(X_2, X_3, X_4, X_1, X_5)$. This can be determined from the values of $R_i$ for $i = 1, 2, \dots, 5$ as follows. Since $R_5 = 1$, we must have $X_5$ in the last slot, i.e. $(*, *, *, *, X_5)$. Since $R_4 = 2$, we know out of the remaining slots, $X_4$ must be in the second from the far most right, i.e. $(*, *, X_4, *, X_5)$. Since $R_3 = 2$, we know that $X_3$ is again the second from the right of the remaining slots, i.e. we now know, $(*, X_3, X_4, *, X_5)$. Similarly, $R_2 = 2$ implies $(X_2, X_3, X_4, *, X_5)$ and finally $R_1 = 1$ gives, $(X_2, X_3, X_4, X_1, X_5)$. As another example, if $R_i = i$ for $i = 1, 2, \dots, n$, then $X_n < X_{n-1} < \cdots < X_1$.
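The rank statistics are cheap to simulate. The following sketch (ours, not from the notes) computes $R_n$ as defined above from Gaussian samples and tabulates the empirical law of $R_5$, anticipating Renyi's Theorem 7.28 below, which asserts that $P(R_n = k) = 1/n$; with continuous i.i.d. samples, ties occur with probability zero, matching the discussion above.

```python
import random

random.seed(3)

def ranks(xs):
    """R_n = #{j <= n : X_j >= X_n} for each n."""
    return [sum(xj >= xs[n] for xj in xs[: n + 1]) for n in range(len(xs))]

trials = 50_000
n = 5
counts = [0] * (n + 1)
for _ in range(trials):
    r = ranks([random.gauss(0, 1) for _ in range(n)])
    counts[r[-1]] += 1               # distribution of R_5

print([c / trials for c in counts[1:]])   # each entry should be near 1/5
```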

Theorem 7.28 (Renyi Theorem). Let $\{X_n\}_{n=1}^\infty$ be i.i.d. and assume that $F(x) := P(X_n \le x)$ is continuous. Then $\{R_n\}_{n=1}^\infty$ is an independent sequence,
$$P(R_n = k) = \frac{1}{n} \text{ for } k = 1, 2, \dots, n,$$
and the events, $A_n = \{X_n \text{ is a record}\} = \{R_n = 1\}$, are independent as $n$ varies and
$$P(A_n) = P(R_n = 1) = \frac{1}{n}.$$

Proof. By Problem 6 on p. 110 of Resnick, $(X_1, \dots, X_n)$ and $(X_{\sigma 1}, \dots, X_{\sigma n})$ have the same distribution for any permutation $\sigma$. Since $F$ is continuous, it now follows that, up to a set of measure zero,
$$\Omega = \sum_\sigma \{X_{\sigma 1} < X_{\sigma 2} < \cdots < X_{\sigma n}\}$$
(a disjoint union) and therefore
$$1 = P(\Omega) = \sum_\sigma P(\{X_{\sigma 1} < X_{\sigma 2} < \cdots < X_{\sigma n}\}).$$
Since $P(\{X_{\sigma 1} < X_{\sigma 2} < \cdots < X_{\sigma n}\})$ is independent of $\sigma$ we may now conclude that
$$P(\{X_{\sigma 1} < X_{\sigma 2} < \cdots < X_{\sigma n}\}) = \frac{1}{n!}$$
for all $\sigma$. As observed before the statement of the theorem, to each realization $(\varepsilon_1, \dots, \varepsilon_n)$ (here $\varepsilon_i \in \mathbb{N}$ with $\varepsilon_i \le i$) of $(R_1, \dots, R_n)$ there is a permutation, $\sigma = \sigma(\varepsilon_1, \dots, \varepsilon_n)$, such that $X_{\sigma 1} < X_{\sigma 2} < \cdots < X_{\sigma n}$. From this it follows that
$$\{(R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)\} = \{X_{\sigma 1} < X_{\sigma 2} < \cdots < X_{\sigma n}\}$$
and therefore,
$$P(\{(R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)\}) = P(X_{\sigma 1} < \cdots < X_{\sigma n}) = \frac{1}{n!}.$$
Since
$$P(\{R_n = \varepsilon_n\}) = \sum_{(\varepsilon_1,\dots,\varepsilon_{n-1})} P(\{(R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)\}) = \sum_{(\varepsilon_1,\dots,\varepsilon_{n-1})} \frac{1}{n!} = (n-1)! \cdot \frac{1}{n!} = \frac{1}{n},$$
we have shown that
$$P(\{(R_1, \dots, R_n) = (\varepsilon_1, \dots, \varepsilon_n)\}) = \frac{1}{n!} = \prod_{j=1}^n \frac{1}{j} = \prod_{j=1}^n P(\{R_j = \varepsilon_j\}).$$

7.3 Borel-Cantelli Lemmas

Lemma 7.29 (First Borel Cantelli-Lemma). Suppose that $\{A_n\}_{n=1}^\infty$ are measurable sets. If
$$\sum_{n=1}^\infty P(A_n) < \infty, \tag{7.6}$$
then
$$P(\{A_n \text{ i.o.}\}) = 0. \tag{7.7}$$

Proof. First Proof. We have
$$P(\{A_n \text{ i.o.}\}) = P\left(\cap_{n=1}^\infty \cup_{k\ge n} A_k\right) = \lim_{n\to\infty} P(\cup_{k\ge n} A_k) \le \lim_{n\to\infty} \sum_{k\ge n} P(A_k) = 0.$$
Second Proof. (Warning: this proof requires integration theory which is developed below.) Equation (7.6) is equivalent to
$$\mathbb{E}\left[\sum_{n=1}^\infty 1_{A_n}\right] < \infty$$
from which it follows that
$$\sum_{n=1}^\infty 1_{A_n} < \infty \text{ a.s.,}$$
which is equivalent to $P(\{A_n \text{ i.o.}\}) = 0$.

Example 7.30. Suppose that $\{X_n\}$ are Bernoulli random variables with $P(X_n = 1) = p_n$ and $P(X_n = 0) = 1 - p_n$. If
$$\sum p_n < \infty$$
then $P(X_n = 1 \text{ i.o.}) = 0$ and hence $P(X_n = 0 \text{ a.a.}) = 1$. In particular,
$$P\left(\lim_{n\to\infty} X_n = 0\right) = 1.$$
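Example 7.30 can be watched numerically. The sketch below (ours, with $p_n = 1/n^2$ so that $\sum p_n < \infty$) records, in each of many runs, the index of the last $n$ with $X_n = 1$; the first Borel-Cantelli lemma predicts this index is finite almost surely, and in practice it is small.

```python
import random

random.seed(5)

# X_n ~ Bernoulli(1/n^2); sum of 1/n^2 converges, so by the first
# Borel-Cantelli lemma X_n = 1 only finitely often, almost surely.
trials, horizon = 200, 10_000
last_ones = []
for _ in range(trials):
    last = 0
    for n in range(1, horizon):
        if random.random() < 1.0 / n ** 2:
            last = n
    last_ones.append(last)

print(max(last_ones))   # the last index with X_n = 1 is small in every run
```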

Figure 7.2 below serves as motivation for the following elementary lemma on convex functions.

Fig. 7.2. A convex function, $\varphi$, along with a cord and a tangent line. Notice that the tangent line is always below $\varphi$ and the cord lies above $\varphi$ between the points of intersection of the cord with the graph of $\varphi$.

Lemma 7.31 (Convex Functions). Suppose that $\varphi \in PC^2((a,b) \to \mathbb{R})$² with $\varphi''(x) \ge 0$ for almost all $x \in (a,b)$. Then $\varphi$ satisfies;
1. for all $x_0, x \in (a,b)$,
$$\varphi(x_0) + \varphi'(x_0)(x - x_0) \le \varphi(x)$$
and

² $PC^2$ denotes the space of piecewise $C^2$ functions, i.e. $\varphi \in PC^2((a,b) \to \mathbb{R})$ means $\varphi$ is $C^1$ and there are a finite number of points, $\{a = a_0 < a_1 < a_2 < \cdots < a_{n-1} < a_n = b\}$, such that $\varphi|_{[a_{j-1}, a_j] \cap (a,b)}$ is $C^2$ for all $j = 1, 2, \dots, n$.

2. for all $u \le v$ with $u, v \in (a,b)$,
$$\varphi(u + t(v - u)) \le \varphi(u) + t(\varphi(v) - \varphi(u)) \text{ for all } t \in [0,1].$$
(This lemma applies to the functions, $e^{\lambda x}$ for all $\lambda \in \mathbb{R}$, $|x|^\alpha$ for $\alpha > 1$, and $-\ln x$ to name a few examples. See Appendix 11.7 below for much more on convex functions.)

Proof. 1. Let $f(x) := \varphi(x) - [\varphi(x_0) + \varphi'(x_0)(x - x_0)]$. Then $f(x_0) = f'(x_0) = 0$ while $f''(x) \ge 0$ a.e. and so by the fundamental theorem of calculus,
$$f'(x) = \varphi'(x) - \varphi'(x_0) = \int_{x_0}^x \varphi''(y)\,dy.$$
Hence it follows that $f'(x) \ge 0$ for $x > x_0$ and $f'(x) \le 0$ for $x < x_0$ and therefore, $f(x) \ge 0$ for all $x \in (a,b)$.
2. Let
$$f(t) := \varphi(u) + t(\varphi(v) - \varphi(u)) - \varphi(u + t(v - u)).$$
Then $f(0) = f(1) = 0$ with $\ddot f(t) = -(v-u)^2 \varphi''(u + t(v-u)) \le 0$ for almost all $t$. By the mean value theorem, there exists, $t_0 \in (0,1)$, such that $\dot f(t_0) = 0$ and then by the fundamental theorem of calculus it follows that
$$\dot f(t) = \int_{t_0}^t \ddot f(\tau)\,d\tau.$$
In particular, $\dot f(t) \le 0$ for $t > t_0$ and $\dot f(t) \ge 0$ for $t < t_0$ and hence $f(t) \ge f(1) = 0$ for $t \ge t_0$ and $f(t) \ge f(0) = 0$ for $t \le t_0$, i.e. $f(t) \ge 0$.

Example 7.32. Taking $\varphi(x) := e^{-x}$, we learn (see Figure 7.3),
$$1 - x \le e^{-x} \text{ for all } x \in \mathbb{R} \tag{7.8}$$
and taking $\varphi(x) = e^{-2x}$ we learn that
$$1 - x \ge e^{-2x} \text{ for } 0 \le x \le 1/2. \tag{7.9}$$

Fig. 7.3. A graph of $1 - x$ and $e^{-x}$ showing that $1 - x \le e^{-x}$ for all $x$.

Fig. 7.4. A graph of $1 - x$ and $e^{-2x}$ showing that $1 - x \ge e^{-2x}$ for all $x \in [0, 1/2]$.

Exercise 7.4. For $\{a_n\}_{n=1}^\infty \subset [0,1]$, let
$$\prod_{n=1}^\infty (1 - a_n) := \lim_{N\to\infty} \prod_{n=1}^N (1 - a_n).$$
(The limit exists since $\prod_{n=1}^N (1 - a_n)$ decreases as $N \to \infty$.) Show that if $\{a_n\}_{n=1}^\infty \subset [0,1)$, then
$$\prod_{n=1}^\infty (1 - a_n) = 0 \iff \sum_{n=1}^\infty a_n = \infty.$$

Solution to Exercise (7.4). On one hand we have
$$\prod_{n=1}^N (1 - a_n) \le \prod_{n=1}^N e^{-a_n} = \exp\left(-\sum_{n=1}^N a_n\right)$$

which upon passing to the limit as $N \to \infty$ gives
$$\prod_{n=1}^\infty (1 - a_n) \le \exp\left(-\sum_{n=1}^\infty a_n\right).$$
Hence if $\sum_{n=1}^\infty a_n = \infty$ then $\prod_{n=1}^\infty (1 - a_n) = 0$.
Conversely, suppose that $\sum_{n=1}^\infty a_n < \infty$. In this case $a_n \to 0$ as $n \to \infty$ and so there exists an $m \in \mathbb{N}$ such that $a_n \in [0, 1/2]$ for all $n \ge m$. With this notation we then have for $N \ge m$ that
$$\prod_{n=1}^N (1 - a_n) = \prod_{n=1}^m (1 - a_n) \prod_{n=m+1}^N (1 - a_n) \ge \prod_{n=1}^m (1 - a_n) \prod_{n=m+1}^N e^{-2a_n} = \prod_{n=1}^m (1 - a_n) \exp\left(-2\sum_{n=m+1}^N a_n\right) \ge \prod_{n=1}^m (1 - a_n) \exp\left(-2\sum_{n=m+1}^\infty a_n\right).$$
So again letting $N \to \infty$ shows,
$$\prod_{n=1}^\infty (1 - a_n) \ge \prod_{n=1}^m (1 - a_n) \exp\left(-2\sum_{n=m+1}^\infty a_n\right) > 0.$$

Lemma 7.33 (Second Borel-Cantelli Lemma). Suppose that $\{A_n\}_{n=1}^\infty$ are independent sets. If
$$\sum_{n=1}^\infty P(A_n) = \infty, \tag{7.10}$$
then
$$P(\{A_n \text{ i.o.}\}) = 1. \tag{7.11}$$
Combining this with the first Borel Cantelli Lemma gives the (Borel) Zero-One law,
$$P(A_n \text{ i.o.}) = \begin{cases} 0 & \text{if } \sum_{n=1}^\infty P(A_n) < \infty \\ 1 & \text{if } \sum_{n=1}^\infty P(A_n) = \infty. \end{cases}$$

Proof. We are going to prove Eq. (7.11) by showing,
$$0 = P(\{A_n \text{ i.o.}\}^c) = P(\{A_n^c \text{ a.a.}\}) = P\left(\cup_{n=1}^\infty \cap_{k\ge n} A_k^c\right).$$
Since $\cap_{k\ge n} A_k^c \uparrow \cup_{n=1}^\infty \cap_{k\ge n} A_k^c$ as $n \to \infty$ and $\cap_{n\le k\le m} A_k^c \downarrow \cap_{k\ge n} A_k^c$ as $m \to \infty$,
$$P\left(\cup_{n=1}^\infty \cap_{k\ge n} A_k^c\right) = \lim_{n\to\infty} P(\cap_{k\ge n} A_k^c) = \lim_{n\to\infty} \lim_{m\to\infty} P(\cap_{n\le k\le m} A_k^c).$$
Making use of the independence of $\{A_k\}_{k=1}^\infty$ and hence the independence of $\{A_k^c\}_{k=1}^\infty$, we have
$$P(\cap_{n\le k\le m} A_k^c) = \prod_{n\le k\le m} P(A_k^c) = \prod_{n\le k\le m} (1 - P(A_k)). \tag{7.12}$$
Using the simple inequality in Eq. (7.8) along with Eq. (7.12) shows
$$P(\cap_{n\le k\le m} A_k^c) \le \prod_{n\le k\le m} e^{-P(A_k)} = \exp\left(-\sum_{k=n}^m P(A_k)\right).$$
Using Eq. (7.10), we find from the above inequality that $\lim_{m\to\infty} P(\cap_{n\le k\le m} A_k^c) = 0$ and hence
$$P\left(\cup_{n=1}^\infty \cap_{k\ge n} A_k^c\right) = \lim_{n\to\infty} \lim_{m\to\infty} P(\cap_{n\le k\le m} A_k^c) = \lim_{n\to\infty} 0 = 0$$
as desired.

Example 7.34 (Example 7.30 continued). Suppose that $\{X_n\}$ are now independent Bernoulli random variables with $P(X_n = 1) = p_n$ and $P(X_n = 0) = 1 - p_n$. Then $P(\lim_{n\to\infty} X_n = 0) = 1$ iff $\sum p_n < \infty$. Indeed, $P(\lim_{n\to\infty} X_n = 0) = 1$ iff $P(X_n = 0 \text{ a.a.}) = 1$ iff $P(X_n = 1 \text{ i.o.}) = 0$ iff $\sum p_n = \sum P(X_n = 1) < \infty$.

Proposition 7.35 (Extremal behaviour of iid random variables). Suppose that $\{X_n\}_{n=1}^\infty$ is a sequence of i.i.d. random variables and $c_n$ is an increasing sequence of positive real numbers such that for all $\alpha > 1$ we have
$$\sum_{n=1}^\infty P\left(X_1 > \alpha^{-1} c_n\right) = \infty \tag{7.13}$$
while
$$\sum_{n=1}^\infty P(X_1 > \alpha c_n) < \infty. \tag{7.14}$$
Then
$$\limsup_{n\to\infty} \frac{X_n}{c_n} = 1 \text{ a.s.} \tag{7.15}$$

Proof. By the second Borel-Cantelli Lemma, Eq. (7.13) implies
$$P\left(X_n > \alpha^{-1} c_n \text{ i.o. } n\right) = 1$$
from which it follows that

$$\limsup_{n\to\infty} \frac{X_n}{c_n} \ge \alpha^{-1} \text{ a.s.}$$
Taking $\alpha = \alpha_k = 1 + 1/k$, we find
$$P\left(\limsup_{n\to\infty} \frac{X_n}{c_n} \ge 1\right) = P\left(\cap_{k=1}^\infty \left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \ge \frac{1}{\alpha_k}\right\}\right) = 1.$$
Similarly, by the first Borel-Cantelli lemma, Eq. (7.14) implies
$$P(X_n > \alpha c_n \text{ i.o. } n) = 0$$
or equivalently, $P(X_n \le \alpha c_n \text{ a.a. } n) = 1$. That is to say,
$$\limsup_{n\to\infty} \frac{X_n}{c_n} \le \alpha \text{ a.s.}$$
and hence, working as above,
$$P\left(\limsup_{n\to\infty} \frac{X_n}{c_n} \le 1\right) = P\left(\cap_{k=1}^\infty \left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \le \alpha_k\right\}\right) = 1.$$
Hence,
$$P\left(\limsup_{n\to\infty} \frac{X_n}{c_n} = 1\right) = P\left(\left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \ge 1\right\} \cap \left\{\limsup_{n\to\infty} \frac{X_n}{c_n} \le 1\right\}\right) = 1.$$

Example 7.36. Let $\{E_n\}_{n=1}^\infty$ be a sequence of independent random variables with exponential distributions determined by
$$P(E_n > x) = e^{-(x\vee 0)} \text{ or } P(E_n \le x) = 1 - e^{-(x\vee 0)}.$$
(Observe that $P(E_n \le 0) = 0$ so that $E_n > 0$ a.s.) Then for $c_n > 0$ and $\alpha > 0$, we have
$$\sum_{n=1}^\infty P(E_n > \alpha c_n) = \sum_{n=1}^\infty e^{-\alpha c_n} = \sum_{n=1}^\infty \left(e^{-c_n}\right)^\alpha.$$
Hence if we choose $c_n = \ln n$ so that $e^{-c_n} = 1/n$, then we have
$$\sum_{n=1}^\infty P(E_n > \alpha \ln n) = \sum_{n=1}^\infty \left(\frac{1}{n}\right)^\alpha$$
which is convergent iff $\alpha > 1$. So by Proposition 7.35, it follows that
$$\limsup_{n\to\infty} \frac{E_n}{\ln n} = 1 \text{ a.s.}$$
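The conclusion of Example 7.36 shows up clearly in simulation. The sketch below (ours) takes blocks of independent Exp(1) draws and reports the maximum of $E_n/\ln n$ over each block; since $\limsup_n E_n/\ln n = 1$ a.s., these block maxima should settle slightly above 1 as the block index grows.

```python
import math
import random

random.seed(6)

# Example 7.36: limsup_n E_n / ln n = 1 a.s. for independent Exp(1) draws,
# so over a late block the largest value of E_n / ln n sits near 1.
for block in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    m = max(random.expovariate(1.0) / math.log(n)
            for n in range(block, 2 * block))
    print(block, m)
```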

Example 7.37. Suppose now that $\{X_n\}_{n=1}^\infty$ are i.i.d. distributed by the Poisson distribution with intensity, $\lambda$, i.e.
$$P(X_1 = k) = \frac{\lambda^k}{k!} e^{-\lambda}.$$
In this case we have
$$P(X_1 \ge n) = e^{-\lambda} \sum_{k=n}^\infty \frac{\lambda^k}{k!} \ge e^{-\lambda} \frac{\lambda^n}{n!}$$
and
$$\sum_{k=n}^\infty \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \frac{\lambda^n}{n!} \sum_{k=n}^\infty \frac{n!}{k!} \lambda^{k-n} = e^{-\lambda} \frac{\lambda^n}{n!} \sum_{k=0}^\infty \frac{n!}{(k+n)!} \lambda^k \le e^{-\lambda} \frac{\lambda^n}{n!} \sum_{k=0}^\infty \frac{\lambda^k}{k!} = \frac{\lambda^n}{n!}.$$
Thus we have shown that
$$e^{-\lambda} \frac{\lambda^n}{n!} \le P(X_1 \ge n) \le \frac{\lambda^n}{n!}.$$
Thus in terms of convergence issues, we may assume that
$$P(X_1 \ge x) \sim \frac{\lambda^x}{x!} \sim \frac{\lambda^x}{\sqrt{2\pi x}\, e^{-x} x^x}$$
wherein we have used Stirling's formula, $x! \sim \sqrt{2\pi x}\, e^{-x} x^x$. Now suppose that we wish to choose $c_n$ so that $P(X_1 \ge c_n) \sim 1/n$. This suggests that we need to solve the equation, $x^x = n$. Taking logarithms of this equation implies that
$$x = \frac{\ln n}{\ln x}$$
and upon iteration we find,
$$x = \frac{\ln n}{\ell_2(n) - \ell_2(x)} = \frac{\ln n}{\ell_2(n) - \ell_3(n) + \ell_3(x)}$$
where $\ell_k := \underbrace{\ln \circ \ln \circ \cdots \circ \ln}_{k\text{ times}}$. Since $x \le \ln(n)$, it follows that $\ell_3(x) \le \ell_4(n)$ and hence that
$$x = \frac{\ln n}{\ell_2(n) + O(\ell_3(n))} = \frac{\ln n}{\ell_2(n)}\left(1 + O\left(\frac{\ell_3(n)}{\ell_2(n)}\right)\right).$$
Thus we are led to take $c_n := \frac{\ln(n)}{\ell_2(n)}$. We then have, for $\alpha \in (0,\infty)$, that
$$(\alpha c_n)^{\alpha c_n} = \exp\left(\alpha c_n [\ln\alpha + \ln c_n]\right) = \exp\left(\alpha \frac{\ln(n)}{\ell_2(n)} [\ln\alpha + \ell_2(n) - \ell_3(n)]\right) = \exp\left(\alpha \ln(n)\left[1 + \frac{\ln\alpha - \ell_3(n)}{\ell_2(n)}\right]\right) = n^{\alpha(1+\varepsilon_n(\alpha))}$$
where
$$\varepsilon_n(\alpha) := \frac{\ln\alpha - \ell_3(n)}{\ell_2(n)}.$$
Hence we have
$$P(X_1 \ge \alpha c_n) \sim \frac{\lambda^{\alpha c_n}}{\sqrt{2\pi\alpha c_n}\, e^{-\alpha c_n} (\alpha c_n)^{\alpha c_n}} = \frac{(\lambda e)^{\alpha c_n}}{\sqrt{2\pi\alpha c_n}}\, n^{-\alpha(1+\varepsilon_n(\alpha))}.$$
Since
$$(\lambda e)^{\alpha c_n} = e^{\alpha c_n \ln(\lambda e)} = \exp\left(\alpha \ln(\lambda e) \frac{\ln n}{\ell_2(n)}\right) = n^{\alpha \ln(\lambda e)/\ell_2(n)},$$
it follows that
$$P(X_1 \ge \alpha c_n) \sim \frac{1}{\sqrt{2\pi\alpha c_n}}\, n^{-\alpha(1+\delta_n(\alpha))}$$
where $\delta_n(\alpha) := \varepsilon_n(\alpha) - \ln(\lambda e)/\ell_2(n) \to 0$ as $n \to \infty$. From this observation, we may show,
$$\sum_{n=1}^\infty P(X_1 \ge \alpha c_n) < \infty \text{ if } \alpha > 1 \quad \text{and} \quad \sum_{n=1}^\infty P(X_1 \ge \alpha c_n) = \infty \text{ if } \alpha < 1$$
and so by Proposition 7.35 we may conclude that
$$\limsup_{n\to\infty} \frac{X_n}{\ln(n)/\ell_2(n)} = 1 \text{ a.s.}$$

7.4 Kolmogorov and Hewitt-Savage Zero-One Laws

Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables on a measurable space, $(\Omega, \mathcal{B})$. Let $\mathcal{B}_n := \sigma(X_1, \dots, X_n)$, $\mathcal{B}_\infty := \sigma(X_1, X_2, \dots)$, $\mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \dots)$, and $\mathcal{T} := \cap_{n=1}^\infty \mathcal{T}_n \subset \mathcal{B}_\infty$. We call $\mathcal{T}$ the tail $\sigma$-field and events, $A \in \mathcal{T}$, are called tail events.

Example 7.38. Let $S_n := X_1 + \cdots + X_n$ and $\{b_n\}_{n=1}^\infty \subset (0,\infty)$ such that $b_n \uparrow \infty$. Here are some examples of tail events and tail measurable random variables:
1. $\{\sum_{n=1}^\infty X_n \text{ converges}\} \in \mathcal{T}$. Indeed,
$$\left\{\sum_{k=1}^\infty X_k \text{ converges}\right\} = \left\{\sum_{k=n+1}^\infty X_k \text{ converges}\right\} \in \mathcal{T}_n$$
for all $n \in \mathbb{N}$.
2. Both $\limsup_{n\to\infty} X_n$ and $\liminf_{n\to\infty} X_n$ are $\mathcal{T}$-measurable, as are $\limsup_{n\to\infty} \frac{S_n}{b_n}$ and $\liminf_{n\to\infty} \frac{S_n}{b_n}$.
3. $\{\lim X_n \text{ exists in } \bar{\mathbb{R}}\} = \{\limsup_{n\to\infty} X_n = \liminf_{n\to\infty} X_n\} \in \mathcal{T}$ and similarly,
$$\left\{\lim \frac{S_n}{b_n} \text{ exists in } \bar{\mathbb{R}}\right\} = \left\{\limsup_{n\to\infty} \frac{S_n}{b_n} = \liminf_{n\to\infty} \frac{S_n}{b_n}\right\} \in \mathcal{T}$$
and
$$\left\{\lim \frac{S_n}{b_n} \text{ exists in } \mathbb{R}\right\} = \left\{-\infty < \limsup_{n\to\infty} \frac{S_n}{b_n} = \liminf_{n\to\infty} \frac{S_n}{b_n} < \infty\right\} \in \mathcal{T}.$$
4. $\{\lim_{n\to\infty} \frac{S_n}{b_n} = 0\} \in \mathcal{T}$. Indeed, for any $k \in \mathbb{N}$,
$$\lim_{n\to\infty} \frac{S_n}{b_n} = \lim_{n\to\infty} \frac{X_{k+1} + \cdots + X_n}{b_n},$$
from which it follows that $\{\lim_{n\to\infty} \frac{S_n}{b_n} = 0\} \in \mathcal{T}_k$ for all $k$.

Definition 7.39. Let $(\Omega, \mathcal{B}, P)$ be a probability space. A $\sigma$-field, $\mathcal{F} \subset \mathcal{B}$, is almost trivial iff $P(\mathcal{F}) = \{0,1\}$, i.e. $P(A) \in \{0,1\}$ for all $A \in \mathcal{F}$.

Lemma 7.40. Suppose that $X : \Omega \to \bar{\mathbb{R}}$ is a random variable which is $\mathcal{F}$-measurable, where $\mathcal{F} \subset \mathcal{B}$ is almost trivial. Then there exists $c \in \bar{\mathbb{R}}$ such that $X = c$ a.s.

Proof. Since $\{X = \infty\}$ and $\{X = -\infty\}$ are in $\mathcal{F}$, if $P(X = \infty) > 0$ or $P(X = -\infty) > 0$, then $P(X = \infty) = 1$ or $P(X = -\infty) = 1$ respectively. Hence, it suffices to finish the proof under the added condition that $P(X \in \mathbb{R}) = 1$. For each $x \in \mathbb{R}$, $\{X \le x\} \in \mathcal{F}$ and therefore, $P(X \le x)$ is either 0 or 1. Since the function, $F(x) := P(X \le x) \in \{0,1\}$, is right continuous, non-decreasing and $F(-\infty) = 0$ and $F(+\infty) = 1$, there is a unique point $c \in \mathbb{R}$ where $F(c) = 1$ and $F(c-) = 0$. At this point, we have $P(X = c) = 1$.

Proposition 7.41 (Kolmogorov's Zero-One Law). Suppose that $P$ is a probability measure on $(\Omega, \mathcal{B})$ such that $\{X_n\}_{n=1}^\infty$ are independent random variables. Then $\mathcal{T}$ is almost trivial, i.e. $P(A) \in \{0,1\}$ for all $A \in \mathcal{T}$.

Proof. Let $A \in \mathcal{T} \subset \mathcal{B}_\infty$. Since $A \in \mathcal{T}_n$ for all $n$ and $\mathcal{T}_n$ is independent of $\mathcal{B}_n$, it follows that $A$ is independent of $\cup_{n=1}^\infty \mathcal{B}_n$. Since the latter set is a multiplicative set, it follows that $A$ is independent of $\mathcal{B}_\infty = \sigma(\cup_{n=1}^\infty \mathcal{B}_n)$. But $A \in \mathcal{B}_\infty$ and hence $A$ is independent of itself, i.e.
$$P(A) = P(A \cap A) = P(A)P(A).$$
Since the only $x \in \mathbb{R}$ such that $x = x^2$ is $x = 0$ or $x = 1$, the result is proved.

In particular the tail events in Example 7.38 have probability either 0 or 1.

Corollary 7.42. Keeping the assumptions in Proposition 7.41 and letting $\{b_n\}_{n=1}^\infty \subset (0,\infty)$ be such that $b_n \uparrow \infty$, the quantities $\limsup_{n\to\infty} X_n$, $\liminf_{n\to\infty} X_n$, $\limsup_{n\to\infty} \frac{S_n}{b_n}$, and $\liminf_{n\to\infty} \frac{S_n}{b_n}$ are all constant almost surely. In particular, either
$$P\left(\lim_{n\to\infty} \tfrac{S_n}{b_n} \text{ exists}\right) = 0 \quad \text{or} \quad P\left(\lim_{n\to\infty} \tfrac{S_n}{b_n} \text{ exists}\right) = 1,$$
and in the latter case $\lim_{n\to\infty} \frac{S_n}{b_n} = c$ a.s. for some $c \in \bar{\mathbb{R}}$.

Let us now suppose that $\Omega := \mathbb{R}^{\mathbb{N}}$, $X_n(\omega) = \omega_n$ for all $\omega \in \Omega$, and $\mathcal{B} := \sigma(X_1, X_2, \dots)$. We say a permutation (i.e. a bijective map on $\mathbb{N}$), $\pi : \mathbb{N} \to \mathbb{N}$, is finite if $\pi(n) = n$ for a.a. $n$. Define $T_\pi : \Omega \to \Omega$ by $T_\pi(\omega) = (\omega_{\pi 1}, \omega_{\pi 2}, \dots)$.

Definition 7.43. The permutation invariant $\sigma$-field, $\mathcal{S} \subset \mathcal{B}$, is the collection of sets, $A \in \mathcal{B}$, such that $T_\pi^{-1}(A) = A$ for all finite permutations $\pi$.

In the proof below we will use the identities,
$$1_{A\Delta B} = |1_A - 1_B| \quad \text{and} \quad P(A \Delta B) = \mathbb{E}|1_A - 1_B|.$$

Proposition 7.44 (Hewitt-Savage Zero-One Law). Let $P$ be a probability measure on $(\Omega, \mathcal{B})$ such that $\{X_n\}_{n=1}^\infty$ is an i.i.d. sequence. Then $\mathcal{S}$ is almost trivial.

Proof. Let $\mathcal{B}_0 := \cup_{n=1}^\infty \sigma(X_1, X_2, \dots, X_n)$. Then $\mathcal{B}_0$ is an algebra and $\sigma(\mathcal{B}_0) = \mathcal{B}$. By the regularity Theorem 5.10, for any $B \in \mathcal{B}$ and $\varepsilon > 0$, there exist $A_n \in \mathcal{B}_0$ such that $A_n \uparrow C \in \sigma(\mathcal{B}_0)$, $B \subset C$, and $P(C \setminus B) < \varepsilon$. Since
$$P(A_n \Delta B) = P([A_n \setminus B] \cup [B \setminus A_n]) = P(A_n \setminus B) + P(B \setminus A_n) \to P(C \setminus B) + P(B \setminus C) < \varepsilon,$$
for sufficiently large $n$ we have $P(A \Delta B) < \varepsilon$ where $A = A_n \in \mathcal{B}_0$.
Now suppose that $B \in \mathcal{S}$, $\varepsilon > 0$, and $A \in \sigma(X_1, X_2, \dots, X_n) \subset \mathcal{B}_0$ is such that $P(A \Delta B) < \varepsilon$. Let $\pi : \mathbb{N} \to \mathbb{N}$ be the permutation defined by $\pi(j) = j + n$, $\pi(j+n) = j$ for $j = 1, 2, \dots, n$, and $\pi(j+2n) = j + 2n$ for all $j \in \mathbb{N}$. Since $A = \{(X_1, \dots, X_n) \in B'\} = \{\omega : (\omega_1, \dots, \omega_n) \in B'\}$ for some $B' \in \mathcal{B}_{\mathbb{R}^n}$, we have
$$T_\pi^{-1}(A) = \{\omega : ((T_\pi(\omega))_1, \dots, (T_\pi(\omega))_n) \in B'\} = \{\omega : (\omega_{\pi 1}, \dots, \omega_{\pi n}) \in B'\} = \{\omega : (\omega_{n+1}, \dots, \omega_{n+n}) \in B'\} = \{(X_{n+1}, \dots, X_{n+n}) \in B'\} \in \sigma(X_{n+1}, \dots, X_{n+n}).$$
It follows that $A$ and $T_\pi^{-1}(A)$ are independent with $P(A) = P(T_\pi^{-1}(A))$, and therefore $P(A \cap T_\pi^{-1}A) = P(A)^2$. Combining this observation with the identity, $P(B) = P(B \cap B) = P(B \cap T_\pi^{-1}B)$ (valid since $T_\pi^{-1}B = B$ for $B \in \mathcal{S}$), we find
$$\left|P(A)^2 - P(B)\right| = \left|P(A \cap T_\pi^{-1}A) - P(B \cap T_\pi^{-1}B)\right| = \left|\mathbb{E}\left[1_A 1_{T_\pi^{-1}A} - 1_B 1_{T_\pi^{-1}B}\right]\right| = \left|\mathbb{E}\left[(1_A - 1_B)\,1_{T_\pi^{-1}A}\right] + \mathbb{E}\left[1_B\left(1_{T_\pi^{-1}A} - 1_{T_\pi^{-1}B}\right)\right]\right|$$
$$\le \mathbb{E}|1_A - 1_B| + \mathbb{E}\left|1_{T_\pi^{-1}A} - 1_{T_\pi^{-1}B}\right| = P(A \Delta B) + P\left(T_\pi^{-1}A\,\Delta\,T_\pi^{-1}B\right) < 2\varepsilon.$$
Since $|P(A) - P(B)| \le P(A \Delta B) < \varepsilon$, it follows that
$$P(B) = [P(B) + O(\varepsilon)]^2 + O(\varepsilon).$$
Since $\varepsilon > 0$ was arbitrary, we may conclude that $P(B) = P(B)^2$ for all $B \in \mathcal{S}$, i.e. $P(B) \in \{0,1\}$.

Example 7.45 (Some Random Walk 0-1 Law Results). Continuing the notation in Proposition 7.44:
1. As above, if $S_n = X_1 + \cdots + X_n$, then $P(S_n \in B \text{ i.o.}) \in \{0,1\}$ for all $B \in \mathcal{B}_{\mathbb{R}}$. Indeed, if $\pi$ is a finite permutation,
$$T_\pi^{-1}(\{S_n \in B \text{ i.o.}\}) = \{S_n \circ T_\pi \in B \text{ i.o.}\} = \{S_n \in B \text{ i.o.}\}.$$
Hence $\{S_n \in B \text{ i.o.}\}$ is in the permutation invariant $\sigma$-field. The same goes for $\{S_n \in B \text{ a.a.}\}$.
2. If $P(X_1 \ne 0) > 0$, then $\limsup_{n\to\infty} S_n = \infty$ a.s. or $\limsup_{n\to\infty} S_n = -\infty$ a.s. Indeed,
$$T_\pi^{-1}\left\{\limsup_{n\to\infty} S_n \le x\right\} = \left\{\limsup_{n\to\infty} S_n \circ T_\pi \le x\right\} = \left\{\limsup_{n\to\infty} S_n \le x\right\},$$
which shows that $\limsup_{n\to\infty} S_n$ is $\mathcal{S}$-measurable. Therefore, $\limsup_{n\to\infty} S_n = c$ a.s. for some $c \in \bar{\mathbb{R}}$. Since, a.s.,
$$c = \limsup_{n\to\infty} S_{n+1} = \limsup_{n\to\infty}(S_n + X_1) = \limsup_{n\to\infty} S_n + X_1 = c + X_1,$$
we must have either $c \in \{\pm\infty\}$ or $X_1 = 0$ a.s. Since the latter is not allowed, $\limsup_{n\to\infty} S_n = \infty$ or $\limsup_{n\to\infty} S_n = -\infty$ a.s.
3. Now assume that $P(X_1 \ne 0) > 0$ and $X_1 \stackrel{d}{=} -X_1$, i.e. $P(X_1 \in A) = P(-X_1 \in A)$ for all $A \in \mathcal{B}_{\mathbb{R}}$. From what we have already proved in item 2., we know that $\limsup_{n\to\infty} S_n = c$ a.s. with $c \in \{\pm\infty\}$. Since $\{X_n\}_{n=1}^\infty$ and $\{-X_n\}_{n=1}^\infty$ are i.i.d. and $-X_n \stackrel{d}{=} X_n$, it follows that $\{X_n\}_{n=1}^\infty \stackrel{d}{=} \{-X_n\}_{n=1}^\infty$. The results of Exercise 7.2 then imply that $\limsup_{n\to\infty} S_n \stackrel{d}{=} \limsup_{n\to\infty}(-S_n)$ and in particular $\limsup_{n\to\infty}(-S_n) = c$ a.s. as well. Thus we have
$$c = \limsup_{n\to\infty}(-S_n) = -\liminf_{n\to\infty} S_n \ge -\limsup_{n\to\infty} S_n = -c.$$
Since $c = -\infty$ does not satisfy $c \ge -c$, we must have $c = \infty$. Hence in this symmetric case we have shown,
$$\limsup_{n\to\infty} S_n = \infty \text{ and } \limsup_{n\to\infty}(-S_n) = \infty \text{ a.s.}$$
or equivalently that
$$\limsup_{n\to\infty} S_n = \infty \text{ and } \liminf_{n\to\infty} S_n = -\infty \text{ a.s.}$$

8 Integration Theory

In this chapter, we will greatly extend the simple integral or expectation which was developed in Section 4.3 above. Recall there that if $(\Omega, \mathcal{B}, \mu)$ was a measurable space and $f : \Omega \to [0,\infty]$ was a measurable simple function, then we let
$$\mathbb{E}_\mu f := \sum_{\lambda\in[0,\infty]} \lambda\,\mu(f = \lambda).$$

8.1 A Quick Introduction to Lebesgue Integration Theory

Theorem 8.1 (Extension to positive functions). For a positive measurable function, $f : \Omega \to [0,\infty]$, the integral of $f$ with respect to $\mu$ is defined by
$$\int_X f(x)\,d\mu(x) := \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \varphi \le f\}.$$
This integral has the following properties.
1. This integral is linear in the sense that
$$\int (f + \lambda g)\,d\mu = \int f\,d\mu + \lambda\int g\,d\mu$$
whenever $f, g \ge 0$ are measurable functions and $\lambda \in [0,\infty)$.
2. The integral is continuous under increasing limits, i.e. if $0 \le f_n \uparrow f$, then
$$\int f\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu.$$
See the monotone convergence Theorem 8.15 below.

Remark 8.2. Given $f : \Omega \to [0,\infty]$ measurable, we know from the approximation Theorem 6.34 that $\varphi_n \uparrow f$ where
$$\varphi_n := \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n} 1_{\left\{\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right\}} + 2^n 1_{\{f > 2^n\}}.$$
Therefore
$$\int f\,d\mu = \lim_{n\to\infty} \int \varphi_n\,d\mu = \lim_{n\to\infty}\left[\sum_{k=0}^{2^{2n}-1} \frac{k}{2^n}\,\mu\left(\frac{k}{2^n} < f \le \frac{k+1}{2^n}\right) + 2^n \mu(f > 2^n)\right].$$

We call a function, $f : \Omega \to \bar{\mathbb{R}}$, integrable if it is measurable and $\int |f|\,d\mu < \infty$. We will denote the space of integrable functions by $L^1(\mu)$.

Theorem 8.3 (Extension to integrable functions). The integral extends to a linear function from $L^1(\mu) \to \mathbb{R}$. Moreover this extension is continuous under dominated convergence (see Theorem 8.34). That is if $f_n \in L^1(\mu)$ and there exists $g \in L^1(\mu)$ such that $|f_n| \le g$ and $f := \lim_{n\to\infty} f_n$ exists pointwise, then
$$\int f\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu.$$

Notation 8.4 We write $\int_A f\,d\mu := \int 1_A f\,d\mu$ for all $A \in \mathcal{B}$ where $f$ is a measurable function such that $1_A f$ is either non-negative or integrable.

Notation 8.5 If $m$ is Lebesgue measure on $\mathcal{B}_{\mathbb{R}}$, $f$ is a non-negative Borel measurable function and $a < b$ with $a, b \in \bar{\mathbb{R}}$, we will often write $\int_a^b f(x)\,dx$ or $\int_a^b f\,dm$ for $\int_{(a,b]\cap\mathbb{R}} f\,dm$.

Example 8.6. Suppose $-\infty < a < b < \infty$, $f \in C([a,b], \mathbb{R})$ and $m$ be Lebesgue measure on $\mathbb{R}$. Given a partition,
$$\pi = \{a = a_0 < a_1 < \cdots < a_n = b\},$$
let
$$\operatorname{mesh}(\pi) := \max\{|a_j - a_{j-1}| : j = 1, \dots, n\}$$
and
$$f_\pi(x) := \sum_{l=0}^{n-1} f(a_l) 1_{(a_l, a_{l+1}]}(x).$$
Then by the monotone convergence theorem,

$$\int_a^b f_\pi\,dm = \sum_{l=0}^{n-1} f(a_l)\,m((a_l, a_{l+1}]) = \sum_{l=0}^{n-1} f(a_l)(a_{l+1} - a_l)$$
is a Riemann sum. Therefore if $\{\pi_k\}_{k=1}^\infty$ is a sequence of partitions with $\lim_{k\to\infty} \operatorname{mesh}(\pi_k) = 0$, we know that
$$\lim_{k\to\infty} \int_a^b f_{\pi_k}\,dm = \int_a^b f(x)\,dx \tag{8.1}$$
where the latter integral is the Riemann integral. Using the (uniform) continuity of $f$ on $[a,b]$, it easily follows that $\lim_{k\to\infty} f_{\pi_k}(x) = f(x)$ and that $|f_{\pi_k}(x)| \le g(x) := M 1_{(a,b]}(x)$ for all $x \in (a,b]$ where $M := \max_{x\in[a,b]} |f(x)| < \infty$. Since $\int_{\mathbb{R}} g\,dm = M(b-a) < \infty$, we may apply D.C.T. to conclude,
$$\lim_{k\to\infty} \int_a^b f_{\pi_k}\,dm = \int_a^b \lim_{k\to\infty} f_{\pi_k}\,dm = \int_a^b f\,dm.$$
This equation with Eq. (8.1) shows
$$\int_a^b f\,dm = \int_a^b f(x)\,dx$$
whenever $f \in C([a,b], \mathbb{R})$, i.e. the Lebesgue and the Riemann integral agree on continuous functions. See Theorem 8.51 below for a more general statement along these lines.

Theorem 8.7 (The Fundamental Theorem of Calculus). Suppose $-\infty < a < b < \infty$, $f \in C((a,b), \mathbb{R}) \cap L^1((a,b), m)$ and $F(x) := \int_a^x f(y)\,dm(y)$. Then
1. $F \in C([a,b], \mathbb{R}) \cap C^1((a,b), \mathbb{R})$.
2. $F'(x) = f(x)$ for all $x \in (a,b)$.
3. If $G \in C([a,b], \mathbb{R}) \cap C^1((a,b), \mathbb{R})$ is an anti-derivative of $f$ on $(a,b)$ (i.e. $f = G'|_{(a,b)}$) then
$$\int_a^b f(x)\,dm(x) = G(b) - G(a).$$

Proof. Since $F(x) := \int_{\mathbb{R}} 1_{(a,x)}(y) f(y)\,dm(y)$, $\lim_{x\to z} 1_{(a,x)}(y) = 1_{(a,z)}(y)$ for $m$-a.e. $y$ and $|1_{(a,x)}(y) f(y)| \le 1_{(a,b)}(y)|f(y)|$ is an $L^1$-function, it follows from the dominated convergence Theorem 8.34 that $F$ is continuous on $[a,b]$. Simple manipulations show,
$$\left|\frac{F(x+h) - F(x)}{h} - f(x)\right| = \frac{1}{|h|}\begin{cases} \left|\int_x^{x+h} [f(y) - f(x)]\,dm(y)\right| & \text{if } h > 0 \\ \left|\int_{x+h}^x [f(y) - f(x)]\,dm(y)\right| & \text{if } h < 0 \end{cases} \le \frac{1}{|h|}\begin{cases} \int_x^{x+h} |f(y) - f(x)|\,dm(y) & \text{if } h > 0 \\ \int_{x+h}^x |f(y) - f(x)|\,dm(y) & \text{if } h < 0 \end{cases} \le \sup\{|f(y) - f(x)| : y \in [x - |h|, x + |h|]\}$$
and the latter expression, by the continuity of $f$, goes to zero as $h \to 0$. This shows $F' = f$ on $(a,b)$.
For the converse direction, we have by assumption that $G'(x) = F'(x)$ for $x \in (a,b)$. Therefore by the mean value theorem, $F - G = C$ for some constant $C$. Hence
$$\int_a^b f(x)\,dm(x) = F(b) = F(b) - F(a) = (G(b) + C) - (G(a) + C) = G(b) - G(a).$$
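The agreement of the Riemann sums of Example 8.6 with the antiderivative formula of Theorem 8.7 is easy to watch numerically. The sketch below (ours; the function names are our own) evaluates $\int_a^b f_\pi\,dm$ for uniform partitions of shrinking mesh and compares it with $G(b) - G(a)$.

```python
import math

def riemann_sum(f, a, b, n):
    """Integral of f_pi from Example 8.6 for the uniform partition
    a = a_0 < ... < a_n = b: the sum of f(a_l) * (a_{l+1} - a_l)."""
    h = (b - a) / n
    return sum(f(a + l * h) * h for l in range(n))

f = math.sin
a, b = 0.0, math.pi
exact = -math.cos(b) + math.cos(a)        # G(b) - G(a) with G = -cos
for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(f, a, b, n) - exact)   # error -> 0 as mesh -> 0
```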

We can use the above results to integrate some non-Riemann integrable functions:

Example 8.8. For all $\lambda > 0$,
$$\int_0^\infty e^{-\lambda x}\,dm(x) = \frac{1}{\lambda} \quad \text{and} \quad \int_{\mathbb{R}} \frac{1}{1+x^2}\,dm(x) = \pi.$$
The proofs of these identities are similar. By the monotone convergence theorem, Example 8.6 and the fundamental theorem of calculus for Riemann integrals (or Theorem 8.7 above),
$$\int_0^\infty e^{-\lambda x}\,dm(x) = \lim_{N\to\infty} \int_0^N e^{-\lambda x}\,dm(x) = \lim_{N\to\infty} \int_0^N e^{-\lambda x}\,dx = -\lim_{N\to\infty} \frac{1}{\lambda} e^{-\lambda x}\Big|_0^N = \frac{1}{\lambda}$$
and
$$\int_{\mathbb{R}} \frac{1}{1+x^2}\,dm(x) = \lim_{N\to\infty} \int_{-N}^N \frac{1}{1+x^2}\,dm(x) = \lim_{N\to\infty} \int_{-N}^N \frac{dx}{1+x^2} = \lim_{N\to\infty}\left[\tan^{-1}(N) - \tan^{-1}(-N)\right] = \pi.$$
Let us also consider the functions $x^{-p}$:
$$\int_{(0,1]} \frac{1}{x^p}\,dm(x) = \lim_{n\to\infty} \int_0^1 1_{(\frac{1}{n},1]}(x)\frac{1}{x^p}\,dm(x) = \lim_{n\to\infty} \int_{1/n}^1 \frac{dx}{x^p} = \lim_{n\to\infty} \frac{x^{-p+1}}{1-p}\Big|_{1/n}^1 = \begin{cases} \frac{1}{1-p} & \text{if } p < 1 \\ \infty & \text{if } p > 1. \end{cases}$$
If $p = 1$ we find
$$\int_{(0,1]} \frac{1}{x}\,dm(x) = \lim_{n\to\infty} \int_{1/n}^1 \frac{dx}{x} = \lim_{n\to\infty} \ln(x)\Big|_{1/n}^1 = \infty.$$

Exercise 8.1. Show
$$\int_1^\infty \frac{1}{x^p}\,dm(x) = \begin{cases} \infty & \text{if } p \le 1 \\ \frac{1}{p-1} & \text{if } p > 1. \end{cases}$$

Example 8.9. The following limit holds,
$$\lim_{n\to\infty} \int_0^n \left(1 - \frac{x}{n}\right)^n dm(x) = 1.$$
To verify this, let $f_n(x) := \left(1 - \frac{x}{n}\right)^n 1_{[0,n]}(x)$. Then $\lim_{n\to\infty} f_n(x) = e^{-x}$ for all $x \ge 0$ and, by taking logarithms of Eq. (7.8),
$$\ln(1 - x) \le -x \text{ for } x < 1.$$
Therefore, for $x < n$, we have
$$\left(1 - \frac{x}{n}\right)^n = e^{n\ln\left(1 - \frac{x}{n}\right)} \le e^{n\left(-\frac{x}{n}\right)} = e^{-x},$$
from which it follows that
$$0 \le f_n(x) \le e^{-x} \text{ for all } x \ge 0.$$
From Example 8.8, we know
$$\int_0^\infty e^{-x}\,dm(x) = 1 < \infty,$$
so that $e^{-x}$ is an integrable function on $[0,\infty)$. Hence by the dominated convergence theorem,
$$\lim_{n\to\infty} \int_0^n \left(1 - \frac{x}{n}\right)^n dm(x) = \lim_{n\to\infty} \int_0^\infty f_n(x)\,dm(x) = \int_0^\infty \lim_{n\to\infty} f_n(x)\,dm(x) = \int_0^\infty e^{-x}\,dm(x) = 1.$$
The limit in the above example may also be computed using the monotone convergence theorem. To do this we must show that $n \mapsto f_n(x)$ is increasing in $n$ for each $x$, and for this it suffices to consider $n > x$. But for $n > x$,
$$\frac{d}{dn}\ln f_n(x) = \frac{d}{dn}\left[n\ln\left(1 - \frac{x}{n}\right)\right] = \ln\left(1 - \frac{x}{n}\right) + \frac{1}{1 - \frac{x}{n}}\cdot\frac{x}{n} = h(x/n)$$
where, for $0 \le y < 1$,
$$h(y) := \ln(1 - y) + \frac{y}{1-y}.$$
Since $h(0) = 0$ and
$$h'(y) = -\frac{1}{1-y} + \frac{1}{1-y} + \frac{y}{(1-y)^2} = \frac{y}{(1-y)^2} \ge 0,$$
it follows that $h \ge 0$. Thus we have shown, $f_n(x) \uparrow e^{-x}$ as $n \to \infty$ as claimed.

Example 8.10 (Jordan's Lemma). In this example, let us consider the limit;
$$\lim_{n\to\infty} \int_0^\pi \cos\left(\sin\frac{\theta}{n}\right) e^{-n\sin(\theta)}\,d\theta.$$
Let
$$f_n(\theta) := 1_{(0,\pi]}(\theta)\cos\left(\sin\frac{\theta}{n}\right) e^{-n\sin(\theta)}.$$
Then $|f_n| \le 1_{(0,\pi]} \in L^1(m)$ and
$$\lim_{n\to\infty} f_n(\theta) = 1_{(0,\pi]}(\theta)\,1_{\{\pi\}}(\theta) = 1_{\{\pi\}}(\theta).$$
Therefore by the D.C.T.,
$$\lim_{n\to\infty} \int_0^\pi \cos\left(\sin\frac{\theta}{n}\right) e^{-n\sin(\theta)}\,d\theta = \int_{\mathbb{R}} 1_{\{\pi\}}(\theta)\,dm(\theta) = m(\{\pi\}) = 0.$$

Exercise 8.2 (Folland 2.28 on p. 60.). Compute the following limits and justify your calculations:
1. $\lim_{n\to\infty} \int_0^\infty \frac{\sin(\frac{x}{n})}{(1+\frac{x}{n})^n}\,dx$.
2. $\lim_{n\to\infty} \int_0^1 \frac{1+nx^2}{(1+x^2)^n}\,dx$.
3. $\lim_{n\to\infty} \int_0^\infty \frac{n\sin(x/n)}{x(1+x^2)}\,dx$.
4. For all $a \in \mathbb{R}$ compute,
$$f(a) := \lim_{n\to\infty} \int_a^\infty n(1+n^2x^2)^{-1}\,dx.$$
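The dominated (or monotone) convergence computation of Example 8.9 is easy to watch numerically. The sketch below (ours; the cutoff and step counts are arbitrary choices) approximates $\int_0^n (1 - x/n)^n\,dm(x)$ by a crude Riemann sum and shows it increasing toward $1 = \int_0^\infty e^{-x}\,dm(x)$.

```python
def fn(n, x):
    """f_n(x) = (1 - x/n)^n on [0, n], the integrand of Example 8.9."""
    return (1.0 - x / n) ** n if 0.0 <= x <= n else 0.0

def integral(n, steps=200_000, cutoff=60.0):
    # crude midpoint Riemann sum over [0, cutoff]; the tail past 60 is negligible
    h = cutoff / steps
    return sum(fn(n, (k + 0.5) * h) for k in range(steps)) * h

for n in (1, 5, 25, 125, 625):
    print(n, integral(n))   # increases toward 1, as f_n increases to e^{-x}
```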

Now that we have an overview of the Lebesgue integral, let us proceed to the formal development of the facts stated above.

8.2 Integrals of positive functions

Definition 8.11. Let $L^+ = L^+(\mathcal{B}) = \{f : X \to [0,\infty] : f \text{ is measurable}\}$. Define
$$\int_X f(x)\,d\mu(x) = \int_X f\,d\mu := \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \varphi \le f\}.$$
We say the $f \in L^+$ is integrable if $\int_X f\,d\mu < \infty$. If $A \in \mathcal{B}$, let
$$\int_A f(x)\,d\mu(x) = \int_A f\,d\mu := \int_X 1_A f\,d\mu.$$

Remark 8.12. Because of item 3. of Proposition 4.16, if $\varphi$ is a non-negative simple function, $\int_X \varphi\,d\mu = \mathbb{E}_\mu \varphi$ so that $\int_X$ is an extension of $\mathbb{E}_\mu$.

Lemma 8.13. Let $f, g \in L^+(\mathcal{B})$. Then:
1. if $\lambda \ge 0$, then
$$\int_X \lambda f\,d\mu = \lambda\int_X f\,d\mu$$
wherein $\lambda\int_X f\,d\mu \equiv 0$ if $\lambda = 0$, even if $\int_X f\,d\mu = \infty$.
2. if $0 \le f \le g$, then
$$\int_X f\,d\mu \le \int_X g\,d\mu. \tag{8.2}$$
3. For all $\varepsilon > 0$ and $p > 0$,
$$\mu(f \ge \varepsilon) \le \frac{1}{\varepsilon^p} \int_X f^p 1_{\{f\ge\varepsilon\}}\,d\mu \le \frac{1}{\varepsilon^p} \int_X f^p\,d\mu. \tag{8.3}$$
The inequality in Eq. (8.3) is called Chebyshev's Inequality for $p = 1$ and Markov's inequality for $p = 2$.
4. If $\int_X f\,d\mu < \infty$ then $\mu(f = \infty) = 0$ (i.e. $f < \infty$ a.e.) and the set $\{f > 0\}$ is $\sigma$-finite.

Proof. 1. We may assume $\lambda > 0$ in which case,
$$\int_X \lambda f\,d\mu = \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \varphi \le \lambda f\} = \sup\{\mathbb{E}_\mu \varphi : \varphi \text{ is simple and } \lambda^{-1}\varphi \le f\} = \sup\{\mathbb{E}_\mu[\lambda\psi] : \psi \text{ is simple and } \psi \le f\} = \sup\{\lambda\mathbb{E}_\mu[\psi] : \psi \text{ is simple and } \psi \le f\} = \lambda\int_X f\,d\mu.$$
2. Since $\{\varphi \text{ simple and } \varphi \le f\} \subset \{\varphi \text{ simple and } \varphi \le g\}$, Eq. (8.2) follows from the definition of the integral.
3. Since $1_{\{f\ge\varepsilon\}} \le 1_{\{f\ge\varepsilon\}}\frac{1}{\varepsilon}f \le \frac{1}{\varepsilon}f$ we have
$$1_{\{f\ge\varepsilon\}} \le 1_{\{f\ge\varepsilon\}}\left(\frac{1}{\varepsilon}f\right)^p \le \left(\frac{1}{\varepsilon}f\right)^p$$
and by monotonicity and the multiplicative property of the integral,
$$\mu(f \ge \varepsilon) = \int_X 1_{\{f\ge\varepsilon\}}\,d\mu \le \frac{1}{\varepsilon^p}\int_X 1_{\{f\ge\varepsilon\}} f^p\,d\mu \le \frac{1}{\varepsilon^p}\int_X f^p\,d\mu.$$
4. If $\mu(f = \infty) > 0$, then $\varphi_n := n1_{\{f=\infty\}}$ is a simple function such that $\varphi_n \le f$ for all $n$ and hence
$$n\,\mu(f = \infty) = \mathbb{E}_\mu(\varphi_n) \le \int_X f\,d\mu$$
for all $n$. Letting $n \to \infty$ shows $\int_X f\,d\mu = \infty$. Thus if $\int_X f\,d\mu < \infty$ then $\mu(f = \infty) = 0$. Moreover, $\{f > 0\} = \cup_{n=1}^\infty \{f > 1/n\}$ with $\mu(f > 1/n) \le n\int_X f\,d\mu < \infty$ for each $n$.

8.2 Integrals of positive functions

71

So if : X [0, ) is a simple function such that f, then d =


X X

fn E [1Xn ] = E [1Xn ] . Then using the continuity of under increasing unions, lim E [1Xn ] = lim 1Xn
y>0

(8.5)

f .

Taking the sup over in this last equation then shows that f d
X X

y 1{=y}

f .

= lim =

y(Xn { = y })
y>0

For the reverse inequality, let X be a nite set and N (0, ). Set f N (x) = min {N, f (x)} and let N, be the simple function given by N, (x) := 1 (x)f N (x). Because N, (x) f (x), f =
X N

nite sum y>0

y lim (Xn { = y })
n

= N, =
X

y lim ({ = y }) = E []
y>0 n

N, d
X

f d.

Since f N f as N , we may let N in this last equation to concluded f


X

f d.

This identity allows us to let n in Eq. (8.5) to conclude limn fn E [] and since (0, 1) was arbitrary we may further conclude,E [] limn fn . The latter inequality being true for all simple functions with f then implies that f lim
n

fn ,

Since is arbitrary, this implies f


X X

which combined with Eq. (8.4) proves the theorem. f d. Corollary 8.16. If fn L+ is a sequence of functions then

fn = Theorem 8.15 (Monotone Convergence Theorem). Suppose fn L+ is a sequence of functions such that fn f (f is necessarily in L+ ) then fn f as n .
n=1 n=1 n=1

fn . fn < a.e.

In particular, if

n=1

fn < then

Proof. First o we show that (f1 + f2 ) = f1 + f2

Proof. Since fn fm f, for all n m < , fn from which if follows fm f

fn is increasing in n and
n

by choosing non-negative simple function n and n such that n f1 and n f2 . Then (n + n ) is simple as well and (n + n ) (f1 + f2 ) so by the monotone convergence theorem, (8.4) (f1 + f2 ) = lim = lim (n + n ) = lim n + lim n + f1 + n f2 .

lim

fn

f.

For the opposite inequality, let : X [0, ) be a simple function such that 0 f, (0, 1) and Xn := {fn } . Notice that Xn X and fn 1Xn and so by denition of fn ,
Page: 71 job: prob macro: svmonob.cls

n =

date/time: 23-Feb-2007/15:20

72

8 Integration Theory
N

Now to the general case. Let gN :=


n=1

fn and g =
1

fn , then gN g and so

again by monotone convergence theorem and the additivity just proved,


N N

Proof. If f = 0 a.e. and f is a simple function then = 0 a.e. This implies that (1 ({y })) = 0 for all y > 0 and hence X d = 0 and therefore f d = 0. Conversely, if f d = 0, then by (Lemma 8.13), X (f 1/n) n

fn := lim
n=1

fn = lim
n=1

fn
n=1

f d = 0 for all n.

= lim

gN =

g =:
n=1

fn .

Therefore, (f > 0) n=1 (f 1/n) = 0, i.e. f = 0 a.e. For the second assertion let E be the exceptional set where f > g, i.e. E := {x X : f (x) > g (x)}. By assumption E is a null set and 1E c f 1E c g everywhere. Because g = 1E c g + 1E g and 1E g = 0 a.e., gd = and similarly f d = 1E c gd + 1E gd = 1E c gd

Remark 8.17. It is in the proof of this corollary (i.e. the linearity of the integral) that we really make use of the assumption that all of our functions are measurable. In fact the denition f d makes sense for all functions f : X [0, ] not just measurable functions. Moreover the monotone convergence theorem holds in this generality with no change in the proof. However, in the proof of Corollary 8.16, we use the approximation Theorem 6.34 which relies heavily on the measurability of the functions to be approximated. Example 8.18. Suppose, = N, B := 2N , and (A) = # (A) for A is the counting measure on B . Then for f : N [0, ), the function
N

1E c f d. Since 1E c f 1E c g everywhere, 1E c f d 1E c gd = gd.

f d =

Corollary 8.20. Suppose that {fn } is a sequence of non-negative measurable functions and f is a measurable function such that fn f o a null set, then fn f as n .

fN () :=
n=1

f (n) 1{n}

is a simple function with fN f as N . So by the monotone convergence theorem,


N

Proof. Let E X be a null set such that fn 1E c f 1E c as n . Then by the monotone convergence theorem and Proposition 8.19, fn = fn 1E c f 1E c = f as n .

f d = lim
N

fN d = lim
N N

f (n) ({n})
n=1

= lim

f (n) =
n=1 n=1

f (n) . Lemma 8.21 (Fatous Lemma). If fn : X [0, ] is a sequence of measurable functions then lim inf fn lim inf
n n nk

Exercise 8.3. Suppose that n : B [0, ] are measures on B for n N. Also suppose that n (A) is increasing in n for all A B. Prove that : B [0, ] dened by (A) := limn n (A) is also a measure. Hint: use Example 8.18 and the monotone convergence theorem. Proposition 8.19. Suppose that f 0 is a measurable function. Then f d = 0 i f = 0 a.e. Also if f, g 0 are measurable functions such that X f g a.e. then f d gd. In particular if f = g a.e. then f d = gd.

fn

Proof. Dene gk := inf fn so that gk lim inf n fn as k . Since gk fn for all k n, gk and therefore
macro: svmonob.cls date/time: 23-Feb-2007/15:20

fn for all n k

Page: 72

job: prob

8.2 Integrals of positive functions

73

gk lim inf

fn for all k.

Proof. Since ( n=1 An ) =


X

We may now use the monotone convergence theorem to let k to nd lim inf fn =
n k

1 d and n=1 An

lim gk =

MCT

lim

gk lim inf

fn .

(An ) =
n=1 X n=1

1An d

The following Lemma and the next Corollary are simple applications of Corollary 8.16. Lemma 8.22 (The First Borell Carntelli Lemma). Let (X, B , ) be a measure space, An B , and set

it suces to show

1An = 1 a.e. n=1 An


n=1

(8.6)

{An i.o.} = {x X : x An for innitely many ns} =


N =1 nN

An .

Now n=1 1An 1 and n=1 An some i = j, that is

n=1

1An (x) = 1 (x) i x Ai Aj for n=1 An

If

n=1

(An ) < then ({An i.o.}) = 0.

x:
n=1

1An (x) = 1 (x) n=1 An

= i<j Ai Aj

Proof. (First Proof.) Let us rst observe that

and the latter set has measure 0 being the countable union of sets of measure zero. This proves Eq. (8.6) and hence the corollary. Example 8.24. Let {rn } n=1 be an enumeration of the points in Q [0, 1] and dene 1 f (x) = 2n | x rn | n=1 with the convention that 1 |x rn | Since, By Theorem 8.7,
1 0

{An i.o.} = Hence if


n=1

xX:
n=1

1An (x) = .

(An ) < then


>
n=1

(An ) =
n=1 X

1An d =
X n=1

1An d

implies that
n=1

1An (x) < for - a.e. x. That is to say ({An i.o.}) = 0.

= 5 if x = rn .

(Second Proof.) Of course we may give a strictly measure theoretic proof of this fact: (An i.o.) = lim
N nN

An (An )

1 |x rn |

dx =

lim and the last limit is zero since


n=1

nN

rn 1 1 dx + dx x r r x n n rn 0 rn = 2 x rn |1 1 rn rn rn 2 rn x|0 = 2 4,

(An ) < .

we nd

Corollary 8.23. Suppose that (X, B , ) is a measure space and {An }n=1 B is a collection of sets such that (Ai Aj ) = 0 for all i = j, then

f (x)dm(x) =
[0,1] n=1

2n
[0,1]

1 |x rn |

dx
n=1

2n 4 = 4 < .

( n=1 An ) =
n=1

(An ).

In particular, m(f = ) = 0, i.e. that f < for almost every x [0, 1] and this implies that
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 73

job: prob

74

8 Integration Theory

2n
n=1

1 |x rn |

< for a.e. x [0, 1].

This result is somewhat surprising since the singularities of the summands form a dense subset of [0, 1].

Proof. Let f, g L1 (; R) and a, b R. By modifying f and g on a null set, we may assume that f, g are real valued functions. We have af + bg L1 (; R) because |af + bg | |a| |f | + |b| |g | L1 (; R) . If a < 0, then (af )+ = af and (af ) = af+ so that af = a f + a f+ = a( f+ f ) = a f.

8.3 Integrals of Complex Valued Functions


is integrable if f+ := Denition 8.25. A measurable function f : X R f 1{f 0} and f = f 1{f 0} are integrable. We write L1 (; R) for the space of real valued integrable functions. For f L1 (; R) , let f d = f+ d f d

A similar calculation works for a > 0 and the case a = 0 is trivial so we have shown that af = a f. Now set h = f + g. Since h = h+ h ,

are two measurable functions, let f + g denote Convention: If f, g : X R such that h(x) = f (x) + g (x) the collection of measurable functions h : X R whenever f (x) + g (x) is well dened, i.e. is not of the form or + . We use a similar convention for f g. Notice that if f, g L1 (; R) and h1 , h2 f + g, then h1 = h2 a.e. because |f | < and |g | < a.e. Notation 8.26 (Abuse of notation) We will sometimes denote the integral f d by (f ) . With this notation we have (A) = (1A ) for all A B . X Remark 8.27. Since f |f | f+ + f , a measurable function f is integrable i L1 (; R) := |f | d < . Hence |f | d < .
X

h+ h = f+ f + g+ g or h+ + f + g = h + f+ + g+ . Therefore, h+ + and hence h= h+ h = f+ + g+ f g = f+ g. f + g = h + f+ + g+

: f is measurable and f :XR

If f, g L1 (; R) and f = g a.e. then f = g a.e. and so it follows from Proposition 8.19 that f d = gd. In particular if f, g L1 (; R) we may dene (f + g ) d = hd
X X

Finally if f+ f = f g = g+ g then f+ + g g+ + f which implies that f+ + g g+ + f or equivalently that f= f+ f g+ g = g.

where h is any element of f + g. Proposition 8.28. The map f L1 (; R)


X

The monotonicity property is also a consequence of the linearity of the integral, the fact that f g a.e. implies 0 g f a.e. and Proposition 8.19. f d R f d gd for all f, g Denition 8.29. A measurable function f : X C is integrable if |f | d < . Analogously to the real case, let X L1 (; C) := f : X C : f is measurable and
X

is linear and has the monotonicity property: L1 (; R) such that f g a.e.


Page: 74

|f | d < .

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

8.3 Integrals of Complex Valued Functions

75

denote the complex valued integrable functions. Because, max (|Re f | , |Im f |) |f | 2 max (|Re f | , |Im f |) , |f | d < i |Re f | d + For f L (; C) dene f d = Re f d + i Im f d.
1

Proof. 1. By Chebyshevs inequality, Lemma 8.13, (|f | 1 )n n |f | d <


X

|Im f | d < . for all n. 2. (a) = (c) Notice that f=


E

g
E E

(f g ) = 0

It is routine to show the integral is still linear on L1 (; C) (prove!). In the remainder of this section, let L1 () be either L1 (; C) or L1 (; R) . If A B and f L1 (; C) or f : X [0, ] is a measurable function, let f d :=
A X

for all E B . Taking E = {Re(f g ) > 0} and using 1E Re(f g ) 0, we learn that 0 = Re
E

(f g )d =

1E Re(f g ) = 1E Re(f g ) = 0 a.e.

1A f d.

This implies that 1E = 0 a.e. which happens i ({Re(f g ) > 0}) = (E ) = 0. (8.7) Similar (Re(f g ) < 0) = 0 so that Re(f g ) = 0 a.e. Similarly, Im(f g ) = 0 a.e and hence f g = 0 a.e., i.e. f = g a.e. (c) = (b) is clear and so is (b) = (a) since f
E E

Proposition 8.30. Suppose that f L1 (; C) , then f d


X X

|f | d.

Proof. Start by writing X f d = Rei with R 0. We may assume that R = X f d > 0 since otherwise there is nothing to prove. Since R = ei
X

|f g | = 0.

f d =
X X

ei f d =
X

Re ei f d + i
X

Im ei f d,

it must be that 8.19,

Im ei f d = 0. Using the monotonicity in Proposition

Denition 8.32. Let (X, B , ) be a measure space and L1 () = L1 (X, B , ) denote the set of L1 () functions modulo the equivalence relation; f g i f = g a.e. We make this into a normed space using the norm f g
L1

|f g | d
L1

f d =
X X

Re e

f d
X

Re e

d
X

|f | d. and into a metric space using 1 (f, g ) = f g . Warning: in the future we will often not make much of a distinction between L1 () and L1 () . On occasion this can be dangerous and this danger will be pointed out when necessary.

Proposition 8.31. Let f, g L () , then 1. The set {f = 0} is nite, in fact {|f | for all n. 2. The following are equivalent a) E f = E g for all E B b) |f g | = 0
X 1 n}

{f = 0} and (|f |

1 n)

<

Remark 8.33. More generally we may dene Lp () = Lp (X, B , ) for p [1, ) as the set of measurable functions f such that |f | d <
X p

c) f = g a.e.

modulo the equivalence relation; f g i f = g a.e.


job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 75

76

8 Integration Theory

We will see in later that


1/p

Lp

|f | d

for f Lp ()

Proposition 8.35. Suppose that (, B , P ) is a probability space and {Zj }j =1 n are independent integrable random variables. Then j =1 Zj is also integrable and
n n

is a norm and (Lp (),

Lp )

is a Banach space in this norm.

E
j =1 n

Zj =
j =1

EZj .
n

Theorem 8.34 (Dominated Convergence Theorem). Suppose fn , gn , g L1 () , fn f a.e., |fn | gn L1 () , gn g a.e. and X gn d X gd. Then f L1 () and f d = lim
X h

Proof. By denition, {Zj }j =1 are independent i { (Zj )}j =1 are independent. Then as we have seen in a homework problem, E [1A1 . . . 1An ] = E [1A1 ] . . . E [1An ] when Ai (Zi ) for each i. By multi-linearity it follows that

fn d.
X 1

(In most typical applications of this theorem gn = g L () for all n.) Proof. Notice that |f | = limn |fn | limn |gn | g a.e. so that f L1 () . By considering the real and imaginary parts of f separately, it suces to prove the theorem in the case where f is real. By Fatous Lemma, (g f )d =
X

E [1 . . . n ] = E [1 ] . . . E [n ] whenever i are bounded (Zi ) measurable simple functions. By approximation by simple functions and the monotone and dominated convergence theorem, E [Y1 . . . Yn ] = E [Y1 ] . . . E [Yn ] whenever Yi is (Zi ) measurable and either Yi 0 or Yi is bounded. Taking Yi = |Zi | then implies that
n n

X n

lim inf (gn fn ) d lim inf


n X

(gn fn ) d

= lim =
X

gn d + lim inf
X n X

fn d

gd + lim inf
n X

fn d so that
n j =1

E
j =1

|Zj | =
j =1

E |Zj | <

Since lim inf n (an ) = lim sup an , we have shown,


n

K Zj is integrable. Moreover, for K > 0, let Zi = Zi 1|Zi |K , then n n

gd
X X

f d
X

gd +

lim inf n X fn d lim sup X fn d


n

E
j =1

Zj 1|Zj |K =
j =1

E Zj 1|Zj |K .

and therefore lim sup


n X

Now apply the dominated convergence theorem, n + 1 times, to conclude


n n n n

fn d
X

f d lim inf
n X X

fn d. f d.

E
j =1

Zj = lim E
K j =1

Zj 1|Zj |K =
j =1 n j =1

lim E Zj 1|Zj |K =
j =1 n

EZj .

This shows that lim

n X

fn d exists and is equal to

The dominating functions used here are

|Zj | , and {|Zj |}j =1 respectively. such that

Exercise 8.4. Give another proof of Proposition 8.30 by rst proving Eq. (8.7) with f being a simple function in which case the triangle inequality for complex numbers will do the trick. Then use the approximation Theorem 6.34 along with the dominated convergence Theorem 8.34 to handle the general case.

Corollary 8.36. Let {fn }n=1 L1 () be a sequence n=1 fn L1 () < , then n=1 fn is convergent a.e. and

fn
X n=1

d =
n=1 X

fn d.

Page: 76

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

8.3 Integrals of Complex Valued Functions

77

Proof. The condition L1 () . Hence then


n=1

n=1

fn is almost
N

fn L1 () < is equivalent to n=1 |fn | N everywhere convergent and if SN := n=1 fn ,

Proof. By considering the real and imaginary parts of f separately, we may assume that f is real. Also notice that f (t, x) = lim n(f (t + n1 , x) f (t, x)) n t and therefore, for x f t (t, x) is a sequential limit of measurable functions and hence is measurable for all t J. By the mean value theorem,

|SN |
n=1

|fn |
n=1

|fn | L () .

So by the dominated convergence theorem,

fn
X n=1

d =

X N N

lim SN d = lim

SN d
X

|f (t, x) f (t0 , x)| g (x) |t t0 | for all t J and hence

(8.8)

= lim

fn d =
n=1 X n=1 X

fn d.

|f (t, x)| |f (t, x) f (t0 , x)| + |f (t0 , x)| g (x) |t t0 | + |f (t0 , x)| . This shows f (t, ) L1 () for all t J. Let G(t) :=
X

f (t, x)d(x), then

Example 8.37 (Integration of Power Series). Suppose R > 0 and is a sequence of complex numbers such that n=0 |an | rn < for all r (0, R). Then

{an }n=0

G(t) G(t0 ) = t t0 By assumption, lim

f (t, x) f (t0 , x) d(x). t t0

an xn
n=0

dm(x) =
n=0

an

xn dm(x) =
n=0

an

n+1 n+1 n+1

for all R < < < R. Indeed this follows from Corollary 8.36 since

tt0

f (t, x) f (t0 , x) f (t, x) for all x X = t t0 t

|an | |x| dm(x)


n=0 n=0 0

| |

|an | |x| dm(x) +


0

||

and by Eq. (8.8), |an | |x| dm(x)


n

n=0

|an |

| |

n+1

+ || n+1

n+1

2r
n=0

|an | rn <

f (t, x) f (t0 , x) g (x) for all t J and x X. t t0 Therefore, we may apply the dominated convergence theorem to conclude lim G(tn ) G(t0 ) = lim n tn t0 f (tn , x) f (t0 , x) d(x) tn t0 X f (tn , x) f (t0 , x) = lim d(x) n tn t0 X f = (t0 , x)d(x) X t

where r = max(| | , ||). Corollary 8.38 (Dierentiation Under the Integral). Suppose that J R is an open interval and f : J X C is a function such that 1. x f (t, x) is measurable for each t J. 2. f (t0 , ) L1 () for some t0 J. 3. f t (t, x) exists for all (t, x). 4. There is a function g L1 () such that
f t (t, )

g for each t J.

Then f (t, ) L1 () for all t J (i.e. X |f (t, x)| d(x) < ), t f (t, x)d(x) is a dierentiable function on J and X d dt
Page: 77

(t0 ) = for all sequences tn J \ {t0 } such that tn t0 . Therefore, G G(t)G(t0 ) limtt0 exists and tt0 (t0 ) = G
X

f (t, x)d(x) =
X X

f (t, x)d(x). t
macro: svmonob.cls

f (t0 , x)d(x). t

job: prob

date/time: 23-Feb-2007/15:20

78

8 Integration Theory

Example 8.39. Recall from Example 8.8 that 1 =


[0,)

2. Let f : X [0, ] be a measurable function, show f d =


X X

ex dm(x) for all > 0.

f d.

(8.9)

Let > 0. For 2 > 0 and n N there exists Cn () < such that 0 d d
n

ex = xn ex C ()ex .

Hint: rst prove the relationship for characteristic functions, then for simple functions, and then for general positive measurable functions. 3. Show that a measurable function f : X C is in L1 ( ) i |f | L1 () and if f L1 ( ) then Eq. (8.9) still holds. Solution to Exercise (8.5). The fact that is a measure follows easily from Corollary 8.16. Clearly Eq. (8.9) holds when f = 1A by denition of . It then holds for positive simple functions, f, by linearity. Finally for general f L+ , choose simple functions, n , such that 0 n f. Then using MCT twice we nd f d = lim
n

Using this fact, Corollary 8.38 and induction gives n!n1 = =


[0,)

d d

1 =
[0,)

d d

ex dm(x)

x e
n x

n x

dm(x). n d = lim
X X n

n d =
X

That is n! =

n [0,)

x e

dm(x). Recall that xt1 ex dx for t > 0.


[0,)

X n

lim n d =
X

f d.

By what we have just proved, for all f : X C we have (t) := |f | d =


X X

|f | d

(The reader should check that (t) < for all t > 0.) We have just shown that (n + 1) = n! for all n N. Remark 8.40. Corollary 8.38 may be generalized by allowing the hypothesis to hold for x X \ E where E B is a xed null set, i.e. E must be independent of t. Consider what happens if we formally apply Corollary 8.38 to g (t) := 1xt dm(x), 0 g (t) = d dt
0

so that f L1 () i |f | L1 (). If f L1 () and f is real, f d =


X X

f+ d
X

f d =
X

f+ d
X

f d

=
X

[f+ f ] d =
X

f d.

1xt dm(x) =
0

1xt dm(x). t

The complex case easily follows from this identity. Notation 8.41 It is customary to informally describe dened in Exercise 8.5 by writing d = d. Exercise 8.6. Let (X, M, ) be a measure space, (Y, F ) be a measurable space and f : X Y be a measurable map. Dene a function : F [0, ] by (A) := (f 1 (A)) for all A F . 1. Show is a measure. (We will write = f or = f 1 .) 2. Show gd = (g f ) d
Y X

The last integral is zero since t 1xt = 0 unless t = x in which case it is not dened. On the other hand g (t) = t so that g (t) = 1. (The reader should decide which hypothesis of Corollary 8.38 has been violated in this example.)

8.4 Densities and Change of Variables Theorems


Exercise 8.5. Let (X, M, ) be a measure space and : X [0, ] be a measurable function. For A M, set (A) := A d. 1. Show : M [0, ] is a measure.
Page: 78 job: prob

(8.10)

for all measurable functions g : Y [0, ]. Hint: see the hint from Exercise 8.5.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

8.4 Densities and Change of Variables Theorems

79

3. Show a measurable function g : Y C is in L1 ( ) i g f L1 () and that Eq. (8.10) holds for all g L1 ( ). Solution to Exercise (8.6). The fact that is a measure is a direct check which will be left to the reader. The key computation is to observe that if A F and g = 1A , then gd =
Y Y

Exercise 8.7. Let F : R R be a C 1 -function such that F (x) > 0 for all x R and limx F (x) = . (Notice that F is strictly increasing so that F 1 : R R exists and moreover, by the inverse function theorem that F 1 is a C 1 function.) Let m be Lebesgue measure on BR and (A) = m(F (A)) = m( F 1
1 1 (A)) = F m (A)

1A d = (A) = f 1 (A) =
X

1f 1 (A) d.

for all A BR . Show d = F dm. Use this result to prove the change of variable formula, h F F dm =
R R

Moreover, 1f 1 (A) (x) = 1 i x f 1 (A) which happens i f (x) A and hence 1f 1 (A) (x) = 1A (f (x)) = g (f (x)) for all x X. Therefore we have gd =
Y X

hdm

(8.14)

(g f ) d

whenever g is a characteristic function. This identity now extends to nonnegative simple functions by linearity and then to all non-negative measurable functions by MCT. The statements involving complex functions follows as in the solution to Exercise 8.5. Remark 8.42. If X is a random variable on a probability space, (, B , P ) , and F (x) := P (X x) . Then E [f (X )] =
R

which is valid for all Borel measurable functions h : R [0, ]. Hint: Start by showing d = F dm on sets of the form A = (a, b] with a, b R and a < b. Then use the uniqueness assertions in Exercise 5.1 to conclude d = F dm on all of BR . To prove Eq. (8.14) apply Exercise 8.6 with g = h F and f = F 1 . Solution to Exercise (8.7). Let d = F dm and A = (a, b], then ((a, b]) = m(F ((a, b])) = m((F (a), F (b)]) = F (b) F (a) while ((a, b]) =
(a,b] b

F dm =
a

F (x)dx = F (b) F (a).

f (x) dF (x)

(8.11)

It follows that both = = F where F is the measure described in Proposition 5.7. By Exercise 8.6 with g = h F and f = F 1 , we nd h F F dm =
R R

where dF (x) is shorthand for dF (x) and F is the unique probability measure on (R, BR ) such that F ((, x]) = F (x) for all x R. Moreover if F : R [0, 1] happens to be C 1 -function, then dF (x) = F (x) dm (x) and Eq. (8.11) may be written as (8.12)

h F d =
R

1 h F d F m = R

(h F ) F 1 dm

=
R

hdm.

This result is also valid for all h L1 (m). Lemma 8.43. Suppose that X is a standard normal random variable, i.e. E [f (X )] =
R

f (x) F (x) dm (x) .

(8.13) then

1 P (X A) = 2

ex
A

/2

dx for all A BR ,

To verify Eq. (8.12) it suces to observe, by the fundamental theorem of calculus, that
b

P (X x) and1
1

1 1 x2 /2 e x 2

(8.15)

F ((a, b]) = F (b) F (a) =


a

F (x) dx =
(a,b]

F dm.

From this equation we may deduce that F (A) =

F dm for all A BR .

See, Gordon, Robert D. Values of Mills ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statistics 12, (1941). 364366. (Reviewer: Z. W. Birnbaum) 62.0X

Page: 79

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

80

8 Integration Theory

lim 2 x 1 1 ex /2 x 2 Proof. We begin by observing that

P (X x)

= 1.

(8.16)

Hence if p = 2 , we nd 2 1 x2 = x2(1+) + 2x 3x so that 1 P (X x)


1 1 x2 /2 x 2 e
1 1 e3x /2 1 + x(2)

P (X x) =
x

2 1 ey /2 dy 2

1 1 y2 /2 1 y y2 /2 e dy = e |x x 2 2 x

from which Eq. (8.15) follows. To prove Eq. (8.16), let > 1, then P (X x) =
2 2 1 1 ey /2 dy ey /2 dy 2 2 x x x 1 1 y2 /2 x 1 y y2 /2 e dy = e |x 2 x 2 x x 2 2 2 1 1 = ex /2 e x /2 . x 2

for x suciently large. Example 8.44. Let {Xn }n=1 be i.i.d. standard normal random variables. Then P (Xn cn ) Now, suppose that we take cn so that ecn /2 =
2

1 2 c2 n /2 . e cn

Hence P (X x) 1 1 x2 /2 x 2 e
2 x 1 ey /2 dy x 2 1 1 x2 /2 x 2 e

C n

1 ex /2 e ex2 /2

x2 /2

2 2 1 1 e( 1)x /2 .

or equivalently, c2 n /2 = ln (n/C ) or cn = 2 ln (n) 2 ln (C ).

From this equation it follows that lim inf P (X x)


2 x 1 1 ex /2 x 2

1 .

(We now take C = 1.) It then follows that P (Xn cn ) and therefore 1 2 ln (n) e
2

ln(n)

1 2 ln (n)

1 n2

Since > 1 was arbitrary, it follows that lim inf Since Eq. (8.15) implies that P (X x) lim sup 1 1 x2 /2 = 1 x x e 2 we are done. Additional information: Suppose that we now take = 1 + xp = Then 2 1 x2 = x2p + 2xp x2 = x22p + 2x2p .
Page: 80 job: prob

P (X x)

2 x 1 1 ex /2 x 2

= 1.

P (Xn cn ) = if < 1
n=1

and

P (Xn cn ) < if > 1.


n=1

Hence an application of Proposition 7.35 shows lim sup


n

1 + xp . xp

Xn = 1 a.s.. 2 ln n

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

8.6 Comparison of the Lebesgue and the Riemann Integral

81

8.5 Measurability on Complete Measure Spaces


In this subsection we will discuss a couple of measurability results concerning completions of measure spaces. Proposition 8.45. Suppose that (X, B , ) is a complete measure space2 and f : X R is measurable. 1. If g : X R is a function such that f (x) = g (x) for a.e. x, then g is measurable. 2. If fn : X R are measurable and f : X R is a function such that limn fn = f, - a.e., then f is measurable as well. Proof. 1. Let E = {x : f (x) = g (x)} which is assumed to be in B and (E ) = 0. Then g = 1E c f + 1E g since f = g on E c . Now 1E c f is measurable so g will be measurable if we show 1E g is measurable. For this consider, (1E g )1 (A) = E c (1E g )1 (A \ {0}) if 0 A (1E g )1 (A) if 0 /A (8.17)

, B ) measurable simple function n 0 such assume that f 0. Choose (M that n f as n . Writing n = ak 1Ak

, we may choose Bk M such that Bk Ak and with Ak M (Ak \ Bk ) = 0. Letting n := ak 1Bk we have produced a (M, B ) measurable simple function n 0 such that En := {n = n } has zero measure. Since (n En ) n (En ) , there exists F M such that n En F and (F ) = 0. It now follows that 1F n = 1F n g := 1F f as n . This shows that g = 1F f is (M, B ) measurable and that {f = g } F has measure zero. Since f = g , a.e., X f d = X gd so to prove Eq. (8.18) it suces to prove gd =
X X

gd.

(8.18)

Since (1E g )1 (B ) E if 0 / B and (E ) = 0, it follow by completeness of B that (1E g )1 (B ) B if 0 / B. Therefore Eq. (8.17) shows that 1E g is measurable. 2. Let E = {x : lim fn (x) = f (x)} by assumption E B and
n

(E ) = 0. Since g := 1E f = limn 1E c fn , g is measurable. Because f = g on E c and (E ) = 0, f = g a.e. so by part 1. f is also measurable. The above results are in general false if (X, B , ) is not complete. For example, let X = {0, 1, 2}, B = {{0}, {1, 2}, X, } and = 0 . Take g (0) = 0, g (1) = 1, g (2) = 2, then g = 0 a.e. yet g is not measurable. is the comLemma 8.46. Suppose that (X, M, ) is a measure space and M pletion of M relative to and is the extension of to M. Then a function , B = B R ) measurable i there exists a function g : X R f : X R is (M and that is (M, B ) measurable such E = {x : f (x) = g (x)} M (E ) = 0, i.e. f (x) = g (x) for a.e. x. Moreover for such a pair f and g, f L1 ( ) i g L1 () and in which case f d =
X X

Because = on M, Eq. (8.18) is easily veried for non-negative M measurable simple functions. Then by the monotone convergence theorem and the approximation Theorem 6.34 it holds for all M measurable functions g : X [0, ]. The rest of the assertions follow in the standard way by considering (Re g ) and (Im g ) .

8.6 Comparison of the Lebesgue and the Riemann Integral


For the rest of this chapter, let < a < b < and f : [a, b] R be a bounded function. A partition of [a, b] is a nite subset [a, b] containing {a, b}. To each partition = {a = t 0 < t 1 < < t n = b } (8.19)

gd.

of [a, b] let mesh( ) := max{|tj tj 1 | : j = 1, . . . , n}, Mj = sup{f (x) : tj x tj 1 }, mj = inf {f (x) : tj x tj 1 }


n n

Proof. Suppose rst that such a function g exists so that (E ) = 0. Since , B ) measurable, we see from Proposition 8.45 that f is (M , B) g is also (M , B ) measurable, by considering f we may measurable. Conversely if f is (M
2

Recall this means that if N X is a set such that N A M and (A) = 0, then N M as well.

G = f (a)1{a} +
1

Mj 1(tj1 ,tj ] , g = f (a)1{a} +


1

mj 1(tj1 ,tj ] and

Page: 81

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

82

8 Integration Theory

S f = Notice that

Mj (tj tj 1 ) and s f =
b b

mj (tj tj 1 ).

Lemma 8.50. The functions H, h : [a, b] R satisfy: 1. h(x) f (x) H (x) for all x [a, b] and h(x) = H (x) i f is continuous at x. 2. If {k }k=1 is any increasing sequence of partitions such that mesh(k ) 0 and G and g are dened as in Eq. (8.20), then G(x) = H (x) f (x) h(x) = g (x) x / := k=1 k . (8.23)

S f =
a

G dm and s f =
a

g dm.

The upper and lower Riemann integrals are dened respectively by


b a

f (x)dx = inf S f and


a b

f (x)dx = sup s f.
b f a

Denition 8.47. The function f is Riemann integrable i and which case the Riemann integral
b b b a

b a

(Note is a countable set.) 3. H and h are Borel measurable. f R Proof. Let Gk := Gk G and gk := gk g. 1. It is clear that h(x) f (x) H (x) for all x and H (x) = h(x) i lim f (y )
y x

f is dened to be the common value:


b

f (x)dx =
a a

f (x)dx =
a

f (x)dx.

The proof of the following Lemma is left to the reader as Exercise 8.18. Lemma 8.48. If and are two partitions of [a, b] and then G G f g g and S f S f s f s f. There exists an increasing sequence of partitions {k }k=1 such that mesh(k ) 0 and
b b

exists and is equal to f (x). That is H (x) = h(x) i f is continuous at x. 2. For x / , Gk (x) H (x) f (x) h(x) gk (x) k and letting k in this equation implies G(x) H (x) f (x) h(x) g (x) x / . Moreover, given > 0 and x / , sup{f (y ) : |y x| , y [a, b]} Gk (x) for all k large enough, since eventually Gk (x) is the supremum of f (y ) over some interval contained in [x , x + ]. Again letting k implies sup f (y ) G(x) and therefore, that
|y x|

(8.24)

Sk f
a

f and sk f
a

f as k .

If we let G := lim Gk and g := lim gk


k k

(8.20)

H (x) = lim sup f (y ) G(x)


y x

then by the dominated convergence theorem,


b

gdm = lim
[a,b]

gk = lim sk f =
[a,b] k a

f (x)dx

(8.21)

and
b

for all x / . Combining this equation with Eq. (8.24) then implies H (x) = G(x) if x / . A similar argument shows that h(x) = g (x) if x / and hence Eq. (8.23) is proved. 3. The functions G and g are limits of measurable functions and hence measurable. Since H = G and h = g except possibly on the countable set , both H and h are also Borel measurable. (You justify this statement.)

Gdm = lim
[a,b]

Gk = lim Sk f =
[a,b] k a

f (x)dx.

(8.22) Theorem 8.51. Let f : [a, b] R be a bounded function. Then


b b

Notation 8.49 For x [a, b], let H (x) = lim sup f (y ) := lim sup{f (y ) : |y x| , y [a, b]} and
y x y x 0

f=
a [a,b]

Hdm and
a

f=
[a,b]

hdm

(8.25)

h(x) = lim inf f (y ) := lim inf {f (y ) : |y x| , y [a, b]}.


0

and the following statements are equivalent:


job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 82

8.7 Exercises

83

1. H (x) = h(x) for m -a.e. x, 2. the set E := {x [a, b] : f is discontinuous at x} is an m null set. 3. f is Riemann integrable. If f is Riemann integrable then f is Lebesgue measurable3 , i.e. f is L/B measurable where L is the Lebesgue algebra and B is the Borel algebra on [a, b]. Moreover if we let m denote the completion of m, then
b

2. Dene A B i (AB ) = 0 and notice that (A, B ) = 0 i A B. Show is an equivalence relation. 3. Let M/ denote M modulo the equivalence relation, , and let [A] := {B M : B A} . Show that ([A] , [B ]) := (A, B ) is gives a well dened metric on M/ . 4. Similarly show ([A]) = (A) is a well dened function on M/ and show : (M/ ) R+ is continuous. Exercise 8.10. Suppose that n : M [0, ] are measures on M for n N. Also suppose that n (A) is increasing in n for all A M. Prove that : M [0, ] dened by (A) := limn n (A) is also a measure. Exercise 8.11. Now suppose that is some index set and for each , : M [0, ] is a measure on M. Dene : M [0, ] by (A) = (A) for each A M. Show that is also a measure. Exercise 8.12. Let (X, M, ) be a measure space and {An }n=1 M, show ({An a.a.}) lim inf (An )
n

Hdm =
[a,b] a

f (x)dx =
[a,b]

f dm =
[a,b]

hdm.

(8.26)

Proof. Let {k }k=1 be an increasing sequence of partitions of [a, b] as described in Lemma 8.48 and let G and g be dened as in Lemma 8.50. Since m( ) = 0, H = G a.e., Eq. (8.25) is a consequence of Eqs. (8.21) and (8.22). From Eq. (8.25), f is Riemann integrable i Hdm =
[a,b] [a,b]

hdm

and if (mn Am ) < for some n, then ({An i.o.}) lim sup (An ) .
n

and because h f H this happens i h(x) = H (x) for m - a.e. x. Since E = {x : H (x) = h(x)}, this last condition is equivalent to E being a m null set. In light of these results and Eq. (8.23), the remaining assertions including Eq. (8.26) are now consequences of Lemma 8.46. Notation 8.52 In view of this theorem we will often write b f dm. a
b a

Exercise 8.13 (Folland 2.13 on p. 52.). Suppose that {fn }n=1 is a sequence of non-negative measurable functions such that fn f pointwise and
n

f (x)dx for Then

lim

fn =

f < .

f = lim

8.7 Exercises
Exercise 8.8. Let be a measure on an algebra A 2X , then (A) + (B ) = (A B ) + (A B ) for all A, B A. Exercise 8.9 (From problem 12 on p. 27 of Folland.). Let (X, M, ) be a nite measure space and for A, B M let (A, B ) = (AB ) where AB = (A \ B ) (B \ A) . It is clear that (A, B ) = (B, A) . Show: 1. satises the triangle inequality: (A, C ) (A, B ) + (B, C ) for all A, B, C M.
3

fn
E

for all measurable sets E M. The conclusion need not hold if limn f. Hint: Fatou times two.

fn =

Exercise 8.14. Give examples of measurable functions {fn } on R such that fn decreases to 0 uniformly yet fn dm = for all n. Also give an example of a sequence of measurable functions {gn } on [0, 1] such that gn 0 while gn dm = 1 for all n. Exercise 8.15. Suppose {an }n= C is a summable sequence (i.e. in is a continuous function for n= |an | < ), then f ( ) := n= an e R and 1 f ()ein d. an = 2
macro: svmonob.cls date/time: 23-Feb-2007/15:20

f need not be Borel measurable.

Page: 83

job: prob

84

8 Integration Theory

Exercise 8.16. For any function f L1 (m) , show x R (,x] f (t) dm (t) is continuous in x. Also nd a nite measure, , on BR such that x (,x] f (t) d (t) is not continuous. Exercise 8.17. Folland 2.31b and 2.31e on p. 60. (The answer in 2.13b is wrong by a factor of 1 and the sum is on k = 1 to . In part (e), s should be taken to be a. You may also freely use the Taylor series expansion (1 z )1/2 = (2n 1)!! n (2n)! n z = 2 z for |z | < 1. n n 2 n! n=0 n=0 4 (n!)

Exercise 8.20 (A simple form of the Strong Law of Large Numbers). 4 Suppose now that E |X1 | < . Show for all > 0 and n N that Sn n
4

= =

1 n + 3n(n 1) 4 n4 1 n1 + 3 1 n1 4 n2

and use this along with Chebyshevs inequality to show P Sn > n n1 + 3 1 n1 4 . 4 n2

Exercise 8.18. Prove Lemma 8.48. 8.7.1 Laws of Large Numbers Exercises For the rest of the problems of this section, let (, B , P ) be a probability n space, {Xn }n=1 be a sequence if i.i.d. random variables, and Sn := k=1 Xk . If E |Xn | = E |X1 | < let := EXn be the mean of Xn , if E |Xn |
2

Conclude from the last estimate and the rst Borel Cantelli Lemma 8.22 that n limn S n = a.s.

= E |X1 |

< , let
2 = E Xn 2 be the standard deviation of Xn

2 := E (Xn ) and if E |Xn |


4

< , let := E |Xn |


4

Exercise 8.19 (A simple form of the Weak Law of Large Numbers). 2 Assume E |X1 | < . Show E E P for all > 0 and n N.
Page: 84 job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

Sn = , n
2

Sn n

Sn > n

2 , and n 2 2 n =

9 Functional Forms of the Theorem


Notation 9.1 Let be a set and H be a subset of the bounded real valued functions on H. We say that H is closed under bounded convergence if; for every sequence, {fn }n=1 H, satisfying: 1. there exists M < such that |fn ( )| M for all and n N, 2. f ( ) := limn fn ( ) exists for all , then f H. Similarly we say that H is closed under monotone conver gence if; for every sequence, {fn }n=1 H, satisfying: 1. there exists M < such that 0 fn ( ) M for all and n N, 2. fn ( ) is increasing in n for all , then f := limn fn H. Clearly if H is closed under bounded convergence then it is also closed under monotone convergence. Proposition 9.2. Let be a set. Suppose that H is a vector subspace of bounded real valued functions from to R which is closed under monotone convergence. Then H is closed under uniform convergence. as well, i.e. {fn }n=1 H with supnN sup |fn ( )| < and fn f, then f H. Proof. Let us rst assume that {fn }n=1 H such that fn converges uniformly to a bounded function, f : R. Let f := sup |f ( )| . Let > 0 be given. By passing to a subsequence if necessary, we may assume f fn 2(n+1) . Let gn := fn n + M with n and M constants to be determined shortly. We then have gn+1 gn = fn+1 fn + n n+1 2
(n+1)

Theorem 9.3 (Dynkins Multiplicative System Theorem). Suppose that H is a vector subspace of bounded functions from to R which contains the constant functions and is closed under monotone convergence. If M is multiplicative system (i.e. M is a subset of H which is closed under pointwise multiplication), then H contains all bounded (M) measurable functions. Proof. Let L := {A : 1A H} . We then have L since 1 = 1 H, if A, B L with A B then B \ A L since 1B \A = 1B 1A H, and if An L with An A, then A L because 1An H and 1An 1A H. Therefore L is system. Let n (x) = 0 [(nx) 1] (see Figure 9.1 below) so that n (x) 1x>0 . Given f1 , f2 , . . . , fk M and a1 , . . . , ak R, let
k

Fn :=
i=1

n (fi ai )

and let M := sup sup |fi ( ) ai | .


i=1,...,k

By the Weierstrass approximation Theorem 4.23, we may nd polynomial functions, pl (x) such that pl n uniformly on [M, M ] .Since pl is a polynomial k it is easily seen that i=1 pl (fi ai ) H. Moreover,
k

pl (fi ai ) Fn uniformly as l ,
i=1

from with it follows that Fn H for all n. Since,


k

+ n n+1 . Fn

Taking n := 2n , then n n+1 = 2n (1 1/2) = 2(n+1) in which case gn+1 gn 0 for all n. By choosing M suciently large, we will also have gn 0 for all n. Since H is a vector space containing the constant functions, gn H and since gn f + M, it follows that f = f + M M H. So we have shown that H is closed under uniform convergence.

1{fi >ai } = 1k i=1 {fi >ai }


i=1

it follows that 1k H or equivalently that k i=1 {fi > ai } L. Therei=1 {fi >ai } fore L contains the system, P , consisting of nite intersections of sets of the form, {f > a} with f M and a R.

86

9 Functional Forms of the Theorem

algebra. Using the fact that H is closed under bounded convergence, it follows that B is closed under increasing unions and hence that B is algebra. Since H is a vector space, H contains all B measurable simple functions. Since every bounded B measurable function may be written as a bounded limit of such simple functions, it follows that H contains all bounded B measurable functions. The proof is now completed by showing B contains (M) as was done in second paragraph of the proof of Theorem 9.3. Corollary 9.5. Suppose H is a real subspace of bounded functions such that 1 H and H is closed under bounded convergence. If P 2 is a multiplicative class such that 1A H for all A P , then H contains all bounded (P ) measurable functions. Proof. Let M = {1}{1A : A P} . Then M H is a multiplicative system and the proof is completed with an application of Theorem 9.3. Example 9.6. Suppose and are two probability measure on (, B ) such that f d =

Fig. 9.1. Plots of 1 , 2 and 3 .

As a consequence of the above paragraphs and the theorem, L contains (P ) = (M) . In particular it follows that 1A H for all A (M) . Since any positive (M) measurable function may be written as a increasing limit of simple functions, it follows that H contains all non-negative bounded (M) measurable functions. Finally, since any bounded (M) measurable functions may be written as the dierence of two such non-negative simple functions, it follows that H contains all bounded (M) measurable functions. Corollary 9.4. Suppose that H is a vector subspace of bounded functions from to R which contains the constant functions and is closed under bounded convergence. If M is a subset of H which is closed under pointwise multiplication, then H contains all bounded (M) measurable functions. Proof. This is of course a direct consequence of Theorem 9.3. Moreover, under the assumptions here, the proof of Theorem 9.3 simplies in that Proposition 9.2 is no longer needed. For fun, let us give another self-contained proof of this corollary which does not even refer to the theorem. In this proof, we will assume that H is the smallest subspace of bounded functions on which contains the constant functions, contains M, and is closed under bounded convergence. (As usual such a space exists by taking the intersection of all such spaces.) For f H, let Hf := {g H : gf H} . The reader will now easily verify that Hf is a linear subspace of H, 1 Hf , and Hf is closed under bounded convergence. Moreover if f M, then M Hf and so by the denition of H, H = Hf , i.e. f g H for all f M and g H. Having proved this it now follows for any f H that M Hf and therefore f g H whenever f, g H, i.e. H is now an algebra of functions. We will now show that B := {A : 1A H} is algebra. Using the fact that H is an algebra containing constants, the reader will easily verify that B is closed under complementation, nite intersections, and contains , i.e. B is an
Page: 86 job: prob

f d

(9.1)

for all f in a multiplicative subset, M, of bounded measurable functions on . Then = on (M) . Indeed, apply Theorem 9.3 with H being the bounded measurable functions on such that Eq. (9.1) holds. In particular if M = {1} {1A : A P} with P being a multiplicative class we learn that = on (M) = (P ) . Corollary 9.7. The smallest subspace of real valued functions, H, on R which contains Cc (R, R) (the space of continuous functions on R with compact support) is the collection of bounded Borel measurable function on R. Proof. By a homework problem, for < a < b < , 1(a,b] may be written as a bounded limit of continuous functions with compact support from which it follows that (Cc (R, R)) = BR . It is also easy to see that 1 is a bounded limit of functions in Cc (R, R) and hence 1 H. The corollary now follows by an application of The result now follows by an application of Theorem 9.3 with M := Cc (R, R). For the rest of this chapter, recall for p [1, ) that Lp () = Lp (X, B , ) is 1/p p the set of measurable functions f : R such that f Lp := |f | d < . It is easy to see that f p = || f p for all R and we will show below that f + g p f p + g p for all f, g Lp () , i.e.
p

satises the triangle inequality.

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

9 Functional Forms of the Theorem

87

Theorem 9.8 (Density Theorem). Let p [1, ), (, B , ) be a measure space and M be an algebra of bounded R valued measurable functions such that 1. M Lp (, R) and (M) = B . 2. There exists k M such that k 1 boundedly. Then to every function f Lp (, R) , there exist n M such that limn f n Lp () = 0, i.e. M is dense in Lp (, R) . Proof. Fix k N for the moment and let H denote those bounded B measurable functions, f : R, for which there exists {n }n=1 M such that limn k f n Lp () = 0. A routine check shows H is a subspace of the bounded measurable R valued functions on , 1 H, M H and H is closed under bounded convergence. To verify the latter assertion, suppose fn H and fn f boundedly. Then, by the dominated convergence theorem, limn k (f fn ) Lp () = 0.1 (Take the dominating function to be g = p [2C |k |] where C is a constant bounding all of the {|fn |}n=1 .) We may now 1 choose n M such that n k fn Lp () n then lim sup k f n
n Lp ()

Theorem 9.10. Suppose p [1, ), A B 2 is an algebra such that (A) = B and is nite on A. Let S(A, ) denote the measurable simple functions, : R such { = y } A for all y R and ({ = 0}) < . Then S(A, ) is dense subspace of Lp (). Proof. Let M := S(A, ). By assumption there exists k A such that (k ) < and k as k . If A A, then k A A and (k A) < so that 1k A M. Therefore 1A = limk 1k A is (M) measurable for every A A. So we have shown that A (M) B and therefore B = (A) (M) B , i.e. (M) = B . The theorem now follows from Theorem 9.8 after observing k := 1k M and k 1 boundedly. Theorem 9.11 (Separability of Lp Spaces). Suppose, p [1, ), A B is a countable algebra such that (A) = B and is nite on A. Then Lp () is separable and D={ aj 1Aj : aj Q + iQ, Aj A with (Aj ) < }

is a countable dense subset. Proof. It is left to reader to check D is dense in S(A, ) relative to the Lp () norm. Once this is done, the proof is then complete since S(A, ) is a dense subspace of Lp () by Theorem 9.10. Notation 9.12 Given a collection of bounded functions, M, from a set, , to R, let M (M ) denote the the bounded monotone increasing (decreasing) limits of functions from M. More explicitly a bounded function, f : R is in M respectively M i there exists fn M such that fn f respectively fn f. Exercise 9.1. Let (, B , P ) be a probability space and X, Y : R be a pair of random variables such that E [f (X ) g (Y )] = E [f (X ) g (X )] for every pair of bounded measurable functions, f, g : R R. Show P (X = Y ) = 1. Hint: Let H denote the bounded Borel measurable functions, h : R2 R such that E [h (X, Y )] = E [h (X, X )] . Use Corollary 9.4 to show H is the vector space of all bounded Borel measurable functions. Then take h (x, y ) = 1{x=y} . Theorem 9.13 (Bounded Approximation Theorem). Let (, B , ) be a nite measure space and M be an algebra of bounded R valued measurable functions such that:
macro: svmonob.cls date/time: 23-Feb-2007/15:20

lim sup k (f fn )
n n

Lp () Lp ()

+ lim sup k fn n

=0

(9.2)

which implies f H. An application of Dynkins Multiplicative System Theorem 9.3, now shows H contains all bounded measurable functions on . Let f Lp () be given. The dominated convergence theorem implies limk k 1{|f |k} f f Lp () = 0. p (Take the dominating function to be g = [2C |f |] where C is a bound on all of the |k | .) Using this and what we have just proved, there exists k M such that 1 k 1{|f |k} f k Lp () . k The same line of reasoning used in Eq. (9.2) now implies limk f k Lp () = 0. Example 9.9. Let be a measure on (R, BR ) such that ([M, M ]) < for all M < . Then, Cc (R, R) (the space of continuous functions on R with compact support) is dense in Lp () for all 1 p < . To see this, apply Theorem 9.8 with M = Cc (R, R) and k := 1[k,k] .
1

It is at this point that the proof would break down if p = .

Page: 87

job: prob

88

9 Functional Forms of the Theorem

1. (M) = B , 2. 1 M, and 3. |f | M for all f M.

Since > 0 was arbitrary, if follows that g H for 0. Similarly, M h g f M and (f (h)) = (h f ) < . which shows g H as well. Because of Theorem 9.3, to complete this proof, it suces to show H is closed under monotone convergence. So suppose that gn H and gn g, where g : R is a bounded function. Since H is a vector space, it follows that 0 n := gn+1 gn H for all n N. So if > 0 is given, we can nd, M un n vn M such that (vn un ) 2n for all n. By replacing un by un 0 M (by observation 1.), we may further assume that un 0. Let
N

Then for every bounded (M) measurable function, g : R, and every niel > 0, there exists f M and h M such that f g h and (h f ) < . in Proof. Let us begin with a few simple observations. s to of 1. M is a lattice if f, g M then s. 1 f g = (f + g + |f g |) M 2 1 f g = (f + g |f g |) M. 2 If f, g M or f, g M then f + g M or f + g M respectively. If 0 and f M (f M ), then f M (f M ) . If f M then f M and visa versa. If fn M and fn f where f : R is a bounded function, then f M . Indeed, by assumption there exists fn,i M such that fn,i fn as i . By observation (1), gn := max {fij : i, j n} M. Moreover it is clear that gn max {fk : k n} = fn f and hence gn g := limn gn f. Since fij g for all i, j, it follows that fn = limj fnj g and consequently that f = limn fn g f. So we have shown that gn f M . and

v :=
n=1

vn = lim

vn M (using observations 2. and 5.)


n=1

2. 3. 4. 5.

and for N N, let


N

uN :=
n=1

un M (using observation 2).

Then

n = lim
n=1

n = lim (gN +1 g1 ) = g g1
n=1 N

and uN g g1 v. Moreover,
N N

Now let H denote the collection of bounded measurable functions which satisfy the assertion of the theorem. Clearly, M H and in fact it is also easy to see that M and M are contained in H as well. For example, if f M , by denition, there exists fn M M such that fn f. Since M fn f f M and (f fn ) 0 by the dominated convergence theorem, it follows that f H. As similar argument shows M H. We will now show H is a vector sub-space of the bounded B = (M) measurable functions. H is closed under addition. If gi H for i = 1, 2, and > 0 is given, we may nd fi M and hi M such that fi gi hi and (hi fi ) < /2 for i = 1, 2. Since h = h1 + h2 M , f := f1 + f2 M , f g1 + g2 h, and (h f ) = (h1 f1 ) + (h2 f2 ) < , it follows that g1 + g2 H. H is closed under scalar multiplication. If g H then g H for all R. Indeed suppose that > 0 is given and f M and h M such that f g h and (h f ) < . Then for 0, M f g h M and (h f ) = (h f ) < .
Page: 88 job: prob

v uN =
n=1

(vn un ) +
n=N +1

(vn )
n=1

2n +
n=N +1

(vn )

+
n=N +1

(vn ) .

However, since

(vn )
n=1 n=1

n + 2n =
n=1

(n ) + ( )

=
n=1

(g g1 ) + ( ) < ,

it follows that for N N suciently large that n=N +1 (vn ) < . Therefore, for this N, we have v uN < 2 and since > 0 is arbitrary, if follows that g g1 H. Since g1 H and H is a vector space, we may conclude that g = (g g1 ) + g1 H.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Theorem 9.14 (Complex Multiplicative System Theorem). Suppose H is a complex linear subspace of the bounded complex functions on , 1 H, H is closed under complex conjugation, and H is closed under bounded convergence. If M H is multiplicative system which is closed under conjugation, then H contains all bounded complex valued (M)-measurable functions. Proof. Let M0 = spanC (M {1}) be the complex span of M. As the reader should verify, M0 is an algebra, M0 H, M0 is closed under complex conjugation and (M0 ) = (M) . Let HR := {f H : f is real valued} and MR 0 := {f M0 : f is real valued} . Then HR is a real linear space of bounded real valued functions 1 which is closed R R under bounded convergence and MR 0 H . Moreover, M0 is a multiplicative system (as the reader should check) and therefore by Theorem 9.3, HR contains all bounded MR 0 measurable real valued functions. Since H and M0 are complex linear spaces closed under complex conjugation, for any f H or 1 f M0 , the functions Re f = 1 2 f + f and Im f = 2i f f are in H or R R R M0 respectively. Therefore M0 = M0 + iM0 , M0 = (M0 ) = (M) , and H = HR + iHR . Hence if f : C is a bounded (M) measurable function, then f = Re f + i Im f H since Re f and Im f are in HR .

10 Multiple and Iterated Integrals


10.1 Iterated Integrals
Notation 10.1 (Iterated Integrals) If (X, M, ) and (Y, N , ) are two measure spaces and f : X Y C is a M N measurable function, the iterated integrals of f (when they make sense) are: and d(x)
X Y

x
Y

f (x, y )d (y ) is M B[0,] measurable, f (x, y )d(x) is N B[0,] measurable,


X

(10.3) (10.4)

d (y )f (x, y ) :=
X Y

f (x, y )d (y ) d(x)

d(x)
X Y

d (y )f (x, y ) =
Y

d (y )
X

d(x)f (x, y ).

(10.5)

and d (y )
Y X

Proof. Suppose that E = A B E := M N and f = 1E . Then d(x)f (x, y ) :=


Y X

f (x, y )d(x) d (y ).

f (x, y ) = 1AB (x, y ) = 1A (x)1B (y ) and one sees that Eqs. (10.1) and (10.2) hold. Moreover f (x, y )d (y ) =
Y Y

Notation 10.2 Suppose that f : X C and g : Y C are functions, let f g denote the function on X Y given by f g (x, y ) = f (x)g (y ). Notice that if f, g are measurable, then f g is (M N , BC ) measurable. To prove this let F (x, y ) = f (x) and G(x, y ) = g (y ) so that f g = F G will be measurable provided that F and G are measurable. Now F = f 1 where 1 : X Y X is the projection map. This shows that F is the composition of measurable functions and hence measurable. Similarly one shows that G is measurable.

1A (x)1B (y )d (y ) = 1A (x) (B ),

so that Eq. (10.3) holds and we have d(x)


X Y

d (y )f (x, y ) = (B )(A).

(10.6)

Similarly, f (x, y )d(x) = (A)1B (y ) and


X

10.2 Tonellis Theorem and Product Measure


Y

d (y )
X

d(x)f (x, y ) = (B )(A)

Theorem 10.3. Suppose (X, M, ) and (Y, N , ) are -nite measure spaces and f is a nonnegative (M N , BR ) measurable function, then for each y Y, x f (x, y ) is M B[0,] measurable, for each x X, y f (x, y ) is N B[0,] measurable, (10.2) (10.1)

from which it follows that Eqs. (10.4) and (10.5) hold in this case as well.

For the moment let us now further assume that $\mu(X) < \infty$ and $\nu(Y) < \infty$ and let $\mathbb{H}$ be the collection of all bounded $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\mathbb{R}})$ - measurable functions on $X \times Y$ such that Eqs. (10.1) - (10.5) hold. Using the fact that measurable functions are closed under pointwise limits and the dominated convergence theorem (the dominating function always being a constant), one easily shows that $\mathbb{H}$ is closed under bounded convergence. Since we have just verified that $1_E \in \mathbb{H}$ for all $E$ in the $\pi$ - class, $\mathcal{E}$, it follows by Corollary 9.5 that $\mathbb{H}$ is the space of all bounded $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\mathbb{R}})$ - measurable functions on $X \times Y$. Moreover, if $f : X \times Y \to [0,\infty]$ is a $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\bar{\mathbb{R}}})$ - measurable function, let $f_M = M \wedge f$ so that $f_M \uparrow f$ as $M \to \infty$. Then Eqs. (10.1) - (10.5) hold with $f$ replaced by $f_M$ for all $M \in \mathbb{N}$. Repeated use of the monotone convergence theorem allows us to pass to the limit $M \to \infty$ in these equations to deduce the theorem in the case $\mu$ and $\nu$ are finite measures.

For the $\sigma$ - finite case, choose $X_n \in \mathcal{M}$, $Y_n \in \mathcal{N}$ such that $X_n \uparrow X$, $Y_n \uparrow Y$, $\mu(X_n) < \infty$ and $\nu(Y_n) < \infty$ for all $n \in \mathbb{N}$. Then define $\mu_m(A) := \mu(X_m \cap A)$ and $\nu_n(B) := \nu(Y_n \cap B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$, or equivalently $d\mu_m = 1_{X_m} d\mu$ and $d\nu_n = 1_{Y_n} d\nu$. By what we have just proved, Eqs. (10.1) - (10.5) hold with $\mu$ replaced by $\mu_m$ and $\nu$ by $\nu_n$ for all $(\mathcal{M} \otimes \mathcal{N}, \mathcal{B}_{\bar{\mathbb{R}}})$ - measurable functions, $f : X \times Y \to [0,\infty]$. The validity of Eqs. (10.1) - (10.5) then follows by passing to the limits $m \to \infty$ and then $n \to \infty$, making use of the monotone convergence theorem in the following context: for all $u \in L^+(X,\mathcal{M})$,
\[
\int_X u \, d\mu_m = \int_X u \, 1_{X_m} \, d\mu \uparrow \int_X u \, d\mu \text{ as } m \to \infty,
\]
and for all $v \in L^+(Y,\mathcal{N})$,
\[
\int_Y v \, d\nu_n = \int_Y v \, 1_{Y_n} \, d\nu \uparrow \int_Y v \, d\nu \text{ as } n \to \infty.
\]

Corollary 10.4. Suppose $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are $\sigma$ - finite measure spaces. Then there exists a unique measure $\pi$ on $\mathcal{M} \otimes \mathcal{N}$ such that $\pi(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$. Moreover $\pi$ is given by
\[
\pi(E) = \int_X d\mu(x) \int_Y d\nu(y) \, 1_E(x,y) = \int_Y d\nu(y) \int_X d\mu(x) \, 1_E(x,y) \tag{10.7}
\]
for all $E \in \mathcal{M} \otimes \mathcal{N}$, and $\pi$ is $\sigma$ - finite.

Proof. Notice that any measure $\pi$ such that $\pi(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$ is necessarily $\sigma$ - finite. Indeed, let $X_n \in \mathcal{M}$ and $Y_n \in \mathcal{N}$ be chosen so that $\mu(X_n) < \infty$, $\nu(Y_n) < \infty$, $X_n \uparrow X$ and $Y_n \uparrow Y$; then $X_n \times Y_n \in \mathcal{M} \otimes \mathcal{N}$, $X_n \times Y_n \uparrow X \times Y$ and $\pi(X_n \times Y_n) < \infty$ for all $n$. The uniqueness assertion is a consequence of the combination of Exercises 4.5 and 5.1 (Proposition 4.26) with $\mathcal{E} = \mathcal{M} \times \mathcal{N}$. For the existence, it suffices to observe, using the monotone convergence theorem, that $\pi$ defined in Eq. (10.7) is a measure on $\mathcal{M} \otimes \mathcal{N}$. Moreover this measure satisfies $\pi(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{M}$ and $B \in \mathcal{N}$ by Eq. (10.6).

Notation 10.5 The measure $\pi$ is called the product measure of $\mu$ and $\nu$ and will be denoted by $\mu \otimes \nu$.

Theorem 10.6 (Tonelli's Theorem). Suppose $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are $\sigma$ - finite measure spaces and $\pi = \mu \otimes \nu$ is the product measure on $\mathcal{M} \otimes \mathcal{N}$. If $f \in L^+(X \times Y, \mathcal{M} \otimes \mathcal{N})$, then $f(\cdot,y) \in L^+(X,\mathcal{M})$ for all $y \in Y$, $f(x,\cdot) \in L^+(Y,\mathcal{N})$ for all $x \in X$,
\[
\int_Y f(\cdot,y) \, d\nu(y) \in L^+(X,\mathcal{M}), \qquad \int_X f(x,\cdot) \, d\mu(x) \in L^+(Y,\mathcal{N}),
\]
and
\[
\int_{X \times Y} f \, d\pi = \int_X d\mu(x) \int_Y d\nu(y) \, f(x,y) \tag{10.8}
\]
\[
\phantom{\int_{X \times Y} f \, d\pi} = \int_Y d\nu(y) \int_X d\mu(x) \, f(x,y). \tag{10.9}
\]

Proof. By Theorem 10.3 and Corollary 10.4, the theorem holds when $f = 1_E$ with $E \in \mathcal{M} \otimes \mathcal{N}$. Using the linearity of all of the statements, the theorem is also true for non-negative simple functions. Then using the monotone convergence theorem repeatedly along with the approximation Theorem 6.34, one deduces the theorem for general $f \in L^+(X \times Y, \mathcal{M} \otimes \mathcal{N})$.

Example 10.7. In this example we are going to show $I := \int_{\mathbb{R}} e^{-x^2/2} \, dm(x) = \sqrt{2\pi}$. To this end we observe, using Tonelli's theorem, that
\[
I^2 = \left( \int_{\mathbb{R}} e^{-x^2/2} \, dm(x) \right)^2 = \int_{\mathbb{R}} e^{-y^2/2} \left( \int_{\mathbb{R}} e^{-x^2/2} \, dm(x) \right) dm(y) = \int_{\mathbb{R}^2} e^{-(x^2+y^2)/2} \, dm^2(x,y)
\]
where $m^2 = m \otimes m$ is Lebesgue measure on $\left( \mathbb{R}^2, \mathcal{B}_{\mathbb{R}^2} = \mathcal{B}_{\mathbb{R}} \otimes \mathcal{B}_{\mathbb{R}} \right)$. From the monotone convergence theorem,
\[
I^2 = \lim_{R \to \infty} \int_{D_R} e^{-(x^2+y^2)/2} \, d\pi(x,y)
\]
where $D_R = \{(x,y) : x^2 + y^2 < R^2\}$. Using the change of variables theorem described in Section 10.5 below,$^1$ we find
\[
\int_{D_R} e^{-(x^2+y^2)/2} \, d\pi(x,y) = \int_{(0,R) \times (0,2\pi)} e^{-r^2/2} \, r \, dr \, d\theta = 2\pi \int_0^R e^{-r^2/2} \, r \, dr = 2\pi \left( 1 - e^{-R^2/2} \right).
\]
From this we learn that
\[
I^2 = \lim_{R \to \infty} 2\pi \left( 1 - e^{-R^2/2} \right) = 2\pi
\]
as desired.

$^1$ Alternatively, you can easily show that the integral $\int_{D_R} f \, dm^2$ agrees with the multiple integral in undergraduate analysis when $f$ is continuous. Then use the change of variables theorem from undergraduate analysis.
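(Aside, not part of the notes: a quick numerical sanity check of Example 10.7, assuming NumPy is available. The tail of the integrand beyond $|x| = 10$ is smaller than $10^{-20}$, so a midpoint rule on $[-10,10]$ recovers $\sqrt{2\pi}$ to high accuracy.)

```python
import numpy as np

# Midpoint-rule check of Example 10.7: int_R exp(-x^2/2) dx = sqrt(2*pi).
h = 1e-4
x = np.arange(-10.0, 10.0, h) + h / 2  # midpoints of a uniform grid
I = h * np.sum(np.exp(-x**2 / 2))

print(I, np.sqrt(2 * np.pi))  # both ~ 2.5066282746...
```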

10.3 Fubini's Theorem

The following convention will be in force for the rest of this section.

Convention: If $(X,\mathcal{M},\mu)$ is a measure space and $f : X \to \mathbb{C}$ is a measurable but non-integrable function, i.e. $\int_X |f| \, d\mu = \infty$, by convention we will define $\int_X f \, d\mu := 0$. However if $f$ is a non-negative function (i.e. $f : X \to [0,\infty]$) which is non-integrable, we will still write $\int_X f \, d\mu = \infty$.

Theorem 10.8 (Fubini's Theorem). Suppose $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are $\sigma$ - finite measure spaces, $\pi = \mu \otimes \nu$ is the product measure on $\mathcal{M} \otimes \mathcal{N}$ and $f : X \times Y \to \mathbb{C}$ is a $\mathcal{M} \otimes \mathcal{N}$ - measurable function. Then the following three conditions are equivalent:
\[
\int_{X \times Y} |f| \, d\pi < \infty, \text{ i.e. } f \in L^1(\pi), \tag{10.10}
\]
\[
\int_X \left( \int_Y |f(x,y)| \, d\nu(y) \right) d\mu(x) < \infty \text{ and} \tag{10.11}
\]
\[
\int_Y \left( \int_X |f(x,y)| \, d\mu(x) \right) d\nu(y) < \infty. \tag{10.12}
\]
If any one (and hence all) of these conditions hold, then $f(x,\cdot) \in L^1(\nu)$ for $\mu$ - a.e. $x$, $f(\cdot,y) \in L^1(\mu)$ for $\nu$ - a.e. $y$, $\int_Y f(\cdot,y) \, d\nu(y) \in L^1(\mu)$, $\int_X f(x,\cdot) \, d\mu(x) \in L^1(\nu)$ and Eqs. (10.8) and (10.9) are still valid.

Proof. The equivalence of Eqs. (10.10) - (10.12) is a direct consequence of Tonelli's Theorem 10.6. Now suppose $f \in L^1(\pi)$ is a real valued function and let
\[
E := \left\{ x \in X : \int_Y |f(x,y)| \, d\nu(y) = \infty \right\}. \tag{10.13}
\]
Then by Tonelli's theorem, $x \to \int_Y |f(x,y)| \, d\nu(y)$ is measurable and hence $E \in \mathcal{M}$. Moreover Tonelli's theorem implies
\[
\int_X \left( \int_Y |f(x,y)| \, d\nu(y) \right) d\mu(x) = \int_{X \times Y} |f| \, d\pi < \infty
\]
which implies that $\mu(E) = 0$. Let $f_\pm$ be the positive and negative parts of $f$; then using the above convention we have
\[
\int_Y f(x,y) \, d\nu(y) = \int_Y 1_{E^c}(x) f(x,y) \, d\nu(y) = \int_Y 1_{E^c}(x) \left[ f_+(x,y) - f_-(x,y) \right] d\nu(y)
\]
\[
= \int_Y 1_{E^c}(x) f_+(x,y) \, d\nu(y) - \int_Y 1_{E^c}(x) f_-(x,y) \, d\nu(y). \tag{10.14}
\]
Noting that $1_{E^c}(x) f_\pm(x,y) = \left( 1_{E^c \times Y} f_\pm \right)(x,y)$ is a positive $\mathcal{M} \otimes \mathcal{N}$ - measurable function, it follows from another application of Tonelli's theorem that $x \to \int_Y f(x,y) \, d\nu(y)$ is $\mathcal{M}$ - measurable, being the difference of two measurable functions. Moreover
\[
\int_X \left| \int_Y f(x,y) \, d\nu(y) \right| d\mu(x) \le \int_X \left( \int_Y |f(x,y)| \, d\nu(y) \right) d\mu(x) < \infty,
\]
which shows $\int_Y f(\cdot,y) \, d\nu(y) \in L^1(\mu)$. Integrating Eq. (10.14) on $x$ and using Tonelli's theorem repeatedly implies,
\[
\int_X \left( \int_Y f(x,y) \, d\nu(y) \right) d\mu(x) = \int_X d\mu(x) \int_Y d\nu(y) \, 1_{E^c}(x) f_+(x,y) - \int_X d\mu(x) \int_Y d\nu(y) \, 1_{E^c}(x) f_-(x,y)
\]
\[
= \int_Y d\nu(y) \int_X d\mu(x) \, 1_{E^c}(x) f_+(x,y) - \int_Y d\nu(y) \int_X d\mu(x) \, 1_{E^c}(x) f_-(x,y)
\]
\[
= \int_Y d\nu(y) \int_X d\mu(x) \, f_+(x,y) - \int_Y d\nu(y) \int_X d\mu(x) \, f_-(x,y)
\]
\[
= \int_{X \times Y} f_+ \, d\pi - \int_{X \times Y} f_- \, d\pi = \int_{X \times Y} (f_+ - f_-) \, d\pi = \int_{X \times Y} f \, d\pi \tag{10.15}
\]
which proves Eq. (10.8) holds. Now suppose that $f = u + iv$ is complex valued and again let $E$ be as in Eq. (10.13). Just as above we still have $E \in \mathcal{M}$ and $\mu(E) = 0$. By our convention,
\[
\int_Y f(x,y) \, d\nu(y) = \int_Y 1_{E^c}(x) f(x,y) \, d\nu(y) = \int_Y 1_{E^c}(x) \left[ u(x,y) + iv(x,y) \right] d\nu(y)
\]
\[
= \int_Y 1_{E^c}(x) u(x,y) \, d\nu(y) + i \int_Y 1_{E^c}(x) v(x,y) \, d\nu(y)
\]
which is measurable in $x$ by what we have just proved. Similarly one shows $\int_Y f(\cdot,y) \, d\nu(y) \in L^1(\mu)$ and Eq. (10.8) still holds by a computation similar to that done in Eq. (10.15). The assertions pertaining to Eq. (10.9) may be proved in the same way.
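(Aside, not part of the notes: the integrability hypothesis (10.10) really is needed for the iterated integrals to agree. A standard illustration, sketched numerically below assuming SciPy is available, is $f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2$ on $(0,1)^2$: both iterated integrals exist but one equals $\pi/4$ and the other $-\pi/4$, because $f \notin L^1(m \otimes m)$ on the square.)

```python
from scipy.integrate import quad

# f is NOT in L^1 of the product measure, so Fubini's theorem does not
# apply; the two iterated integrals exist but disagree (+pi/4 vs -pi/4).
f = lambda x, y: (x**2 - y**2) / (x**2 + y**2) ** 2

inner_dy = lambda x: quad(lambda y: f(x, y), 0, 1)[0]  # integrate y first
inner_dx = lambda y: quad(lambda x: f(x, y), 0, 1)[0]  # integrate x first

print(quad(inner_dy, 0, 1)[0])  # ~ +0.7853981... = +pi/4
print(quad(inner_dx, 0, 1)[0])  # ~ -0.7853981... = -pi/4
```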

The previous theorems have obvious generalizations to products of any finite number of $\sigma$ - finite measure spaces. For example the following theorem holds.

Theorem 10.9. Suppose $\{(X_i, \mathcal{M}_i, \mu_i)\}_{i=1}^n$ are $\sigma$ - finite measure spaces and $X := X_1 \times \dots \times X_n$. Then there exists a unique measure, $\pi$, on $(X, \mathcal{M}_1 \otimes \dots \otimes \mathcal{M}_n)$ such that
\[
\pi(A_1 \times \dots \times A_n) = \mu_1(A_1) \dots \mu_n(A_n) \text{ for all } A_i \in \mathcal{M}_i.
\]
(This measure and its completion will be denoted by $\mu_1 \otimes \dots \otimes \mu_n$.) If $f : X \to [0,\infty]$ is a $\mathcal{M}_1 \otimes \dots \otimes \mathcal{M}_n$ - measurable function then
\[
\int_X f \, d\pi = \int_{X_{\sigma(1)}} d\mu_{\sigma(1)}(x_{\sigma(1)}) \dots \int_{X_{\sigma(n)}} d\mu_{\sigma(n)}(x_{\sigma(n)}) \, f(x_1, \dots, x_n) \tag{10.16}
\]
where $\sigma$ is any permutation of $\{1,2,\dots,n\}$. This equation also holds for any $f \in L^1(\pi)$ and moreover, $f \in L^1(\pi)$ iff
\[
\int_{X_{\sigma(1)}} d\mu_{\sigma(1)}(x_{\sigma(1)}) \dots \int_{X_{\sigma(n)}} d\mu_{\sigma(n)}(x_{\sigma(n)}) \, |f(x_1, \dots, x_n)| < \infty
\]
for some (and hence all) permutations, $\sigma$.

This theorem can be proved by the same methods as in the two factor case, see Exercise 10.4. Alternatively, one can use the theorems already proved and induction on $n$, see Exercise 10.5 in this regard.

Proposition 10.10. Suppose that $\{X_k\}_{k=1}^n$ are random variables on a probability space $(\Omega, \mathcal{B}, P)$, $\mu_k = P \circ X_k^{-1}$ is the distribution for $X_k$ for $k = 1,2,\dots,n$, and $\pi := P \circ (X_1,\dots,X_n)^{-1}$ is the joint distribution of $(X_1,\dots,X_n)$. Then the following are equivalent:

1. $\{X_k\}_{k=1}^n$ are independent,
2. for all bounded measurable functions, $f : (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$,
\[
\mathbb{E} f(X_1,\dots,X_n) = \int_{\mathbb{R}^n} f(x_1,\dots,x_n) \, d\mu_1(x_1) \dots d\mu_n(x_n) \quad \text{(taken in any order), and} \tag{10.17}
\]
3. $\pi = \mu_1 \otimes \mu_2 \otimes \dots \otimes \mu_n$.

Proof. ($1 \implies 2$) Suppose that $\{X_k\}_{k=1}^n$ are independent and let $\mathbb{H}$ denote the set of bounded measurable functions, $f : (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ such that Eq. (10.17) holds. Then it is easily checked that $\mathbb{H}$ is a vector space which contains the constant functions and is closed under bounded convergence. Moreover, if $f = 1_{A_1 \times \dots \times A_n}$ where $A_i \in \mathcal{B}_{\mathbb{R}}$, we have
\[
\mathbb{E} f(X_1,\dots,X_n) = P((X_1,\dots,X_n) \in A_1 \times \dots \times A_n) = \prod_{j=1}^n P(X_j \in A_j) = \prod_{j=1}^n \mu_j(A_j) = \int_{\mathbb{R}^n} f \, d\mu_1 \dots d\mu_n.
\]
Therefore, $\mathbb{H}$ contains the multiplicative system, $\mathbb{M} := \{1_{A_1 \times \dots \times A_n} : A_i \in \mathcal{B}_{\mathbb{R}}\}$, and so by the multiplicative systems theorem, $\mathbb{H}$ contains all bounded $\sigma(\mathbb{M}) = \mathcal{B}_{\mathbb{R}^n}$ - measurable functions.

($2 \implies 3$) Let $A \in \mathcal{B}_{\mathbb{R}^n}$ and $f = 1_A$ in Eq. (10.17) to conclude that
\[
\pi(A) = P((X_1,\dots,X_n) \in A) = \mathbb{E} 1_A(X_1,\dots,X_n) = \int_{\mathbb{R}^n} 1_A(x_1,\dots,x_n) \, d\mu_1(x_1) \dots d\mu_n(x_n) = (\mu_1 \otimes \dots \otimes \mu_n)(A).
\]

($3 \implies 1$) This follows from the identity,
\[
P((X_1,\dots,X_n) \in A_1 \times \dots \times A_n) = \pi(A_1 \times \dots \times A_n) = \prod_{j=1}^n \mu_j(A_j) = \prod_{j=1}^n P(X_j \in A_j),
\]
which is valid for all $A_j \in \mathcal{B}_{\mathbb{R}}$.

Example 10.11 (No Ties). Suppose that $X$ and $Y$ are independent random variables on a probability space $(\Omega, \mathcal{B}, P)$. If $F(x) := P(X \le x)$ is continuous, then $P(X = Y) = 0$. To prove this, let $\mu(A) := P(X \in A)$ and $\nu(A) := P(Y \in A)$. Because $F$ is continuous, $\mu(\{y\}) = F(y) - F(y-) = 0$, and hence
\[
P(X = Y) = \mathbb{E} 1_{\{X=Y\}} = \int_{\mathbb{R}^2} 1_{\{x=y\}} \, d(\mu \otimes \nu)(x,y) = \int_{\mathbb{R}} d\nu(y) \int_{\mathbb{R}} d\mu(x) \, 1_{\{x=y\}} = \int_{\mathbb{R}} \mu(\{y\}) \, d\nu(y) = \int_{\mathbb{R}} 0 \, d\nu(y) = 0.
\]
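(Aside, not part of the notes: a Monte Carlo sketch, assuming NumPy, of the factorization in Proposition 10.10(2) and of the "no ties" phenomenon of Example 10.11. The seed and the choice $f(x,y) = e^{-(x^2+y^2)}$ are just for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
n = 10**6
X = rng.standard_normal(n)              # X and Y independent by construction
Y = rng.standard_normal(n)

# Proposition 10.10(2): E f(X, Y) factors for product-form f.
lhs = np.mean(np.exp(-(X**2 + Y**2)))
rhs = np.mean(np.exp(-X**2)) * np.mean(np.exp(-Y**2))
print(lhs, rhs)                         # agree up to ~ 1/sqrt(n) noise

# Example 10.11: with a continuous law, ties have probability zero.
print(np.sum(X == Y))                   # 0
```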

Example 10.12. In this example we will show
\[
\lim_{M \to \infty} \int_0^M \frac{\sin x}{x} \, dx = \pi/2. \tag{10.18}
\]
To see this write $\frac{1}{x} = \int_0^\infty e^{-tx} \, dt$ and use Fubini-Tonelli to conclude that

\[
\int_0^M \frac{\sin x}{x} \, dx = \int_0^M \left( \int_0^\infty e^{-tx} \, dt \right) \sin x \, dx = \int_0^\infty \left( \int_0^M e^{-tx} \sin x \, dx \right) dt
\]
\[
= \int_0^\infty \frac{1}{1+t^2} \left( 1 - t e^{-Mt} \sin M - e^{-Mt} \cos M \right) dt \to \int_0^\infty \frac{1}{1+t^2} \, dt = \frac{\pi}{2} \text{ as } M \to \infty,
\]
wherein we have used the dominated convergence theorem (for instance, take $g(t) := \frac{2+t}{1+t^2}$, which dominates the integrand for every $M \ge 0$) to pass to the limit.

The next example is a refinement of this result.

Example 10.13. We have
\[
\int_0^\infty \frac{\sin x}{x} e^{-\Lambda x} \, dx = \frac{\pi}{2} - \arctan \Lambda \text{ for all } \Lambda > 0 \tag{10.19}
\]
and for $\Lambda, M \in [0,\infty)$,
\[
\left| \int_0^M \frac{\sin x}{x} e^{-\Lambda x} \, dx - \frac{\pi}{2} + \arctan \Lambda \right| \le C \, \frac{e^{-M\Lambda}}{M} \tag{10.20}
\]
where $C = \max_{x \ge 0} \frac{1+x}{1+x^2} = \frac{1+\sqrt{2}}{2} \cong 1.2$. In particular Eq. (10.18) is valid.

To verify these assertions, first notice that by the fundamental theorem of calculus,
\[
|\sin x| = \left| \int_0^x \cos y \, dy \right| \le \left| \int_0^x |\cos y| \, dy \right| \le \left| \int_0^x 1 \, dy \right| = |x|
\]
so $\left| \frac{\sin x}{x} \right| \le 1$ for all $x \ne 0$. Making use of the identity $\int_0^\infty e^{-tx} \, dt = 1/x$ and Fubini's theorem,
\[
\int_0^M \frac{\sin x}{x} e^{-\Lambda x} \, dx = \int_0^M dx \, \sin x \, e^{-\Lambda x} \int_0^\infty e^{-tx} \, dt = \int_0^\infty dt \int_0^M dx \, \sin x \, e^{-(\Lambda+t)x}
\]
\[
= \int_0^\infty \frac{1 - \left( \cos M + (\Lambda+t) \sin M \right) e^{-M(\Lambda+t)}}{(\Lambda+t)^2 + 1} \, dt
\]
\[
= \int_0^\infty \frac{1}{(\Lambda+t)^2 + 1} \, dt - \int_0^\infty \frac{\cos M + (\Lambda+t) \sin M}{(\Lambda+t)^2 + 1} e^{-M(\Lambda+t)} \, dt
\]
\[
= \frac{\pi}{2} - \arctan \Lambda - \varepsilon(M,\Lambda) \tag{10.21}
\]
where
\[
\varepsilon(M,\Lambda) = \int_0^\infty \frac{\cos M + (\Lambda+t) \sin M}{(\Lambda+t)^2 + 1} e^{-M(\Lambda+t)} \, dt.
\]
Since
\[
\left| \frac{\cos M + (\Lambda+t) \sin M}{(\Lambda+t)^2 + 1} \right| \le \frac{1 + (\Lambda+t)}{(\Lambda+t)^2 + 1} \le C,
\]
\[
|\varepsilon(M,\Lambda)| \le \int_0^\infty C \, e^{-M(\Lambda+t)} \, dt = C \, \frac{e^{-M\Lambda}}{M}.
\]
This estimate along with Eq. (10.21) proves Eq. (10.20), from which Eq. (10.18) follows by taking $\Lambda = 0$ and letting $M \to \infty$, and Eq. (10.19) follows (using the dominated convergence theorem again) by letting $M \to \infty$.
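(Aside, not part of the notes: a numerical look at Eqs. (10.18) and (10.20), assuming SciPy. At $\Lambda = 0$ the error bound (10.20) decays like $C/M$, which is roughly what one sees.)

```python
import numpy as np
from scipy.integrate import quad

# Truncated sinc integrals approach pi/2 with error on the order of 1/M.
# np.sinc(x/pi) = sin(x)/x, and is well defined at x = 0.
for M in [10, 100, 1000]:
    val, _ = quad(lambda x: np.sinc(x / np.pi), 0, M, limit=2000)
    print(M, val - np.pi / 2)  # magnitude shrinks roughly like 1/M
```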

Note: you may skip the rest of this chapter!

10.4 Fubini's Theorem and Completions

Notation 10.14 Given $E \subset X \times Y$ and $x \in X$, let
\[
{}_x E := \{ y \in Y : (x,y) \in E \}.
\]
Similarly if $y \in Y$ is given let
\[
E_y := \{ x \in X : (x,y) \in E \}.
\]
If $f : X \times Y \to \mathbb{C}$ is a function let $f_x = f(x,\cdot)$ and $f^y := f(\cdot,y)$ so that $f_x : Y \to \mathbb{C}$ and $f^y : X \to \mathbb{C}$.

Theorem 10.15. Suppose $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are complete $\sigma$ - finite measure spaces. Let $(X \times Y, \mathcal{L}, \lambda)$ be the completion of $(X \times Y, \mathcal{M} \otimes \mathcal{N}, \mu \otimes \nu)$. If $f$ is $\mathcal{L}$ - measurable and (a) $f \ge 0$ or (b) $f \in L^1(\lambda)$, then $f_x$ is $\mathcal{N}$ - measurable for $\mu$ - a.e. $x$ and $f^y$ is $\mathcal{M}$ - measurable for $\nu$ - a.e. $y$ and in case (b) $f_x \in L^1(\nu)$ and $f^y \in L^1(\mu)$ for $\mu$ - a.e. $x$ and $\nu$ - a.e. $y$ respectively. Moreover,
\[
\left( x \to \int_Y f_x \, d\nu \right) \in L^1(\mu) \quad \text{and} \quad \left( y \to \int_X f^y \, d\mu \right) \in L^1(\nu)
\]
and
\[
\int_{X \times Y} f \, d\lambda = \int_Y d\nu \int_X d\mu \, f = \int_X d\mu \int_Y d\nu \, f.
\]

Proof. If $E \in \mathcal{M} \otimes \mathcal{N}$ is a $\mu \otimes \nu$ - null set (i.e. $(\mu \otimes \nu)(E) = 0$), then
\[
0 = (\mu \otimes \nu)(E) = \int_X \nu({}_x E) \, d\mu(x) = \int_Y \mu(E_y) \, d\nu(y).
\]
This shows that $\mu(\{x : \nu({}_x E) \ne 0\}) = 0$ and $\nu(\{y : \mu(E_y) \ne 0\}) = 0$, i.e. $\nu({}_x E) = 0$ for $\mu$ - a.e. $x$ and $\mu(E_y) = 0$ for $\nu$ - a.e. $y$. If $h$ is $\mathcal{L}$ - measurable and $h = 0$ for $\lambda$ - a.e., then there exists $E \in \mathcal{M} \otimes \mathcal{N}$ such that $\{(x,y) : h(x,y) \ne 0\} \subset E$ and $(\mu \otimes \nu)(E) = 0$. Therefore $|h(x,y)| \le 1_E(x,y)$ and $(\mu \otimes \nu)(E) = 0$. Since
\[
\{h_x \ne 0\} = \{ y \in Y : h(x,y) \ne 0 \} \subset {}_x E \quad \text{and} \quad \{h^y \ne 0\} = \{ x \in X : h(x,y) \ne 0 \} \subset E_y
\]
we learn that for $\mu$ - a.e. $x$ and $\nu$ - a.e. $y$ that $\{h_x \ne 0\} \in \mathcal{N}$, $\{h^y \ne 0\} \in \mathcal{M}$, $\nu(\{h_x \ne 0\}) = 0$ and $\mu(\{h^y \ne 0\}) = 0$. This implies $\int_Y h(x,y) \, d\nu(y)$ exists and equals $0$ for $\mu$ - a.e. $x$ and similarly that $\int_X h(x,y) \, d\mu(x)$ exists and equals $0$ for $\nu$ - a.e. $y$. Therefore
\[
0 = \int_{X \times Y} h \, d\lambda = \int_Y \left( \int_X h \, d\mu \right) d\nu = \int_X \left( \int_Y h \, d\nu \right) d\mu.
\]
For general $f \in L^1(\lambda)$, we may choose $g \in L^1(\mathcal{M} \otimes \mathcal{N}, \mu \otimes \nu)$ such that $f(x,y) = g(x,y)$ for $\lambda$ - a.e. $(x,y)$. Define $h := f - g$. Then $h = 0$, $\lambda$ - a.e. Hence by what we have just proved and Theorem 10.6, $f = g + h$ has the following properties:

1. For $\mu$ - a.e. $x$, $y \to f(x,y) = g(x,y) + h(x,y)$ is in $L^1(\nu)$ and
\[
\int_Y f(x,y) \, d\nu(y) = \int_Y g(x,y) \, d\nu(y).
\]
2. For $\nu$ - a.e. $y$, $x \to f(x,y) = g(x,y) + h(x,y)$ is in $L^1(\mu)$ and
\[
\int_X f(x,y) \, d\mu(x) = \int_X g(x,y) \, d\mu(x).
\]

From these assertions and Theorem 10.6, it follows that
\[
\int_X d\mu(x) \int_Y d\nu(y) \, f(x,y) = \int_X d\mu(x) \int_Y d\nu(y) \, g(x,y) = \int_Y d\nu(y) \int_X d\mu(x) \, g(x,y)
\]
\[
= \int_{X \times Y} g(x,y) \, d(\mu \otimes \nu)(x,y) = \int_{X \times Y} f(x,y) \, d\lambda(x,y).
\]
Similarly it is shown that
\[
\int_Y d\nu(y) \int_X d\mu(x) \, f(x,y) = \int_{X \times Y} f(x,y) \, d\lambda(x,y).
\]

10.5 Lebesgue Measure on $\mathbb{R}^d$ and the Change of Variables Theorem

Notation 10.16 Let
\[
m^d := \underbrace{m \otimes \dots \otimes m}_{d \text{ times}} \text{ on } \mathcal{B}_{\mathbb{R}^d} = \underbrace{\mathcal{B}_{\mathbb{R}} \otimes \dots \otimes \mathcal{B}_{\mathbb{R}}}_{d \text{ times}}
\]
be the $d$ - fold product of Lebesgue measure $m$ on $\mathcal{B}_{\mathbb{R}}$. We will also use $m^d$ to denote its completion and let $\mathcal{L}_d$ be the completion of $\mathcal{B}_{\mathbb{R}^d}$ relative to $m^d$. A subset $A \in \mathcal{L}_d$ is called a Lebesgue measurable set and $m^d$ is called $d$ - dimensional Lebesgue measure, or just Lebesgue measure for short.

Definition 10.17. A function $f : \mathbb{R}^d \to \mathbb{R}$ is Lebesgue measurable if $f^{-1}(\mathcal{B}_{\mathbb{R}}) \subset \mathcal{L}_d$.

Notation 10.18 I will often be sloppy in the sequel and write $m$ for $m^d$ and $dx$ for $dm(x) = dm^d(x)$, i.e.
\[
\int_{\mathbb{R}^d} f(x) \, dx = \int_{\mathbb{R}^d} f \, dm = \int_{\mathbb{R}^d} f \, dm^d.
\]
Hopefully the reader will understand the meaning from the context.

Theorem 10.19. Lebesgue measure $m^d$ is translation invariant. Moreover $m^d$ is the unique translation invariant measure on $\mathcal{B}_{\mathbb{R}^d}$ such that $m^d((0,1]^d) = 1$.

Proof. Let $A = J_1 \times \dots \times J_d$ with $J_i \in \mathcal{B}_{\mathbb{R}}$ and $x \in \mathbb{R}^d$. Then
\[
x + A = (x_1 + J_1) \times (x_2 + J_2) \times \dots \times (x_d + J_d)
\]
and therefore by translation invariance of $m$ on $\mathcal{B}_{\mathbb{R}}$ we find that
\[
m^d(x + A) = m(x_1 + J_1) \dots m(x_d + J_d) = m(J_1) \dots m(J_d) = m^d(A)
\]
and hence $m^d(x + A) = m^d(A)$ for all $A \in \mathcal{B}_{\mathbb{R}^d}$ since it holds for $A$ in a multiplicative system which generates $\mathcal{B}_{\mathbb{R}^d}$. From this fact we see that the measures $m^d(x + \cdot)$ and $m^d(\cdot)$ have the same null sets. Using this it is easily seen that $m(x + A) = m(A)$ for all $A \in \mathcal{L}_d$. The proof of the second assertion is Exercise 10.6.

Exercise 10.1. In this problem you are asked to show there is no reasonable notion of Lebesgue measure on an infinite dimensional Hilbert space. To be more precise, suppose $H$ is an infinite dimensional Hilbert space and $m$ is a countably additive measure on $\mathcal{B}_H$ which is invariant under translations and satisfies $m(B_0(\varepsilon)) > 0$ for all $\varepsilon > 0$. Show $m(V) = \infty$ for all non-empty open subsets $V \subset H$.

Theorem 10.20 (Change of Variables Theorem). Let $\Omega \subset_o \mathbb{R}^d$ be an open set and $T : \Omega \to T(\Omega) \subset_o \mathbb{R}^d$ be a $C^1$ - diffeomorphism,$^2$ see Figure 10.1. Then for any Borel measurable function, $f : T(\Omega) \to [0,\infty]$,
\[
\int_\Omega f(T(x)) \, |\det T'(x)| \, dx = \int_{T(\Omega)} f(y) \, dy, \tag{10.22}
\]
where $T'(x)$ is the linear transformation on $\mathbb{R}^d$ defined by $T'(x)v := \frac{d}{dt}\big|_0 T(x + tv)$. More explicitly, viewing vectors in $\mathbb{R}^d$ as columns, $T'(x)$ may be represented by the matrix
\[
T'(x) = \begin{bmatrix} \partial_1 T_1(x) & \dots & \partial_d T_1(x) \\ \vdots & \ddots & \vdots \\ \partial_1 T_d(x) & \dots & \partial_d T_d(x) \end{bmatrix}, \tag{10.23}
\]
i.e. the $i$ - $j$ matrix entry of $T'(x)$ is given by $T'(x)_{ij} = \partial_j T_i(x)$ where $T(x) = (T_1(x), \dots, T_d(x))^{\operatorname{tr}}$ and $\partial_i = \partial/\partial x_i$.

Fig. 10.1. The geometric setup of Theorem 10.20.

$^2$ That is $T : \Omega \to T(\Omega) \subset_o \mathbb{R}^d$ is a continuously differentiable bijection and the inverse map $T^{-1} : T(\Omega) \to \Omega$ is also continuously differentiable.

Remark 10.21. Theorem 10.20 is best remembered as the statement: if we make the change of variables $y = T(x)$, then $dy = |\det T'(x)| \, dx$. As usual, you must also change the limits of integration appropriately, i.e. if $x$ ranges through $\Omega$ then $y$ must range through $T(\Omega)$.

Proof. The proof will be by induction on $d$. The case $d = 1$ was essentially done in Exercise 8.7. Nevertheless, for the sake of completeness let us give a proof here. Suppose $d = 1$, $a < \alpha < \beta < b$ such that $[a,b]$ is a compact subinterval of $\Omega$. Then $|\det T'| = |T'|$ and
\[
\int_{[a,b]} 1_{T((\alpha,\beta])}(T(x)) \, |T'(x)| \, dx = \int_{[a,b]} 1_{(\alpha,\beta]}(x) \, |T'(x)| \, dx = \int_\alpha^\beta |T'(x)| \, dx.
\]
If $T'(x) > 0$ on $[a,b]$, then
\[
\int_\alpha^\beta |T'(x)| \, dx = \int_\alpha^\beta T'(x) \, dx = T(\beta) - T(\alpha) = m(T((\alpha,\beta])) = \int_{T([a,b])} 1_{T((\alpha,\beta])}(y) \, dy
\]
while if $T'(x) < 0$ on $[a,b]$, then

\[
\int_\alpha^\beta |T'(x)| \, dx = -\int_\alpha^\beta T'(x) \, dx = T(\alpha) - T(\beta) = m(T((\alpha,\beta])) = \int_{T([a,b])} 1_{T((\alpha,\beta])}(y) \, dy.
\]
Combining the previous three equations shows
\[
\int_{[a,b]} f(T(x)) \, |T'(x)| \, dx = \int_{T([a,b])} f(y) \, dy \tag{10.24}
\]
whenever $f$ is of the form $f = 1_{T((\alpha,\beta])}$ with $a < \alpha < \beta < b$. An application of Dynkin's multiplicative system Theorem 9.3 then implies that Eq. (10.24) holds for every bounded measurable function $f : T([a,b]) \to \mathbb{R}$. (Observe that $|T'(x)|$ is continuous and hence bounded for $x$ in the compact interval, $[a,b]$.)

Recall that $\Omega = \bigsqcup_{n=1}^N (a_n, b_n)$ where $a_n, b_n \in \mathbb{R} \cup \{\pm\infty\}$ for $n = 1, 2, \dots < N$ with $N = \infty$ possible. Hence if $f : T(\Omega) \to \mathbb{R}_+$ is a Borel measurable function and $a_n < \alpha_k < \beta_k < b_n$ with $\alpha_k \downarrow a_n$ and $\beta_k \uparrow b_n$, then by what we have already proved and the monotone convergence theorem
\[
\int_\Omega 1_{(a_n,b_n)} \cdot (f \circ T) \, |T'| \, dm = \int_\Omega \left( 1_{T((a_n,b_n))} f \right) \circ T \, |T'| \, dm = \lim_{k \to \infty} \int_\Omega \left( 1_{T([\alpha_k,\beta_k])} f \right) \circ T \, |T'| \, dm
\]
\[
= \lim_{k \to \infty} \int_{T(\Omega)} 1_{T([\alpha_k,\beta_k])} \cdot f \, dm = \int_{T(\Omega)} 1_{T((a_n,b_n))} \cdot f \, dm.
\]
Summing this equality on $n$ then shows Eq. (10.22) holds.

To carry out the induction step, we now suppose $d > 1$ and suppose the theorem is valid with $d$ being replaced by $d-1$. For notational compactness, let us write vectors in $\mathbb{R}^d$ as row vectors rather than column vectors. Nevertheless, the matrix associated to the differential, $T'(x)$, will always be taken to be given as in Eq. (10.23).

Case 1. Suppose $T(x)$ has the form
\[
T(x) = (x_i, T_2(x), \dots, T_d(x)) \tag{10.25}
\]
or
\[
T(x) = (T_1(x), \dots, T_{d-1}(x), x_i) \tag{10.26}
\]
for some $i \in \{1,\dots,d\}$. For definiteness we will assume $T$ is as in Eq. (10.25); the case of $T$ in Eq. (10.26) may be handled similarly. For $t \in \mathbb{R}$, let $i_t : \mathbb{R}^{d-1} \to \mathbb{R}^d$ be the inclusion map defined by
\[
i_t(w) := w_t := (w_1, \dots, w_{i-1}, t, w_i, \dots, w_{d-1}),
\]
let $\Omega_t$ be the (possibly empty) open subset of $\mathbb{R}^{d-1}$ defined by
\[
\Omega_t := \left\{ w \in \mathbb{R}^{d-1} : (w_1, \dots, w_{i-1}, t, w_i, \dots, w_{d-1}) \in \Omega \right\}
\]
and let $T_t : \Omega_t \to \mathbb{R}^{d-1}$ be defined by
\[
T_t(w) = (T_2(w_t), \dots, T_d(w_t)),
\]
see Figure 10.2. Expanding $\det T'(w_t)$ along the first row of the matrix $T'(w_t)$ shows
\[
|\det T'(w_t)| = |\det T_t'(w)|.
\]

Fig. 10.2. In this picture $d = i = 3$ and $\Omega$ is an egg-shaped region with an egg-shaped hole. The picture indicates the geometry associated with the map $T$ and slicing the set $\Omega$ along planes where $x_3 = t$.

Now by the Fubini-Tonelli Theorem and the induction hypothesis,

\[
\int_\Omega f \circ T \, |\det T'| \, dm = \int_{\mathbb{R}^d} 1_\Omega \cdot f \circ T \, |\det T'| \, dm = \int_{\mathbb{R}} \left( \int_{\mathbb{R}^{d-1}} 1_\Omega(w_t) \, (f \circ T)(w_t) \, |\det T'(w_t)| \, dw \right) dt
\]
\[
= \int_{\mathbb{R}} \left( \int_{\Omega_t} (f \circ T)(w_t) \, |\det T'(w_t)| \, dw \right) dt = \int_{\mathbb{R}} \left( \int_{\Omega_t} f(t, T_t(w)) \, |\det T_t'(w)| \, dw \right) dt
\]
\[
= \int_{\mathbb{R}} \left( \int_{T_t(\Omega_t)} f(t,z) \, dz \right) dt = \int_{\mathbb{R}} \left( \int_{\mathbb{R}^{d-1}} 1_{T(\Omega)}(t,z) \, f(t,z) \, dz \right) dt = \int_{T(\Omega)} f(y) \, dy
\]
wherein the last two equalities we have used Fubini-Tonelli along with the identity;
\[
T(\Omega) = \bigcup_{t \in \mathbb{R}} T(i_t(\Omega_t)) = \bigcup_{t \in \mathbb{R}} \{ (t,z) : z \in T_t(\Omega_t) \}.
\]

Case 2. (Eq. (10.22) is true locally.) Suppose that $T : \Omega \to \mathbb{R}^d$ is a general map as in the statement of the theorem and $x_0 \in \Omega$ is an arbitrary point. We will now show there exists an open neighborhood $W \subset \Omega$ of $x_0$ such that
\[
\int_W f \circ T \, |\det T'| \, dm = \int_{T(W)} f \, dm
\]
holds for all Borel measurable functions, $f : T(W) \to [0,\infty]$. Let $M_i$ be the $1$-$i$ minor of $T'(x_0)$, i.e. the determinant of $T'(x_0)$ with the first row and $i^{\text{th}}$ column removed. Since
\[
0 \ne \det T'(x_0) = \sum_{i=1}^d (-1)^{i+1} \, \partial_i T_1(x_0) \cdot M_i,
\]
there must be some $i$ such that $M_i \ne 0$. Fix an $i$ such that $M_i \ne 0$ and let,
\[
S(x) := (x_i, T_2(x), \dots, T_d(x)). \tag{10.27}
\]
Observe that $|\det S'(x_0)| = |M_i| \ne 0$. Hence by the inverse function Theorem, there exists an open neighborhood $W$ of $x_0$ such that $W \subset_o \Omega$ and $S(W) \subset_o \mathbb{R}^d$ and $S : W \to S(W)$ is a $C^1$ - diffeomorphism. Let $R : S(W) \to T(W) \subset_o \mathbb{R}^d$ be the $C^1$ - diffeomorphism defined by
\[
R(z) := T \circ S^{-1}(z) \text{ for all } z \in S(W).
\]
Because
\[
(T_1(x), \dots, T_d(x)) = T(x) = R(S(x)) = R((x_i, T_2(x), \dots, T_d(x)))
\]
for all $x \in W$, if
\[
(z_1, z_2, \dots, z_d) = S(x) = (x_i, T_2(x), \dots, T_d(x))
\]
then
\[
R(z) = \left( T_1(S^{-1}(z)), z_2, \dots, z_d \right). \tag{10.28}
\]
Observe that $S$ is a map of the form in Eq. (10.25), $R$ is a map of the form in Eq. (10.26), $T'(x) = R'(S(x)) S'(x)$ (by the chain rule) and (by the multiplicative property of the determinant)
\[
|\det T'(x)| = |\det R'(S(x))| \, |\det S'(x)| \quad \forall \, x \in W.
\]
So if $f : T(W) \to [0,\infty]$ is a Borel measurable function, two applications of the results in Case 1. show,
\[
\int_W f \circ T \, |\det T'| \, dm = \int_W (f \circ R \, |\det R'|) \circ S \, |\det S'| \, dm = \int_{S(W)} f \circ R \, |\det R'| \, dm = \int_{R(S(W))} f \, dm = \int_{T(W)} f \, dm
\]
and Case 2. is proved.

Case 3. (General Case.) Let $f : T(\Omega) \to [0,\infty]$ be a general non-negative Borel measurable function and let
\[
K_n := \{ x \in \Omega : \operatorname{dist}(x, \Omega^c) \ge 1/n \text{ and } |x| \le n \}.
\]
Then each $K_n$ is a compact subset of $\Omega$ and $K_n \uparrow \Omega$ as $n \to \infty$. Using the compactness of $K_n$ and case 2, for each $n \in \mathbb{N}$, there is a finite open cover $\mathcal{W}_n$ of $K_n$ such that $W \subset \Omega$ and Eq. (10.22) holds with $\Omega$ replaced by $W$ for each $W \in \mathcal{W}_n$. Let $\{W_i\}_{i=1}^\infty$ be an enumeration of $\bigcup_{n=1}^\infty \mathcal{W}_n$ and set $\tilde{W}_1 = W_1$ and $\tilde{W}_i := W_i \setminus (W_1 \cup \dots \cup W_{i-1})$ for all $i \ge 2$. Then $\Omega = \bigsqcup_{i=1}^\infty \tilde{W}_i$ and by repeated use of case 2.,

\[
\int_\Omega f \circ T \, |\det T'| \, dm = \sum_{i=1}^\infty \int_\Omega 1_{\tilde{W}_i} \cdot (f \circ T) \, |\det T'| \, dm = \sum_{i=1}^\infty \int_{W_i} \left( 1_{T(\tilde{W}_i)} f \right) \circ T \, |\det T'| \, dm
\]
\[
= \sum_{i=1}^\infty \int_{T(W_i)} 1_{T(\tilde{W}_i)} \, f \, dm = \sum_{i=1}^\infty \int_{T(\Omega)} 1_{T(\tilde{W}_i)} \, f \, dm = \int_{T(\Omega)} f \, dm.
\]

Remark 10.22. When $d = 1$, one often learns the change of variables formula as
\[
\int_a^b f(T(x)) \, T'(x) \, dx = \int_{T(a)}^{T(b)} f(y) \, dy \tag{10.29}
\]
where $f : [a,b] \to \mathbb{R}$ is a continuous function and $T$ is a $C^1$ - function defined in a neighborhood of $[a,b]$. If $T' > 0$ on $(a,b)$ then $T((a,b)) = (T(a), T(b))$ and Eq. (10.29) implies Eq. (10.22) with $\Omega = (a,b)$. On the other hand if $T' < 0$ on $(a,b)$ then $T((a,b)) = (T(b), T(a))$ and Eq. (10.29) is equivalent to
\[
-\int_{(a,b)} f(T(x)) \, |T'(x)| \, dx = \int_{T(a)}^{T(b)} f(y) \, dy = -\int_{T((a,b))} f(y) \, dy,
\]
which again implies Eq. (10.22). On the other hand Eq. (10.29) is more general than Eq. (10.22) since it does not require $T$ to be injective.

The standard proof of Eq. (10.29) is as follows. For $z \in T([a,b])$, let
\[
F(z) := \int_{T(a)}^z f(y) \, dy.
\]
Then by the chain rule and the fundamental theorem of calculus,
\[
\int_a^b f(T(x)) \, T'(x) \, dx = \int_a^b F'(T(x)) \, T'(x) \, dx = \int_a^b \frac{d}{dx} \left[ F(T(x)) \right] dx = F(T(x)) \big|_a^b = \int_{T(a)}^{T(b)} f(y) \, dy.
\]
An application of Dynkin's multiplicative systems theorem now shows that Eq. (10.29) holds for all bounded measurable functions $f$ on $(a,b)$. Then by the usual truncation argument, it also holds for all positive measurable functions on $(a,b)$.

Example 10.23. Continuing the setup in Theorem 10.20, if $A \in \mathcal{B}_\Omega$, then
\[
m(T(A)) = \int_{\mathbb{R}^d} 1_{T(A)}(y) \, dy = \int_{\mathbb{R}^d} 1_{T(A)}(Tx) \, |\det T'(x)| \, dx = \int_{\mathbb{R}^d} 1_A(x) \, |\det T'(x)| \, dx
\]
wherein the second equality we have made the change of variables, $y = T(x)$. Hence we have shown
\[
d(m \circ T) = |\det T'(\cdot)| \, dm.
\]
In particular if $T \in GL(d,\mathbb{R}) = GL(\mathbb{R}^d)$, the space of $d \times d$ invertible matrices, then $m \circ T = |\det T| \, m$, i.e.
\[
m(T(A)) = |\det T| \, m(A) \text{ for all } A \in \mathcal{B}_{\mathbb{R}^d}. \tag{10.30}
\]
This equation also shows that $m \circ T$ and $m$ have the same null sets and hence the equality in Eq. (10.30) is valid for any $A \in \mathcal{L}_d$.

Exercise 10.2. Show that $f \in L^1\left( T(\Omega), m^d \right)$ iff
\[
\int_\Omega |f \circ T| \, |\det T'| \, dm < \infty
\]
and if $f \in L^1\left( T(\Omega), m^d \right)$, then Eq. (10.22) holds.

Example 10.24 (Polar Coordinates). Suppose $T : (0,\infty) \times (0,2\pi) \to \mathbb{R}^2$ is defined by
\[
x = T(r,\theta) = (r \cos\theta, r \sin\theta),
\]
i.e. we are making the change of variable, $x_1 = r \cos\theta$ and $x_2 = r \sin\theta$ for $0 < r < \infty$ and $0 < \theta < 2\pi$. In this case
\[
T'(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}
\]
and therefore
\[
dx = |\det T'(r,\theta)| \, dr \, d\theta = r \, dr \, d\theta.
\]
Observing that
\[
\mathbb{R}^2 \setminus T((0,\infty) \times (0,2\pi)) = \ell := \{ (x,0) : x \ge 0 \}
\]
has $m^2$ - measure zero, it follows from the change of variables Theorem 10.20 that
\[
\int_{\mathbb{R}^2} f(x) \, dx = \int_0^{2\pi} d\theta \int_0^\infty dr \, r \, f(r(\cos\theta, \sin\theta)) \tag{10.31}
\]
for any Borel measurable function $f : \mathbb{R}^2 \to [0,\infty]$.
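(Aside, not part of the notes: a one-dimensional numerical sketch of Eq. (10.22), assuming SciPy. The map $T(x) = x^3 + x$ is just an illustrative choice of $C^1$ - diffeomorphism of $(0,1)$ onto $(0,2)$.)

```python
import numpy as np
from scipy.integrate import quad

T = lambda x: x**3 + x          # illustrative diffeomorphism (0,1) -> (0,2)
dT = lambda x: 3 * x**2 + 1     # T' > 0, so |det T'| = T'
f = lambda y: np.cos(y)

lhs, _ = quad(lambda x: f(T(x)) * abs(dT(x)), 0, 1)  # left side of (10.22)
rhs, _ = quad(f, 0, 2)                               # right side of (10.22)
print(lhs, rhs)  # both equal sin(2) ~ 0.909297...
```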

Example 10.25 (Holomorphic Change of Variables). Suppose that $f : \Omega \subset_o \mathbb{C} \cong \mathbb{R}^2 \to \mathbb{C}$ is an injective holomorphic function such that $f'(z) \ne 0$ for all $z \in \Omega$. We may express $f$ as
\[
f(x + iy) = U(x,y) + iV(x,y)
\]
for all $z = x + iy \in \Omega$. Hence if we make the change of variables,
\[
w = u + iv = f(x + iy) = U(x,y) + iV(x,y)
\]
then
\[
du \, dv = \left| \det \begin{bmatrix} U_x & U_y \\ V_x & V_y \end{bmatrix} \right| dx \, dy = |U_x V_y - U_y V_x| \, dx \, dy.
\]
Recalling that $U$ and $V$ satisfy the Cauchy Riemann equations, $U_x = V_y$ and $U_y = -V_x$ with $f' = U_x + iV_x$, we learn
\[
U_x V_y - U_y V_x = U_x^2 + V_x^2 = |f'|^2.
\]
Therefore
\[
du \, dv = |f'(x + iy)|^2 \, dx \, dy.
\]

Example 10.26. In this example we will evaluate the integral
\[
I := \iint_\Omega \left( x^4 - y^4 \right) dx \, dy
\]
where
\[
\Omega = \left\{ (x,y) : 1 < x^2 - y^2 < 2, \ 0 < xy < 1 \right\},
\]
see Figure 10.3. We are going to do this by making the change of variables,
\[
(u,v) := T(x,y) = \left( x^2 - y^2, xy \right),
\]
in which case
\[
du \, dv = \left| \det \begin{bmatrix} 2x & -2y \\ y & x \end{bmatrix} \right| dx \, dy = 2 \left( x^2 + y^2 \right) dx \, dy.
\]

Fig. 10.3. The region $\Omega$ consists of the two curved rectangular regions shown.

Notice that
\[
\left( x^4 - y^4 \right) dx \, dy = \left( x^2 - y^2 \right) \left( x^2 + y^2 \right) dx \, dy = u \left( x^2 + y^2 \right) dx \, dy = \frac{1}{2} u \, du \, dv.
\]
The function $T$ is not injective on $\Omega$ but it is injective on each of its connected components. Let $D$ be the connected component in the first quadrant so that $\Omega = -D \cup D$ and $T(\pm D) = (1,2) \times (0,1)$. The change of variables theorem then implies
\[
I_\pm := \iint_{\pm D} \left( x^4 - y^4 \right) dx \, dy = \frac{1}{2} \iint_{(1,2) \times (0,1)} u \, du \, dv = \frac{1}{2} \, \frac{u^2}{2} \Big|_1^2 \cdot 1 = \frac{3}{4}
\]
and therefore $I = I_+ + I_- = 2 \, (3/4) = 3/2$.
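(Aside, not part of the notes: a Monte Carlo sketch of Example 10.26, assuming NumPy. The bounding box $[-1.6, 1.6]^2$ contains $\Omega$ since on $\Omega$ one has $x^2 \le 2 + y^2$ with $|y| \le 1$, and the seed is arbitrary.)

```python
import numpy as np

# Monte Carlo estimate of the integral of x^4 - y^4 over the region
# Omega = {1 < x^2 - y^2 < 2, 0 < xy < 1}; expected value is 3/2.
rng = np.random.default_rng(1)
n = 2_000_000
x = rng.uniform(-1.6, 1.6, n)
y = rng.uniform(-1.6, 1.6, n)
inside = (1 < x**2 - y**2) & (x**2 - y**2 < 2) & (0 < x * y) & (x * y < 1)
box_area = 3.2 * 3.2
I = box_area * np.mean(np.where(inside, x**4 - y**4, 0.0))
print(I)  # ~ 1.5
```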

Exercise 10.3 (Spherical Coordinates). Let $T : (0,\infty) \times (0,\pi) \times (0,2\pi) \to \mathbb{R}^3$ be defined by
\[
T(r,\varphi,\theta) = (r \sin\varphi \cos\theta, \ r \sin\varphi \sin\theta, \ r \cos\varphi) = r \, (\sin\varphi \cos\theta, \ \sin\varphi \sin\theta, \ \cos\varphi),
\]
see Figure 10.4. By making the change of variables $x = T(r,\varphi,\theta)$, show
\[
\int_{\mathbb{R}^3} f(x) \, dx = \int_0^{2\pi} d\theta \int_0^\pi d\varphi \int_0^\infty dr \, r^2 \sin\varphi \cdot f(T(r,\varphi,\theta))
\]
for any Borel measurable function, $f : \mathbb{R}^3 \to [0,\infty]$.

Fig. 10.4. The relation of $x$ to $(r,\varphi,\theta)$ in spherical coordinates.

Lemma 10.27. Let $a > 0$ and
\[
I_d(a) := \int_{\mathbb{R}^d} e^{-a|x|^2} \, dm(x).
\]
Then $I_d(a) = (\pi/a)^{d/2}$.

Proof. By Tonelli's theorem and induction,
\[
I_d(a) = \int_{\mathbb{R}^{d-1} \times \mathbb{R}} e^{-a|y|^2} e^{-at^2} \, m_{d-1}(dy) \, dt = I_{d-1}(a) \, I_1(a) = I_1^d(a). \tag{10.32}
\]
So it suffices to compute:
\[
I_2(a) = \int_{\mathbb{R}^2} e^{-a|x|^2} \, dm(x) = \int_{\mathbb{R}^2 \setminus \{0\}} e^{-a(x_1^2 + x_2^2)} \, dx_1 \, dx_2.
\]
Using polar coordinates, see Eq. (10.31), we find,
\[
I_2(a) = \int_0^\infty dr \, r \int_0^{2\pi} d\theta \, e^{-ar^2} = 2\pi \int_0^\infty r e^{-ar^2} \, dr = 2\pi \lim_{M \to \infty} \int_0^M r e^{-ar^2} \, dr = 2\pi \lim_{M \to \infty} \frac{-e^{-ar^2}}{2a} \Big|_0^M = \frac{2\pi}{2a} = \pi/a.
\]
This shows that $I_2(a) = \pi/a$ and the result now follows from Eq. (10.32).

10.6 The Polar Decomposition of Lebesgue Measure

Let
\[
S^{d-1} = \Big\{ x \in \mathbb{R}^d : |x| := \Big( \sum_{i=1}^d x_i^2 \Big)^{1/2} = 1 \Big\}
\]
be the unit sphere in $\mathbb{R}^d$ equipped with its Borel $\sigma$ - algebra, $\mathcal{B}_{S^{d-1}}$, and let $\Phi : \mathbb{R}^d \setminus \{0\} \to (0,\infty) \times S^{d-1}$ be defined by $\Phi(x) := \left( |x|, |x|^{-1} x \right)$. The inverse map, $\Phi^{-1} : (0,\infty) \times S^{d-1} \to \mathbb{R}^d \setminus \{0\}$, is given by $\Phi^{-1}(r,\omega) = r\omega$. Since $\Phi$ and $\Phi^{-1}$ are continuous, they are both Borel measurable. For $E \in \mathcal{B}_{S^{d-1}}$ and $a > 0$, let
\[
E_a := \{ r\omega : r \in (0,a] \text{ and } \omega \in E \} = \Phi^{-1}((0,a] \times E) \in \mathcal{B}_{\mathbb{R}^d}.
\]

Definition 10.28. For $E \in \mathcal{B}_{S^{d-1}}$, let $\sigma(E) := d \cdot m(E_1)$. We call $\sigma$ the surface measure on $S^{d-1}$.

It is easy to check that $\sigma$ is a measure. Indeed if $E \in \mathcal{B}_{S^{d-1}}$, then $E_1 = \Phi^{-1}((0,1] \times E) \in \mathcal{B}_{\mathbb{R}^d}$ so that $m(E_1)$ is well defined. Moreover if $E = \bigsqcup_{i=1}^\infty E_i$, then $E_1 = \bigsqcup_{i=1}^\infty (E_i)_1$ and
\[
\sigma(E) = d \cdot m(E_1) = d \sum_{i=1}^\infty m((E_i)_1) = \sum_{i=1}^\infty \sigma(E_i).
\]
The intuition behind this definition is as follows. If $E \subset S^{d-1}$ is a set and $\varepsilon > 0$ is a small number, then the volume of
\[
(1, 1+\varepsilon] \cdot E = \{ r\omega : r \in (1, 1+\varepsilon] \text{ and } \omega \in E \}
\]
should be approximately given by $m((1,1+\varepsilon] \cdot E) \cong \sigma(E) \varepsilon$, see Figure 10.5 below.

Fig. 10.5. Motivating the definition of surface measure for a sphere.

On the other hand
\[
m((1,1+\varepsilon] E) = m(E_{1+\varepsilon} \setminus E_1) = \left( (1+\varepsilon)^d - 1 \right) m(E_1).
\]
Therefore we expect the area of $E$ should be given by
\[
\sigma(E) = \lim_{\varepsilon \downarrow 0} \frac{\left( (1+\varepsilon)^d - 1 \right) m(E_1)}{\varepsilon} = d \cdot m(E_1).
\]
The following theorem is motivated by Example 10.24 and Exercise 10.3.

Theorem 10.29 (Polar Coordinates). If $f : \mathbb{R}^d \to [0,\infty]$ is a $(\mathcal{B}_{\mathbb{R}^d}, \mathcal{B})$ - measurable function then

\[
\int_{\mathbb{R}^d} f(x) \, dm(x) = \int_{(0,\infty) \times S^{d-1}} f(r\omega) \, r^{d-1} \, dr \, d\sigma(\omega). \tag{10.33}
\]
In particular if $f : \mathbb{R}_+ \to \mathbb{R}_+$ is measurable then
\[
\int_{\mathbb{R}^d} f(|x|) \, dx = \int_0^\infty f(r) \, dV(r) \tag{10.34}
\]
where $V(r) = m(B(0,r)) = r^d \, m(B(0,1)) = d^{-1} \sigma(S^{d-1}) \, r^d$.

Proof. By Exercise 8.6,
\[
\int_{\mathbb{R}^d} f \, dm = \int_{\mathbb{R}^d \setminus \{0\}} \left( f \circ \Phi^{-1} \right) \circ \Phi \, dm = \int_{(0,\infty) \times S^{d-1}} \left( f \circ \Phi^{-1} \right) d(\Phi_* m) \tag{10.35}
\]
and therefore to prove Eq. (10.33) we must work out the measure $\Phi_* m$ on $\mathcal{B}_{(0,\infty)} \otimes \mathcal{B}_{S^{d-1}}$ defined by
\[
\Phi_* m(A) := m\left( \Phi^{-1}(A) \right) \quad \forall \, A \in \mathcal{B}_{(0,\infty)} \otimes \mathcal{B}_{S^{d-1}}. \tag{10.36}
\]
If $A = (a,b] \times E$ with $0 < a < b$ and $E \in \mathcal{B}_{S^{d-1}}$, then
\[
\Phi^{-1}(A) = \{ r\omega : r \in (a,b] \text{ and } \omega \in E \} = bE_1 \setminus aE_1
\]
wherein we have used $E_a = aE_1$ in the last equality. Therefore by the basic scaling properties of $m$ and the fundamental theorem of calculus,
\[
(\Phi_* m)((a,b] \times E) = m(bE_1 \setminus aE_1) = m(bE_1) - m(aE_1) = b^d m(E_1) - a^d m(E_1) = d \cdot m(E_1) \int_a^b r^{d-1} \, dr. \tag{10.37}
\]
Letting $d\rho(r) = r^{d-1} \, dr$, i.e.
\[
\rho(J) = \int_J r^{d-1} \, dr \quad \forall \, J \in \mathcal{B}_{(0,\infty)}, \tag{10.38}
\]
Eq. (10.37) may be written as
\[
(\Phi_* m)((a,b] \times E) = \rho((a,b]) \cdot \sigma(E) = (\rho \otimes \sigma)((a,b] \times E). \tag{10.39}
\]
Since
\[
\mathcal{E} = \left\{ (a,b] \times E : 0 < a < b \text{ and } E \in \mathcal{B}_{S^{d-1}} \right\}
\]
is a $\pi$ - class (in fact it is an elementary class) such that $\sigma(\mathcal{E}) = \mathcal{B}_{(0,\infty)} \otimes \mathcal{B}_{S^{d-1}}$, it follows from the $\pi$ - $\lambda$ theorem and Eq. (10.39) that $\Phi_* m = \rho \otimes \sigma$. Using this result in Eq. (10.35) gives
\[
\int_{\mathbb{R}^d} f \, dm = \int_{(0,\infty) \times S^{d-1}} \left( f \circ \Phi^{-1} \right) d(\rho \otimes \sigma)
\]
which combined with Tonelli's Theorem 10.6 proves Eq. (10.33).

Corollary 10.30. The surface area $\sigma(S^{d-1})$ of the unit sphere $S^{d-1} \subset \mathbb{R}^d$ is
\[
\sigma(S^{d-1}) = \frac{2\pi^{d/2}}{\Gamma(d/2)} \tag{10.40}
\]
where $\Gamma$ is the gamma function given by
\[
\Gamma(x) := \int_0^\infty u^{x-1} e^{-u} \, du. \tag{10.41}
\]
Moreover, $\Gamma(1/2) = \sqrt{\pi}$, $\Gamma(1) = 1$ and $\Gamma(x+1) = x\Gamma(x)$ for $x > 0$.

Proof. Using Theorem 10.29 we find
\[
I_d(1) = \int_0^\infty dr \, r^{d-1} e^{-r^2} \int_{S^{d-1}} d\sigma = \sigma(S^{d-1}) \int_0^\infty r^{d-1} e^{-r^2} \, dr.
\]
We simplify this last integral by making the change of variables $u = r^2$ so that $r = u^{1/2}$ and $dr = \frac{1}{2} u^{-1/2} \, du$. The result is
\[
\int_0^\infty r^{d-1} e^{-r^2} \, dr = \int_0^\infty u^{\frac{d-1}{2}} e^{-u} \, \frac{1}{2} u^{-1/2} \, du = \frac{1}{2} \int_0^\infty u^{\frac{d}{2}-1} e^{-u} \, du = \frac{1}{2} \Gamma(d/2). \tag{10.42}
\]
Combining the last two equations with Lemma 10.27, which states that $I_d(1) = \pi^{d/2}$, we conclude that
\[
\pi^{d/2} = I_d(1) = \frac{1}{2} \sigma(S^{d-1}) \, \Gamma(d/2)
\]
which proves Eq. (10.40). Example 8.8 implies $\Gamma(1) = 1$ and from Eq. (10.42),
\[
\Gamma(1/2) = 2 \int_0^\infty e^{-r^2} \, dr = \int_{-\infty}^\infty e^{-r^2} \, dr = I_1(1) = \sqrt{\pi}.
\]
The relation, $\Gamma(x+1) = x\Gamma(x)$ is the consequence of the following integration by parts argument:
\[
\Gamma(x+1) = \int_0^\infty e^{-u} u^{x+1} \, \frac{du}{u} = \int_0^\infty u^x \left( -\frac{d}{du} e^{-u} \right) du = x \int_0^\infty u^{x-1} e^{-u} \, du = x \, \Gamma(x).
\]
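(Aside, not part of the notes: Eq. (10.40) can be spot-checked against the familiar low-dimensional surface areas using only the standard library.)

```python
import math

# Eq. (10.40): sigma(S^{d-1}) = 2 * pi^(d/2) / Gamma(d/2) for S^{d-1} in R^d.
area = lambda d: 2 * math.pi ** (d / 2) / math.gamma(d / 2)

print(area(2), 2 * math.pi)      # circle circumference: 2*pi
print(area(3), 4 * math.pi)      # sphere area: 4*pi
print(area(4), 2 * math.pi**2)   # 3-sphere area: 2*pi^2
```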

10.7 More Spherical Coordinates

In this section we will define spherical coordinates in all dimensions. Along the way we will develop an explicit method for computing surface integrals on spheres. As usual when $n = 2$ define spherical coordinates $(r,\theta) \in (0,\infty) \times [0,2\pi)$ so that
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} r \cos\theta \\ r \sin\theta \end{bmatrix} = T_2(\theta, r).
\]
For $n = 3$ we let $x_3 = r \cos\varphi_1$ and then
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = T_2(\theta, r \sin\varphi_1),
\]
as can be seen from Figure 10.6, so that

Fig. 10.6. Setting up polar coordinates in two and three dimensions.

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} T_2(\theta, r \sin\varphi_1) \\ r \cos\varphi_1 \end{bmatrix} = \begin{bmatrix} r \sin\varphi_1 \cos\theta \\ r \sin\varphi_1 \sin\theta \\ r \cos\varphi_1 \end{bmatrix} =: T_3(\theta, \varphi_1, r).
\]
We continue to work inductively this way to define
\[
\begin{bmatrix} x_1 \\ \vdots \\ x_n \\ x_{n+1} \end{bmatrix} = \begin{bmatrix} T_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r \sin\varphi_{n-1}) \\ r \cos\varphi_{n-1} \end{bmatrix} =: T_{n+1}(\theta, \varphi_1, \dots, \varphi_{n-2}, \varphi_{n-1}, r).
\]
So for example,
\[
x_1 = r \sin\varphi_2 \sin\varphi_1 \cos\theta, \quad x_2 = r \sin\varphi_2 \sin\varphi_1 \sin\theta, \quad x_3 = r \sin\varphi_2 \cos\varphi_1, \quad x_4 = r \cos\varphi_2
\]
and more generally,
\[
\begin{aligned}
x_1 &= r \sin\varphi_{n-2} \dots \sin\varphi_2 \sin\varphi_1 \cos\theta \\
x_2 &= r \sin\varphi_{n-2} \dots \sin\varphi_2 \sin\varphi_1 \sin\theta \\
x_3 &= r \sin\varphi_{n-2} \dots \sin\varphi_2 \cos\varphi_1 \\
&\ \, \vdots \\
x_{n-2} &= r \sin\varphi_{n-2} \sin\varphi_{n-3} \cos\varphi_{n-4} \\
x_{n-1} &= r \sin\varphi_{n-2} \cos\varphi_{n-3} \\
x_n &= r \cos\varphi_{n-2}.
\end{aligned} \tag{10.43}
\]
By the change of variables formula,
\[
\int_{\mathbb{R}^n} f(x) \, dm(x) = \int_0^\infty dr \int_{0 \le \varphi_i \le \pi, \, 0 \le \theta \le 2\pi} d\varphi_1 \dots d\varphi_{n-2} \, d\theta \, \Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r) \, f(T_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r)) \tag{10.44}
\]
where $\Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r) := |\det T_n'(\theta, \varphi_1, \dots, \varphi_{n-2}, r)|$.

Proposition 10.31. The Jacobian, $\Delta_n$, is given by
\[
\Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r) = r^{n-1} \sin^{n-2}\varphi_{n-2} \dots \sin^2\varphi_2 \, \sin\varphi_1. \tag{10.45}
\]
If $f$ is a function on $rS^{n-1}$, the sphere of radius $r$ centered at $0$ inside of $\mathbb{R}^n$, then
\[
\int_{rS^{n-1}} f(x) \, d\sigma(x) = r^{n-1} \int_{S^{n-1}} f(r\omega) \, d\sigma(\omega) = \int_{0 \le \varphi_i \le \pi, \, 0 \le \theta \le 2\pi} f(T_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r)) \, \Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r) \, d\varphi_1 \dots d\varphi_{n-2} \, d\theta. \tag{10.46}
\]

Proof. We are going to compute $\Delta_n$ inductively. Letting $\rho := r \sin\varphi_{n-1}$ and writing $\frac{\partial T_n}{\partial \xi}$ for $\frac{\partial T_n}{\partial \xi}(\theta, \varphi_1, \dots, \varphi_{n-2}, \rho)$ we have
\[
\Delta_{n+1}(\theta, \varphi_1, \dots, \varphi_{n-2}, \varphi_{n-1}, r) = \left| \det \begin{bmatrix} \frac{\partial T_n}{\partial \theta} & \frac{\partial T_n}{\partial \varphi_1} & \dots & \frac{\partial T_n}{\partial \varphi_{n-2}} & \frac{\partial T_n}{\partial \rho} \, r\cos\varphi_{n-1} & \frac{\partial T_n}{\partial \rho} \sin\varphi_{n-1} \\ 0 & 0 & \dots & 0 & -r \sin\varphi_{n-1} & \cos\varphi_{n-1} \end{bmatrix} \right|
\]
\[
= r \left( \cos^2\varphi_{n-1} + \sin^2\varphi_{n-1} \right) \Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, \rho) = r \, \Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r \sin\varphi_{n-1}),
\]

i.e.
\[
\Delta_{n+1}(\theta, \varphi_1, \dots, \varphi_{n-2}, \varphi_{n-1}, r) = r \, \Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r \sin\varphi_{n-1}). \tag{10.47}
\]
To arrive at this result we have expanded the determinant along the bottom row. Starting with $\Delta_2(\theta, r) = r$ already derived in Example 10.24, Eq. (10.47) implies,
\[
\begin{aligned}
\Delta_3(\theta, \varphi_1, r) &= r \, \Delta_2(\theta, r \sin\varphi_1) = r^2 \sin\varphi_1 \\
\Delta_4(\theta, \varphi_1, \varphi_2, r) &= r \, \Delta_3(\theta, \varphi_1, r \sin\varphi_2) = r^3 \sin^2\varphi_2 \sin\varphi_1 \\
&\ \, \vdots \\
\Delta_n(\theta, \varphi_1, \dots, \varphi_{n-2}, r) &= r^{n-1} \sin^{n-2}\varphi_{n-2} \dots \sin^2\varphi_2 \sin\varphi_1
\end{aligned}
\]
which proves Eq. (10.45). Equation (10.46) now follows from Eqs. (10.33), (10.44) and (10.45).

As a simple application, Eq. (10.46) implies
\[
\sigma(S^{n-1}) = \int_{0 \le \varphi_i \le \pi, \, 0 \le \theta \le 2\pi} \sin^{n-2}\varphi_{n-2} \dots \sin^2\varphi_2 \sin\varphi_1 \, d\varphi_1 \dots d\varphi_{n-2} \, d\theta = 2\pi \prod_{k=1}^{n-2} \gamma_k = \sigma(S^{n-2}) \, \gamma_{n-2} \tag{10.48}
\]
where $\gamma_k := \int_0^\pi \sin^k\varphi \, d\varphi$. If $k \ge 1$, we have by integration by parts that,
\[
\gamma_k = \int_0^\pi \sin^k\varphi \, d\varphi = -\int_0^\pi \sin^{k-1}\varphi \, d\cos\varphi = 2\delta_{k,1} + (k-1) \int_0^\pi \sin^{k-2}\varphi \, \cos^2\varphi \, d\varphi
\]
\[
= 2\delta_{k,1} + (k-1) \int_0^\pi \sin^{k-2}\varphi \left( 1 - \sin^2\varphi \right) d\varphi = 2\delta_{k,1} + (k-1) \left[ \gamma_{k-2} - \gamma_k \right]
\]
and hence $\gamma_k$ satisfies $\gamma_0 = \pi$, $\gamma_1 = 2$ and the recursion relation
\[
\gamma_k = \frac{k-1}{k} \, \gamma_{k-2} \text{ for } k \ge 2.
\]
Hence we may conclude
\[
\gamma_0 = \pi, \ \gamma_1 = 2, \ \gamma_2 = \frac{1}{2}\pi, \ \gamma_3 = \frac{2}{3} \, 2, \ \gamma_4 = \frac{3}{4} \frac{1}{2} \pi, \ \gamma_5 = \frac{4}{5} \frac{2}{3} \, 2, \ \gamma_6 = \frac{5}{6} \frac{3}{4} \frac{1}{2} \pi
\]
and more generally by induction that
\[
\gamma_{2k} = \pi \, \frac{(2k-1)!!}{(2k)!!} \quad \text{and} \quad \gamma_{2k+1} = 2 \, \frac{(2k)!!}{(2k+1)!!}.
\]
Indeed,
\[
\gamma_{2(k+1)+1} = \frac{2k+2}{2k+3} \, \gamma_{2k+1} = \frac{2k+2}{2k+3} \, 2 \, \frac{(2k)!!}{(2k+1)!!} = 2 \, \frac{[2(k+1)]!!}{(2(k+1)+1)!!}
\]
and
\[
\gamma_{2(k+1)} = \frac{2k+1}{2k+2} \, \gamma_{2k} = \frac{2k+1}{2k+2} \, \pi \, \frac{(2k-1)!!}{(2k)!!} = \pi \, \frac{(2k+1)!!}{(2k+2)!!}.
\]
The recursion relation in Eq. (10.48) may be written as
\[
\sigma(S^n) = \sigma(S^{n-1}) \, \gamma_{n-1} \tag{10.49}
\]
which combined with $\sigma(S^1) = 2\pi$ implies
\[
\begin{aligned}
\sigma(S^1) &= 2\pi, \\
\sigma(S^2) &= 2\pi \cdot \gamma_1 = 2\pi \cdot 2, \\
\sigma(S^3) &= 2\pi \cdot 2 \cdot \gamma_2 = 2\pi \cdot 2 \cdot \tfrac{1}{2}\pi = \frac{(2\pi)^2}{2!!}, \\
\sigma(S^4) &= \frac{(2\pi)^2}{2!!} \cdot \gamma_3 = \frac{(2\pi)^2}{2!!} \cdot 2 \cdot \tfrac{2}{3} = \frac{2^3 \pi^2}{3!!}, \\
\sigma(S^5) &= \frac{2^3 \pi^2}{3!!} \cdot \gamma_4 = \frac{2^3 \pi^2}{3!!} \cdot \tfrac{3}{4} \tfrac{1}{2} \pi = \frac{(2\pi)^3}{4!!}, \\
\sigma(S^6) &= \frac{(2\pi)^3}{4!!} \cdot \gamma_5 = \frac{(2\pi)^3}{4!!} \cdot \tfrac{4}{5} \tfrac{2}{3} \cdot 2 = \frac{2^4 \pi^3}{5!!}
\end{aligned}
\]
and more generally that
\[
\sigma(S^{2n}) = \frac{2 \, (2\pi)^n}{(2n-1)!!} \quad \text{and} \quad \sigma(S^{2n+1}) = \frac{(2\pi)^{n+1}}{(2n)!!} \tag{10.50}
\]
which is verified inductively using Eq. (10.49). Indeed,
\[
\sigma(S^{2n+1}) = \sigma(S^{2n}) \, \gamma_{2n} = \frac{2 \, (2\pi)^n}{(2n-1)!!} \, \pi \, \frac{(2n-1)!!}{(2n)!!} = \frac{(2\pi)^{n+1}}{(2n)!!}
\]
and
\[
\sigma(S^{2(n+1)}) = \sigma(S^{2n+2}) = \sigma(S^{2n+1}) \, \gamma_{2n+1} = \frac{(2\pi)^{n+1}}{(2n)!!} \, 2 \, \frac{(2n)!!}{(2n+1)!!} = \frac{2 \, (2\pi)^{n+1}}{(2n+1)!!}.
\]
Using
\[
(2n)!! = 2n \, (2(n-1)) \dots (2 \cdot 1) = 2^n n!
\]
we may write $\sigma(S^{2n+1}) = \frac{2\pi^{n+1}}{n!}$, which shows that Eqs. (10.40) and (10.50) are in agreement. We may also write the formula in Eq. (10.50) as
\[
\sigma(S^n) = \begin{cases} \dfrac{2 \, (2\pi)^{n/2}}{(n-1)!!} & \text{for } n \text{ even} \\[2mm] \dfrac{(2\pi)^{(n+1)/2}}{(n-1)!!} & \text{for } n \text{ odd.} \end{cases}
\]
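(Aside, not part of the notes: the double-factorial formulas in Eq. (10.50) can be checked against the Gamma-function formula (10.40); a small standard-library sketch, where `dfact` is a hypothetical helper.)

```python
import math

def dfact(k):  # double factorial, with the convention (-1)!! = 0!! = 1
    return 1 if k <= 0 else k * dfact(k - 2)

for n in range(1, 8):  # compare sigma(S^n) computed two ways
    via_df = (2 * (2 * math.pi) ** (n // 2) / dfact(n - 1) if n % 2 == 0
              else (2 * math.pi) ** ((n + 1) // 2) / dfact(n - 1))
    via_gamma = 2 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2)
    print(n, via_df, via_gamma)  # the two columns agree
```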

10.8 Exercises

Exercise 10.4. Prove Theorem 10.9. Suggestion, to get started define
\[
\pi(A) := \int_{X_1} d\mu_1(x_1) \dots \int_{X_n} d\mu_n(x_n) \, 1_A(x_1, \dots, x_n)
\]
and then show Eq. (10.16) holds. Use the case of two factors as the model of your proof.

Exercise 10.5. Let $(X_j, \mathcal{M}_j, \mu_j)$ for $j = 1,2,3$ be $\sigma$ - finite measure spaces. Let $F : (X_1 \times X_2) \times X_3 \to X_1 \times X_2 \times X_3$ be defined by
\[
F((x_1, x_2), x_3) = (x_1, x_2, x_3).
\]
1. Show $F$ is $((\mathcal{M}_1 \otimes \mathcal{M}_2) \otimes \mathcal{M}_3, \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3)$ - measurable and $F^{-1}$ is $(\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3, (\mathcal{M}_1 \otimes \mathcal{M}_2) \otimes \mathcal{M}_3)$ - measurable. That is $F : ((X_1 \times X_2) \times X_3, (\mathcal{M}_1 \otimes \mathcal{M}_2) \otimes \mathcal{M}_3) \to (X_1 \times X_2 \times X_3, \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3)$ is a "measure theoretic isomorphism."
2. Let $\pi := F_* [(\mu_1 \otimes \mu_2) \otimes \mu_3]$, i.e. $\pi(A) = [(\mu_1 \otimes \mu_2) \otimes \mu_3](F^{-1}(A))$ for all $A \in \mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3$. Then $\pi$ is the unique measure on $\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3$ such that $\pi(A_1 \times A_2 \times A_3) = \mu_1(A_1) \mu_2(A_2) \mu_3(A_3)$ for all $A_i \in \mathcal{M}_i$. We will write $\pi := \mu_1 \otimes \mu_2 \otimes \mu_3$.
3. Let $f : X_1 \times X_2 \times X_3 \to [0,\infty]$ be a $(\mathcal{M}_1 \otimes \mathcal{M}_2 \otimes \mathcal{M}_3, \mathcal{B}_{\bar{\mathbb{R}}})$ - measurable function. Verify the identity,
\[
\int_{X_1 \times X_2 \times X_3} f \, d\pi = \int_{X_3} d\mu_3(x_3) \int_{X_2} d\mu_2(x_2) \int_{X_1} d\mu_1(x_1) \, f(x_1, x_2, x_3),
\]
makes sense and is correct.
4. (Optional.) Also show the above identity holds for any one of the six possible orderings of the iterated integrals.

Exercise 10.6. Prove the second assertion of Theorem 10.19. That is show $m^d$ is the unique translation invariant measure on $\mathcal{B}_{\mathbb{R}^d}$ such that $m^d((0,1]^d) = 1$. Hint: Look at the proof of Theorem 5.22.

Exercise 10.7. (Part of Folland Problem 2.46 on p. 69.) Let $X = [0,1]$, $\mathcal{M} = \mathcal{B}_{[0,1]}$ be the Borel $\sigma$ - field on $X$, $m$ be Lebesgue measure on $[0,1]$ and $\nu$ be counting measure, $\nu(A) = \#(A)$. Finally let $D = \{(x,x) \in X^2 : x \in X\}$ be the diagonal in $X^2$. Show
\[
\int_X \left( \int_X 1_D(x,y) \, d\nu(y) \right) dm(x) \ne \int_X \left( \int_X 1_D(x,y) \, dm(x) \right) d\nu(y)
\]
by explicitly computing both sides of this equation.

Exercise 10.8. Folland Problem 2.48 on p. 69. (Counter example related to Fubini Theorem involving counting measures.)

Exercise 10.9. Folland Problem 2.50 on p. 69 pertaining to area under a curve. (Note the $\mathcal{M} \times \mathcal{B}_{\mathbb{R}}$ should be $\mathcal{M} \otimes \mathcal{B}_{\bar{\mathbb{R}}}$ in this problem.)

Exercise 10.10. Folland Problem 2.55 on p. 77. (Explicit integrations.)

Exercise 10.11. Folland Problem 2.56 on p. 77. Let $f \in L^1((0,a), dm)$, $g(x) = \int_x^a \frac{f(t)}{t} \, dt$ for $x \in (0,a)$, show $g \in L^1((0,a), dm)$ and
\[
\int_0^a g(x) \, dx = \int_0^a f(t) \, dt.
\]

Exercise 10.12. Show $\int_0^\infty \left| \frac{\sin x}{x} \right| dm(x) = \infty$. So $\frac{\sin x}{x} \notin L^1([0,\infty), m)$ and $\int_0^\infty \frac{\sin x}{x} \, dm(x)$ is not defined as a Lebesgue integral.

Exercise 10.13. Folland Problem 2.57 on p. 77.

Exercise 10.14. Folland Problem 2.58 on p. 77.

Exercise 10.15. Folland Problem 2.60 on p. 77. Properties of the $\Gamma$ - function.

Exercise 10.16. Folland Problem 2.61 on p. 77. Fractional integration.

Exercise 10.17. Folland Problem 2.62 on p. 80. Rotation invariance of surface measure on $S^{n-1}$.

Exercise 10.18. Folland Problem 2.64 on p. 80. On the integrability of $|x|^a |\log|x||^b$ for $x$ near $0$ and $x$ near $\infty$ in $\mathbb{R}^n$.

Exercise 10.19. Show, using Problem 10.17 that
\[
\int_{S^{d-1}} \omega_i \, \omega_j \, d\sigma(\omega) = \frac{1}{d} \, \delta_{ij} \, \sigma\left( S^{d-1} \right).
\]
Hint: show $\int_{S^{d-1}} \omega_i^2 \, d\sigma(\omega)$ is independent of $i$ and therefore
\[
\int_{S^{d-1}} \omega_i^2 \, d\sigma(\omega) = \frac{1}{d} \sum_{j=1}^d \int_{S^{d-1}} \omega_j^2 \, d\sigma(\omega).
\]
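(Aside, not part of the notes: a Monte Carlo illustration of Exercise 10.19, assuming NumPy. It uses the fact that a normalized standard Gaussian vector is uniformly distributed on $S^{d-1}$, which is a consequence of the rotation invariance in Exercise 10.17; the seed is arbitrary.)

```python
import numpy as np

# Under normalized surface measure on S^{d-1}: E[w_i w_j] = delta_ij / d.
rng = np.random.default_rng(2)
d, n = 5, 10**6
v = rng.standard_normal((n, d))
omega = v / np.linalg.norm(v, axis=1, keepdims=True)  # uniform on S^{d-1}

print(np.mean(omega[:, 0] ** 2), 1 / d)   # ~ 0.2 = 1/d
print(np.mean(omega[:, 0] * omega[:, 1]))  # ~ 0 (off-diagonal)
```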

11 $L^p$ spaces

Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and for $0 < p < \infty$ and a measurable function $f : \Omega \to \mathbb{C}$ let
\[
\|f\|_p := \left( \int_\Omega |f|^p \, d\mu \right)^{1/p} \tag{11.1}
\]
and when $p = \infty$, let
\[
\|f\|_\infty = \inf \{ a \ge 0 : \mu(|f| > a) = 0 \}. \tag{11.2}
\]
For $0 < p \le \infty$, let
\[
L^p(\Omega, \mathcal{B}, \mu) = \{ f : \Omega \to \mathbb{C} : f \text{ is measurable and } \|f\|_p < \infty \} / \sim
\]
where $f \sim g$ iff $f = g$ a.e. Notice that $\|f - g\|_p = 0$ iff $f \sim g$ and if $f \sim g$ then $\|f\|_p = \|g\|_p$. In general we will (by abuse of notation) use $f$ to denote both the function $f$ and the equivalence class containing $f$.

Remark 11.1. Suppose that $\|f\|_\infty \le M$, then for all $a > M$, $\mu(|f| > a) = 0$ and therefore $\mu(|f| > M) = \lim_{n \to \infty} \mu(|f| > M + 1/n) = 0$, i.e. $|f(\omega)| \le M$ for $\mu$ - a.e. $\omega$. Conversely, if $|f| \le M$ a.e. and $a > M$ then $\mu(|f| > a) = 0$ and hence $\|f\|_\infty \le M$. This leads to the identity:
\[
\|f\|_\infty = \inf \{ a \ge 0 : |f(\omega)| \le a \text{ for } \mu \text{ - a.e. } \omega \}.
\]

11.1 Modes of Convergence

Let $\{f_n\}_{n=1}^\infty \cup \{f\}$ be a collection of complex valued measurable functions on $\Omega$. We have the following notions of convergence and Cauchy sequences.

Definition 11.2.
1. $f_n \to f$ a.e. if there is a set $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $\lim_{n \to \infty} 1_{E^c} f_n = 1_{E^c} f$.
2. $f_n \to f$ in $\mu$ - measure if $\lim_{n \to \infty} \mu(|f_n - f| > \varepsilon) = 0$ for all $\varepsilon > 0$. We will abbreviate this by saying $f_n \to f$ in $L^0$ or by $f_n \xrightarrow{\mu} f$.
3. $f_n \to f$ in $L^p$ iff $f \in L^p$ and $f_n \in L^p$ for all $n$, and $\lim_{n \to \infty} \|f_n - f\|_p = 0$.

Definition 11.3.
1. $\{f_n\}$ is a.e. Cauchy if there is a set $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $\{1_{E^c} f_n\}$ is a pointwise Cauchy sequence.
2. $\{f_n\}$ is Cauchy in $\mu$ - measure (or $L^0$ - Cauchy) if $\lim_{m,n \to \infty} \mu(|f_n - f_m| > \varepsilon) = 0$ for all $\varepsilon > 0$.
3. $\{f_n\}$ is Cauchy in $L^p$ if $\lim_{m,n \to \infty} \|f_n - f_m\|_p = 0$.

When $\mu$ is a probability measure, we describe $f_n \xrightarrow{\mu} f$ as $f_n$ converging to $f$ in probability. If a sequence $\{f_n\}_{n=1}^\infty$ is $L^p$ - convergent, then it is $L^p$ - Cauchy. For example, when $p \in [1,\infty]$ and $f_n \to f$ in $L^p$, we have
\[
\|f_n - f_m\|_p \le \|f_n - f\|_p + \|f - f_m\|_p \to 0 \text{ as } m,n \to \infty.
\]
The case where $p = 0$ will be handled in Theorem 11.7 below.

Lemma 11.4 ($L^p$ - convergence implies convergence in probability). Let $p \in [1,\infty)$. If $\{f_n\} \subset L^p$ is $L^p$ - convergent (Cauchy) then $\{f_n\}$ is also convergent (Cauchy) in measure.

Proof. By Chebyshev's inequality (8.3),
\[
\mu(|f| \ge \varepsilon) = \mu(|f|^p \ge \varepsilon^p) \le \frac{1}{\varepsilon^p} \int_\Omega |f|^p \, d\mu = \frac{1}{\varepsilon^p} \|f\|_p^p
\]
and therefore if $\{f_n\}$ is $L^p$ - Cauchy, then
\[
\mu(|f_n - f_m| \ge \varepsilon) \le \frac{1}{\varepsilon^p} \|f_n - f_m\|_p^p \to 0 \text{ as } m,n \to \infty,
\]
showing $\{f_n\}$ is $L^0$ - Cauchy. A similar argument holds for the $L^p$ - convergent case.

[Figure: a sequence of functions where $f_n \to 0$ a.e., $f_n \nrightarrow 0$ in $L^1$, and $f_n \xrightarrow{\mu} 0$.]
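(Aside, not part of the notes: the moving-spike family $f_n = n \, 1_{(0,1/n)}$ on $((0,1), m)$ realizes exactly the behavior in the picture above; a crude grid sketch, assuming NumPy.)

```python
import numpy as np

# f_n = n * 1_{(0,1/n)}: f_n -> 0 a.e. and in measure, but ||f_n||_1 = 1.
x = np.linspace(0, 1, 10**6, endpoint=False) + 0.5e-6  # midpoint grid
for n in [10, 100, 1000]:
    fn = np.where(x < 1 / n, float(n), 0.0)
    # (approximate L^1 norm, approximate measure of {|f_n| > 0.1})
    print(n, fn.mean(), np.mean(np.abs(fn) > 0.1))  # ~ 1.0 and ~ 1/n
```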

[Figure: a sequence of functions where $f_n \to 0$ in $L^1$, $f_n \nrightarrow 0$ a.e., and $f_n \xrightarrow{\mu} 0$.]

[Figure: a sequence of functions where $f_n \to 0$ a.e., yet $f_n \nrightarrow 0$ in $L^1$ or in measure.]

[Figure: a sequence of functions where $f_n \to 0$ a.e., $f_n \xrightarrow{\mu} 0$, but $f_n \nrightarrow 0$ in $L^1$.]

Theorem 11.5 (Egorov's Theorem: almost sure convergence implies convergence in probability). Suppose $\mu(\Omega) = 1$ and $f_n \to f$ a.s. Then for all $\varepsilon > 0$ there exists $E = E_\varepsilon \in \mathcal{B}$ such that $\mu(E) < \varepsilon$ and $f_n \to f$ uniformly on $E^c$. In particular $f_n \xrightarrow{\mu} f$ as $n \to \infty$.

Proof. Let $f_n \to f$ a.e. Then for all $\varepsilon > 0$,
\[
0 = \mu(\{ |f_n - f| > \varepsilon \text{ i.o. } n \}) = \lim_{N \to \infty} \mu \Big( \bigcup_{n \ge N} \{ |f_n - f| > \varepsilon \} \Big) \ge \limsup_{N \to \infty} \mu(\{ |f_N - f| > \varepsilon \}) \tag{11.3}
\]
from which it follows that $f_n \xrightarrow{\mu} f$ as $n \to \infty$. To get the uniform convergence off a small exceptional set, the equality in Eq. (11.3) allows us to choose an increasing sequence $\{N_k\}_{k=1}^\infty$, such that, if
\[
E_k := \bigcup_{n \ge N_k} \Big\{ |f_n - f| > \frac{1}{k} \Big\}, \text{ then } \mu(E_k) < \varepsilon 2^{-k}.
\]
The set, $E := \bigcup_{k=1}^\infty E_k$, then satisfies the estimate, $\mu(E) < \sum_k \varepsilon 2^{-k} = \varepsilon$. Moreover, for $\omega \notin E$, we have $|f_n(\omega) - f(\omega)| \le \frac{1}{k}$ for all $n \ge N_k$ and all $k$. That is $f_n \to f$ uniformly on $E^c$.

Lemma 11.6. Suppose $a_n \in \mathbb{C}$ and $|a_{n+1} - a_n| \le \varepsilon_n$ and $\sum_{n=1}^\infty \varepsilon_n < \infty$. Then
\[
\lim_{n \to \infty} a_n = a \in \mathbb{C} \text{ exists and } |a - a_n| \le \delta_n := \sum_{k=n}^\infty \varepsilon_k.
\]

Proof. Let $m > n$ then
\[
|a_m - a_n| = \Big| \sum_{k=n}^{m-1} (a_{k+1} - a_k) \Big| \le \sum_{k=n}^{m-1} |a_{k+1} - a_k| \le \sum_{k=n}^\infty \varepsilon_k := \delta_n. \tag{11.4}
\]
So $|a_m - a_n| \le \delta_{\min(m,n)} \to 0$ as $m,n \to \infty$, i.e. $\{a_n\}$ is Cauchy. Let $m \to \infty$ in (11.4) to find $|a - a_n| \le \delta_n$.

Theorem 11.7. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and $\{f_n\}_{n=1}^\infty$ be a sequence of measurable functions on $\Omega$.
1. If $f$ and $g$ are measurable functions and $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$ then $f = g$ a.e.
2. If $f_n \xrightarrow{\mu} f$ then $\{f_n\}_{n=1}^\infty$ is Cauchy in measure.
3. If $\{f_n\}_{n=1}^\infty$ is Cauchy in measure, there exists a measurable function, $f$, and a subsequence $g_j = f_{n_j}$ of $\{f_n\}$ such that $\lim_{j \to \infty} g_j := f$ exists a.e.
4. If $\{f_n\}_{n=1}^\infty$ is Cauchy in measure and $f$ is as in item 3. then $f_n \xrightarrow{\mu} f$.
5. Let us now further assume that $\mu(\Omega) < \infty$. In this case, a sequence of functions, $\{f_n\}_{n=1}^\infty$, converges to $f$ in probability iff every subsequence of $\{f_n\}_{n=1}^\infty$ has a further subsequence which is almost surely convergent to $f$.

Proof.
1. Suppose that $f$ and $g$ are measurable functions such that $f_n \xrightarrow{\mu} g$ and $f_n \xrightarrow{\mu} f$ as $n \to \infty$ and $\varepsilon > 0$ is given. Since
\[
\{ |f-g| > \varepsilon \} = \{ |f - f_n + f_n - g| > \varepsilon \} \subset \{ |f - f_n| + |f_n - g| > \varepsilon \} \subset \{ |f - f_n| > \varepsilon/2 \} \cup \{ |g - f_n| > \varepsilon/2 \},
\]
\[
\mu(|f-g| > \varepsilon) \le \mu(|f - f_n| > \varepsilon/2) + \mu(|g - f_n| > \varepsilon/2) \to 0 \text{ as } n \to \infty.
\]
Hence $\mu(|f-g| > 0) = \mu\left( \bigcup_{n=1}^\infty \{ |f-g| > \frac{1}{n} \} \right) = 0$, i.e. $f = g$ a.e.
2. Suppose $f_n \xrightarrow{\mu} f$, $\varepsilon > 0$ and $m,n \in \mathbb{N}$ and $\omega \in \Omega$ are such that $|f_n(\omega) - f_m(\omega)| > \varepsilon$. Then
\[
\varepsilon < |f_n(\omega) - f_m(\omega)| \le |f_n(\omega) - f(\omega)| + |f(\omega) - f_m(\omega)|
\]
from which it follows that either $|f_n(\omega) - f(\omega)| > \varepsilon/2$ or $|f(\omega) - f_m(\omega)| > \varepsilon/2$. Therefore we have shown,
\[
\{ |f_n - f_m| > \varepsilon \} \subset \{ |f_n - f| > \varepsilon/2 \} \cup \{ |f_m - f| > \varepsilon/2 \}
\]
and hence
\[
\mu(|f_n - f_m| > \varepsilon) \le \mu(|f_n - f| > \varepsilon/2) + \mu(|f_m - f| > \varepsilon/2) \to 0 \text{ as } m,n \to \infty.
\]
3. Suppose $\{f_n\}$ is $L^0(\mu)$ - Cauchy and let $\varepsilon_n > 0$ be such that $\sum_{n=1}^\infty \varepsilon_n < \infty$ ($\varepsilon_n = 2^{-n}$ would do) and set $\delta_n = \sum_{k=n}^\infty \varepsilon_k$. Choose $g_j = f_{n_j}$ where $\{n_j\}$ is a subsequence of $\mathbb{N}$ such that
\[
\mu(\{ |g_{j+1} - g_j| > \varepsilon_j \}) \le \varepsilon_j.
\]
Let $F_N := \bigcup_{j \ge N} \{ |g_{j+1} - g_j| > \varepsilon_j \}$ and
\[
E := \bigcap_{N=1}^\infty F_N = \{ |g_{j+1} - g_j| > \varepsilon_j \text{ i.o.} \}
\]
and observe that $\mu(F_N) \le \delta_N < \infty$. Since
\[
\sum_{j=1}^\infty \mu(\{ |g_{j+1} - g_j| > \varepsilon_j \}) \le \sum_{j=1}^\infty \varepsilon_j < \infty,
\]
it follows from the first Borel-Cantelli lemma that
\[
0 = \mu(E) = \lim_{N \to \infty} \mu(F_N).
\]
For $\omega \notin E$, $|g_{j+1}(\omega) - g_j(\omega)| \le \varepsilon_j$ for a.a. $j$ and so by Lemma 11.6, $f(\omega) := \lim_{j \to \infty} g_j(\omega)$ exists. For $\omega \in E$ we may define $f(\omega) \equiv 0$.
4. Next we will show $g_N \xrightarrow{\mu} f$ as $N \to \infty$ where $f$ and $g_N$ are as above. If
\[
\omega \in F_N^c = \bigcap_{j \ge N} \{ |g_{j+1} - g_j| \le \varepsilon_j \},
\]
then $|g_{j+1}(\omega) - g_j(\omega)| \le \varepsilon_j$ for all $j \ge N$. Another application of Lemma 11.6 shows $|f(\omega) - g_j(\omega)| \le \delta_j$ for all $j \ge N$, i.e.

\[
F_N^c \subset \bigcap_{j \ge N} \{ \omega : |f(\omega) - g_j(\omega)| \le \delta_j \}.
\]
Taking complements of this equation shows
\[
\{ |f - g_N| > \delta_N \} \subset \bigcup_{j \ge N} \{ |f - g_j| > \delta_j \} \subset F_N
\]
and therefore,
\[
\mu(|f - g_N| > \delta_N) \le \mu(F_N) \le \delta_N \to 0 \text{ as } N \to \infty
\]
and in particular, $g_N \xrightarrow{\mu} f$ as $N \to \infty$.

With this in hand, it is straightforward to show $f_n \xrightarrow{\mu} f$. Indeed, since
\[
\{ |f_n - f| > \varepsilon \} = \{ |f - g_j + g_j - f_n| > \varepsilon \} \subset \{ |f - g_j| + |g_j - f_n| > \varepsilon \} \subset \{ |f - g_j| > \varepsilon/2 \} \cup \{ |g_j - f_n| > \varepsilon/2 \},
\]
we have
\[
\mu(\{ |f_n - f| > \varepsilon \}) \le \mu(\{ |f - g_j| > \varepsilon/2 \}) + \mu(|g_j - f_n| > \varepsilon/2).
\]
Therefore, letting $j \to \infty$ in this inequality gives,
\[
\mu(\{ |f_n - f| > \varepsilon \}) \le \limsup_{j \to \infty} \mu(|g_j - f_n| > \varepsilon/2) \to 0 \text{ as } n \to \infty
\]
because $\{f_n\}_{n=1}^\infty$ was Cauchy in measure.
5. If $\{f_n\}_{n=1}^\infty$ is convergent and hence Cauchy in probability then any subsequence is also Cauchy in probability. Hence by item 3. there is a further subsequence which is convergent almost surely. Conversely if $\{f_n\}_{n=1}^\infty$ does not converge to $f$ in probability, then there exists an $\varepsilon > 0$ and a subsequence, $\{n_k\}$, such that $\inf_k \mu(|f - f_{n_k}| \ge \varepsilon) > 0$. Any subsequence of $\{f_{n_k}\}$ would have the same property and hence can not be almost surely convergent because of Theorem 11.5.

Corollary 11.8 (Dominated Convergence Theorem). Let $(\Omega, \mathcal{B}, \mu)$ be a measure space. Suppose $\{f_n\}$, $\{g_n\}$, and $g$ are in $L^1$ and $f \in L^0$ are functions such that
\[
|f_n| \le g_n \text{ a.e.}, \quad f_n \xrightarrow{\mu} f, \quad g_n \xrightarrow{\mu} g, \quad \text{and} \quad \int g_n \to \int g \text{ as } n \to \infty.
\]
Then $f \in L^1$ and $\lim_{n \to \infty} \|f - f_n\|_1 = 0$, i.e. $f_n \to f$ in $L^1$. In particular $\lim_{n \to \infty} \int f_n = \int f$.

Proof. First notice that $|f| \le g$ a.e. and hence $f \in L^1$ since $g \in L^1$. To see that $|f| \le g$, use Theorem 11.7 to find subsequences $\{f_{n_k}\}$ and $\{g_{n_k}\}$ of $\{f_n\}$ and $\{g_n\}$ respectively which are almost everywhere convergent. Then
\[
|f| = \lim_{k \to \infty} |f_{n_k}| \le \lim_{k \to \infty} g_{n_k} = g \text{ a.e.}
\]
If (for sake of contradiction) $\lim_{n \to \infty} \|f - f_n\|_1 \ne 0$ there exists $\varepsilon > 0$ and a subsequence $\{f_{n_k}\}$ of $\{f_n\}$ such that
\[
\int |f - f_{n_k}| \ge \varepsilon \text{ for all } k. \tag{11.5}
\]
Using Theorem 11.7 again, we may assume (by passing to a further subsequence if necessary) that $f_{n_k} \to f$ and $g_{n_k} \to g$ almost everywhere. Noting, $|f - f_{n_k}| \le g + g_{n_k} \to 2g$ and $\int (g + g_{n_k}) \to \int 2g$, an application of the dominated convergence Theorem 8.34 implies $\lim_{k \to \infty} \int |f - f_{n_k}| = 0$ which contradicts Eq. (11.5).

Exercise 11.1 (Fatou's Lemma). Let $(\Omega, \mathcal{B}, \mu)$ be a measure space. If $f_n \ge 0$ and $f_n \to f$ in measure, then $\int_\Omega f \, d\mu \le \liminf_{n \to \infty} \int_\Omega f_n \, d\mu$.

Exercise 11.2. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space, $p \in [1,\infty)$, $\{f_n\} \subset L^p(\mu)$ and $f \in L^p(\mu)$. Then $f_n \to f$ in $L^p(\mu)$ iff $f_n \xrightarrow{\mu} f$ and $\int |f_n|^p \to \int |f|^p$.

Solution to Exercise (11.2). By the triangle inequality, $\left| \|f\|_p - \|f_n\|_p \right| \le \|f - f_n\|_p$, which shows $\int |f_n|^p \to \int |f|^p$ if $f_n \to f$ in $L^p$. Moreover Chebyshev's inequality implies $f_n \xrightarrow{\mu} f$ if $f_n \to f$ in $L^p$.
For the converse, let $F_n := |f - f_n|^p$ and $G_n := 2^{p-1} \left[ |f|^p + |f_n|^p \right]$. Then $F_n \xrightarrow{\mu} 0$, $F_n \le G_n \in L^1$, and $\int G_n \to \int G$ where $G := 2^p |f|^p \in L^1$. Therefore, by Corollary 11.8, $\int |f - f_n|^p = \int F_n \to \int 0 = 0$.

Corollary 11.9. Suppose $(\Omega, \mathcal{B}, \mu)$ is a probability space, $f_n \xrightarrow{\mu} f$ and $g_n \xrightarrow{\mu} g$ and $\varphi : \mathbb{R} \to \mathbb{R}$ and $\psi : \mathbb{R}^2 \to \mathbb{R}$ are continuous functions. Then
1. $\varphi(f_n) \xrightarrow{\mu} \varphi(f)$,
2. $\psi(f_n, g_n) \xrightarrow{\mu} \psi(f,g)$,
3. $f_n + g_n \xrightarrow{\mu} f + g$, and
4. $f_n \cdot g_n \xrightarrow{\mu} f \cdot g$.

Proof. Items 1., 3. and 4. all follow from item 2. by taking $\psi(x,y) = \varphi(x)$, $\psi(x,y) = x + y$, and $\psi(x,y) = x \cdot y$ respectively. So it suffices to prove item 2. To do this we will make repeated use of Theorem 11.7.

Given a subsequence, $\{n_k\}$, of $\mathbb{N}$ there is a subsequence, $\{n_k'\}$ of $\{n_k\}$, such that $f_{n_k'} \to f$ a.s. and yet a further subsequence $\{n_k''\}$ of $\{n_k'\}$ such that $g_{n_k''} \to g$ a.s. Hence, by the continuity of $\psi$, it now follows that
\[
\lim_{k \to \infty} \psi\left( f_{n_k''}, g_{n_k''} \right) = \psi(f,g) \text{ a.s.}
\]
which completes the proof.

11.2 Jensen's, Hölder's and Minkowski's Inequalities

Theorem 11.10 (Jensen's Inequality). Suppose that $(\Omega, \mathcal{B}, \mu)$ is a probability space, i.e. $\mu$ is a positive measure and $\mu(\Omega) = 1$. Also suppose that $f \in L^1(\mu)$, $f : \Omega \to (a,b)$, and $\varphi : (a,b) \to \mathbb{R}$ is a convex function (i.e. $\varphi'' \ge 0$ on $(a,b)$). Then
\[
\varphi\left( \int_\Omega f \, d\mu \right) \le \int_\Omega \varphi(f) \, d\mu
\]
where if $\varphi \circ f \notin L^1(\mu)$, then $\varphi \circ f$ is integrable in the extended sense and $\int_\Omega \varphi(f) \, d\mu = \infty$.

Proof. Let $t = \int_\Omega f \, d\mu \in (a,b)$ and let $\beta \in \mathbb{R}$ ($\beta = \varphi'(t)$ when $\varphi'(t)$ exists) be such that $\varphi(s) - \varphi(t) \ge \beta(s-t)$ for all $s \in (a,b)$. (See Lemma 7.31 and Figure 7.2 when $\varphi$ is $C^1$ and Theorem 11.38 below for the existence of such a $\beta$ in the general case.) Then integrating the inequality, $\varphi(f) - \varphi(t) \ge \beta(f - t)$, implies that
\[
0 \le \int_\Omega \varphi(f) \, d\mu - \varphi(t) = \int_\Omega \varphi(f) \, d\mu - \varphi\left( \int_\Omega f \, d\mu \right).
\]
Moreover, if $\varphi(f)$ is not integrable, then $\varphi(f) \ge \varphi(t) + \beta(f - t)$, which shows that the negative part of $\varphi(f)$ is integrable. Therefore, $\int_\Omega \varphi(f) \, d\mu = \infty$ in this case.

Example 11.11. Since $e^x$ for $x \in \mathbb{R}$, $-\ln x$ for $x > 0$, and $x^p$ for $x \ge 0$ and $p \ge 1$ are all convex functions, we have the following inequalities
\[
\exp\left( \int_\Omega f \, d\mu \right) \le \int_\Omega e^f \, d\mu, \qquad \int_\Omega \log(|f|) \, d\mu \le \log\left( \int_\Omega |f| \, d\mu \right) \tag{11.6}
\]
and for $p \ge 1$,
\[
\left| \int_\Omega f \, d\mu \right|^p \le \left( \int_\Omega |f| \, d\mu \right)^p \le \int_\Omega |f|^p \, d\mu.
\]
As a special case of Eq. (11.6), if $p_i, s_i > 0$ for $i = 1,2,\dots,n$ and $\sum_{i=1}^n \frac{1}{p_i} = 1$, then
\[
s_1 \dots s_n = e^{\sum_{i=1}^n \ln s_i} = e^{\sum_{i=1}^n \frac{1}{p_i} \ln s_i^{p_i}} \le \sum_{i=1}^n \frac{1}{p_i} e^{\ln s_i^{p_i}} = \sum_{i=1}^n \frac{s_i^{p_i}}{p_i}. \tag{11.7}
\]
Indeed, we have applied Eq. (11.6) with $\Omega = \{1,2,\dots,n\}$, $\mu = \sum_{i=1}^n \frac{1}{p_i} \delta_i$ and $f(i) := \ln s_i^{p_i}$. As a special case of Eq. (11.7), suppose that $s,t,p,q \in (1,\infty)$ with $q = \frac{p}{p-1}$ (i.e. $\frac{1}{p} + \frac{1}{q} = 1$), then
\[
st \le \frac{1}{p} s^p + \frac{1}{q} t^q. \tag{11.8}
\]
(When $p = q = 2$, the inequality in Eq. (11.8) follows from the inequality $0 \le (s-t)^2$.)
As another special case of Eq. (11.7), take $p_i = n$ and $s_i = a_i^{1/n}$ with $a_i > 0$; then we get the arithmetic geometric mean inequality,
\[
\sqrt[n]{a_1 \dots a_n} \le \frac{1}{n} \sum_{i=1}^n a_i. \tag{11.9}
\]

Theorem 11.12 (Hölder's inequality). Suppose that $1 \le p \le \infty$ and $q := \frac{p}{p-1}$, or equivalently $p^{-1} + q^{-1} = 1$. If $f$ and $g$ are measurable functions then
\[
\|fg\|_1 \le \|f\|_p \cdot \|g\|_q. \tag{11.10}
\]
Assuming $p \in (1,\infty)$ and $\|f\|_p \|g\|_q < \infty$, equality holds in Eq. (11.10) iff $|f|^p$ and $|g|^q$ are linearly dependent as elements of $L^1$, which happens iff
\[
|g|^q \|f\|_p^p = \|g\|_q^q |f|^p \text{ a.e.} \tag{11.11}
\]

Proof. The cases $p = 1$ and $q = \infty$ or $p = \infty$ and $q = 1$ are easy to deal with and will be left to the reader. So we now assume that $p,q \in (1,\infty)$. If $\|f\|_p = 0$ or $\infty$ or $\|g\|_q = 0$ or $\infty$, Eq. (11.10) is again easily verified. So we will now assume that $0 < \|f\|_p, \|g\|_q < \infty$. Taking $s = |f|/\|f\|_p$ and $t = |g|/\|g\|_q$ in Eq. (11.8) gives,
\[
\frac{|fg|}{\|f\|_p \|g\|_q} \le \frac{1}{p} \frac{|f|^p}{\|f\|_p^p} + \frac{1}{q} \frac{|g|^q}{\|g\|_q^q} \tag{11.12}
\]
with equality iff $|g/\|g\|_q| = |f|^{p-1}/\|f\|_p^{p-1} = |f|^{p/q}/\|f\|_p^{p/q}$, i.e. $|g|^q \|f\|_p^p = \|g\|_q^q |f|^p$. Integrating Eq. (11.12) implies

\[
\frac{\|fg\|_1}{\|f\|_p \|g\|_q} \le \frac{1}{p} + \frac{1}{q} = 1
\]
with equality iff Eq. (11.11) holds. The proof is finished since it is easily checked that equality holds in Eq. (11.10) when $|f|^p = c|g|^q$ or $|g|^q = c|f|^p$ for some constant $c$.

Example 11.13. Suppose that $a_k \in \mathbb{C}$ for $k = 1,2,\dots,n$ and $p \in [1,\infty)$; then
\[
\Big| \sum_{k=1}^n a_k \Big|^p \le n^{p-1} \sum_{k=1}^n |a_k|^p. \tag{11.13}
\]
Indeed, by Hölder's inequality applied using the measure space $\{1,2,\dots,n\}$ equipped with counting measure, we have
\[
\Big| \sum_{k=1}^n a_k \Big| = \Big| \sum_{k=1}^n a_k \cdot 1 \Big| \le \Big( \sum_{k=1}^n |a_k|^p \Big)^{1/p} \Big( \sum_{k=1}^n 1^q \Big)^{1/q} = n^{1/q} \Big( \sum_{k=1}^n |a_k|^p \Big)^{1/p}
\]
where $q = \frac{p}{p-1}$. Taking the $p^{\text{th}}$ power of this inequality then gives Eq. (11.13).

Theorem 11.14 (Generalized Hölder's inequality). Suppose that $f_i : \Omega \to \mathbb{C}$ are measurable functions for $i = 1,\dots,n$ and $p_1,\dots,p_n$ and $r$ are positive numbers such that $\sum_{i=1}^n p_i^{-1} = r^{-1}$, then
\[
\Big\| \prod_{i=1}^n f_i \Big\|_r \le \prod_{i=1}^n \|f_i\|_{p_i}. \tag{11.14}
\]

Proof. One may prove this theorem by induction based on Hölder's Theorem 11.12 above. Alternatively we may give a proof along the lines of the proof of Theorem 11.12, which is what we will do here.
Since Eq. (11.14) is easily seen to hold if $\|f_i\|_{p_i} = 0$ for some $i$, we will assume that $\|f_i\|_{p_i} > 0$ for all $i$. By assumption, $\sum_{i=1}^n \frac{r}{p_i} = 1$, hence we may replace $s_i$ by $s_i^r$ and $p_i$ by $p_i/r$ for each $i$ in Eq. (11.7) to find
\[
s_1^r \dots s_n^r \le \sum_{i=1}^n \frac{(s_i^r)^{p_i/r}}{p_i/r} = r \sum_{i=1}^n \frac{s_i^{p_i}}{p_i}.
\]
Now replace $s_i$ by $|f_i|/\|f_i\|_{p_i}$ in the previous inequality and integrate the result to find
\[
\frac{1}{\prod_{i=1}^n \|f_i\|_{p_i}^r} \int_\Omega \prod_{i=1}^n |f_i|^r \, d\mu \le r \sum_{i=1}^n \frac{1}{p_i} \frac{1}{\|f_i\|_{p_i}^{p_i}} \int_\Omega |f_i|^{p_i} \, d\mu = \sum_{i=1}^n \frac{r}{p_i} = 1.
\]

Theorem 11.15 (Minkowski's Inequality). If $1 \le p \le \infty$ and $f,g \in L^p$ then
\[
\|f + g\|_p \le \|f\|_p + \|g\|_p. \tag{11.15}
\]

Proof. When $p = \infty$, $|f| \le \|f\|_\infty$ a.e. and $|g| \le \|g\|_\infty$ a.e., so that $|f+g| \le |f| + |g| \le \|f\|_\infty + \|g\|_\infty$ a.e. and therefore
\[
\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty.
\]
When $p < \infty$,
\[
|f+g|^p \le \left( 2 \max(|f|,|g|) \right)^p = 2^p \max(|f|^p, |g|^p) \le 2^p \left( |f|^p + |g|^p \right),
\]
which implies$^1$ $f + g \in L^p$ since
\[
\|f+g\|_p^p \le 2^p \left( \|f\|_p^p + \|g\|_p^p \right) < \infty.
\]
Furthermore, when $p = 1$ we have
\[
\|f+g\|_1 = \int_\Omega |f+g| \, d\mu \le \int_\Omega |f| \, d\mu + \int_\Omega |g| \, d\mu = \|f\|_1 + \|g\|_1.
\]
We now consider $p \in (1,\infty)$. We may assume $\|f+g\|_p$, $\|f\|_p$ and $\|g\|_p$ are all positive since otherwise the theorem is easily verified. Integrating
\[
|f+g|^p = |f+g| \, |f+g|^{p-1} \le (|f| + |g|) \, |f+g|^{p-1}
\]
and then applying Hölder's inequality with $q = p/(p-1)$ gives
\[
\int_\Omega |f+g|^p \, d\mu \le \int_\Omega |f| \, |f+g|^{p-1} \, d\mu + \int_\Omega |g| \, |f+g|^{p-1} \, d\mu \le \left( \|f\|_p + \|g\|_p \right) \left\| \, |f+g|^{p-1} \right\|_q, \tag{11.16}
\]
where
\[
\left\| \, |f+g|^{p-1} \right\|_q^q = \int_\Omega \left( |f+g|^{p-1} \right)^q d\mu = \int_\Omega |f+g|^p \, d\mu = \|f+g\|_p^p. \tag{11.17}
\]
Combining Eqs. (11.16) and (11.17) implies
\[
\|f+g\|_p^p \le \|f\|_p \, \|f+g\|_p^{p/q} + \|g\|_p \, \|f+g\|_p^{p/q} \tag{11.18}
\]
Solving this inequality for $\|f+g\|_p$ gives Eq. (11.15).

$^1$ In light of Example 11.13, the last $2^p$ in the above inequality may be replaced by $2^{p-1}$.
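(Aside, not part of the notes: a quick random spot-check of Hölder's inequality (11.10) and Minkowski's inequality (11.15) for counting measure on $\{1,\dots,n\}$, assuming NumPy; the exponent $p = 3$ and the seed are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(3)
f, g = rng.standard_normal(1000), rng.standard_normal(1000)
p = 3.0
q = p / (p - 1)
norm = lambda v, r: np.sum(np.abs(v) ** r) ** (1 / r)

# Hoelder: ||f g||_1 <= ||f||_p * ||g||_q with 1/p + 1/q = 1.
print(np.sum(np.abs(f * g)) <= norm(f, p) * norm(g, q))  # True

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p.
print(norm(f + g, p) <= norm(f, p) + norm(g, p))         # True
```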

11.3 Completeness of $L^p$ spaces

Theorem 11.16. Let $\|\cdot\|_\infty$ be as defined in Eq. (11.2), then $(L^\infty(\Omega, \mathcal{B}, \mu), \|\cdot\|_\infty)$ is a Banach space. A sequence $\{f_n\}_{n=1}^\infty \subset L^\infty$ converges to $f \in L^\infty$ iff there exists $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $f_n \to f$ uniformly on $E^c$. Moreover, bounded simple functions are dense in $L^\infty$.

Proof. By Minkowski's Theorem 11.15, $\|\cdot\|_\infty$ satisfies the triangle inequality. The reader may easily check the remaining conditions that ensure $\|\cdot\|_\infty$ is a norm. Suppose that $\{f_n\}_{n=1}^\infty \subset L^\infty$ is a sequence such that $f_n \to f \in L^\infty$, i.e. $\|f - f_n\|_\infty \to 0$ as $n \to \infty$. Then for all $k \in \mathbb{N}$, there exists $N_k < \infty$ such that
\[
\mu\left( |f - f_n| > k^{-1} \right) = 0 \text{ for all } n \ge N_k.
\]
Let
\[
E = \bigcup_{k=1}^\infty \bigcup_{n \ge N_k} \left\{ |f - f_n| > k^{-1} \right\}.
\]
Then $\mu(E) = 0$ and for $\omega \in E^c$, $|f(\omega) - f_n(\omega)| \le k^{-1}$ for all $n \ge N_k$. This shows that $f_n \to f$ uniformly on $E^c$. Conversely, if there exists $E \in \mathcal{B}$ such that $\mu(E) = 0$ and $f_n \to f$ uniformly on $E^c$, then for any $\varepsilon > 0$,
\[
\mu(|f - f_n| \ge \varepsilon) = \mu(\{ |f - f_n| \ge \varepsilon \} \cap E^c) = 0
\]
for all $n$ sufficiently large. That is to say $\limsup_{n \to \infty} \|f - f_n\|_\infty \le \varepsilon$ for all $\varepsilon > 0$. The density of simple functions follows from the approximation Theorem 6.34. So the last item to prove is the completeness of $L^\infty$. Suppose $\varepsilon_{m,n} := \|f_m - f_n\|_\infty \to 0$ as $m,n \to \infty$. Let $E_{m,n} = \{ |f_n - f_m| > \varepsilon_{m,n} \}$ and $E := \bigcup E_{m,n}$; then $\mu(E) = 0$ and
\[
\sup_{x \in E^c} |f_m(x) - f_n(x)| \le \varepsilon_{m,n} \to 0 \text{ as } m,n \to \infty.
\]
Therefore, $f := \lim_{n \to \infty} f_n$ exists on $E^c$ and the limit is uniform on $E^c$. Letting $f = \lim_{n \to \infty} 1_{E^c} f_n$, it then follows that $\lim_{n \to \infty} \|f_n - f\|_\infty = 0$.

Theorem 11.17 (Completeness of $L^p(\mu)$). For $1 \le p \le \infty$, $L^p(\mu)$ equipped with the $L^p$ - norm, $\|\cdot\|_p$ (see Eq. (11.1)), is a Banach space.

Proof. By Minkowski's Theorem 11.15, $\|\cdot\|_p$ satisfies the triangle inequality. As above the reader may easily check the remaining conditions that ensure $\|\cdot\|_p$ is a norm. So we are left to prove the completeness of $L^p(\mu)$ for $1 \le p < \infty$, the case $p = \infty$ being done in Theorem 11.16.
Let $\{f_n\}_{n=1}^\infty \subset L^p(\mu)$ be a Cauchy sequence. By Chebyshev's inequality (Lemma 11.4), $\{f_n\}$ is $L^0$ - Cauchy (i.e. Cauchy in measure) and by Theorem 11.7 there exists a subsequence $\{g_j\}$ of $\{f_n\}$ such that $g_j \to f$ a.e. By Fatou's Lemma,
\[
\|g_j - f\|_p^p = \int \liminf_{k \to \infty} |g_j - g_k|^p \, d\mu \le \liminf_{k \to \infty} \int |g_j - g_k|^p \, d\mu = \liminf_{k \to \infty} \|g_j - g_k\|_p^p \to 0 \text{ as } j \to \infty.
\]
In particular, $\|f\|_p \le \|g_j - f\|_p + \|g_j\|_p < \infty$, so the $f \in L^p$ and $g_j \xrightarrow{L^p} f$. The proof is finished because,
\[
\|f_n - f\|_p \le \|f_n - g_j\|_p + \|g_j - f\|_p \to 0 \text{ as } j,n \to \infty.
\]
See Proposition 12.5 for an important example of the use of this theorem.

11.4 Relationships between different $L^p$ spaces

The $L^p(\mu)$ - norm controls two types of behaviors of $f$, namely the behavior at infinity and the behavior of local singularities. So in particular, if $f$ blows up at a point $x_0 \in \Omega$, then locally near $x_0$ it is harder for $f$ to be in $L^p(\mu)$ as $p$ increases. On the other hand a function $f \in L^p(\mu)$ is allowed to decay at infinity slower and slower as $p$ increases. With these insights in mind, we should not in general expect $L^p(\mu) \subset L^q(\mu)$ or $L^q(\mu) \subset L^p(\mu)$. However, there are two notable exceptions. (1) If $\mu(\Omega) < \infty$, then there is no behavior at infinity to worry about and $L^q(\mu) \subset L^p(\mu)$ for all $q \ge p$ as is shown in Corollary 11.18 below. (2) If $\mu$ is counting measure, i.e. $\mu(A) = \#(A)$, then all functions in $L^p(\mu)$ for any $p$ can not blow up on a set of positive measure, so there are no local singularities. In this case $L^p(\mu) \subset L^q(\mu)$ for all $q \ge p$, see Corollary 11.23 below.

Corollary 11.18. If $\mu(\Omega) < \infty$ and $0 < p < q \le \infty$, then $L^q(\mu) \subset L^p(\mu)$, the inclusion map is bounded and in fact
\[
\|f\|_p \le \left[ \mu(\Omega) \right]^{\left( \frac{1}{p} - \frac{1}{q} \right)} \|f\|_q.
\]

Proof. Take $a \in [1,\infty]$ such that
\[
\frac{1}{p} = \frac{1}{a} + \frac{1}{q}, \text{ i.e. } a = \frac{pq}{q-p}.
\]
Then by Theorem 11.14,
\[
\|f\|_p = \|f \cdot 1\|_p \le \|f\|_q \cdot \|1\|_a = \mu(\Omega)^{1/a} \|f\|_q = \mu(\Omega)^{\left( \frac{1}{p} - \frac{1}{q} \right)} \|f\|_q.
\]
The reader may easily check this final formula is correct even when $q = \infty$ provided we interpret $1/p - 1/\infty$ to be $1/p$.
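(Aside, not part of the notes: on a probability space, Corollary 11.18 says $\|f\|_p$ is non-decreasing in $p$; a small grid sketch on $((0,1), m)$ with the illustrative choice $f(x) = x^{-1/4}$, assuming NumPy.)

```python
import numpy as np

# ||f||_p on ((0,1), Lebesgue) for f(x) = x^(-1/4); exact values are
# 4/3, sqrt(2), 4^(1/3) for p = 1, 2, 3, and they increase with p.
x = np.linspace(0, 1, 10**6, endpoint=False) + 0.5e-6  # midpoint grid
f = x ** (-0.25)
for p in [1, 2, 3]:
    print(p, np.mean(f**p) ** (1 / p))  # increasing in p
```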

The rest of this section may be skipped.

Example 11.19 (Power Inequalities). Let $a := (a_1,\dots,a_n)$ with $a_i > 0$ for $i = 1,2,\dots,n$ and for $p \in \mathbb{R} \setminus \{0\}$, let
\[
\|a\|_p := \Big( \frac{1}{n} \sum_{i=1}^n a_i^p \Big)^{1/p}.
\]
Then by Corollary 11.18, $p \to \|a\|_p$ is increasing in $p$ for $p > 0$. For $p = -q < 0$, we have
\[
\|a\|_p = \Big( \frac{1}{n} \sum_{i=1}^n a_i^{-q} \Big)^{-1/q} = \Bigg( \Big( \frac{1}{n} \sum_{i=1}^n \Big( \frac{1}{a_i} \Big)^q \Big)^{1/q} \Bigg)^{-1} = \frac{1}{\|1/a\|_q}
\]
where $1/a := (1/a_1, \dots, 1/a_n)$. So for $p < 0$, as $p$ increases, $q = -p$ decreases, so that $\|1/a\|_q$ is decreasing and hence $\|a\|_p$ is increasing. Hence we have shown that $p \to \|a\|_p$ is increasing for $p \in \mathbb{R} \setminus \{0\}$.

We now claim that $\lim_{p \to 0} \|a\|_p = \sqrt[n]{a_1 \dots a_n}$. To prove this, write $a_i^p = e^{p \ln a_i} = 1 + p \ln a_i + O(p^2)$ for $p$ near zero. Therefore,
\[
\frac{1}{n} \sum_{i=1}^n a_i^p = 1 + p \, \frac{1}{n} \sum_{i=1}^n \ln a_i + O(p^2).
\]
Hence it follows that
\[
\lim_{p \to 0} \|a\|_p = \lim_{p \to 0} \Big( 1 + p \, \frac{1}{n} \sum_{i=1}^n \ln a_i + O(p^2) \Big)^{1/p} = e^{\frac{1}{n} \sum_{i=1}^n \ln a_i} = \sqrt[n]{a_1 \dots a_n}.
\]
So if we now define $\|a\|_0 := \sqrt[n]{a_1 \dots a_n}$, the map $p \in \mathbb{R} \to \|a\|_p \in (0,\infty)$ is continuous and increasing in $p$.

We will now show that $\lim_{p \to \infty} \|a\|_p = \max_i a_i =: M$ and $\lim_{p \to -\infty} \|a\|_p = \min_i a_i =: m$. Indeed, for $p > 0$,
\[
\frac{1}{n} M^p \le \frac{1}{n} \sum_{i=1}^n a_i^p \le M^p
\]
and therefore,
\[
\Big( \frac{1}{n} \Big)^{1/p} M \le \|a\|_p \le M.
\]
Since $\left( \frac{1}{n} \right)^{1/p} \to 1$ as $p \to \infty$, it follows that $\lim_{p \to \infty} \|a\|_p = M$. For $p = -q < 0$, we have
\[
\lim_{p \to -\infty} \|a\|_p = \lim_{q \to \infty} \frac{1}{\|1/a\|_q} = \frac{1}{\max_i (1/a_i)} = \frac{1}{1/m} = m = \min_i a_i.
\]
Conclusion. If we extend the definition of $\|a\|_p$ to $p = \infty$ and $p = -\infty$ by $\|a\|_\infty = \max_i a_i$ and $\|a\|_{-\infty} = \min_i a_i$, then $\bar{\mathbb{R}} \ni p \to \|a\|_p \in (0,\infty)$ is a continuous non-decreasing function of $p$.

Proposition 11.20. Suppose that $0 < p_0 < p_1 \le \infty$, $\lambda \in (0,1)$ and $p \in (p_0, p_1)$ be defined by
\[
\frac{1}{p} = \frac{\lambda}{p_0} + \frac{1-\lambda}{p_1} \tag{11.19}
\]
with the interpretation that $(1-\lambda)/p_1 = 0$ if $p_1 = \infty$.$^2$ Then $L^p \subset L^{p_0} + L^{p_1}$, i.e. every function $f \in L^p$ may be written as $f = g + h$ with $g \in L^{p_0}$ and $h \in L^{p_1}$. For $1 \le p_0 < p_1 \le \infty$ and $f \in L^{p_0} + L^{p_1}$ let
\[
\|f\| := \inf \left\{ \|g\|_{p_0} + \|h\|_{p_1} : f = g + h \right\}.
\]
Then $(L^{p_0} + L^{p_1}, \|\cdot\|)$ is a Banach space and the inclusion map from $L^p$ to $L^{p_0} + L^{p_1}$ is bounded; in fact $\|f\| \le 2\|f\|_p$ for all $f \in L^p$.

Proof. Let $M > 0$; then the local singularities of $f$ are contained in the set $E := \{|f| > M\}$ and the behavior of $f$ at "infinity" is solely determined by $f$ on $E^c$. Hence let $g = f 1_E$ and $h = f 1_{E^c}$ so that $f = g + h$. By our earlier discussion we expect that $g \in L^{p_0}$ and $h \in L^{p_1}$ and this is the case since,
\[
\|g\|_{p_0}^{p_0} = \int |f|^{p_0} 1_{|f|>M} = M^{p_0} \int \Big| \frac{f}{M} \Big|^{p_0} 1_{|f|>M} \le M^{p_0} \int \Big| \frac{f}{M} \Big|^p 1_{|f|>M} \le M^{p_0 - p} \|f\|_p^p < \infty
\]
and
\[
\|h\|_{p_1}^{p_1} = \left\| f 1_{|f| \le M} \right\|_{p_1}^{p_1} = \int |f|^{p_1} 1_{|f| \le M} = M^{p_1} \int \Big| \frac{f}{M} \Big|^{p_1} 1_{|f| \le M} \le M^{p_1} \int \Big| \frac{f}{M} \Big|^p 1_{|f| \le M} \le M^{p_1 - p} \|f\|_p^p < \infty.
\]
Moreover this shows
\[
\|f\| \le M^{1 - p/p_0} \|f\|_p^{p/p_0} + M^{1 - p/p_1} \|f\|_p^{p/p_1}.
\]
Taking $M = \delta \|f\|_p$ then gives
\[
\|f\| \le \left( \delta^{1 - p/p_0} + \delta^{1 - p/p_1} \right) \|f\|_p
\]
and then taking $\delta = 1$ shows $\|f\| \le 2 \|f\|_p$. The proof that $(L^{p_0} + L^{p_1}, \|\cdot\|)$ is a Banach space is left as Exercise 11.6 to the reader.

$^2$ A little algebra shows that $\lambda$ may be computed in terms of $p_0$, $p$ and $p_1$ by
\[
\lambda = \frac{p_0}{p} \cdot \frac{p_1 - p}{p_1 - p_0}.
\]
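(Aside, not part of the notes: a small numerical illustration of Example 11.19, assuming NumPy; the vector $a$ is an arbitrary choice. The power means increase with $p$ and interpolate between $\min_i a_i$, the geometric mean, and $\max_i a_i$.)

```python
import numpy as np

a = np.array([0.5, 1.0, 2.0, 4.0])
power_mean = lambda p: np.mean(a**p) ** (1 / p)  # ||a||_p of Example 11.19

for p in [-100, -2, -0.001, 0.001, 2, 100]:
    print(p, power_mean(p))  # monotone increasing in p

print("geometric mean:", np.exp(np.mean(np.log(a))))  # ~ limit as p -> 0
print("min, max:", a.min(), a.max())  # ~ limits as p -> -inf, +inf
```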

Corollary 11.21 (Interpolation of $L^p$ - norms). Suppose that $0 < p_0 < p_1 \le \infty$, $\lambda \in (0,1)$ and $p \in (p_0, p_1)$ be defined as in Eq. (11.19); then $L^{p_0} \cap L^{p_1} \subset L^p$ and
\[
\|f\|_p \le \|f\|_{p_0}^{\lambda} \, \|f\|_{p_1}^{1-\lambda}. \tag{11.20}
\]
Further assume $1 \le p_0 < p < p_1 \le \infty$, and for $f \in L^{p_0} \cap L^{p_1}$ let
\[
\|f\| := \|f\|_{p_0} + \|f\|_{p_1}.
\]
Then $(L^{p_0} \cap L^{p_1}, \|\cdot\|)$ is a Banach space and the inclusion map of $L^{p_0} \cap L^{p_1}$ into $L^p$ is bounded, in fact
\[
\|f\|_p \le \max\left( \lambda^{-1}, (1-\lambda)^{-1} \right) \left( \|f\|_{p_0} + \|f\|_{p_1} \right). \tag{11.21}
\]
The heuristic explanation of this corollary is that if $f \in L^{p_0} \cap L^{p_1}$, then $f$ has local singularities no worse than an $L^{p_1}$ function and behavior at infinity no worse than an $L^{p_0}$ function. Hence $f \in L^p$ for any $p$ between $p_0$ and $p_1$.

Proof. Let $\lambda$ be determined as above, $a = p_0/\lambda$ and $b = p_1/(1-\lambda)$; then by Theorem 11.14,
\[
\|f\|_p = \left\| |f|^{\lambda} |f|^{1-\lambda} \right\|_p \le \left\| |f|^{\lambda} \right\|_a \left\| |f|^{1-\lambda} \right\|_b = \|f\|_{p_0}^{\lambda} \, \|f\|_{p_1}^{1-\lambda}.
\]
It is easily checked that $\|\cdot\|$ is a norm on $L^{p_0} \cap L^{p_1}$. To show this space is complete, suppose that $\{f_n\} \subset L^{p_0} \cap L^{p_1}$ is a $\|\cdot\|$ - Cauchy sequence. Then $\{f_n\}$ is both $L^{p_0}$ and $L^{p_1}$ - Cauchy. Hence there exist $f \in L^{p_0}$ and $g \in L^{p_1}$ such that $\lim_{n \to \infty} \|f - f_n\|_{p_0} = 0$ and $\lim_{n \to \infty} \|g - f_n\|_{p_1} = 0$. By Chebyshev's inequality (Lemma 11.4) $f_n \to f$ and $f_n \to g$ in measure and therefore by Theorem 11.7, $f = g$ a.e. It now is clear that $\lim_{n \to \infty} \|f - f_n\| = 0$. The estimate in Eq. (11.21) is left as Exercise 11.5 to the reader.

Remark 11.22. Combining Proposition 11.20 and Corollary 11.21 gives
\[
L^{p_0} \cap L^{p_1} \subset L^p \subset L^{p_0} + L^{p_1}
\]
for $0 < p_0 < p_1 \le \infty$, $\lambda \in (0,1)$ and $p \in (p_0,p_1)$ as in Eq. (11.19).

Corollary 11.23. Suppose now that $\mu$ is counting measure on $\Omega$. Then $L^p(\mu) \subset L^q(\mu)$ for all $0 < p < q \le \infty$ and $\|f\|_q \le \|f\|_p$.

Proof. Suppose that $0 < p < q = \infty$; then
\[
\|f\|_\infty^p = \sup \left\{ |f(x)|^p : x \in \Omega \right\} \le \sum_{x \in \Omega} |f(x)|^p = \|f\|_p^p,
\]
i.e. $\|f\|_\infty \le \|f\|_p$ for all $0 < p < \infty$. For $0 < p \le q \le \infty$, apply Corollary 11.21 with $p_0 = p$ and $p_1 = \infty$ to find
\[
\|f\|_q \le \|f\|_p^{p/q} \, \|f\|_\infty^{1-p/q} \le \|f\|_p^{p/q} \, \|f\|_p^{1-p/q} = \|f\|_p.
\]

11.4.1 Summary:
1. $L^{p_0} \cap L^{p_1} \subset L^q \subset L^{p_0} + L^{p_1}$ for any $q \in (p_0, p_1)$.
2. If $p \le q$, then $\ell^p \subset \ell^q$ and $\|f\|_q \le \|f\|_p$.
3. Since $\mu(|f| > \varepsilon) \le \varepsilon^{-p} \|f\|_p^p$, $L^p$ - convergence implies $L^0$ - convergence.
4. $L^0$ - convergence implies almost everywhere convergence for some subsequence.
5. If $\mu(\Omega) < \infty$ then almost everywhere convergence implies uniform convergence off certain sets of small measure and in particular we have $L^0$ - convergence.
6. If $\mu(\Omega) < \infty$, then $L^q \subset L^p$ for all $p \le q$ and $L^q$ - convergence implies $L^p$ - convergence.

11.5 Uniform Integrability

This section will address the question as to what extra conditions are needed in order that an $L^0$ - convergent sequence is $L^p$ - convergent. This will lead us to the notion of uniform integrability. To simplify matters a bit here, it will be assumed that $(\Omega, \mathcal{B}, \mu)$ is a finite measure space for this section.

Notation 11.24 For $f \in L^1(\mu)$ and $E \in \mathcal{B}$, let
\[
\mu(f : E) := \int_E f \, d\mu
\]
and more generally if $A, B \in \mathcal{B}$ let
\[
\mu(f : A, B) := \int_{A \cap B} f \, d\mu.
\]
When $\mu$ is a probability measure, we will often write $\mathbb{E}[f : E]$ for $\mu(f : E)$ and $\mathbb{E}[f : A, B]$ for $\mu(f : A, B)$.

116

11 Lp spaces

Denition 11.25. A collection of functions, L1 () is said to be uniformly integrable if,


a f

(|g | : E ) (|| : E ) + (|g | : E )


n n

lim sup (|f | : |f | a) = 0.

(11.22)

i=1

|ci | (E Bi ) + g

i=1

|ci | (E ) + /2.
n i=1

The condition in Eq. (11.22) implies supf f 1 < .3 Indeed, choose a suciently large so that supf (|f | : |f | a) 1, then for f f
1

This shows (|g | : E ) < provided that (E ) < (2

|ci |)

= (|f | : |f | a) + (|f | : |f | < a) 1 + a ( ) .

Proposition 11.29. A subset L1 () is uniformly integrable i L1 () is bounded is uniformly absolutely continuous. Proof. ( = ) We have already seen that uniformly integrable subsets, , are bounded in L1 () . Moreover, for f , and E B , (|f | : E ) = (|f | : |f | M, E ) + (|f | : |f | < M, E ) sup (|f | : |f | M ) + M (E ).
n

Let us also note that if = {f } with f L1 () , then is uniformly integrable. Indeed, lima (|f | : |f | a) = 0 by the dominated convergence theorem. Denition 11.26. A collection of functions, L1 () is said to be uniformly absolutely continuous if for all > 0 there exists > 0 such that sup (|f | : E ) < whenever (E ) < .
f

(11.23)

Remark 11.27. It is not in general true that if {fn } L1 () is uniformly absolutely continuous implies supn fn 1 < . For example take = {} and ({}) = 1. Let fn () = n. Since for < 1 a set E such that (E ) < is in fact the empty set and hence {fn }n=1 is uniformly absolutely continuous. However, for nite measure spaces without atoms, for every > 0 we may k nd a nite partition of by sets {E } =1 with (E ) < . If Eq. (11.23) holds with = 1, then
k

So given > 0 choose M so large that supf (|f | : |f | M ) < /2 and then take = 2M to verify that is uniformly absolutely continuous. (=) Let K := supf f 1 < . Then for f , we have (|f | a) f
1

/a K/a for all a > 0.

Hence given > 0 and > 0 as in the denition of uniform absolute continuity, we may choose a = K/ in which case sup (|f | : |f | a) < .
f

(|fn |) =
=1

(|fn | : E ) k

showing that (|fn |) k for all n. Lemma 11.28 (This lemma may be skipped.). For any g L1 (), = {g } is uniformly absolutely continuous. Proof. First Proof. If the Lemma is false, there would exist > 0 and sets En such that (En ) 0 while (|g | : En ) for all n. Since |1En g | |g | L1 and for any > 0, (1En |g | > ) (En ) 0 as n , the dominated convergence theorem of Corollary 11.8 implies limn (|g | : En ) = 0. This contradicts (|g | : En ) for all n and the proof is complete. n Second Proof. Let = i=1 ci 1Bi be a simple function such that g 1 < /2. Then
3

Since > 0 was arbitrary, it follows that lima supf (|f | : |f | a) = 0 as desired. Corollary 11.30. Suppose {fn }n=1 and {gn }n=1 are two uniformly integrable sequences, then {fn + gn }n=1 is also uniformly integrable. Proof. By Proposition 11.29, {fn }n=1 and {gn }n=1 are both bounded in L1 () and are both uniformly absolutely continuous. Since fn + gn 1 fn 1 + gn 1 it follows that {fn + gn }n=1 is bounded in L1 () as well. Moreover, for > 0 we may choose > 0 such that (|fn | : E ) < and (|gn | : E ) < whenever (E ) < . For this choice of and , we then have (|fn + gn | : E ) (|fn | + |gn | : E ) < 2 whenever (E ) < , showing {fn + gn }n=1 uniformly absolutely continuous. Another application of Proposition 11.29 completes the proof.

This is not necessarily the case if ( ) = . Indeed, if = R and = m is Lebesgue measure, the sequences of functions, fn := 1[n,n] n=1 are uniformly integrable but not bounded in L1 (m) .

Page: 116

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

11.5 Uniform Integrability

117

Exercise 11.3 (Problem 5 on p. 196 of Resnick.). Suppose that n is a sequence of integrable and i.i.d random variables. Then S n n=1 is uniformly integrable. Theorem 11.31 (Vitali Convergence Theorem). Let (, B , ) be a nite measure space, := {fn }n=1 be a sequence of functions in L1 () , and f : C be a measurable function. Then f L1 () and f fn 1 0 as n i fn f in measure and is uniformly integrable. Proof. (=) If fn f in measure and = is uniformly integrable then we know M := supn fn 1 < . Hence and application of Fatous lemma, see Exercise 11.1, |f | d lim inf
n {fn }n=1

{Xn }n=1

sup (|fn | : E ) sup (|fn | : E ) (N + (|f | : E )) N + (gN : E ) ,


n nN N n=1 1

(11.25) where gN = |f | + |fn | L . Given > 0 x N large so that N < /2 and then choose > 0 (by Lemma 11.28) such that (gN : E ) < if (E ) < . It then follows from Eq. (11.25) that sup (|fn | : E ) < /2 + /2 = when (E ) < .
n

|fn | d M < ,

Example 11.32. Let = [0, 1] , B = B[0,1] and P = m be Lebesgue measure on B . Then the collection of functions, f (x) := 2 (1 x/) 0 for (0, 1) is bounded in L1 (P ) , f 0 a.e. as 0 but 0=
0

i.e. f L1 (). One now easily checks that 0 := {f fn }n=1 is bounded in L1 () and (using Lemma 11.28 and Proposition 11.29) 0 is uniformly absolutely continuous and hence 0 is uniformly integrable. Therefore, f fn
1

lim f dP = lim
0

f dP = 1.

This is a typical example of a bounded and pointwise convergent sequence in L1 which is not uniformly integrable. Example 11.33. Let = [0, 1] , P be Lebesgue measure on B = B[0,1] , and for (0, 1) let a > 0 with lim0 a = and let f := a 1[0,] . Then Ef = a and so sup>0 f 1 =: K < i a K for all . Since sup E [f : f M ] = sup [a 1a M ] ,

= (|f fn | : |f fn | a) + (|f fn | : |f fn | < a) (a) +

1|f fn |<a |f fn | d

(11.24)

where (a) := sup (|f fm | : |f fm | a) 0 as a .


m

Since 1|f fn |<a |f fn | a L1 () and 1|f fn |<a |f fn | > (|f fn | > ) 0 as n , we may pass to the limit in Eq. (11.24), with the aid of the dominated convergence theorem (see Corollary 11.8), to nd lim sup f fn
n 1

if {f } is uniformly integrable and > 0 is given, for large M we have a for small enough so that a M. From this we conclude that lim sup0 (a ) and since > 0 was arbitrary, lim0 a = 0 if {f } is uniformly integrable. By reversing these steps one sees the converse is also true. Alternatively. No matter how a > 0 is chosen, lim0 f = 0 a.s.. So from Theorem 11.31, if {f } is uniformly integrable we would have to have lim (a ) = lim Ef = E0 = 0.
0 0

(a) 0 as a .

( = ) If fn f in L1 () , then by Chebyschevs inequality it follows that fn f in measure. Since convergent sequences are bounded, to show is uniformly integrable it suces to shows is uniformly absolutely continuous. Now for E B and n N, (|fn | : E ) (|f fn | : E ) + (|f | : E ) f fn Let N := supn>N f fn
Page: 117
1 1

Corollary 11.34. Let (, B , ) be a nite measure space, p [1, ), {fn }n=1 be a sequence of functions in Lp () , and f : C be a measurable function. Then f Lp () and f fn p 0 as n i fn f in measure and p := {|fn | }n=1 is uniformly integrable. Proof. ( = ) Suppose that fn f in measure and := {|fn | }n=1 p p is uniformly integrable. By Corollary 11.9, |fn | |f | in measure, and p p p p hn := |f fn | 0, and by Theorem 11.31, |f | L1 () and |fn | |f | in 1 L () . Since
macro: svmonob.cls date/time: 23-Feb-2007/15:20
p

+ (|f | : E ).

, then N 0 as N and

job: prob

118

11 Lp spaces

hn := |f fn | (|f | + |fn |) 2p1 (|f | + |fn | ) =: gn L1 () with gn g := 2p1 |f | in L1 () , the dominated convergence theorem in Corollary 11.8, implies f fn
p p p

Proof. 1. Let be as in item 1. above and set a := supxa a by assumption. Then for f (|f | : |f | a) = |f | (|f |) : |f | a (|f |)

x (x)

0 as

( (|f |) : |f | a)a

|f fn | d =

hn d 0 as n . and hence

( (|f |))a Ka lim sup |f | 1|f |a lim Ka = 0.


a

(=) Suppose f Lp and fn f in Lp . Again fn f in measure by Lemma 11.4. Let hn := ||fn |p |f |p | |fn |p + |f |p =: gn L1 and g := 2|f |p L1 . Then gn g, hn 0 and gn d gd. Therefore by the dominated convergence theorem in Corollary 11.8, lim hn d = 0,
n

a f

2. By assumption, a := supf |f | 1|f |a 0 as a . Therefore we may choose an such that

(n + 1) an <
n=0

i.e. |fn | |f | in L () . Hence it follows from Theorem 11.31 that is uniformly integrable. The following Lemma gives a concrete necessary and sucient conditions for verifying a sequence of functions is uniformly integrable. Lemma 11.35. Suppose that ( ) < , and L ( ) is a collection of functions. 1. If there exists a non decreasing function : R+ R+ such that limx (x)/x = and K := sup ((|f |)) <
f 0

where by convention a0 := 0. Now dene so that (0) = 0 and

(x) =
n=0

(n + 1) 1(an ,an+1 ] (x),

i.e. (x) =
0

(y )dy =
n=0

(n + 1) (x an+1 x an ) .

(11.26)

By construction is continuous, (0) = 0, (x) is increasing (so is convex) and (x) (n + 1) for x an . In particular (x) (an ) + (n + 1)x n + 1 for x an x x from which we conclude limx (x)/x = . We also have (x) (n + 1) on [0, an+1 ] and therefore (x) (n + 1)x for x an+1 . So for f ,

then is uniformly integrable. 2. Conversely if is uniformly integrable, there exists a non-decreasing continuous function : R+ R+ such that (0) = 0, limx (x)/x = and Eq. (11.26) is valid.
4

Here is an alternative proof. By the mean value theorem, ||f |p |fn |p | p(max(|f | , |fn |))p1 ||f | |fn || p(|f | + |fn |)p1 ||f | |fn || and therefore by H olders inequality,

||f |p |fn |p | d p

(|f | + |fn |)p1 ||f | |fn || d p


p

((|f |)) =
(|f | + |fn |)p1 |f fn | d
p/q p

(|f |)1(an ,an+1 ] (|f |)


n=0

p f fn p( f
p

(|f | + |fn |)p1


p/q p)

q p

= p |f | + |fn |

f fn

(n + 1) |f | 1(an ,an+1 ] (|f |)


n=0

+ fn

where q := p/(p 1). This shows that

f fn

||f |p |fn |p | d 0 as n .

n=0

(n + 1) |f | 1|f |an
n=0

(n + 1) an

Page: 118

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

11.7 Appendix: Convex Functions

119

and hence sup ((|f |))


f

(n + 1) an < .
n=0

11.6 Exercises
Exercise 11.4. Let f Lp L for some p < . Show f = limq f q . If we further assume (X ) < , show f = limq f q for all measurable functions f : X C. In particular, f L i limq f q < . Hints: Use Corollary 11.21 to show lim supq f q f and to show lim inf q f q f , let M < f and make use of Chebyshevs inequality. Exercise 11.5. Prove Eq. (11.21) in Corollary 11.21. (Part of Folland 6.3 on p. 186.) Hint: Use the inequality, with a, b 1 with a1 + b1 = 1 chosen appropriately, sa tb st + a b applied to the right side of Eq. (11.20). Exercise 11.6. Complete the proof of Proposition 11.20 by showing (Lp + Lr , ) is a Banach space.

Fig. 11.1. A convex function with three cords. Notice the slope relationships; m1 m3 m2 .

1. F (x, y ) is increasing in each of its arguments. 2. The following limits exist, + (x) := F (x, x+) := lim F (x, y ) < and
y x

(11.27) (11.28)

11.7 Appendix: Convex Functions


Reference; see the appendix (page 500) of Revuz and Yor. Denition 11.36. A function : (a, b) R is convex if for all a < x0 < x1 < b and t [0, 1] (xt ) t(x1 ) + (1 t)(x0 ) where xt = tx1 + (1 t)x0 , see Figure ?? below. Example 11.37. The functions exp(x) and log(x) are convex and |x| is convex i p 1 as follows from Lemma 7.31 for p > 1 and by inspection of p = 1. Theorem 11.38. Suppose that : (a, b) R is convex and for x, y (a, b) with x < y, let5 (y ) (x) F (x, y ) := . yx Then;
5

(y ) := F (y , y ) := lim F (x, y ) > .


xy

3. The functions, are both increasing functions and further satisfy, < (x) + (x) (y ) < a < x < y < b. 4. For any t (x) , + (x) , (y ) (x) + t (y x) for all x, y (a, b) . 5. For a < < < b, let K := max + () , ( ) . Then (11.30) (11.29)

| (y ) (x)| K |y x| for all x, y [, ] . That is is Lipschitz continuous on [, ] . 6. The function + is right continuous and is left continuous. 7. The set of discontinuity points for + and for are the same as the set of points of non-dierentiability of . Moreover this set is at most countable.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

The same formula would dene F (x, y ) for x = y. However, since F (x, y ) = F (y, x) , we would gain no new information by this extension.

Page: 119

job: prob

120

11 Lp spaces

Proof. 1. and 2. If we let ht = t(x1 ) + (1 t)(x0 ), then (xt , ht ) is on the line segment joining (x0 , (x0 )) to (x1 , (x1 )) and the statement that is convex is then equivalent of (xt ) ht for all 0 t 1. Since (x1 ) (x0 ) (x1 ) ht ht (x0 ) = = , xt x0 x1 x0 x1 xt the convexity of is equivalent to ht (x0 ) (x1 ) (x0 ) (xt ) (x0 ) = for all x0 xt x1 xt x0 xt x0 x1 x0 and to (x1 ) ht (x1 ) (xt ) (x1 ) (x0 ) = for all x0 xt x1 x1 x0 x1 xt x1 xt and convexity also implies (xt ) (x0 ) ht (x0 ) (x1 ) ht (x1 ) (xt ) = = . xt x0 xt x0 x1 xt x1 xt These inequalities may be written more compactly as, (v ) (u) (w) (u) (w) (v ) , vu wu wv (11.31)

t (x) = F (x, x) F (y, x) = or equivalently,

(x) (y ) xy

(y ) (x) t (x y ) = (x) + t (y x) for y x. Hence we have proved Eq. (11.30) for all x, y (a, b) . 5. For a < x < y < b, we have + () + (x) = F (x, x+) F (x, y ) F (y , y ) = (y ) ( ) (11.32) and in particular, K + () (y ) (x) ( ) K. yx

valid for all a < u < v < w < b, again see Figure 11.1. The rst (second) inequality in Eq. (11.31) shows F (x, y ) is increasing y (x). This then implies the limits in item 2. are monotone and hence exist as claimed. 3. Let a < x < y < b. Using the increasing nature of F, < (x) = F (x, x) F (x, x+) = + (x) < and + (x) = F (x, x+) F (y , y ) = (y ) as desired. 4. Let t (x) , + (x) . Then t + (x) = F (x, x+) F (x, y ) = or equivalently, (y ) (x) + t (y x) for y x. Therefore Eq. (11.30) holds for y x. Similarly, for y < x, (y ) (x) yx

This last inequality implies, | (y ) (x)| K (y x) which is the desired Lipschitz bound. 6. For a < c < x < y < b, we have + (x) = F (x, x+) F (x, y ) and letting x c (using the continuity of F ) we learn + (c+) F (c, y ) . We may now let y c to conclude + (c+) + (c) . Since + (c) + (c+) , it follows that + (c) = + (c+) and hence that + is right continuous. Similarly, for a < x < y < c < b, we have (y ) F (x, y ) and letting y c (using the continuity of F ) we learn (c) F (x, c) . Now let x c to conclude (c) (c) . Since (c) (c) , it follows that (c) = (c) , i.e. is left continuous. 7. Since are increasing functions, they have at most countably many points of discontinuity. Letting x y in Eq. (11.29), using the left continuity of , shows (y ) = + (y ) . Hence if is continuous at y, (y ) = (y +) = + (y ) and is dierentiable at y. Conversely if is dierentiable at y, then + (y ) = (y ) = (y ) = + (y ) which shows + is continuous at y. Thus we have shown that set of discontinuity points of + is the same as the set of points of non-dierentiability of . That the discontinuity set of is the same as the non-dierentiability set of is proved similarly. Corollary 11.39. If : (a, b) R is a convex function and D (a, b) is a dense set, then (y ) = sup [ (x) + (x) (y x)] for all x, y (a, b) .
xD

Page: 120

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

Proof. Let (y ) := supxD [ (x) + (x) (y x)] . According to Eq. (11.30) above, we know that (y ) (y ) for all y (a, b) . Now suppose that x (a, b) and xn with xn x. Then passing to the limit in the estimate, (y ) (xn ) + (xn ) (y xn ) , shows (y ) (x) + (x) (y x) . Since x (a, b) is arbitrary we may take x = y to discover (y ) (y ) and hence (y ) = (y ) . The proof that (y ) = + (y ) is similar.

Part III

Convergence Results

12 Laws of Large Numbers


In this chapter {Xk }k=1 will be a sequence of random variables on a probability space, (, B , P ) , and we will set Sn := X1 + + Xn for all n N. Denition 12.1. The covariance, Cov (X, Y ) of two square integrable random variables, X and Y, is dened by Cov (X, Y ) = E [(X aX ) (Y aY )] = E [XY ] EX EY where aX := EX and aY := EY. The variance of X, Var (X ) := Cov (X, X ) = E X
2

Exercise 12.1 (A correlation inequality). Suppose that X is a random variable and f, g : R R are two increasing functions such that both f (X ) and g (X ) are square integrable. Show Cov (f (X ) , g (X )) 0. Hint: let Y be another random variable which has the same law as X and is independent of X. Then consider E [(f (Y ) f (X )) (g (Y ) g (X ))] .

(EX )

(12.1)

We say that X and Y are uncorrelated if Cov (X, Y ) = 0, i.e. E [XY ] = n EX EY. More generally we say {Xk }k=1 L2 (P ) are uncorrelated i Cov (Xi , Xj ) = 0 for all i = j. Notice that if X and Y are independent random variables, then f (X ) , g (Y ) are independent and hence uncorrelated for any choice of Borel measurable functions, f, g : R R such that f (X ) and g (X ) are square integrable. It also follows from Eq. (12.1) that Var (X ) E X 2 for all X L2 (P ) . (12.2)

Theorem 12.3 (An L2 Weak Law of Large Numbers). Let {Xn }n=1 be a sequence of uncorrelated square integrable random variables, n = EXn and 2 = Var (Xn ) . If there exists an increasing positive sequence, {an } and R n such that 1 an 1 a2 n then
Sn an n

j as n and
j =1 n 2 j 0 as n , j =1

in L2 (P ) and also in probability.


n j =1

Lemma 12.2. The covariance function, Cov (X, Y ) is bilinear in X and Y and Cov (X, Y ) = 0 if either X or Y is constant. For any constant k, Var (X + k ) = n Var (X ) and Var (kX ) = k 2 Var (X ) . If {Xk }k=1 are uncorrelated L2 (P ) random variables, then
n

Proof. We rst observe that ESn = E S n


j =1 n

j and
n

2 j = Var (Sn ) =

Var (Xj ) =
j =1 j =1

2 j .

Var (Sn ) =
k=1

Var (Xk ) . Hence ESn = and Sn E Hence,


n j =1

Proof. We leave most of this simple proof to the reader. As an example of the type of argument involved, let us prove Var (X + k ) = Var (X ) ; Var (X + k ) = Cov (X + k, X + k ) = Cov (X + k, X ) + Cov (X + k, k ) = Cov (X + k, X ) = Cov (X, X ) + Cov (k, X ) = Cov (X, X ) = Var (X ) .

1 an j

j
j =1 2

an

1 a2 n

n 2 j 0. j =1

126

12 Laws of Large Numbers

Sn an

=
L2 (P )

Sn

n j =1

an Sn
n j =1

n j =1

an +

L2 (P ) n j =1

Theorem 12.6 (Khintchins WLLN). If {Xn }n=1 are i.i.d. L1 (P ) random variables, then Proof. Letting 0. Sn :=
i=1 n P 1 n Sn

= EX1 .

j
L2 (P )

an

an

Xi 1|Xi |n ,

Example 12.4. Suppose that {Xk }k=1 L2 (P ) are uncorrelated identically distributed random variables. Then Sn n
L2 (P )

we have {Sn = Sn } n i=1 {|Xi | > n} . Therefore, using Chebyschevs inequality along with the dominated convergence theorem, we have
n

= EX1 as n .

P (Sn = Sn )
i=1

P (|Xi | > n) = nP (|X1 | > n)

To see this, simply apply Theorem 12.3 with an = n. Proposition 12.5 (L2 - Convergence of Random Sums). Suppose that {Xk }k=1 L2 (P ) are uncorrelated. If k=1 Var (Xk ) < then

E [|X1 | : |X1 | > n] 0. Hence it follows that P i.e.


Sn n

S Sn n > n n

P (Sn = Sn ) 0 as n ,
Sn P n

(Xk k ) converges in L2 (P ) .
k=1

Sn P n

0. So it suces to prove

.
2 Sn L (P ) n

where k := EXk . Proof. Letting Sn := k=1 (Xk k ) , it suces by the completeness of L (P ) (see Theorem 11.17) to show Sn Sm 2 0 as m, n . Supposing n > m, we have
2 n 2 n

We will now complete the proof by showing that, in fact, this end, let n := 1 1 ESn = n n
n

. To

E Xi 1|Xi |n = E X1 1|X1 |n
i=1

Sn

2 Sm 2

=E
k=m+1 n

(Xk k )
n 2 k 0 as m, n . k=m+1

and observe that limn n = by the DCT. Moreover, E Sn n n


2

= Var = = 1 n2
n

=
k=m+1

Var (Xk ) =

Sn n

1 Var (Sn ) n2

Var Xi 1|Xi |n
i=1

Note well: since L2 (P ) convergence implies Lp (P ) convergence for 0 p 2, where by L0 (P ) convergence we mean convergence in probability. The remainder of this chapter is mostly devoted to proving a.s. convergence for the quantities in Theorem 11.17 and Proposition 12.5 under various assumptions. These results will be described in the next section.

1 1 2 Var X1 1|X1 |n E X1 1|X1 |n n n E |X1 | 1|X1 |n and so again by the DCT, Sn n


L2 (P ) Sn n

L2 (P )

0. This completes the proof since, + |n | 0 as n .

12.1 Main Results


The proofs of most of the theorems in this section will be the subject of later parts of this chapter.
Page: 126 job: prob

Sn n n

L2 (P )

In fact we have the stronger result.


macro: svmonob.cls date/time: 23-Feb-2007/15:20

12.1 Main Results

127

Theorem 12.7 (Kolmogorovs Strong Law of Large Numbers). Suppose that {Xn }n=1 are i.i.d. random variables and let Sn := X1 + + Xn . Then 1 there exists R such that n Sn a.s. i Xn is integrable and in which case EXn = .
1 Remark 12.8. If E |X1 | = but EX1 < , then n Sn a.s. To prove this, n M M M for M > 0 let Xn := Xn M and Sn := i=1 Xi . It follows from Theorem 1 M M M Sn M := EX1 a.s.. Since Sn Sn , we may conclude that 12.7 that n

Theorem 12.11 (Kolmogorovs Convergence Criteria). Suppose that {Yn }n=1 are independent square integrable random variables. If j =1 Var (Yj ) < , then j =1 (Yj EYj ) converges a.s. Proof. One way to prove this is to appeal Proposition 12.5 above and L evys Theorem 12.31 below. As second method is to make use of Kolmogorovs inequality. We will give this second proof below. The next theorem generalizes the previous theorem by giving necessary and sucient conditions for a random series of independent random variables to converge. Theorem 12.12 (Kolmogorovs Three Series Theorem). Suppose that {Xn }n=1 are independent random variables. Then the random series, j =1 Xj , is almost surely convergent i there exists c > 0 such that 1. 2. 3.
n=1 n=1 n=1

lim inf
n

1 M Sn lim inf Sn = M a.s. n n n


Sn n

Since M as M , it follows that lim inf n n that limn S n = a.s.

= a.s. and hence

One proof of Theorem 12.7 is based on the study of random series. Theorem 12.11 and 12.12 are standard convergence criteria for random series. Denition 12.9. Two sequences, {Xn } and {Xn } , of random variables are tail equivalent if

P (|Xn | > c) < , Var Xn 1|Xn |c < , and E Xn 1|Xn |c converges.

Moreover, if the three series above converge for some c > 0 then they converge for all values of c > 0. Proof. Proof of suciency. Suppose the three series converge for some c > 0. If we let Xn := Xn 1|Xn |c , then

E
n=1

1Xn =Xn =
n=1

P (Xn = Xn ) < .

Proposition 12.10. Suppose {Xn } and {Xn } are tail equivalent. Then 1. (Xn Xn ) converges a.s. 2. The sum Xn is convergent a.s. i the sum generally we have P Xn is convergent Xn is convergent a.s. More

P (Xn = Xn ) =
n=1 n=1

P (|Xn | > c) < .

Xn is convergent

=1

Hence {Xn } and {Xn } are tail equivalent and so it suces to show n=1 Xn is almost surely convergent. However, by the convergence of the second series we learn

3. If there exists a random variable, X , and a sequence an such that 1 n an lim then 1 n an lim
n

Var (Xn ) =
n=1 n=1

Var Xn 1|Xn |c <

Xk = X a.s
k=1

and so by Kolmogorovs convergence criteria,

(Xn EXn ) is almost surely convergent.


n n=1

Xk = X a.s
k=1

Proof. If {Xn } and {Xn } are tail equivalent, we know; for a.e. , Xn ( ) = Xn ( ) for a.a n. The proposition is an easy consequence of this observation.

Finally, the third series guarantees that n=1 EXn = n=1 E Xn 1|Xn |c is convergent, therefore we may conclude n=1 Xn is convergent. The proof of the reverse direction will be given in Section 12.8 below.

Page: 127

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

128

12 Laws of Large Numbers

12.2 Examples
12.2.1 Random Series Examples Example 12.13 (Kolmogorovs Convergence Criteria Example). Suppose that {Yn }n=1 are independent square integrable random variables, such that j =1 Var (Yj ) < and j =1 EYj converges a.s., then j =1 Yj converges a.s.. Denition 12.14. A random variable, Y, is normal with mean standard deviation 2 i P (Y B ) = 1 2 2
B d

>
n=1

P (|n N + n | > c) =

1 2 n=1

e 2 x dx
Bn

(12.4)

where Bn = (,

c + n ) n

c n , . n

e 22 (y) dy for all B BR .

(12.3)

If limn n = 0 then there is a c > 0 such that either n c i.o. or n c i.o. In the rst case in which case (0, ) Bn and in the second (, 0) 1 2 e 2 x dx 1/2 i.o. which would Bn and in either case we will have 1 2 Bn contradict Eq. (12.4). Hence we may concluded that limn n = 0. Similarly if limn n = 0, then we may conclude that Bn contains a set of the form [, ) i.o. for some < and so 1 2
1 2 1 e 2 x dx 2 Bn

We will abbreviate this by writing Y = N , 2 . When = 0 and 2 = 1 we will simply write N for N (0, 1) and if Y = N, we will say Y is a standard normal random variable. Observe that Eq. (12.3) is equivalent to writing E [f (Y )] = 1 2 2
R d d

e 2 x dx i.o.

which would again contradict Eq. (12.4). Therefore we may conclude that limn n = limn n = 0. 2. The convergence of the second series for all c > 0 implies

f (y ) e 22 (y) dy >

Var Yn 1|Yn |c =
n=1 n=1

Var [n N + n ] 1|n N +n |c , i.e.

for all bounded measurable functions, f : R R. Also observe that Y = d N , 2 is equivalent to Y = N +. Indeed, by making the change of variable, y = x + , we nd 1 E [f (N + )] = 2 1 = 2 f (x + ) e 2 x dx
R
1 2

>
n=1

2 n Var N 1|n N +n |c + 2 n Var 1|n N +n |c

n=1

2 n n .

f (y ) e 22 (y)
R

dy 1 = 2 2
j =1

f (y ) e 22 (y) dy.
R

where n := Var N 1|n N +n |c . As the reader should check, n 1 as 2 n and therefore we may conclude n=1 n < . It now follows by Kol mogorovs convergence criteria that n=1 (Yn n ) is almost surely convergent and therefore

Lemma 12.15. Suppose that {Yn }n=1 are independent square integrable ran2 dom variables such that Yn = N n , n . Then 2 converges. < and j j =1 j j =1 d

n =
n=1 n=1

Yn
n=1

(Yn n )

Yj converges a.s. i

Proof. The implication = is true without the assumption that the Yn are normal random variables as pointed out in Example 12.13. To prove the converse directions we will make use of the Kolmogorovs three series theo rem. Namely, if j =1 Yj converges a.s. then the three series in Theorem 12.12 converge for all c > 0. d 1. Since Yn = n N + n , we have for any c > 0 that

converges as well. Alternatively: we may also deduce the convergence of third series as well. Indeed, for all c > 0 implies

n=1

n by the

E [n N + n ] 1|n N +n |c
n=1

is convergent, i.e.

[n n + n n ] is convergent.
n=1

Page: 128

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

12.2 Examples

129

where n := E N 1|n N +n |c and n := E 1|n N +n |c . With a little eort one can show, n ek/n and 1 n ek/n for large n.
2 Since ek/n Cn for large n, it follows that so that n=1 n n is convergent. Moreover,
2 2 2

1 0

d sin dt

k t 2

dt = =

k2 2 22

cos
0

k t 2

dt
1

k 2 2 2 k 1 t + sin kt 22 k 4 4

=
0

k2 2 . 23

n=1

|n n | C

n=1

3 n <

|n (n 1)| C
n=1 n=1

2 |n | n <

Fact: Wiener in 1923 showed the series in Eq. (12.5) is in fact almost surely uniformly convergent. Given this, the process, t Bt is almost surely continuous. The process {Bt : 0 t 1} is Brownian Motion. Example 12.17. As a simple application of Theorem 12.12, we will now use Theorem 12.12 to give a proof of Theorem 12.11. We will apply Theorem 12.12 with Xn := Yn EYn . We need to then check the three series in the statement of Theorem 12.12 converge. For the rst series we have by the Markov inequality,

and hence

n =
n=1 n=1

n n
n=1

n (n 1)

must also be convergent. Example 12.16 (Brownian Motion). Let dom variable, i.e. P (Nn A) =
A {Nn }n=1

P (|Xn | > c) be i.i.d. standard normal ran n=1

1 1 2 E |Xn | = 2 2 c c n=1

Var (Yn ) < .


n=1

For the second series, observe that


2 1 ex /2 dx for all A BR . 2

Var Xn 1|Xn |c
n=1 n=1

Xn 1|Xn |c

n=1

2 Xn

=
n=1

Var (Yn ) <

Let {n }n=1 R, {an }n=1 R, and t R, then

and for the third series (by Jensens or H olders inequality)


an Nn sin n t converges a.s.


n=1 n=1

E Xn 1|Xn |c

n=1

E |Xn | 1|Xn |c
n=1

Var (Yn ) < .

provided n=1 a2 n < . This is a simple consequence of Kolmogorovs convergence criteria, Theorem 12.11, and the facts that E [an Nn sin n t] = 0 and
2 2 Var (an Nn sin n t) = a2 n sin n t an .

12.2.2 A WLLN Example Let {Xn }n=1 be i.i.d. random variables with common distribution function, F (x) := P (Xn x) . For x R let Fn (x) be the empirical distribution function dened by, n n 1 1 Fn (x) := 1X x = X ((, x]) . n j =1 j n j =1 j Since E1Xj x = F (x) and 1Xj x
j =1

As a special case, if we take n = (2n 1) 2 and an = that 2 2 Nk Bt := sin k t k 2 is a.s. convergent for all t R.
1 0 k=1,3,5,... The factor 2k2

2 (2n1) ,

then it follows (12.5)

has been determined by requiring,


2

are Bernoulli random variables, the


P

d 2 2 sin (kt) dt k

dt = 1

weak law of large numbers implies Fn (x) F (x) as n . As usual, for p (0, 1) let F (p) := inf {x : F (x) p} and recall that F (p) x i F (x) p. Let us notice that

as seen by,
Page: 129 job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

130

12 Laws of Large Numbers

Fn (p) = inf {x : Fn (x) p} = inf x :

1Xj x np
j =1

and hence,
P (F (p) Fn (p) ) = P (Fn (F (p) ) F (F (p) ) p F (F (p) )) = P (Fn (F (p) ) F (F (p) ) ) 0 as n .

= inf {x : # {j n : Xj x} np} . The order statistic of ( n) ( n) ( n) X1 , X 2 , . . . , Xn , where (X1 , . . . , Xn ) is the nite sequence, ( n) ( n) ( n) X1 , X 2 , . . . , Xn denotes (X1 , . . . , Xn )
( n)

Thus we have shown that X

( n) np

F (p) as n .

arranged in increasing order with possible repetitions. Let us observe that Xk ( n) are all random variables for k n. Indeed, Xk x i # {j n : Xj x} k n i j =1 1Xj x k, i.e. Xk
( n)

12.3 Strong Law of Large Number Examples


Example 12.19 (Renewal Theory). Let {Xi }i=1 be i.i.d. random variables with 0 < Xi < a.s. Think of the Xi as the time that bulb number i burns and Tn := X1 + + Xn is the time that the nth bulb burns out. (We assume the bulbs are replaced immediately on burning out.) Further let Nt := sup {n 0 : Tn t} denote the number of bulbs which have burned out up to time n. By convention, we set T0 = 0. Letting := EX1 (0, ], we have ETn = n the expected time the nth bulb burns out. On these grounds we expect Nt t/ and hence 1 1 Nt a.s. (12.6) t
1 To prove Eq. (12.6), by the SSLN, if 0 := limn n Tn = then P (0 ) = 1. From the denition of Nt , TNt t < TNt +1 and so

x =

1Xj x k
j =1

B.

Moreover, if we let x = min {n Z : n x} , the reader may easily check that ( n) (p) = X np . Fn Proposition 12.18. Keeping the notation above. Suppose that p (0, 1) is a point where F (F (p) ) < p < F (F (p) + ) for all > 0
(p) F (p) as n . Thus we can recover, with high then X np = Fn n th probability, the p quantile of the distribution F by observing {Xi }i=1 . ( n) P

Proof. Let > 0. Then


{Fn (p) F (p) > } = {Fn (p) + F (p)} = {Fn (p) + F (p)} = {Fn ( + F (p)) p} c

t TNt +1 TNt < . Nt Nt Nt Since Xi > 0 a.s., 1 := {Nt as t } also has full measure and for 0 1 we have = lim TNt () ( ) TNt ()+1 ( ) Nt ( ) + 1 t lim lim = . t t Nt ( ) Nt ( ) Nt ( ) + 1 Nt ( )

so that
{Fn (p) F (p) > } = {Fn (F (p) + ) < p} = {Fn ( + F (p)) F ( + F (p)) < p F (F (p) + )} .

Letting := F (F (p) + ) p > 0, we have, as n , that


P ({Fn (p) F (p) > }) = P (Fn ( + F (p)) F ( + F (p)) < ) 0.

Example 12.20 (Renewal Theory II). Let {Xi }i=1 be i.i.d. and {Yi }i=1 be i.i.d. with {Xi }i=1 being independent of the {Yi }i=1 . Also again assume that 0 < Xi < and 0 < Yi < a.s. We will interpret Yi to be the amount of time the ith bulb remains out after burning out before it is replaced by bulb number i + 1. Let Rt be the amount of time that we have a working bulb in the time interval [0, t] . We are now going to show 1 EX1 lim Rt = . t EX1 + EY1

Similarly, let := p F (F (p) ) > 0 and observe that


{F (p) Fn (p) } = {Fn (p) F (p) } = {Fn (F (p) ) p}

Page: 130

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

12.3 Strong Law of Large Number Examples

131

To prove this, now let Tn := (Xi + Yi ) be the time that the nth bulb is replaced and Nt := sup {n 0 : Tn t} denote the number of bulbs which have burned out up to time n. Then Rt = Nt 1 1 i=1 Xi . Setting = EX1 and = EY1 , we now have t Nt + a.s. so that 1 Nt = + t + o (t) a.s. Therefore, by the strong law of large numbers, 1 1 Rt = t t Nt 1 Xi = t Nt i=1
Nt Nt

n i=1

and x0 = . Observe that it is possible that xi = xi+1 for some of the i. This can occur when F has jumps of size greater than 1/k.

Xi
i=1

1 a.s. +

Theorem 12.21 (Glivenko-Cantelli Theorem). Suppose that {Xn }n=1 are n 1 i.i.d. random variables and F (x) := P (Xi x) . Further let n := n i=1 Xi be the empirical distribution with empirical distribution function, Fn (x) := n ((, x]) = Then
n xR

1 n

1Xi x .
i=1

lim sup |Fn (x) F (x)| = 0 a.s. Now suppose i has been chosen so that xi < xi+1 and let x (xi , xi+1 ) . Further let N ( ) N be chosen so that |Fn (xi ) F (xi )| < 1/k and |Fn (xi ) F (xi )| < 1/k . for n N ( ) and i = 1, 2, . . . , k 1 and k with P (k ) = 1. We then have Fn (x) Fn (xi+1 ) F (xi+1 ) + 1/k F (x) + 2/k and Fn (x) Fn (xi ) F (xi ) 1/k F (xi+1 ) 2/k F (x) 2/k. From this it follows that |F (x) Fn (x)| 2/k and we have shown for k and n N ( ) that sup |F (x) Fn (x)| 2/k.
xR

Proof. Since {1Xi x }i=1 are i.i.d random variables with E1Xi x = P (Xi x) = F (x) , it follows by the strong law of large numbers the limn Fn (x) = F (x) a.s. for each x R. Our goal is to now show that this convergence is uniform.1 To do this we will use one more application of the strong law of large numbers applied to {1Xi <x } which allows us to conclude, for each x R, that
n

lim Fn (x) = F (x) a.s. (the null set depends on x).

i Given k N, let k := and let xi := k : i = 1, 2, . . . , k 1 inf {x : F (x) i/k } for i = 1, 1, 2, . . . , k 1. Let us further set xk =
1

Observation. If F is continouous then, by what we have just shown, there is a set 0 such that P (0 ) = 1 and on 0 , Fn (r) F (r) for all r Q. Moreover on 0 , if x R and r x s with r, s Q, we have F (r) = lim Fn (r) lim inf Fn (x) lim sup Fn (x) lim Fn (s) = F (s) .
n n n n

We may now let s x and r x to conclude, on 0 , on F (x) lim inf Fn (x) lim sup Fn (x) F (x) for all x R,
n n

Hence it follows on 0 := k=1 k (a set with P (0 ) = 1) that


n xR

lim sup |Fn (x) F (x)| = 0.

i.e. on 0 , limn Fn (x) = F (x) . Thus, in this special case we have shown o a xed null set independent of x that limn Fn (x) = F (x) for all x R. Page: 131 job: prob macro: svmonob.cls

date/time: 23-Feb-2007/15:20

132

12 Laws of Large Numbers


n

Example 12.22 (Shannons Theorem). Let {Xi }i=1 be a sequence of i.i.d. random variables with values in {1, 2, . . . , r} N. Let p (k ) := P (Xi = k ) > 0 for 1 k r. Further, let n ( ) = p (X1 ( )) . . . p (Xn ( )) be the probability of the realization, (X1 ( ) , . . . , Xn ( )) . Since {ln p (Xi )}i=1 are i.i.d., 1 1 ln n = n n
n r

P (|Xk | > n) 0
k=1

(12.7)

and 1 n2 then

n 2 E Xk : |Xk | n 0, k=1

(12.8)

ln p (Xi ) E [ln p (X1 )] =


i=1 k=1

p (k ) ln p (k ) =: H (p) .

In particular if > 0, P H+ 1 ln n > n

H = =

1 n

ln n > 0 as n . Since

Sn an P 0. n

H+

1 1 ln n > H + ln n < n n 1 1 ln n > H + ln n < H n n

Proof. A key ingredient in this proof and proofs of other versions of the law of large numbers is to introduce truncations of the {Xk } . In this case we consider
n

Sn :=
k=1

Xk 1|Xk |n .

= n > en(H +) n < en(H ) and H 1 ln n > n


c

Since {Sn = Sn } n k=1 {|Xk | > n} , P S an Sn an n > n n =P Sn Sn > n


n

= n > en(H +)

n < en(H )

= n en(H +) n en(H ) = en(H +) n en(H ) , it follows that P en(H +) n en(H ) 1 as n . Thus the probability, n , that the random sample {X1 , . . . , Xn } should occur is approximately enH with high probability. The number H is called the entropy r of the distribution, {p (k )}k=1 . E Hence it suces to show
Sn an n L (P )
2

P (Sn = Sn )
k=1 Sn an P n

P (|Xk | > n) 0 as n .

0 as n and for this it suces to show,

0 as n . Observe that ESn = an and therefore, Sn an n


2

1 1 Var (Sn ) = 2 n2 n 1 n2
n

Var Xk 1|Xk |n
k=1

2 E Xk 1|Xk |n 0 as n . k=1

12.4 More on the Weak Laws of Large Numbers


Theorem 12.23 (Weak Law of Large Numbers). Suppose that n is a sequence of independent random variables. Let Sn := j =1 Xj and
n {Xn }n=1

We now verify the hypothesis of Theorem 12.23 in three situations. Corollary 12.24. If {Xn }n=1 are i.i.d. L2 (P ) random variables, then
1 n Sn P

= EX1 .

an :=
k=1

E (Xk : |Xk | n) = nE (X1 : |X1 | n) .

Proof. By the dominated convergence theorem, an 1 := n n


n

If
Page: 132 job: prob

E (Xk : |Xk | n) = E (X1 : |X1 | n) .


k=1

(12.9)

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

12.5 Maximal Inequalities

133

Moreover, 1 n2
n 2 E Xk : |Xk | n = k=1

Proof. To prove this we observe that 1 1 2 2 E X1 : |X1 | n E X1 0 as n n n E |X | : |X | n = E 2


n 2

10x|X |n xdx = 2
n

P (0 x |X | n) xdx (x) dx.

and by Chebyschevs inequality,


n

2
0

xP (|X | x) dx = 2
0

P (|Xk | > n) = nP (|X1 | > n) n


k=1

1 2 E |X1 | 0 as n . n2

Now given > 0, let M = M () be chosen so that (x) for x M. Then E |X | : |X | n = 2


0 2 M n

With these observations we may now apply Theorem 12.23 to complete the proof. Corollary 12.25 (Khintchins WLLN). If dom variables, then
P 1 n Sn {Xn }n=1

(x) dx + 2
M

(x) dx 2KM + 2 (n M )

are i.i.d. L (P ) ran-

= EX1 .

where K = sup { (x) : x 0} . Dividing this estimate by n and then letting n shows 1 2 lim sup E |X | : |X | n 2. n n Since > 0 was arbitrary, the proof is complete. Corollary 12.27 (Fellers WLLN). If {Xn }n=1 are i.i.d. and (x) := xP (|X1 | > x) 0 as x , then the hypothesis of Theorem 12.23 are satised. Proof. Since
n

Proof. Again we have by Eq. (12.9), Chebyschevs inequality, and the dominated convergence theorem, that
n

k=1

1 P (|Xk | > n) = nP (|X1 | > n) n E [|X1 | : |X1 | > n] 0 as n . n

Also 1 n2
n

k=1

|X1 | 1 2 2 1|X1 |n E Xk : |Xk | n = E |X1 | : |X1 | n = E |X1 | n n

P (|Xk | > n) = nP (|X1 | > n) = (n) 0 as n ,


k=1

and the latter expression goes to zero as n by the dominated convergence theorem, since |X1 | |X1 | 1|X1 |n |X1 | L1 (P ) n
1| and limn |X1 | |X n 1|X1 |n = 0. Hence again the hypothesis of Theorem 12.23 have been veried.

Eq. (12.7) is satised. Eq. (12.8), follows from Lemma 12.26 and the identity, 1 n2
n 2 E Xk : |Xk | n = k=1

1 2 E |X1 | : |X1 | n . n

Lemma 12.26. Let X be a random variable such that (x) := xP (|X | x) 0 as x , then 1 2 lim E |X | : |X | n = 0. (12.10) n n Note: If X L1 (P ) , then by Chebyschevs inequality and the dominated convergence theorem, (x) E [|X | : |X | x] 0 as x .

12.5 Maximal Inequalities


Theorem 12.28 (Kolmogorovs Inequality). Let {Xn } be a sequence of independent random variables with mean zero, Sn := X1 + + Xn , and Sn = maxj n |Sj | . Then for any > 0 we have
P (SN )

1 2 : |SN | . E SN 2

Page: 133

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

134

12 Laws of Large Numbers

Proof. Let J = inf {j : |Sj | } with the inmum of the empty set being taken to be equal to . Observe that {J = j } = {|S1 | < , . . . , |Sj 1 | < , |Sj | } (X1 , . . . , Xj ) . Now
N

Proof. (The proof of this Corollary may be skipped. We will give another proof in Corollary 12.36 below.) From Theorem 12.28, we have for every > 0 that P
SN Np = P (SN N p )

1 C 1 2 E SN = 2 2p CN = 2 (2p1) . 2 N 2p N N

2 SN

|SN |

> =E
N

2 SN

:J N =
j =1

E
2

2 SN

Hence if we suppose that Nn = n with (2p 1) > 1, then we have :J =j

P
n=1

SN n p Nn

n=1

C 2 n(2p1)

<

=
j =1 N

E (Sj + SN Sj ) : J = j
2 E Sj + (SN Sj ) + 2Sj (SN Sj ) : J = j j =1 N 2

and so by the rst Borel Cantelli lemma we have P


SN n p for n i.o. Nn S

=
( )

= 0.

E
j =1 N

2 Sj

+ (SN Sj ) : J = j
N 2 j =1 2 (|SN |

Nn From this it follows that limn N p = 0 a.s. n To nish the proof, for m N, we may choose n = n (m) such that

j =1

2 Sj

n = Nn m < Nn+1 = (n + 1) . > ) . Since


SN n(m) p Nn (m)+1 SN Sm n(m)+1 p p m Nn (m)

:J =j

P [J = j ] = P

The equality, () , is a consequence of the observations: 1) 1J =j Sj is (X1 , . . . , Xj ) measurable, 2) (Sn Sj ) is (Xj +1 , . . . , Xn ) measurable and hence 1J =j Sj and (Sn Sj ) are independent, and so 3) E [Sj (SN Sj ) : J = j ] = E [Sj 1J =j (SN Sj )] = E [Sj 1J =j ] E [SN Sj ] = E [Sj 1J =j ] 0 = 0.

and Nn+1 /Nn 1 as n , it follows that 0 = lim S lim m p p m N m N m mp n(m) n(m)+1 SN SN n(m)+1 n(m)+1 lim = lim = 0 a.s. p p m N m N n(m) n(m)+1 = lim = 0 a.s.
SN n(m) SN n(m)

Corollary 12.29 (L2 SSLN). Let {Xn } be a sequence of independent rann 2 dom variables with mean zero, and 2 = EXn < . Letting Sn = k=1 Xk and p > 1/2, we have 1 Sn 0 a.s. np If {Yn } is a sequence of independent random variables EYn = and 2 = Var (Xn ) < , then for any (0, 1/2) , 1 n
n

That is limm
Sm mp

Theorem 12.30 (Skorohods Inequality). Let {Xn } be a sequence of independent random variables and let > 0. Let Sn := X1 + + Xn . Then for all > 0, P (|SN | > ) (1 cN ()) P max |Sj | > 2 ,
j N

Yk = O
k=1

1 n

. where
j N

cN () := max P (|SN Sj | > ) .

Page: 134

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

12.5 Maximal Inequalities

135

Proof. Our goal is to compute P max |Sj | > 2 .


j N

To this end, let J = inf {j : |Sj | > 2} with the inmum of the empty set being taken to be equal to . Observe that {J = j } = {|S1 | 2, . . . , |Sj 1 | 2, |Sj | > 2} and therefore max |Sj | > 2
j N N

Proof. Let Sn := Xk . Since almost sure convergence implies convergence in probability, it suces to show; if Sn is convergent in probability then Sn is almost surely convergent. Given M M, let QM := supnM |Sn SM | and for M < N, let QM,N := supM nN |Sn SM | . Given (0, 1) , by assumption, there exists M = M () N such that maxM j N P (|SN Sj | > ) < for all N M. An application of Skorohods inequality, then shows P (QM,N 2) P (|SN SM | > ) . (1 maxM j N P (|SN Sj | > )) 1

n k=1

Since QM,N QM as N , we may conclude {J = j } . P (QM 2) Since, M := sup |Sn Sm | sup [|Sn SM | + |SM Sm |] = 2QM
m,nM m,nM 1

=
j =1

. 1

Also observe that on {J = j } , |SN | = |SN Sj + Sj | |Sj | |SN Sj | > 2 |SN Sj | . Hence on the {J = j, |SN Sj | } we have |SN | > , i.e. {J = j, |SN Sj | } {|SN | > } for all j N. Hence ti follows from this identity and the independence of {Xn } that
N

we may further conclude, P (M > 4)


P

and since > 0 is arbitrary, it

follows that M 0 as M . Moreover, since M is decreasing in M, it P follows that limM M =: exists and because M 0 we may concluded that = 0 a.s. Thus we have shown
m,n

P (|SN | > )
j =1 N

P (J = j, |SN Sj | )

lim |Sn Sm | = 0 a.s.

=
j =1

P (J = j ) P (|SN Sj | ) .

and therefore {Sn }n=1 is almost surely Cauchy and hence almost surely convergent. Proposition 12.32 (Reection Principle). Let X be a separable Banach
N d

Under the assumption that P (|SN Sj | > ) c for all j N, we nd P (|SN Sj | ) 1 c and therefore,
N

space and {i }i=1 be independent symmetric (i.e. i = i ) random variables k with values in X. Let Sk := i=1 i and Sk := supj k Sj with the convention that S0 = 0. Then P (SN r) 2P ( SN r) . (12.11) Proof. Since

P (|SN | > )
j =1

P (J = j ) (1 c) = (1 c) P

max |Sj | > 2 .


j N

{SN r} =

N j =1

Sj r, Sj 1 < r ,

As an application of Theorem 12.30 we have the following convergence result. Theorem 12.31 (L evys Theorem). Suppose that {Xn }n=1 are i.i.d. random variables then n=1 Xn converges in probability i n=1 Xn converges a.s.
Page: 135 job: prob

P (SN r) = P (SN r, SN r) + P (SN r, = P ( SN r) + P (SN r, SN < r).

SN < r ) (12.12)

where
macro: svmonob.cls date/time: 23-Feb-2007/15:20

136

12 Laws of Large Numbers


N P (SN

r,

SN < r ) =
j =1

P ( Sj r,

Sj 1

< r,

SN < r).

(12.13)

By symmetry and independence we have


P ( Sj r, Sj 1 < r, SN < r) = P ( Sj r, Sj 1 < r,

Proof. First proof. By Proposition 12.5, the sum, j =1 (Yj EYj ) , is L2 (P ) convergent and hence convergent in probability. An application of L evys Theorem 12.31 then shows j =1 (Yj EYj ) is almost surely convergent. n Second proof. Let Sn := j =1 Xj where Xj := Yj EYj . According to Kolmogorovs inequality, Theorem 12.28, for all M < N, P max |Sj SM | 1 1 2 E (SN SM ) = 2 2 1 2
N N 2 E Xj j =M +1

Sj +
k>j

k < r )

M j N

= P ( Sj r, Sj 1 < r, = P ( Sj r, Sj 1 < r,

Sj
k>j

k < r )

Var (Xj ) .
j =M +1

2Sj SN < r).

Letting N in this inequality shows, with QM := supj M |Sj SM | , P (QM ) Since 1 2

If Sj r and 2Sj SN < r, then r > 2Sj SN 2 Sj SN 2r SN and hence SN > r. This shows, Sj r,
Sj 1

Var (Xj ) .
j =M +1

< r,

2S j S N < r

Sj r,

Sj 1

< r,

SN > r

M := sup |Sj Sk | sup [|Sj SM | + |SM Sk |] 2QM


j,kM j,kM

and therefore,
P ( Sj r, Sj 1 < r, SN < r) P ( Sj r, Sj 1 < r,

SN > r).

we may further conclude, P (M 2) 1 2

Combining the estimate with Eq. (12.13) gives


N P (SN r,

Var (Xj ) 0 as M ,
j =M +1

SN < r )
j =1

P ( Sj r, Sj 1 < r,

SN > r )

= P (SN r,

SN > r) P ( SN r).

i.e. M 0 as M . Since M is decreasing in M, it follows that P limM M =: exists and because M 0 we may concluded that = 0 a.s. Thus we have shown
m,n

This estimate along with the estimate in Eq. (12.12) completes the proof of the theorem.

lim |Sn Sm | = 0 a.s.

12.6 Kolmogorovs Convergence Criteria and the SSLN


We are now in a position to prove Theorem 12.11 which we restate here. Theorem 12.33 (Kolmogorovs Convergence Criteria). Suppose that {Yn }n=1 are independent square integrable random variables. If j =1 Var (Yj ) < , then j =1 (Yj EYj ) converges a.s.

and therefore {Sn }n=1 is almost surely Cauchy and hence almost surely convergent. Lemma 12.34 (Kroneckers Lemma). Suppose that {xk } R and {ak } k (0, ) are sequences such that ak and k=1 x ak exists. Then 1 n an lim
n

xk = 0.
k=1

Page: 136

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

12.6 Kolmogorovs Convergence Criteria and the SSLN

137

Proof. Before going to the proof, let us warm-up by proving the following continuous version of the lemma. Let a (s) (0, ) and x (s) R be continuous (s) functions such that a (s) as s and 1 x a(s) ds exists. We are going to show n 1 lim x (s) ds = 0. n a (n) 1 Let X (s) :=
s 0

for any m N; we may conclude lim sup


n

Sn 1 lim sup an a n n = lim sup


n

(ak ak1 ) |rk |


k=2 n

x (u) du and

1 an

(ak ak1 ) |rk |


k=m

r (s) :=
s

X (u) du = a (u)

x (u) du. a (u)

sup |rk | lim sup


km n

1 an

(ak ak1 )
k=m

Then by assumption, r (s) 0 as s 0 and X (s) = a (s) r (s) . Integrating this equation shows
s s

= sup |rk | lim sup


km n

1 [an am1 ] = sup |rk | . an km

This completes the proof since supkm |rk | 0 as m . r (u) a (u) du. Corollary 12.35. Let {Xn } be a sequence of independent square integrable random variables and bn be a sequence such that bn . If

X (s) X (s0 ) =
s0

a (u) r (u) du =

a (u) r (u) |s u=s0

+
s0

Dividing this equation by a (s) and then letting s gives 1 a (s0 ) r (s0 ) a (s) r (s) |X (s)| = lim sup + r (u) a (u) du lim sup a (s) a (s) a (s) s0 s s s 1 lim sup r (s) + |r (u)| a (u) du a (s) s0 s a (s) a (s0 ) sup |r (u)| = sup |r (u)| 0 as s0 . lim sup a (s) s us0 us0 With this as warm-up, we go to the discrete case. Let
k s

k=1

Var (Xk ) < b2 k

then

Sn ESn 0 a.s. bn

Proof. By Kolmogorovs Convergence Criteria, Theorem 12.33, Xk EXk is convergent a.s. bk

k=1

Sk :=
j =1

xj and rk :=
j =k

xj . aj

Therefore an application of Kroneckers Lemma implies 0 = lim 1 n bn


n

so that rk 0 as k by assumption. Since xk = ak (rk rk+1 ) , we nd Sn 1 = an an =


n

(Xk EXk ) = lim


k=1

Sn ESn . bn

ak (rk rk+1 ) =
k=1 n

1 an

n+1

ak rk
k=1 k=2

ak1 rk Corollary 12.36 (L2 SSLN). Let {Xn } be a sequence of independent rann 2 dom variables such that 2 = EXn < . Letting Sn = k=1 Xk and := EXn , we have 1 (Sn n) 0 a.s. (12.14) bn provided bn and
p 1 n=1 b2 n

1 a1 r1 an rn+1 + an

(ak ak1 ) rk . (summation by parts)


k=2

Using the fact that ak ak1 0 for all k 2, and 1 lim n an


Page: 137
m

< . For example, we could take bn = n or


1/2+

(ak ak1 ) |rk | = 0


k=2

bn = n for an p > 1/2, or bn = n1/2 (ln n) Eq. (12.14) as


macro: svmonob.cls

for any > 0. We may rewrite

job: prob

date/time: 23-Feb-2007/15:20

138

12 Laws of Large Numbers

Sn n = o (1) bn or equivalently, Sn bn = o (1) . n n Proof. This corollary is a special case of Corollary 12.35. Let us simply observe here that

Proof. First observe that for all y 0 we have,


1ny y
n=1 n=1

1ny + 1 =
n=0

1ny .

(12.16)

Taking y = |X | / in Eq. (12.16) and then take expectations gives the estimate in Eq. (12.15). Proposition 12.40. Suppose that {Xn }n=1 are i.i.d. random variables, then the following are equivalent: 1. E |X1 | < . 2. There exists > 0 such that n=1 P (|X1 | n) < . 3. For all > 0, n=1 P (|X1 | n) < . n| 4. limn |X n = 0 a.s. Proof. The equivalence of items 1., 2., and 3. easily follows from Lemma 12.39. So to nish the proof it suces to show 3. is equivalent to 4. To this end n| we start by noting that limn |X n = 0 a.s. i 0=P |Xn | i.o. n

1 n1/2 (ln n)
1/2+ 2

1
1+2

n=2

n=2 n (ln n)

by comparison with the integral


2

1 x ln
1+2

dx =
ln 2

1 ey y 1+2

ey dy =
ln 2

1 dy < , y 1+2

wherein we have made the change of variables, y = ln x. Fact 12.37 Under the hypothesis in Corollary 12.36,
n

lim

Sn n n1/2 (ln ln n)
1/2

2 a.s.

= P (|Xn | n i.o.) for all > 0.

(12.17)

Our next goal is to prove the Strong Law of Large numbers (in Theorem 12.7) under the assumption that E |X1 | < .

However, since {|Xn | n}n=1 are independent sets, Borel zero-one law shows the statement in Eq. (12.17) is equivalent to n=1 P (|Xn | n) < for all > 0. Corollary 12.41. Suppose that {Xn }n=1 are i.i.d. random variables such that 1 1 n Sn c R a.s., then Xn L (P ) and := EXn = c. Proof. If
1 n Sn

12.7 Strong Law of Large Numbers


Lemma 12.38. Suppose that X : R is a random variable, then E |X | =
0 p

psp1 P (|X | s) ds =
0

psp1 P (|X | > s) ds.

c a.s. then n :=

Sn+1 n+1

Sn n

0 a.s. and therefore,

Proof. By the fundamental theorem of calculus, |X | =


0 p |X |

psp1 ds = p
0

1s|X | sp1 ds = p
0

1s<|X | sp1 ds.

Sn+1 Sn 1 1 Xn+1 = = n + Sn n+1 n+1 n+1 n n+1 Sn 1 = n + 0 + 0 c = 0. (n + 1) n Hence an application of Proposition 12.40 shows Xn L1 (P ) . Moreover by 1 Exercise 11.3, n Sn n=1 is a uniformly integrable sequenced and therefore, =E 1 Sn E n
n

Taking expectations of this identity along with an application of Tonellis theorem completes the proof. Lemma 12.39. If X is a random variable and > 0, then

P (|X | n)
n=1

1 E |X | P (|X | n) . n=0
job: prob

lim

1 Sn = E [c] = c. n

(12.15)

Page: 138

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

12.7 Strong Law of Large Numbers

139

Lemma 12.42. For all x 0, (x) := 1 1xn = n2 n=1

Proof. This is a simple application of Lemma 12.42; 1 2 min n2 1 ,1 . x


1 dt a t2

nx

1 1 2 2 2 = E |X | (|X |) E |X | : 1|X |n = E |X | 1 2 2 |X |n n n n=1 n=1 = 1/a. 2E |X |


2

Proof. The proof will be by comparison with the integral, For example, 1 1 1 + dt = 1 + 1 = 2 2 2 n t 1 n=1 and so
nx

1 1 |X |

2E |X | .

With this as preparation we are now in a position to prove Theorem 12.7 which we restate here. Theorem 12.44 (Kolmogorovs Strong Law of Large Numbers). Sup pose that {Xn }n=1 are i.i.d. random variables and let Sn := X1 + + Xn . 1 Then there exists R such that n Sn a.s. i Xn is integrable and in which case EXn = .

1 1 2 = = 2 for 0 < x 1. 2 n2 n x n=1

Similarly, for x > 1, 1 1 2+ 2 n x


x

nx

1 1 1 1 dt = 2 + = 2 t x x x

1+

1 x

2 , x

see Figure 12.7 below.

1 Proof. The implication, n Sn a.s. implies Xn L1 (P ) and EXn = has already been proved in Corollary 12.41. So let us now assume Xn L1 (P ) and let := EXn . Let Xn := Xn 1|Xn |n . By Proposition 12.40,

P (Xn = Xn ) =
n=1 n=1

P (|Xn | > n) =
n=1

P (|X1 | > n) E |X1 | < ,

and hence {Xn } and {Xn } are tail equivalent. Therefore it suces to show 1 limn n Sn = a.s. where Sn := X1 + + Xn . But by Lemma 12.43,
E |X | 1 2 n |Xn |n Var (Xn ) E |Xn | = 2 2 2 n n n n=1 n=1 n=1 2

=
n=1

E |X1 | 1|X1 |n n2

2E |X1 | < .

Therefore by Kolmogorovs convergence criteria, Lemma 12.43. Suppose that X : R is a random variable, then 1 2 E |X | : 1|X |n 2E |X | . 2 n n=1

Xn EXn is almost surely convergent. n n=1 Kroneckers lemma then implies 1 n n lim
n

(Xk EXk ) = 0 a.s.


k=1

Page: 139

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

140

12 Laws of Large Numbers

So to nish the proof, it only remains to observe 1 lim n n


n

k=1

1 EXk = lim n n
n

E Xn 1|Xn |n
k=1

1 = lim n n

lim

N j =1

1 Aj ln N

1 j

= 0 a.s.

E X1 1|X1 |n
k=1

So to nish the proof it only remains to show lim


N 1 j =1 j

= lim E X1 1|X1 |n = . Here we have used the dominated convergence theorem to see that an := E X1 1|X1 |n as n . It is now easy (and standard) to check that n 1 limn n k=1 an = limn an = as well. We end this section with another example of using Kolmogorovs convergence criteria in conjunction with Kroneckers lemma. We now assume that {Xn }n=1 are i.i.d. random variables with a continuous distribution function and let Aj denote the event when Xj is a record, i.e. Aj := {Xj > max {X1 , X2 , . . . , Xk1 }} . Recall from Renyi Theorem 7.28 that {Aj }j =1 are independent and P (Aj ) = for all j. Proposition 12.45. Keeping the preceding notation and let N := denote the number of records in the rst N observations. Then limN a.s. Proof. Since 1Aj are Bernoulli random variables, E1Aj = Var 1Aj = E12 Aj E1Aj Observing that
n n 2 1 j 1 j n

ln N

= 1.

(12.18)

To see this write


N +1

ln (N + 1) =
1 N

1 dx = x

N j

j +1

j =1

1 dx x
N

j +1 j N

=
j =1

1 1 x j 1 j

dx +
j =1

1 j (12.19)

= N +
j =1

N j =1 1Aj N ln N = 1

where
N

|N | =
j =1

ln

and

j+1 1 = j j

ln (1 + 1/j )
j =1

1 j

j =1

1 j2

1 j1 1 2 = . j j j2

and hence we conclude that limN N < . So dividing Eq. (12.19) by ln N and letting N gives the desired limit in Eq. (12.18).

E1Aj =
j =1 j =1

1 j

N 1 N

1 dx = ln N x

12.8 Necessity Proof of Kolmogorovs Three Series Theorem


This section is devoted to the necessity part of the proof of Kolmogorovs Three Series Theorem 12.12. We start with a couple of lemmas. Lemma 12.46. Suppose that {Yn }n=1 are independent random variables such that there exists c < such that |Yn | c < a.s. and further assume 2 EYn = 0. If n=1 Yn is almost surely convergent then n=1 EYn < . More precisely the following estimate holds,

we are lead to try to normalize the sum j =1 1Aj by ln N. So in the spirit of the proof of the strong law of large numbers let us compute;

Var
j =2

1Aj ln j

=
j =2

1 j1 ln2 j j 2

1 1 dx = ln2 x x

ln 2

1 dy < . y2

Therefore by Kolmogorovs convergence criteria we may conclude

1 Aj ln j

1 j

=
j =2

j =2

1 Aj 1Aj E ln j ln j

EYj2
j =1

( + c) for all > 0, P (supn |Sn | )

(12.20)

is almost surely convergent. An application of Kroneckers Lemma then implies


Page: 140 job: prob

where as usual, Sn :=
macro: svmonob.cls

n j =1

Yj .
date/time: 23-Feb-2007/15:20

12.8 Necessity Proof of Kolmogorovs Three Series Theorem

141

Remark 12.47. It follows from Eq. (12.20) that if P (supn |Sn | < ) > 0, then 2 j =1 Yj = limn Sn j =1 EYj < and hence by Kolmogorovs Theorem, exists a.s. and in particular, P (supn |Sn | < ) . Proof. Let > 0 and be the rst time |Sn | > , i.e. let be the stopping time dened by, = := inf {n 1 : |Sn | > } . As usual, = if {n 1 : |Sn | > } = . Then for N N,
2 2 2 E SN = E SN : N + E SN : >N 2 E SN : N + 2 P [ > N ] .

Since Sn is convergent a.s., it follows that P (supn |Sn | < ) = 1 and therefore,

lim P

sup |Sn | <


n

= 1.

Hence for suciently large, P (supn |Sn | < ) > 0 ad we learn that
2 EYj2 = lim E SN j =1 N

( + c) < . P (supn |Sn | )

Moreover,
N 2 E SN : N = j =1 N 2 E SN : =j = j =1 2 E Sj + 2Sj (SN Sj ) + (SN Sj ) : = j j =1 N N 2 N

E |Sj + SN Sj | : = j

Lemma 12.48. Suppose that {Yn }n=1 are independent random variables such that there exists c < such that |Yn | c a.s. for all n. If n=1 Yn converges in R a.s. then n=1 EYn converges as well. Proof. Let (0 , B0 , P0 ) be the probability space that {Yn }n=1 is dened on and let := 0 0 , B := B0 B0 , and P := P0 P0 . Further let Yn (1 , 2 ) := Yn (1 ) and Yn (1 , 2 ) := Yn (2 ) and

=
j =1 N

2 Sj

: =j +
j =1 2

E (SN Sj )

P [ = j ]
N

Zn (1 , 2 ) := Yn (1 , 2 ) Yn (1 , 2 ) = Yn (1 ) Yn (2 ) . Then |Zn | 2c a.s., EZn = 0, and

j =1 N

2 E (Sj 1 + Yj ) : = j + E SN j =1

P [ = j ]

Zn (1 , 2 ) =
n=1 n=1

Yn (1 )
n=1

Yn (2 ) exists

j =1

2 E ( + c) : = j + E SN P [ N ] 2

for P a.e. (1 , 2 ) . Hence it follows from Lemma 12.46 that


2 EZn = n=1 n=1

2 = ( + c) + E SN

P [ N ] . > =

Var (Zn ) =
n=1

Var (Yn Yn )

Putting this all together then gives,


2 2 E SN ( + c) + E SN 2 ( + c) + E SN 2 2 2

P [ N ] + 2 P [ > N ] P [ N ] + ( + c) P [ > N ]
2 SN 2

[Var (Yn ) + Var (Yn )] = 2


n=1 n=1

Var (Yn ) .

= ( + c) + P [ N ] E form which it follows that


2 E SN 2

( + c) ( + c) ( + c) = 1 P [ N ] 1 P [ < ] P [ = ] ( + c) . P (supn |Sn | )


job: prob
2

Thus by Kolmogorovs convergence theorem, it follows that n=1 (Yn EYn ) is convergent. Since n=1 Yn is a.s. convergent, we may conclude that n=1 EYn is also convergent. We are now ready to complete the proof of Theorem 12.12. Proof. Our goal is to show if {Xn }n=1 are independent random variables, then the random series, n=1 Xn , is almost surely convergent i for all c > 0 the following three series converge; 1.
n=1

P (|Xn | > c) < ,


date/time: 23-Feb-2007/15:20

Page: 141

macro: svmonob.cls

2. 3.

n=1 n=1

Var Xn 1|Xn |c < , and E Xn 1|Xn |c converges.

Since n=1 Xn is almost surely convergent, it follows that limn Xn = 0 a.s. and hence for every c > 0, P ({|Xn | c i.o.}) = 0. According the Borel zero one law this implies for every c > 0 that n=1 P (|Xn | > c) < . Given c this, we now know that {Xn } and Xn := Xn 1|Xn |c are tail equivalent for c all c > 0 and in particular n=1 Xn is almost surely convergent for all c > 0. c ), So according to Lemma 12.48 (with Yn = Xn
c EXn = n=1 n=1

E Xn 1|Xn |c

converges.

c c , we may now conclude that n=1 Yn is almost surely EXn Letting Yn := Xn convergent. Since {Yn } is uniformly bounded and EYn = 0 for all n, an application of Lemma 12.46 allows us to conclude 2 EYn < . n=1

Var Xn 1|Xn |c =
n=1

13 Weak Convergence Results


Suppose {Xn }n=1 is a sequence of random variables and X is another random variable (possibly dened on a dierent probability space). We would like to understand when, for large n, Xn and X have nearly the same distribution. Alternatively put, if we let n (A) := P (Xn A) and (A) := P (X A) , when is n close to for large n. This is the question we will address in this chapter.

Proof. Let = and h := f g : R so that d = hdm. Since ( ) = ( ) ( ) = 1 1 = 0, if A B we have (A) + (Ac ) = ( ) = 0. In particular this shows | (A)| = | (Ac )| and therefore,

13.1 Total Variation Distance


Denition 13.1. Let and be two probability measure on a measurable space, (, B ) . The total variation distance, dT V (, ) , is dened as dT V (, ) := sup | (A) (A)| .
AB

| (A)| =

1 1 hdm [| (A)| + | (Ac )|] = hdm + 2 2 Ac A 1 1 |h| dm + |h| dm = |h| dm. 2 A 2 Ac dT V (, ) = sup | (A)|
AB

(13.1)

This shows

Remark 13.2. The function, : B R dened by, (A) := (A) (A) for all A B , is an example of a signed measure. For signed measures, one usually denes
n

1 2

|h| dm.

To prove the converse inequality, simply take A = {h > 0} (note Ac = {h 0}) in Eq. (13.1) to nd | (A)| = 1 2 1 = 2 hdm
A Ac

TV

:= sup
i=1

| (Ai )| : n N and partitions, {Ai }i=1 B of

hdm |h| dm =
Ac

You are asked to show in Exercise 13.1 below, that when = , dT V (, ) = 1 2 TV . Lemma 13.3 (Sche es Lemma). Suppose that m is another positive measure on (, B ) such that there exists measurable functions, f, g : [0, ), such that d = f dm and d = gdm.1 Then dT V (, ) =

|h| dm +
A

1 2

|h| dm.

For the second assertion, let Gn := fn + g and observe that |fn g | 0 m a.e., |fn g | Gn L1 (m) , Gn G := 2g a.e. and Gn dm = 2 2 = Gdm and n . Therefore, by the dominated convergence theorem 8.34,
n

1 2

|f g | dm.

lim dT V (n , ) =

1 lim 2 n

|fn g | dm = 0.

Moreover, if {n }n=1 is a sequence of probability measure of the form, dn = fn dm with fn : [0, ), and fn g, m - a.e., then dT V (n , ) 0 as n .
1

For a concrete application of Sche es Lemma, see Proposition 13.35 below. Corollary 13.4. Let h := sup |h ( )| when h : R is a bounded random variable. Continuing the notation in Sche es lemma above, we have

Fact: it is always possible to do this by taking m = + for example.

144

13 Weak Convergence Results

dT V (, ) = Consequently,

1 sup 2

hd

hd : h

1 .

(13.2)

13.2 Weak Convergence


i 1 for i {1, 2, . . . , n} so that Example 13.6. Suppose that P Xn = n = n Xn is a discrete approximation to the uniform distribution, i.e. to U where i P (U A) = m (A [0, 1]) for all A BR . If we let An = n : i = 1, 2, . . . , n , then P (Xn An ) = 1 while P (U An ) = 0. Therefore, it follows that dT V (Xn , U ) = 1 for all n.2

hd

hd 2dT V (, ) h

(13.3)

and in particular, for all bounded and measurable functions, h : R, hdn


hd if dT V (n , ) 0.

(13.4)

Nevertheless we would like Xn to be close to U in distribution. Let us observe that if we let Fn (y ) := P (Xn y ) and F (y ) := P (U y ) , then Fn (y ) = P (Xn y ) = 1 i # i {1, 2, . . . , n} : y n n

Proof. We begin by observing that hd


hd =

h (f g ) dm

|h| |f g | dm

and F (y ) := P (U y ) = (y 1) 0. . From these formula, it easily follows that F (y ) = limn Fn (y ) for all y R. This suggest that we should say that Xn converges in distribution to X i P (Xn y ) P (X y ) for all y R. However, the next simple example shows this denition is also too restrictive. Example 13.7. Suppose that P (Xn = 1/n) = 1 for all n and P (X0 = 0) = 1. Then it is reasonable to insist that Xn converges of X0 in distribution. However, Fn (y ) = 1y1/n 1y0 = F0 (y ) for all y R except for y = 0. Observe that y is the only point of discontinuity of F0 . Notation 13.8 Let (X, d) be a metric space, f : X R be a function. The set of x X where f is continuous (discontinuous) at x will be denoted by C (f ) (D (f )). Observe that if F : R [0, 1] is a non-decreasing function, then C (F ) is at most countable. To see this, suppose that > 0 is given and let C := {y R : F (y +) F (y ) } . If y < y with y, y C , then F (y +) < F (y ) and (F (y ) , F (y +)) and (F (y ) , F (y +)) are disjoint intervals of length greater that . Hence it follows that 1 = m ([0, 1])
y C

|f g | dm = 2dT V (, ) h

Moreover, from the proof of Sche es Lemma 13.3, we have dT V (, ) = 1 2 hd


hd

when h := 1f >g 1f g . These two equations prove Eqs. (13.2) and (13.3) and the latter implies Eq. (13.4). Exercise 13.1. Under the hypothesis of Sche es Lemma 13.3, show
TV

|f g | dm = 2dT V (, ) .

Exercise 13.2. Suppose that is a (at most) countable set, B := 2 , and {n }n=0 are probability measures on (, B ) . Let fn ( ) := n ({ }) for . Show 1 dT V (n , 0 ) = |fn ( ) f0 ( )| 2

and limn dT V (n , 0 ) = 0 i limn n ({ }) = 0 ({ }) for all . Notation 13.5 Suppose that X and Y are random variables, let dT V (X, Y ) := dT V (X , Y ) = sup |P (X A) P (Y A)| ,
ABR

m ((F (y ) , F (y +))) # (C )

and hence that # (C ) 1 < . Therefore C := k=1 C1/k is at most countable.


2

where X = P X

and Y = P Y

More generally, if and are two probability measure on (R, BR ) such that ({x}) = 0 for all x R while concentrates on a countable set, then dT F (, ) = 1.

Page: 144

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

13.2 Weak Convergence

145

Denition 13.9. Let {F, Fn : n = 1, 2, . . . } be a collection of right continuous non-increasing functions from R to [0, 1] and by abuse of notation let us also denote the associated measures, F and Fn by F and Fn respectively. Then 1. Fn converges to F vaguely and write, Fn F, i Fn ((a, b]) F ((a, b]) for all a, b C (F ) . w 2. Fn converges to F weakly and write, Fn F, i Fn (x) F (x) for all x C (F ) . 3. We say F is proper, if F is a distribution function of a probability measure, i.e. if F () = 1 and F () = 0. Example 13.10. If Xn and U are as in Example 13.6 and Fn (y ) := P (Xn y ) v w and F (y ) := P (Y y ) , then Fn F and Fn F. Lemma 13.11. Let {F, Fn : n = 1, 2, . . . } be a collection of proper distribution v w functions. Then Fn F i Fn F. In the case where Fn and F are proper w and Fn F, we will write Fn = F. Proof. If Fn F, then Fn ((a, b]) = Fn (b) Fn (a) F (b) F (a) = v v F ((a, b]) for all a, b C (F ) and therefore Fn F. So now suppose Fn F and let a < x with a, x C (F ) . Then F (x) = F (a) + lim [Fn (x) Fn (a)] F (a) + lim inf Fn (x) .
n n w v

Example 13.13 (Central Limit Theorem). The central limit theorem (see the next chapter) states; if {Xn }n=1 are i.i.d. L2 (P ) random variables with := EX1 and 2 = Var (X1 ) , then Sn n d = N (0, ) = N (0, 1) . n Written out explicitly we nd lim P a< Sn n b n = P (a < N (0, 1) b) 1 = 2 or equivalently put 1 lim P n + na < Sn n + nb = n 2 More intuitively, we have Sn = n +
d b a b a

e 2 x dx

e 2 x dx.

nN (0, 1) = N n, n 2 .

Letting a , using the fact that F is proper, implies F (x) lim inf Fn (x) .
n

Lemma 13.14. Suppose X is a random variable, {cn }n=1 R, and Xn = X + cn . If c := limn cn exists, then Xn = X + c. Proof. Let F (x) := P (X x) and Fn (x) := P (Xn x) = P (X + cn x) = F (x cn ) . Clearly, if cn c as n , then for all x C (F ( c)) we have Fn (x) F (x c) . Since F (x c) = P (X + c x) , we see that Xn = X + c. Observe that Fn (x) F (x c) only for x C (F ( c)) but this is sucient to assert Xn = X + c. Example 13.15. Suppose that P (Xn = n) = 1 for all n, then Fn (y ) = 1yn 0 = F (y ) as n . Notice that F is not a distribution function because all 1 for all of the mass went o to +. Similarly, if we suppose, P (Xn = n) = 2 1 1 n, then Fn = 2 1[n,n) + 1[n,) 2 = F (y ) as n . Again, F is not a distribution function on R since half the mass went to while the other half went to +. Example 13.16. Suppose X is a non-zero random variables such that X = X, d n then Xn := (1) X = X for all n and therefore, Xn = X as n . On the other hand, Xn does not converge to X almost surely or in probability.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
d

Likewise, F (x) F (a) = lim [Fn (x) Fn (a)] lim sup [Fn (x) 1] = lim sup Fn (x) 1
n n n

which upon letting a , (so F (a) 1) allows us to conclude, F (x) lim sup Fn (x) .
n

Denition 13.12. A sequence of random variables, {Xn }n=1 is said to converge weakly or to converge in distribution to a random variable X (written Xn = X ) i Fn (y ) := P (Xn y ) = F (y ) := P (X y ) .

Page: 145

job: prob

146

13 Weak Convergence Results

The next theorem summarizes a number of useful equivalent characterizations of weak convergence. (The reader should compare Theorem 13.17 with Corollary 13.4.) In this theorem we will write BC (R) for the bounded continuous functions, f : R R (or f : R C) and Cc (R) for those f C (R) which have compact support, i.e. f (x) 0 if |x| is suciently large. Theorem 13.17. Suppose that {n }n=0 is a sequence of probability measures on (R, BR ) and for each n, let Fn (y ) := n ((, y ]) be the (proper) distribution function associated to n . Then the following are equivalent. 1. For all f BC (R) , f dn
R R

lim inf n ((a, b]) 0 ((a, b)) = 0 ((a, b]) ,


n

where the second equality in each of the equations holds because a and b are points of continuity of F0 . Hence we have shown that limn n ((a, b]) exists and is equal to 0 ((a, b]) .

f d0 as n .

(13.5)

2. Eq. (13.5) holds for all f BC (R) which are uniformly continuous. 3. Eq. (13.5) holds for all f Cc (R) . 4. Fn = F. 5. There exists a probability space (, B , P ) and random variables, Yn , on this 1 = n for all n and Yn Y0 a.s. space such that P Yn Proof. Clearly 1. = 2. = 3. and 5. = 1. by the dominated convergence theorem. Indeed, we have f dn = E [f (Yn )] E [f (Y )] =
R R D.C.T.

Fig. 13.1. The picture denition of the trapezoidal functions, f and g .

f d0

for all f BC (R) . Therefore it suces to prove 3. = 4. and 4. = 5. The proof of 4. = 5. will be the content of Skorohods Theorem 13.28 below. Given Skorohods Theorem, we will now complete the proof. (3. = 4.) Let < a < b < with a, b C (F0 ) and for > 0, let f (x) 1(a,b] and g (x) 1(a,b] be the functions in Cc (R) pictured in Figure 13.1. Then lim sup n ((a, b]) lim sup
n n R

Corollary 13.18. Suppose that {Xn }n=0 is a sequence of random variables, such that Xn X0 , then Xn = X0 . (Recall that example 13.16 shows the converse is in general false.) Proof. Let g BC (R) , then by Corollary 11.9, g (Xn ) g (X0 ) and since g is bounded, we may apply the dominated convergence theorem (see Corollary 11.8) to conclude that E [g (Xn )] E [g (X0 )] . Lemma 13.19. Suppose {Xn }n=1 is a sequence of random variables on a com P P

f dn =
R

f d0

(13.6)

and lim inf n ((a, b]) lim inf


n n R

g dn =
R

g d0 .

(13.7)

mon probability space and c R. Then Xn = c i Xn c. Proof. Recall that Xn c i for all > 0, P (|Xn c| > ) 0. Since {|Xn c| > } = {Xn > c + } {Xn < c } it follows Xn c i P (Xn > x) 0 for all x > c and P (Xn < x) 0 for all x < c. These conditions are also equivalent to P (Xn x) 1 for all x > c and P P (Xn x) P (Xn x ) 0 for all x < c (where x < x < c). So Xn c i
macro: svmonob.cls date/time: 23-Feb-2007/15:20
P P

Since f 1[a,b] and g 1(a,b) as 0, we may use the dominated convergence theorem to pass to the limit as 0 in Eqs. (13.6) and (13.7) to conclude, lim sup n ((a, b]) 0 ([a, b]) = 0 ((a, b])
n

and
Page: 146 job: prob

13.3 Derived Weak Convergence


n

147

lim P (Xn x) =

0 if x < c = F (x) 1 if x > c


P

2. If is a sequence of distribution functions converging weakly to F, then Fn converges to F uniformly on R, i.e.


n xR

{Fn }n=1

where F (x) = P (c x) = 1xc . Since C (F ) = R \ {c} , we have shown Xn c i Xn = c. We end this section with a few more equivalent characterizations of weak convergence. The combination of Theorem 13.17 and 13.20 is often called the Portmanteau Theorem. Theorem 13.20 (The Portmanteau Theorem). Suppose {Fn }n=0 are proper distribution functions. By abuse of notation, we will denote Fn (A) simply by Fn (A) for all A BR . Then the following are equivalent. 1. Fn = F0 . 2. lim inf n Fn (U ) F0 (U ) for open subsets, U R. 3. lim supn Fn (C ) F0 (C ) for all closed subsets, C R. 4. limn Fn (A) = F0 (A) for all A BR such that F0 (A) = 0. Proof. (1. = 2.) By Theorem 13.28 we may choose random variables, Yn , such that P (Yn y ) = Fn (y ) for all y R and n N and Yn Y0 a.s. as n . Since U is open, it follows that 1U (Y ) lim inf 1U (Yn ) a.s.
n

lim sup |F (x) Fn (x)| = 0.

In particular, it follows that sup |F ((a, b]) Fn ((a, b])| = sup |F (b) F (a) (Fn (b) Fn (a))|
a<b a<b

sup |F (b) Fn (b)| + sup |Fn (a) Fn (a)|


b a

0 as n . Hints for part 2. Given > 0, show that there exists, = 0 < 1 < < n = , such that |F (i+1 ) F (i )| for all i. Now show, for x [i , i+1 ), that |F (x) Fn (x)| (F (i+1 ) F (i ))+|F (i ) Fn (i )|+(Fn (i+1 ) Fn (i )) .

13.3 Derived Weak Convergence


Lemma 13.21. Let (X, d) be a metric space, f : X R be a function, and D (f ) be the set of x X where f is discontinuous at x. Then D (f ) is a Borel measurable subset of X. Proof. For x X and > 0, let Bx ( ) = {y X : d (x, y ) < } . Given > 0, let f : X R {} be dened by, f (x) := sup f (y ) .
y Bx ( )

and so by Fatous lemma, F (U ) = P (Y U ) = E [1U (Y )] lim inf E [1U (Yn )] = lim inf P (Yn U ) = lim inf Fn (U ) .
n n n

(2. 3.) This follows from the observations: 1) C R is closed i U := C c is open, 2) F (U ) = 1 F (C ) , and 3) lim inf n (Fn (C )) = lim supn Fn (C ) . with F0 A \ Ao = 0. (2. and 3. 4.) If F0 (A) = 0, then Ao A A Therefore F0 A = F0 (A) . F0 (A) = F0 (Ao ) lim inf Fn (Ao ) lim sup Fn A
n n

(4. = 1.) Let a, b C (F0 ) and take A := (a, b]. Then F0 (A) = F0 ({a, b}) = 0 and therefore, limn Fn ((a, b]) = F0 ((a, b]) , i.e. Fn = F0 . Exercise 13.3. Suppose that F is a continuous proper distribution function. Show, 1. F : R [0, 1] is uniformly continuous.
Page: 147 job: prob

We will begin by showing f is lower semi-continuous, i.e. f a is closed (or equivalently f > a is open) for all a R. Indeed, if f (x) > a, then there exists y Bx ( ) such that f (y ) > a. Since this y is in Bx ( ) whenever d (x, x ) < d (x, y ) (because then, d (x , y ) d (x, y ) + d (x, x ) < ) it follows that f (x ) > a for all x Bx ( d (x, y )) . This shows f > a is open in X. We similarly dene f : X R {} by f (x) := Since f = (f ) , it follows that {f a} = (f ) a
macro: svmonob.cls date/time: 23-Feb-2007/15:20
y Bx ( )

inf

f (y ) .

148

13 Weak Convergence Results

is closed for all a R, i.e. f is upper semi-continuous. Moreover, f f f for all > 0 and f f 0 and f f0 as 0, where f0 f f 0 and f0 : X R {} and f 0 : X R {} are measurable functions. The proof is now complete since it is easy to see that D (f ) = f > f0 = f f0 = 0 BX .
0 0

where M = sup |f | . Since, Xn = X, we know E [f (Xn , c)] E [f (X, c)] and hence we have shown, lim sup |E [f (Xn , Yn ) f (X, c)]|
n

lim sup |E [f (Xn , Yn ) f (Xn , c)]| + lim sup |E [f (Xn , c) f (X, c)]| .
n n

Remark 13.22. Suppose that xn x with x C (f ) := D (f ) . Then f (xn ) f (x) as n . Theorem 13.23 (Continuous Mapping Theorem). Let f : R R be a Borel measurable functions. If Xn = X0 and P (X0 D (f )) = 0, then f (Xn ) = f (X0 ) . If in addition, f is bounded, Ef (Xn ) Ef (X0 ) . Proof. Let {Yn }n=0 be random variables on some probability space as in Theorem 13.28. For g BC (R) we observe that D (g f ) D (f ) and therefore, P (Y0 D (g f )) P (Y0 D (f )) = P (X0 D (f )) = 0. Hence it follows that g f Yn g f Y0 a.s. So an application of the dominated convergence theorem (see Corollary 11.8) implies E [g (f (Xn ))] = E [g (f (Yn ))] E [g (f (Y0 ))] = E [g (f (X0 ))] . (13.8)

Since > 0 was arbitrary, we learn that limn Ef (Xn , Yn ) = Ef (X, c) . Now suppose f BC R2 with f 0 and let k (x, y ) [0, 1] be continuous functions with compact support such that k (x, y ) = 1 if |x| |y | k and k (x, y ) 1 as k . Then applying what we have just proved to fk := k f, we nd E [fk (X, c)] = lim E [fk (Xn , Yn )] lim inf E [f (Xn , Yn )] .
n n

Letting k in this inequality then implies that E [f (X, c)] lim inf E [f (Xn , Yn )] .
n

This inequality with f replaced by M f 0 then shows, M E [f (X, c)] lim inf E [M f (Xn , Yn )] = M lim sup E [f (Xn , Yn )] .
n n

Hence we have shown, lim sup E [f (Xn , Yn )] E [f (X, c)] lim inf E [f (Xn , Yn )]
n n

This proves the rst assertion. For the second assertion we take g (x) = (x M ) (M ) in Eq. (13.8) where M is a bound on |f | . Theorem 13.24 (Slutzkys Theorem). Suppose that Xn = X and P Yn c where c is a constant. Then (Xn , Yn ) = (X, c) in the sense that E [f (Xn , Yn )] E [f (X, c)] for all f BC R2 . In particular, by taking f (x, y ) = g (x + y ) and f (x, y ) = g (x y ) with g BC (R) , we learn Xn + Yn = X + c and Xn Yn = X c respectively. Proof. First suppose that f Cc R2 , and for > 0, let := () be chosen so that |f (x, y ) f (x , y )| if (x, y ) (x , y ) . Then |E [f (Xn , Yn ) f (Xn , c)]| E [|f (Xn , Yn ) f (Xn , c)| : |Yn c| ] + E [|f (Xn , Yn ) f (Xn , c)| : |Yn c| > ] + 2M P (|Yn c| > ) as n ,

and therefore limn E [f (Xn , Yn )] = E [f (X, c)] for all f BC R2 with f 0. This completes the proof since any f BC R2 may be written as a dierence of its positive and negative parts. Theorem 13.25 ( method). Suppose that {Xn }n=1 are random variables, b R, an R\ {0} with limn an = 0, and Xn b = Z. an If g : R R be a measurable function which is dierentiable at b, then g (Xn ) g (b) = g (b) Z. an Proof. Observe that Xn b = an Xn b = 0 Z = 0 an
date/time: 23-Feb-2007/15:20

Page: 148

job: prob

macro: svmonob.cls

13.4 Skorohod and the Convergence of Types Theorems

149

so that Xn = b and hence Xn b. By denition of the derivative of g at b, we have g (x + ) = g (b) + g (b) + () where () 0 as 0. Let Yn and Y be random variables on a xed probability space such that Yn = Xn = an Yn + b, so that g (Xn ) g (b) d g (an Yn + b) g (b) an Yn (an Yn ) = = g (b) Yn + an an an = g (b) Yn + Yn (an Yn ) g (b) Y a.s. This completes the proof since g (b) Y = g (b) Z. Example 13.26. Suppose that {Un }n=1 are i.i.d. random variables which are uniformly distributed on [0, 1] and let Yn := j =1 Ujn . Our goal is to nd an bn is weakly convergent to a non-constant random variable. and bn such that Yna n To this end, let n 1 ln Uj . Xn := ln Yn = n j =1 By the strong law of large numbers, lim Xn = E [ln U1 ] =
0 a.s. 1 a.s. 1 n
1

Yn e
1 n

= e1 N (0, 1) = N 0, e2 .

Hence we have shown, n


j =1 n

Xn b an

and Y = Z with Yn Y a.s. Then

Uj e1 = N 0, e2 .
1 n

Exercise 13.4. Given a function, f : X R and a point x X, let lim inf f (y ) := lim
y x y x 0 y Bx ( ) 0 y B ( ) x

inf

f (y ) and

(13.9) (13.10)

lim sup f (y ) := lim sup f (y ) , where Bx ( ) := {y X : 0 < d (x, y ) < } . Show f is lower (upper) semi-continuous i lim inf yx f (y ) lim supyx f (y ) f (x) for all x X.

f (x)

Solution to Exercise (13.4). Suppose Eq. (13.9) holds, a R, and x X such that f (x) > a. Since, lim
0 y Bx ( )

inf

f (y ) = lim inf f (y ) f (x) > a,


y x

ln xdx = [x ln x x]0 = 1

and therefore, limn Yn = e Let us further observe that

.
1

E ln2 U1 =
0 2

ln2 xdx = 2

it follows that inf yBx () f (y ) > a for some > 0. Hence we may conclude that Bx ( ) {f > a} which shows {f > a} is open. Conversely, suppose now that {f > a} is open for all a R. Given x X and a < f (x) , there exists > 0 such that Bx ( ) {f > a} . Hence it follows that lim inf yx f (y ) a and then letting a f (x) then implies lim inf yx f (y ) f (x) .

so that Var (ln U1 ) = 2 (1) = 1. Hence by the central limit theorem, Xn (1)
1 n

13.4 Skorohod and the Convergence of Types Theorems


n (Xn + 1) = N (0, 1) . Notation 13.27 Given a proper distribution function, F : R [0, 1] , let Y = F : (0, 1) R be the function dened by Y (x) = F (x) = sup {y R : F (y ) < x} . = g (1) N (0, 1) . Similarly, let
Xn

Therefore the method implies, g (Xn ) g (1)


1 n

Taking g (x) := e using g (Xn ) = e


Page: 149

Y + (x) := inf {y R : F (y ) > x} . = Yn , then implies


job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

150

13 Weak Convergence Results

order to nish the proof it suces to show, Yn (x) Y (x) for all x / E, where E is the countable null set dened as above, E := {x (0, 1) : Y (x) < Y + (x)} . We now suppose x / E. If y C (F0 ) with y < Y (x) , we have limn Fn (y ) = F0 (y ) < x and in particular, Fn (y ) < x for almost all n. This implies that Yn (x) y for a.a. n and hence that lim inf n Yn (x) y. Letting y Y (x) with y C (F0 ) then implies lim inf Yn (x) Y (x) .
n

Similarly, for x / E and y C (F0 ) with Y (x) = Y + (x) < y, we have limn Fn (y ) = F0 (y ) > x and in particular, Fn (y ) > x for almost all n. This implies that Yn (x) y for a.a. n and hence that lim supn Yn (x) y. Letting y Y (x) with y C (F0 ) then implies We will need the following simple observations about Y and Y + which are easily understood from Figure 13.4. 1. Y (x) Y + (x) and Y (x) < Y + (x) i x is the height of a at spot of F. 2. The set, E := {x (0, 1) : Y (x) < Y + (x)} , of at spot heights is at most countable. This is because, {(Y (x) , Y + (x))}xE is a collection of pairwise disjoint intervals which is necessarily countable. (Each such interval contains a rational number.) 3. The following inequality holds, F (Y (x) ) x F (Y (x)) for all x (0, 1) . (13.11) lim sup Yn (x) Y (x) .
n

Hence we have shown, for x / E, that lim sup Yn (x) Y (x) lim inf Yn (x)
n n

which shows
n lim Fn (x) = lim Yn (x) = Y (x) = F (x) for all x / E. n

(13.12)

Indeed, if y > Y (x) , then F (y ) x and by right continuity of F it follows that F (Y (x)) x. Similarly, if y < Y (x) , then F (y ) < x and hence F (Y (x) ) x. 4. {x (0, 1) : Y (x) y0 } = (0, F (y0 )] (0, 1) . To prove this assertion rst suppose that Y (x) y0 , then according to Eq. (13.11) we have x F (Y (x)) F (y0 ) , i.e. x (0, F (y0 )] (0, 1) . Conversely, if x (0, 1) and x F (y0 ) , then Y (x) y0 by denition of Y. 5. As a consequence of item 4. we see that Y is B(0,1) /BR measurable and m Y 1 = F, where m is Lebesgue measure on (0, 1) , B(0,1) . Theorem 13.28 (Baby Skorohod Theorem). Suppose that {Fn }n=0 is a collection of distribution functions such that Fn = F0 . Then there ex ists a probability space, (, B , P ) and random variables, {Yn }n=1 such that P (Yn y ) = Fn (y ) for all n N {} and limn Fn = limn Yn = Y = F a.s. Proof. We will take := (0, 1) , B = B(0,1) , and P = m Lebesgue measure on and let Yn := Fn and Y := F0 as in Notation 13.27. Because of the above comments, P (Yn y ) = Fn (y ) and P (Y y ) = F0 (y ) for all y R. So in
Page: 150 job: prob

Denition 13.29. Two random variables, Y and Z, are said to be of the same type if there exists constants, A > 0 and B R such that Z = AY + B.
d

(13.13)

Alternatively put, if U (y ) := P (Y y ) and V (y ) := P (Z y ) , then U and V should satisfy, U (y ) = P (Y y ) = P (Z Ay + B ) = V (Ay + B ) . For the next theorem we will need the following elementary observation. Lemma 13.30. If Y is non-constant (a.s.) random variable and U (y ) := P (Y y ) , then U (1 ) < U (2 ) for all 1 suciently close to 0 and 2 suciently close to 1. Proof. Observe that Y is constant i U (y ) = 1yc for some c R, i.e. i U only takes on the values, {0, 1} . So since Y is not constant, there exists y R such that 0 < U (y ) < 1. Hence if 2 > U (y ) then U (2 ) y and if 1 < U (y ) then U (1 ) y. Moreover, if we suppose that 1 is not the height of a at spot of U, then in fact, U (1 ) < U (2 ) . This inequality then remains valid as 1 decreases and 2 increases.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

13.4 Skorohod and the Convergence of Types Theorems

151

Theorem 13.31 (Convergence of Types). Suppose is a sequence of random variables and an , n (0, ) , bn , n R are constants and Y and Z are non-constant random variables. Then 1. if X n bn = Y an and Xn n = Z, n n n bn A = lim (0, ) and B := lim n an n an
d

{Xn }n=1

X n bn y an

= Fn (an y + bn ) and P

Xn n y n

= Fn (n y + n ) .

By assumption we have Fn (an y + bn ) = U (y ) and Fn (n y + n ) = V (y ) .

(13.14)

If w := sup {y : Fn (an y + bn ) < x} , then an w + bn = Fn (x) and hence

(13.15) Similarly, (13.16)

sup {y : Fn (an y + bn ) < x} =

Fn (x) bn . an

then Y and Z are of the same type. Moreover, the limits,

exists and Y = AZ + B. 2. If the relations in Eq. (13.16) hold then either of the convergences in Eqs. (13.14) or (13.15) implies the others with Z and Y related by Eq. (13.13). 3. If there are some constants, an > 0 and bn R and a non-constant random variable Y, such that Eq. (13.14) holds, then Eq. (13.15) holds using n and n of the form,
n := Fn (2 ) Fn (1 ) and n := Fn (1 )

Fn (x) n . n With these identities, it now follows from the proof of Skorohods Theorem 13.28 (see Eq. (13.12)) that there exists an at most countable subset, , of (0, 1) such that,

sup {y : Fn (n y + n ) < x} =

(x) bn Fn = sup {y : Fn (an y + bn ) < x} U (x) and an (x) n Fn = sup {y : Fn (n y + n ) < x} V (x) n

(13.17)

for some 0 < 1 < 2 < 1. If the Fn are invertible functions, Eq. (13.17) may be written as Fn (n ) = 1 and Fn (n + n ) = 2 . (13.18)

for all x / . Since Y and Z are not constants a.s., we can choose, by Lemma 13.30, 1 < 2 not in such that U (1 ) < U (2 ) and V (1 ) < V (2 ) . In particular it follows that
F (2 ) bn F (1 ) bn Fn (2 ) Fn (1 ) = n n an an an U (2 ) U (1 ) > 0

(13.19)

Proof. (2) Assume the limits in Eq. (13.16) hold. If Eq. (13.14) is satised, then by Slutskys Theorem 13.20, Xn bn + bn n an Xn n = n an n Xn bn an n bn an = an n an n 1 = A (Y B ) =: Z Similarly, if Eq. (13.15) is satised, then Xn n n n bn X n bn = + = AZ + B =: Y. an n an an (1) If Fn (y ) := P (Xn y ) , then

and similarly
Fn (2 ) Fn (1 ) V (2 ) V (1 ) > 0. n

Taking ratios of the last two displayed equations shows, U (2 ) U (1 ) n A := (0, ) . an V (2 ) V (1 ) Moreover,
Fn (1 ) bn U (1 ) and an Fn (1 ) n F (1 ) n n = n AV (1 ) an n an

(13.20)

Page: 151

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

152

13 Weak Convergence Results

and therefore, n bn F (1 ) n F (1 ) bn = n n AV (1 ) U (1 ) := B. an an an
Fn (1 ) and n := (2 ) Fn (3) Now suppose that we dene n := Fn (1 ) , then according to Eqs. (13.19) and (13.20)we have

From this it follows that bn 1 ln n. Given this, we now try to nd an by requiring, P Mn b n 1 an = Fn (an + bn ) = [F (an + bn )] 2 (0, 1) .
n

n /an U (2 ) U (1 ) (0, 1) and n bn U (1 ) as n . an Thus we may always center and scale the {Xn } using n and n of the form described in Eq. (13.17).

However, by what we have done above, this requires an + bn 1 ln n. Hence we may as well take an to be constant and for simplicity we take an = 1. 2. We now compute
n

lim P Mn 1 ln n x = lim = lim

1 e(x+ 1 ex n

ln n)

= exp ex .

13.5 Weak Convergence Examples


Example 13.32. Suppose that {Xn }n=1 are i.i.d. exp () random variables, i.e. Xn 0 a.s. and P (Xn x) = ex for all x 0. In this case F (x) := P (X1 x) = 1 e(x0) Consider Mn := max (X1 , . . . , Xn ) . We have, for x 0 and cn (0, ) that Fn (x) := P (Mn x) = P n j =1 {Xj x}
n

Notice that F (x) is a distribution function for some random variable, Y, and therefore we have shown Mn 1 ln n = Y as n

where P (Y x) = exp ex . Example 13.33. For p (0, 1) , let Xp denote the number of trials to get success in a sequence of independent trials with success probability p. Then n P (Xp > n) = (1 p) and therefore for x > 0, P (pXp > x) = P Xp >
x

=
j =1

P (Xj x) = [F (x)] = 1 ex
Mn bn an

x p

x x = (1 p)[ p ] = e[ p ] ln(1p)

ep[ p ] ex as p 0. = Y. Therefore pXp = T where T = exp (1) , i.e. P (T > x) = ex for x 0 or alternatively, P (T y ) = 1 ey0 . Remarks on this example. Let us see in a couple of ways where the appropriate centering and scaling of the Xp come from in this example. For n1 this let q = 1 p, then P (Xp = n) = (1 p) p = q n1 p for n N. Also let Fp (x) = P (Xp x) = P (Xp [x]) = 1 q [x]
n d

We now wish to nd an > 0 and bn R such that 1. To this end we note that P Mn b n x an = P (Mn an x + bn )

= Fn (an x + bn ) = [F (an x + bn )] . If we demand (c.f. Eq. (13.18) above) P Mn bn 0 an = Fn (bn ) = [F (bn )] 1 (0, 1) ,

then bn and we nd ln 1 n ln F (bn ) = n ln 1 ebn nebn .


Page: 152 job: prob

where [x] := n=1 n 1[n,n+1) . Method 1. Our goal is to choose ap > 0 and bp R such that limp 0 Fp (ap x + bp ) exists. As above, we rst demand (taking x = 0) that
p 0

lim Fp (bp ) = 1 (0, 1) .

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

13.5 Weak Convergence Examples

153

Since, 1 Fp (bp ) 1 q bp we require, q bp 1 1 and hence, c bp ln q = bp ln (1 p) bp p. This suggests that we take bp = 1/p say. Having done this, we would like to choose ap such that F0 (x) := lim Fp (ap x + bp ) exists.
p 0

and in particular, E [Xp (Xp 1) . . . (Xp k + 1)] = f (k) (1) = Since d dz


k

|z=1

pz . 1 qz

Since, F0 (x) Fp (ap x + bp ) 1 q this requires that (1 p) and hence that ln (1 F0 (x)) = (ap x + bp ) ln q (ap x + bp ) (p) = pap x 1. From this (setting x = 1) we see that pap c > 0. Hence we might take ap = 1/p as well. We then have
p1 (x+1)] Fp (ap x + bp ) = Fp p1 x + p1 = 1 (1 p)[ ap x+bp ap x+bp

d pz p (1 qz ) + qpz p = = 2 2 dz 1 qz (1 qz ) (1 qz ) d2 pz pq =2 3 dz 2 1 qz (1 qz ) p
2

and =q
ap x+bp

1 F0 (x) it follows that

(1 q ) 2q E [Xp (Xp 1)] = 2 3 = p2 . (1 q ) pq Therefore,


2 2 p = Var (Xp ) = EXp (EXp ) = 2

p := EXp =

1 and p

which is equal to 0 if x 1, and for x > 1 we nd


p1 (x+1)] (1 p)[ = exp

p1 (x + 1) ln (1 p) exp ( (x + 1)) .

2q 1 + p2 p

1 p

Hence we have shown,


p 0

q 1p 2q + p 1 = 2 = . 2 p p p2

lim Fp (ap x + bp ) = [1 exp ( (x + 1))] 1x1 Xp 1/p = pXp 1 = T 1 1/p

Thus, if we had used p and p to center and scale Xp we would have considered, Xp
1p p 1 p

pXp 1 = T 1 = 1p

or again that pXp = T. Method 2. (Center and scale using the rst moment and the variance of Xp .) The generating function is given by

instead. Theorem 13.34. Let {Xn }n=1 be i.i.d. random variables such that P (Xn = 1) = 1/2 and let Sn := X1 + + Xn the position of a drunk after n steps. Observe that |Sn | is an odd integer if n is odd and an even Sm integer if n is even. Then = N (0, 1) as m . m Proof. (Sketch of the proof.) We start by observing that S2n = 2k i # {i 2n : Xi = 1} = n + k while # {i 2n : Xi = 1} = 2n (n + k ) = n k and therefore,
macro: svmonob.cls date/time: 23-Feb-2007/15:20

f (z ) := E z Xp =
n=1

z n q n1 p =

pz . 1 qz

Observe that f (z ) is well dened for |z | < 1 q and that f (1) = 1, reecting the fact that P (Xp N) = 1, i.e. a success must occur almost surely. Moreover, we have f (z ) = E Xp z f
(k ) Xp 1

, f (z ) = E Xp (Xp 1) z
Xp k

Xp 2

,...

(z ) = E Xp (Xp 1) . . . (Xp k + 1) z

Page: 153

job: prob

154

13 Weak Convergence Results

P (S2n = 2k ) =

2n n+k

1 2

2n

(2n)! (n + k )! (n k )!

1 2

2n

Recall Stirlings formula states, n! nn en 2n as n and therefore, P (S2n = 2k ) = (n +


n+k (n+k) k) e

2k with k {0, 1, . . . , n} . Since where the sum is over x of the form, x = 2n 2 is the increment of x as k increases by 1 , we see the latter expression in 2n Eq. (13.21) is the Riemann sum approximation to

1 2 This proves
S2n 2n

b a

ex

/2

dx.

= N (0, 1) . Since 1 1+
1 2n

(2n) n (n + k ) (n k ) 1 1+ 1
k n

2n 2n

4n
nk (nk) k) e

2 (n + k ) (n 1+ k n
(n+k)

2 (n k )

1 2

2n

S S n + X2n+1 S2 n 2n+1 = 2 = 2n + 1 2n + 1 2n

X2n+1 + , 2n + 1
S 2n+1 2n+1

1 k2 n2
n

k n k n

(nk)

1 = n 1 = n

1
n

k n

1 k n

1+ 1

1
k1/2

k n

it follows directly (or see Slutskys Theorem 13.20) that as well.

= N (0, 1)

k2 n2

k1/2

1+

k n

.
x , 2n

So if we let x := 2k/ 2n, i.e. k = x n/2 and k/n = P S 2n = x 2n

Proposition 13.35. Suppose that {Un }n=1 are i.i.d. random variables which are uniformly distributed in (0, 1) . Let U(k,n) denote the position of the k th largest number from the list, {U1 , U2 , . . . , Un } . Further let k (n) be chosen so n) that limn k (n) = while limn k( n = 0 and let Xn := U(k(n),n) k (n) /n .
k ( n) n

we have

n x n/21/2 1 x2 x x 1 1+ 1 2n n 2n 2n     1 x2 /2 x x n/ 2 1 / 2 n/ 2 1 / 2 x x e e 2n e 2n n 2 1 ex /2 , n wherein we have repeatedly used (1 + an ) We now compute P S2 n a b 2n =


axb bn

n/21/2

Then dT V (Xn , N (0, 1)) 0 as n . Proof. (Sketch only. See Resnick, Proposition 8.2.1 for more details.) Observe that, for x (0, 1) , that
n n

P U(k,n) x = P
i=1

Xi k

=
l=k

n l nl x (1 x) . l

= ebn ln(1+an ) ebn an when an 0.

d From this it follows that n (x) := 1(0,1) (x) dx P U(k,n) x is the probability density for U(k,n) . It now turns out that n (x) is a Beta distribution,

n (x) = P S 2n = x 2n ex
axb
2

n nk k xk1 (1 x) . k

1 = 2

/2

2 2n

(13.21)

Giving a direct computation of this result is not so illuminating. So let us go another route. To do this we are going to estimate, P U(k,n) (x, x + ] , for (0, 1) . Observe that if U(k,n) (x, x + ], then there must be at least one Ui (x, x + ], for otherwise, U(k,n) x + would imply U(k,n) x as well and hence U(k,n) / (x, x + ]. Let
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 154

job: prob

13.5 Weak Convergence Examples

155

i := {Ui (x, x + ] and Uj / (x, x + ] for j = i} . Since P (Ui , Uj (x, x + ] for some i = j with i, j n)
i<j n

we arrive at n n! (k 1)! (n k )! 2 1
k (k1/2) n k (nk+1/2) n

P (Ui , Uj (x, x + ]) n2 n 2 , 2

By the change of variables formula, with x= u k (n) /n


k ( n) n

we see that
n

P U(k,n) (x, x + ] =
i=1

P U(k,n) (x, x + ], i + O 2 on noting the du = x=

= nP U(k,n) (x, x + ], 1 + O 2 . Now on the set, 1 ; U(k,n) (x, x + ] i there are exactly k 1 of U2 , . . . , Un in [0, x] and n k of these in [x + , 1] . This leads to the conclusion that P U(k,n) n 1 k1 nk (x, x + ] = n x (1 (x + )) + O 2 k1

k ( n) dx, n

x = k (n) at u = 0, and

1 k (n) /n n k (n) = k ( n) k (n)


n

n k (n)

k (n) n

n k (n)

k (n) n

=: bn ,

and therefore, P U(k,n) (x, x + ] n! nk = xk1 (1 x) . n (x) = lim 0 (k 1)! (n k )! By Stirlings formula, n! (k 1)! (n k )! (k 1) 1 ne = 2 1 ne = 2 Since k1 n
(k1/2) 1

E [F (Xn )] =
0

u k ( n ) /n du n (u) F
k ( n) n

bn

= nn en 2n 2 (k 1) (n k ) 1
k1 n nk (nk) n (nk) (nk) e

k ( n)

k (n) n n

k (n) x + k (n) /n F (x) du. n

Using this information, it is then shown in Resnick that 2 (n k ) k (n) n n k (n) x + k (n) /n n ex /2 2
2

(k1) (k1) e

k1 (k1) n

nk n

1
k1 (k1/2) n

which upon an application of Sche es Lemma 13.3 completes the proof.


k (nk+1/2) n

. Remark 13.36. It is possible to understand the normalization constants in the denition of Xn by computing the mean and the variance of U(n,k) . After some computations (see Chapter ??), one arrives at

= =

k n k n

(k1/2)

(k1/2)

k1 k 1 k

(k1/2)

(k1/2)

1 k n
(k1/2)

e1

Page: 155

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

156

13 Weak Convergence Results

EU(k,n) = =
2 EU( k,n) =

= Var U(k,n) = =

n! nk xk1 (1 x) xdx 0 (k 1)! (n k )! k k , n+1 n 1 n! nk 2 xk1 (1 x) x dx ( k 1)! ( n k )! 0 (k + 1) k and (n + 2) (n + 1) (k + 1) k k2 (n + 2) (n + 1) (n + 1)2

for all . If, for all x R, we dene F = G+ as in Eq. (13.22), then Fn (x) F (x) for all x C (F ) . (Note well; as we have already seen, it is possible that F () < 1 and F () > 0 so that F need not be a distribution function for a measure on (R, BR ) .) Proof. Suppose that x, y R with x < y and and s, t are chosen so that x < s < y < t. Then passing to the limit in the inequality, Fn (s) Fn (y ) Fn (t) implies F (x) = G+ (x) G (s) lim inf Fn (y ) lim sup Fn (y ) G (t) .
n n

k k k+1 n+1 n+2 n+1 k nk+1 k = 2. n + 1 (n + 2) (n + 1) n

Taking the innum over t (y, ) and then letting x R tend up to y, we may conclude F (y ) lim inf Fn (y ) lim sup Fn (y ) F (y ) for all y R.
n n

13.6 Compactness and Tightness


are two right continuous Suppose that R is a dense set and F and F on , then F = F on R. Indeed, for x R we have functions. If F = F () = F (x) . F (x) = lim F () = lim F
x x

This completes the proof, since F (y ) = F (y ) for y C (F ) . , BR The next theorem deals with weak convergence of measures on R . So with as not have to introduce any new machinery, the reader should identify R [1, 1] R via the map, [1, 1] x tan . x R 2

Lemma 13.37. If G : R is a non-decreasing function, then F (x) := G+ (x) := inf {G () : x < } is a non-decreasing right continuous function. Proof. To show F is right continuous, let x R and such that > x. Then for any y (x, ) , F (x) F (y ) = G+ (y ) G () and therefore, F (x) F (x+) := lim F (y ) G () .
y x

(13.22)

, BR Hence a probability measure on R may be identied with a probability measure on (R, BR ) which is supported on [1, 1] . Using this identication, we see that a should only be considered a point of continuity of a distribution [0, 1] i and only if F () = 0. On the other hand, is function, F : R always a point of continuity. Theorem 13.39 (Hellys Selection Theorem). Every sequence of probabil , BR ity measures, {n }n=1 , on R has a sub-sequence which is weakly conver , BR gent to a probability measure, 0 on R . Proof. Using the identication described above, rather than viewing n as , BR probability measures on R , we may view them as probability measures on (R, BR ) which are supported on [1, 1] , i.e. n ([1, 1]) = 1. As usual, let Fn (x) := n ((, x]) = n ((, x] [1, 1]) . Since {Fn (x)}n=1 [0, 1] and [0, 1] is compact, for each x R we may nd a convergence subsequence of {Fn (x)}n=1 . Hence by Cantors diagonalization
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Since > x with is arbitrary, we may conclude, F (x) F (x+) G+ (x) = F (x) , i.e. F (x+) = F (x) . Proposition 13.38. Suppose that {Fn }n=1 is a sequence of distribution functions and R is a dense set such that G () := limn Fn () [0, 1] exists
Page: 156 job: prob

13.7 Weak Convergence in Metric Spaces

157

argument we may nd a subsequence, {Gk := of the such that G (x) := limk Gk (x) exists for all x := Q. Letting F (x) := G (x+) as in Eq. (13.22), it follows from Lemma 13.37 and Proposition 13.38 that Gk = Fnk = F0 . Moreover, since Gk (x) = 0 for all x Q (, 1) and Gk (x) = 1 for all x Q [1, ). Therefore, F0 (x) = 1 for all x 1 and F0 (x) = 0 for all x < 1 and the corresponding measure, 0 is supported on [1, 1] . Hence 0 may now be transferred back to a measure , BR on R . Example 13.40. Suppose n = and n = and 1 2 (n + n ) = 1 ( + ) . This shows that probability may indeed transfer to the points 2 at . The next question we would like to address is when is the limiting measure, , BR 0 on R concentrated on R. The following notion of tightness is the key to answering this question. Denition 13.41. A collection of probability measures, , on (R, BR ) is tight i for every > 0 there exists M < such that

Fnk }k=1

{Fn }n=1

we may nd M < such that M , M C (F0 ) and n ([M , M ]) 1 for all n. Hence it follows that 0 ([M , M ]) = lim nk ([M , M ]) 1
k

and by letting 0 we conclude that 0 (R) = lim0 0 ([M , M ]) = 1. Conversely, suppose there is a subsequence {nk }k=1 such that nk = 0 , BR with 0 being a probability measure on R such that 0 (R) < 1. In this case 0 := 0 ({, }) > 0 and hence for all M < we have 0 ({, }) = 1 0 . 0 ([M, M ]) 0 R By choosing M so that M and M are points of continuity of F0 , it then follows that lim nk ([M, M ]) = 0 ([M, M ]) 1 0 .
k

Therefore,
nN

inf n (([M, M ])) 1 0 for all M <

inf ([M , M ]) 1 .

(13.23)

and {n }n=1 is not tight.

We further say that a collection of random variables, {X : } is tight 1 i the collection probability measures, P X : is tight. Equivalently put, {X : } is tight i
M

13.7 Weak Convergence in Metric Spaces

(This section may be skipped.)


Denition 13.43. Let X be a metric space. A sequence of probability measures {Pn }n=1 is said to converge weakly to a probability P if limn Pn (f ) = P (f ) for all for every f BC (X ). This is actually weak-* convergence when viewing Pn BC (X ) . For simplicity we will now assume that X is a complete metric space throughout this section. Proposition 13.44. The following are equivalent: 1. Pn P as n , i.e. Pn (f ) P (f ) for all f BC (X ). 2. Pn (f ) P (f ) for every f BC (X ) which is uniformly continuous. 3. lim sup Pn (F ) P (F ) for all F X.
n w

lim sup P (|X | M ) = 0.

(13.24)

Observe that the denition of uniform integrability (see Denition 11.25) is considerably stronger than the notion of tightness. It is also worth observing that if > 0 and C := sup E |X | < , then by Chebyschevs inequality, 1 C sup P (|X | M ) sup E |X | 0 as M M M and therefore {X : } is tight. Theorem 13.42. Let := {n }n=1 be a sequence of probability measures on , BR (R, BR ) . Then is tight, i every subsequently limit measure, 0 , on R is supported on R. In particular if is tight, there is a weakly convergent subsequence of converging to a probability measure on (R, BR ) . Proof. Suppose that nk = 0 with 0 being a probability measure on , BR R . As usual, let F0 (x) := 0 ([, x]) . If is tight and > 0 is given,
Page: 157 job: prob

4. lim inf n Pn (G) P (G) for all G o X. 5. limn Pn (A) = P (A) for all A B such that P (bd(A)) = 0.

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

158

13 Weak Convergence Results

Proof. 1. = 2. is obvious. For 2. = 3., let 1 if t 0 (t) := 1 t if 0 t 1 0 if t 1

i=1

(i 1) 1{ (i1) f < i } f k k k

i=1

i 1 (i1) i . k { k f < k }

(13.26)

(13.25)

Let Fi :=
k

i k

and notice that Fk = . Then for any probability P,


k

and let fn (x) := (nd(x, F )). Then fn BC (X, [0, 1]) is uniformly continuous, 0 1F fn for all n and fn 1F as n . Passing to the limit n in the equation 0 Pn (F ) Pn (fm ) gives 0 lim sup Pn (F ) P (fm )
n

i=1

(i 1) [P (Fi1 ) P (Fi )] P (f ) k

i=1

i [P (Fi1 ) P (Fi )] . k

(13.27)

Since (i 1) [P (Fi1 ) P (Fi )] k i=1


k k

and then letting m in this inequality implies item 3. 3. 4. Assuming item 3., let F = Gc , then 1 lim inf Pn (G) = lim sup(1 Pn (G)) = lim sup Pn (Gc )
n n c n

=
i=1 k1

(i 1) (i 1) P (Fi1 ) P (Fi ) k k i=1 i i1 1 P (Fi ) P (Fi ) = k k k i=1


k k1

P (G ) = 1 P (G) \ Ao , which implies 4. Similarly 4. = 3. 3. 5. Recall that bd(A) = A so if P (bd(A)) = 0 and 3. (and hence also 4. holds) we have ) P (A ) = P (A) and lim sup Pn (A) lim sup Pn (A
n n

=
i=1

P (Fi )
i=1

and i [P (Fi1 ) P (Fi )] k i=1


k k

lim inf Pn (A) lim inf Pn (Ao ) P (Ao ) = P (A)


n n

= X and =
i=1 i=1 k1

from which it follows that limn Pn (A) = P (A). Conversely, let F set F := {x X : (x, F ) } . Then bd(F ) F \ {x X : (x, F ) < } = A

i1 1 [P (Fi1 ) P (Fi )] + [P (Fi1 ) P (Fi )] k k i=1 P (Fi ) + 1 , k

where A := {x X : (x, F ) = } . Since {A }>0 are all disjoint, we must have P (A ) P (X ) 1


>0

Eq. (13.27) becomes, 1 k


k1

P (Fi ) P (f )
i=1

1 k

k1

P (Fi ) + 1/k.
i=1

and in particular the set := { > 0 : P (A ) > 0} is at most countable. Let n / be chosen so that n 0 as n , then P (Fm ) = lim Pn (Fm ) lim sup Pn (F ).
n n

Using this equation with P = Pn and then with P = P we nd lim sup Pn (f ) lim sup
n n k1

1 k

k1

Pn (Fi ) + 1/k
i=1

Let m in this equation to conclude P (F ) lim supn Pn (F ) as desired. To nish the proof we will now show 3. = 1. By an ane change of variables it suces to consider f C (X, (0, 1)) in which case we have
Page: 158 job: prob macro: svmonob.cls

1 k

P (Fi ) + 1/k P (f ) + 1/k.


i=1

date/time: 23-Feb-2007/15:20

13.7 Weak Convergence in Metric Spaces

159

Since k is arbitrary, lim supn Pn (f ) P (f ). Replacing f by 1 f in this inequality also gives lim inf n Pn (f ) P (f ) and hence we have shown limn Pn (f ) = P (f ) as claimed. Theorem 13.45 (Skorohod Theorem). Let (X, d) be a separable metric space and {n }n=0 be probability measures on (X, BX ) such that n = 0 as n . Then there exists a probability space, (, B , P ) and measurable func1 tions, Yn : X, such that n = P Yn for all n N0 := N {0} and limn Yn = Y a.s. Proof. See Theorem 4.30 on page 79 of Kallenberg [3]. Denition 13.46. Let X be a topological space. A collection of probability measures on (X, BX ) is said to be tight if for every > 0 there exists a compact set K BX such that P (K ) 1 for all P . Theorem 13.47. Suppose X is a separable metrizable space and = {Pn }n=1 is a tight sequence of probability measures on BX . Then there exists a subse quence {Pnk }k=1 which is weakly convergent to a probability measure P on BX . Proof. First suppose that X is compact. In this case C (X ) is a Banach space which is separable by the Stone Weirstrass theorem, see Exercise ??. By the Riesz theorem, Corollary ??, we know that C (X ) is in one to one correspondence with the complex measures on (X, BX ). We have also seen that C (X ) is metrizable and the unit ball in C (X ) is weak - * compact, see Theo rem ??. Hence there exists a subsequence {Pnk }k=1 which is weak -* convergent to a probability measure P on X. Alternatively, use the cantors diagonaliza tion procedure on a countable dense set C (X ) so nd {Pnk }k=1 such that (f ) := limk Pnk (f ) exists for all f . Then for g C (X ) and f , we have |Pnk (g ) Pnl (g )| |Pnk (g ) Pnk (f )| + |Pnk (f ) Pnl (f )| + |Pnl (f ) Pnl (g )| 2 g f + |Pnk (f ) Pnl (f )| which shows lim sup |Pnk (g ) Pnl (g )| 2 g f
n

n (A) := P n (A X ) for all A BX by setting P . By what we have just proved, := P n such that P converges weakly to a there is a subsequence P k k =1 k k probability measure P on X. The main thing we now have to prove is that (X ) = 1, this is where the tightness assumption is going to be used. Given P n (K ) 1 for all n. Since > 0, let K X be a compact set such that P K is compact in X it is compact in X as well and in particular a closed subset Therefore by Proposition 13.44 of X. (K ) lim sup P (K ) = 1 . P k
k

Since > 0 is arbitrary, this shows with X0 := n=1 K1/n satises P (X0 ) = 1. , we may view P as a measure on B by letting P (A) := Because X0 BX BX X (A X0 ) for all A BX . Given a closed subset F X, choose F X such P X. Then that F = F (F ) P (F ) = P (F X0 ) = P (F ), lim sup Pk (F ) = lim sup P k
k k

which shows Pk P.

Letting f tend to g in C (X ) shows lim supn |Pnk (g ) Pnl (g )| = 0 and hence (g ) := limk Pnk (g ) for all g C (X ). It is now clear that (g ) 0 for all g 0 so that is a positive linear functional on X and thus there is a probability measure P such that (g ) = P (g ). General case. By Theorem 18.34 we may assume that X is a subset of We now extend Pn to X a compact metric space which we will denote by X.
Page: 159 job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20

14 Characteristic Functions (Fourier Transform)


Denition 14.1. Given a probability measure, on (Rn , BRn ) , let () :=
Rn

d ( ) (x) =
Rn

u (x y ) d (y ) dx.

eix d (x)

If we further assume that d (x) = v (x) dx, then we have d ( ) (x) =


Rn

be the Fourier transform or characteristic function of . If X = (X1 , . . . , Xn ) : Rn is a random vector on some probability space (, B , P ) , then we let f () := fX () := E eiX . Of course, if := P X 1 , then fX () = () . Notation 14.2 Given a measure on a measurable space, (, B ) and a function, f L1 () , we will often write (f ) for f d. Denition 14.3. Let and be two probability measure on (Rn , BRn ) . The 1 convolution of and , denoted , is the measure, P (X + Y ) where 1 {X, Y } are two independent random vectors such that P X = and P Y 1 = . Of course we may give a more direct denition of the convolution of and by observing for A BRn that (A) = P (X + Y A) =
Rn

u (x y ) v (y ) dy dx.

To simplify notation we write, u v (x) =


Rn

u (x y ) v (y ) dy =
Rn

v (x y ) u (y ) dy.

Example 14.5. Suppose that n = 1, d (x) = 1[0,1] (x) dx and d (x) = 1[1,0] (x) dx so that (A) = (A) . In this case d ( ) (x) = 1[0,1] 1[1,0] (x) dx where 1[0,1] 1[1,0] (x) =
R

1[1,0] (x y ) 1[0,1] (y ) dy 1[0,1] (y x) 1[0,1] (y ) dy


R

= (14.1) (14.2) (14.3) =


R

d (x)
Rn

d (y ) 1A (x + y )

1[0,1]+x (y ) 1[0,1] (y ) dy

=
Rn

(A x) d (x) (A x) d (x) .
Rn

= m ([0, 1] (x + [0, 1])) = (1 |x|)+ .

14.1 Basic Properties of the Characteristic Function


Denition 14.6. A function f : Rn C is said to be positive denite, i m f () = f () for all Rn and for all m N, {j }j =1 Rn the matrix, {f (j k )}j,.k=1
m m

Remark 14.4. Suppose that d (x) = u (x) dx where u (x) 0 and u (x) dx = 1. Then using the translation invariance of Lebesgue meaRn sure and Tonellis theorem, we have (f ) =
Rn Rn

is non-negative. More explicitly we require,

f (x + y ) u (x) dxd (y ) =
Rn Rn

f (x) u (x y ) dxd (y )

k 0 for all (1 , . . . , m ) Cm . f (j k ) j
j,k=1

from which it follows that

162

14 Characteristic Functions (Fourier Transform)


m m

Notation 14.7 For l N {0} , let C l (Rn , C) denote the vector space of functions, f : Rn C which are l - time continuously dierentiable. More explicitly, , then f C l (Rn , C) i the partial derivatives, j1 . . . jk f, exist if j := x j and are continuous for k = 1, 2, . . . , l and all j1 , . . . , jk {1, 2, . . . , n} . Proposition 14.8 (Basic Properties of ). Let and be two probability measures on (Rn , BRn ) , then; 1. (0) = 1, and | ()| 1 for all . 2. () is continuous. 3. () = () for all Rn and in particular, is real valued i is symmetric, i.e. i (A) = (A) for all A BRn . (If = P X 1 for some random vector X, then is symmetric i X = X.) 4. is a positive denite function. (For the converse of this result, see Bochners Theorem 14.41 below. l 5. If Rn x d (x) < , then C l (Rn , C) and j1 . . . jm () =
Rn d

k = (j k ) j
j,k=1 Rn j,k=1 m

k d (x) ei(j k )x j eij x j eik x k d (x)


Rn j,k=1 m 2

=
Rn j =1

ij x

d (x) 0.

Example 14.9 (Example 14.5 continued.). Let d (x) = 1[0,1] (x) dx and (A) = (A) . Then () = ei 1 , i 0 ei 1 () = () = () = , and i eix dx =
2 1

(ixj1 . . . ixjm ) eix d (x) for all m l.

6. If X and Y are independent random vectors then fX +Y () = fX () fY () for all Rn . This may be alternatively expressed as () = () () for all Rn . 7. If a R, b Rn , and X : Rn is a random vector, then faX +b () = eib fX (a) . Proof. The proof of items 1., 2., 6., and 7. are elementary and will be left to the reader. It also easy to see that () = () and () = () if is symmetric. Therefore if is symmetric, then () is real. Conversely if () is real then () = () = eix d (x) = ()
Rn

() = () () = | ()| =

ei 1 i

2 [1 cos ] . 2

According to example 14.5 we also have d ( ) (x) = (1 |x|)+ dx and so directly we nd () =


R 1

eix (1 |x|)+ dx =
R

cos (x) (1 |x|)+ dx


1

=2
0

(1 x) cos x dx = 2
0 1 1 0

(1 x) d

sin x

sin x = 2 d (1 x) =2 0 1 cos . =2 2

sin x cos x x=1 dx = 2 |x=0 2

Proposition 14.10 (Injectivity of the Fourier Transform). If and are two probability measure on (Rn , BRn ) such that = , then = . Proof. Let H be the subspace of bounded measurable complex functions, f : Rn C, such that (f ) = (f ) . Then H is closed under bounded convergence and complex conjugation. Suppose that Zd is a nite set, L > 0 and p (x) =

where (A) := (A) . The uniqueness Proposition 14.10 below then implies = , i.e. is symmetric. This proves item 3. Item 5. follows by induction using Corollary 8.38. For item 4. let m N, m {j }j =1 Rn and (1 , . . . , m ) Cm . Then

a eix/(2L)

(14.4)

Page: 162

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

14.1 Basic Properties of the Characteristic Function

163

with a C. Then by assumption, (p) =

2L

2L

= (p)

Proof. This will be proved by induction on m. We start with m = 0 in which case we automatically we know by Proposition 14.8 or Lemma 14.11 that f C (R, C)). Since u () := Re f () = E [cos (X )] , it follows that u is an even function of and hence u = Re f is an odd function of and in particular, u (0) = 0. By the mean value theorem, to each > 0 with near 0, there exists 0 < c < such that u () u (0) = u (c ) = u (c ) u (0) . Therefore, u (0) u () u (c ) u (0) = u (0) as 0. c c Since E and lim0 1 cos (X ) u (0) u () 1 cos (X ) E = 2 c c
2 =1 2 X , we may apply Fatous lemma to conclude,

so that p H. From the Stone-Weirstrass theorem (see Exercise 14.7 below) or the theory of the Fourier series, any f C (Rn , C) which is L periodic, (i.e. f (x + Lei ) = f (x) for all x Rd and i = 1, 2, . . . , n) may be uniformly approximated by a trigonometric polynomial of the form in Eq. (14.4), see Exercise 14.8 below. Hence it follows from the bounded convergence theorem that f H for all f C (Rn , C) which are L periodic. Now suppose f Cc (Rn , C) . Then for L > 0 suciently large the function, fL (x) :=
Zn

f (x + L) ,

is continuous and L periodic and hence fL H. Since fL f boundedly as L , we may further conclude that f H as well, i.e. Cc (Rn , C) H. An application of the multiplicative system Theorem (see either Theorem 9.3 or Theorem 9.14) implies H contains all bounded (Cc (Rn , R)) = BRn measurable functions and this certainly implies = . For the most part we are now going to stick to the one dimensional case, i.e. X will be a random variable and will be a probability measure on (R, BR ) . The following Lemma is a special case of item 4. of Proposition 14.8. Lemma 14.11. Suppose n N and X is random variables such that E [|X | ] < () := E eiX is C n . If = P X 1 is the distribution of X, then dierentiable and (l) () = E (iX ) eiX =
R l n

1cos(X ) 2

1 1 cos (X ) E X 2 lim inf E u (0) < . 0 2 2 An application of Lemma 14.11 then implies that f C 2 (R, C) . For the general induction step we assume the truth of the theorem at level m in which case we know by Lemma 14.11 that f (2m) () = (1) E X 2m eiX =: (1) g () . By assumption we know that g is dierentiable in a neighborhood of 0 and that g (0) exists. We now proceed exactly as before but now with u := Re g. So for each > 0 near 0, there exists c (0, ) such that u (0) u () u (0) as 0 c and E X 2m 1 cos (X ) 1 cos (X ) u (0) u () E X 2m = . 2 c c
m m

(ix) eix d (x) for l = 0, 1, 2, . . . , n.

In particular it follows that E Xl (l) (0) = . il

The following theorem is a partial converse to this lemma. Hence the combination of Lemma 14.11 and Theorem 14.12 (see also Corollary 14.34 below) shows that there is a correspondence between the number of moments of X and the dierentiability of fX . Theorem 14.12. Let X be a random variable, m {0, 1, 2, . . . } , f () = E eiX . If f C 2m (R, C) such that g := f (2m) is dierentiable in a neighborhood of 0 and g (0) = f (2m+2) (0) exists. Then E X 2m+2 < and f C 2m+2 (R, C) .
Page: 163 job: prob

Another use of Fatous lemma gives, 1 1 cos (X ) E X 2m+2 = lim inf E X 2m u (0) < 0 2 2 from which Lemma 14.11 may be used to show f C 2m+2 (R, C) . This completes the induction argument.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

164

14 Characteristic Functions (Fourier Transform)

14.2 Examples
Example 14.13. If < a < b < and d (x) = () = 1 ba
b 1 ba 1[a,b]

FT (t) := P (T t) = 1 ea(t0) . Since FT (t) is piecewise dierentiable, the law of T, := P T 1 , has a density, (x) dx then Therefore, E eiaT =
0

d (t) = FT (t) dt = aeat 1t0 dt.

eix dx =
a

eib eia . i (b a)

aeat eit dt = a

If a = c and b = c with c > 0, then () = Observe that sin c . c Since

a = () . a i a (a i)
3

() = i it follows that ET = and hence Var (T ) =


2 a2

(a i)

and () = 2

1 () = 1 2 c2 + . . . 3!

1 2 and therefore, (0) = 0 and (0) = 3 c and hence it follows that

(0) (0) 2 = a1 and ET 2 = = 2 2 i i a


1 2 a

xd (x) = 0 and
R

1 x d (x) = c2 . 3 R
2

= a2 .
2

Example 14.14. Suppose Z is a Poisson random variable with mean a > 0, i.e. n P (Z = n) = ea a n! . Then

Proposition 14.16. If d (x) := 1 ex 2 ular we have xd (x) = 0 and


R

/2

dx, then () = e x2 d (x) = 1.

/2

. In partic-

fZ () = E eiZ = ea
n=0

ein

aei an = ea n! n! n=0

= exp a ei 1

Proof. Dierentiating the formula, 1 () = 2 ex


R
2

/2 ix

Dierentiating this result gives, fZ () = iaei exp a ei 1 and

dx,

for with respect to and then integrating by parts implies, 1 () = 2 i = 2 i = 2 ixex


R
2

fZ () = a2 ei2 aei exp a ei 1 from which we conclude, 1 EZ = fZ (0) = a and EZ 2 = fZ (0) = a2 + a. i Therefore, EZ = a = Var (Z ) . Example 14.15. Suppose T is a positive random variable such that P (T t + s|T s) = P (T t) for all s, t 0, or equivalently P (T t + s) = P (T t) P (T s) for all s, t 0, then P (T t) = eat for some a > 0. (Such exponential random variables are often used to model waiting times.) The distribution function for T is

/2 ix

dx

d x2 /2 ix e e dx dx R 2 d ex /2 eix dx = () . dx R

Solving this equation of () then implies () = e


2

/2

(0) = e

/2

(R) = e

/2

Page: 164

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

14.3 Continuity Theorem

165

Example 14.17. If is a probability measure on (R, BR ) and n N, then n is the characteristic function of the probability measure, namely the measure
n times

14.3 Continuity Theorem


Lemma 14.20 (Tail Estimate). Let X : (, B , P ) R be a random variable and fX () := E eiX be its characteristic function. Then for a > 0, P (|X | a) a 2
2/a

:= .

(14.5)

1 Alternatively put, if {Xk }k=1 are i.i.d. random variables with = P Xk , then n fX1 ++Xn () = fX () . 1

(1 fX ()) d =
2/a

a 2

2/a

(1 Re fX ()) d
2/a

(14.6)

Example 14.18. Suppose that {n }n=0 are probability measure on (R, BR ) and {pn }n=0 [0, 1] such that n=0 pn = 1. Then n=0 pn n is the characteristic function of the probability measure,

Proof. Recall that the Fourier transform of the uniform distribution on c [c, c] is sin c and hence 1 2c Therefore, 1 2c where
c

fX () d =
c

1 2c

E eiX d = E
c

sin cX . cX

:=
n=0

pn n .
. Let {Xn }n=0 {T } be independent P (T = n) = pn for all n N0 . Then

Here is a more interesting interpretation of 1 = n and random variables with P Xn (A) = P (XT A) , where XT ( ) := XT () ( ) . Indeed,

(1 fX ()) d = 1 E
c

sin cX cX

= E [Yc ]

(14.7)

(A) = P (XT A) =
n=0

P (XT A, T = n) =
n=0

P (Xn A, T = n)

sin cX . cX Notice that Yc 0 (see Eq. (14.47)) and moreover, Yc 1/2 if |cX | 2. Hence we may conclude Yc := 1 E [Yc ] E [Yc : |cX | 2] E 1 1 : |cX | 2 = P (|X | 2/c) . 2 2

=
n=0

P (Xn A, T = n) =
n=0

pn n (A) .

Let us also observe that


Combining this estimate with Eq. (14.7) shows, E eiXn : T = n 1 2c


c

() = E eiXT =
n=0

E eiXT : T = n =
n=0

(1 fX ()) d
c

1 P (|X | 2/c) . 2

=
n=0

E eiXn P (T = n) =
n=0

pn n () .

Taking a = 2/c in this estimate proves Eq. (14.6). Theorem 14.21 (Continuity Theorem). Suppose that {n }n=1 is a sequence of probability measure on (R, BR ) and suppose that f () := limn n () exists for all R. If f is continuous at = 0, then f is the characteristic function of a unique probability measure, , on BR and n = as n . Proof. By the continuity of f at = 0, for ever > 0 we may choose a suciently large so that 1 a 2
macro: svmonob.cls
2/a

n is the Example 14.19. If is a probability measure on (R, BR ) then n=0 pn characteristic function of a probability measure, , on (R, BR ) . In this case, = n=0 pn n where n is dened in Eq. (14.5). As an explicit example, if n a , then a > 0 and pn = a n! e

pn n =
n=0

an a n 1) e = ea ea = ea( n ! n=0

is the characteristic function of a probability measure. In other words, fXT () = E eiXT = exp (a (fX1 () 1)) .
Page: 165 job: prob

(1 Re f ()) d /2.
2/a

date/time: 23-Feb-2007/15:20

166

14 Characteristic Functions (Fourier Transform)

According to Lemma 14.20 and the DCT, n ({x : |x| a }) 1 a 2 1 a 2


2/a

Solution to Exercise (14.1). Working as above, we have (1 Re n ()) d 1 2c


d

2/a 2/a

1 eiX d = 1
[c,c]
d

sin cXj =: Yc , cXj j =1

(14.9)

(1 Re f ()) d /2.
2/a

where as before, Yc 0 and Yc 1/2 if c |Xj | 2 for some j, i.e. if c |X | 2. Therefore taking expectations of Eq. (14.9) implies, 1 2c
d [c,c]d

Hence n ({x : |x| a }) for all suciently large n, say n N. By increasing a if necessary we can assure that n ({x : |x| a }) for all n and hence := {n }n=1 is tight. By Theorem 13.42, we may nd a subsequence, {nk }k=1 and a probability measure on BR such that nk = as k . Since x eix is a bounded and continuous function, it follows that () = lim nk () = f () for all R,
k

(1 fX ()) d = E [Yc ] E [Yc : |X | 2/c] E 1 1 : |X | 2/c = P (|X | 2/c) . 2 2

Taking c = 2/a in this expression implies Eq. (14.8). The following lemma will be needed before giving our rst applications of the continuity theorem. Lemma 14.23. Suppose that {zn }n=1 C satises, limn nzn = C, then n lim (1 + zn ) = e .
n Proof. Since nzn , it follows that zn n 0 as n and therefore ln(1+zn ) by Lemma 14.45 below, (1 + zn ) = e and 2 ln (1 + zn ) = zn + O zn = zn + O

that is f is the characteristic function of a probability measure, . We now claim that n = as n . If not, we could nd a bounded continuous function, g, such that limn n (g ) = (g ) or equivalently, there would exists > 0 and a subsequence {k := nk } such that | (g ) k (g )| for all k N. However by Theorem 13.42 again, there is a further subsequence, l = kl of k such that l = for some probability measure . Since () = liml l () = f () = () , it follows that = . This leads to a contradiction since, lim | (g ) l (g )| = | (g ) (g )| = 0.
l

1 n2

Therefore, (1 + zn ) = eln(1+zn )
n n

Remark 14.22. One could also use Bochners Theorem 14.41 to conclude; if f () := limn n () is continuous then f is the characteristic function of a probability measure. Indeed, the condition of a function being positive denite is preserved under taking pointwise limits. Exercise 14.1. Suppose now X : (, B , P ) Rd is a random vector and fX () := E eiX is its characteristic function. Show for a > 0, P (|X | a) 2 a 4
d

= en ln(1+zn ) = en(zn +O( n2 )) e as n .


1

Proposition 14.24 (Weak Law of Large Numbers revisited). Suppose P n that {Xn }n=1 are i.i.d. integrable random variables. Then S n EX1 =: . Proof. Let f () := fX1 () = E eiX1 . Then by Taylors theorem, f () = 1 + i + o () . Since, f Sn () = f
n

(1 fX ()) d = 2
[2/a,2/a]d

a 4

d [2/a,2/a]d

(1 Re fX ()) d (14.8)

= 1 + i

+o n

1 n

where |X | = maxi |Xi | and d = d1 , . . . , dd .

it follows from Lemma 14.23 that


macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 166

job: prob

14.3 Continuity Theorem


n

167

lim f Sn () = ei
n

Corollary 14.26. If {Xn }n=1 are 2 that EX1 = 0 and EX1 = 1, then

i.i.d. square integrable random variables such

which is the characteristic function of the constant random variable, . By the n continuity Theorem 14.21, it follows that S and since is constant we n = may apply Lemma 13.19 to conclude
Sn n

sup P
R

S n y P (N (0, 1) y ) 0 as n . n

(14.11)

Theorem 14.25 (The Basic Central Limit Theorem). Suppose that {Xn }n=1 are i.i.d. square integrable random variables such that EX1 = 0 and Sn 2 EX1 = 1. Then = N (0, 1) . n Proof. By Theorem 14.21 and Proposition 14.16, it suces to show
n

Proof. This is a direct consequence of Theorem 14.25 and Exercise 13.3. Berry (1941) and Esse n (1942) showed there exists a constant, C < , such 3 3 that; if := E |X1 | < , then sup P
R

S n y P (N (0, 1) y ) C n

/ n.

lim E e

Sn i n

=e

2 /2

for all R.

Letting f () := E eiX1 , we have by Taylors theorem (see Eq. (14.43) and (14.46)) that 1 (14.10) f () = 1 (1 + ()) 2 2 where () 0 as 0. Therefore,
Sn () = E e f n Sn i n

In particular the rate of convergence is n1/2 . The exact value of the best constant C is still unknown but it is known to be less than 1. We will not prove this theorem here. However we will give a related result in Theorem 14.28 below. Remark 14.27. It is now a reasonable question to ask why is the limiting random variable normal in Theorem 14.25. One way to understand this is, if Sn under the assumptions of Theorem 14.25, we know = L where L is some n 2 random variable with EL = 0 and EL = 1, then S 1 2n = 2n 2
2n k=1, k odd

= f 1+

n n

1 = 1 2

2 n

Xj

/2

2n k=1, k even

Xj

(14.12)

wherein we have used Lemma 14.23 with zn = 1 2 1+ n 2 . n


d d

1 = (L1 + L2 ) 2 where L1 = L = L2 and L1 and L2 are independent. To rigorously understand this, using characteristic functions we would conclude from Eq. (14.12) that
S2n () = f Sn f 2n

Alternative proof. This proof uses Lemma 15.6 below as follows;


Sn () e f
n 2

/2

n 2 e /2n = f n 2 n f e /2n n 1 2 2 =n 1 1+ 1 +O 2 n 2n n

Sn f

Sn () = Passing to the limit in this equation then shows, with f () = limn f n

fL () , that 1 n2 f () = f Iterating this equation then shows 2


2n n

0 as n . = 1 1 2 2
n 2

2n 1+ 2
n

f () = f

Page: 167

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

168

14 Characteristic Functions (Fourier Transform)

An application of Lemma 14.23 then shows 1 f () = lim 1 n 2 =e


2 1 2

r (x, ) := 2n 1+ 2
n

1 2

f
0

(x + t) (1 t) dt.

2 n

Taking Eq. (14.15) with replaced by and subtracting the results then implies 1 f (x + ) f (x + ) = f (x) ( ) + f (x) 2 2 + (x, ) , (14.16) 2 where | (x, )| = r (x, ) 3 r (x, ) 3 M 3 3 || + | | , 3! (14.17)

= fN (0,1) () .

That is we must have L = N (0, 1) . It is interesting to give another proof of the central limit theorem. For this proof we will assume {Xn }n=1 has third moments. The only property about normal random variables that we shall use the proof is that if {Nn }n=1 are i.i.d. standard normal random variables, then T N + + Nn d n := 1 = N (0, 1) . n n Theorem 14.28 (A Non-Characteristic Proof of the CLT). Suppose that 3 {Xn }n=1 are mean zero variance one i.i.d random variables such that E |X1 | < 3 (3) . Then for f C (R) with M := supxR f (x) < , Ef S n n 1 M 3 3 E |N | + |X1 | Ef (N ) n 3!
d

wherein we have used the simple estimate, |r (x, )| M/3!. If we dene Uk := (N1 + + Nk1 + Xk+1 + + Xn ) / n, then Vk = Uk + Nk / n and V = U + X / n. Hence, using Eq. (14.16) with x = Uk , k 1 k k = Nk / n and = Xk / n, it follows that f (Vk ) f (Vk1 ) = f Uk + Nk / n f Uk + Xk / n 1 1 2 2 = f (Uk ) (Nk Xk ) + f (Uk ) Nk Xk + Rk 2n n (14.18) where M 3 3 |Nk | + |Xk | . (14.19) 3! n3/2 2 Taking expectations of Eq. (14.18) using; Eq. (14.19), ENk = 1 = EXk , ENk = 2 1 = EXk and the fact that Uk is independent of both Xk and Nk , we nd |Rk | = |E [f (Vk ) f (Vk1 )]| = |ERk | M 3 3 E |Nk | + |Xk | 3! n3/2

(14.13)

where Sn := X1 + + Xn and N = N (0, 1) . n , Nn Proof. Let X be independent random variables such that Nn = n=1 d n by Xn . Let N (0, 1) and Xn = X1 . To simplify notation, we will denote X Tn := N1 + + Nn and for 0 k n, let Vk := (N1 + + Nk + Xk+1 + + Xn ) / n with the convention that Vn = Sn / n and V0 = Tn / n. Then by a telescoping series argument, it follows that f Sn / n f Tn / n = f (Vn ) f (V0 ) =
n d

M 3 3 E |N1 | + |X1 | . 3! n3/2


n n

Combining this estimate with Eq. (14.14) shows, E f Sn / n f T n / n =


k=1

ERk
k=1

E |Rk |

[f (Vk ) f (Vk1 )] . (14.14)


k=1

1 M 3 3 E |N1 | + |X1 | . n 3! This completes the proof of Eq. (14.13) since


Tn () = f fN

Tn d = n

N because,

We now make use of Taylors theorem with integral remainder the form, 1 f (x + ) f (x) = f (x) + f (x) 2 + r (x, ) 3 2 where
Page: 168 job: prob

(14.15)

1 2 = exp n 2 n

= exp 2 /2 = fN () .

For more in this direction the reader is advised to look up Steins method.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

14.4 A Fourier Transform Inversion Formula

169

14.4 A Fourier Transform Inversion Formula


Proposition 14.10 guarantees the injectivity of the Fourier transform on the space of probability measures. Our next goal is to nd an inversion formula for the Fourier transform. To motivate the construction below, let us rst recall a few facts about Fourier series. To keep our exposition as simple as possible, we now restrict ourselves to the one dimensional case. n i L x For L > 0, let eL and let n (x) := e 1 (f, g )L := 2L
L

([a, b]) =
a

f (x) dx =
a b

1 2

() eix d dx

= =

1 2 1 2

()
a

eix dx d eia eib i d d.

()
c

= lim f (x) g (x) dx

1 c 2

()
c

eia eib i

for f, g L2 ([L, L] , dx) . Then it is well known (and fairly elementary 2 to prove) that eL n : n Z is an orthonormal basis for L ([L, L] , dx) . In particular, if f Cc (R) with supp(f ) [L, L] , then for x [L, L] , f (x) =
nZ

This should provide some motivation for Theorem 14.30 below. The following lemma is needed in the proof of the inversion Theorem 14.30 below. Lemma 14.29. For c > 0, let S (c) := 1 2
c c

f, eL n

eL L n

(x) =

1 2L

L nZ L

sin d.

(14.21)

f (y ) ei L y dy ei L x (14.20)

Then S (c) boundedly as c and


c c

1 = 2L where

nZ

n x n ei L f L

sin y d = sgn(y )S (c |y |) for all y R. 1 if y > 0 sgn(y ) = 1 if y < 0 . 0 if y = 0

(14.22)

where

() = f

f (y ) eiy dy.

Letting L in Eq. (14.20) then suggests that 1 2L


n 1 x n ei L f L 2

() eix d f

Proof. The rst assertion has already been dealt with in Example 10.12. We will repeat the argument here for the readers convenience. By symmetry and Fubinis theorem, S (c) =
1 c 1 c sin d = sin et dt d 0 0 0 c 1 = dt d sin et 0 0 1 1 1 = + etc [ cos c t sin c] dt, 2 0 1 + t2 c c

nZ

and we are lead to expect, f (x) = 1 2

() eix d. f

(14.23)

Hence if we now think that f (x) is a probability density and let d (x) := () , we should expect f (x) dx so that () = f

wherein we have used


c

d sin et = Im
0 0

dei et = Im
0

de(it) e(it)c 1 (i t)

= Im =

e(it)c 1 (i t)

1 Im 1 + t2

1 etc [ cos c t sin c] + 1 1 + t2


date/time: 23-Feb-2007/15:20

Page: 169

job: prob

macro: svmonob.cls

170

14 Characteristic Functions (Fourier Transform)

1 1 1 dt = . 0 1 + t2 2 The the integral in Eq. (14.23) tends to as c by the dominated convergence theorem. The second assertion in Eq. (14.22) is a consequence of the change of variables, z = y. Theorem 14.30 (Fourier Inversion Formula). If is a probability measure on (R, BR ) and < a < b < , then 1 c 2 lim
c

and

Corollary 14.31. Suppose that is a probability measure on (R, BR ) such that L1 (m) , then d = dm where is a continuous density on R. Proof. The function, (x) := 1 2 () eix d,
R

()
c

eia eib i
c

d = ((a, b)) +

1 ( ({a}) + ({b})) . 2

is continuous by the dominated convergence theorem. Moreover,


b

Proof. By Fubinis theorem and Lemma 14.29, I (c) :=


c c

(x) dx =
a

1 2

dx
a R

d () eix
b

()

eia eib i

d d

=
c R

eix d (x)
c

eia eib i e
ia

1 2 1 = 2 =

d ()
R a

dxeix eia eib i eia eib d i

d ()
R c

=
R

d (x)
c c

deix d
c

e i e i

ib

1 = lim 2 c .

()
c

=
R

d (x)

i(ax)

i(bx)

= ((a, b)) +

1 [ ({a}) + ({b})] . 2

Since Im ei(ax) ei(bx) i


c

cos ( (a x)) cos ( (b x))

Letting a b over a R such that ({a}) = 0 in this identity shows ({b}) = 0 for all b R. Therefore we have shown
b

((a, b]) =
a

(x) dx for all < a < b < .

is an odd function of it follows that I (c) =


R

d (x)
c c

d Re d
c

=
R

d (x)

ei(ax) ei(bx) i sin (x a) sin (x b)

Using one of the multiplicative systems theorems, it is now easy to verify that (A) = A (x) dx for all A BR or R hd = R hd for all bounded measurable functions h : R R. This then implies that 0, m a.e., and the d = dm. Example 14.32. Recall from Example 14.9 that eix (1 |x|)+ dx = 2
R

= 2
R

d (x) [sgn(x a)S (c |x a|) sgn(x b)S (c |x b|)] .

Now letting c in this expression (using the DCT) shows 1 1 lim I (c) = c 2 2 1 = 2 d (x) [sgn(x a) sgn(x b)]
R

1 cos . 2

Hence it follows1 from Corollary 14.31 that (1 |x|)+ =


1

d (x) 2 1(a,b) (x) + 1{a} (x) + 1{b} (x)


R

1 cos ix e d. 2

(14.24)

= ((a, b)) +

1 [ ({a}) + ({b})] . 2
job: prob

This identity could also be veried directly using residue calculus techniques from complex variables.

Page: 170

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

14.5 Exercises

171

Corollary 14.33. For all random variables, X, we have E |X | = 1 1 1 1 Re fX () d. 2 1 cos d. 2 (14.25)

Proof. Let u () := Re fX () = E [cos X ] and assume that u C 1 ((2, 2) , C) . Then according to Eq. (14.25) E |X | =
R

Proof. Evaluating Eq. (14.24) at x = 0 implies


1 u () d = 2

||

1 u () d + 2

||>

1 u () d. 2

1=

Since 0 1 u () 2 and 2/2 is integrable for || > , it suces to show >


||

Making the change of variables, M , in the above integral then shows M= 1 cos (M ) d. 2 1 Re fX () d. 2
1cos d 2

1 u () d = lim 0 2

||

1 u () d. 2

By an integration by parts we nd 1 u () d = 2 = is , we could =


||

Now let M = |X | in this expression and then take expectations to nd 1 E |X | = 1 cos X 1 E d = 2 R


||

(1 u ()) d 1
||

u () 1 u () 1 | + | 1 u () d +

1 u () d
||

Suppose that we did not know the value of c := still proceed as above to learn E |X | = 1 c

1 Re fX () d. 2

u () 1 u () 1

u ( ) 1 u ( ) 1 . + lim
0 ||

We could then evaluate c by making a judicious choice of X. For example if d X = N (0, 1) , we would have on one hand 1 E |X | = 2 |x| ex
R
2

1 u () d +

u () + u () + u (0) u (0)

/2

2 dx = 2 and so
1

xex

/2

dx =

2 .

||

u () + u () |u ()| d + ||

On the other hand, fX () = e 2 1 = c = 1 c 1e


R

2 /2

=2
0

|u ()| u () + u () d + < .

2 /2

d 2 c

1 = c

d 1e
R

2 /2

Passing the limit as 0 using the fact that u () is an odd function, we learn 1 u () d = lim 0 2

e
R

/2

d =

1 u () d +
||

||

u () + u ()

from which it follows, again, that c = . Corollary 14.34. Suppose X is a random variable such that u () := fX () continuously dierentiable for (2, 2) for some > 0. We further assume
0

2
0

|u ()| u () + u () d + < .

|u ()| d < .

(14.26)

Then E |X | < and fX C 1 (R, C) . (Since u is even, u is odd and u (0) = 0. Hence if u () were H older continuous for some > 0, then Eq. (14.26) would hold.)
Page: 171 job: prob

14.5 Exercises
Exercise 14.2. For x, R, let

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

172

14 Characteristic Functions (Fourier Transform)

(, x) :=

ix 1ix e if x = 0 x2
2 1 2

if x = 0.

(It is easy to see that (, 0) = limx0 (, x) and in fact that (, x) is n n smooth in (, x) .) Let {xk }k=1 R \ {0} , {Zk }k=1 {N } be independent random variables with N = N (0, 1) and Zk being Poisson random variables an with mean ak > 0, i.e. P (Zk = n) = eak nk ! for n = 0, 1, 2 . . . . With Y := n x ( Z a ) + N, show k k k=1 k fY () := E eiY = exp
R d

Exercise 14.5 (Exercise 2.3 in [5]). Let be the probability measure on 1 (R, BR ) , such that ({n}) = p (n) = c n2 ln |n| 1|n|2 with c chosen so that 1 p ( n ) = 1 . Show that C ( R , C ) even though R |x| d (x) = . To do nZ this show, 1 cos nt g (t) : n2 ln n
n2

is continuously dierentiable. Exercise 14.6 (Polyas Criterioin [1, Problem 26.3 on p. 305.] and [2, p. 104-107.]). Suppose () is a non-negative symmetric continuous function such that (0) = 1, () is non-increasing and convex for 0. Show () = () for some probability measure, , on (R, BR ) . Solution to Exercise (14.6). Because of the continuity theorem and some simple limiting arguments, it suces to prove the result for a function as pictured in Figure 14.1. From Example 14.32, we know that (1 ||)+ = ()

(, x) d (x)

where is the discrete measure on (R, BR ) given by


n

= 2 0 +
k=1

ak x2 k xk .

(14.27)

Exercise 14.3. To each nite and compactly supported measure, , on (R, BR ) show there exists a sequence {n }n=1 of nitely supported nite measures on (R, BR ) such that n = . Here we say is compactly supported if there exists M < such that ({x : |x| M }) = 0 and we say is nitely supported if there exists a nite subset, R such that (R \ ) = 0. Please interpret n = to mean, f dn
R R

f d for all f BC (R) .

Exercise 14.4. Show that if is a nite measure on (R, BR ) , then f () := exp


R

(, x) d (x)

(14.28)

is the characteristic function of a probability measure on (R, BR ) . Here is an outline to follow. (You may nd the calculus estimates in Section 14.8 to be of help.) 1. Show f () is continuous. 2. Now suppose that is compactly supported. Show, using Exercises 14.2, 14.3, and the continuity Theorem 14.21 that exp R (, x) d (x) is the characteristic function of a probability measure on (R, BR ) . 3. For the general case, approximate by a sequence of nite measures with compact support as in item 2.

Fig. 14.1. Here is a piecewise linear convex function. We will assume that dn > 0 for all n and that () = 0 for suciently large. This last restriction may be removed later by a limiting argument.

where is the probability measure, d (x) := 1 1 cos x dx. x2

For a > 0, let a (A) = (aA) in which case a (f ) = f a1 for all bounded measurable f and in particular, a () = a1 . To nish the proof it suces to show that () may be expressed as
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 172

job: prob

14.6 Appendix: Bochners Theorem


173

() =
n=1

pn an () =
n=1

pn 1

an

(14.29)
+

14.6 Appendix: Bochners Theorem


Denition 14.35. A function f C (Rn , C) is said to have rapid decay or rapid decrease if sup (1 + |x|)N |f (x)| < for N = 1, 2, . . . .
xRn

for some an > 0 and pn 0 such that n=1 pn . Indeed, if this is the case we may take, := n=1 pn an . It is pretty clear that we should take an = d1 + + dn for all n N. Since we are assuming () = 0 for large , there is a rst index, N N, such that
N

0 = (aN ) = 1
n=1

dn sn .

(14.30)

Equivalently, for each N N there exists constants CN < such that |f (x)| CN (1 + |x|)N for all x Rn . A function f C (Rn , C) is said to have (at most) polynomial growth if there exists N < such sup (1 + |x|)
N

Notice that sn = 0 for all n > N. Since

|f (x)| < ,

() =
n=k

1 pn when ak1 < < ak an

i.e. there exists N N and C < such that |f (x)| C (1 + |x|)N for all x Rn . Denition 14.36 (Schwartz Test Functions). Let S denote the space of functions f C (Rn ) such that f and all of its partial derivatives have rapid decay and let f N, = sup (1 + |x|)N f (x)
xRn

we must require, sk =

pn
n=k

1 for all k an

which then implies

pk a1 k

= sk sk+1 or equivalently that pk = ak (sk sk+1 ) . (14.31)

so that S = f C (Rn ) : f
N,

< for all N and .

Since is convex, we know that sk sk+1 or sk sk+1 for all k and therefore pk 0 and pk = 0 for all k > N. Moreover,

Also let P denote those functions g C (Rn ) such that g and all of its derivatives have at most polynomial growth, i.e. g C (Rn ) is in P i for all multiindices , there exists N < such sup (1 + |x|)
N

pk =
k=1 k=1

ak (sk sk+1 ) =
k=1

ak sk
k=2

ak1 sk

| g (x)| < .

(Notice that any polynomial function on Rn is in P .) sk dk Denition 14.37. A function : Rn C is said to be positive (semi) m denite i the matrices A := {(k j )}k,j =1 are positive denite for all m m N and {j }j =1 Rn . Proposition 14.38. Suppose that : Rn C is said to be positive denite with (0) = 1. If is continuous at 0 then in fact is uniformly continuous on all of Rn . Proof. Taking 1 = x, 2 = y and 3 = 0 in Denition 14.37 we conclude that 1 (x y ) (x) 1 (x y ) (x) 1 (y ) = (x y ) 1 (y ) A := (y x) (x) (y ) 1 (x) (y ) 1
macro: svmonob.cls date/time: 23-Feb-2007/15:20

= a1 s1 +
k=2

sk (ak ak1 ) = d1 s1 +
k=2

=
k=1

sk dk = 1

where the last equality follows from Eq. (14.30). Working backwards with pk d = dened as in Eq. (14.31) it is now easily shown that d n=1 pn 1 an
+

() for / {a1 , a2 , . . . } and since both functions are equal to 1 at = 0 we may conclude that Eq. (14.29) is indeed valid.

Page: 173

job: prob

174

14 Characteristic Functions (Fourier Transform)

is positive denite. In particular, 0 det A = 1 + (x y ) (y ) (x) + (x) (x y ) (y ) | (x)| | (y )| | (x y )| . Combining this inequality with the identity, | (x) (y )| = | (x)| + | (y )| (x) (y ) (y ) (x) , gives 0 1 | (x y )| + (x y ) (y ) (x) + (x) (x y ) (y ) | (x) (y )| + (x) (y ) + (y ) (x) = 1 | (x y )| | (x) (y )| + (x y ) (y ) (x) (y ) (x) + (x) (x y ) (y ) (x) (y ) = 1 | (x y )| | (x) (y )| + 2 Re (( (x y ) 1) (y ) (x)) 1 | (x y )| | (x) (y )| + 2 | (x y ) 1| . Hence we have | (x) (y )| 1 | (x y )| + 2 | (x y ) 1| = (1 | (x y )|) (1 + | (x y )|) + 2 | (x y ) 1| 4 |1 (x y )| which completes the proof. Lemma 14.39. If C (Rn , C) is a positive denite function, then 1. (0) 0. 2. ( ) = ( ) for all Rn . 3. |( )| (0) for all Rn . 4. For all f S(Rd ), ( )f ( )f ( )dd 0.
Rn Rn 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

0 det

(0) ( ) 2 2 = |(0)| |( )| . ( ) (0)

and hence |( )| (0) for all . This proves items 2. and 3. Item 4. follows by approximating the integral in Eq. (14.32) by Riemann sums, ( )f ( )f ( )dd
Rn Rn

= lim 2n
0 , (Zn )[1 ,1 ]n

( )f ( )f ( ) 0.

The details are left to the reader. Lemma 14.40. If is a nite positive measure on BRn , then := C (Rn , C) is a positive denite function. Proof. As has already been observed after Denition ??, the dominated convergence theorem implies C (Rn , C). Since is a positive measure (and hence real), ( ) =
Rn

eix d(x) =
Rn

( ). eix d(x) =
m

From this it follows that for any m N and {j }j =1 Rn , the matrix A := m { (k j )}k,j =1 is self-adjoint. Moreover if Cm ,
m m

j = (k j )k
k,j =1 Rn k,j =1 m

j d(x) ei(k j )x k eik x k eij x j d(x)


Rn k,j =1 m 2 ik x

= (14.32) showing A is positive denite.


Rn k=1

d(x) 0

Proof. Taking m = 1 and 1 = 0 we learn (0) || 0 for all C which proves item 1. Taking m = 2, 1 = and 2 = , the matrix A := (0) ( ) ( ) (0)

Theorem 14.41 (Bochners Theorem). Suppose C (Rn , C) is positive denite function, then there exists a unique positive measure on BRn such that = . Proof. If ( ) = ( ), then for f S we would have f d =
Rn Rn

is positive denite from which we conclude ( ) = ( ) (since A = A by denition) and


Page: 174 job: prob macro: svmonob.cls

(f ) d =
Rn

f ( ) ( )d.

date/time: 23-Feb-2007/15:20

14.6 Appendix: Bochners Theorem

175

This suggests that we dene I (f ) :=


Rn

0 I, f ( )f ( )d for all f S .

f = f

I, I, f

and therefore I, f f I, . Replacing f by f implies, I, f f I, and hence we have proved | I, f | C (supp(f )) f

We will now show I is positive in the sense if f S and f 0 then I (f ) 0. For general f S we have I (|f | ) =
Rn 2

(14.34)

( ) |f |

( )d =
Rn

( ) f

( )d f

=
Rn

( )d d = ( )f ( )f
Rn

( )f ( )f ( )d d (14.33)

=
Rn

( )f ( )f ( )d d 0.
2

(Rn , R) where C (K ) is a nite constant for each compact for all f DRn := Cc n subset of R . Because of the estimate in Eq. (14.34), it follows that I |DRn has a unique extension I to Cc (Rn , R) still satisfying the estimates in Eq. (14.34) and moreover this extension is still positive. So by the Riesz Markov Theorem ??, there exists a unique Radon measure on Rn such that such that I, f = (f ) for all f Cc (Rn , R). To nish the proof we must show ( ) = ( ) for all Rn given

For t > 0 let pt (x) := tn/2 e|x| It (x) := I

/2 t

S and dene
2

(f ) =
Rn

( )f ( )d for all f Cc (Rn , R).

(14.35)

pt (x) := I (pt (x )) = I (

pt (x ) ) pt (x ) S . Using

which is non-negative by Eq. (14.33) and the fact that [pt (x )] ( ) = =

(Rn , R+ ) be a radial function such f (0) = 1 and f (x) is decreasing Let f Cc as |x| increases. Let f (x) := f (x), then by Theorem ??,

pt (x y )eiy dy =
Rn ix e pt ( )

pt (y )ei(y+x) dy ,

F 1 eix f (x) ( ) = n f ( and therefore, from Eq. (14.35), eix f (x)d(x) =

=e

Rn ix t| |2 /2

( )n f (
Rn

It , =
Rn

I (pt (x )) (x)dx ( ) [pt (x )] ( ) (x)d dx


Rn Rn

Rn

)d.

(14.36)

= =
Rn Rn

Because Rn f ( )d = F f (0) = f (0) = 1, we may apply the approximate function Theorem ?? to Eq. (14.36) to nd eix f (x)d(x) ( ) as 0.
Rn

( )eix et|| ( ) ( )et||


Rn
2

/2

(x)d dx

(14.37)

/2

which coupled with the dominated convergence theorem shows I pt ,


Rn

( ) ( )d = I ( ) as t 0.

On the the other hand, when = 0, the monotone convergence theorem implies (f ) (1) = (Rn ) and therefore (Rn ) = (1) = (0) < . Now knowing the is a nite measure we may use the dominated convergence theorem to concluded (eix f (x)) (eix ) = ( ) as 0 for all . Combining this equation with Eq. (14.37) shows ( ) = ( ) for all Rn .

Hence if 0, then I ( ) = limt0 It , 0. Let K R be a compact set and Cc (R, [0, )) be a function such that = 1 on K. If f Cc (R, R) is a smooth function with supp(f ) K, then 0 f f S and hence
Page: 175 job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

176

14 Characteristic Functions (Fourier Transform)

14.7 Appendix: A Multi-dimensional Weirstrass Approximation Theorem


The following theorem is the multi-dimensional generalization of Theorem 4.23. Theorem 14.42 (Weierstrass Approximation Theorem). Suppose that K = [a1 , b1 ] . . . [ad , bd ] with < ai < bi < is a compact rectangle in Rd . Then for every f C (K, C), there exists polynomials pn on Rd such that pn f uniformly on K. Proof. By a simple scaling and translation of the arguments of f we may d assume with out loss of generality that K = [0, 1] . By considering the real and imaginary parts of f separately, it suces to assume f C ([0, 1], R). d 1 be i.i.d. random vectors with , . . . , Xn Given x K, let Xn = Xn n=1 values in Rd such that
d

where pn (x) =
(){0,1}d

(1) + + (n) n (1) + + (n) n

P (X1 = (1) , . . . , Xn = (n))


n d

=
(){0,1}d

(1 xi )
k=1 i=1

1i (k)

xi i

(k )

is a polynomial of degree nd. In fact more is true. Suppose > 0 is given, M = sup {|f (x)| : x K } , and = sup {|f (y ) f (x)| : x, y K and y x } . By uniform continuity of f on K, lim0 = 0. Therefore, |f (x) pn (x)| = E f (x) f ( Sn Sn ) E f (x) f ( ) n n Sn E f (x) f ( ) : Sn x > n Sn + E f (x) f ( ) : Sn x n (14.39)

P (Xn = ) =
i=1 d

(1 xi )

1i

i x i

j is a Bernoulli random variable for all = (1 , . . . , d ) {0, 1} . Since each Xn j with P Xn = 1 = xj , we know that j EXn = x and Var Xn = xj x2 j = xj (1 xj ).

2M P ( Sn x > ) + . By Chebyshevs inequality, P ( Sn x > ) 1 E Sn x 2


2

As usual let Sn = Sn := X1 + + Xn Rd , then E E Sn = x and n


2 d

d , 4n2

Sn x n

=
j =1 d

j Sn xj n j Sn n

=
j =1 d

Var
n

j Sn xj n

and therefore, Eq. (14.39) yields the estimate sup |f (x) pn (x)|
xK

=
j =1

Var
d

1 = 2 n j =1

Var
k=1

j Xk

2dM + n2

and hence lim sup sup |f (x) pn (x)| 0 as 0.


n xK

1 = n

d xj (1 xj ) . 4 n j =1
P

This shows Sn /n x in L2 (P ) and hence by Chebyshevs inequality, Sn /n x P n in and by a continuity theorem, f S f (x) as n . This along with the n dominated convergence theorem shows pn (x) := E f Sn n f (x) as n , (14.38)

Here is a version of the complex Weirstrass approximation theorem. Theorem 14.43 (Complex Weierstrass Approximation Theorem). Suppose that K Cd = Rd Rd is a compact rectangle. Then there exists polynomials in (z = x + iy, z = x iy ) , pn (z, z ) for z Cd , such that supzK |qn (z, z ) f (z )| 0 as n for every f C (K, C) .

Page: 176

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

14.8 Appendix: Some Calculus Estimates

177

Proof. The mapping (x, y ) Rd Rd z = x + iy Cd is an isomorphism z z z of vector spaces. Letting z = x iy as usual, we have x = z+ 2 and y = 2i . d d Therefore under this identication any polynomial p(x, y ) on R R may be written as a polynomial q in (z, z ), namely z+z zz , ). q (z, z ) = p( 2 2i Conversely a polynomial q in (z, z ) may be thought of as a polynomial p in (x, y ), namely p(x, y ) = q (x + iy, x iy ). Hence the result now follows from Theorem 14.42. Example 14.44. Let K = S = {z C : |z | = 1} and A be the set of polynomials in (z, z ) restricted to S 1 . Then A is dense in C (S 1 ). To prove this rst observe if f C S 1 then F (z ) = |z | f ( |z z | ) for z = 0 and F (0) = 0 denes F C (C) 1 such that F |S = f. By applying Theorem 14.43 to F restricted to a compact rectangle containing S 1 we may nd qn (z, z ) converging uniformly to F on K and hence on S 1 . Since z = z 1 on S 1 , we have shown polynomials in z and z 1 are dense in C (S 1 ). This example generalizes in an obvious way to d K = S 1 Cd . Exercise 14.7. Use Example 14.44 to show that any 2 periodic continuous function, g : Rd C, may be uniformly approximated by a trigonometric polynomial of the form p (x) = a eix
1

p () :=
n=N

bn ein

(14.40)

satises () p () . sup f

Exercise 14.8. Suppose f C (R, C) is a 2 periodic function (i.e. f (x + 2 ) = f (x) for all x R) and
2

f (x) einx dx = 0 for all n Z,


0

show again that f 0. Hint: Use Exercise 14.7. Solution to Exercise (14.8). By assumption, and so by the linearity of the Riemann integral,
2 2 0

f () ein d = 0 for all n

0=
0

f () p () d.

(14.41)

() Choose trigonometric polynomials, p , as in Eq. (14.40) such that p () f uniformly in as 0. Passing to the limit in Eq. (14.41) implies
2 2 2

0 = lim
0 0

f () p () d =
0

() d = f () f
0

|f ()| d.

where is a nite subset of Z and a C for all . Hint: start by d showing there exists a unique continuous function, f : S 1 C such that f eix1 , . . . , eixd = F (x) for all x = (x1 , . . . , xd ) Rd . Solution to Exercise (14.7). I will write out the solution when d = 1. For z S 1 , dene F (z ) := f (ei ) where R is chosen so that z = ei . Since f is 2 periodic, F is well dened since if solves ei = z then all other solutions are of the form { + 2n : n Z} . Since the map ei is a local homeomorphism, := ei : J S 1 i.e. for any J = (a, b) with b a < 2, the map J J This shows is a homeomorphism, it follows that F (z ) = f 1 (z ) for z J. 1 F is continuous when restricted to J. Since such sets cover S , it follows that F is continuous. It now follows from Example 14.44 that polynomials in z and z 1 are dense in C (S 1 ). Hence for any > 0 there exists p(z, z ) = am,n z m z n = am,n z m z n = am,n z mn

From this it follows that f 0, for if |f (0 )| > 0 for some 0 then |f ()| > 0 for in a neighborhood of 0 by continuity of f. It would then follow that 2 2 |f ()| d > 0. 0

14.8 Appendix: Some Calculus Estimates


We end this section by gathering together a number of calculus estimates that we will need in the future. 1. Taylors theorem with integral remainder states, if f C k (R) and z, R or f be holomorphic in a neighborhood of z C and C be suciently small so that f (z + t) is dened for t [0, 1] , then
k1

f (z + ) =
n=0 k1

f (n) (z ) f (n) (z )
n=0

n + k rk (z, ) n! n 1 (k ) + k f (z ) + (z, ) n! k!

(14.42)

such that |F (z ) p(z, z )| for all z. Taking z = ei then implies there exists bn C and N N such that
Page: 177 job: prob macro: svmonob.cls

(14.43)

date/time: 23-Feb-2007/15:20

178

14 Characteristic Functions (Fourier Transform)

where rk (z, ) = 1 k1 f (k) (z + t) (1 t) dt (k 1)! 0 1 = f (k) (z ) + (z, ) k!


1 1

g (y ) := cos y 1 + y 2 /2 0 for all y R. (14.44) (14.45) 4. Since


1

(14.48)

|ez 1 z | = z 2
0

etz (1 t) dt |z |

2 0

et Re z (1 t) dt,

and 1 (z, ) = (k 1)! f


0 (k )

if Re z 0, then (z + t) f
(k )

(z ) (1 t)

k1

|ez 1 z | |z | /2 dt 0 as 0. (14.46) and if Re z > 0 then |ez 1 z | eRe z |z | /2. Combining these into one estimate gives,
2

(14.49)

To prove this, use integration by parts to show, rk (z, ) = 1 k!


1

f (k) (z + t)
0

d dt

(1 t) dt
t=1

1 k = f (k) (z + t) (1 t) k! 1 = f (k) (z ) + rk+1 (z, ) , k! i.e. k rk (z, ) =

+ k! t=0

1 0

f (k+1) (z + t) (1 t) dt 5. Since eiy 1 = iy

|ez 1 z | e0Re z
1 ity e dt, 0

|z | . 2

(14.50)

eiy 1 |y | and hence (14.51)

eiy 1 2 |y | for all y R.

1 (k ) f (z ) k + k+1 rk+1 (z, ) . k! The result now follows by induction. 1 2. For y R, sin y = y 0 cos (ty ) dt and hence |sin y | |y | . 3. For y R we have
1 1

Lemma 14.45. For z = rei with < < and r > 0, let ln z = ln r + i. Then ln : C \ (, 0] C is a holomorphic function such that eln z = z 3 and if |z | < 1 then (14.47) |ln (1 + z ) z | |z |
2

1 2 (1 |z |)
2

for |z | < 1.

(14.52)

cos y = 1 + y 2
0

cos (ty ) (1 t) dt 1 + y 2
0 2

(1 t) dt = 1

y . 2

Proof. Clearly eln z = z and ln z is continuous. Therefore by the inverse function theorem for holomorphic functions, ln z is holomorphic and z
3

Equivalently put ,
2

d d ln z = eln z ln z = 1. dz dz

Alternatively,

Z y Z y |cos x| dx cos xdx |sin y | = |y | 0 0


and for y 0 we have, cos y 1 =
0

For the purposes of this lemma it suces to dene ln (1 + z ) = and to then observe: 1)

P
n=1

(z )n /n

Z
sin xdx
0

X d 1 ln (1 + z ) = (z )n = , dz 1 + z n=0

xdx = y 2 /2.

and 2) the functions 1 + z and eln(1+z) both solve f (z ) = and therefore eln(1+z) = 1 + z. 1 f (z ) with f (0) = 1 1+z

This last inequality may also be proved as a simple calculus exercise following from; g () = and g (y ) = 0 i sin y = y which happens i y = 0.

Page: 178

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

14.8 Appendix: Some Calculus Estimates

179

Therefore,

d dz

ln z =

1 z

and

d dz 2

ln z = z12 . So by Taylors theorem,


1

and using Eq. (14.55) with n = 2 implies (14.53) eiy 1 + iy y2 2! |y | . 3!


3

ln (1 + z ) = z z 2
0

1 (1 + tz )
2

(1 t) dt.

If t 0 and |z | < 1, then 1 1 1 n |tz | = . (1 + tz ) 1 t |z | 1 |z | n=0 and therefore,


1 0

Combining the last two inequalities completes the proof of Eq. (14.56). Equation (14.57) is proved similarly and hence will be omitted. Lemma 14.47. If X is a square integrable random variable, then f () := E eiX = 1 + iEX 2 E X 2 + r () 2!

1 (1 + tz )
2

(1 t) dt

2 (1 |z |)

2.

(14.54) where r () := 2 E X 2 and (14.55) () := E X 2 || |X | 3!


3 3

Eq. (14.52) is now a consequence of Eq. (14.53) and Eq. (14.54). Lemma 14.46. For all y R and n N {0} ,
n

= 2 ()

eiy
k=0

(iy ) k!

|y | (n + 1)!

n+1

|| |X | 3!

0 as 0.

(14.58)

and in particular, y2 eiy 1 + iy 2! More generally for all n N we have


n

Proof. Using Eq. (14.56) with y = X and taking expectations implies, |y | y2 . 3!


3

(14.56)

f () 1 + iEX

2 E X2 2!

E eiX 1 + iX 2 2 E X 2 || |X | 3!
3

X2 2!

eiy
k=0

(iy ) k!

|y | 2 |y | . (n + 1)! n!
iy

n+1

=: 2 () .

(14.57)

Proof. By Taylors theorem (see Eq. (14.42) with f (y ) = e , x = 0 and = y ) we have


n

The DCT, with X 2 L1 (P ) being the dominating function, allows us to conclude that lim0 () = 0.

eiy
k=0

(iy ) k!

y n+1 n! |y | n!
n+1

1 0 1 0

in+1 eity (1 t) dt (1 t) dt =
n

|y | (n + 1)!

n+1

which is Eq. (14.55). Using Eq. (14.55) with n = 1 implies eiy 1 + iy y2 2! eiy (1 + iy ) + y2 y2 + = y2 2 2 y2 2

Page: 179

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

15 Weak Convergence of Random Sums


Throughout this chapter, we will assume the following standing notation n unless otherwise stated. For each n N, let {Xn,k }k=1 be independent random variables and let
n

Remark 15.3. The reader should observe that in order for condition (M ) to hold in the setup in Example 15.1 it is necessary that limn s2 n = . Lemma 15.4. Let us continue the notation in Example 15.1. Then {Xn,k := Xk /sn } satises (LC ) if either of two conditions hold; 1. {Xn }n=1 are i.i.d. 2. The {Xn }n=1 satisfy Liapunov condition; there exists some > 2 such that n k=1 E |Xk | = 0. (15.6) lim n s n More generally, if {Xn,k } satises the Liapunov condition,
n n

Sn :=
k=1

Xn,k .
2 n,k

(15.1) =E
2 Xn,k

Until further notice we are going to assume E [Xn,k ] = 0, and Var (Sn ) =
n k=1 2 n,k

< ,

= 1. Also let fnk () := E eiXn,k (15.2)

denote the characteristic function of Xn,k . Example 15.1. Suppose are mean zero square integrable random varin n 2 2 ables with k = Var (Xn ) . If we let s2 n := k=1 Var (Xk ) = k=1 k , n 2 2 2 n,k := k /sn , and Xn,k := Xk /sn , then {Xn,k }k=1 satisfy the above hypothesis n and Sn = s1 k=1 Xk . n Our main interest in this chapter is to consider the limiting behavior of Sn as n . In order to do this, it will be useful to put conditions on the {Xn,k } such that no one term dominates sum dening the sum dening Sn in Eq. (15.1) in the limit as n . Denition 15.2. We say that {Xn,k } satises the Lindeberg Condition (LC) i
n n {Xn }n=1

lim

2 E Xn,k (|Xn,k |) = 0 k=1

where : [0, ) [0, ) is a non-decreasing function such that (t) > 0 for all t > 0, then {Xn,k } satises (LC ) . 2 and Proof. 1. If {Xn }n=1 are i.i.d., then sn = n where 2 = EX1
n

E
k=1

2 Xn,k

1 : |Xn,k | > t = 2 sn = =

n 2 E Xk : |Xk | > sn t k=1 n 2 E X1 : |X1 | > k=1

(15.7) nt

1 n 2

lim

2 E Xn,k : |Xn,k | > t = 0 for all t > 0. k=1

(15.3)

1 2 E X1 : |X1 | > nt 2

We say {Xn,k } satises condition (M ) if


2 Dn := max n,k : k n 0 as n ,

which, by DCT, tends to zero as n . 2. Assuming Eq. (15.6), then for any t > 0, (15.4)
n 2 E Xn,k : |Xn,k | > t k=1 k=1 n 2 E Xn,k

and we say {Xn,k } is uniformly asymptotic negligibility (UAN) if for all > 0, lim max P (|Xn,k | > ) = 0. (15.5)
n kn

Xn,k t

: |Xn,k | > t

1 t2

E [|Xn,k | ] =
k=1

1 t2 s n

E |Xk | 0.
k=1

182

15 Weak Convergence of Random Sums

For the last assertion, working as above we have


n 2 E Xn,k : |Xn,k | > t k=1 k=1 n 2 E Xn,k

Proof. Let a := that

n1 i=1

ai and b :=

n1 i=1 bi

and observe that |a| , |b| 1 and

(|Xn,k |) : |Xn,k | > t (t)

as n .

1 (t)

n 2 E Xn,k (|Xn,k |) 0 k=1

|an a bn b| |an a an b| + |an b bn b| = |an | |a b| + |an bn | |b| |a b| + |an bn | . The proof is now easily completed by induction on n. Theorem 15.7 (Lindeberg-Feller CLT (I)). Suppose {Xn,k } satises (LC ) , then Sn = N (0, 1) . (15.8) (See Theorem 15.11 for a converse to this theorem.)

Lemma 15.5. Let {Xn,k }n=1 be as above, then (LC ) = (M ) = (U AN ) . Proof. For k n,
2 2 2 2 1|Xn,k |>t n,k = E Xn,k = E Xn,k 1|Xn,k |t + E Xn,k n

To prove this theorem we must show E eiSn e


2

t +E

2 Xn,k 1|Xn,k |>t

t +
m=1

2 Xn,m 1|Xn,m |>t

/2

as n .

(15.9)

and therefore using (LC ) we nd


n kn 2 lim max n,k t2 for all t > 0.

This clearly implies (M ) holds. The assertion that (M ) implies (U AN ) follows by Chebyschevs inequality, max P (|Xn,k | > ) max
kn kn

Before starting the formal proof, let me give an informal explanation for Eq. (15.9). Using 2 2 fnk () 1 nk , 2 we might expect
n

1 2 E |Xn,k | : |Xn,k | > 2 E |Xn,k | : |Xn,k | > 0.


2

E eiSn =
k=1

Pn fnk () = e k=1 ln fnk ()

1 2

Pn = e k=1 ln(1+fnk ()1)


( A)

k=1

Pn e k=1 (fnk ()1)

=
k=1

e(fnk ()1)

In fact the same argument shows that (M ) implies


n

(B )

Pn 2 2 2 e k=1 2 nk = e 2 .

P (|Xn,k | > )
k=1

1 2

E |Xn,k | : |Xn,k | > 0.


k=1

The question then becomes under what conditions are these approximations valid. It turns out that approximation (A), namely that
n n

We will need the following lemma for our subsequent applications of the continuity theorem. Lemma 15.6. Suppose that ai , bi C with |ai | , |bi | 1 for i = 1, 2, . . . , n. Then
n n n

lim

fnk () exp
k=1 k=1

(fnk () 1)

= 0,

(15.10)

is valid if condition (M ) holds, see Lemma 15.9 below. It is shown in the estimate Eq. (15.11) below that the approximation (B ) is valid, i.e.
n n

ai
i=1 i=1

bi
i=1

|ai bi | .

lim

k=1

1 (fnk () 1) = 2 , 2
date/time: 23-Feb-2007/15:20

Page: 182

job: prob

macro: svmonob.cls

15 Weak Convergence of Random Sums

183

if (LC ) is satised. These observations would then constitute a proof of Theorem 15.7. The proof given below of Theorem 15.7 will not quite follow this route and will not use Lemma 15.9 directly. However, this lemma will be used in the proofs of Theorems 15.11 and 15.14. Proof. Now on to the formal proof of Theorem 15.7. Since
n n

and since > 0 is arbitrary, we may conclude that lim supn An,k = 0. n To estimate k=1 Bn,k , we use the estimate, |eu 1 + u| u /2 valid for u 0 (see Eq. 14.49 with z = u). With this estimate we nd,
n n

n k=1 2

Bn,k =
k=1 k=1 n

2 2 n,k 2 2 e n,k /2 2 2

E eiSn =
k=1

fnk () and e

/2

=
k=1

2 n,k /2

we may use Lemma 15.6 to conclude,


n n

k=1 4
2 2 n,k /2

2 1 2 n,k 2 2 n

4 8

n 4 n,k k=1

E e where

iSn

2 /2

k=1

fnk () e

=
k=1

(An,k + Bn,k )

max 2 8 kn n,k

2 n,k = k=1

4 max 2 0, 8 kn n,k

An,k := fnk () 1 Bn,k := Now, using Lemma 14.47, An,k = E e


iXn,k

2 2 n,k 2

and

wherein we have used (M ) (which is implied by (LC )) in taking the limit as n . As an application of Theorem 15.7 we can give half of the proof of Theorem 12.12. Theorem 15.8 (Converse assertion in Theorem 12.12). If {Xn }n=1 are independent random variables and the random series, n=1 Xn , is almost surely convergent, then for all c > 0 the following three series converge; 1. 2. 3.
n=1 n=1 n=1

2 2 n,k 2 2 e n,k /2 . 2

2 2 1 + Xn,k 2 || |Xn,k | 3!
3

E e

iXn,k

2 2 1 + Xn,k 2

P (|Xn | > c) < , Var Xn 1|Xn |c < , and E Xn 1|Xn |c converges.

2 2 E Xn,k

2 2 E Xn,k

|| |Xn,k | || |Xn,k | 2 : |Xn,k | + 2 E Xn,k : |Xn,k | > 3! 3!


3

2 E

|| |Xn,k | 2 : |Xn,k | + 2 E Xn,k : |Xn,k | > 3!

Proof. Since n=1 Xn is almost surely convergent, it follows that limn Xn = 0 a.s. and hence for every c > 0, P ({|Xn | c i.o.}) = 0. Accord ing the Borel zero one law this implies for every c > 0 that n=1 P (|Xn | > c) < c . Since Xn 0 a.s., {Xn } and Xn := Xn 1|Xn |c are tail equivalent for all c c > 0. In particular n=1 Xn is almost surely convergent for all c > 0. c c Fix c > 0, let Yn := Xn E [Xn ] and let
n n n c Var (Xk )= k=1 k=1

2 2 2 || E |Xn,k | : |Xn,k | + 2 E Xn,k : |Xn,k | > 3! 3 || 2 2 n,k + 2 E Xn,k : |Xn,k | > . = 6 From this estimate and (LC ) it follows that
n

s2 n = Var (Y1 + + Yn ) =
k=1

Var (Yk ) =

Var Xk 1|Xk |c .

For the sake of contradictions, suppose s2 n as n . Since |Yk | 2c, it n 2 follows that k=1 E Yk 1|Yk |>sn t = 0 for all suciently large n and hence 1 n s2 n lim
n 2 E Yk 1|Yk |>sn t = 0, k=1

lim sup
n k=1

An,k lim sup


n

3 + 2 6

n 2 E Xn,k : |Xn,k | > k=1

3 6 (15.11)

i.e. {Yn,k := Yk /sn }n=1 satises (LC ) see Examples 15.1 and Remark 15.3. So by the central limit Theorem 15.7, it follows that
macro: svmonob.cls date/time: 23-Feb-2007/15:20

Page: 183

job: prob

184

15 Weak Convergence of Random Sums

1 s2 n

n c c (Xn E [Xn ]) = k=1

1 s2 n

Yk = N (0, 1) .
k=1

Proof. For the rst item we estimate, EeiX 1 E eiX 1 E [2 |X |] = E [2 |X | : |X | ] + E [2 |X | : |X | < ] 2 2 2P [|X | ] + || 2 E |X | + || Replacing X by Xn,k and in the above inequality shows |n,k ()| = |fn,k () 1|
2 2n,k 2 2 E | X | + | | = + || . n,k 2 2

On the other hand we know 1 n s2 n lim


n c Xn = k=1 c Xk = 0 a.s. limn s2 n k=1

and so by Slutskys theorem, 1 s2 n


n c E [Xn ]= k=1

1 s2 n

n c Xn k=1

1 s2 n

Yk = N (0, 1) .
k=1

Therefore, lim sup max |n,k ()| lim sup


n kn n

But it is not possible for constant (i.e. non-random) variables, cn := n 1 c k=1 E [Xn ] , to converge to a non-degenerate limit. (Think about this eis2 n ther in terms of characteristic functions or in terms of distribution functions.) Thus we must conclude that

2Dn + || = || 0 as 0. 2

Var Xn 1|Xn |c =
n=1 n=1

c Var (Xn ) = lim s2 n < . n

For the second item, observe that Re n,k () = Re fn,k () 1 0 and hence en,k () = eRe n,k () e0 = 1 and hence we have from Lemma 15.6 and the estimate (14.49),
n n n

An application of Kolmogorovs convergence criteria (Theorem 12.11) implies that


c c (Xn E [Xn ]) is convergent a.s. n=1 c Since we already know that n=1 Xn is convergent almost surely we may now conclude n=1 E Xn 1|Xn |c is convergent. Let us now turn to the converse of Theorem 15.7, see Theorem 15.11 below.

fn,k ()
k=1 k=1

n,k ()

k=1 n

fn,k () en,k () en,k () 1 n,k ()


k=1 n

= 1 2

|n,k ()|
k=1

Lemma 15.9. Suppose that {Xn,k } satises property (M ) , i.e. Dn 2 maxkn n,k 0. If we dene, n,k () := fn,k () 1 = E eiXn,k 1 , then; 1. limn maxkn |n,k ()| = 0 and n 2. fSn () k=1 en,k () 0 as n , where
n

:=

1 max |n,k ()| 2 kn

|n,k ()| .
k=1

Moreover since EXn,k = 0, the estimate in Eq. (14.49) implies


n n

|n,k ()| =
k=1 k=1 n

E eiXn,k 1 iXn,k 1 2 |Xn,k | 2 2 2


n 2 n,k = k=1

fn,k () .
k=1

fSn () = E eiSn =
k=1

2 . 2

Thus we have shown,

Page: 184

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

15.1 Innitely Divisible and Stable Symmetric Distributions


n n

185

fn,k ()
k=1 k=1

en,k ()

2 max |n,k ()| 4 kn

The second inequality combined with Lemma 15.9 implies,


Pn lim e k=1 n,k () = lim
n

and the latter expression tends to zero by item 1. Lemma 15.10. Let X be a random variable such that EX 2 < and EX = 0. Further let f () := E eiX and u () := Re (f () 1) . Then for all c > 0, u () + or equivalently 2 E cos X 1 + X 2 E X 2 2 : |X | > c . 2 2 c In particular if we choose || 6/ |c| , then 2 1 E cos X 1 + X 2 2 E X 2 : |X | > c . 2 c
2

en,k () = e
k=1

/2

Taking the modulus of this equation then implies,


n

2 E X2 E X2 2 : |X | > c 2 2 c

Pn Pn 2 lim e k=1 Re n,k () = lim e k=1 n,k () = e /2


n

(15.12)

from which we may conclude


n

(15.13)

lim

Re n,k () = 2 /2.
k=1

We may write this last limit as


n

(15.14)

lim

E cos (Xn,k ) 1 +
k=1

2 2 X =0 2 n,k

2 Proof. For all R, we have (see Eq. (14.48)) cos X 1 + 2 X 0 and cos X 1 2. Therefore,

which by Lemma 15.10 implies


n n

2 2 u () + E X 2 = E cos X 1 + X 2 2 2 2 E cos X 1 + X 2 : |X | > c 2 2 2 E 2 + X : |X | > c 2 E 2 which gives Eq. (15.12). Theorem 15.11 (Lindeberg-Feller CLT (II)). Suppose {Xn,k } satises (M ) and also the central limit theorem in Eq. (15.8) holds, then {Xn,k } satises (LC ) . So under condition (M ) , Sn converges to a normal random variable i (LC ) holds. Proof. By assumption we have
n n kn 2 lim max n,k = 0 and lim n

lim

2 E Xn,k : |Xn,k | > c = 0 k=1

for all c > 0 which is (LC ) .

|X | 2 2 + X : |X | > c c2 2

15.1 Innitely Divisible and Stable Symmetric Distributions


To get some indication as to what we might expect to happen when the Lindeberg condition is relaxed, we consider the following Poisson limit theorem. Theorem 15.12 (A Poisson Limit Theorem). For each n N, let n {Xn,k }k=1 be independent Bernoulli random variables with P (Xn,k = 1) = pn,k and P (Xn,k = 0) = qn,k := 1 pn,k . Suppose; 1. limn k=1 pn,k = a (0, ) and 2. limn max1kn pn,k = 0. (So no one term is dominating the sums in item 1.) Then Sn = k=1 Xn,k = Z where Z is a Poisson random variable with mean a. (See Section 2.6 of Durrett[2] for more on this theorem.)
n n

fn,k () = e
k=1

/2

Page: 185

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

186

15 Weak Convergence of Random Sums

Proof. Recall from Example 14.14 that for any a > 0, E eiZ = exp a ei 1 Since E e it follows that E e
iSn iXn,k i

exp pn,k ei 1
k=1

k=1

1 + pn,k ei 1
n

.
i

= e pn,k + (1 pn,k ) = 1 + pn,k e


n i

1 ,

1 2

|zn,k |
k=1 n

1 max |zn,k | 2 1kn pn,k .

|zn,k |
k=1

2 max pn,k
1kn

=
k=1

1 + pn,k e

k=1

Using the assumptions, we may conclude


n n

Since 1 + pn,k ei 1 lies on the line segment joining 1 to ei , it follows that 1 + pn,k ei 1 1. Since
n

exp pn,k ei 1
k=1

k=1

1 + pn,k ei 1

0 as n .

exp pn,k ei 1
k=1

= exp
k=1

pn,k ei 1

exp a ei 1

we have shown
n n

lim E eiSn = lim = lim

1 + pn,k ei 1
k=1 n

exp pn,k ei 1
k=1

= exp a ei 1

The result now follows by an application of the continuity Theorem 14.21. Hence we may apply Lemma 15.6 to nd
n n

Remark 15.13. Keeping the notation in Theorem 15.12, we have 1 + pn,k ei 1 E [Xn,k ] = pn,k and Var (Xn,k ) = pn,k (1 pn,k ) and
n n

exp pn,k ei 1
k=1 n

k=1

k=1 n

exp pn,k ei 1

1 + pn,k ei 1

s2 n :=
k=1

Var (Xn,k ) =
k=1

pn,k (1 pn,k ) .

=
k=1

|exp (zn,k ) [1 + zn,k ]|

where zn,k = pn,k ei 1 . Since Re zn,k = pn,k (cos 1) 0, we may use the calculus estimate in Eq. (14.49) to conclude,
Page: 186 job: prob

Under the assumptions of Theorem 15.12, we see that s2 n a as n . Let Xn,k pn,k 2 so that E [Yn,k ] = 0 and n,k := Var (Yn,k ) = s1 Yn,k := 2 Var (Xn,k ) = sn n 1 p (1 p ) which satises condition ( M ) . Let us observe that, for large n,k n,k s2 n n,

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

15.1 Innitely Divisible and Stable Symmetric Distributions


2 2 E Yn,k : |Yn,k | > t = E Yn,k :

187

Xn,k pn,k >t sn

n kn

2 lim max n,k = 0.

2 = E Yn,k : |Xn,k pn,k | > sn t

Under condition (M ) , we expect fn,k () = 1 for n large. Therefore we expect 1 pn,k sn


2

2 Yn,k

: |Xn,k pn,k | > 2at

fn,k () = eln fn,k () = eln[1+(fn,k ()1)] = e(fn,k ()1) and hence that
n n n

2 = E Yn,k : Xn,k = 1 = pn,k

from which it follows that


n n n 2 E Yn,k : |Yn,k | > t = lim k=1 n

E eiSn = pn,k
k=1

fn,k () =
k=1 k=1

e(fn,k ()1) = exp


k=1

(fn,k () 1) . (15.15)

lim

1 pn,k sn

= a. This is in fact correct, since Lemma 15.9 indeed implies


n n

Therefore {Yn,k } do not satisfy (LC ) . Nevertheless we have


n

lim

E eiSn exp
k=1

(fn,k () 1)

= 0.

(15.16)

Yn,k =
k=1

n k=1

Xn,k sn

n k=1

pn,k

Z a a

Since E [Xn,k ] = 0, fn,k () 1 = E eiXn,k 1 = E eiXn,k 1 iXn,k = eix 1 ix dn,k (x)


R 1 where n,k := P Xn,k is the law of Xn,k . Therefore we have n n

where Z is a Poisson random variable with mean a. Notice that the limit is not a normal random variable. We wish to characterize the possible limiting distributions of sequences {Sn }n=1 when we relax the Lindeberg condition (LC ) to condition (M ) . We have the following theorem. Theorem 15.14. Suppose {Xn,k }k=1 satisfy property (M ) and Sn := n = L for some random variable L. Then the characteristic k=1 Xn,k function fL () := E eiL must be of the form, fL () = exp
R n

exp
k=1

(fn,k () 1)

= exp
k=1 R

eix 1 ix dn,k (x)


n

ix

1 ix d (x) x2

= exp
R

eix 1 ix
k=1

dn,k (x) (15.17)

= exp
R := where n n k=1

where is a nite positive measure on (R, BR ) such that (R) 1. (Recall ix 1ix that you proved in Exercise 14.4 that exp R e d (x) is always the x2 characteristic function of a probability measure.) Proof. As before, let fn,k () = E e the continuity theorem we are assuming lim fSn () = lim
iXn,k

eix 1 ix dn (x)

n,k . Let us further observe that


n n

and n,k () := fn,k () 1. By


R

x2 dn (x) = k=1 R

x2 dn,k (x) =
k=1

2 n,k = 1.

n n n

fn,k () = f ()
k=1

Hence if we dene d (x) := x2 dn (x) , then n is a probability measure and we have from Eqs. (15.16) and Eq. (15.17) that

where f () is continuous at = 0. We are also assuming property (M ) , i.e.

fSn () exp
R

eix 1 ix dn (x) x2

0.

(15.18)

Page: 187

job: prob

macro: svmonob.cls

date/time: 23-Feb-2007/15:20

188

15 Weak Convergence of Random Sums

ix 1ix with h () = 0, there Since h (x) := e is a continuous function of R x2 is a subsequence, {nl } of {n} such that nl (h) (h) for some probability , BR measure on R . Combining this with Eq. (15.18) allows us to conclude,

where s is the nite measure on (R, BR ) dened by s (A) := s2 s1 A for all A BR . The reader should observe that eix 1 ix 1 = 2 2 x x and hence (, x)
eix 1ix x2

fL () = lim E eiSnl = lim exp


l l R

eix 1 ix dn (x) l

= exp
R

eix 1 ix d (x) . x2

k=2

(ix) 1 = 2 k! x

k=2

ik k k2 x k!

Denition 15.15. We say that

n {Xn,k }k=1

is smooth. Moreover,

has bounded variation (BV ) i


n 2 n,k < .

sup Var (Sn ) = sup


n n {Xn,k }k=1 n k=1

(15.19) and

d eix 1 ix ixeix ix eix 1 = = i d x2 x2 x d2 eix 1 ix ixeix = i = eix . d2 x2 x Using these remarks and the fact that (R) < , it is easy to see that fL () =
R

Corollary 15.16. Suppose satisfy properties (M ) and (BV ) . If n Sn := k=1 Xn,k = L for some random variable L, then fL () = exp
R

eix 1 ix d (x) x2

(15.20) and fL () =

eix 1 ds (x) fL () x

where is a nite positive measure on (R, BR ) .


2 Proof. Let s2 n := Var (Sn ) . If limn sn = 0, then Sn 0 in L and hence weakly, therefore Eq. (15.20) holds with 0. So let us now suppose limn sn = 0. Since {sn }n=1 is bounded, we may by passing to a subsequence if necessary, assume limn sn = s > 0. By replacing Xn,k by Xn,k /sn and hence Sn by Sn /sn , we then know by Slutskys theorem that Sn /sn = L/s. Hence by an application of Theorem 15.14, we may conclude

eix ds (x) +
R R

eix 1 ds (x) x

fL ()

and in particular, fL (0) = 0 and fL (0) = s (R) . Therefore the probability measure, , on (R, BR ) such that () = fL () has mean zero and variance, s (R) < . Denition 15.17. A probability distribution, , on (R, BR ) is innitely divisible i for all n N there exists i.i.d. nondegenerate random variables, d n {Xn,k }k=1 , such that Xn,1 + +Xn,n = . This can be formulated in the following two equivalent ways. For all n N there should exists a non-degenerate probn n ability measure, n , on (R, BR ) such that () = [g ()] n = . For all n N, for some non-constant characteristic function, g. Theorem 15.18. The following class of symmetric distributions on (R, BR ) are equal; 1. C1 all possible limiting distributions under properties (M ) and (BV ) . 2. C2 all distributions with characteristic functions of the form given in Corollary 15.16.
macro: svmonob.cls date/time: 23-Feb-2007/15:20

fL (/s) = fL/s () = exp


R

eix 1 ix d (x) x2

where is a nite positive measure on (R, BR ) such that (R) 1. Letting s in this expression then implies fL () = exp
R

eisx 1 isx d (x) x2 eisx 1 isx (sx)


2

= exp
R

s d (x)

= exp
R

eix 1 ix ds (x) x2

Page: 188

job: prob

3. C3 all innitely divisible distributions with mean zero and nite variance. Proof. The inclusion, C1 C2 , is the content of Corollary 15.16. For C2 C3 , observe that if () = exp
R n

eix 1 ix d (x) x2

then () = [ n ()] where n is the unique probability measure on (R, BR ) such that eix 1 ix 1 d (x) . n () = exp x2 n R For C3 C1 , simply dene {Xn,k }k=1 to be i.i.d with E eiXn,k = n () . In this case Sn =
n k=1 n

Xn,k = .

15.1.1 Stable Laws See the le, dynkin-stable-innitely-divs.pdf, and Durrett [2, Example 3.10 on p. 106 and Section 2.7.].

Part IV

Conditional Expectations and Martingales

16 Hilbert Space Basics


Definition 16.1. Let H be a complex vector space. An inner product on H is a function, ⟨·|·⟩ : H × H → ℂ, such that

1. ⟨ax + by|z⟩ = a⟨x|z⟩ + b⟨y|z⟩, i.e. x ↦ ⟨x|z⟩ is linear.
2. ⟨x|y⟩ = \overline{⟨y|x⟩}.
3. ‖x‖² := ⟨x|x⟩ ≥ 0, with equality ‖x‖² = 0 iff x = 0.

Notice that combining properties (1) and (2) shows that, for fixed z ∈ H, x ↦ ⟨z|x⟩ is conjugate linear, i.e. ⟨z|ax + by⟩ = ā⟨z|x⟩ + b̄⟨z|y⟩. The following identity will be used frequently in the sequel without further mention:

    ‖x + y‖² = ⟨x + y|x + y⟩ = ‖x‖² + ‖y‖² + ⟨x|y⟩ + ⟨y|x⟩
             = ‖x‖² + ‖y‖² + 2Re⟨x|y⟩.    (16.1)

[Fig. 16.1. The picture behind the proof of the Schwarz inequality.]

Theorem 16.2 (Schwarz Inequality). Let (H, ⟨·|·⟩) be an inner product space; then for all x, y ∈ H,

    |⟨x|y⟩| ≤ ‖x‖ ‖y‖,

and equality holds iff x and y are linearly dependent.

Proof. If y = 0, the result holds trivially. So assume that y ≠ 0 and observe: if x = αy for some α ∈ ℂ, then ⟨x|y⟩ = α‖y‖² and hence |⟨x|y⟩| = |α| ‖y‖² = ‖x‖ ‖y‖. Now suppose that x ∈ H is arbitrary and let z := x − ‖y‖⁻²⟨x|y⟩y. (So z is the "orthogonal projection" of x onto y; see Figure 16.1.) Then

    0 ≤ ‖z‖² = ‖x − ‖y‖⁻²⟨x|y⟩y‖²
             = ‖x‖² + (|⟨x|y⟩|²/‖y‖⁴) ‖y‖² − 2Re⟨x | ‖y‖⁻²⟨x|y⟩y⟩
             = ‖x‖² − |⟨x|y⟩|²/‖y‖²,

from which it follows that 0 ≤ ‖x‖²‖y‖² − |⟨x|y⟩|², with equality iff z = 0, or equivalently iff x = ‖y‖⁻²⟨x|y⟩y. ∎

Corollary 16.3. Let (H, ⟨·|·⟩) be an inner product space and ‖x‖ := √⟨x|x⟩. Then the Hilbertian norm, ‖·‖, is a norm on H. Moreover ⟨·|·⟩ is continuous on H × H, where H is viewed as the normed space (H, ‖·‖).

Proof. If x, y ∈ H, then, using Schwarz's inequality,

    ‖x + y‖² = ‖x‖² + ‖y‖² + 2Re⟨x|y⟩ ≤ ‖x‖² + ‖y‖² + 2‖x‖‖y‖ = (‖x‖ + ‖y‖)².

Taking the square root of this inequality shows ‖·‖ satisfies the triangle inequality. Checking that ‖·‖ satisfies the remaining axioms of a norm is now routine and will be left to the reader. If x, Δx, y, Δy ∈ H, then

    |⟨x + Δx|y + Δy⟩ − ⟨x|y⟩| = |⟨x|Δy⟩ + ⟨Δx|y⟩ + ⟨Δx|Δy⟩|
        ≤ ‖x‖‖Δy‖ + ‖y‖‖Δx‖ + ‖Δx‖‖Δy‖ → 0 as Δx, Δy → 0,

from which it follows that ⟨·|·⟩ is continuous. ∎

Definition 16.4. Let (H, ⟨·|·⟩) be an inner product space. We say x, y ∈ H are orthogonal, and write x ⊥ y, iff ⟨x|y⟩ = 0. More generally, if A ⊂ H is a set, x ∈ H is orthogonal to A (write x ⊥ A) iff ⟨x|y⟩ = 0 for all y ∈ A. Let

    A⊥ := {x ∈ H : x ⊥ A}

be the set of vectors orthogonal to A. A subset S ⊂ H is an orthogonal set if x ⊥ y for all distinct elements x, y ∈ S. If S further satisfies ‖x‖ = 1 for all x ∈ S, then S is said to be an orthonormal set.

Proposition 16.5. Let (H, ⟨·|·⟩) be an inner product space. Then:

1. (Parallelogram Law)

    ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²    (16.2)

   for all x, y ∈ H.
2. (Pythagorean Theorem) If S ⊂ H is a finite orthogonal set, then

    ‖Σ_{x∈S} x‖² = Σ_{x∈S} ‖x‖².    (16.3)

3. If A ⊂ H is a set, then A⊥ is a closed linear subspace of H.

Proof. I will assume that H is a complex Hilbert space, the real case being easier. Items 1. and 2. are proved by the following elementary computations:

    ‖x + y‖² + ‖x − y‖² = ‖x‖² + ‖y‖² + 2Re⟨x|y⟩ + ‖x‖² + ‖y‖² − 2Re⟨x|y⟩
                        = 2‖x‖² + 2‖y‖²,

and

    ‖Σ_{x∈S} x‖² = ⟨Σ_{x∈S} x | Σ_{y∈S} y⟩ = Σ_{x,y∈S} ⟨x|y⟩ = Σ_{x∈S} ⟨x|x⟩ = Σ_{x∈S} ‖x‖².

Item 3. is a consequence of the continuity of ⟨·|·⟩ and the fact that

    A⊥ = ∩_{x∈A} Nul(⟨·|x⟩),

where Nul(⟨·|x⟩) = {y ∈ H : ⟨y|x⟩ = 0} is a closed subspace of H. ∎

Definition 16.6. A Hilbert space is an inner product space (H, ⟨·|·⟩) such that the induced Hilbertian norm is complete.

Example 16.7. For any measure space, (Ω, B, μ), H := L²(μ) with inner product

    ⟨f|g⟩ = ∫_Ω f(ω) \overline{g(ω)} dμ(ω)

is a Hilbert space; see Theorem 11.17 for the completeness assertion.

Definition 16.8. A subset C of a vector space X is said to be convex if for all x, y ∈ C the line segment [x, y] := {tx + (1 − t)y : 0 ≤ t ≤ 1} joining x to y is contained in C as well. (Notice that any vector subspace of X is convex.)

Theorem 16.9 (Best Approximation Theorem). Suppose that H is a Hilbert space and M ⊂ H is a closed convex subset of H. Then for any x ∈ H there exists a unique y ∈ M such that

    ‖x − y‖ = d(x, M) := inf_{z∈M} ‖x − z‖.

Moreover, if M is a vector subspace of H, then the point y may also be characterized as the unique point in M such that (x − y) ⊥ M.

Proof. Uniqueness. By replacing M by M − x := {m − x : m ∈ M} we may assume x = 0. Let δ := d(0, M) = inf_{m∈M} ‖m‖ and y, z ∈ M; see Figure 16.2.

[Fig. 16.2. The geometry of convex sets.]

By the parallelogram law and the convexity of M,

    2‖y‖² + 2‖z‖² = ‖y + z‖² + ‖y − z‖² = 4‖(y + z)/2‖² + ‖y − z‖² ≥ 4δ² + ‖y − z‖².    (16.4)

Hence if ‖y‖ = ‖z‖ = δ, then 2δ² + 2δ² ≥ 4δ² + ‖y − z‖², so that ‖y − z‖² = 0. Therefore, if a minimizer for d(0, ·)|_M exists, it is unique.

Existence. Let y_n ∈ M be chosen such that ‖y_n‖ = δ_n → δ = d(0, M). Taking y = y_m and z = y_n in Eq. (16.4) shows

    2δ²_m + 2δ²_n ≥ 4δ² + ‖y_n − y_m‖².


Passing to the limit m, n → ∞ in this equation implies

    2δ² + 2δ² ≥ 4δ² + lim sup_{m,n→∞} ‖y_n − y_m‖²,

i.e. lim sup_{m,n→∞} ‖y_n − y_m‖² = 0. Therefore, by completeness of H, {y_n}_{n=1}^∞ is convergent. Because M is closed, y := lim_{n→∞} y_n ∈ M, and because the norm is continuous, ‖y‖ = lim_{n→∞} ‖y_n‖ = δ = d(0, M). So y is the desired point in M which is closest to 0.

Now suppose M is a closed subspace of H and x ∈ H. Let y ∈ M be the closest point in M to x. Then for w ∈ M, the function

    g(t) := ‖x − (y + tw)‖² = ‖x − y‖² − 2t Re⟨x − y|w⟩ + t²‖w‖²

has a minimum at t = 0 and therefore 0 = g′(0) = −2Re⟨x − y|w⟩. Since w ∈ M is arbitrary, this implies that (x − y) ⊥ M. Finally suppose y ∈ M is any point such that (x − y) ⊥ M. Then for z ∈ M, by the Pythagorean theorem,

    ‖x − z‖² = ‖x − y + y − z‖² = ‖x − y‖² + ‖y − z‖² ≥ ‖x − y‖²,

which shows d(x, M)² ≥ ‖x − y‖². That is to say, y is the point in M closest to x. ∎

Definition 16.10. Suppose that A : H → H is a bounded operator, i.e.

    ‖A‖ := sup{‖Ax‖ : x ∈ H with ‖x‖ = 1} < ∞.

The adjoint of A, denoted A*, is the unique operator A* : H → H such that ⟨Ax|y⟩ = ⟨x|A*y⟩. (The proof that A* exists and is unique will be given in Proposition 16.15 below.) A bounded operator A : H → H is self-adjoint or Hermitian if A = A*.

Definition 16.11. Let H be a Hilbert space and M ⊂ H be a closed subspace. The orthogonal projection of H onto M is the function P_M : H → H such that, for x ∈ H, P_M(x) is the unique element in M such that (x − P_M(x)) ⊥ M, i.e. P_M(x) is the unique element in M such that

    ⟨x|m⟩ = ⟨P_M(x)|m⟩ for all m ∈ M.    (16.5)

Theorem 16.12 (Projection Theorem). Let H be a Hilbert space and M ⊂ H be a closed subspace. The orthogonal projection P_M satisfies:

1. P_M is linear and hence we will write P_M x rather than P_M(x).
2. P_M² = P_M (P_M is a projection).
3. P_M* = P_M (P_M is self-adjoint).
4. Ran(P_M) = M and Nul(P_M) = M⊥.
5. If N ⊂ M ⊂ H is another closed subspace, then P_N P_M = P_M P_N = P_N.

Proof. 1. Let x₁, x₂ ∈ H and α ∈ ℂ; then P_M x₁ + αP_M x₂ ∈ M and

    P_M x₁ + αP_M x₂ − (x₁ + αx₂) = [P_M x₁ − x₁ + α(P_M x₂ − x₂)] ∈ M⊥,

showing P_M x₁ + αP_M x₂ = P_M(x₁ + αx₂), i.e. P_M is linear.
2. Obviously Ran(P_M) = M and P_M x = x for all x ∈ M. Therefore P_M² = P_M.
3. Let x, y ∈ H; then, since (x − P_M x) and (y − P_M y) are in M⊥,

    ⟨P_M x|y⟩ = ⟨P_M x|P_M y + (y − P_M y)⟩ = ⟨P_M x|P_M y⟩
              = ⟨P_M x + (x − P_M x)|P_M y⟩ = ⟨x|P_M y⟩.

4. We have already seen Ran(P_M) = M, and P_M x = 0 iff x = x − 0 ∈ M⊥, i.e. Nul(P_M) = M⊥.
5. If N ⊂ M ⊂ H, it is clear that P_M P_N = P_N since P_M = Id on N = Ran(P_N) ⊂ M. Taking adjoints gives the other identity, namely that P_N P_M = P_N. More directly, if x ∈ H and n ∈ N, we have

    ⟨P_N P_M x|n⟩ = ⟨P_M x|P_N n⟩ = ⟨P_M x|n⟩ = ⟨x|P_M n⟩ = ⟨x|n⟩ = ⟨P_N x|n⟩.

Since this holds for all n ∈ N, we may conclude that P_N P_M x = P_N x. ∎

Corollary 16.13. If M ⊂ H is a proper closed subspace of a Hilbert space H, then H = M ⊕ M⊥.

Proof. Given x ∈ H, let y = P_M x so that x − y ∈ M⊥. Then x = y + (x − y) ∈ M + M⊥. If x ∈ M ∩ M⊥, then x ⊥ x, i.e. ‖x‖² = ⟨x|x⟩ = 0. So M ∩ M⊥ = {0}. ∎

Exercise 16.1. Suppose M is a subset of H; then M⊥⊥ = \overline{span(M)}.
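In finite dimensions the projection theorem can be checked numerically. The sketch below is an illustration only (not part of the original notes); it assumes numpy is available, and the matrix B and vector x are arbitrary choices. It takes H = ℝ⁵ and M = Ran(B), uses the standard least-squares projection P_M = B(BᵀB)⁻¹Bᵀ, and verifies items 2. and 3. of Theorem 16.12 together with the defining property of Definition 16.11.

    import numpy as np

    # Numerical sketch of Theorem 16.12 in H = R^5 (illustrative only).
    # M = Ran(B) for a random 5x2 matrix B; P = B (B^T B)^{-1} B^T is P_M.
    rng = np.random.default_rng(1)
    B = rng.normal(size=(5, 2))
    P = B @ np.linalg.inv(B.T @ B) @ B.T

    x = rng.normal(size=5)
    y = P @ x                                 # the closest point to x in M

    assert np.allclose(P @ P, P)              # P_M^2 = P_M   (item 2.)
    assert np.allclose(P, P.T)                # P_M^* = P_M   (item 3.)
    assert np.allclose(B.T @ (x - y), 0.0)    # (x - P_M x) is orthogonal to M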


Theorem 16.14 (Riesz Theorem). Let H* be the dual space of H (i.e. the linear space of continuous linear functionals on H). The map

    j : H → H*, z ↦ ⟨·|z⟩,

is a conjugate linear isometric isomorphism. (Recall that j is conjugate linear means j(z₁ + αz₂) = jz₁ + ᾱ jz₂ for all z₁, z₂ ∈ H and α ∈ ℂ.)

Proof. The map j is conjugate linear by the axioms of the inner product. Moreover, for x, z ∈ H,

    |⟨x|z⟩| ≤ ‖x‖ ‖z‖ for all x ∈ H,

with equality when x = z. This implies that ‖jz‖_{H*} = ‖⟨·|z⟩‖_{H*} = ‖z‖. Therefore j is isometric, and this implies j is injective. To finish the proof we must show that j is surjective. So let f ∈ H*, which we assume, without loss of generality, is non-zero. Then M := Nul(f) is a closed proper subspace of H. Since, by Corollary 16.13, H = M ⊕ M⊥, the map f : H/M ≅ M⊥ → 𝔽 is a linear isomorphism. This shows that dim(M⊥) = 1 and hence H = M ⊕ 𝔽x₀ where x₀ ∈ M⊥ \ {0}. Choose z = λx₀ ∈ M⊥ such that f(x₀) = ⟨x₀|z⟩, i.e. λ = \overline{f(x₀)}/‖x₀‖². Then for x = m + αx₀ with m ∈ M and α ∈ 𝔽,

    f(x) = αf(x₀) = α⟨x₀|z⟩ = ⟨αx₀|z⟩ = ⟨m + αx₀|z⟩ = ⟨x|z⟩,

which shows that f = jz. (Alternatively, choose x₀ ∈ M⊥ \ {0} such that f(x₀) = 1. For x ∈ M⊥ we have f(x − αx₀) = 0 provided that α := f(x). Therefore x − αx₀ ∈ M ∩ M⊥ = {0}, i.e. x = αx₀. This again shows that M⊥ is spanned by x₀.) ∎

Proposition 16.15 (Adjoints). Let H and K be Hilbert spaces and A : H → K be a bounded operator. Then there exists a unique bounded operator A* : K → H such that

    ⟨Ax|y⟩_K = ⟨x|A*y⟩_H for all x ∈ H and y ∈ K.    (16.7)

Moreover, for all A, B ∈ L(H, K) and λ ∈ ℂ:

1. (A + λB)* = A* + λ̄B*,
2. A** := (A*)* = A,
3. ‖A*‖ = ‖A‖, and
4. ‖A*A‖ = ‖A‖².
5. If K = H, then (AB)* = B*A*. In particular, A ∈ L(H) has a bounded inverse iff A* has a bounded inverse, and (A*)⁻¹ = (A⁻¹)*.

Proof. For each y ∈ K, the map x ↦ ⟨Ax|y⟩_K is in H*, and therefore there exists, by Theorem 16.14, a unique vector z ∈ H (we will denote this z by A*(y)) such that ⟨Ax|y⟩_K = ⟨x|z⟩_H for all x ∈ H. This shows there is a unique map A* : K → H such that ⟨Ax|y⟩_K = ⟨x|A*(y)⟩_H for all x ∈ H and y ∈ K. To see A* is linear, let y₁, y₂ ∈ K and λ ∈ ℂ; then for any x ∈ H,

    ⟨Ax|y₁ + λy₂⟩_K = ⟨Ax|y₁⟩_K + λ̄⟨Ax|y₂⟩_K
                    = ⟨x|A*(y₁)⟩_H + λ̄⟨x|A*(y₂)⟩_H
                    = ⟨x|A*(y₁) + λA*(y₂)⟩_H,

and by the uniqueness of A*(y₁ + λy₂) we find A*(y₁ + λy₂) = A*(y₁) + λA*(y₂). This shows A* is linear, and so we will now write A*y instead of A*(y). Since

    ⟨A*y|x⟩_H = \overline{⟨x|A*y⟩_H} = \overline{⟨Ax|y⟩_K} = ⟨y|Ax⟩_K,

it follows that A** = A. The assertion that (A + λB)* = A* + λ̄B* is Exercise 16.2.

Items 3. and 4. Making use of Schwarz's inequality (Theorem 16.2), we have

    ‖A*‖ = sup_{‖k‖=1} ‖A*k‖ = sup_{‖k‖=1} sup_{‖h‖=1} |⟨A*k|h⟩|
         = sup_{‖h‖=1} sup_{‖k‖=1} |⟨k|Ah⟩| = sup_{‖h‖=1} ‖Ah‖ = ‖A‖,

so that ‖A*‖ = ‖A‖. Since ‖A*A‖ ≤ ‖A*‖‖A‖ = ‖A‖² and

    ‖A‖² = sup_{‖h‖=1} ‖Ah‖² = sup_{‖h‖=1} |⟨Ah|Ah⟩|
         = sup_{‖h‖=1} |⟨h|A*Ah⟩| ≤ sup_{‖h‖=1} ‖A*Ah‖ = ‖A*A‖,    (16.8)

we also have ‖A‖² = ‖A*A‖. Alternatively, from Eq. (16.8),

    ‖A‖² ≤ ‖A*A‖ ≤ ‖A*‖ ‖A‖,    (16.9)

which then implies ‖A‖ ≤ ‖A*‖. Replacing A by A* in this last inequality shows ‖A*‖ ≤ ‖A‖ and hence ‖A*‖ = ‖A‖. Using this identity back in Eq. (16.9) proves ‖A‖² = ‖A*A‖.

Now suppose that K = H. Then

    ⟨ABh|k⟩ = ⟨Bh|A*k⟩ = ⟨h|B*A*k⟩,


which shows (AB)* = B*A*. If A⁻¹ exists, then

    (A⁻¹)*A* = (AA⁻¹)* = I* = I and A*(A⁻¹)* = (A⁻¹A)* = I* = I.

This shows that A* is invertible and (A*)⁻¹ = (A⁻¹)*. Similarly, if A* is invertible then so is A = A**. ∎

Exercise 16.2. Let H, K, M be Hilbert spaces, A, B ∈ L(H, K), C ∈ L(K, M) and λ ∈ ℂ. Show (A + λB)* = A* + λ̄B* and (CA)* = A*C* ∈ L(M, H).

Exercise 16.3. Let H = ℂⁿ and K = ℂᵐ equipped with the usual inner products, i.e. ⟨z|w⟩_H = z · w̄ for z, w ∈ H. Let A be an m × n matrix thought of as a linear operator from H to K. Show the matrix associated to A* : K → H is the conjugate transpose of A.

Lemma 16.16. Suppose A : H → K is a bounded operator. Then:

1. Nul(A*) = Ran(A)⊥.
2. \overline{Ran(A)} = Nul(A*)⊥.
3. If K = H and V ⊂ H is an A-invariant subspace (i.e. A(V) ⊂ V), then V⊥ is A*-invariant.

Proof. An element y ∈ K is in Nul(A*) iff 0 = ⟨A*y|x⟩ = ⟨y|Ax⟩ for all x ∈ H, which happens iff y ∈ Ran(A)⊥. Because, by Exercise 16.1, \overline{Ran(A)} = (Ran(A)⊥)⊥, the first item implies \overline{Ran(A)} = Nul(A*)⊥. Now suppose A(V) ⊂ V and y ∈ V⊥; then ⟨A*y|x⟩ = ⟨y|Ax⟩ = 0 for all x ∈ V, which shows A*y ∈ V⊥. ∎

The next elementary theorem (referred to as the bounded linear transformation theorem, or B.L.T. theorem for short) is often useful.

Theorem 16.17 (B.L.T. Theorem). Suppose that Z is a normed space, X is a Banach space (a complete normed space; the main examples for us are Hilbert spaces), and S ⊂ Z is a dense linear subspace of Z. If T : S → X is a bounded linear transformation (i.e. there exists C < ∞ such that ‖Tz‖ ≤ C‖z‖ for all z ∈ S), then T has a unique extension to an element T̄ ∈ L(Z, X), and this extension still satisfies ‖T̄z‖ ≤ C‖z‖ for all z ∈ Z.

Proof. Let z ∈ Z and choose z_n ∈ S such that z_n → z. Since

    ‖Tz_m − Tz_n‖ ≤ C‖z_m − z_n‖ → 0 as m, n → ∞,

it follows by the completeness of X that T̄z := lim_{n→∞} Tz_n exists. Moreover, if w_n ∈ S is another sequence converging to z, then

    ‖Tz_n − Tw_n‖ ≤ C‖z_n − w_n‖ → C‖z − z‖ = 0,

and therefore T̄z is well defined. It is now a simple matter to check that T̄ : Z → X is still linear and that

    ‖T̄z‖ = lim_{n→∞} ‖Tz_n‖ ≤ lim_{n→∞} C‖z_n‖ = C‖z‖ for all z ∈ Z.

Thus T̄ is an extension of T to all of Z. The uniqueness of this extension is easy to prove and will be left to the reader. ∎


17 The Radon-Nikodym Theorem


Theorem 17.1 (A Baby Radon-Nikodym Theorem). Suppose (X, M) is a measurable space and μ and ν are two finite positive measures on M such that ν(A) ≤ μ(A) for all A ∈ M. Then there exists a measurable function, ρ : X → [0, 1], such that dν = ρ dμ.

Proof. If f is a non-negative simple function, then

    ν(f) = Σ_{a≥0} a ν(f = a) ≤ Σ_{a≥0} a μ(f = a) = μ(f).

In light of Theorem 6.34 and the MCT, this inequality continues to hold for all non-negative measurable functions. Furthermore, if f ∈ L¹(μ), then ν(|f|) ≤ μ(|f|) < ∞, hence f ∈ L¹(ν) and

    |ν(f)| ≤ ν(|f|) ≤ μ(|f|) ≤ μ(X)^{1/2} ‖f‖_{L²(μ)}.

Therefore, L²(μ) ∋ f ↦ ν(f) ∈ ℂ is a continuous linear functional on L²(μ). By the Riesz representation Theorem 16.14, there exists a unique ρ ∈ L²(μ) such that

    ν(f) = ∫_X fρ dμ for all f ∈ L²(μ).

In particular this equation holds for all bounded measurable functions, f : X → ℝ, and for such a function we have

    ν(f) = Re ν(f) = Re ∫_X fρ dμ = ∫_X f Re ρ dμ.    (17.1)

Thus, by replacing ρ by Re ρ if necessary, we may assume ρ is real. Taking f = 1_{ρ<0} in Eq. (17.1) shows

    0 ≤ ν(ρ < 0) = ∫_X 1_{ρ<0} ρ dμ ≤ 0,

from which we conclude that 1_{ρ<0} ρ = 0, μ-a.e., i.e. μ(ρ < 0) = 0. Therefore ρ ≥ 0, μ-a.e. Similarly, for α > 1,

    μ(ρ > α) ≥ ν(ρ > α) = ∫_X 1_{ρ>α} ρ dμ ≥ α μ(ρ > α),

which is possible iff μ(ρ > α) = 0. Letting α ↓ 1, it follows that μ(ρ > 1) = 0 and hence 0 ≤ ρ ≤ 1, μ-a.e. ∎
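For example (an illustration not taken from the notes): on X = [0, 1] with M = B_{[0,1]}, let μ = m be Lebesgue measure and ν(A) := ∫_A x dm(x). Since 0 ≤ x ≤ 1, we have ν(A) ≤ μ(A) for all A ∈ M, and the theorem produces (modulo μ-null sets) the density ρ(x) = x ∈ [0, 1] with dν = ρ dμ.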


Definition 17.2. Let μ and ν be two positive measures on a measurable space, (X, M). Then:

1. μ and ν are mutually singular (written μ ⊥ ν) if there exists A ∈ M such that ν(A) = 0 and μ(Aᶜ) = 0. In this case we say that μ lives on A and ν lives on Aᶜ.
2. The measure ν is absolutely continuous relative to μ (written ν ≪ μ) provided ν(A) = 0 whenever μ(A) = 0.

As an example, suppose that μ is a positive measure and ρ ≥ 0 is a measurable function. Then the measure ν := ρμ is absolutely continuous relative to μ. Indeed, if μ(A) = 0 then

    ν(A) = ∫_A ρ dμ = 0.

We will eventually show that if μ and ν are σ-finite and ν ≪ μ, then dν = ρ dμ for some measurable function ρ ≥ 0.

Definition 17.3 (Lebesgue Decomposition). Let μ and ν be two positive measures on a measurable space, (X, M). Two positive measures ν_a and ν_s form a Lebesgue decomposition of ν relative to μ if ν = ν_a + ν_s, ν_a ≪ μ, and ν_s ⊥ μ.

Lemma 17.4. If μ₁, μ₂ and ν are positive measures on (X, M) such that μ₁ ⊥ ν and μ₂ ⊥ ν, then (μ₁ + μ₂) ⊥ ν. More generally, if {μ_i}_{i=1}^∞ is a sequence of positive measures such that μ_i ⊥ ν for all i, then μ := Σ_{i=1}^∞ μ_i is singular relative to ν.

Proof. It suffices to prove the second assertion since we can then take μ_j ≡ 0 for all j ≥ 3. Choose A_i ∈ M such that ν(A_i) = 0 and μ_i(A_iᶜ) = 0 for all i. Letting A := ∪_i A_i we have ν(A) = 0. Moreover, since Aᶜ = ∩_i A_iᶜ ⊂ A_mᶜ for all m, we have μ_i(Aᶜ) = 0 for all i and therefore μ(Aᶜ) = 0. This shows that μ ⊥ ν. ∎

Lemma 17.5. Let ν and μ be positive measures on (X, M). If there exists a Lebesgue decomposition, ν = ν_s + ν_a, of the measure ν relative to μ, then this decomposition is unique. Moreover, if ν is a σ-finite measure then so are ν_s and ν_a.

Proof. Since ν_s ⊥ μ, there exists A ∈ M such that μ(A) = 0 and ν_s(Aᶜ) = 0, and because ν_a ≪ μ, we also know that ν_a(A) = 0. So for C ∈ M,

    ν(C ∩ A) = ν_s(C ∩ A) + ν_a(C ∩ A) = ν_s(C ∩ A) = ν_s(C)    (17.2)

and

    ν(C ∩ Aᶜ) = ν_s(C ∩ Aᶜ) + ν_a(C ∩ Aᶜ) = ν_a(C ∩ Aᶜ) = ν_a(C).    (17.3)

Now suppose we have another Lebesgue decomposition, ν = ν̃_a + ν̃_s, with ν̃_s ⊥ μ and ν̃_a ≪ μ. Working as above, we may choose Ã ∈ M such that μ(Ã) = 0 and ν̃_s(Ãᶜ) = 0. Then B := A ∪ Ã is still a μ-null set, both ν_s and ν̃_s live on B, and Bᶜ = Aᶜ ∩ Ãᶜ is a null set for both ν_s and ν̃_s. Therefore we may use Eqs. (17.2) and (17.3) with A replaced by B to conclude

    ν_s(C) = ν(C ∩ B) = ν̃_s(C) and ν_a(C) = ν(C ∩ Bᶜ) = ν̃_a(C) for all C ∈ M.

Lastly, if ν is a σ-finite measure, then there exist X_n ∈ M such that X = ⊔_{n=1}^∞ X_n and ν(X_n) < ∞ for all n. Since ∞ > ν(X_n) = ν_a(X_n) + ν_s(X_n), we must have ν_a(X_n) < ∞ and ν_s(X_n) < ∞, showing ν_a and ν_s are σ-finite as well. ∎

Lemma 17.6. Suppose μ is a positive measure on (X, M) and f, g : X → [0, ∞] are functions such that the measures f dμ and g dμ are σ-finite and further satisfy

    ∫_A f dμ = ∫_A g dμ for all A ∈ M.    (17.4)

Then f(x) = g(x) for μ-a.e. x.

Proof. By assumption there exist X_n ∈ M such that X_n ↑ X and ∫_{X_n} f dμ < ∞ and ∫_{X_n} g dμ < ∞ for all n. Replacing A by A ∩ X_n in Eq. (17.4) implies

    ∫_A 1_{X_n} f dμ = ∫_{A∩X_n} f dμ = ∫_{A∩X_n} g dμ = ∫_A 1_{X_n} g dμ

for all A ∈ M. Since 1_{X_n} f and 1_{X_n} g are in L¹(μ) for all n, this equation implies 1_{X_n} f = 1_{X_n} g, μ-a.e. Letting n → ∞ then shows that f = g, μ-a.e. ∎

Remark 17.7. Lemma 17.6 is in general false without the σ-finiteness assumption. A trivial counterexample is to take M = 2^X, μ(A) = ∞ for all non-empty A ∈ M, f = 1_X and g = 2·1_X. Then Eq. (17.4) holds yet f ≠ g.

Theorem 17.8 (Radon-Nikodym Theorem for Positive Measures). Suppose that μ and ν are σ-finite positive measures on (X, M). Then ν has a unique Lebesgue decomposition ν = ν_a + ν_s relative to μ, and there exists a unique (modulo sets of μ-measure 0) function ρ : X → [0, ∞) such that dν_a = ρ dμ. Moreover, ν_s = 0 iff ν ≪ μ.

Proof. The uniqueness assertions follow directly from Lemmas 17.5 and 17.6.

Existence when μ and ν are both finite measures. (Von Neumann's proof; see Remark 17.9 for the motivation.) First suppose that μ and ν are finite measures and let λ := μ + ν. By Theorem 17.1, dν = h dλ with 0 ≤ h ≤ 1, and this implies, for all non-negative measurable functions f, that

    ν(f) = λ(fh) = μ(fh) + ν(fh),    (17.5)

or equivalently

    ν(f(1 − h)) = μ(fh).    (17.6)

Taking f = 1_{h=1} in Eq. (17.6) shows that

    μ({h = 1}) = ν(1_{h=1}(1 − h)) = 0,

i.e. 0 ≤ h(x) < 1 for μ-a.e. x. Let

    ρ := 1_{h<1} h/(1 − h),

and then take f = g 1_{h<1} (1 − h)⁻¹ with g ≥ 0 in Eq. (17.6) to learn

    ν(g 1_{h<1}) = μ(g 1_{h<1} (1 − h)⁻¹ h) = μ(ρg).

Hence, if we define ν_a := 1_{h<1} ν and ν_s := 1_{h=1} ν, we then have ν_s ⊥ μ (since ν_s lives on {h = 1} while μ(h = 1) = 0) and ν_a = ρμ, and in particular ν_a ≪ μ. Hence ν = ν_a + ν_s is the desired Lebesgue decomposition of ν. If we further assume that ν ≪ μ, then μ(h = 1) = 0 implies ν(h = 1) = 0, hence ν_s = 0, and we conclude that ν = ν_a = ρμ.

Existence when μ and ν are σ-finite measures. Write X = ⊔_{n=1}^∞ X_n where X_n ∈ M are chosen so that μ(X_n) < ∞ and ν(X_n) < ∞ for all n. Let dμ_n := 1_{X_n} dμ and dν_n := 1_{X_n} dν. Then, by what we have just proved, there exist ρ_n ∈ L¹(X, μ_n) and measures ν_nˢ such that dν_n = ρ_n dμ_n + dν_nˢ with ν_nˢ ⊥ μ_n. Since μ_n and ν_nˢ live on X_n, there exist A_n ∈ M_{X_n} such that μ(A_n) = μ_n(A_n) = 0 and

    ν_nˢ(X \ A_n) = ν_nˢ(X_n \ A_n) = 0.

This shows that ν_nˢ ⊥ μ for all n, and so, by Lemma 17.4, ν_s := Σ_{n=1}^∞ ν_nˢ is singular relative to μ. Since

    ν = Σ_{n=1}^∞ ν_n = Σ_{n=1}^∞ (ρ_n μ_n + ν_nˢ) = Σ_{n=1}^∞ (ρ_n 1_{X_n} μ + ν_nˢ) = ρμ + ν_s,    (17.7)

where ρ := Σ_{n=1}^∞ 1_{X_n} ρ_n, it follows that ν = ν_a + ν_s with ν_a = ρμ. Hence this is the desired Lebesgue decomposition of ν relative to μ. ∎

Remark 17.9. Here is the motivation for the above construction. Suppose that dν = dν_s + ρ dμ is the Radon-Nikodym decomposition and X = A ⊔ B is such that ν_s(B) = 0 and μ(A) = 0. Then we find

    ν_s(f) + μ(ρf) = ν(f) = λ(hf) = ν(hf) + μ(hf).

Letting f → 1_A f then implies that

    ν_s(1_A f) = ν(1_A f) = ν(1_A h f),

which shows that h = 1, ν-a.e. on A. Also, letting f → 1_B f implies that

    μ(ρ 1_B f) = ν(h 1_B f) + μ(h 1_B f) = μ(ρ h 1_B f) + μ(h 1_B f),

which implies ρ = ρh + h, μ-a.e. on B, i.e.

    ρ(1 − h) = h, μ-a.e. on B.

In particular it follows that h < 1, μ-a.e. on B and that ρ = (h/(1 − h)) 1_{h<1}, μ-a.e. So, up to sets of ν-measure zero, A = {h = 1} and B = {h < 1}, and therefore

    dν = 1_{h=1} dν + 1_{h<1} dν = 1_{h=1} dν + (h/(1 − h)) 1_{h<1} dμ.

18 Conditional Expectation
In this section let (Ω, B, P) be a probability space, i.e. (Ω, B, P) is a measure space and P(Ω) = 1. Let G ⊂ B be a sub-sigma-algebra of B and write f ∈ G_b if f : Ω → ℂ is bounded and f is (G, B_ℂ)-measurable. If A ∈ B and P(A) > 0, we will let

    E[X|A] := E[X : A]/P(A) and P(B|A) := E[1_B|A] := P(A ∩ B)/P(A)

for all integrable random variables, X, and B ∈ B. We will often use the factorization Lemma 6.35 in this section. Because of this, let us repeat it here.

Lemma 18.1. Suppose that (Y, F) is a measurable space and F : Ω → Y is a map. Then to every (σ(F), B_ℝ̄)-measurable function, H : Ω → ℝ̄, there is an (F, B_ℝ̄)-measurable function h : Y → ℝ̄ such that H = h ∘ F.

Proof. First suppose that H = 1_A where A ∈ σ(F) = F⁻¹(F). Let B ∈ F be such that A = F⁻¹(B); then 1_A = 1_{F⁻¹(B)} = 1_B ∘ F and hence the lemma is valid in this case with h = 1_B. More generally, if H = Σ a_i 1_{A_i} is a simple function, then there exist B_i ∈ F such that 1_{A_i} = 1_{B_i} ∘ F and hence H = h ∘ F with h := Σ a_i 1_{B_i}, a simple function on Y. For a general (σ(F), B_ℝ̄)-measurable function, H, from Ω → ℝ̄, choose simple functions H_n converging to H. Let h_n be simple functions on Y such that H_n = h_n ∘ F. Then it follows that

    H = lim_{n→∞} H_n = lim sup_{n→∞} H_n = (lim sup_{n→∞} h_n) ∘ F = h ∘ F,

where h := lim sup_{n→∞} h_n, a measurable function from Y to ℝ̄. ∎

Lemma 18.2. Suppose that F, G : Ω → [0, ∞] are B-measurable functions. Then F ≥ G a.s. iff

    E[F : A] ≥ E[G : A] for all A ∈ B.    (18.1)

In particular F = G a.s. iff equality holds in Eq. (18.1). Moreover, for F ∈ L¹(Ω, B, P), F = 0 a.s. iff E[F : A] = 0 for all A ∈ B.

Proof. Hopefully it is clear to the reader that it suffices to prove the first assertion. Also, it is clear that F ≥ G a.s. implies Eq. (18.1). For the converse assertion, if we take A = {F = 0} in Eq. (18.1) we learn that

    0 = E[F : F = 0] ≥ E[G : F = 0] ≥ 0,

and hence that G 1_{F=0} = 0 a.s. Similarly, if A := {G > αF} with α > 1 in Eq. (18.1), then

    α E[F : G > αF] ≤ E[G : G > αF] ≤ E[F : G > αF].

Since α > 1, the only way this can happen is if E[F : G > αF] = 0. By the MCT we may now let α ↓ 1 to conclude 0 = E[F : G > F], i.e. F 1_{G>F} = 0 a.s. Therefore we have shown, almost surely, that G = 0 on {F = 0} and F = 0 on {G > F}; combining these shows P(G > F) = 0, i.e. G ≤ F a.s.

If F ∈ L¹(Ω, B, P) and E[F : A] = 0 for all A ∈ B, we may conclude by a simple limiting argument that E[Fh] = 0 for all h ∈ B_b. Taking h := sgn(F) := (F̄/|F|) 1_{|F|>0} in this identity then implies

    0 = E[Fh] = E[F (F̄/|F|) 1_{|F|>0}] = E[|F| 1_{|F|>0}] = E[|F|],

which implies that F = 0 a.s. ∎

Definition 18.3 (Conditional Expectation). Let E_G : L²(Ω, B, P) → L²(Ω, G, P) denote orthogonal projection of L²(Ω, B, P) onto the closed subspace L²(Ω, G, P). For f ∈ L²(Ω, B, P), we say that E_G f ∈ L²(Ω, G, P) is the conditional expectation of f.

Theorem 18.4. Let (Ω, B, P) and G ⊂ B be as above and let f, g ∈ L¹(Ω, B, P). The operator E_G : L²(Ω, B, P) → L²(Ω, G, P) extends uniquely to a linear contraction from L¹(Ω, B, P) to L¹(Ω, G, P). This extension enjoys the following properties:

1. If f ≥ 0, P-a.e., then E_G f ≥ 0, P-a.e.
2. Monotonicity. If f ≥ g, P-a.e., then E_G f ≥ E_G g, P-a.e.
3. |E_G f| ≤ E_G |f|, P-a.e.
4. If f ∈ L¹(Ω, B, P), then F = E_G f ∈ L¹(Ω, G, P) iff

    E(Fh) = E(fh) for all h ∈ G_b.    (18.2)

5. Pull out property or product rule. If g ∈ G_b and f ∈ L¹(Ω, B, P), then E_G(gf) = g·E_G f, P-a.e.


6. Tower or smoothing property. If G₀ ⊂ G₁ ⊂ B, then

    E_{G₀} E_{G₁} f = E_{G₁} E_{G₀} f = E_{G₀} f a.s. for all f ∈ L¹(Ω, B, P).    (18.3)

Proof. By the definition of orthogonal projection, for f ∈ L²(Ω, B, P) and h ∈ G_b,

    E(fh) = E(f · E_G h) = E(E_G f · h).    (18.4)

Taking

    h = sgn(E_G f) := (\overline{E_G f}/|E_G f|) 1_{|E_G f|>0}    (18.5)

in Eq. (18.4) shows

    E(|E_G f|) = E(E_G f · h) = E(fh) ≤ E(|fh|) ≤ E(|f|).    (18.6)

It follows from this equation and the B.L.T. theorem (Theorem 16.17) that E_G extends uniquely to a contraction from L¹(Ω, B, P) to L¹(Ω, G, P). Moreover, by a simple limiting argument, Eq. (18.4) remains valid for all f ∈ L¹(Ω, B, P) and h ∈ G_b. Indeed, if f_n := f 1_{|f|≤n} ∈ L²(Ω, B, P), then f_n → f in L¹(Ω, B, P) and hence

    E(E_G f · h) = lim_{n→∞} E(E_G f_n · h) = lim_{n→∞} E(f_n h) = E(fh).    (18.7)

Conversely, if F ∈ L¹(Ω, G, P) satisfies Eq. (18.2), then E(Fh) = E(fh) = E(E_G f · h) for all h ∈ G_b, or equivalently E((F − E_G f) h) = 0 for all h ∈ G_b. Taking h = sgn(F − E_G f) in this identity then shows E[|F − E_G f|] = 0, i.e. F = E_G f a.s. This proves item 4.

Item 5. is now an easy consequence of the characterization in item 4., since if h ∈ G_b,

    E[(g E_G f) h] = E[E_G f · hg] = E[f · hg] = E[(gf) h] = E[E_G(gf) · h].

Thus E_G(gf) = g E_G f, P-a.e.

Items 1. and 2. If f, h ≥ 0, then 0 ≤ E(fh) = E(E_G f · h), and since this holds for all h ≥ 0 in G_b, E_G f ≥ 0, P-a.e. If f ≥ g a.s., we may apply this result with f replaced by f − g to complete the proof of both items.

Item 3. If f is real, ±f ≤ |f| and so, by item 2., ±E_G f ≤ E_G |f|, i.e. |E_G f| ≤ E_G |f|, P-a.e. For complex f, let h ≥ 0 be a bounded and G-measurable function. Then

    E[|E_G f| h] = E[E_G f · \overline{sgn(E_G f)} h] = E[f · \overline{sgn(E_G f)} h] ≤ E[|f| h] = E[E_G |f| · h].

Since h is arbitrary, it follows that |E_G f| ≤ E_G |f|, P-a.e.

Item 6. We first record a convergence fact: suppose 0 ≤ f_n ≤ f ∈ L¹(Ω, B, P) and f_n ↑ f a.s. Then, by the MCT (or DCT), f_n → f in L¹(Ω, B, P) and therefore E_G f_n → E_G f in L¹(Ω, B, P). On the other hand, by item 2., g := lim_{n→∞} E_G f_n exists a.s. and we may identify g with E_G f a.s. Thus we have shown E_G f_n → E_G f almost surely and in L¹(Ω, B, P). As for item 6. itself: by item 5. of the projection Theorem 16.12, Eq. (18.3) holds on L²(Ω, B, P). The continuity of conditional expectation on L¹(Ω, B, P) and the density of L²(Ω, B, P) in L¹(Ω, B, P) then show that Eq. (18.3) continues to hold on L¹(Ω, B, P). ∎

Remark 18.5. There is another standard construction of E_G f based on the characterization in Eq. (18.2) and the Radon-Nikodym Theorem 17.8. It goes as follows: for 0 ≤ f ∈ L¹(P), let Q := fP and observe that Q|_G ≪ P|_G, and hence there exists 0 ≤ g ∈ L¹(Ω, G, P) such that dQ|_G = g dP|_G. This then implies that

    ∫_A f dP = Q(A) = ∫_A g dP for all A ∈ G,

i.e. g = E_G f. For general real valued f ∈ L¹(P), define E_G f = E_G f₊ − E_G f₋, and then for complex f ∈ L¹(P) let E_G f = E_G Re f + i E_G Im f.

Notation 18.6. In the future we will often write E_G f as E[f|G]. Moreover, if (X, M) is a measurable space and X : Ω → X is a measurable map, we will often simply denote E[f|σ(X)] by E[f|X]. We will further let P(A|G) := E[1_A|G] be the conditional probability of A given G, and P(A|X) := P(A|σ(X)) be the conditional probability of A given X.

Exercise 18.1. Suppose f ∈ L¹(Ω, B, P) and f > 0 a.s. Show E[f|G] > 0 a.s. Use this result to conclude that if f ∈ (a, b) a.s. for some a, b such that −∞ ≤ a < b ≤ ∞, then E[f|G] ∈ (a, b) a.s. More precisely, you are to show that any version, g, of E[f|G] satisfies g ∈ (a, b) a.s.

18.1 Examples

Example 18.7. Suppose G is the trivial σ-algebra, i.e. G = {∅, Ω}. In this case E_G f = Ef a.s.

Example 18.8. On the opposite extreme, if G = B, then E_G f = f a.s.
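Between these extremes, the simplest non-trivial case is G = σ(A) = {∅, A, Aᶜ, Ω} with 0 < P(A) < 1: a short computation with Eq. (18.2) shows

    E[f|G] = E[f|A] 1_A + E[f|Aᶜ] 1_{Aᶜ} a.s.,

so conditioning on G averages f separately over A and over Aᶜ. (Exercise 18.2 below generalizes this to countable partitions.)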



Lemma 18.9. Suppose (X, M) is a measurable space, X : Ω → X is a measurable function, and G is a sub-σ-algebra of B. If X is independent of G and f : X → ℝ is a measurable function such that f(X) ∈ L¹(Ω, B, P), then E_G[f(X)] = E[f(X)] a.s. Conversely, if E_G[f(X)] = E[f(X)] a.s. for all bounded measurable functions, f : X → ℝ, then X is independent of G.

Proof. Suppose that X is independent of G, f : X → ℝ is a measurable function such that f(X) ∈ L¹(Ω, B, P), μ := E[f(X)], and A ∈ G. Then, by independence,

    E[f(X) : A] = E[f(X) 1_A] = E[f(X)] E[1_A] = E[μ 1_A] = E[μ : A].

Therefore E_G[f(X)] = μ = E[f(X)] a.s. Conversely, if E_G[f(X)] = E[f(X)] = μ and A ∈ G, then

    E[f(X) 1_A] = E[f(X) : A] = E[μ : A] = μ E[1_A] = E[f(X)] E[1_A].

Since this last equation is assumed to hold true for all A ∈ G and all bounded measurable functions, f : X → ℝ, X is independent of G. ∎

The following remark is often useful in computing conditional expectations.

Remark 18.10 (Note well.). According to Lemma 18.1, E(f|X) = f̃(X) a.s. for some measurable function, f̃ : X → ℝ. So computing E(f|X) = f̃(X) is equivalent to finding a function, f̃ : X → ℝ, such that

    E[f · h(X)] = E[f̃(X) h(X)]    (18.8)

for all bounded and measurable functions, h : X → ℝ.

The following exercise should help you gain some more intuition about conditional expectations.

Exercise 18.2. Suppose (Ω, B, P) is a probability space and P := {A_i}_{i=1}^∞ ⊂ B is a partition of Ω. (Recall this means Ω = ⊔_{i=1}^∞ A_i.) Let G be the σ-algebra generated by P. Show:

1. B ∈ G iff B = ∪_{i∈Λ} A_i for some Λ ⊂ ℕ.
2. g : Ω → ℝ is G-measurable iff g = Σ_{i=1}^∞ λ_i 1_{A_i} for some λ_i ∈ ℝ.
3. For f ∈ L¹(Ω, B, P), let E[f|A_i] := E[1_{A_i} f]/P(A_i) if P(A_i) ≠ 0 and E[f|A_i] := 0 otherwise. Show

    E_G f = Σ_{i=1}^∞ E[f|A_i] 1_{A_i} a.s.
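Exercise 18.2 is easy to test by Monte Carlo. In the sketch below (illustrative choices throughout, not from the notes; numpy assumed), Ω is sampled uniformly, the partition is given by the level sets {Z = i} of a discrete variable Z, and E_G f is computed from the displayed formula; one can then check, for instance, that E[E_G f] = E[f], which is Eq. (18.2) with h = 1.

    import numpy as np

    # A minimal numerical sketch of Exercise 18.2 (made-up partition and f).
    rng = np.random.default_rng(0)
    U = rng.uniform(size=100_000)          # underlying uniform sample of Omega
    Z = np.floor(3 * U).astype(int)        # partition A_i = {Z = i}, i = 0,1,2
    f = np.sin(2 * np.pi * U)              # an integrable f on Omega

    cond_exp = np.empty_like(f)
    for i in range(3):
        A_i = (Z == i)
        cond_exp[A_i] = f[A_i].mean()      # E[1_{A_i} f]/P(A_i), empirically

    # Sanity check: E[E_G f] = E[f], and E_G f is constant on each A_i.
    print(f.mean(), cond_exp.mean())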



Proposition 18.11. Suppose that (Ω, B, P) is a probability space, (X, M, μ) and (Y, N, ν) are two σ-finite measure spaces, X : Ω → X and Y : Ω → Y are measurable functions, and there exists 0 ≤ ρ ∈ L¹(X × Y, M ⊗ N, μ ⊗ ν) such that P((X, Y) ∈ U) = ∫_U ρ(x, y) dμ(x) dν(y) for all U ∈ M ⊗ N. Let

    ρ̄(x) := ∫_Y ρ(x, y) dν(y)    (18.9)

and, for x ∈ X and B ∈ N, let

    Q(x, B) := (1/ρ̄(x)) ∫_B ρ(x, y) dν(y) if ρ̄(x) ∈ (0, ∞), and Q(x, B) := δ_{y₀}(B) if ρ̄(x) ∈ {0, ∞},    (18.10)

where y₀ is some arbitrary but fixed point in Y. Then for any bounded (or non-negative) measurable function, f : Y → ℝ, we have E[f(Y)|X] = Q(X, f) a.s., where

    Q(X, f) := (1/ρ̄(X)) ∫_Y f(y) ρ(X, y) dν(y) if ρ̄(X) ∈ (0, ∞), and Q(X, f) := δ_{y₀}(f) = f(y₀) if ρ̄(X) ∈ {0, ∞}.

Proof. Our goal is to compute E[f(Y)|X]. According to Remark 18.10, we are searching for a bounded measurable function, g : X → ℝ, such that

    E[f(Y) h(X)] = E[g(X) h(X)] for all h ∈ M_b.    (18.11)

(Throughout this argument we are going to repeatedly use the Tonelli-Fubini theorems.) We now compute both sides of this equality:

    E[f(Y) h(X)] = ∫_{X×Y} h(x) f(y) ρ(x, y) dμ(x) dν(y)
                 = ∫_X h(x) [ ∫_Y f(y) ρ(x, y) dν(y) ] dμ(x)    (18.12)

and

    E[g(X) h(X)] = ∫_{X×Y} h(x) g(x) ρ(x, y) dμ(x) dν(y) = ∫_X h(x) g(x) ρ̄(x) dμ(x).    (18.13)

Comparing Eqs. (18.12) and (18.13), which are to be equal for all h ∈ M_b, requires us to demand

    ∫_Y f(y) ρ(x, y) dν(y) = g(x) ρ̄(x) for μ-a.e. x.    (18.14)

There are two possible problems in solving this equation for g(x) at a particular point x: the first is when ρ̄(x) = 0 and the second is when ρ̄(x) = ∞. Since

    ∫_X ρ̄(x) dμ(x) = ∫_X ∫_Y ρ(x, y) dν(y) dμ(x) = 1,

we know that ρ̄(x) < ∞ for μ-a.e. x, and so the second problem is not an issue. Moreover, if ρ̄(x) = 0, then ρ(x, y) = 0 for ν-a.e. y and therefore

    ∫_Y f(y) ρ(x, y) dν(y) = 0,    (18.15)

and Eq. (18.14) will be valid no matter how we choose g(x) at points where ρ̄(x) = 0. Therefore, if we let y₀ ∈ Y be an arbitrary but fixed point and then define

    g(x) := (1/ρ̄(x)) ∫_Y f(y) ρ(x, y) dν(y) if ρ̄(x) ∈ (0, ∞), and g(x) := f(y₀) if ρ̄(x) ∈ {0, ∞},

then we have shown E[f(Y)|X] = g(X) = Q(X, f) a.s., as desired. (Observe that when ρ̄(x) < ∞, ρ(x, ·) ∈ L¹(ν) and hence the integral in the definition of g is well defined.)

Just for added security, let us check directly that g(X) = E[f(Y)|X] a.s. According to Eq. (18.13) we have

    E[g(X) h(X)] = ∫_X h(x) g(x) ρ̄(x) dμ(x)
                 = ∫_{X∩{0<ρ̄<∞}} h(x) g(x) ρ̄(x) dμ(x)
                 = ∫_{X∩{0<ρ̄<∞}} h(x) ρ̄(x) (1/ρ̄(x)) ∫_Y f(y) ρ(x, y) dν(y) dμ(x)
                 = ∫_{X∩{0<ρ̄<∞}} h(x) ∫_Y f(y) ρ(x, y) dν(y) dμ(x)
                 = ∫_X h(x) ∫_Y f(y) ρ(x, y) dν(y) dμ(x)
                 = E[f(Y) h(X)] (by Eq. (18.12)),

wherein we have repeatedly used μ(ρ̄ = ∞) = 0 and the fact that Eq. (18.15) holds when ρ̄(x) = 0. This completes the verification that g(X) = E[f(Y)|X] a.s.

This proposition shows that conditional expectation is a generalization of the notion of performing integration over a partial subset of the variables in the integrand, whereas to compute the full expectation one integrates over all of the variables. It also gives an example of regular conditional probabilities.
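For example (an illustrative choice of density, not from the notes): take X = Y = [0, 1] with μ = ν = Lebesgue measure and ρ(x, y) = x + y. Then ρ̄(x) = x + 1/2 and, for f(y) = y,

    E[Y|X] = (1/ρ̄(X)) ∫_0^1 y (X + y) dy = (X/2 + 1/3)/(X + 1/2) a.s.,

which is the familiar elementary formula E[Y|X = x] = ∫ y ρ(x, y) dy / ρ̄(x).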



Definition 18.12. Let (X, M) and (Y, N) be measurable spaces. A function, Q : X × N → [0, 1], is a probability kernel on X × Y iff

1. Q(x, ·) : N → [0, 1] is a probability measure on (Y, N) for each x ∈ X, and
2. Q(·, B) : X → [0, 1] is M/B_ℝ-measurable for all B ∈ N.

If Q is a probability kernel on X × Y and f : Y → ℝ is a bounded measurable function or a positive measurable function, then x ↦ Q(x, f) := ∫_Y f(y) Q(x, dy) is M/B_ℝ-measurable. This is clear for simple functions and then for general functions via simple limiting arguments.

Definition 18.13. Let (X, M) and (Y, N) be measurable spaces and X : Ω → X and Y : Ω → Y be measurable functions. A probability kernel, Q, on X × Y is said to be a regular conditional distribution of Y given X iff Q(X, B) is a version of P(Y ∈ B|X) for each B ∈ N. Equivalently, we should have Q(X, f) = E[f(Y)|X] a.s. for all f ∈ N_b. When X = Ω and M = G is a sub-σ-algebra of B, we say that Q is the regular conditional distribution of Y given G.

The probability kernel, Q, defined in Eq. (18.10) is an example of a regular conditional distribution of Y given X.

Remark 18.14. Unfortunately, regular conditional distributions do not always exist. However, if we require Y to be a standard Borel space (i.e. Y is isomorphic to a Borel subset of ℝ), then a regular conditional distribution of Y given X will always exist; see Theorem 18.21. Moreover, it is known that all "reasonable" measure spaces are standard Borel spaces; see Section 18.4 below for more details. So in most instances of interest a regular conditional distribution of Y given X will exist.

Exercise 18.3. Suppose that (X, M) and (Y, N) are measurable spaces, X : Ω → X and Y : Ω → Y are measurable functions, and there exists a regular conditional distribution, Q, of Y given X. Show:

1. For all bounded measurable functions, f : (X × Y, M ⊗ N) → ℝ, the function X ∋ x ↦ Q(x, f(x, ·)) is measurable and

    Q(X, f(X, ·)) = E[f(X, Y)|X] a.s.    (18.16)

   Hint: let H denote the set of bounded measurable functions, f, on X × Y such that the two assertions are valid.
2. If A ∈ M ⊗ N and μ := P ∘ X⁻¹ is the law of X, then

    P((X, Y) ∈ A) = ∫_X Q(x, 1_A(x, ·)) dμ(x) = ∫_X dμ(x) ∫_Y 1_A(x, y) Q(x, dy).    (18.17)

Exercise 18.4. Keeping the same notation as in Exercise 18.3, further assume that X and Y are independent. Find a regular conditional distribution of Y given X and prove

    E[f(X, Y)|X] = h_f(X) a.s.

for all bounded measurable f : X × Y → ℝ, where h_f(x) := E[f(x, Y)] for all x ∈ X, i.e.

    E[f(X, Y)|X] = E[f(x, Y)]|_{x=X} a.s.

18.2 Additional Properties of Conditional Expectations

Theorem 18.15 (Extending E_G). If f : Ω → [0, ∞] is B-measurable, the function F := lim_{n→∞} E_G[f ∧ n] exists a.s. and is, up to sets of measure zero, uniquely determined as the G-measurable function, F : Ω → [0, ∞], satisfying

    E[f : A] = E[F : A] for all A ∈ G.    (18.18)

Hence it is consistent to denote F by E_G f. In addition we now have:

1. Properties 2., 5. (with 0 ≤ g ∈ G_b), and 6. of Theorem 18.4 still hold for all non-negative B-measurable functions.
2. Conditional Monotone Convergence (cMCT). Suppose that, almost surely, 0 ≤ f_n ≤ f_{n+1} for all n; then lim_{n→∞} E_G f_n = E_G[lim_{n→∞} f_n] a.s.
3. Conditional Fatou's Lemma (cFatou). Suppose again that 0 ≤ f_n a.s.; then

    E_G[lim inf_{n→∞} f_n] ≤ lim inf_{n→∞} E_G f_n a.s.    (18.19)

4. Conditional Dominated Convergence (cDCT). If f_n → f a.s. and |f_n| ≤ g ∈ L¹(Ω, B, P), then E_G f_n → E_G f a.s.

Remark 18.16. Regarding item 4. above: suppose that f_n → f in probability, |f_n| ≤ g_n ∈ L¹(Ω, B, P), g_n → g ∈ L¹(Ω, B, P) in probability, and Eg_n → Eg. Then, by the DCT in Corollary 11.8, we know that f_n → f in L¹(Ω, B, P). Since E_G is a contraction, it follows that E_G f_n → E_G f in L¹(Ω, B, P) and hence in probability.

Proof. Since f ∧ n ∈ L¹(Ω, B, P) and f ∧ n is increasing, F := lim_{n→∞} E_G[f ∧ n] exists a.s. Moreover, by two applications of the standard MCT, we have for any A ∈ G that

    E[F : A] = lim_{n→∞} E[E_G[f ∧ n] : A] = lim_{n→∞} E[f ∧ n : A] = E[f : A].

Thus Eq. (18.18) holds, and that it uniquely determines F follows from Lemma 18.2. If 0 ≤ f ≤ g, then

    E_G f = lim_{n→∞} E_G[f ∧ n] ≤ lim_{n→∞} E_G[g ∧ n] = E_G g a.s.,

and so E_G still preserves order.

Item 2. Suppose that, almost surely, 0 ≤ f_n ≤ f_{n+1} for all n; then E_G f_n is a.s. increasing in n. Hence, again by two applications of the MCT, for any A ∈ G, we have

    E[lim_{n→∞} E_G f_n : A] = lim_{n→∞} E[E_G f_n : A] = lim_{n→∞} E[f_n : A]
                             = E[lim_{n→∞} f_n : A] = E[E_G(lim_{n→∞} f_n) : A],

from which it follows that lim_{n→∞} E_G f_n = E_G[lim_{n→∞} f_n] a.s.

Item 1. We have already proved property 2. The other properties are also easily proved by simple limiting arguments. Indeed, if 0 ≤ g ∈ G_b and f ≥ 0, then by cMCT,

    E_G[gf] = lim_{n→∞} E_G[g(f ∧ n)] = lim_{n→∞} g E_G[f ∧ n] = g E_G f a.s.

Similarly, by cMCT,

    E_{G₀} E_{G₁} f = E_{G₀}[lim_{n→∞} E_{G₁}(f ∧ n)] = lim_{n→∞} E_{G₀} E_{G₁}(f ∧ n)
                    = lim_{n→∞} E_{G₀}(f ∧ n) = E_{G₀} f

and

    E_{G₁} E_{G₀} f = E_{G₁}[lim_{n→∞} E_{G₀}(f ∧ n)] = lim_{n→∞} E_{G₁} E_{G₀}(f ∧ n)
                    = lim_{n→∞} E_{G₀}(f ∧ n) = E_{G₀} f.

Item 3. For 0 ≤ f_n, let g_k := inf_{n≥k} f_n. Then g_k ≤ f_k for all k and g_k ↑ lim inf_{n→∞} f_n, and hence by cMCT and item 1.,

    E_G[lim inf_{n→∞} f_n] = lim_{k→∞} E_G g_k ≤ lim inf_{k→∞} E_G f_k a.s.

Item 4. As usual, it suffices to consider the real case. Let f_n → f a.s. and |f_n| ≤ g a.s. with g ∈ L¹(Ω, B, P). Then, following the proof of the dominated convergence theorem, we start with the fact that 0 ≤ g ± f_n a.s. for all n. Hence, by cFatou,

n

EG (g f ) = EG lim inf (g fn ) lim inf EG (g fn ) = EG g +


n

18.3 Regular Conditional Distributions


lim inf n EG (fn ) in + case lim supn EG (fn ) in case, Lemma 18.19. Suppose that (X, M) is a measurable space and F : X R R is a function such that; 1) F (, t) : X R is M/BR measurable for all t R, and 2) F (x, ) : R R is right continuous for all x X. Then F is MBR /BR measurable. Proof. For n N, the function,

where the above equations hold a.s. Cancelling EG g from both sides of the equation then implies lim sup EG (fn ) EG f lim inf EG (fn ) a.s.
n n

Fn (x, t) := Theorem 18.17 (Conditional Jensens inequality). Let (, B , P ) be a probability space, a < b , and : (a, b) R be a convex function. Assume f L1 (, B , P ; R) is a random variable satisfying, f (a, b) a.s. and (f ) L1 (, B , P ; R). Then (EG f ) EG [(f )] a.s. (18.20)
k=

F x, (k + 1) 2n 1(k2n ,(k+1)2n ] (t) ,

is M BR /BR measurable. Using the right continuity assumption, it follows that F (x, t) = limn Fn (x, t) for all (x, t) X R and therefore F is also M BR /BR measurable. Theorem 18.20. Suppose that (X, M) is a measurable space, X : X is a measurable function and Y : R is a random variable. Then there exits a probability kernel, Q, on X R such that E [f (Y ) |X ] = Q (X, f ) , P a.s., for all bounded measurable functions, f : R R. Proof. For each r Q, let qr : X [0, 1] be a measurable function such that E [1Y r |X ] = qr (X ) a.s. Let := P X 1 . Then using the basic properties of conditional expectation, qr qs a.s. for all r s, limr qr = 1 and limr qr = 0, a.s. Hence the set, X0 X where qr (x) qs (x) for all r s, limr qr (x) = 1, and limr qr (x) = 0 satises, (X0 ) = P (X X0 ) = 1. For t R, let F (x, t) := 1X0 (x) inf {qr (x) : r > t} + 1X\X0 (x) 1t0 . Then F (, t) : X R is measurable for each t R and F (x, ) is a distribution function on R for each x X. Hence an application of Lemma 18.19 shows F : X R [0, 1] is measurable. For each x X and B BR , let Q (x, B ) = F (x,) (B ) where F denotes the probability measure on R determined by a distribution function, F : R [0, 1] . We claim that Q is the desired probability kernel. To prove this, let H be the collection of bounded measurable functions, f : R R, such that X x Q (x, f ) R is measurable and E [f (Y ) |X ] = Q (X, f ) , P a.s. It is easily seen that H is a linear subspace which is closed under bounded convergence. We will nish the proof by showing that H contains the multiplicative class, M = 1(,t] : t R . Notice that Q x, 1(,t] = F (x, t) is measurable. Now let r Q and g : X R be a bounded measurable function, then

Proof. Let := Q (a, b) a countable dense subset of (a, b) . By Theorem 11.38 (also see Lemma 7.31) and Figure 7.2 when is C 1 ) (y ) (x) + (x)(y x) for all for all x, y (a, b) , where (x) is the left hand derivative of at x. Taking y = f and then taking conditional expectations imply, EG [(f )] EG (x) + (x)(f x) = (x) + (x)(EG f x) a.s. Since this is true for all x (a, b) (and hence all x in the countable set, ) we may conclude that EG [(f )] sup (x) + (x)(EG f x) a.s.
x

By Exercise 18.1, EG f (a, b) , and hence it follows from Corollary 11.39 that
x

sup (x) + (x)(EG f x) = (EG f ) a.s.

Combining the last two estimates proves Eq. (18.20). Corollary 18.18. The conditional expectation operator, EG maps Lp (, B , P ) into Lp (, B , P ) and the map remains a contraction for all 1 p . Proof. The case p = and p = 1 have already been covered in Theorem 18.4. So now suppose, 1 < p < , and apply Jensens inequality with (x) = p p p |x| to nd |EG f | EG |f | a.s. Taking expectations of this inequality gives the desired result.
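For instance, taking φ(x) = x² in Eq. (18.20) shows (E_G f)² ≤ E_G[f²] a.s., so that the quantity E_G[f²] − (E_G f)² (the conditional variance of f given G) is non-negative a.s., the conditional analogue of Var(f) ≥ 0.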


18.3 Regular Conditional Distributions

Lemma 18.19. Suppose that (X, M) is a measurable space and F : X × ℝ → ℝ is a function such that: 1) F(·, t) : X → ℝ is M/B_ℝ-measurable for all t ∈ ℝ, and 2) F(x, ·) : ℝ → ℝ is right continuous for all x ∈ X. Then F is M ⊗ B_ℝ/B_ℝ-measurable.

Proof. For n ∈ ℕ, the function

    F_n(x, t) := Σ_{k=−∞}^∞ F(x, (k + 1)2^{−n}) 1_{(k2^{−n}, (k+1)2^{−n}]}(t)

is M ⊗ B_ℝ/B_ℝ-measurable. Using the right continuity assumption, it follows that F(x, t) = lim_{n→∞} F_n(x, t) for all (x, t) ∈ X × ℝ, and therefore F is also M ⊗ B_ℝ/B_ℝ-measurable. ∎

Theorem 18.20. Suppose that (X, M) is a measurable space, X : Ω → X is a measurable function, and Y : Ω → ℝ is a random variable. Then there exists a probability kernel, Q, on X × ℝ such that E[f(Y)|X] = Q(X, f), P-a.s., for all bounded measurable functions, f : ℝ → ℝ.

Proof. For each r ∈ ℚ, let q_r : X → [0, 1] be a measurable function such that E[1_{Y≤r}|X] = q_r(X) a.s. Let μ := P ∘ X⁻¹. Then, using the basic properties of conditional expectation, q_r ≤ q_s μ-a.s. for all r ≤ s, lim_{r↑∞} q_r = 1 and lim_{r↓−∞} q_r = 0, μ-a.s. Hence the set X₀ ⊂ X, where q_r(x) ≤ q_s(x) for all r ≤ s, lim_{r↑∞} q_r(x) = 1, and lim_{r↓−∞} q_r(x) = 0, satisfies μ(X₀) = P(X ∈ X₀) = 1. For t ∈ ℝ, let

    F(x, t) := 1_{X₀}(x) inf{q_r(x) : r > t} + 1_{X\X₀}(x) 1_{t≥0}.

Then F(·, t) : X → ℝ is measurable for each t ∈ ℝ and F(x, ·) is a distribution function on ℝ for each x ∈ X. Hence an application of Lemma 18.19 shows F : X × ℝ → [0, 1] is measurable. For each x ∈ X and B ∈ B_ℝ, let Q(x, B) := μ_{F(x,·)}(B), where μ_F denotes the probability measure on ℝ determined by a distribution function, F : ℝ → [0, 1]. We claim that Q is the desired probability kernel. To prove this, let H be the collection of bounded measurable functions, f : ℝ → ℝ, such that X ∋ x ↦ Q(x, f) ∈ ℝ is measurable and E[f(Y)|X] = Q(X, f), P-a.s. It is easily seen that H is a linear subspace which is closed under bounded convergence. We will finish the proof by showing that H contains the multiplicative class, M = {1_{(−∞,t]} : t ∈ ℝ}. Notice that Q(x, 1_{(−∞,t]}) = F(x, t) is measurable. Now let r ∈ ℚ and g : X → ℝ be a bounded measurable function. Then

    E[1_{Y≤r} g(X)] = E[E[1_{Y≤r}|X] g(X)] = E[q_r(X) g(X)] = E[q_r(X) 1_{X₀}(X) g(X)].

For t ∈ ℝ, we may let r ↓ t in the above equality (use DCT) to learn

    E[1_{Y≤t} g(X)] = E[F(X, t) 1_{X₀}(X) g(X)] = E[F(X, t) g(X)].

Since g was arbitrary, we may conclude that Q(X, 1_{(−∞,t]}) = F(X, t) = E[1_{Y≤t}|X] a.s. This completes the proof. ∎

This result leads fairly immediately to the following far reaching generalization.

Theorem 18.21. Suppose that (X, M) is a measurable space and (Y, N) is a standard Borel space; see Appendix 18.4 below. Suppose that X : Ω → X and Y : Ω → Y are measurable functions. Then there exists a probability kernel, Q, on X × Y such that E[f(Y)|X] = Q(X, f), P-a.s., for all bounded measurable functions, f : Y → ℝ.

Proof. By definition of a standard Borel space, we may assume that Y ∈ B_ℝ and N = B_Y. In this case Y may also be viewed as a measurable map from Ω → ℝ such that Y(Ω) ⊂ Y. By Theorem 18.20, we may find a probability kernel, Q₀, on X × ℝ such that

    E[f(Y)|X] = Q₀(X, f), P-a.s.,    (18.21)

for all bounded measurable functions, f : ℝ → ℝ. Taking f = 1_Y in Eq. (18.21) shows

    1 = E[1_Y(Y)|X] = Q₀(X, Y) a.s.

Thus, if we let X₀ := {x ∈ X : Q₀(x, Y) = 1}, we know that P(X ∈ X₀) = 1. Let us now define

    Q(x, B) := 1_{X₀}(x) Q₀(x, B) + 1_{X\X₀}(x) δ_y(B) for (x, B) ∈ X × B_Y,

where y is an arbitrary but fixed point in Y. Then Q is a probability kernel on X × Y, and moreover, if B ∈ B_Y ⊂ B_ℝ, then

    Q(X, B) = 1_{X₀}(X) Q₀(X, B) = 1_{X₀}(X) E[1_B(Y)|X] = E[1_B(Y)|X] a.s.

This shows that Q is the desired regular conditional probability. ∎

Corollary 18.22. Suppose G is a sub-σ-algebra of B, (Y, N) is a standard Borel space, and Y : Ω → Y is a measurable function. Then there exists a probability kernel, Q, on (Ω, G) × (Y, N) such that E[f(Y)|G] = Q(·, f), P-a.s., for all bounded measurable functions, f : Y → ℝ.

Proof. This is a special case of Theorem 18.21 applied with (X, M) = (Ω, G) and X : Ω → Ω being the identity map, which is B/G-measurable. ∎

18.4 Appendix: Standard Borel Spaces

Definition 18.23. Two measurable spaces, (X, M) and (Y, N), are said to be isomorphic if there exists a bijective map, f : X → Y, such that f(M) = N and f⁻¹(N) = M, i.e. both f and f⁻¹ are measurable. In this case we say f is a measure theoretic isomorphism, and we will write X ≅ Y.

Definition 18.24. A measurable space, (X, M), is said to be a standard Borel space iff (X, M) ≅ (B, B_B) where B is a Borel subset of ((0, 1), B_{(0,1)}).

Definition 18.25 (Polish spaces). A Polish space is a separable topological space (X, τ) which admits a complete metric, ρ, such that τ = τ_ρ.

The main goal of this appendix is to prove that every Borel subset of a Polish space is a standard Borel space; see Corollary 18.35 below. Along the way we will show that a number of spaces, including [0, 1], (0, 1], [0, 1]^d, ℝ^d, Ω := {0, 1}^ℕ, and ℝ^ℕ, are all isomorphic to (0, 1). Moreover, we will also see that a countable product of standard Borel spaces is again a standard Borel space; see Corollary 18.32. (On first reading, you may wish to skip the rest of this section.)

Lemma 18.26. Suppose (X, M) and (Y, N) are measurable spaces such that X = ⊔_{n=1}^∞ X_n and Y = ⊔_{n=1}^∞ Y_n with X_n ∈ M and Y_n ∈ N. If (X_n, M_{X_n}) is isomorphic to (Y_n, N_{Y_n}) for all n, then X ≅ Y. Moreover, if (X_n, M_n) and (Y_n, N_n) are isomorphic measurable spaces, then (X := Π_{n=1}^∞ X_n, ⊗_{n=1}^∞ M_n) and (Y := Π_{n=1}^∞ Y_n, ⊗_{n=1}^∞ N_n) are isomorphic.

Proof. For each n ∈ ℕ, let f_n : X_n → Y_n be a measure theoretic isomorphism. Then define f : X → Y by f = f_n on X_n. Clearly, f : X → Y is a bijection, and if B ∈ N, then

    f⁻¹(B) = ∪_{n=1}^∞ f⁻¹(B ∩ Y_n) = ∪_{n=1}^∞ f_n⁻¹(B ∩ Y_n) ∈ M.

This shows f is measurable, and by similar considerations, f⁻¹ is measurable as well. Therefore, f : X → Y is the desired measure theoretic isomorphism. For the second assertion, let f_n : X_n → Y_n be a measure theoretic isomorphism for all n ∈ ℕ and then define f(x) = (f₁(x₁), f₂(x₂), ...) with x = (x₁, x₂, ...) ∈ X. Again it is clear that f is bijective and measurable, since

    f⁻¹(Π_{n=1}^∞ B_n) = Π_{n=1}^∞ f_n⁻¹(B_n) ∈ ⊗_{n=1}^∞ M_n

for all B_n ∈ N_n and n ∈ ℕ. Similar reasoning shows that f⁻¹ is measurable as well. ∎


Proposition 18.27. Let −∞ < a < b < ∞. The following measurable spaces, equipped with their Borel σ-algebras, are all isomorphic: (0, 1), [0, 1], (0, 1], [0, 1), (a, b), [a, b], (a, b], [a, b), ℝ, and (0, 1) ∪ Λ where Λ is a finite or countable subset of ℝ \ (0, 1).

Proof. It is easy to see that any bounded open, closed, or half open interval is isomorphic to any other such interval of the same type using an affine transformation. Let us now show (−1, 1) ≅ [−1, 1]. To prove this it suffices, by Lemma 18.26, to observe that

    (−1, 1) = {0} ∪ ⋃_{n=0}^∞ ( [2^{−n−1}, 2^{−n}) ∪ (−2^{−n}, −2^{−n−1}] )

and

    [−1, 1] = {0} ∪ ⋃_{n=0}^∞ ( (2^{−n−1}, 2^{−n}] ∪ [−2^{−n}, −2^{−n−1}) ),

and that corresponding half-open pieces of opposite orientation are isomorphic via an affine flip. Similarly, (0, 1) is isomorphic to (0, 1] because

    (0, 1) = ⋃_{n=0}^∞ [2^{−n−1}, 2^{−n}) and (0, 1] = ⋃_{n=0}^∞ (2^{−n−1}, 2^{−n}].

The assertion involving ℝ can be proved using the bijection tan : (−π/2, π/2) → ℝ.

If Λ = {1}, then by Lemma 18.26 and what we have already proved, (0, 1) ∪ {1} = (0, 1] ≅ (0, 1). Similarly, if Λ = {λ₁, ..., λ_N} is finite with N ≥ 2, we may use the decompositions

    (0, 1] = (0, 2^{−N+1}] ∪ ⋃_{n=1}^{N−1} (2^{−n}, 2^{−n+1}]

and

    (0, 1) ∪ Λ = ( (0, 2^{−N+1}) ∪ {λ_N} ) ∪ ⋃_{n=1}^{N−1} ( [2^{−n}, 2^{−n+1}) ∪ {λ_n} ).

Adjoining a single point to an open or half-open interval produces a set isomorphic to a half-open (or closed) interval, so by what has already been proved each piece in the second decomposition is isomorphic to the corresponding piece in the first, and it again follows from Lemma 18.26 that (0, 1) ∪ Λ ≅ (0, 1] ≅ (0, 1). Finally, if Λ is a countably infinite set, we can show (0, 1) ∪ Λ ≅ (0, 1) with the aid of the identities

    (0, 1) = ⋃_{n=1}^∞ (2^{−n}, 2^{−n+1}) ∪ {2^{−n} : n ∈ ℕ}

and

    (0, 1) ∪ Λ = ⋃_{n=1}^∞ (2^{−n}, 2^{−n+1}) ∪ ( {2^{−n} : n ∈ ℕ} ∪ Λ ),

since any two countably infinite subsets of ℝ, equipped with their (discrete) Borel σ-algebras, are isomorphic. ∎

Notation 18.28. Suppose (X, M) is a measurable space and A is a set. Let π_a : X^A → X denote the projection operator onto the a-th component of X^A (i.e. π_a(ω) = ω(a) for all a ∈ A), and let M^{⊗A} := σ(π_a : a ∈ A) be the product σ-algebra on X^A.

Lemma 18.29. If φ : A → B is a bijection of sets and (X, M) is a measurable space, then (X^A, M^{⊗A}) ≅ (X^B, M^{⊗B}).

Proof. The map f : X^B → X^A defined by f(ω) = ω ∘ φ for all ω ∈ X^B is a bijection with f⁻¹(α) = α ∘ φ⁻¹. If a ∈ A and ω ∈ X^B, we have

    π_a^{X^A}(f(ω)) = f(ω)(a) = ω(φ(a)) = π_{φ(a)}^{X^B}(ω),

where π_a^{X^A} and π_b^{X^B} are the projection operators on X^A and X^B respectively. Thus π_a^{X^A} ∘ f = π_{φ(a)}^{X^B} for all a ∈ A, which shows f is measurable. Similarly, π_b^{X^B} ∘ f⁻¹ = π_{φ⁻¹(b)}^{X^A}, showing f⁻¹ is measurable as well. ∎

Proposition 18.30. Let Ω := {0, 1}^ℕ, π_i : Ω → {0, 1} be projection onto the i-th component, and B := σ(π₁, π₂, ...) be the product σ-algebra on Ω. Then (Ω, B) ≅ ((0, 1), B_{(0,1)}).

Proof. We will begin by using a specific binary digit expansion of a point x ∈ [0, 1) to construct a map from [0, 1) → Ω. To this end, let r₁(x) := x, ε₁(x) := 1_{x≥2^{−1}} and r₂(x) := x − 2^{−1}ε₁(x) ∈ [0, 2^{−1}); then let ε₂ := 1_{r₂≥2^{−2}} and r₃ := r₂ − 2^{−2}ε₂ ∈ [0, 2^{−2}). Working inductively, we construct {ε_k(x), r_k(x)}_{k=1}^∞ such that ε_k(x) ∈ {0, 1} and

    r_{k+1}(x) = r_k(x) − 2^{−k}ε_k(x) = x − Σ_{j=1}^k 2^{−j}ε_j(x) ∈ [0, 2^{−k})    (18.22)

for all k. Let us now define g : [0, 1) → Ω by g(x) := (ε₁(x), ε₂(x), ...). Since each component function, π_j ∘ g = ε_j : [0, 1) → {0, 1}, is measurable, it follows that g is measurable. By construction,

    x = Σ_{j=1}^k 2^{−j}ε_j(x) + r_{k+1}(x),

and r_{k+1}(x) → 0 as k → ∞; therefore

    x = Σ_{j=1}^∞ 2^{−j}ε_j(x) and r_{k+1}(x) = Σ_{j=k+1}^∞ 2^{−j}ε_j(x).    (18.23)



Hence, if we define f : Ω → [0, 1] by f(ε) := Σ_{j=1}^∞ 2^{−j}ε_j, then f(g(x)) = x for all x ∈ [0, 1). This shows g is injective, f is surjective, and f is injective on the range of g. We now claim that Ω₀ := g([0, 1)), the range of g, consists of those ε ∈ Ω such that ε_i = 0 for infinitely many i. Indeed, if there exists a k ∈ ℕ such that ε_j(x) = 1 for all j > k, then (by Eq. (18.23)) r_{k+1}(x) = 2^{−k}, which would contradict Eq. (18.22). Hence g([0, 1)) ⊂ Ω₀. Conversely, if ε ∈ Ω₀ and x := f(ε) ∈ [0, 1), it is not hard to show inductively that ε_j(x) = ε_j for all j, i.e. g(x) = ε. For example, if ε₁ = 1, then x ≥ 2^{−1} and hence ε₁(x) = 1. Alternatively, if ε₁ = 0, then

    x = Σ_{j=2}^∞ 2^{−j}ε_j < Σ_{j=2}^∞ 2^{−j} = 2^{−1},

so that ε₁(x) = 0. Hence it follows that r₂(x) = Σ_{j=2}^∞ 2^{−j}ε_j, and by similar reasoning we learn r₂(x) ≥ 2^{−2} iff ε₂ = 1, i.e. ε₂(x) = 1 iff ε₂ = 1. The full induction argument is now left to the reader.

Since single point sets are in B and

    Λ := Ω \ Ω₀ = ⋃_{n=1}^∞ {ε ∈ Ω : ε_j = 1 for all j ≥ n}

is a countable set, it follows that Λ ∈ B and therefore Ω₀ = Ω \ Λ ∈ B. Hence we may now conclude that g : ([0, 1), B_{[0,1)}) → (Ω₀, B_{Ω₀}) is a measurable bijection with measurable inverse given by f|_{Ω₀}, i.e. ([0, 1), B_{[0,1)}) ≅ (Ω₀, B_{Ω₀}). An application of Lemma 18.26 and Proposition 18.27 now implies

    Ω = Ω₀ ∪ Λ ≅ [0, 1) ∪ ℕ ≅ [0, 1) ≅ (0, 1).

Corollary 18.31. The following spaces are all isomorphic to ((0, 1), B_{(0,1)}): (0, 1)^d and ℝ^d for any d ∈ ℕ, and [0, 1]^ℕ and ℝ^ℕ, where both of the latter spaces are equipped with their natural product σ-algebras.

Proof. In light of Lemma 18.26 and Proposition 18.27 we know that (0, 1)^d ≅ ℝ^d and (0, 1)^ℕ ≅ [0, 1]^ℕ ≅ ℝ^ℕ. So, using Proposition 18.30, it suffices to show (0, 1)^d ≅ Ω ≅ (0, 1)^ℕ, and to do this it suffices to show Ω^d ≅ Ω and Ω^ℕ ≅ Ω. To reduce the problem further, let us observe that Ω^d ≅ {0, 1}^{ℕ×{1,2,...,d}} and Ω^ℕ ≅ {0, 1}^{ℕ²}. For example, let g : Ω^ℕ → {0, 1}^{ℕ²} be defined by g(ω)(i, j) = ω(i)(j) for all ω ∈ Ω^ℕ = ({0, 1}^ℕ)^ℕ. Then g is a bijection, and since π_{(i,j)}^{{0,1}^{ℕ²}}(g(ω)) = π_j(π_i(ω)), it follows that g is measurable. The inverse, g⁻¹ : {0, 1}^{ℕ²} → Ω^ℕ, to g is given by g⁻¹(α)(i)(j) = α(i, j). To see this map is measurable, note that π_i^{Ω^ℕ} ∘ g⁻¹ : {0, 1}^{ℕ²} → Ω = {0, 1}^ℕ is given by (π_i^{Ω^ℕ} ∘ g⁻¹)(α) = g⁻¹(α)(i) = α(i, ·), and hence

    π_j ∘ π_i^{Ω^ℕ} ∘ g⁻¹(α) = α(i, j) = π_{(i,j)}^{{0,1}^{ℕ²}}(α),

which is measurable for all i, j ∈ ℕ. Hence π_i^{Ω^ℕ} ∘ g⁻¹ is measurable for all i ∈ ℕ, and hence g⁻¹ is measurable. This shows Ω^ℕ ≅ {0, 1}^{ℕ²}. The proof that Ω^d ≅ {0, 1}^{ℕ×{1,2,...,d}} is analogous. We may now complete the proof with a couple of applications of Lemma 18.29. Indeed, ℕ, ℕ×{1, 2, ..., d}, and ℕ² all have the same cardinality, and therefore

    {0, 1}^ℕ ≅ {0, 1}^{ℕ×{1,2,...,d}} ≅ {0, 1}^{ℕ²} ≅ Ω. ∎

Corollary 18.32. Suppose that (X_n, M_n), for n ∈ ℕ, are standard Borel spaces. Then X := Π_{n=1}^∞ X_n, equipped with the product σ-algebra M := ⊗_{n=1}^∞ M_n, is again a standard Borel space.

Proof. Let A_n ∈ B_{[0,1]} be Borel sets on [0, 1] such that there exists a measurable isomorphism, f_n : X_n → A_n. Then f : X → A := Π_{n=1}^∞ A_n defined by f(x₁, x₂, ...) = (f₁(x₁), f₂(x₂), ...) is easily seen to be a measure theoretic isomorphism when A is equipped with the product σ-algebra, ⊗_{n=1}^∞ B_{A_n}. So, according to Corollary 18.31, to finish the proof it suffices to show ⊗_{n=1}^∞ B_{A_n} = M̃_A, where M̃ := ⊗_{n=1}^∞ B_{[0,1]} is the product σ-algebra on [0, 1]^ℕ and M̃_A is the relative σ-algebra on A. The σ-algebra ⊗_{n=1}^∞ B_{A_n} is generated by sets of the form B := Π_{n=1}^∞ B_n where B_n ∈ B_{A_n} ⊂ B_{[0,1]}. On the other hand, M̃_A is generated by sets of the form A ∩ B̃ where B̃ := Π_{n=1}^∞ B̃_n with B̃_n ∈ B_{[0,1]}. Since

    A ∩ B̃ = Π_{n=1}^∞ (B̃_n ∩ A_n) = Π_{n=1}^∞ B_n,

where B_n = B̃_n ∩ A_n is the generic element in B_{A_n}, we see that ⊗_{n=1}^∞ B_{A_n} and M̃_A can both be generated by the same collections of sets, and we may conclude that ⊗_{n=1}^∞ B_{A_n} = M̃_A. ∎

Our next goal is to show that any Polish space with its Borel σ-algebra is a standard Borel space.

Notation 18.33. Let Q := [0, 1]^ℕ denote the (infinite dimensional) unit cube in ℝ^ℕ. For a, b ∈ Q let

    d(a, b) := Σ_{n=1}^∞ 2^{−n} |a_n − b_n| = Σ_{n=1}^∞ 2^{−n} |π_n(a) − π_n(b)|.    (18.24)


Exercise 18.5. Show d is a metric and that the Borel σ-algebra on (Q, d) is the same as the product σ-algebra.

Solution to Exercise (18.5). It is easily seen that d is a metric on Q which, by Eq. (18.24), is measurable relative to the product σ-algebra, M. Therefore, M contains all open balls and hence contains the Borel σ-algebra, B. Conversely, since |π_n(a) − π_n(b)| ≤ 2^n d(a, b), each of the projection operators, π_n : Q → [0, 1], is continuous. Therefore each π_n is B-measurable, and hence M = σ({π_n}_{n=1}^∞) ⊂ B.

Theorem 18.34. To every separable metric space (X, ρ), there exists a continuous injective map G : X → Q such that G : X → G(X) ⊂ Q is a homeomorphism. Moreover, if the metric, ρ, is also complete, then G(X) is a G_δ set, i.e. G(X) is the countable intersection of open subsets of (Q, d). In short, any separable metrizable space X is homeomorphic to a subset of (Q, d), and if X is a Polish space then X is homeomorphic to a G_δ subset of (Q, d).

Proof. (This proof follows that in Rogers and Williams [4, Theorem 82.5 on p. 106].) By replacing ρ by ρ/(1 + ρ) if necessary, we may assume that 0 ≤ ρ < 1. Let D = {a_n}_{n=1}^∞ be a countable dense subset of X and define

    G(x) := (ρ(x, a₁), ρ(x, a₂), ρ(x, a₃), ...) ∈ Q

and

    γ(x, y) := d(G(x), G(y)) = Σ_{n=1}^∞ 2^{−n} |ρ(x, a_n) − ρ(y, a_n)|

for x, y ∈ X. To prove the first assertion, we must show G is injective and that γ is a metric on X compatible with the topology determined by ρ. If G(x) = G(y), then ρ(x, a) = ρ(y, a) for all a ∈ D. Since D is a dense subset of X, we may choose α_k ∈ D such that

    0 = lim_{k→∞} ρ(x, α_k) = lim_{k→∞} ρ(y, α_k) = ρ(y, x),

and therefore x = y. A simple argument using the dominated convergence theorem shows y ↦ γ(x, y) is ρ-continuous, i.e. γ(x, y) is small if ρ(x, y) is small. Conversely,

    ρ(x, y) ≤ ρ(x, a_n) + ρ(y, a_n) = 2ρ(x, a_n) + (ρ(y, a_n) − ρ(x, a_n))
            ≤ 2ρ(x, a_n) + |ρ(x, a_n) − ρ(y, a_n)| ≤ 2ρ(x, a_n) + 2^n γ(x, y).

Hence, if ε > 0 is given, we may choose n so that 2ρ(x, a_n) < ε/2, and so if γ(x, y) < 2^{−(n+1)}ε, it will follow that ρ(x, y) < ε. This shows τ_γ = τ_ρ. Since G : (X, γ) → (Q, d) is isometric, G is a homeomorphism onto its image.

Now suppose that (X, ρ) is a complete metric space. Let S := G(X) and σ be the metric on S defined by σ(G(x), G(y)) := γ(x, y) for all x, y ∈ X. Then (S, σ) is a complete metric space (being the isometric image of a complete metric space), and by what we have just proved, τ_σ = τ_{d_S}. Consequently, if u ∈ S and ε > 0 is given, we may find δ(ε) > 0 such that diam_d(B_d(u, δ(ε))) < ε and diam_σ(B_d(u, δ(ε)) ∩ S) < ε, where diam_σ(A) := sup{σ(u, v) : u, v ∈ A} and diam_d(A) := sup{d(u, v) : u, v ∈ A}. Let S̄ denote the closure of S inside of (Q, d), and for each n ∈ ℕ let

    N_n := {N ⊂ Q : N is d-open and diam_d(N) ∨ diam_σ(N ∩ S) < 1/n}

and let U_n := ∪N_n, a d-open set. From the previous paragraph, it follows that S ⊂ U_n and therefore S ⊂ S̄ ∩ (∩_{n=1}^∞ U_n). Conversely, if u ∈ S̄ ∩ (∩_{n=1}^∞ U_n), then for each n ∈ ℕ there exists N_n ∈ N_n such that u ∈ N_n. Moreover, since N₁ ∩ ··· ∩ N_n is an open neighborhood of u ∈ S̄, there exists u_n ∈ N₁ ∩ ··· ∩ N_n ∩ S for each n ∈ ℕ. From the definition of N_n, we have lim_{n→∞} d(u, u_n) = 0 and σ(u_n, u_m) ≤ max(n⁻¹, m⁻¹) → 0 as m, n → ∞. Since (S, σ) is complete, it follows that {u_n}_{n=1}^∞ is convergent in (S, σ) to some element u₀ ∈ S. Since (S, d_S) has the same topology as (S, σ), it follows that d(u_n, u₀) → 0 as well, and thus that u = u₀ ∈ S. We have now shown S = S̄ ∩ (∩_{n=1}^∞ U_n). This completes the proof, because we may write S̄ = ∩_{n=1}^∞ S̄_{1/n} where S̄_{1/n} := {u ∈ Q : d(u, S̄) < 1/n}, and therefore S = (∩_{n=1}^∞ U_n) ∩ (∩_{n=1}^∞ S̄_{1/n}) is a G_δ set. ∎

Corollary 18.35. Every Polish space, X, with its Borel σ-algebra is a standard Borel space. Consequently, any Borel subset of X is also a standard Borel space.

Proof. Theorem 18.34 shows that X is homeomorphic to a measurable (in fact a G_δ) subset Q₀ of (Q, d), and hence X ≅ Q₀. Since Q is a standard Borel space, so is Q₀ and hence so is X. ∎

References

1. Patrick Billingsley, Probability and Measure, third ed., Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc., New York, 1995. MR 1324786 (95k:60001)
2. Richard Durrett, Probability: Theory and Examples, second ed., Duxbury Press, Belmont, CA, 1996. MR 1609153 (98m:60001)
3. Olav Kallenberg, Foundations of Modern Probability, second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2002. MR 1876169 (2002m:60002)
4. L. C. G. Rogers and David Williams, Diffusions, Markov Processes, and Martingales. Vol. 1: Foundations, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 2000. Reprint of the second (1994) edition. MR 2001g:60188
5. S. R. S. Varadhan, Probability Theory, Courant Lecture Notes in Mathematics, vol. 7, New York University Courant Institute of Mathematical Sciences, New York, 2001. MR 1852999 (2003a:60001)
