My Notes

Advanced Probability
Perla Sousi
October 13, 2013

Contents
1 Conditional expectation
1.1 Discrete case
1.2 Existence and uniqueness
1.3 Product measure and Fubini's theorem
1.4 Examples of conditional expectation
1.4.1 Gaussian case
1.4.2 Conditional density functions
2 Discrete-time martingales
2.1 Stopping times
2.2 Optional stopping
2.3 Gambler's ruin
2.4 Martingale convergence theorem
2.5 Doob's inequalities
2.6 $L^p$ convergence for $p > 1$
2.7 Uniformly integrable martingales
2.8 Backwards martingales
2.9 Applications of martingales
2.9.1 Martingale proof of the Radon-Nikodym theorem
University of Cambridge, Cambridge, UK; p.sousi@statslab.cam.ac.uk

3 Continuous-time random processes
3.1 Definitions
3.2 Martingale regularization theorem
3.3 Convergence and Doob's inequalities in continuous time
3.4 Kolmogorov's continuity criterion
4 Weak convergence
4.1 Definitions
4.2 Tightness
4.3 Characteristic functions
5 Large deviations
5.1 Introduction
5.2 Cramér's theorem
5.3 Examples
6 Brownian motion
6.1 History and definition
6.2 Wiener's theorem
6.3 Invariance properties
6.4 Strong Markov property
6.5 Reflection principle
6.6 Martingales for Brownian motion
6.7 Recurrence and transience
6.8 Brownian motion and the Dirichlet problem
6.9 Donsker's invariance principle
6.10 Zeros of Brownian motion
7 Poisson random measures
7.1 Construction and basic properties
7.2 Integrals with respect to a Poisson random measure
7.3 Poisson Brownian motions
1 Conditional expectation
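Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, i.e. $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ and $\mathbb{P}$ is a probability measure on $(\Omega, \mathcal{F})$.
Definition 1.1. $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ if it satisfies:
1. $\Omega \in \mathcal{F}$.
2. If $A \in \mathcal{F}$, then also the complement is in $\mathcal{F}$, i.e., $A^c \in \mathcal{F}$.
3. If $(A_n)_{n \geq 1}$ is a collection of sets in $\mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.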
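Definition 1.2. $\mathbb{P}$ is a probability measure on $(\Omega, \mathcal{F})$ if it satisfies:
1. $\mathbb{P} : \mathcal{F} \to [0, 1]$, i.e. it is a set function.
2. $\mathbb{P}(\Omega) = 1$ and $\mathbb{P}(\emptyset) = 0$.
3. If $(A_n)_{n \geq 1}$ is a collection of pairwise disjoint sets in $\mathcal{F}$, then $\mathbb{P}\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mathbb{P}(A_n)$.
Let $A, B \in \mathcal{F}$ be two events with $\mathbb{P}(B) > 0$. Then the conditional probability of $A$ given the event $B$ is defined by
$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$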
Definition 1.3. The Borel $\sigma$-algebra, $\mathcal{B}(\mathbb{R})$, is the $\sigma$-algebra generated by the open sets in $\mathbb{R}$, i.e., it is the intersection of all $\sigma$-algebras containing the open sets of $\mathbb{R}$. More formally, let $\mathcal{O}$ be the open sets of $\mathbb{R}$, then
$$\mathcal{B}(\mathbb{R}) = \bigcap \{\mathcal{E} : \mathcal{E} \text{ is a } \sigma\text{-algebra containing } \mathcal{O}\}.$$
Informally speaking, consider the open sets of $\mathbb{R}$, do all possible operations, i.e., unions, intersections, complements, and take the smallest $\sigma$-algebra that you get.
Definition 1.4. $X$ is a random variable, i.e., a measurable function with respect to $\mathcal{F}$, if $X : \Omega \to \mathbb{R}$ is a function with the property that for all open sets $V$ the inverse image $X^{-1}(V) \in \mathcal{F}$.
Remark 1.5. If $X$ is a random variable, then the collection of sets
$$\{B \subseteq \mathbb{R} : X^{-1}(B) \in \mathcal{F}\}$$
is a $\sigma$-algebra (check!) and hence it must contain $\mathcal{B}(\mathbb{R})$.
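Definition 1.6. For a collection $\mathcal{A}$ of subsets of $\Omega$ we write $\sigma(\mathcal{A})$ for the smallest $\sigma$-algebra that contains $\mathcal{A}$, i.e.
$$\sigma(\mathcal{A}) = \bigcap \{\mathcal{E} : \mathcal{E} \text{ is a } \sigma\text{-algebra containing } \mathcal{A}\}.$$
Let $(X_i)_{i \in I}$ be a collection of random variables. Then we define
$$\sigma(X_i : i \in I) = \sigma\left(\{\omega \in \Omega : X_i(\omega) \in B\} : i \in I, B \in \mathcal{B}\right),$$
i.e. this is the smallest $\sigma$-algebra that makes $(X_i)_{i \in I}$ measurable.
Let $A \in \mathcal{F}$. The indicator function $\mathbf{1}(A)$ is defined via
$$\mathbf{1}(A)(x) = \mathbf{1}(x \in A) = \begin{cases} 1, & \text{if } x \in A; \\ 0, & \text{otherwise.} \end{cases}$$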
Recall the definition of expectation. First for positive simple random variables, i.e., linear combinations of indicator random variables, we define
$$\mathbb{E}\left[\sum_{i=1}^{n} c_i \mathbf{1}(A_i)\right] := \sum_{i=1}^{n} c_i \mathbb{P}(A_i),$$
where $c_i$ are positive constants and $A_i$ are measurable events. Next, let $X$ be a non-negative random variable. Then $X$ is the increasing limit of positive simple variables. For example
$$X_n(\omega) = 2^{-n} \lfloor 2^n X(\omega) \rfloor \wedge n \uparrow X(\omega) \text{ as } n \to \infty.$$
So we define
$$\mathbb{E}[X] := \lim_{n \to \infty} \mathbb{E}[X_n].$$
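As a small numerical illustration of the dyadic approximation above, the following sketch evaluates $2^{-n} \lfloor 2^n x \rfloor \wedge n$ for a single non-negative value $x$. The function name `dyadic_approx` and the sample value 2.7 are illustrative assumptions, not part of the notes.

```python
import math

def dyadic_approx(x: float, n: int) -> float:
    """Return min(2**-n * floor(2**n * x), n) for a non-negative value x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return min(math.floor(2**n * x) / 2**n, n)

# Hypothetical sample value X(omega) = 2.7; the approximations increase towards it.
x = 2.7
approximations = [dyadic_approx(x, n) for n in range(1, 8)]
```

The list `approximations` is non-decreasing and stays below `x`, which mirrors the monotone limit used to define $\mathbb{E}[X]$.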
Finally, for a general random variable $X$, we can write $X = X^+ - X^-$, where $X^+ = \max(X, 0)$ and $X^- = \max(-X, 0)$ and we define
$$\mathbb{E}[X] := \mathbb{E}[X^+] - \mathbb{E}[X^-],$$
if at least one of $\mathbb{E}[X^+]$, $\mathbb{E}[X^-]$ is finite. We call the random variable $X$ integrable, if it satisfies $\mathbb{E}[|X|] < \infty$.
Let $X$ be a random variable with $\mathbb{E}[|X|] < \infty$. Let $A$ be an event in $\mathcal{F}$ with $\mathbb{P}(A) > 0$. Then the conditional expectation of $X$ given $A$ is defined by
$$\mathbb{E}[X \mid A] = \frac{\mathbb{E}[X \mathbf{1}(A)]}{\mathbb{P}(A)}.$$
Our goal is to extend the definition of conditional expectation to $\sigma$-algebras. So far we have defined it only for events, and it was a number. Now, the conditional expectation is going to be a random variable, measurable with respect to the $\sigma$-algebra on which we are conditioning.
1.1 Discrete case
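Let $X$ be integrable, i.e., $\mathbb{E}[|X|] < \infty$. Let's start with a $\sigma$-algebra which is generated by a countable family of disjoint events $(B_i)_{i \in I}$ with $\bigcup_i B_i = \Omega$, i.e., $\mathcal{G} = \sigma(B_i, i \in I)$. It is easy to check that $\mathcal{G} = \{\bigcup_{i \in J} B_i : J \subseteq I\}$.
The natural thing to do is to define a new random variable $X' = \mathbb{E}[X \mid \mathcal{G}]$ as follows
$$X' = \sum_{i \in I} \mathbb{E}[X \mid B_i] \mathbf{1}(B_i).$$
What does this mean? Let $\omega \in \Omega$. Then $X'(\omega) = \sum_{i \in I} \mathbb{E}[X \mid B_i] \mathbf{1}(\omega \in B_i)$. Note that we use the convention that $\mathbb{E}[X \mid B_i] = 0$ if $\mathbb{P}(B_i) = 0$.
It is very easy to check that
$$X' \text{ is } \mathcal{G}\text{-measurable} \tag{1.1}$$
and integrable, since
$$\mathbb{E}[|X'|] \leq \sum_{i \in I} \mathbb{E}[|X| \mathbf{1}(B_i)] = \mathbb{E}[|X|] < \infty.$$
Let $G \in \mathcal{G}$. Then it is straightforward to check that
$$\mathbb{E}[X \mathbf{1}(G)] = \mathbb{E}[X' \mathbf{1}(G)]. \tag{1.2}$$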
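The construction above can be checked by hand on a finite probability space. The sketch below is only an illustration under assumed data: a fair six-sided die, $X(\omega) = \omega^2$, and a hypothetical partition into three blocks. It builds $X'$ block by block and leaves the verification of (1.2) to exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical finite sample space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}      # X(w) = w^2
partition = [{1, 2}, {3, 4, 5}, {6}]         # the disjoint events B_i

def expectation(f, event=None):
    """E[f 1(event)]; event=None means the whole space."""
    return sum(f[w] * prob[w] for w in omega if event is None or w in event)

def conditional_expectation(f, blocks):
    """Return X' = sum_i E[f | B_i] 1(B_i) as a dict mapping outcome -> value."""
    result = {}
    for block in blocks:
        p_block = sum(prob[w] for w in block)
        # Convention from the text: E[f | B_i] = 0 when P(B_i) = 0.
        value = expectation(f, block) / p_block if p_block > 0 else Fraction(0)
        for w in block:
            result[w] = value
    return result

X_prime = conditional_expectation(X, partition)
```

Every set $G$ in the generated $\sigma$-algebra is a union of blocks, so comparing `expectation(X, G)` with `expectation(X_prime, G)` over all such unions checks (1.2) for this example.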
1.2 Existence and uniqueness
Before stating the existence and uniqueness theorem on conditional expectation, let us quickly recall the notion of an event happening almost surely (a.s.), the Monotone convergence theorem and $\mathcal{L}^p$ spaces.
Let $A \in \mathcal{F}$. We will say that $A$ happens a.s., if $\mathbb{P}(A) = 1$.
Theorem 1.7. [Monotone convergence theorem] Let $(X_n)_n$ be random variables such that $X_n \geq 0$ for all $n$ and $X_n \uparrow X$ as $n \to \infty$ a.s. Then
$$\mathbb{E}[X_n] \uparrow \mathbb{E}[X] \text{ as } n \to \infty.$$
Theorem 1.8. [Dominated convergence theorem] If $X_n \to X$ and $|X_n| \leq Y$ for all $n$ a.s., for some integrable random variable $Y$, then
$$\mathbb{E}[X_n] \to \mathbb{E}[X].$$
Let $p \in [1, \infty)$ and $f$ a measurable function in $(\Omega, \mathcal{F}, \mathbb{P})$. We define the norm
$$\|f\|_p = (\mathbb{E}[|f|^p])^{1/p}$$
and we denote by $\mathcal{L}^p = \mathcal{L}^p(\Omega, \mathcal{F}, \mathbb{P})$ the set of measurable functions $f$ with $\|f\|_p < \infty$. For $p = \infty$, we let
$$\|f\|_\infty = \inf\{\lambda : |f| \leq \lambda \text{ a.e.}\}$$
and $\mathcal{L}^\infty$ the set of measurable functions with $\|f\|_\infty < \infty$.
Formally, $\mathcal{L}^p$ is the collection of equivalence classes, where two functions are equivalent if they are equal almost everywhere (a.e.). In practice, we will represent an element of $\mathcal{L}^p$ by a function, but remember that equality in $\mathcal{L}^p$ means equality a.e.
Theorem 1.9. The space $(\mathcal{L}^2, \|\cdot\|_2)$ is a Hilbert space with $\langle f, g \rangle = \mathbb{E}[fg]$. If $\mathcal{H}$ is a closed subspace, then for all $f \in \mathcal{L}^2$, there exists a unique (in the sense of a.e.) $g \in \mathcal{H}$ such that $\|f - g\|_2 = \inf_{h \in \mathcal{H}} \|f - h\|_2$ and $\langle f - g, h \rangle = 0$ for all $h \in \mathcal{H}$.
Remark 1.10. We call $g$ the orthogonal projection of $f$ on $\mathcal{H}$.
Theorem 1.11. Let $X$ be an integrable random variable and let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra. Then there exists a random variable $Y$ such that:
(a) $Y$ is $\mathcal{G}$-measurable;
(b) $Y$ is integrable and $\mathbb{E}[X \mathbf{1}(A)] = \mathbb{E}[Y \mathbf{1}(A)]$ for all $A \in \mathcal{G}$.
Moreover, if $Y'$ also satisfies (a) and (b), then $Y = Y'$ a.s.
We call $Y$ (a version of) the conditional expectation of $X$ given $\mathcal{G}$ and write $Y = \mathbb{E}[X \mid \mathcal{G}]$ a.s. In the case $\mathcal{G} = \sigma(G)$ for some random variable $G$, we also write $Y = \mathbb{E}[X \mid G]$ a.s.
Remark 1.12. We could replace (b) in the statement of the theorem by requiring that for all bounded $\mathcal{G}$-measurable random variables $Z$ we have
$$\mathbb{E}[XZ] = \mathbb{E}[YZ].$$
Remark 1.13. In Section 1.4 we will show how to construct explicit versions of the conditional expectation in certain simple cases. In general, we have to live with the indirect approach provided by the theorem.
Proof of Theorem 1.11. (Uniqueness.) Suppose that both $Y$ and $Y'$ satisfy (a) and (b). Then, clearly the event $A = \{Y > Y'\} \in \mathcal{G}$ and by (b) we have
$$\mathbb{E}[(Y - Y') \mathbf{1}(A)] = \mathbb{E}[X \mathbf{1}(A)] - \mathbb{E}[X \mathbf{1}(A)] = 0,$$
hence we get that $Y \leq Y'$ a.s. Similarly we can get $Y \geq Y'$ a.s.
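(Existence.) We will prove existence in three steps.
1st step: Suppose that $X \in \mathcal{L}^2$. The space $\mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$ with inner product defined by $\langle U, V \rangle = \mathbb{E}[UV]$ is a Hilbert space by Theorem 1.9 and $\mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$ is a closed subspace. (Remember that $\mathcal{L}^2$ convergence implies convergence in probability and convergence in probability implies convergence a.s. along a subsequence (see for instance [2, A13.2]).)
Thus $\mathcal{L}^2(\mathcal{F}) = \mathcal{L}^2(\mathcal{G}) + \mathcal{L}^2(\mathcal{G})^\perp$, and hence, we can write $X$ as $X = Y + Z$, where $Y \in \mathcal{L}^2(\mathcal{G})$ and $Z \in \mathcal{L}^2(\mathcal{G})^\perp$. If we now set $Y = \mathbb{E}[X \mid \mathcal{G}]$, then (a) is clearly satisfied. Let $A \in \mathcal{G}$. Then
$$\mathbb{E}[X \mathbf{1}(A)] = \mathbb{E}[Y \mathbf{1}(A)] + \mathbb{E}[Z \mathbf{1}(A)] = \mathbb{E}[Y \mathbf{1}(A)],$$
since $\mathbb{E}[Z \mathbf{1}(A)] = 0$.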
Note that from the above definition of conditional expectation for random variables in $\mathcal{L}^2$, we get that
$$\text{if } X \geq 0, \text{ then } Y = \mathbb{E}[X \mid \mathcal{G}] \geq 0 \text{ a.s.,} \tag{1.3}$$
since note that $\{Y < 0\} \in \mathcal{G}$ and
$$\mathbb{E}[X \mathbf{1}(Y < 0)] = \mathbb{E}[Y \mathbf{1}(Y < 0)].$$
Notice that the left hand side is nonnegative, while the right hand side is non-positive, implying that $\mathbb{P}(Y < 0) = 0$.
2nd step: Suppose that $X \geq 0$. For each $n$ we define the random variables $X_n = X \wedge n \leq n$, and hence $X_n \in \mathcal{L}^2$. Thus from the first part of the existence proof we have that for each $n$ there exists a $\mathcal{G}$-measurable random variable $Y_n$ satisfying for all $A \in \mathcal{G}$
$$\mathbb{E}[Y_n \mathbf{1}(A)] = \mathbb{E}[(X \wedge n) \mathbf{1}(A)]. \tag{1.4}$$
Since the sequence $(X_n)_n$ is increasing, from (1.3) we get that almost surely $(Y_n)_n$ is increasing. If we now set $Y = \limsup_{n \to \infty} Y_n$, then clearly $Y$ is $\mathcal{G}$-measurable and almost surely $Y = \lim_{n \to \infty} Y_n$. By the monotone convergence theorem in (1.4) we get for all $A \in \mathcal{G}$
$$\mathbb{E}[Y \mathbf{1}(A)] = \mathbb{E}[X \mathbf{1}(A)], \tag{1.5}$$
since $X_n \uparrow X$, as $n \to \infty$.
In particular, if $\mathbb{E}[X]$ is finite, then $\mathbb{E}[Y]$ is also finite.
3rd step: Finally, for a general random variable $X \in \mathcal{L}^1$ (not necessarily positive) we can apply the above construction to $X^+ = \max(X, 0)$ and $X^- = \max(-X, 0)$ and then $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X^+ \mid \mathcal{G}] - \mathbb{E}[X^- \mid \mathcal{G}]$ satisfies (a) and (b).

Remark 1.14. Note that the 2nd step of the above proof gives that if $X \geq 0$, then there exists a $\mathcal{G}$-measurable random variable $Y$ such that
$$\text{for all } A \in \mathcal{G}, \quad \mathbb{E}[X \mathbf{1}(A)] = \mathbb{E}[Y \mathbf{1}(A)],$$
i.e., all the conditions of Theorem 1.11 are satisfied except for the integrability one.
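Definition 1.15. Sub-$\sigma$-algebras $\mathcal{G}_1, \mathcal{G}_2, \ldots$ of $\mathcal{F}$ are called independent, if whenever $G_i \in \mathcal{G}_i$ ($i \in \mathbb{N}$) and $i_1, \ldots, i_n$ are distinct, then
$$\mathbb{P}(G_{i_1} \cap \ldots \cap G_{i_n}) = \prod_{k=1}^{n} \mathbb{P}(G_{i_k}).$$
When we say that a random variable $X$ is independent of a $\sigma$-algebra $\mathcal{G}$, it means that $\sigma(X)$ is independent of $\mathcal{G}$.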
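The following properties are immediate consequences of Theorem 1.11 and its proof.
Proposition 1.16. Let $X, Y \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra. Then
1. $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]] = \mathbb{E}[X]$.
2. If $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}[X \mid \mathcal{G}] = X$ a.s.
3. If $X$ is independent of $\mathcal{G}$, then $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$ a.s.
4. If $X \geq 0$ a.s., then $\mathbb{E}[X \mid \mathcal{G}] \geq 0$ a.s.
5. For any $\alpha, \beta \in \mathbb{R}$ we have $\mathbb{E}[\alpha X + \beta Y \mid \mathcal{G}] = \alpha \mathbb{E}[X \mid \mathcal{G}] + \beta \mathbb{E}[Y \mid \mathcal{G}]$ a.s.
6. $|\mathbb{E}[X \mid \mathcal{G}]| \leq \mathbb{E}[|X| \mid \mathcal{G}]$ a.s.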
The basic convergence theorems for expectation have counterparts for conditional expectation. We first recall the theorems for expectation.
Theorem 1.17. [Fatou's lemma] If $X_n \geq 0$ for all $n$, then
$$\mathbb{E}[\liminf_n X_n] \leq \liminf_n \mathbb{E}[X_n].$$
Theorem 1.18. [Jensen's inequality] Let $X$ be an integrable random variable and let $\varphi : \mathbb{R} \to \mathbb{R}$ be a convex function. Then
$$\mathbb{E}[\varphi(X)] \geq \varphi(\mathbb{E}[X]).$$
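Proposition 1.19. Let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra.
1. Conditional monotone convergence theorem: If $(X_n)_{n \geq 0}$ is an increasing sequence of non-negative random variables with a.s. limit $X$, then
$$\mathbb{E}[X_n \mid \mathcal{G}] \uparrow \mathbb{E}[X \mid \mathcal{G}] \text{ as } n \to \infty, \text{ a.s.}$$
2. Conditional Fatou's lemma: If $X_n \geq 0$ for all $n$, then
$$\mathbb{E}\left[\liminf_n X_n \mid \mathcal{G}\right] \leq \liminf_n \mathbb{E}[X_n \mid \mathcal{G}] \text{ a.s.}$$
3. Conditional dominated convergence theorem: If $X_n \to X$ and $|X_n| \leq Y$ for all $n$ a.s., for some integrable random variable $Y$, then
$$\lim_n \mathbb{E}[X_n \mid \mathcal{G}] = \mathbb{E}[X \mid \mathcal{G}] \text{ a.s.}$$
4. Conditional Jensen's inequality: If $X$ is an integrable random variable and $\varphi : \mathbb{R} \to (-\infty, \infty]$ is a convex function such that either $\varphi(X)$ is integrable or $\varphi$ is non-negative, then
$$\mathbb{E}[\varphi(X) \mid \mathcal{G}] \geq \varphi(\mathbb{E}[X \mid \mathcal{G}]) \text{ a.s.}$$
In particular, for all $1 \leq p < \infty$
$$\|\mathbb{E}[X \mid \mathcal{G}]\|_p \leq \|X\|_p.$$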
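Proof. 1. Let $Y_n$ be a version of $\mathbb{E}[X_n \mid \mathcal{G}]$. Since $0 \leq X_n \uparrow X$ as $n \to \infty$, we have that almost surely $Y_n$ is an increasing sequence and $Y_n \geq 0$. Let $Y = \limsup_{n \to \infty} Y_n$. We want to show that $Y = \mathbb{E}[X \mid \mathcal{G}]$ a.s. Clearly $Y$ is $\mathcal{G}$-measurable, as the limsup of $\mathcal{G}$-measurable random variables. Also, by the monotone convergence theorem we have for all $A \in \mathcal{G}$
$$\mathbb{E}[X \mathbf{1}(A)] = \lim_{n \to \infty} \mathbb{E}[X_n \mathbf{1}(A)] = \lim_{n \to \infty} \mathbb{E}[Y_n \mathbf{1}(A)] = \mathbb{E}[Y \mathbf{1}(A)].$$
2. The sequence $\inf_{k \geq n} X_k$ is increasing in $n$ and $\lim_{n \to \infty} \inf_{k \geq n} X_k = \liminf_{n \to \infty} X_n$. Thus, by the conditional monotone convergence theorem we get
$$\lim_{n \to \infty} \mathbb{E}[\inf_{k \geq n} X_k \mid \mathcal{G}] = \mathbb{E}[\liminf_{n \to \infty} X_n \mid \mathcal{G}].$$
Clearly, $\mathbb{E}[\inf_{k \geq n} X_k \mid \mathcal{G}] \leq \inf_{k \geq n} \mathbb{E}[X_k \mid \mathcal{G}]$. Passing to the limit gives the desired inequality.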
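3. Since $X_n + Y$ and $Y - X_n$ are positive random variables for all $n$, applying conditional Fatou's lemma we get
$$\mathbb{E}[X + Y \mid \mathcal{G}] = \mathbb{E}[\liminf_n (X_n + Y) \mid \mathcal{G}] \leq \liminf_n \mathbb{E}[X_n + Y \mid \mathcal{G}]$$
and
$$\mathbb{E}[Y - X \mid \mathcal{G}] = \mathbb{E}[\liminf_n (Y - X_n) \mid \mathcal{G}] \leq \liminf_n \mathbb{E}[Y - X_n \mid \mathcal{G}].$$
Hence, we obtain that
$$\liminf_n \mathbb{E}[X_n \mid \mathcal{G}] \geq \mathbb{E}[X \mid \mathcal{G}] \text{ and } \limsup_n \mathbb{E}[X_n \mid \mathcal{G}] \leq \mathbb{E}[X \mid \mathcal{G}].$$
4. A convex function is the supremum of countably many affine functions (see for instance [2, 6.6]):
$$\varphi(x) = \sup_i (a_i x + b_i), \quad x \in \mathbb{R}.$$
So for all $i$ we have $\mathbb{E}[\varphi(X) \mid \mathcal{G}] \geq a_i \mathbb{E}[X \mid \mathcal{G}] + b_i$ a.s. Now using the fact that the supremum is over a countable set we get that
$$\mathbb{E}[\varphi(X) \mid \mathcal{G}] \geq \sup_i (a_i \mathbb{E}[X \mid \mathcal{G}] + b_i) = \varphi(\mathbb{E}[X \mid \mathcal{G}]) \text{ a.s.}$$
In particular, for $1 \leq p < \infty$,
$$\|\mathbb{E}[X \mid \mathcal{G}]\|_p^p = \mathbb{E}[|\mathbb{E}[X \mid \mathcal{G}]|^p] \leq \mathbb{E}[\mathbb{E}[|X|^p \mid \mathcal{G}]] = \mathbb{E}[|X|^p] = \|X\|_p^p.$$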
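Conditional expectation has the tower property:
Proposition 1.20. Let $\mathcal{H} \subseteq \mathcal{G}$ be $\sigma$-algebras and $X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$. Then
$$\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}] \text{ a.s.}$$
Proof. Clearly, $\mathbb{E}[X \mid \mathcal{H}]$ is $\mathcal{H}$-measurable and for all $A \in \mathcal{H}$ we have
$$\mathbb{E}[\mathbb{E}[X \mid \mathcal{H}] \mathbf{1}(A)] = \mathbb{E}[X \mathbf{1}(A)] = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mathbf{1}(A)],$$
since $A$ is also $\mathcal{G}$-measurable.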
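The tower property can be verified exactly on a small example. The sketch below assumes three fair coin flips, with $X$ the number of heads, $\mathcal{G}$ generated by the first two flips and $\mathcal{H}$ generated by the first flip; these choices and the helper `cond_exp` are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: three fair coin flips (1 = heads).
omega = list(product([0, 1], repeat=3))
prob = {w: Fraction(1, 8) for w in omega}
X = {w: Fraction(sum(w)) for w in omega}     # number of heads

def cond_exp(f, key):
    """E[f | sigma(key)], where key maps each outcome to the label of its partition block."""
    totals, masses = {}, {}
    for w in omega:
        label = key(w)
        totals[label] = totals.get(label, 0) + f[w] * prob[w]
        masses[label] = masses.get(label, 0) + prob[w]
    return {w: totals[key(w)] / masses[key(w)] for w in omega}

first_two = lambda w: w[:2]   # generates G
first_one = lambda w: w[:1]   # generates H, which is coarser than G

lhs = cond_exp(cond_exp(X, first_two), first_one)   # E[E[X | G] | H]
rhs = cond_exp(X, first_one)                        # E[X | H]
```

Both dictionaries agree outcome by outcome; for instance, given that the first flip is heads, each equals 2 (one head already, plus one expected head from the remaining two flips).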
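We can always take out what is known:
Proposition 1.21. Let $X \in \mathcal{L}^1$ and $\mathcal{G}$ a $\sigma$-algebra. If $Y$ is bounded and $\mathcal{G}$-measurable, then
$$\mathbb{E}[XY \mid \mathcal{G}] = Y \mathbb{E}[X \mid \mathcal{G}] \text{ a.s.}$$
Proof. Let $A \in \mathcal{G}$. Then by Remark 1.12 we have
$$\mathbb{E}[Y \mathbb{E}[X \mid \mathcal{G}] \mathbf{1}(A)] = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] (\mathbf{1}(A) Y)] = \mathbb{E}[YX \mathbf{1}(A)],$$
which implies that $\mathbb{E}[YX \mid \mathcal{G}] = Y \mathbb{E}[X \mid \mathcal{G}]$ a.s.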
Before stating the next proposition, we quickly recall the definition of a $\pi$-system and the uniqueness of extension theorem for probability measures agreeing on a $\pi$-system generating a $\sigma$-algebra.
Definition 1.22. Let $\mathcal{A}$ be a set of subsets of $\Omega$. We call $\mathcal{A}$ a $\pi$-system if for all $A, B \in \mathcal{A}$, the intersection $A \cap B \in \mathcal{A}$.
Theorem 1.23. [Uniqueness of extension] Let $\mu_1, \mu_2$ be two measures on $(E, \mathcal{E})$, where $\mathcal{E}$ is a $\sigma$-algebra on $E$. Suppose that $\mu_1 = \mu_2$ on a $\pi$-system $\mathcal{A}$ generating $\mathcal{E}$ and that $\mu_1(E) = \mu_2(E) < \infty$. Then $\mu_1 = \mu_2$ on $\mathcal{E}$.
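Proposition 1.24. Let $X$ be integrable and $\mathcal{G}, \mathcal{H} \subseteq \mathcal{F}$ be $\sigma$-algebras. If $\sigma(X, \mathcal{G})$ is independent of $\mathcal{H}$, then
$$\mathbb{E}[X \mid \sigma(\mathcal{G}, \mathcal{H})] = \mathbb{E}[X \mid \mathcal{G}] \text{ a.s.}$$
Proof. We can assume that $X \geq 0$. The general case will follow like in the proposition above.
Let $A \in \mathcal{G}$ and $B \in \mathcal{H}$. Then
$$\mathbb{E}[\mathbf{1}(A \cap B) \mathbb{E}[X \mid \sigma(\mathcal{G}, \mathcal{H})]] = \mathbb{E}[\mathbf{1}(A \cap B) X] = \mathbb{E}[X \mathbf{1}(A)] \mathbb{P}(B) = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mathbf{1}(A)] \mathbb{P}(B) = \mathbb{E}[\mathbf{1}(A \cap B) \mathbb{E}[X \mid \mathcal{G}]],$$
where we used the independence assumption in the second and last equality. Let $Y = \mathbb{E}[X \mid \sigma(\mathcal{G}, \mathcal{H})]$ a.s., then $Y \geq 0$ a.s. We can now define the measures
$$\mu(F) = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mathbf{1}(F)] \text{ and } \nu(F) = \mathbb{E}[Y \mathbf{1}(F)], \text{ for all } F \in \mathcal{F}.$$
Then we have that $\mu$ and $\nu$ agree on the $\pi$-system $\{A \cap B : A \in \mathcal{G}, B \in \mathcal{H}\}$ which generates $\sigma(\mathcal{G}, \mathcal{H})$. Also, by the integrability assumption, $\mu(\Omega) = \nu(\Omega) < \infty$. Hence, they agree everywhere on $\sigma(\mathcal{G}, \mathcal{H})$ and this finishes the proof.
Warning! If in the above proposition the independence assumption is weakened and we just assume that $\sigma(X)$ is independent of $\mathcal{H}$ and $\mathcal{G}$ is independent of $\mathcal{H}$, then the conclusion does not follow. See examples sheet!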
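1.3 Product measure and Fubini's theorem
A measure space $(E, \mathcal{E}, \mu)$ is called $\sigma$-finite, if there exists a collection of sets $(S_n)_{n \geq 0}$ in $\mathcal{E}$ such that $\bigcup_n S_n = E$ and $\mu(S_n) < \infty$ for all $n$.
Let $(E_1, \mathcal{E}_1, \mu_1)$ and $(E_2, \mathcal{E}_2, \mu_2)$ be two $\sigma$-finite measure spaces. The set
$$\mathcal{A} = \{A_1 \times A_2 : A_1 \in \mathcal{E}_1, A_2 \in \mathcal{E}_2\}$$
is a $\pi$-system of subsets of $E = E_1 \times E_2$. Define the product $\sigma$-algebra
$$\mathcal{E}_1 \otimes \mathcal{E}_2 = \sigma(\mathcal{A}).$$
Set $\mathcal{E} = \mathcal{E}_1 \otimes \mathcal{E}_2$.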
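Theorem 1.25. [Product measure] Let $(E_1, \mathcal{E}_1, \mu_1)$ and $(E_2, \mathcal{E}_2, \mu_2)$ be two $\sigma$-finite measure spaces. There exists a unique measure $\mu = \mu_1 \otimes \mu_2$ on $\mathcal{E}$ such that
$$\mu(A_1 \times A_2) = \mu_1(A_1) \mu_2(A_2)$$
for all $A_1 \in \mathcal{E}_1$ and $A_2 \in \mathcal{E}_2$.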
Theorem 1.26. [Fubini's theorem] Let $(E_1, \mathcal{E}_1, \mu_1)$ and $(E_2, \mathcal{E}_2, \mu_2)$ be two $\sigma$-finite measure spaces.
Let $f$ be $\mathcal{E}$-measurable and non-negative. Then
$$\mu(f) = \int_{E_1} \left( \int_{E_2} f(x_1, x_2) \, \mu_2(dx_2) \right) \mu_1(dx_1). \tag{1.6}$$
If $f$ is integrable, then
1. $x_2 \mapsto f(x_1, x_2)$ is $\mu_2$-integrable for $\mu_1$-almost all $x_1$,
2. $x_1 \mapsto \int_{E_2} f(x_1, x_2) \, \mu_2(dx_2)$ is $\mu_1$-integrable and formula (1.6) for $\mu(f)$ holds.
1.4 Examples of conditional expectation
Definition 1.27. A random vector $(X_1, \ldots, X_n) \in \mathbb{R}^n$ is called a Gaussian random vector iff for all $a_1, \ldots, a_n \in \mathbb{R}$ the random variable $\sum_{i=1}^{n} a_i X_i$ has a Gaussian distribution.
A real-valued process $(X_t, t \geq 0)$ is called a Gaussian process iff for every $t_1 < t_2 < \ldots < t_n$ the random vector $(X_{t_1}, \ldots, X_{t_n})$ is a Gaussian random vector.
1.4.1 Gaussian case
Let $(X, Y)$ be a Gaussian random vector in $\mathbb{R}^2$. Set $\mathcal{G} = \sigma(Y)$. In this example, we are going to compute $X' = \mathbb{E}[X \mid \mathcal{G}]$.
Since $X'$ must be $\mathcal{G}$-measurable and $\mathcal{G} = \sigma(Y)$, by [2, A3.2.] we have that $X' = f(Y)$, for some Borel function $f$. Let us try $X'$ of the form $X' = aY + b$, for $a, b \in \mathbb{R}$ that we will determine.
Since $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]] = \mathbb{E}[X]$, we must have that
$$a \mathbb{E}[Y] + b = \mathbb{E}[X]. \tag{1.7}$$
Also, we must have that
$$\mathbb{E}[(X - X') Y] = 0 \;\Rightarrow\; \mathrm{Cov}(X - X', Y) = 0 \;\Rightarrow\; \mathrm{Cov}(X, Y) = a \, \mathrm{var}(Y). \tag{1.8}$$
So, if $a$ satisfies (1.8), then
$$\mathrm{Cov}(X - X', Y) = 0$$
and since $(X - X', Y)$ is Gaussian, we get that $X - X'$ and $Y$ are independent. Hence, if $Z$ is $\sigma(Y)$-measurable, then using also (1.7) we get that
$$\mathbb{E}[(X - X') Z] = 0.$$
Therefore we proved that $\mathbb{E}[X \mid \mathcal{G}] = aY + b$, for $a, b$ satisfying (1.7) and (1.8).
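Equations (1.7) and (1.8) form a linear system for $a$ and $b$. A minimal sketch of solving it is given below, assuming $\mathrm{var}(Y) > 0$; the function name and the numerical moments are hypothetical and chosen only for illustration.

```python
def linear_conditional_expectation(mean_x, mean_y, cov_xy, var_y):
    """Solve (1.7) and (1.8) for (a, b), assuming var(Y) > 0."""
    if var_y <= 0:
        raise ValueError("var(Y) must be positive")
    a = cov_xy / var_y          # (1.8): Cov(X, Y) = a var(Y)
    b = mean_x - a * mean_y     # (1.7): a E[Y] + b = E[X]
    return a, b

# Hypothetical moments of a Gaussian vector (X, Y).
a, b = linear_conditional_expectation(mean_x=1.0, mean_y=2.0, cov_xy=0.5, var_y=2.0)
```

With these assumed moments the conditional expectation would read $\mathbb{E}[X \mid Y] = aY + b$ with the computed coefficients.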
1.4.2 Conditional density functions
Suppose that $X$ and $Y$ are random variables having a joint density function $f_{X,Y}(x, y)$ in $\mathbb{R}^2$. Let $h : \mathbb{R} \to \mathbb{R}$ be a Borel function such that $h(X)$ is integrable.
In this example we want to compute $\mathbb{E}[h(X) \mid Y] = \mathbb{E}[h(X) \mid \sigma(Y)]$.
The random variable $Y$ has a density function $f_Y$, given by
$$f_Y(y) = \int_{\mathbb{R}} f_{X,Y}(x, y) \, dx.$$
Let $g$ be bounded and measurable. Then we have that
$$\mathbb{E}[h(X) g(Y)] = \int_{\mathbb{R}} \int_{\mathbb{R}} h(x) g(y) f_{X,Y}(x, y) \, dx \, dy = \int_{\mathbb{R}} \left( \int_{\mathbb{R}} h(x) \frac{f_{X,Y}(x, y)}{f_Y(y)} \, dx \right) g(y) f_Y(y) \, dy,$$
where we agree that $0/0 = 0$. If we now set
$$\varphi(y) = \int_{\mathbb{R}} h(x) \frac{f_{X,Y}(x, y)}{f_Y(y)} \, dx, \quad \text{if } f_Y(y) > 0,$$
and $0$ otherwise, then we get that
$$\mathbb{E}[h(X) \mid Y] = \varphi(Y) \text{ a.s.}$$
We interpret this result by saying that
$$\mathbb{E}[h(X) \mid Y] = \int_{\mathbb{R}} h(x) \, \nu(Y, dx),$$
where $\nu(y, dx) = f_Y(y)^{-1} f_{X,Y}(x, y) \mathbf{1}(f_Y(y) > 0) \, dx = f_{X \mid Y}(x \mid y) \, dx$. The measure $\nu(y, dx)$ is called the conditional distribution of $X$ given $Y = y$, and $f_{X \mid Y}(x \mid y)$ is the conditional density function of $X$ given $Y = y$. Notice that this function of $x, y$ is defined only up to a zero-measure set.
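To see the conditional density formula in action, the sketch below evaluates the function $y \mapsto \int h(x) f_{X,Y}(x, y) / f_Y(y) \, dx$ numerically. The joint density $f_{X,Y}(x, y) = x + y$ on the unit square, the choice $h(x) = x$, and the midpoint-rule integrator are all assumptions made for illustration.

```python
def joint_density(x, y):
    """Hypothetical joint density f_{X,Y}(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def midpoint_integral(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (i + 0.5) * width) for i in range(steps))

def phi(y, h=lambda x: x):
    """Approximate E[h(X) | Y = y], with the convention 0/0 = 0."""
    f_y = midpoint_integral(lambda x: joint_density(x, y), 0.0, 1.0)
    if f_y <= 0.0:
        return 0.0
    numerator = midpoint_integral(lambda x: h(x) * joint_density(x, y), 0.0, 1.0)
    return numerator / f_y
```

For this assumed density one can integrate by hand: the marginal is $f_Y(y) = 1/2 + y$ and the numerator is $1/3 + y/2$, so `phi(0.5)` should be close to $7/12$.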
2 Discrete-time martingales
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(E, \mathcal{E})$ be a measurable space. (We will mostly consider $E = \mathbb{R}, \mathbb{R}^d, \mathbb{C}$. Unless otherwise indicated, it is to be understood from now on that $E = \mathbb{R}$.)
Let $X = (X_n)_{n \geq 0}$ be a sequence of random variables taking values in $E$. We call $X$ a stochastic process in $E$.
A filtration $(\mathcal{F}_n)_n$ is an increasing family of sub-$\sigma$-algebras of $\mathcal{F}$, i.e., $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, for all $n$. We can think of $\mathcal{F}_n$ as the information available to us at time $n$. Every process has a natural filtration $(\mathcal{F}_n^X)_n$, given by
$$\mathcal{F}_n^X = \sigma(X_k, k \leq n).$$
The process $X$ is called adapted to the filtration $(\mathcal{F}_n)_n$, if $X_n$ is $\mathcal{F}_n$-measurable for all $n$. Of course, every process is adapted to its natural filtration. We say that $X$ is integrable if $X_n$ is integrable for all $n$.
Definition 2.1. Let $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \geq 0}, \mathbb{P})$ be a filtered probability space. Let $X = (X_n)_{n \geq 0}$ be an adapted integrable process taking values in $\mathbb{R}$.
X is a martingale if $\mathbb{E}[X_n \mid \mathcal{F}_m] = X_m$ a.s., for all $n \geq m$.
X is a supermartingale if $\mathbb{E}[X_n \mid \mathcal{F}_m] \leq X_m$ a.s., for all $n \geq m$.
X is a submartingale if $\mathbb{E}[X_n \mid \mathcal{F}_m] \geq X_m$ a.s., for all $n \geq m$.
Note that every process which is a martingale (resp. super, sub) with respect to the given filtration is also a martingale (resp. super, sub) with respect to its natural filtration by the tower property of conditional expectation.
Example 2.2. Let $(\xi_i)_{i \geq 1}$ be a sequence of i.i.d. random variables with $\mathbb{E}[\xi_1] = 0$. Then it is easy to check that $X_n = \sum_{i=1}^{n} \xi_i$ is a martingale.
Example 2.3. Let $(\xi_i)_{i \geq 1}$ be a sequence of i.i.d. random variables with $\mathbb{E}[\xi_1] = 1$. Then the product $X_n = \prod_{i=1}^{n} \xi_i$ is a martingale.
2.1 Stopping times
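Definition 2.4. Let $(\Omega, \mathcal{F}, (\mathcal{F}_n)_n, \mathbb{P})$ be a filtered probability space. A stopping time $T$ is a random variable $T : \Omega \to \mathbb{Z}_+ \cup \{\infty\}$ such that $\{T \leq n\} \in \mathcal{F}_n$, for all $n$.
Equivalently, $T$ is a stopping time if $\{T = n\} \in \mathcal{F}_n$, for all $n$. Indeed,
$$\{T = n\} = \{T \leq n\} \setminus \{T \leq n - 1\} \in \mathcal{F}_n.$$
Conversely,
$$\{T \leq n\} = \bigcup_{k \leq n} \{T = k\} \in \mathcal{F}_n.$$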
Example 2.5. Constant times are trivial stopping times.
Let $(X_n)_{n \geq 0}$ be an adapted process taking values in $\mathbb{R}$. Let $A \in \mathcal{B}(\mathbb{R})$. The first entrance time to $A$ is
$$T_A = \inf\{n \geq 0 : X_n \in A\}$$
with the convention that $\inf(\emptyset) = \infty$, so that $T_A = \infty$, if $X$ never enters $A$. This is a stopping time, since
$$\{T_A \leq n\} = \bigcup_{k \leq n} \{X_k \in A\} \in \mathcal{F}_n.$$
The last exit time though, $T_A = \sup\{n \leq 10 : X_n \in A\}$, is not always a stopping time.
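The first entrance time is easy to compute along a single observed path. The sketch below is an illustration on a finite path, with the set $A$ passed as a predicate; the function name and the sample path are assumptions, not part of the notes.

```python
import math

def first_entrance_time(path, in_set):
    """Return inf{n >= 0 : path[n] in A}, or math.inf if the path never enters A."""
    for n, value in enumerate(path):
        if in_set(value):
            return n
    return math.inf

# Hypothetical path of a process and the set A = (-infinity, -2].
path = [0, 1, 0, -1, -2, -1]
t = first_entrance_time(path, lambda v: v <= -2)
```

Whether the returned time is at most $n$ depends only on `path[:n + 1]`, which is the computational counterpart of $\{T_A \leq n\} \in \mathcal{F}_n$. A last exit time, by contrast, would require scanning the whole path.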
As an immediate consequence of the definition, one gets:
Proposition 2.6. Let $S, T, (T_n)_n$ be stopping times on the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n), \mathbb{P})$. Then also $S \wedge T$, $S \vee T$, $\inf_n T_n$, $\sup_n T_n$, $\liminf_n T_n$, $\limsup_n T_n$ are stopping times.
Proof. Note that in discrete time everything follows straight from the definitions. But when one considers continuous time processes, then right continuity of the filtration is needed to ensure that the limits are indeed stopping times.
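Definition 2.7. Let $T$ be a stopping time on the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n), \mathbb{P})$. Define the $\sigma$-algebra $\mathcal{F}_T$ via
$$\mathcal{F}_T = \{A \in \mathcal{F} : A \cap \{T \leq t\} \in \mathcal{F}_t, \text{ for all } t\}.$$
Intuitively $\mathcal{F}_T$ is the information available at time $T$.
It is easy to check that if $T = t$, then $T$ is a stopping time and $\mathcal{F}_T = \mathcal{F}_t$.
For a process $X$, we set $X_T(\omega) = X_{T(\omega)}(\omega)$, whenever $T(\omega) < \infty$. We also define the stopped process $X^T$ by $X^T_t = X_{T \wedge t}$.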
Proposition 2.8. Let $S$ and $T$ be stopping times and let $X = (X_n)_{n \geq 0}$ be an adapted process. Then
1. if $S \leq T$, then $\mathcal{F}_S \subseteq \mathcal{F}_T$,
2. $X_T \mathbf{1}(T < \infty)$ is an $\mathcal{F}_T$-measurable random variable,
3. $X^T$ is adapted,
4. if $X$ is integrable, then $X^T$ is integrable.
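Proof. 1. Straightforward from the definition.
2. Let $A \in \mathcal{E}$. Then
$$\{X_T \mathbf{1}(T < \infty) \in A\} \cap \{T \leq t\} = \bigcup_{s=0}^{t} \{X_s \in A\} \cap \{T = s\} \in \mathcal{F}_t,$$
since $X$ is adapted and $\{T = s\} = \{T \leq s\} \setminus \bigcup_{u < s} \{T \leq u\} \in \mathcal{F}_s$.
3. For every $t$ we have that $X_{T \wedge t}$ is $\mathcal{F}_{T \wedge t}$-measurable, hence by (1) $\mathcal{F}_t$-measurable since $T \wedge t \leq t$.
4. We have
$$\mathbb{E}[|X_{T \wedge t}|] = \mathbb{E}\left[\sum_{s=0}^{t-1} |X_s| \mathbf{1}(T = s)\right] + \mathbb{E}\left[\sum_{s=t}^{\infty} |X_t| \mathbf{1}(T = s)\right] \leq \sum_{s=0}^{t} \mathbb{E}[|X_s|] < \infty.$$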
2.2 Optional stopping
Theorem 2.9. [Optional stopping] Let $X = (X_n)_{n \geq 0}$ be a martingale.
1. If $T$ is a stopping time, then $X^T$ is also a martingale, so in particular $\mathbb{E}[X_{T \wedge t}] = \mathbb{E}[X_0]$, for all $t$.
2. If $S \leq T$ are bounded stopping times, then $\mathbb{E}[X_T \mid \mathcal{F}_S] = X_S$ a.s.
3. If $S \leq T$ are bounded stopping times, then $\mathbb{E}[X_T] = \mathbb{E}[X_S]$.
4. If there exists an integrable random variable $Y$ such that $|X_n| \leq Y$ for all $n$, and $T$ is a stopping time which is finite a.s., then
$$\mathbb{E}[X_T] = \mathbb{E}[X_0].$$
5. If $X$ has bounded increments, i.e., $\exists M > 0 : \forall n \geq 0$, $|X_{n+1} - X_n| \leq M$ a.s., and $T$ is a stopping time with $\mathbb{E}[T] < \infty$, then
$$\mathbb{E}[X_T] = \mathbb{E}[X_0].$$
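Proof. 1. Notice that by the tower property of conditional expectation, it suffices to check that $\mathbb{E}[X_{T \wedge t} \mid \mathcal{F}_{t-1}] = X_{T \wedge (t-1)}$ a.s. We can write
$$\mathbb{E}[X_{T \wedge t} \mid \mathcal{F}_{t-1}] = \mathbb{E}\left[\sum_{s=0}^{t-1} X_s \mathbf{1}(T = s) \,\Big|\, \mathcal{F}_{t-1}\right] + \mathbb{E}[X_t \mathbf{1}(T > t - 1) \mid \mathcal{F}_{t-1}] = X_T \mathbf{1}(T \leq t - 1) + \mathbf{1}(T > t - 1) X_{t-1},$$
since $\{T > t - 1\} \in \mathcal{F}_{t-1}$ and $\mathbb{E}[X_t \mid \mathcal{F}_{t-1}] = X_{t-1}$ a.s. by the martingale property.
2. Suppose that $T \leq n$ a.s. Since $S \leq T$, we can write
$$X_T = (X_T - X_{T-1}) + (X_{T-1} - X_{T-2}) + \ldots + (X_{S+1} - X_S) + X_S = X_S + \sum_{k=0}^{n} (X_{k+1} - X_k) \mathbf{1}(S \leq k < T).$$
Let $A \in \mathcal{F}_S$. Then
$$\mathbb{E}[X_T \mathbf{1}(A)] = \mathbb{E}[X_S \mathbf{1}(A)] + \sum_{k=0}^{n} \mathbb{E}[(X_{k+1} - X_k) \mathbf{1}(S \leq k < T) \mathbf{1}(A)] = \mathbb{E}[X_S \mathbf{1}(A)],$$
since $\{S \leq k < T\} \cap A \in \mathcal{F}_k$, for all $k$ and $X$ is a martingale.
3. Taking expectations in 2 gives the equality in expectation.
4. See example sheet.
5. See example sheet.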
Remark 2.10. Note that Theorem 2.9 is true if $X$ is a super-martingale or sub-martingale with the respective inequalities in the statements.
Remark 2.11. Let $(\xi_k)_k$ be i.i.d. random variables taking values $\pm 1$ with probability $1/2$. Then $X_n = \sum_{k=1}^{n} \xi_k$, with $X_0 = 0$, is a martingale. Let $T = \inf\{n \geq 0 : X_n = 1\}$. Then $T$ is a stopping time and $\mathbb{P}(T < \infty) = 1$. However, although from Theorem 2.9 we have that $\mathbb{E}[X_{T \wedge t}] = \mathbb{E}[X_0]$, for all $t$, it holds that $1 = \mathbb{E}[X_T] \neq \mathbb{E}[X_0] = 0$.
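The identity $\mathbb{E}[X_{T \wedge t}] = \mathbb{E}[X_0]$ for the random walk stopped on first reaching level 1 can be confirmed by brute-force enumeration of all $2^t$ paths. The sketch below is only such a check for small $t$; the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def stopped_value(steps, level=1):
    """Value of X_{T ^ t}, where T = inf{n : X_n = level} and t = len(steps)."""
    position = 0
    for step in steps:
        if position == level:
            break                 # the walk is frozen once it has hit the level
        position += step
    return position

def expected_stopped_value(t, level=1):
    """Exact E[X_{T ^ t}] for the simple symmetric random walk started at 0."""
    paths = list(product((-1, 1), repeat=t))
    return sum(Fraction(stopped_value(p, level), len(paths)) for p in paths)
```

For every fixed $t$ the exact expectation is 0, even though the stopped value equals 1 on an event whose probability tends to 1 as $t$ grows; the mass lost on the remaining paths is compensated by increasingly negative values.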
For non-negative supermartingales, Fatou's lemma gives:
Proposition 2.12. Suppose that $X$ is a non-negative supermartingale. Then for any stopping time $T$ which is finite a.s. we have
$$\mathbb{E}[X_T] \leq \mathbb{E}[X_0].$$
2.3 Gambler's ruin
Let $(\xi_i)_{i \geq 1}$ be an i.i.d. sequence of random variables taking values $\pm 1$ with probabilities $\mathbb{P}(\xi_1 = +1) = \mathbb{P}(\xi_1 = -1) = 1/2$. Define $X_n = \sum_{i=1}^{n} \xi_i$, for $n \geq 1$, and $X_0 = 0$. This is called the simple symmetric random walk in $\mathbb{Z}$. For $c \in \mathbb{Z}$ we write
$$T_c = \inf\{n \geq 0 : X_n = c\},$$
i.e. $T_c$ is the first hitting time of the state $c$, and hence is a stopping time. Let $a, b > 0$. We will calculate the probability that the random walk hits $-a$ before $b$, i.e. $\mathbb{P}(T_{-a} < T_b)$.
As mentioned earlier in this section $X$ is a martingale. Also, $|X_{n+1} - X_n| \leq 1$ for all $n$. We now write $T = T_{-a} \wedge T_b$. We will first show that $\mathbb{E}[T] < \infty$.
It is easy to see that $T$ is bounded from above by the first time that there are $a + b$ consecutive $+1$'s. The probability that the first $\xi_1, \ldots, \xi_{a+b}$ are all equal to $+1$ is $2^{-(a+b)}$. If the first block of $a + b$ variables fail to be all $+1$'s, then we look at the next block of $a + b$, i.e. $\xi_{a+b+1}, \ldots, \xi_{2(a+b)}$. The probability that this block consists only of $+1$'s is again $2^{-(a+b)}$ and this event is independent of the previous one. Hence $T$ can be bounded from above by a geometric random variable of success probability $2^{-(a+b)}$ times $a + b$. Therefore we get
$$\mathbb{E}[T] \leq (a + b) 2^{a+b}.$$
We thus have a martingale with bounded increments and a stopping time with finite expectation. Hence, from the optional stopping theorem (5), we deduce that
$$\mathbb{E}[X_T] = \mathbb{E}[X_0] = 0.$$
We also have
$$\mathbb{E}[X_T] = -a \mathbb{P}(T_{-a} < T_b) + b \mathbb{P}(T_b < T_{-a}) \quad \text{and} \quad \mathbb{P}(T_{-a} < T_b) + \mathbb{P}(T_b < T_{-a}) = 1,$$
and hence we deduce that
$$\mathbb{P}(T_{-a} < T_b) = \frac{b}{a + b}.$$
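The value $b/(a+b)$ can be cross-checked without optional stopping by propagating the law of the walk step by step and accumulating the mass absorbed at $-a$. The sketch below does this exactly with rational numbers; the function name and the truncation at `n_steps` are assumptions of this illustration, and the result is a lower bound that converges to $\mathbb{P}(T_{-a} < T_b)$ as the number of steps grows.

```python
from fractions import Fraction

def ruin_probability(a, b, n_steps):
    """P(T_{-a} < T_b and T_{-a} <= n_steps) for the simple symmetric random walk from 0."""
    half = Fraction(1, 2)
    mass = {0: Fraction(1)}       # law of the walk restricted to paths not yet absorbed
    hit_minus_a = Fraction(0)
    for _ in range(n_steps):
        new_mass = {}
        for position, p in mass.items():
            for step in (-1, 1):
                target = position + step
                if target == -a:
                    hit_minus_a += half * p      # absorbed at -a
                elif target != b:                # mass reaching b is simply dropped
                    new_mass[target] = new_mass.get(target, 0) + half * p
        mass = new_mass
    return hit_minus_a
```

For example, with the hypothetical choice $a = 2$, $b = 3$, the value of `ruin_probability(2, 3, 200)` agrees with $3/5$ up to an error far below $10^{-9}$, since the probability of not being absorbed decays geometrically.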
2.4 Martingale convergence theorem
Theorem 2.13. [A.s. martingale convergence theorem] Let $X = (X_n)_n$ be a supermartingale which is bounded in $L^1$, i.e., $\sup_n \mathbb{E}[|X_n|] < \infty$. Then $X_n \to X_\infty$ a.s. as $n \to \infty$, for some $X_\infty \in L^1(\mathcal{F}_\infty)$, where $\mathcal{F}_\infty = \sigma(\mathcal{F}_n, n \geq 0)$.
Usually when we want to prove convergence of a sequence, we have an idea of what the limit should be. In the case of the martingale convergence theorem though, we do not know the limit. And, indeed in most cases, we just know the existence of the limit. In order to show the convergence in the theorem, we will employ a beautiful trick due to Doob, which counts the number of upcrossings of every interval with rational endpoints.
Corollary 2.14. Let $X = (X_n)_n$ be a non-negative supermartingale. Then $X$ converges a.s. towards an a.s. finite limit.
Proof. Since $X$ is non-negative we get that
$$\mathbb{E}[|X_n|] = \mathbb{E}[X_n] \leq \mathbb{E}[X_0] < \infty,$$
hence $X$ is bounded in $L^1$.
Let $x = (x_n)_n$ be a sequence of real numbers. Let $a < b$ be two real numbers. We define $T_0(x) = 0$ and inductively for $k \geq 0$
$$S_{k+1}(x) = \inf\{n \geq T_k(x) : x_n \leq a\} \quad \text{and} \quad T_{k+1}(x) = \inf\{n \geq S_{k+1}(x) : x_n \geq b\} \tag{2.1}$$
with the usual convention that $\inf \emptyset = \infty$.
We also define $N_n([a, b], x) = \sup\{k \geq 0 : T_k(x) \leq n\}$, i.e., the number of upcrossings of the interval $[a, b]$ by the sequence $x$ by time $n$. As $n \to \infty$ we have
$$N_n([a, b], x) \uparrow N([a, b], x) = \sup\{k \geq 0 : T_k(x) < \infty\},$$
i.e., the total number of upcrossings of the interval $[a, b]$.
Figure 1. Upcrossings. (The figure shows a path crossing the levels $a$ and $b$, with the times $S_1$, $T_1$, $S_2$ marked.)
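The times $S_k$ and $T_k$ from (2.1) can be tracked with a single pass over a finite sequence. The sketch below counts completed upcrossings of $[a, b]$, i.e. $N_n([a, b], x)$ for $x = (x_0, \ldots, x_n)$; the function name and the sample sequence are assumptions made for illustration.

```python
def count_upcrossings(x, a, b):
    """Number of completed upcrossings of [a, b] by the finite sequence x."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False                 # True once the current S_k has occurred
    for value in x:
        if not below and value <= a:
            below = True          # time S_{k+1}: the sequence is at or below a
        elif below and value >= b:
            count += 1            # time T_{k+1}: the sequence is at or above b
            below = False
    return count

# Hypothetical sequence with three upcrossings of [0, 2].
n_up = count_upcrossings([0, 2, 0, 3, 1, -1, 2], a=0, b=2)
```

Because $a < b$, a single value can never trigger both branches, so each increment of `count` corresponds to exactly one pair $(S_k, T_k)$.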
Before stating and proving Doob's upcrossing inequality, we give an easy lemma that will be used in the proof of Theorem 2.13.
Lemma 2.15. A sequence of real numbers $x = (x_n)_n$ converges in $\overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$ if and only if $N([a, b], x) < \infty$ for all rationals $a < b$.
Proof. Suppose that $x$ converges. Then if for some $a < b$ we had that $N([a, b], x) = \infty$, that would imply that $\liminf_n x_n \leq a < b \leq \limsup_n x_n$, which is a contradiction.
Next suppose that $x$ does not converge. Then $\liminf_n x_n < \limsup_n x_n$ and so taking $a < b$ rationals between these two numbers gives that $N([a, b], x) = \infty$.
Theorem 2.16. [Doob's upcrossing inequality] Let $X$ be a supermartingale and $a < b$ be two real numbers. Then for all $n \geq 0$
$$(b - a) \mathbb{E}[N_n([a, b], X)] \leq \mathbb{E}[(X_n - a)^-].$$
Proof. We will omit the dependence on $X$ from $T_k$ and $S_k$ and we will write $N = N_n([a, b], X)$ to simplify notation. By the definition of the times $(T_k)$ and $(S_k)$, it is clear that for all $k$
$$X_{T_k} - X_{S_k} \geq b - a. \tag{2.2}$$
We have
$$\sum_{k=1}^{n} (X_{T_k \wedge n} - X_{S_k \wedge n}) = \sum_{k=1}^{N} (X_{T_k} - X_{S_k}) + \sum_{k=N+1}^{n} (X_n - X_{S_k \wedge n}) \mathbf{1}(N < n) \tag{2.3}$$
$$= \sum_{k=1}^{N} (X_{T_k} - X_{S_k}) + (X_n - X_{S_{N+1}}) \mathbf{1}(S_{N+1} \leq n), \tag{2.4}$$
since the only term contributing in the second sum appearing on the right hand side of (2.3) is $N + 1$, by the definition of $N$. Indeed, if $S_{N+2} \leq n$, then that would imply that $T_{N+1} \leq n$, which would contradict the definition of $N$.
Using induction on $k$, it is easy to see that $(T_k)_k$ and $(S_k)_k$ are sequences of stopping times. Hence for all $n$, we have that $S_k \wedge n \leq T_k \wedge n$ are bounded stopping times and thus by the Optional stopping theorem, Theorem 2.9, we get that $\mathbb{E}[X_{S_k \wedge n}] \geq \mathbb{E}[X_{T_k \wedge n}]$, for all $k$.
Therefore, taking expectations in (2.3) and (2.4) and using (2.2) we get
$$0 \geq \mathbb{E}\left[\sum_{k=1}^{n} (X_{T_k \wedge n} - X_{S_k \wedge n})\right] \geq (b - a) \mathbb{E}[N] - \mathbb{E}[(X_n - a)^-],$$
since $(X_n - X_{S_{N+1}}) \mathbf{1}(S_{N+1} \leq n) \geq -(X_n - a)^-$. Rearranging gives the desired inequality.

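Proof of Theorem 2.13. Let $a < b \in \mathbb{Q}$. By Doob's upcrossing inequality, Theorem 2.16, we get that
$$\mathbb{E}[N_n([a, b], X)] \leq (b - a)^{-1} \mathbb{E}[(X_n - a)^-] \leq (b - a)^{-1} \mathbb{E}[|X_n| + |a|].$$
By monotone convergence theorem, since $N_n([a, b], X) \uparrow N([a, b], X)$ as $n \to \infty$, we get that
$$\mathbb{E}[N([a, b], X)] \leq (b - a)^{-1} \left(\sup_n \mathbb{E}[|X_n|] + |a|\right) < \infty,$$
by the assumption on $X$ being bounded in $L^1$. Therefore, we get that $N([a, b], X) < \infty$ a.s. for every $a < b \in \mathbb{Q}$. Hence,
$$\mathbb{P}\left(\bigcap_{a < b \in \mathbb{Q}} \{N([a, b], X) < \infty\}\right) = 1.$$
Writing $\Omega_0 = \bigcap_{a < b \in \mathbb{Q}} \{N([a, b], X) < \infty\}$, we have that $\mathbb{P}(\Omega_0) = 1$ and by Lemma 2.15 on $\Omega_0$ we have that $X$ converges to a possibly infinite limit $X_\infty$. So we can define
$$X_\infty = \begin{cases} \lim_n X_n, & \text{on } \Omega_0, \\ 0, & \text{on } \Omega \setminus \Omega_0. \end{cases}$$
Then $X_\infty$ is $\mathcal{F}_\infty$-measurable and by Fatou's lemma and the assumption on $X$ being bounded in $L^1$ we get
$$\mathbb{E}[|X_\infty|] = \mathbb{E}[\liminf_n |X_n|] \leq \liminf_n \mathbb{E}[|X_n|] < \infty.$$
Hence $X_\infty \in L^1$ as required.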
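2.5 Doob's inequalities
Theorem 2.17. [Doob's maximal inequality] Let $X = (X_n)_n$ be a non-negative submartingale. Writing $X_n^* = \sup_{0 \le k \le n} X_k$ we have
\[
\lambda\,\mathbb{P}(X_n^* \ge \lambda) \le \mathbb{E}[X_n 1(X_n^* \ge \lambda)] \le \mathbb{E}[X_n].
\]
Proof. Let $T = \inf\{k \ge 0 : X_k \ge \lambda\}$. Then $T \wedge n$ is a bounded stopping time, hence by the Optional stopping theorem, Theorem 2.9, we have
\[
\mathbb{E}[X_n] \ge \mathbb{E}[X_{T \wedge n}] = \mathbb{E}[X_T 1(T \le n)] + \mathbb{E}[X_n 1(T > n)] \ge \lambda\,\mathbb{P}(T \le n) + \mathbb{E}[X_n 1(T > n)].
\]
It is clear that $\{T \le n\} = \{X_n^* \ge \lambda\}$. Hence we get
\[
\lambda\,\mathbb{P}(X_n^* \ge \lambda) \le \mathbb{E}[X_n 1(T \le n)] = \mathbb{E}[X_n 1(X_n^* \ge \lambda)] \le \mathbb{E}[X_n].
\]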
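As a sanity check of Theorem 2.17, the following sketch evaluates the three quantities in the maximal inequality exactly for one concrete non-negative submartingale. The choice $X_k = |S_k|$, with $S$ a simple symmetric random walk started at 0, and the function name `doob_maximal_terms` are illustrative assumptions and are not part of the notes.

```python
from fractions import Fraction
from itertools import product


def doob_maximal_terms(n, lam):
    """Exact values of (lam * P(X*_n >= lam), E[X_n 1(X*_n >= lam)], E[X_n])
    for X_k = |S_k|, where S is a simple symmetric random walk with S_0 = 0."""
    weight = Fraction(1, 2 ** n)  # every +/-1 path of length n has probability 2^-n
    lhs = mid = rhs = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))  # X*_n = max_{k <= n} |S_k|
        x_n = abs(s)
        rhs += weight * x_n
        if running_max >= lam:
            lhs += weight * lam
            mid += weight * x_n
    return lhs, mid, rhs


for lam in (1, 2, 3, 4):
    lhs, mid, rhs = doob_maximal_terms(10, lam)
    assert lhs <= mid <= rhs  # the chain of inequalities in Theorem 2.17
    print(lam, lhs, mid, rhs)
```

Exact rational arithmetic is used so that the comparison is not affected by rounding; the enumeration over all $2^n$ paths is only feasible for small $n$.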
Theorem 2.18. [Doob's $L^p$ inequality] Let $X$ be a martingale or a non-negative submartingale. Then for all $p > 1$, letting $X_n^* = \sup_{k \le n} |X_k|$, we have
\[
\|X_n^*\|_p \le \frac{p}{p-1}\,\|X_n\|_p.
\]
Proof. If $X$ is a martingale, then by Jensen's inequality $|X|$ is a non-negative submartingale. So it suffices to consider the case where $X$ is a non-negative submartingale.
Fix $k < \infty$. We now have
\begin{align*}
\mathbb{E}[(X_n^* \wedge k)^p] &= \mathbb{E}\Big[\int_0^k p x^{p-1} 1(X_n^* \ge x)\,dx\Big] = \int_0^k p x^{p-1}\,\mathbb{P}(X_n^* \ge x)\,dx \\
&\le \int_0^k p x^{p-2}\,\mathbb{E}[X_n 1(X_n^* \ge x)]\,dx = \frac{p}{p-1}\,\mathbb{E}\big[X_n (X_n^* \wedge k)^{p-1}\big] \\
&\le \frac{p}{p-1}\,\|X_n\|_p\,\|X_n^* \wedge k\|_p^{p-1},
\end{align*}
where in the second and third equalities we used Fubini's theorem, for the first inequality we used Theorem 2.17 and for the last inequality we used Hölder's inequality. Rearranging, we get
\[
\|X_n^* \wedge k\|_p \le \frac{p}{p-1}\,\|X_n\|_p.
\]
Letting $k \to \infty$ and using monotone convergence completes the proof.
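2.6 $L^p$ convergence for $p > 1$
Theorem 2.19. Let $X$ be a martingale and $p > 1$, then the following statements are equivalent:
1. $X$ is bounded in $\mathcal{L}^p(\Omega, \mathcal{F}, \mathbb{P})$: $\sup_{n \ge 0} \|X_n\|_p < \infty$
2. $X$ converges a.s. and in $\mathcal{L}^p$ to a random variable $X_\infty$
3. There exists a random variable $Z \in \mathcal{L}^p(\Omega, \mathcal{F}, \mathbb{P})$ such that
\[
X_n = \mathbb{E}[Z \mid \mathcal{F}_n] \quad \text{a.s.}
\]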
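Proof. $1 \Rightarrow 2$ Suppose that $X$ is bounded in $\mathcal{L}^p$. Then by Jensen's inequality, $X$ is also bounded in $\mathcal{L}^1$. Hence by Theorem 2.13 we have that $X$ converges to a finite limit $X_\infty$ a.s. By Fatou's lemma we have
\[
\mathbb{E}[|X_\infty|^p] = \mathbb{E}[\liminf_n |X_n|^p] \le \liminf_n \mathbb{E}[|X_n|^p] \le \sup_{n \ge 0} \|X_n\|_p^p < \infty.
\]
By Doob's $\mathcal{L}^p$ inequality, Theorem 2.18, we have that
\[
\|X_n^*\|_p \le \frac{p}{p-1}\,\|X_n\|_p,
\]
where recall that $X_n^* = \sup_{k \le n} |X_k|$. If we now let $n \to \infty$, then by monotone convergence we get that
\[
\|X_\infty^*\|_p \le \frac{p}{p-1}\,\sup_{n \ge 0} \|X_n\|_p.
\]
Therefore
\[
|X_n - X_\infty| \le 2 X_\infty^* \in \mathcal{L}^p
\]
and the dominated convergence theorem gives that $X_n$ converges to $X_\infty$ in $\mathcal{L}^p$.
$2 \Rightarrow 3$ We set $Z = X_\infty$. Clearly $Z \in \mathcal{L}^p$. We will now show that $X_n = \mathbb{E}[Z \mid \mathcal{F}_n]$ a.s. If $m \ge n$, then by the martingale property we can write
\[
\|X_n - \mathbb{E}[X_\infty \mid \mathcal{F}_n]\|_p = \|\mathbb{E}[X_m - X_\infty \mid \mathcal{F}_n]\|_p \le \|X_m - X_\infty\|_p \to 0 \quad \text{as } m \to \infty. \tag{2.5}
\]
Hence $X_n = \mathbb{E}[X_\infty \mid \mathcal{F}_n]$ a.s.
$3 \Rightarrow 1$ This is immediate by the conditional Jensen's inequality.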
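Remark 2.20. A martingale of the form $\mathbb{E}[Z \mid \mathcal{F}_n]$ (it is a martingale by the tower property) with $Z \in \mathcal{L}^p$ is called a martingale closed in $\mathcal{L}^p$.
Corollary 2.21. Let $Z \in \mathcal{L}^p$ and $X_n = \mathbb{E}[Z \mid \mathcal{F}_n]$ a martingale closed in $\mathcal{L}^p$. If $\mathcal{F}_\infty = \sigma(\mathcal{F}_n, n \ge 0)$, then we have
\[
X_n \to X_\infty = \mathbb{E}[Z \mid \mathcal{F}_\infty] \quad \text{as } n \to \infty \text{ a.s. and in } \mathcal{L}^p.
\]
Proof. By the above theorem we have that $X_n \to X_\infty$ as $n \to \infty$ a.s. and in $\mathcal{L}^p$. It only remains to show that $X_\infty = \mathbb{E}[Z \mid \mathcal{F}_\infty]$ a.s. Clearly $X_\infty$ is $\mathcal{F}_\infty$-measurable. Let $A \in \bigcup_{n \ge 0} \mathcal{F}_n$. Then $A \in \mathcal{F}_N$ for some $N$ and
\[
\mathbb{E}[Z 1(A)] = \mathbb{E}[\mathbb{E}[Z \mid \mathcal{F}_\infty] 1(A)] = \mathbb{E}[X_N 1(A)] \to \mathbb{E}[X_\infty 1(A)] \quad \text{as } N \to \infty.
\]
So this shows that for all $A \in \bigcup_{n \ge 0} \mathcal{F}_n$ we have
\[
\mathbb{E}[X_\infty 1(A)] = \mathbb{E}[\mathbb{E}[Z \mid \mathcal{F}_\infty] 1(A)].
\]
But $\bigcup_{n \ge 0} \mathcal{F}_n$ is a $\pi$-system generating $\mathcal{F}_\infty$, and hence we get the equality for all $A \in \mathcal{F}_\infty$.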
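2.7 Uniformly integrable martingales
Definition 2.22. A collection $(X_i, i \in I)$ of random variables is called uniformly integrable (UI) if
\[
\sup_{i \in I} \mathbb{E}[|X_i| 1(|X_i| > \alpha)] \to 0 \quad \text{as } \alpha \to \infty.
\]
Equivalently, $(X_i)$ is UI if $(X_i)$ is bounded in $\mathcal{L}^1$ and
\[
\forall \varepsilon > 0,\ \exists \delta > 0 :\ \forall A \in \mathcal{F},\ \mathbb{P}(A) < \delta \ \Rightarrow\ \sup_{i \in I} \mathbb{E}[|X_i| 1(A)] < \varepsilon.
\]
Remember that a UI family is bounded in $\mathcal{L}^1$. The converse is not true.
If a family is bounded in $\mathcal{L}^p$, for some $p > 1$, then it is UI.
Theorem 2.23. Let $X \in \mathcal{L}^1$. Then the class
\[
\{\mathbb{E}[X \mid \mathcal{G}] : \mathcal{G} \text{ a sub-}\sigma\text{-algebra of } \mathcal{F}\}
\]
is uniformly integrable.
Proof. Since $X \in \mathcal{L}^1$, we have that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that whenever $\mathbb{P}(A) \le \delta$, then
\[
\mathbb{E}[|X| 1(A)] \le \varepsilon. \tag{2.6}
\]
We now choose $\lambda < \infty$ so that $\mathbb{E}[|X|] \le \lambda\delta$. For any sub-$\sigma$-algebra $\mathcal{G}$ we have
\[
\mathbb{E}[|\mathbb{E}[X \mid \mathcal{G}]|] \le \mathbb{E}[|X|].
\]
Writing $Y = \mathbb{E}[X \mid \mathcal{G}]$ we have by Markov's inequality $\mathbb{P}(|Y| \ge \lambda) \le \mathbb{E}[|X|]/\lambda \le \delta$. Finally from (2.6) and the fact that $\{|Y| \ge \lambda\} \in \mathcal{G}$ we have
\[
\mathbb{E}[|Y| 1(|Y| \ge \lambda)] \le \mathbb{E}[|X| 1(|Y| \ge \lambda)] \le \varepsilon.
\]
Lemma 2.24. Let $(X_n)_n, X \in \mathcal{L}^1$ and $X_n \to X$ as $n \to \infty$ a.s. Then
\[
X_n \xrightarrow{\mathcal{L}^1} X \text{ as } n \to \infty \quad \text{iff} \quad (X_n)_{n \ge 0} \text{ is UI.}
\]
Proof. See [2, Theorem 13.7].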
Definition 2.25. A martingale $(X_n)_{n \ge 0}$ is called a UI martingale if it is a martingale and the collection of random variables $(X_n)_{n \ge 0}$ is a UI family.
Theorem 2.26. Let $X$ be a martingale. The following statements are equivalent.
1. $X$ is a uniformly integrable martingale
2. $X_n$ converges a.s. and in $\mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ to a limit $X_\infty$
3. There exists $Z \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ so that $X_n = \mathbb{E}[Z \mid \mathcal{F}_n]$ a.s. for all $n \ge 0$.
Proof. $1 \Rightarrow 2$ Since $X$ is UI, it follows that it is bounded in $\mathcal{L}^1$, and hence from Theorem 2.13 we get that $X_n$ converges a.s. towards a finite limit $X_\infty$ as $n \to \infty$. Since $X$ is UI, [2, Theorem 13.7] gives the $\mathcal{L}^1$ convergence.
$2 \Rightarrow 3$ We set $Z = X_\infty$. Clearly $Z \in \mathcal{L}^1$. We will now show that $X_n = \mathbb{E}[Z \mid \mathcal{F}_n]$ a.s. For all $m \ge n$ by the martingale property we have
\[
\|X_n - \mathbb{E}[X_\infty \mid \mathcal{F}_n]\|_1 = \|\mathbb{E}[X_m - X_\infty \mid \mathcal{F}_n]\|_1 \le \|X_m - X_\infty\|_1 \to 0 \quad \text{as } m \to \infty.
\]
$3 \Rightarrow 1$ Notice that by the tower property of conditional expectation, $\mathbb{E}[Z \mid \mathcal{F}_n]$ is a martingale. The uniform integrability follows from Theorem 2.23.
Remark 2.27. As in Corollary 2.21, if $X$ is a UI martingale, then $\mathbb{E}[Z \mid \mathcal{F}_\infty] = X_\infty$, where $\mathcal{F}_\infty = \sigma(\mathcal{F}_n, n \ge 0)$.
Remark 2.28. If $X$ is a UI supermartingale (resp. submartingale), then $X_n$ converges a.s. and in $\mathcal{L}^1$ to a limit $X_\infty$, so that $\mathbb{E}[X_\infty \mid \mathcal{F}_n] \le X_n$ (resp. $\ge$) for every $n$.
Example 2.29. Let $(X_i)_i$ be i.i.d. random variables with $\mathbb{P}(X_1 = 0) = \mathbb{P}(X_1 = 2) = 1/2$. Then $Y_n = X_1 \cdots X_n$ is a martingale bounded in $\mathcal{L}^1$ and it converges to 0 as $n \to \infty$ a.s. But $\mathbb{E}[Y_n] = 1$ for all $n$, and hence it does not converge in $\mathcal{L}^1$.
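The computation behind Example 2.29 can be made explicit with a small exact enumeration. This is only an illustrative sketch; the helper name `law_of_product` is an assumption introduced here. It shows that $\mathbb{E}[Y_n] = 1$ for every $n$ while $\mathbb{P}(Y_n \ne 0) = 2^{-n} \to 0$, so $\mathbb{E}[|Y_n - 0|] = 1$ does not tend to 0.

```python
from fractions import Fraction
from itertools import product


def law_of_product(n):
    """Exact law of Y_n = X_1 * ... * X_n, with X_i i.i.d. uniform on {0, 2},
    returned as a dict {value: probability}."""
    law = {}
    for xs in product((0, 2), repeat=n):
        y = 1
        for x in xs:
            y *= x
        law[y] = law.get(y, Fraction(0)) + Fraction(1, 2 ** n)
    return law


for n in (1, 5, 10):
    law = law_of_product(n)
    mean = sum(y * p for y, p in law.items())        # E[Y_n]
    prob_nonzero = 1 - law.get(0, Fraction(0))       # P(Y_n != 0)
    print(n, mean, prob_nonzero)
```

All the mass of $Y_n$ away from 0 sits on the single value $2^n$, which is exactly the escape of mass that uniform integrability rules out.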
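If $X$ is a UI martingale and $T$ is a stopping time, which could also take the value $\infty$, then we can unambiguously define
\[
X_T = \sum_{n=0}^{\infty} X_n 1(T = n) + X_\infty 1(T = \infty).
\]
Theorem 2.30. [Optional stopping for UI martingales] Let $X$ be a UI martingale and let $S$ and $T$ be stopping times with $S \le T$. Then
\[
\mathbb{E}[X_T \mid \mathcal{F}_S] = X_S \quad \text{a.s.}
\]
Proof. We will first show that $\mathbb{E}[X_\infty \mid \mathcal{F}_T] = X_T$ a.s. for any stopping time $T$. We will now check that $X_T \in \mathcal{L}^1$. Since $|X_n| \le \mathbb{E}[|X_\infty| \mid \mathcal{F}_n]$, we have
\[
\mathbb{E}[|X_T|] = \sum_{n=0}^{\infty} \mathbb{E}[|X_n| 1(T = n)] + \mathbb{E}[|X_\infty| 1(T = \infty)] \le \sum_{n \in \mathbb{Z}_+ \cup \{\infty\}} \mathbb{E}[|X_\infty| 1(T = n)] = \mathbb{E}[|X_\infty|].
\]
Let $B \in \mathcal{F}_T$. Then
\[
\mathbb{E}[1(B) X_T] = \sum_{n \in \mathbb{Z}_+ \cup \{\infty\}} \mathbb{E}[1(B) 1(T = n) X_n] = \sum_{n \in \mathbb{Z}_+ \cup \{\infty\}} \mathbb{E}[1(B) 1(T = n) X_\infty] = \mathbb{E}[1(B) X_\infty],
\]
where for the second equality we used that $\mathbb{E}[X_\infty \mid \mathcal{F}_n] = X_n$ a.s. Also, clearly $X_T$ is $\mathcal{F}_T$-measurable, and hence
\[
\mathbb{E}[X_\infty \mid \mathcal{F}_T] = X_T \quad \text{a.s.}
\]
Now using the tower property of conditional expectation, we get for stopping times $S \le T$, since $\mathcal{F}_S \subseteq \mathcal{F}_T$,
\[
\mathbb{E}[X_T \mid \mathcal{F}_S] = \mathbb{E}[\mathbb{E}[X_\infty \mid \mathcal{F}_T] \mid \mathcal{F}_S] = \mathbb{E}[X_\infty \mid \mathcal{F}_S] = X_S \quad \text{a.s.}
\]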
2.8 Backwards martingales
Let $\ldots \subseteq \mathcal{G}_{-2} \subseteq \mathcal{G}_{-1} \subseteq \mathcal{G}_0$ be a sequence of sub-$\sigma$-algebras indexed by $\mathbb{Z}_-$. Given such a filtration, a process $(X_n, n \le 0)$ is called a backwards martingale if it is adapted to the filtration, $X_0 \in \mathcal{L}^1$ and for all $n \le -1$ we have
\[
\mathbb{E}[X_{n+1} \mid \mathcal{G}_n] = X_n \quad \text{a.s.}
\]
By the tower property of conditional expectation we get that for all $n \le 0$
\[
\mathbb{E}[X_0 \mid \mathcal{G}_n] = X_n \quad \text{a.s.} \tag{2.7}
\]
Since $X_0 \in \mathcal{L}^1$, from (2.7) and Theorem 2.23 we get that $X$ is uniformly integrable. This is a nice property that backwards martingales have: they are automatically UI.
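Theorem 2.31. Let $X$ be a backwards martingale, with $X_0 \in \mathcal{L}^p$ for some $p \in [1, \infty)$. Then $X_n$ converges a.s. and in $\mathcal{L}^p$ as $n \to -\infty$ to the random variable $X_{-\infty} = \mathbb{E}[X_0 \mid \mathcal{G}_{-\infty}]$, where $\mathcal{G}_{-\infty} = \bigcap_{n \le 0} \mathcal{G}_n$.
Proof. We will first adapt Doob's up-crossing inequality, Theorem 2.16, in this setting. Let $a < b$ be real numbers and $N_{-n}([a,b],X)$ be the number of up-crossings of the interval $[a,b]$ by $X$ between times $-n$ and $0$ as defined at the beginning of Section 2.4.
If we write $\mathcal{F}_k = \mathcal{G}_{-n+k}$, for $0 \le k \le n$, then $\mathcal{F}_k$ is an increasing filtration and the process $(X_{-n+k}, 0 \le k \le n)$ is an $\mathcal{F}$-martingale. Then $N_{-n}([a,b],X)$ is the number of up-crossings of the interval $[a,b]$ by $X_{-n+k}$ between times $0$ and $n$. Thus applying Doob's up-crossing inequality to $X_{-n+k}$ we get that
\[
(b-a)\,\mathbb{E}[N_{-n}([a,b],X)] \le \mathbb{E}[(X_0 - a)^-].
\]
Letting $n \to \infty$ we have that $N_{-n}([a,b],X)$ increases to the total number of up-crossings of $X$ from $a$ to $b$ and thus we deduce that
\[
X_m \to X_{-\infty} \quad \text{as } m \to -\infty \text{ a.s.,}
\]
for some random variable $X_{-\infty}$, which is $\mathcal{G}_{-\infty}$-measurable, since the $\sigma$-algebras $\mathcal{G}_{-n}$ are decreasing.
Since $X_0 \in \mathcal{L}^p$, it follows that $X_n \in \mathcal{L}^p$, for all $n \le 0$. Also, by Fatou's lemma, we get that $X_{-\infty} \in \mathcal{L}^p$. Now by conditional Jensen's inequality we obtain
\[
|X_n - X_{-\infty}|^p = |\mathbb{E}[X_0 - X_{-\infty} \mid \mathcal{G}_n]|^p \le \mathbb{E}[|X_0 - X_{-\infty}|^p \mid \mathcal{G}_n].
\]
But the latter family of random variables, $(\mathbb{E}[|X_0 - X_{-\infty}|^p \mid \mathcal{G}_n])_n$, is UI, by Theorem 2.23 again. Hence also $(|X_n - X_{-\infty}|^p)_n$ is UI, and thus by [2, Theorem 13.7], we conclude that
\[
X_n \to X_{-\infty} \quad \text{as } n \to -\infty \text{ in } \mathcal{L}^p.
\]
In order to show that $X_{-\infty} = \mathbb{E}[X_0 \mid \mathcal{G}_{-\infty}]$ a.s., it only remains to show that if $A \in \mathcal{G}_{-\infty}$, then
\[
\mathbb{E}[X_0 1(A)] = \mathbb{E}[X_{-\infty} 1(A)].
\]
Since $A \in \mathcal{G}_n$, for all $n \le 0$, we have by the martingale property that
\[
\mathbb{E}[X_0 1(A)] = \mathbb{E}[X_n 1(A)].
\]
Letting $n \to -\infty$ in the above equality and using the $\mathcal{L}^1$ convergence of $X_n$ to $X_{-\infty}$ finishes the proof.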
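2.9 Applications of martingales
Theorem 2.32. [Kolmogorov's 0-1 law] Let $(X_i)_{i \ge 1}$ be a sequence of i.i.d. random variables. Let $\mathcal{F}_n = \sigma(X_k, k \ge n)$ and $\mathcal{F}_\infty = \bigcap_{n \ge 0} \mathcal{F}_n$. Then $\mathcal{F}_\infty$ is trivial, i.e. every $A \in \mathcal{F}_\infty$ has probability $\mathbb{P}(A) \in \{0, 1\}$.
Proof. Let $\mathcal{G}_n = \sigma(X_k, k \le n)$ and $A \in \mathcal{F}_\infty$. Since $\mathcal{G}_n$ is independent of $\mathcal{F}_{n+1}$, we have that
\[
\mathbb{E}[1(A) \mid \mathcal{G}_n] = \mathbb{P}(A) \quad \text{a.s.}
\]
Theorem 2.26 gives that $\mathbb{E}[1(A) \mid \mathcal{G}_n]$ converges to $\mathbb{E}[1(A) \mid \mathcal{G}_\infty]$ a.s. as $n \to \infty$, where $\mathcal{G}_\infty = \sigma(\mathcal{G}_n, n \ge 0)$. Hence we deduce that
\[
\mathbb{E}[1(A) \mid \mathcal{G}_\infty] = 1(A) = \mathbb{P}(A) \quad \text{a.s.,}
\]
since $\mathcal{F}_\infty \subseteq \mathcal{G}_\infty$. Therefore $\mathbb{P}(A) \in \{0, 1\}$.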
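Theorem 2.33. [Strong law of large numbers] Let $(X_i)_{i \ge 1}$ be a sequence of i.i.d. random variables in $\mathcal{L}^1$ with $\mu = \mathbb{E}[X_1]$. Let $S_n = X_1 + \ldots + X_n$, for $n \ge 1$ and $S_0 = 0$. Then $S_n/n \to \mu$ as $n \to \infty$ a.s. and in $\mathcal{L}^1$.
Proof. Let $\mathcal{G}_n = \sigma(S_n, S_{n+1}, \ldots) = \sigma(S_n, X_{n+1}, \ldots)$. We will now show that $(M_n)_{n \le -1} = (S_{-n}/(-n))_{n \le -1}$ is a $(\mathcal{F}_n)_{n \le -1} = (\mathcal{G}_{-n})_{n \le -1}$ backwards martingale. We have for $m \le -2$
\[
\mathbb{E}[M_{m+1} \mid \mathcal{F}_m] = \mathbb{E}\Big[\frac{S_{-m-1}}{-m-1} \,\Big|\, \mathcal{G}_{-m}\Big]. \tag{2.8}
\]
Setting $n = -m$, since $X_n$ is independent of $X_{n+1}, X_{n+2}, \ldots$, we obtain
\[
\mathbb{E}\Big[\frac{S_{n-1}}{n-1} \,\Big|\, \mathcal{G}_n\Big] = \mathbb{E}\Big[\frac{S_n - X_n}{n-1} \,\Big|\, \mathcal{G}_n\Big] = \frac{S_n}{n-1} - \mathbb{E}\Big[\frac{X_n}{n-1} \,\Big|\, S_n\Big]. \tag{2.9}
\]
By symmetry, notice that $\mathbb{E}[X_k \mid S_n] = \mathbb{E}[X_1 \mid S_n]$ for all $k \le n$. Indeed, for any $A \in \mathcal{B}(\mathbb{R})$ we have that $\mathbb{E}[X_k 1(S_n \in A)]$ does not depend on $k \le n$. Clearly
\[
\mathbb{E}[X_1 \mid S_n] + \ldots + \mathbb{E}[X_n \mid S_n] = \mathbb{E}[S_n \mid S_n] = S_n,
\]
and hence $\mathbb{E}[X_n \mid S_n] = S_n/n$ a.s. Finally putting everything together we get
\[
\mathbb{E}\Big[\frac{S_{n-1}}{n-1} \,\Big|\, \mathcal{G}_n\Big] = \frac{S_n}{n-1} - \frac{S_n}{n(n-1)} = \frac{S_n}{n} \quad \text{a.s.}
\]
Thus, by the backwards martingale convergence theorem, we deduce that $S_n/n$ converges as $n \to \infty$ a.s. and in $\mathcal{L}^1$ to a random variable, say $Y = \lim S_n/n$. Obviously for all $k$
\[
Y = \lim_{n \to \infty} \frac{X_{k+1} + \ldots + X_{k+n}}{n},
\]
and hence $Y$ is $\mathcal{T}_k = \sigma(X_{k+1}, \ldots)$-measurable, for all $k$, hence it is $\bigcap_k \mathcal{T}_k$-measurable. By Kolmogorov's 0-1 law, Theorem 2.32, we conclude that there exists a constant $c \in \mathbb{R}$ such that $\mathbb{P}(Y = c) = 1$. But
\[
c = \mathbb{E}[Y] = \lim \mathbb{E}[S_n/n] = \mu.
\]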
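The statement of Theorem 2.33 can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of the proof: it takes the $X_i$ to be uniform on $[0,1)$ (so $\mu = 1/2$), uses a fixed seed for reproducibility, and the helper name `sample_mean` is hypothetical.

```python
import random


def sample_mean(n, seed=0):
    """Return S_n / n for n i.i.d. Uniform[0, 1) samples drawn with a fixed seed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.random()
    return total / n


for n in (10, 1_000, 100_000):
    # The printed values should approach mu = 0.5 as n grows.
    print(n, sample_mean(n))
```

A single simulated path can only suggest the almost sure convergence; it says nothing about the rate, which the theorem does not address either.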
Theorem 2.34. [Kakutani's product martingale theorem] Let $(X_n)_{n \ge 0}$ be a sequence of independent non-negative random variables of mean 1. We set
\[
M_0 = 1 \quad \text{and} \quad M_n = X_1 X_2 \cdots X_n, \quad n \in \mathbb{N}.
\]
Then $(M_n)_{n \ge 0}$ is a non-negative martingale and $M_n \to M_\infty$ a.s. as $n \to \infty$ for some random variable $M_\infty$. We set $a_n = \mathbb{E}[\sqrt{X_n}]$, then $a_n \in (0, 1]$. Moreover,
1. if $\prod_n a_n > 0$, then $M_n \to M_\infty$ in $\mathcal{L}^1$ and $\mathbb{E}[M_\infty] = 1$,
2. if $\prod_n a_n = 0$, then $M_\infty = 0$ a.s.
Proof. Clearly $(M_n)_n$ is a positive martingale and $\mathbb{E}[M_n] = 1$, for all $n$, since the random variables $(X_i)$ are independent and of mean 1. Hence, by the a.s. martingale convergence theorem, we get that $M_n$ converges a.s. as $n \to \infty$ to a finite random variable $M_\infty$. By Cauchy-Schwarz $a_n \le 1$ for all $n$.
We now define
\[
N_n = \frac{\sqrt{X_1 \cdots X_n}}{a_1 \cdots a_n}, \quad \text{for } n \ge 1.
\]
Then $N_n$ is a non-negative martingale that is bounded in $\mathcal{L}^1$, and hence converges a.s. towards a finite limit $N_\infty$ as $n \to \infty$.
1. We have
\[
\sup_{n \ge 0} \mathbb{E}[N_n^2] = \sup_{n \ge 0} \frac{1}{(\prod_{i=1}^n a_i)^2} = \frac{1}{(\prod_n a_n)^2} < \infty, \tag{2.10}
\]
under the assumption that $\prod_n a_n > 0$. Since $M_n = N_n^2 (\prod_{i=1}^n a_i)^2 \le N_n^2$ for all $n$, we get
\[
\mathbb{E}[\sup_{k \le n} M_k] \le \mathbb{E}[\sup_{k \le n} N_k^2] \le 4\,\mathbb{E}[N_n^2],
\]
where the last inequality follows by Doob's $\mathcal{L}^2$-inequality, Theorem 2.18. Hence by monotone convergence and (2.10) we deduce
\[
\mathbb{E}[\sup_n M_n] < \infty,
\]
and since $M_n \le \sup_n M_n$ we conclude that $M_n$ is UI, and hence it also converges in $\mathcal{L}^1$ towards $M_\infty$. Finally since $\mathbb{E}[M_n] = 1$ for all $n$, it follows that $\mathbb{E}[M_\infty] = 1$.
2. We have $M_n = N_n^2 (\prod_{i=1}^n a_i)^2 \to 0$, as $n \to \infty$, since $\prod_n a_n = 0$ and $N_\infty$ exists and is finite a.s. by the a.s. martingale convergence theorem. Hence $M_\infty = 0$ a.s.
2.9.1 Martingale proof of the Radon-Nikodym theorem
Theorem 2.35. [Radon-Nikodym theorem] Let $\mathbb{P}$ and $\mathbb{Q}$ be two probability measures on the measurable space $(\Omega, \mathcal{F})$. Assume that $\mathcal{F}$ is countably generated, i.e. there exists a collection of sets $(F_n : n \in \mathbb{N})$ such that
\[
\mathcal{F} = \sigma(F_n : n \in \mathbb{N}).
\]
Then the following statements are equivalent:
(a) $\mathbb{P}(A) = 0$ implies that $\mathbb{Q}(A) = 0$ for all $A \in \mathcal{F}$ (and in this case we say that $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$ and write $\mathbb{Q} \ll \mathbb{P}$).
(b) $\forall \varepsilon > 0$, $\exists \delta > 0$, $\forall A \in \mathcal{F}$, $\mathbb{P}(A) \le \delta$ implies that $\mathbb{Q}(A) \le \varepsilon$.
(c) There exists a non-negative random variable $X$ such that
\[
\mathbb{Q}(A) = \mathbb{E}[X 1(A)], \quad \forall A \in \mathcal{F}.
\]
Remark 2.36. The random variable $X$, which is unique $\mathbb{P}$-a.s., is called (a version of) the Radon-Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$. We write $X = d\mathbb{Q}/d\mathbb{P}$ a.s. The theorem extends immediately to finite measures by scaling, then to $\sigma$-finite measures by breaking the space into pieces where the measures are finite. Also we can lift the assumption that the $\sigma$-algebra $\mathcal{F}$ is countably generated and the details for that can be found in [2, Chapter 14].
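Proof. We will first show that (a) implies (b). If (b) does not hold, then we can find $\varepsilon > 0$ such that for all $n \ge 1$ there exists a set $A_n$ with $\mathbb{P}(A_n) \le 1/n^2$ and $\mathbb{Q}(A_n) \ge \varepsilon$. Then by the Borel-Cantelli lemma we get that
\[
\mathbb{P}(A_n \text{ i.o.}) = 0.
\]
Therefore from (a) we will get that $\mathbb{Q}(A_n \text{ i.o.}) = 0$. But
\[
\mathbb{Q}(A_n \text{ i.o.}) = \mathbb{Q}\Big(\bigcap_n \bigcup_{k \ge n} A_k\Big) = \lim_{n \to \infty} \mathbb{Q}\Big(\bigcup_{k \ge n} A_k\Big) \ge \varepsilon,
\]
which is a contradiction, so (a) implies (b).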
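Next we will show that (b) implies (c). We consider the following filtration:
\[
\mathcal{F}_n = \sigma(F_k, k \le n).
\]
If we write $\mathcal{A}_n = \{H_1 \cap \ldots \cap H_n : H_i = F_i \text{ or } F_i^c\}$, then it is easy to see that
\[
\mathcal{F}_n = \sigma(\mathcal{A}_n).
\]
Note that the sets in $\mathcal{A}_n$ are disjoint. We now let $X_n : \Omega \to [0, \infty)$ be the random variable defined as follows
\[
X_n(\omega) = \sum_{A \in \mathcal{A}_n} \frac{\mathbb{Q}(A)}{\mathbb{P}(A)}\, 1(\omega \in A).
\]
Since the sets in $\mathcal{A}_n$ are disjoint, we get that
\[
\mathbb{Q}(A) = \mathbb{E}[X_n 1(A)], \quad \text{for all } A \in \mathcal{F}_n.
\]
We will use the notation
\[
X_n = \frac{d\mathbb{Q}}{d\mathbb{P}} \quad \text{on } \mathcal{F}_n.
\]
It is easy to check that $(X_n)_n$ is a non-negative martingale with respect to the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n), \mathbb{P})$. Indeed, if $A \in \mathcal{F}_n$, then
\[
\mathbb{E}[X_{n+1} 1(A)] = \mathbb{Q}(A) = \mathbb{E}[X_n 1(A)], \quad \text{for all } A \in \mathcal{F}_n.
\]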
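The construction of $X_n$ can be made concrete in a small example. The sketch below assumes, purely for illustration, that $\Omega = [0,1)$, that $\mathbb{P}$ is Lebesgue measure, that $\mathbb{Q}$ has density $2x$, and that the atoms of $\mathcal{F}_n$ are the dyadic intervals of level $n$; none of these choices come from the notes. It computes $X_n$ on each atom and checks the martingale property atom by atom.

```python
from fractions import Fraction


def Q(lo, hi):
    """Q([lo, hi)) for the measure with density 2x on [0, 1) (assumed example)."""
    return hi * hi - lo * lo


def X(n):
    """Values of X_n = Q(A)/P(A) on the 2^n dyadic atoms A of level n, P = Lebesgue."""
    h = Fraction(1, 2 ** n)  # P(A) for every atom A of level n
    return [Q(k * h, (k + 1) * h) / h for k in range(2 ** n)]


for n in range(4):
    parent, child = X(n), X(n + 1)
    # Each level-n atom splits into two level-(n+1) atoms of equal P-mass, so
    # E[X_{n+1} 1(A)] = E[X_n 1(A)] reduces to an average over the two children.
    assert all((child[2 * k] + child[2 * k + 1]) / 2 == parent[k]
               for k in range(2 ** n))
    print(n, [str(v) for v in parent])
```

In this example $X_n$ equals $(2k+1)2^{-n}$ on the $k$-th atom, which is the average of the density $2x$ over that atom, so $X_n$ converges to the density as the partition is refined.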
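Also $(X_n)$ is bounded in $\mathcal{L}^1$, since $\mathbb{E}[X_n] = \mathbb{Q}(\Omega) = 1$. Hence by the a.s. martingale convergence theorem, it converges a.s. towards a random variable $X_\infty$ as $n \to \infty$.
We will now show that $(X_n)$ is a uniformly integrable martingale. Set $\lambda = 1/\delta$. Then by Markov's inequality
\[
\mathbb{P}(X_n \ge \lambda) \le \frac{\mathbb{E}[X_n]}{\lambda} = \frac{1}{\lambda} = \delta.
\]
Therefore by (b)
\[
\mathbb{E}[X_n 1(X_n \ge \lambda)] = \mathbb{Q}(X_n \ge \lambda) \le \varepsilon,
\]
which proves the uniform integrability. Thus by the convergence theorem for UI martingales, Theorem 2.26, we get that $X_n$ converges to $X_\infty$ as $n \to \infty$ in $\mathcal{L}^1$ and $\mathbb{E}[X_\infty] = 1$. So for all $A \in \mathcal{F}_n$ we have
\[
\mathbb{E}[X_n 1(A)] = \mathbb{E}[X_\infty 1(A)].
\]
Hence if we now define a new probability measure $\tilde{\mathbb{Q}}(A) = \mathbb{E}[X_\infty 1(A)]$, then $\mathbb{Q}(A) = \tilde{\mathbb{Q}}(A)$ for all $A \in \bigcup_n \mathcal{F}_n$. But $\bigcup_n \mathcal{F}_n$ is a $\pi$-system that generates the $\sigma$-algebra $\mathcal{F}$, and hence
\[
\mathbb{Q} = \tilde{\mathbb{Q}} \quad \text{on } \mathcal{F},
\]
which implies (c).
The implication (c) $\Rightarrow$ (a) is straightforward.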
3 Continuous-time random processes
3.1 Definitions
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. So far we have considered stochastic processes in discrete time only. In this section the time index set is going to be the whole positive real line, $\mathbb{R}_+$. As in Section 2, we define a filtration $(\mathcal{F}_t)_t$ to be an increasing collection of sub-$\sigma$-algebras of $\mathcal{F}$, i.e. $\mathcal{F}_t \subseteq \mathcal{F}_{t'}$ if $t \le t'$. A collection of random variables $(X_t : t \in \mathbb{R}_+)$ is called a stochastic process. Usually, as in Section 2, $X$ will take values in $\mathbb{R}$ or $\mathbb{R}^d$. $X$ is called adapted to the filtration $(\mathcal{F}_t)$ if $X_t$ is $\mathcal{F}_t$-measurable for all $t$. A stopping time $T$ is a random variable taking values in $[0, \infty]$ such that $\{T \le t\} \in \mathcal{F}_t$ for all $t$.
When we consider processes in discrete time, if we equip $\mathbb{N}$ with the $\sigma$-algebra $\mathcal{P}(\mathbb{N})$ that contains all the subsets of $\mathbb{N}$, then the process
\[
(\omega, n) \mapsto X_n(\omega)
\]
is clearly measurable with respect to the product $\sigma$-algebra $\mathcal{F} \otimes \mathcal{P}(\mathbb{N})$.
Back to continuous time, if we fix $t \in \mathbb{R}_+$, then $\omega \mapsto X_t(\omega)$ is a random variable. But the mapping $(\omega, t) \mapsto X_t(\omega)$ has no reason to be measurable with respect to $\mathcal{F} \otimes \mathcal{B}(\mathbb{R})$ ($\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra) unless some regularity conditions are imposed on $X$. Also, if $A \subseteq \mathbb{R}$, then the first hitting time of $A$,
\[
T_A = \inf\{t : X_t \in A\},
\]
is not in general a stopping time as the set
\[
\{T_A \le t\} = \bigcup_{0 \le s \le t} \{T_A = s\} \notin \mathcal{F}_t \quad \text{in general,}
\]
since this is an uncountable union.
A quite natural requirement is that for a fixed $\omega$ the mapping $t \mapsto X_t(\omega)$ is continuous in $t$. Then, indeed, the mapping $(\omega, t) \mapsto X_t(\omega)$ is measurable. More generally we will consider processes that are right-continuous and admit left limits everywhere a.s. and we will call such processes càdlàg, from the French "continu à droite, limite à gauche". Continuous and càdlàg processes are determined by their values in a countable dense subset of $\mathbb{R}_+$, for instance $\mathbb{Q}_+$.
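Note that if a process $X = (X_t)_{t \in (0,1]}$ is continuous, then the mapping
\[
(\omega, t) \mapsto X_t(\omega)
\]
is measurable with respect to $\mathcal{F} \otimes \mathcal{B}((0,1])$. To see this, note that by the continuity of $X$ in $t$ we can write
\[
X_t(\omega) = \lim_{n \to \infty} \sum_{k=0}^{2^n - 1} 1\big(t \in (k2^{-n}, (k+1)2^{-n}]\big)\, X_{k2^{-n}}(\omega).
\]
For each $n$ it is easy to see that
\[
(\omega, t) \mapsto \sum_{k=0}^{2^n - 1} 1\big(t \in (k2^{-n}, (k+1)2^{-n}]\big)\, X_{k2^{-n}}(\omega)
\]
is $\mathcal{F} \otimes \mathcal{B}((0,1])$-measurable. Hence $(\omega, t) \mapsto X_t(\omega)$ is $\mathcal{F} \otimes \mathcal{B}((0,1])$-measurable, as a limit of measurable functions.
We let $C(\mathbb{R}_+, E)$ ($D(\mathbb{R}_+, E)$) be the space of continuous (cadlag) functions $x : \mathbb{R}_+ \to E$ endowed with the product $\sigma$-algebra that makes the projections $\pi_t : X \mapsto X_t$ measurable for every $t$. Note that $E = \mathbb{R}$ or $\mathbb{R}^d$ in this course.
For a stopping time $T$ we define as before
\[
\mathcal{F}_T = \{A \in \mathcal{F} : A \cap \{T \le t\} \in \mathcal{F}_t \text{ for all } t\}.
\]
For a cadlag process $X$ we set $X_T(\omega) = X_{T(\omega)}(\omega)$, whenever $T(\omega) < \infty$, and again as before we define the stopped process $X^T$ by $X^T_t = X_{T \wedge t}$.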
Proposition 3.1. Let $S$ and $T$ be stopping times and $X$ a cadlag adapted process. Then
1. $S \wedge T$ is a stopping time,
2. if $S \le T$, then $\mathcal{F}_S \subseteq \mathcal{F}_T$,
3. $X_T 1(T < \infty)$ is an $\mathcal{F}_T$-measurable random variable,
4. $X^T$ is adapted.
Proof. 1, 2 follow directly from the definition like in the discrete time case. We will only show 3. Note that 4 follows from 3, since $X_{T \wedge t}$ will then be $\mathcal{F}_{T \wedge t}$-measurable, and hence $\mathcal{F}_t$-measurable, since by 2, $\mathcal{F}_{T \wedge t} \subseteq \mathcal{F}_t$.
Note that a random variable $Z$ is $\mathcal{F}_T$-measurable if and only if $Z 1(T \le t)$ is $\mathcal{F}_t$-measurable for all $t$. It follows directly by the definition that if $Z$ is $\mathcal{F}_T$-measurable, then $Z 1(T \le t)$ is $\mathcal{F}_t$-measurable for all $t$. For the other implication, note that if $Z = c 1(A)$, then the claim is true. This extends to all finite linear combinations of indicators, since if $Z = \sum_{i=1}^n c_i 1(A_i)$, where the constants $c_i$ are positive, then we can write $Z$ as a linear combination of indicators of disjoint sets and then the claim follows easily. Finally for any positive random variable $Z$ we can approximate it by $Z_n = 2^{-n} \lfloor 2^n Z \rfloor \wedge n \uparrow Z$ as $n \to \infty$. Then the claim follows for each $Z_n$, since if $Z 1(T \le t)$ is $\mathcal{F}_t$-measurable, then also $Z_n 1(T \le t)$ is $\mathcal{F}_t$-measurable, for all $t$. Finally the limit of $\mathcal{F}_T$-measurable random variables is $\mathcal{F}_T$-measurable.
So in order to prove that $X_T 1(T < \infty)$ is $\mathcal{F}_T$-measurable, we will show that $X_T 1(T \le t)$ is $\mathcal{F}_t$-measurable for all $t$. We can write
\[
X_T 1(T \le t) = X_T 1(T < t) + X_t 1(T = t).
\]
Clearly, the random variable $X_t 1(T = t)$ is $\mathcal{F}_t$-measurable. It only remains to show that $X_T 1(T < t)$ is $\mathcal{F}_t$-measurable. If we let $T_n = 2^{-n} \lceil 2^n T \rceil$, then it is easy to see that $T_n$ is a stopping time that takes values in the set $\mathcal{D}_n = \{k 2^{-n} : k \in \mathbb{N}\}$. Indeed
\[
\{T_n \le t\} = \{\lceil 2^n T \rceil \le 2^n t\} = \{T \le 2^{-n} \lfloor 2^n t \rfloor\} \in \mathcal{F}_{2^{-n} \lfloor 2^n t \rfloor} \subseteq \mathcal{F}_t.
\]
By the cadlag property of $X$ and the convergence $T_n \downarrow T$ we get that
\[
X_T 1(T < t) = \lim_{n \to \infty} X_{T_n \wedge t} 1(T < t).
\]
Since $T_n$ takes only countably many values, we have
\[
X_{T_n \wedge t} 1(T < t) = \sum_{d \in \mathcal{D}_n,\, d \le t} X_d 1(T_n = d) + X_t 1(T_n > t) 1(T < t).
\]
But $T_n$ is a stopping time with respect to the filtration $(\mathcal{F}_t)$, and hence we see that $X_{T_n \wedge t} 1(T < t)$ is $\mathcal{F}_t$-measurable for all $n$ and this finishes the proof.
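The dyadic approximation $T_n = 2^{-n} \lceil 2^n T \rceil$ used in the proof above is easy to experiment with for deterministic values. The sketch below uses exact rational arithmetic; the helper names `dyadic_upper` and `dyadic_lower` and the sample value $T = 5/7$ are illustrative assumptions. It checks that $T_n$ decreases to $T$ from above and that $T_n \le t$ holds exactly when $T \le 2^{-n} \lfloor 2^n t \rfloor$.

```python
import math
from fractions import Fraction


def dyadic_upper(T, n):
    """T_n = 2^{-n} * ceil(2^n T): the smallest level-n dyadic number >= T."""
    return Fraction(math.ceil(T * 2 ** n), 2 ** n)


def dyadic_lower(t, n):
    """2^{-n} * floor(2^n t): the largest level-n dyadic number <= t."""
    return Fraction(math.floor(t * 2 ** n), 2 ** n)


T = Fraction(5, 7)
approx = [dyadic_upper(T, n) for n in range(8)]
# T_n decreases in n and stays within 2^{-n} above T.
assert all(x >= y for x, y in zip(approx, approx[1:]))
assert all(0 <= approx[n] - T < Fraction(1, 2 ** n) for n in range(8))

# The event identity {T_n <= t} = {T <= 2^{-n} floor(2^n t)}, checked pointwise.
for n in range(5):
    for j in range(40):
        t = Fraction(j, 13)
        assert (dyadic_upper(T, n) <= t) == (T <= dyadic_lower(t, n))
print([str(x) for x in approx])
```

The identity holds because $\lceil 2^n T \rceil$ is an integer, so it is at most $2^n t$ exactly when it is at most $\lfloor 2^n t \rfloor$.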
Example 3.2. Note that when the time index set is $\mathbb{R}_+$, then hitting times are not always stopping times. Let $J$ be a random variable that takes values $+1$ or $-1$ each with probability $1/2$. Consider now the following process
\[
X_t = \begin{cases} t, & \text{if } t \in [0,1]; \\ 1 + J(t-1), & \text{if } t > 1. \end{cases}
\]
Let $\mathcal{F}_t = \sigma(X_s, s \le t)$ be the natural filtration of $X$. Then if $A = (1,2)$ and we consider $T_A = \inf\{t \ge 0 : X_t \in A\}$, then clearly
\[
\{T_A \le 1\} \notin \mathcal{F}_1.
\]
If we impose some regularity conditions on the process or the filtration though, then we get stopping times like in the next two propositions.
Proposition 3.3. Let $A$ be a closed set and let $X$ be a continuous adapted process. Then the first hitting time of $A$,
\[
T_A = \inf\{t \ge 0 : X_t \in A\},
\]
is a stopping time.
Proof. It suffices to show that
\[
\{T_A \le t\} = \Big\{ \inf_{s \in \mathbb{Q},\, s \le t} d(X_s, A) = 0 \Big\}, \tag{3.1}
\]
where $d(x, A)$ stands for the distance of $x$ from the set $A$. If $T_A = s \le t$, then there exists a sequence $s_n$ of times such that $X_{s_n} \in A$ and $s_n \downarrow s$ as $n \to \infty$. By continuity of $X$, we then deduce that $X_{s_n} \to X_s$ as $n \to \infty$ and since $A$ is closed, we must have that $X_s \in A$. Thus we showed that $X_{T_A} \in A$. We can now find a sequence of rationals $q_n$ such that $q_n \uparrow T_A$ as $n \to \infty$ and since $d(X_{T_A}, A) = 0$ we get that $d(X_{q_n}, A) \to 0$ as $n \to \infty$.
Suppose now that $\inf_{s \in \mathbb{Q},\, s \le t} d(X_s, A) = 0$. Then there exists a sequence $s_n \in \mathbb{Q}$, $s_n \le t$, for all $n$ such that
\[
d(X_{s_n}, A) \to 0 \quad \text{as } n \to \infty.
\]
We can extract a converging subsequence of $s_n \to s$ and by continuity of $X$ we get that $X_{s_n} \to X_s$ as $n \to \infty$. Since $d(X_s, A) = 0$ and $A$ is a closed set, we conclude that $X_s \in A$, and hence $T_A \le t$.
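Definition 3.4. Let $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ be a filtration. For each $t$ we define
\[
\mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s.
\]
If $\mathcal{F}_{t+} = \mathcal{F}_t$ for all $t$, then we call the filtration $(\mathcal{F}_t)$ right-continuous.
Proposition 3.5. Let $A$ be an open set and $X$ a continuous process. Then
\[
T_A = \inf\{t \ge 0 : X_t \in A\}
\]
is a stopping time with respect to the filtration $(\mathcal{F}_{t+})$.
Proof. First we show that for all $t$, the event $\{T_A < t\} \in \mathcal{F}_t$. Indeed, by the continuity of $X$ and the fact that $A$ is open we get that
\[
\{T_A < t\} = \bigcup_{q \in \mathbb{Q},\, q < t} \{X_q \in A\} \in \mathcal{F}_t,
\]
since it is a countable union.
Since we can write
\[
\{T_A \le t\} = \bigcap_n \{T_A < t + 1/n\}
\]
we get that $\{T_A \le t\} \in \mathcal{F}_{t+}$.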
3.2 Martingale regularization theorem
As we discussed at the beginning of the section, we can view a stochastic process indexed by $\mathbb{R}_+$ as a random variable with values in the space of functions $\{f : \mathbb{R}_+ \to E\}$ endowed with the product $\sigma$-algebra that makes the projections $f \mapsto f(t)$ measurable. The law of the process $X$ is the measure $\mu$ that is defined as
\[
\mu(A) = \mathbb{P}(X \in A),
\]
where $A$ is in the product $\sigma$-algebra. However the measure $\mu$ is not easy to work with. Instead we consider simpler objects that we define below.
Given a probability measure $\mu$ on $D(\mathbb{R}_+, E)$ we consider the probability measure $\mu_J$, where $J \subseteq \mathbb{R}_+$ is a finite set, defined as the law of $(X_t, t \in J)$. The probability measures $(\mu_J)$ are called the finite dimensional distributions of $\mu$. By a $\pi$-system uniqueness argument, $\mu$ is uniquely determined by its finite-dimensional distributions. Indeed the set
\[
\Big\{ \bigcap_{s \in J} \{X_s \in A_s\} : J \text{ is finite},\ A_s \in \mathcal{B}(\mathbb{R}) \Big\}
\]
is a $\pi$-system generating the product $\sigma$-algebra. So, when we want to specify the law of a cadlag process, it suffices to describe its finite-dimensional distributions. Of course we have no a priori reason to believe there exists a cadlag process whose finite-dimensional distributions coincide with a given family of measures $(\mu_J : J \subseteq \mathbb{R}_+, J \text{ finite})$.
Even if we know the law of a process, this does not give us much information about the sample path properties of the process. Namely, there could be different processes with the same finite marginal distributions. This motivates the following definition:
Definition 3.6. Let $X$ and $X'$ be two processes defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We say that $X'$ is a version of $X$ if $X_t = X'_t$ a.s. for every $t$.
Remark 3.7. Note that two versions of the same process have the same finite marginal distributions. But they do not share the same sample path properties.
Example 3.8. Let $X = (X_t)_{t \in [0,1]}$ be the process that is identical to 0 for all $t$. Then obviously the finite marginal distributions will be Dirac measures at 0. Now let $U$ be a uniform random variable on $[0,1]$. We define $X'_t = 1(U = t)$. Then clearly the finite marginal distributions of $X'$ are Dirac measures at 0, and hence it is a version of $X$. However it is not continuous and furthermore
\[
\mathbb{P}(X'_t = 0\ \ \forall t \in [0,1]) = 0.
\]
In this section we are going to show two theorems that guarantee the existence of a continuous or cadlag version of a process.
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ be a filtered probability space. Let $\mathcal{N}$ be the collection of sets in $\mathcal{F}$ of measure 0. We define the filtration
\[
\tilde{\mathcal{F}}_t = \sigma(\mathcal{F}_{t+}, \mathcal{N}).
\]
Definition 3.9. If a filtration satisfies $\tilde{\mathcal{F}}_t = \mathcal{F}_t$ for all $t$, then we say that $(\mathcal{F}_t)$ satisfies the usual conditions.
Before stating the next theorem, note that the definitions of martingales (resp. supermartingales and submartingales) are the same in continuous time as the ones given for discrete time processes.
Theorem 3.10. [Martingale regularization theorem] Let $(X_t)_{t \ge 0}$ be a martingale with respect to the filtration $(\mathcal{F}_t)_{t \ge 0}$. Then there exists a cadlag process $\tilde{X}$ which is a martingale with respect to $(\tilde{\mathcal{F}}_t)$ and satisfies
\[
X_t = \mathbb{E}[\tilde{X}_t \mid \mathcal{F}_t] \quad \text{a.s.}
\]
for all $t \ge 0$. If the filtration $(\mathcal{F}_t)$ satisfies the usual conditions, then $\tilde{X}$ is a cadlag version of $X$.
Before proving the theorem we state and prove an easy result about functions which is analogous to Lemma 2.15 which was used in the proof of the a.s. martingale convergence theorem.
Lemma 3.11. Let $f : \mathbb{Q}_+ \to \mathbb{R}$ be a function defined on the positive rational numbers. Suppose that for all $a < b$ with $a, b \in \mathbb{Q}$ and all bounded $I \subseteq \mathbb{Q}_+$ the function $f$ is bounded on $I$ and the number of upcrossings of the interval $[a,b]$ during the time interval $I$ by $f$ is finite, i.e. $N([a,b], I, f) < \infty$, where $N([a,b], I, f)$ is defined as
\[
\sup\{n \ge 0 : \exists\, 0 \le s_1 < t_1 < \ldots < s_n < t_n,\ s_i, t_i \in I,\ f(s_i) < a,\ f(t_i) > b,\ 1 \le i \le n\}.
\]
Then for every $t \in \mathbb{R}_+$ the right and left limits of $f$ exist and are finite, i.e.
\[
\lim_{s \downarrow t} f(s), \quad \lim_{s \uparrow t} f(s) \quad \text{exist and are finite.}
\]
Proof. First note that if $(s_n)$ is a sequence of rationals decreasing to $t$, then by Lemma 2.15 we get that the limit $\lim_n f(s_n)$ exists. Similarly if $s'_n$ is a sequence increasing to $t$, then the limit $\lim_n f(s'_n)$ exists. So far we showed that for any sequence converging to $t$ from above (or below) the limit exists. It remains to show that the limit is the same along any sequence decreasing to $t$. To see this, note that if $s_n$ is a sequence decreasing to $t$ and $q_n$ is another sequence decreasing to $t$ and $\lim_n f(s_n) \ne \lim_n f(q_n)$, then we can combine the two sequences and get a decreasing sequence $(a_n)$ converging to $t$ such that $\lim_n f(a_n)$ does not exist, which is a contradiction, since we already showed that for every decreasing sequence the limit exists. Finally the limits from above or below are finite, which follows by the assumption that $f$ is bounded on any bounded subset of $\mathbb{Q}_+$.
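For a finite index set the number of upcrossings $N([a,b], I, f)$ in Lemma 3.11 can be computed by a single greedy pass. The sketch below is an illustration under the assumption that the values of $f$ on $I$ are supplied as a list in increasing time order; the function name `upcrossings` is not from the notes.

```python
def upcrossings(values, a, b):
    """Number of upcrossings of [a, b] by a finite sequence: the largest n such that
    there are indices s_1 < t_1 < ... < s_n < t_n with values[s_i] < a and values[t_i] > b."""
    assert a < b
    count = 0
    below = False  # True once the sequence has been below a since the last completed upcrossing
    for v in values:
        if not below and v < a:
            below = True          # earliest possible s_i
        elif below and v > b:
            count += 1            # earliest possible t_i completes an upcrossing
            below = False
    return count


path = [0, 2, 0, 2, 0]
print(upcrossings(path, 0.5, 1.5))  # two passages from below 0.5 to above 1.5
```

Taking the earliest admissible $s_i$ and $t_i$ at every step is optimal, since any other choice of crossing times can be shifted earlier without reducing the count.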
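Proof of Theorem 3.10. The goal is to define $\tilde{X}$ as follows:
\[
\tilde{X}_t = \lim_{s \downarrow t,\ s \in \mathbb{Q}_+} X_s \quad \text{on a set of measure 1 and 0 elsewhere.}
\]
So first we need to check that the limit above exists a.s. and is finite. In order to do so, we are going to use Lemma 3.11. Therefore we first show that $X$ is bounded on bounded subsets $I$ of $\mathbb{Q}_+$. Let $I$ be such a subset. Consider $J = \{j_1, \ldots, j_n\} \subseteq I$, where $j_1 < j_2 < \ldots < j_n$. Then the process $(X_j)_{j \in J}$ is a discrete time martingale. By Doob's maximal inequality we obtain
\[
\lambda\,\mathbb{P}(\max_{j \in J} |X_j| > \lambda) \le \mathbb{E}[|X_{j_n}|] \le \mathbb{E}[|X_K|],
\]
where $K > \sup I$. So taking a monotone limit over finite subsets $J$ of $I$ with union the set $I$, we get that
\[
\lambda\,\mathbb{P}(\sup_{t \in I} |X_t| > \lambda) \le \mathbb{E}[|X_K|].
\]
Therefore by letting $\lambda \to \infty$ this shows that
\[
\mathbb{P}(\sup_{t \in I} |X_t| < \infty) = 1.
\]
Let $a < b$ be rational numbers. Then we have $N([a,b], I, X) = \sup_{J \subseteq I,\ \text{finite}} N([a,b], J, X)$. Let $J = \{a_1, \ldots, a_n\}$ (in increasing order again) be a finite subset of $I$. Then $(X_{a_i})_{i \le n}$ is a martingale and Doob's upcrossing lemma gives that
\[
(b-a)\,\mathbb{E}[N([a,b], J, X)] \le \mathbb{E}[(X_{a_n} - a)^-] \le \mathbb{E}[(X_K - a)^-]. \tag{3.2}
\]
By monotone convergence again, if we let $I_M = \mathbb{Q}_+ \cap [0, M]$, we then get that for all $M$
\[
N([a,b], I_M, X) < \infty \quad \text{a.s.}
\]
Thus if we now let
\[
\Omega_0 = \bigcap_{M \in \mathbb{N}} \bigcap_{a < b,\ a, b \in \mathbb{Q}} \{N([a,b], I_M, X) < \infty\} \cap \Big\{\sup_{t \in I_M} |X_t| < \infty\Big\},
\]
then we obtain that $\mathbb{P}(\Omega_0) = 1$. For $\omega \in \Omega_0$, by Lemma 3.11 the following limits exist in $\mathbb{R}$:
\begin{align*}
X_{t+}(\omega) &= \lim_{s \downarrow t,\ s \in \mathbb{Q}} X_s(\omega), \quad t \ge 0, \\
X_{t-}(\omega) &= \lim_{s \uparrow t,\ s \in \mathbb{Q}} X_s(\omega), \quad t > 0.
\end{align*}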
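Hence we can now define for $t \ge 0$,
\[
\tilde{X}_t = \begin{cases} X_{t+}, & \text{on } \Omega_0; \\ 0, & \text{otherwise.} \end{cases}
\]
Then clearly $\tilde{X}$ is $\tilde{\mathcal{F}}$-adapted, since $\tilde{\mathcal{F}}$ contains also the events of 0 probability.
Let $t_n$ be a sequence in $\mathbb{Q}$ such that $t_n \downarrow t$ as $n \to \infty$. Then
\[
\tilde{X}_t = \lim_{n \to \infty} X_{t_n}.
\]
Notice that the process $(X_{t_n} : n \ge 1)$ is a backwards martingale, and hence it converges a.s. and in $\mathcal{L}^1$ as $n \to \infty$. Therefore,
\[
\mathbb{E}[X_{t_n} \mid \mathcal{F}_t] \to \mathbb{E}[\tilde{X}_t \mid \mathcal{F}_t] \quad \text{in } \mathcal{L}^1.
\]
But $\mathbb{E}[X_{t_n} \mid \mathcal{F}_t] = X_t$. Therefore
\[
X_t = \mathbb{E}[\tilde{X}_t \mid \mathcal{F}_t] \quad \text{a.s.} \tag{3.3}
\]
It remains to show the martingale property of $\tilde{X}$. Let $s < t$ and $s_n$ a sequence in $\mathbb{Q}$ such that $s_n \downarrow s$ and $s_0 < t$. Then
\[
\tilde{X}_s = \lim X_{s_n} = \lim \mathbb{E}[X_t \mid \mathcal{F}_{s_n}].
\]
Now note that $(\mathbb{E}[X_t \mid \mathcal{F}_{s_n}])$ is a backwards martingale and hence it converges a.s. and in $\mathcal{L}^1$ to $\mathbb{E}[X_t \mid \mathcal{F}_{s+}]$. Therefore
\[
\tilde{X}_s = \mathbb{E}[X_t \mid \mathcal{F}_{s+}] \quad \text{a.s.} \tag{3.4}
\]
If $s < t$, then by the tower property and (3.4) and (3.3) we get that
\[
\mathbb{E}[\tilde{X}_t \mid \mathcal{F}_{s+}] = \tilde{X}_s \quad \text{a.s.}
\]
Notice that if $\mathcal{G}$ is any $\sigma$-algebra and $X$ is an integrable random variable, then
\[
\mathbb{E}[X \mid \mathcal{G} \vee \mathcal{N}] = \mathbb{E}[X \mid \mathcal{G}] \quad \text{a.s.}
\]
Finally we get that $\mathbb{E}[\tilde{X}_t \mid \tilde{\mathcal{F}}_s] = \tilde{X}_s$ a.s., which shows that $\tilde{X}$ is a martingale with respect to the filtration $\tilde{\mathcal{F}}$.
The only thing that remains to prove is the cadlag property.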
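Suppose that for some $\omega \in \Omega_0$ we have that $\tilde{X}$ is not right continuous. Then this means that there exists a sequence $(s_n)$ such that $s_n \downarrow t$ as $n \to \infty$ and
\[
|\tilde{X}_{s_n} - \tilde{X}_t| > \varepsilon,
\]
for some $\varepsilon > 0$. By the definition of $\tilde{X}$ for $\omega \in \Omega_0$, there exists a sequence of rational numbers $(s'_n)$ such that $s'_n > s_n$, $s'_n \downarrow t$ as $n \to \infty$ and
\[
|\tilde{X}_{s_n} - X_{s'_n}| \le \frac{\varepsilon}{2}.
\]
Therefore, we get that
\[
|X_{s'_n} - \tilde{X}_t| > \frac{\varepsilon}{2},
\]
which is a contradiction, since $X_{s'_n} \to \tilde{X}_t$ as $n \to \infty$.
The proof that $\tilde{X}$ has left limits is left as an exercise (hint: use the finite up-crossing property of $X$ on rationals).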
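Example 3.12. Let $\xi, \eta$ be independent random variables taking values $+1$ or $-1$ with equal probability. We now define
\[
X_t = \begin{cases} 0, & \text{if } t < 1; \\ \xi, & \text{if } t = 1; \\ \xi + \eta, & \text{if } t > 1. \end{cases}
\]
We also define $\mathcal{F}_t$ to be the natural filtration, i.e. $\mathcal{F}_t = \sigma(X_s, s \le t)$. Then clearly, $X$ is a martingale relative to the filtration $(\mathcal{F}_t)$, but it is not right continuous at 1. Also, it is easy to see that $\mathcal{F}_1 = \sigma(\xi)$ but $\mathcal{F}_{1+} = \sigma(\xi, \eta)$. We now define
\[
\tilde{X}_t = \begin{cases} 0, & \text{if } t < 1; \\ \xi + \eta, & \text{if } t \ge 1. \end{cases}
\]
It is easy to check that $X_t = \mathbb{E}[\tilde{X}_t \mid \mathcal{F}_t]$ a.s. for all $t$ and $\tilde{X}$ is a martingale with respect to the filtration $(\mathcal{F}_{t+})$. It is obvious that $\tilde{X}$ is cadlag. Note though that $\tilde{X}$ is not a version of $X$, since $X_1 \ne \tilde{X}_1$.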
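The two claims of Example 3.12 at time $t = 1$ can be checked by enumerating the four equally likely outcomes of $(\xi, \eta)$. This is a minimal sketch; the helper `cond_exp_given_xi` and the lambda names are assumptions introduced for illustration, and conditioning on $\mathcal{F}_1 = \sigma(\xi)$ is implemented as averaging over $\eta$.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product((-1, 1), repeat=2))  # pairs (xi, eta), each with probability 1/4


def cond_exp_given_xi(f, xi):
    """E[f(xi, eta) | xi]: average over the two equally likely values of eta."""
    return Fraction(sum(f(xi, eta) for eta in (-1, 1)), 2)


x1 = lambda xi, eta: xi              # X_1 = xi
x1_tilde = lambda xi, eta: xi + eta  # tilde X_1 = xi + eta

# X_1 = E[tilde X_1 | F_1] with F_1 = sigma(xi).
for xi in (-1, 1):
    assert cond_exp_given_xi(x1_tilde, xi) == xi

# tilde X is not a version of X: the two processes differ at time 1 on every outcome.
prob_differ = Fraction(sum(x1(*w) != x1_tilde(*w) for w in outcomes), len(outcomes))
print(prob_differ)
```

The second check reflects that $\eta \ne 0$ always, so $X_1 \ne \tilde{X}_1$ with probability 1, even though the conditional expectation identity holds.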
From now on when we work with martingales in continuous time, we will always consider their cadlag version, provided that the filtration satisfies the usual conditions.
3.3 Convergence and Doob's inequalities in continuous time
In this section we will give the continuous time analogues of Doob's inequalities and the convergence theorems for martingales.
Theorem 3.13. [A.s. martingale convergence] Let $(X_t : t \ge 0)$ be a cadlag martingale which is bounded in $\mathcal{L}^1$. Then $X_t \to X_\infty$ a.s. as $t \to \infty$, for some $X_\infty \in \mathcal{L}^1(\mathcal{F}_\infty)$.
Proof. If $N([a,b], I_M, X)$ stands for the number of up-crossings of the interval $[a,b]$ as defined in Lemma 3.11, then from (3.2) in the proof of the martingale regularization theorem, we get that
\[
(b-a)\,\mathbb{E}[N([a,b], I_M, X)] \le a + \sup_{t \ge 0} \mathbb{E}[|X_t|] < \infty,
\]
since $X$ is bounded in $\mathcal{L}^1$. Hence, if we take the limit as $M \to \infty$ then we get that
\[
N([a,b], \mathbb{Q}_+, X) < \infty \quad \text{a.s.}
\]
Therefore, the set
\[
\Omega_0 = \bigcap_{a < b,\ a, b \in \mathbb{Q}} \{N([a,b], \mathbb{Q}_+, X) < \infty\}
\]
has probability 1. On $\Omega_0$ it is easy to see that $X_q$ converges as $q \to \infty$ and $q \in \mathbb{Q}_+$. Indeed, as in the proof of Lemma 2.15, if $X_q$ did not converge, then $\limsup X_q \ne \liminf X_q$ and this would contradict the finite number of up-crossings of the interval $[a,b]$, where $\liminf < a < b < \limsup$. Thus $(X_q)_{q \in \mathbb{Q}_+}$ converges a.s. as $q \to \infty$, $q \in \mathbb{Q}_+$, to $X_\infty$. We will now use the cadlag property of $X$ to deduce that
\[
X_t \to X_\infty \quad \text{as } t \to \infty.
\]
Since $X_q \to X_\infty$ as $q \to \infty$, $q \in \mathbb{Q}_+$, for each $\varepsilon > 0$ there exists $q_0$ such that
\[
|X_q - X_\infty| < \frac{\varepsilon}{2}, \quad \text{for all } q > q_0.
\]
By right continuity, we get that for $t > q_0$ there exists a rational $q$ such that $q > t$ and
\[
|X_t - X_q| < \frac{\varepsilon}{2}.
\]
Hence we conclude that
\[
|X_t - X_\infty| \le \varepsilon.
\]
Theorem 3.14. [Doob's maximal inequality] Let $(X_t : t \ge 0)$ be a cadlag martingale and $X_t^* = \sup_{s \le t} |X_s|$. Then, for all $\lambda \ge 0$ and $t \ge 0$
\[
\lambda\,\mathbb{P}(X_t^* \ge \lambda) \le \mathbb{E}[|X_t|].
\]
Proof. Notice that by the cadlag property we have
\[
\sup_{s \le t} |X_s| = \sup_{s \in \{t\} \cup ([0,t] \cap \mathbb{Q}_+)} |X_s|.
\]
The rest of the proof follows in the same way as the first part of the proof of Theorem 3.10.
Theorem 3.15. [Doob's $\mathcal{L}^p$-inequality] Let $(X_t : t \ge 0)$ be a cadlag martingale. Setting $X_t^* = \sup_{s \le t} |X_s|$, then for all $p > 1$ we have
\[
\|X_t^*\|_p \le \frac{p}{p-1}\,\|X_t\|_p.
\]
Theorem 3.16. [$\mathcal{L}^p$ martingale convergence theorem] Let $X$ be a cadlag martingale and $p > 1$, then the following statements are equivalent:
1. $X$ is bounded in $\mathcal{L}^p(\Omega, \mathcal{F}, \mathbb{P})$: $\sup_{t \ge 0} \|X_t\|_p < \infty$
2. $X$ converges a.s. and in $\mathcal{L}^p$ to a random variable $X_\infty$
3. There exists a random variable $Z \in \mathcal{L}^p(\Omega, \mathcal{F}, \mathbb{P})$ such that
\[
X_t = \mathbb{E}[Z \mid \mathcal{F}_t] \quad \text{a.s.}
\]
Theorem 3.17. [UI martingale convergence theorem] Let $X$ be a cadlag martingale. Then $X$ is UI if and only if $X$ converges a.s. and in $\mathcal{L}^1$ to $X_\infty$, and this holds if and only if $X$ is closed.
Theorem 3.18. [Optional stopping theorem] Let $X$ be a cadlag UI martingale. Then for all stopping times $S \le T$, we have
$$E[X_T \mid \mathcal{F}_S] = X_S \quad \text{a.s.}$$
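Proof. Let $A \in \mathcal{F}_S$. We need to show that
$$E[X_T \mathbf{1}(A)] = E[X_S \mathbf{1}(A)].$$
Let $T_n = 2^{-n} \lceil 2^n T \rceil$ and $S_n = 2^{-n} \lceil 2^n S \rceil$. Then $T_n \downarrow T$ and $S_n \downarrow S$ as $n \to \infty$, and by the right continuity of $X$ we get that
$$X_{S_n} \to X_S \quad \text{and} \quad X_{T_n} \to X_T \quad \text{as } n \to \infty.$$
Also, from the discrete time optional stopping theorem we have that $X_{T_n} = E[X_\infty \mid \mathcal{F}_{T_n}]$, and thus we see that $(X_{T_n})_n$ is UI. Hence it converges to $X_T$ as $n \to \infty$ also in $\mathcal{L}^1$, and the same argument applies to $X_{S_n}$. By the discrete time optional stopping theorem for UI martingales we have
$$E[X_{T_n} \mid \mathcal{F}_{S_n}] = X_{S_n} \quad \text{a.s.} \tag{3.5}$$
Since $A \in \mathcal{F}_S$, the definition of $S_n$ implies that $A \in \mathcal{F}_{S_n}$. Hence from (3.5) we obtain that
$$E[X_{T_n} \mathbf{1}(A)] = E[X_{S_n} \mathbf{1}(A)].$$
Letting $n \to \infty$ and using the $\mathcal{L}^1$ convergence of $X_{T_n}$ to $X_T$ and of $X_{S_n}$ to $X_S$ we have
$$E[X_T \mathbf{1}(A)] = E[X_S \mathbf{1}(A)].$$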
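The dyadic approximation from above used in this proof can be made concrete for a single deterministic value. The sketch below computes $2^{-n}\lceil 2^n t \rceil$ with exact rational arithmetic; the helper name `dyadic_upper` and the sample value 1/3 are illustrative assumptions.

```python
import math
from fractions import Fraction


def dyadic_upper(t, n):
    """Return 2^{-n} * ceil(2^n * t): the smallest multiple of 2^{-n}
    that is greater than or equal to t."""
    return Fraction(math.ceil(Fraction(t) * 2 ** n), 2 ** n)


T = Fraction(1, 3)
approximations = [dyadic_upper(T, n) for n in range(8)]
# The sequence is non-increasing, stays >= T, and satisfies T_n - T < 2^{-n}.
for n, value in enumerate(approximations):
    print(n, value, float(value - T))
```

Each approximation takes values in a countable set, which is what allows the discrete-time optional stopping theorem to be applied before passing to the limit.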
3.4 Kolmogorov's continuity criterion
Let $\mathcal{D}_n = \{k 2^{-n} : 0 \le k \le 2^n\}$ be the set of dyadic rationals of level $n$ and $\mathcal{D} = \bigcup_{n \ge 0} \mathcal{D}_n$.
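Theorem 3.19. [Kolmogorov's continuity criterion] Let $(X_t)_{t \in \mathcal{D}}$ be a stochastic process with real values. Suppose there exist $p > 0$, $\varepsilon > 0$ so that
$$E[|X_t - X_s|^p] \le c\,|t - s|^{1+\varepsilon}, \quad \text{for all } s, t \in \mathcal{D},$$
for some constant $c < \infty$. Then for every $\alpha \in (0, \varepsilon/p)$, the process $(X_t)_{t \in \mathcal{D}}$ is $\alpha$-Hölder continuous, i.e. there exists a random variable $K_\alpha$ such that
$$|X_t - X_s| \le K_\alpha |s - t|^\alpha, \quad \text{for all } s, t \in \mathcal{D}.$$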
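Proof. By Markov's inequality and the assumption we have
$$P\left(|X_{k 2^{-n}} - X_{(k+1) 2^{-n}}| \ge 2^{-n\alpha}\right) \le c\, 2^{n \alpha p}\, 2^{-n - n\varepsilon}.$$
By the union bound we have
$$P\left(\max_{0 \le k < 2^n} |X_{k 2^{-n}} - X_{(k+1) 2^{-n}}| \ge 2^{-n\alpha}\right) \le c\, 2^{-n(\varepsilon - p\alpha)}.$$
By Borel-Cantelli, since $\alpha \in (0, \varepsilon/p)$, we deduce that a.s.
$$\max_{0 \le k < 2^n} |X_{k 2^{-n}} - X_{(k+1) 2^{-n}}| \le 2^{-n\alpha}, \quad \text{for all } n \text{ sufficiently large.}$$
Therefore, there exists a random variable $M$ such that
$$\sup_{n \ge 0} \max_{0 \le k < 2^n} \frac{|X_{k 2^{-n}} - X_{(k+1) 2^{-n}}|}{2^{-n\alpha}} \le M < \infty. \tag{3.6}$$
We will now show that there exists a random variable $M' < \infty$ a.s. so that for every $s, t \in \mathcal{D}$ we have
$$|X_t - X_s| \le M' |t - s|^\alpha.$$
Let $s, t \in \mathcal{D}$ with $s < t$ and let $r$ be the unique integer such that
$$2^{-(r+1)} < t - s \le 2^{-r}.$$
Then there exists $k$ such that $s < k 2^{-(r+1)} < t$. Set $\beta = k 2^{-(r+1)}$; then $0 < t - \beta < 2^{-r}$. So we have that
$$t - \beta = \sum_{j \ge r+1} \frac{x_j}{2^j},$$
where $x_j \in \{0, 1\}$ for all $j$ (in fact this is a finite sum because $t$ is dyadic). Similarly we can write
$$\beta - s = \sum_{j \ge r+1} \frac{y_j}{2^j},$$
where $y_j \in \{0, 1\}$ for all $j$. Thus we see that we can write the interval $[s, t)$ as a disjoint union of dyadic intervals of length $2^{-n}$ for $n \ge r+1$, where at most 2 such intervals have the same length. Therefore,
$$|X_s - X_t| \le \sum_{d, n} |X_d - X_{d + 2^{-n}}|,$$
where $d$, $d + 2^{-n}$ in the summation above are the endpoints of the intervals in the decomposition of $[s, t)$. Hence using (3.6) we obtain that for all $s, t \in \mathcal{D}$
$$|X_s - X_t| \le 2 \sum_{n \ge r+1} M 2^{-n\alpha} = 2M\, \frac{2^{-(r+1)\alpha}}{1 - 2^{-\alpha}}.$$
Thus, if we set $M' = 2M/(1 - 2^{-\alpha})$, then we get that for $s, t \in \mathcal{D}$
$$|X_s - X_t| \le M' 2^{-(r+1)\alpha} \le M' |t - s|^\alpha.$$
Therefore $(X_t)_{t \in \mathcal{D}}$ is $\alpha$-Hölder continuous a.s.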
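The decomposition of $[s, t)$ into dyadic intervals, with at most two intervals per level, is the combinatorial core of the proof above. A minimal sketch of that construction follows, using exact rationals; the function name `dyadic_decomposition` and the greedy walk away from the intermediate dyadic point are an illustrative way to read off the binary digits, not a procedure stated in the notes.

```python
from fractions import Fraction


def dyadic_decomposition(s, t):
    """Split [s, t), for dyadic rationals 0 <= s < t <= 1, into dyadic
    intervals [d, d + 2^-n). Returns a sorted list of (d, n) pairs."""
    s, t = Fraction(s), Fraction(t)
    # r is the unique integer with 2^-(r+1) < t - s <= 2^-r.
    r = 0
    while Fraction(1, 2 ** (r + 1)) >= t - s:
        r += 1
    # Pick a multiple of 2^-(r+1) strictly between s and t.
    step = Fraction(1, 2 ** (r + 1))
    beta = (s // step + 1) * step
    assert s < beta < t
    pieces = []
    # Walk right from beta to t: one interval per binary digit of t - beta.
    left, n = beta, r + 1
    while left < t:
        length = Fraction(1, 2 ** n)
        if left + length <= t:
            pieces.append((left, n))
            left += length
        n += 1
    # Walk left from beta to s: one interval per binary digit of beta - s.
    right, n = beta, r + 1
    while right > s:
        length = Fraction(1, 2 ** n)
        if right - length >= s:
            right -= length
            pieces.append((right, n))
        n += 1
    return sorted(pieces)


for d, n in dyadic_decomposition(Fraction(1, 8), Fraction(13, 16)):
    print(f"[{d}, {d + Fraction(1, 2 ** n)})  level {n}")
```

Each level $n \ge r+1$ is used at most once on each side of the intermediate point, which gives the factor 2 in front of the geometric sum.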
4 Weak convergence
4.1 Definitions
Let $(M, d)$ be a metric space endowed with its Borel $\sigma$-algebra. All the measures that we will consider in this section will be measures on such a measurable space.
Definition 4.1. Let $(\mu_n, n \ge 0)$ be a sequence of probability measures on a metric space $(M, d)$. We say that $\mu_n$ converges weakly to $\mu$ and write $\mu_n \Rightarrow \mu$ if
$$\mu_n(f) \to \mu(f) \quad \text{as } n \to \infty$$
for all bounded continuous functions $f$ on $M$, where $\mu(f) = \int_M f \, d\mu$.
Notice that by the definition $\mu$ is also a probability measure, since $\mu(1) = 1$.
Example 4.2. Let $(x_n)_{n \ge 0}$ be a sequence in a metric space $M$ that converges to $x$ as $n \to \infty$. Then $\delta_{x_n}$ converges weakly to $\delta_x$ as $n \to \infty$, since if $f$ is any continuous function, then $f(x_n) \to f(x)$ as $n \to \infty$.
Example 4.3. Let $M = [0, 1]$ with the Euclidean metric and $\mu_n = n^{-1} \sum_{0 \le k \le n-1} \delta_{k/n}$. Then $\mu_n(f)$ is the Riemann sum $n^{-1} \sum_{0 \le k \le n-1} f(k/n)$, and it converges to $\int_0^1 f(x) \, dx$ if $f$ is continuous, which shows that $\mu_n$ converges weakly to Lebesgue measure on $[0, 1]$.
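Remark 4.4. Notice that if $A$ is a Borel set, then it is not always true that $\mu_n(A) \to \mu(A)$ as $n \to \infty$ when $\mu_n \Rightarrow \mu$. Indeed, let $x_n = 1/n$ and $\mu_n = \delta_{x_n}$. Then $\mu_n \Rightarrow \delta_0$, but $\mu_n(A) = 1$ for all $n \ge 2$ when $A$ is the open set $(0, 1)$, while $\delta_0(A) = 0$.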
Theorem 4.5. Let $(\mu_n)_{n \ge 0}$ be a sequence of probability measures. The following are equivalent:
(a) $\mu_n \Rightarrow \mu$ as $n \to \infty$,
(b) $\liminf_n \mu_n(G) \ge \mu(G)$ for all open sets $G$,
(c) $\limsup_n \mu_n(A) \le \mu(A)$ for all closed sets $A$,
(d) $\lim_n \mu_n(A) = \mu(A)$ for all sets $A$ with $\mu(\partial A) = 0$.
Proof. (a) $\Rightarrow$ (b). Let $G$ be an open set with non-empty complement $G^c$. For every positive $M$ we define
$$f_M(x) = 1 \wedge (M d(x, G^c)).$$
Then $f_M$ is a continuous and bounded function and for all $M$ we have $f_M(x) \le \mathbf{1}(x \in G)$. Also $f_M \uparrow \mathbf{1}(G)$ as $M \to \infty$, since $G^c$ is a closed set. Since $f_M$ is continuous and bounded we have
$$\mu_n(f_M) \to \mu(f_M) \quad \text{as } n \to \infty.$$
Hence
$$\liminf_n \mu_n(G) \ge \liminf_n \mu_n(f_M) = \mu(f_M).$$
Now using monotone convergence as $M \to \infty$ we get
$$\liminf_n \mu_n(G) \ge \mu(G).$$
(b) $\Leftrightarrow$ (c). This is obvious by taking complements.
(b),(c) $\Rightarrow$ (d). Let $\mathring{A}$ and $\bar{A}$ denote the interior and the closure of the set $A$ respectively. Since
$$\mu(\partial A) = \mu(\bar{A} \setminus \mathring{A}) = 0,$$
we get that $\mu(\bar{A}) = \mu(A) = \mu(\mathring{A})$. Hence,
$$\limsup_n \mu_n(\bar{A}) \le \mu(A) \le \liminf_n \mu_n(\mathring{A}),$$
and since $\mathring{A} \subseteq A \subseteq \bar{A}$ this gives the result.
(d) $\Rightarrow$ (a). Let $f : M \to \mathbb{R}_+$ be a continuous bounded non-negative function. Using Fubini's theorem we get
$$\int_M f(x) \, \mu_n(dx) = \int_M \mu_n(dx) \int_0^\infty \mathbf{1}(t \le f(x)) \, dt = \int_0^K \mu_n(f \ge t) \, dt,$$
where $K$ is an upper bound for $f$. We will now show that for Lebesgue almost all $t$ we have
$$\mu(\partial \{f \ge t\}) = 0. \tag{4.1}$$
Notice that $\partial \{f \ge t\} \subseteq \{f = t\}$, since $\{f \ge t\}$ is a closed set by the continuity of $f$ and $\{f > t\}$ is an open set contained in the interior. However, there can be at most a countable set of numbers $t$ such that $\mu(f = t) > 0$, because
$$\{t : \mu(f = t) > 0\} = \bigcup_{n \ge 1} \{t : \mu(f = t) \ge n^{-1}\}$$
and the $n$-th set on the right has at most $n$ elements. Hence this proves (4.1).
Therefore by (d) and dominated convergence on $\int_0^K \mu_n(f \ge t) \, dt$ we get that
$$\mu_n(f) \to \mu(f) \quad \text{as } n \to \infty.$$
The extension to the case of a function $f$ not necessarily positive is immediate.
For a finite non-negative measure $\mu$ on $\mathbb{R}$ we define its distribution function
$$F_\mu(x) = \mu((-\infty, x]), \quad x \in \mathbb{R}.$$
As a consequence of the theorem above we will now prove the following:
Proposition 4.6. Let $(\mu_n)_n$ be a sequence of probability measures on $\mathbb{R}$. The following are equivalent:
(a) $\mu_n$ converges weakly to $\mu$ as $n \to \infty$,
(b) for every $x \in \mathbb{R}$ such that $F_\mu$ is continuous at $x$, the distribution functions $F_{\mu_n}(x)$ converge to $F_\mu(x)$ as $n \to \infty$.
Proof. (a) $\Rightarrow$ (b). Let $x$ be a continuity point of $F_\mu$. Then
$$\mu(\partial(-\infty, x]) = \mu(\{x\}) = \mu((-\infty, x]) - \lim_n \mu((-\infty, x - 1/n]) = F_\mu(x) - \lim_n F_\mu(x - 1/n) = 0,$$
since $x$ is a continuity point of $F_\mu$. Thus we get that
$$F_{\mu_n}(x) = \mu_n((-\infty, x]) \to \mu((-\infty, x]),$$
by the fourth equivalence in Theorem 4.5.
(b) $\Rightarrow$ (a). First of all note that a distribution function is increasing, and hence has only countably many points of discontinuity.
Let $G$ be an open set in $\mathbb{R}$. Then we can write $G = \bigcup_k (a_k, b_k)$, where the intervals $(a_k, b_k)$ are disjoint. We thus have that $\mu_n(G) = \sum_k \mu_n((a_k, b_k))$. For each interval $(a, b)$ we have
$$\mu_n((a, b)) = F_{\mu_n}(b-) - F_{\mu_n}(a) \ge F_{\mu_n}(b') - F_{\mu_n}(a'),$$
where $a', b'$ are continuity points of $F_\mu$ (remember there are only countably many points of discontinuity, so the set of continuity points is dense) satisfying
$$a < a' < b' < b.$$
Therefore
$$\liminf_n \mu_n((a, b)) \ge F_\mu(b') - F_\mu(a') = \mu((a', b')),$$
and hence if we let $a' \downarrow a$ and $b' \uparrow b$ along continuity points of $F_\mu$, then
$$\liminf_n \mu_n((a, b)) \ge \mu((a, b)). \tag{4.2}$$
Finally we deduce
$$\liminf_n \mu_n(G) = \liminf_n \sum_k \mu_n((a_k, b_k)) \ge \sum_k \liminf_n \mu_n((a_k, b_k)) \ge \sum_k \mu((a_k, b_k)) = \mu(G),$$
where the first inequality follows from Fatou's lemma and the second one from (4.2).
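The restriction to continuity points in Proposition 4.6 is visible already for point masses. In the sketch below, the measures are $\delta_{1/n}$ and the limit is $\delta_0$; the helper name `cdf_point_mass` is an illustrative assumption.

```python
def cdf_point_mass(a):
    """Distribution function x -> delta_a((-inf, x]) of the point mass at a."""
    return lambda x: 1.0 if x >= a else 0.0


F_limit = cdf_point_mass(0.0)

for x in (-0.5, 0.0, 0.5):
    values = [cdf_point_mass(1.0 / n)(x) for n in (1, 10, 100, 1000)]
    print(x, values, "limit cdf:", F_limit(x))
```

At the continuity points $x = \pm 0.5$ the values settle at the limit, while at the jump $x = 0$ they stay at 0 although the limiting distribution function equals 1 there.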
Definition 4.7. Let $(X_n)_n$ be a sequence of random variables taking values in a metric space $(M, d)$ but defined on possibly different probability spaces $(\Omega_n, \mathcal{F}_n, P_n)$. We say that $X_n$ converges in distribution to a random variable $X$ defined on the probability space $(\Omega, \mathcal{F}, P)$ if the law of $X_n$ converges weakly to the law of $X$ as $n \to \infty$. Equivalently, if for all functions $f : M \to \mathbb{R}$ continuous and bounded
$$E_{P_n}[f(X_n)] \to E_P[f(X)] \quad \text{as } n \to \infty.$$
Proposition 4.8. (a) Let $(X_n)_n$ be a sequence of random variables that converges to $X$ in probability as $n \to \infty$. Then $X_n$ converges to $X$ in distribution as $n \to \infty$.
(b) Let $(X_n)_n$ be a sequence of random variables that converges to a constant $c$ in distribution as $n \to \infty$. Then $X_n$ converges to $c$ in probability as $n \to \infty$.
Proof. See example sheet.
Example 4.9. [Central limit theorem] Let $(X_n)_n$ be a sequence of i.i.d. random variables in $\mathcal{L}^2$ with $m = E[X_1]$ and $\sigma^2 = \mathrm{var}(X_1)$. We set $S_n = X_1 + \ldots + X_n$. Then the central limit theorem states that the normalized sums $(S_n - nm)/(\sigma \sqrt{n})$ converge in distribution to a Gaussian $\mathcal{N}(0, 1)$ random variable as $n \to \infty$.
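For one concrete law the statement of Example 4.9 can be checked without simulation. The sketch below takes fair Bernoulli summands, so that $S_n$ is binomial and its distribution function is an exact finite sum; the choice of the Bernoulli(1/2) law and the function names are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normalized_binomial_cdf(n, x):
    """P((S_n - n*m) / (sigma * sqrt(n)) <= x) for S_n ~ Binomial(n, 1/2),
    where m = 1/2 and sigma = 1/2."""
    threshold = 0.5 * n + 0.5 * x * math.sqrt(n)
    k_max = min(math.floor(threshold), n)
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2 ** n


for n in (10, 100, 1000):
    print(n, normalized_binomial_cdf(n, 1.0), normal_cdf(1.0))
```

Every real number is a continuity point of the Gaussian distribution function, so by Proposition 4.6 pointwise convergence of these values at every $x$ is equivalent to convergence in distribution.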
4.2 Tightness
Definition 4.10. A sequence of probability measures $(\mu_n)_n$ on a metric space $M$ is said to be tight if for every $\varepsilon > 0$ there exists a compact subset $K \subseteq M$ such that
$$\sup_n \mu_n(M \setminus K) \le \varepsilon.$$
Remark 4.11. Note that if a metric space $M$ is compact, then every sequence of probability measures is tight.
Theorem 4.12. [Prohorov's theorem] Let $(\mu_n)_n$ be a tight sequence of probability measures on a metric space $M$. Then there exist a subsequence $(n_k)$ and a probability measure $\mu$ on $M$ such that
$$\mu_{n_k} \Rightarrow \mu.$$
Proof. We will prove the theorem in the case when $M = \mathbb{R}$. Let $F_n = F_{\mu_n}$ be the distribution function corresponding to the measure $\mu_n$. We will first show that there exist a subsequence $n_k$ and a non-decreasing function $F$ such that $F_{n_k}(x)$ converges to $F(x)$ for all $x \in \mathbb{Q}$. To prove that we will use a standard extraction argument.
Let $(x_1, x_2, \ldots)$ be an enumeration of $\mathbb{Q}$. Then $(F_n(x_1))_n$ is a sequence in $[0, 1]$, and hence it has a converging subsequence. Let the converging subsequence be $F_{n_k^{(1)}}(x_1)$ and the limit $F(x_1)$. Then $(F_{n_k^{(1)}}(x_2))_k$ is a sequence in $[0, 1]$ and thus also has a converging subsequence. If we continue in this way, we get for each $i \ge 1$ a sequence $n_k^{(i)}$ so that $F_{n_k^{(i)}}(x_j)$ converges to a limit $F(x_j)$ for all $j = 1, \ldots, i$. Then the diagonal sequence $m_k = n_k^{(k)}$ satisfies that $F_{m_k}(x)$ converges for all $x \in \mathbb{Q}$ to $F(x)$ as $k \to \infty$. Since the distribution functions $F_n(x)$ are non-decreasing in $x$, we get that $F(x)$ is also non-decreasing in $x$.
By the monotonicity of $F$ we can define for all $x \in \mathbb{R}$
$$F(x) = \lim_{q \downarrow x,\ q \in \mathbb{Q}} F(q).$$
The definition of $F$ gives that it is right continuous and the monotonicity property gives that left limits exist, hence $F$ is cadlag.
We will next show that if $t$ is a point of continuity of $F$, i.e. $F(t-) = F(t)$, then
$$\lim_{k \to \infty} F_{m_k}(t) = F(t).$$
Let $\varepsilon > 0$ and let $s_1 < t < s_2$ with $s_1, s_2 \in \mathbb{Q}$ be such that $|F(s_i) - F(t)| < \varepsilon/2$ for $i = 1, 2$. Note that such rational numbers $s_1$ and $s_2$ exist since $t$ is a continuity point of $F$. Then using the monotonicity property of $F_{m_k}$ we get that for $k$ large enough
$$F(t) - \varepsilon < F(s_1) - \frac{\varepsilon}{2} < F_{m_k}(s_1) \le F_{m_k}(t) \le F_{m_k}(s_2) < F(s_2) + \frac{\varepsilon}{2} < F(t) + \varepsilon.$$
By tightness, for every $\varepsilon > 0$ there exists $N$ such that
$$\mu_n([-N, N]^c) \le \varepsilon \quad \forall n.$$
Note that we can choose $N$ so that both $N$ and $-N$ are continuity points of $F$ ($F$ is monotone). Therefore it follows that
$$F(-N) \le \varepsilon \quad \text{and} \quad 1 - F(N) \le \varepsilon.$$
Hence we see that
$$\lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1.$$
Finally we need to show that there exists a measure $\mu$ such that $F = F_\mu$. To this end, we define
$$\mu((a, b]) = F(b) - F(a).$$
Then $\mu$ can be extended to a Borel probability measure by Carathéodory's extension theorem and $F = F_\mu$. Another way to construct the measure $\mu$ is given in [2, Section 3.12].
Proposition 4.6 now finishes the proof.
4.3 Characteristic functions
Definition 4.13. Let $X$ be a random variable taking values in $\mathbb{R}^d$ with law $\mu$. We define the characteristic function $\varphi = \varphi_X$ by
$$\varphi(u) = E[e^{i \langle u, X \rangle}] = \int_{\mathbb{R}^d} e^{i \langle u, x \rangle} \, \mu(dx), \quad u \in \mathbb{R}^d.$$
Remark 4.14. The characteristic function $\varphi$ of a random variable $X$ is clearly a continuous function on $\mathbb{R}^d$ and $\varphi(0) = 1$.
The characteristic function $\varphi_X$ determines the law of a random variable $X$, in the sense that if $\varphi_X(u) = \varphi_Y(u)$ for all $u$, then $\mathcal{L}(X) = \mathcal{L}(Y)$. To prove this see the Probability and Measure notes by James Norris, Theorem 7.2.2.
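Theorem 4.15. [Lévy's convergence theorem] Let $(X_n)_{n \ge 0}$, $X$ be random variables in $\mathbb{R}^d$. Then
$$\mathcal{L}(X_n) \to \mathcal{L}(X) \quad \text{if and only if} \quad \varphi_{X_n}(\xi) \to \varphi_X(\xi) \quad \forall \xi \in \mathbb{R}^d.$$
We will prove the more general result:
Theorem 4.16. [Lévy] 1. If $\mathcal{L}(X_n) \to \mathcal{L}(X)$ as $n \to \infty$, then $\varphi_{X_n}(\xi) \to \varphi_X(\xi)$ as $n \to \infty$ for all $\xi \in \mathbb{R}^d$.
2. If $(X_n)_{n \ge 0}$ is a sequence of random variables in $\mathbb{R}^d$ such that there exists $\psi : \mathbb{R}^d \to \mathbb{C}$ continuous at 0 with $\psi(0) = 1$ and such that $\varphi_{X_n}(\xi) \to \psi(\xi)$ as $n \to \infty$ for all $\xi \in \mathbb{R}^d$, then $\psi = \varphi_X$ for some $X$ and $\mathcal{L}(X_n) \to \mathcal{L}(X)$ as $n \to \infty$.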
Before giving the proof of Lévy's theorem we state and prove a useful lemma:
Lemma 4.17. If $X$ is a random variable in $\mathbb{R}^d$, then for all $K > 0$
$$P(\|X\|_\infty > K) \le C (K/2)^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re \varphi_X(\xi)) \, d\xi,$$
where $C = (1 - \sin 1)^{-1}$.
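Proof. Let $\mu$ be the distribution of $X$ and let $\lambda > 0$. Then by Fubini's theorem we have
$$\int_{[-\lambda, \lambda]^d} \Re \varphi_X(u) \, du = \int_{[-\lambda, \lambda]^d} \Re \left( \int e^{i \langle u, x \rangle} \, d\mu(x) \right) du = \Re \int \prod_{j=1}^d \int_{[-\lambda, \lambda]} e^{i u_j x_j} \, du_j \, d\mu(x)$$
$$= \Re \int \prod_{j=1}^d \left( \frac{1}{i x_j} \left( e^{i \lambda x_j} - e^{-i \lambda x_j} \right) \right) d\mu(x) = \int \prod_{j=1}^d \left( \frac{2 \sin(\lambda x_j)}{x_j} \right) d\mu(x).$$
Therefore we have
$$\frac{1}{\lambda^d} \int_{[-\lambda, \lambda]^d} (1 - \Re \varphi_X(u)) \, du = 2^d \int_{\mathbb{R}^d} d\mu(x) \left( 1 - \prod_{j=1}^d \frac{\sin(\lambda x_j)}{\lambda x_j} \right). \tag{4.3}$$
It is easy to check that if $x \ge 1$, then
$$|\sin x| \le x \sin 1,$$
and hence the function $f : \mathbb{R}^d \to \mathbb{R}$ given by $f(u) = \prod_{j=1}^d \sin u_j / u_j$ satisfies $|f(u)| \le \sin 1$ when $\|u\|_\infty \ge 1$. Thus for $C = (1 - \sin 1)^{-1}$ we have
$$\mathbf{1}(\|u\|_\infty \ge 1) \le C (1 - f(u)).$$
Hence, we have
$$P(\|X\|_\infty \ge K) = P\left( \left\| \frac{X}{K} \right\|_\infty \ge 1 \right) \le C\, E\left[ 1 - \prod_{j=1}^d \frac{\sin(K^{-1} X_j)}{K^{-1} X_j} \right] = C \int_{\mathbb{R}^d} d\mu(x) \left( 1 - \prod_{j=1}^d \frac{\sin(K^{-1} x_j)}{K^{-1} x_j} \right).$$
Equation (4.3) with $\lambda = K^{-1}$ now finishes the proof.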
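The pointwise bound $\mathbf{1}(\|u\|_\infty \ge 1) \le C(1 - f(u))$ that drives the lemma can be checked on a grid in dimension $d = 1$. The grid range, the small floating-point tolerance and the helper names below are illustrative assumptions.

```python
import math

C = 1.0 / (1.0 - math.sin(1.0))


def sinc(u):
    """sin(u) / u, extended by continuity to the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u


def bound_holds(u, tolerance=1e-12):
    """Check 1(|u| >= 1) <= C * (1 - sin(u)/u), up to rounding error."""
    indicator = 1.0 if abs(u) >= 1.0 else 0.0
    return indicator <= C * (1.0 - sinc(u)) + tolerance


grid = [k / 100.0 for k in range(-5000, 5001)]
print(all(bound_holds(u) for u in grid))
```

Equality holds at $|u| = 1$, which is why the constant $C = (1 - \sin 1)^{-1}$ cannot be lowered in this argument.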
Proof of Theorem 4.16. 1. If $X_n$ converges in distribution to $X$ as $n \to \infty$, then for all $f$ continuous and bounded, writing $\mu_n = \mathcal{L}(X_n)$ and $\mu = \mathcal{L}(X)$, we have
$$\mu_n(f) \to \mu(f) \quad \text{as } n \to \infty.$$
Take $f(x) = e^{i \langle \xi, x \rangle}$. Then $f$ is clearly continuous and bounded, and hence
$$\varphi_{X_n}(\xi) = \mu_n(e^{i \langle \xi, \cdot \rangle}) \to \mu(e^{i \langle \xi, \cdot \rangle}) = \varphi_X(\xi).$$
2. We will first show that the sequence $(\mathcal{L}(X_n))$ is tight. From Lemma 4.17 we have that for all $K > 0$
$$P(\|X_n\|_\infty > K) \le C_d K^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re \varphi_{X_n}(u)) \, du,$$
where $C_d = C 2^{-d}$. By the assumption and since $|1 - \Re \varphi_{X_n}(u)| \le 2$ for all $n$, using the dominated convergence theorem we have
$$\lim_{n \to \infty} K^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re \varphi_{X_n}(u)) \, du = K^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re \psi(u)) \, du.$$
Since $\psi$ is continuous at 0 with $\psi(0) = 1$, if we take $K$ large enough we can make this limit $< \varepsilon/(2 C_d)$, and so for all $n$ large enough
$$P(\|X_n\|_\infty > K) \le \varepsilon.$$
If we now take $K$ even larger, then the above inequality holds for all $n$, showing the tightness of the family $(\mathcal{L}(X_n))$.
By Prohorov's theorem there exists a subsequence $(X_{n_k})$ that converges in distribution to some random variable $X$. So $\varphi_{X_{n_k}}$ converges pointwise to $\varphi_X$, and hence $\varphi_X = \psi$, which shows that $\psi$ is a characteristic function.
We will finally show that $X_n$ converges in distribution to $X$. If not, then there would exist a subsequence $(m_k)$ and a continuous and bounded function $f$ such that for some $\varepsilon > 0$ and all $k$
$$|E[f(X_{m_k})] - E[f(X)]| > \varepsilon. \tag{4.4}$$
But since the laws of $(X_{m_k})$ are tight, we can extract a subsequence $(\ell_k)$ along which $(X_{\ell_k})$ converges in distribution to some $Y$, which would imply that $\psi = \varphi_Y$, and thus $Y$ would have the same distribution as $X$, contradicting (4.4).
5 Large deviations
5.1 Introduction
Let $(X_i)$ be a sequence of i.i.d. random variables with $E[X_1] = \bar{x}$ and set $S_n = \sum_{i=1}^n X_i$. By the central limit theorem (assuming $\mathrm{var}(X_i) = \sigma^2 < \infty$) we have
$$P(S_n \ge n \bar{x} + a \sigma \sqrt{n}) \to P(Z \ge a) \quad \text{as } n \to \infty,$$
where $Z \sim \mathcal{N}(0, 1)$.
Large deviations: What are the asymptotics of $P(S_n \ge an)$ as $n \to \infty$, for $a > \bar{x}$?
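Example 5.1. Let $(X_i)$ be i.i.d. with distribution $\mathcal{N}(0, 1)$ and let $a > 0$. Then
$$P(S_n \ge an) = P(X_1 \ge a \sqrt{n}) \sim \frac{1}{a \sqrt{2 \pi n}}\, e^{-a^2 n / 2},$$
where we write $f(x) \sim g(x)$ if $f(x)/g(x) \to 1$ as $x \to \infty$. So
$$-\frac{1}{n} \log P(S_n \ge an) \to I(a) = \frac{a^2}{2} \quad \text{as } n \to \infty.$$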
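The limit in Example 5.1 can be observed numerically, since the Gaussian tail is available through the complementary error function. The function names below are illustrative assumptions, and $n$ is kept moderate so that the tail probability stays within double-precision range.

```python
import math


def gaussian_tail(x):
    """P(Z >= x) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def rate_estimate(a, n):
    """-(1/n) log P(S_n >= a n) for i.i.d. N(0, 1) summands, using
    P(S_n >= a n) = P(Z >= a sqrt(n))."""
    return -math.log(gaussian_tail(a * math.sqrt(n))) / n


a = 1.0
for n in (10, 100, 1000):
    print(n, rate_estimate(a, n), a * a / 2)
```

The gap to $a^2/2$ decays like $\log(n)/n$, reflecting the polynomial prefactor $1/(a\sqrt{2\pi n})$ in the asymptotic formula.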
In general we have
$$P(S_{n+m} \ge a(n+m)) \ge P(S_n \ge an)\, P(S_m \ge am),$$
so $b_n = -\log P(S_n \ge an)$ satisfies
$$b_{n+m} \le b_n + b_m,$$
and hence this implies the existence of the limit (exercise)
$$\lim_{n \to \infty} \frac{b_n}{n} = -\lim_{n \to \infty} \frac{1}{n} \log P(S_n \ge an) = I(a).$$
Note that if $P(X_1 \le a_0) = 1$, then we will only consider $a \le a_0$, since clearly $P(S_n \ge na) = 0$ for $a > a_0$.
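5.2 Cramér's theorem
We will now obtain a bound for $P(S_n \ge na)$ using the moment generating function of $X_1$. For $\lambda \ge 0$ we set
$$M(\lambda) = E[e^{\lambda X_1}],$$
which could also be infinite. We define
$$\psi(\lambda) = \log M(\lambda).$$
Note that $\psi(0) = 0$ and by Markov's inequality for $\lambda \ge 0$
$$P(S_n \ge na) = P(e^{\lambda S_n} \ge e^{\lambda n a}) \le e^{-\lambda n a} E[e^{\lambda S_n}] = \left( e^{-\lambda a} M(\lambda) \right)^n = \exp(-n(\lambda a - \psi(\lambda))). \tag{5.1}$$
We now define the Legendre transform of $\psi$:
$$\psi^*(a) = \sup_{\lambda \ge 0} (\lambda a - \psi(\lambda)) \ge -\psi(0) = 0.$$
Then (5.1) yields
$$P(S_n \ge an) \le e^{-n \psi^*(a)} \quad \forall n,$$
whence
$$\liminf_{n \to \infty} -\frac{1}{n} \log P(S_n \ge an) \ge \psi^*(a). \tag{5.2}$$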
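The bound $P(S_n \ge an) \le e^{-n\psi^*(a)}$ can be compared with an exact tail in a case where both sides are computable. The sketch below assumes Bernoulli(1/2) summands, an illustrative choice not taken from the notes, for which the Legendre transform has the well-known closed form $a \log(a/p) + (1-a)\log((1-a)/(1-p))$ when $p \le a < 1$.

```python
import math


def psi_star_bernoulli(a, p=0.5):
    """Legendre transform of psi(lam) = log(1 - p + p e^lam) at a in [p, 1)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def exact_tail(n, a, p=0.5):
    """P(S_n >= n a) for S_n ~ Binomial(n, p), as a finite sum."""
    k_min = math.ceil(n * a)
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


a = 0.75
for n in (20, 40, 80, 160):
    tail = exact_tail(n, a)
    bound = math.exp(-n * psi_star_bernoulli(a))
    print(n, tail, bound, -math.log(tail) / n)
```

The last printed column approaches $\psi^*(0.75)$ from above as $n$ grows, which is the content of Cramér's theorem for this law.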
Theorem 5.2. [Cramér's theorem] Let $(X_i)$ be i.i.d. random variables with $E[X_1] = \bar{x}$ and $S_n = \sum_{i=1}^n X_i$. Then
$$\lim_{n \to \infty} -\frac{1}{n} \log P(S_n \ge na) = \psi^*(a) \quad \text{for } a \ge \bar{x}.$$
Before proving the theorem we state and prove a preliminary lemma.
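Lemma 5.3. The functions $M(\lambda)$ and $\psi(\lambda)$ are continuous in $D = \{\lambda : M(\lambda) < \infty\}$ and differentiable in the interior $\mathring{D}$ with
$$M'(\lambda) = E[X_1 e^{\lambda X_1}] \quad \text{and} \quad \psi'(\lambda) = \frac{M'(\lambda)}{M(\lambda)} \quad \text{for } \lambda \in \mathring{D}.$$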
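Proof. Continuity follows immediately from the dominated convergence theorem.
Note that $D$ is a (possibly infinite) interval, i.e. if $\lambda_1 < \lambda < \lambda_2$ and $\lambda_1, \lambda_2 \in D$, then also $\lambda \in D$, since $e^{\lambda x} \le e^{\lambda_1 x} + e^{\lambda_2 x}$ for all $x$.
To show that $M$ is differentiable, note that
$$\frac{M(\lambda + h) - M(\lambda)}{h} = E\left[ \frac{e^{(\lambda + h) X} - e^{\lambda X}}{h} \right]$$
and for $2|h| < \min_{i=1,2} |\lambda - \lambda_i| = 2\delta$ we have
$$\left| \frac{e^{(\lambda + h) x} - e^{\lambda x}}{h} \right| = |x| e^{\tilde{\lambda} x} \le \delta^{-1} \left( e^{\lambda_1 x} + e^{\lambda_2 x} \right),$$
where $\tilde{\lambda}$ is in $[\lambda_1, \lambda_2]$ if $|h| < \min_i |\lambda - \lambda_i| = 2\delta$. The right-hand side is integrable, so the dominated convergence theorem gives the formula for $M'$, and the formula for $\psi'$ follows from the chain rule.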
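Proof of Theorem 5.2. The direction
$$\liminf_{n \to \infty} -\frac{1}{n} \log P(S_n \ge na) \ge \psi^*(a)$$
follows from (5.2).
Replacing $X_i$ by $\tilde{X}_i = X_i - a$ yields
$$P(S_n \ge na) = P(\tilde{S}_n \ge 0)$$
and $\tilde{M}(\lambda) = E[e^{\lambda \tilde{X}_1}] = e^{-\lambda a} M(\lambda)$, so $\tilde{\psi}(\lambda) = \log \tilde{M}(\lambda) = \psi(\lambda) - \lambda a$. Thus we need to show that
$$-\frac{1}{n} \log P(\tilde{S}_n \ge 0) \to \tilde{\psi}^*(0) = \sup_{\lambda \ge 0} [-\tilde{\psi}(\lambda)].$$
In view of (5.2) what remains is (dropping tildes)
$$\liminf_{n \to \infty} \frac{1}{n} \log P(S_n \ge 0) \ge \inf_{\lambda \ge 0} \psi(\lambda) \tag{5.3}$$
when $\bar{x} < 0$.
If $P(X_1 \le 0) = 1$, then
$$\inf_{\lambda \ge 0} \psi(\lambda) \le \lim_{\lambda \to \infty} \psi(\lambda) = \log \mu(\{0\}),$$
where $\mu = \mathcal{L}(X_1)$, so (5.3) holds in this case. Thus we may assume that $P(X_1 > 0) > 0$.
Next consider the case $M(\lambda) < \infty$ for all $\lambda$. Define a new law $\mu_\theta$ where
$$\frac{d\mu_\theta}{d\mu}(x) = \frac{e^{\theta x}}{M(\theta)}, \quad \text{so} \quad E_\theta[f(X_1)] = \int f(x)\, \frac{e^{\theta x}}{M(\theta)} \, d\mu(x).$$
More generally
$$E_\theta[F(X_1, \ldots, X_n)] = \int F(x_1, \ldots, x_n) \prod_{i=1}^n e^{\theta x_i}\, \frac{d\mu(x_1) \ldots d\mu(x_n)}{M(\theta)^n}$$
holds when $F(x_1, \ldots, x_n) = \prod_{i=1}^n f_i(x_i)$, and hence for all bounded measurable $F$.
The dominated convergence theorem gives that $g(\theta) = E_\theta[X_1]$ is continuous and $g(0) = \bar{x} < 0$, while
$$\lim_{\theta \to \infty} g(\theta) = \lim_{\theta \to \infty} \frac{\int x e^{\theta x} \, d\mu}{\int e^{\theta x} \, d\mu} > 0,$$
since $\mu(0, \infty) > 0$. Thus we can find $\theta > 0$ such that $E_\theta[X_1] = 0$.
We now have, for $\varepsilon > 0$,
$$P(S_n \ge 0) \ge P(S_n \in [0, \varepsilon n]) \ge E\left[ e^{\theta (S_n - \varepsilon n)} \mathbf{1}(S_n \in [0, \varepsilon n]) \right] = M(\theta)^n P_\theta(S_n \in [0, \varepsilon n])\, e^{-\theta \varepsilon n}.$$
By the central limit theorem we have that $P_\theta(S_n \in [0, \varepsilon n]) \to 1/2$ as $n \to \infty$, so
$$\liminf_{n \to \infty} \frac{1}{n} \log P(S_n \ge 0) \ge \psi(\theta) - \theta \varepsilon.$$
Letting $\varepsilon \to 0$ proves (5.3) in the case where $M(\lambda) < \infty$ for all $\lambda$.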
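Now we are going to prove the theorem in the general case. Let $\mu_n = \mathcal{L}(S_n)$, let $\nu$ be the law of $X_1$ conditioned on $\{|X_1| \le K\}$, and let $\nu_n$ be the law of $S_n = \sum_{i=1}^n X_i$ conditioned on the event $\bigcap_{i=1}^n \{|X_i| \le K\}$. Then we have that $\mu_n[0, \infty) \ge \nu_n[0, \infty)\, \mu[-K, K]^n$. We write
$$\psi_K(\lambda) = \log \int_{-K}^{K} e^{\lambda x} \, d\mu(x)$$
and observe that
$$\log \int_{-\infty}^{\infty} e^{\lambda x} \, d\nu(x) = \psi_K(\lambda) - \log \mu[-K, K].$$
Therefore
$$\liminf \frac{1}{n} \log \mu_n[0, \infty) \ge \log \mu[-K, K] + \liminf \frac{1}{n} \log \nu_n[0, \infty) \ge \inf_{\lambda \ge 0} \psi_K(\lambda) = J_K.$$
Note that $\psi_K \uparrow \psi$ as $K \uparrow \infty$, so $J_K \uparrow J$ as $K \to \infty$, for some $J$, and
$$\liminf_{n \to \infty} \frac{1}{n} \log \mu_n[0, \infty) \ge J. \tag{5.4}$$
Since $J_K \le \psi_K(0) \le \psi(0) = 0$, we have $J \le 0$.
For large $K$ we have that $\mu[0, K] > 0$, and hence $J_K > -\infty$, whence $J > -\infty$. By the continuity of $\psi_K$ (Lemma 5.3) the level sets $\{\lambda \ge 0 : \psi_K(\lambda) \le J\}$ are non-empty compact nested sets, so there exists
$$\lambda_0 \in \bigcap_K \{\lambda \ge 0 : \psi_K(\lambda) \le J\}.$$
Therefore we obtain
$$\psi(\lambda_0) = \lim_{K \to \infty} \psi_K(\lambda_0) \le J,$$
and hence by (5.4) we get
$$\liminf_{n \to \infty} \frac{1}{n} \log \mu_n[0, \infty) \ge \psi(\lambda_0) \ge \inf_{\lambda \ge 0} \psi(\lambda)$$
as claimed.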
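5.3 Examples
Example 5.4. Let $X \sim \mathcal{N}(0, 1)$. Then
$$M(\lambda) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{\lambda x - x^2/2} \, dx = e^{\lambda^2/2}, \quad \text{so} \quad \psi(\lambda) = \frac{\lambda^2}{2}.$$
In order to maximize $\lambda a - \psi(\lambda)$ we need to solve $a = \psi'(\lambda) = \lambda$, and hence
$$\psi^*(a) = a^2 - \frac{a^2}{2} = \frac{a^2}{2}.$$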
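Example 5.5. Let $X \sim \mathrm{Exp}(1)$. If $\lambda < 1$, then
$$M(\lambda) = \int_0^\infty e^{\lambda x - x} \, dx = \frac{1}{1 - \lambda}.$$
So for $\lambda < 1$ we have $\psi(\lambda) = -\log(1 - \lambda)$, and for $\lambda \ge 1$ we have $M(\lambda) = \infty$, and thus $\psi(\lambda) = \infty$. Solving $a = \psi'(\lambda)$ gives that $a = \frac{1}{1 - \lambda}$, or equivalently that $\lambda = 1 - \frac{1}{a}$, and hence
$$\psi^*(a) = a - 1 - \log a.$$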
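Example 5.6. Let $X \sim \mathrm{Poisson}(1)$. Then
$$M(\lambda) = \sum_{k=0}^{\infty} \frac{1}{k!}\, e^{\lambda k - 1} = e^{e^\lambda - 1},$$
so $\psi(\lambda) = e^\lambda - 1$. Solving $a = \psi'(\lambda)$ gives that $a = e^\lambda$, and hence
$$\psi^*(a) = a \log a - a + 1.$$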
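The three closed forms in Examples 5.4 to 5.6 can be cross-checked by maximizing $\lambda a - \psi(\lambda)$ over a grid of $\lambda \ge 0$. The grid, the evaluation point $a = 2$ and the function names below are illustrative assumptions; a grid search is a crude substitute for solving $a = \psi'(\lambda)$.

```python
import math


def legendre(psi, a, lam_grid):
    """Approximate psi*(a) = sup_{lam >= 0} (lam * a - psi(lam)) on a grid."""
    return max(lam * a - psi(lam) for lam in lam_grid)


def psi_gauss(lam):
    return lam * lam / 2.0


def psi_exp(lam):
    # Exp(1): the moment generating function is infinite for lam >= 1.
    return -math.log(1.0 - lam) if lam < 1.0 else math.inf


def psi_poisson(lam):
    return math.exp(lam) - 1.0


lam_grid = [k / 1000.0 for k in range(0, 5001)]
a = 2.0
print(legendre(psi_gauss, a, lam_grid), a * a / 2)
print(legendre(psi_exp, a, lam_grid), a - 1 - math.log(a))
print(legendre(psi_poisson, a, lam_grid), a * math.log(a) - a + 1)
```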
6 Brownian motion
6.1 History and definition
Brownian motion is named after R. Brown, who observed in 1827 the erratic motion of small particles in water. A physical model was developed by Einstein in 1905 and the mathematical construction is due to N. Wiener in 1923. He used a random Fourier series to construct Brownian motion. Our treatment follows later ideas of Lévy and Kolmogorov.
Definition 6.1. Let $B = (B_t)_{t \ge 0}$ be a continuous process in $\mathbb{R}^d$. We say that $B$ is a Brownian motion in $\mathbb{R}^d$ started from $x \in \mathbb{R}^d$ if
(i) $B_0 = x$ a.s.,
(ii) $B_t - B_s \sim \mathcal{N}(0, (t - s) I_d)$, for all $s < t$,
(iii) $B$ has independent increments, independent of $B_0$.
Remark 6.2. We say that $(B_t)_{t \ge 0}$ is a standard Brownian motion if $x = 0$.
Conditions (ii) and (iii) uniquely determine the law of a Brownian motion. In the next section we will show that Brownian motion exists.
Example 6.3. Suppose that $(B_t, t \ge 0)$ is a standard Brownian motion and $U$ is an independent random variable uniformly distributed on $[0, 1]$. Then the process $(\tilde{B}_t, t \ge 0)$ defined by
$$\tilde{B}_t = \begin{cases} B_t, & \text{if } t \ne U; \\ 0, & \text{if } t = U \end{cases}$$
has the same finite-dimensional distributions as Brownian motion, but is discontinuous if $B(U) \ne 0$, which happens with probability one, and hence it is not a Brownian motion.
6.2 Wiener's theorem
Theorem 6.4. [Wiener's theorem] There exists a Brownian motion on some probability space.
Proof. We will first prove the theorem in dimension $d = 1$: we will construct a process $(B_t, 0 \le t \le 1)$ and then extend it to the whole of $\mathbb{R}_+$ and to higher dimensions.
Let $\mathcal{D}_0 = \{0, 1\}$ and $\mathcal{D}_n = \{k 2^{-n}, 0 \le k \le 2^n\}$ for $n \ge 1$, and let $\mathcal{D} = \bigcup_{n \ge 0} \mathcal{D}_n$ be the set of dyadic rational numbers in $[0, 1]$. Let $(Z_d, d \in \mathcal{D})$ be a sequence of independent random variables distributed according to $\mathcal{N}(0, 1)$ on some probability space $(\Omega, \mathcal{F}, P)$. We will first construct $(B_d, d \in \mathcal{D})$ inductively.
First set $B_0 = 0$ and $B_1 = Z_1$. Inductively, given that we have constructed $(B_d, d \in \mathcal{D}_{n-1})$ satisfying the conditions of the definition, we build $(B_d, d \in \mathcal{D}_n)$ as follows:
Take $d \in \mathcal{D}_n \setminus \mathcal{D}_{n-1}$ and let $d_- = d - 2^{-n}$ and $d_+ = d + 2^{-n}$, so that $d_-, d_+$ are consecutive dyadic numbers in $\mathcal{D}_{n-1}$. We write
$$B_d = \frac{B_{d_-} + B_{d_+}}{2} + \frac{Z_d}{2^{(n+1)/2}}.$$
Then we have
$$B_d - B_{d_-} = \frac{B_{d_+} - B_{d_-}}{2} + \frac{Z_d}{2^{(n+1)/2}} \quad \text{and} \quad B_{d_+} - B_d = \frac{B_{d_+} - B_{d_-}}{2} - \frac{Z_d}{2^{(n+1)/2}}. \tag{6.1}$$
Setting $N_d = \frac{B_{d_+} - B_{d_-}}{2}$ and $N'_d = \frac{Z_d}{2^{(n+1)/2}}$, we see by the induction hypothesis that $N_d$ and $N'_d$ are independent centred Gaussian random variables with variance $2^{-n-1}$. Therefore
$$\mathrm{Cov}(N_d + N'_d, N_d - N'_d) = \mathrm{var}(N_d) - \mathrm{var}(N'_d) = 0,$$
and hence the two new increments $B_d - B_{d_-}$ and $B_{d_+} - B_d$, being Gaussian, are independent.
Indeed, all increments $(B_d - B_{d - 2^{-n}})$ for $d \in \mathcal{D}_n \setminus \{0\}$ are independent. To see this it suffices to show that they are pairwise independent, as the vector of increments is Gaussian. Above we showed that increments over consecutive intervals are independent. If they are defined over intervals that are not consecutive, then notice that the increment is equal to half the increment of the previous scale plus an independent Gaussian random variable by (6.1), and hence this shows the claimed independence.
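The inductive midpoint construction above translates directly into a short simulation on a finite dyadic grid. The sketch below is only an illustration of the recursion $B_d = (B_{d_-} + B_{d_+})/2 + Z_d / 2^{(n+1)/2}$; the function name `levy_construction`, the array layout and the seed are assumptions, and it stops at a finite level instead of passing to the continuous limit.

```python
import random


def levy_construction(levels, rng):
    """Return [B_{k 2^-levels} for k = 0, ..., 2^levels], built by midpoint
    refinement: B_0 = 0, B_1 = Z_1, then level by level on D_n minus D_{n-1}."""
    size = 2 ** levels
    path = [0.0] * (size + 1)
    path[size] = rng.gauss(0.0, 1.0)            # B_1 = Z_1
    for n in range(1, levels + 1):
        gap = size // 2 ** n                    # index distance equal to 2^-n
        scale = 2.0 ** (-(n + 1) / 2)           # standard deviation of Z_d / 2^((n+1)/2)
        for i in range(gap, size, 2 * gap):     # new points: odd multiples of 2^-n
            midpoint = 0.5 * (path[i - gap] + path[i + gap])
            path[i] = midpoint + scale * rng.gauss(0.0, 1.0)
    return path


rng = random.Random(0)
path = levy_construction(12, rng)
# The sum of squared increments on the finest grid should be close to 1.
print(sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1)))
```

The increments on the finest grid are independent with variance $2^{-\text{levels}}$, so the sum of their squares concentrates around the length of the interval.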
We have thus defined a process $(B_d, d \in \mathcal{D})$ satisfying the properties of Brownian motion. Let $s \le t \in \mathcal{D}$ and notice that for every $p > 0$, since $B_t - B_s \sim \mathcal{N}(0, t - s)$, we have
$$E[|B_t - B_s|^p] = |t - s|^{p/2} E[|N|^p],$$
where $N \sim \mathcal{N}(0, 1)$. Since $N$ has moments of all orders, it follows by Kolmogorov's continuity criterion, Theorem 3.19, that $(B_d, d \in \mathcal{D})$ is $\alpha$-Hölder continuous for all $\alpha < 1/2$ a.s. Hence in order to extend to the whole of $[0, 1]$ we simply let for $t \in [0, 1]$
$$B_t = \lim_{i \to \infty} B_{d_i},$$
where $(d_i)$ is a sequence in $\mathcal{D}$ converging to $t$. It follows easily that $(B_t, t \in [0, 1])$ is $\alpha$-Hölder continuous for all $\alpha < 1/2$ a.s.
Finally we will check that $(B_t, t \in [0, 1])$ has the properties of Brownian motion. We will first prove the independence of the increments property. Let $0 = t_0 < t_1 < \ldots < t_k$ and let $0 = t_0^n \le t_1^n \le \ldots \le t_k^n$ be dyadic rational numbers such that $t_i^n \to t_i$ as $n \to \infty$ for each $i$. By continuity $(B_{t_1^n}, \ldots, B_{t_k^n})$ converges a.s. to $(B_{t_1}, \ldots, B_{t_k})$ as $n \to \infty$, while on the other hand the increments $(B_{t_j^n} - B_{t_{j-1}^n}, 1 \le j \le k)$ are independent Gaussian random variables with variances $(t_j^n - t_{j-1}^n, 1 \le j \le k)$. Then as $n \to \infty$ we have
$$E\left[ \exp\left( i \sum_{j=1}^k u_j (B_{t_j^n} - B_{t_{j-1}^n}) \right) \right] = \prod_{j=1}^k e^{-(t_j^n - t_{j-1}^n) u_j^2 / 2} \to \prod_{j=1}^k e^{-(t_j - t_{j-1}) u_j^2 / 2}.$$
By Lévy's convergence theorem we now see that the increments converge in distribution to independent Gaussian random variables with respective variances $t_j - t_{j-1}$, which is thus the distribution of $(B_{t_j} - B_{t_{j-1}}, 1 \le j \le k)$ as desired.
To finish the proof we will construct Brownian motion indexed by $\mathbb{R}_+$. To this end, take a sequence $(B_t^i, t \in [0, 1])$ for $i = 0, 1, \ldots$ of independent Brownian motions and glue them together, more precisely by
$$B_t = B^{\lfloor t \rfloor}_{t - \lfloor t \rfloor} + \sum_{i=0}^{\lfloor t \rfloor - 1} B^i_1.$$
This defines a continuous random process $B : [0, \infty) \to \mathbb{R}$ and it is easy to see from what we have already shown that $B$ satisfies the properties of a Brownian motion.
Finally, to construct Brownian motion in $\mathbb{R}^d$ we take $d$ independent Brownian motions in 1 dimension, $B^1, \ldots, B^d$, and set $B_t = (B_t^1, \ldots, B_t^d)$. Then it is straightforward to check that $B$ has the required properties.
Remark 6.5. The proof above gives that the Brownian paths are a.s. $\alpha$-Hölder continuous for all $\alpha < 1/2$. However, a.s. there exists no interval $[a, b]$ with $a < b$ such that $B$ is Hölder continuous with exponent $\alpha \ge 1/2$ on $[a, b]$. See example sheet for the last fact.
6.3 Invariance properties
The following invariance properties of Brownian motion will be used a lot.
Proposition 6.6. Let $B$ be a standard Brownian motion in $\mathbb{R}^d$.
1. If $U$ is an orthogonal matrix, then $UB = (U B_t, t \ge 0)$ is again a standard Brownian motion. In particular, $-B$ is a standard Brownian motion.
2. If $\lambda > 0$, then $(\lambda^{-1/2} B_{\lambda t}, t \ge 0)$ is a standard Brownian motion (scaling property).
3. For every $s \ge 0$, the shifted process $(B_{t+s} - B_s, t \ge 0)$ is a standard Brownian motion independent of $\mathcal{F}_s^B$ (simple Markov property).
Theorem 6.7. [Time inversion] Suppose that $(B_t, t \ge 0)$ is a standard Brownian motion. Then the process $(X_t, t \ge 0)$ defined by
$$X_t = \begin{cases} 0, & \text{if } t = 0; \\ t B_{1/t}, & \text{for } t > 0 \end{cases}$$
is also a standard Brownian motion.
Proof. The finite dimensional distributions $(B_{t_1}, \ldots, B_{t_n})$ of Brownian motion are Gaussian random vectors and are therefore characterized by their means $E[B_{t_i}] = 0$ and covariances
$$\mathrm{Cov}(B_{t_i}, B_{t_j}) = t_i \quad \text{for } 0 \le t_i \le t_j.$$
So it suffices to show that the process $X$ is a continuous Gaussian process with the same means and covariances as Brownian motion. Clearly the vector $(X_{t_1}, \ldots, X_{t_n})$ is a centred Gaussian vector. The covariances for $0 < s \le t$ are given by
$$\mathrm{Cov}(X_s, X_t) = st\, \mathrm{Cov}(B_{1/s}, B_{1/t}) = st\, \frac{1}{t} = s.$$
Hence $X$ and $B$ have the same finite marginal distributions. The paths $t \mapsto X_t$ are clearly continuous for $t > 0$, so it remains to show that they are also continuous at $t = 0$. First notice that since $X$ and $B$ have the same finite marginal distributions we get that $(X_t, t \ge 0, t \in \mathbb{Q})$ has the same law as a Brownian motion restricted to the rationals, and hence
$$\lim_{t \downarrow 0,\ t \in \mathbb{Q}} X_t = 0 \quad \text{a.s.}$$
Since $\mathbb{Q}_+$ is dense and $X$ is continuous for $t > 0$ we get that
$$0 = \lim_{t \downarrow 0,\ t \in \mathbb{Q}} X_t = \lim_{t \downarrow 0} X_t \quad \text{a.s.}$$
Corollary 6.8. [Law of large numbers] Almost surely, $\lim_{t \to \infty} \frac{B_t}{t} = 0$.
Proof. Let $X_t$ be as defined in Theorem 6.7. Then
$$\lim_{t \to \infty} \frac{B_t}{t} = \lim_{t \to \infty} X(1/t) = X(0) = 0 \quad \text{a.s.}$$
Remark 6.9. Of course one can show the above result directly using the strong law of large numbers, i.e. $\lim_{n \to \infty} B_n / n = 0$. Then one needs to show that $B$ does not oscillate too much between $n$ and $n + 1$. See example sheet.
Definition 6.10. We define $(\mathcal{F}_t^B, t \ge 0)$ to be the natural filtration of $(B_t, t \ge 0)$ and $\mathcal{F}_s^+$ the slightly augmented $\sigma$-algebra defined by
$$\mathcal{F}_s^+ = \bigcap_{t > s} \mathcal{F}_t^B.$$
Remark 6.11. By the simple Markov property of Brownian motion, $B_{t+s} - B_s$ is independent of $\mathcal{F}_s^B$. Clearly $\mathcal{F}_s^B \subseteq \mathcal{F}_s^+$ for all $s$, since in $\mathcal{F}_s^+$ we allow an additional infinitesimal glance into the future. But the next theorem shows that $B_{t+s} - B_s$ is still independent of $\mathcal{F}_s^+$.
Theorem 6.12. For every $s \ge 0$ the process $(B_{t+s} - B_s, t \ge 0)$ is independent of $\mathcal{F}_s^+$.
Proof. Let $(s_n)$ be a strictly decreasing sequence converging to $s$ as $n \to \infty$. By continuity
$$B_{t+s} - B_s = \lim_{n \to \infty} (B_{s_n + t} - B_{s_n}) \quad \text{a.s.}$$
Let $A \in \mathcal{F}_s^+$ and $t_1, \ldots, t_m \ge 0$. For any $F$ continuous and bounded on $(\mathbb{R}^d)^m$ we have by the dominated convergence theorem
$$E[F((B_{t_1 + s} - B_s, \ldots, B_{t_m + s} - B_s)) \mathbf{1}(A)] = \lim_{n \to \infty} E[F((B_{t_1 + s_n} - B_{s_n}, \ldots, B_{t_m + s_n} - B_{s_n})) \mathbf{1}(A)].$$
Since $A \in \mathcal{F}_s^+$, we have that $A \in \mathcal{F}_{s_n}^B$ for all $n$, and hence by the simple Markov property we obtain that for all $n$
$$E[F((B_{t_1 + s_n} - B_{s_n}, \ldots, B_{t_m + s_n} - B_{s_n})) \mathbf{1}(A)] = P(A)\, E[F((B_{t_1 + s_n} - B_{s_n}, \ldots, B_{t_m + s_n} - B_{s_n}))].$$
Therefore, taking the limit again we deduce that
$$E[F((B_{t_1 + s} - B_s, \ldots, B_{t_m + s} - B_s)) \mathbf{1}(A)] = E[F((B_{t_1 + s} - B_s, \ldots, B_{t_m + s} - B_s))]\, P(A),$$
which proves the claimed independence.
Theorem 6.13. [Blumenthal's 0-1 law] The $\sigma$-algebra $\mathcal{F}_0^+$ is trivial, i.e. if $A \in \mathcal{F}_0^+$, then $P(A) \in \{0, 1\}$.
Proof. Let $A \in \mathcal{F}_0^+$. Then $A \in \sigma(B_t, t \ge 0)$, and hence by Theorem 6.12 we obtain that $A$ is independent of $\mathcal{F}_0^+$, i.e. it is independent of itself:
$$P(A) = P(A \cap A) = P(A)^2,$$
which gives that $P(A) \in \{0, 1\}$.
Theorem 6.14. Suppose that $(B_t)_{t \ge 0}$ is a standard Brownian motion in 1 dimension. Define $\tau = \inf\{t > 0 : B_t > 0\}$ and $\sigma = \inf\{t > 0 : B_t = 0\}$. Then
$$P(\tau = 0) = P(\sigma = 0) = 1.$$
Proof. For all $n$ we have
$$\{\tau = 0\} = \bigcap_{k \ge n} \{\exists\, 0 < \varepsilon < 1/k : B_\varepsilon > 0\}$$
and thus $\{\tau = 0\} \in \mathcal{F}_{1/n}^B$ for all $n$, and hence
$$\{\tau = 0\} \in \mathcal{F}_0^+.$$
Therefore, $P(\tau = 0) \in \{0, 1\}$. It remains to show that it has positive probability. Clearly, for all $t > 0$ we have
$$P(\tau \le t) \ge P(B_t > 0) = \frac{1}{2}.$$
Hence by letting $t \downarrow 0$ we get that $P(\tau = 0) \ge 1/2$ and this finishes the proof. In exactly the same way we get that
$$\inf\{t > 0 : B_t < 0\} = 0 \quad \text{a.s.}$$
Since $B$ is a continuous function, by the intermediate value theorem, we deduce that
$$P(\sigma = 0) = 1.$$
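Proposition 6.15. For $d = 1$ and $t \ge 0$ let $S_t = \sup_{0 \le s \le t} B_s$ and $I_t = \inf_{0 \le s \le t} B_s$.
1. For every $\varepsilon > 0$ we have
$$S_\varepsilon > 0 \quad \text{and} \quad I_\varepsilon < 0 \quad \text{a.s.}$$
In particular, a.s. there exists a zero of $B$ in any interval of the form $(0, \varepsilon)$, for all $\varepsilon > 0$.
2. A.s. we have
$$\sup_{t \ge 0} B_t = -\inf_{t \ge 0} B_t = +\infty.$$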
Proof. 1. For all $t > 0$ we have that
$$P(S_t > 0) \ge P(B_t > 0) = \frac{1}{2}.$$
Thus, if $(t_n)$ is a sequence of real numbers decreasing to 0 as $n \to \infty$, then by Fatou's inequality
$$P(B_{t_n} > 0 \text{ i.o.}) = P(\limsup \{B_{t_n} > 0\}) \ge \limsup_{n \to \infty} P(B_{t_n} > 0) = \frac{1}{2}.$$
Clearly, the event $\{B_{t_n} > 0 \text{ i.o.}\}$ is in $\mathcal{F}_0^+$ since it is $\mathcal{F}_{t_k}^B$-measurable for all $k$ (notice that for all $k$ it does not depend on $B_{t_1}, \ldots, B_{t_{k-1}}$). By Blumenthal's 0-1 law we get that
$$P(B_{t_n} > 0 \text{ i.o.}) = 1,$$
and hence $S_\varepsilon > 0$ a.s. for all $\varepsilon > 0$.
The same is true for the infimum by considering $-B$, which is again a standard Brownian motion.
2. By the scaling invariance of Brownian motion we get that for every $\lambda > 0$
$$S_\infty = \sup_{t \ge 0} B_t = \sup_{t \ge 0} B_{\lambda t} \overset{(d)}{=} \sqrt{\lambda} \sup_{t \ge 0} B_t.$$
Hence $S_\infty \overset{(d)}{=} \sqrt{\lambda} S_\infty$ for all $\lambda > 0$. Thus for all $x > 0$ the probability $P(S_\infty \ge x)$ is a constant $c$, and hence, letting $x \downarrow 0$,
$$P(S_\infty > 0) = c.$$
But we have already shown that $P(S_\infty > 0) = 1$. Therefore, for all $x$ we have
$$P(S_\infty \ge x) = 1,$$
which gives that $P(S_\infty = \infty) = 1$.
Proposition 6.16. Let $C$ be a cone in $\mathbb{R}^d$ with non-empty interior and origin at 0, i.e. a set of the form $\{tu : t > 0, u \in A\}$, where $A$ is a non-empty open subset of the unit sphere of $\mathbb{R}^d$. If
$$H_C = \inf\{t > 0 : B_t \in C\}$$
is the first hitting time of $C$, then $H_C = 0$ a.s.
Proof. Since the cone $C$ is invariant under multiplication by a positive scalar, by the scaling invariance property of Brownian motion we get that for all $t > 0$
$$P(B_t \in C) = P(B_1 \in C).$$
Since $C$ has non-empty interior, it is straightforward to check that
$$P(B_1 \in C) > 0,$$
and then we can finish the proof using Blumenthal's 0-1 law as in the proposition above.
6.4 Strong Markov property
Let $(\mathcal{F}_t)_{t \ge 0}$ be a filtration. We say that a Brownian motion $B$ is an $(\mathcal{F}_t)$-Brownian motion if $B$ is adapted to $(\mathcal{F}_t)$ and $(B_{s+t} - B_s, t \ge 0)$ is independent of $\mathcal{F}_s$ for every $s \ge 0$.
In Proposition 3.3 we saw that the first hitting time of a closed set by a continuous process is always a stopping time. This is not true in general though for an open set. However, if we consider the right continuous filtration, i.e. $(\mathcal{F}_{t+})$, then we showed in Proposition 3.5 that the first hitting time of an open set by a continuous process is always an $(\mathcal{F}_{t+})$ stopping time.
So, in what follows we will be considering the right continuous filtration. As this filtration is larger, this choice produces more stopping times.
Theorem 6.17. [Strong Markov property] Let $T$ be an a.s. finite stopping time. Then the process
$$(B_{T+t} - B_T, t \ge 0)$$
is a standard Brownian motion independent of $\mathcal{F}_T^+$.
Proof. We will first prove the theorem for the stopping times $T_n = 2^{-n} \lceil 2^n T \rceil$ that discretely approximate $T$ from above. We write $B^{(k)}_t = B_{t + k2^{-n}} - B_{k2^{-n}}$, which is a Brownian motion, and $B^*$ for the process defined by
\[ B^*(t) = B_{t+T_n} - B_{T_n}. \]
We will first show that $B^*$ is a Brownian motion independent of $\mathcal{F}^+_{T_n}$. Let $E \in \mathcal{F}^+_{T_n}$. For every event $\{B^* \in A\}$ we have
\begin{align*}
P(\{B^* \in A\} \cap E) &= \sum_{k=0}^{\infty} P(\{B^{(k)} \in A\} \cap E \cap \{T_n = k2^{-n}\}) \\
&= \sum_{k=0}^{\infty} P(B^{(k)} \in A)\, P(E \cap \{T_n = k2^{-n}\}),
\end{align*}
since by the simple Markov property $\{B^{(k)} \in A\}$ is independent of $\mathcal{F}^+_{k2^{-n}}$ and $E \cap \{T_n = k2^{-n}\} \in \mathcal{F}^+_{k2^{-n}}$. Since $B^{(k)}$ is a Brownian motion, $P(B^{(k)} \in A) = P(B \in A)$ does not depend on $k$, and hence
\[ P(\{B^* \in A\} \cap E) = P(B \in A)\, P(E). \]
Taking $E$ to be the whole space gives that $B^*$ is a Brownian motion, and hence
\[ P(\{B^* \in A\} \cap E) = P(B^* \in A)\, P(E) \]
for all $A$ and $E$, thus showing the claimed independence.
By the continuity of Brownian motion we get that
\[ B_{t+s+T} - B_{s+T} = \lim_{n \to \infty} (B_{s+t+T_n} - B_{s+T_n}). \]
The increments $(B_{t+s+T_n} - B_{s+T_n})$ are normally distributed with mean 0 and variance equal to $t$. Thus for any $s \ge 0$ the increments $B_{t+s+T} - B_{s+T}$ are also normally distributed with mean 0 and variance $t$. As the process $(B_{t+T} - B_T,\ t \ge 0)$ is a.s. continuous, it is a Brownian motion. It only remains to show that it is independent of $\mathcal{F}_T^+$.
Let $A \in \mathcal{F}_T^+$ and $t_1, \ldots, t_k \ge 0$. We will show that for any function $F : (\mathbb{R}^d)^k \to \mathbb{R}$ continuous and bounded we have
\[ E[1(A) F((B_{t_1+T} - B_T, \ldots, B_{t_k+T} - B_T))] = P(A)\, E[F((B_{t_1+T} - B_T, \ldots, B_{t_k+T} - B_T))]. \]
Using the continuity again and the dominated convergence theorem, we get that
\[ E[1(A) F((B_{t_1+T} - B_T, \ldots, B_{t_k+T} - B_T))] = \lim_{n \to \infty} E[1(A) F((B_{t_1+T_n} - B_{T_n}, \ldots, B_{t_k+T_n} - B_{T_n}))]. \]
Since $T_n \ge T$, it follows that $A \in \mathcal{F}^+_{T_n}$. But we already showed that the process $(B_{t+T_n} - B_{T_n},\ t \ge 0)$ is independent of $\mathcal{F}^+_{T_n}$, hence using the continuity and dominated convergence one more time gives the claimed independence.
Remark 6.18. Let $\tau = \inf\{t \ge 0 : B_t = \max_{0 \le s \le 1} B_s\}$. It is intuitively clear that $\tau$ is not a stopping time. To prove that, first show that $\tau < 1$ a.s. The increment $B_{t+\tau} - B_\tau$ is then non-positive for all $t$ in a small neighbourhood of 0, which would contradict the strong Markov property if $\tau$ were a stopping time.
6.5 Reflection principle
Theorem 6.19. [Reflection principle] Let $T$ be an a.s. finite stopping time and $(B_t,\ t \ge 0)$ a standard Brownian motion. Then the process $(\tilde{B}_t,\ t \ge 0)$ defined by
\[ \tilde{B}_t = B_t 1(t \le T) + (2B_T - B_t) 1(t > T) \]
is also a standard Brownian motion and we call it Brownian motion reflected at $T$.
Proof. By the strong Markov property, the process
\[ B^{(T)} = (B_{T+t} - B_T,\ t \ge 0) \]
is a standard Brownian motion independent of $(B_t,\ 0 \le t \le T)$. Also the process
\[ -B^{(T)} = (B_T - B_{t+T},\ t \ge 0) \]
is a standard Brownian motion independent of $(B_t,\ 0 \le t \le T)$. Therefore, the pair $((B_t,\ 0 \le t \le T), B^{(T)})$ has the same law as $((B_t,\ 0 \le t \le T), -B^{(T)})$.
We now define the concatenation operation at time $T$ between two continuous paths $X$ and $Y$ by
\[ \Psi_T(X, Y)(t) = X_t 1(t \le T) + (X_T + Y_{t-T}) 1(t > T). \]
Applying $\Psi_T$ to $B$ and $B^{(T)}$ gives us the Brownian motion $B$, while applying it to $B$ and $-B^{(T)}$ gives us the process $\tilde{B}$.
Let $\mathcal{A}$ be the product $\sigma$-algebra on the space $\mathcal{C}$ of continuous functions on $[0, \infty)$. It is easy to see that $\Psi_T$ is a measurable mapping from $(\mathcal{C} \times \mathcal{C}, \mathcal{A} \otimes \mathcal{A})$ to $(\mathcal{C}, \mathcal{A})$ (by approximating $T$ by discrete stopping times).
Hence, $B$ and $\tilde{B}$ have the same law.
Corollary 6.20. [Reflection principle] Let $B$ be a standard Brownian motion in 1 dimension, $b > 0$ and $a \le b$. Then writing $S_t = \sup_{0 \le s \le t} B_s$ we have that for every $t \ge 0$
\[ P(S_t \ge b,\ B_t \le a) = P(B_t \ge 2b - a). \]
Proof. For any $x > 0$ we define $T_x = \inf\{t \ge 0 : B_t = x\}$. Since $S_\infty = \infty$ a.s. (where $S_\infty = \sup_{t \ge 0} B_t$), we have that $T_x < \infty$ a.s.
By the continuity of Brownian motion, we have that $B_{T_x} = x$ a.s. Clearly $\{S_t \ge b\} = \{T_b \le t\}$. By the reflection principle applied to $T_b$ we get
\[ P(S_t \ge b,\ B_t \le a) = P(T_b \le t,\ 2b - B_t \ge 2b - a) = P(T_b \le t,\ \tilde{B}_t \ge 2b - a), \]
since $\tilde{B}_t = 2b - B_t$ when $t \ge T_b$.
Since $a \le b$, the event $\{\tilde{B}_t \ge 2b - a\}$ is contained in the event $\{T_b \le t\}$. Hence we get
\[ P(S_t \ge b,\ B_t \le a) = P(\tilde{B}_t \ge 2b - a) = P(B_t \ge 2b - a), \]
where the last equality follows again by the reflection principle ($\tilde{B}$ is a standard Brownian motion).
Corollary 6.21. For every $t \ge 0$ the variables $S_t$ and $|B_t|$ have the same law.
Proof. Let $a > 0$. Then by Corollary 6.20 we get that
\[ P(S_t \ge a) = P(S_t \ge a,\ B_t \le a) + P(S_t \ge a,\ B_t > a) = 2P(B_t \ge a) = P(|B_t| \ge a), \]
since the event $\{B_t > a\}$ is contained in $\{S_t \ge a\}$.
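The identity of Corollary 6.20 has an exact discrete counterpart for the simple random walk, which can be checked by brute-force enumeration. The following is only an illustrative sketch: the walk replaces Brownian motion, and the function name `reflection_check` and the parameters chosen are assumptions made for this example, not part of the notes.

```python
from fractions import Fraction
from itertools import product


def reflection_check(n, a, b):
    """For an n-step simple random walk S started at 0, with integers b > 0, a <= b, return
    (P(max_{k<=n} S_k >= b, S_n <= a), P(S_n >= 2b - a)) as exact fractions."""
    lhs = rhs = 0
    for steps in product((-1, 1), repeat=n):  # enumerate all 2^n paths
        pos, running_max = 0, 0
        for step in steps:
            pos += step
            running_max = max(running_max, pos)
        if running_max >= b and pos <= a:
            lhs += 1
        if pos >= 2 * b - a:
            rhs += 1
    total = 2 ** n
    return Fraction(lhs, total), Fraction(rhs, total)


lhs, rhs = reflection_check(n=10, a=1, b=3)
print(lhs, rhs)  # the two probabilities coincide
```

The equality is exact here because reflecting the path after its first visit to $b$ is a bijection on walk paths, which mirrors the role of $\tilde{B}$ in the proof above.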
Exercise 6.22. Let $x > 0$ and $T_x = \inf\{t > 0 : B_t = x\}$. Then the random variable $T_x$ has the same law as $(x/B_1)^2$.
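One possible route to this exercise, given here only as a sketch, combines Corollary 6.21 with the scaling property $B_t \overset{(d)}{=} \sqrt{t}\, B_1$. For every $t > 0$,
\[ P(T_x \le t) = P(S_t \ge x) = P(|B_t| \ge x) = P(\sqrt{t}\, |B_1| \ge x) = P\big((x/B_1)^2 \le t\big), \]
so $T_x$ and $(x/B_1)^2$ have the same distribution function.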
6.6 Martingales for Brownian motion
Proposition 6.23. Let $(B_t,\ t \ge 0)$ be a standard Brownian motion in 1 dimension. Then
(i) the process $(B_t,\ t \ge 0)$ is an $(\mathcal{F}_t^+)$-martingale,
(ii) the process $(B_t^2 - t,\ t \ge 0)$ is an $(\mathcal{F}_t^+)$-martingale.
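Proof. (i) Let $s \le t$, then
\[ E[B_t - B_s \mid \mathcal{F}_s^+] = E[B_t - B_s] = 0, \]
since the increment $B_t - B_s$ is independent of $\mathcal{F}_s^+$ by Theorem 6.12.
(ii) The process is adapted to the filtration $(\mathcal{F}_t^+)$ and if $s \le t$, then
\begin{align*}
E[B_t^2 - t \mid \mathcal{F}_s^+] &= E[(B_t - B_s)^2 \mid \mathcal{F}_s^+] + 2E[B_t B_s \mid \mathcal{F}_s^+] - E[B_s^2 \mid \mathcal{F}_s^+] - t \\
&= (t - s) + 2B_s^2 - B_s^2 - t = B_s^2 - s.
\end{align*}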
Using the above proposition, one can show that
Proposition 6.24. Let $B$ be a standard Brownian motion in 1 dimension and $x, y > 0$. Then
\[ P(T_{-y} < T_x) = \frac{x}{x + y} \quad \text{and} \quad E[T_x \wedge T_{-y}] = xy. \]
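A sketch of how the two martingales of Proposition 6.23 give these formulas, assuming the optional stopping theorem applies to the stopped processes (which are controlled because $|B_{t \wedge T}| \le \max(x, y)$). Write $T = T_x \wedge T_{-y}$, which is a.s. finite by Proposition 6.15. Stopping $(B_t)$ at $T$ gives
\[ 0 = E[B_T] = x\, P(T_x < T_{-y}) - y\, P(T_{-y} < T_x), \qquad P(T_x < T_{-y}) + P(T_{-y} < T_x) = 1, \]
and solving the two equations yields $P(T_{-y} < T_x) = x/(x+y)$. Stopping $(B_t^2 - t)$ at $T \wedge t$ and letting $t \to \infty$ gives
\[ E[T] = E[B_T^2] = x^2 \cdot \frac{y}{x+y} + y^2 \cdot \frac{x}{x+y} = xy. \]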
Proposition 6.25. Let $B$ be a standard Brownian motion in $d$ dimensions. Then for each $u = (u_1, \ldots, u_d) \in \mathbb{R}^d$ the process
\[ M_t^u = \exp\left( \langle u, B_t \rangle - \frac{|u|^2 t}{2} \right), \quad t \ge 0 \]
is an $(\mathcal{F}_t^+)$ martingale.
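Proof. Integrability follows since $E[\exp(\langle u, B_t \rangle)] = \exp\left( t \sum_{i=1}^d u_i^2 / 2 \right)$ for all $t \ge 0$. Let $s \le t$, then
\begin{align*}
E[M_t^u \mid \mathcal{F}_s^+] &= e^{-|u|^2 t/2}\, E\left[ \exp(\langle u, B_t - B_s + B_s \rangle) \mid \mathcal{F}_s^+ \right] \\
&= e^{-|u|^2 t/2} \exp(\langle u, B_s \rangle)\, E[\exp(\langle u, B_t - B_s \rangle)],
\end{align*}
where the last equality follows from Theorem 6.12. Since the increment $B_t - B_s$ is distributed according to $\mathcal{N}(0, (t - s) I_d)$, we have $E[\exp(\langle u, B_t - B_s \rangle)] = e^{|u|^2 (t-s)/2}$ and we get that
\[ E[M_t^u \mid \mathcal{F}_s^+] = M_s^u, \]
which proves the martingale property.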
We saw above that if $f(x) = x^2$, then the right term to subtract from $f(B_t)$ in order to make it a martingale is $t$. More generally now, we are interested in finding what we need to subtract from $f(B_t)$ in order to obtain a martingale. Before stating the theorem for Brownian motion, let's look at a discrete time analogue for a simple random walk on the integers. Let $(S_n)$ be the random walk. Then
\begin{align*}
E[f(S_{n+1}) \mid S_1, \ldots, S_n] - f(S_n) &= \frac{1}{2} \left( f(S_n + 1) - 2f(S_n) + f(S_n - 1) \right) \\
&= \frac{1}{2} \tilde{\Delta} f(S_n),
\end{align*}
where $\tilde{\Delta} f(x) := f(x + 1) - 2f(x) + f(x - 1)$. Hence
\[ f(S_n) - \frac{1}{2} \sum_{k=0}^{n-1} \tilde{\Delta} f(S_k) \]
defines a discrete time martingale. In the Brownian motion case we expect a similar result with $\tilde{\Delta}$ replaced by its continuous analogue, the Laplacian
\[ \Delta f(x) = \sum_{i=1}^d \frac{\partial^2 f}{\partial x_i^2}. \]
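The discrete statement can be checked numerically. The sketch below enumerates all paths of a short simple random walk and verifies a consequence of the martingale property, namely that the expectation of $f(S_n) - \frac{1}{2} \sum_{k<n} \tilde{\Delta} f(S_k)$ stays equal to $f(S_0) = f(0)$. The test function $f(x) = x^4$ and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def discrete_laplacian(f, x):
    """Second difference f(x+1) - 2 f(x) + f(x-1)."""
    return f(x + 1) - 2 * f(x) + f(x - 1)


def compensated_value(f, steps):
    """Value of f(S_n) - (1/2) * sum_{k<n} discrete_laplacian(f, S_k) along one path."""
    pos, compensator = 0, Fraction(0)
    for step in steps:
        compensator += Fraction(discrete_laplacian(f, pos), 2)
        pos += step
    return f(pos) - compensator


def expected_M(f, n):
    """Exact expectation over all 2^n equally likely paths of length n."""
    paths = list(product((-1, 1), repeat=n))
    return sum(compensated_value(f, p) for p in paths) / len(paths)


f = lambda x: x ** 4
print([expected_M(f, n) for n in range(6)])  # constant in n, equal to f(0)
```

Constant expectation is weaker than the full martingale property, but it is the part that is cheap to verify exhaustively.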
Theorem 6.26. Let $B$ be a Brownian motion in $\mathbb{R}^d$. Let $f(t, x) : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable in the variable $t$ and twice continuously differentiable in the variable $x$. Suppose in addition that $f$ and its derivatives up to second order are bounded. Then the following process
\[ M_t = f(t, B_t) - f(0, B_0) - \int_0^t \left( \frac{\partial}{\partial s} + \frac{1}{2} \Delta \right) f(s, B_s)\, ds, \quad t \ge 0 \]
is an $(\mathcal{F}_t^+)$-martingale.
Proof. Integrability follows trivially by the assumptions on the boundedness of $f$ and its derivatives.
We will now show the martingale property. Let $s, t \ge 0$. Then
\begin{align*}
M_{t+s} - M_s &= f(t + s, B_{t+s}) - f(s, B_s) - \int_s^{s+t} \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r, B_r)\, dr \\
&= f(t + s, B_{t+s}) - f(s, B_s) - \int_0^t \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, B_{r+s})\, dr.
\end{align*}
Since $B_{t+s} - B_s$ is independent of $\mathcal{F}_s^+$ by Theorem 6.12 and $B_s$ is $\mathcal{F}_s^+$-measurable, writing $p_s(z, y) = (2\pi s)^{-d/2} e^{-|z - y|^2/(2s)}$ for the transition density in time $s$, we have (check!)
\[ E[f(t + s, B_{t+s}) \mid \mathcal{F}_s^+] = E[f(t + s, B_{t+s} - B_s + B_s) \mid \mathcal{F}_s^+] = \int_{\mathbb{R}^d} f(t + s, B_s + x)\, p_t(0, x)\, dx. \]
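Now notice that by the boundedness assumption on $f$ and all its derivatives
\[ E\left[ \int_0^t \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, B_{r+s})\, dr \,\Big|\, \mathcal{F}_s^+ \right] = \int_0^t E\left[ \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, B_{r+s}) \,\Big|\, \mathcal{F}_s^+ \right] dr. \]
(Check! using Fubini's theorem and the definition of conditional expectation.) Using again the fact that $B_{r+s} - B_s$ is independent of $\mathcal{F}_s^+$, we get
\[ E\left[ \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, B_{r+s} - B_s + B_s) \,\Big|\, \mathcal{F}_s^+ \right] = \int_{\mathbb{R}^d} \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, x + B_s)\, p_r(0, x)\, dx. \]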
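By the boundedness of $f$ and its derivatives, using the dominated convergence theorem we deduce
\[ \int_0^t \int_{\mathbb{R}^d} \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, x + B_s)\, p_r(0, x)\, dx\, dr = \lim_{\varepsilon \to 0} \int_\varepsilon^t \int_{\mathbb{R}^d} \left( \frac{\partial}{\partial r} + \frac{1}{2} \Delta \right) f(r + s, x + B_s)\, p_r(0, x)\, dx\, dr. \]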
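Using integration by parts twice in this last integral and Fubini's theorem we have that it is equal to
\begin{align*}
&\int_{\mathbb{R}^d} \left( f(t + s, B_s + x)\, p_t(0, x) - f(\varepsilon + s, x + B_s)\, p_\varepsilon(0, x) \right) dx \\
&\quad - \int_{\mathbb{R}^d} \int_\varepsilon^t \frac{\partial}{\partial r} p_r(0, x)\, f(r + s, x + B_s)\, dr\, dx + \int_\varepsilon^t \int_{\mathbb{R}^d} \frac{1}{2} \Delta p_r(0, x)\, f(r + s, x + B_s)\, dx\, dr.
\end{align*}
The transition density $p_r(0, x)$ satisfies the heat equation, i.e. $\left( \frac{\partial}{\partial r} - \frac{1}{2} \Delta \right) p = 0$, and hence this last expression is equal to
\[ \int_{\mathbb{R}^d} \left( f(t + s, B_s + x)\, p_t(0, x) - f(\varepsilon + s, x + B_s)\, p_\varepsilon(0, x) \right) dx. \]
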
Now notice that as $\varepsilon \to 0$ we get
\[ \lim_{\varepsilon \to 0} \int_{\mathbb{R}^d} f(\varepsilon + s, x + B_s)\, p_\varepsilon(0, x)\, dx = f(s, B_s), \]
since the limit above is equal to $\lim_{\varepsilon \to 0} E[f(s + \varepsilon, B_{s+\varepsilon}) \mid \mathcal{F}_s^+]$, which by the continuity of the Brownian motion and of $f$ and by the conditional dominated convergence theorem is equal to $f(s, B_s)$.
Therefore we showed that
\[ E[M_{t+s} - M_s \mid \mathcal{F}_s^+] = 0 \quad \text{a.s.} \]
and this finishes the proof.
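The heat equation for the transition density, used in the proof above, can be verified directly. As a worked check, with $p_r(0, x) = (2\pi r)^{-d/2} e^{-|x|^2/(2r)}$ one computes
\[ \frac{\partial}{\partial r} p_r(0, x) = \left( -\frac{d}{2r} + \frac{|x|^2}{2r^2} \right) p_r(0, x), \qquad \frac{\partial^2}{\partial x_i^2} p_r(0, x) = \left( \frac{x_i^2}{r^2} - \frac{1}{r} \right) p_r(0, x), \]
so that summing over $i = 1, \ldots, d$ gives
\[ \frac{1}{2} \Delta p_r(0, x) = \left( \frac{|x|^2}{2r^2} - \frac{d}{2r} \right) p_r(0, x) = \frac{\partial}{\partial r} p_r(0, x). \]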
6.7 Recurrence and transience
We note that if a Brownian motion starts from $x \in \mathbb{R}^d$, i.e. $B_0 = x$, then $B$ can be written as
\[ B_t = x + \tilde{B}_t, \]
where $\tilde{B}$ is a standard Brownian motion.
We will write $P_x$ to indicate that the Brownian motion starts from $x$, i.e. under $P_x$ the process $(B_t - x,\ t \ge 0)$ is a standard Brownian motion.
Theorem 6.27. Let $B$ be a Brownian motion in $d \ge 1$ dimensions.
(i) If $d = 1$, then $B$ is point-recurrent, in the sense that for all $x$ a.s. the set
\[ \{t \ge 0 : B_t = x\} \]
is unbounded.
(ii) If $d = 2$, then $B$ is neighbourhood recurrent, in the sense that for every $x, z$, $P_x$-a.s. the set
\[ \{t \ge 0 : |B_t - z| \le \varepsilon\} \]
is unbounded for every $\varepsilon > 0$.
However, $B$ does not hit points, i.e. for every $x \in \mathbb{R}^d$
\[ P_0(\exists\, t > 0 : B_t = x) = 0. \]
(iii) If $d \ge 3$, then $B$ is transient, in the sense that
\[ |B_t| \to \infty \text{ as } t \to \infty \quad P_0\text{-a.s.} \]
Proof. (i) This is a consequence of Proposition 6.15, since
\[ \limsup_{t \to \infty} B_t = \infty = -\liminf_{t \to \infty} B_t. \]
(ii) Note that it suffices to show the claim for $z = 0$.
Let $\varphi \in C_b^2(\mathbb{R}^2)$ be such that
\[ \varphi(y) = \log |y|, \quad \text{for } \varepsilon \le |y| \le R, \]
where $R > \varepsilon > 0$. Note that $\Delta \varphi(y) = 0$ for $\varepsilon \le |y| \le R$. Let the Brownian motion start from $x$, i.e. $B_0 = x$ with $\varepsilon < |x| < R$.
By Theorem 6.26 the process
\[ M = \left( \varphi(B_t) - \int_0^t \frac{1}{2} \Delta \varphi(B_s)\, ds \right)_{t \ge 0} \]
is a martingale.
We now set $S = \inf\{t \ge 0 : |B_t| = \varepsilon\}$ and $T_R = \inf\{t \ge 0 : |B_t| = R\}$. Then $H = S \wedge T_R$ is an a.s. finite stopping time and $(M_{t \wedge H})_{t \ge 0} = (\log |B_{t \wedge H}|,\ t \ge 0)$ is a bounded martingale. By the optional stopping theorem, since $H < \infty$ a.s., we thus obtain that
\[ E_x[\log |B_H|] = \log |x| \]
or equivalently,
\[ \log(\varepsilon)\, P_x(S < T_R) + \log(R)\, P_x(T_R < S) = \log |x|, \]
which gives that
\[ P_x(S < T_R) = \frac{\log R - \log |x|}{\log R - \log \varepsilon}. \tag{6.2} \]
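Letting $R \to \infty$ we have that $T_R \to \infty$ a.s. and hence $P_x(S < \infty) = 1$, which shows that
\[ P_x(|B_t| \le \varepsilon, \text{ for some } t > 0) = 1. \]
Applying the Markov property at time $n$ we get
\begin{align*}
P_x(|B_t| \le \varepsilon, \text{ for some } t > n) &= P_x(|B_{t+n} - B_n + B_n| \le \varepsilon, \text{ for some } t > 0) \\
&= \int_{\mathbb{R}^2} P_0(|B_t + y| \le \varepsilon, \text{ for some } t > 0)\, P_x(B_n \in dy) \\
&= \int_{\mathbb{R}^2} P_y(|B_t| \le \varepsilon, \text{ for some } t > 0)\, P_x(B_n \in dy).
\end{align*}
($P_x(B_n \in dy)$ is the law of $B_n$ under $P_x$.) Since we showed above that for all $z$
\[ P_z(|B_t| \le \varepsilon, \text{ for some } t > 0) = 1, \]
we deduce that $P_x(|B_t| \le \varepsilon, \text{ for some } t > n) = 1$ for all $x$.
Therefore the set $\{t \ge 0 : |B_t| \le \varepsilon\}$ is unbounded $P_x$-a.s.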
Letting $\varepsilon \to 0$ in (6.2) gives that the probability of hitting 0 before hitting the boundary of the ball around 0 of radius $R$ is 0. Therefore, letting $R \to \infty$ gives that the probability of ever hitting 0 is 0, i.e. for all $x \ne 0$
\[ P_x(B_t = 0, \text{ for some } t > 0) = 0. \]
We only need to show now that
\[ P_0(B_t = 0, \text{ for some } t > 0) = 0. \]
Applying again the Markov property at $a > 0$ we get
\begin{align*}
P_0(B_t = 0, \text{ for some } t \ge a) &= \int_{\mathbb{R}^2} P_0(B_{t+a} - B_a + y = 0, \text{ for some } t > 0)\, P_0(B_a \in dy) \\
&= \int_{\mathbb{R}^2} P_y(B_t = 0, \text{ for some } t > 0)\, \frac{1}{(2\pi a)^{d/2}} e^{-|y|^2/(2a)}\, dy = 0
\end{align*}
since for all $y \ne 0$ we have already proved that $P_y(B_t = 0, \text{ for some } t > 0) = 0$.
Thus, since $P_0(B_t = 0, \text{ for some } t \ge a) = 0$ for all $a > 0$, letting $a \to 0$ we deduce that
\[ P_0(B_t = 0, \text{ for some } t > 0) = 0. \]
(iii) Since the first three components of a Brownian motion in $\mathbb{R}^d$ form a Brownian motion in $\mathbb{R}^3$, it suffices to treat the case $d = 3$. As we did above, let $f \in C_b^2(\mathbb{R}^3)$ be a function such that
\[ f(y) = \frac{1}{|y|}, \quad \text{for } \varepsilon \le |y| \le R. \]
Note that $\Delta f(y) = 0$ for $\varepsilon \le |y| \le R$. Let $B_0 = x$ with $\varepsilon \le |x| \le R$. If we define again $S$ and $T_R$ as above the same argument shows that
\[ P_x(S < T_R) = \frac{|x|^{-1} - R^{-1}}{\varepsilon^{-1} - R^{-1}}. \]
As $R \to \infty$ this converges to $\varepsilon/|x|$, which is the probability of ever visiting the ball centred at 0 and of radius $\varepsilon$ when starting from $|x| \ge \varepsilon$.
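The contrast between (6.2) and its three-dimensional analogue is easy to see numerically. The following sketch simply evaluates the two closed-form expressions for growing $R$; the function names and the sample values $|x| = 2$, $\varepsilon = 1$ are assumptions made for illustration.

```python
import math


def hit_inner_first_2d(x_norm, eps, R):
    """Formula (6.2): planar BM started at radius x_norm reaches radius eps before radius R."""
    return (math.log(R) - math.log(x_norm)) / (math.log(R) - math.log(eps))


def hit_inner_first_3d(x_norm, eps, R):
    """Analogue in d = 3, coming from the harmonic function 1/|y|."""
    return (1 / x_norm - 1 / R) / (1 / eps - 1 / R)


for R in (1e1, 1e3, 1e6, 1e12):
    print(R, hit_inner_first_2d(2.0, 1.0, R), hit_inner_first_3d(2.0, 1.0, R))
```

In the plane the probability increases (slowly, at logarithmic speed) towards 1, while in three dimensions it settles at $\varepsilon/|x| = 1/2$, which is the recurrence versus transience dichotomy of Theorem 6.27.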
We will now show that
\[ P_0(|B_t| \to \infty \text{ as } t \to \infty) = 1. \]
Let $T_r = \inf\{t > 0 : |B_t| = r\}$ for $r > 0$. We define the events
\[ A_n = \{ |B_t| > n \text{ for all } t \ge T_{n^3} \}. \]
By the unboundedness of Brownian motion, it is clear that
\[ P_0(T_{n^3} < \infty) = 1. \]
Applying the strong Markov property at the time $T_{n^3}$ we obtain
\begin{align*}
P_0(A_n^c) &= P_0\left( |B_{t + T_{n^3}} - B_{T_{n^3}} + B_{T_{n^3}}| \le n \text{ for some } t \ge 0 \right) \\
&= E_0\left[ P_{B_{T_{n^3}}}(T_n < \infty) \right] = \frac{n}{n^3} = \frac{1}{n^2}.
\end{align*}
Since the right hand side is summable, by the Borel-Cantelli lemma we get that only finitely many of the events $A_n^c$ occur, which implies that $|B_t|$ diverges to $\infty$ as $t \to \infty$.
6.8 Brownian motion and the Dirichlet problem
Definition 6.28. We call a connected open subset $D$ of $\mathbb{R}^d$ a domain. We say that $D$ satisfies the Poincaré cone condition at $x \in \partial D$ (boundary of $D$) if there exists a non-empty open cone $C$ with origin at $x$ and such that $C \cap B(x, r) \subset D^c$ for some $r > 0$.
Theorem 6.29. [Dirichlet problem] Let $D$ be a bounded domain in $\mathbb{R}^d$ such that every boundary point satisfies the Poincaré cone condition. Suppose that $\varphi$ is a continuous function on $\partial D$. We let $\tau(\partial D) = \inf\{t \ge 0 : B_t \in \partial D\}$, which is an almost surely finite stopping time when starting in $D$. Then the function $u : \bar{D} \to \mathbb{R}$ given by
\[ u(x) = E_x[\varphi(B_{\tau(\partial D)})], \quad \text{for } x \in \bar{D}, \]
is the unique continuous function satisfying
\begin{align*}
\Delta u &= 0 \quad \text{on } D, \\
u(x) &= \varphi(x) \quad \text{for } x \in \partial D.
\end{align*}
Before solving the Dirichlet problem we state a well-known result and for the proof we refer the reader to [1, Theorem 3.2].
Theorem 6.30. Let $D$ be a domain in $\mathbb{R}^d$ and $u : D \to \mathbb{R}$ measurable and locally bounded. The following conditions are equivalent:
(i) $u$ is twice continuously differentiable and $\Delta u = 0$,
(ii) for any ball $B(x, r) \subset D$ we have
\[ u(x) = \frac{1}{\mathcal{L}(B(x, r))} \int_{B(x, r)} u(y)\, dy, \]
(iii) for any ball $B(x, r) \subset D$ we have
\[ u(x) = \frac{1}{\sigma_{x,r}(\partial B(x, r))} \int_{\partial B(x, r)} u(y)\, d\sigma_{x,r}(y), \]
where $\sigma_{x,r}$ is the surface area measure on $\partial B(x, r)$.
Definition 6.31. A function satisfying one of the equivalent conditions of Theorem 6.30 is called harmonic in $D$.
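Condition (iii) of Theorem 6.30 can be illustrated in the plane by averaging a function over a circle. The sketch below uses a simple equispaced quadrature; the two test functions, the circle chosen and the helper name `circle_average` are assumptions for the example only.

```python
import math


def circle_average(u, cx, cy, r, n=720):
    """Approximate the average of u over the circle of radius r centred at (cx, cy)."""
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        total += u(cx + r * math.cos(theta), cy + r * math.sin(theta))
    return total / n


harmonic = lambda x, y: x * x - y * y      # Laplacian is 2 - 2 = 0
not_harmonic = lambda x, y: x * x + y * y  # Laplacian is 4

# For the harmonic function the circle average reproduces the value at the centre.
print(circle_average(harmonic, 0.3, -0.2, 0.5), harmonic(0.3, -0.2))
# For the non-harmonic one the average exceeds the centre value by r^2.
print(circle_average(not_harmonic, 0.3, -0.2, 0.5), not_harmonic(0.3, -0.2))
```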
The next theorem and the corollary following it will be used in the uniqueness part of the proof of Theorem 6.29.
Theorem 6.32. [Maximum principle] Suppose that $u : \mathbb{R}^d \to \mathbb{R}$ is a harmonic function on a domain $D \subset \mathbb{R}^d$.
(i) If $u$ attains its maximum in $D$, then $u$ is a constant on $D$.
(ii) If $u$ is continuous on $\bar{D}$ and $D$ is bounded, then
\[ \max_{x \in \bar{D}} u(x) = \max_{x \in \partial D} u(x). \]
Proof. (i) Let $M$ be the maximum. Then the set $V = \{x \in D : u(x) = M\}$ is relatively closed in $D$ (if $x_n$ is a sequence of points in $V$ converging to $x \in D$, then $x \in V$), since $u$ is continuous. Since $D$ is open, for any $x \in V$ there exists $r > 0$ such that $B(x, r) \subset D$. From Theorem 6.30 we have
\[ M = u(x) = \frac{1}{\mathcal{L}(B(x, r))} \int_{B(x, r)} u(y)\, dy \le M. \]
We thus deduce that $u(y) = M$ for almost all $y \in B(x, r)$. But since $u$ is continuous, this gives that $u(y) = M$ for all $y \in B(x, r)$. Therefore, $B(x, r) \subset V$. Hence $V$ is also open and by assumption non-empty. But since $D$ is connected, we must have that $V = D$. Hence $u$ is constant on $D$.
(ii) Since $u$ is continuous and $\bar{D}$ is closed and bounded, $u$ attains a maximum on $\bar{D}$. By (i), the maximum has to be attained on $\partial D$.
Corollary 6.33. Suppose that $u_1, u_2 : \mathbb{R}^d \to \mathbb{R}$ are functions harmonic on a bounded domain $D$ and continuous on $\bar{D}$. If $u_1$ and $u_2$ agree on $\partial D$, then they are identical.
Proof. By Theorem 6.32 (ii) applied to $u_1 - u_2$ we obtain that
\[ \max_{x \in \bar{D}} (u_1(x) - u_2(x)) = \max_{x \in \partial D} (u_1(x) - u_2(x)) = 0, \]
and hence we obtain that $u_1(x) \le u_2(x)$ for all $x \in \bar{D}$. In the same way $u_2(x) \le u_1(x)$ for all $x \in \bar{D}$. Hence $u_1 = u_2$ on $\bar{D}$.
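Proof of Theorem 6.29. Since the domain $D$ is bounded, we get that $u$ is bounded. We will first show that $\Delta u = 0$ on $D$, by showing that $u$ satisfies condition (iii) of Theorem 6.30. Let $x \in D$. Then there exists $\delta > 0$ such that $\bar{B}(x, \delta) \subset D$. Let $\tau = \inf\{t > 0 : B_t \notin B(x, \delta)\}$. Then this is an a.s. finite stopping time, and hence applying the strong Markov property at $\tau$ we get
\begin{align*}
u(x) &= E_x[\varphi(B_{\tau_{\partial D}})] = E_x[E_x[\varphi(B_{\tau_{\partial D}}) \mid \mathcal{F}_\tau]] = E_x[E_{B_\tau}[\varphi(B_{\tau_{\partial D}})]] \\
&= E_x[u(B_\tau)] = \frac{1}{\sigma_{x,\delta}(\partial B(x, \delta))} \int_{\partial B(x, \delta)} u(y)\, d\sigma_{x,\delta}(y),
\end{align*}
where the last equality holds because, by rotational invariance of Brownian motion, $B_\tau$ is uniformly distributed on $\partial B(x, \delta)$ under $P_x$.
The uniqueness now follows from Corollary 6.33.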
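It remains to show that $u$ is continuous on $\bar{D}$. Clearly $u$ is continuous on $D$. So we only need to show that $u$ is continuous on $\partial D$. Let $z \in \partial D$. Since the domain $D$ satisfies the Poincaré cone condition, there exists $h > 0$ and a non-empty open cone $C_z$ with origin at $z$ such that $C_z \cap B(z, h) \subset D^c$.
Since $\varphi$ is continuous on $\partial D$, we get that for every $\varepsilon > 0$, there exists $0 < \delta \le h$ such that if $|y - z| \le \delta$ and $y \in \partial D$, then $|\varphi(y) - \varphi(z)| < \varepsilon$.
Let $x$ be such that $|x - z| \le 2^{-k} \delta$, for some $k > 0$. Then we have
\begin{align*}
|u(x) - u(z)| &= |E_x[\varphi(B_{\tau_{\partial D}})] - \varphi(z)| \le E_x[|\varphi(B_{\tau_{\partial D}}) - \varphi(z)|] \\
&\le \varepsilon\, P_x(\tau_{\partial D} < \tau_{\partial B(z, \delta)}) + 2\|\varphi\|_\infty\, P_x(\tau_{\partial B(z, \delta)} < \tau_{\partial D}) \\
&\le \varepsilon\, P_x(\tau_{\partial D} < \tau_{\partial B(z, \delta)}) + 2\|\varphi\|_\infty\, P_x(\tau_{\partial B(z, \delta)} < \tau_{C_z}).
\end{align*}
Now we note that
\[ P_x(\tau_{\partial B(z, \delta)} < \tau_{C_z}) \le a^k, \]
for some $a < 1$. Thus by choosing $k$ large enough, we can get this last probability as small as we like, and hence this completes the proof of continuity.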
We will now give an example where the domain does not satisfy the conditions of Theorem 6.29 and the function $u$ as defined there fails to solve the Dirichlet problem.
Example 6.34. Let $v$ be a solution of the Dirichlet problem on $B(0, 1)$ with boundary condition $\varphi : \partial B(0, 1) \to \mathbb{R}$. We now let $D = \{x \in \mathbb{R}^2 : 0 < |x| < 1\}$ be the punctured disc. We will show that the function $u(x) = E_x[\varphi(B_{\tau_{\partial D}})]$ given by Theorem 6.29 fails to solve the problem on $D$ with boundary condition $\varphi : \partial B(0, 1) \cup \{0\} \to \mathbb{R}$ if $\varphi(0) \ne v(0)$. Indeed, since planar Brownian motion does not hit points, the first hitting time of $\partial D = \partial B(0, 1) \cup \{0\}$ is equal a.s. to the first hitting time of $\partial B(0, 1)$. Therefore,
\[ u(0) = E_0[\varphi(B_{\tau_{\partial D}})] = v(0) \ne \varphi(0). \]
6.9 Donsker's invariance principle
In this section we will show that Brownian motion is the scaling limit of random walks with steps of mean 0 and finite variance. This can be seen as a generalization of the central limit theorem to processes.
For a function $f \in C([0, 1], \mathbb{R})$ we define its uniform norm $\|f\| = \sup_t |f(t)|$. The uniform norm makes $C([0, 1], \mathbb{R})$ into a metric space so we can consider weak convergence of probability measures. The associated Borel $\sigma$-algebra coincides with the $\sigma$-algebra generated by the coordinate functions.
Theorem 6.35. [Donsker's invariance principle] Let $(X_n,\ n \ge 1)$ be a sequence of $\mathbb{R}$-valued integrable independent random variables with common law $\mu$ such that
\[ \int x\, d\mu(x) = 0 \quad \text{and} \quad \int x^2\, d\mu(x) = \sigma^2 \in (0, \infty). \]
Let $S_0 = 0$ and $S_n = X_1 + \ldots + X_n$ and define a continuous process that interpolates linearly between values of $S$, namely
\[ S_t = (1 - \{t\}) S_{[t]} + \{t\} S_{[t]+1}, \quad t \ge 0, \]
where $[t]$ denotes the integer part of $t$ and $\{t\} = t - [t]$. Then $S^{[N]} := ((\sigma^2 N)^{-1/2} S_{Nt},\ 0 \le t \le 1)$ converges in distribution to a standard Brownian motion between times 0 and 1, i.e. for every bounded continuous function $F : C([0, 1], \mathbb{R}) \to \mathbb{R}$,
\[ E[F(S^{[N]})] \to E[F(B)] \quad \text{as } N \to \infty. \]
Remark 6.36. Note that from Donsker's theorem we can infer that (taking $\sigma = 1$) $N^{-1/2} \sup_{0 \le n \le N} S_n$ converges to $\sup_{0 \le t \le 1} B_t$ in distribution as $N \to \infty$, since the function $f \mapsto \sup f$ is a continuous operation on $C([0, 1], \mathbb{R})$.
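For the simple symmetric random walk the convergence in Remark 6.36 can be checked without simulation, because the law of the running maximum is known exactly from the reflection principle for the walk. The sketch below compares it with $P(\sup_{0 \le t \le 1} B_t \ge a) = 2P(B_1 \ge a)$ from Corollary 6.21; the helper names and the choice $N = 2500$, $a = 1$ are assumptions made for the example.

```python
import math


def walk_tail(N, m):
    """P(S_N >= m) for a simple symmetric walk, using S_N = 2K - N with K ~ Bin(N, 1/2)."""
    k_min = max(0, -((-(N + m)) // 2))  # integer ceiling of (N + m) / 2
    return sum(math.comb(N, k) for k in range(k_min, N + 1)) / 2 ** N


def walk_max_tail(N, b):
    """P(max_{n<=N} S_n >= b) for an integer b > 0: equals P(S_N >= b) + P(S_N > b)."""
    return walk_tail(N, b) + walk_tail(N, b + 1)


def brownian_max_tail(a):
    """P(sup_{0<=t<=1} B_t >= a) = 2 P(B_1 >= a) = erfc(a / sqrt(2))."""
    return math.erfc(a / math.sqrt(2))


N, a = 2500, 1.0
b = int(a * math.sqrt(N))
print(walk_max_tail(N, b), brownian_max_tail(a))  # the two values are close
```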
The proof of Theorem 6.35 that we will give uses a coupling of the random walk with the Brownian motion, called the Skorokhod embedding theorem. It is however specific to dimension $d = 1$.
Theorem 6.37. [Skorokhod embedding for random walks] Let $\mu$ be a probability measure on $\mathbb{R}$ of mean 0 and variance $\sigma^2 < \infty$. Then there exists a probability space $(\Omega, \mathcal{F}, P)$ with filtration $(\mathcal{F}_t)_{t \ge 0}$, on which is defined a Brownian motion $(B_t)_{t \ge 0}$ and a sequence of stopping times
\[ 0 = T_0 \le T_1 \le T_2 \le \ldots \]
such that, setting $S_n = B_{T_n}$,
(i) $(T_n)_{n \ge 0}$ is a random walk with steps of mean $\sigma^2$,
(ii) $(S_n)_{n \ge 0}$ is a random walk with step distribution $\mu$.
Proof. Define Borel measures $\mu_\pm$ on $[0, \infty)$ by
\[ \mu_\pm(A) = \mu(\pm A), \quad A \in \mathcal{B}([0, \infty)). \]
There exists a probability space on which are defined a Brownian motion $(B_t)_{t \ge 0}$ and a sequence $((X_n, Y_n) : n \in \mathbb{N})$ of independent random variables in $\mathbb{R}^2$ with law $\nu$ given by
\[ \nu(dx, dy) = C (x + y)\, \mu_-(dx)\, \mu_+(dy) \]
where $C$ is a suitable normalizing constant. Set $\mathcal{F}_0 = \sigma(X_n, Y_n : n \in \mathbb{N})$ and $\mathcal{F}_t = \sigma(\mathcal{F}_0, \mathcal{F}_t^B)$. Set $T_0 = 0$ and define inductively for $n \ge 0$
\[ T_{n+1} = \inf\{t \ge T_n : B_t - B_{T_n} = -X_{n+1} \text{ or } Y_{n+1}\}. \]
Then $T_n$ is a stopping time for all $n$. Note that, since $\mu$ has mean 0, we must have
\[ C \int_{-\infty}^0 (-x)\, \mu(dx) = C \int_0^\infty y\, \mu(dy) = 1. \]
Write $T = T_1$, $X = X_1$ and $Y = Y_1$.
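By Proposition 6.24, conditional on $X = x$ and $Y = y$, we have $T < \infty$ a.s. and
\[ P(B_T = Y \mid X, Y) = X/(X + Y) \quad \text{and} \quad E[T \mid X, Y] = XY. \]
So, for $A \in \mathcal{B}([0, \infty))$,
\[ P(B_T \in A) = \int_A \int_0^\infty \frac{x}{x + y}\, C (x + y)\, \mu_-(dx)\, \mu_+(dy) \]
so $P(B_T \in A) = \mu(A)$. A similar argument shows this identity holds also for $A \in \mathcal{B}((-\infty, 0])$. Next
\begin{align*}
E[T] &= \int_0^\infty \int_0^\infty xy\, C (x + y)\, \mu_-(dx)\, \mu_+(dy) \\
&= \int_{-\infty}^0 (-x)^2\, \mu(dx) + \int_0^\infty y^2\, \mu(dy) = \sigma^2.
\end{align*}
Now by the strong Markov property for each $n \ge 0$ the process $(B_{T_n + t} - B_{T_n})_{t \ge 0}$ is a Brownian motion, independent of $\mathcal{F}^B_{T_n}$. So by the above argument $B_{T_{n+1}} - B_{T_n}$ has law $\mu$, $T_{n+1} - T_n$ has mean $\sigma^2$, and both are independent of $\mathcal{F}^B_{T_n}$. The result follows.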
Proof of Theorem 6.35. We assume for this proof that $\sigma = 1$. This is enough by scaling. Let $(B_t)_{t \ge 0}$ be a Brownian motion and $(T_n)_{n \ge 1}$ be the sequence of stopping times as constructed in Theorem 6.37. Then $B_{T_n}$ is a random walk with the same distribution as $S_n$. Let $(S_t)_{t \ge 0}$ be the linear interpolation between the values of $(S_n)$.
For each $N \ge 1$ we set
\[ B_t^{(N)} = \sqrt{N}\, B_{N^{-1} t}, \]
which by the scaling invariance property of Brownian motion is again a Brownian motion. We now perform the Skorokhod embedding construction with $(B_t)_{t \ge 0}$ replaced by $(B_t^{(N)})_{t \ge 0}$, to obtain stopping times $T_n^{(N)}$. We then set $S_n^{(N)} = B^{(N)}_{T_n^{(N)}}$ and interpolate linearly to form $(S_t^{(N)})_{t \ge 0}$. Clearly, for all $N$ we have
\[ \left( (T_n^{(N)})_{n \ge 0},\ (S_t^{(N)})_{t \ge 0} \right) \sim \left( (T_n)_{n \ge 0},\ (S_t)_{t \ge 0} \right). \]

T
(N)
n
= N
1
T
(N)
n
and

S
(N)
t
= N
1/2
S
(N)
Nt
. Then
(
S
(N)
t
)
t0
(S
[N]
t
)
t0
and

S
(N)
n/N
= B
T
(N)
n
for all n. We need to show that for all bounded continuous functions
F : (([0, 1], R) R that as N
E[F(S
[N]
)] E[F(B)].
In fact we will show that for all > 0 we have
P
_
sup
0t1
S
(N)
t
B
t
>
_
0.
Since F is continuous, this implies that F(
S
(N)
) F(B) in probability, which by bounded
convergence is enough.
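Since $T_n$ is a random walk with increments of mean 1, by the strong law of large numbers we have that a.s.
\[ \frac{T_n}{n} \to 1 \quad \text{as } n \to \infty. \]
So we have that a.s.
\[ N^{-1} \sup_{n \le N} |T_n - n| \to 0 \quad \text{as } N \to \infty. \]
Hence for all $\delta > 0$ we have that as $N \to \infty$
\[ P\left( \sup_{n \le N} |\tilde{T}_n^{(N)} - n/N| > \delta \right) \to 0. \]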
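Since $\tilde{S}_{n/N}^{(N)} = B_{\tilde{T}_n^{(N)}}$ for all $n$, we have that for every $n/N \le t \le (n + 1)/N$ there exists $\tilde{T}_n^{(N)} \le u \le \tilde{T}_{n+1}^{(N)}$ such that $\tilde{S}_t^{(N)} = B_u$. This follows by the intermediate value theorem and the fact that $(\tilde{S}_t^{(N)})$ is the linear interpolation between its values at the times $n/N$. Hence we have
\begin{align*}
\{ |\tilde{S}_t^{(N)} - B_t| > \varepsilon \text{ for some } t \in [0, 1] \} \subset\ & \{ |\tilde{T}_n^{(N)} - n/N| > \delta \text{ for some } n \le N \} \\
& \cup \{ |B_u - B_t| > \varepsilon \text{ for some } t \in [0, 1] \text{ and } |u - t| \le \delta + 1/N \} \\
=\ & A_1 \cup A_2.
\end{align*}
The paths of $(B_t)_{t \ge 0}$ are uniformly continuous on $[0, 1]$. So for any $\varepsilon > 0$ we can find $\delta > 0$ so that $P(A_2) \le \varepsilon/2$ whenever $N \ge 1/\delta$. Then by choosing $N$ even larger we can ensure that $P(A_1) \le \varepsilon/2$ also. Hence $\tilde{S}^{(N)} \to B$ uniformly on $[0, 1]$ in probability as required.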
Remark 6.38. From the proof above we see that we can construct the Brownian motion and the random walk on the same space so that as $N \to \infty$
\[ P\left( \sup_{0 \le t \le 1} |S_t^{[N]} - B_t| > \varepsilon \right) \to 0. \]
6.10 Zeros of Brownian motion
Theorem 6.39. Let $(B_t)_{t \ge 0}$ be a one dimensional Brownian motion and let
\[ \text{Zeros} = \{t \ge 0 : B_t = 0\} \]
be its zero set. Then, almost surely, Zeros is a closed set with no isolated points.
Proof. Since Brownian motion is continuous almost surely, the zero set is closed a.s. To prove that no point is isolated we do the following: for each rational $q \in [0, \infty)$ we consider the first zero after $q$, i.e.
\[ \tau_q = \inf\{t \ge q : B_t = 0\}. \]
Note that $\tau_q$ is an almost surely finite stopping time. Since Zeros is a closed set, this infimum is almost surely a minimum. By the strong Markov property, applied to $\tau_q$, we have that for each $q$, almost surely $\tau_q$ is not an isolated zero from the right. But since the rational numbers form a countable set we get that almost surely for all rational $q$, the zero $\tau_q$ is not isolated from the right.
The next thing to prove is that the remaining points of Zeros are not isolated from the left. We claim that any $t > 0$ in the zero set which is different from $\tau_q$ for all rational $q$ is not an isolated point from the left. Take a sequence $q_n \uparrow t$ with $q_n \in \mathbb{Q}$. Define $t_n = \tau_{q_n}$. Clearly $q_n \le t_n < t$ and so $t_n \uparrow t$. Thus $t$ is not isolated from the left.
Theorem 6.40. Fix $t \ge 0$. Then, almost surely, Brownian motion in one dimension is not differentiable at $t$.
Proof. Exercise.
But also a much stronger statement is true, namely
Theorem 6.41. [Paley, Wiener and Zygmund 1933] Almost surely, Brownian motion in one dimension is nowhere differentiable.
7 Poisson random measures
7.1 Construction and basic properties
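For $\lambda \in (0, \infty)$ we say that a random variable $X$ in $\mathbb{Z}_+$ is Poisson of parameter $\lambda$ and write $X \sim P(\lambda)$ if
\[ P(X = n) = e^{-\lambda} \lambda^n / n! \]
We also write $X \sim P(0)$ to mean $X \equiv 0$ and write $X \sim P(\infty)$ to mean $X \equiv \infty$.
Proposition 7.1. [Addition property] Let $N_k$, $k \in \mathbb{N}$, be independent random variables, with $N_k \sim P(\lambda_k)$ for all $k$. Then
\[ \sum_k N_k \sim P\left( \sum_k \lambda_k \right). \]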
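For two summands the addition property can be checked directly by convolving the probability mass functions. The following is a small numerical sketch; the parameter values and function names are assumptions chosen for illustration.

```python
import math


def poisson_pmf(lam, n):
    """P(X = n) for X ~ P(lam)."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def convolved_pmf(lam1, lam2, n):
    """P(N_1 + N_2 = n) for independent N_1 ~ P(lam1) and N_2 ~ P(lam2)."""
    return sum(poisson_pmf(lam1, j) * poisson_pmf(lam2, n - j) for j in range(n + 1))


lam1, lam2 = 1.5, 2.25
for n in range(6):
    # The two columns agree up to floating point error.
    print(n, convolved_pmf(lam1, lam2, n), poisson_pmf(lam1 + lam2, n))
```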
Proposition 7.2. [Splitting property] Let $N, Y_n$, $n \in \mathbb{N}$, be independent random variables, with $N \sim P(\lambda)$, $\lambda < \infty$ and $P(Y_n = j) = p_j$, for $j = 1, \ldots, k$ and all $n$. Set
\[ N_j = \sum_{n=1}^N 1(Y_n = j). \]
Then $N_1, \ldots, N_k$ are independent random variables with $N_j \sim P(\lambda p_j)$ for all $j$.
Proof. Left as an exercise.
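Let $(E, \mathcal{E}, \mu)$ be a $\sigma$-finite measure space. A Poisson random measure with intensity $\mu$ is a map
\[ M : \Omega \times \mathcal{E} \to \mathbb{Z}_+ \cup \{\infty\} \]
satisfying, for all sequences $(A_k : k \in \mathbb{N})$ of disjoint sets in $\mathcal{E}$,
(i) $M(\cup_k A_k) = \sum_k M(A_k)$,
(ii) $M(A_k)$, $k \in \mathbb{N}$, are independent random variables,
(iii) $M(A_k) \sim P(\mu(A_k))$ for all $k$.
Denote by $E^*$ the set of $\mathbb{Z}_+ \cup \{\infty\}$-valued measures on $\mathcal{E}$ and define, for $A \in \mathcal{E}$,
\[ X : E^* \times \mathcal{E} \to \mathbb{Z}_+ \cup \{\infty\}, \quad X_A : E^* \to \mathbb{Z}_+ \cup \{\infty\} \]
by
\[ X(m, A) = X_A(m) = m(A). \]
Set $\mathcal{E}^* = \sigma(X_A : A \in \mathcal{E})$.
Theorem 7.3. There exists a unique probability measure $\mu^*$ on $(E^*, \mathcal{E}^*)$ such that under $\mu^*$ the map $X$ is a Poisson random measure with intensity $\mu$.
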
Proof. (Uniqueness.) For disjoint sets $A_1, \ldots, A_k \in \mathcal{E}$ and $n_1, \ldots, n_k \in \mathbb{Z}_+$, set
\[ A^* = \{m \in E^* : m(A_1) = n_1, \ldots, m(A_k) = n_k\}. \]
Then, for any measure $\mu^*$ making $X$ a Poisson random measure with intensity $\mu$,
\[ \mu^*(A^*) = \prod_{j=1}^k e^{-\mu(A_j)} \mu(A_j)^{n_j} / n_j! \]
Since the set of such sets $A^*$ is a $\pi$-system generating $\mathcal{E}^*$, this implies that $\mu^*$ is uniquely determined on $\mathcal{E}^*$.
(Existence.) Consider first the case where $\lambda = \mu(E) < \infty$. There exists a probability space $(\Omega, \mathcal{F}, P)$ on which are defined independent random variables $N$ and $Y_n$, $n \in \mathbb{N}$, with $N \sim P(\lambda)$ and $Y_n \sim \mu/\lambda$ for all $n$. Set
\[ M(A) = \sum_{n=1}^N 1(Y_n \in A), \quad A \in \mathcal{E}. \tag{7.1} \]
It is easy to check, by the Poisson splitting property, that $M$ is a Poisson random measure with intensity $\mu$. Indeed, for disjoint $A_1, \ldots, A_k$ in $\mathcal{E}$ with finite measures, we let $X_n = j$ whenever $Y_n \in A_j$, so that $M(A_j)$, $1 \le j \le k$, are independent $P(\mu(A_j))$, $1 \le j \le k$, random variables.
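The finite-intensity construction (7.1) translates directly into a sampling procedure. The sketch below takes $E = [0,1]^2$ with $\mu = \lambda \cdot$ Lebesgue, so that $Y_n$ is uniform on the square; this choice of space, the inversion-based Poisson sampler and all function names are assumptions made for the example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Sample N ~ P(lam) by inverting the cumulative distribution (fine for moderate lam)."""
    u, n = rng.random(), 0
    p = math.exp(-lam)      # p = P(N = n), starting at n = 0
    cumulative = p
    while u > cumulative and p > 0.0:
        n += 1
        p *= lam / n
        cumulative += p
    return n


def sample_points(lam, rng):
    """Points Y_1, ..., Y_N of a Poisson random measure on [0,1]^2 with intensity lam * Lebesgue."""
    return [(rng.random(), rng.random()) for _ in range(sample_poisson(lam, rng))]


def M(points, indicator):
    """M(A) = number of points in A, as in (7.1); the set A is given by its indicator."""
    return sum(1 for y in points if indicator(y))


rng = random.Random(0)
points = sample_points(20.0, rng)
left = lambda y: y[0] < 0.5
right = lambda y: y[0] >= 0.5
print(len(points), M(points, left), M(points, right))
```

By the splitting property, the counts in the two half-squares are independent $P(10)$ variables, and by construction they add up to the total number of points.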
More generally, if $(E, \mathcal{E}, \mu)$ is $\sigma$-finite, then there exist disjoint sets $E_k \in \mathcal{E}$, $k \in \mathbb{N}$, such that $\cup_k E_k = E$ and $\mu(E_k) < \infty$ for all $k$. We can construct, on some probability space, independent Poisson random measures $M_k$, $k \in \mathbb{N}$, with $M_k$ having intensity $\mu|_{E_k}$. Set
\[ M(A) = \sum_{k \in \mathbb{N}} M_k(A \cap E_k), \quad A \in \mathcal{E}. \]
It is easy to check, by the Poisson addition property, that $M$ is a Poisson random measure with intensity $\mu$. The law $\mu^*$ of $M$ on $E^*$ is then a measure with the required properties.

The above construction gives the following important property of Poisson random measures.
Proposition 7.4. Let $M$ be a Poisson random measure on $E$ with intensity $\mu$, and let $A \in \mathcal{E}$ be such that $\mu(A) < \infty$. Then $M(A)$ has law $P(\mu(A))$, and given $M(A) = k$, the restriction $M|_A$ has the same law as $\sum_{i=1}^k \delta_{X_i}$, where $(X_1, \ldots, X_k)$ are independent with law $\mu(\cdot \cap A)/\mu(A)$. Moreover, if $A, B \in \mathcal{E}$ are disjoint, then the restrictions $M|_A$, $M|_B$ are independent.
Exercise 7.5. Let $E = \mathbb{R}_+$ and $\mu = \theta 1(t \ge 0)\, dt$. Let $M$ be a Poisson random measure on $\mathbb{R}_+$ with intensity measure $\mu$ and let $(T_n)_{n \ge 1}$ and $T_0 = 0$ be a sequence of random variables such that $(T_n - T_{n-1},\ n \ge 1)$ are independent exponential random variables with parameter $\theta > 0$. Then
\[ \left( N_t = \sum_{n \ge 1} 1(T_n \le t),\ t \ge 0 \right) \quad \text{and} \quad (N'_t = M([0, t]),\ t \ge 0) \]
have the same distribution.
7.2 Integrals with respect to a Poisson random measure
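Theorem 7.6. Let $M$ be a Poisson random measure on $E$ with intensity $\mu$. If $f \in L^1(\mu)$, then $M(f)$ defined by
\[ M(f) = \int_E f(y)\, M(dy) \]
is integrable, and
\[ E[M(f)] = \int_E f(y)\, \mu(dy), \quad \mathrm{var}(M(f)) = \int_E f(y)^2\, \mu(dy). \]
Let $f : E \to \mathbb{R}_+$ be a measurable function. Then for $u > 0$
\[ E\left[ e^{-u M(f)} \right] = \exp\left( -\int_E (1 - e^{-u f(y)})\, \mu(dy) \right). \]
Let $f : E \to \mathbb{R}$ be in $L^1(\mu)$. Then for any $u$
\[ E\left[ e^{i u M(f)} \right] = \exp\left( \int_E (e^{i u f(y)} - 1)\, \mu(dy) \right). \]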
Proof. First assume that $f = 1(A)$, for $A \in \mathcal{E}$. Then $M(A)$ is a random variable by definition of $M$ and this extends to any finite linear combination of indicators. Since any measurable non-negative function is the increasing limit of finite linear combinations of such indicator functions, we obtain by monotone convergence that $M(f)$ is a random variable as a limit of random variables.
Let $E_n$, $n \ge 0$, be a measurable partition of $E$ into sets of finite $\mu$-measure. A similar approximation argument shows that $M(f 1(E_n))$, $n \ge 0$, are independent random variables.
Let $f \in L^1(\mu)$. We will first show the formula for the expectation and the variance. If $f = 1(A)$, then this is clear. This extends to finite linear combinations and to any non-negative measurable functions by approximation. For a general $f$, we do the standard procedure, separating into $f = f^+ - f^-$ and use the fact that $M(f^+)$ and $M(f^-)$ are independent.
Since by Proposition 7.4 given $M(E_n) = k$, the restriction $M|_{E_n}$ has the same law as $\sum_{i=1}^k \delta_{X_i}$, where $(X_1, \ldots, X_k)$ are independent with law $\mu(\cdot \cap E_n)/\mu(E_n)$, we get
\begin{align*}
E[\exp(-u M(f 1(E_n)))] &= \sum_{k=0}^\infty E[\exp(-u M(f 1(E_n))) \mid M(E_n) = k]\, P(M(E_n) = k) \\
&= \sum_{k=0}^\infty e^{-\mu(E_n)} \frac{\mu(E_n)^k}{k!} \left( \int_{E_n} e^{-u f(x)} \frac{\mu(dx)}{\mu(E_n)} \right)^k \\
&= e^{-\mu(E_n)} \exp\left( \int_{E_n} e^{-u f(x)}\, \mu(dx) \right) \\
&= \exp\left( -\int_{E_n} \mu(dx) (1 - \exp(-u f(x))) \right).
\end{align*}
Since the random variables $M(f 1(E_n))$ are independent over $n \ge 0$, we can take products over $n \ge 0$ and by monotone convergence we obtain the wanted formula.
The formula in the case where $f \in L^1(\mu)$ follows by the same kind of arguments. We first establish the formula for $f 1(E_n)$ in place of $f$. Then to obtain the result, we must show that
\[ \int_{A_n} \mu(dx) (e^{i u f(x)} - 1) \to \int_E \mu(dx) (e^{i u f(x)} - 1) \quad \text{as } n \to \infty, \]
where $A_n = E_0 \cup \ldots \cup E_n$. But since $|e^{ix} - 1| \le |x|$ for all $x$, we have that
\[ |e^{i u f(x)} - 1| \le |u f(x)|, \]
whence the function under consideration is integrable with respect to $\mu$, which by dominated convergence gives the result.
7.3 Poisson Brownian motions
In this section we are going to consider Poisson random measures in $\mathbb{R}^d$ for $d \ge 1$ with intensity measure given by $\mu = \lambda\, dx$, i.e. multiples of the Lebesgue measure in $d$ dimensions.
Let $\Pi$ be a Poisson random measure in $\mathbb{R}^d$ of intensity $\lambda$ (this means $\lambda$ times Lebesgue measure). Note that the construction of Theorem 7.3 gives that $\Pi$ can be written as
\[ \Pi = \sum_{i=1}^\infty \delta_{X_i}, \]
where $X_i$ are random variables, since the Lebesgue measure of the whole space is infinite. We will sometimes say Poisson point process to mean a Poisson random measure in $\mathbb{R}^d$.
Proposition 7.7. [Thinning property] Let $\Pi = \{X_i\}$ be a Poisson point process in $\mathbb{R}^d$ of intensity $\lambda$. For each point $X_i$ we perform an independent experiment and we keep it with probability $p(X_i)$ and we remove it with the complementary probability, where $p : \mathbb{R}^d \to [0, 1]$ is a measurable function. Thus we define a new process $\tilde{\Pi}$ that contains the points $X_i$ that we kept. The process $\tilde{\Pi}$ is a Poisson random measure in $\mathbb{R}^d$ with intensity $\mu(A) = \lambda \int_A p(x)\, dx$.
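Proof. The independence property follows easily from the independence property of $\Pi$. We will now show that for any set $A$ with finite volume we have $\tilde{\Pi}(A) \sim P(\mu(A))$, where $\mu$ is the intensity measure given in the statement. By Proposition 7.4 we have
\begin{align*}
P(\tilde{\Pi}(A) = k) &= \sum_{n \ge k} P(\Pi(A) = n,\ \tilde{\Pi}(A) = k) \\
&= \sum_{n \ge k} e^{-\lambda \mathrm{vol}(A)} \frac{(\lambda \mathrm{vol}(A))^n}{n!} \binom{n}{k} \left( \int_A p(x) \frac{dx}{\mathrm{vol}(A)} \right)^k \left( \int_A (1 - p(x)) \frac{dx}{\mathrm{vol}(A)} \right)^{n-k} \\
&= e^{-\lambda \mathrm{vol}(A)} \frac{\lambda^k}{k!} \left( \int_A p(x)\, dx \right)^k \sum_{n \ge k} \frac{\lambda^{n-k}}{(n - k)!} \left( \int_A (1 - p(x))\, dx \right)^{n-k} \\
&= e^{-\lambda \mathrm{vol}(A)} \frac{\lambda^k}{k!} \left( \int_A p(x)\, dx \right)^k \exp\left( \lambda \int_A (1 - p(x))\, dx \right) \\
&= \exp\left( -\lambda \int_A p(x)\, dx \right) \frac{\left( \lambda \int_A p(x)\, dx \right)^k}{k!}.
\end{align*}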
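The series manipulation in the proof can be sanity-checked numerically. In the sketch below `total_mean` stands for $\lambda\, \mathrm{vol}(A)$ and `q` for the average keeping probability $\int_A p(x)\, dx / \mathrm{vol}(A)$; these names, the truncation level and the sample values are assumptions made for the example.

```python
import math


def poisson_pmf(mean, n):
    """P(X = n) for X ~ P(mean)."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def thinned_count_pmf(total_mean, q, k, n_max=150):
    """P(kept = k) when the number of points is P(total_mean) and, given that number,
    each point is kept independently with probability q (series truncated at n_max)."""
    return sum(
        poisson_pmf(total_mean, n) * math.comb(n, k) * q ** k * (1 - q) ** (n - k)
        for n in range(k, n_max)
    )


total_mean, q = 4.0, 0.3
for k in range(5):
    # The mixture of binomials matches the Poisson law with mean total_mean * q.
    print(k, thinned_count_pmf(total_mean, q, k), poisson_pmf(total_mean * q, k))
```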
Proposition 7.8. Let $\Pi = \{X_i\}$ be a Poisson point process in $\mathbb{R}^d$ of intensity $\lambda$. Let $(Y_i)$ be i.i.d. random variables with law $\nu$. Define the measure $\Pi' = \{X_i + Y_i\}$. Then $\Pi'$ is again a Poisson point process of the same intensity as $\Pi$.
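Proof. It suffices to check that for any $u > 0$ and $f : \mathbb{R}^d \to \mathbb{R}_+$ we have
\[
\mathbb{E}\left[ e^{-u \tilde\Phi(f)} \right] = \exp\left( \lambda \int_{\mathbb{R}^d} \left( e^{-u f(x)} - 1 \right) dx \right).
\]
We can write
\[
\mathbb{E}\left[ e^{-u \tilde\Phi(f)} \right] = \mathbb{E}\left[ e^{-u \sum_i f(X_i + Y_i)} \right]
\]
and conditioning on the $X_i$ and using the independence of the $(Y_i)$'s we obtain
\begin{align*}
\mathbb{E}\left[ e^{-u \tilde\Phi(f)} \right] &= \mathbb{E}\left[ \mathbb{E}\left[ e^{-u \sum_i f(X_i + Y_i)} \,\middle|\, \Phi \right] \right]
= \mathbb{E}\left[ \prod_i \int_{\mathbb{R}^d} e^{-u f(X_i + y)}\, \nu(dy) \right] \\
&= \mathbb{E}\left[ \exp\left( \log \prod_i \int_{\mathbb{R}^d} e^{-u f(X_i + y)}\, \nu(dy) \right) \right] \\
&= \mathbb{E}\left[ \exp\left( \sum_i \log \int_{\mathbb{R}^d} e^{-u f(X_i + y)}\, \nu(dy) \right) \right] \\
&= \mathbb{E}\left[ \exp\left( \Phi(g) \right) \right],
\end{align*}
where $g(x) = \log \int_{\mathbb{R}^d} e^{-u f(x + y)}\, \nu(dy)$. By Theorem 7.6 we have
\begin{align*}
\mathbb{E}\left[ \exp\left( \Phi(g) \right) \right] &= \exp\left( \lambda \int_{\mathbb{R}^d} \left( \exp\left( \log \int_{\mathbb{R}^d} e^{-u f(x + y)}\, \nu(dy) \right) - 1 \right) dx \right) \\
&= \exp\left( \lambda \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} e^{-u f(x + y)}\, \nu(dy) - 1 \right) dx \right) \\
&= \exp\left( \lambda \int_{\mathbb{R}^d} \left( e^{-u f(x)} - 1 \right) dx \right),
\end{align*}
where in the last step we used Fubini's theorem, the translation invariance of Lebesgue measure and the fact that $\nu$ is a probability measure
on $\mathbb{R}^d$.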
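Proposition 7.8 can be illustrated by a small simulation in dimension $d = 1$. This is a sketch under stated assumptions, not part of the notes: we take the displacement law $\nu$ to be standard Gaussian, we restrict the process to the window $[-8, 9]$ (points outside it essentially never reach $[0, 1]$), and the function names are hypothetical. If the displaced process is Poisson of intensity $\lambda$, the number of displaced points in $[0, 1]$ should have mean and variance both close to $\lambda$.

```python
import random


def poisson_points(rate: float, left: float, right: float, rng: random.Random) -> list[float]:
    """Points of a rate-`rate` Poisson process on [left, right], built from exponential gaps."""
    points = []
    x = left + rng.expovariate(rate)
    while x < right:
        points.append(x)
        x += rng.expovariate(rate)
    return points


def displaced_counts(rate: float, trials: int, seed: int = 0) -> list[int]:
    """For each trial, the number of displaced points X_i + Y_i that land in [0, 1]."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        xs = poisson_points(rate, -8.0, 9.0, rng)
        # Assumption: the law nu of the displacements Y_i is standard Gaussian.
        moved = (x + rng.gauss(0.0, 1.0) for x in xs)
        counts.append(sum(1 for z in moved if 0.0 <= z <= 1.0))
    return counts


def mean_and_variance(values: list[int]) -> tuple[float, float]:
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


if __name__ == "__main__":
    m, v = mean_and_variance(displaced_counts(rate=2.0, trials=4000))
    print(f"mean = {m:.3f}, variance = {v:.3f} (both should be near 2)")
```

Matching mean and variance is of course only a consistency check; the proposition itself is the statement about the full law.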
For the rest of this section we are going to consider the following model: let $\Phi(0)$ be a Poisson
point process in $\mathbb{R}^d$ of intensity $\lambda$, let $\Phi(0) = \{X_i\}$. We now let each point of the Poisson
process move independently according to a standard Brownian motion in $d$ dimensions.
Namely the point $X_i$ moves according to the Brownian motion $(\xi_i(t))_{t \ge 0}$. This way at every
time $t$ we obtain a new process $\Phi(t) = \{X_i + \xi_i(t)\}$, which by Proposition 7.8 is again a
Poisson point process of intensity $\lambda$.
We can think of the points of the Poisson process $\Phi(0)$ as the users of a wireless network
that can communicate with each other when they are at distance at most $r$ from each other.
So it is natural to introduce mobility to the model and this is why we let them evolve in
space.
We now fix a target particle which is at the origin of $\mathbb{R}^d$ and we are interested in the first
time that one of the points of the Poisson process is within distance $r$ from it, i.e. we define
\[
T_{\mathrm{det}} = \inf\Big\{ t \ge 0 : 0 \in \bigcup_i B(X_i + \xi_i(t), r) \Big\},
\]
where $B(x, r)$ stands for the ball centred at $x$ of radius $r$.
Theorem 7.9. [Stochastic geometry formula] Let $\xi$ be a standard Brownian motion in
$d$ dimensions and let $W(t) = \bigcup_{s \le t} B(\xi(s), r)$ be the so-called Wiener sausage up to time $t$.
Then, for any dimension $d \ge 1$, the detection probability satisfies
\[
\mathbb{P}(T_{\mathrm{det}} > t) = \exp\left( -\lambda\, \mathbb{E}[\operatorname{vol}(W(t))] \right).
\]
[Figure: Random walk sausage]
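Proof. Let $\tilde\Phi$ be the set of points of $\Phi(0)$ that have detected $0$ by time $t$, that is
\[
\tilde\Phi = \{ X_i \in \Phi(0) : \exists\, s \le t \text{ s.t. } 0 \in B(X_i + \xi_i(s), r) \}.
\]
Since the $\xi_i$'s are independent we have by Proposition 7.7 that $\tilde\Phi$ is a thinned Poisson point
process with intensity $\Lambda(x)\,dx$ where $\Lambda$ is given by
\[
\Lambda(x) = \lambda\, \mathbb{P}\Big( x \in \bigcup_{s \le t} B(\xi(s), r) \Big),
\]
for $\xi$ a standard Brownian motion (here we used the symmetry of Brownian motion).
So for the probability that the detection time is greater than $t$ we have that
\begin{align*}
\mathbb{P}(T_{\mathrm{det}} > t) = \mathbb{P}(\tilde\Phi(\mathbb{R}^d) = 0) &= \exp\left( -\lambda \int_{\mathbb{R}^d} \mathbb{P}\Big( x \in \bigcup_{s \le t} B(\xi(s), r) \Big)\, dx \right) \\
&= \exp\left( -\lambda\, \mathbb{E}\Big[ \operatorname{vol}\Big( \bigcup_{s \le t} B(\xi(s), r) \Big) \Big] \right) = \exp\left( -\lambda\, \mathbb{E}[\operatorname{vol}(W(t))] \right),
\end{align*}
where the third equality follows by Fubini.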
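As an illustration of Theorem 7.9, the following sketch estimates $\mathbb{E}[\operatorname{vol}(W(t))]$ by Monte Carlo in dimension $d = 1$ and plugs it into the formula for $\mathbb{P}(T_{\mathrm{det}} > t)$. In one dimension the sausage of a continuous path is the interval from the running minimum minus $r$ to the running maximum plus $r$. Everything else here is an assumption of the sketch: the Brownian path is replaced by a Gaussian random walk with `steps` increments (which slightly underestimates the range), and the function names and default parameters are ours.

```python
import math
import random


def sausage_volume_1d(t: float, r: float, steps: int, rng: random.Random) -> float:
    """Length of [min - r, max + r] for a discretised Brownian path on [0, t]."""
    sd = math.sqrt(t / steps)  # standard deviation of one increment
    pos = lo = hi = 0.0
    for _ in range(steps):
        pos += rng.gauss(0.0, sd)
        lo = min(lo, pos)
        hi = max(hi, pos)
    return (hi - lo) + 2.0 * r


def survival_probability(lam: float, t: float, r: float,
                         paths: int = 2000, steps: int = 400,
                         seed: int = 1) -> tuple[float, float]:
    """Return (estimate of P(T_det > t), estimate of E[vol(W(t))]) in d = 1."""
    rng = random.Random(seed)
    mean_vol = sum(sausage_volume_1d(t, r, steps, rng) for _ in range(paths)) / paths
    return math.exp(-lam * mean_vol), mean_vol


if __name__ == "__main__":
    p, vol = survival_probability(lam=1.0, t=1.0, r=0.5)
    print(f"E[vol(W(1))] ~ {vol:.3f}, P(T_det > 1) ~ {p:.3f}")
```

The estimated volume can be compared with the $d = 1$ value $\sqrt{8t/\pi} + 2r$ stated in Theorem 7.10; the discretisation bias makes the estimate a little smaller.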
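Theorem 7.10. The expected volume of the Wiener sausage $W(t) = \bigcup_{s \le t} B(\xi(s), r)$ satisfies
as $t \to \infty$
\[
\mathbb{E}[\operatorname{vol}(W(t))] =
\begin{cases}
\sqrt{\dfrac{8t}{\pi}} + 2r & \text{for } d = 1, \\[2ex]
\dfrac{2\pi t}{\log t}\,(1 + o(1)) & \text{for } d = 2, \\[2ex]
\dfrac{2\pi^{d/2} r^{d-2} t}{\Gamma\left(\frac{d-2}{2}\right)}\,(1 + o(1)) & \text{for } d \ge 3.
\end{cases}
\]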
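To get a feeling for the three regimes of Theorem 7.10, the leading terms can be tabulated directly. The helper below is a hypothetical convenience function of ours (the name and the error handling are assumptions); it returns the exact value for $d = 1$ and only the leading-order term, without the $(1 + o(1))$ correction, for $d \ge 2$.

```python
import math


def expected_sausage_volume(t: float, r: float, d: int) -> float:
    """Leading-order value of E[vol(W(t))] from Theorem 7.10 (exact for d = 1)."""
    if d < 1:
        raise ValueError("the dimension d must be at least 1")
    if d == 1:
        return math.sqrt(8.0 * t / math.pi) + 2.0 * r
    if d == 2:
        if t <= 1.0:
            raise ValueError("the d = 2 asymptotic needs t > 1 so that log t > 0")
        return 2.0 * math.pi * t / math.log(t)
    return 2.0 * math.pi ** (d / 2) * r ** (d - 2) * t / math.gamma((d - 2) / 2)


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(d, expected_sausage_volume(t=100.0, r=1.0, d=d))
```

For instance, in $d = 3$ we have $\Gamma(1/2) = \sqrt{\pi}$, so the leading term reduces to $2\pi r t$.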
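Proof. Dimension $d = 1$ is left as an exercise.
For all $d$ we have that
\begin{align*}
\mathbb{E}[\operatorname{vol}(W(t))] &= \int_{\mathbb{R}^d} \mathbb{P}\Big( y \in \bigcup_{s \le t} B(\xi(s), r) \Big)\, dy = \int_{\mathbb{R}^d} \mathbb{P}(\tau_{B(y,r)} \le t)\, dy \\
&= \operatorname{vol}(B(0, r)) + \int_{\mathbb{R}^d \setminus B(0,r)} \mathbb{P}(\tau_{B(y,r)} \le t)\, dy,
\end{align*}
where $\tau_A$ is the first hitting time of the set $A$ by the Brownian motion. Define
\[
Z^y_t = \int_0^t \mathbf{1}(\xi(s) \in B(y, r))\, ds, \tag{7.2}
\]
i.e. the time that the Brownian motion spends in the ball $B(y, r)$ before time $t$. It is clear
by the continuity of the Brownian paths that, up to an event of probability zero, $\{Z^y_t > 0\} = \{\tau_{B(y,r)} \le t\}$. We now have
\[
\mathbb{P}(Z^y_t > 0) = \frac{\mathbb{E}[Z^y_t]}{\mathbb{E}[Z^y_t \mid Z^y_t > 0]}
\]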
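and for the first moment we have
\begin{align*}
\mathbb{E}[Z^y_t] = \int_0^t \mathbb{P}_0(\xi(s) \in B(y, r))\, ds &= \int_0^t \int_{B(y,r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-\frac{|z|^2}{2s}}\, dz\, ds \\
&= \int_0^t \int_{B(0,r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-\frac{|z+y|^2}{2s}}\, dz\, ds
\end{align*}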
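and for the conditional expectation $\mathbb{E}[Z^y_t \mid Z^y_t > 0]$, if we write $T$ for the first time that the
Brownian motion hits the boundary of the ball $B(y, r)$, then we get that in 2 dimensions for
all $y \notin B(0, r)$
\begin{align*}
\mathbb{E}[Z^y_t \mid Z^y_t > 0] &= \mathbb{E}\left[ \int_T^t \mathbf{1}(\xi(s) \in B(y, r))\, ds \,\middle|\, T \le t \right] \le \int_0^t \mathbb{P}_0(\xi(s) \in B(0, r))\, ds \\
&\le 1 + \int_1^t \int_{B(0,r)} \frac{1}{2\pi s}\, e^{-\frac{|z|^2}{2s}}\, dz\, ds \le 1 + \frac{r^2}{2} \log t.
\end{align*}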
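In dimensions $d \ge 3$ we have for all $y \notin B(0, r)$, by the strong Markov property at time $T$ and rotation invariance,
\begin{align*}
\mathbb{E}[Z^y_t \mid Z^y_t > 0] &= \mathbb{E}\left[ \int_T^t \mathbf{1}(\xi(s) \in B(y, r))\, ds \,\middle|\, T \le t \right] \le \int_0^t \mathbb{P}_0(\xi(s) \in B((0, r), r))\, ds \\
&= \int_0^t \int_{B((0,r),r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-\frac{|z|^2}{2s}}\, dz\, ds = \frac{1}{2\pi^{d/2}} \int_{B((0,r),r)} \frac{1}{|z|^{d-2}} \int_{\frac{|z|^2}{2t}}^{\infty} s^{d/2-2} e^{-s}\, ds\, dz,
\end{align*}
where $B((0, r), r)$ stands for the ball centred at $(0, \dots, 0, r)$ and of radius $r$ and the last step
follows by a change of variable. Now notice that
\[
\int_{\frac{|z|^2}{2t}}^{\infty} s^{d/2-2} e^{-s}\, ds \to \Gamma\left( \frac{d-2}{2} \right) \quad \text{as } t \to \infty
\]
and by the mean value property for the harmonic function $1/|z|^{d-2}$ we get that
\[
\int_{B((0,r),r)} \frac{dz}{|z|^{d-2}} = \operatorname{vol}(B(0, 1))\, r^2.
\]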
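The change of variable used above, and in particular the prefactor $1/(2\pi^{d/2})$, can be sanity-checked numerically. The sketch below does this for $d = 4$, where $s^{d/2-2} = 1$ and the inner integral is simply $e^{-|z|^2/(2t)}$. The midpoint rule, the step count `n`, the symbol `a` for $|z|$ and the function names are assumptions made for this illustration.

```python
import math


def time_integral(a: float, t: float, d: int, n: int = 20000) -> float:
    """Midpoint-rule value of the integral over s in (0, t) of
    (2 pi s)^(-d/2) * exp(-a^2 / (2 s)), where a plays the role of |z|."""
    h = t / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += (2.0 * math.pi * s) ** (-d / 2) * math.exp(-a * a / (2.0 * s))
    return total * h


def closed_form_d4(a: float, t: float) -> float:
    """Right-hand side after the change of variable for d = 4:
    (1 / (2 pi^2)) * a^(-2) * exp(-a^2 / (2 t))."""
    return math.exp(-a * a / (2.0 * t)) / (2.0 * math.pi**2 * a * a)


if __name__ == "__main__":
    a, t = 1.0, 2.0
    print(time_integral(a, t, d=4), closed_form_d4(a, t))
```

The two printed numbers should agree to several digits, which supports the constant in front of the double integral.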
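So, putting all things together we obtain that in 2 dimensions
\begin{align*}
\mathbb{E}[\operatorname{vol}(W(t))] &= \operatorname{vol}(B(0, r)) + \int_{\mathbb{R}^2 \setminus B(0,r)} \frac{\mathbb{E}[Z^y_t]}{\mathbb{E}[Z^y_t \mid Z^y_t > 0]}\, dy \\
&\ge \operatorname{vol}(B(0, r)) + \frac{\displaystyle \int_0^t \int_{B(0,r)} \left( \int_{\mathbb{R}^2} \frac{1}{2\pi s}\, e^{-\frac{|z+y|^2}{2s}}\, dy \right) dz\, ds - \int_{B(0,r)} \mathbb{E}[Z^y_t]\, dy}{1 + \frac{r^2}{2} \log t} \\
&= \operatorname{vol}(B(0, r)) + \frac{2\pi t r^2}{2 + r^2 \log t} - \frac{2 \int_{B(0,r)} \mathbb{E}[Z^y_t]\, dy}{2 + r^2 \log t}.
\end{align*}
It is easy to see that $\int_{B(0,r)} \mathbb{E}[Z^y_t]\, dy = O(\log t)$ and hence in 2 dimensions we get
\[
\liminf_{t \to \infty} \frac{\mathbb{E}[\operatorname{vol}(W(t))] \log t}{2\pi t} \ge 1.
\]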
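In $d \ge 3$ we obtain in the same way as above
\[
\liminf_{t \to \infty} \frac{\mathbb{E}[\operatorname{vol}(W(t))]\, \Gamma\left( \frac{d-2}{2} \right)}{2\pi^{d/2} r^{d-2} t} \ge 1,
\]
since $\int_{B(0,r)} \mathbb{E}[Z^y_t]\, dy = O(1)$. It remains to show that in 2 dimensions
\[
\limsup_{t \to \infty} \frac{\mathbb{E}[\operatorname{vol}(W(t))] \log t}{2\pi t} \le 1 \tag{7.3}
\]
and in $d \ge 3$ that
\[
\limsup_{t \to \infty} \frac{\mathbb{E}[\operatorname{vol}(W(t))]\, \Gamma\left( \frac{d-2}{2} \right)}{2\pi^{d/2} r^{d-2} t} \le 1. \tag{7.4}
\]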
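Let $\varepsilon > 0$. We define $\tilde Z^y_t = \int_0^{t(1+\varepsilon)} \mathbf{1}(\xi(s) \in B(y, r))\, ds$ and use the obvious inequality
\[
\mathbb{P}(Z^y_t > 0) \le \frac{\mathbb{E}[\tilde Z^y_t]}{\mathbb{E}[\tilde Z^y_t \mid Z^y_t > 0]},
\]
which holds because $\mathbb{E}[\tilde Z^y_t] \ge \mathbb{E}[\tilde Z^y_t\, \mathbf{1}(Z^y_t > 0)] = \mathbb{E}[\tilde Z^y_t \mid Z^y_t > 0]\, \mathbb{P}(Z^y_t > 0)$.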
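We can now lower bound the conditional expectation appearing in the denominator above
as follows. In $d = 2$ we have
\begin{align*}
\mathbb{E}[\tilde Z^y_t \mid Z^y_t > 0] &\ge \int_0^{\varepsilon t} \int_{B((0,r),r)} \frac{1}{2\pi s}\, e^{-\frac{|z|^2}{2s}}\, dz\, ds \ge \int_{\log t}^{\varepsilon t} \int_{B((0,r),r)} \frac{1}{2\pi s}\, e^{-\frac{|z|^2}{2s}}\, dz\, ds \\
&\ge \frac{r^2 e^{-\frac{2r^2}{\log t}}}{2} \left( \log(\varepsilon t) - \log\log t \right).
\end{align*}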
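For $d \ge 3$ we have
\[
\mathbb{E}[\tilde Z^y_t \mid Z^y_t > 0] \ge \int_0^{\varepsilon t} \int_{B((0,r),r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-\frac{|z|^2}{2s}}\, dz\, ds = \frac{1}{2\pi^{d/2}} \int_{B((0,r),r)} \frac{1}{|z|^{d-2}} \int_{\frac{|z|^2}{2\varepsilon t}}^{\infty} s^{d/2-2} e^{-s}\, ds\, dz.
\]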
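Similarly to the calculations leading to the lower bound we get that in $d = 2$
\[
\mathbb{E}[\operatorname{vol}(W(t))] \le \frac{2\pi t (1 + \varepsilon)\, e^{\frac{2r^2}{\log t}}}{\log t + \log \varepsilon - \log\log t}
\]
and hence for $d = 2$
\[
\limsup_{t \to \infty} \frac{\mathbb{E}[\operatorname{vol}(W(t))] \log t}{2\pi t} \le 1 + \varepsilon,
\]
for all $\varepsilon > 0$, and thus letting $\varepsilon$ go to 0 proves (7.3).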
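For $d \ge 3$ in the same way we obtain
\[
\limsup_{t \to \infty} \frac{\mathbb{E}[\operatorname{vol}(W(t))]\, \Gamma\left( \frac{d-2}{2} \right)}{2\pi^{d/2} r^{d-2} t} \le 1 + \varepsilon,
\]
for all $\varepsilon > 0$, and thus letting $\varepsilon$ go to 0 proves (7.4).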
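The two-dimensional bounds in this proof converge only logarithmically, which a short computation makes visible. The sketch below divides the main term $2\pi t r^2 / (2 + r^2 \log t)$ of the lower bound and the upper bound for $\mathbb{E}[\operatorname{vol}(W(t))]$ by the target $2\pi t / \log t$. The function names are ours, and the lower quantity ignores the $O(1)$ corrections from the proof, so it should be read as an illustration rather than a rigorous bound.

```python
import math


def lower_main_term_ratio(t: float, r: float) -> float:
    """(2 pi t r^2 / (2 + r^2 log t)) divided by (2 pi t / log t)."""
    log_t = math.log(t)
    return r * r * log_t / (2.0 + r * r * log_t)


def upper_bound_ratio(t: float, r: float, eps: float) -> float:
    """Upper bound on E[vol(W(t))] in d = 2, divided by (2 pi t / log t)."""
    log_t = math.log(t)
    denom = log_t + math.log(eps) - math.log(log_t)
    if denom <= 0.0:
        raise ValueError("the bound is only meaningful when eps * t > log t")
    return (1.0 + eps) * math.exp(2.0 * r * r / log_t) * log_t / denom


if __name__ == "__main__":
    for exponent in (10, 50, 100, 300):
        t = 10.0**exponent
        print(exponent, lower_main_term_ratio(t, 1.0), upper_bound_ratio(t, 1.0, 0.1))
```

Even for astronomically large $t$ the upper ratio stays visibly above $1 + \varepsilon$, because the correction terms decay like $\log\log t / \log t$.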
Now suppose that the target particle is moving according to a deterministic function
$f : \mathbb{R}_+ \to \mathbb{R}^d$. We define the detection time
\[
T^f_{\mathrm{det}} = \inf\Big\{ t \ge 0 : f(t) \in \bigcup_i B(X_i + \xi_i(t), r) \Big\}.
\]
Then we have the following theorem which is non-examinable:
Theorem 7.11 (Peres-Sousi). For all times $t$ and all dimensions $d$ we have
\[
\mathbb{P}(T^f_{\mathrm{det}} > t) \le \mathbb{P}(T_{\mathrm{det}} > t).
\]
Using a straightforward generalization of Theorem 7.9 we get the equivalent statement
Theorem 7.12 (Peres-Sousi). For all times $t$ and all dimensions $d$ we have
\[
\mathbb{E}[\operatorname{vol}(W_f(t))] \ge \mathbb{E}[\operatorname{vol}(W(t))],
\]
where $W_f(t) = \bigcup_{s \le t} B(\xi(s) + f(s), r)$.
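Theorem 7.12 can be illustrated, though of course not proved, by simulation in the simplest setting. The sketch below works in $d = 1$ and takes $f(s) = cs$ for a constant `drift` $c$; both choices, the random-walk discretisation of the Brownian path, and the function name are assumptions of this example. In one dimension the volume of $W_f(t)$ is the range of $\xi(s) + f(s)$ over $s \le t$ plus $2r$.

```python
import math
import random


def mean_sausage_volume_1d(drift: float, t: float = 1.0, r: float = 0.5,
                           paths: int = 2000, steps: int = 400,
                           seed: int = 2) -> float:
    """Monte Carlo mean of vol(W_f(t)) in d = 1 for f(s) = drift * s.

    The same seed is used for every value of drift, so different drifts are
    compared on the same simulated Brownian increments.
    """
    rng = random.Random(seed)
    dt = t / steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(paths):
        pos = lo = hi = 0.0
        for k in range(1, steps + 1):
            pos += rng.gauss(0.0, sd)
            shifted = pos + drift * k * dt  # xi(s) + f(s) at time s = k * dt
            lo = min(lo, shifted)
            hi = max(hi, shifted)
        total += (hi - lo) + 2.0 * r
    return total / paths


if __name__ == "__main__":
    print("f = 0   :", mean_sausage_volume_1d(0.0))
    print("f = 2 s :", mean_sausage_volume_1d(2.0))
```

With these parameters the drifted path should sweep out a noticeably larger expected volume than the driftless one, in line with the inequality of the theorem.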
We now conclude this course by stating an open question.
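Question 7.13. Does the stochastic domination inequality
\[
\mathbb{P}\Big( \operatorname{vol}\Big( \bigcup_{s \le t} B(\xi(s) + f(s), r) \Big) \ge \alpha \Big) \ge \mathbb{P}\Big( \operatorname{vol}\Big( \bigcup_{s \le t} B(\xi(s), r) \Big) \ge \alpha \Big)
\]
also hold?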
Acknowledgements
These notes are based in part on the lecture notes by James Norris and Gregory Miermont
for the same course and [1].
References
[1] Peter Mörters and Yuval Peres. Brownian motion. Cambridge Series in Statistical and
Probabilistic Mathematics. Cambridge University Press, Cambridge, 2010. With an
appendix by Oded Schramm and Wendelin Werner.
[2] David Williams. Probability with martingales. Cambridge Mathematical Textbooks.
Cambridge University Press, Cambridge, 1991.