MARKOV CHAINS
With 40 Figures
Springer-Verlag
New York Heidelberg Berlin
David Freedman
Department of Statistics
University of California
Berkeley, CA 94720
U.S.A.
The original version of this book was published by Holden-Day, Inc. in 1971.
A long time ago I started writing a book about Markov chains, Brownian
motion, and diffusion. I soon had two hundred pages of manuscript and my
publisher was enthusiastic. Some years and several drafts later, I had a
thousand pages of manuscript, and my publisher was less enthusiastic. So
we made it a trilogy:
Markov Chains
Brownian Motion and Diffusion
Approximating Countable Markov Chains
In one semester, you can cover Sections 1.1-9, 5.1-3, 7.1-3 and 9.1-3. This
gets you the basic results for both discrete and continuous time. In one year
you could do the whole book, provided you handle Chapters 4, 6, and 8
lightly. Chapters 2-4, 6 and 8 are largely independent of one another, treat
specialized topics, and are more difficult; Section 8.5 is particularly hard.
I do recommend looking at Section 6.6 for some extra grip on Markov times.
Sections 10.1-3 explain the cruel and unusual notation, and the reference
system; 10.4-9 review probability theory quickly; 10.10-17 do the more ex-
otic analyses which I've found useful at various places in the trilogy; and
a few things are in 10.10-17 just because I like them.
Chapter 10 is repeated in B & D; Chapters 1, 5, 7 and 10 are repeated in
ACM. The three books have a common preface and bibliography. Each has
its own index and symbol finder.
Acknowledgments
David Freedman
Berkeley, California
September, 1982
TABLE OF CONTENTS
6. Proof of Kingman-Orey 64
7. An example of Dyson 70
8. Almost everywhere ratio limit theorems 73
9. The sum of a function over different j-blocks 75
4. THE BOUNDARY
1. Introduction 111
2. Proofs 113
3. A convergence theorem 121
4. Examples 124
5. The last visit to i before the first visit to J\{i} 132
Part III.
10. APPENDIX
1. Notation 329
2. Numbering 330
3. Bibliography 330
4. The abstract Lebesgue integral 331
5. Atoms 334
6. Independence 337
7. Conditioning 338
8. Martingales 339
9. Metric spaces 346
10. Regular conditional distributions 347
BIBLIOGRAPHY 367
INDEX 373
INTRODUCTION TO DISCRETE TIME
1. FOREWORD
I want to thank Richard Olshen for checking the final draft of this chapter.
Then ξ₀, ξ₁, … is the coordinate process. Give I^∞ the smallest σ-field σ(I^∞) over which each coordinate function is measurable. Thus, σ(I^∞) is generated by the cylinders

{ξ₀ = i₀, …, ξₙ = iₙ}.
For any i ∈ I and stochastic matrix P on I, there is one and only one probability Pᵢ on I^∞ making the coordinate process Markov with stationary transitions P and starting state i. In other terms:

Pᵢ{ξₙ = iₙ for n = 0, …, N} = ∏ₙ₌₀^{N−1} P(iₙ, iₙ₊₁),

for all N and iₙ ∈ I with i₀ = i. The probability Pᵢ really does depend only on P and i.
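The product formula above is easy to check numerically. The 3-state matrix in this sketch is an illustrative assumption, not one from the text:

```python
from itertools import product as cartesian

# An illustrative 3-state stochastic matrix (not from the text).
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]

def path_probability(P, path):
    """P_i{xi_0 = path[0], ..., xi_N = path[N]}: a product of transitions."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

# P_0{xi_0 = 0, xi_1 = 1, xi_2 = 2} = P(0,1) * P(1,2)
print(path_probability(P, [0, 1, 2]))          # 0.25

# Cylinders of a fixed length partition the sample space, so they sum to 1.
total = sum(path_probability(P, (0,) + rest)
            for rest in cartesian(range(3), repeat=2))
print(total)                                    # ~1.0
```

The second print illustrates why the cylinder probabilities determine a genuine probability on σ(I^∞): for each fixed length they are consistent and sum to 1.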
Now I^∞ is the sample space for X, namely the space of all realizations. More formally, there is a mapping M from 𝒳 to I^∞, uniquely defined by the relation

ξₙ(Mx) = Xₙ(x) for all n = 0, 1, … and x ∈ 𝒳.

That is, the nth coordinate of Mx is Xₙ(x), and Mx is the sequence of states X passes through at x, namely: (X₀(x), X₁(x), X₂(x), …). Check that M is measurable. Fix i ∈ I and a stochastic matrix P on I. Suppose X is Markov with stationary transitions P and starting state i. With respect to the distribution of X, namely 𝒫M⁻¹, the coordinate process is Markov with stationary transitions P and starting state i. Therefore 𝒫M⁻¹ = Pᵢ. Conversely, 𝒫M⁻¹ = Pᵢ implies that X is Markov with stationary transitions P and starting state i. Now probability statements about X can be translated into statements about Pᵢ. For example, the following three assertions are all equivalent:

(5a) Pᵢ{ξₙ = i for infinitely many n} = 1.

(5b) For some Markov chain X with stationary transitions P and starting state i,
𝒫{Xₙ = i for infinitely many n} = 1.

(5c) For all Markov chains X with stationary transitions P and starting state i,
𝒫{Xₙ = i for infinitely many n} = 1.

Indeed, the set talked about in (5b) is the M-inverse image of the set talked about in (5a); and Pᵢ = 𝒫M⁻¹.
The basic theory of these processes is developed in a rapid but complete
way in Sections 3-9; Sections 10, 12, and 14 present some examples, while
Sections 11 and 13 cover special topics. Readers who want a more leisurely
discussion of the intuitive background should look at (Feller, 1968, XV) or
(Kemeny and Snell, 1960). Here is a summary of Sections 3-9.
2. SUMMARY
The main result in Section 3 is the strong Markov property. To state the
best case of it, let the random variable T on I^∞ take only the values 0, 1, …, ∞. Suppose the set {T = n} is in the σ-field spanned by ξ₀, …, ξₙ for n = 0, 1, …, and suppose

(HH)

Then 1, 2, 3 are essential and 4 is inessential. Moreover, 1 ↔ 1 while 2 ↔ 3.
Then ↔ is an equivalence relation. For the rest of this summary, suppose that I consists of one equivalence class, namely,
( 0  0  ½  ½ )
( 0  0  ½  ½ )
( ½  ½  0  0 )
( ½  ½  0  0 )
Then I has period 2, and C₀ = {1, 2} and C₁ = {3, 4}.
For the rest of the summary,
suppose period i = 1 for all i ∈ I.
*
HINT. See (16) below.
For the rest of this summary,
suppose all i ∈ I are recurrent.
Figure 1
(10) Example. Let I = {0, 1, 2, …}. Let pₙ > 0 and Σₙ₌₁^∞ pₙ = 1. Let P(0, n) = pₙ and P(n, n − 1) = 1 for n = 1, 2, …. See Figure 2. The states are positive recurrent or null recurrent according as Σₙ₌₁^∞ npₙ is finite or infinite.
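Example (10) can be checked by simulation. With the illustrative choice pₙ = (1/2)ⁿ (an assumption, not from the text), a jump from 0 to n is followed by n deterministic steps back down, so the return time to 0 is n + 1 and its P₀-mean is 1 + Σ npₙ = 3:

```python
import random

# Example (10) with p_n = (1/2)^n (an illustrative choice): from 0 the chain
# jumps to n with probability p_n, then steps down n, n-1, ..., 1, 0.  The
# return time to 0 is therefore n + 1, with mean 1 + sum n * p_n = 3.
def sample_return_time(rng):
    n = 1
    while rng.random() < 0.5:   # P{n = k} = (1/2)^k
        n += 1
    return n + 1                # one jump up, then n steps back down

rng = random.Random(0)
trials = 100_000
empirical_mean = sum(sample_return_time(rng) for _ in range(trials)) / trials
print(empirical_mean)           # close to 3
```

Since the mean is finite, state 0 is positive recurrent here, in agreement with the criterion Σ npₙ < ∞.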
1.3] THE MARKOV AND STRONG MARKOV PROPERTIES 7
Figure 2
Let I be a finite or countably infinite set. Give I the discrete σ-field, that is, the σ-field of all its subsets. Let I^∞ be the space of all I-sequences, namely, functions from the nonnegative integers to I. For ω ∈ I^∞ and n = 0, 1, … let ξₙ(ω) = ω(n). Call ξ the coordinate process. Give I^∞ the product σ-field σ(I^∞), namely, the smallest σ-field such that ξ₀, ξ₁, … are measurable. A matrix P on I is a function (i, j) → P(i, j) from I × I to the real line. Say P is stochastic iff P(i, j) ≥ 0 for all i, j and Σⱼ P(i, j) = 1 for all i. Say P is substochastic iff P(i, j) ≥ 0 for all i, j and Σⱼ P(i, j) ≤ 1 for all i. Let P be
function. Thus, Tⁿω is ω shifted n times to the left; the first n terms of ω disappear during this transaction. In slightly ambiguous notation,

Tⁿω = (ω(n), ω(n + 1), …).

Formally,

ξₘ ∘ Tⁿ = ξₘ₊ₙ.

Then

T⁻ⁿ{ξ₀ = j₀, …, ξₘ = jₘ} = {ξₙ = j₀, …, ξₙ₊ₘ = jₘ}.

So Tⁿ is measurable.
Theorem (14) makes an assertion about regular conditional distributions: these objects are discussed in the Appendix. And (14) uses the symbol P_{ξₙ}. This is an abbreviation for a function Q of pairs (ω, B), with ω ∈ I^∞ and B ∈ σ(I^∞), namely:

Q(ω, B) = P_{ξₙ(ω)}(B).

(14) Theorem. (Markov property). P_{ξₙ} is a regular conditional P_p-distribution for Tⁿ given ξ₀, …, ξₙ.
PROOF. For ω ∈ I^∞ and B ∈ σ(I^∞), let

Q(ω, B) = P_{ξₙ(ω)}(B).

For each ω, the function B → Q(ω, B) is a probability. For each B, the function ω → Q(ω, B) is measurable on ξ₀, …, ξₙ, because ξₙ is. What I need is

P_p{A and Tⁿ ∈ B} = ∫_A Q(ω, B) P_p(dω)

for all A measurable on ξ₀, …, ξₙ and all measurable B. Both sides of this equality are countably additive in A. If I could prove the equality separately for each piece {A and ξₙ = j}, I could finish by summing out j. But {ξₙ = j} is measurable on ξ₀, …, ξₙ, so {A and ξₙ = j} is the typical subset of {ξₙ = j} measurable on ξ₀, …, ξₙ; therefore, I only have to prove the equality for subsets A of {ξₙ = j} measurable on ξ₀, …, ξₙ. The integrand on the right is now constant, so the integral is

P_p{A} · Pⱼ{B}.

What I have left to prove is this identity:

(15) P_p{A and Tⁿ ∈ B} = P_p{A} · Pⱼ{B}

for all subsets A of {ξₙ = j} which are measurable on ξ₀, …, ξₙ, and all
*
of two special B is special: indeed, two different special B are either disjoint or nested. Use (10.16) to complete the proof.
Let A be a subset of {ξₙ = j} which is measurable on ξ₀, …, ξₙ, and let f be a nonnegative, measurable function on I^∞. Then (15) can be rewritten as

(15*) ∫_A f(Tⁿω) P_p(dω) = P_p{A} · ∫ f dPⱼ.
Check
So,

P_p{Gₙ} = Σᵢ P_p{ξₙ = i and Gₙ}
  = Σᵢ P_p{ξₙ = i} · Pᵢ{G₀}   by (15)
  = 1.
*
If P and Q are substochastic matrices on I, so is PQ, where:

PQ(i, j) = Σₖ P(i, k) Q(k, j).

Pᵢ{ξₙ = j} = Pⁿ(i, j).
finite. More formally, the atoms of ℱ_T are the singletons in {T = ∞}, and all sets

ζₙ = ξₙ ∘ T̂, so ζ is measurable.

Q(ω, B) = P_{ζ₀(ω)}(B).
(22*)
PROOF. Let n = 1, 2, …; let i₀ ≠ j and let i₁, …, iₘ ∈ I. Then

{ξ₀ = j and τ = n and ζ₀ = i₀, …, ζₘ = iₘ}
  = {ξ₀ = … = ξₙ₋₁ = j, ξₙ = i₀, …, ξₙ₊ₘ = iₘ},

an event of Pⱼ-probability

P(j, j)ⁿ⁻¹ P(j, i₀) P(i₀, i₁) ⋯ P(iₘ₋₁, iₘ).
*
For (24), keep j ∈ I with P(j, j) < 1. Introduce the notion of a j-sequence of ξ; namely, a maximal interval of times n with ξₙ = j. Let C₁, C₂, … be the cardinalities of the first, second, … j-sequences of ξ. Let Cₙ₊₁ = 0 if there are n or fewer j-sequences. Let A_N be the event that there are N or more j-sequences in ξ.

(24) Proposition. Given A_N, the variables C₁ − 1, C₂ − 1, …, C_N − 1 are conditionally P_p-independent and geometrically distributed, with common parameter P(j, j).
PROOF. Fix positive integers N and c₁, …, c_N. I claim

(25) P_p{C₁ = c₁, …, C_N = c_N | A_N} = Pⱼ{C₁ = c₁, …, C_N = c_N | A_N}.

Let

B = {C₁ = c₁, …, C_N = c_N and A_N}.

Let α be the least n if any with ξₙ = j, and α = ∞ if none. Then α is Markov. Let η be the post-α process. Now

B ⊂ A₁ = {α < ∞}
η₀ = j on {α < ∞}
A_N = {α < ∞ and η ∈ A_N}
Cₙ = Cₙ ∘ η for n = 1, …, N.
So

(26) P_p{B} = P_p{α < ∞} · Pⱼ{B},

and similarly

(27) P_p{A_N} = P_p{α < ∞} · Pⱼ{A_N}.
Divide (26) by (27) to substantiate the claim (25). The case N = 1 is now
immediate from (23).
Abbreviate θ = P(j, j) and q = qⱼ, as defined for (23). I claim

(28) Pⱼ{C₁ = c₁, …, C_{N+1} = c_{N+1} | A_{N+1}}
    = (1 − θ)θ^{c₁−1} P_q{C₁ = c₂, …, C_N = c_{N+1} | A_N}.

This and (25) prove (24) inductively. To prove (28), let τ be the least n if any with ξₙ ≠ j, and τ = ∞ if none. Let ζ be the post-τ process. On {ξ₀ = j},

C₁ = τ and A_{N+1} = {ζ ∈ A_N}.
On {ξ₀ = j and A_{N+1}},

Cₙ₊₁ = Cₙ ∘ ζ for n = 1, …, N.

By (23),

(29) Pⱼ{C₁ = c₁, …, C_{N+1} = c_{N+1} and A_{N+1}}
Let C be a measurable subset of the space of i-blocks, and let A ∈ ℱ_{Tₙ} with A ⊂ {Tₙ < ∞}. Then

{Tₙ < ∞ and Bₙ ∈ C} = {Tₙ < ∞ and ζ ∈ {B₁ ∈ C}}.

With the help of (22):

P_p{A and Bₙ ∈ C} = P_p{A and ζ ∈ {B₁ ∈ C}}
  = P_p{A} · Pᵢ{B₁ ∈ C}
  = P_p{A} · μ{C}.
To identify μ in (31), anticipate a more general definition. Let P{i} be this substochastic matrix on I: for j ∈ I and k ≠ i, let P{i}(j, k) = P(j, k); while P{i}(j, i) = 0.
(32) Proposition. With respect to Pᵢ, the first i-block has distribution P{i}ᵢ: it is Markov with stationary substochastic transitions P{i}, starting from i.

PROOF. Confine ω to the set where ξ₀ = i, so T₁ = 0. This set has Pᵢ-probability 1. Let ρₘ be the mth term of the first i-block, so

ρₘ = ξₘ when T₂ > m
ρₘ is undefined when T₂ ≤ m.

Now

Pᵢ{ρₘ = jₘ for m = 0, …, M and T₂ > M}

is 0 unless j₀ = i, while j₁, …, j_M all differ from i; in which case, this probability is

P(i, j₁) P(j₁, j₂) ⋯ P(j_{M−1}, j_M).
(33) Definition.
i → j iff Pⁿ(i, j) > 0 for some n > 0.
i ↔ j iff i → j and j → i.
i is essential iff i → j implies j → i.
You should check the following properties of →.

(34) For any i, there is a j with i → j:
because Σⱼ P(i, j) = 1.
(35) If i → j and j → k, then i → k:
1.4] CLASSIFICATION OF STATES 17
because

(36) P^{n+m}(i, k) ≥ Pⁿ(i, j) · Pᵐ(j, k).

(37) If i is essential, then i → i:
use (34), the definition, and (35).
(38) Lemma. ↔ is an equivalence relation when restricted to the essential states.

PROOF. Use properties (35, 37). *

The ↔ equivalence classes of essential states will be called communicating classes, or sometimes just classes. The communicating class containing i is sometimes written C(i). You should check

(39) Σ {P(i, j): j ∈ C(i)} = 1:

indeed, P(i, j) > 0 implies i → j, so j → i because i is essential; and j ∈ C(i).
(40) Lemma. If i is essential and i → j, then j is essential.

PROOF. Suppose j → k. Then i → k by (35), so k → i because i is essential. And k → j by (35) again. *
(41) Definition. If i → i, then period i is the g.c.d. (greatest common divisor) of {n: n > 0 and Pⁿ(i, i) > 0}.
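Definition (41) can be implemented directly: take the g.c.d. of the n with Pⁿ(i, i) > 0, using only the support of P. The example below is the two-class, period-2 chain described in the summary (states {1, 2} and {3, 4} alternating), written with 0-based labels; the horizon cutoff is a practical assumption for the sketch:

```python
from math import gcd

# Period of a state, per Definition (41): gcd{n > 0 : P^n(i,i) > 0}.
# Only the support of P matters, so boolean matrices suffice.
def bool_matmul(A, B):
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(support, i, horizon=50):
    """gcd of return times up to `horizon` (ample for small examples)."""
    d, power = 0, support
    for n in range(1, horizon + 1):
        if n > 1:
            power = bool_matmul(power, support)
        if power[i][i]:
            d = gcd(d, n)
    return d

# The period-2 chain of the summary: {1,2} and {3,4} alternate (0-based here).
support = [[False, False, True, True],
           [False, False, True, True],
           [True, True, False, False],
           [True, True, False, False]]
print([period(support, i) for i in range(4)])   # [2, 2, 2, 2]
```

All four states share period 2, as Lemma (42) below predicts for a single class.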
(42) Lemma. i ↔ j implies period i = period j.

PROOF. Clearly,

(43) P^{a+m+b}(i, i) ≥ Pᵃ(i, j) · Pᵐ(j, j) · Pᵇ(j, i).

Choose a and b so that Pᵃ(i, j) > 0 and Pᵇ(j, i) > 0. If Pᵐ(j, j) > 0, then P^{2m}(j, j) > 0 by (36), so (43) implies

P^{a+m+b}(i, i) > 0 and P^{a+2m+b}(i, i) > 0.

Therefore, period i divides a + m + b and a + 2m + b. So period i divides the difference m. That is, period i is no more than period j. Equally, period j is no more than period i. *
Consequently, the period of a class can safely be defined as the period of any of its members. As usual, m ≡ n (d) means that m − n is divisible by d.
For (44) and (45), fix i ∈ I and suppose

I forms one class of essential states, with period d.

(44) Lemma. To each j ∈ I there corresponds an rⱼ = 0, 1, …, d − 1, such that: Pⁿ(i, j) > 0 implies n ≡ rⱼ (d).
PROOF. Choose s so that Pˢ(j, i) > 0. If Pᵐ(i, j) > 0 and Pⁿ(i, j) > 0, then P^{m+s}(i, i) > 0 and P^{n+s}(i, i) > 0 by (36). So period i = d divides m + s and n + s. Consequently, d divides the difference m − n, and m ≡ n (d). You can define rⱼ as the remainder when n is divided by d, for any n with Pⁿ(i, j) > 0. *

Let Cᵣ be the set of j with rⱼ ≡ r (d), for each integer r. Thus, C₀ = C_d.
(45) Theorem. (a) C₀, …, C_{d−1} are disjoint and their union is I.
(b) j ∈ Cᵣ and P(j, k) > 0 imply k ∈ C_{r+1}.

PROOF. Assertion (a). Use (44).
Assertion (b). If Pⁿ(i, j) > 0 and P(j, k) > 0, then P^{n+1}(i, k) > 0 by (36). Since n ≡ r (d), therefore n + 1 ≡ r + 1 (d) and r_k ≡ r + 1 (d), using (44) again. *
(46) Proposition. Let A₀, …, A_{d−1} be disjoint sets whose union is I. For integers r and s, let Aᵣ = Aₛ when r ≡ s (d). Suppose j ∈ Aᵣ and P(j, k) > 0 imply k ∈ A_{r+1}. Fix i₀ ∈ A₀. Then Aₙ = Cₙ(i₀).

so j ∈ Aₙ. That is, Cₙ(i₀) ⊂ Aₙ. Now (45) and the first condition on the sets A₀, …, A_{d−1} imply Cₙ(i₀) = Aₙ. *

(47) Corollary. If j ∈ Cᵣ(i), then Cₛ(j) = C_{r+s}(i).
(48) Proposition. States j and k are in the same Cᵣ iff there is an h in I and an n > 0 such that Pⁿ(j, h) > 0 and Pⁿ(k, h) > 0.

PROOF. The if part is clear. For only if, suppose j ∈ C₀(k). Then P^{ad}(j, k) > 0 for some positive integer a. But (59) implies P^{nd}(k, k) > 0 for all positive integers n ≥ n₀. Thus, (36) makes

P^{(a+n₀)d}(j, k) > 0 and P^{(a+n₀)d}(k, k) > 0.
5. RECURRENCE
theorem (31), all j-blocks are finite with Pⱼ-probability 1, that is, ξ visits infinitely many j's with Pⱼ-probability 1. This also proves (c) when fP(j, j) = 1.

Assertions (b) and (c). Clearly, eP(j, j) is the Pⱼ-mean number of j-blocks. But (31) implies

Pⱼ{Bₙ₊₁ is infinite | B₁, …, Bₙ are finite} = 1 − fP(j, j).
= ∫_{τ < ∞} Σₙ₌₀^∞ δⱼ(ζₙ) dPᵢ. *
Interchange i and j. Finally, use (51a).
The next result implies that recurrence is a class property; that is, if one
state in a class is recurrent, all are.
(55) Theorem. fP(j, j) = 1 and j → k implies

fP(j, k) = fP(k, j) = fP(k, k) = 1.

PROOF. Suppose k ≠ j. Let B₁, B₂, … be the j-blocks. Since fP(j, j) = 1, by the block theorem (31), the Bₙ are independent, identically distributed, finite blocks relative to Pⱼ. Since j → k, therefore B₁ contains a k with positive Pⱼ-probability. The events {Bₙ contains a k} are Pⱼ-independent and have common positive Pⱼ-probability. Then (52) implies that with Pⱼ-probability 1, there is an n such that Bₙ contains a k. Let τ be the least n if any with ξₙ = k, and τ = ∞ if none. Plainly, τ is Markov. The first part of the argument shows fP(j, k) = Pⱼ{τ < ∞} = 1. The strong Markov property (22) implies fP(k, j) is the Pⱼ-probability that ξ_{τ+n} = j for some n = 0, 1, …. So fP(k, j) = 1. Finally, use (53). *
NOTE. If j is recurrent, then j is essential.
(56) Proposition. For finite I, there is at least one essential state; and a state
is recurrent iff it is essential.
PROOF. Suppose i₀ ∈ I is not essential. Then i₀ → i₁ ↛ i₀; in particular, i₁ ≠ i₀. If i₁ is not essential, i₁ → i₂ ↛ i₁; in particular, i₂ ≠ i₀ and i₂ ≠ i₁. And so on. This has to stop, so there is an essential state. Next, suppose J is a finite communicating class. Any infinite J-sequence contains infinitely many j, for some j ∈ J. Fix i ∈ J. There is one j ∈ J such that
distributed, positive integer-valued random variables, on the triple (Ω, ℱ, 𝒫). Let μ be the expectation of Yᵢ, and 1/μ = 0 if μ = ∞. Let S₀ = 0, and Sₙ = Y₁ + ⋯ + Yₙ, and let
This result is immediate from (65) and (66). Lemma (65) follows from (58–64), and (66) is proved by a similar argument. To state (58–59), let F be a subset of the integers, containing at least one nonzero element. Let group F (respectively, semigroup F) be the smallest subgroup (respectively, subsemigroup) of the additive integers including F.

More constructively, semigroup F is the set of all integers n which can be represented as f₁ + ⋯ + fₘ for some positive integer m and f₁, …, fₘ ∈ F. And group F is the set of all integers n which can be represented as a − b, with a, b ∈ semigroup F. If λ ∈ group F and q is a positive integer, then λq = Σᵥ₌₁^q λ ∈ group F.
(58) Lemma. g.c.d. F is the least positive element λ of group F.

PROOF. Plainly, g.c.d. F divides λ, so g.c.d. F ≤ λ. To verify that λ divides any f in F, let f = λq + r, where q is an integer and r is one of 0, …, λ − 1. Now r = f − λq ∈ group F, and 0 ≤ r < λ, so r = 0. Consequently, λ ≤ g.c.d. F. *
(59) Lemma. Suppose each f in F is positive. Let g.c.d. F = 1. Then for some positive integer m₀, semigroup F contains all m ≥ m₀.

PROOF. Use (58) to find a and b in semigroup F with a − b = 1. Plainly, semigroup F ⊃ semigroup {a, b}. I say semigroup {a, b} contains all m ≥ b². For if m ≥ b², then m = qb + r, where q is a nonnegative integer and r is one of 0, …, b − 1. Necessarily, q ≥ b, so q − r > 0. Then

m = qb + r(a − b) = ra + (q − r)b ∈ semigroup {a, b}. *
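Lemma (59) is easy to check by brute force for a concrete F. Take F = {5, 7}, an illustrative choice with g.c.d. 1; the largest integer outside semigroup F turns out to be 5·7 − 5 − 7 = 23, so m₀ = 24 works here:

```python
# Brute-force check of Lemma (59) for F = {5, 7}, an illustrative choice
# with g.c.d. F = 1: semigroup F should contain every sufficiently large m.
F = [5, 7]

def in_semigroup(m, F):
    """True iff m = f_1 + ... + f_k for some k >= 1, all f_i in F."""
    reachable = {0}
    for _ in range(m):                     # at most m summands can be needed
        reachable |= {r + f for r in reachable for f in F if r + f <= m}
    return m in reachable and m > 0

missing = [m for m in range(1, 200) if not in_semigroup(m, F)]
print(max(missing))    # 23 -- every m >= 24 is in semigroup F
```

The proof's bound b² is not sharp; the brute force finds the exact threshold for this F.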
by strong Markov (22) or a direct argument. The Aₘ are pairwise disjoint, and their union is Ω. So Σₘ₌₀^∞ 𝒫{Aₘ} = 1.
PROOF. As usual, Σₘ₌₀^∞ 𝒫{Yᵢ > m} = μ. Suppose μ < ∞. In (64), replace n by n* of (63). Then let n → ∞. Dominated convergence implies

Σₘ₌₀^∞ 𝒫{Yᵢ > m} · L = 1,

so L = 1/μ. When μ = ∞, use the same argument and Fatou to get

Σₘ₌₀^∞ 𝒫{Yᵢ > m} · L ≤ 1,

forcing L = 0 = 1/μ. *

(66) Lemma.

and reverse the inequalities at the end. Lemmas (63, 65) still hold, with the same proof.
This completes the proof of (57). I will restate matters for use in (69). Abbreviate pₙ = 𝒫{Yᵢ = n}. Drop the assumption that

g.c.d. {n: pₙ > 0} = 1.
PROOF. Plainly, {m: U(m) > 0} = {0} ∪ semigroup {n: pₙ > 0}. If F is a nonempty subset of the positive integers, g.c.d. F = g.c.d. semigroup F. This does (a). Claim (b) reduces to (57) when you divide the Yᵢ by d. *
7. THE LIMITS OF Pⁿ

The renewal theorem gives considerable insight into the limiting behavior of Pⁿ. To state the results, let mP(i, j) be the Pᵢ-expectation of τⱼ, where τⱼ is the least n > 0 if any with ξₙ = j, and τⱼ = ∞ if none. For n = 0, 1, … let

φₙP(i, j) = Pᵢ{ξₙ = j and ξₘ ≠ j for m < n}.

Thus, φ₀P(i, j) is 1 or 0, according as i = j or i ≠ j. And φₙP(i, i) = 0 for n > 0. Let
PROOF. Claim (a) follows from (b) and (d), or can be proved directly, as in (Doob, 1953, p. 175).

Claim (b) is similar to (d), and the proof is omitted.

Claim (c) is the leading special case of (d). Suppose (c) were known for i = j. Then (c) would hold for any i, by using dominated convergence on the identity

(70) Pⁿ(i, j) = Σₘ₌₀ⁿ φₘP(i, j) · P^{n−m}(j, j).
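The decomposition of Pⁿ(i, j) over the time of first passage to j — which is what (70) appears to invoke — can be verified exactly on a small chain. The 3-state matrix below is an assumption for the example, not one from the text:

```python
# First-passage decomposition: for i != j,
#     P^n(i,j) = sum_{m=1}^{n} phi_m P(i,j) * P^{n-m}(j,j),
# where phi_m P(i,j) = P_i{xi_m = j, xi_l != j for l < m}.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, n):
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = matmul(R, P)
    return R

def first_passage(P, m, i, j):
    """phi_m P(i,j) for i != j, m >= 1: avoid j up to time m-1, then enter j."""
    Q = [[0.0 if b == j else P[a][b] for b in range(len(P))]
         for a in range(len(P))]          # transitions into j suppressed
    row = matpow(Q, m - 1)[i]
    return sum(row[k] * P[k][j] for k in range(len(P)))

i, j, n = 0, 2, 8
lhs = matpow(P, n)[i][j]
rhs = sum(first_passage(P, m, i, j) * matpow(P, n - m)[j][j]
          for m in range(1, n + 1))
print(abs(lhs - rhs) < 1e-12)   # True
```

The identity holds because the paths in {ξₙ = j} are partitioned by the time of their first visit to j, and the Markov property factors each piece.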
8. POSITIVE RECURRENCE
Call j positive recurrent iff j is recurrent and mP(j,j) < 00. Call j null
recurrent iff j is recurrent and mP(j,j) = 00. Is positive recurrence a class
property? That is, suppose C is a class andj E C is positive recurrent. Does it
follow that all k E C are positive recurrent? The affirmative answer is provided
by (76), to which (71-73) are preliminary. Theorem (76) also makes the harder
assertion: in a positive recurrent class, mP(i,j) < 00 for all pairs i, j.
Lemma (71) is Wald's (1944) identity. To state it, let Y₁, Y₂, … be independent and identically distributed on (Ω, ℱ, 𝒫). Let

Sₙ = Y₁ + ⋯ + Yₙ,
(75)

Consequently, Pᵢ{lim sup Aₘ} = 1. This repeats part of (55). *
(76) Theorem. Let I be a recurrent class. Either mP(i, j) < ∞ for all i and j in I, or mP(j, j) = ∞ for all j in I.

For a generalization, see (2.98).

PROOF. Suppose mP(i, i) < ∞ for an i in I. Fix j ≠ i. What must be proved is that mP(i, j), mP(j, i), and mP(j, j) are all finite. To start the proof, confine ω to I^∞ ∩ {ξ₀ = i}. Let r be the least n such that the nth i-block contains a j. Using the Aₘ of (73), and the notation C\D for the set of points in C but not in D,

{r = n} = (I^∞\A₁) ∩ ⋯ ∩ (I^∞\Aₙ₋₁) ∩ Aₙ.

Relation (74) implies that r is Pᵢ-distributed like the waiting time for the first head in tossing a p-coin, where 0 < p = Pᵢ(Aₘ) by (75). Now (72) implies ∫ r dPᵢ < ∞. Let Y₁, Y₂, … be the lengths of the successive i-blocks.
By the block theorem (31), the Yₘ's are Pᵢ-independent and identically distributed; and {r < n} is Pᵢ-independent of Yₙ. By definition, ∫ Y₁ dPᵢ = mP(i, i). Now Wald's identity (71) forces

∫ S_r dPᵢ = mP(i, i) · ∫ r dPᵢ < ∞.

Figure 3
As in Figure 3, let T(ω) be the least n with ω(n) = j. Let T(ω) + U(ω) be the least n > T(ω) with ω(n) = i. Then
*
PROOF. Reduce I to the communicating class containing j, and use (79)
below.
(79) Proposition. Let τⱼ be the least n if any with ξₙ = j, and τⱼ = ∞ if none. Suppose I is finite, j is a given state in I, and i → j for all i ∈ I. Then there are constants A and r with 0 < A < ∞ and 0 < r < 1, such that

Pᵢ{τⱼ > n} ≤ Arⁿ for all i ∈ I and n = 0, 1, ….

PROOF. Let P̄ agree with P except in the jth row, where P̄(j, j) = 1. Then

P̄ᵢ{ξ₀ = i₀, …, ξₘ = iₘ} = Pᵢ{ξ₀ = i₀, …, ξₘ = iₘ}
provided i₀, i₁, …, iₘ₋₁ are all different from j; however, iₘ = j is allowed. Sum over all such sequences with iₘ = j and m ≤ n:

Pᵢ{τⱼ ≤ n} = P̄ᵢ{τⱼ ≤ n} = P̄ⁿ(i, j).

So i → j for P̄, and I only have to get the inequality for P̄.

You should see that P̄ⁿ(i, j) is nondecreasing with n, and is positive for n ≥ nᵢ, for some positive integer nᵢ. Let N = maxᵢ nᵢ, so

0 < ε = minᵢ P̄^N(i, j),

using the finitude of I twice. Thus

1 − ε ≥ P̄ᵢ{τⱼ > N} for all i.

Recall the shift T, introduced for the Markov property (14). Check

{τⱼ > (n + 1)N} = {τⱼ > nN} ∩ T^{−nN}{τⱼ > N}.

Make sure that {τⱼ > nN} is measurable on ξ₀, …, ξ_{nN}. Therefore,

P̄ᵢ{τⱼ > (n + 1)N} = Σₖ P̄ᵢ{ξ_{nN} = k and τⱼ > (n + 1)N}
  = Σₖ P̄ᵢ{ξ_{nN} = k and τⱼ > nN} · P̄ₖ{τⱼ > N}   by (15)
  ≤ (1 − ε) Σₖ P̄ᵢ{ξ_{nN} = k and τⱼ > nN}
  = (1 − ε) P̄ᵢ{τⱼ > nN}
  ≤ (1 − ε)^{n+1}   by induction.

Suppose nN ≤ m < (n + 1)N. Then

Pᵢ{τⱼ > m} ≤ Pᵢ{τⱼ > nN} ≤ (1 − ε)ⁿ
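The geometric decay in Proposition (79) can be seen exactly on a finite chain: Pᵢ{τⱼ > n} is the ith row sum of the nth power of P with the jth column zeroed. The 3-state matrix is illustrative, not from the text:

```python
# Geometric decay of P_0{tau_j > n} on an illustrative 3-state chain, j = 2.
# Zeroing column j of P gives Q, whose n-th power has row sums equal to the
# tail probabilities of the hitting time.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
j = 2
Q = [[0.0 if b == j else P[a][b] for b in range(3)] for a in range(3)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

tails, power = [], [row[:] for row in Q]
for n in range(1, 30):
    tails.append(sum(power[0]))          # P_0{tau_2 > n}
    power = matmul(power, Q)

ratios = [b / a for a, b in zip(tails, tails[1:])]
print(max(ratios) < 1.0, tails[-1] < 0.01)   # True True
```

Each successive tail shrinks by a factor bounded away from 1, which is exactly the Arⁿ bound of (79) with r the largest ratio.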
9. INVARIANT PROBABILITIES
therefore

Σⱼ∈I [Σₘ₌₁ⁿ Pᵐ(i, j)] P(j, k) = Σₘ₌₂^{n+1} Pᵐ(i, k).

Send n to ∞. Use (69a) and Fatou, as in (82):

Σⱼ∈I π(j) P(j, k) ≤ π(k). *
PROOF. By iteration,

so

μ(j) = Σᵢ∈I μ(i) π(j). *

(87) Lemma. π is a probability.

PROOF. Using (82) and (84), put π for μ in (85):

π(j) = [Σᵢ∈I π(i)] π(j).

So

Σᵢ∈I π(i) = 1. *
PROOF OF (81). Use (84), (87), and (85).
Now drop the assumption that I is a positive recurrent class. Let μ be a signed measure on I with finite mass. The next theorem describes all invariant μ. To state it, let C be the set of all positive recurrent classes J ⊂ I. For J ∈ C, define a probability π_J on I by: π_J(j) = 1/mP(j, j) for j ∈ J, and π_J(j) = 0 for j ∉ J.

Therefore

μ = Σ_{J∈C} μ(J) π_J

is also P-invariant.
*
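The identification π(j) = 1/mP(j, j) can be verified in closed form on a two-state chain; the transition probabilities a and b below are illustrative assumptions:

```python
# Check pi(j) = 1/mP(j,j) on a 2-state chain with P(0,1) = a, P(1,0) = b.
# Here pi = (b, a)/(a + b) solves pi P = pi.  The return time to 0 is 1 with
# probability 1 - a; otherwise it is 1 plus a geometric waiting time with
# success probability b, whose mean is 1/b.
a, b = 0.3, 0.6                          # illustrative values
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]
m00 = (1 - a) * 1 + a * (1 + 1 / b)      # mP(0,0) = 1 + a/b

invariant = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print(abs(m00 - 1 / pi[0]) < 1e-12)      # True
```

Both sides equal 1 + a/b here, matching the definition of π_J above.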
10. THE BERNOULLI WALK

In this section, I is the set of integers and 0 < p < 1. Define the stochastic matrix [p] on I by: [p](i, i + 1) = p, and [p](i, i − 1) = 1 − p. This notation is strictly temporary. Plainly, I is a communicating class of period 2.

(92) Theorem. I is recurrent for [p] iff p = ½.
To begin with, x = f[p](−1, 0): indeed, the [p]₀-distribution of ξ₀ − 1, ξ₁ − 1, … is [p]₋₁; and ξ₀ − 1, ξ₁ − 1, … hits 0 iff ξ₀, ξ₁, … hits 1. Next,
the second equality is old; the first one works because the [½]₁-distribution of −ξ₀, −ξ₁, … is [½]₋₁, and −ξ₀, −ξ₁, … hits 0 iff ξ₀, ξ₁, … hits 0. Now use (54). *
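A finite-horizon simulation cannot prove Theorem (92), but it shows the contrast. For p = ½ almost every path returns to 0, while for p = 0.6 the return probability is 1 − |p − (1 − p)| = 0.8; the horizon and trial counts below are illustrative choices:

```python
import random

# Simulation sketch for Theorem (92): the walk steps +1 w.p. p, -1 w.p. 1-p.
# For p = 1/2 the walk is recurrent; for p = 0.6 the return probability to 0
# is 0.8.  A finite horizon only approximates both.
def returns_to_zero(p, rng, horizon=2000):
    x = 0
    for _ in range(horizon):
        x += 1 if rng.random() < p else -1
        if x == 0:
            return True
    return False

rng = random.Random(1)
trials = 1000
frac_half = sum(returns_to_zero(0.5, rng) for _ in range(trials)) / trials
frac_biased = sum(returns_to_zero(0.6, rng) for _ in range(trials)) / trials
print(frac_half, frac_biased)    # ~0.98 and ~0.8
```

The p = ½ fraction falls just short of 1 only because recurrent paths can take longer than the horizon to return.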
(94) The class I is null recurrent for [½].
Here is an argument for (94) that I learned from Harry Reuter. By previous reasoning, Pⁿ(j, j) does not depend on j. So limₙ→∞ P²ⁿ(j, j) does not depend on j. If I were positive recurrent, the invariant probability
would have to assign equal mass to all integers by (69d) and (81). This is
untenable.
Suppose ½ < p < 1. The two solutions of (93) are 1 and p/(1 − p) > 1. Thus, f[p](0, 1) = 1. Previous arguments promote this to

y = 1 − p + py².

The two solutions are y = 1 and y = (1 − p)/p, so f[p](0, −1) = (1 − p)/p. Previous arguments promote this to
The material in this section will be used in Section 12, and referred to in Chapters 3 and 4. It is taken from (Chung, 1960, Section 1.9).

(99) Definition. For any subset H of I, define a substochastic matrix P_H on I: for i ∈ I and j ∉ H, let P_H(i, j) = P(i, j); but for j ∈ H, let P_H(i, j) = 0.
°
Let 7 be the least n > if any with ~n E H, and 7 = 00 if none. With
respect to Pi' the fragment {~n:O ~ n < 7} is Markov with stationary transi-
tions PH. Thus, ePH(i, k) is the Pi-mean number of n ?; 0, but less than the
first positive m with ~m E H, such that ~n = k. Moreover,fPH(i, k) is the
Pi-probability that there is an n > 0, but less than the first positive m with
~m E H, such that ~n = k. The operators e andfwere defined in (49).
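On a finite chain the taboo quantities have a linear-algebra expression: since the fragment killed on hitting H has transitions P_H, the mean number of visits eP_H(i, k) is Σₙ (P_H)ⁿ(i, k), an entry of (I − P_H)⁻¹ whenever the series converges. A sketch on an illustrative chain, summing the series directly:

```python
# eP_H(i,k) as sum_n (P_H)^n (i,k), on an illustrative 3-state chain with
# H = {2}.  H is reached a.s. here, so the truncated sum is effectively exact.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
H = {2}
PH = [[0.0 if b in H else P[a][b] for b in range(3)] for a in range(3)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

e = [[float(i == j) for j in range(3)] for i in range(3)]   # n = 0 term
term = [row[:] for row in e]
for _ in range(400):
    term = matmul(term, PH)
    e = [[e[i][j] + term[i][j] for j in range(3)] for i in range(3)]

# e satisfies e = I + PH * e, i.e. it inverts I - PH on the H-avoiding states.
print(round(e[0][1], 6))    # 2.0: mean visits to 1 before hitting 2, from 0
```

For this matrix the exact inverse of I − P_H on the states {0, 1} is [[2.8, 2.0], [0.8, 2.0]], which the truncated series reproduces.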
1.11] FORBIDDEN TRANSITIONS 35
In the proof of (100), I will use some theorems proved for stochastic P on substochastic P. To legitimate this, adjoin a new state φ to I. Define

Give I the discrete topology, and let Ī = I ∪ {φ} be its one-point compactification; for example, let Ī = {i₁, i₂, …, i_∞}, where i_∞ = φ; metrize Ī with ρ(iₙ, iₘ) = |1/n − 1/m| and 1/∞ = 0. A sequence kₙ ∈ I converges to φ iff kₙ = j for only finitely many n, for each j ∈ I.
*
PROOF. With respect to Pₖ, almost all paths hit either i or k first, in positive time.
12. THE HARRIS WALK

The next example was studied by Harris (1952), using Brownian motion. To describe the example, let 0 < aⱼ < 1 and bⱼ = 1 − aⱼ for j = 1, 2, …. Let I be the set of nonnegative integers. Define the stochastic matrix P on I by: P(0, 1) = 1, while P(j, j + 1) = aⱼ and P(j, j − 1) = bⱼ for 1 ≤ j < ∞. Plainly, I is an essential class of period 2. When is it recurrent? To state the answer, let r₀ = 1; let rₙ = (b₁ ⋯ bₙ)/(a₁ ⋯ aₙ) for n = 1, 2, …; let R(0) = 0; let R(n) = r₀ + ⋯ + rₙ₋₁ for n = 1, 2, …; and let R(∞) = Σₙ₌₀^∞ rₙ.
(104) Theorem. I is recurrent or transient according as R(∞) = ∞ or R(∞) < ∞. If I is recurrent, it is null or positive according as Σₙ₌₁^∞ 1/(aₙrₙ) is infinite or finite.
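For simple choices of aⱼ the two series of Theorem (104) have closed forms, which makes a quick numerical check possible; the three choices of aₙ below are illustrative assumptions:

```python
# Theorem (104) on illustrative Harris walks:
#   a_n = 2/3: r_n = (1/2)^n, R(inf) = 2 < inf              -> transient
#   a_n = 1/2: r_n = 1, both series diverge                 -> null recurrent
#   a_n = 1/3: r_n = 2^n, R(inf) = inf, sum 3/2^n = 3 < inf -> positive recurrent
def r_seq(a_fn, N):
    r, out = 1.0, [1.0]                  # r_0 = 1
    for n in range(1, N + 1):
        an = a_fn(n)
        r *= (1 - an) / an               # r_n = r_(n-1) * b_n / a_n
        out.append(r)
    return out

right = r_seq(lambda n: 2 / 3, 60)       # drift to the right
left = r_seq(lambda n: 1 / 3, 60)        # drift to the left
R_right = sum(right)                     # ~2: finite, so transient
pos_series = sum(1 / ((1 / 3) * r) for r in left[1:])   # ~3: finite
print(R_right, pos_series)
```

The truncated sums stand in for the full series; the closed forms in the comments are what justify reading them as convergent or divergent.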
The proof of this theorem is presented as a series of lemmas. Of these, (105–111) deal with the criterion for recurrence, and (112) deals with distinguishing null from positive recurrence. It is convenient to introduce a stochastic matrix Q on I, which agrees with P except in the 0th row, where Q(0, 0) = 1.
(105) Lemma. For each i ∈ I, the process R(ξ₀), R(ξ₁), … is a martingale relative to Qᵢ.

PROOF. On {ξₙ = j}, the conditional Qᵢ-expectation of R(ξₙ₊₁) given ξ₀, …, ξₙ is Σₖ Q(j, k)R(k), by Markov (15). When j = 0, this sum is clearly R(j). When j > 0, this sum is

aⱼR(j + 1) + bⱼR(j − 1) = aⱼ[R(j) + rⱼ] + bⱼ[R(j) − rⱼ₋₁]
  = R(j) + aⱼrⱼ − bⱼrⱼ₋₁
  = R(j). *
*
that ξₙ₊₁, ξₙ₊₂, … proceed steadily to the right until reaching k. So, the Qⱼ-probability that i < ξ₀, …, ξ_{md} < k is no more than (1 − π)ᵐ.
Restate (106) as

(107) fQ{i}(j, k) + fQ{k}(j, i) = 1.

Of course, fQ{i}(j, k) is the Qⱼ-probability that ξ hits k before i.
(108) Lemma. fQ{k}(j, i) = [R(k) − R(j)]/[R(k) − R(i)].

PROOF. Let x = fQ{k}(j, i), so 1 − x = fQ{i}(j, k) by (107). Let τ be the least n with ξₙ = i or k, and τ = ∞ if none. Stop {R(ξₙ)} at τ, using (106) and (10.28). Thus

R(j) = x R(i) + (1 − x) R(k).

Solve for x. *
(109) Lemma. fQ{k}(j, i) = fP{k}(j, i).

PROOF. Let i₀, i₁, …, iₙ be any I-sequence which does not contain 0 except possibly for iₙ. Then

Pⱼ{ξ₀ = i₀, ξ₁ = i₁, …, ξₙ = iₙ} = Qⱼ{ξ₀ = i₀, ξ₁ = i₁, …, ξₙ = iₙ}.

Sum over all such sequences, with i₀ = j and iₙ = i and iₘ ≠ k for 0 < m < n; even n is allowed to vary. *
(110) Lemma. (a) fP{k}(j, i) = [R(k) − R(j)]/[R(k) − R(i)].
(b) fP(j, i) = [R(∞) − R(j)]/[R(∞) − R(i)] if R(∞) < ∞.
(c) fP(j, i) = 1 if R(∞) = ∞.

PROOF. Use (108, 109) to get (a). Let k → ∞ and use (101) to get (b) and (c). *
(111) Lemma. fP(i, j) = 1.

PROOF. As in (106). *
(114) 1 − fP{0}(k, k) = bₖ[1 − R(k − 1)/R(k)],

and by algebra,

(115) 1 − fP{0}(k, k) = aₖrₖ/R(k).

Now compute mP(0, 0), as follows:

mP(0, 0) = Σₖ₌₀^∞ eP{0}(0, k)   (102)
  = eP{0}(0, 0) + Σₖ₌₁^∞ eP{0}(0, k)
  = 1 + Σₖ₌₁^∞ fP{0}(0, k)/[1 − fP{0}(k, k)]   (100)
  = 1 + Σₖ₌₁^∞ 1/(aₖrₖ)   (114, 115). *
In a similar way, mP(i, j) can be computed. It is easy to drop the condition 0 < aⱼ < 1, and using the idea of (94), it is easy to handle the case where I is all the integers. For further information on random walks, consult (Chung and Fuchs, 1951), (Feller, 1966), or (Spitzer, 1964).
1.13] THE TAIL σ-FIELD AND A THEOREM OF OREY 39
moving subclasses; recall from (48) that i and j are in the same Iₘ iff there is an n ≥ 0 and a state k in I with Pⁿ(i, k) > 0 and Pⁿ(j, k) > 0. For m ∈ M, let t(m) be the index of the subclass following Iₘ; so i ∈ Iₘ and P(i, j) > 0 imply j ∈ I_{t(m)}, by (45). Thus, t is a 1-1 mapping of M onto itself. Let T be the
(119) Theorem. Each ℱ(∞)-set differs by a P_p-null set from some union of sets {ξ₀ ∈ Iₘ}. More precisely, let A ∈ ℱ(∞). Let M(A) be the set of m ∈ M such that Pᵢ(A) > 0 for some i ∈ Iₘ. Then A differs from ∪{ξ₀ ∈ Iₘ: m ∈ M(A)} by a P_p-null set. Conversely, each set {ξ₀ ∈ Iₘ} differs by a P_p-null set from an ℱ(∞)-set. Finally, M(T⁻¹A) = t⁻¹M(A).

NOTE. T⁻¹{ξ₀ ∈ Iₘ} = {ξ₁ ∈ Iₘ} = {ξ₀ ∈ I_{t⁻¹(m)}}: the last equality is a.e.

WARNING. P_p(A) = 0 and P_p(T⁻¹A) = 1 is a possibility, because p(Iₘ) = 0 and p(I_{t⁻¹(m)}) = 1 is a possibility.
(120) Theorem. Each 𝒥-set differs by a P_p-null set from some union of sets {ξ₀ ∈ I_c}. Conversely, each set {ξ₀ ∈ I_c} differs by a P_p-null set from a 𝒥-set.

NOTE. The partition {I_e: e ∈ E} is finer than the partition {Iₘ: m ∈ M}, which in turn is finer than {I_c: c ∈ C}.
Turn now to the proofs. The first result is the 0–1 law of Hewitt and Savage (1955). For future use, I will state the result quite generally. Let (V, 𝒜) be a measurable space. Let V^∞ be the space of V-sequences, endowed with the product σ-field 𝒜^∞. A finite permutation π on Z = (0, 1, …) is a 1-1 mapping of Z onto Z, with π(n) = n for all but finitely many n. Each π induces a 1-1 bimeasurable mapping π* of V^∞ onto V^∞:

π*(v₀, v₁, …) = (v_{π(0)}, v_{π(1)}, …).

The σ-field ℰ of exchangeable sets is the σ-field of A ∈ 𝒜^∞ with π*A = A for all finite permutations π of Z. Let (Ω, ℱ, 𝒫) be a probability triple. Let X₀, X₁, … be measurable mappings from (Ω, ℱ) to (V, 𝒜). Then X = (X₀, X₁, …) is a measurable mapping from (Ω, ℱ) to (V^∞, 𝒜^∞). An exchangeable X-set is a set X⁻¹A with A ∈ ℰ.
Consequently,

|Q(B ∩ A) − Q(B ∩ π*Aₜ)| < ε.

I will construct a π with

Q(B ∩ π*Aₜ) = Q(B)Q(π*Aₜ) = Q(B)Q(Aₜ).

Indeed, suppose B depends only on coordinates n ≤ b, and Aₜ depends only on coordinates n ≤ a. Let c > max{a, b}. Let

π(n) = n + c for 0 ≤ n < c
     = n − c for c ≤ n < 2c
     = n      for n ≥ 2c.

Then π*Aₜ depends only on coordinates n with c ≤ n < 2c, and is independent of B. For this π,

|Q(B ∩ π*Aₜ) − Q(B)Q(A)| ≤ Q(B)|Q(Aₜ) − Q(A)| < ε.

As usual,

Q(B ∩ A) − Q(B)Q(A) = Q(B ∩ A) − Q(B ∩ π*Aₜ)
*
by (45). Now P_i{A} is constant at 0 or 1. So P_i{T^{-1}A} is 0 or 1, according as
i ∈ I_m with t(m) ∈ M_0 or t(m) ∈ M_1.
PROOF OF (120). Suppose I_m ⊂ I_c, and I_c has period d. Then (45) shows
I_c = ∪_{n=0}^{d-1} I_{t^{-n}(m)}.
Let A be an invariant set. Then A ∈ F(∞). Use (119) to check M(A) =
M(T^{-1}A) = t^{-1}M(A). Consequently,
k's, the preceding part of the sequence being obtained by permuting some finite
state sequence ip, all the transitions in ipk being possible. I say P_i{A_1} = 0.
For A_1 ⊂ ∪_{n=1}^∞ B_n, where B_n is the set of infinite state sequences with k in
the nth place, the preceding part of the sequence being a permutation of
some finite state sequence ip, all the transitions in ipk being possible. But
i ↛ j implies P_i{B_n} = 0. *
PROOF OF (121). As for (119), using (125-127). *
Similar results apply to random walks. Let G be a countable Abelian group
and let π be a probability on G. Let V be the set of i ∈ G with π(i) > 0. Let
{G_m : m ∈ M} partition G into the cosets of the group spanned by V - V.
Let {G_c : c ∈ C} partition G into the cosets of the group spanned by V. Let
{Z_n : 0 ≤ n < ∞} be independent random variables with values in G, the
Z_n with n ≥ 1 having common distribution π. Let X_n = Σ_{v=0}^{n} Z_v. Now
{X_n} is a Markov chain, but is in general transient. The tail σ-field of {X_n}
is equivalent to the atomic σ-field generated by the sets {X_0 ∈ G_m} for m ∈ M.
The invariant σ-field of {X_n} is equivalent to the atomic σ-field generated by
the sets {X_0 ∈ G_c} for c ∈ C. Proofs are omitted, being virtually identical to
those for (119) and (120). An analog of (128) can also be obtained, using
similar ideas. This leads to another proof of the renewal theorem.
The next theorem is due to Orey (1962).
(128) Theorem. If i and j are in the same cyclically moving subclass,
lim_{n→∞} Σ_{k∈I} |p^n(i, k) - p^n(j, k)| = 0.
PROOF OF (128). Let Q = ½P_i + ½P_j. As (119) implies, F(∞) is trivial for
Q. Use (129) twice, with A = {ξ_n ∈ S} and B = {ξ_0 = i} or B = {ξ_0 = j},
to get
lim_{n→∞} sup_{S⊂I} |P_i{ξ_n ∈ S} - P_j{ξ_n ∈ S}| = 0.
Finally, Σ_{k∈I} |q(k) - r(k)| = 2 sup_{S⊂I} |q(S) - r(S)| for any probabilities
q and r on all subsets S of I. *
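Both the total-variation identity just used and the conclusion of (128) are easy to check numerically. A minimal sketch in Python; the matrix P below is my own example, not from the book:

```python
import numpy as np

# Check sum_k |q(k)-r(k)| = 2 sup_S |q(S)-r(S)| and Orey's theorem (128)
# on a small aperiodic chain; P is a made-up stochastic matrix.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])
Pn = np.linalg.matrix_power(P, 50)
q, r = Pn[0], Pn[1]                 # the rows p^n(i, .) and p^n(j, .)

l1 = np.abs(q - r).sum()
sup_S = (q - r)[q > r].sum()        # sup over S is attained at S = {k : q(k) > r(k)}

assert abs(l1 - 2 * sup_S) < 1e-12  # the identity
assert l1 < 1e-8                    # the rows of p^n merge, as (128) predicts
```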
14. EXAMPLES
return to the origin at the same time with probability 1, which is known to be
false (Chung and Fuchs, 1951, Theorem 6).
NOTE. Let P be a stochastic matrix on I. Let {X_n} and {Y_n} be independent
P-chains. Then {(X_n, Y_n)} is also a Markov chain, with state space I × I and
transitions T, where
T^n[(i, j), (i′, j′)] = p^n(i, i′) p^n(j, j′).
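In matrix form, the product chain's one-step matrix is the Kronecker product of P with itself, and its n-step matrix factors the same way. A quick numerical check, with a small P of my own choosing:

```python
import numpy as np

# The product chain {(X_n, Y_n)} has one-step matrix T = P (kron) P, and
# T^n = P^n (kron) P^n, i.e. T^n[(i,j),(i',j')] = p^n(i,i') p^n(j,j').
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
T = np.kron(P, P)                       # one-step transitions on I x I
n = 5
lhs = np.linalg.matrix_power(T, n)
rhs = np.kron(np.linalg.matrix_power(P, n), np.linalg.matrix_power(P, n))
assert np.allclose(lhs, rhs)
```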
is measurable on U_{m+1}, U_{m+2}, ..., and A has probability 0 or 1 by the
Kolmogorov 0-1 law.
(133) Example. There is a transition matrix P and starting probability
p on I = {1, 2, 3} for which I is a communicating class of period 1, but the
exchangeable σ-field is nontrivial.
DISCUSSION. P is the matrix
Let ρ and σ be finite sequences of states with all transitions possible in 1ρ3
and 3σ3. In a finite sequence of states with all transitions possible, any 3 is
either at the beginning of the sequence or is preceded by a 2. Consequently,
in 1ρ there is one more 2 than 3. In 3σ, however, there are as many 2's as 3's.
Thus 1ρ cannot be permuted into 3σ, and in view of (121), the σ-field of
exchangeable events is not trivial for P_p, provided p(1) > 0 and p(3) > 0.
2
1. INTRODUCTION
I want to thank Allan Izenman for checking the final draft of this chapter.
(3) Theorem (Kingman and Orey, 1964). Let N be a positive integer, and
let ε > 0. Suppose I has period 1, and p^N(i, i) > ε for all i. Then for each
m = 0, 1, ... and i, j, k, l ∈ I:
2. REVERSAL OF TIME
PROOF. v(j) ≥ Σ_k v(k)p^n(k, j) ≥ v(i)p^n(i, j). *
(6) Lemma. If I is a communicating class, either v is identically 0, or v is
strictly positive and locally finite, or v(i) = ∞ for all i.
PROOF. Use (5). *
Let J = {i : 0 < v(i) < ∞}. For any matrix M on I, define the matrix
vM on J by:
vM(i, j) = v(j)M(j, i)/v(i).
Call vM the reversal of M by v.
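When v is an invariant probability for a stochastic P, the reversal vP is again stochastic, with the same invariant measure. A sketch; the matrix is my own example:

```python
import numpy as np

# Reversal of a stochastic P by an invariant probability v:
# vP(i,j) = v(j) P(j,i) / v(i) is again stochastic and v-invariant.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])
# invariant probability: left eigenvector of P for eigenvalue 1
w, vecs = np.linalg.eig(P.T)
v = np.real(vecs[:, np.argmin(np.abs(w - 1))])
v = v / v.sum()

vP = v[None, :] * P.T / v[:, None]     # element (i,j) is v(j) P(j,i) / v(i)
assert np.allclose(vP.sum(axis=1), 1)  # the reversal is stochastic
assert np.allclose(v @ vP, v)          # v is invariant for the reversal
```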
The general case. Split f into its positive and negative parts. Use (17) and
case 3. *
Remember that μ(i) = e_P{s}(s, i) is the P_s-mean number of i's in the first
s-block.
(18) Lemma. (a) Let k ≠ s. The P_s-mean number of pairs j followed by k
in the first s-block is μ(j)P(j, k).
(b) The P_s-probability that the first s-block ends with j is μ(j)P(j, s).
PROOF. Claim (a). Let τ be the least n > 0 if any with ξ_n = s, and τ = ∞
if none. Let α_n be the indicator function of the event
A_n = {ξ_n = j and n < τ}.
Let β_n be the indicator function of the event
B_n = {A_n and ξ_{n+1} = k}.
Confirm τ > n + 1 on B_n, because k ≠ s. On {ξ_0 = s}:
Σ_{n=0}^∞ α_n is the number of j's in the first s-block;
Σ_{n=0}^∞ β_n is the number of pairs j followed by k in the first s-block.
Check that {τ > n} is in the σ-field generated by ξ_0, ..., ξ_n. Now Markov
(1.15) and monotone convergence imply
= μ(j)P(j, k).
Claim (b) is similar. *
(19) Lemma. μ(s) = 1 and μ is invariant.
PROOF. By definition, the first s-block begins at the first s, and ends just
before the second s. There is exactly one s in the first s-block, so μ(s) = 1.
If k = s, then μ(j)P(j, k) is the P_s-probability that the first s-block ends with
j, by (18b). The sum on j is therefore 1, that is, μ(s). If k ≠ s, then μ(j)P(j, k)
is the P_s-mean number of pairs j followed by k in the first s-block, by (18a).
The sum on j is therefore the P_s-mean number of k's in the first s-block,
that is, μ(k). *
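Lemma (19) can be checked numerically: for a positive recurrent chain, uniqueness of the invariant measure forces μ = π/π(s), which indeed satisfies μ(s) = 1 and μP = μ. A sketch with a made-up P:

```python
import numpy as np

# Lemma (19) numerically: mu = pi / pi(s) has mu(s) = 1 and mu P = mu.
# P is a made-up positive recurrent chain; s = 0 is the reference state.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()                 # the invariant probability

s = 0
mu = pi / pi[s]                    # mean occupation measure of the first s-block
assert abs(mu[s] - 1) < 1e-12
assert np.allclose(mu @ P, mu)     # mu is invariant
```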
(20) Lemma. If v is a strictly positive and locally finite invariant measure,
then v(j)/v(k) = μ(j)/μ(k).
PROOF. Use (15), with f = δ_j and g = δ_k. Then use (19) to put μ for v
in (15). *
PROOF OF (1). Use (12, 19, 20). *
PROOF OF (2). Abbreviate θ(i, j) = δ_i e_nP δ_j. Then
θ(i, j)/θ(k, l) = [θ(i, j)/θ(k, j)] · [θ(k, j)/θ(k, l)].
Use (13) with δ_i for p and δ_k for q and j for i to make the first factor converge
to 1. Use (15) with k for i and δ_j for f and δ_l for g to make the second factor
converge to μ(j)/μ(l). *
4. VARIATIONS
So
= ∫ Σ_{n≥1} Z_n dP_i + ∫ A dP_i + ∫ B dP_i.
∫ A dP_i = ∫ B dP_i;
∫ Z_1 dP_i = e_P{j}(j, k).
[Figure 1. Two sample paths, with T = 0 and T = 3. Visits to i and j are
shown, but not to k. The variables A, B, Z_1, Z_2, ... are the numbers of k's in
the intervals shown.]
By blocks (1.31),
(22) the variables Z_1, Z_2, ... are P_i-independent and identically distributed.
I claim
This proves (23).
For (24), let I be a positive recurrent class. That is, the P_i-mean waiting
time m_P(i, i) for a return to i is finite. And π(i) = 1/m_P(i, i) is the unique
invariant probability.
PROOF. Claim (a). As in (21), using (1.84) for the first argument.
Claim (b). Use (1) and (a). *
At a first reading of the book, skip to Section 6.
Remember e_nP = P^0 + ... + P^n, so (e_nP)(i, j) is the P_i-mean number of
j's through time n, and
pMf = Σ_{i,j} p(i)M(i, j)f(j).
So
By blocks (1.31),
(27) Z_1, Z_2, ... are P_p-independent and identically distributed.
I claim
(28) {T(n) < m} is P p-independent of Zm.
[Figure 2. Two sample paths, with T(n) = 0 and T(n) = 2. Visits to j are
shown, but not to i. The variables A, B, Z_1, Z_2, ... are the numbers of i's in
the intervals shown.]
for t < m. To prove (29), let σ be the time of the (t + 1)st j. Then σ is Markov,
and ξ_σ = j on {σ < ∞}. Moreover, {T(n) = t} is in the pre-σ sigma field.
Let ζ be the post-σ process. Then Z_m = Z_{m-t} ∘ ζ on {T(n) = t}, as in Figure
2. By strong Markov (1.22):
[Figure 3. Two time lines, one from time 0 and one from time 0 to time n,
with the visits to i and j marked; in the picture A = 3 and B(n) = 2.]
As in Figure 3, let A be the number of i's on or after the first i but before the
first j after the first i, and let B(n) be the number of i's on or after the first i
after n, but before the first j after the first i after n. By the strong Markov
property (1.22*),
∫ A dP_p = ∫ B(n) dP_p = e_P{j}(i, i),
which is finite by (1.100). So
|(p e_nP δ_j)/(p e_nP δ_i) - e_P{j}(j, i)| ≤ e_P{j}(i, i)/(p e_nP δ_i).
But p e_nP δ_i → ∞ as n → ∞, by (1.51). *
(31) Theorem. Let f be a function on I, with μ(|f|) < ∞ and μ(f) ≠ 0.
Then
lim_{n→∞} (δ_i e_nP f)/(δ_j e_nP f) = 1.
But, I say,
e_{μP}{j}(j, i) = μ(i)/μ(j).
For μ(·)/μ(j) is 1 at j by inspection, and is μP-invariant by computation. So
(1) works on μP.
The general case. Let f^+ and f^- be the positive and negative parts of
f, so f = f^+ - f^-. Let
a_n = δ_i e_nP f^+ and b_n = δ_i e_nP f^-,
c_n = δ_j e_nP f^+ and d_n = δ_j e_nP f^-.
Then
(δ_i e_nP f)/(δ_j e_nP f) = (a_n - b_n)/(c_n - d_n)
= [a_n/c_n - (b_n/d_n)(d_n/c_n)] / [1 - d_n/c_n].
Use the special case to get a_n/c_n → 1 and b_n/d_n → 1. Use (15) to get
d_n/c_n → μ(f^-)/μ(f^+) ≠ 1. *
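A numerical illustration of (31), with a made-up three-state chain and a signed f:

```python
import numpy as np

# Illustration of (31): for a positive recurrent chain and f with mu(f) != 0,
# (delta_i e_nP f)/(delta_j e_nP f) -> 1. P and f are my own examples.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])
f = np.array([1.0, -2.0, 3.0])

enP = np.zeros_like(P)
Pk = np.eye(3)
for _ in range(2001):          # e_nP = P^0 + P^1 + ... + P^n
    enP += Pk
    Pk = Pk @ P

ratio = (enP @ f)[0] / (enP @ f)[1]   # i = 0, j = 1
assert abs(ratio - 1) < 1e-2
```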
Let f and g be functions on I, with μ(|f|) < ∞, and μ(|g|) < ∞, and
μ(g) ≠ 0. Let p and q be probabilities on I. Combining (13) and (25) gives
(32) lim_{n→∞} (p e_nP δ_i)/(q e_nP δ_j) = μ(i)/μ(j).
Combining (15) and (31) gives
(33) lim_{n→∞} (δ_i e_nP f)/(δ_j e_nP g) = μ(f)/μ(g).
Of course, (33) can be obtained from (32) by reversal. It is tempting to
combine (32) and (33). However, according to Krengel (1966), there are
probabilities p and q bounded above setwise by a multiple of μ, and a set
A ⊂ I with μ(A) < ∞, such that (p e_nP 1_A)/(q e_nP 1_A) fails to converge. The
same paper contains this analog of (31):
Indeed,
= Σ_{t=1}^∞ P_i{T = t} ∫ Σ_{m=0}^∞ g(ξ_m) dP_i
by Markov (1.15*). This proves (35).
Abbreviate ‖g‖ = sup_k g(k). Divide both sides of (35) by δ_j e_nP g. For each
t, the ratio (δ_j e_{n-t}P g)/(δ_j e_nP g) is at most 1, and tends to 1 as n increases,
because
0 ≤ δ_j e_nP g - δ_j e_{n-t}P g ≤ t · ‖g‖
and δ_j e_nP g → ∞ as n → ∞. By dominated convergence,
lim inf_{n→∞} (δ_i e_nP g)/(δ_j e_nP g) ≥ 1.
Interchange i and j. *
Let ζ be the post-τ process; that is, ζ maps {τ < ∞} into Ω by
[ζ(ω)](n) = ξ_{τ(ω)+n}(ω) for ω ∈ domain ξ_{τ(ω)+n}.
Let j ∈ J and A ∈ 𝒴 with A ⊂ {ξ_0 = j}. Then (39) makes
This gives (36) with 𝒫_J for 𝒫 and ξ_J for ξ.
Indeed,
𝒫{Y_0 = 1 and T ≥ 1} = 𝒫{Y_0 = 1}.
For n ≥ 2,
𝒫{Y_0 = 1 and T ≥ n}
= 𝒫{Y_0 = 1 and Y_1 = ... = Y_{n-1} = 0}
= 𝒫{Y_1 = ... = Y_{n-1} = 0} - 𝒫{Y_0 = ... = Y_{n-1} = 0}
= 𝒫{Y_0 = ... = Y_{n-2} = 0} - 𝒫{Y_0 = ... = Y_{n-1} = 0}.
𝒫{Y_0 = 1} + 𝒫{Y_0 = 0} - lim_{n→∞} 𝒫{Y_0 = ... = Y_n = 0},
which is the same as
6. PROOF OF KINGMAN-OREY
I learned the proof of (3) from Don Ornstein: here are the preliminaries.
Let f(p, a) = p^a (1 - p)^{1-a}.
(47) Lemma. Fix a with 0 < a < 1. As p runs through the closed interval
[0, 1], the function p → f(p, a) has a strict maximum at p = a.
PROOF. Calculus. *
Let
m(p, a) = f(p, a)/f(a, a).
(48) Lemma. Let 0 < p < a < 1. Let Y_1, Y_2, ... be independent and
identically distributed random variables on the probability triple (Ω, F, 𝒫),
each taking the value 1 with probability p, and the value 0 with probability 1 - p.
(a) 𝒫{Y_1 + ... + Y_n ≥ na} ≤ m(p, a)^n.
(b) m(p, a) < 1.
PROOF. Claim (a). Abbreviate Y = Y_1 and S = Y_1 + ... + Y_n.
Write E for expectation relative to 𝒫. Let 1 < x < ∞. Then S ≥ na iff
x^S ≥ x^{na}. By Chebychev, this event has probability at most
x^{-na} E(x^S) = [x^{-a} E(x^Y)]^n.
Compute
E(x^Y) = 1 - p + px.
By calculus, the minimum of x → x^{-a}(1 - p + px) occurs at
x = a(1 - p)/(p(1 - a)) > 1
and is m(p, a). Use this x.
Claim (b). Use (47). *
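Lemma (48a) is an exponential (Chernoff-type) bound on binomial tails. Here is a numerical check, taking f(p, a) = p^a(1 - p)^{1-a}, the form consistent with the minimization in the proof:

```python
import math

# Check P{Y1+...+Yn >= na} <= m(p,a)^n of Lemma (48), with the assumed
# f(p,a) = p^a (1-p)^(1-a), against the exact binomial tail.
def f(p, a):
    return p**a * (1 - p)**(1 - a)

def m(p, a):
    return f(p, a) / f(a, a)

p, a, n = 0.3, 0.5, 40
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(math.ceil(n * a), n + 1))
assert tail <= m(p, a)**n < 1
```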
(49) Lemma. Let 0 ≤ f ≤ ∞ be a subadditive function on {1, 2, ...}, which
is finite on {A, A + 1, ...} for some A. Let
α = inf_n f(n)/n.
Then
lim_{n→∞} f(n)/n = α.
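Lemma (49) is the subadditive (Fekete) lemma; the intended application below takes f(n) = -log p^n(i, i). A quick numeric illustration, with my own two-state chain:

```python
import numpy as np

# Fekete's subadditive lemma (49) applied to f(n) = -log p^n(i,i):
# f is subadditive because p^{m+n}(i,i) >= p^m(i,i) p^n(i,i).
P = np.array([[0.5, 0.5],
              [0.3, 0.7]])
fvals = []
Pk = np.eye(2)
for n in range(1, 400):
    Pk = Pk @ P
    fvals.append(-np.log(Pk[0, 0]))          # f(n) for n = 1, 2, ...

alpha = min(fvals[n - 1] / n for n in range(1, 400))
assert abs(fvals[-1] / 399 - alpha) < 1e-3   # f(n)/n approaches inf_n f(n)/n
```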
PROOF. Use (49) on f(n) = -log p^n(i, i). *
(51) Lemma. Suppose I is a communicating class of period 1. There is an L
with 0 ≤ L ≤ 1, such that lim_{n→∞} [p^n(i, j)]^{1/n} = L for all i and j in I.
PROOF. Let k > 0. Then
(52) k^{1/n} → 1 as n → ∞.
As (50) implies, lim_{n→∞} [p^n(i, i)]^{1/n} = L(i) exists for all i. Fix i ≠ j. Choose
a and b with
p^a(j, i) > 0 and p^b(i, j) > 0.
Then
p^n(j, j) ≥ p^a(j, i) · p^{n-a-b}(i, i) · p^b(i, j).
Take nth roots and use (52-53) to get L(j) ≥ L(i). Interchange i and j to
get L(i) = L(j) = L, say. Abbreviate g(n) = [p^n(i, j)]^{1/n}. I say g(n) → L.
Indeed,
p^n(i, j) · p^a(j, i) ≤ p^{n+a}(i, i)
p^n(i, j) ≥ p^b(i, j) · p^{n-b}(j, j).
(55) Lemma. Suppose P(i, i) > ε > 0 for all i ∈ I. Let p be a probability
on I. Then
lim_{n→∞} (p P^{n+1} δ_i)/(p P^n δ_i) = 1.
Check that I is still a communicating class of period 1 for P*. Use (1.47)
to create a positive integer n* such that:
(57)
By algebra,
ε/(ε + ε′) < s_m/t_m < ε/(ε - ε′) for m ∈ M.
By Cauchy's inequality,
(58) ε/(ε + ε′) < a_n/c_n < ε/(ε - ε′).
Let Y_1, Y_2, ... be independent and identically distributed random variables
on (Ω, F, 𝒫), taking the value 1 with probability 1 - ε and the value
0 with probability ε. Let ȳ = 1 - y. Because P* is stochastic,
0 ≤ p P*^m δ_j ≤ 1.
Similarly,
d_n ≤ 𝒫{Y_1 + ... + Y_n ≥ (n + 1)(1 - ε + ε′)}
+ 𝒫{Y_1 + ... + Y_n ≤ n - (n + 1)(1 - ε - ε′)}
≤ 𝒫{Y_1 + ... + Y_n ≥ n(1 - ε + ε′)}
+ 𝒫{Y_1 + ... + Y_n ≤ n(ε + ε′)}
by (56). So (48) produces an r = r(ε, ε′) < 1 such that
(59) b_n ≤ r^n and d_n ≤ r^n.
But (51, 54, 57) make lim inf (a_n + b_n)^{1/n} ≥ 1, so r^n = o(a_n + b_n); and
b_n ≤ r^n by (59), so b_n = o(a_n + b_n), forcing b_n = o(a_n). Similarly,
d_n = o(c_n). Therefore,
lim inf (a_n + b_n)/(c_n + d_n) = lim inf a_n/c_n and
lim sup (a_n + b_n)/(c_n + d_n) = lim sup a_n/c_n.
By (57, 58), the lim inf and lim sup of (p P^{n+1} δ_j)/(p P^n δ_j) are both trapped
between ε/(ε + ε′) and ε/(ε - ε′); these bounds are close to 1. *
(60) Lemma. Suppose P(i, i) > ε > 0 for all i ∈ I. For any probability p
on I,
lim_{n→∞} (p P^n δ_j)/(p P^n δ_i) = e_P{i}(i, j).
PROOF. For any subsequence, use the diagonal argument (10.56) to find
a sub-subsequence n′ such that (p P^{n′} δ_j)/(p P^{n′} δ_i) converges as n′ → ∞, say to
ρ(j), for all j ∈ I. Here 0 ≤ ρ(j) ≤ ∞, and ρ(i) = 1. My problem is to show
ρ(j) = e_P{i}(i, j). As (1.47) shows, p P^n δ_i > 0 for all large n. Make the
convention ∞ · 0 = 0, as required by (1.80), and estimate as follows:
Σ_j ρ(j)P(j, k)
= Σ_j lim_{n′→∞} [(p P^{n′} δ_j)/(p P^{n′} δ_i)] P(j, k)   (convention)
≤ lim_{n′→∞} Σ_j [(p P^{n′} δ_j)/(p P^{n′} δ_i)] P(j, k)   (Fatou)
= lim_{n′→∞} (p P^{n′+1} δ_k)/(p P^{n′} δ_i)   (algebra)
= ρ(k)   (definition).
So ρ is subinvariant. Use (1). *
I will now work on the case N > 1 in theorem (3). Suppose N is an integer
greater than 1. Suppose I is a recurrent class of period 1 relative to P, and
PROOF. Check that I is a recurrent class of period 1 for P^N. Let r be one
of 0, ..., N - 1. Use (55) with P^N for P and p P^r for p, to see that
P^N. The uniqueness part of (1), applied to P^N, identifies μ(j) with
e_P{i}(i, j). *
(64) Lemma. lim_{n→∞} p^{n+1}(j, j)/p^n(j, j) = 1.
PROOF. Reverse (62), as in (15). *
PROOF. Let r be one of 1, ... , N - 1. By algebra,
Put m = n + r:
Invert:
(66)
7. AN EXAMPLE OF DYSON
The construction for (67) has three parameters. The first parameter p is a
positive function on the positive integers, with Σ_{n=1}^∞ p(n) = 1. There are no
other strings on p: you choose it now. The second parameter f is a function
from the positive integers to the positive integers, with f(1) = 1. For n = 2,
3, ..., I will require
(68) f(n) ≥ f(n - 1) + 2.
I get to choose f inductively; it will increase quickly. The third parameter q is
a function from {2, 3, ...} to (0, 1). You can pick it, subject to
You check (67a-b), using (1.16). Abbreviate θ(n) = p^n(s, s). For my f
and n ≥ 2, it will develop that
(70a) θ[f(n)] < 2p(n)/n
(70b) (1 - 2/n)p(n) < θ[f(n) + 1] < (1 + 2/n)p(n)
(70c) 2(1 - 2/n)p(n)P(s, s) < θ[f(n) + 2] < 2p(n)[P(s, s) + 2/n]
You can get (67c-d) from (70).
Remember that I^∞ is the space of I-sequences. And ξ_t(ω) = ω(t) for
ω ∈ I^∞ and t = 0, 1, .... Let α(t, ω) be the first component of ω(t), and let
β(t, ω) be the second component, so
ξ_t(ω) = [α(t, ω), β(t, ω)].
Let τ_n(ω) be the least t if any with α(t, ω) ≥ n, and let τ_n(ω) = ∞ if none.
The probability P_s on I^∞ makes ξ Markov with stationary transitions P and
starting state s. For n = 2, 3, ..., I will require
(71)
Let ζ^n_t = ξ_t for t < τ_n, and ζ^n_t = 0 for t ≥ τ_n. Let N ≥ 2.
Suppose f(n) and q(n) have been chosen for n < N, so that (68, 69, 71) hold.
I know this doesn't determine P, but the P_s-distribution of ζ^N is already
fixed: ζ^N is Markov with stationary transitions and a finite state space;
every state leads to the absorbing state 0. I can use (1.79) to choose f(N) so
large that (68, 71) hold for N. Now you choose q(N) so (69) holds for N.
The construction is finished.
I must now argue (70). Fix n ≥ 2. Suppose τ_n(ω) < ∞. Abbreviate
α(ω) = α[τ_n(ω), ω].
Now α(ω) ≥ n. By (68),
(72a) τ_n(ω) < ∞ implies f[α(ω)] ≥ f(n)
{ξ_{f(n)} = s} ⊂ I^∞\G.
Use (74):
θ[f(n)] ≤ 1 - P_s(G) < 2p(n)/n.
ARGUMENT FOR (70b). Plainly,
So (69) does the first inequality in (70b). For the second, suppose ω ∈ G.
Suppose τ_n(ω) = 1 and ξ_1(ω) ≠ [n, 1]. Then (73c) makes α(ω) > n, and
(72b) makes
(75)
now (73c) prevents ω[f(n) + 1] = s. Suppose τ_n(ω) ≥ 2. Then (72a)
establishes (75), and (73c) still prevents ω[f(n) + 1] = s. That is,
G ∩ {ξ_{f(n)+1} = s} ⊂ {ξ_1 = [n, 1]}.
Use (74):
θ[f(n) + 1] ≤ P_s{ξ_1 = [n, 1]} + P_s{I^∞\G}
But A and B are disjoint, and A ∪ B ⊂ {ξ_{f(n)+2} = s}. This proves the first
inequality in (70c). For the second, suppose ω ∈ G. Suppose τ_n(ω) = 1 and
ξ_1(ω) ≠ [n, 1], or τ_n(ω) = 2 and ξ_2(ω) ≠ [n, 1]. Then (73c) makes α(ω) > n,
and (72b) makes f[α(ω)] ≥ f(n) + 2. So
(76) τ_n(ω) + f[α(ω)] - 1 ≥ f(n) + 2.
Now (73c) prevents ω[f(n) + 2] = s. Suppose τ_n(ω) ≥ 3. Then (72a)
establishes (76), and (73c) still prevents ω[f(n) + 2] = s. If ω ∈ G and
ξ_1(ω) = [n, 1], then ω ∈ A; this uses (73d). So
(78) Theorem. Suppose f and g are functions on I, with μ(|f|) < ∞ and
μ(|g|) < ∞. Suppose at least one of μ(f) and μ(g) is nonzero. Fix s ∈ I. With
P_s-probability 1,
lim_{n→∞} [Σ_{m=0}^{n} f(ξ_m)] / [Σ_{m=0}^{n} g(ξ_m)] = μ(f)/μ(g).
PROOF. Suppose μ(s) = 1 and μ(g) ≠ 0. Using (1.73), confine ω to the
set where ξ_0 = s and ξ_n = s for infinitely many n, which has P_s-probability
1. Let 0 = τ_1 < τ_2 < ... be the times n with ξ_n = s. Let l(n) be the largest m
with τ_m ≤ n, so l(n) → ∞ with n. Let h be a function on I with μ(|h|) < ∞:
the interesting h's are f, |f|, g, |g|. For m = 1, 2, ..., let
Y_m(h) = Σ_n {h(ξ_n) : τ_m ≤ n < τ_{m+1}}
V_m(h) = Y_1(h) + ... + Y_m(h).
I claim that with P_s-probability 1, as n → ∞:
(79) V_{l(n)}(f)/V_{l(n)}(g) → μ(f)/μ(g),
and
(80)
Introduce E_s for expectation relative to P_s. Let ζ_j be the number of j's in the
first s-block: namely, the number of n < τ_2 with ξ_n = j. As (1) implies,
μ(j) = E_s{ζ_j}. Clearly,
So
E_s{Y_1(h)} = Σ_{j∈I} h(j)μ(j) = μ(h).
By blocks (1.31), the variables Y_m(h) are independent and identically
distributed for m = 1, 2, .... The strong law now implies that with P_s-
probability 1,
(81)
Put h = f or g and divide: with P_s-probability 1,
lim_{m→∞} V_m(f)/V_m(g) = μ(f)/μ(g).
Put m = l(n) to get (79). Next,
Y_m(h)/m = V_m(h)/m - [(m - 1)/m] · [V_{m-1}(h)/(m - 1)].
Check
(82)
Clearly,
S_n(f)/S_n(g) = [S_n(f) - V_{l(n)}(f) + V_{l(n)}(f)] / [S_n(g) - V_{l(n)}(g) + V_{l(n)}(g)]
with P_s-probability 1.
PROOF. Put g ≡ 1 in (78). *
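Theorem (78) can be watched in simulation. A sketch with a made-up two-state chain; for it the invariant probability is π = (2/3, 1/3), and μ is proportional to π:

```python
import random

# Simulate the ratio limit of Theorem (78):
# sum f(xi_m) / sum g(xi_m) -> mu(f)/mu(g) on a two-state chain.
random.seed(0)
P = {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]}
f = {0: 1.0, 1: -1.0}
g = {0: 2.0, 1: 1.0}

x, sf, sg = 0, 0.0, 0.0
for _ in range(200_000):
    sf += f[x]
    sg += g[x]
    x = 0 if random.random() < P[x][0][1] else 1   # P[x][0][1] = prob of moving to 0

# invariant probability is pi = (2/3, 1/3); mu is proportional to pi
mu_f = 2/3 * f[0] + 1/3 * f[1]
mu_g = 2/3 * g[0] + 1/3 * g[1]
assert abs(sf / sg - mu_f / mu_g) < 0.05
```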
9. THE SUM OF A FUNCTION OVER DIFFERENT j-BLOCKS
(86) Lemma. Let u_i be real numbers, and 0 < p < 1. Then
PROOF. Easy. *
∫ |Σ_{m=1} W_m|^p d𝒫 < ∞.
Claim (b). Suppose p ≥ 1. Check this computation, using (85).
𝒫{U + V = 0} = 1.
(a) 𝒫{W_1 = K} = 1.
(b) If K ≠ 0, then 𝒫{M = 1} = 1.
For any particular j, the variables {Y_n(j) : n = 1, 2, ...} are independent and
identically distributed relative to P_i. The distribution depends on j, but not
on i.
(92) Theorem. (a) If Y_n(j) is in L^p relative to P_i for some i and j, then
Y_n(j) is in L^p relative to P_i for all i and j.
(b) If P_i{Y_n(j) = 0} = 1 for some i and j, then P_i{Y_n(j) = 0} = 1 for all
i and j.
PROOF. There is no interest in varying i, so fix it. Fix j ≠ k. I will
interchange j and k. Look at Figure 4. Let N(j, k) be the least n such that the nth
j-block contains a k. Abbreviate
Indeed, let A(j, k) be the sample sequence from the first j until just before the
first k after the first j. Let B(k, j) be the sample sequence from this k until
just before the next j. Then A(j, k)B(k, j) is the sample sequence from the
first j until just before the first j after the first k after the first j. So V(j, k)
is the sum of h(η) as η moves through the state sequence A(j, k)B(k, j).
Equally, V(k, j) is the sum of h(η) as η moves through the state sequence
[Figure 4. A sample path from time 0, with visits to j and k marked; the
blocks A(j, k) and B(k, j) are shown, and N(j, k) = 3 in the picture.]
(94) the P_i-distribution of Y_1(j) is c(j, k) times the 𝒫-distribution of C(j, k),
plus [1 - c(j, k)] times the 𝒫-distribution of D_n(j, k).
Abbreviate
U(j, k) = Σ_{m=1}^{N(j,k)-1} D_m(j, k),
with the usual convention that Σ_{m=1}^{0} = 0. Blocks (1.31) and (4.48) show
(95) the 𝒫-distribution of C(j, k) + U(j, k) coincides with the P_i-
distribution of V(j, k).
Claim (a). Assume Y_1(j) ∈ L^p for P_i. I have to show Y_1(k) ∈ L^p for P_i.
I claim
(96) V(j, k) ∈ L^p for P_i.
Suppose c(j, k) < 1, the other case being easier. Now C(j, k) and D_1(j, k)
are in L^p, using (94) and (87b). So U(j, k) ∈ L^p by (89b). This and (88a) force
C(j, k) + U(j, k) ∈ L^p. Now (95) proves (96).
As (93, 96) imply, V(k, j) ∈ L^p for P_i. So C(k, j) + U(k, j) ∈ L^p by (95).
Consequently, C(k, j) ∈ L^p and U(k, j) ∈ L^p by (88b). In particular,
D_1(k, j) ∈ L^p by (89a). This gets Y_1(k) ∈ L^p by (94, 87a).
But
S(j, k) + T(k, j) = V(j, k) ∈ L^p
by (96). So S(j, k) ∈ L^p by (88b). Use (92) to vary j. *
The next result generalizes (1.76).
(98) Corollary. If τ_2(j) - τ_1(j) ∈ L^p relative to P_i for some i and j, then
τ_2(j) - τ_1(j) ∈ L^p and τ(j, k) - τ_1(j) ∈ L^p relative to P_i for all i, j, k.
PROOF. Put h ≡ 1 in (92, 97). *
3
1. INTRODUCTION
This chapter deals with the asymptotic behavior of the partial sums of
functionals of a Markov chain, and in part is an explanation of the central
limit theorem for these processes. Markov (1906) introduced his chains in
order to extend the central limit theorem; this chapter continues his program.
Section 3 contains an arcsine law for functional processes. The invariance
principles of Donsker (1951) and Strassen (1964), to be discussed in B & D,
are extended to functional processes in Section 4. For an alternative
treatment of some of these results, see (Chung, 1960, Section I.16).
Throughout this chapter, let I be a finite or countably infinite set, with at
least two elements. Let P be a stochastic matrix on I, for which I is one
positive recurrent class. Let π be the unique invariant probability; see (1.81).
Recall that the probability P_i on sequence space I^∞ makes the coordinate
process {ξ_n} Markov with stationary transitions P and starting state i.
Fix a reference state s ∈ I. Confine ω to the set where ξ_n = s for infinitely
many n. Let 0 ≤ τ_1 < τ_2 < ... be the times n with ξ_n = s. Let f be a real-
valued function on I. Let
and
Here and elsewhere in this chapter, j is used as a running index with values
1, 2, ...; and not as a generic state in I. Let V_0 = 0 and V_m = Σ_{j=1}^{m} Y_j and
I want to thank Pedro Fernandez and S.-T. Koay for checking the final draft
of this chapter.
and
(2) V_j has finite P_s-expectation.
NOTE. If (2) holds for one reference state s, it holds for all s: the
dependence of V_j on s is implicit. This follows from (2.92), and can be used
to select the reference state equal to the starting state. I will not take
advantage of this. Theorems (3) and (4) hold if [x] is interpreted as any
nonnegative integer m with |m - x| ≤ 2. The max can be taken over all values of
[·]. And i is a typical element of I.
(3) Theorem. n^{-1/2} max {|S_j - V_{[jπ(s)]}| : 0 ≤ j ≤ n} → 0 in P_i-probability.
(4) Theorem. With P_i-probability 1,
(n log log n)^{-1/2} max {|S_j - V_{[jπ(s)]}| : 0 ≤ j ≤ n} → 0.
The idea of comparing S_j with V_{[jπ(s)]} is in (Chung, 1960, p. 78).
For (6), do not assume (1) and (2), but assume
(5) Y_j differs from 0 with positive P_s-probability.
Let v_m be 1 or 0 according as V_m is positive or nonpositive. Similarly, let
s_n be 1 or 0 according as S_n is positive or nonpositive.
NOTE. s is a fixed state, and s_n is a random variable.
(6) Theorem. With P_i-probability 1,
Using (2.24),
E_s(Y_1) = Σ_{k∈I} f(k)E_s(ζ_k) = Σ_{k∈I} f(k)π(k)/π(s) = 0.
Assumption (2) implies E_s(Y_1^2) < ∞.
On {τ_1 ≤ n}, let:
l(n) be the largest j with τ_j ≤ n;
Y′(n) = Σ_j {f(ξ_j) : 0 ≤ j ≤ τ_1 - 1};
Y″(n) = Σ_j {f(ξ_j) : τ_{l(n)} ≤ j ≤ n}.
On {τ_1 > n}, let l(n) = 0 and Y′(n) = S_n and Y″(n) = 0. Let V_{-1} = 0. As
in Figure 1,
(7) S_n = Y′(n) + Y″(n) + V_{l(n)-1}.
[Figure 1. The times 0 through n, with S_n decomposed as Y′(n), then the
blocks Y_1, Y_2, Y_3, then Y″(n); l(n) = 4 in the picture.]
Then (3) follows from (10), (11), and (13), to which (8) and (9) are
preliminary.
(8) Lemma. Let a_1, a_2, ... and 0 < b_1 ≤ b_2 ≤ ... be real numbers with
b_n → ∞ and a_n/b_n → 0. Then
max {a_1, ..., a_n}/b_n → 0.
PROOF. Easy. *
(9) Lemma. Let a > 0. Let Z_1, Z_2, ... be identically distributed, not
necessarily independent, with E(|Z_1|^a) < ∞. Then
n^{-1/a} max_j {|Z_j| : 1 ≤ j ≤ n} → 0 a.e.
PROOF. It is enough to do the case a = 1 and Z_j ≥ 0: to get the general
case, replace Z_j by |Z_j|^a. By (8), it is enough to show
Z_n/n → 0 a.e.
Let ε > 0, abbreviate A_m = {εm ≤ Z_1 < ε(m + 1)}, and check this
computation:
Σ_{n=1}^∞ Prob {Z_n ≥ εn} = Σ_{n=1}^∞ Σ_{m=n}^∞ Prob A_m
= Σ_{m=1}^∞ Σ_{n=1}^{m} Prob A_m
= Σ_{m=1}^∞ m · Prob A_m
≤ (1/ε) Σ_{m=1}^∞ ∫_{A_m} Z_1
< ∞.
Borel-Cantelli implies that almost everywhere, Z_n ≥ εn for only finitely
many n. Let ε ↓ 0 through a sequence. *
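Lemma (9) in simulation: with a = 2 and exponentially distributed Z's (so E(Z^2) < ∞), the normalized running maxima shrink. The distribution and checkpoints are my own choices:

```python
import random

# Lemma (9): n^(-1/a) max_j |Z_j| -> 0 a.e. when E|Z|^a < oo.
# Demo with a = 2 and standard exponential Z's.
random.seed(1)
n = 200_000
running_max, checks = 0.0, []
for j in range(1, n + 1):
    running_max = max(running_max, random.expovariate(1.0))
    if j in (1000, 10_000, 100_000, n):
        checks.append(running_max / j**0.5)   # n^(-1/2) * max so far

# the normalized maxima head toward 0
assert checks[-1] < checks[0]
assert checks[-1] < 0.1
```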
PROOF. |Y′(n)| ≤ Σ_j {|f(ξ_j)| : 0 ≤ j ≤ τ_1 - 1}, for all n. *
(11) Lemma. n^{-1/2} max_j {|Y″(j)| : 0 ≤ j ≤ n} → 0 with P_i-probability 1.
PROOF. Plainly, |Y″(j)| ≤ U_{l(j)}, where U_0 is temporarily 0. But
l(n) ≤ n + 1, so
~1-----+-------4----~----~;~~(----~I----~I----~I--~1
o Tl T2 T3 TI(n) - 1 Tl(n) n TI(n) +1
[-RI(n)-d
Figure 2.
has P_i-probability more than 1 - δ. Choose n_1 > n_0, finite but so large that
for n > n_1, the event
C_n = {εn^{1/2} > max {|V_{l(j)-1} - V_{[jπ(s)]}| : 0 ≤ j ≤ n}}
has P_i-probability more than 1 - δ. Thus, P_i(A_n ∩ B_n ∩ C_n) > 1 - 3δ for
n > n_1. If n > n_1 and j = 0, ..., n, I claim
lesser is between 1 and n; and the greater is at most r times the lesser, because
ω ∈ B_n. The inequality is now visibly true, because ω ∈ A_n.
This completes the proof of (3). The proof of (4) is similar, using (B & D,
1.119) instead of (B & D, 1.118). To quote (B & D, 1.118-119), let Y_1,
Y_2, ... be independent, identically distributed random variables on some
probability triple (Ω, F, 𝒫). Suppose Y_1 has mean 0 and variance 1. Let
V_0 = 0, and for n ≥ 1 let
V_n = Y_1 + ... + Y_n.
(B & D, 1.118). Let ε > 0 and r > 1. Let p(r, n) = 𝒫(A), where A is the
event that
Then
lim_{r↓1} lim sup_{n→∞} p(r, n) = 0.
(B & D, 1.119). Let ε > 0. There is an r > 1, which depends on ε but not
on the distribution of Y_1, such that 𝒫(lim sup A_n) = 0, where A_n is the event
that
max {|V_j - V_k| : 0 ≤ j ≤ n and j ≤ k ≤ rj}
exceeds ε(n log log n)^{1/2}.
Assume (5). Suppose that V_m > 0 for infinitely many m along P_i-almost
all paths. In the opposite case, V_m ≤ 0 for all large m, along P_i-almost all
paths, by the Hewitt-Savage 0-1 Law (1.122). The argument below then
establishes (6), with 1 - s_n for s_n and 1 - v_m for v_m. This modified (6) is
equivalent to the original (6).
As in (12), let r_j = τ_{j+1} - τ_j. With respect to P_i, the r_j are independent,
identically distributed, and have mean 1/π(s). The P_i-distribution of r_j does
not depend on i, but does depend on the reference state s. Theorem (6) will
C_n = (1/n) Σ_m {s_m : 1 ≤ m ≤ n}.
The two estimates are
(14) A_n - B_n → 0 with P_i-probability 1
(15) B_n - C_n → 0 with P_i-probability 1.
Add (14) and (15) to get (6) in the form A_n - C_n → 0 a.e. Relation (14) is
obtained by replacing r_{j+1} with its mean value 1/π(s). The error is negligible
after dividing by n, in view of the strong law. Relation (15) will follow from
the fact that essentially all m ≤ n are in intervals [τ_{j+1}, τ_{j+2}) over which
s_m = v_j, because |V_j| is large by comparison with U_{j+1} and Y′(m).
Making this precise requires lemmas (16) and (17). For (16), let r_1, r_2, ...
be any independent, identically distributed random variables, with finite
expectation. Let 𝒢_1 ⊂ 𝒢_2 ⊂ ... be σ-fields, such that r_n is 𝒢_{n+1}-measurable,
and 𝒢_n is independent of r_n. Let z_1, z_2, ... be random variables, taking only
the values 0 and 1, such that z_n is 𝒢_{n+1}-measurable, and Σ_n z_n = ∞ a.e.
(16) Lemma. (Σ_{j=1}^{n} z_j r_{j+1})/(Σ_{j=1}^{n} z_j) converges to E(r_1) a.e.
PROOF. Let Z_n = Σ_{j=1}^{n} z_j. For m = 1, 2, ..., let W_m be 1 plus the smallest
n such that Z_n = m. I say that {W_m = j} ∈ 𝒢_j. Indeed, for j ≥ m + 1,
{W_m = j} = {z_1 + ... + z_{j-1} = m > z_1 + ... + z_{j-2}}.
If m′ < m and A is a Borel subset of the line, I deduce
{r_{W_{m′}} ∈ A and W_m = j} ∈ 𝒢_j;
for this set is
∪_{k=1}^{j-1} {r_k ∈ A and W_{m′} = k and W_m = j}.
I conclude that r_{W_1}, r_{W_2}, ... are independent and identically distributed, the
distribution of r_{W_m} being that of r_1. Indeed, if A_1, ..., A_m are Borel subsets
of the line, then
Prob {r_{W_1} ∈ A_1, ..., r_{W_{m-1}} ∈ A_{m-1}, r_{W_m} ∈ A_m}
= Σ_j Prob {r_{W_1} ∈ A_1, ..., r_{W_{m-1}} ∈ A_{m-1} and W_m = j and r_j ∈ A_m}
= Σ_j Prob {r_{W_1} ∈ A_1, ..., r_{W_{m-1}} ∈ A_{m-1} and W_m = j} · Prob {r_j ∈ A_m}
= Prob {r_{W_1} ∈ A_1, ..., r_{W_{m-1}} ∈ A_{m-1}} · Prob {r_j ∈ A_m},
because {r_{W_1} ∈ A_1, ..., r_{W_{m-1}} ∈ A_{m-1} and W_m = j} ∈ 𝒢_j, while r_j is
independent of 𝒢_j, and Prob {r_j ∈ A_m} does not depend on j.
Because Z_n → ∞ a.e.,
Z_n^{-1} Σ_k {r_{W_k} : 1 ≤ k ≤ Z_n} → E(r_1) a.e.
But
Σ_j {z_j r_{j+1} : 1 ≤ j ≤ n} = Σ_k {r_{W_k} : 1 ≤ k ≤ Z_n}:
for Z_n is the number of j = 1, ..., n with z_j = 1; and W_k is 1 plus the kth
j with z_j = 1. *
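Lemma (16) in simulation: z_j is allowed to depend only on the past r's, and the selected r_{j+1}'s still average out to E(r_1). A sketch with exponential r's of mean 2; the selection rule is my own choice:

```python
import random

# Lemma (16): average the r_{j+1} over the j's with z_j = 1, where z_j
# depends only on r_1, ..., r_j; the limit is still E(r_1).
random.seed(2)
r = [random.expovariate(0.5) for _ in range(100_001)]   # iid, mean 2
num = den = 0.0
for j in range(1, 100_000):
    z = 1 if r[j - 1] > 1.0 else 0     # z_j looks at the past only
    num += z * r[j]                    # r_{j+1} in the lemma's indexing
    den += z
assert den > 0
assert abs(num / den - 2.0) < 0.05
```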
For (17), let Y_1, Y_2, ... be any sequence of independent, identically
distributed random variables. Put V_n = Σ_{j=1}^{n} Y_j. Make no assumptions about
the moments of Y_j. Let M be a positive, finite number. Let d_n be 1 or 0
according as |V_n| ≤ M or |V_n| > M.
(17) Lemma. Suppose Y_j differs from 0 with positive probability. Then
(1/n) Σ_{j=1}^{n} d_j → 0 a.e.
PROOF. Suppose Y_j differs from x with positive probability, for each x.
Otherwise the result is easy. Let C_n be the concentration function of V_n, as
defined in Section 5. Fix k at one of 1, 2, ... and fix r equal to one of 0, ...,
k - 1. Let θ_n be the conditional probability that |V_{nk+r}| ≤ M given
Y_1, ..., Y_{(n-1)k+r}. I claim
(a)
Indeed, let μ be the distribution of Y = (Y_1, ..., Y_{(n-1)k+r}), a probability
on the set of (n - 1)k + r-vectors y = (y_1, ..., y_{(n-1)k+r}). Let A be a Borel
subset of this vector space. Let T = Y_{(n-1)k+r+1} + ... + Y_{nk+r}, so T is
independent of Y and distributed like V_k. Let s(y) = y_1 + ... + y_{(n-1)k+r},
so
V_{nk+r} = s(Y) + T.
By Fubini,
Prob {|V_{nk+r}| ≤ M and Y ∈ A} ≤ ∫_A C_k(2M) μ(dy)
= C_k(2M) · μ(A)
= C_k(2M) · Prob {Y ∈ A}.
(b)
This fact may not be in general circulation; two references are (Dubins and
Freedman, 1965, Theorem (1)) and (Neveu, 1965, p. 147). Claim (b) can
also be deduced from the strong law for coin tossing. Suppose without real
loss that there is a uniform random variable independent of Y_1, Y_2, ....
Then you can construct independent 0-1 valued random variables e_1, e_2, ...
such that: e_n is 1 with probability C_k(2M), and e_n ≥ d_{nk+r}.
Sum out r = 0, ..., k - 1 in claim (b) and divide by k:
(c) lim sup_{n→∞} (1/nk) Σ_j {d_j : 1 ≤ j < (n + 1)k} ≤ C_k(2M) a.e.
Let m and n tend to ∞, with nk ≤ m < (n + 1)k. Then m/(nk) → 1, so (c)
implies
[Figure 3. The times 0, τ_{j+1}, τ_{j+2}, showing V_j, r_{j+1}, and the σ-fields
F_{τ_{j+1}} and F_{τ_{j+2}}.]
the integral does not depend on i. Let G_n be the following random subset of the positive integers: j ∈ G_n iff τ_{j+2} ≤ n and |V_j| > M and U_{j+1} ≤ M/2. In particular, j ∈ G_n implies 1 ≤ j ≤ l(n) − 2. See Figure 4. Of course, G_n depends on M, although this is not explicit in the notation. Let

H_n = ∪ {R_{j+1} : j ∈ G_n},

a random subset of {1, ..., n}. In particular, m ∈ H_n implies τ_2 ≤ m < τ_{l(n)}.
Figure 4.
By looking at Figure 2,

Σ_j {r_{j+1} : 1 ≤ j ≤ l(n) − 2} ≤ n

and

Σ_j {r_{j+1} : j ∈ G_n} = #H_n.

So

Σ_j {r_{j+1} : 1 ≤ j ≤ l(n) − 2 and j ∉ G_n} ≤ n − #H_n.

But d_j = 0 or 1, so

0 ≤ Σ_j {d_j r_{j+1} : 1 ≤ j ≤ l(n) − 2 and j ∉ G_n} ≤ n − #H_n.

Similarly,

Indeed, m ∈ R_{j+1} and j ∈ G_n force S_m to have the sign of V_j: because τ_{j+1} ≤ m < τ_{j+2} ≤ n and |V_j| > M and U_{j+1} ≤ M/2, so |S_m − V_j| ≤ M/2.
If

Σ_{j=1}^∞ d_j < ∞

with positive P_i-probability, Hewitt–Savage (1.122) implies d_j = 0 for all large j, with P_i-probability 1; and (21) holds. Suppose

Σ_{j=1}^∞ d_j = ∞

with P_i-probability 1. Lemma (16) makes this an instance of (1.21), with 𝓕_j = 𝓕_{τ_j}. See Figure 3. But (17) makes (1/n) Σ_{j=1}^n d_j → 0 with P_i-probability 1. So (21) still holds. Put l(n) − 2 for n in (21) and remember l(n) ≤ n + 1:
(22) Σ_j {d_j r_{j+1} : 1 ≤ j ≤ l(n) − 2} ≤ εn/3 for all large n, with P_i-probability 1.
Next, blocks (1.31) and the strong law give a similar bound with P_i-probability 1. Put l(n) for n; remember that l(n) ≤ n + 1, and the integral is less than ε/3 by choice of M:

(23) Σ_j {r_{j+1} : 1 ≤ j ≤ l(n) − 2 and U_{j+1} > M/2} ≤ εn/3 for all large n, with P_i-probability 1.

(24) The number of m ≤ n with m < τ_2 or m ≥ τ_{l(n)} is at most εn/3 for all large n, with P_i-probability 1.
Let B_n be the random set of m ∈ {1, ..., n} which have property (a) or (b) or (c). But N_n is the complement of H_n relative to {1, ..., n}, proving (20). *
To state the arcsine law (26), define F_a as follows. For a = 0 or 1, let F_a be the distribution function of point mass at a. For 0 < a < 1, let F_a be the probability on [0, 1] with density proportional to y → y^{a−1}(1 − y)^{−a}.
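The arcsine shape — mass piling up near 0 and 1 rather than in the middle — can be computed exactly for short walks. The sketch below (not from the text; the walk length 16 is an arbitrary choice) finds the exact law of the number of strictly positive partial sums by dynamic programming:

```python
from collections import defaultdict
from fractions import Fraction

def positive_time_distribution(n):
    """Exact law of #{m <= n : S_m > 0} for a simple random walk,
    by dynamic programming over pairs (current position, count so far)."""
    half = Fraction(1, 2)
    states = defaultdict(Fraction)
    states[(0, 0)] = Fraction(1)
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for (pos, cnt), pr in states.items():
            for step in (-1, 1):
                q = pos + step
                nxt[(q, cnt + (q > 0))] += pr * half
        states = nxt
    dist = defaultdict(Fraction)
    for (pos, cnt), pr in states.items():
        dist[cnt] += pr
    return dist

dist = positive_time_distribution(16)
# mass concentrates at the two ends -- the arcsine shape
print(float(dist[0]), float(dist[8]), float(dist[16]))
```

The count 0 carries far more mass than the balanced count 8, matching the U-shaped arcsine density.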
PROOF. Use (6) and (Spitzer, 1956, Theorem 7.1). *
P_i{τ_1 = 0 and τ_2 = τ} = 1,

except on a P_i-null set. The first summand is f(i), but this does not help.
Blocks (1.31) imply that the vectors T_1, T_2, ... are independent and identically distributed.
3.4] THE NUMBER OF POSITIVE SUMS 95
lim_{j→∞} (1/j) Σ {S_m : 0 ≤ m < τ_{j+1}} = ∫ Σ {S_m : 0 ≤ m < τ_2} dP_i.

Confine j to the sequence l(n), and use (12). *
The limit in (29) depends on i. For example, let I = {1, 2, 3} and let

P = | 0 1 0 |
    | 0 0 1 |
    | 1 0 0 |

so 1 → 2 → 3 → 1. Then π(i) = 1/3 for i = 1, 2, 3; condition (1) is equivalent to

f(1) + f(2) + f(3) = 0.
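The claim that π is uniform here can be checked mechanically. The sketch below (not from the text) recovers the stationary distribution of this periodic chain by Cesàro averaging of the occupation distribution, which converges even though the powers of P themselves do not:

```python
from fractions import Fraction

# the transition matrix of the 3-cycle 1 -> 2 -> 3 -> 1
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

def cesaro_stationary(P, steps=300):
    """Cesaro averages of the occupation distribution converge to the
    stationary distribution for an irreducible chain, periodic or not."""
    n = len(P)
    dist = [Fraction(1)] + [Fraction(0)] * (n - 1)   # start in state 1
    avg = [Fraction(0)] * n
    for _ in range(steps):
        avg = [a + d for a, d in zip(avg, dist)]
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return [a / steps for a in avg]

pi = cesaro_stationary(P)
print([float(x) for x in pi])
```

With 300 steps (a multiple of the period 3) the average is exactly (1/3, 1/3, 1/3).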
the differences B(t_0), B(t_1) − B(t_0), ..., B(t_n) − B(t_{n−1}) are independent normal random variables, with means 0 and variances t_0, t_1 − t_0, ..., t_n − t_{n−1} respectively. Such a construct exists by (B & D, 1.6). Let C[0, 1] be the space of continuous, real-valued functions on [0, 1], with the sup norm

‖f‖ = max {|f(t)| : 0 ≤ t ≤ 1}.

Give C[0, 1] the metric

distance (f, g) = ‖f − g‖,
and the σ-field generated by the sets which are open relative to this metric. The distribution Π_σ of {B(σ²t) : 0 ≤ t ≤ 1} is a probability on C[0, 1]. For more discussion of all these objects, see Sections 1.5 and 1.7 of B & D. The next theorem, which is an extension of Donsker's invariance principle to functional processes, depends on some results from B & D. These are quoted at the end of the section.
(30) Theorem. Suppose (1) and (2). Then σ² = π(s) · ∫ Y_1² dP_i depends neither on i nor on the reference state s. Let φ be a bounded, measurable, real-valued function on C[0, 1], which is continuous with Π_σ-probability 1. Then

lim_{n→∞} E_i{φ(S(n))} = ∫ φ dΠ_σ.
0 ≤ m/n ≤ (m + 1)/n ≤ 1;

and study the successive corners of S(n), from the greatest one no greater than m/(nπ(s)) to the least one no less than (m + 1)/(nπ(s)). This is depicted in Figure 5.

Figure 5.
is at most

n^{−1/2} max_{j,μ} {|S_j − V_μ| : μ = m or m + 1 and a ≤ j ≤ b};

As before,

is at most

n^{−1/2} max_{j,μ} {|S_j − V_μ| : μ = m or m + 1 and a ≤ j ≤ b};

in this display too, |μ − jπ(s)| ≤ 2. Thus, ‖S(n) − V(n)‖ is at most
lim_n E_i{φ(S(n))} = lim_n E_i{φ(V(n))}
= ∫ φ(f) Π_ρ(df)
= ∫ φ(f) Π_σ(df).

This completes the proof of (30) for bounded and uniformly continuous φ. Now use (B & D, 1.127).

Finally, σ² does not depend on i because the P_i-distribution of Y_1 does not depend on i. And σ² does not depend on s because the approximants to ∫ φ dΠ_σ do not depend on s, and this integral for all bounded, uniformly continuous φ determines Π_σ, so σ. *
NOTE. In principle, σ² can be computed from P. See (Chung, 1960, p. 81) for an explicit formula.
Let K_σ be the set of absolutely continuous functions f on [0, 1] which satisfy: f(0) = 0 and ∫₀¹ |f′(t)|² dt ≤ σ². The extension of Strassen's invariance principle to functional processes is

(32) Theorem. Suppose (1) and (2). For P_i-almost all sample sequences, the indexed subset

{(2 log log n)^{−1/2} S(n) : n = 3, 4, ...}

of C[0, 1] is relatively compact, and its set of limit points is K_σ, where σ² = π(s) · E_i(Y_1²).
3.5] THE CONCENTRATION FUNCTION 99
Let Y_1, Y_2, ... be independent, identically distributed random variables on (Ω, 𝓕, 𝒫), with mean 0 and finite variance ρ². Let V_0 = 0 and

V_n = Y_1 + ··· + Y_n for n ≥ 1.

Let φ be a bounded, real-valued, measurable function on C[0, 1], which is continuous 𝒫-almost everywhere. Then

lim_{n→∞} ∫_Ω φ(V(n)) d𝒫 = ∫_{C[0,1]} φ(f) Π_ρ(df).
(B & D, 1.127). Let {𝒫_n, 𝒫} be probabilities on C[0, 1]. Suppose

∫ φ d𝒫_n → ∫ φ d𝒫

for all bounded, uniformly continuous φ. Then the convergence holds for all bounded, measurable φ which are continuous 𝒫-almost everywhere.
(B & D, 1.129). Let {Z_n, W_n} be measurable maps from (Ω, 𝓕, 𝒫) to C[0, 1]. Suppose

‖Z_n − W_n‖ → 0 in 𝒫-probability.

Let φ be a bounded and uniformly continuous function on C[0, 1]. Then

∫ φ(Z_n) d𝒫 − ∫ φ(W_n) d𝒫 → 0.
For (33), let −∞ < a < b < ∞. Let a_n → a and b_n → b. Then

C_X(u + v) ≤ 𝒫{a ≤ X ≤ a + u + v} + ε
≤ 𝒫{a ≤ X ≤ a + u} + 𝒫{a + u ≤ X ≤ a + u + v} + ε
≤ C_X(u) + C_X(v) + ε.
= C_X(u).
Claim (f). If x ≤ X ≤ x + u and x ≤ Y ≤ x + u, then |X − Y| ≤ u. So

𝒫{x ≤ X ≤ x + u and x ≤ Y ≤ x + u} ≤ 𝒫{|X − Y| ≤ u}. *
For (35–36), let X_1, X_2, ... be independent, identically distributed random variables on the probability triple (Ω, 𝓕, 𝒫). Suppose 𝒫{X_i = 0} < 1. Let K be a positive number. Let S_n = X_1 + ··· + X_n. Let T be the least n if any with |S_n| ≥ K, and T = ∞ if none. Use E for expectation relative to 𝒫.
(35) Lemma. (a) There are A < ∞ and ρ < 1 such that 𝒫{T > n} ≤ Aρⁿ.
(b) E{T} < ∞ and 𝒫{T < ∞} = 1.
(c) Either 𝒫{lim sup S_n = ∞} = 1 or 𝒫{lim inf S_n = −∞} = 1.

NOTE. In (a), the constants A and ρ do not depend on n; the inequality holds for all n.
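The geometric bound in (a) is visible in a fully computable special case (a sketch, not from the text; simple ±1 steps and K = 3 are arbitrary choices). Killing the walk when it leaves (−K, K) and tracking the surviving mass exactly shows the tail decaying at a fixed geometric rate:

```python
from fractions import Fraction

def tail_probs(K, n_max):
    """P{T > n} for T = the least n with |S_n| >= K, S a simple random
    walk; dynamic programming over the surviving states -K+1, ..., K-1."""
    probs = {0: Fraction(1)}
    half = Fraction(1, 2)
    out = []
    for _ in range(n_max):
        nxt = {}
        for pos, p in probs.items():
            for step in (-1, 1):
                q = pos + step
                if abs(q) < K:          # paths reaching |S| >= K are killed
                    nxt[q] = nxt.get(q, Fraction(0)) + p * half
        probs = nxt
        out.append(sum(probs.values()))   # out[n] = P{T > n+1}
    return out

tails = tail_probs(K=3, n_max=30)
# geometric decay, as in (a): the two-step tail ratio settles at a constant < 1
print(float(tails[20]), float(tails[22] / tails[20]))
```

For K = 3 the surviving chain has five states and the tail ratio over two steps is exactly 3/4, the square of the top eigenvalue cos(π/6) of the killed kernel.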
PROOF. Claim (a). Suppose 𝒫{X_i > 0} > 0; the case 𝒫{X_i < 0} > 0 is symmetric. Find b > 0 so small that 𝒫{X_1 ≥ b} > 0. If 𝒫{X_1 ≥ b} = 1 the proof terminates; so suppose 𝒫{X_1 ≥ b} < 1. Find a positive integer N so large that Nb > 2K. Fix k = 0, 1, .... Now S_{Nk} > −K and X_i ≥ b for i = Nk + 1, ..., N(k + 1) imply S_{N(k+1)} ≥ K. So the relation T > N(k + 1) implies |S_n| < K for 1 ≤ n ≤ Nk, and X_i < b for at least one i = Nk + 1, ..., N(k + 1). Consequently,

𝒫{T > N(k + 1)} ≤ (1 − θ)𝒫{T > Nk},

where θ = 𝒫{X_1 ≥ b}^N > 0. By substituting,

𝒫{T > N(k + 1)} ≤ (1 − θ)^{k+1}.

If Nk ≤ n < N(k + 1), then 𝒫{T > n} ≤ 𝒫{T > Nk} ≤ (1 − θ)^k.
So

𝒫{sup_n |S_n| < ∞} = 1.

By countable additivity, there is a finite K with

𝒫{sup_n |S_n| < K} > 0.

This contradicts (b). *
Recall that X_1, X_2, ... are independent and identically distributed, and S_n = X_1 + ··· + X_n.

(36) Theorem. If 𝒫{X_1 = c} < 1 for all c, then C_{S_n}(u) → 0 as n → ∞, for each u ≥ 0.
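For fair coin tossing, C_{S_n}(0) is just the largest binomial probability, and its decay can be computed directly (a sketch, not from the text; the particular n's are arbitrary):

```python
from math import comb

def concentration_at_zero(n):
    """C_{S_n}(0) = max_x P{S_n = x} when S_n counts heads in n fair tosses."""
    return max(comb(n, k) for k in range(n + 1)) / 2 ** n

# quadrupling n roughly halves the concentration: the n**-0.5 rate
vals = [concentration_at_zero(n) for n in (4, 16, 64, 256)]
print(vals)
```

Each quadrupling of n roughly halves the maximum point mass, the n^{−1/2} rate behind the theorem.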
PROOF. If the theorem fails, use (34b, e) to find u > 0 and θ > 0 with C_{S_m}(u) ≥ θ for all m. Let X_1, Y_1, X_2, Y_2, ... be independent and identically distributed. Let U_i = X_i − Y_i and T_n = U_1 + ··· + U_n. By (34f),

(37) 𝒫{|T_n| ≤ u} ≥ θ² for all n.

Let F be the distribution function of X_1. By Fubini,

𝒫{T_1 = 0} = 𝒫{X_1 = Y_1} = ∫ 𝒫{X_1 = y} F(dy) < 1.

By (35c) and symmetry,

(38) 𝒫{lim sup T_n = ∞} = 1.

Choose a positive integer N so large that ¼Nθ² ≥ 2. I will obtain a contradiction by constructing a positive integer M, and N disjoint intervals I_1, I_2, ..., I_N, so that

𝒫{T_M ∈ I_j} ≥ ¼θ² for j = 1, ..., N.

Let I_1 = [−u, u]. By (37),

𝒫{T_n ∈ I_1} ≥ ¼θ² for all n.

As in Figure 6, let τ_1 be the least n with T_n > 3u. As (38) implies, 𝒫{τ_1 < ∞} = 1. Choose positive integers M_1 and K_1 so large that

𝒫{τ_1 ≤ M_1 and T_{τ_1} ≤ K_1} ≥ ¾.

In particular, K_1 > 3u. Let I_2 = [2u, K_1 + u]. I claim

𝒫{T_n ∈ I_2} ≥ ¼θ² for all n ≥ M_1.
Figure 6.
#𝓖 = j(n − r).
Let 𝓕₀ be the family of subsets of F of the form A_i ∪ {x} for (A_i, x) ∈ 𝓖. This representation is not unique, and I must now estimate #𝓕₀. Consider the set 𝓖₀ of all pairs (B, y), where B ∈ 𝓕₀ and y ∈ B. Plainly

#𝓖₀ = (#𝓕₀)(r + 1).

Now (A_i, x) → (A_i ∪ {x}, x) is a 1–1 mapping of 𝓖 into 𝓖₀, so

(#𝓕₀)(r + 1) = #𝓖₀ ≥ #𝓖 = j(n − r).

But r < n/2 and n is even, so r ≤ n/2 − 1, and n − r > r + 1. Therefore,

#𝓕₀ > j.

Let 𝓕′ = 𝓕₀ ∪ (𝓕\{A_1, ..., A_j}). So #𝓕′ > #𝓕. I will argue that 𝓕′ is incomparable. This contradicts the maximality of #𝓕, and settles the even case. First, 𝓕₀ is incomparable since all A ∈ 𝓕₀ have the same cardinality. Second, 𝓕\{A_1, ..., A_j} is incomparable because 𝓕 is. Third, if (A_i, x) ∈ 𝓖 and B ∈ 𝓕\{A_1, ..., A_j}, then

#(A_i ∪ {x}) ≤ #B,

so B ⊂ A_i ∪ {x} entails B = A_i ∪ {x} and A_i ⊂ B. And A_i ∪ {x} ⊂ B also implies A_i ⊂ B. But A_i ⊂ B contradicts the incomparability of 𝓕. This completes the argument that 𝓕′ is incomparable, so the proof for even n.

Suppose n is odd. Let 𝓕 be incomparable with maximal #𝓕. The argument for even n shows that A ∈ 𝓕 has

#A = (n − 1)/2 or (n + 1)/2.

Suppose some A ∈ 𝓕 have #A = (n − 1)/2 and some #A = (n + 1)/2. Let A_1, ..., A_j have cardinality (n − 1)/2. Repeat the argument for even n to construct an incomparable family 𝓕′ with #𝓕′ ≥ #𝓕, and all A ∈ 𝓕′ having #A = (n + 1)/2.
To state (41), let x_1, ..., x_n be real numbers greater than 1. Let V be the set of n-tuples v = (v_1, ..., v_n) of ±1. Let a be a real number. Let U be the set of v ∈ V with

a ≤ Σ_{i=1}^n v_i x_i ≤ a + 2.

(41) Lemma. #U ≤ (n choose ⌊n/2⌋).

PROOF. For v ∈ V, let A(v) = {i : i = 1, ..., n and v_i = 1}. For u ≠ v in U, the sets A(u) and A(v) are incomparable, because all x_i > 1. Use (40). *
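Lemma (41) can be spot-checked by brute force for small n (a sketch, not from the text; the particular x's and the sampled values of a are arbitrary):

```python
from itertools import product
from math import comb

def interval_hits(xs, a):
    """#U: sign vectors v with a <= sum v_i x_i <= a + 2; lemma (41)
    says this count is at most the middle binomial coefficient."""
    return sum(1 for v in product((-1, 1), repeat=len(xs))
               if a <= sum(vi * xi for vi, xi in zip(v, xs)) <= a + 2)

xs = [1.01, 1.1, 1.2, 1.3, 1.5, 1.7, 2.0, 2.2, 2.5, 3.1]
bound = comb(len(xs), len(xs) // 2)     # (10 choose 5) = 252
for a in (-3.0, -1.0, 0.0, 1.5):
    assert interval_hits(xs, a) <= bound
print("bound", bound, "holds for the sampled intervals")
```

With all x_i just above 1 and the interval centered at 0, the bound is attained, so it cannot be improved.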
so the lemma holds for odd n because it does for even n. *

Let

t_m = 2^{−2m} (2m choose m).

By algebra,

t_{m+1} = ((2m + 1)/(2m + 2)) t_m,

so

(2m + 3)^{1/2} t_{m+1} = [(2m + 1)^{1/2}(2m + 3)^{1/2}/(2m + 2)] (2m + 1)^{1/2} t_m ≤ (2m + 1)^{1/2} t_m

by Schwarz. Consequently, the sequence (2m + 1)^{1/2} t_m is nonincreasing.
Figure 7.
= ∫ φ[f_i(y)] dy.

Remember that E is expectation relative to 𝒫. Let φ_i(y) = 1 or 0 according as h_i(y) > 1 or h_i(y) ≤ 1, for 0 ≤ y ≤ t. Let Q be the 𝒫-distribution of Y = (Y_1, ..., Y_n), a probability on S, the set of all n-tuples y = (y_1, ..., y_n) with 0 ≤ y_i ≤ t. For y ∈ S, let
π = 𝒫{a ≤ Σ_{i=1}^n Z_i ≤ a + 2}
= ∫_{y∈S} 𝒫{a ≤ Σ_{i=1}^n g_i(y_i) + Σ_{i=1}^n h_i(y_i)σ_i ≤ a + 2} Q(dy).

By (44),

π ≤ (3/2)(Σ_{i=1}^n p_i)^{−1/2}.

Finally, use (46). *
THE BOUNDARY
1. INTRODUCTION
I want to thank Isaac Meilijson for checking the final draft of this chapter.
I-sequences, with the product σ-field. Give Ω* ∪ I^∞ the σ-field generated by all the subsets of Ω* and all the measurable subsets of I^∞. Let ξ_0, ξ_1, ... be the coordinate processes on Ω* ∪ I^∞. Let Ω be the union of Ω* and the set Ω^∞ of ω ∈ I^∞ such that ξ_n(ω) = i for only finitely many n, for each i ∈ I. Retract ξ_0, ξ_1, ... to Ω, and give Ω the relative σ-field.
For any probability q on I, let P_q be the probability on Ω for which the coordinate process is Markov with starting probability q and stationary transitions P. Of course, P_q(Ω) = 1 because P is transient and q is a probability. If q(i) = 1, write P_i for P_q. Let I^h = {i : i ∈ I and h(i) > 0}. Let (ph)(i) = p(i)h(i), and

P^h(i, j) = h(i)^{−1}P(i, j)h(j)

for i and j in I^h. Plainly, P^h is substochastic on I^h,

(P^h)^n = (P^n)^h and G^h(i, j) = h(i)^{−1}G(i, j)h(j) = Σ_{n=0}^∞ (P^h)^n(i, j) < ∞.
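The h-transform definitions above are easy to exercise numerically. The following sketch (not from the text; the kernel and h are arbitrary choices satisfying the stated hypotheses) checks that P^h is substochastic when Ph ≤ h and that (P^h)² = (P²)^h:

```python
from fractions import Fraction

F = Fraction
# a substochastic (hence transient) kernel on three states
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 4)],
     [F(0), F(1, 2), F(0)]]
h = [F(1), F(2), F(1)]                 # positive, with Ph <= h pointwise

def transform(P, h):
    """P^h(i, j) = h(i)^-1 P(i, j) h(j), as in the text."""
    n = len(P)
    return [[P[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Ph = transform(P, h)
assert all(sum(row) <= 1 for row in Ph)               # P^h substochastic
assert matmul(Ph, Ph) == transform(matmul(P, P), h)   # (P^h)^2 = (P^2)^h
print(Ph)
```

The identity (P^h)^n = (P^n)^h holds exactly because the intermediate h-factors telescope.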
Abbreviate Q" for P;", the probability on n such that the coordinate process
is Markov with starting distribution ph and stationary transitions ph. That is,
extreme excessive functions, the Q" are mutually orthogonal. There are two
properties g(i) must have from the point of view of Q,,: it must vanish a.e.
when i 1= I"; and g(i)h(i) must be a version of the Radon-Nikodym derivative
of PI' with respect to Q", when retracted to f, for i E Jh. Perhaps the leading
°
special case is the following: p concentrates on one state, P is stochastic,
GU, k) > for allj, k E I, and h is harmonic. Then h is positive everywhere
and 0* is not needed. Moreover, On has measure 1 if p(In) = 1; these
quantities will be defined later. Section 2 contains proofs. Section 3 contains
the following theorem:
G(i, ~n)/P(~n) converges to a finite limit Pp-almost surely.
2. PROOFS
A = (T_n^{−1}B) ∪ (A\Ω_n)

for some measurable B. This shows ω ∈ A iff Tω ∈ A. Finally, I claim 𝓘 ⊂ 𝓘_n. Indeed, let A ∈ 𝓘. The problem is to show A ∩ Ω_n ∈ 𝓘_n. But

A ∩ Ω_n = T_n^{−1}A ∈ 𝓘_n.

Assertion (b) is immediate from (a) and

(5) L(ω) = ξ_0(Y^n ω) = ξ_0(T^n ω) for all large n. *

WARNING. Ω_n ∉ 𝓘_{n+1}.
(6) Lemma. Let i ∈ I_N. Then P_i is absolutely continuous with respect to P_p when both probabilities are retracted to Ω_N and 𝓘_N, a version of the Radon–Nikodym derivative on Ω_N being G(i, Y_N)/pG(Y_N).

PROOF. Fix M and j_0, ..., j_M, with j_0 ∈ I_N and j_1 ∉ I_N, ..., j_M ∉ I_N. Let

π = P(j_0, j_1) ··· P(j_{M−1}, j_M) · P_{j_M}{ξ_n ∈ I_N for no n > 0}.

The event

A_m = {Ω_N and ξ_{T_N} = j_0, ..., ξ_{T_N+M} = j_M and T_N = m}

is the same as

{ξ_m = j_0, ..., ξ_{m+M} = j_M, and ξ_n ∈ I_N for no n > m}.

Use (10.16) to vary A. *
Let C_i be the set where G(i, Y_n)/pG(Y_n) converges to a finite limit as n → ∞. Call the limit g(i). Of course, g(i) may be 0. Plainly, C_i ∈ 𝓘, and g(i) is 𝓘-measurable.

(7) Lemma. P_i is absolutely continuous with respect to P_p, when both probabilities are retracted to 𝓘. Moreover, P_p(C_i) = 1, and g(i) is a version of the Radon–Nikodym derivative of P_i with respect to P_p, when both probabilities are retracted to 𝓘.

PROOF. As (4) and (6) imply, P_i is absolutely continuous with respect to P_p on Ω_N and 𝓘. Since Ω_N ↑ Ω as N ↑ ∞, the absolute continuity on 𝓘 follows. Use (10.35) on (6) for convergence. *
Remember that the probabilities P^h_i and Q_h are concentrated in the part of Ω where all coordinates are in I^h.

(8) Lemma. If i ∉ I^h, then Q_h(C_i) = 1 and g(i) = 0 with Q_h-probability 1.

PROOF. Clearly, Q_h{Y_n ∈ I^h} = 1. Moreover,

h(i) ≥ Σ_j P(i, j)h(j) ≥ P(i, j)h(j),

proving

(9) i ∉ I^h and j ∈ I^h imply G(i, j) = 0.

So i ∉ I^h makes G(i, Y_n)/pG(Y_n) = 0 for all n, with Q_h-probability 1. *

A helpful consequence of (9) is that for all i_0, ..., i_n in I, whether in I^h or not,

(10) Q_h{ξ_0 = i_0, ..., ξ_n = i_n} = p(i_0)P(i_0, i_1) ··· P(i_{n−1}, i_n)h(i_n).

And for j ∈ I^h,

(11) phG^h(j) = Σ_{i∈I^h} p(i)G(i, j)h(j)
= Σ_{i∈I} p(i)G(i, j)h(j) by (9)
= pG(j)h(j)
> 0.
(12) Lemma. Let i ∈ I^h. Then Q_h(C_i) = 1. Moreover, P^h_i is absolutely continuous with respect to Q_h, with Radon–Nikodym derivative g(i)/h(i), provided both probabilities are retracted to 𝓘.

PROOF. Keep n so large that i ∈ I_n. Then P^h_i is absolutely continuous with respect to Q_h when both are retracted to Ω_n and 𝓘_n, a Radon–Nikodym derivative being

G^h(i, Y_n)/p^h G^h(Y_n) = h(i)^{−1}[G(i, Y_n)/pG(Y_n)]

when Y_n ∈ I^h. This follows from (6) on P^h and (11), in the part of Ω where all coordinates are in I^h. But this part has probability 1, both for P^h_i and Q_h. Now (7) on P^h makes this derivative sequence converge Q_h-almost surely in the part of Ω where all coordinates are in I^h, which is Q_h-almost all of Ω. This makes Q_h(C_i) = 1. Lemma (7) also identifies the limit of the derivative sequence, namely g(i)/h(i), as the required Radon–Nikodym derivative. *
Let C = ∩_i C_i. So C ∈ 𝓘, and Q_h(C) = 1 by (8) and (12).
That is,

Q_h(B ∩ A) = ∫_A P_p(B) g(i_n) dQ_h. *
Let H = {ω : ω ∈ C and g(·)(ω) is excessive}. Check H ∈ 𝓘.

(15) Lemma. Q_h(H) = 1.

PROOF. Clearly, (1) is satisfied: g(·)(ω) ≥ 0 for all ω ∈ C. Next, I will work on (2): the set of ω ∈ C such that

Σ_{i∈I} p(i)g(i)(ω) = 1

has to have Q_h-probability 1 for all h. But Q_h{ξ_0 ∈ I^h} = 1. So with Q_h-probability 1,

1 = Q_h{ξ_0 ∈ I^h | 𝓘}
= Σ_{i∈I^h} Q_h{ξ_0 = i | 𝓘}
= Σ_{i∈I^h} p(i)g(i) by (14).

By (8), the sum can be extended over all of I, at the expense of changing the exceptional null set.
Finally, I will work on (3): for each i ∈ I, the set of ω such that

Σ_{j∈I^h} P(i, j)g(j)(ω) ≤ g(i)(ω)

has to have Q_h-probability 1. In view of (9), the sum can be extended over all of I, by changing the exceptional null set. *
For ω ∈ H, let Q_ω be the probability on Ω making the coordinate process a Markov chain with starting probability pg(·)(ω) and stationary transitions P^{g(·)(ω)}. Then E_1 ∈ 𝓘 and Q_h(E_1) = 1 by (10.52). Finally, ω ∈ E_1 makes Q_ω concentrate on the C-atom containing ω. *
(18) Theorem. There is one and only one probability m on C, such that

∫_E g dm = h.

Namely, m is Q_h retracted to C.

PROOF. The retraction of Q_h to C works by (13) and (17). For the uniqueness, suppose m on C satisfies

∫_E g dm = h.
As (10) and (10.16) imply, m = Q_h on C. Use (17b):

Q_h{g = h′} = 1

for some h′, using (10.17b, 10.18). Necessarily, h′ = h by (18). And there is an ω ∈ E with g(·)(ω) = h. Next, let ω ∈ E and put h = g(·)(ω). Let m assign mass 1 to the 𝓔-atom containing ω, so m{g = h} = 1. Then m is 0–1 and ∫ g dm = h, so h is extreme. *
(20) Lemma. If ω ∈ E ∩ Ω^∞, then g(·)(ω) is harmonic.

PROOF. If h is excessive but not harmonic, then Q_h(Ω*) > 0. But Q_ω(Ω*) = 0, because ω ∈ E ∩ Ω^∞. *
(21) Lemma. Fix k ∈ I, and let h = G(·, k)/pG(k). Then h is excessive, and equality holds in (3) iff i ≠ k. In particular, h determines k.

PROOF. Σ_{j∈I} P(i, j)G(j, k) is the P_i-mean number of visits to k in positive time. *

NOTE. This observation and Fatou afford another proof of (15).
and let

B_j = {B and ξ_{n+2} = j}.

Use (10). Then

Q_h(A) ≥ Q_h(B \ ∪_j B_j)
= Q_h(B) − Σ_j Q_h(B_j)
= πρ. *

For ω ∈ Ω*, the function g(·)(ω) determines L(ω) by (21). Consequently, 𝓘* = Ω* ∩ 𝓔. Incidentally, the argument shows Q_h(A) = 1. For a more direct proof, see (33).
(23) Theorem. As k ranges over I, the function G(·, k)/pG(k) ranges over the extreme, excessive functions which are not harmonic.

PROOF. From (19) and (20), the only candidates for the role of extreme, excessive, non-harmonic functions are g(·)(ω) for ω ∈ E ∩ Ω*. As (22) implies, Ω* ⊂ E and each candidate succeeds. *
(24) Remark. 𝓔 is a countably generated sub-σ-field of 𝓘. If A ∈ 𝓘, then {ω : ω ∈ E and Q_ω(A) = 1} ∈ 𝓔 differs from A by a Q_h-null set. In particular, 𝓘 and 𝓔 are equivalent σ-fields for Q_h. However, the σ-field 𝓘 is inseparable. Each of its atoms is countable. In general, for ω ∈ E the probability Q_ω is continuous, and therefore assigns measure 0 to the 𝓘-atom containing ω.
(25) Remark. h is extreme iff Q_h is 0–1 on 𝓘.

PROOF. Proved in (19). *
(26) Remark. Suppose P is stochastic. Then 1 is extreme iff there are no further bounded harmonic h.

PROOF. For "if," use an argument based on (1). For "only if," suppose h ≠ 1 is bounded harmonic and ε > 0 is small. Then

1 = ½(1 − ε) · (1 − εh)/(1 − ε) + ½(1 + ε) · (1 + εh)/(1 + ε)

displays 1 as a convex combination of distinct harmonic functions. *
PROOF. Use (25) and (26). *
On a first reading of this chapter, skip to Section 3. It is possible to study the bounded, excessive functions in a little more detail, and in parentheses. To begin with, (10) implies that Q_h is absolutely continuous with respect to P_p on the first n + 1 coordinates, and has derivative h(ξ_n). This martingale converges to dQ_h/dP_p by (10.35). Of course, Q_h need not be absolutely continuous with respect to P_p on the full σ-field. However, if h is bounded by K then Q_h ≤ KP_p by what precedes, and h(ξ_n) converges even in L¹. Conversely, if Q_h ≤ KP_p, even on C, then h is bounded by K in view of (13). If h* = lim h(ξ_n), then h(i) = E(h* | ξ_n = i).
Turn now to extreme, excessive functions which are bounded. The characterization is simple: h is bounded and extreme iff P_p{g = h} > 0. For (13) implies 1 = ∫ g dP_p. If P_p{g = h} = α > 0, then αh ≤ 1; while h is extreme by (19). If h is bounded and extreme, then Q_h{g = h} = 1 by (19), and Q_h ≤ KP_p by the previous paragraph. There are at most countably many such h, say h_1, h_2, .... General h can be represented as

Σ_n q_n h_n + q_c h_c + q_s h_s.

Here the q's are nonnegative numbers which sum to 1. As usual, the h's are excessive. Retracting Q_h and P_p to 𝓔,
3. A CONVERGENCE THEOREM
I remind you that Ω_N is the set where I_N is visited, and T_N is the time of the last visit to I_N. On Ω_N, let ζ_m = ξ_{T_N−m} for 0 ≤ m ≤ T_N. Of course, even ζ_0 is only partially defined, namely on Ω_N.
Let

A_n = {ξ_n = i_M, ..., ξ_{n+M} = i_0, and ξ_ν ∈ I_N for no ν > n + M}.

By Markov (1.15),

P_p(A_n) = P_p{ξ_n = i_M} · π · u(i_0).

Sum out n and manipulate:
For the moment, take (β) on faith. Check that Ω_N and β_N are nondecreasing with N. So (β) and monotone convergence imply, by (10.10b),

P_p{lim_N β_N < ∞} = 1.

Let Ω_g be the intersection of {lim_N β_N < ∞} as a, b vary over the rationals. Then Ω_g is measurable and P_p{Ω_g} = 1. You can check that
∫_A X_{m+1} dP_p ≤ ∫_A X_m dP_p,

for these reasons: X_{m+1} = 0 on {T_N = m} in the first line; split over the sets {ξ_{m+1} = k} and use the definition of X in the second; use (28) in the third; use (21) on R in the fourth; use the definition of X in the last. Consequently,
the sequence X_0, X_1, ... is an expectation-decreasing martingale. Plainly, β_N is at most the number of downcrossings of [a, b] by X_0, X_1, .... By (10.33),

∫_{Ω_N} β_N dP_p ≤ (b − a)^{−1} ∫_{Ω_N} X_0 dP_p ≤ (b − a)^{−1} pG(i)^{−1} ∫_{Ω_N} S(ζ_0, i) dP_p.

Use (28): the last integral is the mean number of visits to i by ζ_0, ζ_1, ..., that is, the mean number of visits to i by ξ_0, ..., ξ_{T_N}, and is no more than pG(i). This proves (β). *
(30) Corollary. For all h, the sequence G(i, ξ_n)/pG(ξ_n) converges Q_h-almost surely. Moreover, h is extreme iff

G(i, ξ_n)/pG(ξ_n) → h(i) for all i, Q_h-almost surely.

PROOF. The sequence converges Q_h-almost surely by (11) and (29) on P^h. The last assertion now follows from (19). *

(31) Remark. G(i, ·)/pG(·) is bounded, because (1.51d) makes

G(i, j)/pG(j) ≤ G(i, i)/pG(i).
4. EXAMPLES
From any state other than 1, it is possible to jump to 1. This transition is not shown.
Figure 1.
P is transient;
P[1, (n, 1)] > 0 and Σ_n P[1, (n, 1)] = 1;
0 < P[(n, m), (n, m + 1)] < 1;
P[(n, m), 1] = 1 − P[(n, m), (n, m + 1)].

In the presence of the other conditions, the first condition is equivalent to

(35)
Suppose i_n converges. Let i_n = (a_n, b_n). Clearly, b_n → ∞. Let j = (c, d). Suppose b_n ≥ d. If a_n = c, then 1 leads to i_n only through j. By (1.16) and strong Markov (1.22),

φ(1, i_n) = φ(1, j) · φ(j, i_n).

The right side of (35) is therefore 1/φ(1, j). If a_n ≠ c, then j leads to i_n only through 1. So

φ(j, i_n) = φ(j, 1) · φ(1, i_n),

and the right side of (35) is φ(j, 1). But

φ(j, 1) < 1/φ(1, j),

for otherwise φ(1, j) = φ(j, 1) = 1 and P is recurrent by (1.54). Because i_n converges, a_n is eventually constant, say at a. The limit of i_n is then h_a. By (19), any extreme harmonic h is an h_a.
I will now check that h_a is harmonic. To begin with, I say

φ[1, (a, 1)] = P[1, (a, 1)] + Σ_{n≠a} P[1, (n, 1)] · φ[(n, 1), 1] · φ[1, (a, 1)].

Indeed, consider the chain starting from 1. How can it reach (a, 1)? The first move can be to (a, 1). Or the first move can be to (n, 1) with n ≠ a: the chain has then to get back to 1, and from 1 must make it to (a, 1). This argument can be rigorized using (1.16, 1.15, 1.22). Divide the equality by φ[1, (a, 1)]. Next, I say

φ[1, (a, b + 1)] = φ[1, (a, b)] · P[(a, b), (a, b + 1)]
+ φ[1, (a, b)] · P[(a, b), 1] · φ[1, (a, b + 1)].

Indeed, a chain reaches (a, b + 1) from 1 by first hitting (a, b). It then makes (a, b + 1) on one move, or returns to 1 and must try again. Rearranging,

h_a(a, b) = P[(a, b), (a, b + 1)] · h_a(a, b + 1) + P[(a, b), 1] · h_a(1)
= Σ_j P[(a, b), j] · h_a(j).

Finally, let n ≠ a:

φ[(n, m), 1] = P[(n, m), (n, m + 1)] · φ[(n, m + 1), 1] + P[(n, m), 1];

so

h_a(n, m) = Σ_j P[(n, m), j] · h_a(j).
I will now check that h_a is extreme. Abbreviate

π_a = P^{h_a}_p.

As (33) implies, π_a-almost all sample sequences reach (a, b). Therefore, with π_a-probability 1 the first coordinate of ξ_n is a for infinitely many n. But ξ_n converges with π_a-probability 1 by (30). So π_a{ξ_n → h_a} = 1, and h_a is extreme by (30). *
(36) Example. There are countably many extreme harmonic h. There is a sequence i_n in I which converges to an extreme excessive h which is not harmonic. This example is obtained by modifying (34) as follows: The state space consists of 1 and all pairs (n, m) of positive integers. The reference probability concentrates on 1. The transitions are constrained as in (34). The new convergence is (n, m) with n → ∞ and m free. The limit is h_∞, where

h_∞(j) = φ(j, 1) = G(j, 1)/G(1, 1).

Use (21) to see h_∞ is not harmonic. The rest of the argument is like (34). *
(37) Example. There are c extreme harmonic h. The state space I consists of all finite sequences of 0's and 1's, including the empty sequence ∅. The reference probability concentrates on ∅. The transition probabilities P are subject to the following conditions, as in Figure 2:
From any state other than ∅, it is possible to jump to ∅. This transition is not shown.
Figure 2.
P is transient;
0 < P(∅, 0) < 1 and P(∅, 1) = 1 − P(∅, 0);
for each j ≠ ∅ in I, the three numbers P(j, j0) and P(j, j1) and P(j, ∅) are all positive and sum to 1.

For each infinite sequence s of 0's and 1's, let h_s be this function on I:

h_s(j) = 1/φ(∅, j) if s extends j;
h_s(j) = φ(j, ∅) otherwise.
Then {h_s} are the extreme harmonic h. The argument is like (34): a sequence i_n in I converges iff the length of i_n tends to ∞, and the mth component of i_n is eventually constant, say at s_m, for each m. Then i_n → h_s.

Now suppose that P(j, j0) = P(j, j1) and depends only on the length of j, for all j. I claim each h_s is unbounded. Indeed, suppose for a moment that j has length N. Let θ(j) be the P_∅-probability that {ξ_n} visits j before any other k of length N. By symmetry, θ(j) = 2^{−N}. If ξ_n visits j after visiting some k other than j of length N, there is a return from k to ∅, except for miracles. Thus,

φ(∅, j) = θ(j) + δ(j),

where δ(j) is at most P_∅(A_N), and

A_N = {ξ returns to ∅ after first having length N}.

But lim_{N→∞} P_∅(A_N) = 0, because ξ_n visits ∅ infinitely often on ∩_N A_N. Consequently,

lim_{N→∞} φ(∅, j) = 0.

However, there are many bounded, harmonic, nonextreme h; here is an example: h(i) is twice the P_i-probability that the first coordinate of ξ_n is 0 for all large n. *
Figure 3.
(41) Example. The random walk in space-time. The state space I consists of pairs (n, m) of integers with 0 ≤ m ≤ n. The reference probability concentrates on (0, 0). Let 0 < p < 1. The situation with p = 0 or 1 is easier. The transition probabilities P are given by

P[(n, m), (n + 1, m + 1)] = p and P[(n, m), (n + 1, m)] = 1 − p.

You should check that G[(a, b), (n, m)] = 0 unless n ≥ a, and m ≥ b, and n − a ≥ m − b, in which case

G[(a, b), (n, m)] = (n − a choose m − b) p^{m−b} (1 − p)^{(n−a)−(m−b)}.

For 0 < q < 1, the function

h_q(a, b) = (q/p)^b ((1 − q)/(1 − p))^{a−b} ∝ q^b (1 − q)^{a−b}.
Now, P^{h_q} is again a random walk in space-time, with q replacing p, so that h_q is extreme harmonic by (25) and Hewitt–Savage (1.122). *
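The harmonicity of h_q can be spot-checked exactly. The sketch below (not from the text; p = 1/3 and q = 3/4 are arbitrary) verifies the mean-value equation h_q(n, m) = p·h_q(n+1, m+1) + (1−p)·h_q(n+1, m):

```python
from fractions import Fraction

p, q = Fraction(1, 3), Fraction(3, 4)

def h_q(n, m):
    """h_q(n, m) = (q/p)^m ((1-q)/(1-p))^(n-m) for the space-time walk."""
    return (q / p) ** m * ((1 - q) / (1 - p)) ** (n - m)

# harmonic: h_q(n, m) = p h_q(n+1, m+1) + (1-p) h_q(n+1, m), exactly
for n in range(6):
    for m in range(n + 1):
        assert h_q(n, m) == p * h_q(n + 1, m + 1) + (1 - p) * h_q(n + 1, m)
print("h_q is space-time harmonic for p =", p, "and q =", q)
```

The identity reduces to q + (1 − q) = 1 after factoring out h_q(n, m), so it holds for any 0 < q < 1.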
The next example, the Polya urn, has been studied recently by Blackwell and Kendall (1964).
(42) Example. An urn contains u balls at time 0, of which w are white and u − w black. Assume w > 0 and u − w > 0. At time n, a ball is drawn at random, and replaced. A ball of the same color is added. Then time moves on to n + 1. Let U_n be the number of balls in the urn at time n, namely n + u. Let W_n be the number of white balls in the urn at time n. Then {(U_n, W_n) : n = 0, ...} is a Markov chain starting from (u, w), with state space I consisting of pairs (t, v) of integers having 0 < v < t. The chain has stationary transitions P, where

P[(t, v), (t + 1, v + 1)] = v/t and P[(t, v), (t + 1, v)] = 1 − v/t.
h_π(a, c) = (a − 1) (a − 2 choose c − 1) π^{c−1} (1 − π)^{a−c−1}.

This follows by (40). Moreover, P^{h_π} is again a random walk in space-time with parameter π, so h_π is extreme harmonic by (25) and Hewitt–Savage (1.122). *
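The urn dynamics can be checked against the classical beta-binomial formula for Polya urns — P{k whites added in n draws} = C(n, k)·w^{(k)}(u − w)^{(n−k)}/u^{(n)} with rising factorials, the standard exchangeability result, not a formula from this text. A sketch with arbitrary u, w, n:

```python
from fractions import Fraction
from math import comb

def white_added_distribution(u, w, n):
    """Exact law of the number of white balls added in n Polya-urn draws
    (urn starts with u balls, w of them white), by dynamic programming."""
    dist = {0: Fraction(1)}
    for step in range(n):
        t = u + step                        # balls in the urn now
        nxt = {}
        for k, pr in dist.items():
            pw = Fraction(w + k, t)         # chance the next draw is white
            nxt[k + 1] = nxt.get(k + 1, Fraction(0)) + pr * pw
            nxt[k] = nxt.get(k, Fraction(0)) + pr * (1 - pw)
        dist = nxt
    return dist

def rising(x, k):
    out = 1
    for i in range(k):
        out *= x + i
    return out

u, w, n = 5, 2, 6
dist = white_added_distribution(u, w, n)
for k in range(n + 1):
    assert dist[k] == Fraction(comb(n, k) * rising(w, k) * rising(u - w, n - k),
                               rising(u, n))
print("urn counts match the beta-binomial formula")
```

The match is exact in rational arithmetic, reflecting the exchangeability of the color sequence.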
This section is somewhat apart from the rest of the chapter, but uses similar technology. The results will be referred to in ACM. Let P be a
4.5] THE LAST VISIT TO i BEFORE THE FIRST VISIT TO J\ {i} 133
I = {1, 2, 3, 4, 5}, J = {1, 2, 3}, i = 1
Figure 4.
stochastic matrix on the countable set I; suppose I forms one recurrent class relative to P. This stands in violent contrast to Sections 1–4. The coordinate process ξ_0, ξ_1, ... on (I^∞, P_j) is Markov with starting state j and stationary transitions P. Fix J ⊂ I and i ∈ J. Let α + β be the least n with ξ_n ∈ J\{i}. Let α be the greatest n < α + β with ξ_n = i. See Figure 4. For j ∈ I, let

h(j) = P_j{ξ visits i before hitting J\{i}}.

So h(i) = 1, while h(j) = 0 for j ∈ J\{i}. Check

(43a) h(j) = Σ_{k∈I} P(j, k)h(k) for j ∈ I\J.

On the other hand,

(43b) Σ_{k∈I} P(i, k)h(k) = 1 − θ,

where

θ = P_i{ξ hits J\{i} before returning to i} > 0.

Let

H = {j : j ∈ I and h(j) > 0}.

Then i ∈ H, and H\{i} ⊂ I\J.
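When I is finite, (43a) with the boundary values h(i) = 1 and h = 0 on J\{i} is a small linear system. The sketch below (not from the text; the 5-state stochastic matrix is an arbitrary recurrent example with J = {1, 2, 3} and i = 1) solves for h and reads θ off (43b):

```python
from fractions import Fraction

F = Fraction
# a stochastic matrix on I = {1, ..., 5}, one recurrent class
# (state 1 is index 0, and so on)
P = [[F(0), F(1,4), F(1,4), F(1,4), F(1,4)],
     [F(1,2), F(0), F(0), F(1,2), F(0)],
     [F(1,2), F(0), F(0), F(0), F(1,2)],
     [F(1,3), F(1,3), F(0), F(0), F(1,3)],
     [F(1,3), F(0), F(1,3), F(1,3), F(0)]]
assert all(sum(row) == 1 for row in P)

# h(j) = P_j{ visit state 1 before hitting {2, 3} }; indices 3, 4 are I \ J
h = {0: F(1), 1: F(0), 2: F(0)}        # boundary values on J
# (43a) on I \ J gives a 2x2 linear system; solve it by Cramer's rule
a11, a12 = 1 - P[3][3], -P[3][4]
a21, a22 = -P[4][3], 1 - P[4][4]
b1, b2 = P[3][0], P[4][0]
det = a11 * a22 - a12 * a21
h[3] = (b1 * a22 - a12 * b2) / det
h[4] = (a11 * b2 - b1 * a21) / det

for j in (3, 4):                        # check (43a)
    assert h[j] == sum(P[j][k] * h[k] for k in range(5))
theta = 1 - sum(P[0][k] * h[k] for k in range(5))    # read theta off (43b)
assert 0 < theta < 1
print(h[3], h[4], theta)
```

For this matrix h(4) = h(5) = 1/2 and θ = 3/4, exactly.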
To check (e), stop ξ when it first enters J\{i}, and reverse time. That is, look at ζ_n = ξ_{α+β−n} for 0 ≤ n ≤ α + β. Then ζ is a Markov chain with stationary substochastic transitions, as in (28). Now β = (α + β) − α is the least n with ζ_n = i, and is Markov for ζ. Use strong Markov (1.21) on the process ζ and the time β. This proof of (e) was suggested by Aryeh Dvoretzky.

You can also derive (e) from blocks (1.31) and lemma (48) below. Use the successive i-blocks for X_1, X_2, .... Let V be the set of i-blocks free of J\{i}. Now L is the least n with X_n ∉ V. Check that {ξ_n : 0 ≤ n ≤ α} is measurable on (X_1, ..., X_{L−1}), while {ξ_{α+n} : 0 ≤ n ≤ β} is measurable on X_L. But X_L is independent of (X_1, ..., X_{L−1}).
I will now argue (a) from (e) and strong Markov. Abbreviate

X = {ξ_n : 0 ≤ n ≤ α}
Y = {ξ_{α+n} : 0 ≤ n ≤ β}
Y* = ξ_{α+β}
Z = {ξ_{α+β+n} : 0 ≤ n}.
Let ζ(ν) be the post-α_ν process. Check that α_ν is Markov, B_ν is in the pre-α_ν sigma field, and ζ(ν)_0 = i. Check

B_{ν+1} = B_ν ∩ {ζ(ν) ∈ B_2} for ν ≥ 1.

By definition,

P_i{B_0} = P_i{B_1} = 1 and P_i{B_2} = 1 − θ.

By strong Markov (1.22) and induction,

(45a) P_i{B_ν} = (1 − θ)^{ν−1}.

By Markov (1.15),

(45b)

Check that

{B_ν and ζ(ν) ∈ C} for ν = 1, 2, ...

are pairwise disjoint and their union is A. So

P_i{A} = Σ_{ν=1}^∞ P_i{B_ν and ζ(ν) ∈ C}
= Σ_{ν=1}^∞ P_i{B_ν} · P_i{C} by strong Markov (1.22)
= Σ_{ν=1}^∞ (1 − θ)^{ν−1} · P_i{C} by (45a)
= θ^{−1} P_i{C} by (45b). *
= 0 if i_n ∉ H. *

Suppose θ < 1. Define a new matrix M̄ on H as follows:

M̄(j, k) = M(j, k) for j ≠ i;
M̄(i, k) = (1/(1 − θ)) M(i, k).

This M̄ is stochastic. Let λ be the least n > 0 with ξ_n = i. Let D be the conditional P_i-distribution of {ξ_n : 0 ≤ n < λ}, given that ξ_n ∈ J\{i} for no n < λ. Define a new process {η_n : 0 ≤ n} with state space I, starting from i, visiting i an infinite number of times, such that the i-blocks of η are independent and have common distribution D.

(46) Theorem. η is Markov with stationary transitions M̄.
PROOF. Let i_0 = i. Let i_1, ..., i_n be in H. Let

A = {ξ_m = i_m for 0 ≤ m ≤ n}
B = {ξ returns to i before visiting J\{i}}.

The D-probability of starting off (i_0, i_1, ..., i_n) is

P_i{A | B} = P_i{A ∩ B}/P_i{B}
= (1/(1 − θ)) [Π_{m=0}^{n−1} P(i_m, i_{m+1})] h(i_n),

by Markov (1.15). The D-probability of starting off (i_0, i_1, ..., i_n), and terminating at time n, is

P_i{A and ξ_{n+1} = i | B} = P_i{A and ξ_{n+1} = i}/P_i{B}
= (1/(1 − θ)) [Π_{m=0}^{n−1} P(i_m, i_{m+1})] P(i_n, i)h(i).
PROOF. Use (48) below. Let X_n be the nth i-block in ξ. Let V be the set of i-blocks free of J\{i}. So L = T, and the two θ's coincide. *
To state (48), let X₁, X₂, ... be a sequence of independent and identically
distributed random objects. Let V be a measurable set of values, such that
X₁ ∉ V has positive probability θ less than 1. Let L be the least n with
X_n ∉ V. Next, let T, Z, Y₁, Y₂, ... be independent random objects. Suppose
T is n with probability θ(1 − θ)^{n−1} for n = 1, 2, .... Suppose the distribution
of Z coincides with the conditional distribution of X₁ given X₁ ∉ V.
Suppose the distribution of Y_n coincides with the conditional distribution of
X₁ given X₁ ∈ V for n = 1, 2, ....
(48) Lemma. (X₁, ..., X_{L−1}, X_L, L) is distributed like (Y₁, ..., Y_{T−1}, Z, T).
PROOF. Let n be a positive integer. Let A₁, ..., A_{n−1} be measurable
subsets of V. Let B be a measurable set disjoint from V. Then
P{X_m ∈ A_m for m < n and X_n ∈ B} = [Π_{m=1}^{n−1} P{X₁ ∈ A_m}] · P{X₁ ∈ B}
= θ(1 − θ)^{n−1} [Π_{m=1}^{n−1} P{Y_m ∈ A_m}] · P{Z ∈ B}
= P{Y_m ∈ A_m for m < n, Z ∈ B, and T = n};
and the event on the left is {X_m ∈ A_m for m < n, X_L ∈ B, and L = n}.
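A quick numerical sanity check of (48) is easy to run. The sketch below (Python) uses my own toy choices, not taken from the text: X_n uniform on {0, ..., 4} and V = {0, 1, 2, 3}, so θ = P{X₁ ∉ V} = 0.2. In particular L should be distributed like T, geometric with success probability θ, so E[L] = 1/θ = 5.

```python
import random

random.seed(0)

def sample_L():
    """Draw X_1, X_2, ... uniform on {0,...,4}; return the least n with X_n outside V."""
    n = 0
    while True:
        n += 1
        if random.randrange(5) == 4:  # X_n = 4, the only value outside V
            return n

N = 100_000
mean_L = sum(sample_L() for _ in range(N)) / N

# L is geometric with success probability theta = 0.2, like T in (48).
assert abs(mean_L - 5.0) < 0.1
```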
INTRODUCTION TO
CONTINUOUS TIME
I want to thank Richard Olshen for checking the final draft of this chapter.
5.1 ] SEMIGROUPS AND PROCESSES 139
proof is to complete the σ-field under 𝒫 and modify each X(t) on a set of 𝒫-measure
0, so the resulting process X* is separable; this uses an uncountable axiom of
choice. Then, you prove that for almost all ω, the function t → X*(t, ω) for
rational t is a step function. As a function of real t, it is separable, and therefore
a step function. The method here is to construct one process which not
only has the required distribution, but also has step functions for all its
sample functions.
Most results in this chapter are standard. References are usually given for
proofs appropriated from others, but not for results. This section concludes
with lemma (4), which will be used in most of the constructions in the rest of
the book. Section 2 establishes the basic analytic properties of standard
stochastic semigroups; these results will also be used many times. Sections
3 and 4 are independent of Section 2, and cover a special topic: the analytic
properties of uniformly continuous semigroups. The results in this case are
simpler and more complete. Given a uniformly continuous stochastic
semigroup P, Section 7 constructs a Markov chain with stationary transitions
P, all of whose sample functions are step functions. This construction depends
on the construction in Section 6 of the general Markov chain whose sample
functions are step functions up to the first bad discontinuity, and are then
constant. Section 5 contains preliminary material on the exponential distri-
bution. I will refer to Section 5 repeatedly, and Section 6 occasionally, in
later constructions.
Finite I
Here is a summary of the results for finite I. Let P be a standard stochastic
semigroup on I. Then
P′(0) = Q
exists and is finite; Q(i, i) ≤ 0; while Q(i, j) ≥ 0 for i ≠ j; and
Σ_j Q(i, j) = 0.
Any finite matrix Q which satisfies these three conditions is the derivative at 0
of some standard stochastic P. Furthermore, Q determines P; in fact,
P(t) = e^{Qt}.
Fix a standard stochastic P, and let Q = P′(0). Let
q(i) = −Q(i, i).
Let
Γ(i, j) = Q(i, j)/q(i) when i ≠ j and q(i) > 0
= 0 otherwise.
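For finite I these facts are easy to check numerically. Here is a minimal sketch (Python; the 3-state generator Q is an illustrative choice of mine, not from the text): P(t) = e^{Qt} is stochastic for every t ≥ 0, satisfies the semigroup law, and has derivative Q at 0.

```python
import numpy as np
from scipy.linalg import expm

# An illustrative generator: Q(i,i) <= 0, Q(i,j) >= 0 for i != j, rows sum to 0.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 0.3, -0.3, 0.0],
              [ 1.0, 2.0, -3.0]])

def P(t):
    """The standard stochastic semigroup with generator Q: P(t) = exp(Qt)."""
    return expm(Q * t)

# P(t) is a stochastic matrix for each t >= 0.
for t in (0.1, 1.0, 5.0):
    Pt = P(t)
    assert np.all(Pt >= -1e-12)
    assert np.allclose(Pt.sum(axis=1), 1.0)

# Semigroup law: P(s + t) = P(s) P(t).
assert np.allclose(P(0.7 + 0.4), P(0.7) @ P(0.4))

# P'(0) = Q, checked by a difference quotient.
h = 1e-6
assert np.allclose((P(h) - np.eye(3)) / h, Q, atol=1e-4)
```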
ability triple (Ω_i, ℱ_i, P_i). Let ℱ_i(t) be the σ-field in Ω_i spanned by X_i(s) for
0 ≤ s ≤ t. Suppose
(3a) P_i{X_i(0) = i} = 1 for all i
(3b) P_i{X_i(t) ∈ I} = 1 for all t ≥ 0 and all i
(3c) P_i{A and X_i(t + s) = k} = P_i{A} · R(s, j, k) for all i, j, k in I,
all nonnegative s and t, and all A ∈ ℱ_i(t) with
A ⊂ {X_i(0) = i and X_i(t) = j}.
(4) Lemma. Suppose conditions (3). Then R is a stochastic semigroup.
Relative to P_i, the process X_i is Markov with stationary transitions R and
starting state i.
PROOF. In (3c), put t = 0 and j = i, A = {X_i(0) = i}, and use (3a):
(5) R(s, i, k) = P_i{X_i(s) = k} ≥ 0.
Sum out k ∈ I and use (3b):
Σ_{k∈I} R(s, i, k) = P_i{X_i(s) ∈ I} = 1.
So R(s) is a stochastic matrix, taking care of (1a).
In (3c), put A = {X_i(0) = i and X_i(t) = j}, and use (3a, 5):
R(t, i, j) · R(s, j, k) = P_i{X_i(t) = j and X_i(t + s) = k}.
Sum out j ∈ I and use (3b, 5):
Σ_{j∈I} R(t, i, j) · R(s, j, k) = P_i{X_i(t) ∈ I and X_i(t + s) = k}
= P_i{X_i(t + s) = k}
= R(t + s, i, k).
So R satisfies (1b). Condition (1c) is taken care of by (3a) and (5). This
makes R a stochastic semigroup.
Let i₀, ..., i_n, i_{n+1} ∈ I. Let 0 = t₀ < ··· < t_n < t_{n+1} < ∞. In (3c), put
i = i₀, j = i_n, k = i_{n+1}, t = t_n, and s = t_{n+1} − t_n. Put
A = {X_{i₀}(t_m) = i_m for m = 0, ..., n}.
Then
P_{i₀}{A and X_{i₀}(t_{n+1}) = i_{n+1}} = P_{i₀}{A} · R(t_{n+1} − t_n, i_n, i_{n+1})
= Π_{m=0}^{n} R(t_{m+1} − t_m, i_m, i_{m+1})
by induction: the case n = 0 is (5). This and (3a) make X_i Markov with
stationary transitions R and starting state i, relative to P_i. *
2. ANALYTIC PROPERTIES
*
Claim (b). Claim (a) shows that P(s) is stochastic for 0 ≤ s ≤ t. Now
P(u) = P(u/n)ⁿ is visibly stochastic when u/n ≤ t.
NOTE. Fix i ∈ I. If Σ_{j∈I} P(t, i, j) = 1 for some t > 0, then equality holds
for all t. This harder fact follows from Lévy's dichotomy (ACM, 2.8).
*
But 0 ≤ P(t, k, j) ≤ 1 and Σ_{k≠i} P(s, i, k) ≤ 1 − P(s, i, i).
(10) Lemma. P′(0, i, i) exists and is nonpositive.
PROOF. Let f(t) = −log P(t, i, i). Then 0 ≤ f(t) < ∞ for all t > 0 by
(7), and f(0) = 0, and f is subadditive, and f is continuous by (9). Let
q = sup {t⁻¹f(t): t > 0} ≤ ∞.
If q = 0, then f = 0 and P(t, i, i) = 1. So assume q > 0.
Fix a with 0 ≤ a < q. Fix t > 0 so that t⁻¹f(t) ≥ a. Think of s as small
and positive. Of course, t = ns + δ for a unique n = 0, 1, ... and δ with
0 ≤ δ < s; both n and δ depend on s. So,
a ≤ t⁻¹f(t) ≤ t⁻¹[nf(s) + f(δ)] = (t⁻¹ns) s⁻¹f(s) + t⁻¹f(δ).
Let s → 0. Then t⁻¹ns → 1 and δ → 0, so
a ≤ lim inf_{s→0} s⁻¹f(s).
This proves:
(13) If sup_i q(i) < ∞, then lim_{t→0} P(t, i, i) = 1 uniformly in i.
The converse of (13) is also true: see (29).
(14) Proposition. Fix i ∈ I, with q(i) < ∞. Fix j ≠ i. Then P′(0, i, j) = Q(i, j)
exists and is finite. Moreover, Σ_{j∈I} Q(i, j) ≤ 0.
PROOF. I say
Let Q(i, j) = lim sup_{δ→0} δ⁻¹P(δ, i, j). Let δ → 0 in such a way that
δ⁻¹P(δ, i, j) → Q(i, j), and let n → ∞ in such a way that nδ → s < t. From
(16),
s⁻¹P(s, i, j) ≥ (1 − ε)²Q(i, j),
so lim inf_{s→0} s⁻¹P(s, i, j) ≥ Q(i, j). This proves the first claim.
For the second claim, rearrange Σ_j P(t, i, j) ≤ 1 to get
where
A = {ξ_m = i but ξ_μ ≠ j for 0 < μ < m}
B_v = {ξ_m = i and ξ_v = j but ξ_μ ≠ j for 0 < μ < v}.
Since Σ_v f(v) ≤ 1, relation (19) shows
(20) g(m) ≥ P(mδ, i, i) − max {P(s, j, i): 0 ≤ s ≤ mδ}.
Fix ε > 0. Find t = t(i, j, ε) > 0 so small that P(s, i, i) > 1 − ε and
P(s, j, j) > 1 − ε for 0 ≤ s ≤ t; then P(s, j, i) < ε. If nδ < t, and m ≤ n,
then g(m) ≥ 1 − 2ε by (20). Combine this with (18):
P(nδ, i, j) ≥ (1 − ε)(1 − 2ε) n P(δ, i, j).
Complete the argument as in (14).
The main result, due to Doob (1942) and Kolmogorov (1951), is
*
(21) Theorem. If P is a standard substochastic semigroup on the finite or
countably infinite set I, then P′(0, i, j) = Q(i, j) exists for all i, j and is finite
for i ≠ j.
PROOF. Use (10) and (17). *
The matrix Q = P′(0) will be called the generator of P.
WARNING. Q is not the infinitesimal generator of P, and in fact does
not determine P. For examples, see Sections 6.3 and 6.5.
The following theorem, due to Ornstein (1960) and Chung (1960, p. 269), will
not be used later, but is stated for its interest. A special case will be proved
later, in Section 7.6.
(22) Theorem. P(t, i, j) is continuously differentiable on (0, ∞). For i ≠ j, or
i = j and q(i) < ∞, the derivative is continuous at 0. For 0 < t < ∞,
Σ_j |P′(t, i, j)| < ∞ and Σ_j P′(t, i, j) = 0.
For positive finite s and t,
P′(s + t) = P′(s)P(t) = P(s)P′(t).
An example of Smith (1964) shows that P′(t, i, i) may oscillate as t → 0
if q(i) = ∞. A similar example is presented in Section 3.6 of ACM.
NOTE. Let Q be a matrix on I. When does there exist a standard stochastic
or substochastic semigroup P with P′(0) = Q? This is one of the most
interesting open questions in the Markov business. It is particularly intriguing
when Q is allowed to take the value −∞ on the diagonal. For some recent
work on this question, see (Williams, 1967); one of his results is discussed in
Section 3.5 of ACM. [This question was settled by Williams (1976), The
Q-matrix problem, Séminaire de Probabilités X, Lecture Notes in Mathematics
511, Springer, Berlin.]
3. UNIFORM SEMIGROUPS
‖A + B‖ ≤ ‖A‖ + ‖B‖, ‖AB‖ ≤ ‖A‖ · ‖B‖, ‖αA‖ = |α| · ‖A‖,
A ≠ 0 implies ‖A‖ > 0, and ‖Δ‖ = 1. The partial sums of the exponential
series satisfy
‖Aⁿ/n! + ··· + A^{n+m}/(n + m)!‖ ≤ ‖A‖ⁿ/n! + ··· + ‖A‖^{n+m}/(n + m)!,
so the series Σ Aⁿ/n! converges (completeness) to an element of 𝒜. The function
A → e^A is uniformly continuous on {A: ‖A‖ ≤ K < ∞}. If A and B commute, that is
AB = BA, then
e^{A+B} = e^A e^B.
If f is a continuous function from [0, ∞) to 𝒜, and 0 ≤ a < b < ∞, then
∫_a^b f(t) dt is defined as the limit of the usual Riemann sums. The old arguments
show that f is uniformly continuous on [a, b], so the limit exists and depends
linearly on f. Moreover,
‖∫_a^b f(t) dt‖ ≤ ∫_a^b ‖f(t)‖ dt.
Let A(s) = ∫_0^s P(v) dv, an element of 𝒜. Then
P(u)A(s) = ∫_0^s P(u + v) dv
= ∫_u^{u+s} P(v) dv
= A(u + s) − A(u).
5.3] UNIFORM SEMIGROUPS 149
So
[P(u) − Δ]A(s) = A(u + s) − A(u) − A(s)
= [P(s) − Δ]A(u)
= A(u)[P(s) − Δ].
Multiply on the right by A(s)⁻¹:
P(u) = Δ + A(u)Q.
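The identity P(u) = Δ + A(u)Q can be verified numerically for a concrete uniform semigroup. In this sketch (Python; the 2-state generator is an illustrative choice of mine, not from the text), A(u) = ∫₀ᵘ P(v) dv is computed by the trapezoid rule and Q is recovered as [P(s) − Δ]A(s)⁻¹.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative generator; P(t) = exp(Qt) is a uniform stochastic semigroup.
Qtrue = np.array([[-1.0, 1.0],
                  [ 2.0, -2.0]])

def P(t):
    return expm(Qtrue * t)

def A(u, steps=2000):
    """A(u) = integral of P(v) dv over [0, u], by the trapezoid rule."""
    h = u / steps
    total = np.zeros((2, 2))
    prev = P(0.0)
    for k in range(1, steps + 1):
        cur = P(k * h)
        total += 0.5 * h * (prev + cur)
        prev = cur
    return total

s = 0.5
Delta = np.eye(2)                                # Delta = the identity matrix
Q = np.linalg.solve(A(s).T, (P(s) - Delta).T).T  # Q = [P(s) - Delta] A(s)^{-1}

# The display: P(u) = Delta + A(u) Q, at several u.
for u in (0.2, 1.0, 2.0):
    assert np.allclose(P(u), Delta + A(u) @ Q, atol=1e-4)

# The recovered Q is the generator.
assert np.allclose(Q, Qtrue, atol=1e-3)
```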
By induction,
P(t) = Δ + tQ + (t²/2!)Q² + ··· + (tⁿ/n!)Qⁿ + R_n,
where
R_n = (1/n!)[∫_0^t (t − u)ⁿ P(u) du] Q^{n+1}.
As t ↓ 0,
[P(t, i, i) − 1]/t → Q(i, i) ≤ 0 and P(t, i, j)/t → Q(i, j) ≥ 0.
By Fatou:
(28d) Σ_j Q(i, j) ≤ 0.
∫_{A and S≤t} e^{−q(t+u−S)} d𝒫 = e^{−qu} ∫_{A and S≤t} e^{−q(t−S)} d𝒫
The first and last equalities come from Fubini (10.21). I will set the first one
up for you. Use (Ω, ℱ, 𝒫) for the basic triple. Let X₁(ω) = ω and let
(Ω₁, ℱ₁) = (Ω, ℱ). Let X₂ = T; let (Ω₂, ℱ₂) be the Borel half-line [0, ∞).
Let
f(ω, y) = 1 if ω ∈ A and S(ω) ≤ t ≤ t + u < S(ω) + y
= 0 otherwise.
Then f is ℱ₁ × ℱ₂-measurable. And f(ω, ·) ≡ 0 unless ω ∈ A and S(ω) ≤ t.
If ω ∈ A and S(ω) ≤ t, then
*
PROOF. Integrate and expand.
Let T and η be independent random variables on (Ω, ℱ, 𝒫).
(32) Lemma. (a) If T has a density bounded above by B, so does T + η.
(b) If T has a continuous density bounded above by B, so does T + η.
PROOF. Claim (a) is easy. Use (a) and dominated convergence for (b). *
NOTE. If T is e(q), its density is bounded above by q, but is discontinuous
at 0, as a function on the real line.
To avoid exceptional cases, I will sometimes write e(0) for the distribution
concentrated at ∞, and e(∞) for the distribution concentrated at 0. Then
e(q) has mean 1/q, where 0 and ∞ are reciprocals. For lemmas (33) and (34),
let T_m be independent e(q_m), for nonnegative integer m, on (Ω, ℱ, 𝒫),
with 0 ≤ q_m < ∞. Let ∞ + x = ∞ for x > −∞. Write E for 𝒫-expectation.
(33) Lemma. Let M be a finite or infinite subset of the nonnegative integers.
Then Σ_{m∈M} T_m < ∞ a.e. if Σ_{m∈M} 1/q_m < ∞, and Σ_{m∈M} T_m = ∞ a.e. if
Σ_{m∈M} 1/q_m = ∞.
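Lemma (33) is easy to watch in simulation. The sketch below (Python; the rate sequences q_m = 2^m and q_m ≡ 1 are my illustrative choices) contrasts the two cases: when Σ 1/q_m < ∞ the sums of the T_m settle down (here the full sum has mean 2), and when Σ 1/q_m = ∞ the partial sums grow without bound.

```python
import random

random.seed(1)

def partial_sum(q, N):
    """Sum of independent exponential T_m with parameter q(m), m = 0, ..., N-1."""
    return sum(random.expovariate(q(m)) for m in range(N))

# Convergent case: q_m = 2^m, so sum 1/q_m = 2 < infinity and sum T_m < infinity a.e.
conv = [partial_sum(lambda m: 2.0 ** m, 60) for _ in range(20_000)]
assert abs(sum(conv) / len(conv) - 2.0) < 0.05   # E[sum T_m] = sum 1/q_m = 2

# Divergent case: q_m = 1, so sum 1/q_m = infinity and sum T_m = infinity a.e.;
# the first 500 terms already have mean 500.
div = [partial_sum(lambda m: 1.0, 500) for _ in range(200)]
assert min(div) > 100.0
```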
PROOF. Lemma (32b) reduces M to {0, 1}. Then, the density of T₀ + T₁ at t ≥ 0 is
∫_0^t q₀e^{−q₀s} q₁e^{−q₁(t−s)} ds.
6. THE STEP FUNCTION CASE
In this section, I will construct the general Markov chain whose sample
functions are initially step functions, and are constant after the first bad
discontinuity. The generality follows from (7.33). This construction will be
used in Section 7 and in Chapters 6-8, so I really want you to go through it,
even though the first reading may be difficult. I hope you will eventually
feel that the argument is really simple.
Let I be a finite or countably infinite set. Give I the discrete topology. Let
0 < a ≤ ∞. Let f be a function from [0, a) to I. Then f is an I-valued right
continuous step function on [0, a) provided
f(t) = lim {f(s): s ↓ t} for 0 ≤ t < a
f(t−) = lim {f(s): s ↑ t} exists in I for 0 < t < a.
Let Γ be a substochastic matrix on I, with
Γ(i, i) = 0 for all i.
Let q be a function from I to [0, ∞).
Formalities resumed
Fix ∂ ∉ I. Extend Γ to Ī = I ∪ {∂} by setting
Γ̄(i, j) = Γ(i, j) for i, j ∈ I; Γ̄(i, ∂) = 1 − Σ_{j∈I} Γ(i, j); Γ̄(∂, ∂) = 1.
Thus, Γ̄ is stochastic on Ī. Extend q to Ī by setting q(∂) = 0. Define a matrix
Q on Ī as follows:
Q(i, i) = −q(i) for i ∈ Ī
Q(i, j) = q(i)Γ̄(i, j) for i ≠ j in Ī.
Introduce the set 𝒳 of pairs x = (w, w̄), where w is a sequence of elements
of Ī, and w̄ is a sequence of elements of (0, ∞].
Let
ξ_n(w, w̄) = w(n) ∈ Ī and τ_n(w, w̄) = w̄(n) ∈ (0, ∞]
for n = 0, 1, .... Give 𝒳 the product σ-field, namely the smallest σ-field
over which ξ₀, ξ₁, ... and τ₀, τ₁, ... are measurable. Of course, 𝒳 is Borel.
INFORMAL NOTE. The process X begins by visiting ξ₀, ξ₁, ... with holding
times τ₀, τ₁, ....
For each i ∈ Ī, let π_i be the unique probability on 𝒳 for which, semi-formally:
(a) ξ₀, ξ₁, ... is a discrete time Markov chain with stationary stochastic
transitions Γ̄ on Ī and starting state i;
(b) given ξ₀, ξ₁, ..., the random variables τ₀, τ₁, ... are conditionally
independent and exponentially distributed, with parameters q(ξ₀),
q(ξ₁), ....
INFORMAL NOTE. π_i makes X a Markov chain with starting state i and
generator Q.
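The semi-formal description translates directly into a simulation: run the discrete chain with transitions Γ̄, holding in each state for an independent exponential time. The sketch below (Python; the stochastic jump matrix, the rates, and the time t are my toy choices, not from the text) compares the empirical law of X(t) started from 0 with the matrix exponential of the generator Q of the informal note, which is the semigroup when I is finite.

```python
import random
import numpy as np
from scipy.linalg import expm

random.seed(2)

# Toy jump matrix (stochastic, zero diagonal) and holding-time rates.
Gamma = [[0.0, 0.7, 0.3],
         [0.5, 0.0, 0.5],
         [1.0, 0.0, 0.0]]
q = [1.0, 2.0, 3.0]

def X_at(t, i):
    """Visit xi_0, xi_1, ... with exponential holding times; return the state at time t."""
    clock, state = 0.0, i
    while True:
        clock += random.expovariate(q[state])
        if clock > t:
            return state
        state = random.choices(range(3), weights=Gamma[state])[0]

# The generator of the informal note: Q(i,i) = -q(i), Q(i,j) = q(i) Gamma(i,j).
Q = np.array([[-q[i] if j == i else q[i] * Gamma[i][j] for j in range(3)]
              for i in range(3)])

t, N = 0.8, 40_000
counts = [0, 0, 0]
for _ in range(N):
    counts[X_at(t, 0)] += 1
empirical = np.array(counts) / N

# The law of X(t) under pi_0 is row 0 of exp(Qt).
assert np.allclose(empirical, expm(Q * t)[0], atol=0.02)
```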
More rigorously: introduce W, the space of sequences w̄ of elements of
(0, ∞]. Give W the product σ-field. For each function r from {0, 1, ...} to
[0, ∞), let η_r be the probability on W making the coordinates independent
and exponentially distributed, the nth coordinate having parameter r(n).
For w ∈ Ī^∞, the set of Ī-sequences, let q(w) be the function on {0, 1, ...}
whose value at n is q(w(n)). Think of ξ = (ξ₀, ξ₁, ...) as projecting 𝒳 onto
Ī^∞, and τ = (τ₀, τ₁, ...) as projecting 𝒳 onto W. Then π_i, a probability on
𝒳, is uniquely defined by the two requirements:
(a) π_i ξ⁻¹ = Γ̄_i;
(b) π_i{A} = ∫ η_{q(w)}{A(w)} Γ̄_i(dw);
where:
(a) Γ̄_i was defined in Section 1.3 as the probability on Ī^∞ making the
coordinate process a Γ̄-chain starting from i;
(b) A is a product measurable subset of 𝒳;
(c) A(w) = {w̄: w̄ ∈ W and (w, w̄) ∈ A} is the w-section of A.
There is a more elementary characterization of π_i, which is also useful:
π_i is the unique probability on 𝒳 for which
(36) π_i{ξ₀ = i₀, ..., ξ_n = i_n and τ₀ > t₀, ..., τ_n > t_n} = p e^{−t},
where
p = Π_{m=0}^{n−1} Γ̄(i_m, i_{m+1}) and t = Σ_{m=0}^{n} q(i_m)t_m.

Figure 1. A sample function with N(t) = 1 and N(t + s) = 4. The shift T
carries the path beyond time t back to the origin: ξ₁ = ξ₀ ∘ T, ξ₃ = ξ₂ ∘ T,
ξ₄ = ξ₃ ∘ T, τ₂ = τ₁ ∘ T, τ₃ = τ₂ ∘ T, and τ₀ ∘ T is what remains of the
holding interval straddling t.
Figure 2. The three cases for a sample function: d < ∞, so σ = ∞ (the path
ends in an absorbing state in I); d = ∞ and σ = ∞; d = ∞ and σ < ∞.
5.6] THE STEP FUNCTION CASE 159
τ_n = ∞ iff ξ_n is absorbing;
Discussion
The argument is about the same, although it's harder to compute the
derivative.
Theorems
(38) Theorem. Let i ∈ I. Then X has I-valued sample functions, which are
right continuous step functions on [0, ∞), with π_i-probability 1 iff
Σ_n {1/q(ξ_n): 0 ≤ n < d} = ∞
with π_i-probability 1. More precisely, let σ be the least t if any with X(t) = ∂,
and σ = ∞ if none. Then A = {σ = ∞} differs by a π_i-null set from the set B
where
Σ_n {1/q(ξ_n): 0 ≤ n < d} = ∞.
*
SHORT PROOF. Condition on ξ and use (33).
Temporarily, let d(w) be the least n if any with w(n) = ∂, and d(w) = ∞ if
none. Let B be the set of w ∈ Ī^∞ such that Σ_n {1/q(w(n)): 0 ≤ n < d(w)} = ∞.
Then
π_i{A Δ B} = ∫_{w∉B} η_{q(w)}{A(w)} Γ̄_i(dw) + ∫_{w∈B} η_{q(w)}{W∖A(w)} Γ̄_i(dw).
Check
σ = Σ_n {τ_n: 0 ≤ n < d}.
Temporarily, let τ̄_n be the coordinate process on W. Then A(w) is the subset
of W where
Σ_n {τ̄_n: 0 ≤ n < d(w)} = ∞.
With respect to η_{q(w)}, the variables τ̄_n are independent and exponential with
parameter q(w(n)). For w ∉ B,
η_{q(w)}{A(w)} = 0
by the first assertion in (33). For w ∈ B,
η_{q(w)}{W∖A(w)} = 0
by the second assertion in (33). *
(39) Theorem. With respect to π_i, the process X is Markov with stationary
transitions, say R, and starting state i. Here R is a standard stochastic semigroup
on Ī, for which ∂ is absorbing. Moreover, R′(0) = Q.
NOTE. The retract of R to I is stochastic iff for any i ∈ I, with π_i-probability
1, the sample functions of X are I-valued everywhere: you can prove
this directly. You know that an I-valued sample function is automatically a
right continuous step function.
PROOF. Let 0 ≤ t < ∞ and let 0 < s < ∞. Let ℱ(t) be the smallest
σ-field over which all the X(u) are measurable, for 0 ≤ u ≤ t. The main
thing to prove is the Markov property:
(40) π_i{A and X(t + s) = k} = π_i{A} · π_j{X(s) = k},
where
A ⊂ {X(0) = i and X(t) = j} and A ∈ ℱ(t) and i, j, k are in Ī.
First, I will argue the easy case j = ∂. Then X(t + s) = ∂, at least on
{A and X(t) = j}, while π_∂{X(s) = ∂} = 1. If k ≠ ∂, both sides of (40)
vanish. If k = ∂, both sides of (40) are π_i{A}. This settles the case j = ∂.
From now on, assume j ∈ I. Abbreviate
D_m = {X(0) = i and X(t) = j and N(t) = m}.
Let 𝒜_m be the σ-field of subsets of D_m of the form D_m ∩ A* with A* ∈ ℱ(t).
For m = 0, 1, ..., ∞, the sets D_m are pairwise disjoint, and their union is
{X(0) = i and X(t) = j}. So it is enough to prove (40) for A ∈ 𝒜_m: both sides
are countably additive in A. If i = ∂ or m = ∞, both sides of (40) vanish. So
fix i ∈ I and m < ∞ for the rest. By definition, X(t) = ξ_m on {N(t) = m},
so ξ_m = j on D_m.
Call a set A special iff A is the event that
ξ_n = i_n and τ_n > t_n for n = 0, ..., m − 1, with ξ_m = h and τ₀ + ··· + τ_{m−1} ≤ u.
If h ≠ j, then D_m ∩ A = ∅. If h = j, then D_m ∩ A is D_m intersected with an
event measurable on ξ₀, ..., ξ_m and τ₀, ..., τ_{m−1}. Either way, D_m ∩ A ∈ 𝒮_m.
This proves 𝒜_m ⊂ 𝒮_m. Two different special A are
disjoint or nested; so the class of special A, with the null set added, is closed
under intersection. The union over i₁, ..., i_{m−1} of the special A with
t₀ = ··· = t_{m−1} = 0 is D_m. And both sides of (40) are countably additive in A.
If I manage to get (40) for the special A, then (10.16) will extend (40) to 𝒜_m,
which is more than enough.
Both sides of (40) vanish if one or more of i₁, ..., i_{m−1} are equal to ∂, so
fix them all in I. For the next part of the proof, I will construct a measurable
mapping T of {N(t) = m} into 𝒳, such that
(41) X(t + s) = X(s) ∘ T on {N(t) = m}
and
(42) π_i{A ∩ T⁻¹B} = π_i{A} · π_j{B}
for all special A and all measurable subsets B of {X(0) = j}. Let
Property (41) is fairly clear: look at Figure 1, and think this way. Fix x ∈ 𝒳
with t < σ(x) and N(t, x) = m. Fix s ≥ 0. Suppose t + s < σ(x). Then
N(t + s, x) ≥ m; say N(t + s, x) = m + n. In particular,
X(t + s, x) = ξ_{m+n}(x).
Let σ₀(x) = 0 and σ_m(x) = τ₀(x) + ··· + τ_{m−1}(x) for m = 1, 2, .... Then
N(t + s, x) = m + n means
σ_{m+1}(x) + τ_{m+1}(x) + ··· + τ_{m+n−1}(x)
≤ t + s < σ_{m+1}(x) + τ_{m+1}(x) + ··· + τ_{m+n}(x);
that is,
σ_{m+1}(x) − t + τ_{m+1}(x) + ··· + τ_{m+n−1}(x)
≤ s < σ_{m+1}(x) − t + τ_{m+1}(x) + ··· + τ_{m+n}(x).
That is, π_i(A ∩ T⁻¹B) = abde^{−v−u}. But π_i(A) = ad and π_j(B) = be^{−v−u},
by (37). This completes the proof of (42) for one special A and all special B.
Clearly, (42) holds for B = ∅ and B = {X(0) = j}; call these sets special
also. Two different special B are disjoint or nested, and the class of special
B's is closed under intersection and generates the full σ-field on {X(0) = j}.
Both sides of (42) are countably additive in B, so (10.16) makes (42) hold for
all measurable B. This completes the proof of (40).
Let R(t, i, j) = π_i{X(t) = j}. Use (40), and (4) with Ī for I, to see: R is a
stochastic semigroup on Ī; while X is Markov with stationary transitions R
and starting state i relative to π_i. I still have to show that R is standard, and
R′(0) = Q. The ∂ row is easy. Fix i ∈ I. I will do the i row. I say
(44)
Suppose i is not absorbing; the other case is trivial. Let U_i and U_j be
independent and exponentially distributed, with parameters q(i) and q(j).
5.7] THE UNIFORM CASE 165
By (37),
π_i{τ₀ + τ₁ ≤ t and ξ₁ = j} = Γ̄(i, j) · P{U_i + U_j ≤ t}.
Figure 3. A right continuous step function f ∈ S, with its successive values
ξ̂₀(f), ξ̂₁(f), ... and its jump times.
Let S be the set of right continuous step functions from [0, ∞) to I. Let
X(t, f) = f(t) for t ≥ 0 and f ∈ S, and endow S with the smallest σ-field Σ
over which all X(t) are measurable. I claim that ξ̂_n and σ̂_n are Σ-measurable.
The case n = 0 is easy: ξ̂₀(f) = f(0) and σ̂₀(f) = 0. Suppose inductively
that σ̂_n is Σ-measurable. Confine f to {σ̂_n < ∞}. Then
ξ̂_n(f) = lim_{ε↓0} f[σ̂_n(f) + ε];
so ξ̂_n(f) = j iff for all m = 1, 2, ... there is a rational r with
Because {ξ̂_n, σ̂_n} span Σ, there is at most one probability P_i on Σ for which:
(a) ξ̂₀, ξ̂₁, ... is a discrete time Markov chain with stationary transitions
Γ and starting state i;
(b) given ξ̂₀, ξ̂₁, ..., the random variables τ̂_n = σ̂_{n+1} − σ̂_n are conditionally
independent and exponential with parameter q(ξ̂_n), for
n = 0, 1, ....
NOTE. If Γ is substochastic, then ξ̂_n may be undefined with positive
probability. By (29), the jth row sum of Γ is 1 or 0, according as q(j) > 0
or q(j) = 0.
The condition on P_i can be restated as follows, using (10.16): for any
nonnegative integer n, and sequence i₀ = i, i₁, ..., i_n in I, and nonnegative
real numbers t₀, ..., t_n:
P_i{ξ̂_m = i_m and τ̂_m > t_m for m = 0, ..., n} = p e^{−t},
where
p = Π_{m=0}^{n−1} Γ(i_m, i_{m+1})
t = Σ_{m=0}^{n} q(i_m)t_m.
By convention, an empty product is 1.
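The cylinder formula p e^{−t} can be checked by Monte Carlo. The sketch below (Python) again uses toy choices of mine — a three-state jump matrix, rates q, the path 0 → 1 → 2, and thresholds t_m — and compares the simulated frequency of the cylinder event with p e^{−t}.

```python
import math
import random

random.seed(3)

Gamma = [[0.0, 0.7, 0.3],
         [0.5, 0.0, 0.5],
         [1.0, 0.0, 0.0]]
q = [1.0, 2.0, 3.0]

# Cylinder event: xi_0 = 0, xi_1 = 1, xi_2 = 2 and tau_m > t_m for t = (0.5, 0.1, 0.2).
states, tmins = [0, 1, 2], [0.5, 0.1, 0.2]

# Right side: p = product of Gamma(i_m, i_{m+1}); t = sum of q(i_m) t_m.
p = Gamma[0][1] * Gamma[1][2]
t = sum(q[i] * s for i, s in zip(states, tmins))
exact = p * math.exp(-t)

def hits():
    """One run of the jump chain: does it realize the cylinder event?"""
    state = 0
    for m in range(3):
        if state != states[m] or random.expovariate(q[state]) <= tmins[m]:
            return False
        if m < 2:
            state = random.choices(range(3), weights=Gamma[state])[0]
    return True

N = 200_000
freq = sum(hits() for _ in range(N)) / N
assert abs(freq - exact) < 0.01
```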
(45) Theorem. The probability P_i exists. With respect to P_i, the process
{X(t): 0 ≤ t < ∞} is a Markov chain with stationary transitions P and
starting state i.
FIRST PROOF. Use the setup of Section 6, with the present Γ and q. Check
that the two Q's coincide on I. Fix i ∈ I. With respect to π_i, the process X
on 𝒳 is Markov with stationary standard and stochastic transitions R on Ī,
by (39); and ∂ is absorbing for R. So R is a standard substochastic semigroup
when retracted to I. And R′(0) = Q on I by (39). Furthermore,
R(t, i, i) ≥ e^{−q(i)t},
so
Σ_n {1/q(ξ_n): 0 ≤ n < d} = ∞,
whether d < ∞ or d = ∞. Let 𝒳_I be the set of x ∈ 𝒳₀ such that X(·, x) is
I-valued everywhere. Remember
π_i{𝒳₀ and ξ₀ ∈ I} = 1.
Now (38) makes π_i(𝒳_I) = 1.
If x ∈ 𝒳_I, you know that X(·, x) is a right continuous I-valued step function
on [0, ∞). Visualize X as the mapping from 𝒳_I to S, which sends x ∈ 𝒳_I to
X(·, x) ∈ S. Check that X is measurable. Let P_i = π_i X⁻¹. Then P_i is a probability
on Σ, because π_i(𝒳_I) = 1. For 0 ≤ n < d, check that ξ_n on 𝒳_I is
the composition of ξ̂_n on S with X on 𝒳_I, and τ_n on 𝒳_I is the composition of
τ̂_n on S with X on 𝒳_I; while ξ̂_n or τ̂_n on S applied to X(·, x) is undefined for
n ≥ d(x). Indeed, ξ_n ≠ ∂ implies ξ_{m+1} ≠ ξ_m and τ_m < ∞ for m < n, while
ξ_n = ∂ implies τ_m = ∞ for some m < n, on 𝒳_I. Consequently, the P_i-distribution
of {ξ̂_n, τ̂_n} on S coincides with the π_i-distribution of
{ξ_n, τ_n: 0 ≤ n < d} on 𝒳.
Namely, {ξ̂_n} is Markov with stationary transitions Γ and starting state i.
Given ξ̂, the holding times τ̂_n are conditionally independent and exponentially
distributed, the parameter for τ̂_n being q(ξ̂_n). So, I constructed the right P_i.
The P_i-distribution of the coordinate process X on S coincides with the π_i-distribution
of the process X on 𝒳: both are Markov with transitions P and
starting state i. *
SECOND PROOF. Use (38) and (7.33).
*
What does (45) imply about an abstract Markov chain with transitions P?
Roughly, there is a standard modification, all of whose sample functions
are right continuous step functions. Then the jump process is a Γ-chain.
Given the jumps, the holding times are conditionally independent and
exponentially distributed, the parameter for visits to j being q(j).
More exactly, let (Ω, ℱ, 𝒫) be an abstract probability triple. Let
{Y(t): 0 ≤ t < ∞} be an I-valued process on (Ω, ℱ). With respect to 𝒫,
suppose Y is a Markov chain with stationary transitions P. Remember that
P is uniform, by assumption. For simplicity, suppose 𝒫{Y(0) = i} = 1. Let
Ω₀ be the set of ω ∈ Ω such that Y(·, ω) retracted to the rationals agrees with
some f ∈ S retracted to the rationals; of course, f depends on ω.
(46) Proposition. Ω₀ ∈ ℱ and 𝒫(Ω₀) = 1.
PROOF. Consider the set of functions ψ from the nonnegative rationals R
to I, with the product σ-field. The set F of ψ which agree with some
f = f(ψ) ∈ S retracted to the rationals is measurable, by this argument. Let
θ₀(ψ) = 0. Let
ζ_n(ψ) = lim {ψ(r): r ∈ R and r ↓ θ_n(ψ)}
θ_{n+1}(ψ) = sup {r: r ∈ R and ψ(s) = ζ_n(ψ) for s ∈ R with θ_n(ψ) < s < r}.
Then F is the set of ψ such that: either
θ₀(ψ) < θ₁(ψ) < θ₂(ψ) < ··· < ∞
all exist, and θ_n(ψ) → ∞ as n → ∞, and
or for some n,
θ₀(ψ) < θ₁(ψ) < ··· < θ_n(ψ) < ∞ and θ_{n+1}(ψ) = ∞
Let X be the coordinate process on S, and let X_R be X with time domain
retracted to the rationals R. The 𝒫-distribution of Y_R coincides with the P_i-distribution
of X_R. But {X_R ∈ F} is all of S, and therefore has P_i-probability 1. So
and
*
probability, so 𝒫M⁻¹ = P_i. And the 𝒫-distribution of {ζ_n, θ_n} coincides
with the P_i-distribution of {ξ̂_n, σ̂_n}.
NOTE. If Γ is substochastic, then ξ̂_n may be undefined with positive
probability. By convention, an empty product is 1.
There is a useful way to restate (48). Define the probability π_i on 𝒳 as in
Section 6, using the present Γ and q. Suppose for a moment that q(j) > 0 for
all j. Let Ω̂ be the subset of Ω where ζ_n is defined and τ_n < ∞ for all n. Then
𝒫(Ω̂) = 1. Let M map Ω̂ into 𝒳:
ξ_n(Mω) = ζ_n(ω) and τ_n(Mω) = τ_n(ω).
Then
(49) 𝒫M⁻¹ = π_i.
1. INTRODUCTION
The set I will be the state space. Starting from i ∈ I, my process will
move through I_i in order.
(2) Let q be a function from I to (0, ∞).
The holding time in j will be exponential with parameter q(j), independent
of everything else.
Suppose
(3) Σ {1/q(j): j ∈ I_i} = ∞ for all i ∈ I
and
(4) Σ {1/q(j): j ∈ I_i and j ≤ k} < ∞ for all i ∈ I and k ∈ I_i.
Condition (3) guarantees that my process is defined on all of [0, ∞).
Condition (4) guarantees that it moves through all of I_i, when it starts
from i.
The construction
(8) Give I the discrete topology, and let Î = I ∪ {φ} be its one-point
compactification.
(9) Define a process X_i on W_i:
X_i(t) = j if λ(j) ≤ t < ρ(j) for some j ∈ I_i
= φ if λ(j) ≤ t < ρ(j) for no j ∈ I_i.
EXPLANATION. The process should visit j on an interval of length τ(j).
The process should spend total time 0 in the fictitious state φ. So I put the
left endpoint of the j-interval at the sum of the lengths of the preceding
intervals, namely λ(j). To prevent various difficulties with the sample
functions, I will have to cut W_i down to W_i* in (12).
6.2] THE FIRST CONSTRUCTION 175
That is, (t, w) → X_i(t, w) is product measurable on [0, ∞) × W_i, where
[0, ∞) has the Borel σ-field, and W_i has the σ-field (6).
(13) Let π_i be the probability on W_i which makes the τ(j) independent
and exponentially distributed, the parameter for τ(j) being q(j).
Use conditions (3-4) and (5.33):
(14) π_i(W_i*) = 1.
Lemmas
Let i range over I, and j over I_i. Let w ∈ W_i*, as defined in (12).
(15) X_i(t, w) = j iff λ(j, w) ≤ t < ρ(j, w). This interval has length τ(j, w).
(16) j → [λ(j, w), ρ(j, w)) is 1-1 and order preserving on I_i. The union
of [λ(j, w), ρ(j, w)) as j ranges over I_i is precisely {t: X_i(t, w) ∈ I_i}.
(17) X_i(·, w) is regular in the sense of (7.2).
(18) Lemma. Lebesgue {t: X_i(t, w) = φ} = 0, for w ∈ W_i*.
PROOF. Relations (15-16) show
Lebesgue {t: t ≤ λ(j, w) and X_i(t, w) ∈ I} = λ(j, w).
So
Lebesgue {t: t ≤ λ(j, w) and X_i(t, w) = φ} = 0.
But λ(j, w) increases to ∞ as j increases through I_i, by definition (12) of
W_i*. *
176 EXAMPLES FOR THE STABLE CASE [6
So
P(t, i, j) = π_i{τ(i) ≤ t < τ(i) + τ(j)}
= π_i{τ(i) ≤ t} − π_i{τ(i) + τ(j) ≤ t}.
But
π_i{τ(i) ≤ t} = 1 − e^{−q(i)t} by (13)
π_i{τ(i) + τ(j) ≤ t} = o(t) by (13) and (5.34).
Suppose j > i but j ≠ s(i). Then there is a state k with i < k < j. By
(10),
{X_i(t) = j} ⊂ {τ(i) + τ(k) ≤ t},
so
P(t, i, j) = o(t) by (13) and (5.34). *
The theorem
(27) Theorem. Suppose (1-4). Define the process X_i on the probability
triple (W_i, π_i) by (5-9) and (13). Define Q and P by (23-25).
(a) P is a standard stochastic semigroup on I, with generator Q.
(b) X_i is Markov with stationary transitions P and starting state i.
NOTE. X_i has properties (15-18) on W_i*, which has π_i-probability 1 by
(14).
PROOF. Let i ≤ j ≤ k be in I, and let s, t be nonnegative.
(28) Let ℱ(t) be the σ-field in W_i spanned by X_i(u) for 0 ≤ u ≤ t.
Let A ∈ ℱ(t) with A ⊂ {X_i(t) = j}. I will argue
(29) π_i{A and X_i(t + s) = k} = π_i{A} · P(s, j, k).
Take (29) on faith for a minute. Use lemma (5.4) on (15), (16), (19), (29)
to make P a stochastic semigroup on I, and X_i a P-chain starting from i.
This proves (b), and (26) completes the proof of (a).
To start on (29),
(30) let T map {X_i(t) = j} into W_i as follows:
(Tw)(j) = λ(j, w) + τ(j, w) − t;
(Tw)(h) = τ(h, w) for h > j.
From definition (9),
(31) X_i(t + s) = X_i(s) ∘ T on {X_i(t) = j};
so
(32) T⁻¹{X_i(s) = k} = {X_i(t) = j and X_i(t + s) = k}.
Next,
(33) let B = {w: w ∈ W_i and w(j_m) > u_m for m = 0, ..., n}, where
j₀ = j < j₁ < ··· < j_n are in I, and u₀, u₁, ..., u_n are nonnegative
numbers.
I claim
So
(41)
because A and C₀ are measurable on τ(h) with h ≤ j; while B₁ is measurable
on τ(h) with h > j, by (33, 39); and definition (13) makes these two lots
π_i-independent.
Check that A and λ(j) are ℱ(t)-measurable. The definitions are (36-37) and
(7). Definition (13) makes τ(j) independent of ℱ(t) and exponential with
parameter q(j), relative to π_i. Abbreviate u = u₀, and remember j = j₀.
Use (5.30) to conclude
(42)
where C was defined in (35).
Use definitions (13, 39, 33) to check
(43)
Combine (41-43) and (37) to settle (34). *
Figure 1. A sample function: the process holds in state 0 for time τ(0), in
state 1 for time τ(1), and so on up through the integers.
*
(e) Formal construction. Use (27).
(45) Example. (a) Description. The semigroup is not uniform, but almost
all sample functions are right continuous step functions. The states are the
integers I. Starting from i ∈ I, the process moves successively through i,
i + 1, .... See Figure 1.
(b) State space. I is the integers, with the usual order.
(c) Holding times. q(i) is arbitrary subject to: 0 < q(i) < ∞; and
q(i) → ∞ as i → ∞; and Σ_{i=1}^∞ 1/q(i) = ∞.
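Here is a sketch of example (45) in Python. The particular rates q(i) = |i| + 1 are my choice; they satisfy (c). The expected passage time from 1 to n is Σ_{i=1}^{n−1} 1/q(i), a partial sum of a divergent series, so the process takes arbitrarily long to climb arbitrarily high — its sample functions are step functions on all of [0, ∞).

```python
import random

random.seed(4)

def q(i):
    """Illustrative holding-time rates: 0 < q(i) < oo, q(i) -> oo, sum 1/q(i) = oo."""
    return abs(i) + 1.0

def passage_time(start, target):
    """Total holding time while the process moves start, start+1, ..., target-1."""
    return sum(random.expovariate(q(i)) for i in range(start, target))

N = 20_000
mean_time = sum(passage_time(1, 50) for _ in range(N)) / N
expected = sum(1.0 / q(i) for i in range(1, 50))   # harmonic-like partial sum

assert abs(mean_time - expected) < 0.05
```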
{t: X_i(t, w) ∈ I}
is a countable union of maximal intervals [a, b) which are ordered like the
rationals: between any two, there is a third. So
(d) Generator. Let i = (α, n). Then Q(i, j) = 0 unless j = (α, n) or
j = (α, n + 1). And
RELEVANT FACT. Let S and T be two closed subsets of [0, ∞), without
interior. Then S is homeomorphic to T iff the set of complementary intervals
of S is order-isomorphic to the set of complementary intervals of T. *
Informal description
Let I be a countably infinite state space. Let C be a countably infinite set,
linearly ordered by <, with first element 0. The intervals of constancy will
be indexed by some initial segment of C. Let ξ(c) be the value of the process
on interval c. I want ξ(c) to be Markovian. What does this mean? Let Γ_i be
the distribution of ξ when the starting state is i. This explains condition (63).
Fix a present index d ∈ C. Let
C_d = {c: c ∈ C and c ≥ d}.
The past is
{ξ(c): c ∈ C and c ≤ d}.
The future is
{ξ(c): c ∈ C_d}.
Given the past and ξ(d) = j, the conditional distribution of the future
should be the Γ_j-distribution of the whole jump process {ξ(c)}. As far as Γ_j
is concerned, c runs over all of C. The index c in the future only runs over C_d.
So I have to make these index sets order-isomorphic. More explicitly, there
should be a strictly increasing map
M_d = M(d, ·)
of C_d onto C. Suppose a < b < c are in C. There are now two ways to
compute the position of c relative to b. The direct method gives M(b, c).
The indirect method maps first by M(a, ·), getting
0 = M(a, a) < M(a, b) < M(a, c);
and then computes the position of M(a, c) relative to M(a, b). I want the
two methods to agree:
M(b, c) = M[M(a, b), M(a, c)].
You should now accept condition (51) on the order structure of C and (66)
on the Markov property.
DIGRESSION. Let c = M_a⁻¹b, so c ∈ C_a and b = M_a c. I claim
M_c = M_b M_a.
Both sides have domain C_c. Take d ∈ C_c:
M_c d = M(M_a c, M_a d) = M_b M_a d.
So C is a semigroup with identity 0, where
a + b = M_a⁻¹b.
And b > a iff −a + b = M_a b ∈ C. So you're really facing the nonnegative
part of a countably infinite, linearly ordered, non-commutative group. *
Here is the final slogan: Given the visiting process, the holding times are
conditionally independent and exponential, with parameter depending only
on what the process is currently holding onto. Let q specify these parameters:
namely, q is a nonnegative function on I. Given ξ, the length τ(c) of the cth
interval of constancy is conditionally exponential with parameter q[ξ(c)], and
these lengths are conditionally independent as c varies over C.
How could you construct such an object? It is easy to generate a process
{[ξ(c), τ(c)]: c ∈ C} of states and holding times which has the right properties.
You might as well use the coordinate process on the set of functions from C
to I × (0, ∞]. This is done in (75). The process should be ξ(c) on the cth
interval of constancy, which has length τ(c). But where do you put this
interval? The sample function should spend Lebesgue almost no time outside
intervals of constancy. This suggests making the left endpoint λ(c) of
the cth interval equal to the sum of the lengths of the previous intervals:
λ(c) = Σ {τ(e): e ∈ C and e < c}.
To get it, assume (65). The intervals of constancy now cover Lebesgue
almost all of [0, 00). On the exceptional null set, put the process in its
fictitious state lfi, namely the point in the one-point compactification of
discrete I.
WARNING. Give C the order topology, and complete it. Suppose x is in
this completion, and is a limit point of C from both sides. If you don't take
precautions, you get stuck with
G = ~ {T(e):e ~ x} < 00,
but
~ {T(e):x < e < e*} = 00 for all e* > x.
Then you can't continue the sample function past G.
I want [A(e), pee»~ to be the eth interval of constancy. Part of this is free:
e -+ [A(e), pee»~ for e E C
184 EXAMPLES FOR THE STABLE CASE [6
(51) For each d ∈ C, assume that there is a 1-1 order preserving map
M_d = M(d, ·) of C_d onto C, such that:
M_0 is the identity;
M[M(a, b), M(a, c)] = M(b, c) for a < b < c in C.
(52) Say C is discrete iff there is a least c > 0. If C is discrete, each c has an
immediate successor, call it s(c): use (51). Let 1 = s(0). Otherwise,
say C is indiscrete: then C is order-isomorphic to the nonnegative
rationals.
(53) Illustration. Let C = {0, 1, 2, ...} with the usual order. Define
M(d, n) = n − d for n = d, d + 1, ....
(54) Illustration. Let C consist of all pairs (m, n) of nonnegative integers.
Let (m, n) < (m', n') iff
m < m', or
m = m' and n < n'.
Let d = (m, n) ≤ (m', n'). Define
M_d(m', n') = (0, n' − n) for m' = m
= (m' − m, n') for m' > m.
You check (51).
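The check asked for here is concrete enough to do by machine on a finite slice of C. The sketch below (Python; the range 0–3 and the function name M are my choices, not the text's) verifies both clauses of (51) for illustration (54):

```python
from itertools import product

# Check (51) for illustration (54): C = pairs (m, n) of nonnegative
# integers in lexicographic order, and M_d maps C_d = {c : c >= d} onto C.
def M(d, c):
    # assumes d <= c in the lexicographic order of (54)
    (m, n), (m2, n2) = d, c
    return (0, n2 - n) if m2 == m else (m2 - m, n2)

C = list(product(range(4), repeat=2))      # a finite slice of C

for c in C:
    assert M((0, 0), c) == c               # M_0 is the identity

for a, b, c in product(C, repeat=3):
    if a < b < c:                          # tuple comparison is lexicographic
        assert M(b, c) == M(M(a, b), M(a, c)), (a, b, c)

print("(51) holds on all tested triples")
```

This is only a finite check, of course; the general verification splits into the four cases m = m' or m < m' for each of the two pairs involved.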
(55) Illustration. Let C consist of all pairs (m, n) of integers, such that
n ≥ 0 when m = 0. Let (m, n) < (m', n') iff
m < m', or
m = m' and n < n'.
(60) Observation. Let c ∈ C and ω ∈ Ω with λ*(c, ω) < ∞. For each
i ∈ I, there are only finitely many indices d < c with ω(d) = i.
(61) Let Ω* be the set of all ω ∈ Ω such that
sup {ρ*(c, ω) : c ∈ C and λ*(c, ω) < ∞} = ∞.
The conditions
where
c_0 = 0 < c_1 < ... < c_n are in C, and
i_0 = i, i_1, ..., i_n are in I, and
(67) γ(c, j, k) = Γ_j{ξ(c) = k}.
Here is a digression. What γ can appear in (67)? Clearly,
γ(c, j, k) ≥ 0
Σ_k γ(c, j, k) = 1
γ(0, i, i) = 1
γ(1, i, i) = 0 if C is discrete
γ(b, i, k) = Σ_j γ(a, i, j) · γ(M_a b, j, k) for a < b.
Conversely, suppose γ satisfies these conditions. Then you can define Γ by
(66) and Kolmogorov: use (51) to help the consistency. Properties (63–64)
are immediate. Property (65) remains an assumption.
The construction
(68) Let W be the set of all functions w from C to (0, ∞]. Let
τ(c, w) = w(c) for c ∈ C and w ∈ W.
Give W the smallest σ-field over which all τ(c) are measurable.
(73) For each ω ∈ Ω, let η_q(ω) be the probability on W which makes the
τ(c) independent and exponentially distributed, the parameter for τ(c)
being q[ω(c)].
(74) For A ⊂ 𝒳 and ω ∈ Ω, let A(ω) be the ω-section of A:
A(ω) = {w : w ∈ W and (ω, w) ∈ A}.
Properties of X
The f is for finite. The square bracket is to prevent confusion with sections.
Fix x = (ω, w) ∈ 𝒳. Review (70–72).
(84) The map c → [λ(c, x), ρ(c, x)) is 1-1 and strictly increasing on C_f[x].
(85) X(t, x) = ξ(c, x) for λ(c, x) ≤ t < ρ(c, x), an interval of length τ(c, x).
(86) The union of [λ(c, x), ρ(c, x)) as c varies over C_f[x] is
{t : t < ρ_f(x) and X(t, x) ∈ I}.
(87) Lebesgue {t : t < ρ_f(x) and X(t, x) = φ} = 0.
(88) X(t, x) = φ for t ≥ ρ_f(x).
(89) WARNING. X(·, x) need not be regular in the sense of (7.2). And
[λ(c, x), ρ(c, x)) need not be a maximal interval of constancy. See (94).
PROOF. Use (64, 66). ★
(92) Review (59, 61) and (68–70) and (83) and (90). Let 𝒳₁ be the set of
x = (ω, w) ∈ 𝒳 such that:
ω ∈ Ω* ∩ Ω_S, and
λ(c, w) < ∞ iff λ*(c, ω) < ∞ for all c ∈ C, and ρ_f(w) = ∞.
For ω ∈ Ω, let
W_ω = ∩ {W[c, ω] : c ∈ C}.
Let
W_∞ = {w : w ∈ W and ρ_f(w) = ∞}.
Let
Ω₁ = Ω* ∩ Ω_S.
† There's less here than meets the eye; but keep track of the notation.
Then 𝒳₁ is the set of pairs (ω, w) with ω ∈ Ω₁ and w ∈ W_ω ∩ W_∞. By (75),
π_i{𝒳₁} = ∫_{Ω₁} η_q(ω){W_ω ∩ W_∞} Γ_i(dω).
But Γ_i{Ω₁} = 1 by (65) and (91). Fix ω ∈ Ω₁. By (73) and (5.33),
η_q(ω){W[c, ω]} = 1 for each c;
so
η_q(ω){W_ω} = 1.
Review (59, 61). Fix ω ∈ Ω*. Let
C_f*[ω] = {c : λ*(c, ω) < ∞}
σ_f(ω, w) = Σ {w(c) : c ∈ C_f*[ω]}.
By (62, 73) and (5.33),
η_q(ω){w : w ∈ W and σ_f(ω, w) = ∞} = 1.
Review (83). If w ∈ W_ω, then C_f*[ω] = C_f[w], so
σ_f(ω, w) = ρ_f(w).
Consequently,
changes at ρ(c, x). If c = s(d) for some d, the same argument makes X(·, x)
change at λ(c, x). If c > 0 and c = s(d) for no d, then c is a limit point of
C from the left. Use (60) to find c(x) < c, such that
ξ(d, x) ≠ ξ(c, x) for c(x) < d < c.
So
X(t, x) ≠ ξ(c, x) for ρ(c(x), x) ≤ t < λ(c, x).
This forces X(·, x) to change at λ(c, x).
(95) Lemma. π_i{X(t) ∈ I} = 1.
PROOF. If q(i) = 0, use (77) and (79). Suppose q(i) > 0. By (76),
(96) τ(0) is exponential with parameter q(i), and is independent of
{ξ(c), τ(c) : c > 0}, relative to π_i.
Let
Y(s, x) = X[τ(0, x) + s, x].
So Y is jointly measurable by (78). If x ∈ 𝒳₁, then ρ_f(x) = ∞ by definitions
(83, 92), so
Lebesgue {s : Y(s, x) = φ} = 0
by (87). Temporarily, let
λ_0(c) = Σ {τ(d) : d ∈ C and 0 < d < c}.
Then
Y(s) = ξ(c) if λ_0(c) ≤ s < λ_0(c) + τ(c) for some c ∈ C
= φ if λ_0(c) ≤ s < λ_0(c) + τ(c) for no c ∈ C.
Therefore, Y is measurable on {ξ(c), τ(c) : c > 0}. Now (96) makes τ(0)
exponential with parameter q(i), independent of Y, relative to π_i. Complete
the argument as in (19). ★
∫_{B₁} h dΓ_i = [∫_{{ξ(d) = j}} h dΓ_i] · Γ_j{B̄₁}.
PROOF. This restates the Markov property (66): time d is the present, h is
in the past, B₁ is in the future, and B̄₁ is B₁ shifted to start at time 0. Formally,
let
d_0 = 0 < d_1 < ... < d_N = d be in C
i_0 = i, i_1, ..., i_N = j be in I
D = {ω : ω ∈ Ω and ω(d_m) = i_m for m = 0, ..., N}.
Now, with e_m = M_d^{-1}c_m,
e_0 = M_d^{-1}c_0 = M_d^{-1}0 = d = d_N
d_0 < d_1 < ... < d_N = e_0 < ... < e_n
j_0 = j = i_N.
By (66–67),
where
and
q = ∏_{m=0}^{n−1} γ[M(e_m, e_{m+1}), j_m, j_{m+1}] = Γ_j{B̄₁};
because (51) makes
M(e_m, e_{m+1}) = M(M_d e_m, M_d e_{m+1}) = M(c_m, c_{m+1}).
So (99) holds for h = 1_D. By (10.16), the result holds for h = 1_A with
A ∈ 𝒜. Now extend. ★
The generator
(102) Let P(t, i, j) = π_i{X(t) = j}.
(103) Define a matrix Q on I as follows:
Q(i, i) = −q(i);
when C is discrete in the sense of (52),
Q(i, s(i)) = q(i)
Q(i, j) = 0 for j ≠ i, s(i);
when C is indiscrete,
Q(i, j) = 0 for j ≠ i.
(104) Lemma. P′(0) = Q, as defined in (102–103).
PROOF. When C is discrete, you can use the corresponding argument in
(5.39). The results you need are (64, 76, 79–82).
Suppose C is indiscrete. Fix i and j in I, with j = i allowed. The case
q(i) = 0 is easy, so assume q(i) > 0. Fix ∞ ∉ C, and pretend ∞ > c for all
c ∈ C. Review definition (59, 61) of Ω*. Define a measurable mapping K
from Ω* to C ∪ {∞} as follows. If λ*(c, ω) < ∞ and ω(c) = j for some
c > 0, there is a least such c by (60); and K(ω) is this least c. Otherwise,
K(ω) = ∞. Count C off as {c₁, c₂, ...}. Define a measurable mapping L
from Ω* to C as follows:
L(ω) is the c_n with least n satisfying 0 < c_n < K(ω).
Because C is indiscrete, L is properly defined. Define a measurable mapping
ζ from Ω* to I:
ζ(ω) = ω[L(ω)].
For each k ∈ I, let U_i and U_k be independent, exponential random variables,
with parameters q(i) and q(k). If ω ∈ Ω* and ξ(0, ω) = i and ζ(ω) = k,
then (73) shows:
(105) the η_q(ω)-distribution of τ(0) and τ[L(ω)] coincides with the distribution
of U_i and U_k.
I claim:
(106) π_i{τ(0) ≤ t and X(t) = j} = o(t) as t → 0.
To argue (106), abbreviate
A_t = {τ(0) ≤ t and X(t) = j}.
(107)
The theorem
(108) Theorem. Suppose (48–51) and (63–66). Define the probability triple
(𝒳, π_i) by (70, 75). Define the process X on 𝒳 by (72). Define P and Q by
(102–103). Then
(a) P is a standard stochastic semigroup on I, with generator Q.
(b) X is Markov with stationary transitions P and starting state i,
relative to π_i.
NOTE. The construction has properties (78–88) and (93–94).
PROOF. To start with, fix i and j in I, fix t ≥ 0, and fix d ∈ C.
(109) Let D = {w : w ∈ W and λ(d) ≤ t < λ(d) + τ(d)}; the definitions are (68–69).
(110) Define a mapping T₁ of Ω into Ω:
(T₁ω)(c) = ω(M_d^{-1}c) for c ∈ C;
the prior definitions are (51, 57).
(111) Define a mapping T₂ of D into W:
(T₂w)(0) = λ(d, w) + τ(d, w) − t;
(T₂w)(c) = w(M_d^{-1}c) for c ∈ C with c > 0;
the prior definitions are (51), (68–69), and (109).
(112) Define a mapping T of Ω × D into 𝒳:
T(ω, w) = (T₁ω, T₂w).
You have to argue
(113) X(t + s) = X(s) ∘ T for all s on Ω × D.
This is a straightforward and boring project, using (84–86) and (88).
(114) Define a subset A of 𝒳 as follows.
A = A₁ × (A₂ ∩ D), where:
D was defined in (109);
A₁ = {ω : ω ∈ Ω and ω(d_m) = i_m for m = 0, ..., N};
A₂ = {w : w ∈ W and w(d_m) > t_m for m = 0, ..., N − 1};
d_0 = 0 < d_1 < ... < d_N = d are in C;
i_0 = i, i_1, ..., i_N = j are in I;
t_0, t_1, ..., t_{N−1} are nonnegative numbers.
NOTE. m < N in A₂.
By (75),
(119) π_i(A ∩ T^{-1}B) = ∫_{A₁ ∩ B̄₁} η_q(ω)(A₂ ∩ D ∩ B̄₂) Γ_i(dω).
But A₂ and D are measurable on the σ-field of (100). Use definition (73):
(120) η_q(ω)(A₂ ∩ D ∩ B̄₂) = η_q(ω)(A₂ ∩ D) · e^{−v}, where
v = q(j₁)u₁ + ... + q(j_n)u_n and ω ∈ B̄₁.
Let Σ be the σ-field in W spanned by τ(c) with c < d. Then A₂ and λ(d) are
Σ-measurable: definitions (114) and (69). Abbreviate u = u₀. Remember
(123) ∫_{A₁ ∩ B̄₁} η_q(ω)(A₂ ∩ D) Γ_i(dω) = [∫_{A₁} η_q(ω)(A₂ ∩ D) Γ_i(dω)] · Γ_j(B̄₁).
(124)
Use (113):
T^{-1}B = {Ω × D and ξ(d) = j and X(t + s) = k}.
From (126),
(127) A(d) ⊂ {Ω × D and ξ(d) = j}.
So extended (116) makes
(128) π_i{A and X(t + s) = k} = π_i{A} · π_j{X(s) = k} for all A ∈ 𝒜(d).
I claim
(130) If A ∈ ℱ(t) as defined in (129), then A(d) ∩ A ∈ 𝒜(d), as defined
in (126).
NOTE. I do not claim A(d) ∈ ℱ(t).
Let
r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
= 0 elsewhere.
So
r(i, i) = 0
r(i, j) = 0 when q(i) = 0
Σ_j r(i, j) = 1 when q(i) > 0.
Let p be a probability on I. Starting from i, the process jumps according to
r, and the holding times are filled in according to q. If the process hits an
absorbing state j, that is q(j) = 0, the visit to j has infinite length and the
sample function is completely defined. Otherwise, the sample function makes
an infinite number of visits. However, the time θ to perform these visits may
be finite. If so, start the process over again at a state chosen from p, independent
of the past sample function. Repeat this at any future exceptional times.
See Figure 2. If θ is finite with positive probability, then there is a 1-1
correspondence between p and the transitions P^p.
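The recipe just described is easy to simulate when everything is explicit. The sketch below (Python; the choices I = {0, 1, 2, ...}, r(n, n+1) = 1, q(n) = (n+1)², and p = point mass at 0 are illustrative assumptions of mine, not from the text) runs the jump-and-hold chain, lets it explode at time θ, and restarts it from p:

```python
import random

random.seed(0)

def run(horizon, n_max=10_000):
    """Jump per r, hold Exp(q(n)) in state n, restart from p at each explosion."""
    t, pieces = 0.0, []
    while t < horizon:
        n, states, start = 0, [], t      # restart from p: always state 0 here
        while t < horizon and n <= n_max:
            states.append(n)
            t += random.expovariate((n + 1) ** 2)  # hold in n with rate q(n)
            n += 1                                 # r(n, n+1) = 1
        # n_max truncates the tail of the explosion; the neglected time is
        # sum over n > n_max of 1/(n+1)^2, about 1e-4 here.
        pieces.append((start, states))
    return pieces

pieces = run(horizon=5.0)
# E[theta] = sum 1/(n+1)^2 = pi^2/6, so several restarts fit in [0, 5).
print("pieces:", len(pieces))
```

With Σ 1/q(n) < ∞ the mean explosion time is finite, which is exactly the situation where p matters: different choices of p give different transitions P^p over the same Q.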
Figure 2.
= ∞. ★
So (62) works again. Theorem (108) completes the formal construction.
Write
π_i^p = Γ_i × η_q,
as defined in (75), to show the dependence on p. I would now like to isolate
the properties of the construction that will be useful in (7.51). Fix i ∈ I.
Use (76):
(134) ξ(1, 0) is independent of {ξ(0, n), τ(0, n) : n = 0, 1, ...} and has
distribution p, relative to π_i^p.
Let
σ_n = τ(0, 0) + ... + τ(0, n − 1) for n = 1, 2, ...
Use (134):
(135) π_i^p{θ < ∞ and ξ(1, 0) = j} = π_i^p{θ < ∞} · p(j) for j ∈ I.
To state (136), let 𝒳₀ be the subset of 𝒳₁, as defined in (92), with:
ξ(m, 0) ∈ I for m = 0, 1, ...;
r[ξ(m, n), ξ(m, n + 1)] > 0 for (m, n) ∈ C.
(136) Lemma. (a) π_i^p{𝒳₀} = 1 for i ∈ I.
(b) If x ∈ 𝒳₀, then X(t, x) ∈ I for all t ≥ 0.
(c) If x ∈ 𝒳₀, then X(·, x) is regular in the sense of (7.2).
PROOF. Claim (a). Use (93) and (76).
Claim (b). Let x ∈ 𝒳₀. Suppose ξ(·, ·)(x) visits a or b. Let (m, n) be the
first index with ξ(m, n) = a or b. Get n > 0 and q[ξ(m, n − 1)(x)] = 0. So
λ*(m, n)(x) = ∞ by definition (59), forcing λ(m, n)(x) = ∞ by definitions
(69, 92). This prevents X(·, x) from reaching a or b, by definition (72). You
check X(t, x) ≠ φ.
Claim (c). Use (94). ★
(137) Lemma. Let P^p(t, i, j) = π_i^p{X(t) = j} for i and j in I. Then P^p is a
standard stochastic semigroup on I, with generator Q.
PROOF. Use (108) and (136). ★
DISCUSSION. Fix x ∈ 𝒳₀, as defined for (136). Here is a description of
X(·, x). Let
θ_M(x) = Σ {τ(m, n)(x) : (m, n) ∈ C and m < M};
so θ_0(x) = 0 and θ_1(x) = θ(x). Suppose λ(M, N)(x) < ∞. Let m be one of
0, ..., M − 1.
X(·, x) is a step function on [θ_m(x), θ_{m+1}(x)), visiting ξ(m, n)(x)
with holding time τ(m, n)(x) for n = 0, 1, ....
X(·, x) is a step function on [θ_M(x), ρ(M, N)(x)), visiting ξ(M, n)(x)
with holding time τ(M, n)(x) for n = 0, ..., N.
And
lim_{n→∞} ξ(m, n)(x) = φ;
in fact,
Σ_{n=0}^{∞} 1/q[ξ(m, n)(x)] < ∞.
Part of this I need. Keep x ∈ 𝒳₀, and check (138–140); the times θ and σ_n
were defined after (134).
on intervals of length
τ(0, 0)(x), τ(0, 1)(x), ..., τ(0, n)(x).
(139) ξ(1, 0) = lim X(r) as rational r decreases to θ, on {θ < ∞}.
Use (138–139).
(140) The sets {θ < ∞} and {θ < ∞ and ξ(1, 0) = j} are in the σ-field
spanned by {X(r) : r is rational}, on 𝒳₀.
Here is a more explicit proof of (140). For any real t, the event that θ ≤ t
coincides with the event that for any finite subset J of I, there is a rational
r ≤ t with X(r) ∉ J. The event that θ < ∞ and ξ(1, 0) = j coincides with the
event that for any pair of rationals r and s,
either θ ∉ (r, s)
or there is a rational t with r < θ < t < s and X(t) = j.
WARNING. π_i^p has mysterious features not controlled by the semigroup of
transition probabilities, like the beauty of the sample functions. However,
the π_i^p-distribution of X retracted to rational times has no mysteries at all:
it is completely controlled by the semigroup and the starting state i. Since Q
is silent about p, it does not determine the semigroup.
(141) Note. To get the simplest case of (132), let I be the integers. Let
r(n, n + 1) = 1 for all n. Let 0 < q(n) < ∞ with
Σ_{n=−∞}^{∞} 1/q(n) < ∞.
Figure 3.
(143) The generators in (141) and (142) are equal. But the sample functions
are very different.
(144) Example. (a) Description. The process moves cyclically through the
rationals in [0, 1).
(b) State space. I is the set of rationals in [0, 1).
(c) Holding times. q is arbitrary, subject to
q(i) > 0 for all i, and Σ_{i∈I} 1/q(i) < ∞.
(d) Generator. Q(i, i) = −q(i) for all i, and Q(i, j) = 0 for i ≠ j.
(e) Formal construction. Define C, <, and M as in (56). Define the
Γ_i of (57–58) by the requirement that
ξ(c) ∈ I and ξ(c) = i + c modulo 1 for all c, almost surely.
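Condition (c) can be met quite explicitly. One illustrative choice (mine, not the text's): enumerate the rationals in [0, 1) and let q of the kth rational be 2^(k+1), so the sum of 1/q(i) stays below 1:

```python
from fractions import Fraction

def rationals_01():
    """Enumerate the rationals in [0, 1) without repeats."""
    seen, d = set(), 1
    while True:
        for n in range(d):
            x = Fraction(n, d)
            if x not in seen:
                seen.add(x)
                yield x
        d += 1

q = {}
for k, i in zip(range(1000), rationals_01()):
    q[i] = Fraction(2) ** (k + 1)   # q(i) > 0; sum of 1/q(i) = 1 - 2^-1000 so far

partial = sum(1 / qi for qi in q.values())
assert partial < 1                  # the partial sums increase to 1
print("first states:", list(q)[:4])
```

Any enumeration works; the point is only that a summable sequence of reciprocals can be spread over a countable dense state space.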
6. MARKOV TIMES
Example (160) and the results of this section may help you understand
some of the technicalities in my formulation of strong Markov, Section 7.4.
The results of this section will not be used in other chapters of the book. Let
(𝒳, ℱ) be a measurable space. Let V be a compact metric set. Endow V
with the Borel σ-field. For each t ≥ 0, let X(t) be a V-valued and ℱ-measurable
function on 𝒳.
(145a) Let ℱ(t) be the σ-field in 𝒳 spanned by X(s) for 0 ≤ s ≤ t.
PROOF. Let r range over a countable dense subset of [0, t], which
contains t.
Claim (a). {σ < τ} ∩ {τ ≤ t} = ∪_r {σ < r < τ ≤ t}.
Claim (b). {σ < τ} ∩ {σ ≤ t} = ∪_r {σ ≤ r < τ}.
Claim (c). {σ ≤ τ} = 𝒳 \ {τ < σ}. Use (a) and (b).
Claim (d). Let A ⊂ {σ ≤ τ} and A ∈ ℱ(σ). Then
A ∩ {τ ≤ t} = ∪_r {A and σ ≤ r} ∩ {τ ≤ r}.
Claims (e) and (f) follow from (c) and (d), because
{σ = τ} = {σ ≤ τ} ∩ {τ ≤ σ}.
(149) Lemma. Let τ be a Markov time. For each n, let σ_n be a strict Markov
time. For each x, suppose
PROOF. To begin with, ℱ(τ+) ⊂ ℱ(σ_n): adapt the argument for (148).
Next, ℱ(σ_n) nonincreases by (148d). Finally, let A ∈ ℱ(σ_n) for all n. I have
to get A ∈ ℱ(τ+). Let E_n = {σ_n ≤ t}. Then E_n ↑ {τ < t}. So
A ∩ E_n ∈ ℱ(t) and A ∩ E_n ↑ A ∩ {τ < t}. ★
(150) Lemma. Let τ be a Markov time. Then τ + 1/n is a strict Markov time,
and ℱ(τ + 1/n) is nonincreasing with n, and
∩_n ℱ(τ + 1/n) = ℱ(τ+).
Claim (c). Suppose A ∈ 𝒢 is a union of 𝒢_∞-atoms. Now (b) stops A
from splitting 𝒢_n-atoms, so A ∈ 𝒢_n, forcing A ∈ 𝒢_∞. ★
(154) Lemma. (a) Points x and y are in the same ℱ(t)-atom iff
X(s, x) = X(s, y) for 0 ≤ s ≤ t.
(b) Points x and y are in the same ℱ(t+)-atom iff there is a positive
ε = ε(x, y) such that
X(s, x) = X(s, y) for 0 ≤ s ≤ t + ε.
PROOF. Temporarily, set X(∞, z) = 0.
Check (a). Then use (153b) to get (b). ★
(155) Lemma. Let σ be a strict Markov time.
(a) Then x and y are in the same ℱ(σ)-atom iff σ(x) = σ(y) and
X(t, x) = X(t, y) for 0 ≤ t ≤ σ(x).
(b) If σ(x) = u < ∞, and
X(t, x) = X(t, y) for 0 ≤ t ≤ u,
then σ(y) = u.
Let τ be a Markov time.
(c) Then x and y are in the same ℱ(τ+)-atom iff τ(x) = τ(y) and there
is an ε = ε(x, y) > 0 such that
X(t, x) = X(t, y) for 0 ≤ t ≤ τ(x) + ε.
(d) If τ(x) = u < ∞, and ε > 0, and
X(t, x) = X(t, y) for 0 ≤ t ≤ u + ε,
then τ(y) = u.
PROOF. Claim (a). Suppose x and y are in the same ℱ(σ)-atom. Then
σ(x) = σ(y) because σ is ℱ(σ)-measurable, and
X(t, x) = X(t, y) for 0 ≤ t ≤ σ(x)
because
{X(t) = v and σ ≤ t} ∈ ℱ(σ).
As (154a) shows, x and y are in the same ℱ(u)-atom: so both are in or both
are out of A ∩ {σ ≤ u}. Since both are in {σ ≤ u}, it follows that both are in
or both are out of A.
Claim (b). The set {σ = u} is in ℱ(u), and can't split ℱ(u)-atoms. But
x and y are in the same ℱ(u)-atom, by (154a). ★
Claim (c). Use (153), (150), and (a).
Claim (d) is like (b).
For the rest of the section, suppose
(156) X(·, x) is right continuous for all x ∈ 𝒳.
(157) Proposition. Suppose (156), and suppose (𝒳, ℱ) is Borel.
(a) ℱ(t) is separable, and saturated.
(b) ℱ(t+) is saturated.
For (c), let σ be a strict Markov time. Define a process Y as follows:
Y(t) = X(t) for all t on {σ = ∞};
Y(t) = X(t) for t ≤ σ on {σ < ∞};
Y(t) = X(σ) for t ≥ σ on {σ < ∞}.
(c) Y generates ℱ(σ); in particular, ℱ(σ) is separable and saturated.
For (d), let τ be a Markov time.
6.6] MARKOV TIMES 207
(152). ★
Claim (d). Use (c) and (153c) and (150).
NOTE. Suppose (156), and suppose (𝒳, ℱ) is Borel. Let 0 ≤ σ ≤ ∞ be
measurable, and satisfy (155b). Then σ is strict Markov, by (157a). Let
0 ≤ τ ≤ ∞ be measurable, and satisfy (155d). Then τ is Markov by (147c)
and (157b).
If (𝒳, ℱ) isn't Borel, this characterization of stopping times fails, as does
(157a); analyticity isn't enough. I don't know about (157c).
EXAMPLE. Let A be a non-Borel subset of [0, 1]. Let B = [0, 1]\A. Let 𝒳
be the following subset of [0, 1] × {0, 1}:
(A × {0}) ∪ (B × {1}).
For t ≥ 0 and (u, v) ∈ 𝒳, let
X(t, (u, v)) = u when 0 ≤ t ≤ 1
= u + v(t − 1) when t ≥ 1.
Let ℱ be the smallest σ-field in 𝒳 which makes each X(t) measurable. So X
is a process with real-valued, continuous sample functions. Let
σ(u, v) = 2v.
I claim:
(a) ℱ(1) is not saturated;
(b) σ has property (155b) and is ℱ-measurable, but is not strict Markov.
PROOF. Let ℬ be the full Borel σ-field in [0, 1] × {0, 1}. Let 𝒞 be the
σ-field in [0, 1] × {0, 1} of sets of the form C × {0, 1}, where C is a Borel
subset of [0, 1]. For 𝒮 equal to ℬ or 𝒞, let 𝒮̄ be the σ-field in 𝒳 of all sets 𝒳 ∩ S
with S ∈ 𝒮; the atoms of ℬ̄ are the singletons.
I say ℱ = ℬ̄. Indeed, X(t) is ℬ̄-measurable, so ℱ ⊂ ℬ̄. Conversely,
{(u, v) : (u, v) ∈ 𝒳 and u ≤ a} = {X(0) ≤ a} ∈ ℱ
{(u, v) : (u, v) ∈ 𝒳 and v ≤ b} = {X(2) − X(1) ≤ b} ∈ ℱ;
so ℬ̄ ⊂ ℱ. Similarly, ℱ(1) = 𝒞̄.
(c) 𝒟 is 𝒫-independent of ℰ;
(d) ℰ is inseparable.
PROOF. Claim (a). Suppose ξ(x) = j ≠ ξ(y). If σ(x) ≠ σ(y), then x and y
can even be separated by a 𝒟-set. So let σ(x) = σ(y) = t. Then x and y are
separated by the ℰ-set {X(t) = j and σ ≤ t}.
Claim (b). The basic ℰ-set {X(t) = j and σ ≤ t} differs by a 𝒫-null set
from the set {X(t) = j and σ > t}. This set is empty unless j = 1, in which
case this set reduces to {σ > t}. Either way, this set is in 𝒟.
Claim (c). Use (b). ★
Claim (d). Use (a, c) and (152).
For the rest of this section, let I be a countably infinite set. Let V = I ∪ {φ}
be the one-point compactification of discrete I.
EXAMPLE. Let (𝒳, ℱ) be the Borel space of sequences of ±1. Let
s_n(x) = x(n) for n = 1, 2, ... and x ∈ 𝒳. For t ≥ 0 and x ∈ 𝒳, let
X(t, x) = 0 when t ≥ 1
= s_n(x)·n when 1/(n + 1) ≤ t < 1/n and n = 1, 2, ...
= φ when t = 0.
So X is a right-continuous process. As everybody knows, ℱ(0+) is
inseparable.
PROOF. Let 𝒫 be the probability on ℱ which makes the s_n independent
and ±1 with probability ½ each. Let 𝒯 be the tail σ-field in 𝒳. Each 𝒯-atom
is a countable set: x and y are in the same 𝒯-atom iff
s_n(x) = s_n(y) for all n ≥ n(x, y),
by (153b). So 𝒫 assigns measure 0 to each atom of 𝒯. But 𝒫 is 0-1 on 𝒯, by
Kolmogorov. Now (10.17) forestalls the separability of 𝒯. You have to check
𝒯 = ℱ(0+). ★
(158) Proposition. Suppose (156). Then
(a) {X(t) ∈ I} ∩ ℱ(t) = {X(t) ∈ I} ∩ ℱ(t+).
More generally, for strict Markov σ,
(b) {X(σ) ∈ I} ∩ ℱ(σ) = {X(σ) ∈ I} ∩ ℱ(σ+).
PROOF. Claim (a). Let A ⊂ {X(t) ∈ I} and A ∈ ℱ(t+). I have to get
A ∈ ℱ(t). Let
The first set on the right is in ℱ(t) by definition. The second one is at first
sight only in ℱ(t+), but (a) gets it into ℱ(t).
(159) Proposition. Suppose (156), and suppose X(t, x) ∈ I for all t ≥ 0
and all x ∈ 𝒳. Then every Markov time is strict.
PROOF. Use (147c) and (158a). ★
NOTE. Suppose (156). Let τ be a Markov time. Suppose τ(x) = ∞ or
X[τ(x), x] ∈ I, for all x ∈ 𝒳. Then τ is strict, as in (158b). This sharpens (159).
NOTE. Suppose 𝒫 is a probability on (𝒳, ℱ), which makes X an I-valued
Markov chain: so 𝒫{X(t) ∈ I} = 1 for all t; and ℱ(t+) is larger than
ℱ(t) only on a 𝒫-null set, which depends on t. There is (181) a strict Markov
σ with ℱ(σ+) really larger than ℱ(σ); and (183) a Markov τ which is really
different from any strict Markov time.
(160) Example. (a) Description. The states are the pairs of integers.
Starting from (a, b), the process moves successively through (a, b),
(a, b + 1), .... This defines the process only on a finite interval [0, θ). Let S
be a stochastic matrix on the integers. At θ, choose a′ from S(a, ·),
independent of the past sample function, and restart the construction from
(a′, −∞). See Figure 4.
(b) State space. I = {(u, v) : u and v are integers}.
(c) Holding times. q(u, v) = r(v), where 0 < r(v) < ∞, and
Σ_{v=−∞}^{∞} 1/r(v) < ∞.
(d) Generator.
Q[(u, v), (u, v)] = −r(v);
Q[(u, v), (u, v + 1)] = r(v);
Q[(u, v), (u′, v′)] = 0 unless u′ = u and v′ = v or v + 1.
(e) Formal construction. Define C, <, and M as in (55). Let
i = (a, b) ∈ I.
(161) Let Ω_i be the set of ω ∈ Ω, as defined in (57), such that: ω(0, n) =
(a, b + n) for n = 0, 1, ...; the first coordinate ξ(m, ω) of ω(m, n)
depends on m, but not on n; ω(m, n) = (ξ(m, ω), n) for positive m
and integer n.
6.7] CROSSING THE INFINITIES 211
Figure 4. (The first coordinate of the states is ξ(0), then ξ(1), then ξ(2).)
w ∈ W such that
λ(c, w) < ∞ for all c ∈ C,
and
Σ_{c∈C} w(c) = ∞.
As in (176),
(179) ℱ[λ(M, N)] = 𝒜(M, N): definitions (146c, 169b).
Use (149):
(180) ℱ(θ_M+) = 𝒜(M, −∞): definitions (146d, 169c).
(181) Proposition. θ₁ is strict Markov. If S(0, ·) is nontrivial, then the π_i-
measure algebra of ℱ(θ₁+) is strictly larger than the π_i-measure algebra of
ℱ(θ₁).
PROOF. Use (175) for the first claim. For the second, ξ(1) is ℱ(θ₁+)-
measurable by (180). Use (170, 176) to see that ξ(1) is π_i-independent of
ℱ(θ₁), and has π_i-distribution S(0, ·). ★
PROOF. Use (175) for the first assertion, and (173a) for the second. For
the third, I say that ξ(2) is measurable on Y. Indeed, (173) makes ξ(2) the
first coordinate of Y(t) for all small positive t. Combine (170, 176):
π_i{ξ(2) = 3 | ℱ(θ₂)} = S(1, 3) on {ξ(1) = 1}
= S(2, 3) on {ξ(1) = 2}. ★
From (168),
π_i{ξ(1) = 1} = S(0, 1) and π_i{ξ(1) = 2} = S(0, 2).
Let τ be the least θ_m if any with ξ(m) = 1, and τ = ∞ if none.
(183) Proposition. τ is Markov in the sense of (146b). Let
NOTE. π_i{θ₁ = τ} = κ ≤ 1.
PROOF. Use (173) for the first assertion, and (168) for the second. For the
third, let σ* be a strict Markov time. Let θ_∞ = ∞. Let σ be the least
θ_m ≥ σ* for m = 1, 2, ..., ∞. Then σ is strict Markov,
σ = θ_γ for random γ = 1, 2, ..., ∞,
and
★
7
THE STABLE CASE
1. INTRODUCTION
I want to thank Howard Taylor and Victor Yohai for checking the final draft of
this chapter.
7.2] REGULAR SAMPLE FUNCTIONS 217
Let τ be a general Markov time. Given X(τ) = j ∈ I, the pre-τ sigma field
and the post-τ process are conditionally independent. The post-τ process
is a P-chain starting from j, all of whose sample functions are regular. This
is proved in Section 4.
Let Q be a matrix on I, with
q(i) = −Q(i, i) ≥ 0 for all i
Q(i, j) ≥ 0 for all i ≠ j
Σ_j Q(i, j) ≤ 0 for all i.
Then there is a minimal standard substochastic semigroup P with generator
Q. If P isn't stochastic, there are a continuum of different standard stochastic
semigroups P with generator Q. You can manufacture P as follows. Let
r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
= 0 elsewhere.
Start a chain jumping according to r, and waiting according to q. If the
chain only covers part of the line according to this program, too bad for it.
The transitions of this chain are P. You pick up the other solutions by
continuing the construction in different ways, as in (6.132). These results are
proved in Section 5.
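The passage from Q to the pair (q, r) is mechanical; here it is as code, with an arbitrary 3-state Q of my own for illustration (state 1 made absorbing):

```python
# Extract holding rates q and jump matrix r from a generator Q.
Q = [
    [-3.0, 1.0, 2.0],
    [ 0.0, 0.0, 0.0],   # q(1) = 0: state 1 is absorbing
    [ 4.0, 1.0, -5.0],
]
n = len(Q)
q = [-Q[i][i] for i in range(n)]

def r(i, j):
    if i != j and q[i] > 0:
        return Q[i][j] / q[i]
    return 0.0          # elsewhere

for i in range(n):
    assert r(i, i) == 0.0
    row = sum(r(i, j) for j in range(n))
    # rows of r sum to 1 where q(i) > 0, and to 0 where q(i) = 0
    assert abs(row - (1.0 if q[i] > 0 else 0.0)) < 1e-12

print("q =", q)
```

On a finite I the chain cannot explode, so P here would be stochastic; the continuum of solutions only appears for infinite I with explosion.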
Here are the main results of Section 6. First,
t → P(t, i, j)
is continuously differentiable. Second,
P′(t) = QP(t) for some t > 0
iff there are jumps to φ on almost no sample functions, iff
Σ_j Q(i, j) = 0 for all i.
Third,
P′(t) = P(t)Q for some t > 0
iff there are jumps from φ on almost no sample functions. In fact,
J(t, i, j) = P′(t, i, j) − Σ_k P(t, i, k)Q(k, j)
is the renewal density for jumps from φ to j, in a chain starting from i.
(4) Lemma. A(i, s) is measurable and P_i{A(i, s)} = e^{−q(i)s}, even without (1).
PROOF. Suppose s ∈ R. Let n be so large that N = 2^n s is a positive
integer, and let
A(n, i, s) = {ω : ω ∈ Ω and ω(m/2^n) = i for m = 0, ..., N}.
Plainly,
(5) P_i{A(n, i, s)} = [P(2^{−n}, i, i)]^{2^n s}.
As n increases, A(n, i, s) decreases to A(i, s), while the right side of (5)
converges to e^{−q(i)s}. You move s. ★
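The limit in (5) is easy to watch numerically on a two-state chain, where P(t) has a closed form: for the generator [[-a, a], [b, -b]], P(t, 0, 0) = b/(a+b) + (a/(a+b))·e^{-(a+b)t}. The particular a, b, s below are arbitrary choices of mine:

```python
import math

a, b, s = 1.0, 2.0, 1.0   # q(0) = a

def p00(t):
    # closed-form P(t, 0, 0) for the generator [[-a, a], [b, -b]]
    return b / (a + b) + (a / (a + b)) * math.exp(-(a + b) * t)

# [P(2^-n, 0, 0)]^(2^n s) should decrease to e^{-q(0) s} as n grows:
# the dyadic grids nest, so the events A(n, i, s) shrink with n.
vals = {n: p00(2.0 ** -n) ** (2 ** n * s) for n in (2, 10, 20)}
limit = math.exp(-a * s)
for n, v in sorted(vals.items()):
    print(n, v)
print("limit:", limit)
```

The monotone decrease mirrors the proof: each A(n, i, s) contains A(n+1, i, s), and the intersection is the event that the path holds i throughout [0, s].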
lim_{ε→0} P_i{G(r, ε)} = 1
for each r ∈ R. To avoid trivial complications, suppose r > 0. Fix a positive
binary rational e less than r. Using (4) and a primitive Markov property,
< 1.
(7) Lemma. Let U_n be geometric with parameter p_n. Let p_n → 1. Let
0 ≤ q ≤ ∞. Let a_n > 0 and a_n → 0 so that
(1 − p_n)/a_n → q.
Then the distribution of a_n U_n converges to exponential with parameter q.
PROOF. Easy. ★
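The "easy" proof can be checked directly on the tails, without any sampling. Take P(U_n > k) = p_n^k as the meaning of geometric with parameter p_n (one standard convention, and an assumption on my part), with a_n, q, t as below:

```python
import math

q, t = 2.0, 0.5

# P(a_n U_n > t) = p_n^ceil(t / a_n) should tend to the exponential tail e^{-qt}.
tails = []
for a_n in (1e-2, 1e-4, 1e-6):
    p_n = 1.0 - q * a_n          # so (1 - p_n) / a_n = q exactly
    tails.append(p_n ** math.ceil(t / a_n))

limit = math.exp(-q * t)
print(tails, limit)
```

This is the computation behind lemma (7): log of the tail is roughly (t/a_n)·log(1 − q·a_n) → −qt.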
For ω ∈ Ω_g and r ∈ R, there is a maximal interval of s ∈ R with r as
interior point and ω(s) = ω(r). Let u and v be the endpoints of this interval,
which depend on ω. If r > 0, then u < r < v and ω(s) = ω(r) for all s ∈ R
with u < s < v, and this is false for smaller u or larger v. Either u = 0 or u
is binary irrational; either v = ∞ or v is binary irrational. The changes for
r = 0 should be clear. If ω(r) = j, the interval (u, v) ∩ R will be called a
j-interval of ω. Let Ω_{j,s} be the set of ω ∈ Ω_g such that only finitely many j-
intervals of ω have nonempty intersection with [0, s]. Let
N increases. Because Ω_g is measurable and P_i{Ω_g} = 1 by (6), it is enough to
prove that A(N) is measurable and P_i{A(N)} → 0 as N → ∞. Let Y(n, m) =
X(m2^{−n}), so Y(n, 0), Y(n, 1), ... is a discrete time Markov chain with
transitions P(2^{−n}) starting from i, with respect to P_i. For the moment, fix n.
A j-sequence of ω is a maximal interval of times m for which Y(n, m)(ω) = j.
Let C₁, C₂, ... be the cardinalities of the first, second, ... j-sequence. Of
course, the C's are only partially defined. Let A(n, N) be the set of ω ∈ Ω_g
such that N or more j-sequences of ω have nonempty intersection with the
interval m = 0, ..., 2^n. Plainly, A(n, N) is measurable. On A(n, N), there
are N or more j-sequences, of which the first N − 1 are disjoint subintervals of
0, ..., 2^n. The νth subinterval covers (C_ν − 1)/2^n of the original time scale.
Consequently P_i{A(n, N)} is no more than the conditional P_i-probability
that Σ_{ν=1}^{N−1} 2^{−n}(C_ν − 1) ≤ 1, given there are N or more j-sequences. Given
there are N or more j-sequences, (1.24) shows C₁ − 1, ..., C_N − 1 are
conditionally P_i-independent and geometrically distributed, with common
parameter
P(2^{−n}, j, j) = 1 − 2^{−n}q(j) + o(2^{−n}) as n → ∞.
By (7), the conditional P_i-distribution of 2^{−n}(C_ν − 1) converges to the
exponential distribution with parameter q(j) as n → ∞. As n increases to ∞,
however, A(n, N) increases to A(N). Consequently, A(N) is measurable, and
P_i{A(N)} is at most the probability that the sum of N − 1 independent
exponential random variables with parameter q(j) does not exceed 1. This
is small for large N. ★
For ω ∈ Ω_v and nonnegative real t, there are only two possibilities as r ∈ R
decreases to t: either
(9) ω(r) → i ∈ I;
or
(10) ω(r) → φ.
If t ∈ R, only (9) can hold, with i = ω(t). For ω ∈ Ω_v and positive real t,
as r ∈ R increases to t, the only two possibilities are still (9) and (10); if
t ∈ R, only (9) can hold, with i = ω(t).
using (8).
I still have to argue that
𝒫{Y(t) = Y*(t)} = 1.
The 𝒫-distribution of {Y(s) : s ∈ R or s = t} coincides with the P_i-distribution
of {X(s) : s ∈ R or s = t}, by (12). And the set of functions φ from R ∪ {t}
to I with
φ(t) = lim φ(r) as r ∈ R decreases to t
is product measurable. Using (11),
1 = P_i{X(t) = lim X(r) as r ∈ R decreases to t}
= 𝒫{Y(t) = lim Y(r) as r ∈ R decreases to t}
= 𝒫{Y(t) = Y*(t)}. ★
The Y* process has an additional smoothness property: each sample
function is I-valued and continuous at each r ∈ R. This property is an easy
one to secure.
(16) Lemma. Let Y be a Markov process on (𝒳, ℱ, 𝒫) with stationary
transitions P and regular sample functions. Fix a nonnegative real number t.
Let 𝒳_t be the set of x ∈ 𝒳 such that Y(·, x) is continuous and I-valued at t.
Then 𝒳_t ∈ ℱ and 𝒫{𝒳_t} = 1.
PROOF. As in (15), using (14). ★
Return now to the X process of (11). Keep ω ∈ Ω_v. I say
(17a) (t, ω) → X(t, ω) is jointly measurable;
that is, with respect to the product of the Borel σ-field on [0, ∞) and the
relative σ-field on Ω_v. Indeed, X(t, ω) = j ∈ I iff for all n there is a binary
rational r with
t < r < t + 1/n and X(r, ω) = j.
The sets of constancy
For i ∈ I and ω ∈ Ω_v, let
S_i(ω) = {t : 0 ≤ t < ∞ and X(t, ω) = i}.
This is a level set, or a set of constancy.
7.3] THE POST-EXIT PROCESS 223
The results of this section are essentially due to Lévy (1951). For another
treatment, see Chung (1960, II.15). Here are some preliminaries (18–20).
For now, drop the assumption (1) of stability. Let {Z(t) : 0 ≤ t < ∞} be an
I-valued process on the probability triple (𝒳, ℱ, 𝒫).
(18) Definition. Drop (1). Say Z is Markov on (0, ∞) with stationary
transitions P iff:
(22)
where A = {τ ≥ t} ∩ B and
To begin with, suppose t > 0 and t, s_0, ..., s_M ∈ R. Let τ_n be the least
m/2^n with X(m/2^n) ≠ i. So τ_n ≥ 1/2^n. Let
A_n = {τ_n ≥ t and X(τ_n + s_m) = i_m for m = 0, ..., M}.
Figure 1.
Because the sample functions are regular, there is an interval to the right of τ
free of S_i = {t : X(t) = i}, as in Figure 1. Check that τ_n is in this interval
for large n, and τ_n ↓ τ. So {τ_n ≥ t} ↓ {τ ≥ t}. Using the regularity again,
lim sup A_n ⊂ A ⊂ lim inf A_n.
By Fatou,
(23)
and the problem is to compute P_i{A_n}.
Consider only n so large that 2^n t, 2^n s_0, ..., 2^n s_M are integers. Then
(24)
where A_{n,N} is the event that X(m/2^n) = i for m = 0, ..., N − 1 and
X(N/2^n) ≠ i and X(N2^{−n} + s_m) = i_m for m = 0, ..., M. The problem is
to compute P_i{A_{n,N}}. Let
(25a) a(n) = P(2^{−n}, i, i) = 1 − q(i)2^{−n} + o(2^{−n})
(25b) b(n) = a(n)^{2^n t − 1} = P_i{τ_n ≥ t} → P_i{τ ≥ t}
(25c) c(n) = [1 − a(n)]^{−1} P_i{X(2^{−n}) ≠ i and X(2^{−n} + s_0) = i_0}.
Because {X(m/2^n) : m = 0, 1, ...} is Markov with transitions P(2^{−n}),
This could also be deduced from the proof of (21), as follows. Abbreviate
t = s_0 ≥ 0 and j = i_0 and f(t) = P(t, i, j). Let e_n = 2^{−n}. Recall that a(n) =
P(e_n, i, i) from (25). Now
By regularity, Y is continuous at 0. And Y is continuous with P_i-probability
1 at each t > 0 by (29). Consequently, t → P_i{Y(t) = j} is continuous, and
therefore f* is continuous. But e_n can be replaced by any sequence tending
to 0, without affecting the argument much: τ_n is the least m·e_n with X(m·e_n) ≠ i,
and τ_n → τ from the right but not monotonically. The limit of the difference
quotient does not depend on the sequence, in view of (31). Thus, the right
derivative of f exists, and is continuous, being f*. Use (10.67) to see that f*
is the calculus derivative of f. ★
The global structure of {X(t): 0 ≤ t < ∞} is not well understood. But the
local behavior is no harder than in the uniform case. To explain it, introduce
the following notation. At time 0, the process is in some state ξ_0; it remains
there for some time τ_0, then jumps to φ or to a new state ξ_1. If the latter, it
remains in ξ_1 for some time τ_1, then jumps, and so on. See Figure 2.
More formally, let ξ_0 = X(0) and let
τ_0 = inf {t : X(t) ≠ ξ_0}.
The inf of an empty set is ∞. Suppose ξ_0, ..., ξ_n and τ_0, ..., τ_n are defined.
If τ_n = ∞, or τ_n < ∞ but X(τ_0 + ⋯ + τ_n) = φ, then ξ_{n+1}, ξ_{n+2}, ... as
well as τ_{n+1}, τ_{n+2}, ... are undefined. If τ_n < ∞ and X(τ_0 + ⋯ + τ_n) ∈ I,
then ξ_{n+1} = X(τ_0 + ⋯ + τ_n) and
τ_0 + ⋯ + τ_{n+1} = inf {t : τ_0 + ⋯ + τ_n ≤ t and X(t) ≠ ξ_{n+1}}.
Figure 2.
(33) Theorem. With respect to P_i, the process ξ_0, ... is a partially defined
discrete time Markov chain, with stationary transitions Γ, and starting state i.
Given ξ_0, ..., the holding times τ_0, ... are conditionally P_i-independent and
exponential with parameters q(ξ_0), .... In other terms, let i_0 = i, ..., i_n ∈ I.
Given that ξ_0, ..., ξ_n are defined, and ξ_0 = i_0, ..., ξ_n = i_n, the random
variables τ_0, ..., τ_n are conditionally P_i-independent and exponentially
distributed, with parameters q(i_0), ..., q(i_n).
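Theorem (33) doubles as a simulation recipe: run the discrete chain Γ, and attach conditionally independent exponential holding times with parameters q(·). Here is a minimal sketch in that spirit; the three-state GAMMA and the rates q are hypothetical, chosen only for illustration, and GAMMA is taken stochastic, so the path never dies.

```python
import random

# Hypothetical jump matrix GAMMA (rows as (state, probability) pairs)
# and holding-time parameters q, standing in for the book's Gamma and q(.).
GAMMA = {0: [(1, 0.5), (2, 0.5)],
         1: [(0, 1.0)],
         2: [(0, 0.3), (1, 0.7)]}
q = {0: 1.0, 1: 2.0, 2: 0.5}

def sample_path(i, horizon, rng=random.Random(0)):
    """Theorem (33) as a recipe: the successive states form a discrete time
    Markov chain with transitions GAMMA; given the states, the holding
    times are independent exponentials with parameters q(state)."""
    t, state, path = 0.0, i, []
    while t < horizon:
        hold = rng.expovariate(q[state])          # exponential, rate q(state)
        path.append((state, t, min(t + hold, horizon)))
        t += hold
        states, weights = zip(*GAMMA[state])
        state = rng.choices(states, weights)[0]   # jump according to GAMMA
    return path

path = sample_path(0, horizon=10.0)
```

The intervals abut and tile [0, 10]; successive states differ because this GAMMA has no diagonal entries.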
PROOF. Let t_0, ..., t_n be positive real numbers. The thing to prove is
P_i{A} = e^{−S}·π,
where
A = {ξ_0 = i_0, ξ_1 = i_1, ..., ξ_n = i_n and τ_0 ≥ t_0, τ_1 ≥ t_1, ..., τ_n ≥ t_n}
and
S = q(i_0)t_0 + ⋯ + q(i_n)t_n
and
π = Γ(i_0, i_1) ⋯ Γ(i_{n−1}, i_n).
Let
B = {ξ_0 = i_1, ..., ξ_{n−1} = i_n and τ_0 ≥ t_1, ..., τ_{n−1} ≥ t_n}.
Put i = i_0 and j = i_1 and t = t_0. Let Y be the post-exit process. Let Y_n(·, ω)
be Y(·, ω) retracted to R. I claim
P_i{Y(0) = j but Y_n ∉ Ω_v} = 0.
Indeed, Y(·, ω) is the translate of part of X(·, ω), has only finitely many
k-intervals in any finite time interval, and is continuous at 0. If Y(·, ω) is
continuous and I-valued at positive r ∈ R, and Y(·, ω) is I-valued at 0, then
Y_n(·, ω) ∈ Ω_v. So (29) gets the claim. Exclude this null set. Then
A = {τ_0 ≥ t and X(0) = i and Y(0) = j} ∩ {Y_n ∈ B}.
7.4] THE STRONG MARKOV PROPERTY 229
For a moment, let τ be the first holding time in X. According to (21), the
post-τ process Y is a finitary P-chain on (0, ∞), and is independent of τ.
The strong Markov property (41) makes a similar but weaker assertion for a
much more general class of times τ. In suggestive but misleading language:
the post-τ process is conditionally a finitary P-chain on (0, ∞), given the
process to time τ. By example (6.182), the post-τ process need not be independent
of the process to time τ. Here is the program for proving strong Markov.
First, I will prove a weak but computational form of the assertion for
constant τ, in (34). Using (34), I will get this computational assertion for
random τ in (35). I can then get strong Markov (38) on the set {X(τ) ∈ I}.
General strong Markov finally appears in (41).
230 THE STABLE CASE [7
You may wish to look at Sections 6.6 and 6.7 when you think about the
present material. David Gilat forced me to rewrite this section one more time
than I had meant to.
Let ℱ(t) be the σ-field generated by X(s) for 0 ≤ s ≤ t.
(34) Lemma. Let 0 ≤ s_0 < s_1 < ⋯ < s_M and let i_0, i_1, ..., i_M ∈ I. Let
0 ≤ t < ∞ and let D ∈ ℱ(t). Then
P_i{D and X(t + s_m) = i_m for m = 0, ..., M}
= P_i{D and X(t + s_0) = i_0} · π,
where
π = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).
PROOF. Let T_n be the least m/2^n greater than T. This approximation differs
from the one in (21). Let B_n be the event that X(T_n + s_m) = i_m for
m = 0, ..., M. By regularity,
lim sup (A ∩ B_n) ⊂ A ∩ B ⊂ lim inf (A ∩ B_n),
so P_i{A ∩ B_n} → P_i{A ∩ B} by Fatou. Let
C_{n,m} = {(m − 1)/2^n ≤ T < m/2^n}.
Then
P_i{A ∩ B_n} = Σ_{m=1}^∞ P_i{A ∩ B_n ∩ C_{n,m}}.
But by (34),
P_i{A ∩ B_n ∩ C_{n,m}} = P_i{A ∩ C_{n,m} ∩ [X(2^{−n}m + s_0) = i_0]} · π,
because
Consequently,
P_i{A ∩ B_n} = P_i{A and X(T_n + s_0) = i_0} · π.
PSEUDO PROOF. Let T be the least t, if any, with X(t) = j, and T = ∞ if none.
Then T is a Markov time. Suppose P_i{T < ∞} = 1 for a moment. Let Y be
the post-T process. Let σ be the first holding time in X.
(a) θ = σ ∘ Y.
(b) The P_i-distribution of Y is P_j.
(c) The P_i-distribution of θ coincides with the P_j-distribution of σ.
(d) The last distribution is exponential with parameter q(j), by (21).
This proof is perfectly sound in principle, but it breaks down in detail. The
time domain of a Y sample function is [0, ∞). But σ is defined only on a
space of functions with time domain R. And P_j also acts only on this space. So
(a) and (b) stand discredited, for the most sophistical of reasons. For a
quick rescue, let S be the retraction of Y to time domain R.
REAL PROOF. (a) θ = σ ∘ S on {S ∈ Ω_v}, and P_i{S ∈ Ω_v} = 1.
(b) The P_i-distribution of S is P_j.
(c, d) stay the same.
I would now like to set this argument up in a fair degree of generality. Let
Q be the set of all functions w from R to I. Let X(r, w) = w(r) for w ∈ Q
and r ∈ R. Give Q the product σ-field, namely the smallest σ-field over which
all X(r) are measurable. Here is the shift mapping S from Δ to Q:
(37) X(r, Sw) = Y(r, w) = X[T(w) + r, w].
You should check that S is measurable. Here is the strong Markov property
on {X(T) ∈ I}; the set {X(T) = φ} is tougher, and I postpone dealing with it
until (41).
(38) Theorem. Suppose T is Markov. Let Δ = {T < ∞}, and let Y be the
post-T process. Define the shift S by (37).
(a) P_i{Δ and X(T) ∈ I but S ∉ Ω_v} = 0.
(b) If ω ∈ {Δ and X(T) ∈ I and S ∈ Ω_v}, then Y = X ∘ S; that is,
Y(t, ω) = X(t, Sω) for all t ≥ 0.
(c) Suppose A ∈ ℱ(T+) and A ⊂ {Δ and X(T) = j ∈ I}. Suppose B is a
measurable subset of Q. Then
P_i{A and S ∈ B} = P_i{A} · P_j{B}.
PROOF. Claim (a). Suppose ω ∈ Ω_v. Then Y(·, ω) is the translate of part
of X(·, ω), is continuous at 0, and has only finitely many k-intervals on
finite intervals. Now use (36d).
Claim (b). Use the definitions.
Claim (c). Use (35) to handle the special B, of the form
{X(s_m) = i_m for m = 0, ..., M},
with 0 ≤ s_0 < ⋯ < s_M and i_0, ..., i_M in I. Then use (10.16).
Claim (d). Use claim (c). *
The general statement (41) of strong Markov is quite heavy. Here is some
explanation of why a light statement doesn't work. Suppose for a bit that
P_i{T < ∞ and X(T) = φ} = 1.
Then ℱ(T+) and Y are usually dependent: see (6.182) for an example. At
the beginning, I said that Y is conditionally a finitary P-chain on (0, ∞),
given ℱ(T+). This is much less crisp than it sounds. To formalize it, I would
have to introduce the conditional distribution of Y given ℱ(T+). Unless one
takes precautions, this distribution would act only on the meager collection
of product measurable subsets of I^[0,∞); it loses the fine structure of the
sample functions.
Remember the mapping S from (37). What does it mean to say that the
conditional distribution of S is a finitary P-chain on positive R? To answer
this question, introduce the class P of probabilities μ on Q, having the
properties:
(40a) μ{X(r) ∈ I} = 1 for all positive r ∈ R;
for all ω ∈ A. Integrate both sides of this inequality over A. On the right,
you get P_i{A}. On the left, you get
P_i{A and S ∈ [X(r) ∈ I]} = P_i{A and Y(r) ∈ I} = P_i{A}
by (36c). So, strict inequality holds almost nowhere, proving (43).
Let s = (s_0, ..., s_M) be an (M + 1)-tuple of elements of R, with
0 ≤ s_0 < ⋯ < s_M. Let i = (i_0, ..., i_M) be an (M + 1)-tuple of elements of
I. Let
B = B(s, i) = {X(s_m) = i_m for m = 0, ..., M}
C = C(s, i) = {X(s_0) = i_0}
π = π(s, i) = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).
Let G(s, i) be the set of ω ∈ Δ satisfying
(44) Q(ω, B) = Q(ω, C) · π.
I say
(45) G(s, i) ∈ ℱ(T+) and P_i{Δ \ G(s, i)} = 0.
The measurability is clear. To proceed, integrate both sides of (44) over an
arbitrary A ∈ ℱ(T+) with A ⊂ Δ. On the left, you get
P_i{A and Y(s_m) = i_m for m = 0, ..., M}.
On the right you get
P_i{A and Y(s_0) = i_0} · π.
These two expressions are equal by (35). Now (10.10a) settles (45). But
Λ_P = [∩_r G(r)] ∩ [∩_{s,i} G(s, i)].
NOTE. Strong Markov (38 and 41) holds for any μ ∈ P in place of P_i;
review the proofs. *
Given the ordering of the states, the holding time on each visit to state i is
exponential with parameter q(i), independent of other holding times. This
can be made precise in various ways. For example, let D be a finite set of
states. Let i_1, ..., i_n ∈ D, not necessarily distinct. Suppose q(i_1), ..., q(i_n)
positive. You may wish to review (1.24) before tackling the next theorem.
(46) Theorem. Let μ ∈ P. Given that {X(t): 0 ≤ t < ∞} pays at least n
visits to D, the 1st being to i_1, ..., the nth being to i_n, the holding times on
these visits are conditionally μ-independent and exponential with parameters
q(i_1), ..., q(i_n).
PROOF. Let A be the event X visits D at least once, the 1st visit being to i_1.
Let σ_1 be the holding time on this visit. Let B be the event X visits D at least
n + 1 times, the 2nd visit being to i_2, ..., the n + 1st to i_{n+1}. On B, let
σ_2, ..., σ_{n+1} be the holding times on visits 2 through n + 1. Let C be the
event X visits D at least n times, the first visit being to i_2, ..., the nth to
i_{n+1}. On C, let τ_1, ..., τ_n be the holding times on the n visits. Let t_1, ..., t_{n+1}
be nonnegative numbers. Let
G = {A and B and σ_m > t_m for m = 1, ..., n + 1}
H = {C and τ_m > t_{m+1} for m = 1, ..., n}.
If μ ∈ P, I claim
(a) μ{G | A ∩ B} = P_{i_1}{G | B}.
Argument for (a). Let φ be the time of first visiting D. Then φ is Markov.
Let S_φ be X(φ + ·) retracted to R. Confine ω to Ω_v ∩ S_φ^{−1}Ω_v, a set of μ-
probability 1 by (38 on μ). Then
A ∩ B = A ∩ {X(φ) = i_1 and S_φ ∈ B}
G = A ∩ {X(φ) = i_1 and S_φ ∈ G}.
So (38 on μ) implies
μ{A ∩ B} = μ{A} · P_{i_1}{B}
μ{G} = μ{A} · P_{i_1}{G}.
Divide to get (a).
Confine ω to {Ω_v and X(0) = i_1}. There, σ_1 coincides with the first holding
time in X, which is Markov. Let S_1 be X(σ_1 + ·) retracted to R. Let ν be the
P_{i_1}-distribution of S_1; so ν depends on i_1. I claim
(b) P_{i_1}{G | B} = ν{H | C} · e^{−q(i_1)t_1}.
Argument for (b). Confine ω to {Ω_v and X(0) = i_1 and S_1 ∈ Ω_v}, which
has P_{i_1}-probability 1 by (41a). There,
B = {S_1 ∈ C}
G = {S_1 ∈ H} ∩ {σ_1 > t_1}.
7.5] THE MINIMAL SOLUTION 237
Using (21),
P_{i_1}{B} = ν{C}
P_{i_1}{G} = ν{H} · e^{−q(i_1)t_1}.
Divide to clinch (b).
Combine (a) and (b):
μ{G | A ∩ B} = ν{H | C} · e^{−q(i_1)t_1}.
But ν ∈ P by (21), so induction wins again. *
NOTE. Specialize D = {j}. Then (46) asserts: given X visits j at least n
times, the first n holding times in j are conditionally independent and
exponential, with common parameter q(j). This is the secret explanation for
the proof of (8).
REFERENCES. Strong Markov was first treated formally by Blumenthal
(1957) and Hunt (1956), but was used implicitly by Lévy (1951) to prove (46).
For another discussion, see (Chung, 1960, II.9).
The results of this section, which can be skipped on a first reading of the
book, are due to Feller (1945, 1957). For another treatment, see Chung
(1960, II.18). Let Q be a real-valued matrix on the countable set I, with
(47a) q(i) = −Q(i, i) ≥ 0;
(47b) Q(i, j) ≥ 0 for i ≠ j;
(47c) Σ_j Q(i, j) ≤ 0.
NOTE. Any generator Q with all its entries finite satisfies (47), by (5.10,
14). If I is finite, or even if sup_i q(i) < ∞, then (47) makes Q a generator by
(5.29).
When is there a standard stochastic or substochastic semigroup P with
P′(0) = Q? When is P unique? To answer these questions, at least in part,
define Γ by
Γ(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
       = 0 elsewhere.
Define a minimal Q-process starting from i as follows. The order of states is a
partially defined discrete time Markov chain with stationary substochastic
transitions Γ; the holding time on each visit to j is exponential with parameter
q(j), independent of other holding times and the order of states; the sample
function is continuous from the right where defined. In general, there is
positive probability that a sample function will be defined so far only on a
finite interval. The most satisfying thing to do is simply to leave the process
partially defined. To avoid new definitions, however, introduce an absorbing
state ∂ ∉ I. When the sample function is still undefined, set it equal to ∂.
The minimal Q-process then has state space I ∪ {∂}; starting from ∂, it
remains in ∂ forever. The minimal Q-process is Markov, with stationary
standard transitions, say P_∂. And P_∂ is a stochastic semigroup on I ∪ {∂},
with ∂ absorbing.
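In finite truncations the passage from Q to the pair (Γ, q), and the resulting minimal process, can be sketched directly. The 3×3 matrix Q below and the sentinel DEAD standing in for ∂ are illustrative assumptions, not the book's construction (5.39); the third row of Q leaks mass, so its Γ-row is substochastic.

```python
import random

# A hypothetical stable Q-matrix on I = {0, 1, 2}; rows sum to <= 0,
# and state 2 leaks mass, so the jump matrix GAMMA is substochastic there.
Q = [[-2.0, 1.0, 1.0],
     [ 0.5, -1.0, 0.5],
     [ 0.2, 0.3, -1.0]]
DEAD = None  # stands in for the absorbing state "d" (the book's del)

def gamma_and_q(Q):
    """Recover q(i) = -Q(i,i) and GAMMA(i,j) = Q(i,j)/q(i) for i != j."""
    n = len(Q)
    q = [-Q[i][i] for i in range(n)]
    GAMMA = [[Q[i][j] / q[i] if i != j and q[i] > 0 else 0.0
              for j in range(n)] for i in range(n)]
    return GAMMA, q

def minimal_process(i, horizon, rng=random.Random(1)):
    """State at time `horizon` of one sample path of the minimal Q-process:
    jump chain GAMMA, exponential holding times with rates q; a leak in a
    substochastic row sends the path to DEAD forever."""
    GAMMA, q = gamma_and_q(Q)
    t, state = 0.0, i
    while t < horizon:
        if q[state] == 0:              # a state that is absorbing for Q
            return state
        t += rng.expovariate(q[state])
        if t >= horizon:
            return state
        u, acc, nxt = rng.random(), 0.0, DEAD
        for j, p in enumerate(GAMMA[state]):
            acc += p
            if u < acc:
                nxt = j
                break
        if nxt is DEAD:                # substochastic leak: stuck in DEAD
            return DEAD
        state = nxt
    return state
```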
NOTE. All minimal Q-processes starting from i have the same distribution.
One rigorous minimal Q-process can be constructed by (5.39): just plug
in the present matrix Q for the (5.39) matrix Q; more exactly, plug in the
present Γ for the (5.39) matrix Γ, and the present q for the (5.39) function q.
Check that the (5.39) matrix Q coincides with the present Q. The construction
embodied in (5.39) produces a process, which I just described informally; and
(5.39) asserts that the formal process is Markov with stationary transitions
P_∂, which are standard and stochastic on I ∪ {∂}. Let P be the retraction of
P_∂ to I:
P(t, i, j) = P_∂(t, i, j) for t ≥ 0 and i, j in I.
Now P is a substochastic semigroup on I, because ∂ is absorbing. And P
is standard because P_∂ is. And (5.39) shows P′(0) = Q.
(48) Lemma. P is stochastic iff the minimal Q-process starting from any
i ∈ I has almost all its sample functions I-valued on [0, ∞). Indeed, Σ_{j∈I} P(t, i, j)
is just the probability that a minimal Q-process starting from i is I-valued on
[0, t].
PROOF. If the process hits ∂ at all on [0, t], it does so at a rational time,
and is then stuck in ∂ at time t. *
It is convenient to generalize the notion of minimal Q-process slightly.
Let the stochastic process {Z(t): 0 ≤ t < ∞} on a triple (𝒳, Σ, 𝒫) be a
regular Q-process starting from i ∈ I, namely: Z is a Markov process starting
from i, with state space I ∪ {∂}, and stationary standard stochastic transitions
P on I ∪ {∂}; moreover, ∂ is absorbing for P, and the retraction of
P to I has generator Q; finally, the sample functions of Z are I ∪ {∂}-valued
and regular: ∂ is an isolated point of I ∪ {∂}. Let T be the least t if any with
Z(t) ∉ I or lim_{s↓t} Z(s) ∉ I;
if none, T = ∞. Let Z*(t) = Z(t) if t < T and Z*(t) = ∂ if t ≥ T.
from i can reach on first leaving i. However, P may be the only standard
substochastic semigroup with generator Q, as is the case when sup_i q(i) < ∞.
The main results of this section, which can be skipped on a first reading of
the book, are due to Doob (1945). For another treatment, see Chung
(1960, II.17). Let P be a standard stochastic semigroup on the countable set I;
the finite I case has already been dealt with, in (5.29). Let P′(0) = Q; let
q(i) = −Q(i, i), and suppose q(i) finite for all i. The problem is to decide
when the following two equations hold:
P′(t, i, j) = Σ_k Q(i, k)P(t, k, j)   (backward)
P′(t, i, j) = Σ_k P(t, i, k)Q(k, j)   (forward).
By (57) below, b(·, i, j) is continuous on [0, ∞); here is a more interesting
argument. Suppose s_n → s. Using (29),
As (5.14) implies, Σ_{k≠i} Q(i, k) < ∞. As (5.9) implies, P(·, k, j) is continuous.
By dominated convergence, a(·, i, j) is finite and continuous on [0, ∞). Let
Δ(i, j) be 1 or 0, according as i = j or i ≠ j.
(52) Theorem.
(a) P(t, i, j) = Δ(i, j)e^{−q(i)t} + ∫_0^t e^{−q(i)u} [b(t − u, i, j) + a(t − u, i, j)] du.
The first holding time in X is exponential with parameter q(i). Fubini this:
P_i{X(t) = j} = ∫_0^t e^{−q(i)u} q(i) P_i{Y(t − u) = j} du.
Clearly,
{Y(s) = j} = ∪_k {Y(0) = k and Y(s) = j},
the union over k ∈ I ∪ {φ}. So
P_i{Y(s) = j} = Σ_k P_i{Y(0) = k and Y(s) = j}.
Use (21) again for k ∈ I:
q(i)P_i{Y(0) = k and Y(s) = j} = q(i)Γ(i, k)P(s, k, j)
= Q(i, k)P(s, k, j).
By definition,
q(i)P_i{Y(0) = φ and Y(s) = j} = b(s, i, j).
Claim (b). Put s = t − u in (a) and differentiate with respect to t, using
the continuity of b and a. Then use (a) again.
Claim (c). Use (b) and the definition of a. *
(55) Theorem. (a) If (53) holds for any t > 0, it holds for all t; and then
(56) P(t, i, j) = Δ(i, j)e^{−q(i)t} + ∫_0^t e^{−q(i)u} a(t − u, i, j) du.
∫_0^t e^{−q(i)u} b(t − u, i, j) du = 0.
This forces b to vanish somewhere, so everywhere. Use (52c). *
(58) Theorem. The following three relations are all equivalent:
(a) Relation (53) holds for all j.
(b) The X sample function jumps from i to φ with P_i-probability 0.
(c) Σ_j Q(i, j) = 0.
PROOF. (a) iff (b). As usual, suppose q(i) > 0. Let T be the first holding
time, and confine ω to
{X(0) = i and T < ∞ and X(T+) ∈ I},
and is now finite and continuous, by the argument for a. To remove condition
(59), use the argument on the standard substochastic semigroup t → e^{−t}P(t).
Let
f(t, i, j) = P′(t, i, j) + q(j)P(t, i, j) − p(t, i, j),
a continuous function on [0, ∞). By Fatou, f ≥ 0.
Informally, (52b) reveals b̂ as the b of P̂. That is, b̂(t, i, j) is q(i) times the
probability that a P̂-chain with regular sample functions starting from i
jumps to φ on leaving i, and is in j at time t after the jump. So, (57) holds with
hats on. By algebra,
μ(i)f(t, i, j)/μ(j) = b̂(t, j, i);
by more algebra, hatted (57) is (60). To remove condition (59), apply the
argument to the standard substochastic semigroup t → e^{−t}P(t). The fudge
factors cancel. *
Of course, (52) and (57) work for substochastic P, by the usual maneuver
of adding ∂. The argument in (55) shows
(61) Corollary. f(t, i, j) is identically 0 or strictly positive on (0, ∞).
If p is continuously differentiable on [0, ∞), and 0 ≤ q < ∞ is a real
number, integration by parts shows
Recall that
f(t, i, j) = P′(t, i, j) + q(j)P(t, i, j) − p(t, i, j),
where
p(t, i, j) = Σ_{k≠j} P(t, i, k)Q(k, j)
is finite and continuous.
(63) Theorem.
(a) P′(t, i, j) = f(t, i, j) − q(j)P(t, i, j) + p(t, i, j).
(b) P(t, i, j) = Δ(i, j)e^{−q(j)t} + ∫_0^t [f(s, i, j) + p(s, i, j)] e^{−q(j)(t−s)} ds.
(c) In particular,
(64) P′(t, i, j) = Σ_k P(t, i, k)Q(k, j)
is equivalent to
(65) f(t, i, j) = 0.
(d) If (64) holds for any t > 0, it holds for all t; and this is equivalent to
P(t, i, j) = Δ(i, j)e^{−q(j)t} + ∫_0^t p(s, i, j)e^{−q(j)(t−s)} ds.
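When I is finite, P(t) = exp(tQ) and both the backward and the forward equation can be checked numerically. The sketch below is only a sanity check on a hypothetical 3-state generator, not part of Doob's argument; it uses a plain power-series matrix exponential.

```python
# Finite-I check of P'(t) = Q P(t) (backward) and P'(t) = P(t) Q (forward),
# with P(t) = exp(tQ) computed by Taylor series. Q is illustrative only.
Q = [[-2.0, 1.0, 1.0],
     [ 1.0, -1.0, 0.0],
     [ 0.5, 0.5, -1.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=60):
    """exp(M) by Taylor series; fine for small, well-scaled matrices."""
    n = len(M)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

t, h = 0.7, 1e-5
P = lambda s: expm([[s * x for x in row] for row in Q])
Pdot = [[(a - b) / (2 * h) for a, b in zip(r1, r2)]
        for r1, r2 in zip(P(t + h), P(t - h))]   # central difference P'(t)
back = mat_mul(Q, P(t))   # backward side: sum_k Q(i,k) P(t,k,j)
fwd = mat_mul(P(t), Q)    # forward side:  sum_k P(t,i,k) Q(k,j)
```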
is the P_i-probability that the sample function experiences at least one discontinuity
on [0, t], and the last discontinuity is a jump from some real state
to j. Now (63b) reveals f(s, i, j) ds as the P_i-probability of a jump from φ
to j in (s, s + ds). All these statements are rigorous in their way. To begin
checking this out, let γ be the time of the last discontinuity of X on or before
time t, on the set D where X has at least one such discontinuity. That is, D
is the complement of {X(s) = X(t) for 0 ≤ s ≤ t}. On D, the random variable
γ is the sup of s < t with X(s) ≠ X(t). By regularity,
X(γ) = lim_{s↓γ} X(s) = X(t);
X(γ−) = lim_{s↑γ} X(s) ≠ X(t)
is a random element of Ī.
NOTE. D and γ depend on t.
(66) Proposition. Let j, k ∈ I and k ≠ j.
(a) P_i{D and X(γ−) = k and X(t) = j} = ∫_0^t P(s, i, k)Q(k, j)e^{−q(j)(t−s)} ds.
(b) P_i{D and X(γ−) ∈ I and X(t) = j} = ∫_0^t p(s, i, j)e^{−q(j)(t−s)} ds.
(c) P_i{D and X(γ−) = φ and X(t) = j} = ∫_0^t f(s, i, j)e^{−q(j)(t−s)} ds.
PROOF. Claim (a). Without real loss, put t = 1. Let D_n be the event
X(m/2^n) ≠ X(1) for some m = 0, ..., 2^n − 1. On D_n, let γ_n be the greatest
m/2^n < 1 with X(m/2^n) ≠ X(1). Using the regularity, check D_n ↑ D and
γ_n ↑ γ and
{D_n and X(γ_n) = k and X(1) = j} → {D and X(γ−) = k and X(1) = j}.
7.6] THE BACKWARD AND FORWARD EQUATIONS 249
P(1/2^n, k, j) = (1/2^n)Q(k, j) + o(1/2^n),
and
P(1/r, j, j)^m → e^{−q(j)u} as m/r → u, uniformly in 0 ≤ u ≤ 1.
Consequently, (67) converges to the right side of claim (a). *
Claim (b). Sum claim (a) over k ∈ I\{j}.
Claim (c). Subtract claim (b) from (63b).
If X(γ−) = k and X(t) = j, call the last discontinuity of X on [0, t] a
jump from k to j; even if k = φ. Let μ_{ij}(t) be the P_i-mean number of jumps
from φ to j in [0, t].
Let 0 < γ_1 < γ_2 < ⋯ be the times of the first, second, ... jumps from
φ to j. If there are fewer than n jumps, put γ_n = ∞. Thus γ_n → ∞ as
n → ∞. If γ_n < ∞, let T_n be the length of the j-interval whose left endpoint
is γ_n. Now γ_n + T_n ≤ γ_{n+1}; while
{D and X(γ−) = φ and X(t) = j} = ∪_{n=1}^∞ {γ_n ≤ t < γ_n + T_n}.
So,
P_i{D and X(γ−) = φ and X(t) = j} = Σ_{n=1}^∞ P_i{γ_n ≤ t < γ_n + T_n}.
Fix a positive real number t. I say that {γ_n ≤ t} ∈ ℱ(t). Indeed, let D be
a countable dense subset of [0, t], with t ∈ D. For a < b in D, let E(a, b) be
the event that X(b) = j, and for all finite subsets J of I there are binary
rational r ∈ (a, b) with X(r) ∉ J. Let F(s) be the event that for all positive
integers m, there are a and b in D with
s < a < b ≤ t and b − a < 1/m and E(a, b).
Then
{γ_n ≤ t} = ∪_s {γ_{n−1} ≤ s and F(s) : s ∈ D},
proving that γ_n is a Markov time. Clearly X(γ_n) = j on {γ_n < ∞}.
By (21) and strong Markov (38), given {γ_n < ∞}, the variable T_n is conditionally
P_i-exponential with parameter q(j), independent of γ_n. Let
ν_n(t) = P_i{γ_n ≤ t}. By Fubini,
P_i{γ_n ≤ t < γ_n + T_n} = ∫_0^t e^{−q(j)(t−s)} dν_n(s).
*
PROOF. Use (21) and (52b) for the first display. Use (57) and (60) for the
second, with b and f expressed in terms of P using (52b) and (63b).
1. AN OSCILLATING SEMIGROUP
The elements of (5.39) are I, Γ, q. The state space I has already been defined.
Define the substochastic matrix Γ on I as follows, with d = b or c and with
n = 1, 2, ...:
Γ[a, (d, n, 1)] = d_n;
Γ[(d, n, m), (d, n, m + 1)] = 1 for m = 1, ..., f(n) − 1;
Γ[(d, n, f(n)), d] = 1;
all other entries in Γ vanish. Define the function q on I as follows, with
d = b or c and n = 1, 2, ...:
q(a) = 1;
q(d, n, m) = q_{n,m};
q(d) = 0.
254 MORE EXAMPLES FOR THE STABLE CASE [8
Figure 1.
Now (5.39) yields a process X and a probability π_i which makes X Markov
with stationary transitions P and starting state i. The semigroup P is standard
and stochastic on I ∪ {∂}, where ∂ is absorbing. As you will agree in a
minute, X cannot really reach ∂ starting from i ∈ I; so P is standard and
stochastic when retracted to I. Use (5.39) to check (1a–b).
The visiting process in (5.39) was called {ξ_n}, and the holding time process,
{τ_n}. Let Ω_0 be the set where ξ_0 = a, and ξ_1 = (d, n, 1) for some d and n,
while ξ_1 = (d, n, 1) implies:
Lemmas
Here are some preliminaries to choosing f(n) and q_{n,m}. For (7), let U_n
and V_n be random variables on a probability triple (Ω_n, ℱ_n, 𝒫_n). Suppose
U_n has a continuous 𝒫_n-distribution function F, which does not depend on n.
Suppose V_n converges in 𝒫_n-probability to the constant v, as n → ∞. Let
G be the 𝒫_n-distribution function of U_n + v, so
G(t) = F(t − v).
(7) Lemma. 𝒫_n{U_n + V_n ≤ t} → G(t) uniformly in t, as n → ∞.
and
Let
(8) Fact. S_n has mean Σ_m 1/q_{n,m} and variance Σ_m 1/q_{n,m}².
PROOF. Use (5.31). *
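Fact (8) is easy to corroborate by Monte Carlo; the rates below are arbitrary stand-ins for q_{n,1}, ..., q_{n,f(n)}.

```python
import random

# Check Fact (8): a sum S of independent exponential holding times with
# rates q_1, ..., q_f has mean sum 1/q_m and variance sum 1/q_m^2.
rates = [1.0, 2.0, 4.0]                         # illustrative rates
mean_exact = sum(1 / q for q in rates)          # 1 + 0.5 + 0.25 = 1.75
var_exact = sum(1 / q ** 2 for q in rates)      # 1 + 0.25 + 0.0625 = 1.3125

rng = random.Random(0)
samples = [sum(rng.expovariate(q) for q in rates) for _ in range(200_000)]
mean_mc = sum(samples) / len(samples)
var_mc = sum((s - mean_mc) ** 2 for s in samples) / len(samples)
```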
Choosing f and q
is so small that (14) holds. I can do this because (8) and Chebyshev make
S_N − θ_N → 0 in probability. But (11) and (12) make
θ_N < t_N < t_{N−1} < ⋯ < t_1,
ARGUMENT FOR (9). I will continue with the notation established for
(5.39). Let
As usual, d = b or c. Let
d(n, t) = {ξ_1 = (d, n, 1) and τ_0 + σ_n ≤ t}.
As (4) implies, {X(t) = d} differs by a π_a-null set from ∪_{n=1}^∞ d(n, t). As (5, 6)
imply,
So
(15)
Use (14) with N = n:
𝒫{τ_0 + S_n ≤ t_n} ≈ 𝒫{τ_0 + θ_n ≤ t_n}.
Abbreviate
e(x) = 1 − e^{−x}.
Let ν = 1, ..., n − 1. Then (13) with n for N and ν for n makes
𝒫{τ_0 + S_ν ≤ t_n} ≤ ε_n t_n.
Since Σ_ν d_ν ≤ 1, relation (10) makes
Combine (15–18) to get (9). *
Modifications
The conclusions and proof of (1) apply to this modified chain, provided
that {t_n} satisfies (19) in addition to (11–14):
(19)
8.1] AN OSCILLATING SEMIGROUP 259
Let
d⁺(n, t) = {ξ_1 = (d, n, 1) and τ_0 + σ_n ≤ t < τ_0 + σ_n + τ_{f(n)+1}}
{∪_{n=1}^∞ d⁺(n, t)} ⊂ {X(t) = d} ⊂ {∪_{n=1}^∞ d⁺(n, t)} ∪ {∪_{n=1}^∞ ∪_{d=b,c} d*(n, t)}.
So the new P(t, a, d) is trapped in [D⁺(t), D⁺(t) + D*(t)], where
D⁺(t) = Σ_{n=1}^∞ π_a{d⁺(n, t)}
and
D*(t) = Σ_{n=1}^∞ Σ_{d=b,c} π_a{d*(n, t)}.
But
and
Check
{τ_0 + S_n ≤ t} ∩ {τ_1 > t} ⊂ {τ_0 + S_n ≤ t < τ_0 + S_n + τ_1}
⊂ {τ_0 + S_n ≤ t};
so
𝒫{τ_0 + S_n ≤ t} · 𝒫{τ_1 > t} ≤ 𝒫{τ_0 + S_n ≤ t < τ_0 + S_n + τ_1}
≤ 𝒫{τ_0 + S_n ≤ t};
and
π_a{d⁺(n, t)} ≈ d_n 𝒫{τ_0 + S_n ≤ t}
as t → 0, uniformly in n. This means you can estimate D⁺(t) by the old
P(t, a, d). Furthermore, Σ_{n,d} d_n = 1. So
D*(t_N) = 𝒫{τ_0 + τ_1 ≤ t_N} = o(d_N t_N)
by (10, 19). This term is trash. The overall conclusion: as N → ∞, the new
P(t_N, a, d) is asymptotic to the old P(t_N, a, d). *
Continue with the modified chain. Given the order of visits ξ_0, ξ_1, ...,
the holding times τ_0, τ_1, ... are independent and exponential, so I once
expected
π_a{τ_0 + τ_1 + τ_2 ≤ t} = o(t²) as t → 0.
Since b can be reached from a in two jumps but not in one, I also expected
P(t, a, b) ∼ t² as t → 0.
260 MORE EXAMPLES FOR THE STABLE CASE [8
Figure 2.
The construction
The elements of (5.39) are I, Γ, and q. The state space I has already been
defined. Define the substochastic matrix Γ on I as follows, with n = 1, 2, ...:
Γ[a, (n, 1)] = p_n;
Γ[(n, m), (n, m + 1)] = 1 for m = 1, ..., n − 1;
Γ[(n, n), b] = 1;
262 MORE EXAMPLES FOR THE STABLE CASE [8
*
The rest of the proof
PROOF OF (c). Relation (25) shows that except for a π_a-null set,
Therefore,
P(t, a, b) = Σ_{n=1}^∞ p_n F_n(t).
As (5.34) implies, for all t ≥ 0
(28) 0 ≤ F_n′(t) ≤ λ.
By dominated convergence
The density d_n of
S_n = τ_{n,1} + ⋯ + τ_{n,n}
is the convolution of n exponential densities with parameter n:
d_n(t) = nⁿ t^{n−1} e^{−nt}/(n − 1)! for t ≥ 0.
By Stirling,
(30)
Abbreviate e(t) for λe^{−λt}. So τ_0 has density e(·) on [0, ∞). And the density
F_n′ of τ_0 + S_n is the convolution of e(·) and d_n. Namely,
(31)
In particular,
(32) h^{−1}[F_n′(1 + h) − F_n′(1)] ≥ −λ² for −1 < h < 0 or h > 0.
Let −1 < h < 0 or 0 < h < ∞. Introduce the approximate second
derivatives
s(h) = h^{−1}[P′(1 + h, a, b) − P′(1, a, b)].
Of course, Σ_{n=1}^∞ p_n ≤ 1. So
You can modify the construction so P″(t, a, b) = ∞ for all t ∈ C, a given
countable subset of [0, ∞). Count C off as {t_1, t_2, ...}. Suppose first that all
t_ν are positive. Let I consist of a, b, and (ν, n, m) for m = 1, ..., n and
positive integers ν and n. Rework the chain so that it jumps from a to (ν, n, 1)
with positive probability p_{ν,n}, where:
Σ_{ν,n} p_{ν,n} = 1; and Σ_n p_{ν,n} n! = ∞ for each ν.
Make the chain jump from (ν, n, m) to (ν, n, m + 1) when m < n, and to b
when m = n. Make b absorbing. Let the holding time parameter for a be λ.
Let the holding time parameter for (ν, n, m) be n/t_ν.
As before,
P(t, a, b) = Σ_{ν,n} p_{ν,n} F_{ν,n}(t)
and
P′(t, a, b) = Σ_{ν,n} p_{ν,n} F′_{ν,n}(t),
where F_{ν,n} is the distribution function of the sum of n + 1 independent
exponential random variables, of which the first has parameter λ and the
other n have parameter n/t_ν. The reason is like (28). I will work on t_1, the
other t_ν being symmetric. Let
s(h) = h^{−1}[P′(t_1 + h, a, b) − P′(t_1, a, b)].
The rest is the same. *
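The Erlang density d_n above can be checked numerically, and Stirling's formula gives d_n(1) ≈ (n/2π)^{1/2}; since the display (30) is lost in this copy, that estimate is offered only as a plausible reading, used below as an assumption.

```python
import math

def erlang_density(n, t):
    """d_n(t) = n^n t^(n-1) e^(-nt) / (n-1)!, the density of the sum of n
    independent exponentials, each with parameter n."""
    return n ** n * t ** (n - 1) * math.exp(-n * t) / math.factorial(n - 1)

n = 30
# d_n integrates to 1 (Riemann sum on (0, 6], where almost all mass lives)...
grid = [k / 2000 for k in range(1, 12001)]
mass = sum(erlang_density(n, t) for t in grid) / 2000

# ...and peaks near t = 1, at height about sqrt(n / (2*pi)) by Stirling.
peak = erlang_density(n, 1.0)
stirling = math.sqrt(n / (2 * math.pi))
```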
The main results (35–36) of this section are taken from (Blackwell and
Freedman, 1968). They should be compared with (1.1–4) of ACM. Let I_n =
{1, 2, ..., n}. Let P_n be a generic standard stochastic semigroup on I_n.
(35) Theorem. For any δ > 0, there is a P_n with
I remind you that
t → (1/t) ∫_0^t f(s) ds
is nonincreasing iff
f(t) ≤ (1/t) ∫_0^t f(s) ds.
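The averaging fact follows from the identity A′(t) = [f(t) − A(t)]/t for A(t) = (1/t)∫_0^t f(s) ds, which the sketch below checks numerically on the arbitrary sample function f(s) = e^{−s}.

```python
import math

def f(s):
    return math.exp(-s)        # a convenient decreasing sample function

def A(t, steps=20000):
    """A(t) = (1/t) * integral_0^t f(s) ds, by the midpoint rule."""
    h = t / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h / t

t, h = 2.0, 1e-4
lhs = (A(t + h) - A(t - h)) / (2 * h)   # numerical A'(t)
rhs = (f(t) - A(t)) / t                 # the identity's right side
```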
(36) Theorem. For any K < ½, for any small positive ε, there is a P_n with
and
P_n(1, 1, 1) > 1 − ε.
I will prove (35) and (36) later. Here is some preliminary material, which
8.3] LARGE OSCILLATIONS IN P(t, 1, 1) 267
Figure 3.
Z(t) = 1 for 0 ≤ t < τ_0
and τ_0 + c ≤ t < τ_0 + c + τ_1
and τ_0 + c + τ_1 + c ≤ t < τ_0 + c + τ_1 + c + τ_2
Fubini says
𝒫{τ_0 ≤ t − c and Z*(t − c − τ_0) = 1} = ∫_0^{t−c} f(t − c − s) qe^{−qs} ds,
uniformly in bounded t.
INFORMAL PROOF. Consider a Markov chain with stationary transitions
P_n and starting state 1. The process moves cyclically 1 → 2 → ⋯ → n → 1.
The holding times are unconditionally independent and exponentially
distributed; the holding time in 1 has parameter q; the other holding time
parameters are (n − 1)/c. There are n − 1 visits to other states intervening
between successive visits to 1. So the gaps between the 1-intervals are
independent and identically distributed, with mean c and variance c²/(n − 1).
For large n, the first 10^10 gaps are nearly c, so P_n(t, 1, 1) is nearly f(t) for all
moderate t. *
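The gap distribution in the informal proof, with mean c and variance c²/(n − 1), hence nearly deterministic for large n, can be corroborated by simulation; n, c, and the sample size are arbitrary illustrative choices.

```python
import random

# A gap between successive visits to state 1 is the sum of n-1 holding
# times, each exponential with parameter (n-1)/c: mean c, variance
# c^2/(n-1), so the gaps concentrate near c as n grows.
n, c = 400, 1.0
rng = random.Random(0)

def one_gap():
    rate = (n - 1) / c
    return sum(rng.expovariate(rate) for _ in range(n - 1))

gaps = [one_gap() for _ in range(5000)]
mean_mc = sum(gaps) / len(gaps)
var_mc = sum((g - mean_mc) ** 2 for g in gaps) / len(gaps)
```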
FORMAL PROOF. Use (5.45). Let S be the set of right continuous I_n-valued
step functions. Let X be the coordinate process on S. Let {ξ_m} be the visiting
process, and let {τ_m} be the holding time process. The probability π = (P_n)_1
on S makes X Markov with transitions P_n and starting state 1. Let r(m) be
one plus the remainder when m is divided by n, so
r(m) ∈ I_n and r(m) ≡ m + 1 modulo n.
Let S_0 be the set where for all m = 0, 1, ...
0 < τ_m < ∞ and ξ_m = r(m).
Then
(44)
And with respect to π,
(45) τ_0, τ_1, ... are unconditionally independent and exponentially distributed,
the parameter for τ_m being q when m is a multiple of n, and
(n − 1)/c for other m.
Let θ_0, θ_1, ... be the successive holding times of X in 1, and let γ_0, γ_1, ...
be the successive gaps between the 1-intervals of X. Formally, on S_0 let
Fix t* with 0 < t* < ∞, and confine t to [0, t*]. Then
Consequently,
1 − h(q) = kq + o(q²);
h(q) − g(q) = ½k²q² + o(q³)
= ½[1 − h(q)]² + o([1 − h(q)]³).
Fix K with 0 < K < ½. Choose q* > 0 but so small that on (0, q*):
h is strictly decreasing;
Let 0 < ε < 1 − h(q*). Choose q′ with 0 < q′ < q*, so
h(q′) = 1 − ε.
Then
g(q′) < h(q′) − K[1 − h(q′)]² = 1 − ε − Kε².
By continuity, there is a positive q less than q′, but very close, with
h(q) > 1 − ε
g(q) < 1 − ε − Kε².
That is, you can find small positive q and c so that f = f(q, c; ·)
satisfies the two inequalities of (36). Now use (43) to approximate f by
P_n(·, 1, 1). *
4. AN EXAMPLE OF SPEAKMAN
P(t, 1, 2) = P(t, 2, 3) = P(t, 3, 1) = 1/3 + (2/3) e^{−3t/2} cos(3^{1/2}t/2 − 2π/3).
The same trigonometry, coupled with cos(2nπ) = 1, shows that P(nc) =
P(c)ⁿ. *
(51) Remark. P(t, 1, 1) is not a monotone function of t.
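Remark (51) can be seen numerically. The diagonal entry used below, P(t, 1, 1) = 1/3 + (2/3)e^{−3t/2} cos(3^{1/2}t/2), is the standard closed form matching the off-diagonal formula above; since the display is partly garbled in this copy, take the exact expression as an assumption.

```python
import math

def p11(t):
    """Diagonal entry of Speakman's three-state semigroup; the off-diagonal
    entries are the same expression with phase shifts of 2*pi/3."""
    return 1 / 3 + (2 / 3) * math.exp(-1.5 * t) * math.cos(math.sqrt(3) * t / 2)

# P(t,1,1) starts at 1, dips below the limiting value 1/3, then climbs back.
values = [p11(t) for t in (0.0, 2.0, 4.0)]
```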
Let n → ∞ and use (52) to get 𝒫{T_0 = k} = 0 for all k. Sum out k to get
1 = 0.
Figure 4.
Aside. You get the same effect by starting a p-walk at the value 0 with
time going forward from 0, and an independent (1 − p)-walk at the value 0
with time going backward from 0, provided you condition the second walk on
never returning to 0.
S_{−1}(·) on [0, τ_{−1})
S_0(·) on [0, ∞)
8.5] THE EMBEDDED JUMP PROCESS IS NOT MARKOV 277
*
But S(λ + ·) is obtained by laying the first lot of fragments together, end to
end; and S_i(·) is obtained in a similar way from the second lot.
This ξ will be the visiting process in X. Give Ω the smallest σ-field over which
all ξ(c) are measurable.
Some lemmas
To state (67), let F be a finite subset of C. Let f be a function from F to I.
Let W_F be the set of functions from F to (0, ∞), with the product σ-field.
*
PROOF. Use definition (66).
(68) Lemma. With respect to π_i, the processes ξ_0, ξ_1, ... are independent.
For m > 0, the π_i-distribution of ξ_m depends neither on i nor on m.
proving the first assertion through (10.16). The second assertion also follows
from (69). Suppose i(m, ·) and t(m, ·) do not depend on m > 0. Then s(m)
does not depend on m > 0. And π_i{D_m} depends neither on i nor on m > 0,
by definition (61). *
The set Ω_1, the index Λ, and the σ-field 𝒜
Fix M = 1, 2, ... and N = 0, 1, ... and j ∈ I.
(70) Definition. (a) Let Ω_1 be the subset of Ω_0, as defined in (60), where
ξ(M, ·) visits j at least N + 1 times.
(b) On Ω_1, let Λ be the index at which ξ(M, ·) visits j for the N + 1st
time. So
(71) ξ(M, Λ) = j,
and there are N integers n with n < Λ and ξ(M, n) = j.
(c) Let 𝒜 be the σ-field in Ω_1 generated by ξ(m, ·) with m < M and by
ξ(M, Λ − n) with n ≥ 0.
(d) Let Λ(ω, w) = Λ(ω) for ω ∈ Ω_1 and w ∈ W.
NOTATION. Ω_1, Λ, and 𝒜 all depend on M and N.
For (72) and the proof of (73), let 𝒜₀ be the σ-field in Ω generated by ξ(m, ·) with m < M. Let 𝒜₁ be the σ-field in Ω₁ generated by ξ(M, λ − n) with n ≥ 0. Check
(72) 𝒜 is the σ-field in Ω₁ generated by sets J ∩ K with J ∈ 𝒜₀ and K ∈ 𝒜₁.
∫_D h dr_i = r_i{D*} ∫_{Ω₁} h dr_i.
That is,
r_i{J ∩ K ∩ D} = r_i{J ∩ K} · r_i{D*};
and (73) holds for this h. By (72) and (10.16), the result holds for h = 1_A with A ∈ 𝒜. It then holds for simple 𝒜-functions by linearity, and nonnegative 𝒜-functions by monotone approximation. *
To state (74–79), let C* be the set of all pairs (m, n) ∈ C with m < M, together with the nonnegative integers. Let W* be the set of all functions from C* to (0, ∞), with the product σ-field. Let n_A be a nonnegative integer. Let C_A be a finite subset of C, with m < M for all (m, n) ∈ C_A. Suppose 0 < t_A(c) < ∞ for c ∈ C_A or c = 1, ..., n_A. Let W_A be the set of w ∈ W* such that
(74) w(c) > t_A(c) for c ∈ C_A, and
w(n) > t_A(n) for n = 1, ..., n_A, and
Σ{w(c): c ∈ C* but c ≠ 0} ≤ t < Σ{w(c): c ∈ C*}.
Check:
(75) W_A is a measurable subset of W*.
Define a mapping T_ω from W to W*:
(76) T_ω(w)(m, n) = w(m, n) for (m, n) ∈ C with m < M
T_ω(w)(n) = w[M, λ(ω) − n] for n = 0, 1, ....
Check:
(77) T_ω is measurable
(78) the η_{q(ω)}-distribution of T_ω is 𝒜-measurable, as ω varies over Ω₁.
Conclude from (75, 77–78):
(79) the function ω → η_{q(ω)}{T_ω ∈ W_A} is 𝒜-measurable on Ω₁.
(e) Let 𝒳₀ = Ω₀ × W₀, where Ω₀ was defined in (60). Give 𝒳₀ the product σ-field.
(f) On 𝒳₀, let
X(t) = ξ(m, n) when σ(m, n) ≤ t < σ(m, n + 1)
= φ when σ(m, n) ≤ t < σ(m, n + 1) for no (m, n).
Figure 5.
PROOF. You have to remember definitions (66) of π_i and (80) of 𝒳₀. I will refer to τ, defined in (80b). I will argue that
(84) ∫_{Ω₀×W} τ(m) dπ_i < ∞ for m = 0, 1, ....
By thinking: Lemma (68) makes τ₀, τ₁, ... independent; the τ_m are identically distributed for m > 0. Since τ₁ is positive,
(85a) ∫_{Ω₀} V(j, m) dr_i = 1/(2p − 1) for m > 0.
∫_{Ω₀×W} τ(m) dπ_i = ∫_{Ω₀} ∫_W τ(m)(ω, w) η_{q(ω)}(dw) r_i(dω) by (66)
= ∫_{Ω₀} Σ_n ∫_W w(m, n) η_{q(ω)}(dw) r_i(dω) by (64)
= ∫_{Ω₀} Σ_n 1/q[ξ(m, n)] r_i(dω) by (87)
= Σ_j (1/q(j)) ∫_{Ω₀} V(j, m)(ω) r_i(dω) by monotone convergence
≤ Σ_j 1/[q(j)(2p − 1)] by (86)
< ∞.
This completes the construction: (𝒳₀, π_i) is a bona fide probability triple by (63) and (83), and X is an Ī-valued process on this triple by (81). * Properties (54a–c) have already been claimed (82). At this point, you could check property (54d), and
(88) π_i{X(0) = i} = 1 for all i ∈ I.
(89) Lemma. π_i{X(t) ∈ I} = 1.
PROOF. Using (54b),
Lebesgue {t: X(t, x) = φ} = 0.
Let E be the set of t such that
π_j{X(t) = φ} > 0 for some j.
Use (81) and Fubini to deduce
(90) Lebesgue E = 0.
Fix i ∈ I. Let S map Ω into itself:
(Sω)(0, n) = ω(0, n + 1)
(Sω)(m, n) = ω(m, n) for m > 0.
The special A
(96) Definition. Remember the definition (80) of 𝒳₀ and X. Let 𝒳(0) be the subset of 𝒳₀ where
0 ≤ t < φ₁ and X(t) = j.
Let M = 1, 2, ... and N = 0, 1, .... Let 𝒳(M, N) be the subset of 𝒳₀ where:
φ_M < t < φ_{M+1} and X(t) = j; and
the number of j-intervals in X after φ_M but before the one surrounding t is N.
Then 𝒳(0) and 𝒳(M, N) are in Σ(t). These sets are pairwise disjoint as M and N vary; their union is {X(t) = j}. You should prove (95) when A ⊂ 𝒳(0): it's similar to (5.39). The proof I will give works for these A's, if you treat the notation gently; but it's silly.
(97) I only have to prove (95) for A ⊂ 𝒳(M, N).
Figure 6.
So fix positive integer M and nonnegative integer N. Define Ω₁, λ, and 𝒜 by (70), with this choice of M and N. Review (80) and look at Figure 6. Use
(99)
PROOF. Number the intervals of constancy in [0, φ₁] from left to right so that interval number 0 has left endpoint 0. Then X is ξ(0, n) on interval n, which has length τ(0, n), for n = 0, 1, .... Let 1 ≤ m < M. Let ρ be the least t > φ_m with X(t) = 0. Number the intervals of constancy on (φ_m, φ_{m+1}) from left to right so that interval number 0 has left endpoint ρ. Then X is ξ(m, n) on interval n, which has length τ(m, n), for integer n. *
(103) Let ℬ be the σ-field in 𝒳(M, N) spanned by ξ(m, ·) and τ(m, ·) with m < M and domain cut down to 𝒳(M, N). Let 𝒞 be the σ-field in 𝒳(M, N) spanned by ξ(M, λ − n) and τ(M, λ − n) with n = 1, 2, ... and domain cut down to 𝒳(M, N).
(104) Lemma. 𝒳(M, N) ∩ Σ(t) is spanned by ℬ and 𝒞.
PROOF. φ_M < t on 𝒳(M, N). So ℬ ⊂ 𝒳(M, N) ∩ Σ(t) by (102). Next, 𝒞 ⊂ 𝒳(M, N) ∩ Σ(t), because the nth interval of constancy in X before the one at time t is a visit to ξ(M, λ − n) of length τ(M, λ − n): use (98–101). I now have to compute {X(s): 0 ≤ s ≤ t} on 𝒳(M, N) from ℬ and 𝒞. To begin, you can compute {X(s): 0 ≤ s ≤ φ_M} on 𝒳(M, N) from ℬ, using definition (80); and φ_M retracted to 𝒳(M, N) is ℬ-measurable by (99). So σ(M, λ − n) retracted to 𝒳(M, N) is ℬ ∨ 𝒞-measurable by (100), for n = 0, 1, .... You can now compute the fragment {X(s): φ_M < s < σ(M, λ)} on 𝒳(M, N) from ℬ ∨ 𝒞, using (101). Finally,
X(s) = j for σ(M, λ) ≤ s ≤ t on 𝒳(M, N);
my authority is (71, 98, 101). But I peek at Figure 6. *
WARNING. λ retracted to 𝒳(M, N) is not Σ(t)-measurable.
(105) Definition. Review (70a, b). Call a set A special iff there is a finite subset C_A of C, with m < M for all (m, n) ∈ C_A, a nonnegative integer n_A,
The mapping T
Remember definition (96) of 𝒳(M, N). Define a mapping T of 𝒳(M, N) into 𝒳, as in Figure 6 on page 286:
(107a) ξ(0, n) ∘ T = ξ(M, λ + n) for n ≥ 0;
(107b) ξ(m, n) ∘ T = ξ(M + m, n) for m > 0;
(107c) τ(0, 0) ∘ T = σ(M, λ + 1) − t;
(107d) τ(0, n) ∘ T = τ(M, λ + n) for n > 0;
(107e) τ(m, n) ∘ T = τ(M + m, n) for m > 0.
Review definition (80) of 𝒳₀ and X. Check that T maps into 𝒳₀, and
(108) X(t + s) = X(s) ∘ T on 𝒳(M, N).
Relation (108) is a straightforward but tedious project, which I leave to you. Consider the assertion
(109)
I claim
(110) it is enough to prove (109) for special A and all measurable subsets B of {ξ(0, 0) = j}.
To see this, put B = {X(0) = j and X(s) = k} in (109). Then use (108) to get (95) for the special A. Then use (106).
The special B
(111) Definition. A set B is special iff there is a finite subset C_B of C, with m > 0 for all (m, n) ∈ C_B, a nonnegative integer n_B, a function i_B from {0, ..., n_B} ∪ C_B to I, and a function t_B from {0, ..., n_B} ∪ C_B to (0, ∞), such that
i_B(0) = j, and
B = B₁ ∩ B₂, where
B₁ = {ξ(0, n) = i_B(n) and τ(0, n) > t_B(n) for n = 0, ..., n_B}, while
B₂ = {ξ(c) = i_B(c) and τ(c) > t_B(c) for c ∈ C_B}.
I claim
(112) it is enough to prove (109) for special A and special B.
Indeed, the special B span the full σ-field on {ξ(0, 0) = j}. Two different special B are disjoint or nested, by inspection. And {ξ(0, 0) = j} is special. So (112) follows from (110) and (10.16).
The ultraspecial B
(113) Call B ultraspecial iff B is special in the sense of (111), and C_B is empty: so B = B₁, as defined in (111).
I claim
(114) it is enough to prove (109) for special A and ultraspecial B.
Fix a special B, in the sense of (111). Remember C_B, n_B, i_B, t_B, B₁ and B₂ from (111). Remember
(115) m > 0 for (m, n) ∈ C_B.
Let D₁ be the subset of 𝒳(M, N) where ξ(M, λ + n) = i_B(n) for n = 0, ..., n_B and σ(M, λ + 1) > t + t_B(0) and τ(M, λ + n) > t_B(n) for n = 1, ..., n_B.
Let D₂ be the subset of 𝒳 where ξ(M + m, n) = i_B(m, n) and τ(M + m, n) > t_B(m, n) for all (m, n) ∈ C_B.
Check
(116a) T⁻¹B₁ = D₁ and T⁻¹B = D₁ ∩ D₂.
Get an A from (105). I claim:
(116b) π_i{A ∩ D₁ ∩ D₂} = π_i{A ∩ D₁} · π_i{D₂}
(116c) π_i{D₂} = π_j{B₂}
(116d) π_j{B₁ ∩ B₂} = π_j{B₁} · π_j{B₂}.
Remember ζ_m = {ξ(m, ·), τ(m, ·)}. In order, 𝒳(M, N), D₁, and A ∩ D₁ are all measurable on ζ₀, ..., ζ_M: use (98) for the first move. Next, D₂ is measurable on (ζ_{M+1}, ζ_{M+2}, ...) by (115). So (68) proves (116b). Relation (116c) follows from (115) and (68). Finally, B₁ is measurable on ζ₀; and (115) makes B₂ measurable on (ζ₁, ζ₂, ...). So (68) proves (116d). Suppose (109) for ultraspecial B. Compute:
π_i{A ∩ T⁻¹B} = π_i{A ∩ D₁ ∩ D₂} by (116a)
= π_i{A ∩ D₁} · π_i{D₂} by (116b)
= π_i{A ∩ D₁} · π_j{B₂} by (116c)
= π_i{A ∩ T⁻¹B₁} · π_j{B₂} by (116a)
= π_i{A} · π_j{B₁} · π_j{B₂} by (109 on B₁)
= π_i{A} · π_j{B₁ ∩ B₂} by (116d)
= π_i{A} · π_j{B} by (111).
This settles (114). I wish I could reward you for coming this far, but the worst lies ahead.
Check
(117) A = {(ω, w): ω ∈ A₁ and w ∈ A_ω ∩ W₀}
(118) B = D* × H.
Remember i_B(0) = j; use (117–118) and definition (107) to check
(119) A ∩ T⁻¹B = {(ω, w): ω ∈ A₁ ∩ D and w ∈ A_ω ∩ E_ω ∩ F_ω ∩ W₀}.
By (120, 121):
∫_{A₁} η_{q(ω)}{A_ω} r_i(dω) = ∫_{A₁} η_{q(ω)}{A_ω ∩ W₀} r_i(dω) = π_i{A}.
6. ISOLATED INFINITIES
The ideas of Section 5 can be used to construct the most general Markov chain with all states stable and isolated infinities. In this section, an infinity is a time t such that in any open interval around t, the sample function makes infinitely many jumps. I will sketch the program. To begin with, let Γ be a substochastic matrix on the countably infinite set I. A strongly approximate Γ-chain on the probability triple (Ω, ℱ, 𝒫) is a partially defined, I-valued process ξ(n), which has the strong Markov property for hitting times. More exactly, let J be a finite subset of I. Let Ω_J be the set where ξ(n) ∈ J for some n; on Ω_J, let T_J be the least such n: assume there is a least. Given Ω_J, the process ξ(T_J + ·) is required to be Markov with stationary transitions Γ. This does not make ξ Markov. Incidentally, the time parameter n runs over a random subinterval of the integers; the most interesting case is where n runs over all the integers, so the ξ(n) are defined everywhere.
Let X be a Markov chain with stationary, standard transitions P, and regular sample functions. Suppose the infinities of X are isolated, almost surely, and occur at times φ₁ < φ₂ < ⋯. Let φ₀ = 0, and φ_m = ∞ if there are fewer than m infinities.
NOTE. You can show that the set of paths with isolated infinities is
measurable, so its probability can in principle be computed from P and the
starting state.
8.6] ISOLATED INFINITIES 293
Given φ_{m+1} < ∞ and X(t) for t ≤ φ_{m+1} and ξ_{m+1}(−∞), the visiting process ξ_{m+1}(·) on (φ_{m+1}, φ_{m+2}) is a strongly approximate Γ-chain starting from ξ_{m+1}(−∞). As usual, given the visiting process, the holding times are conditionally independent and exponentially distributed, holding times in j having parameter q(j).
In particular, the conditional distribution of X(φ_{m+1} + ·) given φ_{m+1} < ∞ and X(t) for t ≤ φ_{m+1} depends only on ξ_{m+1}(−∞). On φ_{m+1} < ∞, the pre-φ_{m+1} sigma field is spanned up to null sets by X(t) for t ≤ φ_{m+1} and ξ_{m+1}(−∞); the conditional distribution of X(φ_{m+1} + ·) given φ_{m+1} < ∞ and the pre-φ_{m+1} sigma field depends only on ξ_{m+1}(−∞).
By reversing the procedure, you can construct the general chain with stable states and isolated infinities, from the holding time parameters q, the jump matrix Γ, and the crossing kernel K. It's like Section 5. For details of a similar construction, see (Chung, 1963, 1966).
The construction of a strongly approximate Γ-chain starting from μ ∈ B_{−∞} is not trivial. For simplicity, suppose all i ∈ I are transient relative to Γ. Let {ξ(n)} be a strongly approximate Γ-chain starting from μ ∈ B_{−∞}, on the probability triple (Ω, ℱ, 𝒫). More or less by definition, μ(j) is the mean number of n with ξ(n) = j. As before, let Ω_J be the set where ξ(n) ∈ J for
some n, and let T_J be the least such n. Let μ(J) be the distribution of ξ(T_J):
μ(J)(j) = 𝒫{Ω_J and ξ(T_J) = j} for j ∈ J.
It turns out that the main problem in constructing {ξ(n)} is the computation of μ(J): because μ(J) determines the distribution of ξ(T_J + ·). One method is sketched in (Hunt, 1960). Here is another.
Let G(i, j) be the mean number of visits to j by a true Γ-chain starting from i. By the strong Markov property,
Σ_{i∈J} μ(J)(i) G(i, j) = μ(j) for j ∈ J.
This system of linear equations uniquely determines μ(J): I will write down the inversion matrix. Let Γ(J) be the transition matrix of a true Γ-chain watched only when in J, as discussed in Section 2.5. Let Δ_J be the identity matrix on J:
Δ_J(i, j) = 1 for i = j in J
= 0 for i ≠ j in J.
Define G_J and μ_J as follows:
G_J(i, j) = G(i, j) for i and j in J
μ_J(j) = μ(j) for j in J.
The system of equations for μ(J) can now be rewritten in matrix notation:
μ(J) · G_J = μ_J.
I claim
G_J = Σ_{n=0}^∞ (Γ(J))ⁿ:
erasing non-J times doesn't affect the number of visits to j ∈ J. The sum converges beautifully, and
G_J⁻¹ = Δ_J − Γ(J):
boldly multiply both sides of the equation for G_J by (Δ_J − Γ(J)). Therefore,
μ(J) = μ_J · (Δ_J − Γ(J)). *
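To make the inversion concrete, here is a small numerical sketch (mine, not the book's). Take J = I = {0, 1}, so that Γ(J) is Γ itself; the particular substochastic matrix and the mean visit counts μ_J are arbitrary choices for illustration.

```python
# 2x2 sketch of mu(J) = mu_J . (Delta_J - Gamma(J)), with J the whole space.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

gamma = [[0.0, 0.5], [0.25, 0.25]]   # row sums < 1: every state transient
identity = [[1.0, 0.0], [0.0, 1.0]]

# G_J = sum over n >= 0 of Gamma^n, truncated; powers die off geometrically.
g = [[0.0, 0.0], [0.0, 0.0]]
power = [row[:] for row in identity]
for _ in range(200):
    g = [[g[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    power = matmul(power, gamma)

# Check the claimed inverse: G_J (Delta_J - Gamma) = identity.
dmg = [[identity[i][j] - gamma[i][j] for j in range(2)] for i in range(2)]
prod = matmul(g, dmg)
assert all(abs(prod[i][j] - identity[i][j]) < 1e-9
           for i in range(2) for j in range(2))

# Recover mu(J) from mu_J, then verify mu(J) . G_J = mu_J.
mu_J = [1.0, 2.0]                    # hypothetical mean visit counts
mu_of_J = [sum(mu_J[i] * dmg[i][j] for i in range(2)) for j in range(2)]
recovered = [sum(mu_of_J[i] * g[i][j] for i in range(2)) for j in range(2)]
assert all(abs(recovered[j] - mu_J[j]) < 1e-9 for j in range(2))
```

The geometric decay of the powers of Γ is what makes the sum "converge beautifully" here; for a proper subset J one would first have to compute the watched matrix Γ(J) of Section 2.5.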
NOTE. Δ_J, G_J, and μ_J are respectively Δ, G, and μ retracted to J. But Γ(J) and μ(J) are the restrictions of Γ and μ to J in a much subtler sense.
This is all I want to say about the isolated infinities case. There are two drawbacks to the theory. First, it is hard to tell from P when the infinities are isolated. Second, there are extreme, invariant μ which do not give the expected number of visits by a strongly approximate Γ-chain on any finite measure space; and you can't tell the players without a program.
8.7] THE SET OF INFINITIES IS BAD 295
This kind of construction probably fails when the set of infinities is count-
able but complicated. It certainly fails when the set of infinities is uncountable.
See Section 2.13 of B & D for an example.
These parameters enter in the following way. The X process will visit various states i; on reaching i, the process remains in i for a length of time which is exponential with parameter q(i). The holding time is independent of everything else. On leaving i = (a, n), the process jumps to (a, n + 1) with overwhelming probability, namely p(i). It jumps to each other state in I with positive probability summing to 1 − p(i). These other probabilities also constitute parameters for the construction, but they are not important, and can be fixed in any way subject to the constraints given.
The local behavior of the process is now fixed. To explain the global behavior, say a < b for a ∈ A and b ∈ A iff a is to the left of b as a subset of the line; say (a, m) < (b, n) for (a, m) ∈ I and (b, n) ∈ I iff a < b, or a = b and m < n. Fix t and ω with X(t, ω) = φ. The global behavior of X is determined by the requirement that either case 1 or case 2 holds.
Case 1. There is an e > 0, an a ∈ A, and an integer L, such that: as u increases through (t − e, t), the function X(u, ω) runs in order through precisely the states (a, n) with n ≥ L. Then there is a δ > 0, an interval c ∈ A with c > a, and an integer K, such that: as u increases through (t, t + δ), skipping times u′ with X(u′, ω) = φ, the function X(u, ω) runs in order through precisely the states: (b, n) with a < b < c, all n; and b = c, but n ≤ K.
Case 2. There is an e > 0 and an interval a ∈ A such that: as u increases through (t − e, t), skipping times u′ with X(u′, ω) = φ, the function X(u, ω) runs in order through precisely the states (b, n) with b > a, all n. Then there is a δ > 0 and an integer K such that: as u increases through (t, t + δ), the function X(u, ω) runs in order through precisely the states (l, n) with n ≥ K and l = (−∞, 0). *
OUTLINE OF THE PROOF. Whenever case 2 occurs, there is positive probability ∏ p(i) that the chain proceeds to move through its states in order. Whenever this occurs, the corresponding section of S_φ(ω) is similar to S. By Borel–Cantelli, infinitely many disjoint sections of S_φ(ω) are similar to S, as required. *
9
THE GENERAL CASE
1. AN EXAMPLE OF BLACKWELL
on {0, 1}, with λ and μ nonnegative, λ + μ positive. There is exactly one standard stochastic semigroup P on {0, 1} with P′(0) = Q, namely:
(1) P(t, 0, 0) = μ/(μ + λ) + λ/(μ + λ) e^(−(μ+λ)t)
P(t, 0, 1) = 1 − P(t, 0, 0)
P(t, 1, 1) = λ/(μ + λ) + μ/(μ + λ) e^(−(μ+λ)t)
P(t, 1, 0) = 1 − P(t, 1, 1).
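The claimed properties of (1) can be checked numerically. This sketch (mine, with the arbitrary rates λ = 2, μ = 3) verifies the semigroup law P(t + s) = P(t) · P(s) and that P′(0) = Q.

```python
import math

lam, mu = 2.0, 3.0  # arbitrary positive rates

def P(t):
    # The semigroup (1) for the generator Q = ((-lam, lam), (mu, -mu)).
    e = math.exp(-(mu + lam) * t)
    p00 = mu / (mu + lam) + lam / (mu + lam) * e
    p11 = lam / (mu + lam) + mu / (mu + lam) * e
    return [[p00, 1 - p00], [1 - p11, p11]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Chapman-Kolmogorov: P(t + s) = P(t) P(s).
t, s = 0.7, 1.3
lhs, rhs = P(t + s), matmul(P(t), P(s))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))

# P'(0) = Q: the (0, 1) entry should differentiate to lam at t = 0.
h = 1e-6
q01 = (P(h)[0][1] - P(0)[0][1]) / h
assert abs(q01 - lam) < 1e-4
```

Checking the off-diagonal derivative suffices here, since each row of P(t) sums to 1.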
One way to see this is to use (5.29): define P by (1); check P is continuous, P(0) is the identity, P′(0) = Q, and P(t + s) = P(t) · P(s). Dull computations in the last step can be avoided by thinking: it is enough to do μ + λ = 1 by rescaling time; since P(u) is 2 × 2 and stochastic when u is t or s or t + s,
I want to thank Mike Orkin for checking the final draft of this chapter.
298 THE GENERAL CASE [9
Q_n = ( −λ_n  λ_n ; μ_n  −μ_n ),
with λ_n and μ_n positive, and let P_n be the standard stochastic semigroup on {0, 1} with P′_n(0) = Q_n. Let I be the countable set of infinite sequences i = (i₁, i₂, i₃, ...) of 0's and 1's, with only finitely many 1's. Let N(i) be the least N such that n ≥ N implies i_n = 0. Suppose ∏_n μ_n/(μ_n + λ_n) > 0, that is,
(2) Σ_n λ_n/(μ_n + λ_n) < ∞.
Let {X_n(t): 0 ≤ t < ∞} be 0–1 valued stochastic processes with right continuous step functions for sample functions, on a convenient measurable space (Ω, ℱ). Let X(t) be the sequence (X₁(t), X₂(t), ...). For each i ∈ I, let 𝒫_i be a probability on ℱ for which X₁, X₂, ... are independent, and X_n is Markov with stationary transitions P_n, starting from i_n. This construction is possible by (5.45) and the existence of product measure. I say
P(t, i, i) = 𝒫_i{X(t) = i} = [∏_{n=1}^{N−1} P_n(t, i_n, i_n)] · [∏_{n=N}^∞ P_n(t, 0, 0)].
The first factor tends to 1 as t → 0. Using (1), the second factor is at least ∏_{n=N}^∞ μ_n/(μ_n + λ_n), which is nearly 1 for large N by (2).
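The product can be explored numerically. In this sketch (mine, not the book's), the rates λ_n = 1/n² and μ_n = 1 satisfy condition (2); a truncation of the second factor, taken with N = 1, is computed and watched as t → 0.

```python
import math

def p_n00(n, t):
    # Two-state semigroup (1) with lam_n = 1/n**2, mu_n = 1.
    lam, mu = 1.0 / n**2, 1.0
    return mu / (mu + lam) + lam / (mu + lam) * math.exp(-(mu + lam) * t)

def second_factor(t, terms=10000):
    # Truncation of prod over n >= 1 of P_n(t, 0, 0); tail factors are near 1.
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= p_n00(n, t)
    return prod

for t in (0.1, 0.01, 0.001):
    print(t, second_factor(t))
# The products increase toward 1 as t decreases, as standardness requires.
```

With λ_n summable like this, each factor is 1 − O(t/n²), so the log of the product is O(t); standardness of P is visible directly.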
Finally, suppose
(4) Σ_n λ_n = ∞.
2. QUASIREGULAR SAMPLE FUNCTIONS
For the rest of this chapter, let I be a finite or countably infinite set in the discrete topology. Let Ī = I for finite I. Let Ī = I ∪ {φ} be the one-point compactification of I for infinite I. Call φ the infinite or fictitious or adjoined state, as opposed to the finite or real states i ∈ I. Let P be a standard stochastic semigroup on I, with Q = P′(0) and q(i) = −Q(i, i). Do not assume
q(i) < ∞. The main point of this section is to construct a Markov chain with stationary transitions P, all of whose sample functions have this smoothness property at all nonnegative t:
if f(t) = φ, then f(r) converges to φ as binary rational r decreases to t;
if f(t) ∈ I, then f(r) has precisely one limit point in Ī, namely f(t), as binary rational r decreases to t.
This result is in (Chung, 1960, II.7). The key lemma (9) is due to Doob (1942).
Downcrossings
For any finite sequence s = (s(1), s(2), ..., s(N)) of real numbers, and pair u < v of real numbers, the number of downcrossings β(u, v, s) of [u, v] by s is the largest positive integer d such that there exist integers
1 ≤ n₁ < n₂ < ⋯ < n_{2d} ≤ N
with
s(n₁) ≥ v, s(n₂) ≤ u, ..., s(n_{2d−1}) ≥ v, s(n_{2d}) ≤ u.
If no such d exists, the number of downcrossings is 0. If s and t are finite sequences, and s is a subsequence of t, then
β(u, v, s) ≤ β(u, v, t).
Of course, β(u, v, s) depends on the enumeration of s.
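The definition of β(u, v, s) is easy to mechanize. This sketch (mine) counts downcrossings by a greedy left-to-right scan, which attains the largest d, and spot-checks the subsequence inequality and the dependence on enumeration.

```python
def downcrossings(u, v, s):
    # beta(u, v, s): look for a value >= v, then one <= u, and count
    # each completed high-low pair as one downcrossing of [u, v].
    count, looking_for_high = 0, True
    for x in s:
        if looking_for_high and x >= v:
            looking_for_high = False
        elif not looking_for_high and x <= u:
            looking_for_high = True
            count += 1
    return count

s = [2, 0, 2, 0, 2]
assert downcrossings(0.5, 1.5, s) == 2
# A subsequence never has more downcrossings than the full sequence:
assert downcrossings(0.5, 1.5, [2, 0, 2]) <= downcrossings(0.5, 1.5, s)
# Reversing the enumeration can change the count:
assert downcrossings(0.5, 1.5, [2, 0]) == 1
assert downcrossings(0.5, 1.5, [0, 2]) == 0
```

The greedy scan is optimal because any witnessing subsequence n₁ < ⋯ < n_{2d} can be replaced, alternation by alternation, with the earliest available indices.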
where (x) is the greatest integer no more than x. Verify that β_n(u, v, ·) is measurable, and nondecreasing in n.
(5) Lemma. M* is the set of f such that lim_n β_n(u, v, f) < ∞ for all rational u and v with u < v. In particular, M* is measurable.
9.2] QUASIREGULAR SAMPLE FUNCTIONS 301
PROOF. Suppose f ∉ M*. Suppose that for some t ∈ [0, M) and sequence r_m ∈ R ∩ [0, M) with r_m ↓ t,
a = lim inf f(r_m) < lim sup f(r_m) = b.
The increasing case is similar. Choose rational u, v with
a < u < v < b.
For large N, the number of downcrossings D of [u, v] by f(r₁), ..., f(r_N) is large. The number of downcrossings of [u, v] by f(r_N), ..., f(r₁) is at least D − 1. If n is so large that 2ⁿr₁, ..., 2ⁿr_N are all integers,
So
Conversely, suppose f ∈ M*. Fix u < v. Let 0 < ε < ½(v − u). Abbreviate as in Figure 1. I say that J(t_n) contains at most three r_m's. For suppose J(t_n) contains r_m, r_{m+1}, r_{m+2}, r_{m+3}, as in Figure 1. Then t_n is either to the right of r_{m+1} or to the left of r_{m+2}. In either case, there is a forbidden oscillation. So there are at most 3N points r_m. That is, 2d ≤ 3N. *
Pedro Fernandez eliminated the unnecessary part of an earlier proof.
Figure 1.
Quasiconvergence
(6) Definition. Let A be a set directed by >, and let i_a ∈ Ī for each a ∈ A. That is, {i_a} is a generalized sequence. Say i_a quasiconverges to j ∈ I, or
q-lim i_a = j,
iff: for any finite subset D of I\{j}, there is some a(D) ∈ A such that a > a(D) implies i_a ∉ D; and for any a ∈ A there is some b > a such that i_b = j. Say i_a quasiconverges to φ, or
q-lim i_a = φ,
iff: for any finite subset D of I, there is some a(D) ∈ A such that a > a(D) implies i_a ∉ D.
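Definition (6) can be probed mechanically on a long initial segment of a generalized sequence indexed by the positive integers. In this sketch (mine), i_n = j for even n and i_n = n for odd n: the sequence visits j at arbitrarily late indices yet escapes every finite set D not containing j, so q-lim i_n = j even though lim i_n does not exist. The finite horizon makes this a heuristic check of the two clauses, not a proof.

```python
J = 0  # the candidate quasilimit

def seq(n):
    # visits J at every even index, escapes to ever-larger states otherwise
    return J if n % 2 == 0 else n

def escapes(D, horizon=10_000):
    # clause 1: some a(D) after which the sequence avoids the finite set D
    a = 2 * max(D) + 1  # beyond this index, odd terms exceed max(D)
    return all(seq(n) not in D for n in range(a + 1, horizon))

def keeps_returning(horizon=10_000):
    # clause 2: past any index there is a later index where the value is J
    return all(any(seq(m) == J for m in range(n + 1, n + 3))
               for n in range(horizon))

assert escapes({1, 2, 3}) and escapes({5, 17, 40})
assert keeps_returning()
```

Dropping the second clause would make every such sequence also "converge" to φ; clause 2 is what pins the quasilimit to j.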
The directed sets of main interest are: the nonnegative integers; the
nonnegative binary rationals less than a given real number; the nonnegative
real numbers less than a given real number; the binary rationals greater than
a given nonnegative real number; the real numbers greater than a given
nonnegative real number. In the first three cases, a > b means a is greater
than b. In the last two cases, a > b means a is less than b. Here is the usual
notation for these five quasilimits; t is the given real number and R is the set
exists for all positive real t, and equals f(t) for all positive t ∈ R.
A function f from [0, ∞) to Ī is quasiregular iff: f(r) ∈ I for all r ∈ R; and f retracted to R is quasiregular; and
Suppose f is quasiregular from [0, ∞) to Ī. I claim f is quasicontinuous from the right: that is, f(t) = q-lim {f(s): s ↓ t}. Begin the check by supposing f(t) = i ∈ I. By definition, f(s) = i for s arbitrarily close to t on the right. Conversely, suppose f(s_n) = i ∈ I for some sequence s_n ↓ t. Without loss, make the s_n strictly decreasing, and use the definition to find binary rational r_n with s_n ≤ r_n < s_{n−1} and f(r_n) = i. But r_n ↓ t, so the definition forces f(t) = i. Similarly, f has a quasilimit from the left at all positive times, and is quasicontinuous at all binary rational r.
The process on R
Let Ω be the set of all functions ω from R to I. Let {X(r): r ∈ R} be the coordinate process on Ω, that is, X(r)(ω) = ω(r) for r ∈ R and ω ∈ Ω. Endow I with the σ-field of all its subsets, and Ω with the product σ-field, that is, the smallest σ-field over which each X(r) is measurable. For each i ∈ I, let P_i be the probability on Ω for which {X(r): r ∈ R} is Markov with stationary transitions P and X(0) = i. Namely, for 0 = r₀ < r₁ < ⋯ < r_n in R and i₀ = i, i₁, ..., i_n in I,
∫ β_n(u, v, ω) P_i(dω)
WARNING. The limiting random variable X(t) is well defined only a.e.
(11) Lemma. Let t be positive and real. Choose a version of X(t), as defined in (10b). For P_i-almost all ω, for any e > 0, there are r and s in R with
t − e < r < t < s < t + e and ω(r) = X(t, ω) = ω(s).
PROOF. Choose a sequence r_n ∈ R with r_n ↑ t. By (10), X(r_n) → X(t) in P_i-probability. Now choose a subsequence n′ with P_i{X(r_{n′}) → X(t)} = 1. Similarly for the right. *
(12) Definition. Let Ω_q = {ω: ω ∈ Ω and ω is quasiregular}. Quasiregularity was defined in (7).
(13) Lemma. The set Ω_q is measurable and P_i{Ω_q} = 1.
PROOF. For r ∈ R, let G(r) be the set of ω ∈ Ω for which: there are s ∈ R with s > r but arbitrarily close having ω(s) = ω(r); and if r > 0, there are s ∈ R with s < r but arbitrarily close having ω(s) = ω(r). Clearly, G(r) is measurable; and P_i{G(r)} = 1 by (10c, 11). Remember (8–9). Check
The process X was defined in (14). These sets are called the level sets or sets of constancy.
(19) Definition. Let j ∈ I. Then Ω(j) is the set of ω ∈ Ω_q such that: for all t ≥ 0,
X(t, ω) = j implies lim_{s↓t} X(s, ω) = j;
and for all t > 0,
q-lim_{s↑t} X(s, ω) = j implies lim_{s↑t} X(s, ω) = j.
The set Ω_q was defined in (12).
(20) Theorem. Let j ∈ I, and let ω ∈ Ω_q. Then ω ∈ Ω(j) iff the set S_j(ω) is either empty or a finite or countably infinite union of intervals [a_n, b_n) with a₁ < b₁ < a₂ < ⋯. Of course, a and b depend on ω, and are not binary rational. If there are infinitely many intervals, a_n → ∞.
PROOF. The "if" part is easy. For "only if," let ω ∈ Ω(j). Suppose t ∈ S_j(ω). Then [t, t + c) ⊂ S_j(ω) for some c > 0. Of course, S_j(ω) is closed from the right, by quasiregularity. Consequently, S_j(ω) is a finite or countably infinite union of maximal disjoint nonempty intervals [a_n, b_n). Suppose there are an infinite number. By way of contradiction, suppose a_{n′} ↑ c < ∞ for a subsequence n′. Then X(t, ω) quasiconverges, so converges, to j as t increases to c. So a_m < c ≤ b_m for some m, contradicting the disjointness. Similarly for the right. *
(21) Theorem. The set Ω(j) is measurable. And P_i{Ω(j)} = 1 if j ∈ I is stable.
PROOF. First, Ω(j) is the set of ω ∈ Ω_q such that the indicator function of R_j(ω), as a function on R, is continuous on R, and has limits from left and right at all positive t ∉ R: use (19, 20). So Ω(j) is measurable by (5). Check that the indicator function of R_j(ω), as a function on R, is continuous for P_i-almost all ω: use the argument for (7.6). For such an ω, any r ∈ R_j(ω) is interior to a maximal j-interval of ω, as in the paragraph before (7.8). The argument for (7.8) shows that for P_i-almost all ω, for any n, only a finite number of these intervals meet [0, n]. This disposes of the second claim. *
Recall that j is instantaneous iff q(j) = ∞.
Use (10.16) on the definition of P_i and P_j, to see that for all measurable A:
P_i{X(r) = j and T_r ∈ A} = P(r, i, j) P_j{A}.
Put A(j, s) for A: then
B(r, s) = {T_r ∈ A(j, s)} ⊂ {X(r) = j},
so (18) makes
P_i{B(r, s)} = P(r, i, j) P_j{A(j, s)} = 0.
But ∪_{r,s} B(r, s) is the complement of the set described in the lemma. *
(23) Theorem. Suppose j ∈ I is instantaneous: then the set of ω ∈ Ω_q satisfying (a) is measurable and has P_i-probability 1. Properties (b) and (c) hold for all ω ∈ Ω_q and all j ∈ I.
(a) S_j(ω) is nowhere dense.
(b) Each point of S_j(ω) is a limit from the right of S_j(ω).
(c) S_j(ω) is closed from the right.
PROOF. You should check (b) and (c). I will then get (a) from (22). In fact, for ω ∈ Ω_q, property (a) coincides with the property described in (22). To see this, suppose ω ∈ Ω_q and R_j(ω) ⊃ [a, b] ∩ R for a < b in R. Then S_j(ω) ⊃ [a, b) by (c). Conversely, suppose ω ∈ Ω_q and S_j(ω) is dense in (a, b) with a < b. Choose a pair of binary rationals c, d with a < c < d < b. Then S_j(ω) ⊃ [c, d] by (c), so R_j(ω) ⊃ [c, d] ∩ R. *
(24) Remarks. (a) Suppose ω ∈ Ω_q. Then [0, ∞)\S_j(ω) is the finite or countably infinite union of intervals [a, b) whose closures [a, b] are pairwise disjoint. This follows from properties (23b–c).
For (b–d), suppose ω ∈ Ω_q satisfies (23a).
(b) [0, ∞)\S_j(ω) is dense in [0, ∞), and is therefore a countably infinite union of maximal intervals.
(c) That is, S_j(ω) looks like the Cantor set, except that the left endpoints of the complementary intervals have been removed from the set. And S_j(ω) has positive Lebesgue measure, as will be seen in (28, 32).
(d) If t ∈ S_j(ω), there is a sequence r_n ∈ R with r_n ↓ t and X(r_n, ω) ≠ j, so X(r_n, ω) → φ.
(26) Theorem. The set of ω ∈ Ω_q for which S_φ(ω) has Lebesgue measure 0 is measurable and has P_i-probability 1.
PROOF. Fubini on (15, 16). *
NOTE. Suppose all j ∈ I are instantaneous. For almost all ω, the set S_φ(ω) is the complement of a set of the first category. Consequently, any nonempty interval meets S_φ(ω) in uncountably many points. For a discussion of category, see (Kuratowski, 1958, Sections 10 and 30).
(27) Definition. Call a Borel subset B of [0, ∞) metrically perfect iff: for any nonempty open interval (a, b), the set B ∩ (a, b) is either empty or of positive Lebesgue measure. Let Ω_m be the set of ω ∈ Ω_q, as defined in (12), such that: for all j ∈ I, the set S_j(ω) is metrically perfect. This is no restriction for stable j and ω ∈ Ω(j), as defined in (19).
(28) Lemma. The set Ω_m is measurable and P_i{Ω_m} = 1.
PROOF. For a < r < b and r ∈ R and j ∈ I, let A(j, r, a, b) be the set of ω ∈ Ω_q such that: either ω(r) ≠ j, or
Lebesgue {S_j(ω) ∩ (a, b)} > 0.
Any proper interval contains a proper subinterval with rational endpoints. Moreover, S_j(ω) ∩ (a, b) is nonempty iff R_j(ω) ∩ (a, b) is nonempty. Consequently, Ω_m is the intersection of A(j, r, a, b) for j ∈ I and r ∈ R and rational a, b with a < r < b. So Ω_m is measurable, and it is enough to prove P_i{A(j, r, a, b)} = 1.
Suppose P(r, i, j) > 0, for otherwise there is little to do. Let e > 0. Let
L(e, ω) = e⁻¹ Lebesgue {S_j(ω) ∩ (r, r + e)}.
Use (15) and Fubini in the first line, and (16) in the last:
∫_{X(r)=j} L(e) dP_i = e⁻¹ ∫_r^{r+e} P_i{X(r) = j and X(t) = j} dt
∫_{X(r)=j} L(e) dP_i → P_i{X(r) = j} as e → 0.
at least one side to be finite, so the infinite case follows from the finite case. The argument on the left is similar. *
Let t ≥ 0. Let W_t be the set of ω ∈ Ω_q such that u → X(t + u, ω) is quasiregular on [0, ∞). For ω ∈ Ω_q, let T_tω be the function r → X(t + r, ω) from R to I. Let ℱ(t) be the σ-field spanned by X(s) for s ≤ t. Here is the Markov property.
(31) Lemma. Fix t ≥ 0.
(a) W_t is measurable and has P_i-probability 1.
(b) T_t is a measurable mapping of W_t into Ω_q.
(c) If ω ∈ W_t and u ≥ 0, then X(t + u, ω) = X(u, T_tω).
(d) Suppose A ∈ ℱ(t) and A ⊂ {X(t) = j}. Suppose B is a measurable subset of Ω. Then
P_i{A and T_t⁻¹B} = P_i{A} · P_j{B}.
(e) On {X(t) = j} ∩ W_t, a regular conditional P_i-distribution for T_t given ℱ(t) is P_j.
(f) Given {X(t) = j}, the shift T_t is conditionally P_i-independent of ℱ(t), and its conditional P_i-distribution is P_j.
(g) Let ℱ be the product σ-field in Ω relativized to Ω_q. Let F be a nonnegative, measurable function on the cartesian product
(Ω_q, ℱ(t)) × (Ω_q, ℱ).
Then
E_i{F(ω, T_tω)} = E_i{G(ω)},
where
G(ω) = E_j{F(ω, ·)} on {X(t, ω) = j}.
The first line is definition (14). The second line is the definition of T_t. The third line uses ω ∈ W_t.
Claim (d). By inspection or (10.16), it is enough to do this for special A and B, of the form
A = {X(s_n) = i_n for n = 0, ..., N}
B = {X(u_m) = j_m for m = 0, ..., M};
where 0 = s₀ < ⋯ < s_N = t and i₀ = i, ..., i_N = j are in I; for the second line 0 ≤ u₀ < ⋯ < u_M < ∞ and the j's are in I. Let
Metric density
The next result is due to Chung (1960, Theorem 3 on p. 146). It has content only for instantaneous j.
(32) Theorem. Let j ∈ I. The set of ω ∈ Ω_q with
To complete the argument, check that for measurable subsets B of [0, ∞),
4. THE STRONG MARKOV PROPERTY
You should review Section 7.4 before tackling this section, which is pretty technical. One reason is a breakdown in the proof of (7.35). I'll use its notation for the moment. Then
lim sup (A ∩ B_n) ⊂ A ∩ B
survives, by quasiregularity. But
A ∩ B ⊂ lim inf (A ∩ B_n)
collapses. To patch this up in (34), I need (33). Here is the permanent notation.
As in Section 7.4, let ℱ(t) be the smallest σ-field over which all X(s) with 0 ≤ s ≤ t are measurable: the coordinate process X on Ω_q has smooth sample functions and is Markov relative to P_i, with stationary standard
Let Y(t) = X(τ + t). Call Y the post-τ process. I say (t, ω) → Y(t, ω) is measurable. Indeed, (t, ω) → τ(ω) + t is the sum of two measurable functions, and is therefore measurable. Consequently, (t, ω) → (τ(ω) + t, ω) is measurable. But (t, ω) → X(t, ω) is measurable by (15). The composition of the last two mappings is (t, ω) → Y(t, ω). *
(34) Proposition. Suppose τ is a Markov time. Let Y be the post-τ process. Let 0 ≤ s₀ < s₁ < ⋯ < s_M. Let i₀, i₁, ..., i_M be in I. Let
B = {Y(s_m) = i_m for m = 0, ..., M}
and
Thus, the right side of (36) converges to the right side of (35). Similarly for the left sides. *
The next problem is controlling the sample functions of the post-T process.
It is convenient to prove the more general lemma (37) before cleaning the
post-T process in (39). Proposition (39) is a preliminary form of the strong
Markov property (41), on the set {X(T) E I}.
(37) Lemma. Let Y be a jointly measurable, I-valued process on a probability
triple (Ω, 𝔉, 𝒫). Suppose Y is a Markov chain with stationary transitions P
and starting state j ∈ I. Suppose:
(a) Y(·, ω) is quasicontinuous from the right for all ω;
(b) {t: Y(t, ω) = k} is metrically perfect for all k ∈ I and all ω.
Then the set of ω such that Y(·, ω) is quasiregular has inner 𝒫-probability 1.
PROOF. Let Ω_R be the set of ω ∈ Ω such that Y(·, ω) retracted to R is
quasiregular. As (17) shows, Ω_R ∈ 𝔉 and 𝒫{Ω_R} = 1. Let
Y*(t, ω) = q-lim Y(r, ω) as r ∈ R decreases to t
for all t ≥ 0 and ω ∈ Ω_R. In particular, the Y* sample functions are quasi-
regular. As (17) shows,
𝒫{Y*(t) = Y(t)} = 1 for each t ≥ 0.
Let Ω_0 be the set of ω ∈ Ω_R such that
Lebesgue {t: Y*(t, ω) ≠ Y(t, ω)} = 0.
As Fubini implies, 𝒫{Ω_0} = 1. The proof of (37) is accomplished by showing
that for all ω ∈ Ω_0:
Y(t, ω) = Y*(t, ω) for all t ≥ 0.
Indeed,
(38) if Y*(t, ω) = k ∈ I, then Y(t, ω) = k,
because of (a). Conversely, suppose Y(t, ω) = k ∈ I. Now (b) implies
Lebesgue D > 0 for any δ > 0, where
D = {s: t < s < t + δ and Y(s, ω) = k}.
Because ω ∈ Ω_0, there is an s ∈ D with Y*(s, ω) = Y(s, ω), that is, with
Y*(s, ω) = k. Because Y*(·, ω) is quasiregular, Y*(t, ω) = k. I haven't
handled the value φ explicitly. But Y(t, ω) ≠ Y*(t, ω) implies that at least
one is in I, so the infinite case follows from the finite case. *
The final polish on this argument is due to Pedro Fernandez.
9.4] THE STRONG MARKOV PROPERTY 319
Let Λ_q be the set of ω ∈ Λ such that Y(·, ω) is quasiregular, where Y is the
post-T process.
WARNING. Select an ω such that Y(·, ω) is quasiregular when retracted to
R. Even though Y(·, ω) is quasicontinuous from the right, it is still possible
that
q-lim_{r↓t} Y(r, ω) = φ while Y(t, ω) ∈ I;
so ω ∉ Λ_q. This hurts me more than you.
(39) Proposition. Let T be Markov, and Λ = {T < ∞}.
(a) Given Λ and X(T) = j ∈ I, the pre-T sigma-field 𝔉(T+) and the post-T
process Y are conditionally P_i-independent, Y being conditionally Markov with
stationary transitions P and starting state j.
(b) Λ_q is measurable.
(c) P_i{Λ_q | Λ and X(T) = j ∈ I} = 1.
(c) q-lim f(r) exists as r ∈ R increases to t for all t > 0, and is f(t) for
positive t ∈ R.
Let f be a function from [0, ∞) to Ī. Say f is quasiregular on [0, ∞) iff: f
retracted to R is quasiregular on (0, ∞), and
f(t) = q-lim f(r) as r ∈ R decreases to t
for all t ≥ 0. Let Ω_q be the set of ω ∈ Ω which are quasiregular on [0, ∞).
CRITERION. Let ω ∈ Ω. Then ω ∈ Ω_q iff ω(r + ·) ∈ Ω_q for all positive
r ∈ R, and
ω(0) = q-lim {ω(s): s ∈ R and s ↓ 0}.
As (8) implies, Ω_q is measurable. For ω ∈ Ω_q, let
(42) X(t, ω) = q-lim X(r, ω) as r ∈ R decreases to t.
Introduce the class P of probabilities μ on Ω, having the properties:
(43a) μ{X(r) ∈ I} = 1 for all positive r ∈ R;
and
(43b) μ{X(r_n) = i_n for n = 0, ..., N}
= μ{X(r_0) = i_0} · ∏_{n=0}^{N−1} P(r_{n+1} − r_n, i_n, i_{n+1})
for all nonnegative integers N, and 0 ≤ r_0 < r_1 < ··· < r_N in R, and
i_0, ..., i_N in I. By convention, an empty product is 1.
(48) Theorem. Let i_0 = i, i_1, ..., i_N be in I, and let t_0, ..., t_N be non-
negative numbers. Then
where
and
t = ∑_{n=0}^{N} q(i_n) t_n,
where
*
PROOF. Use (41), (45), and (50) below.
(50) Proposition. Let M be this mapping from Ω′ to Ω_Q:
X(r, Mx) = Z(r, x) for all r ∈ R.
There is a Markov time T on Ω_Q such that σ = T ∘ M. Then
{σ < ∞} ∩ Σ(σ+) = M⁻¹[{T < ∞} ∩ 𝔉(T+)].
Let Y be the post-T process on Ω_q, and let S be Y with time domain retracted
to R. Then
W(t, x) = Y(t, Mx)
and
Tx = SMx
for all x ∈ Ω′.
PROOF. The first problem is to find a Markov time T on Ω_Q such that
σ = T ∘ M. Let A ∈ Σ(σ+) with A ⊂ {σ < ∞}. The second problem is to
find B ∈ 𝔉(T+) with B ⊂ {T < ∞} and A = M⁻¹B. The rest is easy. To start
the constructions, let Σ(∞) be the σ-field spanned by Z, and let 𝔉(∞) be the
full σ-field in Ω_Q, namely the σ-field spanned by X. Check
Σ(t) = M⁻¹𝔉(t) for 0 ≤ t ≤ ∞.
I remind you that M⁻¹ commutes with set operations. Start work on T.
Confine r and s to R. Now {σ < r} ∈ Σ(r), so {σ < r} = M⁻¹F_r for some
F_r ∈ 𝔉(r). Let
G_r = ∪_{s<r} F_s.
Let G = ∪_r G_r. Off G, let T = ∞. For ω ∈ G, let T(ω) be the sup of r with
ω ∉ G_r. You should check that {T < r} = G_r. So T is Markov, because
{T < t} = ∪_{r<t} {T < r}. And T ∘ M = σ, because T ∘ M < r iff σ < r.
Turn to the second problem. Let A ∈ Σ(σ+) and A ⊂ {σ < ∞}. Now
A ∩ {σ < r} ∈ Σ(r), so A ∩ {σ < r} = M⁻¹H_r for some H_r ∈ 𝔉(r). Let
J_r = H_r ∩ {T < r}
K_r = ∪_{s<r} {T < s} \ J_s
B_r = J_r \ K_r
B = ∪_r B_r.
APPENDIX
1. NOTATION
I want to thank Allan Izenman for checking the final draft of this chapter.
a_n ≍ b_n means: there are finite, positive K and N such that
(1/K) b_n ≤ a_n ≤ K b_n for all n ≥ N.
a_n ~ b_n means: for any ε with 0 < ε < 1, there is a finite N_ε such that
(1 − ε) b_n ≤ a_n ≤ (1 + ε) b_n for all n ≥ N_ε.
[0, 1) = {x: 0 ≤ x < 1}.
(x) means the greatest integer n ≤ x.
x is positive means x > 0, while x is nonnegative means x ≥ 0.
When it seems desirable, the redundancy "x is strictly positive" is employed.
Similarly for increasing and nondecreasing.
Real-valued means in (−∞, ∞), while extended real-valued means in
[−∞, ∞]. Random variables are allowed to take infinite values without
explicit warning.
Clearly usually means that the assertion which follows is clear. Some-
times, by force of habit, it means that I didn't feel like writing out the argu-
ment.
Let f be a real-valued function on S × T. Then f(s) = f(s, ·) is the
real-valued function t → f(s, t) on T, while f(t) = f(·, t) is the real-valued
function s → f(s, t) on S. Furthermore, f is used indifferently for the real-
valued function (s, t) → f(s, t) on S × T, the function-valued mapping
s → f(s) on S, and the function-valued mapping t → f(t) on T. Whenever this
threatens to get out of hand, some explanation is provided.
2. NUMBERING
3. BIBLIOGRAPHY
∫_A X dP = ∫_A Y dP
Absolute continuity
P(A) = ∫_A f dQ.
∫ g dP = ∫ g f dQ.
Convergence
Suppose X" and X are finite almost surely. Say X" converges to X, or
X" -+ X, in &'-probability iff
&'{JX" - XI ~ e} -+ 0
for any e > O. Say {X,,} is fundamental in probability iff
lim",m->oo &'{IX" - Xml ~ e} =0
for any e > O. For extended real-valued X" and X, say X" -+ X in &'-
probability iff for any positive, finite e and K:
&,{IX" - XI ~ e and IXI < oo} -+ 0
&,{X" ~ K and X = oo} -+ 0
&,{X" ~ - K and X = -(X)} -+ O.
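The first definition can be watched numerically. The sketch below uses an illustrative pair not from the text: X uniform on [0, 1] and X_n = X + Z_n with Z_n uniform on [−2/n, 2/n]. It estimates P{|X_n − X| ≥ ε} by simulation; the estimate falls to 0 as n grows, so X_n → X in P-probability.

```python
import random

def prob_deviation(n, eps, trials=10000, seed=0):
    """Estimate P{|X_n - X| >= eps} where X is uniform on [0, 1]
    and X_n = X + Z_n with Z_n uniform on [-2/n, 2/n]."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = rng.random()                     # X
        z = rng.uniform(-2.0 / n, 2.0 / n)   # shrinking noise
        if abs((x + z) - x) >= eps:
            count += 1
    return count / trials

# P{|X_n - X| >= 0.1} falls to 0 as n grows: X_n -> X in probability.
print(prob_deviation(5, 0.1), prob_deviation(100, 0.1))
```

For n = 100 the noise is bounded by 0.02, so the deviation probability is exactly 0; for n = 5 it is about 3/4.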
(13) Theorem. Suppose {X_n} are finite almost surely and fundamental in
P-probability. Then there is a random variable X, finite almost surely, with
X_n → X in P-probability.
The L^p-spaces
A random variable X is in L^p relative to P iff ∫ |X|^p dP < ∞. The p-th
root of this number is the L^p-norm of X. A sequence X_n → X in L^p if the
norm of X − X_n tends to 0. After identifying functions which are equal a.e.,
L^p is a Banach space for p ≥ 1. This popular fact is not used in the book.
The results of this section (except that uniform integrability gets more
complicated) are usually true, and sometimes used, for measures P which
are not probabilities; a measure on 𝔉 is nonnegative and countably additive.
In places like (11), you have to assume that P is σ-finite:
Ω = ∪_{i=1}^∞ Ω_i with P(Ω_i) < ∞.
For the rest, suppose P is a probability; although I occasionally use converse
Fubini (22) for σ-finite measures.
5. ATOMS*
= B_n ∪ (A_{n+1} \ B_n)
= B_n ∪ (A_{n+1} \ C_n)
where
C_n = A_{n+1} ∩ B_n = ∪_{i=1}^{n} (A_{n+1} ∩ A_i).
Now A_{n+1} ∩ A_i ∈ 𝒞, because 𝒞 was assumed closed under intersection. So
C_n, being the union of n sets in 𝒞, is in ℰ by inductive assumption. But
C_n ⊂ A_{n+1}, so A_{n+1} \ C_n ∈ ℰ. Finally, A_{n+1} \ C_n is disjoint from B_n, so its
union B_{n+1} with B_n is in ℰ. This completes the induction.
Let A* = A or Ω \ A. If A_i ∈ 𝒞 for i = 1, ..., n, I will get
B = ∩_{i=1}^{n} A_i* ∈ ℰ.
Using the assumption that 𝒞 is closed under intersection, you can rewrite B as
= D \ (C ∩ D).
6. INDEPENDENCE†
Let (Ω, 𝔉, P) be a probability triple. Sub-σ-fields 𝔉_1, 𝔉_2, ... are inde-
pendent (with respect to P) iff A_i ∈ 𝔉_i implies
P(A_1 ∩ A_2 ∩ ···) = P(A_1) · P(A_2) ···.
Random variables X_1, X_2, ... are independent iff the σ-fields they span are
independent; the σ-field spanned or generated by X_i is the smallest σ-field
with respect to which X_i is measurable. Sets A, B, ... are independent iff
1_A, 1_B, ... are independent.
(19) Borel-Cantelli Lemma. (a) ∑ P{A_n} < ∞ implies P{lim sup A_n} = 0;
(b) ∑ P{A_n} = ∞ and A_1, A_2, ... independent implies P{lim sup A_n} = 1.
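Both halves of (19) can be observed numerically. In the sketch below (an illustrative setup, not from the text) the A_n are independent with P(A_n) = 1/n² in one run and 1/n in another; the average number of A_n that occur stays bounded in the summable case and grows like the harmonic sum in the divergent case.

```python
import random

def count_occurrences(p_of_n, N=2000, trials=500, seed=1):
    """Average number of independent events A_n, n = 1..N, that occur,
    where P(A_n) = p_of_n(n)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for n in range(1, N + 1) if rng.random() < p_of_n(n))
    return total / trials

# Summable case (part a): sum 1/n^2 < infinity, so only finitely many
# A_n occur; the mean count stays near sum_{n<=N} 1/n^2, about 1.64.
print(count_occurrences(lambda n: 1.0 / n ** 2))
# Divergent case (part b): sum 1/n = infinity with independence, so
# infinitely many A_n occur; the count grows like the harmonic sum.
print(count_occurrences(lambda n: 1.0 / n))
```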
Suppose X_i is a measurable mapping from (Ω, 𝔉) to (Ω_i, 𝔉_i) for i = 1, 2.
The distribution or P-distribution P X_1⁻¹ of X_1 is a probability on 𝔉_1: namely,
(P X_1⁻¹)(A) = P(X_1⁻¹ A) for A ∈ 𝔉_1.
(20) Change of variables formula. If f is a random variable on (Ω_1, 𝔉_1),
then
E[f(X_1)] = ∫_{x_1 ∈ Ω_1} f(x_1)(P X_1⁻¹)(dx_1).
Let X_1 and X_2 be independent: that is, X_1⁻¹𝔉_1 and X_2⁻¹𝔉_2 are. Let 𝔉_1 × 𝔉_2
be the smallest σ-field of Ω_1 × Ω_2 containing all sets A_1 × A_2 with A_i ∈ 𝔉_i.
Let f be a random variable on (Ω_1 × Ω_2, 𝔉_1 × 𝔉_2) such that E[f(X_1, X_2)]
exists.
(21) Fubini's theorem. If X_1 and X_2 are independent,
E[f(X_1, X_2)] = ∫_{x_1 ∈ Ω_1} E[f(x_1, X_2)] P X_1⁻¹(dx_1)
= ∫_{x_2 ∈ Ω_2} E[f(X_1, x_2)] P X_2⁻¹(dx_2).
In particular,
(21a) E(X_1 X_2) = E(X_1) · E(X_2).
† References: (Loève, 1963, Sections 8.2 and 15); (Neveu, 1965, Section IV.4).
Conversely, suppose (Ω_i, 𝔉_i, P_i) are probability triples for i = 1, 2. Let
Ω = Ω_1 × Ω_2 and 𝔉 = 𝔉_1 × 𝔉_2.
(22) Converse Fubini. There is a unique probability P = P_1 × P_2 on
𝔉_1 × 𝔉_2 such that
P(A_1 × A_2) = P_1(A_1) · P_2(A_2)
for A_i ∈ 𝔉_i. Then
7. CONDITIONING†
8. MARTINGALES†
Let T be a subset of the line. For t ∈ T, let 𝔉_t be a sub-σ-field of 𝔉, and let
X_t be an 𝔉_t-measurable function on Ω, with finite expectation. Suppose that
s < t in T implies: 𝔉_s ⊂ 𝔉_t, and for A ∈ 𝔉_s,
∫_A X_s dP ≤ ∫_A X_t dP.
† References: (Doob, 1953, Chapter VII, Sections 1-4); (Loève, 1963, Section 29);
(Neveu, 1965, Section IV.5).
Suppose T = {0, 1, ...}. Let τ be a random variable on (Ω, 𝔉, P), which
is ∞ or in T. Suppose {τ = t} ∈ 𝔉_t for all t ∈ T. Then τ is admissible,
or a stopping time. The pre-τ σ-field 𝔉_τ is the σ-field of A ∈ 𝔉 such that
A ∩ {τ = t} ∈ 𝔉_t for all t ∈ T.
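A concrete stopping time, for illustration only: on a simple random-walk martingale, let τ be the first t with |X_t| = 3, truncated at a fixed horizon so that τ is bounded. Whether τ = t is settled by X_0, ..., X_t alone, as the definition requires, and optional stopping then gives E(X_τ) = E(X_0).

```python
import random

def stopped_mean(trials=20000, n_max=50, seed=2):
    """Simple random walk X_t (a martingale); tau = first t with |X_t| = 3,
    truncated at n_max so tau is a bounded stopping time. Deciding whether
    tau has arrived uses only the path up to the present, never the future."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = 0
        for t in range(n_max):
            if abs(x) == 3:
                break          # tau = t; stop without peeking ahead
            x += rng.choice((-1, 1))
        total += x             # X at time tau (or at the horizon n_max)
    return total / trials

# Optional stopping: E(X_tau) = E(X_0) = 0, up to simulation error.
print(stopped_mean())
```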
(28) Theorem. Suppose T = {0, 1, ...}. Suppose {X_t, 𝔉_t: t ∈ T} is a
martingale, and τ is admissible.
∫_{|Z_τ| > k} |Z_τ| dP ≤ k⁻¹ M E(|X_n|) + ∫_{|X_n| > M} |X_n| dP.
∫_{A_m} X_τ dP = ∫_{A_m} X_m dP = ∫_{A_m} X_n dP.
Sum out m = 0, ..., n to prove (i). For (ii), put A = {X_τ ≥ 0} or {X_τ < 0} to
see E(|Z_τ|) ≤ E(|X_n|), so
For (iii), let A ∈ 𝔉_τ. Let A_m = {A and τ = m}. Then A_m ∈ 𝔉_m. If m ≤ n,
∫_{A_m} X_τ dP = ∫_{A_m} X_m dP = ∫_{A_m} X_n dP.
∫_{A and τ ≤ n} X_τ dP = ∫_{A and τ ≤ n} X_n dP.
Now
∫_A X_0 dP = ∫_A X_n dP
= ∫_{A and τ ≤ n} X_τ dP + ∫_{A and τ > n} X_n dP.
lim_{n→∞} ∫_{A and τ ≤ n} X_τ dP = ∫_A X_τ dP.
∫_{A and τ > n} X_n dP → 0.
*
Example. Let the Y_n be independent and identically distributed, each
being 0 or 2 with probability ½ each. Let X_0 = 1 and let X_n = Y_1 ··· Y_n for
n ≥ 1. Let τ be the least n if any with X_n = 0, and τ = ∞ if none. Then {X_n}
is a nonnegative martingale, τ is a stopping time, E(τ) < ∞, and X_τ = 0
almost surely. This martingale was proposed by David Gilat. *
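Gilat's example is easy to simulate. The sketch below estimates E(τ) and checks that X_τ = 0 on every path, even though E(X_n) = 1 for each fixed n.

```python
import random

def gilat_paths(trials=20000, seed=3):
    """Y_n i.i.d., 0 or 2 with probability 1/2 each; X_n = Y_1 ... Y_n with
    X_0 = 1; tau = least n with X_n = 0. Returns (mean tau, mean X_tau)."""
    rng = random.Random(seed)
    tau_total, stopped_total = 0, 0
    for _ in range(trials):
        x, n = 1, 0
        while x != 0:
            n += 1
            x *= rng.choice((0, 2))   # multiply by Y_n
        tau_total += n
        stopped_total += x
    return tau_total / trials, stopped_total / trials

mean_tau, mean_x_tau = gilat_paths()
# tau is geometric with parameter 1/2, so E(tau) = 2 < infinity,
# yet X_tau = 0 almost surely, while E(X_n) = 1 for every fixed n.
print(mean_tau, mean_x_tau)
```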
Example. There is a martingale, and a stopping time with finite mean,
which satisfy (28b) but not (28a).
DISCUSSION. For n ≥ 1, let
a_n = (n − 1)²/log(3 + n) and b_{n+1} = a_{n+1}² − a_n².
Let b_1 = 1. Check b_n > 0 for all n. Let Y_n be N(0, b_n), and let Y_1, Y_2, ... be
independent. Let X_n = Y_1 + ··· + Y_n, so X_n is a martingale. For n ≥ 1, let
so
342 APPENDIX [10
≤ ∫ E{|S_{τ(y)}|} θ(dy)
= c ∫ a_{τ(y)} θ(dy) where c = E[|Y_1|]
= ∞.
Continuing,
∫_{τ > n} |X_n| dP = ∫_{τ > n} E{|y + S_n|} θ(dy)
≤ ∫_{τ > n} [|y| + E{|S_n|}] θ(dy)
= ∫_{τ > n} |y| θ(dy) + c a_n · θ{τ > n}
→ 0.
*
Suppose T = [0, ∞). Let τ be a random variable on (Ω, 𝔉, P), which is ∞
or in T. Suppose {τ < t} ∈ 𝔉_t for all t ∈ T. Then τ is admissible, or a
stopping time.
∫_{A and τ_n ≤ t} X_{τ_n} dP = ∫_{A and τ_n ≤ t} X_t dP.
∫_{A and τ ≤ t} X_τ dP = ∫_{A and τ ≤ t} X_t dP.
Now
∫_A X_0 dP = ∫_A X_t dP
= ∫_{A and τ ≤ t} X_t dP + ∫_{A and τ > t} X_t dP
= ∫_{A and τ ≤ t} X_τ dP + ∫_{A and τ > t} X_t dP.
(31) {A and τ_n ≤ t} ∈ 𝔉_t
for a typical A ∈ 𝔉_m with A ⊂ {τ_n = m}. I will argue inductively that for
M ≥ m,
(32) ∫_{A and σ > M} X_M dP ≤ ∫_{A and σ > M} X_{M+1} dP
= ∫_{A and σ = M+1} X_{M+1} dP + ∫_{A and σ > M+1} X_{M+1} dP
= ∫_{A and σ = M+1} X_σ dP + ∫_{A and σ > M+1} X_{M+1} dP.
This proves (32). Now (32) is even truer without the rightmost term. Drop it,
and let M increase to ∞ to get (31). *
If s_1, ..., s_N is a sequence of real numbers, and a < b are real numbers, the
number of downcrossings of [a, b] by s_1, ..., s_N is the largest positive integer
k for which there exist integers 1 ≤ n_1 < n_2 < ··· < n_{2k} ≤ N with
s_{n_1} ≥ b, s_{n_2} ≤ a, ..., s_{n_{2k−1}} ≥ b, s_{n_{2k}} ≤ a.
If no such k exists, the number of downcrossings is 0.
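The definition translates directly into code. A greedy scan (look for a value ≥ b, then one ≤ a, and repeat) attains the largest such k, so the sketch below counts downcrossings exactly.

```python
def downcrossings(s, a, b):
    """Number of downcrossings of [a, b] by the sequence s, per the
    definition above. The greedy scan alternates between hunting for a
    term >= b and then a term <= a; each completed pair is one crossing."""
    assert a < b
    k, hunting_high = 0, True
    for x in s:
        if hunting_high and x >= b:
            hunting_high = False       # found s_{n_{2j-1}} >= b
        elif not hunting_high and x <= a:
            hunting_high = True        # found s_{n_{2j}} <= a
            k += 1
    return k

print(downcrossings([2, 0, 3, -1, 2], a=0.5, b=1.5))  # two downcrossings
```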
For (35-36), let P and Q be two probabilities on 𝔉. Then Ω divides up into
three 𝔉-sets, Ω_P and Ω_e and Ω_Q, such that P(Ω_Q) = Q(Ω_P) = 0. Set
dP/dQ = ∞ on Ω_P,
and let dP/dQ be the Radon-Nikodym derivative of P with respect to Q
on Ω_e. This function is 𝔉-measurable and unique up to changes on (P + Q)-
null sets. Let {𝒜_n} be a nondecreasing or nonincreasing sequence of σ-
fields. In the former case, let 𝒜_∞ be the σ-field generated by the union of the
𝒜_n. In the latter case, let 𝒜_∞ be the intersection of the 𝒜_n. For any measure
R, let R_n be the retraction of R to 𝒜_n. Define dP_n/dQ_n like dP/dQ above, with
𝒜_n replacing 𝔉. Thus, dP_n/dQ_n is an 𝒜_n-measurable function to [0, ∞],
unique up to changes on (P_n + Q_n)-null sets.
9. METRIC SPACES†
(37) Example. Let Ω be Euclidean n-space Rⁿ. Let ρ be the usual distance:
ρ(x, y) = ‖x − y‖ and ‖u‖² = ∑_{i=1}^n u_i². Then Ω is complete and separable.
(38) Example. Let Ω be the rationals. There is no way to metrize the usual
topology so that Ω is complete.
(39c) for each A_2 ∈ 𝔉_2, the function ω → Q[X_1(ω), A_2] is a version of
P{X_2 ∈ A_2 | X_1⁻¹𝔉_1}.
Condition (c) can be rephrased as follows: if A_i ∈ 𝔉_i, then
(39d) ∫_{A_1} Q(x_1, A_2)(P X_1⁻¹)(dx_1) = P{X_1 ∈ A_1 and X_2 ∈ A_2}.
You only have to check this for generating classes of A_1's and A_2's closed
under intersection, by (16). Sometimes Q(x_1, ·) is called a regular con-
ditional P-distribution for X_2 given X_1 = x_1.
Suppose Q is a regular conditional P-distribution for X_2 given X_1. Let
φ be a measurable mapping from (Ω_2, 𝔉_2) to (Ω_φ, 𝔉_φ). Let Q_φ(x_1, ·) be
the Q(x_1, ·)-distribution of φ. Make sure that Q_φ(x_1, ·) is a probability
on 𝔉_φ.
(40) Lemma. Q_φ is a regular conditional P-distribution for φ(X_2) given X_1.
EXAMPLE. Let Ω_1 = Ω_2 = Ω = [0, 1]. Let X_1(ω) = X_2(ω) = ω. Let
𝔉_1 be the Borel σ-field of [0, 1]. Let λ be Lebesgue measure on 𝔉_1. Let B
be a non-Lebesgue-measurable set, namely λ_*(B) < λ*(B). Let 𝔉_2 = 𝔉 be
the σ-field generated by 𝔉_1 and B. Extend λ to a probability P on 𝔉. You
can do this so P(B) is any number in the interval [λ_*(B), λ*(B)]. There is no
regular conditional P-distribution for X_2 given X_1. For suppose Q were
such an object. Theorem (51) below produces a P-null set N ∈ 𝔉_1, such that
Q(ω, ·) is point mass at ω for ω ∉ N. In particular,
∫_Ω f dP = ∫_{Ω_1} ∫_{Ω_2} f(x_1, x_2) Q(x_1, dx_2) P_1(dx_1).
P(A) = ∫_{Ω_1} Q(x_1, A(x_1)) P_1(dx_1)
for A ∈ 𝔉_1 × 𝔉_2: as before, A(x_1) is the x_1-section of A, namely the set of
x_2 with (x_1, x_2) ∈ A. Check that P is a probability satisfying (a) and (b).
The integration formula now holds with f = 1_A; both sides are linear and
continuous under increasing passages to the limit. *
Regular conditional distributions given Σ
In the book, the usual case is: Ω_1 = Ω and 𝔉_1 ⊂ 𝔉 and X_1(ω) = ω.
Then, a regular conditional P-distribution for X_2 given X_1 is called a regular
conditional P-distribution for X_2 given 𝔉_1. The next theorem (43) embodies
the main advantage of regular distributions. It is easy to prove, and intuitive:
it says that when you condition on a σ-field Σ, you can put any Σ-measurable
function U equal to a typical value u, and then substitute U for u when
you're through conditioning. That is, U is truly constant given Σ. However,
example (48) shows that something a little delicate happened.
I will state (44) in its most popular form. The notation will be used through
(50). Let (Ω, 𝔉, P) be the basic probability triple. Let Σ be a sub-σ-field of 𝔉.
Let U be a measurable mapping from (Ω, Σ) to a new space (Ω_U, 𝔉_U). Let V
be a measurable mapping from (Ω, 𝔉) to a new space (Ω_V, 𝔉_V). Thus, U is
Σ-measurable and V is 𝔉-measurable. The situation is summarized in
Figure 1. Let Q be a regular conditional P-distribution for V given Σ, so Q
Figure 1. [Σ ⊂ 𝔉; V⁻¹𝔉_V ⊂ 𝔉; F: Ω_U × Ω_V → [0, ∞), and F is
𝔉_U × 𝔉_V-measurable.]
(45) ∫_A ∫_{Ω_V} F(U(ω), v) Q(ω, dv) P(dω) = ∫_A F(U(ω), V(ω)) P(dω).
I know
∫_S Q(ω, C) P(dω) = P{S and V ∈ C}
for S ∈ Σ and C ∈ 𝔉_V. Rewrite this with {A and U ∈ B} in place of S, where
B is a variable element of 𝔉_U. This is legitimate because U is Σ-measurable.
I now have (45) for a special F:
F(u, v) = 1_B(u) · 1_C(v).
Both sides of (45) are linear in F, and continuous under increasing passages
to the limit. Use (47) below. *
is in F when B ∈ 𝔉_U and C ∈ 𝔉_V. Then F consists of all the nonnegative measur-
able functions on (Ω_U × Ω_V, 𝔉_U × 𝔉_V).
(48) Example. Suppose U = V is uniform on [0, 1]. Let F(u, v) be 1 or 0
according as u = v or u ≠ v. Then F(U, V) = 1 almost surely, so
E{F(U, V) | U} = 1
Theorem (49) sharpens (44). To state it and (50), let φ be a measurable
mapping from (Ω_U × Ω_V, 𝔉_U × 𝔉_V) to some new space (Ω_φ, 𝔉_φ). Tempora-
rily, fix ω ∈ Ω and u ∈ Ω_U. Then φ(u, ·) is a measurable mapping from
(Ω_V, 𝔉_V) to (Ω_φ, 𝔉_φ). And Q(ω, ·) is a probability on 𝔉_V. So I am entitled
to define D(ω, u, ·) as the Q(ω, ·)-distribution of φ(u, ·); this comes out to a
probability on 𝔉_φ. Let R(ω, ·) = D(ω, U(ω), ·), again a probability on 𝔉_φ.
(49) Theorem. R(·, ·) is a regular conditional P-distribution for φ(U, V)
given Σ.
PROOF. Let A ∈ Σ and B ∈ 𝔉_φ. I have to check that
= ∫_A ∫_{Ω_V} F[U(ω), v] Q(ω, dv) P(dω).
Recognize
∫_{Ω_V} F[U(ω), v] Q(ω, dv) = Q(ω, {v: φ[U(ω), v] ∈ B})
= R(ω, B). *
The situation is more tractable when Σ and V are independent, which will
be assumed in (50). Let D(u, ·) be the P-distribution of φ(u, ·), a probability
on 𝔉_φ for each u ∈ Ω_U.
Use Fubini (21) to evaluate the left side. Keep (Ω, 𝔉, P) for the basic prob-
ability triple. Put (Ω, Σ) for (Ω_1, 𝔉_1), with X_1(ω) = ω. Put (Ω_V, 𝔉_V) for
(Ω_2, 𝔉_2), with X_2(ω) = V(ω). Let f(ω, v) = 1 if ω ∈ A and φ[U(ω), v] ∈ B;
otherwise, let f(ω, v) = 0. Let P_0 be P retracted to Σ. Then
P{A and φ(U, V) ∈ B} = ∫_Ω f[X_1(ω), V(ω)] P(dω)
= D[U(ω), B] for ω ∈ A.
(51) Theorem. Let Σ be countably generated. Then the set of ω such that
Q(ω, Σ(ω)) = 1 is a Σ-set of P-probability 1.
PROOF. Let 𝒜 be a countable generating algebra for Σ. For each A ∈ 𝒜,
let A* be the set of ω such that Q(ω, A) = 1_A(ω). Then A* is a Σ-set of P-
probability 1, and the intersection of A* as A varies over 𝒜 is the set de-
scribed in the theorem. *
For (52), do not assume that Σ is countably generated. Let G be the smallest
σ-field over which ω → Q(ω, A) is measurable, for all A ∈ 𝔉. Thus, G ⊂ Σ.
Let E be the set of ω such that
Q(ω, {ω′: Q(ω′, ·) = Q(ω, ·)}) = 1.
(52) Theorem. Suppose 𝔉 is countably generated.
(a) G is countably generated.
(b) E ∈ G.
(c) P(E) = 1.
PROOF. Let 𝒜 be a countable generating algebra for 𝔉. Then G is also
the smallest σ-field over which ω → Q(ω, A) is measurable, for all A ∈ 𝒜, by
the monotone class argument (Section 5). This proves (a). As (18) now implies,
G(ω) = {ω′: Q(ω′, A) = Q(ω, A) for all A ∈ 𝒜}.
Of course, Q is a regular conditional P-probability given G. Finally, (51)
proves (b) and (c). *
Regular conditional distributions for partially defined random
variables
Let (Ω, 𝔉, P) be the basic probability triple, and let Σ be a sub-σ-field of 𝔉.
Let D ∈ Σ. Let V be a measurable mapping from (D, DΣ) to a new space
(Ω_V, 𝔉_V). As usual, DΣ is the σ-field of subsets of D of the form D ∩ S with
S ∈ Σ. A regular conditional P-distribution for V given Σ on D is a function
Q of pairs (ω, B) with ω ∈ D and B ∈ 𝔉_V, such that:
Q(ω, ·) is a probability on 𝔉_V for each ω ∈ D;
Q(·, B) is DΣ-measurable for each B ∈ 𝔉_V; and
for all A ∈ Σ with A ⊂ D and all B ∈ 𝔉_V. Of course, A and B can be confined
to generating classes in the sense of (16). The partially defined situation is
isomorphic to a fully defined one. Replace Ω by D, and 𝔉 by D𝔉, and Σ by
DΣ, and P by P{·|D}. Theorems like (44) can therefore be used in partially
defined situations.
Conditional independence
Let (Ω, 𝔉) and (Ω_i, 𝔉_i) be Borel. Let P be a probability on 𝔉, and X_i a
measurable mapping from (Ω, 𝔉) to (Ω_i, 𝔉_i). Let Σ be a sub-σ-field of 𝔉.
What does it mean to say X_1 and X_2 are conditionally P-independent given
Σ? The easiest criterion is
P{X_1 ∈ A_1 and X_2 ∈ A_2 | Σ} = P{X_1 ∈ A_1 | Σ} · P{X_2 ∈ A_2 | Σ} a.e.
for all A_i ∈ 𝔉_i. Nothing is changed if A_i is confined to a generating class for
𝔉_i in the sense of (16). Here is an equivalent criterion. Let Q(·, ·) be a regular
conditional P-distribution for (X_1, X_2) given Σ. Then Q(ω, ·) is a probability
on 𝔉_1 × 𝔉_2. The variables X_1, X_2 are conditionally P-independent given
Σ iff for P-almost all ω,
Q(ω, ·) = Q_1(ω, ·) × Q_2(ω, ·)
where Q_i(ω, ·) is the projection of Q(ω, ·) onto 𝔉_i. Necessarily, Q_i is a regular
conditional P-distribution for X_i given Σ. The equivalence of these condi-
tions is easy, using (10a).
(53) Theorem. There is a unique probability P on (Ω, 𝔉) with P π_n⁻¹ = P_n
for all n.
Let Z be the positive integers. Let S be the set of strictly increasing functions
from Z to Z. Call s ∈ S a subsequence of Z. For s ∈ S, the range of s is the
s-image s(Z) of Z; and s(n) ≥ n. Say s is a subsequence or on special occasions
a sub-subsequence of t ∈ S iff s ∈ S and for each n ∈ Z, there is a σ(n) ∈ Z with
s(n) = t[σ(n)]. This well-defines σ, and forces σ ∈ S. Further, s(n) ≥ t(n),
because σ(n) ≥ n. Geometrically, s ∈ S is a subsequence of t ∈ S iff the range
of s is a subset of the range of t. Thus, if s is a subsequence of t, and t is a
subsequence of u ∈ S, then s is a subsequence of u. If s ∈ S, and m = 0, 1, ...,
define s(m + ·) ∈ S as follows:
s(m + ·)(n) = s(m + n) for n ∈ Z.
Of course, s(m + ·) is a subsequence of s.
Here is a related notion. Say s is eventually a subsequence of t ∈ S iff
s ∈ S and s(m + ·) is a subsequence of t for some m = 0, 1, .... Geometrically,
s ∈ S is eventually a subsequence of t ∈ S iff the range of s differs by a finite
set from a subset of the range of t. In particular, if s is eventually a subsequence
of t, and t is eventually a subsequence of u ∈ S, then s is eventually a sub-
sequence of u.
To state the first diagonal principle, let s_1 ∈ S and let s_{n+1} be a subsequence
of s_n for n = 1, 2, .... Let d be the diagonal sequence:
d(n) = s_n(n) for n = 1, 2, ....
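The diagonal construction can be carried out mechanically. In the sketch below the nested rows are s_m(n) = n + m, as in Figure 2 (an illustrative choice, not forced by the text); the diagonal is d(n) = s_n(n) = 2n, and d is eventually a subsequence of each s_m in the geometric sense above.

```python
def diagonal(seqs):
    """First diagonal principle: given s_1 and nested subsequences
    s_{n+1} of s_n, return d with d(n) = s_n(n)."""
    return lambda n: seqs[n - 1](n)

def is_subsequence(s, t, upto=50):
    """Geometric test on an initial segment: range(s) inside range(t)."""
    t_range = {t(n) for n in range(1, 10 * upto)}
    return all(s(n) in t_range for n in range(1, upto))

# s_m(n) = n + m: each row drops the first term of the row above.
seqs = [lambda n, m=m: n + m for m in range(1, 60)]
d = diagonal(seqs)                     # d(n) = s_n(n) = 2n
print([d(n) for n in range(1, 5)])     # the diagonal 2, 4, 6, 8, ...
# d(m + .) is a subsequence of s_m, so d is eventually a subsequence
# of every row; here the shifted diagonal d(3 + .) sits inside s_3.
print(is_subsequence(lambda n: d(n + 3), seqs[2]))
```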
lim_{n→∞} x[t(n)] = y ∈ Ω.
n:    1  2  3  4
s_1:  2  3  4  5
s_2:  3  4  5  6
s_3:  4  5  6  7
s_4:  5  6  7  8
Figure 2.
*
13. CLASSICAL LEBESGUE MEASURE
lim_{h→0} ∫_{Rⁿ} |f(x + h) − f(x)| λ_n(dx) = 0.
Let f be a real-valued function on [0, 1]. Let S = {s_0, s_1, ..., s_n} be a finite
subset of [0, 1] with 0 = s_0 < s_1 < ··· < s_n = 1. Let
ΔS = max {(s_{j+1} − s_j): j = 0, ..., n − 1},
¹ Reference: (Saks, 1964, Theorem 6.1 on p. 117). Theorem 10.2 on p. 129 is the n-
dimensional generalization, which is harder.
and
Sf = ∑_{j=0}^{n−1} |f(s_{j+1}) − f(s_j)|.
Let W(S, f) = ∑_{j=0}^{n−1} (M_j − m_j), where
M_j = max {f(t): s_j ≤ t ≤ s_{j+1}} and m_j = min {f(t): s_j ≤ t ≤ s_{j+1}}.
The variation of f is sup_S Sf; if this number is finite, f is of bounded variation.
If S_n is nondecreasing and ΔS_n ↓ 0, then S_n f tends to the variation of f; so
W(S_n, f) must tend to the variation of f also.
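For a concrete f, the partition sums Sf can be computed directly. The sketch below uses f(x) = |x − 1/2| (an illustrative choice, not from the text), whose variation on [0, 1] is 1; dyadic partitions S_k with ΔS_k ↓ 0 give S_k f → 1.

```python
def partition_sum(f, S):
    """Sf = sum of |f(s_{j+1}) - f(s_j)| over a finite partition
    0 = s_0 < ... < s_n = 1 of [0, 1]."""
    return sum(abs(f(S[j + 1]) - f(S[j])) for j in range(len(S) - 1))

f = lambda x: abs(x - 0.5)     # bounded variation; total variation is 1
for k in (1, 4, 10):
    S = [j / 2 ** k for j in range(2 ** k + 1)]   # dyadic partition, mesh 2^-k
    print(k, partition_sum(f, S))
```

Since f is piecewise monotone with a kink at 1/2, every dyadic partition already attains the full variation 1 here; for wilder f the sums increase strictly to the supremum.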
(60) Lebesgue's theorem.¹ If f is of bounded variation, then f has a finite
derivative Lebesgue almost everywhere.
Theorem (60) can be sharpened as follows.
(61) Theorem.² Suppose f is of bounded variation. The pointwise derivative
of f is a version of the Radon-Nikodym derivative of the absolutely continuous
part of f, with respect to Lebesgue measure.
Even more is true.
(62) de la Vallée Poussin's theorem.³ Suppose f is of bounded variation. The
positive, continuous, singular part of f is concentrated on {x: f′(x) = ∞}.
ASSUMPTION. For the rest of this section, assume f is a continuous
function on [0, 1].
¹ References: (Saks, 1964, Theorem 5.4 on p. 115); (Riesz-Nagy, 1955, Chapter 1) has a
proof from first principles. This theorem is hard.
² References: (Dunford and Schwartz, 1958, III.12); (Saks, 1964, Theorem 7.4 on
p. 119). It's hard.
³ Reference: (Saks, 1964, Theorem 9.6 on p. 127). Theorems (60-62) are hard.
⁴ Reference: (Saks, 1964, Theorem 6.4 on p. 280).
14. REAL VARIABLES
Figure 3. [The horizontal axis shows the points 0, a, and x_0.]
where
M_{n,j} = max {f(t): j/2ⁿ ≤ t ≤ (j + 1)/2ⁿ}
m_{n,j} = min {f(t): j/2ⁿ ≤ t ≤ (j + 1)/2ⁿ}. *
The upper right Dini derivative D*f is defined by:
D*f(x) = lim sup_{ε↓0} [f(x + ε) − f(x)]/ε, for 0 ≤ x < 1.
(64) Zygmund's theorem.¹ If the set of values assumed by f on the set of x
with D*f(x) ≤ 0 includes no proper interval, then f is nondecreasing.
PROOF. Use (66). *
(68) Corollary. If D*f ≡ 0 on [0, 1), then f is constant on [0, 1].
PROOF. Use (67). *
(69) Theorem. Suppose f has a finite, right continuous, right derivative f⁺
on (0, 1), which has a finite integral over (ε, 1 − ε) for any ε > 0. If 0 < x < y < 1,
then
f(y) − f(x) = ∫_x^y f⁺(t) dt.
PROOF. Let
g(t) = f(t) − f(x) − ∫_x^t f⁺(u) du.
Then g is continuous and D*g = 0 on [x, 1), while g(x) = 0. Use (68). *
Miscellany
Let μ be a probability on the Borel subsets of [0, ∞). Its Laplace transform
φ is this function of nonnegative λ:
φ(λ) = ∫_{[0,∞)} e^{−λx} μ(dx).
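For a distribution whose transform is known in closed form, the definition can be checked by Monte Carlo. The sketch below takes μ exponential with mean 1 (an assumed example, not from the text), for which φ(λ) = 1/(1 + λ).

```python
import math
import random

def laplace_mc(lam, trials=100000, seed=4):
    """Monte Carlo estimate of phi(lambda) = E[e^{-lambda X}] for X
    exponential with mean 1, an assumed example law mu on [0, infinity)."""
    rng = random.Random(seed)
    return sum(math.exp(-lam * rng.expovariate(1.0))
               for _ in range(trials)) / trials

# For the exponential(1) law, phi(lambda) = 1 / (1 + lambda).
for lam in (0.5, 1.0, 2.0):
    print(lam, laplace_mc(lam), 1.0 / (1.0 + lam))
```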
(n choose m) = n!/(m!(n − m)!),
the number of subsets with m elements which can be chosen from a set with
n elements. Temporarily, let
and
imply
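The binomial formula above agrees with a brute-force count of subsets, which is quick to verify:

```python
import math
from itertools import combinations

def binom(n, m):
    """n! / (m! (n - m)!) straight from the formula in the text."""
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

# Agrees with the library value and with a direct enumeration of the
# m-element subsets of an n-element set.
n, m = 7, 3
print(binom(n, m), math.comb(n, m), sum(1 for _ in combinations(range(n), m)))
```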
f ∘ g(t) = ∫_0^{g(t)} f′(u) du
= ∫ f′[g(t)] g(dt)
= ∫ f′[g(t)] g′(t) dt.
The first line holds by (61); the second by (20), for the g-distribution of g is
uniform on [0, 1]; and the third by (61) and (11). Now use (61) and (11) to
differentiate. *
(72) Theorem. Suppose the upper or lower right Dini derivative of f is
finite at all but a countable number of points, and f is of bounded variation.
Then f is absolutely continuous.
PROOF. Use (62). *
16. CONVEX FUNCTIONS
As in Figure 4,
(73) [f(y′) − f(x′)]/(y′ − x′) ≥ [f(y) − f(x)]/(y − x).
Figure 4. [Points a < x < x′ < y < y′ < b on the horizontal axis.]
Indeed, the case x = x′ restates the definition of convexity, as does the
case y = y′. General (73) follows by combining the two cases: the slope of
f over (x, y) is at most the slope over (x, y′), which does not exceed the
slope over (x′, y′).
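Inequality (73) is easy to test numerically for a particular convex f, say f(x) = x² (an illustrative choice, not from the text):

```python
def slope(f, x, y):
    """Slope of the chord of f over (x, y)."""
    return (f(y) - f(x)) / (y - x)

f = lambda x: x * x      # convex on the whole line
# Enlarging the interval to the right, then shifting its left end right,
# can only increase the chord slope, exactly as (73) asserts.
x, xp, y, yp = 0.1, 0.3, 0.6, 0.9
print(slope(f, x, y) <= slope(f, x, yp) <= slope(f, xp, yp))
```

For f(x) = x² the chord slope over (x, y) is simply x + y, so the chain of inequalities reduces to 0.7 ≤ 1.0 ≤ 1.2.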
(74) Theorem. Suppose f is convex on (a, b).
(a) f is continuous.
(b) f has a finite right derivative f⁺, which is nondecreasing and con-
tinuous from the right.
(c) f has a finite left derivative f⁻, which is nondecreasing and con-
tinuous from the left.
(d) f⁺ ≥ f⁻.
(e) The discontinuity sets of f⁺ and f⁻ coincide, and are countable.
Off this set, f⁺ = f⁻.
(f) For a < x < y < b,
PROOF. Claim (b). Let y decrease to x > w. The slope of f over (x, y)
nonincreases and is at least the slope over (w, x) by (73), proving that
(75) f⁺(x) exists and is at most the slope of f over (x, y), and at least the
slope over (w, x).
Figure 5. [Points a < x < y < z < b on the horizontal axis.]
You can use (73) to show that f⁺ is nondecreasing. I will argue that f⁺ is
continuous from the right at x. Let
a < x < y < z < b,
as in Figure 5.
Then f⁺(x) ≤ f⁺(y). But f⁺(y) is at most the slope of f over (y, z) by (75),
which tends to the slope of f over (x, z) when y tends to x. That is,
Let
λ(t) = s(t − x) + f(x),
the linear function with slope s which agrees with f at x. Then
λ ≤ f on (a, b).
In particular, a convex function on a finite interval is bounded below.
PROOF. Let x < t < b: the other case is symmetric. Then
s ≤ f⁺(x) ≤ [f(t) − f(x)]/(t − x)
by (75); look at Figure 6. So
f(t) ≥ s(t − x) + f(x). *
(b) f(y) = f(x) + ∫_x^y g(t) dt when a < x < y < b;
then f is convex.
Figure 6.
PROOF. Condition (a) implies (b) with g = f⁺, by (69). Suppose (b). Fix
x, y, z with
a < x < y < z < b.
Abbreviate c for the slope of f over (x, y), and d for the slope of f over (y, z).
Then
c = [1/(y − x)] ∫_x^y g(t) dt ≤ [1/(z − y)] ∫_y^z g(t) dt = d. *
LARS V. AHLFORS (1953; 2nd ed., 1965). Complex Analysis, McGraw-Hill, New York.
DAVID BLACKWELL (1954). On a class of probability spaces, Proc. 3rd Berk. Symp.,
Vol. 2, pp. 1-6.
DAVID BLACKWELL (1955). On transient Markov processes with a countable number
of states and stationary transition probabilities, Ann. Math. Statist., Vol. 26, pp.
654-658.
DAVID BLACKWELL (1958). Another countable Markov process with only instan-
taneous states, Ann. Math. Statist., Vol. 29, pp. 313-316.
DAVID BLACKWELL (1962). Representation of nonnegative martingales on transient
Markov chains, Mimeograph, Statistics Department, University of California at
Berkeley.
DAVID BLACKWELL and LESTER DUBINS (1963). A converse to the dominated conver-
gence theorem, Illinois J. Math., Vol. 7, pp. 508-514.
DAVID BLACKWELL and DAVID A. FREEDMAN (1964). The tail σ-field of a Markov
chain and a theorem of Orey, Ann. Math. Statist., Vol. 35, pp. 1291-1295.
DAVID BLACKWELL and DAVID FREEDMAN (1968). On the local behavior of Markov
transition probabilities, Ann. Math. Statist., Vol. 39, pp. 2123-2127.
DAVID BLACKWELL and DAVID KENDALL (1964). The Martin boundary for Pólya's
urn scheme and an application to stochastic population growth, J. Appl. Proba-
bility, Vol. 1, pp. 284-296.
R. M. BLUMENTHAL (1957). An extended Markov property, Trans. Amer. Math. Soc.,
Vol. 85, pp. 52-72.
R. M. BLUMENTHAL and R. K. GETOOR (1968). Markov Processes and Potential
Theory, Academic Press, New York.
R. M. BLUMENTHAL, R. GETOOR, and H. P. MCKEAN, Jr. (1962). Markov processes
with identical hitting distributions, Illinois J. Math., Vol. 6, pp. 402-420.
LEO BREIMAN (1968). Probability, Addison-Wesley, Reading.
D. BURKHOLDER (1962). Transient processes and a problem of Blackwell, Mimeo-
graph, Statistics Department, University of California at Berkeley.
D. BURKHOLDER (1962). Successive conditional expectations of an integrable function,
Ann. Math. Statist., Vol. 33, pp. 887-893.
KAI LAI CHUNG (1960; 2nd ed., 1967). Markov Chains with Stationary Transition
Probabilities, Springer, Berlin.
KAI LAI CHUNG (1963). On the boundary theory for Markov chains, I, Acta. Math.,
Vol. 110, pp. 19-77.
KAI LAI CHUNG (1966). On the boundary theory for Markov chains, II, Acta. Math.,
Vol. 115, pp. 111-163.
BIBLIOGRAPHY
KAI LAI CHUNG and W. H. J. FUCHS (1951). On the distribution of values of sums of
random variables, Mem. Amer. Math. Soc., no. 6.
R. COGBURN and H. G. TUCKER (1961). A limit theorem for a function of the incre-
ments of a decomposable process, Trans. Amer. Math. Soc., Vol. 99, pp. 278-284.
HARALD CRAMER (1957). Mathematical Methods of Statistics, Princeton University
Press.
ABRAHAM DE MOIVRE (1718). The Doctrine of Chances, Pearson, London; Chelsea,
New York (1967).
C. DERMAN (1954). A solution to a set of fundamental equations in Markov chains,
Proc. Amer. Math. Soc., Vol. 5, pp. 332-334.
W. DOEBLIN (1938). Sur deux problemes de M. Kolmogorov concernant les chaines
denombrables, Bull. Soc. Math. France, Vol. 66, pp. 210-220.
W. DOEBLIN (1939). Sur certains mouvements aleatoires discontinus, Skand. Akt.,
Vol. 22, pp. 211-222.
MONROE D. DONSKER (1951). An invariance principle for certain probability limit
theorems, Mem. Amer. Math. Soc., no. 6.
J. L. DOOB (1942). Topics in the theory of Markoff chains, Trans. Amer. Math. Soc.,
Vol. 52, pp. 37-64.
J. L. DOOB (1945). Markoff chains-denumerable case, Trans. Amer. Math. Soc., Vol.
58, pp. 455-473.
J. L. DOOB (1953). Stochastic Processes, Wiley, New York.
J. L. DOOB (1959). Discrete potential theory and boundaries, J. Math. Mech., Vol. 8,
pp. 433-458, 993.
J. L. DOOB (1968). Compactification of the discrete state space of a Markov process,
Z. Wahrscheinlichkeitstheorie, Vol. 10, pp. 236-251.
LESTER E. DUBINS and DAVID A. FREEDMAN (1964). Measurable sets of measures,
Pac. J. Math., Vol. 14, pp. 1211-1222.
LESTER E. DUBINS and DAVID A. FREEDMAN (1965). A sharper form of the Borel-Cantelli lemma and the strong law, Ann. Math. Statist., Vol. 36, pp. 800-807.
LESTER E. DUBINS and GIDEON SCHWARZ (1965). On continuous martingales, Proc.
Nat. Acad. Sci. USA, Vol. 53, pp. 913-916.
NELSON DUNFORD and JACOB T. SCHWARTZ (1958). Linear Operators, Part I, Wiley,
New York.
A. DVORETZKY, P. ERDŐS, and S. KAKUTANI (1960). Nonincrease everywhere of the
Brownian motion process, Proc. 4th Berk. Symp., Vol. 2, pp. 103-116.
E. B. DYNKIN (1965). Markov Processes, Springer, Berlin.
P. ERDŐS and M. KAC (1946). On certain limit theorems of the theory of probability,
Bull. Amer. Math. Soc., Vol. 52, pp. 292-302.
WILLIAM FELLER (1945). On the integro-differential equations of purely discontinuous
Markoff processes, Trans. Amer. Math. Soc., Vol. 48, pp. 488-515.
WILLIAM FELLER (1956). Boundaries induced by nonnegative matrices, Trans. Amer.
Math. Soc., Vol. 83, pp. 19-54.
WILLIAM FELLER (1957). On boundaries and lateral conditions for the Kolmogoroff
differential equations, Ann. of Math., Vol. 65, pp. 527-570.
WILLIAM FELLER (1959). Non-Markovian processes with the semigroup property,
Ann. Math. Statist., Vol. 30, pp. 1252-1253.
WILLIAM FELLER (1961). A simple proof for renewal theorems, Comm. Pure Appl.
Math., Vol. 14, pp. 285-293.
WILLIAM FELLER (1966). An introduction to probability theory and its applications,
Vol. 2, Wiley, New York.
WILLIAM FELLER (1968). An introduction to probability theory and its applications,
Vol. 1, 3rd ed., Wiley, New York.
WILLIAM FELLER and H. P. MCKEAN, Jr. (1956). A diffusion equivalent to a countable
Markov chain, Proc. Nat. Acad. Sci. USA, Vol. 42, pp. 351-354.
R. GETOOR (1965). Additive functionals and excessive functions, Ann. Math. Statist.,
Vol. 36, pp. 409-423.
G. H. HARDY, J. E. LITTLEWOOD, and G. PÓLYA (1934). Inequalities, Cambridge
University Press.
T. E. HARRIS (1952). First passage and recurrence distributions, Trans. Amer. Math.
Soc., Vol. 73, pp. 471-486.
T. E. HARRIS and H. ROBBINS (1953). Ergodic theory of Markov chains admitting an
infinite invariant measure, Proc. Nat. Acad. Sci. USA, Vol. 39, pp. 860-864.
P. HARTMAN and A. WINTNER (1941). On the law of the iterated logarithm, Amer.
J. Math., Vol. 63, pp. 169-176.
FELIX HAUSDORFF (1957). Set Theory, Chelsea, New York.
EDWIN HEWITT and L. J. SAVAGE (1955). Symmetric measures on Cartesian products,
Trans. Amer. Math. Soc., Vol. 80, pp. 470-501.
E. HEWITT and K. STROMBERG (1965). Real and Abstract Analysis, Springer,
Berlin.
G. A. HUNT (1956). Some theorems concerning Brownian motion, Trans. Amer.
Math. Soc., Vol. 81, pp. 294-319.
G. A. HUNT (1957). Markoff processes and potentials, 1, 2, 3, Illinois J. Math., Vol. 1,
pp. 44-93; Vol. 1, pp. 316-369; Vol. 2, pp. 151-213 (1958).
G. A. HUNT (1960). Markoff chains and Martin boundaries, Illinois J. Math., Vol. 4,
pp. 313-340.
K. ITÔ and H. P. MCKEAN, Jr. (1965). Diffusion Processes and Their Sample Paths,
Springer, Berlin.
W. B. JURKAT (1960). On the analytic structure of semigroups of positive matrices,
Math. Zeit., Vol. 73, pp. 346-365.
A. A. JUSKEVIC (1959). Differentiability of transition probabilities of a homogeneous
Markov process with countably many states, Moskov. Gos. Univ. Ucenye Zapiski,
No. 186, pp. 141-159; in Russian. Reviewed in Math. Rev. No. 3124 (1963).
M. KAC (1947). On the notion of recurrence in discrete stochastic processes, Bull.
Amer. Math. Soc., Vol. 53, pp. 1002-1010.
S. KAKUTANI (1943). Induced measure preserving transformations, Proc. Imp.
Acad. Tokyo, Vol. 19, pp. 635-641.
J. G. KEMENY and J. L. SNELL (1960). Finite Markov Chains, Van Nostrand, Prince-
ton.
J. G. KEMENY, J. SNELL, and A. W. KNAPP (1966). Denumerable Markov Chains,
Van Nostrand, Princeton.
A. KHINTCHINE (1924). Ein Satz der Wahrscheinlichkeitsrechnung, Fund. Math.,
Vol. 6, pp. 9-20.
DANIEL RAY (1967). Some local properties of Markov processes, Proc. 5th Berk.
Symp., Vol. 2, part 2, pp. 201-212.
G. E. H. REUTER (1957). Denumerable Markov processes and the associated contraction semigroups on l, Acta Math., Vol. 97, pp. 1-46.
G. E. H. REUTER (1959). Denumerable Markov processes, J. London Math. Soc.,
Vol. 34, pp. 81-91.
G. E. H. REUTER (1969). Remarks on a Markov chain example of Kolmogorov, Z.
Wahrscheinlichkeitstheorie, Vol. 13, pp. 315-320.
F. RIESZ and B. SZ.-NAGY (1955). Functional Analysis, Ungar, New York.
B. A. ROGOZIN (1961). On an estimate of the concentration function, Theor. Proba-
bility Appl., Vol. 6, pp. 94-96.
H. L. ROYDEN (1963). Real Analysis, Macmillan, New York.
STANISLAW SAKS (1964). Theory of the Integral, 2nd rev. ed., Dover, New York.
JAMES SERRIN and D. E. VARBERG (1969). A general chain rule for derivatives and the
change of variables formula for the Lebesgue integral, Amer. Math. Monthly, Vol.
76, pp. 514-520.
A. SKOROKHOD (1965). Studies in the Theory of Random Processes, Addison-Wesley,
Reading.
GERALD SMITH (1964). Instantaneous states of Markov processes, Trans. Amer.
Math. Soc., Vol. 110, pp. 185-195.
J. M. O. SPEAKMAN (1967). Two Markov chains with a common skeleton, Z.
Wahrscheinlichkeitstheorie, Vol. 7, p. 224.
FRANK SPITZER (1956). A combinatorial lemma and its applications to probability
theory, Trans. Amer. Math. Soc., Vol. 82, pp. 323-339.
FRANK SPITZER (1964). Principles of Random Walk, Van Nostrand, Princeton.
VOLKER STRASSEN (1964). An invariance principle for the law of the iterated logarithm,
Z. Wahrscheinlichkeitstheorie, Vol. 3, pp. 211-226.
VOLKER STRASSEN (1966). A converse to the law of the iterated logarithm, Z.
Wahrscheinlichkeitstheorie, Vol. 4, pp. 265-268.
VOLKER STRASSEN (1966a). Almost sure behavior of sums of independent random
variables and martingales, Proc. 5th Berk. Symp., Vol. 2, part 1, pp. 315-343.
H. F. TROTTER (1958). A property of Brownian motion paths, Illinois J. Math., Vol. 2,
pp. 425-433.
A. WALD (1944). On cumulative sums of random variables, Ann. Math. Statist., Vol.
15, pp. 283-296.
N. WIENER (1923). Differential space, J. Math. and Phys., Vol. 2, pp. 131-174.
DAVID WILLIAMS (1964). On the construction problem for Markov chains, Z. Wahr-
scheinlichkeitstheorie, Vol. 3, pp. 227-246.
DAVID WILLIAMS (1966). A new method of approximation in Markov chain theory
and its application to some problems in the theory of random time substitution,
Proc. Lond. Math. Soc. (3), Vol. 16, pp. 213-240.
DAVID WILLIAMS (1967). Local time at fictitious states, Bull. Amer. Math. Soc., Vol.
73, pp. 542-544.
DAVID WILLIAMS (1967a). A note on the Q-matrices of Markov chains, Z. Wahrscheinlichkeitstheorie, Vol. 7, pp. 116-121.
DAVID WILLIAMS (1967b). On local time for Markov chains, Bull. Amer. Math. Soc.,
Vol. 73, pp. 432-433.
HELEN WITTENBERG (1964). Limiting distributions of random sums of independent
random variables, Z. Wahrscheinlichkeitstheorie, Vol. 3, pp. 7-18.
A. ZYGMUND (1959). Trigonometric Series, Cambridge University Press.
Additional references
INDEX

absolute continuity, 333, 345, 360, 362
Ahlfors, 365
almost sure statements, 334
analytic functions, 365
arcsine law, 82, 93
atoms, 204, 334, 352
backward equation, 150, 243ff, 325; see "forward equation"
Banach, 358
Banach algebra, 147, 151
binomial distribution
  bound on tails, 64
  concentration of, 104ff
  maximal term, 273
Blackwell, 39, 111, 204, 297, 334
Blackwell and Freedman, 266
Blackwell and Kendall, 131
blocks, 15, 76
Blumenthal, 237
boundary, 111, 124, 293
Brownian motion, 95
category
  of set of infinities, 311
  of singularities in P(·, a, b), 266
central limit theorem, 82
change of variables, 337
Chebychev, 332
Chung, 48, 82, 83, 98, 145, 146, 218, 223, 237, 243, 245, 266, 300, 314, 323, 325
Chung and Fuchs, 38
class
  alternative description, 19
  communicating, 17, 40
  cyclically moving, 18, 40
closed sets, 346
concentration function, 80, 99ff
  Kolmogorov's inequality on, 104
  of a sum tends to 0, 102
conditional
  distribution, 347
  expectation, probability, 338, 347
construction of a Markov chain
  from its visiting process and holding times, 172ff
  which moves through its states in order, 173ff
  which moves through the rationals, in order, 180, 202
  with given generator, 154ff, 197ff, 237ff, 165ff
  with given transitions, stable states and regular sample functions, 221
  with given transitions and quasi-regular sample functions, 307
  with sample functions which are initially step functions and are then constant, 154ff
  with sample functions which are step functions, 154ff, 165ff
continuity
  absolute, of transitions, 266
  in probability of a chain, 221, 312
  of pre-t sigma-field in t, 204
  of transitions, 143
convergence
  almost sure, 334
  in a metric space, 346
  in L¹, 334
  in probability, 255, 333
  of G(i, ·)/pG(·), 124
  of states to the boundary, 124
SYMBOL FINDER

DESCRIPTION
I've listed here the symbols with some degree of permanence; the list is
not complete, and local usage is sometimes different. The listing is alphabetical,
first English then Greek; I give a quick definition, if possible, and
a page reference for the complete definition.
Sections 10.1-3 discuss notation and references.
ENGLISH
C, Cd: index sets, Chapter 6 only, page 184
Cd[m, ω] = Cd[ω]: set, Chapter 6 only, page 184
C(i): communicating class containing i, page 17
Cr(i): cyclically moving class, page 18
cX: concentration function of X, page 99
eP: expected number of visits, page 19
e"P : expected number of visits, page 49
ePH: expected number of visits, page 34
eP{i}: expected number of visits, page 47
E is expectation
Ei is Pi-expectation
E: set, Chapter 4 only, page 118
ℰ: exchangeable σ-field, Chapter 1 only, page 39
ℰ: equivalent to invariant σ-field, Chapter 4 only, page 118
fP: hitting probability, page 19
f"P: hitting probability, page 19
fPH: hitting probability, page 34
f × ν: measure, Chapter 2 only, page 51
ℱτ: pre-τ sigma-field, page 11
ℱ(τ): pre-τ sigma-field, page 203
ℱ(τ+): pre-τ sigma-field
GREEK