
David Freedman

MARKOV CHAINS

With 40 Figures

Springer-Verlag
New York Heidelberg Berlin
David Freedman
Department of Statistics
University of California
Berkeley, CA 94720
U.S.A.

AMS Subject Classifications: 60J10, 60J27

Library of Congress Cataloging in Publication Data


Freedman, David, 1938-
Markov chains.
Originally published: San Francisco: Holden-Day,
1971 (Holden-Day series in probability and statistics)
Bibliography: p.
Includes index.
1. Markov processes. I. Title. II. Series:
Holden-Day series in probability and statistics.
QA274.7.F74 1983 519.2'33 82-19577

The original version of this book was published by Holden-Day, Inc. in 1971.

© 1971 by Holden-Day Inc.


© 1983 by David A. Freedman
Softcover reprint of the hardcover 1st edition 1983
All rights reserved. No part of this book may be translated or reproduced in any
form without written permission from Springer-Verlag, 175 Fifth Avenue, New
York, N.Y. 10010, U.S.A.

9 8 7 6 5 4 3 2 1

ISBN-13: 978-1-4612-5502-4 e-ISBN-13: 978-1-4612-5500-0


DOI: 10.1007/978-1-4612-5500-0
TO WILLIAM FELLER
PREFACE

A long time ago I started writing a book about Markov chains, Brownian
motion, and diffusion. I soon had two hundred pages of manuscript and my
publisher was enthusiastic. Some years and several drafts later, I had a
thousand pages of manuscript, and my publisher was less enthusiastic. So
we made it a trilogy:

Markov Chains
Brownian Motion and Diffusion
Approximating Countable Markov Chains

familiarly - MC, B & D, and ACM.


I wrote the first two books for beginning graduate students with some
knowledge of probability; if you can follow Sections 10.4 to 10.9 of Markov
Chains you're in. The first two books are quite independent of one another,
and completely independent of the third. This last book is a monograph
which explains one way to think about chains with instantaneous states. The
results in it are supposed to be new, except where there are specific disclaim-
ers; it's written in the framework of Markov Chains.
Most of the proofs in the trilogy are new, and I tried hard to make them
explicit. The old ones were often elegant, but I seldom saw what made them
go. With my own, I can sometimes show you why things work. And, as I will

argue in a minute, my demonstrations are easier technically. If I wrote them down well enough, you may come to agree.
The approach in all three books is constructive: I did not use the notion
of separability for stochastic processes and in general avoided the uncount-
able axiom of choice. Separability is a great idea for dealing with any really
large class of processes. For Markov chains I find it less satisfactory. To begin
with, a theorem on Markov chains typically amounts to a statement about a
probability on a Borel σ-field. It's a shame to have the proof depend on the
existence of an unnamable set. Also, separability proofs usually have two
parts. There is an abstract part which establishes the existence of a separable
version. And there is a combinatorial argument, which establishes some prop-
erty of the separable version by looking at the behavior of the process on a
countable set of times. If you take the constructive approach, the combina-
torial argument alone is enough proof.
When I started writing, I believed in regular conditional distributions. To
me they're natural and intuitive objects, and the first draft was full of them. I
told it like it was, and if the details were a little hard to supply, that was the
reader's problem. Eventually I got tired of writing a book intelligible only
to me. And I came to believe that in most proofs, the main point is estimat-
ing a probability number: the fewer complicated intermediaries, the better.
So I switched to computing integrals by Fubini. This is a more powerful tech-
nique than you might think and it makes for proofs that can be checked.
Virtually all the conditional distributions were banished to the Appendix.
The major exception is Chapter 4 of Markov Chains, where the vividness of
the conditional distribution language compensates for its technical difficulty.
In Markov Chains, Chapters 3 to 6 and 8 cover material not usually available in textbooks - for instance: invariance principles for functionals of a
Markov chain; Kolmogorov's inequality on the concentration function; the
boundary, with examples; and the construction of a variety of continuous-
time chains from their jump processes and holding times. Some of these con-
structions are part of the folklore, but I think this is the first careful public
treatment.
Brownian Motion and Diffusion dispenses with most of the customary
transform apparatus, again for the sake of computing probability numbers
more directly. The chapter on Brownian motion emphasizes topics which
haven't had much textbook coverage, like square variation, the reflection
principle, and the invariance principle. The chapter on diffusion shows how
to obtain the process from Brownian motion by changing time.
I studied with the great men for a time, and saw what they did. The trilogy
is what I learned. All I can add is my recommendation that you buy at least
one copy of each book.

User's guide to Markov Chains

In one semester, you can cover Sections 1.1-9, 5.1-3, 7.1-3 and 9.1-3. This
gets you the basic results for both discrete and continuous time. In one year
you could do the whole book, provided you handle Chapters 4, 6, and 8
lightly. Chapters 2-4, 6 and 8 are largely independent of one another, treat
specialized topics, and are more difficult; Section 8.5 is particularly hard.
I do recommend looking at Section 6.6 for some extra grip on Markov times.
Sections 10.1-3 explain the cruel and unusual notation, and the reference
system; 10.4-9 review probability theory quickly; 10.10-17 do the more ex-
otic analyses which I've found useful at various places in the trilogy; and
a few things are in 10.10-17 just because I like them.
Chapter 10 is repeated in B & D; Chapters 1, 5, 7 and 10 are repeated in
ACM. The three books have a common preface and bibliography. Each has
its own index and symbol finder.

Acknowledgments

Much of the trilogy is an exposition of the work of other mathematicians, who sometimes get explicit credit for their ideas. Writing Markov Chains
would have been impossible without constant reference to Chung (1960).
Doob (1953) and Feller (1968) were also heavy involuntary contributors.
The diffusion part of Brownian Motion and Diffusion is a peasant's version
of Ito and McKean (1965).
The influence of David Blackwell, Lester Dubins and Roger Purves will
be found on many pages, as will that of my honored teacher, William Feller.
Ronald Pyke and Harry Reuter read large parts of the manuscript and made
an uncomfortably large number of excellent suggestions, many of which I
was forced to accept. I also tested drafts on several generations of graduate
students, who were patient, encouraging and helpful. These drafts were
faithfully typed from the cuneiform by Gail Salo.
The Sloan Foundation and the US Air Force Office of Scientific Research
supported me for various periods, always generously, while I did the writ-
ing. I finished two drafts while visiting the Hebrew University in Jerusalem,
Imperial College in London, and the University of Tel Aviv. I am grateful
to the firm of Cohen, Leithman, Kaufman, Yarosky and Fish, criminal law-
yers and xerographers in Montreal. And I am still nostalgic for Cohen's Bar
in Jerusalem, the caravansary where I wrote the first final draft of Approxi-
mating Countable Markov Chains.
David Freedman
Berkeley, California
July, 1970

Preface to the Springer edition

My books on Markov Chains, Brownian Motion and Diffusion, and Approximating Countable Markov Chains, were first published in the early
1970's, and have not been readily available since then. However, there still
seems to be some substantial interest in them, perhaps due to their construc-
tive and set-theoretic flavor, and the extensive use of concrete examples. I
am pleased that Springer-Verlag has agreed to reprint the books, making
them available again to the scholarly public. I have taken the occasion to
correct many small errors, and to add a few references to new work.

David Freedman
Berkeley, California
September, 1982
TABLE OF CONTENTS

Part I. Discrete time


1. INTRODUCTION TO DISCRETE TIME
1. Foreword 1
2. Summary 4
3. The Markov and strong Markov properties 7
4. Classification of states 16
5. Recurrence 19
6. The renewal theorem 22
7. The limits of P^n 25
8. Positive recurrence 26
9. Invariant probabilities 29
10. The Bernoulli walk 32
11. Forbidden transitions 34
12. The Harris walk 36
13. The tail σ-field and a theorem of Orey 39
14. Examples 45

2. RATIO LIMIT THEOREMS


1. Introduction 47
2. Reversal of time 48
3. Proofs of Derman and Doeblin 50
4. Variations 53
5. Restricting the range 59

6. Proof of Kingman-Orey 64
7. An example of Dyson 70
8. Almost everywhere ratio limit theorems 73
9. The sum of a function over different j-blocks 75

3. SOME INVARIANCE PRINCIPLES


1. Introduction 82
2. Estimating the partial sums 83
3. The number of positive sums 87
4. Some invariance principles 95
5. The concentration function 99

4. THE BOUNDARY
1. Introduction 111
2. Proofs 113
3. A convergence theorem 121
4. Examples 124
5. The last visit to i before the first visit to J\{i} 132

Part II. Continuous time


5. INTRODUCTION TO CONTINUOUS TIME
1. Semigroups and processes 138
2. Analytic properties 142
3. Uniform semigroups 147
4. Uniform substochastic semigroups 150
5. The exponential distribution 152
6. The step function case 154
7. The uniform case 165

6. EXAMPLES FOR THE STABLE CASE


1. Introduction 172
2. The first construction 173
3. Examples on the first construction 179
4. The second construction 181
5. Examples on the second construction 197
6. Markov times 203
7. Crossing the infinities 210

7. THE STABLE CASE


1. Introduction 216
2. Regular sample functions 217
3. The post-exit process 223
4. The strong Markov property 229
5. The minimal solution 237
6. The backward and forward equations 243

8. MORE EXAMPLES FOR THE STABLE CASE


1. An oscillating semigroup 252
2. A semigroup with an infinite second derivative 260
3. Large oscillations in P(t, 1, 1) 266
4. An example of Speakman 271
5. The embedded jump process is not Markov 273
6. Isolated infinities 292
7. The set of infinities is bad 295

9. THE GENERAL CASE


1. An example of Blackwell 297
2. Quasiregular sample functions 299
3. The sets of constancy 308
4. The strong Markov property 315
5. The post-exit process 323
6. The abstract case 326

Part III.

10. APPENDIX
1. Notation 329
2. Numbering 330
3. Bibliography 330
4. The abstract Lebesgue integral 331
5. Atoms 334
6. Independence 337
7. Conditioning 338
8. Martingales 339
9. Metric spaces 346
10. Regular conditional distributions 347

11. The Kolmogorov consistency theorem 353


12. The diagonal argument 354
13. Classical Lebesgue measure 356
14. Real variables 357
15. Absolute continuity 360
16. Convex functions 361
17. Complex variables 365

BIBLIOGRAPHY 367

INDEX 373

SYMBOL FINDER 379


1

INTRODUCTION TO DISCRETE TIME

1. FOREWORD

Consider a stochastic process which moves through a countable set I of states. At stage n, the process decides where to go next by a random mechanism which depends only on the current state, and not on the previous history or even on the time n. These processes are called Markov chains with stationary transitions and countable state space. They are the object of study in the first part of this book. More formally, there is a countable set of states I, and a stochastic process X_0, X_1, … on some probability triple (𝒳, ℱ, 𝒫), with X_n(x) ∈ I for all nonnegative integers n and x ∈ 𝒳. Moreover, there is a function P on I × I such that

𝒫{X_{n+1} = j | X_0, …, X_n} = P(X_n, j).

That is, the conditional distribution of X_{n+1} given X_0, …, X_n depends on X_n, but not on n or on X_0, …, X_{n-1}. The process X is said to be Markov with stationary transitions P, or to have transitions P. Suppose I is reduced to the essential range, namely the set of j with 𝒫{X_n = j} > 0 for some n. Then the transitions P are unique, and form a stochastic matrix. Here is an equivalent characterization: X is Markov with stationary transitions P iff

𝒫{X_n = j_n for n = 0, …, N} = 𝒫{X_0 = j_0} ∏_{n=0}^{N-1} P(j_n, j_{n+1})

for all N and j_n ∈ I. If 𝒫{X_0 = j} = 1 for some j ∈ I, then X is said to start

I want to thank Richard Olshen for checking the final draft of this chapter.

from j or to have starting state j. This involves no real loss in generality, as one sees by conditioning on X_0.
(1) Definition. A stochastic matrix P on I is a function on I × I, such that:

P(i,j) ≥ 0 for all i and j in I;

and

Σ_{j∈I} P(i,j) = 1 for all i in I.

If P and Q are stochastic matrices on I, so is PQ, where

(PQ)(i, k) = Σ_{j∈I} P(i,j)Q(j, k).

And so are the powers P^n, where P^1 = P and P^{n+1} = PP^n.
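Definition (1) translates directly into code. Here is a minimal Python sketch (the two-state matrix and all names are mine, for illustration only) of the row-sum condition, the product PQ, and the powers P^n.

```python
# A sketch of Definition (1): stochastic matrices over a finite state set,
# stored as dicts of dicts. The state set and entries are illustrative.

def is_stochastic(P, states, tol=1e-12):
    """P(i,j) >= 0 for all i, j, and each row sums to 1."""
    return all(
        all(P[i][j] >= 0 for j in states)
        and abs(sum(P[i][j] for j in states) - 1.0) < tol
        for i in states
    )

def matmul(P, Q, states):
    """(PQ)(i,k) = sum over j in I of P(i,j) Q(j,k)."""
    return {i: {k: sum(P[i][j] * Q[j][k] for j in states) for k in states}
            for i in states}

def power(P, n, states):
    """P^n, with P^1 = P and P^(n+1) = P P^n."""
    R = P
    for _ in range(n - 1):
        R = matmul(P, R, states)
    return R

states = [0, 1]
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
assert is_stochastic(P, states)
assert is_stochastic(power(P, 3, states), states)  # powers stay stochastic
```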
Here are three examples: let Y_n be independent and identically distributed, taking the values 1 and −1 with equal probability 1/2.

(2) Example. Let X_0 = 1. For n = 1, 2, …, let X_n = Y_n. Then {X_n} is a Markov chain with state space I = {−1, 1} and stationary transitions P, where P(i,j) = 1/2 for all i and j in I. The starting state is 1.

(3) Example. Let X_0 = 0. For n = 1, 2, …, let X_n = X_{n−1} + Y_n. Then {X_n} is a Markov chain with the integers for state space and stationary transitions P, where

P(n, n+1) = P(n, n−1) = 1/2
P(n, m) = 0 when |n − m| ≠ 1.

The starting state is 0.

(4) Example. Let X_n = (Y_n, Y_{n+1}) for n = 0, 1, …. Then {X_n} is a Markov chain with state space I and stationary transitions P, where I is the set of pairs (a, b) with a = ±1 and b = ±1, and

P[(a, b), (c, d)] = 0 when b ≠ c
                  = 1/2 when b = c.

By contrast, let Z_n = Y_n + Y_{n+1}. Now Z_n is a function of X_n. But {Z_n} is not Markov.
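The contrast in Example (4) can be checked by exact enumeration. In the Python sketch below (names mine), Z_n = Y_n + Y_{n+1}; if {Z_n} were Markov, the conditional law of Z_3 given Z_2 could not depend on Z_1, but it does.

```python
# Brute-force check that Z_n = Y_n + Y_(n+1) is not Markov: enumerate all
# equally likely values of (Y1, Y2, Y3, Y4) and condition on (Z2, Z1).

from itertools import product
from fractions import Fraction

outcomes = list(product([1, -1], repeat=4))  # (Y1, Y2, Y3, Y4), each 1/16

def cond_prob(z3, z2, z1):
    """P{Z3 = z3 | Z2 = z2, Z1 = z1} under the uniform law on outcomes."""
    given = [y for y in outcomes if y[0] + y[1] == z1 and y[1] + y[2] == z2]
    hit = [y for y in given if y[2] + y[3] == z3]
    return Fraction(len(hit), len(given))

# Z2 = 0 in both cases, but the answer depends on Z1:
assert cond_prob(2, 0, 2) == 0             # after Z1 = 2, Y3 must be -1
assert cond_prob(2, 0, -2) == Fraction(1, 2)
```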
Return to the general Markov chain X with stationary transitions. For
technical reasons, it is convenient to study the distribution of X rather than
X itself. The formal exposition begins in Section 3 by describing these distri-
butions. This will be repeated here, with a brief explanation of how to translate
the results back into statements about X. Introduce the space I^∞ of I-sequences. That is, I^∞ is the set of functions ω from the nonnegative integers to I. For n = 0, 1, …, define the coordinate function ξ_n on I^∞ by

ξ_n(ω) = ω(n) for ω ∈ I^∞.

Then ξ_0, ξ_1, … is the coordinate process. Give I^∞ the smallest σ-field σ(I^∞) over which each coordinate function is measurable. Thus, σ(I^∞) is generated by the cylinders

{ξ_0 = i_0, …, ξ_n = i_n}.

For any i ∈ I and stochastic matrix P on I, there is one and only one probability P_i on I^∞ making the coordinate process Markov with stationary transitions P and starting state i. In other terms:

P_i{ξ_n = i_n for n = 0, …, N} = ∏_{n=0}^{N-1} P(i_n, i_{n+1})

for all N and i_n ∈ I with i_0 = i. The probability P_i really does depend only on P and i.
Now I^∞ is the sample space for X, namely the space of all realizations. More formally, there is a mapping M from 𝒳 to I^∞, uniquely defined by the relation

ξ_n(Mx) = X_n(x) for all n = 0, 1, … and x ∈ 𝒳.

That is, the nth coordinate of Mx is X_n(x), and Mx is the sequence of states X passes through at x, namely: (X_0(x), X_1(x), X_2(x), …). Check that M is measurable. Fix i ∈ I and a stochastic matrix P on I. Suppose X is Markov with stationary transitions P and starting state i. With respect to the distribution of X, namely 𝒫M^{-1}, the coordinate process is Markov with stationary transitions P and starting state i. Therefore 𝒫M^{-1} = P_i. Conversely, 𝒫M^{-1} = P_i implies that X is Markov with stationary transitions P and starting state i. Now probability statements about X can be translated into statements about P_i. For example, the following three assertions are all equivalent:

(5a) P_i{ξ_n = i for infinitely many n} = 1.

(5b) For some Markov chain X with stationary transitions P and starting state i,

𝒫{X_n = i for infinitely many n} = 1.

(5c) For all Markov chains X with stationary transitions P and starting state i,

𝒫{X_n = i for infinitely many n} = 1.

Indeed, the set talked about in (5b) is the M-inverse image of the set talked about in (5a); and P_i = 𝒫M^{-1}.
The basic theory of these processes is developed in a rapid but complete

way in Sections 3-9; Sections 10, 12, and 14 present some examples, while
Sections 11 and 13 cover special topics. Readers who want a more leisurely
discussion of the intuitive background should look at (Feller, 1968, XV) or
(Kemeny and Snell, 1960). Here is a summary of Sections 3-9.

2. SUMMARY

The main result in Section 3 is the strong Markov property. To state the best case of it, let the random variable τ on I^∞ take only the values 0, 1, …, ∞. Suppose the set {τ = n} is in the σ-field spanned by ξ_0, …, ξ_n for n = 0, 1, …, and suppose

P_i{τ < ∞ and ξ_τ = j} = 1 for some j ∈ I.

Then the fragment

(ξ_0, …, ξ_τ)

and the process

(ξ_τ, ξ_{τ+1}, …)

are P_i-independent; the P_i-distribution of the process is P_j. This is a special case of the strong Markov property.

(6) Illustration. Let τ be the least n with ξ_n = j, and τ = ∞ if there is no such n; the assumption above is P_i{τ < ∞} = 1.
To state the results of Section 4, write:

i → j iff P^n(i,j) > 0 for some n = 1, 2, …;
i ↔ j iff i → j and j → i;
i is essential iff i → j implies j → i.
(7) Illustration. Suppose I = {1, 2, 3, 4} and P is this matrix:

    1    0    0    0
    0    0    1    0
    0    1    0    0
    1/4  1/4  1/4  1/4

Then 1, 2, 3 are essential and 4 is inessential. Moreover, 1 ↔ 1 while 2 ↔ 3.
For the rest of this summary,

suppose all i ∈ I are essential.



Then ↔ is an equivalence relation. For the rest of this summary, suppose that I consists of one equivalence class, namely,

suppose i → j and j → i for all i and j in I.

Let period i be the greatest common divisor (g.c.d.) of the set of n > 0 with P^n(i, i) > 0. Then period i does not depend on i; say it is d. And I is the disjoint union of sets C_0, C_1, …, C_{d−1}, such that

i ∈ C_n and P(i,j) > 0 imply j ∈ C_{n⊕1},

where ⊕ means addition modulo d.


(8) Illustration. Suppose I = {1, 2, 3, 4} and P is this matrix:

    0    0    1/2  1/2
    0    0    1/2  1/2
    1/2  1/2  0    0
    1/2  1/2  0    0

Then I has period 2, and C_0 = {1, 2} and C_1 = {3, 4}.
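The period is computable: take the g.c.d. of the n up to some cutoff with P^n(i,i) > 0. A Python sketch, on a two-block chain of the same shape as the one in Illustration (8) (entries and names mine):

```python
# Period of a state: g.c.d. of {n > 0 : P^n(i,i) > 0}, scanning n <= nmax.
# The chain below alternates between {1,2} and {3,4}, so every period is 2.

from math import gcd

def matmul(P, Q, states):
    return {i: {k: sum(P[i][j] * Q[j][k] for j in states) for k in states}
            for i in states}

def period(P, i, states, nmax=50):
    d, Pn = 0, P
    for n in range(1, nmax + 1):
        if Pn[i][i] > 0:
            d = gcd(d, n)        # gcd(0, n) = n starts the accumulation
        Pn = matmul(P, Pn, states)
    return d

states = [1, 2, 3, 4]
h = 0.5
P = {1: {1: 0, 2: 0, 3: h, 4: h},
     2: {1: 0, 2: 0, 3: h, 4: h},
     3: {1: h, 2: h, 3: 0, 4: 0},
     4: {1: h, 2: h, 3: 0, 4: 0}}
assert all(period(P, i, states) == 2 for i in states)
```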
For the rest of the summary,
suppose period i = 1 for all i ∈ I.

To state the result of Section 5, say

i is recurrent iff P_i{ξ_n = i for infinitely many n} = 1;
i is transient iff P_i{ξ_n = i for infinitely many n} = 0.

This classification is exhaustive. Namely, the state i is either recurrent or transient, according as Σ_n P^n(i, i) is infinite or finite. And all i ∈ I are recurrent or transient together. These results follow from the strong Markov property. Parenthetically, under present assumptions: if I is finite, all i ∈ I are recurrent.

(9) Example. Suppose I = {0, 1, 2, …}. Let 0 < p_n < 1. Suppose P(0, 1) = 1, and for n = 1, 2, … suppose P(n, n+1) = p_n and P(n, 0) = 1 − p_n. Suppose all other entries in P vanish; see Figure 1. The states are recurrent or transient according as ∏_n p_n is zero or positive.

HINT. See (16) below.
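For Example (9), starting from 0 the chain steps to 1 and then never returns to 0 only if it survives every transition n → n+1; so the no-return probability is ∏_n p_n. A numerical Python sketch, with two illustrative choices of p_n (mine, not from the text):

```python
# Partial products of p_n approximate the probability of never returning
# to 0 in Example (9): zero product means recurrent, positive means transient.

from math import exp, prod

def no_return_prob(p, nmax):
    """Partial product p(1) * ... * p(nmax)."""
    return prod(p(n) for n in range(1, nmax + 1))

# p_n = n/(n+1): the partial products telescope to 1/(n+1) -> 0, recurrent.
recurrent = no_return_prob(lambda n: n / (n + 1), 10_000)
assert recurrent < 1e-3

# p_n = exp(-2^-n): the product converges to exp(-1) > 0, transient.
transient = no_return_prob(lambda n: exp(-2.0 ** -n), 10_000)
assert abs(transient - exp(-1)) < 1e-9
```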
For the rest of this summary,
suppose all i ∈ I are recurrent.

[Figure 1: the states 0, 1, 2, 3, … in a row, with arrows n → n+1 labeled p_1, p_2, p_3, ….]

To state the result of Section 6, let Y_1, Y_2, … be a sequence of independent, identically distributed random variables, taking only the values 1, 2, 3, … with probabilities p_1, p_2, p_3, …. Let μ = Σ_n n·p_n, and suppose

g.c.d. {n : p_n > 0} = 1.

Let U(m) be the probability that

Y_1 + … + Y_n = m for some n = 0, 1, 2, ….

Then

lim_{m→∞} U(m) = 1/μ.

This result is called the renewal theorem. It is used in Section 7, together with strong Markov, to show that

lim_{n→∞} P^n(i,j) = π(j),

where 1/π(j) is the P_j-expectation of the least m > 0 with ξ_m = j.

To state the result of Section 8, say
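The renewal probabilities satisfy U(0) = 1 and, for m ≥ 1, U(m) = Σ_n p_n U(m − n), by conditioning on the first summand Y_1. A Python sketch (the distribution of Y is mine, chosen aperiodic) shows U(m) settling down to 1/μ:

```python
# Renewal recursion: U(0) = 1 (the empty sum hits 0), and for m >= 1,
# U(m) = sum_n p_n U(m - n). The renewal theorem says U(m) -> 1/mu.

p = {1: 0.2, 2: 0.5, 3: 0.3}              # P{Y = n}; g.c.d. of support is 1
mu = sum(n * pn for n, pn in p.items())   # mu = 2.1

U = [1.0]
for m in range(1, 200):
    U.append(sum(pn * U[m - n] for n, pn in p.items() if n <= m))

assert abs(U[199] - 1 / mu) < 1e-9        # already converged by m = 199
```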

j is positive recurrent iff π(j) > 0;
j is null recurrent iff π(j) = 0.

Then all i ∈ I are either positive recurrent or null recurrent together.

(10) Example. Let I = {0, 1, 2, …}. Let p_n > 0 and Σ_{n=1}^∞ p_n = 1. Let P(0, n) = p_n and P(n, n−1) = 1 for n = 1, 2, …. See Figure 2. The states are positive recurrent or null recurrent according as Σ_{n=1}^∞ n·p_n is finite or infinite.

[Figure 2: the states 0, 1, 2, 3, … in a row, with arrows from 0 labeled p_1, p_2, p_3, …, and arrows n → n−1.]

HINT. See (16) below.


For the rest of this summary,

suppose all i ∈ I are positive recurrent.

To state the result of Section 9, say a measure m on I is invariant iff

m(j) = Σ_{i∈I} m(i)P(i,j) for all j ∈ I.

Recall that π(j) = lim_{n→∞} P^n(i,j). Then π is an invariant probability. And any invariant signed measure which has finite total mass is a scalar multiple of π. A signed measure m has finite mass iff Σ_{i∈I} |m(i)| < ∞.

The results in Sections 3-9 are standard, so references are sparse. The basic results for finite I are usually due to Markov himself. The extension to countable I is usually due to Kolmogorov or Lévy.
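For a finite chain, π can be found by iterating m → mP from any starting probability; the limit satisfies the invariance equation. A Python sketch on an illustrative two-state matrix (mine, not from the text):

```python
# Power iteration for the invariant probability: iterate m -> mP and check
# pi(j) = sum_i pi(i) P(i,j) at the limit.

states = [0, 1]
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def step(m):
    """One application of m -> mP."""
    return {j: sum(m[i] * P[i][j] for i in states) for j in states}

pi = {0: 1.0, 1: 0.0}
for _ in range(200):
    pi = step(pi)

assert all(abs(step(pi)[j] - pi[j]) < 1e-12 for j in states)  # invariant
assert abs(sum(pi.values()) - 1.0) < 1e-12                    # a probability
# Balance for this P gives pi(0) = 5/6, pi(1) = 1/6.
assert abs(pi[0] - 5 / 6) < 1e-9
```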

3. THE MARKOV AND STRONG MARKOV PROPERTIES

Let I be a finite or countably infinite set. Give I the discrete σ-field, that is, the σ-field of all its subsets. Let I^∞ be the space of all I-sequences, namely, functions from the nonnegative integers to I. For ω ∈ I^∞ and n = 0, 1, … let ξ_n(ω) = ω(n). Call ξ the coordinate process. Give I^∞ the product σ-field σ(I^∞), namely, the smallest σ-field such that ξ_0, ξ_1, … are measurable. A matrix P on I is a function (i,j) → P(i,j) from I × I to the real line. Say P is stochastic iff P(i,j) ≥ 0 for all i, j and Σ_j P(i,j) = 1 for all i. Say P is substochastic iff P(i,j) ≥ 0 for all i, j and Σ_j P(i,j) ≤ 1 for all i. Let P be a stochastic matrix on I, and p a probability on I. There is a unique probability P_p on (I^∞, σ(I^∞)) such that for all n and all i_0, …, i_n in I,

(11) P_p{ξ_m = i_m for m = 0, 1, …, n} = p(i_0) ∏_{m=0}^{n−1} P(i_m, i_{m+1});

by convention, an empty product is 1. For example, use the Kolmogorov consistency theorem (10.53). If p{i} = 1, write P_i for P_p.
Sometimes, it is convenient to define P_p even for substochastic P and subprobabilities p. To do this, let I* be the set of all finite I-sequences, including the empty sequence. Give I^∞ ∪ I* the smallest σ-field which contains all subsets of I*, and all sets in σ(I^∞). Then ξ_0, ξ_1, … are partially defined and measurable on I^∞ ∪ I*; namely, ξ_m(ω) = ω(m) is defined provided ω ∈ I^∞, or ω ∈ I* has length at least m + 1. For this purpose, a sequence of length m + 1 is a function from {0, 1, …, m} to I. Then there is still one and only one probability P_p on I^∞ ∪ I* satisfying (11). Of course, P_p may assign quite a lot of mass to the empty sequence.
Let X_0, X_1, … be an I-valued stochastic process on a probability triple (𝒳, ℱ, 𝒫). Then X is a measurable mapping from (𝒳, ℱ) to (I^∞, σ(I^∞)):

[X(x)](n) = X_n(x) for x ∈ 𝒳 and n = 0, 1, ….

The distribution of X is 𝒫X^{-1}, a probability on σ(I^∞). More generally, let X_0, X_1, … be a partially defined, I-valued process on (𝒳, ℱ, 𝒫). That is, X_n is a function from part of 𝒳 to I; and domain X_{n+1} ⊂ domain X_n; and {X_n = i} ∈ ℱ. Then X is a measurable mapping from 𝒳 to I^∞ ∪ I*. And 𝒫X^{-1} resides in I^∞ ∪ I*.
(12) Definition. X_0, X_1, … is a Markov chain with stationary transitions P and starting distribution p iff the distribution of X is P_p. If p{i} = 1, say the chain starts from i, or has starting state i. In particular, for stochastic P and probabilities p, the coordinate process ξ_0, ξ_1, … is a Markov chain with stationary transitions P and starting probability p, on the probability triple (I^∞, σ(I^∞), P_p).

From now on, unless otherwise noted,

(13) Convention. P is a stochastic matrix on the finite or countably infinite set I. And p is a probability on I.
For (14) and later use, define the shift T as this mapping from I^∞ to I^∞:

(Tω)(m) = ω(m + 1) for m = 0, 1, … and ω ∈ I^∞.

For n = 1, 2, …,

(T^n ω)(m) = ω(m + n) for m = 0, 1, … and ω ∈ I^∞.

It is convenient to adopt this definition even for n = 0, so T^0 is the identity function. Thus, T^n ω is ω shifted n times to the left; the first n terms of ω disappear during this transaction. In slightly ambiguous notation,

T^n ω = (ω(n), ω(n + 1), …).

Formally,

ξ_m ∘ T^n = ξ_{m+n},

where ∘ is composition. Then

T^{−n}{ξ_0 = j_0, …, ξ_m = j_m} = {ξ_n = j_0, …, ξ_{n+m} = j_m}.

So T^n is measurable.
Theorem (14) makes an assertion about regular conditional distributions: these objects are discussed in the Appendix. And (14) uses the symbol P_{ξ_n}. This is an abbreviation for a function Q of pairs (ω, B), with ω ∈ I^∞ and B ∈ σ(I^∞), namely:

Q(ω, B) = P_{ξ_n(ω)}(B).

(14) Theorem (Markov property). P_{ξ_n} is a regular conditional P_p-distribution for T^n given ξ_0, …, ξ_n.
PROOF. For ω ∈ I^∞ and B ∈ σ(I^∞), let

Q(ω, B) = P_{ξ_n(ω)}(B).

For each ω, the function B → Q(ω, B) is a probability. For each B, the function ω → Q(ω, B) is measurable on ξ_0, …, ξ_n, because ξ_n is. What I need is

P_p{A and T^n ∈ B} = ∫_A Q(ω, B) P_p(dω)

for all A measurable on ξ_0, …, ξ_n and all measurable B. Both sides of this equality are countably additive in A. If I could prove the equality separately for each piece {A and ξ_n = j}, I could finish by summing out j. But {ξ_n = j} is measurable on ξ_0, …, ξ_n, so {A and ξ_n = j} is the typical subset of {ξ_n = j} measurable on ξ_0, …, ξ_n; therefore, I only have to prove the equality for subsets A of {ξ_n = j} measurable on ξ_0, …, ξ_n. The integrand on the right is now constant, so the integral is

P_p{A} · P_j{B}.

What I have left to prove is this identity:

(15) P_p{A and T^n ∈ B} = P_p{A} · P_j{B}

for all subsets A of {ξ_n = j} which are measurable on ξ_0, …, ξ_n, and all B ∈ σ(I^∞). Consider special A, of the form

{ξ_0 = i_0, …, ξ_n = i_n} with i_n = j;

and special B, of the form

{ξ_0 = j_0, …, ξ_m = j_m}.

Here, i_0, …, i_n, j_0, …, j_m are all in I, and m is variable. Then

{T^n ∈ B} = {ξ_n = j_0, …, ξ_{n+m} = j_m}.

If j_0 ≠ j, both sides of (15) vanish. If j_0 = j, each side of (15) can be computed separately from (11), remembering i_n = j_0 = j, and works out to

p(i_0) · [∏_{ν=0}^{n−1} P(i_ν, i_{ν+1})] · [∏_{ν=0}^{m−1} P(j_ν, j_{ν+1})].

So, I have verified (15) for special A and special B.

Any subset A of {ξ_n = j} which is measurable on ξ_0, …, ξ_n is empty, or a finite or countably infinite union of different special A, which are automatically pairwise disjoint. For each B, both sides of (15) are countably additive in general A. Therefore, (15) holds for general A and special B.

Fix one general A. Both sides of (15) are countably additive in general B, and agree at special B. This is still true if I call the null set and the whole space special. The special B generate the full σ-field σ(I^∞). The intersection of two special B is special: indeed, two different special B are either disjoint or nested. Use (10.16) to complete the proof. *
Let A be a subset of {ξ_n = j} which is measurable on ξ_0, …, ξ_n, and let f be a nonnegative, measurable function on I^∞. Then (15) can be rewritten as

(15*) ∫_A f(T^n ω) P_p(dω) = P_p{A} · ∫ f dP_j.

This is (15) when f is an indicator. As a function of f, each side of (15*) is linear and continuous for nondecreasing passages to the limit.
For (16) and later use, call a transition from i to j possible iff P(i,j) > 0. All transitions in ω are possible iff P[ξ_n(ω), ξ_{n+1}(ω)] > 0 for all n.

(16) Proposition. The set G of ω such that all transitions in ω are possible is measurable, and P_p{G} = 1.

PROOF. Clearly, G = ∩_{n=0}^∞ G_n, where

G_n = {P(ξ_n, ξ_{n+1}) > 0}.

Now

P_i{G_0} = P_i{P(i, ξ_1) > 0}
         = Σ_j {P(i,j) : P(i,j) > 0}
         = Σ_j P(i,j)
         = 1.

Check

T^{−n} G_0 = G_n.

So,

P_p{G_n} = Σ_i P_p{ξ_n = i and G_n}
         = Σ_i P_p{ξ_n = i} P_i{G_0} by (15)
         = 1. *
If P and Q are substochastic matrices on I, so is PQ, where:

PQ(i, k) = Σ_{j∈I} P(i,j)Q(j, k).

If P and Q are stochastic, so is PQ. If P is a substochastic matrix on I, so is P^n, where P^1 = P and P^{n+1} = PP^n = P^nP. If P is stochastic, so is P^n.
(17) Theorem (Semigroup property). For all i ∈ I and n = 1, 2, …,

P_i{ξ_n = j} = P^n(i,j).

PROOF. This is trivial for n = 1. Use induction:

P_i{ξ_{n+1} = k} = Σ_j P_i{ξ_n = j and ξ_{n+1} = k}
                 = Σ_j P_i{ξ_n = j} · P(j, k) by (15)
                 = Σ_j P^n(i,j)P(j, k) by inductive assumption
                 = P^{n+1}(i, k). *

WARNING. This does not characterize Markov chains. See (Feller, 1959).
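Theorem (17) can be checked by simulation: over many runs, the empirical frequency of {ξ_n = j} should approximate P^n(i,j). A Python sketch (the chain, seed, and sample size are mine):

```python
# Monte Carlo check of the semigroup property P_i{xi_n = j} = P^n(i,j).

import random

random.seed(0)
states = [0, 1]
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def matmul(P, Q):
    return {i: {k: sum(P[i][j] * Q[j][k] for j in states) for k in states}
            for i in states}

def run(i, n):
    """Simulate n steps from i, returning xi_n."""
    for _ in range(n):
        i = random.choices(states, weights=[P[i][j] for j in states])[0]
    return i

n, trials = 3, 200_000
P3 = matmul(P, matmul(P, P))
freq = sum(run(0, n) == 1 for _ in range(trials)) / trials
assert abs(freq - P3[0][1]) < 0.01   # P_0{xi_3 = 1} is approximately P^3(0,1)
```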
The strong Markov property (21) strengthens (14). To state it, make the following definitions. A random variable τ on (I^∞, σ(I^∞)) is a Markov time, or just Markov, iff: τ takes only the values 0, 1, …, ∞; and for every n = 0, 1, …, the set {τ ≤ n} is in the σ-field ℱ_n generated by ξ_0, …, ξ_n. The pre-τ sigma-field ℱ_τ is the σ-field of all sets A ∈ σ(I^∞) such that

A ∩ {τ ≤ n} ∈ ℱ_n for every n = 0, 1, ….

NOTES. (a) Suppose τ is a function on I^∞, taking the values 0, 1, …, ∞. Do not assume τ is measurable. Then τ is Markov iff τ(ω) = n and ξ_m(ω′) = ξ_m(ω) for m = 0, …, n force τ(ω′) = n. Indeed, if τ satisfies the condition, then {τ = n} is a union of sets

{ξ_m = i_m for m = 0, …, n} ∈ ℱ_n.

Conversely, if τ is Markov, then {τ = n} ∈ ℱ_n is a union of such sets.

(b) If τ isn't Markov, then ℱ_τ isn't a σ-field; in fact, I^∞ ∉ ℱ_τ.

(c) ℱ_τ specifies the sample function up to and including time τ, when τ is finite. More formally, the atoms of ℱ_τ are the singletons in {τ = ∞}, and all sets

{τ = n and ξ_0 = i_0, …, ξ_n = i_n}.
(d) Suppose τ is Markov, and A is a subset of {τ < ∞}. Do not assume A is measurable. Then A ∈ ℱ_τ iff ω ∈ A and τ(ω) = n and ξ_m(ω) = ξ_m(ω′) for m = 0, …, n force ω′ ∈ A.

(18) Illustration. Let f be a nonnegative function on I, and let τ be the least n if any with Σ_{m=0}^{n} f(ξ_m) ≥ 17; let τ = ∞ if none. Then τ is Markov.
(19) Illustrations. Let τ be a Markov time.

(a) Δ = {τ < ∞} ∈ ℱ_τ.
(b) Any measurable subset of I^∞\Δ, the complement of Δ, is in ℱ_τ.
(c) The time τ is ℱ_τ-measurable.
(d) The sum Σ_{n=0}^{τ} f(ξ_n) is ℱ_τ-measurable, for any function f on I. This includes (c): put f ≡ 1.
(e) This event is in ℱ_τ: the process {ξ_n} visits both i and j on or before time τ, and the first i occurs before the first j.
Let ζ_n = ξ_{τ+n}, defined on Δ = {τ < ∞}. More explicitly,

ζ_n(ω) = ξ_{τ(ω)+n}(ω) for ω ∈ Δ and n = 0, 1, ….

Of course, ζ_n is measurable:

{ζ_n = k} = ∪_{m=0}^∞ {τ = m and ξ_{m+n} = k}.

Let ζ be the whole post-τ process. Informally,

ζ = (ζ_0, ζ_1, …).

Formally, ζ is this mapping of Δ into I^∞:

ζ(ω) = T^{τ(ω)}(ω) for ω ∈ Δ,

where T is the shift, as defined for (14). Verify that

ζ_n = ξ_n ∘ ζ,

so ζ is measurable.
(20) Illustration. ζ_0 = ξ_τ is ℱ_τ-measurable; that is,

{Δ and ζ_0 = i} ∈ ℱ_τ.

Theorem (21) uses the notation P_{ζ_0}. This is an abbreviation for a function Q of pairs (ω, B), with ω ∈ Δ and B ∈ σ(I^∞):

Q(ω, B) = P_{ζ_0(ω)}(B).

(21) Theorem (Strong Markov property). Let τ be Markov and let ζ be the post-τ process. Given ℱ_τ, a regular conditional P_p-distribution for ζ on Δ is P_{ζ_0}.

PROOF. As in (14), I can reduce this to proving

(22) P_p{A and ζ ∈ B} = P_p{A} · P_i{B}

for all A ∈ ℱ_τ with A ⊂ {Δ and ζ_0 = i}, and all B ∈ σ(I^∞). But

{A and ζ ∈ B} = ∪_{n=0}^∞ {A and τ = n and T^n ∈ B}.

Now ξ_n = ξ_τ = ζ_0 = i on {A and τ = n}, and this set is in ℱ_n, because A ∈ ℱ_τ. With this and (15) as props:

P_p{A and ζ ∈ B} = Σ_{n=0}^∞ P_p{A and τ = n and T^n ∈ B}
                 = Σ_{n=0}^∞ P_p{A and τ = n} · P_i{B}
                 = P_p{A} · P_i{B}. *
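Strong Markov can also be seen in simulation. In the Python sketch below (chain, seed, and names mine), τ is the first visit to a state j; the step the chain takes from ξ_τ has distribution P(j, ·), whatever the starting state:

```python
# Simulation sketch of (21): the post-tau process starts afresh from j.

import random

random.seed(2)
states = [0, 1, 2]
P = {0: {0: 0.2, 1: 0.5, 2: 0.3},
     1: {0: 0.4, 1: 0.1, 2: 0.5},
     2: {0: 0.6, 1: 0.3, 2: 0.1}}

def step(i):
    return random.choices(states, weights=[P[i][k] for k in states])[0]

def zeta1(start, j):
    """Run until the first visit to j, then take one more step."""
    i = start
    while i != j:
        i = step(i)
    return step(i)

trials = 100_000
freq = sum(zeta1(0, 2) == 0 for _ in range(trials)) / trials
assert abs(freq - P[2][0]) < 0.01   # the step from xi_tau has law P(2, .)
```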
Let A ∈ ℱ_τ and A ⊂ {τ < ∞ and ξ_τ = j}. Let f be a nonnegative, measurable function on I^∞. Then (22) can be rewritten as

(22*) ∫_A f(ζ(ω)) P_p(dω) = P_p{A} · ∫ f dP_j.

This is (22) when f is an indicator. As a function of f, each side of (22*) is linear and continuous for nondecreasing passages to the limit.
Proposition (23) is preliminary to (24), which illustrates the use of strong
Markov (21). Neither result will be used again until Chapter 7. For (23),
fix j E 1 with P(j,j) < 1. Let qj be the probability on I which assigns mass
to j, and mass P(j, k)f[1 - P(j,j)] to k "e j. Let T be the least n if any with
°
; n "e ;0, and let T = 00 if none. Let Sbe the post-T process: S = TT. Say that
U is geometric with parameter () iff U is u with probability (1 - ())()U for
u = 0, 1, ....
(23) Proposition. With respect to Pi:
T - 1 is geometric with parameter P(j,j);
S is Markov with stationary transitions P and starting probability qi;
T and Sare independent.

PROOF. Let n = 1,2, ... ; let io "e j and let iI' ... , im E I. Then
{~o = j and T = n and So = io, ... , Sm = im }
= {~o = ... = ~n-l = j, ~n = io, .•. , ~n+m = im },
an event of P;-probability

*
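Proposition (23) is easy to try out numerically. The sketch below is mine, not part of the text: it simulates the holding time τ at a starting state j whose self-transition probability P(j,j) = 0.6 is invented for the illustration, and compares the empirical law of τ − 1 with the geometric law.

```python
import random

def holding_time_law(stay_prob, n_trials=20000, seed=0):
    """Empirical law of tau - 1, where tau is the least n with xi_n != xi_0
    and the chain stays put with probability stay_prob at each step."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        tau = 1
        while rng.random() < stay_prob:   # remain at j with probability P(j,j)
            tau += 1
        counts[tau - 1] = counts.get(tau - 1, 0) + 1
    return {u: c / n_trials for u, c in counts.items()}

theta = 0.6                               # P(j,j) for the illustrative chain
freq = holding_time_law(theta)
# (23): tau - 1 is geometric, P{tau - 1 = u} = (1 - theta) * theta**u.
for u in range(4):
    assert abs(freq.get(u, 0.0) - (1 - theta) * theta**u) < 0.02
```

The agreement is within sampling error for 20,000 trials.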

For (24), keep j ∈ I with P(j,j) < 1. Introduce the notion of a j-sequence of ξ; namely, a maximal interval of times n with ξ_n = j. Let C_1, C_2, ... be the cardinalities of the first, second, ... j-sequences of ξ. Let C_{n+1} = 0 if there are n or fewer j-sequences. Let A_N be the event that there are N or more j-sequences in ξ.
(24) Proposition. Given A_N, the variables C_1 − 1, C_2 − 1, ..., C_N − 1 are conditionally P_p-independent and geometrically distributed, with common parameter P(j,j).
PROOF. Fix positive integers N and c_1, ..., c_N. I claim
(25) P_p{C_1 = c_1, ..., C_N = c_N | A_N} = P_j{C_1 = c_1, ..., C_N = c_N | A_N}.
Let
B = {C_1 = c_1, ..., C_N = c_N and A_N}.
Let σ be the least n if any with ξ_n = j, and σ = ∞ if none. Then σ is Markov. Let η be the post-σ process. Now
B ⊂ A_1 = {σ < ∞}
η_0 = j on {σ < ∞}
A_N = {σ < ∞ and η ∈ A_N}
C_n = C_n ∘ η for n = 1, ..., N.
So
B = {σ < ∞ and η ∈ B}.
By strong Markov (21),
(26) P_p{B} = P_p{σ < ∞} · P_j{B}
and
(27) P_p{A_N} = P_p{σ < ∞} · P_j{A_N}.
Divide (26) by (27) to substantiate the claim (25). The case N = 1 is now immediate from (23).
Abbreviate θ = P(j,j) and q = q_j, as defined for (23). I claim
(28) P_j{C_1 = c_1, ..., C_{N+1} = c_{N+1} | A_{N+1}}
= (1 − θ)θ^{c_1−1} P_q{C_1 = c_2, ..., C_N = c_{N+1} | A_N}.
This and (25) prove (24) inductively. To prove (28), let τ be the least n if any with ξ_n ≠ j, and τ = ∞ if none. Let ζ be the post-τ process. On {ξ_0 = j},
C_1 = τ and A_{N+1} = {ζ ∈ A_N}.

On {ξ_0 = j and A_{N+1}},
C_{n+1} = C_n ∘ ζ for n = 1, ..., N.
By (23),
(29) P_j{C_1 = c_1, ..., C_{N+1} = c_{N+1} and A_{N+1}}
= (1 − θ)θ^{c_1−1} P_q{C_1 = c_2, ..., C_N = c_{N+1} and A_N}.
NOTE. Check the indices.
Sum out c_1, ..., c_{N+1}:
(30) P_j{A_{N+1}} = P_q{A_N}.
Divide (29) by (30) to prove (28). *
One of the most useful applications of (21) is (31), a result that goes back
to (Doeblin, 1938). To state it, introduce the notion of an i-block, namely, a finite or infinite I-sequence, which begins with i but contains no further i. The space of i-blocks is a measurable subset of I^∞ ∪ I*; give it the relative σ-field. Let τ_1, τ_2, ... be the times n at which ξ_n visits i. The mth i-block B_m is the sample sequence ξ from τ_m to just before τ_{m+1}, shifted to the left so as to start at time 0. Formally, let τ_1 be the least n if any with ξ_n = i; if none, τ_1 = ∞. Suppose τ_1, ..., τ_m defined. If τ_m = ∞, then τ_{m+1} = ∞. If τ_m < ∞, then τ_{m+1} is the least n > τ_m with ξ_n = i, if any; if none, τ_{m+1} = ∞. On τ_m < ∞, let B_m be the sequence of length τ_{m+1} − τ_m, whose nth term is ξ_{τ_m+n} for 0 ≤ n < τ_{m+1} − τ_m. On τ_m = ∞, let B_m = ∅, the empty sequence. Thus, B_1, B_2, ... are random variables, with values ∅ or i-blocks. Let μ = P_i B_1^{−1}, the P_i-distribution of B_1, a probability on the space of i-blocks.
(31) Theorem. (Blocks). Given B_1, ..., B_{n−1}, where B_{n−1} is a finite i-block, a regular conditional P_p-distribution for B_n is μ.
PROOF. Clearly, τ_n is Markov, and
{B_{n−1} is a finite i-block} = {τ_n < ∞}.
Check that B_1, ..., B_{n−1} are ℱ_{τ_n}-measurable. Let ζ be the post-τ_n process:
ζ = T^{τ_n} on {τ_n < ∞}.
On {τ_n < ∞},
ζ_0 = i and B_n = B_1 ∘ ζ.
Let C be a measurable subset of the space of i-blocks, and let A ∈ ℱ_{τ_n} with A ⊂ {τ_n < ∞}.
NOTE. A and C are sets; τ_n, ζ, B_1, B_n are functions.



Then
{τ_n < ∞ and B_n ∈ C} = {τ_n < ∞ and ζ ∈ B_1^{−1}C}.
With the help of (22):
P_p{A and B_n ∈ C} = P_p{A and ζ ∈ B_1^{−1}C}
= P_p{A} · P_i{B_1^{−1}C}
= P_p{A} · μ{C}. *
To identify μ in (31), anticipate a more general definition. Let P{i} be this substochastic matrix on I: for j ∈ I and k ≠ i, let P{i}(j, k) = P(j, k); while P{i}(j, i) = 0.
(32) Proposition. With respect to P_i, the first i-block has distribution P{i}_i: it is Markov with stationary substochastic transitions P{i}, starting from i.
PROOF. Confine ω to the set where ξ_0 = i, so τ_1 = 0. This set has P_i-probability 1. Let β_m be the mth term of the first i-block, so
β_m = ξ_m when τ_2 > m
β_m is undefined when τ_2 ≤ m.
Now
P_i{β_m = j_m for m = 0, ..., M and τ_2 > M}
is 0 unless j_0 = i, while j_1, ..., j_M all differ from i; in which case, this probability is
Π_{m=0}^{M−1} P(j_m, j_{m+1})
by (11). That is, when j_0 = i,
P_i{β_m = j_m for m = 0, ..., M and τ_2 > M} = Π_{m=0}^{M−1} P{i}(j_m, j_{m+1}). *
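Proposition (32) can be tried out numerically; the sketch below is mine, with a three-state matrix invented for the illustration. The matrix P{0} is built as in (32); the 0th row sums of its powers give P_0{first 0-block has length > M}, and a simulation of the chain agrees.

```python
import random

def forbid_return(P, i):
    """The substochastic matrix P{i} of (32): entries leading into i are zeroed."""
    return [[0.0 if k == i else row[k] for k in range(len(row))] for row in P]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

P = [[0.2, 0.5, 0.3], [0.4, 0.1, 0.5], [0.25, 0.25, 0.5]]   # made-up chain
P0 = forbid_return(P, 0)

# (32): the first 0-block is Markov with transitions P{0}, so
# P_0{block length > M} is the 0th row sum of the Mth power of P{0}.
tail = []
Q = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
for M in range(5):
    tail.append(sum(Q[0]))
    Q = mat_mul(Q, P0)

# Simulate the chain from 0 and record the length of the first 0-block.
rng = random.Random(1)
n_trials, hits = 20000, [0] * 5
for _ in range(n_trials):
    state, length = 0, 0
    while True:
        length += 1                       # one more term in the block
        u, s, nxt = rng.random(), 0.0, len(P[state]) - 1
        for k, pk in enumerate(P[state]):
            s += pk
            if u < s:
                nxt = k
                break
        state = nxt
        if state == 0:                    # returned to 0: block complete
            break
    for M in range(5):
        if length > M:
            hits[M] += 1

for M in range(5):
    assert abs(hits[M] / n_trials - tail[M]) < 0.02
```

The exact tail here is (1, 0.8, 0.525, 0.375, ...), and the Monte Carlo frequencies match it to within sampling error.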


4. CLASSIFICATION OF STATES

(33) Definition.
i → j iff P^n(i,j) > 0 for some n > 0.
i ↔ j iff i → j and j → i.
i is essential iff i → j implies j → i.
You should check the following properties of →.
(34) For any i, there is a j with i → j:
because Σ_j P(i,j) = 1.
(35) If i → j and j → k, then i → k:
because
(36) P^{n+m}(i, k) ≥ P^n(i,j) · P^m(j, k).
(37) If i is essential, i → i:
use (34), the definition, and (35).
(38) Lemma. ↔ is an equivalence relation when restricted to the essential states.
PROOF. Use properties (35, 37). *
The ↔ equivalence classes of essential states will be called communicating classes, or sometimes just classes. The communicating class containing i is sometimes written C(i). You should check
(39) Σ {P(i,j) : j ∈ C(i)} = 1:
indeed, P(i,j) > 0 implies i → j, so j → i because i is essential; and j ∈ C(i).
(40) Lemma. If i is essential, and i → j, then j is essential.
PROOF. Suppose j → k. Then i → k by (35), so k → i because i is essential. And k → j by (35) again. *
(41) Definition. If i → i, then period i is the g.c.d. (greatest common divisor) of {n : n > 0 and P^n(i, i) > 0}.
(42) Lemma. i ↔ j implies period i = period j.
PROOF. Clearly,
(43) P^{a+m+b}(i, i) ≥ P^a(i,j) · P^m(j,j) · P^b(j, i).
Choose a and b so that P^a(i,j) > 0 and P^b(j, i) > 0. If P^m(j,j) > 0, then P^{2m}(j,j) > 0 by (36), so (43) implies
P^{a+m+b}(i, i) > 0 and P^{a+2m+b}(i, i) > 0.
Therefore, period i divides a + m + b and a + 2m + b. So period i divides the difference m. That is, period i is no more than period j. Equally, period j is no more than period i. *
Consequently, the period of a class can safely be defined as the period of any of its members. As usual, m ≡ n (d) means that m − n is divisible by d. For (44) and (45), fix i ∈ I and suppose
I forms one class of essential states, with period d.
(44) Lemma. To each j ∈ I there corresponds an r_j = 0, 1, ..., d − 1, such that: P^n(i,j) > 0 implies n ≡ r_j (d).
PROOF. Choose s so that P^s(j, i) > 0. If P^m(i,j) > 0 and P^n(i,j) > 0, then P^{m+s}(i, i) > 0 and P^{n+s}(i, i) > 0 by (36). So period i = d divides m + s and n + s. Consequently, d divides the difference m − n, and m ≡ n (d). You can define r_j as the remainder when n is divided by d, for any n with P^n(i,j) > 0. *
Let C_r be the set of j with r_j ≡ r (d), for each integer r. Thus, C_0 = C_d. Sometimes, C_r(i) is written for C_r, to show the dependence on i. These sets are called the cyclically moving subclasses of I, and C_{r+1} is called the subclass following C_r.

(45) Theorem. (a) C_0, ..., C_{d−1} are disjoint and their union is I.
(b) j ∈ C_r and P(j, k) > 0 imply k ∈ C_{r+1}.
PROOF. Assertion (a). Use (44).
Assertion (b). If P^n(i,j) > 0 and P(j, k) > 0, then P^{n+1}(i, k) > 0 by (36). Since n ≡ r (d), therefore n + 1 ≡ r + 1 (d) and r_k ≡ r + 1 (d), using (44) again. *
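Lemma (44) and theorem (45) lend themselves to a small computation. The chain below is my own example, with transitions 0 → 1; 1 → 2 or 0; 2 → 3; 3 → 0; collecting path lengths from state 0 recovers the period d and the residues r_j, hence the cyclically moving subclasses.

```python
from math import gcd

# Transitions of a 4-state chain: 0 -> 1; 1 -> 2 or 0; 2 -> 3; 3 -> 0.
succ = {0: [1], 1: [2, 0], 2: [3], 3: [0]}

# Collect, by breadth-first search, the lengths of paths from 0 to each j.
lengths = {j: set() for j in succ}
frontier = {0}
for n in range(1, 20):
    frontier = {k for j in frontier for k in succ[j]}
    for k in frontier:
        lengths[k].add(n)

# Period d = g.c.d. of the return times to 0, as in (41).
d = 0
for n in lengths[0]:
    d = gcd(d, n)

# (44): every path length into a fixed j has the same residue r_j mod d.
r = {j: min(ns) % d for j, ns in lengths.items()}
assert all(n % d == r[j] for j, ns in lengths.items() for n in ns)
# The cyclically moving subclasses of (45): C_0 = {0, 2}, C_1 = {1, 3}.
assert d == 2 and r == {0: 0, 1: 1, 2: 0, 3: 1}
```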
(46) Proposition. Let A_0, ..., A_{d−1} be disjoint sets whose union is I. For integers r and s, let A_r = A_s when r ≡ s (d). Suppose j ∈ A_r and P(j, k) > 0 imply k ∈ A_{r+1}. Fix i_0 ∈ A_0. Then A_n = C_n(i_0).
PROOF. I say C_n(i_0) ⊂ A_n. Let j ∈ C_n(i_0). If necessary, change n by a multiple of d, so P^n(i_0, j) > 0. This changes neither C_n(i_0) nor A_n. Now there are i_1, ..., i_{n−1} with
P(i_0, i_1) > 0, ..., P(i_{n−1}, j) > 0;
so j ∈ A_n. That is, C_n(i_0) ⊂ A_n. Now (45) and the first condition on the sets A_0, ..., A_{d−1} imply C_n(i_0) = A_n. *
Corollary. If j ∈ C_r(i), then C_s(j) = C_{r+s}(i).
PROOF. Use (46), with A_s = C_{r+s}(i). *
(47) Lemma. Let I form one communicating class of period 1. Let p be a probability on I, and let j ∈ I. Then there is a positive integer n* such that
P_p{ξ_n = j} > 0 for all n > n*.
PROOF. Fix i with p(i) > 0. Find a positive integer a with P^a(i,j) > 0. The set of n with P^n(j,j) > 0 is a semigroup by (36) and has g.c.d. 1, so it includes {b, b + 1, ...} for some positive integer b, by (59). Then n* = a + b works, by (36). *

(48) Proposition. States j and k are in the same C_r iff there is h in I and an n > 0 such that P^n(j, h) > 0 and P^n(k, h) > 0.
PROOF. The if part is clear. For only if, suppose j ∈ C_0(k). Then P^{ad}(j, k) > 0 for some positive integer a. But (59) implies P^{nd}(k, k) > 0 for all positive integers n ≥ n_0. Thus, (36) makes
P^{(a+n_0)d}(j, k) > 0 and P^{(a+n_0)d}(k, k) > 0. *

5. RECURRENCE

(49) Definition. For substochastic P, define matrices eP, f_nP, and fP on I as follows. The entry eP(i,j) is the P_i-mean number of visits to j, which may be infinite. Algebraically, eP = Σ_{n=0}^∞ P^n, where P^0 = δ, the identity matrix. The entry f_nP(i,j) is the P_i-probability of a first visit to j in positive time at time n. Algebraically, f_nP(i,j) is the sum of Π_{m=0}^{n−1} P(i_m, i_{m+1}) over all I-sequences i_0, ..., i_n with i_0 = i, i_n = j, and i_m ≠ j for 0 < m < n. The entry fP(i,j) is the P_i-probability of ever visiting j in positive time. Thus fP(i, i) is the P_i-probability of a return to i. Algebraically, fP = Σ_{n=1}^∞ f_nP.
(50) Definition. A state j is recurrent iff
P_j{ξ_n = j for infinitely many n} = 1.
A state j is transient iff
P_j{ξ_n = j for finitely many n} = 1.
Equivalently, j is transient iff
P_j{ξ_n = j for infinitely many n} = 0.
NOTE. Theorem (51) shows this classification to be exhaustive. Namely, j is recurrent or transient according as Σ_n P^n(j,j) is infinite or finite.
(51) Theorem. (a) fP(j,j) = 1 implies j is recurrent, and j is recurrent implies eP(j,j) = ∞.
(b) fP(j,j) < 1 implies eP(j,j) < ∞, and eP(j,j) < ∞ implies j is transient.
(c) eP(j,j) = 1/[1 − fP(j,j)].
(d) eP(i,j) = fP(i,j) · eP(j,j) for i ≠ j.
PROOF. Assertion (a). Suppose fP(j,j) = 1. Let τ be the least n > 0 with ξ_n = j, and τ = ∞ if none. Now P_j{τ < ∞} = fP(j,j) = 1, so the first j-block is finite with P_j-probability 1. Consequently, by the block theorem (31), all j-blocks are finite with P_j-probability 1; that is, ξ visits j infinitely many times with P_j-probability 1. This also proves (c) when fP(j,j) = 1.
Assertions (b) and (c). Clearly, eP(j,j) is the P_j-mean number of j-blocks. But (31) implies
P_j{B_{n+1} is infinite | B_1, ..., B_n are finite} = 1 − fP(j,j).
Consequently, the number of j-blocks is distributed with respect to P_j like the number of tosses of a p-coin needed to produce a head, with the identification p = 1 − fP(j,j). To complete the proof of (c) when fP(j,j) < 1, use this easy fact:
(52) Toss a p-coin until a head is first obtained. The mean number of trials is 1/p.
In particular, fP(j,j) < 1 implies eP(j,j) < ∞, so the number of visits to j is finite P_j-almost surely.
Assertion (d). Use strong Markov on the time of first hitting j. More precisely, let τ be the least n with ξ_n = j, and τ = ∞ if none. Then τ is Markov. Let ζ_n = ξ_{τ+n} on {τ < ∞}. So ζ_0 = j on {τ < ∞}. Let δ_j(k) be 1 or 0, according as j = k or j ≠ k. Then
eP(i,j) = ∫ Σ_{n=0}^∞ δ_j(ξ_n) dP_i
= ∫_{τ<∞} Σ_{n=τ}^∞ δ_j(ξ_n) dP_i
= ∫_{τ<∞} Σ_{n=0}^∞ δ_j(ζ_n) dP_i
= P_i{τ < ∞} · ∫ Σ_{n=0}^∞ δ_j(ξ_n) dP_j by (22*)
= fP(i,j) · eP(j,j). *
For a generalization of (52), see (72).
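Assertion (51c) admits a quick numerical sanity check; the two-state chain below is invented for the purpose. State 0 stays put with probability 1/2 and otherwise falls into the absorbing state 1, so fP(0,0) = 1/2 and the series eP(0,0) = Σ_n P^n(0,0) should sum to 1/[1 − fP(0,0)] = 2.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

P = [[0.5, 0.5], [0.0, 1.0]]   # state 1 absorbing; state 0 transient

eP00 = 0.0
Q = [[1.0, 0.0], [0.0, 1.0]]   # P^0 = identity
for _ in range(200):           # truncate the series eP(0,0) = sum_n P^n(0,0)
    eP00 += Q[0][0]
    Q = mat_mul(Q, P)

fP00 = 0.5                     # first-return probability to 0
assert abs(eP00 - 1.0 / (1.0 - fP00)) < 1e-12
```

Here P^n(0,0) = (1/2)^n, so the truncation error after 200 terms is far below the tolerance.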
(53) Lemma. fP(i, k) ≥ fP(i,j) · fP(j, k).
PROOF. Let K be the event that ξ_n = k for some n > 0. Let τ be the least positive n if any with ξ_n = j, and τ = ∞ if none. Let K* be the event that τ < ∞ and ξ_n = k for some n > τ. Then K ⊃ K*, so fP(i, k) ≥ P_i(K*). Check that τ is Markov; let ζ be the post-τ process: ζ = T^τ. Check that
K* = {τ < ∞} ∩ {ζ ∈ K}
and
ζ_0 = ξ_τ = j on {τ < ∞}.
By strong Markov (22),
P_i(K*) = P_i{τ < ∞} · P_j(K) = fP(i,j) · fP(j, k). *
(54) Corollary. Fix two states i and j. If fP(i,j) = fP(j, i) = 1, then i and j are recurrent.
PROOF. Using (53),
fP(i, i) ≥ fP(i,j) · fP(j, i) = 1.
Interchange i and j. Finally, use (51a). *
The next result implies that recurrence is a class property; that is, if one state in a class is recurrent, all are.
(55) Theorem. fP(j,j) = 1 and j → k implies
fP(j, k) = fP(k,j) = fP(k, k) = 1.
PROOF. Suppose k ≠ j. Let B_1, B_2, ... be the j-blocks. Since fP(j,j) = 1, by the block theorem (31), the B_n are independent, identically distributed, finite blocks relative to P_j. Since j → k, therefore B_1 contains a k with positive P_j-probability. The events {B_n contains a k} are P_j-independent and have common positive P_j-probability. Then (52) implies that with P_j-probability 1, there is an n such that B_n contains a k. Let τ be the least n if any with ξ_n = k, and τ = ∞ if none. Plainly, τ is Markov. The first part of the argument shows fP(j, k) = P_j{τ < ∞} = 1. The strong Markov property (22) implies fP(k,j) is the P_j-probability that ξ_{τ+n} = j for some n = 0, 1, .... So fP(k,j) = 1. Finally, use (53). *
NOTE. If j is recurrent, then j is essential.
(56) Proposition. For finite I, there is at least one essential state; and a state is recurrent iff it is essential.
PROOF. Suppose i_0 ∈ I is not essential. Then i_0 → i_1 ↛ i_0; in particular, i_1 ≠ i_0. If i_1 is not essential, i_1 → i_2 ↛ i_1; in particular, i_2 ≠ i_0 and i_2 ≠ i_1. And so on. This has to stop, so there is an essential state. Next, suppose J is a finite communicating class. Any infinite J-sequence contains infinitely many j, for some j ∈ J. Fix i ∈ J. There is one j ∈ J such that
P_i{ξ_n = j for infinitely many n} > 0.
Use (51) to make j recurrent. Use (55) to see all j ∈ J are recurrent. *
If J ⊂ I is a communicating class, and all j ∈ J are recurrent, call J a recurrent class.

6. THE RENEWAL THEOREM

This section contains a proof of the renewal theorem based on (Feller, 1961). To state the theorem, let Y_1, Y_2, ... be independent, identically distributed, positive integer-valued random variables on the triple (Ω, ℱ, 𝒫). Let μ be the expectation of Y_i, and 1/μ = 0 if μ = ∞. Let S_0 = 0, and S_n = Y_1 + ... + Y_n, and let
U(m) = 𝒫{S_n = m for some n = 0, 1, ...}.
In particular, U(0) = 1. Of course, {S_n} is a transient Markov chain with stationary transitions, say Q, and Q_j is the distribution of
{j + S_n : n = 0, 1, ...}.
(57) Theorem. (Renewal theorem). If g.c.d. {n : 𝒫[Y_i = n] > 0} = 1, then lim_{m→∞} U(m) = 1/μ.

This result is immediate from (65) and (66). Lemma (65) follows from (58–64), and (66) is proved by a similar argument. To state (58–59), let F be a subset of the integers, containing at least one nonzero element. Let group F (respectively, semigroup F) be the smallest subgroup (respectively, subsemigroup) of the additive integers including F.
More constructively, semigroup F is the set of all integers n which can be represented as f_1 + ... + f_m for some positive integer m and f_1, ..., f_m ∈ F. And group F is the set of all integers n which can be represented as a − b, with a, b ∈ semigroup F. If λ ∈ group F and q is a positive integer, then λq = λ + ⋯ + λ (q terms) ∈ group F.
(58) Lemma. g.c.d. F is the least positive element λ of group F.
PROOF. Plainly, g.c.d. F divides λ, so g.c.d. F ≤ λ. To verify that λ divides any f in F, let f = λq + r, where q is an integer and r is one of 0, ..., λ − 1. Now r = f − λq ∈ group F, and 0 ≤ r < λ, so r = 0. Consequently, λ ≤ g.c.d. F. *
(59) Lemma. Suppose each f in F is positive. Let g.c.d. F = 1. Then for some positive integer m_0, semigroup F contains all m ≥ m_0.
PROOF. Use (58) to find a and b in semigroup F with a − b = 1. Plainly, semigroup F ⊃ semigroup {a, b}. I say semigroup {a, b} contains all m ≥ b². For if m ≥ b², then m = qb + r, where q is a nonnegative integer and r is one of 0, ..., b − 1. Necessarily, q ≥ b, so q − r > 0. Then
m = qb + r(a − b) = ra + (q − r)b ∈ semigroup {a, b}. *
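Lemma (59) is constructive enough to compute with. The sketch below, my own illustration, builds semigroup F up to a cap for F = {5, 7}: here a = 15 and b = 14 lie in semigroup F with a − b = 1, so (59) promises every m ≥ b² = 196 — although for this F every m ≥ 24 is already representable, so the bound b² is far from sharp.

```python
def semigroup_upto(F, limit):
    """Elements of semigroup F (sums f1 + ... + fm, each fi in F) up to limit."""
    reachable = [False] * (limit + 1)
    for f in F:
        if f <= limit:
            reachable[f] = True
    for m in range(1, limit + 1):
        if reachable[m]:
            for f in F:
                if m + f <= limit:
                    reachable[m + f] = True
    return {m for m in range(1, limit + 1) if reachable[m]}

S = semigroup_upto({5, 7}, 300)
assert all(m in S for m in range(196, 301))   # the b*b bound of (59)
assert 23 not in S and all(m in S for m in range(24, 301))
```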



For (60–66), suppose
g.c.d. {n : 𝒫[Y_i = n] > 0} = 1.
(60) Lemma. There is a positive integer m_0 such that: for all m ≥ m_0, there is a positive integer n = n(m) with 𝒫{S_n = m} > 0.
PROOF. Let G be the set of m such that 𝒫{S_n = m} > 0 for some n. Then G is a semigroup, because
𝒫{S_{n+n′} = m + m′} ≥ 𝒫{S_n = m} · 𝒫{S_{n′} = m′}.
And G ⊃ {m : 𝒫[Y_i = m] > 0}. So g.c.d. G = 1. Now use (59). *
For (61–65), use the m_0 of (60), and let
L = lim sup_{n→∞} U(n).
(61) Lemma. Let n′ be a subsequence with lim_{n→∞} U(n′) = L. Then lim_{n→∞} U(n′ − m) = L for any m ≥ m_0.
Here n′ is to be thought of as a function of n.
PROOF. Fix m ≥ m_0. Using (60), choose N = N(m) so 𝒫{S_N = m} > 0. Thus, N ≤ m. Using the diagonal argument (10.56), find a subsequence n″ of n′ such that
λ(t) = lim_{n→∞} U(n″ − t)
exists for all t, and
λ(m) = lim inf_{n→∞} U(n′ − m).
Fix an integer j > N. Clearly,
{S_r = j for some r}
is the disjoint union of
{S_r = j for some r ≤ N}
and
{S_r = j for no r ≤ N, but S_r = j for some r > N}.
The last set is ∪_{t=N}^{j−1} A_t, where A_t is the event:
S_N = t and Y_{N+1} + ... + Y_{N+n} = j − t for some n.
Consequently,
(62) U(j) = 𝒫{S_r = j for some r ≤ N} + Σ_{t=N}^{j−1} 𝒫{S_N = t} · U(j − t).
Now max {S_r : r ≤ N} is a finite random variable. Put j = n″ in (62); this is safe for large n. Let n → ∞ and use dominated convergence:
L = Σ_{t=N}^∞ 𝒫{S_N = t} · λ(t).
Dominated convergence is legitimate because U ≤ 1; and n″ is a subsequence of n′, so L = lim_{n→∞} U(n″). But λ(t) ≤ L for all t, and 𝒫{S_N = m} > 0; so λ(m) < L is impossible. *
(63) Lemma. There is a subsequence n* such that lim_{n→∞} U(n* − m) = L for every m = 0, 1, ....
PROOF. Find a subsequence n′ with lim_{n→∞} U(n′) = L and n′ > m_0 for all n. Let n* = n′ − m_0. Use (61). *
(64) Lemma. Σ_{m=0}^n 𝒫{Y_i > m} · U(n − m) = 1.
PROOF. For m = 0, ..., n, let A_m be the event that S_0, S_1, ... hits n − m, but does not hit n − m + 1, ..., n. Then
𝒫{A_m} = U(n − m) · 𝒫{Y_i > m},
by strong Markov (22) or a direct argument. The A_m are pairwise disjoint, and their union is Ω. So Σ_{m=0}^n 𝒫{A_m} = 1. *

(65) Lemma. L = 1/μ.
PROOF. As usual, Σ_{m=0}^∞ 𝒫{Y_i > m} = μ. Suppose μ < ∞. In (64), replace n by the n* of (63). Then let n → ∞. Dominated convergence implies
Σ_{m=0}^∞ 𝒫{Y_i > m} · L = 1,
so L = 1/μ. When μ = ∞, use the same argument and Fatou to get
Σ_{m=0}^∞ 𝒫{Y_i > m} · L ≤ 1,
forcing L = 0 = 1/μ. *
(66) Lemma. lim inf_{n→∞} U(n) = 1/μ.
PROOF. This follows the pattern for (65). Now let L stand for lim inf_{n→∞} U(n). Lemma (61) still holds, with essentially the same proof: make
λ(m) = lim sup_{n→∞} U(n′ − m),
and reverse the inequalities at the end. Lemmas (63, 65) still hold, with the same proof. *
This completes the proof of (57). I will restate matters for use in (69). Abbreviate p_n = 𝒫{Y_i = n}. Drop the assumption that
g.c.d. {n : p_n > 0} = 1.
(67) Proposition. Let d = g.c.d. {n : p_n > 0}.
(a) d = g.c.d. {m : U(m) > 0}.
(b) lim_{n→∞} U(nd) = d/μ.
PROOF. Plainly, {m : U(m) > 0} = {0} ∪ semigroup {n : p_n > 0}. If F is a nonempty subset of the positive integers, g.c.d. F = g.c.d. semigroup F. This does (a). Claim (b) reduces to (57) when you divide the Y_i by d. *
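Theorem (57) is easy to watch numerically. In the sketch below the inter-arrival law is of my choosing, p_3 = p_4 = 1/2 (g.c.d. 1, μ = 3.5); U(m) is computed exactly from the renewal recursion U(0) = 1, U(m) = Σ_k p_k U(m − k), and the values settle down to 1/μ.

```python
p = {3: 0.5, 4: 0.5}                      # P{Y_i = k}; g.c.d.{3, 4} = 1
mu = sum(k * pk for k, pk in p.items())   # mu = 3.5

U = [1.0]                                 # U(0) = 1
for m in range(1, 400):
    U.append(sum(pk * U[m - k] for k, pk in p.items() if k <= m))

# (57): U(m) -> 1/mu = 2/7.
assert abs(U[-1] - 1.0 / mu) < 1e-6
```

Early values oscillate (U(3) = U(4) = 1/2, U(5) = 0, ...) but the oscillation damps out geometrically, as the aperiodicity forces.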
7. THE LIMITS OF P^n

The renewal theorem gives considerable insight into the limiting behavior of P^n. To state the results, let mP(i,j) be the P_i-expectation of τ_j, where τ_j is the least n > 0 if any with ξ_n = j, and τ_j = ∞ if none. For n = 0, 1, ... let
φ_nP(i,j) = P_i{ξ_n = j and ξ_m ≠ j for m < n}.
Thus, φ_0P(i,j) is 1 or 0, according as i = j or i ≠ j. And φ_nP(i, i) = 0 for n > 0. Let
φP(i,j) = P_i{ξ_n = j for some n ≥ 0} = Σ_{n=0}^∞ φ_nP(i,j).
There is no essential difference between φ and the f of (49). But this section goes more smoothly with φ.
(68) Theorem. If j is transient, lim_{n→∞} P^n(i,j) = 0.
PROOF. Theorem (51) implies eP(i,j) < ∞. But eP(i,j) = Σ_{n=0}^∞ P^n(i,j). *
(69) Theorem. Suppose j is recurrent.
(a) lim_{n→∞} (1/n) Σ_{m=1}^n P^m(i,j) = φP(i,j)/mP(j,j).
(b) If mP(j,j) = ∞, then lim_{n→∞} P^n(i,j) = 0.
(c) If mP(j,j) < ∞ and period j = 1, then
lim_{n→∞} P^n(i,j) = φP(i,j)/mP(j,j).
(d) If mP(j,j) < ∞ and period j = d, then for r = 0, 1, ..., d − 1,
lim_{n→∞} P^{nd+r}(i,j) = d Σ_{m=0}^∞ φ_{md+r}P(i,j)/mP(j,j).

PROOF. Claim (a) follows from (b) and (d), or can be proved directly, as in (Doob, 1953, p. 175).
Claim (b) is similar to (d), and the proof is omitted.
Claim (c) is the leading special case of (d). Suppose (c) were known for i = j. Then (c) would hold for any i, by using dominated convergence on the identity
(70) P^n(i,j) = Σ_{m=0}^n φ_mP(i,j) · P^{n−m}(j,j).
ARGUMENT FOR (70). This identity is trivial when i = j. Suppose i ≠ j, so φ_0P(i,j) = 0. Clearly,
{ξ_0 = i and ξ_n = j} = ∪_{m=1}^n A_m,
where
A_m = {ξ_0 = i and ξ_m = ξ_n = j, but ξ_r ≠ j for r < m}.
Markov (15) implies
P_i{A_m} = φ_mP(i,j) · P^{n−m}(j,j).
This completes the proof of (70).
The special case i = j of (c) follows from the renewal theorem (57). Take the lengths of the successive j-blocks for the random variables Y_1, Y_2, .... Use blocks (31) to verify that the Y's are independent and identically distributed. Check that U(n) = P^n(j,j). From (67a):
g.c.d. {n : P_j[Y_1 = n] > 0} = 1.
This completes the argument for (c).
Claim (d) is similar to (c). In (70), if n ≡ r (d), then P^{n−m}(j,j) = 0 unless m ≡ r (d). Use (67) to make
lim_{n→∞} P^{nd}(j,j) = d/mP(j,j). *

8. POSITIVE RECURRENCE

Call j positive recurrent iff j is recurrent and mP(j,j) < 00. Call j null
recurrent iff j is recurrent and mP(j,j) = 00. Is positive recurrence a class
property? That is, suppose C is a class andj E C is positive recurrent. Does it
follow that all k E C are positive recurrent? The affirmative answer is provided
by (76), to which (71-73) are preliminary. Theorem (76) also makes the harder
assertion: in a positive recurrent class, mP(i,j) < 00 for all pairs i, j.
Lemma (71) is Wald's (1944) identity. To state it, let Y u Y 2 , ••• be in-
dependent and identically distributed on (0, :F, 9). Let
Sn = Y1 + ... + Y n,

so So = O. Let T be a random variable on (0, :F) whose values are non-


negative integers or 00. Suppose {T < n} is independent of Y n for all n; so
{T ~ n} is also independent of Yn • Use E for expectation relative to 9.
(71) Lemma. E(ST) = E(Yn ) • E(T), provided (a) or (b) holds.
(a) Y n ~ 0 and 9{ Y n > O} > O.
(b) E(I Ynl) < 00 and E· (T) < 00.

PROOF. Here is a formal computation; as usual, 1_A is 1 on A and 0 off A.
E(S_τ) = E(Σ_{n=1}^∞ Y_n 1_{τ≥n})
= Σ_{n=1}^∞ E(Y_n 1_{τ≥n})
= E(Y_n) · Σ_{n=1}^∞ 𝒫{τ ≥ n}
= E(Y_n) · E(τ).
If Y_n ≥ 0, the interchange of E and Σ is justified by Fubini. If E(|Y_n|) < ∞ and E(τ) < ∞, then Σ_{n=1}^∞ E|Y_n 1_{τ≥n}| < ∞, so Fubini still works. *
(72) Example. A p-coin is tossed independently until n heads are obtained. The expected number of tosses is n/p.
PROOF. Suppose p > 0. Let Y_m be 1 or 0 according as the mth toss is head or tail. Let τ be the least m with S_m = Y_1 + ... + Y_m = n. Then S_τ = n, so n = E(S_τ) = E(Y_1) · E(τ) = p · E(τ), using (71). *
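Example (72) invites a simulation; the parameters below are mine. Toss a p-coin until n heads appear and average the number of tosses over many trials.

```python
import random

rng = random.Random(42)
p, n, trials = 0.25, 3, 20000
total = 0
for _ in range(trials):
    heads = tosses = 0
    while heads < n:        # keep tossing until the n-th head
        tosses += 1
        if rng.random() < p:
            heads += 1
    total += tosses
mean = total / trials

# (72): the expected number of tosses is n / p = 12.
assert abs(mean - n / p) < 0.3
```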
To state (73), let I_∞ be the set of ω ∈ I^∞ such that ω(n) = k for infinitely many n, for every k ∈ I.
(73) Lemma. If I is a recurrent class, P_i{I_∞} = 1.
PROOF. Let A_m be the event: the mth i-block contains a j. From the block theorem (31), with respect to P_i,
(74) the A_m are independent and have common probability.
Because i ↔ j,
(75) P_i{A_m} > 0.
Consequently, P_i{lim sup A_m} = 1. This repeats part of (55). *
(76) Theorem. Let I be a recurrent class. Either mP(i,j) < ∞ for all i and j in I, or mP(j,j) = ∞ for all j in I.
For a generalization, see (2.98).
PROOF. Suppose mP(i, i) < ∞ for an i in I. Fix j ≠ i. What must be proved is that mP(i,j), mP(j, i), and mP(j,j) are all finite. To start the proof, confine ω to I_∞ ∩ {ξ_0 = i}. Let τ be the least n such that the nth i-block contains a j. Using the A_m of (73), and the notation C\D for the set of points in C but not in D,
{τ = n} = (I_∞\A_1) ∩ ... ∩ (I_∞\A_{n−1}) ∩ A_n.
Relation (74) implies that τ is P_i-distributed like the waiting time for the first head in tossing a p-coin, where 0 < p = P_i(A_m) by (75). Now (72) implies ∫ τ dP_i < ∞. Let Y_1, Y_2, ... be the lengths of the successive i-blocks. By the block theorem (31), the Y_m's are P_i-independent and identically distributed; and {τ < n} is P_i-independent of Y_n. By definition, ∫ Y_1 dP_i = mP(i, i). Now Wald's identity (71) forces
∫ Σ_{m=1}^τ Y_m dP_i < ∞.
[Figure 3: the first four i-blocks, of lengths Y_1, ..., Y_4, running between the 1st and 5th visits to i; here τ = 4, the first j occurring in the 4th i-block. T is the time of the first j, and T + U the time of the next i.]
As in Figure 3, let T(ω) be the least n with ω(n) = j. Let T(ω) + U(ω) be the least n > T(ω) with ω(n) = i. Then
Σ_{m=1}^τ Y_m = T + U; so ∫ T dP_i < ∞ and ∫ U dP_i < ∞.
By definition, ∫ T dP_i = mP(i,j). Use the strong Markov property to see ∫ U dP_i = mP(j, i). This proves mP(i,j) and mP(j, i) are finite. To settle mP(j,j), check Y_1 ≤ T + U, so mP(i, i) ≤ mP(i,j) + mP(j, i). Interchange i and j to get mP(j,j) ≤ mP(i,j) + mP(j, i) < ∞. *
(77) Remark. The argument shows: if mP(i,j) < ∞ and mP(j, i) < ∞ for some i and j, then i and j are positive recurrent.
If J ⊂ I is a communicating class, and all j ∈ J are positive (respectively, null) recurrent, call J a positive (respectively, null) recurrent class.
(78) Proposition. Suppose I is finite and j ∈ I is recurrent. Then j is positive recurrent.
PROOF. Reduce I to the communicating class containing j, and use (79) below. *
(79) Proposition. Let τ_j be the least n if any with ξ_n = j, and τ_j = ∞ if none. Suppose I is finite, j is a given state in I, and i → j for all i ∈ I. Then there are constants A and ρ with 0 < A < ∞ and 0 < ρ < 1, such that
P_i{τ_j > n} ≤ Aρ^n for all i ∈ I and n = 0, 1, ....
PROOF. Let P̄ agree with P except in the jth row, where P̄(j,j) = 1. Then
P̄_i{ξ_0 = i_0, ..., ξ_m = i_m} = P_i{ξ_0 = i_0, ..., ξ_m = i_m}
provided i_0, i_1, ..., i_{m−1} are all different from j; however, i_m = j is allowed. Sum over all such sequences with i_m = j and m ≤ n:
P_i{τ_j ≤ n} = P̄_i{τ_j ≤ n} = P̄^n(i,j).
So i → j for P̄, and I only have to get the inequality for P̄.
You should see that P̄^n(i,j) is nondecreasing with n, and is positive for n ≥ n_i, for some positive integer n_i. Let N = max_i n_i, so
0 < ε = min_i P̄^N(i,j),
using the finitude of I twice. Thus
1 − ε ≥ P̄_i{τ_j > N} for all i.
Recall the shift T, introduced for the Markov property (14). Check
{τ_j > (n + 1)N} = {τ_j > nN} ∩ (T^{nN})^{−1}{τ_j > N}.
Make sure that {τ_j > nN} is measurable on ξ_0, ..., ξ_{nN}. Therefore,
P̄_i{τ_j > (n + 1)N} = Σ_k P̄_i{ξ_{nN} = k and τ_j > (n + 1)N}
= Σ_k P̄_i{ξ_{nN} = k and τ_j > nN} · P̄_k{τ_j > N} by (15)
≤ (1 − ε) Σ_k P̄_i{ξ_{nN} = k and τ_j > nN}
= (1 − ε) P̄_i{τ_j > nN}
≤ (1 − ε)^{n+1} by induction.
Suppose nN ≤ m < (n + 1)N. Then
P̄_i{τ_j > m} ≤ P̄_i{τ_j > nN}
≤ (1 − ε)^n
= [1/(1 − ε)] · [(1 − ε)^{1/N}]^{(n+1)N}
≤ [1/(1 − ε)] · [(1 − ε)^{1/N}]^m.
So A = 1/(1 − ε) and ρ = (1 − ε)^{1/N} work. *
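The geometric bound of (79) also shows up numerically; the three-state matrix below is mine. With target state j = 2, P_i{τ_2 > n} equals the ith row sum of the nth power of Q, where Q is P with the row and column of state 2 deleted; the successive tail ratios settle down to a ρ < 1.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

P = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]   # made-up chain
Q = [[0.5, 0.3], [0.1, 0.6]]        # P with row and column 2 deleted

tails = []                          # tails[n] = P_0{tau_2 > n}
M = [[1.0, 0.0], [0.0, 1.0]]
for n in range(30):
    tails.append(sum(M[0]))
    M = mat_mul(M, Q)

ratios = [tails[n + 1] / tails[n] for n in range(20, 29)]
assert all(r < 1 for r in ratios)             # geometric decay
assert max(ratios) - min(ratios) < 1e-3       # the ratio settles to rho
```

Here ρ is the dominant eigenvalue of Q, about 0.73 for this matrix.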

9. INVARIANT PROBABILITIES

(80) Definition. A measure μ on I is invariant iff
μ(j) = Σ_{i∈I} μ(i)P(i,j) for all j,
and subinvariant iff μ(j) ≥ Σ_{i∈I} μ(i)P(i,j). The convention ∞ · 0 = 0 applies.
Abbreviate π(j) = 1/mP(j,j). The main result on invariant probabilities is:
(81) Theorem. If I is a positive recurrent class, then π is an invariant probability, and any invariant signed measure with finite mass is a scalar multiple of π.
NOTES. Suppose I forms one positive recurrent class.
(a) π(j) > 0 for all j.
(b) As will be seen in (2.24), any subinvariant measure is automatically finite and invariant, and a nonnegative scalar multiple of π.
Measures are nonnegative, unless specified otherwise. The proof of (81) consists of lemmas (82–87). In all of them, assume I is a positive recurrent class.
(82) Lemma. π is a subprobability.
PROOF. Because P^m is stochastic,
Σ_{j∈I} (1/n) Σ_{m=1}^n P^m(i,j) = 1.
Send n to ∞. By (69a),
(1/n) Σ_{m=1}^n P^m(i,j) → φP(i,j)/mP(j,j).
But φP(i,j) = 1 because I is a recurrent class, and 1/mP(j,j) = π(j) by definition. So Fatou makes
Σ_{j∈I} π(j) ≤ 1. *
(83) Lemma. π is subinvariant.
PROOF. Because
Σ_{j∈I} P^m(i,j) P(j, k) = P^{m+1}(i, k),
therefore
Σ_{j∈I} [(1/n) Σ_{m=1}^n P^m(i,j)] P(j, k) = (1/n) Σ_{m=2}^{n+1} P^m(i, k).
Send n to ∞. Use (69a) and Fatou, as in (82):
Σ_{j∈I} π(j)P(j, k) ≤ π(k). *
(84) Lemma. π is invariant.
PROOF. Lemma (83) makes π(k) ≥ Σ_{j∈I} π(j)P(j, k). If inequality occurs anywhere, sum out k and get
Σ_{k∈I} π(k) > Σ_{j∈I} π(j).
This contradicts (82). *
For (85), let μ be an invariant signed measure with finite mass.
(85) Lemma. μ(j) = [Σ_{i∈I} μ(i)] π(j).

PROOF. By iteration,
(86) μ(j) = Σ_{i∈I} μ(i)P^m(i,j),
so
μ(j) = Σ_{i∈I} μ(i) · (1/n) Σ_{m=1}^n P^m(i,j).
Send n to ∞. Use (69a) as in (82), and dominated convergence:
μ(j) = Σ_{i∈I} μ(i) π(j). *
(87) Lemma. π is a probability.
PROOF. Using (82) and (84), put π for μ in (85):
π(j) = [Σ_{i∈I} π(i)] π(j).
But π(j) = 1/mP(j,j) > 0, because I is positive recurrent. So
Σ_{i∈I} π(i) = 1. *
PROOF OF (81). Use (84), (87), and (85). *
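Theorem (81) and the identity π(j) = 1/mP(j,j) can be checked on a two-state chain; the matrix below is mine. Power iteration gives the invariant probability π = (2/3, 1/3), and a simulation estimates the mean return time to state 0 as about 3/2 = 1/π(0).

```python
import random

P = [[0.9, 0.1], [0.2, 0.8]]       # made-up positive recurrent chain

pi = [0.5, 0.5]
for _ in range(500):               # power iteration: pi <- pi P
    pi = [pi[0] * P[0][j] + pi[1] * P[1][j] for j in range(2)]

# Estimate mP(0,0), the mean return time to state 0, by simulation.
rng = random.Random(7)
trials, total = 20000, 0
for _ in range(trials):
    state, steps = 0, 0
    while True:
        steps += 1
        state = 0 if rng.random() < P[state][0] else 1
        if state == 0:
            break
    total += steps
mean_return = total / trials

assert abs(pi[0] - 2 / 3) < 1e-9               # the invariant probability
assert abs(mean_return - 1 / pi[0]) < 0.1      # pi(0) = 1/mP(0,0)
```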
Now drop the assumption that I is a positive recurrent class. Let μ be a signed measure on I with finite mass. The next theorem describes all invariant μ. To state it, let C be the set of all positive recurrent classes J ⊂ I. For J ∈ C, define a probability π_J on I by: π_J(j) = 1/mP(j,j) for j ∈ J, and π_J(j) = 0 for j ∉ J.
(88) Theorem. μ is invariant iff μ = Σ_{J∈C} μ(J) π_J.
The proof of (88) is deferred.
(89) Lemma. Let μ be an invariant signed measure on I, with finite mass. Then μ assigns measure 0 to the transient and null recurrent states.
PROOF. Send m to ∞ in (86). Then use dominated convergence, and (68) or (69b). *
For (90–91), define a matrix P_J on J ∈ C by P_J(i,j) = P(i,j), for i and j in J. As (39) implies, P_J is stochastic. For (90), fix J ∈ C. Let ν_J be a signed measure on J with finite mass, invariant with respect to P_J. Define a measure ν on I by: ν(i) = ν_J(i) for i ∈ J, and ν(i) = 0 for i ∉ J.
(90) Lemma. ν is invariant.
PROOF. Suppose j ∈ J. Then
Σ_{i∈I} ν(i)P(i,j) = Σ_{i∈J} ν_J(i)P_J(i,j) = ν_J(j) = ν(j).
Suppose j ∉ J. If ν(i) ≠ 0, then i ∈ J, so P(i,j) = 0 by (39). That is, ν(i)P(i,j) = 0 for all i. And
Σ_{i∈I} ν(i)P(i,j) = 0 = ν(j). *


For (91), let μ be an invariant signed measure on I, with finite mass. Fix J ∈ C.
(91) Lemma. μ retracted to J is invariant with respect to P_J.
PROOF. Suppose j ∈ J. Then P(i,j) = 0 when i ∈ K ∈ C\{J}, by (39). And μ(i) = 0 when i ∉ ∪{K : K ∈ C}, by (89). So μ(i)P(i,j) = 0 unless i ∈ J. That is,
μ(j) = Σ_{i∈I} μ(i)P(i,j)
= Σ_{i∈J} μ(i)P(i,j)
= Σ_{i∈J} μ(i)P_J(i,j). *
PROOF OF (88). Let μ be an invariant signed measure on I, with finite mass. As (89) implies, μ concentrates on ∪{J : J ∈ C}. When J ∈ C, let
μ_J(j) = μ(j) for j ∈ J
= 0 for j ∉ J.
Then
μ = Σ_{J∈C} μ_J.
As (91) implies, the retract of μ to J is P_J-invariant. So, (81) on P_J implies
μ_J = μ(J) π_J.
Therefore
μ = Σ_{J∈C} μ(J) π_J.
Conversely, the retract of π_J to J is P_J-invariant by (81). So π_J is P-invariant by (90). If Σ_{J∈C} |c_J| < ∞, then
μ = Σ_{J∈C} c_J π_J
is also P-invariant. *
10. THE BERNOULLI WALK

In this section, I is the set of integers and 0 < p < 1. Define the stochastic matrix [p] on I by: [p](i, i + 1) = p, and [p](i, i − 1) = 1 − p. This notation is strictly temporary. Plainly, I is a communicating class of period 2.
(92) Theorem. I is recurrent for [p] iff p = ½.

PROOF. Only if. You should check that
[p]_0{ξ_0 = 0, ξ_1 = i_1, ..., ξ_{2n−1} = i_{2n−1}, ξ_{2n} = 0}
= (4p(1 − p))^n [½]_0{ξ_0 = 0, ξ_1 = i_1, ..., ξ_{2n−1} = i_{2n−1}, ξ_{2n} = 0}.
Sum over all these sequences, with i_m ≠ 0 for 0 < m < 2n:
f_{2n}[p](0, 0) = (4p(1 − p))^n f_{2n}[½](0, 0).
The definition of f_n and f is in (49). If p ≠ ½, then 4p(1 − p) < 1, so
f[p](0, 0) = Σ_{n=1}^∞ f_{2n}[p](0, 0)
< Σ_{n=1}^∞ f_{2n}[½](0, 0)
≤ 1.
Use (51) to see that p ≠ ½ implies I is transient. This argument was suggested by T. F. Hou.
If. I say that x = f[p](0, 1) satisfies
(93) x = p + (1 − p)x².
To begin with, x = f[p](−1, 0): indeed, the [p]_0-distribution of ξ_0 − 1, ξ_1 − 1, ... is [p]_{−1}; and ξ_0 − 1, ξ_1 − 1, ... hits 0 iff ξ_0, ξ_1, ... hits 1. Next,
f[p](−1, 1) = f[p](−1, 0) · f[p](0, 1) = x²,
by strong Markov (22). Use Markov (15) in line 3:
x = [p]_0{ξ_n = 1 for some n}
= [p]_0{ξ_1 = 1} + [p]_0{ξ_1 = −1 and ξ_n = 1 for some n}
= p + (1 − p) [p]_{−1}{ξ_n = 1 for some n}
= p + (1 − p) f[p](−1, 1)
= p + (1 − p)x².
This proves (93).
For the rest of the proof, suppose p = ½. Then (93) has only one solution, x = 1. Moreover,
f[½](1, 0) = f[½](−1, 0) = f[½](0, 1);
the second equality is old; the first one works because the [½]_1-distribution of −ξ_0, −ξ_1, ... is [½]_{−1}, and −ξ_0, −ξ_1, ... hits 0 iff ξ_0, ξ_1, ... hits 0. Now use (54). *
(94) The class I is null recurrent for [½].

Here is an argument for (94) that I learned from Harry Reuter. By previous reasoning, P^n(j, j) does not depend on j. So lim_{n→∞} P^{2n}(j, j) does not depend on j. If I were positive recurrent, the invariant probability

would have to assign equal mass to all integers by (69d) and (81). This is
untenable.
Suppose ½ < p < 1. The two solutions of (93) are 1 and p/(1 − p) > 1. Thus, f[p](0, 1) = 1. Previous arguments promote this to

(95) f[p](i, j) = 1 for i < j.

Now y = f[p](0, −1) < 1, for otherwise I would be recurrent by (54). Interchange right and left, so p and 1 − p, in (93):

y = 1 − p + py².

The two solutions are y = 1 and y = (1 − p)/p, so f[p](0, −1) = (1 − p)/p. Previous arguments promote this to

(96) f[p](i, j) = ((1 − p)/p)^{i−j} for i > j.

Moreover, f[p](0, 0) = pf[p](1, 0) + (1 − p)f[p](−1, 0) = 2(1 − p). Previous arguments promote this to

(97) f[p](i, i) = 2(1 − p).

Use (51) to get:

(98) e[p](i, j) = 1/(2p − 1) for i ≤ j
               = ((1 − p)/p)^{i−j} · 1/(2p − 1) for i > j.

11. FORBIDDEN TRANSITIONS

The material in this section will be used in Section 12, and referred to in Chapters 3 and 4. It is taken from (Chung, 1960, Section 1.9).
(99) Definition. For any subset H of I, define a substochastic matrix P_H on I: for i ∈ I and j ∉ H, let P_H(i, j) = P(i, j); but for j ∈ H, let P_H(i, j) = 0.

Let τ be the least n > 0 if any with ξ_n ∈ H, and τ = ∞ if none. With respect to P_i, the fragment {ξ_n : 0 ≤ n < τ} is Markov with stationary transitions P_H. Thus, eP_H(i, k) is the P_i-mean number of n ≥ 0, but less than the first positive m with ξ_m ∈ H, such that ξ_n = k. Moreover, fP_H(i, k) is the P_i-probability that there is an n > 0, but less than the first positive m with ξ_m ∈ H, such that ξ_n = k. The operators e and f were defined in (49).

In the proof of (100), I will use some theorems proved for stochastic P on substochastic P. To legitimate this, adjoin a new state ∂ to I. Define

P_∂(i, j) = P(i, j) for i and j in I
P_∂(i, ∂) = 1 − Σ_{j∈I} P(i, j) for i in I
P_∂(∂, i) = 0 for i in I
P_∂(∂, ∂) = 1.

Suppose {X_n} is a Markov chain with substochastic transitions P. Let Y_n = X_n when X_n is defined, and Y_n = ∂ when X_n is undefined. Then {Y_n} is Markov with stochastic transitions P_∂. Use the old theorems on P_∂.

(100) Lemma. If k → h for some h ∈ H, then eP_H(i, k) < ∞. More precisely:

eP_H(k, k) = 1/[1 − fP_H(k, k)];

and for i ≠ k,

eP_H(i, k) = fP_H(i, k)/[1 − fP_H(k, k)].

PROOF. First, suppose k ∈ H. Then fP_H(k, k) = 0 and eP_H(k, k) = 1, proving the first display. Let i ≠ k. Then fP_H(i, k) = eP_H(i, k) = 0, proving the second display.

Now, suppose k ∉ H. A k-block contains no h ∈ H with probability fP_H(k, k) < 1. Use (51c) to verify that

eP_H(k, k) = 1/[1 − fP_H(k, k)].

Let i ≠ k. Use (51d) to get

eP_H(i, k) = fP_H(i, k) · eP_H(k, k).  *
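For a finite state space, (100) is easy to check numerically: since mass leaks into H at each step, e(P_H) = Σ_{n≥0} P_H^n = (I − P_H)^{-1}. A small sketch with a hypothetical 3-state matrix (not from the text) and H = {2}; the first-passage quantities fP_H are worked out by hand in the comments.

```python
import numpy as np

# An assumed 3-state stochastic matrix; take the taboo set H = {2}.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
PH = P.copy()
PH[:, [2]] = 0.0                      # (99): forbid transitions into H

ePH = np.linalg.inv(np.eye(3) - PH)   # e(PH) = sum of PH**n over n >= 0

# By hand: fPH(0,1) solves f = P(0,1) + P(0,0) f, so f = 0.5/0.8 = 0.625,
# and fPH(1,1) = P(1,1) + P(1,0) fPH(0,1) = 0.1 + 0.4*0.625 = 0.35.
assert np.isclose(ePH[0, 1] / ePH[1, 1], 0.625)  # ePH(i,k) = fPH(i,k) ePH(k,k)
assert np.isclose(1 - 1 / ePH[1, 1], 0.35)       # ePH(k,k) = 1/[1 - fPH(k,k)]
```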

Give I the discrete topology, and let Ī = I ∪ {φ} be its one-point compactification; for example, let I = {i_1, i_2, ...} and Ī = {i_1, i_2, ..., i_∞}, where i_∞ = φ; metrize Ī with ρ(i_n, i_m) = |1/n − 1/m| and 1/∞ = 0. A sequence k_n ∈ I converges to φ iff k_n = j for only finitely many n, for each j ∈ I.

(101) Lemma. fP(i, j) = lim_{k→φ} fP{k}(i, j).

PROOF. Let D_n be a sequence of finite sets swelling to I. As n increases, the event E_n that {ξ_m} hits j before hitting I\D_n increases to the event E that {ξ_m} hits j. So, P_i(E_n) → fP(i, j). If k ∉ D_n, then E_n is included in the event that {ξ_m} hits j before hitting k. So,

fP(i, j) ≥ fP{k}(i, j) ≥ P_i(E_n).  *

(102) Lemma. mP(i, i) = Σ_k eP{i}(i, k).

PROOF. Let τ be the least n > 0 if any with ξ_n = i, and τ = ∞ if none. Let ζ(n, k) be the indicator of the event: n < τ and ξ_n = k. Thus,

τ = Σ_{k∈I} Σ_{n=0}^∞ ζ(n, k),

and

mP(i, i) = ∫ τ dP_i = Σ_{k∈I} Σ_{n=0}^∞ ∫ ζ(n, k) dP_i = Σ_{k∈I} eP{i}(i, k).  *

(103) Lemma. If I is recurrent and i ≠ k,

fP{i}(k, k) + fP{k}(k, i) = 1.

PROOF. With respect to P_k, almost all paths hit either i or k first, in positive time.  *
12. THE HARRIS WALK

The next example was studied by Harris (1952), using Brownian motion. To describe the example, let 0 < a_j < 1 and b_j = 1 − a_j for j = 1, 2, .... Let I be the set of nonnegative integers. Define the stochastic matrix P on I by: P(0, 1) = 1, while P(j, j + 1) = a_j and P(j, j − 1) = b_j for 1 ≤ j < ∞. Plainly, I is an essential class of period 2. When is it recurrent? To state the answer, let r_0 = 1; let r_n = (b_1 ⋯ b_n)/(a_1 ⋯ a_n) for n = 1, 2, ...; let R(0) = 0; let R(n) = r_0 + ⋯ + r_{n−1} for n = 1, 2, ...; and let R(∞) = Σ_{n=0}^∞ r_n.

(104) Theorem. I is recurrent or transient according as R(∞) = ∞ or R(∞) < ∞. If I is recurrent, it is null or positive according as Σ_{n=1}^∞ 1/(a_n r_n) is infinite or finite.
The proof of this theorem is presented as a series of lemmas. Of these,
(105-111) deal with the criterion for recurrence, and (112) deals with
distinguishing null from positive recurrence. It is convenient to introduce a
stochastic matrix Q on I, which agrees with P except in the 0th row, where Q(0, 0) = 1.
(105) Lemma. For each i ∈ I, the process R(ξ_0), R(ξ_1), ... is a martingale relative to Q_i.

PROOF. On {ξ_n = j}, the conditional Q_i-expectation of R(ξ_{n+1}) given ξ_0, ..., ξ_n is Σ_k Q(j, k)R(k), by Markov (15). When j = 0, this sum is clearly = R(j). When j > 0, this sum is

a_jR(j + 1) + b_jR(j − 1) = a_j[R(j) + r_j] + b_j[R(j) − r_{j−1}]
                          = R(j) + a_jr_j − b_jr_{j−1}
                          = R(j).  *

Let 0 ≤ i < j < k in (106-111).

(106) Lemma. With Q_j-probability 1, there is an n such that ξ_n is i or k.

PROOF. Let π be the product a_{i+1} ⋯ a_{k−1}, and let d = k − i − 1. Given ξ_0, ..., ξ_n, on i < ξ_n < k, the conditional Q_j-probability that

i < ξ_n, ..., ξ_{n+d} < k

is no more than 1 − π. Indeed, π underestimates the conditional probability that ξ_{n+1}, ξ_{n+2}, ... proceed steadily to the right until reaching k. So, the Q_j-probability that i < ξ_0, ..., ξ_{md} < k is no more than (1 − π)^m.  *
Restate (106) as

(107) fQ{i}(j, k) + fQ{k}(j, i) = 1.

Of course, fQ{i}(j, k) is the Q_j-probability that ξ hits k before i.

(108) Lemma. fQ{k}(j, i) = [R(k) − R(j)]/[R(k) − R(i)].

PROOF. Let x = fQ{k}(j, i), so 1 − x = fQ{i}(j, k) by (107). Let τ be the least n with ξ_n = i or k, and τ = ∞ if none. Stop {R(ξ_n)} at τ, using (106) and (10.28). Thus

R(j) = ∫ R(ξ_τ) dQ_j
     = xR(i) + (1 − x)R(k).

Solve for x.  *
(109) Lemma. fQ{k}(j, i) = fP{k}(j, i).

PROOF. Let i_0, i_1, ..., i_n be any I-sequence which does not contain 0 except possibly for i_n. Then

P_j{ξ_0 = i_0, ξ_1 = i_1, ..., ξ_n = i_n} = Q_j{ξ_0 = i_0, ξ_1 = i_1, ..., ξ_n = i_n}.

Sum over all such sequences, with i_0 = j and i_n = i and i_m ≠ k for 0 < m < n; even n is allowed to vary.  *
(110) Lemma. (a) fP{k}(j, i) = [R(k) − R(j)]/[R(k) − R(i)].
(b) fP(j, i) = [R(∞) − R(j)]/[R(∞) − R(i)] if R(∞) < ∞.
(c) fP(j, i) = 1 if R(∞) = ∞.

PROOF. Use (108, 109) to get (a). Let k → ∞ and use (101) to get (b) and (c).  *

(111) Lemma. fP(i, j) = 1.

PROOF. As in (106).  *

ARGUMENT FOR RECURRENCE CRITERION IN (104). Suppose R(∞) = ∞. Then fP(i, j) = fP(j, i) = 1 by (110c, 111). And I is recurrent by (54).

Suppose R(∞) < ∞. Clearly, R(i) < R(j); so fP(j, i) < 1 by (110b). Now I is transient by (55).  *
Suppose I is recurrent for (112).

(112) Lemma. mP(0, 0) = 1 + Σ_{k=1}^∞ 1/(a_kr_k).

This is sharper than the null recurrence criterion of (104).
PROOF. Begin by computing some hitting probabilities. Let 0 < j < k. Then

fP{0}(j, k) = 1 − fP{k}(j, 0)               [(103) and recurrence]
            = 1 − [R(k) − R(j)]/R(k)        [(110a)].

So

(113) fP{0}(j, k) = R(j)/R(k).

Clearly, fP{0}(0, k) = fP{0}(1, k); so (113) makes

(114) fP{0}(0, k) = 1/R(k).

By (16) and recurrence, fP{0}(k, k) = a_k + b_kfP{0}(k − 1, k); so (113) makes

1 − fP{0}(k, k) = b_k[1 − R(k − 1)/R(k)],

and by algebra,

(115) 1 − fP{0}(k, k) = a_kr_k/R(k).

Now compute mP(0, 0), as follows:

mP(0, 0) = Σ_{k=0}^∞ eP{0}(0, k)                              [(102)]
         = eP{0}(0, 0) + Σ_{k=1}^∞ eP{0}(0, k)
         = 1 + Σ_{k=1}^∞ fP{0}(0, k)/[1 − fP{0}(k, k)]        [(100)]
         = 1 + Σ_{k=1}^∞ 1/(a_kr_k)                           [(114, 115)].  *
In a similar way, mP(i, j) can be computed. It is easy to drop the condition 0 < a_j < 1, and using the idea of (94), it is easy to handle the case where I is all the integers. For further information on random walks, consult (Chung and Fuchs, 1951), (Feller, 1966), or (Spitzer, 1964).
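The criteria (104) and (112) can be tested by simulation. As an assumed illustration (not from the text), take a_j = 1/3 for all j, so b_j = 2/3 and r_n = 2^n: then R(∞) = ∞ and Σ 1/(a_n r_n) = Σ 3/2^n = 3, so the chain is positive recurrent and (112) gives mP(0, 0) = 1 + 3 = 4. A Monte Carlo sketch:

```python
import random

random.seed(0)
a = 1/3   # assumed a_j = 1/3 for every j >= 1, so b_j = 2/3 and r_n = 2**n

def return_time():
    # one excursion from 0; the first step is forced, since P(0,1) = 1
    x, t = 1, 1
    while x != 0:
        x += 1 if random.random() < a else -1
        t += 1
    return t

n = 200_000
est = sum(return_time() for _ in range(n)) / n
assert abs(est - 4.0) < 0.05   # (112): mP(0,0) = 1 + sum 3/2**k = 4
```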

13. THE TAIL σ-FIELD AND A THEOREM OF OREY

Let {X_n : n = 0, 1, ...} be a stochastic process. Let ℱ(n) be the σ-field spanned by X_n, X_{n+1}, .... The tail σ-field of X is ℱ(∞) = ∩_n ℱ(n). Let ℱ(∞) be the tail σ-field of the coordinate process {ξ_n} on I^∞. The invariant σ-field 𝒥 of I^∞ is the σ-field of measurable sets B such that (i_0, i_1, ...) ∈ B iff (i_1, i_2, ...) ∈ B. If each X_n is I-valued, then the invariant σ-field of X is X^{-1}𝒥. Of course, X = (X_0, X_1, ...) is a measurable object with values in I^∞. The exchangeable σ-field ℰ of I^∞ is the σ-field of measurable sets B such that (i_0, i_1, ...) ∈ B iff (i_{π(0)}, i_{π(1)}, ...) ∈ B for all finite permutations π of (0, 1, ...): namely, those 1-1 mappings π of (0, 1, ...) onto itself with π(n) ≠ n for only finitely many n. If each X_n is I-valued, the exchangeable σ-field of X is X^{-1}ℰ. The object of this section is to describe ℱ(∞), 𝒥, and ℰ up to null sets for recurrent chains, and to prove theorem (128) of Orey. The discussion here and in Section 14 is based on (Blackwell and Freedman, 1964).

Clearly, 𝒥 ⊂ ℱ(∞) ⊂ ℰ. The inclusions are strict if I has two or more elements. For simplicity, assume I ⊃ {−1, 1} during the next three illustrations.
(116) Illustration. Let A be the set where ξ_n = 1 for infinitely many n. Then A ∈ 𝒥.

(117) Illustration. Let B be the set where ξ_{2n} = 1 for infinitely many n. Then B ∈ ℱ(∞), but B ∉ 𝒥. Indeed, define ω* ∈ I^∞ as follows:

ω*(n) = 1 for even n
      = −1 for odd n.

Then ω* ∈ B but Tω* = ω*(1 + ·) ∉ B.

(118) Illustration. Let C be the set where ξ_n = −1 or 1 for all n and ξ_0 + ⋯ + ξ_n = 0 for infinitely many n. Then C ∈ ℰ, but C ∉ ℱ(∞). Indeed, define ω* as in (117). Let ω⁺(n) = ω*(n) except at n = 1. Let ω⁺(1) = 1. Then ω* ∈ C but ω⁺ ∉ C.
Throughout this section, unless specified otherwise, I is a countable set and p is a probability on I; and P is a stochastic matrix on I, for which

ASSUMPTION. All states are recurrent.

To state the results, let {I_m : m ∈ M} be the partition of I into its cyclically moving subclasses; recall from (48) that i and j are in the same I_m iff there is an n ≥ 0 and a state k in I with P^n(i, k) > 0 and P^n(j, k) > 0. For m ∈ M, let t(m) be the index of the subclass following I_m; so i ∈ I_m and P(i, j) > 0 imply j ∈ I_{t(m)}, by (45). Thus, t is a 1-1 mapping of M onto itself. Let T be the shift, so Tω = ω(1 + ·). Check T^{-1}ℱ(∞) ⊂ ℱ(∞). Let {I_c : c ∈ C} be the partition of I into its communicating classes; recall from (33) that i and j are in the same I_c iff there are nonnegative n and m with P^n(i, j) > 0 and P^m(j, i) > 0. Finally, say i ~ j iff there is a state k and finite sequences ρ and σ of states, such that iρ is a permutation of jσ and all the transitions in iρk and in jσk are possible. Of course, the transition from a to b is possible iff P(a, b) > 0.

(119) Theorem. Each ℱ(∞)-set differs by a P_p-null set from some union of sets {ξ_0 ∈ I_m}. More precisely, let A ∈ ℱ(∞). Let M(A) be the set of m ∈ M such that P_i(A) > 0 for some i ∈ I_m. Then A differs from ∪{ {ξ_0 ∈ I_m} : m ∈ M(A)} by a P_p-null set. Conversely, each set {ξ_0 ∈ I_m} differs by a P_p-null set from an ℱ(∞)-set. Finally, M(T^{-1}A) = t^{-1}M(A).

NOTE. T^{-1}{ξ_0 ∈ I_m} = {ξ_1 ∈ I_m} = {ξ_0 ∈ I_{t^{-1}(m)}}: the last equality is a.e.

WARNING. P_p(A) = 0 and P_p(T^{-1}A) = 1 is a possibility, because p(I_m) = 0 and p(I_{t^{-1}(m)}) = 1 is a possibility.

(120) Theorem. Each 𝒥-set differs by a P_p-null set from some union of sets {ξ_0 ∈ I_c}. Conversely, each set {ξ_0 ∈ I_c} differs by a P_p-null set from an 𝒥-set.

(121) Theorem. The relation ~ is an equivalence relation: let {I_e : e ∈ E} be the partition it induces on I. Each ℰ-set differs by a P_p-null set from some union of sets {ξ_0 ∈ I_e}; and each set {ξ_0 ∈ I_e} differs by a P_p-null set from an ℰ-set.

NOTE. The partition {I_e : e ∈ E} is finer than the partition {I_m : m ∈ M}, which in turn is finer than {I_c : c ∈ C}.
Turn now to the proofs. The first result is the 0-1 law of Hewitt and Savage (1955). For future use, I will state the result quite generally. Let (V, 𝒜) be a measurable space. Let V^∞ be the space of V-sequences, endowed with the product σ-field 𝒜^∞. A finite permutation π on Z = (0, 1, ...) is a 1-1 mapping of Z onto Z, with π(n) = n for all but finitely many n. Each π induces a 1-1 bimeasurable mapping π* of V^∞ onto V^∞:

π*(v_0, v_1, ...) = (v_{π(0)}, v_{π(1)}, ...).

The σ-field ℰ of exchangeable sets is the σ-field of A ∈ 𝒜^∞ with π*A = A for all finite permutations π of Z. Let (Ω, ℱ, 𝒫) be a probability triple. Let X_0, X_1, ... be measurable mappings from (Ω, ℱ) to (V, 𝒜). Then X = (X_0, X_1, ...) is a measurable mapping from (Ω, ℱ) to (V^∞, 𝒜^∞). An exchangeable X-set is a set X^{-1}A with A ∈ ℰ.

(122) Theorem. Suppose X_0, X_1, ... are 𝒫-independent and identically distributed. Then any exchangeable X-set has 𝒫-probability 0 or 1.

PROOF. Let Q = 𝒫X^{-1}. It is enough to prove (122) when X_n is the coordinate process on (V^∞, 𝒜^∞, Q). Let A ∈ ℰ and let B be measurable and depend only on finitely many coordinates. The rest of the proof shows Q(A ∩ B) = Q(A)Q(B). The equality then holds for all measurable B, especially B = A. Therefore, Q(A) = Q(A)² = 0 or 1.

Fix ε > 0. Find a measurable set A_ε, depending on only finitely many coordinates, with Q(A_ε Δ A) < ε. Here, C Δ D = (C − D) ∪ (D − C). Now Q and A are invariant under π*, so

Q(π*A_ε Δ A) < ε.

Consequently,

|Q(B ∩ A) − Q(B ∩ π*A_ε)| < ε.

I will construct a π with

Q(B ∩ π*A_ε) = Q(B)Q(π*A_ε) = Q(B)Q(A_ε).

Indeed, suppose B depends only on coordinates n ≤ b, and A_ε depends only on coordinates n ≤ a. Let c > max {a, b}. Let

π(n) = n + c for 0 ≤ n < c
     = n − c for c ≤ n < 2c
     = n     for n ≥ 2c.

Then π*A_ε depends only on coordinates n with c ≤ n < 2c, and is independent of B. For this π,

|Q(B ∩ π*A_ε) − Q(B)Q(A)| ≤ Q(B)|Q(A_ε) − Q(A)| < ε.

As usual,

Q(B ∩ A) − Q(B)Q(A) = Q(B ∩ A) − Q(B ∩ π*A_ε) + Q(B ∩ π*A_ε) − Q(B)Q(A).

Thus,

|Q(B ∩ A) − Q(B)Q(A)| < 2ε.  *
(123) Lemma. If A ∈ ℱ(∞), then P_i(A) = 0 or 1.

PROOF. Let Ω₀ be the set of ω ∈ I^∞ such that ω(n) = i for infinitely many n, and ω(0) = i. Then P_i(Ω₀) = 1, by (50). So

P_i(A) = P_i(A ∩ Ω₀).

I say A ∩ Ω₀ is in the exchangeable σ-field of the process of i-blocks retracted to Ω₀. Indeed, let β_0, β_1, ... be the successive i-blocks. Now β = (β_0, β_1, ...) is a 1-1 bimeasurable mapping of Ω₀ onto the space of sequences of finite i-blocks. Let B = β(A ∩ Ω₀). Let w̄ be a sequence of finite i-blocks, and let π be a finite permutation of Z. I have to show that

w̄ ∈ B iff π*w̄ ∈ B.

But w̄ = βω and π*w̄ = βω* for unique ω and ω* in Ω₀. By thinking, ω* = ρ*ω for some finite permutation ρ of Z with ρ(0) = 0.

NOTE. ρ depends on π and ω; for each ω, as π ranges over all the finite permutations, ρ typically ranges only over a small subset of the finite permutations.

Finally, ω ∈ A ∩ Ω₀ iff ρ*ω ∈ A ∩ Ω₀, because A ∈ ℱ(∞). Blocks (31) and Hewitt-Savage (122) now force

P_i(A ∩ Ω₀) = 0 or 1.  *
(124) Lemma. Let i and j be in the same cyclically moving subclass. Then P_i = P_j on ℱ(∞).

PROOF. Suppose P^n(i, k) > 0 and P^n(j, k) > 0. Let A ∈ ℱ(∞) and suppose P_i{A} > 0. Using (123),

1 = P_i{A} = P^n(i, k) · P_i{A | ξ_n = k} + [1 − P^n(i, k)] · P_i{A | ξ_n ≠ k}.

So, 0 < P_i{A | ξ_n = k} = P_j{A | ξ_n = k} by Markov (15); and P_j{A} > 0, forcing P_j{A} = 1 by (123).  *

This proof of (124) was suggested by Volker Strassen.
PROOF OF (119). Let A ∈ ℱ(∞). Then M, the set of cyclically moving subclasses I_m, is the disjoint union of M_0 and M_1, where m ∈ M_0 iff P_i{A} = 0 for all i ∈ I_m, and m ∈ M_1 iff P_i{A} = 1 for all i ∈ I_m. This uses (123-124). Thus, M(A) = M_1. If m ∈ M_0, then P_p{A and ξ_0 ∈ I_m} = 0. If m ∈ M_1, then P_p{A and ξ_0 ∈ I_m} = P_p{ξ_0 ∈ I_m}. Thus, A differs by a P_p-null set from the union over m ∈ M_1 of {ξ_0 ∈ I_m}.

For the converse assertion, suppose i ∈ I_m has period d. Then

{ξ_{nd} ∈ I_m for infinitely many n}

is a tail set which differs from {ξ_0 ∈ I_m} by a P_p-null set; for instance, use (16) and (45b).

For the final assertion, use Markov (15):

P_i{T^{-1}A} = Σ_j P_i{ξ_1 = j and T^{-1}A}
             = Σ_j P(i, j) · P_j{A}.

Suppose i ∈ I_m. Then j can be confined to I_{t(m)}, for otherwise P(i, j) = 0 by (45). Now P_j{A} is constant at 0 or 1. So P_i{T^{-1}A} is 0 or 1, according as i ∈ I_m with t(m) ∈ M_0 or t(m) ∈ M_1.  *
PROOF OF (120). Suppose I_m ⊂ I_c, and I_c has period d. Then (45) shows

I_c = ∪_{n=0}^{d−1} I_{t^{−n}(m)}.

Let A be an invariant set. Then A ∈ ℱ(∞). Use (119) to check M(A) = M(T^{-1}A) = t^{-1}M(A). Consequently,

∪{I_m : m ∈ M(A)} = ∪{I_c : c ∈ C(A)},

where c ∈ C(A) iff I_c ⊃ I_m for some m ∈ M(A).

Conversely, {ξ_0 ∈ I_c} differs by a P_p-null set from the invariant set

{ξ_n ∈ I_c for infinitely many n};

for instance, use (16) and (39).  *

(125) Lemma. If A ∈ ℰ, then P_i{A} = 0 or 1.

PROOF. As in (123).  *
(126) Lemma. i ~ j implies P_i = P_j on ℰ.

PROOF. Find a state k and finite state sequences ρ and σ, with iρ a permutation of jσ and all transitions in iρk and jσk possible. Let N be the length of ρ, also of σ. Let A ∈ ℰ, with P_i{A} = 1. The problem is to show P_j{A} > 0; this reduction depends on (125).

For any finite state sequence φ, let A_φ be the set of infinite state sequences ψ with φψ ∈ A. Here, φψ means the infinite state sequence φ followed by ψ. If φ is a permutation of φ*, then A_φ = A_{φ*} by exchangeability. Because P_i{A} = 1 and all transitions in iρk are possible,

1 = P_i{A | ξ_0 ⋯ ξ_{N+1} = iρk}
  = P_k{A_{iρk}}
  = P_k{A_{jσk}}
  = P_j{A | ξ_0 ⋯ ξ_{N+1} = jσk};

this uses Markov (15). Because all transitions in jσk are possible, P_j{A} > 0.  *
(127) Lemma. i ≁ j implies P_i ≠ P_j on ℰ.

PROOF. Fix any state k in the same communicating class as i. Let A_0 be the set of all infinite state sequences with infinitely many k's, starting from i, and all transitions possible. So P_i{A_0} = 1 by (16, 73). Let A_1 be the smallest exchangeable set including A_0, so P_i{A_1} = 1. Now A_1 consists of all infinite state sequences such that k occurs infinitely often, and for all remote k's, the preceding part of the sequence is obtained by permuting some finite state sequence iρ, all the transitions in iρk being possible. I say P_j{A_1} = 0. For A_1 ⊂ ∪_{n=1}^∞ B_n, where B_n is the set of infinite state sequences with k in the nth place, the preceding part of the sequence being a permutation of some finite state sequence iρ, all the transitions in iρk being possible. But i ≁ j implies P_j{B_n} = 0.  *

PROOF OF (121). As for (119), using (125-127).  *

Similar results apply to random walks. Let G be a countable Abelian group and let π be a probability on G. Let V be the set of i ∈ G with π(i) > 0. Let {G_m : m ∈ M} partition G into the cosets of the group spanned by V − V. Let {G_c : c ∈ C} partition G into the cosets of the group spanned by V. Let {Z_n : 0 ≤ n < ∞} be independent random variables with values in G, the Z_n with n ≥ 1 having common distribution π. Let X_n = Σ_{v=0}^n Z_v. Now, {X_n} is a Markov chain, but is in general transient. The tail σ-field of {X_n} is equivalent to the atomic σ-field generated by the sets {X_0 ∈ G_m} for m ∈ M. The invariant σ-field of {X_n} is equivalent to the atomic σ-field generated by the sets {X_0 ∈ G_c} for c ∈ C. Proofs are omitted, being virtually identical to those for (119) and (120). An analog of (128) can also be obtained, using similar ideas. This leads to another proof of the renewal theorem.
The next theorem is due to (Orey, 1962).

(128) Theorem. If i and j are in the same cyclically moving subclass,

lim_{n→∞} Σ_{k∈I} |P^n(i, k) − P^n(j, k)| = 0.

This auxiliary result will be helpful:

(129) Lemma. Let {X_n : n = 0, 1, ...} be a stochastic process on (Ω, ℱ, 𝒫). The tail σ-field of X is trivial under 𝒫 if

(130) lim_{n→∞} sup_{A∈ℱ(n)} |𝒫{A ∩ B} − 𝒫{A}𝒫{B}| = 0

for each B ∈ ℱ(∞). If the tail σ-field of X is trivial under 𝒫, then (130) holds for each B ∈ ℱ.

PROOF. If (130) holds, use it on A = B ∈ ℱ(∞) to get 𝒫{B} = 𝒫{B}² = 0 or 1. Conversely, if ℱ(∞) is trivial, fix B ∈ ℱ, and let 1_B be 1 on B and 0 off B. For A ∈ ℱ(n),

|𝒫{A ∩ B} − 𝒫{A}𝒫{B}| = |∫_A [1_B − 𝒫{B}] d𝒫|
                       = |∫_A [𝒫{B | ℱ(n)} − 𝒫{B}] d𝒫|
                       ≤ ∫ |𝒫{B | ℱ(n)} − 𝒫{B}| d𝒫.

Use backward martingales (10.34).  *

PROOF OF (128). Let Q = ½P_i + ½P_j. As (119) implies, ℱ(∞) is trivial for Q. Use (129) twice, with A = {ξ_n ∈ S} and B = {ξ_0 = i} or B = {ξ_0 = j}, to get

lim_{n→∞} sup_{S⊂I} |P_i{ξ_n ∈ S} − P_j{ξ_n ∈ S}| = 0.

Finally, Σ_{k∈I} |q(k) − r(k)| = 2 sup_{S⊂I} |q(S) − r(S)| for any probabilities q and r on all subsets S of I.  *
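For a finite aperiodic chain, (128) can be seen numerically: the rows of P^n merge in the l₁ metric, and the distance between two rows never increases with n (applying a stochastic matrix is a contraction in l₁). A small sketch with an assumed 3-state matrix:

```python
import numpy as np

# An assumed aperiodic recurrent chain on {0, 1, 2}.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
Pn = np.eye(3)
dists = []
for _ in range(60):
    Pn = Pn @ P
    dists.append(np.abs(Pn[0] - Pn[1]).sum())   # l1 gap between rows of P^n

assert dists[-1] < 1e-10                        # rows merge, as in (128)
assert all(x >= y - 1e-12 for x, y in zip(dists, dists[1:]))  # gap never grows
```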

14. EXAMPLES

(131) Example. There is a transition matrix P for which I is a recurrent class of states, having period 1, such that two independent particles, starting in different states and moving according to P, may never meet.

DISCUSSION. The states are the nonnegative integers, and P(i, i − 1) = 1 for i ≥ 1, while P(0, n) = a_{n+1} > 0 for n = 0, 1, ... will be defined below. Consider two independent Markov processes {X_n : 0 ≤ n < ∞} and {Y_n : 0 ≤ n < ∞} with P for matrix of stationary transition probabilities and initial states 0 and 1, respectively. If the two processes ever meet, they stay together until reaching 0. Let S_v be the time of the vth return of {X_n} to 0; while T_v is the time of the vth visit of {Y_n} to 0. Then, S_k = Σ_{v=1}^k U_v and T_k = 1 + Σ_{v=1}^{k−1} V_v, where {U_v : 1 ≤ v < ∞; V_v : 1 ≤ v < ∞} are independent, with this common distribution: the value n is taken with probability a_n for n = 1, 2, .... The {a_n} will be chosen to make the event {S_k = T_m for some k ≥ 1 and m ≥ 1} have probability less than 1. Consider the random walk on the planar lattice where a point moves to each of its four neighbors with probability 1/5 and stays fixed with probability 1/5. Let a_n be the probability of a first return to the origin at time n. If two independent such walks start simultaneously from the origin, the probability that for some n the first returns to the origin at time n, while the second returns at time n − 1, is precisely the probability of

{S_k = T_m for some k ≥ 1 and m ≥ 1}.

This probability cannot be 1; for suppose it were. By symmetry, with probability 1 there is some m such that the first walk returns to the origin at time m − 1 and the second returns at m. By strong Markov, the two walks would return to the origin at the same time with probability 1, which is known to be false (Chung and Fuchs, 1951, Theorem 6).  *

NOTE. Let P be a stochastic matrix on I. Let {X_n} and {Y_n} be independent P-chains. Then {(X_n, Y_n)} is also a Markov chain, with state space I × I and transitions T, where

T^n[(i, j), (i', j')] = P^n(i, i')P^n(j, j').

Suppose I is a positive recurrent class of period 1 for P. As (68-69) imply, I × I is a positive recurrent class of period 1 for T. In particular, {X_n} and {Y_n} meet with probability 1.

(132) Example. There is a transition matrix P and starting probability p for which I is a transient essential class, the invariant σ-field is trivial, while the tail σ-field is full and nonatomic.

DISCUSSION. The states are pairs (m, n) of positive integers with n ≤ 2·3^m. The matrix P is defined by:

P[(m, 1), (m, 3^m)] = P[(m, 1), (m, 2·3^m)] = ½;

while

P[(m, n), (m, n − 1)] = 1 for n ≥ 3

and

P[(m, 2), (m + 1, 1)] = 1.

The starting probability concentrates on (1, 1).

For m ≥ 1, let U_m be 1 or 2 according as (m, 1) is followed by (m, 3^m) or (m, 2·3^m) in {ξ_n : n = 0, 1, ...}. Consider P_{(1,1)}. The time T_m at which (m + 1, 1) is reached equals Σ_{v=1}^m U_v3^v a.e., and T_m determines U_1, ..., U_m a.e. The tail σ-field of {ξ_n} is therefore equivalent to the σ-field determined by {U_n : 1 ≤ n < ∞}, which is equivalent to the full σ-field. Since U_1, U_2, ... are independent and identically distributed, and nonconstant, the σ-field they generate is nonatomic. Let A be an invariant set. Then

1_A = 1_A(ξ_{T_m}, ξ_{T_m+1}, ...)

is measurable on U_{m+1}, U_{m+2}, ..., and A has probability 0 or 1 by the Kolmogorov 0-1 law.  *
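The bookkeeping behind this discussion is easy to check by simulating the chain: from (m, 1) the path takes exactly U_m·3^m steps before first reaching (m + 1, 1), so the hitting time of (m + 1, 1) is Σ_{v≤m} U_v·3^v, a base-3 expansion whose digits recover U_1, ..., U_m. A quick sketch:

```python
import random

random.seed(1)
m, n, t = 1, 1, 0
U, T = {}, {}
while m <= 6:
    if n == 1:
        U[m] = random.choice([1, 2])   # the two equally likely jumps from (m, 1)
        n = U[m] * 3**m
    elif n == 2:
        T[m] = t + 1                   # next step lands on (m + 1, 1)
        m, n = m + 1, 1
    else:
        n -= 1
    t += 1

# T_m is a base-3 expansion with digits U_v, so it determines U_1, ..., U_m.
for k in sorted(T):
    assert T[k] == sum(U[v] * 3**v for v in U if v <= k)
```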
(133) Example. There is a transition matrix P and starting probability p on I = {1, 2, 3} for which I is a communicating class of period 1, but the exchangeable σ-field is nontrivial.

DISCUSSION. P is the matrix

Let ρ and σ be finite sequences of states with all transitions possible in 1ρ3 and 3σ3. In a finite sequence of states with all transitions possible, any 3 is either at the beginning of the sequence or is preceded by a 2. Consequently, in 1ρ there is one more 2 than 3. In 3σ, however, there are as many 2's as 3's. Thus 1ρ cannot be permuted into 3σ, and in view of (121), the σ-field of exchangeable events is not trivial for P_p, provided p(1) > 0 and p(3) > 0.  *
2

RATIO LIMIT THEOREMS

1. INTRODUCTION

Throughout this chapter, unless noted otherwise, I is a recurrent class relative to the stochastic matrix P. Interest centers on the null recurrent case and on measures with infinite mass. Fix a reference state s ∈ I. Remember that {ξ_n} is Markov with stationary transitions P and starting state s relative to the probability P_s. Remember that the first s-block runs from the first s to just before the second s. Remember the definition (1.80) of invariance and subinvariance. Define the measure μ on I by the relation: μ(i) is the P_s-mean number of i's in the first s-block; that is,

μ(i) = e(P{s})(s, i) = eP{s}(s, i).

As (1.100) implies, μ(i) < ∞.

(1) Theorem (Derman, 1954). There is one and only one invariant measure whose value at s is 1, namely μ. Any subinvariant measure is invariant.

(2) Theorem (Doeblin, 1938). Let i, j, k, l ∈ I. Then

lim_{n→∞} [Σ_{m=0}^n P^m(i, j)]/[Σ_{m=0}^n P^m(k, l)] = μ(j)/μ(l).

Section 2 contains some preliminary material on reversing time, and makes no assumptions about I. Theorems (1) and (2) are then proved together in Section 3. Section 4 contains some variations on (1) and (2); on first reading the book, do only (21-24) in Section 4. Section 5 is about restricting the range of a chain {ξ_n}: fix a set J of states, and delete the times n with ξ_n ∉ J. This idea leads to another proof of (1), and reappears in ACM. On first reading the book, skip Section 5.

I want to thank Allan Izenman for checking the final draft of this chapter.

(3) Theorem (Kingman and Orey, 1964). Let N be a positive integer, and let ε > 0. Suppose I has period 1, and P^N(i, i) ≥ ε for all i. Then for each m = 0, 1, ... and i, j, k, l ∈ I:

lim_{n→∞} P^{n+m}(i, j)/P^n(k, l) = μ(j)/μ(l).
This result is proved in Section 6. Section 7 contains an example like Dyson's in (Chung, 1960, p. 55), which shows why you should assume P^N(i, i) ≥ ε. This section can be skipped without logical loss.

(4) Theorem (Harris, 1952) and (Levy, 1951). Let V_n(i) be the number of m ≤ n with ξ_m = i. With P_s-probability 1,

lim_{n→∞} V_n(j)/V_n(k) = μ(j)/μ(k).

This result is proved as (83) in Section 8. Section 9 contains related material which is used in a marginal way in Chapter 3. Part (a) of (92) is in (Chung, 1960, p. 79).
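For a finite recurrent chain, theorems (1) and (2) can be checked numerically. The sketch below uses an assumed 3-state matrix with reference state s = 0; μ is taken as the invariant measure normalized by μ(s) = 1, and the ratio in (2) is approximated by partial sums of P^m.

```python
import numpy as np

# An assumed 3-state recurrent chain; reference state s = 0.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

# Invariant measure with mu(0) = 1, as in Derman's theorem (1):
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmax(np.real(w))])   # left eigenvector for eigenvalue 1
mu = mu / mu[0]
assert np.allclose(mu @ P, mu)

# Doeblin's theorem (2): ratios of partial sums of P^m tend to mu(j)/mu(l).
S, Pm = np.zeros((3, 3)), np.eye(3)
for _ in range(20000):
    S += Pm
    Pm = Pm @ P
i, j, k, l = 0, 1, 2, 2
assert abs(S[i, j] / S[k, l] - mu[j] / mu[l]) < 1e-2
```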

2. REVERSAL OF TIME

In this section, let P be a stochastic matrix on the countable set I; but do not make any assumptions about the recurrence properties of I. Let v be a measure on I. Say v is strictly positive iff v(i) > 0 for all i ∈ I. Say v is locally finite iff v(i) < ∞ for all i ∈ I. Say v is finite iff v(I) < ∞. Do not assume v is finite without authorization. Throughout this section, suppose

v is subinvariant.

If Q is substochastic, write i → j relative to Q iff Q^n(i, j) > 0 for some n > 0. Write i → j iff i → j for P.
(5) Lemma. (a) v(i) > 0 and i → j imply v(j) > 0.
(b) v(j) < ∞ and i → j imply v(i) < ∞.

PROOF. v(j) ≥ Σ_k v(k)P^n(k, j) ≥ v(i)P^n(i, j).  *

(6) Lemma. If I is a communicating class, either v is identically 0, or v is strictly positive and locally finite, or v(i) = ∞ for all i.

PROOF. Use (5).  *

Let J = {i : 0 < v(i) < ∞}. For any matrix M on I, define the matrix ᵥM on J by:

ᵥM(i, j) = v(j)M(j, i)/v(i).

Call ᵥM the reversal of M by v.

(7) Lemma. (a) ᵥP is substochastic.
(b) Suppose v is locally finite, and not identically zero. Then ᵥP is stochastic iff v is invariant.

PROOF. Fix i ∈ J. If v(j) = 0, then v(j)P(j, i) = 0. If v(j) = ∞, then P(j, i) = 0 by (5b), so v(j)P(j, i) = 0 by convention. Consequently,

(8) Σ_{j∈J} ᵥP(i, j) = Σ_{j∈J} v(j)P(j, i)/v(i)
                     = Σ_{j∈I} v(j)P(j, i)/v(i).

This proves (a), and the if part of (b), because J is nonempty. For the only if part of (b), suppose v is locally finite and ᵥP is stochastic. Then (8) makes

v(i) = Σ_{j∈I} v(j)P(j, i) for v(i) > 0.

Subinvariance by itself makes

v(i) = Σ_{j∈I} v(j)P(j, i) for v(i) = 0.  *
Let e"P = L~=o pm, where po is the identity matrix. So, e"P(i,j) is the
P;-mean number of visits to j up to time n. Remember eP from (1.49). So
eP = limn enp. Check

(9) (.p)n = .(P") and e"(.P) = .(e"P) and e(.P) = veeP).

Suppose i,j E J Use (9) and (1.51, 1.69):


(10) i---+ j for P and .p simultaneously; i is transient, null recurrent or
positive recurrent for P and .p simultaneously.
(11) Lemma. Let Q be a substochastic matrix on I, which is not stochastic. Suppose that i → j relative to Q for any pair i, j. Then eQ < ∞.

PROOF. Let ∂ ∉ I. Extend Q to a matrix Q̄ on Ī = I ∪ {∂} as follows:

Q̄(i, j) = Q(i, j) for i and j in I
Q̄(i, ∂) = 1 − Σ_{j∈I} Q(i, j) for i in I
Q̄(∂, j) = 0 for j in I
Q̄(∂, ∂) = 1.

Then Q̄ is stochastic. By assumption, there is a k ∈ I with Q̄(k, ∂) > 0. Let i ∈ I. By assumption, i → k for Q̄. Then Q̄ makes i → k and k → ∂, so i → ∂; but ∂ ↛ i: now (1.55) shows i to be transient. Let j ∈ I. Use (1.51) to get eQ̄(i, j) < ∞. Check Q̄^n(i, j) = Q^n(i, j). So eQ(i, j) < ∞.  *
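When v is a strictly positive invariant probability, (7b) and (9) can be verified directly. A small sketch with an assumed 3-state matrix; v is computed as the left eigenvector for eigenvalue 1.

```python
import numpy as np

# An assumed irreducible 3-state stochastic matrix.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
w, V = np.linalg.eig(P.T)
v = np.real(V[:, np.argmax(np.real(w))])
v = v / v.sum()                            # invariant: vP = v

def reverse(M, v):
    # the reversal of M by v: (vM)(i, j) = v(j) M(j, i) / v(i)
    return (M.T * v) / v[:, None]

rP = reverse(P, v)
assert np.allclose(rP.sum(axis=1), 1.0)    # (7b): stochastic since v is invariant
assert np.allclose(np.linalg.matrix_power(rP, 3),
                   reverse(np.linalg.matrix_power(P, 3), v))   # (9)
```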

3. PROOFS OF DERMAN AND DOEBLIN

Suppose again that I is a recurrent class of states relative to P.

(12) Lemma. Any subinvariant measure is invariant.

PROOF. Let v be subinvariant. Suppose v is strictly positive and locally finite; (6) takes care of the other cases. Then (9) and (1.51) make

e(ᵥP)(i, j) = ∞.

Lemma (11) makes ᵥP stochastic, and (7b) makes v invariant.  *

If v is a measure, f is a function, and M is a matrix on I, let

vMf = Σ_{i,j} v(i)M(i, j)f(j).

Let the probability δ_i assign mass 1 to i and vanish elsewhere. Let the function δ_i be 1 at i and vanish elsewhere. So pe_nPδ_i is the P_p-mean number of visits to i up to time n.

(13) Lemma. Let p and q be probabilities on I. Then

lim_{n→∞} (pe_nPδ_i)/(qe_nPδ_i) = 1.

PROOF. It is enough to do the case q = δ_i:

(pe_nPδ_i)/(qe_nPδ_i) = [(pe_nPδ_i)/(δ_ie_nPδ_i)] · [(δ_ie_nPδ_i)/(qe_nPδ_i)].

Let T be the least n if any with ξ_n = i, and T = ∞ if none. I say

(14) pe_nPδ_i = Σ_{t=0}^n P_p{T = t} · (e_{n−t}P)(i, i).

Indeed, δ_i(ξ_m) = 0 for m < T, so

pe_nPδ_i = ∫ Σ_{m=0}^n δ_i(ξ_m) dP_p
         = Σ_{t=0}^n ∫_{\{T=t\}} Σ_{m=t}^n δ_i(ξ_m) dP_p.

Check that {T = t} is measurable on {ξ_0, ..., ξ_t}; and ξ_t = i on {T = t}. So Markov (1.15) implies

∫_{\{T=t\}} Σ_{m=t}^n δ_i(ξ_m) dP_p = P_p{T = t} · ∫ Σ_{m=0}^{n−t} δ_i(ξ_m) dP_i
                                   = P_p{T = t} · (e_{n−t}P)(i, i).

This proves (14).

Divide both sides of (14) by


()ienP()i = (enP)(i, i).
For each t = 0, 1, ... , the ratio (en-tP)(i, i)/(enP)(i, i) is no more than 1,
and converges to 1 as n -+ 00: because
o ~ (enP)(i, i) - (en-tP)(i, i) ~ t
and

by (1.51). Next, (1.55) makes


~o;;:;t<CX) Pp{T = t} = I.
= bi •
*
Dominated convergence now proves (13) in the case q
For measures ν and functions f on I, let ν(f) = Σ_{i∈I} f(i)ν(i), and let f × ν be the measure on I which assigns mass f(i)ν(i) to i.

(15) Lemma. Let f and g be functions on I. Let ν be a strictly positive and locally finite invariant measure on I. Suppose ν(|f|) < ∞ and ν(|g|) < ∞ and ν(g) ≠ 0. Then
lim_{n→∞} (δ_ie_nPf)/(δ_ie_nPg) = ν(f)/ν(g).

PROOF. Let M be a substochastic matrix on I, and let h be a nonnegative function on I, with ν(h) < ∞. By algebra,
δ_iMh = (h × ν)M̌δ_i/ν(i),
where M̌ is the reversal of M relative to ν. Using (9),
(16) δ_ie_nPh = (h × ν)e_n(P̌)δ_i/ν(i) < ∞.
Suppose h′ is another nonnegative function on I, with ν(h′) < ∞. As (16) implies, δ_ie_nPh′ < ∞. Consequently,
(17) δ_ie_nP(h − h′) = δ_ie_nPh − δ_ie_nPh′.
Case 1: f and g are nonnegative. Use (13), with (f × ν)/ν(f) for p and (g × ν)/ν(g) for q and P̌ for P; this is legitimate because (10) makes I a recurrent class for P̌. The yield is
[(f × ν)e_n(P̌)δ_i]/[(g × ν)e_n(P̌)δ_i] → ν(f)/ν(g).
Now use (16) with f or g for h.
Case 2: g is nonnegative. Split f into its positive and negative parts. Use (17) and case 1.
Case 3: f is nonnegative. Take reciprocals and use case 2.

The general case. Split f into its positive and negative parts. Use (17) and case 3. *
Remember that μ(i) = eP{s}(s, i) is the P_s-mean number of i's in the first s-block.

(18) Lemma. (a) Let k ≠ s. The P_s-mean number of pairs j followed by k in the first s-block is μ(j)P(j, k).
(b) The P_s-probability that the first s-block ends with j is μ(j)P(j, s).

PROOF. Claim (a). Let τ be the least n > 0 if any with ξ_n = s, and τ = ∞ if none. Let α_n be the indicator function of the event
A_n = {ξ_n = j and n < τ}.
Let β_n be the indicator function of the event
B_n = {A_n and ξ_{n+1} = k}.
Confirm τ > n + 1 on B_n, because k ≠ s. On {ξ_0 = s}:
Σ_{n=0}^{∞} α_n is the number of j's in the first s-block;
Σ_{n=0}^{∞} β_n is the number of pairs j followed by k in the first s-block.
Check that {τ > n} is in the σ-field generated by ξ_0, …, ξ_n. Now Markov (1.15) and monotone convergence imply
∫(Σ_{n=0}^{∞} β_n) dP_s = Σ_{n=0}^{∞} P_s{A_n and ξ_{n+1} = k}
= Σ_{n=0}^{∞} P_s(A_n) · P(j, k)
= [∫(Σ_{n=0}^{∞} α_n) dP_s] · P(j, k)
= μ(j)P(j, k).
Claim (b) is similar. *
(19) Lemma. μ(s) = 1 and μ is invariant.

PROOF. By definition, the first s-block begins at the first s, and ends just before the second s. There is exactly one s in the first s-block, so μ(s) = 1. If k = s, then μ(j)P(j, k) is the P_s-probability that the first s-block ends with j, by (18b). The sum on j is therefore 1, that is, μ(s). If k ≠ s, then μ(j)P(j, k) is the P_s-mean number of pairs j followed by k in the first s-block, by (18a). The sum on j is therefore the P_s-mean number of k's in the first s-block, that is, μ(k). *

(20) Lemma. If ν is a strictly positive and locally finite invariant measure, then ν(j)/ν(k) = μ(j)/μ(k).

PROOF. Use (15), with f = δ_j and g = δ_k. Then use (19) to put μ for ν in (15). *

PROOF OF (1). Use (12, 19, 20). *
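Lemmas (18–19) can be checked by linear algebra on a small chain. The sketch below is my own (the matrix is arbitrary): it computes μ(j) = eP{s}(s, j), the mean number of j's in the first s-block, from the taboo part of P, and verifies that μ(s) = 1 and that μ is invariant.

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
s = 0
others = [1, 2]

# Taboo part of P: transitions among the states other than s.
P_taboo = P[np.ix_(others, others)]

# For j != s: the first step lands in I \ {s} with law P(s, .), and
# (I - P_taboo)^{-1} counts the visits to j before the return to s.
visits = P[s, others] @ np.linalg.inv(np.eye(2) - P_taboo)

mu = np.empty(3)
mu[s] = 1.0            # exactly one s in the first s-block
mu[others] = visits

print(mu, mu @ P)      # mu @ P reproduces mu: the measure is invariant
```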
PROOF OF (2). Abbreviate θ(i, j) = δ_ie_nPδ_j. Then
θ(i, j)/θ(k, l) = [θ(i, j)/θ(k, j)] · [θ(k, j)/θ(k, l)].
Use (13) with δ_i for p and δ_k for q and j for i to make the first factor converge to 1. Use (15) with k for i and δ_j for f and δ_l for g to make the second factor converge to μ(j)/μ(l). *
4. VARIATIONS

(21) Theorem. eP{i}(i, k) = eP{i}(i, j) · eP{j}(j, k).

FIRST PROOF. Define a measure ν on I by:
ν(k) = eP{i}(i, k)/eP{i}(i, j).
Plainly, ν(j) = 1. By (19), the measure ν is invariant, so (1) implies
ν(k) = eP{j}(j, k). *
SECOND PROOF. The nth j-block is the sample sequence from the nth j until just before the (n+1)st j, shifted to the left so as to start from time 0. Here, n = 1, 2, …. Suppose i, j, k are all different. Let Z_n be the number of k's in the nth j-block. Let τ be the number of j's in the first i-block. As in Figure 1, let A be the number of k's on or after the first i, but before the first j after the first i. Let B be the number of k's on or after the second i, but before the first j after the second i. On {ξ_0 = i}, the number of k's in the first i-block is
Σ_{n=1}^{τ} Z_n + A − B.
So
eP{i}(i, k) = ∫(Σ_{n=1}^{τ} Z_n + A − B) dP_i
= ∫ Σ_{n=1}^{τ} Z_n dP_i + ∫ A dP_i − ∫ B dP_i.
By strong Markov (1.22*):
∫ A dP_i = ∫ B dP_i;
∫ Z_1 dP_i = eP{j}(j, k).

Figure 1. Visits to i and j are shown, but not to k. The variables A, B, Z_1, Z_2, … are the numbers of k's in the intervals shown; the two panels illustrate the cases τ = 0 and τ = 3.

By blocks (1.31),

(22) the variables Z_1, Z_2, … are P_i-independent and identically distributed.

I claim

(23) {τ < n} is P_i-independent of Z_n.

Granting (23), Wald (1.71) implies
∫ Σ_{n=1}^{τ} Z_n dP_i = ∫ τ dP_i · ∫ Z_1 dP_i
= eP{i}(i, j) · eP{j}(j, k).
Relation (23) follows from
P_i{τ = m and Z_n = z} = P_i{τ = m} · P_i{Z_n = z}
for m < n. Let σ be the time of the second i. Then σ is Markov, and ξ_σ = i on {σ < ∞}. Clearly, τ is measurable on the pre-σ sigma field. Let ζ be the post-σ process. Then Z_n = Z_{n−m} ∘ ζ on {ξ_0 = i and τ = m}, as in Figure 1. Use strong Markov:
P_i{τ = m and Z_n = z} = P_i{τ = m and ζ ∈ [Z_{n−m} = z]}
= P_i{τ = m} · P_i{Z_{n−m} = z} by (1.22)
= P_i{τ = m} · P_i{Z_n = z} by (22).
This proves (23). *

For (24), let I be a positive recurrent class. That is, the P_i-mean waiting time mP(i, i) for a return to i is finite. And π(i) = 1/mP(i, i) is the unique invariant probability.

(24) Theorem. Suppose I is positive recurrent:
(a) eP{i}(i, j) = π(j)/π(i) = mP(i, i)/mP(j, j).
(b) Any subinvariant measure is a nonnegative scalar multiple of π, and is automatically of finite total mass and invariant.

PROOF. Claim (a). As in (21), using (1.84) for the first argument.
Claim (b). Use (1) and (a). *
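For a finite chain, claim (24a) — π(i) = 1/mP(i, i) — is easy to test. The sketch below is my own illustration: π comes from the left eigenvector of P at eigenvalue 1, and the mean return time mP(i, i) from the usual first-step linear system for hitting times.

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
n = P.shape[0]

# Invariant probability: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

return_times = []
for i in range(n):
    others = [j for j in range(n) if j != i]
    P_t = P[np.ix_(others, others)]
    # h[j] = mean hitting time of i from j, solving h = 1 + P_t h.
    h = np.linalg.solve(np.eye(n - 1) - P_t, np.ones(n - 1))
    return_times.append(1.0 + P[i, others] @ h)

print(pi * np.array(return_times))   # each product equals 1
```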
At a first reading of the book, skip to Section 6.
Remember e_nP = P⁰ + ⋯ + Pⁿ, so (e_nP)(i, j) is the P_i-mean number of j's through time n, and
pMf = Σ_{i,j} p(i)M(i, j)f(j).

(25) Theorem. For any probability p on I,
lim_{n→∞} (pe_nPδ_i)/(pe_nPδ_j) = eP{j}(j, i).

PROOF. Suppose i ≠ j. Let τ(n) be the number of j's up to and including n. Let Z_m be the number of i's in the mth j-block. As in Figure 2, let A be the number of i's before the first j, and let B(n) be the number of i's after n but before the first j after n. Then the number of i's up to and including time n is
Σ_{m=1}^{τ(n)} Z_m + A − B(n).
So
(26) pe_nPδ_i = ∫ Σ_{m=1}^{τ(n)} Z_m dP_p + ∫ A dP_p − ∫ B(n) dP_p.
By blocks (1.31),
(27) Z_1, Z_2, … are P_p-independent and identically distributed.
I claim
(28) {τ(n) < m} is P_p-independent of Z_m.

Figure 2. Visits to j are shown, but not to i. The variables A, B(n), Z_1, Z_2, … are the numbers of i's in the intervals shown; the two panels illustrate the cases τ(n) = 0 and τ(n) = 2.

This follows from
(29) P_p{τ(n) = t and Z_m = z} = P_p{τ(n) = t} · P_p{Z_m = z}
for t < m. To prove (29), let σ be the time of the (t+1)st j. Then σ is Markov, and ξ_σ = j on {σ < ∞}. Moreover, {τ(n) = t} is in the pre-σ sigma field. Let ζ be the post-σ process. Then Z_m = Z_{m−t} ∘ ζ on {τ(n) = t}, as in Figure 2. By strong Markov (1.22):
P_p{τ(n) = t and Z_m = z} = P_p{τ(n) = t and ζ ∈ [Z_{m−t} = z]}
= P_p{τ(n) = t} · P_j{Z_{m−t} = z},
and
P_j{Z_{m−t} = z} = P_p{Z_m = z}.
This proves (29), and with it (28). By strong Markov (1.22*),
∫ Z_1 dP_p = eP{j}(j, i).

Use Wald (1.71):
(30a) ∫ Σ_{m=1}^{τ(n)} Z_m dP_p = ∫ τ(n) dP_p · ∫ Z_1 dP_p
= pe_nPδ_j · eP{j}(j, i).

Figure 3. The variables Ā and B̄(n) count i's in the intervals shown; in the panels, Ā = 3 and B̄(n) = 2.

As in Figure 3, let Ā be the number of i's on or after the first i but before the first j after the first i, and let B̄(n) be the number of i's on or after the first i after n, but before the first j after the first i after n. By the strong Markov property (1.22*),
∫ Ā dP_p = ∫ B̄(n) dP_p = eP{j}(i, i),
which is finite by (1.100). Plainly, A ≤ Ā and B(n) ≤ B̄(n). So
(30b) ∫ A dP_p ≤ eP{j}(i, i) and ∫ B(n) dP_p ≤ eP{j}(i, i).
Using (26, 30),
|(pe_nPδ_i)/(pe_nPδ_j) − eP{j}(j, i)| ≤ 2 eP{j}(i, i)/(pe_nPδ_j).
But pe_nPδ_j → ∞ as n → ∞, by (1.51). *
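Theorem (25) can be watched numerically on a finite chain, where eP{j}(j, i) — the mean number of i's in a j-block — equals π(i)/π(j) by (24a) and (21). A sketch of my own (the matrix and p are arbitrary choices):

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
p = np.array([0.5, 0.25, 0.25])
i, j = 2, 0

# For a positive recurrent class, eP{j}(j, i) = pi(i)/pi(j).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

enP = np.eye(3)
power = np.eye(3)
for _ in range(5000):
    power = power @ P
    enP += power

ratio = (p @ enP)[i] / (p @ enP)[j]
print(ratio, pi[i] / pi[j])   # the two agree, as (25) predicts
```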

Remember μ(i) = eP{s}(s, i); and μ(g) = Σ_i g(i)μ(i); and g × μ assigns mass g(i)μ(i) to i.

(31) Theorem. Let f be a function on I, with μ(|f|) < ∞ and μ(f) ≠ 0. Then
lim_{n→∞} (δ_ie_nPf)/(δ_je_nPf) = 1.

PROOF. The case f ≥ 0. By (16),
δ_ie_nPf = (f × μ)e_n(P̌)δ_i/μ(i) < ∞.
Use (25) with (f × μ)/μ(f) for p and P̌ for P. This is legitimate because (10) makes I a recurrent class for P̌. You learn
[(f × μ)e_n(P̌)δ_i]/[(f × μ)e_n(P̌)δ_j] → eP̌{j}(j, i).
But, I say,
eP̌{j}(j, i) = μ(i)/μ(j).
For μ(·)/μ(j) is 1 at j by inspection, and is P̌-invariant by computation. So (1) works on P̌.

The general case. Let f⁺ and f⁻ be the positive and negative parts of f, so f = f⁺ − f⁻. Let
a_n = δ_ie_nPf⁺ and b_n = δ_ie_nPf⁻
c_n = δ_je_nPf⁺ and d_n = δ_je_nPf⁻.
Then
(δ_ie_nPf)/(δ_je_nPf) = (a_n − b_n)/(c_n − d_n) = [a_n/c_n − (b_n/d_n)(d_n/c_n)]/[1 − d_n/c_n].
Use the special case to get a_n/c_n → 1 and b_n/d_n → 1. Use (15) to get
d_n/c_n → μ(f⁻)/μ(f⁺) ≠ 1. *
Let f and g be functions on I, with μ(|f|) < ∞, and μ(|g|) < ∞, and μ(g) ≠ 0. Let p and q be probabilities on I. Combining (13) and (25) gives
(32) lim_{n→∞} (pe_nPδ_i)/(qe_nPδ_j) = μ(i)/μ(j).
Combining (15) and (31) gives
(33) lim_{n→∞} (δ_ie_nPf)/(δ_je_nPg) = μ(f)/μ(g).
Of course, (33) can be obtained from (32) by reversal. It is tempting to combine (32) and (33). However, according to (Krengel, 1966), there are probabilities p and q bounded above setwise by a multiple of μ, and a set A ⊂ I with μ(A) < ∞, such that (pe_nP1_A)/(qe_nP1_A) fails to converge. The same paper contains this analog of (31):

(34) Theorem. If g is nonnegative, positive somewhere, and bounded, then
lim_{n→∞} (δ_ie_nPg)/(δ_je_nPg) = 1.

PROOF. Suppose i ≠ j. Let T be the least t if any with ξ_t = j, and T = ∞ if none. I claim
(35) δ_ie_nPg ≥ Σ_{t=1}^{n} P_i{T = t} · δ_je_{n−t}Pg.
Indeed,
δ_ie_nPg = ∫ Σ_{m=0}^{n} g(ξ_m) dP_i
≥ ∫_{{T≤n}} Σ_{m=0}^{n} g(ξ_m) dP_i
= Σ_{t=1}^{n} ∫_{{T=t}} Σ_{m=0}^{n} g(ξ_m) dP_i
≥ Σ_{t=1}^{n} P_i{T = t} · ∫ Σ_{m=0}^{n−t} g(ξ_m) dP_j
by Markov (1.15*). This proves (35).
Abbreviate ‖g‖ = sup_k g(k). Divide both sides of (35) by δ_je_nPg. For each t, the ratio (δ_je_{n−t}Pg)/(δ_je_nPg) is at most 1, and tends to 1 as n increases, because
0 ≤ δ_je_nPg − δ_je_{n−t}Pg ≤ t·‖g‖
and δ_je_nPg → ∞ as n → ∞. By dominated convergence,
lim inf_{n→∞} (δ_ie_nPg)/(δ_je_nPg) ≥ 1.
Interchange i and j. *

5. RESTRICTING THE RANGE

The computations in this section were suggested by related work of Farrell; the basic idea goes back to Kakutani (1943). Let I_n be the set of functions ω from {0, …, n} to I. For ω ∈ I_n and 0 ≤ m ≤ n, let ξ_m(ω) = ω(m). Let Ω consist of I^∞, all the I_n, and the empty sequence ∅. So ξ_m is partially defined on Ω:
domain ξ_m = I^∞ ∪ [∪_{n≥m} I_n].
Give Ω the smallest σ-field ℱ over which all ξ_m are measurable. Let 𝒫 be a finite or infinite measure on ℱ, such that
𝒫{ξ_n = i} < ∞ for all n and i.
Say 𝒫 is subinvariant iff for all n, and all I-sequences i_0, …, i_n,
(36) 𝒫{ξ_m = i_m for 0 ≤ m ≤ n} ≥ 𝒫{ξ_{m+1} = i_m for 0 ≤ m ≤ n}.
Say 𝒫 is invariant iff equality always holds in (36). Let Q be a substochastic matrix on I. Say 𝒫 is Markov with stationary transitions Q iff for all n, and all I-sequences i_0, …, i_n, i_{n+1},
(37) 𝒫{ξ_m = i_m for 0 ≤ m ≤ n + 1} = 𝒫{ξ_m = i_m for 0 ≤ m ≤ n} · Q(i_n, i_{n+1}).
Let (𝒳, 𝒜, λ) be an abstract measure space. Let Y_0, Y_1, … be partially defined functions from 𝒳 to I. Call Y = (Y_0, Y_1, …) a partially defined I-process iff domain Y_{n+1} ⊂ domain Y_n, and {Y_n = i} ∈ 𝒜 has finite λ-measure for all n and i. Then Y is a measurable mapping from 𝒳 to Ω:
[Y(x)](n) = Y_n(x) if x ∈ domain Y_n.
The distribution of Y is λY⁻¹. Say Y is subinvariant, invariant, or Markov with stationary transitions Q iff its distribution has the corresponding properties.
Let J be a subset of I. Define Markov times τ_m on Ω as follows: τ_0 = 0; and τ_{m+1} is the least n > τ_m if any with ξ_n ∈ J, while τ_{m+1} = ∞ if none. Let Ω/J = {ξ_0 ∈ J}. Let ℱ/J be ℱ relativized to Ω/J. Let 𝒫/J be the retract of 𝒫 to (Ω/J, ℱ/J). Let ξ/J be this partially defined I-process on (Ω/J, ℱ/J):
(ξ/J)_n = ξ_{τ_n} provided τ_n < ∞.

(38) Theorem. Suppose 𝒫 is subinvariant. Then ξ/J is subinvariant.

PROOF. Let i_0 ∈ J, and let i_1, …, i_N ∈ I. Abbreviate τ for τ_1. I claim
(39) 𝒫{ξ_n = i_n for 0 ≤ n ≤ N} ≥ 𝒫{ξ_0 ∈ J and τ < ∞ and ξ_{τ+n} = i_n for 0 ≤ n ≤ N}.
Indeed, let:
A_m = {ξ_0 ∈ J and ξ_t ∉ J for 1 ≤ t ≤ m − 1 and ξ_{m+n} = i_n for 0 ≤ n ≤ N};
B_m = {ξ_t ∉ J for 1 ≤ t ≤ m − 1 and ξ_{m+n} = i_n for 0 ≤ n ≤ N};
C_m = {ξ_t ∉ J for 0 ≤ t ≤ m − 1 and ξ_{m+n} = i_n for 0 ≤ n ≤ N}.
By inspection, C_m ⊂ B_m. By subinvariance,
(40) 𝒫(B_m) ≤ 𝒫(C_{m−1}).
So, 𝒫(C_m) ≤ 𝒫(B_m) ≤ 𝒫(C_{m−1}). Consequently,
(41) 𝒫(C_m) nonincreases to a nonnegative limit as m increases.
The A_m are pairwise disjoint, their union is the set on the right of (39), and A_m = B_m \ C_m. So the right side of (39) is
Σ_{m=1}^{∞} 𝒫(A_m) = Σ_{m=1}^{∞} [𝒫(B_m) − 𝒫(C_m)]
≤ Σ_{m=1}^{∞} [𝒫(C_{m−1}) − 𝒫(C_m)] by (40)
= 𝒫(C_0) − lim_m 𝒫(C_m) by (41) and telescoping
≤ 𝒫(C_0).
But C_0 is the set on the left of (39); this completes (39).

Let ζ be the post-τ process; that is, ζ maps {τ < ∞} into Ω by
[ζ(ω)](n) = ξ_{τ(ω)+n}(ω) for ω ∈ domain ξ_{τ(ω)+n}.
Let j ∈ J and A ∈ ℱ with A ⊂ {ξ_0 = j}. Then (39) makes
𝒫{A} ≥ 𝒫{ξ_0 ∈ J and τ < ∞ and ζ ∈ A}.
Let j_0, …, j_n ∈ J. Specialize j = j_0 and
A = {ξ_0 ∈ J and (ξ/J)_m = j_m for 0 ≤ m ≤ n}.
Then
{ξ_0 ∈ J and τ < ∞ and ζ ∈ A} = {ξ_0 ∈ J and (ξ/J)_{m+1} = j_m for 0 ≤ m ≤ n}.
This gives (36) with 𝒫/J for 𝒫 and ξ/J for ξ. *

(42) Theorem. If 𝒫 is Markov with stationary substochastic transitions, then ξ/J is Markov with stationary substochastic transitions relative to 𝒫/J.

If 𝒫 has transitions Q, write Q/J for the transitions of ξ/J. Even if Q is stochastic, Q/J need not be.

PROOF. Let Q be the transitions of 𝒫. Remember that the probability Q_i on Ω makes ξ Markov with stationary transitions Q and starting state i. Let
R(i, j) = Q_i{τ_1 < ∞ and ξ_{τ_1} = j}
for i and j in J. Let i_0, …, i_n, i_{n+1} ∈ J. I say
𝒫{ξ_0 ∈ J and (ξ/J)_m = i_m for 0 ≤ m ≤ n + 1}
= 𝒫{ξ_0 ∈ J and (ξ/J)_m = i_m for 0 ≤ m ≤ n} · R(i_n, i_{n+1}).
Indeed, 𝒫 agrees with 𝒫{ξ_0 = i_0} · Q_{i_0} on events that entail ξ_0 = i_0, so it is enough to do it with Q_{i_0} in place of 𝒫. After this reduction, use strong Markov (1.22) on the Markov time τ_n. If m ≤ n, then (ξ/J)_m is measurable on the pre-τ_n sigma-field. *
(43) Lemma. Let I = {1, 2}. Let Q be a stochastic matrix on I, for which I is a communicating class. There is one and only one subinvariant measure, up to scalar multiplication, and it is necessarily invariant.

PROOF. A locally finite, subinvariant measure has finite total mass, so is invariant. Let u = [u(1), u(2)] be an invariant measure. Since Q(2, 1) > 0, the equation uQ = u implies
u(2) = Q(1, 2) Q(2, 1)⁻¹ u(1). *

With these preliminaries out of the way, it is possible to give an

ALTERNATIVE PROOF OF (1). Remember that P is a stochastic matrix on I, for which I is a recurrent class. Let u be a locally finite subinvariant measure on I. I have to show u is invariant and unique. Define a measure 𝒫 on Ω by:
𝒫 = Σ_{i∈I} u(i)P_i.
Check that 𝒫 is subinvariant and Markov with stationary transitions P. Let J ⊂ I. Relative to 𝒫/J, the process ξ/J is subinvariant and Markov with stationary transitions P/J by (38, 42); its starting measure is the retract of u to J. I claim the retract of u to J is subinvariant for P/J. Indeed, if j ∈ J, then
u(j) = 𝒫{ξ_0 = j}
= 𝒫{ξ_0 ∈ J and (ξ/J)_0 = j}
≥ 𝒫{ξ_0 ∈ J and (ξ/J)_1 = j}
= Σ_{i∈J} 𝒫{ξ_0 ∈ J and (ξ/J)_0 = i and (ξ/J)_1 = j}
= Σ_{i∈J} 𝒫{ξ_0 ∈ J and (ξ/J)_0 = i} · (P/J)(i, j)
= Σ_{i∈J} 𝒫{ξ_0 = i} · (P/J)(i, j)
= Σ_{i∈J} u(i)(P/J)(i, j).
Each j ∈ J is recurrent, so ξ/J is defined almost everywhere, and
(44) P/J is stochastic.
Because I is a communicating class for P, therefore J is a communicating class for P/J. If J = {i, j}, the ratio u(i)/u(j) can be computed from P/J using (43), so in principle from P. So there is only one locally finite subinvariant measure, up to scalar multiplication.

Why is it invariant? Suppose
u(j) > Σ_{i∈I} u(i)P(i, j);
so
(45) 𝒫{ξ_0 = j} > 𝒫{ξ_1 = j}.
Let J = {j}. I say
u(j) = 𝒫{ξ_0 = j}
= 𝒫{ξ_0 ∈ J and (ξ/J)_0 = j}
> 𝒫{ξ_0 ∈ J and (ξ/J)_1 = j}
= 𝒫{ξ_0 ∈ J and (ξ/J)_0 = j and (ξ/J)_1 = j}
= 𝒫{ξ_0 ∈ J and (ξ/J)_0 = j} · (P/J)(j, j)
= 𝒫{ξ_0 = j} · (P/J)(j, j)
= u(j).
To get line 3, argue as for (39): put N = 0 and i_0 = j, so B_1 = {ξ_1 = j} and C_0 = {ξ_0 = j}, and 𝒫(B_1) < 𝒫(C_0) by (45). To get line 4, remember (ξ/J)_0 ∈ J = {j}. To get line 5, use (42). For the last line, (44) makes P/J stochastic on J = {j}. The contradiction u(j) > u(j) forces u to be invariant. *
EXAMPLE. Let I = {1, 2, 3}. Define the stochastic matrix P on I by
P(1, 2) = P(2, 3) = P(3, 1) = 1;
all other entries in P vanish. Let p be the invariant probability:
p(1) = p(2) = p(3) = 1/3.
Let τ_0, τ_1, … be the times n with ξ_n = 1 or 2. Then {ξ_{τ_n}: n = 0, 1, …} is not P_p-invariant:
P_p{ξ_{τ_0} = 1} = 2/3, while P_p{ξ_{τ_1} = 1} = 1/3.
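The oscillation in this example is simple to compute: ξ_{τ_0} = 1 when ξ_0 is 1 or 3 (state 3 steps to 1 before reaching 2), so under P_p the time-τ_0 distribution on J = {1, 2} is (2/3, 1/3), and the restricted chain then alternates the two states deterministically. A sketch of my own:

```python
import numpy as np

# Transitions of the restricted process on J = {1, 2}:
# 1 -> 2 directly, and 2 -> 3 -> 1, so the embedded chain swaps states.
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# P_p{xi_{tau_0} = 1} = p(1) + p(3) = 2/3, P_p{xi_{tau_0} = 2} = 1/3.
d = np.array([2.0 / 3.0, 1.0 / 3.0])
for k in range(4):
    print(k, d)        # alternates (2/3, 1/3), (1/3, 2/3), ...
    d = d @ R
```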

A digression on ergodic theory

The technique used in proving (39) gives a quick proof of a theorem of Kac (1947). To state it, let Y_0, Y_1, … be an invariant process of 0's and 1's on a probability triple (Ω, ℱ, 𝒫). Let T be the least positive n if any with Y_n = 1, and T = ∞ if none.

(46) Theorem. ∫_{{Y_0 = 1}} T d𝒫 = 𝒫{Y_n = 1 for some n ≥ 0}.

PROOF. The left side of (46) is Σ_{n=1}^{∞} 𝒫{Y_0 = 1 and T ≥ n}. But
𝒫{Y_0 = 1 and T ≥ 1} = 𝒫{Y_0 = 1}.
For n ≥ 2,
𝒫{Y_0 = 1 and T ≥ n}
= 𝒫{Y_0 = 1 and Y_1 = ⋯ = Y_{n−1} = 0}
= 𝒫{Y_1 = ⋯ = Y_{n−1} = 0} − 𝒫{Y_0 = ⋯ = Y_{n−1} = 0}
= 𝒫{Y_0 = ⋯ = Y_{n−2} = 0} − 𝒫{Y_0 = ⋯ = Y_{n−1} = 0}.
So, Σ_{n=1}^{∞} 𝒫{Y_0 = 1 and T ≥ n} telescopes to
𝒫{Y_0 = 1} + 𝒫{Y_0 = 0} − lim_{n→∞} 𝒫{Y_0 = ⋯ = Y_n = 0},
which is the same as
1 − 𝒫{Y_n = 0 for all n ≥ 0} = 𝒫{Y_n = 1 for some n ≥ 0}. *
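For a stationary Markov chain with Y_n = 1 when ξ_n = i, the left side of (46) is Σ_n 𝒫{Y_0 = 1 and T ≥ n}, and each term can be computed from the taboo part of P. The sketch below is my own illustration (the chain is an arbitrary irreducible one); the sum comes out to 1, which is the right side for an irreducible chain.

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
i = 1

w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

others = [j for j in range(3) if j != i]
P_t = P[np.ix_(others, others)]

# Sum over n of P{Y_0 = 1 and T >= n}; the n = 1 term is pi(i).
lhs = pi[i]
avoid = P[i, others]            # law of xi_1 on paths that avoid i
for _ in range(2000):
    lhs += pi[i] * avoid.sum()  # term P{Y_0 = 1 and T >= n}
    avoid = avoid @ P_t

print(lhs)                      # Kac: the sum is 1
```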



6. PROOF OF KINGMAN-OREY

I learned the proof of (3) from Don Ornstein; here are the preliminaries. Let
f(p, a) = p^a (1 − p)^{1−a}
for p and a in [0, 1], except for p = a = 0 and p = a = 1.

(47) Lemma. Fix a with 0 < a < 1. As p runs through the closed interval [0, 1], the function p → f(p, a) has a strict maximum at p = a.
PROOF. Calculus. *
Let
m(p, a) = f(p, a)/f(a, a).

(48) Lemma. Let 0 < p < a < 1. Let Y_1, Y_2, … be independent and identically distributed random variables on the probability triple (Ω, ℱ, 𝒫), each taking the value 1 with probability p, and the value 0 with probability 1 − p.
(a) 𝒫{Y_1 + ⋯ + Y_n ≥ na} ≤ m(p, a)ⁿ.
(b) m(p, a) < 1.

PROOF. Claim (a). Abbreviate Y = Y_1 and S = Y_1 + ⋯ + Y_n. Write E for expectation relative to 𝒫. Let 1 < x < ∞. Then S ≥ na iff x^S ≥ x^{na}. By Chebyshev, this event has probability at most
x^{−na} E(x^S) = [x^{−a} E(x^Y)]ⁿ.
Compute
E(x^Y) = 1 − p + px.
By calculus, the minimum of x → x^{−a}(1 − p + px) occurs at
x = a(1 − p)/[p(1 − a)] > 1
and is m(p, a). Use this x.
Claim (b). Use (47). *

(49) Lemma. Let 0 ≤ f ≤ ∞ be a subadditive function on {1, 2, …}, which is finite on {A, A + 1, …} for some A. Let
α = inf_{n≥1} f(n)/n.
Then
lim_{n→∞} f(n)/n = α.
Subadditive means f(a + b) ≤ f(a) + f(b).

PROOF. Fix δ > 0. I will argue
lim sup_{n→∞} f(n)/n ≤ α + δ.
To begin, choose a positive integer a with
f(a)/a ≤ α + δ.
Since f(ka) ≤ kf(a), I can choose a ≥ A. Let β = max {f(n): a ≤ n < 2a}. Let n ≥ 2a; then
n = ma + b = (m − 1)a + (a + b)
for some positive integer m and an integer b with 0 ≤ b < a. Of course, m and b depend on n. So
f(n) ≤ (m − 1)f(a) + f(a + b);
f(n)/n ≤ [(m − 1)a/n] · [f(a)/a] + f(a + b)/n
≤ [(m − 1)a/n](α + δ) + β/n.
But (m − 1)a/n ≤ 1 and β/n → 0 as n → ∞. *

(50) Lemma. If i has period 1, then [Pⁿ(i, i)]^{1/n} converges as n → ∞.
PROOF. Use (49) on f(n) = −log Pⁿ(i, i). *

(51) Lemma. Suppose I is a communicating class of period 1. There is an L with 0 ≤ L ≤ 1, such that lim_{n→∞} [Pⁿ(i, j)]^{1/n} = L for all i and j in I.
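The subadditivity behind (50) — Pⁿ⁺ᵐ(i, i) ≥ Pⁿ(i, i)Pᵐ(i, i), so that f(n) = −log Pⁿ(i, i) is subadditive — and the conclusion L = 1 of (54) below can both be seen on a finite recurrent chain. A sketch of my own:

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
i = 0

# P^{a+b}(i,i) >= P^a(i,i) * P^b(i,i): -log P^n(i,i) is subadditive.
powers = [np.linalg.matrix_power(P, n) for n in range(11)]
for a in range(1, 6):
    for b in range(1, 6):
        assert powers[a + b][i, i] >= powers[a][i, i] * powers[b][i, i] - 1e-12

# For a recurrent class of period 1, [P^n(i,i)]^{1/n} -> 1.
root = np.linalg.matrix_power(P, 400)[i, i] ** (1.0 / 400)
print(root)
```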
PROOF. Let k > 0. Then
(52) k^{1/n} → 1 as n → ∞.
Let f(n)^{1/n} → L as n → ∞, and let c be an integer. Then
(53) f(n + c)^{1/n} → L as n → ∞.
As (50) implies, lim_{n→∞} [Pⁿ(i, i)]^{1/n} = L(i) exists for all i. Fix i ≠ j. Choose a and b with
P^a(j, i) > 0 and P^b(i, j) > 0.
Then
Pⁿ(j, j) ≥ P^a(j, i) · P^{n−a−b}(i, i) · P^b(i, j).
Take nth roots and use (52–53) to get L(j) ≥ L(i). Interchange i and j to get L(i) = L(j) = L, say. Abbreviate g(n) = [Pⁿ(i, j)]^{1/n}. I say g(n) → L. Indeed,
Pⁿ(i, j) · P^a(j, i) ≤ P^{n+a}(i, i)
Pⁿ(i, j) ≥ P^b(i, j) · P^{n−b}(j, j).
Take nth roots and use (52–53):
lim sup g(n) ≤ L(i) = L
lim inf g(n) ≥ L(j) = L. *

(54) Lemma. If I is a recurrent class of period 1, then L = 1.
PROOF. If L < 1, then Σ_n Pⁿ(i, i) < ∞. Use (1.51). *

NOTE. In some transient classes, L = 1.

For the balance of this section, suppose I is a recurrent class of period 1. The next two lemmas prove the case N = 1 of theorem (3). Remember that δ_j is 1 at j and 0 elsewhere. Remember
pMf = Σ_{i,j} p(i)M(i, j)f(j).

(55) Lemma. Suppose P(i, i) > ε > 0 for all i ∈ I. Let p be a probability on I. Then
lim_{n→∞} (pP^{n+1}δ_j)/(pPⁿδ_j) = 1.

PROOF. Suppose ε < 1. Let P* be this stochastic matrix on I:
P*(i, j) = P(i, j)/(1 − ε) for i ≠ j
= [P(i, i) − ε]/(1 − ε) for i = j.
Then P = (1 − ε)P* + εΔ, where Δ is the identity matrix. Make the usual convention that P*⁰ = Δ. Let
s_m = C(n + 1, m)(1 − ε)^m ε^{n+1−m} pP*^m δ_j
t_m = C(n, m)(1 − ε)^m ε^{n−m} pP*^m δ_j,
where C(n, m) is the binomial coefficient. So
pP^{n+1}δ_j = Σ_{m=0}^{n+1} s_m and pPⁿδ_j = Σ_{m=0}^{n} t_m.
Of course, s_m and t_m depend on n.

Fix a positive ε′ much smaller than ε. Let
M = {m: m = 0, …, n + 1 and 1 − ε − ε′ < m/(n + 1) < 1 − ε + ε′}
a_n = Σ_m {s_m: m ∈ M}
b_n = Σ_m {s_m: m = 0, …, n + 1 but m ∉ M}
c_n = Σ_m {t_m: m ∈ M}
d_n = Σ_m {t_m: m = 0, …, n but m ∉ M}.
Check that I is still a communicating class of period 1 for P*. Use (1.47) to create a positive integer n* such that:
n > n* implies pP*ⁿδ_j > 0.
Keep n so large that
(56a) (n + 1)(1 − ε − ε′) > n*
(56b) (n + 1)(1 − ε + ε′) < n
(56c) n − (n + 1)(1 − ε − ε′) > n(ε + ½ε′).
In particular, m ∈ M makes m < n and pP*^m δ_j > 0. So
(57) a_n > 0 and c_n > 0.
By algebra,
s_m/t_m = (n + 1)ε/(n + 1 − m),
so
ε/(ε + ε′) < s_m/t_m < ε/(ε − ε′), for m ∈ M.
By Cauchy's inequality,
(58) ε/(ε + ε′) < a_n/c_n < ε/(ε − ε′).
Let Y_1, Y_2, … be independent and identically distributed random variables on (Ω, ℱ, 𝒫), taking the value 1 with probability 1 − ε and the value 0 with probability ε. Let Ȳ_m = 1 − Y_m. Because P* is stochastic,
0 ≤ pP*^m δ_j ≤ 1.
So b_n and d_n are nonnegative. Furthermore,
b_n ≤ Σ_m {C(n + 1, m)(1 − ε)^m ε^{n+1−m}: 0 ≤ m ≤ n + 1 but m ∉ M}
= 𝒫{Y_1 + ⋯ + Y_{n+1} ≥ (n + 1)(1 − ε + ε′)}
+ 𝒫{Y_1 + ⋯ + Y_{n+1} ≤ (n + 1)(1 − ε − ε′)}
= 𝒫{Y_1 + ⋯ + Y_{n+1} ≥ (n + 1)(1 − ε + ε′)}
+ 𝒫{Ȳ_1 + ⋯ + Ȳ_{n+1} ≥ (n + 1)(ε + ε′)}.
Similarly,
d_n ≤ 𝒫{Y_1 + ⋯ + Y_n ≥ (n + 1)(1 − ε + ε′)}
+ 𝒫{Ȳ_1 + ⋯ + Ȳ_n ≥ n − (n + 1)(1 − ε − ε′)}
≤ 𝒫{Y_1 + ⋯ + Y_n ≥ n(1 − ε + ε′)}
+ 𝒫{Ȳ_1 + ⋯ + Ȳ_n ≥ n(ε + ½ε′)}
by (56). So (48) produces an r = r(ε, ε′) < 1 such that
(59) b_n ≤ rⁿ and d_n ≤ rⁿ.
But (51, 54, 57) make lim inf (a_n + b_n)^{1/n} ≥ 1, so rⁿ = o(a_n + b_n); and b_n ≤ rⁿ by (59), so b_n = o(a_n + b_n), forcing b_n = o(a_n). Similarly, d_n = o(c_n). Therefore,
lim inf (a_n + b_n)/(c_n + d_n) = lim inf a_n/c_n and lim sup (a_n + b_n)/(c_n + d_n) = lim sup a_n/c_n.
By (57, 58), the lim inf and lim sup of (pP^{n+1}δ_j)/(pPⁿδ_j) are both trapped between ε/(ε + ε′) and ε/(ε − ε′); these bounds are close to 1. *
(60) Lemma. Suppose P(i, i) > ε > 0 for all i ∈ I. For any probability p on I,
lim_{n→∞} (pPⁿδ_j)/(pPⁿδ_i) = eP{i}(i, j).

PROOF. For any subsequence, use the diagonal argument (10.56) to find a sub-subsequence n′ such that (pP^{n′}δ_j)/(pP^{n′}δ_i) converges as n′ → ∞, say to ρ(j), for all j ∈ I. Here 0 ≤ ρ(j) ≤ ∞, and ρ(i) = 1. My problem is to show ρ(j) = eP{i}(i, j). As (1.47) shows, pPⁿδ_i > 0 for all large n. Make the convention ∞ · 0 = 0, as required by (1.80), and estimate as follows:
Σ_j ρ(j)P(j, k) = Σ_j lim_{n′→∞} [pP^{n′}δ_j/pP^{n′}δ_i] · P(j, k)  (definition)
= Σ_j lim_{n′→∞} [pP^{n′}δ_j · P(j, k)/pP^{n′}δ_i]  (convention)
≤ lim_{n′→∞} Σ_j [pP^{n′}δ_j · P(j, k)/pP^{n′}δ_i]  (Fatou)
= lim_{n′→∞} [pP^{n′+1}δ_k/pP^{n′}δ_i]  (algebra)
= lim_{n′→∞} [pP^{n′}δ_k/pP^{n′}δ_i]  (55)
= ρ(k)  (definition).
So, ρ is subinvariant. Use (1). *

I will now work on the case N > 1 in theorem (3). Suppose N is an integer greater than 1. Suppose I is a recurrent class of period 1 relative to P, and
P^N(i, i) > ε > 0 for all i ∈ I.
Let p be any probability on I.

(61) Lemma. lim_{n→∞} (pP^{n+N}δ_j)/(pPⁿδ_j) = 1.

PROOF. Check that I is a recurrent class of period 1 for P^N. Let r be one of 0, …, N − 1. Use (55) with P^N for P and pP^r for p, to see that
lim_{m→∞} (pP^{(m+1)N+r}δ_j)/(pP^{mN+r}δ_j) = 1. *

(62) Lemma. lim_{n→∞} (pPⁿδ_j)/(pPⁿδ_i) = eP{i}(i, j).

PROOF. As in (60), choose a subsequence n′ such that (pP^{n′}δ_j)/(pP^{n′}δ_i) converges, say to μ(j), for all j. Using (61), argue that μ is subinvariant with respect to P^N. Now eP{i}(i, ·) is invariant with respect to P, so with respect to P^N. The uniqueness part of (1), applied to P^N, identifies μ(j) with eP{i}(i, j). *

(63) Lemma. lim_{n→∞} Pⁿ(k, j)/Pⁿ(j, j) = 1.
PROOF. Reverse (62), as in (15). *

(64) Lemma. lim_{n→∞} P^{n+1}(j, j)/Pⁿ(j, j) = 1.

PROOF. Let r be one of 1, …, N − 1. By algebra,
P^{n+r}(j, j)/Pⁿ(j, j) = Σ_k P^r(j, k) · Pⁿ(k, j)/Pⁿ(j, j).
By (63) and Fatou,
(65) lim inf_{n→∞} P^{n+r}(j, j)/Pⁿ(j, j) ≥ 1.
Use (61) to replace n in the denominator by n + N without changing the lim inf:
lim inf_{n→∞} P^{n+r}(j, j)/P^{n+N}(j, j) ≥ 1.
Put m = n + r:
lim inf_{m→∞} P^m(j, j)/P^{m+N−r}(j, j) ≥ 1.
Invert:
(66) lim sup_{m→∞} P^{m+N−r}(j, j)/P^m(j, j) ≤ 1
for r = 1, …, N − 1. Put r = 1 in (65) and r = N − 1 in (66). *

Theorem (3) follows by algebra from (62–64).
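For a finite chain the conclusions (63) and (64) — and with them theorem (3) — can be observed directly, since matrix powers are cheap to compute. A sketch of my own (the chain is an arbitrary irreducible aperiodic one):

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
j, k = 0, 2

Pn = np.linalg.matrix_power(P, 300)
Pn1 = Pn @ P
print(Pn1[j, j] / Pn[j, j])   # (64): P^{n+1}(j,j)/P^n(j,j) near 1
print(Pn[k, j] / Pn[j, j])    # (63): P^n(k,j)/P^n(j,j) near 1
```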
7. AN EXAMPLE OF DYSON

In this section, I will follow Dyson's lead in constructing a countable set I, a state s ∈ I, and a stochastic matrix P on I, such that:

(67a) I is a recurrent class of period 1 for P;
(67b) P(i, i) > 0 for all i in I;
(67c) lim inf_{n→∞} P^{n+1}(s, s)/Pⁿ(s, s) ≤ 2P(s, s);
(67d) lim sup_{n→∞} P^{n+1}(s, s)/Pⁿ(s, s) = ∞.

NOTE. P^{n+1}(s, s)/Pⁿ(s, s) ≥ P(s, s) for any P.

The construction for (67) has three parameters. The first parameter p is a positive function on the positive integers, with Σ_{n=1}^{∞} p(n) = 1. There are no other strings on p: you choose it now. The second parameter f is a function from the positive integers to the positive integers, with f(1) = 1. For n = 2, 3, …, I will require
(68) f(n) ≥ f(n − 1) + 2.
I get to choose f inductively; it will increase quickly. The third parameter q is a function from {2, 3, …} to (0, 1). You can pick it, subject to
(69) q(N)^{f(N)} > 1 − p(n)/n for n = 1, …, N.
The state space I consists of all pairs [n, m] of positive integers with 1 ≤ m ≤ f(n). The special state s is [1, 1]. Here comes P. Let
P(s, j) = 0 unless j = [n, 1] for some n;
P(s, [n, 1]) = p(n) for n = 1, 2, ….
For n > 1 and 1 ≤ m < f(n), let:
P([n, m], j) = 0 unless j = [n, m] or [n, m + 1];
P([n, m], [n, m]) = 1 − q(n);
P([n, m], [n, m + 1]) = q(n).
Let P([n, f(n)], j) = 0 unless j = [n, f(n)] or s;
P([n, f(n)], [n, f(n)]) = 1 − q(n);
P([n, f(n)], s) = q(n).
You check (67a–b), using (1.16). Abbreviate θ(n) = Pⁿ(s, s). For my f and n ≥ 2, it will develop that
(70a) θ[f(n)] < 2p(n)/n
(70b) (1 − p(n)/n)p(n) < θ[f(n) + 1] < (1 + 2/n)p(n)
(70c) 2(1 − p(n)/n)p(n)P(s, s) < θ[f(n) + 2] < 2p(n)[P(s, s) + 1/n].
You can get (67c–d) from (70).
Remember that I^∞ is the space of I-sequences. And ξ_t(ω) = ω(t) for ω ∈ I^∞ and t = 0, 1, …. Let α(t, ω) be the first component of ω(t), and let β(t, ω) be the second component, so
ξ_t(ω) = [α(t, ω), β(t, ω)].
Let τ_n(ω) be the least t if any with α(t, ω) ≥ n, and let τ_n(ω) = ∞ if none. The probability P_s on I^∞ makes ξ Markov with stationary transitions P and starting state s. For n = 2, 3, …, I will require
(71) P_s{τ_n ≤ f(n)} > 1 − p(n)/n.
Let ∂ ∉ I. Let ζ_t^n = ξ_t for t < τ_n, and ζ_t^n = ∂ for t ≥ τ_n. Let N ≥ 2. Suppose f(n) and q(n) have been chosen for n < N, so that (68, 69, 71) hold. I know this doesn't determine P, but the P_s-distribution of ζ^N is already fixed: ζ^N is Markov with stationary transitions and a finite state space; every state leads to the absorbing state ∂. I can use (1.79) to choose f(N) so large that (68, 71) hold for N. Now you choose q(N) so (69) holds for N. The construction is finished.

I must now argue (70). Fix n ≥ 2. Suppose τ_n(ω) < ∞. Abbreviate
α(ω) = α[τ_n(ω), ω].
Now α(ω) ≥ n. By (68),
(72a) τ_n(ω) < ∞ implies f[α(ω)] ≥ f(n)
(72b) τ_n(ω) < ∞ and α(ω) > n imply f[α(ω)] ≥ f(n) + 2.
Let G be the set of ω such that
(73a) ω(0) = s
(73b) τ_n(ω) ≤ f(n)
(73c) ω(t) = [α(ω), t − τ_n(ω) + 1] for τ_n(ω) ≤ t ≤ τ_n(ω) + f[α(ω)] − 1
(73d) ω(t) = s for t = τ_n(ω) + f[α(ω)].

Relations (73a–b) force

(73e) τ_n(ω) ≥ 1.

Check that τ_n is Markov. Remember α ≥ n. By strong Markov (1.21) and (69),
P_s{(73c) and (73d) hold ∣ ξ_0, …, ξ_{τ_n}} = q[α]^{f[α]} > 1 − p(n)/n
almost surely on {τ_n ≤ f(n)}. By (71),

(74) P_s{G} > [1 − p(n)/n]² > 1 − 2p(n)/n.

ARGUMENT FOR (70a). Suppose ω ∈ G. By (72a) and (73e),
τ_n(ω) + f[α(ω)] − 1 ≥ f(n).
And (73c) prevents ω[f(n)] = s. That is,
{ξ_{f(n)} = s} ⊂ I^∞\G.
Use (74):
θ[f(n)] ≤ 1 − P_s(G) < 2p(n)/n.
ARGUMENT FOR (70b). Plainly,

{ξ_m = [n, m] for m = 1, …, f(n) and ξ_{f(n)+1} = s} ⊂ {ξ_{f(n)+1} = s}.

So (69) does the first inequality in (70b). For the second, suppose ω ∈ G. Suppose τ_n(ω) = 1 and ξ_1(ω) ≠ [n, 1]. Then (73c) makes α(ω) > n, and (72b) makes
(75) τ_n(ω) + f[α(ω)] − 1 ≥ f(n) + 1;
now (73c) prevents ω[f(n) + 1] = s. Suppose τ_n(ω) ≥ 2. Then (72a) establishes (75), and (73c) still prevents ω[f(n) + 1] = s. That is,
G ∩ {ξ_{f(n)+1} = s} ⊂ {ξ_1 = [n, 1]}.
Use (74):
θ[f(n) + 1] ≤ P_s{ξ_1 = [n, 1]} + P_s{I^∞\G}
< p(n)(1 + 2/n).

ARGUMENT FOR (70c). Let A be the event that

ξ_m = [n, m] for m = 1, …, f(n) and ξ_{f(n)+1} = ξ_{f(n)+2} = s,

and let B be the event that

ξ_1 = s and ξ_m = [n, m − 1] for m = 2, …, f(n) + 1 and ξ_{f(n)+2} = s.

Use (69):

P_s(A) = P_s(B) = q(n)^{f(n)} p(n)P(s, s) > (1 − p(n)/n)p(n)P(s, s).

But A and B are disjoint, and A ∪ B ⊂ {ξ_{f(n)+2} = s}. This proves the first inequality in (70c). For the second, suppose ω ∈ G. Suppose τ_n(ω) = 1 and ξ_1(ω) ≠ [n, 1], or τ_n(ω) = 2 and ξ_2(ω) ≠ [n, 1]. Then (73c) makes α(ω) > n, and (72b) makes f[α(ω)] ≥ f(n) + 2. So
(76) τ_n(ω) + f[α(ω)] − 1 ≥ f(n) + 2.
Now (73c) prevents ω[f(n) + 2] = s. Suppose τ_n(ω) ≥ 3. Then (72a) establishes (76), and (73c) still prevents ω[f(n) + 2] = s. If ω ∈ G and ξ_1(ω) = [n, 1], then ω ∈ A; this uses (73d). So

G ∩ {ξ_{f(n)+2} = s} ⊂ A ∪ {ξ_2 = [n, 1]}.

And
θ[f(n) + 2] ≤ P_s(A) + P_s{ξ_2 = [n, 1]} + 1 − P_s(G).
Check
(77) P_s{ξ_1 ≠ s and ξ_2 = [n, 1]} = 0.
Use (74) and (77):
P_s(A) < p(n)P(s, s)
P_s{ξ_2 = [n, 1]} = P_s{ξ_1 = s and ξ_2 = [n, 1]} = p(n)P(s, s)
1 − P_s(G) < 2p(n)/n.
Add to get the second inequality in (70c). *

I think you can modify P to keep (67a, c, d), but strengthen (67b) to P(i, j) > 0 for all i, j.

8. ALMOST EVERYWHERE RATIO LIMIT THEOREMS

Let I be a recurrent class relative to the stochastic matrix P. Let fl be an


invariant measure. I remind you that

fl(h) = ~iEl fl(i)h(i).


The next result is due to Harris and Robbins (1953).
74 RATIO LIMIT THEOREMS [2

(78) Theorem. Suppose f and g are functions on I, with ,uGfl) < 00 and
,u(lgl) < co. Suppose at least one of ,u(f) and ,u(g) is nonzero. Fix s E I. With
P.-probability 1,
·
tImn-+oo ~;:.~o f(~m) _ ,u(f)
-.
~;:.~o g(~m) ,u(g)
PROOF. Suppose μ(s) = 1 and μ(g) ≠ 0. Using (1.73), confine ω to the
set where ξ_0 = s and ξ_n = s for infinitely many n, which has P_s-probability
1. Let 0 = τ_1 < τ_2 < ⋯ be the times n with ξ_n = s. Let l(n) be the largest m
with τ_m ≤ n, so l(n) → ∞ with n. Let h be a function on I with μ(|h|) < ∞:
the interesting h's are f, |f|, g, |g|. For m = 1, 2, …, let
Y_m(h) = Σ_n {h(ξ_n): τ_m ≤ n < τ_{m+1}}
V_m(h) = Y_1(h) + ⋯ + Y_m(h).
I claim that with P_s-probability 1, as n → ∞:

(79) V_{l(n)}(f)/V_{l(n)}(g) → μ(f)/μ(g),

and

(80) Y_{l(n)}(h)/V_{l(n)}(g) → 0.

Introduce E_s for expectation relative to P_s. Let ζ_j be the number of j's in the
first s-block: namely, the number of n < τ_2 with ξ_n = j. As (1) implies,
μ(j) = E_s{ζ_j}. Clearly,
Y_1(h) = Σ_{j∈I} h(j)ζ_j.
So
E_s{Y_1(h)} = Σ_{j∈I} h(j)μ(j) = μ(h).
By blocks (1.31), the variables Y_m(h) are independent and identically
distributed for m = 1, 2, …. The strong law now implies that with P_s-
probability 1,
(81) V_m(h)/m → μ(h) as m → ∞.
Put h = f or g and divide: with P_s-probability 1,
lim_{m→∞} V_m(f)/V_m(g) = μ(f)/μ(g).
Put m = l(n) to get (79). Next,
Y_m(h)/m = V_m(h)/m − ((m − 1)/m)·(V_{m−1}(h)/(m − 1)).

So (81) implies that with P_s-probability 1,
lim_{m→∞} Y_m(h)/m = 0.
Put g for h in (81): with P_s-probability 1,
lim_{m→∞} Y_m(h)/V_m(g) = 0.
Put m = l(n) to get (80).
Abbreviate
S_n(h) = Σ_{m=0}^n h(ξ_m).
Check
(82) |S_n(h) − V_{l(n)}(h)| ≤ Y_{l(n)}(|h|).
Clearly,
S_n(f)/S_n(g) = [S_n(f) − V_{l(n)}(f) + V_{l(n)}(f)] / [S_n(g) − V_{l(n)}(g) + V_{l(n)}(g)].

But S_n(h) − V_{l(n)}(h) = o[V_{l(n)}(g)] almost surely for h = f or g by (80, 82).
Using (79),

S_n(f)/S_n(g) → μ(f)/μ(g) with P_s-probability 1. *

(83) Corollary. Let V_n(i) be the number of m ≤ n with ξ_m = i. Then with
P_s-probability 1,

lim_{n→∞} V_n(j)/V_n(k) = μ(j)/μ(k).

PROOF. Put f = δ_j and g = δ_k in (78). *

(84) Corollary. Suppose I is positive recurrent, with invariant probability π.
Suppose π(|f|) < ∞. Then with P_s-probability 1,

lim_{n→∞} (1/n) Σ_{m=0}^n f(ξ_m) = π(f).

PROOF. Put g ≡ 1 in (78). *
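Corollaries (83) and (84) are easy to watch numerically. The sketch below is not from the text: the three-state chain, its transition probabilities, and the run length are my own illustrative choices. It simulates a positive recurrent chain and compares the occupation-time ratio V_n(1)/V_n(2) with π(1)/π(2), as in (83).

```python
import random

random.seed(0)

# Hypothetical 3-state chain (states 0, 1, 2); state 0 plays the role of s.
P = {0: [(0, 0.5), (1, 0.5)],
     1: [(0, 0.3), (2, 0.7)],
     2: [(0, 1.0)]}

def step(i):
    # sample the next state from row i of P
    u, acc = random.random(), 0.0
    for j, p in P[i]:
        acc += p
        if u < acc:
            return j
    return P[i][-1][0]

n = 200_000
counts = [0, 0, 0]
state = 0
for _ in range(n):
    counts[state] += 1
    state = step(state)

# Invariant probability, solved by hand from pi = pi P:
#   pi(1) = 0.5 pi(0),  pi(2) = 0.7 pi(1) = 0.35 pi(0),  pi sums to 1.
pi0 = 1 / 1.85
pi = [pi0, 0.5 * pi0, 0.35 * pi0]

print(counts[1] / counts[2], pi[1] / pi[2])  # both near 10/7
```

With π normalized to a probability, (84) says the visit frequencies themselves converge to π; the ratio in (83) needs no normalization of the invariant measure.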
9. THE SUM OF A FUNCTION OVER DIFFERENT j-BLOCKS

The distribution of Yn(h), as defined in Section 8, depends not only on h,


but also on the reference state s. Certain facts about this dependence will be
useful in Chapter 3, and are conveniently established as (92). Result (92a)
is in (Chung, 1960, p. 79). Here are some standard preliminaries.

(85) Lemma. Let u_i be real numbers, and 1 ≤ p < ∞. Then

|Σ_{i=1}^n u_i|^p ≤ n^{p−1} Σ_{i=1}^n |u_i|^p.

PROOF. Use Jensen's inequality (10.9) on the convex function x → |x|^p
to get

|(1/n) Σ_{i=1}^n u_i|^p ≤ (1/n) Σ_{i=1}^n |u_i|^p. *
(86) Lemma. Let u_i be real numbers, and 0 < p < 1. Then

|Σ_{i=1}^n u_i|^p ≤ Σ_{i=1}^n |u_i|^p.

PROOF. The general case follows from the case n = 2, by induction.
The inequality |u + v| ≤ |u| + |v| further reduces the problem to nonnegative
u and v. Divide by u to make the final reduction: I only need
(1 + x)^p ≤ 1 + x^p for x ≥ 0.
Both sides agree at x = 0. The derivative of the left side is strictly less than
that of the right side at positive x. *
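Lemmas (85) and (86) can be spot-checked numerically. The sample vector below is an arbitrary choice of mine, not from the text; the two loops exercise the convex range p ≥ 1 and the concave range 0 < p < 1.

```python
# Check |sum u_i|^p <= n^(p-1) * sum |u_i|^p  for p >= 1 (Lemma 85),
# and   |sum u_i|^p <= sum |u_i|^p            for 0 < p < 1 (Lemma 86).
u = [3.0, -1.5, 2.25, -0.5]
n = len(u)
s = abs(sum(u))

for p in (1.0, 1.7, 2.0, 3.5):        # Lemma (85): Jensen on x -> |x|^p
    assert s ** p <= n ** (p - 1) * sum(abs(x) ** p for x in u) + 1e-12

for p in (0.2, 0.5, 0.9):             # Lemma (86): subadditivity of x -> x^p
    assert s ** p <= sum(abs(x) ** p for x in u) + 1e-12

print("both inequalities hold on this sample")
```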
From now on, 0 < p < ∞. Let π be a probability on (−∞, ∞). Say
π ∈ L^p iff ∫_{−∞}^{∞} |x|^p π(dx) < ∞.
(87) Lemma. Let π_1 and π_2 be probabilities on (−∞, ∞). Let 0 ≤ θ ≤ 1
and let π = θπ_1 + (1 − θ)π_2.
(a) If π_1 and π_2 are in L^p, then π ∈ L^p.
(b) If θ > 0 and π ∈ L^p, then π_1 ∈ L^p.

PROOF. Easy. *

Let (Ω, ℱ, 𝒫) be a probability triple. Write E for 𝒫-expectation. Say a
random variable X ∈ L^p iff E{|X|^p} < ∞. For (88), let U and V be random
variables on (Ω, ℱ, 𝒫).

(88) Lemma. (a) If U ∈ L^p and V ∈ L^p, then U + V ∈ L^p.
(b) If U and V are independent, and U + V ∈ L^p, then U ∈ L^p and
V ∈ L^p.

PROOF. Claim (a) is clear from (85) or (86).
Claim (b). By Fubini, E(|u + V|^p) < ∞ for 𝒫U^{−1}-almost all u. Choose
one good u. Then
V = (u + V) + (−u)
is in L^p by (a). *
*

For (89), let M, W_1, W_2, … be independent random variables on
(Ω, ℱ, 𝒫). Suppose the W's are identically distributed. Suppose M is non-
negative integer-valued, and 𝒫{M = 1} > 0. Make the convention Σ_{m=1}^0 = 0.
(89) Lemma. Let p ≥ 1 and M ∈ L^p, or 0 < p < 1 and M ∈ L^1.
(a) Σ_{m=1}^M W_m ∈ L^p implies W_1 ∈ L^p.
(b) W_1 ∈ L^p implies Σ_{m=1}^M W_m ∈ L^p.
PROOF. Claim (a).

𝒫{M = 1}·E{|W_1|^p} = ∫_{{M=1}} |W_1|^p d𝒫 ≤ ∫ |Σ_{m=1}^M W_m|^p d𝒫 < ∞.

Claim (b). Suppose p ≥ 1. Check this computation, using (85) and the
independence of M from the W's:

E{|Σ_{m=1}^M W_m|^p} = Σ_{m=1}^∞ E{|Σ_{n=1}^m W_n|^p}·𝒫{M = m}
≤ Σ_{m=1}^∞ m^{p−1} Σ_{n=1}^m E{|W_n|^p}·𝒫{M = m}
= E{|W_1|^p}·Σ_{m=1}^∞ m^p 𝒫{M = m}
= E{|W_1|^p}·E{M^p}
< ∞.
The argument for 0 < p < 1 is similar, using (86). *
(90) Lemma. Suppose U and V are independent random variables, and suppose
𝒫{U + V = 0} = 1. Then 𝒫{U = −K} = 𝒫{V = K} = 1 for some constant
K.
PROOF. Fubini will produce a constant v with
𝒫{U + v = 0} = 1;
take K = v. *

For (91), suppose M, W_1, W_2, … are independent random variables on
(Ω, ℱ, 𝒫), the W's being identically distributed. Suppose M is nonnegative
integer-valued, and 𝒫{M = 1} > 0. As before, Σ_{m=1}^0 = 0.
(91) Lemma. Suppose K is a constant and 𝒫{Σ_{m=1}^M W_m = K} = 1. Then:

(a) 𝒫{W_1 = K} = 1.
(b) If K ≠ 0, then 𝒫{M = 1} = 1.

PROOF. Claim (a).

𝒫{W_1 = K} = 𝒫{W_1 = K | M = 1}
= 𝒫{Σ_{m=1}^M W_m = K | M = 1}
= 1.

Claim (b). Clearly Σ_{m=1}^M W_m = KM, so 𝒫{KM = K} = 1. *

Return to Markov chains; I is a recurrent class for the stochastic matrix P.
Confine ω to the set where ξ_n = j for infinitely many n, for all j ∈ I. This set
has P_i-probability 1 by (1.73). Let 0 ≤ τ_1(j) < τ_2(j) < ⋯ be the n with
ξ_n = j. The nth j-block is the sample sequence on [τ_n(j), τ_{n+1}(j)), shifted so
as to start at time 0. Here, n = 1, 2, …. Fix a function h on I, and let

Y_n(j) = Σ_m {h(ξ_m): τ_n(j) ≤ m < τ_{n+1}(j)}.

For any particular j, the variables {Y_n(j): n = 1, 2, …} are independent and
identically distributed relative to P_i. The distribution depends on j, but not
on i.
(92) Theorem. (a) If Y_n(j) is in L^p relative to P_i for some i and j, then
Y_n(j) is in L^p relative to P_i for all i and j.
(b) If P_i{Y_n(j) = 0} = 1 for some i and j, then P_i{Y_n(j) = 0} = 1 for all
i and j.
PROOF. There is no interest in varying i, so fix it. Fix j ≠ k. I will inter-
change j and k. Look at Figure 4. Let N(j, k) be the least n such that the nth
j-block contains a k. Abbreviate

V(j, k) = Y_1(j) + ⋯ + Y_{N(j,k)}(j).

I claim that with respect to P_i,

(93) V(j, k) is distributed like V(k, j).

Indeed, let A(j, k) be the sample sequence from the first j until just before the
first k after the first j. Let B(k, j) be the sample sequence from this k until
just before the next j. Then A(j, k)B(k, j) is the sample sequence from the
first j until just before the first j after the first k after the first j. So V(j, k)
is the sum of h(η) as η moves through the state sequence A(j, k)B(k, j).
Equally, V(k, j) is the sum of h(η) as η moves through the state sequence

[Figure 4. A sample path marked at the first j, the first k after it, and the
next j: A(j, k) runs from that first j to just before the k, and B(k, j) from
the k to just before the next j. In the picture, N(j, k) = 3.]

A(k, j)B(j, k). As strong Markov (1.22) implies, A(j, k) is independent of
B(k, j), and is distributed like B(j, k). So the rearranged sample sequence
B(k, j)A(j, k) is distributed like A(k, j)B(j, k). Addition being commutative,
(93) is proved.
Let c(j, k) be the P_i-probability that the first j-block contains a k, so

P_i{N(j, k) = n} = c(j, k)[1 − c(j, k)]^{n−1} for n = 1, 2, ….

NOTE. 0 < c(j, k) ≤ 1.
On some convenient probability triple (Ω, ℱ, 𝒫), construct independent
random variables M(j, k), C(j, k), D_1(j, k), D_2(j, k), … with the following
three properties:

(a) 𝒫{M(j, k) = m} = c(j, k)[1 − c(j, k)]^m for m = 0, 1, …;
(b) the 𝒫-distribution of C(j, k) is the conditional P_i-distribution of
Y_1(j) given the first j-block contains a k;
(c) the 𝒫-distribution of D_n(j, k) is the conditional P_i-distribution of
Y_1(j) given that the first j-block does not contain a k, and does not
depend on n.

In particular,

(94) the P_i-distribution of Y_1(j) is c(j, k) times the 𝒫-distribution of C(j, k),
plus [1 − c(j, k)] times the 𝒫-distribution of D_n(j, k).

Abbreviate
U(j, k) = Σ_{m=1}^{M(j,k)} D_m(j, k),
with the usual convention that Σ_{m=1}^0 = 0. Blocks (1.31) and (4.48) show
(95) the 𝒫-distribution of C(j, k) + U(j, k) coincides with the P_i-
distribution of V(j, k).

Claim (a). Assume Y_1(j) ∈ L^p for P_i. I have to show Y_1(k) ∈ L^p for P_i.
I claim
(96) V(j, k) ∈ L^p for P_i.
Suppose c(j, k) < 1, the other case being easier. Now C(j, k) and D_1(j, k)
are in L^p, using (94) and (87b). So U(j, k) ∈ L^p by (89b). This and (88a) force
C(j, k) + U(j, k) ∈ L^p. Now (95) proves (96).
As (93, 96) imply, V(k, j) ∈ L^p for P_i. So C(k, j) + U(k, j) ∈ L^p by (95).
Consequently, C(k, j) ∈ L^p and U(k, j) ∈ L^p by (88b). In particular,
D_1(k, j) ∈ L^p by (89a). This gets Y_1(k) ∈ L^p by (94, 87a).

Claim (b). Assume P_i{Y_1(j) = 0} = 1. I have to show P_i{Y_1(k) = 0} = 1.
Clearly, P_i{V(j, k) = 0} = 1. So (93) implies P_i{V(k, j) = 0} = 1. By (95),
𝒫{C(k, j) + U(k, j) = 0} = 1.
By (90), there is a constant K with
𝒫{C(k, j) = −K} = 𝒫{U(k, j) = K} = 1.
So (91a) makes 𝒫{D_1(k, j) = K} = 1. But c(1 − c) < 1 for 0 ≤ c ≤ 1, so
𝒫{M(k, j) = 1} < 1. This and (91b) force K = 0. Now (94) gets
P_i{Y_1(k) = 0} = 1. *
Remember τ_1(j) is the least n with ξ_n = j. Let τ(j, k) be the least n > τ_1(j)
with ξ_n = k. Let ρ(k, j) be the least n > τ(j, k) with ξ_n = j. See Figure 4.

(97) Corollary. If Y_1(j) ∈ L^p relative to P_i for some i and j, then

Σ_n {h(ξ_n): τ_1(j) ≤ n < τ(j, k)} ∈ L^p

relative to P_i for all i, j, k.

PROOF. Suppose j ≠ k. Let

S(j, k) = Σ_n {h(ξ_n): τ_1(j) ≤ n < τ(j, k)}
T(k, j) = Σ_n {h(ξ_n): τ(j, k) ≤ n < ρ(k, j)}.

Then S(j, k) is the sum of h(η) as η moves through A(j, k). And T(k, j) is the
sum of h(η) as η moves through B(k, j). So S(j, k) is independent of T(k, j).

But
S(j, k) + T(k, j) = V(j, k) ∈ L^p
by (96). So S(j, k) ∈ L^p by (88b). Use (92) to vary j. *
The next result generalizes (1.76).
(98) Corollary. If τ_2(j) − τ_1(j) ∈ L^p relative to P_i for some i and j, then
τ_2(j) − τ_1(j) ∈ L^p and τ(j, k) − τ_1(j) ∈ L^p relative to P_i for all i, j, k.
PROOF. Put h ≡ 1 in (92, 97). *
3

SOME INVARIANCE PRINCIPLES

1. INTRODUCTION

This chapter deals with the asymptotic behavior of the partial sums of
functionals of a Markov chain, and in part is an explanation of the central
limit theorem for these processes. Markov (1906) introduced his chains in
order to extend the central limit theorem; this chapter continues his program.
Section 3 contains an arcsine law for functional processes. The invariance
principles of Donsker (1951) and Strassen (1964), to be discussed in B & D,
are extended to functional processes in Section 4. For an alternative
treatment of some of these results, see (Chung, 1960, Section 1.16).
Throughout this chapter, let I be a finite or countably infinite set, with at
least two elements. Let P be a stochastic matrix on I, for which I is one posi-
tive recurrent class. Let π be the unique invariant probability; see (1.81).
Recall that the probability P_i on sequence space I^∞ makes the coordinate
process {ξ_n} Markov with stationary transitions P and starting state i.
Fix a reference state s ∈ I. Confine ω to the set where ξ_n = s for infinitely
many n. Let 0 ≤ τ_1 < τ_2 < ⋯ be the times n with ξ_n = s. Let f be a real-
valued function on I. Let

Y_j = Σ_n {f(ξ_n): τ_j ≤ n < τ_{j+1}}

and

U_j = Σ_n {|f(ξ_n)|: τ_j ≤ n < τ_{j+1}}.

Here and elsewhere in this chapter, j is used as a running index with values
1, 2, …; and not as a generic state in I. Let V_0 = 0 and V_m = Σ_{j=1}^m Y_j and

I want to thank Pedro Fernandez and S.-T. Koay for checking the final draft
of this chapter.
82

S_n = Σ_{j=0}^n f(ξ_j). For (3) and (4) below, assume:

(1) Σ_{i∈I} π(i)|f(i)| < ∞ and Σ_{i∈I} π(i)f(i) = 0,

and
(2) U_1² has finite P_s-expectation.
NOTE. If (2) holds for one reference state s, it holds for all s: the
dependence of U_j on s is implicit. This follows from (2.92), and can be used
to select the reference state equal to the starting state. I will not take
advantage of this. Theorems (3) and (4) hold if [x] is interpreted as any non-
negative integer m with |m − x| ≤ 2. The max can be taken over all values of
[ ]. And i is a typical element of I.
(3) Theorem. n^{−1/2} max {|S_j − V_{[jπ(s)]}|: 0 ≤ j ≤ n} → 0 in P_i-probability.
(4) Theorem. With P_i-probability 1,
(n log log n)^{−1/2} max {|S_j − V_{[jπ(s)]}|: 0 ≤ j ≤ n} → 0.
The idea of comparing S_j with V_{[jπ(s)]} is in (Chung, 1960, p. 78).
For (6), do not assume (1) and (2), but assume
(5) Y_j differs from 0 with positive P_s-probability.
Let v_m be 1 or 0 according as V_m is positive or nonpositive. Similarly, let
s_n be 1 or 0 according as S_n is positive or nonpositive.
NOTE. s is a fixed state, and s_n is a random variable.
(6) Theorem. With P_i-probability 1,

(1/(nπ(s))) Σ_j {v_j: 1 ≤ j ≤ nπ(s)} − (1/n) Σ_m {s_m: 1 ≤ m ≤ n} → 0.

NOTE. The quantities τ_n, Y_j, U_j, V_m, v_m depend on the reference state s.
This dependence is not displayed in the notation. The quantities S_m and s_m
do not depend on the reference state. In (3), the convergence may be a.e. I
doubt it, but have no example.

2. ESTIMATING THE PARTIAL SUMS

Blocks (1.31) imply: Y_1, Y_2, … are P_i-independent and identically
distributed. So are U_1, U_2, …. The joint P_i-distribution of the Y's and U's
does not depend on the state i, but does of course depend on the reference
state s. Introduce E_i for expectation relative to P_i. Assumption (1) implies
E_i(Y_1) = 0: it is enough to check this when i = s. Let ζ_k be the number of k's

in the first s-block. Then

Y_1 = Σ_{k∈I} f(k)ζ_k.

Using (2.24),
E_s(Y_1) = Σ_{k∈I} f(k)E_s(ζ_k) = Σ_{k∈I} f(k)π(k)/π(s) = 0.
Assumption (2) implies E_i(Y_1²) < ∞.
On {τ_1 ≤ n}, let:
l(n) be the largest j with τ_j ≤ n;
Y′(n) = Σ_j {f(ξ_j): 0 ≤ j ≤ τ_1 − 1};
Y″(n) = Σ_j {f(ξ_j): τ_{l(n)} ≤ j ≤ n}.
On {τ_1 > n}, let l(n) = 0 and Y′(n) = S_n and Y″(n) = 0. Let V_{−1} = 0. As
in Figure 1,
(7) S_n = Y′(n) + Y″(n) + V_{l(n)−1}.
[Figure 1. The interval [0, n] split into an initial stretch contributing Y′(n),
complete s-blocks contributing Y_1, Y_2, Y_3, and a final stretch contributing
Y″(n); here l(n) = 4, and the total is S_n.]

Then (3) follows from (10), (11), and (13), to which (8) and (9) are
preliminary.

(8) Lemma. Let a_1, a_2, … and 0 < b_1 ≤ b_2 ≤ ⋯ be real numbers with
b_n → ∞ and a_n/b_n → 0. Then
max {a_1, …, a_n}/b_n → 0.

PROOF. Easy. *

(9) Lemma. Let a > 0. Let Z_1, Z_2, … be identically distributed, not
necessarily independent, with E(|Z_1|^a) < ∞. Then
n^{−1/a} max_j {|Z_j|: 1 ≤ j ≤ n} → 0 a.e.
PROOF. It is enough to do the case a = 1 and Z_j ≥ 0: to get the general
case, replace Z_j by |Z_j|^a. By (8), it is enough to show

Z_n/n → 0 a.e.

Let e > 0, abbreviate A_m = {em ≤ Z_1 < e(m + 1)}, and check this
computation:

Σ_{n=1}^∞ Prob {Z_n ≥ en} = Σ_{n=1}^∞ Σ_{m=n}^∞ Prob A_m
= Σ_{m=1}^∞ Σ_{n=1}^m Prob A_m
= Σ_{m=1}^∞ m·Prob A_m
≤ (1/e) Σ_{m=1}^∞ ∫_{A_m} Z_1
< ∞.

Borel-Cantelli implies that almost everywhere, Z_n ≥ en for only finitely
many n. Let e ↓ 0 through a sequence. *
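Lemma (9) is easy to watch numerically. In the sketch below (the exponential distribution and the sample size are my own illustrative choices, not from the text), the maximum of n draws with finite mean grows only like log n, so dividing by n kills it.

```python
import random

random.seed(2)
n = 100_000
zmax = 0.0
for _ in range(n):
    # exponential Z's: identically distributed with E(Z_1) = 1 < infinity
    zmax = max(zmax, random.expovariate(1.0))

# Lemma (9) with a = 1: n^{-1} max_j Z_j should be near 0.
print(zmax / n)  # tiny, since the max grows like log n
```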

(10) Lemma. n^{−1/2} max_j {|Y′(j)|: 0 ≤ j ≤ n} → 0 with P_i-probability 1.

PROOF. |Y′(n)| ≤ Σ_j {|f(ξ_j)|: 0 ≤ j ≤ τ_1 − 1}, for all n. *

(11) Lemma. n^{−1/2} max_j {|Y″(j)|: 0 ≤ j ≤ n} → 0 with P_i-probability 1.

PROOF. Plainly, |Y″(j)| ≤ U_{l(j)}, where U_0 is temporarily 0. But
l(n) ≤ n + 1, so

max_j {|Y″(j)|: 0 ≤ j ≤ n} ≤ max_j {U_j: 1 ≤ j ≤ n + 1}.

Use (2) and (9). *
(12) Lemma. (a) l(n)/n → π(s) with P_i-probability 1.
(b) τ_{l(n)}/n → 1 with P_i-probability 1.

PROOF. Let r_j = τ_{j+1} − τ_j. By blocks (1.31), the P_i-distribution of {r_j}
does not depend on i; and with respect to P_i, the random variables {r_j} are
independent and identically distributed. Moreover, E_i(r_1) = 1/π(s) by (1.81).
The strong law implies that with P_i-probability 1,
(r_1 + ⋯ + r_m)/m → 1/π(s).
Put m = l(n) and take reciprocals to see that with P_i-probability 1,
(a) l(n)/[r_1 + ⋯ + r_{l(n)}] → π(s).
Remember l(n) ≤ n + 1 and look at Figure 2:
τ_{l(n)+1} − n ≤ r_{l(n)} ≤ max_j {r_j: 1 ≤ j ≤ n + 1}.

[Figure 2. The return times 0 ≤ τ_1 < τ_2 < τ_3 < ⋯ < τ_{l(n)} ≤ n < τ_{l(n)+1},
with the gaps r_{l(n)−1} and r_{l(n)} marked.]

Use (9) to deduce:

(b) r_{l(n)} = o(n) a.e.
(c) τ_{l(n)+1} − n = o(n) a.e.

As Figure 2 shows,
r_1 + ⋯ + r_{l(n)} = n + (τ_{l(n)+1} − n) − τ_1.
Clearly,
(d) τ_1 = o(n).

Use (c) and (d) to deduce:

(e) r_1 + ⋯ + r_{l(n)} = n + o(n) a.e.
Combine this with (a) to prove (12a). Next,
τ_{l(n)} = (r_1 + ⋯ + r_{l(n)}) − r_{l(n)} + τ_1;

combine this with (b), (d), and (e) to get (12b). *

The proof of the next result involves (B & D, 1.118), which is quoted below.
(13) Lemma. n^{−1/2} max {|V_{l(j)−1} − V_{[jπ(s)]}|: 0 ≤ j ≤ n} → 0 in P_i-probability.
PROOF. Use (B & D, 1.118) and (12a). Namely, fix e > 0 and δ > 0. For r > 1,
let θ(r, n) be the P_i-probability (which does not depend on i) of the event
A_n = {en^{1/2} > max [|V_a − V_b|: 0 ≤ a ≤ b ≤ ra and 0 ≤ a ≤ n]}.
Use (B & D, 1.118) to find one r for which there is an n_0 = n_0(r, e, δ) < ∞ so
large that for all n > n_0,
θ(r, n) > 1 − δ.
Then use (12a) to find N < ∞ so large that Nπ(s) > 2 and the event

B = {1/r < (l(j) − 1)/[jπ(s)] < r for all j > N}

has P_i-probability more than 1 − δ. Choose n_1 > n_0, finite but so large that
for n > n_1, the event
C_n = {en^{1/2} > max [|V_{l(j)−1} − V_{[jπ(s)]}|: 0 ≤ j ≤ N]}
has P_i-probability more than 1 − δ. Thus, P_i(A_n ∩ B ∩ C_n) > 1 − 3δ for
n > n_1. If n > n_1 and j = 0, …, n, I claim

|V_{l(j)−1} − V_{[jπ(s)]}| < en^{1/2},

provided ω is confined to A_n ∩ B ∩ C_n. Indeed, if j ≤ N the inequality
holds because ω ∈ C_n. Suppose N < j ≤ n. Then l(j) − 1 ≤ n, and
[jπ(s)] ≥ 1. This and ω ∈ B force l(j) − 1 ≥ 1. Of l(j) − 1 and [jπ(s)], the
lesser is between 1 and n; and the greater is at most r times the lesser, because
ω ∈ B. The inequality is now visibly true, because ω ∈ A_n. *
This completes the proof of (3). The proof of (4) is similar, using (B & D,
1.119) instead of (B & D, 1.118). To quote (B & D, 1.118-119), let Y_1,
Y_2, … be independent, identically distributed random variables on some
probability triple (Ω, ℱ, 𝒫). Suppose Y_1 has mean 0 and variance 1. Let
V_0 = 0, and for n ≥ 1 let
V_n = Y_1 + ⋯ + Y_n.
(B & D, 1.118). Let e > 0 and r > 1. Let p(r, n) = 𝒫(A_n), where A_n is the
event that
max {|V_j − V_k|: 0 ≤ j ≤ n and j ≤ k ≤ rj}
exceeds en^{1/2}. Then
lim_{r↓1} lim sup_{n→∞} p(r, n) = 0.
(B & D, 1.119). Let e > 0. There is an r > 1, which depends on e but not
on the distribution of Y_1, such that 𝒫(lim sup A_n) = 0, where A_n is the event
that
max {|V_j − V_k|: 0 ≤ j ≤ n and j ≤ k ≤ rj}
exceeds e(n log log n)^{1/2}.

3. THE NUMBER OF POSITIVE SUMS

Assume (5). Suppose that V_m > 0 for infinitely many m along P_i-almost
all paths. In the opposite case, V_m ≤ 0 for all large m, along P_i-almost all
paths, by the Hewitt-Savage 0-1 Law (1.122). The argument below then
establishes (6), with 1 − s_m for s_m and 1 − v_j for v_j. This modified (6) is
equivalent to the original (6).
As in (12), let r_j = τ_{j+1} − τ_j. With respect to P_i, the r_j are independent,
identically distributed, and have mean 1/π(s). The P_i-distribution of r_j does
not depend on i, but does depend on the reference state s. Theorem (6) will

be proved by establishing (14) and (15). To state them, let

A_n = (1/(nπ(s))) Σ_j {v_j: 1 ≤ j ≤ nπ(s)}

B_n = (1/n) Σ_j {v_j r_{j+1}: 1 ≤ j ≤ l(n) − 2}

C_n = (1/n) Σ_m {s_m: 1 ≤ m ≤ n}.

The two estimates are
(14) A_n − B_n → 0 with P_i-probability 1
(15) B_n − C_n → 0 with P_i-probability 1.
Add (14) and (15) to get (6) in the form A_n − C_n → 0 a.e. Relation (14) is
obtained by replacing r_{j+1} with its mean value 1/π(s). The error is negligible
after dividing by n, in view of the strong law. Relation (15) will follow from
the fact that essentially all m ≤ n are in intervals [τ_{j+1}, τ_{j+2}) over which
s_m = v_j, because |V_j| is large by comparison with U_{j+1} and Y′(m).
Making this precise requires lemmas (16) and (17). For (16), let r_1, r_2, …
be any independent, identically distributed random variables, with finite
expectation. Let ℱ_1 ⊂ ℱ_2 ⊂ ⋯ be σ-fields, such that r_n is ℱ_{n+1}-measurable,
and ℱ_n is independent of r_n. Let z_1, z_2, … be random variables, taking only
the values 0 and 1, such that z_n is ℱ_{n+1}-measurable, and Σ_1^∞ z_n = ∞ a.e.
(16) Lemma. (Σ_{j=1}^n z_j r_{j+1})/(Σ_{j=1}^n z_j) converges to E(r_1) a.e.
PROOF. Let Z_n = Σ_1^n z_j. For m = 1, 2, …, let W_m be 1 plus the smallest
n such that Z_n = m. I say that {W_m = j} ∈ ℱ_j. Indeed, for j ≥ m + 1,
{W_m = j} = {z_1 + ⋯ + z_{j−1} = m > z_1 + ⋯ + z_{j−2}}.
If m′ < m and A is a Borel subset of the line, I deduce
{r_{W_{m′}} ∈ A and W_m = j} ∈ ℱ_j;
for this set is
∪_{k=1}^{j−1} {r_k ∈ A and W_{m′} = k and W_m = j}.
I conclude that r_{W_1}, r_{W_2}, … are independent and identically distributed, the
distribution of r_{W_m} being that of r_1. Indeed, if A_1, …, A_m are Borel subsets
of the line, then
Prob {r_{W_1} ∈ A_1, …, r_{W_{m−1}} ∈ A_{m−1}, r_{W_m} ∈ A_m}
= Σ_j Prob {r_{W_1} ∈ A_1, …, r_{W_{m−1}} ∈ A_{m−1} and W_m = j and r_j ∈ A_m}
= Σ_j Prob {r_{W_1} ∈ A_1, …, r_{W_{m−1}} ∈ A_{m−1} and W_m = j}·Prob {r_j ∈ A_m}
= Prob {r_{W_1} ∈ A_1, …, r_{W_{m−1}} ∈ A_{m−1}}·Prob {r_1 ∈ A_m},
because {r_{W_1} ∈ A_1, …, r_{W_{m−1}} ∈ A_{m−1} and W_m = j} ∈ ℱ_j, while r_j is
independent of ℱ_j and Prob {r_j ∈ A_m} does not depend on j.

By the strong law,
m^{−1} Σ_k {r_{W_k}: 1 ≤ k ≤ m} → E(r_1) a.e.
Because Z_n → ∞ a.e.,
Z_n^{−1} Σ_k {r_{W_k}: 1 ≤ k ≤ Z_n} → E(r_1) a.e.
But
Σ_j {z_j r_{j+1}: 1 ≤ j ≤ n} = Σ_k {r_{W_k}: 1 ≤ k ≤ Z_n}:
for Z_n is the number of j = 1, …, n with z_j = 1; and W_k is 1 plus the kth
j with z_j = 1. *
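Lemma (16) can be illustrated by simulation. In the sketch below, the gaps r_j are exponential with mean 2 and the selection rule z_j = 1 when r_j > 1; both are my own illustrative choices, not from the text. The point of the lemma is visible: since z_j never peeks at r_{j+1}, selecting the next gap whenever z_j = 1 cannot bias the average.

```python
import random

random.seed(1)
n = 200_000
# i.i.d. gaps with E(r) = 2 (rate 0.5 exponentials)
r = [random.expovariate(0.5) for _ in range(n + 1)]

num = den = 0.0
for j in range(n):
    z = 1 if r[j] > 1.0 else 0    # z_j is determined by the past only
    num += z * r[j + 1]           # z_j selects the NEXT gap
    den += z

print(num / den)  # close to E(r) = 2
```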
For (17), let Y_1, Y_2, … be any sequence of independent, identically
distributed random variables. Put V_n = Σ_1^n Y_j. Make no assumptions about
the moments of Y_j. Let M be a positive, finite number. Let d_n be 1 or 0
according as |V_n| ≤ M or |V_n| > M.
(17) Lemma. Suppose Y_j differs from 0 with positive probability. Then
(1/n) Σ_1^n d_j → 0 a.e.
PROOF. Suppose Y_j differs from x with positive probability, for each x.
Otherwise the result is easy. Let C_n be the concentration function of V_n, as
defined in Section 5. Fix k at one of 1, 2, … and fix r equal to one of 0, …,
k − 1. Let δ_n be the conditional probability that |V_{nk+r}| ≤ M given
Y_1, …, Y_{(n−1)k+r}. I claim
(a) δ_n ≤ C_k(2M) a.e.
Indeed, let μ be the distribution of Y = (Y_1, …, Y_{(n−1)k+r}), a probability
on the set of (n − 1)k + r-vectors y = (y_1, …, y_{(n−1)k+r}). Let A be a Borel
subset of this vector space. Let T = Y_{(n−1)k+r+1} + ⋯ + Y_{nk+r}, so T is
independent of Y and distributed like V_k. Let s(y) = y_1 + ⋯ + y_{(n−1)k+r},
so
V_{nk+r} = s(Y) + T.
By Fubini,

Prob {Y ∈ A and |V_{nk+r}| ≤ M}
= ∫_A Prob {|s(y) + T| ≤ M} μ(dy)
= ∫_A Prob {−M − s(y) ≤ T ≤ M − s(y)} μ(dy)
≤ ∫_A C_k(2M) μ(dy)
= C_k(2M)·μ(A)
= C_k(2M)·Prob {Y ∈ A}.

This completes the proof of (a). I claim

(b) lim sup_{n→∞} (1/n) Σ_j {d_{jk+r}: 1 ≤ j ≤ n} ≤ C_k(2M) a.e.

Claim (b) follows from (a) and this martingale fact: if A_1, A_2, … are events,
and δ_j is the conditional probability of A_j given the past, then
(Σ_{j=1}^n 1_{A_j})/(Σ_{j=1}^n δ_j) → 1 a.e. where Σ_j δ_j = ∞, while Σ_j 1_{A_j} < ∞
a.e. where Σ_j δ_j < ∞.
This fact may not be in general circulation; two references are (Dubins and
Freedman, 1965, Theorem (1)) and (Neveu, 1965, p. 147). Claim (b) can
also be deduced from the strong law for coin tossing. Suppose without real
loss that there is a uniform random variable independent of Y_1, Y_2, ….
Then you can construct independent 0-1 valued random variables e_1, e_2, …
such that: e_n is 1 with probability C_k(2M), and e_n ≥ d_{nk+r}.
Sum out r = 0, …, k − 1 in claim (b) and divide by k:

(c) lim sup_{n→∞} (1/(nk)) Σ_j {d_j: 1 ≤ j < (n + 1)k} ≤ C_k(2M) a.e.

Let m and n tend to ∞, with nk ≤ m < (n + 1)k. Then m/(nk) → 1, so (c)
implies
lim sup_{m→∞} (1/m) Σ_{j=1}^m d_j ≤ C_k(2M) a.e.
Now let k → ∞, so C_k(2M) → 0 by (36) below. *

[Figure 3. The block from τ_{j+1} to τ_{j+2}: V_j is determined by the path up
to τ_{j+1}, and the gap r_{j+1} = τ_{j+2} − τ_{j+1} comes after; the σ-fields
ℱ_{τ_{j+1}} and ℱ_{τ_{j+2}} measure the path up to those times.]

PROOF OF (14). Recall the definitions of A_n and B_n, made before (14).
Introduce
D_n = (1/(nπ(s))) Σ_j {v_j: 1 ≤ j ≤ l(n) − 2}.
You should use (16) to see that with P_i-probability 1,

(Σ_{j=1}^n v_j r_{j+1})/(Σ_{j=1}^n v_j) → 1/π(s).

The conditions of (16) are satisfied by strong Markov (1.21), with ℱ_j = ℱ_{τ_j}:
look at Figure 3. The condition Σ v_j = ∞ a.e. is the assumption made at the
beginning of this section. Put m = l(n) − 2 and rearrange to get
(18) D_n/B_n → 1 with P_i-probability 1.

Use (12a) to see

(19) A_n − D_n → 0 with P_i-probability 1.
As Figure 2 shows, B_n ≤ 1. Thus

|D_n − B_n| ≤ |D_n − B_n|/B_n → 0

with P_i-probability 1 by (18). Combine this with (19) to get (14). *
PROOF OF (15). Temporarily, let S be a subset of the nonnegative integers.
A random subset R of S assigns a subset R(ω) of S to each ω ∈ I^∞, so that
{ω: j ∈ R(ω)} is measurable for each j ∈ S. The cardinality #R of R is a
random variable on I^∞, whose value at ω is the number of elements of R(ω).
For j = 1, 2, …, let
R_j(ω) = {m: m is a nonnegative integer and τ_j(ω) ≤ m < τ_{j+1}(ω)}.
Then R_j is a random subset of the nonnegative integers, and #R_j = r_j, as in
Figure 2.
Fix ε > 0. Choose M so large that

∫_{{U_1 > M/2}} r_1 dP_i < ε/3;

the integral does not depend on i. Let G_n be the following random subset of
the positive integers: j ∈ G_n iff τ_{j+2} ≤ n and |V_j| > M and U_{j+1} ≤ M/2. In
particular, j ∈ G_n implies 1 ≤ j ≤ l(n) − 2. See Figure 4. Of course, G_n
depends on M, although this is not explicit in the notation. Let

H_n = ∪ {R_{j+1}: j ∈ G_n},

a random subset of {1, …, n}. In particular, m ∈ H_n implies τ_2 ≤ m < τ_{l(n)},
as in Figure 4. The main part of the argument is designed to show

(20) For P_i-almost all sample sequences, #H_n ≥ (1 − ε)n for all large n.
How large depends on the sample sequence.
How large depends on the sample sequence.
Before proving (20), I will derive (15) from (20). Let E_M be the subset of
I^∞ where
Σ_j {|f(ξ_j)|: 0 ≤ j ≤ τ_1 − 1} ≤ M/2.

[Figure 4. The interval R_{j+1} = [τ_{j+1}, τ_{j+2}) inside [0, n]: for m ∈ R_{j+1},
the sum S_m splits into the initial piece Y′, the block sum V_j, and a remainder
bounded by U_{j+1}.]
By looking at Figure 2,
Σ_j {r_{j+1}: 1 ≤ j ≤ l(n) − 2} ≤ n
and
#H_n = Σ_j {r_{j+1}: j ∈ G_n}.
So
Σ_j {r_{j+1}: 1 ≤ j ≤ l(n) − 2 and j ∉ G_n} ≤ n − #H_n.
But v_j = 0 or 1, so
0 ≤ Σ_j {v_j r_{j+1}: 1 ≤ j ≤ l(n) − 2 and j ∉ G_n} ≤ n − #H_n.
Similarly,
0 ≤ Σ_m {s_m: 1 ≤ m ≤ n and m ∉ H_n} ≤ n − #H_n.
On E_M, I say that
s_m = v_j for m ∈ R_{j+1} and j ∈ G_n.
Indeed, m ∈ R_{j+1} and j ∈ G_n and Y′ ≤ M/2 force s_m = v_j: because
τ_{j+1} ≤ m < τ_{j+2} ≤ n and |V_j| > M and U_{j+1} ≤ M/2, so

|S_m − V_j| ≤ Y′ + U_{j+1} ≤ M < |V_j|,

and S_m is positive or negative with V_j. See Figure 4. Consequently, on E_M,

|Σ_j {v_j r_{j+1}: 1 ≤ j ≤ l(n) − 2} − Σ_m {s_m: 1 ≤ m ≤ n}| ≤ n − #H_n.

Recall the definitions of B_n and C_n given before (14). Relation (20) implies
that P_i-almost everywhere on E_M,

|B_n − C_n| ≤ ε for all large n.

Let M increase to ∞ through a countable set, so E_M swells to the whole
space. Then let ε decrease to 0 through a countable set, and get (15) from (20).
Turn now to the proof of (20). Let d_j be 1 or 0 according as |V_j| ≤ M or
|V_j| > M. I claim that for P_i-almost all sample sequences,
(21) Σ_{j=1}^n d_j r_{j+1} ≤ εn/3 for all large n.

If Σ_{j=1}^∞ d_j < ∞ with positive P_i-probability, Hewitt-Savage (1.122) implies
d_j = 0 for all large j, with P_i-probability 1; and (21) holds. Suppose
Σ_{j=1}^∞ d_j = ∞ with P_i-probability 1. Lemma (16) makes

(Σ_{j=1}^n d_j r_{j+1})/(Σ_{j=1}^n d_j) → 1/π(s)

with P_i-probability 1. The conditions of (16) are satisfied by strong Markov
(1.21), with ℱ_j = ℱ_{τ_j}. See Figure 3. But (17) makes (1/n) Σ_{j=1}^n d_j → 0 with
P_i-probability 1. So (21) still holds. Put l(n) − 2 for n in (21) and remember
l(n) ≤ n + 1:
(22) Σ_j {d_j r_{j+1}: 1 ≤ j ≤ l(n) − 2} ≤ εn/3 for all large n, with P_i-probability 1.

Next, blocks (1.31) and the strong law imply that with P_i-probability 1,

(1/(n − 2)) Σ_j {r_{j+1}: 1 ≤ j ≤ n − 2 and U_{j+1} > M/2} → ∫_{{U_1 > M/2}} r_1 dP_i.

Put l(n) for n; remember that l(n) ≤ n + 1, and the integral is less than ε/3
by choice of M:

(23) Σ_j {r_{j+1}: 1 ≤ j ≤ l(n) − 2 and U_{j+1} > M/2} ≤ εn/3 for all large n, with
P_i-probability 1.

Finally, use (12b) to see that

(24) The number of m ≤ n with m < τ_2 or m ≥ τ_{l(n)} is at most εn/3 for all
large n, with P_i-probability 1.

Let N_n be the random set of m ∈ {1, …, n} which have property (a) or
(b) or (c):

(a) m ∈ R_{j+1} for some j = 1, …, l(n) − 2 with d_j = 1.
(b) m ∈ R_{j+1} for some j = 1, …, l(n) − 2 with U_{j+1} > M/2.
(c) m < τ_2 or m ≥ τ_{l(n)}.

Combine (22-24) to get

(25) #N_n ≤ εn for all large n, with P_i-probability 1.

But N_n is the complement of H_n relative to {1, …, n}, proving (20). *
To state the arcsine law (26), define F_a as follows. For a = 0 or 1, let F_a
be the distribution function of point mass at a. For 0 < a < 1, let F_a be the
probability on [0, 1], with density proportional to y → y^{a−1}(1 − y)^{−a}.

(26) Corollary. Suppose (5). The P_i-distribution of (1/n) Σ_{j=1}^n s_j converges to F_a
iff the P_i-mean of (1/n) Σ_{j=1}^n s_j converges to a.

PROOF. Use (6) and (Spitzer, 1956, Theorem 7.1). *

NOTE. Temporarily, let M_n = (1/n) Σ_{j=1}^n s_j. If the distribution of M_n con-
verges to anything, say F, then the P_i-mean of M_n converges to the mean μ
of F, because 0 ≤ M_n ≤ 1. Thus F = F_μ. This need not hold for subsequential
limits of the distribution of M_n. Furthermore, if the convergence holds for
one i, it holds for all i. More generally, (6) shows that the P_i-distribution of
(1/n) Σ_{j=1}^n s_j is asymptotically free of i; because the P_i-distribution of the Y's,
so of the v's, does not depend on i.
The balance of this section concerns the exceptional case

(27) Y_j = 0 with P_i-probability 1.

(28) Theorem. Let τ be the least positive n with ξ_n = i. If (27) holds, then
with P_i-probability 1,

(29) lim_{n→∞} (1/n) Σ_{m=1}^n s_m = π(i) ∫ Σ_m {s_m: 0 ≤ m < τ} dP_i.
PROOF. If (27) holds for one reference state s, it holds for all s by (2.92);
the dependence of Y_j on s is implicit. Consequently, in studying the asymp-
totic behavior of (1/n) Σ_1^n s_m relative to P_i, it is legitimate to use i as the
reference state s: condition (27) will still hold. The simplification is

P_i{τ_1 = 0 and τ_2 = τ} = 1.

For j = 1, 2, …, let T_j be the random vector of random length r_j =
τ_{j+1} − τ_j, whose mth term is s_{τ_j + m} for m = 0, …, r_j − 1. Now

S_{τ_j − 1} = Y_1 + ⋯ + Y_{j−1} = 0 a.e.,

so the mth term in T_j is really 1 or 0 according as
Σ_l {f(ξ_l): τ_j ≤ l ≤ τ_j + m}
is positive or nonpositive, except on a P_i-null set. The first summand is f(i),
but this does not help.
Blocks (1.31) imply that the vectors T_1, T_2, … are independent and
identically distributed relative to P_i. By the strong law, with P_i-probability 1,

lim_{j→∞} (1/j) Σ_m {s_m: 0 ≤ m ≤ τ_{j+1} − 1} = ∫ Σ_m {s_m: 0 ≤ m < τ_2} dP_i.

Confine j to the sequence l(n), and use (12). *
The limit in (29) depends on i. For example, let I = {1, 2, 3} and let

P = ( 0 1 0
      0 0 1
      1 0 0 ),

so 1 → 2 → 3 → 1. Then π(i) = 1/3 for i = 1, 2, 3; condition (1) is equivalent
to
f(1) + f(2) + f(3) = 0.

Condition (27) is automatic. When i = 1, the right side of (29) is 1/3 times
the number of positive sums
f(1), f(1) + f(2), f(1) + f(2) + f(3).
When i = 2, the right side of (29) is 1/3 times the number of positive sums
f(2), f(2) + f(3), f(2) + f(3) + f(1).

4. SOME INVARIANCE PRINCIPLES

As in Section 1.7 of B & D, let g be a real-valued function on the non-


negative integers, whose value at n is gn- Then gIn) is the continuous, real-
valued function on [0, 1] whose value atj/n isgj/n l , and which is linearly inter-
polated. Let {B(t):O ~ t < cD} be normalized Brownian motion. That is:
B(O) = 0; all the sample paths of B are continuous; and for

°
the differences B(to) , B(tl ) - B(to), ... , B(tn) - B(tn-l) are independent
normal random variables, with means and variances

respectively. Such a construct exists by (B & D, 1.6). Let qo, 1] be the space
of continuous, real-valued functions on [0, I], with the sup norm
\IJII = max {IJ(t)I:O ~ t ~ I}.
Give qo, 1] the metric
distance (j, g) = Ii! - gil,

and the σ-field generated by the sets which are open relative to this metric.
The distribution Π_σ of {B(σ²t): 0 ≤ t ≤ 1} is a probability on C[0, 1]. For
more discussion of all these objects, see Sections 1.5 and 1.7 of B & D. The
next theorem, which is an extension of Donsker's invariance principle to
functional processes, depends on some results from B & D. These are quoted
at the end of the section.
(30) Theorem. Suppose (1) and (2). Then σ² = π(s)·∫ Y_1² dP_i depends
neither on i nor on the reference state s. Let φ be a bounded, measurable, real-
valued function on C[0, 1], which is continuous with Π_σ-probability 1. Then

lim_{n→∞} ∫_{I^∞} φ(S_{(n)}) dP_i = ∫_{C[0,1]} φ dΠ_σ.

PROOF. Set V_0 = 0. Then V_{(n)} is linear on [j/n, (j + 1)/n] and takes the value
V_j/n^{1/2} at j/n. Similarly for S_{(n)}, except that S_0 = f(ξ_0). Let
F_n(t) = V_{(n)}(π(s)t) for 0 ≤ t ≤ 1.
Thus, F_n is a measurable mapping from I^∞ to C[0, 1]. The values of F_n are
piecewise linear functions, with corners at m/(nπ(s)) for m = 0, 1, …, m_0,
where m_0 is the largest integer no more than nπ(s).
I claim
(31) ‖S_{(n)} − F_n‖ → 0 in P_i-probability.
Study two successive corners m/(nπ(s)) and (m + 1)/(nπ(s)) of F_n; so

0 ≤ m/(nπ(s)) ≤ (m + 1)/(nπ(s)) ≤ 1;

and study the successive corners of S_{(n)} from the greatest one a/n no greater
than m/(nπ(s)) to the least one b/n no less than (m + 1)/(nπ(s)). This is
depicted in Figure 5.
Analytically,

0:$ ~:$ ~ :$ a + 1 :$ ... :$ b - 1 :$ m + 1 :$ ~:$ 1;


- n - n7T(s) - n - - n - n7T(s) - n -
that is,
a7T(s) ~ m ~ (a + l)7T(s) ~ ... ~ (b - l)7T(S) ~ m + 1 ~ b7T(S).
By the linearity,

max, {IS(n)(t) - Fn(t)1 : ~


n7T(s)
~t~ m+
n7T(s)
I}
Figure 5.

is at most

n^{−1/2} max {|S_j − V_μ| : μ = m or m + 1 and a ≤ j ≤ b};

and |μ − jπ(s)| ≤ 2. Suppose, as is likely, that the last corner m/(nπ(s)) of F_n is less than 1. That is,

m/(nπ(s)) < 1 < (m + 1)/(nπ(s)).

Again, let a/n be the greatest corner of S_(n) no greater than m/(nπ(s)), so

0 ≤ a/n ≤ m/(nπ(s)) ≤ (a + 1)/n ≤ ⋯ ≤ n/n ≤ (m + 1)/(nπ(s));

that is,

aπ(s) ≤ m ≤ (a + 1)π(s) ≤ ⋯ ≤ nπ(s) ≤ m + 1.
As before,

max {|S_(n)(t) − F_n(t)| : m/(nπ(s)) ≤ t ≤ 1}

is at most

n^{−1/2} max {|S_j − V_μ| : μ = m or m + 1 and a ≤ j ≤ n};

in this display too, |μ − jπ(s)| ≤ 2. Thus, ‖S_(n) − F_n‖ is at most

n^{−1/2} max {|S_j − V_μ| : 0 ≤ j ≤ n and |μ − jπ(s)| ≤ 2}.

Use (3) to complete the proof of (31).
I will now derive (30) from (31). Temporarily, confine φ to the bounded, uniformly continuous functions on C[0, 1]. By (31) and (B & D, 1.129),

E_i{φ(S_(n))} − E_i{φ(F_n)} → 0.

For f ∈ C[0, 1], let f*(t) = f(π(s)t) for 0 ≤ t ≤ 1, so f* ∈ C[0, 1] and ‖f*‖ ≤ ‖f‖. Let φ*(f) = φ(f*). Then φ* is still bounded and uniformly continuous. Check F_n = V_(n)*, so φ(F_n) = φ*(V_(n)). Let ρ^2 = E_i(Y_1^2). By Donsker (B & D, 1.120),

E_i{φ(F_n)} = E_i{φ*(V_(n))}
  → ∫ φ*(f) Π_ρ(df)
  = ∫ φ(f*) Π_ρ(df)
  = ∫ φ(f) Π_σ(df).

This completes the proof of (30) for bounded and uniformly continuous φ. Now use (B & D, 1.127).

Finally, σ^2 does not depend on i because the P_i-distribution of Y_1 does not depend on i. And σ^2 does not depend on s because the approximants to ∫ φ dΠ_σ do not depend on s, and this integral for all bounded, uniformly continuous φ determines Π_σ, so σ. *
NOTE. In principle, σ^2 can be computed from P. See (Chung, 1960, p. 81) for an explicit formula.

Let K_σ be the set of absolutely continuous functions f on [0, 1] which satisfy: f(0) = 0 and ∫_0^1 |f′(t)|^2 dt ≤ σ^2. The extension of Strassen's invariance principle to functional processes is

(32) Theorem. Suppose (1) and (2). For P_i-almost all sample sequences, the indexed subset

{(2 log log n)^{−1/2} S_(n) : n = 3, 4, ...}

of C[0, 1] is relatively compact, and its set of limit points is K_σ, where σ^2 = π(s) · E_i(Y_1^2).

PROOF. Use (4) and Strassen's invariance principle (B & D, 1.133), as in the proof of (30). *
Some results from B & D

(B & D, 1.120). Let Y_1, Y_2, ... be independent, identically distributed random variables on (Ω, ℱ, 𝒫), with mean 0 and finite variance ρ^2. Let V_0 = 0 and

V_n = Y_1 + ⋯ + Y_n for n ≥ 1.

Let φ be a bounded, real-valued, measurable function on C[0, 1], which is continuous Π_ρ-almost everywhere. Then

lim_{n→∞} ∫_Ω φ(V_(n)) d𝒫 = ∫_{C[0,1]} φ(f) Π_ρ(df).
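The statement just quoted can be checked numerically. The sketch below is a hypothetical illustration, not from the book: it takes ±1 steps (mean 0, ρ^2 = 1) and φ = the indicator of {f : max f ≤ x}, computes the walk probability exactly by dynamic programming, and compares it with the Brownian value 2Φ(x) − 1 given by the reflection principle.

```python
import math

# Numerical sketch of Donsker's theorem (B & D, 1.120) with +/-1 steps
# (mean 0, rho^2 = 1) and phi = the indicator of {f : max f <= x}:
# P{max_j S_j <= x*sqrt(n)} is computed exactly below and compared with
# P{max_t B(t) <= x} = 2*Phi(x) - 1, from the reflection principle.
def prob_max_le(n, m):
    """P{an n-step +/-1 walk never exceeds level m}, by dynamic programming."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for s, q in dist.items():
            for s2 in (s - 1, s + 1):
                if s2 <= m:              # discard paths that pass level m
                    new[s2] = new.get(s2, 0.0) + 0.5 * q
        dist = new
    return sum(dist.values())

x, n = 0.5, 900
walk = prob_max_le(n, int(x * math.sqrt(n)))            # level m = 15
bm = 2 * 0.5 * (1 + math.erf(x / math.sqrt(2))) - 1     # 2*Phi(0.5) - 1
print(round(walk, 4), round(bm, 4))
```

The two numbers agree to a few percent; the residual gap is the lattice discretization, of order n^{−1/2}.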

(B & D, 1.127). Let {𝒫_n, 𝒫} be probabilities on C[0, 1]. Suppose

∫ φ d𝒫_n → ∫ φ d𝒫

for all bounded, uniformly continuous φ. Then the convergence holds for all bounded, measurable φ which are continuous 𝒫-almost everywhere.

(B & D, 1.129). Let {Z_n, W_n} be measurable maps from (Ω, ℱ, 𝒫) to C[0, 1]. Suppose

‖Z_n − W_n‖ → 0 in 𝒫-probability.

Let φ be a bounded and uniformly continuous function on C[0, 1]. Then

∫ φ(Z_n) d𝒫 − ∫ φ(W_n) d𝒫 → 0.

(B & D, 1.133). Let Y_1, Y_2, ... be independent, identically distributed random variables on (Ω, ℱ, 𝒫), with mean 0 and finite variance ρ^2. Let V_0 = 0 and

V_n = Y_1 + ⋯ + Y_n for n ≥ 1.

Let Ω* be the set of ω such that the indexed subset

{(2 log log n)^{−1/2} V_(n)(·, ω) : n = 3, 4, ...}

of C[0, 1] is relatively compact, with limit set K_ρ. Then Ω* ∈ ℱ and 𝒫(Ω*) = 1.

5. THE CONCENTRATION FUNCTION

Let X be a random variable on (Ω, ℱ, 𝒫).

DEFINITION. The concentration function C_X of X is this function of nonnegative u:

C_X(u) = sup {𝒫{x ≤ X ≤ x + u} : −∞ < x < ∞}.

For (33), let −∞ < a < b < ∞. Let a_n → a and b_n → b.

(33) Lemma. 𝒫{a ≤ X ≤ b} ≥ lim sup 𝒫{a_n ≤ X ≤ b_n}.

PROOF. Fix ε > 0. Choose δ > 0 so small that

𝒫{a − δ ≤ X ≤ b + δ} ≤ 𝒫{a ≤ X ≤ b} + ε.

For all large n,

a − δ ≤ a_n and b_n ≤ b + δ.

So

𝒫{a_n ≤ X ≤ b_n} ≤ 𝒫{a − δ ≤ X ≤ b + δ} ≤ 𝒫{a ≤ X ≤ b} + ε. *


(34) Lemma. (a) If λ > 0, then C_{λX}(λu) = C_X(u).
(b) C_X is nondecreasing.
(c) C_X is continuous from the right.
(d) C_X(u + v) ≤ C_X(u) + C_X(v).
(e) If X and Y are independent, C_{X+Y} ≤ C_X.
(f) If X and Y are independent and identically distributed, then

C_X(u)^2 ≤ 𝒫{|X − Y| ≤ u}.

PROOF. Claims (a, b) are easy.

Claim (c). Let u ≥ 0 and u_n ↓ u. As (b) implies, C_X(u_n) converges, and the limit is at least C_X(u). Select positive ε_n tending to 0. Select a_n such that

𝒫{a_n ≤ X ≤ a_n + u_n} ≥ C_X(u_n) − ε_n.

Then

lim_n C_X(u_n) ≤ lim sup_n 𝒫{a_n ≤ X ≤ a_n + u_n}.

If |a_n| → ∞, this forces

lim_n C_X(u_n) = 0 ≤ C_X(u).

Otherwise, pass to a subsequence n* with a_{n*} → a and use (33):

lim_n C_X(u_n) ≤ 𝒫{a ≤ X ≤ a + u} ≤ C_X(u).

Claim (d). Fix ε > 0. For suitable a,

C_X(u + v) ≤ 𝒫{a ≤ X ≤ a + u + v} + ε
  ≤ 𝒫{a ≤ X ≤ a + u} + 𝒫{a + u ≤ X ≤ a + u + v} + ε
  ≤ C_X(u) + C_X(v) + ε.

Claim (e). Let F be the distribution function of Y. Use Fubini:

𝒫{x ≤ X + Y ≤ x + u} = ∫ 𝒫{x − y ≤ X ≤ x − y + u} F(dy)
  ≤ ∫ C_X(u) F(dy)
  = C_X(u).

Claim (f). If x ≤ X ≤ x + u and x ≤ Y ≤ x + u, then |X − Y| ≤ u. So

𝒫{x ≤ X ≤ x + u}^2 ≤ 𝒫{|X − Y| ≤ u}. *
For (35-36), let X_1, X_2, ... be independent, identically distributed random variables on the probability triple (Ω, ℱ, 𝒫). Suppose 𝒫{X_i = 0} < 1. Let K be a positive number. Let S_n = X_1 + ⋯ + X_n. Let T be the least n if any with |S_n| ≥ K, and T = ∞ if none. Use E for expectation relative to 𝒫.

(35) Lemma. (a) There are A < ∞ and ρ < 1 such that 𝒫{T > n} ≤ Aρ^n.
(b) E{T} < ∞ and 𝒫{T < ∞} = 1.
(c) Either 𝒫{lim sup S_n = ∞} = 1 or 𝒫{lim inf S_n = −∞} = 1.
(c) Either &'{Iim sup Sn = oo} = 1 or &'{Iim inf Sn = - oo} = 1.
NOTE. In (a), the constants A and ρ do not depend on n; the inequality holds for all n.

PROOF. Claim (a). Suppose 𝒫{X_i > 0} > 0; the case 𝒫{X_i < 0} > 0 is symmetric. Find b > 0 so small that 𝒫{X_1 ≥ b} > 0. If 𝒫{X_1 ≥ b} = 1 the proof terminates; so suppose 𝒫{X_1 ≥ b} < 1. Find a positive integer N so large that Nb > 2K. Fix k = 0, 1, .... Now S_{Nk} > −K and X_i ≥ b for i = Nk + 1, ..., N(k + 1) imply S_{N(k+1)} ≥ K. So the relation T > N(k + 1) implies |S_n| < K for 1 ≤ n ≤ Nk, and X_i < b for at least one i = Nk + 1, ..., N(k + 1). Consequently,

𝒫{T > N(k + 1)} ≤ (1 − θ)𝒫{T > Nk},

where θ = 𝒫{X_1 ≥ b}^N > 0. By substituting,

𝒫{T > N(k + 1)} ≤ (1 − θ)^{k+1}.

If Nk ≤ n < N(k + 1),

𝒫{T > n} ≤ 𝒫{T > Nk} ≤ (1 − θ)^k ≤ [1/(1 − θ)]ρ^n,

where ρ = (1 − θ)^{1/N}.

Claim (b) is easy, using (a).

Claim (c). Suppose the claim is false. By Hewitt-Savage (1.122),

𝒫{lim sup S_n < ∞} = 𝒫{lim inf S_n > −∞} = 1.

So

𝒫{sup_n |S_n| < ∞} = 1.

By countable additivity, there is a finite K with

𝒫{sup_n |S_n| < K} > 0.

This contradicts (b). *
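Lemma (35a) can be illustrated exactly. The sketch below is hypothetical, not from the book: it takes ±1 steps and K = 3, computes 𝒫{T > n} by dynamic programming over the walk killed on leaving (−3, 3), and exhibits the geometric decay; for this finite chain the two-step decay ratio settles at cos(π/6)^2 = 3/4.

```python
# Exact illustration of Lemma (35a) on a hypothetical example: +/-1 steps
# and K = 3.  P{T > n} is computed by dynamic programming over the walk
# killed on leaving (-3, 3); the tail decays geometrically, and here the
# two-step decay ratio is cos(pi/6)^2 = 3/4.
def survival_probs(n_max, K=3):
    dist = {0: 1.0}              # law of S_n restricted to the event {T > n}
    probs = []
    for _ in range(n_max):
        new = {}
        for s, q in dist.items():
            for s2 in (s - 1, s + 1):
                if abs(s2) < K:  # |S| >= K means T has already happened
                    new[s2] = new.get(s2, 0.0) + 0.5 * q
        dist = new
        probs.append(sum(dist.values()))
    return probs

p = survival_probs(60)
print(round(p[-1] / p[-3], 10))   # 0.75
```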

Recall that X_1, X_2, ... are independent and identically distributed, and S_n = X_1 + ⋯ + X_n.

(36) Theorem. If 𝒫{X_1 = c} < 1 for all c, then C_{S_n}(u) → 0 as n → ∞, for each u ≥ 0.

PROOF. If the theorem fails, use (34b, e) to find u > 0 and δ > 0 with C_{S_m}(u) ≥ δ for all m. Let X_1, Y_1, X_2, Y_2, ... be independent and identically distributed. Let U_i = X_i − Y_i and T_n = U_1 + ⋯ + U_n. By (34f),

(37) 𝒫{|T_n| ≤ u} ≥ δ^2 for all n.

Let F be the distribution function of X_1. By Fubini,

𝒫{T_1 = 0} = 𝒫{X_1 = Y_1}
  = ∫ 𝒫{X_1 = y} F(dy)
  < 1.

By (35c) and symmetry,

(38) 𝒫{lim sup T_n = ∞} = 1.

Choose a positive integer N so large that Nδ^2 ≥ 2. I will obtain a contradiction by constructing a positive integer M, and N disjoint intervals I_1, I_2, ..., I_N, so that

𝒫{T_M ∈ I_j} ≥ (3/4)δ^2 for j = 1, ..., N.

Let I_1 = [−u, u]. By (37),

𝒫{T_n ∈ I_1} ≥ (3/4)δ^2 for all n.

As in Figure 6, let τ_1 be the least n with T_n > 3u. As (38) implies, 𝒫{τ_1 < ∞} = 1. Choose positive integers M_1 and K_1 so large that

𝒫{τ_1 ≤ M_1 and T_{τ_1} ≤ K_1} ≥ 3/4.

In particular, K_1 > 3u. Let I_2 = [2u, K_1 + u]. I claim

𝒫{T_n ∈ I_2} ≥ (3/4)δ^2 for all n ≥ M_1.
Figure 6.

Indeed, {T_n ∈ I_2} ⊃ ⋃_{j=1}^{M_1} A_j for n ≥ M_1, where

A_j = {τ_1 = j and T_j ≤ K_1 and |T_n − T_j| ≤ u}.

Now {τ_1 = j and T_j ≤ K_1} is measurable on U_1, ..., U_j; moreover, T_n − T_j is measurable on U_{j+1}, U_{j+2}, ... and is distributed like T_{n−j}. Using (37),

𝒫{A_j} ≥ 𝒫{τ_1 = j and T_j ≤ K_1} δ^2.

Sum out j = 1, ..., M_1:

𝒫{T_n ∈ I_2} ≥ 𝒫{τ_1 ≤ M_1 and T_{τ_1} ≤ K_1} δ^2 ≥ (3/4)δ^2.

To proceed, let τ_2 be the least n > τ_1 with T_n > K_1 + 3u; find M_2 > M_1 and K_2 with

𝒫{τ_2 ≤ M_2 and T_{τ_2} ≤ K_2} ≥ 3/4.

In particular, K_2 > K_1 + 3u. Let I_3 = [K_1 + 2u, K_2 + u]. Stages 3, 4, ..., N should be clear; M is any integer greater than M_N. *
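Theorem (36) can be illustrated with ±1 steps, a hypothetical example not from the book: the atoms of S_n are spaced 2 apart, so an interval of length 1 traps at most one atom, and C_{S_n}(1) is just the largest point mass, which tends to 0.

```python
from math import comb

# Numerical illustration of Theorem (36) on a hypothetical example: for
# +/-1 steps the atoms of S_n sit on a lattice of spacing 2, so an interval
# of length 1 traps at most one atom and C_{S_n}(1) is the largest point
# mass, comb(n, n//2)/2^n, which tends to 0.
vals = [comb(n, n // 2) / 2 ** n for n in (10, 100, 1000)]
print([round(v, 4) for v in vals])   # [0.2461, 0.0796, 0.0252]
```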

The much more interesting inequality (39) of Kolmogorov shows C_{S_n}(u) = O(n^{−1/2}), instead of o(1). This result is included because of its interest; it is not used in the book. I learned the proof from Lucien LeCam. A reference is (Rogozin, 1961).

(39) Theorem. Let n be a positive integer. Let X_1, ..., X_n be independent random variables, perhaps with different distributions. Let u and v be nonnegative numbers. Then

C_{X_1+⋯+X_n}(u) ≤ (3/2)(1 + ⟨u/v⟩){Σ_{i=1}^n [1 − C_{X_i}(v)]}^{−1/2}.

In the formula, 0/0 = 1 and u/0 = ∞ for u > 0; while ⟨x⟩ is the greatest integer which does not exceed x.
The main lemma (40) is combinatorial. It leads to an estimate (42) of the concentration function of a sum of symmetric, two-point variables. General variables turn out (45) to have a symmetric part. This does the case u = v = 2 in (49). The case u = v > 0 follows by scaling, and u = v = 0 by continuity. Finally, the general case drops out of subadditivity.

Here are the preliminaries. Write #G for the cardinality of G, namely the number of elements of G. Let F be a finite set. If A ⊂ F and B ⊂ F, say A and B are incomparable iff neither A ⊂ B nor B ⊂ A. For example, A and B are incomparable provided A ≠ B and #A = #B. A family ℱ of subsets of F is incomparable iff A ∈ ℱ and B ∈ ℱ and A ≠ B imply A and B are incomparable. Any family of subsets with common cardinality is incomparable. Let n = #F. The family 𝒜 of subsets of F having cardinality ⟨n/2⟩ is incomparable and has cardinality (n choose ⟨n/2⟩).
(40) Lemma. #ℱ ≤ (n choose ⟨n/2⟩) for any incomparable family ℱ of subsets of F, where n = #F.
PROOF. Suppose n is even. Let ℱ be an incomparable family with maximal #ℱ. I assert that #A = n/2 for all A ∈ ℱ. Since ℱ* = {F∖A : A ∈ ℱ} is also incomparable, and #ℱ* = #ℱ is also maximal, I only have to show that A ∈ ℱ implies #A ≥ n/2.

By way of contradiction, suppose

r = min {#A : A ∈ ℱ} < n/2,

and suppose A_1, ..., A_j in ℱ have cardinality r, while other A ∈ ℱ have #A > r. Consider the set Δ of all pairs (A_i, x), such that i = 1, ..., j, and for each i, the point x ∈ F∖A_i. Of course,

#Δ = j(n − r).

Let ℱ_0 be the family of subsets of F of the form A_i ∪ {x} for (A_i, x) ∈ Δ. This representation is not unique, and I must now estimate #ℱ_0. Consider the set Δ_0 of all pairs (B, y), where B ∈ ℱ_0 and y ∈ B. Plainly

#Δ_0 = (#ℱ_0)(r + 1).

Now (A_i, x) → (A_i ∪ {x}, x) is a 1-1 mapping of Δ into Δ_0, so

(#ℱ_0)(r + 1) = #Δ_0 ≥ #Δ = j(n − r).

But r < n/2 and n is even, so r ≤ (n − 2)/2, and n − r > r + 1. Therefore,

#ℱ_0 > j.

Let ℱ′ = ℱ_0 ∪ (ℱ∖{A_1, ..., A_j}). So #ℱ′ > #ℱ. I will argue that ℱ′ is incomparable. This contradicts the maximality of #ℱ, and settles the even case. First, ℱ_0 is incomparable since all A ∈ ℱ_0 have the same cardinality. Second, ℱ∖{A_1, ..., A_j} is incomparable because ℱ is. Third, if (A_i, x) ∈ Δ and B ∈ ℱ∖{A_1, ..., A_j}, then

#(A_i ∪ {x}) ≤ #B,

so B ⊂ A_i ∪ {x} entails B = A_i ∪ {x} and A_i ⊂ B. And A_i ∪ {x} ⊂ B also implies A_i ⊂ B. But A_i ⊂ B contradicts the incomparability of ℱ. This completes the argument that ℱ′ is incomparable, so the proof for even n.

Suppose n is odd. Let ℱ be incomparable with maximal #ℱ. The argument for even n shows that A ∈ ℱ has

#A = (n − 1)/2 or (n + 1)/2.

Suppose some A ∈ ℱ have #A = (n − 1)/2 and some #A = (n + 1)/2. Let A_1, ..., A_j have cardinality (n − 1)/2. Repeat the argument for even n to construct an incomparable family ℱ′ with #ℱ′ ≥ #ℱ, and all A ∈ ℱ′ having #A = (n + 1)/2. *
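The Sperner bound (40) can be verified by brute force for a small F; the sketch below (hypothetical, not from the book) scans every family of subsets of a 4-element set and confirms that the largest incomparable family is the middle layer, with (4 choose 2) = 6 members.

```python
from itertools import combinations
from math import comb

# Brute-force check of the Sperner bound (40) for F = {0, 1, 2, 3}: every
# incomparable family of subsets is scanned, and the largest has
# comb(4, 2) = 6 members -- the middle layer.
n = 4
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

# bad[k] = bitmask of the subsets comparable to subsets[k] (other than itself)
bad = []
for i, s in enumerate(subsets):
    m = 0
    for j, t in enumerate(subsets):
        if i != j and (s <= t or t <= s):
            m |= 1 << j
    bad.append(m)

best = 0
for mask in range(1 << len(subsets)):    # all 2^16 families
    mm, ok, size = mask, True, 0
    while mm:
        k = (mm & -mm).bit_length() - 1  # lowest member of the family
        if bad[k] & mask:                # a comparable pair inside the family
            ok = False
            break
        size += 1
        mm &= mm - 1
    if ok:
        best = max(best, size)
print(best, comb(n, n // 2))   # 6 6
```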

To state (41), let x_1, ..., x_n be real numbers greater than 1. Let V be the set of n-tuples v = (v_1, ..., v_n) of ±1. Let a be a real number. Let U be the set of v ∈ V with

a ≤ Σ_{i=1}^n v_i x_i ≤ a + 2.

(41) Lemma. #U ≤ (n choose ⟨n/2⟩).

PROOF. For v ∈ V, let A(v) = {i : i = 1, ..., n and v_i = 1}. For u ≠ v in U, the sets A(u) and A(v) are incomparable, because all x_i > 1. Use (40). *

To state (42), let X_1, ..., X_n be independent random variables on (Ω, ℱ, 𝒫). Suppose for each i there is a nonnegative real number x_i with

𝒫{X_i = x_i} = 𝒫{X_i = −x_i} = 1/2.

Let m be the number of x_i > 1.

(42) Lemma. C_{X_1+⋯+X_n}(2) ≤ 2^{−m}(m choose ⟨m/2⟩).

PROOF. By (34e), it is enough to do the case m = n. This case is immediate from (41). *
(43) Lemma. 2^{−n}(n choose ⟨n/2⟩) ≤ (1 + n)^{−1/2}.

PROOF. Suppose first that n is even. Abbreviate

t_m = 2^{−2m}(2m choose m).

By algebra,

t_{m+1} = [(2m + 1)/(2m + 2)] t_m,

so

(1 + 2m + 2)^{1/2} t_{m+1} = [(2m + 1)^{1/2}(2m + 3)^{1/2}/(2m + 2)] (1 + 2m)^{1/2} t_m.

Geometric means are less than arithmetic means, so (2m + 1)^{1/2}(2m + 3)^{1/2} ≤ 2m + 2, and

(1 + 2m)^{1/2} t_m decreases as m increases.

Its value at m = 0 is 1, so t_m ≤ (1 + 2m)^{−1/2}. This proves the even n case. Parenthetically, (2m)^{1/2} t_m increases with m.

Suppose n is odd. By algebra,

2^{−2m−1}(2m + 1 choose m) = [(2m + 1)/(2m + 2)] 2^{−2m}(2m choose m),

and by inspection,

[(2m + 1)/(2m + 2)](1 + 2m)^{−1/2} ≤ (1 + 2m + 1)^{−1/2};

so the lemma holds for odd n because it does for even n. *
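Lemma (43) is easy to check numerically:

```python
from math import comb

# Check of Lemma (43): 2^(-n) (n choose <n/2>) <= (1 + n)^(-1/2) for every n,
# with <n/2> the greatest integer not exceeding n/2.
ok = all(comb(n, n // 2) / 2 ** n <= (1 + n) ** -0.5 for n in range(1, 500))
print(ok)   # True
```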

To state (44), let W be a nonnegative random variable, and k a positive real number.

(44) Lemma. If Var W ≤ E(W) = k, then E[(1 + W)^{−1/2}] ≤ (3/2)k^{−1/2}.

PROOF. Define the function g on [0, ∞) by g(x) = (1 + x)^{−1/2}. Verify that g is convex and decreasing. Let f(0) = g(0); let f be linear on [0, k]; and let f(x) = g(k) for x ≥ k. See Figure 7. Algebraically,

f(x) = k^{−1}[1 − g(k)](x − k)^− + g(k),

where y^− = max {−y, 0}. Now g ≤ f, so E[g(W)] ≤ E[f(W)]. But

E[(W − k)^−] = (1/2)E[|W − k|] ≤ (1/2){E[(W − k)^2]}^{1/2} ≤ (1/2)k^{1/2},

by Schwarz. Consequently,

E[g(W)] ≤ k^{−1}[1 − g(k)](1/2)k^{1/2} + g(k)
  ≤ (1/2)k^{−1/2} + k^{−1/2}
  = (3/2)k^{−1/2}. *

Figure 7.
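Lemma (44) can be checked on a Poisson(k) variable W, for which Var W = E(W) = k exactly; the example is hypothetical, not from the book.

```python
import math

# Check of Lemma (44) on a Poisson(k) variable W, for which Var W = E(W) = k:
# E[(1 + W)^(-1/2)] should be at most (3/2) k^(-1/2).
def lhs(k, terms=400):
    term = math.exp(-k)          # P{W = 0}
    total = term                 # j = 0 contributes (1 + 0)^(-1/2) = 1
    for j in range(1, terms):
        term *= k / j            # P{W = j} from P{W = j - 1}
        total += term * (1 + j) ** -0.5
    return total

for k in (1, 5, 25):
    print(k, round(lhs(k), 4), round(1.5 / math.sqrt(k), 4))
```

The recursive update for the Poisson weights avoids the huge factorials that a direct formula would produce.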

For (45-50), let X_1, ..., X_n be independent random variables, with distribution functions F_1, ..., F_n and concentration functions C_1, ..., C_n, respectively. The best case to think about is continuous, strictly increasing F_i. Do not assume the F_i are equal. There are nondecreasing functions f_1, ..., f_n on (0, 1), whose distribution functions with respect to Lebesgue measure are F_1, ..., F_n: for example, f_i(y) = inf {x : F_i(x) > y}. Let

g_i(y) = [f_i(1 − y) + f_i(y)]/2 and h_i(y) = [f_i(1 − y) − f_i(y)]/2.

On a convenient probability triple (Ω, ℱ, 𝒫), construct independent random variables Y_1, ..., Y_n, δ_1, ..., δ_n, where each Y_i is uniformly distributed on [0, 1/2] and each δ_i is ±1 with probability 1/2. Let

Z_i = g_i(Y_i) + δ_i h_i(Y_i).

(45) Lemma. {Z_i : i = 1, ..., n} is distributed like {X_i : i = 1, ..., n}.



PROOF. Plainly, the Z_i are 𝒫-independent. Let φ be a bounded, measurable function on (−∞, ∞). Check this computation, where E is expectation relative to 𝒫:

E{φ[g_i(Y_i) + δ_i h_i(Y_i)]} = (1/2)E{φ[g_i(Y_i) − h_i(Y_i)]} + (1/2)E{φ[g_i(Y_i) + h_i(Y_i)]}
  = (1/2)E{φ[f_i(Y_i)]} + (1/2)E{φ[f_i(1 − Y_i)]}
  = ∫_0^{1/2} φ[f_i(y)] dy + ∫_0^{1/2} φ[f_i(1 − y)] dy
  = ∫_0^{1/2} φ[f_i(y)] dy + ∫_{1/2}^1 φ[f_i(y)] dy
  = ∫_0^1 φ[f_i(y)] dy
  = ∫_{−∞}^∞ φ(x) F_i(dx). *
(46) Lemma. 𝒫{h_i(Y_i) > 1} ≥ 1 − C_i(2).
PROOF. Fix y with

(47) 0 < y < (1/2)[1 − C_i(2)].

Then y < 1/2, so

(48) f_i(y) ≤ f_i(z) ≤ f_i(1 − y) for y ≤ z ≤ 1 − y.

I say

C_i(2) < 1 − 2y
  ≤ Lebesgue {z : f_i(y) ≤ f_i(z) ≤ f_i(1 − y)}
  = Prob {f_i(y) ≤ X_i ≤ f_i(1 − y)}:

the first line follows from (47), the second from (48), and the third from the fact that the Lebesgue distribution of f_i coincides with the distribution of X_i. From the definition of the concentration function,

f_i(1 − y) − f_i(y) > 2;

so (47) implies h_i(y) > 1. Therefore,

{Y_i < (1/2)[1 − C_i(2)]} ⊂ {h_i(Y_i) > 1}.

But Y_i is uniform on [0, 1/2]. *
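The symmetrization in (45)-(46) can be checked by simulation; the sketch below (hypothetical, not from the book) uses the exponential distribution, whose quantile function is f(y) = −log(1 − y), and verifies that Z = g(Y) + δh(Y) again has mean 1 and median log 2.

```python
import math, random

# Monte Carlo sketch of the symmetrization in (45), with the exponential
# distribution as a hypothetical example: f(y) = -log(1 - y) is a quantile
# function for F(x) = 1 - exp(-x), and Z = g(Y) + delta*h(Y) should be
# distributed like F again (mean 1, median log 2).
random.seed(1)

f = lambda y: -math.log(1.0 - y)
g = lambda y: (f(1.0 - y) + f(y)) / 2.0
h = lambda y: (f(1.0 - y) - f(y)) / 2.0

N = 200_000
total, below = 0.0, 0
for _ in range(N):
    y = max(random.uniform(0.0, 0.5), 1e-12)  # keep f(1 - y) finite at y = 0
    delta = random.choice((-1, 1))
    z = g(y) + delta * h(y)
    total += z
    below += z <= math.log(2.0)

print(round(total / N, 3), round(below / N, 3))   # near 1.0 and 0.5
```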
(49) Proposition. Let n be a positive integer. Let X_1, ..., X_n be independent random variables, perhaps with different distributions. Then

C_{X_1+⋯+X_n}(2) ≤ (3/2){Σ_{i=1}^n [1 − C_{X_i}(2)]}^{−1/2}.

PROOF. Remember that X_i has distribution function F_i and concentration function C_i. Remember that f_i is a nondecreasing function on (0, 1) whose Lebesgue distribution is F_i. Remember

g_i(y) = [f_i(1 − y) + f_i(y)]/2 and h_i(y) = [f_i(1 − y) − f_i(y)]/2.

Remember Y_1, ..., Y_n, δ_1, ..., δ_n are independent random variables on (Ω, ℱ, 𝒫); the Y_i are uniform over [0, 1/2]; the δ_i are ±1 with probability 1/2. Remember

Z_i = g_i(Y_i) + δ_i h_i(Y_i).

Remember that E is expectation relative to 𝒫.

Let φ_i(y) = 1 or 0 according as h_i(y) > 1 or h_i(y) ≤ 1, for 0 ≤ y ≤ 1/2. Let Q be the 𝒫-distribution of Y = (Y_1, ..., Y_n), a probability on S, the set of all n-tuples y = (y_1, ..., y_n) with 0 ≤ y_i ≤ 1/2. For y ∈ S, let

φ(y) = φ_1(y_1) + ⋯ + φ_n(y_n).

Let a be a real number. Let

π = Prob {a ≤ Σ_{i=1}^n X_i ≤ a + 2}.

By (45) and Fubini,

π = 𝒫{a ≤ Σ_{i=1}^n Z_i ≤ a + 2}
  = ∫_S 𝒫{a ≤ Σ_{i=1}^n g_i(y_i) + Σ_{i=1}^n h_i(y_i)δ_i ≤ a + 2} Q(dy).

For each vector y, the integrand is at most

2^{−φ(y)}(φ(y) choose ⟨φ(y)/2⟩) ≤ [1 + φ(y)]^{−1/2}

by (42-43). Therefore,

π ≤ E{[1 + φ(Y)]^{−1/2}}.

Abbreviate p_i = 𝒫{h_i(Y_i) > 1} and verify this computation:

E[φ(Y)] = Σ_{i=1}^n p_i;
Var [φ(Y)] = Σ_{i=1}^n Var [φ_i(Y_i)]
  ≤ Σ_{i=1}^n E{[φ_i(Y_i)]^2}
  = Σ_{i=1}^n E{φ_i(Y_i)}
  = E[φ(Y)].

By (44),

π ≤ (3/2)(Σ_{i=1}^n p_i)^{−1/2}.

Finally, use (46). *

(50) Proposition. Let n be a positive integer. Let X_1, ..., X_n be independent random variables, perhaps with different distributions. Let v be a nonnegative number. Then

C_{X_1+⋯+X_n}(v) ≤ (3/2){Σ_{i=1}^n [1 − C_{X_i}(v)]}^{−1/2}.

PROOF. For v > 0, use (49) and (34a). Then let v ↓ 0 and use (34c). *
PROOF OF (39). It is harmless to suppose v > 0. Abbreviate C = C_{X_1+⋯+X_n}. If u < v, then C(u) ≤ C(v) by (34b), and the inequality (39) holds by (50). Let m be a positive integer and mv ≤ u < (m + 1)v. Then ⟨u/v⟩ = m. By (34b, d),

C(u) ≤ C[(m + 1)v]
  ≤ (m + 1)C(v)
  = [1 + ⟨u/v⟩]C(v).

Use (50). *
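Theorem (39) can be checked numerically; the example below is hypothetical, not from the book: with u = v = 1 and ±1 steps, each C_{X_i}(1) = 1/2, and the exact concentration C_{S_n}(1) is the largest point mass of S_n, since the atoms are spaced 2 apart.

```python
from math import comb

# Numerical check of Theorem (39) with hypothetical numbers: take u = v = 1
# and n steps of +/-1, so each C_{X_i}(1) = 1/2 and the bound reads
#   C_{S_n}(1) <= (3/2)(1 + <u/v>)(n/2)^(-1/2),  with <u/v> = 1.
# The exact value of C_{S_n}(1) is the largest point mass of S_n.
checked = []
for n in (4, 16, 64, 256):
    exact = comb(n, n // 2) / 2 ** n
    bound = 1.5 * (1 + 1) * (n / 2) ** -0.5
    checked.append(exact <= bound)
print(all(checked))   # True
```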
4

THE BOUNDARY

1. INTRODUCTION

This chapter is based on work of Blackwell (1955, 1962), Doob (1959), Feller (1956), and Hunt (1960). Let P be a substochastic matrix on I: that is, P(i, j) ≥ 0 and Σ_{j∈I} P(i, j) ≤ 1. Let P^0 be the identity matrix, and G = Σ_{n=0}^∞ P^n. Suppose G < ∞. By (1.51), this is equivalent to saying that all i ∈ I are transient. Let p be a probability on I such that pG(i) > 0 for all i ∈ I. Here pG(i) means Σ_{j∈I} p(j)G(j, i). A function h on I is excessive iff:

(1) h ≥ 0;
(2) Σ_{i∈I} p(i)h(i) = 1;

and

(3) Σ_{j∈I} P(i, j)h(j) ≤ h(i) for all i ∈ I.

Check, h(i) < ∞ for all i ∈ I. If equality holds in (3), then h is harmonic. Because of (2), these definitions are relative to the reference probability p. Throughout, i, j are used for generic elements of I, and h for a generic excessive function.

The set of h is convex. One object of this chapter is to identify the extreme h, and prove that any h can be represented as a unique integral average of extreme h. This is equivalent to constructing a regular conditional distribution for the Markov chain given the invariant σ-field.
Give I the σ-field of all its subsets. Let Ω* be the set of all finite, nonempty I-sequences, with the σ-field of all its subsets. Let I^∞ be the set of infinite I-sequences, with the product σ-field. Give Ω* ∪ I^∞ the σ-field generated by all the subsets of Ω* and all the measurable subsets of I^∞. Let ξ_0, ξ_1, ... be the coordinate processes on Ω* ∪ I^∞. Let Ω be the union of Ω* and the set Ω^∞ of ω ∈ I^∞ such that ξ_n(ω) = i for only finitely many n, for each i ∈ I. Retract ξ_0, ξ_1, ... to Ω, and give Ω the relative σ-field.

I want to thank Isaac Meilijson for checking the final draft of this chapter.
For any probability q on I, let P_q be the probability on Ω for which the coordinate process is Markov with starting probability q and stationary transitions P. Of course, P_q(Ω) = 1 because I is transient and q is a probability. If q(i) = 1, write P_i for P_q. Let I^h = {i : i ∈ I and h(i) > 0}. Let (ph)(i) = p(i)h(i), and

P^h(i, j) = h(i)^{−1}P(i, j)h(j)

for i and j in I^h. Plainly, P^h is substochastic on I^h,

(P^h)^n = (P^n)^h and G^h(i, j) = h(i)^{−1}G(i, j)h(j) = Σ_{n=0}^∞ (P^h)^n(i, j) < ∞.

Abbreviate Q_h for P^h_{ph}, the probability on Ω such that the coordinate process is Markov with starting distribution ph and stationary transitions P^h. That is,

Q_h{ξ_0 = i_0, ..., ξ_n = i_n} = 0

unless i_0, ..., i_n are all in I^h, in which case

Q_h{ξ_0 = i_0, ..., ξ_n = i_n} = (ph)(i_0) Π_{m=0}^{n−1} P^h(i_m, i_{m+1}).

Let 𝒥 be the invariant σ-field of Ω; the definition will be given later. The main result will now be summarized. There is one subset H ∈ 𝒥, of Q_h-measure 1 for all h; and for each i ∈ I, there is a 𝒥-measurable function g(i) from H to the real line, such that for each ω ∈ H, the function g(·)(ω) on I is excessive; moreover, abbreviating Q_ω for Q_{g(·)(ω)}, the mapping ω → Q_ω is a regular conditional Q_h-probability given 𝒥 for all h.

This result, and general reasoning, give the following. Let E be the set of ω ∈ H satisfying:

Q_ω{ω′ : ω′ ∈ H and Q_{ω′} = Q_ω} = 1;

and

Q_ω(Ω^∞) = 1 for ω ∈ Ω^∞.

Then E ∈ 𝒥 and Q_h(E) = 1. Let ℰ be the σ-field of subsets of E measurable on ω → Q_ω. Then there is one and only one probability on ℰ, namely Q_h, integrating g to h. Thus, as ω runs through E, the function g(·)(ω) on I runs through the extreme, excessive h. Finally, the extreme, excessive h which are not harmonic are precisely the functions G(·, j)/pG(j) as j varies over I.

The function g(i) is a version of the Radon-Nikodym derivative of P_i with respect to P_p, retracted to 𝒥. The main difficulty is to choose this version properly from the point of view of the Q_h, for as h varies through the

extreme excessive functions, the Q_h are mutually orthogonal. There are two properties g(i) must have from the point of view of Q_h: it must vanish a.e. when i ∉ I^h; and g(i)/h(i) must be a version of the Radon-Nikodym derivative of P^h_i with respect to Q_h, when retracted to 𝒥, for i ∈ I^h. Perhaps the leading special case is the following: p concentrates on one state, P is stochastic, G(j, k) > 0 for all j, k ∈ I, and h is harmonic. Then h is positive everywhere and Ω* is not needed. Moreover, Ω_n has measure 1 if p(I_n) = 1; these quantities will be defined later. Section 2 contains proofs. Section 3 contains the following theorem:

G(i, ξ_n)/pG(ξ_n) converges to a finite limit P_p-almost surely.

Section 4 contains examples. Section 5 contains related material, which will be referred to in ACM.
NOTATION. (a) G used to be called e_P in (1.49).
(b) Ω* does not include the empty sequence, thereby differing from I* in Section 1.3.
(c) If S is a set, ℱ is a σ-field of subsets of S, and F ∈ ℱ, then Fℱ is the σ-field of subsets of F of the form F ∩ A, with A ∈ ℱ. This σ-field is called ℱ relativized to F, or the relative σ-field if F and ℱ are understood. This notation is only used when F ∈ ℱ.

2. PROOFS

Recall that Ω = Ω* ∪ Ω^∞, where Ω* is the set of nonempty, finite I-sequences, and Ω^∞ is the set of infinite I-sequences which visit each state only finitely often. And {ξ_n} is the coordinate process on Ω. The shift T is this mapping of Ω into Ω. If ω ∈ Ω^∞, then Tω ∈ Ω^∞ and

ξ_n(Tω) = ξ_{n+1}(ω) for n = 0, 1, ....

If ω ∈ Ω* has length m ≥ 2, then Tω ∈ Ω* has length m − 1 and

ξ_n(Tω) = ξ_{n+1}(ω) for n = 0, ..., m − 2.

If ω ∈ Ω* has length 1, then Tω = ω. For all ω, let T^0ω = ω, and T^{n+1}ω = TT^nω. The invariant σ-field 𝒥 of Ω is the σ-field of measurable subsets A of Ω which are invariant: T^{−1}A = A. Let 𝒥* = Ω*𝒥 and 𝒥^∞ = Ω^∞𝒥.

The first lemma below gives a more constructive definition of 𝒥. To state it, and for use throughout the section, fix a sequence I_n of finite subsets of I which increase to I; let Ω_n be the set of ω ∈ Ω with ξ_m(ω) ∈ I_n for some m = 0, 1, .... On Ω_n, let τ_n be the largest m with ξ_m ∈ I_n, and let Y_n = ξ_{τ_n}.

Let T_n be this mapping from Ω_n to Ω:

T_n(ω) = T^{τ_n(ω)}(ω).

Verify the measurability of T_n. Let 𝒥_n be the σ-field generated by the subsets of Ω_n measurable on T_n, and all measurable subsets of Ω∖Ω_n. For ω ∈ Ω*, let L(ω) be the last defined coordinate of ω.
(4) Lemma. (a) 𝒥_n ↓ 𝒥.
(b) 𝒥* is the σ-field of subsets of Ω* generated by L.

PROOF. Assertion (a). I claim that 𝒥_n decreases. To begin with, T_{n+1} = T_{n+1} ∘ T_n on Ω_n. If A ∈ Ω_{n+1}𝒥_{n+1}, then A = T_{n+1}^{−1}B for some measurable B. So

A = (T_n^{−1}T_{n+1}^{−1}B) ∪ (A∖Ω_n) ∈ 𝒥_n.

And Ω_n increases with n, completing the proof that 𝒥_n decreases. I claim that ⋂_n 𝒥_n ⊂ 𝒥. Indeed, let A ∈ ⋂_n 𝒥_n, and fix ω. I have to show ω ∈ A iff Tω ∈ A. Suppose ω has length at least 2, for otherwise ω = Tω. Fix a positive integer n so large that ξ_1(ω) ∈ I_n. Then ω and Tω are in Ω_n, and T_n(ω) = T_n(Tω). But A ∈ 𝒥_n, so

A = (T_n^{−1}B) ∪ (A∖Ω_n)

for some measurable B. This shows ω ∈ A iff Tω ∈ A. Finally, I claim 𝒥 ⊂ 𝒥_n. Indeed, let A ∈ 𝒥. The problem is to show A ∩ Ω_n ∈ 𝒥_n. But

A ∩ Ω_n = T_n^{−1}A ∈ 𝒥_n.

Assertion (b) is immediate from (a) and

(5) L(ω) = Y_n(ω) = ξ_0(T_nω) for all large n. *

WARNING. Ω_n ∉ 𝒥_{n+1}.
(6) Lemma. Let i ∈ I_N. Then P_i is absolutely continuous with respect to P_p when both probabilities are retracted to Ω_N𝒥_N, a version of the Radon-Nikodym derivative on Ω_N being G(i, Y_N)/pG(Y_N).

PROOF. Fix M and j_0, ..., j_M, with j_0 ∈ I_N and j_1 ∉ I_N, ..., j_M ∉ I_N. Let

π = P(j_0, j_1) ⋯ P(j_{M−1}, j_M) · P_{j_M}{ξ_n ∈ I_N for no n > 0}.

The event

A_m = {Ω_N and ξ_{τ_N} = j_0, ..., ξ_{τ_N+M} = j_M and τ_N = m}

is the same as

A_m = {ξ_m = j_0, ..., ξ_{m+M} = j_M and ξ_n ∈ I_N for no n > m}.

Let q be a generic probability on I. Then Markov (1.15) makes

P_q(A_m) = P_q{ξ_m = j_0} · π.

Let A = ⋃_{m=0}^∞ A_m, so

A = {Ω_N and ξ_{τ_N} = j_0, ..., ξ_{τ_N+M} = j_M}.

Sum out m:

P_q{A} = qG(j_0) · π.

Let q assign mass 1 to i or let q = p; remember Y_N = j_0 on A:

P_i(A) = G(i, j_0) · π = ∫_A [G(i, Y_N)/pG(Y_N)] dP_p.

Use (10.16) to vary A. *
Let C_i be the set where G(i, Y_n)/pG(Y_n) converges to a finite limit as n → ∞. Call the limit g(i). Of course, g(i) may be 0. Plainly, C_i ∈ 𝒥, and g(i) is 𝒥-measurable.

(7) Lemma. P_i is absolutely continuous with respect to P_p, when both probabilities are retracted to 𝒥. Moreover, P_p(C_i) = 1, and g(i) is a version of the Radon-Nikodym derivative of P_i with respect to P_p, when both probabilities are retracted to 𝒥.

PROOF. As (4) and (6) imply, P_i is absolutely continuous with respect to P_p on Ω_N𝒥. Since Ω_N ↑ Ω as N ↑ ∞, the absolute continuity on 𝒥 follows. Use (10.35) on (6) for convergence. *
Remember that the probabilities P^h_i and Q_h are concentrated in the part of Ω where all coordinates are in I^h.

(8) Lemma. If i ∉ I^h, then Q_h(C_i) = 1 and g(i) = 0 with Q_h-probability 1.

PROOF. Clearly, Q_h{Y_n ∈ I^h} = 1. Moreover,

h(i) ≥ Σ_j P(i, j)h(j) ≥ P(i, j)h(j),

proving

(9) i ∉ I^h and j ∈ I^h imply G(i, j) = 0.

So i ∉ I^h makes

G(i, Y_n)/pG(Y_n) = 0 for all n, with Q_h-probability 1. *

A helpful consequence of (9) is that for all i_0, ..., i_n in I, whether in I^h or not,

(10) Q_h{ξ_0 = i_0, ..., ξ_n = i_n} = p(i_0)P(i_0, i_1) ⋯ P(i_{n−1}, i_n)h(i_n).

Suppose j ∈ I^h. Remember that P^h acts on I^h. So

(11) phG^h(j) = Σ_{i∈I^h} p(i)h(i) · h(i)^{−1}G(i, j)h(j)
  = Σ_{i∈I^h} p(i)G(i, j)h(j)
  = Σ_{i∈I} p(i)G(i, j)h(j) by (9)
  = pG(j)h(j)
  > 0.
(12) Lemma. Let i ∈ I^h. Then Q_h(C_i) = 1. Moreover, P^h_i is absolutely continuous with respect to Q_h, with Radon-Nikodym derivative g(i)/h(i), provided both probabilities are retracted to 𝒥.

PROOF. Keep n so large that i ∈ I_n. Then P^h_i is absolutely continuous with respect to Q_h when both are retracted to Ω_n𝒥_n, a Radon-Nikodym derivative being

G^h(i, Y_n)/phG^h(Y_n) = h(i)^{−1}[G(i, Y_n)/pG(Y_n)]

when Y_n ∈ I^h. This follows from (6) on P^h and (11), in the part of Ω where all coordinates are in I^h. But this part has probability 1, both for P^h_i and Q_h. Now (7) on P^h makes this derivative sequence converge Q_h-almost surely in the part of Ω where all coordinates are in I^h, which is Q_h-almost all of Ω. This makes Q_h(C_i) = 1. Lemma (7) also identifies the limit of the derivative sequence, namely g(i)/h(i), as the required Radon-Nikodym derivative. *
Let C = ⋂_i C_i. So C ∈ 𝒥, and Q_h(C) = 1 by (8) and (12).

(13) Lemma. h(i) = ∫_C g(i) dQ_h.

PROOF. For i ∉ I^h, use (8). For i ∈ I^h, use (12). *

The next step is to calculate the conditional distribution given 𝒥.
(14) Lemma. Let i_0, ..., i_n ∈ I. A version of the conditional Q_h-probability that ξ_0 = i_0, ..., ξ_n = i_n, given 𝒥, is

p(i_0)P(i_0, i_1) ⋯ P(i_{n−1}, i_n)g(i_n).

PROOF. Let B be the event that ξ_0 = i_0, ..., ξ_n = i_n. Let A ∈ 𝒥. Check

T^{−n}{ξ_0 = i_n and A} = {ξ_n = i_n and A}.

By Markov (1.15),

Q_h(B ∩ A) = Q_h(B) P^h_{i_n}(A).

By (10),

Q_h(B) = P_p(B) h(i_n).

By (8) and (12),

h(i_n) P^h_{i_n}(A) = ∫_A g(i_n) dQ_h.

That is,

Q_h(B ∩ A) = ∫_A P_p(B) g(i_n) dQ_h. *
Let H = {ω : ω ∈ C and g(·)(ω) is excessive}. Check H ∈ 𝒥.

(15) Lemma. Q_h(H) = 1.

PROOF. Clearly, (1) is satisfied: g(·)(ω) ≥ 0 for all ω ∈ C. Next, I will work on (2): the set of ω ∈ C such that

Σ_{i∈I} p(i)g(i)(ω) = 1

has to have Q_h-probability 1 for all h. But Q_h{ξ_0 ∈ I^h} = 1. So with Q_h-probability 1,

1 = Q_h{ξ_0 ∈ I^h | 𝒥}
  = Σ_{i∈I^h} Q_h{ξ_0 = i | 𝒥}
  = Σ_{i∈I^h} p(i)g(i) by (14).

By (8), the sum can be extended over all of I, at the expense of changing the exceptional null set.

Finally, I will work on (3): for each i ∈ I, the set of ω such that

Σ_{j∈I} P(i, j)g(j)(ω) ≤ g(i)(ω)

has to have Q_h-probability 1 for all h. Suppose first i ∉ I^h. Then Q_h{g(i) = 0} = 1 by (8). If j ∉ I^h, use (8); if j ∈ I^h, use (9); either way

Q_h{P(i, j)g(j) = 0} = 1.

Suppose next i ∈ I^h. Use (9) and pG(i) > 0 to find n and i_0, ..., i_{n−2} in I^h with

π = p(i_0)P(i_0, i_1) ⋯ P(i_{n−2}, i) > 0.

Let

A = {ξ_0 = i_0, ..., ξ_{n−2} = i_{n−2} and ξ_{n−1} = i}.

Then A ⊃ {A and ξ_n ∈ I^h}. Use (14): with Q_h-probability 1,

πg(i) = Q_h{A | 𝒥}
  ≥ Q_h{A and ξ_n ∈ I^h | 𝒥}
  = Σ_{j∈I^h} Q_h{A and ξ_n = j | 𝒥}
  = π Σ_{j∈I^h} P(i, j)g(j).

Divide out π: with Q_h-probability 1,

g(i) ≥ Σ_{j∈I^h} P(i, j)g(j).

In view of (9), the sum can be extended over all of I, by changing the exceptional null set. *
For ω ∈ H, let Q_ω be the probability on Ω making the coordinate process a Markov chain with starting probability pg(·)(ω) and stationary transitions P^{g(·)(ω)}.

(16) Theorem. ω → Q_ω is a regular conditional Q_h-probability, given 𝒥.

PROOF. This follows from (14), (15), and (10.16). *
Let E_0 be the set of ω ∈ H such that Q_ω(Ω*) is 1 or 0, according as ω ∈ Ω* or ω ∈ Ω^∞. Let E_1 be the set of ω ∈ H such that g(·)(ω′) = g(·)(ω) for Q_ω-almost all ω′. Let E = E_0 ∩ E_1. Let ℰ be the σ-field of subsets of E measurable on g. In particular, ℰ is countably generated. Verify that Q_h determines h. Consequently, E_1 is the set of ω ∈ H such that Q_{ω′} = Q_ω for Q_ω-almost all ω′ ∈ H. And ℰ is the σ-field generated by ω → Q_ω. The atoms of ℰ are precisely the sets of constancy of ω → g(·)(ω), namely the sets of constancy of ω → Q_ω: use (10.18).

(17) Theorem. (a) E ∈ 𝒥 and Q_h(E) = 1.
(b) Q_ω(A) = 1_A(ω) for ω ∈ E and A ∈ ℰ.

PROOF. The set Ω* ∈ 𝒥, and E_0 is the set where ω → Q_ω(Ω*) is equal to the indicator function of Ω*. Thus E_0 ∈ 𝒥 and Q_h(E_0) = 1. Moreover, E_1 ∈ 𝒥 and Q_h(E_1) = 1 by (10.52). Finally, ω ∈ E_1 makes Q_ω concentrate on the ℰ-atom containing ω. *
(18) Theorem. There is one and only one probability m on ℰ, such that

∫_E g dm = h.

Namely, m is Q_h retracted to ℰ.

PROOF. The retraction of Q_h to ℰ works by (13) and (17). For the uniqueness, suppose m on ℰ satisfies

∫_E g dm = h.

As (10) and (10.16) imply,

Q_h = ∫_E Q_ω m(dω).

Use (17b): for A ∈ ℰ,

Q_h(A) = ∫_E Q_ω(A) m(dω) = m(A). *

(19) Corollary. As ω runs through E, the function g(·)(ω) on I runs through the set of extreme, excessive h. Moreover, h is extreme iff Q_h{g = h} = 1.

PROOF. Using (18), the mapping m → ∫ g dm is 1-1 and affine, from the set of probabilities on ℰ onto the set of h. The inverse image of h is the retraction of Q_h to ℰ. Now m is extreme iff it is 0-1, by (10.17a). Thus, h is extreme iff Q_h is 0-1 on ℰ. But Q_h is 0-1 on ℰ iff

Q_h{g = h′} = 1

for some h′, using (10.17b, 10.18). Necessarily, h′ = h by (18). And there is an ω ∈ E with g(·)(ω) = h. Next, let ω ∈ E and put h = g(·)(ω). Let m assign mass 1 to the ℰ-atom containing ω, so m{g = h} = 1. Then m is 0-1 and ∫ g dm = h, so h is extreme. *
(20) Lemma. If w ∈ EΩ∞, then g(·)(w) is harmonic.
PROOF. If h is excessive but not harmonic, then Qh(Ω*) > 0. But
Qw(Ω*) = 0, because w ∈ E0. *
(21) Lemma. Fix k ∈ I, and let h = G(·, k)/pG(k). Then h is excessive, and
equality holds in (3) iff i ≠ k. In particular, h determines k.
PROOF. Σ_{j∈I} P(i, j)G(j, k) is the Pi-mean number of visits to k in positive
time. *
NOTE. This observation and Fatou afford another proof of (15).

(22) Lemma. Ω* ⊂ E. And for w ∈ Ω*,

g(·)(w) = G(·, L(w))/pG(L(w)).

Moreover, 𝒥* = Ω*C.

PROOF. Clearly, Ω* ⊂ C and for w ∈ Ω*,

g(·)(w) = G(·, L(w))/pG(L(w)).

As (21) implies, Ω* ⊂ H. Fix k ∈ I. Let A = {w : w ∈ Ω* and L(w) = k}, and
let h = G(·, k)/pG(k).
I say Qh(A) > 0. Indeed, find n and i0, …, in in I with

π = p(i0)P(i0, i1) ⋯ P(in−1, in)P(in, k) > 0.

Let

ρ = h(k) − Σ_j P(k, j)h(j),

which is positive by (21). Let

B = {ξ0 = i0, …, ξn = in and ξn+1 = k}

and let

Bj = {B and ξn+2 = j}.

Use (10):

Qh(B) = πh(k) and Qh(Bj) = πP(k, j)h(j).

Then

Qh(A) ≥ Qh(B \ ∪j Bj)
= Qh(B) − Σj Qh(Bj)
= πρ.

Now Qh(E) = 1 by (17), so E ∩ A is nonempty. But (4b) and (21) make A an
atom of 𝒥; and E ∈ 𝒥: therefore A ⊂ E. Consequently, Ω* ⊂ E. For
w ∈ Ω*, the function g(·)(w) determines L(w) by (21). Consequently,
𝒥* = Ω*C. *
Incidentally, the argument shows Qh(A) = 1. For a more direct proof,
see (33).
(23) Theorem. As k ranges over I, the function G(·, k)/pG(k) ranges over the
extreme, excessive functions which are not harmonic.
PROOF. From (19) and (20), the only candidates for the role of extreme,
excessive, non-harmonic functions are g(·)(w) for w ∈ EΩ*. As (22) implies,
Ω* ⊂ E and each candidate succeeds. *
(24) Remark. C is a countably generated sub σ-field of 𝒥. If A ∈ 𝒥, then
{w : w ∈ E and Qw(A) = 1} ∈ C differs from A by a Qh-null set. In particular,
𝒥 and C are equivalent σ-fields for Qh. However, the σ-field 𝒥 is inseparable.
Each of its atoms is countable. In general, for w ∈ E the probability Qw is
continuous, and therefore assigns measure 0 to the 𝒥-atom containing w.
(25) Remark. h is extreme iff Qh is 0-1 on 𝒥.
PROOF. Proved in (19). *
(26) Remark. Suppose P is stochastic. Then 1 is extreme iff there are no
further bounded harmonic h.
PROOF. For "if," use an argument based on (1). For "only if," suppose
h ≠ 1 is bounded harmonic and ε > 0 is small. Then

1 = ½(1 − ε)·(1 − εh)/(1 − ε) + ½(1 + ε)·(1 + εh)/(1 + ε)

displays 1 as a convex combination of distinct harmonic functions. *

(27) Theorem. Suppose P is stochastic. Then Pp is 0-1 on 𝒥 iff 1 is the only
bounded harmonic h.
PROOF. Use (25) and (26). *
On a first reading of this chapter, skip to Section 3. It is possible to study
the bounded, excessive functions in a little more detail, and in parentheses.
To begin with, (10) implies that Qh is absolutely continuous with respect to
Pp on the first n + 1 coordinates, and has derivative h(ξn). This martingale
converges to dQh/dPp by (10.35). Of course, Qh need not be absolutely
continuous with respect to Pp on the full σ-field. However, if h is bounded by
K then Qh ≤ KPp by what precedes, and h(ξn) converges even in L¹. Con-
versely, if Qh ≤ KPp, even on C, then h is bounded by K in view of (13).
If h* = lim h(ξn), then h(i) = E(h* | ξn = i).
Turn now to extreme, excessive functions which are bounded. The
characterization is simple: h is bounded and extreme iff Pp{g = h} > 0. For
(13) implies 1 = ∫ g dPp. If Pp{g = h} = α > 0, then αh ≤ 1; while h is
extreme by (19). If h is bounded and extreme, then Qh{g = h} = 1 by (19),
and Qh ≤ KPp by the previous paragraph. There are at most countably
many such h, say h1, h2, …. General h can be represented as

Σn qn hn + qc hc + qs hs.

Here the q's are nonnegative numbers which sum to 1. As usual, the h's are
excessive. Retracting Qh and Pp to C,

qn = Qh{g = hn} and qc hc = ∫ g dmc and qs hs = ∫ g dms,

where: mc is the part of Qh which is absolutely continuous with respect to the
continuous part of Pp; and ms is the part of Qh singular with respect to Pp.
In particular, qs Qhs = ms is singular with respect to Pp. As (10.35) implies,
hs(ξn) → 0 with Pp-probability 1 and hs(ξn) → ∞ with Qhs-probability 1.

3. A CONVERGENCE THEOREM

Let e(j) = pG(j). Let

R(i, j) = e(j)P(j, i)/e(i),

so R is substochastic: Σj e(j)P(j, i) is the Pp-mean number of visits to i in
positive time, and is at most e(i). Let S = Σ_{n=0}^∞ R^n, so

S(i, j) = e(j)G(j, i)/e(i) < ∞.

I remind you that ΩN is the set where IN is visited, and τN is the time of the
last visit to IN. On ΩN, let ζm = ξ_{τN−m} for 0 ≤ m ≤ τN. Of course, even ζ0
is only partially defined, namely on ΩN.
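The identities above can be illustrated numerically. The following sketch is not from the text: the 3-state substochastic matrix and the reference probability are arbitrary choices. It sums the series for G, forms R, and checks that R is substochastic and that Σ_n R^n agrees with e(j)G(j, i)/e(i).

```python
P = [[0.2, 0.5, 0.1],
     [0.1, 0.2, 0.4],
     [0.0, 0.3, 0.2]]   # arbitrary transient substochastic transitions on {0,1,2}
p = [1.0, 0.0, 0.0]     # reference probability, concentrated on state 0
n = 3

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def green(M):
    # sum_m M^m, truncated; the tail is negligible since the chain is transient.
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Mm = [row[:] for row in G]
    for _ in range(500):
        Mm = matmul(Mm, M)
        for i in range(n):
            for j in range(n):
                G[i][j] += Mm[i][j]
    return G

G = green(P)
e = [sum(p[i] * G[i][j] for i in range(n)) for j in range(n)]   # e(j) = pG(j)
R = [[e[j] * P[j][i] / e[i] for j in range(n)] for i in range(n)]
S = green(R)

for i in range(n):
    assert sum(R[i]) <= 1.0 + 1e-9                       # R is substochastic
    for j in range(n):
        assert abs(S[i][j] - e[j] * G[j][i] / e[i]) < 1e-6   # S(i,j) = e(j)G(j,i)/e(i)
print("reversal identities check out")
```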

(28) Lemma. ζ is a partially defined Markov chain with stationary transitions
R, relative to Pp.
PROOF. Let i0 ∈ IN, and let i1, …, iM ∈ I. Let

π = ∏_{m=0}^{M−1} P(i_{M−m}, i_{M−m−1}).

For i ∈ IN, let

u(i) = Pi{ξv ∈ IN for no v > 0}.

Let

A = {ζ0 = i0, …, ζM = iM}.

Then

Pp(A) = Σn Pp(An),

where

An = {ξn = iM, …, ξ_{n+M} = i0, and ξv ∈ IN for no v > n + M}.

By Markov (1.15),

Pp(An) = Pp{ξn = iM} · π · u(i0).

Sum out n and manipulate:

Pp(A) = e(iM) · π · u(i0)
= u(i0)e(i0) ∏_{m=0}^{M−1} R(i_m, i_{m+1}). *
(29) Theorem. As n → ∞, the ratio G(i, ξn)/pG(ξn) tends to a limit Pp-
almost surely.
PROOF. Abbreviate ρn = G(i, ξn)/pG(ξn). Let 0 ≤ a < b < ∞. On
ΩN, let βN be the number of downcrossings, as defined for (10.33), of [a, b] by
ρn as n decreases from τN to 0. I will eventually prove inequality (β):

(β)    ∫_{ΩN} βN dPp ≤ 1/(b − a).

For the moment, take (β) on faith. Check that ΩN and βN are nondecreasing
with N. So (β) and monotone convergence imply

∫ limN βN dPp ≤ 1/(b − a).

By (10.10b),

Pp{limN βN < ∞} = 1.

Let Ωg be the intersection of {limN βN < ∞} as a, b vary over the rationals.
Then Ωg is measurable and Pp{Ωg} = 1. You can check that ρn converges
as n → ∞ everywhere on Ωg, because τN → ∞ as N → ∞. The limit is finite
by (31) below.
I will now prove inequality (β). Define R, S, and ζm as for (28). On ΩN, let

Xm = G(i, ξ_{τN−m})/pG(ξ_{τN−m}) = S(ζm, i)/pG(i)

for 0 ≤ m ≤ τN, and let Xm = 0 for m > τN. Let Xm = 0 off ΩN. Let ℱm
be the σ-field spanned by all measurable subsets of

Ω \ {ΩN and τN ≥ m},

and by ζ0, …, ζm on {ΩN and τN ≥ m}. You should check that the ℱm
are nondecreasing, and Xm is ℱm-measurable. For A ∈ ℱm, I claim

∫_A X_{m+1} dPp ≤ ∫_A Xm dPp.

This can be checked separately for A ⊂ Ω \ {ΩN and τN ≥ m} and for A of
the form

{ΩN and τN ≥ m and ζ0 = j0, …, ζm = jm}.

You should do the first check. Here is the second. I say

∫_A X_{m+1} dPp = Σ_{k∈I} Pp{A and ζ_{m+1} = k and τN ≥ m + 1} · S(k, i)/pG(i)
= Σ_{k∈I} Pp(A) · R(jm, k) · S(k, i)/pG(i)
≤ Pp(A) · S(jm, i)/pG(i)
= ∫_A Xm dPp,

for these reasons: X_{m+1} = 0 on {τN = m} in the first line; split over the sets
{ζ_{m+1} = k} and use the definition of X in the second; use (28) in the third;
use (21) on R in the fourth; use the definition of X in the last. Consequently,
the sequence X0, X1, … is an expectation-decreasing martingale. Plainly,
βN is at most the number of downcrossings of [a, b) by X0, X1, …. By
(10.33),

∫_{ΩN} βN dPp ≤ (b − a)^{−1} ∫_{ΩN} X0 dPp ≤ (b − a)^{−1} pG(i)^{−1} ∫_{ΩN} S(ζ0, i) dPp.

Use (28): the last integral is the mean number of visits to i by ζ0, ζ1, …,
that is, the mean number of visits to i by ξ0, …, ξ_{τN}, and is no more
than pG(i). This proves (β). *

(30) Corollary. For all h, the sequence G(i, ξn)/pG(ξn) converges Qh-almost
surely. Moreover, h is extreme iff

Qh{ limn G(·, ξn)/pG(ξn) = h } = 1.

PROOF. Qh{ξn ∈ Ih} = 1, so the case i ∉ Ih is easy by (9). For i ∈ Ih
and ξn ∈ Ih,

G^h(i, ξn)/p^hG^h(ξn) = G(i, ξn)/[h(i) pG(ξn)]

converges Qh-almost surely by (11), and (29) on P^h. The last assertion now
follows from (19). *
(31) Remark. G(i, ·)/pG(·) is bounded, because (1.51d) makes

G(i, j)/pG(j) = S(j, i)/pG(i) ≤ S(i, i)/pG(i) = G(i, i)/pG(i).

4. EXAMPLES

Let φ(i, j) = Pi{ξn = j for some n ≥ 0}. By (1.51d),

(32) G(i, j) = φ(i, j)G(j, j).

If i0 = i, …, in = j are in I, and i ∈ I^h, then (9) shows

P_i^h{ξ0 = i0, …, ξn = in} = h(i)^{−1} Pi{ξ0 = i0, …, ξn = in} h(j).

Sum over all such sequences with i1, …, i_{n−1} not equal to j to get

(33) P_i^h{ξn = j for some n ≥ 0} = h(i)^{−1} φ(i, j)h(j) for i ∈ I^h.

Let {in} be a sequence in I. Say in converges with limit h iff for each j in I
there are only finitely many n with in = j, and

lim_{n→∞} G(j, in)/pG(in) = h(j).

In the conventional treatment, I is compactified by adding all these limits;
the extra points form the boundary.
For given reference probability p and substochastic matrix P, the set of
extreme harmonic h was identified by (19) as some of the limits of convergent
sequences in. In this section, the extreme harmonic h are found in seven
examples. The first four are artificial, and are introduced to clarify certain
points in the theory. The last three present some well known processes.
(34) Example. Let N be a positive integer. There will be precisely N
extreme harmonic h. The state space I consists of 1 and all pairs (n, m),
with n = 1, …, N and m = 1, 2, …. The reference probability concentrates
on 1. The transition probabilities P are subject to the following conditions,
as in Figure 1:

[Figure 1 omitted: the state 1, with N infinite rays of states (n, 1), (n, 2), … leading away from it.]

From any state other than 1, it is possible to jump to 1. This transition is not shown.

Figure 1.

P is transient;
P[1, (n, 1)] > 0 and Σn P[1, (n, 1)] = 1;
0 < P[(n, m), (n, m + 1)] < 1;
P[(n, m), 1] = 1 − P[(n, m), (n, m + 1)].
In the presence of the other conditions, the first condition is equivalent to

∏_{m=1}^∞ P[(n, m), (n, m + 1)] > 0 for some n;

use (1.16). For a = 1, …, N, let ha be this function on I:

ha(1) = 1;
ha(n, m) = 1/φ[1, (n, m)] for n = a;
ha(n, m) = φ[(n, m), 1] for n ≠ a.

Then h1, …, hN are the extreme harmonic h.

PROOF. To see this, let j ∈ I, and let in be a sequence in I. By (32),

(35)    G(j, in)/pG(in) = φ(j, in)/φ(1, in).

Suppose in converges. Let in = (an, bn). Clearly, bn → ∞. Let j = (c, d).
Suppose bn ≥ d. If an = c, then 1 leads to in only through j. By (1.16) and
strong Markov (1.22),

φ(1, in) = φ(1, j) · φ(j, in).

The right side of (35) is therefore 1/φ(1, j). If an ≠ c, then j leads to in only
through 1. So

φ(j, in) = φ(j, 1) · φ(1, in),

and the right side of (35) is φ(j, 1). But

φ(j, 1) < 1/φ(1, j),

for otherwise φ(1, j) = φ(j, 1) = 1 and P is recurrent by (1.54). Because in
converges, an is eventually constant, say at a. The limit of in is then ha. By
(19), any extreme harmonic h is an ha.
I will now check that ha is harmonic. To begin with, I say

φ[1, (a, 1)] = P[1, (a, 1)] + Σ_{n≠a} P[1, (n, 1)] · φ[(n, 1), 1] · φ[1, (a, 1)].

Indeed, consider the chain starting from 1. How can it reach (a, 1)? The
first move can be to (a, 1). Or the first move can be to (n, 1) with n ≠ a:
the chain has then to get back to 1, and from 1 must make it to (a, 1). This
argument can be rigorized using (1.16, 1.15, 1.22). Divide the equality by
φ[1, (a, 1)]:

1 = P[1, (a, 1)] · ha(a, 1) + Σ_{n≠a} P[1, (n, 1)] · ha(n, 1).

That is,

ha(1) = Σj P(1, j) · ha(j).

Next, I say

φ[1, (a, b + 1)] = φ[1, (a, b)] · P[(a, b), (a, b + 1)]
+ φ[1, (a, b)] · P[(a, b), 1] · φ[1, (a, b + 1)].

Indeed, a chain reaches (a, b + 1) from 1 by first hitting (a, b). It then makes
(a, b + 1) in one move, or returns to 1 and must try again. Rearranging,

ha(a, b) = P[(a, b), (a, b + 1)] · ha(a, b + 1) + P[(a, b), 1] · ha(1)
= Σj P[(a, b), j] · ha(j).

Finally, let n ≠ a:

φ[(n, m), 1] = P[(n, m), (n, m + 1)] · φ[(n, m + 1), 1] + P[(n, m), 1];

so

ha(n, m) = Σj P[(n, m), j] · ha(j).
I will now check that ha is extreme. Abbreviate

πa = P_1^{ha}.

As (33) implies, πa-almost all sample sequences reach (a, b), for every b. Therefore, with
πa-probability 1 the first coordinate of ξn is a for infinitely many n. But ξn
converges with πa-probability 1 by (30). So πa{ξn → ha} = 1, and ha is
extreme by (30). *
(36) Example. There are countably many extreme harmonic h. There is a
sequence in in I which converges to an extreme excessive h which is not
harmonic. This example is obtained by modifying (34) as follows. The state
space consists of 1 and all pairs (n, m) of positive integers. The reference
probability concentrates on 1. The transitions are constrained as in (34). The
new convergence is (n, m) with n → ∞ and m free. The limit is h∞, where

h∞(j) = φ(j, 1) = G(j, 1)/G(1, 1).

Use (21) to see h∞ is not harmonic. The rest of the argument is like (34). *
(37) Example. There are c extreme harmonic h. The state space I consists
of all finite sequences of 0's and 1's, including the empty sequence ∅. The
reference probability concentrates on ∅. The transition probabilities P
are subject to the following conditions, as in Figure 2:

[Figure 2 omitted: the binary tree of finite 0-1 sequences.]

From any state other than ∅, it is possible to jump to ∅. This transition is not shown.

Figure 2.

P is transient;
0 < P(∅, 0) < 1 and P(∅, 1) = 1 − P(∅, 0);
for each j ≠ ∅ in I, the three numbers P(j, j0) and P(j, j1)
and P(j, ∅) are all positive and sum to 1.
For each infinite sequence s of 0's and 1's, let hs be this function on I:

hs(j) = 1/φ(∅, j) if s extends j;
hs(j) = φ(j, ∅) otherwise.

Then {hs} are the extreme harmonic h. The argument is like (34): a sequence
in in I converges iff the length of in tends to ∞, and the mth component of in
is eventually constant, say at sm, for each m. Then in → hs.
Now suppose that P(j, j0) = P(j, j1) and depends only on the length of j,
for all j. I claim each hs is unbounded. Indeed, suppose for a moment that j
has length N. Let θ(j) be the P∅-probability that {ξn} visits j before any other
k of length N. By symmetry, θ(j) = 2^{−N}. If ξn visits j after visiting some k
other than j of length N, there is a return from k to ∅, except for miracles.
Thus,

φ(∅, j) = θ(j) + δ(j),

where δ(j) is at most P∅(AN), and

AN = {ξ returns to ∅ after first having length N}.

But limN→∞ P∅(AN) = 0, because ξn visits ∅ infinitely often on ∩N AN.
Consequently,

limN→∞ φ(∅, j) = 0.

However, there are many bounded, harmonic, nonextreme h; here is an
example:
h(i) is twice the Pi-probability that the first coordinate of ξn is 0 for all
large n. *

[Figure 3 omitted: the states c0, c1, c2, … together with two rays a1, a2, … and b1, b2, …; the probabilities p1, p2, p3, … run along each ray.]

Figure 3.

(38) Example. A sequence in converges to an h which is harmonic but not
extreme. The state space consists of a1, a2, … and b1, b2, … and c0, c1, ….
The reference probability concentrates on c0. Choose a sequence pn with
0 < pn < 1 and

∏_{n=1}^∞ pn > 0.

Define the transition probabilities P as in Figure 3:

P(c0, a1) = P(c0, b1) = ½;
P(an, a_{n+1}) = P(bn, b_{n+1}) = pn for n = 1, 2, …;
P(an, cn) = P(bn, cn) = 1 − pn for n = 1, 2, …;
P(c_{n+1}, cn) = 1 for n = 0, 1, ….
Then an converges to an extreme harmonic function, as does bn; and this
exhausts the extreme harmonic functions. But cn converges to the constant
function 1, which is not extreme.
PROOF. I will argue that cn converges to 1, which isn't extreme. The rest
is like (34). By symmetry,

φ(aj, cn) = φ(bj, cn).

Suppose n ≥ j. Then c0 leads to cn only through aj or bj. And P_{c0}-almost
all sample paths hit aj or bj. So

φ(aj, cn) = φ(bj, cn) = φ(c0, cn).

Now (32) implies

G(aj, cn)/G(c0, cn) = G(bj, cn)/G(c0, cn) = 1.

For n > j,

φ(cj, cn) = φ(c0, cn)

because cj leads to cn only through c0, and P_{cj}-almost all sample paths reach c0.
Now use (32) again:

G(cj, cn)/G(c0, cn) = 1.

Thus, cn converges to 1. The event {ξn is an a for all large n} is invariant and
has P_{c0}-probability ½ by symmetry, so the invariant σ-field is not P_{c0}-trivial.
Now (25) prevents 1 from being extreme. *
(39) Example. The random walk. The state space I consists of the integers.
The reference probability concentrates on 0. Let ½ < p < 1. The transition
probabilities are given by:

P(n, n + 1) = p and P(n, n − 1) = 1 − p.

As (1.95, 1.96) show,

φ(i, j) = 1 if j ≥ i and φ(i, j) = [(1 − p)/p]^{i−j} if j < i.

So iv converges, to h+ and h− respectively, iff iv → ∞ or iv → −∞; where

h+(i) = 1 and h−(i) = [(1 − p)/p]^i.

Now h+ is extreme because the invariant σ-field is Pp-trivial; and h− is extreme
for a similar reason, P^{h−} being the random walk with 1 − p replacing p.
Triviality follows from Hewitt-Savage (1.122). *
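The hitting-probability formula can be checked numerically. In the sketch below (not from the text; p = 2/3 and the truncation level N = 200 are arbitrary choices), value iteration on the first-step equations f(i) = p f(i+1) + (1 − p) f(i−1) recovers φ(i, 0) = [(1 − p)/p]^i, and the harmonicity of h− is verified directly.

```python
# Biased random walk: step +1 with probability p, -1 with probability 1 - p.
p = 2.0 / 3.0
r = (1 - p) / p

# phi(i, 0) for i > 0 solves f(i) = p f(i+1) + (1-p) f(i-1), f(0) = 1.
# Truncate at level N (f(N) = 0 approximates the boundary at infinity)
# and sweep to convergence.
N = 200
f = [1.0] + [0.0] * N
for _ in range(5000):
    for i in range(1, N):
        f[i] = p * f[i + 1] + (1 - p) * f[i - 1]

for i in range(1, 10):
    assert abs(f[i] - r ** i) < 1e-6        # phi(i, 0) = [(1-p)/p]^i

# h_minus(i) = r^i is harmonic: h(i) = p h(i+1) + (1-p) h(i-1).
for i in range(-5, 6):
    assert abs(p * r ** (i + 1) + (1 - p) * r ** (i - 1) - r ** i) < 1e-12
print("random-walk checks pass")
```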
Write xn ∼ yn, or xn is asymptotic to yn, iff xn/yn → 1. Suppose d is a
nonnegative integer, and ν → ∞ through the integers. Then

ν! = ν(ν − 1) ⋯ (ν − d + 1)(ν − d)!;

so

(40) ν! ∼ ν^d (ν − d)! as ν → ∞.
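A quick numerical illustration of (40), with d = 3 (an arbitrary choice; the integer division keeps the huge factorials exact before the final float ratio):

```python
from math import factorial

# Check (40): v!/(v**d * (v-d)!) -> 1 as v grows, for fixed d.
d = 3
for v in (10, 100, 1000):
    ratio = (factorial(v) // factorial(v - d)) / v ** d
    print(v, ratio)

v = 1000
assert abs((factorial(v) // factorial(v - d)) / v ** d - 1.0) < 0.01
```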

(41) Example. The random walk in space-time. The state space I consists
of pairs (n, m) of integers with 0 ≤ m ≤ n. The reference probability con-
centrates on (0, 0). Let 0 < p < 1. The situation with p = 0 or 1 is easier.
The transition probabilities P are given by

P[(n, m), (n + 1, m + 1)] = p and P[(n, m), (n + 1, m)] = 1 − p.

You should check that G[(a, b), (n, m)] = 0 unless n ≥ a, and m ≥ b, and
n − a ≥ m − b, in which case

G[(a, b), (n, m)] = (n − a choose m − b) · p^m(1 − p)^{n−m} · p^{−b}(1 − p)^{b−a}.

Suppose (n, m) converges. Then n → ∞. If m is bounded, by passing to a
subsequence suppose m is eventually constant, say at M. Then (n, m) con-
verges to hM, where

hM(a, b) = 0 for b > 0
= (1 − p)^{−a} for b = 0.

This function is not harmonic. Similarly, (n, m) does not converge to a
harmonic function if n − m is bounded. So, suppose that m → ∞ and
n − m → ∞. By passing to a subsequence if necessary, suppose m/n con-
verges, say to q, with 0 ≤ q ≤ 1. Then

(n − a choose m − b)/(n choose m) = (n − a)!/n! · m!/(m − b)! · (n − m)!/[n − m − (a − b)]!
∼ n^{−a} m^b (n − m)^{a−b}    by (40)
= (m/n)^b (1 − m/n)^{a−b}
→ q^b (1 − q)^{a−b}.

So (n, m) converges to hq, where

hq(a, b) = (q/p)^b [(1 − q)/(1 − p)]^{a−b}.

Now P^{hq} is again a random walk in space-time, with q replacing p, so that hq
is extreme harmonic by (25) and Hewitt-Savage (1.122). *
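Both the Green function formula and the harmonicity of hq can be verified mechanically. In the sketch below (not from the text), the parameter values p = 0.3 and q = 0.6 are arbitrary:

```python
from math import comb

p, q = 0.3, 0.6

def h(a, b):
    # Candidate extreme harmonic function h_q for the space-time walk.
    return (q / p) ** b * ((1 - q) / (1 - p)) ** (a - b)

# Harmonicity: h(a,b) = p h(a+1,b+1) + (1-p) h(a+1,b).
for a in range(6):
    for b in range(a + 1):
        assert abs(h(a, b) - (p * h(a + 1, b + 1) + (1 - p) * h(a + 1, b))) < 1e-12

# Green function: starting from (a,b), the chain sits at (n,m) at step n-a
# with probability C(n-a, m-b) p^(m-b) (1-p)^((n-a)-(m-b)).
def G(a, b, n, m):
    if n < a or m < b or n - a < m - b:
        return 0.0
    return comb(n - a, m - b) * p ** (m - b) * (1 - p) ** ((n - a) - (m - b))

# Cross-check the book's factored form of G.
for (a, b, n, m) in [(1, 0, 5, 2), (2, 1, 7, 3), (0, 0, 4, 4)]:
    factored = comb(n - a, m - b) * p ** m * (1 - p) ** (n - m) \
               * p ** (-b) * (1 - p) ** (b - a)
    assert abs(G(a, b, n, m) - factored) < 1e-12
print("harmonicity and Green-function checks pass")
```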

The next example, the Polya urn, has been studied recently by Blackwell
and Kendall (1964).

(42) Example. An urn contains u balls at time 0, of which w are white and
u − w black. Assume w > 0 and u − w > 0. At time n, a ball is drawn at
random, and replaced. A ball of the same color is added. Then time moves on
to n + 1. Let Un be the number of balls in the urn at time n, namely n + u.
Let Wn be the number of white balls in the urn at time n. Then
{(Un, Wn) : n = 0, …} is a Markov chain starting from (u, w), with state
space I consisting of pairs (t, v) of integers having 0 < v < t. The chain has
stationary transitions P, where

P[(t, v), (t + 1, v + 1)] = v/t
P[(t, v), (t + 1, v)] = (t − v)/t.

I claim that {W_{n+1} − Wn : n = 0, 1, …} is exchangeable. In the present
notation, suppose a > c and v_{m+1} = vm or v_{m+1} = vm + 1. Then the
P_{(a,c)}-probability that

ξ0 = (a, v0), ξ1 = (a + 1, v1), ξ2 = (a + 2, v2), …, ξn = (a + n, vn)

is alleged to depend only on vn. The easiest method is to argue, inductively
on n, that this probability is equal to the product

(c/a)·((c + 1)/(a + 1)) ⋯ ((w − 1)/(a + w − c − 1)) · ((a − c)/(a + w − c))·((a − c + 1)/(a + w − c + 1)) ⋯ ((u − w − 1)/(u − 1)),

where u = a + n and w = vn. Call this product O(a, c, u, w).
Keep a > c. Then G[(a, c), (u, w)] = 0 unless a ≤ u, and c ≤ w, and
a − c ≤ u − w. In the latter case, by exchangeability, G[(a, c), (u, w)] is the
product of two factors: the first is the number of sequences of u − a balls,
of which w − c are white and the others black; the second is the common
probability O(a, c, u, w) that an urn with c white balls and a − c black balls
will produce some specified sequence of u − a draws, of which w − c are
white and (u − w) − (a − c) are black. That is,

G[(a, c), (u, w)] = (u − a choose w − c) · O(a, c, u, w).

Let the reference probability concentrate on (2, 1). Suppose

u ≥ a, w ≥ c, u − w ≥ a − c

and

u ≥ 2, w ≥ 1, u − w ≥ 1, a > c ≥ 1.

Then

G[(a, c), (u, w)]/G[(2, 1), (u, w)] = (a − 1) · (a − 2 choose c − 1) · (u − a choose w − c)/(u − 2 choose w − 1).

Suppose (un, wn) converges. Then un → ∞. By compactness, suppose wn/un
converges, say to π. If wn → ∞ and un − wn → ∞, then (un, wn) → hπ, where

hπ(a, c) = (a − 1) · (a − 2 choose c − 1) · π^{c−1}(1 − π)^{a−c−1}.

This follows by (40):

(un − a)!/(un − 2)! ∼ un^{2−a}
(wn − 1)!/(wn − c)! ∼ wn^{c−1}
(un − wn − 1)!/(un − wn − a + c)! ∼ (un − wn)^{a−c−1};

so the product, namely (un − a choose wn − c)/(un − 2 choose wn − 1), is asymptotic to

un^{2−a} wn^{c−1} (un − wn)^{a−c−1} = (wn/un)^{c−1}(1 − wn/un)^{a−c−1} → π^{c−1}(1 − π)^{a−c−1}.

If un → ∞ and wn is bounded, then (un, wn) → f, where f(a, c) = 0 for
c > 1, and f(a, 1) = a − 1. If wn → ∞ and un − wn is bounded, then
(un, wn) → g, where g(a, c) = 0 for a − c > 1, and g(a, c) = a − 1 for
a − c = 1. Now f and g are not harmonic. Therefore, {hπ : 0 ≤ π ≤ 1}
contains all extreme harmonic h by (19). But by algebra, P^{hπ} corresponds to a
random walk in space-time with parameter π, so hπ is extreme harmonic by
(25) and Hewitt-Savage (1.122). *
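The exchangeability claim can be tested by exact enumeration. The sketch below (not from the text) computes draw-sequence probabilities for the urn started at the reference state (2, 1) and checks that every reordering of a given draw sequence has the same probability:

```python
from fractions import Fraction
from itertools import permutations

def seq_prob(a, c, draws):
    # Exact probability of a draw sequence for a Polya urn that starts
    # with c white and a - c black balls; draws is a tuple of 1 (white)
    # and 0 (black). Each drawn ball is replaced with a same-colored twin.
    white, total = c, a
    prob = Fraction(1)
    for d in draws:
        prob *= Fraction(white if d else total - white, total)
        white += d
        total += 1
    return prob

# Exchangeability: all reorderings of the same draws are equally likely.
base = (1, 0, 0, 1, 0)
probs = {seq_prob(2, 1, s) for s in set(permutations(base))}
assert len(probs) == 1
print("all orderings of", base, "have probability", probs.pop())
```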

5. THE LAST VISIT TO i BEFORE THE FIRST VISIT TO J\{i}

This section is somewhat apart from the rest of the chapter, but uses
similar technology. The results will be referred to in ACM. Let P be a
stochastic matrix on the countable set I; suppose I forms one recurrent class
relative to P. This stands in violent contrast to Sections 1-4. The coordinate
process ξ0, ξ1, … on (I^∞, Pj) is Markov with starting state j and stationary
transitions P. Fix J ⊂ I and i ∈ J. Let (α + β) be the least n with ξn ∈ J\{i}.
Let α be the greatest n < (α + β) with ξn = i. See Figure 4.

[Figure 4 omitted: a sample path for I = {1, 2, 3, 4, 5}, J = {1, 2, 3}, i = 1, with the times α and α + β marked on the time axis.]

Figure 4.

For j ∈ I, let

h(j) = Pj{ξ visits i before hitting J\{i}}.
So h(i) = 1, while h(j) = 0 for j ∈ J\{i}. Check

(43a) h(j) = Σ_{k∈I} P(j, k)h(k) for j ∈ I\J.

On the other hand,

(43b) Σ_{k∈I} P(i, k)h(k) = 1 − θ,

where

θ = Pi{ξ hits J\{i} before returning to i} > 0.
Let
H = U:j E I and h(j) > O}.
Then i E H, and H\{i} c I\J. Let

M(j, k) = _1_ P(j, k)h(k) for j, k E H.


h(j)
Using (43), check that M is a substochastic matrix on H, whose rows sum
to 1, except that row i sums to I - ().
Let
H* = U:j E I and h(j) < I} U {i}.

So H* ⊃ J. Define a matrix M* on H* as follows.

M*(j, k) = (1/(1 − h(j))) · P(j, k)[1 − h(k)] for j ∉ J.
M*(j, k) = 0 for j ∈ J\{i}.
M*(i, k) = (1/θ) · P(i, k)[1 − h(k)].

Using (43), check that M* is a substochastic matrix on H*, whose jth row
sums to 1 or 0, according as j ∉ J\{i} or j ∈ J\{i}.
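The row-sum claims for M and M* can be verified on a small example. In the sketch below (not from the text), I = {0, 1, 2}, J = {0, 1}, i = 0, and the transition matrix is an arbitrary recurrent choice; h is obtained by solving (43a) directly:

```python
# Small recurrent chain on I = {0, 1, 2}; J = {0, 1}, i = 0.
Pm = [[0.3, 0.3, 0.4],
      [0.5, 0.2, 0.3],
      [0.6, 0.2, 0.2]]
J = {0, 1}

# h(j) = P_j{xi visits i before hitting J\{i}}: h(0) = 1, h(1) = 0,
# and h(2) solves the single equation of (43a).
h = {0: 1.0, 1: 0.0}
h[2] = Pm[2][0] / (1.0 - Pm[2][2])

theta = 1.0 - sum(Pm[0][k] * h[k] for k in range(3))       # (43b)
assert 0 < theta < 1

H = [j for j in range(3) if h[j] > 0]                       # H = {0, 2}
M = {(j, k): Pm[j][k] * h[k] / h[j] for j in H for k in H}
assert abs(sum(M[0, k] for k in H) - (1 - theta)) < 1e-9    # row i sums to 1 - theta
assert abs(sum(M[2, k] for k in H) - 1.0) < 1e-9            # other rows sum to 1

Hstar = [j for j in range(3) if h[j] < 1] + [0]             # H* = {1, 2} u {0}
def Mstar(j, k):
    if j == 0:
        return Pm[j][k] * (1 - h[k]) / theta
    if j in J:
        return 0.0
    return Pm[j][k] * (1 - h[k]) / (1 - h[j])

assert abs(sum(Mstar(0, k) for k in Hstar) - 1.0) < 1e-9
assert abs(sum(Mstar(2, k) for k in Hstar) - 1.0) < 1e-9
assert sum(Mstar(1, k) for k in Hstar) == 0.0
print("rows of M and M* behave as claimed")
```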
Let T be the number of n ≤ α with ξn = i.
(44) Theorem. With respect to Pi:

(a) {ξn : 0 ≤ n ≤ α} is independent of {ξ_{α+n} : 0 ≤ n < ∞};
(b) {ξn : 0 ≤ n ≤ α} is Markov with stationary transitions M;
(c) {ξ_{α+n} : 0 ≤ n ≤ β} is Markov with stationary transitions M*;
(d) Pi{T = v} = θ(1 − θ)^{v−1} for v = 1, 2, ….

PROOF. Claim (a). I say

(e) {ξn : 0 ≤ n ≤ α} is Pi-independent of {ξ_{α+n} : 0 ≤ n ≤ β}.

To check (e), stop ξ when it first enters J\{i}, and reverse time. That is,
look at ζn = ξ_{α+β−n} for 0 ≤ n ≤ α + β. Then ζ is a Markov chain with
stationary substochastic transitions, as in (28). Now β = (α + β) − α is
the least n with ζn = i, and is Markov for ζ. Use strong Markov (1.21) on
the process ζ and the time β. This proof of (e) was suggested by Aryeh
Dvoretzky.
You can also derive (e) from blocks (1.31) and lemma (48) below. Use the
successive i-blocks for X1, X2, …. Let V be the set of i-blocks free of J\{i}.
Now L is the least n with Xn ∉ V. Check that {ξn : 0 ≤ n ≤ α} is measurable
on (X1, …, X_{L−1}), while {ξ_{α+n} : 0 ≤ n ≤ β} is measurable on XL. But XL
is independent of (X1, …, X_{L−1}).
I will now argue (a) from (e) and strong Markov. Abbreviate

X = {ξn : 0 ≤ n ≤ α}
Y = {ξ_{α+n} : 0 ≤ n ≤ β}
Y* = ξ_{α+β}
Z = {ξ_{α+β+n} : 0 ≤ n}.

Check that ξ_{α+·} is measurable on (Y, Z), and Y* is measurable on Y. Let
A, B, C be measurable sets. I say

Pi{X ∈ A and Y ∈ B and Y* = j and Z ∈ C}
= Pi{X ∈ A and Y ∈ B and Y* = j} · Pj{C}
= Pi{X ∈ A} · Pi{Y ∈ B and Y* = j} · Pj{C}
= Pi{X ∈ A} · Pi{Y ∈ B and Y* = j and Z ∈ C};

the first and third equalities come from strong Markov (1.22) on the time
α + β; the second equality comes from (e). This proves (a).
Claim (b). Fix i0 = i. Fix n ≥ 1. Fix i1, …, i_{n−1} in H and in in I. Then

{ξm = im for 0 ≤ m ≤ n and α ≥ n}
= {ξm = im for 0 ≤ m ≤ n and ξ_{n+·} visits i before J\{i}}.

By Markov (1.15),

Pi{ξm = im for 0 ≤ m ≤ n and α ≥ n} = [∏_{m=0}^{n−1} P(im, i_{m+1})] · h(in)
= ∏_{m=0}^{n−1} M(im, i_{m+1}) if in ∈ H
= 0 if in ∉ H.
Claims (c-d). Fix i0 = i. Fix n ≥ 1. Fix i1, …, i_{n−2} in H*\J. Fix i_{n−1}
in H* and in in I. Let

A = {ξ_{α+m} = im for 0 ≤ m ≤ n and β ≥ n}.

To get (c), I have to compute Pi{A}. Clearly, Pi{A} = 0 if in = i, or if n ≥ 2
and i_{n−1} ∈ J. Exclude these two cases. Let Bv be the event that ξ visits i at
least v times before the first visit to J\{i}. So Bv = {T ≥ v}. To get (d), I have
to compute Pi{Bv}. Let σv be the time of the vth visit to i for v = 1, 2, …; so

Pi{σ1 = 0} = 1.

Let ζ(v) be the post-σv process. Check that σv is Markov, Bv is in the pre-σv
sigma field, and ζ(v)0 = i. Check

B_{v+1} = Bv ∩ {ζ(v) ∈ B2} for v ≥ 1.

By definition,

Pi{B0} = Pi{B1} = 1 and Pi{B2} = 1 − θ.

By strong Markov (1.22) and induction,

(45a) Pi{B_{v+1}} = Pi{Bv} · (1 − θ) = (1 − θ)^v for v = 1, 2, ….

This settles (d).
Let

C = {ξm = im for 0 ≤ m ≤ n and ξ_{n+·} visits J\{i} before i}.

By Markov (1.15),
(45b)
Check that
{B. and s(v) E C} for v = 1,2, ...
are pairwise disjoint and their union is A. So
Pi{A} = ~::'1 Pi{B. and s(v) E C}
= ~::'1 Pi { Bv} . Pi { C} by strong Markov (1.22)
= ~~=1 (1 - 0)"-1. Pi { C} by (45a)

by (45b)

= Il~;:~ M*(im' i lllH ) if in E H*

*
=0 if in rf=H*.
Suppose θ < 1. Define a new matrix M̄ on H as follows:

M̄(j, k) = M(j, k) for j ≠ i;
M̄(i, k) = (1/(1 − θ)) · M(i, k).

This M̄ is stochastic. Let λ be the least n > 0 with ξn = i. Let D be the
conditional Pi-distribution of {ξn : 0 ≤ n < λ}, given that ξn ∈ J\{i} for no
n < λ. Define a new process {ξ̄n : 0 ≤ n} with state space I, starting from i,
visiting i an infinite number of times, such that the i-blocks of ξ̄ are independ-
ent and have common distribution D.
(46) Theorem. ξ̄ is Markov with stationary transitions M̄.
PROOF. Let i0 = i. Let i1, …, in be in I\J. Let

A = {ξm = im for 0 ≤ m ≤ n}
B = {ξ returns to i before visiting J\{i}}.

The D-probability of starting off (i0, i1, …, in) is

Pi{A | B} = Pi{A ∩ B}/Pi{B}
= (1/(1 − θ)) · [∏_{m=0}^{n−1} P(im, i_{m+1})] · h(in),

by Markov (1.15). The D-probability of starting off (i0, i1, …, in), and
terminating at time n, is

Pi{A and ξ_{n+1} = i | B} = Pi{A and ξ_{n+1} = i}/Pi{B}
= (1/(1 − θ)) · [∏_{m=0}^{n−1} P(im, i_{m+1})] · P(in, i)h(i).

Let i0 = i and N ≥ 1. Let i1, …, i_{N−1} be in H. Let iN be in I. Suppose m of
i0, …, i_{N−1} are equal to i. Then

Prob{ξ̄n = in for 0 ≤ n ≤ N} = (1 − θ)^{−m} · [∏_{n=0}^{N−1} P(in, i_{n+1})] · h(iN)
= ∏_{n=0}^{N−1} M̄(in, i_{n+1}) if iN ∈ H
= 0 if iN ∉ H. *
Let T be independent of ξ̄. Let the distribution of T coincide with the
distribution of T relative to Pi, as described in (44d). Let T* be the time of the
Tth visit to i in ξ̄.
(47) Theorem. The joint distribution of T and {ξ̄n : 0 ≤ n ≤ T*} coincides
with the joint Pi-distribution of T and {ξn : 0 ≤ n ≤ α}.
PROOF. Use (48) below. Let Xn be the nth i-block in ξ. Let V be the set
of i-blocks free of J\{i}. So L = T, and the two θ's coincide. *
To state (48), let X1, X2, … be a sequence of independent and identically
distributed random objects. Let V be a measurable set of values, such that
X1 ∉ V has positive probability θ less than 1. Let L be the least n with
Xn ∉ V. Next, let T, Z, Y1, Y2, … be independent random objects. Suppose
T is n with probability θ(1 − θ)^{n−1} for n = 1, 2, …. Suppose the distri-
bution of Z coincides with the conditional distribution of X1 given X1 ∉ V.
Suppose the distribution of Yn coincides with the conditional distribution of
X1 given X1 ∈ V, for n = 1, 2, ….
(48) Lemma. (X1, …, X_{L−1}, XL, L) is distributed like (Y1, …, Y_{T−1}, Z, T).
PROOF. Let n be a positive integer. Let A1, …, A_{n−1} be measurable
subsets of V. Let B be a measurable set disjoint from V. Then

Prob{Xm ∈ Am for 1 ≤ m ≤ n − 1 and Xn ∈ B and L = n}
= Prob{Xm ∈ Am for 1 ≤ m ≤ n − 1 and Xn ∈ B}
= [∏_{m=1}^{n−1} Prob{X1 ∈ Am}] · Prob{X1 ∈ B}
= [∏_{m=1}^{n−1} (1 − θ) Prob{Y1 ∈ Am}] · θ Prob{Z ∈ B}
= [∏_{m=1}^{n−1} Prob{Y1 ∈ Am}] · Prob{Z ∈ B} · Prob{T = n}
= Prob{Ym ∈ Am for 1 ≤ m ≤ n − 1 and Z ∈ B and T = n}. *
NOTE. The first i-block in ξ, conditioned on missing J\{i}, is Markov
relative to Pi. If you condition the first i-block on hitting J\{i}, however, it
stops being Markov. The process ξ_{α+·} isn't Markov: its first i-block hits
J\{i}; the second one could miss.
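Lemma (48) can be checked by exact enumeration in a finite case. In the sketch below (not from the text), X is uniform on {0, 1, 2} and V = {0, 1}, so θ = 1/3; the computation confirms that L is geometric with parameter θ:

```python
from itertools import product
from fractions import Fraction

# X takes values 0, 1, 2 with equal probability; V = {0, 1}.
vals = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
V = {0, 1}
theta = sum(pr for v, pr in vals.items() if v not in V)   # P{X not in V} = 1/3

def prob_L(n):
    # P{L = n}: enumerate all prefixes (X_1, ..., X_n) with the first
    # n - 1 values in V and the last one outside V.
    total = Fraction(0)
    for seq in product(vals, repeat=n):
        if all(x in V for x in seq[:-1]) and seq[-1] not in V:
            p = Fraction(1)
            for x in seq:
                p *= vals[x]
            total += p
    return total

for n in range(1, 7):
    assert prob_L(n) == theta * (1 - theta) ** (n - 1)
print("L is geometric with parameter", theta)
```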
5

INTRODUCTION TO
CONTINUOUS TIME

1. SEMIGROUPS AND PROCESSES

Let I be a finite or countably infinite set. A matrix M on I is a function
(i, j) → M(i, j) from I × I to the real line. Call M stochastic iff M(i, j) ≥ 0
for all i and j, while Σj M(i, j) = 1 for all i. Call M substochastic iff
M(i, j) ≥ 0 for all i and j, while Σj M(i, j) ≤ 1 for all i. Matrix multiplication
is defined as usual:

MN(i, j) = Σ_{k∈I} M(i, k)N(k, j).

(1) Definition. P = {P(t) : 0 ≤ t < ∞} is a stochastic semigroup on I iff
(a) P(t) is a stochastic matrix on I for each t ≥ 0;
and
(b) P(t + s) = P(t)P(s) for all t ≥ 0 and s ≥ 0;
and
(c) P(0)(i, j) = 1 or 0 according as i = j or i ≠ j.
Call P a substochastic semigroup iff (b) and (c) hold, but P(t) is substochastic
for each t ≥ 0. Call P standard iff
(d) lim_{t→0} P(t)(i, i) = 1 for all i.
Usually, P(t)(i, j) is abbreviated to P(t, i, j).
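For a concrete instance, take the two-state chain with jump rates a and b; then P(t) = exp(tQ) with Q = [[−a, a], [b, −b]] has a closed form, and properties (a)-(c) can be checked numerically. This sketch is not from the text, and the rates a = 1.0, b = 2.0 are arbitrary:

```python
from math import exp

a, b = 1.0, 2.0  # jump rates 0 -> 1 and 1 -> 0

def P(t):
    # Standard stochastic semigroup on {0, 1}: P(t) = exp(tQ),
    # Q = [[-a, a], [b, -b]], written out in closed form.
    s = a + b
    e = exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t, u = 0.7, 1.3
lhs, rhs = P(t + u), matmul(P(t), P(u))
for i in range(2):
    assert abs(sum(P(t)[i]) - 1.0) < 1e-12          # (a): rows sum to 1
    for j in range(2):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-12   # (b): P(t+u) = P(t)P(u)
assert abs(P(0.0)[0][0] - 1.0) < 1e-12 and abs(P(0.0)[0][1]) < 1e-12  # (c)
print("semigroup checks pass")
```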

I want to thank Richard Olshen for checking the final draft of this chapter.

NOTE. Let P be a standard substochastic semigroup. Then P(t, i,j) is


continuous at t = 0; this is an immediate consequence of the definition. In
fact, P(·, i,j) is continuous on [0, 00); this is proved as (9). Finally, if P(t)
is stochastic for any t > 0, then P(t) is stochastic for all t; see (6).
For finite f, let 1 = I. For infinite f, endow f with the discrete topology,
and let 1 = f U {rp} be the one-point compactification of I. Let (0, ff,.9') be
a probability triple. For each t ~ 0, let X(t) be an ff-measurable function
from 0 to 1. Let p be a probability on f and let P be a stochastic semigroup
on f.
(2) Definition. {X(t) : 0 ≤ t < ∞} is a Markov chain with starting distri-
bution p and stationary transitions P iff

𝒫{X(tm) = im for m = 0, …, n} = p(i0) ∏_{m=0}^{n−1} P(t_{m+1} − tm, im, i_{m+1})

when

0 = t0 < t1 < ⋯ < tn < ∞ and i0, …, in ∈ I.

If p(i) = 1, say the chain starts from i. By convention, an empty product is 1.
It is worth noting that 𝒫{X(t) ∈ I} = 1 for all t ≥ 0; it is impossible to
prove, for it is usually false or meaningless, that 𝒫{X(t) ∈ I for all t ≥ 0} = 1.
Does there always exist such a Markov chain? The answer is yes. This
will be proved as a by-product of some rather difficult arguments and only
for standard P. In other treatments, an affirmative answer is obtained
immediately. Here is a digression which sketches the usual procedure. For
simplicity, suppose I is finite. Let Ω be the set of all functions ω from [0, ∞)
to I; for t ≥ 0, let X(t, ω) = ω(t); let ℱ be the usual product σ-field on Ω,
namely, the smallest σ-field over which all the X(t) are measurable. From the
Kolmogorov consistency theorem (10.53), there is a unique probability 𝒫
on ℱ making {X(t) : 0 ≤ t < ∞} a Markov chain with the specified
distribution. It is commonly held that, for a reasonable Markov chain, almost
all the sample functions are step functions. How does X behave? Let S be
the set of ω ∈ Ω which are step functions. As everybody knows,

S includes no nonempty ℱ-set.

Indeed, the monotone class argument associates to each A ∈ ℱ a countable
subset C(A) of [0, ∞), such that:

ω ∈ A and ω' = ω on C(A) entails ω' ∈ A.

If A ∈ ℱ and A ⊂ S and ω ∈ A, there is an ω' ∈ Ω\S with ω' = ω on C(A):
a palpable contradiction. Consequently, S has inner 𝒫-measure 0. Therefore,
X is badly behaved; the most you can hope for is that S has outer 𝒫-measure
1, which is in fact the case for standard P and finite I. The usual method of
140 INTRODUCTION TO CONTINUOUS TIME [5

proof is to complete ℱ under 𝒫 and modify each X(t) on a set of 𝒫-measure
0, so the resulting process X* is separable; this uses an uncountable axiom of
choice. Then, you prove that for almost all ω, the function t → X*(t, ω) for
rational t is a step function. As a function of real t, it is separable, and there-
fore a step function. The method here is to construct one process which not
only has the required distribution, but also has step functions for all its
sample functions.
Most results in this chapter are standard. References are usually given for
proofs appropriated from others, but not for results. This section concludes
with lemma (4), which will be used in most of the constructions in the rest of
the book. Section 2 establishes the basic analytic properties of standard
stochastic semigroups; these results will also be used many times. Sections
3 and 4 are independent of Section 2, and cover a special topic: the analytic
properties of uniformly continuous semigroups. The results in this case are
simpler and more complete. Given a uniformly continuous stochastic
semigroup P, Section 7 constructs a Markov chain with stationary transitions
P, all of whose sample functions are step functions. This construction depends
on the construction in Section 6 of the general Markov chain whose sample
functions are step functions up to the first bad discontinuity, and are then
constant. Section 5 contains preliminary material on the exponential distri-
bution. I will refer to Section 5 repeatedly, and Section 6 occasionally, in
later constructions.

Finite I
Here is a summary of the results for finite I. Let P be a standard stochastic
semigroup on I. Then

  P′(0) = Q

exists and is finite; Q(i, i) ≤ 0; while Q(i, j) ≥ 0 for i ≠ j; and

  Σ_j Q(i, j) = 0.

Any finite matrix Q which satisfies these three conditions is the derivative at 0
of some standard stochastic P. Furthermore, Q determines P; in fact,

  P(t) = e^{Qt}.

Fix a standard stochastic P, and let Q = P′(0). Let

  q(i) = −Q(i, i).

Let

  r(i, j) = Q(i, j)/q(i) when i ≠ j and q(i) > 0
          = 0 otherwise.

Construct a process X as follows. The sample functions of X are I-valued
right continuous step functions. The length of each visit to j is exponential
with parameter q(j), independent of everything else. The sequence of states
visited by X is a discrete time Markov chain, with stationary transitions r
and starting state i. Then X is Markov with stationary transitions P and
starting state i.
Let Y be any Markov chain with stationary transitions P and starting
state i. Then there is a process y* such that:
(a) Y*(t) = Y(t) almost surely, for each fixed t;
(b) the y* sample functions are right continuous I-valued step functions.
The jumps and holding times in y* are automatically distributed like the
jumps and holding times in X. To be explicit, the sequence of states visited by
y* is a discrete time Markov chain, with stationary transitions r and starting
state i. The length of each visit to j is exponential with parameter q(j),
independent of everything else.
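The two descriptions above — jump chain plus exponential holding times on one side, P(t) = e^{Qt} on the other — can be checked against each other numerically. Here is a minimal sketch in Python; the 3-state generator Q below is an invented example, not one from the text.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Hypothetical 3-state generator Q: rows sum to 0, off-diagonals >= 0.
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -1.0, 0.0],
              [0.5, 0.5, -1.0]])
q = -np.diag(Q)                      # holding-time parameters q(i)
r = Q / q[:, None]                   # jump matrix r(i, j) = Q(i, j)/q(i)
np.fill_diagonal(r, 0.0)

def sample_path(i, horizon):
    """Visit states per the jump chain r; hold e(q(j)) in each state j."""
    t, states, times = 0.0, [i], [0.0]
    while t < horizon:
        t += rng.exponential(1.0 / q[states[-1]])
        times.append(t)
        states.append(rng.choice(len(q), p=r[states[-1]]))
    return states, times

def state_at(states, times, t):
    # right continuous step function: state during [times[k], times[k+1])
    k = np.searchsorted(times, t, side='right') - 1
    return states[k]

# Empirical marginal at time t compared with P(t) = e^{Qt}.
t = 0.7
hits = np.zeros(3)
for _ in range(20000):
    s, ts = sample_path(0, t + 1e-9)
    hits[state_at(s, ts, t)] += 1
print(hits / hits.sum())     # approximates row 0 of expm(Q * t)
print(expm(Q * t)[0])
```

The empirical distribution of X(t) started from state 0 should agree with row 0 of e^{Qt} up to Monte Carlo error.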

A lemma for general I


To state (4), let R(t) be a matrix on I for each t ∈ [0, ∞). Remember that
Ī = I for finite I, and Ī is the one-point compactification of discrete I for
countably infinite I. For each i, let X_i be an Ī-valued process on the prob-
ability triple (Ω_i, ℱ_i, P_i). Let ℱ_i(t) be the σ-field in Ω_i spanned by X_i(s) for
0 ≤ s ≤ t. Suppose

(3a) P_i{X_i(0) = i} = 1 for all i
(3b) P_i{X_i(t) ∈ I} = 1 for all t ≥ 0 and all i
(3c) P_i{A and X_i(t + s) = k} = P_i{A}·R(s, j, k) for all i, j, k in I,
     all nonnegative s and t, and all A ∈ ℱ_i(t) with
     A ⊂ {X_i(0) = i and X_i(t) = j}.

(4) Lemma. Suppose conditions (3). Then R is a stochastic semigroup.
Relative to P_i, the process X_i is Markov with stationary transitions R and
starting state i.
PROOF. In (3c), put t = 0, j = i, A = {X_i(0) = i}, and use (3a):

(5)  R(s, i, k) = P_i{X_i(s) = k} ≥ 0.

Sum out k ∈ I and use (3b):

  Σ_{k∈I} R(s, i, k) = P_i{X_i(s) ∈ I} = 1.

So R(s) is a stochastic matrix, taking care of (1a).

In (3c), put A = {X_i(0) = i and X_i(t) = j}, and use (3a, 5):

  R(t, i, j) · R(s, j, k) = P_i{X_i(t) = j and X_i(t + s) = k}.

Sum out j ∈ I and use (3b, 5):

  Σ_{j∈I} R(t, i, j) · R(s, j, k) = P_i{X_i(t) ∈ I and X_i(t + s) = k}
                                 = P_i{X_i(t + s) = k}
                                 = R(t + s, i, k).

So R satisfies (1b). Condition (1c) is taken care of by (3a) and (5). This
makes R a stochastic semigroup.

Let i₀, …, i_n, i_{n+1} ∈ I. Let 0 = t₀ < ⋯ < t_n < t_{n+1} < ∞. In (3c), put
i = i₀, j = i_n, k = i_{n+1}, t = t_n, and s = t_{n+1} − t_n. Put

  A = {X_i(t_m) = i_m for m = 0, …, n}.

Then

  P_{i₀}{A and X_{i₀}(t_{n+1}) = i_{n+1}} = P_{i₀}{A} · R(t_{n+1} − t_n, i_n, i_{n+1})
                                          = ∏_{m=0}^{n} R(t_{m+1} − t_m, i_m, i_{m+1})

by induction: the case n = 0 is (5). This and (3a) make X_i Markov with
stationary transitions R and starting state i, relative to P_i. ★
2. ANALYTIC PROPERTIES

Let P be a substochastic semigroup on the finite or countably infinite set I.
Except for (6), suppose

  P is standard.

(6) Lemma. Even if P is not standard,
(a) Σ_j P(t, i, j) is nonincreasing in t.
(b) If P(t) is stochastic for some t > 0, then P(t) is stochastic for all t.

PROOF. Claim (a). Compute.

  Σ_j P(t + s, i, j) = Σ_j Σ_k P(t, i, k)P(s, k, j)
                     = Σ_k Σ_j P(t, i, k)P(s, k, j)
                     ≤ Σ_k P(t, i, k).

Claim (b). Claim (a) shows that P(s) is stochastic for 0 ≤ s ≤ t. Now
P(u) = P(u/n)^n is visibly stochastic when u/n ≤ t. ★
NOTE. Fix i ∈ I. If Σ_{j∈I} P(t, i, j) = 1 for some t > 0, then equality holds
for all t. This harder fact follows from Lévy's dichotomy (ACM, 2.8).

(7) Lemma. For each i,

  P(t, i, i) > 0 for all t.

PROOF. P(t, i, i) ≥ P(t/n, i, i)^n, and P(t/n, i, i) → 1 as n → ∞. ★

(8) Lemma. Fix i ∈ I. If P(t, i, i) = 1 for some t > 0, then P(t, i, i) = 1
for all t.

PROOF. Let 0 < s < t. Then

  0 = 1 − P(t, i, i) ≥ Σ_{j≠i} P(s, i, j)P(t − s, j, j).

But P(t − s, j, j) > 0 by (7), forcing P(s, i, j) = 0 for all j ≠ i. Using (6a),

  P(s, i, i) = 1 − Σ_{j≠i} P(s, i, j) = 1.

For general s,

  P(s, i, i) ≥ P(s/n, i, i)^n = 1

when s/n < t. That is, P(s, i, i) = 1. ★

(9) Lemma. For each i and j in I, the function t → P(t, i, j) is continuous. In
fact,

  |P(t + s, i, j) − P(t, i, j)| ≤ 1 − P(|s|, i, i).

PROOF. It is enough to prove this for s > 0: replace t by t − s to get
s < 0. Now

  P(t + s, i, j) = Σ_k P(s, i, k)P(t, k, j);

so

  P(t + s, i, j) − P(t, i, j) = [P(s, i, i) − 1]P(t, i, j) + Σ_{k≠i} P(s, i, k)P(t, k, j).

But 0 ≤ P(t, k, j) ≤ 1 and Σ_{k≠i} P(s, i, k) ≤ 1 − P(s, i, i). ★
(10) Lemma. P′(0, i, i) exists and is nonpositive.

PROOF. Let f(t) = −log P(t, i, i). Then 0 ≤ f(t) < ∞ for all t > 0 by
(7), and f(0) = 0, and f is subadditive, and f is continuous by (9). Let

(11)  q = sup_{t>0} t⁻¹f(t).

If q = 0, then f ≡ 0 and P(t, i, i) ≡ 1. So assume q > 0. Fix a with
0 ≤ a < q. Fix t > 0 so that t⁻¹f(t) ≥ a. Think of s as small
and positive. Of course, t = ns + δ for a unique n = 0, 1, … and δ with
0 ≤ δ < s; both n and δ depend on s. So,

  a ≤ t⁻¹f(t) ≤ t⁻¹[nf(s) + f(δ)] = (t⁻¹ns)·s⁻¹f(s) + t⁻¹f(δ).

Let s → 0. Then t⁻¹ns → 1 and δ → 0, so

  a ≤ lim inf_{s→0} s⁻¹f(s).

Let a increase to q, proving

  lim_{s→0} s⁻¹f(s) = q.

In particular, f(s) > 0 and in consequence

  d(s) = 1 − P(s, i, i) > 0

for small positive s. Of course, d(s) → 0 as s → 0. For x > 0,

  lim_{x→0} x⁻¹[−log(1 − x)] = 1,

so

  lim_{s→0} f(s)/d(s) = lim_{s→0} −log[1 − d(s)]/d(s) = 1.

Consequently,

  lim_{s→0} d(s)/s = q. ★
Let
q(i) = −P′(0, i, i).

WARNING. q(i) = ∞ for all i ∈ I is a distinct possibility. For examples,
see Section 9.1, (ACM, Sec. 3.3), (B & D, Sec. 2.12).

A state i with q(i) < ∞ is called stable. If all i are stable, the semigroup or
process is called stable. A state i with q(i) = ∞ is called instantaneous.
A state i with q(i) = 0 is called absorbing; this is equivalent to P(t, i, i) = 1
for some or all positive t, by (8) or (12), below.

In view of (11),

(12)  P(t, i, i) ≥ e^{−q(i)t}.

This proves:

(13) If sup_i q(i) < ∞, then lim_{t→0} P(t, i, i) = 1 uniformly in i.

The converse of (13) is also true: see (29).
(14) Proposition. Fix i ∈ I with q(i) < ∞. Fix j ≠ i. Then P′(0, i, j) = Q(i, j)
exists and is finite. Moreover, Σ_{j∈I} Q(i, j) ≤ 0.

PROOF. I say

(15)  P(nδ, i, j) ≥ Σ_{m=0}^{n−1} P(δ, i, i)^m P(δ, i, j) P[(n − m − 1)δ, j, j].

Indeed, the mth term on the right side of (15) is the probability that a discrete
time chain with transitions P(δ) and starting state i stays in i for m moves,
then jumps to j, and is again in j at the nth move. Fix ε > 0. Using (12),
choose t > 0 so that mδ ≤ t implies P(δ, i, i)^m > 1 − ε, and s < t

implies P(s, j, j) > 1 − ε. For nδ < t, relation (15) implies

  P(nδ, i, j) > (1 − ε)² n P(δ, i, j).

That is,

(16)  (nδ)⁻¹P(nδ, i, j) > (1 − ε)² δ⁻¹P(δ, i, j).

Let Q(i, j) = lim sup_{δ→0} δ⁻¹P(δ, i, j). Let δ → 0 in such a way that
δ⁻¹P(δ, i, j) → Q(i, j), and let n → ∞ in such a way that nδ → s < t. From
(16),

  s⁻¹P(s, i, j) ≥ (1 − ε)² Q(i, j),

so lim inf_{s→0} s⁻¹P(s, i, j) ≥ Q(i, j). This proves the first claim.

For the second claim, rearrange Σ_j P(t, i, j) ≤ 1 to get

  [P(t, i, i) − 1]/t + Σ_{j≠i} P(t, i, j)/t ≤ 0.

Use (10) on the first term. Use the first claim and Fatou on the sum. ★

This proof is based on (Doob, 1953, p. 246).
(17) Lemma. Fix i ∈ I, and j ≠ i. Then P′(0, i, j) = Q(i, j) exists and is
finite.

This differs from (14) in that q(i) may be infinite. The proof is similar but
harder, and is taken from (Chung, 1960, II.2).

PROOF. Let δ > 0. Of course, P(δ) is a substochastic matrix on I, and
P(δ)^n = P(nδ). From Section 1.3, recall that {ξ_n} is Markov with stationary
transitions P(δ) and starting state i, relative to P(δ)_i. Let

  g(n) = P(δ)_i{ξ_n = i but ξ_m ≠ j for 0 < m < n}.

Then

(18)  P(nδ, i, j) ≥ Σ_{m=0}^{n−1} g(m)P(δ, i, j)P[(n − m − 1)δ, j, j],

since the mth term on the right is the P(δ)_i-probability that ξ_n = j, and the
first j among ξ₀, …, ξ_n occurs at the m + 1st place, and is preceded by an i.
Let

  f(n) = P(δ)_i{ξ_n = j but ξ_m ≠ j for 0 < m < n}.

Then

(19)  P(mδ, i, i) = g(m) + Σ_{ν=1}^{m−1} f(ν)P[(m − ν)δ, j, i]:

for {ξ_m = i} = A ∪ ⋃_ν B_ν,

where

  A = {ξ_m = i but ξ_μ ≠ j for 0 < μ < m}
  B_ν = {ξ_m = i and ξ_ν = j but ξ_μ ≠ j for 0 < μ < ν}.

Since Σ_{ν=1}^{m−1} f(ν) ≤ 1, relation (19) shows

(20)  g(m) ≥ P(mδ, i, i) − max {P(s, j, i): 0 ≤ s ≤ mδ}.

Fix ε > 0. Find t = t(i, j, ε) > 0 so small that P(s, i, i) > 1 − ε and
P(s, j, j) > 1 − ε for 0 ≤ s ≤ t; then P(s, j, i) < ε. If nδ < t, and m ≤ n,
then g(m) ≥ 1 − 2ε by (20). Combine this with (18):

  P(nδ, i, j) ≥ (1 − ε)(1 − 2ε) n P(δ, i, j).

Complete the argument as in (14). ★

The main result, due to (Doob, 1942) and (Kolmogorov, 1951), is
(21) Theorem. If P is a standard substochastic semigroup on the finite or
countably infinite set I, then P′(0, i, j) = Q(i, j) exists for all i, j and is finite
for i ≠ j.

PROOF. Use (10) and (17). ★

The matrix Q = P′(0) will be called the generator of P.

WARNING. Q is not the infinitesimal generator of P, and in fact does
not determine P. For examples, see Sections 6.3 and 6.5.

The following theorem, from (Ornstein, 1960) and (Chung, 1960, p. 269), will
not be used later, but is stated for its interest. A special case will be proved
later, in Section 7.6.

(22) Theorem. P(t, i, j) is continuously differentiable on (0, ∞). For i ≠ j, or
i = j and q(i) < ∞, the derivative is continuous at 0. For 0 < t < ∞,

  Σ_j |P′(t, i, j)| < ∞ and Σ_j P′(t, i, j) = 0.

For positive finite s and t,

  P′(s + t) = P′(s)P(t) = P(s)P′(t).

An example of Smith (1964) shows that P′(t, i, i) may oscillate as t → 0
if q(i) = ∞. A similar example is presented in Section 3.6 of ACM.

NOTE. Let Q be a matrix on I. When does there exist a standard stochastic
or substochastic semigroup P with P′(0) = Q? This is one of the most
interesting open questions in the Markov business. It is particularly intriguing
when Q is allowed to take the value −∞ on the diagonal. For some recent
work on this question, see (Williams, 1967); one of his results is discussed in

Section 3.5 of ACM. [This question was settled by Williams (1976), The
Q-matrix problem, Séminaire de Probabilités X, Lecture Notes in Mathe-
matics 511, Springer, Berlin.]

3. UNIFORM SEMIGROUPS

Let 𝒜 be a Banach algebra with norm ‖·‖ and identity Δ. That is, 𝒜 is a
noncommutative algebra over the reals, with identity Δ. By convention,
αA = Aα for real α and A ∈ 𝒜. For A, B ∈ 𝒜 and real α,

  ‖A + B‖ ≤ ‖A‖ + ‖B‖,  ‖AB‖ ≤ ‖A‖·‖B‖,  ‖αA‖ = |α|·‖A‖,
  A ≠ 0 implies ‖A‖ > 0, and ‖Δ‖ = 1.

Finally, 𝒜 is complete in the metric ρ(A, B) = ‖A − B‖. In this section,
convergence and continuity are with respect to ρ. Here are the local high
points.

For A ∈ 𝒜, by definition

  e^A = Δ + A + A²/2! + A³/3! + ⋯.

The series is Cauchy because

  ‖Aⁿ/n! + ⋯ + A^{n+m}/(n+m)!‖ ≤ ‖A‖ⁿ/n! + ⋯ + ‖A‖^{n+m}/(n+m)!,

so converges (completeness) to an element of 𝒜. The function A → e^A is
uniformly continuous on {A: ‖A‖ ≤ K < ∞}. If A and B commute, that is
AB = BA, then

  e^{A+B} = e^A e^B

by power series manipulation. If α is real,

  e^{αΔ} = e^α Δ.

If ‖A‖ < 1, then by definition,

  i(A) = Δ + A + A² + ⋯

and by manipulation,

  i(A)(Δ − A) = (Δ − A)i(A) = Δ.

Consequently, if B ∈ 𝒜 and β is real and ‖Δ − βB‖ < 1, then B is invertible:
there is a unique C ∈ 𝒜 with BC = CB = Δ. The uniqueness follows by
familiar algebra from the existence. Write C = B⁻¹, and call C the inverse of
B. Check that (e^A)⁻¹ = e^{−A}, because A and −A commute.

If f is a continuous function from [0, ∞) to 𝒜, and 0 ≤ a < b < ∞, then
∫_a^b f(t) dt is defined as the limit of the usual Riemann sums. The old arguments
show that f is uniformly continuous on [a, b], so the limit exists and depends
linearly on f. Moreover,

  ‖∫_a^b f(t) dt‖ ≤ ∫_a^b ‖f(t)‖ dt.

For A ∈ 𝒜,

  ∫_a^b Af(t) dt = A ∫_a^b f(t) dt  and  ∫_a^b f(t)A dt = [∫_a^b f(t) dt] A.

For c > 0,

  ∫_a^b f(t + c) dt = ∫_{a+c}^{b+c} f(t) dt.

For a < b < c,

  ∫_a^b f(t) dt + ∫_b^c f(t) dt = ∫_a^c f(t) dt.

Finally, if g is a continuous, real-valued function on [0, t], then "by Fubini,"

  ∫_0^t ∫_0^u g(u)f(v) dv du = ∫_0^t ∫_v^t g(u)f(v) du dv.
Let {P(t): 0 ≤ t < ∞} be a uniform semigroup, namely,

  P(t) ∈ 𝒜,  P(t + s) = P(t)P(s),  P(0) = Δ;

and

  t → P(t) is continuous.

(23) Theorem. t⁻¹[P(t) − Δ] → Q in 𝒜 as t → 0, and P(t) = e^{Qt}.

PROOF. Let A(t) = ∫_0^t P(u) du. Then t⁻¹A(t) → Δ as t → 0, so A(s) is
invertible for some small positive s. Let

  Q = [P(s) − Δ]A(s)⁻¹.

Now

  P(u)A(s) = ∫_0^s P(u)P(v) dv
           = ∫_0^s P(u + v) dv
           = ∫_u^{u+s} P(v) dv
           = A(u + s) − A(u).

So

  [P(u) − Δ]A(s) = A(u + s) − A(u) − A(s)
                 = [P(s) − Δ]A(u)
                 = A(u)[P(s) − Δ].

Multiply on the right by A(s)⁻¹:

  P(u) = Δ + A(u)Q.

By induction,

  P(t) = Δ + Qt + Q²t²/2! + ⋯ + Qⁿtⁿ/n! + R_n,

where

  R_n = (1/n!)[∫_0^t (t − u)ⁿ P(u) du] Q^{n+1}.

Indeed, substituting Δ + A(u)Q for P(u) in the formula for R_n shows

  R_n = Q^{n+1}t^{n+1}/(n+1)! + R_{n+1}.

Fix T > 0 and let M = max {‖P(t)‖: 0 ≤ t ≤ T}. Then for 0 ≤ t ≤ T,

  ‖R_n‖ ≤ M‖Q‖^{n+1}T^{n+1}/(n+1)! → 0. ★
For more information, see (Dunford and Schwartz, 1958, VIII.1).
(24) Remark. If Q ∈ 𝒜, then t → e^{Qt} is a uniform semigroup. What (23)
says is that all uniform semigroups are of this exponential type.

(25) Remark. Let Q ∈ 𝒜 and P(t) = e^{Qt}. Let h > 0. Then

  P(t + h) − P(t) = (e^{Qh} − Δ)P(t) = P(t)(e^{Qh} − Δ)
  P(t) − P(t − h) = (e^{Qh} − Δ)P(t − h) = P(t − h)(e^{Qh} − Δ).

By looking at the power series,

  lim_{h→0} (1/h)(e^{Qh} − Δ) = Q.

Thus, P′(t) = QP(t) = P(t)Q.
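For a concrete feel for (23) and (25), both limits can be checked numerically with a matrix exponential. A minimal sketch; the 2×2 matrix Q is an arbitrary example, not one from the text.

```python
import numpy as np
from scipy.linalg import expm

# An arbitrary bounded Q, an element of the Banach algebra of matrices
# with finite norm; any square real matrix will do here.
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
I2 = np.eye(2)

# (23): t^{-1}[P(t) - I] -> Q as t -> 0; the error shrinks like O(t).
for t in [1e-2, 1e-4, 1e-6]:
    diff = (expm(Q * t) - I2) / t
    print(t, np.abs(diff - Q).max())

# (25): P'(t) = Q P(t) = P(t) Q, checked by a central difference at t = 0.5.
t, h = 0.5, 1e-6
num = (expm(Q * (t + h)) - expm(Q * (t - h))) / (2 * h)
print(np.abs(num - Q @ expm(Q * t)).max())
print(np.abs(num - expm(Q * t) @ Q).max())
```

The printed errors should decrease with t in the first loop, and be near machine precision in the derivative check.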
Let f be a function from [0, ∞) to 𝒜. Say f is differentiable iff

  (d/dt) f(t) = f′(t) = lim_{ε→0} ε⁻¹[f(t + ε) − f(t)]

exists for all t ≥ 0. If f and g are differentiable, so is fg; namely, (fg)′ =
f′g + fg′. If f is differentiable and A ∈ 𝒜, then Af and A + f are

differentiable: (Af)′ = Af′ and (A + f)′ = f′. If f is differentiable and
f′ ≡ 0, then f is constant. Indeed, replace f by −f(0) + f to get f(0) = 0.
Then

  f(s) = f(s) − f(t) + f(t),

so

  ‖f(s)‖ − ‖f(t)‖ ≤ ‖f(s) − f(t)‖.

In particular, t → ‖f(t)‖ is real-valued, vanishes at 0, and has vanishing
derivative. So ‖f(t)‖ = 0 for all t.

Fix Q ∈ 𝒜. If P(t) = e^{Qt}, remark (25) shows

  P′(t) = Qe^{Qt} = e^{Qt}Q.

Conversely, if P is differentiable, P(0) = Δ, and either

  P′(t) = QP(t) for all t ≥ 0

or

  P′(t) = P(t)Q for all t ≥ 0,

then P(t) = e^{Qt}. Indeed, with the first condition,

  (d/dt)[e^{−Qt}P(t)] = e^{−Qt}P′(t) − e^{−Qt}QP(t)
                      = e^{−Qt}[P′(t) − QP(t)]
                      = 0.

So, e^{−Qt}P(t) is constant; put t = 0 to see the constant is Δ. Thus, P(t) = e^{Qt}.
For the second condition, work on P(t)e^{−Qt}. In particular, P is a semigroup.
This result can be summarized as

(26) Theorem. Let Q ∈ 𝒜. Then P(t) = e^{Qt} is the unique solution of either
the forward system

  P(0) = Δ and P′(t) = P(t)Q

or the backward system

  P(0) = Δ and P′(t) = QP(t).

I learned the argument from Harry Reuter.

4. UNIFORM SUBSTOCHASTIC SEMI GROUPS

Suppose the standard substochastic semigroup P(·) on the finite or count-
ably infinite set I is uniform, that is,

  lim_{t→0} P(t, i, i) = 1 uniformly in i.

This condition is automatically satisfied if I is finite, a case treated in (Doob,
1953, VI.1). If A is a matrix on I, define ‖A‖ as sup_i Σ_j |A(i, j)|. The set 𝒜
of matrices A with ‖A‖ < ∞ is a Banach algebra with norm ‖·‖ and identity
Δ, where Δ(i, j) is 1 or 0 according as i = j or i ≠ j. And ‖A‖ ≤ 1 if A is
substochastic. If 0 ≤ s < t < ∞ and h = t − s, then

(27)  ‖P(t) − P(s)‖ = ‖(P(h) − Δ)P(s)‖
                    ≤ ‖P(h) − Δ‖
                    ≤ 2 sup_i [1 − P(h, i, i)].

So P is uniform in the sense of Section 3. As (23) implies, P has derivative
Q ∈ 𝒜 at 0, and P(t) = e^{Qt}. Clearly, Q(i, j) = P′(0, i, j), where ′ is the cal-
culus derivative.

What matrices can arise? Plainly,

(28a) Q(i, i) ≤ 0 for all i
(28b) Q(i, j) ≥ 0 for all i ≠ j.

Inequality (28a) implies −Q(i, i) ≤ ‖Q‖; so Q ∈ 𝒜 implies

(28c) inf_i Q(i, i) > −∞.

By rearranging Σ_j P(t, i, j) ≤ 1,

  [P(t, i, i) − 1]/t + Σ_{j≠i} P(t, i, j)/t ≤ 0.

By Fatou:

(28d) Σ_j Q(i, j) ≤ 0.

(29) Theorem. Let P be a uniform substochastic semigroup on I. Then
Q = P′(0) exists and satisfies condition (28). Conversely, let Q be a matrix on I
which satisfies (28). Then there exists a unique uniform substochastic semigroup
P with P′(0) = Q, namely

  P(t) = e^{Qt}.

Finally, for any t > 0:

  P(t) is stochastic iff Σ_j Q(i, j) = 0 for all i.

NOTE. Suppose Q satisfies (28) and P is a standard substochastic semi-
group on I with P′(0) = Q. Then P is uniform by (13).

PROOF. The first assertion has already been argued. For "conversely,"
P(t) = e^{Qt} is a uniform 𝒜-valued semigroup with P′(0) = Q by (24–25). The
uniqueness of P is part of (23). When is P stochastic or substochastic?

Choose a positive, finite q with

  Q(i, i) ≥ −q for all i.

Let

  Q̃ = (1/q)Q + Δ,

where Δ is the identity matrix. Then Q̃ is always substochastic. And Q̃ is
stochastic iff Σ_j Q(i, j) = 0 for all i. But Qt = −qtΔ + qtQ̃, so

  e^{Qt} = e^{−qt} Σ_{n=0}^∞ ((qt)ⁿ/n!) Q̃ⁿ

is the Poisson average of powers of Q̃, all of which are substochastic. Thus,
e^{Qt} is always substochastic. Let t > 0. Then Q̃ appears with positive weight,
so e^{Qt} is stochastic iff Q̃ is: that is, iff Σ_j Q(i, j) = 0 for all i. ★

This proof was suggested by John Kingman.
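The Poisson-average identity in the proof is easy to verify numerically. A sketch, with a made-up Q satisfying (28) whose second row sum is strictly negative, so P(t) comes out substochastic:

```python
import numpy as np
from scipy.linalg import expm
from math import exp, factorial

# Hypothetical Q satisfying (28): bounded diagonal, nonnegative off-diagonal
# entries, row sums <= 0 (row 1 has sum -0.5, so it leaks mass).
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -2.0, 0.5],
              [0.0, 2.0, -2.0]])
q = 3.0                              # any q with Q(i, i) >= -q for all i
Qt = Q / q + np.eye(3)               # the substochastic matrix Q-tilde

t = 0.8
# Poisson average of powers of Q-tilde, truncated at 60 terms;
# the weights are e^{-qt}(qt)^n/n!.
P = sum(exp(-q * t) * (q * t) ** n / factorial(n)
        * np.linalg.matrix_power(Qt, n) for n in range(60))
print(np.abs(P - expm(Q * t)).max())     # agrees with e^{Qt}
print(P.sum(axis=1))                     # all row sums <= 1; row 1 strictly < 1
```

Each power Q̃ⁿ is substochastic, so the Poisson mixture is too, matching the argument in the proof.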

5. THE EXPONENTIAL DISTRIBUTION

Let T be a random variable on the probability triple (Ω, ℱ, 𝒫).

DEFINITION. Let 0 < q < ∞. Say T has exponential distribution with
parameter q, abbreviated as T is e(q), iff 𝒫{T > t} = e^{−qt} for all t ≥ 0.
Then 𝒫{T = t} = 0 and 𝒫{T ≥ t} = e^{−qt} for t ≥ 0.

To state (30), suppose T is e(q) on (Ω, ℱ, 𝒫). Let Σ be a sub-σ-field of ℱ,
independent of T. Let S be Σ-measurable; allow 𝒫{S = ∞} > 0. Let
A ∈ Σ. Let u and t be nonnegative and finite.

(30) Lemma.

  𝒫{A and S ≤ t ≤ t + u < S + T} = e^{−qu} 𝒫{A and S ≤ t < S + T}.

PROOF. Here is a computation.

  𝒫{A and S ≤ t ≤ t + u < S + T} = ∫_{A and S ≤ t} e^{−q(t+u−S)} d𝒫
                                  = e^{−qu} ∫_{A and S ≤ t} e^{−q(t−S)} d𝒫
                                  = e^{−qu} 𝒫{A and S ≤ t < S + T}.

The first and last equalities come from Fubini (10.21). I will set the first one
up for you. Use (Ω, ℱ, 𝒫) for the basic triple. Let X₁(ω) = ω and let
(Ω₁, ℱ₁) = (Ω, Σ). Let X₂ = T; let (Ω₂, ℱ₂) be the Borel half-line [0, ∞).
Let

  f(ω, y) = 1 if ω ∈ A and S(ω) ≤ t ≤ t + u < S(ω) + y
          = 0 otherwise.

Then f is ℱ₁ × ℱ₂-measurable. And f(ω, ·) ≡ 0 unless ω ∈ A and S(ω) ≤ t.
If ω ∈ A and S(ω) ≤ t, then

  ∫ f[ω, T(ω′)] 𝒫(dω′) = e^{−q[t+u−S(ω)]}.

Let φ be 𝒫 retracted to Σ. So

  𝒫{A and S ≤ t ≤ t + u < S + T} = ∫ f[X₁(ω), X₂(ω)] 𝒫(dω)
                                  = ∫∫ f[ω, T(ω′)] 𝒫(dω′) φ(dω)
                                  = ∫_{A and S ≤ t} e^{−q(t+u−S)} d𝒫. ★

(31) Lemma. If T is e(q) and λ > −q, the expectation of e^{−λT} is q/(q + λ).
In particular, the expectation of Tⁿ is n!/qⁿ, and the variance of T is 1/q².

PROOF. Integrate and expand. ★

Let T and η be independent random variables on (Ω, ℱ, 𝒫).

(32) Lemma. (a) If T has a density bounded above by B, so does T + η.
(b) If T has a continuous density bounded above by B, so does T + η.

PROOF. Claim (a) is easy. Use (a) and dominated convergence for (b). ★

NOTE. If T is e(q), its density is bounded above by q, but is discontinuous
at 0, as a function on the real line.

To avoid exceptional cases, I will sometimes write e(0) for the distribution
concentrated at ∞, and e(∞) for the distribution concentrated at 0. Then
e(q) has mean 1/q, where 0 and ∞ are reciprocals. For lemmas (33) and (34),
let T_m be independent e(q_m), for nonnegative integer m, on (Ω, ℱ, 𝒫),
with 0 ≤ q_m < ∞. Let ∞ + x = ∞ for x > −∞. Write E for 𝒫-expectation.

(33) Lemma. Let M be a finite or infinite subset of the nonnegative integers.
Then Σ_{m∈M} T_m < ∞ a.e. if Σ_{m∈M} 1/q_m < ∞, and Σ_{m∈M} T_m = ∞ a.e. if
Σ_{m∈M} 1/q_m = ∞.

PROOF. Abbreviate S = Σ_{m∈M} T_m. As (31) implies, E(S) = Σ_{m∈M} 1/q_m,
proving the first assertion. For the second, let e^{−∞} = 0, so 0 ≤ e^{−x} ≤ 1 is
continuous and decreasing on [0, ∞]. By (31) and monotone convergence,

  E(e^{−S}) = ∏_{m∈M} q_m/(q_m + 1) = 1/∏_{m∈M} (1 + q_m⁻¹) = 0

when Σ_{m∈M} 1/q_m = ∞. Then e^{−S} = 0 a.e., forcing S = ∞ a.e. ★

(34) Lemma. Let M be a finite or infinite subset of the nonnegative integers
containing at least 0 and 1. For m ∈ M, suppose 0 < q_m < ∞, and suppose
that Σ_{m∈M} 1/q_m < ∞. Then Σ_{m∈M} T_m has a continuous density vanishing on
(−∞, 0] and bounded above by q₀.

PROOF. Lemma (32b) reduces M to {0, 1}. Then, the density at t ≥ 0 is

  q₀q₁ ∫_0^t e^{−q₀(t−s)} e^{−q₁s} ds,

while the density at t < 0 is 0. This function of t is continuous. For t ≥ 0,
the integrand is at most e^{−q₁s}, and ∫_0^∞ q₁e^{−q₁s} ds = 1. So the density is at
most q₀. ★
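Lemmas (30), (31), and (33) all lend themselves to quick Monte Carlo checks. A sketch (the parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# (31): T is e(q) has E(T^n) = n!/q^n and Var(T) = 1/q^2.
q = 2.5
T = rng.exponential(1.0 / q, size=200_000)
print(T.mean(), 1 / q)
print(T.var(), 1 / q ** 2)

# (30) in the special case S = 0, A = the whole space:
# P{T > t + u} = e^{-qu} P{T > t}  (memorylessness).
t, u = 0.3, 0.5
print((T > t + u).mean(), np.exp(-q * u) * (T > t).mean())

# (33), convergent case, with q_m = (m+1)^2: E(S) = sum 1/q_m, and
# E(e^{-S}) = prod q_m/(q_m + 1), the product used in the proof.
qm = np.arange(1.0, 51.0) ** 2
S = rng.exponential(1.0 / qm, size=(100_000, 50)).sum(axis=1)
print(S.mean(), (1.0 / qm).sum())
print(np.exp(-S).mean(), np.prod(qm / (qm + 1)))
```

Each printed pair should agree to within Monte Carlo error; note that NumPy's `exponential` takes the scale 1/q, not the rate q.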
6. THE STEP FUNCTION CASE

In this section, I will construct the general Markov chain whose sample
functions are initially step functions, and are constant after the first bad
discontinuity. The generality follows from (7.33). This construction will be
used in Section 7 and in Chapters 6–8, so I really want you to go through it,
even though the first reading may be difficult. I hope you will eventually
feel that the argument is really simple.

Let I be a finite or countably infinite set. Give I the discrete topology. Let
0 < a ≤ ∞. Let f be a function from [0, a) to I. Then f is an I-valued right
continuous step function on [0, a) provided

  f(t) = lim {f(s): s ↓ t} for 0 ≤ t < a
  f(t−) = lim {f(s): s ↑ t} exists in I for 0 < t < a.

Let Γ be a substochastic matrix on I, with

  Γ(i, i) = 0 for all i.

Let q be a function from I to [0, ∞).

Informal statement of(39)


Construct a process X as follows. The sample functions of X are I-valued
right continuous step functions, at least initially. The length of each visit to j
is exponential with parameter q(j), independent of everything else. The

sequence of states visited by X is a discrete time Markov chain, with stationary
transitions Γ and starting state i ∈ I. This may define the X sample function
only on a finite time interval. Indeed, if Σ_k Γ(j, k) < 1, the jump process may
disappear on leaving j. And even if X succeeds in visiting a full infinite
sequence of states, the sum of the lengths of the visits may be finite. In either
case, continue the X sample function over the rest of [0, ∞) by making it
equal to ∂ ∉ I. Then X is Markov with stationary transitions R, where R is a
standard stochastic semigroup on I ∪ {∂}, with ∂ absorbing. Moreover,

  R′(0, j, j) = −q(j) for j in I
  R′(0, j, k) = q(j)Γ(j, k) for j ≠ k in I.

Formalities resume
Fix ∂ ∉ I. Extend Γ to Î = I ∪ {∂} by setting

  Γ̂(i, j) = Γ(i, j) for i, j ∈ I
  Γ̂(i, ∂) = 1 − Σ_{j∈I} Γ(i, j) for i ∈ I
  Γ̂(∂, i) = 0 for i ∈ I
  Γ̂(∂, ∂) = 1.

Thus, Γ̂ is stochastic on Î. Extend q to Î by setting q(∂) = 0. Define a matrix
Q on Î as follows:

  Q(i, i) = −q(i) for i ∈ Î
  Q(i, j) = q(i)Γ̂(i, j) for i ≠ j in Î.

Introduce the set 𝒳 of pairs x = (w, ω), where w is a sequence of elements
of Î, and ω is a sequence of elements of (0, ∞]. Let

  ξ_n(w, ω) = w(n) ∈ Î and τ_n(w, ω) = ω(n) ∈ (0, ∞]

for n = 0, 1, …. Give 𝒳 the product σ-field, namely the smallest σ-field
over which ξ₀, ξ₁, … and τ₀, τ₁, … are measurable. Of course, 𝒳 is Borel.

INFORMAL NOTE. The process X begins by visiting ξ₀, ξ₁, … with holding
times τ₀, τ₁, ….

For each i ∈ I, let π_i be the unique probability on 𝒳 for which, semi-
formally:

(a) ξ₀, ξ₁, … is a discrete time Markov chain with stationary stochastic
transitions Γ̂ on Î and starting state i;

(b) given ξ₀, ξ₁, …, the random variables τ₀, τ₁, … are conditionally
independent and exponentially distributed, with parameters q(ξ₀),
q(ξ₁), ….

INFORMAL NOTE. π_i makes X a Markov chain with starting state i and
generator Q.

More rigorously: introduce W, the space of sequences ω of elements of
(0, ∞]. Give W the product σ-field. For each function r from {0, 1, …} to
[0, ∞), let η_r be the probability on W making the coordinates independent
and exponentially distributed, the nth coordinate having parameter r(n).
For w ∈ Î^∞, the set of Î-sequences, let q(w) be the function on {0, 1, …}
whose value at n is q(w(n)). Think of ξ = (ξ₀, ξ₁, …) as projecting 𝒳 onto
Î^∞, and τ = (τ₀, τ₁, …) as projecting 𝒳 onto W. Then π_i, a probability on
𝒳, is uniquely defined by the two requirements:

(a) π_i ξ⁻¹ = Γ̂_i;
(b) η_{q(w)} is a regular conditional π_i-distribution for τ given ξ = w.

By (10.43), this amounts to defining

(35)  π_i{A} = ∫ η_{q(w)}{A(w)} Γ̂_i(dw),

where:

(a) Γ̂_i was defined in Section 1.3 as the probability on Î^∞ making the
coordinate process a Γ̂-chain starting from i;
(b) A is a product measurable subset of 𝒳;
(c) A(w) = {ω: ω ∈ W and (w, ω) ∈ A} is the w-section of A.

There is a more elementary characterization of π_i, which is also useful:
π_i is the unique probability on 𝒳 for which

(36)  π_i{ξ₀ = i₀, …, ξ_n = i_n and τ₀ > t₀, …, τ_n > t_n} = p e^{−t},

where t is the sum

  q(i₀)t₀ + ⋯ + q(i_n)t_n

and p is the product

  Γ̂(i₀, i₁) ⋯ Γ̂(i_{n−1}, i_n);

this must hold for all n and all Î-sequences i₀ = i, i₁, …, i_n, and all non-
negative numbers t₀, …, t_n. By convention, an empty product is 1. Use
(10.16) for the uniqueness of π_i. For an easy construction of π_i, let ζ₀, ζ₁, …
be Markov with transitions Γ̂ starting from i, on some triple. Independent of

ζ, let T₀, T₁, … be independent and exponential with parameter 1. Let
δ_n = T_n/q(ζ_n). Then π_i is the distribution of (ζ, δ).

If π_i satisfies the first characterization (35), it satisfies the second (36) by
this computation. Let A be the set of w ∈ Î^∞ with w(m) = i_m for
m = 0, …, n. Let B be the set of ω ∈ W such that ω(m) > t_m for
m = 0, …, n. Then

  π_i(A × B) = ∫_A η_{q(w)}(B) Γ̂_i(dw) = p e^{−t},

because Γ̂_i(A) = p and η_{q(w)}(B) = e^{−t} for all w ∈ A. Since both character-
izations pick out a unique probability, they are equivalent.

From either characterization,

(37)  π_i{ξ₀ = i₀, …, ξ_n = i_n and (τ₀, …, τ_n) ∈ B} = pq,

where: B is an arbitrary Borel subset of R^{n+1}; and q is the probability that

  (U₀, …, U_n) ∈ B,

U₀, …, U_n being independent exponential random variables, with par-
ameters q(i₀), …, q(i_n), respectively; and p is defined as for (36).
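Characterization (36) is easy to test by simulation: generate (ξ, τ) as in the easy construction above, and compare the empirical frequency of a cylinder event with p·e^{−t}. A sketch, with a made-up Γ̂ and q on a three-point Î (state 2 plays the role of ∂):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stochastic Ghat on {0, 1, 2}; q(2) = 0, so the holding
# time there is e(0), i.e. infinite.
Ghat = np.array([[0.0, 0.8, 0.2],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
q = np.array([2.0, 3.0, 0.0])

def draw_xi_tau(i, n):
    """One draw of (xi_0..xi_n, tau_0..tau_n) under pi_i."""
    xi = [i]
    for _ in range(n):
        xi.append(rng.choice(3, p=Ghat[xi[-1]]))
    tau = [rng.exponential(1.0 / q[j]) if q[j] > 0 else np.inf for j in xi]
    return xi, tau

# Event in (36): xi_0 = 0, xi_1 = 1, tau_0 > t0, tau_1 > t1.
i0, i1, t0, t1 = 0, 1, 0.2, 0.1
trials, count = 100_000, 0
for _ in range(trials):
    xi, tau = draw_xi_tau(0, 1)
    if xi[0] == i0 and xi[1] == i1 and tau[0] > t0 and tau[1] > t1:
        count += 1
p = Ghat[i0, i1]
target = p * np.exp(-(q[i0] * t0 + q[i1] * t1))
print(count / trials, target)   # the two should be close
```

Here p = Γ̂(i₀, i₁) and t = q(i₀)t₀ + q(i₁)t₁, exactly as in (36).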

Figure 1. [A sample function of X: it visits ξ₀, ξ₁, ξ₂, … with holding times
τ₀, τ₁, τ₂, …; in the picture, N(t) = 1 and N(t + s) = 4.]

Let σ = Σ_{n=0}^∞ τ_n, so 0 < σ ≤ ∞. For x ∈ 𝒳 and t < σ(x), let N(t, x)
be the least m with

  τ₀(x) + ⋯ + τ_m(x) > t.

For t ≥ σ(x), let N(t, x) = ∞. Define a process X on 𝒳 as in Figures 1 and 2:

  X(t, x) = ξ_m(x) when t < σ(x) and N(t, x) = m;
  X(t, x) = ∂ when t ≥ σ(x).
Figure 2. [Three sample functions of X: d < ∞, so σ = ∞ (X ends with an
infinite visit, to ∂ or to an absorbing state in I); d = ∞ and σ = ∞; and
d = ∞ and σ < ∞ (X has a bad discontinuity at σ, and is ∂ thereafter).]

Check that X is jointly measurable and Î-valued. Joint measurability means
relative to the product of the Borel σ-field on [0, ∞) and the σ-field on 𝒳.

FACT. If X(·, x) is I-valued everywhere, then σ(x) = ∞, and X(·, x) is
a right continuous I-valued step function.

DEFINITION. Let d be the least n if any with ξ_n = ∂, and d = ∞ if none.
Call j ∈ I absorbing iff q(j) = 0. Let 𝒳₀ be the subset of 𝒳 where for all n:

  τ_n = ∞ iff ξ_n is absorbing.

Discussion

Use (37) to check π_i(𝒳₀) = 1. Confine x to 𝒳₀. Then ξ_n = ∂ implies
ξ_{n+1} = ∂, because Γ̂(∂, ∂) = 1. And ξ_n ≠ ∂ implies ξ_{n+1} ≠ ξ_n, because
Γ(j, j) = 0. Look at Figure 2 while you think about the next lot of assertions.
If d < ∞, then σ = ∞, and the X sample function is a step function, which
usually terminates with a ∂; but if ξ visits an absorbing state in I before the
first visit to ∂, then the sample function of X is an I-valued step function,
with a finite number of steps. If d = ∞ and σ = ∞, then the X sample
function is an I-valued step function, which typically has an infinite number of
steps; but if ξ visits an absorbing state in I, there are only a finite number. If
d = ∞ and σ < ∞, then ξ cannot visit an absorbing state; the X sample
function is an I-valued step function on [0, σ), has a bad discontinuity at
σ, and is ∂ after σ. In any case, each X sample function is continuous from
the right.

NOTE. If ξ_n is absorbing, so τ_n = ∞, then the values of ξ_{n+m} and τ_{n+m}
for m > 0 have no effect on X. Consequently, Γ could be normalized so
Γ(j, k) = 0 for absorbing j ∈ I. Suppose this done. Keep x confined to 𝒳₀.
Then ξ_n is the nth state visited by X, with holding time τ_n, for 0 ≤ n < d.
If d = ∞, then ξ visits no absorbing states, and X performs an infinite
sequence of visits but may still ooze into ∂. If d < ∞ and ξ_{d−1} is absorbing,
then ξ_{d−1} is the last state visited by X. If d < ∞ and ξ_{d−1} is not absorbing,
then ξ_d = ∂ is the last state visited by X. If the row sums of Γ are 1 except
at the absorbing states, then d < ∞ implies that X terminates with a visit of
infinite length at ξ_{d−1} ∈ I.

WARNING. Outside 𝒳₀, it is likely that ξ₁ = ξ₀. So τ₀ and ξ₁ cannot be
recovered from X. If x is confined to 𝒳₀, however, then τ₀ is measurable on
X; if x is confined to {𝒳₀ and τ₀ < ∞}, then ξ₁ is measurable on X. If

Γ(i, i) > 0, contrary to assumption, then ξ₁ = ξ₀ would have positive
π_i-probability. So ξ₁ would not be the state X visits on leaving ξ₀. The holding
time parameter for i would be

  q(i)[1 − Γ(i, i)].

The probability of jumping to j on leaving i would be

  Γ(i, j)/[1 − Γ(i, i)].

And X would still be Markov with stationary transitions. Transition (i, i)
would have derivative

  −q(i)[1 − Γ(i, i)].

Transition (i, j) would still have derivative

  q(i)Γ(i, j).

The argument is about the same, although it's harder to compute the
derivative.
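The construction just described can be sketched in code: follow the jump chain Γ̂ on Î, hold an e(q(j)) time in each state j of I, and sit forever at the adjoined absorbing state after leaving I. This is a simulation sketch only — the matrices are invented, and it ignores the σ < ∞ pathology, which cannot occur with two states and bounded rates:

```python
import numpy as np

rng = np.random.default_rng(3)

DEL = 2   # index standing for the adjoined absorbing state

# Hypothetical substochastic Gamma on I = {0, 1} with zero diagonal;
# row 1 leaks mass 0.3, so the jump process may disappear on leaving 1.
Gamma = np.array([[0.0, 1.0],
                  [0.7, 0.0]])
q = np.array([2.0, 1.0])

def x_at(i, t):
    """Value X(t) for one sample path started at i in I."""
    clock, j = 0.0, i
    while True:
        if j == DEL:
            return DEL                  # absorbing: X stays forever
        clock += rng.exponential(1.0 / q[j])
        if clock > t:
            return j                    # still holding in j at time t
        leak = 1.0 - Gamma[j].sum()
        j = rng.choice([0, 1, DEL], p=[Gamma[j, 0], Gamma[j, 1], leak])

samples = np.array([x_at(0, 3.0) for _ in range(20_000)])
print(np.bincount(samples, minlength=3) / len(samples))
```

The printed vector estimates the distribution of X(3) started from 0; its last entry is the mass that has oozed into the adjoined state.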

Theorems

(38) Theorem. Let i ∈ I. Then X has I-valued sample functions, which are
right continuous step functions on [0, ∞), with π_i-probability 1 iff

Σ_n {1/q(ξ_n) : 0 ≤ n < d} = ∞

with π_i-probability 1. More precisely, let σ be the least t if any with X(t) = ∂,
and σ = ∞ if none. Then A = {σ = ∞} differs by a π_i-null set from the set B
where

Σ_n {1/q(ξ_n) : 0 ≤ n < d} = ∞.

SHORT PROOF. Condition on ξ and use (33). *

LONG PROOF. The problem is to show π_i{A Δ B} = 0; as usual,

A Δ B = (A\B) ∪ (B\A).

For C ⊂ 𝔛 = Ī^∞ × W and ω ∈ Ī^∞, let C(ω) be the
ω-section of C, namely the set of w ∈ W with (ω, w) ∈ C. By (35),

π_i{A Δ B} = ∫ η_{q(ω)}{A(ω) Δ B(ω)} r_i(dω).

Temporarily, let d(ω) be the least n if any with ω(n) = ∂, and d(ω) = ∞ if
none. Let B̄ be the set of ω ∈ Ī^∞ such that

Σ_n {1/q(ω(n)) : 0 ≤ n < d(ω)} = ∞.

5.6] THE STEP FUNCTION CASE 161

Then B = B̄ × W, so B(ω) = ∅ for ω ∉ B̄ and B(ω) = W for ω ∈ B̄. And

π_i{A Δ B} = ∫_{ω∉B̄} η_{q(ω)}{A(ω)} r_i(dω) + ∫_{ω∈B̄} η_{q(ω)}{W\A(ω)} r_i(dω).

Check

σ = Σ_n {τ_n : 0 ≤ n < d}.

Temporarily, let τ_n be the coordinate process on W. Then A(ω) is the subset
of W where

Σ_n {τ_n : 0 ≤ n < d(ω)} = ∞.

With respect to η_{q(ω)}, the variables τ_n are independent and exponential with
parameter q(ω(n)). For ω ∉ B̄,

η_{q(ω)}{A(ω)} = 0

by the first assertion in (33). For ω ∈ B̄,

η_{q(ω)}{W\A(ω)} = 0

by the second assertion in (33). *
(39) Theorem. With respect to π_i, the process X is Markov with stationary
transitions, say R, and starting state i. Here R is a standard stochastic semigroup
on Ī, for which ∂ is absorbing. Moreover, R′(0) = Q.

NOTE. The retract of R to I is stochastic iff for any i ∈ I, with π_i-probability
1, the sample functions of X are I-valued everywhere: you can prove
this directly. You know that an I-valued sample function is automatically a
right continuous step function.
PROOF. Let 0 ≤ t < ∞ and let 0 < s < ∞. Let 𝓕(t) be the smallest
σ-field over which all the X(u) are measurable, for 0 ≤ u ≤ t. The main
thing to prove is the Markov property:

(40) π_i{A and X(t + s) = k} = π_i{A} · π_j{X(s) = k},

where

A ⊂ {X(0) = i and X(t) = j} and A ∈ 𝓕(t) and i, j, k are in Ī.

First, I will argue the easy case j = ∂. Then X(t + s) = ∂, at least on
{𝔛₀ and X(t) = j}, while π_∂{X(s) = ∂} = 1. If k ≠ ∂, both sides of (40)
vanish. If k = ∂, both sides of (40) are π_i(A). This settles the case j = ∂.

From now on, assume j ∈ I. Abbreviate

D_m = {X(0) = i and X(t) = j and N(t) = m}.

Let 𝒜_m be the σ-field of subsets of D_m of the form D_m ∩ A* with A* ∈ 𝓕(t).
For m = 0, 1, ..., ∞, the sets D_m are pairwise disjoint, and their union is
{X(0) = i and X(t) = j}. So it is enough to prove (40) for A ∈ 𝒜_m: both sides
are countably additive in A. If i = ∂ or m = ∞, both sides of (40) vanish. So
fix i ∈ I and m < ∞ for the rest. By definition, X(t) = ξ_m on {N(t) = m},
so ξ_m = j on D_m.
Call a set A special iff A is the event that

ξ₀ = i₀, ..., ξ_m = i_m and τ₀ > t₀, ..., τ_{m−1} > t_{m−1} and
τ₀ + ⋯ + τ_{m−1} ≤ t < τ₀ + ⋯ + τ_m,

where i₀ = i, ..., i_m = j are in Ī, and t₀, ..., t_{m−1} are nonnegative numbers.
So A ⊂ D_m. Let 𝒟_m be the σ-field in D_m generated by the special A. In other
terms: 𝒟_m is the σ-field in D_m generated by ξ₀, ..., ξ_m and τ₀, ..., τ_{m−1},
all retracted to D_m. I say 𝒜_m ⊂ 𝒟_m. Indeed, let 0 ≤ u ≤ t and let h ∈ Ī.
Then

{D_m and X(u) = h} = ∪_{n=0}^{m} D_{mn},

where

D_{mn} = D_m ∩ {ξ_n = h and τ₀ + ⋯ + τ_{n−1} ≤ u < τ₀ + ⋯ + τ_n}.

You get D_{mn} ∈ 𝒟_m for n < m. If h ≠ j, then D_{mm} = ∅. If h = j, then

D_{mm} = D_m ∩ {τ₀ + ⋯ + τ_{m−1} ≤ u}.

Either way, D_{mm} ∈ 𝒟_m. This proves 𝒜_m ⊂ 𝒟_m. Two different special A are
disjoint or nested; so the class of special A, with the null set added, is closed
under intersection. The union over i₁, ..., i_{m−1} of the special A with
t₀ = ⋯ = t_{m−1} = 0 is D_m. And both sides of (40) are countably additive in A.
If I manage to get (40) for the special A, then (10.16) will extend (40) to 𝒟_m,
which is more than enough.
Both sides of (40) vanish if one or more of i₁, ..., i_{m−1} are equal to ∂, so
fix them all in I. For the next part of the proof, I will construct a measurable
mapping T of {N(t) = m} into 𝔛, such that

(41) X(t + s) = X(s) ∘ T on {N(t) = m}

and

(42) π_i{A ∩ T⁻¹B} = π_i{A} · π_j{B}

for all special A and all measurable subsets B of {X(0) = j}. Let

B = {X(0) = j and X(s) = k}.

Using (41), check

T⁻¹B = {N(t) = m and X(t) = j and X(t + s) = k}.

Remember A ⊂ {N(t) = m and X(t) = j}. So

A ∩ T⁻¹B = {A and X(t + s) = k}.

Clearly,

π_j{B} = π_j{X(s) = k}.

So (40) follows from (41-42).

INFORMAL NOTE. T is obtained by shifting the X sample function to the
left by t, and then relabelling the jumps and holding times.

Define a formal T by the requirement that for x ∈ 𝔛 with N(t, x) = m:

ξ_n(Tx) = ξ_{m+n}(x) for n = 0, 1, ...;
(43) τ_n(Tx) = τ_{m+n}(x) for n = 1, 2, ...;
τ₀(Tx) = τ₀(x) + ⋯ + τ_m(x) − t.

Property (41) is fairly clear: look at Figure 1, and think this way. Fix x ∈ 𝔛
with t < σ(x) and N(t, x) = m. Fix s ≥ 0. Suppose t + s < σ(x). Then
N(t + s, x) ≥ m; say N(t + s, x) = m + n. In particular,

X(t + s, x) = ξ_{m+n}(x).

Let σ₀(x) = 0 and σ_m(x) = τ₀(x) + ⋯ + τ_{m−1}(x) for m = 1, 2, .... Then
N(t + s, x) = m + n means

σ_{m+1}(x) + τ_{m+1}(x) + ⋯ + τ_{m+n−1}(x)
≤ t + s < σ_{m+1}(x) + τ_{m+1}(x) + ⋯ + τ_{m+n}(x);

that is,

σ_{m+1}(x) − t + τ_{m+1}(x) + ⋯ + τ_{m+n−1}(x)
≤ s < σ_{m+1}(x) − t + τ_{m+1}(x) + ⋯ + τ_{m+n}(x).

But σ_{m+1}(x) − t = τ₀(Tx) and τ_{m+v}(x) = τ_v(Tx) for v ≥ 1. Therefore,
N(t + s, x) = m + n means

σ_n(Tx) ≤ s < σ_{n+1}(Tx),

that is,

N(s, Tx) = n.

So

X(s, Tx) = ξ_n(Tx) = ξ_{m+n}(x) = X(t + s, x),

which disposes of (41) under the assumption t + s < σ(x). The case
σ(x) ≤ t + s is similar.

To handle (42), consider a special B of the form

B = {ξ₀ = j₀, ..., ξ_n = j_n and τ₀ > u₀, ..., τ_n > u_n},

where j₀ = j, ..., j_n are in I and u₀, ..., u_n are nonnegative numbers.
Remember i_m = j₀ = j. Now A ∩ T⁻¹B is the set where

ξ₀ = i₀, ..., ξ_m = i_m, ξ_{m+1} = j₁, ..., ξ_{m+n} = j_n and
τ₀ > t₀, ..., τ_{m−1} > t_{m−1}, τ_{m+1} > u₁, ..., τ_{m+n} > u_n and
τ₀ + ⋯ + τ_{m−1} ≤ t ≤ t + u₀ < τ₀ + ⋯ + τ_{m−1} + τ_m.

As (37) shows, π_i(A ∩ T⁻¹B) = abce^{−u}, where:

u = q(j₁)u₁ + ⋯ + q(j_n)u_n;
a = ∏_{v=0}^{m−1} r(i_v, i_{v+1});
b = ∏_{v=0}^{n−1} r(j_v, j_{v+1});

while c is the probability that

U₀ > t₀, ..., U_{m−1} > t_{m−1} and
U₀ + ⋯ + U_{m−1} ≤ t ≤ t + u₀ < U₀ + ⋯ + U_{m−1} + U_m,

for independent exponential random variables U₀, ..., U_m, having parameters
q(i₀), ..., q(i_m). Now (30) implies that c = de^{−v}, where v = q(j)u₀
and d is the probability that

U₀ > t₀, ..., U_{m−1} > t_{m−1} and U₀ + ⋯ + U_{m−1} ≤ t < U₀ + ⋯ + U_{m−1} + U_m.

That is, π_i(A ∩ T⁻¹B) = abde^{−v−u}. But π_i(A) = ad and π_j(B) = be^{−v−u},
by (37). This completes the proof of (42) for one special A and all special B.
Clearly, (42) holds for B = ∅ and B = {X(0) = j}; call these sets special
also. Two different special B are disjoint or nested, and the class of special
B's is closed under intersection and generates the full σ-field on {X(0) = j}.
Both sides of (42) are countably additive in B, so (10.16) makes (42) hold for
all measurable B. This completes the proof of (40).
Let R(t, i, j) = π_i{X(t) = j}. Use (40), and (4) with Ī for I, to see: R is a
stochastic semigroup on Ī; while X is Markov with stationary transitions R
and starting state i relative to π_i. I still have to show that R is standard, and
R′(0) = Q. The ∂ row is easy. Fix i ∈ I. I will do the i row. I say

(44) π_i{τ₀ + τ₁ ≤ t} = o(t) as t → 0.

Suppose i is not absorbing; the other case is trivial. Let U_i and U_j be
independent and exponentially distributed, with parameters q(i) and q(j).

By (37),

π_i{τ₀ + τ₁ ≤ t and ξ₁ = j} = r(i, j) Prob{U_i + U_j ≤ t};

so

π_i{τ₀ + τ₁ ≤ t} = Σ_j r(i, j) Prob{U_i + U_j ≤ t}.

The contribution to this sum from j = ∂ or absorbing j in I is clearly 0. When
q(j) > 0, lemma (34) shows that

t⁻¹ Prob{U_i + U_j ≤ t}

is at most q(i) and tends to 0 as t → 0. Dominated convergence finishes (44).
Confine x to the set {ξ₀ = i and ξ₁ ≠ i}, which has π_i-probability 1 because
r(i, i) = 0. Suppose for a moment that τ₀ + τ₁ > t: then X(t) = i iff τ₀ > t;
and X(t) = j ≠ i iff ξ₁ = j and τ₀ ≤ t. As t → 0:

R(t, i, i) = π_i{τ₀ > t} + o(t) by (44)
= Prob{U_i > t} + o(t) by (37)
= e^{−q(i)t} + o(t);

and for i ≠ j,

R(t, i, j) = π_i{ξ₁ = j and τ₀ ≤ t} + o(t) by (44)
= r(i, j) Prob{U_i ≤ t} + o(t) by (37)
= r(i, j)[1 − e^{−q(i)t}] + o(t).

The rest is calculus. *
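The remaining calculus is differentiation at t = 0. Spelled out (a reconstruction of the routine step, using the two expansions just displayed and the section's generator, with Q(i, i) = −q(i) and Q(i, j) = q(i)r(i, j) for j ≠ i):

```latex
R'(0,i,i) = \lim_{t \downarrow 0} \frac{R(t,i,i) - 1}{t}
          = \lim_{t \downarrow 0} \frac{e^{-q(i)t} - 1 + o(t)}{t}
          = -q(i) = Q(i,i);
\qquad
R'(0,i,j) = \lim_{t \downarrow 0} \frac{r(i,j)\,[1 - e^{-q(i)t}] + o(t)}{t}
          = q(i)\,r(i,j) = Q(i,j) \quad (j \neq i).
```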


7. THE UNIFORM CASE

Fix a finite or countably infinite set I. Fix a uniform stochastic semigroup
P on I, as in Section 4. As (29) implies, Q = P′(0) exists; the entries are uniformly
bounded; the diagonal elements are nonpositive; the off-diagonal
elements are nonnegative; and the row sums are zero. The first problem in
this section is to construct a Markov chain with stationary transitions P, all
of whose sample functions are step functions. Give I the discrete topology. A
function f from [0, ∞) to I is a right continuous step function iff f(t) =
lim_{s↓t} f(s) for all t ≥ 0 and f(t−) = lim_{s↑t} f(s) ∈ I exists for all t > 0. The
discontinuity set of f is automatically finite on finite intervals, and may be
enumerated as δ₁(f) < δ₂(f) < ⋯. If f has infinitely many discontinuities,
then δ_n(f) → ∞ as n → ∞. If f has only n discontinuities, it is convenient to
set δ_{n+1}(f) = δ_{n+2}(f) = ⋯ = ∞. In any case, it is convenient to set
δ₀(f) = 0. Then f is a constant, say ξ_n(f) ∈ I, on [δ_n(f), δ_{n+1}(f)). If f has

only n discontinuities, leave ξ_{n+1}(f), ξ_{n+2}(f), ... undefined. Informally,
ξ₀(f), ξ₁(f), ... are the successive jumps in f, or the successive states f
visits; f visits ξ_n(f) on [δ_n(f), δ_{n+1}(f)) with holding time δ_{n+1}(f) − δ_n(f) =
τ_n(f). See Figure 3.

Figure 3. A right continuous step function f, visiting the states ξ₀(f), ξ₁(f), ξ₂(f), ξ₃(f), ... in turn, with holding times τ₀(f), τ₁(f), τ₂(f), ....

Let S be the set of right continuous step functions from [0, ∞) to I. Let
X(t, f) = f(t) for t ≥ 0 and f ∈ S, and endow S with the smallest σ-field Σ
over which all X(t) are measurable. I claim that ξ_n and δ_n are Σ-measurable.
The case n = 0 is easy: ξ₀(f) = f(0) and δ₀(f) = 0. Suppose inductively
that δ_n is Σ-measurable. Confine f to {δ_n < ∞}. Then

ξ_n(f) = lim_{e↓0} f[δ_n(f) + e];

so ξ_n(f) = j iff for all m = 1, 2, ... there is a rational r with

δ_n(f) < r < δ_n(f) + 1/m and f(r) = j.

So ξ_n is Σ-measurable. If t > 0, then

δ_{n+1}(f) − δ_n(f) < t

iff there are rational r, s with 0 < r < s < t and

f[δ_n(f) + r] ≠ f[δ_n(f) + s].

So δ_{n+1} = δ_n + (δ_{n+1} − δ_n) is Σ-measurable. Next, I claim that ξ_n and δ_n
span Σ. For f(t) = j iff ξ_n(f) = j and δ_n(f) ≤ t < δ_{n+1}(f) for some
n = 0, 1, ....
Let q(i) = −Q(i, i) ≥ 0. Define the substochastic jump matrix r on I as
follows:

r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
= 0 for i = j or q(i) = 0.
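The passage from a generator Q to the holding rates q and jump matrix r can be sketched numerically. A minimal illustration (the three-state generator and the dictionary encoding of Q are my own, not from the text):

```python
# Build holding rates q and the substochastic jump matrix r from a generator
# Q on a finite piece of I, following the displayed definition:
#   r(i, j) = Q(i, j) / q(i)  for i != j and q(i) > 0;  r = 0 otherwise.

def jump_matrix(Q, states):
    q = {i: -Q[(i, i)] for i in states}            # q(i) = -Q(i, i) >= 0
    r = {}
    for i in states:
        for j in states:
            if i != j and q[i] > 0:
                r[(i, j)] = Q[(i, j)] / q[i]
            else:
                r[(i, j)] = 0.0
    return q, r

# A three-state generator with one absorbing state (state 2):
states = [0, 1, 2]
Q = {(0, 0): -2.0, (0, 1): 1.5, (0, 2): 0.5,
     (1, 0): 1.0, (1, 1): -1.0, (1, 2): 0.0,
     (2, 0): 0.0, (2, 1): 0.0, (2, 2): 0.0}
q, r = jump_matrix(Q, states)
# As (29) asserts, the ith row sum of r is 1 or 0 according as q(i) > 0 or not:
row_sums = {i: sum(r[(i, j)] for j in states) for i in states}
```

The row-sum dichotomy is exactly the NOTE below: absorbing states get a zero row in r.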

Because {ξ_n, δ_n} span Σ, there is at most one probability P_i on Σ for which:

(a) ξ₀, ξ₁, ... is a discrete time Markov chain with stationary transitions
r and starting state i;
(b) given ξ₀, ξ₁, ..., the random variables τ_n = δ_{n+1} − δ_n are conditionally
independent and exponential with parameter q(ξ_n), for
n = 0, 1, ....

NOTE. If r is substochastic, then ξ_n may be undefined with positive
probability. By (29), the jth row sum of r is 1 or 0, according as q(j) > 0
or q(j) = 0.

The condition on P_i can be restated as follows, using (10.16): for any
nonnegative integer n, and sequence i₀ = i, i₁, ..., i_n in I, and nonnegative
real numbers t₀, ..., t_n:

P_i{ξ_m = i_m and τ_m > t_m for m = 0, ..., n} = pe^{−t},

where

p = ∏_{m=0}^{n−1} r(i_m, i_{m+1})
t = Σ_{m=0}^{n} q(i_m)t_m.

By convention, an empty product is 1.
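Conditions (a) and (b) describe P_i by simulation: run the jump chain with transition matrix r, and attach to each visit an exponential holding time with parameter q of the current state. A sketch under illustrative assumptions (the two-state q and r below, and all function names, are mine, not from the text):

```python
import random

def sample_path(i, q, r, horizon):
    """Visited states and holding times up to a time horizon: the jump chain
    moves by r, and each visit to `state` holds for an exponential time with
    parameter q[state] -- conditions (a) and (b) in the text."""
    rng = random.Random(0)                      # fixed seed, reproducible
    states, holds, state, clock = [], [], i, 0.0
    while clock < horizon:
        tau = rng.expovariate(q[state])         # exponential holding time
        states.append(state)
        holds.append(tau)
        clock += tau
        u, acc = rng.random(), 0.0              # jump according to row r(state, .)
        for j, p in r[state].items():
            acc += p
            if u <= acc:
                state = j
                break
    return states, holds

# Two states that alternate deterministically, with different rates:
q = {0: 1.0, 1: 2.0}
r = {0: {1: 1.0}, 1: {0: 1.0}}
states, holds = sample_path(0, q, r, horizon=10.0)
```

The path starts at 0, alternates 0, 1, 0, 1, ..., and its holding times sum past the horizon, matching the step-function picture of Figure 3.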
(45) Theorem. The probability P_i exists. With respect to P_i, the process
{X(t) : 0 ≤ t < ∞} is a Markov chain with stationary transitions P and
starting state i.

FIRST PROOF. Use the setup of Section 6, with the present r and q. Check
that the two Q's coincide on I. Fix i ∈ I. With respect to π_i, the process X
on 𝔛 is Markov with stationary standard and stochastic transitions R on Ī,
by (39); and ∂ is absorbing for R. So R is a standard substochastic semigroup
when retracted to I. And R′(0) = Q on I by (39). Furthermore,

R(t, i, i) ≥ e^{−q(i)t},

either from the construction or from (12). So R is uniform. Now R = P on I,
by (29). To summarize: with respect to π_i, the process X on 𝔛 is Markov
with stationary transitions P and starting state i.

As before, let 𝔛₀ be the subset of 𝔛 where for all n:

τ_n = ∞ iff q(ξ_n) = 0;
r(ξ_n, ξ_{n+1}) > 0.

Confine x to {𝔛₀ and ξ₀ ∈ I}. Then d is one plus the least n if any with ξ_n
absorbing, and d = ∞ if none. Indeed, r(j, ∂) is 0 or 1, according as q(j) > 0

or q(j) = 0. So d < ∞ makes τ_{d−1} = ∞. Let

q̄ = sup_j q(j) < ∞.

Then

1/q(ξ_n) ≥ 1/q̄ > 0 for 0 ≤ n < d;

so

Σ_n {1/q(ξ_n) : 0 ≤ n < d} = ∞,
whether d < ∞ or d = ∞. Let 𝔛₁ be the set of x ∈ 𝔛₀ such that X(·, x) is
I-valued everywhere. Remember

π_i{𝔛₀ and ξ₀ ∈ I} = 1.

Now (38) makes π_i(𝔛₁) = 1.

If x ∈ 𝔛₁, you know that X(·, x) is a right continuous I-valued step function
on [0, ∞). Visualize X as the mapping from 𝔛₁ to S, which sends x ∈ 𝔛₁ to
X(·, x) ∈ S. Check that X is measurable. Let P_i = π_i X⁻¹. Then P_i is a probability
on Σ, because π_i(𝔛₁) = 1. For 0 ≤ n < d, check that ξ_n on 𝔛₁ is
the composition of ξ_n on S with X on 𝔛₁, and τ_n on 𝔛₁ is the composition of
τ_n on S with X on 𝔛₁; while ξ_n or τ_n on S applied to X(·, x) is undefined for
n ≥ d(x). Indeed, ξ_n ≠ ∂ implies ξ_{m+1} ≠ ξ_m and τ_m < ∞ for m < n, while
ξ_n = ∂ implies τ_m = ∞ for some m < n, on 𝔛₁. Consequently, the P_i-distribution
of {ξ_n, τ_n} on S coincides with the π_i-distribution of

{ξ_n, τ_n : 0 ≤ n < d} on 𝔛.

Namely, {ξ_n} is Markov with stationary transitions r and starting state i.
Given ξ, the holding times τ_n are conditionally independent and exponentially
distributed, the parameter for τ_n being q(ξ_n). So, I constructed the right P_i.
The P_i-distribution of the coordinate process X on S coincides with the π_i-distribution
of the process X on 𝔛: both are Markov with transitions P and
starting state i. *
SECOND PROOF. Use (38) and (7.33). *
What does (45) imply about an abstract Markov chain with transitions P?
Roughly, there is a standard modification, all of whose sample functions
are right continuous step functions. Then the jump process is an r-chain.
Given the jumps, the holding times are conditionally independent and
exponentially distributed, the parameter for visits to j being q(j).

More exactly, let (Ω, 𝓕, 𝒫) be an abstract probability triple. Let
{Y(t) : 0 ≤ t < ∞} be an I-valued process on (Ω, 𝓕). With respect to 𝒫,
suppose Y is a Markov chain with stationary transitions P. Remember that
P is uniform, by assumption. For simplicity, suppose 𝒫{Y(0) = i} = 1. Let
Ω₀ be the set of ω ∈ Ω such that Y(·, ω) retracted to the rationals agrees with
some f ∈ S retracted to the rationals; of course, f depends on ω.

(46) Proposition. Ω₀ ∈ 𝓕 and 𝒫(Ω₀) = 1.

PROOF. Consider the set of functions ψ from the nonnegative rationals R
to I, with the product σ-field. The set F of ψ which agree with some
f = f(ψ) ∈ S retracted to the rationals is measurable, by this argument. Let
δ₀(ψ) = 0. Let

ζ_n(ψ) = lim {ψ(r) : r ∈ R and r ↓ δ_n(ψ)}
δ_{n+1}(ψ) = sup {r : r ∈ R and ψ(s) = ζ_n(ψ) for s ∈ R with δ_n(ψ) < s < r}.

Then F is the set of ψ such that: either

δ₀(ψ) < δ₁(ψ) < δ₂(ψ) < ⋯ < ∞

all exist, and δ_n(ψ) → ∞ as n → ∞, and

ζ₀(ψ), ζ₁(ψ), ...

all exist, and

ζ_m(ψ) = ψ[δ_m(ψ)] whenever δ_m(ψ) ∈ R;

or for some n,

δ₀(ψ) < δ₁(ψ) < ⋯ < δ_n(ψ) < ∞ and δ_{n+1}(ψ) = ∞

all exist, and

ζ₀(ψ), ..., ζ_n(ψ)

all exist, and

ζ_m(ψ) = ψ[δ_m(ψ)] whenever m = 0, ..., n and δ_m(ψ) ∈ R.

By discarding a null set, suppose Y(r) ∈ I for all r ∈ R. Let Y_R be Y with
time domain retracted to the rationals. Then Y_R is measurable, and Ω₀ is
{Y_R ∈ F}. Let X be the coordinate process on S, and let X_R be X with
time domain retracted to the rationals. The 𝒫-distribution of Y_R coincides
with the P_i-distribution of X_R. But {X_R ∈ F} is all of S, and therefore has
P_i-probability 1. So

𝒫{Ω₀} = 𝒫{Y_R ∈ F} = P_i{X_R ∈ F} = 1. *
NOTATION. Suppose Y(r, ω) ∈ I for all r ∈ R. The value of Y_R at ω is the
function r → Y(r, ω) from R to I. Similarly for X_R.

For ω ∈ Ω₀, let Y*(t, ω) be the limit of Y(r, ω) as rational r decreases to t.
For ω ∉ Ω₀, let Y*(·, ω) ≡ i.
(47) Proposition. Y*(·, ω) ∈ S and 𝒫{Y(t) = Y*(t)} = 1 for each t ≥ 0.

PROOF. The first claim is easy. For the second, fix t ≥ 0. Let Rt = R ∪ {t}
be the set of nonnegative rationals R, and t. Consider the set of functions
ψ from Rt to I, with the product σ-field. Let G be the set of ψ with

ψ(t) = lim ψ(r) as r ∈ R decreases to t.

Then G is measurable. Confine ω to the subset of Ω₀ where Y(t) ∈ I. This
subset has 𝒫-probability 1. Let Y_{Rt} be Y with time domain retracted to
Rt. Then Y_{Rt} is measurable, and

{Y(t) = Y*(t)} = {Y_{Rt} ∈ G}.

Let X be the coordinate process on S, and let X_{Rt} be X with time domain
retracted to Rt. The 𝒫-distribution of Y_{Rt} coincides with the P_i-distribution
of X_{Rt}. But {X_{Rt} ∈ G} is all of S, and therefore has P_i-probability 1. So

𝒫{Y(t) = Y*(t)} = 𝒫{Y_{Rt} ∈ G} = P_i{X_{Rt} ∈ G} = 1. *


NOTATION. Suppose w E no and Y(t, w) E I. The value of Y Rt at w is the
function s --+ Yes, w) from Rt to I. Similarly for X Rt .
In the terminology of (Doob, 1953), the process y* is a standard modifi-
cation of Y. For (48), suppose all the sample functions of Yare in S; if not,
replace Y by Y*. The mapping M which sends w to Y(', w) is a measurable
mapping from n to S. Let Y(', w) visit the states ~o(w), ~l(W), ... with
holding times 70(W), 71(W), .... That is,
~n(w) = ;n(Mw) and 7 n(W) = Tn(Mw).
(48) Proposition. With respect to &': the process ~OJ1' ... is a Markov
chain with stationary transitions r and starting state i; given ~o, ~1' ••• , the
random variables 70,71' ... are conditionally independent and exponentially
distributed, with parameters q(~o), q(~l)' .' ... That is, for io = i, i 1 , ••• , in
in I and nonnegative to, ... , tn'
&'{~o = i o, ... , ~n = in and 70> to, ... , 7 n > t n } = pe- t ,
where p is the product

and

PROOF. With respect to &' M-1, the coordinate process X on S is Markov


with stationary transitions P and starting state i. There is at most one such

*
probability, so &'M-1 = Pi' And the &,-distribution of {~n' 7 n} coincides
with the Pi-distribution of {;no Tn}.
NOTE. If r is substochastic, then ~n may be undefined with positive
probability. By convention, an empty product is 1.
There is a useful way to restate (48). Define the probability π_i on 𝔛 as in
Section 6, using the present r and q. Suppose for a moment that q(j) > 0 for
all j. Let Ω̄ be the subset of Ω where ξ_n is defined and τ_n < ∞ for all n. Then
𝒫(Ω̄) = 1. Let M̄ map Ω̄ into 𝔛:

ξ_n(M̄ω) = ξ_n(ω) and τ_n(M̄ω) = τ_n(ω).

Then

(49) 𝒫M̄⁻¹ = π_i.

Now drop the assumption that q(j) > 0. Let

d̄ = inf {n : ξ_n is undefined} on Ω̄.

Remember

d = inf {n : ξ_n = ∂} on 𝔛.

Then

(50) the 𝒫-distribution of {(ξ_n, τ_n) : 0 ≤ n < d̄} coincides with the π_i-distribution
of {(ξ_n, τ_n) : 0 ≤ n < d}.


6

EXAMPLES FOR THE STABLE CASE

1. INTRODUCTION

I will prove some general theorems on stable chains in Chapter 7. In this


chapter, I construct examples; Section 6 is about Markov times and I think
you'll find it interesting. This chapter is quite difficult, despite my good
intentions; and it isn't really used in the rest of the book, so you may want to
postpone a careful reading.
Each example provides a countable state space I, and a reasonably concrete
stochastic process {X(t) : 0 ≤ t < ∞}, which is Markov with stationary
transitions P. Here P is a standard stochastic semigroup on I. In most of
uniform. The generator Q = p' (0) of P will be computed explicitly, but P
will be given only in terms of the process. Each state i E I will be stable:
that is,
q(i) = -QU, i) < 00.
Each example lists
(a) Description
(b) State space I
(c) Holding time q
(d) Generator Q
(e) Formal construction.
Only the ordering of the states is specified. The holding time for each visit
to i is exponential with parameter q(i), independent of all other parts of the
construction.

(I want to thank Howard Taylor and Charles Yarbrough for reading drafts of
this chapter.)

On a first reading of the chapter, you should skim Sections


2 and 4, and ignore the formal constructions, which are based on theorems
(27) and (108). The examples will give you some idea of the possible pathology.
You can study theorems (27, 108) and the formal constructions
later, when you want a proof that the processes described in the examples
really are Markov with the properties I claim. The only thing in this chapter
This chapter only begins to expose the corruption. More is reported in
Chapter 8, in Chapter 3 of ACM, and in Sections 2.12-13 of B & D.
The class of processes that can be constructed by the present theorems
(27, 108) is small. One main restriction is that all states are stable. So
as not to contradict later theorems, the sample functions will be constant
in lover maximal half-open intervals [a, b), called the intervals of constancy,
which cover Lebesgue almost all of [0, 00). On the exceptional set, the sample
function will be infinite. Within the class of stable processes, the processes
of this chapter have a very special feature. Loosely stated, the intervals of
constancy can be indexed in order by part of a fixed set C in such a way
that as c increases through C, the cth state visited by the process-namely,
the value of the process on the cth interval of constancy-evolves in a
Markovian way. At first sight, this may not look like an assumption. Sections
8.5-8.6 and B & D, Sec. 2.13 show that it is a serious one. Incidentally, the set
of intervals of constancy is countable and linearly ordered, so these two
properties are forced on C.
In the first class of processes I want to construct, the order type of the
intervals of constancy depends on the starting state. These processes have a
countable, linearly ordered state space I. Starting from i ∈ I, they move
through all j ≥ i, in order. I will make this precise in Section 2. The examples
of Section 3 fall into this class. In the second class of processes, the order
type of the intervals of constancy, until you reach the end of time, is fixed.
I will describe this class in Section 4. The examples of Sections 5 and 7 fall
into the second class. Naturally, there is one theorem that covers both
constructions, and other things as well. But it seems too complicated for
the examples I really want.
A general method for constructing stable chains, in the spirit of this
chapter, is not yet known.

2. THE FIRST CONSTRUCTION


The parameters
(1) Let I be a countable set, linearly ordered by <, without a last element.
For i ∈ I, let

I_i = {j : j ∈ I and j ≥ i}.

The set I will be the state space. Starting from i ∈ I, my process will
move through I_i in order.

(2) Let q be a function from I to (0, ∞).

The holding time in j will be exponential with parameter q(j), independent
of everything else. Suppose

(3) Σ_j {1/q(j) : j ∈ I_i} = ∞ for all i ∈ I

and

(4) Σ_j {1/q(j) : j ∈ I_i and j ≤ k} < ∞ for all i ∈ I and k ∈ I_i.

Condition (3) guarantees that my process is defined on all of [0, ∞).
Condition (4) guarantees that it moves through all of I_i, when it starts
from i.
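For a concrete instance of (3)-(4), take I = {0, 1, 2, ...} with q(j) = j + 1 (an illustrative choice, not from the text). Each sum in (4) is finite because it has finitely many terms, while the sum in (3) is the harmonic series, which diverges: that divergence is what keeps the process from running through infinitely many states in finite time. A numerical check:

```python
# Conditions (3)-(4) for I = {0, 1, 2, ...} with q(j) = j + 1.
# (4): the sum of 1/q(j) over j <= k is a finite sum, hence finite.
# (3): the sum of 1/q(j) over all j is the harmonic series, which diverges.

def q(j):
    return j + 1

def partial_sum(k):
    """Sum of 1/q(j) for 0 <= j <= k -- the quantity bounded in (4)."""
    return sum(1.0 / q(j) for j in range(k + 1))

# Finite for each k, but unbounded in k: harmonic growth ~ log k.
s_10 = partial_sum(10)
s_big = partial_sum(10**6)
```

Here s_10 is about 3.02 while s_big is about 14.4, tracking log k as predicted.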

The construction

Let i range over I.


(5) Let W_i be the set of functions w from I_i to (0, ∞). For j ∈ I_i and
w ∈ W_i, let

τ(j, w) = w(j).

Strictly speaking, τ should have a subscript i. Informally speaking, τ(j)
will be the holding time in j.

(6) Give W_i the smallest σ-field making all the τ(j) measurable.

(7) For j ∈ I_i and w ∈ W_i, let

λ(j) = Σ_k {τ(k) : k ∈ I_i and k < j}
ρ(j) = λ(j) + τ(j) = Σ_k {τ(k) : k ∈ I_i and k ≤ j}.

(8) Give I the discrete topology, and let Ī = I ∪ {φ} be its one-point
compactification.

(9) Define a process X_i on W_i:

X_i(t) = j if λ(j) ≤ t < ρ(j) for some j ∈ I_i
= φ if λ(j) ≤ t < ρ(j) for no j ∈ I_i.

EXPLANATION. The process should visit j on an interval of length τ(j).
The process should spend total time 0 in the fictitious state φ. So I put the
left endpoint of the j-interval at the sum of the lengths of the preceding
intervals, namely λ(j). To prevent various difficulties with the sample
functions, I will have to cut W_i down to W_i* in (12).

You should check (10-14).

(10a) X_i(t) = i iff 0 ≤ t < τ(i).

If i has an immediate successor j, then

(10b) X_i(t) = j iff τ(i) ≤ t < τ(i) + τ(j).

If i < k < j, then

(10c) {X_i(t) = j} ⊂ {τ(i) + τ(k) ≤ t}.

(11) X_i is jointly measurable.

That is, (t, w) → X_i(t, w) is product measurable on [0, ∞) × W_i, where
[0, ∞) has the Borel σ-field, and W_i has the σ-field (6).

(12) Let W_i* be the set of w ∈ W_i such that

Σ_j {w(j) : j ∈ I_i} = ∞, but λ(j, w) < ∞ for all j ∈ I_i.

(13) Let π_i be the probability on W_i which makes the τ(j) independent
and exponentially distributed, the parameter for τ(j) being q(j).

Use conditions (3-4) and (5.33):

(14) π_i{W_i*} = 1.

Lemmas
Let i range over I, and j over I_i. Let w ∈ W_i*, as defined in (12).

(15) X_i(t, w) = j iff λ(j, w) ≤ t < ρ(j, w). This interval has length τ(j, w).

(16) j → [λ(j, w), ρ(j, w)) is 1-1 and order preserving on I_i. The union
of [λ(j, w), ρ(j, w)) as j ranges over I_i is precisely {t : X_i(t, w) ∈ I_i}.

(17) X_i(·, w) is regular in the sense of (7.2).

(18) Lemma. Lebesgue {t : X_i(t, w) = φ} = 0, for w ∈ W_i*.

PROOF. Relations (15-16) show

Lebesgue {t : t ≤ λ(j, w) and X_i(t, w) ∈ I} = λ(j, w).

So

Lebesgue {t : t ≤ λ(j, w) and X_i(t, w) = φ} = 0.

But λ(j, w) increases to ∞ as j increases through I_i, by definition (12) of
W_i*. *

(19) Lemma. π_i{X_i(t) ∈ I} = 1.

PROOF. Define a new process Y on W_i:

Y(s, w) = X_i[w(i) + s, w].

As (11) shows, Y is jointly measurable: it's the composition of X_i with the
mapping (s, w) → (w(i) + s, w). As (18) shows,

(20) Lebesgue {s : Y(s, w) = φ} = 0 for each w ∈ W_i*.

Let

E = {s : 0 ≤ s < ∞ and π_i[Y(s) = φ] > 0}.

Use Fubini on (14, 20). The set E is Borel, and

(21) Lebesgue E = 0.

Check

(22) {X_i(t) = φ} = {τ(i) ≤ t and Y[t − τ(i)] = φ}.

I say Y is measurable on {τ(j) : j > i}. For Y(s) > i; and Y(s) = k > i
iff

Σ_j {τ(j) : i < j < k} ≤ s < Σ_j {τ(j) : i < j ≤ k}.

Review (13) to see that Y is π_i-independent of τ(i), and τ(i) is exponential
with parameter q(i). Fubini up (22) and use (21):

π_i{X_i(t) = φ} = q(i) ∫ π_i{Y(t − s) = φ} · e^{−q(i)s} ds = 0. *


(23) Let I^s be the set of i ∈ I which have an immediate successor, s(i).

(24) Let Q be this matrix on I:

Q(i, i) = −q(i) for all i ∈ I
Q(i, j) = 0 unless j = i or j = s(i)
Q(i, s(i)) = q(i) for i ∈ I^s.

(25) Let P(t, i, j) = π_i{X_i(t) = j}.

(26) Lemma. P′(0, i, j) = Q(i, j); the definitions are (23-25).

PROOF. This is easy for j < i. Suppose j = i. Then {X_i(t) = i} = {τ(i) > t}
by (10). So

P(t, i, i) = π_i{τ(i) > t} = e^{−q(i)t} by (13).

Suppose i ∈ I^s and j = s(i). By (10),

{X_i(t) = j} = {τ(i) ≤ t < τ(i) + τ(j)}.

So

P(t, i, j) = π_i{τ(i) ≤ t < τ(i) + τ(j)}
= π_i{τ(i) ≤ t} − π_i{τ(i) + τ(j) ≤ t}.

But

π_i{τ(i) ≤ t} = 1 − e^{−q(i)t} by (13)
π_i{τ(i) + τ(j) ≤ t} = o(t) by (13) and (5.34).

Suppose j > i but j ≠ s(i). Then there is a state k with i < k < j. By
(10),

{X_i(t) = j} ⊂ {τ(i) + τ(k) ≤ t},

so

P(t, i, j) = o(t) by (13) and (5.34). *
The theorem
(27) Theorem. Suppose (1-4). Define the process X_i on the probability
triple (W_i, π_i) by (5-9) and (13). Define Q and P by (23-25).

(a) P is a standard stochastic semigroup on I, with generator Q.
(b) X_i is Markov with stationary transitions P and starting state i.

NOTE. X_i has properties (15-18) on W_i*, which has π_i-probability 1 by
(14).

PROOF. Let i ≤ j ≤ k be in I, and let s, t be nonnegative.

(28) Let 𝓕(t) be the σ-field in W_i spanned by X_i(u) for 0 ≤ u ≤ t.

Let A ∈ 𝓕(t) with A ⊂ {X_i(t) = j}. I will argue

(29) π_i{A and X_i(t + s) = k} = π_i{A} · P(s, j, k).

Take (29) on faith for a minute. Use lemma (5.4) on (15), (16), (19), (29)
to make P a stochastic semigroup on I, and X_i a P-chain starting from i.
This proves (b), and (26) completes the proof of (a).

To start on (29),

(30) let T map {X_i(t) = j} into W_j as follows:

(Tw)(j) = λ(j, w) + τ(j, w) − t;
(Tw)(h) = τ(h, w) for h > j.

From definition (9),

(31) X_i(t + s) = X_j(s) ∘ T on {X_i(t) = j};

so

(32) T⁻¹{X_j(s) = k} = {X_i(t) = j and X_i(t + s) = k}.

Next,

(33) let B = {w : w ∈ W_j and w(j_m) > u_m for m = 0, ..., n}, where
j₀ = j < j₁ < ⋯ < j_n are in I, and u₀, u₁, ..., u_n are nonnegative
numbers.

I claim

(34) π_i{A ∩ T⁻¹B} = π_i{A} · π_j{B}

for all A ∈ 𝓕(t) with A ⊂ {X_i(t) = j}
and all B of the form (33).

Take (34) on faith for a minute. The equality holds for B = ∅ or W_j.
This enlarged set of B's is closed under intersection and generates the full
σ-field on W_j. So (10.16) makes the equality hold for all measurable subsets B
of W_j. Put B = {X_j(s) = k} and use (32) to get (29).

I will now prove (34). Fix A ∈ 𝓕(t) with A ⊂ {X_i(t) = j}. Remember
ρ(j) = λ(j) + τ(j) from (7). Review (9).

(35) Abbreviate C = {W_i and λ(j) ≤ t < λ(j) + τ(j)} = {X_i(t) = j}.

(36) Let Σ be the σ-field in W_i generated by τ(h) with i ≤ h < j.

I claim

(37) There is an Ā ∈ Σ with A = Ā ∩ C.

Let 0 ≤ u ≤ t. Check i ≤ X_i(u) ≤ j on C, by (15-16). Let i ≤ h < j. Then

{X_i(u) = h and X_i(t) = j} = Ā ∩ C, where
Ā = {λ(h) ≤ u < ρ(h)} ∈ Σ.

Similarly,

{X_i(u) = X_i(t) = j} = Ā ∩ C, where
Ā = {λ(j) ≤ u} ∈ Σ.

This proves (37).

NOTE. If Ā ∈ Σ, then Ā ∩ C ∈ 𝓕(t).

Fix B of the form (33). Let

(38) C₀ = {W_i and λ(j) ≤ t ≤ t + u₀ < λ(j) + τ(j)}
(39) B₁ = {W_i and τ(j_m) > u_m for m = 1, ..., n}.

Use the Ā of (37) and definition (30) to get

(40) A ∩ T⁻¹B = Ā ∩ C₀ ∩ B₁.

So

(41) π_i{Ā ∩ C₀ ∩ B₁} = π_i{Ā ∩ C₀} · π_i{B₁},

because Ā and C₀ are measurable on τ(h) with h ≤ j; while B₁ is measurable
on τ(h) with h > j, by (33, 39); and definition (13) makes these two lots
π_i-independent.

Check that Ā and λ(j) are Σ-measurable. The definitions are (36-37) and
(7). Definition (13) makes τ(j) independent of Σ and exponential with
parameter q(j), relative to π_i. Abbreviate u = u₀, and remember j = j₀.
Use (5.30) to conclude

(42) π_i{Ā ∩ C₀} = e^{−q(j)u} · π_i{Ā ∩ C},

where C was defined in (35).

Use definitions (13, 39, 33) to check

(43) π_i{B₁} · e^{−q(j)u} = π_j{B}.

Combine (41-43) and (37) to settle (34). *

3. EXAMPLES ON THE FIRST CONSTRUCTION

(44) Example. (a) Description. The Poisson process with parameter λ
moves through 0, 1, ... in order. The holding times are independent and
exponentially distributed, with parameter λ. See Figure 1.

Figure 1. A sample function of the Poisson process: the states 0, 1, 2, 3, ... are visited in order, with holding times τ(0), τ(1), τ(2), ....

(b) State space. I = {0, 1, ...}, with the usual order.

(c) Holding times. q(i) = λ for all i.

(d) Generator. Q(i, j) = 0 unless j = i or i + 1; and Q(i, i) = −λ;
and Q(i, i + 1) = λ.

(e) Formal construction. Use (27). *
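The generator in (d) determines the semigroup here, and exponentiating a truncated version of Q recovers the familiar Poisson probabilities P(t, 0, j) = e^{−λt}(λt)^j/j!. A numerical sketch (the truncation level and the series form of the exponential are my own devices, not from the text):

```python
import math

def poisson_generator(lam, N):
    """Generator of (d), truncated to states 0..N."""
    Qm = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        Qm[i][i] = -lam
        if i < N:
            Qm[i][i + 1] = lam
    return Qm

def expm(A, t, terms=60):
    """exp(tA) by its Taylor series; adequate for small t * ||A||."""
    n = len(A)
    R = [[float(i == j) for j in range(n)] for i in range(n)]  # running sum
    T = [[float(i == j) for j in range(n)] for i in range(n)]  # (tA)^k / k!
    for k in range(1, terms):
        T = [[sum(T[i][m] * A[m][j] for m in range(n)) * t / k
              for j in range(n)] for i in range(n)]
        R = [[R[i][j] + T[i][j] for j in range(n)] for i in range(n)]
    return R

lam, t, N = 1.0, 0.5, 20
P = expm(poisson_generator(lam, N), t)
exact = [math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
         for j in range(5)]
```

Because the truncated generator is upper triangular, the entries P[0][j] for j well below N are unaffected by the truncation and match the Poisson formula.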

(45) Example. (a) Description. The semigroup is not uniform, but almost
all sample functions are right continuous step functions. The states are the
integers I. Starting from i ∈ I, the process moves successively through i,
i + 1, .... See Figure 1.

(b) State space. I is the integers, with the usual order.

(c) Holding times. q(i) is arbitrary subject to: 0 < q(i) < ∞; and
q(i) → ∞ as i → ∞; and Σ_{i=1}^∞ 1/q(i) = ∞.

(d) Generator. Q(i, j) = 0 unless j = i or i + 1; and Q(i, i) = −q(i);
and Q(i, i + 1) = q(i).

(e) Formal construction. Use (27). The semigroup can't be uniform,
by (5.29). If w ∈ W_i*, as defined in (12), then X_i(·, w) is a right continuous
step function, by (15-16). And π_i{W_i*} = 1 by (14). *
(46) Example. (a) Description. The process moves through the non-
negative rationals, in order. The set of infinities is dense in itself, closed from
the right, open from the left, and has the power of the continuum. All
off-diagonal elements in the generator vanish, and the generator need not
determine the semigroup.
(b) State space. I is the set of nonnegative rationals, in the usual order.
(c) Holding times. q is arbitrary, subject to (2-4).
(d) Generator. QU,j) = 0 unless j = i; and QU, i) = -q(i).
(e) Formal construction. Use (27). Fix wE wt, as defined in (12);
remember (14). As (15-16) show, .

{t:X.(t, W) E I}

is a countable union of maximal intervals [a, b) which are ordered like the
rationals: between any two, there is a third. So

S,/W) = {t:Xi(t,w) = lP}

is dense in itself: if t_n ∈ S_φ(ω) decreases to t, then t ∈ S_φ(ω); if t ∈ S_φ(ω), there are s > t but arbitrarily close with s ∈ S_φ(ω). And S_φ(ω) has power c.
Suppose i ≠ j but q(i) = q(j). Interchanging i and j does not affect Q, but does change P: so Q cannot determine P.  *
HINT. There are only countably many left endpoints of intervals of finitude. Temporarily add them to S_φ(ω), and make this larger set have power c.
Example (46) is due to Lévy.
(47) Example. (a) Description. Let C be a closed subset of [0, ∞), which contains 0 but has no interior. For almost all sample functions: the set of infinities is homeomorphic to C; and the sample function is continuous at its infinities. The process moves through its states, in order.
(b) State space. Let 𝒜 be the set of maximal subintervals of (−∞, ∞) complementary to C. Let I consist of all pairs (α, m), where α ∈ 𝒜 and m is an integer. Let (α, m) < (β, n) iff
α is to the left of β, or α = β and m < n.
(c) Holding times. q is arbitrary, subject to (2-4).

(d) Generator. Let i = (α, n). Then Q(i, j) = 0 unless j = (α, n) or j = (α, n + 1). And

Q[i, (α, n + 1)] = −Q(i, i) = q(i).

(e) Formal construction. Use (27). Let φ be the complementary interval whose right endpoint is 0. Let i = (φ, 0) and ω ∈ Ω_i^*, as defined in (12). Remember (14): π_i{Ω_i^*} = 1. Remember I_i from (1). For α ∈ 𝒜, let λ(α, ω) be the sum of τ(β, m) over all (β, m) ∈ I_i with β to the left of α; let ρ(α, ω) be the sum of τ(β, m) over all (β, m) ∈ I_i with β not to the right of α. By (15-16), the set

{t: X(t, ω) ∈ I}

is a countable union of maximal intervals (λ(α, ω), ρ(α, ω)) for α ∈ 𝒜. Furthermore,

α → (λ(α, ω), ρ(α, ω))

is 1-1 and order-preserving. On interval φ, the process runs in order through {(φ, n): n = 0, 1, ...}. On interval α > φ, the process runs in order through {(α, n): n = ..., −1, 0, 1, ...}. So

S_φ(ω) = {t: X(t, ω) = φ}

is closed and homeomorphic to C; the function X(·, ω) is continuous on S_φ(ω).  *

RELEVANT FACT. Let S and T be two closed subsets of [0, ∞), without interior. Then S is homeomorphic to T iff the set of complementary intervals of S is order-isomorphic to the set of complementary intervals of T.

4. THE SECOND CONSTRUCTION

Informal description
Let I be a countably infinite state space. Let C be a countably infinite set,
linearly ordered by <, with first element 0. The intervals of constancy will
be indexed by some initial segment of C. Let ξ(c) be the value of the process on interval c. I want ξ(·) to be Markovian. What does this mean? Let Γ_i be the distribution of ξ when the starting state is i. This explains condition (63).
Fix a present index d ∈ C. Let
C_d = {c: c ∈ C and c ≥ d}.
The past is
{ξ(c): c ∈ C and c ≤ d}.
The future is
{ξ(c): c ∈ C_d}.
Given the past and ξ(d) = j, the conditional distribution of the future should be the Γ_j-distribution of the whole jump process {ξ(c)}. As far as Γ_j is concerned, c runs over all of C. The index c in the future only runs over C_d.
So I have to make these index sets order-isomorphic. More explicitly, there should be a strictly increasing map
M_d = M(d, ·)
of Cd onto C. Suppose a < b < c are in C. There are now two ways to
compute the position of c relative to b. The direct method gives M(b, c).
The indirect method maps first by M(a, .), getting
0 = M(a, a) < M(a, b) < M(a, c);
and then computes the position of M(a, c) relative to M(a, b). I want the
two methods to agree:
M(b, c) = M[M(a, b), M(a, c)].
You should now accept condition (51) on the order structure of C and (66)
on the Markov property.
DIGRESSION. Let c = M_a^{−1} b, so c ∈ C_a and b = M_a c. I claim

M_c = M_b M_a.
Both sides have domain C_c. Take d ∈ C_c:
M_c d = M(M_a c, M_a d) = M_b M_a d.
So C is a semigroup with identity 0, where
a + b = M_a^{−1} b.
And b ≥ a iff −a + b = M_a b ∈ C. So you're really facing the nonnegative part of a countably infinite, linearly ordered, non-commutative group.  *
Here is the final slogan: Given the visiting process, the holding times are conditionally independent and exponential, with parameter depending only on what the process is currently holding onto. Let q specify these parameters: namely, q is a nonnegative function on I. Given ξ, the length τ(c) of the c-th interval of constancy is conditionally exponential with parameter q[ξ(c)], and these lengths are conditionally independent as c varies over C.
How could you construct such an object? It is easy to generate a process {[ξ(c), τ(c)]: c ∈ C} of states and holding times which has the right properties. You might as well use the coordinate process on the set of functions from C to I × (0, ∞]. This is done in (75). The process should be ξ(c) on the c-th interval of constancy, which has length τ(c). But where do you put this interval? The sample function should spend Lebesgue almost no time outside intervals of constancy. This suggests making the left endpoint λ(c) of the c-th interval equal to the sum of the lengths of the previous intervals:

λ(c) = Σ_d {τ(d): d ∈ C and d < c}.

The right endpoint of the c-th interval should be

ρ(c) = λ(c) + τ(c),

so the length is τ(c). And

X(t) = ξ(c) for λ(c) ≤ t < ρ(c),

so X is ξ(c) on the c-th interval, which is half-open. This is done in (72).
How much sample function have I got? If λ(c) < ∞, then the time domain of the sample function extends to ρ(c) at least. To cover the line, I want

sup_c {ρ(c): λ(c) < ∞} = ∞.

To get it, assume (65). The intervals of constancy now cover Lebesgue almost all of [0, ∞). On the exceptional null set, put the process in its fictitious state φ, namely the adjoined point in the one-point compactification of discrete I.
WARNING. Give C the order topology, and complete it. Suppose x is in
this completion, and is a limit point of C from both sides. If you don't take
precautions, you get stuck with
G = Σ {τ(c): c < x} < ∞,
but
Σ {τ(c): x < c < c*} = ∞ for all c* > x.
Then you can't continue the sample function past G.
I want [λ(c), ρ(c)) to be the c-th interval of constancy. Part of this is free:
c → [λ(c), ρ(c)) for c ∈ C
is order-preserving and 1-1. But why is [λ(c), ρ(c)) a maximal interval of constancy? If there happens to be a least element 1 > 0 in C, I assume (64). This encourages [λ(c), ρ(c)) to be maximal. The argument is in (94).
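In the simplest case C = {0, 1, ...} of (53), the construction just described can be sketched in a few lines of Python (the helper names are mine, not the text's): λ(c) is the sum of the previous holding times, ρ(c) = λ(c) + τ(c), and X is ξ(c) on the half-open interval [λ(c), ρ(c)).

```python
import bisect

def build_X(xi, tau):
    """Lay the intervals of constancy end to end: xi[c] is the state on
    the c-th interval, tau[c] its length; X(t) = xi[c] on [lam[c], rho[c])."""
    lam = [0.0]
    for t in tau[:-1]:
        lam.append(lam[-1] + t)               # left endpoints lam(c)
    rho = [l + t for l, t in zip(lam, tau)]   # right endpoints rho(c)
    def X(t):
        c = bisect.bisect_right(lam, t) - 1
        return xi[c] if t < rho[c] else None  # None stands in for phi
    return X

X = build_X(xi=[0, 1, 2, 3], tau=[1.0, 0.5, 2.0, 1.5])
# X is 0 on [0, 1), 1 on [1, 1.5), 2 on [1.5, 3.5), 3 on [3.5, 5)
```

The half-open intervals make X right continuous, matching the convention in (72); past the last interval the sketch returns the fictitious state.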

The parameters and the space Ω


(48) Let I be a countably infinite set. Give I the discrete topology, and let Ī = I ∪ {φ} be its one-point compactification.
(49) Let q be a function from I to [0, ∞).
Do not assume q(i) > O.
(50) Let C be a countably infinite set, linearly ordered by <, with a first element 0. For d ∈ C, let
C_d = {c: c ∈ C and c ≥ d}.

(51) For each d ∈ C, assume that there is a 1-1 order-preserving map M_d = M(d, ·) of C_d onto C, such that:
M_0 is the identity;
M[M(a, b), M(a, c)] = M(b, c) for a < b < c in C.
(52) Say C is discrete iff there is a least c > 0. If C is discrete, each c has an immediate successor, call it s(c): use (51). Let 1 = s(0). Otherwise, say C is indiscrete: then C is order-isomorphic to the nonnegative rationals.
(53) Illustration. Let C = {O, 1,2, ... } with the usual order. Define
M(d, n) = n - d for n = d, d + 1, ... .
(54) Illustration. Let C consist of all pairs (m, n) of nonnegative integers.
Let (m, n) < (m′, n′) iff
m < m′, or
m = m′ and n < n′.
Let d = (m, n) ≤ (m′, n′). Define
M_d(m′, n′) = (0, n′ − n) for m′ = m
= (m′ − m, n′) for m′ > m.
You check (51).
(55) Illustration. Let C consist of all pairs (m, n) of integers, such that:
n ≥ 0 when m = 0. Let (m, n) < (m′, n′) iff
m < m′, or
m = m′ and n < n′.
Let d = (m, n) ≤ (m′, n′). Define

M_d(m′, n′) = (0, n′ − n) for m′ = m
= (m′ − m, n′) for m′ > m.
You check (51).
(56) Illustration. Let C be the nonnegative rationals, with the usual order.
Let M_d(r) = r − d for r ≥ d.
(57) Let Ω be the set of all functions ω from C to I. Let
ξ(c, ω) = ω(c) for c ∈ C and ω ∈ Ω.
Give Ω the smallest σ-field over which all ξ(c) are measurable.
(58) For each i ∈ I, let Γ_i be a probability on Ω.
There is a condition (65) on q and Γ. To state it, and for later use, make the following definitions.
(59) For c ∈ C and ω ∈ Ω, let
λ*(c, ω) = Σ_d {1/q[ω(d)]: d ∈ C and d < c}
ρ*(c, ω) = Σ_d {1/q[ω(d)]: d ∈ C and d ≤ c}.

(60) Observation. Let c ∈ C and ω ∈ Ω with λ*(c, ω) < ∞. For each i ∈ I, there are only finitely many indices d < c with ω(d) = i.
(61) Let Ω* be the set of all ω ∈ Ω such that
sup_c {ρ*(c, ω): c ∈ C and λ*(c, ω) < ∞} = ∞.

(62) Observation. ω ∈ Ω* iff

Σ_c {1/q[ω(c)]: c ∈ C and λ*(c, ω) < ∞} = ∞.

The conditions

For each i ∈ I, suppose:

(63) Γ_i{ξ(0) = i} = 1;
(64) Γ_i{ξ(1) ≠ i} = 1 if C is discrete, as defined in (52);
(65) Γ_i{Ω*} = 1, where Ω* was defined in (59, 61).
The next assumption is the Markov property. Suppose
(66) Γ_i{ξ(c_m) = i_m for m = 0, ..., n} = Π_{m=0}^{n−1} γ[M(c_m, c_{m+1}), i_m, i_{m+1}],
where
c_0 = 0 < c_1 < ... < c_n are in C, and
i_0 = i, i_1, ..., i_n are in I, and
(67) γ(c, j, k) = Γ_j{ξ(c) = k}.
Here is a digression. What γ can appear in (67)? Clearly,
γ(c, j, k) ≥ 0
Σ_k γ(c, j, k) = 1
γ(0, i, i) = 1
γ(1, i, i) = 0 if C is discrete
γ(b, i, k) = Σ_j γ(a, i, j) · γ(M_a b, j, k) for a < b.
Conversely, suppose γ satisfies these conditions. Then you can define Γ by (66) and Kolmogorov: use (51) to help the consistency. Properties (63-64) are immediate. Property (65) remains an assumption.

The construction
(68) Let W be the set of all functions w from C to (0, ∞]. Let
τ(c, w) = w(c) for c ∈ C and w ∈ W.
Give W the smallest σ-field over which all τ(c) are measurable.

(69) For c ∈ C and w ∈ W, let

λ(c, w) = Σ_d {τ(d, w): d ∈ C and d < c}

ρ(c, w) = λ(c, w) + τ(c, w) = Σ_d {τ(d, w): d ∈ C and d ≤ c}.
(70) Let:!( = Q X W, with the product a-field.

(71) Let x = (ω, w) with ω ∈ Ω and w ∈ W. Let c ∈ C. Define

ξ(c, x) = ξ(c, ω) and τ(c, x) = τ(c, w)

λ(c, x) = λ(c, w) and ρ(c, x) = ρ(c, w).

(72) Define a process X on 𝒳:

X(t, x) = ξ(c, x) if λ(c, x) ≤ t < ρ(c, x) for some c ∈ C

= φ if λ(c, x) ≤ t < ρ(c, x) for no c ∈ C.
(73) For each ω ∈ Ω, let η_{q(ω)} be the probability on W which makes the τ(c) independent and exponentially distributed, the parameter for τ(c) being q[ω(c)].
(74) For A ⊂ 𝒳 and ω ∈ Ω, let A(ω) be the ω-section of A:
A(ω) = {w: w ∈ W and (ω, w) ∈ A}.

(75) For each i ∈ I, let π_i be the following probability on 𝒳:

π_i{A} = ∫_Ω η_{q(ω)}{A(ω)} Γ_i(dω);

the prior definitions are (57-58) and (68-70) and (73-74).
To state (76), let
c_0 = 0 < c_1 < ... < c_n be in C
i_0 = i, i_1, ..., i_n be in I;
and let
U_0, U_1, ..., U_n
be independent and exponentially distributed, with respective parameters q(i_0), q(i_1), ..., q(i_n). Let B be a Borel subset of Euclidean (n + 1)-space.
(76) π_i{ξ(c_m) = i_m for m = 0, ..., n and (τ(c_0), τ(c_1), ..., τ(c_n)) ∈ B}
= Γ_i{ξ(c_m) = i_m for m = 0, ..., n} · Prob {(U_0, U_1, ..., U_n) ∈ B}.
Use (76) and (63):
(77) π_i{ξ(0) = i} = 1 and π_i{τ(0) > t} = e^{−q(i)t}.

Properties of X

(78) X is jointly measurable.
(79) X(t) = ξ(0) for 0 ≤ t < τ(0).
(80) X(0) = ξ(0), by (79).
(81) π_i{X(0) = i} = 1, by (80, 77).
(82) If C is discrete in the sense of (52), then
X(t) = ξ(1) for τ(0) ≤ t < τ(0) + τ(1).
(83) Review (68-69). Let
C_f[w] = {c: c ∈ C and λ(c, w) < ∞}
ρ_f(w) = Σ_c {τ(c, w): c ∈ C_f[w]}
C_f[ω, w] = C_f[w] and ρ_f(ω, w) = ρ_f(w).
The f is for finite. The square bracket is to prevent confusion with sections. Fix x = (ω, w) ∈ 𝒳. Review (70-72).
(84) The map c → [λ(c, x), ρ(c, x)) is 1-1 and strictly increasing on C_f[x].
(85) X(t, x) = ξ(c, x) for λ(c, x) ≤ t < ρ(c, x), an interval of length τ(c, x).
(86) The union of [λ(c, x), ρ(c, x)) as c varies over C_f[x] is
{t: t < ρ_f(x) and X(t, x) ∈ I}.
(87) Lebesgue {t: t < ρ_f(x) and X(t, x) = φ} = 0.
(88) X(t, x) = φ for t ≥ ρ_f(x).
(89) WARNING. X(·, x) need not be regular in the sense of (7.2). And [λ(c, x), ρ(c, x)) need not be a maximal interval of constancy. See (94).

The set 𝒳_1

(90) Review (52, 57). If C is discrete, let

Ω^s = {ω: ω ∈ Ω and ω[s(c)] ≠ ω(c) for all c ∈ C}.
If C is indiscrete, let Ω^s = Ω.
(91) Lemma. Γ_i{Ω^s} = 1, where Ω^s is defined in (90).

PROOF. Use (64, 66).  *

(92) Review (59, 61) and (68-70) and (83) and (90). Let 𝒳_1 be the set of x = (ω, w) ∈ 𝒳 such that:
ω ∈ Ω* ∩ Ω^s, and
λ(c, w) < ∞ iff λ*(c, ω) < ∞ for all c ∈ C, and
ρ_f(w) = ∞.

(93) Lemma. π_i{𝒳_1} = 1, where 𝒳_1 is defined in (92).


PROOF.† For c ∈ C and ω ∈ Ω, let
W[c, ω] = {w: w ∈ W and λ(c, w) < ∞} when λ*(c, ω) < ∞
= {w: w ∈ W and λ(c, w) = ∞} when λ*(c, ω) = ∞.
For ω ∈ Ω, let
W_ω = ∩ {W[c, ω]: c ∈ C}.
Let
W_∞ = {w: w ∈ W and ρ_f(w) = ∞}.
Let
Ω_1 = Ω* ∩ Ω^s.

† There's less here than meets the eye; but keep track of the notation.
Then 𝒳_1 is the set of pairs (ω, w) with ω ∈ Ω_1 and w ∈ W_ω ∩ W_∞. By (75),

π_i{𝒳_1} = ∫_{Ω_1} η_{q(ω)}{W_ω ∩ W_∞} Γ_i(dω).

But Γ_i{Ω_1} = 1 by (65) and (91). Fix ω ∈ Ω. By (73) and (5.33),
η_{q(ω)}{W[c, ω]} = 1 for each c;
so
η_{q(ω)}{W_ω} = 1.
Review (59, 61). Fix ω ∈ Ω*. Let
C_f^*[ω] = {c: λ*(c, ω) < ∞}
σ_f(ω, w) = Σ_c {w(c): c ∈ C_f^*[ω]}.
By (62, 73) and (5.33),
η_{q(ω)}{w: w ∈ W and σ_f(ω, w) = ∞} = 1.
Review (83). If w ∈ W_ω, then C_f^*[ω] = C_f[w], so
σ_f(ω, w) = ρ_f(ω, w) = ρ_f(w).
Consequently,

W_ω ∩ W_∞ = W_ω ∩ {w: w ∈ W and σ_f(ω, w) = ∞}

has η_{q(ω)}-probability 1.  *

(94) Proposition. (a) If x ∈ 𝒳_1, then X(·, x) is regular in the sense of (7.2).
(b) If x ∈ 𝒳_1 and c ∈ C and λ(c, x) < ∞, then [λ(c, x), ρ(c, x)) is a maximal interval of constancy in X(·, x).
PROOF. Claim (a). I will argue that X(·, x) is continuous from the right at t. The existence of a limit from the left is similar. If
t ∈ ∪_c [λ(c, x), ρ(c, x)),
it's easy. Otherwise, X(t, x) = φ. Let t_n ↓ t, with X(t_n, x) = j_n ∈ I. I have to make j_n → φ. By (84-86), you can find a nonincreasing sequence c_n ∈ C with
λ(c_n, x) ≤ t_n < ρ(c_n, x) and ξ(c_n, x) = j_n.
As definition (92) implies, λ*(c_n, x) < ∞. Use (60) and definition (71).
Claim (b). Suppose C is discrete in the sense of (52); the argument for indiscrete C is easier. Suppose ρ(c, x) < ∞. Then
ρ(c, x) = λ(s(c), x).
But ξ(s(c), x) ≠ ξ(c, x), because x ∈ Ω^s × W, as defined in (90). So X(·, x) changes at ρ(c, x). If c = s(d) for some d, the same argument makes X(·, x) change at λ(c, x). If c > 0 and c = s(d) for no d, then c is a limit point of C from the left. Use (60) to find c(x) < c, such that
ξ(d, x) ≠ ξ(c, x) for c(x) < d < c.
So
X(t, x) ≠ ξ(c, x) for ρ(c(x), x) ≤ t < λ(c, x).
This forces X(·, x) to change at λ(c, x).  *
(95) Lemma. π_i{X(t) ∈ I} = 1.
PROOF. If q(i) = 0, use (77) and (79). Suppose q(i) > 0. By (76),
(96) τ(0) is exponential with parameter q(i), and is independent of {ξ(c), τ(c): c > 0}, relative to π_i.
Let
Y(s, x) = X[τ(0, x) + s, x].
So Y is jointly measurable by (78). If x ∈ 𝒳_1, then ρ_f(x) = ∞ by definitions (83, 92), so
Lebesgue {s: Y(s, x) = φ} = 0
by (87). Temporarily, let
λ_0(c) = Σ_d {τ(d): d ∈ C and 0 < d < c}.
Then
Y(s) = ξ(c) if λ_0(c) ≤ s < λ_0(c) + τ(c) for some c > 0 in C
= φ if λ_0(c) ≤ s < λ_0(c) + τ(c) for no c > 0 in C.
Therefore, Y is measurable on {ξ(c), τ(c): c > 0}. Now (96) makes τ(0) exponential with parameter q(i), independent of Y, relative to π_i. Complete the argument as in (19).  *

The σ-fields 𝒜 and ℬ

Fix i and j in I, and d ∈ C.

(97) Let 𝒜 be the σ-field in Ω spanned by ξ(c) with c ≤ d.
(98) Let
c_0 = 0 < c_1 < ... < c_n be in C
e_m = M_d^{−1} c_m for m = 0, ..., n
j_0 = j, j_1, ..., j_n be in I
B_1 = {ω: ω ∈ Ω and ω(c_m) = j_m for m = 0, ..., n}
B̄_1 = {ω: ω ∈ Ω and ω(e_m) = j_m for m = 0, ..., n}.
The prior definitions are (51) of M and (57) of Ω.
(99) Lemma. Let h be a nonnegative, 𝒜-measurable function on Ω, as defined in (97). Define B_1 and B̄_1 as in (98). Then

∫_{B̄_1} h dΓ_i = [∫_{{ξ(d) = j}} h dΓ_i] · Γ_j{B_1}.

PROOF. This restates the Markov property (66): time d is the present, h is in the past, B̄_1 is in the future, and B_1 is B̄_1 shifted to start at time 0. Formally, let
d_0 = 0 < d_1 < ... < d_N = d be in C
i_0 = i, i_1, ..., i_N = j be in I
D = {ω: ω ∈ Ω and ω(d_m) = i_m for m = 0, ..., N}.
Now
e_0 = M_d^{−1} c_0 = M_d^{−1} 0 = d = d_N
d_0 < d_1 < ... < d_N = e_0 < e_1 < ... < e_n
j_0 = j = i_N.
By (66-67),

Γ_i{D ∩ B̄_1} = p · q,

where

p = Π_{m=0}^{N−1} γ[M(d_m, d_{m+1}), i_m, i_{m+1}] = Γ_i{D},

and

q = Π_{m=0}^{n−1} γ[M(e_m, e_{m+1}), j_m, j_{m+1}] = Γ_j{B_1}:
because (51) makes
M(e_m, e_{m+1}) = M(M_d e_m, M_d e_{m+1}) = M(c_m, c_{m+1}).
So (99) holds for h = 1_D. By (10.16), the result holds for h = 1_A with A ∈ 𝒜. Now extend.  *

(100) Let ℬ be the σ-field in W spanned by τ(c) with c ≤ d.

(101) Lemma. Review (73). If B ∈ ℬ, as defined in (100), then ω → η_{q(ω)}{B} is 𝒜-measurable, as defined in (97).

PROOF. This is easy when
B = {w: w ∈ W and w(c_m) < t_m for m = 0, ..., n}
with
c_0 < c_1 < ... < c_n ≤ d in C.
Now extend.  *
The generator
(102) Let P(t, i, j) = π_i{X(t) = j}.
(103) Define a matrix Q on I as follows:
Q(i, i) = −q(i);
when C is discrete in the sense of (52),
Q(i, j) = q(i)·γ(1, i, j) for j ≠ i;
when C is indiscrete,
Q(i, j) = 0 for j ≠ i.
(104) Lemma. P′(0) = Q, as defined in (102-103).
PROOF. When C is discrete, you can use the corresponding argument in (5.39). The results you need are (64, 76, 79-82).
Suppose C is indiscrete. Fix i and j in I, with j = i allowed. The case q(i) = 0 is easy, so assume q(i) > 0. Fix ∞ ∉ C, and pretend ∞ > c for all c ∈ C. Review definition (59, 61) of Ω*. Define a measurable mapping K from Ω* to C ∪ {∞} as follows. If λ*(c, ω) < ∞ and ω(c) = j for some c > 0, there is a least such c by (60); and K(ω) is this least c. Otherwise, K(ω) = ∞. Count C off as {c_1, c_2, ...}. Define a measurable mapping L from Ω* to C as follows:
L(ω) is the c_n with least n satisfying 0 < c_n < K(ω).
Because C is indiscrete, L is properly defined. Define a measurable mapping ζ from Ω* to I:
ζ(ω) = ω[L(ω)].
For each k ∈ I, let U_i and U_k be independent, exponential random variables, with parameters q(i) and q(k). If ω ∈ Ω* and ξ(0, ω) = i and ζ(ω) = k, then (73) shows:
(105) the η_{q(ω)}-distribution of τ(0) and τ[L(ω)] coincides with the distribution of U_i and U_k.
I claim:
(106) π_i{τ(0) ≤ t and X(t) = j} = o(t) as t → 0.
To argue (106), abbreviate
A_t = {τ(0) ≤ t and X(t) = j}.
By (65) and (75),

(107) π_i{A_t} = ∫_{Ω*} η_{q(ω)}{A_t(ω)} Γ_i(dω).

Fix ω ∈ {Ω* and ξ(0) = i and ζ = k}. Abbreviate

E[t, ω] = {W and τ(0) + τ[L(ω)] ≤ t}.
Define W_ω as in (93), and remember η_{q(ω)}{W_ω} = 1 by (5.33). I claim that W_ω ∩ A_t(ω) ⊂ E[t, ω]. Indeed, fix an x = (ω, w) with w ∈ W_ω ∩ A_t(ω). Then X(t, x) = j. So by (72) there is a c ∈ C with

λ(c, w) ≤ t < ρ(c, w) and ξ(c, ω) = j.

As (92) implies, λ*(c, ω) < ∞. So K(ω) ∈ C, and
0 < L(ω) < K(ω) ≤ c.
Now (84) shows

τ(0, w) + τ[L(ω), w] ≤ ρ[L(ω), w] < λ(c, w) ≤ t,

proving W_ω ∩ A_t(ω) ⊂ E[t, ω]. Conclude

η_{q(ω)}{A_t(ω)} ≤ η_{q(ω)}{E[t, ω]} = Prob {U_i + U_k ≤ t}

by (105). Combine this with (107):

π_i{A_t} ≤ Σ_k π_i{ζ = k} · Prob {U_i + U_k ≤ t}.

But (5.34) makes

t^{−1} Prob {U_i + U_k ≤ t} ≤ q(i), and t^{−1} Prob {U_i + U_k ≤ t} → 0 as t → 0.

Now dominated convergence settles (106).
If j ≠ i, then (79) makes

{ξ(0) = i and X(t) = j} ⊂ {τ(0) ≤ t};

so (77) and (106) prove
π_i{X(t) = j} = o(t) as t → 0.
This proves P′(0, i, j) = 0 = Q(i, j) for i ≠ j.
I will now compute P′(0, i, i). Check that {ξ(0) = i and X(t) = i} equals
{ξ(0) = i and τ(0) > t} ∪ {ξ(0) = i and τ(0) ≤ t and X(t) = i}.
Use (77) and (106):
π_i{X(t) = i} = e^{−q(i)t} + o(t) as t → 0.  *
The theorem
(108) Theorem. Suppose (48-51) and (63-66). Define the probability triple (𝒳, π_i) by (70, 75). Define the process X on 𝒳 by (72). Define P and Q by (102-103). Then
(a) P is a standard stochastic semigroup on I, with generator Q.
(b) X is Markov with stationary transitions P and starting state i, relative to π_i.
NOTE. The construction has properties (78-88) and (93-94).
PROOF. To start with, fix i and j in I, fix t ≥ 0, and fix d ∈ C.
(109) Let D = {W and λ(d) ≤ t < λ(d) + τ(d)}; the definitions are (68-69).
(110) Define a mapping T_1 of Ω into Ω:
(T_1 ω)(c) = ω(M_d^{−1} c) for c ∈ C;
the prior definitions are (51, 57).
(111) Define a mapping T_2 of D into W:
(T_2 w)(0) = λ(d, w) + τ(d, w) − t;
(T_2 w)(c) = w(M_d^{−1} c) for c ∈ C with c > 0;
the prior definitions are (51), (68-69), and (109).
(112) Define a mapping T of Ω × D into 𝒳:
T(ω, w) = (T_1 ω, T_2 w).
You have to argue
(113) X(t + s) = X(s) ∘ T for all s ≥ 0, on Ω × D.
This is a straightforward and boring project, using (84-86) and (88).
(114) Define a subset A of 𝒳 as follows.
A = A_1 × (A_2 ∩ D), where:
D was defined in (109);
A_1 = {ω: ω ∈ Ω and ω(d_m) = i_m for m = 0, ..., N};
A_2 = {w: w ∈ W and w(d_m) > t_m for m = 0, ..., N − 1};
d_0 = 0 < d_1 < ... < d_N = d are in C;
i_0 = i, i_1, ..., i_N = j are in I;
t_0, t_1, ..., t_{N−1} are nonnegative numbers.
NOTE. m < N in A_2.
(115) Define a subset B of 𝒳 as follows.

B = B_1 × B_2, where:
B_1 = {ω: ω ∈ Ω and ω(c_m) = j_m for m = 0, ..., n};
B_2 = {w: w ∈ W and w(c_m) > u_m for m = 0, ..., n};
c_0 = 0 < c_1 < ... < c_n are in C;
j_0 = j, j_1, ..., j_n are in I;
u_0, u_1, ..., u_n are nonnegative numbers.
I claim:
(116) π_i(A ∩ T^{−1}B) = π_i(A) · π_j(B).
To start on (116), make the following definition.
(117) With the notation of (115), let
e_m = M_d^{−1} c_m for m = 0, ..., n,
the M coming from (51); so
d_0 = 0 < d_1 < ... < d_N = d = e_0 < e_1 < ... < e_n,
the d_m coming from (114); let
B̄_1 = {ω: ω ∈ Ω and ω(e_m) = j_m for m = 0, ..., n};
B̄_2 = {w: w ∈ W and w(e_m) > u_m for m = 1, ..., n};
D̄ = {w: w ∈ D and t + u_0 < λ(d, w) + τ(d, w)},
where D was defined in (109).

NOTE. m > 0 in B̄_2.

Remember i_N = j = j_0 and d_N = d = e_0. Confirm

(118) A ∩ T^{−1}B = (A_1 ∩ B̄_1) × (A_2 ∩ D̄ ∩ B̄_2).

By (75),
(119) π_i(A ∩ T^{−1}B) = ∫_{A_1 ∩ B̄_1} η_{q(ω)}(A_2 ∩ D̄ ∩ B̄_2) Γ_i(dω).

But A_2 and D̄ are measurable over the σ-field ℬ of (100). Use definition (73):
(120) η_{q(ω)}(A_2 ∩ D̄ ∩ B̄_2) = η_{q(ω)}(A_2 ∩ D̄) · e^{−v}, where
v = q(j_1)u_1 + ... + q(j_n)u_n and ω ∈ B̄_1.

Let 𝒟 be the σ-field in W spanned by τ(c) with c < d. Then A_2 and λ(d) are 𝒟-measurable: definitions (114) and (69). Abbreviate u = u_0. Remember
i_N = j = j_0 and d_N = d = e_0. Use (73) and (5.30):

(121) η_{q(ω)}(A_2 ∩ D̄) = η_{q(ω)}(A_2 ∩ D) · e^{−q(j)u} when ω(d) = j;
the set D comes from (109).
Combine (119-121):
(122) π_i(A ∩ T^{−1}B) = e^{−s} ∫_{A_1 ∩ B̄_1} η_{q(ω)}(A_2 ∩ D) Γ_i(dω), where

s = q(j_0)u_0 + q(j_1)u_1 + ... + q(j_n)u_n.

But A_2 ∩ D ∈ ℬ; the definitions are (114, 109) and (100). So (101) makes
ω → η_{q(ω)}(A_2 ∩ D)
𝒜-measurable, as defined in (97). Check A_1 ∈ 𝒜 and A_1 ⊂ {ξ(d) = j} from definition (114). By (99),

(123) ∫_{A_1 ∩ B̄_1} η_{q(ω)}(A_2 ∩ D) Γ_i(dω) = [∫_{A_1} η_{q(ω)}(A_2 ∩ D) Γ_i(dω)] · Γ_j(B_1).

By (75) and (114),

(124) π_i(A) = ∫_{A_1} η_{q(ω)}(A_2 ∩ D) Γ_i(dω).

By (76) and (115),

(125) π_j(B) = e^{−s} Γ_j(B_1), where s comes from (122).
Combine (122-125) to get (116).
The class of sets B of the form (115), with variable n, c_m, j_m, and u_m, is closed under intersection, modulo the null set, and generates the full σ-field on {ξ(0) = j}. Each B is a subset of {ξ(0) = j}; and this set is of the form (115), with n = 0 and u_0 = 0. Both sides of (116) are countably additive in B. By (10.16), equality holds in (116) for any A of the form (114), and any measurable subset B of {ξ(0) = j}.
I now have to vary A.
(126) Let A(d) = {ξ(0) = i and ξ(d) = j and λ(d) ≤ t < ρ(d)} ⊂ 𝒳; let 𝒜(d) be the σ-field of subsets of A(d) generated by sets of the form (114).
The class of sets A of the form (114), with variable N, d_m, i_m, and t_m, is closed under intersection, modulo the null set. Each A is a subset of A(d), and A(d) is of the form (114), with N = 1 and t_0 = 0. Both sides of (116) are countably additive in A. By (10.16), equality holds in (116) for all A ∈ 𝒜(d) and all measurable subsets B of {ξ(0) = j}. Put
B = {ξ(0) = j and X(s) = k}.
Use (113):
T^{−1}B = {Ω × D and ξ(d) = j and X(t + s) = k}.
From (126),
(127) A(d) ⊂ {Ω × D and ξ(d) = j}.
So extended (116) makes
(128) π_i{A and X(t + s) = k} = π_i{A} · π_j{X(s) = k} for all A ∈ 𝒜(d).

How big is 𝒜(d)?

(129) Let ℱ(t) be the σ-field in 𝒳 spanned by X(u) for 0 ≤ u ≤ t.

I claim
(130) If A ∈ ℱ(t), as defined in (129), then A(d) ∩ A ∈ 𝒜(d), as defined in (126).
NOTE. I do not claim A(d) ∈ ℱ(t).

To argue (130), let 0 ≤ u ≤ t and let h ∈ I. Then

{A(d) and X(u) = h} = ∪ {G(c): c ∈ C and c ≤ d}, where
G(c) = {A(d) and ξ(c) = h and λ(c) ≤ u < ρ(c)}.
If c < d, then G(c) ∈ 𝒜(d) by definitions (126) and (69). If c = d, then ρ(c) = ρ(d) > u is free on A(d), so G(c) is still in 𝒜(d). This proves (130).
The sets A(d) of (126) are disjoint as d varies over C, and their union is
{X(0) = i and X(t) = j}.
Use (128-130):
(131) π_i{A and X(t + s) = k} = π_i{A} · π_j{X(s) = k}
for all A ∈ ℱ(t) with A ⊂ {X(0) = i and X(t) = j}.
Now use lemma (5.4). Condition (5.3a) holds by (81). Condition (5.3b) holds by (95). Condition (5.3c) holds by (131). This and (104) prove (108).  ***
5. EXAMPLES ON THE SECOND CONSTRUCTION

The first example will be useful in proving (7.51).


(132) Example. (a) Description. Let I be a countably infinite set. Let Q be a matrix on I, such that
q(i) = −Q(i, i) ≥ 0
Q(i, j) ≥ 0 for i ≠ j
Σ_j Q(i, j) = 0.
Let
r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
= 0 elsewhere.
So
r(i, i) = 0
r(i, j) = 0 when q(i) = 0
Σ_j r(i, j) = 1 when q(i) > 0.
Let p be a probability on I. Starting from i, the process jumps according to r, and the holding times are filled in according to q. If the process hits an absorbing state j, that is q(j) = 0, the visit to j has infinite length and the sample function is completely defined. Otherwise, the sample function makes an infinite number of visits. However, the time θ to perform these visits may be finite. If so, start the process over again at a state chosen from p, independent of the past sample function. Repeat this at any future exceptional times. See Figure 2. If θ is finite with positive probability, then there is a 1-1 correspondence between p and the transitions P^p.

..
1
I
j
! J
~/ ,i
W,I)

~(O, 1) W,O) rr(1, 1)-

--r(O, 1)-1 ~(O, 2) .r(1,O)- ~(2, 1)


+r(0,2). H2, 0) ~r(2, 1)~
~(O, 0) io-r(2,0)-

~r(O, 0)- X

Figure 2.
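The passage from Q to the jump matrix r in (a) can be checked mechanically. Here is a hedged sketch in Python with a toy three-state generator of my own (state 2 absorbing); the dictionary layout is an assumption for illustration, not anything in the text.

```python
def jump_matrix(Q):
    """r(i,j) = Q(i,j)/q(i) for i != j when q(i) = -Q(i,i) > 0, and
    r(i,j) = 0 elsewhere, as in (132)(a)."""
    states = sorted(Q)
    r = {}
    for i in states:
        q_i = -Q[i][i]
        for j in states:
            r[(i, j)] = Q[i][j] / q_i if (i != j and q_i > 0) else 0.0
    return r

# toy generator: rows sum to 0, off-diagonal entries nonnegative
Q = {0: {0: -2.0, 1: 1.5, 2: 0.5},
     1: {0: 1.0, 1: -1.0, 2: 0.0},
     2: {0: 0.0, 1: 0.0, 2: 0.0}}     # q(2) = 0: state 2 is absorbing
r = jump_matrix(Q)
# row 0 and row 1 of r sum to 1; row 2 vanishes, since q(2) = 0
```

This reproduces the three displayed properties of r: zero diagonal, zero rows at absorbing states, and stochastic rows where q(i) > 0.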

(b) State space. Fix a ≠ b outside I. The state space (48) is

I ∪ {a, b}.
(c) Holding times. Extend q to vanish at a and b. So q is defined on I ∪ {a, b}.
(d) Generator. Extend Q so Q(i, c) = Q(c, i) = 0 for i ∈ I and c = a or b.
(e) Formal construction. Define C, <, and M as in (54). Extend r to a matrix on I ∪ {a, b} as follows:
r(i, a) = 0 or 1 according as q(i) > 0 or q(i) = 0, for i ∈ I;
r(i, b) = 0 for i ∈ I;
r(c, i) = 0 for c = a or b and i ∈ I;
r(a, b) = r(b, a) = 1;
r(a, a) = r(b, b) = 0.
I need a and b to get (64). The extended matrix is stochastic on I ∪ {a, b}. Define the probability Γ_i of (57-58) by the requirements that it makes:
{ξ(m, n): n = 0, 1, ...} independent Markov chains with stationary transitions r, for m = 0, 1, ...;
ξ(0, 0) = i almost surely;
ξ(m, 0) have distribution p, for m > 0.
You should check (48-51) and (63-64) and (66). I will check (65) for i ∈ I; you do i = a or b. Relative to Γ_i, the variables 1/q[ξ(m, 0)] are independent and identically distributed for m = 1, 2, .... They are positive. So
(133) Σ_{c∈C} 1/q[ξ(c)] ≥ Σ_{m=1}^∞ 1/q[ξ(m, 0)] = ∞
with Γ_i-probability 1. Fix one ω satisfying (133). I say ω ∈ Ω*, as defined in (59, 61). If λ*(c, ω) < ∞ for all c, this follows from (62). If λ*(c, ω) = ∞ for some c, then there is a least such c, call it c(ω): because C is well-ordered. So
λ*(c, ω) < ∞ iff c < c(ω);
and
Σ_c {1/q[ω(c)]: c ∈ C and λ*(c, ω) < ∞} = Σ_c {1/q[ω(c)]: c ∈ C and c < c(ω)}
= λ*(c(ω), ω)
= ∞.
So (62) works again. Theorem (108) completes the formal construction.  *
Write
π_i^p for the π_i of (75), to show the dependence on p.
I would now like to isolate the properties of the construction that will be useful in (7.51). Fix i ∈ I. Use (76):
(134) ξ(1, 0) is independent of {ξ(0, n), τ(0, n): n = 0, 1, ...} and has distribution p, relative to π_i^p.
Let
σ_n = τ(0, 0) + ... + τ(0, n − 1) for n = 1, 2, ...,
and let θ = lim_n σ_n.
Use (134):
(135) π_i^p{θ < ∞ and ξ(1, 0) = j} = π_i^p{θ < ∞} · p(j) for j ∈ I.
To state (136), let 𝒳_0 be the subset of 𝒳_1, as defined in (92), with:
ξ(m, 0) ∈ I for m = 0, 1, ...;
r[ξ(m, n), ξ(m, n + 1)] > 0 for (m, n) ∈ C.
(136) Lemma. (a) π_i^p{𝒳_0} = 1 for i ∈ I.
(b) If x ∈ 𝒳_0, then X(t, x) ∈ I for all t ≥ 0.
(c) If x ∈ 𝒳_0, then X(·, x) is regular in the sense of (7.2).
PROOF. Claim (a). Use (93) and (76).
Claim (b). Let x ∈ 𝒳_0. Suppose ξ(·, ·)(x) visits a or b. Let (m, n) be the first index with ξ(m, n) = a or b. Get n > 0 and q[ξ(m, n − 1)(x)] = 0. So λ*(m, n)(x) = ∞ by definition (59), forcing λ(m, n)(x) = ∞ by definitions (69, 92). This prevents X(·, x) from reaching a or b, by definition (72). You check X(t, x) ≠ φ.
Claim (c). Use (94).  *
(137) Lemma. Let P^p(t, i, j) = π_i^p{X(t) = j} for i and j in I. Then P^p is a standard stochastic semigroup on I, with generator Q.

PROOF. Use (108) and (136).  *
DISCUSSION. Fix x ∈ 𝒳_0, as defined for (136). Here is a description of X(·, x). Let
θ_M(x) = Σ {τ(m, n)(x): (m, n) ∈ C and m < M};
so θ_0(x) = 0 and θ_1(x) = θ(x). Suppose λ(M, N)(x) < ∞. Let m be one of 0, ..., M − 1.
X(·, x) is a step function on [θ_m(x), θ_{m+1}(x)), visiting ξ(m, n)(x) with holding time τ(m, n)(x) for n = 0, 1, ....
X(·, x) is a step function on [θ_M(x), ρ(M, N)(x)), visiting ξ(M, n)(x) with holding time τ(M, n)(x) for n = 0, ..., N.
And
lim_{n→∞} ξ(m, n)(x) = φ;
in fact,
Σ_{n=0}^∞ 1/q[ξ(m, n)(x)] < ∞.
Part of this I need. Keep x ∈ 𝒳_0, and check (138-140); the times θ and σ_n were defined after (134).

(138) σ_n(x) < ∞ iff there are at least n + 1 intervals of constancy in X(·, x). If σ_n(x) < ∞, then X(·, x) begins by visiting
ξ(0, 0)(x), ξ(0, 1)(x), ..., ξ(0, n)(x)
on intervals of length
τ(0, 0)(x), τ(0, 1)(x), ..., τ(0, n)(x).
(139) ξ(1, 0) = lim X(r) as rational r decreases to θ, on {θ < ∞}.
Use (138-139).
(140) The sets {θ < ∞} and {θ < ∞ and ξ(1, 0) = j} are in the σ-field spanned by {X(r): r is rational}, on 𝒳_0.
Here is a more explicit proof of (140). For any real t, the event that θ ≤ t coincides with the event that for any finite subset J of I, there is a rational r ≤ t with X(r) ∉ J. The event that θ < ∞ and ξ(1, 0) = j coincides with the event that for any pair of rationals r and s,
either θ ∉ (r, s)
or there is a rational t with r < θ < t < s and X(t) = j.
WARNING. π_i^p has mysterious features not controlled by the semigroup of transition probabilities, like the beauty of the sample functions. However, the π_i^p-distribution of X retracted to rational times has no mysteries at all: it is completely controlled by the semigroup and the starting state i. Since Q is silent about p, it does not determine the semigroup.

(141) Note. To get the simplest case of (132), let I be the integers. Let r(n, n + 1) = 1 for all n. Let 0 < q(n) < ∞ with

Σ_{n=−∞}^{∞} 1/q(n) < ∞.
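The finiteness of the explosion time in (141) is easy to see numerically. With the hypothetical choice q(n) = 2^n for n ≥ 0 (mine, for illustration), the time θ to complete the visits from 0 upward has mean Σ 2^{−n} = 2 and is finite almost surely. A hedged sketch:

```python
import random

def explosion_time(rng, terms=60):
    """Approximate theta = sum of the holding times in states 0, 1, ...,
    with q(n) = 2**n; the tail beyond `terms` has mean below
    2**-(terms - 1), so the truncation is harmless."""
    return sum(rng.expovariate(2.0 ** n) for n in range(terms))

rng = random.Random(3)
thetas = [explosion_time(rng) for _ in range(4000)]
mean_theta = sum(thetas) / len(thetas)   # should be near E(theta) = 2
```

Every simulated θ comes out finite, and the empirical mean sits near 2, in line with the explosive case described in (132).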


(142) Example. (a) Description. For each sample function, there are exceptional times t such that for any ε > 0, on (t − ε, t) and on (t, t + ε), the function assumes infinitely many values in the integers I. Give I the discrete topology, and let I ∪ {φ} be the one-point compactification. There is no natural way to assign an I-value to the sample function at an exceptional t. But if the sample function is set equal to φ at exceptional t, continuity in I ∪ {φ} is secured there. Starting from i, the process moves successively through i, i + 1, .... This specifies the process only on a finite interval, [0, θ). At θ and future exceptional times, restart the process at −∞. See Figure 3.
(b) State space. I is the integers.
(c) Holding times. 0 < q(i) < ∞; and Σ_{i∈I} 1/q(i) < ∞.
(d) Generator. Q(i, j) = 0 unless j = i or i + 1; and Q(i, i) = −q(i); and Q(i, i + 1) = q(i).
(e) Formal construction. Define C, <, and M as in (55). Define the
202 EXAMPLES FOR THE STABLE CASE [6

Γ_i of (57-58) by the requirements:

ξ(0, n) = i + n for all n = 0, 1, ..., almost surely;
ξ(m, n) = n for all m = 1, 2, ... and integer n, almost surely.
You should check (48-51) and (63-66). Then use (108). *
Figure 3. (States −4 to 4 on the vertical axis; the first holding time is T(0, 0).)
(143) The generators in (141) and (142) are equal. But the sample functions
are very different.
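The summability condition driving (141)-(142) is easy to check by simulation. Here is a quick numerical sketch (Python; the rate q(n) = 1 + n² is an arbitrary choice with summable reciprocals, not one fixed by the text): the chain climbs 0, 1, 2, ... with exponential holding times of rates q(n), and the mean time to explosion matches Σ 1/q(n).

```python
import random

random.seed(1)

# Arbitrary rates with sum 1/q(n) < infinity, as (141)-(142) require.
def q(n):
    return 1.0 + n * n

def explosion_time(levels=1000):
    # total time to climb through states 0, 1, ..., levels - 1;
    # the neglected tail sum_{n >= 1000} 1/q(n) has mean about 1e-3
    return sum(random.expovariate(q(n)) for n in range(levels))

reps = 1000
mean_theta = sum(explosion_time() for _ in range(reps)) / reps
expected = sum(1.0 / q(n) for n in range(1000))   # E[theta], truncated
print(round(mean_theta, 2), round(expected, 2))   # the two should be close
```

The finite mean confirms that the holding times sum to a finite θ almost surely, which is what forces the exceptional times of (142).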
(144) Example. (a) Description. The process moves cyclically through the
rationals in [0, 1).
(b) State space. I is the set of rationals in [0, 1).
(c) Holding times. q is arbitrary, subject to
q(i) > 0 for all i, and Σ_{i∈I} 1/q(i) < ∞.
(d) Generator. Q(i, i) = −q(i) for all i, and Q(i, j) = 0 for i ≠ j.
(e) Formal construction. Define C, <, and M as in (56). Define the
Γ_i of (57-58) by the requirement that
ξ(c) ∈ I and ξ(c) = i + c modulo 1 for all c, almost surely.

You should check (48-51) and (63-66). Then use (108). *



6. MARKOV TIMES

Example (160) and the results of this section may help you understand
some of the technicalities in my formulation of strong Markov, Section 7.4.
The results of this section will not be used in other chapters of the book. Let
(𝒳, ℱ) be a measurable space. Let V be a compact metric set. Endow V
with the Borel σ-field. For each t ≥ 0, let X(t) be a V-valued and ℱ-measurable
function on 𝒳.
(145a) Let ℱ(t) be the σ-field in 𝒳 spanned by X(s) for 0 ≤ s ≤ t.

(145b) Let ℱ(t+) = ∩_{n=1}^{∞} ℱ(t + 1/n).


(146a) Say σ is a strict Markov time iff 0 ≤ σ ≤ ∞ and
{σ ≤ t} ∈ ℱ(t) for all t ≥ 0.
(146b) Say τ is a Markov time iff 0 ≤ τ ≤ ∞ and
{τ < t} ∈ ℱ(t) for all t ≥ 0.
Suppose ρ is measurable and 0 ≤ ρ ≤ ∞.
(146c) Let ℱ(ρ) be the collection of A ∈ ℱ such that
A ∩ {ρ ≤ t} ∈ ℱ(t) for all t ≥ 0.

(146d) Let ℱ(ρ+) be the collection of A ∈ ℱ such that

A ∩ {ρ < t} ∈ ℱ(t) for all t ≥ 0.
You can check (147).
(147a) A ∈ ℱ(ρ+) iff A ∩ {ρ ≤ t} ∈ ℱ(t+) for all t ≥ 0.
(147b) ℱ(ρ) ⊂ ℱ(ρ+).
(147c) ρ is Markov iff {ρ ≤ t} ∈ ℱ(t+) for all t ≥ 0.
(147d) ρ is Markov iff 𝒳 ∈ ℱ(ρ+); then ℱ(ρ+) is a σ-field.
(147e) ρ is strict Markov iff 𝒳 ∈ ℱ(ρ); then ℱ(ρ) is a σ-field.
(147f) Strict Markov times are Markov.
(148) Lemma. Let σ and τ be two strict Markov times.
(a) {σ < τ} ∈ ℱ(τ).
(b) {σ < τ} ∈ ℱ(σ).
(c) {σ ≤ τ} ∈ ℱ(σ) ∩ ℱ(τ).
(d) {σ ≤ τ} ∩ ℱ(σ) ⊂ {σ ≤ τ} ∩ ℱ(τ).
(e) {σ = τ} ∈ ℱ(σ) ∩ ℱ(τ).
(f) {σ = τ} ∩ ℱ(σ) = {σ = τ} ∩ ℱ(τ).

PROOF. Let r range over a countable dense subset of [0, t], which
contains t.
Claim (a). {σ < τ} ∩ {τ ≤ t} = ∪_r {σ < r < τ ≤ t}.
Claim (b). {σ < τ} ∩ {σ ≤ t} = ∪_r {σ ≤ r < τ}.
Claim (c). {σ ≤ τ} = 𝒳\{τ < σ}. Use (a) and (b).
Claim (d). Let A ⊂ {σ ≤ τ} and A ∈ ℱ(σ). Then
A ∩ {τ ≤ t} = ∪_r {A and σ ≤ r} ∩ {τ ≤ r}.

Claims (e) and (f) follow from (c) and (d), because
{σ = τ} = {σ ≤ τ} ∩ {τ ≤ σ}. *

(149) Lemma. Let τ be a Markov time. For each n, let σ_n be a strict Markov
time. For each x, suppose σ_n(x) decreases to τ(x) as n → ∞, with
σ_n(x) > τ(x) when τ(x) < ∞. Then ℱ(σ_n) is nonincreasing with n, and

ℱ(τ+) = ∩_{n=1}^{∞} ℱ(σ_n).

PROOF. To begin with, ℱ(τ+) ⊂ ℱ(σ_n): adapt the argument for (148).
Next, ℱ(σ_n) nonincreases by (148d). Finally, let A ∈ ℱ(σ_n) for all n. I have
to get A ∈ ℱ(τ+). Let B_n = {σ_n ≤ t}. Then B_n ↑ {τ < t}. So
A ∩ B_n ∈ ℱ(t) and A ∩ B_n ↑ A ∩ {τ < t}. *
(150) Lemma. Let τ be a Markov time. Then τ + 1/n is a strict Markov time,
and ℱ(τ + 1/n) is nonincreasing with n, and

ℱ(τ+) = ∩_{n=1}^{∞} ℱ(τ + 1/n).

PROOF. Use (149). *
(151) Definition. If Σ is a sub σ-field of ℱ, then Σ(x) is the Σ-atom containing
x: namely, the intersection of all Σ-sets containing x. Alternatively, y ∈ Σ(x)
iff x and y are both in or both out of any Σ-set. Say Σ is separable iff Σ is the
smallest σ-field which includes a countable collection of sets. Say Σ is saturated
iff any A ∈ ℱ which is a union, even uncountable, of Σ-atoms is in Σ.
I will quote the main result (152) on saturation from Blackwell (1954)
without proof.
(152) Suppose (𝒳, ℱ) is Borel. Any separable sub σ-field of ℱ is saturated.

Borel is defined in Section 10.9. Atoms are discussed in Section 10.5.


(153) Lemma. Let Σ_1 ⊃ Σ_2 ⊃ ··· be sub σ-fields of ℱ. Let

Σ_∞ = ∩_{n=1}^{∞} Σ_n and α(x) = ∪_{n=1}^{∞} Σ_n(x).

(a) Σ_1(x) ⊂ Σ_2(x) ⊂ ···.

(b) Σ_∞(x) = α(x).

(c) If each Σ_n is saturated, then Σ_∞ is saturated.
The definitions are in (151).
PROOF. Claim (a) is easy.
Claim (b). Let y ∈ α(x). Then there is an n = n(x, y) with y ∈ Σ_n(x).
No Σ_n-set can separate x and y: so no Σ_∞-set can do it. This proves:
α(x) ⊂ Σ_∞(x).
Let y ∉ α(x). Then y ∉ Σ_n(x) for all n. For all n, there is an A_n ∈ Σ_n with
x ∈ A_n and y ∉ A_n.
Let A = lim inf A_n. Then x ∈ A, and y ∉ A. But A ∈ Σ_∞, because the Σ_n
nonincrease. This proves y ∉ Σ_∞(x), so
Σ_∞(x) ⊂ α(x).

Claim (c). Suppose A ∈ ℱ is a union of Σ_∞-atoms. Now (b) stops A
from splitting Σ_n-atoms, so A ∈ Σ_n, forcing A ∈ Σ_∞. *
(154) Lemma. (a) Points x and y are in the same ℱ(t)-atom iff
X(s, x) = X(s, y) for 0 ≤ s ≤ t.
(b) Points x and y are in the same ℱ(t+)-atom iff there is a positive
ε = ε(x, y) such that
X(s, x) = X(s, y) for 0 ≤ s ≤ t + ε.
PROOF. Temporarily, set X(∞, z) = 0. Check (a). Then use (153b) to
get (b). *
(155) Lemma. Let σ be a strict Markov time.
(a) Then x and y are in the same ℱ(σ)-atom iff σ(x) = σ(y) and
X(t, x) = X(t, y) for 0 ≤ t ≤ σ(x).
(b) If σ(x) = u < ∞, and
X(t, x) = X(t, y) for 0 ≤ t ≤ u,
then σ(y) = u.
Let τ be a Markov time.

(c) Then x and y are in the same ℱ(τ+)-atom iff τ(x) = τ(y) and there
is an ε = ε(x, y) > 0 such that
X(t, x) = X(t, y) for 0 ≤ t ≤ τ(x) + ε.
(d) If τ(x) = u < ∞, and ε > 0, and
X(t, x) = X(t, y) for 0 ≤ t ≤ u + ε,
then τ(y) = u.
PROOF. Claim (a). Suppose x and y are in the same ℱ(σ)-atom. Then
σ(x) = σ(y) because σ is ℱ(σ)-measurable, and
X(t, x) = X(t, y) for 0 ≤ t ≤ σ(x)
because
{X(t) = v and σ ≥ t} ∈ ℱ(σ).

Conversely, suppose σ(x) = σ(y) = u say, and

X(t, x) = X(t, y) for 0 ≤ t ≤ u.
Let A ∈ ℱ(σ). I have to show that x and y are both in or both out of A. But
A ∩ {σ ≤ u} ∈ ℱ(u).

As (154a) shows, x and y are in the same ℱ(u)-atom: so both are in or both
are out of A ∩ {σ ≤ u}. Since both are in {σ ≤ u}, it follows that both are in
or both are out of A.
Claim (b). The set {σ = u} is in ℱ(u), and can't split ℱ(u)-atoms. But
x and y are in the same ℱ(u)-atom, by (154a).

Claim (c). Use (153), (150), and (a).
Claim (d) is like (b). *
For the rest of the section, suppose
(156) X(·, x) is right continuous for all x ∈ 𝒳.
(157) Proposition. Suppose (156), and suppose (𝒳, ℱ) is Borel.
(a) ℱ(t) is separable, and saturated.
(b) ℱ(t+) is saturated.
For (c), let σ be a strict Markov time. Define a process Y as follows:
Y(t) = X(t) for all t on {σ = ∞};
Y(t) = X(t) for t ≤ σ on {σ < ∞};
     = X(σ) for t ≥ σ on {σ < ∞}.
(c) Y generates ℱ(σ); in particular, ℱ(σ) is separable and saturated.
For (d), let τ be a Markov time.

(d) ℱ(τ+) is saturated.

Separable and saturated are defined in (151).
PROOF. Claim (a). Use (152) to saturate ℱ(t).
Claim (b). Use (a) and (153c).
Claim (c). Let 𝒴 be the σ-field generated by Y. Check that Y(t) is ℱ(σ)-
measurable, so 𝒴 ⊂ ℱ(σ). Check that Y is right continuous, so 𝒴 is separable.
Using (155a, b), check that 𝒴 has the same atoms as ℱ(σ). Now use
(152). *
Claim (d). Use (c) and (153c) and (150).
NOTE. Suppose (156), and suppose (𝒳, ℱ) is Borel. Let 0 ≤ σ ≤ ∞ be
measurable, and satisfy (155b). Then σ is strict Markov, by (157a). Let
0 ≤ τ ≤ ∞ be measurable, and satisfy (155d). Then τ is Markov by (147c)
and (157b).
If (𝒳, ℱ) isn't Borel, this characterization of stopping times fails, as does
(157a); analyticity isn't enough. I don't know about (157c).
EXAMPLE. Let A be a non-Borel subset of [0, 1]. Let B = [0, 1]\A. Let 𝒳
be the following subset of [0, 1] × {0, 1}:
(A × {0}) ∪ (B × {1}).
For t ≥ 0 and (u, v) ∈ 𝒳, let
X(t, (u, v)) = u when 0 ≤ t ≤ 1
             = u + v(t − 1) when t ≥ 1.
Let ℱ be the smallest σ-field in 𝒳 which makes each X(t) measurable. So X
is a process with real-valued, continuous sample functions. Let
σ(u, v) = v/2.
I claim:
(a) ℱ(1) is not saturated;
(b) σ has property (155b) and is ℱ-measurable, but is not strict Markov.
PROOF. Let ℬ be the full Borel σ-field in [0, 1] × {0, 1}. Let 𝒞 be the
σ-field in [0, 1] × {0, 1} of sets of the form C × {0, 1}, where C is a Borel
subset of [0, 1]. For 𝒮 = ℬ or 𝒞, let 𝒮* be the σ-field in 𝒳 of all sets 𝒳 ∩ S
with S ∈ 𝒮; the atoms of ℬ* are the singletons.
I say ℱ = ℬ*. Indeed, X(t) is ℬ*-measurable, so ℱ ⊂ ℬ*. Conversely,
{(u, v): (u, v) ∈ 𝒳 and u ≤ a} = {X(0) ≤ a} ∈ ℱ;
{(u, v): (u, v) ∈ 𝒳 and v ≤ b} = {X(2) − X(1) ≤ b} ∈ ℱ;
so ℬ* ⊂ ℱ. Similarly, ℱ(1) = 𝒞*.

I say A × {0} is in ℬ* but not in 𝒞*. First,

A × {0} = 𝒳 ∩ ([0, 1] × {0}).
Second, if C ⊂ [0, 1] and
𝒳 ∩ (C × {0, 1}) = A × {0},
then C = A; so C is not Borel.
Claim (a). The set A × {0} is in ℱ and is a union of ℱ(1)-atoms, but is
not in ℱ(1).
Claim (b). The time σ is ℱ-measurable. And σ is 0 or 1/2, according as
X(0) ∈ A or X(0) ∈ B. But
{σ = 0} = A × {0} ∉ ℱ(1). *
Proposition (157c) identifies a generating class for ℱ(σ): sets of the first
kind {X(t) = j and σ ≥ t}, and sets of the second kind {X(σ) = j}.
I once thought that sets of the first kind were enough, but this is seldom true.
EXAMPLE. Let (𝒳, ℱ) be the cartesian product of Borel (0, ∞) and {2, 3}.
For t ≥ 0 and (u, v) ∈ 𝒳, let
X(t, (u, v)) = 1 when 0 ≤ t < u
             = v when t ≥ u.
So X is a right-continuous process. Let
σ(u, v) = u and ξ(u, v) = v,
so σ is a strict Markov time. Let Σ be the σ-field generated by σ, and let 𝒞 be
the σ-field generated by the sets
{X(t) = j and σ ≥ t}.
Clearly, Σ ⊂ 𝒞 ⊂ ℱ(σ). Let 𝒫 be the probability on ℱ such that:
σ and ξ are independent;
σ is exponential with parameter 1;
ξ is 2 or 3 with probability 1/2 each.

Parenthetically, 𝒫 makes X a Markov chain. I claim:

(a) ξ is constant on 𝒞-atoms;
(b) each 𝒞-set differs by a 𝒫-null set from a Σ-set;

(c) ξ is 𝒫-independent of 𝒞;
(d) 𝒞 is inseparable.
PROOF. Claim (a). Suppose ξ(x) = j ≠ ξ(y). If σ(x) ≠ σ(y), then x and y
can even be separated by a Σ-set. So let σ(x) = σ(y) = t. Then x and y are
separated by the 𝒞-set {X(t) = j and σ ≥ t}.

Claim (b). The basic 𝒞-set {X(t) = j and σ ≥ t} differs by a 𝒫-null set
from the set {X(t) = j and σ > t}. This set is empty unless j = 1, in which
case this set reduces to {σ > t}. Either way, this set is in Σ.

Claim (c). Use (b).
Claim (d). Use (a, c) and (152). *
For the rest of this section, let I be a countably infinite set. Let V = I ∪ {φ}
be the one-point compactification of discrete I.
EXAMPLE. Let (𝒳, ℱ) be the Borel space of sequences of ±1. Let
s_n(x) = x(n) for n = 1, 2, ... and x ∈ 𝒳. For t ≥ 0 and x ∈ 𝒳, let
X(t, x) = 0 when t ≥ 1
        = s_n(x)·n when 1/(n + 1) ≤ t < 1/n and n = 1, 2, ...
        = φ when t = 0.
So X is a right-continuous process. As everybody knows, ℱ(0+) is
inseparable.
PROOF. Let 𝒫 be the probability on ℱ which makes the s_n independent
and ±1 with probability 1/2 each. Let Σ be the tail σ-field in 𝒳. Each Σ-atom
is a countable set: x and y are in the same Σ-atom iff
s_n(x) = s_n(y) for all n ≥ n(x, y),
by (153b). So 𝒫 assigns measure 0 to each atom of Σ. But 𝒫 is 0-1 on Σ, by
Kolmogorov. Now (10.17) forestalls the separability of Σ. You have to check
Σ = ℱ(0+). *
(158) Proposition. Suppose (156). Then
(a) {X(t) ∈ I} ∩ ℱ(t) = {X(t) ∈ I} ∩ ℱ(t+).
More generally, for strict Markov σ,
(b) {X(σ) ∈ I} ∩ ℱ(σ) = {X(σ) ∈ I} ∩ ℱ(σ+).
PROOF. Claim (a). Let A ⊂ {X(t) ∈ I} and A ∈ ℱ(t+). I have to get
A ∈ ℱ(t). Let

B_n = {X(t) ∈ I and X(s) = X(t) for t ≤ s ≤ t + 1/n}.

Using (156), you can get B_n ∈ ℱ(t + 1/n) and B_n ↑ {X(t) ∈ I}. Because
A ∈ ℱ(t + 1/n), you can use the monotone class argument to find
A_n ∈ ℱ(t) with A_n ⊂ {X(t) ∈ I}, such that
A ∩ B_n = A_n ∩ B_n for all n.
Check A = lim inf A_n ∈ ℱ(t).

Claim (b). Let A ⊂ {X(σ) ∈ I} and A ∈ ℱ(σ+). I need A ∈ ℱ(σ). But

{A and σ ≤ t} = {A and σ < t} ∪ {A and σ = t}.

The first set on the right is in ℱ(t) by definition. The second one is at first
sight only in ℱ(t+), but (a) gets it into ℱ(t). *
(159) Proposition. Suppose (156), and suppose X(t, x) ∈ I for all t ≥ 0
and all x ∈ 𝒳. Then every Markov time is strict.

PROOF. Use (147c) and (158a). *
NOTE. Suppose (156). Let τ be a Markov time. Suppose τ(x) = ∞ or
X[τ(x), x] ∈ I, for all x ∈ 𝒳. Then τ is strict, as in (158b). This sharpens (159).
NOTE. Suppose 𝒫 is a probability on (𝒳, ℱ), which makes X an I-valued
Markov chain: so 𝒫{X(t) ∈ I} = 1 for all t; and ℱ(t+) is larger than
ℱ(t) only on a 𝒫-null set, which depends on t. There is (181) a strict Markov
σ with ℱ(σ+) really larger than ℱ(σ); and (183) a Markov τ which is really
different from any strict Markov time.

7. CROSSING THE INFINITIES

(160) Example. (a) Description. The states are the pairs of integers.
Starting from (a, b), the process moves successively through (a, b),
(a, b + 1), .... This defines the process only on a finite interval [0, θ). Let S
be a stochastic matrix on the integers. At θ, choose a′ from S(a, ·),
independent of the past sample function, and restart the construction from
(a′, −∞). See Figure 4.
(b) State space. I = {(u, v): u and v are integers}.
(c) Holding times. q(u, v) = r(v), where 0 < r(v) < ∞, and
Σ_{v=−∞}^{∞} 1/r(v) < ∞.
(d) Generator.
Q[(u, v), (u, v)] = −r(v);
Q[(u, v), (u, v + 1)] = r(v);
Q[(u, v), (u′, v′)] = 0 unless u′ = u and v′ = v or v + 1.
(e) Formal construction. Define C, <, and M as in (55). Let
i = (a, b) ∈ I.
(161) Let Ω_i be the set of ω ∈ Ω, as defined in (57), such that: ω(0, n) =
(a, b + n) for n = 0, 1, ...; the first coordinate ξ(m, ω) of ω(m, n)
depends on m, but not on n; ω(m, n) = (ξ(m, ω), n) for positive m
and integer n.

Figure 4. (The first coordinate is ξ(0), then ξ(1), then ξ(2) on successive stretches.)

(162) Define the Γ_i of (58) by the requirements that Γ_i(Ω_i) = 1, while

{ξ(m): m = 0, 1, ...} is a discrete-time Markov chain with stationary
transitions S and starting state a, relative to Γ_i.
You should check (48-51) and (63-66). Then use (108). *
Here are some of the features of (160).
(163) Q is silent about S.
(164) Let z = (0, 0) ∈ I.
(165) If S(0, 0) = 1, then P(t, z, (1, 0)) = 0 for all t > 0.
If S(0, 1) > 0, then P(t, z, (1, 0)) > 0 for all t > 0.
Press (165) harder.
(166) The classification of states as transient, null recurrent, or positive
recurrent, and the partition into communicating classes, all depend
on S.
So (163) prevents Q from determining these properties.
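A sample path of (160) is simple to simulate, and makes the explosion times θ_1 < θ_2 < ⋯ concrete. In the sketch below (Python), r(v) = 1 + v² and a ±1 random-walk kernel for S are arbitrary choices consistent with (c) and with S stochastic; neither is fixed by the example.

```python
import random

random.seed(2)

# Holding time in (u, v) is exponential with rate r(v); sum 1/r(v) < infinity
# over all integers v, so each upward climb explodes in finite time.
def r(v):
    return 1.0 + v * v

def climb(start, top=1000):
    # time for the second coordinate to climb from `start` to explosion,
    # truncated at `top`; the neglected tail has mean about 1e-3
    return sum(random.expovariate(r(v)) for v in range(start, top))

def S_step(u):
    # arbitrary stochastic matrix S: move to u - 1 or u + 1, equally likely
    return u + random.choice([-1, 1])

xi, t = 0, climb(0)
history = [(xi, t)]              # (first coordinate, next explosion time)
for m in range(2, 6):
    xi = S_step(xi)              # at theta_{m-1}, pick the new first coordinate
    t += climb(-1000)            # restart the second coordinate from -infinity
    history.append((xi, t))
print([(u, round(s, 2)) for u, s in history])
```

The printed first coordinates form one realization of the S-chain ξ(m), while Q alone never sees S — which is the content of (163) and (166).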
I would now like to consider the strong Markov properties of (160). It
will help to clean the sample functions. Review (68-69). Let W_0 be the set of
w ∈ W such that
λ(c, w) < ∞ for all c ∈ C,
and
Σ_{c∈C} w(c) = ∞.

Review (73, 161, 164). Check η_{q(ω)}{W_0} = 1 for ω ∈ Ω_z. Let

𝒳_z = Ω_z × W_0, with the product σ-field.
Review (75, 162). Check
(167) π_z{𝒳_z} = 1, and 𝒳_z is Borel.
Define ξ(m) on 𝒳_z:

ξ(m)(ω, w) = ξ(m, ω).

Remember (71) that
τ(m, n)(ω, w) = w(m, n) and λ(m, n)(ω, w) = λ(m, n)(w).

Retract these functions to 𝒳_z. Use (76, 162):


(168) The variables τ(m, n) with (m, n) ∈ C and the stochastic process
{ξ(m): m = 0, 1, ...} are all π_z-independent; τ(m, n) is exponential
with parameter r(n); and ξ is Markov with stationary transitions S
and starting state 0.
Let M = 1, 2, ....
(169a) Let 𝒜(M) be the σ-field in 𝒳_z spanned by
{ξ(m): m < M} and {τ(m, n): (m, n) ∈ C and m < M}.
(169b) Let 𝒜(M, N) be the σ-field in 𝒳_z spanned by
{ξ(m): m ≤ M} and {τ(m, n): (m, n) ∈ C and (m, n) < (M, N)}.
(169c) Let 𝒜(M, −∞) = ∩_{N=1}^{∞} 𝒜(M, −N).
NOTE. ξ(M) is 𝒜(M, −∞)-measurable, but not 𝒜(M)-measurable.
Use (168):
(170) S[ξ(M − 1), k] is a version of π_z{ξ(M) = k | 𝒜(M)}.
On 𝒳_z, let
(171) θ_M = Σ {τ(m, n): (m, n) ∈ C and m < M}.
Check
(172)

Review (72). Retract X(t) to 𝒳_z. You should check

(173) Description of X(·, x) for x ∈ 𝒳_z:
(a) X(t, x) = φ iff t = θ_m(x) for some m = 1, 2, ...;
(b) X(·, x) moves through {(0, n): n = 0, 1, ...} in order on [0, θ_1(x)),
the holding time in (0, n) being τ(0, n)(x);
(c) X(·, x) moves through {(ξ(m, x), n): n = ..., −1, 0, 1, ...} in order
on (θ_m(x), θ_{m+1}(x)), the holding time in (ξ(m, x), n) being τ(m, n)(x),
for m = 1, 2, ....
Use (𝒳_z, π_z) for the probability triple of Section 6. The mass is 1 and the
σ-field is Borel by (167). Use the present X for the process of Section 6. Use
(173a):
(174) θ_M(x) is the time of the Mth visit to φ by X(·, x).
I claim
(175) θ_M is strict Markov: definitions (171, 146a).
Indeed, let M be a positive integer and let t be a positive real number. The
event θ_M ≤ t coincides with this event: there is a rational r < t such that
θ_{M−1} ≤ r, and for any finite subset J of I there is a rational s with

r < s < t and X(s) ∉ J.


I claim
(176) ℱ(θ_M) = 𝒜(M): definitions (146c, 169a).
Indeed, ℱ(θ_M) is separable by (157c) and 𝒜(M) is separable by inspection.
The atoms of ℱ(θ_M) and 𝒜(M) coincide by (155a) and (173). Now use (152).
Review (68-71) and use (173):
(177) λ(M, N)(x) is the least t > θ_M(x) such that the second coordinate of
X(t, x) is N.
So
(178) λ(M, N) is a strict Markov time, and λ(M, N) ↓ θ_M as N ↓ −∞.

As in (176),
(179) ℱ[λ(M, N)] = 𝒜(M, N): definitions (146c, 169b).
Use (149):
(180) ℱ(θ_M+) = 𝒜(M, −∞): definitions (146d, 169c).
(181) Proposition. θ_1 is strict Markov. If S(0, ·) is nontrivial, then the π_z-
measure algebra of ℱ(θ_1+) is strictly larger than the π_z-measure algebra of
ℱ(θ_1).
PROOF. Use (175) for the first claim. For the second, ξ(1) is ℱ(θ_1+)-
measurable by (180). Use (170, 176) to see that ξ(1) is π_z-independent of
ℱ(θ_1), and has π_z-distribution S(0, ·). *

Let Y be the post-θ_2 process:

Y(t, x) = X[θ_2(x) + t, x].

(182) Proposition. θ_2 is a strict Markov time. The post-θ_2 process Y is
identically φ at time 0. But Y is dependent on ℱ(θ_2), so on ℱ(θ_2+), provided
S(0, 1) > 0 and S(0, 2) > 0 and S(1, 3) ≠ S(2, 3).

PROOF. Use (175) for the first assertion, and (173a) for the second. For
the third, I say that ξ(2) is measurable on Y. Indeed, (173) makes ξ(2) the
first coordinate of Y(t) for all small positive t. Combine (170, 176):
π_z{ξ(2) = 3 | ℱ(θ_2)} = S(1, 3) on {ξ(1) = 1}
                      = S(2, 3) on {ξ(1) = 2}.
From (168),
π_z{ξ(1) = 1} = S(0, 1) and π_z{ξ(1) = 2} = S(0, 2). *
Let τ be the least θ_m, if any, with ξ(m) = 1, and τ = ∞ if none.
(183) Proposition. τ is Markov in the sense of (146b). Let

S(j, k) = 1/(K + 1) for j, k = 0, 1, ..., K.

Then π_z{τ < ∞} = 1. But

π_z{σ = τ} ≤ 1/(K + 1)
for any strict Markov time σ.

NOTE. π_z{θ_1 = τ} = 1/(K + 1).
PROOF. Use (173) for the first assertion, and (168) for the second. For the
third, let σ* be a strict Markov time. Let θ_∞ = ∞. Let σ be the least
θ_m ≥ σ* for m = 1, 2, ..., ∞. Then σ is strict Markov,
σ = θ_ν for random ν = 1, 2, ..., ∞,
and
{σ* = τ} ⊂ {σ = τ}.

Use (148e, 175):

{ν = m} = {σ = θ_m} ∈ ℱ(θ_m) for m = 1, 2, ....
Use (170, 176):
π_z{ν = m and ξ(m) = 1} = π_z{ν = m} · 1/(K + 1).

Abbreviate J = {1, 2, ...}. Then

{σ = τ} ⊂ ∪_{m∈J} {ν = m and ξ(m) = 1}.
So
π_z{σ = τ} ≤ Σ_{m∈J} π_z{ν = m and ξ(m) = 1}
          = Σ_{m∈J} π_z{ν = m} · 1/(K + 1)
          ≤ 1/(K + 1). *
7
THE STABLE CASE

1. INTRODUCTION

In this chapter, unless I say otherwise, let P be a standard stochastic semigroup
on the countable set I, with all states stable:
q(i) = −P′(0, i, i) < ∞ for all i.
The first problem is to create a P-chain X, all of whose sample functions are
regular: continuous from the right, with limits from the left at all times, when
discrete I has been compactified by adjoining the point at infinity φ. This is
done in Section 2. To see why compactification is a good idea, look at (6.142).
Let Q(i, j) = P′(0, i, j), and let
r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
        = 0 elsewhere.
Call r the jump matrix. Suppose the chain X starts from i with q(i) > 0. Let
τ be the time of first leaving i. Let

Y(t) = X(τ + t).

Call Y the post-exit process. Then:
τ and Y are independent;
τ is exponential with parameter q(i);
Y(0) = j with probability r(i, j);
Y is a Markov chain with stationary transitions P and regular sample
functions;
Y(0) = φ is a distinct possibility.
This theorem is proved in Section 3.

I want to thank Howard Taylor and Victor Yohai for checking the final draft of
this chapter.

Let τ be a general Markov time. Given X(τ) = j ∈ I, the pre-τ sigma field
and the post-τ process are conditionally independent. The post-τ process
is a P-chain starting from j, all of whose sample functions are regular. This
is proved in Section 4.
Let Q be a matrix on I, with
q(i) = −Q(i, i) ≥ 0 for all i;
Q(i, j) ≥ 0 for all i ≠ j;
Σ_j Q(i, j) = 0 for all i.
Then there is a minimal standard substochastic semigroup P with generator
Q. If P isn't stochastic, there is a continuum of different standard stochastic
semigroups with generator Q. You can manufacture P as follows. Let
r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
        = 0 elsewhere.
Start a chain jumping according to r, and waiting according to q. If the
chain only covers part of the line according to this program, too bad for it.
The transitions of this chain are P. You pick up the other solutions by
continuing the construction in different ways, as in (6.132). These results are
proved in Section 5.
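The recipe can be sketched numerically. Below (Python) is a hypothetical pure-birth example with q(i) = (i + 1)² and r(i, i + 1) = 1, chosen only because its expected explosion time Σ 1/q(i) = π²/6 is finite: the chain covers only part of the line, so the minimal semigroup loses mass.

```python
import random

random.seed(3)

# Jump according to r, wait according to q: a pure-birth chain on
# 0, 1, 2, ... with q(i) = (i + 1)**2 and r(i, i + 1) = 1 (arbitrary choices
# making the explosion time finite with probability one).
def q(i):
    return (i + 1.0) ** 2

def state_at(t, start=0, max_jumps=1000):
    """State of the minimal chain at time t, or None past the explosion."""
    i, clock = start, 0.0
    for _ in range(max_jumps):
        clock += random.expovariate(q(i))  # wait according to q
        if clock > t:
            return i
        i += 1                             # jump according to r
    return None                            # the chain has left the line

reps = 2000
alive = sum(state_at(2.0) is not None for _ in range(reps)) / reps
print(round(alive, 2))   # estimates the row sum of the minimal P(2, 0, .)
```

The printed fraction is visibly below 1: the missing mass is exactly what the other, stochastic solutions redistribute by continuing the construction past the explosion.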
Here are the main results of Section 6. First,
t → P(t, i, j)
is continuously differentiable. Second,
P′(t) = QP(t) for some t > 0
iff there are jumps to φ on almost no sample functions, iff
Σ_j Q(i, j) = 0 for all i.
Third,
P′(t) = P(t)Q for some t > 0
iff there are jumps from φ on almost no sample functions. In fact,
f(t, i, j) = P′(t, i, j) − Σ_k P(t, i, k)Q(k, j)
is the renewal density for jumps from φ to j, in a chain starting from i.

2. REGULAR SAMPLE FUNCTIONS

Throughout this chapter, except as noted in Section 5, let P be a standard
stochastic semigroup on the finite or countably infinite set I. As (5.10) states,
0 ≤ q(i) = −P′(0, i, i) exists; throughout this chapter, except in (3-5) and
(18-20), make the

(1) Assumption. q(i) < ∞ for all i.
As (5.14) states, P′(0) = Q exists and is finite, with Q(i, j) ≥ 0 for
i ≠ j and Σ_j Q(i, j) ≤ 0. Give I the discrete topology and let Ī = I ∪ {φ}
be the one-point compactification of I for infinite I. The state φ is called
infinite or fictitious or adjoined, by contrast with finite or real states i ∈ I. Let
Ī = I for finite I. Let f be a function from [0, ∞) to Ī.
(2) Definition. Say f is regular iff:
(a) lim f(s) = f(t) as s decreases to t, for all t ≥ 0;
(b) lim f(s) exists as s increases to t, for all t > 0.
The main point of this section is to construct a Markov chain with
stationary transitions P, starting from any i, such that all the sample functions
are regular. This result is essentially due to Doob (1942) and Levy (1951).
For another treatment, see (Chung, 1960, II.5 and II.6).
Let R be the set of binary rationals in [0, ∞), namely,
R = {r: r = a2^{−b} for some nonnegative integers a and b}.
Let Ω be the set of all functions from R to I. Endow I with the σ-field of all
its subsets, and Ω with the product σ-field. Of course, Ω is Borel. Let
{X(r): r ∈ R} be the coordinate process on Ω, namely,
X(r)(ω) = ω(r) for ω ∈ Ω and r ∈ R.
For each i ∈ I, let P_i be the probability on Ω for which {X(r): r ∈ R} is Markov
with stationary transitions P and X(0) = i: for 0 = r_0 < r_1 < ··· < r_n in R
and i_0 = i, i_1, ..., i_n in I,
(3) P_i{X(r_m) = i_m for m = 0, ..., n} = Π_{m=0}^{n−1} P(r_{m+1} − r_m, i_m, i_{m+1}).
By convention, an empty product is 1.
Let A(i, s) = {ω: ω(r) = i for 0 ≤ r ≤ s}.

(4) Lemma. A(i, s) is measurable and P_i{A(i, s)} = e^{−q(i)s}, even without (1).
PROOF. Suppose s ∈ R. Let n be so large that N = 2^n s is a positive
integer, and let
A(n, i, s) = {ω: ω ∈ Ω and ω(m/2^n) = i for m = 0, ..., N}.
Plainly,
(5) P_i{A(n, i, s)} = [P(2^{−n}, i, i)]^N.

As n increases, A(n, i, s) decreases to A(i, s), while the right side of (5)
converges to e^{−q(i)s}. You move s. *
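Lemma (4) can be checked against a case where the semigroup is explicit. For a two-state chain with generator Q = [[−a, a], [b, −b]] (the rates a = 1, b = 2 are arbitrary, not from the text), P(t, 0, 0) = (b + a e^{−(a+b)t})/(a + b) and q(0) = a; the sketch below (Python) watches the right side of (5) converge to e^{−q(0)s}.

```python
import math

# Closed-form diagonal entry of the two-state semigroup, and q(0) = a.
a, b = 1.0, 2.0

def p00(t):
    return (b + a * math.exp(-(a + b) * t)) / (a + b)

s = 1.0
for n in (5, 10, 20):
    N = 2 ** n                        # N = 2**n * s grid steps of size 2**-n
    approx = p00(2.0 ** (-n)) ** N    # right side of (5)
    print(n, round(approx, 6))        # tends to exp(-a*s)
print(round(math.exp(-a * s), 6))
```

Since P(h, 0, 0) = 1 − ah + o(h), the Nth power behaves like (1 − as/N)^N, which is the e^{−q(i)s} limit the proof invokes.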

ASSUMPTION (1) IS NOW IN FORCE.

For r ∈ R and e > 0, let G(r, e) be the set of ω ∈ Ω such that ω(s) = ω(r)
for all s ∈ R with |s − r| ≤ e. Let
Ω_g = ∩_{r∈R} ∪_{e>0} G(r, e).
(6) Lemma. Ω_g is measurable, and P_i{Ω_g} = 1.
PROOF. The set Ω_g is measurable, because e can be confined to a sequence
tending to 0. The main thing to prove is

lim_{e→0} P_i{G(r, e)} = 1
for each r ∈ R. To avoid trivial complications, suppose r > 0. Fix a positive
binary rational e less than r. Using (4) and a primitive Markov property,

P_i{G(r, e) and X(r) = j}
= P_i{X(r − e) = j and X(r − e + s) = j for s ∈ R with 0 ≤ s ≤ 2e}
= P_i{X(r − e) = j} · P_j{X(s) = j for s ∈ R with 0 ≤ s ≤ 2e}
= P(r − e, i, j) · e^{−2q(j)e}.
Sum out j, and use Fatou; or note that P(r − e, i, ·) → P(r, i, ·) in norm as
e → 0. *
The variable U is geometric with parameter p iff U takes the value u with
probability (1 − p)p^u for u = 0, 1, ....

(7) Lemma. Let U_n be geometric with parameter p_n < 1. Let p_n → 1. Let
0 ≤ q ≤ ∞. Let a_n > 0 and a_n → 0 so that
(1 − p_n)/a_n → q.
Then the distribution of a_n U_n converges to exponential with parameter q.

PROOF. Easy. *
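A quick numerical illustration of (7), with the arbitrary choices p_n = 1 − q·2^{−n} and a_n = 2^{−n}, so that (1 − p_n)/a_n = q exactly; this is the scaling that appears in the proof of (8). The empirical tail P{a_n U_n > 1} should be near e^{−q}.

```python
import math
import random

random.seed(0)

q, n = 1.5, 16
p = 1.0 - q * 2.0 ** (-n)     # p_n
a = 2.0 ** (-n)               # a_n

def scaled_geometric():
    # inversion: P{U >= u} = p**u, so U = floor(log V / log p), V uniform
    u = int(math.log(random.random()) / math.log(p))
    return a * u              # a_n * U_n

draws = [scaled_geometric() for _ in range(20000)]
tail = sum(d > 1.0 for d in draws) / len(draws)
print(round(tail, 3), round(math.exp(-q), 3))  # empirical vs e^{-q}
```

The two printed values agree to sampling error, which is the convergence in distribution that (7) asserts.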
For ω ∈ Ω_g and r ∈ R, there is a maximal interval of s ∈ R with r as
interior point and ω(s) = ω(r). Let u and v be the endpoints of this interval,
which depend on ω. If r > 0, then u < r < v and ω(s) = ω(r) for all s ∈ R
with u < s < v, and this is false for smaller u or larger v. Either u = 0 or u
is binary irrational; either v = ∞ or v is binary irrational. The changes for
r = 0 should be clear. If ω(r) = j, the interval (u, v) ∩ R will be called a
j-interval of ω. Let Ω_{j,s} be the set of ω ∈ Ω_g such that only finitely many j-
intervals of ω have nonempty intersection with [0, s]. Let

Ω_v = ∩_{j∈I} ∩_{s>0} Ω_{j,s}.

(8) Lemma. Ω_v is measurable and P_i{Ω_v} = 1.

PROOF. In the definition of Ω_v, the index s can be confined to R without
changing the set. Consequently, it is enough to prove that Ω_{j,s} is measurable
and P_i{Ω_{j,s}} = 1. To avoid needless complications, suppose s = 1 and
q(j) > 0. Let A(N) be the set of ω ∈ Ω_g such that N or more j-intervals of ω
have nonempty intersection with [0, 1]. Then A(N) decreases to Ω_g\Ω_{j,1} as
N increases. Because Ω_g is measurable and P_i{Ω_g} = 1 by (6), it is enough to
prove that A(N) is measurable and P_i{A(N)} → 0 as N → ∞. Let Y(n, m) =
X(m2^{−n}), so Y(n, 0), Y(n, 1), ... is a discrete time Markov chain with
transitions P(2^{−n}) starting from i, with respect to P_i. For the moment, fix n.
A j-sequence of ω is a maximal interval of times m for which Y(n, m)(ω) = j.
Let C_1, C_2, ... be the cardinalities of the first, second, ... j-sequence. Of
course, the C's are only partially defined. Let A(n, N) be the set of ω ∈ Ω_g
such that N or more j-sequences of ω have nonempty intersection with the
interval m = 0, ..., 2^n. Plainly, A(n, N) is measurable. On A(n, N), there
are N or more j-sequences, of which the first N − 1 are disjoint subintervals of
0, ..., 2^n. The νth subinterval covers (C_ν − 1)/2^n of the original time scale.
Consequently P_i{A(n, N)} is no more than the conditional P_i-probability
that Σ_{ν=1}^{N−1} 2^{−n}(C_ν − 1) ≤ 1, given there are N or more j-sequences. Given
there are N or more j-sequences, (1.24) shows C_1 − 1, ..., C_{N−1} − 1 are
conditionally P_i-independent and geometrically distributed, with common
parameter
P(2^{−n}, j, j) = 1 − 2^{−n}q(j) + o(2^{−n}) as n → ∞.
By (7), the conditional P_i-distribution of 2^{−n}(C_ν − 1) converges to the
exponential distribution with parameter q(j) as n → ∞. As n increases to ∞,
however, A(n, N) increases to A(N). Consequently, A(N) is measurable, and
P_i{A(N)} is at most the probability that the sum of N − 1 independent
exponential random variables with parameter q(j) does not exceed 1. This
is small for large N. *
For ω ∈ Ω_v and nonnegative real t, there are only two possibilities as r ∈ R
decreases to t: either
(9) ω(r) → i ∈ I;
or
(10) ω(r) → φ.
If t ∈ R, only (9) can hold, with i = ω(t). For ω ∈ Ω_v and positive real t,
as r ∈ R increases to t, the only two possibilities are still (9) and (10); if
t ∈ R, only (9) can hold, with i = ω(t).

(11) Definition. For ω ∈ Ω_v and t ≥ 0, let

X(t, ω) = lim X(r, ω) as r ∈ R decreases to t.

Plainly, X(t) is a measurable function from Ω_v to Ī, and for each ω ∈ Ω_v,
the function X(·, ω) is regular in the sense of (2).
the function X(', w) is regular in the sense of (2).
(12) Theorem. Define X(t) by (11). For each i ∈ I, the process
{X(t): 0 ≤ t < ∞} is a Markov chain on the probability triple (Ω_v, P_i), with
stationary transitions P, starting from i, such that all sample functions are
regular in the sense of (2).
NOTE. The σ-field on Ω_v is the product σ-field on Ω relativized to Ω_v.
And P_i is retracted to this σ-field.
PROOF. In (3), let r_1, ..., r_n in R decrease to t_1, ..., t_n respectively. By
regularity,
{X(r_m) = i_m for m = 0, ..., n} → {X(t_m) = i_m for m = 0, ..., n},
where r_0 = t_0 = 0. Use Fatou and (8). *
(13) Lemma. t → X(t) is continuous in P_i-probability.
PROOF. Let 0 < s < t. Then
P_i{X(t) = X(s)} = Σ_j P(s, i, j)P(t − s, j, j).
Let s, t tend to a common, finite limit. Use Fatou to check that the right side
of the display tends to 1. *
(14) Lemma. Fix positive t. For P_i-almost all ω, the point t is interior to a
j-interval for some j.
PROOF. Suppose P(t, i, j) > 0. By (13), given that X(t) = j, for P_i-almost
all ω, there is a sequence r_n ∈ R increasing to t with ω(r_n) = j. For ω ∈ Ω_v,
each r_n is interior to a j-interval of ω, and there are only finitely many j-
intervals of ω meeting [0, t]. Thus X(s, ω) = j for t − e ≤ s ≤ t, where
e = e(ω) > 0. By (11), there is a sequence r_n ∈ R decreasing to t, with
ω(r_n) = j. Only finitely many j-intervals of ω meet [0, t + 1]. Thus,
X(s, ω) = j for t ≤ s ≤ t + e, where e = e(ω) > 0. *
For (15) and (16), let (𝒳, ℱ, 𝒫) be any probability triple, and
{Y(t): 0 ≤ t < ∞} any Markov chain on (𝒳, ℱ, 𝒫) with stationary
transitions P.
(15) Corollary. Let 𝒳_v be the set of x ∈ 𝒳 such that Y(·, x) retracted to R is in
Ω_v. Then 𝒳_v ∈ ℱ and 𝒫{𝒳_v} = 1. For x ∈ 𝒳_v, let Y*(t, x) = lim Y(r, x) as
r ∈ R decreases to t. Then Y*(·, x) is regular, and 𝒫{Y(t) = Y*(t)} = 1
for each t.
PROOF. Without real loss, suppose 𝒫{Y(0) = i} = 1. Let
𝒳_0 = {x: Y(·, x) is I-valued on R}.
Plainly, 𝒳_0 ∈ ℱ and 𝒫{𝒳_0} = 1. Let M be the map which sends x ∈ 𝒳_0
to the function Y(·, x) retracted to R. Then M is ℱ-measurable from 𝒳_0
to Ω, the Borel set of functions from R to I. And 𝒫M^{−1} = P_i. But
𝒳_v = M^{−1}Ω_v, so 𝒳_v ∈ ℱ and
𝒫{𝒳_v} = P_i{Ω_v} = 1,
using (8).
I still have to argue that
𝒫{Y(t) = Y*(t)} = 1.
The 𝒫-distribution of {Y(s): s ∈ R or s = t} coincides with the P_i-distribution
of {X(s): s ∈ R or s = t}, by (12). And the set of functions φ from R ∪ {t}
to Ī with
φ(t) = lim φ(r) as r ∈ R decreases to t
is product measurable. Using (11),
1 = P_i{X(t) = lim X(r) as r ∈ R decreases to t}
  = 𝒫{Y(t) = lim Y(r) as r ∈ R decreases to t}
  = 𝒫{Y(t) = Y*(t)}. *
The Y* process has an additional smoothness property: each sample
function is I-valued and continuous at each r ∈ R. This property is an easy
one to secure.
(16) Lemma. Let Y be a Markov process on (𝒳, ℱ, 𝒫) with stationary
transitions P and regular sample functions. Fix a nonnegative real number t.
Let 𝒳_t be the set of x ∈ 𝒳 such that Y(·, x) is continuous and I-valued at t.
Then 𝒳_t ∈ ℱ and 𝒫{𝒳_t} = 1.
PROOF. As in (15), using (14). *
Return now to the X process of (11). Keep W E Ov' I say
*
(l7a) (t, w) ---->- X(t, w) is jointly measurable:
that is, with respect to the product of the Borel a-field on [0, 00) and the
relative a-field on Ov' Indeed, X(t, w) = j E I iff for all n there is a binary
rational r with
1
t < r < t + - and X(r, w) = j.
n
The sets of constancy
For i ∈ I and ω ∈ Ω_v, let
S_i(ω) = {t: 0 ≤ t < ∞ and X(t, ω) = i}.
This is a level set, or a set of constancy.
7.3] THE POST-EXIT PROCESS 223

For i ∈ I, the set S_i(ω) is either empty or a countable union of intervals
[a_n, b_n) with a_1 < b_1 < a_2 < b_2 < ⋯. If there are infinitely many such
intervals, a_n → ∞ as n → ∞. In more picturesque language, X visits i for
the nth time on [a_n, b_n), with holding time b_n − a_n. Moreover, S_φ(ω) is a
nowhere dense set closed from the right. In particular, S_φ(ω) is Borel. This
follows more prosaically from Fubini, which also implies
(17b) {ω: Lebesgue S_φ(ω) = 0} is measurable and has P_i-probability 1.

3. THE POST-EXIT PROCESS

The results of this section are essentially due to Lévy (1951). For another
treatment, see Chung (1960, II.15). Here are some preliminaries (18–20).
For now, drop the assumption (1) of stability. Let {Z(t): 0 ≤ t < ∞} be an
Ī-valued process on the probability triple (𝒳, ℱ, 𝒫).
(18) Definition. Drop (1). Say Z is Markov on (0, ∞) with stationary
transitions P iff:
𝒫{Z(t_n) = i_n for n = 0, …, N}
  = 𝒫{Z(t_0) = i_0} ∏_{n=0}^{N−1} P(t_{n+1} − t_n, i_n, i_{n+1})
for all N and i_n ∈ I and 0 < t_0 < t_1 < ⋯ < t_N. If also
𝒫{Z(t) ∈ I} = 1 for all t > 0,
then Z is finitary. By convention, an empty product is 1.


There is nothing here to prevent 𝒫{Z(0) = φ} > 0, and this is the point
of the generalization.
ILLUSTRATION. Let X be the process of example (6.142). Let θ be the least
t with X(t) = φ, so θ is finite and X(θ) = φ almost surely. Let Z(s) = X(θ + s).
Then Z is finitary and Markov on (0, ∞), with the same stationary transitions
as X. But Z(0) = φ almost surely.
(19) Criterion. Z is a finitary P-chain on (0, ∞) iff Z(c + ·) is an ordinary
P-chain on [0, ∞) for a sequence of c decreasing to 0. This holds without (1).
(20) Lemma. Let Z be Markov on (0, ∞) with stationary transitions P.
Suppose Z is jointly measurable and
Lebesgue {t: Z(t, ω) = φ} = 0
for almost all ω. Then Z is finitary, namely, Z(t) ∈ I almost surely for all t > 0.
This holds without (1).
PROOF. Let f(t) be the probability that Z(t) ∈ I. I say f is nondecreasing
on (0, ∞), because P is stochastic:
f(t + s) ≥ Σ_{i,j∈I} 𝒫{Z(t) = i and Z(t + s) = j}
        = Σ_{i,j∈I} 𝒫{Z(t) = i} · P(s, i, j)
        = Σ_{i∈I} 𝒫{Z(t) = i}
        = f(t).
By Fubini, f(t) = 1 except for a Lebesgue-null set of t. Now the monotonicity
forces f ≡ 1. *
ASSUMPTION (1) IS BACK IN FORCE.
By convention, the inf of an empty set is ∞. Let
τ(ω) = inf {t: X(t, ω) ≠ X(0, ω)},
the first holding time. Check that τ is measurable, and positive everywhere. Let
Y(t, ω) = X[τ(ω) + t, ω] when τ(ω) < ∞.
Call Y the post-exit process. Clearly, Y(·, ω) is regular. Moreover, Y is
jointly measurable: that is, with respect to the product of the Borel σ-field on
[0, ∞) and the relative σ-field on Ω_i. Indeed, Y is the composition of X with
(t, ω) → (τ(ω) + t, ω). The last function is measurable, because its first
component is the sum of two visibly measurable functions:
(t, ω) → t and (t, ω) → τ(ω).
And X is jointly measurable by (17a).
Fix i ∈ I with q(i) > 0. Let Ω_i be the set of ω ∈ Ω_v with X(0, ω) = i and
τ(ω) < ∞. Then P_i{Ω_i} = 1 by (4) and (8). Confine ω to Ω_i. The function
X(τ), whose value at ω is X[τ(ω), ω], is measurable on Ω_i, and is either some
element j of I other than i, or is φ. In the first case, say the process jumps to j
on leaving i; in the second, say the process jumps to φ on leaving i.
To study this more closely, let
r(i, j) = Q(i, j)/q(i) for i ≠ j
       = 0 for i = j.
Call r the jump matrix of Q.
(21) Theorem. With respect to P_i, the post-exit process Y and the first
holding time τ are independent; Y is Markov on (0, ∞) with transitions P; and
τ is exponential with parameter q(i). Moreover,
P_i{Y(0) = j} = r(i, j) for j ∈ I.
Finally, Y is finitary:
P_i{Y(t) ∈ I} = 1 for all t > 0.
PROOF. Keep ω in Ω_i. Let 0 ≤ t < ∞. Let
0 ≤ s_0 < s_1 < ⋯ < s_M,
and let i_0, i_1, …, i_M ∈ I. The main thing to prove is
(22) P_i{A} = P_i{τ ≥ t} · P_i{Y(s_0) = i_0} · π,
where A = {τ ≥ t} ∩ B and
B = {Y(s_m) = i_m for m = 0, …, M}
and
π = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).

To begin with, suppose t > 0 and t, s_0, …, s_M ∈ R. Let τ_n be the least
m/2^n with X(m/2^n) ≠ i. So τ_n ≥ 1/2^n. Let
A_n = {τ_n ≥ t and X(τ_n + s_m) = i_m for m = 0, …, M}.

Figure 1.

Because the sample functions are regular, there is an interval to the right of τ
free of S_i = {t: X(t) = i}, as in Figure 1. Check that τ_n is in this interval
for large n, and τ_n ↓ τ. So {τ_n ≥ t} ↓ {τ ≥ t}. Using the regularity again,
lim sup A_n ⊂ A ⊂ lim inf A_n.
By Fatou,
(23) P_i{A_n} → P_i{A} as n → ∞,
and the problem is to compute P_i{A_n}.
Consider only n so large that 2^n t, 2^n s_0, …, 2^n s_M are integers. Then
(24) P_i{A_n} = Σ_{N ≥ 2^n t} P_i{A_{n,N}},
where A_{n,N} is the event that X(m/2^n) = i for m = 0, …, N − 1 and
X(N/2^n) ≠ i and X(N2^{−n} + s_m) = i_m for m = 0, …, M. The problem is
to compute P_i{A_{n,N}}. Let
(25a) a(n) = P(2^{−n}, i, i) = 1 − q(i)2^{−n} + o(2^{−n})
(25b) b(n) = a(n)^{2^n t − 1} = P_i{τ_n ≥ t} → P_i{τ ≥ t}
(25c) c(n) = [1 − a(n)]^{−1} P_i{X(2^{−n}) ≠ i and X(2^{−n} + s_0) = i_0}.
Because {X(m/2^n): m = 0, 1, …} is Markov with transitions P(2^{−n}),
P_i{A_{n,N}} = a(n)^{N−1}[1 − a(n)] c(n) π.


Sum out N = 2^n t, 2^n t + 1, … and use (24):
(26) P_i{A_n} = b(n) c(n) π.
Put t = 2^{−n} and M = 0 in (26):
(27) c(n) → P_i{Y(s_0) = i_0} as n → ∞,
by regularity. Let n → ∞ in (26) and use (23, 25b, 27) to get (22) for t > 0
and t, s_0, …, s_M ∈ R. Get the full (22) by a passage to the limit and
regularity.
Put t = 0 in (22) to see that Y is Markov on (0, ∞) with stationary
transitions P. So you can rewrite (22):
(28) P_i{C and B} = P_i{C} · P_i{B}
for C = {τ ≥ t}. As M and i_0, …, i_M and s_0, …, s_M vary, the B's are
closed under intersection, modulo the null set, and generate the same σ-field
as Y. The sets C do similar things for τ. Both sides of (28) are countably
additive in both arguments. So Y is independent of τ by (10.16). And τ is
exponential with parameter q(i) by (4).
From the definition, P_i{Y(0) = i} = 0. Let j ≠ i. To compute P_i{Y(0) = j},
go back to (25–26). Put s_0 = 0 and i_0 = j. Then
c(n) = [1 − a(n)]^{−1} P_i{X(2^{−n}) = j}
     = [1 − a(n)]^{−1} P(2^{−n}, i, j)
     = [q(i)2^{−n} + o(2^{−n})]^{−1} [Q(i, j)2^{−n} + o(2^{−n})]
     = [q(i) + o(1)]^{−1} [Q(i, j) + o(1)]
     → r(i, j).
Use (27).
I still have to prove P_i{Y(t) ∈ I} = 1 for all t > 0. But
{t: Y(t, ω) = φ}
is the translate by τ(ω) of a subset of
{t: X(t, ω) = φ},
and has Lebesgue measure 0 for almost all ω by (17b). Use (20). *
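The dyadic quantities a(n) and b(n) from (25) can be checked numerically for a two-state chain, whose semigroup has a standard closed form. The sketch below is editorial, not from the book; the rates qa, qb and the time t are illustrative assumptions.

```python
# A hedged numerical check of (25a)-(25b) for a two-state chain with
# generator Q = [[-qa, qa], [qb, -qb]].  The closed form for P(t, 0, 0)
# is the standard two-state semigroup; qa, qb, and t are illustrative.
import math

qa, qb = 2.0, 3.0
s = qa + qb

def P00(h):
    # P(h, 0, 0) for the two-state chain
    return (qb + qa * math.exp(-s * h)) / s

t = 0.7
for n in (5, 10, 15, 20):
    h = 2.0 ** (-n)
    a = P00(h)               # a(n) = P(2^-n, i, i) = 1 - q(i) 2^-n + o(2^-n)
    b = a ** (t / h - 1.0)   # b(n) = a(n)^(2^n t - 1) = P_i{tau_n >= t}
# as n grows, (1 - a)/h tends to q(0) = qa, per (25a), and b tends to
# exp(-qa * t) = P_i{tau >= t}: the holding time is exponential, per (25b)
```

The convergence is already visible at n = 10; the error in both quantities is of order 2^(-n).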
(29) Remark. For each positive t, the function Y(·, ω) is I-valued and
continuous at t for P_i-almost all ω: use (21, 19, 16). So Y(·, ω) is I-valued
and continuous at all positive r ∈ R, for P_i-almost all ω.
(30) Remark. The post-exit process is studied again in Section 6, where it is
shown that P′ exists and is continuous, and
P_i{Y(t) = j} = P(t, i, j) + q(i)^{−1} P′(t, i, j).
This could also be deduced from the proof of (21), as follows. Abbreviate
t = s_0 ≥ 0 and j = i_0 and f(t) = P(t, i, j). Let e_n = 2^{−n}. Recall that
a(n) = P(e_n, i, i) from (25). Now
P_i{X(e_n) ≠ i and X(e_n + t) = j} = f(e_n + t) − P_i{X(e_n) = i and X(e_n + t) = j}
  = [1 − a(n)] f(t) + f(t + e_n) − f(t).
Recall c(n) from (25). Check
c(n) = f(t) + [e_n/(1 − a(n))] · [f(t + e_n) − f(t)]/e_n.
But c(n) → P_i{Y(t) = j} by (27). Consequently,
f*(t) = lim_{n→∞} [f(t + e_n) − f(t)]/e_n
exists, and
(31) P_i{Y(t) = j} = f(t) + q(i)^{−1} f*(t).

By regularity, Y is continuous at 0. And Y is continuous with P_i-probability
1 at each t > 0 by (29). Consequently, t → P_i{Y(t) = j} is continuous, and
therefore f* is continuous. But e_n can be replaced by any sequence tending
to 0, without affecting the argument much: τ_n is the least m e_n with X(m e_n) ≠ i,
and τ_n → τ from the right but not monotonically. The limit of the difference
quotient does not depend on the sequence, in view of (31). Thus, the right
derivative of f exists, and is continuous, being f*. Use (10.67) to see that f*
is the calculus derivative of f. *
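For a two-state chain the identity of remark (30) can be verified numerically: starting from state 0 the process must jump to state 1, so the law of Y(t) is P(t, 1, ·). The sketch below is editorial; the closed-form semigroup and the parameters are illustrative assumptions, not from the book.

```python
# A numerical sketch of (30)-(31) for a two-state chain: the post-exit
# law from state 0 is P(t, 1, .), and the remark says it also equals
# P(t, 0, j) + q(0)^{-1} P'(t, 0, j).  The closed form below is the
# standard two-state semigroup for Q = [[-qa, qa], [qb, -qb]].
import math

qa, qb = 2.0, 3.0
s = qa + qb

def P(t, i, j):
    e = math.exp(-s * t)
    if (i, j) == (0, 0): return (qb + qa * e) / s
    if (i, j) == (0, 1): return qa * (1.0 - e) / s
    if (i, j) == (1, 0): return qb * (1.0 - e) / s
    return (qa + qb * e) / s

def dP(t, i, j, h=1e-6):
    # central difference quotient for P'(t, i, j)
    return (P(t + h, i, j) - P(t - h, i, j)) / (2.0 * h)

t = 0.4
lhs = [P(t, 1, j) for j in (0, 1)]                      # law of Y(t) from 0
rhs = [P(t, 0, j) + dP(t, 0, j) / qa for j in (0, 1)]   # right side of (31)
```

The two lists agree to the accuracy of the difference quotient, and the left side sums to 1, as a probability distribution on the two states must.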
The global structure of {X(t): 0 ≤ t < ∞} is not well understood. But the
local behavior is no harder than in the uniform case. To explain it, introduce
the following notation. At time 0, the process is in some state ξ_0; it remains
there for some time τ_0, then jumps to φ or to a new state ξ_1. If the latter, it
remains in ξ_1 for some time τ_1, then jumps, and so on. See Figure 2.
More formally, let ξ_0 = X(0) and let
τ_0 = inf {t: X(t) ≠ ξ_0}.
The inf of an empty set is ∞. Suppose ξ_0, …, ξ_n and τ_0, …, τ_n are defined.
If τ_n = ∞, or τ_n < ∞ but X(τ_0 + ⋯ + τ_n) = φ, then ξ_{n+1}, ξ_{n+2}, … as
well as τ_{n+1}, τ_{n+2}, … are undefined. If τ_n < ∞ and X(τ_0 + ⋯ + τ_n) ∈ I,
then ξ_{n+1} = X(τ_0 + ⋯ + τ_n) and
τ_0 + ⋯ + τ_{n+1} = inf {t: τ_0 + ⋯ + τ_n ≤ t and X(t) ≠ ξ_{n+1}}.

Define the substochastic jump matrix r on I by
(32) r(i, j) = Q(i, j)/q(i) for q(i) > 0 and i ≠ j
            = 0 elsewhere.
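As a concrete illustration, the computation in (32) can be sketched in code. This is an editorial sketch, not from the book: the dict-of-dicts layout, the matrix entries, and the name jump_matrix are illustrative assumptions.

```python
# A minimal sketch of the jump matrix in (32):
# r(i, j) = Q(i, j)/q(i) for q(i) > 0 and i != j, and 0 elsewhere.
# Q is stored as a dict of dicts for a small state space; the entries
# are illustrative.

def jump_matrix(Q):
    """Return the substochastic jump matrix r of the generator-style Q."""
    r = {}
    for i, row in Q.items():
        qi = -row[i]                  # q(i) = -Q(i, i)
        r[i] = {}
        for j in row:
            r[i][j] = row[j] / qi if (j != i and qi > 0) else 0.0
    return r

Q = {
    0: {0: -2.0, 1: 1.5, 2: 0.5},
    1: {0: 1.0, 1: -3.0, 2: 2.0},
    2: {0: 0.5, 1: 0.5, 2: -1.0},
}
r = jump_matrix(Q)
# each row of r sums to at most 1; here Sum_j Q(i, j) = 0, so exactly 1
```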
Figure 2.

(33) Theorem. With respect to P_i, the process ξ_0, ξ_1, … is a partially defined
discrete time Markov chain, with stationary transitions r, and starting state i.
Given ξ_0, ξ_1, …, the holding times τ_0, τ_1, … are conditionally P_i-independent
and exponential with parameters q(ξ_0), q(ξ_1), …. In other terms, let
i_0 = i, i_1, …, i_n ∈ I. Given that ξ_0, …, ξ_n are defined, and ξ_0 = i_0, …, ξ_n = i_n,
the random variables τ_0, …, τ_n are conditionally P_i-independent and exponentially
distributed, with parameters q(i_0), …, q(i_n).
PROOF. Let t_0, …, t_n be positive real numbers. The thing to prove is
P_i{A} = e^{−s} π,
where
A = {ξ_0 = i_0, ξ_1 = i_1, …, ξ_n = i_n and τ_0 ≥ t_0, τ_1 ≥ t_1, …, τ_n ≥ t_n}
and
s = q(i_0)t_0 + ⋯ + q(i_n)t_n
and
π = r(i_0, i_1) r(i_1, i_2) ⋯ r(i_{n−1}, i_n).
Let
B = {ξ_0 = i_1, …, ξ_{n−1} = i_n and τ_0 ≥ t_1, …, τ_{n−1} ≥ t_n}.
Put i = i_0 and j = i_1 and t = t_0. Let Y be the post-exit process. Let Y_R(·, ω)
be Y(·, ω) retracted to R. I claim
P_i{Y(0) = j but Y_R ∉ Ω_v} = 0.
Indeed, Y(·, ω) is the translate of part of X(·, ω), and has only finitely many
k-intervals in any finite time interval, and is continuous at 0. If Y(·, ω) is
continuous and I-valued at positive r ∈ R, and Y(·, ω) is I-valued at 0, then
Y_R(·, ω) ∈ Ω_v. So (29) gets the claim. Exclude this null set. Then
A = {τ_0 ≥ t and X(0) = i and Y(0) = j} ∩ {Y_R ∈ B}.
7.4] THE STRONG MARKOV PROPERTY 229

Use (21) to see
P_i{A} = e^{−q(i)t} r(i, j) P_j{B}.
Induct. *
NOTE. This result and (5.38) prove (5.45). For sup_i q(i) < ∞ implies
Σ_n 1/q(ξ_n) = ∞.
There is a useful way to restate (33). Define
d = inf {n: ξ_n ∉ I} on Ω_v.
As in Section 5.6, let ∂ ∉ I. Let 𝒳 be the set of pairs (w, w′), where w is a
sequence of elements of I ∪ {∂}, and w′ is a sequence of elements of (0, ∞].
Let
ξ_n(w, w′) = w(n) and τ_n(w, w′) = w′(n).
Let
d = inf {n: ξ_n = ∂} on 𝒳.
Define the probability π_i on 𝒳 by (5.35). Relative to π_i,
{ξ_n} is a discrete time Markov chain with stationary transitions r, extended
to be stochastic on I ∪ {∂}, and starting state i; given {ξ_n}, the holding
times τ_n are conditionally independent and exponential, the parameter
for τ_n being q(ξ_n).
Then
the P_i-distribution of {(ξ_n, τ_n): 0 ≤ n < d} coincides with the π_i-distribution
of {(ξ_n, τ_n): 0 ≤ n < d}.
Suppose q(i) > 0 and Σ_j Q(i, j) = 0 for all i. Then P_i{d = ∞} = 1, so
(33*) the P_i-distribution of {(ξ_n, τ_n): 0 ≤ n < ∞} is π_i.
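The local description in (33) can be simulated directly: run the jump chain with matrix r and attach independent exponential holding times. The following sketch is editorial; the two-state chain, the rates, and the function names are illustrative assumptions, not notation from the book.

```python
# A simulation sketch of (33): visits xi_n form a chain with kernel r,
# and, given the visits, the holding times tau_n are independent
# exponentials with parameters q(xi_n).
import random

random.seed(1)

q = {0: 2.0, 1: 3.0}            # q(i) = -Q(i, i)
r = {0: {1: 1.0}, 1: {0: 1.0}}  # jump matrix: two states, so r(i, j) = 1 for j != i

def simulate(i, n_steps):
    """Return visits xi_0, xi_1, ... and holding times tau_0, tau_1, ..."""
    visits, holds = [], []
    state = i
    for _ in range(n_steps):
        visits.append(state)
        holds.append(random.expovariate(q[state]))
        # choose the next state from the row r(state, .)
        u, acc = random.random(), 0.0
        for j, p in r[state].items():
            acc += p
            if u < acc:
                state = j
                break
    return visits, holds

visits, holds = simulate(0, 20000)
mean0 = sum(h for v, h in zip(visits, holds) if v == 0) / visits.count(0)
# mean0 should be near 1/q(0) = 0.5
```

The empirical mean holding time in state 0 approximates 1/q(0), as (33) predicts.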

4. THE STRONG MARKOV PROPERTY

For a moment, let τ be the first holding time in X. According to (21), the
post-τ process Y is a finitary P-chain on (0, ∞), and is independent of τ.
The strong Markov property (41) makes a similar but weaker assertion for a
much more general class of times τ. In suggestive but misleading language:
the post-τ process is conditionally a finitary P-chain on (0, ∞), given the
process to time τ. By example (6.182), the post-τ process need not be independent
of the process to time τ. Here is the program for proving strong Markov.
First, I will prove a weak but computational form of the assertion for
constant τ, in (34). Using (34), I will get this computational assertion for
random τ in (35). I can then get strong Markov (38) on the set {X(τ) ∈ I}.
General strong Markov finally appears in (41).
You may wish to look at Sections 6.6 and 6.7 when you think about the
present material. David Gilat forced me to rewrite this section one more time
than I had meant to.
Let ℱ(t) be the σ-field generated by X(s) for 0 ≤ s ≤ t.
(34) Lemma. Let 0 ≤ s_0 < s_1 < ⋯ < s_M and let i_0, i_1, …, i_M ∈ I. Let
0 ≤ t < ∞ and let D ∈ ℱ(t). Then
P_i{D and X(t + s_m) = i_m for m = 0, …, M}
  = P_i{D and X(t + s_0) = i_0} · π,
where
π = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).
PROOF. This is clear from (12) for special D, of the form
{X(t_n) = j_n for n = 0, …, N},
where 0 ≤ t_0 < t_1 < ⋯ < t_N ≤ t and j_0, j_1, …, j_N ∈ I. Now use (10.16). *
Call a nonnegative random variable τ on Ω_v a Markov time, or Markov,
iff {τ < t} is in ℱ(t) for all t. Let ℱ(τ+) be the σ-field of all measurable
sets A such that A ∩ {τ < t} is in ℱ(t) for all t. Call this the pre-τ sigma
field. Let Δ = {τ < ∞}. Now Δ ⊂ Ω_v, for ω is confined to Ω_v. Let
Y(t) = X(τ + t) on Δ. More explicitly,
Y(t, ω) = X[τ(ω) + t, ω] for τ(ω) < ∞.
Call Y the post-τ process. Clearly, Y is a jointly measurable process with
regular sample functions.
WARNING. τ and ℱ(τ+) both peek at the future; this point is discussed
in Sections 6.6 and 6.7.
Here is a preliminary version of the strong Markov property.
(35) Proposition. Suppose τ is a Markov time. Let Y be the post-τ process.
Let
0 ≤ s_0 < s_1 < ⋯ < s_M
and let i_0, i_1, …, i_M ∈ I. Let
B = {Y(s_m) = i_m for m = 0, …, M}
and
π = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).
Let A ∈ ℱ(τ+) with A ⊂ Δ. Then
P_i{A ∩ B} = P_i{A and Y(s_0) = i_0} · π.
PROOF. Let τ_n be the least m/2^n greater than τ. This approximation differs
from the one in (21). Let B_n be the event that X(τ_n + s_m) = i_m for
m = 0, …, M. By regularity,
lim sup (A ∩ B_n) ⊂ A ∩ B ⊂ lim inf (A ∩ B_n),
so P_i{A ∩ B_n} → P_i{A ∩ B} by Fatou. Let
C_{n,m} = {(m − 1)/2^n ≤ τ < m/2^n}.
Then
P_i{A ∩ B_n} = Σ_{m=1}^{∞} P_i{A ∩ B_n ∩ C_{n,m}}.
But by (34),
P_i{A ∩ B_n ∩ C_{n,m}} = P_i{A ∩ C_{n,m} ∩ [X(2^{−n}m + s_0) = i_0]} · π,
because
A ∩ C_{n,m} ∈ ℱ(m/2^n), and τ_n = m/2^n on C_{n,m}.
Consequently,
P_i{A ∩ B_n} = P_i{A and X(τ_n + s_0) = i_0} · π.
Let n → ∞ and use regularity again. *
Let Δ_v be the set of ω ∈ Δ such that Y(·, ω) is continuous when retracted to
R, and I-valued when retracted to positive R.
(36) Corollary. (a) Given Δ and X(τ) = j ∈ I, the σ-field ℱ(τ+) and the
post-τ process Y are conditionally P_i-independent, Y being Markov with
stationary transitions P and starting state j.
(b) With respect to P_i{· | Δ}, the post-τ process Y is a Markov chain on
(0, ∞), with stationary transitions P.
(c) P_i{Y(t) ∈ I | Δ} = 1 for t > 0, so Y is finitary.
(d) P_i{Δ_v | Δ} = 1.
PROOF. Use (35) to get (a). Put A = Δ in (35) to get (b). Then use (20)
to get (c): the set {t: Y(t, ω) = φ} is the translate by τ(ω) of a subset of
{t: X(t, ω) = φ}, and the latter set is typically Lebesgue null by (17b). Let
r ∈ R be positive. Then Y(r + ·) is an ordinary Markov chain on [0, ∞)
relative to P_i{· | Δ} by (b–c). So Y(r + ·) is almost surely continuous and
I-valued on R, by (16). Intersect on r to get (d): the continuity at 0 is forced
by regularity. *
I would now like to explain why (35–36) are inadequate. Fix i ≠ j. Let θ
be the length of the first j-interval in X. According to (46) below, the
P_i-distribution of θ is exponential with parameter q(j).
PSEUDO PROOF. Let τ be the least t if any with X(t) = j, and τ = ∞ if none.
Then τ is a Markov time. Suppose P_i{τ < ∞} = 1 for a moment. Let Y be
the post-τ process. Let σ be the first holding time in X.
(a) θ = σ ∘ Y.
(b) The P_i-distribution of Y is P_j.
(c) The P_i-distribution of θ coincides with the P_j-distribution of σ.
(d) The last distribution is exponential with parameter q(j), by (21). *
This proof is perfectly sound in principle, but it breaks down in detail. The
time domain of a Y sample function is [0, ∞). But σ is defined only on a
space of functions with time domain R. And P_j also acts only in this space. So
(a) and (b) stand discredited, for the most sophistical of reasons. For a
quick rescue, let S be the retraction of Y to time domain R.
REAL PROOF. (a) θ = σ ∘ S on {S ∈ Ω_v}, and P_i{S ∈ Ω_v} = 1.
(b) The P_i-distribution of S is P_j.
(c, d) stay the same. *
I would now like to set this argument up in a fair degree of generality. Let
Ω be the set of all functions ω from R to Ī. Let X(r, ω) = ω(r) for ω ∈ Ω
and r ∈ R. Give Ω the product σ-field, namely the smallest σ-field over which
all X(r) are measurable. Here is the shift mapping S from Δ to Ω:
(37) X(r, Sω) = Y(r, ω) = X[τ(ω) + r, ω].
You should check that S is measurable. Here is the strong Markov property
on {X(τ) ∈ I}; the set {X(τ) = φ} is tougher, and I postpone dealing with it
until (41).
(38) Theorem. Suppose τ is Markov. Let Δ = {τ < ∞}, and let Y be the
post-τ process. Define the shift S by (37).
(a) P_i{Δ and X(τ) ∈ I but S ∉ Ω_v} = 0.
(b) If ω ∈ {Δ and X(τ) ∈ I and S ∈ Ω_v}, then Y = X ∘ S; that is,
Y(t, ω) = X(t, Sω) for all t ≥ 0.
(c) Suppose A ∈ ℱ(τ+) and A ⊂ {Δ and X(τ) = j ∈ I}. Suppose B is a
measurable subset of Ω. Then
P_i{A and S ∈ B} = P_i(A) · P_j(B).
(d) Given Δ and X(τ) = j ∈ I, the pre-τ sigma field ℱ(τ+) is
conditionally P_i-independent of the shift S, and the conditional P_i-distribution
of S is P_j.
NOTE. Claims (b–d) make sense, if you visualize the space on which P_j
acts as this subset of Ω:
{ω: ω ∈ Ω and ω(r) ∈ I for all r ∈ R}.
PROOF. Claim (a). Suppose ω ∈ Δ_v. Then Y(·, ω) is the translate of part
of X(·, ω), is continuous at 0, and has only finitely many k-intervals on
finite intervals. Now use (36d).
Claim (b). Use the definitions.
Claim (c). Use (35) to handle the special B, of the form
{X(s_m) = i_m for m = 0, …, M},
with 0 ≤ s_0 < ⋯ < s_M and i_0, …, i_M in I. Then use (10.16).
Claim (d). Use claim (c). *
*
The general statement (41) of strong Markov is quite heavy. Here is some
explanation of why a light statement doesn't work. Suppose for a bit that
P_i{τ < ∞ and X(τ) = φ} = 1.
Then ℱ(τ+) and Y are usually dependent: see (6.182) for an example. At
the beginning, I said that Y is conditionally a finitary P-chain on (0, ∞),
given ℱ(τ+). This is much less crisp than it sounds. To formalize it, I would
have to introduce the conditional distribution of Y given ℱ(τ+). Unless one
takes precautions, this distribution would act only on the meager collection
of product measurable subsets of Ī^[0, ∞); it loses the fine structure of the
sample functions. Also, to check that a probability on Ī^[0, ∞) is Markov, one
would have to look at an uncountable number of conditions; it's hard to
organize the work in a measurable way. So, the charming informal statement
leads into a morass. My way around is dividing the problem in two. First,
retract the time domain of Y to R; call this retract S, for shift. Now S takes
its values in the Borel space Ω = Ī^R; we can discuss its conditional
distribution with a clear conscience, and in a countable number of moves force
these distributions to be finitary P-chains on positive R. Second, we can
relate Y to S: except for a negligible set of ω, the sample function Y(·, ω) is
obtained by filling in S(ω)(·).
The detail work begins here. Remember that Ω is the set of all functions
ω from R to Ī; and X(r, ω) = ω(r) for ω ∈ Ω and r ∈ R; and Ω is endowed
with the product σ-field, namely the smallest σ-field over which all X(r) are
measurable. Let Ω_v be the set of ω ∈ Ω such that:
ω(·) is continuous at all r ∈ R;
ω(·) is I-valued at all positive r ∈ R;
ω(·) has only finitely many j-intervals on finite time intervals, for all j ∈ I.
CRITERION. Let ω ∈ Ω. Then ω ∈ Ω_v iff ω(r + ·) ∈ Ω_v for all positive
r ∈ R, and ω(0) = lim ω(r) as r ∈ R decreases to 0.
As (8) implies, Ω_v is measurable. For ω ∈ Ω_v, let
(39) X̄(t, ω) = lim X(r, ω) as r ∈ R decreases to t.
Remember the mapping S from (37). What does it mean to say that the
conditional distribution of S is a finitary P-chain on positive R? To answer
this question, introduce the class P of probabilities μ on Ω, having the
properties:
(40a) μ{X(r) ∈ I} = 1 for all positive r ∈ R;
(40b) μ{X(r_n) = i_n for n = 0, …, N}
     = μ{X(r_0) = i_0} ∏_{n=0}^{N−1} P(r_{n+1} − r_n, i_n, i_{n+1})
for nonnegative integers N, and 0 ≤ r_0 < r_1 < ⋯ < r_N in R and
i_0, …, i_N in I. By convention, an empty product is 1.
CRITERION. μ ∈ P iff for all positive r ∈ R,
μ{X(r) ∈ I} = 1,
and the μ-distribution of {X(r + s): s ∈ R} is
Σ_{j∈I} μ{X(r) = j} · P_j.
This makes sense, because {s → X(r + s, ω): s ∈ R} is an element of Ω for
μ-almost all ω ∈ Ω. And P_j acts on Ω.
The results of this chapter extend to all μ ∈ P in a fairly obvious way,
with X̄ replacing X. In particular, μ ∈ P concentrates on Ω_v by (8). And
μ ∈ P iff relative to μ, the process {X̄(t): 0 ≤ t < ∞} is finitary and Markov
on (0, ∞) with stationary transitions P, by (12): the definition is (18).
Here is the strong Markov property. Regular conditional distributions are
discussed in Section 10.10.
(41) Theorem. Suppose τ is Markov. Let Δ = {τ < ∞}, and let Y be the
post-τ process. Define S by (37).
(a) P_i{S ∈ Ω_v | Δ} = 1.
Define X̄ by (39).
(b) If ω ∈ {Δ and S ∈ Ω_v}, then Y = X̄ ∘ S; that is,
Y(t, ω) = X̄(t, Sω) for all t ≥ 0.
On Δ, let Q(·, ·) be a regular conditional P_i-distribution for S given ℱ(τ+).
Remember that P is the set of probabilities μ on Ω which satisfy (40). Let Δ_P
be the set of ω ∈ Δ such that Q(ω, ·) ∈ P.
(c) Δ_P ∈ ℱ(τ+), and P_i{Δ_P | Δ} = 1.
PROOF. Claim (a). As in (38a).
Claim (b). Use the definitions.
Claim (c). Let r ∈ R be positive. Let G(r) be the set of ω ∈ Δ satisfying
(42) Q(ω, [X(r) ∈ I]) = 1.
I say
(43) G(r) ∈ ℱ(τ+) and P_i{Δ \ G(r)} = 0.
The measurability is clear. Furthermore,
Q(ω, [X(r) ∈ I]) ≤ 1
for all ω ∈ Δ. Integrate both sides of this inequality over Δ. On the right,
you get P_i{Δ}. On the left, you get
P_i{Δ and S ∈ [X(r) ∈ I]} = P_i{Δ and Y(r) ∈ I} = P_i{Δ}
by (36c). So, strict inequality holds almost nowhere, proving (43).
Let s = (s_0, …, s_M) be an (M + 1)-tuple of elements of R, with
0 ≤ s_0 < ⋯ < s_M. Let i = (i_0, …, i_M) be an (M + 1)-tuple of elements of
I. Let
B = B(s, i) = {X(s_m) = i_m for m = 0, …, M}
C = C(s, i) = {X(s_0) = i_0}
π = π(s, i) = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).
Let G(s, i) be the set of ω ∈ Δ satisfying
(44) Q(ω, B) = Q(ω, C) · π.
I say
(45) G(s, i) ∈ ℱ(τ+) and P_i{Δ \ G(s, i)} = 0.
The measurability is clear. To proceed, integrate both sides of (44) over an
arbitrary A ∈ ℱ(τ+) with A ⊂ Δ. On the left, you get
P_i{A and Y(s_m) = i_m for m = 0, …, M}.
On the right you get
P_i{A and Y(s_0) = i_0} · π.
These two expressions are equal by (35). Now (10.10a) settles (45). But
Δ_P = [∩_r G(r)] ∩ [∩_{s,i} G(s, i)]. *
NOTE. Strong Markov (38 and 41) holds for any μ ∈ P in place of P_i;
review the proofs.
Given the ordering of the states, the holding time on each visit to state i is
exponential with parameter q(i), independent of other holding times. This
can be made precise in various ways. For example, let D be a finite set of
states. Let i_1, …, i_n ∈ D, not necessarily distinct. Suppose q(i_1), …, q(i_n)
positive. You may wish to review (1.24) before tackling the next theorem.
(46) Theorem. Let μ ∈ P. Given that {X̄(t): 0 ≤ t < ∞} pays at least n
visits to D, the 1st being to i_1, …, the nth being to i_n, the holding times on
these visits are conditionally μ-independent and exponential with parameters
q(i_1), …, q(i_n).
PROOF. Let A be the event X̄ visits D at least once, the 1st visit being to i_1.
Let σ_1 be the holding time on this visit. Let B be the event X̄ visits D at least
n + 1 times, the 2nd visit being to i_2, …, the n + 1st to i_{n+1}. On B, let
σ_2, …, σ_{n+1} be the holding times on visits 2 through n + 1. Let C be the
event X̄ visits D at least n times, the first visit being to i_2, …, the nth to
i_{n+1}. On C, let τ_1, …, τ_n be the holding times on the n visits. Let t_1, …, t_{n+1}
be nonnegative numbers. Let
G = {A and B and σ_m > t_m for m = 1, …, n + 1}
H = {C and τ_m > t_{m+1} for m = 1, …, n}.
If μ ∈ P, I claim
(a) μ{G | A ∩ B} = P_{i_1}{G | B}.
Argument for (a). Let φ be the time of first visiting D. Then φ is Markov.
Let S_φ be X̄(φ + ·) retracted to R. Confine ω to Ω_v ∩ S_φ^{−1}Ω_v, a set of
μ-probability 1 by (38 on μ). Then
A ∩ B = A ∩ {X̄(φ) = i_1 and S_φ ∈ B}
G = A ∩ {X̄(φ) = i_1 and S_φ ∈ G}.
So (38 on μ) implies
μ{A ∩ B} = μ{A} · P_{i_1}{B}
μ{G} = μ{A} · P_{i_1}{G}.
Divide to get (a).
Confine ω to {Ω_v and X(0) = i_1}. There, σ_1 coincides with the first holding
time in X, which is Markov. Let S_1 be X(σ_1 + ·) retracted to R. Let ν be the
P_{i_1}-distribution of S_1; so ν depends on i_1. I claim
(b) P_{i_1}{G | B} = ν{H | C} · e^{−q(i_1)t_1}.
Argument for (b). Confine ω to {Ω_v and X(0) = i_1 and S_1 ∈ Ω_v}, which
has P_{i_1}-probability 1 by (41a). There,
B = {S_1 ∈ C}
G = {S_1 ∈ H} ∩ {σ_1 > t_1}.
7.5] THE MINIMAL SOLUTION 237

Using (21),
P_{i_1}{B} = ν{C}
P_{i_1}{G} = ν{H} · e^{−q(i_1)t_1}.
Divide to clinch (b).
Combine (a) and (b):
μ{G | A ∩ B} = ν{H | C} · e^{−q(i_1)t_1}.
But ν ∈ P by (21), so induction wins again. *
NOTE. Specialize D = {j}. Then (46) asserts: given X̄ visits j at least n
times, the first n holding times in j are conditionally independent and
exponential, with common parameter q(j). This is the secret explanation for
the proof of (8).
REFERENCES. Strong Markov was first treated formally by Blumenthal
(1957) and Hunt (1956), but was used implicitly by Lévy (1951) to prove (46).
For another discussion, see (Chung, 1960, II.9).

5. THE MINIMAL SOLUTION

The results of this section, which can be skipped on a first reading of the
book, are due to Feller (1945, 1957). For another treatment, see Chung
(1960, II.18). Let Q be a real-valued matrix on the countable set I, with
(47a) q(i) = −Q(i, i) ≥ 0;
(47b) Q(i, j) ≥ 0 for i ≠ j;
(47c) Σ_j Q(i, j) ≤ 0.
NOTE. Any generator Q with all its entries finite satisfies (47), by (5.10, 14).
If I is finite, or even if sup_i q(i) < ∞, then (47) makes Q a generator by
(5.29).
When is there a standard stochastic or substochastic semigroup P with
P′(0) = Q? When is P unique? To answer these questions, at least in part,
define r by
r(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
       = 0 elsewhere.
Define a minimal Q-process starting from i as follows. The order of states is a
partially defined discrete time Markov chain with stationary substochastic
transitions r; the holding time on each visit to j is exponential with parameter
q(j), independent of other holding times and the order of states; the sample
function is continuous from the right where defined. In general, there is
positive probability that a sample function will be defined so far only on a
finite interval. The most satisfying thing to do is simply to leave the process
partially defined. To avoid new definitions, however, introduce an absorbing
state ∂ ∉ I. When the sample function is still undefined, set it equal to ∂.
The minimal Q-process then has state space I ∪ {∂}; starting from ∂, it
remains in ∂ forever. The minimal Q-process is Markov, with stationary
standard transitions, say P_∂. And P_∂ is a stochastic semigroup on I ∪ {∂},
with ∂ absorbing.
NOTE. All minimal Q-processes starting from i have the same distribution.
One rigorous minimal Q-process can be constructed by (5.39): just plug
in the present matrix Q for the (5.39) matrix Q; more exactly, plug in the
present r for the (5.39) matrix r, and the present q for the (5.39) function q.
Check that the (5.39) matrix Q coincides with the present Q. The construction
embodied in (5.39) produces a process, which I just described informally; and
(5.39) asserts that the formal process is Markov with stationary transitions
P_∂, which are standard and stochastic on I ∪ {∂}. Let P̄ be the retraction of
P_∂ to I:
P̄(t, i, j) = P_∂(t, i, j) for t ≥ 0 and i, j in I.
Now P̄ is a substochastic semigroup on I, because ∂ is absorbing. And P̄
is standard because P_∂ is. And (5.39) shows P̄′(0) = Q.
(48) Lemma. P̄ is stochastic iff the minimal Q-process starting from any
i ∈ I has almost all its sample functions I-valued on [0, ∞). Indeed,
Σ_{j∈I} P̄(t, i, j) is just the probability that a minimal Q-process starting from
i is I-valued on [0, t].
PROOF. If the process hits ∂ at all on [0, t], it does so at a rational time,
and is then stuck in ∂ at time t. *
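The mass deficit in (48) can be estimated by simulation for an explosive pure-birth chain. The sketch below is editorial; the rates q(n) = 2^n, the jump rule r(n, n+1) = 1, and the truncation level are illustrative assumptions, not from the book.

```python
# A Monte Carlo sketch of (48) for a pure-birth generator with
# q(n) = 2^n and r(n, n+1) = 1.  Here Sum_j Pbar(t, i, j) is the chance
# that the minimal process is still I-valued at time t, i.e. that the
# explosion time theta = sum_n tau_n exceeds t.
import random

random.seed(2)

def explosion_time(levels=60):
    # theta = sum of independent Exp(2^n) holding times; the tail beyond
    # level 60 has expectation 2^-59 and is ignored
    return sum(random.expovariate(2.0 ** n) for n in range(levels))

t = 1.0
trials = 20000
samples = [explosion_time() for _ in range(trials)]
mass = sum(th > t for th in samples) / trials   # estimates Sum_j Pbar(t, 0, j)
mean_theta = sum(samples) / trials              # E theta = Sum_n 2^-n = 2
```

The estimated row sum is strictly between 0 and 1, so P̄ fails to be stochastic for these rates, in agreement with the criterion Σ_n 1/q(n) < ∞ of (51c) below.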
It is convenient to generalize the notion of minimal Q-process slightly.
Let the stochastic process {Z(t): 0 ≤ t < ∞} on a triple (𝒳, ℱ, 𝒫) be a
regular Q-process starting from i ∈ I, namely: Z is a Markov process starting
from i, with state space I ∪ {∂}, and stationary standard stochastic
transitions P on I ∪ {∂}; moreover, ∂ is absorbing for P, and the retraction of
P to I has generator Q; finally, the sample functions of Z are Ī ∪ {∂}-valued
and regular: ∂ is an isolated point of Ī ∪ {∂}. Let τ be the least t if any with
Z(t) ∉ I or lim_{s↑t} Z(s) ∉ I;
if none, τ = ∞. Let Z*(t) = Z(t) if t < τ and Z*(t) = ∂ if t ≥ τ.
(49) Lemma. Z* is a minimal Q-process starting from i.
PROOF. Use (33). *
NOTE. Different regular Q-processes can have different distributions. As
(49) shows, however, the distribution of Z* is determined by Q for any regular
Q-process Z.
These considerations answer the existence question for substochastic P.
(50) Theorem. The standard substochastic semigroup P̄ on I satisfies
P̄′(0) = Q. If a standard substochastic semigroup P on I satisfies P′(0) = Q,
then P ≥ P̄.
PROOF. Only the second claim has to be argued. Define the standard
stochastic semigroup P_∂ on I ∪ {∂} by:
P_∂(t, i, j) = P(t, i, j) for i, j ∈ I;
P_∂(t, ∂, i) = 0 for i ∈ I;
P_∂(t, i, ∂) = 1 − Σ_{j∈I} P(t, i, j) for i ∈ I;
P_∂(t, ∂, ∂) = 1.
Now construct a Markov chain {Z(t): 0 ≤ t < ∞} on some triple (𝒳, ℱ, 𝒫),
with state space I ∪ {∂}, regular Ī ∪ {∂}-valued sample functions, stationary
transitions P_∂, starting from i ∈ I. In particular, Z is a regular Q-process.
Use the notation and result of (49). For i and j in I,
P(t, i, j) = P_∂(t, i, j)
          = 𝒫{Z(t) = j}
          ≥ 𝒫{Z(t) = j and τ > t}
          = 𝒫{Z*(t) = j}
          = P̄(t, i, j). *
Suppose Σ_j Q(i, j) = 0 for all i. Then (51) below answers the uniqueness
question. But I believe the most interesting development in (51) to be this:
if P̄ is not stochastic, then there is a continuum of stochastic P with
P′(0) = Q.
(51) Theorem. Suppose Σ_j Q(i, j) = 0 for all i. Then conditions (a)–(g)
are all equivalent.
(a) P̄ is stochastic.
(b) Any regular Q-process has for almost all its sample functions I-valued
step functions on [0, ∞).
(c) Any discrete time Markov chain ξ_0, ξ_1, … with stationary transitions r
satisfies Σ_n 1/q(ξ_n) = ∞ almost surely.
(d) There is at most one standard substochastic semigroup P with
P′(0) = Q.
(e) There is at most one standard stochastic semigroup P with P′(0) = Q.
Claim A. If P̄ is not stochastic, there are a continuum of different standard
stochastic semigroups P on I with P′(0) = Q.
To state (f) and (g), let z be a function from I to [0, 1]. Then Qz is this
function on I:
(Qz)(i) = Σ_j Q(i, j) z(j).
For λ > 0, let (S_λ) be this system:
(S_λ) z is a function from I to [0, 1] and Qz = λz.
(f) For all λ > 0, the system (S_λ) has only the trivial solution z ≡ 0.
(g) For one λ > 0, the system (S_λ) has only the trivial solution z ≡ 0.
Let θ be the first time if any at which the minimal Q-process starting from
i hits ∂, and θ = ∞ if none. Let z(λ, i) be the expectation of e^{−λθ} for λ > 0.
Then
Claim B. z(λ, ·) is the maximal solution of (S_λ) for each λ > 0.
I learned about (S_λ) from Harry Reuter.
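For a pure-birth chain, Claim B can be made concrete: θ is a sum of independent exponential holding times, so z(λ, i) is an explicit infinite product, and Qz = λz can be checked directly. The sketch below is editorial; the rates are illustrative assumptions, not from the book.

```python
# A sketch of (S_lambda) and Claim B for a pure-birth generator with
# Q(n, n) = -q(n) and Q(n, n+1) = q(n).  Then theta is a sum of
# independent Exp(q(n)) times, so
#   z(lam, i) = E_i exp(-lam * theta) = prod_{n >= i} q(n) / (q(n) + lam).

lam = 1.0

def q(n):
    return 2.0 ** n      # Sum_n 1/q(n) < infinity, so the chain explodes

def z(i, terms=200):
    # truncated infinite product; the neglected factors differ from 1
    # by O(2^-n), so the truncation error is negligible
    p = 1.0
    for n in range(i, i + terms):
        p *= q(n) / (q(n) + lam)
    return p

# (Qz)(i) = -q(i) z(i) + q(i) z(i+1); for this z it equals lam * z(i),
# so z is a nontrivial [0, 1]-valued solution of (S_lambda), and by (51)
# the semigroup Pbar is not stochastic for these rates
residuals = [abs(-q(i) * z(i) + q(i) * z(i + 1) - lam * z(i)) for i in range(5)]
```

With divergent Σ_n 1/q(n), for instance q(n) = n + 1, the same product tends to 0, leaving only the trivial solution, as (f)–(g) require.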
PROOF. (a) iff (b). Use (48) and (49).
(a) iff (c). Use (5.38).
(a) implies (d). Suppose (a), and suppose P is a standard substochastic
semigroup with P′(0) = Q. By (50),
P(t, i, j) ≥ P̄(t, i, j).
Sum out j to see that equality holds; namely, P = P̄.
(d) implies (e). Logic.
(e) implies (a). Follows from claim A.
Claim A. The construction in (6.132) had a parameter p, which ranged
over the probabilities on I. The construction produced a triple (Ω, ℱ, π_i^p)
and a process X, which was Markov with stationary transitions P^p and
starting state i. The transitions formed a standard stochastic semigroup on I,
with generator Q, by (6.37). The sample functions of X were I-valued and
regular, by (6.136). So X is a regular Q-process. Remember that θ in (6.132)
was the least time to the left of which there were infinitely many jumps.
Suppose
Σ_j P(t, i, j) < 1.
By (48-49),
π_i^p{θ ≤ t} = 1 - Σ_j P(t, i, j) > 0.
By (6.135),
π_i^p{X(θ) = j | θ < ∞} = p(j).
By (6.140), this conditional probability can be computed from the π_i^p-distribution
of {X(r): rational r}. This distribution can be computed from i
and P^p. So P^p determines p, and there are continuum many distinct P^p.
Underground explanation. The minimal Q-process starting from j cannot
reach ∂ by jumping into it, but only by visiting an infinite sequence of states
in a finite amount of time. This protects me from increasing Q(j, k) when
I replace ∂ by k.
(f) implies (g). Logic.
(a) implies (f) and (g) implies (a). Use claim B.
Claim B. Let X be the minimal Q-process constructed in (5.39), with
visits ξ_n and holding times τ_n. The probability controlling X is π_i; write E_i
for π_i-expectation. Relative to π_i, the visits {ξ_n} are a Γ̄-chain starting from i,
where Γ̄ is Γ extended to be stochastic on I ∪ {∂} and absorbing at ∂; given
{ξ_n}, the holding times {τ_n} are conditionally independent and exponentially
distributed, with parameters {q(ξ_n)}, where q(∂) = 0. By deleting a set which
is null for all π_i, suppose that for all n,
τ_n = ∞ iff q(ξ_n) = 0 iff ξ_{n+1} = ∂.
Let d be the least n if any with ξ_n = ∂, and d = ∞ if none. Let
θ = Σ {τ_n : 0 ≤ n < d} = Σ_n τ_n,
because d < ∞ makes τ_{d-1} = ∞, so θ = ∞ either way. Then θ is the least t
if any with X(t) = ∂, and θ = ∞ if none.
Fix λ > 0. Abbreviate exp x = e^x. Let
z(λ, i) = E_i{exp (-λθ)};
z_0(λ, i) = 1;
z_n(λ, i) = E_i{exp [-λ(τ_0 + ... + τ_{n-1})]} for n ≥ 1.
I claim
(S_{λ,n})  [q(i) + λ]z_{n+1}(λ, i) = Σ_{j≠i} Q(i, j)z_n(λ, j).
Indeed, with respect to π_i, the time τ_0 is independent of τ_1, ..., τ_n,
and is exponential with parameter q(i), by (5.37). So (5.31) makes
z_1(λ, i) = E_i{exp (-λτ_0)} = q(i)/[q(i) + λ].
This is (S_{λ,0}). For n ≥ 1, the argument shows
z_{n+1}(λ, i) = E_i{exp (-λτ_0) · exp [-λ(τ_1 + ... + τ_n)]}
= q(i)/[q(i) + λ] · E_i{exp [-λ(τ_1 + ... + τ_n)]}.

Let φ be a nonnegative, measurable function of n real variables. Then
E_i{φ(τ_1, ..., τ_n)} = Σ_{j≠i} Γ(i, j) E_j{φ(τ_0, ..., τ_{n-1})}:
to check this, split the left side over the sets
{ξ_1 = j, ξ_2 = j_2, ..., ξ_n = j_n};
split the jth term on the right over
{ξ_1 = j_2, ..., ξ_{n-1} = j_n};
and use (5.37). Consequently,
z_{n+1}(λ, i) = q(i)/[q(i) + λ] · Σ_{j≠i} Γ(i, j)z_n(λ, j).
This settles (S_{λ,n}) for n ≥ 1.
Check z_n(λ, i) ↓ z(λ, i). Let n → ∞ in (S_{λ,n}):
[q(i) + λ]z(λ, i) = Σ_{j≠i} Q(i, j)z(λ, j).
This shows z(λ, ·) to be a solution of (S_λ). Why is z(λ, ·) maximal? Let z
be a competitive solution. Then
z(j) ≤ 1 = z_0(λ, j).
So
[q(i) + λ]z(i) = Σ_{j≠i} Q(i, j)z(j)  by (S_λ)
≤ Σ_{j≠i} Q(i, j)z_0(λ, j)
= [q(i) + λ]z_1(λ, i)  by (S_{λ,0}).
That is,
z(i) ≤ z_1(λ, i) for all i.
Persevering,
z(i) ≤ z_n(λ, i) for all n and i.
Let n → ∞ and remember z_n(λ, i) ↓ z(λ, i) to get
z(i) ≤ z(λ, i) for all i,
as advertised. *
NOTE. Suppose Σ_j Q(i, j) = 0, and suppose the minimal solution P is
really substochastic. If P̄ is a standard substochastic semigroup with P ≤ P̄
coordinatewise, this forces P̄'(0) = Q, as suggested by Volker Strassen.
Indeed, P̄'(0) ≥ Q coordinatewise by calculus, and the row sums of P̄'(0)
are nonpositive by (5.14).
NOTE. If Σ_j Q(i, j) < 0, then P cannot be stochastic. Indeed, q(i) > 0
and Γ(i, ·) has mass strictly less than 1. So the minimal Q-process starting
from i can reach ∂ on first leaving i. However, P may be the only standard
substochastic semigroup with generator Q, as is the case when sup_i q(i) < ∞.

6. THE BACKWARD AND FORWARD EQUATIONS

The main results of this section, which can be skipped on a first reading of
the book, are due to Doob (1945). For another treatment, see Chung
(1960, II.17). Let P be a standard stochastic semigroup on the countable set I;
the finite I case has already been dealt with, in (5.29). Let P'(0) = Q; let
q(i) = -Q(i, i), and suppose q(i) finite for all i. The problem is to decide
when the following two equations hold:

(Backward equation) P'(t) = QP(t)

(Forward equation) P'(t) = P(t)Q.
It is clear from Fatou and the existence of P', to be demonstrated below,
that both relations hold with = replaced by ≥. Let X be a Markov process on
the triple (Ω_j, P_j), with stationary transitions P, starting state j, and regular
sample functions which are finite and continuous at the binary rationals;
for a discussion, see (12).

The backward equation

Recall the definition and properties of the post-exit process Y from
Section 3. Let b(s, i, j) be q(i) times the P_i-probability that the post-exit
process starts in φ, and is in j at time s:
b(s, i, j) = q(i)P_i{Y(0) = φ and Y(s) = j}.

By (57) below, b(·, i, j) is continuous on [0, ∞); here is a more interesting
argument. Suppose s_n → s. Using (29),

{Y(0) = φ and Y(s_n) = j} → {Y(0) = φ and Y(s) = j}, a.e.

Fatou implies b(s_n, i, j) → b(s, i, j).

Let
σ(s, i, j) = Σ_{k≠i} Q(i, k)P(s, k, j).

As (5.14) implies, Σ_{k≠i} Q(i, k) < ∞. As (5.9) implies, P(·, k, j) is continuous.
By dominated convergence, σ(·, i, j) is finite and continuous on [0, ∞). Let
Δ(i, j) be 1 or 0, according as i = j or i ≠ j.

(52) Theorem.
(a) P(t, i, j) = Δ(i, j)e^{-q(i)t} + ∫_0^t e^{-q(i)u} [b(t - u, i, j) + σ(t - u, i, j)] du.

(b) P'(t, i, j) exists and is continuous on [0, ∞); namely,

P'(t, i, j) = b(t, i, j) - q(i)P(t, i, j) + σ(t, i, j).
(c) In particular,
(53) P'(t, i, j) = Σ_k Q(i, k)P(t, k, j)
is equivalent to
(54) b(t, i, j) = 0.
PROOF. Claim (a). Condition on the first holding time and use (21).
More prosaically, suppose q(i) > 0; the opposite case is trivial. And suppose
i ≠ j; the case i = j is very similar. Confine ω to the set {X(0) = i}. Let T
be the first holding time. Confine ω further to the set {X(0) = i and T < ∞},
which has P_i-probability 1. Clearly,
{X(t) = j} = {T ≤ t and Y(t - T) = j}.
But (21) makes T and Y independent, T being exponential with parameter
q(i). Fubini this:
P_i{X(t) = j} = ∫_0^t e^{-q(i)u} q(i)P_i{Y(t - u) = j} du.
Clearly,
{Y(s) = j} = ∪_{k ∈ I ∪ {φ}} {Y(0) = k and Y(s) = j}.
So
P_i{Y(s) = j} = Σ_{k ∈ I ∪ {φ}} P_i{Y(0) = k and Y(s) = j}.
Use (21) again for k ∈ I:
q(i)P_i{Y(0) = k and Y(s) = j} = q(i)Γ(i, k)P(s, k, j)
= Q(i, k)P(s, k, j).
By definition,
q(i)P_i{Y(0) = φ and Y(s) = j} = b(s, i, j).
Claim (b). Put s = t - u in (a) and differentiate with respect to t, using
the continuity of b and σ. Then use (a) again.
Claim (c). Use (b) and the definition of σ. *
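When I is finite, both the backward and the forward equations hold, by (5.29); that case makes a convenient numerical sanity check for relations of the form (53). The 3-state generator below is my own toy example, not one from the text, and the matrix exponential is computed from its power series:

```python
import numpy as np

def expm(A, terms=80):
    """Matrix exponential via its power series (adequate for small matrices)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# A generator Q: nonnegative off-diagonal entries, rows summing to 0.
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -3.0, 1.0],
              [0.0, 1.0, -1.0]])

t, h = 0.7, 1e-6
# Central difference approximating P'(t) for P(t) = exp(tQ).
dP = (expm((t + h) * Q) - expm((t - h) * Q)) / (2 * h)
back = np.abs(dP - Q @ expm(t * Q)).max()   # backward: P'(t) = Q P(t)
fwd = np.abs(dP - expm(t * Q) @ Q).max()    # forward:  P'(t) = P(t) Q
print(back, fwd)
```

Both residuals vanish to within the accuracy of the finite difference; in the countable case the whole point of this section is that either equation can fail.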
(55) Theorem. (a) If (53) holds for any t > 0, it holds for all t; and then
(56) P(t, i, j) = Δ(i, j)e^{-q(i)t} + ∫_0^t e^{-q(i)u} σ(t - u, i, j) du.

(b) If (56) holds for any t > 0, then (53) holds.



PROOF. Claim (a). To begin with, I say t > 0 and s ≥ 0 imply

(57) b(t + s, i, j) = Σ_k b(t, i, k)P(s, k, j).
Indeed, (21) makes P_i{Y(t) ∈ I} = 1 and
P_i{Y(0) = φ and Y(t) = k and Y(t + s) = j}
= P_i{Y(0) = φ and Y(t) = k} · P(s, k, j).
So,
b(t + s, i, j) = q(i)P_i{Y(0) = φ and Y(t + s) = j}
= q(i)P_i{Y(0) = φ and Y(t) ∈ I and Y(t + s) = j}
= Σ_k q(i)P_i{Y(0) = φ and Y(t) = k and Y(t + s) = j}
= Σ_k q(i)P_i{Y(0) = φ and Y(t) = k} · P(s, k, j)
= Σ_k b(t, i, k)P(s, k, j).
This proves (57).
By a theorem of Lévy, to be proved in Section 2.3 of ACM, the function
s → P(s, k, j) is identically 0 or strictly positive on (0, ∞); there is a really
clever analytic proof of this theorem, due to Don Ornstein, in Chung (1960,
pp. 121-122). By (57), the same dichotomy applies to b(t, i, j). Indeed,
suppose b(t, i, j) > 0 for some t > 0. Use (5.7):
b(t + s, i, j) ≥ b(t, i, j)P(s, j, j) > 0,
so b(u, i, j) > 0 for u ≥ t. Next, let 0 < s < t. Then
b(t, i, j) = Σ_k b(s, i, k)P(t - s, k, j),
so b(s, i, k)P(t - s, k, j) > 0 for some k. Then P(u, k, j) > 0 for all u > 0
and b(s + u, i, j) ≥ b(s, i, k)P(u, k, j) > 0. This proves the dichotomy. The
dichotomy and (52c) prove the first part of (a). Then (52a) gets the rest of (a).
Claim (b). If (56) holds, then (52a) implies

∫_0^t e^{-q(i)u} b(t - u, i, j) du = 0.

This forces b to vanish somewhere, so everywhere. Use (52c). *
(58) Theorem. The following three relations are all equivalent:
(a) Relation (53) holds for all j.
(b) The X sample function jumps from i to φ with P_i-probability 0.
(c) Σ_j Q(i, j) = 0.
PROOF. (a) iff (b). As usual, suppose q(i) > 0. Let T be the first holding
time, and confine ω to
{X(0) = i and T < ∞ and X(T + 1) ∈ I},
which has P_i-probability 1 by (29). But

{X(0) = i and X(T) = φ} = ∪_j {X(0) = i and X(T) = φ and X(T + 1) = j};
so
q(i)P_i{X(T) = φ} = Σ_j b(1, i, j).
Use (52c).
(b) iff (c) is clear from (21). *
The forward equation
The next difficulty is proving that p(t, i, j) is finite and continuous on
[0, ∞), where
p(t, i, j) = Σ_{k≠j} P(t, i, k)Q(k, j).
Suppose without real loss of generality that P(t, i, k) > 0 for all t > 0 and
k ∈ I: you can reduce I to this set of k, and then retract P to the smaller I. Let
μ(k) = ∫_0^∞ P(t, i, k) dt, and suppose first
(59) μ(k) < ∞ for all k.
Check that
Σ_k μ(k)P(t, k, j) ≤ μ(j).
Let
P̂(t, j, k) = μ(k)P(t, k, j)/μ(j).
Then P̂ is a standard substochastic semigroup with generator Q̂, where
Q̂(j, k) = μ(k)Q(k, j)/μ(j).
But
p(t, i, j) = μ(j) · Σ_{k≠j} Q̂(j, k)P̂(t, k, i)/μ(i),

and is now finite and continuous, by the argument for σ. To remove condition
(59), use the argument on the standard substochastic semigroup t → e^{-t}P(t).
Let
f(t, i, j) = P'(t, i, j) + q(j)P(t, i, j) - p(t, i, j),
a continuous function on [0, ∞). By Fatou, f ≥ 0.

(60) Lemma. For t ≥ 0 and s > 0,

f(t + s, i, j) = Σ_k P(t, i, k)f(s, k, j).
PROOF. First, suppose (59). As (52b) implies, P̂ is continuously differentiable.
Let
b̂(t, i, j) = P̂'(t, i, j) - Σ_k Q̂(i, k)P̂(t, k, j).
Informally, (52b) reveals b̂ as the b of P̂. That is, b̂(t, i, j) is q̂(i) times the
probability that a P̂-chain with regular sample functions starting from i
jumps to φ on leaving i, and is in j at time t after the jump. So, (57) holds with
hats on. By algebra,
μ(i)f(t, i, j)/μ(j) = b̂(t, j, i);
by more algebra, hatted (57) is (60). To remove condition (59), apply the
argument to the standard substochastic semigroup t → e^{-t}P(t). The fudge
factors cancel. *
Of course, (52) and (57) work for substochastic P, by the usual maneuver
of adding ∂. The argument in (55) shows
(61) Corollary. f(t, i, j) is identically 0 or strictly positive on (0, ∞).
If p is continuously differentiable on [0, ∞), and 0 ≤ q < ∞ is a real
number, integration by parts shows

(62) ∫_0^t [p'(s) + qp(s)]e^{-q(t-s)} ds = p(t) - e^{-qt} p(0).
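Identity (62) amounts to integrating d/ds [p(s)e^{qs}] = [p'(s) + qp(s)]e^{qs} from 0 to t and multiplying by e^{-qt}. A quick numerical check, on a smooth test function of my own choosing (not one from the text):

```python
import numpy as np

q, t = 2.0, 1.0
p, dp = np.cos, lambda s: -np.sin(s)   # test function and its derivative

s = np.linspace(0.0, t, 20001)
f = (dp(s) + q * p(s)) * np.exp(-q * (t - s))
lhs = float(((f[:-1] + f[1:]) / 2 * np.diff(s)).sum())  # trapezoid rule
rhs = float(p(t) - np.exp(-q * t) * p(0.0))
print(lhs, rhs)   # the two sides agree
```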

Recall that
f(t, i, j) = P'(t, i, j) + q(j)P(t, i, j) - p(t, i, j),
where
p(t, i, j) = Σ_{k≠j} P(t, i, k)Q(k, j)
is finite and continuous.
(63) Theorem.
(a) P'(t, i, j) = f(t, i, j) - q(j)P(t, i, j) + p(t, i, j).
(b) P(t, i, j) = Δ(i, j)e^{-q(j)t} + ∫_0^t [f(s, i, j) + p(s, i, j)] e^{-q(j)(t-s)} ds.
(c) In particular,
(64) P'(t, i, j) = Σ_k P(t, i, k)Q(k, j)
is equivalent to
(65) f(t, i, j) = 0.
(d) If (64) holds for any t > 0, it holds for all t; and this is equivalent to
P(t, i, j) = Δ(i, j)e^{-q(j)t} + ∫_0^t p(s, i, j)e^{-q(j)(t-s)} ds.

PROOF. Claim (a) rearranges the definition of f.

Claim (b). Use (62), with P(·, i, j) for p and q(j) for q:

P(t, i, j) = Δ(i, j)e^{-q(j)t} + ∫_0^t [P'(s, i, j) + q(j)P(s, i, j)] e^{-q(j)(t-s)} ds.

Substitute claim (a) into this formula.
Claim (c) is immediate from (a).
Claim (d). Use (b) and (c) and (61). *

I will now obtain an analog of (58a-58b). It is even possible to throw in
(58c), by working with P̂ and Q̂; but I will not do this. Informally, for k ∈ I
and k ≠ j,
P(s, i, k)Q(k, j) ds

is the P_i-probability that a jump from k to j occurs in (s, s + ds). Thus,
p(s, i, j) ds is the P_i-probability that X jumps from some state to j in
(s, s + ds), and
∫_0^t p(s, i, j)e^{-q(j)(t-s)} ds

is the P_i-probability that the sample function experiences at least one discontinuity
on [0, t], and the last discontinuity is a jump from some real state
to j. Now (63b) reveals f(s, i, j) ds as the P_i-probability of a jump from φ
to j in (s, s + ds). All these statements are rigorous in their way. To begin
checking this out, let γ be the time of the last discontinuity of X on or before
time t, on the set D where X has at least one such discontinuity. That is, D
is the complement of {X(s) = X(t) for 0 ≤ s ≤ t}. On D, the random variable
γ is the sup of s < t with X(s) ≠ X(t). By regularity,
X(γ) = lim_{s↓γ} X(s) = X(t);
X(γ-) = lim_{s↑γ} X(s) ≠ X(t)
is a random element of Ī.
NOTE. D and γ depend on t.
(66) Proposition. Let j, k ∈ I and k ≠ j.
(a) P_i{D and X(γ-) = k and X(t) = j} = ∫_0^t P(s, i, k)Q(k, j)e^{-q(j)(t-s)} ds.

(b) P_i{D and X(γ-) ∈ I and X(t) = j} = ∫_0^t p(s, i, j)e^{-q(j)(t-s)} ds.

(c) P_i{D and X(γ-) = φ and X(t) = j} = ∫_0^t f(s, i, j)e^{-q(j)(t-s)} ds.

PROOF. Claim (a). Without real loss, put t = 1. Let D_n be the event
X(m/2^n) ≠ X(1) for some m = 0, ..., 2^n - 1. On D_n, let γ_n be the greatest
m/2^n < 1 with X(m/2^n) ≠ X(1). Using the regularity, check D_n ↑ D and
γ_n ↑ γ and
{D_n and X(γ_n) = k and X(1) = j} → {D and X(γ-) = k and X(1) = j}.

By Fatou, the P_i-probability of the left side converges to the P_i-probability
of the right side. But the P_i-probability of the left side is

(67) Σ_{m=0}^{2^n - 1} P(1 - (m + 1)/2^n, i, k) · P(1/2^n, k, j) · P(1/2^n, j, j)^m.

As n → ∞,
P(1/2^n, k, j) = (1/2^n)Q(k, j) + o(1/2^n);
and
P(1/2^n, j, j) = 1 - q(j)/2^n + o(1/2^n);
so
P(1/2^n, j, j)^m → e^{-q(j)u} as m/2^n → u, uniformly in 0 ≤ u ≤ 1.
Consequently, (67) converges to the right side of claim (a).
Claim (b). Sum claim (a) over k ∈ I\{j}.
Claim (c). Subtract claim (b) from (63b). *
If X(γ-) = k and X(t) = j, call the last discontinuity of X on [0, t] a
jump from k to j; even if k = φ. Let μ_{ij}(t) be the P_i-mean number of jumps
from φ to j in [0, t].

(68) Proposition. μ_{ij}(t) = ∫_0^t f(s, i, j) ds.

PROOF. This is a rehash of (66c). Jumps from φ to j occur at the beginning
of j-intervals; so there are only finitely many on finite time intervals, for
ω ∈ Ω.
Let 0 < γ_1 < γ_2 < ... be the times of the first, second, ... jumps from
φ to j. If there are fewer than n jumps, put γ_n = ∞. Thus γ_n → ∞ as
n → ∞. If γ_n < ∞, let T_n be the length of the j-interval whose left endpoint
is γ_n. Now γ_n + T_n ≤ γ_{n+1}; while
{D and X(γ-) = φ and X(t) = j} = ∪_{n=1}^∞ {γ_n ≤ t < γ_n + T_n}.
So,
P_i{D and X(γ-) = φ and X(t) = j} = Σ_{n=1}^∞ P_i{γ_n ≤ t < γ_n + T_n}.
Fix a positive real number t. I say that {γ_n ≤ t} ∈ ℱ(t). Indeed, let D be
a countable dense subset of [0, t], with t ∈ D. For a < b in D, let E(a, b) be
the event that X(b) = j, and for all finite subsets J of I there are binary
rational r ∈ (a, b) with X(r) ∉ J. Let F(s) be the event that for all positive
integer m, there are a and b in D with
s < a < b ≤ t and b - a < 1/m and E(a, b).
Then
{γ_n ≤ t} = ∪_s {γ_{n-1} ≤ s and F(s): s ∈ D},
proving that γ_n is a Markov time. Clearly X(γ_n) = j on {γ_n < ∞}.
By (21) and strong Markov (38), given {γ_n < ∞}, the variable T_n is conditionally
P_i-exponential with parameter q(j), independent of γ_n. Let
ν_n(t) = P_i{γ_n ≤ t}. By Fubini,

P_i{γ_n ≤ t < γ_n + T_n} = ∫_0^t e^{-q(j)(t-s)} ν_n(ds).

But μ_{ij} = Σ_{n=1}^∞ ν_n, so

P_i{D and X(γ-) = φ and X(t) = j} = ∫_0^t e^{-q(j)(t-s)} μ_{ij}(ds).

Compare this with (66c) to get

μ_{ij}(ds) = f(s, i, j) ds. *
Aside. With respect to P_i, given {γ_n < ∞}, the variable γ_{n+1} - γ_n is
independent of γ_1, ..., γ_n and its distribution coincides with the P_j-distribution
of γ_1.
(69) Theorem. The following two relations are equivalent:
(a) Relation (64) holds for some t > 0.
(b) The X sample function jumps from φ to j with P_i-probability 0.
PROOF. Suppose (a). By (63c, d), relation (65) holds for all t. Then (68)
shows μ_{ij} = 0. So (b) holds. Conversely, suppose (b). Remember f ≥ 0. Then
f(t, i, j) = 0 for some t > 0 by (68), so for all t by (61). Then (a) holds by
(63c). *
(70) Theorem. For positive t and s,
Σ_j |P'(t, i, j)| < ∞ and Σ_j P'(t, i, j) = 0;
moreover,
P'(t + s) = P'(t)P(s) = P(t)P'(s).

PROOF. Use (21) and (52b) for the first display. Use (57) and (60) for the
second, with b and f expressed in terms of P using (52b) and (63b). *

Reversing the sample function

The semigroup P̂, introduced after (59), has a sample function interpretation.
Temporarily, renormalize X to be continuous from the left. Fix P_i and
compute relative to it. Suppose (59). For finite subsets J of I, let T_J be the time
of the last visit to J. The process
{X(T_J - t): 0 ≤ t < T_J}
is Markov with stationary transitions P̂, and regular (partially defined)
sample functions.
What happens without (59)? Keep looking at P_i. If T is independent of X
and exponentially distributed with parameter 1, then
{X(t): 0 ≤ t ≤ T}
is Markov with stationary transitions t → e^{-t}P(t); and
{X(T - t): 0 ≤ t ≤ T}
is Markov with stationary transitions P̂, where
P̂(t, j, k) = μ(k)e^{-t}P(t, k, j)/μ(j)

μ(j) = ∫_0^∞ e^{-t}P(t, i, j) dt.
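For finite I the reversal is easy to exhibit numerically. With μ(j) = ∫_0^∞ e^{-t}P(t, i, j) dt, that is, row i of (I - Q)^{-1}, the kernel P̂(t, j, k) = μ(k)e^{-t}P(t, k, j)/μ(j) is a substochastic semigroup. The 3-state generator below is my own toy example:

```python
import numpy as np

def expm(A, terms=80):
    """Matrix exponential via its power series (adequate for small matrices)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 1.0, -1.0]])
i = 0
mu = np.linalg.inv(np.eye(3) - Q)[i]   # mu(j) = integral of e^{-t} P(t,i,j) dt

def Phat(t):
    """Phat(t, j, k) = mu(k) e^{-t} P(t, k, j) / mu(j)."""
    P = expm(t * Q)
    return np.exp(-t) * (mu[None, :] * P.T) / mu[:, None]

t, s = 0.4, 0.9
err = np.abs(Phat(t + s) - Phat(t) @ Phat(s)).max()  # semigroup property
rows = Phat(t).sum(axis=1)                           # substochastic rows
print(err, rows)
```

The semigroup identity P̂(t + s) = P̂(t)P̂(s) holds exactly (the μ factors cancel in the middle), and the row sums come out strictly below 1, reflecting the mass lost to the e^{-t} killing.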


8

MORE EXAMPLES FOR THE


STABLE CASE

1. AN OSCILLATING SEMIGROUP

Let P be a standard stochastic semigroup on the countably infinite set I.
Suppose a, b, c are distinct elements of I. As usual, Q = P'(0) exists by (5.21).
If Q(a, b) > 0 or if Q(a, c) > 0, then P(t, a, b)/P(t, a, c) converges as t → 0,
namely to Q(a, b)/Q(a, c). If P is uniform, then P is analytic by (5.29), so the
convergence holds by l'Hôpital (10.78 and 80). Lester Dubins asked me
whether the convergence held in general. The object in this section is to
provide a counterexample:
(1) Theorem. There is a countable set I, containing a, b, c, and a standard
stochastic semigroup P on I, such that:
(a) all elements of Q = P'(0) are finite;
(b) there is a Markov chain with stationary transitions P, all of whose
sample functions are step functions;
(c) lim sup_{t→0} P(t, a, b)/P(t, a, c) = ∞, and
lim inf_{t→0} P(t, a, b)/P(t, a, c) = 0.
The construction

OUTLINE. The state space I consists of a, b, c and (d, n, m) for
d = b or c and n = 1, 2, ... and m = 1, ..., f(n). Here f(n) is a positive
integer to be chosen later. Think of it as large. Let b_n and c_n be positive, with
Σ_{n=1}^∞ (b_n + c_n) = 1. Suppose:
(2) d_{n+1} + d_{n+2} + ... = o(d_n) for d = b or c;
(3) lim sup_{n→∞} b_n/c_n = ∞ and lim inf_{n→∞} b_n/c_n = 0.
For instance, if n is even, let
b_n = βn^{-1}2^{-n} and c_n = γn^{-2}2^{-n}.
If n is odd, let
b_n = βn^{-2}2^{-n} and c_n = γn^{-1}2^{-n}.
Choose the positive constants β and γ so Σ_{n=1}^∞ (b_n + c_n) = 1.
I want to thank Isaac Meilijson for checking the final draft of this chapter,
which can be skipped on a first reading of the book.
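One admissible normalization (taking β = γ, a choice the single constraint Σ(b_n + c_n) = 1 leaves free) can be checked numerically: the ratio b_n/c_n equals n along even n and 1/n along odd n, so (3) holds, and the tails are geometric, so (2) holds as well:

```python
def b_c(n, beta, gamma):
    """The example's b_n and c_n, alternating by parity of n."""
    if n % 2 == 0:
        return beta * 2.0 ** -n / n, gamma * 2.0 ** -n / n ** 2
    return beta * 2.0 ** -n / n ** 2, gamma * 2.0 ** -n / n

raw = sum(sum(b_c(n, 1.0, 1.0)) for n in range(1, 200))
beta = gamma = 1.0 / raw                     # normalize so total mass is 1
total = sum(sum(b_c(n, beta, gamma)) for n in range(1, 200))

ratios = {n: b_c(n, beta, gamma)[0] / b_c(n, beta, gamma)[1] for n in (40, 41)}
print(total, ratios)   # b_n/c_n = n for even n, 1/n for odd n
```

Along even n the ratio grows without bound, along odd n it vanishes, which is exactly the oscillation the theorem transfers to P(t, a, b)/P(t, a, c).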
Let 0 < q_{n,m} < ∞. These numbers will be chosen later; think of them as
large. On a convenient probability triple (Ω, ℱ, 𝒫), let T_0 be exponential
with parameter 1; and let T_{n,m} be exponential with parameter q_{n,m} for
n = 1, 2, ... and m = 1, ..., f(n). Suppose the f(n) + 1 variables
T_0, T_{n,1}, ..., T_{n,f(n)}
are mutually independent, for each n.
Construct an informal stochastic process as in Figure 1, with d = b or c.
The process starts in a, stays there time T_0, then jumps to (d, n, 1) with
probability d_n. Having reached (d, n, m), the process stays there time T_{n,m},
and then jumps to (d, n, m + 1) unless m = f(n), in which case the process
jumps to d. Having reached d, the process stays there. By (5.39), this process
is a Markov chain with stationary transitions P, which are standard and
stochastic on I; and (a) holds. Later, I will choose q_{n,m} and f(n), and argue (c).

Formal use of (5.39)

The elements of (5.39) are I, Γ, q. The state space I has already been defined.
Define the substochastic matrix Γ on I as follows, with d = b or c and with
n = 1, 2, ...:
Γ[a, (d, n, 1)] = d_n;
Γ[(d, n, m), (d, n, m + 1)] = 1 for m = 1, ..., f(n) - 1;
Γ[(d, n, f(n)), d] = 1;
all other entries in Γ vanish. Define the function q on I as follows, with
d = b or c and n = 1, 2, ...:
q(a) = 1;
q(d, n, m) = q_{n,m};
q(d) = 0.
Figure 1. [Transition diagram: from a, an arrow labeled b_n leads to the row
(b, n, 1) → (b, n, 2) → ... → (b, n, f(n)) → b, and an arrow labeled c_n leads to
the row (c, n, 1) → (c, n, 2) → ... → (c, n, f(n)) → c, for n = 1, 2, ....]

Now (5.39) yields a process X and a probability π_i which makes X Markov
with stationary transitions P and starting state i. The semigroup P is standard
and stochastic on I ∪ {∂}, where ∂ is absorbing. As you will agree in a
minute, X cannot really reach ∂ starting from i ∈ I; so P is standard and
stochastic when retracted to I. Use (5.39) to check (1a-b).
The visiting process in (5.39) was called {ξ_n}, and the holding time process,
{τ_n}. Let Ω_0 be the set where ξ_0 = a, and ξ_1 = (d, n, 1) for some d and n,
while ξ_1 = (d, n, 1) implies:

τ_m < ∞ and ξ_m = (d, n, m) for m = 1, ..., f(n);

τ_m = ∞ and ξ_m = d for m = f(n) + 1;

τ_m = ∞ and ξ_m = ∂ for m > f(n) + 1.
Use (5.37) to check (4-6):

(4) π_a(Ω_0) = 1;
(5) π_a{ξ_1 = (d, n, 1)} = d_n;
(6) given {ξ_1 = (d, n, 1)}, the conditional π_a-distribution of
τ_0, τ_1, ..., τ_{f(n)}
coincides with the 𝒫-distribution of
T_0, T_{n,1}, ..., T_{n,f(n)}. *

Aside. It is easy to make the chain more attractive. Let it stay in b or c
for an independent, exponential time, and then return to a. This complicates
the argument, but not by much. Some of the details for this modification are
presented later in the section.

Lemmas

Here are some preliminaries to choosing f(n) and q_{n,m}. For (7), let U_n
and V_n be random variables on a probability triple (Ω_n, ℱ_n, 𝒫_n). Suppose
U_n has a continuous 𝒫_n-distribution function F, which does not depend on n.
Suppose V_n converges in 𝒫_n-probability to the constant v, as n → ∞. Let
G be the 𝒫_n-distribution function of U_n + v, so
G(t) = F(t - v).
(7) Lemma. 𝒫_n{U_n + V_n ≤ t} → G(t) uniformly in t, as n → ∞.

PROOF. Let ε > 0. Check

{U_n + V_n ≤ t} ⊂ {U_n + v ≤ t + ε} ∪ {V_n < v - ε}:
because U_n(ω) + V_n(ω) ≤ t and V_n(ω) ≥ v - ε imply
U_n(ω) + v ≤ U_n(ω) + V_n(ω) + ε ≤ t + ε.
Similarly,
{U_n + v ≤ t - ε} ⊂ {U_n + V_n ≤ t} ∪ {V_n > v + ε}.
So,
𝒫_n{U_n + V_n ≤ t} ≤ 𝒫_n{U_n + v ≤ t + ε} + 𝒫_n{V_n < v - ε}
and
𝒫_n{U_n + v ≤ t - ε} ≤ 𝒫_n{U_n + V_n ≤ t} + 𝒫_n{V_n > v + ε}.
Let
δ(ε) = sup_t [G(t + ε) - G(t - ε)],
so δ(ε) ↓ 0 as ε ↓ 0. The absolute value of

𝒫_n{U_n + V_n ≤ t} - 𝒫_n{U_n + v ≤ t}
is at most
δ(ε) + 𝒫_n{|V_n - v| > ε}. *
Let
S_n = Σ_{m=1}^{f(n)} T_{n,m}.

(8) Fact. S_n has mean Σ_{m=1}^{f(n)} 1/q_{n,m} and variance Σ_{m=1}^{f(n)} 1/q_{n,m}².

PROOF. Use (5.31). *

Choosing f and q

The program is to define f(n) and {q_{n,m}: m = 1, ..., f(n)} inductively on
n, and with them a sequence t_n ↓ 0, such that
(9) P(t_n, a, d) ≈ ½ d_n t_n as n → ∞, for d = b or c.
x_n ≈ y_n means x_n/y_n → 1. Relations (3) and (9) establish (c).
Fix a positive sequence ε_n with ε_n ≤ 1 and
(10) ε_n = o(b_n) and ε_n = o(c_n) as n → ∞.
Abbreviate
θ_n = Σ_{m=1}^{f(n)} 1/q_{n,m}.

For N ≥ 2, I will require:

(11) 0 < t_N < ½ t_{N-1};
(12) θ_N = ½ t_N;
(13) 𝒫{T_0 + S_n ≤ t_N} < ε_N t_N for n = 1, ..., N - 1;
(14) 1 - ε_N < 𝒫{T_0 + S_N ≤ t_n}/𝒫{T_0 + θ_N ≤ t_n} < 1 + ε_N
for n = 1, ..., N.
Let f(1) = 1 and t_1 = 1 and q_{1,1} = 2, so (12) holds even at N = 1. Let
N ≥ 2. Suppose f(n), t_n, and q_{n,m} chosen for n < N, so that (11-12) hold.
I will choose f(N), t_N, and q_{N,m}. To begin with, (5.34) implies that for n < N,
𝒫{T_0 + S_n ≤ t} = o(t) as t → 0.
Choose t_N so (11) and (13) hold. Now choose f(N) and q_{N,m} so that

Σ_{m=1}^{f(N)} 1/q_{N,m} = ½ t_N,

making (12) good; while
σ_N² = Σ_{m=1}^{f(N)} 1/q_{N,m}²
is so small that (14) holds. I can do this because (8) and Chebychev make
S_N → θ_N in probability as σ_N → 0. But (11) and (12) make
θ_N < t_N < t_{N-1} < ... < t_1,

so 𝒫{T_0 + θ_N ≤ t_n} > 0 for n = 1, ..., N. Now (7) gets (14). This
completes the construction. *

The rest of the proof

ARGUMENT FOR (9). I will continue with the notation established for
(5.39). Let
σ_n = τ_1 + ... + τ_{f(n)}.
As usual, d = b or c. Let
d(n, t) = {ξ_1 = (d, n, 1) and τ_0 + σ_n ≤ t}.
As (4) implies, {X(t) = d} differs by a π_a-null set from ∪_{n=1}^∞ d(n, t). As (5, 6)
imply,
π_a{d(n, t)} = d_n 𝒫{T_0 + S_n ≤ t}.
So
(15) P(t, a, d) = Σ_{n=1}^∞ d_n 𝒫{T_0 + S_n ≤ t}.
Use (14) with N = n:
𝒫{T_0 + S_n ≤ t_n} ≈ 𝒫{T_0 + θ_n ≤ t_n}.
Abbreviate
e(x) = 1 - e^{-x}.

Use (12) with N = n:

𝒫{T_0 + θ_n ≤ t_n} = e(t_n - θ_n) = e(½t_n) ≈ ½t_n.

So
(16) d_n 𝒫{T_0 + S_n ≤ t_n} ≈ ½ d_n t_n as n → ∞.

Suppose N > n. Use (14) and ε_N ≤ 1 to check this estimate:

𝒫{T_0 + S_N ≤ t_n} ≤ (1 + ε_N)𝒫{T_0 + θ_N ≤ t_n}
≤ 2𝒫{T_0 ≤ t_n}
= 2e(t_n)
≤ 2t_n.
Use (2):
(17) Σ_{N=n+1}^∞ d_N 𝒫{T_0 + S_N ≤ t_n} = o(d_n t_n) as n → ∞.

Let ν = 1, ..., n - 1. Then (13) with n for N and ν for n makes
𝒫{T_0 + S_ν ≤ t_n} ≤ ε_n t_n.
Since Σ_ν d_ν ≤ 1, relation (10) makes

(18) Σ_{ν=1}^{n-1} d_ν 𝒫{T_0 + S_ν ≤ t_n} = o(d_n t_n) as n → ∞.

Combine (15-18) to get (9). *

Modifications

Let T_1 be an exponential random variable with parameter 1 on (Ω, ℱ, 𝒫).
Suppose the f(n) + 2 variables
T_0, T_{n,1}, ..., T_{n,f(n)}, T_1
are mutually independent, for each n. Modify the chain so that on first
reaching b or c, it stays there time T_1, then jumps back to a. On returning to
a, make the chain restart afresh. In formal (5.39) terms, this amounts to
redefining Γ(d, ·) and q(d) for d = b or c:
Γ(d, a) = 1;
q(d) = 1.
As before, let Ω_0 be the set where ξ_0 = a, and ξ_1 = (d, n, 1) for some d and
n, while ξ_1 = (d, n, 1) implies:
ξ_m = (d, n, m) for m = 1, ..., f(n);
ξ_m = d for m = f(n) + 1;
ξ_m = a for m = f(n) + 2;
and τ_m < ∞ for all m = 0, 1, .... Use (5.37) to check:
π_a{Ω_0} = 1;
π_a{ξ_1 = (d, n, 1)} = d_n;
given {ξ_1 = (d, n, 1)}, the conditional π_a-distribution of
τ_0, τ_1, ..., τ_{f(n)+1}
coincides with the 𝒫-distribution of
T_0, T_{n,1}, ..., T_{n,f(n)}, T_1.
The conclusions and proof of (1) apply to this modified chain, provided
that {t_n} satisfies (19) in addition to (11-14):
(19) 𝒫{T_0 + T_1 ≤ t_n} < ε_n t_n.

Here are some clues. Keep
σ_n = τ_1 + ... + τ_{f(n)}.
Let
d⁺(n, t) = {ξ_1 = (d, n, 1) and τ_0 + σ_n ≤ t < τ_0 + σ_n + τ_{f(n)+1}};

d*(n, t) = {ξ_1 = (d, n, 1) and τ_0 + σ_n + τ_{f(n)+1} ≤ t}.

On Ω_0,

{∪_{n=1}^∞ d⁺(n, t)} ⊂ {X(t) = d} ⊂ {∪_{n=1}^∞ d⁺(n, t)} ∪ {∪_{n=1}^∞ ∪_{d=b,c} d*(n, t)}.
So the new P(t, a, d) is trapped in [D⁺(t), D⁺(t) + D*(t)], where
D⁺(t) = Σ_{n=1}^∞ π_a{d⁺(n, t)}
and
D*(t) = Σ_{n=1}^∞ Σ_{d=b,c} π_a{d*(n, t)}.
But
π_a{d⁺(n, t)} = d_n 𝒫{T_0 + S_n ≤ t < T_0 + S_n + T_1}
and
π_a{d*(n, t)} = d_n 𝒫{T_0 + S_n + T_1 ≤ t} ≤ d_n 𝒫{T_0 + T_1 ≤ t}.
Check
{T_0 + S_n ≤ t} ∩ {T_1 > t} ⊂ {T_0 + S_n ≤ t < T_0 + S_n + T_1}
⊂ {T_0 + S_n ≤ t};
so
𝒫{T_0 + S_n ≤ t} · 𝒫{T_1 > t} ≤ 𝒫{T_0 + S_n ≤ t < T_0 + S_n + T_1}
≤ 𝒫{T_0 + S_n ≤ t};
and
π_a{d⁺(n, t)} ≈ d_n 𝒫{T_0 + S_n ≤ t}
as t → 0, uniformly in n. This means you can estimate D⁺(t) by the old
P(t, a, d). Furthermore, Σ_{n,d} d_n = 1. So
D*(t_N) ≤ 𝒫{T_0 + T_1 ≤ t_N} = o(d_N t_N)

by (10, 19). This term is trash. The overall conclusion: as N → ∞, the new
P(t_N, a, d) is asymptotic to the old P(t_N, a, d). *
Continue with the modified chain. Given the order of visits ξ_0, ξ_1, ...,
the holding times τ_0, τ_1, ... are independent and exponential, so I once
expected
π_a{τ_0 + τ_1 + τ_2 ≤ t} = o(t²) as t → 0.
Since b can be reached from a in two jumps but not in one, I also expected
P(t, a, b) ~ t² as t → 0.
Both expectations were illusory. Let

g_d(t) = π_a{X(t) = d and τ_0 + τ_1 + τ_2 ≤ t}.
Then
(20) P(t, a, d) = g_d(t) + π_a{X(t) = d and τ_0 + τ_1 + τ_2 > t}.
Remember f(1) = 1. Suppose f(n) > 1 for n > 1. Except for a π_a-null set,
{X(t) = d and τ_0 + τ_1 + τ_2 > t}
= {ξ_1 = (d, 1, 1) and τ_0 + τ_1 ≤ t < τ_0 + τ_1 + τ_2}.
So the second term in (20) is
d_1 𝒫{T_0 + T_{1,1} ≤ t < T_0 + T_{1,1} + T_1} = d_1 t² + o(t²) as t → 0.
Consequently,
t^{-2} P(t, a, d) = t^{-2} g_d(t) + d_1 + o(1) as t → 0.
Put d = b or c and t = t_n and divide. Remember (9):

b_n/c_n ≈ P(t_n, a, b)/P(t_n, a, c) = [o(1) + b_1 + t_n^{-2} g_b(t_n)] / [o(1) + c_1 + t_n^{-2} g_c(t_n)].
Now lim sup b_n/c_n = ∞ forces lim sup_{n→∞} t_n^{-2} g_b(t_n) = ∞.
In particular,
(21) lim sup_{t→0} t^{-2} π_a{τ_0 + τ_1 + τ_2 ≤ t} = ∞;
and
(22) lim sup_{t→0} t^{-2} P(t, a, b) = ∞;
here b can be reached from a in two jumps, but not in one. This disposes of
the two illusions. Moreover, (7.52) implies that P(t, a, b) is continuously
differentiable on [0, ∞). And (5.39) implies P'(0, a, b) = 0. Consequently,
(22) implies that P(t, a, b) does not have a finite second derivative at 0.

2. A SEMIGROUP WITH AN INFINITE SECOND DERIVATIVE

In this section, I will prove the following theorem of Juškevič (1959).

(23) Theorem. There is a countable set I containing a and b, and a standard
stochastic semigroup P on I, such that:
(a) all elements of Q = P'(0) are finite;
(b) there is a Markov chain with stationary transitions P, all of whose
sample functions are step functions;
(c) P''(1, a, b) = ∞.

Figure 2. [Transition diagram: from a, an arrow labeled p_n leads to the row
(n, 1) → (n, 2) → ... → (n, n) → b, for n = 1, 2, ....]

The construction

OUTLINE. The states are a, b and (n, m) for m = 1, ..., n and
n = 1, 2, .... Let p_n be positive, with Σ_{n=1}^∞ p_n = 1 and
(24) Σ_{n=1}^∞ p_n n^{1/2} = ∞.
Fix λ > 0. On a convenient probability triple (Ω, ℱ, 𝒫), let T_0 be exponential
with parameter λ, and let T_{n,m} be exponential with parameter n, for
each state (n, m). Suppose the n + 1 variables
T_0, T_{n,1}, ..., T_{n,n}
are mutually independent for each n.
Construct an informal stochastic process as in Figure 2. The process
starts in a, stays there time T_0, and then jumps to (n, 1) with probability p_n.
Having reached (n, m), the process stays there time T_{n,m}, and then jumps
to (n, m + 1) if m < n or to b if m = n. Having reached b, the process stays
there. By (5.39), this process is a Markov chain with stationary transitions P,
which are standard and stochastic on I; and (a) holds. I will argue (c) soon.

Formal use of (5.39)

The elements of (5.39) are I, r, and q. The state space I has already been
r
defined. Define the substochastic matrix on I as follows, with n = 1, 2, ... :
r(a, (n, 1)] = Pn;
r(n, 111), (n, m + 1)] = 1 for m = 1, ... , n - 1;
r(n, n), b] = 1;
262 MORE EXAMPLES FOR THE STABLE CASE [8

all other entries in r vanish. Define the function q on I as follows, with


n = 1,2, ... :
q(a) = A.
q(n, m) = n
q(b) = O.
Now (5.39) constructs a process X and a probability 7Ti which makes X
Markov with stationary transitions P and starting state i. The semigroup P
is standard and stochastic on I U {o}, where 0 is absorbing. As you will see
in a minute, X cannot really reach 0 starting from i E I; so P is standard and
stochastic when retracted to I. Use (5.39) to check (23a-b).
The visiting process in (5.39) was called gn}, and the holding time process,
{Tn}. Let f!(o be the set where ~o = a, and ~1 = (n, 1) for some n, while
~l = (n; 1) implies:

Tm<oo and ~m=(n,m) form=I, ... ,n;


Tm = 00 and ~m =b for m =n+ 1;
Tm = 00 and ~m = 0 for m > n + 1.
Use (5.37) to check (25-27):
(25) 7Ta (f!(O) = 1;
(26) 7Ta{~l = en, I)} = Pn;
(27) given {~l = (n, 1)}, the conditional 7Ta -distribution of

coincides with the 9"-distribution of

*
The rest of the proof

PROOF OF (c). Relation (25) shows that except for a π_a-null set,

{X(t) = b} = ∪_{n=1}^∞ A_n(t),

where
A_n(t) = {ξ_1 = (n, 1) and τ_0 + τ_1 + ... + τ_n ≤ t}.
By (26, 27),

π_a{A_n(t)} = p_n F_n(t),

where
F_n(t) = 𝒫{T_0 + T_{n,1} + ... + T_{n,n} ≤ t}.
Therefore,
P(t, a, b) = Σ_{n=1}^∞ p_n F_n(t).
As (5.34) implies, for all t ≥ 0

(28) 0 ≤ F_n'(t) ≤ λ.
By dominated convergence,

(29) P'(t, a, b) = Σ_{n=1}^∞ p_n F_n'(t).

The density d_n of
S_n = T_{n,1} + ... + T_{n,n}
is the convolution of n exponential densities with parameter n:
d_n(t) = n^n t^{n-1} e^{-nt}/(n - 1)! for t ≥ 0.
By Stirling,
(30) d_n(1) ≈ (n/2π)^{1/2} as n → ∞.
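The Stirling estimate behind (30), namely d_n(1) = n^n e^{-n}/(n - 1)! ≈ (n/2π)^{1/2}, is easy to check numerically (working in logs to avoid overflow):

```python
import math

def log_dn1(n):
    """log d_n(1) = n log n - n - log (n-1)!, using lgamma(n) = log (n-1)!."""
    return n * math.log(n) - n - math.lgamma(n)

for n in (10, 100, 10000):
    ratio = math.exp(log_dn1(n)) / math.sqrt(n / (2 * math.pi))
    print(n, ratio)   # the ratio tends to 1 as n grows
```

The correction is of order 1/(12n), so already at n = 100 the ratio is within a fraction of a percent of 1.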
Abbreviate e(t) for λe^{-λt}. So T_0 has density e(·) on [0, ∞). And the density
F_n' of T_0 + S_n is the convolution of e(·) and d_n. Namely,

F_n'(t) = ∫_0^t d_n(s) λe^{-λ(t-s)} ds for t ≥ 0.

Differentiate this:
F_n''(t) = λd_n(t) - λF_n'(t) for t ≥ 0.
Use (28):

(31) F_n''(t) ≥ λd_n(t) - λ².
In particular,
(32) h^{-1}[F_n'(1 + h) - F_n'(1)] ≥ -λ² for -1 < h < 0 or h > 0.
Let -1 < h < 0 or 0 < h < ∞. Introduce the approximate second
derivatives
s(h) = h^{-1}[P'(1 + h, a, b) - P'(1, a, b)]

s_n(h) = h^{-1}[F_n'(1 + h) - F_n'(1)].

By (29),
s(h) = Σ_{n=1}^∞ p_n s_n(h).
Fix c with 0 < c < (2π)^{-1/2}. Using (30), find a positive integer N so large that
(33) λd_n(1) - λ² ≥ λcn^{1/2} for n ≥ N.
For 1 ≤ n < N, inequality (32) gives
s_n(h) ≥ -λ².

Of course, Σ_{n=1}^{N-1} p_n ≤ 1. So

s(h) ≥ -λ² + Σ_{n=N}^∞ p_n s_n(h).

For n ≥ N,
lim_{h→0} s_n(h) = F_n''(1)  (calculus)
≥ λd_n(1) - λ²  (31)
≥ λcn^{1/2}  (33).
At this point, use Fatou:

lim inf_{h→0} s(h) ≥ -λ² + λc Σ_{n=N}^∞ p_n n^{1/2}.

Now exploit assumption (24). *
Modifications

You can modify the construction so P″(t, 0, 1) = ∞ for all t ∈ C, a given countable subset of [0, ∞). Count C off as {t₁, t₂, …}. Suppose first that all t_ν are positive. Let I consist of a, b, and (ν, n, m) for m = 1, …, n and positive integers ν and n. Rework the chain so that it jumps from a to (ν, n, 1) with positive probability p_{ν,n}, where:
Σ_{ν,n} p_{ν,n} = 1; and Σ_n p_{ν,n} n^{1/2} = ∞ for each ν.
Make the chain jump from (ν, n, m) to (ν, n, m + 1) when m < n, and to b when m = n. Make b absorbing. Let the holding time parameter for a be λ. Let the holding time parameter for (ν, n, m) be n/t_ν.
As before,
P(t, a, b) = Σ_{ν,n} p_{ν,n} F_{ν,n}(t)
and
P′(t, a, b) = Σ_{ν,n} p_{ν,n} F_{ν,n}′(t),
where F_{ν,n} is the distribution function of the sum of n + 1 independent, exponential random variables, of which the first has parameter λ and the other n have parameter n/t_ν. The reason is like (28). I will work on t₁, the other t_ν being symmetric. Let
s(h) = h^{−1}[P′(t₁ + h, a, b) − P′(t₁, a, b)].

Fix a large positive integer N. For ν > 1, or ν = 1 but n < N,

h^{−1}[F_{ν,n}′(t₁ + h) − F_{ν,n}′(t₁)] ≥ −λ².

The reason is like (32). So

s(h) ≥ −λ² + Σ_{n=N}^∞ p_{1,n} h^{−1}[F_{1,n}′(t₁ + h) − F_{1,n}′(t₁)].



Let d_{n,θ} be the density of the sum of n independent, exponential random variables, each having parameter n/θ. Thus,

d_{n,1} = d_n and d_{n,θ}(t) = (1/θ)d_n(t/θ).

And F_{1,n}′ is the convolution of e(·) with d_{n,t₁}. Let 0 < c < (2π)^{−1/2}. Arguing as for (31),
(λc/t₁)n^{1/2} < lim inf_{h→0} h^{−1}[F_{1,n}′(t₁ + h) − F_{1,n}′(t₁)]
for large enough n. For large N, Fatou implies

−λ² + (λc/t₁) Σ_{n=N}^∞ p_{1,n} n^{1/2} ≤ lim inf_{h→0} s(h).


This completes the discussion for C ⊂ (0, ∞).
Now suppose t₁ = 0, and all other t_ν > 0. In this case, let I consist of a, b, and (1, n) for positive integer n, and (ν, n, m) for m = 1, …, n and n = 1, 2, … and ν = 2, 3, …. From a, let the chain jump to (1, n) with probability p_{1,n}, and let the chain jump to (ν, n, 1) with probability p_{ν,n}. Suppose the p_{ν,n} are positive and sum to 1, while
Σ_n p_{1,n} n = ∞ and Σ_n p_{ν,n} n^{1/2} = ∞ for ν = 2, 3, ….
From (1, n) or from (ν, n, n), let the chain jump to b, and let b be absorbing. From (ν, n, m) with m < n, let the chain jump to (ν, n, m + 1). Let the holding time parameter for a be λ. Let the holding time parameter for (1, n) be n. Let the holding time parameter for (ν, n, m) be n/t_ν.
The argument for t_ν > 0 is essentially the same as before. Here is the program for t₁ = 0. Use the same formulas for P(t, a, b) and P′(t, a, b); but now F_{1,n} is the distribution function of the sum of two independent, exponential random variables: the first having parameter λ and the second having parameter n. Define s(h) the same way, for h > 0. For the usual reasons,

s(h) ≥ −λ² + Σ_{n=N}^∞ p_{1,n} h^{−1}[F_{1,n}′(h) − F_{1,n}′(0)];

and (5.34) implies
F_{1,n}′(0) = 0.
Now

F_{1,n}′(h) = ∫₀ʰ ne^{−ns} λe^{−λ(h−s)} ds,

so
lim_{h→0} h^{−1}F_{1,n}′(h) = λn.
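The integral just displayed can be evaluated in closed form: for n ≠ λ it equals nλ(e^{−λh} − e^{−nh})/(n − λ), and h^{−1}F_{1,n}′(h) indeed tends to λn. A small numerical sketch (the closed form is my computation, not the book's):

```python
import math

def F1n_prime(h, lam, n):
    # closed form of int_0^h n e^{-ns} * lam e^{-lam(h-s)} ds, assuming n != lam
    return n * lam * (math.exp(-lam * h) - math.exp(-n * h)) / (n - lam)

lam, n = 2.0, 7
slopes = {h: F1n_prime(h, lam, n) / h for h in (1e-2, 1e-3, 1e-5)}
print(slopes)   # approaches lam * n as h decreases
```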

The rest is the same. *

If C is dense in [0, ∞), it follows automatically that

lim sup_{h→0} h^{−1}[P′(t + h, 0, 1) − P′(t, 0, 1)] = ∞

for a residual set of t in [0, ∞), using the

(34) Fact. If s_ν(t) is a continuous function of t for ν = 1, 2, …, then {t: sup_ν s_ν(t) = ∞} is a G_δ.

According to (Chung, 1960, p. 268), the function P′(t, a, b) is absolutely continuous, so the situation cannot get much worse.

3. LARGE OSCILLATIONS IN P(t, 1, 1)

The main results (35-36) of this section are taken from (Blackwell and Freedman, 1968). They should be compared with (1.1-4) of ACM. Let I_n = {1, 2, …, n}. Let P_n be a generic standard stochastic semigroup on I_n.
(35) Theorem. For any δ > 0, there is a P_n with

P_n(t, 1, 1) < δ for δ ≤ t ≤ 1 − δ and 1 + δ ≤ t ≤ 2 − δ,

while
P_n(1, 1, 1) > 1/e.
In particular,

t → P(t, 1, 1), t → (1/t)∫₀ᵗ P(s, 1, 1) ds, t → [1 − P(t, 1, 1)]/t

are not monotone functions. I remind you that

t → (1/t)∫₀ᵗ f(s) ds

is nonincreasing iff

f(t) ≤ (1/t)∫₀ᵗ f(s) ds.

A more elegant nonmonotone P can be found in Section 4.

(36) Theorem. For any K < 1/2, for any small positive ε, there is a P_n with

P_n(1/2, 1, 1) < 1 − ε − Kε²

and
P_n(1, 1, 1) > 1 − ε.

I will prove (35) and (36) later. Here is some preliminary material, which will also be useful in ACM. Let q and c be positive real numbers. On a convenient probability triple (Ω, ℱ, 𝒫), let T₀, T₁, … be independent and exponential with common parameter q. Define a stochastic process Z as in Figure 3.

Figure 3.
Z(t) = 1 for 0 ≤ t < T₀
and T₀ + c ≤ t < T₀ + c + T₁
and T₀ + c + T₁ + c ≤ t < T₀ + c + T₁ + c + T₂
and so on;
Z(t) = 0 for remaining t.

Let
(37) f(t) = f(q, c; t) = 𝒫{Z(t) = 1}.
The process Z is not Markov. But there are standard stochastic semigroups P_n on {1, 2, …, n} such that P_n(t, 1, 1) → f(t) uniformly on bounded t-sets. I will argue this later (43). Clearly,
(38) f(t) = e^{−qt} for 0 ≤ t ≤ c.
By conditioning on T₀,

(39) f(t) = e^{−qt} + ∫₀^{t−c} qe^{−q(t−c−s)}f(s) ds for t ≥ c.

LONG ARGUMENT FOR (39). Define a new process Z* as follows:

Z*(t) = Z(T₀ + c + t) for 0 ≤ t < ∞.

Now Z* can be constructed from T₁, T₂, … just the way Z was constructed from T₀, T₁, …; so Z* is distributed like Z and is independent of T₀. For t ≥ c,
{Z(t) = 1} = {T₀ > t} ∪ {T₀ ≤ t − c and Z*(t − c − T₀) = 1}.
So
f(t) = e^{−qt} + 𝒫{T₀ ≤ t − c and Z*(t − c − T₀) = 1}.

Fubini says
𝒫{T₀ ≤ t − c and Z*(t − c − T₀) = 1}
= ∫₀^{t−c} 𝒫{Z*(t − c − s) = 1} qe^{−qs} ds
= ∫₀^{t−c} f(t − c − s) qe^{−qs} ds
= ∫₀^{t−c} qe^{−q(t−c−s)} f(s) ds.

This finishes (39).
Substitute (38) into (39):
(40) f(t) = e^{−qt} + q(t − c)e^{−q(t−c)} for c ≤ t ≤ 2c.
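Both (38) and (40) are easy to confirm by Monte Carlo: simulate Z directly from its definition and count how often Z(t) = 1. A simulation sketch (mine, not the book's):

```python
import math, random

def simulate_f(t, q, c, trials=200_000, seed=7):
    # f(t) = P{Z(t) = 1}: Z alternates exponential(q) 1-intervals
    # with gaps of fixed length c, starting in a 1-interval.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0.0                      # left endpoint of the current 1-interval
        while True:
            on = rng.expovariate(q)
            if s <= t < s + on:      # t falls inside a 1-interval
                hits += 1
                break
            s += on + c              # skip this 1-interval and the gap after it
            if s > t:                # t fell inside the gap
                break
    return hits / trials

q, c = 1.0, 0.5
f03 = simulate_f(0.3, q, c)          # compare with (38): e^{-q t}
f08 = simulate_f(0.8, q, c)          # compare with (40)
print(f03, math.exp(-q * 0.3))
print(f08, math.exp(-q * 0.8) + q * 0.3 * math.exp(-q * 0.3))
```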
Let n ≥ 2. Define a matrix Q_n on I_n = {1, …, n} as follows:
(41a) Q_n(1, 1) = −q and Q_n(1, 2) = q
and Q_n(1, j) = 0 for j ≠ 1, 2;
if 1 < i < n,
(41b) Q_n(i, i) = −(n − 1)/c and Q_n(i, i + 1) = (n − 1)/c
and Q_n(i, j) = 0 for j ≠ i, i + 1;
(41c) Q_n(n, n) = −(n − 1)/c and Q_n(n, 1) = (n − 1)/c
and Q_n(n, j) = 0 for j ≠ n, 1.
Check that Q_n satisfies (5.28). Using (5.29),
(42) let P_n be the unique standard stochastic semigroup on I_n with P_n′(0) = Q_n.
(43) Proposition. Fix positive, finite numbers q and c. Define f by (37) and P_n by (41, 42). Then

P_n(t, 1, 1) → f(t) as n → ∞,

uniformly in bounded t.
INFORMAL PROOF. Consider a Markov chain with stationary transitions P_n and starting state 1. The process moves cyclically 1 → 2 → … → n → 1. The holding times are unconditionally independent and exponentially distributed; the holding time in 1 has parameter q; the other holding time parameters are (n − 1)/c. There are n − 1 visits to other states intervening between successive visits to 1. So the gaps between the 1-intervals are independent and identically distributed, with mean c and variance c²/(n − 1). For large n, the first 10^10 gaps are nearly c, so P_n(t, 1, 1) is nearly f(t) for all moderate t. *
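The informal proof can be tried on a machine: build Q_n from (41), check that each row sums to 0, and compare P_n(t, 1, 1) with f(t). To exponentiate the generator I use uniformization, a standard technique that is my choice here, not the book's. A sketch:

```python
import math

def make_Q(n, q, c):
    # Q_n from (41a-c), 0-indexed: state 0 holds at rate q, the others
    # at rate (n-1)/c, moving cyclically 0 -> 1 -> ... -> n-1 -> 0
    rate = (n - 1) / c
    Q = [[0.0] * n for _ in range(n)]
    Q[0][0], Q[0][1] = -q, q
    for i in range(1, n):
        Q[i][i] = -rate
        Q[i][(i + 1) % n] = rate
    return Q

def transition(Q, t, terms=120):
    # uniformization: P(t) = sum_k e^{-Lt}(Lt)^k/k! * A^k, A = I + Q/L
    n = len(Q)
    L = max(-Q[i][i] for i in range(n))
    A = [[(1.0 if i == j else 0.0) + Q[i][j] / L for j in range(n)]
         for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0
    w = math.exp(-L * t)
    P = [[w * term[i][j] for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = [[sum(term[i][a] * A[a][j] for a in range(n))
                 for j in range(n)] for i in range(n)]
        w *= L * t / k
        for i in range(n):
            for j in range(n):
                P[i][j] += w * term[i][j]
    return P

q, c, n, t = 1.0, 1.0, 30, 0.5
Q = make_Q(n, q, c)
row_sums = [sum(row) for row in Q]
P = transition(Q, t)
P00 = P[0][0]
print(P00, math.exp(-q * t))   # P_n(t,1,1) vs f(t) = e^{-qt}, since t <= c
```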

FORMAL PROOF. Use (5.45). Let S be the set of right continuous I_n-valued step functions. Let X be the coordinate process on S. Let {ξ_m} be the visiting process, and let {τ_m} be the holding time process. The probability π = (P_n)₁ on S makes X Markov with transitions P_n and starting state 1. Let r(m) be one plus the remainder when m is divided by n, so
r(m) ∈ I_n and r(m) ≡ m + 1 modulo n.
Let S₀ be the set where for all m = 0, 1, …
0 < τ_m < ∞ and ξ_m = r(m).
Then
(44) π{S₀} = 1.
And with respect to π,
(45) τ₀, τ₁, … are unconditionally independent and exponentially distributed, the parameter for τ_m being q when m is a multiple of n, and (n − 1)/c for other m.
Let θ₀, θ₁, … be the successive holding times of X in 1, and let γ₀, γ₁, … be the successive gaps between the 1-intervals of X. Formally, on S₀ let

γ_m = Σ {τ_ν : mn < ν < (m + 1)n}

for m = 0, 1, …. With respect to π, I say:
(46) θ₀, θ₁, … are independent and exponentially distributed, with common parameter q;
(47) γ₀, γ₁, … are independent and identically distributed, with mean c and variance c²/(n − 1).
The reason for (46) is (45); the reason for (47) is (45) and (5.31). The θ's and the γ's are π-independent, but this doesn't affect the rest of the argument. Use (47) and Chebychev: for each m = 0, 1, …,
(48) γ₀ + … + γ_m converges to (m + 1)c in π-probability as n → ∞.

WARNING. S, X, ξ, τ, θ, γ, and π all depend on n. The π-distribution of γ depends on n. But the π-distribution of θ does not depend on n.
On some convenient probability triple (Ω, ℱ, 𝒫), let T₀, T₁, … be independent and exponentially distributed, with parameter q. So the π-distribution of (θ₀, θ₁, …) coincides by (46) with the 𝒫-distribution of (T₀, T₁, …).
For m = 0, 1, …, let A_m(t) be the event that

θ₀ + γ₀ + … + θ_m + γ_m ≤ t < θ₀ + γ₀ + … + θ_m + γ_m + θ_{m+1}

and let B_m(t) be the event that

T₀ + … + T_m + (m + 1)c ≤ t < T₀ + … + T_m + T_{m+1} + (m + 1)c.

WARNING. A_m(t) depends on n; B_m(t) doesn't.
Use (44): except for a π-null set,

{X(t) = 1} = {θ₀ > t} ∪ ⋃_{m=0}^∞ A_m(t);

so
P_n(t, 1, 1) = π{θ₀ > t} + Σ_{m=0}^∞ π{A_m(t)}.
Use (7) and (48): for each m = 0, 1, …,
π{A_m(t)} → 𝒫{B_m(t)} uniformly in t as n → ∞.

Fix t* with 0 < t* < ∞, and confine t to [0, t*]. Then π{A_m(t)} is bounded, uniformly in n and t, by a quantity which is summable in m. And

𝒫{B_m(t)} = 0 when (m + 1)c > t*.

You can safely conclude

(49) lim_{n→∞} P_n(t, 1, 1) = 𝒫{T₀ > t} + Σ_{m=0}^∞ 𝒫{B_m(t)}

uniformly in t ≤ t*. Remember the definition (37) of f; by inspection, the right side of (49) is f(t). *
PROOF OF (35). Define f by (37). Let c < 1 increase to 1, and let q = 1/(1 − c). Then f tends to 0 uniformly on [δ, 1 − δ] by (38), and on [1 + δ, 2 − δ] by (40), while f(1) decreases to 1/e by (40). That is, you can find q and c so f = f(q, c; ·) satisfies the two inequalities of (35). Now use (43) to approximate f by P_n(·, 1, 1). *
PROOF OF (36). Define f by (37). Fix c = 1/2, and let q decrease to 0. Abbreviate
g(q) = f(q, 1/2; 1/2) and h(q) = f(q, 1/2; 1).
Then
g(q) = e^{−q/2} by (38)
= 1 − q/2 + q²/8 + O(q³);
h(q) = e^{−q} + (q/2)e^{−q/2} by (40)
= 1 − q/2 + q²/4 + O(q³).

Consequently,

1 − h(q) = q/2 + O(q²);
h(q) − g(q) = q²/8 + O(q³)
= (1/2)[1 − h(q)]² + O([1 − h(q)]³).

Fix K with 0 < K < 1/2. Choose q* > 0 but so small that on (0, q*):
h is strictly decreasing;

h − g > K(1 − h)².
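The two expansions can be checked numerically straight from (38) and (40), with c = 1/2 and a small q. A sketch (mine, not the book's):

```python
import math

def f38(q, t):      # (38): f(t) = e^{-qt}, valid for 0 <= t <= c
    return math.exp(-q * t)

def f40(q, c, t):   # (40): valid for c <= t <= 2c
    return math.exp(-q * t) + q * (t - c) * math.exp(-q * (t - c))

q = 1e-3
g = f38(q, 0.5)           # g(q) = f(q, 1/2; 1/2)
h = f40(q, 0.5, 1.0)      # h(q) = f(q, 1/2; 1)
print((1 - h) / q)        # near 1/2, matching 1 - h(q) = q/2 + O(q^2)
print((h - g) / q ** 2)   # near 1/8, matching h(q) - g(q) = q^2/8 + O(q^3)
```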

Let 0 < ε < 1 − h(q*). Choose q′ with 0 < q′ < q*, so
h(q′) = 1 − ε.
Then
g(q′) < h(q′) − K[1 − h(q′)]² = 1 − ε − Kε².

By continuity, there is a positive q less than q′, but very close, with

h(q) > 1 − ε
g(q) < 1 − ε − Kε².

That is, you can find small positive q and c = 1/2 so that f = f(q, c; ·) satisfies the two inequalities of (36). Now use (43) to approximate f by P_n(·, 1, 1). *
4. AN EXAMPLE OF SPEAKMAN

My object in this section is to give Speakman's (1967) example of two standard stochastic semigroups P and P̄ on I = {1, 2, 3}, with P = P̄ for some but not all times. Let

Q(i, i) = −1, Q(i, i + 1) = 1 cyclically, and Q(i, j) = 0 otherwise;
Q̄(i, i) = −1 and Q̄(i, j) = 1/2 for j ≠ i.

By (5.29), there are unique standard stochastic semigroups P and P̄ on I with P′(0) = Q and P̄′(0) = Q̄. In particular, P ≠ P̄.
(50) Theorem. P(nc) = P̄(nc) for n = 0, 1, …, where c = 4π·3^{−1/2}. However, P ≠ P̄.

PROOF. It is even possible to write down P and P̄ explicitly. Here is P:

P(t, 1, 1) = P(t, 2, 2) = P(t, 3, 3) = 1/3 + (2/3)e^{−3t/2} cos(3^{1/2}t/2)

P(t, 1, 2) = P(t, 2, 3) = P(t, 3, 1) = 1/3 + (2/3)e^{−3t/2} cos(3^{1/2}t/2 − 2π/3)

P(t, 1, 3) = P(t, 2, 1) = P(t, 3, 2) = 1/3 + (2/3)e^{−3t/2} cos(3^{1/2}t/2 + 2π/3).

Here is P̄:
P̄(t, i, i) = 1/3 + (2/3)e^{−3t/2} for i = 1, 2, 3
P̄(t, i, j) = 1/3 − (1/3)e^{−3t/2} for i ≠ j in I.
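Theorem (50) can be confirmed directly from these closed forms: at t = nc with c = 4π·3^{−1/2}, the cosine arguments advance by full turns, so all nine entries agree, while at t = 1 the (1, 1) entries already differ. A numerical sketch of that check:

```python
import math

def P(t, i, j):
    # the explicit cyclic semigroup P above; i, j in {1, 2, 3}
    shift = {0: 0.0, 1: -2 * math.pi / 3, 2: 2 * math.pi / 3}[(j - i) % 3]
    return 1/3 + (2/3) * math.exp(-1.5 * t) * math.cos(math.sqrt(3) * t / 2 + shift)

def Pbar(t, i, j):
    return 1/3 + (2/3 if i == j else -1/3) * math.exp(-1.5 * t)

c = 4 * math.pi / math.sqrt(3)
agree = max(abs(P(n * c, i, j) - Pbar(n * c, i, j))
            for n in range(4) for i in (1, 2, 3) for j in (1, 2, 3))
differ = abs(P(1, 1, 1) - Pbar(1, 1, 1))
print(agree, differ)   # agree is tiny; differ is not
```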
I will check P; you should check P̄. By (5.29), the matrix Q generates a standard stochastic semigroup; by (5.26), the semigroup is the unique solution to the forward or backward system. Now P(0) is the identity matrix, and P is differentiable; all I owe you is
P′ = QP.
Ostensibly, there are 9 things to check. But P and Q are invariant under the cyclic permutation 1 → 2 → 3 → 1, so I only have to check row 1.
Abbreviate θ = 3^{1/2}t/2 and A = 2π/3. Remember from high school:

cos A = −1/2 and sin A = 3^{1/2}/2

cos(θ ± A) = −(1/2) cos θ ∓ (3^{1/2}/2) sin θ.
In particular, rows 1 and 2 of P sum to 1; so row 1 of P′ as well as row 1 of QP sum to 0, and I only have to check the (1, 1) and (1, 2) positions. I will do (1, 1), and leave (1, 2) to you.
Position (1, 1) on the left is P′(t, 1, 1), namely
−e^{−3t/2} cos θ − 3^{−1/2}e^{−3t/2} sin θ.
Position (1, 1) on the right is −P(t, 1, 1) + P(t, 2, 1), namely
−(2/3)e^{−3t/2} cos θ + (2/3)e^{−3t/2} cos(θ + A).
Multiply both expressions by (3/2)e^{3t/2} and add cos θ: the left side becomes

−(1/2) cos θ − (3^{1/2}/2) sin θ

and the right side becomes
cos(θ + A).
I have already claimed the last two expressions are equal.

The same trigonometry, coupled with cos(2nπ) = 1, shows that P(nc) = P̄(nc). *
(51) Remark. P(t, 1, 1) is not a monotone function of t.

5. THE EMBEDDED JUMP PROCESS IS NOT MARKOV

Consider a Markov chain with countable state space, stationary standard


transitions, and continuous time parameter. Suppose all states are stable.
As shown in (7.33), the embedded jump process is a discrete time Markov
chain with stationary transitions, up to the first infinity. My object in this
section is to show, by example, that the embedded jump process need not
have this property between the first infinity and the second infinity, even
when there is a second infinity. The possibility of this phenomenon was
suggested by (Hunt, 1960); a related phenomenon is exhibited in Section
2.13 of B & D.
Let I be the integers. Let 1/2 < p < 1. A discrete time Markov chain is a p-walk iff it has state space I, and moves from j to j + 1 with probability p, and from j to j − 1 with probability 1 − p. You should check that

(n choose j) pʲ(1 − p)^{n−j}

is maximized for j satisfying

np − (1 − p) < j ≤ np + p,

and is then of order n^{−1/2}: for example, use (Feller, 1968, Theorem on p. 158), followed by Stirling. In particular,

(52) max_j (n choose j) pʲ(1 − p)^{n−j} → 0 as n → ∞.
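Both the decay in (52) and the n^{−1/2} rate are easy to see numerically: compute max_j of the binomial probabilities through log-gamma (to avoid huge integers) and watch m_n·n^{1/2} settle near [2πp(1 − p)]^{−1/2}, the local central limit value. A sketch (mine, not the book's):

```python
import math

def max_binom(n, p):
    # max over j of C(n, j) p^j (1-p)^(n-j), computed on the log scale
    logs = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log(1 - p)
            for j in range(n + 1)]
    return math.exp(max(logs))

p = 0.7
m = {n: max_binom(n, p) for n in (100, 400, 1600)}
for n in sorted(m):
    print(n, m[n], m[n] * math.sqrt(n))   # last column is nearly constant
```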

Let {ξ_n : n = …, −1, 0, 1, …} be a stochastic process on the probability triple (Ω, ℱ, 𝒫). Can {ξ_n} be a p-walk? Suppose it were. Then
𝒫{ξ₀ = k} = Σ_j 𝒫{ξ₀ = k | ξ₋ₙ = k − j} · 𝒫{ξ₋ₙ = k − j}
≤ max_j 𝒫{ξ₀ = k | ξ₋ₙ = k − j}.
But

𝒫{ξ₀ = k | ξ₋ₙ = k − j} = (n choose j) pʲ(1 − p)^{n−j}.

Let n → ∞ and use (52) to get 𝒫{ξ₀ = k} = 0 for all k. Sum out k to get 1 = 0.

(53) No probability triple supports a p-walk with time parameter running through all the integers.
WARNING. It's easy to manufacture a process {ξ_n : integer n} such that the ξ_{n+1} − ξ_n are independent, 1 with probability p, and −1 with probability 1 − p. For instance, start a p-walk at the value 0 with time going forward from 0; and start an independent (1 − p)-walk at the value 0 with time going backward from 0. Such a process has independent increments all right, but isn't Markov.
Endow I with the discrete topology, and let Ī = I ∪ {φ} be its one-point compactification. Here is the main result of this section.
(54) Theorem. There is a probability triple (𝒳₀, ℱ, π_i), and a stochastic process {X(t): 0 ≤ t < ∞} on (𝒳₀, ℱ, π_i), such that:
(a) for each x ∈ 𝒳₀, the function X(·, x) is Ī-valued, continuous from the right, and has a limit from the left, at all times;
(b) X(t, x) = φ iff t = φ_n(x) for n = 1, 2, …, where
0 < φ₁(x) < φ₂(x) < … < φ_n(x) → ∞ as n → ∞;
at these times, X(·, x) is continuous;
(c) between φ_n(x) and φ_{n+1}(x), the function X(·, x) visits each state at least once;
(d) on [0, φ₁), the sequence of states visited by X is a p-walk starting from i;
(e) X is a Markov chain with stationary standard transitions.
Let ξ = {…, ξ₋₁, ξ₀, ξ₁, …} be ℱ-measurable, I-valued functions giving the order of visits paid by X on (φ₁, φ₂). Here is one way to formalize this idea. Suppose ψ₀ is a function, with φ₁ < ψ₀ < φ₂, and X discontinuous at ψ₀. For n = 1, 2, …, let ψ_n be the time of the nth discontinuity in X after ψ₀, and let ψ₋ₙ be the time of the nth discontinuity in X before ψ₀. Then ξ gives the order of visits paid by X on (φ₁, φ₂) iff there exists a ψ₀ with φ₁ < ψ₀ < φ₂, and X discontinuous at ψ₀, and
ξ_n = X(ψ_n) for n = …, −1, 0, 1, ….
The paradox embodied in (54) is
(55) Proposition. ξ cannot be Markov with stationary transitions.
PROOF. Suppose it were. I will use the strong Markov property separately on ξ and on X, to show that ξ is a p-walk. This contradicts (53). Fix i = 0. Let σ be the least n with ξ_n = 0; this makes sense by (54c). Let T be the least t > φ₁ with X(t) = 0. Then T < φ₂ by (54c). Now ξ_{σ+·} is the sequence

of states visited by X(T + ·) up to its first infinity. By strong Markov (7.38), the shifted process X(T + ·) is distributed like X. So ξ_{σ+·} is a p-walk by (54d). But ξ_{σ+·} has the same transitions as ξ, by discrete-time strong Markov. *
Here are some preliminaries to (54). The segments of the visiting process
in X up to the first infinity and between the successive infinities will be
independent. The visiting process between the nth and n + 1st infinity will
be a strongly approximate p-walk. I prefer not to define these objects now. But
I will point to one. I warn you that the rest of this section is ridiculously hard.

A strongly approximate p-walk


Let (Ω, ℱ, 𝒫) be a convenient probability triple. On it, let {S_m(n): n = 0, 1, …} be independent p-walks starting from m, for nonpositive integer m. I am entitled to assume S_m(n + 1) = S_m(n) ± 1 for all m and n. In view of (1.95-96), I can also assume S_m(n) → ∞ as n → ∞ for all m. Let σ₀ = 0. For m = −1, −2, …, let τ_m be the least n such that S_m(n) = m + 1, and let σ_m = τ₋₁ + … + τ_m. Define a new process S as in Figure 4.

Figure 4. (S is a shift of S₋₂, then a shift of S₋₁, then S is S₀.)

(56) S(n) = S₀(n) for n = 0, 1, …
= S_m(σ_m + n) for n = −σ_m, …, −σ_{m+1}
and m = −1, −2, ….
Formally, S is defined twice at n = −σ_m; but the definitions agree and make S(n) = m. If n < −σ_m, then S(n) < m. On the time interval [−σ_m, −σ_{m+1}], the process S moves from first hitting m to first hitting m + 1; the sequence of moves performed by S on this interval coincides with the sequence of moves performed by S_m until S_m first hits m + 1. Clearly,
(57) S(n + 1) = S(n) ± 1 for all n, and lim_{n→±∞} S(n) = ±∞.
NOTE. S is a strongly approximate p-walk.
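The splicing in (56) can be carried out literally on a finite window of times: run each S_m from m until it first hits m + 1, lay the fragments end to end, and check that the result steps by ±1 and stays below m before time −σ_m. A simulation sketch (mine, not the book's; the window depth is an arbitrary choice):

```python
import random

def splice_S(p, depth=3, seed=1, forward=50):
    # the process S of (56), realized on a finite window of times
    rng = random.Random(seed)
    step = lambda: 1 if rng.random() < p else -1
    past = []   # values S(n) for n = -sigma_{-depth}, ..., -1
    for m in range(-1, -depth - 1, -1):
        x, seg = m, [m]
        while x != m + 1:          # run S_m until it first hits m + 1
            x += step()
            seg.append(x)
        past = seg[:-1] + past     # the endpoint m + 1 starts the next block
    future = [0]                   # S agrees with S_0 on n >= 0
    for _ in range(forward):
        future.append(future[-1] + step())
    return past, future

past, future = splice_S(0.75)
S = past + future
steps_ok = all(abs(S[k + 1] - S[k]) == 1 for k in range(len(S) - 1))
print(past[0], max(past), steps_ok)
```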

Aside. You get the same effect by starting a p-walk at the value 0 with time going forward from 0, and an independent (1 − p)-walk at the value 0 with time going backward from 0, provided you condition the second walk on never returning to 0.

(58) Lemma. Let j ∈ I. Let λ be the least n with S(n) = j. Then

{S(λ + n): n = 0, 1, …}

is a p-walk starting from j, and is independent of

{S(λ − n): n = 0, 1, …}.
PROOF. You should do the easier case j ≥ 0. I will argue the case j < 0. Then λ = −σ_j. You should check that {S(λ − n): n ≥ 0} is measurable on {S_{j−1}, S_{j−2}, …}. I will show that {S(λ + n): n ≥ 0} is measurable on {S_j, S_{j+1}, …, S₀}. This will prove the independence. I also have to argue that S(λ + ·) is distributed like S_j.
To begin with,
S(λ + ·) = S_j(·) on [0, τ_j)
S(λ + τ_j + ·) = S_{j+1}(·) on [0, τ_{j+1})
…
S(λ + τ_j + … + τ₋₂ + ·) = S₋₁(·) on [0, τ₋₁)
S(λ + τ_j + … + τ₋₁ + ·) = S₀(·) on [0, ∞);
where τ_k by definition is the least n with S_k(n) = k + 1. Introduce the corresponding times θ_j, …, θ₋₁ for S_j: namely, θ_j = τ_j is the least n with S_j(n) = j + 1; and θ_k is the least n with
S_j(θ_j + … + θ_{k−1} + n) = k + 1.
Thus, θ_j + … + θ_k is the least n with S_j(n) = k + 1. Use successive doses of strong Markov on S_j to check the next assertion. The joint 𝒫-distribution of the |j| + 1 fragments
S_j(·) on [0, τ_j)
S_{j+1}(·) on [0, τ_{j+1})
…
S₋₁(·) on [0, τ₋₁)
S₀(·) on [0, ∞)

coincides with the joint 𝒫-distribution of the |j| + 1 fragments
S_j(·) on [0, θ_j)
S_j(θ_j + ·) on [0, θ_{j+1})
…
S_j(θ_j + … + θ₋₂ + ·) on [0, θ₋₁)
S_j(θ_j + … + θ₋₁ + ·) on [0, ∞).
But S(λ + ·) is obtained by laying the first lot of fragments together, end to end; and S_j(·) is obtained in a similar way from the second lot. *

It will be helpful to generalize (58). Let N = 0, 1, …. Let A be the set where S(n) = j for N + 1 or more n's. On A, let λ be the n such that S(n) = j for the N + 1st time.

(59) Lemma. Given A, the process {S(λ + n): n = 0, 1, …} is a p-walk starting from j, independent of {S(λ − n): n = 0, 1, …}.
PROOF. Use (58) and strong Markov (1.22). *
This ends the p-walk story.
The construction of π_i
Let C be the set of pairs (m, n) with m = 0, 1, …, and n = 0, 1, … when m = 0; while n is any integer when m > 0. Write (m′, n′) < (m, n) iff m′ < m, or m′ = m but n′ < n. The intervals of constancy in X will be indexed by C. Let Ω be the set of functions ω from C to I. Let

ξ(c)(ω) = ω(c) for c ∈ C and ω ∈ Ω.

This ξ will be the visiting process in X. Give Ω the smallest σ-field over which all ξ(c) are measurable.

(60) Definition. Let Ω₀ be the subset of Ω where:

ξ(m, n + 1) = ξ(m, n) ± 1 for all (m, n) ∈ C;

lim_{n→∞} ξ(0, n) = ∞;

lim_{n→±∞} ξ(m, n) = ±∞ for all m = 1, 2, ….

(61) Definition. For each i ∈ I, let Γ_i be the probability on Ω for which:

ξ(m, ·) are independent for m = 0, 1, …;
ξ(0, ·) is a p-walk starting from i;
ξ(m, ·) is distributed like the S of (56) for m = 1, 2, ….
Using (57),
(62) Γ_i{Ω₀} = 1.
For i ∈ I, let 0 < q(i) < ∞, with
(63) Σ_{i∈I} 1/q(i) < ∞.
The holding time parameter in i will be q(i). Let W be the set of functions w from C to (0, ∞). Let
τ(c)(w) = w(c) for c ∈ C and w ∈ W.
The length of interval c will be τ(c). Give W the smallest σ-field over which all τ(c) are measurable.
Let 𝒳 = Ω × W, in the product σ-field. Let
(64) ξ(c)(ω, w) = ω(c) and τ(c)(ω, w) = w(c)
for c ∈ C and ω ∈ Ω and w ∈ W.
For A ⊂ 𝒳, let A(ω) be the ω-section of A:
(65) A(ω) = {w: w ∈ W and (ω, w) ∈ A}.
(66) Definition. (a) For ω ∈ Ω, let η_{q(ω)} be the probability on W which makes the variables τ(c) independent and exponentially distributed, the parameter for τ(c) being q[ω(c)].
(b) For i ∈ I, let π_i be this probability on 𝒳:

π_i{A} = ∫_Ω η_{q(ω)}{A(ω)} Γ_i(dω),

where Γ_i was defined in (61).
INFORMAL NOTE. The π_i-distribution of ξ is Γ_i. The π_i-conditional distribution of τ given ξ = ω is η_{q(ω)}.
The main line of argument starts at (80).

Some lemmas
To state (67), let F be a finite subset of C. Let f be a function from F to I. Let W_F be the set of functions from F to (0, ∞), with the product σ-field.

Let η_{qf} be the probability on W_F which makes the coordinates independent and exponentially distributed, the parameter for coordinate c ∈ F being q[f(c)]. Let T_F map 𝒳 into W_F:

(T_F x)(c) = τ(c)(x) for c ∈ F and x ∈ 𝒳.

(67) Lemma. For any measurable subset B of W_F,

π_i{ξ(c) = f(c) for c ∈ F, and T_F ∈ B} = Γ_i{ξ(c) = f(c) for c ∈ F} · η_{qf}{B}.

PROOF. Use definition (66). *

To state (68), let ζ_m = {ξ(m, ·), τ(m, ·)}.
(68) Lemma. With respect to π_i, the processes ζ₀, ζ₁, … are independent. For m > 0, the π_i-distribution of ζ_m depends neither on i nor on m.

PROOF. Fix positive integers M and N. Let i(n) ∈ I and 0 ≤ t(n) < ∞ for n = 0, …, N. Let i(m, n) ∈ I and 0 ≤ t(m, n) < ∞ for m = 1, …, M and n = −N, …, N. Let
s(0) = Σ_{n=0}^N q[i(n)]t(n)
s(m) = Σ_{n=−N}^N q[i(m, n)]t(m, n) for m = 1, …, M
s = s(0) + s(1) + … + s(M).
Let
D₀ = {ξ(0, n) = i(n) for n = 0, …, N}
D_m = {ξ(m, n) = i(m, n) for n = −N, …, N} for m = 1, …, M
E₀ = {τ(0, n) > t(n) for n = 0, …, N}
E_m = {τ(m, n) > t(m, n) for n = −N, …, N} for m = 1, …, M.
Let
F = D₀ ∩ D₁ ∩ … ∩ D_M ∩ E₀ ∩ E₁ ∩ … ∩ E_M.
Then
π_i{F} = Γ_i{D₀ ∩ D₁ ∩ … ∩ D_M} · e^{−s} by (67)
= Γ_i{D₀} · Γ_i{D₁} … Γ_i{D_M} · e^{−s} by definition (61).
By (67),

(69) π_i{D_m ∩ E_m} = Γ_i{D_m} · e^{−s(m)} for m = 0, …, M.

So
π_i{F} = Π_{m=0}^M π_i{D_m ∩ E_m},

proving the first assertion through (10.16). The second assertion also follows from (69). Suppose i(m, ·) and t(m, ·) do not depend on m > 0. Then s(m) does not depend on m > 0. And Γ_i{D_m} depends neither on i nor on m > 0, by definition (61). *
The set Ω₁, the index λ, and the σ-field 𝒜
Fix M = 1, 2, … and N = 0, 1, … and j ∈ I.
(70) Definition. (a) Let Ω₁ be the subset of Ω₀, as defined in (60), where ξ(M, ·) visits j at least N + 1 times.
(b) On Ω₁, let λ be the index at which ξ(M, ·) visits j for the N + 1st time. So
(71) ξ(M, λ) = j,
and there are N integers n with n < λ and ξ(M, n) = j.
(c) Let 𝒜 be the σ-field in Ω₁ generated by ξ(m, ·) with m < M and by ξ(M, λ − n) with n ≥ 0.
(d) Let λ(ω, w) = λ(ω) for ω ∈ Ω₁ and w ∈ W.
NOTATION. Ω₁, λ, and 𝒜 all depend on M and N.
For (72) and the proof of (73), let 𝒜₀ be the σ-field in Ω generated by ξ(m, ·) with m < M. Let 𝒜₁ be the σ-field in Ω₁ generated by ξ(M, λ − n) with n ≥ 0. Check
(72) 𝒜 is the σ-field in Ω₁ generated by sets J ∩ K with J ∈ 𝒜₀ and K ∈ 𝒜₁.
NOTATION. K ∈ 𝒜₁ forces K ⊂ Ω₁.


To state (73), let n_B be a positive integer. Let i_B(n) ∈ I for n = 0, …, n_B. Let
D = {ω: ω ∈ Ω₁ and ω[M, λ(ω) + n] = i_B(n) for n = 0, …, n_B}
D* = {ω: ω ∈ Ω and ω(0, n) = i_B(n) for n = 0, …, n_B}.
(73) Lemma. Let h be any nonnegative, 𝒜-measurable function on Ω₁. Then

∫_D h dΓ_i = Γ_i{D*} ∫_{Ω₁} h dΓ_i.

PROOF. Let h = 1_{J∩K}, with J ∈ 𝒜₀ and K ∈ 𝒜₁. Now Ω₁, D, and K are in the σ-field spanned by ξ(M, ·), which is Γ_i-independent of 𝒜₀, from definition (61) of Γ_i. So
Γ_i{J ∩ K} = Γ_i{J} · Γ_i{K}
Γ_i{J ∩ K ∩ D} = Γ_i{J} · Γ_i{K ∩ D}.

Use (59) and (61):

Γ_i{K ∩ D} = Γ_i{K} · Γ_i{D*}.

That is,
Γ_i{J ∩ K ∩ D} = Γ_i{J ∩ K} · Γ_i{D*};
and (73) holds for this h. By (72) and (10.16), the result holds for h = 1_A with A ∈ 𝒜. It then holds for simple 𝒜-functions by linearity, and nonnegative 𝒜-functions by monotone approximation. *
To state (74-79), let C* be the set of all pairs (m, n) ∈ C with m < M, together with the nonnegative integers. Let W* be the set of all functions from C* to (0, ∞), with the product σ-field. Let n_A be a nonnegative integer. Let C_A be a finite subset of C, with m < M for all (m, n) ∈ C_A. Suppose 0 < t_A(c) < ∞ for c ∈ C_A or c = 1, …, n_A. Let W_A be the set of w ∈ W* such that
(74) w(c) > t_A(c) for c ∈ C_A, and
w(n) > t_A(n) for n = 1, …, n_A, and
Σ {w(c): c ∈ C* but c ≠ 0} ≤ t < Σ {w(c): c ∈ C*}.
Check:
(75) W_A is a measurable subset of W*.
Define a mapping T_ω from W to W*:
(76) T_ω(w)(m, n) = w(m, n) for (m, n) ∈ C with m < M
T_ω(w)(n) = w[M, λ(ω) − n] for n = 0, 1, ….
Check:
(77) T_ω is measurable;
(78) the η_{q(ω)}-distribution of T_ω is 𝒜-measurable, as ω varies over Ω₁.
Conclude from (75, 77-78):
(79) the function ω → η_{q(ω)}{T_ω ∈ W_A} is 𝒜-measurable on Ω₁.

The construction of 𝒳₀ and X

(80) Definition. On W:
(a) Let σ(c) be the sum of τ(d) over d ∈ C with d < c.
(b) Let ρ(m) be the sum of τ(m, n) over n with (m, n) ∈ C.
(c) Let σ(c)(ω, w) = σ(c)(w) and ρ(m)(ω, w) = ρ(m)(w) for ω ∈ Ω and w ∈ W.
(d) Let W₀ be the subset of W where σ(c) < ∞ for all c, but sup_c σ(c) = ∞.

(e) Let 𝒳₀ = Ω₀ × W₀, where Ω₀ was defined in (60). Give 𝒳₀ the product σ-field.
(f) On 𝒳₀, let
X(t) = ξ(m, n) when σ(m, n) ≤ t < σ(m, n + 1)
= φ when σ(m, n) ≤ t < σ(m, n + 1) for no (m, n).

Figure 5.

Definition (80f) is illustrated by Figure 5. You should check

(81) X is jointly measurable.
With some thinking, you can get
(82) X has properties (54a-c). The map (m, n) → [σ(m, n), σ(m, n + 1)) is order-preserving from C onto the intervals of constancy in X. The value of X on interval c is ξ(c), and the length of interval c is τ(c). Furthermore,
φ_n = ρ(0) + … + ρ(n − 1) for n = 1, 2, …;
where ρ is defined in (80b), and φ_n is defined in (54b).

(83) Lemma. π_i{𝒳₀} = 1.



PROOF. You have to remember definitions (66) of π_i and (80) of 𝒳₀. I will refer to ρ, defined in (80b). I will argue that

(84) ∫_{Ω₀×W} ρ(m) dπ_i < ∞ for m = 0, 1, ….

Relations (62, 84) and (10.10b) make

π_i{Ω₀ × W and ρ(m) < ∞} = 1 for m = 0, 1, ….

Countable additivity makes
π_i{Ω₀ × W and ρ(m) < ∞ for m = 0, 1, …} = 1.
By thinking,

{ρ(m) < ∞ for m = 0, 1, …} = {σ(c) < ∞ for c ∈ C}.

Next,
sup_c σ(c) = Σ_{m=0}^∞ ρ(m).
Lemma (68) makes ρ(0), ρ(1), … independent; the ρ(m) being identically distributed for m > 0. Since ρ(1) is positive,

π_i{Ω₀ × W and Σ_{m=0}^∞ ρ(m) = ∞} = 1.

So (83) reduces to (84).
So (83) reduces to (84).
To argue (84), let V(j, m)(ω) be the number of n with ω(m, n) = j. Remember the definition (61) of Γ_i. Use (58) and (1.98):

(85a) ∫_{Ω₀} V(j, m) dΓ_i = 1/(2p − 1) for m > 0

(85b) ∫_{Ω₀} V(j, 0) dΓ_i = 1/(2p − 1) for j ≥ i

(85c) ∫_{Ω₀} V(j, 0) dΓ_i = [(1 − p)/p]^{i−j} · 1/(2p − 1) for j < i.

By (85),

(86) ∫_{Ω₀} V(j, m) dΓ_i ≤ 1/(2p − 1) for all i, j, m.
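The expected-visit formulas in (85) are easy to test by Monte Carlo: for a p-walk, count how often it sits at j, truncating the walk after many steps (harmless, since the walk drifts to +∞). A simulation sketch (mine, not the book's):

```python
import random

def mean_visits(p, start, j, walks=3000, horizon=600, seed=3):
    # Monte Carlo estimate of the expected number of n >= 0 with S(n) = j,
    # for a p-walk S started at `start`, truncated at `horizon` steps
    rng = random.Random(seed)
    total = 0
    for _ in range(walks):
        x = start
        for _ in range(horizon):
            if x == j:
                total += 1
            x += 1 if rng.random() < p else -1
    return total / walks

p = 0.75
v_same = mean_visits(p, 0, 0)    # (85b): compare with 1/(2p-1) = 2
v_below = mean_visits(p, 2, 0)   # (85c): compare with ((1-p)/p)**2 / (2p-1)
print(v_same, v_below)
```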
Use (5.31) and definition (66) of η:
(87) for each ω, the function w → w(m, n) on W has η_{q(ω)}-expectation equal to 1/q[ω(m, n)].

Let us compute together, keeping j in I, and m fixed, and (m, n) in C.

∫_{Ω₀×W} ρ(m) dπ_i = ∫_{Ω₀} ∫_W ρ(m)(ω, w) η_{q(ω)}(dw) Γ_i(dω)  by (66)

= ∫_{Ω₀} Σ_n ∫_W τ(m, n)(ω, w) η_{q(ω)}(dw) Γ_i(dω)  by monotone convergence

= ∫_{Ω₀} Σ_n ∫_W w(m, n) η_{q(ω)}(dw) Γ_i(dω)  by (64)

= ∫_{Ω₀} Σ_n 1/q[ω(m, n)] Γ_i(dω)  by (87)

= ∫_{Ω₀} Σ_j V(j, m)(ω)/q(j) Γ_i(dω)  by rearranging

= Σ_j (1/q(j)) ∫_{Ω₀} V(j, m)(ω) Γ_i(dω)  by monotone convergence

≤ Σ_j (1/q(j)) · 1/(2p − 1)  by (86)

< ∞  by (63). *

This completes the construction: (𝒳₀, π_i) is a bona fide probability triple by (83), and X is an I-valued process on this triple by (81). Properties (54a-c) have already been claimed (82). At this point, you could check property (54d), and
(88) π_i{X(0) = i} = 1 for all i ∈ I.
(89) Lemma. π_i{X(t) ∈ I} = 1.
PROOF. Using (54b),
Lebesgue {t: X(t, x) = φ} = 0.
Let E be the set of t such that
π_j{X(t) = φ} > 0 for some j.
Use (81) and Fubini to deduce
(90) Lebesgue E = 0.
Fix i ∈ I. Let S map Ω into itself:
(Sω)(0, n) = ω(0, n + 1)
(Sω)(m, n) = ω(m, n) for m > 0.

Using (60, 61), check that S maps Ω₀ into itself, and

(91) the Γ_i-distribution of S is pΓ_{i+1} + (1 − p)Γ_{i−1}.

Let T map W into itself:
(Tw)(0, n) = w(0, n + 1)
(Tw)(m, n) = w(m, n) for m > 0.
Using definition (80), check that T maps W₀ into itself. Remember definition (66a) of η. Relative to η_{q(ω)}:
(92) T is independent of τ(0, 0) and has distribution η_{q(Sω)};
τ(0, 0) is exponential with parameter q[ω(0, 0)].
Let U map 𝒳 into itself:
U(ω, w) = (Sω, Tw).
Using (80), check that U maps 𝒳₀ into itself, and
X(t, x) = X[t − τ(0, 0)(x), U(x)] when τ(0, 0)(x) ≤ t.
INFORMAL NOTE. U shifts X by τ(0, 0).
So
(93) {X(0, ·) = i and X(t, ·) = φ}
= {X(0, ·) = i and τ(0, 0) ≤ t and X[t − τ(0, 0), U] = φ}.
Abbreviate q = q(i) and π = pπ_{i+1} + (1 − p)π_{i−1}. Combine (91, 92) to see that relative to π_i,
(94) U is independent of τ(0, 0)
U has distribution π
τ(0, 0) is exponential with parameter q.
Fubini up (93, 94):

π_i{X(t) = φ} = q ∫₀ᵗ e^{−qs} π{X(t − s) = φ} ds.

But π{X(t − s) = φ} = 0 for Lebesgue almost all s ∈ [0, t] by (90). *


Let ℱ be the product σ-field on 𝒳₀. Let ℱ(t) be the σ-field spanned by X(s) for s ≤ t. Let s, t ≥ 0. Let i, j, k ∈ I. Let A ∈ ℱ(t) with A ⊂ {X(t) = j}. I will eventually succeed in arguing
(95) Markov property. π_i{A and X(t + s) = k} = π_i{A} · π_j{X(s) = k}.
Relations (88, 89, 95) and lemma (5.4) make X Markov with stationary transitions and starting state i, relative to π_i. These transitions have to be standard by (54a). So (54e) reduces to (95); but the proof of (95) is hairy.
WARNING. Theorem (6.108) does not apply, because condition (6.66) fails.

The special A
(96) Definition. Remember the definition (80) of 𝒳₀ and X. Let 𝒳(0) be the subset of 𝒳₀ where
0 ≤ t < φ₁ and X(t) = j.
Let M = 1, 2, … and N = 0, 1, …. Let 𝒳(M, N) be the subset of 𝒳₀ where:
φ_M < t < φ_{M+1} and X(t) = j; and
the number of j-intervals in X after φ_M but before the one surrounding t is N.
Then 𝒳(0) and 𝒳(M, N) are in ℱ(t). These sets are pairwise disjoint as M and N vary; their union is {X(t) = j}. You should prove (95) when A ⊂ 𝒳(0): it's similar to (5.39). The proof I will give works for these A's, if you treat the notation gently; but it's silly.
(97) I only have to prove (95) for A ⊂ 𝒳(M, N).

[Figure 6: diagram of the map T, matching ξ(0, n) ∘ T and τ(0, n) ∘ T with ξ(M, λ + n) and τ(M, λ + n), and φ₁ ∘ T with φ_{M+1}.]
Figure 6.

So fix positive integer M and nonnegative integer N. Define Ω₁, λ, and 𝒜 by (70), with this choice of M and N. Review (80) and look at Figure 6. Use (71, 82) to make sure that
(98) 𝒳(M, N) = (Ω₁ × W₀) ∩ {σ(M, λ) ≤ t < σ(M, λ + 1)}.
On Ω₁ × W₀, check:
(99)
(100) σ(M, λ − n) = φ_M + Σ_{ν=n+1}^{∞} τ(M, λ − ν);
(101) X(s) = ξ(M, λ − n) for σ(M, λ − n) ≤ s < σ(M, λ − n + 1).

(102) Lemma. Let m < M. Then ξ(m, n) and τ(m, n) can be measurably computed from {X(s): 0 ≤ s < φ_M}, at least on 𝒳₀.
PROOF. Number the intervals of constancy in [0, φ₁] from left to right so that interval number 0 has left endpoint 0. Then X is ξ(0, n) on interval n, which has length τ(0, n), for n = 0, 1, … . Let 1 ≤ m < M. Let φ be the least t > φ_m with X(t) = 0. Number the intervals of constancy on (φ_m, φ_{m+1}) from left to right so that interval number 0 has left endpoint φ. Then X is ξ(m, n) on interval n, which has length τ(m, n), for integer n. *
(103) Let ℬ be the σ-field in 𝒳(M, N) spanned by ξ(m, ·) and τ(m, ·) with m < M and domain cut down to 𝒳(M, N). Let 𝒞 be the σ-field in 𝒳(M, N) spanned by ξ(M, λ − n) and τ(M, λ − n) with n = 1, 2, … and domain cut down to 𝒳(M, N).
(104) Lemma. 𝒳(M, N) ∩ ℱ(t) is spanned by ℬ and 𝒞.

PROOF. φ_M < t on 𝒳(M, N). So ℬ ⊂ 𝒳(M, N) ∩ ℱ(t) by (102). Next, 𝒞 ⊂ 𝒳(M, N) ∩ ℱ(t), because the nth interval of constancy in X before the one at time t is a visit to ξ(M, λ − n) of length τ(M, λ − n): use (98–101). I now have to compute {X(s): 0 ≤ s ≤ t} on 𝒳(M, N) from ℬ and 𝒞. To begin, you can compute {X(s): 0 ≤ s ≤ φ_M} on 𝒳(M, N) from ℬ, using definition (80); and φ_M retracted to 𝒳(M, N) is ℬ-measurable by (99). So σ(M, λ − n) retracted to 𝒳(M, N) is ℬ ∨ 𝒞-measurable by (100), for n = 0, 1, … . You can now compute the fragment {X(s): φ_M < s < σ(M, λ)} on 𝒳(M, N) from ℬ ∨ 𝒞, using (101). Finally,
X(s) = j for σ(M, λ) ≤ s ≤ t on 𝒳(M, N);
my authority is (71, 98, 101). But I peek at Figure 6. *
WARNING. λ retracted to 𝒳(M, N) is not ℱ(t)-measurable.
(105) Definition. Review (70a, b). Call a set A special iff there is a finite subset C_A of C, with m < M for all (m, n) ∈ C_A, a nonnegative integer n_A, a function i_A from {1, …, n_A} ∪ C_A to I, and a function t_A from {1, …, n_A} ∪ C_A to (0, ∞), such that A is the subset of Ω₁ × W₀ where:
ξ(c) = i_A(c) and τ(c) > t_A(c) for all c ∈ C_A;
ξ(M, λ − n) = i_A(n) and τ(M, λ − n) > t_A(n) for all n = 1, …, n_A;
σ(M, λ) ≤ t < σ(M, λ + 1).
I claim
(106) I only have to prove (95) for special A.
Indeed, the special A are subsets of 𝒳(M, N) by (98). They span 𝒳(M, N) ∩ ℱ(t) by (104). Two different special A are disjoint or nested, by inspection. And 𝒳(M, N) is special. So (106) follows from (97) and (10.16).

The mapping T
Remember definition (96) of 𝒳(M, N). Define a mapping T of 𝒳(M, N) into 𝒳, as in Figure 6 on page 286:
(107a) ξ(0, n) ∘ T = ξ(M, λ + n) for n ≥ 0;
(107b) ξ(m, n) ∘ T = ξ(M + m, n) for m > 0;
(107c) τ(0, 0) ∘ T = σ(M, λ + 1) − t;
(107d) τ(0, n) ∘ T = τ(M, λ + n) for n > 0;
(107e) τ(m, n) ∘ T = τ(M + m, n) for m > 0.
Review definition (80) of 𝒳₀ and X. Check that T maps into 𝒳₀, and
(108) X(t + s) = X(s) ∘ T on 𝒳(M, N).
Relation (108) is a straightforward but tedious project, which I leave to you. Consider the assertion
(109) π_i{A ∩ T⁻¹B} = π_i{A} · π_j{B}.
I claim
(110) it is enough to prove (109) for special A and all measurable subsets B of {ξ(0, 0) = j}.
To see this, put B = {X(0) = j and X(s) = k} in (109). Then use (108) to get (95) for the special A. Then use (106).

The special B
(111) Definition. A set B is special iff there is a finite subset C_B of C, with m > 0 for all (m, n) ∈ C_B, a nonnegative integer n_B, a function i_B from {0, …, n_B} ∪ C_B to I, and a function t_B from {0, …, n_B} ∪ C_B to (0, ∞), such that
i_B(0) = j, and
B = B₁ ∩ B₂, where
B₁ = {ξ(0, n) = i_B(n) and τ(0, n) > t_B(n) for n = 0, …, n_B}, while
B₂ = {ξ(c) = i_B(c) and τ(c) > t_B(c) for c ∈ C_B}.
I claim
(112) it is enough to prove (109) for special A and special B.
Indeed, the special B span the full σ-field on {ξ(0, 0) = j}. Two different special B are disjoint or nested, by inspection. And {ξ(0, 0) = j} is special. So (112) follows from (110) and (10.16).

The ultraspecial B
(113) Call B ultraspecial iff B is special in the sense of (111), and C_B is empty: so B = B₁, as defined in (111).
I claim
(114) it is enough to prove (109) for special A and ultraspecial B.
Fix a special B, in the sense of (111). Remember C_B, n_B, i_B, t_B, B₁ and B₂ from (111). Remember
(115) m > 0 for (m, n) ∈ C_B.
Let D₁ be the subset of 𝒳(M, N) where ξ(M, λ + n) = i_B(n) for n = 0, …, n_B and σ(M, λ + 1) > t + t_B(0) and τ(M, λ + n) > t_B(n) for n = 1, …, n_B.
Let D₂ be the subset of 𝒳 where ξ(M + m, n) = i_B(m, n) and τ(M + m, n) > t_B(m, n) for all (m, n) ∈ C_B.
Check
(116a) T⁻¹B₁ = D₁ and T⁻¹B = D₁ ∩ D₂.
Get an A from (105). I claim:
(116b) π_i{A ∩ D₁ ∩ D₂} = π_i{A ∩ D₁} · π_i{D₂}
(116c) π_i{D₂} = π_j{B₂}
(116d) π_j{B₁ ∩ B₂} = π_j{B₁} · π_j{B₂}.
Remember ζ_m = {ξ(m, ·), τ(m, ·)}. In order, 𝒳(M, N), D₁, and A ∩ D₁ are all measurable on ζ₀, …, ζ_M: use (98) for the first move. Next, D₂ is measurable on (ζ_{M+1}, ζ_{M+2}, …) by (115). So (68) proves (116b). Relation (116c) follows from (115) and (68). Finally, B₁ is measurable on ζ₀; and (115) makes B₂ measurable on (ζ₁, ζ₂, …). So (68) proves (116d). Suppose (109) for ultraspecial B. Compute:
π_i{A ∩ T⁻¹B} = π_i{A ∩ D₁ ∩ D₂} by (116a)
= π_i{A ∩ D₁} · π_i{D₂} by (116b)
= π_i{A ∩ D₁} · π_j{B₂} by (116c)
= π_i{A ∩ T⁻¹B₁} · π_j{B₂} by (116a)
= π_i{A} · π_j{B₁} · π_j{B₂} by (109 on B₁)
= π_i{A} · π_j{B₁ ∩ B₂} by (116d)
= π_i{A} · π_j{B} by (111).
This settles (114). I wish I could reward you for coming this far, but the worst lies ahead.

The proof of (109) for special A and ultraspecial B


Review definitions (70, 80). For the rest of the argument, τ(m, n) and σ(m, n) have domain W. Fix one special A in the sense of (105), and one ultraspecial B in the sense of (113). Remember C_A, n_A, i_A and t_A from (105). Remember n_B, i_B, and t_B from (113, 111). Introduce the following subsets of Ω; read (117–119) with the list.
A₁ = {ω: ω ∈ Ω₁ and ω(c) = i_A(c) for all c ∈ C_A, and
ω[M, λ(ω) − n] = i_A(n) for all n = 1, …, n_A}.
D = {ω: ω ∈ Ω₁ and ω[M, λ(ω) + n] = i_B(n) for all n = 0, …, n_B}.
D* = {ω: ω ∈ Ω and ω(0, n) = i_B(n) for all n = 0, …, n_B}.
Introduce the subset H of W:
H = {w: w ∈ W and w(0, n) > t_B(n) for all n = 0, …, n_B}.
For each ω ∈ Ω₁, introduce the following subsets of W.
A′_ω = {w: w ∈ W and w(c) > t_A(c) for all c ∈ C_A, and
w[M, λ(ω) − n] > t_A(n) for all n = 1, …, n_A}.
A″_ω = {w: w ∈ W and σ[M, λ(ω)](w) ≤ t < σ[M, λ(ω) + 1](w)}.
A_ω = A′_ω ∩ A″_ω.
E_ω = {w: w ∈ W and σ[M, λ(ω) + 1](w) > t + t_B(0)}.
F_ω = {w: w ∈ W and w[M, λ(ω) + n] > t_B(n) for all n = 1, …, n_B}.

Check
(117) A = {(ω, w): ω ∈ A₁ and w ∈ A_ω ∩ W₀}
(118) B = D* × H.
Remember i_B(0) = j; use (117–118) and definition (107) to check
(119) A ∩ T⁻¹B = {(ω, w): ω ∈ A₁ ∩ D and w ∈ A_ω ∩ E_ω ∩ F_ω ∩ W₀}.
From (119) and definition (66b):
(120) π_i{A ∩ T⁻¹B} = ∫_{A₁∩D} η_{q(ω)}{A_ω ∩ E_ω ∩ F_ω ∩ W₀} r_i(dω).
As (83) and definition (66b) imply,
(121) η_{q(ω)}{W₀} = 1 for r_i-almost all ω.
By (120, 121):
(122) π_i{A ∩ T⁻¹B} = ∫_{A₁∩D} η_{q(ω)}{A_ω ∩ E_ω ∩ F_ω} r_i(dω).
Fix ω ∈ A₁ ∩ D. Let 𝒜_ω be the σ-field in W spanned by τ(m, n) with (m, n) ≤ (M, λ(ω)). Relative to η_{q(ω)},
𝒜_ω and τ[M, λ(ω) + n] for n = 1, …, n_B
are mutually independent, the nth variable being exponential with parameter q[i_B(n)]. This follows from definition (66a). But A_ω ∩ E_ω ∈ 𝒜_ω. So
(123) η_{q(ω)}{A_ω ∩ E_ω ∩ F_ω} = e^{−v} η_{q(ω)}{A_ω ∩ E_ω}, where
v = Σ_{n=1}^{n_B} t_B(n) q[i_B(n)].
Let Σ_ω be the σ-field in W spanned by τ(m, n) with (m, n) < (M, λ(ω)). Relative to η_{q(ω)}:
Σ_ω and τ[M, λ(ω)] are independent;
τ[M, λ(ω)] is exponential with parameter q(j).
This follows from (71) and definition (66a). But A′_ω and σ[M, λ(ω)] are Σ_ω-measurable. And
σ[M, λ(ω) + 1] = σ[M, λ(ω)] + τ[M, λ(ω)].
So (5.30) makes
(124) η_{q(ω)}{A_ω ∩ E_ω} = e^{−u} η_{q(ω)}{A_ω}, where u = q(j) t_B(0).
Combine (122–124):
(125) π_i{A ∩ T⁻¹B} = e^{−u−v} ∫_{A₁∩D} η_{q(ω)}{A_ω} r_i(dω).


Review definition (70c) of 𝒜, definition (74) of W_A, and (76) of T_ω. Check
A_ω = {w: T_ω(w) ∈ W_A}.
Conclude from (79) that
ω → η_{q(ω)}{A_ω}
is 𝒜-measurable on Ω₁. Check A₁ ∈ 𝒜. Use (73, 125):
π_i{A ∩ T⁻¹B} = [∫_{A₁} η_{q(ω)}{A_ω} r_i(dω)] · [e^{−u−v} r_j{D*}].
Use (117, 121) and definition (66b):
∫_{A₁} η_{q(ω)}{A_ω} r_i(dω) = ∫_{A₁} η_{q(ω)}{A_ω ∩ W₀} r_i(dω) = π_i{A}.
Use (118) and (67):
e^{−u−v} r_j{D*} = π_j{B}.
This settles (109) for special A and ultraspecial B.


*****
6. ISOLATED INFINITIES

The ideas of Section 5 can be used to construct the most general Markov
chain with all states stable and isolated infinities. In this section, an infinity
is a time t such that: in any open interval around t, the sample function
makes infinitely many jumps. I will sketch the program. To begin with, let
Γ be a substochastic matrix on the countably infinite set I. A strongly approximate Γ-chain on the probability triple (Ω, ℱ, 𝒫) is a partially defined, I-valued process ξ(n), which has the strong Markov property for hitting times. More exactly, let J be a finite subset of I. Let Ω_J be the set where ξ(n) ∈ J for some n; on Ω_J, let T_J be the least such n: assume there is a least. Given Ω_J, the process ξ(T_J + ·) is required to be Markov with stationary transitions Γ. This does not make ξ Markov. Incidentally, the time parameter n runs over a random subinterval of the integers; the most interesting case is where n runs over all the integers, so the ξ(n) are defined everywhere.
Let X be a Markov chain with stationary, standard transitions P, and
regular sample functions.
Suppose the infinities of X are isolated, almost surely, and occur at times φ₁ < φ₂ < ⋯. Let φ₀ = 0, and φ_m = ∞ if there are fewer than m infinities.
NOTE. You can show that the set of paths with isolated infinities is
measurable, so its probability can in principle be computed from P and the
starting state.
8.6] ISOLATED INFINITIES 293

As usual, let Q = P′(0) and q(i) = −Q(i, i) < ∞, and
Γ(i, j) = Q(i, j)/q(i) for i ≠ j and q(i) > 0
= 0 elsewhere.
Given {φ_m < ∞}, the order of visits ξ_m(·) paid by X on (φ_m, φ_{m+1}) is a strongly approximate Γ-chain. This is more or less obvious from (7.33) and strong Markov (7.38).
A strongly approximate Γ-chain ξ has an exit boundary B_∞, consisting of extreme, excessive functions; and an entrance boundary B_{−∞}, consisting of extreme, subinvariant measures. As time n increases, ξ(n) converges almost surely to a random point ξ(∞) ∈ B_∞; as time n decreases, ξ(n) converges almost surely to a random point ξ(−∞) ∈ B_{−∞}. Up to null sets, the point ξ(∞) generates the σ-field of invariant subsets of the far future, and the point ξ(−∞) generates the σ-field of invariant subsets of the remote past. Most of this is in Chapter 4.
On {φ_m < ∞}, let ξ_m(−∞) ∈ B_{−∞} be the limit of ξ_m(n) as n decreases, and let ξ_m(∞) ∈ B_∞ be the limit of ξ_m(n) as n increases. Now P is specified by q, Γ, and a kernel K(b, dμ), which is a probability on B_{−∞} for each b ∈ B_∞. At the infinity φ_{m+1}, the process X crosses from the exit point ξ_m(∞) to the entrance point ξ_{m+1}(−∞) using the kernel K. That is, the distribution of ξ_{m+1}(−∞) given φ_{m+1} < ∞ and given X(t) for t ≤ φ_{m+1} is K(ξ_m(∞), ·).
Given φ_{m+1} < ∞ and X(t) for t ≤ φ_{m+1} and ξ_{m+1}(−∞), the visiting process ξ_{m+1}(·) on (φ_{m+1}, φ_{m+2}) is a strongly approximate Γ-chain starting from ξ_{m+1}(−∞). As usual, given the visiting process, the holding times are conditionally independent and exponentially distributed, holding times in j having parameter q(j).
In particular, the conditional distribution of X(φ_{m+1} + ·) given φ_{m+1} < ∞ and X(t) for t ≤ φ_{m+1} depends only on ξ_m(∞). On {φ_{m+1} < ∞}, the pre-φ_{m+1} sigma field is spanned up to null sets by X(t) for t ≤ φ_{m+1} and ξ_{m+1}(−∞); the conditional distribution of X(φ_{m+1} + ·) given φ_{m+1} < ∞ and the pre-φ_{m+1} sigma field depends only on ξ_{m+1}(−∞).
By reversing the procedure, you can construct the general chain with stable states and isolated infinities, from the holding time parameters q, the jump matrix Γ, and the crossing kernel K. It's like Section 5. For details in a similar construction, see (Chung, 1963, 1966).
The construction of a strongly approximate Γ-chain starting from μ ∈ B_{−∞} is not trivial. For simplicity, suppose all i ∈ I are transient relative to Γ. Let {ξ(n)} be a strongly approximate Γ-chain starting from μ ∈ B_{−∞}, on the probability triple (Ω, ℱ, 𝒫). More or less by definition, μ(j) is the mean number of n with ξ(n) = j. As before, let Ω_J be the set where ξ(n) ∈ J for some n, and let T_J be the least such n. Let μ(J) be the distribution of ξ(T_J):
μ(J)(j) = 𝒫{Ω_J and ξ(T_J) = j} for j ∈ J.
It turns out that the main problem in constructing {ξ(n)} is the computation of μ(J): because μ(J) determines the distribution of ξ(T_J + ·). One method is sketched in (Hunt, 1960). Here is another.
Let G(i, j) be the mean number of visits to j by a true Γ-chain starting from i. By the strong Markov property,
Σ_{i∈J} μ(J)(i) G(i, j) = μ(j) for j ∈ J.
This system of linear equations uniquely determines μ(J); I will write down the inversion matrix. Let Γ(J) be the transition matrix of a true Γ-chain watched only when in J, as discussed in Section 2.5. Let Δ_J be the identity matrix on J:
Δ_J(i, j) = 1 for i = j in J
= 0 for i ≠ j in J.
Define G_J and μ_J as follows:
G_J(i, j) = G(i, j) for i and j in J
μ_J(j) = μ(j) for j in J.
The system of equations for μ(J) can now be rewritten in matrix notation:
μ(J) · G_J = μ_J.
I claim
G_J = Σ_{n≥0} (Γ(J))ⁿ:
erasing non-J times doesn't affect the number of visits to j ∈ J. The sum converges beautifully, and
G_J⁻¹ = Δ_J − Γ(J):
boldly multiply both sides of the equation for G_J by (Δ_J − Γ(J)). Therefore,
μ(J) = μ_J · (Δ_J − Γ(J)). *
NOTE. Δ_J, G_J, and μ_J are respectively Δ, G, and μ retracted to J. But Γ(J) and μ(J) are the restrictions of Γ and μ to J in a much subtler sense.
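As a numerical sanity check, here is a sketch with a made-up 3-state strictly substochastic Γ (all numbers illustrative): it computes G = (I − Γ)⁻¹, forms the watched-chain matrix Γ(J) by the standard first-passage formula, and verifies both G_J⁻¹ = Δ_J − Γ(J) and that μ(J) = μ_J · (Δ_J − Γ(J)) solves μ(J) · G_J = μ_J.

```python
import numpy as np

# Hypothetical 3-state transient chain: Gamma is strictly substochastic,
# so the series sum_n Gamma^n converges and G = (I - Gamma)^{-1}.
Gamma = np.array([[0.2, 0.3, 0.1],
                  [0.1, 0.2, 0.3],
                  [0.3, 0.1, 0.2]])
G = np.linalg.inv(np.eye(3) - Gamma)   # mean visit counts

J, Jc = [0, 1], [2]

# Transition matrix of the chain watched only when in J:
# Gamma(J) = Gamma_JJ + Gamma_JJc (I - Gamma_JcJc)^{-1} Gamma_JcJ
A = Gamma[np.ix_(J, J)]
B = Gamma[np.ix_(J, Jc)]
C = Gamma[np.ix_(Jc, Jc)]
D = Gamma[np.ix_(Jc, J)]
GammaJ = A + B @ np.linalg.inv(np.eye(len(Jc)) - C) @ D

G_J = G[np.ix_(J, J)]                  # G retracted to J

# The claim: G_J^{-1} = Delta_J - Gamma(J)
assert np.allclose(np.linalg.inv(G_J), np.eye(len(J)) - GammaJ)

# Hence mu(J) = mu_J (Delta_J - Gamma(J)) solves mu(J) G_J = mu_J
mu_J = np.array([1.0, 2.0])            # an illustrative mu retracted to J
muJ_dist = mu_J @ (np.eye(len(J)) - GammaJ)
assert np.allclose(muJ_dist @ G_J, mu_J)
```

The watched-chain formula for Γ(J) is the usual block computation; the identity for G_J then falls out of the Schur complement of I − Γ.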
This is all I want to say about the isolated infinities case. There are two drawbacks to the theory. First, it is hard to tell from P when the infinities are isolated. Second, there are extreme, invariant μ which do not give the expected number of visits by a strongly approximate Γ-chain on any finite measure space; and you can't tell the players without a program.
8.7] THE SET OF INFINITIES IS BAD 295

This kind of construction probably fails when the set of infinities is count-
able but complicated. It certainly fails when the set of infinities is uncountable.
See Section 2.13 of B & D for an example.

7. THE SET OF INFINITIES IS BAD

Consider a stochastic process {X(t): 0 ≤ t < ∞} on a probability triple (Ω, ℱ, 𝒫). Suppose the process is
(126) a Markov chain with countable state space I, stationary standard transitions, and all states stable.
Give I the discrete topology, and compactify by adjoining one state, φ. Suppose further:
(127) for each ω ∈ Ω and 0 ≤ t < ∞, the I ∪ {φ}-valued function t → X(t, ω) is continuous from the right, has a limit from the left, and is continuous when it takes the value φ;
(128) it is possible to jump from any i ∈ I to any j ∈ I;
(129) each i ∈ I is recurrent.
Let S_φ(ω) = {t: X(t, ω) = φ}. How bad is S_φ(ω)? In view of (127) and (7.17),
(130) S_φ(ω) is closed and has Lebesgue measure 0 for 𝒫-almost all ω.
The object of this section is to indicate, by construction, that (130) is best possible. More precisely, fix a compact subset S of the real line, having Lebesgue measure 0. Call a subset S′ of the real line similar to S iff there is a strictly increasing function f of the real line onto itself, which carries S onto S′, such that f and f⁻¹ are absolutely continuous.
(131) Theorem. There is a stochastic process {X(t): 0 ≤ t < ∞} on a triple (Ω, ℱ, 𝒫) satisfying (126) through (129), such that for all ω ∈ Ω, the set S_φ(ω) includes a countably infinite number of disjoint sets similar to S.
A similar phenomenon appears in Section 2.13 of B & D. I provide an outline of the construction. We're both too tired for a formal argument.
OUTLINE OF THE CONSTRUCTION. Suppose S ⊂ [0, 1], and 0 ∈ S, and 1 ∈ S. Suppose further that 1 is a limit point of S, the other case being similar and easier. Let A be the set of maximal open subintervals of (−∞, 1] complementary to S. The state space I consists of all pairs (a, n) with a ∈ A and n an integer. The construction has parameters p(i) and q(i) for i ∈ I, arbitrary subject to
0 < q(i) < ∞ and Σ_{i∈I} 1/q(i) < ∞;
0 < p(i) < 1 and Π_{i∈I} p(i) > 0.

These parameters enter in the following way. The X process will visit various states i; on reaching i, the process remains in i for a length of time which is exponential with parameter q(i). The holding time is independent of everything else. On leaving i = (a, n), the process jumps to (a, n + 1) with overwhelming probability, namely p(i). It jumps to each other state in I with positive probability summing to 1 − p(i). These other probabilities also constitute parameters for the construction, but they are not important, and can be fixed in any way subject to the constraints given.
The local behavior of the process is now fixed. To explain the global behavior, say a < b for a ∈ A and b ∈ A iff a is to the left of b as a subset of the line; say (a, m) < (b, n) for (a, m) ∈ I and (b, n) ∈ I iff a < b, or a = b and m < n. Fix t and ω with X(t, ω) = φ. The global behavior of X is determined by the requirement that either case 1 or case 2 holds.
Case 1. There is an ε > 0, an a ∈ A, and an integer L, such that: as u increases through (t − ε, t), the function X(u, ω) runs in order through precisely the states (a, n) with n ≥ L. Then there is a δ > 0, an interval c ∈ A with c > a, and an integer K, such that: as u increases through (t, t + δ), skipping times u′ with X(u′, ω) = φ, the function X(u, ω) runs in order through precisely the states: (b, n) with a < b < c, all n; and b = c, but n ≤ K.
Case 2. There is an ε > 0 and an interval a ∈ A such that: as u increases through (t − ε, t), skipping times u′ with X(u′, ω) = φ, the function X(u, ω) runs in order through precisely the states (b, n) with b > a, all n. Then there is a δ > 0 and an integer K such that: as u increases through (t, t + δ), the function X(u, ω) runs in order through precisely the states (b, n) with n ≥ K and b = (−∞, 0). *
OUTLINE OF THE PROOF. Whenever case 2 occurs, there is positive probability Π p(i) that the chain proceeds to move through its states in order. Whenever this occurs, the corresponding section of S_φ(ω) is similar to S. By Borel–Cantelli, infinitely many disjoint sections of S_φ(ω) are similar to S, as required. *
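To see why infinities occur at all, here is a small simulation sketch (the parameter choice q(n) = (n + 1)² is illustrative, not from the text): once the chain starts climbing a ladder (a, 0), (a, 1), …, which it does with probability at least Π p(i) > 0, the total time spent is a sum of independent exponentials whose expected sum Σ 1/q is finite, so the whole ladder is traversed in finite time and the path hits φ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical holding-time parameters along one ladder (a, 0), (a, 1), ...:
# q(n) = (n + 1)^2, so that sum_n 1/q(n) < infinity.  The total time to
# climb the (truncated) ladder is a sum of independent exponentials with
# mean sum_n 1/q(n), hence finite: the ladder is exhausted in finite time.
N = 200                        # truncation of the infinite ladder
q = (np.arange(N) + 1.0) ** 2

total_times = rng.exponential(1.0 / q, size=(2000, N)).sum(axis=1)

expected = (1.0 / q).sum()     # close to pi^2 / 6
assert np.isfinite(total_times).all()
assert abs(total_times.mean() - expected) < 0.15
```

The condition Σ 1/q(i) < ∞ is exactly what keeps these ladder-climbing times finite.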
9

THE GENERAL CASE

1. AN EXAMPLE OF BLACKWELL

My object in this section is to present Blackwell's (1958) example of a standard stochastic semigroup, all of whose states are instantaneous. For other examples of this phenomenon, see Section 3.3 of ACM and Section 2.12 of B & D. To begin with, consider the matrix

Q = [ −λ    λ ]
    [  μ   −μ ]

on {0, 1}, with λ and μ nonnegative, λ + μ positive. There is exactly one standard stochastic semigroup P on {0, 1} with P′(0) = Q, namely:

(1)  P(t, 0, 0) = μ/(μ + λ) + [λ/(μ + λ)] e^{−(μ+λ)t}
     P(t, 0, 1) = 1 − P(t, 0, 0)
     P(t, 1, 1) = λ/(μ + λ) + [μ/(μ + λ)] e^{−(μ+λ)t}
     P(t, 1, 0) = 1 − P(t, 1, 1).

One way to see this is to use (5.29): define P by (1); check P is continuous, P(0) is the identity, P′(0) = Q, and P(t + s) = P(t) · P(s). Dull computations in the last step can be avoided by thinking: it is enough to do μ + λ = 1 by rescaling time; since P(u) is 2 × 2 and stochastic when u is t or s or t + s, it is enough to check that P(t + s) = P(t) · P(s) on the diagonal; by interchanging μ and λ, so 0 and 1, it is enough to check the (0, 0) position. This is easy.
I want to thank Mike Orkin for checking the final draft of this chapter.
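For readers who prefer a machine check, here is a sketch (the rates λ = 2, μ = 0.5 are arbitrary): it compares formula (1) with the matrix exponential e^{tQ} and confirms the semigroup property.

```python
import numpy as np
from scipy.linalg import expm

# Check formula (1) against P(t) = e^{tQ} for illustrative rates.
lam, mu = 2.0, 0.5
Q = np.array([[-lam, lam], [mu, -mu]])

def P(t):
    s = mu + lam
    p00 = mu / s + lam / s * np.exp(-s * t)
    p11 = lam / s + mu / s * np.exp(-s * t)
    return np.array([[p00, 1 - p00], [1 - p11, p11]])

for t in (0.1, 1.0, 3.7):
    assert np.allclose(P(t), expm(t * Q))         # (1) solves P'(0) = Q
assert np.allclose(P(1.3 + 0.4), P(1.3) @ P(0.4)) # semigroup property
```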
Parenthetically, (1) implies the following. Let M be a stochastic matrix on {0, 1}. There is a standard stochastic semigroup P on {0, 1} with P(1) = M iff trace M > 1. The corresponding question for {0, 1, 2}, let alone for an infinite state space, is open. For a recent discussion, see (Kingman, 1962).
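For the 2 × 2 case the "if" direction can be made constructive. A sketch, with an arbitrary M of trace 3/2: by (1), trace P(t) = 1 + e^{−(μ+λ)t}, so μ + λ can be read off from trace M, and the diagonal entries then pin down μ and λ.

```python
import numpy as np
from scipy.linalg import expm

# Given a 2x2 stochastic M with trace M > 1, recover Q with e^Q = M.
# trace P(1) = 1 + e^{-(mu+lam)} by (1), so mu + lam = -log(trace M - 1);
# then P(1, 0, 0) = mu/s + (lam/s) e^{-s} determines mu/s.
M = np.array([[0.7, 0.3], [0.2, 0.8]])            # trace = 1.5 > 1
s = -np.log(np.trace(M) - 1.0)                    # s = mu + lam
a = (M[0, 0] - np.exp(-s)) / (1.0 - np.exp(-s))   # a = mu / s
mu, lam = a * s, (1.0 - a) * s
Q = np.array([[-lam, lam], [mu, -mu]])
assert np.allclose(expm(Q), M)
```

When trace M ≤ 1 the logarithm fails, matching the "only if" direction.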
Now let

Q_n = [ −λ_n    λ_n ]
      [  μ_n   −μ_n ]

with λ_n and μ_n positive, and let P_n be the standard stochastic semigroup on {0, 1} with P′_n(0) = Q_n. Let I be the countable set of infinite sequences i = (i₁, i₂, i₃, …) of 0's and 1's, with only finitely many 1's. Let N(i) be the least N such that n ≥ N implies i_n = 0. Suppose Π_n μ_n/(μ_n + λ_n) > 0, that is,
(2) Σ_n λ_n/(λ_n + μ_n) < ∞.
For t ≥ 0, define the matrix P(t) on I as
(3) P(t, i, j) = Π_n P_n(t, i_n, j_n).
Let {X_n(t): 0 ≤ t < ∞} be 0–1 valued stochastic processes with right continuous step functions for sample functions, on a convenient measurable space (Ω, ℱ). Let X(t) be the sequence (X₁(t), X₂(t), …). For each i ∈ I, let 𝒫_i be a probability on ℱ for which X₁, X₂, … are independent, and X_n is Markov with stationary transitions P_n, starting from i_n. This construction is possible by (5.45) and the existence of product measure. I say
𝒫_i{X(t) ∈ I} = 1 for all t ≥ 0.
Indeed, for n ≥ N(i),
𝒫_i{X_n(t) = 1} = P_n(t, 0, 1) ≤ λ_n/(μ_n + λ_n)
by (1), so Σ_{n=1}^∞ 𝒫_i{X_n(t) = 1} < ∞ by (2). Use Borel–Cantelli.
I will now check that X is a Markov chain, with stationary transitions P, which are stochastic on I. Indeed, suppose 0 < t₁ < ⋯ < t_N < ∞ and j(1), …, j(N) are in I. Let t₀ = 0 and j(0) = i. Let j(n, m) be the mth component of j(n). Then
𝒫_i{X(t_n) = j(n) for n = 1, …, N}
= 𝒫_i{X_m(t_n) = j(n, m) for n = 1, …, N and m = 1, 2, …}
= Π_{m=1}^∞ 𝒫_i{X_m(t_n) = j(n, m) for n = 1, …, N}
= Π_{m=1}^∞ Π_{n=0}^{N−1} P_m[t_{n+1} − t_n, j(n, m), j(n + 1, m)]
= Π_{n=0}^{N−1} Π_{m=1}^∞ P_m[t_{n+1} − t_n, j(n, m), j(n + 1, m)]
= Π_{n=0}^{N−1} P[t_{n+1} − t_n, j(n), j(n + 1)].
Use (5.4) to clinch the argument.
I will now verify that P is standard. Let N ≥ N(i). Then
P(t, i, i) = 𝒫_i{X(t) = i}
= [Π_{n=1}^{N−1} P_n(t, i_n, i_n)] · [Π_{n=N}^∞ P_n(t, 0, 0)].
The first factor tends to 1 as t → 0. Using (1), the second factor is at least Π_{n=N}^∞ μ_n/(μ_n + λ_n), which is nearly 1 for large N by (2).
Finally, suppose
(4) Σ_n λ_n = ∞.
I claim each i ∈ I is instantaneous. Indeed, fix t > 0 and consider
𝒫_i{X(r) = i for all binary rational r with 0 ≤ r ≤ t}.
This number is at most
𝒫_i{X_n(r) = 0 for all binary rational r with 0 ≤ r ≤ t and n ≥ N(i)},
which by independence is the product as n runs from N(i) to ∞ of
𝒫_i{X_n(r) = 0 for all binary rational r with 0 ≤ r ≤ t} = e^{−λ_n t},
where the last equality comes from (5.48) or (7.4). Using (4),
𝒫_i{X(r) = i for all binary rational r with 0 ≤ r ≤ t} = 0.
Now (7.4) forces P′(0, i, i) = −∞. *
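A numerical illustration of the example, with the arbitrary choice λ_n = 1 and μ_n = n², so that (2) and (4) both hold: for i = (0, 0, …), P(t, i, i) tends to 1 as t ↓ 0, yet the difference quotient (1 − P(t, i, i))/t grows without bound, consistent with P′(0, i, i) = −∞.

```python
import numpy as np

# Blackwell's example with lam_n = 1, mu_n = n^2 (illustrative):
# sum lam_n/(lam_n + mu_n) < inf gives (2); sum lam_n = inf gives (4).
N = 100_000                     # truncation of the infinite product
n = np.arange(1, N + 1)
lam = np.ones(N)
mu = n.astype(float) ** 2
s = lam + mu

def P_ii(t):
    # P(t, i, i) for i = (0, 0, ...): product of P_n(t, 0, 0) from (1)
    return np.prod(mu / s + (lam / s) * np.exp(-s * t))

# P is standard: P(t, i, i) -> 1 as t -> 0 ...
assert P_ii(1e-6) > 0.99
# ... yet (1 - P(t, i, i)) / t blows up as t -> 0.
quotients = [(1 - P_ii(t)) / t for t in (1e-2, 1e-4, 1e-6)]
assert quotients[0] < quotients[1] < quotients[2]
```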
2. QUASIREGULAR SAMPLE FUNCTIONS

For the rest of this chapter, let I be a finite or countably infinite set in the discrete topology. Let Ī = I for finite I. Let Ī = I ∪ {φ} be the one-point compactification of I for infinite I. Call φ the infinite or fictitious or adjoined state, as opposed to the finite or real states i ∈ I. Let P be a standard stochastic semigroup on I, with Q = P′(0) and q(i) = −Q(i, i). Do not assume q(i) < ∞. The main point of this section is to construct a Markov chain with stationary transitions P, all of whose sample functions have this smoothness property at all nonnegative t:
if f(t) = φ, then f(r) converges to φ as binary rational r decreases to t;
if f(t) ∈ I, then f(r) has precisely one limit point in I, namely f(t), as binary rational r decreases to t.
This result is in (Chung, 1960, II.7). The key lemma (9) is due to Doob (1942).

Downcrossings
For any finite sequence s = (s(1), s(2), …, s(N)) of real numbers, and pair u < v of real numbers, the number of downcrossings β(u, v, s) of [u, v] by s is the largest positive integer d such that: there exist integers
1 ≤ n₁ < n₂ < ⋯ < n_{2d} ≤ N
with
s(n₁) ≥ v, s(n₂) ≤ u, …, s(n_{2d−1}) ≥ v, s(n_{2d}) ≤ u.
If no such d exists, the number of downcrossings is 0. If s and t are finite sequences, and s is a subsequence of t, then
β(u, v, s) ≤ β(u, v, t).
Of course, β(u, v, s) depends on the enumeration of s.
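The definition can be transcribed directly: scan the sequence, alternately waiting for a value ≥ v and then a value ≤ u; each completed pair is one downcrossing. A sketch:

```python
# beta(u, v, s): the number of downcrossings of [u, v] by the sequence s,
# computed greedily (the greedy scan attains the largest d in the definition).
def downcrossings(s, u, v):
    assert u < v
    count = 0
    looking_for_high = True          # first need some s(n) >= v
    for x in s:
        if looking_for_high and x >= v:
            looking_for_high = False # now wait for a value <= u
        elif not looking_for_high and x <= u:
            looking_for_high = True
            count += 1               # completed one downcrossing of [u, v]
    return count

assert downcrossings([5, 0, 5, 0], 1, 4) == 2
assert downcrossings([0, 5, 0], 1, 4) == 1
assert downcrossings([2, 3], 1, 4) == 0
```

The monotonicity β(u, v, s) ≤ β(u, v, t) for s a subsequence of t is clear from this scan: deleting terms can only destroy crossings.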

Functions with right and left limits
Let R be the set of nonnegative binary rationals. Let F be the set of functions f from R to [0, 1], with the product σ-field, namely the smallest σ-field over which all Y_r are measurable, where Y_r(f) = f(r) for r ∈ R and f ∈ F. Let 0 < M < ∞. Let M* be the set of f ∈ F such that: for all real t with 0 ≤ t < M, as r ∈ R decreases to t, the generalized sequence f(r) converges; and for all real t with 0 < t ≤ M, as r ∈ R increases to t, the generalized sequence f(r) converges. For f ∈ F, let β_n(u, v, f) be the number of downcrossings of [u, v] by
f(m2^{−n}) for m = 0, 1, …, (M2ⁿ),
where (x) is the greatest integer no more than x. Verify that β_n(u, v, ·) is measurable, and nondecreasing in n.
(5) Lemma. M* is the set of f such that lim_n β_n(u, v, f) < ∞ for all rational u and v with u < v. In particular, M* is measurable.

PROOF. Suppose f ∉ M*. Suppose that for some t ∈ [0, M) and sequence r_m ∈ R ∩ [0, M) with r_m ↓ t,
a = lim inf f(r_m) < lim sup f(r_m) = b.
The increasing case is similar. Choose rational u, v with
a < u < v < b.
For large N, the number of downcrossings D of [u, v] by f(r₁), …, f(r_N) is large. The number of downcrossings of [u, v] by f(r_N), …, f(r₁) is at least D − 1. If n is so large that 2ⁿr₁, …, 2ⁿr_N are all integers,
β_n(u, v, f) ≥ D − 1.
So
lim_n β_n(u, v, f) = ∞.
Conversely, suppose f ∈ M*. Fix u < v. Let 0 < ε < ½(v − u). Abbreviate
f(t+) = lim {f(r): r ∈ R and r ↓ t}
f(t−) = lim {f(r): r ∈ R and r ↑ t}.
For any t ∈ [0, M], there is a δ = δ(t) > 0, such that:
if r ∈ R ∩ [0, M] ∩ (t, t + δ), then |f(r) − f(t+)| < ε;
if r ∈ R ∩ [0, M] ∩ (t − δ, t), then |f(r) − f(t−)| < ε.
The first condition is vacuous for t = M, and the second is vacuous for t = 0. In particular, |f(r) − f(s)| < 2ε < v − u if r and s are in R ∩ [0, M] and: either r and s are in (t, t + δ), or r and s are in (t − δ, t). Let J(t) be the open interval (t − δ(t), t + δ(t)). By compactness, there are finitely many points t₁, …, t_N in [0, M] such that the union of J(t_n) for n = 1, …, N covers [0, M]. I claim that for all n,
β_n(u, v, f) ≤ 3N/2.
Indeed, suppose 0 ≤ r₁ < r₂ < ⋯ < r_{2d} ≤ M are in R, and
f(r₁) ≥ v, f(r₂) ≤ u, …, f(r_{2d−1}) ≥ v, f(r_{2d}) ≤ u,
as in Figure 1. I say that J(t_n) contains at most three r_m's. For suppose J(t_n) contains r_m, r_{m+1}, r_{m+2}, r_{m+3}, as in Figure 1. Then t_n is either to the right of r_{m+1} or to the left of r_{m+2}. In either case, there is a forbidden oscillation. So there are at most 3N points r_m. That is, 2d ≤ 3N. *
Pedro Fernandez eliminated the unnecessary part of an earlier proof.

[Figure 1: a downcrossing pattern, with consecutive points r_{m+1}, r_{m+2}, r_{m+3} marked.]
Figure 1.

Quasiconvergence
(6) Definition. Let A be a set directed by > and let i_a ∈ Ī for each a ∈ A. That is, {i_a} is a generalized sequence. Say i_a quasiconverges to j ∈ I, or
q-lim i_a = j,
iff: for any finite subset D of I∖{j}, there is some a(D) ∈ A such that a > a(D) implies i_a ∉ D; and for any a ∈ A there is some b > a such that i_b = j. Say i_a quasiconverges to φ, or
q-lim i_a = φ,
iff: for any finite subset D of I, there is some a(D) ∈ A such that a > a(D) implies i_a ∉ D.
The directed sets of main interest are: the nonnegative integers; the nonnegative binary rationals less than a given real number; the nonnegative real numbers less than a given real number; the binary rationals greater than a given nonnegative real number; the real numbers greater than a given nonnegative real number. In the first three cases, a > b means a is greater than b. In the last two cases, a > b means a is less than b. Here is the usual notation for these five quasilimits; t is the given real number and R is the set of nonnegative binary rationals.
q-lim_n i_n.
q-lim {i_r: r ∈ R and r ↑ t}.
q-lim {i_s: s ↑ t} = q-lim_{s↑t} i_s.
q-lim {i_r: r ∈ R and r ↓ t}.
q-lim {i_s: s ↓ t} = q-lim_{s↓t} i_s.
Quasiconvergence is not topological. In fact, a typical sequence which quasiconverges to j ∈ I has subsequences quasiconverging to φ. Try
1, 2, 1, 3, 1, 4, …
whose q-lim is 1. On the brighter side, if q-lim i_a = j ∈ I, and a* is a coterminous generalized subsequence, then q-lim i_{a*} exists, and is either j or φ. Coterminous means: for any a ∈ A, there is an a* > a.
Conversely, if q-lim i_a exists, and i_a = j ∈ I for arbitrarily remote a, then q-lim i_a = j.
Quasilimits cohere with convergence in probability, a fact that will be in use later. Let {X_n} be a sequence of Ī-valued random variables on (Ω, ℱ, 𝒫). Suppose
X_q = q-lim_n X_n
exists 𝒫-almost surely. And suppose X_n converges in 𝒫-probability to an I-valued limit X_p. Then
𝒫{X_p = X_q} = 1.
To see this, choose a subsequence n* such that
X_{n*} = X_p for infinitely many n
with 𝒫-probability 1.
If 𝒫{X_p = φ} > 0, the only safe assertion is
𝒫{X_p = X_q | X_p ∈ I} = 1.
The situation is similar for generalized sequences, provided the directing set A has a coterminous countable subset A*.
EXAMPLE. Suppose X_n is 0 with probability 1/n, and n with probability (n − 1)/n. Suppose the X_n are independent. Then
X_n → φ in probability
q-lim_n X_n = 0 with probability 1.
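The example above can be simulated. In the sketch below (sample size and seed are arbitrary), P{X_n = 0} = 1/n → 0, so X_n → φ in probability; but Σ 1/n = ∞ and the X_n are independent, so by Borel–Cantelli X_n = 0 infinitely often almost surely, and since the nonzero values n escape every finite subset of I, q-lim_n X_n = 0 with probability 1.

```python
import numpy as np

# X_n = 0 with probability 1/n, else X_n = n, independent across n.
rng = np.random.default_rng(42)
N = 100_000
ns = np.arange(1, N + 1)
X = np.where(rng.random(N) < 1.0 / ns, 0, ns)

assert X[0] == 0                       # n = 1: zero with probability 1
assert (X[-1000:] == 0).mean() < 0.05  # zeros are sparse: X_n -> phi in probability
assert (1.0 / ns).sum() > 10           # harmonic divergence: Borel-Cantelli
                                       # gives X_n = 0 infinitely often a.s.
```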
Let R be the set of binary rationals in [0, ∞) as before. The next definition is key.
(7) Definition. A function f from R to I is quasiregular iff:
q-lim {f(r): r ∈ R and r ↓ t}
exists for all nonnegative real t, and equals f(t) for all t ∈ R; while
q-lim {f(r): r ∈ R and r ↑ t}
exists for all positive real t, and equals f(t) for all positive t ∈ R.
A function f from [0, ∞) to Ī is quasiregular iff: f(r) ∈ I for all r ∈ R; and f retracted to R is quasiregular; and
q-lim {f(r): r ∈ R and r ↓ t} = f(t)
for all nonnegative real t.
WARNING. t runs over all of [0, ∞).
Suppose f is quasiregular from [0, ∞) to Ī. I claim f is quasicontinuous from the right: that is, f(t) = q-lim {f(s): s ↓ t}. Begin the check by supposing f(t) = i ∈ I. By definition, f(s) = i for s arbitrarily close to t on the right. Conversely, suppose f(s_n) = i ∈ I for some sequence s_n ↓ t. Without loss, make the s_n strictly decreasing, and use the definition to find binary rational r_n with s_n ≤ r_n < s_{n−1} and f(r_n) = i. But r_n ↓ t, so the definition forces f(t) = i. Similarly, f has a quasilimit from the left at all positive times, and is quasicontinuous at all binary rational r.

The process on R
Let Ω be the set of all functions w from R to I. Let {X(r) : r ∈ R} be the
coordinate process on Ω, that is, X(r)(w) = w(r) for r ∈ R and w ∈ Ω.
Endow I with the σ-field of all its subsets, and Ω with the product σ-field,
that is, the smallest σ-field over which each X(r) is measurable. For each
i ∈ I, let P_i be the probability on Ω for which {X(r) : r ∈ R} is Markov with
stationary transitions P and X(0) = i. Namely, for 0 = r_0 < r_1 < ⋯ < r_n
in R and i_0 = i, i_1, …, i_n in I,

P_i{X(r_m) = i_m for m = 0, …, n} = ∏_{m=0}^{n−1} P(r_{m+1} − r_m, i_m, i_{m+1}).

By convention, an empty product is 1.
For 0 < L < ∞, let L* be the set of w ∈ Ω such that: for all real t with
0 ≤ t < L, as r ∈ R decreases to t, the generalized sequence w(r) quasi-
converges; and for all real t with 0 < t ≤ L, as r ∈ R increases to t, the
generalized sequence w(r) quasiconverges.
(8) Lemma. The set L* is measurable.

PROOF. If i_0, i_1, …, i_m is an I-sequence, there is a change at index
ν < m iff i_ν ≠ i_{ν+1}. Let j and k be different states. Consider the state sequence
{w(m2^{−n}) : m = 0, …, (L2^n)}.
Delete those terms which are neither j nor k. Count the number of changes
in this reduced sequence, and call it β_n(j, k, w). Check that β_n(j, k, ·) is
measurable, and nondecreasing with n. You can show that L* is the set of
w ∈ Ω with:
lim_{n→∞} β_n(j, k, w) < ∞
for all pairs of different states j and k. The argument is similar to the one in
(5). For the second part, fix j ≠ k in I and w ∈ L*. For each t ∈ [0, L], there
is a δ = δ(t) > 0 such that:
there do not exist r, s in R ∩ [0, L] ∩ (t, t + δ) with w(r) = j and
w(s) = k;
there do not exist r, s in R ∩ [0, L] ∩ (t − δ, t) with w(r) = j and
w(s) = k. *
(9) Lemma. The set ∩_{L>0} L* is measurable and has P_i-probability 1.
PROOF. Clearly,
∩ {L* : L is a positive real number} = ∩ {L* : L is a positive integer}.
It is therefore enough to prove P_i{L*} = 1 for each L > 0. Without real loss
of generality, fix L = 1. For r ∈ R ∩ [0, 1] and k ∈ I, define a real-valued
function f_{r,k} of pairs s ∈ R ∩ [0, 1] and w ∈ Ω by
f_{r,k}(s, w) = P(r − s, w(s), k) for s ≤ r,
= f_{r,k}(r, w) for s > r.
Let F_{r,k} be the set of all w ∈ Ω such that:
for all t ∈ (0, 1], the generalized sequence f_{r,k}(s, w) converges as s ∈ R
increases to t; and,
for all t ∈ [0, 1), the generalized sequence f_{r,k}(s, w) converges as s ∈ R
decreases to t.
For 0 ≤ s ≤ r and s ∈ R,
f_{r,k}(s, ·) = P_i{X(r) = k | X(u) for u ∈ R and u ≤ s}.
Therefore, {f_{r,k}(s, ·) : 0 ≤ s ≤ r and s ∈ R} is a martingale relative to P_i.
Consequently,
{f_{r,k}(s, ·) : 0 ≤ s ≤ 1 and s ∈ R}
is a martingale relative to P_i. Let u < v be rational, and let β_n(u, v, w) be the
number of downcrossings of [u, v] by
{f_{r,k}(m/2^n, w) : m = 0, …, 2^n}.
As (5) implies, F_{r,k} is the set of w with lim_n β_n(u, v, w) < ∞ for all rational
pairs u < v. By the downcrossings inequality (10.33),

f!nCU, v, w) P;(dw)

is bounded in n for each pair u, v. But f3nCu, v, w) is nondecreasing with n.


Therefore,

!olim n ~(u, v, w) P;(dw) < 00.


By CI0.l0b),
Pi{lim n f3nCu, v, .) < oo} = 1
for each pair u, v. So Pi{Fr,k} = 1, and nr,k Fr,k has Pi-probability 1.
Let w E nr,k Fr,k' To see that WEI *, suppose Sn E Rand sn E R both
increase to t E CO, 1], while wCsn) = j E I and w(sn) = j E I. The decreasing
case is similar. Let r ERn [0, 1], and r ~ t. Let k E I. Then
fr,k(Sn' w) = PCr - sn'}, k) - PCr - t,j, k)
fr,kCSn' w) = PCt - sn,j, k) - PCr - t,j, k)
by C5,9). Since w E Fr,k' the two subsequential limits must be equal:
PCr - t,j, k) = PCr - t,j, k),
Let r ! t and use C5.9) again to get
PCO,j, k) = P(O,j, k).
The left side is 1 iff j = k, and the right side is 1 iff j = k, so j = j. *
(10) Lemma. (a) Let s ∈ R. As r ∈ R converges to s, the generalized sequence
X(r) converges in P_i-probability to X(s).
(b) Let t be real. As r ∈ R converges to t, the generalized sequence X(r)
converges in P_i-probability to an I-valued random variable X̄(t).
(c) If t ∈ R, then X̄(t) = X(t) with P_i-probability 1.
PROOF. Claim (b). It is enough to check that P_i{X(r) = X(s)} → 1 as
r, s ∈ R converge to t. If 0 ≤ r ≤ s,
P_i{X(r) = X(s)} = Σ_j P(r, i, j)P(s − r, j, j).
Now P(s − r, j, j) → 1 for each j. Use Fatou.
Claim (a) is similar.
Claim (c) is immediate from (a) and (b). *
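The Fatou step in claim (b) can be spelled out; this is a sketch, assuming the transitions are standard and stochastic, and using continuity of t → P(t, i, j):

```latex
\liminf_{r,s\to t} P_i\{X(r) = X(s)\}
  \;\ge\; \sum_j \liminf_{r,s\to t}\, P(r,i,j)\,P(s-r,j,j)
  \;=\; \sum_j P(t,i,j)\cdot 1 \;=\; 1 .
```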

WARNING. The limiting random variable X̄(t) is well defined only a.e.
(11) Lemma. Let t be positive and real. Choose a version of X̄(t), as defined
in (10b). For P_i-almost all w, for any ε > 0, there are r and s in R with
t − ε < r < t < s < t + ε and w(r) = X̄(t, w) = w(s).
PROOF. Choose a sequence r_n ∈ R with r_n ↑ t. By (10), X(r_n) → X̄(t) in
P_i-probability. Now choose a subsequence n′ with P_i{X(r_{n′}) → X̄(t)} = 1.
Similarly for the right. *
(12) Definition. Let Ω_q = {w : w ∈ Ω and w is quasiregular}.
Quasiregularity was defined in (7).
(13) Lemma. The set Ω_q is measurable and P_i{Ω_q} = 1.
PROOF. For r ∈ R, let G(r) be the set of w ∈ Ω for which: there are s ∈ R
with s > r but arbitrarily close having w(s) = w(r); and if r > 0, there are
s ∈ R with s < r but arbitrarily close having w(s) = w(r). Clearly, G(r) is
measurable; and P_i{G(r)} = 1 by (10c, 11). Remember (8–9). Check

Ω_q = (∩_{L>0} L*) ∩ (∩_{r∈R} G(r)). *


The process on [0, ∞)

(14) Definition. For w ∈ Ω_q, let X(t, w) = q-lim X(r, w) as r ∈ R decreases
to t.
Check X(r, w) = w(r) for w ∈ Ω_q and r ∈ R.

(15) Lemma. The function (t, w) → X(t, w) is measurable.

PROOF. The set of pairs (t, w) with 0 ≤ t < ∞ and w ∈ Ω_q and
X(t, w) = i is
∩_{n=1}^∞ ∪_{r∈R} A(n, r),
where A(n, r) is the set of pairs (t, w) with 0 ≤ t < ∞ and w ∈ Ω_q and
t < r < t + 1/n and w(r) = i. *
(16) Theorem. The process {X(t) : 0 ≤ t < ∞}, defined by (14) on the
probability triple (Ω_q, P_i), is a Markov chain with stationary transitions P,
starting state i, and all sample functions quasiregular in the sense of (7).
NOTE. The σ-field on Ω_q is the product σ-field on Ω relativized to Ω_q;
and P_i is retracted to this smaller σ-field.

PROOF. Fix t ≥ 0. As r ∈ R decreases to t, I claim X(r) converges to X(t)
in P_i-probability; you derive (16) from this claim and (13). To argue the
claim, fix a version of X̄(t), as defined in (10b). Now X(r) converges to
X̄(t) in P_i-probability, by (10b); so it is enough to show
P_i{X(t) = X̄(t)} = 1.
By (11), for P_i-almost all w ∈ Ω_q, there are r ∈ R greater than but arbitrarily
close to t, with X(r, w) = X̄(t, w). But X(r, w) quasiconverges to X(t, w) as
r decreases to t, by definition (14); this identifies X(t, w) with X̄(t, w). *
Abstract processes
For (17), let (𝒳, ℱ, 𝒫) be an abstract probability triple, and
{Y(t) : 0 ≤ t < ∞} be a Markov chain on (𝒳, ℱ, 𝒫), with stationary
transitions P.
(17) Theorem. Let 𝒳_q be the set of x ∈ 𝒳 such that Y(·, x) retracted to R
is I-valued and in Ω_q. Then 𝒳_q ∈ ℱ and 𝒫{𝒳_q} = 1. On 𝒳_q, let Y*(t, x) =
q-lim Y(r, x) as r ∈ R decreases to t. Then Y*(·, x) is quasiregular and
𝒫{Y(t) = Y*(t)} = 1.
PROOF. Use (13) to get 𝒳_q ∈ ℱ and 𝒫{𝒳_q} = 1. Plainly, Y*(·, x) is
quasiregular for x ∈ 𝒳_q. Suppose 𝒫{Y(0) = i} = 1. The joint 𝒫-distribution
of {Y(t) and Y(r) : r ∈ R} coincides with the joint P_i-distribution of
{X(t) and X(r) : r ∈ R}, by (16). So,
𝒫{Y(t) = Y*(t)} = P_i{X(t) = q-lim X(r) as r ∈ R decreases to t} = 1,
by definition (14). See (5.45–48) or (7.15) in case of trouble. *
3. THE SETS OF CONSTANCY

Let A(i, s) be the set of w ∈ Ω with w(r) = i when r ∈ R and r ≤ s.

(18) Lemma. A(i, s) is measurable, and
P_i{A(i, s)} = e^{−q(i)s}.
Here e^{−∞} = 0, and ∞ · s = ∞ for s > 0, while ∞ · 0 = 0.
PROOF. This repeats (7.4). *
For j ∈ I, let
R_j(w) = {r : r ∈ R and w(r) = j}.
For j ∈ I and w ∈ Ω_q, let
S_j(w) = {t : 0 ≤ t < ∞ and X(t, w) = j}.

The process X was defined in (14). These sets are called the level sets or sets
of constancy.
(19) Definition. Let j ∈ I. Then Ω(j) is the set of w ∈ Ω_q such that: for all
t ≥ 0,
X(t, w) = j implies lim_{s↓t} X(s, w) = j;
and for all t > 0,
q-lim_{s↑t} X(s, w) = j implies lim_{s↑t} X(s, w) = j.
The set Ω_q was defined in (12).
(20) Theorem. Let j ∈ I, and let w ∈ Ω_q. Then w ∈ Ω(j) iff the set S_j(w) is
either empty or a finite or countably infinite union of intervals [a_n, b_n) with
a_1 < b_1 < a_2 < ⋯. Of course, the a_n and b_n depend on w, and need not be
binary rational. If there are infinitely many intervals, a_n → ∞.
PROOF. The "if" part is easy. For "only if," let w ∈ Ω(j). Suppose t ∈ S_j(w).
Then [t, t + c) ⊂ S_j(w) for some c > 0. Of course, S_j(w) is closed from the
right, by quasiregularity. Consequently, S_j(w) is a finite or countably infinite
union of maximal disjoint nonempty intervals [a_n, b_n). Suppose there are an
infinite number. By way of contradiction, suppose a_{n′} ↑ c < ∞ for a sub-
sequence n′. Then X(t, w) quasiconverges, so converges, to j as t increases to
c. So a_m < c ≤ b_m for some m, contradicting the disjointness. Similarly
for the right. *
(21) Theorem. The set Ω(j) is measurable. And P_i{Ω(j)} = 1 if j ∈ I is
stable.
PROOF. First, Ω(j) is the set of w ∈ Ω_q such that the indicator function
of R_j(w), as a function on R, is continuous on R, and has limits from left
and right at all positive t ∉ R: use (19, 20). So Ω(j) is measurable by (5).
Check that the indicator function of R_j(w), as a function on R, is continuous
for P_i-almost all w: use the argument for (7.6). For such an w, any r ∈ R_j(w)
is interior to a maximal j-interval of w, as in the paragraph before (7.8). The
argument for (7.8) shows that for P_i-almost all w, for any n, only a finite
number of these intervals meet [0, n]. This disposes of the second claim. *
Recall that j is instantaneous iff q(j) = ∞.

(22) Lemma. Suppose j ∈ I is instantaneous. The set of w ∈ Ω for which R_j(w)
includes no proper interval of R is measurable and has P_i-probability 1.
A proper interval has a nonempty interior.
PROOF. For r and s in R with s > 0, let B(r, s) be the set of w ∈ Ω such
that w(t) = j for all t ∈ R ∩ [r, r + s]. Then B(r, s) is measurable. I say
P_i{B(r, s)} = 0. Indeed, consider this mapping T_r of Ω into Ω:
(T_r w)(s) = w(r + s) for s ∈ R.

Use (10.16) on the definition of P_i and P_j, to see that for all measurable A:
P_i{X(r) = j and T_r ∈ A} = P(r, i, j)P_j{A}.
Put A(j, s) for A: then
B(r, s) = {T_r ∈ A(j, s)} ⊂ {X(r) = j},
so (18) makes
P_i{B(r, s)} = P(r, i, j)P_j{A(j, s)} = 0.
But ∪_{r,s} B(r, s) is the complement of the set described in the lemma. *
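Spelled out, the last display combines (18) with the convention e^{−∞} = 0: since j is instantaneous, q(j) = ∞, and for s > 0,

```latex
P_i\{B(r,s)\} \;=\; P(r,i,j)\,P_j\{A(j,s)\}
  \;=\; P(r,i,j)\,e^{-q(j)s}
  \;=\; P(r,i,j)\cdot e^{-\infty}
  \;=\; 0 .
```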
(23) Theorem. Suppose j ∈ I is instantaneous: then the set of w ∈ Ω_q
satisfying (a) is measurable and has P_i-probability 1. Properties (b) and (c)
hold for all w ∈ Ω_q and all j ∈ I.
(a) S_j(w) is nowhere dense.
(b) Each point of S_j(w) is a limit from the right of S_j(w).
(c) S_j(w) is closed from the right.
PROOF. You should check (b) and (c). I will then get (a) from (22). In fact,
for w ∈ Ω_q, property (a) coincides with the property described in (22).
To see this, suppose w ∈ Ω_q and R_j(w) ⊃ [a, b] ∩ R for a < b in R. Then
S_j(w) ⊃ [a, b) by (c). Conversely, suppose w ∈ Ω_q and S_j(w) is dense in
(a, b) with a < b. Choose a pair of binary rationals c, d with a < c < d < b.
Then S_j(w) ⊃ [c, d] by (c), so R_j(w) ⊃ [c, d] ∩ R. *
(24) Remarks. (a) Suppose w ∈ Ω_q. Then [0, ∞)\S_j(w) is the finite or count-
ably infinite union of intervals [a, b) whose closures [a, b] are pairwise
disjoint. This follows from properties (23b–c).
For (b–d), suppose w ∈ Ω_q satisfies (23a).
(b) [0, ∞)\S_j(w) is dense in [0, ∞), and is therefore a countably infinite
union of maximal intervals.
(c) That is, S_j(w) looks like the Cantor set, except that the left endpoints of
the complementary intervals have been removed from the set. And S_j(w) has
positive Lebesgue measure, as will be seen in (28, 32).
(d) If t ∈ S_j(w), there is a sequence r_n ∈ R with r_n ↓ t and X(r_n, w) ≠ j, so
X(r_n, w) → φ.

(25) Theorem. For w ∈ Ω_q and i ≠ j in I, the set S̄_i(w) ∩ S̄_j(w) is finite in
finite intervals.
Here, Ā is the closure of A.
PROOF. Use compactness. *
This set is studied again in Section 2.2 of ACM.

(26) Theorem. The set of w ∈ Ω_q for which S_φ(w) has Lebesgue measure 0
is measurable and has P_i-probability 1.
PROOF. Fubini on (15, 16). *
NOTE. Suppose all j ∈ I are instantaneous. For almost all w, the set S_φ(w)
is the complement of a set of the first category. Consequently, any nonempty
interval meets S_φ(w) in uncountably many points. For a discussion of category,
see (Kuratowski, 1958, Sections 10 and 30).
(27) Definition. Call a Borel subset B of [0, ∞) metrically perfect iff: for any
nonempty open interval (a, b), the set B ∩ (a, b) is either empty or of positive
Lebesgue measure. Let Ω_m be the set of w ∈ Ω_q, as defined in (12), such that:
for all j ∈ I, the set S_j(w) is metrically perfect. This is no restriction for stable j
and w ∈ Ω(j), as defined in (19).
(28) Lemma. The set Ω_m is measurable and P_i{Ω_m} = 1.
PROOF. For a < r < b and r ∈ R and j ∈ I, let A(j, r, a, b) be the set of
w ∈ Ω_q such that: either w(r) ≠ j, or
Lebesgue {S_j(w) ∩ (a, b)} > 0.
Any proper interval contains a proper subinterval with rational endpoints.
Moreover, S_j(w) ∩ (a, b) is nonempty iff R_j(w) ∩ (a, b) is nonempty.
Consequently, Ω_m is the intersection of A(j, r, a, b) for j ∈ I and r ∈ R and
rational a, b with a < r < b. So Ω_m is measurable, and it is enough to prove
P_i{A(j, r, a, b)} = 1.
Suppose P(r, i, j) > 0, for otherwise there is little to do. Let ε > 0. Let
L(ε, w) = (1/ε) Lebesgue {S_j(w) ∩ (r, r + ε)}.
Use (15) and Fubini in the first line, and (16) in the last:

∫_{X(r)=j} L(ε) dP_i = (1/ε) ∫_r^{r+ε} P_i{X(r) = j and X(t) = j} dt

= (1/ε) ∫_0^ε P_i{X(r) = j and X(r + t) = j} dt

= P(r, i, j) (1/ε) ∫_0^ε P(t, j, j) dt.

Now 0 ≤ L(ε) ≤ 1, and the preceding computation shows

∫_{X(r)=j} L(ε) dP_i → P_i{X(r) = j} as ε → 0.

For any c < 1, Chebyshev on 1 − L(ε) makes

P_i{L(ε) > c | X(r) = j} → 1 as ε → 0.
So
P_i{A(j, r, a, b) | X(r) = j} = 1.
By definition,
P_i{A(j, r, a, b) | X(r) ≠ j} = 1.
*
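The Chebyshev step is the usual first-moment bound, applied conditionally; a sketch, for any constant c with 0 < c < 1:

```latex
P_i\{L(\varepsilon) \le c \mid X(r) = j\}
  \;=\; P_i\{1 - L(\varepsilon) \ge 1 - c \mid X(r) = j\}
  \;\le\; \frac{E_i\{1 - L(\varepsilon) \mid X(r) = j\}}{1-c}
  \;\to\; 0 \quad \text{as } \varepsilon \to 0.
```

And {L(ε) > c} ∩ {X(r) = j} ⊂ A(j, r, a, b) once ε is small enough that (r, r + ε) ⊂ (a, b).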
The Markov property
The next main result on the sets of constancy is (32). To prove it, I need
the Markov property (31). Lemmas (29–30) are preliminary to (31).
(29) Lemma. Let t_n → t. Then:
(a) The sequence X(t_n) tends to X(t) in P_i-probability;
(b) P_i{q-lim X(t_n) = X(t)} = 1.
PROOF. The argument for (10) proves the first claim. For the second,
suppose without much loss of generality that t_n < t for infinitely many n;
and t_n > t for infinitely many n; and t_n = t for no n. Consider first the set
L of n with t_n < t. Use (a) to find a subsequence n′ ∈ L with
P_i{X(t_{n′}) → X(t)} = 1.
As n → ∞ through L, the sequence X(t_n, w) has at most one finite limit, by
quasiregularity. So, for P_i-almost all w, it has exactly one, namely X(t, w).
Similarly for the right. *
Let R* be a countable dense subset of [0, ∞). Say a function f from
[0, ∞) to Ī is quasiregular relative to R* iff for all t ≥ 0,
f(t) = q-lim f(r) as r ∈ R* decreases to t,

and for all t > 0,

q-lim f(r) as r ∈ R* increases to t
exists, and f is finite and quasicontinuous when retracted to R*.
(30) Lemma. The set of w ∈ Ω_q such that X(·, w) is quasiregular relative to
R* is measurable and has P_i-probability 1.
PROOF. Let G be the set of w ∈ Ω_q such that X(·, w) is finite and quasi-
continuous when retracted to R*, and satisfies
q-lim X(r*, w) = X(r, w) as r* ∈ R* decreases to r.
Now G is measurable, and (29) implies P_i{G} = 1. If X(·, w) is quasiregular
relative to R*, then w ∈ G. Suppose w ∈ G. I say X(·, w) is quasiregular
relative to R*. Fix t ≥ 0, and let r* ∈ R* decrease to t. Since X(·, w) is
quasiregular, X(r*, w) quasiconverges; if the limit is j ∈ I, then X(t, w) = j.
On the other hand, if X(t, w) = j ∈ I, there is a sequence r_n ∈ R decreasing
to t with X(r_n, w) = j. Find r*_n ∈ R* to the right of r_n but close to it, with
X(r*_n, w) = j, and make X(r*, w) quasiconverge to j. I haven't handled the
value φ explicitly. But X(t, w) ≠ q-lim {X(r*) : r* ∈ R* and r* ↓ t} forces
at least one side to be finite, so the infinite case follows from the finite case.
The argument on the left is similar. *
Let t ≥ 0. Let W_t be the set of w ∈ Ω_q such that u → X(t + u, w) is quasi-
regular on [0, ∞). For w ∈ Ω_q, let T_t w be the function r → X(t + r, w) from
R to I. Let ℱ(t) be the σ-field spanned by X(s) for s ≤ t. Here is the Markov
property.
(31) Lemma. Fix t ≥ 0.
(a) W_t is measurable and has P_i-probability 1.
(b) T_t is a measurable mapping of W_t into Ω_q.
(c) If w ∈ W_t and u ≥ 0, then X(t + u, w) = X(u, T_t w).
(d) Suppose A ∈ ℱ(t) and A ⊂ {X(t) = j}. Suppose B is a measurable
subset of Ω. Then
P_i{A and T_t^{−1}B} = P_i{A} · P_j{B}.
(e) On {X(t) = j} ∩ W_t, a regular conditional P_i-distribution for T_t
given ℱ(t) is P_j.
(f) Given {X(t) = j}, the shift T_t is conditionally P_i-independent of ℱ(t),
and its conditional P_i-distribution is P_j.
(g) Let ℱ be the product σ-field in Ω relativized to Ω_q. Let F be a non-
negative, measurable function on the cartesian product
(Ω_q, ℱ(t)) × (Ω_q, ℱ).
Then

∫_{W_t} F(w, T_t w) P_i(dw) = ∫_{W_t} F*(w) P_i(dw),

where

F*(w) = ∫ F(w, w′) P_{X(t,w)}(dw′).


PROOF. Claim (a) is clear from (30). For R*, take the union of
R ∩ [0, t] with {t + r : r ∈ R and r > 0}. You have to worry separately that
X(t) = q-lim {X(t + r) : r ∈ R and r ↓ 0}.
Claim (b) is clear.
Claim (c) follows from this computation, for w ∈ W_t:
X(u, T_t w) = q-lim {X(r, T_t w) : r ∈ R and r ↓ u}
= q-lim {X(t + r, w) : r ∈ R and r ↓ u}
= X(t + u, w).

The first line is definition (14). The second line is the definition of T_t. The
third line uses w ∈ W_t.
Claim (d). By inspection or (10.16), it is enough to do this for special A
and B, of the form
A = {X(s_n) = i_n for n = 0, …, N}
B = {X(u_m) = j_m for m = 0, …, M};
where 0 = s_0 < ⋯ < s_N = t and i_0 = i, …, i_N = j are in I; for the
second line 0 ≤ u_0 < ⋯ < u_M < ∞ and the j's are in I. Let

B* = {X(t + u_m) = j_m for m = 0, …, M}.

Now compute.

P_i{A and T_t^{−1}B} = P_i{A and W_t and T_t^{−1}B} by (a)
= P_i{A and W_t and B*} by (c)
= P_i{A and B*} by (a)
= P_i{A} · P_j{B} by (16).

Claims (e, f) follow from (d).
Claim (g). If F(w, w′) = 1_A(w) · 1_B(w′), this reduces to (d). Now
extend. *
For a general discussion of (g), see (10.44).
WARNING. The condition w ∈ Ω_q does not imply T_t w ∈ Ω_q. The conditions
w ∈ Ω_q and T_t w ∈ Ω_q do not imply X(s, T_t w) = X(t + s, w).

Metric density
The next result is due to Chung (1960, Theorem 3 on p. 146). It has content
only for instantaneous j.
(32) Theorem. Let j ∈ I. The set of w ∈ Ω_q with

lim_{ε→0} ε^{−1} Lebesgue {S_j(w) ∩ [0, ε]} = 1

is measurable and has P_j-probability 1.

PROOF. Confine w to Ω_q. The function f_n whose value at (t, w) is

n Lebesgue {s : t ≤ s ≤ t + 1/n and X(s, w) = j}

is measurable in w by (15) and continuous in t by inspection. Consequently,
it is jointly measurable. Let G be the set of pairs (t, w) where
lim_{n→∞} f_n(t, w) = 1. Then G is measurable. Let G_t be the t-section of G,
namely, the set of w ∈ Ω_q with (t, w) ∈ G. Similarly, G_w is the set of t with
(t, w) ∈ G. For each w ∈ Ω_q, the set G_w differs by a Lebesgue-null set from
S_j(w), by the metric density theorem (10.59). By Fubini, G differs from the
set of pairs (t, w) with X(t, w) = j by a Lebesgue × P_j-null set. Using
Fubini again, there is a Lebesgue-null set N, and for each t ∉ N there is a
P_j-null set N(t), such that: if t ∉ N and w ∉ N(t) then X(t, w) = j iff w ∈ G_t.
Fix any t ∉ N. Now
P_j{X(t) = j} = P(t, j, j) > 0
by (5.7), so P_j{G_t | X(t) = j} = 1. If w ∈ W_t, as defined for (31), then
f_n(t, w) = f_n(0, T_t w)
by (31c). So w ∈ G_t iff T_t w ∈ G_0. Then (31d) implies
P_j{G_0} = P_j{G_t | X(t) = j} = 1.

To complete the argument, check that for measurable subsets B of [0, ∞),

lim_{n→∞} n Lebesgue {B ∩ [0, 1/n]} = lim_{ε→0} (1/ε) Lebesgue {B ∩ [0, ε]},

because Lebesgue {B ∩ [0, ε]} is monotone in ε and n/(n + 1) → 1. *
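The final equivalence of limits is a routine sandwich: for 1/(n+1) ≤ ε ≤ 1/n, monotonicity of Lebesgue {B ∩ [0, ε]} in ε gives

```latex
\frac{n}{n+1}\,(n+1)\operatorname{Leb}\{B \cap [0,\tfrac{1}{n+1}]\}
  \;\le\; \frac{1}{\varepsilon}\operatorname{Leb}\{B \cap [0,\varepsilon]\}
  \;\le\; \frac{n+1}{n}\, n\operatorname{Leb}\{B \cap [0,\tfrac{1}{n}]\},
```

so the two limits agree whenever either exists.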
An attractive conjecture is: for P_i-almost all w, the set S_j(w) has right
metric density 1 at all its points.

4. THE STRONG MARKOV PROPERTY

You should review Section 7.4 before tackling this section, which is pretty
technical. One reason is a breakdown in the proof of (7.35). I'll use its nota-
tion for the moment. Then
lim sup (A ∩ B_n) ⊂ A ∩ B
survives, by quasiregularity. But
A ∩ B ⊂ lim inf (A ∩ B_n)
collapses. To patch this up in (34), I need (33). Here is the permanent
notation.

As in Section 7.4, let ℱ(t) be the smallest σ-field over which all X(s) with
0 ≤ s ≤ t are measurable: the coordinate process X on Ω_q has smooth
sample functions and is Markov relative to P_i, with stationary standard

transitions P and starting state i. A nonnegative random variable τ on Ω_q is a
Markov time or just Markov iff for all t, the set {τ < t} is in ℱ(t). Let
ℱ(τ+), the pre-τ sigma field, be the collection of all measurable sets A
with A ∩ {τ < t} in ℱ(t) for all t. Let Δ = {τ < ∞}, so Δ ⊂ Ω_q. Let τ_n
be the least m/2^n > τ, on Δ.
EXAMPLE. X(τ) is measurable relative to ℱ(τ+).
PROOF. Let j ∈ I. You should check that
X(τ) = j and τ < t
iff for any positive rational r, there are rational u and v with
u ≤ τ < v < t and v − u < r and X(v) = j. *
(33) Lemma. Suppose τ is Markov. For each t ≥ 0 and j ∈ I,

lim_{n→∞} P_i{X(τ + t) = j and X(τ_n + t) ≠ j} = 0.

PROOF. I will only argue t = 0. Fix ε > 0. Then fix n so large that
t ≤ 2^{−n} implies 1 − P(t, j, j) ≤ ε. Now

{X(τ) = j and X(τ_n) ≠ j} = ∪_{m≥1} A(m),

where
A(m) = {(m − 1)2^{−n} ≤ τ < m2^{−n} and X(τ) = j and X(m2^{−n}) ≠ j}.
But A(m) ⊂ {B(m) and X(m2^{−n}) ≠ j}, where B(m) is the event:

(m − 1)2^{−n} ≤ τ < m2^{−n},

and there is a binary rational r with
τ < r < m2^{−n} and X(r) = j.
Furthermore, B(m) is the increasing limit of B(m, N), where B(m, N) is
the event:

(m − 1)2^{−n} ≤ τ < m2^{−n},

and there is a binary rational r with

τ < r < m2^{−n} and X(r) = j

and 2^N r an integer. Finally, B(m, N) is the disjoint union of B(m, N, M),

where B(m, N, M) is the event:

(m − 1)2^{−n} ≤ τ < M2^{−N} < m2^{−n} and X(M2^{−N}) = j,

and M is the least such nonnegative integer.

Now B(m, N, M) ∈ ℱ(M/2^N). The definition of P_i or the Markov property
(31) imply:

P_i{B(m, N, M) and X(m2^{−n}) ≠ j}

= P_i{B(m, N, M)} · [1 − P(m/2^n − M/2^N, j, j)]

≤ ε P_i{B(m, N, M)}.
Sum on M:
P_i{B(m, N) and X(m2^{−n}) ≠ j} ≤ ε P_i{B(m, N)}.
Let N ↑ ∞:
P_i{B(m) and X(m2^{−n}) ≠ j} ≤ ε P_i{B(m)}.
But the B(m) are disjoint. Sum on m:

P_i{X(τ) = j and X(τ_n) ≠ j} ≤ ε. *

Let Y(t) = X(τ + t). Call Y the post-τ process. I say (t, w) → Y(t, w)
is measurable. Indeed, (t, w) → τ(w) + t is the sum of two measurable
functions, and is therefore measurable. Consequently, (t, w) → (τ(w) + t, w)
is measurable. But (t, w) → X(t, w) is measurable by (15). The composition
of the last two mappings is (t, w) → Y(t, w).

(34) Proposition. Suppose τ is a Markov time. Let Y be the post-τ process.
Let 0 ≤ s_0 < s_1 < ⋯ < s_M. Let i_0, i_1, …, i_M be in I. Let
B = {Y(s_m) = i_m for m = 0, …, M}
and
π = ∏_{m=0}^{M−1} P(s_{m+1} − s_m, i_m, i_{m+1}).
Let A ∈ ℱ(τ+) with A ⊂ Δ. Then

(35) P_i{A and B} = P_i{A and Y(s_0) = i_0} · π.
PROOF. As in (33), let τ_n be the least m/2^n > τ, on Δ. As in (7.35),
(36) P_i{A and X(τ_n + s_m) = i_m for m = 0, …, M}
= P_i{A and X(τ_n + s_0) = i_0} · π.
But
{A and X(τ + s_0) = i_0} ⊃ lim sup_n {A and X(τ_n + s_0) = i_0}
by quasiregularity. So
P_i{A and X(τ + s_0) = i_0} ≥ lim sup_n P_i{A and X(τ_n + s_0) = i_0}.
By (33),
P_i{A and X(τ + s_0) = i_0} ≤ lim inf_n P_i{A and X(τ_n + s_0) = i_0}.

Thus, the right side of (36) converges to the right side of (35). Similarly for
the left sides. *
The next problem is controlling the sample functions of the post-τ process.
It is convenient to prove the more general lemma (37) before cleaning up the
post-τ process in (39). Proposition (39) is a preliminary form of the strong
Markov property (41), on the set {X(τ) ∈ I}.
(37) Lemma. Let Y be a jointly measurable, Ī-valued process on a probability
triple (Ω, ℱ, 𝒫). Suppose Y is a Markov chain with stationary transitions P
and starting state j ∈ I. Suppose:
(a) Y(·, w) is quasicontinuous from the right for all w;
(b) {t : Y(t, w) = k} is metrically perfect for all k ∈ I and all w.
Then the set of w such that Y(·, w) is quasiregular has inner 𝒫-probability 1.
PROOF. Let Ω_R be the set of w ∈ Ω such that Y(·, w) retracted to R is
quasiregular. As (17) shows, Ω_R ∈ ℱ and 𝒫{Ω_R} = 1. Let
Y*(t, w) = q-lim Y(r, w) as r ∈ R decreases to t
for all t ≥ 0 and w ∈ Ω_R. In particular, the Y* sample functions are quasi-
regular. As (17) shows,
𝒫{Y*(t) = Y(t)} = 1 for each t ≥ 0.
Let Ω_0 be the set of w ∈ Ω_R such that
Lebesgue {t : Y*(t, w) ≠ Y(t, w)} = 0.
As Fubini implies, 𝒫{Ω_0} = 1. The proof of (37) is accomplished by showing
that for all w ∈ Ω_0:
Y(t, w) = Y*(t, w) for all t ≥ 0.
Indeed,
(38) if Y*(t, w) = k ∈ I, then Y(t, w) = k,
because of (a). Conversely, suppose Y(t, w) = k ∈ I. Now (b) implies
Lebesgue D > 0 for any δ > 0, where
D = {s : t < s < t + δ and Y(s, w) = k}.
Because w ∈ Ω_0, there is an s ∈ D with Y*(s, w) = Y(s, w), that is, with
Y*(s, w) = k. Because Y*(·, w) is quasiregular, Y*(t, w) = k. I haven't
handled the value φ explicitly. But Y(t, w) ≠ Y*(t, w) implies that at least
one is in I, so the infinite case follows from the finite case. *
The final polish on this argument is due to Pedro Fernandez.

Let Δ_q be the set of w ∈ Δ such that Y(·, w) is quasiregular: where Y is the
post-τ process.
WARNING. Select a w such that Y(·, w) is quasiregular when retracted to
R. Even though Y(·, w) is quasicontinuous from the right, it is still possible
that
q-lim_{r↓t} Y(r, w) = φ while Y(t, w) ∈ I;
so w ∉ Δ_q. This hurts me more than you.
(39) Proposition. Let τ be Markov, and Δ = {τ < ∞}.
(a) Given Δ and X(τ) = j ∈ I, the pre-τ sigma field ℱ(τ+) and the post-τ
process Y are conditionally P_i-independent, Y being conditionally Markov with
stationary transitions P and starting state j.
(b) Δ_q is measurable.
(c) P_i{Δ_q | Δ and X(τ) = j ∈ I} = 1.

PROOF.† Claim (a). Use (34).
Claim (b). Let Δ_R be the set of w ∈ Δ such that Y(·, w) retracted to R
is quasiregular. As (17) shows, Δ_R is measurable. For w ∈ Δ_R, let
Y*(t, w) = q-lim Y(r, w) as r ∈ R decreases to t.
Then Y*(·, w) is quasiregular. Now w ∈ Δ_q iff w ∈ Δ_R and
Y(t, w) = Y*(t, w) for all t ≥ 0.
If w ∈ Δ_R, then automatically
Y(r, w) = Y*(r, w) for all r ∈ R.
Let Δ_Q be the set of w ∈ Δ_R such that for all r ∈ R, either r < τ(w) or
X(r, w) = Y*(r − τ(w), w).
Plainly, Δ_Q is measurable.
I say Δ_Q = Δ_q. To begin with, I will argue Δ_q ⊂ Δ_Q. Fix w ∈ Δ_q. Then
Y*(t, w) = Y(t, w) = X[τ(w) + t, w]
for t ≥ 0. If r ≥ τ(w), put t = r − τ(w), and get w ∈ Δ_Q. Next, I will argue
Δ_Q ⊂ Δ_q. Suppose w ∈ Δ_Q and t ≥ 0. Then w ∈ Δ_R, and all I have to get is
Y(t, w) = Y*(t, w). If Y*(t, w) = j ∈ I, there are r ∈ R close to t on the right
with Y*(r, w) = Y(r, w) = j; and Y(·, w) is quasicontinuous from the right,
so Y(t, w) = j. Suppose Y(t, w) = j ∈ I. There is a sequence r_n ∈ R with
r_n ↓ τ(w) + t and X(r_n, w) = j. Because w ∈ Δ_Q,
Y*(r_n − τ(w), w) = X(r_n, w) = j.
† I know this is a bad one, but it's plain sailing afterwards.

But r_n − τ(w) ↓ t and Y*(·, w) is quasicontinuous from the right, so

Y*(t, w) = j = Y(t, w). The case of infinite values follows by logic. This
completes the argument that Δ_Q = Δ_q, and shows Δ_q is measurable.
Claim (c). Use (a–b) and (37). If w ∈ Ω_m, as defined in (27), then con-
dition (37b) holds; and P_i{Ω_m} = 1 by (28). *
Let Ω̄ be the set of all functions from R to Ī. Let X̄(r, w) = w(r) for
r ∈ R and w ∈ Ω̄. Give Ω̄ the smallest σ-field over which all X̄(r) are measur-
able. Here is the shift mapping S from Δ to Ω̄:
(40) X̄(r, Sw) = Y(r, w) = X[τ(w) + r, w] for all r ∈ R.
You should check that S is measurable.
Here is the strong Markov property on the set {X(τ) ∈ I}.
(41) Theorem. Suppose τ is Markov. Let Δ = {τ < ∞} and let Y be the
post-τ process. Remember that Δ_q is the set of w ∈ Δ such that Y(·, w) is quasi-
regular. Define the shift S by (40).
(a) P_i{Δ_q | Δ and X(τ) = j ∈ I} = 1.
(b) If w ∈ Δ_q, then Y = X ∘ S; that is,
Y(t, w) = X(t, Sw) for all t ≥ 0.
(c) Suppose A ∈ ℱ(τ+) and A ⊂ {Δ and X(τ) = j ∈ I}. Suppose B is a
measurable subset of Ω. Then
P_i{A and S ∈ B} = P_i{A} · P_j{B}.
(d) Given Δ and X(τ) = j ∈ I, the pre-τ sigma field ℱ(τ+) is conditionally
P_i-independent of the shift S, and the conditional P_i-distribution of S is P_j.
(e) Let ℱ be the product σ-field in Ω relativized to Ω_q. Let F be a non-
negative, measurable function on the cartesian product
(Ω_q, ℱ(τ+)) × (Ω_q, ℱ).
Let j ∈ I, and
D = {Δ_q and X(τ) = j}.
Then

∫_D F(w, Sw) P_i(dw) = ∫_D F*(w) P_i(dw),

where

F*(w) = ∫_{Ω_q} F(w, w′) P_j(dw′).

NOTE. Claims (b–e) make sense, if you visualize Ω as this subset of Ω̄:
{w : w ∈ Ω̄ and w(r) ∈ I for all r ∈ R}.
Then S maps Δ_q into Ω.

PROOF. Claim (a). Use (39b, c).

Claim (b). Use the definitions.
Claim (c). Use (34) to handle the special B, of the form
{X(s_m) = i_m for m = 0, …, M},
with 0 ≤ s_0 < s_1 < ⋯ < s_M and i_0, i_1, …, i_M in I. Then use (10.16).
Claim (d). Use (c).
Claim (e). When F(w, w′) = 1_A(w) · 1_B(w′) this reduces to claim (c).
Now extend. *
For a general discussion of (e), see (10.44).
Remember Ω̄ is the set of all functions from R to Ī; and X̄(r, w) = w(r)
for w ∈ Ω̄ and r ∈ R; and Ω̄ is endowed with the smallest σ-field over which
all X̄(r) are measurable.
Let f be a function from R to Ī. Say f is quasiregular on (0, ∞) iff:
(a) f(r) ∈ I for all r > 0;
(b) q-lim f(r) exists as r ∈ R decreases to t for all t ≥ 0, and is f(t) for
t ∈ R;

(c) q-lim f(r) exists as r ∈ R increases to t for all t > 0, and is f(t) for
positive t ∈ R.
Let f be a function from [0, ∞) to Ī. Say f is quasiregular on (0, ∞) iff: f
retracted to R is quasiregular on (0, ∞), and
f(t) = q-lim f(r) as r ∈ R decreases to t
for all t ≥ 0. Let Ω̄_q be the set of w ∈ Ω̄ which are quasiregular on (0, ∞).
CRITERION. Let w ∈ Ω̄. Then w ∈ Ω̄_q iff w(r + ·) ∈ Ω_q for all positive
r ∈ R, and
w(0) = q-lim {w(s) : s ∈ R and s ↓ 0}.
As (8) implies, Ω̄_q is measurable. For w ∈ Ω̄_q, let
(42) X̄(t, w) = q-lim X̄(r, w) as r ∈ R decreases to t.
Introduce the class P of probabilities μ on Ω̄, having the properties:
(43a) μ{X̄(r) ∈ I} = 1 for all positive r ∈ R;
and
(43b) μ{X̄(r_n) = i_n for n = 0, …, N}
= μ{X̄(r_0) = i_0} · ∏_{n=0}^{N−1} P(r_{n+1} − r_n, i_n, i_{n+1})
for all nonnegative integers N, and 0 ≤ r_0 < r_1 < ⋯ < r_N in R, and
i_0, …, i_N in I. By convention, an empty product is 1.

CRITERION. μ ∈ P iff for all positive r ∈ R,

μ{X̄(r) ∈ I} = 1,
and the μ-distribution of {X̄(r + s) : s ∈ R} is
Σ_{j∈I} μ{X̄(r) = j} · P_j.
This makes sense, because {s → X̄(r + s, w) : s ∈ R} is an element of Ω for
μ-almost all w ∈ Ω̄. And P_j acts on Ω.
The results of this chapter extend to all μ ∈ P in a fairly obvious way, with
X̄ replacing X. In particular, μ ∈ P concentrates on Ω̄_q by (13). And (16)
shows that μ ∈ P iff relative to μ, the process {X̄(t) : 0 ≤ t < ∞} is finitary
and Markov on (0, ∞) with stationary transitions P, as defined in (7.18).
(44) Proposition. Suppose τ is Markov. Let Δ = {τ < ∞}, and let Y be the
post-τ process. Let Δ̄_q be the set of w ∈ Δ such that Y(·, w) is quasiregular on
(0, ∞).
(a) With respect to P_i{· | Δ}, the post-τ process Y is a Markov chain on
(0, ∞), with stationary transitions P.
(b) P_i{Y(t) ∈ I | Δ} = 1 for t > 0, so Y is finitary.
(c) Δ̄_q is measurable.
(d) P_i{Δ̄_q | Δ} = 1.
PROOF. Claim (a) follows from (34): put A = Δ.
Claim (b) follows from (a), (26), and (7.20).
Claims (c, d). Let R⁺ be the set of positive r ∈ R. Let G_r be the set of
w ∈ Δ such that Y(r + ·, w) is quasiregular. Let

Y(0+, w) = q-lim {Y(r, w) : r ∈ R and r ↓ 0}

and

H = {w : w ∈ Δ and Y(0, w) = Y(0+, w)}.
Then
Δ̄_q = H ∩ (∩_{r∈R⁺} G_r).
Clearly, H is measurable. Because Y(·, w) is quasicontinuous from the
right, Y(0+, w) ∈ I implies Y(0, w) = Y(0+, w). Proposition (39c) implies
P_i{Y(0, w) ∈ I and Y(0+, w) ≠ Y(0, w) and Δ} = 0.
If two elements of Ī are unequal, not both are φ; so
P_i{H | Δ} = 1.
Fix r ∈ R⁺. Then τ + r is Markov, so G_r is measurable by (39b) on τ + r.
And
P_i{Y(r) ∈ I | Δ} = 1
by (b), so I can use (39c) on τ + r to get
P_i{G_r | Δ} = 1. *
Here is the strong Markov property.
(45) Theorem. Suppose τ is Markov. Let Δ = {τ < ∞}, and let Y be the post-τ process. Let Δ_q be the set of ω ∈ Δ such that Y(·, ω) is quasiregular on (0, ∞). Define the shift S by (40).
(a) Δ_q is measurable and P_i{Δ_q | Δ} = 1.
(b) If ω ∈ Δ_q, then Y = X ∘ S; that is,
Y(t, ω) = X(t, Sω) for all t ≥ 0.
On Δ, let Q(·, ·) be a regular conditional P_i-distribution for S given ℱ(τ+). Remember that P is the set of probabilities on Ω̄ which satisfy (43). Let Δ_P be the set of ω ∈ Δ such that Q(ω, ·) ∈ P.
(c) Δ_P ∈ ℱ(τ+), and P_i{Δ_P | Δ} = 1.
PROOF. Claim (a) repeats (44c–d).
Claim (b) follows from the definitions.
Claim (c) follows from (44b) and (34), as in (7.41). *
ASIDE. Let ℱ_Y(0+) = ∩_{ε>0} ℱ_Y(ε), where ℱ_Y(ε) is the σ-field spanned by Y(t) for 0 ≤ t ≤ ε and all measurable subsets of Ω_q \ Δ. Given ℱ_Y(0+), on Δ the process Y and the σ-field ℱ(τ+) are conditionally P_i-independent. This generalizes (41d).
NOTE. Strong Markov (41, 45) holds with μ ∈ P in place of P_i: review the proofs.
For another treatment of strong Markov, see (Chung, 1960, II.9); and for a complete strong Markov property on {X(τ) ∈ I}, see (Doob, 1968).

5. THE POST-EXIT PROCESS

For ω ∈ Ω_q, let
τ(ω) = inf {t: X(t, ω) ≠ X(0, ω)}.
Plainly, τ is Markov. Let
Y(t, ω) = X[τ(ω) + t, ω] if τ(ω) < ∞,
the post-exit process. Following Section 7.3, let i ∈ I and 0 < q(i) < ∞. Let Ω_i be the set of ω ∈ Ω_q with X(0, ω) = i and τ(ω) < ∞. Let
Γ(i, j) = Q(i, j)/q(i) for j ≠ i,
Γ(i, j) = 0 for j = i.

(46) Theorem. (a) P_i{τ > t} = e^{−q(i)t}, so P_i{Ω_i} = 1.
(b) τ and Y are independent.
(c) P_i{Y(0) = j} = Γ(i, j).
(d) Y is Markov and finitary on (0, ∞), with stationary transitions P and almost all sample functions quasiregular on (0, ∞).

PROOF. Claim (a) follows from (18).
Claim (d) follows from (44c–d).
Claims (b) and (c) are proved as in (7.21), with a new difficulty for instantaneous j. Let τ_n be the least m/2ⁿ greater than τ. Let σ_n be the least m/2ⁿ with X(m/2ⁿ) ≠ i. Now σ_n ↓ τ, but X(σ_n) need not converge to X(τ) if X(τ) is not a stable state. To overcome this difficulty, suppose ω ∈ Ω(i), as defined in (19). Check that τ_n(ω) = σ_n(ω) for all large enough n, using (20). So P_i{τ_n = σ_n} → 1. Consequently, (33) implies
P_i{X(τ_n + t) = j and X(σ_n + t) ≠ j} → 0. *
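As a numerical aside (not from the text), the exponential law in (46a) has a familiar discrete caricature: if the chain leaves i on each step of length dt with probability q(i)·dt, independently of the past, the holding time is geometric and approaches the exponential law as dt shrinks. The rate q, the step dt, and the sample size below are arbitrary choices for the illustration.

```python
import math
import random

# Sketch of (46a): with exit probability q*dt per step of length dt,
# the holding time is dt times a geometric variable, close to exponential(q).
# q, dt, and the sample size are illustrative assumptions, not from the text.
random.seed(1)
q, dt, n = 2.0, 0.002, 20_000

def holding_time():
    t = 0.0
    while random.random() >= q * dt:   # stay put with probability 1 - q*dt
        t += dt
    return t

taus = [holding_time() for _ in range(n)]
t0 = 0.5
empirical = sum(1 for x in taus if x > t0) / n
exact = math.exp(-q * t0)              # P_i{tau > t0} from (46a)
assert abs(empirical - exact) < 0.02
```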
(47) Remark. (7.46) continues to hold, supposing q(i_1), …, q(i_n) positive and finite. The proof still works.
Similarly, (7.33) holds with the proper convention. Let {ξ_n, τ_n} be the successive jumps and holding times in X, so far as they are defined. That is, ξ_0 = X(0) and
τ_0 = inf {t: X(t) ≠ ξ_0}.
Put σ_{−1} = −∞ and σ_0 = 0 and
σ_n = τ_0 + ⋯ + τ_{n−1} for n ≥ 1.
Suppose ξ_0, …, ξ_n and τ_0, …, τ_n are all defined. If τ_n is 0 or ∞, then ξ_{n+1}, ξ_{n+2}, … as well as τ_{n+1}, τ_{n+2}, … are left undefined. If 0 < τ_n < ∞ and X(σ_{n+1}) = φ, then ξ_{n+1} = φ; but ξ_{n+2}, ξ_{n+3}, … and τ_{n+1}, τ_{n+2}, … are left undefined. If 0 < τ_n < ∞ and X(σ_{n+1}) ∈ I, then
ξ_{n+1} = X(σ_{n+1})
τ_{n+1} + σ_{n+1} = inf {t: t > σ_{n+1} and X(t) ≠ ξ_{n+1}}.
Here is an inductive measurability check. First, σ_0 = 0 and ξ_0 = X(0) are measurable. Next, σ_{n+1} < t iff ξ_n ∈ I and there is a binary rational r with
σ_{n−1} < σ_n < r < t and X(r) ≠ ξ_n.
Finally, ξ_{n+1} = j ∈ I iff: σ_{n+1} < ∞, and for any binary rational r bigger than σ_{n+1}, there is a binary rational s with
σ_n < σ_{n+1} < s < r and X(s) = j.

Recall Q = P′(0) and q(i) = −Q(i, i). Let
Γ(i, j) = Q(i, j)/q(i) for 0 < q(i) < ∞ and i ≠ j
= 0 elsewhere.
(48) Theorem. Let i_0 = i, i_1, …, i_N be in I, and let t_0, …, t_N be nonnegative numbers. Then
P_i{ξ_n = i_n and τ_n ≥ t_n for n = 0, …, N} = π e^{−t},
where
π = Π_{n=1}^{N} Γ(i_{n−1}, i_n)
and
t = Σ_{n=0}^{N} q(i_n)t_n.
Here, ∞·0 = 0 and ∞ + 0 = ∞; while ∞·x = ∞ + x = ∞ for x > 0; and e^{−∞} = 0. In particular, if ξ_n is defined and absorbing, then τ_n = ∞ a.e.; if ξ_n is defined and instantaneous, then τ_n = 0 a.e.; in either case, further ξ's or τ's are defined almost nowhere. If ξ_n is defined and stable but not absorbing, then 0 < τ_n < ∞ a.e. The proof of (7.33) still works.
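Continuing the numerical aside, the jump probabilities Γ(i, j) = Q(i, j)/q(i) have a competing-clocks interpretation, sketched below for an invented three-state generator. The states, rates, and sample size are assumptions for the illustration only.

```python
import random

# Numerical sketch of Gamma(i,j) = Q(i,j)/q(i) for a chain whose states are
# all stable; the generator rates below are an invented example.
random.seed(2)
Q = {('a', 'b'): 1.0, ('a', 'c'): 3.0,
     ('b', 'a'): 2.0, ('b', 'c'): 2.0,
     ('c', 'a'): 0.5, ('c', 'b'): 0.5}
q = {'a': 4.0, 'b': 4.0, 'c': 1.0}   # q(i) = -Q(i,i) = sum of off-diagonal rates

def first_jump(i):
    # competing exponential alarm clocks: the first clock to ring wins,
    # so the chain jumps to j with probability Q(i,j)/q(i)
    clocks = {j: random.expovariate(rate) for (k, j), rate in Q.items() if k == i}
    return min(clocks, key=clocks.get)

n = 50_000
hits = sum(1 for _ in range(n) if first_jump('a') == 'c')
assert abs(hits / n - Q[('a', 'c')] / q['a']) < 0.01   # Gamma(a,c) = 3/4
```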

The backward and forward equations

The results of Section 7.6 can be extended to the general case. For (7.52–58) on the backward equation, assume i is stable. The argument is about the same, because (7.21) works in general (46). For (7.60–68) on the forward equation, assume j is stable. To handle (7.66), let D be the complement of
X(s) = X(t) for 0 ≤ s ≤ t.
On D, let γ be the sup of s with X(s) ≠ X(t), and let
X(γ−) = q-lim_{s↑γ} X(s).
Rescue the argument by using the idea of (33) on the reversed process
{X(t − s): 0 ≤ s ≤ t},
which is Markov with nonstationary transitions. Details for this maneuver appear in (Chung, 1967, pp. 198–199). You will have to check that given X(t) = j, the time t is almost surely interior to a j-interval, by adapting the proof of (7.6). For (7.68), say X jumps from φ to j at time r iff
X(r) = j and X(r−) = q-lim_{s↑r} X(s) = φ.

6. THE ABSTRACT CASE

The results of this chapter, notably (41–48), can be applied to abstract Markov chains. Let (𝒳, 𝒜, p) be an abstract probability triple, and {Z(t): 0 ≤ t < ∞} an I-valued process on (𝒳, 𝒜). Suppose Z is a Markov chain with stationary transitions P and starting state i, relative to p. Suppose that all the sample functions of Z are quasiregular: if not, Z can be modified using (17). There is no difficulty in transcribing (46–48) to this situation: see (5.46–48) for the style. Strong Markov is something else. Probably the easiest thing to do is to review the proof and make sure it still works. For ideological reasons, I will use the Chapter 5 approach.
Let 𝒜(t) be the σ-field in 𝒳 spanned by Z(s) for 0 ≤ s ≤ t. Let α be a random variable on 𝒳, with values in [0, ∞]. Suppose α is Markov for Z, namely {α < t} ∈ 𝒜(t) for all t. Let 𝒜(α+) be the σ-field of all A ∈ 𝒜 such that A ∩ {α < t} ∈ 𝒜(t) for all t. Let W be the post-α process:
W(t, x) = Z[α(x) + t, x] when α(x) < ∞.
Let Tx ∈ Ω̄ be the function W(·, x) retracted to the binary rationals R in [0, ∞). Define P by (43). The object is to prove:
(49) Theorem. Let Q(·, ·) be a regular conditional p-distribution for T given 𝒜(α+). Let 𝒳_q be the set of x ∈ {α < ∞} such that W(·, x) is quasiregular on (0, ∞).
(a) 𝒳_q ∈ 𝒜 and p{𝒳_q | α < ∞} = 1.
(b) If x ∈ 𝒳_q, then W = X ∘ T; that is, W(t, x) = X(t, Tx) for all t ≥ 0; where X was defined by (42).
(c) p{Q ∈ P | α < ∞} = 1.
(d) Given {α < ∞} and Z(α) = j ∈ I, the pre-α sigma field 𝒜(α+) is conditionally p-independent of the shift T, and the conditional p-distribution of T is P_j.
(e) Let 𝒢 be the product σ-field in Ω̄ relativized to Ω_q. Let F be a nonnegative, measurable function on the cartesian product
(𝒳, 𝒜(α+)) × (Ω_q, 𝒢).
Let j ∈ I and
D = {𝒳_q and Z(α) = j}.
Then
∫_D F(x, Tx) p(dx) = ∫_D F*(x) p(dx),
where
F*(x) = ∫_{Ω_q} F(x, ω) P_j(dω).
PROOF. Use (41), (45), and (50) below. *
PROOF. Use (41), (45), and (50) below.
(50) Proposition. Let M be this mapping from 𝒳 to Ω̄:
X(r, Mx) = Z(r, x) for all r ∈ R.
There is a Markov time τ on Ω̄ such that α = τ ∘ M. Then
{α < ∞} ∩ 𝒜(α+) = M⁻¹[{τ < ∞} ∩ ℱ(τ+)].
Let Y be the post-τ process on Ω_q, and let S be Y with time domain retracted to R. Then
W(t, x) = Y(t, Mx)
and
Tx = SMx
for all x ∈ 𝒳.
PROOF. The first problem is to find a Markov time τ on Ω̄ such that α = τ ∘ M. Let A ∈ 𝒜(α+) with A ⊂ {α < ∞}. The second problem is to find B ∈ ℱ(τ+) with B ⊂ {τ < ∞} and A = M⁻¹B. The rest is easy. To start the constructions, let 𝒜(∞) be the σ-field spanned by Z, and let ℱ(∞) be the full σ-field in Ω̄, namely the σ-field spanned by X. Check
𝒜(t) = M⁻¹ℱ(t) for 0 ≤ t ≤ ∞.
I remind you that M⁻¹ commutes with set operations. Start work on τ. Confine r and s to R. Now {α < r} ∈ 𝒜(r), so {α < r} = M⁻¹F_r for some F_r ∈ ℱ(r). Let
G_r = ∪_{s<r} F_s.
Then G_r ∈ ℱ(r) and {α < r} = M⁻¹G_r. Moreover,
G_r = ∪_{s<r} G_s.
Let G = ∪_r G_r. Off G, let τ = ∞. For ω ∈ G, let τ(ω) be the sup of r with ω ∉ G_r. You should check that {τ < r} = G_r. So τ is Markov, because {τ < t} = ∪_{r<t} {τ < r}. And τ ∘ M = α, because τ ∘ M < r iff α < r.
Turn to the second problem. Let A ∈ 𝒜(α+) and A ⊂ {α < ∞}. Now A ∩ {α < r} ∈ 𝒜(r), so A ∩ {α < r} = M⁻¹H_r for some H_r ∈ ℱ(r). Let
J_r = H_r ∩ {τ < r}
K_r = ∪_{s<r} {τ < s} \ J_s
B_r = J_r \ K_r
B = ∪_r B_r.

Check A = M⁻¹B. I claim B ∈ ℱ(τ+). I only have to prove
B ∩ {τ < t} = ∪_{r<t} B_r.
Clearly, r < t makes B_r ⊂ {τ < t}, so
B ∩ {τ < t} ⊃ ∪_{r<t} B_r.
For the opposite inclusion, fix ω ∈ B ∩ {τ < t}. Choose s with τ(ω) < s < t. Now ω ∈ B_r for some r; if r < t, you're done; suppose r ≥ t. But ω ∉ K_r and τ(ω) < s, so ω ∈ J_s. And K is nondecreasing, so ω ∉ K_s. That is, ω ∈ B_s. The rest is haggling. *
The final version of this proof is due to Mike Orkin.
QUESTION. Can you lift a strict Markov time this way?
10

APPENDIX

1. NOTATION

* is the end of a proof, or a discussion.
iff is if and only if.
A \ B is the set of points in A but not in B.
A Δ B = (A \ B) ∪ (B \ A).
∅ is the empty set.
∘ is composition.

If ℱ is a σ-field and A ∈ ℱ, then Aℱ is the σ-field of subsets of A of the form A ∩ F, with F ∈ ℱ.
X is measurable on Y means that X is measurable relative to any σ-field over which Y is measurable. Usually, this means you can compute X in a measurable way from Y.
If π is a statement about points x, then
{π} = {x: π(x)} = [π] = [x: π(x)]
is the set of x for which π(x) is true. And
Σ_x {a_x: π(x)} = Σ {a_x: π(x)}
is the sum of a_x over all x for which π(x) is true.


An empty sum is 0; an empty product is 1.
a_n = O(b_n) means: there are finite K and N such that
|a_n| ≤ K|b_n| for all n ≥ N.
a_n = o(b_n) means: for any positive ε, there is a finite N_ε such that
|a_n| ≤ ε|b_n| for all n ≥ N_ε.

I want to thank Allan Izenman for checking the final draft of this chapter.

a_n ≍ b_n means: there are finite, positive K and N such that
(1/K)·b_n ≤ a_n ≤ K·b_n for all n ≥ N.
a_n ~ b_n means: for any ε with 0 < ε < 1, there is a finite N_ε such that
(1 − ε)b_n ≤ a_n ≤ (1 + ε)b_n for all n ≥ N_ε.
[0, 1) = {x: 0 ≤ x < 1}.
(x) means the greatest integer n ≤ x.
x is positive means x > 0, while x is nonnegative means x ≥ 0. When it seems desirable, the redundancy "x is strictly positive" is employed. Similarly for increasing and nondecreasing.
Real-valued means in (−∞, ∞), while extended real-valued means in [−∞, ∞]. Random variables are allowed to take infinite values without explicit warning.
Clearly usually means that the assertion which follows is clear. Sometimes, by force of habit, it means that I didn't feel like writing out the argument.
Let f be a real-valued function on S × T. Then f(s) = f(s, ·) is the real-valued function t → f(s, t) on T, while f(t) = f(·, t) is the real-valued function s → f(s, t) on S. Furthermore, f is used indifferently for the real-valued function (s, t) → f(s, t) on S × T, the function-valued mapping s → f(s) on S, and the function-valued mapping t → f(t) on T. Whenever this threatens to get out of hand, some explanation is provided.

2. NUMBERING

In each chapter, all important formulas, definitions, theorems and so on


are treated as displays and numbered consecutively from 1 on. Inside chapter a: display (b) means the display numbered b in chapter a; for a′ ≠ a, display (a′.b) means the display numbered b in chapter a′. Section a.b is section b of chapter a. And (MC, a.b) is the display numbered b in chapter a of MC; this kind of reference is used in B & D and ACM.

3. BIBLIOGRAPHY

(Blackwell, 1958) and Blackwell (1958) refer to the work of Blackwell listed in the bibliography with year of publication 1958. The obvious problem is settled by this device: (Lévy, 1954a). Each entry in the bibliography gives the edition I used when writing the book. In certain cases, notably (Chung, 1960), a more recent edition is now available. When this is known to me, the newer edition is referred to in parentheses following the main entry. Journals are abbreviated following Math. Rev. practice. I do not give references to my own articles.
This book is part of a trilogy, published by Holden-Day at San Francisco in 1970. The titles, and their abbreviations, are:
Markov Chains (MC)
Brownian Motion and Diffusion (B & D)
Approximating Countable Markov Chains (ACM)

4. THE ABSTRACT LEBESGUE INTEGRAL†

As usual, Ω is a set, and ℱ is a σ-field of subsets of Ω; that is, Ω ∈ ℱ and ℱ is closed under complementation and the formation of countable unions. A probability 𝒫 on ℱ is a countably additive, nonnegative function, with 𝒫(Ω) = 1. Then (Ω, ℱ, 𝒫) is called a probability triple. If 𝒫(Ω) ≤ 1, then 𝒫 is a subprobability. On occasion, reference will be made to the inner measure 𝒫_*(A) of A ⊂ Ω, which is sup {𝒫(F): F ∈ ℱ and F ⊂ A}. Similarly, the outer measure 𝒫*(A) = inf {𝒫(F): F ∈ ℱ and A ⊂ F}.
The indicator function 1_A is 1 on A and 0 on Ω \ A, the set of points in Ω but not in A. A random variable X is an ℱ-measurable function from Ω to the real line: that is,
[X ≤ x] = {X ≤ x} = {ω: ω ∈ Ω and X(ω) ≤ x} ∈ ℱ
for all x. If X takes infinite values, it will sometimes, but not always, be described as extended real-valued. A partially defined random variable is a random variable on (A, Aℱ), where A ∈ ℱ and Aℱ = {A ∩ B: B ∈ ℱ}.
For X ≥ 0, the expectation (or 𝒫-expectation, if several probabilities are in sight) of X is a nonnegative, extended real number E(X):
E(1_A) = 𝒫(A); and E(αX + βY) = αE(X) + βE(Y);
and
(1) Monotone convergence theorem. If 0 ≤ X_n ↑ X, then E(X_n) ↑ E(X).
Here α ≥ 0 and β ≥ 0; and (αX + βY)(ω) = αX(ω) + βY(ω); and X_n ↑ X means that X_n(ω) is nondecreasing with n and tends to X(ω) as n → ∞ for each ω ∈ Ω. Moreover,
(2) Fatou's lemma. If X_n ≥ 0, then E(lim inf X_n) ≤ lim inf E(X_n).
† References: (Loève, 1963, Part 1); (Neveu, 1965, Chapters 1 and 2).
If {A_n} is a sequence of sets, then lim sup A_n is the set of ω which are elements of infinitely many A_n; and lim inf A_n is the set of ω which are elements of all but finitely many A_n. Thus,


1_{lim inf A_n} = lim inf 1_{A_n} and 1_{lim sup A_n} = lim sup 1_{A_n}.
Furthermore, Ω \ (lim sup A_n) = lim inf (Ω \ A_n). By definition, A_n → A iff A = lim sup A_n = lim inf A_n. As (2) implies,
(3) Corollary. If A_n → A, then 𝒫(A_n) → 𝒫(A).
In general, X = X⁺ − X⁻, where X⁺ = X ∨ 0 = max {X, 0}, and
E(X) = ∫ X d𝒫 = ∫ X(ω) 𝒫(dω) = E(X⁺) − E(X⁻),
except ∞ − ∞ is taboo. Write E(1_A X) = ∫_A X = ∫_A X d𝒫.
(4) Dominated convergence theorem. If X_n → X and |X_n| ≤ Y for all n and E(Y) < ∞, then E(X_n) → E(X).
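A standard example (not from the text) shows that Fatou's lemma (2) can be strict, and why the domination hypothesis in (4) cannot be dropped: on [0, 1] with Lebesgue measure, take X_n = n·1_{(0, 1/n)}.

```python
from fractions import Fraction

# On [0,1] with Lebesgue measure, X_n = n * 1_(0, 1/n) has E(X_n) = 1 for
# every n, yet X_n(w) -> 0 at each fixed w; so E(lim inf X_n) = 0 while
# lim inf E(X_n) = 1, and no integrable Y dominates all the X_n.
def EX(n):
    return Fraction(n) * Fraction(1, n)   # E(X_n) = n * length of (0, 1/n)

assert all(EX(n) == 1 for n in range(1, 100))

# pointwise behavior at the fixed point w = 1/10: X_n(w) = n for n < 10,
# then 0 forever after, so the pointwise limit is 0
w = Fraction(1, 10)
values = [(n if w < Fraction(1, n) else 0) for n in range(1, 50)]
assert values[:9] == list(range(1, 10))
assert all(v == 0 for v in values[9:])
```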
If lim_{k→∞} ∫_{|X_n| ≥ k} |X_n| d𝒫 = 0 uniformly in n, the X_n are uniformly integrable.
(5) Criterion. {X_n} is uniformly integrable if either (a) or (b) holds:
(a) E(|X_n|^p) ≤ K for all n, for some p > 1 and K < ∞.
(b) 𝒫{|X_n| ≥ x} ≤ 𝒫{Y ≥ x} for all n and all x > K, where 0 ≤ K < ∞ and 0 ≤ Y and E(Y) < ∞.
(6) Theorem. If X_n → X and the X_n are uniformly integrable, then E(X_n) → E(X); in fact, E(|X − X_n|) → 0.
Easy, useful estimates:
(7) Chebychev inequality. 𝒫{X ≥ k} ≤ k⁻¹E(X) for X ≥ 0 and k > 0.
(8) Schwarz inequality. [E(XY)]² ≤ E(X²)·E(Y²). Equality holds iff 𝒫{Y = aX} = 1 for some constant a.
(9) Jensen's inequality. Suppose E(|X|) < ∞ and f is convex. Then E[f(X)] ≥ f[E(X)]. Equality holds iff f is linear on an interval [a, b] with 𝒫{a ≤ X ≤ b} = 1.
PROOF. Use (76) below. *
Useful miscellany:
(10) Theorem. (a) Let X and Y be random variables. Suppose
∫_A X d𝒫 = ∫_A Y d𝒫
for all measurable A. Then 𝒫{X = Y} = 1.

(b) E(|X|) < ∞ implies 𝒫{|X| < ∞} = 1.
(c) E(|X|) = 0 implies 𝒫{X = 0} = 1.
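Estimates (7) and (9) above can be checked mechanically on a small discrete distribution; the four-point distribution and the convex f below are invented examples.

```python
# Check of Chebychev (7) and Jensen (9) on an invented four-point law.
xs = [0.0, 1.0, 4.0, 9.0]
ps = [0.4, 0.3, 0.2, 0.1]
E = lambda f: sum(p * f(x) for x, p in zip(xs, ps))

mean = E(lambda x: x)                    # E(X) = 2.0
# Chebychev: P{X >= k} <= E(X)/k for X >= 0 and k > 0
for k in (1.0, 2.0, 5.0):
    tail = sum(p for x, p in zip(xs, ps) if x >= k)
    assert tail <= mean / k + 1e-12

# Jensen with the convex f(x) = x**2: E[f(X)] >= f[E(X)]
assert E(lambda x: x * x) >= mean ** 2
```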

Absolute continuity

Suppose P and Q are two probabilities on ℱ. Then P ≪ Q, or P is absolutely continuous with respect to Q, iff Q(A) = 0 implies P(A) = 0.
(11) Radon–Nikodym theorem. P ≪ Q iff there is a nonnegative, measurable, finite f with
P(A) = ∫_A f dQ.
This f is unique up to changes on null sets, by (10a), and is the Radon–Nikodym derivative of P with respect to Q. For any nonnegative measurable g,
∫ g dP = ∫ gf dQ.
Say P is orthogonal or singular with respect to Q, or P ⊥ Q, iff there is a set A with P(A) = 0 and Q(Ω \ A) = 0.
(12) Lebesgue decomposition. P = P_0 + P_1, where P_0 ≪ Q and P_1 ⊥ Q and the P_i are subprobabilities.
Say P is equivalent to Q, or P ≡ Q, iff P ≪ Q and Q ≪ P.
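On a countable space, the Radon–Nikodym derivative of (11) is just a ratio of point masses; the three-point example below is invented.

```python
from fractions import Fraction as F

# Discrete illustration of (11): f = dP/dQ is the ratio of point masses,
# and integrals of g against P equal integrals of g*f against Q.
# The space and the two measures are invented examples.
omega = ['a', 'b', 'c']
Q = {'a': F(1, 2), 'b': F(1, 4), 'c': F(1, 4)}
P = {'a': F(1, 4), 'b': F(1, 4), 'c': F(1, 2)}   # P << Q since Q > 0 everywhere

f = {w: P[w] / Q[w] for w in omega}              # the Radon-Nikodym derivative
g = {'a': 3, 'b': 5, 'c': 7}

lhs = sum(g[w] * P[w] for w in omega)            # integral of g dP
rhs = sum(g[w] * f[w] * Q[w] for w in omega)     # integral of g f dQ
assert lhs == rhs == F(11, 2)
```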

Convergence

Suppose X_n and X are finite almost surely. Say X_n converges to X, or X_n → X, in 𝒫-probability iff
𝒫{|X_n − X| ≥ ε} → 0
for any ε > 0. Say {X_n} is fundamental in probability iff
lim_{n,m→∞} 𝒫{|X_n − X_m| ≥ ε} = 0
for any ε > 0. For extended real-valued X_n and X, say X_n → X in 𝒫-probability iff for any positive, finite ε and K:
𝒫{|X_n − X| ≥ ε and |X| < ∞} → 0
𝒫{X_n ≤ K and X = ∞} → 0
𝒫{X_n ≥ −K and X = −∞} → 0.
(13) Theorem. Suppose {X_n} are finite almost surely and fundamental in

probability. Then there is a random variable X, also finite almost surely, such that X_n → X in probability. Conversely, if {X_n} and X are finite almost surely, and X_n → X in probability, then {X_n} is fundamental in probability.
Say X_n converges to X, or X_n → X, almost surely iff
𝒫{X_n → X as n → ∞} = 1.
(14) Theorem. If X_n → X almost surely, then X_n → X in probability. If X_n → X in probability, there is a nonrandom subsequence n* such that X_{n*} → X almost surely.
If π(ω) is a statement about ω, then π almost everywhere, or π 𝒫-a.e., or π almost surely means 𝒫{ω: π(ω) is false} = 0. Similarly, π a.e. on A means 𝒫{ω: π(ω) is false and ω ∈ A} = 0. Finally, π almost nowhere means that 𝒫{ω: π(ω) is true} = 0.

The L^p-spaces
A random variable X is in L^p relative to 𝒫 iff ∫ |X|^p d𝒫 < ∞. The p-th root of this number is the L^p-norm of X. A sequence X_n → X in L^p if the norm of X − X_n tends to 0. After identifying functions which are equal a.e., L^p is a Banach space for p ≥ 1. This popular fact is not used in the book.
The results of this section (except that uniform integrability gets more complicated) are usually true, and sometimes used, for measures 𝒫 which are not probabilities; a measure on ℱ is nonnegative and countably additive. In places like (11), you have to assume that 𝒫 is σ-finite:
Ω = ∪_{i=1}^∞ Ω_i with 𝒫(Ω_i) < ∞.
For the rest, suppose 𝒫 is a probability; although I occasionally use converse Fubini (22) for σ-finite measures.
5. ATOMS*

* References: (Blackwell, 1954); (Loève, 1963, Secs. 1.6, 25.3 and 26.2).

If Σ is a σ-field of subsets of Ω and ω ∈ Ω, the Σ-atom containing ω is Σ(ω), the intersection of all the Σ-sets containing ω. Say Σ is separable or countably generated iff it is the smallest σ-field which includes some countable collection 𝒞 of sets. In this case, let 𝒜 be the smallest field containing 𝒞. Namely, Ω ∈ 𝒜; and 𝒜 is closed under the formation of complements and finite unions. Then 𝒜 is countable and generates Σ. That is, Σ is the smallest σ-field which includes 𝒜. Let 𝒜(ω) be the intersection of all the 𝒜-sets containing ω, which by definition is the 𝒜-atom containing ω. Then 𝒜(ω) ∈ Σ, and 𝒜(ω) = Σ(ω). Indeed, 𝒜(ω) is wholly included in or wholly disjoint from each 𝒜-set. By the monotone class argument, which I will make in a second, this goes for Σ as well.
Call M a monotone class iff:
(a) A_n ∈ M and A_1 ⊂ A_2 ⊂ ⋯ imply ∪_n A_n ∈ M; and
(b) A_n ∈ M and A_1 ⊃ A_2 ⊃ ⋯ imply ∩_n A_n ∈ M.
The monotone class argument. Let M be the set of A ∈ Σ such that 𝒜(ω) is wholly included in or wholly disjoint from A. Then M is monotone and includes 𝒜. Now (15) below implies M ⊃ Σ.
(15) Lemma. The smallest monotone class which includes a field 𝒜 coincides with the smallest σ-field which includes 𝒜.
(16) Theorem. Let 𝒞 be a collection of sets which is closed under intersection and generates the σ-field ℱ. Let P and Q be two subprobabilities on ℱ, which agree on 𝒞. Suppose Ω is a countable union of pairwise disjoint elements of 𝒞, or more generally that P(Ω) = Q(Ω). Then P = Q on ℱ.
PROOF. Let ℰ be the class of A ∈ ℱ with P(A) = Q(A). Clearly, Ω ∈ ℰ and 𝒞 ⊂ ℰ. If A ∈ ℰ and B ∈ ℰ and A ⊃ B, then A \ B ∈ ℰ because R(A \ B) = R(A) − R(B) for R = P or Q. If A ∈ ℰ and B ∈ ℰ and A ∩ B = ∅, then A ∪ B ∈ ℰ, for a similar reason.
If A_i ∈ 𝒞 for i = 1, …, n, then B_n = ∪_{i=1}^n A_i ∈ ℰ by induction on n. The case n = 1 is clear. And
B_{n+1} = B_n ∪ (A_{n+1} \ B_n)
= B_n ∪ (A_{n+1} \ C_n)
where
C_n = A_{n+1} ∩ B_n = ∪_{i=1}^n (A_{n+1} ∩ A_i).
Now A_{n+1} ∩ A_i ∈ 𝒞, because 𝒞 was assumed closed under intersection. So C_n, being the union of n sets in 𝒞, is in ℰ by inductive assumption. But C_n ⊂ A_{n+1}, so A_{n+1} \ C_n ∈ ℰ. Finally, A_{n+1} \ C_n is disjoint from B_n, so its union B_{n+1} with B_n is in ℰ. This completes the induction.
Let A* = A or Ω \ A. If A_i ∈ 𝒞 for i = 1, …, n, I will get
B = ∩_{i=1}^n A_i* ∈ ℰ.
Using the assumption that 𝒞 is closed under intersection, you can rewrite B as
B = D \ (C_1 ∪ ⋯ ∪ C_m)
with C_1, …, C_m and D in 𝒞. Let C = ∪_{i=1}^m C_i. Then
B = D \ C
= D \ (C ∩ D).
Now C ∩ D ⊂ D and C ∩ D = ∪_{i=1}^m (C_i ∩ D) ∈ ℰ, because C_i ∩ D ∈ 𝒞. This forces B ∈ ℰ.
If A_i ∈ 𝒞 for i = 1, …, n, the field generated by A_1, …, A_n is included in ℰ; the typical atom was displayed above as B ∈ ℰ, and any nonempty set in the field is a disjoint union of some atoms. Consequently, the field 𝒜 generated by 𝒞 is included in ℰ. Of course, 𝒜 generates ℱ. Now use the monotone class argument: ℰ is a monotone class, and includes 𝒜; so ℰ includes the smallest monotone class which includes 𝒜, namely ℱ. *
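The hypothesis in (16) that the class is closed under intersection cannot be dropped. A standard four-point counterexample (not from the text):

```python
# On {1,2,3,4}, the class C = {{1,2}, {1,3}} is NOT closed under
# intersection, and the two probabilities below agree on C and on the
# whole space, yet differ on the generated sigma-field. The numbers
# form a standard counterexample, invented for this illustration.
P = lambda A, m: sum(m[w] for w in A)
p = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
q = {1: 0.50, 2: 0.00, 3: 0.00, 4: 0.50}

for A in [{1, 2}, {1, 3}, {1, 2, 3, 4}]:
    assert P(A, p) == P(A, q)      # agreement on C and on Omega

assert P({1}, p) != P({1}, q)      # {1} = {1,2} intersect {1,3} separates them
```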
Let Σ be a σ-field of subsets of Ω. The set of probabilities on Σ is convex: the convex combination of two probabilities is again a probability. A probability is extreme iff it cannot be represented as the convex combination of two distinct probabilities. A probability is 0–1 iff it only assumes the values 0 and 1; sometimes, Σ is called trivial relative to such a probability.
(17) Theorem. Let m be a probability on Σ.
(a) m is extreme iff m is 0–1.
(b) Suppose Σ is countably generated. Then m is 0–1 iff m(B) = 1 for some atom B of Σ.
PROOF. Claim (a). Suppose m is not 0–1. Then 0 < m(A) < 1 for some A ∈ Σ. And
m = m(A)·m(· | A) + [1 − m(A)]·m(· | Ω \ A)
is not extreme. Conversely, suppose m is not extreme. Then
m = pm′ + (1 − p)m″
for 0 < p < 1 and m′ ≠ m″. Find A ∈ Σ with m′(A) ≠ m″(A). Conclude 0 < m(A) < 1, and m is not 0–1.
Claim (b). The if part is easy. For only if, let 𝒜 be a countable generating field for Σ. Let 𝒜_i be the set of A ∈ 𝒜 such that m(A) = i, for i = 0 or 1. Now 𝒜 = 𝒜_0 ∪ 𝒜_1, and A ∈ 𝒜_0 iff Ω \ A ∈ 𝒜_1. Let B be the intersection of all A ∈ 𝒜_1. Then B ∈ Σ has m-probability 1, and in particular is nonempty. Fix ω ∈ B. If A ∈ 𝒜_1, then A ⊃ B. If A ∈ 𝒜_0, then Ω \ A ∈ 𝒜_1 and A ∩ B = ∅. Thus, ω ∈ A ∈ 𝒜 iff A ∈ 𝒜_1, and B is an atom. *
Say X is a measurable mapping from (Ω, ℱ) to (Ω′, ℱ′) iff X is a function from Ω to Ω′ and
X⁻¹A ∈ ℱ for each A ∈ ℱ′.

(18) Theorem. Let X be a measurable mapping from (Ω, ℱ) to (Ω′, ℱ′). If ℱ′ is countably generated, so is X⁻¹ℱ′. The atoms of X⁻¹ℱ′ are precisely the X-inverse images of the atoms of ℱ′.

6. INDEPENDENCE†

Let (Ω, ℱ, 𝒫) be a probability triple. Sub-σ-fields ℱ_1, ℱ_2, … are independent (with respect to 𝒫) iff A_i ∈ ℱ_i implies
𝒫(A_1 ∩ A_2 ∩ ⋯) = 𝒫(A_1)·𝒫(A_2)⋯.
Random variables X_1, X_2, … are independent iff the σ-fields they span are independent; the σ-field spanned or generated by X_i is the smallest σ-field with respect to which X_i is measurable. Sets A, B, … are independent iff 1_A, 1_B, … are independent.
(19) Borel–Cantelli lemma. (a) Σ𝒫{A_n} < ∞ implies 𝒫{lim sup A_n} = 0;
(b) Σ𝒫{A_n} = ∞ and A_1, A_2, … independent implies 𝒫{lim sup A_n} = 1.
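A Monte Carlo sketch of (19a), with invented parameters: when 𝒫{A_n} = n⁻², the series converges, and along a sample path the A_n stop occurring early.

```python
import random

# Borel-Cantelli (19a) in simulation: independent A_n with P{A_n} = n**-2.
# Since the probabilities are summable, almost every path sees only finitely
# many A_n; the horizon and trial count below are arbitrary choices.
random.seed(3)
N, trials = 2000, 500
last_occurrence = []
for _ in range(trials):
    last = 0
    for n in range(1, N + 1):
        if random.random() < n ** -2:
            last = n
    last_occurrence.append(last)

# occurrences pile up early; far out along the sequence nothing happens
assert sum(1 for l in last_occurrence if l > 100) / trials < 0.05
```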
Suppose X_i is a measurable mapping from (Ω, ℱ) to (Ω_i, ℱ_i) for i = 1, 2. The distribution or 𝒫-distribution 𝒫X_1⁻¹ of X_1 is a probability on ℱ_1: namely,
(𝒫X_1⁻¹)(A) = 𝒫(X_1⁻¹A) for A ∈ ℱ_1.
(20) Change of variables formula. If f is a random variable on (Ω_1, ℱ_1), then
E[f(X_1)] = ∫_{x_1∈Ω_1} f(x_1)(𝒫X_1⁻¹)(dx_1).
Let X_1 and X_2 be independent: that is, X_1⁻¹ℱ_1 and X_2⁻¹ℱ_2 are. Let ℱ_1 × ℱ_2 be the smallest σ-field of Ω_1 × Ω_2 containing all sets A_1 × A_2 with A_i ∈ ℱ_i. Let f be a random variable on (Ω_1 × Ω_2, ℱ_1 × ℱ_2) such that E[f(X_1, X_2)] exists.
(21) Fubini's theorem. If X_1 and X_2 are independent,
E[f(X_1, X_2)] = ∫_{x_1∈Ω_1} E[f(x_1, X_2)] 𝒫X_1⁻¹(dx_1)
= ∫_{x_2∈Ω_2} E[f(X_1, x_2)] 𝒫X_2⁻¹(dx_2).
In particular,
(21a) E(X_1X_2) = E(X_1)·E(X_2).
† References: (Loève, 1963, Sections 8.2 and 15); (Neveu, 1965, Section IV.4).

Conversely, suppose (Ω_i, ℱ_i, 𝒫_i) are probability triples for i = 1, 2. Let Ω = Ω_1 × Ω_2 and ℱ = ℱ_1 × ℱ_2.
(22) Converse Fubini. There is a unique probability 𝒫 = 𝒫_1 × 𝒫_2 on ℱ_1 × ℱ_2 such that
𝒫(A_1 × A_2) = 𝒫_1(A_1)·𝒫_2(A_2)
for A_i ∈ ℱ_i. Then
∫ f d𝒫 = ∫_{Ω_1} [∫_{Ω_2} f(ω_1, ω_2) 𝒫_2(dω_2)] 𝒫_1(dω_1)
for nonnegative (ℱ_1 × ℱ_2)-measurable f.
PROOF. The uniqueness comes from (16). For existence, if A ∈ ℱ_1 × ℱ_2, let A(ω_1) be the ω_1-section of A, namely, the set of ω_2 ∈ Ω_2 with (ω_1, ω_2) ∈ A. Let
𝒫(A) = ∫_{Ω_1} 𝒫_2[A(ω_1)] 𝒫_1(dω_1). *
Let X_1, X_2, … be independent and identically distributed. Suppose X_1 has finite mean μ.
(23a) Weak law of large numbers. n⁻¹(X_1 + ⋯ + X_n) converges to μ in probability.
(23b) Strong law of large numbers. n⁻¹(X_1 + ⋯ + X_n) converges to μ with probability 1.
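A simulation sketch of (23b) on invented data: along one sample path, running averages of independent Uniform(0, 1) variables settle at the mean 1/2.

```python
import random

# Strong law (23b) in simulation: the average of n i.i.d. Uniform(0,1)
# draws along a single sample path approaches 1/2. Seed and sample size
# are arbitrary choices for the illustration.
random.seed(4)
n = 200_000
total = 0.0
for _ in range(n):
    total += random.random()
avg = total / n
assert abs(avg - 0.5) < 0.005
```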
Let ℱ_1, ℱ_2, … be independent sub-σ-fields in (Ω, ℱ, 𝒫). Let ℱ(n) be the σ-field generated by ℱ_n, ℱ_{n+1}, …. The tail σ-field ℱ(∞) is ∩_{n=1}^∞ ℱ(n).
(24) Kolmogorov 0–1 law. Each ℱ(∞)-set has 𝒫-probability 0 or 1.

7. CONDITIONING†

Let 𝒜 be a sub-σ-field of ℱ, and let X be a random variable with expectation. The conditional expectation or 𝒫-expectation of X given 𝒜 is E{X | 𝒜} = Y, the 𝒜-measurable random variable Y such that ∫_A Y d𝒫 = ∫_A X d𝒫 for all A ∈ 𝒜. For B ∈ ℱ, the conditional probability or 𝒫-probability of B given 𝒜 is 𝒫{B | 𝒜} = E{1_B | 𝒜}. If Z is a measurable mapping, E{X | Z} = E{X | 𝒜}, where 𝒜 is the σ-field spanned by Z. According to convenience, { } changes to [ ] or ( ). Conditional expectations are unique up to changes on sets of measure 0, by (10a), and exist by Radon–Nikodym (11).
† References: (Loève, 1963, Sections 24 and 25); (Neveu, 1965, Chapter IV).

Let ℬ be a sub-σ-field of 𝒜. These facts about conditional expectations are used rather casually; equality and inequality are only a.e.
(25) Facts. (a) X ≥ 0 implies E{X | 𝒜} ≥ 0.
(b) E{X | 𝒜} depends linearly on X.
(c) E{X | 𝒜} = X if X is 𝒜-measurable.
(d) E{XY | 𝒜} = XE{Y | 𝒜} if X is 𝒜-measurable.
(e) E{X} = E{E(X | 𝒜)}.
(f) E{E(X | 𝒜) | ℬ} = E{X | ℬ}.
(g) E{X | 𝒜} = E{X | ℬ} if E{X | 𝒜} is ℬ-measurable.
(h) If X is independent of 𝒜, then E{X | 𝒜} = E{X}.
Say 𝒜 is trivial iff 𝒫(A) = 0 or 1 for any A ∈ 𝒜.
(i) If 𝒜 is trivial, then 𝒜 is independent of X, and E{X | 𝒜} = E{X}.
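On a finite space, conditioning on a sub-σ-field amounts to averaging over its atoms, which makes (25e) and the trivial-σ-field case easy to check; the four-point distribution, the variable X, and the partition below are invented.

```python
from fractions import Fraction as F

# E{X|A} on a finite space: constant on each atom of the sub-sigma-field,
# equal to the probability-weighted average of X over that atom.
# Distribution, X, and the partition are invented examples.
p = {1: F(1, 8), 2: F(3, 8), 3: F(1, 4), 4: F(1, 4)}
X = {1: 4, 2: 0, 3: 2, 4: 6}
blocks = [{1, 2}, {3, 4}]                 # atoms of the sub-sigma-field

def cond_exp(blocks):
    out = {}
    for B in blocks:
        pb = sum(p[w] for w in B)
        avg = sum(p[w] * X[w] for w in B) / pb
        for w in B:
            out[w] = avg
    return out

Y = cond_exp(blocks)
EX = sum(p[w] * X[w] for w in p)
EY = sum(p[w] * Y[w] for w in p)
assert EX == EY                           # (25e): E{E(X|A)} = E{X}
# conditioning on the trivial sigma-field gives the constant E{X}, as in (25i)
assert cond_exp([{1, 2, 3, 4}]) == {w: EX for w in p}
```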

8. MARTINGALES†

Let T be a subset of the line. For t ∈ T, let ℱ_t be a sub-σ-field of ℱ, and let X_t be an ℱ_t-measurable function on Ω, with finite expectation. Suppose that s < t in T implies: ℱ_s ⊂ ℱ_t, and for A ∈ ℱ_s,
∫_A X_s d𝒫 = ∫_A X_t d𝒫.
Then {X_t, ℱ_t: t ∈ T} is a martingale, or {X_t} is a martingale relative to {ℱ_t}. If under similar circumstances,
∫_A X_s d𝒫 ≥ ∫_A X_t d𝒫,
then {X_t} is an expectation-decreasing martingale relative to {ℱ_t}. If {ℱ_t} is not specified, then ℱ_t is the σ-field generated by X_s for s ∈ T with s ≤ t.
(26) Example. Suppose ℱ_t is a sub-σ-field of ℱ for each t ∈ T, and ℱ_s ⊂ ℱ_t for s < t. Let X be a random variable on (Ω, ℱ) with finite expectation, and X_t = E{X | ℱ_t}. Then {X_t, ℱ_t: t ∈ T} is a martingale.
(27) Lemma. Let {X_t} be an expectation-decreasing martingale, and let f be concave and nondecreasing. Then {f(X_t)} is an expectation-decreasing martingale.

† References: (Doob, 1953, Chapter VII, Sections 1–4); (Loève, 1963, Section 29); (Neveu, 1965, Section IV.5).

Suppose T = {0, 1, …}. Let τ be a random variable on (Ω, ℱ, 𝒫), which is ∞ or in T. Suppose {τ = t} ∈ ℱ_t for all t ∈ T. Then τ is admissible, or a stopping time. The pre-τ σ-field ℱ_τ is the σ-field of A ∈ ℱ such that A ∩ {τ = t} ∈ ℱ_t for all t ∈ T.
(28) Theorem. Suppose T = {0, 1, …}. Suppose {X_t, ℱ_t: t ∈ T} is a martingale, and τ is admissible.
(i) ∫_{A and τ≤n} X_τ d𝒫 = ∫_{A and τ≤n} X_n d𝒫 for all A ∈ ℱ_τ.
(ii) For each n, the variables Z_τ = X_τ·1_{τ≤n} are uniformly integrable, as τ varies. In fact, for any fixed M > 0,
∫_{|Z_τ|>k} |Z_τ| d𝒫 ≤ k⁻¹ME(|X_n|) + ∫_{|X_n|>M} |X_n| d𝒫.
Suppose 𝒫{τ < ∞} = 1, and
(a) E(|X_τ|) < ∞
(b) lim inf_{n→∞} ∫_{τ>n} |X_n| d𝒫 = 0.
Then (X_0, X_τ) is a martingale, so
(iii) E(X_0) = E(X_τ).

PROOF. Let A ∈ ℱ_τ. Let A_m = {A and τ = m}. Then A_m ∈ ℱ_m. Fix n. If m ≤ n,
∫_{A_m} X_τ d𝒫 = ∫_{A_m} X_m d𝒫 = ∫_{A_m} X_n d𝒫.
Sum out m = 0, …, n to prove (i). For (ii), put A = {X_τ ≥ 0} or {X_τ < 0} to see E(|Z_τ|) ≤ E(|X_n|), so
𝒫{|Z_τ| > k} ≤ k⁻¹E(|X_n|).
Then put A = {X_τ > k} or {X_τ < −k} to see
∫_{|Z_τ|>k} |Z_τ| d𝒫 ≤ M𝒫{|Z_τ| > k} + ∫_{|X_n|>M} |X_n| d𝒫.
For (iii), let A ∈ ℱ_0. Let A_m = {A and τ = m}. Then A_m ∈ ℱ_m. If m ≤ n,
∫_{A_m} X_τ d𝒫 = ∫_{A_m} X_m d𝒫 = ∫_{A_m} X_n d𝒫.

Sum out m = 0, …, n:
∫_{A and τ≤n} X_τ d𝒫 = ∫_{A and τ≤n} X_n d𝒫.
Now
∫_A X_0 d𝒫 = ∫_A X_n d𝒫
= ∫_{A and τ≤n} X_n d𝒫 + ∫_{A and τ>n} X_n d𝒫
= ∫_{A and τ≤n} X_τ d𝒫 + ∫_{A and τ>n} X_n d𝒫.
This doesn't use (a–b). By (a),
lim_{n→∞} ∫_{A and τ≤n} X_τ d𝒫 = ∫_A X_τ d𝒫.
By (b), if n increases through the right subsequence,
∫_{A and τ>n} X_n d𝒫 → 0. *
Example. Let the Y_n be independent and identically distributed, each being 0 or 2 with probability ½ each. Let X_0 = 1 and let X_n = Y_1 ⋯ Y_n for n ≥ 1. Let τ be the least n if any with X_n = 0, and τ = ∞ if none. Then {X_n} is a nonnegative martingale, τ is a stopping time, E(τ) < ∞, and X_τ = 0 almost surely. This martingale was proposed by David Gilat. *
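The Gilat example is easy to simulate; the sample size below is arbitrary. The run shows τ is geometric with mean 2, while X_τ = 0 on every path, so E(X_τ) = 0 ≠ 1 = E(X_0); conditions (a–b) of (28) cannot simply be dropped.

```python
import random

# Simulation of the Gilat example: Y_k in {0, 2} with probability 1/2 each,
# X_n = Y_1 * ... * Y_n, tau = first n with X_n = 0. The loop exits exactly
# when X hits 0, so X_tau = 0 on every path; tau is geometric(1/2), mean 2.
random.seed(5)
trials = 20_000
taus = []
for _ in range(trials):
    n, x = 0, 1
    while x != 0:
        n += 1
        x *= random.choice((0, 2))
    taus.append(n)

assert abs(sum(taus) / trials - 2.0) < 0.1
```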
Example. There is a martingale, and a stopping time with finite mean, which satisfy (28b) but not (28a).
DISCUSSION. For n ≥ 1, let
a_n = (n − 1)²/log (3 + n) and b_{n+1} = a²_{n+1} − a²_n.
Let b_1 = 1. Check b_n > 0 for all n. Let Y_n be N(0, b_n), and let Y_1, Y_2, … be independent. Let X_n = Y_1 + ⋯ + Y_n, so {X_n} is a martingale. For n ≥ 1, let p_n be positive numbers summing to 1, with
Σ_n np_n < ∞ and Σ_n p_n a_n = ∞ and a_n·Σ_{m>n} p_m → 0;
for instance, p_n proportional to n⁻³.

Let f be a measurable function from (−∞, ∞) to {1, 2, …}, with
𝒫{f(Y_1) = n} = p_n for n = 1, 2, ….
So τ = f(Y_1) is a stopping time, with finite mean.
Let θ be the distribution of Y_1, and let S_n = Y_2 + ⋯ + Y_n, so S_1 = 0. Now Y_1 is independent of S_1, S_2, …, and X_n = Y_1 + S_n, and S_n is N(0, a²_n). Compute.
E{|X_τ|} = ∫ E{|y + S_{f(y)}|} θ(dy)
≥ ∫ E{|S_{f(y)}|} θ(dy)
= c ∫ a_{f(y)} θ(dy) where c = E{|Y_1|}
= ∞.
Continuing,
∫_{τ>n} |X_n| d𝒫 = ∫_{f>n} E{|y + S_n|} θ(dy)
≤ ∫_{f>n} [|y| + E{|S_n|}] θ(dy)
= ∫_{f>n} |y| θ(dy) + ca_n·θ{f > n}
→ 0. *
Suppose T = [0, ∞). Let τ be a random variable on (Ω, ℱ, 𝒫), which is ∞ or in T. Suppose {τ < t} ∈ ℱ_t for all t ∈ T. Then τ is admissible, or a stopping time.
(29) Theorem. Suppose T = [0, ∞). Suppose {X_t, ℱ_t: t ∈ T} is a martingale, and τ is admissible. Suppose
(a) t → X_t(ω) is continuous from the right, for each ω
(b) E(|X_τ|) < ∞
(c) lim inf_{t→∞} ∫_{τ>t} |X_t| d𝒫 = 0.
Then (X_0, X_τ) is a martingale, so E(X_0) = E(X_τ).

PROOF. Fix t. Let τ_n be the least (j/n)t ≥ τ. Fix A ∈ ℱ_0. As in (28),
∫_{A and τ≤t} X_{τ_n} d𝒫 = ∫_{A and τ≤t} X_t d𝒫.
Let n → ∞; use (a) and uniform integrability (28):
∫_{A and τ≤t} X_τ d𝒫 = ∫_{A and τ≤t} X_t d𝒫.
Now
∫_A X_0 d𝒫 = ∫_A X_t d𝒫
= ∫_{A and τ≤t} X_t d𝒫 + ∫_{A and τ>t} X_t d𝒫
= ∫_{A and τ≤t} X_τ d𝒫 + ∫_{A and τ>t} X_t d𝒫.
Let t → ∞; use (b) and (c). *


Theorem (30) partially extends (28). To state it, let
{X_t, ℱ_t: t = 0, 1, …}
be a nonnegative, expectation-decreasing martingale. Let τ_0 ≤ τ_1 ≤ ⋯ be stopping times. Let
Y_n = X_{τ_n}.
Let 𝒢_n be the σ-field of measurable A such that
{A and τ_n ≤ t} ∈ ℱ_t for all t = 0, 1, ….
(30) Theorem. {Y_n, 𝒢_n: n = 0, 1, …} is an expectation-decreasing martingale.
PROOF. You should check that Y_n is 𝒢_n-measurable, and 𝒢_n ⊂ 𝒢_{n+1}. Let A ∈ 𝒢_n. I have to prove
∫_A Y_n d𝒫 ≥ ∫_A Y_{n+1} d𝒫.
It is enough to do this with A ∩ {τ_n = m} in place of A; afterwards, sum over m. Abbreviate σ = τ_{n+1}, so Y_{n+1} = X_σ. On A ∩ {τ_n = m},
σ ≥ m and Y_n = X_m.

Check that {τ_n = m} and A ∩ {τ_n = m} are both in ℱ_m. My problem is reduced to showing that
(31) ∫_A X_m d𝒫 ≥ ∫_A X_σ d𝒫
for a typical A ∈ ℱ_m with A ⊂ {τ_n = m}. I will argue inductively that for M ≥ m,
(32) ∫_A X_m d𝒫 ≥ ∫_{A∩{σ≤M}} X_σ d𝒫 + ∫_{A∩{σ>M}} X_M d𝒫.
This is clear for M = m. Suppose it true for some M ≥ m. Now
{σ > M} = Ω \ {σ ≤ M} ∈ ℱ_M, and A ∈ ℱ_m ⊂ ℱ_M.
The computation rolls on:
∫_{A∩{σ>M}} X_M d𝒫 ≥ ∫_{A∩{σ>M}} X_{M+1} d𝒫
= ∫_{A∩{σ=M+1}} X_{M+1} d𝒫 + ∫_{A∩{σ>M+1}} X_{M+1} d𝒫
= ∫_{A∩{σ=M+1}} X_σ d𝒫 + ∫_{A∩{σ>M+1}} X_{M+1} d𝒫.
This proves (32). Now (32) is even truer without the rightmost term. Drop it, and let M increase to ∞ to get (31). *
If $s_1, \ldots, s_N$ is a sequence of real numbers, and $a < b$ are real numbers, the number of downcrossings of $[a, b]$ by $s_1, \ldots, s_N$ is the largest positive integer $k$ for which there exist integers $1 \le n_1 < n_2 < \cdots < n_{2k} \le N$ with

$s_{n_1} \ge b,\; s_{n_2} \le a,\; \ldots,\; s_{n_{2k-1}} \ge b,\; s_{n_{2k}} \le a.$

If no such $k$ exists, the number of downcrossings is 0.

(33) Downcrossings inequality. Let $X_0, X_1, \ldots$ be a nonnegative, expectation-decreasing martingale. Let $0 \le a < b < \infty$. The mean number of downcrossings of $[a, b]$ by $X_0, X_1, \ldots$ is at most $(b - a)^{-1}$ times the mean of $X_0$.

This differs only in detail from the upcrossings inequality (Doob, 1953, Theorem 3.3 on p. 316).

PROOF. Introduce $\beta_n$ for the number of downcrossings of $[a, b]$ by $X_0, \ldots, X_n$. It is enough to prove

$E(\beta_n) \le E(X_0)/(b - a)$

in the case $X_i \le b$ for $i = 0, \ldots, n$; use (27) on the function $x \to \min\{x, b\}$.
Let $\sigma_0$ be the least $m = 0, \ldots, n$, if any, with $X_m = b$; if none, let $\sigma_0 = n$. Let $\sigma_1$ be the least $m = 0, \ldots, n$, if any, with $m > \sigma_0$ and $X_m \le a$; if none, let $\sigma_1 = n$. And so on, up to $\sigma_n$. Now $X_0, X_{\sigma_0}, \ldots, X_{\sigma_n}$ is an expectation-decreasing martingale by (30). Check,

$\beta_n(b - a) \le \sum_m \{X_{\sigma_m} - X_{\sigma_{m+1}} : m = 0, \ldots, n - 1 \text{ and } m \text{ is even}\}.$

Therefore,

$E(X_0) \ge E(X_{\sigma_0})$

$\ge E(X_{\sigma_0}) - E(X_{\sigma_n})$

$= \sum_{m=0}^{n-1} \{E(X_{\sigma_m}) - E(X_{\sigma_{m+1}})\}$

$\ge \sum_m \{E(X_{\sigma_m}) - E(X_{\sigma_{m+1}}) : m = 0, \ldots, n - 1 \text{ and } m \text{ is even}\}$

$= E\left[\sum_m \{X_{\sigma_m} - X_{\sigma_{m+1}} : m = 0, \ldots, n - 1 \text{ and } m \text{ is even}\}\right]$

$\ge E[\beta_n(b - a)]. \quad *$
Martingale convergence theorem
(34) Theorem. Forward martingales. Let $\{X_n : n = 0, 1, \ldots\}$ be a martingale. If $\sup_n E(|X_n|) < \infty$, then $X_n$ converges a.e. as $n \to \infty$; the convergence is $L^1$ iff the $X_n$ are uniformly integrable. If $p > 1$ and $\sup_n E(|X_n|^p) < \infty$, then $X_n$ converges in $L^p$ as $n \to \infty$. Suppose $\{\mathscr{F}_n : n = 0, 1, \ldots\}$ are nondecreasing $\sigma$-fields and $X \in L^p$ for $p \ge 1$ and $X_n = E\{X \mid \mathscr{F}_n\}$. Then $\{X_n\}$ is a martingale, and $E(|X_n|^p) \le E(|X|^p)$, so the previous assertions apply: the $X_n$ are automatically uniformly integrable, even for $p = 1$. The limit of $X_n$ is the conditional expectation of $X$ given the $\sigma$-field generated by all the $\mathscr{F}_n$.

Backward martingales. Let $\{X_n : n = \ldots, -3, -2, -1\}$ be a martingale. Then $X_n$ converges a.e. as $n \to -\infty$. The $X_n$ are automatically uniformly integrable, and the convergence is also $L^1$. If $X_{-1} \in L^p$ for $p > 1$, so are all the $X_n$, and the convergence is also $L^p$. Suppose $\{\mathscr{F}_n : n = \ldots, -3, -2, -1\}$ are nondecreasing $\sigma$-fields and $X \in L^p$ for $p \ge 1$ and $X_n = E\{X \mid \mathscr{F}_n\}$. Then $\{X_n\}$ is a martingale and $E(|X_n|^p) \le E(|X|^p)$, so the previous assertions apply. The limit of $X_n$ is the conditional expectation of $X$ given the intersection of the $\mathscr{F}_n$.

PROOF. If the $X_n$ are nonnegative, the a.e. convergence follows from (33). General $X_n$ follow the same route, with minor changes. The $L^p$ convergence follows from (6). $\quad *$
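A concrete instance of the forward case (a Python sketch under simple assumptions: $\Omega = [0, 1)$ with Lebesgue measure, $X(\omega) = \omega$, and $\mathscr{F}_n$ generated by the dyadic intervals of length $2^{-n}$): here $X_n = E\{X \mid \mathscr{F}_n\}$ averages $X$ over the dyadic interval containing $\omega$, and $X_n \to X$ everywhere.

```python
def X_n(omega, n):
    """E{X | F_n} at omega, where X(omega) = omega on [0, 1) with Lebesgue
    measure and F_n is generated by the dyadic intervals of length 2**-n."""
    k = int(omega * 2**n)        # omega lies in [k/2^n, (k+1)/2^n)
    return (k + 0.5) / 2**n      # the average of X over that interval

# |X_n - X| is at most half the interval length, so X_n -> X everywhere:
omega = 0.3
assert all(abs(X_n(omega, n) - omega) <= 0.5 / 2**n for n in range(1, 20))
print([round(X_n(omega, n), 5) for n in (1, 4, 8, 12)])
```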
Differentiation

For (35)-(36), let $P$ and $Q$ be two probabilities on $\mathscr{F}$. Then $\Omega$ divides up into three $\mathscr{F}$-sets, $\Omega_P$ and $\Omega_e$ and $\Omega_Q$, such that $P(\Omega_Q) = Q(\Omega_P) = 0$ and $P$ is equivalent to $Q$ on $\Omega_e$. This partition is unique up to $(P + Q)$-null sets. Let

$\dfrac{dP}{dQ} = 0$ on $\Omega_Q$, and $= \infty$ on $\Omega_P$,

and let $dP/dQ$ be the Radon-Nikodym derivative of $P$ with respect to $Q$ on $\Omega_e$. This function is $\mathscr{F}$-measurable and unique up to changes on $(P + Q)$-null sets. Let $\{\mathscr{A}_n\}$ be a nondecreasing or nonincreasing sequence of $\sigma$-fields. In the former case, let $\mathscr{A}_\infty$ be the $\sigma$-field generated by the union of the $\mathscr{A}_n$. In the latter case, let $\mathscr{A}_\infty$ be the intersection of the $\mathscr{A}_n$. For any measure $R$, let $R_n$ be the retraction of $R$ to $\mathscr{A}_n$. Define $dP_n/dQ_n$ like $dP/dQ$ above, with $\mathscr{A}_n$ replacing $\mathscr{F}$. Thus, $dP_n/dQ_n$ is an $\mathscr{A}_n$-measurable function to $[0, \infty]$, unique up to changes on $(P_n + Q_n)$-null sets.

(35) Theorem. $\lim_{n \to \infty} dP_n/dQ_n = dP_\infty/dQ_\infty$, except on a $(P + Q)$-null set.


PROOF. Introduce the probability $\Lambda = \tfrac{1}{2}(P + Q)$. Then $P \le 2\Lambda$, so $P_n \ll \Lambda_n$ and $0 \le dP_n/d\Lambda_n \le 2$. Let $E_\Lambda$ be expectation relative to $\Lambda$. Abbreviate $r_n = dP_n/dQ_n$. Suppose the $\mathscr{A}_n$ are nondecreasing. Make the convention $\infty/\infty = 1$, and use (11) to check that

$\dfrac{2r_n}{1 + r_n} = \dfrac{dP_n}{d\Lambda_n} = E_\Lambda\Big\{\dfrac{dP_\infty}{d\Lambda_\infty} \,\Big|\, \mathscr{A}_n\Big\},$

and use (34). Suppose the $\mathscr{A}_n$ are nonincreasing. Check that

$\dfrac{2r_n}{1 + r_n} = \dfrac{dP_n}{d\Lambda_n} = E_\Lambda\Big\{\dfrac{dP_1}{d\Lambda_1} \,\Big|\, \mathscr{A}_n\Big\},$

and use (34). $\quad *$


(36) Example. Suppose the $\mathscr{A}_n$ are nondecreasing. If $P_n \ll Q_n$ for $n = 1, 2, \ldots$, then $\{dP_n/dQ_n : n = 1, 2, \ldots\}$ is a martingale.
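Example (36) can be verified exactly in a finite coin-tossing model (a Python sketch; the measures and names are illustrative assumptions): with $P$ and $Q$ coin-tossing probabilities and $\mathscr{A}_n$ generated by the first $n$ tosses, the ratios $r_n = dP_n/dQ_n$ satisfy $E_Q\{r_{n+1} \mid \mathscr{A}_n\} = r_n$ on every atom.

```python
from itertools import product

p, q = 0.3, 0.5  # head probabilities under P and under Q

def prob(seq, h):
    out = 1.0
    for x in seq:
        out *= h if x == 1 else (1 - h)
    return out

def r(seq):
    """dP_n/dQ_n on the A_n-atom determined by the first tosses `seq`."""
    return prob(seq, p) / prob(seq, q)

# Martingale property: averaging r over the next toss (under Q) returns r.
for seq in product([0, 1], repeat=3):
    avg = q * r(seq + (1,)) + (1 - q) * r(seq + (0,))
    assert abs(avg - r(seq)) < 1e-12
print("E_Q{r_{n+1} | A_n} = r_n on every atom")
```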

9. METRIC SPACES†

A metric $\rho$ on $\Omega$ is a nonnegative real-valued function on $\Omega \times \Omega$ such that:

$\rho(x, y) = 0$ iff $x = y$;

$\rho(x, y) = \rho(y, x)$; and

$\rho(x, y) + \rho(y, z) \ge \rho(x, z).$

† References: (Dunford and Schwartz, 1958), (Hausdorff, 1957), (Kuratowski, 1958), (Loeve, 1963, Section 2).

Say $x_n$ converges to $x$, or $x_n \to x$, iff $\rho(x_n, x) \to 0$. Say $V \subset \Omega$ is closed iff $x_n \in V$ and $x_n \to x$ implies $x \in V$. Say $U \subset \Omega$ is open iff $\Omega \setminus U$ is closed. The topology or $\rho$-topology of $\Omega$ is the set of open $U$. A sequence $\{x_n\}$ in $\Omega$ is Cauchy iff $\rho(x_n, x_m) \to 0$ as $n, m \to \infty$. Say $\Omega$ is complete iff each Cauchy sequence converges. Say $\Omega$ is separable iff there is a countable subset $C$ of $\Omega$ dense in $\Omega$: for each $x \in \Omega$, there are $x_n \in C$ with $x_n \to x$.

If $(\Omega, \rho)$ is complete and separable, the Borel $\sigma$-field of $(\Omega, \rho)$ is the smallest $\sigma$-field containing the $\rho$-topology. Then $(\Omega_1, \mathscr{F}_1)$ is Borel iff there is a complete, separable metric space $(\Omega, \rho)$, such that $\Omega_1$ is in the Borel $\sigma$-field $\mathscr{F}$ of $(\Omega, \rho)$, and $\mathscr{F}_1 = \Omega_1\mathscr{F}$. In this case also, $\mathscr{F}_1$ is called the Borel $\sigma$-field of $\Omega_1$. If $(\Omega_i, \mathscr{F}_i)$ are Borel, so is $(\Omega_1 \times \Omega_2 \times \cdots, \mathscr{F}_1 \times \mathscr{F}_2 \times \cdots)$. Borel $\sigma$-fields are countably generated.

If $I$ is a countably infinite set, $\rho(i, j) = 1$ or $0$ according as $i \ne j$ or $i = j$ is a perfectly good metric: $I$ is complete and separable. The corresponding topology and $\sigma$-field are called discrete. The one-point compactification $\bar I = I \cup \{\varphi\}$ is obtained as follows. Let $I = \{i_1, i_2, \ldots\}$; let $\varphi \notin I$; and let $\bar I = \{i_1, i_2, \ldots, i_\infty\}$ with $i_\infty = \varphi$; let

$\bar\rho(i_m, i_n) = |1/m - 1/n|,$

where $1/\infty = 0$. Of course, $\bar\rho$ retracted to $I$ produces the same topology as $\rho$.

(37) Example. Let $\Omega$ be Euclidean $n$-space $R^n$. Let $\rho$ be the usual distance: $\rho(x, y) = \|x - y\|$ and $\|u\|^2 = \sum_{i=1}^n u_i^2$. Then $\Omega$ is complete and separable.

(38) Example. Let $\Omega$ be the rationals. There is no way to metrize the usual topology so that $\Omega$ is complete.

10. REGULAR CONDITIONAL DISTRIBUTIONS†

COMFORT. The material in Sections 10-15 is fairly exotic, and is used only on special occasions.

Let $X_i$ be a measurable mapping from $\Omega$ to $(\Omega_i, \mathscr{F}_i)$ for $i = 1, 2$. A regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1$ is a function $Q(\cdot, \cdot)$ on $\Omega_1 \times \mathscr{F}_2$ with the following properties:

(39a) $Q(x_1, \cdot)$ is a probability on $\mathscr{F}_2$ for each $x_1 \in \Omega_1$;

(39b) $Q(\cdot, A_2)$ is $\mathscr{F}_1$-measurable for each $A_2 \in \mathscr{F}_2$;

† References: (Blackwell, 1954); (Loeve, 1963, Sections 26 and 27).

(39c) for each $A_2 \in \mathscr{F}_2$, the function $\omega \to Q[X_1(\omega), A_2]$ is a version of $\mathscr{P}\{X_2 \in A_2 \mid X_1^{-1}\mathscr{F}_1\}$.

Condition (39c) can be rephrased as follows: if $A_i \in \mathscr{F}_i$, then

(39d) $\quad \int_{A_1} Q(x_1, A_2)\,(\mathscr{P}X_1^{-1})(dx_1) = \mathscr{P}\{X_1 \in A_1 \text{ and } X_2 \in A_2\}.$

You only have to check this for generating classes of $A_1$'s and $A_2$'s closed under intersection, by (16). Sometimes $Q(x_1, \cdot)$ is called a regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1 = x_1$.

Suppose $Q$ is a regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1$. Let $\phi$ be a measurable mapping from $(\Omega_2, \mathscr{F}_2)$ to $(\Omega_\phi, \mathscr{F}_\phi)$. Let $Q_\phi(x_1, \cdot)$ be the $Q(x_1, \cdot)$-distribution of $\phi$. Make sure that $Q_\phi(x_1, \cdot)$ is a probability on $\mathscr{F}_\phi$.
(40) Lemma. $Q_\phi$ is a regular conditional $\mathscr{P}$-distribution for $\phi(X_2)$ given $X_1$.

EXAMPLE. Let $\Omega_1 = \Omega_2 = \Omega = [0, 1]$. Let $X_1(\omega) = X_2(\omega) = \omega$. Let $\mathscr{F}_1$ be the Borel $\sigma$-field of $[0, 1]$. Let $\lambda$ be Lebesgue measure on $\mathscr{F}_1$. Let $B$ be a non-Lebesgue-measurable set, namely $\lambda_*(B) < \lambda^*(B)$. Let $\mathscr{F}_2 = \mathscr{F}$ be the $\sigma$-field generated by $\mathscr{F}_1$ and $B$. Extend $\lambda$ to a probability $\mathscr{P}$ on $\mathscr{F}$. You can do this so $\mathscr{P}(B)$ is any number in the interval $[\lambda_*(B), \lambda^*(B)]$. There is no regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1$. For suppose $Q$ were such an object. Theorem (51) below produces a $\mathscr{P}$-null set $N \in \mathscr{F}_1$, such that $Q(\omega, \cdot)$ is point mass at $\omega$ for $\omega \notin N$. In particular,

$Q(\omega, B) = 1_B(\omega)$ for $\omega \notin N.$

The left side is an $\mathscr{F}_1$-measurable function of $\omega$. So $B$ differs by a subset of the null set $N$ from an $\mathscr{F}_1$-set, a contradiction. $\quad *$
(41) Theorem. If $(\Omega_2, \mathscr{F}_2)$ is Borel, then a regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1$ exists.

Theorem (41) is hard. One of its virtues (although this does not materially increase the difficulty) is the absence of conditions on $\mathscr{F}$ or $\mathscr{F}_1$.

(42) Theorem. Suppose $\mathscr{F}_2$ is countably generated. Suppose $Q$ and $Q^*$ are two regular conditional $\mathscr{P}$-distributions for $X_2$ given $X_1$. Then

$\{x : x \in \Omega_1 \text{ and } Q(x, \cdot) = Q^*(x, \cdot)\}$

is in $\mathscr{F}_1$ and has $\mathscr{P}X_1^{-1}$-probability 1.

PROOF. Let $\mathscr{A}$ be a countable, generating algebra for $\mathscr{F}_2$. Then

$\{x : x \in \Omega_1 \text{ and } Q(x, A) = Q^*(x, A)\}$

is an $\mathscr{F}_1$-set of $\mathscr{P}X_1^{-1}$-probability 1, for each $A \in \mathscr{A}$. The intersection over all $A \in \mathscr{A}$ is the set described in the theorem. $\quad *$
The next result generalizes converse Fubini (22). Suppose $\Omega = \Omega_1 \times \Omega_2$ and $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$. Suppose $\mathscr{P}_1$ is a probability on $\mathscr{F}_1$, and $Q$ satisfies (39a, b). Let

$X_1(x_1, x_2) = x_1$ and $X_2(x_1, x_2) = x_2.$

(43) Theorem. There is a unique probability $\mathscr{P}$ on $\mathscr{F}$ satisfying the two conditions:

(a) $\mathscr{P}X_1^{-1} = \mathscr{P}_1$, and

(b) $Q$ is a regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1$.

If $f$ is a nonnegative, measurable function on $(\Omega_1 \times \Omega_2, \mathscr{F}_1 \times \mathscr{F}_2)$, then

$\int_\Omega f \, d\mathscr{P} = \int_{\Omega_1} \int_{\Omega_2} f(x_1, x_2) \, Q(x_1, dx_2) \, \mathscr{P}_1(dx_1).$

PROOF. The uniqueness follows from (16). For existence, define

$\mathscr{P}(A) = \int_{\Omega_1} Q(x_1, A(x_1)) \, \mathscr{P}_1(dx_1)$

for $A \in \mathscr{F}_1 \times \mathscr{F}_2$; as before, $A(x_1)$ is the $x_1$-section of $A$, namely the set of $x_2$ with $(x_1, x_2) \in A$. Check that $\mathscr{P}$ is a probability satisfying (a) and (b). The integration formula now holds with $f = 1_A$; both sides are linear and continuous under increasing passages to the limit. $\quad *$
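On finite spaces, the construction in the proof of (43) is a weighted mixture, and both condition (a) and the integration formula can be checked by enumeration. A Python sketch (the particular $\mathscr{P}_1$, $Q$, and $f$ are arbitrary choices):

```python
# Omega_1 = {0, 1}, Omega_2 = {0, 1, 2}; P1 on Omega_1, kernel Q(x1, .) on Omega_2.
P1 = {0: 0.4, 1: 0.6}
Q = {0: {0: 0.5, 1: 0.5, 2: 0.0},
     1: {0: 0.1, 1: 0.2, 2: 0.7}}

# The probability of (43): P{(x1, x2)} = P1{x1} * Q(x1, {x2}).
P = {(x1, x2): P1[x1] * Q[x1][x2] for x1 in P1 for x2 in (0, 1, 2)}
assert abs(sum(P.values()) - 1) < 1e-12

# Condition (a): the X1-marginal of P is P1.
for x1 in P1:
    assert abs(sum(P[(x1, x2)] for x2 in (0, 1, 2)) - P1[x1]) < 1e-12

# Integration formula: the iterated integral equals the integral against P.
f = lambda x1, x2: (x1 + 1) * (x2 + 1) ** 2
lhs = sum(f(x1, x2) * P[(x1, x2)] for (x1, x2) in P)
rhs = sum(sum(f(x1, x2) * Q[x1][x2] for x2 in (0, 1, 2)) * P1[x1] for x1 in P1)
assert abs(lhs - rhs) < 1e-12
print("marginal and integration formula check out")
```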
Regular conditional distributions given $\Sigma$

In the book, the usual case is: $\Omega_1 = \Omega$ and $\mathscr{F}_1 \subset \mathscr{F}$ and $X_1(\omega) = \omega$. Then, a regular conditional $\mathscr{P}$-distribution for $X_2$ given $X_1$ is called a regular conditional $\mathscr{P}$-distribution for $X_2$ given $\mathscr{F}_1$. The next theorem (44) embodies the main advantage of regular distributions. It is easy to prove, and intuitive: it says that when you condition on a $\sigma$-field $\Sigma$, you can put any $\Sigma$-measurable function $U$ equal to a typical value $u$, and then substitute $U$ for $u$ when you're through conditioning. That is, $U$ is truly constant given $\Sigma$. However, example (48) shows that something a little delicate happened.

I will state (44) in its most popular form. The notation will be used through (50). Let $(\Omega, \mathscr{F}, \mathscr{P})$ be the basic probability triple. Let $\Sigma$ be a sub-$\sigma$-field of $\mathscr{F}$. Let $U$ be a measurable mapping from $(\Omega, \Sigma)$ to a new space $(\Omega_U, \mathscr{F}_U)$. Let $V$ be a measurable mapping from $(\Omega, \mathscr{F})$ to a new space $(\Omega_V, \mathscr{F}_V)$. Thus, $U$ is $\Sigma$-measurable and $V$ is $\mathscr{F}$-measurable. The situation is summarized in Figure 1. Let $Q$ be a regular conditional $\mathscr{P}$-distribution for $V$ given $\Sigma$, so $Q$
Figure 1. [$\Sigma \subset \mathscr{F}$ on $\Omega$; $V : \Omega \to \Omega_V$ with $V^{-1}\mathscr{F}_V \subset \mathscr{F}$; $F : \Omega_U \times \Omega_V \to [0, \infty)$ is $\mathscr{F}_U \times \mathscr{F}_V$-measurable.]

is a function of pairs $(\omega, C)$ with $\omega \in \Omega$ and $C \in \mathscr{F}_V$. Let $F$ be a nonnegative, measurable function on $(\Omega_U \times \Omega_V, \mathscr{F}_U \times \mathscr{F}_V)$.

(44) Theorem. Let $F^*(u, \omega) = \int_{\Omega_V} F(u, v) \, Q(\omega, dv)$. Then $F^*$ is $\mathscr{F}_U \times \Sigma$-measurable. And $F^*(U, \cdot)$, namely

$\omega \to F^*(U(\omega), \omega),$

is a version of $E\{F(U, V) \mid \Sigma\}$.

PROOF. You can check the measurability of $F^*$ by using (47) below. Now $\omega \to (U(\omega), \omega)$ is a measurable mapping from $(\Omega, \Sigma)$ to $(\Omega_U \times \Omega, \mathscr{F}_U \times \Sigma)$. But $F^*(U, \cdot)$ is the composition of $F^*$ with this mapping, and is $\Sigma$-measurable.

Fix $A \in \Sigma$. I have to show

(45) $\quad \int_A \int_{\Omega_V} F(U(\omega), v) \, Q(\omega, dv) \, \mathscr{P}(d\omega) = \int_A F(U(\omega), V(\omega)) \, \mathscr{P}(d\omega).$

I know

$\int_S Q(\omega, C) \, \mathscr{P}(d\omega) = \mathscr{P}\{S \text{ and } V \in C\}$

for $S \in \Sigma$ and $C \in \mathscr{F}_V$. Rewrite this with $\{A \text{ and } U \in B\}$ in place of $S$, where $B$ is a variable element of $\mathscr{F}_U$. This is legitimate because $U$ is $\Sigma$-measurable. I now have (45) for a special $F$:

$F(u, v) = 1_B(u) \cdot 1_C(v).$

Both sides of (45) are linear in $F$, and continuous under increasing passages to the limit. Use (47) below. $\quad *$

(46) Corollary. $E[F(U, V)] = \int_\Omega \int_{\Omega_V} F(U(\omega), v) \, Q(\omega, dv) \, \mathscr{P}(d\omega)$.

(47) Lemma. Let $\mathbf{F}$ be a family of nonnegative, $(\mathscr{F}_U \times \mathscr{F}_V)$-measurable functions on $\Omega_U \times \Omega_V$. Suppose $af + bg \in \mathbf{F}$ when $f, g \in \mathbf{F}$ and $a, b$ are nonnegative constants. Suppose $f - g \in \mathbf{F}$ when $f, g \in \mathbf{F}$ and $1 \ge f \ge g$. Suppose $f \in \mathbf{F}$ when $f_n \in \mathbf{F}$ and $f_n \uparrow f$. Finally, suppose

$(u, v) \to 1_B(u) \cdot 1_C(v)$

is in $\mathbf{F}$ when $B \in \mathscr{F}_U$ and $C \in \mathscr{F}_V$. Then $\mathbf{F}$ consists of all the nonnegative measurable functions on $(\Omega_U \times \Omega_V, \mathscr{F}_U \times \mathscr{F}_V)$.
(48) Example. Suppose $U = V$ is uniform on $[0, 1]$. Let $F(u, v)$ be 1 or 0 according as $u = v$ or $u \ne v$. Then $F(U, V) = 1$ almost surely, so

$E\{F(U, V) \mid U\} = 1$

almost surely. But $F(u, V) = 0$ almost surely for any particular $u$, forcing

$E\{F(u, V) \mid U\} = 0$

almost surely. Theorem (44) rescues this example by defining

$E\{F(u, V) \mid U = u\} = 1. \quad *$
Theorem (49) sharpens (44). To state it and (50), let $\phi$ be a measurable mapping from $(\Omega_U \times \Omega_V, \mathscr{F}_U \times \mathscr{F}_V)$ to some new space $(\Omega_\phi, \mathscr{F}_\phi)$. Temporarily, fix $\omega \in \Omega$ and $u \in \Omega_U$. Then $\phi(u, \cdot)$ is a measurable mapping from $(\Omega_V, \mathscr{F}_V)$ to $(\Omega_\phi, \mathscr{F}_\phi)$. And $Q(\omega, \cdot)$ is a probability on $\mathscr{F}_V$. So I am entitled to define $D(\omega, u, \cdot)$ as the $Q(\omega, \cdot)$-distribution of $\phi(u, \cdot)$; this comes out to a probability on $\mathscr{F}_\phi$. Let $R(\omega, \cdot) = D(\omega, U(\omega), \cdot)$, again a probability on $\mathscr{F}_\phi$.

(49) Theorem. $R(\cdot, \cdot)$ is a regular conditional $\mathscr{P}$-distribution for $\phi(U, V)$ given $\Sigma$.
PROOF. Let $A \in \Sigma$ and $B \in \mathscr{F}_\phi$. I have to check that

$\mathscr{P}\{A \text{ and } \phi(U, V) \in B\} = \int_A R(\omega, B) \, \mathscr{P}(d\omega).$

Let $F(u, v) = 1$ or $0$, according as $\phi(u, v) \in B$ or $\notin B$. Use (44):

$\mathscr{P}\{A \text{ and } \phi(U, V) \in B\} = \int_A F(U, V) \, d\mathscr{P}$

$= \int_A E\{F(U, V) \mid \Sigma\} \, d\mathscr{P}$

$= \int_A \int_{\Omega_V} F[U(\omega), v] \, Q(\omega, dv) \, \mathscr{P}(d\omega).$

Recognize

$\int_{\Omega_V} F[U(\omega), v] \, Q(\omega, dv) = Q(\omega, \{v : \phi[U(\omega), v] \in B\}) = R(\omega, B). \quad *$

The situation is more tractable when $\Sigma$ and $V$ are independent, which will be assumed in (50). Let $D(u, \cdot)$ be the $\mathscr{P}$-distribution of $\phi(u, V)$, a probability on $\mathscr{F}_\phi$ for each $u \in \Omega_U$.

(50) Theorem. Suppose $\Sigma$ and $V$ are independent. Then $D(U, \cdot)$ is a regular conditional $\mathscr{P}$-distribution for $\phi(U, V)$ given $\Sigma$.

PROOF. Let $A \in \Sigma$ and $B \in \mathscr{F}_\phi$. I have to check that

$\mathscr{P}\{A \text{ and } \phi(U, V) \in B\} = \int_A D[U(\omega), B] \, \mathscr{P}(d\omega).$

Use Fubini (21) to evaluate the left side. Keep $(\Omega, \mathscr{F}, \mathscr{P})$ for the basic probability triple. Put $(\Omega, \Sigma)$ for $(\Omega_1, \mathscr{F}_1)$, with $X_1(\omega) = \omega$. Put $(\Omega_V, \mathscr{F}_V)$ for $(\Omega_2, \mathscr{F}_2)$, with $X_2(\omega) = V(\omega)$. Let $f(\omega, v) = 1$ if $\omega \in A$ and $\phi[U(\omega), v] \in B$; otherwise, let $f(\omega, v) = 0$. Let $\bar{\mathscr{P}}$ be $\mathscr{P}$ retracted to $\Sigma$. Then

$\mathscr{P}\{A \text{ and } \phi(U, V) \in B\} = \int_\Omega f[X_1(\omega), V(\omega)] \, \mathscr{P}(d\omega)$

$= \int_\Omega \int_\Omega f[\omega, V(\omega')] \, \mathscr{P}(d\omega') \, \bar{\mathscr{P}}(d\omega)$

$= \int_A \int_\Omega f[\omega, V(\omega')] \, \mathscr{P}(d\omega') \, \bar{\mathscr{P}}(d\omega),$

because $f(\omega, \cdot) = 0$ for $\omega \notin A$. Recognize

$\int_\Omega f[\omega, V(\omega')] \, \mathscr{P}(d\omega') = \mathscr{P}\{\omega' : \phi[U(\omega), V(\omega')] \in B\} = D[U(\omega), B]$

for $\omega \in A$. $\quad *$

Regular conditional probabilities
If $\Omega_V = \Omega$ and $\mathscr{F}_V = \mathscr{F}$ and $V(\omega) = \omega$, then a regular conditional $\mathscr{P}$-distribution for $V$ given $\Sigma$ is called a regular conditional $\mathscr{P}$-probability given $\Sigma$. For (51) and (52), let $Q$ be a regular conditional $\mathscr{P}$-probability given $\Sigma$. That is, $(\Omega, \mathscr{F}, \mathscr{P})$ is the basic probability triple, and $\Sigma$ is a sub-$\sigma$-field of $\mathscr{F}$. Moreover $Q$ is a function of pairs $(\omega, B)$, with $\omega \in \Omega$ and $B \in \mathscr{F}$. The function $Q(\cdot, B)$ is a version of $\mathscr{P}(B \mid \Sigma)$, and the function $Q(\omega, \cdot)$ is a probability. Recall that $\Sigma(\omega)$ is the $\Sigma$-atom containing $\omega$.
(51) Theorem. Let $\Sigma$ be countably generated. Then the set of $\omega$ such that $Q(\omega, \Sigma(\omega)) = 1$ is a $\Sigma$-set of $\mathscr{P}$-probability 1.

PROOF. Let $\mathscr{A}$ be a countable generating algebra for $\Sigma$. For each $A \in \mathscr{A}$, let $A^*$ be the set of $\omega$ such that $Q(\omega, A) = 1_A(\omega)$. Then $A^*$ is a $\Sigma$-set of $\mathscr{P}$-probability 1, and the intersection of $A^*$ as $A$ varies over $\mathscr{A}$ is the set described in the theorem. $\quad *$

For (52), do not assume that $\Sigma$ is countably generated. Let $\mathscr{C}$ be the smallest $\sigma$-field over which $\omega \to Q(\omega, A)$ is measurable, for all $A \in \mathscr{F}$. Thus, $\mathscr{C} \subset \Sigma$. Let $E$ be the set of $\omega$ such that

$Q(\omega, \{\omega' : Q(\omega', \cdot) = Q(\omega, \cdot)\}) = 1.$

(52) Theorem. Suppose $\mathscr{F}$ is countably generated.

(a) $\mathscr{C}$ is countably generated.

(b) $E \in \mathscr{C}$.

(c) $\mathscr{P}(E) = 1$.

PROOF. Let $\mathscr{A}$ be a countable generating algebra for $\mathscr{F}$. Then $\mathscr{C}$ is also the smallest $\sigma$-field over which $\omega \to Q(\omega, A)$ is measurable, for all $A \in \mathscr{A}$, by the monotone class argument (Section 5). This proves (a). As (18) now implies,

$\mathscr{C}(\omega) = \{\omega' : Q(\omega', A) = Q(\omega, A) \text{ for all } A \in \mathscr{A}\}.$

Of course, $Q$ is a regular conditional $\mathscr{P}$-probability given $\mathscr{C}$. Finally, (51) proves (b) and (c). $\quad *$
Regular conditional distributions for partially defined random variables

Let $(\Omega, \mathscr{F}, \mathscr{P})$ be the basic probability triple, and let $\Sigma$ be a sub-$\sigma$-field of $\mathscr{F}$. Let $D \in \Sigma$. Let $V$ be a measurable mapping from $(D, D\Sigma)$ to a new space $(\Omega_V, \mathscr{F}_V)$. As usual, $D\Sigma$ is the $\sigma$-field of subsets of $D$ of the form $D \cap S$ with $S \in \Sigma$. A regular conditional $\mathscr{P}$-distribution for $V$ given $\Sigma$ on $D$ is a function $Q$ of pairs $(\omega, B)$ with $\omega \in D$ and $B \in \mathscr{F}_V$, such that:

$Q(\omega, \cdot)$ is a probability on $\mathscr{F}_V$ for each $\omega \in D$;

$Q(\cdot, B)$ is $D\Sigma$-measurable for each $B \in \mathscr{F}_V$; and

$\int_A Q(\omega, B) \, \mathscr{P}(d\omega) = \mathscr{P}\{A \text{ and } V \in B\}$

for all $A \in \Sigma$ with $A \subset D$ and all $B \in \mathscr{F}_V$. Of course, $A$ and $B$ can be confined to generating classes in the sense of (16). The partially defined situation is isomorphic to a fully defined one. Replace $\Omega$ by $D$, and $\mathscr{F}$ by $D\mathscr{F}$, and $\Sigma$ by $D\Sigma$, and $\mathscr{P}$ by $\mathscr{P}\{\cdot \mid D\}$. Theorems like (44) can therefore be used in partially defined situations.

Conditional independence

Let $(\Omega, \mathscr{F})$ and $(\Omega_i, \mathscr{F}_i)$ be Borel. Let $\mathscr{P}$ be a probability on $\mathscr{F}$, and $X_i$ a measurable mapping from $(\Omega, \mathscr{F})$ to $(\Omega_i, \mathscr{F}_i)$. Let $\Sigma$ be a sub-$\sigma$-field of $\mathscr{F}$. What does it mean to say $X_1$ and $X_2$ are conditionally $\mathscr{P}$-independent given $\Sigma$? The easiest criterion is

$\mathscr{P}\{X_1 \in A_1 \text{ and } X_2 \in A_2 \mid \Sigma\} = \mathscr{P}\{X_1 \in A_1 \mid \Sigma\} \cdot \mathscr{P}\{X_2 \in A_2 \mid \Sigma\}$ a.e.

for all $A_i \in \mathscr{F}_i$. Nothing is changed if $A_i$ is confined to a generating class for $\mathscr{F}_i$ in the sense of (16). Here is an equivalent criterion. Let $Q(\cdot, \cdot)$ be a regular conditional $\mathscr{P}$-distribution for $(X_1, X_2)$ given $\Sigma$. Then $Q(\omega, \cdot)$ is a probability on $\mathscr{F}_1 \times \mathscr{F}_2$. The variables $X_1, X_2$ are conditionally $\mathscr{P}$-independent given $\Sigma$ iff for $\mathscr{P}$-almost all $\omega$,

$Q(\omega, \cdot) = Q_1(\omega, \cdot) \times Q_2(\omega, \cdot),$

where $Q_i(\omega, \cdot)$ is the projection of $Q(\omega, \cdot)$ onto $\mathscr{F}_i$. Necessarily, $Q_i$ is a regular conditional $\mathscr{P}$-distribution for $X_i$ given $\Sigma$. The equivalence of these conditions is easy, using (10a).
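The product criterion can be checked by enumeration in a finite model (a Python sketch; the model is an illustrative assumption): toss a coin $Z$, and given $Z = z$ toss two more coins independently with head probability $h(z)$; then the regular conditional distribution of the pair given $\sigma(Z)$ factors on every atom.

```python
# Z in {0, 1} with P{Z = 1} = 0.5; given Z = z, X1 and X2 are independent
# coins with head probability h[z].  Sigma = sigma(Z).
h = {0: 0.2, 1: 0.7}
P = {(z, x1, x2): 0.5 * (h[z] if x1 else 1 - h[z]) * (h[z] if x2 else 1 - h[z])
     for z in (0, 1) for x1 in (0, 1) for x2 in (0, 1)}

def Q(z, x1, x2):
    """Regular conditional distribution of (X1, X2) on the atom {Z = z}."""
    atom = {k: v for k, v in P.items() if k[0] == z}
    return P[(z, x1, x2)] / sum(atom.values())

def Q1(z, x1):
    return sum(Q(z, x1, x2) for x2 in (0, 1))

def Q2(z, x2):
    return sum(Q(z, x1, x2) for x1 in (0, 1))

# Product criterion: Q(z, .) = Q1(z, .) x Q2(z, .) on every atom z.
for z in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            assert abs(Q(z, x1, x2) - Q1(z, x1) * Q2(z, x2)) < 1e-12
print("conditionally independent given sigma(Z)")
```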

11. THE KOLMOGOROV CONSISTENCY THEOREM†

Let $(\Omega_i, \mathscr{F}_i)$ be Borel for $i = 1, 2, \ldots$. Let

$\Omega = \Omega_1 \times \Omega_2 \times \cdots$ and $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2 \times \cdots.$

Let $\pi_n$ project $\Omega_1 \times \cdots \times \Omega_{n+1}$ onto $\Omega_1 \times \cdots \times \Omega_n$: namely,

$\pi_n(\omega_1, \ldots, \omega_n, \omega_{n+1}) = (\omega_1, \ldots, \omega_n).$

Let $\Pi_n$ project $\Omega$ onto $\Omega_1 \times \cdots \times \Omega_n$: namely,

$\Pi_n(\omega_1, \ldots, \omega_n, \omega_{n+1}, \ldots) = (\omega_1, \ldots, \omega_n).$

For $n = 1, 2, \ldots$, let $\mathscr{P}_n$ be a probability on $(\Omega_1 \times \cdots \times \Omega_n, \mathscr{F}_1 \times \cdots \times \mathscr{F}_n)$. Suppose the $\mathscr{P}_n$ are consistent, namely, $\mathscr{P}_{n+1}\pi_n^{-1} = \mathscr{P}_n$ for all $n$.

(53) Theorem. There is a unique probability $\mathscr{P}$ on $(\Omega, \mathscr{F})$ with $\mathscr{P}\Pi_n^{-1} = \mathscr{P}_n$ for all $n$.

† References: (Loeve, 1963, Section 4.3); (Neveu, 1965, Section III.3).


12. THE DIAGONAL ARGUMENT

Let $Z$ be the positive integers. Let $S$ be the set of strictly increasing functions from $Z$ to $Z$. Call $s \in S$ a subsequence of $Z$. For $s \in S$, the range of $s$ is the $s$-image $s(Z)$ of $Z$; and $s(n) \ge n$. Say $s$ is a subsequence, or on special occasions a sub-subsequence, of $t \in S$ iff $s \in S$ and for each $n \in Z$, there is a $\sigma(n) \in Z$ with $s(n) = t[\sigma(n)]$. This well-defines $\sigma$, and forces $\sigma \in S$. Further, $s(n) \ge t(n)$, because $\sigma(n) \ge n$. Geometrically, $s \in S$ is a subsequence of $t \in S$ iff the range of $s$ is a subset of the range of $t$. Thus, if $s$ is a subsequence of $t$, and $t$ is a subsequence of $u \in S$, then $s$ is a subsequence of $u$. If $s \in S$, and $m = 0, 1, \ldots$, define $s(m + \cdot) \in S$ as follows:

$s(m + \cdot)(n) = s(m + n)$ for $n \in Z$.

Of course, $s(m + \cdot)$ is a subsequence of $s$.

Here is a related notion. Say $s$ is eventually a subsequence of $t \in S$ iff $s \in S$ and $s(m + \cdot)$ is a subsequence of $t$ for some $m = 0, 1, \ldots$. Geometrically, $s \in S$ is eventually a subsequence of $t \in S$ iff the range of $s$ differs by a finite set from a subset of the range of $t$. In particular, if $s$ is eventually a subsequence of $t$, and $t$ is eventually a subsequence of $u \in S$, then $s$ is eventually a subsequence of $u$.

To state the first diagonal principle, let $s_1 \in S$ and let $s_{n+1}$ be a subsequence of $s_n$ for $n = 1, 2, \ldots$. Let $d$ be the diagonal sequence:

$d(n) = s_n(n)$ for $n = 1, 2, \ldots$.

(54) First diagonal principle. The diagonal sequence $d$ is a subsequence of $s_1$, and is eventually a subsequence of $s_n$ for all $n$.

PROOF. I claim $d \in S$:

$d(n + 1) = s_{n+1}(n + 1) \ge s_n(n + 1) > s_n(n) = d(n).$

Fix $n = 1, 2, \ldots$. I claim $d(n - 1 + \cdot)$ is a subsequence of $s_n$. Indeed, fix $m = 1, 2, \ldots$. Then $m - 1 \ge 0$, so

$d(n - 1 + m) = s_{n-1+m}(n - 1 + m) \in s_{n-1+m}(Z) \subset s_n(Z). \quad *$

(55) Illustration. Let $s_n(m) = n + m$. So $d(n) = 2n$, as in Figure 2.
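Illustration (55) can be run directly (a Python sketch):

```python
def s(n):
    return lambda m: n + m          # s_n(m) = n + m, for m = 1, 2, ...

def d(n):
    return s(n)(n)                  # diagonal sequence: d(n) = s_n(n)

# The range of s_{n+1} is {n+2, n+3, ...}, a subset of the range of s_n,
# so each s_{n+1} is a subsequence of s_n; the diagonal is d(n) = 2n.
print([d(n) for n in range(1, 6)])  # [2, 4, 6, 8, 10]

# d is eventually a subsequence of each s_n: d(m) = 2m lies in the range
# {n+1, n+2, ...} of s_n as soon as 2m > n.
n = 5
assert all(d(m) > n for m in range(n, n + 10))
```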
Figure 2. The array $s_n(m) = n + m$; the diagonal entries $s_n(n) = 2n$ form $d$.

            m = 1   2   3   4
    $s_1$     2     3   4   5
    $s_2$     3     4   5   6
    $s_3$     4     5   6   7
    $s_4$     5     6   7   8

To make this a little more interesting, introduce a metric space $(\Omega, \rho)$. Let $f$ be a function from $Z$ to $\Omega$. Let $t \in S$. Suppose

$\lim_{n\to\infty} f[t(n)] = y \in \Omega.$

If $s$ is eventually a subsequence of $t$, you should check

$\lim_{n\to\infty} f[s(n)] = y.$
For the second diagonal principle, let $C$ be a countable set. For $c \in C$, let $f_c$ be a function from $Z$ to $\Omega$. Suppose that for each $c \in C$ and $t \in S$, there is a subsequence $s_c$ of $t$, such that

$\lim_{n\to\infty} f_c[s_c(n)]$

exists. This $s_c$ depends on $c$ and $t$.

(56) Second diagonal principle. For each $t \in S$, there is a subsequence $d$ of $t$ such that

$\lim_{n\to\infty} f_c[d(n)]$

exists for all $c \in C$. This $d$ depends on $t$, but not on $c$.

PROOF. Enumerate $C$ as $\{c_1, c_2, \ldots\}$. Abbreviate $f_n = f_{c_n}$. Inductively, construct $s_n \in S$ such that $s_0 = t$ and $s_n$ is a subsequence of $s_{n-1}$ and

$\lim_{m\to\infty} f_n[s_n(m)]$

exists. Call this limit $y_n$; of course, $y_n$ may depend on $s_1, \ldots, s_n$. Using the first diagonal principle, construct the diagonal subsequence $d$, which is a subsequence of $t$ and eventually a subsequence of each $s_n$. So,

$\lim_{m\to\infty} f_n[d(m)] = y_n$ for all $n$. $\quad *$
13. CLASSICAL LEBESGUE MEASURE

Euclidean $n$-space $R^n$ comes equipped with a metric $\rho_n$, and is complete and separable (37). A real-valued random variable on $(\Omega, \mathscr{F})$ is now a measurable mapping to Borel $R^1$. The classical $n$-dimensional Lebesgue measure $\lambda_n$ is the countably additive, nonnegative set function on Borel $R^n$ whose value at an $n$-dimensional cube is its $n$-dimensional volume.

(57) Theorem. Suppose $f$ is a bounded function on $R^n$ which vanishes outside a cube. Then $f$ is Riemann integrable iff $f$ is continuous $\lambda_n$-almost everywhere, and its Riemann integral coincides with $\int f \, d\lambda_n$.

This theorem will be used only to evaluate Lebesgue integrals, and the Riemann integrability of $f$ will be obvious.

(58) Theorem. Let $f$ be a measurable function on $R^n$, with finite Lebesgue integral. Then

$\lim_{h \to 0} \int_{R^n} |f(x + h) - f(x)| \, \lambda_n(dx) = 0.$

PROOF. If $f$ is continuous and vanishes outside a large cube, the result is clear. General $f$ can be approximated by these special $f$ in $L^1$-norm. $\quad *$
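Theorem (58) can be seen concretely for an indicator (a Python sketch, not a proof): with $f = 1_{[0,1]}$, the integral equals $2|h|$ for $|h| \le 1$, so it tends to 0 with $h$; a Riemann sum confirms this.

```python
def f(x):
    return 1.0 if 0 <= x <= 1 else 0.0

def l1_shift(h, grid=50000, lo=-2.0, hi=3.0):
    """Riemann-sum approximation to the integral of |f(x+h) - f(x)| dx."""
    dx = (hi - lo) / grid
    return sum(abs(f(lo + i * dx + h) - f(lo + i * dx)) for i in range(grid)) * dx

for h in (0.5, 0.1, 0.01):
    assert abs(l1_shift(h) - 2 * h) < 1e-3   # exact value is 2|h| for |h| <= 1
print("integral of |f(x+h) - f(x)| -> 0 as h -> 0")
```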
If $A$ is a Borel subset of $R^1$, the metric density of $A$ at $x$ is

$\lim_{\delta \downarrow 0} \dfrac{\lambda_1\{(x - \delta, x + \delta) \cap A\}}{2\delta}.$

(59) Metric density theorem.¹ Let $A$ be a Borel subset of $R^1$. There is a Borel set $B$ with $\lambda_1(B) = 0$, such that $A$ has metric density 1 at all $x \in A \setminus B$.

14. REAL VARIABLES

Let $f$ be a real-valued function on $[0, 1]$. Let $S = \{s_0, s_1, \ldots, s_n\}$ be a finite subset of $[0, 1]$ with $0 = s_0 < s_1 < \cdots < s_n = 1$. Let

$\Delta S = \max\{(s_{j+1} - s_j) : j = 0, \ldots, n - 1\},$

and

$Sf = \sum_{j=0}^{n-1} |f(s_{j+1}) - f(s_j)|.$

Let $W(S, f) = \sum_{j=0}^{n-1} (M_j - m_j)$, where

$M_j = \max\{f(t) : s_j \le t \le s_{j+1}\}$ and $m_j = \min\{f(t) : s_j \le t \le s_{j+1}\}.$

The variation of $f$ is $\sup_S Sf$; if this number is finite, $f$ is of bounded variation. If $S_n$ is nondecreasing and $\Delta S_n \downarrow 0$, then $S_n f$ tends to the variation of $f$; so $W(S_n, f)$ must tend to the variation of $f$ also.

(60) Lebesgue's theorem.¹ If $f$ is of bounded variation, then $f$ has a finite derivative Lebesgue almost everywhere.

¹ Reference: (Saks, 1964, Theorem 6.1 on p. 117. Theorem 10.2 on p. 129 is the $n$-dimensional generalization, which is harder.)
Theorem (60) can be sharpened as follows.

(61) Theorem.² Suppose $f$ is of bounded variation. The pointwise derivative of $f$ is a version of the Radon-Nikodym derivative of the absolutely continuous part of $f$, with respect to Lebesgue measure.

Even more is true.

(62) de la Vallée Poussin's theorem.³ Suppose $f$ is of bounded variation. The positive, continuous, singular part of $f$ is concentrated on $\{x : f'(x) = \infty\}$.

ASSUMPTION. For the rest of this section, assume $f$ is a continuous function on $[0, 1]$.

Let $s(y)$ be the number of $x$ with $f(x) = y$, so $s(y) = 0, 1, \ldots, \infty$.
(63) Banach's theorem.⁴ The variation of $f$ is $\int_{-\infty}^{\infty} s(y)\,dy$.

PROOF. Let $S_n = \{0, 1/2^n, 2/2^n, \ldots, 1\}$. Let $s_{n,0}$ be the indicator function of the $f$-image of the interval $[0, 1/2^n]$. For $j = 1, \ldots, 2^n - 1$, let $s_{n,j}$ be the indicator function of the $f$-image of $(j/2^n, (j + 1)/2^n]$. Let $s_n = \sum_{j=0}^{2^n - 1} s_{n,j}$. Verify that $s_n \uparrow s$, so $s$ is Borel and

$\int_{-\infty}^{\infty} s(y)\,dy = \lim_n \int_{-\infty}^{\infty} s_n(y)\,dy = \lim_n W(S_n, f),$

because
¹ References: (Saks, 1964, Theorem 5.4 on p. 115); (Riesz-Nagy, 1955, Chapter 1) has a proof from first principles. This theorem is hard.
² References: (Dunford and Schwartz, 1958, III.12); (Saks, 1964, Theorem 7.4 on p. 119). It's hard.
³ Reference: (Saks, 1964, Theorem 9.6 on p. 127). Theorems (60)-(62) are hard.
⁴ Reference: (Saks, 1964, Theorem 6.4 on p. 280).
Figure 3. [The graph of $f$, with the points $0$, $a$, and $x_0$ on the axis.]

$\int_{-\infty}^{\infty} s_{n,j}(y)\,dy = M_{n,j} - m_{n,j},$

where

$M_{n,j} = \max\{f(t) : j/2^n \le t \le (j + 1)/2^n\},$

$m_{n,j} = \min\{f(t) : j/2^n \le t \le (j + 1)/2^n\}. \quad *$

The upper right Dini derivative $D^*f$ is defined by:

$D^*f \cdot x = \limsup_{\epsilon \downarrow 0}\, [f(x + \epsilon) - f(x)]/\epsilon,$

for $0 \le x < 1$.
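Banach's theorem (63) can be checked numerically for $f(x) = 4x(1 - x)$ (a Python sketch; the grid sizes are arbitrary): the variation is 2 — up from 0 to 1, then back down — and $s(y) = 2$ for $0 < y < 1$, so $\int s(y)\,dy = 2$ as well.

```python
def f(x):
    return 4 * x * (1 - x)

# Variation of f over [0, 1], via a fine partition (the sum Sf above).
N = 4000
var = sum(abs(f((j + 1) / N) - f(j / N)) for j in range(N))

def s(y, K=4000):
    """Count solutions of f(x) = y on [0, 1] by counting sign changes of
    f - y on a fine grid (adequate for this smooth f and these y)."""
    count = 0
    prev = f(0) - y
    for i in range(1, K + 1):
        cur = f(i / K) - y
        if (prev <= 0 < cur) or (cur <= 0 < prev):
            count += 1
        prev = cur
    return count

# Banach: the variation should equal the integral of s(y) dy.
M = 200
integral_s = sum(s((k + 0.5) * 2 / M) * (2 / M) for k in range(M))

print(abs(var - 2) < 1e-9, abs(integral_s - 2) < 0.02)  # True True
```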
(64) Zygmund's theorem.¹ If the set of values assumed by $f$ on the set of $x$ with $D^*f \cdot x \le 0$ includes no proper interval, then $f$ is nondecreasing.

PROOF. Suppose by way of contradiction that there are $a$ and $b$ with $0 < a < b < 1$ and $f(a) > f(b)$. Find one $y$ with $f(a) > y > f(b)$, such that $y \notin f\{D^*f \le 0\}$; that is, $y = f(x)$ entails $D^*f \cdot x > 0$. Let $x_0$ be the largest $x \in [a, b]$ with $f(x) = y$, so $x_0 < b$. But $f < y$ on $(x_0, b]$, so $D^*f \cdot x_0 \le 0$. See Figure 3. $\quad *$

(65) Corollary. If the set of $x$ with $D^*f \cdot x < 0$ is at most countable, then $f$ is nondecreasing.

PROOF. Let $\epsilon > 0$ and $f_\epsilon(x) = f(x) + \epsilon x$. Now $D^*f_\epsilon \cdot x = D^*f \cdot x + \epsilon$, so $\{D^*f_\epsilon \le 0\}$ is at most countable. By Zygmund's theorem, $f_\epsilon$ is nondecreasing. Let $\epsilon \to 0$. $\quad *$
¹ Reference: (Saks, 1964, Theorem 7.1 on p. 203).

(66) Dini's theorem.¹

(a) $\sup_{0 \le x < 1} D^*f \cdot x = \sup_{0 \le x < y \le 1} \dfrac{f(y) - f(x)}{y - x}$

and

(b) $\inf_{0 \le x < 1} D^*f \cdot x = \inf_{0 \le x < y \le 1} \dfrac{f(y) - f(x)}{y - x}.$

PROOF. Fix a finite $m$. Then $D^*f \cdot x \ge m$ for $0 \le x < 1$ implies $f(x) - mx$ is nondecreasing with $x$, by (65). By algebra, for $0 \le x < y \le 1$,

$\dfrac{f(y) - f(x)}{y - x} \ge m.$

Consequently, $\inf\, [f(y) - f(x)]/(y - x) \ge \inf D^*f \cdot x$. The opposite inequality is clear, proving (b). Assertion (a) is easy. $\quad *$
(67) Corollary.² If $D^*f$ is continuous at $x$, then $f$ is differentiable at $x$.

PROOF. Use (66). $\quad *$

(68) Corollary. If $D^*f \equiv 0$ on $[0, 1)$, then $f$ is constant on $[0, 1]$.

PROOF. Use (67). $\quad *$

(69) Theorem. Suppose $f$ has a finite, right continuous, right derivative $f^+$ on $(0, 1)$, which has a finite integral over $(\epsilon, 1 - \epsilon)$ for any $\epsilon > 0$. If $0 < x < y < 1$, then

$f(y) = f(x) + \int_x^y f^+(t)\,dt.$

PROOF. Let

$g(y) = f(y) - f(x) - \int_x^y f^+(t)\,dt.$

Then $g$ is continuous and $D^*g = 0$ on $[x, 1)$, while $g(x) = 0$. Use (68). $\quad *$
Miscellany

Let $\mu$ be a probability on the Borel subsets of $[0, \infty)$. Its Laplace transform $\varphi$ is this function of nonnegative $\lambda$:

$\varphi(\lambda) = \int_{[0,\infty)} e^{-\lambda x}\,\mu(dx).$

¹ Reference: (Saks, 1964, p. 204).
² Reference: (Saks, 1964, p. 204).

(70) Theorem.¹ $\varphi$ determines $\mu$.

As usual, $0! = 1$, and $1! = 1$, and $(n + 1)! = (n + 1) \cdot n!$. Write

$\binom{n}{m} = \dfrac{n!}{m!(n - m)!},$

the number of subsets with $m$ elements which can be chosen from a set with $n$ elements. Temporarily, let

$s(n) = (2\pi n)^{1/2}\, n^n e^{-n}.$

(71) Stirling's formula.² $n!/s(n) \to 1$ as $n \to \infty$.
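Stirling's formula is easy to check numerically (a Python sketch):

```python
import math

def s(n):
    """Stirling's approximation to n! (the s(n) defined above)."""
    return math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)

ratios = [math.factorial(n) / s(n) for n in (1, 5, 10, 50)]
print([round(r, 4) for r in ratios])  # approaches 1 from above, roughly like 1 + 1/(12n)

assert all(r > 1 for r in ratios)
assert ratios == sorted(ratios, reverse=True)   # the ratio decreases
assert abs(ratios[-1] - 1) < 0.01
```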

15. ABSOLUTE CONTINUITY

A function $f$ on $[0, 1]$ is absolutely continuous iff for every positive $\epsilon$ there is a positive $\delta$ such that:

$0 \le x_1 < y_1 \le x_2 < y_2 \le \cdots \le x_n < y_n \le 1$

and

$\sum_{i=1}^n (y_i - x_i) < \delta$

imply

$\sum_{i=1}^n |f(y_i) - f(x_i)| < \epsilon.$

In other words, $f$ is absolutely continuous iff $f$ is of bounded variation, and the measure induced by $f$ is absolutely continuous with respect to Lebesgue measure. The pointwise derivative of $f$ is then a version of the Radon-Nikodym derivative, by (61). Suppose $g$ is another function on $[0, 1]$. If $f$ and $g$ are absolutely continuous, and $f' = g'$ a.e., it follows that $f - g$ is constant. Suppose $f$ and $g$ are absolutely continuous, and $g$ is nondecreasing, and $0 \le g(0) \le g(1) \le 1$. Then $f \circ g$ is absolutely continuous, as is immediate from the definition; and

THE CHAIN RULE.³ $(f \circ g)' = (f' \circ g)g'$ a.e.

¹ Reference: (Feller, 1966, Theorem 1 on p. 408).
² Reference: (Feller, 1968, II.9).
³ Reference: (Serrin and Varberg, 1969).

PROOF. For simplicity, suppose $f(0) = g(0) = 0$ and $g(1) = 1$. Let $0 \le t \le 1$. Here is a computation.

$f \circ g(t) = \int_0^{g(t)} f'(u)\,du$

$= \int_0^t f'[g(s)]\,g(ds)$

$= \int_0^t f'[g(s)]\,g'(s)\,ds.$

The first equality holds by (61); the second by (20), for the $g$-distribution of $g$ is uniform on $[0, 1]$; and the third by (61) and (11). Now use (61) and (11) to differentiate. $\quad *$
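The chain rule can be tested numerically on smooth examples (a Python sketch; the particular $f$ and $g$ are arbitrary smooth choices): with $f(x) = x^2$ and $g(x) = x^3$, both sides equal $6x^5$.

```python
def f(x): return x * x
def g(x): return x ** 3

def num_deriv(h, x, eps=1e-6):
    """Central-difference approximation to h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

for x in (0.2, 0.5, 0.8):
    lhs = num_deriv(lambda t: f(g(t)), x)        # (f o g)'
    rhs = num_deriv(f, g(x)) * num_deriv(g, x)   # (f' o g) g'
    assert abs(lhs - rhs) < 1e-6
    assert abs(lhs - 6 * x ** 5) < 1e-6
print("(f o g)' = (f' o g) g' on the test points")
```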
(72) Theorem. Suppose the upper or lower right Dini derivative of $f$ is finite at all but a countable number of points, and $f$ is of bounded variation. Then $f$ is absolutely continuous.

PROOF. Use (62). $\quad *$
16. CONVEX FUNCTIONS

Let $f$ be a real-valued function on the open interval $(a, b)$. Abbreviate $\bar p = 1 - p$. Then $f$ is convex iff

$a < x < y < b$ and $0 < p < 1$

imply

$f(px + \bar p y) \le pf(x) + \bar p f(y).$

Geometrically, each chord of $f$ is nowhere below $f$, as in Figure 4. Say $f$ is strictly convex iff strict inequality holds. Say $f$ is concave or strictly concave iff $-f$ is convex or strictly convex. For (73), suppose $f$ is convex and

$a < x < y < b$ and $a < x' < y' < b$ and $x \le x'$ and $y \le y'.$

As in Figure 4,

(73) $\quad \dfrac{f(y') - f(x')}{y' - x'} \ge \dfrac{f(y) - f(x)}{y - x}.$

Figure 4. [The graph of $f$ and a chord, with the points $a, x, x', y, y', b$ on the axis.]

Indeed, the case $x = x'$ restates the definition of convexity, as does the case $y = y'$. General (73) follows by combining the two cases: the slope of $f$ over $(x, y)$ is at most the slope over $(x, y')$, which does not exceed the slope over $(x', y')$.
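Inequality (73) is easy to test for a particular convex $f$ (a Python sketch with $f = \exp$; the chords chosen are arbitrary): moving both endpoints to the right never decreases the slope.

```python
import math

def slope(f, x, y):
    return (f(y) - f(x)) / (y - x)

f = math.exp  # convex on the whole line

# (73): moving both endpoints right (x <= x', y <= y') increases the slope.
chords = [(0.0, 1.0), (0.1, 1.0), (0.1, 1.5), (0.7, 2.0)]
slopes = [slope(f, x, y) for x, y in chords]
assert slopes == sorted(slopes)
print([round(s, 4) for s in slopes])
```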
(74) Theorem. Suppose $f$ is convex on $(a, b)$.

(a) $f$ is continuous.

(b) $f$ has a finite right derivative $f^+$, which is nondecreasing and continuous from the right.

(c) $f$ has a finite left derivative $f^-$, which is nondecreasing and continuous from the left.

(d) $f^+ \ge f^-$.

(e) The discontinuity sets of $f^+$ and $f^-$ coincide, and are countable. Off this set, $f^+ = f^-$.

(f) For $a < x < y < b$,

$f(y) - f(x) = \int_x^y f^+(t)\,dt = \int_x^y f^-(t)\,dt.$

(g) If $f$ is strictly convex, then $f^+$ and $f^-$ are strictly increasing.

(h) Suppose $a$ and $f(a+)$ are finite. Define $f(a) = f(a+)$. Then $f$ has a right derivative $f^+(a)$ at $a$, and $-\infty \le f^+(a) < \infty$. And $f^+(a+) = f^+(a)$. The situation at $b$ is symmetric.

PROOF. Claim (b). Let $y$ decrease to $x > w$. The slope of $f$ over $(x, y)$ nonincreases and is at least the slope over $(w, x)$ by (73), proving that

(75) $f^+(x)$ exists and is at most the slope of $f$ over $(x, y)$, and at least the slope over $(w, x)$.
Figure 5. [The graph of $f$, with the points $a, x, y, z, b$ on the axis.]

You can use (73) to show that f + is nondecreasing. I will argue that f + is
continuous from the right at x. Let
a < x < y < z < b,
as in Figure 5.

Then f+(x) ~ f+(y). But f+(y) is at most the slope of f over (y, z) by (75),
which tends to the slope of f over (x, z) when y tends to x. That is,

limy_xf+(y) ~ f(z) - f(x) ! f+(x) as z! x.


z-x
Claim (c)is symmetric.
Claim (d) follows from (73).
Claim (a). Get f(x) = f(x + ) from the finitude of f +, and f(x) = f(x - )
from symmetry.
Claim (e). Let h > O. Then f-(x + h) ~ f+(x) by (73). And (d) makes
f-(x + h) ~ f+(x + h). Let h -+ 0 and use (b):
f-(x+) = f+(x+) = f+(x).
Similarly
f-(x-) = f+(x-) = f-(x).
Claim (f). Use (a, b) and (69), then (e).
Claim (g). Improve (73) to strict inequality, when x < x' or y < y'. Let
a < x < y < z < w < b.
Use (75) and improved (73):

f+(x) ≤ [f(y) − f(x)] / (y − x) < [f(z) − f(y)] / (z − y) < [f(w) − f(z)] / (w − z) ↓ f+(z)

as w ↓ z.

Claim (h) is like (b).
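Claim (f) can be illustrated with a function whose one-sided derivatives are computable in closed form. The sketch below is not in the original; it takes f(x) = |x|, whose right derivative is −1 left of the kink and +1 from the kink on.

```python
def f(x):
    return abs(x)

def f_plus(t):
    # right derivative of |x|: -1 for t < 0, +1 for t >= 0
    return -1.0 if t < 0 else 1.0

x, y = -1.0, 2.0
# integral of f_plus over (x, y), split at the kink t = 0
integral = f_plus(-1.0) * (0.0 - x) + f_plus(1.0) * (y - 0.0)
# claim (f): f(y) - f(x) equals the integral of the right derivative
assert f(y) - f(x) == integral == 1.0
```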


(76) Theorem. Suppose f is convex on (a, b), and x ∈ (a, b). Using (74d),
choose s with

f−(x) ≤ s ≤ f+(x).

Let

λ(t) = s(t − x) + f(x),

the linear function with slope s which agrees with f at x. Then

λ ≤ f on (a, b).

In particular, a convex function on a finite interval is bounded below.
PROOF. Let x < t < b: the other case is symmetric. Then

s ≤ f+(x) ≤ [f(t) − f(x)] / (t − x)

by (75); look at Figure 6. So

f(t) ≥ s(t − x) + f(x) = λ(t).  *

NOTE. Suppose f is a function on (a, b). If either
(a) f has a finite, right continuous derivative f+ which is nondecreasing,
or
(b) there is a nondecreasing g with

f(y) = f(x) + ∫_x^y g(t) dt when a < x < y < b,

then f is convex.

Figure 6.

PROOF. Condition (a) implies (b) with g = f +, by (69). Suppose (b). Fix
x, y, z with
a < x < y < z < b.
Abbreviate c for the slope of f over (x, y), and d for the slope of f over (y, z).
Then

c = [1/(y − x)] ∫_x^y g(t) dt ≤ [1/(z − y)] ∫_y^z g(t) dt = d.

So (y, f(y)) cannot be above the chord joining (x, f(x)) and (z, f(z)).  *
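Theorem (76) can be checked at the worst point of a convex function, the kink of f(t) = |t|: there the left derivative is −1 and the right derivative is 1, and any slope s in between should give a line of support. A numerical sketch, not in the original:

```python
def f(t):
    return abs(t)

x = 0.0
for s in (-1.0, -0.5, 0.0, 0.5, 1.0):  # any s between the one-sided derivatives
    # the line through (x, f(x)) with slope s never rises above f
    ts = [k / 10.0 for k in range(-30, 31)]
    assert all(s * (t - x) + f(x) <= f(t) for t in ts)
```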

17. COMPLEX VARIABLES

If z = x + iy, where x and y are real and i² = −1, then Re z = x and
Im z = y and |z| = (x² + y²)^{1/2}. Moreover, e^z = e^x(cos y + i sin y), so¹

(77)    |e^z| = e^{Re z}.

A complex function f of a complex variable is analytic on an open half-
plane H iff it is differentiable at every point of H.
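The displayed formula for e^z, and the modulus identity it yields, can be confirmed with Python's cmath; this quick sketch is not part of the original text.

```python
import cmath
import math

for z in (1 + 2j, -0.5 + 3j, 2 - 1j, 0j):
    w = cmath.exp(z)
    # e^z = e^x (cos y + i sin y), where x = Re z and y = Im z
    assert abs(w.real - math.exp(z.real) * math.cos(z.imag)) < 1e-12
    assert abs(w.imag - math.exp(z.real) * math.sin(z.imag)) < 1e-12
    # hence |e^z| = e^{Re z}, since |cos y + i sin y| = 1
    assert abs(abs(w) - math.exp(z.real)) < 1e-12
```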

(78) Theorem.² Suppose f is analytic on an open half-plane H. Then f has
derivatives f⁽ⁿ⁾ of all orders n at all points of H. If z₀ ∈ H, choose a positive
real number r so small that

D_r = {z : |z − z₀| ≤ r} ⊂ H.

Then

f(z) = Σ_{n=0}^∞ f⁽ⁿ⁾(z₀)(z − z₀)ⁿ/n!.

The series converges absolutely and uniformly on D_r.
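For an entire function like exp, the expansion in (78) can be tested directly: every derivative of exp at z₀ is exp(z₀), so the partial sums are easy to generate. A numerical sketch, not in the original; the base point z₀ and radius r = 2 are arbitrary choices.

```python
import cmath

z0 = 1 + 1j  # arbitrary base point
r = 2.0      # arbitrary radius; exp is analytic everywhere

def taylor_exp(z, terms=60):
    # partial sum of exp(z0) * (z - z0)^n / n! over n = 0, ..., terms - 1
    total, term = 0j, cmath.exp(z0)
    for n in range(terms):
        total += term
        term *= (z - z0) / (n + 1)
    return total

# the series matches exp on the closed disk |z - z0| <= r
for z in (z0 + r, z0 - r, z0 + r * 1j, z0 + 0.3 - 0.4j):
    assert abs(taylor_exp(z) - cmath.exp(z)) < 1e-9
```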
(79) Theorem.³ If f_n is an analytic function on a half-plane and f_n → f uniformly
on compact sets, then f is analytic.
(80) Theorem.⁴ If f is an analytic function on a half-plane, and f vanishes
on a set which has a point of accumulation, then f is identically 0.

¹ Reference: (Ahlfors, 1953, p. 47).
² Reference: (Ahlfors, 1953, pp. 96 and 142). Hard.
³ Reference: (Ahlfors, 1953, Theorem 1 on p. 138). Hard.
⁴ Reference: (Ahlfors, 1953, p. 102). Hard.
BIBLIOGRAPHY

LARS V. AHLFORS (1953; 2nd ed., 1965). Complex Analysis, McGraw-Hill, New York.
DAVID BLACKWELL (1954). On a class of probability spaces, Proc. 3rd Berk. Symp.,
Vol. 2, pp. 1-6.
DAVID BLACKWELL (1955). On transient Markov processes with a countable number
of states and stationary transition probabilities, Ann. Math. Statist., Vol. 26, pp.
654-658.
DAVID BLACKWELL (1958). Another countable Markov process with only instan-
taneous states, Ann. Math. Statist., Vol. 29, pp. 313-316.
DAVID BLACKWELL (1962). Representation of nonnegative martingales on transient
Markov chains, Mimeograph, Statistics Department, University of California at
Berkeley.
DAVID BLACKWELL and LESTER DUBINS (1963). A converse to the dominated conver-
gence theorem, Illinois J. Math., Vol. 7, pp. 508-514.
DAVID BLACKWELL and DAVID A. FREEDMAN (1964). The tail σ-field of a Markov
chain and a theorem of Orey, Ann. Math. Statist., Vol. 35, pp. 1291-1295.
DAVID BLACKWELL and DAVID FREEDMAN (1968). On the local behavior of Markov
transition probabilities, Ann. Math. Statist., Vol. 39, pp. 2123-2127.
DAVID BLACKWELL and DAVID KENDALL (1964). The Martin boundary for Pólya's
urn scheme and an application to stochastic population growth, J. Appl. Proba-
bility, Vol. 1, pp. 284-296.
R. M. BLUMENTHAL (1957). An extended Markov property, Trans. Amer. Math. Soc.,
Vol. 85, pp. 52-72.
R. M. BLUMENTHAL and R. K. GETOOR (1968). Markov Processes and Potential
Theory, Academic Press, New York.
R. M. BLUMENTHAL, R. GETOOR, and H. P. MCKEAN, Jr. (1962). Markov processes
with identical hitting distributions, Illinois J. Math., Vol. 6, pp. 402-420.
LEO BREIMAN (1968). Probability, Addison-Wesley, Reading.
D. BURKHOLDER (1962). Transient processes and a problem of Blackwell, Mimeo-
graph, Statistics Department, University of California at Berkeley.
D. BURKHOLDER (1962). Successive conditional expectations of an integrable function,
Ann. Math. Statist., Vol. 33, pp. 887-893.
KAI LAI CHUNG (1960; 2nd ed., 1967). Markov Chains with Stationary Transition
Probabilities, Springer, Berlin.
KAI LAI CHUNG (1963). On the boundary theory for Markov chains, I, Acta. Math.,
Vol. 110, pp. 19-77.
KAI LAI CHUNG (1966). On the boundary theory for Markov chains, II, Acta. Math.,
Vol. 115, pp. 111-163.

KAI LAI CHUNG and W. H. J. FUCHS (1951). On the distribution of values of sums of
random variables, Mem. Amer. Math. Soc., no. 6.
R. COGBURN and H. G. TUCKER (1961). A limit theorem for a function of the incre-
ments of a decomposable process, Trans. Amer. Math. Soc., Vol. 99, pp. 278-284.
HARALD CRAMÉR (1957). Mathematical Methods of Statistics, Princeton University
Press.
ABRAHAM DE MOIVRE (1718). The Doctrine of Chances, Pearson, London, Chelsea,
New York (1967).
C. DERMAN (1954). A solution to a set of fundamental equations in Markov chains,
Proc. Amer. Math. Soc., Vol. 5, pp. 332-334.
W. DOEBLIN (1938). Sur deux problèmes de M. Kolmogorov concernant les chaînes
dénombrables, Bull. Soc. Math. France, Vol. 66, pp. 210-220.
W. DOEBLIN (1939). Sur certains mouvements aléatoires discontinus, Skand. Akt.,
Vol. 22, pp. 211-222.
MONROE D. DONSKER (1951). An invariance principle for certain probability limit
theorems, Mem. Amer. Math. Soc., no. 6.
J. L. DOOB (1942). Topics in the theory of Markoff chains, Trans. Amer. Math. Soc.,
Vol. 52, pp. 37-64.
J. L. DOOB (1945). Markoff chains-denumerable case, Trans. Amer. Math. Soc., Vol.
58, pp. 455-473.
J. L. DOOB (1953). Stochastic Processes, Wiley, New York.
J. L. DOOB (1959). Discrete potential theory and boundaries, J. Math. Mech., Vol. 8,
pp. 433-458, 993.
J. L. DOOB (1968). Compactification of the discrete state space of a Markov process,
Z. Wahrscheinlichkeitstheorie, Vol. 10, pp. 236-251.
LESTER E. DUBINS and DAVID A. FREEDMAN (1964). Measurable sets of measures,
Pac. J. Math., Vol. 14, pp. 1211-1222.
LESTER E. DUBINS and DAVID A. FREEDMAN (1965). A sharper form of the Borel-
Cantelli lemma and the strong law, Ann. Math. Statist., Vol. 36, pp. 800-807.
LESTER E. DUBINS and GIDEON SCHWARZ (1965). On continuous martingales, Proc.
Nat. A cad. Sci. USA, Vol. 53, pp. 913-916.
NELSON DUNFORD and JACOB T. SCHWARTZ (1958). Linear Operators, Part I, Wiley,
New York.
A. DVORETZKY, P. ERDOS, and S. KAKUTANI (1960). Nonincrease everywhere of the
Brownian motion process, Proc. 4th Berk. Symp., Vol. 2, pp. 103-116.
E. B. DYNKIN (1965). Markov Processes, Springer, Berlin.
P. ERDOS and M. KAC (1946). On certain limit theorems of the theory of probability.
Bull. Amer. Math. Soc., Vol. 52, pp. 292-302.
WILLIAM FELLER (1945). On the integro-differential equations of purely discontinuous
Markoff processes, Trans. Amer. Math. Soc., Vol. 48, pp. 488-515.
WILLIAM FELLER (1956). Boundaries induced by nonnegative matrices, Trans. Amer.
Math. Soc., Vol. 83, pp. 19-54.
WILLIAM FELLER (1957). On boundaries and lateral conditions for the Kolmogoroff
differential equations, Ann. of Math., Vol. 65, pp. 527-570.
WILLIAM FELLER (1959). Non-Markovian processes with the semigroup property,
Ann. Math. Statist., Vol. 30, pp. 1252-1253.

WILLIAM FELLER (1961). A simple proof for renewal theorems, Comm. Pure Appl.
Math., Vol. 14, pp. 285-293.
WILLIAM FELLER (1966). An introduction to probability theory and its applications,
Vol. 2, Wiley, New York.
WILLIAM FELLER (1968). An introduction to probability theory and its applications,
Vol. 1, 3rd ed., Wiley, New York.
WILLIAM FELLER and H. P. MCKEAN, Jr. (1956). A diffusion equivalent to a countable
Markov chain, Proc. Nat. Acad. Sci. USA, Vol. 42, pp. 351-354.
R. GETOOR (1965). Additive functionals and excessive functions, Ann. Math. Statist.,
Vol. 36, pp. 409-423.
G. H. HARDY, J. E. LITTLEWOOD, and G. PÓLYA (1934). Inequalities, Cambridge
University Press.
T. E. HARRIS (1952). First passage and recurrence distributions, Trans. Amer. Math.
Soc., Vol. 73, pp. 471-486.
T. E. HARRIS and H. ROBBINS (1953). Ergodic theory of Markov chains admitting an
infinite invariant measure, Proc. Nat. Acad. Sci. USA, Vol. 39, pp. 860-864.
P. HARTMAN and A. WINTNER (1941). On the law of the iterated logarithm, Amer.
J. Math., Vol. 63, pp. 169-176.
FELIX HAUSDORFF (1957). Set Theory, Chelsea, New York.
EDWIN HEWITT and L. J. SAVAGE (1955). Symmetric measures on Cartesian products,
Trans. Amer. Math. Soc., Vol. 80, pp. 470-501.
E. HEWITT and K. STROMBERG (1965). Real and Abstract Analysis, Springer,
Berlin.
G. A. HUNT (1956). Some theorems concerning Brownian motion, Trans. Amer.
Math. Soc., Vol. 81, pp. 294-319.
G. A. HUNT (1957). Markoff processes and potentials, 1,2,3, Illinois J. Math., Vol. 1,
pp. 44-93; Vol. 1, pp. 316-369; Vol. 2, pp. 151-213 (1958).
G. A. HUNT (1960). Markoff chains and Martin boundaries, Illinois J. Math., Vol. 4,
pp.313-340.
K. ITÔ and H. P. MCKEAN, Jr. (1965). Diffusion Processes and Their Sample Paths,
Springer, Berlin.
W. B. JURKAT (1960). On the analytic structure of semigroups of positive matrices,
Math. Zeit., Vol. 73, pp. 346-365.
A. A. JUSKEVIC (1959). Differentiability of transition probabilities of a homogeneous
Markov process with countably many states, Moskov. Gos. Univ. Ucenye Zapiski,
No. 186, pp. 141-159; in Russian. Reviewed in Math. Rev. No. 3124 (1963).
M. KAC (1947). On the notion of recurrence in discrete stochastic processes, Bull.
Amer. Math. Soc., Vol. 53, pp. 1002-1010.
S. KAKUTANI (1943). Induced measure preserving transformations, Proc. Imp.
Acad. Tokyo, Vol. 19, pp. 635-641.
J. G. KEMENY and J. L. SNELL (1960). Finite Markov Chains, Van Nostrand, Prince-
ton.
J. G. KEMENY, J. SNELL, and A. W. KNAPP (1966). Denumerable Markov Chains,
Van Nostrand, Princeton.
A. KHINTCHINE (1924). Ein Satz der Wahrscheinlichkeitsrechnung, Fund. Math.,
Vol. 6, pp. 9-20.

J. F. C. KINGMAN (1962). The imbedding problem for finite Markov chains, Z.
Wahrscheinlichkeitstheorie, Vol. 1, pp. 14-24.
J. F. C. KINGMAN (1964). The stochastic theory of regenerative events, Z. Wahrschein-
lichkeitstheorie, Vol. 2, pp. 180-224.
J. F. C. KINGMAN (1968). On measurable p-functions, Z. Wahrscheinlichkeitstheorie,
Vol. 11, pp. 1-8.
J. F. C. KINGMAN and STEVEN OREY (1964). Ratio limit theorems for Markov chains,
Proc. Amer. Math. Soc., Vol. 15, pp. 907-910.
FRANK KNIGHT and STEVEN OREY (1964). Construction of a Markov process from
hitting probabilities, J. Math. Mech., Vol. 13, pp. 857-873.
A. KOLMOGOROV (1931). Über die analytischen Methoden in der
Wahrscheinlichkeitsrechnung, Math. Ann., Vol. 104, pp. 415-458.
A. KOLMOGOROV (1936). Anfangsgründe der Theorie der Markoffschen Ketten mit
unendlich vielen möglichen Zuständen, Mat. Sb., pp. 607-610.
A. KOLMOGOROV (1951). On the differentiability of the transition probabilities in
stationary Markov processes with a denumerable number of states, Moskov. Gos.
Univ. Ucenye Zapiski, Vol. 148, Mat. 4, pp. 53-59; in Russian. Reviewed on
p. 295 of Math. Rev. (1953).
ULRICH KRENGEL (1966). On the global limit behavior of Markov chains and of
general nonsingular Markov processes, Z. Wahrscheinlichkeitstheorie, Vol. 4,
pp.302-316.
CASIMIR KURATOWSKI (1958). Topologie I, 4th ed. Warsaw.
PAUL LÉVY (1951). Systèmes markoviens et stationnaires. Cas dénombrable, Ann.
Sci. École Norm. Sup., (3), Vol. 68, pp. 327-381.
PAUL LÉVY (1952). Complément à l'étude des processus de Markoff, Ann. Sci. École
Norm. Sup., (3), Vol. 69, pp. 203-212.
PAUL LÉVY (1953). Processus markoviens et stationnaires du cinquième type, C. R.
Acad. Sci. Paris, Vol. 236, pp. 1630-1632.
PAUL LÉVY (1954). Le Mouvement Brownien, Gauthier-Villars, Paris.
PAUL LÉVY (1954a). Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars,
Paris.
PAUL LÉVY (1958). Processus markoviens et stationnaires. Cas dénombrable, Ann.
Inst. H. Poincaré, Vol. 16, pp. 7-25.
PAUL LÉVY (1965). Processus Stochastiques et Mouvement Brownien, Gauthier-Villars,
Paris.
MICHEL LOÈVE (1963). Probability Theory, 3rd ed., Van Nostrand, Princeton.
A. A. MARKOV (1906). Extension of the law of large numbers to dependent events,
Bull. Soc. Phys. Math. Kazan., (2), Vol. 15, pp. 135-156; in Russian.
JACQUES NEVEU (1965). Mathematical Foundations of the Calculus of Probability,
Holden-Day, San Francisco.
STEVEN OREY (1962). An ergodic theorem for Markov chains, Z. Wahrscheinlich-
keitstheorie, Vol. 1, pp. 174-176.
DONALD ORNSTEIN (1960). The differentiability of transition functions, Bull. Amer.
Math. Soc., Vol. 66, pp. 36-39.
DANIEL RAY (1956). Stationary Markov processes with continuous paths, Trans. Amer.
Math. Soc., Vol. 82, pp. 452-493.

DANIEL RAY (1967). Some local properties of Markov processes, Proc. 5th Berk.
Symp., Vol. 2, part 2, pp. 201-212.
G. E. H. REUTER (1957). Denumerable Markov processes and the associated con-
traction semigroups on l, Acta Math., Vol. 97, pp. 1-46.
G. E. H. REUTER (1959). Denumerable Markov processes. J. London Math. Soc.,
Vol. 34, pp. 81-91.
G. E. H. REUTER (1969). Remarks on a Markov chain example of Kolmogorov, Z.
Wahrscheinlichkeitstheorie, Vol. 13, pp. 315-320.
F. RIESZ and B. SZ.-NAGY (1955). Functional Analysis, Ungar, New York.
B. A. ROGOZIN (1961). On an estimate of the concentration function, Theor. Proba-
bility Appl., Vol. 6, pp. 94-96.
H. L. ROYDEN (1963). Real Analysis, Macmillan, New York.
STANISLAW SAKS (1964). Theory of the Integral, 2nd rev. ed., Dover, New York.
JAMES SERRIN and D. E. VARBERG (1969). A general chain rule for derivatives and the
change of variables formula for the Lebesgue integral, Amer. Math. Monthly, Vol.
76, pp. 514-520.
A. SKOROKHOD (1965). Studies in the Theory of Random Processes, Addison-Wesley,
Reading.
GERALD SMITH (1964). Instantaneous states of Markov processes, Trans. Amer.
Math. Soc., Vol. 110, pp. 185-195.
J. M. O. SPEAKMAN (1967). Two Markov chains with a common skeleton, Z.
Wahrscheinlichkeitstheorie, Vol. 7, p. 224.
FRANK SPITZER (1956). A combinatorial lemma and its applications to probability
theory, Trans. Amer. Math. Soc., Vol. 82, pp. 323-339.
FRANK SPITZER (1964). Principles of Random Walk, Van Nostrand, Princeton.
VOLKER STRASSEN (1964). An invariance principle for the law of the iterated logarithm,
Z. Wahrscheinlichkeitstheorie, Vol. 3, pp. 211-226.
VOLKER STRASSEN (1966). A converse to the law of the interated logarithm, Z.
Wahrscheinlichkeitstheorie, Vol. 4, pp. 265-268.
VOLKER STRASSEN (1966a). Almost sure behavior of sums of independent random
variables and martingales, Proc. 5th Berk. Symp., Vol. 2, part I, pp. 315-343.
H. F. TROTTER (1958). A property of Brownian motion paths, Illinois J. Math., Vol. 2,
pp.425-433.
A. WALD (1944). On cumulative sums of random variables, Ann. Math. Statist., Vol.
15, pp. 283-296.
N. WIENER (1923). Differential space, J. Math. and Phys., Vol. 2, pp. 131-174.
DAVID WILLIAMS (1964). On the construction problem for Markov chains, Z. Wahr-
scheinlichkeitstheorie, Vol. 3, pp. 227-246.
DAVID WILLIAMS (1966). A new method of approximation in Markov chain theory
and its application to some problems in the theory of random time substitution,
Proc. Lond. Math. Soc. (3), Vol. 16, pp. 213-240.
DAVID WILLIAMS (1967). Local time at fictitious states, Bull. Amer. Math. Soc., Vol.
73, pp. 542-544.
DAVID WILLIAMS (1967a). A note on the Q-matrices of Markov chains, Z. Wahrschein-
lichkeitstheorie, Vol. 7, pp. 116-121.

DAVID WILLIAMS (1967b). On local time for Markov chains, Bull. Amer. Math. Soc.,
Vol. 73, pp. 432-433.
HELEN WITTENBERG (1964). Limiting distributions of random sums of independent
random variables, Z. Wahrscheinlichkeitstheorie, Vol. 1, pp. 7-18.
A. ZYGMUND (1959). Trigonometric Series, Cambridge University Press.

Additional references

C. DELLACHERIE and P. A. MEYER (1978). Probabilités et potentiel, Hermann,
Paris.
F. B. KNIGHT (1981). Essentials of Brownian motion and diffusion, Math. Surveys,
Vol. 18.
H. P. MCKEAN (1969). Stochastic Integrals, Academic Press, New York.
DAVID WILLIAMS (1979). Diffusions, Markov Processes, and Martingales, Wiley,
New York.
INDEX

absolute continuity, 333, 345, 360, 362 concentration function, 80, 99ff
Ahlfors, 365 Kolmogorov's inequality on, 104
almost sure statements, 334 of a sum tends to 0, 102
analytic functions, 365 conditional
arcsine law, 82, 93 distribution, 347
atoms, 204, 334, 352 expectation, probability, 338, 347
construction of a Markov chain
backward equation, 150, 243ff, 325; see from its visiting process and holding
"forward equation" times, 172ff
Banach, 358 which moves through its states in
Banach algebra, 147, 151 order, 173ff
binomial distribution which moves through the rationals,
bound on tails, 64 in order, 180,202
concentration of, 104ff with given generator, 154ff, 197ff,
maximal term, 273 237ff, 165ff
Blackwell, 39, 111,204,297,334 with given transitions, stable states
Blackwell and Freedman, 266 and regular sample functions, 221
Blackwell and Kendall, 131 with given transitions and quasi-
blocks, 15, 76 regular sample functions, 307
Blumenthal, 237 with sample functions which are ini-
boundary, 111, 124,293 tially step functions and are then
Brownian motion, 95 constant, 154ff
category with sample functions which are step
of set of infinities, 311 functions, 154ff, 165ff
of singularities in P( " a, b), 266 continuity
central limit theorem, 82 absolute, of transitions, 266
change of variables, 337 in probability of a chain, 221, 312
Chebychev, 332 of pre-t sigma-field in t, 204
Chung, 48, 82, 83, 98, 145, 146,218,223, of transitions, 143
237,243,245,266,300,314,323,325 convergence
Chung and Fuchs, 38 almost sure, 334
class in a metric space, 346
alternative description, 19 in L¹, 334
communicating, 17,40 in probability, 255, 333
cyclically moving, 18,40 of G(i, ~.)/pG(~.), 124
closed sets, 346 of states to the boundary, 124

convex functions, 361 Donsker, 82, 96, 98


coordinate process, 3, 7, 8, 218, 304 Doob, 25, 111, 145, 146, 151, 218, 243,
coterminous, 303 300,323,339
criterion for downcrossings, 300, 344
almost everywhere equality of func- Dubins and Freedman, 90
tions,334 Dunford and Schwartz, 149,357
equality of probabilities, 334 Dyson, 48, 70
Markov property, 141
Markov property on (0, ∞), 223, 234,
322 embedding a stochastic matrix in a
Markov times, 205ff semigroup, 298
minimal solution to be stochastic, 239 ergodic theory, 63
quasiregularity on (0, ∞), 321 essential range, 1
recurrence, 5, 19,28 excessive
recurrence in Bernoulli walk, 33 functions, 111,293
recurrence in Harris walk, 36 measures, 293
regularity on (0, ∞), 233 process, 40-41, 131
sample functions to be step functions, process, 40-41, 131
160,239 a-field, 39ff, 46
stochastic semigroups, 141-142 existence
strict Markov times, 205ff of derivatives, see "differentiation"
uniform integrability, 332 of Markov chains, 139; see "construc-
tion"
de la Vallée Poussin, 357 expectation, 331
derivatives, see "differentiation," exponential
"generator" bound for absorption probabilities,
Dini,361 28, 101
Derman, 47 bound for the law of large numbers, 64
diagonal argument, 354 distribution, 152ff
differentiation of a matrix, 147ff
of compositions, 361 extreme
of convex functions, 361ff excessive functions, 111ff, 119, 293ff
of functions of bounded variation, 357 excessive measures, 293ff
of probabilities through a net, 345 measures, 336
of transition probabilities, 140, 146,
151, 226, 243ff
of uniform semigroups, 148, 151 Fatou,331
Dini,361 Feller, 4, 11,22, 111,237,273,360
directed sets, 302ff field, 334
distribution, 3, 7, 337 finitary, 223, 234ff, 322ff
of a chain given its invariant σ-field, finite state space, 21, 28, 140
118ff forward equation, 150, 243, 246ff, 325;
of first hitting place, 294 see also "backward equation"
of holding times and visiting process, Fubini, 337/f
154ff, 165ff, 170-171, 227ff, 235ff, functional process
324ff partial sums, 82ff
of post-τ process given pre-τ sigma- sum over a j-block is 0, 78
field, 11ff, 229ff, 315ff sum over a j-block is in L¹, 60, 78
Doeblin, 15,47 which isn't Markov, 2
dominated convergence, 331 functions with right and left limits, 300

g.c.d., 5, 22 invariance principles, 82ff, 95ff


generator, 146, 148, 172, 192, 217, 237ff invariant, see also "excessive,"
and sample functions, 154ff, 165ff, 170, "harmonic"
228ff, 324ff measures, 7, 29, 47ff, 53ff, 59ff, 62ff
characterized in uniform case, 151 σ-field, 39ff, 46, 111ff, 120, 293ff
computed for examples, Chapters 6
and 8 j-interval, 219, 221, 309
does not determine the semigroup, j-sequence, 14
180, 197ff,202 joint measurability, 159, 175, 187,222
exists, 146 jump matrix, 216, 223ff, 227ff, 325
on two states, 297 jump process, see "jumps," "visiting
geometric distribution, 13,219 process"
greatest common divisor, see "g.c.d." jumps, 216, 224, 227ff, 324; see also
group "visiting process"
and g.c.d., 22 to or from the fictitious state, 217, 224,
random walk on, 44 245, 248ff, 325
Juskevic, 260
harmonic, 111
Harris, 36, 48
Harris and Robbins, 73 Kac,63
Hewitt and Savage, 40 Kakutani, 59
hitting times in L¹, 27, 81; see also Kemeny and Snell, 4
"times" Kingman, 298
hitting probabilities, 19; see also Kingman and Orey, 48
"Bernoulli walk," "Harris walk" Kolmogorov,7, 146
computed for examples in Sections Kolmogorov consistency theorem, 353
4.4-5 Kolmogorov 0-1 Law, 338
holding times, 13, 141, 154ff, 174ff, 182ff, Krengel, 58
216, 223ff, 235ff, 292ff
general case, 324ff L¹-spaces, 76ff, 334, 347
stable case, 227ff Laplace transforms, 360
uniform case, 165ff Law of large numbers, 338
Hunt, 111,237,294 for i.i.d. variables with skipping, 88
Lebesgue
incomparable, 104 decomposition, 333
independence, 337 integral, measure, 331, 356
conditional, 353 theorem on derivatives, 357
indicator function, 331 level sets, see "sets of constancy"
inequality Levy, 7,48,142,180,218,223,237
Chebychev,332 limits of
downcrossings, 344 concentration function, 102, 104
Jensen, 332 distribution of paths, see "invariance
Kolmogorov, 104 principles"
martingale, 339ff number of visits near 0, 89
on central term of binomial distribu- Pⁿ, 25
tion, 43 (Pⁿ)^{1/n}, 65
on E{(1 + wrt}, 106 (pn)lln,65
on tail of binomial, 64 ratio of number of visits or mean
Schwarz, 332 number of visits, see "ratio limit
intervals of constancy, 173,219,221,309 theorems"

subadditive function, see "subadditive" one-point compactification, 346; see also
"state, infinite"
sum of i.i.d. variables with skipping, 88 open sets, 346
line of support, 136 Orey, 39, 44
Loève, 331ff Ornstein, 146, 245
Markov, 7, 82
Markov chain defined, 1,8,60, 139 p-walk,273
Markov on (0, ∞) defined, 223 period, 5, 17ff; see also "Farrell"
Markov property, 9, 312ff permutation, 39
Markov times, 11, 203ff, 212ff, 217, Poisson process, 179
229ff, 328; see also "strict Markov post-exit process, 216, 223ff, 243ff, 323ff
times," "strong Markov property" Pólya urn, 131
martingales, 90, 339ff possible transition, 10, 40, 46
convergence, 344 post-τ process, 12-13, 230ff, 317ff
transformation, 339 pre-τ sigma-field, 11ff, 203ff, 230ff, 316ff
matrix probability, inner, outer, sub-, triple,
331
exponentiation, see "exponential"
jump, see "jump matrix" process
multiplication, 2, 7, 138 partially defined, 8, 59ff
semigroup, see "semigroup" Poisson, 179
stochastic, 2, 7, 138 post-exit, 216, 223ff, 243ff, 323ff
substochastic, 2, 7, 138 post-τ, 12-13, 230ff, 317ff
maximum of identically distributed L¹ whose holding times are alternately
variables, 84 constant and exponential, 267ff
mean product measurable, see "jointly
hitting times, 25, 26ff measurable"
number of visits, 19, 35, 52, 111ff; see
also "ratio limit theorems" q-lim, 302
return times, 25, 26ff, 30, 55 quasiconvergence, 302ff
measurability, 331, 337; see also "joint quasilimit, 302
measurability" quasiregularity, 304, 307ff, 318ff
measures, 334; see also "probabilities" on (0,00), 321ft'
equivalent, 333 relative to a given set, 312
extreme, 336
orthogonal, 333 Radon-Nikodym
singular, 333 derivative, 333, 345, 357, 361
a-finite, 334 derivative for h-chains 113ft'
meeting of particles, 45 theorem, 333 '
metric density, 314ff, 357 theorem, 333
metric perfection, 311 random variable, 331
metric space, 346 partially defined, 331, 353
minimal Q-process, 237ff partially defined, 331, 353
minimal solution, 217, 237ff random walk
monotone class, 335 boundary, 129
monotone convergence, 331 Harris, 36
in space-time, 130
Neveu, 90, 331ff in space-time, 130
number of visits to i on planar lattice, 45
before hitting J \ {i}, 132ff on planar lattice, 45
up to time n, 48, 75 Pólya urn, 131

visits near 0, 89 separability, 204


with integer time, 273ff sequence, Cauchy, fundamental, 333, 346
ratio limit theorems Serrin and Varberg, 361
almost everywhere, 73 set of infinities is
Doeblin, 147 closed from the right, 223
for Pᵢ-mean number of visits to j, 47, 50 discrete, 292ff
for Pᵢ-mean sum of f(ξₙ), 51, 57, 58 homeomorphic to a given set, 181, 295
for Pⁿ(i, j), 48, 50, 64ff, 70ff Lebesgue-null, 223, 311
for Pp-mean number of visits to i, 50, residual, 311
55,58 uncountable, 180
Harris-Levy, 48 sets of constancy, 222, 308ff
Kingman-Orey, 48, 64ff shift, 8, 113, 232ff, 313, 320ff
recurrence, 5 sigma-field, 331
criterion for, 5, 19,28 Borel, 204, 346
in Bernoulli walk, 32ff exchangeable, 39ff, 46
in Harris walk, 36ff invariant, 39ff, 46, 111ff, 120, 293ff
null, 6, 26, 27, 47ff pre-τ, 11ff, 203ff, 230ff, 316ff
positive, 6, 26, 27 separable, countably generated, 334,
regular conditional distributions, 347ff 352
regular Q-process, 238ff spanned, generated by a variable, 334
regular sample functions, 216ff tail, 39ff, 46
renewal theorem, 6, 22, 44 Smith, 146
restricting the range, 47,294 Speakman, 271
reversal Spitzer, 38
of matrix, 48 standard modification of a chain
of time, 47, 48ff, 250 with stable transitions, 221 ff
Riemann integral, 356 with standard transitions, 308
Riesz-Nagy,357 with uniform transitions, 168ff
Rogozin, 104 state
absorbing, 144
Saks,357ff adjoined, see "infinite"
saturation, 204 essential, 16,21
semigroup fictitious, see "infinite"
property, 11 infinite, 174, 183,201,216,218,299
standard, stochastic, substochastic, instantaneous, 144, 297ff
138 null, 19
standard stochastic on three states, period, 17ff
not determined by its value at positive, 26ff
time 1,271 recurrent, 19ff
standard stochastic on two states, 297 space, 1
standard stochastic with all states stable, 144, 216ff, 309
instantaneous, 297ff transient, 19
standard stochastic with infinite step function, 139, 154, 165
second derivative, 260ff Stirling's formula, 360
standard stochastic with P(t, a, b)/ Strassen, 82, 98, 99
P(t, a, c) oscillating as t → 0, 252ff strict Markov times, 203ff, 328
standard stochastic with P(t, a, a) near different from Markov times, 214
0 and P(1, a, a) near 1/e, 266ff Strong law, see "law of large numbers"
uniform, 147ff for martingales, 90
uniform substochastic, 150ff, 165ff

strong Markov property, 4, 13, 229ff, transient, 19ff
315ff transitions, 1, 60
strongly approximate forbidden, 34
Markov chains, 292ff possible, see "possible transitions"
p-walks,275ff
subadditive, 64 uniform continuity of transition prob-
subsequences, 354 abilities, 144, 150
uniform integrability, 332
tail σ-field, 39ff, 46
times variation, 357ff
admissible, 340 visiting process, 141, 154ff, 182ff, 223ff,
first holding, 216, 224 273ff, 292ff
hitting, 25, 27, 60, 80, 292ff general case, 324ff
holding, see "holding times" stable case, 227ff
Markov, see "Markov times" uniform case, 165ff
of first bad discontinuity, 160ff, 240ff
of last visit, 113ff Wald,26
return, 29, 85 Williams, 146
reversal of, see "reversal of time"
stopping, 340 zero-one law
strict Markov, see "strict Markov for chains, 39ff
times" Hewitt-Savage, 40
to absorption, 28 Kolmogorov, 338
to first 1 in a stationary 0-1 process, 63
to leave a strip, 101 Zygmund, 358
SYMBOL FINDER

DESCRIPTION

I've listed here the symbols with some degree of permanence; the list is
not complete, and local usage is sometimes different. The listing is alpha-
betical, first English then Greek; I give a quick definition, if possible, and
a page reference for the complete definition.
Sections 10.1-3 discuss notation and references.

ENGLISH
C, Cd: index sets, Chapter 6 only, page 184
C/[m, w] = CAw]: set, Chapter 6 only, page 184
C(i): communicating class containing i, page 17
C_r(i): cyclically moving class, page 18
c_X: concentration function of X, page 99
eP: expected number of visits, page 19
e"P : expected number of visits, page 49
ePH: expected number of visits, page 34
eP{ i} : expected number of visits, page 47
E is expectation
E_j is P_j-expectation
E: set, Chapter 4 only, page 118
ℰ: exchangeable σ-field, Chapter 1 only, page 39
ℰ: equivalent to invariant σ-field, Chapter 4 only, page 118
fP: hitting probability, page 19
f"P: hitting probability, page 19
fPH: hitting probability, page 34
f x v: measure, Chapter 2 only, page 51
ℱ_τ: pre-τ sigma-field, page 11
ℱ(τ): pre-τ sigma-field, page 203
ℱ(τ+): pre-τ sigma-field

ℱ(t): pre-t sigma-field


g: kernel, Chapter 4 only, page 115
G: expected number of visits, Chapter 4 only, page 111
h: excessive function, Chapter 4 only, page 111
H: set, Chapter 4 only, page 117
i, j, k, /: states
i --+ j: leads to, page 16
I: state space, pages 1,7, 138, 173, 184,217,299
Ii: states after i, Chapter 6 only, page 173
Ī = I ∪ {φ}: compactified state space, pages 139, 174, 184, 218, 299
I^∞: space of infinite I-sequences, pages 2, 7
I*: space of finite I-sequences, page 8
ℐ: invariant σ-field, pages 39, 113
ℐ*, ℐ_∞: invariant σ-fields, Chapter 4 only, page 113
I_h = {h > 0}: Chapter 4 only, page 112
In: finite set swelling to I, Chapter 4 only, page 113
L: last coordinate, Chapter 4 only, page 114
l(n): last hit before n, pages 74, 84
mP: mean waiting time, page 25
M, Md = M(d,· ): order-isomorphism, Chapter 6 only, page 184
N(t,x): random index, Chapter 5 only, page 157
p: starting probability on I, pages 8, 139
pG: expected number of visits, Chapter 4 only, page 111
ph: starting probability, Chapter 4 only, page 111
P: stochastic matrix, page 1
P: stochastic semigroup, pages 138, 176, 192,217,299
Pi: distribution of P-chain starting from i, pages 218, 304
Pp: distribution of P-chain starting from p, page 8
PH: forbidden transitions, page 34
P {i} : forbidden transitions, page 16
ph: Chapter 4 only, page 112
q: holding time parameters, pages 140, 155, 156, 166, 174, 184,217,237,299
q-lim: Chapter 9 only, page 302
Q = P′(0): generator, pages 140, 155, 166, 176, 192
Qh, Q",: probabilities, Chapter 4 only, page 112
rj: length of jth s-block, Chapter 3 only, page 85
R: nonnegative binary rationals, page 169
R_j(ω) = {r : ω(r) = j}: Chapter 9 only, page 308
s: reference state, pages 74, 82
sn = l{s.> O}: Chapter 3 only, page 83
S: shift, pages 232, 320
S_i(ω) = {t : X(t, ω) = i}: pages 222, 308
Sn = ~~ f(e): Chapter 3 only, page 83
T: shift, pages 8,162,177,194
T,,: shift, page 114

T,: shift, page 313


V_j = Σ_n {|f(ξ_n)| : τ_j ≤ n < τ_{j+1}}: Chapter 3 only, page 83
v_m = 1_{V_m > 0}: Chapter 3 only, page 83
V_m = Y_1 + ... + Y_m: Chapter 3 only, page 83
w ∈ W: sequence of positive reals, pages 155, 175, 186
W_i: sequence of positive reals, Chapter 6 only, page 174
W. : set, Chapter 9 only, page 313
X: coordinate process, pages 155,158,218,220,304,307
Xi: coordinate process, Chapter 6 only, page 174
f![: base space for s constructions, pages 155, 186
Y: post-exit process and post-τ process, pages 224, 230, 317, 323
Y_j = Σ_n {f(ξ_n) : τ_j ≤ n < τ_{j+1}}: Chapter 3 only, page 82
Y_n: last visit to I_n, Chapter 4 only, page 113

GREEK

Γ: jump matrix, pages 140, 166, 185, 224, 237, 325
Γ: extended to Ī = I ∪ {δ}, Chapter 5 only, page 155
δ: absorbing state
Δ = {τ < ∞}: pages 12, 230, 316
ζ: post-τ process, page 12
η_q: probability making coordinate c exponential with parameter q(c), independent of
the other coordinates, pages 156, 187
λ: left endpoint of interval of constancy, Chapter 6 only, pages 174, 186
λ*: sum, Chapter 6 only, page 185
ν(f) = Σ f(i)ν(i): Chapter 2 only, page 51
J = {ν > 0}: Chapter 2 only, page 48
νM: reversal of M by ν, Chapter 2 only, page 48
νMf = Σ ν(i)M(i, j)f(j): Chapter 2 only, page 48
ξ: coordinate process on I^∞, pages 3, 7

ξ: visiting process, pages 155, 185, 227, 324
π: probability in constructions, pages 155, 175, 187
ρ: right endpoint of interval of constancy, Chapter 6 only, pages 174, 186
ρ*: sum, Chapter 6 only, page 185
ρ_1: endpoint, Chapter 6 only, page 187
σ(I^∞): product σ-field in I^∞, pages 3, 7
σ: time of bad discontinuity, Chapter 5 only, page 157
τ: Markov time, pages 11, 230, 316
τ: holding times, pages 155, 174, 186, 227, 324
τ: first holding time, pages 224, 323
τ_n: times of visits to s, pages 74, 82
τ_n: times of visits to J, Chapter 2 only, page 60
τ_n: time of last visit to I_n, Chapter 4 only, page 113
φP(i, j): hitting probability, Chapter 1 only, page 25
φ(i, j): hitting probability, Chapter 4 only, page 124
φ: infinite state, pages 174, 184, 218, 299
ω ∈ I^∞: I-valued function on the nonnegative integers, pages 2, 7, 155
ω ∈ Ω: I-valued function, pages 112, 185, 218, 304
Ω*: set, pages 112, 185
Ω_∞: set, Chapter 4 only, page 112
Ω_g: good sample functions, Chapter 7 only, page 219
Ω_m: metrically perfect sample functions, Chapter 9 only, page 311
Ω_0: hit I_0, Chapter 4 only, page 113
Ω_q: quasiregular sample functions, Chapter 9 only, page 307
Ω_v: very good sample functions, Chapter 7 only, page 219
