

Stochastic Processes
with Applications
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in these
texts is not lost to today's students and researchers.

Robert E. O'Malley, Jr., University of Washington

Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Université de Genève

Classics in Applied Mathematics

C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and
Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained
Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One,
Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for
Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems
in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear
Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
*First time in print.

Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point
Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and
Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
Stochastic Processes
with Applications


Rabi N. Bhattacharya
University of Arizona
Tucson, Arizona

Edward C. Waymire
Oregon State University
Corvallis, Oregon

Society for Industrial and Applied Mathematics
Copyright 2009 by the Society for Industrial and Applied Mathematics

This SIAM edition is an unabridged republication of the work first published by John
Wiley & Sons (SEA) Pte. Ltd., 1992.


All rights reserved. Printed in the United States of America. No part of this book may
be reproduced, stored, or transmitted in any manner without the written permission of
the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

Library of Congress Cataloging-in-Publication Data

Bhattacharya, R. N. (Rabindra Nath), 1937-

Stochastic processes with applications / Rabi N. Bhattacharya, Edward C. Waymire.
p. cm. -- (Classics in applied mathematics ; 61)
Originally published: New York : Wiley, 1990.
Includes index.
ISBN 978-0-898716-89-4
1. Stochastic processes. I. Waymire, Edward C. II. Title.
QA274.B49 2009

SIAM is a registered trademark.

To Gouri and Linda,
with love

Preface to the Classics Edition

Preface

Sample Course Outline

I Random Walk and Brownian Motion

1. What is a Stochastic Process?
2. The Simple Random Walk
3. Transience and Recurrence Properties of the Simple Random Walk
4. First Passage Times for the Simple Random Walk
5. Multidimensional Random Walks
6. Canonical Construction of Stochastic Processes
7. Brownian Motion
8. The Functional Central Limit Theorem (FCLT)
9. Recurrence Probabilities for Brownian Motion
10. First Passage Time Distributions for Brownian Motion
11. The Arcsine Law
12. The Brownian Bridge
13. Stopping Times and Martingales
14. Chapter Application: Fluctuations of Random Walks with Slow Trends and the Hurst Phenomenon
Exercises
Theoretical Complements

II Discrete-Parameter Markov Chains

1. Markov Dependence
2. Transition Probabilities and the Probability Space
3. Some Examples
4. Stopping Times and the Strong Markov Property
5. A Classification of States of a Markov Chain
6. Convergence to Steady State for Irreducible and Aperiodic Markov Processes on Finite Spaces
7. Steady-State Distributions for General Finite-State Markov Processes
8. Markov Chains: Transience and Recurrence Properties
9. The Law of Large Numbers and Invariant Distributions for Markov Chains
10. The Central Limit Theorem for Markov Chains
11. Absorption Probabilities
12. One-Dimensional Nearest-Neighbor Gibbs States
13. A Markovian Approach to Linear Time Series Models
14. Markov Processes Generated by Iterations of I.I.D. Maps
15. Chapter Application: Data Compression and Entropy
Exercises
Theoretical Complements

III Birth-Death Markov Chains

1. Introduction to Birth-Death Chains
2. Transience and Recurrence Properties
3. Invariant Distributions for Birth-Death Chains
4. Calculations of Transition Probabilities by Spectral Methods
5. Chapter Application: The Ehrenfest Model of Heat Exchange
Exercises
Theoretical Complements

IV Continuous-Parameter Markov Chains

1. Introduction to Continuous-Time Markov Chains
2. Kolmogorov's Backward and Forward Equations
3. Solutions to Kolmogorov's Equations in Exponential Form
4. Solutions to Kolmogorov's Equations by Successive Approximation
5. Sample Path Analysis and the Strong Markov Property
6. The Minimal Process and Explosion
7. Some Examples
8. Asymptotic Behavior of Continuous-Time Markov Chains
9. Calculation of Transition Probabilities by Spectral Methods
10. Absorption Probabilities
11. Chapter Application: An Interacting System: The Simple Symmetric Voter Model
Exercises
Theoretical Complements

V Brownian Motion and Diffusions

1. Introduction and Definition
2. Kolmogorov's Backward and Forward Equations, Martingales
3. Transformation of the Generator under Relabeling of the State Space
4. Diffusions as Limits of Birth-Death Chains
5. Transition Probabilities from the Kolmogorov Equations: Examples
6. Diffusions with Reflecting Boundaries
7. Diffusions with Absorbing Boundaries
8. Calculation of Transition Probabilities by Spectral Methods
9. Transience and Recurrence of Diffusions
10. Null and Positive Recurrence of Diffusions
11. Stopping Times and the Strong Markov Property
12. Invariant Distributions and the Strong Law of Large Numbers
13. The Central Limit Theorem for Diffusions
14. Introduction to Multidimensional Brownian Motion and Diffusions
15. Multidimensional Diffusions under Absorbing Boundary Conditions and Criteria for Transience and Recurrence
16. Reflecting Boundary Conditions for Multidimensional Diffusions
17. Chapter Application: G. I. Taylor's Theory of Solute Transport in a Capillary
Exercises
Theoretical Complements

VI Dynamic Programming and Stochastic Optimization

1. Finite-Horizon Optimization
2. The Infinite-Horizon Problem
3. Optimal Control of Diffusions
4. Optimal Stopping and the Secretary Problem
5. Chapter Application: Optimality of (S, s) Policies in Inventory Problems
Exercises
Theoretical Complements


VII An Introduction to Stochastic Differential Equations

1. The Stochastic Integral
2. Construction of Diffusions as Solutions of Stochastic Differential Equations
3. Itô's Lemma
4. Chapter Application: Asymptotics of Singular Diffusions
Exercises
Theoretical Complements

0 A Probability and Measure Theory Overview

1. Probability Spaces
2. Random Variables and Integration
3. Limits and Integration
4. Product Measures and Independence, Radon-Nikodym Theorem and Conditional Probability
5. Convergence in Distribution in Finite Dimensions
6. Classical Laws of Large Numbers
7. Classical Central Limit Theorems
8. Fourier Series and the Fourier Transform

Author Index

Subject Index

Errata
Preface to the Classics Edition

The publication of Stochastic Processes with Applications (SPWA) in the SIAM
Classics in Applied Mathematics series is a matter of great pleasure for us, and we
are deeply appreciative of the efforts and good will that went into it. The book has
been out of print for nearly ten years. During this period we received a number of
requests from instructors for permission to make copies of the book to be used as
a text on stochastic processes for graduate students. We also received many kind
laudatory words, along with inquiries about the possibility of bringing out a second
edition, from mathematicians, statisticians, physicists, chemists, geoscientists, and
others from the U.S. and abroad. We hope that the inclusion of a detailed errata is a
helpful addition to the original.
SPWA was a work of love for its authors. As stated in the original preface,
the book was intended for use (1) as a graduate-level text for students in diverse
disciplines with a reasonable background in probability and analysis, and (2) as a
reference on stochastic processes for applied mathematicians, scientists, engineers,
economists, and others whose work involves the application of probability. It was our
desire to communicate our sense of excitement for the subject of stochastic processes
to a broad community of students and researchers. Although we have often emphasized
substance over form, the presentation is systematic and rigorous. A few proofs
are relegated to Theoretical Complements, and appropriate references for proofs are
provided for some additional advanced technical material. The book covers a substantial
part of what we considered to be the core of the subject, especially from the
point of view of applications. Nearly two decades have passed since the publication
of SPWA, but the importance of the subject has only grown. We are very happy to
see that the book's rather unique style of exposition has a place in the broader applied
mathematics literature.
We would like to take this opportunity to express our gratitude to all those colleagues
who over the years have provided us with encouragement and generous
words on this book. Special thanks are due to SIAM editors Bill Faris and Sara
Murphy for shepherding SPWA back to print.


Preface

This is a text on stochastic processes for graduate students in science and
engineering, including mathematics and statistics. It has become somewhat
commonplace to find growing numbers of students from outside of mathematics
enrolled along with mathematics students in our graduate courses on stochastic
processes. In this book we seek to address such a mixed audience. For this
purpose, in the main body of the text the theory is developed at a relatively
simple technical level with some emphasis on computation and examples.
Sometimes to make a mathematical argument complete, certain of the more
technical explanations are relegated to the end of the chapter under the label
theoretical complements. This approach also allows some flexibility in
instruction. A few sample course outlines have been provided to illustrate the
possibilities for designing various types of courses based on this book. The
theoretical complements also contain some supplementary results and references
to the literature.
Measure theory is used sparingly and with explanation. The instructor may
exercise control over its emphasis and use depending on the background of the
majority of the students in the class. Chapter 0 at the end of the book may be
used as a short course in measure theoretical probability for self study. In any
case we suggest that students unfamiliar with measure theory read over the first
few sections of the chapter early on in the course and look up standard results
there from time to time, as they are referred to in the text.
Chapter applications, appearing at the end of the chapters, are largely drawn
from physics, computer science, economics, and engineering. There are many
additional examples and applications illustrating the theory; they appear in the
text and among the exercises.
Some of the more advanced or difficult exercises are marked by asterisks.
Many appear with hints. Some exercises are provided to complete an argument
or statement in the text. Occasionally certain well-known results are only a few
steps away from the theory developed in the text. Such results are often cited
in the exercises, along with an outline of steps, which can be used to complete
their derivation.
Rules of cross-reference in the book are as follows. Theorem m.n, Proposition
m.n, or Corollary m.n, refers to the nth such assertion in section m of the same
chapter. Exercise n, or Example n, refers to the nth Exercise, or nth Example,
of the same section. Exercise m.n (Example m.n) refers to Exercise n (Example
n) of a different section m within the same chapter. When referring to a result
or an example in a different chapter, the chapter number is always mentioned
along with the label m.n to locate it within that chapter.
This book took a long time to write. We gratefully acknowledge research
support from the National Science Foundation and the Army Research Office
during this period. Special thanks are due to Wiley editors Beatrice Shube and
Kate Roach for their encouragement and assistance in seeing this effort through.


Bloomington, Indiana
Corvallis, Oregon
February 1990
Sample Course Outlines
Beginning with the Simple Random Walk, this course leads through Brownian
Motion and Diffusion. It also contains an introduction to discrete/continuous-parameter
Markov Chains and Martingales. More emphasis is placed on concepts,
principles, computations, and examples than on complete proofs and technical
details.

Chapter I: 1-7 (+ Informal Review of Chapter 0, Section 4); 13 (Up to Proposition 13.5)
Chapter II: 1-4; 5 (By examples); 11 (Example 2)
Chapter III: 1-3; 5
Chapter IV: 1-7 (Quick survey by examples)
Chapter V: 1; 2 (Give transience/recurrence from Proposition 2.5); 3 (Informal justification of equation (3.4) only); 11 (Omit proof of Theorem 11.1)
Chapter VI: 4

The principal topics are the Functional Central Limit Theorem, Martingales,
Diffusions, and Stochastic Differential Equations. To complete proofs and for
supplementary material, the theoretical complements are an essential part of this
course.

Chapter I: 1-4 (Quick survey); 6-10; 13
Chapter V: 1-3; 6-7; 11; 13-17
Chapter VI: 4
Chapter VII: 1-4

This is a course on Markov Chains that also contains an introduction to
Martingales. Theoretical complements may be used only sparingly.

Chapter I: 1-6; 13
Chapter II: 1-9; 11; 12 or 15
Chapter III: 1; 5
Chapter IV: 1-11
Chapter VI: 1-2; 4-5

Random Walk and Brownian Motion

1. What is a Stochastic Process?

Denoting by $X_n$ the value of a stock at the $n$th unit of time, one may represent
its (erratic) evolution by a family of random variables $\{X_0, X_1, \ldots\}$ indexed by
the discrete-time parameter $n \in \mathbb{Z}_+$. The number $X_t$ of car accidents in a city
during the time interval $[0, t]$ gives rise to a collection of random variables
$\{X_t : t \ge 0\}$ indexed by the continuous-time parameter $t$. The velocity $X_u$ at a
point $u$ in a turbulent wind field provides a family of random variables
$\{X_u : u \in \mathbb{R}^3\}$ indexed by a multidimensional spatial parameter $u$. More generally,
we make the following definition.

Definition 1.1. Given an index set $I$, a stochastic process indexed by $I$ is a
collection of random variables $\{X_\lambda : \lambda \in I\}$ on a probability space $(\Omega, \mathcal{F}, P)$
taking values in a set $S$. The set $S$ is called the state space of the process.

In the above, one may take, respectively: (i) $I = \mathbb{Z}_+$, $S = \mathbb{R}_+$; (ii) $I = [0, \infty)$,
$S = \mathbb{Z}_+$; (iii) $I = \mathbb{R}^3$, $S = \mathbb{R}^3$. For the most part we shall study stochastic
processes indexed by a one-dimensional set of real numbers (e.g., time). Here
the natural ordering of numbers coincides with the sense of evolution of the
process. This order is lost for stochastic processes indexed by a multidimensional
parameter; such processes are usually referred to as random fields. The state
space $S$ will often be a set of real numbers, finite, countable (i.e., discrete), or
uncountable. However, we also allow for the possibility of vector-valued
variables. As a matter of convenience in notation the index set is often suppressed
when the context makes it clear. In particular, we often write $\{X_n\}$ in place of
$\{X_n : n = 0, 1, 2, \ldots\}$ and $\{X_t\}$ in place of $\{X_t : t \ge 0\}$.
For a stochastic process the values of the random variables corresponding
to the occurrence of a sample point $\omega \in \Omega$ constitute a sample realization
of the process. For example, a sample realization of the coin-tossing
process corresponding to the occurrence of $\omega \in \Omega$ is of the form
$(X_0(\omega), X_1(\omega), \ldots, X_n(\omega), \ldots)$. In this case $X_n(\omega) = 1$ or $0$ depending on
whether the outcome of the $n$th toss is a head or a tail. In the general case of
a discrete-time stochastic process with state space $S$ and index set
$I = \mathbb{Z}_+ = \{0, 1, 2, \ldots\}$, the sample realizations of the process are of the form
$(X_0(\omega), X_1(\omega), \ldots, X_n(\omega), \ldots)$, $X_n(\omega) \in S$. In the case of a continuous-parameter
stochastic process with state space $S$ and index set $I = \mathbb{R}_+ = [0, \infty)$, the sample
realizations are functions $t \to X_t(\omega) \in S$, $\omega \in \Omega$. Sample realizations of a
stochastic process are also referred to as sample paths (see Figures 1.1a, b).
In the so-called canonical choice for $\Omega$ the sample points of $\Omega$ represent
sample paths. In this way $\Omega$ is some set of functions $\omega$ defined on $I$ taking
values in $S$, and the value $X_t(\omega)$ of the process at time $t$ corresponding to the
outcome $\omega \in \Omega$ is simply the coordinate projection $X_t(\omega) = \omega_t$. Canonical
representations of sample points as sample paths will be used often in the text.
Stochastic models are often specified by prescribing the probabilities of events
that depend only on the values of the process at finitely many time points. Such
events are called finite-dimensional events. In such instances the probability
measure $P$ is only specified on a subclass of the events contained in a sigma-field
$\mathcal{F}$. Probabilities of more complex events, for example events that depend on
the process at infinitely many time points (infinite-dimensional events), are
frequently calculated in terms of the probabilities of finite-dimensional events
by passage to a limit.
The ideas contained in this section will be illustrated in the example and
in exercises.

Figure 1.1

Example 1. The sample space $\Omega$ for repeated (and unending) tosses of a coin
may be represented by the sequence space consisting of sequences of the form
$\omega = (\omega_1, \omega_2, \ldots, \omega_n, \ldots)$ with $\omega_n = 1$ or $\omega_n = 0$. For this choice of $\Omega$, the value
of $X_n$ corresponding to the occurrence of the sample point $\omega \in \Omega$ is simply the
$n$th coordinate projection of $\omega$; i.e., $X_n(\omega) = \omega_n$. Suppose that the probability of
the occurrence of a head in a single toss is $p$. Since for any number $n$ of tosses
the results of the first $n - 1$ tosses have no effect on the odds of the $n$th toss,
the random variables $X_1, \ldots, X_n$ are, for each $n \ge 1$, independent. Moreover,
each variable has the same (Bernoulli) distribution. These facts are summarized
by saying that $\{X_1, X_2, \ldots\}$ is a sequence of independent and identically
distributed (i.i.d.) random variables with a common Bernoulli distribution. Let
$F_n$ denote the event that the specific outcomes $\varepsilon_1, \ldots, \varepsilon_n$ occur on the first $n$
tosses respectively. Then

$$F_n = \{X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n\} = \{\omega \in \Omega : \omega_1 = \varepsilon_1, \ldots, \omega_n = \varepsilon_n\}$$

is a finite-dimensional event. By independence,

$$P(F_n) = p^{r_n} (1 - p)^{n - r_n} \tag{1.1}$$

where $r_n$ is the number of 1's among $\varepsilon_1, \ldots, \varepsilon_n$. Now consider the singleton
event $G$ corresponding to the occurrence of a specific sequence of outcomes
$\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n, \ldots$. Then

$$G = \{X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n, \ldots\} = \{(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n, \ldots)\}$$

consists of the single outcome $\omega = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n, \ldots)$ in $\Omega$. $G$ is an
infinite-dimensional event whose probability is easily determined as follows.
Since $G \subset F_n$ for each $n \ge 1$, it follows that

$$0 \le P(G) \le P(F_n) = p^{r_n} (1 - p)^{n - r_n} \quad \text{for each } n = 1, 2, \ldots. \tag{1.2}$$

Now apply a limiting argument to see that, for $0 < p < 1$, $P(G) = 0$. Hence the
probability of every singleton event in $\Omega$ is zero.
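
The limiting argument can be spelled out in one line. The following is a sketch, not the authors' own wording; it uses only (1.2) and the observation that each of the $n$ factors in $P(F_n)$ is at most $\max\{p, 1 - p\}$:

```latex
% r_n counts the 1's among the first n prescribed outcomes, as in (1.1).
\[
0 \le P(G) \le P(F_n) = p^{r_n}(1-p)^{n-r_n}
  \le \bigl(\max\{p,\,1-p\}\bigr)^{n} \longrightarrow 0
  \qquad (n \to \infty),
\]
```

since $\max\{p, 1 - p\} < 1$ whenever $0 < p < 1$.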


2. The Simple Random Walk

Think of a particle moving randomly among the integers according to the
following rules. At time $n = 0$ the particle is at the origin. At time $n = 1$ it
moves either one unit forward to $+1$ or one unit backward to $-1$, with
respective probabilities $p$ and $q = 1 - p$. In the case $p = \tfrac12$, this may be
accomplished by tossing a balanced coin and making the particle move forward
or backward corresponding to the occurrence of a "head" or a "tail",
respectively. Similar experiments can be devised for any fractional value of $p$.
We may think of the experiment, in any case, as that of repeatedly tossing a
coin that falls "head" with probability $p$ and shows "tail" with probability
$1 - p$. At time $n$ the particle moves from its present position $S_{n-1}$ by a unit
distance forward or backward depending on the outcome of the $n$th toss.
Suppose that $X_n$ denotes the displacement of the particle at the $n$th step from
its position $S_{n-1}$ at time $n - 1$. According to these considerations the
displacement (or increment) process $\{X_n\}$ associated with $\{S_n\}$ is an i.i.d. sequence
with $P(X_n = +1) = p$, $P(X_n = -1) = q = 1 - p$ for each $n \ge 1$. The position
process $\{S_n\}$ is then given by

$$S_n := X_1 + \cdots + X_n, \qquad S_0 = 0. \tag{2.1}$$

Definition 2.1. The stochastic process $\{S_n : n = 0, 1, 2, \ldots\}$ is called the simple
random walk. The related process $S_n^x = S_n + x$, $n = 0, 1, 2, \ldots$, is called the simple
random walk starting at $x$.
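
To make the definition concrete, here is a minimal Python sketch that generates one sample path of the simple random walk starting at $x$ by summing i.i.d. $\pm 1$ increments as in (2.1). The function name `simple_random_walk` and its parameters are illustrative choices, not notation from the text.

```python
import random


def simple_random_walk(n_steps, p=0.5, x=0, seed=None):
    """Return [S_0, S_1, ..., S_n] for a simple random walk starting at x.

    Each increment is +1 with probability p and -1 with probability 1 - p,
    independently of the others.
    """
    rng = random.Random(seed)  # private generator so the path is reproducible
    path = [x]
    for _ in range(n_steps):
        step = 1 if rng.random() < p else -1
        path.append(path[-1] + step)
    return path


if __name__ == "__main__":
    # One sample path of length 20 for a walk with upward drift.
    print(simple_random_walk(20, p=0.6, x=0, seed=42))
```

Fixing `seed` reproduces the same sample path, which is convenient when comparing simulated frequencies with the exact formulas that follow.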

The simple random walk is often used by physicists as an approximate model

of the fluctuations in the position of a relatively large solute molecule immersed
in a pure fluid. According to Einstein's diffusion theory, the solute molecule
gets kicked around by the smaller molecules of the fluid whenever it gets within
the range of molecular interaction with fluid molecules. Displacements in any
one direction (say, the vertical direction) due to successive collisions are small
and taken to be independent. We shall return to this physical model in Section 7.
One may also think of $X_n$ as a gambler's gain in the $n$th game of a series of
independent and stochastically identical games: a negative gain means a loss.
Then $S_0^x = x$ is the gambler's initial capital, and $S_n^x$ is the capital, positive or
negative, at time $n$.
The first problem is to calculate the distribution of $S_n^x$. To calculate the
probability of $\{S_n^x = y\}$, count the number $u$ of $+1$'s in a path from $x$ to $y$ in
$n$ steps. Since $n - u$ is then the number of $-1$'s, one must have
$u - (n - u) = y - x$, or $u = (n + y - x)/2$. For this, $n$ and $y - x$ must be both
even or both odd, and $|y - x| \le n$. Hence

$$
P(S_n^x = y) =
\begin{cases}
\dbinom{n}{\frac{n + y - x}{2}}\, p^{(n + y - x)/2}\, q^{(n - y + x)/2} & \text{if } |y - x| \le n \text{ and } y - x,\ n \text{ have the same parity}, \\
0 & \text{otherwise.}
\end{cases}
\tag{2.2}
$$
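
Formula (2.2) translates directly into code. The following Python sketch (the name `walk_pmf` is an assumption made for illustration) evaluates the probability and can be used to confirm that the probabilities over all reachable states sum to 1.

```python
from math import comb


def walk_pmf(n, x, y, p):
    """P(S_n^x = y) for the simple random walk, following formula (2.2)."""
    q = 1 - p
    d = y - x
    # y is reachable in n steps only if |y - x| <= n and n, y - x share parity.
    if abs(d) > n or (n + d) % 2 != 0:
        return 0.0
    u = (n + d) // 2          # number of +1 steps; n - u is the number of -1 steps
    return comb(n, u) * p**u * q**(n - u)


if __name__ == "__main__":
    n, x, p = 6, 2, 0.3
    total = sum(walk_pmf(n, x, y, p) for y in range(x - n, x + n + 1))
    print(total)  # should be 1 up to floating-point rounding
```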



3. Transience and Recurrence Properties of the Simple Random Walk

Let us first consider the manner in which a particle escapes from an interval.
Let $T_y^x$ denote the first time that the process starting at $x$ reaches $y$, i.e.

$$T_y^x := \min\{n \ge 0 : S_n^x = y\}. \tag{3.1}$$

To avoid trivialities, assume $0 < p < 1$. For integers $c$ and $d$ with $c < d$, denote

$$\varphi(x) := P(T_d^x < T_c^x). \tag{3.2}$$

In other words, $\varphi(x)$ is the probability that the particle starting at $x$ reaches $d$
before it reaches $c$. Since in one step the particle moves to $x + 1$ with probability
$p$, or to $x - 1$ with probability $q$, one has

$$\varphi(x) = p\,\varphi(x + 1) + q\,\varphi(x - 1) \tag{3.3}$$

so that

$$
\varphi(x + 1) - \varphi(x) = \frac{q}{p}\,[\varphi(x) - \varphi(x - 1)], \quad c + 1 \le x \le d - 1,
\qquad \varphi(c) = 0, \quad \varphi(d) = 1.
\tag{3.4}
$$

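Thus, $\varphi(x)$ is the solution to the discrete boundary-value problem (3.4). For
$p \ne q$, Eq. 3.4 yields

$$
\begin{aligned}
\varphi(x) &= \sum_{y=c}^{x-1} [\varphi(y + 1) - \varphi(y)] = \sum_{y=c}^{x-1} \left(\frac{q}{p}\right)^{y-c} [\varphi(c + 1) - \varphi(c)] \\
&= \varphi(c + 1) \sum_{y=c}^{x-1} \left(\frac{q}{p}\right)^{y-c} = \varphi(c + 1)\, \frac{1 - (q/p)^{x-c}}{1 - q/p}.
\end{aligned}
\tag{3.5}
$$

To determine $\varphi(c + 1)$ take $x = d$ in Eq. 3.5 to get

$$1 = \varphi(d) = \varphi(c + 1)\, \frac{1 - (q/p)^{d-c}}{1 - q/p}.$$

Then

$$\varphi(c + 1) = \frac{1 - q/p}{1 - (q/p)^{d-c}},$$

so that

$$P(T_d^x < T_c^x) = \frac{1 - (q/p)^{x-c}}{1 - (q/p)^{d-c}} \quad \text{for } c \le x \le d,\ p \ne q. \tag{3.6}$$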

Now let

$$\psi(x) := P(T_c^x < T_d^x). \tag{3.7}$$

By symmetry (or the same method as above),

$$P(T_c^x < T_d^x) = \frac{1 - (p/q)^{d-x}}{1 - (p/q)^{d-c}} \quad \text{for } c \le x \le d,\ p \ne q. \tag{3.8}$$
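
As a numerical cross-check of (3.4) through (3.8), the sketch below builds $\varphi$ from the increment relation in (3.4), normalises it so that $\varphi(d) = 1$, and compares the result with the closed forms. The function names are hypothetical, and the closed-form helpers assume $p \ne q$.

```python
def phi_closed_form(x, c, d, p):
    """P(T_d^x < T_c^x) from (3.6); requires p != 1/2."""
    r = (1 - p) / p
    return (1 - r ** (x - c)) / (1 - r ** (d - c))


def psi_closed_form(x, c, d, p):
    """P(T_c^x < T_d^x) from (3.8); requires p != 1/2."""
    s = p / (1 - p)
    return (1 - s ** (d - x)) / (1 - s ** (d - c))


def phi_by_recursion(c, d, p):
    """Solve the boundary-value problem (3.4) on the integers c..d.

    By (3.4) the increment phi(y+1) - phi(y) is proportional to (q/p)**(y-c);
    the constant of proportionality is fixed by phi(c) = 0 and phi(d) = 1.
    """
    r = (1 - p) / p
    increments = [r ** k for k in range(d - c)]
    total = sum(increments)
    phi = {c: 0.0}
    running = 0.0
    for k, inc in enumerate(increments):
        running += inc
        phi[c + k + 1] = running / total
    return phi


if __name__ == "__main__":
    c, d, p = -3, 5, 0.6
    phi = phi_by_recursion(c, d, p)
    for x in range(c, d + 1):
        # The last column should be 1, reflecting phi(x) + psi(x) = 1.
        print(x, phi[x], phi_closed_form(x, c, d, p),
              phi[x] + psi_closed_form(x, c, d, p))
```

The recursive construction also works for $p = q = \tfrac12$, where all increments are equal and $\varphi$ is linear in $x$.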

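Note that $\varphi(x) + \psi(x) = 1$, proving that the particle starting in the interior of
$[c, d]$ will eventually reach the boundary (i.e., either $c$ or $d$) with probability
1. Now if $c < x$, then (Exercise 3)

$$
\begin{aligned}
P(\{S_n^x\} \text{ will ever reach } c) &= P(T_c^x < \infty) = \lim_{d \to \infty} \psi(x) \\
&= \begin{cases}
\displaystyle \lim_{d \to \infty} \frac{(q/p)^{x-c} - (q/p)^{d-c}}{1 - (q/p)^{d-c}}, & \text{if } p > \tfrac12, \\
1, & \text{if } p < \tfrac12,
\end{cases} \\
&= \begin{cases}
\left(\dfrac{q}{p}\right)^{x-c}, & \text{if } p > \tfrac12, \\
1, & \text{if } p < \tfrac12.
\end{cases}
\end{aligned}
\tag{3.9}
$$

By symmetry, or as above,

$$
P(\{S_n^x\} \text{ will ever reach } d) = P(T_d^x < \infty) =
\begin{cases}
1, & \text{if } p > \tfrac12, \\
\left(\dfrac{p}{q}\right)^{d-x}, & \text{if } p < \tfrac12.
\end{cases}
\tag{3.10}
$$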

Observe that one gets from these calculations the (geometric) distribution
function for the extremes $M^x = \sup_n S_n^x$ and $m^x = \inf_n S_n^x$ (Exercise 7).
Note that, by the strong law of large numbers (Chapter 0),

$$P\left(\frac{S_n^x}{n} = \frac{x + S_n}{n} \to p - q \text{ as } n \to \infty\right) = 1. \tag{3.11}$$

Hence, if $p > q$, then the random walk drifts to $+\infty$ (i.e., $S_n^x \to +\infty$) with
probability 1. In particular, the process is certain to reach $d > x$ if $p > q$.
Similarly, if $p < q$, then the random walk drifts to $-\infty$ (i.e., $S_n^x \to -\infty$), and
starting at $x > c$ the process is certain to reach $c$ if $p < q$. In either case, no
matter what the integer $y$ is,

$$P(S_n^x = y \text{ i.o.}) = 0, \quad \text{if } p \ne q, \tag{3.12}$$

where i.o. is shorthand for "infinitely often." For if $S_n^x = y$ for integers
$n_1 < n_2 < \cdots$ through a sequence going to infinity, then

$$\frac{S_{n_k}^x}{n_k} = \frac{y}{n_k} \to 0 \quad \text{as } n_k \to \infty,$$

the probability of which is zero by Eq. 3.11.

Definition 3.1. A state y for which Eq. 3.12 holds is called transient. If all
states are transient then the stochastic process is said to be a transient process.

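In the case $p = q = \tfrac12$, according to the boundary-value problem (3.4), the
graph of $\varphi(x)$ is along the line of constant slope between the points $(c, 0)$ and
$(d, 1)$. Thus,

$$P(T_d^x < T_c^x) = \frac{x - c}{d - c}, \quad c \le x \le d,\ p = q = \tfrac12. \tag{3.13}$$

Similarly,

$$P(T_c^x < T_d^x) = \frac{d - x}{d - c}, \quad c \le x \le d,\ p = q = \tfrac12. \tag{3.14}$$

Again we have

$$\varphi(x) + \psi(x) = 1. \tag{3.15}$$

Moreover, in this case, given any initial position $x > c$,

$$
\begin{aligned}
P(\{S_n^x\} \text{ will eventually reach } c) &= P(T_c^x < \infty) \\
&= \lim_{d \to \infty} P(\{S_n^x\} \text{ will reach } c \text{ before it reaches } d) \\
&= \lim_{d \to \infty} \frac{d - x}{d - c} = 1.
\end{aligned}
\tag{3.16}
$$

Similarly, whatever the initial position $x < d$,

$$P(\{S_n^x\} \text{ will eventually reach } d) = P(T_d^x < \infty) = \lim_{c \to -\infty} \frac{x - c}{d - c} = 1. \tag{3.17}$$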

Thus, no matter where the particle may be initially, it will eventually reach any
given state $y$ with probability 1. After having reached $y$ for the first time, it will
move to $y + 1$ or to $y - 1$. From either of these positions the particle is again
bound to reach $y$ with probability 1, and so on. In other words (Exercise 4),

$$P(S_n^x = y \text{ i.o.}) = 1, \quad \text{if } p = q = \tfrac12. \tag{3.18}$$

This argument is discussed again in Example 4.1 of Chapter II.

Definition 3.2. A state y for which Eq. 3.18 holds is called recurrent. If all
states are recurrent, then the stochastic process is called a recurrent process.

Let $\eta_x$ denote the time of the first return to $x$,

$$\eta_x := \inf\{n \ge 1 : S_n^x = x\}. \tag{3.19}$$

Then, conditioning on the first step, it will follow (Exercise 6) that

$$P(\eta_x < \infty) = 2 \min(p, q). \tag{3.20}$$
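
One way to see (3.20), offered here as a sketch of the conditioning step rather than the solution intended for Exercise 6, is to combine the first-step decomposition with the hitting probabilities (3.9) and (3.10). It assumes that, after the first step, the walk behaves like a fresh simple random walk started from its new position.

```latex
% Case p > q. The case p < q is symmetric and gives 2p;
% for p = q = 1/2, (3.16) and (3.17) give 1/2 + 1/2 = 1.
\begin{align*}
P(\eta_x < \infty)
  &= p\,P(T_x^{x+1} < \infty) + q\,P(T_x^{x-1} < \infty) \\
  &= p \cdot \frac{q}{p} + q \cdot 1
     \qquad \text{by (3.9) with } c = x \text{ and (3.10) with } d = x \\
  &= 2q = 2\min(p, q).
\end{align*}
```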


4. First Passage Times for the Simple Random Walk

Consider the random variable $T_y := T_y^0$ representing the first time the simple
random walk starting at zero reaches the level (state) $y$. We will calculate the
distribution of $T_y$ by means of an analysis of the sample paths of the simple
random walk. Let $F_{N,y} = \{T_y = N\}$ denote the event that the particle reaches
state $y$ for the first time at the $N$th step. Then,

$$F_{N,y} = \{S_n \ne y \text{ for } n = 0, 1, \ldots, N - 1,\ S_N = y\}. \tag{4.1}$$

Note that "$S_N = y$" means that there are $(N + y)/2$ plus 1's and $(N - y)/2$
minus 1's among $X_1, X_2, \ldots, X_N$ (see Eq. 2.1). Therefore, we assume that
$|y| \le N$ and $N + y$ is even. Now there are as many paths leading from $(0, 0)$
to $(N, y)$ as there are ways of choosing $(N + y)/2$ plus 1's among $X_1, X_2, \ldots, X_N$,
namely $\binom{N}{(N+y)/2}$.
Each of these choices has the same probability of occurrence, specifically
$p^{(N+y)/2} q^{(N-y)/2}$. Thus,

$$P(F_{N,y}) = L\, p^{(N+y)/2} q^{(N-y)/2} \tag{4.2}$$

where $L$ is the number of paths from $(0, 0)$ to $(N, y)$ that do not touch or cross
the level $y$ prior to time $N$. To calculate $L$, consider the complementary number
$L'$ of paths that do reach $y$ prior to time $N$,

$$L' = \binom{N}{\frac{N+y}{2}} - L. \tag{4.3}$$

First consider the case of $y > 0$. If a path from $(0, 0)$ to $(N, y)$ has reached
$y$ prior to time $N$, then either (a) $S_{N-1} = y + 1$ (see Figure 4.1a) or
(b) $S_{N-1} = y - 1$ and the path from $(0, 0)$ to $(N - 1, y - 1)$ has reached $y$ prior
to time $N - 1$ (see Figure 4.1b). The contribution to $L'$ from (a) is $\binom{N-1}{(N+y)/2}$.
We need to calculate the contribution to $L'$ from (b).



Figure 4.1


Proposition 4.1. (A Reflection Principle). Let $y > 0$. The collection of all paths
from $(0, 0)$ to $(N - 1, y - 1)$ that touch or cross the level $y$ prior to time $N - 1$
is in one-to-one correspondence with the collection of all possible paths from
$(0, 0)$ to $(N - 1, y + 1)$.

Proof. Given a path $\gamma$ from $(0, 0)$ to $(N - 1, y + 1)$, there is a first time $\tau$ at
which the path reaches level $y$. Let $\gamma'$ denote the path which agrees with $\gamma$ up
to time $\tau$ but is thereafter the mirror reflection of $\gamma$ about the level $y$ (see Figure
4.2). Then $\gamma'$ is a path from $(0, 0)$ to $(N - 1, y - 1)$ that touches or crosses the
level $y$ prior to time $N - 1$. Conversely, a path from $(0, 0)$ to $(N - 1, y - 1)$
that touches or crosses the level $y$ prior to time $N - 1$ may be reflected to get
a path from $(0, 0)$ to $(N - 1, y + 1)$. This reflection transformation establishes
the one-to-one correspondence. ∎

It now follows from the reflection principle that the contribution to $L'$ from
(b) is also $\binom{N-1}{(N+y)/2}$. Hence

\[ L' = 2 \binom{N-1}{(N+y)/2}. \tag{4.4} \]

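Figure 4.2

Therefore, by (4.3), (4.2),

\[
\begin{aligned}
P(T_y = N) = P(F_{N,y})
&= \left[ \binom{N}{(N+y)/2} - 2 \binom{N-1}{(N+y)/2} \right] p^{(N+y)/2} q^{(N-y)/2} \\
&= \frac{|y|}{N} \binom{N}{(N+y)/2} p^{(N+y)/2} q^{(N-y)/2}
\quad \text{for } N \ge y,\ y + N \text{ even},\ y > 0.
\end{aligned}
\tag{4.5}
\]

To calculate $P(T_y = N)$ for $y < 0$, simply relabel H as T and T as H (i.e.,
interchange $+1$, $-1$). Using this new code, the desired probability is given by
replacing $y$ by $-y$ and interchanging $p$, $q$ in (4.5), i.e.,

\[ P(T_y = N) = \frac{|y|}{N} \binom{N}{(N-y)/2} q^{(N-y)/2} p^{(N+y)/2}. \]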

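Thus, for all integers $y \ne 0$, one has

\[
P(T_y = N) = \frac{|y|}{N} \binom{N}{(N+y)/2} p^{(N+y)/2} q^{(N-y)/2}
= \frac{|y|}{N} P(S_N = y) \tag{4.6}
\]

for $N = |y|, |y| + 2, |y| + 4, \ldots$. In particular, if $p = q = \tfrac12$, then (4.6) yields

\[
P(T_y = N) = \frac{|y|}{N} \binom{N}{(N+y)/2} \frac{1}{2^N}
\quad \text{for } N = |y|, |y| + 2, |y| + 4, \ldots. \tag{4.7}
\]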

However, observe that the expected time to reach $y$ is infinite since, by Stirling's
formula, $k! = (2\pi k)^{1/2} k^k e^{-k} (1 + o(1))$ as $k \to \infty$, the tail of the p.m.f. of $T_y$ is of
the order of $N^{-3/2}$ as $N \to \infty$ (Exercise 10).
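As a numerical check on (4.6), the following sketch evaluates the closed form and compares it with a brute-force enumeration of all step sequences. The function names `first_passage_pmf` and `first_passage_brute` are illustrative and not from the text; the enumeration is only feasible for small $N$.

```python
from itertools import product
from math import comb


def first_passage_pmf(y, N, p=0.5):
    """P(T_y = N) from (4.6): (|y|/N) C(N,(N+y)/2) p^((N+y)/2) q^((N-y)/2)."""
    if y == 0 or N < abs(y) or (N + y) % 2 != 0:
        return 0.0
    q = 1.0 - p
    ups = (N + y) // 2  # number of +1 steps needed to end at y
    return abs(y) / N * comb(N, ups) * p**ups * q ** (N - ups)


def first_passage_brute(y, N, p=0.5):
    """Same probability by enumerating all 2^N step sequences (small N only)."""
    q = 1.0 - p
    total = 0.0
    for steps in product((1, -1), repeat=N):
        pos, hit_time = 0, None
        for n, step in enumerate(steps, start=1):
            pos += step
            if pos == y:
                hit_time = n  # first visit to level y
                break
        if hit_time == N:
            ups = steps.count(1)
            total += p**ups * q ** (N - ups)
    return total
```

For instance, `first_passage_pmf(1, 3, p)` reduces to $p^2 q$, the probability of the single path $-1, +1, +1$.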


The $k$-dimensional unrestricted simple symmetric random walk describes the
motion of a particle moving randomly on the integer lattice $\mathbb{Z}^k$ according to
the following rules. Starting at a site $x = (x_1, \ldots, x_k)$ with integer coordinates,
the particle moves to a neighboring site in one of the $2k$ coordinate directions
randomly selected with probability $1/2k$, and so on, independently of previous
displacements. The displacement at the $n$th step is a random variable $X_n$ whose
possible values are vectors of the form $\pm e_i$, $i = 1, \ldots, k$, where the $j$th
component of $e_i$ is 1 for $j = i$ and 0 otherwise. $X_1, X_2, \ldots$ are i.i.d. with

\[ P(X_n = e_i) = P(X_n = -e_i) = 1/2k \quad \text{for } i = 1, \ldots, k. \tag{5.1} \]


The corresponding position process is defined by

\[ S_0^x = x, \qquad S_n^x = x + X_1 + \cdots + X_n, \quad n \ge 1. \tag{5.2} \]

The case $k = 1$ is that already treated in the preceding sections with $p = q = \tfrac12$.
In particular, for $k = 1$ we know that the simple symmetric random walk is recurrent.

Consider the coordinates of $X_n = (X_n^1, \ldots, X_n^k)$. Although $X_n^i$ and $X_n^j$ are not
independent, notice that they are uncorrelated for $i \ne j$. Likewise, it follows that
the coordinates of the position vector $S_n^x = (S_n^{x,1}, \ldots, S_n^{x,k})$ are uncorrelated.
In particular,

\[
ES_n^x = x, \qquad
\operatorname{Cov}(S_n^{x,i}, S_n^{x,j}) =
\begin{cases} n/k, & \text{if } i = j, \\ 0, & \text{if } i \ne j. \end{cases}
\tag{5.3}
\]

Therefore the covariance matrix of $S_n^x$ is $(n/k) I$, where $I$ is the $k \times k$ identity matrix.

The problem of describing the recurrence properties of the simple symmetric
random walk in $k$ dimensions is solved by the following theorem of Pólya.

Theorem 5.1. [Pólya]. $\{S_n^x\}$ is recurrent for $k = 1, 2$ and transient for $k \ge 3$.

Proof. The result has already been obtained for $k = 1$. In general, let $S_n = S_n^0$
and write

\[
r_n = P(S_n = 0), \qquad
f_n = P(S_n = 0 \text{ for the first time after time 0 at } n), \quad n \ge 1. \tag{5.4}
\]

Then we get the convolution equation

\[
r_n = \sum_{j=0}^{n} f_j r_{n-j} \quad \text{for } n = 1, 2, \ldots, \qquad r_0 = 1,\ f_0 = 0. \tag{5.5}
\]

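Let $\hat r(s)$ and $\hat f(s)$ denote the respective probability generating functions of $\{r_n\}$
and $\{f_n\}$ defined by

\[
\hat r(s) = \sum_{n=0}^{\infty} r_n s^n, \qquad \hat f(s) = \sum_{n=0}^{\infty} f_n s^n \qquad (0 < s < 1). \tag{5.6}
\]

The convolution equation (5.5) transforms as

\[
\hat r(s) = 1 + \sum_{n=1}^{\infty} \sum_{j=0}^{n} f_j r_{n-j} s^j s^{n-j}
= 1 + \sum_{j=0}^{\infty} \Big( \sum_{m=0}^{\infty} r_m s^m \Big) f_j s^j
= 1 + \hat r(s) \hat f(s). \tag{5.7}
\]

Therefore,

\[ \hat r(s) = \frac{1}{1 - \hat f(s)}. \tag{5.8} \]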

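The probability of eventual return to the origin is given by

\[ \gamma := \sum_{n=1}^{\infty} f_n = \hat f(1). \tag{5.9} \]

Note that by the Monotone Convergence Theorem (Chapter 0), $\hat r(s) \uparrow \hat r(1)$ and
$\hat f(s) \uparrow \hat f(1)$ as $s \uparrow 1$. If $\hat f(1) < 1$, then $\hat r(1) = (1 - \hat f(1))^{-1} < \infty$. If $\hat f(1) = 1$,
then $\hat r(1) = \lim_{s \uparrow 1} (1 - \hat f(s))^{-1} = \infty$. Therefore, $\gamma < 1$ (i.e., 0 is transient) if
and only if $\beta := \hat r(1) < \infty$.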
This criterion is applied to the case k = 2 as follows. Since a return to 0 is
possible at time 2n if and only if the numbers of steps among the 2n in the
positive horizontal and vertical directions equal the respective numbers of steps
in the negative directions,

\[
\begin{aligned}
r_{2n} &= \sum_{j=0}^{n} \frac{(2n)!}{j!\, j!\, (n-j)!\, (n-j)!} \frac{1}{4^{2n}}
= \frac{1}{4^{2n}} \binom{2n}{n} \sum_{j=0}^{n} \binom{n}{j}^2 \\
&= \frac{1}{4^{2n}} \binom{2n}{n}^2.
\end{aligned}
\tag{5.10}
\]

The combinatorial identity used to get the last line of (5.10) follows by
considering the number of ways of selecting samples of size $n$ from a population
of $n$ objects of type 1 and $n$ objects of type 2 (Exercise 2). Apply Stirling's
formula to (5.10) to get $r_{2n} = O(1/n) \ge c/n$ for some $c > 0$. Therefore,
$\beta = \hat r(1) = +\infty$ and so 0 is recurrent in the case $k = 2$.
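The identity in (5.10) and the $1/n$ decay can be checked with exact rational arithmetic. This is a small sketch; the function names are illustrative, and the comparison value $1/\pi$ for $n\, r_{2n}$ is what Stirling's formula gives for $\binom{2n}{n}^2 / 4^{2n}$.

```python
from fractions import Fraction
from math import comb, factorial, pi


def r2n_2d_sum(n):
    """First expression in (5.10): sum over j of (2n)!/(j! j! (n-j)! (n-j)!) / 4^(2n)."""
    total = sum(
        Fraction(factorial(2 * n), factorial(j) ** 2 * factorial(n - j) ** 2)
        for j in range(n + 1)
    )
    return total / 4 ** (2 * n)


def r2n_2d_closed(n):
    """Last expression in (5.10): C(2n, n)^2 / 4^(2n)."""
    return Fraction(comb(2 * n, n) ** 2, 4 ** (2 * n))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # n * r_{2n} should settle near 1/pi, so the series sum of r_{2n} diverges.
        print(n, n * float(r2n_2d_closed(n)), 1 / pi)
```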
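In the case $k = 3$, similar considerations of "coordinate balance" give

\[
\begin{aligned}
r_{2n} &= \frac{1}{6^{2n}} \sum_{(j,m):\, j+m \le n} \frac{(2n)!}{j!\, j!\, m!\, m!\, (n-j-m)!\, (n-j-m)!} \\
&= \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j+m \le n} \left\{ \frac{1}{3^n} \frac{n!}{j!\, m!\, (n-j-m)!} \right\}^2.
\end{aligned}
\tag{5.11}
\]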

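Therefore, writing

\[ p_{j,m} = \frac{n!}{j!\, m!\, (n-j-m)!} \frac{1}{3^n} \]

and noting that these are the probabilities for the trinomial distribution, we have that

\[ r_{2n} = \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,m} (p_{j,m})^2 \tag{5.12} \]

is nearly an average of $p_{j,m}$'s (with respect to the $p_{j,m}$ distribution). In any case,

\[
r_{2n} \le \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,m} \Big[ \max_{j,m} p_{j,m} \Big] p_{j,m}
= \frac{1}{2^{2n}} \binom{2n}{n} \max_{j,m} p_{j,m}. \tag{5.13}
\]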
j,m j,m

The maximum value of $p_{j,m}$ is attained at $j$ and $m$ nearest to $n/3$ (Exercise 5).

Therefore, writing $[x]$ for the integer part of $x$,

\[
r_{2n} \le \frac{1}{2^{2n}} \binom{2n}{n} \frac{1}{3^n} \frac{n!}{[n/3]!\, [n/3]!\, [n/3]!}. \tag{5.14}
\]

Apply Stirling's formula to get (see (5.19) below),

\[ r_{2n} \le \frac{C'}{n^{3/2}} \quad \text{for some } C' > 0. \tag{5.15} \]

In particular,

\[ \sum_n r_n < \infty. \tag{5.16} \]

The general case, $r_{2n} \le c_k n^{-k/2}$ for $k \ge 3$, is left as an exercise (Exercise 1).
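The three-dimensional return probabilities in (5.11) and (5.12) can also be evaluated exactly for moderate $n$. The sketch below (function names are illustrative) computes $r_{2n}$ from the trinomial probabilities $p_{j,m}$ and the upper bound (5.13), so that the $n^{-3/2}$ decay behind (5.15) can be observed numerically.

```python
from fractions import Fraction
from math import comb, factorial


def trinomial_probs(n):
    """All p_{j,m} = n! / (j! m! (n-j-m)!) / 3^n with j + m <= n."""
    return [
        Fraction(factorial(n),
                 factorial(j) * factorial(m) * factorial(n - j - m) * 3**n)
        for j in range(n + 1)
        for m in range(n - j + 1)
    ]


def r2n_3d(n):
    """r_{2n} for k = 3 via (5.12): 2^(-2n) C(2n, n) times the sum of p_{j,m}^2."""
    return Fraction(comb(2 * n, n), 4**n) * sum(p * p for p in trinomial_probs(n))


def r2n_3d_bound(n):
    """Right-hand side of (5.13): 2^(-2n) C(2n, n) times the largest p_{j,m}."""
    return Fraction(comb(2 * n, n), 4**n) * max(trinomial_probs(n))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        # n^(3/2) * r_{2n} stays bounded, consistent with (5.15).
        print(n, n**1.5 * float(r2n_3d(n)), n**1.5 * float(r2n_3d_bound(n)))
```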

The constants appearing in the estimate (5.15) are easily computed from the
monotonicity of the ratio $n!/\{(2\pi n)^{1/2} n^n e^{-n}\}$, whose limit as $n \to \infty$ is 1
according to Stirling's formula. To see that the ratio is monotonically decreasing,
simply observe that

\[
\begin{aligned}
\log\left\{ \frac{n!}{(2\pi n)^{1/2} n^n e^{-n}} \right\}
&= \log n! - \tfrac12 \log n - n \log n + n - \log (2\pi)^{1/2} \\
&= \Big\{ \sum_{j=1}^{n} \log j - \tfrac12 \log n \Big\} - \{ n \log n - n \} - \log (2\pi)^{1/2} \\
&= \Big\{ \sum_{j=2}^{n} \frac{\log(j-1) + \log(j)}{2} - \int_1^n \log x \, dx \Big\} + 1 - \log (2\pi)^{1/2},
\end{aligned}
\tag{5.17}
\]

where the integral term may be checked by integration by parts. The point is
that the term defined by

\[ T_n = \sum_{j=2}^{n} \frac{\log(j-1) + \log(j)}{2} \tag{5.18} \]

provides the inner trapezoidal approximation to the area under the curve
$y = \log x$, $1 \le x \le n$. Thus, in particular, a simple sketch shows that

\[ 0 \le \int_1^n \log x \, dx - T_n \]

is monotonically increasing. So, in addition to the asymptotic value of the ratio,
one also has

\[
1 < \frac{n!}{(2\pi n)^{1/2} n^n e^{-n}} \le \frac{e}{(2\pi)^{1/2}}, \qquad n = 1, 2, \ldots, \tag{5.19}
\]

with equality on the right at $n = 1$.
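The monotonicity of the Stirling ratio and the two-sided bound (5.19) are easy to inspect numerically. The sketch below works through logarithms (using `math.lgamma` for $\log n!$) to avoid overflow; the name `stirling_ratio` is illustrative.

```python
from math import e, exp, lgamma, log, pi, sqrt

# Value of the ratio at n = 1, i.e. the upper constant in (5.19).
UPPER = e / sqrt(2 * pi)


def stirling_ratio(n):
    """n! / ((2 pi n)^(1/2) n^n e^(-n)), computed through logarithms."""
    log_ratio = lgamma(n + 1) - (0.5 * log(2 * pi * n) + n * log(n) - n)
    return exp(log_ratio)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 100):
        # The printed values decrease from e / sqrt(2 pi) toward 1.
        print(n, stirling_ratio(n))
```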


Often a stochastic process is defined on a given probability space as a sequence

of functions of other already constructed random variables. For example, the
simple random walk $\{S_n = X_1 + \cdots + X_n\}$, $S_0 = 0$, is defined in terms of the
coin-tossing process $\{X_n\}$ in Section 2. At other times, a probability space is
constructed specifically to define the stochastic process. For example, the
probability space for the coin-tossing process was constructed starting from the
specifications of the probabilities of finite sequences of heads and tails. This
latter method, called the canonical construction, is elaborated upon in this
section.
Consider the case that the state space is $\mathbb{R}^1$ (or a subset of it) and the
parameter is discrete ($n = 1, 2, \ldots$). Take $\Omega$ to be the space of all sample paths;
i.e., $\Omega := (\mathbb{R}^1)^\infty := \mathbb{R}^\infty$ is the space of all sequences $\omega = (\omega_1, \omega_2, \ldots)$ of real
numbers. The appropriate sigmafield $\mathcal{F} := \mathcal{B}^\infty$ is then the smallest sigmafield
containing all finite-dimensional sets of the form $\{\omega \in \Omega : \omega_1 \in B_1, \ldots, \omega_k \in B_k\}$,
where $B_1, \ldots, B_k$ are Borel subsets of $\mathbb{R}^1$. The coordinate functions $X_n$ are
defined by $X_n(\omega) = \omega_n$.
As in the case of coin tossing, the underlying physical process sometimes
suggests a specification of probabilities of finite-dimensional events defined by
the values of the process at time points $1, 2, \ldots, n$ for each $n \ge 1$. That is, for
each $n \ge 1$ a probability measure $P_n$ is prescribed on $(\mathbb{R}^n, \mathcal{B}^n)$. The problem is
that we require a probability measure $P$ on $(\Omega, \mathcal{F})$ such that $P_n$ is the distribution
of $X_1, \ldots, X_n$. That is, for all Borel sets $B_1, \ldots, B_n$,

\[ P(\{\omega \in \Omega : \omega_1 \in B_1, \ldots, \omega_n \in B_n\}) = P_n(B_1 \times \cdots \times B_n). \tag{6.1} \]

Equivalently,

\[ P(X_1 \in B_1, \ldots, X_n \in B_n) = P_n(B_1 \times \cdots \times B_n). \tag{6.2} \]

Since the events $\{X_1 \in B_1, \ldots, X_n \in B_n, X_{n+1} \in \mathbb{R}^1\}$ and $\{X_1 \in B_1, \ldots, X_n \in B_n\}$
are identical subsets of $\Omega$, for there to be a well-defined probability measure
$P$ prescribed by (6.1) or (6.2) it is necessary that

\[ P_{n+1}(B_1 \times \cdots \times B_n \times \mathbb{R}^1) = P_n(B_1 \times \cdots \times B_n) \tag{6.3} \]

for all Borel sets $B_1, \ldots, B_n$ in $\mathbb{R}^1$ and $n \ge 1$. Kolmogorov's Existence Theorem
asserts that the consistency condition (6.3) is also sufficient for such a probability
measure $P$ to exist and that there is only one such $P$ on $(\mathbb{R}^\infty, \mathcal{B}^\infty) = (\Omega, \mathcal{F})$
(theoretical complement 1). This holds more generally, for example, when the
state space $S$ is $\mathbb{R}^k$, a countable set, or any Borel subset of $\mathbb{R}^k$. A proof for the
simple case of finite state processes is outlined in Exercise 3.

Example 1. Consider the problem of canonically constructing a sequence
$X_1, X_2, \ldots$ of i.i.d. random variables having the common (marginal) distribution
$Q$ on $(\mathbb{R}^1, \mathcal{B}^1)$. Take $\Omega = \mathbb{R}^\infty$, $\mathcal{F} = \mathcal{B}^\infty$, and $X_n$ the $n$th coordinate projection
$X_n(\omega) = \omega_n$, $\omega \in \Omega$. Define, for each $n \ge 1$ and all Borel sets $B_1, \ldots, B_n$,

\[ P_n(B_1 \times \cdots \times B_n) = Q(B_1) \cdots Q(B_n). \tag{6.4} \]

Since $Q(\mathbb{R}^1) = 1$, the consistency condition (6.3) follows immediately from the
definition (6.4). Now one simply invokes the Kolmogorov Existence Theorem
to get a probability measure $P$ on $(\Omega, \mathcal{F})$ such that

\[
P(X_1 \in B_1, \ldots, X_n \in B_n) = Q(B_1) \cdots Q(B_n)
= P(X_1 \in B_1) \cdots P(X_n \in B_n). \tag{6.5}
\]

The simple random walk can be constructed within the framework of the
canonical probability space $(\Omega, \mathcal{F}, P)$ constructed for coin tossing, although
this is a noncanonical probability space for $\{S_n\}$. Alternatively, a canonical
construction can be made directly for $\{S_n\}$ (Exercise 2(i)). This, on the other
hand, provides a noncanonical probability space for the displacement
(coin-tossing) process defined by the differences $X_n = S_n - S_{n-1}$, $n \ge 1$.

Example 2. The problem is to construct a Gaussian stochastic process having
prescribed means and covariances. Suppose that we are given a sequence of
real numbers $\mu_1, \mu_2, \ldots$, and an array, $\sigma_{ij}$, $i, j = 1, 2, \ldots$, of real numbers
satisfying

(Symmetry)

\[ \sigma_{ij} = \sigma_{ji} \quad \text{for all } i, j, \tag{6.6} \]

(Non-negative Definiteness)

\[ \sum_{i,j=1}^{n} \sigma_{ij} x_i x_j \ge 0 \quad \text{for all } n\text{-tuples } (x_1, \ldots, x_n) \text{ in } \mathbb{R}^n. \tag{6.7} \]

Property (6.7) is the condition that $D_n = ((\sigma_{ij}))_{1 \le i,j \le n}$ be a nonnegative definite
matrix for each $n$. Again take $\Omega = \mathbb{R}^\infty$, $\mathcal{F} = \mathcal{B}^\infty$, and $X_1, X_2, \ldots$ the respective
coordinate projections. For each $n \ge 1$, let $P_n$ be the $n$-dimensional Gaussian
distribution on $(\mathbb{R}^n, \mathcal{B}^n)$ having mean vector $(\mu_1, \ldots, \mu_n)$ and covariance matrix
$D_n$. Since a linear transformation of a Gaussian random vector is also Gaussian,
the consistency condition (6.3) can be checked by applying the coordinate
projection mapping $(x_1, \ldots, x_{n+1}) \to (x_1, \ldots, x_n)$ from $\mathbb{R}^{n+1}$ to $\mathbb{R}^n$ (Exercise 1).

Example 3. Let $S$ be a countable set and let $p = ((p_{i,j}))$ be a matrix of
nonnegative real numbers such that for each fixed $i$, $p_{i,j}$ is a probability
distribution (sums to 1 over $j$ in $S$). Let $\pi = (\pi_i)$ be a probability distribution
on $S$. By the Kolmogorov Existence Theorem there is a probability distribution
$P_\pi$ on the infinite sequence space $\Omega = S \times S \times \cdots \times S \times \cdots$ such that
$P_\pi(X_0 = j_0, \ldots, X_n = j_n) = \pi_{j_0} p_{j_0, j_1} \cdots p_{j_{n-1}, j_n}$, where $X_n$ denotes the $n$th
projection map (Exercise 2(ii)). In this case the process $\{X_n\}$, having distribution
$P_\pi$, is called a Markov chain. These processes are the subject of Chapter II.
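The finite-dimensional probabilities in Example 3, and the consistency condition (6.3) they must satisfy, can be illustrated on a small chain. The two-state chain below is a made-up example (the states `"a"`, `"b"` and all numerical values are assumptions for illustration only), and `path_probability` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical initial distribution pi and transition matrix p on S = {"a", "b"}.
PI0 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
P = {
    "a": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    "b": {"a": Fraction(1, 3), "b": Fraction(2, 3)},
}


def path_probability(pi0, p, path):
    """pi_{j0} * p_{j0,j1} * ... * p_{j(n-1),jn} for path = (j0, ..., jn)."""
    prob = pi0[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= p[i][j]
    return prob


def is_consistent(pi0, p, path):
    """Check (6.3): summing over the next state recovers the shorter path's probability."""
    extended = sum(path_probability(pi0, p, path + (j,)) for j in pi0)
    return extended == path_probability(pi0, p, path)
```

Consistency holds here because each row of the transition matrix sums to 1, which is exactly what is needed to invoke the Kolmogorov Existence Theorem.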


Perhaps the simplest way to introduce the continuous-parameter stochastic

process known as Brownian motion is to view it as the limiting form of an
unrestricted random walk. To physically motivate the discussion, suppose a
solute particle immersed in a liquid suffers, on the average, f collisions per
second with the molecules of the surrounding liquid. Assume that a collision
causes a small random displacement of the solute particle that is independent
of its present position. Such an assumption can be justified in the case that the
solute particle is much heavier than a molecule of the surrounding liquid. For
simplicity, consider displacements in one particular direction, say the vertical
direction, and assume that each displacement is either $+\Delta$ or $-\Delta$ with
probabilities $p$ and $q = 1 - p$, respectively. The particle then performs a
one-dimensional random walk with step size $\Delta$. Assume for the present that
the vessel is very large so that the random walk initiated far away from the
boundary may be considered to be unrestricted. Suppose at time zero the particle
is at the position $x$ relative to some origin. At time $t > 0$ it has suffered
approximately $n = tf$ independent displacements, say $Z_1, Z_2, \ldots, Z_n$. Since $f$
is extremely large (of the order of $10^{21}$), if $t$ is of the order of $10^{-10}$ second
then $n$ is very large. The position of the particle at time $t$, being $x$ plus the sum
of $n$ independent Bernoulli random variables, is, by the central limit theorem,
approximately Gaussian with mean $x + tf(p - q)\Delta$ and variance $tf \cdot 4\Delta^2 pq$. To
make the limiting argument firm, let

\[ p = \frac12 + \frac{\mu}{2\sigma\sqrt{f}} \quad \text{and} \quad \Delta = \frac{\sigma}{\sqrt{f}}. \]

Here $\mu$ and $\sigma$ are two fixed numbers, $\sigma > 0$. Then as $f \to \infty$, the mean
displacement $tf(p - q)\Delta$ converges to $t\mu$ and the variance converges to $t\sigma^2$. In
the limit, then, the position $X_t$ of the particle at time $t > 0$ is Gaussian with
probability density function (in $y$) given by

\[ p(t; x, y) = \frac{1}{(2\pi\sigma^2 t)^{1/2}} \exp\left\{ -\frac{(y - x - t\mu)^2}{2\sigma^2 t} \right\}. \tag{7.1} \]
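The scaling $p = \tfrac12 + \mu/(2\sigma\sqrt f)$, $\Delta = \sigma/\sqrt f$ can be checked by evaluating the random-walk mean displacement and variance for a finite collision rate $f$. A minimal sketch, with illustrative function names and arbitrary parameter values in the usage example:

```python
from math import exp, pi, sqrt


def walk_moments(t, f, mu, sigma):
    """Mean displacement t f (p - q) Delta and variance t f 4 Delta^2 p q of the walk."""
    p = 0.5 + mu / (2 * sigma * sqrt(f))
    q = 1 - p
    delta = sigma / sqrt(f)
    n = t * f  # approximate number of displacements by time t
    return n * (p - q) * delta, n * 4 * delta**2 * p * q


def gaussian_density(t, x, y, mu, sigma):
    """The limiting density p(t; x, y) of (7.1)."""
    return exp(-((y - x - t * mu) ** 2) / (2 * sigma**2 * t)) / sqrt(2 * pi * sigma**2 * t)


if __name__ == "__main__":
    for f in (1e2, 1e4, 1e6):
        # Mean equals t*mu; variance approaches t*sigma^2 as f grows.
        print(f, walk_moments(t=2.0, f=f, mu=0.5, sigma=2.0))
```

With this choice of $p$ and $\Delta$ the mean equals $t\mu$ for every $f$, while the variance is $t\sigma^2 (1 - \mu^2/(\sigma^2 f))$ and only reaches $t\sigma^2$ in the limit.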

If $s > 0$ then $X_{t+s} - X_t$ is the sum of displacements during the time interval
$(t, t + s]$. Therefore, by the argument above, $X_{t+s} - X_t$ is Gaussian with mean
$s\mu$ and variance $s\sigma^2$, and it is independent of $\{X_u : 0 \le u \le t\}$. In particular, for
every finite set of time points $0 < t_1 < t_2 < \cdots < t_k$, the random variables
$X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_k} - X_{t_{k-1}}$ are independent. A stochastic process with this
last property is said to be a process with independent increments. This is the
continuous-time analogue of random walks. From the physical description of
the process $\{X_t\}$ as representing (a coordinate of) the path of a diffusing solute
particle, one would expect that the sample paths of the process (i.e., the
trajectories $t \to X_t(\omega) = \omega_t$) may be taken to be continuous. That this is indeed
the case is an important mathematical result originally due to Norbert Wiener.
For this reason, Brownian motion is also called the Wiener process. A complete
definition of Brownian motion goes as follows.

Definition 7.1. A Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2$ is
a stochastic process $\{X_t : t \ge 0\}$ having continuous sample paths and
independent Gaussian increments with mean and variance of an increment
$X_{t+s} - X_t$ being $s\mu$ and $s\sigma^2$, respectively. If $X_0 = x$, then this Brownian motion
is said to start at $x$. A Brownian motion with zero drift and diffusion coefficient
of 1 is called the standard Brownian motion.

Families of random variables {X} constituting Brownian motions arise in

many different contexts on diverse probability spaces. The canonical model for
Brownian motion is given as follows.

1. The sample space $\Omega := C[0, \infty)$ is the set of all real-valued continuous
functions on the time interval $[0, \infty)$. This is the set of all possible
trajectories (sample paths) of the process.
2. $X_t(\omega) := \omega_t$ is the value of the sample path $\omega$ at time $t$.
3. $\Omega$ is equipped with the smallest sigmafield $\mathcal{F}$ of subsets of $\Omega$
containing the class $\mathcal{F}_0$ of all finite-dimensional sets of the form
$F = \{\omega \in \Omega : a_i < \omega_{t_i} < b_i,\ i = 1, 2, \ldots, k\}$, where $a_i < b_i$ are constants
and $0 < t_1 < t_2 < \cdots < t_k$ are a finite set of time points. $\mathcal{F}$ is said to be
generated by $\mathcal{F}_0$.
4. The existence and uniqueness of a probability measure $P_x$ on $\mathcal{F}$, called
the Wiener measure starting at $x$, as specified by Definition 7.1 is
determined by the probability assignments of the form of (7.2) below.

For the set $F$ above, $P_x(F)$ can be calculated as follows. Definition 7.1 gives
the joint density of $X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_k} - X_{t_{k-1}}$ as that of $k$ independent
Gaussian random variables with means $x + t_1\mu, (t_2 - t_1)\mu, \ldots, (t_k - t_{k-1})\mu$,
respectively, and variances $t_1\sigma^2, (t_2 - t_1)\sigma^2, \ldots, (t_k - t_{k-1})\sigma^2$, respectively.
Transforming this (product) joint density, say in variables $z_1, z_2, \ldots, z_k$, by the
change of variables $z_1 = y_1$, $z_2 = y_2 - y_1$, $\ldots$, $z_k = y_k - y_{k-1}$ and using the fact
that the Jacobian of this linear transformation is unity, one obtains

\[
\begin{aligned}
P_x(a_i < X_{t_i} < b_i \text{ for } i = 1, 2, \ldots, k)
= \int_{a_1}^{b_1} \cdots \int_{a_k}^{b_k}
&\frac{1}{(2\pi\sigma^2 t_1)^{1/2}} \exp\left\{ -\frac{(y_1 - x - t_1\mu)^2}{2\sigma^2 t_1} \right\} \\
&\times \frac{1}{(2\pi\sigma^2 (t_2 - t_1))^{1/2}} \exp\left\{ -\frac{(y_2 - y_1 - (t_2 - t_1)\mu)^2}{2\sigma^2 (t_2 - t_1)} \right\} \cdots \\
&\times \frac{1}{(2\pi\sigma^2 (t_k - t_{k-1}))^{1/2}} \exp\left\{ -\frac{(y_k - y_{k-1} - (t_k - t_{k-1})\mu)^2}{2\sigma^2 (t_k - t_{k-1})} \right\}
\, dy_k \, dy_{k-1} \cdots dy_1.
\end{aligned}
\tag{7.2}
\]

The joint density of $X_{t_1}, X_{t_2}, \ldots, X_{t_k}$ is the integrand in (7.2) and may be
expressed, using (7.1), as

\[ p(t_1; x, y_1)\, p(t_2 - t_1; y_1, y_2) \cdots p(t_k - t_{k-1}; y_{k-1}, y_k). \tag{7.3} \]

The probabilities of a number of infinite-dimensional events will be calculated

in Sections 9-13, and in Chapter IV. Some further discussion of mathematical
issues in this connection is presented in Section 8 also. The details of a
construction of the Brownian motion and its Wiener measure distribution are
given in the theoretical complements of Section 13.
If $\{X_t^{(j)}\}$, $j = 1, 2, \ldots, k$, are $k$ independent standard Brownian motions, then
the vector-valued process $\{X_t\} = \{(X_t^{(1)}, X_t^{(2)}, \ldots, X_t^{(k)})\}$ is called a standard
$k$-dimensional Brownian motion. If $\{X_t\}$ is a standard $k$-dimensional Brownian
motion, $\mu = (\mu^{(1)}, \ldots, \mu^{(k)})$ a vector in $\mathbb{R}^k$, and $A$ a $k \times k$ nonsingular matrix,
then the vector-valued process $\{Y_t = AX_t + t\mu\}$ has independent increments, the
increment $Y_{t+s} - Y_t = A(X_{t+s} - X_t) + (t + s - t)\mu$ being Gaussian with mean
vector $s\mu$ and covariance (or dispersion) matrix $sD$, where $D = AA'$ and $A'$
denotes the transpose of $A$. Such a process $Y$ is called a $k$-dimensional Brownian
motion with drift vector $\mu$ and diffusion matrix, or dispersion matrix, $D$.


The argument in Section 7 indicating Brownian motion (with zero drift
parameter for simplicity) as the limit of a random walk can be made on the
basis of the classical central limit theorem which applies to every i.i.d. sequence
of increments $\{Z_m\}$ having finite mean and variance. While we can only obtain
convergence of the finite-dimensional distributions by such considerations, much
more is true. Namely, probabilities of certain infinite-dimensional events will
also converge. The convergence of the full distributions of random walks to
the full distribution of the Brownian motion process is informally explained in
this section. A more detailed and precise discussion is given in the theoretical
complements of Sections 8 and 13.
To state this limit theorem somewhat more precisely, consider a sequence
of i.i.d. random variables $\{Z_m\}$ and assume for the present that $EZ_m = 0$ and
$\operatorname{Var} Z_m = \sigma^2 > 0$. Define the random walk

\[ S_0 = 0, \qquad S_m = Z_1 + \cdots + Z_m \quad (m = 1, 2, \ldots). \tag{8.1} \]

Define, for each value of the scale parameter $n \ge 1$, the stochastic process

\[ X_t^{(n)} = \frac{S_{[nt]}}{\sqrt{n}} \quad (t \ge 0), \tag{8.2} \]

where $[nt]$ is the integer part of $nt$. Figure 8.1 plots the sample path of
$\{X_t^{(n)} : t \ge 0\}$ up to time $t = 13/n$ if the successive displacements take values
$Z_1 = -1$, $Z_2 = +1$, $Z_3 = +1$, $Z_4 = +1$, $Z_5 = -1$, $Z_6 = +1$, $Z_7 = +1$,
$Z_8 = -1$, $Z_9 = +1$, $Z_{10} = +1$, $Z_{11} = +1$, $Z_{12} = -1$.

Figure 8.1 (Sample path of $\{X_t^{(n)}\}$, with the time axis marked at $1/n, 2/n, \ldots, 13/n$. Annotations: $EX_t^{(n)} = 0$, $\operatorname{Var} X_t^{(n)} = [nt]/n \approx t$, $\operatorname{Cov}(X_s^{(n)}, X_t^{(n)}) = [ns]/n \approx s$ for $s \le t$.)


The process $\{S_{[nt]} : t \ge 0\}$ records the discrete-time random walk
$\{S_m : m = 0, 1, 2, \ldots\}$ on a continuous time scale whose unit is $n$ times that of
the discrete time unit, i.e., $S_m$ is plotted at time $m/n$. The process
$\{X_t^{(n)}\} = \{(1/\sqrt{n}) S_{[nt]}\}$ also scales distance by measuring distances on a scale
whose unit is $\sqrt{n}$ times the unit of measurement used for the random walk.
This is a convenient normalization, since

\[ EX_t^{(n)} = 0, \qquad \operatorname{Var} X_t^{(n)} = \frac{[nt]\sigma^2}{n} \approx t\sigma^2 \quad \text{for large } n. \tag{8.3} \]

In a time interval $(t_1, t_2]$ the overall "displacement" $X_{t_2}^{(n)} - X_{t_1}^{(n)}$ is the sum of
a large number $[nt_2] - [nt_1] \approx n(t_2 - t_1)$ of small i.i.d. random variables

\[ \frac{1}{\sqrt{n}} Z_{[nt_1]+1}, \ldots, \frac{1}{\sqrt{n}} Z_{[nt_2]}. \]

In the case $\{Z_m\}$ is i.i.d. Bernoulli, this means reducing the step sizes of the
random variables to $\Delta = 1/\sqrt{n}$. In a physical application, looking at $\{X_t^{(n)}\}$
means the following.

1. The random walk is observed at times $t_1 < t_2 < t_3 < \cdots$ sufficiently far
apart to allow a large number of individual displacements to occur during
each of the time intervals $(t_1, t_2], (t_2, t_3], \ldots$, and

2. Measurements of distance are made on a "macroscopic" scale whose unit
of measurement is much larger than the average magnitude of the
individual displacements. The normalizing large parameter $n$ scales time
and $n^{1/2}$ scales space coordinates.

Since the sample paths of $\{X_t^{(n)}\}$ have jumps (though small for large $n$) and
are, therefore, discontinuous, it is technically more convenient to linearly
interpolate the random walk between one jump point and the next, using the
same space-time scales as used for $\{X_t^{(n)}\}$. The polygonal process $\{\tilde X_t^{(n)}\}$ is
formally defined by

\[ \tilde X_t^{(n)} = \frac{S_{[nt]}}{\sqrt{n}} + (nt - [nt]) \frac{Z_{[nt]+1}}{\sqrt{n}}, \qquad t \ge 0. \tag{8.4} \]

In this way, just as for the limiting Brownian motion process, the paths of
$\{\tilde X_t^{(n)}\}$ are continuous. Figure 8.2 plots the path of $\{\tilde X_t^{(n)}\}$ corresponding to the
path of $\{X_t^{(n)}\}$ drawn in Figure 8.1. In a time interval $m/n \le t < (m + 1)/n$, $X_t^{(n)}$
is constant at level $(1/\sqrt{n}) S_m$ while $\tilde X_t^{(n)}$ changes linearly from $(1/\sqrt{n}) S_m$ at time
$t = m/n$ to $(1/\sqrt{n}) S_{m+1} = (1/\sqrt{n}) S_m + Z_{m+1}/\sqrt{n}$ at time $t = (m + 1)/n$.

Figure 8.2

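Thus, in any given interval $[0, T]$ the maximum difference between the two
processes $\{X_t^{(n)}\}$ and $\{\tilde X_t^{(n)}\}$ does not exceed

\[ \varepsilon_n(T) = \max\left( \frac{|Z_1|}{\sqrt{n}}, \frac{|Z_2|}{\sqrt{n}}, \ldots, \frac{|Z_{[nT]+1}|}{\sqrt{n}} \right). \]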
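The step process (8.2), the polygonal process (8.4), and the bound on their difference can be made concrete with a short sketch. The function names and the sample displacements in the usage note are illustrative assumptions; the list `z` holds $Z_1, Z_2, \ldots$ at indices $0, 1, \ldots$ and must contain at least $[nt] + 1$ entries.

```python
from math import floor, sqrt


def partial_sums(z):
    """Return [S_0, S_1, ..., S_len(z)] with S_0 = 0."""
    sums = [0.0]
    for step in z:
        sums.append(sums[-1] + step)
    return sums


def step_process(z, n, t):
    """X_t^(n) = S_[nt] / sqrt(n), as in (8.2)."""
    return partial_sums(z)[floor(n * t)] / sqrt(n)


def polygonal_process(z, n, t):
    """(8.4): S_[nt]/sqrt(n) + (nt - [nt]) Z_{[nt]+1}/sqrt(n)."""
    m = floor(n * t)
    # z[m] is Z_{m+1} because the list is zero-indexed.
    return partial_sums(z)[m] / sqrt(n) + (n * t - m) * z[m] / sqrt(n)


def max_gap_bound(z, n, T):
    """The bound: the largest |Z_m| / sqrt(n) over m = 1, ..., [nT] + 1."""
    return max(abs(v) for v in z[: floor(n * T) + 1]) / sqrt(n)
```

For example, with `z = [1, -1, 1, 1, -1, 2, -3, 1]` and `n = 4`, the two processes agree at `t = 0.5` (a jump point) and differ by `0.25` at `t = 0.625`, well within the bound.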


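To see that the difference between $\{X_t^{(n)}\}$ and $\{\tilde X_t^{(n)}\}$ is negligible for large $n$,
consider the following estimate. For each $\delta > 0$,

\[
\begin{aligned}
P(\varepsilon_n(T) > \delta) &= 1 - P(\varepsilon_n(T) \le \delta) \\
&= 1 - P\left( \frac{|Z_m|}{\sqrt{n}} \le \delta \text{ for all } m = 1, 2, \ldots, [nT] + 1 \right) \\
&= 1 - \big( P(|Z_1| \le \delta\sqrt{n}) \big)^{[nT]+1} \\
&= 1 - \big( 1 - P(|Z_1| > \delta\sqrt{n}) \big)^{[nT]+1}.
\end{aligned}
\tag{8.5}
\]

Assuming for simplicity that $E|Z_1|^3 < \infty$, Chebyshev's inequality yields
$P(|Z_1| > \delta\sqrt{n}) \le E|Z_1|^3 / (\delta^3 n^{3/2})$. Use this in (8.5) to get (Exercise 9)

\[
P(\varepsilon_n(T) > \delta) \le 1 - \left( 1 - \frac{E|Z_1|^3}{\delta^3 n^{3/2}} \right)^{[nT]+1}
\approx 1 - \exp\left\{ -\frac{E|Z_1|^3 \, T}{\delta^3 n^{1/2}} \right\} \approx 0 \tag{8.6}
\]

when $n$ is large. Here $\approx$ indicates that the difference between the two sides
goes to zero. Thus, on any closed and bounded time interval the behaviors of
$\{X_t^{(n)}\}$ and $\{\tilde X_t^{(n)}\}$ are the same in the large-$n$ limit.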

Note that given any finite set of time points $0 < t_1 < t_2 < \cdots < t_k$, the joint
distribution of $(X_{t_1}^{(n)}, X_{t_2}^{(n)}, \ldots, X_{t_k}^{(n)})$ converges to the finite-dimensional
distribution of $(X_{t_1}, X_{t_2}, \ldots, X_{t_k})$, where $\{X_t\}$ is a Brownian motion with zero
drift and diffusion coefficient $\sigma^2$. To see this, note that $X_{t_1}^{(n)}, X_{t_2}^{(n)} - X_{t_1}^{(n)}, \ldots,
X_{t_k}^{(n)} - X_{t_{k-1}}^{(n)}$ are independent random variables that by the classical central limit
theorem (Chapter 0) converge in distribution to Gaussian random variables
with zero means and variances $t_1\sigma^2, (t_2 - t_1)\sigma^2, \ldots, (t_k - t_{k-1})\sigma^2$. That is to
say, the joint distribution of $(X_{t_1}^{(n)}, X_{t_2}^{(n)} - X_{t_1}^{(n)}, \ldots, X_{t_k}^{(n)} - X_{t_{k-1}}^{(n)})$ converges to
that of $(X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_k} - X_{t_{k-1}})$. By a linear transformation, one gets
the desired convergence of finite-dimensional distributions of $\{X_t^{(n)}\}$ (and,
therefore, of $\{\tilde X_t^{(n)}\}$) to those of the Brownian motion process $\{X_t\}$ (Exercise 1).
Roughly speaking, to establish the full convergence in distribution of $\{\tilde X_t^{(n)}\}$
to Brownian motion, one further looks at a finite set of time points comprising
a fine subdivision of a bounded interval $[0, T]$ and shows that the fluctuations
of the process $\{\tilde X_t^{(n)}\}$ on $[0, T]$ between successive points of this subdivision
are sufficiently small in probability, a property called the tightness of the process.
This control over fluctuations together with the convergence of $\{\tilde X_t^{(n)}\}$ evaluated
at the time points of the subdivision ensures convergence in distribution to a
continuous process whose finite-dimensional distributions are the same as those
of Brownian motion (see theoretical complements for details). Since there is no
process other than Brownian motion with continuous sample paths that has
these limiting finite-dimensional distributions, it follows that the limit must be
Brownian motion.
A precise statement of the functional central limit theorem (FCLT) is the
following.

Theorem 8.1. (The Functional Central Limit Theorem). Suppose $\{Z_m :
m = 1, 2, \ldots\}$ is an i.i.d. sequence with $EZ_m = 0$ and variance $\sigma^2 > 0$. Then as
$n \to \infty$ the stochastic processes $\{X_t^{(n)} : t \ge 0\}$ (or $\{\tilde X_t^{(n)} : t \ge 0\}$) converge in
distribution to a Brownian motion starting at the origin with zero drift and
diffusion coefficient $\sigma^2$.

An important way in which to view the convergence asserted in the FCLT
is as follows. First, the sample paths of the polygonal process $\{\tilde X_t^{(n)}\}$ belong to
the space $\Omega = C[0, \infty)$ of all continuous real-valued functions on $[0, \infty)$, as do
those of the limiting Brownian motion $\{X_t\}$. This space $C[0, \infty)$ is a metric
space with a natural notion of convergence of sequences $\{\omega^{(n)}\}$, say, being that
"$\{\omega^{(n)}\}$ converges to $\omega$ in $C[0, \infty)$ as $n$ tends to infinity if and only if
$\{\omega^{(n)}(t) : a \le t \le b\}$ converges uniformly to $\{\omega(t) : a \le t \le b\}$ for all closed and
bounded intervals $[a, b]$." Second, the distributions of the processes $\{\tilde X_t^{(n)}\}$ and
$\{X_t\}$ are probability measures $P_n$ and $P$ on a certain class $\mathcal{F}$ of events of $C[0, \infty)$,
called Borel subsets, which is generated by and therefore includes all of the
finite-dimensional events. $\mathcal{F}$ includes as well various important
infinite-dimensional events, e.g., the events $\{\max_{a \le t \le b} X_t > y\}$ and $\{\max_{a \le t \le b} X_t < x\}$
pertaining to extremes of the process. More generally, if $f$ is a continuous
function on $C[0, \infty)$ then the event $\{f(\{X_t\}) \le x\}$ is also a Borel subset of
$C[0, \infty)$ (Exercise 2). With events of this type in mind, a precise meaning of
convergence in distribution (or weak convergence) of the probability measures
$P_n$ to $P$ on this infinite-dimensional space $C[0, \infty)$ is that the probability
distributions of the real-valued (one-dimensional) random variables $f(\{\tilde X_t^{(n)}\})$
converge (in distribution as described in Chapter 0) to the distribution of $f(\{X_t\})$
for each real-valued continuous function $f$ defined on $C[0, \infty)$. Since a number
of important infinite-dimensional events can be expressed in terms of continuous
functionals of the processes, this makes calculations of probabilities possible
by taking limits; for examples of infinite-dimensional events whose probabilities
do not converge see Exercise 9.3(iv).
Because the limiting process, namely Brownian motion, is the same for all
increments $\{Z_m\}$ as above, the limit Theorem 8.1 is also referred to as the
Invariance Principle, i.e., invariance with respect to the distribution of the
increment process.
There are two distinct types of applications of Theorem 8.1. In the first type
it is used to calculate probabilities of infinite-dimensional events associated with
Brownian motion by studying simple random walks. In the second type it
(invariance) is used to calculate asymptotics of a large variety of partial-sum
processes by studying simple random walks and Brownian motion. Several such
examples are considered in the next two sections.


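The first problem is to calculate, for a Brownian motion $\{X_t^x\}$ with drift $\mu = 0$
and diffusion coefficient $\sigma^2$, starting at $x$, the probability

\[ P(\tau_c^x < \tau_d^x) = P(\{X_t^x\} \text{ reaches } c \text{ before } d) \qquad (c < x < d), \tag{9.1} \]

where

\[ \tau_y^x := \inf\{t \ge 0 : X_t^x = y\}. \tag{9.2} \]

Since $\{B_t = (X_t^x - x)/\sigma\}$ is a standard Brownian motion starting at zero,

\[ P(\tau_c^x < \tau_d^x) = P\left( \{B_t\} \text{ reaches } \frac{c - x}{\sigma} \text{ before } \frac{d - x}{\sigma} \right). \tag{9.3} \]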

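Now consider the i.i.d. Bernoulli sequence $\{Z_m : m = 1, 2, \ldots\}$ with $P(Z_m = 1) =
P(Z_m = -1) = \tfrac12$, and the associated random walk $S_0 = 0$, $S_m = Z_1 + \cdots + Z_m$
($m \ge 1$). By the FCLT (Theorem 8.1), the polygonal process $\{\tilde X_t^{(n)}\}$ associated
with this random walk converges in distribution to $\{B_t\}$. Hence (theoretical
complement 2)

\[
\begin{aligned}
P(\tau_c^x < \tau_d^x) &= \lim_{n \to \infty} P\left( \{\tilde X_t^{(n)}\} \text{ reaches } \frac{c - x}{\sigma} \text{ before } \frac{d - x}{\sigma} \right) \\
&= \lim_{n \to \infty} P(\{S_m\} \text{ reaches } c_n \text{ before } d_n),
\end{aligned}
\tag{9.4}
\]

where

\[
c_n = \left[ \frac{c - x}{\sigma} \sqrt{n} \right], \qquad
d_n = \begin{cases}
\dfrac{d - x}{\sigma} \sqrt{n} & \text{if } \dfrac{d - x}{\sigma} \sqrt{n} \text{ is an integer}, \\[2ex]
\left[ \dfrac{d - x}{\sigma} \sqrt{n} \right] + 1 & \text{if not}.
\end{cases}
\]

By relation (3.14) of Section 3, one has

\[
P(\tau_c^x < \tau_d^x) = \lim_{n \to \infty} \frac{d_n}{d_n - c_n}
= \lim_{n \to \infty} \frac{\dfrac{d - x}{\sigma} \sqrt{n}}{\dfrac{d - x}{\sigma} \sqrt{n} - \dfrac{c - x}{\sigma} \sqrt{n}}. \tag{9.5}
\]

Therefore,

\[ P(\tau_c^x < \tau_d^x) = \frac{d - x}{d - c} \qquad (c < x < d,\ \mu = 0). \tag{9.6} \]

Similarly, using relation (3.13) of Section 3 instead of (3.14), one gets

\[ P(\tau_d^x < \tau_c^x) = \frac{x - c}{d - c} \qquad (c < x < d,\ \mu = 0). \tag{9.7} \]

Letting $d \to +\infty$ in (9.6) and $c \to -\infty$ in (9.7), one has

\[
\begin{aligned}
P(\tau_c^x < \infty) &= P(\{X_t^x\} \text{ ever reaches } c) = 1 \qquad (c < x,\ \mu = 0), \\
P(\tau_d^x < \infty) &= P(\{X_t^x\} \text{ ever reaches } d) = 1 \qquad (x < d,\ \mu = 0).
\end{aligned}
\tag{9.8}
\]

The relations (9.8) mean that a Brownian motion with zero drift is recurrent,
just as a simple symmetric random walk was shown to be in Section 3.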

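The next problem is to calculate the corresponding probabilities when the
drift is a nonzero quantity $\mu$. Consider, for each large $n$, the Bernoulli sequence
$\{Z_{m,n} : m = 1, 2, \ldots\}$ with

\[
P(Z_{m,n} = 1) = p_n = \frac12 + \frac{\mu}{2\sigma\sqrt{n}}, \qquad
P(Z_{m,n} = -1) = q_n = \frac12 - \frac{\mu}{2\sigma\sqrt{n}}.
\]

Write $S_{m,n} = Z_{1,n} + \cdots + Z_{m,n}$ for $m \ge 1$, $S_{0,n} = 0$. Then,

\[
\begin{aligned}
EX_t^{(n)} &= \frac{ES_{[nt],n}}{\sqrt{n}} = \frac{[nt]}{n} \frac{\mu}{\sigma} \approx \frac{t\mu}{\sigma}, \\
\operatorname{Var} X_t^{(n)} &= \frac{[nt] \operatorname{Var} Z_{1,n}}{n} = \frac{[nt]}{n} \left( 1 - \Big( \frac{\mu}{\sigma\sqrt{n}} \Big)^2 \right) \approx t,
\end{aligned}
\tag{9.9}
\]

and a slight modification of the FCLT, with no significant difference in proof,
implies that $\{X_t^{(n)}\}$ and, therefore, $\{\tilde X_t^{(n)}\}$ converges in distribution to a Brownian
motion with drift $\mu/\sigma$ and diffusion coefficient of 1 that starts at the origin. Let
$\{X_t^x\}$ be a Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2$ starting at
$x$. Then $\{W_t = (X_t^x - x)/\sigma\}$ is a Brownian motion with drift $\mu/\sigma$ and diffusion
coefficient of 1 that starts at the origin. Hence, by using relation (3.8) of Section 3,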

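\[
\begin{aligned}
P(\tau_c^x < \tau_d^x) &= P(\{X_t^x\} \text{ reaches } c \text{ before } d) \\
&= P\left( \{W_t\} \text{ reaches } \frac{c - x}{\sigma} \text{ before } \frac{d - x}{\sigma} \right) \\
&= \lim_{n \to \infty} P(\{S_{m,n} : m = 0, 1, 2, \ldots\} \text{ reaches } c_n \text{ before } d_n) \\
&= \lim_{n \to \infty} \frac{1 - (p_n/q_n)^{d_n}}{1 - (p_n/q_n)^{d_n - c_n}} \\
&= \lim_{n \to \infty}
\frac{1 - \left( \dfrac{1 + \mu/(\sigma\sqrt{n})}{1 - \mu/(\sigma\sqrt{n})} \right)^{(d - x)\sqrt{n}/\sigma}}
     {1 - \left( \dfrac{1 + \mu/(\sigma\sqrt{n})}{1 - \mu/(\sigma\sqrt{n})} \right)^{(d - c)\sqrt{n}/\sigma}} \\
&= \frac{1 - \exp\{(d - x)\mu/\sigma^2\} / \exp\{-(d - x)\mu/\sigma^2\}}
        {1 - \exp\{(d - c)\mu/\sigma^2\} / \exp\{-(d - c)\mu/\sigma^2\}}.
\end{aligned}
\]

Therefore,

\[
P(\tau_c^x < \tau_d^x) = \frac{1 - \exp\{2(d - x)\mu/\sigma^2\}}{1 - \exp\{2(d - c)\mu/\sigma^2\}}
\qquad (c < x < d,\ \mu \ne 0). \tag{9.10}
\]

If relation (3.6) of Section 3 is used instead of (3.8), then

\[
P(\tau_d^x < \tau_c^x) = \frac{1 - \exp\{-2(x - c)\mu/\sigma^2\}}{1 - \exp\{-2(d - c)\mu/\sigma^2\}}
\qquad (c < x < d,\ \mu \ne 0). \tag{9.11}
\]

Letting $d \uparrow \infty$ in (9.10), one gets

\[
\begin{aligned}
P(\tau_c^x < \infty) &= \exp\left\{ -\frac{2(x - c)\mu}{\sigma^2} \right\} \qquad (c < x,\ \mu > 0), \\
P(\tau_c^x < \infty) &= 1 \qquad (c < x,\ \mu < 0).
\end{aligned}
\tag{9.12}
\]

Thus, in this case the extremal random variable $\min_{t \ge 0} X_t^x$ is exponentially
distributed (Exercise 4). Letting $c \downarrow -\infty$ in (9.11) one obtains

\[
\begin{aligned}
P(\tau_d^x < \infty) &= 1 \qquad (x < d,\ \mu > 0), \\
P(\tau_d^x < \infty) &= \exp\{2(d - x)\mu/\sigma^2\} \qquad (x < d,\ \mu < 0).
\end{aligned}
\tag{9.13}
\]

In particular it follows that $\max_{t \ge 0} X_t^x$ is exponentially distributed (Exercise 4).

Relations (9.12), (9.13) imply that a Brownian motion with a nonzero drift is
transient. This can also be deduced by an appeal to (a continuous time version
of) the strong law of large numbers, just as in (3.11), (3.12) of Section 3
(Exercise 1).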
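The passage to the limit in (9.5) and (9.10) can be observed numerically by evaluating the random-walk ratio $[1 - (p_n/q_n)^{d_n}] / [1 - (p_n/q_n)^{d_n - c_n}]$ for large $n$ and comparing it with the closed forms. In this sketch the function names are illustrative, and the integer part $[\cdot]$ is taken to be `floor`, which is an assumption for negative arguments (it does not affect the limit).

```python
from math import exp, floor, sqrt


def prob_c_before_d(x, c, d, mu, sigma):
    """P(tau_c < tau_d): (9.6) for mu = 0 and (9.10) for mu != 0."""
    if mu == 0:
        return (d - x) / (d - c)
    a = 2 * mu / sigma**2
    return (1 - exp(a * (d - x))) / (1 - exp(a * (d - c)))


def walk_prob_c_before_d(x, c, d, mu, sigma, n):
    """Random-walk approximation with p_n, q_n, c_n, d_n as in Section 9."""
    root = sqrt(n)
    p_n = 0.5 + mu / (2 * sigma * root)
    q_n = 1 - p_n
    c_n = floor((c - x) * root / sigma)
    d_raw = (d - x) * root / sigma
    d_n = int(d_raw) if d_raw == int(d_raw) else floor(d_raw) + 1
    if mu == 0:
        return d_n / (d_n - c_n)  # symmetric case, relation (3.14) at x = 0
    ratio = p_n / q_n
    return (1 - ratio**d_n) / (1 - ratio ** (d_n - c_n))


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, walk_prob_c_before_d(0.0, -1.0, 2.0, 0.5, 1.0, n),
              prob_c_before_d(0.0, -1.0, 2.0, 0.5, 1.0))
```

Taking `d` large in `prob_c_before_d` with positive drift reproduces the exponential form in (9.12).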



We have seen in Section 4, relation (4.7), that for a simple symmetric random
walk starting at zero, the first passage time 7 y to the state y 0 0 has the



P(7.=N)=IYI N+y 1
Y N N=IYI>IYI+2,IYI+4..... (10.1)

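Now let $\tau_z$ be the first time a standard Brownian motion starting at the
origin reaches $z$. Let $\{X_t^{(n)}\}$ be the polygonal process corresponding to the simple
symmetric random walk. Considering the first time $\{X_t^{(n)}\}$ reaches $z$, one has,
by the FCLT (Theorem 8.1) and Eq. 10.1 (Exercise 1),

$$
\begin{aligned}
P(\tau_z > t) &= \lim_{n\to\infty} P(T_{[z\sqrt{n}]} > [nt]) \\
&= \lim_{n\to\infty} \sum_{N=[nt]+1}^{\infty} P(T_{[z\sqrt{n}]} = N) \\
&= \lim_{n\to\infty} \sum_{\substack{N=[nt]+1 \\ N-y \text{ even}}}^{\infty} \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} \qquad (y = [z\sqrt{n}]).
\end{aligned} \tag{10.2}
$$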

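Now according to Stirling's formula, for large integers $M$, we can write

$$
M! = (2\pi)^{1/2} e^{-M} M^{M+1/2} (1 + \delta_M), \tag{10.3}
$$

where $\delta_M \to 0$ as $M \to \infty$. Since $y = [z\sqrt{n}]$, $N > [nt]$, and $\tfrac{1}{2}(N \pm y) \ge
\{[nt] - |[z\sqrt{n}]|\}/2$, both $N$ and $N \pm y$ tend to infinity as $n \to \infty$. Therefore,
for $N + y$ even,

$$
\begin{aligned}
\frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N}
&= \frac{|y|}{N} \cdot \frac{(2\pi)^{1/2} e^{-N} N^{N+1/2}\, 2^{-N}}{(2\pi)\, e^{-(N+y)/2} \left(\frac{N+y}{2}\right)^{(N+y)/2+1/2} e^{-(N-y)/2} \left(\frac{N-y}{2}\right)^{(N-y)/2+1/2}} \times (1 + o(1)) \\
&= \frac{2|y|}{(2\pi)^{1/2} N^{3/2}} \left(1 + \frac{y}{N}\right)^{-(N+y)/2-1/2} \left(1 - \frac{y}{N}\right)^{-(N-y)/2-1/2} (1 + o(1)) \\
&= \frac{2|y|}{(2\pi)^{1/2} N^{3/2}} \left(1 + \frac{y}{N}\right)^{-(N+y)/2} \left(1 - \frac{y}{N}\right)^{-(N-y)/2} (1 + o(1)),
\end{aligned} \tag{10.4}
$$

where $o(1)$ denotes a quantity whose magnitude is bounded above by a quantity
$\varepsilon_n(t, z)$ that depends only on $n$, $t$, $z$ and which goes to zero as $n \to \infty$. Also,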


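$$
\begin{aligned}
\log\left[\left(1 + \frac{y}{N}\right)^{(N+y)/2} \left(1 - \frac{y}{N}\right)^{(N-y)/2}\right]
&= \frac{N+y}{2}\left[\frac{y}{N} - \frac{y^2}{2N^2} + O\!\left(\frac{|y|^3}{N^3}\right)\right] \\
&\quad + \frac{N-y}{2}\left[-\frac{y}{N} - \frac{y^2}{2N^2} + O\!\left(\frac{|y|^3}{N^3}\right)\right] \\
&= \frac{y^2}{2N} + \theta(N, y),
\end{aligned} \tag{10.5}
$$

where $|\theta(N, y)| \le n^{-1/2} c(t, z)$ and $c(t, z)$ is a constant depending only on $t$ and
$z$. Combining (10.4) and (10.5), we have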

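$$
\begin{aligned}
\frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N}
&= \left(\frac{2}{\pi}\right)^{1/2} \frac{|y|}{N^{3/2}} \exp\left\{-\frac{y^2}{2N}\right\} (1 + o(1)) \\
&= \left(\frac{2}{\pi}\right)^{1/2} \frac{|z|\sqrt{n}}{N^{3/2}} \exp\left\{-\frac{nz^2}{2N}\right\} (1 + o(1)),
\end{aligned} \tag{10.6}
$$

where $o(1) \to 0$ as $n \to \infty$, uniformly for $N > [nt]$, $N - [z\sqrt{n}]$ even. Using this
in (10.2), one obtains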

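$$
P(\tau_z > t) = \lim_{n\to\infty} \left(\frac{2}{\pi}\right)^{1/2} |z| \sqrt{n} \sum_{\substack{N > [nt] \\ N - [z\sqrt{n}] \text{ even}}} N^{-3/2} \exp\left\{-\frac{nz^2}{2N}\right\}. \tag{10.7}
$$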
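
The local approximation (10.6) that feeds into (10.7) can be checked numerically. The sketch below (hypothetical function names) compares the exact first passage probability (10.1) with the leading term $(2/\pi)^{1/2} |y| N^{-3/2} \exp\{-y^2/(2N)\}$ of (10.6); it relies on Python's exact integer arithmetic for the binomial coefficient.

```python
import math

def exact_first_passage_prob(N, y):
    """P(T_y = N) = (|y|/N) * C(N, (N+y)/2) * 2^(-N), relation (10.1)."""
    if (N + y) % 2 != 0 or N < abs(y):
        return 0.0
    # Integer true division keeps full precision even for huge operands.
    return abs(y) / N * (math.comb(N, (N + y) // 2) / 2**N)

def stirling_first_passage_prob(N, y):
    """Leading term of (10.6): (2/pi)^(1/2) |y| N^(-3/2) exp(-y^2 / (2N))."""
    return math.sqrt(2.0 / math.pi) * abs(y) * N**-1.5 * math.exp(-y * y / (2.0 * N))

if __name__ == "__main__":
    N, y = 10_000, 100
    exact = exact_first_passage_prob(N, y)
    approx = stirling_first_passage_prob(N, y)
    print(exact, approx, abs(exact - approx) / exact)
```

The relative error shrinks as N grows with y of order $\sqrt{N}$, which is the regime $y = [z\sqrt{n}]$, $N > [nt]$ used in the text.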
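Now either $[nt] + 1$ or $[nt] + 2$ is a possible value of $N$. In the first case the
sum in (10.7) is over values $N = [nt] - 1 + 2r$ ($r = 1, 2, \ldots$), and in the second
case over values $[nt] + 2r$ ($r = 1, 2, \ldots$). Since the differences in corresponding
values of $N/n$ are $2/n$, one may take $N = [nt] + 2r$ for the purpose of calculating
(10.7). Thus,

$$
\begin{aligned}
P(\tau_z > t) &= \lim_{n\to\infty} \left(\frac{2}{\pi}\right)^{1/2} |z| \sqrt{n} \sum_{r=1}^{\infty} \frac{1}{([nt] + 2r)^{3/2}} \exp\left\{-\frac{nz^2}{2([nt] + 2r)}\right\} \\
&= \left(\frac{2}{\pi}\right)^{1/2} |z| \lim_{n\to\infty} \sum_{r=1}^{\infty} \frac{1}{n} \cdot \frac{1}{(t + 2r/n)^{3/2}} \exp\left\{-\frac{z^2}{2(t + 2r/n)}\right\} \\
&= \frac{|z|}{(2\pi)^{1/2}} \int_t^{\infty} \frac{1}{u^{3/2}} \exp\left\{-\frac{z^2}{2u}\right\} du.
\end{aligned} \tag{10.8}
$$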

Now, by the change of variables $v = |z|/\sqrt{u}$, we get

$$
P(\tau_z > t) = \left(\frac{2}{\pi}\right)^{1/2} \int_0^{|z|/\sqrt{t}} e^{-v^2/2}\, dv. \tag{10.9}
$$


The first passage time distribution for the more general case of Brownian
motion $\{X_t\}$ with zero drift and diffusion coefficient $\sigma^2 > 0$, starting at the
origin, is now obtained by applying (10.9) to the standard Brownian motion
$\{(1/\sigma)X_t\}$. Therefore,

$$
P(\tau_z > t) = \left(\frac{2}{\pi}\right)^{1/2} \int_0^{|z|/(\sigma\sqrt{t})} e^{-v^2/2}\, dv. \tag{10.10}
$$

The probability density function $f_{\sigma^2}(t)$ of $\tau_z$ is obtained from (10.10) as

$$
f_{\sigma^2}(t) = \frac{|z|}{(2\pi\sigma^2)^{1/2}\, t^{3/2}}\, e^{-z^2/(2\sigma^2 t)} \qquad (t > 0). \tag{10.11}
$$

Note that for large $t$ the tail of the p.d.f. $f_{\sigma^2}(t)$ is of the order of $t^{-3/2}$. Therefore,
although $\{X_t\}$ will reach $z$ in a finite time with probability 1, the expected time
is infinite (Exercise 11).
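
The relation between (10.10) and (10.11) can be illustrated with a short Python sketch. It uses the identity $(2/\pi)^{1/2}\int_0^a e^{-v^2/2}\,dv = \operatorname{erf}(a/\sqrt{2})$ to evaluate (10.10), and checks by a central difference that minus its derivative in t agrees with (10.11). The function names and the test values of t, z, σ are illustrative assumptions.

```python
import math

def survival(t, z, sigma=1.0):
    """P(tau_z > t) from (10.10), written as erf(|z| / (sigma * sqrt(2 t)))."""
    return math.erf(abs(z) / (sigma * math.sqrt(2.0 * t)))

def density(t, z, sigma=1.0):
    """First passage density (10.11) for zero drift and diffusion coefficient sigma^2."""
    return (abs(z) / (math.sqrt(2.0 * math.pi * sigma**2) * t**1.5)
            * math.exp(-z * z / (2.0 * sigma**2 * t)))

if __name__ == "__main__":
    t, z, sigma, h = 2.0, 1.5, 0.8, 1e-5
    # Minus the t-derivative of the survival function should match the density.
    numeric = -(survival(t + h, z, sigma) - survival(t - h, z, sigma)) / (2 * h)
    print(numeric, density(t, z, sigma))
    # Heavy tail: t^(3/2) * f(t) approaches |z| / sqrt(2 pi sigma^2) for large t.
    print(density(1e8, z, sigma) * 1e8**1.5)
```

The second printed quantity illustrates the $t^{-3/2}$ tail that makes the expected first passage time infinite.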
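Consider now a Brownian motion $\{X_t\}$ with a nonzero drift $\mu$ and diffusion
coefficient $\sigma^2$ that starts at the origin. As in Section 9, the polygonal process
$\{X_t^{(n)}\}$ corresponding to the simple random walk $S_{m,n} = Z_{1,n} + \cdots + Z_{m,n}$,
$S_{0,n} = 0$, with $P(Z_{m,n} = 1) = p_n = \tfrac{1}{2} + \mu/(2\sigma\sqrt{n})$, converges in distribution to
$\{W_t = X_t/\sigma\}$, which is a Brownian motion with drift $\mu/\sigma$ and diffusion coefficient
1. On the other hand, writing $T_{y,n}$ for the first passage time of $\{S_{m,n}: m = 0, 1, \ldots\}$
to $y$, one has, by relation (4.6) of Section 4,

$$
\begin{aligned}
P(T_{y,n} = N) &= \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} p_n^{(N+y)/2} q_n^{(N-y)/2} \\
&= \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} \left(1 + \frac{\mu}{\sigma\sqrt{n}}\right)^{(N+y)/2} \left(1 - \frac{\mu}{\sigma\sqrt{n}}\right)^{(N-y)/2} \\
&= \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} \left(1 - \frac{\mu^2}{\sigma^2 n}\right)^{N/2} \left(1 + \frac{\mu}{\sigma\sqrt{n}}\right)^{y/2} \left(1 - \frac{\mu}{\sigma\sqrt{n}}\right)^{-y/2}.
\end{aligned} \tag{10.12}
$$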

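For $y = [w\sqrt{n}]$ for some given nonzero $w$, and $N = [nt] + 2r$ for some given
$t > 0$ and all positive integers $r$, one has

$$
\begin{aligned}
&\left(1 - \frac{\mu^2}{\sigma^2 n}\right)^{N/2} \left(1 + \frac{\mu}{\sigma\sqrt{n}}\right)^{y/2} \left(1 - \frac{\mu}{\sigma\sqrt{n}}\right)^{-y/2} \\
&\quad = \left(1 - \frac{\mu^2}{\sigma^2 n}\right)^{nt/2 + r} \left(1 + \frac{\mu}{\sigma\sqrt{n}}\right)^{w\sqrt{n}/2} \left(1 - \frac{\mu}{\sigma\sqrt{n}}\right)^{-w\sqrt{n}/2} (1 + o(1)) \\
&\quad = \exp\left\{-\frac{t\mu^2}{2\sigma^2}\right\} \left(1 - \frac{\mu^2}{\sigma^2 n}\right)^{r} \exp\left\{\frac{\mu w}{2\sigma}\right\} \exp\left\{\frac{\mu w}{2\sigma}\right\} (1 + o(1)) \\
&\quad = \exp\left\{-\frac{t\mu^2}{2\sigma^2} + \frac{\mu w}{\sigma}\right\} \exp\left\{-\frac{r}{n}\left(\frac{\mu^2}{\sigma^2} + \varepsilon_n\right)\right\} (1 + o(1)),
\end{aligned} \tag{10.13}
$$

where $\varepsilon_n$ does not depend on $r$ and goes to zero as $n \to \infty$, and $o(1)$ represents
a term that goes to zero uniformly for all $r \ge 1$ as $n \to \infty$. The first passage
time $\tau_z$ to $z$ for $\{X_t\}$ is the same as the first passage time to $w = z/\sigma$ for the
process $\{W_t\} = \{X_t/\sigma\}$. It follows from (9.12), (9.13) that if

(i) $\mu < 0$ and $z > 0$ or

(ii) $\mu > 0$ and $z < 0$,

then there is a positive probability that the process $\{W_t\}$ will never reach $w = z/\sigma$
(i.e., $\tau_z = \infty$). On the other hand, the sum of the probabilities in (10.12) over
$N > [nt]$ only gives the probability that the random walk reaches $[w\sqrt{n}]$ in a
finite time greater than $[nt]$. By the FCLT, (10.6)-(10.8) and (10.12) and (10.13),
we have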

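$$
\begin{aligned}
P(t < \tau_z < \infty) &= \left(\frac{2}{\pi}\right)^{1/2} |w| \exp\left\{-\frac{t\mu^2}{2\sigma^2} + \frac{\mu w}{\sigma}\right\} \\
&\quad \times \lim_{n\to\infty} \sum_{r=1}^{\infty} \frac{1}{n (t + 2r/n)^{3/2}} \exp\left\{-\frac{w^2}{2(t + 2r/n)} - \frac{r}{n}\left(\frac{\mu^2}{\sigma^2} + \varepsilon_n\right)\right\} \\
&= \frac{|w|}{(2\pi)^{1/2}} \exp\left\{-\frac{t\mu^2}{2\sigma^2} + \frac{\mu w}{\sigma}\right\} \int_t^{\infty} \frac{1}{v^{3/2}} \exp\left\{-\frac{w^2}{2v} - \frac{\mu^2}{2\sigma^2}(v - t)\right\} dv \\
&= \frac{|w|}{(2\pi)^{1/2}} \exp\left\{\frac{\mu w}{\sigma}\right\} \int_t^{\infty} \frac{1}{v^{3/2}} \exp\left\{-\frac{w^2}{2v} - \frac{\mu^2}{2\sigma^2} v\right\} dv,
\end{aligned}
$$

for $w = z/\sigma$. (10.14)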


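Therefore, for $t > 0$,

$$
P(t < \tau_z < \infty) = \frac{|z|}{(2\pi\sigma^2)^{1/2}}\, e^{\mu z/\sigma^2} \int_t^{\infty} \frac{1}{v^{3/2}} \exp\left\{-\frac{z^2}{2\sigma^2 v} - \frac{\mu^2}{2\sigma^2} v\right\} dv. \tag{10.15}
$$

Differentiating this with respect to $t$ (and changing the sign), the probability
density function of $\tau_z$ is given by

$$
\begin{aligned}
f_{\sigma^2,\mu}(t) &= \frac{|z|}{(2\pi\sigma^2)^{1/2}\, t^{3/2}} \exp\left\{\frac{\mu z}{\sigma^2} - \frac{z^2}{2\sigma^2 t} - \frac{\mu^2}{2\sigma^2} t\right\} \\
&= \frac{|z|}{(2\pi\sigma^2)^{1/2}\, t^{3/2}} \exp\left\{-\frac{1}{2\sigma^2 t}(z - \mu t)^2\right\} \qquad (t > 0).
\end{aligned} \tag{10.16}
$$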

In particular, letting $p(t; 0, y)$ denote the p.d.f. (7.1) of the distribution of the
position $X_t$ at time $t$, (10.16) can be expressed as (see 4.6)

$$
f_{\sigma^2,\mu}(t) = \frac{|z|}{t}\, p(t; 0, z). \tag{10.17}
$$

As mentioned before, the integral of $f_{\sigma^2,\mu}(t)$ is less than 1 if either

(i) $\mu > 0$, $z < 0$ or
(ii) $\mu < 0$, $z > 0$.

In all other cases, (10.16) is a proper probability density function. By putting
$\mu = 0$ in (10.16), one gets (10.11).
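
The statement about the total mass of (10.16) can be illustrated numerically. The sketch below integrates the density over $(0, \infty)$ with a midpoint rule after the substitution $t = e^s$ (an implementation choice, not part of the text) and compares the result with 1 when μ and z have the same sign, and with $\exp\{2\mu z/\sigma^2\}$, the hitting probability from (9.12), (9.13), when they have opposite signs. Function names and integration limits are illustrative assumptions.

```python
import math

def drift_density(t, z, mu, sigma):
    """First passage density (10.16) for drift mu and diffusion coefficient sigma^2."""
    return (abs(z) / (math.sqrt(2.0 * math.pi * sigma**2) * t**1.5)
            * math.exp(-(z - mu * t) ** 2 / (2.0 * sigma**2 * t)))

def total_mass(z, mu, sigma, s_lo=-12.0, s_hi=12.0, steps=40000):
    """Midpoint rule for the integral of (10.16) over (0, inf), with t = exp(s)."""
    h = (s_hi - s_lo) / steps
    total = 0.0
    for i in range(steps):
        t = math.exp(s_lo + (i + 0.5) * h)
        total += drift_density(t, z, mu, sigma) * t * h  # dt = t ds
    return total

if __name__ == "__main__":
    print(total_mass(1.0, 1.0, 1.0))                    # drift toward z: proper density
    print(total_mass(1.0, -1.0, 1.0), math.exp(-2.0))   # drift away from z: defective
```

The deficit in the second case is the probability that the level z is never reached.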


Consider a simple symmetric random walk $\{S_n\}$ starting at zero. The problem
is to calculate the distribution of the last visit to zero by $S_0, S_1, \ldots, S_{2n}$. For
this we first calculate the probability that the number of $+1$'s exceeds the
number of $-1$'s until time $N$ and with a given positive value of the excess at
time $N$.

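Lemma 1. Let $a$, $b$ be two integers, $0 \le a < b$. Then

$$
\begin{aligned}
P(S_1 > 0, S_2 > 0, \ldots, S_{a+b-1} > 0, S_{a+b} = b - a)
&= \left[\binom{a+b-1}{b-1} - \binom{a+b-1}{b}\right] \left(\frac{1}{2}\right)^{a+b} \\
&= \binom{a+b}{b} \frac{b-a}{a+b} \left(\frac{1}{2}\right)^{a+b}.
\end{aligned} \tag{11.1}
$$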

Proof. Each of the $\binom{a+b}{b}$ paths from $(0, 0)$ to $(a + b, b - a)$ has probability
$(\tfrac{1}{2})^{a+b}$. We seek the number $M$ of those for which $S_1 = 1, S_2 > 0, S_3 > 0, \ldots,
S_{a+b-1} > 0, S_{a+b} = b - a$. Now the paths from $(1, 1)$ to $(a + b, b - a)$ that cross
or touch zero (the horizontal axis) are in one-one correspondence with those
that go from $(1, -1)$ to $(a + b, b - a)$. This correspondence is set up by reflecting
each path of the last type about zero (i.e., about the horizontal time axis) up
to the first time after time zero that zero is reached, and leaving the path from
then on unchanged. The reflected path leads from $(1, 1)$ to $(a + b, b - a)$ and
crosses or touches zero. Conversely, any path leading from $(1, 1)$ to
$(a + b, b - a)$ that crosses or touches zero, when reflected in the same manner,
yields a path from $(1, -1)$ to $(a + b, b - a)$. But the number of all paths from
$(1, -1)$ to $(a + b, b - a)$ is simply $\binom{a+b-1}{b}$, since it requires $b$ plus 1's and $a - 1$
minus 1's among $a + b - 1$ steps to go from $(1, -1)$ to $(a + b, b - a)$. Hence

$$
M = \binom{a+b-1}{b-1} - \binom{a+b-1}{b},
$$

since there are altogether $\binom{a+b-1}{b-1}$ paths from $(1, 1)$ to $(a + b, b - a)$. Now a
straightforward simplification yields

$$
M = \binom{a+b}{b} \frac{b-a}{a+b}.
$$
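
The count M in Lemma 1 can be confirmed by brute force for small a and b. The following Python sketch (hypothetical function names) enumerates all ±1 paths of length a + b, counts those that stay strictly positive and end at b − a, and compares the count with the difference of binomial coefficients obtained from the reflection argument.

```python
from itertools import product
from math import comb

def count_positive_paths(a, b):
    """Count +-1 paths of length a+b with S_1,...,S_{a+b} > 0 and S_{a+b} = b - a."""
    count = 0
    for steps in product((1, -1), repeat=a + b):
        s, ok = 0, True
        for x in steps:
            s += x
            if s <= 0:          # path touched or crossed zero
                ok = False
                break
        if ok and s == b - a:
            count += 1
    return count

def ballot_count(a, b):
    """M = C(a+b-1, b-1) - C(a+b-1, b), from the reflection argument."""
    return comb(a + b - 1, b - 1) - comb(a + b - 1, b)

if __name__ == "__main__":
    for a, b in [(1, 2), (2, 3), (2, 5), (3, 4)]:
        print(a, b, count_positive_paths(a, b), ballot_count(a, b))
```

Multiplying M by $(1/2)^{a+b}$ then gives the probability in Lemma 1.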

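Lemma 2. For the simple symmetric random walk starting at zero we have,

$$
P(S_1 \neq 0, S_2 \neq 0, \ldots, S_{2n} \neq 0) = P(S_{2n} = 0) = \binom{2n}{n} \left(\frac{1}{2}\right)^{2n}. \tag{11.2}
$$

Proof. By symmetry, the leftmost side of (11.2) equals

$$
\begin{aligned}
2P(S_1 > 0, S_2 > 0, \ldots, S_{2n} > 0)
&= 2 \sum_{r=1}^{n} P(S_1 > 0, S_2 > 0, \ldots, S_{2n-1} > 0, S_{2n} = 2r) \\
&= 2 \sum_{r=1}^{n} \left[\binom{2n-1}{n+r-1} - \binom{2n-1}{n+r}\right] \left(\frac{1}{2}\right)^{2n} \\
&= 2 \binom{2n-1}{n} \left(\frac{1}{2}\right)^{2n} = \binom{2n}{n} \left(\frac{1}{2}\right)^{2n} = P(S_{2n} = 0),
\end{aligned}
$$

where we have adopted the convention that $\binom{2n-1}{2n} = 0$ in writing the middle
equality.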

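Theorem 11.1. Let $\Gamma^{(m)} = \max\{j: 0 \le j \le m, S_j = 0\}$. Then

$$
\begin{aligned}
P(\Gamma^{(2n)} = 2k) &= P(S_{2k} = 0) P(S_{2n-2k} = 0) \\
&= \binom{2k}{k} \left(\frac{1}{2}\right)^{2k} \binom{2n-2k}{n-k} \left(\frac{1}{2}\right)^{2n-2k} \\
&= \frac{(2k)!\,(2n-2k)!}{(k!)^2 ((n-k)!)^2} \left(\frac{1}{2}\right)^{2n} \qquad \text{for } k = 0, 1, 2, \ldots, n.
\end{aligned} \tag{11.3}
$$

Proof. By considering the conditional probability given $\{S_{2k} = 0\}$ one can easily
justify that

$$
\begin{aligned}
P(\Gamma^{(2n)} = 2k) &= P(S_{2k} = 0, S_{2k+1} \neq 0, S_{2k+2} \neq 0, \ldots, S_{2n} \neq 0) \\
&= P(S_{2k} = 0) P(S_1 \neq 0, S_2 \neq 0, \ldots, S_{2n-2k} \neq 0) \\
&= P(S_{2k} = 0) P(S_{2n-2k} = 0),
\end{aligned}
$$

where the last equality uses Lemma 2.

Theorem 11.1 has the following symmetry relation as a corollary.

$$
P(\Gamma^{(2n)} = 2k) = P(\Gamma^{(2n)} = 2n - 2k) \qquad \text{for all } k = 0, 1, \ldots, n. \tag{11.4}
$$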
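
For small n, formula (11.3) and the symmetry (11.4) can be verified exactly by enumerating all $2^{2n}$ paths. The sketch below uses exact rational arithmetic; the function names are hypothetical.

```python
from itertools import product
from math import comb
from fractions import Fraction

def last_zero_distribution(n):
    """Exact law of the last zero before time 2n, by enumerating all 2^(2n) paths.

    Entry k of the returned list is P(last zero = 2k).
    """
    counts = [0] * (n + 1)
    for steps in product((1, -1), repeat=2 * n):
        s, last = 0, 0
        for j, x in enumerate(steps, start=1):
            s += x
            if s == 0:
                last = j        # zeros occur only at even times
        counts[last // 2] += 1
    return [Fraction(c, 4**n) for c in counts]

def arcsine_pmf(n, k):
    """Right side of (11.3): P(S_2k = 0) * P(S_(2n-2k) = 0)."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4**n)

if __name__ == "__main__":
    n = 4
    print(last_zero_distribution(n))
    print([arcsine_pmf(n, k) for k in range(n + 1)])
```

Both lists should coincide and be symmetric about k = n/2, with the largest probabilities at the two ends k = 0 and k = n.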

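Theorem 11.2. (The Arc Sine Law). Let $\{B_t\}$ be a standard Brownian motion
starting at zero. Let $\gamma = \sup\{t: 0 \le t \le 1, B_t = 0\}$. Then $\gamma$ has the probability density
function

$$
f(x) = \frac{1}{\pi (x(1-x))^{1/2}}, \qquad 0 < x < 1, \tag{11.5}
$$

so that

$$
P(\gamma \le x) = \int_0^x f(y)\, dy = \frac{2}{\pi} \sin^{-1}\sqrt{x}. \tag{11.6}
$$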

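Proof. Let $\{S_0 = 0, S_1, S_2, \ldots\}$ be a simple symmetric random walk. Define
$\{X_t^{(n)}\}$ as in (8.4). By the FCLT (Theorem 8.1) one has

$$
P(\gamma \le x) = \lim_{n\to\infty} P(\gamma^{(n)} \le x) \qquad (0 < x < 1),
$$

where

$$
\gamma^{(n)} = \sup\{t: 0 \le t \le 1, X_t^{(n)} = 0\} = \frac{1}{n} \sup\{m: 0 \le m \le n, S_m = 0\} = \frac{1}{n} \Gamma^{(n)}.
$$

In particular, taking the limit over even integers, it follows that

$$
P(\gamma \le x) = \lim_{n\to\infty} P\left(\frac{1}{2n} \Gamma^{(2n)} \le x\right) = \lim_{n\to\infty} P(\Gamma^{(2n)} \le 2nx),
$$

where $\Gamma^{(2n)}$ is defined in Theorem 11.1. By Theorem 11.1 and Stirling's
approximation,

$$
\begin{aligned}
\lim_{n\to\infty} P(\Gamma^{(2n)} \le 2nx)
&= \lim_{n\to\infty} \sum_{k=0}^{[nx]} \frac{(2k)!\,(2n-2k)!}{(k!)^2 ((n-k)!)^2}\, 2^{-2n} \\
&= \lim_{n\to\infty} \sum_{k=0}^{[nx]} \frac{(2\pi)^{1/2} e^{-2k} (2k)^{2k+1/2}}{\left((2\pi)^{1/2} e^{-k} k^{k+1/2}\right)^2} \times \frac{(2\pi)^{1/2} e^{-2(n-k)} (2(n-k))^{2(n-k)+1/2}\, 2^{-2n}}{\left((2\pi)^{1/2} e^{-(n-k)} (n-k)^{n-k+1/2}\right)^2} \\
&= \lim_{n\to\infty} \sum_{k=0}^{[nx]} \frac{1}{\pi \sqrt{k}\, \sqrt{n-k}} \\
&= \frac{1}{\pi} \lim_{n\to\infty} \sum_{k=0}^{[nx]} \frac{1}{n} \cdot \frac{1}{\left(\frac{k}{n}\left(1 - \frac{k}{n}\right)\right)^{1/2}} = \frac{1}{\pi} \int_0^x \frac{1}{(y(1-y))^{1/2}}\, dy.
\end{aligned}
$$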
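
The convergence used in this proof can be observed numerically. The sketch below (hypothetical function names) sums the exact probabilities (11.3) up to $k = [nx]$ and compares the result with the limit $(2/\pi)\sin^{-1}\sqrt{x}$ from (11.6). The return probabilities $P(S_{2k} = 0)$ are generated by the recursion $u_k = u_{k-1}(2k-1)/(2k)$, an implementation choice that avoids large factorials.

```python
import math

def return_probs(n):
    """u[k] = P(S_2k = 0) = C(2k, k) 2^(-2k), via u[k] = u[k-1] * (2k-1)/(2k)."""
    u = [1.0]
    for k in range(1, n + 1):
        u.append(u[-1] * (2 * k - 1) / (2 * k))
    return u

def last_zero_cdf(n, x):
    """P(last zero before 2n is <= 2nx), summing (11.3) over k = 0, ..., [nx]."""
    u = return_probs(n)
    return sum(u[k] * u[n - k] for k in range(int(n * x) + 1))

def arcsine_cdf(x):
    """Limit (11.6): (2/pi) * arcsin(sqrt(x))."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))

if __name__ == "__main__":
    for n in (10, 100, 2000):
        print(n, last_zero_cdf(n, 0.25), arcsine_cdf(0.25))
```

The discrete values approach the arcsine limit slowly, which reflects the integrable singularity of the density (11.5) at the end points.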

The following (invariance) corollary follows by applying the FCLT.

Corollary 11.3. Let $\{Z_1, Z_2, \ldots\}$ be a sequence of i.i.d. random variables such
that $EZ_1 = 0$, $EZ_1^2 = 1$. Then, defining $\{X_t^{(n)}\}$ as in (8.4) and $\gamma^{(n)}$ as above, one has

$$
\lim_{n\to\infty} P(\gamma^{(n)} \le x) = \frac{2}{\pi} \sin^{-1}\sqrt{x}. \tag{11.7}
$$

From the arcsine law of the time of the last visit to zero it is also possible
to show that the length of time in $[0, 1]$ that the standard Brownian
motion spends on the positive side of the origin (i.e., an occupation time)
also has an arcsine distribution. This fact is recorded in the following
corollary (Exercise 2).

Corollary 11.4. Let $U = |\{t \le 1: B_t \in \mathbb{R}_+\}|$, where $|\cdot|$ denotes Lebesgue
measure and $\{B_t\}$ is standard Brownian motion starting at 0. Then,

$$
P(U \le x) = \frac{2}{\pi} \sin^{-1}\sqrt{x}. \tag{11.8}
$$



Let $\{B_t\}$ be a standard Brownian motion starting at zero. Since $B_t - tB_1$ vanishes
for $t = 0$ and $t = 1$, the stochastic process $\{B_t^*\}$ defined by

$$
B_t^* := B_t - tB_1, \qquad 0 \le t \le 1, \tag{12.1}
$$

is called the Brownian bridge or the tied-down Brownian motion.

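Since $\{B_t\}$ is a Gaussian process with independent increments, it is simple
to check that $\{B_t^*\}$ is a Gaussian process; i.e., its finite dimensional distributions
are Gaussian. Also,

$$
EB_t^* = 0, \tag{12.2}
$$

$$
\begin{aligned}
\operatorname{Cov}(B_{t_1}^*, B_{t_2}^*) &= \operatorname{Cov}(B_{t_1}, B_{t_2}) - t_2 \operatorname{Cov}(B_{t_1}, B_1) - t_1 \operatorname{Cov}(B_{t_2}, B_1) + t_1 t_2 \operatorname{Cov}(B_1, B_1) \\
&= t_1 - t_2 t_1 - t_1 t_2 + t_1 t_2 = t_1(1 - t_2), \qquad \text{for } t_1 \le t_2.
\end{aligned} \tag{12.3}
$$

From this one can also write down the joint normal density of $(B_{t_1}^*, B_{t_2}^*, \ldots, B_{t_k}^*)$
for arbitrary $0 < t_1 < t_2 < \cdots < t_k < 1$ (Exercise 1).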
The Brownian bridge arises quite naturally in the asymptotic theory of
statistics. To explain this application, let us consider a sequence of real-valued
i.i.d. random variables $Y_1, Y_2, \ldots$, having a (common) distribution function $F$.
The $n$th empirical distribution is the discrete probability distribution on the line
assigning a probability $1/n$ to each of the $n$ values $Y_1, Y_2, \ldots, Y_n$. The
corresponding distribution function $F_n$ is called the ($n$th) empirical distribution
function:

$$
F_n(t) = \frac{1}{n}\, \#\{j: 1 \le j \le n,\ Y_j \le t\}, \qquad -\infty < t < \infty, \tag{12.4}
$$

where $\# A$ denotes the cardinality of the set $A$.

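Suppose $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$ is the ordering of the first $n$ observations.
Figure 12.1 illustrates $F_5$. Note that $\{F_n(t): t \ge 0\}$ is for each $n$ a stochastic
process.

[Figure 12.1: the empirical distribution function $F_5$, a step function in $t$ with jumps at the order statistics $Y_{(1)} = Y_2$, $Y_{(2)} = Y_5$, $Y_{(3)} = Y_1$, $Y_{(4)} = Y_3$, $Y_{(5)} = Y_4$.]

Now for each $t$ the random variable

$$
nF_n(t) = \sum_{j=1}^{n} \mathbf{1}_{\{Y_j \le t\}} \tag{12.5}
$$

is the sum of $n$ i.i.d. Bernoulli random variables each taking the value 1 with
probability $F(t) = P(Y_j \le t)$ and the value 0 with probability $1 - F(t)$. Now
$E(\mathbf{1}_{\{Y_j \le t\}}) = F(t)$ and, for $t_1 \le t_2$,

$$
\operatorname{Cov}(\mathbf{1}_{\{Y_j \le t_1\}}, \mathbf{1}_{\{Y_k \le t_2\}}) =
\begin{cases}
F(t_1)(1 - F(t_2)), & \text{if } j = k, \\
0, & \text{if } j \neq k,
\end{cases} \tag{12.6}
$$

since in the case $j = k$,

$$
\begin{aligned}
\operatorname{Cov}(\mathbf{1}_{\{Y_j \le t_1\}}, \mathbf{1}_{\{Y_j \le t_2\}}) &= E(\mathbf{1}_{\{Y_j \le t_1\}} \mathbf{1}_{\{Y_j \le t_2\}}) - E(\mathbf{1}_{\{Y_j \le t_1\}}) E(\mathbf{1}_{\{Y_j \le t_2\}}) \\
&= F(t_1) - F(t_1)F(t_2) = F(t_1)(1 - F(t_2)).
\end{aligned}
$$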

It follows from the central limit theorem that

$$
n^{-1/2} \left(\sum_{j=1}^{n} \mathbf{1}_{\{Y_j \le t\}} - nF(t)\right) = \sqrt{n}\,(F_n(t) - F(t))
$$

is asymptotically (as $n \to \infty$) Gaussian with mean zero and variance
$F(t)(1 - F(t))$. For $t_1 < t_2 < \cdots < t_k$, the multidimensional central limit
theorem applied to the i.i.d. sequence of $k$-dimensional random vectors
$(\mathbf{1}_{\{Y_j \le t_1\}}, \mathbf{1}_{\{Y_j \le t_2\}}, \ldots, \mathbf{1}_{\{Y_j \le t_k\}})$ shows that $(\sqrt{n}(F_n(t_1) - F(t_1)), \sqrt{n}(F_n(t_2) - F(t_2)),
\ldots, \sqrt{n}(F_n(t_k) - F(t_k)))$ is asymptotically ($k$-dimensional) Gaussian with zero
mean and dispersion matrix $\Sigma = ((\sigma_{ij}))$, where

$$
\sigma_{ij} = \operatorname{Cov}(\mathbf{1}_{\{Y_1 \le t_i\}}, \mathbf{1}_{\{Y_1 \le t_j\}}) = F(t_i)(1 - F(t_j)), \qquad \text{for } t_i \le t_j. \tag{12.7}
$$

In the special case of observations from the uniform distribution on $[0, 1]$,
one has

$$
F(t) = t, \qquad 0 \le t \le 1, \tag{12.8}
$$

so that the finite-dimensional distributions of the stochastic process
$\sqrt{n}(F_n(t) - F(t))$ converge to those of the Brownian bridge as $n \to \infty$. As in
the case of the functional central limit theorem (FCLT), probabilities of many
infinite-dimensional events of interest also converge to those of the Brownian
bridge (theoretical complements 1 and 2). The precise statement of such a result
is as follows.

Proposition 12.1. If $Y_1, Y_2, \ldots$ is an i.i.d. sequence having the uniform
distribution on $[0, 1]$, then the normalized empirical process $\{\sqrt{n}(F_n(t) - t):
0 \le t \le 1\}$ converges in distribution to the Brownian bridge as $n \to \infty$.

Let $Y_1, Y_2, \ldots$ be an i.i.d. sequence having a (common) distribution function
$F$ that is continuous on the real number line. Note that in the case that $F$ is
strictly increasing on an interval $(a, b)$ with $F(a) = 0$, $F(b) = 1$, one has for
$0 < t < 1$,

$$
P(F(Y_k) \le t) = P(Y_k \le F^{-1}(t)) = F(F^{-1}(t)) = t, \tag{12.9}
$$

so the sequence $U_1 = F(Y_1), U_2 = F(Y_2), \ldots$ is i.i.d. uniform on $[0, 1]$. The same
is true more generally (Exercise 2). Let $F_n$ be the empirical distribution function
of $Y_1, \ldots, Y_n$, and $G_n$ that of $U_1, \ldots, U_n$. Then, since the proportion of $Y_k$'s,
$1 \le k \le n$, that do not exceed $t$ coincides with the proportion of $U_k$'s, $1 \le k \le n$,
that do not exceed $F(t)$, we have

$$
\sqrt{n}\,[F_n(t) - F(t)] = \sqrt{n}\,[G_n(F(t)) - F(t)], \qquad a \le t \le b. \tag{12.10}
$$

If $a = -\infty$ ($b = +\infty$), the index set $[a, b]$ for the process is to exclude $a$ ($b$).
Since $\sqrt{n}(G_n(t) - t)$, $0 \le t \le 1$, converges in distribution to the Brownian bridge,
and since $t \to F(t)$ is increasing on $(a, b)$, one derives the following extension
of Proposition 12.1.

Proposition 12.2. Let $Y_1, Y_2, \ldots$ be a sequence of i.i.d. real-valued random
variables with continuous distribution function $F$ on $(a, b)$ where $F(a) = 0$,
$F(b) = 1$. Then the normalized empirical process $\sqrt{n}(F_n(t) - F(t))$, $a \le t \le b$,
converges in distribution to the Gaussian process $\{Z_t\} := \{B^*_{F(t)}: a \le t \le b\}$, as
$n \to \infty$.

It also follows from (12.10) that the Kolmogorov-Smirnov statistic defined
by $D_n := \sup\{\sqrt{n}\,|F_n(t) - F(t)|: a \le t \le b\}$ satisfies

$$
D_n = \sup_{a \le t \le b} \sqrt{n}\,|F_n(t) - F(t)| = \sup_{a \le t \le b} \sqrt{n}\,|G_n(F(t)) - F(t)| = \sup_{0 \le t \le 1} \sqrt{n}\,|G_n(t) - t|. \tag{12.11}
$$

Thus, the distribution of $D_n$ is the same (namely that obtained under the uniform
distribution) for all continuous $F$. This common distribution has been tabulated
for small and moderately large values of $n$ (see theoretical complement 2). By
Proposition 12.2, for large $n$, the distribution is approximately the same as that
of the statistic defined by (also see theoretical complement 1)

$$
D := \sup_{0 \le t \le 1} |B_t^*|. \tag{12.12}
$$

A calculation of the distribution of $D$ yields (theoretical complement 3 and
Exercise 4*(iii))

$$
P(D > d) = 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 d^2}, \qquad d > 0. \tag{12.13}
$$

These facts are often used to test the statistical hypothesis that observations
$Y_1, Y_2, \ldots, Y_n$ are from a specified distribution with a continuous distribution
function $F$. If the observed value, say $d$, of $D_n$ is so large that the probability
(approximated by (12.13) for large $n$) is very small for a value of $D_n$ as large as
or larger than $d$ to occur (under the assumption that $Y_1, \ldots, Y_n$ do come from
$F$), then the hypothesis is rejected.
In closing, note that by the strong law of large numbers, $F_n(t) \to F(t)$ as
$n \to \infty$, with probability 1. From the FCLT, it follows that

$$
\sup_{-\infty < t < \infty} |F_n(t) - F(t)| \to 0 \quad \text{in probability as } n \to \infty. \tag{12.14}
$$

In fact, it is possible to show that the uniform convergence in (12.14) is also
almost sure (Exercise 8). This stronger result is known as the Glivenko-Cantelli
lemma.
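
A minimal Python sketch of this test procedure follows. It computes the statistic $\sqrt{n}\,\sup_t |F_n(t) - F(t)|$ from a sample by checking both sides of each jump of the empirical distribution function, and approximates the tail probability with a truncated version of the series (12.13). The function names, the truncation at 100 terms, and the sample values are illustrative assumptions, and the series is only an asymptotic (large n) approximation.

```python
import math

def ks_statistic(sample, cdf):
    """sqrt(n) * sup_t |F_n(t) - F(t)| for a hypothesized continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    sup = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at the i-th order statistic,
        # so the supremum is attained just before or at a jump.
        sup = max(sup, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * sup

def kolmogorov_tail(d, terms=100):
    """Truncated series (12.13): 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 d^2), d > 0."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * d * d)
                     for k in range(1, terms + 1))

if __name__ == "__main__":
    sample = [0.7, 0.1, 0.4]                 # hypothetical observations
    d = ks_statistic(sample, lambda t: t)    # test against the uniform cdf F(t) = t
    print(d, kolmogorov_tail(d))
    print(kolmogorov_tail(1.36))             # tail probability near the usual 5% level
```

A small value of `kolmogorov_tail(d)` at the observed d would lead to rejection of the hypothesized F.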

An extremely useful concept in probability is that of a stopping time, sometimes
also called a Markov time. Consider a sequence of random variables
$\{X_n: n = 0, 1, \ldots\}$, defined on some probability space $(\Omega, \mathcal{F}, P)$. Stopping times
with respect to $\{X_n\}$ are defined as follows. Denote by $\mathcal{F}_n$ the sigmafield
$\sigma\{X_0, \ldots, X_n\}$ comprising all events that depend only on $\{X_0, X_1, \ldots, X_n\}$.

Definition 13.1. A stopping time $\tau$ for the process $\{X_n\}$ is a random variable
taking nonnegative integer values, including possibly the value $+\infty$, such that

$$
\{\tau \le n\} \in \mathcal{F}_n \qquad (n = 0, 1, \ldots). \tag{13.1}
$$

Observe that (13.1) is equivalent to the condition

$$
\{\tau = n\} \in \mathcal{F}_n \qquad (n = 0, 1, \ldots), \tag{13.2}
$$

since $\mathcal{F}_n$ are increasing sigmafields (i.e., $\mathcal{F}_n \subset \mathcal{F}_{n+1}$) and $\tau$ is integer-valued.

Informally, (13.1) says that, using $\tau$, the decision to stop or not to stop by
time $n$ depends only on the observations $X_0, X_1, \ldots, X_n$.

An important example of a stopping time is the first passage time $\tau_B$ to a
(Borel) set $B \subset \mathbb{R}^1$,

$$
\tau_B := \min\{n \ge 0: X_n \in B\}. \tag{13.3}
$$

If $X_n$ does not lie in $B$ for any $n$, one takes $\tau_B = \infty$. Sometimes the minimum
in (13.3) is taken over $\{n \ge 1, X_n \in B\}$, in which case we call it the first return
time to $B$, denoted $\eta_B$.
A less interesting but useful example of a stopping time is a constant time,

$$
\tau := m, \tag{13.4}
$$

where $m$ is a fixed positive integer.

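One may define, for every positive integer $r$, the $r$th passage time $\tau_B^{(r)}$
recursively, by

$$
\begin{aligned}
\tau_B^{(r)} &:= \min\{n > \tau_B^{(r-1)}: X_n \in B\} \qquad (r = 2, \ldots), \\
\tau_B^{(1)} &:= \tau_B.
\end{aligned} \tag{13.5}
$$

Again, if $X_n$ does not lie in $B$ for any $n > \tau_B^{(r-1)}$, take $\tau_B^{(r)} = \infty$. Also note that
if $\tau_B^{(r)} = \infty$ for some $r$ then $\tau_B^{(r')} = \infty$ for all $r' \ge r$. It is a simple exercise to check
that each $\tau_B^{(r)}$ is a stopping time (Exercise 1).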
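
The recursive definition of the passage times can be sketched in Python for a finite observed path. The function name `passage_times` and the predicate argument `in_B` are hypothetical. On a finite path, a value of `math.inf` only means that the visit was not observed within the available data, which is an assumption of the sketch rather than a statement about the infinite sequence.

```python
import math

def passage_times(path, in_B, r_max):
    """Return the first r_max passage times to B along path = [X_0, X_1, ...].

    Each time is found by scanning forward from just after the previous one,
    so deciding whether the r-th passage has occurred by time n uses only
    X_0, ..., X_n, in line with the stopping time property (13.1).
    """
    times = []
    start = 0  # the first passage time is a minimum over n >= 0, as in (13.3)
    for _ in range(r_max):
        hit = math.inf
        for n in range(start, len(path)):
            if in_B(path[n]):
                hit = n
                break
        times.append(hit)
        # Once a passage time is infinite, all later ones are infinite too.
        start = len(path) if hit == math.inf else hit + 1
    return times

if __name__ == "__main__":
    path = [0, 1, 2, 1, 0, 3, 2]
    print(passage_times(path, lambda x: x == 2, 3))
```

In this example the set B is {2}, which the path visits at times 2 and 6 only.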

The usefulness of the concept of stopping times will now be illustrated by a
result that in gambling language says that in a fair game the gambler has no
winning strategy. To be precise, consider a sequence $\{X_n: n = 0, 1, \ldots\}$ of
independent random variables, $S_n = X_0 + X_1 + \cdots + X_n$. Obviously, if
$EX_n = 0$, $n \ge 1$, then $ES_n = ES_0$ for each $n$. We will now consider an extension
of this property when $n$ is replaced by certain stopping times. Since $\{X_0, \ldots, X_n\}$
and $\{S_0, S_1, \ldots, S_n\}$ determine each other, stopping times for $\{S_n\}$ are the same
as those for $\{X_n\}$.

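Theorem 13.1. Let $\tau$ be a stopping time for the process $\{S_n\}$. If

1. $EX_n = 0$ for $n \ge 1$, $E|X_0| \equiv E|S_0| < \infty$,
2. $P(\tau < \infty) = 1$,
3. $E|S_\tau| < \infty$, and
4. $E(S_m \mathbf{1}_{\{\tau > m\}}) \to 0$ as $m \to \infty$,

then

$$
ES_\tau = ES_0. \tag{13.6}
$$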

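Proof. First assume $\tau \le m$ for some integer $m$. Then

$$
\begin{aligned}
ES_\tau &= E(S_0 \mathbf{1}_{\{\tau = 0\}}) + E(S_1 \mathbf{1}_{\{\tau = 1\}}) + \cdots + E(S_m \mathbf{1}_{\{\tau = m\}}) \\
&= E(X_0 \mathbf{1}_{\{\tau \ge 0\}}) + E(X_1 \mathbf{1}_{\{\tau \ge 1\}}) + \cdots + E(X_j \mathbf{1}_{\{\tau \ge j\}}) + \cdots + E(X_m \mathbf{1}_{\{\tau \ge m\}}) \\
&= EX_0 + E(X_1 \mathbf{1}_{\{\tau \ge 1\}}) + \cdots + E(X_j \mathbf{1}_{\{\tau \ge j\}}) + \cdots + E(X_m \mathbf{1}_{\{\tau \ge m\}}).
\end{aligned} \tag{13.7}
$$

Now $\{\tau \ge j\} = \{\tau < j\}^c = \{\tau \le j - 1\}^c$ depends only on $X_0, X_1, \ldots, X_{j-1}$.
Therefore,

$$
\begin{aligned}
E(X_j \mathbf{1}_{\{\tau \ge j\}}) &= E[\mathbf{1}_{\{\tau \ge j\}} E(X_j \mid \{X_0, \ldots, X_{j-1}\})] \\
&= E[\mathbf{1}_{\{\tau \ge j\}} E(X_j)] = 0 \qquad (j = 1, 2, \ldots, m),
\end{aligned} \tag{13.8}
$$

so that (13.7) reduces to

$$
ES_\tau = EX_0 = ES_0. \tag{13.9}
$$

To prove the general result, define

$$
\tau_m := \tau \wedge m = \min\{\tau, m\}, \tag{13.10}
$$

and check that $\tau_m$ is a stopping time (Exercise 2). By (13.9),

$$
ES_{\tau_m} = ES_0 \qquad (m = 1, 2, \ldots). \tag{13.11}
$$

Since $\tau = \tau_m$ on the set $\{\tau \le m\}$, one has

$$
\begin{aligned}
|ES_\tau - ES_0| &= |E(S_\tau - S_{\tau_m})| = |E((S_\tau - S_m) \mathbf{1}_{\{\tau > m\}})| \\
&\le |E(S_\tau \mathbf{1}_{\{\tau > m\}})| + |E(S_m \mathbf{1}_{\{\tau > m\}})|.
\end{aligned} \tag{13.12}
$$

The first term on the right side of the inequality in (13.12) goes to zero as
$m \to \infty$, by assumptions (2), (3) (Exercise 3), while the second term goes to
zero by assumption (4). □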

Assumptions (2), (3) ensure that $ES_\tau$ is well defined and finite. Assumption
(4) is of a technical nature, but cannot be dispensed with. To demonstrate this,
consider a simple symmetric random walk $\{S_n\}$ starting at zero (i.e., $S_0 = 0$).
Write $\tau_y$ for $\tau_{\{y\}}$, the first passage time to the state $y$, $y \neq 0$. Then (1), (2), (3)
are satisfied. But

$$
ES_{\tau_y} = y \neq 0. \tag{13.13}
$$

The reason (13.6) does not hold in this case is that assumption (4) is violated
(see Exercise 4). If, on the other hand,

$$
\tau = \min\{n: S_n = -a \text{ or } b\}, \tag{13.14}
$$

where $a$ and $b$ are positive integers, then $P(\tau < \infty) = 1$. There are various ways
of proving this last assertion. A more general result, namely, Proposition 13.4,
is proved later in the section to take care of this condition. To check condition
(3) of Theorem 13.1, note that $|S_\tau| \le \max\{a, b\}$, so that $E|S_\tau| \le \max\{a, b\}$. Also,
on the set $\{\tau > m\}$ one has $-a < S_m < b$ and therefore

$$
\begin{aligned}
|E(S_m \mathbf{1}_{\{\tau > m\}})| &\le \max\{a, b\}\, E|\mathbf{1}_{\{\tau > m\}}| \\
&= \max\{a, b\}\, P\{\tau > m\} \to 0 \quad \text{as } m \to \infty.
\end{aligned} \tag{13.15}
$$

Thus condition (4) is verified. Hence the conclusion (13.6) of Theorem 13.1
holds. This means

$$
\begin{aligned}
0 = ES_\tau &= -aP(\tau_{-a} < \tau_b) + bP(\tau_{-a} > \tau_b) \\
&= -a(1 - P(\tau_{-a} > \tau_b)) + bP(\tau_{-a} > \tau_b),
\end{aligned} \tag{13.16}
$$

which may be solved for $P(\tau_{-a} > \tau_b)$ to yield

$$
P(\tau_{-a} > \tau_b) = \frac{a}{a + b}, \tag{13.17}
$$

a result that was obtained by a different method earlier (see Chapter I, Eq. 3.13).
To deal with the case $EX_n \neq 0$, as is the case with the simple asymmetric
random walk, the following corollary to Theorem 13.1 is useful.

Corollary 13.2. (Wald's Equation). Let $\{Y_n: n = 1, 2, \ldots\}$ be a sequence of i.i.d.
random variables with $EY_n = \mu$. Let $S_n' = Y_1 + \cdots + Y_n$, and let $\tau$ be a stopping
time for the process $\{S_n': n = 1, 2, \ldots\}$ such that
2'. $E\tau < \infty$,
3'. $E|S_\tau'| < \infty$, and
4'. $\lim_{m\to\infty} |E(S_m' \mathbf{1}_{\{\tau > m\}})| = 0$.

Then

$$
ES_\tau' = \mu\,(E\tau). \tag{13.18}
$$

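Proof. To prove this, simply set $X_0 = 0$, $X_n = Y_n - \mu$ ($n \ge 1$), and
$S_n = X_1 + \cdots + X_n = S_n' - n\mu$, and apply Theorem 13.1 to get

$$
0 = ES_\tau = E(S_\tau' - \tau\mu) = ES_\tau' - E(\tau)\mu, \tag{13.19}
$$

which yields (13.18). Note that $E|S_\tau| \le E|S_\tau'| + (E\tau)|\mu| < \infty$, by (2') and (3'). Also

$$
\begin{aligned}
|E(S_m \mathbf{1}_{\{\tau > m\}})| &\le |E(S_m' \mathbf{1}_{\{\tau > m\}})| + E|\tau\mu \mathbf{1}_{\{\tau > m\}}| \\
&= |E(S_m' \mathbf{1}_{\{\tau > m\}})| + |\mu|\, E(\tau \mathbf{1}_{\{\tau > m\}}) \to 0.
\end{aligned}
$$

□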

For an application of Corollary 13.2, consider the case $Y_n = +1$ or $-1$ with
probabilities $p$ and $q = 1 - p$, respectively, $\tfrac{1}{2} < p < 1$. Let $\tau = \min\{n \ge 1:
S_n' = -a \text{ or } b\}$, where $a$, $b$ are positive integers. Then (13.19) yields

$$
-aP(\tau_{-a} < \tau_b) + bP(\tau_{-a} > \tau_b) = (E\tau)(p - q). \tag{13.20}
$$

From this one gets

$$
E\tau = \frac{(b + a)P(\tau_{-a} > \tau_b) - a}{p - q}. \tag{13.21}
$$

Making use of the evaluation

$$
P(\tau_{-a} > \tau_b) = \frac{1 - (q/p)^a}{1 - (q/p)^{a+b}}
$$

(Chapter I, Eq. 3.6) one has

$$
E\tau = \frac{b + a}{p - q} \cdot \frac{1 - (q/p)^a}{1 - (q/p)^{a+b}} - \frac{a}{p - q}. \tag{13.22}
$$

Assumption (2') for this case follows from Proposition 13.4 below, while (3'),
(4'), follow exactly as in the case of the simple symmetric random walk (see
Eq. 13.15).
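
The formulas (13.17) and (13.22) can be checked against a direct numerical computation. The sketch below (hypothetical function names) propagates the law of the walk on the interior states of $(-a, b)$ and accumulates $E\tau = \sum_{m \ge 0} P(\tau > m)$. The truncation at `max_steps` is an assumption of the sketch; the neglected tail is geometrically small.

```python
def exit_law(a, b, p, max_steps=5000):
    """Return (P(hit b before -a), E tau) for a +-1 walk from 0 absorbed at -a or b."""
    q = 1.0 - p
    mass = {0: 1.0}        # law of S_m restricted to the event {tau > m}
    hit_b = 0.0
    expected_tau = 0.0
    for _ in range(max_steps):
        expected_tau += sum(mass.values())   # adds P(tau > m)
        new_mass = {}
        for s, w in mass.items():
            for step, prob in ((1, p), (-1, q)):
                t = s + step
                if t == b:
                    hit_b += w * prob        # absorbed at the upper barrier
                elif t != -a:
                    new_mass[t] = new_mass.get(t, 0.0) + w * prob
        mass = new_mass
    return hit_b, expected_tau

def wald_expected_tau(a, b, p):
    """Formula (13.22) for E tau, valid for p != 1/2."""
    q = 1.0 - p
    hit_b = (1.0 - (q / p) ** a) / (1.0 - (q / p) ** (a + b))
    return ((b + a) * hit_b - a) / (p - q)

if __name__ == "__main__":
    print(exit_law(2, 3, 0.5))                       # compare with a/(a+b) from (13.17)
    print(exit_law(2, 3, 0.6), wald_expected_tau(2, 3, 0.6))
```

In the symmetric case the computed exit probability should agree with $a/(a+b)$, and in the asymmetric case the computed $E\tau$ should agree with (13.22).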
In the proof of Theorem 13.1, the only property of the sequence $\{X_n\}$ that
is made use of is the property

$$
E(X_{n+1} \mid \{X_0, X_1, \ldots, X_n\}) = 0 \qquad (n = 0, 1, 2, \ldots). \tag{13.23}
$$

Theorem 13.1 therefore remains valid if assumption (1) is replaced by (13.23),
and (2)-(4) hold. No other assumption (such as independence) concerning the
random variables is needed. This motivates the following definition.

Definition 13.2. A sequence $\{X_n: n = 0, 1, 2, \ldots\}$ satisfying (13.23) is called a
sequence of martingale differences, and the sequence of their partial sums, $\{S_n\}$,
is called a martingale.

Note that (13.23) is equivalent to

$$
E(S_{n+1} \mid \{S_0, \ldots, S_n\}) = S_n \qquad (n = 0, 1, 2, \ldots), \tag{13.24}
$$

since

$$
\begin{aligned}
E(S_{n+1} \mid \{S_0, S_1, \ldots, S_n\}) &= E(S_{n+1} \mid \{X_0, X_1, \ldots, X_n\}) \\
&= E(S_n + X_{n+1} \mid \{X_0, X_1, \ldots, X_n\}) \\
&= S_n + E(X_{n+1} \mid \{X_0, X_1, \ldots, X_n\}) = S_n.
\end{aligned} \tag{13.25}
$$

Conversely, if (13.24) holds for a sequence of random variables $\{S_n:
n = 0, 1, 2, \ldots\}$, then the sequence $\{X_n: n = 0, 1, 2, \ldots\}$ defined by

$$
X_n = S_n - S_{n-1} \quad (n = 1, 2, 3, \ldots), \qquad X_0 = S_0, \tag{13.26}
$$

satisfies (13.23).
Martingales necessarily have constant expected values. Likewise, if {X} is a
martingale difference sequence, then EX = 0 for each n >, 1. Theorem 13.1,
Corollary 13.2, and Theorem 13.3 below, assert that this constancy of
expectations of a martingale continues to hold at appropriate stopping times.
In the gambling setting, the martingale property (13.24), or (13.23), is often
taken as the definition of a fair game, since whatever be the outcomes of the
first n plays, the expected net gain at the (n + 1)st play is zero. As an example
of a strategy for the gambler, suppose that it is decided not to stop until an
amount $a$ is lost or an amount $b$ is gained, whichever comes first. Under (13.23)
and conditions (2)-(4) of Theorem 13.1, the expected gain at the end of the
game is still zero. This conclusion holds for more general stopping times, as
stated in Theorem 13.3 below. Before this result is stated, it would be useful to
extend the definition of a martingale somewhat. To motivate this new definition,
consider a sequence of i.i.d. random variables $\{Y_n: n = 1, 2, \ldots\}$ such that
$EY_n = 0$, $EY_n^2 = \operatorname{Var}(Y_n) = 1$. Then $\{S_n^2 - n: n = 0, 1, 2, \ldots\}$ is a martingale,
where $S_n = S_0 + Y_1 + \cdots + Y_n$ and $S_0$ is an arbitrary random variable independent
of $\{Y_n: n = 1, 2, \ldots\}$ satisfying $ES_0^2 < \infty$.
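To see this, form the difference sequence

$$
\begin{aligned}
X_{n+1} &:= S_{n+1}^2 - (n + 1) - (S_n^2 - n) = Y_{n+1}^2 + 2S_n Y_{n+1} - 1 \qquad (n = 0, 1, 2, \ldots), \\
X_0 &:= S_0^2.
\end{aligned} \tag{13.27}
$$

Then, writing $Y_0 = S_0$,

$$
\begin{aligned}
E(X_{n+1} \mid \{Y_0, Y_1, Y_2, \ldots, Y_n\})
&= E(Y_{n+1}^2 \mid \{Y_0, Y_1, \ldots, Y_n\}) + 2S_n E(Y_{n+1} \mid \{Y_0, Y_1, \ldots, Y_n\}) - 1 \\
&= E(Y_{n+1}^2) + 2S_n E(Y_{n+1}) - 1 = 1 + 0 - 1 = 0.
\end{aligned} \tag{13.28}
$$

Since $X_0, X_1, X_2, \ldots, X_n$ are determined by (i.e., are Borel measurable
functions of) $Y_0, Y_1, Y_2, \ldots, Y_n$, it follows that

$$
\begin{aligned}
E(X_{n+1} \mid \{X_0, X_1, X_2, \ldots, X_n\})
&= E[E(X_{n+1} \mid \{Y_0, Y_1, \ldots, Y_n\}) \mid \{X_0, X_1, X_2, \ldots, X_n\}] = 0,
\end{aligned} \tag{13.29}
$$

by (13.28). Thus,

$$
E(X_{n+1} \mid \{Y_0, Y_1, \ldots, Y_n\}) = 0 \qquad (n = 1, 2, \ldots) \tag{13.30}
$$

implies that

$$
E(X_{n+1} \mid \{X_0, X_1, \ldots, X_n\}) = 0 \qquad (n = 1, 2, \ldots). \tag{13.31}
$$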

In general, however, the converse is not true; namely, (13.31) does not imply
(13.30). To understand this better, consider a sequence of random variables
$\{Y_n: n = 0, 1, 2, \ldots\}$. Suppose that $\{X_n: n = 0, 1, 2, \ldots\}$ is another sequence of
random variables such that, for every $n$, $X_0, X_1, X_2, \ldots, X_n$ can be expressed
as functions of $Y_0, Y_1, Y_2, \ldots, Y_n$. Also assume $EX_n^2 < \infty$ for all $n$. The condition
(13.31) implies that $X_{n+1}$ is orthogonal to all square integrable functions of
$X_0, X_1, \ldots, X_n$, while (13.30) implies that $X_{n+1}$ is orthogonal to all square
integrable functions of $Y_0, Y_1, \ldots, Y_n$ (Chapter 0, Eq. 4.20). The latter class of
functions is larger than the former class. Property (13.30) is therefore stronger
than property (13.31).
One may express (13.30) as

$$
E(X_{n+1} \mid \mathcal{F}_n) = 0 \qquad (n = 0, 1, 2, \ldots), \tag{13.32}
$$

where $\mathcal{F}_n := \sigma\{Y_0, Y_1, \ldots, Y_n\}$ is the sigmafield of all events determined by
$\{Y_0, Y_1, \ldots, Y_n\}$. Note that $X_n$ is $\mathcal{F}_n$-measurable, i.e., a Borel-measurable
function of $Y_0, \ldots, Y_n$. Also, $\{\mathcal{F}_n\}$ is increasing, $\mathcal{F}_n \subset \mathcal{F}_{n+1}$. This motivates the
following more general definition of a martingale.

Definition 13.3. Let $\{X_n\}$ be a sequence of random variables, and $\{\mathcal{F}_n\}$ an
increasing sequence of sigmafields such that, for every $n$, $X_0, X_1, X_2, \ldots, X_n$
are $\mathcal{F}_n$-measurable. If $E|X_n| < \infty$ and (13.32) holds for all $n$, then
$\{X_n: n = 0, 1, 2, \ldots\}$ is said to be a sequence of $\{\mathcal{F}_n\}$-martingale differences. The
sequence of partial sums $\{Z_n = X_0 + \cdots + X_n: n = 0, 1, 2, \ldots\}$ is then said to
be a $\{\mathcal{F}_n\}$-martingale.

Note that a martingale in this sense is also a martingale in the sense of Definition
13.2, since (13.30) implies (13.31). In order to state an appropriate generalization
of Theorem 13.1 we need to extend the definition of stopping times given earlier.

Definition 13.4. Let $\{\mathcal{F}_n: n = 0, 1, 2, \ldots\}$ be an increasing sequence of
sub-sigmafields of the basic sigmafield $\mathcal{F}$. A random variable $\tau$ with nonnegative
integral values, including possibly the value $+\infty$, is a $\{\mathcal{F}_n\}$-stopping time if
(13.1) (or, (13.2)) holds for all $n$.

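Theorem 13.3. Let $\{X_n\}$, $\{Z_n\}$, $\{\mathcal{F}_n\}$ be as in Definition 13.3. Let $\tau$ be a
$\{\mathcal{F}_n\}$-stopping time such that
1. $P(\tau < \infty) = 1$,
2. $E|Z_\tau| < \infty$, and
3. $\lim_{m\to\infty} E(Z_m \mathbf{1}_{\{\tau > m\}}) = 0$.

Then

$$
EZ_\tau = EZ_0. \tag{13.33}
$$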

The proof of Theorem 13.3 is essentially identical to that of Theorem 13.1
(Exercise 6), except that (13.6) becomes

$$
EZ_\tau = EZ_m = EZ_0, \tag{13.34}
$$

which may not be zero.
For an application of Theorem 13.3 consider an i.i.d. Bernoulli sequence
$\{Y_n: n = 1, 2, \ldots\}$:

$$
P(Y_n = 1) = P(Y_n = -1) = \tfrac{1}{2}. \tag{13.35}
$$

Let $Y_0 = 0$, $\mathcal{F}_n = \sigma\{Y_0, Y_1, \ldots, Y_n\}$. Then, by (13.27) and (13.28), the sequence
of random variables

$$
Z_n := S_n^2 - n \qquad (n = 0, 1, 2, \ldots) \tag{13.36}
$$

is a $\{\mathcal{F}_n\}$-martingale sequence. Here $S_n = Y_0 + \cdots + Y_n$. Let $\tau$ be the first time
the random walk $\{S_n: n = 0, 1, 2, \ldots\}$ reaches $-a$ or $b$, where $a$, $b$ are two given
positive integers. Then $\tau$ is a $\{\mathcal{F}_n\}$-stopping time. By Proposition 13.4 below,
one has $P(\tau < \infty) = 1$ and $E\tau^k < \infty$ for all $k$. Moreover,

$$
E|Z_\tau| = E|S_\tau^2 - \tau| \le ES_\tau^2 + E\tau \le \max\{a^2, b^2\} + E\tau < \infty, \tag{13.37}
$$

and

$$
\begin{aligned}
E|Z_m \mathbf{1}_{\{\tau > m\}}| &\le E((\max\{a^2, b^2\} + m) \mathbf{1}_{\{\tau > m\}}) \\
&= (\max\{a^2, b^2\} + m) P(\tau > m) \\
&= \max\{a^2, b^2\} P(\tau > m) + mP(\tau > m).
\end{aligned} \tag{13.38}
$$

Now $P(\tau > m) \to 0$ as $m \to \infty$ and, by Chebyshev's inequality,

$$
mP(\tau > m) \le m \cdot \frac{E\tau^2}{m^2} = \frac{E\tau^2}{m} \to 0. \tag{13.39}
$$

Conditions (1)-(3) of Theorem 13.3 are therefore verified. Hence,

$$
EZ_\tau = EZ_1 = ES_1^2 - 1 = 1 - 1 = 0, \tag{13.40}
$$

i.e., using (13.17),

$$
E\tau = ES_\tau^2 = a^2 P(\tau_{-a} < \tau_b) + b^2 P(\tau_{-a} > \tau_b) = \frac{a^2 b}{a + b} + \frac{b^2 a}{a + b} = ab. \tag{13.41}
$$

By changing the starting position $Y_0 = S_0 = 0$ of the random walk to
$Y_0 = S_0 = x$ one then gets

$$
E\tau = (x - c)(d - x), \qquad c \le x \le d, \tag{13.42}
$$

where $\tau = \min\{n: S_n^x = c \text{ or } d\}$.

The next result has been made use of in deriving (13.17), (13.22) and (13.42).

Proposition 13.4. Let $\{X_n: n = 1, 2, \ldots\}$ be an i.i.d. sequence of random
variables such that $P(X_n = 0) < 1$. Let $\tau$ be the first escape time of the (general)
random walk $\{S_n^x = x + X_1 + \cdots + X_n: n = 1, 2, \ldots\}$, $S_0^x = x$, from the interval
$(-a, b)$, where $a$ and $b$ are positive numbers. Then $P(\tau < \infty) = 1$, and the
moment generating function $\phi(z) = Ee^{z\tau}$ of $\tau$ is finite in a neighborhood of $z = 0$.

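Proof. There exists an $\varepsilon > 0$ such that either $P(X_n > \varepsilon) > 0$ or $P(X_n < -\varepsilon) > 0$.
Assume first that $\delta := P(X_n > \varepsilon) > 0$. Define

$$
n_0 = \left[\frac{a + b}{\varepsilon}\right] + 1, \tag{13.43}
$$

where $[(a + b)/\varepsilon]$ is the integer part of $(a + b)/\varepsilon$. No matter what the starting
position $x \in (-a, b)$ of the random walk may be, if $X_n > \varepsilon$ for all
$n = 1, 2, \ldots, n_0$, then $S_{n_0}^x = x + X_1 + \cdots + X_{n_0} > x + n_0\varepsilon > x + a + b \ge b$.
Therefore,

$$
P(\tau \le n_0) \ge P(S_{n_0}^x > b) \ge P(X_n > \varepsilon \text{ for } n = 1, 2, \ldots, n_0) \ge \delta^{n_0} = \delta_0, \tag{13.44}
$$

say. Now consider the events

$$
\begin{aligned}
A_k &= \{-a < S_n^x < b \text{ for } (k - 1)n_0 < n \le kn_0\} \qquad (k = 1, 2, 3, \ldots), \\
A_1 &= \{-a < S_n^x < b \text{ for } 1 \le n \le n_0\}.
\end{aligned} \tag{13.45}
$$

By (13.44),

$$
P(A_1) = P(\tau > n_0) \le 1 - \delta_0. \tag{13.46}
$$

Next,

$$
P(\tau > 2n_0) = E(\mathbf{1}_{A_1} \mathbf{1}_{A_2}) = E[\mathbf{1}_{A_1} E(\mathbf{1}_{A_2} \mid \{S_1^x, \ldots, S_{n_0}^x\})]. \tag{13.47}
$$

Now,

$$
\begin{aligned}
E(\mathbf{1}_{A_2} \mid \{S_1^x, \ldots, S_{n_0}^x\}) &= P(-a < S_{n_0+m}^x < b \text{ for } m = 1, \ldots, n_0 \mid \{S_1^x, \ldots, S_{n_0}^x\}) \\
&= P(-a < S_{n_0}^x + X_{n_0+1} + \cdots + X_{n_0+m} < b \\
&\qquad \text{for } m = 1, \ldots, n_0 \mid \{S_1^x, \ldots, S_{n_0}^x\}).
\end{aligned} \tag{13.48}
$$

Since $X_{n_0+1} + \cdots + X_{n_0+m}$ is independent of $S_1^x, \ldots, S_{n_0}^x$, the last conditional
probability may be evaluated as

$$
\begin{aligned}
&P(-a < z + X_{n_0+1} + \cdots + X_{n_0+m} < b \text{ for } m = 1, \ldots, n_0)\big|_{z = S_{n_0}^x} \\
&\quad = P(-a < z + X_1 + \cdots + X_m < b \text{ for } m = 1, \ldots, n_0)\big|_{z = S_{n_0}^x}.
\end{aligned} \tag{13.49}
$$

The equality in (13.49) is due to the fact that the distribution of $(X_1, X_2, \ldots, X_{n_0})$
is the same as that of $(X_{n_0+1}, X_{n_0+2}, \ldots, X_{2n_0})$. Note that $S_{n_0}^x \in (-a, b)$ on the
set $A_1$. Hence the last probability in (13.49) is not larger than $1 - \delta_0$ by (13.46).
Therefore, (13.47) yields

$$
P(A_1 \cap A_2) = P(\tau > 2n_0) \le E[\mathbf{1}_{A_1}(1 - \delta_0)] = (1 - \delta_0)P(A_1) \le (1 - \delta_0)^2. \tag{13.50}
$$

By recursion it follows that

$$
\begin{aligned}
P(\tau > kn_0) &= E(\mathbf{1}_{A_1} \cdots \mathbf{1}_{A_k}) = E[\mathbf{1}_{A_1} \cdots \mathbf{1}_{A_{k-1}} E(\mathbf{1}_{A_k} \mid \{S_1^x, \ldots, S_{(k-1)n_0}^x\})] \\
&\le E[\mathbf{1}_{A_1} \cdots \mathbf{1}_{A_{k-1}}(1 - \delta_0)] = (1 - \delta_0)P(A_1 \cap \cdots \cap A_{k-1}) \\
&= (1 - \delta_0)P(\tau > (k - 1)n_0) \le (1 - \delta_0)(1 - \delta_0)^{k-1} \\
&= (1 - \delta_0)^k \qquad (k = 1, 2, \ldots).
\end{aligned} \tag{13.51}
$$

Since $\{\tau = \infty\} \subset \{\tau > kn_0\}$ for every $k$, one has

$$
P(\tau = \infty) \le (1 - \delta_0)^k \qquad \text{for all } k, \tag{13.52}
$$

and consequently, $P(\tau = \infty) = 0$. Finally,

$$
\begin{aligned}
Ee^{z\tau} &= \sum_{m=1}^{\infty} e^{mz} P(\tau = m) \le \sum_{m=1}^{\infty} e^{m|z|} P(\tau = m) \\
&= \sum_{k=1}^{\infty} \sum_{m=(k-1)n_0+1}^{kn_0} e^{m|z|} P(\tau = m) \\
&\le \sum_{k=1}^{\infty} e^{kn_0|z|} P(\tau > (k - 1)n_0) \le \sum_{k=1}^{\infty} e^{kn_0|z|} (1 - \delta_0)^{k-1} \\
&= e^{n_0|z|} \sum_{k=1}^{\infty} \left((1 - \delta_0) e^{n_0|z|}\right)^{k-1} < \infty \qquad \text{for } |z| < -\frac{\log(1 - \delta_0)}{n_0}.
\end{aligned} \tag{13.53}
$$

One may proceed in an entirely analogous manner assuming $P(X_n < -\varepsilon) > 0$.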
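
The geometric bound (13.51) can be compared with exact survival probabilities in a concrete case. The sketch below treats the simple symmetric random walk from 0 on $(-a, b)$, for which one may take, as an illustrative choice, $\varepsilon = 1/2$ and $\delta = P(X_n > 1/2) = 1/2$. The function names are hypothetical. The bound is typically far from sharp, since its purpose is only to establish geometric decay.

```python
import math

def survival_probs(a, b, n_steps):
    """P(tau > m), m = 0..n_steps, for the simple symmetric walk from 0 on (-a, b)."""
    mass = {0: 1.0}            # law of S_m restricted to {tau > m}
    out = [1.0]
    for _ in range(n_steps):
        new_mass = {}
        for s, w in mass.items():
            for t in (s + 1, s - 1):
                if -a < t < b:                 # still inside the interval
                    new_mass[t] = new_mass.get(t, 0.0) + 0.5 * w
        mass = new_mass
        out.append(sum(mass.values()))
    return out

def geometric_bound(a, b, eps, delta, k):
    """Return (n0, (1 - delta^n0)^k), with n0 = [(a+b)/eps] + 1 as in (13.43)."""
    n0 = math.floor((a + b) / eps) + 1
    return n0, (1.0 - delta**n0) ** k

if __name__ == "__main__":
    a, b = 2, 3
    surv = survival_probs(a, b, 55)
    for k in range(1, 6):
        n0, bound = geometric_bound(a, b, 0.5, 0.5, k)
        print(k, surv[k * n0], bound)
```

The printed exact probabilities decay much faster than the bound, which is consistent with the finiteness of the moment generating function near zero.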

An immediate corollary to Proposition 13.4 is that $E\tau^k < \infty$ for all
$k = 1, 2, \ldots$ (Exercise 18(i)).
It is easy to check that (Exercise 8), instead of the independence of $\{X_n\}$, it
is enough to assume that for some $\varepsilon > 0$ there is a $\delta > 0$ such that

$$
\begin{aligned}
&P(X_{n+1} > \varepsilon \mid \{X_1, \ldots, X_n\}) \ge \delta > 0 \quad \text{for all } n, \\
&\text{or} \\
&P(X_{n+1} < -\varepsilon \mid \{X_1, \ldots, X_n\}) \ge \delta > 0 \quad \text{for all } n.
\end{aligned} \tag{13.54}
$$

Thus, Proposition 13.4 has the following extension, which is useful in studying
processes other than random walks.

Proposition 13.5. The conclusion of Proposition 13.4 holds if, instead of the
assumption that $\{X_n\}$ is i.i.d., (13.54) holds for a pair of positive numbers $\varepsilon$, $\delta$.

The next result in this section is Doob's Maximal Inequality. A maximal

inequality relates the growth of the maximum of partial sums to the growth of
the partial sums, and typically shows that the former does not grow much faster
than the latter.

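Theorem 13.6. (Doob's Maximal Inequality). Let $\{Z_0, \ldots, Z_n\}$ be a martingale,
$M_n := \max\{|Z_0|, \ldots, |Z_n|\}$.
(a) For all $\lambda > 0$,

$$
P(M_n \ge \lambda) \le \frac{1}{\lambda} E(|Z_n| \mathbf{1}_{\{M_n \ge \lambda\}}) \le E|Z_n|/\lambda. \tag{13.55}
$$

(b) If $EZ_n^2 < \infty$ then, for all $\lambda > 0$,

$$
P(M_n \ge \lambda) \le \frac{1}{\lambda^2} E(Z_n^2 \mathbf{1}_{\{M_n \ge \lambda\}}) \le EZ_n^2/\lambda^2, \tag{13.56}
$$

and

$$
EM_n^2 \le 4EZ_n^2. \tag{13.57}
$$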


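Proof. (a) Write $\mathcal{F}_k := \sigma\{Z_0, \ldots, Z_k\}$, the sigmafield of events determined by
$Z_0, \ldots, Z_k$. Consider the events $A_0 := \{|Z_0| \ge \lambda\}$, $A_k := \{|Z_j| < \lambda$ for $0 \le j < k$,
$|Z_k| \ge \lambda\}$ $(k = 1, \ldots, n)$. The events $A_k$ are pairwise disjoint and
$\bigcup_{k=0}^{n} A_k = \{M_n \ge \lambda\}$. Therefore,

$$P(M_n \ge \lambda) = \sum_{k=0}^{n} P(A_k). \tag{13.58}$$

Now $1 \le |Z_k|/\lambda$ on $A_k$. Using this and the martingale property,

$$P(A_k) = E(1_{A_k}) \le \frac{1}{\lambda} E(1_{A_k}|Z_k|) = \frac{1}{\lambda} E\bigl[1_{A_k}|E(Z_n \mid \mathcal{F}_k)|\bigr] \le \frac{1}{\lambda} E\bigl[1_{A_k}E(|Z_n| \mid \mathcal{F}_k)\bigr]. \tag{13.59}$$

Since $1_{A_k}$ is $\mathcal{F}_k$-measurable, $1_{A_k}E(|Z_n| \mid \mathcal{F}_k) = E(1_{A_k}|Z_n| \mid \mathcal{F}_k)$; and as the
expectation of the conditional expectation of a random variable $Z$ equals $EZ$
(see Chapter 0, Theorem 4.4),

$$P(A_k) \le \frac{1}{\lambda} E\bigl[E(1_{A_k}|Z_n| \mid \mathcal{F}_k)\bigr] = \frac{1}{\lambda} E(1_{A_k}|Z_n|). \tag{13.60}$$

Summing over $k$ one arrives at (13.55).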

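(b) The inequality (13.56) follows in the same manner as above, starting with

$$P(A_k) = E(1_{A_k}) \le \frac{1}{\lambda^2} E(1_{A_k}Z_k^2), \tag{13.61}$$

and then noting that

$$\begin{aligned}
E(Z_n^2 \mid \mathcal{F}_k) &= E\bigl(Z_k^2 + (Z_n - Z_k)^2 + 2Z_k(Z_n - Z_k) \mid \mathcal{F}_k\bigr) \\
&= E\bigl(Z_k^2 + (Z_n - Z_k)^2 \mid \mathcal{F}_k\bigr) + 2Z_kE(Z_n - Z_k \mid \mathcal{F}_k) \\
&= E(Z_k^2 \mid \mathcal{F}_k) + E\bigl((Z_n - Z_k)^2 \mid \mathcal{F}_k\bigr) \ge E(Z_k^2 \mid \mathcal{F}_k).
\end{aligned} \tag{13.62}$$

Hence,

$$E(1_{A_k}Z_n^2) \ge E\bigl[1_{A_k}E(Z_k^2 \mid \mathcal{F}_k)\bigr] = E\bigl[E(1_{A_k}Z_k^2 \mid \mathcal{F}_k)\bigr] = E(1_{A_k}Z_k^2). \tag{13.63}$$

Now use (13.63) in (13.61) and sum over $k$ to get (13.56).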

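In order to prove (13.57), first express $M_n^2$ as $2\int_0^{M_n} \lambda \, d\lambda$ and interchange the
order of taking expectation and integrating with respect to Lebesgue measure
(Fubini's Theorem, Chapter 0, Theorem 4.2), to get

$$\begin{aligned}
EM_n^2 &= \int_\Omega \Bigl(2\int_0^{M_n} \lambda \, d\lambda\Bigr) dP = 2\int_\Omega \Bigl(\int_0^\infty \lambda 1_{\{M_n \ge \lambda\}} \, d\lambda\Bigr) dP \\
&= 2\int_0^\infty \lambda \Bigl(\int_\Omega 1_{\{M_n \ge \lambda\}} \, dP\Bigr) d\lambda = 2\int_0^\infty \lambda P(M_n \ge \lambda) \, d\lambda.
\end{aligned} \tag{13.64}$$

Now use the first inequality in (13.55) to derive

$$\begin{aligned}
EM_n^2 &\le 2\int_0^\infty E\bigl(1_{\{M_n \ge \lambda\}}|Z_n|\bigr) \, d\lambda = 2\int_\Omega |Z_n| \Bigl(\int_0^{M_n} d\lambda\Bigr) dP \\
&= 2\int_\Omega |Z_n|M_n \, dP = 2E(|Z_n|M_n) \le 2(EZ_n^2)^{1/2}(EM_n^2)^{1/2},
\end{aligned} \tag{13.65}$$

using the Schwarz Inequality. Now divide the extreme left and right sides of
(13.65) by $(EM_n^2)^{1/2}$ and square the result to get (13.57). ∎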
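The inequalities of Theorem 13.6 can be checked exactly on a small example. The sketch below assumes the martingale is the simple symmetric random walk started at 0 (a choice made here only for illustration) and enumerates all $2^n$ equally likely paths; the function name `doob_check` is hypothetical.

```python
from itertools import product

def doob_check(n, lam):
    """Enumerate all 2**n paths of the simple symmetric random walk
    Z_0 = 0, Z_k = Z_{k-1} +/- 1 (a martingale) and return
    P(M_n >= lam), E Z_n^2 and E M_n^2, with M_n = max_k |Z_k|."""
    paths = 0
    hits = 0      # number of paths with M_n >= lam
    sum_z2 = 0    # sum over paths of Z_n**2
    sum_m2 = 0    # sum over paths of M_n**2
    for steps in product((-1, 1), repeat=n):
        z, m = 0, 0
        for s in steps:
            z += s
            m = max(m, abs(z))
        paths += 1
        hits += (m >= lam)
        sum_z2 += z * z
        sum_m2 += m * m
    return hits / paths, sum_z2 / paths, sum_m2 / paths

p_max, ez2, em2 = doob_check(n=10, lam=4)
# Compare P(M_n >= lam) with E Z_n^2 / lam^2, and E M_n^2 with 4 E Z_n^2.
print(p_max, ez2 / 4**2, em2, 4 * ez2)
```

Exact enumeration is used instead of Monte Carlo so that the comparison is free of sampling error; it is only practical for small $n$.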

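A well-known inequality of Kolmogorov follows as a simple consequence of
(13.56).

Corollary 13.7. Let $X_0, X_1, \ldots, X_n$ be independent random variables, $EX_j = 0$
for $1 \le j \le n$, $EX_j^2 < \infty$ for $0 \le j \le n$. Write $S_k := X_0 + \cdots + X_k$, $M_n := \max\{|S_k|:
0 \le k \le n\}$. Then for every $\lambda > 0$,

$$P(M_n \ge \lambda) \le \frac{1}{\lambda^2} \int_{\{M_n \ge \lambda\}} S_n^2 \, dP \le \frac{1}{\lambda^2} ES_n^2. \tag{13.66}$$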

The main results in this section may all be extended to continuous-parameter
martingales. For this purpose, consider a stochastic process $\{X_t: t \ge 0\}$ having
right continuous sample paths $t \to X_t$. A stopping time $\tau$ for $\{X_t\}$ is a random
variable with values in $[0, \infty]$, satisfying the property

$$\{\tau \le t\} \in \mathcal{F}_t \quad \text{for all } t \ge 0, \tag{13.67}$$

where $\mathcal{F}_t := \sigma\{X_u: 0 \le u \le t\}$ is the sigmafield of events that depend only on the
("past" of the) process up to time $t$. As in the discrete-parameter case, first
passage times to finite sets are stopping times. Write for Borel sets $B \subset \mathbb{R}^1$,

$$\tau_B := \min\{t \ge 0: X_t \in B\}, \tag{13.68}$$

for the first passage time to the set $B$. If $B$ is a singleton, $B = \{y\}$, $\tau_B$ is written
simply as $\tau_y$, the first passage time to the state $y$.

A stochastic process $\{Z_t\}$ is said to be a $\{\mathcal{F}_t\}$-martingale if

$$E(Z_t \mid \mathcal{F}_s) = Z_s \quad (s < t). \tag{13.69}$$

For a Brownian motion $\{X_t\}$ with drift $\mu$, the process $\{Z_t := X_t - t\mu\}$ is easily
seen to be a martingale with respect to $\{\mathcal{F}_t\}$. For this $\{X_t\}$ another example is
$\{Z_t := (X_t - t\mu)^2 - t\sigma^2\}$, where $\sigma^2$ is the diffusion coefficient of $\{X_t\}$.
The following is the continuous-parameter analogue of Theorem 13.6(b).

Proposition 13.8. Let $\{Z_t: t \ge 0\}$ be a right continuous square integrable
$\{\mathcal{F}_t\}$-martingale, and let $M_t := \sup\{|Z_s|: 0 \le s \le t\}$. Then

$$P(M_t > \lambda) \le EZ_t^2/\lambda^2 \quad (\lambda > 0) \tag{13.70}$$

and

$$EM_t^2 \le 4EZ_t^2. \tag{13.71}$$

Proof. For each $n$ let $0 = t_{1,n} < t_{2,n} < \cdots < t_{n,n} = t$ be such that the sets
$I_n := \{t_{j,n}: 1 \le j \le n\}$ are increasing (i.e., $I_n \subset I_{n+1}$) and $\bigcup_n I_n$ is dense in $[0, t]$.
Write $M_{t,n} := \max\{|Z_{t_{j,n}}|: 1 \le j \le n\}$. By (13.56),

$$P(M_{t,n} > \lambda) \le \frac{EZ_t^2}{\lambda^2}.$$

Letting $n \uparrow \infty$, one obtains (13.70) as the sets $F_n := \{M_{t,n} > \lambda\}$ increase to a set
that contains $\{M_t > \lambda\}$ as $n \uparrow \infty$.
Next, from (13.57),

$$EM_{t,n}^2 \le 4EZ_t^2.$$

Now use the Monotone Convergence Theorem (Chapter 0, Theorem 3.2), to
get (13.71). ∎

The next result is an analogue of Theorem 13.3.

Proposition 13.9. (Optional Stopping). Let $\{Z_t: t \ge 0\}$ be a right continuous
square integrable $\{\mathcal{F}_t\}$-martingale, and $\tau$ a $\{\mathcal{F}_t\}$-stopping time such that
(i) $P(\tau < \infty) = 1$,
(ii) $E|Z_\tau| < \infty$, and
(iii) $EZ_{\tau \wedge r} \to EZ_\tau$ as $r \to \infty$.
Then

$$EZ_\tau = EZ_0. \tag{13.72}$$

Proof. Define, for each positive integer $n$, the random variables

$$\tau^{(n)} := \begin{cases} k2^{-n} & \text{on } \{(k-1)2^{-n} < \tau \le k2^{-n}\} \quad (k = 0, 1, 2, \ldots), \\ \infty & \text{on } \{\tau = \infty\}. \end{cases} \tag{13.73}$$

It is simple to check that $\tau^{(n)}$ is a stopping time with respect to the sequence of
sigmafields $\{\mathcal{F}_{k2^{-n}}: k = 0, 1, \ldots\}$, as is $\tau^{(n)} \wedge r$ for every positive integer $r$. Since
$\tau^{(n)} \wedge r \le r$, it follows from Theorem 13.3 that

$$EZ_{\tau^{(n)} \wedge r} = EZ_0. \tag{13.74}$$

Now $\tau^{(n)} \ge \tau$ for all $n$ and $\tau^{(n)} \downarrow \tau$ as $n \uparrow \infty$. Therefore, $\tau^{(n)} \wedge r \downarrow \tau \wedge r$. By the
right continuity of $t \to Z_t$, $Z_{\tau^{(n)} \wedge r} \to Z_{\tau \wedge r}$.

Also, $|Z_{\tau^{(n)} \wedge r}| \le M_r := \sup\{|Z_t|: 0 \le t \le r\}$, and $EM_r \le (EM_r^2)^{1/2} < \infty$ by
(13.71). Applying Lebesgue's Dominated Convergence Theorem (Chapter 0,
Theorem 3.4) to (13.74), one gets $EZ_{\tau \wedge r} = EZ_0$. The proof is now complete by
assumption (iii). ∎
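The discretization used in this proof replaces a stopping time by the next point of the dyadic grid of mesh $2^{-n}$. A minimal sketch of that rounding, for a single numerical value of the stopping time, is given below; the function name `dyadic_upper` and the sample value 0.3 are illustrative assumptions.

```python
import math

def dyadic_upper(tau, n):
    """Return the smallest multiple of 2**-n that is >= tau,
    leaving an infinite value unchanged."""
    if math.isinf(tau):
        return math.inf
    return math.ceil(tau * 2**n) / 2**n

# Successive approximations from above of the value 0.3.
approximations = [dyadic_upper(0.3, n) for n in range(1, 6)]
print(approximations)
```

The approximations never fall below the original value and are non-increasing in $n$, which is the monotonicity the dominated convergence step relies on.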

One may apply (13.72) exactly as in the case of the simple symmetric random
walk starting at zero (see (13.16) and (13.17)) to get, in the case $\mu = 0$,

$$P(\tau_{-a} > \tau_b) := P(\{X_t\} \text{ reaches } b \text{ before } -a) = \frac{a}{a + b} \tag{13.75}$$

for arbitrary $a > 0$, $b > 0$. Similarly, applying (13.72) to $\{Z_t := X_t - t\mu\}$,
as in the case of the simple asymmetric random walk (see (13.20)-(13.22), and
use (9.11)),

$$E\tau = \frac{(b + a)P(\tau_{-a} > \tau_b) - a}{\mu} = \frac{1}{\mu}\left[(b + a)\,\frac{1 - \exp\{-2a\mu/\sigma^2\}}{1 - \exp\{-2(b + a)\mu/\sigma^2\}} - a\right], \tag{13.76}$$

where $\tau := \tau_{-a} \wedge \tau_b$ and $\{X_t\}$ is a Brownian motion with a nonzero drift $\mu$, starting at zero.
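The exit probability and mean exit time for Brownian motion leaving $(-a, b)$ can be evaluated numerically as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the driftless mean exit time $ab/\sigma^2$ is not displayed in the text but follows from optional stopping applied to the martingale $X_t^2 - t\sigma^2$.

```python
import math

def prob_b_before_minus_a(a, b, mu, sigma2):
    """Probability that Brownian motion started at 0, with drift mu and
    diffusion coefficient sigma2, reaches b before -a."""
    if mu == 0:
        return a / (a + b)
    theta = 2.0 * mu / sigma2
    # -expm1(-x) = 1 - exp(-x), computed accurately for small x.
    return -math.expm1(-theta * a) / -math.expm1(-theta * (a + b))

def mean_exit_time(a, b, mu, sigma2):
    """Expected time to leave (-a, b); a*b/sigma2 in the driftless case."""
    if mu == 0:
        return a * b / sigma2
    return ((a + b) * prob_b_before_minus_a(a, b, mu, sigma2) - a) / mu

print(prob_b_before_minus_a(1.0, 2.0, 0.5, 1.0))
print(mean_exit_time(1.0, 2.0, 0.5, 1.0))
```

Letting the drift tend to zero in these formulas recovers the driftless values, which gives a quick consistency check between the two cases.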



Let $\{Y_n: n = 1, 2, \ldots\}$ be a sequence of random variables representing the annual
flows into a reservoir over a span of $N$ years. Let $S_n = Y_1 + \cdots + Y_n$, $n \ge 1$,
$S_0 = 0$. Also let $\bar{Y}_N = N^{-1}S_N$. A variety of complicated natural processes (e.g.,
sediment deposition, erosion, etc.) constrain the life and capacity of a reservoir.
However, a particular design parameter analyzed extensively by hydrologists,
based on an idealization in which water usage and natural loss would occur at
an annual rate estimated by $\bar{Y}_N$ units per year, is the (dimensionless) statistic
defined by

$$\frac{R_N}{D_N} := \frac{M_N - m_N}{D_N}, \tag{14.1}$$

where

$$M_N := \max\{S_n - n\bar{Y}_N: n = 0, 1, \ldots, N\}, \qquad m_N := \min\{S_n - n\bar{Y}_N: n = 0, 1, \ldots, N\}, \tag{14.2}$$

$$D_N := \left[\frac{1}{N} \sum_{n=1}^{N} (Y_n - \bar{Y}_N)^2\right]^{1/2}, \qquad \bar{Y}_N = \frac{1}{N} S_N. \tag{14.3}$$
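The statistic defined in (14.1)-(14.3) is straightforward to compute from a record of flows. The following is a minimal sketch; the function name `rescaled_adjusted_range` and the three-value sample record are illustrative assumptions, and a constant record (for which $D_N = 0$) is not handled.

```python
import math
from itertools import accumulate

def rescaled_adjusted_range(y):
    """Return R_N / D_N for the flows y = [Y_1, ..., Y_N]."""
    n_years = len(y)
    partial = [0.0] + list(accumulate(y))            # S_0, S_1, ..., S_N
    ybar = partial[-1] / n_years                     # sample mean of the flows
    adjusted = [s - n * ybar for n, s in enumerate(partial)]
    r = max(adjusted) - min(adjusted)                # R_N = M_N - m_N
    d = math.sqrt(sum((v - ybar) ** 2 for v in y) / n_years)   # D_N
    return r / d

print(rescaled_adjusted_range([1.0, 2.0, 3.0]))
```

Because both $R_N$ and $D_N$ are built from deviations about the sample mean, the statistic is unchanged when a constant is added to every flow.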

The hydrologist Harold Edwin Hurst stimulated a tremendous amount of
interest in the possible behaviors of $R_N/D_N$ for large values of $N$. On the basis
of data that he analyzed for regions of the Nile River, Hurst published a finding
that his plots of $\log(R_N/D_N)$ versus $\log N$ are linear with slope $H \approx 0.75$. William
Feller was soon to show this to be an anomaly relative to the standard statistical
framework of i.i.d. flows $Y_1, \ldots, Y_N$ having finite second moment. The precise
form of Feller's analysis is as follows.
Let $\{Y_n\}$ be an i.i.d. sequence with

$$EY_n = d, \qquad \operatorname{Var} Y_n = \sigma^2 > 0. \tag{14.4}$$

First consider that, by the central limit theorem, $S_N = Nd + O(N^{1/2})$ in the
sense that $(S_N - Nd)/\sqrt{N}$ is, for large $N$, distributed approximately like a
Gaussian random variable with mean zero and variance $\sigma^2$. If one defines

$$\tilde{M}_N = \max\{S_n - nd: 0 \le n \le N\}, \qquad \tilde{m}_N = \min\{S_n - nd: 0 \le n \le N\}, \qquad \tilde{R}_N = \tilde{M}_N - \tilde{m}_N, \tag{14.5}$$

then by the functional central limit theorem (FCLT)

$$\left(\frac{\tilde{M}_N}{\sigma\sqrt{N}}, \frac{\tilde{m}_N}{\sigma\sqrt{N}}\right) \Rightarrow (\tilde{M}, \tilde{m}), \qquad \frac{\tilde{R}_N}{\sigma\sqrt{N}} \Rightarrow \tilde{R} := \tilde{M} - \tilde{m} \qquad \text{as } N \to \infty, \tag{14.6}$$

where $\Rightarrow$ denotes convergence in distribution of the sequence on the left to the
distribution of the random variable(s) on the right. Here

$$\tilde{M} := \max\{B_t: 0 \le t \le 1\}, \qquad \tilde{m} := \min\{B_t: 0 \le t \le 1\}, \tag{14.7}$$

with $\{B_t\}$ a standard Brownian motion starting at zero. It follows that the
magnitude of $\tilde{R}_N$ is $O(N^{1/2})$. Under these circumstances, therefore, one would
expect to find a fluctuation between the maximum and minimum of partial
sums, centered around the mean, over a period $N$ to be of the order $N^{1/2}$. To
see that this still remains for $R_N/(\sqrt{N} D_N)$ in place of $\tilde{R}_N/(\sqrt{N}\sigma)$, first note
that, by the strong law of large numbers applied to $Y_n$ and $Y_n^2$ separately, we
have with probability 1 that $\bar{Y}_N \to d$, and

$$D_N^2 = \frac{1}{N} \sum_{n=1}^{N} Y_n^2 - \bar{Y}_N^2 \to EY_1^2 - d^2 = \sigma^2 \qquad \text{as } N \to \infty. \tag{14.8}$$

Therefore, with probability 1, as $N \to \infty$,

$$\frac{R_N}{\sqrt{N} D_N} \sim \frac{R_N}{\sqrt{N}\sigma}, \tag{14.9}$$

where "$\sim$" indicates "asymptotic equality" in the sense that the ratio of the
two sides goes to 1 as $N \to \infty$. This implies that the asymptotic distributions
of the two sides of (14.9) are the same. Next notice that

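$$\begin{aligned}
\frac{M_N}{\sigma\sqrt{N}} &= \max_{0 \le n \le N} \frac{(S_n - nd) - n(\bar{Y}_N - d)}{\sigma\sqrt{N}} \\
&= \max_{0 \le t \le 1} \left\{\frac{S_{[Nt]} - [Nt]d}{\sigma\sqrt{N}} - \frac{[Nt]}{N}\left(\frac{S_N - Nd}{\sigma\sqrt{N}}\right)\right\} \\
&\Rightarrow \max_{0 \le t \le 1} (B_t - tB_1) := M
\end{aligned} \tag{14.10}$$

and

$$\frac{m_N}{\sigma\sqrt{N}} = \min_{0 \le t \le 1} \left\{\frac{S_{[Nt]} - [Nt]d}{\sigma\sqrt{N}} - \frac{[Nt]}{N}\left(\frac{S_N - Nd}{\sigma\sqrt{N}}\right)\right\} \Rightarrow \min_{0 \le t \le 1} (B_t - tB_1) := m,$$

$$\left(\frac{M_N}{\sigma\sqrt{N}}, \frac{m_N}{\sigma\sqrt{N}}\right) \Rightarrow (M, m). \tag{14.11}$$

Therefore,

$$\frac{R_N}{D_N\sqrt{N}} \sim \frac{R_N}{\sigma\sqrt{N}} \Rightarrow M - m =: R, \tag{14.12}$$

where $R$ is a strictly positive random variable. Once again then, $R_N/D_N$, the
so-called rescaled adjusted range statistic, is of the order of $O(N^{1/2})$.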
The basic problem raised by Hurst is to identify circumstances under which
one may obtain an exponent $H > \tfrac{1}{2}$. The next major theoretical result
following Feller was again somewhat negative, though quite insightful.
Specifically, P. A. P. Moran considered the case of i.i.d. random variables
$Y_1, Y_2, \ldots$ having "fat tails" in their distribution. In this case the re-scaling by
$D_N$ serves to compensate for the increased fluctuation in $R_N$ to the extent that
cancellations occur resulting again in $H = \tfrac{1}{2}$.
The first positive result was obtained by Mandelbrot and Van Ness, who
obtained $H > \tfrac{1}{2}$ under a stationary but strongly dependent model having
moments of all orders, in fact Gaussian (theoretical complements 1, and
theoretical complements 1.3, Chapter IV). For the present section we will
consider the case of independent but nonstationary flows having finite second
moments. In particular, it will now be shown that under an appropriately slow
trend superimposed on a sequence of i.i.d. random variables the Hurst effect
appears. We will say that the Hurst exponent is $H$ if $R_N/(D_N N^H)$ converges in
distribution to a nonzero real-valued random variable as $N$ tends to infinity.
In particular, this includes the case of convergence in probability to a positive
constant.
Let $\{X_n\}$ be an i.i.d. sequence with $EX_n = d$ and $\operatorname{Var} X_n = \sigma^2$ as above, and
let $f(n)$ be an arbitrary real-valued function on the set of positive integers. We
assume that the observations $Y_n$ are of the form

$$Y_n = X_n + f(n). \tag{14.13}$$

The partial sums of the observations are

$$S_n = Y_1 + \cdots + Y_n = X_1 + \cdots + X_n + f(1) + \cdots + f(n) = S_n^* + \sum_{j=1}^{n} f(j), \qquad S_0 = 0, \tag{14.14}$$

where

$$S_n^* = X_1 + \cdots + X_n. \tag{14.15}$$

Introduce the notation $D_N^*$ for the standard deviation of the $X$-values
$\{X_n: 1 \le n \le N\}$,

$$D_N^{*2} = \frac{1}{N} \sum_{n=1}^{N} (X_n - \bar{X}_N)^2. \tag{14.16}$$

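Then, writing $\bar{f}_N = \sum_{n=1}^{N} f(n)/N$,

$$\begin{aligned}
D_N^2 &:= \frac{1}{N} \sum_{n=1}^{N} (Y_n - \bar{Y}_N)^2 \\
&= \frac{1}{N} \sum_{n=1}^{N} (X_n - \bar{X}_N)^2 + \frac{1}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)^2 + \frac{2}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)(X_n - \bar{X}_N) \\
&= D_N^{*2} + \frac{1}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)^2 + \frac{2}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)(X_n - \bar{X}_N).
\end{aligned} \tag{14.17}$$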

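Also write

$$\begin{aligned}
M_N &= \max_{0 \le n \le N} \{S_n - n\bar{Y}_N\} = \max_{0 \le n \le N} \Bigl\{S_n^* - n\bar{X}_N + \sum_{j=1}^{n} (f(j) - \bar{f}_N)\Bigr\}, \\
m_N &= \min_{0 \le n \le N} \{S_n - n\bar{Y}_N\} = \min_{0 \le n \le N} \Bigl\{S_n^* - n\bar{X}_N + \sum_{j=1}^{n} (f(j) - \bar{f}_N)\Bigr\}, \\
R_N &= M_N - m_N, \qquad R_N^* = \max_{0 \le n \le N} \{S_n^* - n\bar{X}_N\} - \min_{0 \le n \le N} \{S_n^* - n\bar{X}_N\}.
\end{aligned} \tag{14.18}$$

For convenience, write

$$\mu_N(n) := \sum_{j=1}^{n} (f(j) - \bar{f}_N), \quad \mu_N(0) = 0, \qquad \Delta_N := \max_{0 \le n \le N} \mu_N(n) - \min_{0 \le n \le N} \mu_N(n). \tag{14.19}$$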

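Observe that

$$\begin{aligned}
M_N &\le \max_{0 \le n \le N} \mu_N(n) + \max_{0 \le n \le N} (S_n^* - n\bar{X}_N), \\
m_N &\ge \min_{0 \le n \le N} \mu_N(n) + \min_{0 \le n \le N} (S_n^* - n\bar{X}_N),
\end{aligned} \tag{14.20}$$

and

$$\begin{aligned}
M_N &\ge \max_{0 \le n \le N} \mu_N(n) + \min_{0 \le n \le N} (S_n^* - n\bar{X}_N), \\
m_N &\le \min_{0 \le n \le N} \mu_N(n) + \max_{0 \le n \le N} (S_n^* - n\bar{X}_N).
\end{aligned} \tag{14.21}$$

From (14.20) one gets $R_N \le \Delta_N + R_N^*$, and from (14.21), $R_N \ge \Delta_N - R_N^*$. In
other words,

$$|R_N - \Delta_N| \le R_N^*. \tag{14.22}$$

Note that in the same manner,

$$|R_N - R_N^*| \le \Delta_N. \tag{14.23}$$

It remains to estimate $D_N$ and $\Delta_N$.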

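Lemma 1. If $f(n)$ converges to a finite limit, then $D_N^2$ converges to $\sigma^2$ with
probability 1.

Proof. In view of (14.17), it suffices to prove

$$\frac{1}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)^2 + \frac{2}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)(X_n - \bar{X}_N) \to 0 \quad \text{as } N \to \infty. \tag{14.24}$$

Let $\alpha$ be the limit of $f(n)$. Then

$$\frac{1}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)^2 = \frac{1}{N} \sum_{n=1}^{N} (f(n) - \alpha)^2 - (\bar{f}_N - \alpha)^2. \tag{14.25}$$

Now if a sequence $g(n)$ converges to a limit $\theta$, then so do its Cesàro means
$N^{-1} \sum_{n=1}^{N} g(n)$. Applying this to the sequences $(f(n) - \alpha)^2$ and $f(n)$, observe that
(14.25) goes to zero as $N \to \infty$. Next

$$\frac{1}{N} \sum_{n=1}^{N} (f(n) - \bar{f}_N)(X_n - \bar{X}_N) = \frac{1}{N} \sum_{n=1}^{N} (f(n) - \alpha)(X_n - d) - (\bar{f}_N - \alpha)(\bar{X}_N - d).$$

The second term on the right clearly tends to zero as $N$ increases. Also, by
Schwarz inequality,

$$\left|\frac{1}{N} \sum_{n=1}^{N} (f(n) - \alpha)(X_n - d)\right| \le \left(\frac{1}{N} \sum_{n=1}^{N} (f(n) - \alpha)^2\right)^{1/2} \left(\frac{1}{N} \sum_{n=1}^{N} (X_n - d)^2\right)^{1/2}.$$

By the strong law of large numbers, $N^{-1} \sum_{n=1}^{N} (X_n - d)^2 \to E(X_1 - d)^2 = \sigma^2$
and the Cesàro means $N^{-1} \sum_{n=1}^{N} (f(n) - \alpha)^2$ go to zero as $N \to \infty$, since
$(f(n) - \alpha)^2 \to 0$ as $n \to \infty$. ∎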

From (14.12), (14.22), and Lemma 1 we get the following result.


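Theorem 14.1. If $f(n)$ converges to a finite limit, then for every $H > \tfrac{1}{2}$,

$$\left|\frac{R_N}{D_N N^H} - \frac{\Delta_N}{D_N N^H}\right| \to 0 \quad \text{in probability as } N \to \infty. \tag{14.27}$$

In particular, the Hurst effect with exponent $H > \tfrac{1}{2}$ holds if and only if, for
some positive number $c'$,

$$\lim_{N \to \infty} \frac{\Delta_N}{N^H} = c'. \tag{14.28}$$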

Example 1. Take

$$f(n) = a + c(n + m)^\beta \qquad (n = 1, 2, \ldots), \tag{14.29}$$

where $a$, $c$, $m$, $\beta$ are parameters, with $c \ne 0$, $m \ge 0$. The presence of $m$ indicates
the starting point of the trend, namely, $m$ units of time before the time $n = 0$.
Since the asymptotics are not affected by the particular value of $m$, we assume
henceforth that $m = 0$ without essential loss of generality. For simplicity, also
take $c > 0$. The case $c < 0$ can be treated in the same way.

First let $\beta < 0$. Then $f(n) \to a$, and Theorem 14.1 applies. Recall that

$$\Delta_N = \max_{0 \le n \le N} \mu_N(n) - \min_{0 \le n \le N} \mu_N(n), \tag{14.30}$$

where

$$\mu_N(n) = \sum_{j=1}^{n} (f(j) - \bar{f}_N) \quad \text{for } 1 \le n \le N, \qquad \mu_N(0) = 0. \tag{14.31}$$

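Notice that, with $m = 0$ and $c > 0$,

$$\mu_N(n) - \mu_N(n - 1) = c\Bigl(n^\beta - \frac{1}{N} \sum_{j=1}^{N} j^\beta\Bigr) \tag{14.32}$$

is positive for $n < (N^{-1} \sum_{j=1}^{N} j^\beta)^{1/\beta}$, and negative or zero otherwise. This shows
that the maximum of $\mu_N(n)$ is attained at $n = n_0$ given by

$$n_0 = \left[\Bigl(\frac{1}{N} \sum_{j=1}^{N} j^\beta\Bigr)^{1/\beta}\right], \tag{14.33}$$

where $[x]$ denotes the integer part of $x$. The minimum value of $\mu_N(n)$ is zero,
attained at $n = 0$ and $n = N$. Thus,

$$\Delta_N = \mu_N(n_0) = c \sum_{k=1}^{n_0} \Bigl(k^\beta - \frac{1}{N} \sum_{j=1}^{N} j^\beta\Bigr). \tag{14.34}$$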

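By a comparison with a Riemann sum approximation to $\int_0^1 x^\beta \, dx$, one obtains

$$\frac{1}{N} \sum_{j=1}^{N} j^\beta = N^\beta \cdot \frac{1}{N} \sum_{j=1}^{N} \Bigl(\frac{j}{N}\Bigr)^\beta \sim \begin{cases} (1 + \beta)^{-1} N^\beta & \text{for } \beta > -1, \\ N^{-1} \log N & \text{for } \beta = -1, \\ N^{-1} \sum_{j=1}^{\infty} j^\beta & \text{for } \beta < -1. \end{cases} \tag{14.35}$$

By (14.33) and (14.35),

$$n_0 \sim \begin{cases} (1 + \beta)^{-1/\beta} N & \text{for } \beta > -1, \\ N/\log N & \text{for } \beta = -1, \\ \bigl(\sum_{j=1}^{\infty} j^\beta\bigr)^{1/\beta} N^{-1/\beta} & \text{for } \beta < -1. \end{cases} \tag{14.36}$$

From (14.34)-(14.36) it follows that

$$\Delta_N \sim \begin{cases} c n_0 \Bigl(\dfrac{n_0^\beta}{1 + \beta} - \dfrac{N^\beta}{1 + \beta}\Bigr) \sim c_1 N^{1+\beta}, & \beta > -1, \\ c n_0 (n_0^{-1} \log n_0 - N^{-1} \log N) \sim c \log N, & \beta = -1, \\ c \sum_{j=1}^{\infty} j^\beta = c_2, & \beta < -1. \end{cases} \tag{14.37}$$

Here $c_1$, $c_2$ are positive constants depending only on $\beta$. Now consider the
following cases.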

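CASE 1: $-\tfrac{1}{2} < \beta < 0$. In this case Theorem 14.1 applies with $H(\beta) = 1 + \beta > \tfrac{1}{2}$.
Note that, by Lemma 1, $D_N \to \sigma$ with probability 1. Therefore,

$$\frac{R_N}{D_N N^{1+\beta}} \to c' > 0 \quad \text{in probability as } N \to \infty, \text{ if } -\tfrac{1}{2} < \beta < 0. \tag{14.38}$$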
DN +

CASE 2: $\beta < -\tfrac{1}{2}$. Use inequality (14.23), and note from (14.37) that
$\Delta_N = o(N^{1/2})$. Dividing both sides of (14.23) by $D_N N^{1/2}$ one gets, in probability
as $N \to \infty$,

$$\frac{R_N}{D_N N^{1/2}} \sim \frac{R_N^*}{D_N N^{1/2}} \sim \frac{R_N^*}{\sigma N^{1/2}} \quad \text{if } \beta < -\tfrac{1}{2}. \tag{14.39}$$

But $R_N^*/(\sigma N^{1/2})$ converges in distribution to $R$ by (14.11). Therefore, the Hurst
exponent is $H(\beta) = \tfrac{1}{2}$.

CASE 3: $\beta = 0$. In this case the $Y_n$ are i.i.d. Therefore, as proved at the outset,
the Hurst exponent is $\tfrac{1}{2}$.

CASE 4: $\beta > 0$. In this case Lemma 1 does not apply, but a simple computation
yields

$$D_N \sim c_3 N^\beta \quad \text{with probability 1 as } N \to \infty, \text{ if } \beta > 0. \tag{14.40}$$

Here $c_3$ is an appropriate positive number. Combining (14.40), (14.37), and
(14.22) one gets

$$\frac{R_N}{N D_N} \to c_4 \quad \text{in probability as } N \to \infty, \text{ if } \beta > 0, \tag{14.41}$$

where $c_4$ is a positive constant. Therefore, $H(\beta) = 1$.

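CASE 5: $\beta = -\tfrac{1}{2}$. A slightly more delicate argument than used above shows that

$$\frac{R_N}{D_N N^{1/2}} \Rightarrow \max_{0 \le t \le 1} \bigl(B_t - tB_1 + 2c\sqrt{t}(1 - \sqrt{t})\bigr) - \min_{0 \le t \le 1} \bigl(B_t - tB_1 + 2c\sqrt{t}(1 - \sqrt{t})\bigr), \tag{14.42}$$

where $\{B_t\}$ is a standard Brownian motion starting at zero. Thus, $H(-\tfrac{1}{2}) = \tfrac{1}{2}$.
In this case one considers the process $\{Z_N(s)\}$ defined by

$$Z_N\Bigl(\frac{n}{N}\Bigr) = \frac{S_n - n\bar{Y}_N}{\sqrt{N} D_N} \quad \text{for } n = 1, 2, \ldots, N,$$

and linearly interpolated between $n/N$ and $(n + 1)/N$. Then $\{Z_N(s)\}$ converges in
distribution to $\{\tilde{B}_s := B_s^* + 2c\sqrt{s}(1 - \sqrt{s})\}$, where $\{B_s^*\}$ is the Brownian bridge. In
this case the asymptotic distribution of $R_N/(\sqrt{N} D_N)$ is the nondegenerate
distribution of

$$\max_{0 \le s \le 1} \tilde{B}_s - \min_{0 \le s \le 1} \tilde{B}_s$$

(Exercise 1).
The graph of $H(\beta)$ versus $\beta$ in Figure 14.1 summarizes the results of the
preceding cases 1 through 5.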
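The growth rate of the trend range $\Delta_N$, which drives Cases 1 and 2, can be checked numerically by computing $\mu_N(n)$ directly from its definition. The sketch below assumes $m = 0$ and $c = 1$; the function name `trend_range` and the choice $\beta = -0.3$ are illustrative.

```python
import math

def trend_range(N, beta, c=1.0):
    """Delta_N = max mu_N(n) - min mu_N(n) over 0 <= n <= N for the trend
    f(n) = a + c * n**beta (the constant a cancels in f(j) - fbar_N)."""
    powers = [c * j ** beta for j in range(1, N + 1)]
    fbar = sum(powers) / N
    mu = 0.0
    mu_max = mu_min = 0.0          # mu_N(0) = 0
    for fj in powers:
        mu += fj - fbar            # mu_N(n) = mu_N(n - 1) + f(n) - fbar_N
        mu_max = max(mu_max, mu)
        mu_min = min(mu_min, mu)
    return mu_max - mu_min

beta = -0.3
d1, d2 = trend_range(10_000, beta), trend_range(100_000, beta)
# Empirical growth exponent of Delta_N between N = 10**4 and N = 10**5.
slope = math.log(d2 / d1) / math.log(10)
print(slope)
```

For $-1 < \beta < 0$ the printed exponent should be close to $1 + \beta$, in line with the first case of (14.37).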


Figure 14.1

For purposes of data analysis, note that (14.38) implies

$$\log \frac{R_N}{D_N} - [\log c' + (1 + \beta) \log N] \to 0 \quad \text{if } -\tfrac{1}{2} < \beta < 0. \tag{14.43}$$

In other words, for large $N$ the plot of $\log R_N/D_N$ against $\log N$ should be
approximately linear with slope $H = 1 + \beta$, if $-\tfrac{1}{2} < \beta < 0$.
Under the i.i.d. model one would expect to find a fluctuation between the
maximum and the minimum of partial sums, centered around the sample mean,
over a period $N$ to be of the order of $N^{1/2}$. One may then try to check the
appropriateness of the model, i.e., the presumed i.i.d. nature of the observations,
by taking successive (disjoint) blocks of $Y_n$ values each of size $N$, calculating
the difference between the maximum and minimum of partial sums in each
block, and seeing whether this difference is of the order of $N^{1/2}$. In this regard
it is of interest that many other geophysical data sets indicative of climatic
patterns have been reported to exhibit the Hurst effect.


Exercises for Section I.1

1. The following events refer to the coin-tossing model. Determine which of these
events is finite-dimensional and calculate the probability of each event.
(i) 1 appears for the first time on the 10th toss.
(ii) 1 appears for the last time on the 10th toss.
(iii) 1 appears on every other toss.
(iv) The proportion of 1's stabilizes to a given value r as the number of tosses is
increased without bound.
(v) Infinitely many 1's occur.
2. For the coin-tossing experiment (Example 1) determine the probability that the
first head is followed immediately by a tail.
3. Let $L_n$ denote the length of the run of 1's starting at the $n$th toss in the coin-tossing
model. Calculate the distribution of $L_n$.
4. Each integer lattice site of $\mathbb{Z}^d$ is independently colored red or green with
probabilities $p$ and $q = 1 - p$, respectively. Let $E_m$ be the event that the number
of green sites equals the number of red sites in the block of sites of side lengths
$2m$ (sites per side) with a corner at the origin. Calculate $P(E_m \text{ i.o.})$ for $d \ge 3$. [Hint:
Use the Borel-Cantelli Lemma, Chapter 0, Lemma 6.1.]
5. A die is repeatedly tossed and the number of spots is recorded at each stage. Fix
$j$, $1 \le j \le 6$, and let $p_n$ be the probability that $j$ occurs among the first $n$ tosses.
Calculate $p_n$ and the probability that $j$ eventually occurs.
6. (A Fair Coin Simulation) Suppose that you are given a coin for which the
probability of a head is $p$ where $0 < p < 1$. At each unit of time toss the coin twice
and at the $n$th such double toss record:
$X_n = 1$ if a head followed by a tail occurs,
$X_n = -1$ if a tail followed by a head occurs,
$X_n = 0$ if the outcomes of the double toss coincide.
Let $\tau = \min\{n \ge 1: X_n = 1 \text{ or } -1\}$.
(i) Verify that $P(\tau < \infty) = 1$.
(ii) Calculate the distribution of $Y = X_\tau$.
(iii) Calculate $E\tau$.
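The double-toss scheme of Exercise 6 is easy to simulate, which gives a numerical check on the answers to parts (ii) and (iii). The sketch below is only an illustration: the function name `fair_bit`, the bias 0.8, the seed, and the sample size are arbitrary assumptions.

```python
import random

def fair_bit(p, rng):
    """Toss a coin with head probability p in pairs until the two outcomes
    differ; return (Y, tau) with Y = +1 for head-tail and -1 for tail-head,
    and tau the number of double tosses used."""
    tau = 0
    while True:
        tau += 1
        first = rng.random() < p      # True means head
        second = rng.random() < p
        if first != second:
            return (1 if first else -1), tau

rng = random.Random(0)
draws = [fair_bit(0.8, rng) for _ in range(20_000)]
frac_plus = sum(1 for y, _ in draws if y == 1) / len(draws)
mean_tau = sum(t for _, t in draws) / len(draws)
print(frac_plus, mean_tau)
```

The empirical frequency of $Y = 1$ and the average of $\tau$ can then be compared with the values derived analytically.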
7. Show that the two probability distributions for unending independent tosses of a
coin, corresponding to distinct probabilities $p_1 \ne p_2$ for a head in a single toss,
assign respective total probabilities to mutually disjoint subsets of the coin-tossing
sample space $\Omega$. [Hint: Consider the density of 1's in the various possible sequences
in $\Omega$ and use the SLLN (Chapter 0, Theorem 6.1). Such distributions are said to
be mutually singular.]
8. Suppose that $M$ particles can be in each of $N$ possible states $s_1, s_2, \ldots, s_N$. Construct
a probability space and calculate the distribution of $(X_1, \ldots, X_N)$, where $X_i$ is the
number of particles in state $s_i$, for each of the following schemes (i)-(iii).
(i) (Maxwell-Boltzmann) The particles are distinguishable, say labeled $m_1, \ldots, m_M$,
and are randomly assigned states in such a way that all possible
distinct assignments are equally likely to occur. (Imagine putting balls
(particles) into boxes (states).)
(ii) (Bose-Einstein) The particles are not distinguishable but are randomly
assigned states in such a way that all possible values of the numbers of particles
in the various states are equally likely to occur.
(iii) (Fermi-Dirac) The particles are not distinguishable but are randomly
assigned states in such a way that there can be at most one particle in any
one of the states and all possible values of the numbers of particles in various
states under the exclusion principle are equally likely to occur.
(iv) For each of the above distributions calculate the asymptotic distribution of
$X_i$ as $M$ and $N \to \infty$ such that $M/N \to \rho$, where $\rho > 0$ is the asymptotic
density of occupied states.

9. Suppose that the sample paths of $\{X_t\}$ solve the following problem, $dX/dt = -\beta X$,
$X_0 = 1$, with probability 1, where $\beta$ is a random parameter having a normal
distribution.
(i) Calculate $EX_t$.
(ii) Calculate the solution $x(t)$ to the problem with $\beta$ replaced by $E\beta$.
(iii) How do $EX_t$ and $x(t)$ compare for short times? What about in the long run?
(iv) Calculate the distribution of the process $\{X_t\}$.

*10. (i) Show that the sample space $\Omega = \{\omega = (\omega_1, \omega_2, \ldots): \omega_i = 1 \text{ or } 0\}$ for repeated
tosses of a coin is uncountable.
(ii) Show that under binary expansion of numbers in the unit interval, the event
$\{\omega \in \Omega: \omega_1 = \varepsilon_1, \omega_2 = \varepsilon_2, \ldots, \omega_n = \varepsilon_n\}$, $\varepsilon_i = 0$ or 1, is represented by an
interval in $[0, 1]$ of length $1/2^n$.
*11. Suppose that $\Omega = [0, 1]$ and $\mathcal{B}$ is the Borel sigmafield of $[0, 1]$. Suppose that $P$ is
defined by the uniform p.d.f. $f(x) = 1$, $x \in [0, 1]$. Remove the middle one-third
segment from $[0, 1]$ and let $J_{11} = [0, 1/3]$ and $J_{12} = [2/3, 1]$ be the remaining left
and right segments, respectively. Let $I_2 = J_{11} \cup J_{12}$. Next remove the middle
one-third from each of these and let $J_{111} = [0, 1/9]$ and $J_{112} = [2/9, 3/9]$ be the
remaining left and right segments of $J_{11}$, and let $J_{121} = [6/9, 7/9]$ and
$J_{122} = [8/9, 1]$ be the remaining left and right intervals of $J_{12}$. Let
$I_3 = J_{111} \cup J_{112} \cup J_{121} \cup J_{122}$. Repeat this process for each of these segments, and
so on. At the $n$th stage, $I_n$ is the union of $2^{n-1}$ disjoint intervals each of length
$1/3^{n-1}$. In particular, the probability (under $f$) that a randomly selected point
belongs to $I_n$, i.e., the length of $I_n$, is $P(I_n) = 2^{n-1}/3^{n-1}$. The sets
$I_1 \supset I_2 \supset \cdots \supset I_n \supset \cdots$ form a decreasing sequence. The Cantor set is
the limiting set defined by $C = \bigcap_n I_n$.
(i) Show that $C$ is in one-to-one correspondence with the interval $(0, 1]$.
(ii) Verify that $C$ is a Borel set and calculate $P(C)$.
(iii) Construct a continuous c.d.f. that does not have a p.d.f. [Hint: Define the
Cantor ternary function $F$ on $[0, 1]$ as follows. Let $F_0(x) = (2k - 1)/2^n$ for $x$
in the $k$th interval from the left among those $2^{n-1}$ subintervals deleted for the
first time at the $n$th stage. Then $F_0$ is well defined and has a continuous
extension to a function $F$ on all of $[0, 1]$ with $F(1) = 1$ and $F(0) = 0$.]

Exercises for Section I.2

1. Let $\{X_n: n \ge 1\}$ be an i.i.d. sequence with $EX_n^2 < \infty$, and define the general random
walk starting at $x$ by $S_0^x = x$, $S_n^x = x + X_1 + \cdots + X_n$. Calculate each of the following
numerical characteristics of the distribution.
(i) $ES_n^x$
(ii) $\operatorname{Var} S_n^x$
(iii) $\operatorname{Cov}(S_n^x, S_m^x)$
2. (A Mechanical Model) Consider the scattering device depicted in Figure Ex.I.2,
consisting of regularly spaced scatterers in a triangular array. Balls are successively
dropped from the top, scattered by the pegs and finally caught in the bins along
the bottom row.
(i) Calculate the probabilities that a ball will land in each of the respective bins if
the tree has n levels.
(ii) What is the expected number of balls that will fall in each of the bins if N balls
are dropped in succession?
Figure Ex.I.2

*3. (Dorfman's Blood Testing Scheme) Prior to World War II the army made individual
tests of blood samples for venereal disease. So $N$ recruits entering a processing
station required $N$ tests. R. Dorfman suggested pooling the blood samples of $m$
individuals and then testing the pooled sample. If the test is negative then only one
test is needed. If the test of the pool is positive then at least one individual has the
disease and each of the $m$ persons must be retested individually, resulting in this
event in $m + 1$ tests. Let $X_1, X_2, \ldots, X_N$ be an i.i.d. sequence of 0 or 1 valued
random variables with $p = P(X_n = 1)$ for $n = 1, 2, \ldots$. Let the event $\{X_n = 1\}$ be
used to indicate that the $n$th individual is infected; then the parameter $p$ measures
the incidence of the disease in the population. Let $S_n = X_1 + X_2 + \cdots + X_n$ denote
the number of infected individuals among the first $n$ individuals tested, $S_0 = 0$. Let
$T_k$ denote the number of tests required for the $k$th group of $m$ individuals tested,
$k = 1, 2, \ldots, [N/m]$. Thus, for $m \ge 2$, $T_k = m + 1$ if $S_{mk} - S_{m(k-1)} \ne 0$ and $T_k = 1$ if
$S_{mk} - S_{m(k-1)} = 0$. The total number of tests (cost) for $N$ individuals tested in groups
of size $m$ each is

$$C_N = \sum_{k=1}^{[N/m]} T_k + (N - m[N/m]), \qquad m \ge 2.$$

Find $m$ such that, for given $N$ (large) and $p$, the expected number of tests per person
is minimal. [Hint: Consider the limit as $N \to \infty$ and show that the optimal $m$, if
one exists, is the integer value of $m$ that minimizes the function

$$c(m) = \lim_{N \to \infty} \frac{EC_N}{N} = 1 + \frac{1}{m} - (1 - p)^m, \quad m \ge 2, \qquad c(1) = 1.$$

Analyze the extreme values of the function $g(x) = (1/x) - (1 - p)^x$ for $x > 0$
(see D. W. Turner, F. E. Tidmore, D. M. Young (1988), SIAM Review, 30,
pp. 119-122).]
4. Let $\{S_n^x\}$ be the simple symmetric random walk starting at $x$ and let
$u(n, y) = P(S_n^x = y)$. Verify that $u(n, y)$ satisfies the following initial value problem

$$\partial_n u = \tfrac{1}{2} \nabla_y^2 u, \qquad u(0, y) = \delta_{x,y},$$

where $\delta_{x,y}$ is Kronecker's delta and

$$\partial_n u(n, y) = u(n + 1, y) - u(n, y),$$

$$\nabla_y u(n, y) = u(n, y + \tfrac{1}{2}) - u(n, y - \tfrac{1}{2}).$$

5. Suppose that $\{S_n^{(1)}\}$ and $\{S_n^{(2)}\}$ are independent (general) random walks whose
increments (step sizes) have the common distribution $\{p_x: x \in \mathbb{Z}\}$. Verify that
$\{T_n = S_n^{(1)} - S_n^{(2)}\}$ is a random walk with step size distribution $\{\bar{p}_x: x \in \mathbb{Z}\}$ given by
a symmetrization of $\{p_x: x \in \mathbb{Z}\}$, i.e. $\bar{p}_x = \bar{p}_{-x}$, $x \in \mathbb{Z}$. Express $\{\bar{p}_x\}$ in terms of
$\{p_x: x \in \mathbb{Z}\}$.

Exercises for Section I.3

1. Write out a complete derivation of (3.3) by conditioning on $X_1$.
2. (i) Write out the symmetry argument for (3.8) using (3.6). [Hint: Look at the
random walk $\{-S_n^x: n \ge 0\}$.]
(ii) Verify that (3.6) can be expressed as

$$P(T_d^x < T_c^x) = \exp\{-\gamma_p(d - x)\} \frac{\sinh\{\gamma_p(x - c)\}}{\sinh\{\gamma_p(d - c)\}}, \quad \text{where } \gamma_p = \ln((q/p)^{1/2}).$$

3. Justify the use of limits in (3.9) using the continuity properties (Chapter 0, (1.1),
(1.2)) of a probability measure.
4. Verify that $P(S_n^x \ne y \text{ for all } n \ge N) = 0$ for each $N = 1, 2, \ldots$ to establish (3.18).
5. (i) If $p < q$ and $x < d$, then give the symmetry argument to calculate, using (3.9),
the probability that the simple random walk starting at $x$ will eventually reach
$d$ in (3.10).
(ii) Verify that (3.10) may be expressed as

$$P(T_d^x < \infty) = e^{-\alpha_p(d - x)} \quad \text{for some } \alpha_p \ge 0.$$

6. Let $\eta_x$ denote the "time of the first return to $x$" for the simple random walk $\{S_n^x\}$
starting at $x$. Show that

$$P(\eta_x < \infty) = P(\{S_n^x\} \text{ eventually returns to } x) = 2\min(p, q).$$

[Hint: $P(\eta_x < \infty) = pP(T_x^{x+1} < \infty) + qP(T_x^{x-1} < \infty)$.]

7. Let $M \equiv M^0 := \sup_{n \ge 0} S_n$, $m \equiv m^0 := \inf_{n \ge 0} S_n$ be the (possibly infinite) extreme
statistics for the simple random walk $\{S_n\}$ starting at 0.
(i) Calculate the distribution function and probability mass function for $M$ in the
case $p < q$. [Hint: Use (3.10) and consider $P(M > t)$.]
(ii) Do the calculations corresponding to (i) for $m$ when $p > q$, and for $M^x$ and $m^x$.


8. Let $\{X_n\}$ be an arbitrary i.i.d. sequence of integer-valued displacement random
variables for a general random walk on $\mathbb{Z}$ denoted by $S_n = X_1 + \cdots + X_n$, $n \ge 1$,
and $S_0 = 0$. Show that the random walk is transient if $EX_1$ exists and is nonzero.
9. Say that an "equalization" occurs at time $n$ in coin tossing if the number of heads
acquired by time $n$ coincides with the number of tails. Let $E_n$ denote the event that
an equalization occurs at time $n$.
(i) Show that if $p \ne \tfrac{1}{2}$ then $P(E_{2n}) \sim c_1 r^n n^{-1/2}$, for some $0 < r < 1$, as $n \to \infty$.
[Hint: Use Stirling's formula: $n! \sim (2\pi n)^{1/2} n^n e^{-n}$ as $n \to \infty$; see W. Feller (1968),
An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed.,
pp. 52-54, Wiley, New York.]
(ii) Show that if $p = \tfrac{1}{2}$ then $P(E_{2n}) \sim c_2 n^{-1/2}$ as $n \to \infty$.
(iii) Calculate $P(E_n \text{ i.o.})$ for $p \ne \tfrac{1}{2}$ by application of the Borel-Cantelli Lemma
(Chapter 0, Lemma 6.1) and (i). Discuss why this approach fails for $p = \tfrac{1}{2}$.
(Compare Exercise 1.4.)
(iv) Calculate $P(E_n \text{ i.o.})$ for arbitrary values of $p$ by application of the results of this
section.
10. (A Gambler's Ruin) A gambler plays a game in which it is possible to win a unit
amount with probability $p$ or lose a unit amount with probability $q$ at each play.
What is the probability that a gambler with an initial capital $x$ playing against an
infinitely rich adversary will eventually go broke (i) if $p = q = \tfrac{1}{2}$, (ii) if $p > \tfrac{1}{2}$?
11. Suppose that two particles are initially located at integer points $x$ and $y$, respectively.
At each unit of time a particle is selected at random, both being equally likely to
be selected, and is displaced one unit to the right or left with probabilities $p$ and $q$
respectively. Calculate the probability that the two particles will eventually meet.
[Hint: Consider the evolution of the difference between the positions of the two
particles.]
12. Let $\{X_n\}$ be a discrete-time stochastic process with state space $S = \mathbb{Z}$ such that
state $y$ is transient. Let $\tau_N$ denote the proportion of time spent in state $y$ among the
first $N$ time units. Show that $\lim_{N \to \infty} \tau_N = 0$ with probability 1.
13. (Range of Random Walk) Let $\{S_n\}$ be a simple random walk starting at 0 and
define the Range $R_n$ in time $n$ by

$$R_n = \#\{S_0 = 0, S_1, \ldots, S_n\}.$$

$R_n$ represents the number of distinct states visited by the random walk in time 0 to $n$.
(i) Show that $E(R_n/n) \to |p - q|$ as $n \to \infty$. [Hint: Write
$R_n = \sum_{k=0}^{n} I_k$ where $I_0 = 1$, $I_k = I_k(S_1, \ldots, S_k)$ is defined by

$$I_k = \begin{cases} 1 & \text{if } S_k \ne S_j \text{ for all } j = 0, 1, \ldots, k - 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\begin{aligned}
EI_k &= P(S_k - S_{k-1} \ne 0, S_k - S_{k-2} \ne 0, \ldots, S_k \ne 0) \\
&= P(S_j \ne 0, j = 1, 2, \ldots, k) \quad \text{(justify)} \\
&= 1 - P(\text{time of the "first return to 0"} \le k) \\
&= 1 - \sum_{j=1}^{k-1} \{P(T_0^1 = j)p + P(T_0^{-1} = j)q\} \to |p - q| \quad \text{as } k \to \infty.]
\end{aligned}$$

(ii) Verify that $R_n/n \to 0$ in probability as $n \to \infty$ for the symmetric case $p = q = \tfrac{1}{2}$.
[Hint: Use (i) and Chebyshev's (first moment) Inequality (Chapter 0, (2.16)).]
For almost sure convergence, see F. Spitzer (1976), Principles of Random Walk,
Springer-Verlag, New York.
14. (I.I.D. Products) Let $X_1, X_2, \ldots$ be an i.i.d. sequence of nonnegative random
variables having a nondegenerate distribution with $EX_1 = 1$. Define $T_n = \prod_{k=1}^{n} X_k$,
$n \ge 1$. Show that with probability unity, $T_n \to 0$ as $n \to \infty$. [Hint: If $P(X_1 = 0) > 0$
then

$$P(T_n = 0 \text{ for all } n \text{ sufficiently large}) = P(X_k = 0 \text{ for some } k) = \lim_{n \to \infty} (1 - [1 - P(X_1 = 0)]^n) = 1.$$

For the case $P(X_1 = 0) = 0$, take logarithms and apply Jensen's Inequality (Chapter
0, (2.7)) and the SLLN (Chapter 0, Section 0.6) to show $\log T_n \to -\infty$ a.s. Note the
strict inequality in Jensen's Inequality by nondegeneracy.]
15. Let $\{S_n\}$ be a simple random walk starting at 0. Show the following.
(i) If $p = \tfrac{1}{2}$, then $\sum_{n=0}^{\infty} P(S_n = 0)$ diverges.
(ii) If $p \ne \tfrac{1}{2}$, then $\sum_{n=0}^{\infty} P(S_n = 0) = (1 - 4pq)^{-1/2} = |p - q|^{-1}$. [Hint: Apply the
Taylor series generalization of the Binomial theorem to $\sum_n z^n P(S_n = 0)$ noting

$$\binom{2n}{n} (pq)^n = (-1)^n (4pq)^n \binom{-\tfrac{1}{2}}{n}.]$$

(iii) Give a proof of the transience of 0 using (ii) for $p \ne \tfrac{1}{2}$. [Hint: Use the
Borel-Cantelli Lemma (Chapter 0).]
16. Define the backward difference operator by

$$\nabla\phi(x) = \phi(x) - \phi(x - 1).$$

(i) Show that the boundary-value problem (3.4) can be expressed as

$$\tfrac{1}{2}\sigma^2 \nabla^2 \phi + \mu \nabla \phi = 0 \text{ on } (c, d) \quad \text{and} \quad \phi(c) = 0, \ \phi(d) = 1,$$

where $\mu = p - q$ and $\sigma^2 = 2p$.

(ii) Verify that if $p = q$ ($\mu = 0$), then $\phi$ satisfies the following averaging property on
the interior of the interval.

$$\phi(x) = [\phi(x - 1) + \phi(x + 1)]/2 \quad \text{for } c < x < d.$$

Such a function is called a harmonic function.

(iii) Show that a harmonic function on $[c, d]$ must take its maximum and its
minimum on the boundary $\partial = \{c, d\}$.
(iv) Verify that if two harmonic functions agree on $\partial$, then they must coincide on
all of $[c, d]$.
(v) Give an alternate proof of the fact that a symmetric simple random walk starting
at $x$ in $(c, d)$ must eventually reach the boundary based on the above
maximum/minimum principle for harmonic functions. [Hint: Verify that the sum
of two harmonic functions is harmonic and use the above ideas to determine
the minimum of the escape probability from $[c, d]$ starting at $x$, $c \le x \le d$.]
17. Consider the simple random walk with $p \le q$ starting at 0. Let $N_j$ denote the number
of visits to $j > 0$ that occur prior to the first return to 0. Give an argument that
$EN_j = (p/q)^j$. [Hint: The number of excursions to $j$ before returning to 0 has a
geometric distribution. Condition on the first displacement.]

Exercises for Section I.4

1. Let $T_x$ denote the time to reach the boundary for a simple random walk $\{S_n\}$
starting at $x$ in $(c, d)$. Let $\mu = 2p - 1$, $\sigma^2 = 2p$.
(i) Verify that $ET_x < \infty$. [Hint: Take $x = 0$. Choose $N$ such that $P(|S_N| > d - c) \ge \tfrac12$.
Argue that $P(T > rN) \le (\tfrac12)^r$, $r = 1, 2, \ldots$, using the fact that the $r$ sums over
$(jN - N, jN]$, $j = 1, \ldots, r$, are i.i.d. and distributed as $S_N$.]
(ii) Show that $m(x) = ET_x$ solves the boundary value problem

$$\tfrac12\sigma^2\nabla^2 m + \mu\nabla m = -1, \qquad m(c) = m(d) = 0, \qquad \nabla m(x) := m(x) - m(x - 1).$$

(iii) Find an analytic expression for the solution to the nonhomogeneous boundary
value problem for the case $\mu = 0$. [Hint: $m(x) = -x^2$ is a particular solution
and $1, x$ solve the homogeneous problem.]
(iv) Repeat (iii) for the case $\mu \ne 0$. [Hint: $m(x) = (q - p)^{-1}x$ is a particular solution
and $1, (q/p)^x$ solve the homogeneous problem.]
(v) Describe $ET_x$ in the case $c = 0$ and $d \to \infty$.
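A candidate answer to parts (iii) and (iv) can be tested against the one-step recursion $m(x) = 1 + p\,m(x+1) + q\,m(x-1)$, which is the boundary value problem of part (ii) written out. The sketch below uses exact rational arithmetic; the closed forms coded in `exit_time_closed_form` are assumptions built from the hints (a particular solution plus a combination of the homogeneous solutions), not formulas quoted from the text.

```python
from fractions import Fraction


def exit_time_closed_form(x, c, d, p):
    """Candidate closed form for m(x) = E T_x on c <= x <= d (assumed, see lead-in)."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        # particular solution -x^2 plus a linear function, fitted to m(c) = m(d) = 0
        return Fraction((x - c) * (d - x))
    r = q / p
    # particular solution x/(q - p) plus A + B (q/p)^x, fitted to the boundary values
    return (Fraction(x - c) - (d - c) * (1 - r ** (x - c)) / (1 - r ** (d - c))) / (q - p)


def satisfies_bvp(c, d, p):
    """Check m(x) = 1 + p m(x+1) + q m(x-1) on the interior and m(c) = m(d) = 0."""
    p = Fraction(p)
    q = 1 - p

    def m(x):
        return exit_time_closed_form(x, c, d, p)

    interior = all(m(x) == 1 + p * m(x + 1) + q * m(x - 1) for x in range(c + 1, d))
    return interior and m(c) == 0 and m(d) == 0


print(satisfies_bvp(-3, 5, Fraction(1, 2)), satisfies_bvp(-3, 5, Fraction(3, 5)))
```

Because the check is exact, a `True` result confirms the candidate on the chosen interval only; the general statement still requires the argument asked for in the exercise.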
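2. Establish the following identities as consequences of the results of this section.

(i) $$\sum_{N \ge |y|,\ N + y \text{ even}} \frac{|y|}{N}\binom{N}{\frac{N + y}{2}} 2^{-N} = 1 \quad\text{for all } y \ne 0.$$

(ii) For $p > q$,

$$\sum_{N \ge |y|,\ N + y \text{ even}} \frac{|y|}{N}\binom{N}{\frac{N + y}{2}} p^{(N + y)/2} q^{(N - y)/2} = \begin{cases} 1 & \text{for } y > 0, \\ (p/q)^{y} & \text{for } y < 0. \end{cases}$$

(iii) Give the corresponding results for $p < q$.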


3. (A Reflection Property) For the simple symmetric random walk $\{S_n\}$ starting at 0
show that, for $y > 0$,

$$P\Bigl(\max_{n \le N} S_n \ge y\Bigr) = 2P(S_N \ge y) - P(S_N = y).$$


4. Let $S_n = X_1 + \cdots + X_n$, $S_0 = 0$, and suppose that $X_1, X_2, \ldots$ are independent

random variables such that S. S S S 2 , ... have symmetric probability
(i) Show that $P(\max_{n \le N} S_n \ge y) = 2P(S_N \ge y) - P(S_N = y)$ for all $y > 0$.
(ii) Show that $P(\max_{n \le N} S_n \ge x, S_N = y) = P(S_N = y)$ if $y \ge x$ and
$P(\max_{n \le N} S_n \ge x, S_N = y) = P(S_N = 2x - y)$ if $y \le x$.

5. (A Gambler's Ruin) A gambler wins or loses 1 unit with probabilities $p$ and
$q = 1 - p$, respectively, at each play of a game. The gambler has an initial capital
of $x$ units and the adversary has an initial capital of $d > x$ units. The game is played
repeatedly until one of the players is broke.
(i) Calculate the probability that the gambler will eventually go broke.
(ii) What is the expected duration of the game?
6. (Bertrand's Classical Ballot Problem) Candidates A and B have probabilities $p$ and
$1 - p$ ($0 < p < 1$) of winning any particular vote. If A scores $m$ votes and B scores
$n$ votes, $m > n$, then what is the probability that A will maintain a lead throughout
the process of sequentially counting all $m + n$ votes cast? [Hint: P(out of $m + n$
votes cast, A scores $m$ votes, B scores $n$ votes, and A maintains a lead
throughout) $= P(T_{m-n} = m + n)$.]
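For small vote counts the ballot probability can be found by brute force, which gives a target for the first-passage argument in the hint. The sketch below conditions on the totals $m$ and $n$, so every ordering of the votes is equally likely and $p$ drops out; "lead" is read as a strict lead after every vote, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def lead_throughout_count(m, n):
    """Count orderings of m A-votes and n B-votes with A strictly ahead after every vote."""
    total = m + n
    count = 0
    for a_positions in combinations(range(total), m):
        a_set = set(a_positions)
        lead = 0
        ahead = True
        for i in range(total):
            lead += 1 if i in a_set else -1
            if lead <= 0:  # A is tied or behind at this point of the count
                ahead = False
                break
        count += ahead
    return count


def ballot_probability(m, n):
    """Conditional probability that A leads throughout, given the final totals."""
    return Fraction(lead_throughout_count(m, n), comb(m + n, m))


print(ballot_probability(5, 3), ballot_probability(4, 2))
```

The enumerated values agree with the classical ballot-theorem ratio $(m - n)/(m + n)$, which is the conditional form of the probability in the hint.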

7. Let $\{S_n\}$ be the simple random walk starting at 0 and let

$$M_N = \max\{S_n : n = 0, 1, 2, \ldots, N\}, \qquad m_N = \min\{S_n : n = 0, 1, 2, \ldots, N\}.$$

(i) Calculate the distribution of $M_N$.
(ii) Calculate the distribution of $m_N$.
(iii) Calculate the joint distribution of $M_N$ and $S_N$. [Hint: Let $a > 0$, $b$ be integers,
$a \ge b$. Then

$$P(M_N \ge a, S_N \ge b) = P(T_a \le N, S_N \ge b) = \sum_{n=1}^{N} P(T_a = n, S_N \ge b)$$
$$= \sum_{n=1}^{N} P(T_a = n, S_N - S_n \ge b - a) = \sum_{n=1}^{N} P(T_a = n)\,P(S_{N-n} \ge b - a).$$
]

8. What percentage of the particles at y at time N are there for the first time in a dilute
system of many noninteracting (i.e., independent) particles each undergoing a simple
random walk starting at the origin?
*9. Suppose that the points of the state space $S = \mathbb{Z}$ are painted blue with probability
$p$ or green with probability $1 - p$, $0 \le p \le 1$, independently of each other and of a
simple random walk $\{S_n\}$ starting at 0. Let $B$ denote the random set of states (integer
sites) colored blue and let $N_n(p)$ denote the amount of time (occupation time) that
the random walk spends in the set $B$ prior to time $n$, i.e.,

$$N_n(p) = \sum_{k=0}^{n} 1_B(S_k).$$

(i) Show that $EN_n(p) = (n + 1)p$. [Hint: $E1_B(S_k) = E\{E[1_B(S_k) \mid S_k]\}$.]
(ii) Verify that

for p = Z
lim Var{ Nn(p) l - cap
l n ) P(I - P)
for p z.
1p - 9^

[Hint: Use Exercise 13.15.]

(iii) For $p \ne \tfrac12$, use (ii) to show $N_n(p)/n \to p$ in probability as $n \to \infty$. [Hint: Use
Chebyshev's Inequality.]
(iv) For $p = \tfrac12$, show $N_n(p)/n \to p$ in probability. [Hint: Show $\mathrm{Var}(N_n(p)/n) \to 0$ as
$n \to \infty$.]

10. Apply Stirling's Formula, $k! = (2\pi k)^{1/2} k^k e^{-k}(1 + o(1))$ as $k \to \infty$, to show for the
simple symmetric random walk starting at 0 that

(i) $P(T_y = N) \sim |y|\,(2/\pi)^{1/2} N^{-3/2}$ as $N \to \infty$ (through values of $N$ with $N + y$ even).

(ii) $ET_y = \infty$.
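The rate in part (i) can be compared with the exact first-passage probability. The sketch below assumes the first-passage formula of this section in the symmetric case, $P(T_y = N) = \frac{|y|}{N}\binom{N}{(N+y)/2}2^{-N}$, and evaluates it through `math.lgamma` to avoid overflow; the function names are illustrative.

```python
from math import exp, lgamma, log, pi, sqrt


def first_passage_symmetric(y, N):
    """P(T_y = N) = (|y|/N) C(N, (N+y)/2) 2^{-N}, evaluated on the log scale."""
    if N < abs(y) or (N + y) % 2:
        return 0.0  # T_y = N is impossible unless N >= |y| and N + y is even
    up, down = (N + y) // 2, (N - y) // 2
    log_binom = lgamma(N + 1) - lgamma(up + 1) - lgamma(down + 1)
    return abs(y) / N * exp(log_binom - N * log(2.0))


def stirling_ratio(y, N):
    """Ratio of the exact probability to |y| (2/pi)^{1/2} N^{-3/2}."""
    return first_passage_symmetric(y, N) / (abs(y) * sqrt(2.0 / pi) * N ** -1.5)


for N in (10**2, 10**4, 10**6):
    print(N, stirling_ratio(2, N))
```

The ratio should approach 1 along $N$ of the right parity. Since the probabilities decay like $N^{-3/2}$, the series $\sum_N N\,P(T_y = N)$ diverges, which is the content of part (ii).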

Exercises for Section I.5

1. (i) Complete the proof of Pólya's Theorem for $k > 3$. (See Exercise 5 below.)
(ii) Give an alternative proof of transience for $k > 3$ by an application of the
Borel-Cantelli Lemma Part 1 (Chapter 0, (6.1)). Why cannot Part 2 of the
lemma be directly applied to prove recurrence for $k = 1, 2$?
2. Show that

$$\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}.$$

[Hint: Consider the number of ways in which $n$ balls can be selected from a box
of $n$ black and $n$ white balls.]
3. (i) Show that for the 2-dimensional simple symmetric random walk, the probability
of a return to $(0, 0)$ at time $2n$ is the same as that for two independent walkers,
one along the horizontal and the other along the vertical, to be at $(0, 0)$ at
time $2n$. Also verify this by a geometric argument based on two independent
walkers with step size $1/\sqrt{2}$ and viewed along the axes rotated by 45°.

(ii) Show that relations (5.5) hold for a general random walk on the integer lattice
in any dimension. Use these to compute, for the simple symmetric random
walk in dimension two, the probabilities $f_j$ that the random walk returns to
the origin at time $j$ for the first time for $j = 1, \ldots, 8$. Similarly compute $f_j$ in
dimension three for $1 \le j \le 4$.
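The computation asked for in part (ii) is mechanical once a recursion is fixed. The sketch below assumes that relations (5.5) are the usual renewal identities $r_n = \sum_{j=1}^{n} f_j r_{n-j}$ with $r_0 = 1$, where $r_n = P(S_n = 0)$, and it uses the product form $r_{2n} = [\binom{2n}{n}2^{-2n}]^2$ from part (i); both are assumptions about the text rather than quotations from it.

```python
from fractions import Fraction
from math import comb


def return_probs_2d(max_n):
    """r_j = P(S_j = 0) for the planar simple symmetric walk, j = 0..max_n."""
    r = [Fraction(0)] * (max_n + 1)
    for j in range(0, max_n + 1, 2):
        n = j // 2
        r[j] = Fraction(comb(2 * n, n), 4 ** n) ** 2  # product form from part (i)
    return r


def first_return_probs(r):
    """Solve r_n = sum_{j=1}^n f_j r_{n-j} (with r_0 = 1) recursively for f_1, f_2, ..."""
    f = [Fraction(0)] * len(r)
    for n in range(1, len(r)):
        f[n] = r[n] - sum(f[j] * r[n - j] for j in range(1, n))
    return f


f = first_return_probs(return_probs_2d(8))
for j in range(1, 9):
    print(j, f[j])
```

Exact fractions make it easy to compare with a hand count; for instance, of the $4^4$ paths of length 4, 36 are closed and 16 of those already return at time 2, leaving $20/256 = 5/64$ for $f_4$.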
4. (i) Show that the method of Exercise 3(i) above does not hold in $k = 3$ dimensions.
(ii) Show that the motion of three independent simple symmetric random walkers
starting at $(0, 0, 0)$ in $\mathbb{Z}^3$ is transient.
5. Show that the trinomial coefficient

$$\binom{n}{j,\ k,\ n - j - k} = \frac{n!}{j!\,k!\,(n - j - k)!}$$

is largest for $j$, $k$, $n - j - k$ closest to $n/3$. [Hint: Suppose a maximum is attained
for $j = J$, $k = K$. Consider the inequalities of the form

$$\binom{n}{j,\ k,\ n - j - k} \le \binom{n}{J,\ K,\ n - J - K}$$

when $j$, $k$ and/or $n - j - k$ differ from $J$, $K$, $n - J - K$, respectively, by 1. Use
this to show $|n - J - 2K| \le 1$, $|n - K - 2J| \le 1$.]

6. Give a probabilistic interpretation to the relation (see Eq. 5.9) $= 1/(1 - \gamma)$. [Hint:
Argue that the number of returns to 0 is geometrically distributed with parameter $\gamma$.]
7. Show that a multidimensional random walk is transient when the (one-step) mean
displacement is nonzero.
8. Calculate the probability that the simple symmetric $k$-dimensional random walk
will return i.o. to a previously occupied site. [Hint: The conditional probability,
given $S_0, \ldots, S_n$, that $S_{n+1} \notin \{S_0, \ldots, S_n\}$ is at most $(2k - 1)/2k$. Check that
the probability that each of $m$ consecutive steps lands on a new site is at most

$$\left(\frac{2k - 1}{2k}\right)^{m}$$

for each $m \ge 1$.]

9. (i) Estimate (numerically) the expected number ( 1) of returns to the origin.
[Hint: Estimate a bound for C in (5.15) and bound (5.16) with a Riemann sum.]
(ii) Give a numerical estimate of the probability $\gamma$ that the simple random walk
in $k = 3$ dimensions will return to the origin. [Hint: Use (i).]
*10. Calculate the probability that a simple symmetric random walk in $k = 3$ dimensions
will eventually hit a given line $\{(ra, rb, rc): r \in \mathbb{Z}\}$, where $(a, b, c) \ne (0, 0, 0)$ is a
lattice point.
*11. (A Finite Switching Network) Let $F = \{x_1, x_2, \ldots, x_k\}$ be a finite set of $k$ sites
that can be either "on" (1) or "off" (0). At each instant of time a site is randomly
selected and switched from its current state $\varepsilon$ to $1 - \varepsilon$. Let $S = \{0, 1\}^F =
\{(\varepsilon_1, \ldots, \varepsilon_k): \varepsilon_i = 0 \text{ or } 1\}$. Let $X_1, X_2, \ldots$ be i.i.d. $S$-valued random variables with
$P(X_n = e_i) = p_i$, $i = 1, \ldots, k$, where $e_i \in S$ is defined by $e_i(j) = \delta_{i,j}$, and $p_i \ge 0$,
$\sum_i p_i = 1$. Define a random walk on $S$, regarded as a group under coordinatewise
addition mod 2, by

$$S_0 = (0, \ldots, 0), \qquad S_n = X_1 + \cdots + X_n \pmod 2, \quad n \ge 1.$$

Show that the configuration in which all switches are off is recurrent in the cases
$k = 1, 2$. The general case will follow from the methods and theory of Chapter II
when $k < \infty$. The problem when $k = \infty$ has an interesting history: see F. Spitzer
(1976), Principles of Random Walk, Springer-Verlag, New York, and references therein.
*12. Use Exercise 11 above and the examples of random walks on Z to arrive at a
general formulation of the notion of a random walk on a group. Describe a random
walk on the unit circle in the complex plane as an illustration of your ideas.
13. Let $\{X_n\}$ denote a recurrent random walk on the 1-dimensional integer lattice.
Show that

$$E_x(\text{number of visits to } x \text{ before hitting } 0) = +\infty, \qquad x \ne 0.$$

[Hint: Translate the problem by $x$ and consider that starting from 0, the number
of visits to 0 before hitting $x$ is bounded below by the number of visits to 0
before leaving the (open) interval centered at 0 of length $|x|$. Use monotonicity to
pass to the limit.]

Exercises for Section I.6

1. Let $X$ be an $(n + 1)$-dimensional Gaussian random vector and $A$ an arbitrary linear
transformation from $\mathbb{R}^{n+1}$ to $\mathbb{R}^n$. Show that $AX$ is a Gaussian random vector in $\mathbb{R}^n$.
2. (i) Determine the consistent specification of finite-dimensional distributions for the
canonical construction of the simple random walk.
(ii) Verify Kolmogorov consistency for Example 3.
*3. (A Kolmogorov Extension Theorem: Special Case) This exercise shows the role of
topology in proving what otherwise seems a (purely) measure-theoretical assertion
in the simplest case of Kolmogorov's theorem. Let $\Omega = \{0, 1\}^{\infty}$ be the product space
consisting of sequences of the form $\omega = (\omega_1, \omega_2, \ldots)$ with $\omega_i \in \{0, 1\}$. Give $\Omega$ the
product topology for the discrete topology on $\{0, 1\}$. By Tychonoff's Theorem from
topology, this makes $\Omega$ compact. Let $X_n$ be the $n$th coordinate projection mapping
on $\Omega$. Let $\mathcal{F}$ be the Borel sigmafield for $\Omega$.
(i) Show that $\mathcal{F}$ coincides with the sigmafield for $\Omega$ generated by events of the form

$$F(\varepsilon_1, \ldots, \varepsilon_n) = \{\omega \in \Omega: \omega_i = \varepsilon_i \text{ for } i = 1, 2, \ldots, n\},$$

for an arbitrarily prescribed sequence $\varepsilon_1, \ldots, \varepsilon_n$ of 1's and 0's. [Hint:
$\rho(\omega, \eta) = \sum_{n=1}^{\infty} |\omega_n - \eta_n|/2^n$ metrizes the product topology on $\Omega$. Consider the
open balls of radii of the form $r = 2^{-N}$ centered at sequences which are 0 from
some $n$ onward and use separability.]

(ii) Let $\{P_n\}$ be a consistent family of probability measures, with $P_n$ defined on
$(\mathbb{R}^n, \mathcal{B}^n)$, and such that $P_n$ is concentrated on $\Omega_n = \{0, 1\}^n$. Define a set function
for events of the form $F = F(\varepsilon_1, \ldots, \varepsilon_n)$ in (i), by

$$P(F) = P_n(\{\omega_i = \varepsilon_i,\ i = 1, 2, \ldots, n\}).$$

Show that there is a unique probability measure $P$ on the sigmafield $\mathcal{F}_n$ of
cylinder sets of the form

$$C = \{\omega \in \Omega: (\omega_1, \ldots, \omega_n) \in B\},$$

where $B \subset \{0, 1\}^n$, which agrees with this formula for $F \in \mathcal{F}_n$, $n \ge 1$.
(iii) Show that $\mathcal{F}^0 := \bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a field of subsets of $\Omega$ but not a sigmafield.
(iv) Show that $P$ is a countably additive measure on $\mathcal{F}^0$. [Hint: $\Omega$ is compact and
the cylinder sets are both open and closed for the product topology on $\Omega$.]
(v) Show that $P$ has a unique extension to a probability measure on $\mathcal{F}$. [Hint:
Invoke the Carathéodory Extension Theorem (Chapter 0, Section 1) under (iii), (iv).]
(vi) Show that the above arguments also apply to any finite-state discrete-parameter
stochastic process.
4. Let $(\Omega, \mathcal{F}, P)$ and $(S, \mathcal{S})$ represent measurable spaces. A function $X$ defined on $\Omega$
and taking values in $S$ is called measurable if $X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{S}$, where
$X^{-1}(B) = \{X \in B\} = \{\omega \in \Omega: X(\omega) \in B\}$. This is the meaning of an $S$-valued random
variable. The distribution of $X$ is the induced probability measure $Q$ on $\mathcal{S}$ defined by

$$Q(B) = P(X^{-1}(B)) = P(X \in B), \qquad B \in \mathcal{S}.$$

Let $(\Omega, \mathcal{F}, P)$ be the canonical model for nonterminating repeated tosses of a coin
and $X_n(\omega) = \omega_n$, $\omega \in \Omega$. Show that $\{X_m, X_{m+1}, \ldots\}$, $m$ an arbitrary positive integer,
is a measurable function on $(\Omega, \mathcal{F}, P)$ taking values in $(\Omega, \mathcal{F})$ with the distribution
$P$; i.e., $\{X_m, X_{m+1}, \ldots\}$ is a noncanonical model for an infinite sequence of coin tosses.
5. Suppose that $D^{(1)}$ and $D^{(2)}$ are covariance matrices.
(i) Verify that $\alpha D^{(1)} + \beta D^{(2)}$, $\alpha, \beta \ge 0$, is a covariance matrix.
(ii) Let $\{D^{(n)} = ((\sigma_{ij}^{(n)}))\}$ be a sequence of covariance matrices ($k \times k$) such that
$\lim_{n} \sigma_{ij}^{(n)} = \sigma_{ij}$ exists. Show that $D = ((\sigma_{ij}))$ is a covariance matrix.
*6. Let $\phi(t) = \int e^{itx}\mu(dx)$ be the Fourier transform of a positive finite measure $\mu$
(Chapter 0, (8.46)).
(i) Show that $((\phi(t_i - t_j)))$ is a nonnegative definite matrix for any $t_1 < \cdots < t_k$.
(ii) Show that $\phi(t) = e^{-|t|}$ if $\mu$ is the Cauchy distribution.
*7. (Pólya Criterion for Characteristic Functions) Suppose that $\phi$ is a real-valued
nonnegative function on $(-\infty, \infty)$ with $\phi(-t) = \phi(t)$ and $\phi(0) = 1$. Show that if $\phi$
is continuous and convex on $[0, \infty)$, then $\phi$ is the Fourier transform (characteristic
function) of a probability distribution (in particular, for any $t_1 < t_2 < \cdots < t_k$, $k \ge 1$,
$((\phi(t_i - t_j)))$ is nonnegative definite by Exercise 6), via the following steps.
(i) Check that

$$\hat\gamma(t) = \begin{cases} 1 - |t|, & |t| \le 1,\ t \in \mathbb{R}^1, \\ 0, & |t| > 1, \end{cases}$$

is the characteristic function of the (probability) measure with p.d.f.
$\gamma(x) = (1 - \cos x)/\pi x^2$, $-\infty < x < \infty$. Use the Fourier inversion formula
[(8.43), Chapter 0].
(ii) Check that the characteristic function of a convex combination (mixture) of
probability distributions is the corresponding convex combination of
characteristic functions.
(iii) Let $0 < a_1 < \cdots < a_n$ be real numbers, $p_i \ge 0$, $\sum_{i=1}^{n} p_i = 1$. Show that
$\hat\phi(t) = p_1\hat\gamma(t/a_1) + \cdots + p_n\hat\gamma(t/a_n)$ is a characteristic function. Draw a graph
of $\hat\phi(t)$ and check that the slope of the segment between $a_k$ and $a_{k+1}$
is $-[(p_{k+1}/a_{k+1}) + \cdots + (p_n/a_n)]$, $k = 1, 2, \ldots, n - 1$. Interpret the numbers
$p_1, p_1 + p_2, \ldots, p_1 + \cdots + p_n = 1$ along the vertical axis with reference to the
polygonal graph of $\hat\phi(t)$.
(iv) Show that a function $\phi(t)$ satisfying Pólya's criterion can be approximated by
a function of the form (iii) to arbitrary accuracy. [Hint: Approximate $\phi(t)$ by
a polygonal path consisting of $n$ segments of decreasing slopes.]
(v) Show that the (pointwise) limit of characteristic functions is a characteristic
function if the limit is continuous at $t = 0$.
*8. Show that the following define covariance functions for a Gaussian process.

(i) $\sigma_{ij} = e^{-|i - j|}$, $i, j = 0, 1, 2, \ldots$

(ii) $\sigma_{ij} = \min(i, j) = (|i| + |j| - |i - j|)/2$.

[Hint: Use Exercises 6 and 7.]
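A finite block of a candidate covariance function can be tested for positive definiteness by attempting a Cholesky factorization. The sketch below is a plain-Python check of the two kernels of Exercise 8 on a small index set; it illustrates the claim and does not replace the argument via Exercises 6 and 7. The helper `cholesky` and the block size are illustrative choices.

```python
from math import exp, sqrt


def cholesky(a, tol=1e-12):
    """Return lower-triangular L with L L^T = a, or None if some pivot is not positive."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return None  # matrix is not (numerically) positive definite
                L[i][j] = sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


n = 6
exp_cov = [[exp(-abs(i - j)) for j in range(n)] for i in range(n)]
# indices 1..n for min(i, j); index 0 would contribute a zero row (the value at time 0)
min_cov = [[float(min(i, j)) for j in range(1, n + 1)] for i in range(1, n + 1)]

print(cholesky(exp_cov) is not None, cholesky(min_cov) is not None)
```

For $\min(i, j)$ the factor $L$ is the lower-triangular matrix of ones, which reflects the representation of the process as partial sums of independent standard normal increments.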

Exercises for Section I.7

1. Verify that $(B_{t_1}, \ldots, B_{t_m})$ has an $m$-dimensional Gaussian distribution and calculate
the mean and the variance-covariance matrix using the fact that the Brownian
motion has independent Gaussian increments.
2. Let $\{X_t\}$ be a process with stationary and independent increments starting at 0 with
$EX_s^2 < \infty$ for $s > 0$. Assume $EX_t$, $EX_t^2$ are continuous functions of $t$.
(i) Show that $EX_t = mt$ for some constant $m$.
(ii) Show that $\mathrm{Var}\,X_t = \sigma^2 t$ for some constant $\sigma^2 > 0$.
(iii) Calculate the limiting distribution of

$$Y_t^{(n)} = \frac{1}{\sqrt{n}}\,(X_{nt} - mnt)$$

as $n \to \infty$, for $t > 0$ fixed.

3. (i) (Diffusion Limit Scalings) Let $X_t$ be a random variable with mean
$x + tf(p - q)\Delta$ and variance $tf\Delta^2 pq$. Give a direct calculation of $p$ and $q = 1 - p$
in terms of $f$ and $\Delta$ using only the requirements that the mean and variance of
$X_t - x$ should stabilize to some limiting values proportional to $t$ as $f \to \infty$ and
$\Delta \to 0$.
(ii) Verify convergence to the Gaussian distribution for the distribution of (Q//)S1 , j
as n -* co, where {S} is the simple random walk with p = (p/2 f) + Z, by
application of Liapunov's CLT (Chapter 0, Corollary 7.3).


4. Let $\{X_t\}$ be a Brownian motion starting at 0 with diffusion coefficient $\sigma^2 > 0$ and
zero drift.
(i) Show that the process has the following scaling property. For each $\lambda > 0$ the
process $\{Y_t\}$ defined by $Y_t = \lambda^{-1/2}X_{\lambda t}$ is distributed exactly as the process $\{X_t\}$.
(ii) How does (i) extend to $k$-dimensional Brownian motion?
5. Let $\{X_t\}$ be a stochastic process which has stationary and independent increments.
(i) Show that the distribution of the increments must be infinitely divisible; i.e., for
each integer $n$, the distribution of $X_t - X_s$ ($s < t$) can be expressed as an $n$-fold
convolution of a probability measure $\mu_n$.
(ii) Suppose that the increment $X_t - X_s$ has the Cauchy distribution with p.d.f.
$(t - s)/\pi[(t - s)^2 + x^2]$ for $s < t$, $x \in \mathbb{R}^1$. Show that the Cauchy process so
described is invariant under the rescaling $\{Y_t\}$ where $Y_t = \lambda^{-1}X_{\lambda t}$ for $\lambda > 0$; i.e.,
$\{Y_t\}$ has the same distribution as $\{X_t\}$. (This process can be constructed by
methods of theoretical complements 1, 2 to Section IV.1.)
6. Let $\{X_t\}$ be a Brownian motion starting at 0 with zero drift and diffusion coefficient
$\sigma^2 > 0$. Define $Y_t = |X_t|$, $t \ge 0$.
(i) Calculate $EY_t$, $\mathrm{Var}\,Y_t$.
(ii) Is $\{Y_t\}$ a process with independent increments?
7. Let $R_t = X_t^2$, where $\{X_t\}$ is a Brownian motion starting at 0 with zero drift and
diffusion coefficient $\sigma^2 > 0$. Calculate the distribution of $R_t$.
8. Let $\{B_t\}$ be a standard Brownian motion starting at 0. Define

$$V_n = \sum_{i=1}^{2^n} \bigl|B_{i/2^n} - B_{(i-1)/2^n}\bigr|.$$

(i) Verify that $EV_n = 2^{n/2}E|B_1|$.
(ii) Show that $\mathrm{Var}\,V_n = \mathrm{Var}\,|B_1|$.
(iii) Show that with probability one, $\{B_t\}$ is not of bounded variation on $0 \le t \le 1$.
[Hint: Show that $V_{n+1} \ge V_n$, $n \ge 1$, and, using Chebyshev's Inequality,
$P(V_n > M) \to 1$ as $n \to \infty$ for any $M > 0$.]
9. The quadratic variation over $[0, t]$ of a function $f$ on $[0, \infty)$ is defined by
$V(f) = \lim_{n \to \infty} v_n(t, f)$, where

$$v_n(t, f) = \sum_{k=1}^{2^n} [f(kt/2^n) - f((k - 1)t/2^n)]^2,$$

provided the limit exists.

(i) Show if $f$ is continuous and of bounded variation then $V(f) = 0$.

(ii) Show that $Ev_n(t, \{X_s\}) \to \sigma^2 t$ as $n \to \infty$ for Brownian motion $\{X_t\}$ with
diffusion coefficient $\sigma^2$ and drift $\mu$.

(iii) Verify that $v_n(t, \{X_s\}) \to \sigma^2 t$ in probability as $n \to \infty$.
(*iv) Show that the limit in (iii) holds almost surely. [Hint: Use Borel-Cantelli
Lemma and Chebyshev Inequality.]
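The contrast between Exercises 8 and 9 shows up clearly in simulation: over dyadic partitions of $[0, 1]$ the sum of absolute increments of standard Brownian motion grows like $2^{n/2}E|B_1|$, while the sum of squared increments settles near $t = 1$. The sketch below draws independent $N(0, 2^{-n})$ increments with Python's `random` module; the function name and the seeds are illustrative, and each level $n$ uses a fresh path, so the monotonicity $V_{n+1} \ge V_n$ along a single path is not displayed.

```python
import random
from math import pi, sqrt


def dyadic_variations(n, seed=0):
    """Simulate 2^n Brownian increments on [0, 1]; return (V_n, v_n(1, B))."""
    rng = random.Random(seed)
    steps = 2 ** n
    sd = sqrt(1.0 / steps)  # each increment is N(0, 2^{-n})
    increments = [rng.gauss(0.0, sd) for _ in range(steps)]
    total_variation = sum(abs(dx) for dx in increments)   # V_n of Exercise 8
    quadratic_variation = sum(dx * dx for dx in increments)  # v_n(1, B) of Exercise 9
    return total_variation, quadratic_variation


for n in (4, 8, 12):
    V, v = dyadic_variations(n, seed=n)
    # E V_n = 2^{n/2} E|B_1| with E|B_1| = sqrt(2/pi)
    print(n, round(V, 3), round(2 ** (n / 2) * sqrt(2 / pi), 3), round(v, 4))
```

The second and third printed columns should track each other, and the last column should be close to 1, consistent with $\mathrm{Var}\,V_n = \mathrm{Var}\,|B_1|$ staying bounded while $EV_n$ diverges.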
10. Let $\{X_t\}$ be the $k$-dimensional Brownian motion with drift $\mu$ and diffusion coefficient
matrix $D$. Calculate the mean of $X_t$ and the variance-covariance matrix of $X_t$.


11. Let $\{X_t\}$ be any mean zero Gaussian process. Let $t_1 < t_2 < \cdots < t_n$.
(i) Show that the characteristic function of $(X_{t_1}, \ldots, X_{t_n})$ is of the form $e^{-\frac12 Q(\xi)}$ for
some quadratic form $Q(\xi) = \langle A\xi, \xi\rangle$.
(ii) Establish the pair-correlation decomposition formula for block correlations:

$$E\{X_{t_1}X_{t_2}\cdots X_{t_n}\} = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ \sum^{*} E\{X_{t_i}X_{t_j}\}\cdots E\{X_{t_m}X_{t_k}\} & \text{if } n \text{ is even}, \end{cases}$$

where $\sum^{*}$ denotes the sum taken over all possible decompositions into all possible
disjoint pairs $\{t_i, t_j\}, \ldots, \{t_m, t_k\}$ obtained from $\{t_1, \ldots, t_n\}$. [Hint: Use induction
on derivatives of the (multivariate) characteristic function at $(0, 0, \ldots, 0)$ by
first observing that $\partial e^{-\frac12 Q(\xi)}/\partial\xi_i = -x_i e^{-\frac12 Q(\xi)}$ and $\partial x_i/\partial\xi_j = a_{ij}$, where
$x_i = \sum_j a_{ij}\xi_j$ and $A = ((a_{ij}))$.]
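The sum $\sum^{*}$ over pair decompositions in Exercise 11(ii) is easy to enumerate for small $n$, which helps when checking the induction by hand. The sketch below is one way to do it, assuming the covariance matrix is supplied as a nested list; the names `pairings` and `gaussian_moment` are illustrative.

```python
def pairings(items):
    """Yield every way of splitting the list `items` into disjoint unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_moment(cov, indices):
    """E[X_{i1} ... X_{in}] for a mean-zero Gaussian vector with covariance matrix `cov`."""
    if len(indices) % 2:
        return 0.0  # odd block correlations vanish
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for a, b in pairing:
            term *= cov[a][b]
        total += term
    return total


cov = [[2.0, 0.5], [0.5, 1.0]]
print(gaussian_moment(cov, [0, 0, 0, 0]), gaussian_moment(cov, [0, 0, 1, 1]))
```

Repeated indices are allowed, so the same routine returns ordinary moments: the first call gives $3\sigma^4$ for a variance of 2, and the number of pairings of $2m$ items is $(2m - 1)!!$.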

Exercises for Section I.8

1. Construct a matrix representation of the linear transformation on $\mathbb{R}^k$ of the
increments $(x_{t_1}, x_{t_2} - x_{t_1}, \ldots, x_{t_k} - x_{t_{k-1}})$ to $(x_{t_1}, x_{t_2}, \ldots, x_{t_k})$.

*2. (i) Show that the functions $f, g: C[0, \infty) \to \mathbb{R}^1$ defined by

$$f(\omega) = \max_{0 \le t \le 1} \omega(t), \qquad g(\omega) = \min_{0 \le t \le 1} \omega(t)$$

are continuous for the topology of uniform convergence on bounded intervals.

(ii) Show that the set $\{f(\{X_t^{(n)}\}) \le x\}$ is a Borel subset of $C[0, \infty)$ if $f$ is continuous
on $C[0, \infty)$.
(iii) Let $f: C[0, \infty) \to \mathbb{R}^k$ be continuous. Explain how it follows from the definition
of weak convergence on $C[0, \infty)$ given after the statement of the FCLT (pages
23-24) that the random vectors $f(\{X_t^{(n)}\})$ must converge in distribution to $f(\{X_t\})$.
(iv) Show that convergence in distribution on $C[0, \infty)$ implies convergence of the
finite-dimensional distributions. Exercise 3 below shows that the converse is
not true in general.
*3. Suppose that for each $n = 1, 2, \ldots$, $\{x_n(t), 0 \le t \le 1\}$ is the deterministic process
whose sample path is the continuous function whose graph is given by Figure Ex.I.8.
(i) Show that the finite-dimensional distributions converge to those of the a.s.
identically zero process $\{z(t)\}$, i.e., $z(t) \equiv 0$, $0 \le t \le 1$.
(ii) Check that $\max_{0 \le t \le 1} x_n(t)$ does not converge to $\max_{0 \le t \le 1} z(t)$ in distribution.

Figure Ex.I.8 (graph of $x_n(t)$; the horizontal axis is marked at $1/2n$ and $1/n$)

*4. Give an example to demonstrate that it is not the case that the FCLT gives
convergence of probabilities of all infinite-dimensional events in $C[0, \infty)$. [Hint:
The polygonal process has finite total variation over $0 \le t \le 1$ with probability 1.
Compare with Exercise 7.8.]
5. Verify that the probability density function $p(t; x, y)$ of the position at time $t$ of the
Brownian motion starting at $x$ with drift $\mu$ and diffusion coefficient $\sigma^2$ solves the
so-called Fokker-Planck equation (for fixed $x$) given by

$$\frac{\partial p}{\partial t} = \frac12\sigma^2\frac{\partial^2 p}{\partial y^2} - \mu\frac{\partial p}{\partial y}.$$

(i) Check that for fixed $y$, $p$ also satisfies the adjoint equation

$$\frac{\partial p}{\partial t} = \frac12\sigma^2\frac{\partial^2 p}{\partial x^2} + \mu\frac{\partial p}{\partial x}.$$

(ii) Show that $p$ is a symmetric function of $x$ and $y$ if and only if $\mu = 0$.

(iii) Let $c(t, y) = \int c_0(x)p(t; x, y)\,dx$, where $c_0$ is a positive bounded initial
concentration smoothly distributed over a finite interval. Verify that $c(t, y)$
solves the Fokker-Planck equation with initial condition $c_0$, i.e.,

$$\frac{\partial c}{\partial t} = \frac12\sigma^2\frac{\partial^2 c}{\partial y^2} - \mu\frac{\partial c}{\partial y}, \qquad c(0, y) = c_0(y).$$
6. (Collective Risk in Actuary Science) Suppose that an insurance company has an
initial reserve (total assets) of $X_0 > 0$ units. Policy holders are charged a (gross)
risk premium rate $\alpha$ per unit time and claims are made at an average rate $\lambda$. The
average claim amount is $\mu$ with variance $\sigma^2$. Discuss modeling the risk reserve
process $\{X_t\}$ as a Brownian motion starting at $x$ with drift coefficient of the form
$\alpha - \mu\lambda$ and diffusion coefficient $\lambda\sigma^2$, on some scale.
7. (Law of Proportionate Effect) A material (e.g., pavement) is subject to a succession
of random impacts or loads in the form of positive random variables $L_1, L_2, \ldots$
(e.g., traffic). It is assumed that the (measure of) material strength $T_k$ after the $k$th
impact is proportional to the strength $T_{k-1}$ at the preceding stage through the
applied load $L_k$, $k = 1, 2, \ldots$, i.e., $T_k = L_k T_{k-1}$. Assume an initial strength $T_0 \equiv 1$
as normalization, and that $E(\log L_1)^2 < \infty$. Describe conditions under which it is
appropriate to consider the geometric Brownian motion defined by $\{\exp(\mu t + \sigma^2 B_t)\}$,
where $\{B_t\}$ is standard Brownian motion, as a model for the strength process.
8. Let $X_1, X_2, \ldots$ be i.i.d. random variables with $EX_n = 0$, $\mathrm{Var}\,X_n = \sigma^2 > 0$. Let
$S_n = X_1 + \cdots + X_n$, $n \ge 1$, $S_0 = 0$. Express the limiting distribution of each of the
random variables defined below in terms of the distribution of the appropriate
random variable associated with Brownian motion having drift 0 and diffusion
coefficient $\sigma^2 > 0$.
(i) Fix $\theta > 0$, $Y_n = n^{-\theta/2}\max\{|S_k|^{\theta}: 1 \le k \le n\}$.
(ii) $Y_n = n^{-1/2}S_n$.
(iii) $Y_n = n^{-3/2}\sum_{k=1}^{n} S_k$. [Hint: Consider the integral of $t \to S_{[nt]}$, $0 \le t \le 1$.]
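9. (i) Write $R_n(x) = |(1 + x/n)^n - e^x|$. Show that

$$R_n(x) \le \sum_{r=1}^{n}\left[1 - \Bigl(1 - \frac{1}{n}\Bigr)\cdots\Bigl(1 - \frac{r - 1}{n}\Bigr)\right]\frac{|x|^r}{r!} + \frac{|x|^{n+1}}{(n + 1)!}\,e^{|x|},$$

$$\sup_{|x| \le c} R_n(x) \to 0 \quad\text{as } n \to \infty \text{ (for every } c > 0\text{)}.$$

(ii) Use (i) to prove (8.6). [Hint: Use Taylor's theorem for the inequality, and
Lebesgue's Dominated Convergence Theorem (Chapter 0, Section 0.3).]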

Exercises for Section I.9

1. (i) Use the SLLN to show that the Brownian motion with nonzero drift is transient.
(ii) Extend (i) to the k-dimensional Brownian motion with drift.
2. Let $X_t = X_0 + vt$, $t \ge 0$, where $v$ is a nonrandom constant-rate parameter and $X_0$
is a random variable.
(i) Calculate the conditional distribution of $X_t$, given $X_s = x$, for $s < t$.
(ii) Show that all states are transient if $v \ne 0$.
(iii) Calculate the distribution of $X_t$ if the initial state is normally distributed with
mean $\mu$ and variance $\sigma^2$.
3. Let $\{X_t\}$ be a Brownian motion starting at 0 with diffusion coefficient $\sigma^2 > 0$ and
zero drift.
(i) Define $\{Y_t\}$ by $Y_t = tX_{1/t}$ for $t > 0$ and $Y_0 = 0$. Show that $\{Y_t\}$ is distributed as
Brownian motion starting at 0. [Hint: Use the law of large numbers to prove
sample path continuity at $t = 0$.]
(ii) Show that $\{X_t\}$ has infinitely many zeros in every neighborhood of $t = 0$ with
probability 1.
(iii) Show that the probability that $t \to X_t$ has a right-hand derivative at $t = 0$ is zero.
(iv) Use (iii) to provide another example to Exercise 8.4.
4. Show that the distribution of $\min_{t \ge 0} X_t$ is exponential if $\{X_t\}$ is Brownian motion
starting at 0 with drift $\mu > 0$. Likewise, calculate the distribution of $\max_{t \ge 0} X_t$ when $\mu < 0$.
*5. Let $\{S_n\}$ denote the simple symmetric random walk starting at 0, and let

$$m_n = \min_{0 \le k \le n} S_k, \qquad M_n = \max_{0 \le k \le n} S_k, \qquad n = 1, 2, \ldots.$$

Let $\{B_t\}$ denote a standard Brownian motion and let $m = \min_{0 \le t \le 1} B_t$,
$M = \max_{0 \le t \le 1} B_t$. Then, by the FCLT, $n^{-1/2}(m_n, M_n, S_n)$ converges in distribution
to $(m, M, B_1)$; for rigorous justification use theoretical complements 1.8, 1.9 noting
that the functional $\omega \to (\min_{0 \le t \le 1}\omega_t, \max_{0 \le t \le 1}\omega_t, \omega_1)$ is a continuous map of the
metric space $C[0, 1]$ into $\mathbb{R}^3$. For notational convenience, let

$$p_n(j) = P(S_n = j), \qquad p_n(u, v, y) = P(u < m_n \le M_n < v,\ S_n = y),$$

for integers $u, v, y$ such that $u \le 0 \le v$, $u < v$ and $u \le y \le v$. Also let
$\Phi(a, b) = P(a < Z < b)$, where $Z$ has the standard normal distribution. The following
use of the reflection principle is taken from an exercise in P. Billingsley (1968),
Convergence of Probability Measures, Wiley, New York, p. 86. These results for
Brownian motion are also obtained by other methods in Chapter V.
(i) $p_n(u, v, y) = p_n(y) - \pi(v, y) - \pi(u, y) + \pi(v, u, y) + \pi(u, v, y) - \pi(v, u, v, y)
- \pi(u, v, u, y) + \cdots$, where for any fixed sequence of nonnegative integers
$y_1, y_2, \ldots, y_k, y, k, n$, $\pi(y_1, y_2, \ldots, y_k, y)$ denotes the probability that an $n$-step
random walk meets $y_1$ (at least once), then meets $-y_2$, then meets $y_3$, ...,
then meets $(-1)^{k-1}y_k$, and ends at $y$.

(ii) $\pi(y_1, y_2, \ldots, y_k, y) = p_n(2y_1 + 2y_2 + \cdots + 2y_{k-1} + (-1)^{k+1}y)$ if $(-1)^{k+1}y \ge y_k$,
$\pi(y_1, y_2, \ldots, y_k, y) = p_n(2y_1 + 2y_2 + \cdots + 2y_k - (-1)^{k+1}y)$ if $(-1)^{k+1}y \le y_k$.
[Hint: Use Exercise 4.4(ii), the reflection principle, and induction on $k$. Reflect
through $(-1)^k y_{k-1}$ the part of the path to the right of the first passage through
that point following successive passages through $y_1, -y_2, \ldots, (-1)^{k-1}y_{k-2}$.]

(iii) $$p_n(u, v, y) = \sum_{k=-\infty}^{\infty} p_n(y + 2k(v - u)) - \sum_{k=-\infty}^{\infty} p_n(2v - y + 2k(v - u)).$$

(iv) For integers $u \le 0 \le v$, $u \le y_1 < y_2 \le v$,

$$P(u < m_n \le M_n < v,\ y_1 < S_n < y_2) = \sum_{k=-\infty}^{\infty} P(y_1 + 2k(v - u) < S_n < y_2 + 2k(v - u))$$
$$- \sum_{k=-\infty}^{\infty} P(2v - y_2 + 2k(v - u) < S_n < 2v - y_1 + 2k(v - u)).$$

[Hint: Sum over $y$ in (iii).]

(v) For real numbers $u \le 0 \le v$, $u \le y_1 < y_2 \le v$,

$$P(u < m \le M < v,\ y_1 < B_1 < y_2) = \sum_{k=-\infty}^{\infty} \Phi(y_1 + 2k(v - u),\ y_2 + 2k(v - u))$$
$$- \sum_{k=-\infty}^{\infty} \Phi(2v - y_2 + 2k(v - u),\ 2v - y_1 + 2k(v - u)).$$

[Hint: Respectively substitute the integers $[u\sqrt{n}]$, $[v\sqrt{n}]$, $[y_1\sqrt{n}]$,
$[y_2\sqrt{n}]$ into (iv) ($[\ ]$ denoting the greatest integer function). Use Scheffé's
Theorem (Chapter 0) to justify the interchange of limit with summation over $k$.]

(vi) $P(M < v,\ y_1 < B_1 < y_2) = \Phi(y_1, y_2) - \Phi(2v - y_2, 2v - y_1)$.

[Hint: Take $u = -n - 1$ in (iv) and then pass to the limit.]

(vii) $$P(u < m \le M < v) = \sum_{k=-\infty}^{\infty} (-1)^k\,\Phi(u + k(v - u),\ v + k(v - u)).$$

[Hint: Take $y_1 = u$, $y_2 = v$ in (v).]

(viii) $$P\Bigl(\sup_{0 \le t \le 1}|B_t| < v\Bigr) = \sum_{k=-\infty}^{\infty} (-1)^k\,\Phi((2k - 1)v,\ (2k + 1)v).$$

[Hint: Take $u = -v$ in (vii).]
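The alternating series in (viii) can be evaluated directly and compared with an independent expression for the same probability. The sketch below uses the reflection series as reconstructed above together with the classical eigenfunction expansion $\frac{4}{\pi}\sum_{k \ge 0}\frac{(-1)^k}{2k+1}\exp\{-(2k+1)^2\pi^2/8v^2\}$ for Brownian motion confined to $(-v, v)$ up to time 1. That second formula is a standard fact brought in for comparison and is not part of this exercise; function names and truncation levels are illustrative.

```python
from math import erf, exp, pi, sqrt


def Phi(a, b):
    """P(a < Z < b) for a standard normal Z."""
    def cdf(x):
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return cdf(b) - cdf(a)


def sup_abs_reflection(v, terms=50):
    """Truncation of sum_k (-1)^k Phi((2k-1)v, (2k+1)v) over |k| <= terms."""
    return sum((-1) ** k * Phi((2 * k - 1) * v, (2 * k + 1) * v)
               for k in range(-terms, terms + 1))


def sup_abs_eigen(v, terms=50):
    """Eigenfunction series for P(sup_{t<=1} |B_t| < v), used here only as a cross-check."""
    return 4.0 / pi * sum(
        (-1) ** k / (2 * k + 1) * exp(-((2 * k + 1) ** 2) * pi ** 2 / (8.0 * v * v))
        for k in range(terms)
    )


for v in (0.5, 1.0, 2.0):
    print(v, sup_abs_reflection(v), sup_abs_eigen(v))
```

The reflection series converges fastest for small $v$ and the eigenfunction series for large $v$, so agreement across several values of $v$ is a reasonable sanity check on the signs and the spacing $2v$ of the reflected intervals.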

Exercises for Section I.10

*1. (i) Show that (10.2) holds at each point $z \le 0$ ($\ge 0$) of continuity of the distribution
function for $\min_{0 \le s \le t} X_s$ ($\max_{0 \le s \le t} X_s$). [Hint: These latter functionals are continuous.]
(ii) Use (i) and (10.9) to assert (10.2) for all $z$.
2. Calculate the probability that a Brownian motion with drift $\mu$ and diffusion coefficient
a 2 > 0 starting at x will reach y : x in time t or less.
3. Suppose that solute particles are undergoing Brownian motion in the horizontal
direction in a semi-infinite tube whose left end acts as an absorbing boundary in
the sense that when a particle reaches the left end it is taken out of the flow. Assume
that initially a proportion $\psi(x)\,dx$ of the particles are present in the element of
volume between $x$ and $x + dx$ from the left end, so that $\int_0^{\infty}\psi(x)\,dx = 1$. For a given
drift $\mu$ away from the left end and diffusion coefficient $\sigma^2 > 0$, calculate the fraction
of particles eventually absorbed. What if $\mu = 0$?
4. Two independent Brownian motions with drift $\mu_i$ and diffusion coefficient $\sigma_i^2$, $i = 1, 2$,
are found at time $t = 0$ at positions $x_i$, $i = 1, 2$, with $x_1 < x_2$.
(i) Calculate the probability that the two particles will never meet.
(ii) Calculate the probability that the particles will meet before time $s > 0$.
5. (i) Calculate the distribution of the maximum value of the Brownian motion
starting at 0 with drift $\mu$ and diffusion coefficient $\sigma^2$ over the time period $[0, t]$.
*(ii) For the case $\mu = 0$ give a geometric "reflection" argument that $P(\max_{0 \le s \le t} X_s \ge y)
= 2P(X_t \ge y)$. Use (i) to verify this.
6. Calculate the distribution of the minimum value of a Brownian motion starting at
0 with drift $\mu$ and diffusion coefficient $\sigma^2$ over the time period $[0, t]$.
7. Let $\{B_t\}$ be standard Brownian motion starting at 0 and let $a, b > 0$.
(i) Calculate the probability that $-at < B_t < bt$ for all sufficiently large $t$.
(ii) Calculate the probability that $\{B_t\}$ last touches the line $y = -at$ instead of
$y = bt$. [Hint: Consider the process $\{Z_t\}$ defined by $Z_0 = 0$, $Z_t = tB_{1/t}$, for $t > 0$,
and Exercise 9.3(i).]
8. Let $\{(B_t^{(1)}, B_t^{(2)})\}$ be a two-dimensional standard Brownian motion starting at $(0, 0)$
(see Section 7). Let $\tau_y = \inf\{t \ge 0: B_t^{(2)} = y\}$, $y > 0$. Calculate the distribution of
$B_{\tau_y}^{(1)}$. [Hint: $\{B_t^{(1)}\}$ and $\{B_t^{(2)}\}$ are independent one-dimensional Brownian motions.
Condition on $\tau_y$. Evaluate the integral by substituting $u = (x^2 + y^2)/t$.]
9. Let {Br } be a standard Brownian motion starting at 0. Describe the geometric
structure of sample paths for each of the following stochastic processes and calculate
(i) (Absorption)

$$Y_t = \begin{cases} B_t & \text{if } \max_{0 \le s \le t} B_s < a, \\ a & \text{if } \max_{0 \le s \le t} B_s \ge a, \end{cases}$$

where $a > 0$ is a constant.

(*ii) (Reflection)
J Y=B,
if B,>a,

where a > 0 is a constant.

(*iii) (Periodic) $Y_t = B_t - [B_t]$,

where [x] denotes the greatest integer less than or equal to x.

10. Let, for $a > 0$,

$$f_a(t) = \frac{a}{(2\pi)^{1/2}\,t^{3/2}}\,e^{-a^2/2t} \qquad (t > 0).$$

(i) Verify the convolution property

$$f_a * f_b(t) = f_{a+b}(t) \quad\text{for any } a, b > 0.$$

(ii) Verify that the distribution of $\tau_z$ is a stable law with exponent $\theta = 2$ (index $\tfrac12$)
in the sense that if $T_1, T_2, \ldots, T_n$ are i.i.d. and distributed as $\tau_z$ then
$n^{-\theta}(T_1 + \cdots + T_n)$ is distributed as $\tau_z$ (see Eq. 10.2).
(iii) (Scaling property) $\tau_z$ is distributed as $z^2\tau_1$.
11. Let $\tau_z$ be the first passage time to $z$ for a standard Brownian motion starting at 0
with zero drift.
(i) Verify that $E\tau_z$ is not finite.
(ii) Show that $Ee^{-\lambda\tau_z} = e^{-\sqrt{2\lambda}\,|z|}$, $\lambda > 0$. [Hint: Tedious integration will work.]
(iii) Use Laplace transforms to check that (1/n)z (,J J converges in distribution to
t z as n > oo.
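12. Let $\{B_t\}$ be standard Brownian motion starting at 0. Let $s < t$. Show that the
probability that $\{B_t\}$ has at least one zero in $(s, t)$ is given by $(2/\pi)\cos^{-1}(s/t)^{1/2}$.
[Hint: Let

$$\rho(x) = P(\{B_r\} \text{ has at least one zero in } (s, t) \mid B_s = x).$$

Then for $x > 0$,

$$\rho(x) = P\Bigl(\min_{s \le r \le t} B_r \le 0 \Bigm| B_s = x\Bigr) = P\Bigl(\max_{s \le r \le t} B_r \ge 0 \Bigm| B_s = -x\Bigr)$$
$$= P\Bigl(\max_{0 \le r \le t - s} B_r \ge x \Bigm| B_0 = 0\Bigr) = P(\tau_x \le t - s).$$

Likewise for $x < 0$, $\rho(x) = P(\tau_{-x} \le t - s)$. So the desired probability can be obtained
by calculating

$$E\rho(|B_s|) = \int_0^{\infty} \rho(x)\left(\frac{2}{\pi s}\right)^{1/2} e^{-x^2/2s}\,dx.$$
]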
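The integral at the end of the hint can be evaluated numerically and compared with the stated answer. The sketch below assumes the reflection-principle identity $P(\tau_x \le u) = 2P(B_u \ge x)$, which in terms of the complementary error function reads $\mathrm{erfc}(x/\sqrt{2u})$, and applies a plain trapezoid rule; the cutoff at $12\sqrt{s}$ and the step count are illustrative choices.

```python
from math import acos, erfc, exp, pi, sqrt


def zero_prob_integral(s, t, steps=20000):
    """Trapezoid rule for E rho(|B_s|), with rho(x) = erfc(x / sqrt(2 (t - s)))."""
    upper = 12.0 * sqrt(s)  # the half-normal density is negligible beyond this point
    h = upper / steps

    def integrand(x):
        density = sqrt(2.0 / (pi * s)) * exp(-x * x / (2.0 * s))  # density of |B_s|
        return erfc(x / sqrt(2.0 * (t - s))) * density

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h


def zero_prob_closed_form(s, t):
    """(2/pi) arccos(sqrt(s/t)), the probability stated in the exercise."""
    return 2.0 / pi * acos(sqrt(s / t))


for s, t in ((1.0, 2.0), (0.25, 1.0), (1.0, 10.0)):
    print(s, t, zero_prob_integral(s, t), zero_prob_closed_form(s, t))
```

For $s = 1$, $t = 2$ both columns equal $\tfrac12$: the integral is then $P(|Z'| > |Z|)$ for independent standard normal variables $Z, Z'$.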

Exercises for Section I.11

Throughout this set of exercises $\{S_n\}$ denotes the simple symmetric random
walk starting at 0.
1. Show the following for $r \ne 0$.

(i) $$P(S_1 \ne 0, S_2 \ne 0, \ldots, S_{2n-1} \ne 0, S_{2n} = 2r) = \binom{2n}{n + r}\frac{|r|}{n}\,2^{-2n}.$$

(ii) $$P(\Gamma^{(2n)} = 2k,\ S_{2n} = 2r) = \binom{2k}{k}\binom{2n - 2k}{n - k + r}\frac{|r|}{n - k}\,2^{-2n}.$$

(*iii) Calculate the joint distribution of $(\gamma, B_1)$.

2. Let $U_n = \#\{k \le n: S_{k-1} > 0 \text{ or } S_k > 0\}$ denote the amount of time spent (by the
polygonal path) on the positive side of the state space during time 0 to $n$.
(i) Show that

$$P(U_{2n} = 2k) = P(\Gamma^{(2n)} = 2k) \sim \frac{1}{\pi(k(n - k))^{1/2}}.$$

[Hint: Check $k = 0$, $k = n$ first. Use mathematical induction on $n$ and consider
the conditional distribution of $U_{2n}$ given the time of the first return to 0. Derive
the last equality of (11.3) for $U_{2n}$ using the induction hypothesis and (11.3).]
(ii) Prove Corollary 11.4 by calculating the distribution of the proportion of time
spent above 0 in the limit as $n \to \infty$ for the random walk.
(iii) How does one reconcile the facts that $U_{2n}/2n \not\to \tfrac12$ as $n \to \infty$ and $P(S_n > 0) \to \tfrac12$
as $n \to \infty$? [Hint: Consider the average length of time between returns to zero.]
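For small $n$ the law of $U_{2n}$ can be obtained by enumerating all $2^{2n}$ paths, which is a useful check before attempting the induction. The sketch below counts a unit of time as positive when the polygonal segment lies above the axis, that is when $S_{k-1} > 0$ or $S_k > 0$, and compares with the product form $\binom{2k}{k}\binom{2n-2k}{n-k}2^{-2n}$; that product form is assumed here to be what (11.3) asserts for $\Gamma^{(2n)}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def positive_time_distribution(n):
    """Exact law of U_{2n}, found by enumerating all 2^{2n} paths of length 2n."""
    counts = {}
    for steps in product((1, -1), repeat=2 * n):
        s_prev, u = 0, 0
        for step in steps:
            s_next = s_prev + step
            if s_prev > 0 or s_next > 0:  # this segment lies on the positive side
                u += 1
            s_prev = s_next
        counts[u] = counts.get(u, 0) + 1
    return {u: Fraction(c, 4 ** n) for u, c in counts.items()}


def arcsine_pmf(n, k):
    """C(2k, k) C(2n-2k, n-k) 2^{-2n}, the discrete arcsine probabilities."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)


dist = positive_time_distribution(3)
for k in range(4):
    print(2 * k, dist.get(2 * k, 0), arcsine_pmf(3, k))
```

The distribution is U-shaped, with the extreme values $k = 0$ and $k = n$ the most likely, which is the discrete form of the observation behind part (iii).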
*3. Let r (" 1 = min{k -< n: Sk = max o , ; , n S; } denote the location of the first absolute
maximum in time 0 to n. Then

{T (n) =k} _ {Sk> 0, Sk> S(,..., S,> Sk_1,Sk>-Sk+1,. ,Sk>- S n }

fork>- t,{T " =0}={Sk <-0,1 -<k-<n}.

( (

(i) Show that

p(r(zn^ = 0) = P( z (z" >> = 0) _ ( 2 n _2n

\n J2

[Hint: Use (10.1) and induction.]

(ii) Show that

P(T(z") = 2n) = P( ,r (2n+1) = 2n) _ 1(2n\2

2- n

[Hint: Consider the dual paths to {S k : k = 0, 1, ... , n} obtained by reversing the

order of the displacements, S, = S" S"_,, Sz = (S n Sn _ 1 ) + (S"_, Sn_2),


... , S;, = S". This transformation corresponds to a rotation through 180 degrees.
Use (11.2).]
(iii) Show that

$P(\tau^{(2n)} = 2k) = P(\tau^{(2n)} = 2k + 1) = \frac{1}{2} \binom{2k}{k} 2^{-2k} \binom{2(n-k)}{n-k} 2^{-2(n-k)}$

for k = 1, ..., n in the first case and k = 0, ..., n - 1 in the second. [Hint: A
path of length 2n with a maximum at 2k can be considered in two sections.
Apply (i) and (ii) to each section.]

(iv) $\lim_{n \to \infty} P(\tau^{(n)}/n \le t) = \frac{2}{\pi} \sin^{-1} \sqrt{t}$, $0 < t < 1$.

*4. Let $\Gamma^{(n)}$, $U_n$ be as defined in Exercise 2. Define $V_n = \#\{k \le \Gamma^{(n)}: S_{k-1} \ge 0, S_k \ge 0\} = U_{\Gamma^{(n)}}$.
Show that $P(V_{2n} = 2r \mid S_{2n} = 0) = 1/(n + 1)$, r = 0, 1, ..., n. [Hint: Use
induction and Exercise 3(i) to show that $P(V_{2n} = 2r, S_{2n} = 0)$ does not depend on r.]
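
The discrete arcsine law of Exercise 2(i) can be checked by brute force for small n. The sketch below is only an illustration, not part of the text: the function names are hypothetical, and it assumes the reading $U_{2n} = \#\{k \le 2n: S_{k-1} > 0 \text{ or } S_k > 0\}$ together with the closed form $\binom{2k}{k}\binom{2n-2k}{n-k}2^{-2n}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def time_positive(steps):
    """Count the k in 1..len(steps) with S_{k-1} > 0 or S_k > 0."""
    s_prev, count = 0, 0
    for x in steps:
        s = s_prev + x
        if s_prev > 0 or s > 0:
            count += 1
        s_prev = s
    return count


def exact_distribution(n):
    """Distribution of U_{2n}, enumerating all 2^(2n) equally likely paths."""
    counts = {}
    for steps in product((1, -1), repeat=2 * n):
        u = time_positive(steps)
        counts[u] = counts.get(u, 0) + 1
    total = 2 ** (2 * n)
    return {u: Fraction(c, total) for u, c in counts.items()}


def arcsine_formula(n, k):
    """Closed form C(2k,k) C(2n-2k,n-k) 2^(-2n) for P(U_{2n} = 2k)."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 2 ** (2 * n))


if __name__ == "__main__":
    n = 4
    dist = exact_distribution(n)
    for k in range(n + 1):
        print(2 * k, dist.get(2 * k, 0), arcsine_formula(n, k))
```

Exact rational arithmetic is used so that the comparison is an identity rather than a floating-point approximation; the enumeration is exponential in n and is only meant for small cases.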

Exercises for Section I.12

1. Show that the finite-dimensional distributions of the Brownian bridge are Gaussian.
2. Suppose that F is an arbitrary distribution function (not necessarily continuous).
Define an inverse to F as $F^{-1}(y) = \inf\{x: F(x) > y\}$. Show that if Y is uniform on
[0, 1] then $X = F^{-1}(Y)$ has distribution function F.

3. Let $\{B_t\}$ be standard Brownian motion starting at 0 and let $B_t^* = B_t - tB_1$, $0 \le t \le 1$.
(i) Show that $\{B_t^*\}$ is independent of $B_1$.
(ii) (The Inverse Simulation) Give a construction of standard Brownian motion
from the Brownian bridge. [Hint: Use (i).]
*4. Let $\{B_t\}$ be a standard Brownian motion starting at 0 and let $\{B_t^*\}$ be the Brownian
bridge.
(i) Show that for time points $0 < t_1 < t_2 < \cdots < t_k < 1$,

$\lim_{\varepsilon \downarrow 0} P(B_{t_i} \le x_i,\ i = 1, 2, \ldots, k \mid -\varepsilon < B_1 < \varepsilon) = P(B_{t_i}^* \le x_i,\ i = 1, \ldots, k)$.


Likewise, for conditioning on $B_1 \in D_\varepsilon = [0, \varepsilon)$ or $D_\varepsilon = [-\varepsilon, 0)$ the limit is
unchanged; for the existence of $\{B_t^*\}$ as the limit distribution (tightness) as
$\varepsilon \to 0$, see theoretical complement 3.
(ii) Show that for $m^* = \inf_{0 \le t \le 1} B_t^*$, $M^* = \sup_{0 \le t \le 1} B_t^*$, $u < 0 < v$,

$P(u < m^* \le M^* < v) = \sum_{k=-\infty}^{\infty} \exp\{-2k^2(v - u)^2\} - \sum_{k=-\infty}^{\infty} \exp\{-2[v + k(v - u)]^2\}$.

[Hint: Express as a limit of the ratio of probabilities as in (i) and use Exercise
9.5(v). Also, $\Phi(x, x + \varepsilon) = \varepsilon (2\pi)^{-1/2} \exp(-x^2/2) + o(\varepsilon)$ as $\varepsilon \to 0$.]


(iii) Prove

$P\big(\sup_{0 \le t \le 1} |B_t^*| \le y\big) = 1 + 2 \sum_{k=1}^{\infty} (-1)^k e^{-2k^2 y^2}$, $y > 0$.

[Hint: Take $u = -v$ in (ii).]

(iv) $P(M^* < v) = 1 - e^{-2v^2}$, $v > 0$. [Hint: Use Exercise 9.5(vi) for the ratio of
probabilities described in (i).]
5. (Random Walk Bridge) Let $\{S_n\}$ denote the simple symmetric random walk starting
at 0.
(i) Calculate $P(S_m = y \mid S_{2n} = 0)$, $0 \le m \le 2n$.
(*ii) Let $U_{2n} = \#\{k \le 2n: S_{k-1} \ge 0, S_k \ge 0\}$. Calculate $P(U_{2n} = r \mid S_{2n} = 0)$. [Hint:
See Exercise 11.4*.]

*6. (Brownian Meander) The Brownian meander $\{B_t^+\}$ is defined as the limiting
distribution of the standard Brownian motion $\{B_t\}$ starting at 0, conditional on
$\{m = \min_{0 \le t \le 1} B_t > -\varepsilon\}$ as $\varepsilon \downarrow 0$ (see theoretical complement 4 for existence). Let
$m^+ = \min_{0 \le t \le 1} B_t^+$, $M^+ = \max_{0 \le t \le 1} B_t^+$. Prove the following:

(i) $P(M^+ \le x, B_1^+ \le y) = \sum_{k=-\infty}^{\infty} \big[e^{-(2kx)^2/2} - e^{-(2kx+y)^2/2}\big]$, $0 < y \le x$.

[Hint: Express as a limit of ratios of probabilities and use Exercise 9.5(v). Also
$P(m > -\varepsilon) = (2/\pi)^{1/2} \varepsilon + o(\varepsilon)$; see Exercise 10.5(ii), noting $\min(A) = -\max(-A)$
and symmetry. Justify interchange of limits with the Dominated Convergence
Theorem (Chapter 0).]
(ii) $P(M^+ \le x) = 1 + 2 \sum_{k=1}^{\infty} (-1)^k \exp\{-(kx)^2/2\}$. [Hint: Consider (i) with
y = x.]
(iii) $EM^+ = (2\pi)^{1/2} \log 2 = 1.7374\ldots$. [Hint: Compute $\int_0^\infty P(M^+ > x)\,dx$ from (ii).]
(iv) (Rayleigh Distribution) $P(B_1^+ \le x) = 1 - e^{-x^2/2}$, $x > 0$. [Hint: Consider (i) in
the limit as $x \to \infty$.]
*7. (Brownian Excursion) The Brownian excursion $\{B_t^{*+}\}$ is defined by the limiting
distribution of $\{B_t^*\}$ conditioned on $\{m^* > -\varepsilon\}$ as $\varepsilon \downarrow 0$ (see theoretical complement
4 for existence). Let $M^{*+} = \max_{0 \le t \le 1} B_t^{*+}$. Prove the following:

(i) $P(M^{*+} \le x) = 1 + 2 \sum_{k=1}^{\infty} [1 - (2kx)^2] \exp\{-(2kx)^2/2\}$, $x > 0$.

[Hint: Write $P(M^{*+} \le x)$ as the limit of ratios as $\varepsilon \downarrow 0$. Multiply numerator
and denominator by $\varepsilon^{-2}$, apply Exercise 4, and use l'Hôpital's rule twice to
evaluate the limit term by term. Check that the Dominated Convergence
Theorem (Chapter 0) can be used to justify limit interchange.]
(ii) $EM^{*+} = (\pi/2)^{1/2}$. [Hint: Note that interchange of (limits) integral with sum in
(i) to compute $EM^{*+} = \int_0^\infty P(M^{*+} > x)\,dx$ leads to an absurdity, since the
values of the termwise integrals are zero. Express $EM^{*+}$ as

$2 \lim_{\Delta \to 0} \int_{\Delta}^{\infty} \sum_{k=1}^{\infty} [(2kx)^2 - 1] \exp\{-\tfrac{1}{2}(2kx)^2\}\,dx$

and note that for $k \ge 1/(2\Delta)$ the integrand is nonnegative on $[\Delta, \infty)$. So Lebesgue's
monotone convergence can be applied to interchange integral with sum over
$k > 1/(2\Delta)$ to get zero for this. Thus, $EM^{*+}$ is the limit as $\Delta \to 0$ of a finite
sum over $k \le 1/(2\Delta)$ of an integral that can be evaluated (by parts). Note that this
gives a Riemann sum limit for $2\int_0^\infty \exp(-2x^2)\,dx = (\pi/2)^{1/2}$.]

(iv/2)'t 2 , if r = 1
(2(2)li2)-.4( n )lt2 r 1 (r), if r = 2, 3 , ... ,
\ 2 /

where $\zeta(r) = \sum_{k=1}^{\infty} k^{-r}$ is the Riemann zeta function ($r \ge 2$). [Hint: The case
r = 1 is given in (ii) above.] For the case $r \ge 2$, we have

p*+(r)=r^xr-1(1 - F(x))dx=2r ^^^t' '{(2t) 2 1}e ^ z n 22 dt,

0 k=l k 0

where the interchange of limits is justified for r >- 2 since

i -I tr - 'I(2t) 2 _ lle - an = rz dt < oo, r = 2, 3, ... .
k=1 k f,^ '

In particular, letting Z denote a standard normal random variable,

2 (n)l/ 2 [EIZI . +I EIZI'-1]
u*+(r) = 2 - 'r^(r)(2)''

Consider the two cases whether r is odd or even.

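8. Prove the Glivenko-Cantelli Lemma by justifying the following steps.
(i) For each t, the event $\{F_n(t) \to F(t)\}$ has probability 1.
(ii) For each t, the event $\{F_n(t^-) \to F(t^-)\}$ has probability 1.
(iii) Let $\tau(y) = \inf\{t: F(t) \ge y\}$, $0 < y < 1$. Then $F(\tau(y)^-) \le y \le F(\tau(y))$.
(iv) Let

$D_{m,n} = \max_{1 \le k \le m} \{|F_n(\tau(k/m)) - F(\tau(k/m))|,\ |F_n(\tau(k/m)^-) - F(\tau(k/m)^-)|\}$.

Then, by considering the cases

$\tau\big(\tfrac{k-1}{m}\big) \le t < \tau\big(\tfrac{k}{m}\big)$, $t < \tau\big(\tfrac{1}{m}\big)$ and $t > \tau(1)$,

check that

$\sup_t |F_n(t) - F(t)| \le D_{m,n} + \frac{1}{m}$.

(v) $C = \bigcup_{m=1}^{\infty} \bigcup_{k=1}^{m} \big\{\{F_n(\tau(k/m)) \not\to F(\tau(k/m))\} \cup \{F_n(\tau(k/m)^-) \not\to F(\tau(k/m)^-)\}\big\}$

has probability zero, and for $\omega \notin C$ and each m

$D_{m,n}(\omega) \to 0$ as $n \to \infty$.

(vi) $\sup_t |F_n(t, \omega) - F(t)| \to 0$ as $n \to \infty$ for $\omega \notin C$.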

9. (The Gnedenko-Koroljuk Formula) Let $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ be two
independent i.i.d. random samples with continuous distribution functions F and G,
respectively. To test the null hypothesis that both samples are from the same
population (i.e., F = G), let $F_n$ and $G_n$ be the respective empirical distribution functions
and consider the statistic $D_{n,n} = \sup_x |F_n(x) - G_n(x)|$. Under the null hypothesis,
$X_1, \ldots, X_n, Y_1, \ldots, Y_n$ are 2n i.i.d. random variables with the common distribution
F. Verify that under the null hypothesis:
(i) the distribution of $D_{n,n}$ does not depend on F and can be explicitly calculated
according to the formula

$P\big(D_{n,n} < \tfrac{r}{n}\big) = P\big(\max_{0 \le k \le 2n} |S_k^{(2n)*}| < r\big)$,

where $\{S_k^{(2n)*}: k = 0, 1, 2, \ldots, 2n\}$ is the simple symmetric random walk bridge
(starting at 0 and tied down at k = 2n) as defined in Exercise 5. [Hint: Arrange
$X_1, \ldots, X_n, Y_1, \ldots, Y_n$ in increasing order as $X_{(1)} < X_{(2)} < \cdots < X_{(2n)}$ and
define the kth displacement of $\{S_k^{(2n)*}\}$ by

$S_k^{(2n)*} - S_{k-1}^{(2n)*} = +1$ if $X_{(k)} \in \{X_1, \ldots, X_n\}$,

$S_k^{(2n)*} - S_{k-1}^{(2n)*} = -1$ if $X_{(k)} \in \{Y_1, \ldots, Y_n\}$.]

(ii) Find the analytic expression for the probability in (i). [Hint: Consider the event
that the simple random walk with absorbing boundaries at $\pm r$ returns to 0 at
time 2n. First condition on the initial displacement.]
(iii) Calculate the large-sample-theory (i.e., asymptotic as $n \to \infty$) limit distribution
of $\sqrt{n}\, D_{n,n}$. See Exercise 4(iii).
(iv) Show

$P\big(\sup_x (F_n(x) - G_n(x)) < \tfrac{r}{n}\big) = 1 - \binom{2n}{n-r} \Big/ \binom{2n}{n}$, r = 1, ..., n.

[Hint: Only one absorbing barrier occurs in the random walk approach.]
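
The one-sided formula in Exercise 9(iv) can be confirmed by exhaustive enumeration for small n. The sketch below is illustrative only; the function names are hypothetical. It assumes, as in the hint to 9(i), that under the null hypothesis every arrangement of the n X's among the 2n ordered observations is equally likely, so that $n \sup_x (F_n(x) - G_n(x))$ is the maximum of a random walk bridge.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def one_sided_exact(n, r):
    """P(max_k S_k < r) over all C(2n, n) equally likely bridge paths."""
    good = 0
    for x_positions in combinations(range(2 * n), n):
        xs = set(x_positions)
        s, peak = 0, 0
        for k in range(2 * n):
            s += 1 if k in xs else -1  # +1 for an X, -1 for a Y
            peak = max(peak, s)
        if peak < r:
            good += 1
    return Fraction(good, comb(2 * n, n))


def one_sided_formula(n, r):
    """Closed form 1 - C(2n, n-r) / C(2n, n)."""
    return 1 - Fraction(comb(2 * n, n - r), comb(2 * n, n))


if __name__ == "__main__":
    n = 4
    for r in range(1, n + 1):
        print(r, one_sided_exact(n, r), one_sided_formula(n, r))
```

The agreement reflects the single absorbing barrier mentioned in the hint: by the reflection principle the number of bridge paths reaching level r is $\binom{2n}{n-r}$.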

Exercises for Section I.13

1. Prove that $\tau_r$ defined by (13.5) (r = 1, 2, ...) are stopping times.
2. If $\tau$ is a stopping time and $m \ge 0$, prove that $\tau \wedge m := \min\{\tau, m\}$ is a stopping time.
3. Prove $E(S_m 1_{\{\tau > m\}}) \to 0$ as $m \to \infty$ under assumptions (2) and (3) of Theorem 13.1.
4. For the simple symmetric random walk, starting at x, show that
$E\{S_m 1_{\{\tau_y = r\}}\} = y P(\tau_y = r)$ for $r \le m$.
5. Prove that $EZ_n$ is independent of n (i.e., constant) for a martingale $\{Z_n\}$. Show also
that $E(Z_n \mid \{Z_0, \ldots, Z_k\}) = Z_k$ for any n > k.
6. Write out a proof of Theorem 13.3 along the lines of that of Theorem 13.1.
7. Let $\{S_n\}$ be a simple random walk with $p \in (\tfrac12, 1)$.
(i) Prove that $\{(q/p)^{S_n}: n = 0, 1, 2, \ldots\}$ is a martingale.
(ii) Let c < x < d be integers, $S_0 = x$, and $\tau = \tau_c \wedge \tau_d := \min(\tau_c, \tau_d)$. Apply Theorem
13.3 to the martingale in (i) and $\tau$ to compute $P(\{S_n\}$ reaches c before d).
8. Write out a proof of Proposition 13.5 along the lines of that of Proposition 13.4.
9. Under the hypothesis that the pth absolute moments are finite for some $p \ge 1$, derive
the Maximal Inequality $P(M_n \ge \lambda) \le E|Z_n|^p/\lambda^p$ in the context of Theorem 13.6.
10. (Submartingales) Let $\{Z_n: n = 0, 1, 2, \ldots\}$ be a finite or infinite sequence of
integrable random variables satisfying $E(Z_{n+1} \mid \{Z_0, \ldots, Z_n\}) \ge Z_n$ for all n. Such a
sequence $\{Z_n\}$ is called a submartingale.
(i) Prove that, for any n > k, $E(Z_n \mid \{Z_0, \ldots, Z_k\}) \ge Z_k$.
(ii) Let $M_n = \max\{Z_0, \ldots, Z_n\}$. Prove the maximal inequality $P(M_n \ge \lambda) \le EZ_n^2/\lambda^2$
for $\lambda > 0$. [Hint: $E(Z_k 1_{A_k}(Z_n - Z_k)) = E(Z_k 1_{A_k} E(Z_n - Z_k \mid \{Z_0, \ldots, Z_k\})) \ge 0$
for n > k, where $A_k := \{Z_0 < \lambda, \ldots, Z_{k-1} < \lambda, Z_k \ge \lambda\}$.]
(iii) Extend the result of Exercise 9 to nonnegative submartingales.
11. Let $\{Z_n\}$ be a martingale. If $E|Z_n|^p < \infty$ then prove that $\{|Z_n|^p\}$ is a submartingale,
$p \ge 1$. [Hint: Use Jensen's or Hölder's Inequality, Chapter 0, (2.7), (2.12).]
12. (An Exponential Martingale) Let $\{X_j: j \ge 1\}$ be a sequence of independent random
variables having finite moment-generating functions $\phi_j(\xi) := E \exp\{\xi X_j\}$ for some
$\xi \neq 0$. Define $S_n := X_1 + \cdots + X_n$, $Z_n = \exp\{\xi S_n\} / \prod_{j=1}^{n} \phi_j(\xi)$.
(i) Prove that $\{Z_n\}$ is a martingale.
(ii) Write $M_n = \max\{S_1, \ldots, S_n\}$. If $\xi > 0$, prove that

$P(M_n \ge \lambda) \le \exp\{-\xi\lambda\} \prod_{j=1}^{n} \phi_j(\xi)$ ($\lambda > 0$).

(iii) Write $m_n = \min\{S_1, \ldots, S_n\}$. If $\xi < 0$, prove

$P(m_n \le -\lambda) \le \exp\{\xi\lambda\} \prod_{j=1}^{n} \phi_j(\xi)$ ($\lambda > 0$).

13. Let $\{X_n: n \ge 1\}$ be i.i.d. Gaussian with mean zero and variance $\sigma^2 > 0$. Let
$S_n = X_1 + \cdots + X_n$, $M_n = \max\{S_1, \ldots, S_n\}$. Prove the following for $\lambda > 0$.
(i) $P(M_n \ge \lambda) \le \exp\{-\lambda^2/(2\sigma^2 n)\}$. [Hint: Use Exercise 12(ii) and an appropriate
choice of $\xi$.]
(ii) $P(\max\{|S_j|: 1 \le j \le n\} \ge \lambda\sigma\sqrt{n}) \le 2\exp\{-\lambda^2/2\}$.

14. Let $\tau_1$, $\tau_2$ be stopping times. Show the following assertions (i)-(v) hold.
(i) $\tau_1 \vee \tau_2 := \max(\tau_1, \tau_2)$ is a stopping time.
(ii) $\tau_1 \wedge \tau_2 := \min(\tau_1, \tau_2)$ is a stopping time.
(iii) $\tau_1 + \tau_2$ is a stopping time.
(iv) $a\tau_1$, where a is a positive integer, is a stopping time.
(v) If $\tau_1 < \tau_2$ a.s. then it need not be the case that $\tau_2 - \tau_1$ is a stopping time.
(vi) If $\tau$ is an even integer-valued stopping time, must $\tfrac12\tau$ be a stopping time?
15. (A Doob-Meyer Decomposition)
(i) Let $\{Y_n\}$ be an arbitrary submartingale (see Exercise 10) with respect to
sigmafields $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$. Show that there is a unique sequence $\{V_n\}$ such
that:
(a) $0 = V_0 \le V_1 \le V_2 \le \cdots$ a.s.
(b) $V_n$ is $\mathcal{F}_{n-1}$-measurable.
(c) $\{Z_n\} := \{Y_n - V_n\}$ is a martingale with respect to $\{\mathcal{F}_n\}$. [Hint: Define
$V_n = V_{n-1} + E\{Y_n \mid \mathcal{F}_{n-1}\} - Y_{n-1}$, $n \ge 1$.]
(ii) Calculate the $\{V_n\}$, $\{Z_n\}$ decomposition for $Y_n = S_n^2$, where $\{S_n\}$ is the simple
symmetric random walk starting at 0. [Note: A sequence $\{V_n\}$ satisfying (b) is
called a predictable sequence with respect to $\{\mathcal{F}_n\}$.]
16. Let $\{S_n\}$ be the simple random walk starting at 0. Let $\{G_n\}$ be a predictable sequence
of nonnegative random variables with respect to $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\} = \sigma\{S_0, \ldots, S_n\}$,
where $X_n = S_n - S_{n-1}$ (n = 1, 2, ...); i.e., each $G_n$ is $\mathcal{F}_{n-1}$-measurable. Assume each
$G_n$ to have finite first moment. Such a sequence $\{G_n\}$ will be called a strategy. Define

$W_n = W_0 + \sum_{k=1}^{n} G_k(S_k - S_{k-1})$, $n \ge 1$,

where $W_0$ is an integrable nonnegative random variable independent of $\{S_n\}$
(representing initial capital). Show that regardless of the strategy $\{G_n\}$ we have the
following.
(i) If $p = \tfrac12$ then $\{W_n\}$ is a martingale.
(ii) If $p > \tfrac12$ then $\{W_n\}$ is a submartingale.
(iii) If $p < \tfrac12$ then $\{W_n\}$ is a supermartingale (i.e., $E|W_n| < \infty$, $E(W_{n+1} \mid \mathcal{F}_n) \le W_n$).
(iv) Calculate $EW_n$, $n \ge 1$, in the case of the so-called double-or-nothing strategy
defined by $G_n = 2^{S_{n-1}} 1_{\{S_{n-1} = n-1\}}$, $n \ge 1$.

17. Let {S} be the simple symmetric random walk starting at 0. Let r = inf{n ? 0:
(i) Calculate $E\tau$ from the distribution of $\tau$.
(ii) Use the martingale stopping theorem to calculate $E\tau$.
(*iii) How does this generalize to the cases $\tau = \inf\{n \ge 0: S_n = b - n\}$, where b is
a positive integer? [Hint: Check that $n + S_n$ is even for n = 0, 1, 2, ....]
18. (i) Show that if X is a random variable such that $g(z) = Ee^{zX}$ is finite in a
neighborhood of z = 0, then $E|X|^k < \infty$ for all k = 1, 2, ....
(ii) For a Brownian motion $\{X_t\}$ with drift $\mu$ and diffusion coefficient $\sigma^2$, prove that
$\exp\{\lambda X_t - \lambda t\mu - \lambda^2\sigma^2 t/2\}$ ($t \ge 0$) is a martingale.
19. Consider an arbitrary Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2 > 0$.
(i) Let $m(x) = E_x\tau$, where $\tau$ is the time to reach the boundary {c, d} starting at
$x \in [c, d]$. Show that m(x) solves the boundary-value problem

$\tfrac12\sigma^2 \dfrac{d^2 m}{dx^2} + \mu \dfrac{dm}{dx} = -1$, $m(c) = m(d) = 0$.

(ii) Let $r(x) = P_x(\tau_d < \tau_c)$ for $x \in [c, d]$. Verify that r(x) solves the boundary-value
problem

$\tfrac12\sigma^2 \dfrac{d^2 r}{dx^2} + \mu \dfrac{dr}{dx} = 0$,
$r(c) = 0$, $r(d) = 1$.
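
As an illustration of Exercise 7(ii), optional stopping applied to the martingale $(q/p)^{S_n}$ suggests the candidate $P(\text{reach } c \text{ before } d) = (\rho^x - \rho^d)/(\rho^c - \rho^d)$ with $\rho = q/p$. The sketch below, with hypothetical function names, does not prove this; it only checks that the candidate satisfies the one-step recursion $h(x) = p\,h(x+1) + q\,h(x-1)$ with $h(c) = 1$, $h(d) = 0$, under the assumption $p \neq \tfrac12$.

```python
from fractions import Fraction


def prob_c_before_d(p, c, x, d):
    """Candidate P_x(reach c before d) from optional stopping; needs p != 1/2."""
    q = 1 - p
    rho = q / p
    return (rho ** x - rho ** d) / (rho ** c - rho ** d)


def satisfies_recursion(p, c, d):
    """Check boundary values and h(x) = p h(x+1) + q h(x-1) for c < x < d."""
    q = 1 - p
    h = lambda x: prob_c_before_d(p, c, x, d)
    if h(c) != 1 or h(d) != 0:
        return False
    return all(h(x) == p * h(x + 1) + q * h(x - 1) for x in range(c + 1, d))


if __name__ == "__main__":
    p = Fraction(2, 3)
    print(prob_c_before_d(p, -2, 0, 3))
    print(satisfies_recursion(p, -2, 3))
```

Using `Fraction` keeps the check exact; with floating-point values of p the recursion would hold only up to rounding.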

Exercises for Section I.14

1. In the case = Z, consider the process {Z,N (s)} defined by
(n) ,v
S n
n=1 2 ...N
1 \NJ f i D,

and linearly interpolate between n/N and (n + 1)/N.

(i) Show that {Z N } converges in distribution to BS + 2cs 2 (1 s 2 ), where
{B'} = {B s sB 1 : 0 < s < 1} is the Brownian Bridge.
(ii) Show that R N /(J/ DN ) converges in distribution to maxo, S,1 B; min o , s , , B'.
(iii) Show that the asymptotic distribution in (ii) is nondegenerate.


Theoretical Complements to Section I.1

1. Events that can be specified by the values of $X_n, X_{n+1}, \ldots$ for each value of n (i.e.,
events that depend only on the long-run values of the sequences) are called tail events.

Theorem T.1.1. (Kolmogorov Zero-One Law). A tail event for a sequence of
independent random variables has probability either zero or one.

Proof. To see this one uses the general measure-theoretic fact that the probability
of any event A belonging to the sigmafield $\mathcal{F} = \sigma\{X_1, X_2, \ldots, X_n, \ldots\}$ (generated
by $X_1, X_2, \ldots, X_n, \ldots$) can be approximated by events $A_1, \ldots, A_n, \ldots$ belonging to
the field of events $\mathcal{F}_0 = \bigcup_{n=1}^{\infty} \sigma\{X_1, \ldots, X_n\}$ in the sense that $A_n \in \sigma\{X_1, \ldots, X_n\}$ for
each n and $P(A \Delta A_n) \to 0$ as $n \to \infty$, where $\Delta$ denotes the symmetric difference
$A \Delta A_n = (A \cap A_n^c) \cup (A^c \cap A_n)$. Applying this approximation to a tail event A, one
obtains that since $A \in \sigma\{X_{n+1}, X_{n+2}, \ldots\}$ for each n, A is independent of each event
$A_n$. Thus, $0 = \lim_{n \to \infty} P(A \Delta A_n) = 2P(A)P(A^c) = 2P(A)(1 - P(A))$. The only solutions
to the equation $x(1 - x) = 0$ are 0 and 1. ∎
2. Let $S_n = X_1 + \cdots + X_n$, $n \ge 1$. Events that depend on the tail of the sums are trivial
(i.e., have probability 1 or 0) whenever the summands $X_1, X_2, \ldots$ are i.i.d. This is a
consequence of the following more general zero-one law for events that symmetrically
depend on the terms $X_1, X_2, \ldots$ of an i.i.d. sequence of random variables (or vectors).
Let $\mathcal{B}^\infty$ denote the sigmafield of subsets of $\mathbb{R}^\infty = \{(x_1, x_2, \ldots): x_i \in \mathbb{R}^1\}$ generated by
events depending on finitely many coordinates.

Theorem T.1.2. (Hewitt-Savage Zero-One Law). Let $X_1, X_2, \ldots$ be an i.i.d. sequence
of random variables. If an event $A = \{(X_1, X_2, \ldots) \in B\}$, where $B \in \mathcal{B}^\infty$, is invariant
under finite permutations $(X_{i_1}, X_{i_2}, \ldots)$ of terms of the sequence $(X_1, X_2, \ldots)$, that
is, $A = \{(X_{i_1}, X_{i_2}, \ldots) \in B\}$ for any finite permutation $(i_1, i_2, \ldots)$ of (1, 2, ...), then
P(A) = 1 or 0.

As noted above, the symmetric dependence with respect to $\{X_n\}$ applies, for
example, to tail events for the sums $\{S_n\}$.

Proof. To prove the Hewitt-Savage 0-1 law, proceed as in the Kolmogorov 0-1 law
by selecting finite-dimensional approximants to A of the form $A_n = \{(X_1, \ldots, X_n) \in B_n\}$,
$B_n \in \mathcal{B}^n$, such that $P(A \Delta A_n) \to 0$ as $n \to \infty$. For each fixed n, let $(i_1, i_2, \ldots)$ be the
permutation (2n, 2n - 1, ..., 1, 2n + 1, ...) and define $\tilde{A}_n = \{(X_{i_1}, \ldots, X_{i_n}) \in B_n\}$.
Then $\tilde{A}_n$ and $A_n$ are independent with $P(A_n \cap \tilde{A}_n) = P(A_n)P(\tilde{A}_n) = (P(A_n))^2 \to (P(A))^2$
as $n \to \infty$. On the other hand, $P(A \Delta \tilde{A}_n) = P(A \Delta A_n) \to 0$, so that $P(A_n \Delta \tilde{A}_n) \to 0$ and,
in particular, therefore $P(A_n \cap \tilde{A}_n) \to P(A)$ as $n \to \infty$. Thus x = P(A) satisfies $x = x^2$. ∎

Theoretical Complements to Section I.3

1. Theorem T.3.1. If $\{X_n\}$ is an i.i.d. sequence of integer-valued random variables for
a general random walk $S_n = X_1 + \cdots + X_n$, $n \ge 1$, $S_0 = 0$, on Z with $\mu = EX_1 = 0$,
then $\{S_n\}$ is recurrent.

Proof. To prove this, first observe that $P(S_n = 0 \text{ i.o.})$ is 1 or 0 by the Hewitt-Savage
zero-one law (theoretical complement 1.2). If $\sum_n P(S_n = 0) < \infty$, then $P(S_n = 0$
i.o.) = 0 by the Borel-Cantelli Lemma. If $\sum_n P(S_n = 0)$ is divergent (i.e., the
expected number of visits to 0 is infinite), then we can show that $P(S_n = 0 \text{ i.o.}) = 1$
as follows. Using independence and the property that the shifted sequence
$X_k, X_{k+1}, \ldots$ has the same distribution as $X_1, X_2, \ldots$, one has

$1 \ge P(S_n = 0 \text{ finitely often}) \ge \sum_n P(S_n = 0,\ S_m \neq 0,\ m > n)$

$= \sum_n P(S_n = 0) P(S_m - S_n \neq 0,\ m > n)$

$= \sum_n P(S_n = 0) P(S_m \neq 0,\ m \ge 1)$.

Thus, if $\sum_n P(S_n = 0)$ diverges, then $P(S_m \neq 0,\ m \ge 1) = 0$ or equivalently $P(S_m = 0$
for some $m \ge 1) = 1$. This may now be extended by induction to get that at least r
visits to 0 is certain for each r = 1, 2, .... One may also use the strong Markov
property of Chapter II, Section 4, with the time of the rth visit to 0 as the stopping time.
From here the proof rests on showing $\sum_n P(S_n = 0)$ is divergent when $\mu = 0$.
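Consider the generating function of the sequence $P(S_n = 0)$, n = 0, 1, 2, ..., namely,

$g(x) := \sum_{n=0}^{\infty} P(S_n = 0) x^n$, $|x| < 1$.

The problem is to investigate the divergence of g(x) as $x \to 1^-$. Note that $P(S_n = 0)$
is the 0th-term Fourier coefficient of the characteristic function (Fourier series)

$Ee^{itS_n} = \sum_{k=-\infty}^{\infty} P(S_n = k) e^{itk}$.

Thus,

$P(S_n = 0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} Ee^{itS_n}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \varphi^n(t)\,dt$,

where $\varphi(t) = Ee^{itX_1}$. It follows that for |x| < 1,

$g(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{dt}{1 - x\varphi(t)}$.

Thus, recurrence or transience depends on the divergence or convergence, respectively,
of this integral. Now, with $\mu = 0$, we have $\varphi(t) = 1 - o(|t|)$ as $t \to 0$. Thus, for any
$\varepsilon > 0$ there is a $\delta > 0$ such that $|1 - \varphi_1(t)| \le \varepsilon|t|$, $|\varphi_2(t)| \le \varepsilon|t|$, for $|t| \le \delta$, where
$\varphi(t) = \varphi_1(t) + i\varphi_2(t)$ has real and imaginary parts $\varphi_1$, $\varphi_2$. Now, for 0 < x < 1, noting
that g(x) is real valued,

$\int_{-\pi}^{\pi} \frac{dt}{1 - x\varphi(t)} = \int_{-\pi}^{\pi} \mathrm{Re}\Big(\frac{1}{1 - x\varphi(t)}\Big)\,dt = \int_{-\pi}^{\pi} \frac{1 - x\varphi_1(t)}{|1 - x\varphi(t)|^2}\,dt \ge \int_{-\delta}^{\delta} \frac{1 - x\varphi_1(t)}{|1 - x\varphi(t)|^2}\,dt$

$= \int_{-\delta}^{\delta} \frac{1 - x\varphi_1(t)}{(1 - x\varphi_1(t))^2 + x^2\varphi_2^2(t)}\,dt \ge \int_{-\delta}^{\delta} \frac{1 - x}{(1 - x + x\varepsilon|t|)^2 + x^2\varepsilon^2 t^2}\,dt$

$\ge \int_{-\delta}^{\delta} \frac{1 - x}{2(1 - x)^2 + 3x^2\varepsilon^2 t^2}\,dt \ge \int_{-\delta}^{\delta} \frac{1 - x}{3[(1 - x)^2 + \varepsilon^2 t^2]}\,dt$

$= \frac{2}{3\varepsilon} \tan^{-1}\Big(\frac{\varepsilon\delta}{1 - x}\Big) \to \frac{\pi}{3\varepsilon}$ as $x \to 1^-$.

Since $\varepsilon$ is arbitrary, this completes the argument. ∎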

The above argument is a special case of the so-called Chung-Fuchs recurrence

criterion developed by K. L. Chung and W. H. J. Fuchs (1951), "On the Distribution
of Values of Sums of Random Variables," Mem. Amer. Math. Soc., No. 6.
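
For a concrete feel for the integral representation of g(x), one may specialize to the simple symmetric random walk; this special case is an assumption made here for illustration and is not treated separately in the proof. Then $\varphi(t) = \cos t$, $P(S_{2m} = 0) = \binom{2m}{m} 4^{-m}$, and both the series and the integral equal $(1 - x^2)^{-1/2}$, which diverges as $x \to 1^-$. The function names and the midpoint quadrature are hypothetical choices.

```python
from math import comb, cos, pi, sqrt


def g_series(x, terms=200):
    """Partial sum of sum_n P(S_n = 0) x^n; only even n contribute."""
    return sum(comb(2 * m, m) * (x / 2) ** (2 * m) for m in range(terms))


def g_integral(x, grid=2000):
    """(1/2pi) * integral over [-pi, pi] of dt / (1 - x cos t), midpoint rule."""
    h = 2 * pi / grid
    total = sum(h / (1 - x * cos(-pi + (j + 0.5) * h)) for j in range(grid))
    return total / (2 * pi)


def g_closed_form(x):
    """Known closed form (1 - x^2)^(-1/2) for the simple symmetric walk."""
    return 1 / sqrt(1 - x * x)


if __name__ == "__main__":
    for x in (0.3, 0.6, 0.9):
        print(x, g_series(x), g_integral(x), g_closed_form(x))
```

The truncated series is only accurate for x well inside (0, 1); close to 1 many more terms would be needed, which is itself a reflection of the divergence the proof exploits.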

Theoretical Complements to Section I.6

1. The Kolmogorov Extension Theorem holds for more general spaces S than the case
$S = \mathbb{R}^1$ presented. In general it requires that S be homeomorphic to a complete and
separable metric space. So, for example, it applies whenever S is $\mathbb{R}^k$ or a rectangle,
or when S is a finite or countable set. Assuming some background in analysis, a
simple proof of Kolmogorov's theorem (due to Edward Nelson (1959), "Regular
Probability Measures on Function Space," Annals of Math., 69, pp. 630-643) in the
case of compact S can be made as follows. Define a linear functional l on the subspace
of $C(S^I)$ consisting of continuous functions on $S^I$ that depend on finitely many
coordinates, by

$l(f) = \int_{S^{\{i_1, \ldots, i_k\}}} \tilde{f}(x_{i_1}, \ldots, x_{i_k})\, \mu_{i_1, \ldots, i_k}(dx_{i_1} \cdots dx_{i_k})$,

where

$f((x_i)_{i \in I}) = \tilde{f}(x_{i_1}, \ldots, x_{i_k})$.

By consistency, l is a well-defined linear functional on a subspace of $C(S^I)$ that, by
the Stone-Weierstrass Theorem of functional analysis, is dense in $C(S^I)$; note that $S^I$
is compact for the product topology by Tychonoff's Theorem from topology (see H.
L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 174, 166). In
particular, l has a natural extension to $C(S^I)$ that is linear and continuous. Now
apply the Riesz Representation Theorem to get a (probability) measure $\mu$ on $(S^I, \mathcal{F})$
such that for any $f \in C(S^I)$, $l(f) = \int_{S^I} f\,d\mu$ (see Royden, loc. cit., p. 310). To make
the proof in the noncompact but separable and complete case, one can use a
fundamental (homeomorphic) embedding of S into $\mathbb{R}^\infty$ (see P. Billingsley (1968),
Convergence of Probability Measures, Wiley, New York, p. 219); this is a special case
of Urysohn's Theorem in topology (see H. L. Royden, loc. cit., p. 149). Then, by
making a two-point compactification of $\mathbb{R}^1$, S can be further embedded in the Hilbert
cube $[0, 1]^\infty$, where the measure $\mu$ can be obtained as above. Consistency allows one
to restrict $\mu$ back to S.

Theoretical Complements to Section I.7

1. (Brownian Motion and the Inadequacy of Kolmogorov's Extension Theorem) Let
S = C[0, 1] denote the space of continuous real-valued functions on [0, 1] equipped
with the uniform metric

$\rho(x, y) = \max_{0 \le t \le 1} |x(t) - y(t)|$, $x, y \in C[0, 1]$.

The Borel sigmafield $\mathcal{B}$ of C[0, 1] for the metric $\rho$ is the smallest sigmafield of subsets
of C[0, 1] that contains all finite-dimensional events of the form

$\{x \in C[0, 1]: a_i < x(t_i) \le b_i,\ i = 1, \ldots, k\}$,

$k \ge 1$, $0 \le t_1 < \cdots < t_k \le 1$, $a_1, \ldots, a_k, b_1, \ldots, b_k \in \mathbb{R}^1$.

The problem of constructing standard Brownian motion (starting at 0) on the time
interval $0 \le t \le 1$ corresponds to constructing a process $\{X_t: 0 \le t \le 1\}$ having
Gaussian finite-dimensional distributions with mean zero and variance-covariance
matrix $\gamma_{ij} = \min(t_i, t_j)$ for the time points $0 \le t_1 < \cdots < t_k \le 1$, and having a.s.
continuous sample paths. One can easily construct a process on the product space
$\Omega = \mathbb{R}^{[0,1]}$ (or $\mathbb{R}^{[0,\infty)}$) equipped with the sigmafield $\mathcal{F}$ generated by finite-dimensional
events of the form $\{x \in \mathbb{R}^{[0,1]}: a_i < x(t_i) \le b_i,\ i = 1, \ldots, k\}$ having the prescribed
finite-dimensional distributions; in particular, consistency in the Kolmogorov
extension theorem can be checked by noting that for any time points
$t_1 < t_2 < \cdots < t_k$, the matrix $\gamma_{ij} = \min(t_i, t_j)$, $1 \le i, j \le k$, is symmetric and
nonnegative definite, since for any real numbers $x_1, \ldots, x_k$, with $t_0 = 0$,

$\sum_{i,j} \gamma_{ij} x_i x_j = \sum_{i,j} x_i x_j \sum_{r=1}^{\min(i,j)} (t_r - t_{r-1}) = \sum_{r=1}^{k} (t_r - t_{r-1}) \Big(\sum_{i=r}^{k} x_i\Big)^2 \ge 0$.

However, the problem with this construction of a probability space $(\Omega, \mathcal{F}, P)$ for
$\{X_t\}$ is that events in $\mathcal{F}$ can only depend on specifications of values at countably
many time points. Thus, the subset C[0, 1] of $\Omega$ is not measurable; i.e., $C[0, 1] \notin \mathcal{F}$.
This dilemma is resolved in the theoretical complement to Section I.13 by showing
that there is a modification of the process $\{X_t\}$ that yields a process $\{B_t\}$ with sample
paths in C[0, 1] and having the same finite-dimensional distributions as $\{X_t\}$; i.e.,
$\{B_t\}$ is the desired Brownian motion process. The basic idea for this modification is
to show that almost all paths $q \to X_q$, $q \in D$, where D is a countable dense set of time
points, are uniformly continuous. With this, one can then define $\{B_t\}$ by the continuous
extension of these paths given by

$B_t = X_q$ if $t = q \in D$,

$B_t = \lim_{q \to t,\ q \in D} X_q$ if $t \notin D$.

It is then a simple matter to check that the finite-dimensional distributions of $\{X_t\}$
carry over to $\{B_t\}$ under this extension. The distribution of $\{B_t\}$ defines the Wiener
measure on C[0, 1]. So, from the point of view of Kolmogorov's (canonical)
construction, the main problem remains that of showing a.s. uniform continuity on
a countable dense set. A solution is given in theoretical complement 13.1.
2. In general, if $\{X_t: t \in I\}$ is a stochastic process with an uncountable index set, for
example I = [0, 1], then the measurability of events that depend on sample paths at
uncountably many points $t \in I$ can be at issue. This issue goes beyond the
finite-dimensional distributions, and is typically solved by exploiting properties of
the sample paths of the model. For example, if I is discrete, say I = {0, 1, 2, ...},
then the event $\{\sup_{t \in I} X_t > x\}$ is the countable union $\bigcup_{t \in I} \{X_t > x\}$ of measurable
sets, which is, therefore, measurable. But, on the other hand, if I is uncountable, say
I = [0, 1], then $\{\sup_{t \in I} X_t > x\}$ is an uncountable union of measurable sets. This of
course does not make $\{\sup_{t \in I} X_t > x\}$ measurable. If, for example, it is known that
$\{X_t\}$ has continuous sample paths, however, then, letting T denote the rationals in
[0, 1], we have $\{\sup_{t \in I} X_t > x\} = \{\sup_{t \in T} X_t > x\} = \bigcup_{t \in T} \{X_t > x\}$. Thus $\{\sup_{t \in I} X_t > x\}$
is seen to be measurable under these circumstances. While it is not always possible
to construct a stochastic process $\{X_t: t \ge 0\}$ having continuous sample paths for a
given consistent specification of finite-dimensional distributions, it is often possible
to construct a model $(\Omega, \mathcal{F}, P)$, $\{X_t: t \ge 0\}$ with the following property, called
separability. There is a countable dense subset T of $I = [0, \infty)$ and a set $D \in \mathcal{F}$ with
P(D) = 0 such that for each $\omega \in \Omega - D$, and each $t \in I$, there is a sequence $t_1, t_2, \ldots$ in
T (which may depend on $\omega$) such that $t_n \to t$ and $X_{t_n}(\omega) \to X_t(\omega)$ (P. Billingsley (1986),
Probability and Measure, 2nd ed., Wiley, New York, p. 558). In theory, this is enough
sample path regularity to make manageable most such measurability issues connected
with processes at uncountably many time points. In practice though, one seeks to
explicitly construct models with sufficient sample path regularity that such
considerations are often avoidable. The latter is the approach of this text.
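
The nonnegative-definiteness identity for $\gamma_{ij} = \min(t_i, t_j)$ used in complement 1 above can be spot-checked numerically. The following is a minimal sketch with hypothetical function names; it assumes increasing time points $0 = t_0 < t_1 < \cdots < t_k$ and uses exact rational arithmetic.

```python
from fractions import Fraction


def quad_form(ts, xs):
    """Left side: sum over i, j of min(t_i, t_j) x_i x_j."""
    k = len(ts)
    return sum(min(ts[i], ts[j]) * xs[i] * xs[j]
               for i in range(k) for j in range(k))


def sum_of_squares(ts, xs):
    """Right side: sum over r of (t_r - t_{r-1}) (x_r + ... + x_k)^2, t_0 = 0."""
    total, prev = 0, 0
    for r in range(len(ts)):
        tail = sum(xs[r:])
        total += (ts[r] - prev) * tail ** 2
        prev = ts[r]
    return total


if __name__ == "__main__":
    ts = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
    xs = [3, -5, 2, 1]
    print(quad_form(ts, xs), sum_of_squares(ts, xs))
```

Because each increment $t_r - t_{r-1}$ is positive, the right side is a sum of nonnegative terms, which is exactly why the quadratic form cannot be negative.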

Theoretical Complements to Section I.8

1. Let $\{X_t^{(n)}\}$, n = 1, 2, ... and $\{X_t\}$ be stochastic processes whose sample paths
belong to a metric space S with metric $\rho$; for example, S = C[0, 1] with
$\rho(\omega, \eta) = \sup_{0 \le t \le 1} |\omega_t - \eta_t|$, $\omega, \eta \in S$. Let $\mathcal{B}$ denote the Borel sigmafield for S. The
distributions P of $\{X_t\}$ and $P_n$ of $\{X_t^{(n)}\}$, respectively, are probability measures on
$(S, \mathcal{B})$. Assume $\{X_t^{(n)}\}$ and $\{X_t\}$ are defined on a probability space $(\Omega, \mathcal{F}, Q)$. Then,

(i) $P(B) = Q(\{\omega \in \Omega: \{X_t(\omega)\} \in B\})$,

(ii) $P_n(B) = Q(\{\omega \in \Omega: \{X_t^{(n)}(\omega)\} \in B\})$, $n \ge 1$.

Convergence in distribution of $\{X_t^{(n)}\}$ to $\{X_t\}$ has been defined in the text to mean that
the sequence of real-valued random variables $Y_n := f(\{X_t^{(n)}\})$ converges in distribution
to $Y := f(\{X_t\})$ for each continuous (for the metric $\rho$) function $f: S \to \mathbb{R}^1$. However,
an equivalent condition is that for each bounded and continuous real-valued function
$f: S \to \mathbb{R}^1$ one has

$\lim_{n \to \infty} Ef(\{X_t^{(n)}\}) = Ef(\{X_t\})$. (T.8.2)

To see the equivalence, first observe that for any continuous $f: S \to \mathbb{R}^1$, the
functions cos(rf) and sin(rf) are, for each $r \in \mathbb{R}^1$, continuous and bounded functions
on S. Therefore, condition (T.8.2) gives the convergence of the
characteristic functions of the $Y_n$ to that of Y for each continuous f on S. In particular,
the $Y_n$ must converge in distribution to Y. To go the other way, suppose that $f: S \to \mathbb{R}^1$
is continuous and bounded. Assume without loss of generality that $0 < f \le 1$. Then,
for each $N \ge 1$,

$\sum_{k=1}^{N} \frac{k-1}{N} P\Big(\frac{k-1}{N} < f \le \frac{k}{N}\Big) \le \int_S f\,dP \le \sum_{k=1}^{N} \frac{k}{N} P\Big(\frac{k-1}{N} < f \le \frac{k}{N}\Big)$. (T.8.3)

Equivalently, by rearranging terms,

$\frac{1}{N} \sum_{k=1}^{N} P\Big(f > \frac{k}{N}\Big) \le \int_S f\,dP \le \frac{1}{N} \sum_{k=1}^{N} P\Big(f > \frac{k-1}{N}\Big)$. (T.8.4)

Likewise, these inequalities hold with $P_n$ in place of P. Therefore, by (T.8.4) applied
to P,

$\int_S f\,dP \le \frac{1}{N} + \frac{1}{N} \sum_{k=1}^{N} P\Big(f > \frac{k}{N}\Big) \le \frac{1}{N} + \frac{1}{N} \sum_{k=1}^{N} \liminf_n P_n\Big(f > \frac{k}{N}\Big) \le \liminf_n \int_S f\,dP_n + \frac{1}{N}$, (T.8.5)

by (T.8.4) applied to $P_n$, and the fact that $\lim_n \mathrm{Prob}(Y_n > x) = \mathrm{Prob}(Y > x)$ for all
points x of continuity of the d.f. of Y implies $\liminf_n \mathrm{Prob}(Y_n > y) \ge \mathrm{Prob}(Y > y)$ for
all y. Letting $N \to \infty$ gives

$\int_S f\,dP \le \liminf_n \int_S f\,dP_n$.

The same argument applied to $-f$ gives that

$\int_S f\,dP \ge \limsup_n \int_S f\,dP_n$.

Thus, in general,

$\limsup_n \int_S f\,dP_n \le \int_S f\,dP \le \liminf_n \int_S f\,dP_n$, (T.8.6)

which implies

$\limsup_n \int_S f\,dP_n = \liminf_n \int_S f\,dP_n = \int_S f\,dP$. (T.8.7)

This is the desired condition (T.8.2). ∎

With the above equivalence in mind we make the following general definition.

Definition. A sequence $\{P_n\}$ of probability measures on $(S, \mathcal{B})$ converges weakly (or in
distribution) to a probability measure P on $(S, \mathcal{B})$ provided that $\lim_n \int_S f\,dP_n = \int_S f\,dP$
for all bounded and continuous functions $f: S \to \mathbb{R}^1$.

Weak convergence is sometimes denoted by $P_n \Rightarrow P$ as $n \to \infty$. Other equivalent
notions of convergence in distribution are as follows. The proofs are along the lines
of the arguments above. (P. Billingsley (1968), Convergence of Probability Measures,
Theorem 9.1, pp. 11-14.)

Theorem T.8.1. (Alexandrov). $P_n \Rightarrow P$ as $n \to \infty$ if and only if

(i) $\lim_n P_n(A) = P(A)$ for all $A \in \mathcal{B}$ such that $P(\partial A) = 0$, where $\partial A$ denotes the
boundary of A for the metric $\rho$.
(ii) $\limsup_n P_n(F) \le P(F)$ for all closed sets $F \subset S$.
(iii) $\liminf_n P_n(G) \ge P(G)$ for all open sets $G \subset S$.

2. The convergence of a sequence of probability measures is frequently established by

an application of the following theorem due to Prohorov.


Theorem T.8.2. (Prohorov). Let $\{P_n\}$ be a sequence of probability measures on the
metric space S with Borel sigmafield $\mathcal{B}$. If for each $\varepsilon > 0$ there is a compact set $K_\varepsilon \subset S$
such that

$P_n(K_\varepsilon) > 1 - \varepsilon$ for all n = 1, 2, ..., (T.8.8)

then $\{P_n\}$ has a subsequence weakly convergent to a probability measure Q on $(S, \mathcal{B})$.
Moreover, if S is complete and separable then the condition (T.8.8) is also necessary.

The condition (T.8.8) is referred to as tightness of the sequence of probability measures
$\{P_n\}$. A proof of sufficiency of the tightness condition in the special case $S = \mathbb{R}^1$ is
given in Chapter 0, Theorem 5.2. For the general result, consult Billingsley, loc. cit.,
pp. 37-40. A version of (T.8.8) for processes with continuous paths is computed
below in Theorem T.8.4.
3. In the case of probability measures $\{P_n\}$ on S = C[0, 1], if the finite-dimensional
distributions of $P_n$ converge to those of P and if the sequence $\{P_n\}$ is tight, then it
will follow from Prohorov's theorem that $\{P_n\}$ converges weakly to P. To check
tightness it is useful to have the following characterization of (relatively) compact
subsets of C[0, 1] from real analysis (A. N. Kolmogorov and S. V. Fomin (1975),
Introductory Real Analysis, Dover, New York, p. 102).

Theorem T.8.3. (Arzela-Ascoli). A subset A of functions in C[0, 1] has compact
closure if and only if

(i) $\sup_{\omega \in A} |\omega_0| < \infty$,

(ii) $\lim_{\delta \to 0} \sup_{\omega \in A} v_\omega(\delta) = 0$,

where $v_\omega(\delta)$ is the oscillation in $\omega \in C[0, 1]$ defined by $v_\omega(\delta) = \sup_{|s - t| \le \delta} |\omega_s - \omega_t|$.

The condition (ii) refers to the equicontinuity of the functions in A in the sense that
given any $\varepsilon > 0$ there is a common $\delta > 0$ such that for all functions $\omega \in A$ we have
$|\omega_t - \omega_s| < \varepsilon$ if $|t - s| < \delta$. Conditions (i) and (ii) together imply that A is uniformly
bounded in the sense that there is a number B for which

$\|\omega\| := \sup_{0 \le t \le 1} |\omega_t| \le B$ for all $\omega \in A$.

This is because for N sufficiently large we have $\sup_{\omega \in A} v_\omega(1/N) < 1$ and, therefore,
for each $0 \le t \le 1$,

$|\omega_t| \le |\omega_0| + \sum_{i=1}^{N} |\omega_{it/N} - \omega_{(i-1)t/N}| \le \sup_{\omega \in A} |\omega_0| + N \sup_{\omega \in A} v_\omega\big(\tfrac{1}{N}\big) = B$.

4. Combining the Prohorov theorem (T.8.2) with the Arzela-Ascoli theorem (T.8.3)
gives the following criterion for tightness of probability measures $\{P_n\}$ on S = C[0, 1].

Theorem T.8.4. Let $\{P_n\}$ be a sequence of probability measures on C[0, 1]. Then
$\{P_n\}$ is tight if and only if the following two conditions hold.

(i) For each $\eta > 0$ there is a number B such that

$P_n(\{\omega \in C[0, 1]: |\omega_0| > B\}) \le \eta$, n = 1, 2, ....

(ii) For each $\varepsilon > 0$, $\eta > 0$, there is a $0 < \delta < 1$ such that

$P_n(\{\omega \in C[0, 1]: v_\omega(\delta) \ge \varepsilon\}) \le \eta$, $n \ge 1$.

Proof. If $\{P_n\}$ is tight, then given $\eta > 0$ there is a compact K such that $P_n(K) > 1 - \eta$
for all n. By the Arzela-Ascoli theorem, if $B > \sup_{\omega \in K} |\omega_0|$ then

$P_n(\{\omega \in C[0, 1]: |\omega_0| \ge B\}) \le P_n(K^c) \le 1 - (1 - \eta) = \eta$.

Also given $\varepsilon > 0$ select $\delta > 0$ such that $\sup_{\omega \in K} v_\omega(\delta) < \varepsilon$. Then

$P_n(\{\omega \in C[0, 1]: v_\omega(\delta) \ge \varepsilon\}) \le P_n(K^c) < \eta$ for all $n \ge 1$.

The converse goes as follows. Given $\eta > 0$, first select B using (i) such that
$P_n(\{\omega: |\omega_0| \le B\}) \ge 1 - \tfrac12\eta$, for $n \ge 1$. Select $\delta_r$ using (ii) such that
$P_n(\{\omega: v_\omega(\delta_r) < 1/r\}) \ge 1 - 2^{-(r+1)}\eta$ for $n \ge 1$. Now take K to be the closure of

$\{\omega: |\omega_0| \le B\} \cap \bigcap_{r=1}^{\infty} \{\omega: v_\omega(\delta_r) < 1/r\}$.

Then $P_n(K) \ge 1 - \eta$ for $n \ge 1$, and K is compact by the Arzela-Ascoli theorem. ∎

The above theorem taken with Prohorov's theorem is a cornerstone of weak

convergence theory in C[O, 1]. If one has proved convergence of the finite-dimensional
distributions of {X,('} to {X,} then the distributions of {Xo'} must be tight as
probability measures on III' so that condition (i) is implied. In view of this and the
Prohorov theorem we have the following necessary and sufficient condition for weak
convergence in C[0,1] based on convergence of the finite-dimensional distributions
and tightness.

Theorem T.8.5. Let $\{X_t: 0 \le t \le 1\}$ and $\{X_t^{(n)}: 0 \le t \le 1\}$ be stochastic processes on
$(\Omega, \mathcal{F}, P)$ which have a.s. continuous sample paths and suppose that the
finite-dimensional distributions of $\{X_t^{(n)}\}$ converge to those of $\{X_t\}$. Then $\{X_t^{(n)}\}$
converges weakly to $\{X_t\}$ if and only if for each $\varepsilon > 0$

$$\lim_{\delta \to 0} \sup_n P\Big( \sup_{|s-t| \le \delta} |X_t^{(n)} - X_s^{(n)}| > \varepsilon \Big) = 0.$$

Corollary. For the last limit to hold it is sufficient that there be positive numbers
$\alpha$, $\beta$, $M$ such that

$$E|X_t^{(n)} - X_s^{(n)}|^{\alpha} \le M|t - s|^{1+\beta} \quad \text{for all } s, t, n.$$

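To prove the corollary, let $D$ be the set of all dyadic rationals in $[0, 1]$, i.e., numbers
in $[0, 1]$ of the form $j/2^m$ for integers $j$ and $m$. By sample path continuity, the oscillation
of the process over $|t - s| \le \delta$ is given by

$$v_n(D, \delta) = \sup_{|t-s| \le \delta,\ s,t \in D} |X_t^{(n)} - X_s^{(n)}| \quad \text{a.s.}$$

Take $\delta = 2^{-k+1}$. Then, taking suprema over positive integers $i, j, m$,

$$v_n(D, \delta) \le 2 \sup_{j2^{-k} \le i2^{-m} \le (j+1)2^{-k}} |X^{(n)}_{i2^{-m}} - X^{(n)}_{j2^{-k}}|.$$

Now, for $j2^{-k} < i2^{-m} < (j + 1)2^{-k}$, writing

$$i2^{-m} = j2^{-k} + \sum_{v=1}^{r} 2^{-m_v}, \quad \text{where } k < m_1 < m_2 < \cdots < m_r \le m,$$

we have, writing $a(p) = \sum_{v=1}^{p} 2^{-m_v}$ (so that $a(0) = 0$),

$$|X^{(n)}_{i2^{-m}} - X^{(n)}_{j2^{-k}}| \le \sum_{p=0}^{r-1} |X^{(n)}_{j2^{-k}+a(p+1)} - X^{(n)}_{j2^{-k}+a(p)}| \le \sum_{l=k+1}^{\infty} \sup_{0 \le h \le 2^l - 1} |X^{(n)}_{(h+1)2^{-l}} - X^{(n)}_{h2^{-l}}|,$$

since each point $j2^{-k} + a(p)$ is a multiple of $2^{-m_{p+1}}$. Therefore,

$$v_n(D, \delta) \le 2 \sum_{m=k+1}^{\infty} \sup_{0 \le h \le 2^m - 1} |X^{(n)}_{(h+1)2^{-m}} - X^{(n)}_{h2^{-m}}|.$$

Let $\varepsilon > 0$ and take $\delta = 2^{-k+1}$ so small (i.e., $k$ so large) that $\sum_{m=k+1}^{\infty} 1/m^2 < \varepsilon/2$. Then

$$P(v_n(D, \delta) > \varepsilon) \le P\Big( \sup_{0 \le h \le 2^m - 1} |X^{(n)}_{(h+1)2^{-m}} - X^{(n)}_{h2^{-m}}| > \frac{1}{m^2} \text{ for some } m \ge k+1 \Big)
\le \sum_{m=k+1}^{\infty} \sum_{h=0}^{2^m - 1} P\Big( |X^{(n)}_{(h+1)2^{-m}} - X^{(n)}_{h2^{-m}}| > \frac{1}{m^2} \Big).$$

Now apply Chebyshev's Inequality to get

$$P(v_n(D, \delta) > \varepsilon) \le \sum_{m=k+1}^{\infty} \sum_{h=0}^{2^m - 1} m^{2\alpha} E|X^{(n)}_{(h+1)2^{-m}} - X^{(n)}_{h2^{-m}}|^{\alpha}
\le \sum_{m=k+1}^{\infty} m^{2\alpha}\, 2^m M 2^{-m(1+\beta)} = M \sum_{m=k+1}^{\infty} \frac{m^{2\alpha}}{2^{m\beta}}.$$

This bound does not depend on $n$ and goes to zero as $\delta \to 0$ (i.e.,
$k = \log_2 \delta^{-1} + 1 \to \infty$) since it is the tail of a convergent series. This proves the
corollary. ∎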

5. (FCLT and a Brownian Motion Construction) As an application of the above
corollary one can obtain a proof of Donsker's Invariance Principle for i.i.d. summands
having finite fourth moments. Moreover, since the simple random walk has moments
of all orders, this approach can also be used to give an alternative rigorous construction
of the Wiener measure based on Prohorov's theorem as the limiting distribution of
random walks. (Compare theoretical complement 13.1 for another construction.) Let
$Z_1, Z_2, \ldots$ be i.i.d. random variables on a probability space $(\Omega, \mathcal{F}, P)$ having mean
zero, variance one, and finite fourth moment $m_4 = EZ_1^4$. Define $S_0 = 0$,
$S_n = Z_1 + \cdots + Z_n$, $n \ge 1$, and

$$X_t^{(n)} = n^{-1/2} S_{[nt]} + n^{-1/2}(nt - [nt]) Z_{[nt]+1}, \quad 0 \le t \le 1.$$

We will show that there are positive numbers $\alpha$, $\beta$, and $M$ such that

$$E|X_t^{(n)} - X_s^{(n)}|^{\alpha} \le M|t - s|^{1+\beta} \quad \text{for } 0 \le s, t \le 1,\ n = 1, 2, \ldots. \tag{T.5.1}$$

By our corollary this will prove tightness of the distributions of the process $\{X_t^{(n)}\}$,
$n = 1, 2, \ldots$. This together with the finite-dimensional CLT proves the FCLT under
the assumption of finite fourth moments. One needs to calculate the probabilities of
fluctuations described in the Arzela-Ascoli theorem more carefully to get the proof
under finite second moments alone.
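The polygonal process defined above can be evaluated directly from a list of increments. The following is a minimal sketch, not part of the original text; the function name `polygonal_value` and the convention that `z[k]` holds $Z_{k+1}$ are assumptions made for illustration.

```python
import math

def polygonal_value(z, t):
    """Value at time t in [0, 1] of the polygonal process built from z = [Z_1, ..., Z_n]."""
    n = len(z)
    k = min(int(math.floor(n * t)), n)   # k = [nt], the integer part of nt
    s_k = sum(z[:k])                     # S_[nt] = Z_1 + ... + Z_[nt]
    frac = n * t - k                     # nt - [nt], the position inside the current cell
    z_next = z[k] if k < n else 0.0      # Z_{[nt]+1}; it is multiplied by frac = 0 when t = 1
    return (s_k + frac * z_next) / math.sqrt(n)
```

At the grid points $t = k/n$ the value is $n^{-1/2} S_k$, and between grid points the path is interpolated linearly, which is what makes the sample paths continuous.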
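To establish (T.5.1), take $\alpha = 4$. First consider the case where $s = j/n < k/n = t$ are
at the grid points. Then

$$E\{X_t^{(n)} - X_s^{(n)}\}^4 = n^{-2} E\{Z_{j+1} + \cdots + Z_k\}^4
= n^{-2} \sum_{i_1=j+1}^{k} \sum_{i_2=j+1}^{k} \sum_{i_3=j+1}^{k} \sum_{i_4=j+1}^{k} E\{Z_{i_1} Z_{i_2} Z_{i_3} Z_{i_4}\}
= n^{-2} \Big\{ (k - j) EZ_1^4 + \binom{4}{2} \binom{k-j}{2} (EZ_1^2)^2 \Big\}.$$

Thus, in this case,

$$E\{X_t^{(n)} - X_s^{(n)}\}^4 = n^{-2}\{(k - j)m_4 + 3(k - j)(k - j - 1)\}
\le n^{-2}\{(k - j)m_4 + 3(k - j)^2\} \le (m_4 + 3)\Big(\frac{k}{n} - \frac{j}{n}\Big)^2
= (m_4 + 3)|t - s|^2 = c_1 (t - s)^2, \quad \text{where } c_1 = m_4 + 3.$$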
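The count behind the grid-point identity, $E(Z_1 + \cdots + Z_m)^4 = m\,m_4 + 3m(m-1)$ for mean-zero, variance-one summands, can be checked exactly in a small case. The sketch below assumes symmetric $\pm 1$ summands (so $m_4 = 1$); the function names are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import product

def fourth_moment_of_sum(m):
    """Exact E(Z_1 + ... + Z_m)^4 for i.i.d. signs Z_i = +1 or -1, each with probability 1/2."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=m))
    return Fraction(total, 2 ** m)

def grid_point_formula(m, m4=1):
    """Closed form m*m4 + 3*m*(m - 1) obtained by counting the nonvanishing terms."""
    return m * m4 + 3 * m * (m - 1)
```

Only the terms in which the four indices coincide, or pair off into two distinct pairs, have nonzero expectation; the enumeration confirms this count for small $m$.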

Next, consider the more general case $0 \le s, t \le 1$, but for which $|t - s| \ge 1/n$. Then,
for $s < t$,

$$E\{X_t^{(n)} - X_s^{(n)}\}^4 = n^{-2} E\Big\{ \sum_{j=[ns]+1}^{[nt]} Z_j + (nt - [nt]) Z_{[nt]+1} + ([ns] - ns) Z_{[ns]+1} \Big\}^4$$

$$\le n^{-2} 3^4 \Big\{ E\Big( \sum_{j=[ns]+1}^{[nt]} Z_j \Big)^4 + (nt - [nt])^4 EZ^4_{[nt]+1} + (ns - [ns])^4 EZ^4_{[ns]+1} \Big\}$$

$$\le n^{-2} 3^4 \{ c_1([nt] - [ns])^2 + (nt - [nt])^2 m_4 + (ns - [ns])^2 m_4 \}$$

$$\le n^{-2} 3^4 c_1 \{ ([nt] - [ns])^2 + (nt - [nt])^2 + (ns - [ns])^2 \}$$

$$\le n^{-2} 3^4 c_1 \{ ([nt] - [ns]) + (nt - [nt]) + (ns - [ns]) \}^2$$

$$= n^{-2} 3^4 c_1 \{ nt - ns + 2(ns - [ns]) \}^2$$

$$\le n^{-2} 3^4 c_1 \{ nt - ns + 2(nt - ns) \}^2$$

$$= n^{-2} 3^6 c_1 (nt - ns)^2 = 3^6 c_1 (t - s)^2.$$

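In the above, we used the fact that $(a + b + c)^4 \le 3^4(a^4 + b^4 + c^4)$ to get the first
inequality. The analysis of the first (gridpoint) case was then used to get the second
inequality. Finally, if $|t - s| < 1/n$, then either

(a) $\dfrac{k}{n} \le s < t \le \dfrac{k+1}{n}$ for some $0 \le k \le n - 1$, or

(b) $\dfrac{k}{n} \le s < \dfrac{k+1}{n}$ and $\dfrac{k+1}{n} < t < \dfrac{k+2}{n}$ for some $0 \le k \le n - 1$.

In (a), $S_{[nt]} = S_{[ns]}$, so that, since $|nt - ns| \le 1$,

$$E\{X_t^{(n)} - X_s^{(n)}\}^4 = n^{-2} E\{n(t - s) Z_{k+1}\}^4 = m_4 n^2 (t - s)^4
= m_4 (nt - ns)^2 (t - s)^2 \le m_4 (t - s)^2.$$

In (b), $S_{[nt]} - S_{[ns]} = Z_{k+1}$, so that

$$E\{X_t^{(n)} - X_s^{(n)}\}^4 = n^{-2} E\Big\{ Z_{k+1} + n\Big(t - \frac{k+1}{n}\Big) Z_{k+2} - n\Big(s - \frac{k}{n}\Big) Z_{k+1} \Big\}^4$$

$$= n^{-2} E\Big\{ n\Big(t - \frac{k+1}{n}\Big) Z_{k+2} + n\Big(\frac{k+1}{n} - s\Big) Z_{k+1} \Big\}^4$$

$$\le 2^4 n^2 m_4 \Big\{ \Big(t - \frac{k+1}{n}\Big)^4 + \Big(\frac{k+1}{n} - s\Big)^4 \Big\}$$

$$\le 2^4 n^2 m_4 \Big\{ \Big(t - \frac{k+1}{n}\Big) + \Big(\frac{k+1}{n} - s\Big) \Big\}^4 = 2^4 n^2 m_4 (t - s)^4$$

$$= 2^4 m_4 (nt - ns)^2 (t - s)^2 \le 2^4 m_4 (t - s)^2.$$

Take $\beta = 1$, $M = \max\{2^4 m_4,\ 3^6(m_4 + 3)\} = 3^6(m_4 + 3)$ for $\alpha = 4$. ∎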

The FCLT (Theorem 8.1) is stated in the text for convergence in $S = C[0, \infty)$,
when $S$ has the topology of uniform convergence on compacts. One may take the metric
to be $\rho(\omega, \omega') = \sum_{k=1}^{\infty} 2^{-k} d_k/(1 + d_k)$, where $d_k = \max\{|\omega(t) - \omega'(t)|: 0 \le t \le k\}$.
Since the above arguments apply to $[0, k]$ in place of $[0, 1]$, the assertion of Theorem
8.1 follows (under the moment condition $m_4 < \infty$).

6. (Measure Determining Classes) Let $(S, \rho)$ be a metric space, $\mathcal{B}(S)$ its Borel sigmafield.
A class $\mathcal{C} \subset \mathcal{B}(S)$ is measure-determining if, for any two finite measures $\mu$, $\nu$,
$\mu(C) = \nu(C)$ for all $C \in \mathcal{C}$ implies $\mu = \nu$. An example is the class $\mathcal{F}$ of all closed sets. To
see this, consider the lambda class $\mathcal{A}$ of all sets $A$ for which $\mu(A) = \nu(A)$. If this class
contains $\mathcal{F}$ then by the Pi-Lambda Theorem (Chapter 0, Theorem 4.1)
$\mathcal{A} \supset \sigma(\mathcal{F}) = \mathcal{B}(S)$. Similarly, the class $\mathcal{O}$ of all open sets is measure-determining. A
class $\mathcal{G}$ of real-valued bounded Borel measurable functions on $S$ is measure-determining
if $\int g\,d\mu = \int g\,d\nu$ for all $g \in \mathcal{G}$ implies $\mu = \nu$. The class $C_b(S)$ of real-valued bounded
continuous functions on $S$ is measure-determining. To prove this, it is enough to
show that for each $F \in \mathcal{F}$ there exists a sequence $\{f_n\} \subset C_b(S)$ such that $f_n \downarrow 1_F$ as
$n \uparrow \infty$. For this, let $h_n(r) = 1 - nr$ for $0 \le r \le 1/n$, $h_n(r) = 0$ for $r > 1/n$. Then take
$f_n(x) = h_n(\rho(x, F))$.

Theoretical Complements to Section I.9

1. If $f: C[0, \infty) \to \mathbb{R}^1$ is continuous, then the FCLT provides that the real-valued
random variables $X_n = f(\{X_t^{(n)}\})$, $n \ge 1$, converge in distribution to $X = f(\{X_t\})$.
Thus, if one checks by direct computation that the limit $F(a) = \lim_n P(X_n \le a)$, or
$F(a^-) = \lim_n P(X_n < a)$, exists for all $a$, then $F$ is the d.f. of $X$. Applying this to
$f(\{X_s\}) := \max\{X_s: 0 \le s \le t\} =: M_t$, we get (10.10), for $P(\tau_z > t) = P(M_t < z)$, if $z > 0$.
The case $z < 0$ is similar.
Joint distributions of several functionals may be similarly obtained by looking at
linear combinations of the functionals. Here is the precise statement.

Theorem T.9.1. If $f: C[0, \infty) \to \mathbb{R}^k$ is continuous, say $f = (f_1, \ldots, f_k)$ where
$f_j: C[0, \infty) \to \mathbb{R}^1$, then the random vectors $X_n = f(\{X_t^{(n)}\})$, $n \ge 1$, converge in
distribution to $X = f(\{X_t\})$.

Proof. For any $r_1, \ldots, r_k \in \mathbb{R}^1$, $\sum_{j=1}^{k} r_j f_j: C[0, \infty) \to \mathbb{R}^1$ is continuous so that
$\sum_{j=1}^{k} r_j f_j(\{X_t^{(n)}\})$ converges in distribution to $\sum_{j=1}^{k} r_j f_j(\{X_t\})$ by the FCLT. Therefore,
its characteristic function converges to $E e^{i\langle r, f(\{X_t\}) \rangle}$ as $n \to \infty$ for each $r \in \mathbb{R}^k$. This
means $f(\{X_t^{(n)}\})$ converges in distribution to $f(\{X_t\})$ as asserted. ∎

2. (Mann-Wald) It is sufficient that $f: C[0, \infty) \to \mathbb{R}^1$ be only a.s. continuous with
respect to the limiting distribution for the FCLT to apply, i.e., for the convergence
of $f(\{X_t^{(n)}\})$ in distribution to $f(\{X_t\})$. That is,

Theorem T.9.2. If $\{X_t^{(n)}\}$ converges in distribution to $\{X_t\}$ and if $P(\{X_t\} \in D_f) = 0$,
where

$$D_f = \{x \in C[0, \infty): f \text{ is discontinuous at } x\},$$

then $f(\{X_t^{(n)}\})$ converges in distribution to $f(\{X_t\})$.

Proof. This can be proved using Alexandrov's Theorem (T.8.1(ii)), since for any
closed set $F$, $f^{-1}(F) \subset \overline{f^{-1}(F)} \subset D_f \cup f^{-1}(F)$, where the overbar denotes the closure
of the set. ∎

In applications it is the requirement that the limit distribution assign probability
zero to the event $D_f$ that is sometimes nontrivial to check. As an example, consider
(9.5) and (9.6). Let $F = \{\tau_c < \tau_d\}$. Recall that $\Omega = C[0, \infty)$ is given the topology
of uniform convergence on bounded subintervals of $[0, \infty)$. Consider an element $\omega \in \Omega$
that reaches a number less than $c$ before reaching $d$. It is simple to check that $\omega$
belongs to the interior of $F$. On the other hand, if $\omega$ belongs to the closure of $F$, then
either (i) $\omega \in F$, or (ii) $\omega$ neither reaches $c$ nor $d$. Thus, if $\omega \in \partial F$, then either (ii)
occurs, the probability of which is zero for a Brownian motion since $|X_t| \to \infty$ in
probability, as $t \to \infty$, or (iii) in the interval $[\tau_c, \tau_d)$, $\omega$ never goes below $c$. By the
strong Markov property (see Chapter V, Theorem 11.1), the latter event (iii) has the
same probability as that of a Brownian motion, starting at $c$, never reaching below
$c$ before reaching $d$. In turn, the last probability is the same as that of a Brownian
motion, starting at 0, never reaching below 0 before reaching $d - c$. This probability
is zero, since $\tau_z$ converges to zero in probability as $z \downarrow 0$ (see Eq. 10.10). By
combining such arguments with those given in theoretical complement 1 above, one
may also arrive at (10.15) for Brownian motions with nonzero drifts.

Theoretical Complements to Section I.12

1. A proof of Proposition 12.1 for the special case of infinite-dimensional events that
depend on the empirical process through the functional $\omega \to \sup_{0 \le s \le 1} |\omega_s|$ used to
define the Kolmogorov-Smirnov statistic (12.11) is given below. This proof is based
on a trick of M. D. Donsker (1952), "Justification and Extension of Doob's Heuristic
Approach to the Kolmogorov-Smirnov Theorems," Annals Math. Statist., 23,
pp. 277-281, which allows one to apply the FCLT as given in Section 8 (and proved
in theoretical complements to Section I.8 under the assumption of finite fourth
moments).

The key to Donsker's proof is the simple observation that the distribution of the
order statistic $(Y_{(1)}, \ldots, Y_{(n)})$ of $n$ i.i.d. random variables $Y_1, Y_2, \ldots, Y_n$ from the uniform
distribution on $[0, 1]$ can also be obtained as the distribution of the ratios

$$\Big( \frac{S_1}{S_{n+1}}, \frac{S_2}{S_{n+1}}, \ldots, \frac{S_n}{S_{n+1}} \Big),$$

where $S_k = T_1 + \cdots + T_k$, $k \ge 1$, and $T_1, T_2, \ldots$ is an i.i.d. sequence of (mean 1)
exponentially distributed random variables.

Intuitively, if the $T_i$ are regarded as the successive times between occurrences of some
phenomena, then $S_{n+1}$ is the time to the $(n + 1)$st occurrence and, in units of $S_{n+1}$,
the occurrence times should be randomly distributed because of lack of memory and
independence properties. A version of this simple fact is given in Chapter IV
(Proposition 5.6) for the Poisson process. The calculations are essentially the same,
so this is left as an exercise here.
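The distributional identity behind Donsker's trick can be examined by simulation. The sketch below is an illustration under stated assumptions, not a proof: it uses Python's standard `random` module, the helper names are hypothetical, and it compares only the coordinate means, which for uniform order statistics are $k/(n+1)$.

```python
import random

def exponential_ratios(n, rng):
    """Return (S_1/S_{n+1}, ..., S_n/S_{n+1}) for mean-1 exponential gaps T_1, ..., T_{n+1}."""
    partial, sums = 0.0, []
    for _ in range(n + 1):
        partial += rng.expovariate(1.0)
        sums.append(partial)
    return [s / sums[-1] for s in sums[:-1]]

def uniform_order_statistics(n, rng):
    """Return the order statistics of n i.i.d. uniform variables on [0, 1]."""
    return sorted(rng.random() for _ in range(n))

def column_means(sampler, n, reps, rng):
    """Average each coordinate of sampler(n, rng) over reps independent draws."""
    totals = [0.0] * n
    for _ in range(reps):
        for k, value in enumerate(sampler(n, rng)):
            totals[k] += value
    return [total / reps for total in totals]
```

Calling `column_means` with both samplers and the same `n` should give two vectors close to $(1/(n+1), \ldots, n/(n+1))$; agreement of means is of course only a necessary check, not a verification of equality in distribution.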
The precise result that we will prove here is as follows. The symbol $\overset{d}{=}$ below
denotes equality in distribution.

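Proposition T.12.1. Let $Y_1, Y_2, \ldots$ be i.i.d. uniform on $[0, 1]$ and let, for each $n \ge 1$,
$\{F_n(t)\}$ be the corresponding empirical process based on $Y_1, \ldots, Y_n$. Then
$D_n = \sqrt{n} \sup_{0 \le t \le 1} |F_n(t) - t|$ converges in distribution to $\sup_{0 \le t \le 1} |B_t^*|$ as $n \to \infty$.

Proof. In the notation introduced above, we have

$$D_n = \sqrt{n} \sup_{0 \le t \le 1} |F_n(t) - t| = \sqrt{n} \max_{k \le n} \Big| Y_{(k)} - \frac{k}{n} \Big|
\overset{d}{=} \sqrt{n} \max_{k \le n} \Big| \frac{S_k}{S_{n+1}} - \frac{k}{n} \Big|$$

$$= \frac{n}{S_{n+1}} \max_{k \le n} \Big| \frac{S_k - k}{\sqrt{n}} - \frac{k}{n} \cdot \frac{S_{n+1} - n}{\sqrt{n}} \Big|
= \frac{n}{S_{n+1}} \sup_{0 \le t \le 1} |X_t^{(n)} - tX_1^{(n)}| + O(n^{-1/2}), \tag{T.12.1}$$

where

$$X_t^{(n)} = \frac{S_{[nt]} - [nt]}{\sqrt{n}},$$

and, by the SLLN, $n/S_{n+1} \to 1$ a.s. as $n \to \infty$. The result follows from the FCLT,
(8.6), and the definition of Brownian bridge. ∎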

This proposition is sufficient for the particular application to the large-sample
distribution given in the text. A precise definition of weak convergence of the scaled
empirical distribution function as well as a proof of Proposition 12.1 can be found
in P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York.
2. Tabulations of the distribution of the Kolmogorov-Smirnov statistic can be found
in L. H. Miller (1956), "Table of Percentage Points of Kolmogorov Statistics," J.
Amer. Statist. Assoc., 51, pp. 111-121.
3. Let $(X \mid A)$ denote the conditional distribution of a random variable or stochastic
process $X$ given an event $A$. As suggested by Exercise 12.4*, from the convergence
of the finite-dimensional distributions it is possible to obtain the Brownian bridge
as the limiting distribution of $(\{B_t\} \mid 0 \le B_1 \le \varepsilon)$ as $\varepsilon \to 0$, where $\{B_t\}$ denotes standard
Brownian motion on $0 \le t \le 1$. To make this observation firm, one needs to check
tightness of the family $\{(\{B_t\} \mid 0 \le B_1 \le \varepsilon): \varepsilon > 0\}$. The following argument is used
by P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York, p. 84.
Let $F$ be any closed subset of $C[0, 1]$. Then, since $\sup_{0 \le t \le 1} |B_t^* - B_t| = |B_1|$,

$$P(\{B_t\} \in F \mid 0 \le B_1 \le \varepsilon) \le P(\{B_t^*\} \in F_\varepsilon \mid 0 \le B_1 \le \varepsilon),$$

where $F_\varepsilon := \{\omega \in C[0,1]: \mathrm{dist}(\omega, F) \le \varepsilon\}$ for $\mathrm{dist}(\omega, A) := \inf\{\rho(\omega, \gamma): \gamma \in A\}$, $A \subset C[0, 1]$.
But, starting with finite-dimensional sets and then using the monotone class argument
(Chapter 0), one may check that the events $\{\{B_t^*\} \in F_\varepsilon\}$ and $\{0 \le B_1 \le \varepsilon\}$ are
independent. Therefore, $P(\{B_t\} \in F \mid 0 \le B_1 \le \varepsilon) \le P(\{B_t^*\} \in F_\varepsilon)$ for any $\varepsilon > 0$. Since
$F$ is closed, the events $\{\{B_t^*\} \in F_\varepsilon\}$ decrease to $\{\{B_t^*\} \in F\}$ as $\varepsilon \downarrow 0$, and tightness
follows from the continuity of the probability measure $P$ and Alexandrov's Theorem
T.8.1(ii).
4. A check of tightness is also required for the Brownian meander and Brownian
excursion as described in Exercises 12.6 and 12.7, respectively. For this, consult
R. T. Durrett, D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to
Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 117-129. The
distribution of the extremal functionals outlined in the exercises can also be found
in R. T. Durrett and D. L. Iglehart (1977), "Functionals of Brownian Meander and
Brownian Excursion," Ann. Probab., 5, pp. 130-135; K. L. Chung (1976), "Excursions
in Brownian Motion," Ark. Mat., pp. 155-177; D. P. Kennedy (1976), "Maximum
Brownian Excursion," J. Appl. Probability, 13, pp. 371-376. Durrett, Iglehart and Miller
(1977) also show that the * and + commute in the sense that the Brownian excursion
can be obtained either by a meander of the Brownian bridge (as done in Exercise
12.7) or as a bridge of the meander (i.e., conditioning the meander in the sense of
theoretical complement 3 above). Brownian meander and Brownian excursion have
been defined in a variety of other ways in work originating in the late 1940's with
Paul Levy; see P. Levy (1965), Processus Stochastiques et Mouvement Brownien,
Gauthier-Villars, Paris. The theory was extended and terminology introduced in
K. Ito and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths,
Springer Verlag, New York. The general theory is introduced in D. Williams (1979),
Diffusions, Markov Processes, and Martingales, Vol. 1, Wiley, New York. A much
fuller theory is then given in L. C. G. Rogers and D. Williams (1987), Diffusions,
Markov Processes, Martingales, Vol. II, Wiley, New York. Approaches from the point
of view of Markov processes (see theoretical complement 11.2, Chapter V) having
nonstationary transition law are possible. Another very useful approach is from the
point of view of FCLTs for random walks conditioned on a late return to zero; see
W. D. Kaigh (1976), "An Invariance Principle for Random Walk Conditioned by a
Late Return to Zero," Ann. Probab., 4(1), pp. 115-121, and references therein. A
connection with extreme values of branching processes is described in theoretical
complement 11.2, Chapter V.

Theoretical Complements to Section I.13

(Construction of Wiener Measure) Let $\{X_t: t \ge 0\}$ be the process having the
finite-dimensional distributions of the standard Brownian motion starting at 0 as
constructed by the Kolmogorov Extension Theorem on the product probability space
$(\Omega, \mathcal{F}, P)$. As discussed in theoretical complement 7.1, events pertaining to the
behavior of the process at uncountably many time points (e.g., path continuity) cannot
be represented in this framework. However, we will now see how to modify the model
in such a way as to get around this difficulty.
Let $D$ be the set of nonnegative dyadic rational numbers and let
$J_{n,k} = [k/2^n, (k + 1)/2^n]$. We will use the maximal inequality to check that

$$\sum_{n=1}^{\infty} P\Big( \max_{0 \le k < n2^n} \sup_{q \in J_{n,k} \cap D} |X_q - X_{k/2^n}| > \frac{1}{n} \Big) < \infty. \tag{T.13.1}$$

By the Borel-Cantelli lemma we will get from this that with probability 1, for all $n$
sufficiently large,

$$\max_{0 \le k < n2^n} \sup_{q \in J_{n,k} \cap D} |X_q - X_{k/2^n}| \le \frac{1}{n}. \tag{T.13.2}$$

In particular, it will follow that with probability 1, for every $t > 0$, $q \to X_q$ is uniformly
continuous on $D \cap [0, t]$. Thus, almost all sample paths of $\{X_q: q \in D\}$ have a unique
extension to continuous functions $\{B_t: t \ge 0\}$. That is, letting $C = \{\omega \in \Omega$: for each
$t > 0$, $q \to X_q(\omega)$ is uniformly continuous on $D \cap [0, t]\}$, define for $\omega \in C$,

$$B_t(\omega) = \begin{cases} X_q(\omega), & \text{if } t = q \in D, \\ \lim_{q \downarrow t} X_q(\omega), & \text{if } t \notin D, \end{cases}$$

where the limit is over dyadic rational $q$ decreasing to $t$. By construction, $\{B_t: t \ge 0\}$
has continuous paths with probability 1. Moreover, for $0 < t_1 < \cdots < t_k$, with probability
one, $(B_{t_1}, \ldots, B_{t_k}) = \lim_{n \to \infty} (X_{q_1^{(n)}}, \ldots, X_{q_k^{(n)}})$ for dyadic rational $q_1^{(n)}, \ldots, q_k^{(n)}$
decreasing to $t_1, \ldots, t_k$. Also, the random vector $(X_{q_1^{(n)}}, \ldots, X_{q_k^{(n)}})$ has the multivariate
normal distribution with mean vector 0 and variance-covariance matrix
$\sigma_{ij} = \min(t_i, t_j)$, $1 \le i, j \le k$, as a limiting distribution. It follows from these
two facts that this must be the distribution of $(B_{t_1}, \ldots, B_{t_k})$. Thus, $\{B_t\}$ is a standard
Brownian motion process.
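To verify the condition (T.13.1) for the Borel-Cantelli lemma, just note that by
the maximal inequality (see Exercises 4.3, 13.11),

$$P\Big( \max_{1 \le i \le 2^m} |X_{t + i\delta 2^{-m}} - X_t| \ge \alpha \Big) \le 2P(|X_{t+\delta} - X_t| \ge \alpha)
\le \frac{2}{\alpha^4} E(X_{t+\delta} - X_t)^4 = \frac{6\delta^2}{\alpha^4},$$

since the increments of $\{X_t\}$ are independent and Gaussian with mean 0. Now since
the events $\{\max_{1 \le i \le 2^m} |X_{t + i\delta 2^{-m}} - X_t| \ge \alpha\}$ increase with $m$, we have, letting $m \to \infty$,

$$P\Big( \sup_{0 \le q \le 1,\ q \in D} |X_{t + q\delta} - X_t| > \alpha \Big) \le \frac{6\delta^2}{\alpha^4}. \tag{T.13.5}$$

Thus,

$$P\Big( \max_{0 \le k < n2^n} \sup_{q \in J_{n,k} \cap D} |X_q - X_{k/2^n}| > \frac{1}{n} \Big)
\le \sum_{k=0}^{n2^n - 1} P\Big( \sup_{q \in J_{n,k} \cap D} |X_q - X_{k/2^n}| > \frac{1}{n} \Big)
\le n2^n \cdot \frac{6(2^{-n})^2}{(1/n)^4} = \frac{6n^5}{2^n},$$

which is summable.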
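The summability needed for the Borel-Cantelli lemma can be seen numerically. The sketch below assumes the bound has the form $6n^5/2^n$, as obtained from the fourth-moment estimate with $\delta = 2^{-n}$ and $\alpha = 1/n$ over $n2^n$ intervals; the function names are illustrative.

```python
from fractions import Fraction

def borel_cantelli_bound(n):
    """Exact value of the bound 6 n^5 / 2^n on the nth probability in the series."""
    return Fraction(6 * n ** 5, 2 ** n)

def partial_sum(upper):
    """Exact partial sum of the bounds for n = 1, ..., upper."""
    return sum(borel_cantelli_bound(n) for n in range(1, upper + 1))
```

The ratio of consecutive terms is $\tfrac{1}{2}(1 + 1/n)^5$, which is below 1 from $n = 7$ on and tends to $1/2$, so the series converges by comparison with a geometric series; the partial sums stay below 6492, the value given by the standard closed form for $\sum n^5 x^n$ at $x = 1/2$.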

Theoretical Complements to Section I.14

1. The treatment given in this section follows that in R. N. Bhattacharya, V. K. Gupta,
and E. Waymire (1983), "The Hurst Effect Under Trends," J. Appl. Probability, 20,
pp. 649-662. The research on the Hurst effect is rather extensive. For the other related
results mentioned in the text consult the following references:

W. Feller (1951), "The Asymptotic Distribution of the Range of Sums of
Independent Random Variables," Ann. Math. Statist., 22, pp. 427-432.
P. A. P. Moran (1964), "On the Range of Cumulative Sums," Ann. Inst. Statist.
Math., 16, pp. 109-112.
B. B. Mandelbrot and J. W. Van Ness (1968), "Fractional Brownian Motions,
Fractional Noises and Applications," SIAM Rev., 10, pp. 422-437. Processes
of the type considered by Mandelbrot and Van Ness are briefly described in
theoretical complement 1.3 to Chapter IV of this book.

Discrete-Parameter Markov Chains


Consider a discrete-parameter stochastic process $\{X_n\}$. Think of $X_0, X_1, \ldots, X_{n-1}$
as "the past," $X_n$ as "the present," and $X_{n+1}, X_{n+2}, \ldots$ as "the future" of the
process relative to time $n$. The law of evolution of a stochastic process is often
thought of in terms of the conditional distribution of the future given the present
and past states of the process. In the case of a sequence of independent random
variables or of a simple random walk, for example, this conditional distribution
does not depend on the past. This important property is expressed by
Definition 1.1.

Definition 1.1. A stochastic process $\{X_0, X_1, \ldots, X_n, \ldots\}$ has the Markov
property if, for each $n$ and $m$, the conditional distribution of $X_{n+1}, \ldots, X_{n+m}$
given $X_0, X_1, \ldots, X_n$ is the same as its conditional distribution given $X_n$ alone.

A process having the Markov property is called a Markov process. If, in addition,
the state space of the process is countable, then a Markov process is called a
Markov chain.

In view of the next proposition, it is actually enough to take $m = 1$ in the above
definition.

Proposition 1.1. A stochastic process $X_0, X_1, X_2, \ldots$ has the Markov property
if and only if for each $n$ the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$
is a function only of $X_n$.

Proof. For simplicity, take the state space $S$ to be countable. The necessity of
the condition is obvious. For sufficiency, observe that

$$P(X_{n+1} = j_1, \ldots, X_{n+m} = j_m \mid X_0 = i_0, \ldots, X_n = i_n)$$

$$= P(X_{n+1} = j_1 \mid X_0 = i_0, \ldots, X_n = i_n) \cdot P(X_{n+2} = j_2 \mid X_0 = i_0, \ldots, X_n = i_n, X_{n+1} = j_1) \cdots P(X_{n+m} = j_m \mid X_0 = i_0, \ldots, X_{n+m-1} = j_{m-1})$$

$$= P(X_{n+1} = j_1 \mid X_n = i_n) P(X_{n+2} = j_2 \mid X_{n+1} = j_1) \cdots P(X_{n+m} = j_m \mid X_{n+m-1} = j_{m-1}). \tag{1.1}$$

The last equality follows from the hypothesis of the proposition. Thus the
conditional distribution of the future as a function of the past and present states
$i_0, i_1, \ldots, i_n$ depends only on the present state $i_n$. This is, therefore, the
conditional distribution given $X_n = i_n$ (Exercise 1). ∎

A Markov chain $\{X_0, X_1, \ldots\}$ is said to have a homogeneous or stationary
transition law if the distribution of $X_{n+1}, \ldots, X_{n+m}$ given $X_n = y$ depends on
the state at time $n$, namely $y$, but not on the time $n$. Otherwise, the transition
law is called nonhomogeneous. An i.i.d. sequence $\{X_n\}$ and its associated random
walk possess time-homogeneous transition laws, while an independent
nonidentically distributed sequence $\{X_n\}$ and its associated random walk have
nonhomogeneous transitions. Unless otherwise specified, by a Markov process
(chain) we shall mean a Markov process (chain) with a homogeneous transition
law.

The Markov property as defined above refers to a special type of statistical
dependence among families of random variables indexed by a linearly ordered
parameter set. In the case of a continuous parameter process, we have the
following analogous definition.

Definition 1.2. A continuous-parameter stochastic process $\{X_t\}$ has the
Markov property if for each $s < t$, the conditional distribution of $X_t$ given
$\{X_u, u \le s\}$ is the same as the conditional distribution of $X_t$ given $X_s$. Such a
process is called a continuous-parameter Markov process. If, in addition, the
state space is countable, then the process is called a continuous-parameter
Markov chain.


An i.i.d. sequence and a random walk are merely two examples of Markov
chains. To define a general Markov chain, it is convenient to introduce a matrix
p to describe the probabilities of transition between successive states in the
evolution of the process.

Definition 2.1. A transition probability matrix or a stochastic matrix is a square
matrix $p = ((p_{ij}))$, where $i$ and $j$ vary over a finite or denumerable set $S$, satisfying

(i) $p_{ij} \ge 0$ for all $i$ and $j$,

(ii) $\sum_{j \in S} p_{ij} = 1$ for all $i$.

The set $S$ is called the state space and its elements are states.

Think of a particle that moves from point to point in the state space according
to the following scheme. At time $n = 0$ the particle is set in motion either by
starting it at a fixed state $i_0$, called the initial state, or by randomly locating it
in the state space according to a probability distribution $\pi$ on $S$, called the
initial distribution. In the former case, $\pi$ is the distribution concentrated at the
state $i_0$, i.e., $\pi_j = 1$ if $j = i_0$, $\pi_j = 0$ if $j \ne i_0$. In the latter case, the probability
is $\pi_i$ that at time zero the particle will be found in state $i$, where $0 \le \pi_i \le 1$ and
$\sum_i \pi_i = 1$. Given that the particle is in state $i_0$ at time $n = 0$, a random trial is
performed, assigning probability $p_{i_0 j'}$ to the respective states $j' \in S$. If the
outcome of the trial is the state $i_1$, then the particle moves to state $i_1$ at time
$n = 1$. A second trial is performed with probabilities $p_{i_1 j'}$ of states $j' \in S$. If the
outcome of the second trial is $i_2$, then the particle moves to state $i_2$ at time
$n = 2$, and so on.
A typical sample point of this experiment is a sequence of states, say
$(i_0, i_1, i_2, \ldots, i_n, \ldots)$, representing a sample path. The set of all such sample
paths is the sample space $\Omega$. The position $X_n$ at time $n$ is a random variable
whose value is given by $X_n = i_n$ if the sample path is $(i_0, i_1, \ldots, i_n, \ldots)$. The
precise specification of the probability $P_\pi$ on $\Omega$ for the above experiment is
given by

$$P_\pi(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \pi_{i_0} p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n-1} i_n}. \tag{2.1}$$

More generally, for finite-dimensional events of the form

$$A = \{(X_0, X_1, \ldots, X_n) \in B\}, \tag{2.2}$$

where $B$ is an arbitrary set of $(n + 1)$-tuples of elements of $S$, the probability
of $A$ is specified by

$$P_\pi(A) = \sum_{(i_0, i_1, \ldots, i_n) \in B} \pi_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n}. \tag{2.3}$$

By Kolmogorov's existence theorem, $P_\pi$ extends uniquely as a probability
measure on the smallest sigmafield $\mathcal{F}$ containing the class of all events of the
form (2.2); see Example 6.3 of Chapter I. This probability space $(\Omega, \mathcal{F}, P_\pi)$
with $X_n(\omega) = \omega_n$, $\omega \in \Omega$, is a canonical model for the Markov chain with transition
probabilities $((p_{ij}))$ and initial distribution $\pi$ (the Markov property is established
below at (2.7)-(2.10)). In the case of a Markov chain starting in state $i$, that
is, $\pi_i = 1$, we write $P_i$ in place of $P_\pi$.
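The product formula for path probabilities and the resulting probabilities of finite-dimensional events translate directly into code. This is a minimal sketch for a finite state space labeled $0, 1, \ldots$; the names `path_probability` and `event_probability` are illustrative, and exact rational arithmetic is used only to keep the checks free of rounding.

```python
from fractions import Fraction
from itertools import product

def path_probability(pi, p, path):
    """Probability of the initial segment path = (i_0, ..., i_n) under initial law pi and matrix p."""
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= p[a][b]          # one factor p_{ab} per transition a -> b
    return prob

def event_probability(pi, p, B):
    """Probability that (X_0, ..., X_n) lies in B, a collection of (n+1)-tuples of states."""
    return sum(path_probability(pi, p, path) for path in B)
```

Summing over all $(n+1)$-tuples gives total probability 1, which is the consistency that Kolmogorov's existence theorem requires across different $n$.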

To specify various joint distributions and conditional distributions associated
with this Markov chain, it is convenient to use the notation of matrix
multiplication. By definition the $(i, j)$ element of the matrix $p^2$ is given by

$$p_{ij}^{(2)} = \sum_{k \in S} p_{ik} p_{kj}. \tag{2.4}$$

The elements of the matrix $p^n$ are defined recursively by $p^n = p^{n-1}p$ so that the
$(i, j)$ element of $p^n$ is given by

$$p_{ij}^{(n)} = \sum_{k \in S} p_{ik}^{(n-1)} p_{kj} = \sum_{k \in S} p_{ik} p_{kj}^{(n-1)}, \quad n = 2, 3, \ldots. \tag{2.5}$$

It is easily checked by induction on $n$ that the expression for $p_{ij}^{(n)}$ is given directly
in terms of the elements of $p$ according to

$$p_{ij}^{(n)} = \sum_{i_1, \ldots, i_{n-1} \in S} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-2} i_{n-1}} p_{i_{n-1} j}. \tag{2.6}$$
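The agreement between the recursive definition of the matrix power and the sum over intermediate states can be checked on a small finite chain. The sketch below assumes states labeled $0, \ldots, N-1$ and uses illustrative helper names; it is meant as a check of the identity, not as an efficient way to compute powers.

```python
from fractions import Fraction
from itertools import product

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def n_step_matrix(p, n):
    """p^n computed by the recursion p^n = p^(n-1) p, for n >= 1."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

def n_step_by_paths(p, i, j, n):
    """Sum over intermediate states i_1, ..., i_(n-1) of products of one-step probabilities."""
    total = Fraction(0)
    for mid in product(range(len(p)), repeat=n - 1):
        states = (i, *mid, j)
        term = Fraction(1)
        for a, b in zip(states, states[1:]):
            term *= p[a][b]
        total += term
    return total
```

The path sum has $N^{n-1}$ terms, so it is practical only for small cases; the recursion is the form one would use in computation.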

Now let us check the Markov property of this probability model. Using (2.1)
and summing over unrestricted coordinates, the joint distribution of
$X_0, X_{n_1}, X_{n_2}, \ldots, X_{n_k}$, with $0 = n_0 < n_1 < n_2 < \cdots < n_k$, is given by

$$P_\pi(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k)$$

$$= \sum_1 \sum_2 \cdots \sum_k (\pi_i p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n_1 - 1} j_1})(p_{j_1 i_{n_1+1}} p_{i_{n_1+1} i_{n_1+2}} \cdots p_{i_{n_2 - 1} j_2}) \cdots (p_{j_{k-1} i_{n_{k-1}+1}} p_{i_{n_{k-1}+1} i_{n_{k-1}+2}} \cdots p_{i_{n_k - 1} j_k}), \tag{2.7}$$

where $\sum_r$ is the sum over the $r$th block of indices $i_{n_{r-1}+1}, i_{n_{r-1}+2}, \ldots, i_{n_r - 1}$
($r = 1, 2, \ldots, k$). The sum $\sum_k$, keeping indices in all other blocks fixed, yields
the factor $p_{j_{k-1} j_k}^{(n_k - n_{k-1})}$ using (2.6) for the last group of terms. Next sum successively
over the $(k - 1)$st, ..., second, and first blocks of factors to get

$$P_\pi(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \pi_i p_{i j_1}^{(n_1)} p_{j_1 j_2}^{(n_2 - n_1)} \cdots p_{j_{k-1} j_k}^{(n_k - n_{k-1})}. \tag{2.8}$$

Now sum over $i \in S$ to get

$$P_\pi(X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \Big( \sum_{i \in S} \pi_i p_{i j_1}^{(n_1)} \Big) p_{j_1 j_2}^{(n_2 - n_1)} \cdots p_{j_{k-1} j_k}^{(n_k - n_{k-1})}. \tag{2.9}$$

Using (2.8) and the elementary definition of conditional probabilities, it now
follows that the conditional distribution of $X_{n+m}$ given $X_0, X_1, \ldots, X_n$ is given by

$$P_\pi(X_{n+m} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = i)
= p_{ij}^{(m)} = P_\pi(X_{n+m} = j \mid X_n = i)
= P_i(X_m = j \mid X_0 = i), \quad m \ge 1,\ j \in S. \tag{2.10}$$

Although by Proposition 1.1 the case $m = 1$ would have been sufficient to prove
the Markov property, (2.10) justifies the terminology that $p^m := ((p_{ij}^{(m)}))$ is the
$m$-step transition probability matrix. Note that $p^m$ is a stochastic matrix for all
$m \ge 1$.

The calculation of the distribution of $X_m$ follows from (2.10). We have,

$$P_\pi(X_m = j) = \sum_i P_\pi(X_m = j, X_0 = i) = \sum_i P_\pi(X_0 = i) P_i(X_m = j \mid X_0 = i)
= \sum_i \pi_i p_{ij}^{(m)} = (\pi' p^m)_j, \tag{2.11}$$

where $\pi'$ is the transpose of the column vector $\pi$, and $(\pi' p^m)_j$ is the $j$th element
of the row vector $\pi' p^m$.
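The distribution at time $m$ is obtained by multiplying the initial row vector by the transition matrix $m$ times. The sketch below assumes a finite state space labeled $0, \ldots, N-1$; the function names are illustrative.

```python
from fractions import Fraction

def step(row, p):
    """One multiplication of a row vector by the transition matrix p."""
    size = len(p)
    return [sum(row[i] * p[i][j] for i in range(size)) for j in range(size)]

def distribution_at(pi, p, m):
    """Row vector of probabilities P(X_m = j), starting from the initial distribution pi."""
    row = list(pi)
    for _ in range(m):
        row = step(row, p)
    return row
```

Iterating the row-vector product costs on the order of $mN^2$ operations and avoids forming the full matrix power when only one initial distribution is of interest.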


The transition probabilities for some familiar Markov chains are given in the
examples of this section. Although they are excluded from the general
development of this chapter, examples of a non-Markov process and a Markov
process having a nonhomogeneous transition law are both supplied under
Example 8 below.

Example 1. (Completely Deterministic Motion). Let the only elements of $p$,
which may be either a finite or denumerable square matrix, be 0's and 1's. That
is, for each state $i$ there exists a state $h(i)$ such that

$$p_{i h(i)} = 1, \quad p_{ij} = 0 \text{ for } j \ne h(i) \quad (i \in S). \tag{3.1}$$

This means that if the process is now in state $i$ it must be in state $h(i)$ at the
next instant. In this case, if the initial state $X_0$ is known then one knows the
entire future. Thus, if $X_0 = i$, then

$$X_1 = h(i), \quad X_2 = h(h(i)) := h^{(2)}(i), \quad \ldots, \quad X_n = h(h^{(n-1)}(i)) = h^{(n)}(i), \quad \ldots.$$

Hence $p_{ij}^{(n)} = 1$ if $j = h^{(n)}(i)$ and $p_{ij}^{(n)} = 0$ if $j \ne h^{(n)}(i)$. Pseudorandom number
generators are of this type (see Exercise 7).
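A deterministic chain is determined by iterating the map $h$. The sketch below uses a small congruential map on sixteen states as a stand-in for a pseudorandom number generator; the map `toy_generator` and its constants are hypothetical choices for illustration and are not taken from the text or its exercises.

```python
def iterate(h, i, n):
    """Return the n-fold iterate of h applied to i, the state at time n when X_0 = i."""
    state = i
    for _ in range(n):
        state = h(state)
    return state

def n_step_probability(h, i, j, n):
    """n-step transition probability of the deterministic chain: 1 if j is the n-fold iterate, else 0."""
    return 1 if iterate(h, i, n) == j else 0

def toy_generator(i):
    # Hypothetical toy congruential map on S = {0, ..., 15}; not a usable generator.
    return (5 * i + 3) % 16
```

Because the state space here is finite, every orbit is eventually periodic, which is the reason such generators repeat.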

Example 2. (Completely Random Motion or Independence). Let all the rows
of $p$ be identical, i.e., suppose $p_{ij}$ is the same for all $i$. Write for the common row,

$$p_{ij} = p_j \quad (i \in S,\ j \in S). \tag{3.2}$$

In this case, $X_0, X_1, X_2, \ldots$ forms a sequence of independent random variables.
The distribution of $X_0$ is $\pi$ while $X_1, \ldots, X_n, \ldots$ have a common distribution
given by the probability vector $(p_j)_{j \in S}$. If we let $\pi = (p_j)_{j \in S}$, then $X_0, X_1, \ldots$
form a sequence of independent and identically distributed (i.i.d.) random
variables. The coin-tossing example is of this kind. There, $S = \{0, 1\}$ (or $\{H, T\}$)
and $p_0 = \tfrac{1}{2}$, $p_1 = \tfrac{1}{2}$.

Example 3. (Unrestricted Simple Random Walk). Here, $S = \{0, \pm 1, \pm 2, \ldots\}$
and $p = ((p_{ij}))$ is given by

$$p_{ij} = \begin{cases} p & \text{if } j = i + 1, \\ q & \text{if } j = i - 1, \\ 0 & \text{if } |j - i| > 1, \end{cases} \tag{3.3}$$

where $0 < p < 1$ and $q = 1 - p$.

Example 4. (Simple Random Walk with Two Reflecting Boundaries). Here
$S = \{c, c + 1, \ldots, d\}$, where $c$ and $d$ are integers, $c < d$. Let

$$p_{ij} = p \text{ if } j = i + 1 \text{ and } c < i < d, \qquad p_{ij} = q \text{ if } j = i - 1 \text{ and } c < i < d, \qquad p_{c,c+1} = 1, \quad p_{d,d-1} = 1. \tag{3.4}$$

In this case, if at any point of time the particle finds itself in state $c$, then at
the next instant of time it moves with probability 1 to $c + 1$. Similarly, if it is
at $d$ at any point of time it will move to $d - 1$ at the next instant. Otherwise
(i.e., in the interior of $[c, d]$), its motion is like that of a simple random walk.

Example 5. (Simple Random Walk with Two Absorbing Boundaries). Here
$S = \{c, c + 1, \ldots, d\}$, where $c$ and $d$ are integers, $c < d$. Let

$$p_{cc} = 1, \quad p_{dd} = 1. \tag{3.5}$$

For $c < i < d$, $p_{ij}$ is as defined by (3.3) or (3.4). In this case, once the particle
reaches $c$ (or $d$) it stays there forever.
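The transition matrices of the simple random walk with reflecting or absorbing boundaries differ only in their first and last rows. The sketch below builds both for $S = \{c, \ldots, d\}$, with row and column index $i$ standing for state $c + i$; the function name and the `boundary` argument are illustrative assumptions.

```python
from fractions import Fraction

def walk_matrix(c, d, p, boundary):
    """Transition matrix on S = {c, ..., d}, c < d; boundary is 'reflecting' or 'absorbing'."""
    if boundary not in ("reflecting", "absorbing"):
        raise ValueError("boundary must be 'reflecting' or 'absorbing'")
    size = d - c + 1
    q = 1 - p
    mat = [[Fraction(0)] * size for _ in range(size)]
    for i in range(1, size - 1):          # interior states move like a simple random walk
        mat[i][i + 1] = p
        mat[i][i - 1] = q
    if boundary == "reflecting":
        mat[0][1] = Fraction(1)                   # from c the particle moves to c + 1
        mat[size - 1][size - 2] = Fraction(1)     # from d the particle moves to d - 1
    else:
        mat[0][0] = Fraction(1)                   # c is absorbing
        mat[size - 1][size - 1] = Fraction(1)     # d is absorbing
    return mat
```

Every row sums to 1 in both cases, so each matrix is a stochastic matrix in the sense of Definition 2.1.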

Example 6. (Unrestricted General Random Walk). Take $S = \{0, \pm 1, \pm 2, \ldots\}$.
Let $Q$ be an arbitrary probability distribution on $S$, i.e.,
(i) $Q(i) \ge 0$ for all $i \in S$,
(ii) $\sum_{i \in S} Q(i) = 1$.
Define the transition matrix $p$ by

$$p_{ij} = Q(j - i), \quad i, j \in S. \tag{3.6}$$

One may think of this Markov chain as a partial-sum process as follows. Let
$X_0$ have a distribution $\pi$. Let $Z_1, Z_2, \ldots$ be a sequence of i.i.d. random variables
with common distribution $Q$ and independent of $X_0$. Then,

$$X_n = X_0 + Z_1 + \cdots + Z_n, \quad \text{for } n \ge 1, \tag{3.7}$$

is a Markov chain with the transition probability (3.6). Also note that Example
3 is a special case of Example 6, with $Q(-1) = q$, $Q(1) = p$, and $Q(i) = 0$ for
$i \ne \pm 1$.

Example 7. (Bienaymé-Galton-Watson Simple Branching Processes). Particles
such as neutrons or organisms such as bacteria can generate new particles or
organisms of the same type. The number of particles generated by a single
particle is a random variable with a probability function $f$; that is, $f(j)$ is the
probability that a single particle generates $j$ particles, $j = 0, 1, 2, \ldots$. Suppose
that at time $n = 0$ there are $X_0 = i$ particles present. Let $Z_1, Z_2, \ldots, Z_i$ denote
the numbers of particles generated by the first, second, ..., $i$th particle,
respectively, in the initial set. Then each of $Z_1, \ldots, Z_i$ has the probability
function $f$ and it is assumed that $Z_1, \ldots, Z_i$ are independent random variables.
The size of the first generation is $X_1 = Z_1 + \cdots + Z_i$, the total number generated
by the initial set. The $X_1$ particles in the first generation will in turn generate
a total of $X_2$ particles comprising the second generation in the same manner; that
is, the $X_1$ particles generate new particles independently of each other and the
number generated by each has probability function $f$. This goes on so long as
offspring occur. Let $X_n$ denote the size of the $n$th generation. Then, using the
convolution notation for distributions of sums of independent random variables,

$$p_{ij} = P(Z_1 + \cdots + Z_i = j) = f^{*i}(j), \qquad i \ge 1,\ j \ge 0,$$
$$p_{00} = 1, \qquad p_{0j} = 0 \ \text{ if } j \ne 0. \tag{3.8}$$

The last row says that "zero" is an absorbing state, i.e., if $X_n = 0$ at any time $n$,
then $X_m = 0$ for all $m \ge n$, and extinction occurs.
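The rows of the branching transition matrix can be computed by repeated convolution of the offspring distribution. The following is a minimal sketch, assuming an offspring law with finite support stored as a list; the helper names `convolve` and `branching_row` and the particular offspring law are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def convolve(f, g):
    """Distribution of the sum of two independent nonnegative integer variables."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for a, fa in enumerate(f):
        for b, gb in enumerate(g):
            h[a + b] += fa * gb
    return h

def branching_row(f, i):
    """Row i of the transition matrix: p_ij = f^{*i}(j); row 0 is the absorbing row."""
    row = [Fraction(1)]          # i = 0: point mass at 0, i.e. p_00 = 1
    for _ in range(i):
        row = convolve(row, f)   # add one more independent offspring count
    return row

# Hypothetical offspring law: 0, 1 or 2 offspring with probabilities 1/4, 1/2, 1/4.
f = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
row2 = branching_row(f, 2)       # distribution of X_1 given X_0 = 2
```

Exact rational arithmetic is used so that each row can be checked to sum to 1 without rounding error.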

Example 8. (Pólya Urn Scheme and a Non-Markovian Example). A box
contains $r$ red balls and $b$ black balls. A ball is randomly selected from the box
and its color is noted. The ball selected, together with $c \ge 0$ balls of the same
color, is then placed back into the box. This process is repeated in successive
trials numbered $n = 1, 2, \ldots$. Indicate the event that a red ball occurs at the
$n$th trial by $X_n = 1$ and that a black ball occurs at the $n$th trial by $X_n = 0$. A
straightforward induction calculation gives, for $0 < \sum_{k=1}^{n} \varepsilon_k < n$,

$$P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n) = \frac{[r+(s-1)c][r+(s-2)c]\cdots r\,[b+(\tau-1)c]\cdots b}{[r+b+(n-1)c][r+b+(n-2)c]\cdots[r+b]}, \tag{3.9}$$

where

$$s = \sum_{k=1}^{n} \varepsilon_k, \qquad \tau = n - s. \tag{3.10}$$

In the case $s = n$ (i.e., $\varepsilon_1 = \cdots = \varepsilon_n = 1$),

$$P(X_1 = 1, \ldots, X_n = 1) = \frac{[r+(n-1)c]\cdots r}{[r+b+(n-1)c]\cdots[r+b]}, \tag{3.11}$$

and if $s = 0$ (i.e., $\varepsilon_1 = \cdots = \varepsilon_n = 0$) then

$$P(X_1 = 0, \ldots, X_n = 0) = \frac{[b+(n-1)c]\cdots b}{[r+b+(n-1)c]\cdots[r+b]}. \tag{3.12}$$

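In particular,

$$P(X_n = \varepsilon_n \mid X_1 = \varepsilon_1, \ldots, X_{n-1} = \varepsilon_{n-1}) = \frac{P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n)}{P(X_1 = \varepsilon_1, \ldots, X_{n-1} = \varepsilon_{n-1})}$$

$$= \frac{[r+(s-1)c]\cdots r\,[b+(\tau-1)c]\cdots b\;[r+b+(n-2)c]\cdots[r+b]}{[r+(s_{n-1}-1)c]\cdots r\,[b+(\tau_{n-1}-1)c]\cdots b\;[r+b+(n-1)c]\cdots[r+b]}$$

$$= \begin{cases} \dfrac{r + s_{n-1}c}{r+b+(n-1)c} & \text{if } \varepsilon_n = 1, \\[2ex] \dfrac{b + \tau_{n-1}c}{r+b+(n-1)c} & \text{if } \varepsilon_n = 0, \end{cases} \tag{3.13}$$

where $s_{n-1} = \sum_{k=1}^{n-1} \varepsilon_k$ and $\tau_{n-1} = n - 1 - s_{n-1}$.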

It follows that $\{X_n\}$ is non-Markov unless $c = 0$ (in which case $\{X_n\}$ is i.i.d.).
Note, however, that $\{X_n\}$ does have a distinctive symmetry property reflected
in (3.9). Namely, the joint distribution is a function of $s = \sum_{k=1}^{n} \varepsilon_k$ only, and
is therefore invariant under permutations of $\varepsilon_1, \ldots, \varepsilon_n$. Such a stochastic process
is called exchangeable (or symmetrically dependent). The Pólya urn model was
originally introduced to illustrate a notion of "contagious disease" or "accident
proneness" for actuarial mathematics. Although $\{X_n\}$ is non-Markov for $c \ne 0$,
it is interesting to note that the partial-sum process $\{S_n\}$, $S_n = X_1 + \cdots + X_n$, representing the
evolution of accumulated numbers of red balls sampled, does have the Markov
property. From (3.13) one can also get that

$$P(S_n = s \mid S_1 = s_1, \ldots, S_{n-1} = s_{n-1}) = \begin{cases} \dfrac{r + c\,s_{n-1}}{r+b+(n-1)c} & \text{if } s = 1 + s_{n-1}, \\[2ex] \dfrac{b + (n-1-s_{n-1})c}{r+b+(n-1)c} & \text{if } s = s_{n-1}. \end{cases} \tag{3.14}$$

Observe that the transition law (3.14) depends explicitly on the time point $n$.
In other words, the partial-sum process $\{S_n\}$ is a Markov process with a
nonhomogeneous transition law. A related continuous-time version of this
Markov process, again usually called the Pólya process, is described in Exercise
1 of Chapter IV, Section 4.1. An alternative model for contagion is also given
in Example 1 of Chapter IV, Section 4, and that one has a homogeneous
transition law.
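The exchangeability of the urn sequence can be checked numerically. The sketch below, with illustrative urn parameters and hypothetical function names, builds the joint probability of a 0/1 sequence by multiplying one-step conditional probabilities and compares it with the closed-form product for the joint distribution.

```python
from fractions import Fraction
from itertools import permutations

def polya_joint(eps, r, b, c):
    """P(X_1 = eps_1, ..., X_n = eps_n), multiplying one-step conditional probabilities."""
    prob, red, black = Fraction(1), r, b
    for e in eps:
        total = red + black
        if e == 1:                      # red drawn: add c more red balls
            prob *= Fraction(red, total)
            red += c
        else:                           # black drawn: add c more black balls
            prob *= Fraction(black, total)
            black += c
    return prob

def polya_formula(s, n, r, b, c):
    """Closed-form joint probability for any sequence with s ones among n trials."""
    num = Fraction(1)
    for k in range(s):
        num *= r + k * c
    for k in range(n - s):
        num *= b + k * c
    den = Fraction(1)
    for k in range(n):
        den *= r + b + k * c
    return num / den

r, b, c = 2, 3, 1                       # illustrative urn, not from the text
seq = (1, 0, 1, 1)
values = {polya_joint(p, r, b, c) for p in permutations(seq)}
```

Every rearrangement of `seq` yields the same probability, so `values` contains a single number, which also equals `polya_formula(3, 4, r, b, c)`.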


One of the most useful general properties of a Markov chain is that the Markov
property holds even when the "past" is given up to certain types of random
times. Indeed, we have tacitly used it in proving that the simple symmetric
random walk reaches every state infinitely often with probability 1 (see
Eq. 3.18 of Chapter 1). These special random times are called stopping times
or (less appropriately) Markov times.

Definition 4.1. Let $\{Y_n: n = 0, 1, 2, \ldots\}$ be a stochastic process having a
countable state space and defined on some probability space $(\Omega, \mathcal{F}, P)$. A
random variable $\tau$ defined on this space is said to be a stopping time if
(i) it assumes only nonnegative integer values (including, possibly, $+\infty$),
(ii) for every nonnegative integer $m$ the event $\{\omega: \tau(\omega) \le m\}$ is determined
by $Y_0, Y_1, \ldots, Y_m$.

Intuitively, if $\tau$ is a stopping time, then whether or not to stop by time $m$
can be decided by observing the stochastic process up to time $m$. For an example,
consider the first time $\tau_y$ the process $\{Y_n\}$ reaches the state $y$, defined by

$$\tau_y(\omega) = \inf\{n \ge 0: Y_n(\omega) = y\}. \tag{4.1}$$

If $\omega$ is such that $Y_n(\omega) \ne y$ whatever be $n$ (i.e., if the process never reaches $y$),
then take $\tau_y(\omega) = \infty$. Observe that

$$\{\omega: \tau_y(\omega) \le m\} = \bigcup_{n=0}^{m} \{\omega: Y_n(\omega) = y\}. \tag{4.2}$$

Hence $\tau_y$ is a stopping time. The $r$th return times $\tau_y^{(r)}$ of $y$ are defined recursively by

$$\tau_y^{(1)}(\omega) = \inf\{n \ge 1: Y_n(\omega) = y\},$$
$$\tau_y^{(r)}(\omega) = \inf\{n > \tau_y^{(r-1)}(\omega): Y_n(\omega) = y\}, \qquad r = 2, 3, \ldots. \tag{4.3}$$

Once again, the infimum over an empty set is to be taken as $\infty$. Now whether
or not the process has reached (or hit) the state $y$ at least $r$ times by the time
$m$ depends entirely on the values of $Y_1, \ldots, Y_m$. Indeed, $\{\tau_y^{(r)} \le m\}$ is precisely
the event that at least $r$ of the variables $Y_1, \ldots, Y_m$ equal $y$. Hence $\tau_y^{(r)}$ is a
stopping time. On the other hand, if $\eta_y$ denotes the last time the process reaches
the state $y$, then $\eta_y$ is not a stopping time; for whether or not $\eta_y \le m$ cannot
in general be determined without observing the entire process $\{Y_n\}$.
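The defining feature of a stopping time, that the event "stopped by time m" is decided by the first m + 1 observations, can be illustrated on a finite sample path. This is a small sketch with hypothetical function names and an illustrative path; since only finitely many steps are observed, a returned infinity means "not reached within the observed horizon".

```python
import math

def first_hitting_time(path, y):
    """tau_y = inf{n >= 0 : Y_n = y}; the infimum of the empty set is infinity."""
    for n, state in enumerate(path):
        if state == y:
            return n
    return math.inf

def return_times(path, y):
    """Successive return times tau_y^(1) < tau_y^(2) < ... seen along the path (n >= 1)."""
    return [n for n, state in enumerate(path) if n >= 1 and state == y]

def stopped_by(path, y, m):
    """Decide whether tau_y <= m using only Y_0, ..., Y_m."""
    return y in path[: m + 1]

path = [0, 1, 0, -1, 0, 1, 2, 1]   # an illustrative random-walk path
```

For this path, `first_hitting_time(path, 2)` is 6 and `return_times(path, 0)` is `[2, 4]`. A "last visit" time has no analogue of `stopped_by`, because deciding it would require the part of the path after time m.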
Let $S$ be a countable state space and $p$ a transition probability matrix on $S$,
and let $P_\pi$ denote the distribution of the Markov process with transition
probability $p$ and initial distribution $\pi$. It will be useful to identify the events
that depend on the process up to time $n$. For this, let $\Omega$ denote the set of all
sequences $\omega = (i_0, i_1, i_2, \ldots)$ of states, and let $Y_n(\omega)$ be the $n$th coordinate of $\omega$
(if $\omega = (i_0, i_1, \ldots, i_n, \ldots)$, then $Y_n(\omega) = i_n$). Let $\mathcal{F}_n$ denote the class of all events
that depend only on $Y_0, Y_1, \ldots, Y_n$. Then the $\mathcal{F}_n$ form an increasing sequence
of sigmafields of finite-dimensional events. The Markov property says that given
the "past" $Y_0, Y_1, \ldots, Y_m$ up to time $m$, or given $\mathcal{F}_m$, the conditional distribution
of the "after-$m$" stochastic process $Y_m^+ = \{(Y_m^+)_n\} := \{Y_{m+n}: n = 0, 1, \ldots\}$ is $P_{Y_m}$.
In other words, if the process is re-indexed after time $m$ with $m + n$ being
regarded as time $n$, then this stochastic process is conditionally distributed as
a Markov chain having transition probability $p$ and initial state $Y_m$.

A WORD ON NOTATION. Many of the conditional distributions do not depend
on the initial distribution. So the subscripts on $P_\pi$, $P_i$, etc., are suppressed as a
matter of convenience in some calculations.

Suppose now that $\tau$ is a stopping time. "Given the past up to time $\tau$" means
given the values of $\tau$ and $Y_0, Y_1, \ldots, Y_\tau$. By the "after-$\tau$" process we now mean
the stochastic process

$$Y_\tau^+ = \{Y_{\tau+n}: n = 0, 1, 2, \ldots\},$$

which is well defined only on the set $\{\tau < \infty\}$.

Theorem 4.1. Every Markov chain $\{Y_n: n = 0, 1, 2, \ldots\}$ has the strong Markov
property; that is, for every stopping time $\tau$, the conditional distribution of the
after-$\tau$ process $Y_\tau^+ = \{Y_{\tau+n}: n = 0, 1, 2, \ldots\}$, given the past up to time $\tau$, is $P_{Y_\tau}$
on the set $\{\tau < \infty\}$.

Proof. Choose and fix a nonnegative integer $m$ and a positive integer $k$ along
with $k$ time points $0 \le m_1 < m_2 < \cdots < m_k$, and states $i_0, i_1, \ldots, i_m$,
$j_1, j_2, \ldots, j_k$. Then,

$$P(Y_{\tau+m_1} = j_1, Y_{\tau+m_2} = j_2, \ldots, Y_{\tau+m_k} = j_k \mid \tau = m, Y_0 = i_0, \ldots, Y_m = i_m)$$
$$= P(Y_{m+m_1} = j_1, Y_{m+m_2} = j_2, \ldots, Y_{m+m_k} = j_k \mid \tau = m, Y_0 = i_0, \ldots, Y_m = i_m). \tag{4.4}$$

Now if the event $\{\tau = m\}$ (which is determined by the values of $Y_0, \ldots, Y_m$)
is not consistent with the event "$Y_0 = i_0, \ldots, Y_m = i_m$", then $\{\tau = m,
Y_0 = i_0, \ldots, Y_m = i_m\} = \emptyset$ and the conditioning event is impossible. In that
case the conditional probability may be defined arbitrarily (or left undefined).
However, if $\{\tau = m\}$ is consistent with (i.e., implied by) "$Y_0 = i_0, \ldots, Y_m = i_m$",
then $\{\tau = m, Y_0 = i_0, \ldots, Y_m = i_m\} = \{Y_0 = i_0, \ldots, Y_m = i_m\}$, and the right side
of (4.4) becomes

$$P(Y_{m+m_1} = j_1, Y_{m+m_2} = j_2, \ldots, Y_{m+m_k} = j_k \mid Y_0 = i_0, \ldots, Y_m = i_m). \tag{4.5}$$

But by the Markov property, (4.5) equals

$$P_{i_m}(Y_{m_1} = j_1, Y_{m_2} = j_2, \ldots, Y_{m_k} = j_k) = P_{Y_\tau}(Y_{m_1} = j_1, Y_{m_2} = j_2, \ldots, Y_{m_k} = j_k), \tag{4.6}$$

on the set $\{\tau = m\}$. ∎

We have considered just "future" events depending on only finitely many
(namely $k$) time points. The general case (applying to infinitely many time
points) may be obtained by passage to a limit. Note that the equality of (4.4)
and (4.5) holds (in case $\{\tau = m\} \supset \{Y_0 = i_0, \ldots, Y_m = i_m\}$) for all stochastic
processes and all events whose (conditional) probabilities may be sought. The
equality between (4.5) and (4.6) is a consequence of the Markov property. Since
the latter property holds for all future events, so does the corresponding result
for stopping times $\tau$.
Events determined by the past up to time $\tau$ comprise a sigmafield $\mathcal{F}_\tau$, called
the pre-$\tau$ sigmafield. The strong Markov property is often expressed as: the
conditional distribution of $Y_\tau^+$ given $\mathcal{F}_\tau$ is $P_{Y_\tau}$ (on $\{\tau < \infty\}$). Note that $\mathcal{F}_\tau$
is the smallest sigmafield containing all events of the form $\{\tau = m, Y_0 = i_0,
\ldots, Y_m = i_m\}$.

Example 1. In this example let us reconsider the validity of relation (3.18) of
Chapter I in light of the strong Markov property. Let $x$ and $y$ be two integers.
For the simple symmetric random walk $\{S_n: n = 0, 1, 2, \ldots\}$ starting at $x$, $\tau_y \equiv \tau_y^{(1)}$
is an almost surely finite stopping time, for the probability $\rho_{xy}$ that the random
walk ever reaches $y$ is 1 (see Chapter I, Eqs. 3.16-3.17). Denoting by $E_x$ the
expectation with respect to $P_x$, we have

$$\begin{aligned}
P_x(\tau_y^{(2)} < \infty) &= P_x(S_{\tau_y+n} = y \text{ for some } n \ge 1) \\
&= E_x[P(S_{\tau_y+n} = y \text{ for some } n \ge 1 \mid \tau_y, S_0, \ldots, S_{\tau_y})] \\
&= E_x[P_y(S_n = y \text{ for some } n \ge 1)] \quad (\text{strong Markov property, since } S_{\tau_y} = y) \\
&= P_y(S_n = y \text{ for some } n \ge 1) \\
&= P_y(S_1 = y-1,\ S_n = y \text{ for some } n \ge 1) + P_y(S_1 = y+1,\ S_n = y \text{ for some } n \ge 1) \\
&= P_y(S_1 = y-1)\,P_y(S_n = y \text{ for some } n \ge 1 \mid S_1 = y-1) \\
&\quad + P_y(S_1 = y+1)\,P_y(S_n = y \text{ for some } n \ge 1 \mid S_1 = y+1) \\
&= \tfrac12 P_y(S_{1+m} = y \text{ for some } m \ge 0 \mid S_1 = y-1) \\
&\quad + \tfrac12 P_y(S_{1+m} = y \text{ for some } m \ge 0 \mid S_1 = y+1) \quad (m = n-1) \\
&= \tfrac12 P_{y-1}(S_m = y \text{ for some } m \ge 0) + \tfrac12 P_{y+1}(S_m = y \text{ for some } m \ge 0) \quad (\text{Markov property}) \\
&= \tfrac12 \rho_{y-1,y} + \tfrac12 \rho_{y+1,y} = \tfrac12 + \tfrac12 = 1.
\end{aligned} \tag{4.7}$$

Now all the steps in (4.7) remain valid if one replaces $\tau_y^{(1)}$ by $\tau_y^{(r-1)}$ and $\tau_y^{(2)}$ by
$\tau_y^{(r)}$ and assumes that $\tau_y^{(r-1)} < \infty$ almost surely. Hence, by induction,
$P_x(\tau_y^{(r)} < \infty) = 1$ for all positive integers $r$. This is equivalent to asserting

$$1 = P_x(\tau_y^{(r)} < \infty \text{ for all positive integers } r) = P_x(S_n = y \text{ for infinitely many } n). \tag{4.8}$$

The importance of the strong Markov property will be amply demonstrated
in Sections 9-11.


The unrestricted simple random walk $\{S_n\}$ is an example in which any state
$i \in S$ can be reached from every state $j$ in a finite number of steps with positive
probability. If $p$ denotes its transition probability matrix, then $p^2$ is the transition
probability matrix of $\{Y_n\} := \{S_{2n}: n = 0, 1, 2, \ldots\}$. However, for the Markov
chain $\{Y_n\}$, transitions in a finite number of steps are possible from odd to odd
integers and from even to even, but not otherwise. For $\{S_n\}$ one says that there
is one class of "essential" states and for $\{Y_n\}$ that there are two classes of
essential states.
A different situation occurs when the random walk has two absorbing
boundaries on $S = \{c, c+1, \ldots, d-1, d\}$. The states $c$, $d$ can be reached (with
positive probability) from $c+1, \ldots, d-1$. However, $c+1, \ldots, d-1$ cannot
be reached from $c$ or $d$. In this case $c+1, \ldots, d-1$ are called "inessential"
states while $\{c\}$, $\{d\}$ form two classes of essential states.
The term "inessential" refers to states that will not play a role in the long-run
behavior of the process. If a chain has several essential classes, the process
restricted to each class can be analyzed separately.

Definition 5.1. Write $i \to j$ and read it as either "$j$ is accessible from $i$" or "the
process can go from $i$ to $j$" if $p_{ij}^{(n)} > 0$ for some $n \ge 1$.

Since

$$p_{ij}^{(n)} = \sum_{i_1, i_2, \ldots, i_{n-1} \in S} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-1} j}, \tag{5.1}$$

$i \to j$ if and only if there exists one chain $(i, i_1, i_2, \ldots, i_{n-1}, j)$ such that
$p_{i i_1}, p_{i_1 i_2}, \ldots, p_{i_{n-1} j}$ are strictly positive.

Definition 5.2. Write $i \leftrightarrow j$ and read "$i$ and $j$ communicate" if $i \to j$ and $j \to i$.
Say "$i$ is essential" if $i \to j$ implies $j \to i$ (i.e., if any state $j$ is accessible from $i$,
then $i$ is accessible from that state). We shall let $\mathcal{E}$ denote the set of all essential
states. States that are not essential are called inessential.

Proposition 5.1
(a) For every $i$ there exists (at least one) $j$ such that $i \to j$.
(b) $i \to j$, $j \to k$ imply $i \to k$.
(c) "$i$ essential" implies $i \leftrightarrow i$.
(d) $i$ essential, $i \to j$ imply "$j$ is essential" and $i \leftrightarrow j$.
(e) On $\mathcal{E}$ the relation "$\leftrightarrow$" is an equivalence relation (i.e., reflexive,
symmetric, and transitive).

Proof. (a) For each $i$, $\sum_{j \in S} p_{ij} = 1$. Hence there exists at least one $j$ for which
$p_{ij} > 0$; for this $j$ one has $i \to j$.
(b) $i \to j$, $j \to k$ means that there exist $m \ge 1$, $n \ge 1$ such that $p_{ij}^{(m)} > 0$,
$p_{jk}^{(n)} > 0$. Hence,

$$p_{ik}^{(m+n)} = \sum_{l \in S} p_{il}^{(m)} p_{lk}^{(n)} = p_{ij}^{(m)} p_{jk}^{(n)} + \sum_{l \ne j} p_{il}^{(m)} p_{lk}^{(n)} \ge p_{ij}^{(m)} p_{jk}^{(n)} > 0. \tag{5.2}$$

Hence, $i \to k$. Note that the first equality is a consequence of the relation
$p^{m+n} = p^m p^n$.
(c) Suppose $i$ is essential. By (a) there exists $j$ such that $p_{ij} > 0$. Since $i$ is
essential, this implies $j \to i$, i.e., there exists $m \ge 1$ such that $p_{ji}^{(m)} > 0$. But then

$$p_{ii}^{(m+1)} = \sum_{l \in S} p_{il}\, p_{li}^{(m)} = p_{ij}\, p_{ji}^{(m)} + \sum_{l \ne j} p_{il}\, p_{li}^{(m)} > 0. \tag{5.3}$$

Hence $i \to i$ and, therefore, $i \leftrightarrow i$.
(d) Suppose $i$ is essential, $i \to j$. Then there exist $m \ge 1$, $n \ge 1$ such that
$p_{ij}^{(m)} > 0$ and $p_{ji}^{(n)} > 0$. Hence $i \leftrightarrow j$. Now suppose $k$ is any state such that $j \to k$,
i.e., there exists $m' \ge 1$ such that $p_{jk}^{(m')} > 0$. Then, by (b), $i \to k$. Since $i$ is essential,
one must have $k \to i$. Together with $i \to j$ this implies (again by (b)) $k \to j$.
Thus, if any state $k$ is accessible from $j$, then $j$ is accessible from that state $k$,
proving that $j$ is essential.
(e) If $\mathcal{E}$ is empty (which is possible, as for example in the case $p_{i,i+1} = 1$,
$i = 0, 1, 2, \ldots$), then there is nothing to prove. Suppose $\mathcal{E}$ is nonempty. Then:
(i) On $\mathcal{E}$ the relation "$\leftrightarrow$" is reflexive by (c). (ii) If $i$ is essential and $i \leftrightarrow j$, then
(by (d)) $j$ is essential and, of course, $i \leftrightarrow j$ and $j \leftrightarrow i$ are equivalent properties.
Thus "$\leftrightarrow$" is symmetric (on $\mathcal{E}$ as well as on $S$). (iii) If $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \to j$
and $j \to k$. Hence $i \to k$ (by (b)). Also, $k \to j$ and $j \to i$ imply $k \to i$ (again by
(b)). Hence $i \leftrightarrow k$. This shows that "$\leftrightarrow$" is transitive (on $\mathcal{E}$ as well as on $S$).

From the proof of (e) the relation "$\leftrightarrow$" is seen to be symmetric and transitive
on all of $S$ (and not merely $\mathcal{E}$). However, it is not generally true that $i \leftrightarrow i$ (or,
$i \to i$) for all $i \in S$. In other words, reflexivity may break down on $S$.

Example 1. (Simple (Unrestricted) Random Walk). $S = \{0, \pm 1, \pm 2, \ldots\}$.
Assume, as usual, $0 < p < 1$. Then $i \to j$ for all states $i \in S$, $j \in S$. Hence $\mathcal{E} = S$.

Example 2. (Simple Random Walk with Two Absorbing Boundaries). Here
$S = \{c, c+1, \ldots, d\}$, $\mathcal{E} = \{c, d\}$. Note that $c$ is not accessible from $d$, nor is $d$
accessible from $c$.

Example 3. (Simple Random Walk with Two Reflecting Boundaries). Here
$S = \{c, c+1, \ldots, d\}$, and $i \to j$ for all $i \in S$ and $j \in S$. Hence $\mathcal{E} = S$.

Example 4. Let S = {1, 2, 3, 4, 5} and let

1 1 1
5 5 5 5
0 3 0 3 0
p= O 0 4 0 (5.4)
0 3 0 3 0
1 0 1 0 1
[ ii
3 3 3

Note that $\mathcal{E} = \{2, 4\}$.

In Examples 1 and 3 above, there is one essential class and there are no
inessential states.
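For a finite chain, accessibility and the set of essential states can be computed directly from the pattern of positive entries of the transition matrix. The following is a minimal sketch (function names are illustrative) that forms the transitive closure of the one-step relation and then applies the definition of an essential state; the example matrix is the simple symmetric random walk on {0, 1, 2, 3} with both endpoints absorbing.

```python
def accessible(p):
    """reach[i][j] is True iff i -> j, i.e. p_ij^(n) > 0 for some n >= 1."""
    n = len(p)
    reach = [[p[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                      # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach

def essential_states(p):
    """States i such that i -> j implies j -> i."""
    reach = accessible(p)
    n = len(p)
    return [i for i in range(n)
            if all(reach[j][i] for j in range(n) if reach[i][j])]

def essential_classes(p):
    """Equivalence classes of communicating essential states."""
    reach = accessible(p)
    classes, seen = [], set()
    for i in essential_states(p):
        if i not in seen:
            cls = [j for j in range(len(p)) if reach[i][j]]
            classes.append(cls)
            seen.update(cls)
    return classes

# Random walk on {0, 1, 2, 3} with absorbing boundaries 0 and 3 (p = q = 1/2).
p = [[1, 0, 0, 0],
     [0.5, 0, 0.5, 0],
     [0, 0.5, 0, 0.5],
     [0, 0, 0, 1]]
```

Here `essential_states(p)` returns `[0, 3]` and `essential_classes(p)` returns `[[0], [3]]`, in agreement with Example 2.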

Definition 5.3. A transition probability matrix $p$ having one essential class and
no inessential states is called irreducible.
Now fix attention on $\mathcal{E}$. Distinct subsets of essential states can be identified
according to the following considerations. Let $i \in \mathcal{E}$. Consider the set
$\mathcal{E}(i) = \{j \in \mathcal{E}: i \to j\}$. Then, by (d), $i \leftrightarrow j$ for all $j \in \mathcal{E}(i)$. Indeed, if $j, k \in \mathcal{E}(i)$, then
$j \leftrightarrow k$ (for $j \to i$, $i \to k$ imply $j \to k$; similarly, $k \to j$). Thus, all members of $\mathcal{E}(i)$
communicate with each other. Let $r \in \mathcal{E}$, $r \notin \mathcal{E}(i)$. Then $r$ is not accessible from
a state in $\mathcal{E}(i)$ (for, if $j \in \mathcal{E}(i)$ and $j \to r$, then $i \to j$, $j \to r$ will imply $i \to r$ so
that $r \in \mathcal{E}(i)$, a contradiction). Define $\mathcal{E}(r) = \{j \in \mathcal{E}: r \to j\}$. Then, as before, all
states in $\mathcal{E}(r)$ communicate with each other. Also, no state in $\mathcal{E}(r)$ is accessible
from any state in $\mathcal{E}(i)$ (for if $l \in \mathcal{E}(r)$, and $j \in \mathcal{E}(i)$ and $j \to l$, then $i \to l$; but $r \leftrightarrow l$,
so that $i \to l$, $l \to r$ implying $i \to r$, a contradiction). In this manner, one
decomposes $\mathcal{E}$ into a number of disjoint classes, each class being a maximal set
of communicating states. No member of one class is accessible from any member
of a different class. Also note that if $k \in \mathcal{E}(i)$, then $\mathcal{E}(i) = \mathcal{E}(k)$. For if $j \in \mathcal{E}(i)$,
then $j \to i$, $i \to k$ imply $j \to k$; and since $j$ is essential one has $k \to j$. Hence
$j \in \mathcal{E}(k)$. The classes into which $\mathcal{E}$ decomposes are called equivalence classes.
In the case of the unrestricted simple random walk $\{S_n\}$, we have
$\mathcal{E} = S = \{0, \pm 1, \pm 2, \ldots\}$, and all states in $\mathcal{E}$ communicate with each other;
there is only one equivalence class. While for $\{X_n\} = \{S_{2n}\}$, $\mathcal{E} = S$ consists of two disjoint
equivalence classes, the odd integers and the even integers.
Our last item of bookkeeping concerns the role of possible cyclic motions
within an essential class. In the unrestricted simple random walk example, note
that $p_{ii} = 0$ for all $i = 0, \pm 1, \pm 2, \ldots$, but $p_{ii}^{(2)} = 2pq > 0$. In fact $p_{ii}^{(n)} = 0$ for
all odd $n$, and $p_{ii}^{(n)} > 0$ for all even $n$. In this case, we say that the period of $i$
is 2. More generally, if $i \to i$, then the period of $i$ is the greatest common divisor
of the integers in the set $A = \{n \ge 1: p_{ii}^{(n)} > 0\}$. If $d = d_i$ is the period of $i$, then
$p_{ii}^{(n)} = 0$ whenever $n$ is not a multiple of $d$ and $d$ is the largest integer with this
property.

Proposition 5.2
(a) If $i \leftrightarrow j$ then $i$ and $j$ possess the same period. In particular "period" is
constant on each equivalence class.
(b) Let $i \in \mathcal{E}$ have a period $d = d_i$. For each $j \in \mathcal{E}(i)$ there exists a unique
integer $r_j$, $0 \le r_j \le d - 1$, such that $p_{ij}^{(n)} > 0$ implies $n \equiv r_j \pmod d$ (i.e.,
either $n = r_j$ or $n = sd + r_j$ for some integer $s \ge 1$).

Proof. (a) Clearly,

$$p_{ii}^{(a+m+b)} \ge p_{ij}^{(a)}\, p_{jj}^{(m)}\, p_{ji}^{(b)} \tag{5.5}$$

for all positive integers $a$, $m$, $b$. Choose $a$ and $b$ such that $p_{ij}^{(a)} > 0$ and $p_{ji}^{(b)} > 0$.
If $p_{jj}^{(m)} > 0$, then $p_{jj}^{(2m)} \ge p_{jj}^{(m)} p_{jj}^{(m)} > 0$, and

$$p_{ii}^{(a+m+b)} \ge p_{ij}^{(a)}\, p_{jj}^{(m)}\, p_{ji}^{(b)} > 0, \qquad p_{ii}^{(a+2m+b)} \ge p_{ij}^{(a)}\, p_{jj}^{(2m)}\, p_{ji}^{(b)} > 0. \tag{5.6}$$

Therefore, $d$ (the period of $i$) divides $a + m + b$ and $a + 2m + b$, so that it
divides the difference $m = (a + 2m + b) - (a + m + b)$. Since this holds for every $m$
with $p_{jj}^{(m)} > 0$, the period of $i$
does not exceed the period of $j$. By the same argument (since $i \leftrightarrow j$ is the same
as $j \leftrightarrow i$), the period of $j$ does not exceed the period of $i$. Hence the period of $i$
equals the period of $j$.
(b) Choose $a$ such that $p_{ji}^{(a)} > 0$. If $p_{ij}^{(m)} > 0$, $p_{ij}^{(n)} > 0$, then $p_{ii}^{(m+a)} \ge p_{ij}^{(m)} p_{ji}^{(a)} > 0$
and $p_{ii}^{(n+a)} \ge p_{ij}^{(n)} p_{ji}^{(a)} > 0$. Hence $d$, the period of $i$, divides $m + a$, $n + a$ and,
therefore, $m - n = m + a - (n + a)$. Since this is true for all $m$, $n$ such that
$p_{ij}^{(m)} > 0$, $p_{ij}^{(n)} > 0$, it means that the difference between any two integers in the
set $A = \{n: p_{ij}^{(n)} > 0\}$ is divisible by $d$. This implies that there exists a unique
integer $r_j$, $0 \le r_j \le d - 1$, such that $n \equiv r_j \pmod d$ for all $n \in A$ (i.e., $n = sd + r_j$
for some integer $s \ge 0$ where $s$ depends on $n$). ∎

It is generally not true that the period of an essential state $i$ is
$\min\{n \ge 1: p_{ii}^{(n)} > 0\}$. To see this consider the chain with state space $\{1, 2, 3, 4\}$
and transition matrix

$$p = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$

Schematically, only the following one-step transitions are possible:
$1 \to 2$, $2 \to 3$, $3 \to 2$, $3 \to 4$, $4 \to 1$.

Thus $p_{11}^{(2)} = 0$, $p_{11}^{(4)} > 0$, $p_{11}^{(6)} > 0$, etc., and $p_{11}^{(n)} = 0$ for all odd $n$. The states
communicate with each other and their common period is 2, although
$\min\{n: p_{11}^{(n)} > 0\} = 4$. Note that $\min\{n \ge 1: p_{ii}^{(n)} > 0\}$ is a multiple of $d_i$, since $d_i$
divides all $n$ for which $p_{ii}^{(n)} > 0$. Thus, $d_i \le \min\{n \ge 1: p_{ii}^{(n)} > 0\}$.
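The distinction between the period and the first possible return time can be verified mechanically. The sketch below uses hypothetical helper names, and reads the third row of the four-state transition matrix as (0, 1/2, 0, 1/2). It tracks which n-step transition probabilities are positive and takes the greatest common divisor of the observed return lengths; the scan is cut off at a finite horizon that is assumed large enough for a chain this small.

```python
from math import gcd

def n_step_positive(p, n_max):
    """Yield (n, pattern), where pattern[i][j] is True iff p_ij^(n) > 0, for n = 1..n_max."""
    size = len(p)
    step = [[p[i][j] > 0 for j in range(size)] for i in range(size)]
    cur = step
    for n in range(1, n_max + 1):
        yield n, cur
        cur = [[any(cur[i][k] and step[k][j] for k in range(size))
                for j in range(size)] for i in range(size)]

def period(p, i, n_max=None):
    """gcd of {n >= 1 : p_ii^(n) > 0}, scanning n up to n_max (assumed large enough)."""
    n_max = n_max or 2 * len(p) ** 2
    d = 0
    for n, pattern in n_step_positive(p, n_max):
        if pattern[i][i]:
            d = gcd(d, n)
    return d

def first_return_length(p, i, n_max=None):
    """min{n >= 1 : p_ii^(n) > 0}, or None if no return is seen up to n_max."""
    n_max = n_max or 2 * len(p) ** 2
    for n, pattern in n_step_positive(p, n_max):
        if pattern[i][i]:
            return n
    return None

# Four-state chain (states 1..4 stored at indices 0..3).
p = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0.5, 0, 0.5],
     [1, 0, 0, 0]]
```

For state 1 (index 0), `period(p, 0)` is 2 while `first_return_length(p, 0)` is 4, so the period divides, but need not equal, the shortest return length.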

Proposition 5.3. Let $i \in \mathcal{E}$ have period $d > 1$. Let $C_r$ be the set of $j \in \mathcal{E}(i)$ such
that $r_j = r$, where $r_j$ is the remainder term as defined in Proposition 5.2(b). Then
(a) $C_0, C_1, \ldots, C_{d-1}$ are disjoint, $\bigcup_{r=0}^{d-1} C_r = \mathcal{E}(i)$.
(b) If $j \in C_r$ then $p_{jk} > 0$ implies $k \in C_{r+1}$, where we take $r + 1 = 0$ if
$r = d - 1$.

Proof. (a) Follows from Proposition 5.2(b).
(b) Suppose $j \in C_r$ and $p_{ij}^{(n)} > 0$. Then $n = sd + r$ for some $s \ge 0$. Hence, if
$p_{jk} > 0$ then

$$p_{ik}^{(n+1)} \ge p_{ij}^{(n)}\, p_{jk} > 0, \tag{5.7}$$

which implies $k \in C_{r+1}$ (since $n + 1 = sd + r + 1 \equiv r + 1 \pmod d$), by
Proposition 5.2(b). ∎

Here is what Proposition 5.3 means. Suppose $i$ is an essential state and has
a period $d > 1$. In one step (i.e., one time unit) the process can go from $i \in C_0$
only to some state in $C_1$ (i.e., $p_{ij} > 0$ only if $j \in C_1$). From states in $C_1$, in one
step the process can go only to states in $C_2$. This means that in two steps the
process can go from $i$ only to states in $C_2$ (i.e., $p_{ij}^{(2)} > 0$ only if $j \in C_2$), and so
on. In $d$ steps the process can go from $i$ only to states in $C_d = C_0$, completing
one cycle (of $d$ steps). Again in $d + 1$ steps the process can go from $i$ only to
states in $C_1$, and so on. In general, in $sd + r$ steps the process can go from $i$
only to states in $C_r$. Schematically, one has the picture in Figure 5.1 for the
case $d = 4$ and a fixed state $i \in C_0$ of period 4.

Example 5. In the case of the unrestricted simple random walk, the period
is 2 and all states are essential and communicate with each other. Fix $i = 0$.
Then $C_0 = \{0, \pm 2, \pm 4, \ldots\}$, $C_1 = \{\pm 1, \pm 3, \pm 5, \ldots\}$. If we take $i$ to be any
even integer, then $C_0$, $C_1$ are as above. If, however, we start with $i$ odd, then
$C_0 = \{\pm 1, \pm 3, \pm 5, \ldots\}$, $C_1 = \{0, \pm 2, \pm 4, \ldots\}$.


$i \in C_0 \to j \in C_1 \to k \in C_2 \to l \in C_3 \to m \in C_0$

Figure 5.1
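The cyclic classes of Proposition 5.3 can be found by a breadth-first search that labels each state with the number of steps from the starting state, reduced modulo the period. This is a sketch under the assumption that the chain is irreducible with known period d, so that the label is well defined by Proposition 5.2(b). The function name is illustrative, and the matrix is the four-state chain of period 2 with third row read as (0, 1/2, 0, 1/2).

```python
from collections import deque

def cyclic_classes(p, i, d):
    """Partition the class of i into C_0, ..., C_{d-1} by r_j = (steps from i to j) mod d.

    Assumes p is irreducible with period d, so the residue does not depend on the path.
    """
    residue = {i: 0}
    queue = deque([i])
    while queue:
        j = queue.popleft()
        for k, pjk in enumerate(p[j]):
            if pjk > 0 and k not in residue:
                residue[k] = (residue[j] + 1) % d
                queue.append(k)
    classes = [[] for _ in range(d)]
    for j in sorted(residue):
        classes[residue[j]].append(j)
    return classes

# Four-state chain of period 2 (states 1..4 stored at indices 0..3).
p = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0.5, 0, 0.5],
     [1, 0, 0, 0]]
classes = cyclic_classes(p, 0, 2)
```

Starting from index 0 this gives `classes == [[0, 2], [1, 3]]`, and every positive one-step transition leads from one class to the other, as part (b) of the proposition requires.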



As will be demonstrated in this section, if the state space is finite, a complete
analysis of the limiting behavior of $p^n$, as $n \to \infty$, may be carried out by
elementary methods that also provide sharp rates of convergence to the so-called
steady-state or invariant distributions. Although later, in Section 9, the
asymptotic behavior of general Markov chains is analyzed in detail, including
the law of large numbers and the central limit theorem for Markov chains that
admit unique steady-state distributions, the methods of the present section are
also suited for applications to certain more general (nonfinite) state spaces (e.g.,
closed and bounded intervals). These latter extensions are outlined in the
First we consider what happens to the n-step transition law if all states are
accessible from each other in one time step.

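Proposition 6.1. Suppose $S$ is finite and $p_{ij} > 0$ for all $i$, $j$. Then there exists a
unique probability distribution $\pi = \{\pi_j: j \in S\}$ such that

$$\sum_i \pi_i p_{ij} = \pi_j \quad \text{for all } j \in S \tag{6.1}$$

and

$$|p_{ij}^{(n)} - \pi_j| \le (1 - N\delta)^n \quad \text{for all } i, j \in S,\ n \ge 1, \tag{6.2}$$

where $\delta = \min\{p_{ij}: i, j \in S\}$ and $N$ is the number of elements in $S$. Also, $\pi_j \ge \delta$
for all $j \in S$ and $\delta \le 1/N$.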

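Proof. Let $M_j^{(n)}$, $m_j^{(n)}$ denote the maximum and the minimum, respectively, of
the elements $\{p_{ij}^{(n)}: i \in S\}$ of the $j$th column of $p^n$. Since $p_{ij} \ge \delta$ and
$p_{ij} = 1 - \sum_{k \ne j} p_{ik} \le 1 - (N-1)\delta$ for all $i$, one has

$$M_j^{(1)} - m_j^{(1)} \le 1 - (N-1)\delta - \delta = 1 - N\delta. \tag{6.3}$$

Fix two states $i$, $i'$ arbitrarily. Let $J = \{j \in S: p_{ij} > p_{i'j}\}$, $J' = \{j \in S: p_{ij} \le p_{i'j}\}$.
Then

$$0 = 1 - 1 = \sum_j (p_{ij} - p_{i'j}) = \sum_{j \in J'} (p_{ij} - p_{i'j}) + \sum_{j \in J} (p_{ij} - p_{i'j}),$$

so that

$$\sum_{j \in J'} (p_{i'j} - p_{ij}) = \sum_{j \in J} (p_{ij} - p_{i'j}), \tag{6.4}$$

and

$$\sum_{j \in J} (p_{ij} - p_{i'j}) = \sum_{j \in J} p_{ij} - \sum_{j \in J} p_{i'j} = 1 - \sum_{j \in J'} p_{ij} - \sum_{j \in J} p_{i'j} \le 1 - (\#J')\delta - (\#J)\delta = 1 - N\delta. \tag{6.5}$$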

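Therefore,

$$\begin{aligned}
p_{ij}^{(n+1)} - p_{i'j}^{(n+1)} &= \sum_k p_{ik}\, p_{kj}^{(n)} - \sum_k p_{i'k}\, p_{kj}^{(n)} = \sum_k (p_{ik} - p_{i'k})\, p_{kj}^{(n)} \\
&= \sum_{k \in J} (p_{ik} - p_{i'k})\, p_{kj}^{(n)} + \sum_{k \in J'} (p_{ik} - p_{i'k})\, p_{kj}^{(n)} \\
&\le \sum_{k \in J} (p_{ik} - p_{i'k})\, M_j^{(n)} + \sum_{k \in J'} (p_{ik} - p_{i'k})\, m_j^{(n)} \\
&= (M_j^{(n)} - m_j^{(n)}) \sum_{k \in J} (p_{ik} - p_{i'k}) \le (1 - N\delta)(M_j^{(n)} - m_j^{(n)}).
\end{aligned} \tag{6.6}$$

Letting $i$, $i'$ be such that $p_{ij}^{(n+1)} = M_j^{(n+1)}$, $p_{i'j}^{(n+1)} = m_j^{(n+1)}$, one gets from (6.6),

$$M_j^{(n+1)} - m_j^{(n+1)} \le (1 - N\delta)(M_j^{(n)} - m_j^{(n)}). \tag{6.7}$$

Iteration now yields, using (6.3) as well as (6.7),

$$M_j^{(n)} - m_j^{(n)} \le (1 - N\delta)^n \quad \text{for } n \ge 1. \tag{6.8}$$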

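Now

$$M_j^{(n+1)} = \max_i p_{ij}^{(n+1)} = \max_i \Big( \sum_k p_{ik}\, p_{kj}^{(n)} \Big) \le \max_i \Big( \sum_k p_{ik}\, M_j^{(n)} \Big) = M_j^{(n)},$$

$$m_j^{(n+1)} = \min_i p_{ij}^{(n+1)} = \min_i \Big( \sum_k p_{ik}\, p_{kj}^{(n)} \Big) \ge \min_i \Big( \sum_k p_{ik}\, m_j^{(n)} \Big) = m_j^{(n)},$$

i.e., $M_j^{(n)}$ is nonincreasing and $m_j^{(n)}$ nondecreasing in $n$. Since $M_j^{(n)}$, $m_j^{(n)}$ are
bounded above by 1, (6.7) now implies that both sequences have the same limit,
say $\pi_j$. Also, $\delta \le m_j^{(1)} \le m_j^{(n)} \le \pi_j \le M_j^{(n)}$ for all $n$, so that $\pi_j \ge \delta$ for all $j$ and,
since $m_j^{(n)} \le p_{ij}^{(n)} \le M_j^{(n)}$,

$$|p_{ij}^{(n)} - \pi_j| \le M_j^{(n)} - m_j^{(n)}, \tag{6.9}$$

which, together with (6.8), implies the desired inequality (6.2).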

Finally, taking limits on both sides of the identity

$$p_{ij}^{(n+1)} = \sum_k p_{ik}^{(n)}\, p_{kj},$$

one gets $\pi_j = \sum_k \pi_k p_{kj}$, proving (6.1). Since $\sum_j p_{ij}^{(n)} = 1$, taking limits, as $n \to \infty$,
it follows that $\sum_j \pi_j = 1$. To prove uniqueness of the probability distribution
$\pi$ satisfying (6.1), let $\bar\pi = \{\bar\pi_j: j \in S\}$ be a probability distribution satisfying
$\bar\pi' p = \bar\pi'$. Then by iteration it follows that

$$\bar\pi_j = \sum_i \bar\pi_i p_{ij} = (\bar\pi' p)_j = (\bar\pi' p p)_j = (\bar\pi' p^2)_j = \cdots = (\bar\pi' p^n)_j = \sum_i \bar\pi_i p_{ij}^{(n)}. \tag{6.10}$$

Taking limits as $n \to \infty$, one gets $\bar\pi_j = \sum_i \bar\pi_i \pi_j = \pi_j$. Thus, $\bar\pi = \pi$. ∎
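The contraction at the heart of this proof is easy to observe numerically. The sketch below is a floating-point illustration with an arbitrarily chosen strictly positive matrix and hypothetical helper names. It computes the largest column spread of each power of the matrix, that is, the maximum over j of the gap between the largest and smallest entry in column j, and compares it with the geometric bound from (6.8).

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def column_spreads(p, n_max):
    """Return [max_j (M_j^(n) - m_j^(n)) for n = 1..n_max] for the powers of p."""
    size, power, spreads = len(p), p, []
    for _ in range(n_max):
        spreads.append(max(max(row[j] for row in power) - min(row[j] for row in power)
                           for j in range(size)))
        power = mat_mul(power, p)
    return spreads

# Illustrative strictly positive transition matrix (not from the text).
p = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
N = len(p)
delta = min(min(row) for row in p)                 # smallest entry, here 0.1
spreads = column_spreads(p, 10)
bounds = [(1 - N * delta) ** n for n in range(1, 11)]
```

Each entry of `spreads` lies below the corresponding entry of `bounds`, and the spreads shrink monotonically, so the rows of the powers merge into the common limit row given by the invariant distribution.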

For an irreducible aperiodic Markov chain on a finite state space $S$ there is
a positive integer $\nu$ such that

$$\delta' := \min_{i,j} p_{ij}^{(\nu)} > 0. \tag{6.11}$$

Applying Proposition 6.1 with $p$ replaced by $p^\nu$ one gets a probability
distribution $\pi = \{\pi_j: j \in S\}$ such that

$$\max_{i,j} |p_{ij}^{(n\nu)} - \pi_j| \le (1 - N\delta')^n, \qquad n = 1, 2, \ldots. \tag{6.12}$$

Now use the relations

$$|p_{ij}^{(n\nu+m)} - \pi_j| = \Big| \sum_k p_{ik}^{(m)} (p_{kj}^{(n\nu)} - \pi_j) \Big| \le \sum_k p_{ik}^{(m)} (1 - N\delta')^n = (1 - N\delta')^n, \qquad m \ge 1, \tag{6.13}$$

to obtain

$$|p_{ij}^{(n)} - \pi_j| \le (1 - N\delta')^{[n/\nu]}, \qquad n = 1, 2, \ldots, \tag{6.14}$$

where $[x]$ is the integer part of $x$. From here one obtains the following corollary
to Proposition 6.1.

Corollary 6.2. Let $p$ be an irreducible and aperiodic transition law on a state
space $S$ having $N$ states. Then there is a unique probability distribution $\pi$ on
$S$ such that

$$\sum_i \pi_i p_{ij} = \pi_j \quad \text{for all } j \in S. \tag{6.15}$$

Also,

$$|p_{ij}^{(n)} - \pi_j| \le (1 - N\delta')^{[n/\nu]} \quad \text{for all } i, j \in S, \tag{6.16}$$

for some $\delta' > 0$ and some positive integer $\nu$.
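The corollary can be checked on a small chain whose one-step matrix has a zero entry. In the sketch below the two-state matrix, its invariant distribution (1/3, 2/3), and the helper names are all illustrative choices, not taken from the text. The code finds the smallest power with all entries positive and then compares the exact deviations from the invariant distribution with the bound (6.16).

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def smallest_positive_power(p, n_max=100):
    """Smallest nu with every entry of p^nu positive, together with delta' = min entry."""
    power = p
    for nu in range(1, n_max + 1):
        low = min(min(row) for row in power)
        if low > 0:
            return nu, low
        power = mat_mul(power, p)
    raise ValueError("no strictly positive power found up to n_max")

def max_deviation(p, pi, n):
    """max over i, j of |p_ij^(n) - pi_j|."""
    power = p
    for _ in range(n - 1):
        power = mat_mul(power, p)
    size = len(p)
    return max(abs(power[i][j] - pi[j]) for i in range(size) for j in range(size))

half = Fraction(1, 2)
p = [[Fraction(0), Fraction(1)], [half, half]]     # irreducible and aperiodic
nu, delta_prime = smallest_positive_power(p)       # here nu = 2, delta' = 1/4
pi = [Fraction(1, 3), Fraction(2, 3)]              # invariant distribution of p
```

With N = 2 the bound reads $(1/2)^{[n/2]}$, and the exact deviations computed by `max_deviation` stay below it for every n tried.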


The property (6.15) is important enough to justify a definition.

Definition 6.1. A probability measure $\pi$ satisfying (6.15) is said to be an
invariant or steady-state distribution for $p$.

Suppose that $\pi$ is an invariant distribution for $p$ and let $\{X_n\}$ be the Markov
chain with transition law $p$ starting with initial distribution $\pi$. It follows from
the Markov property, using (2.9), that

$$\begin{aligned}
P_\pi(X_m = i_0, X_{m+1} = i_1, \ldots, X_{m+n} = i_n) &= (\pi' p^m)_{i_0}\, p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n-1} i_n} \\
&= \pi_{i_0}\, p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n-1} i_n} \\
&= P_\pi(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n),
\end{aligned} \tag{6.17}$$

for any given positive integer $n$ and arbitrary states $i_0, i_1, \ldots, i_n \in S$. In other
words, the distribution of the process is invariant under time translation; i.e.,
$\{X_n\}$ is a stationary Markov process according to the following definition.

Definition 6.2. A stochastic process $\{Y_n\}$ is called a stationary stochastic
process if for all $n$, $m$

$$P(Y_0 = i_0, Y_1 = i_1, \ldots, Y_n = i_n) = P(Y_m = i_0, Y_{1+m} = i_1, \ldots, Y_{n+m} = i_n). \tag{6.18}$$

Proposition 6.1 or its Corollary 6.2 establishes the existence and uniqueness
of an invariant initial distribution that makes the process stationary. Moreover,
the asymptotic convergence rate (relaxation time) may also be expressed in the
form

$$|p_{ij}^{(n)} - \pi_j| \le c\, e^{-\lambda n} \tag{6.19}$$

for some $c$, $\lambda > 0$.

Suppose that $\{X_n\}$ is a stochastic process with state space $S = [c, d]$ and
having the Markov property. Suppose also that the conditional distribution of
$X_{n+1}$ given $X_n = x$ has a density $p(x, y)$ that is jointly continuous in $(x, y)$ and
does not depend on $n \ge 1$. Given $X_0 = x_0$, the (conditional on $X_0$) joint density
of $X_1, \ldots, X_n$ is given by

$$f_{(X_1, \ldots, X_n \mid X_0 = x_0)}(x_1, \ldots, x_n) = p(x_0, x_1)\, p(x_1, x_2) \cdots p(x_{n-1}, x_n). \tag{6.20}$$

If $X_0$ has a density $\mu(x)$, then the joint density of $X_0, \ldots, X_n$ is

$$f_{(X_0, \ldots, X_n)}(x_0, \ldots, x_n) = \mu(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n). \tag{6.21}$$

Now let

$$\delta = \min_{x, y \in [c, d]} p(x, y). \tag{6.22}$$

In a manner analogous to the proof of Proposition 6.1, one can also obtain
the following result (Exercise 9).

Proposition 6.3. If $\delta = \min_{x, y \in [c, d]} p(x, y) > 0$ then there is a continuous
probability density function $\pi(x)$ such that

$$\int_c^d \pi(x)\, p(x, y)\, dx = \pi(y) \quad \text{for all } y \in (c, d) \tag{6.23}$$

and

$$|p^{(n)}(x, y) - \pi(y)| \le [1 - \delta(d - c)]^{n-1}\, \theta \quad \text{for all } x, y \in (c, d), \tag{6.24}$$

where

$$\theta = \max_{x, y, z \in [c, d]} \{p(x, y) - p(z, y)\}. \tag{6.25}$$

Here $p^{(n)}(x, y)$ is the $n$-step transition probability density function of $X_n$ given
$X_0 = x$.

Markov processes on general state spaces are discussed further in theoretical
complements to this chapter.
Observe that if $A = ((a_{ij}))$ is an $N \times N$ matrix with strictly positive entries,
then one may readily deduce from Proposition 6.1 that if furthermore
$\sum_{j=1}^{N} a_{ij} = 1$ for each $i$, then the spectral radius of $A$, i.e., the magnitude of the
largest eigenvalue of $A$, must be 1. Moreover, $\lambda = 1$ must be a simple eigenvalue
(i.e., multiplicity 1) of $A$. To see this let $z$ be a (left) eigenvector of $A$
corresponding to $\lambda = 1$. Then for $t$ sufficiently large, $z + t\pi$ is also a positive
eigenvector (and normalizable), where $\pi$ is the invariant distribution (normalized
positive eigenvector for $\lambda = 1$) of $A$. Thus, uniqueness makes $z$ a scalar multiple
of $\pi$. The following theorem provides an extension of these results to the case
of arbitrary positive matrices $A = ((a_{ij}))$, i.e., $a_{ij} > 0$ for $1 \le i, j \le N$. Use is not
made of this result until Sections 11 and 12, so that it may be skipped on first
reading. At this stage it may be regarded as an application of probability to

Theorem 6.4. [Perron-Frobenius]. Let $A = ((a_{ij}))$ be a positive $N \times N$ matrix.
(a) There is a unique eigenvalue $\lambda_0$ of $A$ that has largest magnitude.
Moreover, $\lambda_0$ is positive and has a corresponding positive eigenvector.
(b) Let $x$ be any nonnegative nonzero vector. Then

$$v = \lim_{n \to \infty} \lambda_0^{-n} A^n x \tag{6.26}$$

exists and is an eigenvector of $A$ corresponding to $\lambda_0$, unique up to a
scalar multiple determined by $x$, but otherwise independent of $x$.

Proof. Define $\Lambda^+ = \{\lambda > 0: Ax \ge \lambda x$ for some nonnegative nonzero vector $x\}$;
here inequalities are to be interpreted componentwise. Observe that the set $\Lambda^+$
is nonempty and bounded above by $\|A\| := \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}$. Let $\lambda_0$ be the least
upper bound of $\Lambda^+$. There is a sequence $\{\lambda_n: n \ge 1\}$ in $\Lambda^+$ with limit $\lambda_0$ as
$n \to \infty$. Let $\{x_n: n \ge 1\}$ be corresponding nonnegative vectors, normalized so
that $\|x_n\| := \sum_i (x_n)_i = 1$, $n \ge 1$, for which $Ax_n \ge \lambda_n x_n$. Then, since $\|x_n\| = 1$,
$n = 1, 2, \ldots$, $\{x_n\}$ must have a convergent subsequence, with limit denoted $x_0$,
say. Therefore $Ax_0 \ge \lambda_0 x_0$ and hence $\lambda_0 \in \Lambda^+$. In fact, it follows from the least
upper bound property of $\lambda_0$ that $Ax_0 = \lambda_0 x_0$. For otherwise there must be a
component with strict inequality, say $\sum_j a_{1j} x_j - \lambda_0 x_1 = \delta > 0$, where
$x_0 = (x_1, \ldots, x_N)'$, and $\sum_j a_{kj} x_j - \lambda_0 x_k \ge 0$, $k = 2, \ldots, N$. But then taking
$y = (x_1 + \delta/(2\lambda_0), x_2, \ldots, x_N)'$ we get $Ay > \lambda_0 y$ with strict inequality in each
component. This contradicts the maximality of $\lambda_0$. To prove that if $\lambda$ is any
other eigenvalue then $|\lambda| \le \lambda_0$, let $z$ be an eigenvector corresponding to $\lambda$ and
define $|z| = (|z_1|, \ldots, |z_N|)'$. Then $Az = \lambda z$ implies $A|z| \ge |\lambda||z|$. Therefore, by
definition of $\lambda_0$, we have $|\lambda| \le \lambda_0$. To prove part (b) of the Theorem we can
apply Proposition 6.1 to the transition probability matrix

$$p_{ij} = \frac{a_{ij}\, x_j}{\lambda_0\, x_i},$$

where $x_0 = (x_1, \ldots, x_N)'$ is a positive eigenvector corresponding to $\lambda_0$. In
particular, noting that

$$p_{ij}^{(2)} = \sum_{k=1}^{N} p_{ik}\, p_{kj} = \sum_{k=1}^{N} \frac{a_{ik} x_k}{\lambda_0 x_i} \cdot \frac{a_{kj} x_j}{\lambda_0 x_k} = \frac{a_{ij}^{(2)}\, x_j}{\lambda_0^2\, x_i},$$

and inductively

$$p_{ij}^{(n)} = \frac{a_{ij}^{(n)}\, x_j}{\lambda_0^n\, x_i},$$

where $a_{ij}^{(n)}$ denotes the $(i, j)$ entry of $A^n$, the result follows. ∎
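Both ingredients of this argument can be tried on a small positive matrix: the rescaling that turns the matrix into a transition matrix, and the limit in (6.26). The sketch below uses an illustrative 2 x 2 matrix whose largest eigenvalue and positive eigenvector are known in closed form (3 and (1, 1)); the function names are hypothetical.

```python
from fractions import Fraction

def mat_vec(a, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def scaled_power(a, lam0, x, n):
    """Compute lam0^{-n} A^n x, the n-th term of the limit in (6.26)."""
    v = list(x)
    for _ in range(n):
        v = [vi / lam0 for vi in mat_vec(a, v)]
    return v

def induced_transition_matrix(a, lam0, x0):
    """p_ij = a_ij x_j / (lam0 x_i) for a positive eigenvector x0 of A."""
    n = len(a)
    return [[a[i][j] * x0[j] / (lam0 * x0[i]) for j in range(n)] for i in range(n)]

# Illustrative positive matrix with largest eigenvalue 3 and eigenvector (1, 1).
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(2)]]
lam0, x0 = Fraction(3), [Fraction(1), Fraction(1)]
P = induced_transition_matrix(A, lam0, x0)
v20 = scaled_power(A, lam0, [Fraction(1), Fraction(0)], 20)
```

The rows of `P` sum to 1, so Proposition 6.1 applies to it, and `v20` is already within $10^{-9}$ of (1/2, 1/2), a multiple of the eigenvector (1, 1).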

Corollary 6.5. Let $A = ((a_{ij}))$ be an $N \times N$ matrix with positive entries and
let $B$ be the $(N-1) \times (N-1)$ matrix obtained from $A$ by striking out an $i$th
row and $j$th column. Then the spectral radius of $B$ is strictly smaller than that
of $A$.

Proof. Let $\lambda_{00}$ be the largest positive eigenvalue of $B$ and without loss of
generality take $B = ((a_{ij}: i, j = 1, \ldots, N-1))$. Since, for some positive $x$,

$$\sum_{j=1}^{N} a_{ij} x_j = \lambda_0 x_i, \qquad i = 1, \ldots, N, \quad x_i > 0,$$

we have

$$(Bx)_i = \sum_{j=1}^{N-1} a_{ij} x_j = \lambda_0 x_i - a_{iN} x_N < \lambda_0 x_i, \qquad i = 1, \ldots, N-1.$$

Thus, by the property (6.26) applied to $\lambda_{00}$ we must have $\lambda_{00} < \lambda_0$. ∎

Corollary 6.6. Let $A = ((a_{ij}))$ be a matrix of strictly positive elements and let
$\lambda_0$ be the positive eigenvalue of maximum magnitude (spectral radius). Then
$\lambda_0$ is a simple eigenvalue.

Proof. Consider the characteristic polynomial $p(\lambda) = \det(A - \lambda I)$. Differentiation
with respect to $\lambda$ gives $p'(\lambda) = -\sum_{k=1}^{N} \det(A_k - \lambda I)$, where $A_k$ is the
matrix obtained from $A$ by striking out the $k$th row and column. By Corollary
6.5 we have $\det(A_k - \lambda_0 I) \ne 0$, since each $A_k$ has smaller spectral radius than
$\lambda_0$. Moreover, since each polynomial $\det(A_k - \lambda I)$ has the same leading term,
they all have the same sign at $\lambda = \lambda_0$. Thus $p'(\lambda_0) \ne 0$. ∎

Another formula for the spectral radius is given in Section 14.



In general, the transition law p may admit several essential classes, periodicities, or inessential states. In this section, we consider the asymptotics of $p^n$ for such cases.
First suppose that S is finite and is, under p, a single class of periodic essential states of period $d > 1$. Then the matrix $p^d$, regarded as a (one-step) transition probability matrix of the process viewed every d time steps, admits d (equivalence) classes $C_0, C_1, \ldots, C_{d-1}$, each of which is aperiodic. Applying (6.16) to $C_r$ and $p^d$ (instead of S and p) one gets, writing $N_r$ for the number of elements in $C_r$,

$$|p_{ij}^{(nd)} - \pi_j| \le (1 - N_r\delta_r)^{[n/\nu_r]} \quad \text{for } i, j \in C_r, \tag{7.1}$$

where $\pi_j = \lim_{n\to\infty} p_{ij}^{(nd)}$, $\nu_r$ is the smallest positive integer such that $p_{ij}^{(\nu_r d)} > 0$ for all $i, j \in C_r$, and $\delta_r = \min\{p_{ij}^{(\nu_r d)} : i, j \in C_r\}$. Let $\delta = \min\{\delta_r : r = 0, 1, \ldots, d-1\}$. Then one has, writing $L = \min\{N_r : 0 \le r \le d-1\}$, $\nu = \max\{\nu_r : 0 \le r \le d-1\}$,

$$|p_{ij}^{(nd)} - \pi_j| \le (1 - L\delta)^{[n/\nu]} \quad \text{for } i, j \in C_r \ (r = 0, 1, \ldots, d-1). \tag{7.2}$$


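If $i \in C_r$ and $j \in C_s$ with $s = r + m \pmod d$, then one gets, using the facts that $p_{kj}^{(nd)} = 0$ if k is not in $C_s$, and that $\sum_{k \in C_s} p_{ik}^{(m)} = 1$,

$$|p_{ij}^{(nd+m)} - \pi_j| = \Big|\sum_{k \in C_s} p_{ik}^{(m)}\big(p_{kj}^{(nd)} - \pi_j\big)\Big| \le (1 - L\delta)^{[n/\nu]} \quad \text{for } i \in C_r,\ j \in C_s. \tag{7.3}$$

Of course,

$$p_{ij}^{(nd+m')} = 0 \quad \text{if } i \in C_r,\ j \in C_s, \text{ and } m' \not\equiv s - r \pmod d. \tag{7.4}$$

Note that $\{\pi_j : j \in C_r\}$ is the unique invariant initial distribution on $C_r$ for the restriction of $p^d$ to $C_r$.
Now, in view of (7.4),

$$\sum_{t=1}^{nd} p_{ij}^{(t)} = \sum_{t'=0}^{n-1} p_{ij}^{(t'd+m)} \quad \text{if } i \in C_r,\ j \in C_s, \text{ and } m = s - r \pmod d. \tag{7.5}$$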

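If $r = s$ then $m = 0$ and the index of the second sum in (7.5) ranges from 1 to n. By (7.3) one then has

$$\lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^{nd} p_{ij}^{(t)} = \pi_j \qquad (i, j \in S). \tag{7.6}$$

This implies, on multiplying (7.6) by 1/d,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^{n} p_{ij}^{(t)} = \frac{\pi_j}{d} \qquad (i, j \in S). \tag{7.7}$$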

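Now observe that $\bar\pi = \{\pi_j/d : j \in S\}$ is the unique invariant initial distribution of the periodic chain defined by p since

$$\sum_{k \in S} \frac{\pi_k}{d}\, p_{kj} = \sum_{k \in S} \Big(\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} p_{ik}^{(t)}\Big) p_{kj} = \lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} p_{ij}^{(t+1)} = \frac{\pi_j}{d} \qquad (j \in S). \tag{7.8}$$

Moreover, if $\tilde\pi = \{\tilde\pi_j : j \in S\}$ is another invariant distribution, then

$$\tilde\pi_j = \sum_{k \in S} \tilde\pi_k p_{kj} = \sum_{k \in S} \tilde\pi_k p_{kj}^{(t)} \qquad (t = 1, 2, \ldots), \tag{7.9}$$

so that, on averaging over $t = 1, \ldots, n-1, n$,

$$\tilde\pi_j = \frac{1}{n}\sum_{t=1}^{n} \sum_{k \in S} \tilde\pi_k p_{kj}^{(t)} = \sum_{k \in S} \tilde\pi_k \Big(\frac{1}{n}\sum_{t=1}^{n} p_{kj}^{(t)}\Big).$$

The right side converges to $\sum_k \tilde\pi_k (\pi_j/d) = \pi_j/d$, as $n \to \infty$, by (7.7). Hence $\tilde\pi_j = \pi_j/d$.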
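The averaging in (7.7) can be checked numerically on a small example. The sketch below is illustrative only: the chain (a symmetric walk on a cycle of four states, which has period d = 2) and the helper names `mat_mul` and `cesaro_average` are assumptions made for this example, not part of the text. For this chain the classes are {0, 2} and {1, 3}, the limits in (7.1) are 1/2 on each class, and the invariant probabilities are 1/4.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(p, n):
    """Return (1/n) * sum_{t=1}^{n} p^t, the average on the left of (7.7)."""
    size = len(p)
    power = [row[:] for row in p]              # current power, starting at p^1
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                total[i][j] += power[i][j]
        power = mat_mul(power, p)              # move on to the next power
    return [[x / n for x in row] for row in total]


# Hypothetical example: symmetric walk on a cycle of 4 states (period 2).
p = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]

avg = cesaro_average(p, 1000)
for row in avg:
    print([round(x, 4) for x in row])
```

The powers of p themselves do not converge here (odd powers vanish on the diagonal), which is why the statement is about averages.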

If the finite state space S consists of b equivalence classes of essential states ($b > 1$), say $\mathscr{E}_1, \mathscr{E}_2, \ldots, \mathscr{E}_b$, then the results of the preceding paragraphs may be applied to each $\mathscr{E}_i$, for the restriction of p to $\mathscr{E}_i$, separately. Let $\pi^{(i)}$ denote the unique invariant initial distribution for the restriction of p to $\mathscr{E}_i$. Write $\bar\pi^{(i)}$ for the probability distribution on S that assigns probability $\pi_j^{(i)}$ to states $j \in \mathscr{E}_i$ and zero probabilities to states not in $\mathscr{E}_i$. Then it is easy to check that $\sum_{i=1}^{b} \alpha_i \bar\pi^{(i)}$ is an invariant initial distribution for p, for every b-tuple of nonnegative numbers $\alpha_1, \ldots, \alpha_b$ satisfying $\sum_{i=1}^{b} \alpha_i = 1$. The probabilistic interpretation of this is as follows. Suppose by a random mechanism the equivalence class $\mathscr{E}_i$ is chosen with probability $\alpha_i$ ($1 \le i \le b$). If $\mathscr{E}_i$ is the class chosen in this manner, then start the process with initial distribution $\bar\pi^{(i)}$ on S. The resulting Markov chain $\{X_n\}$ is a stationary process.

It remains to consider the case in which S has some inessential states. In the
next section it will be shown that if S is finite, then, no matter how the Markov
chain starts, after a finite (possibly random) time the process will move only
among essential states, so that its asymptotic behavior depends entirely on the
restriction of p to the class of essential states. The main results of this section
may be summarized as follows.

Theorem 7.1. Suppose S is finite.

(i) If p is a transition probability matrix such that there is only one essential
class $\mathscr{E}$ of (communicating) states and these states are aperiodic, then
there exists a unique invariant distribution $\pi$ such that $\sum_{j \in \mathscr{E}} \pi_j = 1$ and
(6.16) holds for all $i, j \in \mathscr{E}$.
(ii) If p is such that there is only one essential class of states $\mathscr{E}$ and it is
periodic with period d, then again there exists a unique invariant
distribution $\pi$ such that $\sum_{j \in \mathscr{E}} \pi_j = 1$ and (7.3), (7.4), (7.7) hold for $i, j \in \mathscr{E}$
and i; =
(iii) If there are b equivalence classes of essential states, then (i), (ii), as the
case may be, apply to each class separately, and every convex
combination of the invariant distributions of these classes (regarded as
state spaces in their own right) is an invariant distribution for p.
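
Part (iii) can be illustrated on a toy chain. The sketch below is a minimal check under assumed data: the three-state matrix, the two class-wise invariant distributions, and the helper names `step` and `mix` are hypothetical choices for this example. It verifies that a convex combination of the class-wise invariant distributions (each extended by zero) is unchanged by one step of the chain.

```python
def step(dist, p):
    """One step of the chain: the row vector dist multiplied by p."""
    n = len(p)
    return [sum(dist[k] * p[k][j] for k in range(n)) for j in range(n)]


# Hypothetical chain with two essential classes, {0, 1} and {2}.
p = [[0.50, 0.50, 0.0],
     [0.25, 0.75, 0.0],
     [0.00, 0.00, 1.0]]

pi_1 = [1 / 3, 2 / 3, 0.0]   # invariant on {0, 1}, extended by zero to S
pi_2 = [0.0, 0.0, 1.0]       # invariant on {2}, extended by zero to S


def mix(alpha):
    """Convex combination alpha * pi_1 + (1 - alpha) * pi_2."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(pi_1, pi_2)]


for alpha in (0.0, 0.3, 1.0):
    mixed = mix(alpha)
    print(alpha, [round(x, 6) for x in mixed], [round(x, 6) for x in step(mixed, p)])
```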



Let $\{X_n\}$ be a Markov chain with countable state space S and transition probability law $p = ((p_{ij}))$. As in the case of random walks, the frequency of returns to a state is an important feature of the evolution of the process.

Definition 8.1. A state j is said to be recurrent if

$$P_j(X_n = j \text{ i.o.}) = 1, \tag{8.1}$$

and transient if

$$P_j(X_n = j \text{ i.o.}) = 0. \tag{8.2}$$

Introduce the successive return times to the state j as

$$\tau_j^{(0)} = 0, \qquad \tau_j^{(1)} = \min\{n > 0 : X_n = j\}, \qquad \tau_j^{(r)} = \min\{n > \tau_j^{(r-1)} : X_n = j\} \quad (r = 1, 2, \ldots),$$

with the convention that $\tau_j^{(r)}$ is infinite if there is no $n > \tau_j^{(r-1)}$ for which $X_n = j$. Write

$$\rho_{ji} = P_j(X_n = i \text{ for some } n \ge 1) = P_j(\tau_i^{(1)} < \infty). \tag{8.4}$$

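Using the strong Markov property (Theorem 4.1) we get

$$\begin{aligned}
P_j(\tau_i^{(r)} < \infty) &= P_j\big(\tau_i^{(r-1)} < \infty \text{ and } X_{\tau_i^{(r-1)}+n} = i \text{ for some } n \ge 1\big) \\
&= E_j\big(1_{\{\tau_i^{(r-1)} < \infty\}}\, P_{X_{\tau_i^{(r-1)}}}(X_n = i \text{ for some } n \ge 1)\big) \\
&= E_j\big(1_{\{\tau_i^{(r-1)} < \infty\}}\, P_i(X_n = i \text{ for some } n \ge 1)\big) \\
&= E_j\big(1_{\{\tau_i^{(r-1)} < \infty\}}\, \rho_{ii}\big) = P_j(\tau_i^{(r-1)} < \infty)\,\rho_{ii}.
\end{aligned} \tag{8.5}$$

Therefore, by iteration,

$$P_j(\tau_i^{(r)} < \infty) = P_j(\tau_i^{(1)} < \infty)\,\rho_{ii}^{\,r-1} = \rho_{ji}\,\rho_{ii}^{\,r-1} \qquad (r = 2, 3, \ldots). \tag{8.6}$$

In particular, with $i = j$,

$$P_j(\tau_j^{(r)} < \infty) = \rho_{jj}^{\,r} \qquad (r = 1, 2, 3, \ldots). \tag{8.7}$$

Now

$$P_j(X_n = j \text{ for infinitely many } n) = P_j(\tau_j^{(r)} < \infty \text{ for all } r) = \lim_{r\to\infty} P_j(\tau_j^{(r)} < \infty) = \begin{cases} 1 & \text{if } \rho_{jj} = 1, \\ 0 & \text{if } \rho_{jj} < 1. \end{cases} \tag{8.8}$$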

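Further, write $N(j)$ for the number of visits to the state j by the Markov chain $\{X_n\}$, and denote its expected value by

$$G(i, j) = E_i N(j). \tag{8.9}$$

Now by (8.6) and summation by parts (Exercise 1), if $i \ne j$ then

$$E_i N(j) = \sum_{r=0}^{\infty} P_i(N(j) > r) = \sum_{r=0}^{\infty} P_i(\tau_j^{(r+1)} < \infty) = \rho_{ij} \sum_{r=0}^{\infty} \rho_{jj}^{\,r}, \tag{8.10}$$

so that, if $i \ne j$,

$$G(i, j) = \begin{cases} 0 & \text{if } i \not\to j, \text{ i.e., if } \rho_{ij} = 0, \\ \rho_{ij}/(1 - \rho_{jj}) & \text{if } i \to j \text{ and } \rho_{jj} < 1, \\ \infty & \text{if } i \to j \text{ and } \rho_{jj} = 1. \end{cases} \tag{8.11}$$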

This calculation provides two useful characterizations of recurrence; one is in

terms of the long-run expected number of returns and the other in terms of the
probability of eventual return.

Proposition 8.1
(a) Every state is either recurrent or transient. A state j is recurrent iff $\rho_{jj} = 1$ iff $G(j, j) = \infty$, and transient iff $\rho_{jj} < 1$ iff $G(j, j) < \infty$.
(b) If j is recurrent, $j \to i$, then i is recurrent, and $\rho_{ij} = \rho_{ji} = 1$. Thus, recurrence (or transience) is a class property. In particular, if all states communicate with each other, then either they are all recurrent, or they are all transient.
(c) Let j be recurrent, and $S(j) = \{i \in S : j \to i\}$ be the class of states which communicate with j. Let $\pi$ be a probability distribution on $S(j)$. Then

$$P_\pi(X_n \text{ visits every state in } S(j) \text{ i.o.}) = 1. \tag{8.12}$$

Proof. Part (a) follows from (8.8), (8.11). For part (b), suppose j is recurrent and $j \to i$ ($i \ne j$). Let $A_r$ denote the event that the Markov chain visits i between the rth and (r + 1)st visits to state j. Then under $P_j$, $A_r$ ($r \ge 0$) are independent events and have the same probability $\theta$, say. Now $\theta > 0$. For if $\theta = 0$, then $P_j(X_n = i \text{ for some } n \ge 1) = P_j(\bigcup_{r \ge 0} A_r) = 0$, contradicting $j \to i$. It now follows from the second half of the Borel-Cantelli Lemma (Chapter 0, Lemma 6.1) that $P_j(A_r \text{ i.o.}) = 1$. This implies $G(j, i) = \infty$. Interchanging i and j in (8.11) one then obtains $\rho_{ii} = 1$. Hence i is recurrent. Also, $\rho_{ji} \ge P_j(A_r \text{ i.o.}) = 1$. By the same argument, $\rho_{ij} = 1$.


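To prove part (c) use part (b) to get

$$P_\pi(X_n \text{ visits } i \text{ i.o.}) = \sum_{k \in S(j)} \pi_k P_k(X_n \text{ visits } i \text{ i.o.}) = \sum_{k \in S(j)} \pi_k = 1. \tag{8.13}$$

Hence

$$P_\pi\Big(\bigcap_{i \in S(j)} \{X_n \text{ visits } i \text{ i.o.}\}\Big) = 1. \tag{8.14}$$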


Note that $G(i, i) = 1/(1 - \rho_{ii})$, i.e., replace $\rho_{ij}$ by 1 in (8.10).
For the simple random walk with $p > \tfrac{1}{2}$ and $q = 1 - p$, one has (see (3.9), (3.10) of Chapter I),

$$G(i, j) = \begin{cases} (q/p)^{i-j}/(2p - 1) & \text{for } i > j, \\ 1/(2p - 1) & \text{for } i < j. \end{cases} \tag{8.15}$$

Proposition 8.1 shows that the difference between recurrence and transience is quite dramatic. If j is recurrent, then $P_j(N(j) = \infty) = 1$. If j is transient, not only is it true that $P_j(N(j) < \infty) = 1$, but also $E_j(N(j)) < \infty$. Note also that every inessential state is transient (Exercise 7).

Example 1. (Random Rounding). A real number $x \ge 0$ is truncated to its (greatest) integer part by the function $[x] := \max\{n \in \mathbb{Z} : n \le x\}$. On the other hand, $[x + 0.5]$ is the value of x rounded-off to the nearest whole number. While both operations are quite common in digital filters, electrical engineers have also found it useful to use random switchings between rounding and truncation in the hardware design of certain recursive digital multipliers. In random rounding, one applies the function $[x + u]$, where u is randomly selected from $[0, 1)$, to digitize. The objective in such designs is to remove spurious fixed points and limit cycles (theoretical complement 8.1).
For the underlying mathematical ideas, first consider the simple one-dimensional digital recursion (with deterministic round-off), $x_{n+1} = [a x_n + 0.5]$, $n = 0, 1, 2, \ldots$, where $|a| < 1$ is a fixed parameter. Note that $x_0 = 0$ is always a fixed point; however, there can be others in $\mathbb{Z}$ (depending on a). For example, if $a = \tfrac{1}{2}$, then $x_0 = 1$ is another. Let $U_1, U_2, \ldots$ be an i.i.d. sequence of random variables uniformly distributed on $[0, 1)$ and consider $X_{n+1} = [aX_n + U_{n+1}]$, $n \ge 0$, on the state space $\mathbb{Z}$. Again $X_0 = 0$ is a fixed point (absorbing state). However, for this case, as will now be shown, $X_n \to 0$ as $n \to \infty$ with probability 1. To see this, first note that 0 is an essential state and is accessible from any other state x. That 0 is accessible in the case $0 < a < 1$ goes as follows. For $x > 0$, $[ax + u] = [ax] \le ax < x$ if $u \in [0, 1 - (ax - [ax]))$. If $x < 0$, $ax \notin \mathbb{Z}$, then $|[ax + u]| = |[ax] + 1| < a|x|$ if $u \in (1 + ([ax] - ax), 1)$. In either case, these events have positive probability. If $ax \in \mathbb{Z}$, then $|[ax + u]| = |ax|$. Since 0 is absorbing, all states $x \ne 0$ must, therefore, be inessential, and thus transient. To obtain convergence, simply note that the state space of the process started at x is finite since, with probability 1, the process remains in the interval $[-|x|, |x|]$. The finiteness is critical for this argument, as one can readily see by considering for contrast the case of the simple asymmetric ($p > \tfrac{1}{2}$) random walk on the nonnegative integers with absorbing boundary at 0.
Consider now the k-dimensional problem $X_{n+1} = [AX_n + U_{n+1}]$, $n = 0, 1, 2, \ldots$, where A is a $k \times k$ real matrix and $\{U_n\}$ is an i.i.d. sequence of random vectors uniformly distributed over the k-dimensional cube $[0, 1)^k$, and $[\,\cdot\,]$ is defined componentwise, i.e., $[x] = ([x_1], \ldots, [x_k])$, $x \in \mathbb{R}^k$. It is convenient to use the norm $\|\cdot\|_0$ defined by $\|x\|_0 := \max\{|x_1|, \ldots, |x_k|\}$. The ball $B_r(0) := \{x : \|x\|_0 < r\}$ (of radius r centered at 0) for this norm is the square of side length 2r centered at 0. Assume $\|Ax\|_0 < \|x\|_0$ for all $x \ne 0$ (i.e., $\|A\| := \sup_{x \ne 0} \|Ax\|_0/\|x\|_0 < 1$) as our stability condition. Once again we wish to show that $X_n \to 0$ as $n \to \infty$ with probability 1. As in the one-dimensional case, 0 is an (absorbing) essential state that is accessible from every other state x because there is a subset $N(x)$ of $[0, 1)^k$ having positive volume such that for each $u \in N(x)$, one has $\|[Ax + u]\|_0 \le \|Ax\|_0 < \|x\|_0$. So each state $x \ne 0$ is inessential and thus transient. The result follows since the process starting at $x \in \mathbb{Z}^k$ does not escape the closure of $B_{\|x\|_0}(0)$: indeed $\|[Ax + u]\|_0 \le \|Ax\|_0 + 1 < \|x\|_0 + 1$ and, since $\|[Ax + u]\|_0$ and $\|x\|_0$ are both integers, $\|[Ax + u]\|_0 \le \|x\|_0$.
Linear models $X_{n+1} = AX_n + \varepsilon_{n+1}$, with $\{\varepsilon_n\}$ a sequence of i.i.d. random vectors, are systematically analyzed in Section 13.
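
The one-dimensional case of this example is easy to simulate. The following sketch is illustrative only: the function names, the parameter a = 0.5, the starting points, and the seed are assumptions chosen for the demonstration. It contrasts the deterministic recursion, which can get stuck at the spurious fixed point 1, with the randomly rounded recursion, which is absorbed at 0 while never leaving the interval $[-|x|, |x|]$.

```python
import math
import random


def random_rounding_path(a, x0, steps, rng):
    """Simulate X_{n+1} = floor(a * X_n + U_{n+1}) with U uniform on [0, 1)."""
    path = [x0]
    x = x0
    for _ in range(steps):
        x = math.floor(a * x + rng.random())
        path.append(x)
    return path


def deterministic_rounding(a, x0, steps):
    """Iterate x_{n+1} = floor(a * x_n + 0.5), i.e. ordinary round-off."""
    x = x0
    for _ in range(steps):
        x = math.floor(a * x + 0.5)
    return x


rng = random.Random(0)                        # fixed seed, for reproducibility only
path_pos = random_rounding_path(0.5, 40, 2000, rng)
path_neg = random_rounding_path(0.5, -40, 2000, rng)
stuck = deterministic_rounding(0.5, 1, 2000)  # stays at the spurious fixed point

print(path_pos[-1], path_neg[-1], stuck)
```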



The existence of an invariant distribution is intimately connected with the

limiting frequency of returns to recurrent states. For a process in steady state
one expects the equilibrium probability of a state j to coincide with the fraction
of time spent on the average by the process in state j. To this effect a major
goal of this section is to obtain the invariant distribution as a consequence of
a (strong) law of large numbers.
Assume from now on, unless otherwise specified, that S comprises a single class of (communicating) recurrent states under p for the Markov chain $\{X_n\}$. Let f be a real-valued function on S, and define the cumulative sums

$$S_n = \sum_{m=0}^{n} f(X_m) \qquad (n = 1, 2, \ldots). \tag{9.1}$$

For example, if $f(i) = 1$ for $i = j$ and $f(i) = 0$ for $i \ne j$, then $S_n/(n + 1)$ is the average number of visits to j in time 0 to n. As in Section 8, let $\tau_j^{(r)}$ denote the time of the rth visit to state j. Write the contribution to the sum $S_n$ from the rth block of time $(\tau_j^{(r)}, \tau_j^{(r+1)}]$ as

$$Z_r = \sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} f(X_m) \qquad (r = 0, 1, 2, \ldots). \tag{9.2}$$

By the strong Markov property, the conditional distribution of the process $\{X_{\tau_j^{(r)}}, X_{\tau_j^{(r)}+1}, \ldots, X_{\tau_j^{(r)}+n}, \ldots\}$, given the past up to time $\tau_j^{(r)}$, is $P_j$, which is the distribution of $\{X_0, X_1, \ldots, X_n, \ldots\}$ under $X_0 = j$. Hence, the conditional distribution of $Z_r$ given the process up to time $\tau_j^{(r)}$ is that of $Z_0 = f(X_1) + \cdots + f(X_{\tau_j^{(1)}})$, given $X_0 = j$. This conditional distribution does not change with the values of $X_0, X_1, \ldots, X_{\tau_j^{(r)}} (= j)$, $\tau_j^{(r)}$. Hence, $Z_r$ is independent of all events that are determined by the process up to time $\tau_j^{(r)}$. In particular, $Z_r$ is independent of $Z_1, \ldots, Z_{r-1}$. Thus, we have the following result.

Proposition 9.1. The sequence of random variables $\{Z_1, Z_2, \ldots\}$ is i.i.d., no matter what the initial distribution of $\{X_n : n = 0, 1, 2, \ldots\}$ may be.

This decomposition will be referred to as the renewal decomposition. The strong

law of large numbers now provides that, with probability 1,

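$$\lim_{r\to\infty} \frac{1}{r} \sum_{k=1}^{r} Z_k = EZ_1, \tag{9.3}$$

provided that $E|Z_1| < \infty$. In what follows we will make the stronger assumption

$$E \sum_{m=\tau_j^{(1)}+1}^{\tau_j^{(2)}} |f(X_m)| < \infty. \tag{9.4}$$

Write $N_n$ for the number of visits to state j by time n. Now,

$$N_n = \max\{r \ge 0 : \tau_j^{(r)} \le n\}. \tag{9.5}$$

Then (see Figure 9.1),

$$S_n = \sum_{m=0}^{\tau_j^{(1)}} f(X_m) + \sum_{r=1}^{N_n} Z_r - \sum_{m=n+1}^{\tau_j^{(N_n+1)}} f(X_m). \tag{9.6}$$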

For each sample path there are a finite number, $\tau_j^{(1)} + 1$, of summands in the first sum on the right side, except for a set of sample paths having probability zero. Therefore,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=0}^{\tau_j^{(1)}} f(X_m) = 0, \quad \text{with probability 1.} \tag{9.7}$$

[Figure 9.1: the time axis from 0 to n, split at the successive visit times to state j into the blocks that define $Z_0, Z_1, \ldots, Z_{N_n}$.]

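The last sum on the right side of (9.6) has at most $\tau_j^{(N_n+1)} - \tau_j^{(N_n)}$ summands, this number being the time between the last visit to j by time n and the next visit to j. Although this sum depends on n, under the condition (9.4) we still have that (Exercise 1)

$$\Big|\frac{1}{n} \sum_{m=n+1}^{\tau_j^{(N_n+1)}} f(X_m)\Big| \le \frac{1}{n} \sum_{m=\tau_j^{(N_n)}+1}^{\tau_j^{(N_n+1)}} |f(X_m)| \to 0 \quad \text{a.s. as } n \to \infty. \tag{9.8}$$

Therefore,

$$\frac{S_n}{n} = \frac{1}{n} \sum_{r=1}^{N_n} Z_r + R_n = \frac{N_n}{n} \cdot \frac{1}{N_n} \sum_{r=1}^{N_n} Z_r + R_n, \tag{9.9}$$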

where $R_n \to 0$ as $n \to \infty$ with probability 1 under (9.4). Also, for each sample path outside a set of probability 0, $N_n \to \infty$ as $n \to \infty$ and therefore by (9.3) (taking limit over a subsequence)

$$\lim_{n\to\infty} \frac{1}{N_n} \sum_{r=1}^{N_n} Z_r = EZ_1 \tag{9.10}$$

if (9.4) holds. Now, replacing f by the constant function $f \equiv 1$ in (9.10), we have

$$\lim_{n\to\infty} \frac{\tau_j^{(N_n+1)} - \tau_j^{(1)}}{N_n} = E\big(\tau_j^{(2)} - \tau_j^{(1)}\big), \tag{9.11}$$

assuming that the right side is finite. Since

$$0 \le n - \tau_j^{(N_n)} < \tau_j^{(N_n+1)} - \tau_j^{(N_n)},$$

which is negligible compared to $N_n$ as $n \to \infty$, one gets (Exercise 2)

$$\lim_{n\to\infty} \frac{n}{N_n} = E\big(\tau_j^{(2)} - \tau_j^{(1)}\big). \tag{9.12}$$

Note that the right side, $E(\tau_j^{(2)} - \tau_j^{(1)})$, is the average recurrence time of j ($= E_j \tau_j^{(1)}$), and the left side is the reciprocal of the asymptotic proportion of time spent at j.

Definition 9.1. A state j is positive recurrent if

$$E_j \tau_j^{(1)} < \infty. \tag{9.13}$$

Combining (9.9)-(9.11) gives the following result. Note that positive recurrence is a class property (see Theorem 9.4 and Exercise 4).

Theorem 9.2. Suppose j is a positive recurrent state under p and that f is a real-valued function on S such that

$$E_j\{|f(X_1)| + \cdots + |f(X_{\tau_j^{(1)}})|\} < \infty. \tag{9.14}$$

Then the following are true.

(a) With $P_j$-probability 1,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=0}^{n} f(X_m) = \frac{E_j\big(f(X_1) + \cdots + f(X_{\tau_j^{(1)}})\big)}{E_j \tau_j^{(1)}}. \tag{9.15}$$

(b) If S comprises a single class of essential states (irreducible) that are all positive recurrent, then (9.15) holds with probability 1 regardless of the initial distribution.

Corollary 9.3. If S consists of a single positive recurrent class under p, then for all $i, j \in S$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} p_{ij}^{(m)} = \frac{1}{E_j \tau_j^{(1)}}. \tag{9.16}$$

A WORD ON PROOFS. The calculations of limits in the following proofs require

the use of certain standard results from analysis such as Lebesgue's Dominated
Convergence, Fatou's Lemma, and Fubini's theorem. These results are given
in Chapter 0; however, the reader unfamiliar with these theorems may proceed
formally through the calculations to gain acquaintance with the statements.

Proof. Take f to be the indicator function of the state j,

$$f(j) = 1, \qquad f(k) = 0 \quad \text{for all } k \ne j. \tag{9.17}$$

Then $Z_r \equiv 1$ for all $r = 0, 1, 2, \ldots$ since there is only one visit to state j in $(\tau_j^{(r)}, \tau_j^{(r+1)}]$. Hence, taking expectation on both sides of (9.15) under the initial state i, one gets (9.16) after interchanging the order of the expectation and the limit. This is permissible by Lebesgue's Dominated Convergence Theorem since $|(n + 1)^{-1} \sum_{m=0}^{n} f(X_m)| \le 1$.

It will now be shown that the quantities defined by

$$\pi_j = (E_j \tau_j^{(1)})^{-1} \qquad (j \in S) \tag{9.18}$$

constitute an invariant distribution if the states are all positive recurrent and communicate with each other. Let us first show that $\sum_i \pi_i = 1$ in this case. For this, introduce the random variables

$$T_i^{(r)} = \#\{m \in (\tau_j^{(r)}, \tau_j^{(r+1)}] : X_m = i\} \qquad (r = 0, 1, 2, \ldots), \tag{9.19}$$

i.e., $T_i^{(r)}$ is the amount of time the Markov chain spends in the state i between the rth and (r + 1)th passages to j. The sequence $\{T_i^{(r)} : r = 1, 2, \ldots\}$ is i.i.d. by the strong Markov property. Write

$$\theta_j(i) = E_j T_i^{(1)}. \tag{9.20}$$

Then

$$\sum_{i \in S} \theta_j(i) = E_j \sum_{i \in S} T_i^{(1)} = E_j\big(\tau_j^{(2)} - \tau_j^{(1)}\big) = \frac{1}{\pi_j}. \tag{9.21}$$

Also, taking f to be the indicator function of $\{i\}$ and replacing j by i in Theorem 9.2, one obtains

$$\lim_{n\to\infty} \frac{\#\{m \le n : X_m = i\}}{n} = \pi_i \quad \text{with probability 1.} \tag{9.22}$$

On the other hand, by the strong law of large numbers applied to $\{T_i^{(r)} : r = 1, 2, \ldots\}$, and by (9.12) and (9.18), the limit on the left side also equals

$$\lim_{n\to\infty} \frac{\sum_{r=1}^{N_n} T_i^{(r)}}{n} = \lim_{n\to\infty} \frac{N_n}{n} \Big(\frac{1}{N_n} \sum_{r=1}^{N_n} T_i^{(r)}\Big) = \pi_j \theta_j(i). \tag{9.23}$$

Hence

$$\pi_i = \pi_j \theta_j(i). \tag{9.24}$$

Adding over i, one has using (9.21),

$$\sum_{i \in S} \pi_i = \pi_j \sum_{i \in S} \theta_j(i) = 1.$$
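By Scheffé's Theorem (Theorem 3.7 of Chapter 0) and (9.16),

$$\sum_{i \in S} \Big|\frac{1}{n} \sum_{m=1}^{n} p_{ji}^{(m)} - \pi_i\Big| \to 0 \quad \text{as } n \to \infty. \tag{9.26}$$

Hence

$$\Big|\sum_{i \in S} \pi_i p_{ij} - \sum_{i \in S} \frac{1}{n} \sum_{m=1}^{n} p_{ji}^{(m)} p_{ij}\Big| \le \sum_{i \in S} \Big|\pi_i - \frac{1}{n} \sum_{m=1}^{n} p_{ji}^{(m)}\Big| \to 0 \quad \text{as } n \to \infty.$$

Therefore,

$$\sum_{i \in S} \pi_i p_{ij} = \lim_{n\to\infty} \sum_{i \in S} \frac{1}{n} \sum_{m=1}^{n} p_{ji}^{(m)} p_{ij} = \lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} \sum_{i \in S} p_{ji}^{(m)} p_{ij} = \lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} p_{jj}^{(m+1)} = \pi_j. \tag{9.28}$$

In other words, $\pi$ is invariant.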

Finally, let us show that for a positive recurrent chain the probability distribution $\pi$ above is the only invariant distribution. For this let $\tilde\pi$ be another invariant distribution. Then

$$\tilde\pi_j = \sum_{i \in S} \tilde\pi_i p_{ij}^{(m)} \qquad (m = 1, 2, \ldots). \tag{9.29}$$

Averaging over $m = 1, 2, \ldots, n$, one gets

$$\tilde\pi_j = \sum_{i \in S} \tilde\pi_i \Big(\frac{1}{n} \sum_{m=1}^{n} p_{ij}^{(m)}\Big). \tag{9.30}$$

The right side converges to $\sum_i \tilde\pi_i \pi_j = \pi_j$ by Lebesgue's Dominated Convergence Theorem. Hence $\tilde\pi_j = \pi_j$.
We have proved above that if all states of S are positive recurrent and
communicate with each other, then there exists a unique invariant distribution
given by (9.18). The next problem is to determine what happens if S comprises
a single class of recurrent states that are not positive recurrent.
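
The identification of $\pi_j$ with the reciprocal of the mean return time in (9.18), and with the long-run fraction of time spent at j, can be checked by simulation. The sketch below is a minimal illustration under assumed data: the two-state chain with $p_{01} = a$, $p_{10} = b$, its invariant probability $b/(a+b)$ for state 0 (a standard fact, not taken from the text), the function name, and the seed are all choices made for this example.

```python
import random


def simulate_two_state(a, b, n, rng):
    """Simulate n steps of the chain on {0, 1} with p01 = a, p10 = b, X_0 = 0.

    Returns the fraction of the times 1..n spent in state 0 and the average
    of the observed return times to state 0.
    """
    x = 0
    visits = 0          # number of m in 1..n with X_m = 0
    last_visit = 0      # time of the most recent visit to 0
    gaps = []           # lengths of completed excursions away from 0
    for m in range(1, n + 1):
        u = rng.random()
        if x == 0:
            x = 1 if u < a else 0
        else:
            x = 0 if u < b else 1
        if x == 0:
            visits += 1
            gaps.append(m - last_visit)
            last_visit = m
    return visits / n, sum(gaps) / len(gaps)


a, b = 0.3, 0.6
pi_0 = b / (a + b)                      # invariant probability of state 0
frac, mean_return = simulate_two_state(a, b, 200_000, random.Random(1))
print(frac, pi_0, mean_return, 1 / pi_0)
```

Up to simulation error, the printed fraction should be close to $\pi_0$ and the average return time close to $1/\pi_0$.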

Definition 9.2. A recurrent state j is said to be null recurrent if

$$E_j \tau_j^{(1)} = \infty. \tag{9.31}$$

If j is a null recurrent state and $i \leftrightarrow j$, then for the Markov chain having initial state i the sequence $\{Z_r : r = 1, 2, \ldots\}$ defined by (9.2) with $f \equiv 1$ is still an i.i.d. sequence of random variables, but the common mean is infinity. It follows (Exercise 3) that, with $P_i$-probability 1,

$$\lim_{n\to\infty} \frac{\tau_j^{(N_n)}}{N_n} = \infty.$$

Since $n \ge \tau_j^{(N_n)}$ we have

$$\lim_{n\to\infty} \frac{n}{N_n} = \infty,$$

and, therefore,

$$\lim_{n\to\infty} \frac{N_n}{n + 1} = 0 \quad \text{with } P_i\text{-probability 1.} \tag{9.32}$$

Since $0 \le N_n/(n + 1) \le 1$ for all n, Lebesgue's Dominated Convergence Theorem applied to (9.32) yields

$$\lim_{n\to\infty} \frac{E_i N_n}{n + 1} = 0. \tag{9.33}$$

But

$$E_i N_n = E_i\Big(\sum_{m=1}^{n} 1_{\{X_m = j\}}\Big) = \sum_{m=1}^{n} p_{ij}^{(m)}, \tag{9.34}$$

and (9.33), (9.34) lead to

$$\lim_{n\to\infty} \frac{1}{n + 1} \sum_{m=1}^{n} p_{ij}^{(m)} = 0 \tag{9.35}$$

if j is null recurrent and $j \to i$ (or, equivalently, $j \leftrightarrow i$). In other words, (9.16) holds if S comprises either a single class of positive recurrent states, or a single class of null recurrent states.
In the case that j is transient, (8.11) implies that $G(i, j) < \infty$, i.e.,

$$\sum_{n=0}^{\infty} p_{ij}^{(n)} < \infty.$$

In particular, therefore,

$$\lim_{n\to\infty} p_{ij}^{(n)} = 0 \qquad (i \in S), \tag{9.36}$$

if j is a transient state.
The main results of this section may be summarized as follows.

Theorem 9.4. Assume that all states communicate with each other. Then one
has the following results.
(a) Either all states are recurrent, or all states are transient.
(b) If all states are recurrent, then they are either all positive recurrent, or
all null recurrent.
(c) There exists an invariant distribution if and only if all states are positive recurrent. Moreover, in the positive recurrent case, the invariant distribution $\pi$ is unique and is given by

$$\pi_j = (E_j \tau_j^{(1)})^{-1} \qquad (j \in S). \tag{9.37}$$

(d) In case the states are positive recurrent, no matter what the initial distribution $\mu$, if $E_\pi |f(X_1)| < \infty$, then

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} f(X_m) = \sum_{i \in S} \pi_i f(i) = E_\pi f(X_1) \tag{9.38}$$

with $P_\mu$-probability 1.

Proof. Part (a) follows from Proposition 8.1(a).

To prove part (b), assume that j is positive recurrent and let $i \ne j$. Since $T_i^{(r)} \le \tau_j^{(r+1)} - \tau_j^{(r)}$, one has $E\,T_i^{(r)} < \infty$. Hence (9.23) holds. Also, (9.22) holds with $\pi_i > 0$ if i is positive recurrent, and $\pi_i = 0$ if i is null recurrent (use (9.32) in the latter case). Thus (9.24) holds. Now $\theta_j(i) > 0$; for otherwise $T_i^{(r)} = 0$ with probability 1 for all $r \ge 0$, implying $\rho_{ji} = 0$, which is ruled out by the assumption. The relation (9.24) implies $\pi_i > 0$, since $\pi_j > 0$ and $\theta_j(i) > 0$. Therefore, all states are positive recurrent.
For part (c), it has been proved above that there exists a unique invariant probability distribution $\pi$ given by (9.37) if all states are positive recurrent. Conversely, suppose $\tilde\pi$ is an invariant probability distribution. We need to show that all states are positive recurrent. This is done by elimination of other possibilities. If the states are transient, then (9.36) holds; using this in (9.29) (or (9.30)) one would get $\tilde\pi_j = 0$ for all j, which is a contradiction. Similarly, null recurrence implies, by (9.30) and (9.35), that $\tilde\pi_j = 0$ for all j. Therefore, the states are all positive recurrent. Part (d) will follow from Theorem 9.2 if:
(i) The hypothesis (9.14) holds whenever

$$\sum_{i \in S} \pi_i |f(i)| < \infty, \tag{9.39}$$

and
(ii)

$$E_j Z_0 = E_j\big(f(X_1) + \cdots + f(X_{\tau_j^{(1)}})\big) = \frac{1}{\pi_j} \sum_{i \in S} \pi_i f(i). \tag{9.40}$$

To verify (i) and (ii), first note that by the definition of $T_i^{(r)}$,

$$\sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} |f(X_m)| = \sum_{i \in S} |f(i)|\, T_i^{(r)}. \tag{9.41}$$

Since $E\,T_i^{(r)} = \theta_j(i) = \pi_i/\pi_j$ (see Eq. 9.24), taking expectations in (9.41) yields

$$E\Big(\sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} |f(X_m)|\Big) = E\Big(\sum_{i \in S} |f(i)|\, T_i^{(r)}\Big) = \sum_{i \in S} |f(i)| \frac{\pi_i}{\pi_j}. \tag{9.42}$$

The last equality follows upon interchanging the orders of summation and expectation, which by Fubini's theorem is always permissible if the summands are nonnegative. Therefore (9.14) follows from (9.39). Now as in (9.42),

$$E_j Z_0 = EZ_1 = E \sum_{m=\tau_j^{(1)}+1}^{\tau_j^{(2)}} f(X_m) = E\Big(\sum_{i \in S} f(i)\, T_i^{(1)}\Big) = \sum_{i \in S} f(i)\, E\,T_i^{(1)} = \sum_{i \in S} f(i) \frac{\pi_i}{\pi_j}, \tag{9.43}$$

where this time the interchange of the orders of summation and expectation is justified again using Fubini's theorem by finiteness of the double "integral."

If the assumption that "all states communicate with each other" in Theorem 9.4 is dropped, then S can be decomposed into a set of inessential states and (disjoint) classes $S_1, S_2, \ldots, S_t$ of essential states. The transition probability matrix p may be restricted to each one of the classes $S_1, \ldots, S_t$ and the conclusions of Theorem 9.2 will hold individually for each class. If more than one of these classes is positive recurrent, then more than one invariant distribution exists, and they are supported on disjoint sets. Since any convex combination of invariant distributions is again invariant, an infinity of invariant distributions exists in this case. The following result takes care of the set of inessential states in this connection (also see Exercise 4).

Proposition 9.5
(a) If j is inessential then it is transient.
(b) Every invariant distribution assigns zero probability to inessential,
transient, and null recurrent states.

Proof. (a) If j is inessential then there exist $i \in S$ and $m \ge 1$ such that

$$p_{ji}^{(m)} > 0 \quad \text{and} \quad p_{ij}^{(n)} = 0 \quad \text{for all } n \ge 1. \tag{9.44}$$

Hence

$$P_j(N(j) < \infty) \ge P_j(X_m = i,\ X_n \ne j \text{ for } n > m) = p_{ji}^{(m)} P_i(X_n \ne j \text{ for } n > 0) = p_{ji}^{(m)} \cdot 1 > 0. \tag{9.45}$$

By Proposition 8.1, j is transient, since (9.45) says j is not recurrent.

(b) Next use (9.36), (9.35) and (9.30) and argue as in the proof of part (c) of Theorem 9.4 to conclude that $\pi_j = 0$ if j is either transient or null recurrent.

Corollary 9.6. If S is finite, then there exists at least one positive recurrent
state, and therefore at least one invariant distribution n. This invariant
distribution is unique if and only if all positive recurrent states communicate.

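Proof. Suppose it possible that all states are either transient or null recurrent. Then

$$\lim_{n\to\infty} \frac{1}{n + 1} \sum_{m=0}^{n} p_{ij}^{(m)} = 0 \quad \text{for all } i, j \in S. \tag{9.46}$$

Since $(n + 1)^{-1} \sum_{m=0}^{n} p_{ij}^{(m)} \le 1$ for all j, and there are only finitely many states j, by Lebesgue's Dominated Convergence Theorem,

$$\sum_{j \in S} \lim_{n\to\infty} \frac{1}{n + 1} \sum_{m=0}^{n} p_{ij}^{(m)} = \lim_{n\to\infty} \sum_{j \in S} \frac{1}{n + 1} \sum_{m=0}^{n} p_{ij}^{(m)} = \lim_{n\to\infty} \frac{1}{n + 1} \sum_{m=0}^{n} \sum_{j \in S} p_{ij}^{(m)} = \lim_{n\to\infty} \frac{1}{n + 1} \sum_{m=0}^{n} 1 = \lim_{n\to\infty} \frac{n + 1}{n + 1} = 1. \tag{9.47}$$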

But the first term in (9.47) is zero by (9.46). We have reached a contradiction.
Thus, there exists at least one positive recurrent state. The rest follows from
Theorem 9.2 and the remark following its proof.


The same method as used in Section 9 to obtain the law of large numbers may be used to derive a central limit theorem for $S_n = \sum_{m=0}^{n} f(X_m)$, where f is a real-valued function on the state space S. Write

$$\mu = E_\pi f(X_0) = \sum_{i \in S} \pi_i f(i), \tag{10.1}$$

and assume that, for $Z_r = \sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} f(X_m)$, $r = 0, 1, 2, \ldots$,

$$E_j(Z_0 - E_j Z_0)^2 < \infty. \tag{10.2}$$

Now replace f by $\bar f = f - \mu$, and write

$$\bar S_n = \sum_{m=0}^{n} \bar f(X_m) = \sum_{m=0}^{n} (f(X_m) - \mu), \qquad \bar Z_r = \sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} \bar f(X_m) \quad (r = 0, 1, 2, \ldots).$$

Then by (9.40),

$$E_j \bar Z_r = (E_j \tau_j^{(1)})\, E_\pi \bar f(X_0) = 0 \qquad (r = 0, 1, 2, \ldots). \tag{10.4}$$

Thus $\{\bar Z_r : r = 1, 2, \ldots\}$ is an i.i.d. sequence with mean zero and finite variance

$$\sigma^2 = E_j \bar Z_0^2. \tag{10.5}$$

Now apply the classical central limit theorem to this sequence. As $r \to \infty$, $(1/\sqrt{r}) \sum_{k=1}^{r} \bar Z_k$ converges in distribution to the Gaussian law with mean zero and variance $\sigma^2$. Now express $\bar S_n$ as in (9.6) with f replaced by $\bar f$, $S_n$ by $\bar S_n$, and $Z_r$ by $\bar Z_r$, to see that the limiting distribution of $(1/\sqrt{n}) \bar S_n$ is the same as that of (Exercise 1)

$$\frac{1}{\sqrt{n}} \sum_{r=1}^{N_n} \bar Z_r = \Big(\frac{N_n}{n}\Big)^{1/2} \frac{1}{\sqrt{N_n}} \sum_{r=1}^{N_n} \bar Z_r. \tag{10.6}$$

We shall need an extension of the central limit theorem that applies to sums
of random numbers of i.i.d. random variables. We can get such a result as an
extension of Corollary 7.2 in Chapter 0 as follows.

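Proposition 10.1. Let $\{X_j : j \ge 1\}$ be i.i.d., $EX_j = 0$, $0 < \sigma^2 := EX_j^2 < \infty$. Let $\{\nu_n : n \ge 1\}$ be a sequence of nonnegative integer-valued random variables with

$$\lim_{n\to\infty} \frac{\nu_n}{n} = \alpha \quad \text{in probability} \tag{10.7}$$

for some constant $\alpha > 0$. Then $\sum_{j=1}^{\nu_n} X_j/\sqrt{\nu_n}$ converges in distribution to $N(0, \sigma^2)$.

Proof. Without loss of generality, let $\sigma = 1$. Write $S_n := X_1 + \cdots + X_n$. Choose $\varepsilon > 0$ arbitrarily. Then,

$$P\big(|S_{\nu_n} - S_{[n\alpha]}| \ge \varepsilon([n\alpha])^{1/2}\big) \le P\big(|\nu_n - [n\alpha]| \ge \varepsilon^3 [n\alpha]\big) + P\Big(\max_{\{m : |m - [n\alpha]| < \varepsilon^3 [n\alpha]\}} |S_m - S_{[n\alpha]}| \ge \varepsilon([n\alpha])^{1/2}\Big).$$

The first term on the right goes to zero as $n \to \infty$, by (10.7). The second term is estimated by Kolmogorov's Maximal Inequality (Chapter I, Corollary 13.7), as being no more than

$$\big(\varepsilon([n\alpha])^{1/2}\big)^{-2} (\varepsilon^3 [n\alpha]) = \varepsilon. \tag{10.9}$$

This shows that

$$\frac{S_{\nu_n} - S_{[n\alpha]}}{([n\alpha])^{1/2}} \to 0 \quad \text{in probability.} \tag{10.10}$$

Since $S_{[n\alpha]}/([n\alpha])^{1/2}$ converges in distribution to $N(0, 1)$, it follows from (10.10) that so does $S_{\nu_n}/([n\alpha])^{1/2}$. The desired convergence now follows from (10.7). ∎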
By Proposition 10.1, $N_n^{-1/2} \sum_{r=1}^{N_n} \bar Z_r$ is asymptotically Gaussian with mean zero and variance $\sigma^2$. Since $N_n/n$ converges to $(E_j \tau_j^{(1)})^{-1}$, it follows that the expression in (10.6) is asymptotically Gaussian with mean zero and variance $(E_j \tau_j^{(1)})^{-1} \sigma^2$. This is then the asymptotic distribution of $n^{-1/2} \bar S_n$. Moreover, defining, as in Chapter I,

$$W_n(t) = \frac{\bar S_{[nt]}}{\sqrt{n + 1}}, \qquad \tilde W_n(t) = W_n(t) + (nt - [nt]) \frac{\bar f(X_{[nt]+1})}{\sqrt{n + 1}} \qquad (t \ge 0),$$

all the finite-dimensional distributions of $\{W_n(t)\}$, as well as $\{\tilde W_n(t)\}$, converge in distribution to those of Brownian motion with zero drift and diffusion coefficient

$$D = (E_j \tau_j^{(1)})^{-1} \sigma^2 \tag{10.12}$$

(Exercise 2). In fact, convergence of the full distribution may also be obtained
by consideration of the above renewal argument. The precise form of the
functional central limit theorem (FCLT) for Markov chains goes as follows (see
theoretical complement 1).

Theorem 10.2. (FCLT). If S is a positive recurrent class of states and if (10.2) holds then, as $n \to \infty$, $W_n(1) = (n + 1)^{-1/2} \bar S_n$ converges in distribution to a Gaussian law with mean zero and variance D given by (10.12) and (10.5). Moreover, the stochastic process $\{W_n(t)\}$ (or $\{\tilde W_n(t)\}$) converges in distribution to Brownian motion with zero drift and diffusion coefficient D.

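Rather than using (10.12), a more convenient way to compute D is sometimes the following. Write $E_\pi$, $\mathrm{Var}_\pi$, $\mathrm{Cov}_\pi$ for expectation, variance, and covariance, respectively, when the initial distribution is $\pi$ (the unique invariant distribution). The following computation is straightforward. First write, for any two functions h, g that are square summable with respect to $\pi$,

$$\langle h, g \rangle_\pi = \sum_{i \in S} h(i) g(i) \pi_i = E_\pi[h(X_0) g(X_0)]. \tag{10.13}$$

Then, writing $(p^k \bar f)(i) = \sum_j p_{ij}^{(k)} \bar f(j)$,

$$\begin{aligned}
\mathrm{Var}_\pi\big((n + 1)^{-1/2} \bar S_n\big) &= E_\pi\Big[\sum_{m=0}^{n} \bar f(X_m)\Big]^2 \Big/ (n + 1) \\
&= E_\pi\Big[\sum_{m=0}^{n} \bar f^2(X_m) + 2 \sum_{m=1}^{n} \sum_{m'=0}^{m-1} \bar f(X_{m'}) \bar f(X_m)\Big] \Big/ (n + 1) \\
&= \frac{1}{n + 1} \sum_{m=0}^{n} E_\pi \bar f^2(X_m) + \frac{2}{n + 1} \sum_{m=1}^{n} \sum_{m'=0}^{m-1} E_\pi\big[\bar f(X_{m'})\, E_\pi(\bar f(X_m) \mid X_{m'})\big] \\
&= E_\pi \bar f^2(X_0) + \frac{2}{n + 1} \sum_{m=1}^{n} \sum_{m'=0}^{m-1} E_\pi\big[\bar f(X_{m'}) (p^{m-m'} \bar f)(X_{m'})\big] \\
&= E_\pi \bar f^2(X_0) + \frac{2}{n + 1} \sum_{m=1}^{n} \sum_{m'=0}^{m-1} E_\pi\big[\bar f(X_0) (p^{m-m'} \bar f)(X_0)\big] \\
&= E_\pi \bar f^2(X_0) + \frac{2}{n + 1} \sum_{m=1}^{n} \sum_{m'=0}^{m-1} \langle \bar f, p^{m-m'} \bar f \rangle_\pi \\
&= E_\pi \bar f^2(X_0) + \frac{2}{n + 1} \sum_{m=1}^{n} \sum_{k=1}^{m} \langle \bar f, p^{k} \bar f \rangle_\pi \qquad (k = m - m').
\end{aligned} \tag{10.14}$$

Now assume that the limit

$$\gamma = \lim_{m\to\infty} \sum_{k=1}^{m} \langle \bar f, p^k \bar f \rangle_\pi \tag{10.15}$$

exists and is finite. Then it follows from (10.14) that

$$D = \lim_{n\to\infty} \mathrm{Var}_\pi\big((n + 1)^{-1/2} \bar S_n\big) = E_\pi \bar f^2(X_0) + 2\gamma = \sum_{j \in S} \pi_j \bar f^2(j) + 2\gamma. \tag{10.16}$$

Note that

$$\langle \bar f, p^k \bar f \rangle_\pi = \mathrm{Cov}_\pi\{f(X_0), f(X_k)\}, \tag{10.17}$$

and

$$\sum_{k=1}^{m} \langle \bar f, p^k \bar f \rangle_\pi = \sum_{k=1}^{m} \mathrm{Cov}_\pi\{f(X_0), f(X_k)\}.$$

The condition (10.15) that $\gamma$ exists and is finite is the condition that the correlation decays to zero at a sufficiently rapid rate for time points k units apart as $k \to \infty$.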
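
Formula (10.16) lends itself to direct computation on a finite chain. The sketch below is illustrative and rests on assumptions made for the example: a two-state chain with $p_{01} = a$, $p_{10} = b$, the function f taken as the indicator of state 1, the helper names, and the truncation of the series for $\gamma$ after a fixed number of lags. For this particular chain $p^k \bar f = \lambda^k \bar f$ with $\lambda = 1 - a - b$, so the series is geometric and $D = \pi_0 \pi_1 (1 + \lambda)/(1 - \lambda)$; this closed form is a side calculation for the example, not a formula from the text.

```python
def mat_vec(p, v):
    """Apply the transition matrix to a function: (p v)(i) = sum_j p[i][j] * v[j]."""
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in p]


def diffusion_coefficient(p, pi, f, terms):
    """Approximate D = E_pi fbar^2 + 2 * gamma, truncating gamma after `terms` lags."""
    mu = sum(w * x for w, x in zip(pi, f))
    fbar = [x - mu for x in f]
    variance = sum(w * x * x for w, x in zip(pi, fbar))
    gamma = 0.0
    pk_fbar = fbar[:]
    for _ in range(terms):
        pk_fbar = mat_vec(p, pk_fbar)          # now holds p^k fbar
        gamma += sum(w * x * y for w, x, y in zip(pi, fbar, pk_fbar))
    return variance + 2.0 * gamma


a, b = 0.3, 0.6
p = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]                # invariant distribution of this chain
f = [0.0, 1.0]                                 # indicator of state 1

d_numeric = diffusion_coefficient(p, pi, f, 200)
lam = 1 - a - b
d_closed = pi[0] * pi[1] * (1 + lam) / (1 - lam)
print(d_numeric, d_closed)
```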


Suppose that p is a transition probability matrix for a Markov chain $\{X_n\}$ starting in state $i \in S$. Suppose that $j \in S$ is an absorbing state of p, i.e., $p_{jj} = 1$, $p_{jk} = 0$ for $k \ne j$. Let $\tau_j$ denote the time required to reach j,

$$\tau_j = \inf\{n : X_n = j\}. \tag{11.1}$$

To calculate the distribution of $\tau_j$, consider that

$$P_i(\tau_j > m) = {\sum}^{*}\, p_{i i_1} p_{i_1 i_2} \cdots p_{i_{m-1} i_m}, \tag{11.2}$$

where $\sum^{*}$ denotes summation over all m-tuples $(i_1, i_2, \ldots, i_m)$ of elements from $S \setminus \{j\}$. Now let $p^0$ denote the matrix obtained by deleting the jth row and jth column from p,

$$p^0 = ((p_{ik} : i, k \in S \setminus \{j\})). \tag{11.3}$$

Then, by definition of matrix multiplication, the calculation (11.2) may be expressed as

$$P_i(\tau_j > m) = \sum_k (p^0)_{ik}^{(m)}, \tag{11.4}$$

and, therefore,

$$P_i(\tau_j = m) = \sum_k (p^0)_{ik}^{(m-1)} - \sum_k (p^0)_{ik}^{(m)}. \tag{11.5}$$

Observe that the above idea can be applied to the calculation of the first passage time to any state $j \in S$ or, for that matter, to any nonempty set A of states such that $i \notin A$. Moreover, it follows from Theorem 6.4 and its corollaries that the rate of absorption is, therefore, furnished by the spectral radius of $p^0$. This will be amply demonstrated in examples of this section.

Proposition 11.1. Let p be a transition probability matrix for a Markov chain $\{X_n\}$ starting in state i. Let A be a nonempty subset of S, $i \notin A$. Let

$$\tau_A = \inf\{n \ge 0 : X_n \in A\}. \tag{11.6}$$

Then

$$P_i(\tau_A \le m) = 1 - \sum_k (p^0)_{ik}^{(m)}, \qquad m = 1, 2, \ldots, \tag{11.7}$$

where $p^0$ is the matrix obtained by deleting the rows and columns of p corresponding to the states in A.

In general, the matrix $p^0$ is not a proper transition probability matrix since the row sums may be strictly less than 1 upon the removal of certain columns from p. However, if each of the rows in p corresponding to states $j \in A$ is replaced by $e_j$ having 1 in the jth place and 0 elsewhere, then the resulting matrix $\hat p$, say, is a transition probability matrix and

$$P_i(\tau_A \le m) = \sum_{k \in A} \hat p_{ik}^{(m)}. \tag{11.8}$$

The reason (11.8) holds is that up to the first passage time $\tau_A$ the distributions of Markov chains having transition probability matrices p and $\hat p$ (starting at i) are the same. In particular,

$$\hat p_{ik}^{(m)} = (p^0)_{ik}^{(m)} \quad \text{for } i, k \notin A. \tag{11.9}$$
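
The two descriptions of the first passage distribution, (11.7) via the deleted matrix and (11.8) via the matrix with absorbing rows, can be compared numerically. The sketch below is illustrative: the five-state symmetric walk with reflection at the ends, the target set A = {0, 4}, and all function and variable names are assumptions made for this example.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(p, m):
    """m-th power of a square matrix (m >= 0)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, p)
    return result


# Hypothetical chain: symmetric walk on {0, ..., 4}, reflected at both ends.
p = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
A = [0, 4]                                        # target set
keep = [s for s in range(len(p)) if s not in A]   # states outside A

# Deleted matrix: drop the rows and columns of the states in A.
p_deleted = [[p[i][k] for k in keep] for i in keep]
# Modified matrix: replace each row belonging to A by a unit vector.
p_hat = [[1.0 if k == i else 0.0 for k in range(len(p))] if i in A else p[i][:]
         for i in range(len(p))]


def cdf_from_deleted(i, m):
    """P_i(tau_A <= m) as in (11.7): one minus a row sum of the m-th power."""
    return 1.0 - sum(mat_power(p_deleted, m)[keep.index(i)])


def cdf_from_absorbing(i, m):
    """P_i(tau_A <= m) as in (11.8): mass of the m-step law of p_hat on A."""
    row = mat_power(p_hat, m)[i]
    return sum(row[k] for k in A)


for m in range(1, 6):
    print(m, cdf_from_deleted(2, m), cdf_from_absorbing(2, m))
```

Note that the rows of p for the states in A play no role in either computation, as the text explains.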

In the case that there is more than one state in $A$, an important problem is
to determine the distribution of $X_{\tau_A}$, starting from $i \notin A$. Of course, if
$P_i(\tau_A < \infty) < 1$ then $X_{\tau_A}$ is a defective random variable under $P_i$, being defined
on the set $\{\tau_A < \infty\}$ of $P_i$-probability less than 1. Write

$$a_j(i) := P_i(\{\tau_A < \infty,\ X_{\tau_A} = j\}) \qquad (j \in A,\ i \in S). \tag{11.10}$$

By the Markov property (conditioning on $X_1$ in (11.10)),

$$a_j(i) = \sum_{k} \tilde p_{ik}\, a_j(k) \qquad (j \in A,\ i \in S). \tag{11.11}$$

Denoting by $a_j$ the vector $(a_j(i) : i \in S)$, viewed as a column vector, one may
express (11.11) as

$$a_j = \tilde p\, a_j \qquad (j \in A). \tag{11.12}$$

Alternatively, (11.11) or (11.12) may be replaced by

$$a_j(i) = \sum_{k} p_{ik}\, a_j(k) \quad \text{for } i \notin A, \qquad
a_j(i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \in A, \text{ but } i \ne j \end{cases} \qquad (j \in A). \tag{11.13}$$

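A function (or vector) $a = (a(i) : i \in S)$ is said to be p-harmonic on $B\ (\subset S)$ if

$$a(i) = (pa)(i) \qquad \text{for } i \in B. \tag{11.14}$$

Hence $a_j$ is p-harmonic on $A^c$ and has the (boundary-)values on $A$,

$$a_j(i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \in A,\ i \ne j. \end{cases} \tag{11.15}$$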

We have thus proved part (a) of the following proposition.


Proposition 11.2. Let $p$ be a transition probability matrix and $A$ a nonempty
subset of $S$.
(a) Then the distribution of $X_{\tau_A}$, as defined by (11.10), satisfies (11.13).
(b) This is the unique bounded solution if and only if

$$P_i(\tau_A < \infty) = 1 \qquad \text{for all } i \in S. \tag{11.16}$$

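Proof. (b) Let $i \in A^c$. Then

$$\sum_{k \in A^c} \tilde p_{ik}^{(n)} = P_i(\tau_A > n) \downarrow P_i(\tau_A = \infty) \qquad \text{as } n \uparrow \infty. \tag{11.17}$$

Hence, if (11.16) holds, then

$$\lim_{n \to \infty} \tilde p_{ik}^{(n)} = 0 \qquad \text{for all } i, k \in A^c. \tag{11.18}$$

On the other hand, for $i \in A^c$ and $k \in A$,

$$\tilde p_{ik}^{(n)} = P_i(\tau_A \le n,\ X_{\tau_A} = k) \uparrow P_i(\tau_A < \infty,\ X_{\tau_A} = k) = a_k(i), \tag{11.19}$$

and $\tilde p_{ik}^{(n)} = \delta_{ik}$ for all $n$, if $i \in A$, $k \in S$.

Now let $a$ be another solution of (11.13), besides $a_j$. Then $a$ satisfies (11.12),
which on iteration yields $a = \tilde p^{\,n} a$. Taking the limit as $n \uparrow \infty$, and using (11.18),
(11.19), one gets

$$a(i) = \lim_{n \to \infty} \sum_{k} \tilde p_{ik}^{(n)} a(k) = \sum_{k \in A} a_k(i)\, a(k) = a_j(i) \tag{11.20}$$

for all $i \in A^c$, since $a(k) = 0$ for $k \in A \setminus \{j\}$ and $a(j) = 1$. Hence $a_j$ is the unique
solution of (11.13).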
Conversely, if $P_i(\tau_A < \infty) < 1$ for some $i$ in $A^c$, then the function
$h = (h(i) : i \in S)$ defined by

$$h(i) := 1 - P_i(\tau_A < \infty) = P_i(\tau_A = \infty) \qquad (i \in S), \tag{11.21}$$

may be shown to be p-harmonic in $A^c$ with (boundary-)value zero on $A$. The
harmonic property is a consequence of the Markov property (Exercise 5),

$$h(i) = P_i(\tau_A = \infty) = \sum_{k} p_{ik} P_k(\tau_A = \infty) = \sum_{k} p_{ik}\, h(k) \qquad (i \in A^c). \tag{11.22}$$

Since $P_i(\tau_A = 0) = 1$ for $i \in A$, $h(i) = 0$ for $i \in A$. It follows that both $a_j$ and
$a_j + h$ satisfy (11.13). Since $h \not\equiv 0$, the solution of (11.13) is not unique.

Example 1. (A Random Replication Model). The simple Wright-Fisher model
originated in genetics as a model for the evolution of gene frequencies. Here
the mathematical model will be described in a somewhat different physical
context. Consider a collection of $2N$ individuals, each one of which is either in
favor of or against some issue. Let $X_n$ denote the number of individuals in favor
of the issue at time $n = 0, 1, 2, \ldots$. In the evolution, each one of the individuals
will randomly re-decide his or her position under the influence of the current
overall opinion as follows. Let $\theta_n = X_n/2N$ denote the proportion in favor of
the issue at time $n$. Then given $X_0, X_1, \ldots, X_n$, each of the $2N$ individuals,
independently of the choices of the others, elects to be in favor with probability
$\theta_n$, or against the issue with probability $1 - \theta_n$. That is,

$$P(X_{n+1} = k \mid X_0, X_1, \ldots, X_n) = \binom{2N}{k} \theta_n^{k} (1 - \theta_n)^{2N-k} \tag{11.23}$$

for $k = 0, 1, \ldots, 2N$. So $\{X_n\}$ is a Markov chain with state space
$S = \{0, 1, \ldots, 2N\}$ and one-step transition matrix $p = ((p_{ij}))$, where

$$p_{ij} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}, \qquad i, j = 0, 1, \ldots, 2N. \tag{11.24}$$

Notice that $\{X_n\}$ is an aperiodic Markov chain. The "boundary" states $\{0\}$
and $\{2N\}$ form closed classes of essential states. The set of states
$\{1, 2, \ldots, 2N-1\}$ constitutes an inessential class. The model has a special
conservation property in the form of the following martingale property,

$$E(X_{n+1} \mid X_0, X_1, \ldots, X_n) = E(X_{n+1} \mid X_n) = (2N)\theta_n = X_n, \tag{11.25}$$

for $n = 0, 1, 2, \ldots$. In particular, therefore,

$$E X_{n+1} = E\{E(X_{n+1} \mid X_0, X_1, \ldots, X_n)\} = E X_n, \tag{11.26}$$

for $n = 0, 1, 2, \ldots$. However, since $S$ is finite we know that in the long run
$\{X_n\}$ is certain to be absorbed in state 0 or $2N$, i.e., the population is certain
to eventually come to a unanimous opinion, be it pro or con. It is of interest to
calculate the absorption probabilities as well as the rate of absorption. Here,
with $A = \{0, 2N\}$, one has $\tilde p = p$.
Let $a_j(i)$ denote the probability of ultimate absorption at $j = 0$ or at $j = 2N$
starting from state $i \in S$. Then,

$$a_{2N}(i) = \sum_{k} p_{ik}\, a_{2N}(k) \quad \text{for } 0 < i < 2N, \qquad a_{2N}(2N) = 1, \quad a_{2N}(0) = 0, \tag{11.27}$$

and $a_0(i) = 1 - a_{2N}(i)$. In view of (11.26) and (11.19) we have

$$i = E_i X_0 = E_i X_n = \sum_{k} k\, p_{ik}^{(n)} \to 0 \cdot a_0(i) + 2N a_{2N}(i) = 2N a_{2N}(i)$$

for $i = 1, \ldots, 2N-1$. Therefore,

$$a_{2N}(i) = \frac{i}{2N}, \qquad i = 0, 1, 2, \ldots, 2N, \tag{11.28}$$

$$a_0(i) = \frac{2N - i}{2N}, \qquad i = 0, 1, 2, \ldots, 2N. \tag{11.29}$$

Check also that (11.29) satisfies (11.27).
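The absorption probabilities can be verified exactly for a small population. The sketch below assumes a hypothetical population size $2N = 6$ and uses exact rational arithmetic; the function names are illustrative only and not part of the text.

```python
from fractions import Fraction
from math import comb

def wright_fisher(two_n):
    """Transition matrix (11.24) with exact rational entries."""
    p = []
    for i in range(two_n + 1):
        theta = Fraction(i, two_n)
        p.append([comb(two_n, j) * theta**j * (1 - theta)**(two_n - j)
                  for j in range(two_n + 1)])
    return p

def absorption_at_top(two_n):
    """Candidate solution (11.28): a_{2N}(i) = i / 2N."""
    return [Fraction(i, two_n) for i in range(two_n + 1)]

def satisfies_11_27(p, a):
    """Check the interior equations and boundary values of (11.27)."""
    two_n = len(p) - 1
    interior_ok = all(
        a[i] == sum(p[i][k] * a[k] for k in range(two_n + 1))
        for i in range(1, two_n))
    return interior_ok and a[two_n] == 1 and a[0] == 0

p6 = wright_fisher(6)  # hypothetical choice 2N = 6
print(satisfies_11_27(p6, absorption_at_top(6)))
```

Because the interior equations in (11.27) for $a_{2N}(i) = i/2N$ reduce to $\sum_k k\,p_{ik} = i$, this check is the one-step martingale property (11.25) in disguise.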

In order to estimate the rate at which fixation of opinion occurs, we shall
calculate the eigenvalues of $\tilde p$ ($= p$ here).
Let $v = (v_0, \ldots, v_{2N})'$ and consider the eigenvalue problem

$$\sum_{j=0}^{2N} p_{ij} v_j = \lambda v_i, \qquad i = 0, 1, \ldots, 2N. \tag{11.30}$$

The $r$th factorial moment of the binomial distribution $(p_{ij} : 0 \le j \le 2N)$ is

$$\sum_{j=0}^{2N} j(j-1)\cdots(j-r+1)\, p_{ij}
= \left(\frac{i}{2N}\right)^{r} (2N)(2N-1)\cdots(2N-r+1) \sum_{j=r}^{2N} \binom{2N-r}{j-r} \left(\frac{i}{2N}\right)^{j-r} \left(1 - \frac{i}{2N}\right)^{2N-j}
= \left(\frac{i}{2N}\right)^{r} (2N)(2N-1)\cdots(2N-r+1), \tag{11.31}$$

for $r = 1, 2, \ldots, 2N$. Equation (11.31) contains a transformation between
"factorial powers" and "ordinary powers" that deserves to be examined for
connections with (11.30). The "factorial powers" $(j)_r := j(j-1)\cdots(j-r+1)$
are simply polynomials in the "ordinary powers" and can be expressed as

$$j(j-1)\cdots(j-r+1) = \sum_{k=1}^{r} s_k^{(r)} j^k. \tag{11.32}$$

Likewise, "ordinary powers" $j^r$ can be expressed as polynomials in the "factorial
powers" as

$$j^r = \sum_{k=0}^{r} S_k^{(r)} (j)_k, \tag{11.33}$$

with the convention $(j)_0 = 1$. Note that $S_r^{(r)} = 1$ for all $r \ge 0$. The coefficients
$\{s_k^{(r)}\}$, $\{S_k^{(r)}\}$ are commonly referred to as Stirling coefficients of the first and second
kinds, respectively.
Now every vector $v = (v_0, \ldots, v_{2N})'$ may be represented as the successive
values of a unique (factorial) polynomial of degree $2N$ evaluated at $0, 1, \ldots, 2N$
(Exercise 7), i.e.,

$$v_j = \sum_{r=0}^{2N} a_r (j)_r \qquad \text{for } j = 0, 1, \ldots, 2N. \tag{11.34}$$

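According to (11.24), (11.32),

$$\sum_{j=0}^{2N} p_{ij} v_j
= \sum_{r=0}^{2N} a_r \sum_{j=0}^{2N} p_{ij} (j)_r
= \sum_{r=0}^{2N} a_r \frac{(2N)_r}{(2N)^r}\, i^r
= \sum_{r=0}^{2N} a_r \frac{(2N)_r}{(2N)^r} \sum_{n=0}^{r} S_n^{(r)} (i)_n
= \sum_{n=0}^{2N} \left( \sum_{r=n}^{2N} a_r \frac{(2N)_r}{(2N)^r} S_n^{(r)} \right) (i)_n. \tag{11.35}$$

It is now clear that (11.30) holds if and only if

$$\sum_{r=n}^{2N} a_r \frac{(2N)_r}{(2N)^r} S_n^{(r)} = \lambda a_n, \qquad n = 0, 1, \ldots, 2N. \tag{11.36}$$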

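In particular, taking $n = 2N$ and noting $S_{2N}^{(2N)} = 1$, we see that (11.36) holds if

$$\lambda = \lambda_{2N} := \frac{(2N)_{2N}}{(2N)^{2N}}, \qquad a_{2N} = a_{2N}^{(2N)} := 1, \tag{11.37}$$

and $a_r = a_r^{(2N)}$, $r = 2N-1, \ldots, 0$, are solved recursively from (11.36). Next
take $a_{2N}^{(2N-1)} = 0$, $a_{2N-1}^{(2N-1)} = 1$ and solve for
$\lambda = \lambda_{2N-1} = (2N)_{2N-1}/(2N)^{2N-1}$, and so on. In this way one obtains

$$\lambda_r = \frac{(2N)_r}{(2N)^r} = \left(1 - \frac{1}{2N}\right) \cdots \left(1 - \frac{r-1}{2N}\right), \qquad 0 \le r \le 2N. \tag{11.38}$$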

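Notice that $\lambda_0 = \lambda_1 = 1$. The next largest eigenvalue is $\lambda_2 = 1 - (1/2N)$. Let
$V = ((v_{ij}))$ be the matrix whose columns are the eigenvectors obtained above.
Writing $D = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{2N})$, this means

$$pV = VD, \tag{11.39}$$

or

$$p = VDV^{-1}, \tag{11.40}$$

so that

$$p^m = VD^mV^{-1}. \tag{11.41}$$

Therefore, writing $V^{-1} = ((v^{ij}))$, we have from (11.8)

$$P_i(\tau_{\{0,2N\}} > m) = \sum_{j=1}^{2N-1} p_{ij}^{(m)} = \sum_{j=1}^{2N-1} \sum_{k=0}^{2N} \lambda_k^m v_{ik} v^{kj} = \sum_{k=0}^{2N} \left( \sum_{j=1}^{2N-1} v_{ik} v^{kj} \right) \lambda_k^m. \tag{11.42}$$

Since the left side of (11.42) must go to zero as $m \to \infty$, the coefficients of
$\lambda_0^m \equiv 1$ and $\lambda_1^m \equiv 1$ must be zero. Thus,

$$P_i(\tau_{\{0,2N\}} > m) = \sum_{k=2}^{2N} \left( \sum_{j=1}^{2N-1} v_{ik} v^{kj} \right) \lambda_k^m
= \lambda_2^m \left[ \sum_{j=1}^{2N-1} v_{i2} v^{2j} + \sum_{k=3}^{2N} \left( \sum_{j=1}^{2N-1} v_{ik} v^{kj} \right) \left( \frac{\lambda_k}{\lambda_2} \right)^m \right]
\sim (\text{const.})\, \lambda_2^m \quad \text{for large } m. \tag{11.43}$$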
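The eigenvalue formula (11.38) and the decay rate in (11.43) can be spot-checked for a small population. This is a sketch under the assumption $2N = 6$ (a hypothetical size); it verifies the factorial-moment identity (11.31) in the equivalent form $\sum_j (j)_r\, p_{ij} = \lambda_r i^r$, and it uses the observation (not stated in the text, but easily checked from the binomial variance) that $v_i = i(2N - i)$ is an eigenvector for $\lambda_2$.

```python
from fractions import Fraction
from math import comb

TWO_N = 6  # hypothetical population size 2N

def p_entry(i, j):
    """Entry p_ij of the Wright-Fisher matrix (11.24), as an exact fraction."""
    theta = Fraction(i, TWO_N)
    return comb(TWO_N, j) * theta**j * (1 - theta)**(TWO_N - j)

P = [[p_entry(i, j) for j in range(TWO_N + 1)] for i in range(TWO_N + 1)]

def falling(x, r):
    """Factorial power (x)_r = x(x-1)...(x-r+1), with (x)_0 = 1."""
    out = 1
    for s in range(r):
        out *= x - s
    return out

def eigenvalue(r):
    """lambda_r = (2N)_r / (2N)^r, as in (11.38)."""
    return Fraction(falling(TWO_N, r), TWO_N**r)

def factorial_moment(i, r):
    """r-th factorial moment of row i of P, the left side of (11.31)."""
    return sum(falling(j, r) * P[i][j] for j in range(TWO_N + 1))

def tail(i, m):
    """P_i(tau > m): mass remaining on the interior states after m steps."""
    row = [1.0 if k == i else 0.0 for k in range(TWO_N + 1)]
    for _ in range(m):
        row = [sum(row[k] * float(P[k][j]) for k in range(TWO_N + 1))
               for j in range(TWO_N + 1)]
    return sum(row[1:TWO_N])

# The ratio of successive tails should approach lambda_2 = 1 - 1/(2N).
print(tail(3, 61) / tail(3, 60), float(eigenvalue(2)))
```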

Example 2. (Bienaymé-Galton-Watson Simple Branching Process). A simple
branching process was introduced as Example 3.7. The state $X_n$ of the process
at time $n$ represents the total number of offspring in the $n$th generation of a
population that evolves by i.i.d. random replications of parent individuals. If
the offspring distribution is denoted by $f$ then, as explained in Section 3, the
one-step transition probabilities are given by

$$p_{ij} = \begin{cases} f^{*i}(j) & \text{if } i \ge 1,\ j \ge 0, \\ 1 & \text{if } i = 0,\ j = 0, \\ 0 & \text{if } i = 0,\ j \ne 0. \end{cases} \tag{11.44}$$

According to the values of $p_{ij}$ at the boundary, the state $i = 0$ is absorbing
(permanent extinction). Write $\rho_{i0}$ for the probability that eventually extinction
occurs given $X_0 = i$. Also write $\rho = \rho_{10}$. Then $\rho_{i0} = \rho^i$, since each of the $i$
sequences of generations arising from the $i$ initial particles has the same chance
$\rho$ of extinction, and the $i$ sequences evolving independently must all be extinct
in order that there may be eventual extinction, given $X_0 = i$.
If $f(0) = 0$, then $\rho = \rho_{10} = 0$ and extinction is impossible. If $f(0) = 1$, then
$\rho_{i0} = \rho^i = 1$ and extinction is certain (no matter what $X_0$ is). To avoid these
and other trivialities, we assume (unless otherwise specified)

$$0 < f(0) < 1, \qquad f(0) + f(1) < 1. \tag{11.45}$$

Introduce the probability generating function of $f$:

$$\phi(z) = \sum_{j=0}^{\infty} f(j) z^j = f(0) + \sum_{j=1}^{\infty} f(j) z^j \qquad (|z| \le 1). \tag{11.46}$$

Since a power series can be differentiated term by term within its radius of
convergence, one has

$$\phi'(z) = \frac{d}{dz}\phi(z) = \sum_{j=1}^{\infty} j f(j) z^{j-1} \qquad (|z| < 1). \tag{11.47}$$

If the mean $\mu$ of the number of particles generated by a single particle is finite,
i.e., if

$$\mu := \sum_{j=1}^{\infty} j f(j) < \infty, \tag{11.48}$$

then (11.47) holds even for the left-hand derivative at $z = 1$, i.e.,

$$\mu = \phi'(1). \tag{11.49}$$

Since $\phi'(z) > 0$ for $0 < z < 1$, $\phi$ is strictly increasing. Also, since $\phi''(z)$ (which
exists and is finite for $0 \le z < 1$) satisfies

$$\phi''(z) = \frac{d^2}{dz^2}\phi(z) = \sum_{j=2}^{\infty} j(j-1) f(j) z^{j-2} > 0 \qquad \text{for } 0 < z < 1, \tag{11.50}$$

the function $\phi$ is strictly convex on $[0, 1]$. In other words, the line segment
joining any two points on the curve $y = \phi(z)$ lies strictly above the curve (except
at the two points joined). Because $\phi(0) = f(0) > 0$ and $\phi(1) = \sum_j f(j) = 1$,
the graph of $\phi$ looks like that of Figure 11.1 (curve a or b).
The maximum of $\phi'(z)$ is $\mu$, which is attained at $z = 1$. Hence, in the case
$\mu > 1$, the graph of $y = \phi(z)$ must lie below that of $y = z$ near $z = 1$ and, because
$\phi(0) = f(0) > 0$, must cross the line $y = z$ at a point $z_0$, $0 < z_0 < 1$. Since the
slope of the curve $y = \phi(z)$ continuously increases as $z$ increases in $(0, 1)$, $z_0$ is
the unique solution of the equation $z = \phi(z)$ that is smaller than 1.


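[Figure 11.1: graphs of $y = \phi(z)$ (curves a and b) against the line $y = z$ for $0 \le z \le 1$, showing the intercept $\phi(0) = f(0)$ and the crossing point $z_0$.]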

In case $\mu \le 1$, $y = \phi(z)$ must lie strictly above the line $y = z$, except at $z = 1$.
For if it meets the line $y = z$ at a point $z_0 < 1$, then it must go under the line
in the immediate vicinity to the right of $z_0$, since its slope falls below that of
the line (i.e., unity). In order to reach the height $\phi(1) = 1$ (also reached by the
line at the same value $z = 1$) its slope then must exceed 1 somewhere in $(z_0, 1]$;
this is impossible since $\phi'(z) < \phi'(1) = \mu \le 1$ for all $z$ in $[0, 1)$. Thus, the only
solution of the equation $z = \phi(z)$ is $z = 1$.
Now observe

$$\rho = \rho_{10} = \sum_{j=0}^{\infty} P(X_1 = j \mid X_0 = 1)\, \rho_{j0} = \sum_{j=0}^{\infty} f(j) \rho^j = \phi(\rho), \tag{11.51}$$

thus if $\mu \le 1$, then $\rho = 1$ and extinction is certain. On the other hand, suppose
$\mu > 1$. Then $\rho$ is either $z_0$ or 1. We shall now show that $\rho = z_0$ ($< 1$). For this,
consider the quantities

$$q_n := P(X_n = 0 \mid X_0 = 1) = p_{10}^{(n)} \qquad (n = 1, 2, \ldots). \tag{11.52}$$

That is, $q_n$ is the probability that the sequence of generations originating from
a single particle is extinct at time $n$. As $n$ increases, $q_n \uparrow \rho$; for clearly,
$\{X_n = 0\} \subset \{X_m = 0\}$ for all $m \ge n$, so that $q_n \le q_m$ if $n < m$. Also

$$\lim_{n \to \infty} P(X_n = 0 \mid X_0 = 1) = P\Big(\bigcup_{n} \{X_n = 0\} \,\Big|\, X_0 = 1\Big) = P(\text{extinction occurs} \mid X_0 = 1) = \rho.$$

Now, by independence of the generations originating from different particles,

$$P(X_n = 0 \mid X_0 = j) = q_n^{\,j} \qquad (j = 0, 1, 2, \ldots),$$

and, therefore,

$$\begin{aligned}
q_{n+1} &= P(X_{n+1} = 0 \mid X_0 = 1) = P(X_1 = 0 \mid X_0 = 1) + \sum_{j=1}^{\infty} P(X_1 = j,\ X_{n+1} = 0 \mid X_0 = 1) \\
&= f(0) + \sum_{j=1}^{\infty} P(X_1 = j \mid X_0 = 1)\, P(X_{n+1} = 0 \mid X_0 = 1,\ X_1 = j) \\
&= f(0) + \sum_{j=1}^{\infty} f(j)\, q_n^{\,j} = \phi(q_n) \qquad (n = 1, 2, \ldots).
\end{aligned} \tag{11.53}$$

Since $q_1 = f(0) = \phi(0) < \phi(z_0) = z_0$ (recall that $\phi(z)$ is strictly increasing in $z$
for $0 < z < 1$), one has using (11.53) with $n = 1$, $q_2 = \phi(q_1) < \phi(z_0) = z_0$, and
so on. Hence, $q_n < z_0$ for all $n$. Therefore, $\rho = \lim_{n \to \infty} q_n \le z_0$. This proves
$\rho = z_0$.
If $f(0) + f(1) = 1$ and $0 < f(0) < 1$, then $\phi''(z) = 0$ for all $z$, and the graph
of $\phi(z)$ is the line segment joining $(0, f(0))$ and $(1, 1)$. Hence, $\rho = 1$ in this case.
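The recursion (11.53) doubles as a numerical method for the extinction probability. A minimal sketch follows, assuming offspring distributions with finite support given as lists `[f(0), f(1), ...]`; the two example distributions and the function names are hypothetical choices made for illustration.

```python
def pgf(f, z):
    """Probability generating function phi(z) = sum_j f(j) z^j, cf. (11.46)."""
    return sum(prob * z**j for j, prob in enumerate(f))

def mean_offspring(f):
    """Mean mu = sum_j j f(j), cf. (11.48)."""
    return sum(j * prob for j, prob in enumerate(f))

def extinction_probability(f, iterations=500):
    """Iterate q_{n+1} = phi(q_n) from q_0 = 0, so that q_1 = f(0) as in (11.53)."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(f, q)
    return q

# Supercritical example: mu = 1.25, and z = phi(z) has the roots 1/2 and 1.
f_super = [0.25, 0.25, 0.50]
# Subcritical example: mu = 0.75, so extinction is certain.
f_sub = [0.50, 0.25, 0.25]

print(mean_offspring(f_super), extinction_probability(f_super))
print(mean_offspring(f_sub), extinction_probability(f_sub))
```

The iteration converges geometrically when $\mu \ne 1$; in the critical case $\mu = 1$ the convergence of $q_n$ to 1 is much slower, so a fixed iteration count is only a rough device there.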
Let us now compute the average size of the $n$th generation. One has

$$\begin{aligned}
E(X_{n+1} \mid X_0 = 1) &= \sum_{k=1}^{\infty} k\, p_{1k}^{(n+1)} = \sum_{k=1}^{\infty} k \Big( \sum_{j=1}^{\infty} p_{1j}^{(n)} p_{jk} \Big) = \sum_{j=1}^{\infty} p_{1j}^{(n)} \sum_{k=1}^{\infty} k\, p_{jk} \\
&= \sum_{j=1}^{\infty} p_{1j}^{(n)} E(X_1 \mid X_0 = j) = \sum_{j=1}^{\infty} p_{1j}^{(n)}\, j\, E(X_1 \mid X_0 = 1) \\
&= \sum_{j=1}^{\infty} p_{1j}^{(n)}\, j \mu = \mu \sum_{j=1}^{\infty} j\, p_{1j}^{(n)} = \mu E(X_n \mid X_0 = 1).
\end{aligned} \tag{11.54}$$

Continuing in this manner, one obtains

$$E(X_{n+1} \mid X_0 = 1) = \mu E(X_n \mid X_0 = 1) = \mu^2 E(X_{n-1} \mid X_0 = 1) = \cdots = \mu^n E(X_1 \mid X_0 = 1) = \mu^{n+1}. \tag{11.55}$$

It follows that

$$E(X_n \mid X_0 = j) = j \mu^n. \tag{11.56}$$

Thus, in the case $\mu < 1$, the expected size of the population at time $n$ decreases
to zero exponentially fast as $n \to \infty$. If $\mu = 1$, then the expected size at time $n$
does not depend on $n$ (i.e., it is the same as the initial size). If $\mu > 1$, then the
expected size of the population increases to infinity exponentially fast.


The notion of a Gibbs distribution has its origins in statistical physics as a

probability distribution with respect to which bulk thermodynamic properties
of materials in equilibrium can be expressed as expected values (called phase
averages). The thrust of Gibbs' idea is that a theoretically convenient way in
which to view materials at the microscopic scale is as a large system composed
of randomly distributed but interacting components, such as the positions and
momenta of molecules comprising the material.
To arrive at an appropriate probability distribution for computing large-scale
averages, Gibbs argued that the probability of a given configuration of
component values in equilibrium should be inversely proportional to an
exponential function of the total energy of the configuration. In this way the
lowest-energy configurations (ground states) are the most likely (modes) to
occur. Moreover, the additivity of the combined total energy of two
noninteracting systems is reflected in the multiplication rule for (exponential)
probabilities of independent values for such a specification. It is a tribute to
the genius of Gibbs that this approach leads to the correct thermodynamics at
the bulk material scale. Perhaps because of the great success these ideas have
enjoyed in physics, Gibbs' probability distributions have also been introduced
to represent systems having a large number of interacting components in a wide
variety of other contexts, ranging from both genetic and automata codes to
economic and sociological systems.
For purposes of orientation one may think of random values $\{X_n : n \in \Lambda\}$
(states) from a finite set $S$ distributed over a finite set $\Lambda$ of sites. The set of
possible configurations is then represented by the cartesian product
$\Omega := \{\omega = (\sigma_n : n \in \Lambda) \mid n \to \sigma_n \in S,\ n \in \Lambda, \text{ is a function on } \Lambda\}$. For $\Lambda$ and $S$
finite, $\Omega$ is also a finite set and a (free-boundary) Gibbs distribution on $\Omega$ can
be described by a probability mass function (p.m.f.) of the form

$$P(X_n = \sigma_n,\ n \in \Lambda) := Z^{-1} \exp\{-\beta U(\omega)\}, \qquad \omega = (\sigma_n) \in \Omega, \tag{12.1}$$

where $U(\omega)$ is a real-valued function on $\Omega$, referred to as the total potential energy
of the configuration $\omega \in \Omega$, $\beta$ is a positive parameter called inverse temperature, and

$$Z := \sum_{\omega \in \Omega} \exp\{-\beta U(\omega)\} \tag{12.2}$$

is the normalization constant, referred to as the partition function.

Observe that if $P$ is any probability distribution on a finite set $\Omega$ that assigns
strictly positive probability $p(\omega)$ to each $\omega \in \Omega$, then $P$ can trivially be expressed
in the form (12.1) with $U(\omega) = -\beta^{-1} \log\{p(\omega)\}$. In physics, however, one
regards the total potential energy as a sum of energies at individual sites plus
the sum of interaction energies between pairs of sites plus the sum of energies
between triples, etc., and the probability distribution is specified for various
types of such interactions. For example, if $U(\omega) = \sum_{n \in \Lambda} \varphi_1(\sigma_n)$ for
$\omega = (\sigma_n : n \in \Lambda)$ is a sum of single-site energies, higher-order interactions being
zero, then the distribution (12.1) is that of independent components; similarly
for the case of infinite temperature ($\beta = 0$).
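For a small finite system the Gibbs p.m.f. (12.1) and the partition function (12.2) can be computed by brute-force enumeration. The sketch below assumes $S = \{0, 1\}$ and three sites; both energy functions are hypothetical examples chosen for illustration, not potentials taken from the text.

```python
from itertools import product
from math import exp

def gibbs_pmf(states, sites, energy, beta):
    """Enumerate Omega = S^sites and normalize exp(-beta * U), cf. (12.1)-(12.2)."""
    configs = list(product(states, repeat=sites))
    weights = [exp(-beta * energy(w)) for w in configs]
    z = sum(weights)  # partition function Z
    return {w: wt / z for w, wt in zip(configs, weights)}

def single_site_energy(w):
    """Hypothetical single-site potential phi_1(s) = s, summed over sites."""
    return float(sum(w))

def nearest_neighbor_energy(w):
    """Hypothetical pair potential: -1 for equal neighbors, +1 otherwise."""
    return sum(-1.0 if w[k] == w[k + 1] else 1.0 for k in range(len(w) - 1))

beta = 0.7
pmf_indep = gibbs_pmf((0, 1), 3, single_site_energy, beta)
pmf_pair = gibbs_pmf((0, 1), 3, nearest_neighbor_energy, beta)

# With single-site energies only, the p.m.f. factorizes into site marginals.
m1 = exp(-beta) / (1.0 + exp(-beta))
m0 = 1.0 - m1
print(pmf_indep[(1, 0, 1)], m1 * m0 * m1)
```

With the pair potential the lowest-energy configurations (all sites equal) receive the largest probability, in line with the remark that ground states are the modes of the distribution.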
It is both natural and quite important to consider the problem of extending
the above formulation to the so-called infinite volume setting in which A is a
countably infinite set of sites. We will consider this problem here for
one-dimensional systems on A = Z consisting of interacting components taking
values in a finite set S and for which the nonzero interaction energies contributing
to U are at most between pairs of nearest-neighbor (i.e., adjacent) integer sites.
Let $\Omega = S^{\mathbb{Z}}$ for a finite set $S$ and let $\mathcal{F}$ be the sigma-field generated by
finite-dimensional cylinder sets of the form

$$C = \{\omega = (\sigma_n) \in \Omega : \sigma_k = s_k \text{ for } k \in \Lambda\}, \tag{12.3}$$

where $\Lambda$ is a finite subset of $\mathbb{Z}$ and $s_k \in S$ is specified for each $k \in \Lambda$. Although
it would be enough to consistently prescribe the probabilities for such events
$C$, except for the independent case it is not quite obvious how to do this starting
with energy considerations. First consider then the independent case. For
single-site energies $\varphi_1(s, n)$, $s \in S$, at the respective sites $n \in \mathbb{Z}$, the probabilities
of cylinder events can be specified according to the formula

$$P(C) = Z_\Lambda^{-1} \exp\Big\{ -\beta \sum_{k \in \Lambda} \varphi_1(s_k, k) \Big\}, \tag{12.4}$$

where $Z_\Lambda$ appropriately normalizes $P$ for each set of sites $\Lambda$. In the homogeneous
(or translation-invariant) case we have $\varphi_1(s, n) = \varphi_1(s)$, $s \in S$, depending on the
site $n$ only through the component value $s$ at $n$.
Now suppose that we have, in addition to single-site energies $\varphi_1(s)$, pairwise
nearest-neighbor interactions represented by $\varphi_2(s, n; t, m)$ for $|n - m| = 1$, $s, t \in S$.
For translation invariance of interactions, take $\varphi_2(s, n; t, m) = \varphi_2(s, t)$ if
$|n - m| = 1$ and 0 otherwise. Now because values inside of $\Lambda$ should be
dependent on values across boundaries of $\Lambda$ it is less straightforward to
consistently write down expressions for $P(C)$ than in (12.4). In fact, in this
connection it is somewhat more natural to consider a (local) specification of
conditional probabilities of finite-dimensional events of the form $C$, given
information about a configuration outside of $\Lambda$. Take $\Lambda = \{n\}$ and consider a
specification of the conditional probability of $C = \{X_n = s\}$ given an event of
the form $\{X_k = s_k \text{ for } k \in D \setminus \{n\}\}$, where $D$ is a finite set of sites that includes
$n$ and the two neighboring sites $n - 1$ and $n + 1$, as follows:

$$P(X_n = s \mid X_k = s_k,\ k \in D \setminus \{n\}) = Z_{n,D}^{-1} \exp\{ -\beta [\varphi_1(s) + \varphi_2(s, s_{n-1}) + \varphi_2(s, s_{n+1})] \}, \tag{12.5}$$

where

$$Z_{n,D} = \sum_{s \in S} \exp\{ -\beta [\varphi_1(s) + \varphi_2(s, s_{n-1}) + \varphi_2(s, s_{n+1})] \}. \tag{12.6}$$

That is, the state at $n$ depends on the given states at sites in $D \setminus \{n\}$ only through
the neighboring values at $n - 1$, $n + 1$. One would like to know that there is
a probability distribution having these conditional probabilities. For the
one-dimensional case at hand, we have the following basic result.

Theorem 12.1. Let $S$ be an arbitrary finite set, $\Omega = S^{\mathbb{Z}}$, and let $\mathcal{F}$ be the
sigma-field of subsets of $\Omega$ generated by finite-dimensional cylinder sets. Let
$\{X_n\}$ be the coordinate projections on $\Omega$. Suppose that $P$ is a probability measure
on $\mathcal{F}$ with the following properties.
(i) $P(C) > 0$ for every finite-dimensional cylinder set $C \in \mathcal{F}$.
(ii) For arbitrary $n \in \mathbb{Z}$ let $\partial(n) = \{n - 1, n + 1\}$ denote the boundary of $\{n\}$
in $\mathbb{Z}$. If $f$ is any $S$-valued function defined on a finite subset $D_n$ of $\mathbb{Z}$ that
contains $\{n\} \cup \partial(n)$ and if $a \in S$ is arbitrary, then the conditional
probability

$$P(X_n = a \mid X_m = f(m),\ m \in D_n \setminus \{n\})$$

depends only on the values of $f$ on $\partial(n)$.
(iii) The value of the conditional probability in (ii) is invariant under
translation by any amount $\theta \in \mathbb{Z}$.
Then $P$ is the distribution of a (unique) stationary (i.e., translation-invariant)
Markov chain having strictly positive transition probabilities and, conversely,
every stationary Markov chain with strictly positive transition probabilities
satisfies (i), (ii), and (iii).

Proof. Let

$$g_{b,c}(a) = P(X_n = a \mid X_{n-1} = b,\ X_{n+1} = c). \tag{12.7}$$

The family of conditional probabilities $\{g_{b,c}(a)\}$ is referred to as the local structure
of the probability distribution. We will prove the converse statement first and
along the way we will calculate the local structure of $P$ in terms of the transition
probabilities. So assume that $P$ is a stationary Markov chain with strictly
positive transition matrix $((p(a, b) : a, b \in S))$ and marginal distribution
$(\pi(a) : a \in S)$. Consider cylinder set probabilities given by

$$P(X_n = a_0,\ X_{n+1} = a_1, \ldots, X_{n+h} = a_h) = \pi(a_0)\, p(a_0, a_1) \cdots p(a_{h-1}, a_h). \tag{12.8}$$

So, in particular, the condition (i) is satisfied; also see (2.9), (2.6), Prop. 6.1.
For $m$ and $n \ge 1$ it is a straightforward computation to verify

$$\begin{aligned}
P(X_0 = a \mid X_{-m} = b_{-m}, \ldots, X_{-1} = b_{-1},\ X_1 = b_1, \ldots, X_n = b_n)
&= P(X_0 = a \mid X_{-1} = b_{-1},\ X_1 = b_1) \\
&= \frac{p(b_{-1}, a)\, p(a, b_1)}{p^{(2)}(b_{-1}, b_1)} = g_{b_{-1}, b_1}(a).
\end{aligned} \tag{12.9}$$

Therefore, the condition (ii) holds for $P$; condition (iii) also holds because
$P$ is the distribution of a stationary Markov chain. Next suppose that $P$ is a
probability distribution satisfying (i), (ii), and (iii). We must show that $P$ is the
distribution of a stationary Markov chain. Fix an arbitrary element of $S$, denoted
as 0, say. Let the local structure of $P$ be as defined in (12.7). Observe that for
each $b, c \in S$, $g_{b,c}(\cdot)$ is a probability measure on $S$. Outlined in Exercise 1 are
the steps required to show that

$$g_{b,c}(a) = \frac{q(b, a)\, q(a, c)}{q^{(2)}(b, c)}, \tag{12.10}$$

where

$$q(b, c) = \frac{g_{0,c}(b)}{g_{0,c}(0)}, \tag{12.11}$$

and

$$q^{(n+1)}(b, c) = \sum_{a \in S} q^{(n)}(b, a)\, q(a, c). \tag{12.12}$$

An induction argument, as outlined in Exercise 2, can be used to check that

$$P(X_{h+k} = a_k,\ 1 \le k \le r \mid X_{h+1-n} = a,\ X_{h+r+n} = b)
= \frac{q^{(n)}(a, a_1)\, q(a_1, a_2) \cdots q(a_{r-1}, a_r)\, q^{(n)}(a_r, b)}{q^{(r+2n-1)}(a, b)} \tag{12.13}$$

for $h \in \mathbb{Z}$, $a_i, a, b \in S$, $n, r \ge 1$. We would like to conclude that $P$ coincides
with the (uniquely determined) stationary Markov chain having transition
probabilities $((\hat q(b, c)))$ defined by normalizing $((q(b, c)))$ to a transition
probability matrix according to the special formula

$$\hat q(b, c) = \frac{q(b, c)\, \mu(c)}{\sum_{a} q(b, a)\, \mu(a)} = \frac{q(b, c)\, \mu(c)}{\lambda \mu(b)}, \tag{12.14}$$

where $\mu = (\mu(a))$ is a positive eigenvector of $q$ corresponding to the largest (in
magnitude) eigenvalue $\lambda = \lambda_{\max}$ of $q$. Note that $\hat q$ is a strictly positive transition
probability matrix with unique invariant distribution $\pi$, say, and such that
(Exercise 3)

$$\hat q^{(n)}(b, c) = \frac{q^{(n)}(b, c)\, \mu(c)}{\lambda^n \mu(b)}. \tag{12.15}$$
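Let $Q$ be the probability distribution of this Markov chain. It is enough to
show that $P$ and $Q$ agree on the cylinder events. Using (12.13) we have for any
$n \ge 1$,

$$\begin{aligned}
P(X_0 = a_0, \ldots, X_r = a_r)
&= \sum_{a \in S} \sum_{b \in S} P(X_{-n} = a,\ X_{n+r} = b)\, P(X_0 = a_0, \ldots, X_r = a_r \mid X_{-n} = a,\ X_{n+r} = b) \\
&= \sum_{a \in S} \sum_{b \in S} P(X_{-n} = a,\ X_{n+r} = b)\, \frac{q^{(n)}(a, a_0)\, q(a_0, a_1) \cdots q(a_{r-1}, a_r)\, q^{(n)}(a_r, b)}{q^{(r+2n)}(a, b)} \\
&= \sum_{a \in S} \sum_{b \in S} P(X_{-n} = a,\ X_{n+r} = b)\, \frac{\hat q^{(n)}(a, a_0)\, \hat q(a_0, a_1) \cdots \hat q(a_{r-1}, a_r)\, \hat q^{(n)}(a_r, b)}{\hat q^{(r+2n)}(a, b)}.
\end{aligned} \tag{12.16}$$

Now let $n \to \infty$ to get by the fundamental convergence result of Proposition
6.1 (Exercise 4),

$$P(X_0 = a_0, \ldots, X_r = a_r) = \pi(a_0)\, \hat q(a_0, a_1) \cdots \hat q(a_{r-1}, a_r) = Q(X_0 = a_0, \ldots, X_r = a_r). \tag{12.17}$$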
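The converse half of the proof, namely that a stationary Markov chain has the nearest-neighbor local structure (12.9), can be confirmed by direct computation on a two-state chain. The transition matrix and invariant distribution below are hypothetical illustrative values, and the function names are not from the text.

```python
from itertools import product

# Hypothetical strictly positive transition matrix on S = {0, 1}
# together with its invariant distribution (pi P = pi).
P = [[0.7, 0.3],
     [0.4, 0.6]]
PI = [4.0 / 7.0, 3.0 / 7.0]

def cylinder(path):
    """Probability of a block of consecutive values, as in (12.8)."""
    prob = PI[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

def conditional_center(left, a, right):
    """P(X_0 = a | values at the sites in `left` and `right`), via (12.8)."""
    numerator = cylinder(left + (a,) + right)
    denominator = sum(cylinder(left + (s,) + right) for s in range(len(P)))
    return numerator / denominator

def local_structure(b, a, c):
    """g_{b,c}(a) = p(b,a) p(a,c) / p^(2)(b,c), the right side of (12.9)."""
    p2 = sum(P[b][s] * P[s][c] for s in range(len(P)))
    return P[b][a] * P[a][c] / p2

# Conditioning on two sites on each side gives the same answer as
# conditioning on the two nearest neighbors alone.
for left in product(range(2), repeat=2):
    for right in product(range(2), repeat=2):
        print(left, right, conditional_center(left, 0, right),
              local_structure(left[-1], 0, right[0]))
```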

Probability distributions on $S^{\mathbb{Z}}$ that satisfy (i)-(iii) of Theorem 12.1 are also
referred to as one-dimensional Markov random fields (MRF). This definition
has a natural extension to probability distributions on $S^{\mathbb{Z}^d}$ called the
d-dimensional Markov random field. For that matter, the important element of
the definition of a MRF on a countable set $\Lambda$ is that $\Lambda$ have a graph structure
to accommodate the notion of nearest-neighbor (or adjacent) sites. The result
here shows that one-dimensional Markov random fields can be locally specified
and are in fact stationary Markov chains. While existence of a probability
distribution satisfying (i)-(iii) can be proved for any dimension $d$ (for finite $S$),
the probability distribution need not be unique. This interesting phenomenon
is known as a phase transition. Existence may fail in the case that $S$ is countably
infinite too (see theoretical complement 1).



The canonical construction of Markov chains on the space of trajectories has

been explained in Chapter I, Section 6, using Kolmogorov's Existence Theorem.
In the present section and the next, another widely used general method of
construction of Markov processes on arbitrary state spaces is illustrated.
Markovian models in this form arise naturally in many fields, and they are
often easier to analyze in this noncanonical representation.

Example 1. (The Linear Autoregressive Model of Order One, or the AR(1)
Model). Let $b$ be a real number and $\{\varepsilon_n : n \ge 1\}$ an i.i.d. sequence of real-valued
random variables defined on some probability space $(\Omega, \mathcal{F}, P)$. Given an initial
random variable $X_0$ independent of $\{\varepsilon_n\}$, define recursively the sequence of
random variables $\{X_n : n \ge 0\}$ as follows:

$$X_0, \quad X_1 := bX_0 + \varepsilon_1, \quad X_{n+1} := bX_n + \varepsilon_{n+1} \qquad (n \ge 0). \tag{13.1}$$

As $X_0, X_1, \ldots, X_n$ are determined by $\{X_0, \varepsilon_1, \ldots, \varepsilon_n\}$, and $\varepsilon_{n+1}$ is independent
of the latter, one has, for all Borel sets $C$,

$$P(X_{n+1} \in C \mid \{X_0, X_1, \ldots, X_n\}) = [P(bx + \varepsilon_{n+1} \in C)]_{x = X_n}
= [P(\varepsilon_{n+1} \in C - bx)]_{x = X_n} = Q(C - bX_n), \tag{13.2}$$

where $Q$ is the common distribution of the random variables $\varepsilon_n$. Thus $\{X_n : n \ge 0\}$
is a Markov process on the state space $S = \mathbb{R}^1$, having the transition probability
(of going from $x$ to $C$ in one step)

$$p(x, C) := Q(C - bx), \tag{13.3}$$

and initial distribution given by the distribution of $X_0$. The analysis of this
Markov process is, however, facilitated more by its representation (13.1) than
by an analytical study of the asymptotics of $n$-step transition probabilities. Note
that successive iteration in (13.1) yields

$$X_1 = bX_0 + \varepsilon_1, \qquad X_2 = bX_1 + \varepsilon_2 = b^2 X_0 + b\varepsilon_1 + \varepsilon_2, \quad \ldots,$$
$$X_n = b^n X_0 + b^{n-1}\varepsilon_1 + b^{n-2}\varepsilon_2 + \cdots + b\varepsilon_{n-1} + \varepsilon_n \qquad (n \ge 1). \tag{13.4}$$

The distribution of $X_n$ is, therefore, the same as that of

$$Y_n := b^n X_0 + \varepsilon_1 + b\varepsilon_2 + b^2\varepsilon_3 + \cdots + b^{n-1}\varepsilon_n \qquad (n \ge 1). \tag{13.5}$$

Assume now that

$$|b| < 1 \tag{13.6}$$

and $|\varepsilon_n| \le c$ with probability 1 for some constant $c$. Then it follows from (13.5) that

$$Y_n \to \sum_{n=0}^{\infty} b^n \varepsilon_{n+1} \qquad \text{a.s.}, \tag{13.7}$$

regardless of $X_0$. Let $\pi$ denote the distribution of the random variable on the
right side in (13.7). Then $Y_n$ converges in distribution to $\pi$ as $n \to \infty$ (Exercise
1). Because the distribution of $X_n$ is the same as that of $Y_n$, it follows that $X_n$
converges in distribution to $\pi$. Therefore, $\pi$ is the unique invariant distribution
for the Markov process $\{X_n\}$, i.e., for $p(x, dy)$ (Exercise 1).

The assumption that the random variable $\varepsilon_1$ is bounded can be relaxed.
Indeed, it suffices to assume

$$\sum_{n=1}^{\infty} P(|\varepsilon_1| > c\delta^n) < \infty \qquad \text{for some } \delta < \frac{1}{|b|} \text{ and some } c > 0. \tag{13.8}$$

For (13.8) is equivalent to assuming $\sum_n P(|\varepsilon_{n+1}| > c\delta^n) < \infty$ so that, by the
Borel-Cantelli Lemma (see Chapter 0, Section 6),

$$P(|\varepsilon_{n+1}| \le c\delta^n \text{ for all but finitely many } n) = 1.$$

This implies that, with probability 1, $|b^n \varepsilon_{n+1}| \le c(|b|\delta)^n$ for all but finitely many
$n$. Since $|b|\delta < 1$, the series on the right side of (13.7) is convergent and is the
limit of $Y_n$.
It is simple to check that (13.8) holds if $|b| < 1$ and (Exercise 3)

$$E|\varepsilon_1|^r < \infty \qquad \text{for some } r > 0. \tag{13.9}$$

The conditions (13.6) and (13.8) (or (13.9)) are therefore sufficient for the
existence of a unique invariant probability $\pi$ and for the convergence of $X_n$ in
distribution to $\pi$.
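The representations (13.4) and (13.5) are easy to confirm by simulation. In the sketch below the coefficient $b = 0.5$, the uniform noise on $[-1, 1]$, and the function names are assumptions made for illustration only.

```python
import random

def ar1_recursive(b, x0, noise):
    """Run the recursion X_{n+1} = b X_n + eps_{n+1} of (13.1)."""
    x = x0
    for eps in noise:
        x = b * x + eps
    return x

def ar1_closed_form(b, x0, noise):
    """X_n = b^n X_0 + sum_k b^(n-k) eps_k, as in (13.4)."""
    n = len(noise)
    return b**n * x0 + sum(b**(n - k) * eps
                           for k, eps in enumerate(noise, start=1))

def y_n(b, x0, noise):
    """Y_n = b^n X_0 + sum_k b^(k-1) eps_k, as in (13.5)."""
    n = len(noise)
    return b**n * x0 + sum(b**(k - 1) * eps
                           for k, eps in enumerate(noise, start=1))

rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # bounded i.i.d. noise

print(ar1_recursive(0.5, 3.0, noise), ar1_closed_form(0.5, 3.0, noise))
# Y_n forgets the initial value geometrically fast when |b| < 1.
print(y_n(0.5, 100.0, noise), y_n(0.5, -100.0, noise))
```

Note that $Y_n$ is $X_n$ computed with the noise sequence in reverse order, which is why the two have the same distribution although they differ path by path.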

Next, Example 1 is extended to multidimensional state space.

Example 2. (General Linear Time Series Model). Let $\{\varepsilon_n : n \ge 1\}$ be a sequence
of i.i.d. random vectors with values in $\mathbb{R}^m$ and common distribution $Q$, and let
$B$ be an $m \times m$ matrix with real entries $b_{ij}$. Suppose $X_0$ is an $m$-dimensional
random vector independent of $\{\varepsilon_n\}$. Define recursively the sequence of random
vectors

$$X_0, \quad X_{n+1} := BX_n + \varepsilon_{n+1} \qquad (n = 0, 1, 2, \ldots). \tag{13.10}$$

As in (13.2), (13.3), $\{X_n\}$ is a Markov process with state space $\mathbb{R}^m$ and transition
probability

$$p(x, C) := Q(C - Bx) \qquad (\text{for all Borel sets } C \subset \mathbb{R}^m). \tag{13.11}$$

Assume that

$$\|B^{n_0}\| < 1 \qquad \text{for some positive integer } n_0. \tag{13.12}$$

Recall that the norm of a matrix $H$ is defined by

$$\|H\| := \sup_{|x| = 1} |Hx|, \tag{13.13}$$

where $|x|$ denotes the Euclidean length of $x$ in $\mathbb{R}^m$. For a positive integer $n > n_0$
write $n = jn_0 + j'$, where $0 \le j' < n_0$. Then using the fact $\|B_1 B_2\| \le \|B_1\| \|B_2\|$
for arbitrary $m \times m$ matrices $B_1$, $B_2$ (Exercise 2), one gets

$$\|B^n\| = \|B^{jn_0} B^{j'}\| \le \|B^{n_0}\|^j \|B^{j'}\| \le c \|B^{n_0}\|^j, \qquad c := \max\{\|B^r\| : 0 \le r < n_0\}. \tag{13.14}$$

From (13.12) and (13.14) it follows, as in Example 1, that the series $\sum_n B^n \varepsilon_{n+1}$
converges a.s. in Euclidean norm if (Exercise 3), for some $c > 0$,

$$\sum_{n=1}^{\infty} P(|\varepsilon_1| > c\delta^n) < \infty \qquad \text{for some } \delta < \frac{1}{\|B^{n_0}\|^{1/n_0}}. \tag{13.15}$$

Write, in this case,

$$Y := \sum_{n=0}^{\infty} B^n \varepsilon_{n+1}. \tag{13.16}$$

It also follows, as in Example 1 (see Exercise 1), that no matter what the initial
distribution (i.e., the distribution of $X_0$) is, $X_n$ converges in distribution to the
distribution $\pi$ of $Y$. Therefore, $\pi$ is the unique invariant distribution for $p(x, dy)$.
For purposes of application it is useful to know that the assumption (13.12)
holds if the maximum modulus of eigenvalues of B, also known as the spectral
radius r(B) of B, is less than 1. This fact is implied by the following result from
linear algebra.

Lemma. Let $B$ be an $m \times m$ matrix. Then the spectral radius $r(B)$ satisfies

$$r(B) \ge \limsup_{n \to \infty} \|B^n\|^{1/n}. \tag{13.17}$$

Proof. Let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of $B$. This means $\det(B - \lambda I) =
(\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_m - \lambda)$, where det is shorthand for determinant and $I$ is
the identity matrix. Let $\lambda_m$ have the maximum modulus among the $\lambda_i$, i.e.,
$|\lambda_m| = r(B)$. If $|\lambda| > |\lambda_m|$ then $B - \lambda I$ is invertible, since $\det(B - \lambda I) \ne 0$. Indeed,
by the definition of the inverse, each element of the inverse of $B - \lambda I$ is a
polynomial in $\lambda$ (of degree $m - 1$ or $m - 2$) divided by $\det(B - \lambda I)$. Therefore,
one may write

$$(B - \lambda I)^{-1} = (\lambda_1 - \lambda)^{-1} \cdots (\lambda_m - \lambda)^{-1} (B_0 + \lambda B_1 + \cdots + \lambda^{m-1} B_{m-1}) \qquad (|\lambda| > |\lambda_m|), \tag{13.18}$$

where $B_j$ ($0 \le j \le m - 1$) are $m \times m$ matrices that do not involve $\lambda$. Writing
$z = 1/\lambda$, one may express (13.18) as

$$\begin{aligned}
(B - \lambda I)^{-1} &= (-\lambda)^{-m} (1 - \lambda_1/\lambda)^{-1} \cdots (1 - \lambda_m/\lambda)^{-1} \lambda^{m-1} \sum_{j=0}^{m-1} \lambda^{-(m-1-j)} B_j \\
&= (-1)^m z (1 - \lambda_1 z)^{-1} \cdots (1 - \lambda_m z)^{-1} \sum_{j=0}^{m-1} z^{m-1-j} B_j \\
&= z \Big( \sum_{n=0}^{\infty} a_n z^n \Big) \sum_{j=0}^{m-1} z^{m-1-j} B_j \qquad (|z| < |\lambda_m|^{-1}).
\end{aligned} \tag{13.19}$$

On the other hand,

$$(B - \lambda I)^{-1} = -z(I - zB)^{-1} = -z \sum_{k=0}^{\infty} z^k B^k \qquad \Big(|z| < \frac{1}{\|B\|}\Big). \tag{13.20}$$

To see this, first note that the series on the right is convergent in norm for
$|z| < 1/\|B\|$, and then check that term-by-term multiplication of the series $\sum_k z^k B^k$
by $I - zB$ yields the identity $I$ after all cancellations. In particular, writing $b_{ij}^{(k)}$
for the $(i, j)$ element of $B^k$, the series

$$-z \sum_{k=0}^{\infty} z^k b_{ij}^{(k)} \tag{13.21}$$

converges absolutely for $|z| < 1/\|B\|$. Since (13.21) is the same as the $(i, j)$ element
of the series (13.19), at least for $|z| < 1/\|B\|$, their coefficients coincide (Exercise
4) and, therefore, the series in (13.21) is absolutely convergent for $|z| < |\lambda_m|^{-1}$
(as (13.19) is).
This implies that, for each $\varepsilon > 0$,

$$|b_{ij}^{(k)}| < (|\lambda_m| + \varepsilon)^k \qquad \text{for all sufficiently large } k. \tag{13.22}$$

For if (13.22) is violated, one may choose $|z|$ sufficiently close to (but less than)
$1/|\lambda_m|$ such that $|z^{k'} b_{ij}^{(k')}| \to \infty$ for a subsequence $\{k'\}$, contradicting the
requirement that the terms of the convergent series (13.21) must go to zero for
$|z| < 1/|\lambda_m|$.
Now $\|B^k\| \le m \max\{|b_{ij}^{(k)}| : 1 \le i, j \le m\}$ (Exercise 2). Since $m^{1/k} \to 1$ as
$k \to \infty$, (13.22) implies (13.17). ∎
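A small numerical illustration shows why (13.12) is phrased in terms of a power of $B$: a matrix can have norm larger than 1 and still have spectral radius less than 1. The sketch below uses a hypothetical $2 \times 2$ upper-triangular matrix with both eigenvalues equal to 0.5, and it substitutes the Frobenius norm for the operator norm (13.13) because it is trivial to compute; since all matrix norms on a finite-dimensional space are equivalent, the limit of $\|B^n\|^{1/n}$ is unaffected by this substitution.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def frobenius(a):
    """Frobenius norm; an upper bound for the operator norm (13.13)."""
    return sum(x * x for row in a for x in row) ** 0.5

def root_norms(b, n_max):
    """Return the list of ||B^n||^(1/n) for n = 1, ..., n_max."""
    power = b
    out = []
    for n in range(1, n_max + 1):
        out.append(frobenius(power) ** (1.0 / n))
        power = mat_mul(power, b)
    return out

# Hypothetical matrix: spectral radius 0.5, but norm well above 1.
B = [[0.5, 2.0],
     [0.0, 0.5]]

values = root_norms(B, 200)
print(values[0], values[9], values[-1])
```

The first value exceeds 1 while later values drift down toward the spectral radius 0.5, consistent with (13.17).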

Two well-known time series models will now be treated as special cases of
Example 2. These are the pth order autoregressive (or AR(p)) model, and the
autoregressive moving-average model ARMA(p, q).

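Example 2(a). (AR(p) Model). Let $p \ge 1$ be an integer, $\beta_0, \beta_1, \ldots, \beta_{p-1}$ real
constants. Given a sequence of i.i.d. real-valued random variables $\{\eta_n : n \ge p\}$,
and $p$ other random variables $U_0, U_1, \ldots, U_{p-1}$ independent of $\{\eta_n\}$, define
recursively

$$U_{n+p} := \sum_{i=0}^{p-1} \beta_i U_{n+i} + \eta_{n+p} \qquad (n \ge 0). \tag{13.23}$$

The sequence $\{U_n\}$ is not in general a Markov process, but the sequence of
$p$-dimensional random vectors

$$X_n := (U_n, U_{n+1}, \ldots, U_{n+p-1})' \qquad (n \ge 0) \tag{13.24}$$

is Markovian. Here the prime (') denotes transposition, so $X_n$ is to be regarded
as a column vector in matrix operations. To prove the Markov property,
consider the sequence of $p$-dimensional i.i.d. random vectors

$$\varepsilon_n := (0, 0, \ldots, 0, \eta_{n+p-1})' \qquad (n \ge 1), \tag{13.25}$$

and note that

$$X_{n+1} = BX_n + \varepsilon_{n+1}, \tag{13.26}$$

where $B$ is the $p \times p$ matrix

$$B := \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 \\
\beta_0 & \beta_1 & \beta_2 & \beta_3 & \cdots & \beta_{p-2} & \beta_{p-1}
\end{pmatrix}. \tag{13.27}$$

Hence, arguing as in (13.2), (13.3), or (13.11), $\{X_n\}$ is a Markov process on the
state space $\mathbb{R}^p$. Write

$$B - \lambda I = \begin{pmatrix}
-\lambda & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & -\lambda & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -\lambda & 1 \\
\beta_0 & \beta_1 & \beta_2 & \beta_3 & \cdots & \beta_{p-2} & \beta_{p-1} - \lambda
\end{pmatrix}.$$

Expanding $\det(B - \lambda I)$ by its last row, and using the fact that the determinant
of a matrix in triangular form (i.e., with all zero off-diagonal elements on one
side of the diagonal) is the product of its diagonal elements (Exercise 5), one gets

$$\det(B - \lambda I) = (-1)^{p+1} (\beta_0 + \beta_1 \lambda + \cdots + \beta_{p-1} \lambda^{p-1} - \lambda^p). \tag{13.28}$$

Therefore, the eigenvalues of $B$ are the roots of the equation

$$\beta_0 + \beta_1 \lambda + \cdots + \beta_{p-1} \lambda^{p-1} - \lambda^p = 0. \tag{13.29}$$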
Finally, in view of (13.17), the following proposition holds (see (13.15) and
Exercise 3).

Proposition 13.1. Suppose that the roots of the polynomial equation (13.29)
are all strictly inside the unit circle in the complex plane, and that the common
distribution $G$ of $\{\eta_n\}$ satisfies

$$\sum_{n} G(\{x \in \mathbb{R}^1 : |x| > c\delta^n\}) < \infty \qquad \text{for some } \delta < \frac{1}{|\lambda_m|}, \tag{13.30}$$

where $|\lambda_m|$ is the maximum modulus of the roots of (13.29). Then (i) there exists
a unique invariant distribution $\pi$ for the Markov process $\{X_n\}$, and (ii) no
matter what the initial distribution, $X_n$ converges in distribution to $\pi$.

Once again it is simple to check that (13.30) holds if $G$ has a finite absolute
moment of some order $r > 0$ (Exercise 3).
An immediate consequence of Proposition 13.1 is that the time series
$\{U_n : n \ge 0\}$ converges in distribution to a steady state $\pi_U$ given, for all Borel
sets $C \subset \mathbb{R}^1$, by

$$\pi_U(C) := \pi(\{x \in \mathbb{R}^p : x^{(1)} \in C\}). \tag{13.31}$$

To see this, simply note that $U_n$ is the first coordinate of $X_n$, so that $X_n$ converges
to $\pi$ in distribution implies $U_n$ converges to $\pi_U$ in distribution.

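Example 2(b). (ARMA(p, q) Model). The autoregressive moving-average model of order $(p, q)$, in short ARMA$(p, q)$, is defined by

$$U_{n+p} := \sum_{i=0}^{p-1} \beta_i U_{n+i} + \sum_{j=1}^{q} \delta_j \eta_{n+p-j} + \eta_{n+p} \qquad (n \ge 0), \tag{13.32}$$

where $p, q$ are positive integers, $\beta_i$ ($0 \le i \le p-1$) and $\delta_j$ ($1 \le j \le q$) are real constants, $\{\eta_n : n \ge p-q\}$ is an i.i.d. sequence of real-valued random variables, and $U_i$ ($0 \le i \le p-1$) are arbitrary initial random variables independent of $\{\eta_n\}$. Consider the sequences $\{X_n\}$, $\{\varepsilon_n\}$ of $(p+q)$-dimensional vectors

$$X_n := (U_n, \ldots, U_{n+p-1}, \eta_{n+p-q}, \ldots, \eta_{n+p-1})', \qquad \varepsilon_n := (0, 0, \ldots, 0, \eta_{n+p-1}, 0, \ldots, 0, \eta_{n+p-1})' \qquad (n \ge 0), \tag{13.33}$$

where $\eta_{n+p-1}$ occurs as the $p$th and $(p+q)$th elements of $\varepsilon_n$. Then

$$X_{n+1} = HX_n + \varepsilon_{n+1} \qquad (n \ge 0), \tag{13.34}$$

where $H$ is the $(p+q) \times (p+q)$ matrix

$$H := \begin{pmatrix}
0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 & 0 \\
\beta_0 & \beta_1 & \cdots & \beta_{p-1} & \delta_q & \delta_{q-1} & \cdots & \delta_2 & \delta_1 \\
0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix},$$

the first $p$ rows and $p$ columns of $H$ being the matrix $B$ in (13.27).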
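The bookkeeping behind (13.33) and (13.34) can be verified on a small instance. This is a sketch under assumptions, not the authors' construction in code: the function names, the indexing convention (`delta[j-1]` holds $\delta_j$, `eta_init` holds $\eta_{p-q}, \ldots, \eta_{p-1}$, `eta[n]` holds $\eta_{n+p}$), and all numerical values are hypothetical.

```python
from fractions import Fraction


def arma_matrix(beta, delta):
    """(p+q) x (p+q) matrix H of (13.34); delta[j-1] holds delta_j."""
    p, q = len(beta), len(delta)
    h = [[Fraction(0)] * (p + q) for _ in range(p + q)]
    for i in range(p - 1):                   # shift within the U-block
        h[i][i + 1] = Fraction(1)
    for i in range(p):                       # row p: beta_0, ..., beta_{p-1}
        h[p - 1][i] = Fraction(beta[i])
    for j in range(1, q + 1):                # row p: delta_q, ..., delta_1
        h[p - 1][p + q - j] = Fraction(delta[j - 1])
    for i in range(p, p + q - 1):            # shift within the eta-block
        h[i][i + 1] = Fraction(1)
    return h                                 # the last row stays zero


def arma_scalar(beta, delta, u_init, eta_init, eta):
    """Recursion (13.32) written directly on the scalar series."""
    p, q = len(beta), len(delta)
    u = [Fraction(v) for v in u_init]
    noise = [Fraction(v) for v in eta_init] + [Fraction(v) for v in eta]
    # noise[k] is eta_{p-q+k}, so eta_{n+p-j} is noise[n + q - j].
    for n in range(len(eta)):
        ar = sum(Fraction(beta[i]) * u[n + i] for i in range(p))
        ma = sum(Fraction(delta[j - 1]) * noise[n + q - j] for j in range(1, q + 1))
        u.append(ar + ma + noise[n + q])
    return u


def arma_vector(beta, delta, u_init, eta_init, eta):
    """Iterate X_{n+1} = H X_n + eps_{n+1}; return the first coordinates."""
    p, q = len(beta), len(delta)
    h = arma_matrix(beta, delta)
    x = [Fraction(v) for v in u_init] + [Fraction(v) for v in eta_init]   # X_0
    out = [x[0]]
    for e in eta:
        eps = [Fraction(0)] * (p + q)
        eps[p - 1] = Fraction(e)             # p-th element of eps_{n+1}
        eps[p + q - 1] = Fraction(e)         # (p+q)-th element of eps_{n+1}
        x = [sum(h[i][k] * x[k] for k in range(p + q)) + eps[i] for i in range(p + q)]
        out.append(x[0])
    return out


# Hypothetical ARMA(2, 2) example (values not from the text).
beta = [Fraction(1, 2), Fraction(1, 3)]
delta = [Fraction(1, 4), Fraction(-1, 5)]
u_init, eta_init, eta = [1, 0], [2, -1], [1, 3, -2, 0, 1]
scalar = arma_scalar(beta, delta, u_init, eta_init, eta)
vector = arma_vector(beta, delta, u_init, eta_init, eta)
```

With these conventions the first coordinate of $X_n$ again coincides with $U_n$, and the last row of `arma_matrix` is zero because $\eta_{n+p}$ enters $X_{n+1}$ only through $\varepsilon_{n+1}$.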

Note that $U_0, \ldots, U_{p-1}, \eta_{p-q}, \ldots, \eta_{p-1}$ determine $X_0$, so that $X_0$ is independent of $\eta_p$ and, therefore, of $\varepsilon_1$. It follows by induction that $X_n$ and $\varepsilon_{n+1}$ are independent. Hence $\{X_n\}$ is a Markov process on the state space $\mathbb{R}^{p+q}$.

In order to apply the Lemma above, expand $\det(H - \lambda I)$ in terms of the elements of its $p$th row to get (Exercise 5)

$$\det(H - \lambda I) = \det(B - \lambda I)(-\lambda)^q. \tag{13.35}$$

Therefore, the eigenvalues of $H$ are $q$ zeros and the roots of (13.29). Thus, one has the following proposition.

Proposition 13.2. Under the hypothesis of Proposition 13.1, the ARMA$(p, q)$ process $\{X_n\}$ has a unique invariant distribution $\pi$, and $X_n$ converges in distribution to $\pi$ no matter what the initial distribution is.

As a corollary, the time series $\{U_n\}$ converges in distribution to $\pi_U$ given for all Borel sets $C \subset \mathbb{R}^1$ by

$$\pi_U(C) := \pi(\{x \in \mathbb{R}^{p+q} : x^{(1)} \in C\}), \tag{13.36}$$

no matter what the distribution of $(U_0, U_1, \ldots, U_{p-1})$ is, provided the hypothesis of Proposition 13.2 is satisfied.
In the case that $\varepsilon_n$ is Gaussian, it is simple to check that under the hypothesis (13.12) in Example 2 the random vector V in (13.16) is Gaussian. Therefore, $\pi$ is Gaussian, so that the stationary vector-valued process $\{X_n\}$ with initial distribution $\pi$ is Gaussian (Exercise 6). In particular, if $\eta_n$ are Gaussian in Example 2(a), and the roots of the polynomial equation (13.29) lie inside the unit circle in the complex plane, then the stationary process $\{U_n\}$, obtained when $(U_0, U_1, \ldots, U_{p-1})$ have the distribution determined by $\pi$ in Example 2(a), is Gaussian. A similar assertion holds for Example 2(b).



The method of construction of Markov processes illustrated for linear time series models in Section 13 extends to more general Markov processes. The present section is devoted to the construction and analysis of some nonlinear models. Before turning to these models, note that one may regard the process $\{X_n\}$ in Example 13.1 (see (13.1)) to be generated by successive iterations of an i.i.d. sequence of random maps $\alpha_1, \alpha_2, \ldots, \alpha_n, \ldots$ defined by

$$x \to \alpha_n x = bx + \varepsilon_n \qquad (n \ge 1),$$

$\{\varepsilon_n : n \ge 1\}$ being a sequence of i.i.d. real-valued random variables. Each $\alpha_n$ is a random (affine linear) map on the state space $\mathbb{R}^1$ into itself. The Markov sequence $\{X_n\}$ is defined by

$$X_n = \alpha_n \cdots \alpha_1 X_0 \qquad (n \ge 1), \tag{14.1}$$

where the initial $X_0$ is a real-valued random variable independent of the sequence of random maps $\{\alpha_n : n \ge 1\}$. A similar interpretation holds for the other examples of Section 13. Indeed it may be shown, under a very mild condition on the state space, that every Markov process in discrete time may be represented as (14.1) (see theoretical complement 1). Thus the method of the last section and the present one is truly a general device for constructing and analyzing Markov processes on general state spaces.

Example 1. (Iterations of I.I.D. Increasing Maps). Let the state space be an interval $J$, finite or infinite. On some probability space $(\Omega, \mathcal{F}, P)$ is given a sequence of i.i.d. continuous and increasing random maps $\{\alpha_n : n \ge 1\}$ on $J$ into itself. This means first of all that for each $\omega \in \Omega$, $\alpha_n(\omega)$ is a continuous and increasing (i.e., nondecreasing) function on $J$ into $J$. Second, there exists a set $\Gamma$ of continuous increasing functions on $J$ into $J$ such that $P(\alpha_n \in \Gamma) = 1$ for all $n$; $\Gamma$ has a sigmafield $\mathcal{B}(\Gamma)$ generated by sets of the form $\{\gamma \in \Gamma : a < \gamma x < b\}$ where $a < b$ and $x$ are arbitrary elements of $J$, and $\gamma x$ denotes the value of $\gamma$ at $x$. The maps $\alpha_n$ on $\Omega$ into $\Gamma$ are measurable, i.e., $F_n := \{\omega \in \Omega : \alpha_n(\omega) \in D\} \in \mathcal{F}$ for every $D \in \mathcal{B}(\Gamma)$. Also, $P(F_n)$ is the same for all $n$. Finally, $\{\alpha_n : n \ge 1\}$ are independent, i.e., the events $\{\alpha_n \in D_n\}$ form an independent sequence for every given sequence $\{D_n\} \subset \mathcal{B}(\Gamma)$.
For any finite set of functions $\gamma_1, \gamma_2, \ldots, \gamma_k$ in $\Gamma$, one defines the composition $\gamma_1\gamma_2\cdots\gamma_k$ in the usual manner. For example, $\gamma_1\gamma_2 x = \gamma_1(\gamma_2 x)$, the value of $\gamma_1$ at the point $\gamma_2 x$.

For each $x \in J$ define the sequence of random variables

$$X_0(x) := x, \qquad X_n(x) := \alpha_n X_{n-1}(x) = \alpha_n\alpha_{n-1}\cdots\alpha_1 x \qquad (n \ge 1). \tag{14.2}$$

In view of the independence of $\{\alpha_n : n \ge 1\}$, $\{X_n(x) : n \ge 0\}$ is a Markov process (Exercise 1) on $J$, starting at $x$ and having the (one-step) transition probability

$$p(y, C) := P(\alpha_n y \in C) = \mu(\{\gamma \in \Gamma : \gamma y \in C\}) \qquad (C \text{ Borel subset of } J), \tag{14.3}$$

where $\mu$ is the common distribution of $\alpha_n$.

It will be shown now that the following condition guarantees the existence of a unique invariant probability $\pi$ as well as stability, i.e., convergence of $X_n(x)$ in distribution to $\pi$ for every initial state $x$. Assume

$$\delta_1 := P(X_{n_0}(x) \le z_0 \ \forall x) > 0 \quad \text{and} \quad \delta_2 := P(X_{n_0}(x) \ge z_0 \ \forall x) > 0 \tag{14.4}$$

for some $z_0 \in J$ and some integer $n_0$. Define

$$\Delta_n := \sup_{x, y, z \in J} |P(X_n(x) \le z) - P(X_n(y) \le z)|. \tag{14.5}$$

For the existence of a unique invariant probability $\pi$ and for stability it is enough to show that $\Delta_n \to 0$ as $n \to \infty$. For this implies that $P(X_n(x) \le z)$ converges, uniformly in $z \in J$, to a distribution function (of a probability measure on $J$). To see this last fact, observe that $X_{n+m}(x) = \alpha_{n+m}\cdots\alpha_1 x$ has the same distribution as $\alpha_n\cdots\alpha_1\alpha_{n+m}\cdots\alpha_{n+1}x$, so that

$$|P(X_{n+m}(x) \le z) - P(X_n(x) \le z)| = |P(X_n(\alpha_{n+m}\cdots\alpha_{n+1}x) \le z) - P(X_n(x) \le z)| \le \Delta_n$$

by comparing the conditional probabilities given $\alpha_{n+m}, \ldots, \alpha_{n+1}$ first. Thus, if $\Delta_n \to 0$, then the sequence of distribution functions $\{P(X_n(x) \le z)\}$ is a Cauchy sequence with respect to uniform convergence for $z \in J$. It is simple to check that the limiting function of this sequence is a distribution function of a probability measure $\pi$ on $J$ (Exercise 2). Further, $\Delta_n \to 0$ implies that this limit $\pi$ does not depend on the initial state $x$, showing that $\pi$ is the unique invariant probability (Exercise 13.1).
In order to prove $\Delta_n \to 0$, the first step is to establish, under the assumption (14.4), the inequality

$$\Delta_{n_0} \le \delta := \max\{1 - \delta_1, 1 - \delta_2\}. \tag{14.6}$$

For this fix $x, y \in J$ and first take $z < z_0$. On the set $F_2 := \{X_{n_0}(x) \ge z_0 \ \forall x\}$ the events $\{X_{n_0}(x) \le z\}$, $\{X_{n_0}(y) \le z\}$ are both empty. Hence, by the second condition in (14.4),

$$|P(X_{n_0}(x) \le z) - P(X_{n_0}(y) \le z)| = |E(\mathbf{1}_{\{X_{n_0}(x) \le z\}} - \mathbf{1}_{\{X_{n_0}(y) \le z\}})| \le P(F_2^c) = 1 - \delta_2, \tag{14.7}$$

since the difference between the two indicator functions in (14.7) is zero on $F_2$. Similarly, if $z > z_0$ then on the set $F_1 := \{X_{n_0}(x) \le z_0 \ \forall x\}$ the two indicator functions both equal 1, so that their difference vanishes and one gets

$$|P(X_{n_0}(x) \le z) - P(X_{n_0}(y) \le z)| \le P(F_1^c) = 1 - \delta_1. \tag{14.8}$$

Combining (14.7) and (14.8) one gets

$$|P(X_{n_0}(x) \le z) - P(X_{n_0}(y) \le z)| \le \delta \tag{14.9}$$

for all $z \ne z_0$. But the function on the left is right-continuous in $z$. Therefore, letting $z \downarrow z_0$, (14.9) holds also for $z = z_0$. In other words, (14.6) holds.
Next note that $\Delta_n$ is monotonically decreasing,

$$\Delta_{n+1} \le \Delta_n. \tag{14.10}$$


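Indeed,

$$|P(X_{n+1}(x) \le z) - P(X_{n+1}(y) \le z)| = |P(\alpha_{n+1}\cdots\alpha_2\alpha_1 x \le z) - P(\alpha_{n+1}\cdots\alpha_2\alpha_1 y \le z)| \le \Delta_n,$$

by comparing the conditional probabilities given $\alpha_1$.

The final step in proving $\Delta_n \to 0$ is to show

$$\Delta_n \le \delta^{[n/n_0]} \tag{14.11}$$

where $[r]$ is the integer part of $r$. In view of (14.10), it is enough to prove

$$\Delta_{jn_0} \le \delta^j \qquad (j = 1, 2, \ldots). \tag{14.12}$$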

Suppose that this is true for some $j \ge 1$. Then,

$$\begin{aligned} |P(X_{(j+1)n_0}(x) \le z) &- P(X_{(j+1)n_0}(y) \le z)| \\ &= |E(\mathbf{1}_{\{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}X_{jn_0}(x) \le z\}} - \mathbf{1}_{\{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}X_{jn_0}(y) \le z\}})| \\ &= |E(\mathbf{1}_{\{X_{jn_0}(x) \in (\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1})^{-1}(-\infty, z]\}} - \mathbf{1}_{\{X_{jn_0}(y) \in (\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1})^{-1}(-\infty, z]\}})|. \end{aligned} \tag{14.13}$$

Let

$$F_3 := \{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}x \le z_0 \ \forall x\}, \qquad F_4 := \{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}x \ge z_0 \ \forall x\}.$$

Take $z < z_0$ first. Then the inverse image of $(-\infty, z]$ in (14.13) is empty on $F_4$, so that the difference between the two indicator functions vanishes on $F_4$. On the complement of $F_4$, the inverse image of $(-\infty, z]$ under the continuous increasing map $\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}$ is an interval $(-\infty, Z'] \cap J$, where $Z'$ is a random variable. Therefore, (14.13) leads to

$$|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)| = |E\,\mathbf{1}_{F_4^c}(\mathbf{1}_{\{X_{jn_0}(x) \le Z'\}} - \mathbf{1}_{\{X_{jn_0}(y) \le Z'\}})|. \tag{14.14}$$

As $F_4$ and $Z'$ are determined by $\alpha_{(j+1)n_0}, \ldots, \alpha_{jn_0+1}$ and the latter are independent of $X_{jn_0}(x)$, $X_{jn_0}(y)$ one gets, by taking conditional expectation given $\alpha_{(j+1)n_0}, \ldots, \alpha_{jn_0+1}$,

$$|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)| \le E(\mathbf{1}_{F_4^c})\,\Delta_{jn_0} = (1 - \delta_2)\Delta_{jn_0} \le \delta\Delta_{jn_0} \le \delta^{j+1}. \tag{14.15}$$

Similarly, if $z > z_0$, the inverse image in (14.13) is $J$ on $F_3$. Therefore, the difference between the two indicator functions in (14.13) vanishes on $F_3$, and one has (14.14), (14.15) with $F_4$, $\delta_2$ replaced by $F_3$ and $\delta_1$. As (14.12) holds for $j = 1$ (see (14.6)), the induction is complete and (14.12) holds for all $j \ge 1$. Since $\delta < 1$, it follows that under the hypothesis (14.4), $\Delta_n \to 0$ exponentially fast as $n \to \infty$ and, therefore, there exists a unique invariant probability.
If $J$ is a closed bounded interval $[a, b]$, then the condition (14.4) is essentially necessary for stability. To see this, define

$$Y_0(x) = x, \qquad Y_n(x) := \alpha_1\alpha_2\cdots\alpha_n x \qquad (n \ge 1). \tag{14.16}$$

Then $Y_n(x)$ and $X_n(x)$ have the same distribution. Also,

$$Y_1(a) \ge a, \quad Y_2(a) = Y_1(\alpha_2 a) \ge Y_1(a), \quad \ldots, \quad Y_{n+1}(a) = Y_n(\alpha_{n+1}a) \ge Y_n(a),$$

i.e., the sequence of random variables $\{Y_n(a) : n \ge 0\}$ is increasing. Similarly, $\{Y_n(b) : n \ge 0\}$ is decreasing. Let the limits of these two sequences be $\underline{Y}$, $\overline{Y}$, respectively. As $Y_n(a) \le Y_n(b)$ for all $n$, $\underline{Y} \le \overline{Y}$. If $P(\underline{Y} < \overline{Y}) > 0$, then $\underline{Y}$ and $\overline{Y}$ cannot have the same distribution. In other words, $Y_n(a)$ (and, therefore, $X_n(a)$) and $Y_n(b)$ (and, therefore, $X_n(b)$) converge in distribution to different limits $\pi_1$, $\pi_2$ say. On the other hand, if $\underline{Y} = \overline{Y}$ a.s., then these limiting distributions are the same. Also, $Y_n(a) \le Y_n(x) \le Y_n(b)$ for all $x$, so that $Y_n(x)$ converges in distribution to the same limit $\pi$, whatever $x$. Therefore, $\pi$ is the unique invariant probability. Assume that $\pi$ does not assign all its mass at a single point. That is, rule out the case that with probability 1 all $\gamma$'s in $\Gamma$ have a common fixed point. Then there exist $m < M$ such that $P(\overline{Y} < m) > 0$ and $P(\underline{Y} > M) > 0$. There exists $n_0$ such that $P(Y_{n_0}(b) < m) > 0$ and $P(Y_{n_0}(a) > M) > 0$. Now any $z_0 \in [m, M]$ satisfies (14.4).

As an application of Example 1, consider the following example from economics.


Example 1(a). (A Descriptive Model of Capital Accumulation). Consider an economy that has a single producible good. The economy starts with an initial stock $X_0 = x > 0$ of this good which is used to produce an output $Y_1$ in period 1. The output $Y_1$ is not a deterministic function of the input $x$. In view of the randomness of the state of nature, $Y_1$ takes one of the values $f_r(x)$ with probability $p_r > 0$ ($1 \le r \le N$). Here $f_r$ are production functions having the following properties:
(i) $f_r$ is twice continuously differentiable, $f_r'(x) > 0$ and $f_r''(x) < 0$ for all $x > 0$.
(ii) $\lim_{x \downarrow 0} f_r(x) = 0$, $\lim_{x \downarrow 0} f_r'(x) > 1$, $\lim_{x \to \infty} f_r'(x) = 0$.
(iii) If $r > r'$, then $f_r(x) > f_{r'}(x)$ for all $x > 0$.
The strict concavity of $f_r$ in (i) reflects a law of diminishing returns, while (iii) assumes an ordering of the technologies or production functions $f_r$ from the least productive $f_1$ to the most productive $f_N$.
A fraction $\beta$ ($0 < \beta < 1$) of the output $Y_1$ is consumed, while the rest $(1 - \beta)Y_1$ is invested for the production in the next period. The total stock $X_1$ at hand for investment in period 1 is $\theta X_0 + (1 - \beta)Y_1$. Here $\theta < 1$ is the rate of depreciation of capital used in production. This process continues indefinitely, each time with an independent choice of the production function ($f_r$ with probability $p_r$, $1 \le r \le N$). Thus, the capital $X_{n+1}$ at hand in period $n + 1$ satisfies

$$X_{n+1} = \theta X_n + (1 - \beta)\varphi_{n+1}(X_n) \qquad (n \ge 0), \tag{14.17}$$

where $\varphi_n$ is the random production function in period $n$,

$$P(\varphi_n = f_r) = p_r \qquad (1 \le r \le N),$$

and the $\varphi_n$ ($n \ge 1$) are independent. Thus the Markov process $\{X_n(x) : n \ge 0\}$ on the state space $(0, \infty)$ may be represented as

$$X_n(x) = \alpha_n \cdots \alpha_1 x,$$

where, writing

$$g_r(x) := \theta x + (1 - \beta)f_r(x), \qquad 1 \le r \le N, \tag{14.18}$$

one has

$$P(\alpha_n = g_r) = p_r \qquad (1 \le r \le N). \tag{14.19}$$


Suppose, in addition to the assumptions already made, that

$$\theta + (1 - \beta)\lim_{x \downarrow 0} f_r'(x) > 1 \qquad (1 \le r \le N), \tag{14.20}$$

i.e., $\lim_{x \downarrow 0} g_r'(x) > 1$ for all $r$. As $\lim_{x \to \infty} g_r'(x) = \theta + (1 - \beta)\lim_{x \to \infty} f_r'(x) = \theta < 1$, it follows from the strictly increasing and strict concavity properties of $g_r$ that each $g_r$ has a unique fixed point $a_r$ (see Figure 14.1)

$$g_r(a_r) = a_r \qquad (1 \le r \le N). \tag{14.21}$$

Note that by property (iii) of $f_r$, $a_1 < a_2 < \cdots < a_N$. If $y \ge a_1$, then $g_r(y) \ge g_r(a_1) \ge g_1(a_1) = a_1$, so that $X_n(x) \ge a_1$ for all $n \ge 0$ if $x \ge a_1$. Similarly, if $y \le a_N$ then $g_r(y) \le g_r(a_N) \le g_N(a_N) = a_N$, so that $X_n(x) \le a_N$ for all $n \ge 0$ if $x \le a_N$. As a consequence, if the initial state $x$ is in $[a_1, a_N]$, then the process $\{X_n(x) : n \ge 0\}$ remains in $[a_1, a_N]$ forever. In this case, one may take $J = [a_1, a_N]$ to be the effective state space. Also, if $x > a_1$ then the $n$th iterate of $g_1$, namely $g_1^{(n)}(x)$, decreases as $n$ increases. For if $x > a_1$, then $g_1(x) < x$, $g_1^{(2)}(x) = g_1(g_1(x)) < g_1(x)$, etc. The limit of this decreasing sequence is a fixed point of $g_1$ (Exercise 3) and, therefore, must be $a_1$. Similarly, if $x < a_N$ then $g_N^{(n)}(x)$ increases, as $n$ increases, to $a_N$. In particular,

$$\lim_{n \to \infty} g_1^{(n)}(a_N) = a_1, \qquad \lim_{n \to \infty} g_N^{(n)}(a_1) = a_N.$$

Thus, there exists an integer $n_0$ such that

$$g_1^{(n_0)}(a_N) < g_N^{(n_0)}(a_1). \tag{14.22}$$


Figure 14.1


This means that if $z_0 \in [g_1^{(n_0)}(a_N), g_N^{(n_0)}(a_1)]$, then

$$P(X_{n_0}(x) \le z_0 \ \forall x \in [a_1, a_N]) \ge P(\alpha_n = g_1 \text{ for } 1 \le n \le n_0) = p_1^{n_0} > 0,$$
$$P(X_{n_0}(x) \ge z_0 \ \forall x \in [a_1, a_N]) \ge P(\alpha_n = g_N \text{ for } 1 \le n \le n_0) = p_N^{n_0} > 0.$$

Hence, the condition (14.4) of Example 1 holds, and there exists a unique invariant probability $\pi$, if the state space is taken to be $[a_1, a_N]$.
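The fixed points $a_r$, the integer $n_0$ of (14.22), and the invariance of $[a_1, a_N]$ can be illustrated on a concrete technology. The sketch below is an illustration under assumptions that are not in the text: the production functions are taken to be $f_r(x) = A_r\sqrt{x}$ (which satisfy (i) to (iii) and (14.20)), and the values of $\theta$, $\beta$, $A_r$, $p_r$, the seed, and all function names are hypothetical.

```python
import math
import random

# Hypothetical parameters (not from the text): f_r(x) = A_r * sqrt(x).
THETA = 0.5               # theta in (14.17)
BETA = 0.4                # fraction of output consumed
A = [1.0, 1.5, 2.0]       # productivity levels, least to most productive
PROBS = [0.3, 0.4, 0.3]   # probabilities p_r


def g(r, x):
    """g_r(x) = theta*x + (1 - beta)*f_r(x), see (14.18)."""
    return THETA * x + (1.0 - BETA) * A[r] * math.sqrt(x)


def fixed_point(r):
    """Closed-form solution of g_r(a) = a for the square-root technology."""
    return ((1.0 - BETA) * A[r] / (1.0 - THETA)) ** 2


def iterate(r, x, n):
    """The n-fold iterate of g_r applied to x."""
    for _ in range(n):
        x = g(r, x)
    return x


a_low, a_high = fixed_point(0), fixed_point(len(A) - 1)

# Smallest n0 with g_1^{(n0)}(a_N) < g_N^{(n0)}(a_1), as in (14.22).
n0 = 1
while iterate(0, a_high, n0) >= iterate(len(A) - 1, a_low, n0):
    n0 += 1

# Simulate X_n(x) from a point of [a_1, a_N]; the path should stay inside.
rng = random.Random(0)
x = 0.5 * (a_low + a_high)
path = [x]
for _ in range(200):
    r = rng.choices(range(len(A)), weights=PROBS)[0]
    x = g(r, x)
    path.append(x)
```

Any $z_0$ between `iterate(0, a_high, n0)` and `iterate(len(A) - 1, a_low, n0)` then serves as the splitting point in (14.4) for this hypothetical technology.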
Next fix the initial state $x$ in $(0, a_1)$. Then $g_1^{(n)}(x)$ increases, as $n$ increases. The limit must be a fixed point and, therefore, $a_1$. Since $g_r(a_1) > a_1$ for $r = 2, \ldots, N$, there exists $\varepsilon > 0$ such that $g_r(y) > a_1$ ($2 \le r \le N$) if $y \in [a_1 - \varepsilon, a_1]$. Now find $n_\varepsilon$ such that $g_1^{(n_\varepsilon)}(x) \ge a_1 - \varepsilon$. If $\tau_1 := \inf\{n \ge 1 : X_n(x) \ge a_1\}$, then it follows from the above that

$$P(\tau_1 > n_\varepsilon + k) \le p_1^k \qquad (k \ge 1),$$

because $\tau_1 > n_\varepsilon + k$ implies that the last $k$ among the first $n_\varepsilon + k$ functions $\alpha_n$ are $g_1$. Since $p_1^k$ goes to zero as $k \to \infty$, it follows from this that $\tau_1$ is a.s. finite. Also $X_{\tau_1}(x) \le a_N$ as $g_r(y) \le g_r(a_N) \le g_N(a_N) = a_N$ ($1 \le r \le N$) for $y \le a_1$, so that in a single step it is not possible to go from a state less than $a_1$ to a state larger than $a_N$. By the strong Markov property, and the result in the preceding paragraph on the existence of a unique invariant distribution and stability on $[a_1, a_N]$, it follows that $X_{\tau_1 + m}(x)$ converges in distribution to $\pi$, as $m \to \infty$ (Exercise 5). From this, one may show that $p^{(n)}(x, dy)$ converges weakly to $\pi(dy)$ for all $x$, as $n \to \infty$, so that $\pi$ is the unique invariant probability on $(0, \infty)$ (Exercise 5).
In the same manner it may be checked that $X_n(x)$ converges in distribution to $\pi$ if $x > a_N$. Thus, no matter what the initial state $x$ is, $X_n(x)$ converges in distribution to $\pi$. Therefore, on the state space $(0, \infty)$ there exists a unique invariant distribution $\pi$ (assigning probability 1 to $[a_1, a_N]$), and stability holds. In analogy with the case of Markov chains, one may call the set of states $\{x : 0 < x < a_1 \text{ or } x > a_N\}$ inessential.
The study of the existence of unique invariant probabilities and stability is relatively simpler for those cases in which the transition probabilities $p(x, dy)$ have a density $p(x, y)$, say, with respect to some reference measure $\mu(dy)$ on the state space. In the case of Markov chains this measure may be taken to be the counting measure, assigning mass 1 to each singleton in the state space. For a class of simple examples with an uncountable state space, let $S = \mathbb{R}^1$ and $f$ a bounded measurable function on $\mathbb{R}^1$, $a \le f(x) \le b$. Let $\{\varepsilon_n\}$ be an i.i.d. sequence of real-valued random variables whose common distribution has a strictly positive continuous density $\varphi$ with respect to Lebesgue measure on $\mathbb{R}^1$. Consider the Markov process

$$X_{n+1} := f(X_n) + \varepsilon_{n+1} \qquad (n \ge 0), \tag{14.23}$$

with $X_0$ arbitrary (independent of $\{\varepsilon_n\}$). Then the transition probability $p(x, dy)$ has the density

$$p(x, y) := \varphi(y - f(x)). \tag{14.24}$$

Note that

$$\varphi(y - f(x)) \ge \psi(y) \qquad \text{for all } x \in \mathbb{R}^1, \tag{14.25}$$

where

$$\psi(y) := \min\{\varphi(y - z) : a \le z \le b\} > 0.$$

Then (see theoretical complement 6.1) it follows that this Markov process has a unique invariant probability with a density $\pi(y)$ and that the distribution of $X_n$ converges to $\pi(y)\,dy$, whatever the initial state.
The following example illustrates the dramatic difference between the cases
when a density exists and when it does not.

Example 2. Let $S = [-2, 2]$ and consider the Markov process $X_{n+1} = f(X_n) + \varepsilon_{n+1}$ ($n \ge 0$), $X_0$ independent of $\{\varepsilon_n\}$, where $\{\varepsilon_n\}$ is an i.i.d. sequence with values in $[-1, 1]$, and

$$f(x) = \begin{cases} x + 1 & \text{if } -2 \le x \le 0, \\ x - 1 & \text{if } 0 < x \le 2. \end{cases}$$

First let $\varepsilon_n$ be Bernoulli, $P(\varepsilon_n = 1) = \tfrac{1}{2} = P(\varepsilon_n = -1)$. Then, with $X_0 \equiv x \in (0, 2]$,

$$X_1(x) = \begin{cases} x - 2 & \text{with probability } \tfrac{1}{2}, \\ x & \text{with probability } \tfrac{1}{2}, \end{cases}$$

and $X_1(x - 2)$ has the same distribution as $X_1(x)$. It follows that

$$P(X_2(x) = x - 2 \mid X_1(x) = x) = \tfrac{1}{2} = P(X_2(x) = x - 2 \mid X_1(x) = x - 2),$$
$$P(X_2(x) = x \mid X_1(x) = x) = \tfrac{1}{2} = P(X_2(x) = x \mid X_1(x) = x - 2).$$

In other words, $X_1(x)$ and $X_2(x)$ are independent and have the same two-point distribution $\pi_x$. It follows that $\{X_n(x) : n \ge 1\}$ is i.i.d. with common distribution $\pi_x$. In particular, $\pi_x$ is an invariant initial distribution. If $x \in [-2, 0]$, then $\{X_n(x) : n \ge 1\}$ is i.i.d. with common distribution $\pi_{x+2}$, assigning probabilities $\tfrac{1}{2}$ and $\tfrac{1}{2}$ to $\{x + 2\}$ and $\{x\}$. Thus, there is an uncountable family of invariant initial distributions $\{\pi_x : 0 < x \le 1\} \cup \{\pi_{x+2} : -1 \le x \le 0\}$.
On the other hand, suppose $\varepsilon_n$ is uniform on $[-1, 1]$, i.e., has the density $\tfrac{1}{2}$ on $[-1, 1]$ and zero outside. Check that (Exercise 6) $\{X_{2n}(x) : n \ge 1\}$ is an i.i.d. sequence whose common distribution does not depend on $x$ and has a density

$$\pi(y) = \frac{2 - |y|}{4}, \qquad -2 \le y \le 2. \tag{14.27}$$

Thus, $\pi(y)\,dy$ is the unique invariant probability, and stability holds.
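Both halves of Example 2 can be checked numerically. The following sketch is only an illustration with hypothetical function names: for Bernoulli noise it enumerates the reachable states exactly, and for uniform noise it approximates, by a midpoint rule, the density obtained after one step from the triangular density (14.27). The grid size and the test points are arbitrary choices.

```python
def f(x):
    """The map f of Example 2 on [-2, 2]."""
    return x + 1.0 if x <= 0.0 else x - 1.0


def bernoulli_support(x, steps):
    """States reachable from x in exactly `steps` steps with noise +1 or -1."""
    states = {x}
    for _ in range(steps):
        states = {f(s) + e for s in states for e in (1.0, -1.0)}
    return states


def triangular(y):
    """Candidate invariant density (14.27)."""
    return (2.0 - abs(y)) / 4.0 if abs(y) <= 2.0 else 0.0


def push_forward(density, y, n=20000):
    """Midpoint-rule value at y of the density after one step with uniform noise.

    One step maps a density pi to
        y -> integral over [-2, 2] of pi(x) * (1/2) * 1{|y - f(x)| <= 1} dx.
    """
    h = 4.0 / n
    total = 0.0
    for k in range(n):
        x = -2.0 + (k + 0.5) * h
        if abs(y - f(x)) <= 1.0:
            total += density(x) * 0.5 * h
    return total


support = bernoulli_support(0.5, 5)            # Bernoulli noise, start at x = 0.5
errors = [abs(push_forward(triangular, y) - triangular(y))
          for y in (-1.5, 0.0, 0.5, 1.9)]      # uniform noise
```

With Bernoulli noise the chain started at $x = 0.5$ never leaves the two points $x$ and $x - 2$, while with uniform noise the triangular density is reproduced by one step of the chain up to discretization error.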

The final example deals with absorption probabilities.

Example 3. (Survival Probability of an Economic Agent). Suppose that in a one-good economy the agent starts with an initial stock $x$. A fixed amount $c > 0$ is consumed ($x > c$) and the remainder $x - c$ is invested for production in the next period. The stock produced in period 1 is $X_1 = \varepsilon_1(X_0 - c) = \varepsilon_1(x - c)$, where $\varepsilon_1$ is a nonnegative random variable. Again, after consumption, $X_1 - c$ is invested in production of a stock of $\varepsilon_2(X_1 - c)$, provided $X_1 > c$. If $X_1 \le c$, the agent is ruined. In general,

$$X_0 = x, \qquad X_{n+1} = \varepsilon_{n+1}(X_n - c) \qquad (n \ge 0), \tag{14.28}$$

where $\{\varepsilon_n : n \ge 1\}$ is an i.i.d. sequence of nonnegative random variables. The state space may be taken to be $[0, \infty)$ with absorption at 0. The probability of survival of the economic agent, starting with an initial stock $x > c$, is

$$\rho(x) := P(X_n > c \text{ for all } n \ge 0 \mid X_0 = x). \tag{14.29}$$

If $\delta := P(\varepsilon_1 = 0) > 0$, then it is simple to check that $P(\varepsilon_n = 0 \text{ for some } n \ge 1) = 1$ (Exercise 8), so that $\rho(x) = 0$ for all $x$. Therefore, assume

$$P(\varepsilon_1 > 0) = 1. \tag{14.30}$$

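From (14.28) one gets, by successive iteration,

$$\begin{aligned} X_{n+1} > c &\iff X_n > c + \frac{c}{\varepsilon_{n+1}} \iff X_{n-1} > c + \frac{c + c/\varepsilon_{n+1}}{\varepsilon_n} = c + \frac{c}{\varepsilon_n} + \frac{c}{\varepsilon_n\varepsilon_{n+1}} \\ &\iff \cdots \iff X_0 = x > c + \frac{c}{\varepsilon_1} + \frac{c}{\varepsilon_1\varepsilon_2} + \cdots + \frac{c}{\varepsilon_1\varepsilon_2\cdots\varepsilon_{n+1}}. \end{aligned}$$

Hence, on the set $\{\varepsilon_n > 0 \text{ for all } n\}$,

$$\begin{aligned} \{X_n > c \text{ for all } n\} &= \Big\{x > c + \frac{c}{\varepsilon_1} + \frac{c}{\varepsilon_1\varepsilon_2} + \cdots + \frac{c}{\varepsilon_1\varepsilon_2\cdots\varepsilon_n} \text{ for all } n\Big\} \\ &= \Big\{x \ge c + c\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1\varepsilon_2\cdots\varepsilon_n}\Big\} = \Big\{\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1\varepsilon_2\cdots\varepsilon_n} \le \frac{x}{c} - 1\Big\}. \end{aligned}$$

In other words,

$$\rho(x) = P\Big(\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1\varepsilon_2\cdots\varepsilon_n} \le \frac{x}{c} - 1\Big). \tag{14.31}$$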
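The equivalence behind (14.31) is purely algebraic once the $\varepsilon_n$ are positive, so it can be checked on any fixed sequence. The sketch below makes that check with exact fractions; the function names, the value of $c$, the sequence `eps`, and the initial stocks are hypothetical, and the recursion is deliberately continued past ruin so that the two criteria can be compared at every step.

```python
from fractions import Fraction


def stocks(x, c, eps):
    """X_1, ..., X_N from the recursion (14.28), without stopping at ruin."""
    out = []
    for e in eps:
        x = e * (x - c)
        out.append(x)
    return out


def partial_sum(eps, n):
    """sum_{k=1}^{n} 1 / (eps_1 * ... * eps_k); requires every eps_k > 0."""
    total, prod = Fraction(0), Fraction(1)
    for e in eps[:n]:
        prod *= e
        total += 1 / prod
    return total


# Hypothetical data (not from the text).
c = Fraction(1)
eps = [Fraction(3, 2), Fraction(1, 2), Fraction(2), Fraction(3, 4), Fraction(5, 4)]
agree = []
for x in (Fraction(2), Fraction(3), Fraction(5), Fraction(8)):
    xs = stocks(x, c, eps)
    for n in range(1, len(eps) + 1):
        above_c_at_step_n = xs[n - 1] > c
        criterion = x > c + c * partial_sum(eps, n)
        agree.append(above_c_at_step_n == criterion)
```

Each entry of `agree` compares the event $\{X_n > c\}$ with the inequality $x > c + c\sum_{k=1}^{n} 1/(\varepsilon_1\cdots\varepsilon_k)$ for one pair $(x, n)$.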

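This formula will be used to determine conditions on the common distribution of $\varepsilon_n$ under which (1) $\rho(x) = 0$, (2) $\rho(x) = 1$, (3) $\rho(x) < 1$ ($x > c$). Suppose first that $E\log\varepsilon_1$ exists and $E\log\varepsilon_1 < 0$. Then, by the Strong Law of Large Numbers,

$$\frac{1}{n}\sum_{r=1}^{n} \log\varepsilon_r \to E\log\varepsilon_1 < 0 \qquad \text{a.s.},$$

so that $\log\varepsilon_1\varepsilon_2\cdots\varepsilon_n \to -\infty$ a.s., or $\varepsilon_1\varepsilon_2\cdots\varepsilon_n \to 0$ a.s. This implies that the infinite series in (14.31) diverges a.s., that is,

$$\rho(x) = 0 \quad \text{for all } x, \text{ if } E\log\varepsilon_1 < 0. \tag{14.32}$$

Now by Jensen's Inequality (Chapter 0, Section 2), $E\log\varepsilon_1 \le \log E\varepsilon_1$, with strict inequality unless $\varepsilon_1$ is degenerate. Therefore, if $E\varepsilon_1 < 1$, or $E\varepsilon_1 = 1$ and $\varepsilon_1$ is nondegenerate, then $E\log\varepsilon_1 < 0$. If $\varepsilon_1$ is degenerate and $E\varepsilon_1 = 1$, then $P(\varepsilon_1 = 1) = 1$, and the infinite series in (14.31) diverges. Therefore, (14.32) implies

$$\rho(x) = 0 \quad \text{for all } x, \text{ if } E\varepsilon_1 \le 1. \tag{14.33}$$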

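It is not true, however, that $E\log\varepsilon_1 > 0$ implies $\rho(x) = 1$ for large $x$. To see this and for some different criteria, define

$$m := \inf\{z \ge 0 : P(\varepsilon_1 \le z) > 0\}. \tag{14.34}$$

Let us show that

$$\rho(x) < 1 \quad \text{for all } x, \text{ if } m \le 1. \tag{14.35}$$

For this fix $A > 0$, however large. Find $n_0$ such that

$$n_0 > A\prod_{r=1}^{\infty}\Big(1 + \frac{1}{r^2}\Big). \tag{14.36}$$

This is possible, as $\prod(1 + 1/r^2) < \exp\{\sum 1/r^2\} < \infty$. If $m \le 1$ then $P(\varepsilon_1 \le 1 + 1/r^2) > 0$ for all $r \ge 1$. Hence,

$$\begin{aligned} 0 &< P\Big(\varepsilon_r \le 1 + \frac{1}{r^2} \text{ for } 1 \le r \le n_0\Big) \le P\Big(\sum_{r=1}^{n_0} \frac{1}{\varepsilon_1\cdots\varepsilon_r} \ge \frac{n_0}{\prod_{r=1}^{\infty}(1 + 1/r^2)}\Big) \\ &\le P\Big(\sum_{r=1}^{n_0} \frac{1}{\varepsilon_1\cdots\varepsilon_r} > A\Big) \le P\Big(\sum_{r=1}^{\infty} \frac{1}{\varepsilon_1\cdots\varepsilon_r} > A\Big). \end{aligned}$$

Because $A$ is arbitrary, (14.31) is less than 1 for all $x$, proving (14.35).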

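One may also show that, if $m > 1$, then

$$\rho(x) \begin{cases} < 1 & \text{if } x < \dfrac{cm}{m-1}, \\[1ex] = 1 & \text{if } x \ge \dfrac{cm}{m-1} \end{cases} \qquad (m > 1). \tag{14.37}$$

To prove this, observe that

$$\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1\cdots\varepsilon_n} \le \sum_{n=1}^{\infty} \frac{1}{m^n} = \frac{1}{m-1}$$

with probability 1 (if $m > 1$). Therefore, (14.31) implies the second relation in (14.37). In order to prove the first relation in (14.37), let $x < cm/(m-1) - c\delta$ for some $\delta > 0$. Then $x/c - 1 < 1/(m-1) - \delta$. Choose $n(\delta)$ such that

$$\sum_{r=n(\delta)}^{\infty} \frac{1}{m^r} < \frac{\delta}{2}, \tag{14.38}$$

and then choose $\delta_r > 0$ ($1 \le r \le n(\delta) - 1$) such that

$$\sum_{r=1}^{n(\delta)-1} \frac{1}{(m + \delta_1)\cdots(m + \delta_r)} > \sum_{r=1}^{n(\delta)-1} \frac{1}{m^r} - \frac{\delta}{2}. \tag{14.39}$$

Then

$$\begin{aligned} 0 &< P(\varepsilon_r < m + \delta_r \text{ for } 1 \le r \le n(\delta) - 1) \le P\Big(\sum_{r=1}^{n(\delta)-1} \frac{1}{\varepsilon_1\cdots\varepsilon_r} > \sum_{r=1}^{n(\delta)-1} \frac{1}{m^r} - \frac{\delta}{2}\Big) \\ &\le P\Big(\sum_{r=1}^{\infty} \frac{1}{\varepsilon_1\varepsilon_2\cdots\varepsilon_r} > \sum_{r=1}^{\infty} \frac{1}{m^r} - \delta\Big) = P\Big(\sum_{r=1}^{\infty} \frac{1}{\varepsilon_1\cdots\varepsilon_r} > \frac{1}{m-1} - \delta\Big). \end{aligned}$$

If $\delta > 0$ is small enough, the last probability is no larger than $P(\sum_{r=1}^{\infty} 1/(\varepsilon_1\cdots\varepsilon_r) > x/c - 1)$, provided $x/c - 1 < 1/(m-1)$, i.e., if $x < cm/(m-1)$. Thus for such $x$ one has $1 - \rho(x) > 0$, proving the first relation in (14.37).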
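The threshold $cm/(m-1)$ in (14.37) is easiest to see in the degenerate case $\varepsilon_n \equiv m$, where the series in (14.31) equals $1/(m-1)$ exactly and survival is deterministic. The sketch below illustrates only this special case; the values of $c$ and $m$, the horizon, and the function name are hypothetical.

```python
from fractions import Fraction


def survives(x, c, m, horizon):
    """True if X_n > c for n = 0, ..., horizon when eps_n = m for every n."""
    for _ in range(horizon + 1):
        if x <= c:
            return False
        x = m * (x - c)
    return True


c, m = Fraction(1), Fraction(2)          # hypothetical values with m > 1
threshold = c * m / (m - 1)              # cm/(m - 1) from (14.37)
at_threshold = survives(threshold, c, m, 200)
just_below = survives(threshold - Fraction(1, 1000), c, m, 200)
```

At the threshold the stock is a fixed point of $x \mapsto m(x - c)$, so the agent survives indefinitely, whereas a starting stock slightly below it has its shortfall multiplied by $m$ in every period and is eventually ruined.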



The mathematical theory of information storage and retrieval rests largely on

foundations established by Claude Shannon. In the present section we will
consider one aspect of the general theory. We will suppose that "text" is

constructed from symbols in a finite alphabet $S = \{a_1, a_2, \ldots, a_M\}$. The term

"text" may be interpreted rather broadly and need not be restricted to the text
of ordinary human language; other usages occur in genetic sequences of DNA,
computer data storage, music, etc. However, applications to linguistics have
played a central role in the historical development as will be discussed below.
An encoding algorithm is a transformation in which finite sequences of text are
replaced by sequences of code symbols in such a way that it must be possible
to uniquely reconstruct (decode) the original text from the encoded text. For
simplicity we shall consider compression codes in which the same alphabet is
available for code symbols. Any sequence of symbols is referred to as a word
and the number of symbols as its length. A word of length t will be encoded
into a code-word of length s by the encoding algorithm. To compress the text
one would use a short code for frequently occurring words and reserve the
longer code-words for the more rarely occurring sequences. It is in this
connection that the statistical structure of text (i.e., word frequencies) will play
an important role. We consider a case in which the symbols occur in sequence
according to a Markov chain having a stationary transition law
$((p_{ij} : i, j = 1, 2, \ldots, M))$. Shannon has suggested the following scheme for
generating a Markov approximation to English text. Open a book and select
a letter at random, say T. Next skip a few lines and read until a T is encountered,
and select the letter that follows this T (we observed an H in our trial). Next
skip a few more lines, and read until an H is encountered and take the next
letter (we observed an A), and so on to generate a sample of text that should
be more closely Markovian than text composed according to the usual rules
of English grammar. Moreover, one expects this to resemble more closely the
structure of the English language than independent samples of randomly selected
single letters. Accordingly, one may also consider higher-order Markov
approximations to the structure of English text, for example, by selecting letters
according to the two preceding letters. Likewise, as was also done by Shannon
and others, one may generate text by using "linguistic words" as the basic
alphabetic symbols (see theoretical complement 1 for references). It is of
historical interest that, despite the widespread modern use of Markov processes in
diverse physical sciences, Markov himself developed his ideas on dependence
with linguistic applications in mind.
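Shannon's procedure for producing a Markov approximation to text can be imitated in a few lines. The sketch below is a simplified stand-in rather than the exact procedure described above: instead of skipping lines and reading until the current letter recurs, it draws the next symbol uniformly from all symbols that follow the current one in a sample string, which samples from the empirical first-order transition frequencies. The sample string, the seed, the function names, and the choice to read the sample cyclically (so that every symbol has a successor) are all assumptions.

```python
import random
from collections import defaultdict


def successor_table(text):
    """Map each symbol to the list of symbols that follow it (text read cyclically)."""
    table = defaultdict(list)
    for i, ch in enumerate(text):
        table[ch].append(text[(i + 1) % len(text)])
    return table


def markov_text(text, length, seed=0):
    """Generate `length` symbols from the empirical first-order transition law."""
    rng = random.Random(seed)
    table = successor_table(text)
    current = rng.choice(text)
    out = [current]
    for _ in range(length - 1):
        current = rng.choice(table[current])
        out.append(current)
    return "".join(out)


# Hypothetical sample text (not from the original).
sample = "the theory of the thing is that the text tends to repeat itself"
generated = markov_text(sample, 40)
```

By construction every pair of consecutive symbols in `generated` also occurs as a consecutive pair in the cyclically read sample; a higher-order approximation would condition on the two preceding symbols instead of one.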
Let $X_0, X_1, \ldots$ denote the Markov chain with state space $S$ having the stationary transition law $((p_{ij}))$ and a unique invariant initial distribution $\pi = (\pi_i)$. Then $\{X_n\}$ is a stationary process and the word $a = (a_{i_1}, a_{i_2}, \ldots, a_{i_t})$ of length $t$ has probability $\pi_{i_1}p_{i_1i_2}\cdots p_{i_{t-1}i_t}$ of occurring. Suppose that under the coding transformation the word $a = (a_{i_1}, \ldots, a_{i_t})$ is encoded to a word of length $s = c(a_{i_1}, \ldots, a_{i_t})$. Let

$$\mu_t = E[c(X_1, \ldots, X_t)]/t. \tag{15.1}$$

The quantity $\mu_t$ is referred to as the average compression for words of length $t$. The optimal extent to which a given (statistical) type of text can be compressed by a code is measured by the so-called compression coefficient defined by

$$\mu = \limsup_{t \to \infty} \mu_t. \tag{15.2}$$

The problem for our consideration here is to calculate the compression coefficient in terms of the parameters of the given Markov structure of the text and to construct an optimum compression code. We will show that the optimal compression coefficient is given by

$$\mu = \frac{H}{\log M} = \frac{\sum_i \pi_i H_i}{\log M} = \frac{-\sum_i \pi_i \sum_j p_{ij}\log p_{ij}}{\log M}, \tag{15.3}$$

in the sense that the compression coefficient of a code is never smaller than this, although there are codes whose coefficient is arbitrarily close to it.
The parameter $H_i = -\sum_j p_{ij}\log p_{ij}$ is referred to as the entropy of the transition distribution from state $i$ and is a measure of the information obtained when the Markov chain moves one step ahead out of state $i$ (Exercises 1 and 2). The quantity $H = \sum_i \pi_i H_i$ is called the entropy of the Markov chain. Observe that, given the transition law of the Markov chain, the optimal compression coefficient may easily be computed from (15.3) once the invariant initial distribution is determined.
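The computation suggested by (15.3) can be sketched as follows. This is an illustration under assumptions not made in the text: the two-symbol transition matrix is hypothetical, the invariant distribution is approximated by repeatedly applying $\pi \mapsto \pi P$ (which presumes an irreducible aperiodic chain), and the function names are invented for the example.

```python
import math


def stationary(P, iterations=10_000):
    """Approximate the invariant distribution by iterating pi -> pi P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi


def transition_entropy(row):
    """H_i = -sum_j p_ij log p_ij, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def chain_entropy(P):
    """H = sum_i pi_i H_i."""
    pi = stationary(P)
    return sum(w * transition_entropy(row) for w, row in zip(pi, P))


def optimal_compression(P):
    """H / log M as in (15.3), M being the alphabet size."""
    return chain_entropy(P) / math.log(len(P))


# Hypothetical two-symbol chain (not from the text).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
mu = optimal_compression(P)
```

For an i.i.d. uniform source on $M$ symbols the same computation gives the coefficient 1, that is, no compression is possible, while a strongly persistent chain such as the hypothetical `P` gives a coefficient strictly between 0 and 1.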
For a word $a$ of length $t$ let $p_t(a) = P((X_0, \ldots, X_{t-1}) = a)$. Then,

$$-\log p_t((X_0, \ldots, X_{t-1})) = -\log\pi(X_0) + \sum_{n=1}^{t-1}(-\log p_{X_{n-1},X_n}) = Y_0 + \sum_{n=1}^{t-1} Y_n, \tag{15.4}$$

where $Y_1, Y_2, \ldots$ is a stationary sequence of bounded random variables. By the law of large numbers applied to the stationary sequence $Y_1, Y_2, \ldots$, we have by Theorems 9.2 and 9.3 (see Exercise 10.5),

$$-\frac{\log p_t((X_0, \ldots, X_{t-1}))}{t} \to EY_1 = H \tag{15.5}$$

as $t \to \infty$ with probability 1 (Exercise 4); i.e., for almost all sample realizations, for large $t$ the probability of the sequence $X_0, X_1, \ldots, X_{t-1}$ is approximately $\exp\{-tH\}$. The result (15.5) is quite remarkable. It has a natural generalization that applies to a large class of stationary processes so long as a law of large numbers applies (Exercise 4).
An important consequence of (15.5) that will be used below is obtained by considering for each $t$ the $M^t$ words of length $t$ arranged as $a^{(1)}, a^{(2)}, \ldots$ in order of decreasing probability. For any positive number $\varepsilon < 1$, let

$$N_t(\varepsilon) = \min\Big\{N : \sum_{i=1}^{N} p_t(a^{(i)}) \ge \varepsilon\Big\}. \tag{15.6}$$


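Proposition 15.1. For any $0 < \varepsilon < 1$,

$$\lim_{t \to \infty} \frac{\log N_t(\varepsilon)}{t} = H. \tag{15.7}$$

Proof. Since almost sure convergence implies convergence in probability, it follows from (15.5) that for arbitrarily small positive numbers $\gamma$ and $\delta$, $\delta < \min\{\varepsilon, 1 - \varepsilon\}$, for all sufficiently large $t$ we have

$$P\Big(\Big|-\frac{\log p_t(X_0, \ldots, X_{t-1})}{t} - H\Big| < \gamma\Big) > 1 - \delta.$$

In particular, for all sufficiently large $t$, say $t \ge T$,

$$\exp\{-t(H + \gamma)\} < p_t(X_0, \ldots, X_{t-1}) < \exp\{-t(H - \gamma)\}$$

with probability at least $1 - \delta$. Let $R_t$ denote the set consisting of all words $a$ of length $t$ such that $e^{-t(H+\gamma)} < p_t(a) < e^{-t(H-\gamma)}$. Fix $t$ larger than $T$. Let $S_t = \{a^{(1)}, a^{(2)}, \ldots, a^{(N_t(\varepsilon))}\}$. The sum of the probabilities of the $M_t(\varepsilon)$, say, words $a$ of length $t$ in $R_t$ that are counted among the $N_t(\varepsilon)$ words in $S_t$ equals $\sum_{a \in S_t \cap R_t} p_t(a) > \varepsilon - \delta$, since $S_t$ has total probability at least $\varepsilon$ by definition of $N_t(\varepsilon)$ and the complement of $R_t$ has probability less than $\delta$. Therefore,

$$M_t(\varepsilon)e^{-t(H-\gamma)} \ge \sum_{a \in S_t \cap R_t} p_t(a) > \varepsilon - \delta. \tag{15.8}$$

Also, none of the elements of $S_t$ has probability less than $\exp\{-t(H + \gamma)\}$, since the set of all $a$ with $p_t(a) > \exp\{-t(H + \gamma)\}$ contains $R_t$ and has total probability larger than $1 - \delta > \varepsilon$. Therefore,

$$N_t(\varepsilon)e^{-t(H+\gamma)} \le 1. \tag{15.9}$$

Taking logarithms, we have

$$\frac{\log N_t(\varepsilon)}{t} \le H + \gamma.$$

On the other hand, since $N_t(\varepsilon) \ge M_t(\varepsilon)$, by (15.8),

$$N_t(\varepsilon)e^{-t(H-\gamma)} > \varepsilon - \delta. \tag{15.10}$$

Again taking logarithms and now combining this with (15.9), we get

$$H - \gamma + \frac{\log(\varepsilon - \delta)}{t} < \frac{\log N_t(\varepsilon)}{t} \le H + \gamma.$$

Since $\gamma$ and $\delta$ are arbitrarily small the proof is complete.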
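For a small alphabet and moderate $t$, the quantity $N_t(\varepsilon)$ of (15.6) can be computed by brute force. The sketch below does this for a hypothetical two-symbol chain (the matrix `P`, its invariant distribution `PI`, and the function names are assumptions, not from the text). It enumerates all $M^t$ words, so it is practical only for small $t$, and at such small $t$ the ratio $\log N_t(\varepsilon)/t$ need not yet be close to its limit $H$.

```python
import itertools
import math

P = [[0.9, 0.1], [0.5, 0.5]]   # hypothetical transition matrix
PI = [5.0 / 6.0, 1.0 / 6.0]    # its invariant distribution


def word_probability(word):
    """p_t(a) = pi(a_0) * p_{a_0 a_1} * ... * p_{a_{t-2} a_{t-1}}."""
    prob = PI[word[0]]
    for i, j in zip(word, word[1:]):
        prob *= P[i][j]
    return prob


def n_t(t, eps):
    """N_t(eps) of (15.6), by enumerating all M^t words of length t."""
    probs = sorted(
        (word_probability(w) for w in itertools.product(range(len(P)), repeat=t)),
        reverse=True,
    )
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= eps:
            return count
    return len(probs)   # reached only through floating-point rounding


H = -sum(PI[i] * P[i][j] * math.log(P[i][j]) for i in range(2) for j in range(2))
rates = {t: math.log(n_t(t, 0.5)) / t for t in (4, 8, 12)}
```

The dictionary `rates` holds $\log N_t(0.5)/t$ for a few word lengths, to be compared with the entropy `H` of the hypothetical chain.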


Returning to the problem of calculating the compression coefficient, first let us show that $\mu \ge H/\log M$. Let $\delta > 0$ and let $H' = H - 2\delta < H$. For an arbitrary given code, let

$$J_t = \{a : a \text{ is a word of length } t \text{ and } c(a) < tH'/\log M\}. \tag{15.11}$$

Then

$$\#J_t \le M + M^2 + \cdots + M^{[tH'/\log M]} \le M^{tH'/\log M}\{1 + 1/M + 1/M^2 + \cdots\} = \frac{M}{M-1}e^{tH'}, \tag{15.12}$$

since the number of code-words of length $k$ is $M^k$. Now observe that

$$t\mu_t = Ec(X_1, \ldots, X_t) \ge \frac{tH'}{\log M}P\{(X_1, \ldots, X_t) \in J_t^c\} = \frac{tH'}{\log M}[1 - P\{(X_1, \ldots, X_t) \in J_t\}]. \tag{15.13}$$

Therefore,

$$\mu = \limsup_{t \to \infty} \mu_t \ge \frac{H'}{\log M}\limsup_{t \to \infty}[1 - P\{(X_1, \ldots, X_t) \in J_t\}]. \tag{15.14}$$

Now observe that for any positive number $\varepsilon < 1$, for the probability $P(\{(X_1, \ldots, X_t) \in J_t\})$ to exceed $\varepsilon$ requires that $N_t(\varepsilon)$ be no larger than $\#J_t$. In view of (15.12) this means that

$$N_t(\varepsilon) \le \frac{M}{M-1}\exp\{t(H - 2\delta)\} \tag{15.15}$$

or

$$\frac{\log N_t(\varepsilon)}{t} \le o(1) + H - 2\delta. \tag{15.16}$$

Now by Proposition 15.1 for any given $\varepsilon$ this can hold for at most finitely many values of $t$. In other words, we must have that the probability $P(\{(X_1, \ldots, X_t) \in J_t\})$ tends to 0 in the limit as $t$ grows without bound. Therefore, (15.14) becomes

$$\mu \ge \frac{H'}{\log M} = \frac{H - 2\delta}{\log M} \tag{15.17}$$

and since $\delta > 0$ is arbitrary we get $\mu \ge H/\log M$ as desired.


To prove the reverse inequality, and therefore (15.3), again let $\delta$ be an arbitrarily small positive number. We shall construct a code whose compression coefficient $\mu$ does not exceed $(H + \delta)/\log M$. For arbitrary positive numbers $\gamma$ and $\varepsilon$, we have

$$N_t(1 - \varepsilon) < e^{t(H+\gamma)} = M^{t(H+\gamma)/\log M} \tag{15.18}$$

for all sufficiently large $t$. That is, the number of (relatively) high-probability words of length $t$, the sum of whose probabilities exceeds $1 - \varepsilon$, is no greater than the number $M^{t(H+\gamma)/\log M}$ of words of length $t(H + \gamma)/\log M$. Therefore, there are enough distinct sequences of length $t(H + \gamma)/\log M$ to code the $N_t(1 - \varepsilon)$ words "most likely to occur." For the lower-probability words, the sum of whose probabilities does not exceed $1 - (1 - \varepsilon) = \varepsilon$, just code each one as itself. To ensure uniqueness for decoding, one may put one of the previously unused sequences of length $t(H + \gamma)/\log M$ in front of each of the self-coded terms. The length $c(X_0, X_1, \ldots, X_{t-1})$ of code-words for such a code is then either $t(H + \gamma)/\log M$ or $t + t(H + \gamma)/\log M$, the latter occurring with probability at most $\varepsilon$. Therefore,

$$Ec(X_0, X_1, \ldots, X_{t-1}) \le \frac{t(H + \gamma)}{\log M} + \varepsilon\Big[t + \frac{t(H + \gamma)}{\log M}\Big] = \frac{t(H + \delta)}{\log M} \tag{15.19}$$

where $\delta = \varepsilon H + \varepsilon\log M + \varepsilon\gamma + \gamma$. The desired inequality now follows.
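The two-part code described above can be built explicitly for a small example. The sketch below is an illustration under assumptions, not the construction verbatim: the chain `P`, its invariant distribution `PI`, the word length, and $\varepsilon$ are hypothetical; the fixed code-word length `width` is taken to be the smallest integer that leaves one unused sequence for the escape prefix, rather than $t(H + \gamma)/\log M$; and all words are enumerated, which is feasible only for small $t$.

```python
import itertools

P = [[0.9, 0.1], [0.5, 0.5]]   # hypothetical transition matrix
PI = [5.0 / 6.0, 1.0 / 6.0]    # its invariant distribution
M = len(P)


def word_probability(word):
    """Stationary probability of a word under the chain (PI, P)."""
    prob = PI[word[0]]
    for i, j in zip(word, word[1:]):
        prob *= P[i][j]
    return prob


def to_digits(index, width):
    """Base-M representation of `index`, padded to `width` symbols."""
    digits = []
    for _ in range(width):
        digits.append(index % M)
        index //= M
    return tuple(reversed(digits))


def build_code(t, eps):
    """Return (code, probs, width) for all words of length t."""
    words = sorted(itertools.product(range(M), repeat=t),
                   key=word_probability, reverse=True)
    probs = {w: word_probability(w) for w in words}
    total, n_top = 0.0, 0
    while n_top < len(words) and total < 1.0 - eps:
        total += probs[words[n_top]]
        n_top += 1
    # `width` symbols must supply n_top code-words plus one escape sequence.
    width = 1
    while M ** width < n_top + 1:
        width += 1
    escape = to_digits(n_top, width)
    code = {}
    for rank, w in enumerate(words):
        # Likely words get a short index; the rest are sent as escape + word.
        code[w] = to_digits(rank, width) if rank < n_top else escape + w
    return code, probs, width


t, eps = 10, 0.1
code, probs, width = build_code(t, eps)
mean_length = sum(probs[w] * len(code[w]) for w in code)
```

A decoder reads `width` symbols; if they form the escape sequence it copies the next $t$ symbols verbatim, and otherwise it looks up the indexed word. The mean code length is at most `width + eps * t`, which mirrors the bound (15.19).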


Exercises for Section I1.1

1. Verify that the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ is the
conditional distribution of $X_{n+1}$ given $X_n$ if and only if the conditional distribution
of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ is a (measurable) function of $X_n$ alone. [Hint: Use
properties of conditional expectations, Section 0.4.]
2. Show that the simple random walk has the Markov property.
3. Show that every discrete-parameter stochastic process with independent increments
is a Markov process.
4. Let A, B, C be events with $C$, $B \cap C$ having positive probabilities. Verify that the
following are equivalent versions of the conditional independence of A and B given
C: $P(A \cap B \mid C) = P(A \mid C)P(B \mid C)$ if and only if $P(A \mid B \cap C) = P(A \mid C)$.

5. (i) Let $\{X_n\}$ be a sequence of random variables with denumerable state space S.
Call $\{X_n\}$ rth order Markov-dependent if

$$P(X_{n+1} = j \mid X_0 = i_0, \ldots, X_n = i_n) = P(X_{n+1} = j \mid X_{n-r+1} = i_{n-r+1}, \ldots, X_n = i_n)$$

for $i_0, \ldots, i_n, j \in S$, $n \ge r$.
Show that $Y_n = (X_n, X_{n+1}, \ldots, X_{n+r-1})$, $n = 0, 1, 2, \ldots$ is a (first-order) Markov
chain under these circumstances.
(ii) Let $V_n = X_{n+1} - X_n$, $n = 0, 1, 2, \ldots$. Show that if $\{X_n\}$ is a Markov chain, then
so is $\{(X_n, V_n)\}$. [Hint: Consider first $\{(X_n, X_{n+1})\}$ and then apply a one-to-one
transformation.]
6. Show that a necessary and sufficient condition on the correlations of a
discrete-parameter stationary Gaussian stochastic process $\{X_n\}$ to have a Markov
property is $\mathrm{Cov}(X_n, X_{n+m}) = \sigma^2 \rho^m$ for some $\sigma^2 > 0$, $|\rho| < 1$, $m, n = 0, 1, 2, \ldots$. [A
stochastic process $\{X_n: n \ge 0\}$ is said to be stationary if, for all n, m, the distribution
of $(X_0, \ldots, X_n)$ is the same as that of $(X_m, X_{m+1}, \ldots, X_{m+n})$.]
7. Let $\{Y_n\}$ be an i.i.d. sequence of $\pm 1$-valued Bernoulli random variables with
parameter $0 < p < 1$. Define a new stochastic process by $X_n = (Y_n + Y_{n-1})/2$, for
$n = 1, 2, \ldots$. Show that $\{X_n\}$ does not have the Markov property.
8. Let $\{S_n\}$ denote the simple symmetric random walk starting at the origin and let
$R_n = |S_n|$. Show that $\{R_n\}$ is a Markov chain.
9. (Random Walk in Random Scenery) Let $\{Y_n: n \in \mathbb{Z}\}$ be a symmetric i.i.d. sequence
of $\pm 1$-valued random variables indexed by the set of integers $\mathbb{Z}$. Let $\{S_n\}$
be the simple symmetric random walk on the state space $\mathbb{Z}$ starting at $S_0 = 0$. The
random walk $\{S_n\}$ is assumed to be independent of the random scenery $\{Y_n\}$. Define
a new process $\{X_n\}$ by noting down the scenery at each integer site upon arrival in
the course of the walk. That is, $X_n = Y_{S_n}$, $n = 0, 1, 2, \ldots$.
(i) Calculate $EX_n$. [Hint: $X_n = \sum_{m=-\infty}^{\infty} Y_m 1_{\{S_n = m\}}$.]
(*ii) Show that $\{X_n\}$ is stationary. [See Exercise 6 for a definition of stationarity.]
(iii) Is $\{X_n\}$ Markovian?
(iv) Show that $\mathrm{Cov}(X_n, X_{n+m}) \sim (2/\pi)^{1/2} m^{-1/2}$ for large even m, and zero for odd
m. (For an analysis of the "long-range dependence" in this example, see H.
Kesten and F. Spitzer (1979), "A Limit Theorem Related to a New Class of
Self-Similar Processes," Z. Wahr. Verw. Geb., 50, 5-25.)
10. Let $\{Z_n: n = 0, 1, 2, \ldots\}$ be i.i.d. $\pm 1$-valued with $P(Z_n = 1) = p \ne \tfrac{1}{2}$. Define
$X_n = Z_n Z_{n+1}$, $n = 0, 1, 2, \ldots$. Show that for $k \le n - 1$, $P(X_{n+1} = j \mid X_k = i) = P(X_{n+1} = j)$,
i.e., $X_{n+1}$ and $X_k$ are independent for each $k = 0, \ldots, n - 1$, $n \ge 1$. Is
$\{X_n\}$ a Markov chain?
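
A failure of the Markov property, as in Exercise 7 above, can be exhibited by exact enumeration. The sketch below assumes the $Y_n$ take values in {-1, +1} with P(Y = 1) = p; the helper name `conditional_prob` and the choice p = 1/3 are illustrative only. It compares the conditional probability of {X_3 = 1} given the same present value X_2 = 0 but two different pasts.

```python
from fractions import Fraction
from itertools import product

def conditional_prob(p, event, given):
    """P(event | given) for events defined through X_1, X_2, X_3, computed
    exactly by enumerating all values of (Y_0, Y_1, Y_2, Y_3) in {-1, +1}^4."""
    num = Fraction(0)
    den = Fraction(0)
    for ys in product((-1, 1), repeat=4):
        weight = Fraction(1)
        for y in ys:
            weight *= p if y == 1 else 1 - p
        # xs[n] holds X_n = (Y_n + Y_{n-1}) / 2 for n = 1, 2, 3; xs[0] is unused
        xs = [None] + [Fraction(ys[n] + ys[n - 1], 2) for n in (1, 2, 3)]
        if given(xs):
            den += weight
            if event(xs):
                num += weight
    return num / den

p = Fraction(1, 3)
past_up = conditional_prob(p, lambda x: x[3] == 1, lambda x: x[2] == 0 and x[1] == 1)
past_down = conditional_prob(p, lambda x: x[3] == 1, lambda x: x[2] == 0 and x[1] == -1)
print(past_up, past_down)
```

The two values differ although the present state is the same, because the past reveals the sign of $Y_2$.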

Exercises for Section II.2

1. (i) Show that the transition matrix for a sequence of independent integer-valued
random variables is characterized by the property that its rows are identical; i.e.,
$p_{ij} = p_j$ for all $i, j \in S$.
(ii) Under what further condition is the Markov chain an i.i.d. sequence?
2. (i) Let $\{Y_n\}$ be a Markov chain with a one-step transition matrix p. Suppose that
the process $\{Y_n\}$ is viewed only at every mth time step (m fixed) and let $X_n = Y_{nm}$,
for $n = 0, 1, 2, \ldots$. Show that $\{X_n\}$ is a Markov chain with one-step transition
law given by $p^m$.
(ii) Suppose $\{X_n\}$ is a Markov chain with transition probability matrix p. Let
$n_1 < n_2 < \cdots < n_k$. Prove that

$$P(X_{n_k} = j \mid X_{n_1} = i_1, \ldots, X_{n_{k-1}} = i_{k-1}) = p^{(n_k - n_{k-1})}_{i_{k-1} j}.$$


3. (Random Walks on a Group) Let G be a finite group with group operation denoted
by $\oplus$. That is, G is a nonempty set and $\oplus$ is a well-defined binary operation for G
such that (i) if $x, y \in G$ then $x \oplus y \in G$; (ii) if $x, y, z \in G$ then $x \oplus (y \oplus z) = (x \oplus y) \oplus z$;
(iii) there is an $e \in G$ such that $x \oplus e = e \oplus x = x$ for all $x \in G$; (iv) for each $x \in G$
there is an element in G, denoted $-x$, such that $x \oplus (-x) = (-x) \oplus x = e$. If $\oplus$ is
commutative, i.e., $x \oplus y = y \oplus x$ for all $x, y \in G$, then G is called abelian. Let
$X_1, X_2, \ldots$ be i.i.d. random variables taking values in G and having the common
probability distribution $Q(g) = P(X_n = g)$, $g \in G$.
(i) Show that the random walk on G defined by $S_n = X_0 \oplus X_1 \oplus \cdots \oplus X_n$, $n \ge 0$, is
a Markov chain and calculate its transition probability matrix. Note that it is
not necessary for G to be abelian for $\{S_n\}$ to be Markov.
(ii) (Top-In Card Shuffles) Construct a model for card shuffling as a Markov chain
on a (nonabelian) permutation group on N symbols in which the top card of
the deck is inserted at a randomly selected location in the deck at each shuffle.
(iii) Calculate the transition probability matrix for N = 3. [Hint: Shuffles are of the
form $(c_1, c_2, c_3) \to (c_2, c_1, c_3)$ or $(c_2, c_3, c_1)$ only.] Also see Exercise 4.5.
4. An individual with a highly contagious disease enters a population. During each
subsequent period, either the carrier will infect a new person or be discovered and
removed by public health officials. A carrier is discovered and removed with
probability $q = 1 - p$ at each unit of time. An unremoved infected individual is sure
to infect someone in each time unit. The time evolution of the number of infected
individuals in the population is assumed to be a Markov chain $\{X_n: n = 0, 1, 2, \ldots\}$.
What are its transition probabilities?

5. The price of a certain commodity varies over the values 1, 2, 3, 4, 5 units depending
on supply and demand. The price $X_n$ at time n determines the demand $D_n$ at time n
through the relation $D_n = N - X_n$, where N is a constant larger than 5. The supply
$C_n$ at time n is given by $C_n = N - 3 + \varepsilon_n$, where $\{\varepsilon_n\}$ is an i.i.d. sequence of equally
likely $\pm 1$-valued Bernoulli random variables. Price changes are made according to
the following policy:

$$X_{n+1} - X_n = +1 \quad \text{if } D_n - C_n > 0,$$
$$X_{n+1} - X_n = -1 \quad \text{if } D_n - C_n < 0,$$
$$X_{n+1} - X_n = 0 \quad \text{if } D_n - C_n = 0.$$

(i) Fix $X_0 = i_0$. Show that $\{X_n\}$ is a Markov chain with state space S = {1, 2, 3, 4, 5}.
(ii) Compute the transition probability matrix of $\{X_n\}$.
(iii) Calculate the two-step transition probabilities.
6. A reservoir has finite capacity of h units, where h is a positive integer. The daily
inputs are i.i.d. integer-valued random variables $\{J_n: n = 1, 2, \ldots\}$ with the common
p.m.f. $\{g_j = P(J_n = j),\ j = 0, 1, 2, \ldots\}$. One unit of water is released through the dam
at the end of each day provided that the reservoir is not empty or does not exceed
its capacity. If it is empty, there is no release. If it exceeds capacity, then the excess
water is released. Let $X_n$ denote the amount of water left in the reservoir on the nth
day after release of water. Compute the transition matrix for $\{X_n\}$.
7. Suppose that at each unit of time each particle located in a fixed region of space has
probability p, independently of the other particles present, of leaving the region. Also,
at each unit of time a random number of new particles having Poisson distribution
with parameter $\lambda$ enter the region independently of the number of particles already
present at time n. Let $X_n$ denote the number of particles in the region at time n.
Calculate the transition matrix of the Markov chain $\{X_n\}$.
8. We are given two boxes A and B containing a total of N labeled balls. A ball is
selected at random (all selections being equally likely) at time n from among the N
balls and then a box is selected at random. Box A is selected with probability p and
B with probability $q = 1 - p$ independently of the ball selected. The selected ball is
moved to the selected box, unless the ball is already in it. Consider the Markov
evolution of the number $X_n$ of balls in box A. Calculate its transition matrix.
9. Each cell of a certain organism contains N particles, some of which are of type A
and the others type B. The cell is said to be in state j if it contains exactly j particles
of type A. Daughter cells are formed by cell division as follows: Each particle replicates
itself and a daughter cell inherits N particles chosen at random from the 2j particles
of type A and the $2N - 2j$ particles of type B present in the parental cell. Calculate
the transition matrix of this Markov chain.

Exercises for Section II.3

1. Let p be the transition matrix for a completely random motion of Example 2. Show
that $p^n = p$ for all n.
2. Calculate $p_{ij}^{(n)}$ for the unrestricted simple random walk.

3. Let $p = ((p_{ij}))$ denote the transition matrix for the unrestricted general random walk
of Example 6.
(i) Calculate $p_{ij}^{(n)}$ in terms of the increment distribution Q.
(ii) Show that $p_{ij}^{(n)} = Q^{*n}(j - i)$, where the n-fold convolution is defined recursively by

$$Q^{*n}(j) = \sum_{k} Q^{*(n-1)}(k)\, Q(j - k), \qquad Q^{*1} = Q.$$

4. Verify each of the following for the Pólya urn model in Example 8.
(i) $P(X_n = 1) = r/(r + b)$ for each $n = 1, 2, 3, \ldots$.
(ii) $P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n) = P(X_{1+h} = \varepsilon_1, \ldots, X_{n+h} = \varepsilon_n)$, for any $h = 0, 1, 2, \ldots$.
(*iii) $\{X_n\}$ is a martingale (see Definition 13.2, Chapter I).

5. Describe the motion represented by a Markov chain having transition matrix of the
following forms:

(i) = 0 1 ],

=[1 0J'
5 S
(iii) p =

(iv) Use the probabilistic description to write down $p^n$ without algebraically
performing the matrix multiplications. Generalize these to m-state Markov chains.
6. (Length of a Queue) Suppose that items arrive at a shop for repair on a daily basis
but that it takes one day to repair each item. New arrivals are put on a waiting list
for repair. Let $A_n$ denote the number of arrivals during the nth day. Let $X_n$ be the
length of the waiting list at the end of the nth day. Assume that $A_1, A_2, \ldots$ is an i.i.d.
nonnegative integer-valued sequence of random variables with $a(x) = P(A_n = x)$,
$x = 0, 1, 2, \ldots$. Assume that $A_{n+1}$ is independent of $X_0, \ldots, X_n$ ($n \ge 0$). Calculate
the transition probabilities for $\{X_n\}$.

7. (Pseudo Random Number Generator) The linear congruential method of generating
integer values in the range 0 to $N - 1$ is to calculate $h(x) = (ax + c) \bmod N$ for
some choice of integer coefficients $0 \le a, c < N$ and an initial seed value of x.
More generally, polynomials with integer coefficients can be used in place of $ax + c$.
Note that these methods cycle after at most N iterations.
(i) Show that the iterations may be represented by a Markov chain on a circle.
(ii) Calculate the transition probabilities in the case N = 5, a = 1, c = 2.
(iii) Calculate the transition probabilities in the case $h(x) = (x^2 + 2) \bmod 5$.
[See D. Knuth (1981), The Art of Computer Programming, Vol. II, 2nd ed.,
Addison-Wesley, Menlo Park, for extensive treatments.]
8. (A Renewal Process) A system requires a certain device for its operation that is
subject to failure. Inspections for failure are made at regular points in time, so that
an item that fails during the nth period of time between $n - 1$ and n is replaced at
time n by a device of the same type having an independent service life. Let $p_n$ denote
the probability that a device will fail during the nth period of its use. Let $X_n$ be the
age (in number of periods) of the item in use at time n. A new item is started at time
n = 0, and $X_n = 0$ if an item has just been replaced at time n. Calculate the transition
matrix of the Markov chain $\{X_n\}$.
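
The deterministic dynamics of the congruential generators in Exercise 7 above can be tabulated directly. This is only an illustrative sketch: the function names are hypothetical, and the "transition matrix" is simply the 0-1 matrix with a single 1 in row x at column h(x).

```python
def linear_map(x):
    # h(x) = (a*x + c) mod N with N = 5, a = 1, c = 2
    return (1 * x + 2) % 5

def quadratic_map(x):
    # h(x) = (x^2 + 2) mod 5
    return (x * x + 2) % 5

def transition_matrix(h, n_states):
    """0-1 matrix of the deterministic chain x -> h(x) on {0, ..., n_states - 1}."""
    return [[1 if h(x) == y else 0 for y in range(n_states)]
            for x in range(n_states)]

def orbit(h, seed, steps):
    """Seed followed by `steps` successive iterates of h."""
    xs = [seed]
    for _ in range(steps):
        xs.append(h(xs[-1]))
    return xs

print(orbit(linear_map, 0, 5))
print(orbit(quadratic_map, 0, 4))
for row in transition_matrix(quadratic_map, 5):
    print(row)
```

The linear map visits all five residues before returning to the seed, whereas the quadratic map is not one-to-one and its orbit from 0 falls into a cycle of length 2.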

Exercises for Section II.4

1. A balanced six-sided die is rolled repeatedly. Let Z denote the smallest number of
rolls for the occurrence of all six possible faces. Let $Z_1 = 1$, $Z_j$ = smallest number of
tosses to obtain the jth new face after $j - 1$ distinct faces have occurred. Then
$Z = Z_1 + \cdots + Z_6$.

(i) Give a direct proof that $Z_1, \ldots, Z_6$ are independent random variables.
(ii) Give a proof of (i) using the strong Markov property. [Hint: Define stopping
times $\tau_j$ denoting the first time after $\tau_{j-1}$ that $X_n$ is not among $X_1, \ldots, X_{\tau_{j-1}}$,
where $X_1, X_2, \ldots$ are the respective outcomes on the successive tosses.]
(iii) Calculate the distributions of $Z_2, \ldots, Z_6$.
(iv) Calculate $EZ$ and $\mathrm{Var}\, Z$.

2. Let $\{S_n\}$ denote the two-dimensional simple symmetric random walk on the integer
lattice starting at the origin. Define $\tau_r = \inf\{n: \|S_n\| = r\}$, $r = 1, 2, \ldots$, where
$\|(a, b)\| = |a| + |b|$. Describe the distribution of the process $\{S_{\tau_r + n}: n = 0, 1, 2, \ldots\}$ in
the two cases r = 1 and r = 2.

3. (Coupon Collector's Problem) A box contains N balls labeled $0, 1, 2, \ldots, N - 1$.
Let $T \equiv T_N$ be the number of selections (with replacement) required until each ball
is sampled at least once. Let $T_j$ be the number of selections required to sample j
distinct balls. Show that

(i) $T = (T_N - T_{N-1}) + (T_{N-1} - T_{N-2}) + \cdots + (T_2 - T_1) + T_1$,

where $T_1 = 1$, $T_2 - T_1, \ldots, T_{j+1} - T_j, \ldots, T_N - T_{N-1}$ are independent
geometrically distributed with parameters $(N - j)/N$, respectively.
(ii) Let $\tau_j$ be the number of selections to get ball j. Then $\tau_j$ is geometrically distributed.
(iii) $P(T > m) \le N(1 - 1/N)^m \le N e^{-m/N}$. [Hint: $P(T > m) \le \sum_{j=1}^{N} P(\tau_j > m)$.]

(iv) $P(T > m) = \sum_{k=1}^{N} (-1)^{k+1} \binom{N}{k} \left(1 - \frac{k}{N}\right)^m$.

[Hint: Use inclusion-exclusion on $\{T > m\} = \bigcup_{j=1}^{N} \{\tau_j > m\}$.]

(v) Let $X_1, X_2, \ldots$ be the respective numbers on the balls selected. Is T a stopping
time for $\{X_n\}$?
4. Let $\{X_n\}$ and $\{Y_n\}$ be independent Markov chains with common transition probability
matrix p and starting in states i and j respectively.
(i) Show that $\{(X_n, Y_n)\}$ is a Markov chain on the state space $S \times S$.
(ii) Calculate the transition law of $\{(X_n, Y_n)\}$.
(iii) Let $T = \inf\{n: X_n = Y_n\}$. Show that T is a stopping time for the process $\{(X_n, Y_n)\}$.
(iv) Let $\{Z_n\}$ be the process obtained by watching $\{X_n\}$ up until time T and then
switching to $\{Y_n\}$ after time T; i.e., $Z_n = X_n$ for $n \le T$, and $Z_n = Y_n$ for $n > T$. Show
that $\{Z_n\}$ is a Markov chain and calculate its transition law.
5. (Top-In Card Shuffling) Suppose that a deck of N cards is shuffled by repeatedly
taking the top card and inserting it into the deck at a random location. Let G be
the (nonabelian) group of permutations on N symbols and let $X_1, X_2, \ldots$ be i.i.d.
G-valued random variables with

$$P(X_k = \langle i, i-1, \ldots, 1 \rangle) = 1/N \quad \text{for } i = 1, 2, \ldots, N,$$

where $\langle i, i-1, \ldots, 1 \rangle$ is the permutation in which the ith value moves to $i - 1$,
$i - 1$ to $i - 2$, ..., 2 to 1, and 1 to i. Let $S_0$ be the identity permutation and let
$S_n = X_1 \cdots X_n$, where the group operation is being expressed multiplicatively. Let T
denote the first time the original bottom card arrives at the top and is inserted back
into the deck (cf. Exercise 2.3). Then
(i) T is a stopping time.
(ii) T has the additional property that $P(T = k, S_k = g)$ does not depend on $g \in G$.
[Hint: Show by induction on N that at time $T - 1$ the $(N - 1)!$ arrangements
of the cards beneath the top card are equally likely; see Exercise 2.3(iii).]
(iii) Property (ii) is equivalent to $P(S_k = g \mid T = k) = 1/|G|$; i.e., the deck is mixed
at time T. This property is referred to as the strong uniform time property by
D. Aldous and P. Diaconis (1986), "Shuffling Cards and Stopping Times," Amer.
Math. Monthly, 93, pp. 333-348, who introduced this example and approach.


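(iv) Show that

$$\max_{A \subset G} \left| P(S_n \in A) - \frac{|A|}{|G|} \right| \le P(T > n) \le N e^{-n/N}.$$

[Hint: Write

$$P(S_n \in A) = \frac{|A|}{|G|} P(T \le n) + P(S_n \in A \mid T > n) P(T > n) = \frac{|A|}{|G|} + r P(T > n), \qquad 0 \le |r| \le 1.$$

For the rightmost upper bound, compare Exercise 4.3.]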

6. Suppose that for a Markov chain $\{X_n\}$, $\rho_{xy} := P_x(X_n = y \text{ for some } n \ge 1) = 1$. Prove
that $P_x(X_n = y \text{ for infinitely many } n) = 1$.

7. (Record Times) Let $X_1, X_2, \ldots$ be an i.i.d. sequence of nonnegative random variables
having a continuous distribution (so that the probability of a tie is zero). Define
$R_1 = 1$, $R_k = \inf\{n \ge R_{k-1} + 1: X_n \ge \max(X_1, \ldots, X_{n-1})\}$, for $k = 2, 3, \ldots$.
(i) Show that $\{R_k\}$ is a Markov chain and calculate its transition probabilities.
[Hint: All $i_k!$ rankings of $(X_1, X_2, \ldots, X_{i_k})$ are equally likely. Consider the event
$\{R_1 = 1, R_2 = i_2, \ldots, R_k = i_k\}$ and count the number of rankings of $(X_1, X_2, \ldots, X_{i_k})$
that correspond to its occurrence.]

(ii) Let $T_n = R_{n+1} - R_n$. Is $\{T_n\}$ a Markov chain? [Hint: Compute $P(T_3 = 1 \mid T_2 = 1, T_1 = 1)$
and $P(T_3 = 1 \mid T_2 = 1)$.]
8. (Record Values) Let $X_1, X_2, \ldots$ be an i.i.d. sequence of nonnegative random
variables having a discrete distribution function. Define the record times $R_1 = 1$,
$R_2, R_3, \ldots$ as in Exercise 7. Define the record values by $V_k = X_{R_k}$, $k = 1, 2, \ldots$.

(i) Show that each $R_k$ is a stopping time for $\{X_n\}$.

(ii) Show that $\{V_k\}$ is a Markov process and calculate its transition probabilities.
(iii) Extend (ii) to the case when the distribution function of $X_k$ is continuous.
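
The decomposition in Exercise 3(i) above writes the collection time as a sum of independent geometric waiting times, the one following j distinct balls having success probability (N - j)/N and hence mean N/(N - j). A minimal sketch comparing the resulting exact mean with a simulation follows; the function names, the seed, and the number of trials are illustrative assumptions. With N = 6 this is also the expected number of rolls EZ in Exercise 1(iv).

```python
import random
from fractions import Fraction

def expected_collection_time(n):
    """Sum of the geometric means N/(N - j), j = 0, ..., N - 1."""
    return sum(Fraction(n, n - j) for j in range(n))

def simulate_collection_time(n, rng):
    """Draw balls uniformly with replacement until all n labels have appeared."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng = random.Random(12345)  # fixed seed so the run is reproducible
trials = 20000
sample_mean = sum(simulate_collection_time(6, rng) for _ in range(trials)) / trials
exact_mean = expected_collection_time(6)
print(float(exact_mean), sample_mean)
```

The exact value for N = 6 is 147/10; the simulated average should agree with it up to sampling error.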

Exercises for Section II.5

1. Construct a finite-state Markov chain such that
(i) There is only one inessential state.
(ii) The set of essential states decomposes into two equivalence classes with periods
d = 1 and d = 3.
2. (i) Give an example of a transition matrix for which all states are inessential.
(ii) Show that if S is finite then there is at least one essential state.
3. Classify all states for p given below into essential and inessential subsets. Decompose
the set of all essential states into equivalence classes of communicating states.

j 0 0 0 j 0 0
0 0 0; 0 0 2
I 1 1 1 1 1
6 6 6 6 6 6

0 1 0 0 0 1 0
5 0 0 0 3 00
0 0 0 6 0 0 6`

0 * 0 0 0 3 0

4. Suppose that S comprises a single essential class of aperiodic states. Show that there
is an integer $\nu$ such that $p_{ij}^{(\nu)} > 0$ for all $i, j \in S$ by filling in the details of the following steps.
(i) For a fixed $(i, j)$, let $B_{ij} = \{\nu \ge 1: p_{ij}^{(\nu)} > 0\}$. Then for each state j, $B_{jj}$ is closed
under addition.
(ii) (Basic Number Theory Lemma) If B is a set of positive integers having greatest
common divisor 1 and if B is closed under addition, then there is an integer b
such that $n \in B$ for all $n \ge b$. [Hints:
(a) Let G be the smallest additive subgroup of $\mathbb{Z}$ that contains B. Then argue
that $G = \mathbb{Z}$ since if d is the smallest positive integer in G it will follow that
if $n \in B$, then, since $n = qd + r$, $0 \le r < d$, one obtains $r = n - qd \in G$ and
hence r = 0, i.e., d divides each $n \in B$ and thus d = 1.
(b) If $1 \in B$, then each $n = 1 + 1 + \cdots + 1 \in B$. If $1 \notin B$, then by (a), $1 = \alpha - \beta$
for $\alpha, \beta \in B$. Check $b = (\alpha + \beta)^2 + 1$ suffices; for if $n > (\alpha + \beta)^2$, then, writing
$n = q(\alpha + \beta) + r$, $0 \le r < \alpha + \beta$, $n = q(\alpha + \beta) + r(\alpha - \beta) = (q + r)\alpha + (q - r)\beta$,
and in particular $n \in B$ since $q + r > 0$ and $q - r > 0$ by virtue of
$n > (r + 1)(\alpha + \beta)$.]

(iii) For each $(i, j)$ there is an integer $b_{ij}$ such that $\nu \ge b_{ij}$ implies $\nu \in B_{ij}$. [Hint:
Obtain $b_{jj}$ from (ii) applied to (i) and then choose k such that $p_{ij}^{(k)} > 0$. Check
that $b_{ij} = k + b_{jj}$ suffices.]

(iv) Check that $\nu = \max\{b_{ij}: i, j \in S\}$ suffices for the statement of the exercise.

5. Classify the states in Exercises 2.4, 2.5, 2.6, 2.8 as essential and inessential states.
Decompose the essential states into their respective equivalence classes.
6. Let p be the transition matrix on S = {0, 1, 2, 3} defined below.

2 0 ' 0 '2
20 Z 0
2 0 1 0 '2
2 0 2 0

Show that S is a single class of essential states of period 2 and calculate $p^n$ for all n.
7. Use the strong Markov property to prove that if j is inessential then $P_i(X_n = j$ for
infinitely many n) = 0.
8. Show by induction on N that all states communicate in the Top-In Card Shuffling
example of Exercises 2.3(ii) and 4.5.


Exercises for Section II.6

1. (i) Check that "deterministic motion going one step to the right" on
S = {0, 1, 2, ...} provides a simple example of a homogeneous Markov chain
for which there is no invariant distribution.
(ii) Check that the "static evolution" corresponding to the identity matrix provides
a simple example of a 2-state Markov chain with more than one invariant
distribution (see Exercise 3.5(i)).
(iii) Check that "deterministic cyclic motion" of oscillations between states provides
a simple example of a 2-state Markov chain having a unique invariant
distribution $\pi$ that is not the limiting distribution $\lim_{n \to \infty} P_\mu(X_n = i)$ for any
other initial distribution $\mu \ne \pi$ (see Exercise 3.5(ii)).
(iv) Show by direct calculation for the case of a 2-state Markov chain having strictly
positive transition probabilities that there is a unique invariant distribution
that is the limiting distribution for any initial distribution. Calculate the precise
exponential rate of convergence.
2. Calculate the invariant distribution for Exercise 2.5. Calculate the so-called
equilibrium price of the commodity.
3. Calculate the invariant distribution for Exercise 2.8.
4. Calculate the invariant distribution for Exercise 2.3(ii) (see Exercise 5.8).
5. Suppose that n states $a_1, a_2, \ldots, a_n$ are arranged counterclockwise in a circle. A
particle jumps one unit in the clockwise direction with probability p, $0 \le p \le 1$, or
one unit in the counterclockwise direction with probability $q = 1 - p$. Calculate the
invariant distribution.
6. (i) (Coupling Bound) If X and Y are arbitrary real-valued random variables and
J an arbitrary interval then show that $|P(X \in J) - P(Y \in J)| \le P(X \ne Y)$.
(ii) (Doeblin's Coupling Method) Let p be an aperiodic irreducible finite-state
transition matrix with invariant distribution $\pi$. Let $\{X_n\}$ and $\{Y_n\}$ be independent
Markov chains with transition law p and having respective initial distributions
$\pi$ and $\delta_i$. Let T denote the first time n that $X_n = Y_n$.
(a) Show $P(T < \infty) = 1$. [Hint: Let $\nu = \min\{n: p_{ij}^{(n)} > 0 \text{ for all } i, j \in S\}$. Argue

$$P(T > k\nu) \le P(X_{k\nu} \ne Y_{k\nu} \mid X_{(k-1)\nu} \ne Y_{(k-1)\nu}, \ldots, X_\nu \ne Y_\nu) \cdots P(X_{2\nu} \ne Y_{2\nu} \mid X_\nu \ne Y_\nu) P(X_\nu \ne Y_\nu) \le (1 - N\delta^2)^k,$$

where $\delta = \min_{i,j} p_{ij}^{(\nu)} > 0$, $N = |S|$.]

(b) Show that $\{(X_n, Y_n)\}$ is a Markov chain on $S \times S$ and T is a stopping time
for this Markov chain.
(c) Define $\{Z_n\}$ to be the process obtained by observing $\{Y_n\}$ until it meets $\{X_n\}$
and then watching $\{X_n\}$ from then on. Show that $\{Z_n\}$ is a Markov chain
with transition law p and invariant initial distribution $\pi$.
(d) Show $|P(Z_n = j) - P(X_n = j)| \le P(T \ge n)$.
(e) Show $|p_{ij}^{(n)} - \pi_j| \le P(T \ge n) \le (1 - N\delta^2)^{[n/\nu] - 1}$.

7. Let $A = ((a_{ij}))$ be an $N \times N$ matrix. Suppose that A is a transition probability
matrix with strictly positive entries $a_{ij}$.
(i) Show that the spectral radius, i.e., the magnitude of the largest eigenvalue of A,
must be 1. [Hint: Check first that $\lambda$ is an eigenvalue of A (in the sense $Ax = \lambda x$
has a nonzero solution $x = (x_1 \cdots x_N)'$) if and only if $zA = \lambda z$ has a nonzero
solution $z = (z_1 \cdots z_N)$; recall that $\det(B) = \det(B')$ for any $N \times N$ matrix B.]
(ii) Show that $\lambda = 1$ must be a simple eigenvalue of A (i.e., geometric multiplicity
1). [Hint: Suppose z is any (left) eigenvector corresponding to $\lambda = 1$. By the
results of this section there must be an invariant distribution (positive
eigenvector) $\pi$. For t sufficiently large $z + t\pi$ is also positive (and normalizable).]
*8. Let $A = ((a_{ij}))$ be an $N \times N$ matrix with positive entries. Show that the spectral radius
is also given by $\min\{\lambda > 0: Ax \le \lambda x \text{ for some positive } x\}$. [Hint: A and its transpose
A' have the same eigenvalues (why?) and therefore the same spectral radius. A' is
adjoint to A with respect to the usual (dot) inner product in the sense
$(Ax, y) = (x, A'y)$ for all x, y, where $(u, v) = \sum_{i=1}^{N} u_i v_i$. Apply the maximal property
to the spectral radius of A'.]
9. Let $p(x, y)$ be a continuous function on $[c, d] \times [c, d]$ with $c < d$. Assume that
$p(x, y) > 0$ and $\int_c^d p(x, y)\, dy = 1$. Let $\Omega$ denote the space of all sequences
$\omega = (x_0, x_1, \ldots)$ of numbers $x_i \in [c, d]$. Let $\mathcal{F}_0$ denote the class of all
finite-dimensional sets A of the form $A = \{\omega = (x_0, x_1, \ldots) \in \Omega: a_i < x_i < b_i,\ i = 0, 1, \ldots, n\}$,
where $c \le a_i < b_i \le d$ for each i. Define $P_x(A)$ for such a set A by

$$P_x(A) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} p(x, y_1) p(y_1, y_2) \cdots p(y_{n-1}, y_n)\, dy_n \cdots dy_1 \quad \text{for } x \in [a_0, b_0].$$

Define $P_x(A) = 0$ if $x < a_0$ or $x > b_0$. The Kolmogorov Extension Theorem assures
us that $P_x$ has a unique extension to a probability measure defined for all events in
the smallest sigma-field $\mathcal{F}$ of subsets of $\Omega$ that contains $\mathcal{F}_0$. For any nonnegative
integrable function $\gamma$ with integral 1, define

$$P_\gamma(A) = \int_c^d P_x(A)\, \gamma(x)\, dx, \qquad A \in \mathcal{F}.$$

Let $X_n$ denote the nth coordinate projection mapping on $\Omega$. Then $\{X_n\}$ is said to
be a Markov chain on the state space $S = [c, d]$ with transition density $p(x, y)$ and
initial density $\gamma$ under $P_\gamma$. Under $P_x$ the process is said to have initial state x.
(i) Prove the Markov property for $\{X_n\}$; i.e., the conditional distribution of $X_{n+1}$
given $X_0, \ldots, X_n$ is $p(X_n, y)\, dy$.
(ii) Compute the distribution of $X_n$ under $P_\gamma$.
(iii) Show that under $P_\gamma$ the conditional distribution of $X_n$ given $X_0 = x_0$ is
$p^{(n)}(x_0, y)\, dy$, where

$$p^{(n)}(x, y) = \int_c^d p^{(n-1)}(x, z)\, p(z, y)\, dz \quad \text{and} \quad p^{(1)} = p.$$

(iv) Show that if $\delta = \inf\{p(x, y): x, y \in [c, d]\} > 0$, then

$$\int_c^d |p(x, y) - p(z, y)|\, dy \le 2[1 - \delta(d - c)]$$

by breaking the integral into two terms involving y such that $p(x, y) > p(z, y)$
and those y such that $p(x, y) \le p(z, y)$.
(v) Show that there is a continuous strictly positive function $\pi(y)$ such that

$$\max\{|p^{(n)}(x, y) - \pi(y)|: c \le x, y \le d\} \le [1 - \delta(d - c)]^{n-1} \rho,$$

where $\rho = \max\{|p(x, y) - p(z, y)|: c \le x, y, z \le d\} < \infty$.

(vi) Prove that $\pi$ is an invariant distribution and moreover that under the present
conditions it is unique.
(vii) Show that $P_\gamma(X_n \in (a, b) \text{ i.o.}) = 1$, for any $c < a < b < d$. [Hint: Show that
$P_\gamma(X_n \notin (a, b) \text{ for } m < n \le M) \le [1 - \delta(b - a)]^{M-m}$. Calculate $P_\gamma(X_n \notin (a, b)$ for
all $n > m)$.]

10. (Random Walk on the Circle) Let $\{X_n\}$ be an i.i.d. sequence of random variables
taking values in [0, 1] and having a continuous p.d.f. $f(x)$. Let $\{S_n\}$ be the process
on [0, 1) defined by

$$S_n = x + X_1 + \cdots + X_n \bmod 1,$$

where $x \in [0, 1)$.

(i) Show that $\{S_n\}$ is a Markov chain and calculate its transition density.
(ii) Describe the time asymptotic behavior of $p^{(n)}(x, y)$.

(iii) Describe the invariant distribution.

11. (The Umbrella Problem) A person who owns r umbrellas distributes them between
home and office according to the following routine. If it is raining upon departure
from either place, an event that has probability p, say, then an umbrella is carried
to the other location if available at the location of departure. If it is not raining,
then an umbrella is not carried. Let $X_n$ denote the number of available umbrellas at
whatever place the person happens to be departing on the nth trip.
(i) Determine the transition probability matrix and the invariant distribution.
(ii) Let 0 < a < 1. How many umbrellas should the person own so that the
probability of getting wet under the equilibrium distribution is at most a against
a climate (p)? What number works against all possible climates for the
probability a?
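
For the 2-state chain of Exercise 1(iv) above, the convergence can be checked exactly in a small case. The sketch below uses a parametrization that is an assumption made here for illustration, not the book's notation: p = [[1 - a, a], [b, 1 - b]] with 0 < a, b < 1. It compares the n-step matrix with the closed form $\Pi + (1 - a - b)^n (I - \Pi)$, where both rows of $\Pi$ equal the invariant distribution (b/(a + b), a/(a + b)).

```python
from fractions import Fraction

def mat_mul(x, y):
    """Product of two square matrices given as nested lists."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(p, n):
    """n-th power of p by repeated multiplication (n >= 0)."""
    size = len(p)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

def two_state_closed_form(a, b, n):
    """Pi + (1 - a - b)**n * (I - Pi) for the chain [[1 - a, a], [b, 1 - b]]."""
    pi = (b / (a + b), a / (a + b))
    lam = 1 - a - b
    return [[pi[j] + lam**n * (int(i == j) - pi[j]) for j in range(2)]
            for i in range(2)]

a, b = Fraction(3, 10), Fraction(1, 10)
p = [[1 - a, a], [b, 1 - b]]
n_step = mat_pow(p, 7)
closed = two_state_closed_form(a, b, 7)
print(n_step == closed)
```

In this parametrization the distance to the invariant distribution shrinks by the factor |1 - a - b| at every step, which is the exponential rate the exercise asks for.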

Exercises for Section II.7

1. Calculate the invariant distributions for Exercise 2.6.
2. Calculate the invariant distribution for Exercise 5.6.
3. A transition matrix is called doubly-stochastic if its transpose p' is also a transition
matrix; i.e., if the elements of each column add to 1.
(i) Show that the vector consisting entirely of 1's is invariant under p and can be
normalized to a probability distribution if S is finite.
(ii) Under what additional conditions is this distribution the unique invariant
distribution?
4. (i) Suppose that $\pi = (\pi_i)$ is an invariant distribution for p. The distribution $P_\pi$ of
the process is called time-reversible if $\pi_i p_{ij} = \pi_j p_{ji}$ for all $i, j \in S$ [$\pi$ is often said
to be time-reversible (with respect to p) as well]. Show that if S is finite and p
is doubly stochastic, then the (discrete) uniform distribution makes the process
time-reversible if and only if p is symmetric.
(*ii) Suppose that $\{X_n\}$ is a Markov chain with invariant distribution $\pi$ and started
in $\pi$. Then $\{X_n\}$ is a stationary process and therefore has an extension backward
in time to $n = 0, \pm 1, \pm 2, \ldots$. [Use Kolmogorov's Extension Theorem.] Define
the time-reversed process by $Y_n = X_{-n}$. Show that the reversed process $\{Y_n\}$ is
a Markov chain with 1-step transition probabilities $q_{ij} = \pi_j p_{ji}/\pi_i$.
(iii) Show that under the time-reversibility condition (i), the processes in (ii), $\{Y_n\}$
and $\{X_n\}$, have the same distribution; i.e., in equilibrium a movie of the evolution
looks the same statistically whether run forward or backward in time.
(iv) Show that an irreducible Markov chain on a state space S with an invariant
initial distribution $\pi$ is time-reversible if and only if (Kolmogorov Condition):

$$p_{i i_1} p_{i_1 i_2} \cdots p_{i_k i} = p_{i i_k} p_{i_k i_{k-1}} \cdots p_{i_1 i} \quad \text{for all } i, i_1, \ldots, i_k \in S,\ k \ge 1.$$

(v) If there is a $j \in S$ such that $p_{ij} > 0$ for all $i \ne j$ in (iv), then for time-reversibility
it is both necessary and sufficient that $p_{ij} p_{jk} p_{ki} = p_{ik} p_{kj} p_{ji}$ for all i, j, k.
5. (Random Walk on a Tree) A tree graph on r vertices $v_1, v_2, \ldots, v_r$ is a connected
graph that contains no cycles. [That is, there is given a collection of unordered pairs
of distinct vertices (called edges) with the following property: Any two distinct vertices
$u, v \in S$ are uniquely connected in the sense that there is a unique sequence
$e_1, e_2, \ldots, e_n$ of edges $e_i = \{v_{k_i}, v_{m_i}\}$ such that $u \in e_1$, $v \in e_n$, $e_i \cap e_{i+1} \ne \emptyset$,
$i = 1, \ldots, n - 1$.] The degree $\nu_i$ of the vertex $v_i$ represents the number
of vertices adjacent to $v_i$, where $u, v \in S$ are called adjacent if there is an edge $\{u, v\}$.
By a tree random walk on a given tree graph we mean a Markov chain on the state
space $S = \{v_1, v_2, \ldots, v_r\}$ that at each time step n changes its state $v_i$ to one of its $\nu_i$
randomly selected adjacent states, with equal probabilities and independently of its
states prior to time n.
(i) Explain that such a Markov chain must have a unique invariant distribution.
(ii) Calculate the invariant distribution in terms of the vertex degrees $\nu_i$, $i = 1, \ldots, r$.
(iii) Show that the invariant distribution makes the tree random walk time-reversible.
6. Let $\{X_n\}$ be a Markov chain on S and define $Y_n = (X_n, X_{n+1})$, $n = 0, 1, 2, \ldots$.
(i) Show that $\{Y_n\}$ is a Markov chain on $S' = \{(i, j) \in S \times S: p_{ij} > 0\}$.
(ii) Show that if $\{X_n\}$ is irreducible and aperiodic then so is $\{Y_n\}$.
(iii) Show that if $\{X_n\}$ has invariant distribution $\pi = (\pi_i)$ then $\{Y_n\}$ has invariant
distribution $(\pi_i p_{ij})$.
7. Let $\{X_n\}$ be an irreducible Markov chain on a finite state space S. Define a graph G
having states of S as vertices with edges joining i and j if and only if either $p_{ij} > 0$
or $p_{ji} > 0$.
(i) Show that G is connected; i.e., for any two sites i and j there is a path of edges
from i to j.
(ii) Show that if $\{X_n\}$ has an invariant distribution $\pi$ then for any $A \subset S$,

$$\sum_{i \in A} \sum_{j \in S \setminus A} \pi_i p_{ij} = \sum_{i \in A} \sum_{j \in S \setminus A} \pi_j p_{ji}$$

(i.e., the net probability flux across a cut of S into complementary subsets A, $S \setminus A$
is in balance).
(iii) Show that if G contains no cycles (i.e., is a tree graph in the sense of Exercise
5), then the process is time-reversible started with $\pi$.
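
A candidate answer to Exercise 5(ii) above can be tested on a small example before attempting a proof. The sketch below takes a hypothetical five-vertex tree (vertex labels 0 to 4 and the edge list are illustrative), proposes the degree-proportional distribution, and checks invariance and the time-reversibility condition of Exercise 4(i) exactly.

```python
from fractions import Fraction

def tree_walk(edges, r):
    """Transition matrix of the nearest-neighbor walk on a graph with vertices
    0, ..., r - 1, together with the degree-proportional distribution."""
    neighbors = {v: set() for v in range(r)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    p = [[Fraction(1, len(neighbors[u])) if v in neighbors[u] else Fraction(0)
          for v in range(r)] for u in range(r)]
    total_degree = sum(len(neighbors[v]) for v in range(r))  # 2(r - 1) for a tree
    pi = [Fraction(len(neighbors[v]), total_degree) for v in range(r)]
    return p, pi

edges = [(0, 1), (1, 2), (1, 3), (3, 4)]  # an illustrative tree on 5 vertices
p, pi = tree_walk(edges, 5)

# invariance: sum_i pi_i p_ij = pi_j for every j
invariant = all(sum(pi[i] * p[i][j] for i in range(5)) == pi[j] for j in range(5))
# time-reversibility: pi_i p_ij = pi_j p_ji for every pair (i, j)
reversible = all(pi[i] * p[i][j] == pi[j] * p[j][i]
                 for i in range(5) for j in range(5))
print(pi, invariant, reversible)
```

Both checks pass because each side of the balance equation equals one over the total degree whenever the two vertices are adjacent, and zero otherwise.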

Exercises for Section II.8

1. Verify (8.10) using summation by parts as indicated. [Hint: Let Z be nonnegative
integer-valued. Then

$$\sum_{r=0}^{\infty} P(Z > r) = \sum_{r=0}^{\infty} \sum_{n=r+1}^{\infty} P(Z = n).$$

Now interchange the order of summation.]

2. Classify the states in Examples 3.1-3.7 as transient or recurrent.
3. Show that if j is transient and $i \to j$ then $\sum_{n=0}^{\infty} p_{ij}^{(n)} < \infty$ and, in particular, $p_{ij}^{(n)} \to 0$
as $n \to \infty$. [Hint: Represent N(j) as a sum of indicator variables and use (8.11).]
4. Classify the states for the models in Exercises 2.6, 2.7, 2.8, 2.9 as transient or recurrent.
5. Classify the states for $\{R_n\} = \{|S_n|\}$, where $S_n$ is the simple symmetric random walk
starting at 0 (see Exercise 1.8).
6. Show that inessential states are transient.
7. (A Birth or Collapse Model) Let

$$p_{i,i+1} = \frac{1}{i+1}, \qquad p_{i,0} = \frac{i}{i+1}, \qquad i = 0, 1, 2, \ldots.$$

Determine whether p is transient or recurrent.

8. Solve Exercise 7 when

$$p_{i,0} = \frac{1}{i+1}, \qquad p_{i,i+1} = \frac{i}{i+1}, \quad i \ge 1, \qquad p_{0,1} = 1.$$

9. Let $p_{i,i+1} = p$, $p_{i,0} = q$, $i = 0, 1, 2, \ldots$. Classify the states of $S = \{0, 1, 2, \ldots\}$ as
transient or recurrent ($0 < p < 1$, $q = 1 - p$).
10. Fix $i, j \in S$. Write

$$r_n = P_i(X_n = j) \equiv p_{ij}^{(n)} \quad (n \ge 1), \qquad r_0 = 1,$$

$$f_n = P_i(X_m \ne j \text{ for } m < n,\ X_n = j) \quad (n \ge 1).$$

(i) Prove (see (5.5) of Chapter I) that $r_n = \sum_{m=1}^{n} f_m r_{n-m}$ ($n \ge 1$).
(ii) Sum (i) over n to give an alternative proof of (8.11).
(iii) Use (i) to indicate how one may compute the distribution of the first visit to
state j (after time zero), starting in state i, in terms of $p_{ij}^{(n)}$ ($n \ge 1$).

11. (i) Show that if $\|\cdot\|$ and $\|\cdot\|_0$ are any two norms on $\mathbb{R}^k$, then there are positive
constants $c_1, c_2$ such that

$$c_1 \|x\| \le \|x\|_0 \le c_2 \|x\| \quad \text{for all } x \in \mathbb{R}^k.$$

A norm on $\mathbb{R}^k$ is a nonnegative function $x \to \|x\|$ on $\mathbb{R}^k$ with the properties that

(a) $\|cx\| = |c|\, \|x\|$ for $c \in \mathbb{R}$, $x \in \mathbb{R}^k$,
(b) $\|x\| = 0$, $x \in \mathbb{R}^k$, iff $x = 0$,
(c) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^k$.
[Hint: Use compactness of the unit ball in the case of the Euclidean norm,
$\|x\| = (x_1^2 + \cdots + x_k^2)^{1/2}$, $x = (x_1, \ldots, x_k)$.]

(ii) Show that the stability condition given in Example 1 implies that $X_n \to 0$ in
every norm.

Exercises for Section II.9

1. Let $Y_1, Y_2, \ldots$ be i.i.d. random variables.
(i) Show that if $E|Y_1| < \infty$ then $\max(Y_1, \ldots, Y_n)/n \to 0$ a.s. as $n \to \infty$.
(ii) Verify (9.8) under the assumption (9.4). [Hint: Show that $P(|Y_n| > n\varepsilon \text{ i.o.}) = 0$
for every $\varepsilon > 0$.]
2. Verify for (9.12) that

lim = E(T; 2) t) =

provided E j (tj(' ) ) < co.


Jim ' = o0
n w Nn

$P_i$-a.s. for a null-recurrent state $j$ such that $i \leftrightarrow j$.

4. Show that positive and null recurrence are class properties.
5. Let $\{X_n\}$ be an irreducible aperiodic positive recurrent Markov chain having
transition matrix $p$ on a denumerable state space $S$. Define a transition matrix $q$
on $S \times S$ by

\[ q_{(i,j),(k,l)} = p_{ik}\, p_{jl}, \qquad (i, j), (k, l) \in S \times S. \]

Let $\{Z_n\}$ be a Markov chain on $S \times S$ with transition law $q$. Define
$T_D = \inf\{n : Z_n \in D\}$, where $D = \{(i, i) : i \in S\} \subset S \times S$. Show that if $P_{(i,j)}(T_D < \infty) = 1$
for all $i, j$ then for all $i, j \in S$, $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$, where $\{\pi_j\}$ is the unique invariant
distribution. [Hint: Use the coupling method described in Exercise 6.6 for finite
state space.]


6. (General Birth-Collapse) Let $p$ be a transition probability matrix on
$S = \{0, 1, 2, \ldots\}$ of the form $p_{i,i+1} = p_i$, $p_{i,0} = 1 - p_i$, $i = 0, 1, 2, \ldots$, $0 < p_i < 1$,
$i \geq 1$, $p_0 = 1$. Show:
(i) All states are recurrent

\[ \text{iff } \lim_{k \to \infty} \prod_{j=1}^{k} p_j = 0 \quad \text{iff } \sum_{j=1}^{\infty} (1 - p_j) = \infty. \]

(ii) If all states are recurrent, then positive recurrence holds

\[ \text{iff } \sum_{k=1}^{\infty} \prod_{j=1}^{k} p_j < \infty. \]

(iii) Calculate the invariant distribution in the case $p_j = 1/(j + 2)$.

7. Let $\{S_n\}$ be the simple symmetric random walk starting at the origin. Define
$f: \mathbb{Z} \to \mathbb{R}^1$ by $f(n) = 1$ if $n \geq 0$ and $f(n) = 0$ if $n < 0$. Describe the behavior of
$[f(S_0) + \cdots + f(S_{n-1})]/n$ as $n \to \infty$.

8. Calculate the invariant distribution for the Renewal Model of Exercise 3.8, in the
case that p"=pn -1 (1 p),n=0, 1,2,. . . where 0 <p < 1.
9. (One-Dimensional Nearest-Neighbor Ising Model) The one-dimensional nearest-neighbor
Ising model of magnetism consists of a random distribution of $\pm 1$-valued
random variables (spins) at the sites of the integers $n = 0, \pm 1, \pm 2, \ldots$. The
parameters of the model are the inverse temperature $\beta = 1/(kT) > 0$, where $T$ is
temperature and $k$ is a universal constant called Boltzmann's constant, an external
field parameter $H$, and an interaction parameter (coupling constant) $J$. The spin
variables $X_n$, $n = 0, \pm 1, \pm 2, \ldots$, are distributed according to a stochastic
process on $\{-1, 1\}$ indexed by $\mathbb{Z}$ with the Markov property and having stationary
transition law given by

\[ p(X_{n+1} = \eta \mid X_n = \sigma) = \frac{\exp\{\beta J \sigma\eta + \beta H \eta\}}{2\cosh(\beta H + \beta J \sigma)} \]

for $\sigma, \eta \in \{+1, -1\}$, $n = 0, \pm 1, \pm 2, \ldots$; by the Markov property is meant that
the conditional distribution of $X_{n+1}$ given $\{X_k : k \leq n\}$ does not depend on
$\{X_k : k \leq n - 1\}$.
(i) Calculate the unique invariant distribution $\pi$ for $p$.
(ii) Calculate the large-scale magnetization (i.e., ability to pick up nails), defined by

\[ M_N = [X_{-N} + \cdots + X_N]/(2N + 1) \]

in the so-called bulk (thermodynamic) limit as $N \to \infty$.

(iii) Calculate and plot the graph (i.e., magnetic isotherm) of $EX_0$ as a function of
$H$ for fixed temperature. Show that in the limit as $H \to 0^+$ or $H \to 0^-$ the
bulk magnetization $EX_0$ tends to 0; i.e., there is no (zero) residual magnetization
remaining when $H$ is turned off at any temperature.

(iv) Determine when the process (in equilibrium) is reversible for the invariant
distribution; modify Exercise 7.4 accordingly.
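
The two-state structure of the Ising chain in Exercise 9 makes it easy to explore numerically. The following is only a sketch, assuming the transition law has the form $\exp\{\beta(J\sigma\eta + H\eta)\}/(2\cosh(\beta(H + J\sigma)))$; the function names and the parameter values are illustrative choices, not part of the text.

```python
import math

def ising_transition(beta, J, H):
    """2x2 transition matrix on spins (-1, +1); rows = current spin, columns = next spin."""
    spins = (-1, 1)
    return [[math.exp(beta * (J * s * e + H * e)) / (2.0 * math.cosh(beta * (H + J * s)))
             for e in spins] for s in spins]

def invariant_two_state(P):
    """Invariant distribution of a two-state chain: proportional to (P[1][0], P[0][1])."""
    a, b = P[0][1], P[1][0]   # a = P(-1 -> +1), b = P(+1 -> -1)
    return [b / (a + b), a / (a + b)]

# Hypothetical parameter values, chosen only for illustration.
P = ising_transition(beta=1.0, J=0.5, H=0.2)
pi = invariant_two_state(P)
magnetization = -pi[0] + pi[1]   # E X_0 under the invariant distribution

# With the field switched off the two spins are exchangeable.
P0 = ising_transition(beta=1.0, J=0.5, H=0.0)
pi0 = invariant_two_state(P0)
```

Evaluating `magnetization` over a grid of `H` values gives a numerical version of the isotherm asked for in part (iii).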
10. Show that if $\{X_n\}$ is an irreducible positive-recurrent Markov chain then the
condition (iv) of Exercise 7.4 is necessary and sufficient for time-reversibility of the
stationary process started with distribution $\pi$.
*11. An invariant measure for a transition matrix $((p_{ij}))$ is a sequence of nonnegative
numbers $(m_i)$ such that $\sum_i m_i p_{ij} = m_j$ for all $j \in S$. An invariant measure may or
may not be normalizable to a probability distribution on $S$.
(i) Let $p_{i,i+1} = p_i$ and $p_{i,0} = 1 - p_i$ for $i = 0, 1, 2, \ldots$. Show that there is a unique
invariant measure (up to multiples) if and only if $\lim_{n \to \infty} \prod_{k=1}^{n} p_k = 0$; i.e., if
and only if the chain is recurrent, since the product is the probability of no
return to the origin.
(ii) Show that invariant measures exist for the unrestricted simple random walk
but are not unique in the transient case, and are unique (up to multiples) in
the (null) recurrent case.
(iii) Let $p_{00} = p_{01} = \tfrac{1}{2}$ and $p_{i,i-1} = p_{i,i} = 2^{-i-2}$, and $p_{i,i+1} = 1 - 2^{-i-1}$, $i = 1, 2, 3,
\ldots$. Show that the probability of not returning to 0 is positive (i.e., transience),
but that there is a unique invariant measure.
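12. Let $\{Y_n\}$ be any sequence of random variables having finite second moments and
let $\gamma_{n,m} = \operatorname{Cov}(Y_n, Y_m)$, $\mu_n = EY_n$, $\sigma_n^2 = \operatorname{Var}(Y_n) = \gamma_{n,n}$, and $\rho_{n,m} = \gamma_{n,m}/(\sigma_n \sigma_m)$.
(i) Verify that $-1 \leq \rho_{n,m} \leq 1$ for all $n$ and $m$. [Hint: Use the Schwarz Inequality.]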
(ii) Show that if p . , _< f(In ml), where f is a nonnegative function such that
n_ 2 Yk=f(k)Y n =1 Qk -*0 as n -* oo, then the WLLN holds for {}}.
(iii) Verify that if = p o ^_^ = p(ln ml) > 0, then it is sufficient that
p(k) -*0 as n -+ oc for the WLLN.
(iv) Show that in the case of nonpositive correlations it is sufficient that
n -1 Ik= 1 ak -+0 as n , oo for the WLLN.
13. Let $p$ be the transition probability matrix for the asymmetric random walk on
$S = \{0, 1, 2, \ldots\}$ with 0 absorbing and $p_{i,i+1} = p > \tfrac{1}{2}$ for $i \geq 1$. Explain why for
fixed $i > 0$,

\[ \frac{1}{n} \sum_{m=1}^{n} p_{ij}^{(m)}, \qquad j \in S, \]

does not converge to the invariant distribution $\delta_0(\{j\})$ (as $n \to \infty$). How can this
be modified to get convergence?
14. (Iterated Averaging)
(i) Let $a_1, a_2, a_3$ be three numbers. Define $a_4 = (a_1 + a_2 + a_3)/3$, $a_5 =
(a_2 + a_3 + a_4)/3, \ldots$. Show that $\lim_{n \to \infty} a_n = (a_1 + 2a_2 + 3a_3)/6$.
(ii) Let $p$ be an irreducible positive recurrent transition law and let $a_1, a_2, \ldots$ be
any bounded sequence of numbers. Show that

\[ \lim_{n \to \infty} \sum_j p_{ij}^{(n)} a_j = \sum_j a_j \pi_j, \]

where $(\pi_j)$ is the invariant distribution of $p$. Show that the result of (i) is a
special case.
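
The limit in Exercise 14(i) can be checked numerically before proving it. The sketch below simply iterates the averaging rule; the starting values are arbitrary and the function name is an illustrative choice. The observation behind the formula is that $a_n + 2a_{n+1} + 3a_{n+2}$ does not change from one step to the next.

```python
def iterate_average(a1, a2, a3, steps=200):
    """Return the newest term after `steps` applications of a_{n+3} = (a_n + a_{n+1} + a_{n+2}) / 3."""
    x, y, z = float(a1), float(a2), float(a3)
    for _ in range(steps):
        x, y, z = y, z, (x + y + z) / 3.0
    return z

# Arbitrary starting values, for illustration only.
a1, a2, a3 = 1.0, 5.0, -2.0
limit = iterate_average(a1, a2, a3)
predicted = (a1 + 2 * a2 + 3 * a3) / 6.0
```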

Exercises for Section II.10

1. (i) Let $Y_1, Y_2, \ldots$ be i.i.d. with $EY_1^2 < \infty$. Show that $\max(Y_1, \ldots, Y_n)/\sqrt{n} \to 0$ a.s.
as $n \to \infty$. [Hint: Show that $P(Y_n^2 > n\varepsilon \text{ i.o.}) = 0$ for every $\varepsilon > 0$.]
(ii) Verify that $n^{-1/2} S_n$ has the same limiting distribution as (10.6).

2. Let $\{W_n(t) : t \geq 0\}$ be the path process defined in (10.7). Let $t_1 < t_2 < \cdots < t_k$, $k \geq 1$,
be an arbitrary finite set of time points. Show that $(W_n(t_1), \ldots, W_n(t_k))$ converges in
distribution as $n \to \infty$ to the multivariate Gaussian distribution with mean zero and
variance-covariance matrix $((D \min\{t_i, t_j\}))$, where $D$ is defined by (10.12).
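3. Suppose that $\{X_n\}$ is a Markov chain with state space $S = \{1, 2, \ldots, r\}$ having unique
invariant distribution $(\pi_i)$. Let

\[ N_n(i) = \#\{k \leq n : X_k = i\}, \qquad i \in S. \]

Show that

\[ \sqrt{n} \left( \frac{N_n(1)}{n} - \pi_1, \ldots, \frac{N_n(r)}{n} - \pi_r \right) \]

is asymptotically Gaussian with mean 0 and variance-covariance matrix $\Gamma = ((\gamma_{ij}))$, where

\[ \gamma_{ij} = \delta_{ij}\pi_i - \pi_i\pi_j + \sum_{k=1}^{\infty} \bigl(\pi_i p_{ij}^{(k)} - \pi_i\pi_j\bigr) + \sum_{k=1}^{\infty} \bigl(\pi_j p_{ji}^{(k)} - \pi_j\pi_i\bigr), \qquad 1 \leq i, j \leq r. \]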

4. For the one-dimensional nearest-neighbor Ising model of Exercise 9.9:
(i) Calculate the pair correlations $\rho_{n,m} = \operatorname{Cov}(X_n, X_m)$.
(ii) Calculate the large-scale variance (magnetic susceptibility) parameter $\operatorname{Var}(X_0)$.
(iii) Describe the distribution of the fluctuations in the (bulk limit) magnetization
(cf. Exercise 9.9(i)).

5. Let $\{X_n\}$ be a Markov chain on $S$ and define $Y_n = (X_n, X_{n+1})$, $n = 0, 1, 2, \ldots$. Let
$p = ((p_{ij}))$ be the transition matrix for $\{X_n\}$.
(i) Show that $\{Y_n\}$ is a Markov chain on the state space defined by
$S' = \{(i, j) \in S \times S : p_{ij} > 0\}$.
(ii) Show that if $\{X_n\}$ is irreducible and aperiodic then so is $\{Y_n\}$.
(iii) Suppose that $\{X_n\}$ has invariant distribution $\pi = (\pi_i)$. Calculate the invariant
distribution of $\{Y_n\}$.
(iv) Let $(i, j) \in S'$ and let $T_n$ be the number of one-step transitions from $i$ to $j$ by
$X_0, X_1, \ldots, X_n$ started with the invariant distribution of (iii). Calculate
$\lim_{n \to \infty} (T_n/n)$ and describe the fluctuations about the limit for large $n$.
6. (Large-Sample Consistency in Statistical Parameter Estimation) Let $X_n = 1$ or 0
according to whether the nth day at a specified location is wet (rain) or dry. Assume
$\{X_n\}$ is a two-state Markov chain with parameters $\beta = P(X_{n+1} = 1 \mid X_n = 0)$ and
$\delta = P(X_{n+1} = 0 \mid X_n = 1)$, $n = 0, 1, 2, \ldots$, $0 < \beta < 1$, $0 < \delta < 1$. Suppose that $\{X_n\}$
is in equilibrium with the invariant initial distribution $\pi = (\pi_1, \pi_0)$. Define statistics
based on the sample X 0 , X 1 , ... , X" to estimate /3, it , respectively, by t" = S/(n + 1)

and ^" = T"/n, where S. = X0 + + X. is the number of wet days and


_ >. k=O l ((Xk.Xk l)=(0.I)) is the number of dry-to-wet transitions. Calculate

n(" and lim". ^"
) )

7. Use the result of Exercise 1.5 to describe an extension of the SLLN and the CLT to
certain rth-order dependent Markov chains.

Exercises for Section II.11

1. Let $\{X_n\}$ be a two-state Markov chain on $S = \{0, 1\}$ and let $\tau_0$ be the first time
$\{X_n\}$ reaches 0. Calculate $P_1(\tau_0 = n)$, $n \geq 1$, in terms of the parameters $p_{10}$ and $p_{01}$.
2. Let $\{X_n\}$ be a three-state Markov chain on $S = \{0, 1, 2\}$ where 0, 1, 2 are arranged
counterclockwise on a circle, and at each time a transition occurs one unit clockwise
with probability $p$ or one unit counterclockwise with probability $1 - p$. Let $\tau_0$ denote
the time of the first return to 0. Calculate $P(\tau_0 > n)$, $n > 1$.
3. Let $\tau_0$ denote the first time starting in state 2 that the Markov chain in Exercise
5.6 reaches state 0. Calculate $P_2(\tau_0 > n)$.

4. Verify that the Markov chains starting at $i$ having transition probabilities $p$ and $\hat{p}$,
and viewed up to time $\tau_A$, have the same distribution by calculating the probabilities
of the event $\{X_0 = i, X_1 = i_1, \ldots, X_m = i_m, \tau_A = m\}$ under each of $p$ and $\hat{p}$.
5. Write out a detailed explanation of (11.22).
6. Explain the calculation of (11.28) and (11.29) as given in the text using earlier results
on the long-term behavior of transition probabilities.
7. (Collocation) Show that there is a unique polynomial $p(x)$ of degree $k$ that takes
prescribed (given) values $v_0, v_1, \ldots, v_k$ at any prescribed (given) distinct points
$x_0, x_1, \ldots, x_k$, respectively; such a polynomial is called a collocation polynomial.
[Hint: Write down a linear system with the coefficients $a_0, a_1, \ldots, a_k$ of $p(x)$ as the
unknowns. To show the system is nonsingular, view the determinant as a polynomial
and identify all of its zeros.]
*8. (Absorption Rates and the Spectral Radius) Let $p$ be a transition probability matrix
for a finite-state Markov chain and let $\tau_j$ be the time of the first visit to $j$. Use (11.4)
and the results of the Perron-Frobenius Theorem 6.4 and its corollary to show that
exponential rates of convergence (as obtained in (11.43)) can be anticipated more

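9. Let $p$ be the transition probability matrix on $S = \{0, 1, 2, \ldots\}$ defined by

\[ p_{ij} = \begin{cases} \dfrac{1}{i} & \text{if } i > 0,\ j = 0, 1, 2, \ldots, i - 1, \\ 1 & \text{if } i = 0,\ j = 0, \\ 0 & \text{if } i = 0,\ j \neq 0. \end{cases} \]

(i) Calculate the absorption rate.
(ii) Show that the mean time to absorption starting at $i > 0$ is given by $\sum_{k=1}^{i} (1/k)$.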
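
The claim in Exercise 9(ii) can be sanity-checked by first-step analysis. The sketch below assumes the transition law is uniform over the lower states, i.e. $p_{ij} = 1/i$ for $i > 0$ and $j = 0, 1, \ldots, i - 1$, so that the mean absorption times satisfy $m_0 = 0$ and $m_i = 1 + \frac{1}{i}\sum_{j<i} m_j$. Exact rational arithmetic avoids rounding; the helper names are illustrative.

```python
from fractions import Fraction

def mean_absorption_times(n_max):
    """m[i] = expected number of steps to reach 0 from i, for i = 0..n_max."""
    m = [Fraction(0)]
    for i in range(1, n_max + 1):
        # First-step analysis: one step, then continue from a uniform state j < i.
        m.append(1 + sum(m, Fraction(0)) / i)
    return m

def harmonic(i):
    """Harmonic number 1 + 1/2 + ... + 1/i as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, i + 1)), Fraction(0))

m = mean_absorption_times(10)
```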
10. Let {X} be the simple branching process on S = {0, 1, 2, ...} with offspring
distribution { fj }, f j a jfj 1.
(i) Show that all nonzero states in S are transient and that lim^ P 1 (X = k) = 0,
(ii) Describe the unique invariant probability distribution for {X}.
11. (i) Suppose that in a certain society each parent has exactly two children, and
both males and females are equally likely to occur. Show that passage of the
family surname to descendants of males eventually stops.
(ii) Calculate the extinction probability for the male lineage as in (i) if each parent
has exactly three children.
(iii) Prompted by an interest in the survival of family surnames, A. J. Lotka (1939),
"Théorie Analytique des Associations Biologiques II," Actualités Scientifiques
et Industrielles (N. 780), Hermann et Cie, Paris, used data for white males in
the United States in 1920 to estimate the probability function $f$ for the
number of male children of a white male. He estimated $f(0) = 0.4825$,
$f(j) = (0.2126)(0.5893)^{j-1}$ $(j = 1, 2, \ldots)$.

(a) Calculate the mean number of male offspring.

(b) Calculate the probability of survival of the family surname if there is only
one male with the given name.
(c) What is the survival probability forj males with the given name under this
(iv) (Maximal Branching) Consider the following modification of the simple
branching process in which if there are $k$ individuals in the nth generation, and
if $X_{n,1}, X_{n,2}, \ldots, X_{n,k}$ are independent random variables representing their
respective numbers of offspring, then the $(n + 1)$st generation will contain
$Z_{n+1} = \max\{X_{n,1}, X_{n,2}, \ldots, X_{n,k}\}$ individuals. In terms of the survival of family
names one may assume that only the son providing the largest number of
grandsons is entitled to inherit the family title in this model (due to J. Lamperti
(1970), "Maximal Branching Processes and Long-Range Percolation," J. Appl.
Probability, 7, pp. 89-98).
(a) Calculate the transition law for the successive size of the generations when
the offspring distribution function is $F(x)$, $x = 0, 1, 2, \ldots$.
(b) Consider the case $F(x) = 1 - (1/x)$, $x = 1, 2, 3, \ldots$. Show that

\[ \lim_{k \to \infty} P(Z_{n+1} \leq kx \mid Z_n = k) = e^{-1/x}, \qquad x > 0. \]
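
For part (iii) of Exercise 11, Lotka's estimates can be explored numerically. The sketch below assumes the offspring law $f(0) = 0.4825$, $f(j) = (0.2126)(0.5893)^{j-1}$ for $j \geq 1$, and uses the standard fact that the extinction probability is the smallest fixed point in $[0, 1]$ of the offspring generating function, reached by iterating from 0. The variable and function names are illustrative. Because the published figures are rounded, the probabilities sum to slightly more than 1.

```python
F0, B, C = 0.4825, 0.2126, 0.5893   # f(0), and f(j) = B * C**(j - 1) for j >= 1

def pgf(s):
    """Generating function sum_j f(j) s^j, with the geometric tail summed in closed form."""
    return F0 + B * s / (1.0 - C * s)

# Mean number of male offspring: sum_{j>=1} j * B * C**(j - 1) = B / (1 - C)^2.
mean_offspring = B / (1.0 - C) ** 2

def extinction_probability(iterations=2000):
    """Smallest root of pgf(s) = s in [0, 1], via the iteration s <- pgf(s) from s = 0."""
    s = 0.0
    for _ in range(iterations):
        s = pgf(s)
    return s

def survival_probability(j, rho):
    """Survival probability with j initial males, whose lines evolve independently."""
    return 1.0 - rho ** j

rho = extinction_probability()
survival_one = survival_probability(1, rho)
```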

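12. Let $f$ be the offspring distribution function for a simple branching process having
finite second moment. Let $\mu = \sum_k k f(k)$, $\sigma^2 = \sum_k (k - \mu)^2 f(k)$. Show that, given
$X_0 = 1$,

\[ \operatorname{Var} X_n = \begin{cases} \sigma^2 \mu^{n-1} (\mu^n - 1)/(\mu - 1) & \text{if } \mu \neq 1, \\ n\sigma^2 & \text{if } \mu = 1. \end{cases} \]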
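
One way to approach Exercise 12 is through the conditional variance decomposition, which (given $X_0 = 1$, so that $EX_k = \mu^k$) yields the recursion $\operatorname{Var} X_{k+1} = \mu^2 \operatorname{Var} X_k + \sigma^2 \mu^k$. The sketch below compares that recursion with the closed form numerically; the function names and test values are illustrative assumptions.

```python
def var_by_recursion(mu, sigma2, n):
    """Var X_n from Var X_{k+1} = mu^2 Var X_k + sigma2 * mu^k, starting from Var X_0 = 0."""
    v = 0.0
    for k in range(n):
        v = mu * mu * v + sigma2 * mu ** k
    return v

def var_closed_form(mu, sigma2, n):
    """Closed form stated in the exercise, with the critical case mu = 1 handled separately."""
    if mu == 1.0:
        return n * sigma2
    return sigma2 * mu ** (n - 1) * (mu ** n - 1.0) / (mu - 1.0)
```

For example, `var_by_recursion(1.3, 2.0, 6)` and `var_closed_form(1.3, 2.0, 6)` should agree up to rounding.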

13. Each of the following distributions below depends on a single parameter. Construct
graphs of the nonextinction probability and the expected sizes of the successive
generations as a function of the parameter.


(i) $f(j) = \begin{cases} p & \text{if } j = 2, \\ q & \text{if } j = 0, \\ 0 & \text{otherwise;} \end{cases}$

(ii) $f(j) = q p^j$, $j = 0, 1, 2, \ldots$;

(iii) f(j)=^ i j

14. (Electron Multiplier) A weak current of electrons may be amplified by a device

consisting of a series of plates. Each electron, as it strikes a plate, gives rise to a
random number of electrons, which go on to strike the next plate to produce more
electrons, and so on. Use the simple branching process with a Poisson offspring
distribution for the numbers of electrons produced at successive plates.
(i) Calculate the mean and variance in the amplification of a single electron at the
nth plate (see Exercise 12).
(ii) Calculate the survival probability of a single electron in an infinite succession
of plates if $\mu = 1.01$.
15. (A Generalized Gambler's Ruin) Gamblers 1 and 2 have respective initial capitals
$i > 0$ and $c - i > 0$ (whole numbers) of dollars. The gamblers engage in a series of
fair bets (in whole numbers of dollars) that stops when and only when one of the
gamblers goes broke. Let $X_n$ denote gambler 1's capital at the nth play ($n \geq 1$),
$X_0 = i$. Gambler 1 is allowed to select a different game (to bet) at each play subject
to the condition that it be fair in the sense that $E(X_n \mid X_{n-1}) = X_{n-1}$, $n = 1, 2, \ldots$,
and that the amounts wagered be covered by the current capitals of the respective
gamblers. Assume that gambler 1's selection of a game for the $(n + 1)$st play ($n \geq 0$)
depends only on the current capital $X_n$ so that $\{X_n\}$ is a Markov chain with stationary
transition probabilities. Also assume that $P(X_{n+1} = i \mid X_n = i) < 1$, $0 < i < c$,
although it may be possible to break even in a play. Calculate the probability that
gambler 1 will eventually go broke. How does this compare to classical gambler's
ruin (win or lose \$1 bets placed on outcomes of fair coin tosses)? [Hint: Check that
$a_c(i) = i/c$, $0 \leq i \leq c$, $a_0(i) = (c - i)/c$, $0 \leq i \leq c$ (uniquely) solves (11.13) using the
equation $E(X_n \mid X_{n-1}) = X_{n-1}$.]

Exercises for Section II.12

1. (i) Show that for (12.10) it is sufficient to check that

9,(a) _
_ q(b, a)q(a, c)
a, b, cc S. [Hint: Y_ g(a) = 1, b, c e S.]
9,(0) q(b, O)q(O, c)' a

(ii) Use (12.11), (12.12) to show that this condition can be expressed as

9(a) 9e.o(b) 9e.e( 0 ) 9e.^(a)

a, b, cc S.
9n,( 8 ) g9(0) 9e,e(b) 9e.,,(0)


(iii) Consider four adjacent sites on Z in states a, , a, b, respectively. For notational

convenience, let [a, , a, b] = P(X0 = a, X, = , X 2 = a, X 3 = b). Use the
condition (ii) of Theorem 12.1 to show

[a, , a, b] = P(X0 = a, X 2 = a, X3 = b)g".a()

and, therefore,

[a, , a, b] _ g()
[a, /3', a, b] g,,,(')

(iv) Along the lines of (iii) show that also

[a, , a, b] g (a)
[a, , a', b] g.b(a')

Use the "substitution scheme" of (iii) and (iv) to verify (12.10) by checking (ii).
2. (i) Verify (12.13) for the case n = 1, r = 2. [Hints: Without loss of generality, take
h = 0, and note,

[x, , y b]

P(X 1 = ,X 2 = yIXo = a,X3 =h)= --

[a, u, v, b]

and using Exercise 1(iii)(iv) and (12.10),

[a, u, v, b] _ g(u)g(v) _ 9(a, u)q(u, v)q(v, b)

[a, u', v', b] ga."(u')gr,b(v') q(a, u')q(u', )9(v', b)

Sum over u, v E S and then take u' = , v' = y.]

(ii) Complete the proof of (12.13) for $n = 1$, $r \geq 2$ by induction and then $n \geq 2$.
3. Verify (12.15).
4. Justify the limiting result in (12.17) as a consequence of Proposition 6.1. [Hint: Use
Scheffe's Theorem, Chapter 0.]
*5. Take S = {0, 1}, g,,,(1) = , g, 0 (1) = v. Find the transition matrix q and the
invariant initial distribution for the Markov random field viewed as a Markov chain.

Exercises for Section II.13

1. (i) Let $\{Y_n : n \geq 0\}$ be a sequence of random vectors with values in $\mathbb{R}^k$ which converge
a.s. to a random vector $Y$. Show that the distribution $Q_n$ of $Y_n$ converges weakly
to the distribution $Q$ of $Y$.
(ii) Let $p(x, dy)$ be a transition probability on the state space $S = \mathbb{R}^k$ (i.e., (a) for each
$x \in S$, $p(x, dy)$ is a probability measure on $(\mathbb{R}^k, \mathcal{B}^k)$ and (b) for each $B \in \mathcal{B}^k$,
$x \to p(x, B)$ is a Borel-measurable function on $\mathbb{R}^k$). The n-step transition
probability $p^{(n)}(x, dy)$ is defined recursively by

\[ p^{(1)}(x, dy) = p(x, dy), \qquad p^{(n+1)}(x, B) = \int_S p(y, B)\, p^{(n)}(x, dy). \]

Show that, if $p^{(n)}(x, dy)$ converges weakly to the same probability measure $\pi(dy)$
for all $x$, and $x \to p(x, dy)$ is weakly continuous (i.e., $\int_S f(y)\, p(x, dy)$ is a continuous
function of $x$ for every bounded continuous function $f$ on $S$), then $\pi$ is the unique
invariant probability for $p(x, dy)$, i.e., $\int_S p(x, B)\, \pi(dx) = \pi(B)$ for all $B \in \mathcal{B}^k$. [Hint:
Let $f$ be bounded and continuous. Then

\[ \int_S f(y)\, p^{(n)}(x, dy) \to \int_S f(y)\, \pi(dy), \]

\[ \int_S f(y)\, p^{(n+1)}(x, dy) = \int_S \Bigl( \int_S f(y)\, p(z, dy) \Bigr) p^{(n)}(x, dz) \to \int_S \Bigl( \int_S f(y)\, p(z, dy) \Bigr) \pi(dz). ] \]
(iii) Extend (i) and (ii) to arbitrary metric space $S$, and note that it suffices to require
convergence of $n^{-1} \sum_{m=1}^{n} \int f(y)\, p^{(m)}(x, dy)$ to $\int f(y)\, \pi(dy)$ for all bounded
continuous $f$ on $S$.
2. (i) Let $B_1, B_2$ be $m \times m$ matrices (with real or complex coefficients). Define $\|B\|$ as
in (13.13), with the supremum over unit vectors in $\mathbb{R}^m$ or $\mathbb{C}^m$. Show that


(ii) Prove that if B is an m x m matrix then

IIBII < m' 12 max {Ib 1 l: 1 < i,j <, m}.

(iii) If $B$ is an $m \times m$ matrix and $\|B\|$ is defined to be the supremum over unit vectors
in $\mathbb{C}^m$, show that $\|B^n\| \geq r^n(B)$. Use this together with (13.17) to prove that
$\lim \|B^n\|^{1/n}$ exists and equals $r(B)$. [Hint: Let $\lambda$ be an eigenvalue such that
$|\lambda| = r(B)$. Then there exists $x \in \mathbb{C}^m$, $\|x\| = 1$, such that $Bx = \lambda x$.]

3. Suppose E I is a random vector with values in 1.
(i) Prove that if b > 1 and c > 0, then

log c
P(Ie 1 I > cb") EEIZI 1, where Z = logI ,I
" =1 log (5

[Hint: P(IZI > n) _ nP(n < IZI < n + 1). ]

_< _<
n =1 n=1

(ii) Show that if (13.15) holds then (13.16) converges. [Hint:

dfl(" B""II 1 "f") where d = max{8'IIBII': 0

(iii) Show that (13.15) holds, if it holds for some S < 1 /r(B). [Hint: Use the Lemma.]
r n o }. ]

4. Suppose $\sum a_n z^n$ and $\sum b_n z^n$ are absolutely convergent and are equal for $|z| < r$, where
$r$ is some positive number. Show that $a_n = b_n$ for all $n$. [Hint: Within its radius of
convergence a power series is infinitely differentiable and may be repeatedly
differentiated term by term.]
5. (i) Prove that the determinant of an m x m matrix in triangular form equals the
product of its diagonal elements.
(ii) Check (13.28) and (13.35).
6. (i) Prove that under (13.15), $Y$ in (13.16) is Gaussian if the $\varepsilon_n$ are Gaussian. Calculate
the mean vector and the dispersion matrix of $Y$ in terms of those of $\varepsilon_n$.
(ii) Apply (i) to Examples 2(a) and 2(b).
7. (i) In Example 1 show that $|b| < 1$ is necessary for the existence of a unique invariant
probability.
(ii) Show by example that $|b| < 1$ is not sufficient for the existence of a unique invariant
probability. [Hint: Find a distribution $Q$ of the noise $\varepsilon_n$ with an appropriately
heavy tail.]
8. In Example 1, assume EE < oo, and write a = EE", X" + , = a + bX" + 0" + where
= a" a (n -> 1). The least squares estimates of a, b are N , b N , which minimize
Yn- (XX+ , a bX") z with respect to a, b.
(i) Show that a N = Y 6 N X, b N = I ' X" + , (X" (XX X ) 2 , where
X= N 1 X", Y= N 1 I i X. [Hint: Reparametrize to write a + bX" _
a, + b(X" X).]
(ii) In the case Ibi < 1, prove that a N + a and b N - b a.s. as N oo.
9. (i) In Example 2 let m = 2, b _ 4, b 12 = 5, b 2 , = 10, b 22 = 3. Assume e, has
a finite absolute second moment. Does there exists a unique invariant probability?
(ii) For the AR(2) or ARMA(2, q) models find a sufficient condition in terms of o
and (i, for the existence of a unique invariant probability, assuming
that q" has a finite rth moment for some r > 0.

Exercises for Section II.14

1. Prove that the process $\{X_n(x) : n \geq 0\}$ defined by (14.2) is a Markov process having
the transition probability (14.3). Show that this remains true if the initial state $x$ is
replaced by a random variable $X_0$ independent of $\{\alpha_n\}$.
2. Let $F_n(z) := P(X_n \leq z)$, $n \geq 0$, be a sequence of distribution functions of random
variables $X_n$ taking values in an interval $J$. Prove that if $F_{n+m}(z) - F_n(z)$ converges
to zero uniformly for $z \in J$, as $n$ and $m$ go to $\infty$, then $F_n(z)$ converges uniformly (for
all $z \in J$) to the distribution function $F(z)$ of a probability measure on $J$.
3. Let $g$ be a continuous function on a metric space $S$ (with metric $\rho$) into itself. If,
for some $x \in S$, the iterates $g^{(n)}(x) \equiv g(g^{(n-1)}(x))$, $n \geq 1$, converge to a point $x^* \in S$
as $n \to \infty$, then show that $x^*$ is a fixed point of $g$, i.e., $g(x^*) = x^*$.
4. Extend the strong Markov property (Theorem 4.1) to Markov processes {X"} on
an interval J (or, on
5. Let $\tau$ be an a.s. finite stopping time for the Markov process $\{X_n(x) : n \geq 0\}$ defined
by (14.2). Assume that $X_\tau(x)$ belongs to an interval $J$ with probability 1 and $p^{(n)}(y, dz)$
converges weakly to a probability measure $\pi(dz)$ on $J$ for all $y \in J$. Assume also
that $p(y, J) = 1$ for all $y \in J$.

(i) Prove that $p^{(n)}(x, dz)$ converges weakly to $\pi(dz)$. [Hint: $p^{(k)}(x, J) \to 1$ as $k \to \infty$,

\[ \int f(y)\, p^{(k+r)}(x, dy) = \int \Bigl( \int f(z)\, p^{(r)}(y, dz) \Bigr) p^{(k)}(x, dy). ] \]

(ii) Assume the hypothesis above for all $x \in J$ (with $J$ not depending on $x$). Prove
that $\pi$ is the unique invariant probability.
6. In Example 2, if the $\varepsilon_n$ are i.i.d. uniform on $[-1, 1]$, prove that $\{X_{2n}(x) : n \geq 1\}$ is i.i.d.
with common p.d.f. given by (14.27) if $x \in [-2, 2]$.
7. In Example 2, modify f as follows. Let 0 < <. Define fo (x): _f(x) for

2 _< x < 6, and 6 _< x _< 2, and linearly interpolate between (,), so that f,
is continuous.
(i) Show that, for x e [6, 1] (or, x c [-1, b]) {X"(x): n >, 1} is i.i.d. with common
distribution it (or, nx+2).
(ii) For x e (1, 2] (or, [-2, 1)) {X"(x): n >_ 2} is i.i.d. with common distribution
ix (or, ix +2).
(iii) For x e (-8, 6) {X"(x): n >_ 1} is i.i.d. with common distribution it_ X+ ,.
8. In Example 3, assume P(e 1 = 0) > 0 and prove that P(e,, = 0 for some n >_ 0) = 1.
9. In Example 3, suppose E log s > 0. ;

(i) Prove that E, , {1/(E 1 ...e" )} converges a.s. to a (finite) nonnegative random
variable Z.
(ii) Let d, := inf{z > 0: P(Z < z) > 0}, d 2 == sup{z >- 0: P(Z >- z) > 0}. Show that

=0 if x < c(d, + l ),
p(x) e (0, 1) if c(d, + l) < x < c(d 2 + 1),
=1 if x > c(d 2 + 1).

10. In Example 3, define M := sup{z >_ 0: P(e, > z) > 0}.

(i) Suppose I <M < oo. Then show that p(x) = 0 if x < cM/(M 1).
(ii) If Al _< 1, then show that p(x) = 0 for all x.
(iii) Let m be as in (14.34) and M as above. Let d,, d 2 be as defined in Exercise
9(ii). Show that

_ 1
(a) d, = > m- " if m > 1, = oo if m 5 1 , and
n =, mI

(b)d 2 = M " - (=_ifM>1=coifM1).


Exercises for Section II.15

1. Let $0 < p < 1$ and suppose $h(p)$ represents a measure of uncertainty regarding the
occurrence of an event having probability $p$. Assume
(a) $h(1) = 0$.
(b) $h(p)$ is strictly decreasing for $0 < p \leq 1$.
(c) $h$ is continuous on $(0, 1]$.
(d) $h(pr) = h(p) + h(r)$, $0 < p < 1$, $0 < r < 1$.

Intuitively, condition (d) says that the total uncertainty in the joint occurrence of
two independent events is the cumulative uncertainty for each of the events. Verify
that $h$ must be of the form $h(p) = -c \log_2 p$ where $c = h(\tfrac{1}{2}) > h(1) = 0$ is a positive
constant. Standardizing, one may take

\[ h(p) = -\log_2 p. \]
2. Let $f = (f_i)$ be a probability distribution on $S = \{1, 2, \ldots, M\}$. Define the entropy
in $f$ by the "average uncertainty," i.e.,

\[ H(f) = -\sum_{i=1}^{M} f_i \log_2 f_i \qquad (0 \log 0 := 0). \]

(i) Show that $H(f)$ is maximized by the uniform distribution on $S$.
(ii) If $g = (g_i)$ is another probability distribution on $S$ then

\[ H(f) \leq -\sum_{i=1}^{M} f_i \log_2 g_i \]

with equality if and only if $f_i = g_i$ for all $i \in S$.
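
Both parts of Exercise 2 are easy to probe numerically before attempting a proof. The sketch below computes the entropy $H(f) = -\sum_i f_i \log_2 f_i$ and the right-hand side $-\sum_i f_i \log_2 g_i$ of part (ii); the function names and the two example distributions are illustrative assumptions.

```python
import math

def entropy(f):
    """H(f) = -sum_i f_i log2 f_i, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in f if p > 0)

def cross_entropy(f, g):
    """-sum_i f_i log2 g_i; terms with f_i = 0 are dropped (g_i > 0 assumed wherever f_i > 0)."""
    return -sum(p * math.log2(q) for p, q in zip(f, g) if p > 0)

# Hypothetical distributions on a four-point set.
f = [0.5, 0.25, 0.125, 0.125]
g = [0.25, 0.25, 0.25, 0.25]     # uniform
```

Here `entropy(f)` falls below both `entropy(g)` and `cross_entropy(f, g)`, in line with parts (i) and (ii).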


3. Suppose that $X$ is a random variable taking values $a_1, \ldots, a_M$ with respective
probabilities $p(a_1), \ldots, p(a_M)$. Consider an arbitrary binary coding of the respective
symbols $a_i$, $1 \leq i \leq M$, by a string $\phi(a_i) = (\varepsilon_1^{(i)}, \ldots, \varepsilon_{n_i}^{(i)})$ of 0's and 1's, such that no
string $(\varepsilon_1^{(i)}, \ldots, \varepsilon_{n_i}^{(i)})$ can be obtained from a shorter code $(\varepsilon_1^{(j)}, \ldots, \varepsilon_{n_j}^{(j)})$, $n_j < n_i$, by
adding more terms; such codes will be called admissible. The number $n_i$ of bits is
called the length of the code-word $\phi(a_i)$.
(i) Show that an admissible binary code $\phi$ having respective lengths $n_i$ exists if and
only if

\[ \sum_{i=1}^{M} 2^{-n_i} \leq 1. \]

[Hint: Let $n_1, \ldots, n_M$ be positive integers and let

\[ \rho_k = \#\{i : n_i = k\}, \qquad k = 1, 2, \ldots. \]

Then it is necessary and sufficient for an admissible code that $\rho_1 \leq 2$,
$\rho_2 \leq 2^2 - 2\rho_1, \ldots, \rho_k \leq 2^k - \rho_1 2^{k-1} - \cdots - \rho_{k-1} 2$, $k \geq 1$.]

(ii) (Noiseless Coding Theorem) For any admissible binary code $\phi$ of $a_1, \ldots, a_M$
having respective lengths $n_1, \ldots, n_M$, the average length of code-words cannot
be made smaller than the entropy of the distribution $p$ of $a_1, \ldots, a_M$, i.e.,

\[ \sum_{i=1}^{M} n_i\, p(a_i) \geq H(p) = -\sum_{i=1}^{M} p(a_i) \log_2 p(a_i). \]

[Hint: Use Exercise 2(ii) with $f_i = p(a_i)$, $g_i = 2^{-n_i} / \sum_{k=1}^{M} 2^{-n_k}$ to show

\[ H(p) \leq \sum_{i=1}^{M} n_i\, p(a_i) + \log_2 \Bigl( \sum_{k=1}^{M} 2^{-n_k} \Bigr). \]

Then apply Exercise 3(i) to get the result.]

(iii) Show that it is always possible to construct an admissible binary code of
$a_1, \ldots, a_M$ such that

\[ H(p) \leq \sum_{i=1}^{M} n_i\, p(a_i) \leq H(p) + 1. \]

[Hint: Select $n_i$ such that

\[ -\log_2 p(a_i) \leq n_i < -\log_2 p(a_i) + 1 \quad \text{for } 1 \leq i \leq M, \]

and apply 3(i).]

(iv) Verify that there is not a more efficient (admissible) encoding (i.e., minimal
average number of bits) of the symbols $a_1, a_2, a_3$ for the distribution $p(a_1) = \tfrac{1}{2}$,
$p(a_2) = p(a_3) = \tfrac{1}{4}$, than the code $\phi(a_1) = (0)$, $\phi(a_2) = (1, 0)$, $\phi(a_3) = (1, 1)$.
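
The coding bounds of Exercise 3 can be illustrated with a short computation. The sketch below assumes the admissibility criterion of part (i) is the Kraft-type condition $\sum_i 2^{-n_i} \leq 1$ and takes code lengths $n_i = \lceil -\log_2 p(a_i) \rceil$, as in the hint to part (iii). All function names and the second distribution are illustrative.

```python
import math

def entropy(p):
    """H(p) = -sum_i p_i log2 p_i, with the convention 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def kraft_sum(lengths):
    """sum_i 2^(-n_i); at most 1 for the lengths of an admissible (prefix-free) code."""
    return sum(2.0 ** (-n) for n in lengths)

def shannon_lengths(p):
    """n_i = ceil(-log2 p_i), so that -log2 p_i <= n_i < -log2 p_i + 1."""
    return [math.ceil(-math.log2(q)) for q in p]

def average_length(p, lengths):
    return sum(q * n for q, n in zip(p, lengths))

# The distribution of part (iv): lengths (1, 2, 2) match the code (0), (1,0), (1,1).
p = [0.5, 0.25, 0.25]
lengths = shannon_lengths(p)

# A hypothetical non-dyadic distribution, where the average length exceeds the entropy.
p2 = [0.4, 0.3, 0.2, 0.1]
lengths2 = shannon_lengths(p2)
```

For the dyadic distribution `p` the average length equals the entropy, which is why no admissible code can do better in part (iv).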
4. (i) Show that $Y_1, Y_2, \ldots$ in (15.4) satisfies the law of large numbers.

(ii) Show that for (15.5) to hold it is sufficient that $Y_1, Y_2, \ldots$ satisfy the law of large
numbers.



Theoretical Complements to Section II.6

1. Let $S$ be an arbitrary state space equipped with a sigma-field $\mathcal{S}$ of events. Let $\mu(dx)$
be a sigma-finite measure on $(S, \mathcal{S})$ and let $p(x, y)$ be a transition probability density
with respect to $\mu$; i.e., $p$ is a nonnegative measurable function on $(S \times S, \mathcal{S} \times \mathcal{S})$ such
that for each fixed $x$, $p(x, y)$ is a $\mu$-integrable function of $y$ with total mass 1. Let $\Omega$
denote the space of all sequences $\omega = (x_0, x_1, \ldots)$ of states $x_i \in S$. Let $\mathcal{F}_0$ denote the
class of all finite-dimensional sets $A$ of the form

\[ A = \{\omega = (x_0, x_1, \ldots) \in \Omega : x_i \in B_i,\ i = 0, 1, \ldots, n\}, \]

where $B_i \in \mathcal{S}$ for each $i$. Define $P_x(A)$ for such a set $A$, $x \in B_0$, by

\[ P_x(A) = \int_{B_1} \cdots \int_{B_n} p(x, y_1)\, p(y_1, y_2) \cdots p(y_{n-1}, y_n)\, \mu(dy_n) \cdots \mu(dy_1) \tag{T.6.1} \]

for $x \in S$. Define $P_x(A) = 0$ if $x \notin B_0$. The Kolmogorov Extension Theorem assures
us that $P_x$ has a unique extension to a probability measure defined for all events in
the smallest sigma-field $\mathcal{F}$ of subsets of $\Omega$ that contains $\mathcal{F}_0$. For any probability measure
$\gamma$ on $(S, \mathcal{S})$, define

\[ P_\gamma(A) = \int_S P_x(A)\, \gamma(dx), \qquad A \in \mathcal{F}. \tag{T.6.2} \]

Let $X_n$ denote the nth coordinate projection mapping on $\Omega$. Then $\{X_n\}$ is said to be
a Markov chain on the state space $S$ with transition density $p(x, y)$ with respect to $\mu$
and initial distribution $\gamma$ under $P_\gamma$. The results of Exercise 6.9 can be extended to this
setting as follows. Suppose that there is a positive integer $r$ and a $\mu$-integrable function
$\rho$ on $S$ such that $\int_S \rho(x)\, \mu(dx) > 0$ and $p^{(r)}(x, y) \geq \rho(y)$ for all $x, y$ in $S$. Then there is
a unique invariant distribution $\pi$ such that

\[ \sup \Bigl| \int_B p^{(n)}(x, y)\, \mu(dy) - \pi(B) \Bigr| \leq (1 - \alpha)^{n'}, \]

where $\alpha = \int_S \rho(x)\, \mu(dx)$, $n' = [n/r]$, and the sup is over $B \in \mathcal{S}$.

Proof. To see this define

\[ M_n(B) := \sup_{u \in S} \int_B p^{(n)}(u, y)\, \mu(dy), \qquad m_n(B) := \inf_{u \in S} \int_B p^{(n)}(u, y)\, \mu(dy), \tag{T.6.3} \]

sup {M(B) m"(B)}.



= sup Ip ( " ) " 1.

(x, y) pc (z, y)I u(dy)

2 , s

pick+ n.>
(x, y)p(dy) p k+ '(z, y)p(dy)1 ( 1 e)[Mk.(B) mk.(B)]
(ii) Ifil

(iii) The probability measure $\pi$ given by

\[ \pi(B) = \lim_{n \to \infty} \int_B p^{(n)}(x, y)\, \mu(dy) \tag{T.6.4} \]

is well defined and

\[ \sup_B \Bigl| \int_B p^{(n)}(x, y)\, \mu(dy) - \pi(B) \Bigr| \leq (1 - \alpha)^{n'}. \tag{T.6.5} \]

Also, the following facts are simple to check.

(iv) $\pi$ is absolutely continuous with respect to $\mu$ and therefore by the
Radon-Nikodym Theorem (Chapter 0) has a density $\pi(y)$ with respect to $\mu$.
(v) For $B \in \mathcal{S}$ with $\pi(B) > 0$, one has $P_\gamma(X_n \in B \text{ i.o.}) = 1$.
2. (Doeblin Condition) A well-known condition ensuring the existence of an invariant
distribution is the following. Suppose there is a finite measure $\varphi$ and an integer $r \geq 1$,
and $\varepsilon > 0$, such that $p^{(r)}(x, B) \leq 1 - \varepsilon$ whenever $\varphi(B) \leq \varepsilon$. Under this condition there
is a decomposition of the state space as $S = \bigcup_{i=1}^{m} S_i$ such that

(i) Each $S_i$ is closed in the sense that $p(x, S_i) = 1$ for $x \in S_i$ ($1 \leq i \leq m$).
(ii) $p$ restricted to $S_i$ has a unique invariant distribution $\pi_i$.

(iii) $\dfrac{1}{n} \sum_{m=1}^{n} p^{(m)}(x, B) \to \pi_i(B)$ as $n \to \infty$ for $x \in S_i$.

Moreover, the convergence is exponentially fast and uniform in $x$ and $B$.

It is easy to check that the condition of theoretical complement 1 above implies the
Doeblin condition with $\varphi(dx) = \rho(x)\mu(dx)$. The more general connections between
the Doeblin condition and the $\varphi(dx) = \rho(x)\mu(dx)$ condition in theoretical complement
1 can be found in detail in J. L. Doob (1953), Stochastic Processes, Wiley, New York,
pp. 190.
3. (Lengths of Increasing Runs) Here is an example where the more general theory
Let $X_1, X_2, \ldots$ be i.i.d. uniform on $[0, 1]$. Successive increasing runs among the
values of the sequence $X_1, X_2, \ldots$ are defined by placing a marker at 0 and then between
$X_j$ and $X_{j+1}$ whenever $X_j$ exceeds $X_{j+1}$, e.g., |0.20, 0.24, 0.60|0.50|0.30, 0.70|0.20....
Let $Y_n$ denote the initial (smallest) value in the nth run, and let $L_n$ denote the length
of the nth run, $n = 1, 2, \ldots$.
(i) $\{Y_n\}$ is a Markov chain on the state space $S = [0, 1]$ with transition density

\[ p(x, y) = \begin{cases} e^{1-x} & \text{if } y < x, \\ e^{1-x} - e^{y-x} & \text{if } y > x. \end{cases} \]

(ii) Applying theoretical complement 1, $\{Y_n\}$ has a unique invariant distribution $\pi$.
Moreover, it has a density given by $\pi(y) = 2(1 - y)$, $0 < y < 1$.
(iii) The limit in distribution of the length $L_n$ as $n \to \infty$ may also be calculated from
that of $\{Y_n\}$, since

\[ P(L_n \geq m) = \int_0^1 P(L_n \geq m \mid Y_n = y)\, P(Y_n \in dy) = \int_0^1 \frac{(1 - y)^{m-1}}{(m - 1)!}\, P(Y_n \in dy), \]

and therefore,

\[ P(L_\infty \geq m) := \lim_{n \to \infty} P(L_n \geq m) = \int_0^1 \frac{(1 - y)^{m-1}}{(m - 1)!}\, 2(1 - y)\, dy. \]

Note from this that

\[ EL_\infty = \sum_{m=1}^{\infty} P(L_\infty \geq m) = 2. \]
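
The value $EL_\infty = 2$ can be checked in two independent ways. The sketch below first sums the tail probabilities, using the closed form $\int_0^1 \frac{(1-y)^{m-1}}{(m-1)!}\, 2(1-y)\, dy = \frac{2}{(m+1)(m-1)!}$, and then estimates the average run length by simulation. The function names, the seed, and the sample size are illustrative assumptions.

```python
import math
import random

def tail_prob(m):
    """P(L >= m) in the limit: 2 / ((m + 1) * (m - 1)!)."""
    return 2.0 / ((m + 1) * math.factorial(m - 1))

# Truncated series for E L = sum_{m >= 1} P(L >= m); the neglected tail is negligible.
expected_length = sum(tail_prob(m) for m in range(1, 40))

def simulated_mean_run_length(n, seed=0):
    """Average length of the increasing runs found in n i.i.d. uniform draws."""
    rng = random.Random(seed)
    prev, runs = rng.random(), 1
    for _ in range(n - 1):
        x = rng.random()
        if x < prev:          # a descent closes the current run and opens a new one
            runs += 1
        prev = x
    return n / runs

estimate = simulated_mean_run_length(200_000)
```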


Theoretical Complement to Section II.8

1. We learned about the random rounding problem from Andreas Weisshaar (1987),
"Statistisches Runden in rekursiven, digitalen Systemen 1 und 2," Diplomarbeit erstellt
am Institut für Netzwerk- und Systemtheorie, Universität Stuttgart. The stability
condition of Example 8.1 is new. Computer simulations by Weisshaar suggest
that the result is more generally true if the spectral radius is less than 1; however,
this is not known rigorously. For additional background on the applications of this
technique to the design of digital filters, see R. B. Kieburtz, V. B. Lawrance, and K.
V. Mina (1977), "Control of Limit Cycles in Recursive Digital Filters by Randomized
Quantization," IEEE Trans. Circuits and Systems, CAS-24(6), pp. 291-299, and
references therein.

Theoretical Complements to Section II.9

1. The lattice case of Blackwell's Renewal Theorem (theoretical complement 2 below)
may be used to replace the large-scale Cesàro type convergence obtained for the
transition probabilities in (9.16) by the stronger elementwise convergence described
as follows.
(i) If $j \in S$ is recurrent with period 1, then for any $i \in S$,

\[ \lim_{n \to \infty} p_{ij}^{(n)} = \rho_{ij} \bigl( E_j \tau_j^{(1)} \bigr)^{-1}, \]

where $\rho_{ij}$ is the probability $P_i(\tau_j^{(1)} < \infty)$ (see Eq. 8.4).

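(ii) If $j \in S$ is recurrent with period $d > 1$, then by regarding $p^d$ as a one-step transition
law (with $\rho_{ij}$ computed for the chain having this transition law),

$$\lim_{n\to\infty} p_{ij}^{(nd)} = \rho_{ij}\,d\,\bigl(E_j \tau_j^{(1)}\bigr)^{-1}.$$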

To obtain these from the general renewal theory described below, take as the
delay $Z_0$ the time to reach $j$ for the first time starting at $i$. The durations of the
subsequent replacements $Z_1, Z_2, \ldots$ represent the lengths of times between returns
to $j$.

2. (Renewal Theorems) Let $Z_0, Z_1, Z_2, \ldots$ be independent nonnegative random variables
such that $Z_1, Z_2, \ldots$ are i.i.d. with common (nondegenerate) distribution function $F$,
and $Z_0$ has distribution function $G$. In the customary framework of renewal theory,
components subject to failure (e.g., lightbulbs) are instantly replaced upon failure,
and $Z_1, Z_2, \ldots$ represent the random durations of the successive replacements. The
delay random variable $Z_0$ represents the length of time remaining in the life of the
initial component with respect to some specified time origin. For example, if the
initial component has age $a$ relative to the placement of the time origin, then one
may take

$$G(x) = \frac{F(x + a) - F(a)}{1 - F(a)}.$$

Let

$$S_n = Z_0 + Z_1 + \cdots + Z_n, \qquad n \ge 0, \tag{T.9.1}$$

and let

$$N_t = \inf\{n \ge 0: S_n > t\}, \qquad t \ge 0. \tag{T.9.2}$$

We will use the notation $S_n^0$, $N_t^0$ sometimes to identify cases when $Z_0 = 0$ a.s. Then
$S_n$ is the time of the $(n + 1)$st renewal and $N_t$ counts the number of renewals up to
and including time $t$. In the case that $Z_0 = 0$ a.s., the stochastic (counting) process
$\{N_t\} \equiv \{N_t^0\}$ is called the (ordinary) renewal process. Otherwise $\{N_t\}$ is called a delayed
renewal process.
For simplicity, first restrict attention to the case of the ordinary renewal process.
Let $\mu = EZ_1 < \infty$. Then $1/\mu$ is called the renewal rate. The interpretation as an
average rate of renewals is reasonable since

$$\frac{S_{N_t - 1}}{N_t} \le \frac{t}{N_t} < \frac{S_{N_t}}{N_t}$$

and $N_t \to \infty$ as $t \to \infty$, so that by the strong law of large numbers

$$\frac{N_t}{t} \to \frac{1}{\mu} \quad \text{a.s. as } t \to \infty. \tag{T.9.4}$$

Since $N_t$ is a stopping time for $\{S_n\}$ (for fixed $t \ge 0$), it follows from Wald's Equation
(Chapter I, Corollary 13.2) that

$$ES_{N_t} = \mu\,EN_t. \tag{T.9.5}$$

In fact, the so-called Elementary Renewal Theorem asserts that

$$\frac{EN_t}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty. \tag{T.9.6}$$

To deduce this from the above, simply observe that $\mu\,EN_t = ES_{N_t} > t$ and therefore

$$\liminf_{t\to\infty} \frac{EN_t}{t} \ge \frac{1}{\mu}.$$

On the other hand, assuming first that $Z_n \le C$ a.s. for each $n \ge 1$, where $C$ is a
positive constant, gives $\mu\,EN_t \le t + C$ and therefore, for this case, $\limsup_{t\to\infty}(EN_t/t) \le 1/\mu$.
More generally, since truncations of the $Z_n$ at the level $C$ would at most decrease
the $S_n$ and therefore at most increase $N_t$ and $EN_t$, this last inequality applied to the
truncated process yields

$$\limsup_{t\to\infty} \frac{EN_t}{t} \le \frac{1}{C\,P(Z_1 \ge C) + \int_{[0,C)} x\,F(dx)} \to \frac{1}{\mu} \quad \text{as } C \to \infty. \tag{T.9.7}$$

The above limits (T.9.4) and (T.9.6) also hold in the case that $\mu = \infty$, under the
convention that $1/\infty$ is 0, by the SLLN and (T.9.7). Moreover, these asymptotics
can now be applied to the delayed renewal process to get precisely the same conclusions
for any given (initial) distribution $G$ of delay.
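The almost sure limit $N_t/t \to 1/\mu$ is easy to observe numerically. The following is a minimal sketch for an ordinary renewal process ($Z_0 = 0$), assuming Uniform(0, 1) lifetimes so that $\mu = 1/2$ and the renewal rate is 2; the lifetime law, the horizon $t$, and the function names are illustrative assumptions.

```python
import random

def renewal_count(t, draw_lifetime, rng):
    """Return N_t = inf{n >= 0 : S_n > t} for an ordinary renewal process."""
    s, n = 0.0, 0                    # S_0 = Z_0 = 0
    while s <= t:
        s += draw_lifetime(rng)      # S_{n+1} = S_n + Z_{n+1}
        n += 1
    return n

def estimate_rate(t=10_000.0, seed=1):
    """Estimate the renewal rate 1/mu by N_t / t."""
    rng = random.Random(seed)
    uniform_lifetime = lambda r: r.random()   # Uniform(0, 1): mu = 1/2
    return renewal_count(t, uniform_lifetime, rng) / t
```

With these choices `estimate_rate()` should be close to 2. Replacing `uniform_lifetime` by another nonnegative sampler with finite mean changes only the target value $1/\mu$.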
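With the special choice of $G = F_\infty$ defined by

$$F_\infty(x) = \frac{1}{\mu} \int_0^x P(Z_1 > u)\,du, \qquad x \ge 0, \tag{T.9.8}$$

the corresponding delayed renewal process $\{N_t^\infty\}$, called the equilibrium renewal process,
has the property that

$$EN^\infty(t, t + h] = h/\mu, \quad \text{for any } h, t \ge 0, \tag{T.9.9}$$

where

$$N^\infty(t, t + h] := N^\infty_{t+h} - N^\infty_t. \tag{T.9.10}$$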

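To prove this, define the renewal function $m(t) = EN_t$, $t \ge 0$, for $\{N_t\}$. Then for the
general (delayed) process we have

$$\begin{aligned}
m(t) = EN_t &= \sum_{n=1}^{\infty} P(N_t \ge n) = G(t) + \sum_{n=1}^{\infty} P(N_t \ge n + 1) \\
&= G(t) + \sum_{n=1}^{\infty} \int_0^t P(N_{t-u} \ge n)\,F(du) = G(t) + \int_0^t m(t - u)\,F(du) \\
&= G(t) + m * F(t),
\end{aligned} \tag{T.9.11}$$

where $*$ denotes the convolution defined by

$$m * F(t) = \int_0^t m(t - s)\,F(ds). \tag{T.9.12}$$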


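Observe that $g(t) = t/\mu$, $t \ge 0$, solves the renewal equation (T.9.11) with $G = F_\infty$;
for

$$\begin{aligned}
\frac{t}{\mu} &= \frac{1}{\mu} \int_0^t (1 - F(u))\,du + \frac{1}{\mu} \int_0^t F(u)\,du
 = F_\infty(t) + \frac{1}{\mu} \int_0^t \int_0^u F(ds)\,du \\
&= F_\infty(t) + \frac{1}{\mu} \int_0^t \int_s^t du\,F(ds)
 = F_\infty(t) + \int_0^t \frac{t - s}{\mu}\,F(ds).
\end{aligned}$$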

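To finish the proof of (T.9.9), observe that $g(t) = m^\infty(t) := EN_t^\infty$, $t \ge 0$, uniquely
solves (T.9.11), with $G = F_\infty$, among functions that are bounded on finite intervals.
For if $r(t)$ is another such function, then by iterating we have

$$\begin{aligned}
r(t) &= F_\infty(t) + \int_0^t r(t - u)\,F(du) \\
&= F_\infty(t) + \int_0^t \Bigl\{ F_\infty(t - u) + \int_0^{t-u} r(t - u - s)\,F(ds) \Bigr\} F(du) \\
&= P(N_t^\infty \ge 1) + P(N_t^\infty \ge 2) + \int_0^t r(t - v)\,P(S_2^0 \in dv) \\
&= P(N_t^\infty \ge 1) + P(N_t^\infty \ge 2) + \cdots + P(N_t^\infty \ge n) + \int_0^t r(t - v)\,P(S_n^0 \in dv).
\end{aligned}$$

Thus,

$$r(t) = \sum_{n=1}^{\infty} P(N_t^\infty \ge n) = m^\infty(t),$$

since

$$\Bigl| \int_0^t r(t - v)\,P(S_n^0 \in dv) \Bigr| \le \sup_{s \le t} |r(s)|\,P(S_n^0 \le t) \to 0 \quad \text{as } n \to \infty;$$

note that, by (T.9.2), $P(S_n^0 \le t) = P(N_t^0 \ge n + 1) \to 0$ as $n \to \infty$, since
$\sum_{n=1}^{\infty} P(N_t^0 \ge n) = EN_t^0 < \infty$.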
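The equilibrium property (T.9.9) can be illustrated by Monte Carlo. The sketch below assumes Uniform(0, 1) lifetimes, for which $\mu = 1/2$ and $F_\infty(x) = 2x - x^2$ on $[0, 1]$, so that the delay can be sampled as $1 - \sqrt{U}$ with $U$ uniform. The lifetime law, sample sizes, and function names are assumptions made for the illustration.

```python
import random

def equilibrium_count(t, h, rng):
    """Number of renewal epochs S_n in (t, t+h], with Uniform(0,1) lifetimes
    and delay Z_0 drawn from the equilibrium law F_inf(x) = 2x - x^2."""
    s = 1.0 - rng.random() ** 0.5    # inverse-CDF sample of Z_0
    count = 0
    while s <= t + h:
        if s > t:
            count += 1
        s += rng.random()            # next lifetime Z_n ~ Uniform(0, 1)
    return count

def mean_equilibrium_count(t, h, n_paths=20_000, seed=2):
    """Monte Carlo estimate of E N(t, t+h] for the equilibrium process."""
    rng = random.Random(seed)
    return sum(equilibrium_count(t, h, rng) for _ in range(n_paths)) / n_paths
```

For any $t \ge 0$ the estimate should be close to $h/\mu = 2h$; for instance, `mean_equilibrium_count(0.7, 0.5)` and `mean_equilibrium_count(5.0, 0.5)` should both be near 1, with no dependence on `t` beyond sampling noise.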

Let $d$ be a positive real number and let $L_d = \{0, d, 2d, 3d, \ldots\}$. The common
distribution $F$ of the durations $Z_1, Z_2, \ldots$ is said to be a lattice distribution if there
is a number $d > 0$ such that $P(Z_1 \in L_d) = 1$. The largest such $d$ is called the period of $F$.

Theorem T.9.1. (Blackwell's Renewal Theorem)

(i) If $F$ is not lattice, then for any $h > 0$,

$$EN(t, t + h] = \sum_{k=0}^{\infty} P(t < S_k \le t + h) \to \frac{h}{\mu} \quad \text{as } t \to \infty, \tag{T.9.15}$$

where $N(t, t + h] := N_{t+h} - N_t$ is the number of renewals in time $t$ to $t + h$.

(ii) If $F$ is lattice with period $d$, then

$$EN^{(n)} = \sum_{k=0}^{\infty} P(S_k = nd) \to \frac{d}{\mu} \quad \text{as } n \to \infty, \tag{T.9.16}$$

where

$$N^{(n)} = \sum_{k=0}^{\infty} 1_{\{S_k = nd\}} \tag{T.9.17}$$

is the number of renewals at the time $nd$. In particular, if $P(Z_1 = 0) = 0$, then
$EN^{(n)}$ is simply the probability of a renewal at time $nd$.
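For an ordinary renewal process with integer-valued lifetimes, the renewal probabilities in the lattice case can be computed exactly from the first-step recursion $u_n = \sum_k f_k u_{n-k}$, $u_0 = 1$, and compared with the limit $d/\mu$. The sketch below is only an illustration: the two lifetime distributions are hypothetical, and `renewal_probabilities` is a name introduced here.

```python
def renewal_probabilities(f, n_max):
    """u[n] = probability of a renewal at time n for an ordinary renewal
    process whose lifetimes equal k >= 1 with probability f[k]."""
    u = [1.0] + [0.0] * n_max        # u[0] = 1: renewal at time 0 (Z_0 = 0)
    for n in range(1, n_max + 1):
        # Condition on the first lifetime k, then restart at time k.
        u[n] = sum(p * u[n - k] for k, p in f.items() if k <= n)
    return u

# Period d = 1, mean mu = 2.1, so u[n] should approach 1/2.1.
aperiodic = {1: 0.2, 2: 0.5, 3: 0.3}
u1 = renewal_probabilities(aperiodic, 200)

# Period d = 2, mean mu = 3: u[n] vanishes at odd n and tends to 2/3 at even n.
periodic = {2: 0.5, 4: 0.5}
u2 = renewal_probabilities(periodic, 201)
```

In the second example renewals can occur only at multiples of $d = 2$, which is why the limit is taken along the times $nd$ in (T.9.16).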

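Note that assuming that the limit exists for each $h > 0$, the value of the limit in
(i), likewise (ii), of Blackwell's theorem can easily be identified from the elementary
renewal theorem (T.9.6) by noting that $\varphi(h) := \lim_{t\to\infty} EN(t, t + h]$ must then be
linear in $h$, $\varphi(0) = 0$, and

$$\begin{aligned}
\varphi(1) &= \lim_{n\to\infty} \{EN_{n+1} - EN_n\} \\
&= \lim_{n\to\infty} \frac{1}{n} \bigl[ \{EN_1 - EN_0\} + \{EN_2 - EN_1\} + \cdots + \{EN_n - EN_{n-1}\} \bigr] \\
&= \lim_{n\to\infty} \frac{EN_n}{n} = \frac{1}{\mu}.
\end{aligned}$$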


Recently, coupling constructions have been successfully used to prove Blackwell's
renewal theorem. The basic idea, analogous to that outlined in Exercise 6.6 for
another convergence problem, is to watch both the given delayed renewal and an
equilibrium renewal with the same renewal distribution F until their renewal times
begin (approximately) to agree. After sufficient agreement is established, then one
expects the statistics of the given delayed renewal process to resemble more and more
those of the equilibrium process from that time onwards. The details for this argument
follow. A somewhat stronger equilibrium property than (T.9.9) will be invoked, but
not verified until Chapter IV.

Proof. To make the coupling idea precise for the case of Blackwell's theorem with
$\mu < \infty$, let $\{Z_n: n \ge 1\}$ and $\{\tilde Z_n: n \ge 1\}$ denote two independent sequences of renewal
lifetime random variables with common distribution $F$, and let $Z_0$ and $\tilde Z_0$ be
independent delays for the two sequences having distributions $G$ and $\tilde G = F_\infty$,
respectively. The tilde ($\sim$) will be used in reference to quantities associated with the
latter (equilibrium) process. Let $\varepsilon > 0$ and define

$$\nu(\varepsilon) = \inf\{n \ge 0: |S_n - \tilde S_{\tilde n}| \le \varepsilon \text{ for some } \tilde n\}, \tag{T.9.18}$$

$$\tilde\nu(\varepsilon) = \inf\{\tilde n \ge 0: |S_n - \tilde S_{\tilde n}| \le \varepsilon \text{ for some } n\}. \tag{T.9.19}$$

Suppose we have established that ($\varepsilon$-recurrence) $P(\nu(\varepsilon) < \infty) = 1$ (i.e., the coupling will
occur). Since the event $\{\nu(\varepsilon) = n, \tilde\nu(\varepsilon) = \tilde n\}$ is determined by $Z_0, Z_1, \ldots, Z_n$ and
$\tilde Z_0, \tilde Z_1, \ldots, \tilde Z_{\tilde n}$, the sequence of lifetimes $\{\tilde Z_{\tilde\nu(\varepsilon)+k}: k \ge 1\}$ may be replaced by the
sequence $\{Z_{\nu(\varepsilon)+k}: k \ge 1\}$ without changing the distributions of $\{\tilde S_n\}$, $\{\tilde N_t\}$, etc. Then,
after such a modification for $\varepsilon < h/2$, observe with the aid of a simple figure that

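$$\tilde N(t + \varepsilon, t + h - \varepsilon]\,1_{\{S_{\nu(\varepsilon)} \le t\}} \le N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} \le t\}} \le \tilde N(t - \varepsilon, t + h + \varepsilon]\,1_{\{S_{\nu(\varepsilon)} \le t\}}.$$

Therefore,

$$\begin{aligned}
&\tilde N(t + \varepsilon, t + h - \varepsilon] - \tilde N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}} \\
&\quad = \tilde N(t + \varepsilon, t + h - \varepsilon]\,1_{\{S_{\nu(\varepsilon)} \le t\}} + \bigl(\tilde N(t + \varepsilon, t + h - \varepsilon] - \tilde N(t, t + h]\bigr)\,1_{\{S_{\nu(\varepsilon)} > t\}} \\
&\quad \le \tilde N(t + \varepsilon, t + h - \varepsilon]\,1_{\{S_{\nu(\varepsilon)} \le t\}} \\
&\quad \le N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} \le t\}} \\
&\quad \le N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} \le t\}} + N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}} \quad (= N(t, t + h]) \\
&\quad \le \tilde N(t - \varepsilon, t + h + \varepsilon]\,1_{\{S_{\nu(\varepsilon)} \le t\}} + N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}} \\
&\quad \le \tilde N(t - \varepsilon, t + h + \varepsilon] + N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}.
\end{aligned} \tag{T.9.21}$$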

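Taking expected values and noting the first, fifth, and seventh lines, we have the
following coupling inequality,

$$E\tilde N(t + \varepsilon, t + h - \varepsilon] - E\bigl(\tilde N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}\bigr) \le EN(t, t + h] \le E\tilde N(t - \varepsilon, t + h + \varepsilon] + E\bigl(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}\bigr). \tag{T.9.22}$$

Using (T.9.9),

$$E\tilde N(t + \varepsilon, t + h - \varepsilon] = \frac{h - 2\varepsilon}{\mu} \quad \text{and} \quad E\tilde N(t - \varepsilon, t + h + \varepsilon] = \frac{h + 2\varepsilon}{\mu},$$

so that

$$\Bigl| EN(t, t + h] - \frac{h}{\mu} \Bigr| \le \frac{2\varepsilon}{\mu} + E\bigl(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}\bigr). \tag{T.9.23}$$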

Since $A = \{S_{\nu(\varepsilon)} > t\}$ is independent of $\{Z_{N_t + k}: k \ge 1\}$, we have

$$E\bigl(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}\bigr) \le E_0 N_h\,P(S_{\nu(\varepsilon)} > t) = m(h)\,P(S_{\nu(\varepsilon)} > t), \tag{T.9.24}$$

where $E_0$ denotes expected value for the process $N_h$ under zero delay. More precisely,
because $(t, t + h] \subset (t, S_{N_t} + h]$ and there are no renewals in $(t, S_{N_t})$, we have
$N(t, t + h] \le \inf\{k \ge 0: Z_{N_t + k} > h\}$. In particular, noting (T.9.2), this upper bound
by an ordinary (zero-delay) renewal process with renewal distribution $F$ is
independent of the event $A$, and furnishes the desired estimate (T.9.24).
Now from (T.9.23) and (T.9.24) we have the estimate

$$\Bigl| EN(t, t + h] - \frac{h}{\mu} \Bigr| \le m(h)\,P(S_{\nu(\varepsilon)} > t) + \frac{2\varepsilon}{\mu}, \tag{T.9.25}$$

which is enough, since $\varepsilon > 0$ is arbitrary, provided that the initial $\varepsilon$-recurrence
assumption, $P(\nu(\varepsilon) < \infty) = 1$, can be established. So, the bulk of the proof rests on
showing that the coupling will eventually occur. The probability $P(\nu(\varepsilon) < \infty)$ can be
analyzed separately for each of the two cases (i) and (ii) of Theorem T.9.1.
First take the lattice case (ii) with lattice spacing (period) $d$. Note that for $\varepsilon < d$,

$$\nu(\varepsilon) = \nu(0) = \inf\{n \ge 0: S_n - \tilde S_{\tilde n} = 0 \text{ for some } \tilde n\}. \tag{T.9.26}$$

Also, by recurrence of the mean-zero random walk on the integers (theoretical
complement 3.1 of Chapter I), we have $P(\nu(0) < \infty) = 1$. Moreover, $S_{\nu(0)}$ is a.s. a
finite integral multiple of $d$. Taking $t = nd$ and $h = d$ with $\varepsilon = 0$, we have

$$EN^{(n)} = EN(nd, nd + h] \to \frac{h}{\mu} = \frac{d}{\mu} \quad \text{as } n \to \infty. \tag{T.9.27}$$

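For case (i), observe by the Hewitt–Savage zero–one law (theoretical complement
1.2 of Chapter I) applied to the i.i.d. sequence $(Z_1, \tilde Z_1), (Z_2, \tilde Z_2), (Z_3, \tilde Z_3), \ldots$, that

$$P(R_n < \varepsilon \text{ i.o.} \mid \tilde Z_0 = z) = 0 \text{ or } 1,$$

where

$$R_n = \min\{\tilde S_m - S_n: \tilde S_m - S_n > 0,\ m \ge 0\} = \tilde S_{\tilde N_{S_n}} - S_n.$$

Now, the distribution of $\{\tilde S_{\tilde N_t + j} - t\}_{j \ge 0}$ does not depend on $t$ (Exercise 7.5, Chapter
IV). This, independence of $\{\tilde Z_j\}$ and $\{S_n\}$, and the fact that the distribution of
$\{S_{n+k} - S_k: n \ge 0\}$ does not depend on $k$, make $\{R_{n+k}\}_{n \ge 0}$ also have distribution
independent of $k$. Therefore, the probability $P(R_n < \varepsilon \text{ for some } n \ge k)$ does not depend
on $k$, and thus

$$\{R_n < \varepsilon \text{ i.o.}\} = \bigcap_{k=0}^{\infty} \{R_n < \varepsilon \text{ for some } n \ge k\} \tag{T.9.28}$$

implies $P(R_n < \varepsilon \text{ i.o.}) = P(R_n < \varepsilon \text{ for some } n \ge 0) \le P(\nu(\varepsilon) < \infty)$. Now,

$$\begin{aligned}
\int_0^\infty P(R_n < \varepsilon \text{ i.o.} \mid \tilde Z_0 = z)\,P(\tilde Z_0 \in dz)
&= P(R_n < \varepsilon \text{ i.o.}) = P(R_n < \varepsilon \text{ for some } n) \\
&= \int_0^\infty P(R_n < \varepsilon \text{ for some } n \mid \tilde Z_0 = z)\,P(\tilde Z_0 \in dz).
\end{aligned} \tag{T.9.29}$$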

The proof that $P(R_n < \varepsilon \text{ for some } n \mid \tilde Z_0 = z) > 0$ (and therefore is 1) in (T.9.29)
follows from a final technical lemma given below on "points of increase" of distribution
functions of sums of i.i.d. nonlattice positive random variables; a point $x$ is called a
point of increase of a distribution function $F$ if $F(b) - F(a) > 0$ whenever $a < x < b$.

Lemma. Let $F$ be a nonlattice distribution function on $(0, \infty)$. The set $E$ of points
of increase of the functions $F, F^{*2}, F^{*3}, \ldots$ is "asymptotically dense at $\infty$" in the
sense that for any $\varepsilon > 0$ and $x$ sufficiently large, $E \cap (x, x + \varepsilon) \ne \emptyset$, i.e., the interval
$(x, x + \varepsilon)$ meets $E$ for $x$ sufficiently large. □

The following proof follows that in W. Feller (1971), An Introduction to Probability
Theory and Its Applications, 2nd ed., Wiley, New York, p. 147.

Proof. Let $a, b \in E$, $0 < a < b$, such that $b - a < \varepsilon$. Let $I_n = (na, nb]$. For
$a < n(b - a)$, the interval $I_n$ properly contains $(na, (n + 1)a)$, and therefore each
$x > a^2/(b - a)$ belongs to some $I_n$, $n \ge 1$. Since $E$ is easily checked to be closed under
addition, the $n + 1$ points $na + k(b - a)$, $k = 0, 1, \ldots, n$, belong to $E$ and partition
$I_n$ into $n$ subintervals of length $b - a < \varepsilon$. Thus each $x > a^2/(b - a)$ is at a distance
$\le (b - a)/2 < \varepsilon/2$ of $E$. If for some $\varepsilon > 0$, $b - a \ge \varepsilon$ for all $a, b \in E$ with $a < b$, then $F$
must be a lattice distribution. To see this say, without loss of generality, $\varepsilon \le b - a < 2\varepsilon$ for
some $a, b \in E$. Then $E \cap I_n \subset \{na + k(b - a): k = 0, 1, \ldots, n\}$. Since $(n + 1)a \in E \cap I_n$
for $a < n(b - a)$, $E \cap I_n$ must consist of multiples of $(b - a)$. Thus, if $c \in E$ then
$c + k(b - a) \in I_n \cap E$ for $n$ sufficiently large. Thus $c$ is a multiple of $(b - a)$. ∎

Coupling approaches to the renewal theorem on which the preceding is based can
be found in the papers of H. Thorisson (1987), "A Complete Coupling Proof of
Blackwell's Renewal Theorem," Stoch. Proc. Appl., 26, pp. 87-97; K. Athreya, D.
McDonald, P. Ney (1978), "Coupling and the Renewal Theorem," Amer. Math.
Monthly, 85, pp. 809-814; T. Lindvall (1977), "A Probabilistic Proof of Blackwell's
Renewal Theorem," Ann. Probab., 5, pp. 482-485.
3. (Birkhoff's Ergodic Theorem) Suppose $\{X_n: n \ge 0\}$ is a stochastic process on
$(\Omega, \mathcal{F}, P)$ with values in $(S, \mathcal{S})$. The process $\{X_n\}$ is (strictly) stationary if for every
pair of integers $m \ge 0$, $r \ge 1$, the distribution of $(X_0, X_1, \ldots, X_m)$ is the same as that
of $(X_r, X_{1+r}, \ldots, X_{m+r})$. An equivalent definition is: $\{X_n\}$ is stationary if the
distribution $\mu$, say, of $X := (X_0, X_1, X_2, \ldots)$ is the same as that of $T^r X := (X_r, X_{1+r}, X_{2+r}, \ldots)$
for all $r \ge 0$. Recall that the distribution of $(X_r, X_{1+r}, \ldots)$ is the probability measure
induced on $(S^\infty, \mathcal{S}^{\otimes\infty})$ by the map $\omega \to (X_r(\omega), X_{1+r}(\omega), X_{2+r}(\omega), \ldots)$. Here $S^\infty$ is
the space of all sequences $x = (x_0, x_1, x_2, \ldots)$ with $x_i \in S$ for all $i$, and $\mathcal{S}^{\otimes\infty}$ is the
smallest sigmafield containing the class of all sets of the form $C = \{x \in S^\infty: x_i \in B_i$ for
$0 \le i \le n\}$, where $n \ge 0$ and $B_i \in \mathcal{S}$ ($0 \le i \le n$) are arbitrary. The shift transformation
$T$ is defined by $Tx := (x_1, x_2, \ldots)$ on $S^\infty$, so that $T^r x = (x_r, x_{1+r}, x_{2+r}, \ldots)$.

Denote by $\mathcal{G}$ the sigmafield generated by $\{X_n: n \ge 0\}$. That is, $\mathcal{G}$ is the class of all
sets of the form $G = X^{-1}C = \{\omega \in \Omega: X(\omega) \in C\}$, $C \in \mathcal{S}^{\otimes\infty}$. For a set $G$ of this form,
write $T^{-1}G = \{\omega \in \Omega: TX(\omega) \in C\} = \{(X_1, X_2, \ldots) \in C\} = \{X \in T^{-1}C\}$. Such a set
$G$ is said to be invariant if $P(G \,\Delta\, T^{-1}G) = 0$, where $\Delta$ denotes the symmetric difference.
By iteration it follows that if $G = \{X \in C\}$ is invariant then $P(G \,\Delta\, T^{-r}G) = 0$ for all
$r > 0$, where $T^{-r}G = \{(X_r, X_{1+r}, X_{2+r}, \ldots) \in C\}$. Let $f$ be a real-valued measurable
function on $(S^\infty, \mathcal{S}^{\otimes\infty})$. Then $\varphi(\omega) := f(X(\omega))$ is $\mathcal{G}$-measurable and, conversely, all
$\mathcal{G}$-measurable functions are of this form. Such a function $\varphi = f(X)$ is invariant if
$f(X) = f(TX)$ a.s. Note that $G = \{X \in C\}$ is invariant if and only if $1_G = 1_C(X)$ is
invariant. Again, by iteration, if $f(X)$ is invariant then $f(X) = f(T^r X)$ a.s. for all $r \ge 1$.
Given any $\mathcal{G}$-measurable real-valued function $\varphi = f(X)$, the functions (extended
real-valued)

$$\overline{f}(X) := \limsup_{n\to\infty}\ n^{-1}\bigl(f(X) + f(TX) + \cdots + f(T^{n-1}X)\bigr),$$

$$\underline{f}(X) := \liminf_{n\to\infty}\ n^{-1}\bigl(f(X) + f(TX) + \cdots + f(T^{n-1}X)\bigr)$$

are invariant, and the set $\{\overline{f}(X) = \underline{f}(X)\}$ is invariant.

The class $\mathcal{I}$ of all invariant sets (in $\mathcal{G}$) is easily seen to be a sigmafield, which is
called the invariant sigmafield. The invariant sigmafield $\mathcal{I}$ is said to be trivial if
$P(G) = 0$ or 1 for every $G \in \mathcal{I}$.

Definition T.9.1. The process $\{X_n: n \ge 0\}$ and the shift transformation $T$ are said
to be ergodic if $\mathcal{I}$ is trivial.

The next result is an important generalization of the classical strong law of large
numbers (Chapter 0, Theorem 6.1).

Theorem T.9.2. (Birkhoff's Ergodic Theorem). Let $\{X_n: n \ge 0\}$ be a stationary
sequence on the state space $S$ (having sigmafield $\mathcal{S}$). Let $f(X)$ be a real-valued
$\mathcal{G}$-measurable function such that $E|f(X)| < \infty$. Then
(a) $n^{-1} \sum_{r=0}^{n-1} f(T^r X)$ converges a.s. and in $L^1$ to an invariant random variable $g(X)$,
(b) $g(X) = Ef(X)$ a.s. if $\mathcal{I}$ is trivial.

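We first need an inequality whose derivation below follows A. M. Garcia (1965),
"A Simple Proof of E. Hopf's Maximal Ergodic Theorem," J. Math. Mech., 14,
pp. 381-382. Write

$$M_n(f) := \max\{0,\ f(X),\ f(X) + f(TX),\ \ldots,\ f(X) + \cdots + f(T^{n-1}X)\},$$

$$M_n(f \circ T) := \max\{0,\ f(TX),\ f(TX) + f(T^2X),\ \ldots,\ f(TX) + \cdots + f(T^nX)\},$$

$$M(f) := \lim_{n\to\infty} M_n(f) = \sup_{n \ge 1} M_n(f).$$

Proposition T.9.3. (Maximal Ergodic Theorem). Under the hypothesis of Theorem
T.9.2,

$$\int_{\{M(f) > 0\} \cap G} f(X)\,dP \ge 0 \qquad \forall\, G \in \mathcal{I}. \tag{T.9.31}$$

Proof. Note that $f(X) + M_n(f \circ T) = M_{n+1}(f)$ on the set $\{M_{n+1}(f) > 0\}$. Since
$M_{n+1}(f) \ge M_n(f)$ and $\{M_n(f) > 0\} \subset \{M_{n+1}(f) > 0\}$, it follows that $f(X) \ge
M_n(f) - M_n(f \circ T)$ on $\{M_n(f) > 0\}$. Also, $M_n(f) \ge 0$, $M_n(f \circ T) \ge 0$. Therefore,

$$\begin{aligned}
\int_{\{M_n(f) > 0\} \cap G} f(X)\,dP &\ge \int_{\{M_n(f) > 0\} \cap G} \bigl(M_n(f) - M_n(f \circ T)\bigr)\,dP \\
&= \int_G M_n(f)\,dP - \int_{\{M_n(f) > 0\} \cap G} M_n(f \circ T)\,dP \\
&\ge \int_G M_n(f)\,dP - \int_G M_n(f \circ T)\,dP \\
&= 0,
\end{aligned}$$

where the last equality follows from the invariance of $G$ and the stationarity of $\{X_n\}$.
Thus, (T.9.31) holds with $\{M_n(f) > 0\}$ in place of $\{M(f) > 0\}$. Now let $n \uparrow \infty$. ∎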

Now consider the quantities

$$A_n(f) := \max\Bigl\{ f(X),\ \tfrac{1}{2}\bigl(f(X) + f(TX)\bigr),\ \ldots,\ \frac{1}{n} \sum_{r=0}^{n-1} f(T^r X) \Bigr\},$$

$$A(f) := \lim_{n\to\infty} A_n(f) = \sup_{n \ge 1} A_n(f).$$

The following is a consequence of Proposition T.9.3.

Corollary T.9.4. (Ergodic Maximal Inequality). Under the hypothesis of Theorem
T.9.2 one has, for every $c \in \mathbb{R}^1$,

$$\int_{\{A(f) > c\} \cap G} f(X)\,dP \ge c\,P(\{A(f) > c\} \cap G) \qquad \forall\, G \in \mathcal{I}. \tag{T.9.32}$$

Proof. Apply Proposition T.9.3 to the function $f - c$ to get

$$\int_{\{M(f - c) > 0\} \cap G} f(X)\,dP \ge c\,P(\{M(f - c) > 0\} \cap G).$$

But $\{M_n(f - c) > 0\} = \{A_n(f - c) > 0\} = \{A_n(f) > c\}$, and $\{M(f - c) > 0\} =
\{A(f) > c\}$. ∎

We are now ready to prove Theorem T.9.2, using (T.9.31).

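Proof of Theorem T.9.2. Write

$$\overline{f}(X) := \limsup_{n\to\infty} \frac{1}{n} \sum_{r=0}^{n-1} f(T^r X), \qquad
\underline{f}(X) := \liminf_{n\to\infty} \frac{1}{n} \sum_{r=0}^{n-1} f(T^r X),$$

$$G_{c,d}(f) := \{\overline{f}(X) > c,\ \underline{f}(X) < d\} \qquad (c, d \in \mathbb{R}^1). \tag{T.9.33}$$

Since $G_{c,d}(f) \in \mathcal{I}$ and $G_{c,d}(f) \subset \{A(f) > c\}$, (T.9.32) leads to

$$\int_{G_{c,d}(f)} f(X)\,dP \ge c\,P(G_{c,d}(f)). \tag{T.9.34}$$

Now take $-f$ in place of $f$ and note that $\overline{(-f)} = -\underline{f}$, $\underline{(-f)} = -\overline{f}$,
$G_{-d,-c}(-f) = G_{c,d}(f)$ to get from (T.9.34) the inequality

$$-\int_{G_{c,d}(f)} f(X)\,dP \ge -d\,P(G_{c,d}(f)),$$

i.e.,

$$\int_{G_{c,d}(f)} f(X)\,dP \le d\,P(G_{c,d}(f)). \tag{T.9.35}$$

Now if $c > d$, then (T.9.34) and (T.9.35) cannot both be true unless $P(G_{c,d}(f)) = 0$.
Thus, if $c > d$, then $P(G_{c,d}(f)) = 0$. Apply this to all pairs of rationals $c > d$ to get
$P(\overline{f}(X) > \underline{f}(X)) = 0$. In other words, $(1/n) \sum_{r=0}^{n-1} f(T^r X)$ converges a.s. to
$g(X) := \overline{f}(X)$.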
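To complete the proof of part (a), it is enough to assume $f \ge 0$, since
$n^{-1} \sum_{r=0}^{n-1} f^+(T^r X) \to \overline{f^+}(X)$ a.s. and $n^{-1} \sum_{r=0}^{n-1} f^-(T^r X) \to \overline{f^-}(X)$ a.s., where
$f^+ = \max\{f, 0\}$, $-f^- = \min\{f, 0\}$. Assume then $f \ge 0$. First, by Fatou's Lemma
and stationarity of $\{X_n\}$,

$$E\overline{f}(X) \le \lim_{n\to\infty} E\,\frac{1}{n} \sum_{r=0}^{n-1} f(T^r X) = Ef(X) < \infty.$$

To prove the $L^1$-convergence, it is enough to prove the uniform integrability of the
sequence $\{(1/n) S_n(f): n \ge 1\}$, where $S_n(f) := \sum_{r=0}^{n-1} f(T^r X)$. Now since $f(X)$ is
nonnegative and integrable, given $\varepsilon > 0$ there exists a constant $N_\varepsilon$ such that
$\|f(X) - f_\varepsilon(X)\|_1 < \varepsilon$, where $f_\varepsilon(X) := \min\{f(X), N_\varepsilon\}$. Then

$$\begin{aligned}
\int_{\{\frac{1}{n} S_n(f) > \lambda\}} \frac{1}{n} S_n(f)\,dP
&\le \int_{\{\frac{1}{n} S_n(f) > \lambda\}} \frac{1}{n} S_n(f - f_\varepsilon)\,dP + \int_{\{\frac{1}{n} S_n(f) > \lambda\}} \frac{1}{n} S_n(f_\varepsilon)\,dP \\
&\le \varepsilon + N_\varepsilon\,P\bigl(\{\tfrac{1}{n} S_n(f) > \lambda\}\bigr) \\
&\le \varepsilon + N_\varepsilon\,Ef(X)/\lambda.
\end{aligned} \tag{T.9.36}$$

It follows that the left side of (T.9.36) goes to zero as $\lambda \to \infty$, uniformly for all $n$.
Part (b) is an immediate consequence of part (a). ∎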

Notice that part (a) of Theorem T.9.2 also implies that $g(X) = E(f(X) \mid \mathcal{I})$.
Theorem T.9.2 is generally stated for any transformation $T$ on a probability space
$(\Omega, \mathcal{G}, \mu)$ satisfying $\mu(T^{-1}G) = \mu(G)$ for all $G \in \mathcal{G}$. Such a transformation is called
measure-preserving. If in this case we take $X$ to be the identity map: $X(\omega) = \omega$, then
parts (a) and (b) hold without any essential change in the proof.
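A standard measure-preserving example is the rotation $Tx = x + \alpha \pmod 1$ on $[0, 1)$ with Lebesgue measure, which is ergodic when $\alpha$ is irrational. The sketch below compares the time average of a function along one orbit with its space average; the rotation angle, the test function, and the starting point are assumptions chosen for the illustration and do not come from the text.

```python
import math

def time_average(f, x0, alpha, n):
    """Average of f over the orbit x0, T x0, ..., T^(n-1) x0 of the
    rotation T x = x + alpha (mod 1)."""
    return sum(f((x0 + k * alpha) % 1.0) for k in range(n)) / n

alpha = math.sqrt(2) - 1                       # irrational rotation angle
f = lambda x: math.cos(2 * math.pi * x) ** 2   # integral over [0, 1) is 1/2
avg = time_average(f, 0.1, alpha, 100_000)
```

Here `avg` should be very close to the space average 1/2, in line with part (b) of the theorem. A rational `alpha` would give a periodic orbit, and the time average would then depend on `x0`.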

Theoretical Complements to Section II.10

1. To prove the FCLT for Markov-dependent summands as asserted in Theorem 10.2,

first consider

X= + +Z

Since Z i , . are i.i.d. with finite second moment, the FCLT of Chapter I provides
that {X;"} converges in distribution to standard Brownian motion. The corresponding
result for {W"(t)} follows by an application of the Maximal Inequality to show

sup I X;" W " E . iJ (t)I - 0 in probability as n -+ cc, (T.10.1)


where t is the first return time to j.

Theoretical Complements to Section II.12

There are specifications of local structure that are defined in a natural manner but
for which there are no Gibbs states having the given structure when, for example,
$\Lambda = \mathbb{Z}$, but $S$ is not finite. As an example, one can take $q$ to be the transition matrix
of a (general) random walk on $S = \mathbb{Z}$ such that $q_{ij} = q_{j-i} > 0$ for all $i, j$. In this case
no probability distribution on $S^\Lambda$ exists having the local structure furnished by (12.10).

For proofs, refer to the papers of F. Spitzer (1974), "Phase Transition in One-
Dimensional Nearest-Neighbor Systems," J. Functional Analysis, 20, pp. 240-254;
H. Kesten (1975), "Existence and Uniqueness of Countable One-Dimensional Markov
Random Fields," Ann. Probab., 4, pp. 557-569. The treatment here follows F. Spitzer
(1971), "Random Fields and Interacting Particle Systems," MAA Lecture Notes,
Washington, D.C.

Theoretical Complements to Section II.14

1. (Markov Processes and Iterations of I.I.D. Maps) Let $p(x; dy)$ denote a transition
probability on a state space $(S, \mathcal{S})$; that is, (1) for each $x \in S$, $p(x; dy)$ is a probability
measure on $(S, \mathcal{S})$, and (2) for each $B \in \mathcal{S}$, $x \to p(x; B)$ is $\mathcal{S}$-measurable. We will
assume that $S$ is a Borel subset of a complete separable metric space, and $\mathcal{S}$ its Borel
sigmafield $\mathcal{B}(S)$. It may be shown that $S$ may be "relabeled" as a Borel subset $C$ of
$[0, 1]$, with $\mathcal{B}(C)$ as the relabeling of $\mathcal{B}(S)$. (See H. L. Royden (1968), Real Analysis,
2nd ed., Macmillan, New York, pp. 326-327.) Therefore, without any essential loss
of generality, we take $S$ to be a Borel subset of $[0, 1]$. For each $x \in S$, let $F_x(\cdot)$
denote the distribution function of $p(x; dy)$: $F_x(y) := p(x; S \cap (-\infty, y])$. Define
$F_x^{-1}(t) := \inf\{y \in \mathbb{R}^1: F_x(y) > t\}$. Let $U$ be a random variable defined on some
probability space $(\Omega, \mathcal{F}, P)$, whose distribution is uniform on $(0, 1)$. Then it is
simple to check that $P(F_x^{-1}(U) \le y) \ge P(F_x(y) > U) = P(U < F_x(y)) = F_x(y)$, and
$P(F_x^{-1}(U) \le y) \le P(F_x(y) \ge U) = P(U \le F_x(y)) = F_x(y)$. Therefore, $P(F_x^{-1}(U) \le y) =
F_x(y)$, that is, the distribution of $F_x^{-1}(U)$ is $p(x; dy)$. Now let $U_1, U_2, \ldots$ be a sequence
of i.i.d. random variables on $(\Omega, \mathcal{F}, P)$, each having the uniform distribution on $(0, 1)$.
Let $X_0$ be a random variable with values in $S$, independent of $\{U_n\}$. Define
$X_{n+1} = f(X_n, U_{n+1})$ ($n \ge 0$), where $f(x, u) := F_x^{-1}(u)$. It then follows from the above
that $\{X_n: n \ge 0\}$ is a Markov process having transition probability $p(x; dy)$, and initial
distribution that of $X_0$.
Of course, this type of representation of a Markov process having a given transition
probability and a given initial distribution is not unique.
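The quantile construction above translates directly into a simulation recipe: draw i.i.d. uniforms and apply the map $f(x, u) = \inf\{y: F_x(y) > u\}$ repeatedly. The following sketch does this for a finite state space inside $[0, 1]$; the three states and the transition probabilities are hypothetical values chosen only to make the example concrete.

```python
import random

STATES = [0.0, 0.5, 1.0]                 # S, a finite subset of [0, 1]
P = {0.0: [0.5, 0.5, 0.0],               # hypothetical p(x; {y}), rows sum to 1
     0.5: [0.25, 0.5, 0.25],
     1.0: [0.0, 0.5, 0.5]}

def quantile(x, t):
    """f(x, t) = inf{y : F_x(y) > t} for the discrete law p(x; .)."""
    cumulative = 0.0
    for y, prob in zip(STATES, P[x]):
        cumulative += prob               # cumulative = F_x(y)
        if cumulative > t:
            return y
    return STATES[-1]                    # guards against rounding near t = 1

def simulate(x0, n_steps, seed=3):
    """Iterate X_{n+1} = f(X_n, U_{n+1}) with i.i.d. uniform U_n."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(quantile(path[-1], rng.random()))
    return path
```

Along a long path, the empirical frequency of each transition should be close to the corresponding entry of `P`; for example, about a quarter of the visits to 0.5 should be followed by a move to 0.0.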
For additional information, see R. M. Blumenthal and H. K. Corson (1972), "On
Continuous Collections of Measures," Proc. 6th Berkeley Symposium on Math. Stat.
and Prob., Vol. 2, pp. 33-40.
2. Example 1 is essentially due to L. E. Dubins and D. A. Freedman (1966), "Invariant
Probabilities for Certain Markov Processes," Ann. Math. Statist., 37, pp. 837-847.
The assumption of continuity of the maps is not needed, as shown in J. A. Yahav
(1975), "On a Fixed Point Theorem and Its Stochastic Equivalent," J. Appl.
Probability, 12, pp. 605-611. An extension to multidimensional state space with an
application to time series models may be found in R. N. Bhattacharya and O. Lee
(1988), "Asymptotics of a Class of Markov Processes Which Are Not in General
Irreducible," Ann. Probab., 16, pp. 1333-1347. Example 1(a) may be found in
L. J. Mirman (1980), "One Sector Economic Growth and Uncertainty: A Survey,"
Stochastic Programming (M. A. H. Dempster, ed.), Academic Press, New York.
It is shown in theoretical complement 3 below that the existence of a unique
invariant probability implies ergodicity of a stationary Markov process. The SLLN
then follows from Birkhoff's Ergodic Theorem (see Theorem T.9.2). Central limit
theorems for normalized partial sums may be derived for appropriate functions on
the state space, by Theorem T.13.3 in the theoretical complements of Chapter V.
Also see Bhattacharya and Lee, loc. cit.
Example 3 is due to M. Majumdar and R. Radner, unpublished manuscript.
K. S. Chan and H. Tong (1985), "On the Use of Deterministic Lyapunov Function
for the Ergodicity of Stochastic Difference Equations," Advances in Appl. Probability,
17, pp. 666-678, consider iterations of i.i.d. piecewise linear maps.
3. (Irreducible Markov Processes) A transition probability $p(x; dy)$ on the state space
$(S, \mathcal{S})$ is said to be $\varphi$-irreducible with respect to a sigmafinite nonzero measure $\varphi$ if,
for each $x \in S$ and $B \in \mathcal{S}$ with $\varphi(B) > 0$, there exists an integer $n = n(x, B)$ such that
$p^{(n)}(x, B) > 0$. There is an extensive literature on the asymptotics of $\varphi$-irreducible
Markov processes. We mention in particular, N. Jain and B. Jamison (1967),
"Contributions to Doeblin's Theorem of Markov Processes," Z. Wahrscheinlichkeitstheorie
und Verw. Gebiete, 8, pp. 19-40; S. Orey (1971), Limit Theorems for Markov
Chain Transition Probabilities, Van Nostrand, New York; R. L. Tweedie (1975),
"Sufficient Conditions for Ergodicity and Recurrence of Markov Chains on a General
State Space," Stochastic Process. Appl., 3, pp. 385-403. Irreducible Markov chains
on countable state spaces $S$ are the simplest examples of $\varphi$-irreducible processes; here
$\varphi$ is the counting measure, $\varphi(B) :=$ number of points in $B$. Some other examples are
given in theoretical complements to Section II.6.
There is no general theory that applies if $p(x; dy)$ is not $\varphi$-irreducible, for any
sigmafinite $\varphi$. The method of iterated maps provides one approach, when the Markov
process arises naturally in this manner. A simple example of a nonirreducible $p$ is
given by Example 2. Another example, in which $p$ admits a unique invariant
probability, is provided by the simple linear model: $X_{n+1} = \tfrac{1}{2} X_n + \varepsilon_{n+1}$, where $\varepsilon_n$ are
i.i.d., $P(\varepsilon_n = \tfrac{1}{2}) = P(\varepsilon_n = -\tfrac{1}{2}) = \tfrac{1}{2}$.

4. (Ergodicity, SLLN, and the Uniqueness of Invariant Probabilities) Suppose
$\{X_n: n \ge 0\}$ is a stationary Markov process on a state space $(S, \mathcal{S})$, having a transition
probability $p(x; dy)$ and an invariant initial distribution $\pi$. We will prove the following
result: The process $\{X_n\}$ is ergodic if and only if there does not exist an invariant
probability $\pi'$ that is absolutely continuous with respect to $\pi$ and different from $\pi$.
The crucial step in the proof is to show that every (shift) invariant bounded
measurable function $h(X)$ is a.s. equal to a random variable $g(X_0)$ where $g$ is a
measurable function on $(S, \mathcal{S})$. Here $X = (X_0, X_1, X_2, \ldots)$, and we let $T$ denote the
shift transformation and $\mathcal{I}$ the (shift) invariant sigmafield (see theoretical complement
9.3). Now if $h(X)$ is invariant, $h(X) = h(T^n X)$ a.s. for all $n \ge 1$. Then, by the Markov
property, $E(h(X) \mid \sigma\{X_0, \ldots, X_n\}) = E(h(T^n X) \mid \sigma\{X_0, \ldots, X_n\}) = E(h(T^n X) \mid \sigma\{X_n\}) =
g(X_n)$, where $g(x) = E(h(X_0, X_1, \ldots) \mid X_0 = x)$. By the Martingale Convergence
Theorem (see theoretical complement 5.1 to Chapter IV, Theorem T.5.2), applied to
the martingale $g(X_n) = E(h(X) \mid \sigma\{X_0, \ldots, X_n\})$, $g(X_n)$ converges a.s., and in $L^1$, to
$E(h(X) \mid \sigma\{X_0, X_1, \ldots\}) = h(X)$. But $g(X_n) - h(X) = g(X_n) - h(T^n X)$ has the same
distribution as $g(X_0) - h(X)$ for all $n \ge 1$. Therefore, $g(X_0) - h(X) = 0$ a.s., since the
limit of $g(X_n) - h(X)$ is zero a.s. In particular, if $G \in \mathcal{I}$ then there exists $B \in \mathcal{S}$ such
that $\{X_0 \in B\} = G$ a.s. This implies $\pi(B) = P(X_0 \in B) = P(G)$. If $\{X_n\}$ is not ergodic,
then there exists $G \in \mathcal{I}$ such that $0 < P(G) < 1$ and, therefore, $0 < \pi(B) < 1$ for a
corresponding set $B \in \mathcal{S}$ as above. But the probability measure $\pi_B$ defined by
$\pi_B(A) = \pi(A \cap B)/\pi(B)$, $A \in \mathcal{S}$, is invariant. To see this observe that $\int p(x; A)\,\pi_B(dx) =
\int_B p(x; A)\,\pi(dx)/\pi(B) = P(X_0 \in B, X_1 \in A)/\pi(B) = P(X_1 \in B, X_1 \in A)/\pi(B)$ (since $\{X_0 \in B\}$
is invariant) $= P(X_0 \in A \cap B)/\pi(B)$ (by stationarity) $= \pi(A \cap B)/\pi(B) = \pi_B(A)$. Since
$\pi_B(B) = 1 > \pi(B)$, and $\pi_B$ is absolutely continuous with respect to $\pi$, one half of the
italicized statement is proved.
To prove the other half, suppose $\{X_n\}$ is ergodic and $\pi'$ is also invariant and
absolutely continuous with respect to $\pi$. Fix $A \in \mathcal{S}$. By Birkhoff's Ergodic Theorem,
and conditioning on $X_0$, $(1/n) \sum_{r=0}^{n-1} p^{(r)}(x; A)$ converges to $\pi(A)$ for all $x$ outside a set
of zero $\pi$-measure. Now the invariance of $\pi'$ implies $\int (1/n) \sum_{r=0}^{n-1} p^{(r)}(x; A)\,\pi'(dx) = \pi'(A)$
for all $n$. Therefore, $\pi'(A) = \pi(A)$. Thus $\pi' = \pi$, completing the proof.
As a very special case, the following strong law of large numbers (SLLN) for
Markov processes on general state spaces is obtained: If $p(x; dy)$ admits a unique
invariant probability $\pi$, and $\{X_n: n \ge 0\}$ is a Markov process with transition probability
$p$ and initial distribution $\pi$, then $(1/n) \sum_{r=0}^{n-1} f(X_r)$ converges to $\int f(x)\,\pi(dx)$ a.s. provided
that $\int |f(x)|\,\pi(dx) < \infty$. This also implies, by conditioning on $X_0$, that this almost
sure convergence holds under all initial states $x$ outside a set of zero $\pi$-measure.
5. (Ergodic Decomposition of a Compact State Space) Suppose $S$ is a compact metric
space and $\mathcal{S} = \mathcal{B}(S)$ its Borel sigmafield. Let $p(x; dy)$ be a transition probability on
$(S, \mathcal{B}(S))$ having the Feller property: $x \to p(x; dy)$ is weakly continuous on $S$ into
$\mathcal{P}(S)$, the set of all probability measures on $(S, \mathcal{B}(S))$. Let $T^*$ denote the map on
$\mathcal{P}(S)$ into $\mathcal{P}(S)$ defined by: $(T^*\mu)(B) = \int p(x; B)\,\mu(dx)$ ($B \in \mathcal{B}(S)$). Then $T^*$ is weakly
continuous. For if probability measures $\mu_n$ converge weakly to $\mu$ then, for every
real-valued bounded continuous $f$ on $S$, $\int f\,d(T^*\mu_n) = \int \bigl(\int f(y)\,p(x; dy)\bigr)\,\mu_n(dx)$
converges to $\int \bigl(\int f(y)\,p(x; dy)\bigr)\,\mu(dx) = \int f\,d(T^*\mu)$, since $x \to \int f(y)\,p(x; dy)$ is continuous
by the Feller property of $p$.
Let us show that under the above hypothesis there exists at least one invariant
probability for $p$. Fix $\mu \in \mathcal{P}(S)$. Consider the sequence of probability measures

$$\mu_n := \frac{1}{n} \sum_{r=0}^{n-1} T^{*r}\mu \qquad (n \ge 1),$$

where

$$T^{*0}\mu = \mu, \qquad T^{*1}\mu = T^*\mu, \qquad \text{and} \qquad T^{*(r+1)}\mu = T^*(T^{*r}\mu) \quad (r \ge 1).$$

Since $S$ is compact, by Prohorov's Theorem (see theoretical complement 8.2 of Chapter
I), there exists a subsequence $\{\mu_{n'}\}$ of $\{\mu_n\}$ such that $\mu_{n'}$ converges weakly to a
probability measure $\pi$, say. Then $T^*\mu_{n'}$ converges weakly to $T^*\pi$. On the other hand,

$$\Bigl| \int f\,d\mu_{n'} - \int f\,d(T^*\mu_{n'}) \Bigr| = \frac{1}{n'} \Bigl| \int f\,d\mu - \int f\,d(T^{*n'}\mu) \Bigr| \le \bigl(\sup\{|f(x)|: x \in S\}\bigr)(2/n') \to 0,$$

as $n' \to \infty$. Therefore, $\{\mu_{n'}\}$ and $\{T^*\mu_{n'}\}$ converge to the same limit. In other words,
$\pi = T^*\pi$, or $\pi$ is invariant. This also shows that on a compact metric space, and with
$p$ having the Feller property, if there exists a unique invariant probability $\pi$ then
$\mu_n = (1/n) \sum_{r=0}^{n-1} T^{*r}\mu$ converges weakly to $\pi$, no matter what (the initial
distribution) $\mu$ is.
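On a finite state space (compact under the discrete metric, with every kernel Feller) the Cesàro averages of the iterates of $T^*$ can be computed directly. The sketch below uses a hypothetical three-state periodic chain, for which the iterates themselves oscillate while their averages settle at the unique invariant probability; the matrix and the function names are assumptions made for the example.

```python
def apply_kernel(mu, P):
    """(T* mu)(j) = sum_i mu(i) p(i, j) on a finite state space."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def cesaro_average(mu, P, n):
    """(1/n) * sum_{r=0}^{n-1} T*^r mu."""
    total, current = [0.0] * len(mu), list(mu)
    for _ in range(n):
        total = [a + b for a, b in zip(total, current)]
        current = apply_kernel(current, P)
    return [a / n for a in total]

# Periodic chain with unique invariant probability (1/4, 1/2, 1/4).
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
avg = cesaro_average([1.0, 0.0, 0.0], P, 10_000)
```

Starting from the point mass at state 0, the iterates alternate between `(0, 1, 0)` and `(1/2, 0, 1/2)`, so averaging is genuinely needed here; `avg` should agree with `(1/4, 1/2, 1/4)` to about four decimal places.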
Next, consider the set $\mathcal{M} = \mathcal{M}_p$ of all invariant probabilities for $p$. This is a convex
and (weakly) compact subset of $\mathcal{P}(S)$. Convexity is obvious. Weak compactness follows
from the facts (i) $\mathcal{P}(S)$ is weakly compact (by Prohorov's Theorem), and (ii) $T^*$ is
continuous for the weak topology on $\mathcal{P}(S)$. For, if $\mu_n \in \mathcal{M}$ and $\mu_n$ converges weakly
to $\mu$, then $\mu_n = T^*\mu_n$ converges weakly to $T^*\mu$. Therefore, $T^*\mu = \mu$. Also, $\mathcal{P}(S)$ is a
metric space (see, e.g., K. R. Parthasarathy (1967), Probability Measures on Metric
Spaces, Academic Press, New York, p. 43). It now follows from the Krein–Milman
Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York,
p. 207) that $\mathcal{M}$ is the closed convex hull of its extreme points. Now if $\{X_n\}$ is not
ergodic under an invariant initial distribution $\pi$, then, by the construction given in
theoretical complement 4 above, there exists $B \in \mathcal{B}(S)$ such that $0 < \pi(B) < 1$ and
$\pi = \pi(B)\pi_B + \pi(B^c)\pi_{B^c}$, with $\pi_B$ and $\pi_{B^c}$ mutually singular invariant probabilities. In
other words, the set $K$, say, of extreme points of $\mathcal{M}$ comprises those $\pi$ such that $\{X_n\}$
with initial distribution $\pi$ is ergodic. Every $\pi \in \mathcal{M}$ is a (weak) limit of convex
combinations of the form $\sum_i \lambda_i^{(n)} \mu_i^{(n)}$ ($n \to \infty$), where $0 \le \lambda_i^{(n)} \le 1$, $\sum_i \lambda_i^{(n)} = 1$, $\mu_i^{(n)} \in K$.
Therefore, the limit $\pi$ may be expressed uniquely as $\pi = \int_K \mu\,m(d\mu)$, where $m$ is a
probability measure on $(K, \mathcal{B}(K))$. This means, for every real-valued bounded
continuous $f$, $\int_S f\,d\pi = \int_K \bigl(\int_S f\,d\mu\bigr)\,m(d\mu)$.

Theoretical Complements to Section II.15

1. For some of Claude Shannon's applications of information theory to language
structure, see C. E. Shannon (1951), "Prediction and Entropy of Printed English,"
Bell System Tech. J., 30(1), pp. 50-64. The basic ideas originated in C. E. Shannon
(1948), "A Mathematical Theory of Communication," Bell System Tech. J., 27,
pp. 379-423, 623-656. There are a number of excellent textbooks and references
devoted to this and other problems of information theory. A few standard references
are: C. E. Shannon and W. Weaver (1949), The Mathematical Theory of
Communications, University of Illinois Press, Urbana; and N. Abramson (1963),
Information Theory and Coding. McGraw-Hill, New York.

Birth–Death Markov Chains


Each of the simple random walk examples described in Section 1.3 has the
special property that it does not skip states in its evolution. In this vein, we
shall study time-homogeneous Markov chains called birth–death chains whose
transition law takes the form

$$p_{ij} = \begin{cases} \beta_i & \text{if } j = i + 1 \\ \delta_i & \text{if } j = i - 1 \\ \alpha_i & \text{if } j = i \\ 0 & \text{otherwise,} \end{cases}$$

where $\alpha_i + \beta_i + \delta_i = 1$. In particular, the displacement probabilities may depend
on the state in which the process is located.

Example 1. (The Bernoulli-Laplace Model). A simple model to describe the
mixing of two incompressible liquids in possibly different proportions can be
obtained by the following considerations. Consider two containers labeled
box I and box II, respectively, each having N balls. Among the total of 2N
balls, there are 2r red and 2w white balls, 1 < r < w, so that N = w + r. At each instant of time,
a ball is randomly selected from each of the boxes, and moved to the other
box. The state at each instant is the number of red balls in box I.
In this example, the state space is $S = \{0, 1, \ldots, 2r\}$ and the evolution is a
Markov chain on $S$ with transition probabilities given by, for $1 \le i \le 2r - 1$,

$$\begin{aligned}
p_{i,i+1} &= \frac{(w + r - i)(2r - i)}{(w + r)^2}, \\
p_{i,i} &= \frac{i(2r - i)}{(w + r)^2} + \frac{(w - r + i)(w + r - i)}{(w + r)^2}, \\
p_{i,i-1} &= \frac{i(w - r + i)}{(w + r)^2},
\end{aligned} \tag{1.2}$$

together with the boundary values

$$p_{00} = p_{2r,2r} = \frac{w - r}{w + r}, \qquad p_{01} = p_{2r,2r-1} = \frac{2r}{w + r}.$$
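As a quick consistency check on (1.2), the following minimal sketch builds the Bernoulli-Laplace transition matrix with exact rational arithmetic and confirms that every row sums to 1. The function name `bernoulli_laplace_matrix` and the sample values r = 2, w = 3 are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def bernoulli_laplace_matrix(r, w):
    """Transition matrix on S = {0, ..., 2r}; the state is the number of red balls in box I.

    Assumes each box holds n = w + r balls, as in Example 1.
    """
    n = w + r
    size = 2 * r + 1
    p = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        # A white ball leaves box I and a red ball leaves box II.
        up = Fraction((n - i) * (2 * r - i), n * n)
        # A red ball leaves box I and a white ball leaves box II.
        down = Fraction(i * (w - r + i), n * n)
        if i < 2 * r:
            p[i][i + 1] = up
        if i > 0:
            p[i][i - 1] = down
        p[i][i] = 1 - up - down
    return p

# Illustrative parameters (hypothetical): 2r = 4 red and 2w = 6 white balls.
r, w = 2, 3
P = bernoulli_laplace_matrix(r, w)
rows_ok = all(sum(row) == 1 for row in P)
```

The same loop covers the boundary states, because the "up" probability vanishes at i = 2r and the "down" probability vanishes at i = 0.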

Just as the simple random walk is the discrete analogue of Brownian motion,
the birth-death chains are the discrete analogues of the diffusions studied in
Chapter V.
Most of this chapter may be read independently of Chapter II.


The long-run behavior of a birth-death chain depends on the nature of its
(local) transition probabilities $p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$ at interior states $i$ as well
as on its transitions at boundaries, when present. In this section a case-by-case
computation of recurrence properties will be made according to the presence
and types of boundaries.

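CASE I. Let $\{X_n\}$ be an unrestricted birth-death chain on $S = \{0, \pm 1, \pm 2, \ldots\} = \mathbb{Z}$.
The transition probabilities are

$$p_{i,i+1} = \beta_i, \qquad p_{i,i-1} = \delta_i, \qquad p_{i,i} = 1 - \beta_i - \delta_i \tag{2.1}$$

with

$$0 < \beta_i < 1, \qquad 0 < \delta_i < 1, \qquad \beta_i + \delta_i \le 1. \tag{2.2}$$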

Let $c, d \in S$, $c < d$, and write

$$\psi(i) = P(\{X_n\} \text{ reaches } c \text{ before } d \mid X_0 = i) = P_i(T_c < T_d) \qquad (c < i < d), \tag{2.3}$$

where $T_r$ denotes the first time the chain reaches $r$. Now,

$$\psi(i) = (1 - \beta_i - \delta_i)\psi(i) + \beta_i\psi(i + 1) + \delta_i\psi(i - 1),$$

or equivalently,

$$\beta_i(\psi(i + 1) - \psi(i)) = \delta_i(\psi(i) - \psi(i - 1)) \qquad (c + 1 \le i \le d - 1). \tag{2.4}$$

The boundary conditions for $\psi$ are

$$\psi(c) = 1, \qquad \psi(d) = 0. \tag{2.5}$$

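Rewrite (2.4) as

$$\psi(i + 1) - \psi(i) = \frac{\delta_i}{\beta_i}\,(\psi(i) - \psi(i - 1)), \tag{2.6}$$

for $c + 1 \le i \le d - 1$. Iteration now yields

$$\psi(x + 1) - \psi(x) = \frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}\,(\psi(c + 1) - \psi(c)) \tag{2.7}$$

for $c + 1 \le x \le d - 1$. Summing (2.7) over $x = y, y + 1, \ldots, d - 1$, one gets

$$\psi(d) - \psi(y) = \sum_{x=y}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}\,(\psi(c + 1) - \psi(c)). \tag{2.8}$$

Let $y = c + 1$ and use (2.5) to get

$$\psi(c + 1) = \frac{\displaystyle\sum_{x=c+1}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}}{\displaystyle 1 + \sum_{x=c+1}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}}. \tag{2.9}$$

Using this in (2.8) (and using $\psi(d) = 0$, $\psi(c) = 1$) one gets

$$\psi(y) = \frac{\displaystyle\sum_{x=y}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}}{\displaystyle 1 + \sum_{x=c+1}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}} \qquad (c + 1 \le y \le d - 1). \tag{2.10}$$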
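The closed form (2.10) can be checked numerically against the difference equation (2.4) and the boundary conditions (2.5). The sketch below is only an illustration: the helper name `psi_formula` and the particular state-dependent parameters are assumptions chosen so that $\beta_i + \delta_i \le 1$.

```python
from fractions import Fraction

def psi_formula(beta, delta, c, d, y):
    """Evaluate (2.10): probability of reaching c before d, starting from y.

    beta and delta are dicts keyed by the interior states c+1, ..., d-1.
    With the empty product read as 1, the same expression also returns the
    boundary values psi(c) = 1 and psi(d) = 0.
    """
    def ratio(x):
        # delta_x ... delta_{c+1} / (beta_x ... beta_{c+1})
        out = Fraction(1)
        for k in range(c + 1, x + 1):
            out *= delta[k] / beta[k]
        return out

    numerator = sum((ratio(x) for x in range(y, d)), Fraction(0))
    denominator = 1 + sum((ratio(x) for x in range(c + 1, d)), Fraction(0))
    return numerator / denominator

# Hypothetical state-dependent parameters on the interior states 1, ..., 5.
c, d = 0, 6
beta = {i: Fraction(1, i + 2) for i in range(c + 1, d)}
delta = {i: Fraction(1, 3) for i in range(c + 1, d)}
psi = [psi_formula(beta, delta, c, d, y) for y in range(c, d + 1)]

# Symmetric case beta_i = delta_i = 1/2, where psi(y) = (d - y)/(d - c).
half = {i: Fraction(1, 2) for i in range(c + 1, d)}
psi_sym = [psi_formula(half, half, c, d, y) for y in range(c, d + 1)]
```

Exact fractions are used so that (2.4) can be verified as an identity rather than up to rounding.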

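Let $\rho_{yc}$ denote the probability that starting at $y$ the process eventually reaches
$c$ after time 0, i.e.,

$$\rho_{yc} = P_y(X_n = c \text{ for some } n \ge 1). \tag{2.11}$$

Then (Exercise 1),

$$\rho_{yc} = \lim_{d\uparrow\infty}\psi(y)\;
\begin{cases}
= 1 & \text{if } \displaystyle\sum_{x=c+1}^{\infty}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}} = \infty \\[2ex]
< 1 & \text{if } \displaystyle\sum_{x=c+1}^{\infty}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}} < \infty
\end{cases}
\qquad (c < y). \tag{2.12}$$

Since, for $c + 1 \le 0$,

$$\sum_{x=c+1}^{\infty}\frac{\delta_x\delta_{x-1}\cdots\delta_{c+1}}{\beta_x\beta_{x-1}\cdots\beta_{c+1}}
= \sum_{x=c+1}^{0}\frac{\delta_{c+1}\delta_{c+2}\cdots\delta_x}{\beta_{c+1}\beta_{c+2}\cdots\beta_x}
+ \frac{\delta_{c+1}\delta_{c+2}\cdots\delta_0}{\beta_{c+1}\beta_{c+2}\cdots\beta_0}\sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} \tag{2.13}$$

and a similar equality holds for $c + 1 > 0$, (2.12) may be stated as

$$\rho_{yc}\;
\begin{cases}
= 1 \text{ for all } y > c & \text{iff } \displaystyle\sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} = \infty \\[2ex]
< 1 \text{ for all } y > c & \text{iff } \displaystyle\sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} < \infty.
\end{cases} \tag{2.14}$$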


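By relabeling the states $i$ as $-i$ $(i = 0, \pm 1, \pm 2, \ldots)$, one gets (Exercise 2)

$$\rho_{yd}\;
\begin{cases}
= 1 \text{ for all } y < d & \text{iff } \displaystyle\sum_{x=-\infty}^{0}\frac{\beta_x\beta_{x+1}\cdots\beta_0}{\delta_x\delta_{x+1}\cdots\delta_0} = \infty \\[2ex]
< 1 \text{ for all } y < d & \text{iff } \displaystyle\sum_{x=-\infty}^{0}\frac{\beta_x\beta_{x+1}\cdots\beta_0}{\delta_x\delta_{x+1}\cdots\delta_0} < \infty.
\end{cases} \tag{2.15}$$

By the Markov property, conditioning on $X_1$ (Exercise 3),

$$\rho_{yy} = \delta_y\rho_{y-1,y} + \beta_y\rho_{y+1,y} + (1 - \delta_y - \beta_y). \tag{2.16}$$

If both sums in (2.14) and (2.15) diverge, then $\rho_{y-1,y} = 1$, $\rho_{y+1,y} = 1$, so that
(2.16) implies

$$\rho_{yy} = 1, \qquad \text{for all } y. \tag{2.17}$$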

In other words, all states are recurrent.

If one of the sums (2.14) or (2.15) is convergent, say (2.14), then by (2.16)
we get

$$\rho_{yy} < 1, \qquad y \in S. \tag{2.18}$$

A state $y \in S$ satisfying (2.18) is called a transient state; since (2.18) holds for
all $y \in S$, the birth-death chain is transient. Just as in the case of a simple
asymmetric random walk, the strong Markov property may be applied to see
that with probability 1 each state occurs at most finitely often in a transient
birth-death Markov chain.

CASE II. The next case is that of two reflecting boundaries. For this take
$S = \{0, 1, 2, \ldots, N\}$, $p_{00} = 1 - \beta_0$, $p_{01} = \beta_0$, $p_{N,N-1} = \delta_N$, $p_{N,N} = 1 - \delta_N$, and
$p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$, $p_{i,i} = 1 - \beta_i - \delta_i$ for $1 \le i \le N - 1$. If one takes $c = 0$,
$d = N$ in (2.3), then $\psi(y)$ gives the probability that the process starting at $y$
reaches 0 before reaching $N$. The probability $\varphi(y)$, for the process to reach $N$
before 0 starting at $y$, may be obtained in the same fashion by changing the
boundary conditions (2.5) to $\varphi(0) = 0$, $\varphi(N) = 1$ to get that $\varphi(y) = 1 - \psi(y)$.
Alternatively, check that $\varphi(y) \equiv 1 - \psi(y)$ satisfies the equation (2.6) (with $\varphi$
replacing $\psi$) and the boundary conditions $\varphi(0) = 0$, $\varphi(N) = 1$, and then argue
that such a solution is necessarily unique (Exercise 4). All states are recurrent,
by Corollary 9.6 (see Exercise 5 for an alternative proof).

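CASE III. For the case of one absorbing boundary, say at 0, take
$S = \{0, 1, 2, \ldots\}$, $p_{00} = 1$, $p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$, $p_{i,i} = 1 - \beta_i - \delta_i$ for $i > 0$;
$\beta_i, \delta_i > 0$ for $i > 0$, $\beta_i + \delta_i \le 1$. For $c, d \in S$, the probability $\psi(y)$ is given by
(2.10) and the probability $\rho_{y0}$, which is also interpreted as the probability of
eventual absorption starting at $y > 0$, is given by

$$\rho_{y0} = \lim_{d\uparrow\infty}\frac{\displaystyle\sum_{x=y}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_1}{\beta_x\beta_{x-1}\cdots\beta_1}}{\displaystyle 1 + \sum_{x=1}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_1}{\beta_x\beta_{x-1}\cdots\beta_1}}
= 1 \quad\text{iff}\quad \sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} = \infty \qquad (\text{for } y > 0). \tag{2.19}$$

Whether or not the last series diverges,

$$\rho_{y0} \ge \delta_y\delta_{y-1}\cdots\delta_1 > 0, \qquad \text{for all } y > 0, \tag{2.20}$$

$$\rho_{yd} \le 1 - \delta_y\delta_{y-1}\cdots\delta_1 < 1 \quad \text{for } d > y > 0, \qquad \rho_{0d} = 0 \quad \text{for all } d > 0. \tag{2.21}$$

By (2.16) it follows that

$$\rho_{yy} < 1 \qquad (y > 0). \tag{2.22}$$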

Thus, all nonzero states y are transient.


CASE IV. As a final illustration of transience-recurrence conditions, take the
case of one reflecting boundary at 0 with $S = \{0, 1, 2, 3, \ldots\}$ and $p_{00} = 1 - \beta_0$,
$p_{01} = \beta_0$, $p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$, $p_{i,i} = 1 - \beta_i - \delta_i$ for $i > 0$; $\beta_i > 0$ for all $i$,
$\delta_i > 0$ for $i \ge 1$, $\beta_i + \delta_i \le 1$. Let us now see that all states are recurrent if and
only if the infinite series (2.19) diverges, i.e., if and only if $\rho_{y0} = 1$.
First assume that the infinite series in (2.19) diverges, i.e., $\rho_{y0} = 1$ for all
$y > 0$. Then condition on $X_1$ to get

$$\rho_{00} = (1 - \beta_0) + \beta_0\rho_{10}, \tag{2.23}$$

so that

$$\rho_{00} = 1. \tag{2.24}$$

Next look at (see Eq. 2.16)

$$\rho_{11} = \delta_1\rho_{01} + \beta_1\rho_{21} + (1 - \delta_1 - \beta_1). \tag{2.25}$$

Since $\rho_{20} = 1$ and the process does not skip states, $\rho_{21} = 1$. Also, $\rho_{01} = 1$
(Exercise 6). Thus, $\rho_{11} = 1$ and, proceeding by induction,

$$\rho_{yy} = 1, \qquad \text{for each } y > 0. \tag{2.26}$$

On the other hand, if the series in (2.19) converges then $\rho_{y0} < 1$ for all $y > 0$.
In particular, from (2.23), we see $\rho_{00} < 1$. Convergence of the series in (2.19)
also gives $\rho_{yc} < 1$ for all $c < y$ by (2.12). Now apply (2.16) to get

$$\rho_{yy} < 1, \qquad \text{for all } y, \tag{2.27}$$

whenever the series in (2.19) converges. That is, the birth-death chain is transient.

The various remaining cases, for example, two absorbing, or one absorbing
and one reflecting boundary, are left to the Exercises.


Suppose that there is a probability distribution $\pi$ on $S$ such that

$$\pi' p = \pi'. \tag{3.1}$$

Then $\pi' p^n = \pi'$ for each time $n = 1, 2, \ldots$. That is, $\pi$ is invariant under the
transition law $p$. Note that if $\{X_n\}$ is started with an invariant initial
distribution $\pi$ then $X_n$ has distribution $\pi$ at each successive time point $n$.
Moreover, $\{X_n\}$ is stationary in the sense that the $P_\pi$-distribution of
$(X_0, X_1, \ldots, X_m)$ is for each $m \ge 1$ invariant under all time shifts, i.e., for all
$k \ge 1$,

$$P_\pi(X_0 = i_0, \ldots, X_m = i_m) = P_\pi(X_k = i_0, \ldots, X_{m+k} = i_m). \tag{3.2}$$

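For a birth-death process on $S = \{0, 1, 2, \ldots, N\}$ with two reflecting
boundaries, the invariant distribution $\pi$ is easily obtained by solving $\pi' p = \pi'$,
i.e.,

$$\begin{aligned}
&\pi_0(1 - \beta_0) + \pi_1\delta_1 = \pi_0, \\
&\pi_{j-1}\beta_{j-1} + \pi_j(1 - \beta_j - \delta_j) + \pi_{j+1}\delta_{j+1} = \pi_j \qquad (j = 1, 2, \ldots, N - 1),
\end{aligned} \tag{3.3}$$

or

$$\pi_{j-1}\beta_{j-1} - \pi_j(\beta_j + \delta_j) + \pi_{j+1}\delta_{j+1} = 0. \tag{3.4}$$

The solution, subject to $\pi_j \ge 0$ for all $j$ and $\sum_j \pi_j = 1$, is given by

$$\pi_j = \frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\,\pi_0 \quad (1 \le j \le N), \qquad
\pi_0 = \left(1 + \sum_{j=1}^{N}\frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\right)^{-1}. \tag{3.5}$$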
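The product formula (3.5) is easy to verify on a small chain by checking $\pi' p = \pi'$ directly. The following is a minimal sketch with made-up parameters for N = 3; the helper names `invariant_two_reflecting` and `transition_matrix` are illustrative assumptions.

```python
from fractions import Fraction

def invariant_two_reflecting(beta, delta):
    """pi from (3.5); beta[j] for j = 0..N-1, delta[j] for j = 1..N (delta[0] is ignored)."""
    N = len(delta) - 1
    weights = [Fraction(1)]  # weight of state 0, before normalization
    for j in range(1, N + 1):
        weights.append(weights[-1] * beta[j - 1] / delta[j])
    total = sum(weights)
    return [wt / total for wt in weights]

def transition_matrix(beta, delta):
    """Birth-death matrix on {0, ..., N} with reflecting boundaries at 0 and N."""
    N = len(delta) - 1
    p = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        up = beta[i] if i < N else Fraction(0)
        down = delta[i] if i > 0 else Fraction(0)
        if i < N:
            p[i][i + 1] = up
        if i > 0:
            p[i][i - 1] = down
        p[i][i] = 1 - up - down
    return p

# Hypothetical parameters for N = 3.
beta = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]         # beta_0, beta_1, beta_2
delta = [None, Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]  # delta_1, delta_2, delta_3
pi = invariant_two_reflecting(beta, delta)
P = transition_matrix(beta, delta)
pi_next = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
```

For these values the unnormalized weights are 1, 3/2, 1, 3/8, so that $\pi = (8, 12, 8, 3)/31$.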

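For a birth-death process on $S = \{0, 1, 2, \ldots\}$ with 0 as a reflecting
boundary, the system of equations $\pi' p = \pi'$ is

$$\begin{aligned}
&\pi_0(1 - \beta_0) + \pi_1\delta_1 = \pi_0, \\
&\pi_{j-1}\beta_{j-1} + \pi_j(1 - \beta_j - \delta_j) + \pi_{j+1}\delta_{j+1} = \pi_j \qquad (j \ge 1).
\end{aligned} \tag{3.6}$$

The solution in terms of $\pi_0$ is

$$\pi_j = \frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\,\pi_0 \qquad (j \ge 1). \tag{3.7}$$

In order that this may be a probability distribution one must have

$$\sum_{j=1}^{\infty}\frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j} < \infty. \tag{3.8}$$

In this case one must take

$$\pi_0 = \left(1 + \sum_{j=1}^{\infty}\frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\right)^{-1}. \tag{3.9}$$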

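For an unrestricted birth-death process on $S = \{0, \pm 1, \pm 2, \ldots\}$ the
equations $\pi' p = \pi'$ are

$$\pi_{j-1}\beta_{j-1} + \pi_j(1 - \beta_j - \delta_j) + \pi_{j+1}\delta_{j+1} = \pi_j \qquad (j = 0, \pm 1, \pm 2, \ldots), \tag{3.10}$$

which are solved (in terms of $\pi_0$) by

$$\pi_j = \begin{cases}
\dfrac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\,\pi_0 & (j \ge 1), \\[2ex]
\dfrac{\delta_{j+1}\delta_{j+2}\cdots\delta_0}{\beta_j\beta_{j+1}\cdots\beta_{-1}}\,\pi_0 & (j \le -1).
\end{cases} \tag{3.11}$$

This is a probability distribution if and only if

$$\sum_{j\le -1}\frac{\delta_{j+1}\delta_{j+2}\cdots\delta_0}{\beta_j\beta_{j+1}\cdots\beta_{-1}} + \sum_{j\ge 1}\frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j} < \infty, \tag{3.12}$$

in which case

$$\pi_0 = \left(1 + \sum_{j\le -1}\frac{\delta_{j+1}\delta_{j+2}\cdots\delta_0}{\beta_j\beta_{j+1}\cdots\beta_{-1}} + \sum_{j\ge 1}\frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\right)^{-1}. \tag{3.13}$$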
Notice that the convergence of the series in (3.12) implies the divergence of
the series in (2.14), (2.15). In other words, the existence of an equilibrium
distribution for the chain implies its recurrence. The same remark applies to
the birth-death chain with one or two reflecting boundaries.

Example 1. (Equilibrium for the Bernoulli-Laplace Model). For the Bernoulli-Laplace
model described in Section 1, the invariant distribution $\pi =
(\pi_j : j = 0, 1, \ldots, 2r)$ is the hypergeometric distribution calculated from (3.5) as

$$\pi_j = \frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\,\pi_0
= \frac{\prod_{i=0}^{j-1}(w + r - i)(2r - i)}{\prod_{i=1}^{j} i\,(w - r + i)}\,\pi_0
= \frac{\dbinom{2r}{j}\dbinom{2w}{w + r - j}}{\dbinom{2w + 2r}{w + r}}.$$
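The hypergeometric form of the Bernoulli-Laplace equilibrium can be compared with the product formula (3.5) term by term. The sketch below does this in exact arithmetic; the function names and the sample values of (r, w) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def bl_invariant_from_rates(r, w):
    """Invariant distribution via (3.5), using beta_i and delta_i read off from (1.2)."""
    n = w + r
    weights = [Fraction(1)]
    for j in range(1, 2 * r + 1):
        beta_prev = Fraction((n - (j - 1)) * (2 * r - (j - 1)), n * n)  # beta_{j-1}
        delta_j = Fraction(j * (w - r + j), n * n)                        # delta_j
        weights.append(weights[-1] * beta_prev / delta_j)
    total = sum(weights)
    return [x / total for x in weights]

def bl_hypergeometric(r, w):
    """P(j red balls among the w + r balls in box I), drawn from 2r red and 2w white."""
    n = w + r
    return [Fraction(comb(2 * r, j) * comb(2 * w, n - j), comb(2 * n, n))
            for j in range(2 * r + 1)]

# Illustrative parameter pairs (hypothetical); each satisfies r <= w.
cases = [(1, 1), (2, 3), (3, 5)]
agree = [bl_invariant_from_rates(r, w) == bl_hypergeometric(r, w) for r, w in cases]
```

For r = w = 1 both functions return (1/6, 4/6, 1/6).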
The assertions concerning positive recurrence contained in Theorem 3.1
below rely on the material in Section 2.9 and may be omitted on first reading.
Recall from Theorem 9.2(c) of Chapter II that in the case that all states
communicate with each other, existence of an invariant distribution is equivalent
to positive recurrence of all states.

Theorem 3.1

(a) For a birth-death chain on $S = \{0, 1, \ldots, N\}$ with both boundaries
reflecting, the states are all positive recurrent and the unique invariant
distribution is given by (3.5).
(b) For a birth-death chain on $S = \{0, 1, 2, \ldots\}$ with 0 a reflecting boundary,
all states are recurrent or transient according as the series

$$\sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x}$$

diverges or converges. All states are positive recurrent if and only if the
series (3.8) converges. In the case that (3.8) converges, the unique
invariant distribution is given by (3.7), (3.9).
(c) For an unrestricted birth-death chain on $S = \{0, \pm 1, \pm 2, \ldots\}$ all states
are transient if and only if at least one of the series in (2.14) and (2.15)
is convergent. All states are positive recurrent if and only if (3.12) holds;
if (3.12) holds, then the unique invariant distribution is given by (3.11),
(3.13).



We will apply the spectral theorem to calculate $p^n$, for $n = 1, 2, \ldots$, in the case
that $p$ is the transition law for a birth-death chain.
First consider the case of a birth-death chain on $S = \{0, 1, \ldots, N\}$ with
reflecting boundaries at 0 and $N$. Then the invariant distribution $\pi$ is given by
(3.5) as

$$\pi_1 = \frac{\beta_0}{\delta_1}\,\pi_0, \qquad \pi_j = \frac{\beta_0\beta_1\cdots\beta_{j-1}}{\delta_1\delta_2\cdots\delta_j}\,\pi_0 \quad (2 \le j \le N). \tag{4.1}$$

It is straightforward to check that

$$\pi_i p_{i,i-1} = \pi_i\delta_i = \pi_{i-1}\beta_{i-1} = \pi_{i-1}p_{i-1,i}, \qquad i = 1, 2, \ldots, N, \tag{4.2}$$

from which it follows that

$$\pi_i p_{ij} = \pi_j p_{ji} \qquad \text{for all } i, j. \tag{4.3}$$

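In the applied sciences the symmetry property (4.3) is often referred to as detailed
balance or time reversibility. Introduce the following inner product $(\cdot\,,\cdot)_\pi$ in the
vector space $\mathbb{R}^{N+1}$:

$$(x, y)_\pi = \sum_{i=0}^{N} x_i y_i \pi_i, \qquad x = (x_0, x_1, \ldots, x_N)', \quad y = (y_0, y_1, \ldots, y_N)', \tag{4.4}$$

so that the "length" $\|x\|_\pi$ of a vector $x$ is given by

$$\|x\|_\pi = \left(\sum_{i=0}^{N} x_i^2\pi_i\right)^{1/2}. \tag{4.5}$$

With respect to this inner product the linear transformation $x \to px$ is symmetric
since by (4.3) we have

$$\begin{aligned}
(px, y)_\pi &= \sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}x_j y_i\pi_i = \sum_{i=0}^{N}\sum_{j=0}^{N} p_{ji}x_j y_i\pi_j
= \sum_{j=0}^{N}\left(\sum_{i=0}^{N} p_{ji}y_i\right)x_j\pi_j \\
&= \sum_{i=0}^{N}\left(\sum_{j=0}^{N} p_{ij}y_j\right)x_i\pi_i = (py, x)_\pi = (x, py)_\pi.
\end{aligned} \tag{4.6}$$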

Therefore, by the spectral theorem, $p$ has $N + 1$ real eigenvalues $\alpha_0, \alpha_1, \ldots, \alpha_N$
(not necessarily distinct) and corresponding eigenvectors $\phi_0, \phi_1, \ldots, \phi_N$, which
are of unit length and mutually orthogonal with respect to $(\cdot\,,\cdot)_\pi$. Therefore,
the linear transformation $x \to px$ has the spectral representation

$$p = \sum_{k=0}^{N}\alpha_k E_k, \qquad px = \sum_{k=0}^{N}\alpha_k(\phi_k, x)_\pi\,\phi_k, \tag{4.7}$$

where $E_k$ denotes orthogonal projection with respect to $(\cdot\,,\cdot)_\pi$ onto the
one-dimensional subspace spanned by $\phi_k$: $E_k x = (\phi_k, x)_\pi\,\phi_k$. It follows that

$$p^n = \sum_{k=0}^{N}\alpha_k^n E_k, \qquad p^n x = \sum_{k=0}^{N}\alpha_k^n(\phi_k, x)_\pi\,\phi_k. \tag{4.8}$$

Letting $x = e_j$ denote the vector with 1 in the $j$th coordinate and zeros elsewhere,
one gets

$$p_{ij}^{(n)} = i\text{th element of } p^n e_j = \sum_{k=0}^{N}\alpha_k^n(\phi_k, e_j)_\pi\,\phi_{ki} = \sum_{k=0}^{N}\alpha_k^n\phi_{ki}\phi_{kj}\pi_j. \tag{4.9}$$

Without loss of generality, one may take $\alpha_0 = 1$ and $\phi_0 = \mathbf{1}$ throughout. We
now consider two special birth-death chains as examples.

Example 1. (Simple Symmetric Random Walk with Two Reflecting Boundaries)

$$S = \{0, 1, \ldots, N\}, \qquad p_{i,i+1} = p_{i,i-1} = \tfrac12 \ \text{ for } 1 \le i \le N - 1, \qquad p_{01} = 1 = p_{N,N-1}.$$

In this case the invariant initial distribution is given by

$$\pi_j = \frac{1}{N} \quad (1 \le j \le N - 1), \qquad \pi_0 = \pi_N = \frac{1}{2N}. \tag{4.10}$$

If $\alpha$ is an eigenvalue of $p$, then a corresponding eigenvector $x = (x_0, x_1, \ldots, x_N)'$
satisfies the equation

$$\tfrac12(x_{j-1} + x_{j+1}) = \alpha x_j \qquad (1 \le j \le N - 1), \tag{4.11}$$

along with "boundary conditions"

$$x_1 = \alpha x_0, \qquad x_{N-1} = \alpha x_N. \tag{4.12}$$

As a trial solution of (4.11) consider $x_j = \theta^j$ for some nonzero $\theta$. Since all
vectors of $\mathbb{C}^{N+1}$ may be expressed as unique linear combinations of functions
$j \to \exp\{(2\pi n/(N + 1))\,j i\} = \theta^j$, with $i = (-1)^{1/2}$, one expects to arrive at the right
combination in this manner. Then (4.11) yields $\tfrac12(\theta^{j-1} + \theta^{j+1}) = \alpha\theta^j$, i.e.,

$$\theta^2 - 2\alpha\theta + 1 = 0, \tag{4.13}$$

whose two roots are

$$\theta_1 = \alpha + i\sqrt{1 - \alpha^2}, \qquad \theta_2 = \alpha - i\sqrt{1 - \alpha^2}. \tag{4.14}$$

The equation (4.11) is linear in $x$, i.e., if $x$ and $y$ are both solutions of (4.11)
then so is $ax + by$ for arbitrary numbers $a$ and $b$. Therefore, every linear
combination

$$x_j = A(\alpha)\theta_1^j + B(\alpha)\theta_2^j \qquad (0 \le j \le N) \tag{4.15}$$

satisfies (4.11). We now apply the boundary conditions (4.12) to fix $A(\alpha)$, $B(\alpha)$,
up to a constant multiplier. Since every scalar multiple of a solution of (4.11)
and (4.12) is also a solution, let us fix $x_0 = 1$. Note that $x_0 = 0$ implies $x_j = 0$
for all $j$. Letting $j = 0$ in (4.15), one has

$$A(\alpha) + B(\alpha) = 1, \qquad B(\alpha) = 1 - A(\alpha). \tag{4.16}$$


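The first boundary condition, $x_1 = \alpha x_0 = \alpha$, then becomes

$$A(\alpha)(\theta_1 - \theta_2) + \theta_2 = \alpha, \tag{4.17}$$

or,

$$2A(\alpha)(1 - \alpha^2)^{1/2}\,i = (1 - \alpha^2)^{1/2}\,i,$$

i.e., at least for $\alpha \ne \pm 1$,

$$A(\alpha) = \tfrac12, \qquad B(\alpha) = \tfrac12. \tag{4.18}$$

The second boundary condition $x_{N-1} = \alpha x_N$ may then be expressed as

$$\tfrac12\left(\theta_1^{N-1} + \theta_2^{N-1}\right) = \frac{\alpha}{2}\left(\theta_1^{N} + \theta_2^{N}\right). \tag{4.19}$$

Now write $\theta_1 = e^{i\phi}$, $\theta_2 = e^{-i\phi}$, where $\phi$ is the unique angle in $[0, \pi]$ such that
$\cos\phi = \alpha$. Note that cosine is strictly decreasing in $[0, \pi]$ and assumes its entire
range of values $[-1, 1]$ on $[0, \pi]$. Note also that this is consistent with the
requirement $\sin\phi = \sqrt{1 - \alpha^2} \ge 0$. Then (4.19) becomes

$$\cos(N - 1)\phi = \alpha\cos N\phi = \cos\phi\,\cos N\phi, \tag{4.20}$$

i.e.,

$$\sin N\phi\,\sin\phi = 0, \tag{4.21}$$

whose only solutions in $[0, \pi]$ are

$$\phi = \frac{k\pi}{N} \qquad (k = 0, 1, 2, \ldots, N). \tag{4.22}$$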

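Thus, there are $N + 1$ distinct (and, therefore, simple) eigenvalues

$$\alpha_k = \cos\frac{k\pi}{N} \qquad (k = 0, 1, 2, \ldots, N), \tag{4.23}$$

and corresponding eigenvectors $x^{(k)}$ $(k = 0, 1, \ldots, N)$:

$$x_j^{(k)} = \tfrac12\left(\theta_1^j + \theta_2^j\right) = \cos\frac{k\pi j}{N} \qquad (j = 0, 1, \ldots, N). \tag{4.24}$$

Now

$$\begin{aligned}
\|x^{(k)}\|_\pi^2 &= \sum_{j=0}^{N}\pi_j\cos^2\frac{k\pi j}{N}
= \frac{1}{2N} + \frac{1}{N}\sum_{j=1}^{N-1}\cos^2\frac{k\pi j}{N} + \frac{1}{2N} \\
&= \frac{1}{N}\sum_{j=0}^{N-1}\cos^2\frac{k\pi j}{N}
= \frac{1}{N}\sum_{j=0}^{N-1}\frac{1 + \cos(2k\pi j/N)}{2} \\
&= \begin{cases} 1 & \text{if } k = 0 \text{ or } N \\ \tfrac12 & \text{if } k = 1, 2, \ldots, N - 1. \end{cases}
\end{aligned} \tag{4.25}$$

Thus, the normalized eigenvectors are

$$\begin{aligned}
&\phi_0 = (1, 1, \ldots, 1)', \qquad \phi_N = (1, -1, +1, -1, \ldots)', \\
&\phi_k = \sqrt{2}\,x^{(k)}, \quad \phi_{kj} = \sqrt{2}\cos\frac{k\pi j}{N} \qquad (1 \le k \le N - 1).
\end{aligned} \tag{4.26}$$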

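Now use (4.9), (4.23), and (4.26) to get, for $0 \le i, j \le N$,

$$p_{ij}^{(n)} = \sum_{k=0}^{N}\alpha_k^n\phi_{ki}\phi_{kj}\pi_j
= \pi_j + 2\pi_j\sum_{k=1}^{N-1}\cos^n\!\left(\frac{k\pi}{N}\right)\cos\!\left(\frac{k\pi i}{N}\right)\cos\!\left(\frac{k\pi j}{N}\right) + (-1)^{n+j-i}\pi_j. \tag{4.27}$$

Thus, for $1 \le j \le N - 1$, $0 \le i \le N$,

$$p_{ij}^{(n)} = \frac{1}{N} + \frac{2}{N}\sum_{k=1}^{N-1}\cos^n\!\left(\frac{k\pi}{N}\right)\cos\!\left(\frac{k\pi i}{N}\right)\cos\!\left(\frac{k\pi j}{N}\right) + \frac{(-1)^{n+j-i}}{N}.$$

For $0 \le i \le N$,

$$p_{i0}^{(n)} = \frac{1}{2N} + \frac{1}{N}\sum_{k=1}^{N-1}\cos^n\!\left(\frac{k\pi}{N}\right)\cos\!\left(\frac{k\pi i}{N}\right) + \frac{(-1)^{n-i}}{2N}.$$

For $0 \le i \le N$,

$$p_{iN}^{(n)} = \frac{1}{2N} + \frac{1}{N}\sum_{k=1}^{N-1}\cos^n\!\left(\frac{k\pi}{N}\right)\cos\!\left(\frac{k\pi i}{N}\right)\cos(k\pi) + \frac{(-1)^{n+N-i}}{2N}. \tag{4.28}$$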
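The spectral formula (4.27) can be compared with a direct computation of the n-step transition probabilities. The sketch below is illustrative only: the function names and the choice N = 5, n = 7 are assumptions, and floating-point agreement is checked up to rounding error.

```python
import math

def p_n_spectral(N, n, i, j):
    """p_ij^(n) from (4.27) for the walk on {0, ..., N} with p_01 = p_{N,N-1} = 1."""
    pi_j = 1 / (2 * N) if j in (0, N) else 1 / N
    s = sum(
        math.cos(k * math.pi / N) ** n
        * math.cos(k * math.pi * i / N)
        * math.cos(k * math.pi * j / N)
        for k in range(1, N)
    )
    # (-1)^(n+i+j) equals (-1)^(n+j-i); this form avoids a negative exponent.
    return pi_j * (1 + 2 * s + (-1) ** (n + i + j))

def p_n_direct(N, n, i):
    """Row i of p^n, obtained by propagating a point mass at i for n steps."""
    dist = [0.0] * (N + 1)
    dist[i] = 1.0
    for _ in range(n):
        new = [0.0] * (N + 1)
        for x, mass in enumerate(dist):
            if x == 0:
                new[1] += mass
            elif x == N:
                new[N - 1] += mass
            else:
                new[x - 1] += mass / 2
                new[x + 1] += mass / 2
        dist = new
    return dist

N, n = 5, 7
max_gap = max(
    abs(p_n_spectral(N, n, i, j) - p_n_direct(N, n, i)[j])
    for i in range(N + 1)
    for j in range(N + 1)
)
```

The quantity `max_gap` is the largest entrywise discrepancy between the two computations of $p^n$.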

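Note that when $n$ and $j - i$ have the same parity, say $n = 2m$ and $j - i$ is
even, then

$$\left(p_{ij}^{(2m)} - \frac{2}{N}\right) = \frac{4}{N}\cos^{2m}\!\left(\frac{\pi}{N}\right)\cos\!\left(\frac{\pi i}{N}\right)\cos\!\left(\frac{\pi j}{N}\right)[1 + o(1)] \tag{4.29}$$

as $m \to \infty$. This establishes the precise rate of exponential convergence to the
steady state. One may express this as

$$\lim_{m\to\infty} e^{-2m\lambda}\left(p_{ij}^{(2m)} - \frac{2}{N}\right) = \frac{4}{N}\cos\!\left(\frac{\pi i}{N}\right)\cos\!\left(\frac{\pi j}{N}\right), \tag{4.30}$$

where $\lambda = \log\alpha_1$ (see (4.23)).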

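Example 2. (Simple Symmetric Random Walk with One Reflecting Boundary)

$$S = \{0, 1, 2, \ldots\}, \qquad p_{01} = 1 \quad\text{and}\quad p_{i,i-1} = \tfrac12 = p_{i,i+1}$$

for all $i \ge 1$. Note that $p_{ij}^{(n)}$ is the same as in the case of a random walk with
two reflecting boundaries 0 and $N$, provided $N > n + i$, since the random walk
cannot reach $N$ in $n$ steps (or fewer) starting from $i$ if $N > n + i$. Hence for all
$i, j, n$, $p_{ij}^{(n)}$ is obtained by taking the limit in (4.28) as $N \to \infty$, i.e.,

$$\begin{aligned}
p_{ij}^{(n)} &= 2\int_0^1\cos^n(\pi\theta)\cos(i\pi\theta)\cos(j\pi\theta)\,d\theta \\
&= \frac{2}{\pi}\int_0^{\pi}\cos^n(\theta)\cos(i\theta)\cos(j\theta)\,d\theta \qquad (j \ge 1,\ i \ge 0), \\
p_{i0}^{(n)} &= \frac{1}{\pi}\int_0^{\pi}\cos^n(\theta)\cos(i\theta)\,d\theta \qquad (i \ge 0).
\end{aligned} \tag{4.31}$$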
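As an illustration of (4.31), the integrals can be approximated by a midpoint rule and compared with the exact n-step distribution of the walk reflected at 0. This is a sketch under stated assumptions: the function names, the number of quadrature nodes, and the test values n = 6, i = 2 are all illustrative choices.

```python
import math

def p_n_integral(n, i, j, nodes=2000):
    """Midpoint-rule approximation of the integral representation (4.31)."""
    h = math.pi / nodes
    total = 0.0
    for m in range(nodes):
        t = (m + 0.5) * h
        total += math.cos(t) ** n * math.cos(i * t) * math.cos(j * t)
    factor = 2 / math.pi if j >= 1 else 1 / math.pi
    return factor * h * total

def p_n_walk(n, i):
    """Distribution after n steps from i for the walk with p_01 = 1 and steps of +-1 w.p. 1/2."""
    dist = {i: 1.0}
    for _ in range(n):
        new = {}
        for x, mass in dist.items():
            if x == 0:
                new[1] = new.get(1, 0.0) + mass
            else:
                new[x - 1] = new.get(x - 1, 0.0) + mass / 2
                new[x + 1] = new.get(x + 1, 0.0) + mass / 2
        dist = new
    return dist

walk = p_n_walk(6, 2)
gaps = [abs(p_n_integral(6, 2, j) - walk.get(j, 0.0)) for j in range(10)]
```

The integrand is a trigonometric polynomial of low degree, so the midpoint rule with many nodes reproduces the integral essentially to rounding error.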

An alternative calculation of (4.31) can be made by first noting that the
condition (4.3) is valid for the sequence of weights $\{\pi_j\}$ given in (4.1) with
$\pi_0 = 1$. This provides an inner product (sequence) space on which $p$ is a bounded
self-adjoint linear transformation. The spectral theory extends to such settings
as well.

An example in which the birth-death parameters are state-space dependent
(i.e., nonconstant) is given in the chapter application.


The Ehrenfest model illustrates the process of heat exchange between two bodies
that are in contact and insulated from the outside. The temperatures are assumed
to change in steps of one unit and are represented by the numbers of balls in
two boxes. The two boxes are marked I and II and there are 2d balls labeled
1, 2, ... , 2d. Initially some of these balls are in box I and the remainder in box
II. At each step a ball is chosen at random (i.e., with equal probabilities among
ball numbers 1, 2, ... , 2d) and moved from its box to the other box. If there

are $i$ balls in box I, then there are $2d - i$ balls in box II. Thus there is no overall
heat loss or gain. Let $X_n$ denote the number of balls in box I after the $n$th trial.
Then $\{X_n : n = 0, 1, \ldots\}$ is a Markov chain with state space $S = \{0, 1, 2, \ldots, 2d\}$
and transition probabilities

$$\begin{aligned}
&p_{i,i-1} = \frac{i}{2d}, \qquad p_{i,i+1} = 1 - \frac{i}{2d}, \qquad \text{for } i = 1, 2, \ldots, 2d - 1, \\
&p_{01} = 1, \qquad p_{2d,2d-1} = 1, \\
&p_{ij} = 0, \quad \text{otherwise.}
\end{aligned} \tag{5.1}$$

This is a birth-death chain with two reflecting boundaries at 0 and $2d$. The
transition probabilities are such that the mean change in temperature, in box
I, say, at each step is proportional to the negative of the existing temperature
gradient, or temperature difference, between the two bodies. We will first see
that the model yields Newton's law of cooling at the level of the evolution of
the averages. Assume that initially there are $i$ balls in box I. Let $Y_n = X_n - d$,
the excess of the number of balls in box I over $d$. Writing $e_n = E_i(Y_n)$, the
expected value of $Y_n$ given $X_0 = i$, one has

$$\begin{aligned}
e_n &= E_i(X_n - d) = E_i[X_{n-1} - d + (X_n - X_{n-1})] \\
&= E_i(X_{n-1} - d) + E_i(X_n - X_{n-1}) = e_{n-1} + E_i\!\left(\frac{2d - X_{n-1}}{2d} - \frac{X_{n-1}}{2d}\right) \\
&= e_{n-1} + E_i\!\left(\frac{d - X_{n-1}}{d}\right) = e_{n-1} - \frac{e_{n-1}}{d} = \left(1 - \frac{1}{d}\right)e_{n-1}.
\end{aligned}$$

Note that in evaluating $E_i(X_n - X_{n-1})$ we first calculated the conditional
expectation of $X_n - X_{n-1}$ given $X_{n-1}$ and then took the expectation of
this conditional mean. Now, by successive applications of the relation
$e_n = (1 - (1/d))e_{n-1}$,

$$e_n = \left(1 - \frac{1}{d}\right)^{n}e_0 = \left(1 - \frac{1}{d}\right)^{n}E_i(X_0 - d) = (i - d)\left(1 - \frac{1}{d}\right)^{n}. \tag{5.2}$$

Suppose in the physical model the frequency of transitions is $\tau$ per second. Then
in time $t$ there are $n = t\tau$ transitions. Write $\nu = -\log[(1 - (1/d))^{\tau}]$. Then

$$e_n = (i - d)e^{-\nu t}, \tag{5.3}$$

which is Newton's law of cooling.
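The geometric decay (5.2) of the mean excess can be confirmed exactly for a small system by propagating the distribution of $X_n$ under (5.1). The sketch below uses hypothetical values d = 4, i = 1, n = 5; the function names are illustrative.

```python
from fractions import Fraction

def ehrenfest_step(dist, d):
    """One step of the Ehrenfest chain (5.1) applied to a distribution on {0, ..., 2d}."""
    new = [Fraction(0)] * (2 * d + 1)
    for x, mass in enumerate(dist):
        if mass == 0:
            continue
        if x > 0:
            new[x - 1] += mass * Fraction(x, 2 * d)          # a ball leaves box I
        if x < 2 * d:
            new[x + 1] += mass * Fraction(2 * d - x, 2 * d)  # a ball enters box I
    return new

def mean_excess(i, d, n):
    """e_n = E_i(X_n - d), computed from the exact n-step distribution."""
    dist = [Fraction(0)] * (2 * d + 1)
    dist[i] = Fraction(1)
    for _ in range(n):
        dist = ehrenfest_step(dist, d)
    return sum(mass * (x - d) for x, mass in enumerate(dist))

d, i, n = 4, 1, 5
e_n = mean_excess(i, d, n)
predicted = (i - d) * (1 - Fraction(1, d)) ** n
```

Because the arithmetic is exact, `e_n` and `predicted` coincide as fractions rather than only approximately.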

The equilibrium distribution for the Ehrenfest model is easily seen, using (3.5),
to be

$$\pi_j = \binom{2d}{j}2^{-2d}, \qquad j = 0, 1, \ldots, 2d. \tag{5.4}$$

That is, $\pi = (\pi_j : j \in S)$ is binomial with parameters $2d$, $\tfrac12$. Note that $d$, the
mean of this binomial distribution, is the (constant) mean temperature under
equilibrium in (5.3).
The physicists P. and T. Ehrenfest in 1907, and later Smoluchowski in 1916,
used this model in order to explain an apparent paradox that at the turn of
the century threatened to wreck Boltzmann's kinetic theory of matter. In the
kinetic theory, heat exchange is a random process, while in thermodynamics it
is an orderly irreversible progression toward equilibrium. In the present context,
thermodynamic equilibrium would be achieved when the temperatures of the
two bodies became equal, or at least approximately or macroscopically equal.
But if one uses a kinetic model such as the one described above, from the state
i = d of thermodynamical equilibrium the system will eventually pass to a state
of extreme disequilibrium (e.g., i = 0) owing to recurrence. This would contradict
irreversibility of thermodynamics. However, one of the main objectives of kinetic
theory was to explain thermodynamics, a largely phenomenological
macroscopic-scale theory, starting from the molecular theory of matter.
Historically it was Poincaré who first showed that statistical-mechanical
systems have the recurrence property (theoretical complement 2). A scientist
named Zermelo then forcefully argued that recurrence contradicted irreversibility.
Although Boltzmann rightly maintained that the time required by the random
process to pass from the equilibrium state to a state of macroscopic
nonequilibrium would be so large as to be of no physical significance, his
reasoning did not convince other physicists. The Ehrenfests and Smoluchowski
finally resolved the dispute by demonstrating how large the passage time may
be from i = d to i = 0 in the present model.
Let us now present in detail a method of calculating the mean first passage
time $m_i = E_i T_0$, where $T_0 = \inf\{n \ge 0 : X_n = 0\}$. Since the method is applicable
to general birth-death chains, consider a state space $S = \{0, 1, 2, \ldots, N\}$ and
a reflecting chain with parameters $\beta_i$, $\delta_i = 1 - \beta_i$, such that $0 < \beta_i < 1$ for
$1 \le i \le N - 1$ and $\beta_0 = \delta_N = 1$. Then,

$$\begin{aligned}
&m_i = 1 + \beta_i m_{i+1} + \delta_i m_{i-1} \qquad (1 \le i \le N - 1), \\
&m_0 = 0, \qquad m_N = 1 + m_{N-1}.
\end{aligned} \tag{5.5}$$

Relabel the states by $i \to u_i$, so that $u_i$ is increasing with $i$,

$$u_0 = 0, \qquad u_1 = 1, \tag{5.6}$$

and, for all $x \in S$,

$$\psi(x) \equiv P_x(\{X_n\} \text{ reaches } 0 \text{ before } N) = \frac{u_N - u_x}{u_N - u_0}. \tag{5.7}$$


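In other words, in this new scale the probability of reaching the relabeled
boundary $u_0 = 0$, before $u_N$, starting from $u_x$ (inside), is proportional to the
distance from the boundary $u_N$. This scale is called the natural scale. The
difference equations (5.5) when written in this scale assume a simple form, as
will presently be shown. First let us determine $u_x$ from (5.6) and (5.7) and the
difference equation

$$\psi(x) = \beta_x\psi(x + 1) + \delta_x\psi(x - 1), \quad 0 < x < N, \qquad \psi(0) = 1, \quad \psi(N) = 0, \tag{5.8}$$

which may also be expressed as

$$\psi(x + 1) - \psi(x) = \frac{\delta_x}{\beta_x}[\psi(x) - \psi(x - 1)], \quad 0 < x < N, \qquad \psi(0) = 1, \quad \psi(N) = 0. \tag{5.9}$$

Equations (5.6), (5.7), and (5.9) yield

$$u_{x+1} - u_x = \frac{\delta_x}{\beta_x}(u_x - u_{x-1}) = \frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x}(u_1 - u_0) = \frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} \qquad (1 \le x \le N - 1), \tag{5.10}$$

or

$$u_{x+1} = 1 + \sum_{i=1}^{x}\frac{\delta_1\delta_2\cdots\delta_i}{\beta_1\beta_2\cdots\beta_i} \qquad (1 \le x \le N - 1). \tag{5.11}$$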

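Now write

$$m(u_x) \equiv m_x. \tag{5.12}$$

Then (5.5) becomes

$$\begin{aligned}
&[m(u_{i+1}) - m(u_i)]\beta_i - [m(u_i) - m(u_{i-1})]\delta_i = -1 \qquad (1 \le i \le N - 1), \\
&m(u_0) = m(0) = 0, \qquad m(u_N) - m(u_{N-1}) = 1.
\end{aligned} \tag{5.13}$$

One may rewrite this, using (5.10), as

$$\frac{m(u_{i+1}) - m(u_i)}{u_{i+1} - u_i} - \frac{m(u_i) - m(u_{i-1})}{u_i - u_{i-1}} = -\frac{\beta_0\beta_1\cdots\beta_{i-1}}{\delta_1\delta_2\cdots\delta_i} \qquad (1 \le i \le N - 1),$$

or, summing over $i = x, x + 1, \ldots, N - 1$ and using the last boundary
condition in (5.13), one has

$$\frac{1}{u_N - u_{N-1}} - \frac{m(u_x) - m(u_{x-1})}{u_x - u_{x-1}} = -\sum_{i=x}^{N-1}\frac{\beta_0\beta_1\cdots\beta_{i-1}}{\delta_1\delta_2\cdots\delta_i} \qquad (1 \le x \le N - 1). \tag{5.14}$$

Relations (5.10) and (5.14) lead to

$$m(u_x) - m(u_{x-1}) = \frac{\beta_x\beta_{x+1}\cdots\beta_{N-1}}{\delta_x\delta_{x+1}\cdots\delta_{N-1}} + \sum_{i=x}^{N-1}\frac{\beta_x\cdots\beta_{i-1}\beta_i}{\delta_x\cdots\delta_i\,\beta_i} \qquad (1 \le x \le N - 1). \tag{5.15}$$

The factor $\beta_i/\beta_i$ is introduced in the last summands to take care of the summand
corresponding to $i = x$ (this summand is actually $1/\delta_x$). Sum (5.15) over
$x = 1, 2, \ldots, y$ to finally get, using $m(u_0) = 0$,

$$m(u_y) = \sum_{x=1}^{y}\frac{\beta_x\beta_{x+1}\cdots\beta_{N-1}}{\delta_x\delta_{x+1}\cdots\delta_{N-1}} + \sum_{x=1}^{y}\sum_{i=x}^{N-1}\frac{\beta_x\cdots\beta_{i-1}\beta_i}{\delta_x\cdots\delta_i\,\beta_i} \qquad (1 \le y \le N - 1). \tag{5.16}$$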
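In particular, for the Ehrenfest model one gets

$$\begin{aligned}
m_d = m(u_d) &= \sum_{x=1}^{d}\frac{(2d - x)\cdots 2\cdot 1}{x(x + 1)\cdots(2d - 1)} + \sum_{x=1}^{d}\sum_{i=x}^{2d-1}\frac{(2d - x)\cdots(2d - i)\,2d}{x(x + 1)\cdots i\,(2d - i)} \\
&= \sum_{x=1}^{d}\frac{(2d - x)!\,(x - 1)!}{(2d - 1)!} + \sum_{x=1}^{d}\sum_{i=x}^{2d-1}\frac{(2d - x)!\,(x - 1)!\,2d}{(2d - i)!\,i!} \\
&= 2^{2d}\left(1 + O\!\left(\frac{1}{d}\right)\right).
\end{aligned} \tag{5.17}$$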
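Next let us calculate

$$\bar m_i \equiv E_i T_d \qquad (0 \le i \le d), \tag{5.18}$$

where

$$T_d = \inf\{n \ge 0 : X_n = d\}. \tag{5.19}$$

Writing $\bar m(u_i) = \bar m_i$, one obtains the same equations as (5.13) for $1 \le i \le d - 1$,
and boundary conditions

$$\bar m(u_0) = 1 + \bar m(u_1), \qquad \bar m(u_d) = 0. \tag{5.20}$$

As before, summing the equations over $i = 1, 2, \ldots, x$,

$$\frac{\bar m(u_{x+1}) - \bar m(u_x)}{u_{x+1} - u_x} = \frac{\bar m(u_1) - \bar m(u_0)}{u_1 - u_0} - \sum_{i=1}^{x}\frac{\beta_0\beta_1\cdots\beta_{i-1}}{\delta_1\delta_2\cdots\delta_i}
= -1 - \sum_{i=1}^{x}\frac{\beta_0\beta_1\cdots\beta_{i-1}}{\delta_1\delta_2\cdots\delta_i}, \tag{5.21}$$

where $\beta_0 = 1$. Therefore,

$$\bar m(u_{x+1}) - \bar m(u_x) = -\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} - \sum_{i=1}^{x}\frac{\delta_{i+1}\cdots\delta_x\delta_{x+1}}{\beta_i\cdots\beta_x\,\delta_{x+1}}, \tag{5.22}$$

which, on summing over $x = 1, 2, \ldots, d - 1$ and using (5.20), leads to

$$-\bar m(u_0) = -1 - \sum_{x=1}^{d-1}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} - \sum_{x=1}^{d-1}\sum_{i=1}^{x}\frac{\delta_{i+1}\cdots\delta_x\delta_{x+1}}{\beta_i\cdots\beta_x\,\delta_{x+1}}. \tag{5.23}$$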

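For the present example this gives

$$\begin{aligned}
\bar m_0 &= 1 + \sum_{x=1}^{d-1}\frac{x!}{(2d - 1)\cdots(2d - x)} + \sum_{x=1}^{d-1}\sum_{i=1}^{x}\frac{(x + 1)x\cdots(i + 2)(i + 1)}{(2d - i)\cdots(2d - x)(x + 1)}\,2d \\
&\le 1 + \sum_{x=1}^{d-1}\frac{x!}{(2d - 1)\cdots(2d - x)} + \sum_{x=1}^{d-1}\frac{2d}{2d - x}\sum_{i=1}^{x}\left(\frac{x}{2d - x}\right)^{x-i} \\
&\le 1 + \sum_{x=1}^{d-1}\frac{x!}{(2d - 1)\cdots(2d - x)} + \sum_{x=1}^{d-1}\frac{2d}{2(d - x)} \\
&\le 1 + \sum_{x=1}^{d-1}\frac{x!}{(2d - 1)\cdots(2d - x)} + d(\log d + 1).
\end{aligned} \tag{5.24}$$

Since the sum in the last expression goes to zero as $d \to \infty$,

$$\bar m_0 \le d + d\log d + O(1), \qquad \text{as } d \to \infty. \tag{5.25}$$

For $d = 10\,000$ balls and rate of transition one ball per second, it follows that

$$\bar m_0 \le 102\,215 \text{ seconds} < 29 \text{ hours},$$

and $m_d = 10^{6000}$ years.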

It takes only about a day on the average for the system to reach equilibrium from
a state farthest from equilibrium, but takes an average time inconceivably large,
even compared to cosmological scales, for the system to go back to that state
from equilibrium.
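The contrast between the two mean passage times can be reproduced exactly for small d. The sketch below does not use the natural-scale formulas of the text; it swaps in the elementary one-step recursions for the expected time to move between neighboring states (a different but equivalent route), and the function name `ehrenfest_passage_times` is an illustrative assumption.

```python
from fractions import Fraction
import math

def ehrenfest_passage_times(d):
    """Return (E_d T_0, E_0 T_d) for the Ehrenfest chain with 2d balls.

    Uses h_i = expected time to go from i to i+1 and g_i = expected time to go
    from i to i-1, which satisfy h_i = (1 + delta_i h_{i-1}) / beta_i and
    g_i = (1 + beta_i g_{i+1}) / delta_i, with h_0 = 1 and g_{2d} = 1.
    """
    def beta(i):
        return Fraction(2 * d - i, 2 * d)

    def delta(i):
        return Fraction(i, 2 * d)

    up = [Fraction(1)]                       # h_0 = 1 since p_01 = 1
    for i in range(1, d):
        up.append((1 + delta(i) * up[i - 1]) / beta(i))

    down = {2 * d: Fraction(1)}              # g_{2d} = 1 since p_{2d,2d-1} = 1
    for i in range(2 * d - 1, 0, -1):
        down[i] = (1 + beta(i) * down[i + 1]) / delta(i)

    m_d = sum(down[i] for i in range(1, d + 1))   # d -> d-1 -> ... -> 0
    mbar_0 = sum(up)                              # 0 -> 1 -> ... -> d
    return m_d, mbar_0

def bound_5_24(d):
    """Right side of the last line of (5.24)."""
    s, term = 0.0, 1.0
    for x in range(1, d):
        term *= x / (2 * d - x)              # x! / ((2d-1)...(2d-x))
        s += term
    return 1 + s + d * (math.log(d) + 1)

small = {d: ehrenfest_passage_times(d) for d in range(1, 9)}
```

Already for d = 8 the passage from equilibrium to the empty state takes on the order of $2^{16}$ steps on average, while the reverse passage takes only a few dozen.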

The original calculations by the Ehrenfests concerned the mean recurrence
times. Using Theorem 9.2(c) of Chapter II it is possible to get these results
quite simply as follows. Let $\tau_i^{(1)} = \min\{n \ge 1 : X_n = i\}$. Then, the mean recurrence
time of the state $i$ is

$$E_i\tau_i^{(1)} = \frac{1}{\pi_i} = \frac{i!\,(2d - i)!}{(2d)!}\,2^{2d}. \tag{5.27}$$

For $d = 10\,000$ one gets, using Stirling's approximation for the second estimate,

$$E_0\tau_0^{(1)} = 2^{20\,000}, \qquad E_d\tau_d^{(1)} \approx 100\sqrt{\pi}. \tag{5.28}$$
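The two estimates in (5.28) can be reproduced directly from (5.27) with exact integer arithmetic. In the sketch below the function name is an illustrative assumption; the result is returned on a base-10 logarithmic scale because $2^{20\,000}$ far exceeds the floating-point range.

```python
import math

def log10_mean_recurrence_time(i, d):
    """log10 of 1/pi_i = 2^(2d) / C(2d, i), from (5.27)."""
    return 2 * d * math.log10(2) - math.log10(math.comb(2 * d, i))

d = 10_000
log10_at_zero = log10_mean_recurrence_time(0, d)   # extreme state i = 0
at_equilibrium = 2 ** (2 * d) / math.comb(2 * d, d)  # exact big-int ratio, then rounded
stirling = 100 * math.sqrt(math.pi)
```

The first quantity is about 6020.6, i.e. a recurrence time of roughly $10^{6020}$ steps for the empty state, whereas the state i = d recurs after about 177 steps on average.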

Thus, within time scales over which applications of thermodynamics make

sense, one would not observe a passage from equilibrium to a (macroscopic)
nonequilibrium state. Although Boltzmann did not live to see it, this vindication
of his theory ended a rather spirited debate on its validity and contributed in
no small measure to its eventual acceptance by physicists.
The spectral representation for $p$ can be obtained by precisely the same steps
as those outlined in Exercise 4.3. The $2d$ eigenvalues that one obtains are given
by $\alpha_j = j/d$, $j = \pm 1, \pm 2, \ldots, \pm d$.


Exercises for Section III.1

1. (Artificial Intelligence) The amplitude of pure noise in a signal-detection device
has p.d.f. $f_0(x)$, while the amplitude is distributed as $f_1(x)$ when a signal is present
with the noise. A detection procedure is designed as follows. Select a threshold value
$\theta_0 = r\delta$ for some integer $r$. If the first amplitude observed, $X_1$, is larger than the initial
threshold value $\theta_0$, then decide that the signal is present. Otherwise decide that the
signal is absent. Suppose that a signal is being sent with probability $p = \tfrac12$ and that
upon making a decision the observer learns whether or not the decision was correct.
The observer keeps the same threshold value if the decision is correct, but if the
decision was incorrect the threshold is increased or decreased by an amount $\delta$
depending on the type of error committed. The rule governing the learning process
of the observer is then

$$\theta_{n+1} = \theta_n + \delta\left[\mathbf{1}_{(0,\infty)}(X_{n+1} - \theta_n) - S_{n+1}\right],$$

where $S_n$ is 1 or 0 depending on whether a signal is sent or not. The signal transmission
processes $\{S_n\}$ and $\{X_n\}$ are i.i.d.
(i) Show that the threshold adjustment process $\{\theta_n\}$ is a birth-death Markov chain
and identify the state space.
(ii) Calculate the transition probabilities.


2. Suppose that balls labeled 1, ... , N are initially distributed between two boxes labeled
I and II. The state of the system represents the number of balls in box I. Determine
the one-step transition probabilities for each of the following rules of motion in the
state space.
(i) At each time step a ball is randomly (uniformly) selected from the numbers
1, 2, ... , N. Independently of the ball selected, box I or II is selected with
respective probabilities $p_1$ and $p_2 = 1 - p_1$. The ball selected is placed in the
box selected.
(ii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability $p_1$ or from those in II with probability $p_2 = 1 - p_1$. A
box is then selected with respective probabilities in proportion to current box
sizes. The ball selected is placed in the box selected.
(iii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability proportional to the current size of I or from those in II
with the complementary probability. A box is also selected with probabilities in
proportion to current box size. The ball selected is placed in the box selected.

Exercises for Section III.2

1. Let $A_d$ be the set $\{\omega : X_0(\omega) = y, \{X_n(\omega) : n \ge 0\}$ reaches $c$ before $d\}$, where $y > c$.
Show that $A_d \uparrow A = \{\omega : X_0(\omega) = y, \{X_n(\omega) : n \ge 0\}$ ever reaches $c\}$.

2. Prove (2.15) by using (2.14) and looking at $\{-X_n : n \ge 0\}$.

3. Prove (2.4), (2.16), and (2.23) by conditioning on $X_1$ and using the Markov property.

4. Suppose that $\varphi(i)$ $(c \le i \le d)$ satisfy the equations (2.4) and the boundary conditions
$\varphi(c) = 0$, $\varphi(d) = 1$. Prove that such a $\varphi$ is unique.
5. Consider a birth-death chain on $S = \{0, 1, \ldots, N\}$ with both boundaries reflecting.
(i) Prove that $P_i(T_j > mN) \le (1 - \delta_N\delta_{N-1}\cdots\delta_1)^m$ if $i > j$, and $\le (1 - \beta_0\beta_1\cdots\beta_{N-1})^m$
if $i < j$. Here $T_j = \inf\{n \ge 1 : X_n = j\}$.
(ii) Use (i) to prove that $\rho_{ij} = P_i(T_j < \infty) = 1$ for all $i, j$.
6. Consider a birth-death chain on $S = \{0, 1, \ldots\}$ with 0 reflecting. Argue as in Exercise
5 to show that $\rho_{0y} = 1$ for all $y$.
7. Consider a birth-death chain on $S = \{0, 1, \ldots, N\}$ with 0, $N$ absorbing. Calculate

$$\lim_{n\to\infty} n^{-1}\sum_{m=1}^{n} p_{ij}^{(m)}, \qquad \text{for all } i, j.$$

8. Let 0 be a reflecting boundary for a birth-death chain on $S = \{\ldots, -3, -2, -1, 0\}$.
Derive the necessary and sufficient condition for recurrence.
9. If 0 is absorbing, and $N$ reflecting, for a birth-death chain on $S = \{0, 1, \ldots, N\}$,
then show that 0 is recurrent and all other states are transient.
10. Let $p$ be the transition probability matrix of a birth-death chain on $S = \{0, 1, 2, \ldots\}$
with

$$\beta_j = \frac12\,\frac{j + 2}{j + 1}, \qquad \delta_j = \frac12\,\frac{j}{j + 1}, \qquad j = 0, 1, 2, \ldots.$$

(i) Are the states transient or recurrent?

(ii) Compute the probability of reaching c before d, c < d, starting from state i,
11. Suppose $p$ is the transition matrix of a birth-death chain on $S = \{0, 1, 2, \ldots\}$ such
that $\beta_0 = 1$, $\beta_j \le \delta_j$ for $j = 1, 2, \ldots$. Show that all states must be recurrent.

Exercises for Section III.3

1. Let $\{X_n\}$ be the asymmetric simple random walk on $S = \{0, 1, 2, \ldots\}$ with $\beta_j = p < \tfrac12$,
$j = 1, 2, \ldots$ and (partial) reflection at 0 with
(i) Calculate the invariant initial distribution $\pi$.
(ii) Calculate $E_\pi X_n$ as a function of $p < \tfrac12$.
2. (A Birth-Death Queue) During each unit of time either 0 or 1 customer arrives for
service and joins a single line. The probability of one customer arriving is $\lambda$, and no
customer arrives with probability $1 - \lambda$. Also during each unit of time, independently
of new arrivals, a single service is completed with probability $p$ or continues into the
next period with probability $1 - p$. Let $X_n$ be the total number of customers (waiting
in line or being serviced) at the $n$th unit of time.
(i) Show that $\{X_n\}$ is a birth-death chain on $S = \{0, 1, 2, \ldots\}$.
(ii) Discuss transience, recurrence, positive-recurrence.
(iii) Calculate the invariant initial distribution when $\lambda < p$.
(iv) Calculate $E_\pi X_n$ when $\lambda < p$, where $\pi$ is the invariant initial distribution.
3. Calculate the invariant distribution for Exercise 1.2(i) where

$$p_{ij} = \begin{cases}
\dfrac{N - i}{N}\,p_1, & \text{if } j = i + 1, \\[2ex]
\dfrac{i}{N}\,p_1 + \dfrac{N - i}{N}\,p_2, & \text{if } j = i,\ i = 0, 1, \ldots, N, \\[2ex]
\dfrac{i}{N}\,p_2, & \text{if } j = i - 1, \\[2ex]
0, & \text{otherwise.}
\end{cases}$$

Discuss the situation for Exercise 1.2(ii) and (iii).

4. (Time Reversal) Let $\{X_n\}$ be a (stationary) irreducible birth-death chain with
invariant initial distribution $\pi$. Show that $P_\pi(X_n = j \mid X_{n+1} = i) = P_\pi(X_{n+1} = j \mid X_n = i)$.

Exercises for Section III.4

1. Calculate $p_{ij}^{(n)}$ for the birth-death queue of Exercise 3.2.
2. Let T be a self-adjoint linear transformation on a finite-dimensional inner product
space V. Show
(i) All eigenvalues of T must be real.
(ii) Eigenvectors of T associated with distinct eigenvalues are orthogonal.


3. Calculate the transition probabilities $p_{ij}^{(n)}$ for $n \ge 1$ by the spectral method in the case
of Exercise 1.2(i) and $p_1 = p_2 = \tfrac12$ according to the following steps.
(i) Consider the eigenvalue problem for the transpose $p'$. Write out difference
equations for $p'x = \alpha x$.
(ii) Replace the system of equations in (i) by the infinite system

$$\frac12 x_0 + \frac{1}{2N}x_1 = \alpha x_0, \qquad \frac{N - i}{2N}x_i + \frac12 x_{i+1} + \frac{i + 2}{2N}x_{i+2} = \alpha x_{i+1},$$

$i = 0, 1, 2, \ldots$. Show that if for some $\alpha$ there is a nonzero solution
$x = (x_0, x_1, x_2, \ldots)'$ of this infinite system with $x_{N+1} = 0$, then $\alpha$ must be an
eigenvalue of $p'$ with corresponding eigenvector $(x_0, x_1, \ldots, x_N)'$. [Hint: Show
that $x_{N+1} = 0$ implies $x_i = 0$ for all $i \ge N + 1$.]
(iii) Introduce the generating function $\varphi(z) = \sum_{i=0}^{\infty} x_i z^i$ for the infinite system in (ii).
Note that for the desired solutions satisfying $x_{N+1} = 0$, $\varphi(z)$ will be a polynomial
of degree $\le N$. Show that

$$\varphi'(z) = \frac{N(2\alpha - 1 - z)}{1 - z^2}\,\varphi(z), \qquad \varphi(0) = x_0.$$

[Hint: Multiply both sides of the second equation in (ii) by $z^i$ and sum over
$i \ge 0$.]
(iv) Show that (iii) has the unique solution

$$\varphi(z) = x_0(1 - z)^{N(1-\alpha)}(1 + z)^{N\alpha}.$$

(v) Show that for $\alpha_j = j/N$, $j = 0, 1, \ldots, N$, $\varphi(z)$ is a polynomial of degree $N$ and
therefore, by (ii) and (iii), $\alpha_j = j/N$, $j = 0, 1, \ldots, N$, are the eigenvalues of $p'$ and,
therefore, of $p$.
(vi) Show that the eigenvector $x^{(j)} = (x_0^{(j)}, \ldots, x_N^{(j)})'$ corresponding to $\alpha_j = j/N$ is
given, with $x_0^{(j)} = 1$, by $x_k^{(j)}$ = coefficient of $z^k$ in $(1 - z)^{N-j}(1 + z)^j$.
(vii) Write B for the matrix with columns $x^{(0)}, \ldots, x^{(N)}$. Then,

\[
p^n = (B')^{-1}\,\mathrm{diag}(\alpha_0^n, \ldots, \alpha_N^n)\,B',
\]

where

\[
(B')^{-1} = \mathrm{diag}\!\left(\frac{1}{\pi_0}, \ldots, \frac{1}{\pi_N}\right) B\;
\mathrm{diag}\!\left(\frac{1}{\|x^{(0)}\|_\pi^2}, \ldots, \frac{1}{\|x^{(N)}\|_\pi^2}\right),
\]

and the (invariant) distribution $\pi$ is binomial with parameters p = 1/2, N; see Exercise 1.2(i).

[Hint: Use the definitions of eigenvectors and matrix multiplication to write
$(p')^n B = B\,\mathrm{diag}(\alpha_0^n, \ldots, \alpha_N^n)$. Multiply both sides by $B^{-1}$ and take the transpose
to get the spectral representation of $p^n$. Note that the columns of $(B')^{-1}$ are the
eigenvectors of p since $p(B')^{-1} = (B')^{-1}\,\mathrm{diag}(\alpha_0, \ldots, \alpha_N)$. To compute $(B')^{-1}$
use orthogonality of the eigenvectors with respect to $(\cdot,\cdot)_\pi$ to first write
$B'\,\mathrm{diag}(1/\pi_0, \ldots, 1/\pi_N)\,B = \mathrm{diag}(\|x^{(0)}\|_\pi^2, \ldots, \|x^{(N)}\|_\pi^2)$. The formula for $(B')^{-1}$ follows.]
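The claims in steps (v) and (vi) can be checked exactly for small N. The following is a minimal sketch, assuming the transition probabilities read off from the system in (ii) with $p_1 = p_2 = 1/2$ (that is, $p_{i,i+1} = (N-i)/2N$, $p_{ii} = 1/2$, $p_{i,i-1} = i/2N$); the function names are illustrative and not taken from the text.

```python
from fractions import Fraction


def transition_matrix(N):
    """Assumed chain: p_{i,i+1} = (N-i)/(2N), p_{ii} = 1/2, p_{i,i-1} = i/(2N)."""
    P = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        P[i][i] = Fraction(1, 2)
        if i < N:
            P[i][i + 1] = Fraction(N - i, 2 * N)
        if i > 0:
            P[i][i - 1] = Fraction(i, 2 * N)
    return P


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def eigenvector(N, j):
    """Coefficients of (1 - z)^(N - j) * (1 + z)^j, as in step (vi)."""
    v = [1]
    for _ in range(N - j):
        v = poly_mul(v, [1, -1])
    for _ in range(j):
        v = poly_mul(v, [1, 1])
    return v


def is_left_eigenpair(N, j):
    """Check p'x = (j/N) x exactly, i.e. sum_i x_i p_{ik} = (j/N) x_k for every k."""
    P, x = transition_matrix(N), eigenvector(N, j)
    alpha = Fraction(j, N)
    return all(
        sum(x[i] * P[i][k] for i in range(N + 1)) == alpha * x[k]
        for k in range(N + 1)
    )
```

Calling `is_left_eigenpair(N, j)` for each j = 0, ..., N tests every claimed eigenvalue j/N in exact rational arithmetic; for j = N the vector returned by `eigenvector` consists of binomial coefficients, in line with the binomial invariant distribution mentioned in (vii).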

4. (Relaxation and Correlation Length) Let p be the transition matrix for a finite state
stationary birth-death chain $\{X_n\}$ on S = {0, 1, ..., N} with reflecting boundaries at
0 and N. Show that

\[
\sup_{f,g}\,\mathrm{Corr}_\pi(f(X_n), g(X_0)) = e^{-\lambda_1 n},
\]

where

\[
\mathrm{Corr}_\pi(f(X_n), g(X_0)) =
\frac{E_\pi\{[f(X_n) - E_\pi f(X_n)][g(X_0) - E_\pi g(X_0)]\}}
{(\mathrm{Var}_\pi f(X_n))^{1/2}\,(\mathrm{Var}_\pi g(X_0))^{1/2}},
\]

$e^{-\lambda_1}$ is the largest nontrivial (i.e., $\ne 1$) eigenvalue of p, and the supremum is over
real-valued functions f and g on S. The parameter $\tau = 1/\lambda_1$ is called the relaxation
time or correlation length in applications. [Hint: Use the self-adjointness of p (i.e.,
time-reversibility) with respect to $(\cdot,\cdot)_\pi$ to obtain an orthonormal basis $\{\varphi_n\}$ of
eigenvectors of p. Check equality in the case $f = g = \varphi_1$, where $\varphi_1$ is the eigenvector
corresponding to $e^{-\lambda_1}$. Restrict attention to f and g such that $E_\pi f = E_\pi g = 0$ and
$\|f\|_\pi = \|g\|_\pi = 1$, and expand as

\[
f = \sum_n (f, \varphi_n)_\pi\,\varphi_n, \qquad g = \sum_n (g, \varphi_n)_\pi\,\varphi_n.
\]

Use the inequality $ab \le (a^2 + b^2)/2$ to show $|\mathrm{Corr}_\pi(f(X_n), g(X_0))| \le e^{-\lambda_1 n}$.]

5. (i) (Simple Random Walk with Periodic Boundary) States 0, 1, 2, ..., N - 1 are
arranged clockwise in a circle. A transition occurs either one unit clockwise or
one unit counterclockwise with respective probabilities p and q = 1 - p. Show that

\[
p_{jk}^{(n)} = \frac{1}{N}\sum_{r=0}^{N-1} \theta^{r(j-k)}\,(p\,\theta^{r} + q\,\theta^{-r})^{n},
\]

where $\theta = e^{2\pi i/N}$ is an Nth root of unity (all Nth roots of unity being
$1, \theta, \theta^2, \ldots, \theta^{N-1}$).

(*ii) (General Random Walk with Periodic Boundary) Suppose that for the
arrangement in (i), a transition k units clockwise (equivalently, N - k units
counterclockwise) occurs with probability $p_k$, k = 0, 1, ..., N - 1. Show that

\[
p_{jk}^{(n)} = \frac{1}{N}\sum_{r=0}^{N-1} \theta^{r(j-k)} \left(\sum_{s=0}^{N-1} p_s\,\theta^{rs}\right)^{n},
\]

where $\theta = e^{2\pi i/N}$ is an Nth root of unity.
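The root-of-unity formula in part (ii) can be compared numerically with a direct n-step computation. This is a minimal sketch under the assumption that states are labelled so that a step of s units clockwise takes state a to (a + s) mod N; the function names and the list `p` (with `p[s]` the probability of a step of s units) are illustrative.

```python
import cmath


def n_step_formula(p, n, j, k):
    """n-step probability from j to k via the Nth roots of unity."""
    N = len(p)
    theta = cmath.exp(2 * cmath.pi * 1j / N)
    total = 0
    for r in range(N):
        # Eigenvalue of the circulant transition matrix for the r-th root of unity.
        eig = sum(p[s] * theta ** (r * s) for s in range(N))
        total += theta ** (r * (j - k)) * eig ** n
    return (total / N).real


def n_step_direct(p, n, j, k):
    """n-step probability from j to k by propagating the distribution step by step."""
    N = len(p)
    row = [0.0] * N
    row[j] = 1.0
    for _ in range(n):
        new = [0.0] * N
        for a in range(N):
            for s in range(N):
                new[(a + s) % N] += row[a] * p[s]
        row = new
    return row[k]
```

Part (i) is the special case `p = [0, p, 0, ..., 0, q]`, where only steps of 1 and N - 1 units have positive probability.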


Theoretical Complements to Section III.5

1. Conservative dynamical systems consisting of one or many degrees of freedom
(components, particles) are often represented by one-parameter families of
transformations $\{T_t\}$ acting on points $x = (x_1, \ldots, x_n)$ of $\mathbb{R}^n$ (position-momentum
phase space). The transformations are obtained from the differential equations
(Newton's Law) that govern the evolution started at arbitrary states $x \in \mathbb{R}^n$; i.e., $T_t x$
is the state at time t when initially the state is $T_0 x = x$, $x \in \mathbb{R}^n$. The mathematical
theory described below applies to phenomena where the physics provide a law of
evolution of the form

\[
\frac{\partial T_t x}{\partial t} = f(T_t x), \quad t > 0, \qquad T_0 x = x, \tag{T.5.1}
\]

such that $f = (f_1, \ldots, f_n)\colon \mathbb{R}^n \to \mathbb{R}^n$ uniquely determines the solution at all times $t > 0$
for each initial state x by (T.5.1).

Example 1. Consider a mass m on a spring initially displaced to a location $x_1$
(relative to rest position at $x_1 = 0$) and with initial momentum $x_2$. Then Hooke's
law provides the force (acting along the gradient of the potential curve $U(x_1) = \tfrac{1}{2}kx_1^2$),
according to which $f(x) = (f_1(x), f_2(x)) = ((1/m)x_2, -kx_1)$, where k > 0 is the spring
constant. In particular, it follows that $T_t x = xA(t)$, where

\[
A(t) = \begin{pmatrix} \cos(\gamma t) & -m\gamma \sin(\gamma t) \\[4pt] \dfrac{1}{m\gamma}\sin(\gamma t) & \cos(\gamma t) \end{pmatrix},
\quad t \ge 0, \qquad \text{where } \gamma = \sqrt{\frac{k}{m}} > 0.
\]

Notice that areas (2-dimensional phase-space volume) are preserved under $T_t$ since
$\det A(t) = 1$. The motion is obviously periodic in this case.
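The properties noted in Example 1 can be checked numerically. The sketch below assumes the row-vector convention $T_t x = xA(t)$ with $\gamma = \sqrt{k/m}$; the function names and the sample values of m and k used in any call are illustrative choices, not part of the text.

```python
import math


def flow_matrix(t, m, k):
    """A(t) for the mass-spring system, with gamma = sqrt(k/m)."""
    g = math.sqrt(k / m)
    return [
        [math.cos(g * t), -m * g * math.sin(g * t)],
        [math.sin(g * t) / (m * g), math.cos(g * t)],
    ]


def apply_flow(x, t, m, k):
    """T_t x = x A(t), with x = (position, momentum) treated as a row vector."""
    A = flow_matrix(t, m, k)
    return (
        x[0] * A[0][0] + x[1] * A[1][0],
        x[0] * A[0][1] + x[1] * A[1][1],
    )


def energy(x, m, k):
    """Total energy: kinetic p^2/(2m) plus potential k q^2/2."""
    return x[1] ** 2 / (2 * m) + 0.5 * k * x[0] ** 2
```

With these helpers one can confirm that the determinant of `flow_matrix(t, m, k)` equals 1 up to rounding, that `energy` is unchanged along the flow, and that `apply_flow` returns the starting point after one period $2\pi/\gamma$.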

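Example 2. A standard model in statistical mechanics is that of a system having k
(generalized) position coordinates $q_1, \ldots, q_k$ and k corresponding (generalized)
momentum coordinates $p_1, \ldots, p_k$. The law of evolution is usually cast in Hamiltonian
form:

\[
\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}, \qquad i = 1, \ldots, k, \tag{T.5.2}
\]

where $H \equiv H(q_1, \ldots, q_k, p_1, \ldots, p_k)$ is the Hamiltonian function representing the
total energy (kinetic energy plus potential energy) of the system. Example 1 is of this
form with k = 1, $H(q, p) = p^2/2m + \tfrac{1}{2}kq^2$. Writing n = 2k, $x_1 = q_1, \ldots, x_k = q_k$,
$x_{k+1} = p_1, \ldots, x_{2k} = p_k$, this is also of the form (T.5.1) with

\[
f(x) = (f_1(x), \ldots, f_{2k}(x)) = \left(\frac{\partial H}{\partial x_{k+1}}, \ldots, \frac{\partial H}{\partial x_{2k}}, -\frac{\partial H}{\partial x_1}, \ldots, -\frac{\partial H}{\partial x_k}\right). \tag{T.5.3}
\]

Observe that for H sufficiently smooth, the flow in phase space is generally
incompressible. That is,

\[
\operatorname{div} f(x) \equiv \operatorname{trace}\left(\frac{\partial f_i}{\partial x_j}\right)
= \sum_{i=1}^{2k} \frac{\partial f_i}{\partial x_i}
= \sum_{i=1}^{k} \left[\frac{\partial}{\partial x_i}\left(\frac{\partial H}{\partial x_{k+i}}\right) + \frac{\partial}{\partial x_{k+i}}\left(-\frac{\partial H}{\partial x_i}\right)\right]
= 0 \quad \text{for all } x. \tag{T.5.4}
\]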

Liouville first noticed the important fact that incompressibility gives the volume
preserving property of the flow in phase space.

Liouville Theorem T.5.1. Suppose that f(x) in (T.5.1) is such that div f(x) = 0 for
all x. Then for each bounded (measurable) set $D \subset \mathbb{R}^n$, $|T_t D| = |D|$ for all t > 0, where
$|\cdot|$ denotes n-dimensional volume (Lebesgue measure).

Proof. By the uniqueness condition stated at the outset we have $T_{t+h} = T_h T_t$ for all
t, h > 0. So, by the change of variable formula,

\[
|T_{t+h} D| = \int_{T_t D} \det\left(\frac{\partial T_h x}{\partial x}\right) dx.
\]

To calculate the Jacobian, first note from (T.5.1) that

\[
\frac{\partial T_h x}{\partial x} = I + \frac{\partial f}{\partial x}\,h + O(h^2) \quad \text{as } h \to 0.
\]

But, expanding the determinant and collecting terms, one sees for any matrix M that

\[
\det(I + hM) = 1 + h\,\operatorname{trace}(M) + O(h^2) \quad \text{as } h \to 0.
\]

Thus, since $\operatorname{trace}(\partial f/\partial x) = \operatorname{div} f(x) = 0$,

\[
\det\left(\frac{\partial T_h x}{\partial x}\right) = 1 + O(h^2) \quad \text{as } h \to 0.
\]

It follows that for each $t \ge 0$

\[
|T_{t+h} D| = |T_t D| + O(h^2) \quad \text{as } h \to 0,
\]

so that

\[
\frac{d}{dt}|T_t D| = 0 \quad \text{and} \quad |T_0 D| = |D|,
\]

i.e., $t \mapsto |T_t D|$ is constant with constant value $|D|$. ∎
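As a small worked check of the determinant expansion used in the proof, the 2 x 2 case can be written out by hand. The entries $m_{ij}$ below are generic and are introduced here only for illustration:

\[
\det(I + hM) = (1 + h m_{11})(1 + h m_{22}) - h^2 m_{12} m_{21}
= 1 + h\,(m_{11} + m_{22}) + h^2\,(m_{11} m_{22} - m_{12} m_{21}).
\]

The coefficient of h is trace(M), and in this case the remainder is exactly $h^2 \det(M)$, which is $O(h^2)$. For the spring of Example 1, $\partial f/\partial x$ has rows $(0, 1/m)$ and $(-k, 0)$, so its trace is zero and the first-order term vanishes, as the theorem requires.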

2. Liouville's theorem becomes especially interesting when considered alongside the
following theorem of Poincaré.

Poincaré's Recurrence Theorem T.5.2. Let T be any volume preserving continuous
one-to-one mapping of a bounded (measurable) region $D \subset \mathbb{R}^n$ onto itself. Then for
each neighborhood $\Delta$ of any point x in D and every n, however large, there is a subset
B of $\Delta$ having positive volume such that for all $y \in B$, $T^r y \in \Delta$ for some
$r \ge n$.

Proof. Consider $\Delta, T^{-n}\Delta, T^{-2n}\Delta, \ldots$. Then there are distinct times i, j such that
$|T^{-in}\Delta \cap T^{-jn}\Delta| \ne 0$; for otherwise

\[
|D| \ge \left|\bigcup_{i=0}^{\infty} T^{-in}\Delta\right| = \sum_{i=0}^{\infty} |T^{-in}\Delta| = \sum_{i=0}^{\infty} |\Delta| = +\infty.
\]

It follows that

\[
|\Delta \cap T^{-n|j-i|}\Delta| \ne 0.
\]

Take $B = \Delta \cap T^{-n|j-i|}\Delta$, $r = n|j - i|$. ∎
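The recurrence statement can be illustrated on a simple volume-preserving map that is not discussed in the text: rotation of the circle [0, 1) by a fixed angle, T(y) = y + alpha (mod 1). The sketch below is only an illustration under that assumption; it searches for the first time r >= n at which the orbit of y comes back within a given distance of y. The function name and the search cap `max_steps` are hypothetical.

```python
def first_return(y, half_width, alpha, n, max_steps=10**6):
    """Smallest r >= n with T^r y within half_width of y on the circle [0, 1),
    where T is rotation by alpha; returns None if none is found by max_steps."""
    for r in range(n, max_steps + 1):
        z = (y + r * alpha) % 1.0
        d = abs(z - y)
        # Distance measured around the circle, so 0.99 and 0.01 are close.
        if min(d, 1.0 - d) < half_width:
            return r
    return None
```

For example, `first_return(0.3, 0.01, (5 ** 0.5 - 1) / 2, 10)` looks for a return of the point 0.3 to its own 0.01-neighborhood at some time r >= 10 under rotation by the golden-ratio angle.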

3. S. Chandrasekhar (1943), "Stochastic Problems in Physics and Astronomy", Reviews
of Modern Physics, 15, 1-89, contains a discussion of Boltzmann and Zermelo's
classical analysis together with other applications of Markov chains to physics. More
complete references on alternative derivations as well as the computation of the mean
recurrence time of a state can be found in M. Kac (1947), "Random Walk and the
Theory of Brownian Motion", American Mathematical Monthly, 54, 369-391; also
see E. Waymire (1982), "Mixing and Cooling from a Probabilistic Point of View",
SIAM Review, 24, 73-75.

Continuous-Parameter Markov Chains


Suppose that $\{X_t: t \ge 0\}$ is a continuous-parameter stochastic process with a
finite or denumerably infinite state space S. Just as in the discrete-parameter
case, the Markov property here also refers to the property that the conditional
distribution of the future, given past and present states of the process, does not
depend on the past. In terms of finite-dimensional events, the Markov property
requires that for arbitrary time points $0 \le s_0 < s_1 < \cdots < s_k < s < t < t_1 < \cdots < t_n$
and states $i_0, \ldots, i_k, i, j, j_1, \ldots, j_n$ in S

\[
\begin{aligned}
P(X_t = j, X_{t_1} = j_1, \ldots, X_{t_n} = j_n \mid X_{s_0} = i_0, \ldots, X_{s_k} = i_k, X_s = i) \\
= P(X_t = j, X_{t_1} = j_1, \ldots, X_{t_n} = j_n \mid X_s = i).
\end{aligned} \tag{1.1}
\]

In other words, for any sequence of time points $0 \le t_0 < t_1 < \cdots$, the discrete
parameter process $Y_0 := X_{t_0}, Y_1 := X_{t_1}, \ldots$ is a Markov chain as described in
Chapter II. The conditional probabilities $p_{ij}(s, t) = P(X_t = j \mid X_s = i)$, $0 \le s < t$,
are collectively referred to as the transition probability law for the process. In
the case $p_{ij}(s, t)$ is a function of $t - s$, the transition law is called
time-homogeneous, and we write $p_{ij}(s, t) = p_{ij}(t - s)$.
Simple examples of continuous-parameter Markov chains are the
continuous-time random walks, or processes with independent increments on
countable state space. Some others are described in the examples below.

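Example 1. (The Poisson Process). The Poisson process with intensity function
$\rho$ is a process with state space S = {0, 1, 2, ...} having independent increments
distributed as

\[
P(X_t - X_s = j) = \frac{\left(\int_s^t \rho(u)\,du\right)^{j}}{j!}\,\exp\left(-\int_s^t \rho(u)\,du\right), \tag{1.2}
\]

for j = 0, 1, 2, ..., s < t, where $\rho(u)$, $u \ge 0$, is a continuous nonnegative
function. Just as with the simple random walk in Chapter II, the Markov
property for the Poisson process is a consequence of the independent increments
property (Exercise 2). Moreover,

\[
\begin{aligned}
p_{ij}(s, t) &= P(X_t = j \mid X_s = i) = \frac{P(X_t = j, X_s = i)}{P(X_s = i)}
= \frac{P(X_t - X_s = j - i)\,P(X_s = i)}{P(X_s = i)} \\[4pt]
&= \begin{cases}
\dfrac{\left(\int_s^t \rho(u)\,du\right)^{j-i}}{(j-i)!}\,\exp\left(-\displaystyle\int_s^t \rho(u)\,du\right), & \text{for } j \ge i, \\[8pt]
0, & \text{if } j < i.
\end{cases}
\end{aligned} \tag{1.3}
\]

In the case that $\rho(u) \equiv \lambda$ (constant) for each $u \ge 0$,

\[
p_{ij}(s, t) = \begin{cases}
\dfrac{[\lambda(t - s)]^{j-i}\,e^{-\lambda(t-s)}}{(j-i)!}, & j \ge i, \\[8pt]
0, & j < i,
\end{cases} \tag{1.4}
\]

is a function $p_{ij}(t - s)$ of s and t through $t - s$; i.e., the transition law is
time-homogeneous. In this case the process is referred to as the Poisson process
with parameter $\lambda$.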

Example 2. (The Compound Poisson Process). Let $\{N_t\}$ be a Poisson process
with parameter $\lambda > 0$ starting at 0, and let $Y_1, Y_2, \ldots$ be i.i.d. integer-valued
random variables, independent of the process $\{N_t\}$, having a common
probability mass function f. The process $\{X_t\}$ is defined by

\[
X_t = \sum_{n=0}^{N_t} Y_n, \tag{1.5}
\]

where $Y_0$ is independent of the process $\{N_t\}$ and of $Y_1, Y_2, \ldots$. The stochastic
process $\{X_t\}$ is called a Compound Poisson Process. The process has independent
increments and is therefore Markovian (Exercise 4). As a consequence of the
independence of increments,

\[
p_{ij}(s, t) = P(X_t = j \mid X_s = i) = P(X_t - X_s = j - i \mid X_s = i) = P(X_t - X_s = j - i). \tag{1.6}
\]

Therefore,

\[
p_{ij}(s, t) = E\{P(X_t - X_s = j - i \mid N_t - N_s)\}
= \sum_{k=0}^{\infty} f^{*k}(j - i)\,\frac{[\lambda(t - s)]^{k}}{k!}\,e^{-\lambda(t-s)}, \tag{1.7}
\]

where $f^{*k}$ is the k-fold convolution of f with $f^{*0}(0) = 1$. In particular, $\{X_t\}$
has a time-homogeneous transition law. The continuous-time simple random
walk is defined as the special case with $f(+1) = P(Y_n = +1) = p$,
$f(-1) = P(Y_n = -1) = q$, $0 < p, q < 1$, $p + q = 1$.
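Formula (1.7) can be evaluated numerically by building the convolution powers of f one at a time. The following is a minimal sketch, not a definitive implementation: the jump distribution is passed as a dictionary `f` mapping integer jump sizes to probabilities, and the infinite sum over k is truncated at a hypothetical cutoff `kmax`, which is adequate only when $\lambda(t - s)$ is small relative to `kmax`.

```python
import math


def convolve(f, g):
    """Convolution of two probability mass functions on the integers (as dicts)."""
    out = {}
    for a, fa in f.items():
        for b, gb in g.items():
            out[a + b] = out.get(a + b, 0.0) + fa * gb
    return out


def compound_poisson_transition(i, j, s, t, lam, f, kmax=60):
    """Approximate p_ij(s, t) from (1.7), truncating the sum over k at kmax."""
    mu = lam * (t - s)
    conv = {0: 1.0}          # f^{*0}: unit mass at 0
    weight = math.exp(-mu)   # Poisson weight e^{-mu} mu^k / k! for k = 0
    total = 0.0
    for k in range(kmax + 1):
        total += conv.get(j - i, 0.0) * weight
        conv = convolve(conv, f)   # f^{*(k+1)}
        weight *= mu / (k + 1)     # Poisson weight for k + 1
    return total
```

With `f = {1: 1.0}` every jump is +1 and the result reduces to the Poisson transition probability (1.4); with `f = {1: p, -1: q}` it gives the continuous-time simple random walk.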

Another popular continuous-parameter Markov chain with nonhomogeneous
transition law, the Pólya process, is provided in Exercise 6 of the next section
as a limiting approximation to the (nonhomogeneous) discrete-parameter
partial sum process in Example 3.8, Chapter II. However, unless stated
otherwise, we shall generally restrict our attention to the study of Markov
chains with a time-homogeneous transition law.


We continue to denote the finite or denumerable state space by S. To construct
a Markov process in discrete time (i.e., to find all possible joint distributions)
it was enough to specify a one-step transition matrix p along with an initial
distribution $\pi$. In the continuous-parameter case, on the other hand, the
specification of a single transition matrix $p(t_0) = ((p_{ij}(t_0)))$, where $p_{ij}(t_0)$ gives
the probability that the process will be in state j at time $t_0$ if it is initially at
state i, together with an initial distribution $\pi$, is not adequate. For a single time
point $t_0$, $p(t_0)$ together with $\pi$ will merely specify joint distributions of
$X_0, X_{t_0}, X_{2t_0}, \ldots, X_{nt_0}, \ldots$; for example,

\[
P_\pi(X_0 = i_0, X_{t_0} = i_1, \ldots, X_{nt_0} = i_n) = \pi_{i_0}\,p_{i_0 i_1}(t_0)\,p_{i_1 i_2}(t_0) \cdots p_{i_{n-1} i_n}(t_0). \tag{2.1}
\]

Here $p(t_0)$ takes the place of p, and $t_0$ is treated as the unit of time. Events that
depend on the process at time points that are not multiples of $t_0$ are excluded.
Likewise, specifying transition matrices $p(t_0), p(t_1), \ldots, p(t_n)$ for an arbitrary
finite set of time points $t_0, t_1, \ldots, t_n$ will not be enough.
On the other hand, if one specifies all transition matrices p(t) of a
time-homogeneous Markov chain for values of t in a time interval $0 < t \le t_0$
for some $t_0 > 0$, then, regardless of how small $t_0 > 0$ may be, all other transition
probabilities may be constructed from these. To understand this basic fact, first
assume transition matrices p(t) to be given for all t > 0, together with an initial
distribution $\pi$. Then for any finite set of time points $0 < t_1 < t_2 < \cdots < t_n$, the
joint distribution of $X_0, X_{t_1}, \ldots, X_{t_n}$ is given by

\[
\begin{aligned}
P_\pi(X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, \ldots, X_{t_{n-1}} = i_{n-1}, X_{t_n} = i_n) \\
= \pi_{i_0}\,p_{i_0 i_1}(t_1)\,p_{i_1 i_2}(t_2 - t_1) \cdots p_{i_{n-1} i_n}(t_n - t_{n-1}).
\end{aligned} \tag{2.2}
\]

Specializing to $\pi_i = 1$, $t_1 = t$, $t_2 = t + s$, it follows that

\[
P_i(X_t = j, X_{t+s} = k) = p_{ij}(t)\,p_{jk}(s),
\]

and

\[
P_i(X_{t+s} = k) = p_{ik}(t + s). \tag{2.3}
\]

But $\{X_{t+s} = k\}$ is the countable union $\bigcup_{j \in S} \{X_t = j, X_{t+s} = k\}$ of pairwise
disjoint events. Therefore,

\[
P_i(X_{t+s} = k) = \sum_{j \in S} P_i(X_t = j, X_{t+s} = k). \tag{2.4}
\]

The relations (2.3) and (2.4) provide the Chapman-Kolmogorov equations,

\[
p_{ik}(t + s) = \sum_{j \in S} p_{ij}(t)\,p_{jk}(s) \qquad (i, k \in S;\ s > 0,\ t > 0), \tag{2.5}
\]

which may also be expressed in matrix notation by the following so-called
semigroup property

\[
p(t + s) = p(t)\,p(s) \qquad (s > 0,\ t > 0). \tag{2.6}
\]
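The Chapman-Kolmogorov equations (2.5) can be checked numerically for the time-homogeneous Poisson process of Example 1, whose transition probabilities are given by (1.4). In the sketch below the sum over j is finite, since $p_{ij}(t)\,p_{jk}(s)$ vanishes unless $i \le j \le k$; the function names are illustrative.

```python
import math


def poisson_p(i, j, t, lam):
    """p_ij(t) for the Poisson process with parameter lam, as in (1.4)."""
    if j < i:
        return 0.0
    return (lam * t) ** (j - i) * math.exp(-lam * t) / math.factorial(j - i)


def chapman_kolmogorov_gap(i, k, t, s, lam):
    """Absolute difference between the two sides of (2.5) for the Poisson process."""
    lhs = poisson_p(i, k, t + s, lam)
    rhs = sum(poisson_p(i, j, t, lam) * poisson_p(j, k, s, lam) for j in range(i, k + 1))
    return abs(lhs - rhs)
```

The gap should be zero up to floating-point rounding for any choice of states and positive times, which in this case is the binomial theorem applied to $(\lambda t + \lambda s)^{k-i}$.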