
woess_titelei 15.7.2009 14:12 Uhr Seite 1


EMS Textbooks in Mathematics
EMS Textbooks in Mathematics is a series of books aimed at students or professional mathemati-
cians seeking an introduction into a particular field. The individual volumes are intended not only
to provide relevant techniques, results, and applications, but also to afford insight into the motiva-
tions and ideas behind the theory. Suitably designed exercises help to master the subject and
prepare the reader for the study of more advanced and specialized literature.
Jørn Justesen and Tom Høholdt, A Course In Error-Correcting Codes
Markus Stroppel, Locally Compact Groups
Peter Kunkel and Volker Mehrmann, Differential-Algebraic Equations
Dorothee D. Haroske and Hans Triebel, Distributions, Sobolev Spaces, Elliptic Equations
Thomas Timmermann, An Invitation to Quantum Groups and Duality
Oleg Bogopolski, Introduction to Group Theory
Marek Jarnicki and Peter Pflug, First Steps in Several Complex Variables: Reinhardt Domains
Tammo tom Dieck, Algebraic Topology
Mauro C. Beltrametti et al., Lectures on Curves, Surfaces and Projective Varieties
Wolfgang Woess
Denumerable
Markov Chains
Generating Functions
Boundary Theory
Random Walks on Trees
Author:
Wolfgang Woess
Institut für Mathematische Strukturtheorie
Technische Universität Graz
Steyrergasse 30
8010 Graz
Austria
2000 Mathematics Subject Classification (primary; secondary): 60-01; 60J10, 60J50, 60J80, 60G50,
05C05, 94C05, 15A48
Key words: Markov chain, discrete time, denumerable state space, recurrence, transience, reversible
Markov chain, electric network, birth-and-death chains, Galton–Watson process, branching Markov
chain, harmonic functions, Martin compactification, Poisson boundary, random walks on trees
ISBN 978-3-03719-071-5
The Swiss National Library lists this publication in The Swiss Book, the Swiss national bibliography,
and the detailed bibliographic data are available on the Internet at http://www.helveticat.ch.
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation,
broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of
use permission of the copyright owner must be obtained.
© 2009 European Mathematical Society
Contact address:
European Mathematical Society Publishing House
Seminar for Applied Mathematics
ETH-Zentrum FLI C4
CH-8092 Zürich
Switzerland
Phone: +41 (0)44 632 34 36
Email: info@ems-ph.org
Homepage: www.ems-ph.org
Cover photograph by Susanne Woess-Gallasch, Hakone, 2007
Typeset using the author's TEX files: I. Zimmermann, Freiburg
Printed in Germany
9 8 7 6 5 4 3 2 1
Preface
This book is about time-homogeneous Markov chains that evolve with discrete
time steps on a countable state space. This theory was born more than 100 years
ago, and its beauty stems from the simplicity of the basic concept of these random
processes: given the present, the future does not depend on the past. While of
course a theory that builds upon this axiom cannot explain all the weird problems of
life in our complicated world, it is coupled with an ample range of applications as
well as the development of a widely ramified and fascinating mathematical theory.
Markov chains provide one of the most basic models of stochastic processes that
can be understood at a very elementary level, while at the same time there is an
amazing amount of ongoing, new and deep research work on that subject.
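In symbols, the axiom alluded to here can be written as follows (the notation $X_n$ for the chain and $p(x,y)$ for the one-step transition probabilities anticipates Chapter 1):

```latex
\Pr[X_{n+1} = y \mid X_n = x,\, X_{n-1} = x_{n-1},\, \dots,\, X_0 = x_0]
  \;=\; \Pr[X_{n+1} = y \mid X_n = x] \;=\; p(x,y).
```

That is, conditionally on the present state $X_n = x$, the future is independent of the past, and time homogeneity means that $p(x,y)$ does not depend on $n$.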
The present textbook is based on my Italian lecture notes "Catene di Markov
e teoria del potenziale nel discreto" from 1996 [W1]. I thank the Unione Matematica
Italiana for authorizing me to publish this translation. However, this is not just a
one-to-one translation. My view on the subject has widened, part of the old material
has been rearranged or completely modified, and a considerable amount of material
has been added. Only Chapters 1, 2, 6, 7 and 8 and a smaller portion of Chapter 3
follow closely the original, so that the material has almost doubled.
As one will see from the summary (page ix) and the table of contents, this book is
not about applied mathematics but rather tries to develop the pure mathematical theory,
starting at a very introductory level and then displaying several of the many fasci-
nating features of that theory.
Prerequisites are, besides the standard first year linear algebra and calculus
(including power series), an understanding of, and most importantly an interest in,
probability theory, possibly including measure theory, even though a good part of
the material can be digested even if measure theory is avoided. A small amount
of complex function theory, in connection with the study of generating functions,
is needed a few times, but only at a very light level: it is useful to know what a
singularity is, and that for a power series with non-negative coefficients the radius
of convergence is a singularity. At some points, some elementary combinatorics is
involved. For example, it will be good to know how one solves a linear recursion
with constant coefficients. Besides this, very basic Hilbert space theory is needed
in Section C of Chapter 4, and basic topology is needed when dealing with the Martin
boundary in Chapter 7. Here it is, in principle, enough to understand the topology
of metric spaces.
One cannot claim that every chapter is on the same level. Some, specically at
the beginning, are more elementary, but the road is mostly uphill. I myself have
used different parts of the material that is included here in courses of different levels.
The writing of the Italian lecture notes, seen a posteriori, was a sort of warm-up
before my monograph "Random walks on infinite graphs and groups" [W2]. Markov
chain basics are treated in a rather condensed way there, and an understanding of
a good part of what is expanded here in detail is what I would hope a reader could
bring along for digesting that monograph.
I thank Donald I. Cartwright, Rudolf Grübel, Vadim A. Kaimanovich, Adam
Kinnison, Steve Lalley, Peter Mörters, Sebastian Müller, Marc Peigné, Ecaterina
Sava and Florian Sobieczky very warmly for proofreading, useful hints and some
additional material.
Graz, July 2009 Wolfgang Woess
Contents
Preface v
Introduction ix
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Raison d'être . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
1 Preliminaries and basic facts 1
A Preliminaries, examples . . . . . . . . . . . . . . . . . . . . . . . . 1
B Axiomatic definition of a Markov chain . . . . . . . . . . . . . . . 5
C Transition probabilities in n steps . . . . . . . . . . . . . . . . . . . 12
D Generating functions of transition probabilities . . . . . . . . . . . 17
2 Irreducible classes 28
A Irreducible and essential classes . . . . . . . . . . . . . . . . . . . 28
B The period of an irreducible class . . . . . . . . . . . . . . . . . . . 35
C The spectral radius of an irreducible class . . . . . . . . . . . . . . 39
3 Recurrence and transience, convergence, and the ergodic theorem 43
A Recurrent classes . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
B Return times, positive recurrence, and stationary probability measures 47
C The convergence theorem for finite Markov chains . . . . . . . . . 52
D The Perron–Frobenius theorem . . . . . . . . . . . . . . . . . . . . 57
E The convergence theorem for positive recurrent Markov chains . . . 63
F The ergodic theorem for positive recurrent Markov chains . . . . . . 68
G ρ-recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4 Reversible Markov chains 78
A The network model . . . . . . . . . . . . . . . . . . . . . . . . . . 78
B Speed of convergence of finite reversible Markov chains . . . . . . 83
C The Poincaré inequality . . . . . . . . . . . . . . . . . . . . . . . . 93
D Recurrence of infinite networks . . . . . . . . . . . . . . . . . . . . 102
E Random walks on integer lattices . . . . . . . . . . . . . . . . . . . 109
5 Models of population evolution 116
A Birth-and-death Markov chains . . . . . . . . . . . . . . . . . . . . 116
B The Galton–Watson process . . . . . . . . . . . . . . . . . . . . . 131
C Branching Markov chains . . . . . . . . . . . . . . . . . . . . . . . 140
6 Elements of the potential theory of transient Markov chains 153
A Motivation. The finite case . . . . . . . . . . . . . . . . . . . . . . 153
B Harmonic and superharmonic functions. Invariant and excessive
measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
C Induced Markov chains . . . . . . . . . . . . . . . . . . . . . . . . 164
D Potentials, Riesz decomposition, approximation . . . . . . . . . . . 169
E Balayage and domination principle . . . . . . . . . . . . . . . . . 173
7 The Martin boundary of transient Markov chains 179
A Minimal harmonic functions . . . . . . . . . . . . . . . . . . . . . 179
B The Martin compactication . . . . . . . . . . . . . . . . . . . . . 184
C Supermartingales, superharmonic functions, and excessive measures 191
D The Poisson–Martin integral representation theorem . . . . . . . . . 200
E Poisson boundary. Alternative approach to the integral representation 209
8 Minimal harmonic functions on Euclidean lattices 219
9 Nearest neighbour random walks on trees 226
A Basic facts and computations . . . . . . . . . . . . . . . . . . . . . 226
B The geometric boundary of an infinite tree . . . . . . . . . . . . . . 232
C Convergence to ends and identification of the Martin boundary . . . 237
D The integral representation of all harmonic functions . . . . . . . . 246
E Limits of harmonic functions at the boundary . . . . . . . . . . . . 251
F The boundary process, and the deviation from the limit geodesic . . 263
G Some recurrence/transience criteria . . . . . . . . . . . . . . . . . 267
H Rate of escape and spectral radius . . . . . . . . . . . . . . . . . . 279
Solutions of all exercises 297
Bibliography 339
A Textbooks and other general references . . . . . . . . . . . . . . . . 339
B Research-specic references . . . . . . . . . . . . . . . . . . . . . 341
List of symbols and notation 345
Index 349
Introduction
Summary
Chapter 1 starts with elementary examples (A), the first being the one that is
depicted on the cover of the book of Kemeny and Snell [K-S]. This is followed
by an informal description ("What is a Markov chain?", "The graph of a Markov
chain") and then (B) the axiomatic definition as well as the construction of the
trajectory space as the standard model for a probability space on which a Markov
chain can be defined. This quite immediate first impact of measure theory might be
skipped at first reading or when teaching at an elementary level. After that we are
back to basic transition probabilities and passage times (C). In the last section (D),
the first encounter with generating functions takes place, and their basic properties
are derived. There is also a short explanation of transition probabilities and the
associated generating functions in purely combinatorial terms of paths and their
weights.
Chapter 2 contains basic material regarding irreducible classes (A) and periodicity
(B), interwoven with examples. It ends with a brief section (C) on the spectral
radius, which is the inverse of the radius of convergence of the Green function (the
generating function of n-step transition probabilities).
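In formulas (anticipating the notation of Chapters 1 and 2, where $p^{(n)}(x,y)$ denotes the $n$-step transition probabilities):

```latex
G(x,y|z) \;=\; \sum_{n=0}^{\infty} p^{(n)}(x,y)\, z^n,
\qquad
\rho \;=\; \limsup_{n\to\infty} p^{(n)}(x,y)^{1/n},
```

so that $1/\rho$ is the radius of convergence of the power series $G(x,y|\cdot)$; within an irreducible class it does not depend on $x$ and $y$.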
Chapter 3 deals with recurrence vs. transience (A & B) and the fundamental
convergence theorem for positive recurrent chains (C & E). In the study of positive
recurrence and of the existence and uniqueness of stationary probability distributions
(B), a mild use of generating functions and de l'Hospital's rule, as the most difficult
tools, turns out to be quite efficient. The convergence theorem for positive
recurrent, aperiodic chains appears so important to me that I give two different
proofs. The first (C) applies primarily (but not only) to finite Markov chains and
uses Doeblin's condition and the associated contraction coefficient. This is pure
matrix analysis which leads to crucial probabilistic interpretations. In this context,
one can understand the convergence theorem for finite Markov chains as a special
case of the famous Perron–Frobenius theorem for non-negative matrices. Here
(D), I make an additional detour into matrix analysis by reversing this viewpoint:
the convergence theorem is considered as a main first step towards the proof of the
Perron–Frobenius theorem, which is then deduced. I do not claim that this proof
is overall shorter than the typical one that one finds in books such as the one by
Seneta [Se]; the main point is that I want to work out how one can proceed by
extending the lines of thought of the preceding section. What follows (E) is another,
elegant and much more probabilistic proof of the convergence theorem for
general positive recurrent, aperiodic Markov chains. It uses the coupling method,
see Lindvall [Li]. In the original Italian text, I had instead presented the proof
of the convergence theorem due to Erdős, Feller and Pollard [20], a
breathtaking piece of elementary analysis of sequences; see e.g. [Se, 5.2]. It is
certainly not obsolete, but I did not think I should include a third proof here
as well. The second important convergence theorem, namely the ergodic theorem for
Markov chains, is featured in F. The chapter ends with a short section (G) about
ρ-recurrence.
Chapter 4. The chapter (most of whose material is not contained in [W1]) starts
with the network interpretation of a reversible Markov chain (A). Then (B) the
interplay between the spectrum of the transition matrix and the speed of convergence
to equilibrium (= the stationary probability) for finite reversible chains is studied,
with some specific emphasis on the special case of symmetric random walks on
finite groups. This is followed by a very small introductory glimpse (C) at the
very impressive work on geometric eigenvalue bounds that has been promoted in
the last two decades via the work of Diaconis, Saloff-Coste and others; see [SC]
and the references therein, in particular the basic paper by Diaconis and Stroock
[15] on which the material here is based. Then I consider recurrence and transience
criteria for infinite reversible chains, featuring in particular the flow criterion (D).
Some very basic knowledge of Hilbert spaces is required here. While being close
to [W2, 2.B], the presentation is slightly different and slower. The last section
(E) is about recurrence and transience of random walks on integer lattices. Those
Markov chains are not always reversible, but I figured this was the best place to
include that material, since it starts by applying the flow criterion to symmetric
random walks. It should be clear that this is just a very small set of examples from
the huge world of random walks on lattices, where the classical source is Spitzer's
famous book [Sp]; see also (for example) Révész [R], Lawler [La] and Fayolle,
Malyshev and Menshikov [F-M-M], as well as, of course, the basic material in
Feller's books [F1], [F2].
Chapter 5 first deals with two specific classes of examples, starting with birth-
and-death chains on the non-negative integers or on a finite interval of integers (A).
The Markov chains are nearest neighbour random walks on the underlying graph,
which is a half-line or a line segment. Amongst other things, the link with analytic
continued fractions is explained. Then (B) the classical analysis of the Galton–
Watson process is presented. This also serves as a prelude to the next section (C),
which is devoted to an outline of some basic features of branching Markov chains
(BMCs). The latter combine Markov chains with the evolution of a population
according to a Galton–Watson process. BMCs themselves go beyond the theme of
this book, Markov chains. One of their nice properties is that certain probabilistic
quantities associated with a BMC are expressed in terms of the generating functions
of the underlying Markov chain. In particular, ρ-recurrence of the chain has such an
interpretation via criticality of an embedded Galton–Watson process. In view of my
insistence on the utility of generating functions, this is a very appealing propaganda
instrument regarding their probabilistic nature.
In the sections on the Galton–Watson process and BMCs, I pay some extra
attention to the rigorous construction of a probability space on which the processes
can be defined completely and with all their features; see my remarks about a
certain nonchalance regarding the existence of the "probabilistic heaven" further
below, which appear to be particularly appropriate here. (I do not claim that the
proposed model probability spaces are the only good ones.)
Of this material, only the part of A dealing with continued fractions was already
present in [W1].
Chapter 6 displays basic notions, terminology and results of potential theory in the
discrete context of transient Markov chains. The discrete Laplacian is P − I, where
P is the transition matrix and I the identity matrix. The starting point (A) is the
finite case, where we declare a part of the state space to be the boundary and its
complement to be the interior. We look for functions that have preassigned values
on the boundary and are harmonic in the interior. This discrete Dirichlet problem
is solved in probabilistic terms.
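The chapter solves this problem probabilistically; purely numerically, the same boundary value problem can be sketched as follows. This is a toy example, not from the book: simple random walk on {0, 1, ..., N} with boundary {0, N}, solved by plain relaxation (function name and scheme are mine).

```python
# Discrete Dirichlet problem on the segment {0, 1, ..., N} for simple
# random walk: find h with preassigned boundary values h(0) = 0 and
# h(N) = 1, and h harmonic in the interior, i.e. h(k) equals the
# average over the two neighbours.  Solved by Jacobi relaxation.

def dirichlet_on_segment(N, sweeps=5000):
    h = [0.0] * (N + 1)
    h[N] = 1.0                       # boundary values: h(0)=0, h(N)=1
    for _ in range(sweeps):
        new = h[:]
        for k in range(1, N):        # harmonicity in the interior
            new[k] = 0.5 * (h[k - 1] + h[k + 1])
        h = new
    return h

h = dirichlet_on_segment(10)
# h(k) converges to k/N: the probability of reaching N before 0 when
# starting at k (gambler's ruin), matching the probabilistic solution.
```

The probabilistic interpretation announced in the text is visible here: the harmonic extension of the boundary data coincides with the hitting distribution of the boundary.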
We then move on to the infinite, transient case and (in B) consider basic features
of harmonic and superharmonic functions and their duals in terms of measures on
the state space. Here, functions are thought of as column vectors on which the
transition matrix acts from the left, while measures are row vectors on which the
matrix acts from the right. In particular, transience is linked with the existence
of non-constant positive superharmonic functions. Then (C) induced Markov
chains and their interplay with superharmonic functions and excessive measures
are displayed, after which (D) classical results such as the Riesz decomposition
theorem and the approximation theorem for positive superharmonic functions are
proved. The chapter ends (E) with an explanation of balayage in terms of first
entrance and last exit probabilities, concluding with the domination principle for
superharmonic functions.
Chapter 7 is an attempt to give a careful exposition of Martin boundary theory
for transient Markov chains. I do not aim at the highest level of sophistication but
at the broadest level of comprehensibility. As a mild but natural restriction, only
irreducible chains are considered (i.e., all states communicate), but substochastic
transition matrices are admitted since this is needed anyway in some of the proofs.
The starting point (A) is the definition and first study of the extreme elements
in the convex cone of positive superharmonic functions, in particular, the minimal
harmonic functions. The construction/definition of the Martin boundary (B) is
preceded by a preamble on compactifications in general. This section concludes with
the statement of one of the two main theorems of that theory, namely convergence to
the boundary. Before the proof, martingale theory is needed (C), and we examine
the relation of supermartingales with superharmonic functions and, more subtly and
more importantly here, with excessive measures. Then (D) we derive the Poisson–Martin
integral representation of positive harmonic functions and show that it is unique
over the minimal boundary. Finally (E) we study the integral representation of
bounded harmonic functions (the Poisson boundary), its interpretation via terminal
random variables, and the probabilistic Fatou convergence theorem. At the
end, the alternative approach to the Poisson–Martin integral representation via the
approximation theorem is outlined.
Chapter 8 is very short and explains the rather algebraic procedure of finding all
minimal harmonic functions for random walks on integer grids.
Chapter 9, on the contrary, is the longest one and is dedicated to nearest neighbour
random walks on trees (mostly innite). Here we can harvest in a concrete class
of examples from the seed of methods and results of the preceding chapters. First
(A), the fundamental equations for first passage time generating functions on trees
are exhibited, and some basic methods for finite trees are outlined. Then we turn to
infinite trees and their boundary. The geometric boundary is described via the end
compactification (B), convergence to the boundary of transient random walks is
proved directly, and the Martin boundary is shown to coincide with the space of ends
(C). This is also the minimal boundary, and the limit distribution on the boundary
is computed. The structural simplicity of trees allows us to provide also an integral
representation of all harmonic functions, not only positive ones (D). Next (E) we
examine in detail the Dirichlet problem at infinity and the regular boundary points,
as well as a simple variant of the radial Fatou convergence theorem. A good part
of these first sections owes much to the seminal long paper by Cartier [Ca], but
one of the innovations is that many results do not require local finiteness of the
tree. There is a short intermezzo (F) about how a transient random walk on a tree
approaches its limiting boundary point. After that, we go back to transience/recurrence
and consider a few criteria that are specific to trees, with a special eye on
trees with finitely many cone types (G). Finally (H), we study in some detail two
intertwined subjects: rate of escape (i.e., variants of the law of large numbers for the
distance to the starting point) and spectral radius. Throughout the chapter, explicit
computations are carried out for various examples via different methods.
Examples are present throughout all chapters.
Exercises are not accumulated at the end of each section or chapter but built into
the text, of which they are considered an integral part. Quite often they are used in
the subsequent text and proofs. The imaginary ideal reader is one who solves those
exercises in real time while reading.
Solutions of all exercises are given after the last chapter.
The bibliography is subdivided into two parts, the rst containing textbooks and
other general references, which are recognizable by citations in letters. These are
also intended for further reading. The second part consists of research-specic
references, cited by numbers, and I do not pretend that these are complete. I tried
to make them reasonably complete as far as material is concerned that is relatively
recent, but going back in time, I rely more on the belief that what I'm using has
already reached a confirmed status of public knowledge.
Raison d'être
Why another book about Markov chains? As a matter of fact, there is a great
number and variety of textbooks on Markov chains on the market, and the older ones
have by no means lost their validity just because so many new ones have appeared
in the last decade. So rather than just praising in detail my own opus, let me display
an incomplete subset of the mentioned variety.
For me, the all-time classic is Chung's "Markov chains with stationary transition
probabilities" [Ch], along with Kemeny and Snell, "Finite Markov chains" [K-S],
whose first editions are both from 1960. My own learning of the subject, years
ago, owes most to "Denumerable Markov chains" by Kemeny, Snell and Knapp
[K-S-K], for which the title of this book is thought of as an expression of reverence
(without claiming to reach a comparable amplitude). Besides this, I hold Seneta's
"Non-negative matrices and Markov chains" [Se] (first edition from 1973) in very
high esteem, though of course a reader who is looking for stochastic adventures will
need previous motivation to appreciate the matrix theory view.
Among the older books, one definitely should not forget Freedman [Fr]; the
one by Isaacson and Madsen [I-M] has been very useful for preparing some of
my lectures (in particular on non-time-homogeneous chains, which are not featured
here), and Revuz' [Re] profound French-style treatment is an important source
permanently present on my shelf.
Coming back to the last 10–12 years, my personal favourites are the monograph
by Brémaud [Br], which displays a very broad range of topics with a permanent eye
on applications in all areas (this is the book that I suggest to young mathematicians
who want to use Markov chains in their future work), and in particular the very
nicely written textbook by Norris [No], which provides a delightful itinerary into
the world of stochastics for a probabilist-to-be. Quite recently, D. Stroock enriched
the selection of introductory texts on Markov processes by [St2], written in his
masterly style.
Other recent, maybe more focused texts are due to Behrends [Be] and Häggström
[H], as well as the St. Flour lecture notes by Saloff-Coste [SC]. All
this is complemented by the high-level exercise selection of Baldi, Mazliak and
Priouret [B-M-P].
In Italy, my lecture notes (the rst in Italian dedicated exclusively to this topic)
were followed by the densely written paperback by Pintacuda [Pi]. In this short
review, I have omitted most of the monographs about Markov chains on non-discrete
state spaces, such as Nummelin [Nu] or Hernández-Lerma and Lasserre [H-L]
(to name just two besides [Re]) as well as continuous-time processes.
So, in view of all this, this text indeed needs some additional reason for being.
It lies in the three subtitle topics (generating functions, boundary theory, random
walks on trees), which are featured with some extra emphasis among all the material.
Generating functions. Some decades ago, as an apprentice of mathematics, I learnt
from my PhD advisor Peter Gerl in Salzburg how useful it is to use generating
functions for analyzing random walks. Already a small amount of basic knowledge
about power series with non-negative coefficients, as taught in first or
second year calculus, can be used efficiently in the basic analysis of Markov chains,
such as irreducible classes, transience, null and positive recurrence, existence and
uniqueness of stationary measures, and so on. Beyond that, more subtle methods
from complex analysis can be used to derive refined asymptotics of transition
probabilities and other limit theorems. (See [53] for a partial overview.) However, in
most texts on Markov chains, generating functions play a marginal role or no role
at all. I have the impression that quite a few of today's probabilists consider
this too analytically-combinatorially flavoured. As a matter of fact, the three Italian
reviewers of [W1] criticised the use of generating functions as too heavy to
be introduced at such an early stage in those lecture notes. With all my students
throughout different courses on Markov chains and random walks, I never noticed
any such difficulties.
With humble admiration, I sympathise very much with the vibrant preface of
D. Stroock's masterpiece "Probability theory: an analytic view" [St1]: "I
have never been able to develop sufficient sensitivity to the distinction between
a proof and a probabilistic proof." So, confirming hereby that I'm not a
"dyed-in-the-wool probabilist", I'm stubborn enough to insist that the systematic use
of generating functions at an early stage of developing Markov chain basics is very
useful. This is one of the specific raisons d'être of this book. In any case, their use
here is very, very mild. My original intention was to include a whole chapter on the
application of tools from complex analysis to generating functions associated with
Markov chains, but as the material grew under my hands, this had to be abandoned
in order to limit the size of the book. The masters of these methods come from
analytic combinatorics; see the very comprehensive monograph by Flajolet and
Sedgewick [F-S].
Boundary theory and elements of discrete potential theory. These topics are
elaborated at a high level of sophistication by Kemeny, Snell and Knapp [K-S-K]
and Revuz [Re], besides the literature from the 1960s and 70s in the spirit of
abstract potential theory. While [K-S-K] gives a very complete account, it is not
at all easy reading. My aim here is to give an introduction to the language and
basics of the potential theory of (transient) denumerable Markov chains, and, in
particular, a rather complete picture of the associated topological boundary theory
that may be accessible to good students as well as to interested colleagues coming
from other fields of mathematics. As a matter of fact, even advanced non-experts
have tended to mix up the concepts of Poisson and Martin boundaries, as well
as the Dirichlet problem at infinity (whose solution with respect to some geometric
boundary does not imply that one has identified the Martin boundary, as one
sometimes finds stated). In the exposition of this material, my most important source
was a rather old one, which still is, in my opinion, the most readable presentation of
Martin boundary theory of Markov chains: the expository article by Dynkin [Dy]
from 1969.
Potential and boundary theory is a point of encounter between probability and
analysis. While classical potential theory was already well established when its
intrinsic connection with Brownian motion was revealed, the probabilistic theory
of denumerable Markov chains and the associated potential theory were developed
hand in hand by the same protagonists: to their mutual benet, the two sides
were never really separated. This is worth mentioning, because there are not only
probabilists but also analysts who distinguish between a proof and a "probabilistic
proof", in a different spirit however, which may suggest that if an analytic result
(such as the solution of the Dirichlet problem at infinity) is deduced by probabilistic
reasoning, then that result is true only almost surely before an analytic proof has
been found.
What is not included here is the potential and boundary theory of recurrent
chains. The former plays a prominent role mainly in relation with random walks
on two-dimensional grids, and Spitzer's classic [Sp] is still a prominent source on
this; I also like to look up some of those things in Lawler [La]. Also, not much
is included here about the ℓ²-potential theory associated with reversible Markov
chains (networks); the reader can consult the delightful little book by Doyle and
Snell [D-S] and the lecture notes volume by Soardi [So].
Nearest neighbour random walk on trees is the third item in the subtitle. Trees
provide an excellent playground for working out the potential and boundary theory
associated with Markov chains. Although the relation with the classical theory is
not touched here, the analogy with potential theory and Brownian motion on the
open unit disk, or rather on the hyperbolic plane, is striking and obvious. The
combinatorial structure of trees is simple enough to allow a presentation of a selection
of methods and results which are well accessible to a sufficiently ambitious beginner.
The resulting, rather long final chapter takes up and elaborates upon various topics
from the preceding chapters. It can serve as a link with [W2], where not as much
space has been dedicated to this specific theme, and, in particular, the basics are not
developed as broadly as here.
In order to avoid the impact of additional structure-theoretic subtleties, I insist
on dealing only with nearest neighbour random walks. Also, this chapter is certainly
far from being comprehensive. Nevertheless, I think that a good part of this material
appears here in book form for the first time. There are also a few new results and/or
proofs.
Additional material can be found in [W2], and also in the ever forthcoming,
quite differently flavoured, wonderful book by Lyons with Peres [L-P].
At last, I want to say a few words about the role of measure theory. If one
wants to avoid measure theory, and in particular the extension machinery in the
construction of the trajectory space of a Markov chain, then one can carry out a
good amount of the theory by considering the Markov chain in a finite time interval
$\{0, \dots, n\}$. The trajectory space is then countable and the underlying probability
measure is atomic. For deriving limit theorems, one may first consider that time
interval and then let $n \to \infty$. In this spirit, one can use a rather large part of the
initial material in this book for teaching Markov chains at an elementary level, and
I have done so on various occasions.
However, it is my opinion that it has been a great achievement that probability has
been put on the solid theoretical fundament of measure theory, and that students of
mathematics (as well as physics) should be exposed to that theoretical fundament,
as opposed to fake attempts to make their curricula more "soft" or "applied" by
giving up an important part of the mathematical edifice.
Furthermore, advanced probabilists are quite often and with very good reason
somewhat nonchalant when referring to the spaces on which their random processes
are defined. The attitude often becomes one where we are confident that there always
is some big probability space somewhere up in the clouds, a kind of "probabilistic
heaven", on which all the random variables and processes that we are working with
are defined and comply with all the properties that we postulate, but we do not
always care to see what makes it sure that this probabilistic heaven is solid. Apart
from the suspicion that this attitude may be one of the causes of the vague distrust
of some analysts to which I alluded above, this is fine with me. But I believe this
should not be a guideline of the education of master or PhD students; they should
first see how to set up the edifice rigorously before passing to nonchalance that is
based on firm knowledge.
What is not contained about Markov chains is of course much more than what is
contained in this book. I could have easily doubled its size, thereby also changing its
scope and intentions. I already mentioned recurrent potential and boundary theory;
there is a lot more that one could have said about recurrence and transience, one
could have included more details about geometric eigenvalue bounds, the Galton–
Watson process, and so on. I have not included any hint at continuous-time Markov
processes, and there is no random environment, in spite of the fact that this is
currently very much en vogue and may have a much more probabilistic taste than
Markov chains that evolve on a deterministic space. (Again, I'm stubborn enough
to believe that there is a lot of interesting things to do and to say about the situation
where randomness is restricted to the transition probabilities themselves.) So, as I
also said elsewhere, I'm sure that every reader will be able to single out her or his
favourite among those topics that are not included here. In any case, I do hope that
the selected material and presentation may provide some stimulus and usefulness.
Chapter 1
Preliminaries and basic facts
A Preliminaries, examples
The following introductory example is taken from the classical book by Kemeny
and Snell [K-S], where it is called "the weather in the land of Oz".
1.1 Example (The weather in Salzburg). [The author studied and worked in the
beautiful city of Salzburg from 1979 to 1981. He hopes that Salzburg tourism
authorities won't take offense from the following over-simplified meteorological
model.] Italian tourists are very fond of Salzburg, the "Rome of the North". Arriving
there, they discover rapidly that the weather is not as stable as in the South. There
are never two consecutive days of bright weather. If one day is bright, then the
next day it rains or snows with equal probability. A rainy or snowy day is followed
with equal probability by a day with the same weather or by a change; in case of a
change of the weather, it improves only in one half of the cases.
Let us denote the three possible states of the weather in Salzburg by b (bright),
r (rainy) and s (snowy). The following table and figure illustrate the situation.
\[
P = \begin{pmatrix}
0 & 1/2 & 1/2 \\
1/4 & 1/2 & 1/4 \\
1/4 & 1/4 & 1/2
\end{pmatrix}
\]
[Figure 1: the transition graph of the weather chain, with the three states and the
edges labelled by the transition probabilities 1/2 and 1/4.]
The table (matrix) tells us, for example, in the first row and second column that
after a bright day comes a rainy day with probability 1/2, or in the third row and
first column that snowy weather is followed by a bright day with probability 1/4.
Questions:
a) It rains today. What is the probability that two weeks from now the weather
will be bright?
b) How many rainy days do we expect (on the average) during the next month?
c) What is the mean duration of a bad weather period?
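Questions a) and b) can be answered numerically from powers of the transition matrix. The following sketch (plain Python, no external libraries; the state ordering b, r, s, the matrix entries, and the 30-day reading of "the next month" are taken from the example above, the helper names are my own) computes the 14-day forecast and the expected number of rainy days during the next month, starting from a rainy day.

```python
# Transition matrix of the Salzburg weather chain, states ordered (bright, rainy, snowy).
P = [[0.0, 0.5, 0.5],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, n):
    """n-th power of a square matrix (n >= 0), starting from the identity."""
    result = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        result = mat_mul(result, A)
    return result

BRIGHT, RAINY = 0, 1

# a) probability that it is bright 14 days after a rainy day:
p14 = mat_pow(P, 14)[RAINY][BRIGHT]

# b) expected number of rainy days among the next 30 days:
expected_rainy = sum(mat_pow(P, n)[RAINY][RAINY] for n in range(1, 31))
```

That the $n$-step probabilities are exactly the entries of $P^n$ is proved later, in Lemma 1.21. The chain forgets its starting state quickly (the second-largest eigenvalue modulus is 1/4), so `p14` is extremely close to the stationary probability 1/5 of a bright day.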
1.2 Example (The drunkard's walk [folklore, also under the name of "gambler's
ruin"]). A drunkard wants to return home from a pub. The pub is situated on a
straight road. On the left, after 100 steps, it ends at a lake, while the drunkard's
home is on the right at a distance of 200 steps from the pub, see Figure 2. In each
step, the drunkard walks towards his house (with probability 2/3) or towards the
lake (with probability 1/3). If he reaches the lake, he drowns. If he returns home,
he stays and goes to sleep.
[Figure 2: the straight road, with the lake on the left, the pub, and the drunkard's
home on the right.]
Questions:
a) What is the probability that the drunkard will drown? What is the probability
that he will return home?
b) Supposing that he manages to return home, how much time (= how many
steps) does it take him on the average?
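Question a) has a classical closed-form answer: for a nearest neighbour walk on $\{0, 1, \dots, N\}$ with up-probability $p$ and down-probability $q = 1 - p$, started at $k$, the probability of reaching $N$ before $0$ is $\bigl(1 - (q/p)^k\bigr)/\bigl(1 - (q/p)^N\bigr)$ when $p \ne q$, and $k/N$ when $p = q$. A sketch (plain Python; the positions 100 and 300 encode the pub and the home of the example, with the lake at 0; the function name is my own):

```python
def ruin_probabilities(k, N, p):
    """Probability of hitting N (home) before 0 (lake), starting at k,
    for a nearest-neighbour walk that steps towards N with probability p."""
    q = 1.0 - p
    if p == q:
        win = k / N          # symmetric case: linear in the starting point
    else:
        r = q / p
        win = (1.0 - r**k) / (1.0 - r**N)
    return win, 1.0 - win    # (reach home, drown)

# Pub at distance 100 from the lake, home at distance 300, p = 2/3.
home, drown = ruin_probabilities(k=100, N=300, p=2/3)
```

With these numbers the drowning probability is about $2^{-100}$: the drift towards home makes drowning astronomically unlikely (so small, in fact, that double-precision arithmetic rounds it to 0). The expected durations of question b) satisfy a similar second-order linear recursion, though the conditioning on reaching home needs some extra care.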
1.3 Example (P. Gerl). A cat climbs a tree (Figure 3).
[Figure 3: a tree, with the root on the ground, ramification points, and leaves at
the top.]
At each ramification point it decides by chance, typically with equal probability
among all possibilities, to climb back down to the previous ramification point or to
advance to one of the neighbouring higher points. If the cat arrives at the top (at
a leaf) then at the next step it will return to the preceding ramification point and
continue as before.
Questions:
a) What is the probability that the cat will ever return to the ground? What is
the probability that it will return at the n-th step?
b) How much time does it take on the average to return?
c) What is the probability that the cat will return to the ground before visiting
any (or a specific given) leaf of the tree?
d) How often will the cat visit, on the average, a given leaf, before returning
to the ground?
1.4 Example. Same as Example 1.3, but on another planet, where trees have infinite
height (or an infinite number of branchings).
Further examples will be given later on.
What is a Markov chain?
We need the following ingredients.
(1) A state space $X$, finite or countably infinite (with elements $u, v, w, x, y$, etc.,
or other notation which is suitable in the respective context). In Example 1.1,
$X = \{b, r, s\}$, the three possible states of the weather. In Example 1.2,
$X = \{0, 1, 2, \dots, 300\}$, all possible distances (in steps) from the lake. In
Examples 1.3 and 1.4, $X$ is the set of all nodes (vertices) of the tree: the root,
the ramification points, and the leaves.
(2) A matrix (table) of one-step transition probabilities
\[ P = \bigl(p(x, y)\bigr)_{x, y \in X}. \]
Our random process consists of performing steps in the state space from
one point to the next one, and so on. The steps are random, that is, subject to
a probability law. The latter is described by the transition matrix $P$: if at
some instant, we are at some state (point) $x \in X$, the number $p(x, y)$ is
the probability that the next step will take us to $y$, independently of how we
arrived at $x$. Hence, we must have
\[ p(x, y) \ge 0 \quad\text{and}\quad \sum_{y \in X} p(x, y) = 1 \quad\text{for all } x \in X. \]
In other words, $P$ is a stochastic matrix.
(3) An initial distribution $\nu$. This is a probability measure on $X$, and $\nu(x)$ is the
probability that the random process starts at $x$.
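The defining conditions of a stochastic matrix are easy to check mechanically. A minimal sketch (plain Python; the dictionary-of-dictionaries representation of $P$, the function name, and the letters b, r, s for the three weather states of Example 1.1 are convenient choices of mine, not notation from the text):

```python
def is_stochastic(P, tol=1e-12):
    """Check that P, given as {state: {state: probability}}, is a stochastic
    matrix: all entries non-negative, and every row sums to 1."""
    return all(
        all(p >= 0 for p in row.values()) and abs(sum(row.values()) - 1.0) < tol
        for row in P.values()
    )

# The weather chain of Example 1.1:
P = {
    "b": {"r": 0.5, "s": 0.5},
    "r": {"b": 0.25, "r": 0.5, "s": 0.25},
    "s": {"b": 0.25, "r": 0.25, "s": 0.5},
}
```

Omitting the zero entries, as the dictionary representation does, is the natural encoding when $X$ is large but each state has few successors.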
Time is discrete, the steps are labelled by $\mathbb{N}_0$, the set of non-negative integers.
At time 0 we start at a point $u \in X$. One after the other, we perform random steps:
the position at time $n$ is random and denoted by $Z_n$. Thus, $Z_n$, $n = 0, 1, \dots$, is
a sequence of $X$-valued random variables, called a Markov chain. Denote by $\mathrm{Pr}_u$
the probability of events concerning the Markov chain starting at $u$. We hence have
\[ \mathrm{Pr}_u[Z_{n+1} = y \mid Z_n = x] = p(x, y), \quad n \in \mathbb{N}_0 \]
(provided $\mathrm{Pr}_u[Z_n = x] > 0$, since the definition $\mathrm{Pr}(A \mid B) = \mathrm{Pr}(A \cap B)/\mathrm{Pr}(B)$
of conditional probability requires that the condition $B$ has positive probability).
In particular, the step which is performed at time $n$ depends only on the current
position (state), and not on the past history of how that position was reached, nor
on the specific instant $n$: if $\mathrm{Pr}_u[Z_n = x,\ Z_{n-1} = x_{n-1}, \dots, Z_1 = x_1] > 0$ and
$\mathrm{Pr}_u[Z_n = x] > 0$ then
\[ \mathrm{Pr}_u[Z_{n+1} = y \mid Z_n = x,\ Z_{n-1} = x_{n-1}, \dots, Z_1 = x_1]
= \mathrm{Pr}_u[Z_{n+1} = y \mid Z_n = x] = p(x, y). \tag{1.5} \]
The graph of a Markov chain
A graph $\Gamma$ consists of a denumerable, finite or infinite vertex set $V(\Gamma)$ and an edge
set $E(\Gamma) \subseteq V(\Gamma) \times V(\Gamma)$; the edges are oriented: the edge $[x, y]$ goes from $x$ to $y$.
(Graph theorists would use the term "digraph", reserving "graph" to the situation
where edges are non-oriented.) We also admit loops, that is, edges of the form
$[x, x]$. If $[x, y] \in E(\Gamma)$, we shall also write $x \xrightarrow{P} y$.
1.6 Definition. Let $Z_n$, $n = 0, 1, \dots$, be a Markov chain with state space $X$ and
transition matrix $P = \bigl(p(x, y)\bigr)_{x, y \in X}$. The vertex set of the graph $\Gamma = \Gamma(P)$
of the Markov chain is $V(\Gamma) = X$, and $[x, y]$ is an edge in $E(\Gamma)$ if and only if
$p(x, y) > 0$.
We can also associate weights to the oriented edges: the edge $[x, y]$ is weighted
with $p(x, y)$. Seen in this way, a Markov chain is a denumerable, oriented graph
with weighted edges. The weights are positive and satisfy
\[ \sum_{y \,:\, x \xrightarrow{P} y} p(x, y) = 1 \quad\text{for all } x \in V(\Gamma). \]
As a matter of fact, in Examples 1.1–1.3, we have already used those graphs for
illustrating the respective Markov chains. We have followed the habit of drawing
one non-oriented edge instead of a pair of oppositely oriented edges with the same
endpoints.
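Definition 1.6 translates directly into code: the edge set of $\Gamma(P)$ is just the set of ordered pairs with positive transition probability, each carrying its weight. A sketch (plain Python; the dictionary representation of $P$ and the function name are my own choices, and the small four-state chain is a toy version of the drunkard's walk with absorbing endpoints):

```python
def graph_of_chain(P):
    """Weighted edge set E(Gamma) of the Markov chain: all oriented edges
    [x, y] with p(x, y) > 0. P is given as {state: {state: probability}}."""
    return {(x, y): p for x, row in P.items() for y, p in row.items() if p > 0}

# Drunkard's walk on {0, 1, 2, 3}, absorbed at the endpoints 0 and 3:
P = {0: {0: 1.0},
     1: {0: 1/3, 2: 2/3},
     2: {1: 1/3, 3: 2/3},
     3: {3: 1.0}}
edges = graph_of_chain(P)
```

Note that the absorbing states produce the loops $[0, 0]$ and $[3, 3]$, which the definition explicitly admits, and that the outgoing weights at each vertex sum to 1, as required.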
B Axiomatic definition of a Markov chain
In order to give a precise definition of a Markov chain in the language of probability
theory, we start with a probability space¹ $(\Omega^*, \mathcal{A}^*, \mathrm{Pr}^*)$ and a denumerable set $X$,
the state space. With the latter, we implicitly associate the $\sigma$-algebra of all subsets
of $X$.
1.7 Definition. A Markov chain is a sequence $Z^*_n$, $n = 0, 1, 2, \dots$, of random
variables (measurable functions) $Z^*_n \colon \Omega^* \to X$ with the following properties.
(i) Markov property. For all elements $x_0, x_1, \dots, x_n$ and $x_{n+1} \in X$ which
satisfy $\mathrm{Pr}^*[Z^*_n = x_n,\ Z^*_{n-1} = x_{n-1}, \dots, Z^*_0 = x_0] > 0$, one has
\[ \mathrm{Pr}^*[Z^*_{n+1} = x_{n+1} \mid Z^*_n = x_n,\ Z^*_{n-1} = x_{n-1}, \dots, Z^*_0 = x_0]
= \mathrm{Pr}^*[Z^*_{n+1} = x_{n+1} \mid Z^*_n = x_n]. \]
(ii) Time homogeneity. For all elements $x, y \in X$ and $m, n \in \mathbb{N}_0$ which satisfy
$\mathrm{Pr}^*[Z^*_m = x] > 0$ and $\mathrm{Pr}^*[Z^*_n = x] > 0$, one has
\[ \mathrm{Pr}^*[Z^*_{m+1} = y \mid Z^*_m = x] = \mathrm{Pr}^*[Z^*_{n+1} = y \mid Z^*_n = x]. \]
Here, $[Z^*_n = x]$ is the set (event) $\{\omega^* \in \Omega^* \mid Z^*_n(\omega^*) = x\} \in \mathcal{A}^*$, and
$\mathrm{Pr}^*[\,\cdot \mid Z^*_n = x]$ is probability conditioned by that event, and so on. In the sequel,
the notation [logic expression] for an event will always refer to the set of all elements
in the underlying probability space for which the logic expression is true.
If we write
\[ p(x, y) = \mathrm{Pr}^*[Z^*_{n+1} = y \mid Z^*_n = x] \]
(which is independent of $n$ as long as $\mathrm{Pr}^*[Z^*_n = x] > 0$), we obtain the transi-
tion matrix $P = \bigl(p(x, y)\bigr)_{x, y \in X}$ of $(Z^*_n)$. The initial distribution of $(Z^*_n)$ is the
probability measure $\nu$ on $X$ defined by
\[ \nu(x) = \mathrm{Pr}^*[Z^*_0 = x]. \]
(We shall always write $\nu(x)$ instead of $\nu(\{x\})$, and $\nu(A) = \sum_{x \in A} \nu(x)$.) The initial
distribution represents an initial experiment, as in a board game where the initial
position of a figure is chosen by throwing a dice. In case $\nu = \delta_u$, the point mass at
$u \in X$, we say that the Markov chain starts at $u$.
More generally, one speaks of a Markov chain when just the Markov property (i)
holds. Its fundamental significance is the absence of memory: the future (time
$n + 1$) depends only on the present (time $n$) and not on the past (the time instants
¹That is, a set $\Omega^*$ with a $\sigma$-algebra $\mathcal{A}^*$ of subsets of $\Omega^*$ and a $\sigma$-additive probability measure $\mathrm{Pr}^*$ on $\mathcal{A}^*$. The
$*$ superscript is used because later on, we shall usually reserve the notation $(\Omega, \mathcal{A}, \mathrm{Pr})$ for
a specific probability space.
$0, \dots, n - 1$). If one does not have property (ii), time homogeneity, this means
that $p_{n+1}(x, y) = \mathrm{Pr}^*[Z^*_{n+1} = y \mid Z^*_n = x]$ depends also on the instant $n$
besides $x$ and $y$. In this case, the Markov chain is governed by a sequence $P_n$
($n \ge 1$) of stochastic matrices. In the present text, we shall limit ourselves to
the study of time-homogeneous chains. We observe, though, that also non-time-
homogeneous Markov chains are of considerable interest in various contexts, see
e.g. the corresponding chapters in the books by Isaacson and Madsen [I-M],
Seneta [Se] and Brémaud [Br].
In the introductory paragraph, we spoke about Markov chains starting directly
with the state space $X$, the transition matrix $P$ and the initial point $u \in X$, and the
sequence of random variables $(Z_n)$ was introduced heuristically. Thus, we now
pose the question whether, with the ingredients $X$ and $P$ and a starting point $u \in X$
(or more generally, an initial distribution $\nu$ on $X$), one can always find a probability
space on which the random position after $n$ steps can be described as the $n$-th random
variable of a Markov chain in the sense of the axiomatic Definition 1.7. This is
indeed possible: that probability space, called the trajectory space, is constructed
in the following, natural way.
We set
\[ \Omega = X^{\mathbb{N}_0} = \{\omega = (x_0, x_1, x_2, \dots) \mid x_n \in X \text{ for all } n \ge 0\}. \tag{1.8} \]
An element $\omega = (x_0, x_1, x_2, \dots)$ represents a possible evolution (trajectory), that
is, a possible sequence of points visited one after the other by the Markov chain.
The probability that this single $\omega$ will indeed be the actual evolution should be the
infinite product $\nu(x_0)\, p(x_0, x_1)\, p(x_1, x_2) \cdots$, which however will usually converge
to 0: as in the case of Lebesgue measure, it is in general not possible to construct
the probability measure by assigning values to single elements (atoms) of $\Omega$.
Let $a_0, a_1, \dots, a_k \in X$. The cylinder with base $a = (a_0, a_1, \dots, a_k)$ is the set
\[ C(a) = C(a_0, a_1, \dots, a_k)
= \{\omega = (x_0, x_1, x_2, \dots) \in \Omega : x_i = a_i,\ i = 0, \dots, k\}. \tag{1.9} \]
This is the set of all possible ways how the evolution of the Markov chain may
continue after the initial steps through $a_0, a_1, \dots, a_k$. With this set we associate
the probability
\[ \mathrm{Pr}_\nu\bigl(C(a_0, a_1, \dots, a_k)\bigr)
= \nu(a_0)\, p(a_0, a_1)\, p(a_1, a_2) \cdots p(a_{k-1}, a_k). \tag{1.10} \]
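Formula (1.10) is a finite product, so cylinder probabilities are directly computable. A sketch (plain Python; $\nu$ and $P$ as dictionaries, with the letters b, r, s again standing in for the three weather states of Example 1.1, and the function name my own):

```python
def cylinder_probability(nu, P, a):
    """Pr_nu of the cylinder C(a_0, ..., a_k), by formula (1.10):
    nu(a_0) * p(a_0, a_1) * ... * p(a_{k-1}, a_k)."""
    prob = nu.get(a[0], 0.0)
    for x, y in zip(a, a[1:]):          # consecutive pairs (a_{i-1}, a_i)
        prob *= P.get(x, {}).get(y, 0.0)
    return prob

P = {"b": {"r": 0.5, "s": 0.5},
     "r": {"b": 0.25, "r": 0.5, "s": 0.25},
     "s": {"b": 0.25, "r": 0.25, "s": 0.5}}
nu = {"r": 1.0}                         # point mass: the chain starts at a rainy day
p = cylinder_probability(nu, P, ["r", "b", "s"])   # 1 * 1/4 * 1/2
```

Missing dictionary entries count as probability 0, which matches the convention that $p(x, y) = 0$ for edges that are not present.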
We also consider $\Omega$ as a cylinder (with empty base) with $\mathrm{Pr}_\nu(\Omega) = 1$.
For $n \in \mathbb{N}_0$, let $\mathcal{A}_n$ be the $\sigma$-algebra generated by the collection of all cylinders
$C(a)$ with $a \in X^{k+1}$, $k \le n$. Thus, $\mathcal{A}_n$ is in one-to-one correspondence with the
collection of all subsets of $X^{n+1}$. Furthermore, $\mathcal{A}_n \subseteq \mathcal{A}_{n+1}$, and
\[ \mathcal{F} = \bigcup_n \mathcal{A}_n \]
is an algebra of subsets of $\Omega$: it contains $\Omega$ and is closed with respect to taking
complements and finite unions. We denote by $\mathcal{A}$ the $\sigma$-algebra generated by $\mathcal{F}$.
Finally, we define the projections $Z_n$, $n \ge 0$: for $\omega = (x_0, x_1, x_2, \dots) \in \Omega$,
\[ Z_n(\omega) = x_n. \tag{1.11} \]
1.12 Theorem. (a) The measure $\mathrm{Pr}_\nu$ has a unique extension to a probability measure
on $\mathcal{A}$, also denoted $\mathrm{Pr}_\nu$.
(b) On the probability space $(\Omega, \mathcal{A}, \mathrm{Pr}_\nu)$, the projections $Z_n$, $n = 0, 1, 2, \dots$,
define a Markov chain with state space $X$, initial distribution $\nu$ and transition
matrix $P$.
Proof. By (1.10), $\mathrm{Pr}_\nu$ is defined on cylinder sets. Let $A \in \mathcal{A}_n$. We can write $A$ as
a finite disjoint union of cylinders of the form $C(a_0, a_1, \dots, a_n)$. Thus, we can use
(1.10) to define a $\sigma$-additive probability measure $\nu_n$ on $\mathcal{A}_n$ by
\[ \nu_n(A) = \sum_{a \in X^{n+1} :\, C(a) \subseteq A} \mathrm{Pr}_\nu\bigl(C(a)\bigr). \]
We prove that $\nu_n$ coincides with $\mathrm{Pr}_\nu$ on the cylinder sets, that is,
\[ \nu_n\bigl(C(a_0, a_1, \dots, a_k)\bigr) = \mathrm{Pr}_\nu\bigl(C(a_0, a_1, \dots, a_k)\bigr), \quad\text{if } k \le n. \tag{1.13} \]
We proceed by reversed induction on $k$. By (1.10), the identity (1.13) is valid for
$k = n$. Now assume that for some $k$ with $0 < k \le n$, the identity (1.13) is valid for
every cylinder $C(a_0, a_1, \dots, a_k)$. Consider a cylinder $C(a_0, a_1, \dots, a_{k-1})$. Then
\[ C(a_0, a_1, \dots, a_{k-1}) = \bigcup_{a_k \in X} C(a_0, a_1, \dots, a_k), \]
a disjoint union. By the induction hypothesis,
\[ \nu_n\bigl(C(a_0, a_1, \dots, a_{k-1})\bigr) = \sum_{a_k \in X} \mathrm{Pr}_\nu\bigl(C(a_0, a_1, \dots, a_k)\bigr). \]
But from the definition of $\mathrm{Pr}_\nu$ and stochasticity of the matrix $P$, we get
\[ \sum_{a_k \in X} \mathrm{Pr}_\nu\bigl(C(a_0, a_1, \dots, a_k)\bigr)
= \sum_{a_k \in X} \mathrm{Pr}_\nu\bigl(C(a_0, a_1, \dots, a_{k-1})\bigr)\, p(a_{k-1}, a_k)
= \mathrm{Pr}_\nu\bigl(C(a_0, a_1, \dots, a_{k-1})\bigr), \]
which proves (1.13).
(1.13) tells us that for $n \ge k$ and $A \in \mathcal{A}_k \subseteq \mathcal{A}_n$, we have $\nu_n(A) = \nu_k(A)$.
Therefore, we can extend $\mathrm{Pr}_\nu$ from the collection of all cylinder sets to $\mathcal{F}$ by setting
\[ \mathrm{Pr}_\nu(A) = \nu_n(A), \quad\text{if } A \in \mathcal{A}_n. \]
We see that $\mathrm{Pr}_\nu$ is a finitely additive measure with total mass 1 on the algebra $\mathcal{F}$,
and it is $\sigma$-additive on each $\mathcal{A}_n$. In order to prove that $\mathrm{Pr}_\nu$ is $\sigma$-additive on $\mathcal{F}$, on
the basis of a standard theorem from measure theory (see e.g. Halmos [Hal]), it is
sufficient to verify continuity of $\mathrm{Pr}_\nu$ at $\emptyset$: if $(A_n)$ is a decreasing sequence of sets
in $\mathcal{F}$ with $\bigcap_n A_n = \emptyset$, then $\lim_n \mathrm{Pr}_\nu(A_n) = 0$.
Let us now suppose that $(A_n)$ is a decreasing sequence of sets in $\mathcal{F}$, and that
$\lim_n \mathrm{Pr}_\nu(A_n) > 0$. Since the $\sigma$-algebras $\mathcal{A}_n$ increase with $n$, there are numbers
$k(n)$ such that $A_n \in \mathcal{A}_{k(n)}$ and $0 \le k(n) < k(n+1)$. We claim that there is a
sequence of cylinders $C_n = C(a^n)$ with $a^n \in X^{k(n)+1}$ such that for every $n$
(i) $C_{n+1} \subseteq C_n$,
(ii) $C_n \subseteq A_n$, and
(iii) $\lim_m \mathrm{Pr}_\nu(A_m \cap C_n) > 0$.
Proof. The $\sigma$-additivity of $\mathrm{Pr}_\nu$ on $\mathcal{A}_{k(n)}$ implies
\[ 0 < \lim_n \mathrm{Pr}_\nu(A_n)
= \lim_n \sum_{a \in X^{k(0)+1}} \mathrm{Pr}_\nu\bigl(A_n \cap C(a)\bigr)
= \sum_{a \in X^{k(0)+1}} \lim_n \mathrm{Pr}_\nu\bigl(A_n \cap C(a)\bigr). \tag{1.14} \]
It is legitimate to exchange sum and limit in (1.14). Indeed,
\[ \mathrm{Pr}_\nu\bigl(A_n \cap C(a)\bigr) \le \mathrm{Pr}_\nu\bigl(A_0 \cap C(a)\bigr), \qquad
\sum_{a \in X^{k(0)+1}} \mathrm{Pr}_\nu\bigl(A_0 \cap C(a)\bigr) = \mathrm{Pr}_\nu(A_0) < \infty, \]
and considering the sum as a discrete integral with respect to the variable $a \in X^{k(0)+1}$,
Lebesgue's dominated convergence theorem justifies (1.14).
From (1.14) it follows that there is $a^0 \in X^{k(0)+1}$ such that for $C_0 = C(a^0)$ one
has
\[ \lim_n \mathrm{Pr}_\nu(A_n \cap C_0) > 0. \tag{1.15} \]
$A_0$ is a disjoint union of cylinders $C(b)$ with $b \in X^{k(0)+1}$. In particular, either
$C_0 \cap A_0 = \emptyset$ or $C_0 \subseteq A_0$. The first case is impossible, since $\mathrm{Pr}_\nu(A_0 \cap C_0) > 0$.
Therefore $C_0$ satisfies (ii) and (iii).
Now let us suppose to have already constructed $C_0, \dots, C_{r-1}$ such that (i), (ii)
and (iii) hold for all indices $n < r$. We define a sequence of sets
\[ A'_n = A_{n+r} \cap C_{r-1}, \quad n \ge 0. \]
By our hypotheses, the sequence $(A'_n)$ is decreasing, $\lim_n \mathrm{Pr}_\nu(A'_n) > 0$, and $A'_n \in
\mathcal{A}_{k'(n)}$, where $k'(n) = k(n + r)$. Hence, the reasoning that we have applied
above to the sequence $(A_n)$ can now be applied to $(A'_n)$, and we obtain a cylinder
$C_r = C(a^r)$ with $a^r \in X^{k'(0)+1} = X^{k(r)+1}$, such that
\[ \lim_n \mathrm{Pr}_\nu(A'_n \cap C_r) > 0 \quad\text{and}\quad C_r \subseteq A'_0. \]
From the definition of $A'_n$, we see that also
\[ \lim_n \mathrm{Pr}_\nu(A_n \cap C_r) > 0 \quad\text{and}\quad C_r \subseteq A_r \cap C_{r-1}. \]
Summarizing, (i), (ii) and (iii) are valid for $C_0, \dots, C_r$, and the existence of the
proposed sequence $(C_n)$ follows by induction.
Since $C_n = C(a^n)$, where $a^n \in X^{k(n)+1}$, it follows from (i) that the initial piece
of $a^{n+1}$ up to the index $k(n)$ must be $a^n$. Thus, there is a trajectory $\omega = (a_0, a_1, \dots)$
such that $a^n = (a_0, \dots, a_{k(n)})$ for each $n$. Consequently, $\omega \in C_n$ for each $n$, and
\[ \bigcap_n A_n \supseteq \bigcap_n C_n \ne \emptyset. \]
We have proved that $\mathrm{Pr}_\nu$ is continuous at $\emptyset$, and consequently $\sigma$-additive on $\mathcal{F}$. As
mentioned above, now one of the fundamental theorems of measure theory (see e.g.
[Hal, §13]) asserts that $\mathrm{Pr}_\nu$ extends in a unique way to a probability measure on the
$\sigma$-algebra $\mathcal{A}$ generated by $\mathcal{F}$: statement (a) of the theorem is proved.
For verifying (b), consider $x_0, x_1, \dots$, $x = x_n$, $y = x_{n+1} \in X$ such that
$\mathrm{Pr}_\nu[Z_0 = x_0, \dots, Z_n = x_n] > 0$. Then
\[ \mathrm{Pr}_\nu[Z_{n+1} = y \mid Z_n = x,\ Z_i = x_i \text{ for all } i < n]
= \frac{\mathrm{Pr}_\nu\bigl(C(x_0, \dots, x_{n+1})\bigr)}{\mathrm{Pr}_\nu\bigl(C(x_0, \dots, x_n)\bigr)}
= p(x, y). \tag{1.16} \]
On the other hand, consider the events $A = [Z_{n+1} = y]$ and $B = [Z_n = x]$ in $\mathcal{A}$.
We can write $B = \bigcup \{C(a) \mid a \in X^{n+1},\ a_n = x\}$. Using (1.16), we get
\[ \begin{aligned}
\mathrm{Pr}_\nu[Z_{n+1} = y \mid Z_n = x] &= \frac{\mathrm{Pr}_\nu(A \cap B)}{\mathrm{Pr}_\nu(B)}
= \sum_{\substack{a \in X^{n+1} :\, a_n = x,\\ \mathrm{Pr}_\nu(C(a)) > 0}}
\frac{\mathrm{Pr}_\nu\bigl(A \cap C(a)\bigr)}{\mathrm{Pr}_\nu(B)} \\
&= \sum_{\substack{a \in X^{n+1} :\, a_n = x,\\ \mathrm{Pr}_\nu(C(a)) > 0}}
\mathrm{Pr}_\nu\bigl(A \mid C(a)\bigr)\, \frac{\mathrm{Pr}_\nu\bigl(C(a)\bigr)}{\mathrm{Pr}_\nu(B)} \\
&= \sum_{\substack{a \in X^{n+1} :\, a_n = x,\\ \mathrm{Pr}_\nu(C(a)) > 0}}
p(x, y)\, \frac{\mathrm{Pr}_\nu\bigl(C(a)\bigr)}{\mathrm{Pr}_\nu(B)} = p(x, y).
\end{aligned} \]
Thus, $(Z_n)$ is a Markov chain on $X$ with transition matrix $P$. Finally, for $x \in X$,
\[ \mathrm{Pr}_\nu[Z_0 = x] = \mathrm{Pr}_\nu\bigl(C(x)\bigr) = \nu(x), \]
so that the distribution of $Z_0$ is $\nu$. $\square$
We observe that beginning with the point (1.13) regarding the measures $\nu_n$, the
last theorem can be deduced from Kolmogorov's theorem on the construction of
probability measures on infinite products of Polish spaces. As a matter of fact,
the proof given here is basically the one of Kolmogorov's theorem adapted to
our special case. For a general and advanced treatment of that theorem, see e.g.
Parthasarathy [Pa, Chapter V].
The following theorem tells us that (1) the generic probability space of the
axiomatic Definition 1.7 on which the Markov chain $(Z^*_n)$ is defined is always
larger (in a probabilistic sense) than the associated trajectory space, and that (2)
the trajectory space already contains all the information on the individual Markov
chain under consideration.
1.17 Theorem. Let $(\Omega^*, \mathcal{A}^*, \mathrm{Pr}^*)$ be a probability space and $(Z^*_n)$ a Markov chain
defined on $\Omega^*$, with state space $X$, initial distribution $\nu$ and transition matrix $P$.
Equip the trajectory space $(\Omega, \mathcal{A})$ with the probability measure $\mathrm{Pr}_\nu$ defined in (1.10)
and Theorem 1.12, and with the sequence of projections $(Z_n)$ of (1.11). Then the
function
\[ \tau \colon \Omega^* \to \Omega, \quad \omega^* \mapsto \omega = \bigl(Z^*_0(\omega^*), Z^*_1(\omega^*), Z^*_2(\omega^*), \dots\bigr) \]
is measurable, $Z^*_n(\omega^*) = Z_n\bigl(\tau(\omega^*)\bigr)$, and
\[ \mathrm{Pr}_\nu(A) = \mathrm{Pr}^*\bigl(\tau^{-1}(A)\bigr) \quad\text{for all } A \in \mathcal{A}. \]
Proof. $\mathcal{A}$ is the $\sigma$-algebra generated by all cylinders. By virtue of the extension
theorems of measure theory, for the proof it is sufficient to verify that $\tau^{-1}(C) \in \mathcal{A}^*$
and $\mathrm{Pr}_\nu(C) = \mathrm{Pr}^*\bigl(\tau^{-1}(C)\bigr)$ for each cylinder $C$. Thus, let $C = C(a_0, \dots, a_k)$ with
$a_i \in X$. Then
\[ \tau^{-1}(C) = \{\omega^* \in \Omega^* \mid Z^*_i(\omega^*) = a_i,\ i = 0, \dots, k\}
= \bigcap_{i=0}^k \{\omega^* \in \Omega^* \mid Z^*_i(\omega^*) = a_i\} \in \mathcal{A}^*, \]
and by the Markov property,
\[ \begin{aligned}
\mathrm{Pr}^*\bigl(\tau^{-1}(C)\bigr) &= \mathrm{Pr}^*[Z^*_k = a_k,\ Z^*_{k-1} = a_{k-1}, \dots, Z^*_0 = a_0] \\
&= \mathrm{Pr}^*[Z^*_0 = a_0] \prod_{i=1}^k \mathrm{Pr}^*[Z^*_i = a_i \mid Z^*_{i-1} = a_{i-1}, \dots, Z^*_0 = a_0] \\
&= \nu(a_0) \prod_{i=1}^k p(a_{i-1}, a_i) = \mathrm{Pr}_\nu(C). \qquad\square
\end{aligned} \]
From now on, given the state space $X$ and the transition matrix $P$, we shall
(almost) always consider the trajectory space $(\Omega, \mathcal{A})$ with the family of probability
measures $\mathrm{Pr}_\nu$, corresponding to all the initial distributions $\nu$ on $X$. Thus, our usual
model for the Markov chain on $X$ with initial distribution $\nu$ and transition matrix
$P$ will always be the sequence $(Z_n)$ of the projections defined on the trajectory
space $(\Omega, \mathcal{A}, \mathrm{Pr}_\nu)$. Typically, we shall have $\nu = \delta_u$, a point mass at $u \in X$. In
this case, we shall write $\mathrm{Pr}_\nu = \mathrm{Pr}_u$. Sometimes, we shall omit to specify the initial
distribution and call Markov chain the pair $(X, P)$.
1.18 Remarks. (a) The fact that on the trajectory space one has a whole family
of different probability measures that describe the same Markov chain, but with
different initial distributions, seems to create every now and then some confusion
at an initial level. This can be overcome by choosing and fixing one specific initial
distribution $\nu_0$ which is supported by the whole of $X$. If $X$ is finite, this may be
equidistribution. If $X$ is infinite, one can enumerate $X = \{x_k : k \in \mathbb{N}\}$ and set
$\nu_0(x_k) = 2^{-k}$. Then we can assign one global probability measure on $(\Omega, \mathcal{A})$
by $\mathrm{Pr} = \mathrm{Pr}_{\nu_0}$. With this choice, we get
\[ \mathrm{Pr}_u = \mathrm{Pr}[\,\cdot \mid Z_0 = u] \quad\text{and}\quad
\mathrm{Pr}_\nu = \sum_{u \in X} \mathrm{Pr}[\,\cdot \mid Z_0 = u]\, \nu(u), \]
if $\nu$ is another initial distribution.
(b) As mentioned in the Preface, if one wants to avoid measure theory, and in
particular the extension machinery of Theorem 1.12, then one can carry out most
of the theory by considering the Markov chain in a finite time interval $[0, n]$. The
trajectory space is then $X^{n+1}$ (starting with index 0), the $\sigma$-algebra consists of all
subsets of $X^{n+1}$ and is in one-to-one correspondence with $\mathcal{A}_n$, and the probability
measure $\mathrm{Pr}_\nu = \mathrm{Pr}^n_\nu$ is atomic with
\[ \mathrm{Pr}^n_\nu(a) = \nu(a_0)\, p(a_0, a_1) \cdots p(a_{n-1}, a_n), \quad\text{if } a = (a_0, \dots, a_n). \]
For deriving limit theorems, one may first consider that time interval and then let
$n \to \infty$.
C Transition probabilities in n steps
Let $(X, P)$ be a Markov chain. What is the probability to be at $y$ at the $n$-th step,
starting from $x$ ($n \ge 0$, $x, y \in X$)? We are interested in the number
\[ p^{(n)}(x, y) = \mathrm{Pr}_x[Z_n = y]. \]
Intuitively the following is already clear (and will be proved in a moment): if $\nu$ is
an initial distribution and $k \in \mathbb{N}_0$ is such that $\mathrm{Pr}_\nu[Z_k = x] > 0$, then
\[ \mathrm{Pr}_\nu[Z_{k+n} = y \mid Z_k = x] = p^{(n)}(x, y). \tag{1.19} \]
We have
\[ p^{(0)}(x, x) = 1, \quad p^{(0)}(x, y) = 0 \text{ if } x \ne y, \quad\text{and}\quad p^{(1)}(x, y) = p(x, y). \]
We can compute $p^{(n+1)}(x, y)$ starting from the values of $p^{(n)}(\cdot, \cdot)$ as follows, by
decomposing with respect to the first step: the first step goes from $x$ to some $w \in X$,
with probability $p(x, w)$. The remaining $n$ steps have to take us from $w$ to $y$, with
probability $p^{(n)}(w, y)$ not depending on the past history. We have to sum over all
possible points $w$:
\[ p^{(n+1)}(x, y) = \sum_{w \in X} p(x, w)\, p^{(n)}(w, y). \tag{1.20} \]
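The recursion (1.20) says that the $n$-step probabilities are obtained by repeated matrix multiplication. A sketch checking this on the weather chain (plain Python; the dictionary representation, the function names, and the state letters b, r, s are choices of mine; the code propagates the distribution of $Z_n$ forward one step at a time, which gives the same values as (1.20) since both amount to taking powers of $P$):

```python
def step(P, dist):
    """One step of the chain: from the distribution of Z_n (given as
    {state: probability}) to the distribution of Z_{n+1}."""
    out = {}
    for x, px in dist.items():
        for y, pxy in P[x].items():
            out[y] = out.get(y, 0.0) + px * pxy
    return out

def p_n(P, x, y, n):
    """p^(n)(x, y): start with the point mass at x and iterate n steps."""
    dist = {x: 1.0}
    for _ in range(n):
        dist = step(P, dist)
    return dist.get(y, 0.0)

P = {"b": {"r": 0.5, "s": 0.5},
     "r": {"b": 0.25, "r": 0.5, "s": 0.25},
     "s": {"b": 0.25, "r": 0.25, "s": 0.5}}
```

For instance, $p^{(2)}(b, b) = p(b, r)\,p(r, b) + p(b, s)\,p(s, b) = \tfrac12 \cdot \tfrac14 + \tfrac12 \cdot \tfrac14 = \tfrac14$, in accordance with the "no two consecutive bright days" rule followed by a weather improvement.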
Let us verify the formulas (1.19) and (1.20) rigorously. For $n = 0$ and $n = 1$,
they are true. Suppose that they both hold for $n \ge 1$, and that $\mathrm{Pr}_\nu[Z_k = x] > 0$.
Recall the following elementary rules for conditional probability. If $A, B, C \in \mathcal{A}$,
then
\[ \mathrm{Pr}_\nu(A \mid C) = \begin{cases}
\mathrm{Pr}_\nu(A \cap C)/\mathrm{Pr}_\nu(C), & \text{if } \mathrm{Pr}_\nu(C) > 0, \\
0, & \text{if } \mathrm{Pr}_\nu(C) = 0;
\end{cases} \]
\[ \mathrm{Pr}_\nu(A \cap B \mid C) = \mathrm{Pr}_\nu(B \mid C)\, \mathrm{Pr}_\nu(A \mid B \cap C); \]
\[ \mathrm{Pr}_\nu(A \cup B \mid C) = \mathrm{Pr}_\nu(A \mid C) + \mathrm{Pr}_\nu(B \mid C), \quad\text{if } A \cap B = \emptyset. \]
Hence we deduce via Definition 1.7 that
\[ \begin{aligned}
\mathrm{Pr}_\nu[Z_{k+n+1} = y \mid Z_k = x]
&= \mathrm{Pr}_\nu[\exists\, w \in X : (Z_{k+n+1} = y \text{ and } Z_{k+1} = w) \mid Z_k = x] \\
&= \sum_{w \in X} \mathrm{Pr}_\nu[Z_{k+n+1} = y \text{ and } Z_{k+1} = w \mid Z_k = x] \\
&= \sum_{w \in X} \mathrm{Pr}_\nu[Z_{k+1} = w \mid Z_k = x]\,
\mathrm{Pr}_\nu[Z_{k+n+1} = y \mid Z_{k+1} = w \text{ and } Z_k = x]
\end{aligned} \]
(note that $\mathrm{Pr}_\nu[Z_{k+1} = w \text{ and } Z_k = x] > 0$ if $\mathrm{Pr}_\nu[Z_{k+1} = w \mid Z_k = x] > 0$)
\[ = \sum_{w \in X} \mathrm{Pr}_\nu[Z_{k+1} = w \mid Z_k = x]\,
\mathrm{Pr}_\nu[Z_{k+n+1} = y \mid Z_{k+1} = w]
= \sum_{w \in X} p(x, w)\, p^{(n)}(w, y). \]
Since the last sum does not depend on $\nu$ or $k$, it must be equal to
\[ \mathrm{Pr}_x[Z_{n+1} = y] = p^{(n+1)}(x, y). \]
Therefore, we have the following.
1.21 Lemma. (a) The number $p^{(n)}(x, y)$ is the element at position $(x, y)$ in the
$n$-th power $P^n$ of the transition matrix.
(b) $p^{(m+n)}(x, y) = \sum_{w \in X} p^{(m)}(x, w)\, p^{(n)}(w, y)$.
(c) $P^n$ is a stochastic matrix.
Proof. (a) follows directly from the identity (1.20).
(b) follows from the identity $P^{m+n} = P^m P^n$ for the matrix powers of $P$.
(c) is best understood by the probabilistic interpretation: starting at $x \in X$, it is
certain that after $n$ steps one reaches some element of $X$, that is,
\[ 1 = \mathrm{Pr}_x[Z_n \in X] = \sum_{y \in X} \mathrm{Pr}_x[Z_n = y] = \sum_{y \in X} p^{(n)}(x, y). \qquad\square \]
1.22 Exercise. Let (Z_n) be a Markov chain on the state space X, and let 0 ≤ n_1 < n_2 < ⋯ < n_{k+1}. Show that for 0 ≤ m < n, x, y ∈ X and A ∈ 𝔄_m with Pr_ν(A) > 0,

if Z_m(ω) = x for all ω ∈ A, then Pr_ν[Z_n = y | A] = p^{(n−m)}(x, y).

Deduce that if x_1, ..., x_{k+1} ∈ X are such that Pr_ν[Z_{n_k} = x_k, ..., Z_{n_1} = x_1] > 0, then

Pr_ν[Z_{n_{k+1}} = x_{k+1} | Z_{n_k} = x_k, ..., Z_{n_1} = x_1] = Pr_ν[Z_{n_{k+1}} = x_{k+1} | Z_{n_k} = x_k].
A real random variable is a measurable function f : (Ω, 𝔄) → (R̄, B̄), where R̄ = R ∪ {−∞, +∞} and B̄ is the σ-algebra of extended Borel sets. If the integral of f with respect to Pr_ν exists, we denote it by

E_ν(f) = ∫_Ω f d Pr_ν.

This is the expectation or expected value of f; if ν = δ_x, we write E_x(f).
Chapter 1. Preliminaries and basic facts
An example is the number of visits of the Markov chain to a set W ⊆ X. We define

v^W_n(ω) = 1_W(Z_n(ω)),  where 1_W(x) = 1 if x ∈ W, and 0 otherwise,

is the indicator function of the set W. If W = {x}, we write v^W_n = v^x_n.²  The random variable

v^W_{[k,n]} = v^W_k + v^W_{k+1} + ⋯ + v^W_n  (k ≤ n)

is the number of visits of Z_n in the set W during the time period from step k to step n. It is often called the local time spent in W by the Markov chain during that time interval. We write

v^W = Σ_{n=0}^∞ v^W_n

for the total number of visits (total local time) in W. We have

E_ν(v^W_{[k,n]}) = Σ_{x∈X} Σ_{j=k}^n Σ_{y∈W} ν(x) p^{(j)}(x, y).   (1.23)
1.24 Definition. A stopping time is a random variable t taking its values in N_0 ∪ {∞}, such that

[t ≤ n] = {ω ∈ Ω | t(ω) ≤ n} ∈ 𝔄_n for all n ∈ N_0.

That is, the property that a trajectory ω = (x_0, x_1, ...) satisfies t(ω) ≤ n depends only on (x_0, ..., x_n). In still other words, this means that one can decide whether t ≤ n or not by observing only Z_0, ..., Z_n.
1.25 Exercise (Strong Markov property). Let (Z_n)_{n≥0} be a Markov chain with initial distribution ν and transition matrix P on the state space X, and let t be a stopping time with Pr_ν[t < ∞] = 1. Show that (Z_{t+n})_{n≥0}, defined by

Z_{t+n}(ω) = Z_{t(ω)+n}(ω),  ω ∈ Ω,

is again a Markov chain with transition matrix P and initial distribution

ν̄(x) = Pr_ν[Z_t = x] = Σ_{k=0}^∞ Pr_ν[Z_k = x, t = k].

[Hint: decompose according to the values of t and apply Exercise 1.22.]
² We shall try to distinguish between the measure δ_x and the function 1_x.
For W ⊆ X, two important examples of not necessarily a.s. finite stopping times are the hitting times, also called first passage times:

s^W = inf{n ≥ 0 : Z_n ∈ W}  and  t^W = inf{n ≥ 1 : Z_n ∈ W}.   (1.26)

Observe that, for example, the definition of s^W should be read as s^W(ω) = inf{n ≥ 0 | Z_n(ω) ∈ W}, and that the infimum over the empty set is to be taken as +∞. Thus, s^W is the instant of the first visit of Z_n in W, while t^W is the instant of the first visit in W after starting. Again, we write s^x and t^x if W = {x}.
The following quantities play a crucial role in the study of Markov chains.

G(x, y) = E_x(v^y),  F(x, y) = Pr_x[s^y < ∞],  and  U(x, y) = Pr_x[t^y < ∞],  x, y ∈ X.   (1.27)

G(x, y) is the expected number of visits of (Z_n) in y, starting at x, while F(x, y) is the probability to ever visit y, starting at x, and U(x, y) is the probability to visit y after starting at x. In particular, U(x, x) is the probability to ever return to x, while F(x, x) = 1. Also, F(x, y) = U(x, y) when y ≠ x, since the two stopping times s^y and t^y coincide when the starting point differs from the target point y. Furthermore, if we write

f^{(n)}(x, y) = Pr_x[s^y = n] = Pr_x[Z_n = y, Z_i ≠ y for 0 ≤ i < n]  and
u^{(n)}(x, y) = Pr_x[t^y = n] = Pr_x[Z_n = y, Z_i ≠ y for 1 ≤ i < n],   (1.28)

then f^{(0)}(x, x) = 1, f^{(0)}(x, y) = 0 if y ≠ x, u^{(0)}(x, y) = 0 for all x, y, and f^{(n)}(x, y) = u^{(n)}(x, y) for all n, if y ≠ x. We get

G(x, y) = Σ_{n=0}^∞ p^{(n)}(x, y),  F(x, y) = Σ_{n=0}^∞ f^{(n)}(x, y),  U(x, y) = Σ_{n=0}^∞ u^{(n)}(x, y).
We can now give first answers to the questions of Example 1.1.

a) If it rains today, then the probability to have bright weather two weeks from now is p^{(14)}(rain, sun). This is the number obtained by computing the 14-th power of the transition matrix. One of the methods to compute this power in practice is to try to diagonalize the transition matrix via its eigenvalues. In the sequel, we shall also study other methods of explicit computation and compute the numerical values for the solutions that answer this and the next question.

b) If it rains today, then the expected number of rainy days during the next month (thirty-one days, starting today) is

E_rain(v^rain_{[0,30]}) = Σ_{n=0}^{30} p^{(n)}(rain, rain).

c) A bad weather period begins after a bright day. It lasts for exactly n days with probability

Pr_sun[Z_1, Z_2, ..., Z_n ∈ {rain, cloud}, Z_{n+1} = sun] = Pr_sun[t^sun = n + 1] = u^{(n+1)}(sun, sun).

We shall see below that the bad weather does not continue forever, that is, Pr_sun[t^sun = ∞] = 0. Hence, the expected duration (in days) of a bad weather period is

Σ_{n=1}^∞ n · u^{(n+1)}(sun, sun).

In order to compute this number, we can simplify the Markov chain. We have p(rain, sun) = p(cloud, sun) = 1/4, so bad weather today (rain or cloud) is followed by a bright day tomorrow (sun) with probability 1/4, and again by bad weather tomorrow with probability 3/4, independently of the particular type (rain or cloud) of today's bad weather. Bright weather today (sun) is followed with probability 1 by bad weather tomorrow. Thus, we can combine the two states rain and cloud into a single, new state "bad weather", denoted bad, and we obtain a new state space X̃ = {sun, bad} with a new transition matrix P̃:

p̃(sun, sun) = 0,  p̃(sun, bad) = 1,  p̃(bad, sun) = 1/4,  p̃(bad, bad) = 3/4.

For the computation of probabilities of events which do not distinguish between the different types of bad weather, we can use the new Markov chain (X̃, P̃), with the associated probability measures P̃r_x̃, x̃ ∈ X̃. We obtain

Pr_sun[t^sun = n + 1] = P̃r_sun[t^sun = n + 1] = p̃(sun, bad) (p̃(bad, bad))^{n−1} p̃(bad, sun) = (3/4)^{n−1} (1/4).

In particular, we verify

Pr_sun[t^sun = ∞] = 1 − Σ_{n=0}^∞ Pr_sun[t^sun = n + 1] = 0.
Using

Σ_{n=1}^∞ n z^{n−1} = (Σ_{n=0}^∞ z^n)′ = (1/(1 − z))′ = 1/(1 − z)²,

we now find that the mean duration of a bad weather period is

(1/4) Σ_{n=1}^∞ n (3/4)^{n−1} = (1/4) · 1/(1 − 3/4)² = 4.
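The value 4 can be double-checked by truncating the series numerically (a minimal sketch; the terms decay geometrically, so a finite truncation is harmless):

```python
# Mean duration of a bad weather period: sum_{n>=1} n (3/4)^(n-1) (1/4) = 4.
mean_duration = sum(n * (3 / 4) ** (n - 1) * (1 / 4) for n in range(1, 2000))
assert abs(mean_duration - 4.0) < 1e-9
```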
In the last example, we have applied a useful method for simplifying a Markov chain (X, P), namely factorization. Suppose that we have a partition X̃ of the state space X with the following property.

For all x̃, ỹ ∈ X̃,  p(x, ỹ) = Σ_{y∈ỹ} p(x, y) is constant for x ∈ x̃.   (1.29)

If this holds, we can consider X̃ as a new state space with transition matrix P̃, where

p̃(x̃, ỹ) = p(x, ỹ), with arbitrary x ∈ x̃.   (1.30)

The new Markov chain is the factor chain with respect to the given partition.

More precisely, let π be the natural projection X → X̃. Choose an initial distribution ν on X and let ν̃ = π(ν), that is, ν̃(x̃) = Σ_{x∈x̃} ν(x). Consider the associated trajectory spaces (Ω, 𝔄, Pr_ν) and (Ω̃, 𝔄̃, P̃r_ν̃) and the natural extension π : Ω → Ω̃, namely π(x_0, x_1, ...) = (π(x_0), π(x_1), ...). Then we obtain for the Markov chains (Z_n) on X and (Z̃_n) on X̃

Z̃_n = π(Z_n)  and  Pr_ν(π^{−1}(Ã)) = P̃r_ν̃(Ã) for all Ã ∈ 𝔄̃.

We leave the verification as an exercise to the reader.
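Condition (1.29) can also be tested mechanically. A small sketch (the weather chain matrix and the partition {sun}, {rain, cloud} are the assumptions here):

```python
from fractions import Fraction as Fr

# Weather chain (assumed matrix): state 0 = sun, states 1, 2 = the two kinds
# of bad weather; the partition below lumps the bad-weather states together.
P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]
partition = [[0], [1, 2]]

def factor_chain(P, partition):
    # Verify (1.29): for blocks xt, yt, the row sum over yt must not depend on
    # the choice of x in xt; if so, (1.30) defines the factor matrix.
    Pt = []
    for xt in partition:
        row = []
        for yt in partition:
            vals = {sum(P[x][y] for y in yt) for x in xt}
            assert len(vals) == 1, "condition (1.29) violated"
            row.append(vals.pop())
        Pt.append(row)
    return Pt

assert factor_chain(P, partition) == [[Fr(0), Fr(1)], [Fr(1, 4), Fr(3, 4)]]
```

The function raises an assertion error for a partition violating (1.29), and otherwise returns the factor matrix P̃ of (1.30); for the weather chain it reproduces the 2×2 matrix found above.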
1.31 Exercise. Let (Z_n) be a Markov chain on the state space X with transition matrix P, and let X̃ be a partition of X with the natural projection π : X → X̃. Show that (π(Z_n)) is (for every starting point in X) a Markov chain on X̃ if and only if (1.29) holds.
D Generating functions of transition probabilities

If (a_n) is a sequence of real or complex numbers, then the complex power series Σ_{n=0}^∞ a_n z^n, z ∈ C, is called the generating function of (a_n). From the study of the generating function one can very often deduce useful information about the sequence. In particular, let (X, P) be a Markov chain. Its Green function or Green kernel is

G(x, y | z) = Σ_{n=0}^∞ p^{(n)}(x, y) z^n,  x, y ∈ X, z ∈ C.   (1.32)
Its radius of convergence is

r(x, y) = 1 / limsup_n (p^{(n)}(x, y))^{1/n} ≥ 1.   (1.33)

If z = r(x, y), the series may converge or diverge to +∞. In both cases, we write G(x, y | r(x, y)) for the corresponding value. Observe that the number G(x, y) defined in (1.27) coincides with G(x, y | 1).

Now let r = inf{r(x, y) | x, y ∈ X} and |z| < r. We can form the matrix

G(z) = (G(x, y | z))_{x,y∈X}.
We have seen that p^{(n)}(x, y) is the element at position (x, y) of the matrix P^n, and that P^0 = I is the identity matrix over X. We may therefore write

G(z) = Σ_{n=0}^∞ z^n P^n,

where convergence of matrices is intended pointwise in each pair (x, y) ∈ X². The series converges, since |z| < r. We have

G(z) = I + Σ_{n=1}^∞ z^n P^n = I + zP Σ_{n=0}^∞ z^n P^n = I + zP G(z).

In fact, by (1.20),

G(x, y | z) = p^{(0)}(x, y) + Σ_{n=1}^∞ Σ_{w∈X} p(x, w) p^{(n−1)}(w, y) z^n
 (∗) = p^{(0)}(x, y) + Σ_{w∈X} z p(x, w) Σ_{n=1}^∞ p^{(n−1)}(w, y) z^{n−1}
 = p^{(0)}(x, y) + Σ_{w∈X} z p(x, w) G(w, y | z);

the exchange of the sums in (∗) is legitimate because of the absolute convergence. Hence

(I − zP) G(z) = I,   (1.34)

and we may formally write G(z) = (I − zP)^{−1}. If X is finite, the involved matrices are finite dimensional, and this is the usual inverse matrix when det(I − zP) ≠ 0. Formal inversion is however not always justified. If X is infinite, one first has to specify on which (normed) linear space these matrices act and then to verify invertibility on that space.
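A numerical sketch of (1.34): for a finite chain (the 3×3 matrix below is an assumption) and |z| < r, the truncated series G_N(z) = Σ_{n≤N} z^n P^n satisfies (I − zP) G_N(z) = I − (zP)^{N+1} ≈ I:

```python
# Assumed example chain (the weather chain of Example 1.1) and a point z < r.
P = [[0.0, 0.5, 0.5], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
z, N = 0.5, 200

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

I3 = [[float(i == j) for j in range(3)] for i in range(3)]
G = [row[:] for row in I3]          # accumulates sum_{n<=N} (zP)^n
Q = [row[:] for row in I3]          # current term (zP)^n
zP = [[z * p for p in row] for row in P]
for _ in range(N):
    Q = mat_mul(Q, zP)
    G = [[G[i][j] + Q[i][j] for j in range(3)] for i in range(3)]

IzP = [[I3[i][j] - zP[i][j] for j in range(3)] for i in range(3)]
check = mat_mul(IzP, G)             # should be close to the identity, cf. (1.34)
assert all(abs(check[i][j] - I3[i][j]) < 1e-10 for i in range(3) for j in range(3))
```

The truncation error is of order |z|^{N+1}, which is negligible here.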
In Example 1.1 we get

G(z) = [ [1, −z/2, −z/2], [−z/4, 1 − z/2, −z/4], [−z/4, −z/4, 1 − z/2] ]^{−1}
     = 1/((1 − z)(16 − z²)) · [ [(4 − 3z)(4 − z), 2z(4 − z), 2z(4 − z)], [z(4 − z), 2(8 − 4z − z²), 2z(2 + z)], [z(4 − z), 2z(2 + z), 2(8 − 4z − z²)] ]

(rows and columns ordered sun, rain, cloud). We can now complete the answers to questions a) and b): regarding a),

G(rain, sun | z) = z/((1 − z)(4 + z)) = (z/5) (1/(1 − z) + 1/(4 + z)) = Σ_{n=1}^∞ (1/5)(1 − (−1)^n/4^n) z^n.

In particular,

p^{(14)}(rain, sun) = (1/5)(1 − (−1)^{14}/4^{14}).

Analogously, to obtain b),

G(rain, rain | z) = 2(8 − 4z − z²)/((1 − z)(4 − z)(4 + z))

has to be expanded in a power series. The sum of the coefficients of z^n, n = 0, ..., 30, gives the requested expected value.
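Both answers can be cross-checked against matrix powers (a sketch; matrix and state order sun = 0, rain = 1, cloud = 2 assumed as above):

```python
P = [[0.0, 0.5, 0.5], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

powers = [[[float(i == j) for j in range(3)] for i in range(3)]]   # P^0 = I
for _ in range(30):
    powers.append(mat_mul(powers[-1], P))

# a) p^(14)(rain, sun) agrees with the coefficient (1/5)(1 - (-1)^14/4^14):
assert abs(powers[14][1][0] - (1 - (-1) ** 14 / 4 ** 14) / 5) < 1e-12

# b) expected number of rainy days: sum of p^(n)(rain, rain), n = 0, ..., 30
expected_rainy_days = sum(powers[n][1][1] for n in range(31))
```

The value of b) is then read off from `expected_rainy_days`.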
1.35 Theorem. If X is finite then G(x, y | z) is a rational function in z.

Proof. Recall that for a finite, invertible matrix A = (a_{i,j}),

A^{−1} = (1/det(A)) (a*_{i,j})  with  a*_{i,j} = (−1)^{i+j} det(A | j, i),

where det(A | j, i) is the determinant of the matrix obtained from A by deleting the j-th row and the i-th column. Hence

G(x, y | z) = ± det(I − zP | y, x) / det(I − zP)   (1.36)

(with sign to be specified) is the quotient of two polynomials.  □
Next, we consider the generating functions

F(x, y | z) = Σ_{n=0}^∞ f^{(n)}(x, y) z^n  and  U(x, y | z) = Σ_{n=0}^∞ u^{(n)}(x, y) z^n.   (1.37)

For z = 1, we obtain the probabilities F(x, y | 1) = F(x, y) and U(x, y | 1) = U(x, y) introduced in the previous paragraph in (1.27). Note that F(x, x | z) = F(x, x | 0) = 1 and U(x, y | 0) = 0, and that U(x, y | z) = F(x, y | z) if x ≠ y. Indeed, among the U-functions we shall only need U(x, x | z), which is the probability generating function of the first return time to x.

We denote by s(x, y) the radius of convergence of U(x, y | z). Since u^{(n)}(x, y) ≤ p^{(n)}(x, y) ≤ 1, we must have

s(x, y) ≥ r(x, y) ≥ 1.
The following theorem will be useful on many occasions.

1.38 Theorem. (a) G(x, x | z) = 1/(1 − U(x, x | z)), |z| < r(x, x).
(b) G(x, y | z) = F(x, y | z) G(y, y | z), |z| < r(x, y).
(c) U(x, x | z) = Σ_y p(x, y) z F(y, x | z), |z| < s(x, x).
(d) If y ≠ x then F(x, y | z) = Σ_w p(x, w) z F(w, y | z), |z| < s(x, y).

Proof. (a) Let n ≥ 1. If Z_0 = x and Z_n = x, then there must be an instant k ∈ {1, ..., n} such that Z_k = x, but Z_j ≠ x for j = 1, ..., k − 1, that is, t^x = k. The events

[t^x = k] = [Z_k = x, Z_j ≠ x for j = 1, ..., k − 1],  k = 1, ..., n,

are pairwise disjoint. Hence, using the Markov property (in its more general form of Exercise 1.22) and (1.19),

p^{(n)}(x, x) = Σ_{k=1}^n Pr_x[Z_n = x, t^x = k]
 = Σ_{k=1}^n Pr_x[Z_n = x | Z_k = x, Z_j ≠ x for j = 1, ..., k − 1] Pr_x[t^x = k]
 (∗) = Σ_{k=1}^n Pr_x[Z_n = x | Z_k = x] Pr_x[t^x = k]
 = Σ_{k=1}^n p^{(n−k)}(x, x) u^{(k)}(x, x).
In (∗), the careful observer may note that one could have Pr_x[Z_n = x | t^x = k] ≠ Pr_x[Z_n = x | Z_k = x], but this can happen only when Pr_x[t^x = k] = u^{(k)}(x, x) = 0, so that possibly different first factors in those sums are compensated by multiplication with 0. Since u^{(0)}(x, x) = 0, we obtain

p^{(n)}(x, x) = Σ_{k=0}^n u^{(k)}(x, x) p^{(n−k)}(x, x)  for n ≥ 1.   (1.39)

For n = 0 we have p^{(0)}(x, x) = 1, while Σ_{k=0}^0 u^{(k)}(x, x) p^{(0−k)}(x, x) = 0. It follows that

G(x, x | z) = Σ_{n=0}^∞ p^{(n)}(x, x) z^n = 1 + Σ_{n=1}^∞ Σ_{k=0}^n u^{(k)}(x, x) p^{(n−k)}(x, x) z^n
 = 1 + Σ_{n=0}^∞ Σ_{k=0}^n u^{(k)}(x, x) p^{(n−k)}(x, x) z^n = 1 + U(x, x | z) G(x, x | z)

by Cauchy's product formula (Mertens' theorem) for the product of two series, as long as |z| < r(x, x), in which case both involved power series converge absolutely.
(b) If x ≠ y, then p^{(0)}(x, y) = 0. Recall that u^{(k)}(x, y) = f^{(k)}(x, y) in this case. Exactly as in (a), we obtain

p^{(n)}(x, y) = Σ_{k=1}^n f^{(k)}(x, y) p^{(n−k)}(y, y) = Σ_{k=0}^n f^{(k)}(x, y) p^{(n−k)}(y, y)   (1.40)

for all n ≥ 0 (no exception when n = 0), and (b) follows again from the product formula for power series.
(d) Recall that f^{(0)}(x, y) = 0, since y ≠ x. If n ≥ 1 then the events

[s^y = n, Z_1 = w],  w ∈ X,

are pairwise disjoint with union [s^y = n], whence

f^{(n)}(x, y) = Σ_{w∈X} Pr_x[s^y = n, Z_1 = w]
 = Σ_{w∈X} Pr_x[Z_1 = w] Pr_x[s^y = n | Z_1 = w]
 = Σ_{w∈X} Pr_x[Z_1 = w] Pr_w[s^y = n − 1]
 = Σ_{w∈X} p(x, w) f^{(n−1)}(w, y).

(Note that in the last sum, we may get a contribution from w = y only when n = 1, since otherwise f^{(n−1)}(y, y) = 0.) It follows that s(w, y) ≥ s(x, y) whenever w ≠ y and p(x, w) > 0. Therefore,

F(x, y | z) = Σ_{n=1}^∞ f^{(n)}(x, y) z^n = Σ_{w∈X} p(x, w) z Σ_{n=1}^∞ f^{(n−1)}(w, y) z^{n−1} = Σ_{w∈X} p(x, w) z F(w, y | z)

holds for |z| < s(x, y).  □
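Relation (1.39) also yields a practical recursion for the return probabilities u^{(n)}(x, x) in terms of the p^{(n)}(x, x), and with it a numerical test of part (a) of the theorem. A sketch (3-state chain assumed as in this chapter's examples; both series are truncated, which is harmless for |z| well below 1):

```python
P = [[0.0, 0.5, 0.5], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]   # assumed chain
x, z, N = 0, 0.3, 120

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

p = [1.0]                                  # p^(n)(x,x), starting with n = 0
Pn = [[float(i == j) for j in range(3)] for i in range(3)]
for n in range(1, N + 1):
    Pn = mat_mul(Pn, P)
    p.append(Pn[x][x])

u = [0.0] * (N + 1)                        # solve (1.39) for u^(n)(x,x):
for n in range(1, N + 1):                  # p^(n) = sum_{k=1}^n u^(k) p^(n-k)
    u[n] = p[n] - sum(u[k] * p[n - k] for k in range(1, n))

G = sum(p[n] * z ** n for n in range(N + 1))
U = sum(u[n] * z ** n for n in range(N + 1))
assert abs(G - 1 / (1 - U)) < 1e-9         # Theorem 1.38 (a)
```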
1.41 Exercise. Prove formula (c) of Theorem 1.38.

1.42 Definition. Let Γ be an oriented graph with vertex set X. For x, y ∈ X, a cut point between x and y (in this order!) is a vertex w ∈ X such that every path in Γ from x to y must pass through w.
The following proposition will be useful on several occasions. Recall that s(x, y) is the radius of convergence of the power series F(x, y | z).

1.43 Proposition. (a) For all x, w, y ∈ X and for real z with 0 ≤ z ≤ s(x, y) one has

F(x, y | z) ≥ F(x, w | z) F(w, y | z).

(b) Suppose that in the graph Γ(P) of the Markov chain (X, P), the state w is a cut point between x and y ∈ X. Then

F(x, y | z) = F(x, w | z) F(w, y | z)

for all z ∈ C with |z| < s(x, y) and for z = s(x, y).
Proof. (a) We have

f^{(n)}(x, y) = Pr_x[s^y = n] ≥ Pr_x[s^y = n, s^w ≤ n]
 = Σ_{k=0}^n Pr_x[s^y = n, s^w = k]
 = Σ_{k=0}^n Pr_x[s^w = k] Pr_x[s^y = n | s^w = k]
 = Σ_{k=0}^n f^{(k)}(x, w) f^{(n−k)}(w, y).

The inequality of statement (a) is true when F(x, w | ·) ≡ 0 or F(w, y | ·) ≡ 0. So let us suppose that there is k such that f^{(k)}(x, w) > 0. Then f^{(n−k)}(w, y) ≤ f^{(n)}(x, y)/f^{(k)}(x, w) for all n ≥ k, whence s(w, y) ≥ s(x, y). In the same way, we may suppose that f^{(l)}(w, y) > 0 for some l, which implies s(x, w) ≥ s(x, y). Then

f^{(n)}(x, y) z^n ≥ Σ_{k=0}^n f^{(k)}(x, w) z^k f^{(n−k)}(w, y) z^{n−k}

for all real z with 0 ≤ z ≤ s(x, y), and the product formula for power series implies the result for all those z with z < s(x, y). Since we have power series with non-negative coefficients, we can let z → s(x, y) from below to see that statement (a) also holds for z = s(x, y), regardless of whether the series converge or diverge at that point.
(b) If w is a cut point between x and y, then the Markov chain must visit w before it can reach y. That is, s^w ≤ s^y, given that Z_0 = x. Therefore the strong Markov property yields

f^{(n)}(x, y) = Pr_x[s^y = n] = Pr_x[s^y = n, s^w ≤ n] = Σ_{k=0}^n f^{(k)}(x, w) f^{(n−k)}(w, y).

We can now argue precisely as in the proof of (a), and the product formula for power series yields statement (b) for all z ∈ C with |z| < s(x, y) as well as for z = s(x, y).  □
1.44 Exercise. Show that for distinct x, y ∈ X and for real z with 0 ≤ z ≤ s(x, x) one has

U(x, x | z) ≥ F(x, y | z) F(y, x | z).

1.45 Exercise. Suppose that w is a cut point between x and y. Show that the expected time to reach y starting from x (given that y is reached) satisfies

E_x(s^y | s^y < ∞) = E_x(s^w | s^w < ∞) + E_w(s^y | s^y < ∞).

[Hint: check first that E_x(s^y | s^y < ∞) = F′(x, y | 1)/F(x, y | 1), and apply Proposition 1.43 (b).]
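Proposition 1.43 (b) can be illustrated numerically. In the chain sketched below (an assumed example: a path 0 – 1 – 2 with p(0,1) = 1, p(1,0) = p(1,2) = 1/2, p(2,1) = 1), state 1 is a cut point between 0 and 2. The first-passage probabilities f^{(n)}(x, y) are obtained by making y absorbing, since then Pr_x[Z_n = y] = Pr_x[s^y ≤ n]:

```python
P = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]   # assumed path chain

def F_series(P, x, y, z, N=400):
    # F(x,y|z) = sum_n f^(n)(x,y) z^n, with f^(n) read off as increments of
    # Pr_x[s^y <= n] in the chain where y has been made absorbing.
    m = len(P)
    Q = [row[:] for row in P]
    Q[y] = [float(j == y) for j in range(m)]
    dist = [float(j == x) for j in range(m)]
    total = 1.0 if x == y else 0.0          # the f^(0) term
    prev = total
    for n in range(1, N + 1):
        dist = [sum(dist[i] * Q[i][j] for i in range(m)) for j in range(m)]
        total += (dist[y] - prev) * z ** n
        prev = dist[y]
    return total

z = 0.9
lhs = F_series(P, 0, 2, z)
rhs = F_series(P, 0, 1, z) * F_series(P, 1, 2, z)
assert abs(lhs - rhs) < 1e-9                # cut point product formula
```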
We can now answer the questions of Example 1.2. The latter is a specific case of the following type of Markov chain.

1.46 Example. The random walk with two absorbing barriers is the Markov chain with state space

X = {0, 1, ..., N},  N ∈ N,

and transition probabilities

p(0, 0) = p(N, N) = 1,  p(i, i + 1) = p,  p(i, i − 1) = q  for i = 1, ..., N − 1,  and p(i, j) = 0 in all other cases.

Here 0 < p < 1 and q = 1 − p. This example is also well known as the model of the gambler's ruin.

The probabilities to reach 0 and N starting from the point j are F(j, 0) and F(j, N), respectively. The expected number of steps needed for reaching N from j, under the condition that N is indeed reached, is

E_j(s^N | s^N < ∞) = Σ_{n=1}^∞ n Pr_j[s^N = n | s^N < ∞] = F′(j, N | 1)/F(j, N | 1),

where F′(j, N | z) denotes the derivative of F(j, N | z) with respect to z; compare with Exercise 1.45.
For the random walk with two absorbing barriers, we have

F(0, N | z) = 0,  F(N, N | z) = 1,  and
F(j, N | z) = qz F(j − 1, N | z) + pz F(j + 1, N | z),  j = 1, ..., N − 1.

The last identity follows from Theorem 1.38 (d). For determining F(j, N | z) as a function of j, we are thus led to a linear difference equation of second order with constant coefficients. The associated characteristic polynomial

pz λ² − λ + qz

has roots

λ_1(z) = (1 + √(1 − 4pqz²))/(2pz)  and  λ_2(z) = (1 − √(1 − 4pqz²))/(2pz).

We study the case |z| < 1/(2√(pq)): the roots are distinct, and

F(j, N | z) = a λ_1(z)^j + b λ_2(z)^j.

The constants a = a(z) and b = b(z) are found by inserting the boundary values at j = 0 and j = N:

a + b = 0  and  a λ_1(z)^N + b λ_2(z)^N = 1.

The result is

F(j, N | z) = (λ_1(z)^j − λ_2(z)^j)/(λ_1(z)^N − λ_2(z)^N),  |z| < 1/(2√(pq)).
With some effort, one computes for |z| < 1/(2√(pq))

F′(j, N | z)/F(j, N | z) = (1/(z √(1 − 4pqz²))) ( N (λ(z)^N + 1)/(λ(z)^N − 1) − j (λ(z)^j + 1)/(λ(z)^j − 1) ),

where λ(z) = λ_1(z)/λ_2(z). We have λ(1) = max{p/q, q/p}. Hence, if p ≠ 1/2, then 1 < 1/(2√(pq)), and

E_j(s^N | s^N < ∞) = (1/(q − p)) ( N ((q/p)^N + 1)/((q/p)^N − 1) − j ((q/p)^j + 1)/((q/p)^j − 1) ).

When p = 1/2, one easily verifies that

F(j, N | 1) = j/N.

Instead of F′(j, N | 1)/F(j, N | 1) one has to compute

lim_{z→1−} F′(j, N | z)/F(j, N | z)

in this case. We leave the corresponding calculations as an exercise.
In Example 1.2 we have N = 300, p = 2/3, q = 1/3, q/p = 1/2 and j = 100. We obtain

Pr_100[s^300 < ∞] = (1 − 2^{−100})/(1 − 2^{−300}) ≈ 1  and  E_100(s^300 | s^300 < ∞) ≈ 600.

The probability that the drunkard does not return home is practically 0. For returning home, it takes him 600 steps on the average, that is, 3 times as many as necessary.  □
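These numbers can be cross-checked without generating functions: at z = 1 and p ≠ q, the formula above gives F(j, N | 1) = (1 − (q/p)^j)/(1 − (q/p)^N), which is characterized by the first-step equations h(j) = q h(j−1) + p h(j+1) with h(0) = 0, h(N) = 1. A sketch with the data of Example 1.2:

```python
# Gambler's ruin data of Example 1.2 (assumed interpretation: home at N = 300).
p, q, N = 2 / 3, 1 / 3, 300
h = [(1 - (q / p) ** j) / (1 - (q / p) ** N) for j in range(N + 1)]

# boundary values and the first-step (difference) equations
assert h[0] == 0.0 and abs(h[N] - 1.0) < 1e-12
assert all(abs(h[j] - (q * h[j - 1] + p * h[j + 1])) < 1e-12 for j in range(1, N))

# starting at j = 100: reaching home is practically certain
assert abs(h[100] - 1.0) < 1e-12
```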
1.47 Exercise. (a) If p = 1/2 instead of p = 2/3, compute the expected time (number of steps) that the drunkard needs to return home, supposing that he does return.

(b) Modify the model: if the drunkard falls into the lake, he does not drown, but goes back to the previous point on the road at the next step. That is, the lake has become a reflecting barrier, and p(0, 0) = 0, p(0, 1) = 1, while all other transition probabilities remain unchanged. (In particular, the house remains an absorbing state with p(N, N) = 1.) Redo the computations in this case.
1.48 Exercise. Let (X, P) be an arbitrary Markov chain, and define a new transition matrix P_a = a·I + (1 − a)·P, where 0 < a < 1. Let G(·, · | z) and G_a(·, · | z) denote the Green functions of P and P_a, respectively. Show that for all x, y ∈ X

G_a(x, y | z) = (1/(1 − az)) G(x, y | (z − az)/(1 − az)).

Further examples that illustrate the use of generating functions will follow later on.
Paths and their weights

We conclude this section with a purely combinatorial description of several of the probabilities and generating functions that we have considered so far. Recall the Definition 1.6 of the (oriented) graph Γ(P) of the Markov chain (X, P). A (finite) path is a sequence π = [x_0, x_1, ..., x_n] of vertices (states) such that [x_{i−1}, x_i] is an edge, that is, p(x_{i−1}, x_i) > 0 for i = 1, ..., n. Here, n ≥ 0 is the length of π, and π is a path from x_0 to x_n. If n = 0 then π = [x_0] consists of a single point. The weight of π with respect to z ∈ C is

w(π | z) = 1, if n = 0, and w(π | z) = p(x_0, x_1) p(x_1, x_2) ⋯ p(x_{n−1}, x_n) z^n, if n ≥ 1,

and w(π) = w(π | 1).

If Π is a set of paths, then Π^{(n)} denotes the set of all π ∈ Π with length n. We can consider w(· | z) as a complex-valued measure on the set of all paths. The weight of Π is

w(Π | z) = Σ_{π∈Π} w(π | z),  and  w(Π) = w(Π | 1),   (1.49)

given that the involved sum converges absolutely. Thus,

w(Π | z) = Σ_{n=0}^∞ w(Π^{(n)}) z^n.

Now let Π(x, y) be the set of all paths from x to y. Then

p^{(n)}(x, y) = w(Π^{(n)}(x, y))  and  G(x, y | z) = w(Π(x, y) | z).   (1.50)

Next, let Π^∘(x, y) be the set of paths from x to y which contain y only as their final point. Then

f^{(n)}(x, y) = w(Π^{∘(n)}(x, y))  and  F(x, y | z) = w(Π^∘(x, y) | z).   (1.51)
Similarly, let Π^−(x, y) be the set of all paths [x = x_0, x_1, ..., x_n = y] for which n ≥ 1 and x_i ≠ y for i = 1, ..., n − 1. Thus, Π^−(x, y) = Π^∘(x, y) when x ≠ y, and Π^−(x, x) is the set of all paths that return to the initial point x only at the end. We get

u^{(n)}(x, y) = w(Π^{−(n)}(x, y))  and  U(x, y | z) = w(Π^−(x, y) | z).   (1.52)
If π_1 = [x_0, ..., x_m] and π_2 = [y_0, ..., y_n] are two paths with x_m = y_0, then we can define their concatenation as

π_1 ∘ π_2 = [x_0, ..., x_m = y_0, ..., y_n].

We then have

w(π_1 ∘ π_2 | z) = w(π_1 | z) w(π_2 | z).   (1.53)

If Π_1(x, w) is a set of paths from x to w and Π_2(w, y) is a set of paths from w to y, then we set

Π_1(x, w) ∘ Π_2(w, y) = {π_1 ∘ π_2 : π_1 ∈ Π_1(x, w), π_2 ∈ Π_2(w, y)}.

Thus

w(Π_1(x, w) ∘ Π_2(w, y) | z) = w(Π_1(x, w) | z) w(Π_2(w, y) | z).   (1.54)

Many of the identities for transition probabilities and generating functions can be derived in terms of weights of paths and their concatenation. For example, the obvious relation

Π^{(m+n)}(x, y) = ⋃_{w∈X} Π^{(m)}(x, w) ∘ Π^{(n)}(w, y)

(disjoint union) leads to

p^{(m+n)}(x, y) = w(Π^{(m+n)}(x, y)) = Σ_w w(Π^{(m)}(x, w) ∘ Π^{(n)}(w, y)) = Σ_w p^{(m)}(x, w) p^{(n)}(w, y).
1.55 Exercise. Show that

Π(x, x) = {[x]} ∪ (Π^−(x, x) ∘ Π(x, x))  and  Π(x, y) = Π^∘(x, y) ∘ Π(y, y),

and deduce statements (a) and (b) of Theorem 1.38.

Find analogous proofs in terms of concatenation and weights of paths for statements (c) and (d) of Theorem 1.38.
Chapter 2

Irreducible classes

A Irreducible and essential classes

In the sequel, (X, P) will be a Markov chain. For x, y ∈ X we write

a) x →^n y, if p^{(n)}(x, y) > 0,
b) x → y, if there is n ≥ 0 such that x →^n y,
c) x ↛ y, if there is no n ≥ 0 such that x →^n y,
d) x ↔ y, if x → y and y → x.

These are all properties of the graph Γ(P) of the Markov chain that do not depend on the specific values of the weights p(x, y) > 0. In the graph, x →^n y means that there is a path (walk) of length n from x to y. If x ↔ y, we say that the states x and y communicate.

The relation → is reflexive and transitive. In fact p^{(0)}(x, x) = 1 by definition, and if x →^m w and w →^n y, then x →^{m+n} y. This can be seen by concatenating paths in the graph of the Markov chain, or directly by the inequality p^{(m+n)}(x, y) ≥ p^{(m)}(x, w) p^{(n)}(w, y) > 0. Therefore, we have the following.

2.1 Lemma. ↔ is an equivalence relation on X.

2.2 Definition. An irreducible class is an equivalence class with respect to ↔.

In graph theoretical terminology, one also speaks of a strongly connected component.

In Examples 1.1, 1.3 and 1.4, all elements communicate, and there is a unique irreducible class. In this case, the Markov chain itself is called irreducible.

In Example 1.46 (and its specific case 1.2), there are 3 irreducible classes: {0}, {1, ..., N − 1} and {N}.
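For a finite chain, the irreducible classes can be computed mechanically from Γ(P): take the reflexive-transitive closure of the edge relation and intersect it with its transpose. A sketch for the random walk with two absorbing barriers (N = 5 and p = q = 1/2 assumed):

```python
N = 5
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = P[N][N] = 1.0
for i in range(1, N):
    P[i][i + 1] = P[i][i - 1] = 0.5

n = N + 1
# reach[x][y] is True iff x -> y, i.e. p^(m)(x,y) > 0 for some m >= 0
reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
for k in range(n):                      # Warshall's transitive closure
    for i in range(n):
        for j in range(n):
            reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

# irreducible classes: x and y in the same class iff x -> y and y -> x
classes = {frozenset(j for j in range(n) if reach[i][j] and reach[j][i]) for i in range(n)}
assert classes == {frozenset({0}), frozenset(range(1, N)), frozenset({N})}
```

This reproduces the three classes {0}, {1, ..., N − 1}, {N} noted above.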
2.3 Example. Let X = {1, 2, ..., 13} and the graph Γ(P) be as in Figure 4. (The oriented edges correspond to non-zero one-step transition probabilities.) The irreducible classes are C(1) = {1, 2}, C(3) = {3}, C(4) = {4, 5, 6}, C(7) = {7, 8, 9, 10}, C(11) = {11, 12} and C(13) = {13}.

[Figure 4: the oriented graph Γ(P) on the vertex set {1, ..., 13}.]
On the irreducible classes, the relation → becomes a (partial) order: we define

C(x) → C(y) if and only if x → y.

It is easy to verify that this order is well defined, that is, independent of the specific choice of representatives of the single irreducible classes.

2.4 Lemma. The relation → is a partial order on the collection of all irreducible classes of (X, P).

Proof. Reflexivity: since x →^0 x, we have C(x) → C(x).
Transitivity: if C(x) → C(w) → C(y) then x → w → y. Hence x → y, and C(x) → C(y).
Anti-symmetry: if C(x) → C(y) → C(x) then x → y → x and thus x ↔ y, so that C(x) = C(y).  □

In Figure 5, we illustrate the partial ordering of the irreducible classes of Example 2.3:

[Figure 5: the partial order on the classes {1, 2}, {3}, {4, 5, 6}, {7, 8, 9, 10}, {11, 12}, {13}.]
The graph associated with the partial order as in Figure 5 is in general not the graph
of a (factor) Markov chain obtained from (X. 1). Figure 5 tells us, for example,
that from any point in the class C(7) it is possible to reach C(4) with positive
probability, but not conversely.
2.5 Definition. The maximal elements (if they exist) of the partial order → on the collection of the irreducible classes of (X, P) are called essential classes (or absorbing classes).¹ A state x is called essential if C(x) is an essential class.

In Example 2.3, the essential classes are {11, 12} and {13}.

Once it has entered an essential class, the Markov chain (Z_n) cannot exit from it:

2.6 Exercise. Let C ⊂ X be an irreducible class. Prove that the following statements are equivalent.

(a) C is essential.

(b) If x ∈ C and x → y, then y ∈ C.

(c) If x ∈ C and x → y, then y → x.
If X is finite then there are only finitely many irreducible classes, so that in their partial order, there must be maximal elements. Thus, from each element it is possible to reach an essential class. If X is infinite, this is no longer true. Figure 6 shows the graph of a Markov chain on X = N that has no essential classes; the irreducible classes are {1, 2}, {3, 4}, ....

¹ In the literature, one sometimes finds the expressions "recurrent" or "ergodic" in the place of "essential", and "transient" in the place of "non-essential". This is justified when the state space is finite. We shall avoid these identifications, since in general, "recurrent" and "transient" have another meaning; see Chapter 3.
[Figure 6: a Markov chain on N whose irreducible classes are the pairs {1, 2}, {3, 4}, {5, 6}, ...; each class leads on to the next one, so that there is no essential class.]
If an essential class contains exactly one point x (which happens if and only if p(x, x) = 1), then x is called an absorbing state. In Example 1.2, the lake and the drunkard's house are absorbing states.

We call a set B ⊂ X convex if x, y ∈ B and x → w → y implies w ∈ B. Thus, if x ∈ B and w ↔ x, then also w ∈ B. In particular, B is a union of irreducible classes.

2.7 Theorem. Let B ⊂ X be a finite, convex set that does not contain essential elements. Then there is ε > 0 such that for each x ∈ B and all but finitely many n ∈ N,

    ∑_{y∈B} p^(n)(x, y) ≤ (1 − ε)^n.
Proof. By assumption, B is a disjoint union of finite, non-essential irreducible classes C(x_1), ..., C(x_k). Let C(x_1), ..., C(x_j) be the maximal elements in the partial order →, restricted to {C(x_1), ..., C(x_k)}, and let i ∈ {1, ..., j}. Since C(x_i) is non-essential, there is w_i ∈ X such that x_i → w_i but not w_i → x_i. By the maximality of C(x_i), we must have w_i ∈ X \ B. If x ∈ B then x → x_i for some i ∈ {1, ..., j}, and hence also x → w_i, while not w_i → x. Therefore there is m_x ∈ N such that

    ∑_{y∈B} p^(m_x)(x, y) < 1.

Let m = max{m_x : x ∈ B}. Choose x ∈ B. Write m = m_x + l_x, where l_x ∈ N_0. We obtain

    ∑_{y∈B} p^(m)(x, y) = ∑_{y∈B} ∑_{w∈X} p^(m_x)(x, w) p^(l_x)(w, y)    [terms > 0 only if w ∈ B, by convexity of B]
                        = ∑_{w∈B} p^(m_x)(x, w) ∑_{y∈B} p^(l_x)(w, y)    [the last sum is ≤ 1]
                        ≤ ∑_{w∈B} p^(m_x)(x, w) < 1.

Since B is finite, there is δ > 0 such that ∑_{y∈B} p^(m)(x, y) ≤ 1 − δ for all x ∈ B.

Let n ∈ N, n ≥ 2. Write n = km + r with 0 ≤ r < m, and assume that k ≥ 1 (that is, n ≥ m). For x ∈ B,

    ∑_{y∈B} p^(n)(x, y) = ∑_{y∈B} ∑_{w∈X} p^(km)(x, w) p^(r)(w, y)    [terms > 0 only if w ∈ B]
                        = ∑_{w∈B} p^(km)(x, w) ∑_{y∈B} p^(r)(w, y)    [the last sum is ≤ 1]
                        ≤ ∑_{w∈B} p^(km)(x, w)    (as above)
                        = ∑_{y∈B} p^((k−1)m)(x, y) ∑_{w∈B} p^(m)(y, w)    [the last sum is ≤ 1 − δ]
                        ≤ (1 − δ) ∑_{y∈B} p^((k−1)m)(x, y)    (inductively)
                        ≤ (1 − δ)^k = ((1 − δ)^{k/n})^n ≤ ((1 − δ)^{1/(2m)})^n = (1 − ε)^n    (since k/n ≥ 1/(2m)),

where ε = 1 − (1 − δ)^{1/(2m)}. □
In particular, let C be a finite, non-essential irreducible class. The theorem says that for the Markov chain starting at x ∈ C, the probability to remain in C for n steps decreases exponentially as n → ∞. In particular, the expected number of visits in C starting from x ∈ C is finite. Indeed, this number is computed as

    E_x(v_C) = ∑_{n=0}^∞ ∑_{y∈C} p^(n)(x, y),

see (1.23); for all but finitely many n the inner sum is at most (1 − ε)^n, so that the series is dominated by a convergent geometric series. We deduce that v_C < ∞ almost surely.
2.8 Corollary. If C is a finite, non-essential irreducible class then

    Pr_x[∃ k : Z_n ∉ C for all n > k] = 1.
2.9 Corollary. If the set of all non-essential states in X is finite, then the Markov chain reaches some essential class with probability one:

    Pr_x[s_{X_ess} < ∞] = 1,

where X_ess is the union of all essential classes.

Proof. The set B = X \ X_ess is a finite union of finite, non-essential classes (indeed, a convex set of non-essential elements). Therefore

    Pr_x[s_{X_ess} < ∞] = Pr_x[∃ k : Z_n ∉ B for all n > k] = 1

by the same argument that led to Corollary 2.8. □
The last corollary does not remain valid when X \ X_ess is infinite, as the following example shows.
2.10 Example (Infinite drunkard's walk with one absorbing barrier). The state space is X = N_0. We choose parameters p, q > 0 with p + q = 1, and set

    p(0, 0) = 1,  p(k, k+1) = p,  p(k, k−1) = q  (k > 0),

while p(k, l) = 0 in all other cases.

The state 0 is absorbing, while all other states belong to one non-essential, infinite irreducible class C = N. We observe that

    F(k+1, k|z) = F(1, 0|z)  for each k ≥ 0.   (2.11)

This can be justified formally by (1.51), since the mapping w ↦ w + k induces a bijection from Π(1, 0) to Π(k+1, k): a path π = [1 = x_0, x_1, ..., x_l = 0] in Π(1, 0) is mapped to the path [k+1 = x_0 + k, x_1 + k, ..., x_l + k = k]. The bijection preserves all single weights (transition probabilities), so that

    w(Π(1, 0)|z) = w(Π(k+1, k)|z).

Note that this is true because the parts of the graph of our Markov chain that are on the right of k and on the right of 0 are isomorphic as weighted graphs. (Here we cannot apply the same argument to the set of all paths from k+1 to k in the place of Π(k+1, k)!)
2.12 Exercise. Formulate criteria of "isomorphism", resp. "restricted isomorphism", that guarantee G(x, y|z) = G(x′, y′|z), resp. F(x, y|z) = F(x′, y′|z), for a general Markov chain (X, P) and points x, y, x′, y′ ∈ X.
Returning to the drunkard's fate, by Theorem 1.38 (c),

    F(1, 0|z) = qz + pz F(2, 0|z).   (2.13)

In order to reach state 0 starting from state 2, the random walk must necessarily pass through state 1: the latter is a "cut point" between 2 and 0, and Proposition 1.43 (b) together with (2.11) yields

    F(2, 0|z) = F(2, 1|z) F(1, 0|z) = F(1, 0|z)².

We substitute this identity in (2.13) and obtain

    pz F(1, 0|z)² − F(1, 0|z) + qz = 0.

The two solutions of this equation are

    (1 ± √(1 − 4pqz²)) / (2pz).

We must have F(1, 0|0) = 0, and in the interior of the circle of convergence of this power series, the function must be continuous. It follows that of the two solutions, the correct one is

    F(1, 0|z) = (1 − √(1 − 4pqz²)) / (2pz),

whence F(1, 0) = min{1, q/p}. In particular, if p > q then F(1, 0) < 1: starting at state 1, the probability that (Z_n) never reaches the unique absorbing state 0 is 1 − F(1, 0) > 0.
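The value F(1, 0) = min{1, q/p} can be checked numerically. A small sketch (illustrative, not from the text): at z = 1 the quadratic above becomes f = q + p·f², and iterating this fixed-point equation from f = 0 produces an increasing sequence converging to its smallest solution, which is the absorption probability F(1, 0).

```python
def absorption_prob(p, q, iters=5000):
    # Smallest root of p*f^2 - f + q = 0, obtained by iterating f <- q + p*f^2;
    # starting from 0, the iterates increase to F(1,0) = min(1, q/p).
    f = 0.0
    for _ in range(iters):
        f = q + p * f * f
    return f

print(absorption_prob(0.6, 0.4))   # ≈ 0.6667 = q/p  (p > q: escape possible)
print(absorption_prob(0.4, 0.6))   # ≈ 1.0 (q ≥ p: absorption is certain)
```

The monotone convergence of the iterates mirrors the probabilistic fact that the truncated absorption probabilities (absorption within n steps) increase to F(1, 0).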
We can reformulate Theorem 2.7 in another way:

2.14 Definition. For any subset A of X, we denote by P_A the restriction of P to A:

    p_A(x, y) = p(x, y), if x, y ∈ A,  and  p_A(x, y) = 0, otherwise.

We consider P_A as a matrix over the whole of X, but the same notation will be used for the truncated matrix over the set A. It is not necessarily stochastic, but always substochastic: all row sums are ≤ 1.

The matrix P_A describes the evolution of the Markov chain constrained to staying in A, and the (x, y)-element of the matrix power P_A^n is

    p_A^(n)(x, y) = Pr_x[Z_n = y, Z_k ∈ A (0 ≤ k ≤ n)].   (2.15)

In particular, P_A^0 = I_A, the restriction of the identity matrix to A. For the associated Green function we write

    G_A(x, y|z) = ∑_{n=0}^∞ p_A^(n)(x, y) z^n,   G_A(x, y) = G_A(x, y|1).   (2.16)

(Caution: this is not the restriction of G(·, ·|z) to A!) Let r_A(x, y) be the radius of convergence of this power series, and r_A = inf{r_A(x, y) : x, y ∈ A}. If we write G_A(z) = (G_A(x, y|z))_{x,y∈A}, then

    (I_A − z P_A) G_A(z) = I_A  for all z ∈ C with |z| < r_A.   (2.17)
2.18 Lemma. Suppose that A ⊂ X is finite and that for each x ∈ A there is w ∈ X \ A such that x → w. Then r_A > 1. In particular, G_A(x, y) < ∞ for all x, y ∈ A.

Proof. We introduce a new state †, and equip the state space A ∪ {†} with the transition matrix Q given by

    q(x, y) = p(x, y),  q(x, †) = 1 − ∑_{v∈A} p(x, v),
    q(†, †) = 1,  and  q(†, x) = 0,  if x, y ∈ A.

Then Q_A = P_A, the only essential state of the Markov chain (A ∪ {†}, Q) is †, and A is convex. We can apply Theorem 2.7 and get r_A ≥ 1/(1 − ε), where ε > 0 is such that p_A^(n)(x, y) ≤ (1 − ε)^n for all x, y ∈ A and all but finitely many n. □
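Identity (2.17) at z = 1 gives a practical way to compute the restricted Green function when A is finite and satisfies the hypothesis of Lemma 2.18: G_A = (I_A − P_A)^{-1}. A hedged sketch (the random-walk example is ours, not the book's) that compares the matrix inverse with the partial sums of the power series (2.16):

```python
import numpy as np

# Simple random walk killed outside A = {1, 2, 3}: from k the walk moves
# to k-1 or k+1 with probability 1/2 each, so P_A is strictly substochastic.
A = [1, 2, 3]
P_A = np.array([[0.0 if abs(x - y) != 1 else 0.5 for y in A] for x in A])

G_A = np.linalg.inv(np.eye(len(A)) - P_A)           # identity (2.17) at z = 1
series = sum(np.linalg.matrix_power(P_A, n) for n in range(200))
print(np.max(np.abs(G_A - series)))                 # ≈ 0: the two agree
print(G_A[1, 1])                                    # expected visits to the middle state
```

Here G_A(x, x) is the expected number of visits to x before leaving A, which Lemma 2.18 guarantees to be finite.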
B The period of an irreducible class
In this section we consider an irreducible class C of a Markov chain (X, P). We exclude the trivial case when C consists of a single point x with p(x, x) = 0 (in this case, we say that x is an ephemeral state). In order to study the behaviour of (Z_n) inside C, it is sufficient to consider the restriction P_C of the transition matrix to the set C according to Definition 2.14,

    P_C = (p_C(x, y))_{x,y∈C},  where  p_C(x, y) = p(x, y), if x, y ∈ C, and 0, otherwise.
Indeed, for x, y ∈ C the probability p^(n)(x, y) coincides with the element p_C^(n)(x, y) of the n-th matrix power of P_C: this assertion is true for n = 1, and inductively

    p^(n+1)(x, y) = ∑_{w∈X} p(x, w) p^(n)(w, y)    [terms ≠ 0 only if w ∈ C]
                  = ∑_{w∈C} p(x, w) p^(n)(w, y)
                  = ∑_{w∈C} p_C(x, w) p_C^(n)(w, y) = p_C^(n+1)(x, y).

Obviously, P_C is stochastic if and only if the irreducible class C is essential.

By definition, C has the following property:

    For each x, y ∈ C there is k = k(x, y) such that p^(k)(x, y) > 0.
2.19 Definition. The period of C is the number

    d = d(C) = gcd({n > 0 : p^(n)(x, x) > 0}),

where x ∈ C.

Here, gcd is of course the greatest common divisor. We have to check that d(C) is well defined.
2.20 Lemma. The number d(C) does not depend on the specific choice of x ∈ C.

Proof. Let x, y ∈ C, x ≠ y. We write d(x) = gcd(N_x), where

    N_x = {n > 0 : p^(n)(x, x) > 0}   (2.21)

and analogously N_y and d(y). By irreducibility of C there are k, l > 0 such that p^(k)(x, y) > 0 and p^(l)(y, x) > 0. We have p^(k+l)(x, x) > 0, and hence d(x) divides k + l.

Let n ∈ N_y. Then p^(k+n+l)(x, x) ≥ p^(k)(x, y) p^(n)(y, y) p^(l)(y, x) > 0, whence d(x) divides k + n + l.

Combining these observations, we see that d(x) divides each n ∈ N_y. We conclude that d(x) divides d(y). By symmetry of the roles of x and y, also d(y) divides d(x), and thus d(x) = d(y). □
In Example 1.1 (the weather chain), C = X is one irreducible class; there is a state x with p(x, x) > 0, and hence d(C) = d(X) = 1.

In Example 1.46 (random walk with two absorbing barriers), the set C = {1, 2, ..., N − 1} is an irreducible class with period d(C) = 2.

In Example 2.3, the state 3 is ephemeral, and the periods of the other classes are d({1, 2}) = 2, d({4, 5, 6}) = 1, d({7, 8, 9, 10}) = 4, d({11, 12}) = 2 and d({13}) = 1.

If d(C) = 1 then C is called an aperiodic class. In particular, C is aperiodic when p(x, x) > 0 for some x ∈ C.
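For a finite chain the period can be computed directly from Definition 2.19, by collecting the return lengths n with p^(n)(x, x) > 0 and taking greatest common divisors. A small illustrative sketch (the two example matrices are invented; the cutoff `nmax` is an assumption that suffices for small chains):

```python
from math import gcd
import numpy as np

def period(P, x, nmax=60):
    """d(C) = gcd{ n > 0 : p^(n)(x,x) > 0 }, read off from powers up to nmax."""
    d = 0
    Q = np.eye(len(P))
    for n in range(1, nmax + 1):
        Q = Q @ P                 # Q = P^n
        if Q[x, x] > 0:
            d = gcd(d, n)
    return d

# Deterministic flip between two states: returns only at even times, period 2.
flip = np.array([[0.0, 1.0], [1.0, 0.0]])
print(period(flip, 0))   # 2

# A loop at state 0 puts 1 into N_x, so the class becomes aperiodic.
lazy = np.array([[0.5, 0.5], [1.0, 0.0]])
print(period(lazy, 0))   # 1
```

By Lemma 2.20 the answer does not depend on which state of the class is passed as `x`.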
2.22 Lemma. Let C be an irreducible class and d = d(C). For each x ∈ C there is m_x ∈ N such that p^(md)(x, x) > 0 for all m ≥ m_x.

Proof. First observe that the set N_x of (2.21) has the following property:

    n_1, n_2 ∈ N_x ⟹ n_1 + n_2 ∈ N_x.   (2.23)

(It is a semigroup.) It is well known from elementary number theory (and easy to prove) that the greatest common divisor of a set of positive integers can always be written as a finite linear combination of elements of the set with integer coefficients. Thus, there are n_1, ..., n_l ∈ N_x and a_1, ..., a_l ∈ Z such that

    d = ∑_{i=1}^l a_i n_i.

Let

    n_+ = ∑_{i : a_i > 0} a_i n_i  and  n_− = ∑_{i : a_i < 0} (−a_i) n_i.

Now n_+, n_− ∈ N_x by (2.23), and d = n_+ − n_−. We set k_+ = n_+/d and k_− = n_−/d. Then k_+ − k_− = 1. We define

    m_x = k_−(k_− − 1).

Let m ≥ m_x. We can write m = q k_− + r, where q ≥ k_− − 1 and 0 ≤ r ≤ k_− − 1. Hence, m = q k_− + r(k_+ − k_−) = (q − r) k_− + r k_+ with q − r ≥ 0 and r ≥ 0, and by (2.23),

    m d = (q − r) n_− + r n_+ ∈ N_x. □
Before stating the main result of this section, we observe that the relation → on the elements of X and the definition of irreducible classes do not depend on the fact that the matrix P is stochastic: more generally, the states can be classified with respect to an arbitrary non-negative matrix P and the associated graph, where an oriented edge is drawn from x to y when p(x, y) > 0.
2.24 Theorem. With respect to the matrix P_C^d, the irreducible class C decomposes into d = d(C) irreducible, aperiodic classes C_0, C_1, ..., C_{d−1}, which are visited in cyclic order by the original Markov chain: if u ∈ C_i, v ∈ C and p(u, v) > 0 then v ∈ C_{i+1}, where i + 1 is computed modulo d.

Schematically,

    C_0 → C_1 → ⋯ → C_{d−1} → C_0  (in single steps of P),  and

    x, y belong to the same C_i  ⟺  p^(md)(x, y) > 0 for some m ≥ 0.
Proof. In the following, we write x →[n] y when p^(n)(x, y) > 0. Let x_0 ∈ C. Since p^(m_0 d)(x_0, x_0) > 0 for some m_0 > 0, there are x_1, ..., x_{d−1}, x_d ∈ C such that

    x_0 →[1] x_1 →[1] ⋯ →[1] x_{d−1} →[1] x_d →[(m_0−1)d] x_0.

Define

    C_i = {x ∈ C : x_i →[md] x for some m ≥ 0},  i = 0, 1, ..., d − 1, d.

(1) C_i is the irreducible class of x_i with respect to P_C^d:

(a) We have x_i ∈ C_i.

(b) If x ∈ C_i then x ∈ C and x →[n] x_i for some n ≥ 0. Thus x_i →[md+n] x_i, and d must divide md + n. Consequently, d divides n, and x →[kd] x_i for some k ≥ 0. It follows that x and x_i communicate with respect to P_C^d, i.e., x is in the class of x_i with respect to P_C^d.

(c) Conversely, if x is in the class of x_i with respect to P_C^d, then there is m ≥ 0 such that x_i →[md] x, and x ∈ C_i.

(d) By Lemma 2.22, C_i is aperiodic with respect to P_C^d.

(2) C_d = C_0: indeed, x_0 →[d] x_d implies x_d ∈ C_0. By (1), C_d = C_0.

(3) C = ⋃_{i=0}^{d−1} C_i: if x ∈ C then x_0 →[kd+i] x, where k ≥ 0 and 0 ≤ i ≤ d − 1. By (1) and (2) there is l ≥ 0 such that x_i →[d−i] x_d →[ld] x_0 →[kd+i] x, that is, x_i →[md] x with m = k + l + 1. Therefore x ∈ C_i.

(4) If x ∈ C_i, y ∈ C and x →[1] y, then y ∈ C_{i+1} (i = 0, 1, ..., d − 1): there is n with y →[n] x. Hence x →[n+1] x, and d must divide n + 1. On the other hand, x →[ld] x_i (l ≥ 0) by (1), and x_i →[1] x_{i+1}. We get y →[n+ld+1] x_{i+1}. Now let k ≥ 0 be such that x_{i+1} →[k] y. (Such a k exists by the irreducibility of C with respect to P.) We obtain that y →[k+ld+(n+1)] y, and d divides k + ld + (n + 1). But then d divides k, and k = md with m ≥ 0, so that x_{i+1} →[md] y, which implies that y ∈ C_{i+1}. □
2.25 Example. Consider a Markov chain with the following graph.

[Figure 7: the graph of a chain on the states 1, 2, 3, 4, 5, 6.]

There is a unique irreducible class C = X = {1, 2, 3, 4, 5, 6}, the period is d = 3. Choosing x_0 = 1, x_1 = 2 and x_2 = 3, one obtains C_0 = {1}, C_1 = {2, 4, 5} and C_2 = {3, 6}, which are visited in cyclic order.
2.26 Exercise. Show that if (X, P) is irreducible and aperiodic, then also (X, P^m) has these properties for each m ∈ N.
C The spectral radius of an irreducible class
For x, y ∈ X, consider the number

    r(x, y) = 1 / limsup_{n→∞} (p^(n)(x, y))^{1/n},

already defined in (1.33) as the radius of convergence of the power series G(x, y|z). Its inverse 1/r(x, y) describes the exponential decay of the sequence (p^(n)(x, y)) as n → ∞.
2.27 Lemma. If x → w → y then r(x, y) ≤ min{r(x, w), r(w, y)}. In particular, x → y implies r(x, y) ≤ min{r(x, x), r(y, y)}.

If C is an irreducible class which does not consist of a single, ephemeral state, then r(x, y) =: r(C) is the same for all x, y ∈ C.

Proof. By assumption, there are numbers k, l ≥ 0 such that p^(k)(x, w) > 0 and p^(l)(w, y) > 0. The inequality p^(n+l)(x, y) ≥ p^(n)(x, w) p^(l)(w, y) yields

    ((p^(n+l)(x, y))^{1/(n+l)})^{(n+l)/n} ≥ (p^(n)(x, w))^{1/n} (p^(l)(w, y))^{1/n}.

As n → ∞, the second factor on the right hand side tends to 1, and 1/r(x, y) ≥ 1/r(x, w).

Analogously, from the inequality p^(n+k)(x, y) ≥ p^(k)(x, w) p^(n)(w, y) one obtains 1/r(x, y) ≥ 1/r(w, y).

If x → y, we can write x → x → y, and r(x, y) ≤ r(x, x) (setting w = x). Analogously, x → y → y implies r(x, y) ≤ r(y, y) (setting w = y).

To see the last statement, let x, y, v, w ∈ C. Then v → x → y → w, which yields r(v, w) ≤ r(x, y). Analogously, x → v → w → y implies r(x, y) ≤ r(v, w). □
Here is a characterization of r(C) in terms of U(x, x|z) = ∑_n u^(n)(x, x) z^n.
2.28 Proposition. For any x in the irreducible class C,

    r(C) = max{z > 0 : U(x, x|z) ≤ 1}.

Proof. Write r = r(x, x) and s = s(x, x) for the respective radii of convergence of the power series G(x, x|z) and U(x, x|z). Both functions are strictly increasing in z > 0. We know that s ≥ r. Also, U(x, x|z) = ∞ for real z > s. From the equation

    G(x, x|z) = 1 / (1 − U(x, x|z))

we read that U(x, x|z) < 1 for all positive z < r. We can let z tend to r from below and infer that U(x, x|r) ≤ 1. The proof will be concluded when we can show that for our power series, U(x, x|z) > 1 whenever z > r.

This is clear when U(x, x|r) = 1. So consider the case when U(x, x|r) < 1. Then we claim that s = r, whence U(x, x|z) = ∞ > 1 whenever z > r. Suppose by contradiction that s > r. Then there is a real z_0 with r < z_0 < s such that U(x, x|z_0) ≤ 1. Set u_n = u^(n)(x, x) z_0^n and v_n = p^(n)(x, x) z_0^n. Then (1.39) leads to the (renewal) recursion²

    v_0 = 1,  v_n = ∑_{k=1}^n u_k v_{n−k}  (n ≥ 1).

Since ∑_k u_k ≤ 1, induction on n yields v_n ≤ 1 for all n. Therefore

    G(x, x|z) = ∑_{n=0}^∞ v_n (z/z_0)^n

converges for all z ∈ C with |z| < z_0. But then r ≥ z_0, a contradiction. □
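Proposition 2.28 can be tested on the infinite drunkard's walk of Example 3.5 below: there U(0, 0|z) = 1 − √(1 − 4pqz²), so U ≤ 1 exactly up to z = 1/(2√(pq)), which is therefore r(C), and ρ(P_C) = 2√(pq). The following sketch (ours, for illustration) evaluates the truncated power series of U, using the standard closed form u^(2k)(0, 0) = C(2k, k)/(2k − 1) · (pq)^k for the first-return probabilities of the ±1 walk; the truncation level is an arbitrary assumption.

```python
from math import comb, sqrt

def U(z, p, kmax=400):
    # Truncated series of U(0,0|z) for the walk on Z with steps +1 (prob p),
    # -1 (prob q = 1-p); odd-order terms vanish.
    q = 1 - p
    return sum(comb(2*k, k) / (2*k - 1) * (p * q * z * z)**k
               for k in range(1, kmax))

p = 0.8
r = 1 / (2 * sqrt(p * (1 - p)))   # claimed value of r(C); rho(P_C) = 1/r
print(U(1.0, p))                  # 1 - |p - q| = 0.4: still < 1 at z = 1
print(U(r, p))                    # ≈ 1 (slightly below, due to truncation)
```

So the series stays ≤ 1 all the way out to r even though it already diverges just beyond, in agreement with the proposition.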
The number

    ρ(P_C) = 1/r(C) = limsup_{n→∞} (p^(n)(x, y))^{1/n},  x, y ∈ C,   (2.29)

is called the spectral radius of P_C, resp. of C. If the Markov chain (X, P) is irreducible, then ρ(P) is the spectral radius of P. The terminology is in part justified by the following (which is not essential for the rest of this chapter).

2.30 Proposition. If the irreducible class C is finite then ρ(P_C) is an eigenvalue of P_C, and every other eigenvalue λ satisfies |λ| ≤ ρ(P_C).

We won't prove this proposition right now. It is a consequence of the famous Perron–Frobenius theorem, which is of utmost importance in Markov chain theory, see Seneta [Se]. We shall give a detailed proof of that theorem in Section 3.D.

Let us also remark that in the case when C is infinite, the name "spectral radius" for ρ(P_C) may be misleading, since in general it does not refer to an action of P_C as a bounded linear operator on a suitable space. On the other hand, later on we shall encounter reversible irreducible Markov chains, and in this situation ρ(P) is a true spectral radius.

The following is a corollary of Theorem 2.7.
² In the literature, what is denoted u_k here is usually called f_k, and what is denoted v_n here is often called u_n. Our choice of the notation is caused by the necessity to distinguish between the generating functions F(x, y|z) and U(x, y|z) of the stopping times s_y and t_y, respectively, which are both needed at different instances in this book.
2.31 Proposition. Let C be a finite irreducible class. Then ρ(P_C) = 1 if and only if C is essential.

In other words, if P is a finite, irreducible, substochastic matrix then ρ(P) = 1 if and only if P is stochastic.

Proof. If C is non-essential then Theorem 2.7 shows that ρ(P_C) < 1. Conversely, if C is essential then all the matrix powers P_C^n are stochastic. Thus, we cannot have ρ(P_C) < 1, since in that case we would have ∑_{y∈C} p^(n)(x, y) → 0 as n → ∞. □
In the following, the class C need not be finite.
2.32 Theorem. Let C be an irreducible class which does not consist of a single, ephemeral state, and let d = d(C). If x, y ∈ C and p^(k)(x, y) > 0 (such a k must exist) then p^(m)(x, y) = 0 for all m ≢ k mod d, and

    lim_{m→∞} (p^(md+k)(x, y))^{1/(md+k)} = ρ(P_C) > 0.

In particular,

    lim_{m→∞} (p^(md)(x, x))^{1/(md)} = ρ(P_C)  and  p^(n)(x, x) ≤ ρ(P_C)^n  for all n ∈ N.
For the proof, we need the following.

2.33 Proposition. Let (a_n) be a sequence of non-negative real numbers such that a_n > 0 for all n ≥ n_0 and a_m a_n ≤ a_{m+n} for all m, n ∈ N. Then

    lim_{n→∞} a_n^{1/n} = ρ exists, with ρ > 0, and a_n ≤ ρ^n for all n ∈ N.

Proof. Let n, m ≥ n_0. For the moment, consider n fixed. We can write m = q_m n + r_m with q_m ≥ 0 and n_0 ≤ r_m < n_0 + n. In particular, a_{r_m} > 0, we have q_m → ∞ if and only if m → ∞, and in this case m/q_m → n. Let c = c_n = min{a_{r_m} : m ∈ N}. Then by assumption,

    a_m ≥ a_n^{q_m} a_{r_m} ≥ a_n^{q_m} c_n,  and  a_m^{1/m} ≥ a_n^{q_m/m} c_n^{1/m}.

As m → ∞, we get a_n^{q_m/m} → a_n^{1/n} and c_n^{1/m} → 1. It follows that

    ρ := liminf_{m→∞} a_m^{1/m} ≥ a_n^{1/n}  for each n ∈ N.

If now also n → ∞, then this implies

    liminf_{m→∞} a_m^{1/m} ≥ limsup_{n→∞} a_n^{1/n},

and lim_{n→∞} a_n^{1/n} exists and is equal to liminf_{n→∞} a_n^{1/n} = ρ. Since ρ ≥ a_n^{1/n} for each n, we also get a_n ≤ ρ^n. □
Proof of Theorem 2.32. By Lemma 2.22, the sequence (a_m) = (p^(md)(x, x)) fulfils the hypotheses of Proposition 2.33, and a_m^{1/(md)} → ρ(P_C) > 0.

We have p^(md+k)(x, y) ≥ p^(md)(x, x) p^(k)(x, y), and

    (p^(md+k)(x, y))^{1/(md+k)} ≥ ((p^(md)(x, x))^{1/(md)})^{md/(md+k)} (p^(k)(x, y))^{1/(md+k)}.

As p^(k)(x, y) > 0, the last factor tends to 1 as m → ∞, while by the above ((p^(md)(x, x))^{1/(md)})^{md/(md+k)} → ρ(P_C). We conclude that

    ρ(P_C) ≤ liminf_{m→∞} (p^(md+k)(x, y))^{1/(md+k)} ≤ limsup_{m→∞} (p^(md+k)(x, y))^{1/(md+k)} = limsup_{n→∞} (p^(n)(x, y))^{1/n} = ρ(P_C). □
2.34 Exercise. Modify Example 1.1 by making the rainy state absorbing: the rainy state now leads to itself with probability 1 and to every other state with probability 0. The other transition probabilities remain unchanged. Compute the spectral radius of the irreducible class formed by the remaining states.
Chapter 3
Recurrence and transience, convergence, and the
ergodic theorem
A Recurrent classes
The following concept is of central importance in Markov chain theory.

3.1 Definition. Consider a Markov chain (X, P). A state x ∈ X is called recurrent if

    U(x, x) = Pr_x[∃ n > 0 : Z_n = x] = 1,

and transient otherwise.

In words, x is recurrent if it is certain that the Markov chain starting at x will return to x.
We also define the probabilities

    H(x, y) = Pr_x[Z_n = y for infinitely many n ∈ N],  x, y ∈ X.

If it is certain that (Z_n) returns to x at least once, then it will return to x infinitely often with probability 1. If the probability to return at least once is strictly less than 1, then it is unlikely (probability 0) that (Z_n) will return to x infinitely often. In other words, H(x, x) cannot assume any values besides 0 or 1, as we shall prove in the next theorem. Such a statement is called a zero-one law.

3.2 Theorem. (a) The state x is recurrent if and only if H(x, x) = 1.

(b) The state x is transient if and only if H(x, x) = 0.

(c) We have H(x, y) = U(x, y) H(y, y).
Proof. We define

    H^(m)(x, y) = Pr_x[Z_n = y for at least m time instants n > 0].   (3.3)

Then

    H^(1)(x, y) = U(x, y)  and  H(x, y) = lim_{m→∞} H^(m)(x, y),

and

    H^(m+1)(x, y) = ∑_{k=1}^∞ Pr_x[t_y = k and Z_n = y for at least m instants n > k]

(using once more the rule Pr(A ∩ B) = Pr(A) Pr(B | A), if Pr(A) > 0)

    = ∑_{k : u^(k)(x,y)>0} u^(k)(x, y) Pr_x[Z_n = y for at least m time instants n > k | Z_k = y, Z_i ≠ y for i = 1, ..., k − 1]

(using the Markov property (1.5))

    = ∑_{k : u^(k)(x,y)>0} u^(k)(x, y) Pr_x[Z_n = y for at least m instants n > k | Z_k = y]

    = ∑_{k=1}^∞ u^(k)(x, y) H^(m)(y, y) = U(x, y) H^(m)(y, y).

Therefore H^(m)(x, x) = U(x, x)^m. As m → ∞, we get (a) and (b). Furthermore, H(x, y) = lim_{m→∞} U(x, y) H^(m)(y, y) = U(x, y) H(y, y), and we have proved (c). □
We now list a few properties of recurrent states.

3.4 Theorem. (a) The state x is recurrent if and only if G(x, x) = ∞.

(b) If x is recurrent and x → y, then U(y, x) = H(y, x) = 1, and y is recurrent. In particular, x is essential.

(c) If C is a finite essential class then all elements of C are recurrent.
Proof. (a) Observe that, by monotone convergence,¹

    U(x, x) = lim_{z→1−} U(x, x|z)  and  G(x, x) = lim_{z→1−} G(x, x|z).

Therefore Theorem 1.38 implies

    G(x, x) = lim_{z→1−} 1/(1 − U(x, x|z)) = ∞, if U(x, x) = 1,  and  = 1/(1 − U(x, x)), if U(x, x) < 1.

¹ We can interpret a power series ∑_n a_n z^n with a_n, z ≥ 0 as an integral ∫ f_z(n) dc(n), where c is the counting measure on N_0 and f_z(n) = a_n z^n. Then f_z(n) ↗ f_r(n) as z → r− (monotone limit from below), whence ∫ f_z(n) dc(n) → ∫ f_r(n) dc(n), so that this elementary fact from calculus is indeed a basic variant of the monotone convergence theorem of integration theory.
(b) We proceed by induction on n: if x →[n] y (that is, p^(n)(x, y) > 0), then U(y, x) = 1.

By assumption, U(x, x) = 1, and the statement is true for n = 0. Suppose that it holds for n and that x →[n+1] y. Then there is w ∈ X such that x →[n] w and p(w, y) > 0. By the induction hypothesis, U(w, x) = 1. By Theorem 1.38 (c) (with z = 1),

    1 = U(w, x) = p(w, x) + ∑_{v≠x} p(w, v) U(v, x).

Stochasticity of P implies

    0 = ∑_{v≠x} p(w, v)(1 − U(v, x)) ≥ p(w, y)(1 − U(y, x)) ≥ 0.

Since p(w, y) > 0, we must have U(y, x) = 1.

Hence H(y, x) = U(y, x) H(x, x) = U(y, x) = 1 for every y with x → y. By Exercise 2.6 (c), x is essential. We now show that also y is recurrent if x is recurrent and y ↔ x: there are k, l ≥ 0 such that p^(k)(x, y) > 0 and p^(l)(y, x) > 0. Consequently, by (a),

    G(y, y) ≥ ∑_{n=k+l}^∞ p^(n)(y, y) ≥ p^(l)(y, x) ( ∑_{n=0}^∞ p^(n)(x, x) ) p^(k)(x, y) = ∞.

(c) Since C is essential, when starting in x ∈ C, the Markov chain cannot exit from C:

    Pr_x[Z_n ∈ C for all n ∈ N] = 1.

Let ω be a trajectory with Z_n(ω) ∈ C for all n. Since C is finite, there is at least one y ∈ C (depending on ω) such that Z_n(ω) = y for infinitely many instants n. In other terms,

    {ω : Z_n(ω) ∈ C for all n ∈ N} ⊂ {ω : ∃ y ∈ C : Z_n(ω) = y for infinitely many n},

and

    1 = Pr_x[∃ y ∈ C : Z_n = y for infinitely many n] ≤ ∑_{y∈C} Pr_x[Z_n = y for infinitely many n] = ∑_{y∈C} H(x, y).

In particular, there must be y ∈ C such that 0 < H(x, y) = U(x, y) H(y, y), and Theorem 3.2 yields H(y, y) = 1: the state y is recurrent, and by (b) every element of C is recurrent. □
Recurrence is thus a property of irreducible classes: if C is irreducible then either all elements of C are recurrent or all are transient. Furthermore, a recurrent irreducible class must always be essential. In a Markov chain with finite state space, all essential classes are recurrent. If X is infinite, this is no longer true, as the next example shows.
3.5 Example (Infinite drunkard's walk). The state space is X = ℤ, and the transition probabilities are defined in terms of the two parameters p and q (p, q > 0, p + q = 1) as follows:
$$p(k, k+1) = p, \qquad p(k, k-1) = q, \qquad p(k,l) = 0 \ \text{ if } |k - l| \ne 1.$$
This Markov chain (random walk) can also be interpreted as a coin tossing game: if "heads" comes up then we win one Euro, and if "tails" comes up we lose one Euro. The coin is not necessarily fair; "heads" comes up with probability p and "tails" with probability q. The state k ∈ ℤ represents the possible (positive or negative) capital gain in Euros after some repeated coin tosses. If the single tosses are mutually independent, then one passes in a single step (toss) from capital k to k+1 with probability p, and to k−1 with probability q.

Observe that in this example we have translation invariance: F(k, l|z) = F(k−l, 0|z) = F(0, l−k|z), and the same holds for G(·,·|z). We compute U(0,0|z).

Reasoning precisely as in Example 2.10, we obtain
$$F(1, 0|z) = \frac{1}{2pz}\Bigl(1 - \sqrt{1 - 4pqz^2}\Bigr).$$
By symmetry, exchanging the roles of p and q,
$$F(-1, 0|z) = \frac{1}{2qz}\Bigl(1 - \sqrt{1 - 4pqz^2}\Bigr).$$
Now, by Theorem 1.38 (c),
$$U(0,0|z) = pz\, F(1,0|z) + qz\, F(-1,0|z) = 1 - \sqrt{1 - 4pqz^2}, \tag{3.6}$$
whence
$$U(0,0) = U(0,0|1) = 1 - \sqrt{(p-q)^2} = 1 - |p - q|.$$
There is a single irreducible (whence essential) class, but the random walk is recurrent only when p = q = 1/2.
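As a quick numerical sanity check of the formula U(0,0) = 1 − |p − q| (not from the book; the parameters below are chosen purely for illustration), one can estimate the return probability by simulating the walk and truncating at a finite horizon. Since the non-returning paths drift away linearly, the truncation bias is negligible for a moderate horizon.

```python
import random

def estimate_return_prob(p, trials=5000, horizon=500, seed=1):
    """Monte Carlo estimate of U(0,0): the probability that the walk on Z
    with Pr[step = +1] = p, Pr[step = -1] = 1 - p ever returns to 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = 0
        for _ in range(horizon):
            pos += 1 if rng.random() < p else -1
            if pos == 0:
                hits += 1
                break
    return hits / trials

p = 0.7
est = estimate_return_prob(p)
exact = 1 - abs(p - (1 - p))   # U(0,0) = 1 - |p - q|, here 0.6
print(est, exact)
```

The estimate should agree with 1 − |p − q| up to Monte Carlo error of a few percent.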
3.7 Exercise. Show that if ν is an arbitrary starting distribution and y is a transient state, then
$$\lim_{n\to\infty} \Pr_\nu[Z_n = y] = 0. \qquad\square$$
B Return times, positive recurrence, and stationary probability measures

Let x be a recurrent state of the Markov chain (X, P). Then it is certain that (Z_n), after starting in x, will return to x. That is, the return time t^x is Pr_x-almost surely finite. The expected return time (= number of steps) is
$$\mathbb{E}_x[\mathsf{t}^x] = \sum_{n=1}^{\infty} n\, u^{(n)}(x,x) = U'(x,x|1),$$
or more precisely, E_x[t^x] = U'(x,x|1) = lim_{z→1−} U'(x,x|z) by monotone convergence. This limit may be infinite.

3.8 Definition. A recurrent state x is called
positive recurrent, if E_x[t^x] < ∞, and
null recurrent, if E_x[t^x] = ∞.

In Example 3.5, the infinite drunkard's random walk is recurrent if and only if p = q = 1/2. In this case
$$U(0,0|z) = 1 - \sqrt{1 - z^2} \quad\text{and}\quad U'(0,0|z) = \frac{z}{\sqrt{1 - z^2}} \quad\text{for } |z| < 1,$$
so that U'(0,0|1) = ∞: the state 0 is null recurrent.

The next theorem shows that positive and null recurrence are class properties of (recurrent, essential) irreducible classes.
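Null recurrence of the symmetric walk can also be seen directly from the first-return probabilities, the power-series coefficients of U(0,0|z) = 1 − √(1 − z²). A small computation (not from the book; it uses the coefficient recursion u(1) = 1/2, u(n+1) = u(n)(2n−1)/(2n+2), which follows from the binomial formula for these coefficients) illustrates that the return probabilities sum to 1 while the expected return time diverges:

```python
# First-return probabilities u^(2n)(0,0) of the symmetric walk (p = q = 1/2),
# i.e. the coefficients of U(0,0|z) = 1 - sqrt(1 - z^2):
# u(1) = 1/2 and u(n+1) = u(n) * (2n - 1) / (2n + 2).
N = 200000
u = 0.5
total_mass = 0.0      # partial sums of u^(2n)(0,0)      -> 1 (recurrence)
expected_time = 0.0   # partial sums of 2n * u^(2n)(0,0) -> infinity (null recurrence)
for n in range(1, N + 1):
    total_mass += u
    expected_time += 2 * n * u
    u *= (2 * n - 1) / (2 * n + 2)
print(total_mass, expected_time)
```

The first sum approaches 1 slowly (the remainder decays like N^{−1/2}), while the second grows without bound, of order √N.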
3.9 Theorem. Suppose that x is positive recurrent and that x → y. Then also y is positive recurrent. Furthermore, E_y[t^x] < ∞.

Proof. We know from Theorem 3.4 (b) that y is recurrent. In particular, the Green function has convergence radii r(x,x) = r(y,y) = 1, and by Theorem 1.38 (a)
$$\frac{1 - U(x,x|z)}{1 - U(y,y|z)} = \frac{G(y,y|z)}{G(x,x|z)} \quad\text{for } 0 < z < 1.$$
As z → 1−, the right hand side becomes an expression of type ∞/∞. Since 0 < U'(y,y|1) ≤ ∞, an application of de l'Hospital's rule yields
$$\frac{\mathbb{E}_x(\mathsf{t}^x)}{\mathbb{E}_y(\mathsf{t}^y)} = \frac{U'(x,x|1)}{U'(y,y|1)} = \lim_{z\to 1-} \frac{G(y,y|z)}{G(x,x|z)}.$$
There are k, l > 0 such that p^(k)(x,y) > 0 and p^(l)(y,x) > 0. Therefore, if 0 < z < 1,
$$G(y,y|z) \;\ge\; \sum_{n=0}^{k+l-1} p^{(n)}(y,y)\, z^n \,+\, p^{(l)}(y,x)\, G(x,x|z)\, p^{(k)}(x,y)\, z^{k+l}.$$
We obtain
$$\lim_{z\to 1-} \frac{G(y,y|z)}{G(x,x|z)} \;\ge\; p^{(l)}(y,x)\, p^{(k)}(x,y) > 0.$$
In particular, we must have E_y(t^y) < ∞.

To see that also E_y(t^x) = U'(y,x|1) < ∞, suppose that x →ⁿ y. We proceed by induction on n as in the proof of Theorem 3.4 (b). If n = 0 then y = x and the statement is true. Suppose that it holds for n and that x →ⁿ⁺¹ y. Let w ∈ X be such that x →ⁿ w →¹ y. By Theorem 1.38 (c), for 0 < z ≤ 1,
$$U(w,x|z) = p(w,x)\,z + \sum_{v \ne x} p(w,v)\, z\, U(v,x|z).$$
Differentiating on both sides and letting z → 1 from below, we see that finiteness of U'(w,x|1) (which holds by the induction hypothesis) implies finiteness of U'(v,x|1) for all v with p(w,v) > 0, and in particular for v = y. □

An irreducible (essential) class is called positive or null recurrent if all its elements have the respective property.
3.10 Theorem. Let C be a finite essential class of (X, P). Then C is positive recurrent.

Proof. Since the matrix P^n is stochastic, we have for each x ∈ X
$$\sum_{y\in X} G(x,y|z) = \sum_{n=0}^{\infty} \sum_{y\in X} p^{(n)}(x,y)\, z^n = \frac{1}{1-z}, \qquad 0 \le z < 1.$$
Using Theorem 1.38, we can write G(x,y|z) = F(x,y|z)/(1 − U(y,y|z)). Thus we obtain the following important identity (that will be used again later on):
$$\sum_{y\in X} F(x,y|z)\, \frac{1-z}{1 - U(y,y|z)} = 1 \quad\text{for each } x \in X \text{ and } 0 \le z < 1. \tag{3.11}$$
Now suppose that x belongs to the finite essential class C. Then a non-zero contribution to the sum in (3.11) can come only from elements y ∈ C. We know already (Theorem 3.4) that C is recurrent. Therefore F(x,y|1) = U(y,y|1) = 1 for all x, y ∈ C. Since C is finite, we can exchange sum and limit and apply de l'Hospital's rule:
$$1 = \lim_{z\to 1-} \sum_{y\in C} F(x,y|z)\, \frac{1-z}{1 - U(y,y|z)} = \sum_{y\in C} \frac{1}{U'(y,y|1)}. \tag{3.12}$$
Therefore there must be y ∈ C such that U'(y,y|1) < ∞, so that y, and thus the class C, is positive recurrent. □
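Identity (3.12) says that the reciprocal expected return times of a finite recurrent class sum to 1. This is easy to test by simulation (not from the book; the 3×3 matrix below is an arbitrary small irreducible chain chosen for illustration):

```python
import random

# A small irreducible chain on {0, 1, 2}; rows are probability vectors.
P = [[0.0, 0.5, 0.5],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

def mean_return_time(P, x, trials=20000, rng=random.Random(7)):
    """Monte Carlo estimate of E_x[t^x], the expected return time to x."""
    total = 0
    for _ in range(trials):
        state, steps = x, 0
        while True:
            state = rng.choices(range(len(P)), weights=P[state])[0]
            steps += 1
            if state == x:
                break
        total += steps
    return total / trials

inv_sum = sum(1.0 / mean_return_time(P, x) for x in range(3))
print(inv_sum)   # (3.12) predicts the value 1
```

The estimated reciprocal return times sum to approximately 1, as predicted.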
3.13 Exercise. Use (3.11) to give a generating function proof of Theorem 3.4 (c). □

3.14 Exercise. Let (X̄, P̄) be a factor chain of (X, P); see (1.30). Show that if x is a recurrent state of (X, P), then its projection π(x) = x̄ is a recurrent state of (X̄, P̄). Also show that if x is positive recurrent, then so is x̄. □
It is convenient to think of measures ν on X as row vectors (ν(x))_{x∈X}, so that νP is the product of the row vector ν with the matrix P,
$$\nu P(y) = \sum_{x} \nu(x)\, p(x,y). \tag{3.15}$$
Here, we do not necessarily suppose that ν(X) is finite, so that the last sum might diverge.

In the same way, we consider real or complex functions f on X as column vectors, whence Pf is the function
$$Pf(x) = \sum_{y} p(x,y)\, f(y), \tag{3.16}$$
as long as this sum is defined (it might be a divergent series).

3.17 Definition. A (non-negative) measure ν on X is called invariant or stationary if νP = ν. It is called excessive if νP ≤ ν pointwise.

3.18 Exercise. Suppose that ν is an excessive measure and that ν(X) < ∞. Show that ν is stationary. □

We say that a set A ⊂ X carries the measure ν on X if the support of ν, supp(ν) = {x ∈ X : ν(x) ≠ 0}, is contained in A.
In the next theorem, the class C is not necessarily assumed to be finite.

3.19 Theorem. Let C be an essential class of (X, P). Then C is positive recurrent if and only if it carries a stationary probability measure m_C. In this case, the latter is unique and given by
$$m_C(x) = \begin{cases} 1/\mathbb{E}_x(\mathsf{t}^x), & \text{if } x \in C,\\[2pt] 0, & \text{otherwise.}\end{cases}$$

Proof. We first show that in the positive recurrent case, m_C is a stationary probability measure. When C is infinite, we cannot apply (3.11) directly to show that m_C(C) = 1, because a priori we are not allowed to exchange limit and sum in (3.12). However, if A ⊂ C is an arbitrary finite subset, then (3.11) implies that for 0 < z < 1 and x ∈ C,
$$\sum_{y\in A} F(x,y|z)\, \frac{1-z}{1 - U(y,y|z)} \le 1.$$
If we now let z → 1 from below and proceed as in (3.12), we find m_C(A) ≤ 1. Therefore the total mass of m_C is m_C(X) = m_C(C) ≤ 1.

Next, recall the identity (1.34). We do not only have G(z) = I + zP·G(z) but also, in the same way, G(z) = I + G(z)·zP. Thus, for y ∈ C,
$$G(y,y|z) = 1 + \sum_{x\in X} G(y,x|z)\, p(x,y)\, z. \tag{3.20}$$
We use Theorem 1.38 and multiply once more both sides by 1 − z. Since only elements x ∈ C contribute to the last sum,
$$\frac{1-z}{1 - U(y,y|z)} = 1 - z + \sum_{x\in C} F(y,x|z)\, \frac{1-z}{1 - U(x,x|z)}\, p(x,y)\, z. \tag{3.21}$$
Again, by recurrence, F(y,x|1) = U(x,x|1) = 1. As above, we restrict the last sum to an arbitrary finite A ⊂ C and then let z → 1. De l'Hospital's rule yields
$$\frac{1}{U'(y,y|1)} \;\ge\; \sum_{x\in A} \frac{1}{U'(x,x|1)}\, p(x,y).$$
Since this inequality holds for every finite A ⊂ C, it also holds with C itself² in the place of A. Thus, m_C is an excessive measure, and by Exercise 3.18 it is stationary. By positive recurrence, supp(m_C) = C, and we can normalize it so that it becomes a stationary probability measure. This proves the "only if" part.

To show the "if" part, suppose that ν is a stationary probability measure with supp(ν) ⊂ C. Let y ∈ C. Given ε > 0, since ν(C) = 1, there is a finite set A' ⊂ C such that ν(C \ A') < ε. Stationarity implies that νPⁿ = ν for each n ∈ ℕ₀, and
$$\nu(y) = \sum_{x\in C} \nu(x)\, p^{(n)}(x,y) \;\le\; \sum_{x\in A'} \nu(x)\, p^{(n)}(x,y) + \varepsilon.$$
Once again, we multiply both sides by (1−z)zⁿ (0 < z < 1) and sum over n:
$$\nu(y) \;\le\; \sum_{x\in A'} \nu(x)\, F(x,y|z)\, \frac{1-z}{1 - U(y,y|z)} + \varepsilon.$$
Suppose first that C is transient. Then U(y,y|1) < 1 for each y ∈ C by Theorem 3.4 (b), and when z → 1 from below, the last sum tends to 0. That is, ν(y) = 0 for each y ∈ C, a contradiction. Therefore C is recurrent, and we can apply once more de l'Hospital's rule, when z → 1:
$$\nu(y) - \varepsilon \;\le\; \sum_{x\in A'} \nu(x)\, m_C(y) = \nu(A')\, m_C(y) \;\le\; m_C(y)$$
for each ε > 0. Therefore ν(y) ≤ m_C(y) for each y ∈ X. Thus m_C(y) > 0 for at least one y ∈ C. This means that E_y(t^y) = 1/m_C(y) < ∞, and C must be positive recurrent.

Finally, since ν(X) = 1 and m_C(X) ≤ 1, we infer that m_C is indeed a probability measure, and ν = m_C. In particular, the stationary probability measure m_C carried by C is unique. □

²As a matter of fact, this argument, as well as the one used above to show that m_C(C) ≤ 1, amounts to applying Fatou's lemma of integration theory to the sum in (3.21), which once more is interpreted as an integral with respect to the counting measure on C.
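The characterization m_C(x) = 1/E_x(t^x) can be checked exactly for a small chain by linear algebra (not from the book; the matrix below is an arbitrary irreducible example): solve νP = ν for the stationary measure, and separately compute the expected return times from the standard first-passage equations h = 𝟙 + Qh, where Q deletes the row and column of the target state.

```python
import numpy as np

# Small irreducible chain; check m(x) = 1 / E_x[t^x] from Theorem 3.19.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
n = P.shape[0]

# Stationary measure: solve m(P - I) = 0 together with sum(m) = 1.
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.zeros(n + 1); b[-1] = 1.0
m = np.linalg.lstsq(A, b, rcond=None)[0]

# Expected return times: h(y) = E_y[hitting time of x] solves h = 1 + Q h,
# where Q is P with the row and column of x deleted; then
# E_x[t^x] = 1 + sum_{y != x} p(x,y) h(y).
inv_return = []
for x in range(n):
    others = [y for y in range(n) if y != x]
    Q = P[np.ix_(others, others)]
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    inv_return.append(1.0 / (1.0 + P[x, others] @ h))

print(m, inv_return)   # the two vectors should coincide
```

Both computations give the same vector, as the theorem predicts.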
3.22 Exercise. Reformulate the method of the second part of the proof of Theorem 3.19 to show the following.
If (X, P) is an arbitrary Markov chain and ν a stationary probability measure, then ν(y) > 0 implies that y is a positive recurrent state. □
3.23 Corollary. The Markov chain (X, P) admits stationary probability measures if and only if there are positive recurrent states.

In this case, let C_i, i ∈ I, be those essential classes which are positive recurrent (with I = ℕ or I = {1, …, k}, k ∈ ℕ). For i ∈ I, let m_i = m_{C_i} be the stationary probability measure of (C_i, P_{C_i}) according to Theorem 3.19. Consider m_i as a measure on X with m_i(x) = 0 for x ∈ X \ C_i.

Then the stationary probability measures of (X, P) are precisely the convex combinations
$$\nu = \sum_{i\in I} c_i\, m_i, \quad\text{where } c_i \ge 0 \text{ and } \sum_{i\in I} c_i = 1.$$

Proof. Let ν be a stationary probability measure. By Exercise 3.22, every x with ν(x) > 0 must be positive recurrent. Therefore there must be positive recurrent essential classes C_i, i ∈ I. By Theorem 3.19, the restriction of ν to any of the C_i must be a non-negative multiple of m_i. Therefore ν must have the proposed form.

Conversely, it is clear that any convex combination of the m_i is a stationary probability measure. □
3.24 Exercise. Let C be a positive recurrent essential class, d = d(C) its period, and C_0, …, C_{d−1} its periodic classes (the irreducible classes of P^d_C) according to Theorem 2.24. Determine the stationary probability measure of P^d_C on C_i in terms of the stationary probability measure m_C of P on C. Compute first m_C(C_i) for i = 0, …, d−1. □
C The convergence theorem for finite Markov chains

We shall now study the question of whether the transition probabilities p^(n)(x,y) converge to a limit as n → ∞. If y is a transient state, then we know from Theorem 3.4 (a) that G(x,y) = F(x,y)G(y,y) < ∞, so that p^(n)(x,y) → 0. Thus, the question is of interest when y is essential. For the moment, we restrict attention to the case when x and y belong to the same essential class. Since (Z_n) cannot exit from that class, we may assume without loss of generality that this class is the whole of X. That is, we assume that we have an irreducible Markov chain (X, P). Before stating results, let us see what we expect in the specific case when X is finite.

Suppose that X is finite and that p^(n)(x,y) converges for all x, y as n → ∞. The Markov property ("absence of memory") suggests that in the long run (as n → ∞) the starting point should be "forgotten", that is, the limit should not depend on x. Thus, suppose that lim_n p^(n)(x,y) = m(y) for all y, where m is a measure on X. Then, since X is finite and Pⁿ is stochastic for each n, we should have m(X) = 1, whence m(x) > 0 for some x. But then p^(n)(x,x) > 0 for all but finitely many n, so that P should be aperiodic. Also, if we let n → ∞ in the relation
$$p^{(n+1)}(x,y) = \sum_{w\in X} p^{(n)}(x,w)\, p(w,y),$$
we find that m should be the stationary probability measure.

We now know what we are looking for and which hypotheses we need, and can start to work, without supposing right away that X is finite.
Consider the set of all probability distributions on X,
$$\mathcal{M}(X) = \Bigl\{\nu : X \to \mathbb{R} \;\Big|\; \nu(x) \ge 0 \text{ for all } x \in X \text{ and } \sum_{x\in X} \nu(x) = 1\Bigr\}.$$
We consider 𝓜(X) as a subset of ℓ¹(X). In particular, 𝓜(X) is closed in the metric induced by
$$\|\nu_1 - \nu_2\|_1 = \sum_{x\in X} |\nu_1(x) - \nu_2(x)|.$$
(The number ‖ν₁ − ν₂‖₁/2 is usually called the total variation norm of the signed measure ν₁ − ν₂.) The transition matrix P acts on 𝓜(X), according to (3.15), by ν ↦ νP, which is in 𝓜(X) by stochasticity of P. Our goal is to apply the Banach fixed point theorem to this "contraction", if possible. For y ∈ X, we define
$$a(y) = a(y, P) = \inf_{x\in X} p(x,y) \qquad\text{and}\qquad \tau = \tau(P) = 1 - \sum_{y\in X} a(y).$$
Thus, a(y) is the infimum of the y-column of P, and 0 ≤ τ ≤ 1. Indeed, if x ∈ X then p(x,y) ≥ a(y) for each y, and
$$1 = \sum_{y\in X} p(x,y) \;\ge\; \sum_{y\in X} a(y).$$
We remark that τ(P) is a so-called ergodic coefficient, and that the methods that we are going to use are a particular instance of the theory of those coefficients; see e.g. Seneta [Se, §4.3] or Isaacson and Madsen [I-M, Ch. V]. We have τ(P) < 1 if and only if P has a column where all elements are strictly positive. Also, τ(P) = 0 if and only if all rows of P coincide (i.e., they coincide with a single probability measure on X).
3.25 Lemma. For all ν₁, ν₂ ∈ 𝓜(X),
$$\|\nu_1 P - \nu_2 P\|_1 \;\le\; \tau(P)\, \|\nu_1 - \nu_2\|_1.$$

Proof. For each y ∈ X we have
$$\begin{aligned}
\nu_1 P(y) - \nu_2 P(y) &= \sum_{x\in X} \bigl(\nu_1(x) - \nu_2(x)\bigr)\, p(x,y)\\
&= \sum_{x\in X} |\nu_1(x) - \nu_2(x)|\, p(x,y) \;-\; \sum_{x\in X} \Bigl(|\nu_1(x) - \nu_2(x)| - \bigl(\nu_1(x) - \nu_2(x)\bigr)\Bigr)\, p(x,y)\\
&\le \sum_{x\in X} |\nu_1(x) - \nu_2(x)|\, p(x,y) \;-\; \sum_{x\in X} \Bigl(|\nu_1(x) - \nu_2(x)| - \bigl(\nu_1(x) - \nu_2(x)\bigr)\Bigr)\, a(y)\\
&= \sum_{x\in X} |\nu_1(x) - \nu_2(x)|\,\bigl(p(x,y) - a(y)\bigr),
\end{aligned}$$
where the last identity uses that Σ_x (ν₁(x) − ν₂(x)) = 0. By symmetry,
$$|\nu_1 P(y) - \nu_2 P(y)| \;\le\; \sum_{x\in X} |\nu_1(x) - \nu_2(x)|\,\bigl(p(x,y) - a(y)\bigr).$$
Therefore,
$$\|\nu_1 P - \nu_2 P\|_1 \;\le\; \sum_{y\in X}\sum_{x\in X} |\nu_1(x)-\nu_2(x)|\,\bigl(p(x,y)-a(y)\bigr) = \sum_{x\in X}|\nu_1(x)-\nu_2(x)| \sum_{y\in X}\bigl(p(x,y)-a(y)\bigr) = \tau(P)\,\|\nu_1-\nu_2\|_1.$$
This proves the lemma. □
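The contraction inequality of Lemma 3.25 is easy to probe numerically (not from the book; the random matrix and test distributions below are chosen for illustration): compute τ(P) from the column infima and compare it with the worst observed ratio ‖ν₁P − ν₂P‖₁ / ‖ν₁ − ν₂‖₁ over random pairs of distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stochastic matrix with strictly positive entries, so tau(P) < 1.
P = rng.random((4, 4)) + 0.05
P /= P.sum(axis=1, keepdims=True)

tau = 1.0 - P.min(axis=0).sum()   # tau(P) = 1 - sum_y min_x p(x, y)

# Worst observed contraction ratio over random pairs of distributions.
worst = 0.0
for _ in range(1000):
    v1 = rng.random(4); v1 /= v1.sum()
    v2 = rng.random(4); v2 /= v2.sum()
    lhs = np.abs(v1 @ P - v2 @ P).sum()
    rhs = np.abs(v1 - v2).sum()
    if rhs > 0:
        worst = max(worst, lhs / rhs)

print(tau, worst)   # the observed ratio never exceeds tau(P)
```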
The following theorem does not (yet) assume finiteness of X.

3.26 Theorem. Suppose that (X, P) is irreducible and such that τ(P^k) < 1 for some k ∈ ℕ. Then P is aperiodic, (positive) recurrent, and there is t̄ < 1 such that for each ν ∈ 𝓜(X),
$$\|\nu P^n - m\|_1 \;\le\; 2\, \bar{t}^{\,n} \quad\text{for all } n \ge k,$$
where m is the unique stationary probability distribution of P.

Proof. The inequality τ(P^k) < 1 implies that a(y, P^k) > 0 for some y ∈ X. For this y there is at least one x ∈ X with p(y,x) > 0. We have p^(k)(y,y) > 0 and p^(k)(x,y) > 0, and also p^(k+1)(y,y) ≥ p(y,x) p^(k)(x,y) > 0. For the period d of P we thus have d | k and d | k+1. Consequently d = 1.

Set t = τ(P^k). By Lemma 3.25, the mapping ν ↦ νP^k is a contraction of 𝓜(X). It follows from Banach's fixed point theorem that there is a unique m ∈ 𝓜(X) with m = mP^k, and
$$\|\nu P^{kl} - m\|_1 \;\le\; t^l\, \|\nu - m\|_1 \;\le\; 2\,t^l.$$
In particular, for ν = mP ∈ 𝓜(X) we obtain
$$mP = (mP^{kl})P = (mP)P^{kl} \to m, \quad\text{as } l \to \infty,$$
whence mP = m. Theorem 3.19 implies positive recurrence, and m(x) = 1/E_x(t^x). [Note: it is only for this last conclusion that we use Theorem 3.19 here, while for the rest, the latter theorem and its proof are not essential in the present section.]

If n ∈ ℕ, write n = kl + r, where r ∈ {0, …, k−1}. Then, for ν ∈ 𝓜(X),
$$\|\nu P^n - m\|_1 = \|(\nu P^r)P^{kl} - m\|_1 \;\le\; 2\,t^l \;\le\; 2\, \bar{t}^{\,n},$$
where t̄ = t^{1/(2k)}; indeed, with l = ⌊n/k⌋ one has l ≥ n/(2k) whenever n ≥ k. □
We can apply this theorem to Example 1.1. We have
$$P = \begin{pmatrix} 0 & 1/2 & 1/2\\ 1/4 & 1/2 & 1/4\\ 1/4 & 1/4 & 1/2 \end{pmatrix},$$
and τ(P) = 1/2. Therefore, with ν = δ_x, x ∈ X,
$$\sum_{y\in X} |p^{(n)}(x,y) - m(y)| \;\le\; 2^{-n+1},$$
where the stationary probability measure m assigns the values 1/5, 2/5, 2/5 to the three states of Example 1.1, in the order of the rows of P above.
One of the important features of Theorem 3.26 is that it provides an estimate for the speed of convergence of p^(n)(x,·) to m, and that convergence is exponentially fast.

3.27 Exercise. Let X = ℕ and p(k, k+1) = p(k, 1) = 1/2 for all k ∈ ℕ, while all other transition probabilities are 0. (Draw the graph of this Markov chain.) Compute the stationary probability measure and estimate the speed of convergence. □
Let us now see how Theorem 3.26 can be applied to Markov chains with finite state space.

3.28 Theorem. Let (X, P) be an irreducible, aperiodic Markov chain with finite state space. Then there are k ∈ ℕ and t̄ < 1 such that for the stationary probability measure m(y) = 1/E_y(t^y) one has
$$\sum_{y\in X} |p^{(n)}(x,y) - m(y)| \;\le\; 2\, \bar{t}^{\,n} \quad\text{for every } x \in X \text{ and } n \ge k.$$

Proof. We show that τ(P^k) < 1 for some k ∈ ℕ.

Given x, y ∈ X, by irreducibility we can find m = m_{x,y} ∈ ℕ such that p^(m)(x,y) > 0. Finiteness of X allows us to define
$$k_1 = \max\{m_{x,y} : x, y \in X\}.$$
By Lemma 2.22, for each x ∈ X there is ℓ_x such that p^(q)(x,x) > 0 for all q ≥ ℓ_x. Define
$$k_2 = \max\{\ell_x : x \in X\}.$$
Let k = k_1 + k_2. If x, y ∈ X and n ≥ k, then n = m + q, where m = m_{x,y} and q ≥ k_2 ≥ ℓ_y. Consequently,
$$p^{(n)}(x,y) \;\ge\; p^{(m)}(x,y)\, p^{(q)}(y,y) > 0.$$
We have proved that for each n ≥ k, all matrix elements of Pⁿ are strictly positive. In particular, τ(P^k) < 1, and Theorem 3.26 applies. □
A more precise estimate of the rate of convergence to 0 of ‖νPⁿ − m‖₁, for finite X and irreducible, aperiodic P, is provided by the Perron–Frobenius theorem: ‖νPⁿ − m‖₁ can be compared with λ_*ⁿ, where λ_* = max{|λ_i| : |λ_i| < 1} and the λ_i are the eigenvalues of the matrix P. We shall come back to this topic for specific classes of Markov chains in a later chapter, and conclude this section with an important consequence of the preceding results, regarding the spectrum of P.
3.29 Theorem. (1) If X is finite and P is irreducible, then λ = 1 is an eigenvalue of P. The left eigenspace consists of all constant multiples of the unique stationary probability measure m, while the right eigenspace consists of all constant functions on X.

(2) If d = d(P) is the period of P, then the (complex) eigenvalues λ of P with |λ| = 1 are precisely the d-th roots of unity λ_j = e^{2πij/d}, j = 0, …, d−1. All other eigenvalues satisfy |λ| < 1.

Proof. (1) We start with the right eigenspace and use a tool that we shall re-prove in more generality later on under the name of the maximum principle. Let f : X → ℂ be a function such that Pf = f — in the notation of (3.16) — a harmonic function. Since both P and λ = 1 are real, the real and imaginary parts of f must also be harmonic. That is, we may assume that f is real-valued. We also have Pⁿf = f for each n.

Since X is finite, there is x such that f(x) ≥ f(y) for all y ∈ X. We have
$$\sum_{y\in X} p^{(n)}(x,y)\,\bigl(f(x) - f(y)\bigr) = f(x) - P^n f(x) = 0,$$
where each summand is ≥ 0. Therefore f(y) = f(x) for each y with p^(n)(x,y) > 0. By irreducibility, we can find such an n for every y ∈ X, so that f must be constant.

Regarding the left eigenspace, we know already from Theorem 3.26 that all non-negative left eigenvectors must be constant multiples of the unique stationary probability measure m. In order to show that there may be no other ones, we use a method that we shall also elaborate in more detail in a later chapter, namely time reversal. We define the m-reverse P̌ of P by
$$\check{p}(x,y) = m(y)\,p(y,x)/m(x). \tag{3.30}$$
It is well-defined since m(x) > 0, and stochastic since m is stationary. Also, it inherits irreducibility from P. Indeed, the graph Γ(P̌) is obtained from Γ(P) by inverting the orientation of each edge. This operation preserves strong connectedness. Thus, we can apply the above to P̌: every right eigenfunction of P̌ with eigenvalue 1 is constant. The following is easy.

A (complex) measure ν on X satisfies νP = ν if and only if its density φ(x) = ν(x)/m(x) with respect to m satisfies P̌φ = φ. (3.31)

Therefore φ must be constant, and ν = c·m.

(2) First, suppose that (X, P) is aperiodic. Let λ ∈ ℂ be an eigenvalue of P with |λ| = 1, and let f : X → ℂ be an associated eigenfunction. Then |f| = |Pf| ≤ P|f| ≤ Pⁿ|f| for each n ∈ ℕ. Finiteness of X and the convergence theorem imply that
$$|f(x)| \;\le\; \lim_{n\to\infty} P^n|f|(x) = \sum_{y\in X} |f(y)|\, m(y)$$
for each x ∈ X. If we choose x such that |f(x)| is maximal, then we obtain
$$\sum_{y\in X} \bigl(|f(x)| - |f(y)|\bigr)\, m(y) = 0,$$
and we see that |f| is constant, without loss of generality |f| ≡ 1. There is n such that p^(n)(x,y) > 0 for all x, y. Therefore
$$\lambda^n f(x) = \sum_{y\in X} p^{(n)}(x,y)\, f(y)$$
is a strict convex combination of the numbers f(y) ∈ ℂ, y ∈ X, which all lie on the unit circle. If these numbers did not all coincide, then λⁿf(x) would have to lie in the interior of the circle, a contradiction. Therefore f is constant, and λ = 1.

Now let d = d(P) be arbitrary, and let C_0, …, C_{d−1} be the periodic classes according to Theorem 2.24. Then we know that the restriction Q_k of P^d to C_k is stochastic, irreducible and aperiodic for each k. If Pf = λf on X, where |λ| = 1, then the restriction f_k of f to C_k satisfies Q_k f_k = λ^d f_k. By the above, we conclude that λ^d = 1. Conversely, if we define f on X by f ≡ λ_j^k on C_k, where λ_j = e^{2πij/d}, then Pf = λ_j f, and λ_j is an eigenvalue of P.

Stochasticity implies that all other eigenvalues satisfy |λ| < 1. □

3.32 Exercise. Prove (3.31). □
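The spectral picture of part (2) can be visualized for a concrete periodic chain (a small illustration, not from the book): the deterministic cycle of length 3 has period d = 3, and its peripheral eigenvalues are exactly the cube roots of unity; a "lazy" aperiodic modification retains only the eigenvalue 1 on the unit circle.

```python
import numpy as np

# Deterministic cycle on three states: irreducible with period d = 3.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
eig = np.linalg.eigvals(P)
# angles of the eigenvalues of modulus 1: -2*pi/3, 0, 2*pi/3 (cube roots of unity)
on_circle = sorted(np.angle(l) for l in eig if abs(abs(l) - 1.0) < 1e-9)

lazy = 0.5 * (np.eye(3) + P)   # aperiodic modification of the same chain
n_on_circle = sum(1 for l in np.linalg.eigvals(lazy) if abs(abs(l) - 1.0) < 1e-9)

print(on_circle, n_on_circle)  # n_on_circle = 1: only the eigenvalue 1 remains
```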
D The Perron–Frobenius theorem

We now make a small detour into more general matrix analysis. Theorems 3.28 and 3.29 are often presented as special cases of the Perron–Frobenius theorem, which is a landmark of matrix analysis; see Seneta [Se]. Here we take a reversed viewpoint and regard those theorems as the first part of its proof.

We consider a non-negative, finite square matrix A, which we write A = (a(x,y))_{x,y∈X} (instead of (a_{ij})_{i,j=1,…,N}) in order to stay close to our usual notation. The set X is assumed finite. The n-th matrix power is written Aⁿ = (a^(n)(x,y))_{x,y∈X}. As usual, we think of column vectors as functions and of row vectors as measures on X. The definition of irreducibility remains the same as for stochastic matrices.
3.33 Theorem. Let A = (a(x,y))_{x,y∈X} be a finite, irreducible, non-negative matrix, and let
$$\rho(A) = \limsup_{n\to\infty} a^{(n)}(x,y)^{1/n}.$$
Then ρ(A) > 0 is independent of x and y, and
$$\rho(A) = \min\{t > 0 \mid \text{there is } g : X \to (0,\infty) \text{ with } Ag \le t\,g\}.$$

Proof. The property that limsup_{n→∞} a^(n)(x,y)^{1/n} is the same for all x, y ∈ X follows from irreducibility exactly as for Markov chains. Choose x and y in X. By irreducibility, a^(k)(x,y) > 0 and a^(l)(y,x) > 0 for some k, l > 0. Therefore c = a^(k+l)(x,x) > 0, and a^{(n(k+l))}(x,x) ≥ cⁿ. We deduce that ρ(A) ≥ c^{1/(k+l)} > 0.

In order to prove the variational characterization of ρ(A), let
$$t_0 = \inf\{t > 0 \mid \text{there is } g : X \to (0,\infty) \text{ with } Ag \le t\,g\}.$$
If Ag ≤ t·g, where g is a positive function on X, then also Aⁿg ≤ tⁿg. Thus a^(n)(x,y) g(y) ≤ tⁿ g(x), whence
$$a^{(n)}(x,y)^{1/n} \;\le\; t\,\bigl(g(x)/g(y)\bigr)^{1/n}$$
for each n. This implies ρ(A) ≤ t, and therefore ρ(A) ≤ t₀.

Next, let G(x,y|z) = Σ_{n=0}^∞ a^(n)(x,y) zⁿ. This power series has radius of convergence 1/ρ(A), and
$$G(x,y|z) = \delta_x(y) + \sum_{w} a(x,w)\, z\, G(w,y|z).$$
Now let t > ρ(A), fix y ∈ X and set
$$g_t(x) = G(x,y|1/t)\,\big/\,G(y,y|1/t).$$
Then we see that g_t(x) > 0 and A g_t(x) ≤ t·g_t(x) for all x. Therefore t₀ = ρ(A).

We still need to show that the infimum is a minimum. Note that g_t(y) = 1 for our fixed y. We choose a strictly decreasing sequence (t_k)_{k≥1} with limit ρ(A) and set g_k = g_{t_k}. Then, for each n ∈ ℕ and x ∈ X,
$$t_1^n \;\ge\; t_k^n = t_k^n\, g_k(y) \;\ge\; A^n g_k(y) \;\ge\; a^{(n)}(y,x)\,g_k(x).$$
For each x there is n_x such that a^{(n_x)}(y,x) > 0. Therefore
$$g_k(x) \;\le\; C_x = t_1^{n_x}\big/a^{(n_x)}(y,x) < \infty \quad\text{for all } k.$$
By the Heine–Borel theorem, there must be a subsequence (g_{k(m)})_{m≥1} that converges pointwise to a limit function h. We have for each x ∈ X
$$\sum_{w\in X} a(x,w)\, g_{k(m)}(w) \;\le\; t_{k(m)}\, g_{k(m)}(x).$$
We can pass to the limit as m → ∞ and obtain Ah ≤ ρ(A)·h. Furthermore, h ≥ 0 and h(y) = 1. Therefore ρ(A)ⁿ h(x) ≥ a^(n)(x,y)h(y) > 0 if n is such that a^(n)(x,y) > 0. Thus, the infimum is indeed a minimum. □
3.34 Definition. The matrix A is called primitive if there exists an n₀ such that a^{(n₀)}(x,y) > 0 for all x, y ∈ X.

Since X is finite, this amounts to "irreducible & aperiodic" for stochastic matrices (finite Markov chains); compare with the proof of Theorem 3.28.
3.35 Perron–Frobenius theorem. Let A = (a(x,y))_{x,y∈X} be a finite, irreducible, non-negative matrix. Then

(a) ρ(A) is an eigenvalue of A, and |λ| ≤ ρ(A) for every eigenvalue λ of A.

(b) There is a strictly positive function h on X that spans the right eigenspace of A with respect to the eigenvalue ρ(A). Furthermore, if a function g : X → [0, ∞) satisfies Ag ≤ ρ(A)·g, then g = c·h for some constant c.

(c) There is a strictly positive measure ν on X that spans the left eigenspace of A with respect to the eigenvalue ρ(A). Furthermore, if a non-negative measure μ on X satisfies μA ≤ ρ(A)·μ, then μ = c·ν for some constant c.

(d) If in addition A is primitive, and ν and h are normalized such that Σ_x h(x)ν(x) = 1, then |λ| < ρ(A) for every λ ∈ spec(A) \ {ρ(A)}, and
$$\lim_{n\to\infty} a^{(n)}(x,y)\big/\rho(A)^n = h(x)\,\nu(y) \quad\text{for all } x, y \in X.$$

Proof. We know from Theorem 3.33 that there is a positive function h on X that satisfies Ah ≤ ρ(A)·h. We define a new matrix P over X by
$$p(x,y) = \frac{a(x,y)\, h(y)}{\rho(A)\, h(x)}. \tag{3.36}$$
This matrix is substochastic. Furthermore, it inherits irreducibility from A. If D = D_h is the diagonal matrix with diagonal entries h(x), x ∈ X, then
$$P = \frac{1}{\rho(A)}\, D^{-1} A D, \quad\text{whence}\quad P^n = \frac{1}{\rho(A)^n}\, D^{-1} A^n D \quad\text{for every } n \in \mathbb{N}.$$
That is,
$$p^{(n)}(x,y) = \frac{a^{(n)}(x,y)\, h(y)}{\rho(A)^n\, h(x)}. \tag{3.37}$$
Taking n-th roots and the limsup as n → ∞, we see that ρ(P) = 1. Now Proposition 2.31 tells us that P must be stochastic. But this is equivalent with Ah = ρ(A)·h. Therefore ρ(A) is an eigenvalue of A, and h belongs to the right eigenspace. Now, similarly to (3.31),

A (complex) function g on X satisfies Ag = λ·g if and only if φ(x) = g(x)/h(x) satisfies P φ = (λ/ρ(A))·φ. (3.38)

Since P is stochastic, |λ/ρ(A)| ≤ 1. Thus, we have proved (a).

Furthermore, if λ = ρ(A) in (3.38), then Pφ = φ, and Theorem 3.29 implies that φ is constant, φ ≡ c. Therefore g = c·h, which shows that the right eigenspace of A with respect to the eigenvalue ρ(A) is spanned by h. Next, let g ≠ 0 be a non-negative function with Ag ≤ ρ(A)·g. Then Aⁿg ≤ ρ(A)ⁿg for each n, and irreducibility of A implies that g > 0. We can replace h with g in our initial argument, and obtain that Ag = ρ(A)·g. But this yields that g is a multiple of h. We have completed the proof of (b).

Statement (c) follows by replacing A with the transposed matrix Aᵀ.

Finally, we prove (d). Since A is primitive, P is irreducible and aperiodic. The fact that |λ| < ρ(A) for each eigenvalue λ ≠ ρ(A) of A follows from Theorem 3.29 via the arguments used to prove (b): namely, λ is an eigenvalue of A if and only if λ/ρ(A) is an eigenvalue of P.

Suppose that ν and h are normalized as proposed. Then m(x) = ν(x)h(x) is a probability measure on X, and we can write m = νD. Therefore
$$mP = \frac{1}{\rho(A)}\, \nu D D^{-1} A D = \frac{1}{\rho(A)}\, \nu A D = \nu D = m.$$
We see that m is the unique stationary probability measure for P. Theorem 3.28 implies that p^(n)(x,y) → m(y) = h(y)ν(y). Combining this with (3.37), we get the proposed asymptotic formula. □
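The asymptotic formula of part (d) converges geometrically, driven by the gap between ρ(A) and the second-largest eigenvalue modulus. A short check (not from the book; the random strictly positive, hence primitive, matrix is chosen for illustration) computes both Perron vectors, normalizes them as in the theorem, and compares Aⁿ/ρⁿ with the rank-one limit h(x)ν(y):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.random((4, 4)) + 0.1   # strictly positive => primitive

w, V = np.linalg.eig(A)
k = int(np.argmax(np.abs(w)))
rho = w[k].real
h = np.abs(V[:, k].real)       # right Perron vector

wl, U = np.linalg.eig(A.T)
kl = int(np.argmax(np.abs(wl)))
nu = np.abs(U[:, kl].real)     # left Perron vector

nu /= nu @ h                   # normalize so that sum_x h(x) nu(x) = 1

n = 60
approx = np.linalg.matrix_power(A, n) / rho ** n
exact = np.outer(h, nu)        # the limit h(x) nu(y) of Theorem 3.35 (d)
err = np.abs(approx - exact).max()
print(err)                     # small, thanks to the spectral gap
```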
The following is usually also considered as part of the Perron–Frobenius theorem.

3.39 Proposition. Under the assumptions of Theorem 3.35, ρ(A) is a simple root of the characteristic polynomial of the matrix A.

Proof. Recall the definition and properties of the adjunct matrix A^♯ of a square matrix A (not to be confused with the adjoint matrix). Its elements have the form
$$a^{\sharp}(x,y) = (-1)^{\varepsilon} \det(A \mid y, x),$$
where (A | y,x) is obtained from A by deleting the row of y and the column of x, and ε = 0 or = 1 according to the parity of the position of (x,y); in particular, ε = 0 when y = x. For λ ∈ ℂ, let A_λ = λ·I − A and A_λ^♯ its adjunct. Then
$$A_\lambda\, A_\lambda^{\sharp} = A_\lambda^{\sharp}\, A_\lambda = \chi_A(\lambda)\, I, \tag{3.40}$$
where χ_A(λ) is the characteristic polynomial of A. If we set λ = ρ = ρ(A), then we see that each column of A_ρ^♯ is a right ρ-eigenvector of A, and each row is a left ρ-eigenvector of A. Let h and ν, respectively, be the (positive) right and left eigenvectors of A that we have found in Theorem 3.35, normalized such that Σ_x h(x)ν(x) = 1.

3.41 Exercise. Deduce that a_ρ^♯(x,y) = γ·h(x)ν(y), where γ ∈ ℝ, whence A_ρ^♯ h = γ·h. □

We continue the proof of the proposition by showing that γ > 0. Consider the matrix A_x over X obtained from A by replacing all elements in the row and the column of x with 0. Then A_x ≤ A elementwise, and the two matrices do not coincide. Exercise 3.43 implies that ρ > |λ| for every eigenvalue λ of A_x, that is, det(ρ·I − A_x) > 0. (This is because the leading coefficient of the characteristic polynomial is 1, so that the polynomial is positive for real arguments that are bigger than its biggest real root.) Since det(ρ·I − A_x) = ρ·a_ρ^♯(x,x), we see indeed that γ > 0.

We now differentiate (3.40) with respect to λ and set λ = ρ: writing A_ρ^{♯′} for that elementwise derivative, since A_λ′ = I, the product rule yields
$$A_\rho^{\sharp\prime} A_\rho + A_\rho^{\sharp} = \chi_A'(\rho)\, I.$$
We get
$$\chi_A'(\rho)\, h = A_\rho^{\sharp\prime}\, \underbrace{A_\rho h}_{=\,0} + A_\rho^{\sharp}\, h = \gamma\, h.$$
Therefore χ_A′(ρ) = γ > 0, and ρ is a simple root of χ_A(·). □
3.42 Proposition. Let X be finite and A, B be two non-negative matrices over X. Suppose that A is irreducible and that b(x,y) ≤ a(x,y) for all x, y. Then
$$\max\{|\lambda| : \lambda \in \operatorname{spec}(B)\} \;\le\; \rho(A).$$

Proof. Let λ ∈ spec(B) and f : X → ℂ an associated eigenfunction (right eigenvector). Then
$$|\lambda|\,|f| = |Bf| \;\le\; B|f| \;\le\; A|f|.$$
Now let ν be as in Theorem 3.35 (c). Then
$$|\lambda| \sum_{x\in X} \nu(x)\,|f(x)| \;\le\; \nu A|f| = \rho(A) \sum_{x\in X} \nu(x)\,|f(x)|.$$
Therefore |λ| ≤ ρ(A). □

3.43 Exercise. Show that in the situation of Proposition 3.42, one has that max{|λ| : λ ∈ spec(B)} = ρ(A) if and only if B = A.
[Hint: define P as in (3.36), where Ah = ρ(A)·h, and let
$$q(x,y) = \frac{b(x,y)\,h(y)}{\rho(A)\,h(x)}.$$
Show that A = B if and only if Q is stochastic. Then assume that B ≠ A and that B is irreducible, and show that ρ(B) < ρ(A) in this case. Finally, when B is not irreducible, replace B by a slightly bigger matrix that is irreducible and dominated by A.] □
There is a weak form of the Perron–Frobenius theorem for general non-negative matrices.

3.44 Proposition. Let A = (a(x,y))_{x,y∈X} be a finite, non-zero, non-negative matrix. Then A has a positive eigenvalue ρ = ρ(A) with non-negative (non-zero) left and right eigenvectors ν and h, respectively, such that |λ| ≤ ρ for every eigenvalue λ of A.

Furthermore, ρ = max_C ρ(A_C), where C ranges over all irreducible classes with respect to A, the matrix A_C is the restriction of A to C, and ρ(A_C) is the Perron–Frobenius eigenvalue of the irreducible matrix A_C.

Proof. Let J be the matrix with all entries = 1. Let A_n = A + (1/n)J, a non-negative, irreducible matrix. Let h_n > 0 and ν_n > 0 be such that A_n h_n = ρ(A_n) h_n and ν_n A_n = ρ(A_n) ν_n. Normalize h_n and ν_n such that
$$\sum_x h_n(x) = \sum_x \nu_n(x) = 1.$$
By compactness, there are a subsequence (n′) and a function (column vector) h, as well as a measure (row vector) ν such that h_{n′} → h and ν_{n′} → ν. We have h ≥ 0, ν ≥ 0, and Σ_x h(x) = Σ_x ν(x) = 1, so that h, ν ≠ 0.

By Proposition 3.42, the sequence (ρ(A_n)) is decreasing. Let ρ be its limit. Then Ah = ρ·h and νA = ρ·ν. Also by Proposition 3.42, every λ ∈ spec(A) satisfies |λ| ≤ ρ(A_n). Therefore
$$\max\{|\lambda| : \lambda \in \operatorname{spec}(A)\} \;\le\; \rho,$$
as proposed.

If h(y) > 0 and a(x,y) > 0, then h(x) ≥ ρ⁻¹ a(x,y) h(y) > 0. Therefore h(x) > 0 for all x ∈ C(y), the irreducible class of y with respect to the matrix A. We see that the set {x ∈ X : h(x) > 0} is the union of irreducible classes. Let C₀ be such an irreducible class on which h > 0, and which is maximal with this property in the partial order on the collection of irreducible classes. Let h_{C₀} be the restriction of h to C₀. Then A_{C₀} h_{C₀} = ρ·h_{C₀}. This implies (why?) that ρ = ρ(A_{C₀}).

Finally, if C is any irreducible class, then consider the truncated matrix A_C as a matrix on the whole of X. It is dominated by A, whence by A_n, which implies via Proposition 3.42 that ρ(A_C) ≤ ρ(A_n) for each n, and in the limit, ρ(A_C) ≤ ρ. □
3.45 Exercise. Verify that the variational characterization of Theorem 3.33 for ρ(A) also holds when A is an infinite non-negative irreducible matrix, as long as ρ(A) is finite. (The latter is true, e.g., when A has bounded row sums or bounded column sums, and in particular, when A is substochastic.) □
E The convergence theorem for positive recurrent Markov chains

Here, we shall (re)prove the analogue of Theorem 3.28 for an arbitrary Markov chain $(X, P)$ that is irreducible, aperiodic and positive recurrent, when $X$ is not necessarily finite, nor $\tau(P^k) < 1$ for some $k$. The price that we have to pay is that we shall not get exponential decay in the approximation of the stationary probability by $\nu P^n$. Except for this last fact, we could of course have omitted the extra Section C regarding the finite case and appealed directly to the main theorem that we are going to prove. However, the method used in Section C is interesting in its own right, which is another reason for having chosen this slight redundancy in the presentation of the material.

We shall also determine the convergence behaviour of $n$-step transition probabilities when $(X, P)$ is null recurrent.

The clue for dealing with the positive recurrent case is the following: we consider two independent versions $(Z_n^1)$ and $(Z_n^2)$ of the Markov chain, one with arbitrary initial distribution $\nu$, and the other with initial distribution $m$, the stationary probability measure of $P$. Thus, $Z_n^2$ will have distribution $m$ for each $n$. We then consider the stopping time $\tau_D$ when the two chains first meet. On the event $[\tau_D \le n]$, it will be easily seen that $Z_n^1$ and $Z_n^2$ have the same distribution. The method also adapts to the null recurrent case.

What we are doing here is to apply the so-called coupling method: we construct a larger probability space on which both processes can be defined in such a way that they can be compared in a suitable way. This type of idea has many fruitful applications in probability theory; see the book by Lindvall [Li].

We now elaborate the details of that plan. We consider the new state space $X \times X$ with transition matrix $Q = P \otimes P$ given by
\[
q\bigl((x_1, x_2), (y_1, y_2)\bigr) = p(x_1, y_1)\, p(x_2, y_2).
\]
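On a finite state space the product matrix can be formed explicitly: with the states $(x_1, x_2)$ flattened lexicographically, $P \otimes P$ is exactly the Kronecker product. A minimal sketch (the $3$-state matrix below is my own arbitrary illustration, not taken from the text):

```python
import numpy as np

# An arbitrary irreducible, aperiodic transition matrix P on X = {0, 1, 2}.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])

# Q = P (x) P on X x X: q((x1,x2),(y1,y2)) = p(x1,y1) p(x2,y2).
# Flattening (x1, x2) to the index 3*x1 + x2 makes Q the Kronecker product.
Q = np.kron(P, P)

print(Q.shape)        # (9, 9)
print(Q.sum(axis=1))  # every row sums to 1, so Q is again stochastic
```

In particular the row sums confirm that $Q$ is again a stochastic matrix, as the coupling construction requires.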
3.46 Lemma. If $(X, P)$ is irreducible and aperiodic, then the same holds for the Markov chain $(X \times X, P \otimes P)$. Furthermore, if $(X, P)$ is positive recurrent, then so is $(X \times X, P \otimes P)$.

Proof. We have
\[
q^{(n)}\bigl((x_1, x_2), (y_1, y_2)\bigr) = p^{(n)}(x_1, y_1)\, p^{(n)}(x_2, y_2).
\]
By irreducibility and aperiodicity, Lemma 2.22 implies that there are indices $n_i = n(x_i, y_i)$ such that $p^{(n)}(x_i, y_i) > 0$ for all $n \ge n_i$, $i = 1, 2$. Therefore also $Q$ is irreducible and aperiodic.

If $(X, P)$ is positive recurrent and $m$ is the unique invariant probability measure of $P$, then $m \otimes m$ is an invariant probability measure for $Q$. By Theorem 3.19, $Q$ is positive recurrent. $\square$

Let $\nu_1$ and $\nu_2$ be probability measures on $X$. If we write $(Z_n^1, Z_n^2)$ for the Markov chain with initial distribution $\nu_1 \otimes \nu_2$ and transition matrix $Q$ on $X \times X$, then for $i = 1, 2$, $(Z_n^i)$ is the Markov chain on $X$ with initial distribution $\nu_i$ and transition matrix $P$.

Now let $D = \{(x, x) : x \in X\}$ be the diagonal of $X \times X$, and $\tau_D$ the associated stopping time with respect to $Q$. This is the time when $(Z_n^1)$ and $(Z_n^2)$ first meet (after starting).
3.47 Lemma. For every $x \in X$ and $n \in \mathbb{N}$,
\[
\Pr_{\nu_1 \otimes \nu_2}[Z_n^1 = x,\ \tau_D \le n] = \Pr_{\nu_1 \otimes \nu_2}[Z_n^2 = x,\ \tau_D \le n].
\]

Proof. This is a consequence of the strong Markov property of Exercise 1.25. In detail, if $k \le n$, then
\[
\begin{aligned}
\Pr_{\nu_1 \otimes \nu_2}[Z_n^1 = x,\ \tau_D = k]
&= \sum_{w \in X} \Pr_{\nu_1 \otimes \nu_2}[Z_n^1 = x,\ Z_k^1 = w,\ \tau_D = k] \\
&= \sum_{w \in X} p^{(n-k)}(w, x)\, \Pr_{\nu_1 \otimes \nu_2}[Z_k^1 = w,\ \tau_D = k] \\
&= \Pr_{\nu_1 \otimes \nu_2}[Z_n^2 = x,\ \tau_D = k],
\end{aligned}
\]
since by definition, $Z_k^1 = Z_k^2$ if $\tau_D = k$. Summing over all $k \le n$, we get the proposed identity. $\square$
3.48 Theorem. Suppose that $(X, P)$ is irreducible and aperiodic.

(a) If $(X, P)$ is positive recurrent, then for any initial distribution $\nu$ on $X$,
\[
\lim_{n \to \infty} \|\nu P^n - m\|_1 = 0,
\]
and in particular
\[
\lim_{n \to \infty} \Pr_\nu[Z_n = x] = m(x) \quad \text{for every } x \in X,
\]
where $m(x) = 1/\mathsf{E}_x(t^x)$ is the stationary probability distribution.

(b) If $(X, P)$ is null recurrent or transient, then for any initial distribution $\nu$ on $X$,
\[
\lim_{n \to \infty} \Pr_\nu[Z_n = x] = 0 \quad \text{for every } x \in X.
\]
Proof. (a) In the positive recurrent case, we set $\nu_1 = \nu$ and $\nu_2 = m$ for defining the initial measure of $(Z_n^1, Z_n^2)$ on $X \times X$ in the above construction. Since $\tau_D \le t^{(x,x)}$, Lemma 3.46 implies that
\[
\tau_D < \infty \quad \Pr_{\nu_1 \otimes \nu_2}\text{-almost surely.} \tag{3.49}
\]
(This probability measure lives on the trajectory space associated with $Q$ over $X \times X$!)

We have $\|\nu P^n - m\|_1 = \|\nu_1 P^n - \nu_2 P^n\|_1$. Abbreviating $\Pr_{\nu_1 \otimes \nu_2} = \Pr$ and using Lemma 3.47, we get
\[
\begin{aligned}
\|\nu_1 P^n - \nu_2 P^n\|_1
&= \sum_{y \in X} \bigl| \Pr[Z_n^1 = y] - \Pr[Z_n^2 = y] \bigr| \\
&= \sum_{y \in X} \bigl| \Pr[Z_n^1 = y,\ \tau_D \le n] + \Pr[Z_n^1 = y,\ \tau_D > n] \\
&\qquad\qquad - \Pr[Z_n^2 = y,\ \tau_D \le n] - \Pr[Z_n^2 = y,\ \tau_D > n] \bigr| \\
&\le \sum_{y \in X} \bigl( \Pr[Z_n^1 = y,\ \tau_D > n] + \Pr[Z_n^2 = y,\ \tau_D > n] \bigr)
\le 2 \Pr[\tau_D > n]. \tag{3.50}
\end{aligned}
\]
By (3.49), we have that $\Pr[\tau_D > n] \to \Pr[\tau_D = \infty] = 0$ as $n \to \infty$.
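For a small finite chain, the convergence $\|\nu P^n - m\|_1 \to 0$ of part (a) is easy to observe numerically; the sketch below uses an arbitrary $3$-state matrix of my own (not from the text), computing $m$ as the normalized left Perron eigenvector:

```python
import numpy as np

# An arbitrary irreducible, aperiodic, positive recurrent (finite) chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])

# Stationary probability m: left eigenvector of P for eigenvalue 1, normalized.
w, V = np.linalg.eig(P.T)
m = np.real(V[:, np.argmin(np.abs(w - 1))])
m /= m.sum()

nu = np.array([1.0, 0.0, 0.0])   # an arbitrary initial distribution
dist = nu.copy()
for n in range(50):
    dist = dist @ P              # dist = nu P^n after n steps

print(np.abs(dist - m).sum())    # the l^1 distance to m, essentially 0
```

After 50 steps the distance is far below machine-visible levels here, although for a general positive recurrent chain the theorem guarantees no rate.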
(b) If $(X, P)$ is transient then statement (b) is immediate, see Exercise 3.7.

So let us suppose that $(X, P)$ is null recurrent. In this case, it may happen that $(X \times X, Q)$ becomes transient. [Example: take for $(X, P)$ the simple random walk on $\mathbb{Z}^2$; see (4.64) in Section 4.E.] Then, setting $\nu_1 = \nu_2 = \nu$, we get that
\[
\bigl( \Pr_\nu[Z_n = x] \bigr)^2 = \Pr_{\nu \otimes \nu}\bigl[ (Z_n^1, Z_n^2) = (x, x) \bigr] \to 0 \quad \text{as } n \to \infty,
\]
once more by Exercise 3.7. (Note again that $\Pr_\nu$ and $\Pr_{\nu \otimes \nu}$ live on different trajectory spaces!)

Since $(X, P)$ is a factor chain of $(X \times X, Q)$, the latter cannot be positive recurrent when $(X, P)$ is null recurrent; see Exercise 3.14. Therefore the remaining case that we have to consider is the one where both $(X, P)$ and the product chain $(X \times X, Q)$ are null recurrent. Then we first claim that for any choice of initial probability distributions $\nu_1$ and $\nu_2$ on $X$,
\[
\lim_{n \to \infty} \bigl( \Pr_{\nu_1}[Z_n = x] - \Pr_{\nu_2}[Z_n = x] \bigr) = 0 \quad \text{for every } x \in X.
\]
Indeed, (3.49) is valid in this case, and we can use once more (3.50) to find that
\[
\|\nu_1 P^n - \nu_2 P^n\|_1 \le 2 \Pr[\tau_D > n] \to 0.
\]
Setting $\nu_1 = \nu$ and $\nu_2 = \nu P^m$, this implies in particular that
\[
\lim_{n \to \infty} \bigl( \Pr_\nu[Z_n = x] - \Pr_\nu[Z_{n+m} = x] \bigr) = 0 \quad \text{for every } x \in X,\ m \ge 0. \tag{3.51}
\]
The remaining arguments involve only the basic trajectory space associated with $(X, P)$. Let $\varepsilon > 0$. We use the "null" of null recurrence, namely, that
\[
\mathsf{E}_x(t^x) = \sum_{m=0}^{\infty} \Pr_x[t^x > m] = \infty.
\]
Hence there is $M = M_\varepsilon$ such that
\[
\sum_{m=0}^{M} \Pr_x[t^x > m] > 1/\varepsilon.
\]
For $n \ge M$, the events $A_m = [Z_{n-m} = x,\ Z_{n-m+k} \ne x \text{ for } 1 \le k \le m]$ are pairwise disjoint for $m = 0, \dots, M$. Thus, using the Markov property,
\[
\begin{aligned}
1 &\ge \sum_{m=0}^{M} \Pr_\nu(A_m)
= \sum_{m=0}^{M} \Pr_\nu[Z_{n-m+k} \ne x \text{ for } 1 \le k \le m \mid Z_{n-m} = x]\, \Pr_\nu[Z_{n-m} = x] \\
&= \sum_{m=0}^{M} \Pr_x[t^x > m]\, \Pr_\nu[Z_{n-m} = x].
\end{aligned}
\]
Therefore, for each $n \ge M$ there must be $m = m(n) \in \{0, \dots, M\}$ such that $\Pr_\nu[Z_{n-m(n)} = x] < \varepsilon$. But $\Pr_\nu[Z_n = x] - \Pr_\nu[Z_{n-m(n)} = x] \to 0$ by (3.51) and boundedness of $m(n)$. Thus
\[
\limsup_{n \to \infty} \Pr_\nu[Z_n = x] \le \varepsilon
\]
for every $\varepsilon > 0$, proving that $\Pr_\nu[Z_n = x] \to 0$. $\square$
3.52 Exercise. Let $(X, P)$ be a recurrent irreducible Markov chain, $d$ its period, and $C_0, \dots, C_{d-1}$ its periodic classes according to Theorem 2.24.

(1) Show that $(X, P)$ is positive recurrent if and only if the restriction of $P^d$ to $C_i$ is positive recurrent for some ($\Longleftrightarrow$ all) $i \in \{0, \dots, d-1\}$.

(2) Show that for $x, y \in C_i$,
\[
\lim_{n \to \infty} p^{(nd)}(x, y) = d\, m(y) \quad \text{with } m(y) = 1/\mathsf{E}_y(t^y).
\]
(3) Determine the limiting behaviour of $p^{(n)}(x, y)$ for $x \in C_i$, $y \in C_j$.
In the hope that the reader will have solved this exercise before proceeding, we now consider the general situation. Theorem 3.48 was formulated for an irreducible, positive recurrent Markov chain $(X, P)$. It applies without any change to the restriction of a general Markov chain to any of its essential classes, if the latter is positive recurrent and aperiodic. We now want to find the limiting behaviour of $n$-step transition probabilities in the case when there may be several essential irreducible classes, as well as non-essential ones.

For $x, y \in X$ and $d = d(y)$, the period of (the irreducible class of) $y$, we define
\[
F_r(x, y) = \Pr_x[s^y < \infty \text{ and } s^y \equiv r \ \mathrm{mod}\ d]
= \sum_{n=0}^{\infty} f^{(nd+r)}(x, y), \quad r = 0, \dots, d-1. \tag{3.53}
\]
We make the following observations.

(i) $F(x, y) = F_0(x, y) + F_1(x, y) + \dots + F_{d-1}(x, y)$.

(ii) If $x \sim y$ then by Theorem 2.24 there is a unique $r$ such that $F(x, y) = F_r(x, y)$, while $F_j(x, y) = 0$ for all other $j \in \{0, 1, \dots, d-1\}$.

(iii) If furthermore $y$ is a recurrent state then Theorem 3.4 (b) implies that $F_r(x, y) = F(x, y) = U(x, y) = 1$ for the index $r$ that we found above in (ii).

(iv) If $x$ and $y$ belong to different classes and $x \to y$, then it may well be that $F_r(x, y) > 0$ for different indices $r \in \{0, 1, \dots, d-1\}$. (Construct examples!)
3.54 Theorem. (a) Let $y \in X$ be a positive recurrent state and $d = d(y)$ its period. Then for each $x \in X$ and $r \in \{0, 1, \dots, d-1\}$,
\[
\lim_{n \to \infty} p^{(nd+r)}(x, y) = F_r(x, y)\, d / \mathsf{E}_y(t^y).
\]
(b) Let $y \in X$ be a transient or null recurrent state. Then
\[
\lim_{n \to \infty} p^{(n)}(x, y) = 0.
\]
Proof. (a) By Exercise 3.52, for $\varepsilon > 0$ there is $N_\varepsilon > 0$ such that for every $n \ge N_\varepsilon$, we have $|p^{(nd)}(y, y) - d\, m(y)| < \varepsilon$, where $m(y) = 1/\mathsf{E}_y(t^y)$. For such $n$, applying Theorem 1.38 (b) and equation (1.40),
\[
\begin{aligned}
p^{(nd+r)}(x, y)
&= \sum_{l=0}^{nd+r} \underbrace{f^{(l)}(x, y)}_{\substack{>\,0 \text{ only if} \\ l\, \equiv\, r \ \mathrm{mod}\ d}} p^{(nd+r-l)}(y, y)
= \sum_{k=0}^{n} f^{(kd+r)}(x, y)\, p^{((n-k)d)}(y, y) \\
&\le \sum_{k=0}^{n-N_\varepsilon} f^{(kd+r)}(x, y) \bigl( d\, m(y) + \varepsilon \bigr)
+ \sum_{k > n-N_\varepsilon} f^{(kd+r)}(x, y).
\end{aligned}
\]
Since $\sum_{k > n-N_\varepsilon} f^{(kd+r)}(x, y)$ is a remainder term of a convergent series, it tends to $0$ as $n \to \infty$. Hence
\[
\limsup_{n \to \infty} p^{(nd+r)}(x, y) \le F_r(x, y) \bigl( d\, m(y) + \varepsilon \bigr) \quad \text{for each } \varepsilon > 0.
\]
Therefore
\[
\limsup_{n \to \infty} p^{(nd+r)}(x, y) \le F_r(x, y)\, d\, m(y).
\]
Analogously, the inequality
\[
p^{(nd+r)}(x, y) \ge \sum_{k=0}^{n-N_\varepsilon} f^{(kd+r)}(x, y) \bigl( d\, m(y) - \varepsilon \bigr)
\]
yields
\[
\liminf_{n \to \infty} p^{(nd+r)}(x, y) \ge F_r(x, y)\, d\, m(y).
\]
This concludes the proof in the positive recurrent case.

(b) If $y$ is transient then $G(x, y) < \infty$, whence $p^{(n)}(x, y) \to 0$. If $y$ is null recurrent then we can apply part (1) of Exercise 3.52 to the restriction of $P^d$ to $C_i$, the periodic class to which $y$ belongs according to Theorem 2.24. It is again null recurrent, so that $p^{(nd)}(y, y) \to 0$ as $n \to \infty$. The proof now continues precisely as in case (a), replacing the number $m(y)$ with $0$. $\square$
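Theorem 3.54 can be illustrated with simple random walk on the $4$-cycle, where $d = 2$ and $m$ is uniform; the following sketch is my own example, not taken from the text:

```python
import numpy as np

# SRW on the cycle Z/4Z: period d = 2, stationary distribution m uniform (1/4).
P = np.zeros((4, 4))
for x in range(4):
    P[x, (x - 1) % 4] = 0.5
    P[x, (x + 1) % 4] = 0.5

Pn = np.linalg.matrix_power(P, 20)   # an even power, n*d with d = 2
# Theorem 3.54 (a) with x = y, r = 0, F_0(x,x) = 1: the limit is d*m(x) = 0.5.
print(Pn[0, 0])    # 0.5
print(Pn[0, 1])    # 0.0: odd states are unreachable in an even number of steps
```

Here the limit is in fact attained exactly for every even $n \ge 2$, by the eigenvalue structure of the cycle; in general the theorem only asserts convergence.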
F The ergodic theorem for positive recurrent Markov chains

The purpose of this section is to derive the second important Markov chain limit theorem.

3.55 Ergodic theorem. Let $(X, P)$ be a positive recurrent, irreducible Markov chain with stationary probability measure $m(\cdot)$. If $f : X \to \mathbb{R}$ is $m$-integrable, that is, $\int_X |f|\, dm = \sum_x |f(x)|\, m(x) < \infty$, then for any starting distribution,
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(Z_n) = \int_X f\, dm \quad \text{almost surely.}
\]

As a matter of fact, this is a special case of the general ergodic theorem of Birkhoff and von Neumann, see e.g. Petersen [Pe]. Before the proof, we need some preparation. We introduce new probabilities that are dual to the $f^{(n)}(x, y)$ which were defined in (1.28):
\[
\ell^{(n)}(x, y) = \Pr_x[Z_n = y,\ Z_k \ne x \text{ for } k \in \{1, \dots, n\}] \tag{3.56}
\]
is the probability that the Markov chain starting at $x$ is in $y$ at the $n$-th step before returning to $x$. In particular, $\ell^{(0)}(x, x) = 1$ and $\ell^{(0)}(x, y) = 0$ if $x \ne y$. We can define the associated generating function
\[
L(x, y|z) = \sum_{n=0}^{\infty} \ell^{(n)}(x, y)\, z^n, \quad L(x, y) = L(x, y|1). \tag{3.57}
\]
We have $L(x, x|z) = 1$ for all $z$. Note that while the quantities $f^{(n)}(x, y)$, $n \in \mathbb{N}$, are probabilities of disjoint events, this is not the case for $\ell^{(n)}(x, y)$, $n \in \mathbb{N}$. Therefore, unlike $F(x, y)$, the quantity $L(x, y)$ is not a probability. Indeed, it is the expected number of visits in $y$ before returning to the starting point $x$.
3.58 Lemma. If $y \to x$, or if $y$ is a transient state, then $L(x, y) < \infty$.

Proof. The statement is clear when $x = y$. Since $\ell^{(n)}(x, y) \le p^{(n)}(x, y)$, it is also obvious when $y$ is a transient state.

We now assume that $y \to x$. When $x \not\to y$, we have $L(x, y) = 0$.

So we consider the case when $x \sim y$ and $y$ is recurrent. Let $C$ be the irreducible class of $y$. It must be essential. Recall the Definition 2.14 of the restriction $P_{C \setminus x}$ of $P$ to $C \setminus \{x\}$. Factorizing with respect to the first step, we have for $n \ge 1$
\[
\ell^{(n)}(x, y) = \sum_{w \in C \setminus \{x\}} p(x, w)\, p^{(n-1)}_{C \setminus x}(w, y),
\]
because the Markov chain starting in $x$ cannot exit from $C$. Now the Green function of the restriction satisfies
\[
G_{C \setminus x}(w, y) = F_{C \setminus x}(w, y)\, G_{C \setminus x}(y, y) \le G_{C \setminus x}(y, y) < \infty.
\]
Indeed, those quantities can be interpreted in terms of the modified Markov chain where the state $x$ is made absorbing, so that $C \setminus \{x\}$ contains no essential state for the modified chain. Therefore the associated Green function must be finite. We deduce
\[
L(x, y) = \sum_{w \in C \setminus \{x\}} p(x, w)\, G_{C \setminus x}(w, y) \le G_{C \setminus x}(y, y) < \infty,
\]
as proposed. $\square$
3.59 Exercise. Prove the following in analogy with Theorem 1.38 (b), (c), (d).
\[
\begin{aligned}
G(x, y|z) &= G(x, x|z)\, L(x, y|z) \quad \text{for all } x, y, \\
U(x, x|z) &= \sum_y L(x, y|z)\, p(y, x)\, z, \quad \text{and} \\
L(x, y|z) &= \sum_w L(x, w|z)\, p(w, y)\, z, \quad \text{if } y \ne x,
\end{aligned}
\]
for all $z$ in the common domain of convergence of the power series involved.

3.60 Exercise. Derive a different proof of Lemma 3.58 in the case when $x \sim y$: show that for $|z| < 1$,
\[
L(x, y|z)\, L(y, x|z) = F(x, y|z)\, F(y, x|z),
\]
and let $z \to 1$.
[Hint: multiply both sides by $G(x, x|z)\, G(y, y|z)$.]

3.61 Exercise. Show that when $x$ is a recurrent state,
\[
\sum_{y \in X} L(x, y) = \mathsf{E}_x(t^x).
\]
[Hint: use again that $\sum_y G(x, y|z) = 1/(1 - z)$ and apply the first formula of Exercise 3.59 in the same way as Theorem 1.38 (b) was used in (3.11) and (3.12).]
Thus, for a positive recurrent chain, $L(x, y)$ has a particularly simple form.

3.62 Corollary. Let $(X, P)$ be a positive recurrent, irreducible Markov chain with stationary probability measure $m(\cdot)$. Then for all $x, y \in X$,
\[
L(x, y) = m(y)/m(x) = \mathsf{E}_x(t^x)/\mathsf{E}_y(t^y).
\]

Proof. We have $L(x, x) = 1 = U(x, x)$ by recurrence. Therefore the second and the third formula of Exercise 3.59 show that for any fixed $x$, the measure $\nu(y) = L(x, y)$ is stationary. Exercise 3.61 shows that $\nu(X) = \mathsf{E}_x(t^x) = 1/m(x) < \infty$ in the positive recurrent case. By Theorem 3.19, $\nu = \frac{1}{m(x)}\, m$. $\square$
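For a finite chain, this corollary can be checked by plain linear algebra: writing $R$ for $P$ restricted to $X \setminus \{x\}$, one has $L(x, y) = \sum_{n \ge 1} \ell^{(n)}(x, y) = \bigl[ p(x, \cdot)\, (I - R)^{-1} \bigr](y)$ for $y \ne x$. The matrix below is my own arbitrary example:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])

# stationary distribution m (normalized left eigenvector for eigenvalue 1)
w, V = np.linalg.eig(P.T)
m = np.real(V[:, np.argmin(np.abs(w - 1))])
m /= m.sum()

x = 0
others = [1, 2]
R = P[np.ix_(others, others)]          # transitions that avoid x
a = P[x, others]                       # the first step, leading away from x
L = a @ np.linalg.inv(np.eye(2) - R)   # L(x, y) for y != x

for j, y in enumerate(others):
    print(L[j], m[y] / m[x])           # the two values agree
```

The agreement is exact (up to round-off), as Corollary 3.62 predicts.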
After these preparatory exercises, we define a sequence $(t^x_k)_{k \ge 0}$ of stopping times by
\[
t^x_0 = 0 \quad \text{and} \quad t^x_k = \inf\{n > t^x_{k-1} : Z_n = x\},
\]
so that $t^x_1 = t^x$, as defined in (1.26). Thus, $t^x_k$ is the random instant of the $k$-th visit in $x$ after starting.

The following is an immediate consequence of the strong Markov property, see Exercise 1.25. For safety's sake, we outline the proof.

3.63 Lemma. In the recurrent case, all $t^x_k$ are a.s. finite and are indeed stopping times. Furthermore, the random vectors with random length
\[
\bigl( t^x_k - t^x_{k-1};\ Z_i,\ i = t^x_{k-1}, \dots, t^x_k - 1 \bigr), \quad k \ge 1,
\]
are independent and identically distributed.

Proof. We abbreviate $t^x_k = t_k$.

First of all, we can decide whether $[t_k \le n]$ by looking only at the initial trajectory $(Z_0, Z_1, \dots, Z_n)$. Indeed, we only have to check whether $x$ occurs at least $k$ times in $(Z_1, \dots, Z_n)$. Thus, $t_k$ is a stopping time for each $k$.

By recurrence, $t_1 = t^x$ is a.s. finite. We can now proceed by induction on $k$. If $t_k$ is a.s. finite, then by the strong Markov property, $(Z_{t_k + n})_{n \ge 0}$ is a Markov chain with the same transition matrix $P$ and starting point $x$. For this new chain, $t_{k+1} - t_k$ plays the same role as $t^x$ for the original chain. In particular, $t_{k+1} - t_k$ is a.s. finite.

This proves the first statement. For the second statement, we have to show the following: for any choice of $k, l, r, s \in \mathbb{N}_0$ with $k < l$ and all points $x_1, \dots, x_{r-1}, y_1, \dots, y_{s-1} \in X$, the events
\[
A = [t_k - t_{k-1} = r,\ Z_{t_{k-1} + i} = x_i,\ i = 1, \dots, r-1]
\]
and
\[
B = [t_l - t_{l-1} = s,\ Z_{t_{l-1} + j} = y_j,\ j = 1, \dots, s-1]
\]
are independent. Now, in the time interval $[t_k + 1, t_l]$, the chain $(Z_n)$ visits $x$ precisely $l - k$ times. That is, for the Markov chain $(Z_{t_k + n})_{n \ge 0}$, the stopping time $t_l$ plays the same role as the stopping time $t_{l-k}$ plays for the original chain $(Z_n)_{n \ge 0}$ starting at $x$.³ In particular, $\Pr_x(B)$ is the same for each $l \in \mathbb{N}$. Furthermore, $A_m = A \cap [t_{k-1} = m] \in \mathcal{A}_{m+r}$, and $Z_{m+r}(\omega) = x$ for all $\omega \in A_m$. Thus,
\[
\begin{aligned}
\Pr_x(A \cap B) &= \sum_m \Pr_x(B \mid A_m)\, \Pr_x(A_m) \\
&= \sum_m \Pr_x[t_{l-k} - t_{l-k-1} = s,\ Z_{t_{l-k-1} + j} = y_j,\ j = 1, \dots, s-1]\, \Pr_x(A_m) \\
&= \Pr_x(B) \sum_m \Pr_x(A_m) = \Pr_x(B)\, \Pr_x(A),
\end{aligned}
\]
as proposed. $\square$

³More precisely: in terms of Theorem 1.17, we consider $(\Omega', \mathcal{A}', \Pr') = (\Omega, \mathcal{A}, \Pr_x)$, the trajectory space, but $Z'_n = Z_{t_k + n}$. Let $t'_m$ be the stopping time of the $m$-th return to $x$ of $(Z'_n)$. Then, under the mapping $\pi$ of that theorem, $t'_m(\omega) = t_{k+m}\bigl(\pi(\omega)\bigr)$.
Proof of the ergodic theorem. We suppose that the Markov chain starts at $x$. We write $t_1 = t$ and, as above, $t^x_k = t_k$. Also, we let
\[
S_N(f) = \sum_{n=0}^{N-1} f(Z_n).
\]
Assume first that $f \ge 0$, and consider the non-negative random variables
\[
Y_k = \sum_{n = t_{k-1}}^{t_k - 1} f(Z_n), \quad k \ge 1.
\]
By Lemma 3.63, they are independent and identically distributed. We compute, using Corollary 3.62 in the last step,
\[
\begin{aligned}
\mathsf{E}_x(Y_k) = \mathsf{E}_x(Y_1)
&= \mathsf{E}_x\Bigl( \sum_{n=0}^{t-1} f(Z_n) \Bigr)
= \mathsf{E}_x\Bigl( \sum_{n=0}^{\infty} f(Z_n)\, \mathbf{1}_{[t > n]} \Bigr) \\
&= \sum_{n=0}^{\infty} \sum_{y \in X} f(y)\, \Pr_x[Z_n = y,\ t > n] \\
&= \sum_{y \in X} f(y)\, L(x, y) = \frac{1}{m(x)} \int_X f\, dm,
\end{aligned}
\]
which is finite by assumption. The strong law of large numbers implies that
\[
\frac{1}{k} \sum_{j=1}^{k} Y_j = \frac{1}{k}\, S_{t_k}(f) \to \frac{1}{m(x)} \int_X f\, dm \quad \Pr_x\text{-almost surely, as } k \to \infty.
\]
In the same way, setting $f \equiv 1$,
\[
\frac{1}{k}\, t_k \to \frac{1}{m(x)} \quad \Pr_x\text{-almost surely, as } k \to \infty.
\]
In particular, $t_{k+1}/t_k \to 1$ almost surely. Now, for $N \in \mathbb{N}$, let $k(N)$ be the (random and a.s. defined) index such that
\[
t_{k(N)} \le N < t_{k(N)+1}.
\]
As $N \to \infty$, also $k(N) \to \infty$ almost surely. Dividing all terms in this double inequality by $t_{k(N)}$, we find
\[
1 \le \frac{N}{t_{k(N)}} \le \frac{t_{k(N)+1}}{t_{k(N)}}.
\]
We deduce that $t_{k(N)}/N \to 1$ almost surely. Since $f \ge 0$, we have
\[
\frac{1}{N}\, S_{t_{k(N)}}(f) \le \frac{1}{N}\, S_N(f) \le \frac{1}{N}\, S_{t_{k(N)+1}}(f).
\]
Now,
\[
\frac{1}{N}\, S_{t_{k(N)}}(f)
= \underbrace{\frac{t_{k(N)}}{N}}_{\to\, 1} \cdot
\underbrace{\frac{k(N)}{t_{k(N)}}}_{\to\, m(x)} \cdot
\underbrace{\frac{1}{k(N)}\, S_{t_{k(N)}}(f)}_{\to\, \frac{1}{m(x)} \int_X f\, dm}
\to \int_X f\, dm \quad \Pr_x\text{-almost surely.}
\]
In the same way,
\[
\frac{1}{N}\, S_{t_{k(N)+1}}(f) \to \int_X f\, dm \quad \Pr_x\text{-almost surely.}
\]
Thus, $S_N(f)/N \to \int_X f\, dm$ when $f \ge 0$.

If $f$ is arbitrary, then we can decompose $f = f^+ - f^-$ and see that
\[
\frac{1}{N}\, S_N(f) = \frac{1}{N}\, S_N(f^+) - \frac{1}{N}\, S_N(f^-)
\to \int_X f^+\, dm - \int_X f^-\, dm = \int_X f\, dm
\]
$\Pr_x$-almost surely. $\square$
3.64 Exercise. Above, we have proved the ergodic theorem only for a deterministic starting point. Complete the proof by showing that it is valid for any initial distribution.
[Hint: replace $t^x_0 = 0$ with $s^x$, as defined in (1.26), which is almost surely finite.]
The ergodic theorem for Markov chains has many applications. One of them concerns the statistical estimation of the transition probabilities on the basis of observing the evolution of the chain, which is assumed to be positive recurrent. Recall the random variable $v^x_n = \mathbf{1}_x(Z_n)$. Given the observation of the chain in a time interval $[0, N]$, the natural estimate of $p(x, y)$ appears to be the number of times that the chain jumps from $x$ to $y$, relative to the number of visits in $x$. That is, our estimator for $p(x, y)$ is the statistic
\[
T_N = \sum_{n=0}^{N-1} v^x_n\, v^y_{n+1} \Bigm/ \sum_{n=0}^{N-1} v^x_n.
\]
Indeed, when $Z_0 = o$, the expected value of the denominator is $\sum_{n=0}^{N-1} p^{(n)}(o, x)$, while the expected value of the numerator is $\sum_{n=0}^{N-1} p^{(n)}(o, x)\, p(x, y)$. (As a matter of fact, $T_N$ is the maximum likelihood estimator of $p(x, y)$.) By the ergodic theorem, with $f = \mathbf{1}_x$,
\[
\frac{1}{N} \sum_{n=0}^{N-1} v^x_n \to m(x) \quad \Pr_o\text{-almost surely.}
\]
3.65 Exercise. Show that
\[
\frac{1}{N} \sum_{n=0}^{N-1} v^x_n\, v^y_{n+1} \to m(x)\, p(x, y) \quad \Pr_o\text{-almost surely.}
\]
[Hint: show that $(Z_n, Z_{n+1})$ is an irreducible, positive recurrent Markov chain on the state space $\{(x, y) : p(x, y) > 0\}$. Compute its stationary distribution.]

Combining those facts, we get that
\[
T_N \to p(x, y) \quad \text{as } N \to \infty.
\]
That is, the estimator is consistent.
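A quick simulation illustrates this consistency; the two-state chain below is an arbitrary example of mine (not from the text), and the tolerance in the final comparison is only a plausible order of magnitude for this run length:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive recurrent 2-state chain; we estimate p(0, 1) = 0.3.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

N = 50_000
Z = np.empty(N + 1, dtype=int)
Z[0] = 0
for n in range(N):
    Z[n + 1] = rng.choice(2, p=P[Z[n]])     # one step of the chain

visits_x = np.sum(Z[:-1] == 0)                    # sum of v^x_n
jumps_xy = np.sum((Z[:-1] == 0) & (Z[1:] == 1))   # sum of v^x_n v^y_{n+1}
T_N = jumps_xy / visits_x
print(T_N)    # close to p(0, 1) = 0.3
```

The estimate is a ratio of two ergodic averages, exactly as in the argument above.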
G $\rho$-recurrence

In this short section we suppose again that $(X, P)$ is an irreducible Markov chain. From 2.C we know that the radius of convergence $r = 1/\rho(P)$ of the power series $G(x, y|z)$ does not depend on $x, y \in X$. In fact, more is true:

3.66 Lemma. One of the following holds for $r = 1/\rho(P)$, where $\rho(P)$ is the spectral radius of $(X, P)$.

(a) $G(x, y|r) = \infty$ for all $x, y \in X$, or

(b) $G(x, y|r) < \infty$ for all $x, y \in X$.

In both cases, $F(x, y|r) < \infty$ for all $x, y \in X$.

Proof. Let $x, y, x', y' \in X$. By irreducibility, there are $k, l \ge 0$ such that $p^{(k)}(x, x') > 0$ and $p^{(l)}(y', y) > 0$. Therefore
\[
\sum_{n=k+l}^{\infty} p^{(n)}(x, y)\, r^n
\ge p^{(k)}(x, x')\, p^{(l)}(y', y)\, r^{k+l} \sum_{n=0}^{\infty} p^{(n)}(x', y')\, r^n.
\]
Thus, if $G(x', y'|r) = \infty$ then also $G(x, y|r) = \infty$. Exchanging the roles of $x, y$ and $x', y'$, we also get the converse implication.

Finally, if $m$ is such that $p^{(m)}(y, x) > 0$, then we have in the same way as above that for each $z \in (0, r)$
\[
G(y, y|z) \ge p^{(m)}(y, x)\, z^m\, G(x, y|z) = p^{(m)}(y, x)\, z^m\, F(x, y|z)\, G(y, y|z)
\]
by Theorem 1.38 (b). Letting $z \to r$ from below, we find that
\[
F(x, y|r) \le 1 \bigm/ \bigl( p^{(m)}(y, x)\, r^m \bigr). \qquad \square
\]
3.67 Definition. In case (a) of Lemma 3.66, the Markov chain is called $\rho$-recurrent, in case (b) it is called $\rho$-transient.

This definition and various results are due to Vere-Jones [49], see also Seneta [Se]. This is a formal analogue of usual recurrence, where one studies $G(x, y|1)$ instead of $G(x, y|r)$. While in the previous (1996) Italian version of this chapter, I wrote "there is no analogous probabilistic interpretation of $\rho$-recurrence", I learnt in the meantime that there is indeed an interpretation in terms of branching Markov chains. This will be explained in Chapter 5. Very often, one finds "$r$-recurrent" in the place of "$\rho$-recurrent", where $\rho = 1/r$. There are good reasons for either terminology.

3.68 Remarks. (a) The Markov chain is $\rho$-recurrent if and only if $U(x, x|r) = 1$ for some ($\Longleftrightarrow$ all) $x \in X$. The chain is $\rho$-transient if and only if $U(x, x|r) < 1$ for some ($\Longleftrightarrow$ all) $x \in X$. (Recall that the radius of convergence of $U(x, x|\cdot)$ is $\ge r$, and compare with Proposition 2.28.)

(b) If the Markov chain is recurrent in the usual sense, $G(x, x|1) = \infty$, then $\rho(P) = r = 1$ and the chain is $\rho$-recurrent.

(c) If the Markov chain is transient in the usual sense, $G(x, x|1) < \infty$, then each of the following cases can occur. (We shall see Example 5.24 in Chapter 5, Section A.)

• $r = 1$, and the chain is $\rho$-transient,
• $r > 1$, and the chain is $\rho$-recurrent,
• $r > 1$, and the chain is $\rho$-transient.

(d) If $r > 1$ (i.e., $\rho(P) < 1$) then in any case the Markov chain is transient in the usual sense.
In analogy with Definition 3.8, $\rho$-recurrence is subdivided in two cases.

3.69 Definition. In the $\rho$-recurrent case, the Markov chain is called

• $\rho$-positive-recurrent, if $U'(x, x|r) = \sum_{n=1}^{\infty} n\, r^{n-1}\, u^{(n)}(x, x) < \infty$ for some ($\Longleftrightarrow$ every) $x \in X$, and

• $\rho$-null-recurrent, if $U'(x, x|r) = \infty$ for some ($\Longleftrightarrow$ every) $x \in X$.

3.70 Exercise. Prove that in the (irreducible) $\rho$-recurrent case it is indeed true that when $U'(x, x|r) < \infty$ for some $x$ then this holds for all $x \in X$.

3.71 Exercise. Fix $x \in X$ and let $s = s(x, x)$ be the radius of convergence of $U(x, x|z)$. Show that if $U(x, x|s) > 1$ then the Markov chain is $\rho$-positive-recurrent. Deduce that when $(X, P)$ is not $\rho$-positive-recurrent then $s(x, x) = r$ for all $x \in X$.
3.72 Theorem. (a) If $(X, P)$ is an irreducible, $\rho$-positive-recurrent Markov chain then for $x, y \in X$
\[
\lim_{n \to \infty} r^{nd+l}\, p^{(nd+l)}(x, y) = d\, \frac{F(x, y|r)}{r\, U'(y, y|r)},
\]
where $d$ is the period and $l \in \{0, \dots, d-1\}$ is such that $p^{(kd+l)}(x, y) > 0$ for some $k \ge 0$.

(b) If $(X, P)$ is $\rho$-null-recurrent or $\rho$-transient then for $x, y \in X$
\[
\lim_{n \to \infty} r^n\, p^{(n)}(x, y) = 0.
\]

Proof. We first consider the case $x = y$. In the $\rho$-recurrent case we construct the following auxiliary Markov chain on $\mathbb{N}$ with transition matrix $\tilde{P}$ given by
\[
\tilde{p}(1, n) = u^{(nd)}(y, y)\, r^{nd}, \quad \tilde{p}(n+1, n) = 1,
\]
while $\tilde{p}(m, n) = 0$ in all other cases. See Figure 8.
[Figure 8. The auxiliary chain on $\mathbb{N}$: from state $1$ there is a jump to state $n$ with probability $u^{(nd)}(y, y)\, r^{nd}$ for $n = 1, 2, 3, \dots$, while from each state $n + 1$ the chain descends deterministically to $n$ with probability $1$.]
The transition matrix $\tilde{P}$ is stochastic as $(X, P)$ is $\rho$-recurrent; see Remark 3.68 (a). The construction is such that the first return probabilities of this chain to the state $1$ are $\tilde{u}^{(n)}(1, 1) = u^{(nd)}(y, y)\, r^{nd}$. Applying (1.39) both to $(X, P)$ and to $(\mathbb{N}, \tilde{P})$, we see that $p^{(nd)}(y, y)\, r^{nd} = \tilde{p}^{(n)}(1, 1)$. Thus, $(\mathbb{N}, \tilde{P})$ is aperiodic and recurrent, and Theorem 3.48 implies that
\[
\lim_{n \to \infty} p^{(nd)}(y, y)\, r^{nd} = \lim_{n \to \infty} \tilde{p}^{(n)}(1, 1)
= \frac{1}{\sum_{k=1}^{\infty} k\, \tilde{u}^{(k)}(1, 1)}
= \frac{d}{r\, U'(y, y|r)}.
\]
This also applies to the $\rho$-null-recurrent case, where the limit is $0$.

In the $\rho$-transient case, it is clear that $p^{(nd)}(y, y)\, r^{nd} \to 0$.

Now suppose that $x \ne y$. We know from Lemma 3.66 that $F(x, y|r) < \infty$. Thus, the proof can be completed in the same way as in Theorem 3.54. $\square$
Chapter 4

Reversible Markov chains

A The network model

4.1 Definition. An irreducible Markov chain $(X, P)$ is called reversible if there is a positive measure $m$ on $X$ such that
\[
m(x)\, p(x, y) = m(y)\, p(y, x) \quad \text{for all } x, y \in X.
\]
We then call $m$ a reversing measure for $P$.

This symmetry condition allows the development of a rich theory which comprises many important classes of examples and models, such as simple random walk on graphs, nearest neighbour random walks on trees, and symmetric random walks on groups. Reversible Markov chains are well documented in the literature. We refer first of all to the beautiful little book of Doyle and Snell [D-S], which led to a breakthrough in the popularity of random walks. Further valuable sources are, among others, Saloff-Coste [SC], several parts of my monograph [W2], and in particular the (ever forthcoming) perfect book of Lyons with Peres [L-P]. Here we shall only touch a small part of the vast interesting material, and encourage the reader to consult those books.

If $(X, P)$ is reversible, then we call $a(x, y) = m(x)\, p(x, y) = a(y, x)$ the conductance between $x$ and $y$, and $m(x)$ the total conductance at $x$.

Conversely, we can also start with a symmetric function $a : X \times X \to [0, \infty)$ such that $0 < m(x) = \sum_y a(x, y) < \infty$ for every $x \in X$. Then $p(x, y) = a(x, y)/m(x)$ defines a reversible Markov chain (random walk).

Reversibility implies that $X$ is the union of essential classes that do not communicate among each other. Therefore, it is no restriction that we shall always assume irreducibility of $(X, P)$.
4.2 Lemma. (1) If $(X, P)$ is reversible then $m(\cdot)$ is an invariant measure for $P$ with total mass
\[
m(X) = \sum_{x, y \in X} a(x, y).
\]
(2) In particular, $(X, P)$ is positive recurrent if and only if $m(X) < \infty$, and in this case,
\[
\mathsf{E}_x(t^x) = m(X)/m(x).
\]
(3) Furthermore, also $P^n$ is reversible with respect to $m$.

Proof. We have
\[
\sum_x m(x)\, p(x, y) = \sum_x m(y)\, p(y, x) = m(y).
\]
The statement about positive recurrence follows from Theorem 3.19, since the stationary probability measure is $\frac{1}{m(X)}\, m(\cdot)$ when $m(X) < \infty$, while otherwise $m$ is an invariant measure with infinite total mass.

Finally, reversibility of $P$ is equivalent with symmetry of the matrix $D P D^{-1}$, where $D$ is the diagonal matrix over $X$ with diagonal entries $\sqrt{m(x)}$, $x \in X$. Taking the $n$-th power, we see that also $D P^n D^{-1}$ is symmetric. $\square$
4.3 Example. Let $\Gamma = (X, E)$ be a symmetric (or non-oriented) graph with $V(\Gamma) = X$ and non-empty, symmetric edge set $E = E(\Gamma)$, that is, we have $[x, y] \in E \Longleftrightarrow [y, x] \in E$. (Attention: here we distinguish between the two oriented edges $[x, y]$ and $[y, x]$, when $x \ne y$. In classical graph theory, such a pair of edges is usually considered and drawn as one non-oriented edge.)

We assume that $\Gamma$ is locally finite, that is,
\[
\deg(x) = \bigl| \{y : [x, y] \in E\} \bigr| < \infty \quad \text{for all } x \in X,
\]
and connected. Simple random walk (SRW) on $\Gamma$ is the Markov chain with state space $X$ and transition probabilities
\[
p(x, y) =
\begin{cases}
1/\deg(x), & \text{if } [x, y] \in E, \\
0, & \text{otherwise.}
\end{cases}
\]
Connectedness is equivalent with irreducibility of SRW, and the random walk is reversible with respect to $m(x) = \deg(x)$. Thus, $a(x, y) = 1$ if $[x, y] \in E$, and $a(x, y) = 0$, otherwise. The resulting matrix $A = \bigl( a(x, y) \bigr)_{x, y \in X}$ is called the adjacency matrix of the graph.

In particular, SRW on the graph $\Gamma$ is positive recurrent if and only if $\Gamma$ is finite, and in this case,
\[
\mathsf{E}_x(t^x) = |E| / \deg(x).
\]
(Attention: $|E|$ counts all oriented edges. If instead we count undirected edges, then $|E|$ has to be replaced with $2 \times$ the number of edges with distinct endpoints plus $1 \times$ the number of loops.)
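These SRW formulas are easy to check on a concrete finite graph; the small graph below is an arbitrary example of mine:

```python
import numpy as np

# A small symmetric graph on vertices 0..3, given by non-oriented edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0      # adjacency matrix a(x, y)

deg = A.sum(axis=1)              # m(x) = deg(x)
P = A / deg[:, None]             # SRW: p(x, y) = 1/deg(x) along edges

# reversibility m(x) p(x, y) = m(y) p(y, x), and invariance of m
print(np.allclose(deg[:, None] * P, (deg[:, None] * P).T))  # True
print(np.allclose(deg @ P, deg))                            # True
# E_x(t^x) = |E| / deg(x), with |E| counting oriented edges
print(2 * len(edges) / deg[0])
```

Note that `2 * len(edges)` is the oriented edge count $|E|$, since the list stores each non-oriented edge once and the graph has no loops.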
For a general, irreducible Markov chain (X. 1) which is reversible with respect
to the measure m, we also consider the associated graph I(1) according to Deni-
tion 1.6. By reversibility, it is again non-oriented (its edge set is symmetric). The
graph is not necessarily locally nite.
The period of 1 is J = 1 or J = 2. The latter holds if and only if the graph
I(1) is bipartite: the vertex set has a partition X = X
1
LX
2
such that each edge
80 Chapter 4. Reversible Markov chains
has one endpoint in X
1
and the other in X
2
. Equivalently, every closed path in
I(1) has an even number of edges.
For an edge e = [x,y] ∈ E = E(P), we denote ě = [y,x], and write e^- = x and e^+ = y for its initial and terminal vertex, respectively. Thus, e ∈ E if and only if a(x,y) > 0. We call the number r(e) = 1/a(x,y) = r(ě) the resistance of e, or rather, the resistance of the non-oriented edge that is given by the pair of oriented edges e and ě.
The triple N = (X, E, r) is called a network, where we imagine each edge e as a wire with resistance r(e), and several wires are linked at each node (vertex). Equivalently, we may think of a system of tubes e with cross-section 1 and length r(e), connected at the vertices. If we start with X, E and r, then the requirements are that (X,E) is a countable, connected, symmetric graph, and that

    0 < m(x) = Σ_{e∈E : e^- = x} 1/r(e) < ∞ for each x ∈ X,

and then p(x,y) = 1/(m(x) r([x,y])) whenever [x,y] ∈ E, as above.
The electric network interpretation leads to nice explanations of various results, see in particular [D-S] and [L-P]. We will come back to part of it at a later stage. It will be convenient to introduce a potential theoretic setup, and to involve some basic functional analysis, as follows. The (real) Hilbert space ℓ²(X,m) consists of all functions f : X → ℝ with ‖f‖² = (f,f) < ∞, where the inner product of two such functions f_1, f_2 is

    (f_1, f_2) = Σ_{x∈X} f_1(x) f_2(x) m(x).    (4.4)
Reversibility is the same as saying that the transition matrix P acts on ℓ²(X,m) as a self-adjoint operator, that is, (P f_1, f_2) = (f_1, P f_2) for all f_1, f_2 ∈ ℓ²(X,m). The action of P is of course given by (3.16), Pf(x) = Σ_y p(x,y) f(y).
The Hilbert space ℓ²_♭(E,r) consists of all functions φ : E → ℝ which are anti-symmetric: φ(ě) = −φ(e) for each e ∈ E, and such that (φ,φ) < ∞, where the inner product of two such functions φ_1, φ_2 is

    (φ_1, φ_2) = ½ Σ_{e∈E} φ_1(e) φ_2(e) r(e).    (4.5)

We imagine that such a function represents a flow, and if φ(e) ≥ 0 then this is the amount per time unit that flows from e^- to e^+, while if φ(e) < 0 then −φ(e) = φ(ě) flows from e^+ to e^-. Note that φ(e) = 0 when e is a loop.
We introduce the difference operator

    ∇ : ℓ²(X,m) → ℓ²_♭(E,r),  ∇f(e) = (f(e^+) − f(e^-)) / r(e).    (4.6)
A. The network model 81
If we interpret f as a potential (voltage) on the set of nodes (vertices) X, then ∇f(e) represents the electric current along the edge e, and the defining equation for ∇ is just Ohm's law.
Recall that the adjoint operator ∇* : ℓ²_♭(E,r) → ℓ²(X,m) is defined by the equation

    (f, ∇*φ) = (∇f, φ) for all f ∈ ℓ²(X,m), φ ∈ ℓ²_♭(E,r).
4.7 Exercise. Prove that the operator ∇ has norm ‖∇‖ ≤ √2, that is, (∇f, ∇f) ≤ 2 (f,f).
Show that ∇* is given by

    ∇*φ(x) = (1/m(x)) Σ_{e∈E : e^+ = x} φ(e).    (4.8)

[Hint: it is sufficient to check the defining equation for the adjoint operator only for finitely supported functions f. Use anti-symmetry of φ.]
In our interpretation in terms of flows,

    Σ_{e : e^+ = x, φ(e)>0} φ(e)  and  −Σ_{e : e^+ = x, φ(e)<0} φ(e)

are the amounts flowing into node x, resp. out of node x, and m(x) ∇*φ(x) is the difference of those two quantities. Thus, if ∇*φ(x) = 0 then this means that the flow has no source or sink at x. This is known as Kirchhoff's node law. Later, we shall give a more precise definition of flows. The Laplacian is the operator

    L = ∇*∇ = I − P,    (4.9)

where I is the identity matrix over X and P is the transition matrix of our random walk, both viewed as operators on functions X → ℝ.
4.10 Exercise. Verify the equation ∇*∇f = (I − P)f for f ∈ ℓ²(X,m).
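To make the exercise concrete, here is a small numerical sketch (the three-vertex network and its resistances are invented for illustration) checking the identity ∇*∇f = (I − P)f:

```python
import numpy as np

# A toy network (X, E, r): oriented edges with symmetric resistances.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2)]
r = {(0, 1): 2.0, (1, 0): 2.0, (1, 2): 1.0, (2, 1): 1.0,
     (2, 0): 4.0, (0, 2): 4.0}
n = 3

# m(x) = sum of 1/r(e) over edges with initial vertex e^- = x, and
# p(x,y) = 1 / (m(x) r([x,y])), as in the text.
m = np.zeros(n)
for (x, y) in edges:
    m[x] += 1.0 / r[(x, y)]
P = np.zeros((n, n))
for (x, y) in edges:
    P[x, y] = 1.0 / (m[x] * r[(x, y)])

def grad(f):
    # (nabla f)(e) = (f(e^+) - f(e^-)) / r(e), anti-symmetric in e
    return {e: (f[e[1]] - f[e[0]]) / r[e] for e in edges}

def div_star(phi):
    # (nabla* phi)(x) = (1/m(x)) * sum over edges with e^+ = x of phi(e)
    out = np.zeros(n)
    for (x, y), val in phi.items():
        out[y] += val
    return out / m

f = np.array([1.0, -2.0, 0.5])
lap = div_star(grad(f))          # should equal (I - P) f
assert np.allclose(lap, f - P @ f)
```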
For reversible Markov chains, the name spectral radius for the number ρ(P) is justified in the operator theoretic sense by the following.

4.11 Proposition. If (X,P) is reversible then

    ρ(P) = ‖P‖,

the norm of P as an operator on ℓ²(X,m).
Proof. First of all,

    p^{(n)}(x,x) m(x) = (P^n 1_x, 1_x) ≤ ‖P‖^n (1_x, 1_x) = ‖P‖^n m(x).

Taking n-th roots and letting n → ∞, we see that ρ(P) ≤ ‖P‖.
For showing the (more interesting) reversed inequality, we use the fact that the linear space ℓ_0(X) of finitely supported real functions on X is dense in ℓ²(X,m). Thus, it is sufficient to show that (Pf, Pf) ≤ ρ(P)² (f,f) for every non-zero finitely supported function f on X. We first assume that f is non-negative. By self-adjointness of P and a standard use of the Cauchy–Schwarz inequality,

    (P^{n+1}f, P^{n+1}f)² = (P^n f, P^{n+2}f)² ≤ (P^n f, P^n f)(P^{n+2}f, P^{n+2}f).

We see that the sequence ((P^{n+1}f, P^{n+1}f)/(P^n f, P^n f))_{n≥0} is increasing. A basic lemma of elementary calculus says that when a sequence (a_n) of positive numbers is such that a_{n+1}/a_n converges, then also a_n^{1/n} converges, and the two limits coincide. We claim that

    lim_{n→∞} (P^n f, P^n f)^{1/n} = ρ(P)².

Choose x_0 ∈ supp(f). Then, since f ≥ 0,

    (P^n f, P^n f) = Σ_{x∈X} m(x) (Σ_{y∈supp(f)} p^{(n)}(x,y) f(y))² ≥ m(x_0) p^{(n)}(x_0,x_0)² f(x_0)².

Therefore the above limit is ≥ ρ(P)². Conversely, given ε > 0, there is n_ε such that

    p^{(n)}(x,y) ≤ (ρ(P) + ε)^n for all n ≥ n_ε, x, y ∈ supp(f).

We infer that, using self-adjointness once more, for 2n ≥ n_ε

    (P^n f, P^n f) = (f, P^{2n}f) ≤ C (ρ(P) + ε)^{2n}, where C = Σ_{x∈supp(f)} m(x) (Σ_{y∈supp(f)} f(y))².

Taking n-th roots and letting n → ∞, we see that the claim is true. Consequently

    (Pf, Pf)/(f,f) ≤ (P^{n+1}f, P^{n+1}f)/(P^n f, P^n f) ≤ ρ(P)²

for every non-negative function f in ℓ_0(X). If f ∈ ℓ_0(X) is arbitrary, then

    (Pf, Pf) ≤ (P|f|, P|f|) ≤ ρ(P)² (|f|, |f|) = ρ(P)² (f,f),

whence ‖P‖ ≤ ρ(P). □
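A quick numerical illustration of the identity ρ(P) = ‖P‖ (all data below is made up): for SRW on {0,1,2,3} with killing at both endpoints, the matrix is strictly substochastic and symmetric, so m ≡ 1; its spectral radius is cos(π/5) < 1, and p^{(2n)}(x,x)^{1/2n} approaches the operator norm.

```python
import numpy as np

# SRW on a path of 4 vertices with killing at both ends: strictly
# substochastic and symmetric, so D = I and the l^2-norm of P is the
# largest |eigenvalue| of P itself.
P = np.zeros((4, 4))
for i in range(3):
    P[i, i + 1] = P[i + 1, i] = 0.5

norm_P = np.max(np.abs(np.linalg.eigvalsh(P)))   # = cos(pi/5)

# p^(2n)(x,x)^(1/2n) approaches rho(P) = ||P||.
n = 200
rho_approx = np.linalg.matrix_power(P, 2 * n)[0, 0] ** (1.0 / (2 * n))
assert abs(norm_P - np.cos(np.pi / 5)) < 1e-12   # eigenvalues are cos(k*pi/5)
assert abs(rho_approx - norm_P) < 0.01
```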
B Speed of convergence of finite reversible Markov chains
In this section, we always assume that X is finite and that (X,P) is irreducible and reversible with respect to the measure m. Since m(X) < ∞, we may assume without loss of generality that m is a probability measure: Σ_{x∈X} m(x) = 1.
If the period of P is d = 1, then we know from Theorem 3.28 that the difference ‖p^{(n)}(x,·) − m‖_1 tends to 0 exponentially fast.
This fact has an algorithmic use: if X is a very large finite set, and the values m(x) are very small, then we cannot use a random number generator to simulate the probability measure m. Indeed, such a generator simulates the continuous equidistribution on the interval [0,1]. For simulating m, we should partition [0,1] into |X| intervals I_x of length m(x), x ∈ X. If our generator provides a number u, then the output of our simulation of m should be the point x when u ∈ I_x. However, if the numbers m(x) are below the machine's precision, then they will all be rounded to 0, and the simulation cannot work. An alternative is to run a Markov chain whose stationary probability distribution is m. It is chosen such that at each step, there are only relatively few possible transitions which all have relatively large probabilities, so that they can be efficiently simulated by the above method. If we start at x and make n steps, then the distribution p^{(n)}(x,·) of Z_n is very close to m. Thus, the random element of X that we find after the simulation of n successive steps of the Markov chain is almost distributed as m. ("Random" is in quotation marks because the random number generator is based on a clever, but deterministic algorithm whose output is almost equidistributed on [0,1].) The basic question is now: how many steps of the Markov chain should we perform so that the distribution p^{(n)}(x,·) is sufficiently close (for our purposes) to m? That is, given a small ε > 0, we want to know how large we have to choose n such that

    ‖p^{(n)}(x,·) − m‖_1 < ε.

This is a mathematical analysis that we should best perform before starting our algorithm. The estimate of the speed of convergence, i.e., the parameter t found in Theorem 3.28, is in general rather crude, and we need better methods of estimation. Here we shall present only a small glimpse of some basic methods of this type. There is a vast literature, mostly of the last 20–25 years. Good parts of it are documented in the book of Diaconis [Di] and the long and detailed exposition of Saloff-Coste [SC].
The following lemma does not require finiteness of X, but we need positive recurrence, that is, we need that m is a probability measure.
4.12 Lemma. ‖p^{(n)}(x,·) − m‖_1² ≤ p^{(2n)}(x,x)/m(x) − 1.
Proof. We use the Cauchy–Schwarz inequality.

    ‖p^{(n)}(x,·) − m‖_1² = (Σ_{y∈X} [|p^{(n)}(x,y) − m(y)| / √m(y)] · √m(y))²
    ≤ Σ_{y∈X} (p^{(n)}(x,y) − m(y))²/m(y) · Σ_{y∈X} m(y)   [the last sum is 1]
    = Σ_{y∈X} p^{(n)}(x,y)²/m(y) − 2 Σ_{y∈X} p^{(n)}(x,y) + Σ_{y∈X} m(y)
    = Σ_{y∈X} p^{(n)}(x,y) p^{(n)}(y,x)/m(x) − 1   [by reversibility]
    = p^{(2n)}(x,x)/m(x) − 1,

as proposed. □
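The lemma is easy to check numerically; the following sketch (the chain is made up for illustration) verifies the bound for a small symmetric, aperiodic chain:

```python
import numpy as np

# A lazy walk on a triangle: symmetric and doubly stochastic, hence
# reversible with respect to the uniform probability measure m.
P = np.array([
    [0.5 , 0.25, 0.25],
    [0.25, 0.5 , 0.25],
    [0.25, 0.25, 0.5 ],
])
m = np.array([1/3, 1/3, 1/3])

# Lemma 4.12: ||p^(n)(x,.) - m||_1^2 <= p^(2n)(x,x)/m(x) - 1.
for n in range(1, 6):
    lhs = np.abs(np.linalg.matrix_power(P, n)[0] - m).sum() ** 2
    rhs = np.linalg.matrix_power(P, 2 * n)[0, 0] / m[0] - 1
    assert lhs <= rhs + 1e-12
```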
Since X is finite and P is self-adjoint on ℓ²(X,m) ≅ ℝ^X, the spectrum of P is real. If (X,P), besides being irreducible, is also aperiodic, then −1 < λ < 1 for all eigenvalues λ of P with the exception of λ = 1; see Theorem 3.29. We define

    λ* = λ*(P) = max{|λ| : λ ∈ spec(P), λ ≠ 1}.    (4.13)
4.14 Lemma. |p^{(n)}(x,x) − m(x)| ≤ (1 − m(x)) λ*^n.
Proof. This is deduced by diagonalizing the self-adjoint operator P on ℓ²(X,m). We give the details, using only elementary facts from basic linear algebra.
For our notation, it will be convenient to index the eigenvalues with the elements of X, as λ_x, x ∈ X, including possible multiplicities. We also choose a root o ∈ X and let the maximal eigenvalue correspond to o, that is, λ_o = 1 and λ_x < 1 for all x ≠ o. Recall that DPD^{-1} is symmetric, where D = diag(√m(x))_{x∈X}. There is a real matrix V = (v(x,y))_{x,y∈X} such that

    DPD^{-1} = V^{-1} Λ V, where Λ = diag(λ_x)_{x∈X},

and V is orthogonal, V^{-1} = V^t. The row vectors of the matrix VD are left eigenvectors of P. In particular, recall that λ_o = 1 is a simple eigenvalue, so that each associated left eigenvector is a multiple of the unique stationary probability measure m. Therefore there is C ≠ 0 such that v(o,x) √m(x) = C m(x) for each x. Since Σ_x v(o,x)² = 1 by orthogonality of V, we find C² = 1. Therefore

    v(o,x)² = m(x).
We have P^n = (D^{-1} V^t) Λ^n (V D), that is,

    p^{(n)}(x,x′) = Σ_{y∈X} m(x)^{-1/2} v(y,x) λ_y^n v(y,x′) m(x′)^{1/2}.

Therefore

    |p^{(n)}(x,x) − m(x)| = |Σ_{y∈X} v(y,x)² λ_y^n − v(o,x)²| = |Σ_{y≠o} v(y,x)² λ_y^n|
    ≤ (Σ_{y∈X} v(y,x)² − v(o,x)²) λ*^n = (1 − m(x)) λ*^n. □
As a corollary, we obtain the following important estimate.

4.15 Theorem. If X is finite, P irreducible and aperiodic, and reversible with respect to the probability measure m on X, then

    ‖p^{(n)}(x,·) − m‖_1 ≤ √((1 − m(x))/m(x)) λ*^n.
We remark that aperiodicity is not needed for the proof of the last theorem, but without it, the result is not useful. The proof of Lemma 4.14 also leads to the better estimate

    ‖p^{(n)}(x,·) − m‖_1² ≤ (1/m(x)) Σ_{y≠o} v(y,x)² λ_y^{2n}    (4.16)

in the notation of that proof.
If we want to use Theorem 4.15 for bounding the speed of convergence to the stationary distribution, then we need an upper bound for the second largest eigenvalue λ_1 = max(spec(P) \ {1}), and a lower bound for λ_min = min spec(P) [since λ* = max{λ_1, −λ_min}], unless we can compute these numbers explicitly. In many cases, reasonable lower bounds on λ_min are easy to obtain.
4.17 Exercise. Let (X,P) be irreducible, with finite state space X. Suppose that one can write P = a I + (1−a) Q, where I is the identity matrix, Q is another stochastic matrix, and 0 < a < 1. Show that λ_min(P) ≥ −1 + 2a, with equality when λ_min(Q) = −1.
More generally, suppose that there is an odd k ≥ 1 such that P^k ≥ a I elementwise, where a > 0. Deduce that λ_min(P) ≥ −(1 − 2a)^{1/k}.
Random walks on groups
A simplification arises in the case of random walks on groups. Let G be a finite or countable group, in general written multiplicatively (unless the group is Abelian, in which case often + is preferred for the group operation). By slightly unusual
notation, we write o for its unit element. (The symbol e is used for edges of graphs here.) Also, let μ be a probability measure on G. The (right) random walk on G with law μ is the Markov chain with state space G and transition probabilities

    p(x,y) = μ(x^{-1}y), x, y ∈ G.    (4.18)

The random walk is called symmetric if p(x,y) = p(y,x) for all x, y, or equivalently, μ(x^{-1}) = μ(x) for all x ∈ G. Then P is reversible with respect to the counting measure (or any multiple thereof).
Instead of the trajectory space, another natural probability space can be used to model the random walk with law μ on G. We can equip Ω* = G^ℕ with the product σ-algebra 𝒜* of the discrete one on G (the family of all subsets of G). As in the case of the trajectory space, it is generated by the family of all cylinder sets, which are of the form ∏_n A_n, where A_n ⊆ G and A_n ≠ G for only finitely many n. Then G^ℕ is equipped with the product measure Pr* = μ^ℕ, which is the unique measure on 𝒜* that satisfies Pr*(∏_n A_n) = ∏_n μ(A_n) for every cylinder set as above. Now let Y_n : G^ℕ → G be the n-th projection. Then the Y_n are i.i.d. G-valued random variables with common distribution μ. The random walk (4.18) starting at x_0 ∈ G is then modeled as

    Z*_0 = x_0,  Z*_n = x_0 Y_1 ⋯ Y_n,  n ≥ 1.

Indeed, Y_{n+1} is independent of Z*_0, …, Z*_n, whence

    Pr*[Z*_{n+1} = y | Z*_n = x, Z*_k = x_k (k < n)]
    = Pr*[Y_{n+1} = x^{-1}y | Z*_n = x, Z*_k = x_k (k < n)]
    = Pr*[Y_{n+1} = x^{-1}y] = μ(x^{-1}y)

(as long as the conditioning event has non-zero probability). According to Theorem 1.17, the natural measure preserving mapping from the probability space (Ω*, 𝒜*, Pr*) to the trajectory space equipped with the measure Pr_{x_0} is given by

    (y_n)_{n≥1} ↦ (z_n)_{n≥0}, where z_n = x_0 y_1 ⋯ y_n.
In order to describe the transition probabilities in n steps, we need the definition of the convolution μ_1 * μ_2 of two measures μ_1, μ_2 on the group G:

    μ_1 * μ_2 (x) = Σ_{y∈G} μ_1(y) μ_2(y^{-1}x).    (4.19)
4.20 Exercise. Suppose that Y_1 and Y_2 are two independent, G-valued random variables with respective distributions μ_1 and μ_2. Show that the product Y_1 Y_2 has distribution μ_1 * μ_2.
We write μ^{(n)} = μ * ⋯ * μ (n times) for the n-th convolution power of μ, with μ^{(0)} = δ_o, the point mass at the group identity. We observe that

    supp(μ^{(n)}) = {x_1 ⋯ x_n | x_i ∈ supp(μ)}.
4.21 Lemma. For the random walk on G with law μ, the transition probabilities in n steps are

    p^{(n)}(x,y) = μ^{(n)}(x^{-1}y).

The random walk is irreducible if and only if

    ⋃_{n=1}^∞ supp(μ^{(n)}) = G.
Proof. For n = 1, the first assertion coincides with the definition of P. If it is true for n, then

    p^{(n+1)}(x,y) = Σ_{w∈G} p(x,w) p^{(n)}(w,y)
    = Σ_{w} μ(x^{-1}w) μ^{(n)}(w^{-1}y)   [setting v = x^{-1}w]
    = Σ_{v} μ(v) μ^{(n)}(v^{-1}x^{-1}y)   [since w^{-1} = v^{-1}x^{-1}]
    = μ * μ^{(n)} (x^{-1}y).

To verify the second assertion, it is sufficient to observe that o →^n x if and only if x ∈ supp(μ^{(n)}), and that x →^n y if and only if o →^n x^{-1}y. □
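A sketch of the lemma in code (the group Z_6 and the law μ below are chosen arbitrarily): n-fold convolution powers of μ reproduce the n-step transition probabilities.

```python
import numpy as np

G = 6
mu = np.array([0.0, 0.5, 0.0, 0.0, 0.0, 0.5])   # steps +1 and -1 (mod 6)

def convolve(m1, m2):
    # (m1 * m2)(x) = sum_y m1(y) m2(y^{-1} x); additively, y^{-1} x = x - y
    return np.array([sum(m1[y] * m2[(x - y) % G] for y in range(G))
                     for x in range(G)])

# Transition matrix p(x,y) = mu(y - x mod 6) of the right random walk.
P = np.array([[mu[(y - x) % G] for y in range(G)] for x in range(G)])

mun = np.zeros(G); mun[0] = 1.0    # mu^(0) = point mass at the identity
Pn = np.eye(G)
for _ in range(5):
    mun = convolve(mu, mun)        # mu^(n+1) = mu * mu^(n)
    Pn = P @ Pn
    # Lemma 4.21: p^(n)(x,y) = mu^(n)(x^{-1} y)
    assert np.allclose(Pn[2], [mun[(y - 2) % G] for y in range(G)])
```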
Let us now suppose that our group G is finite and that the random walk on G is irreducible, and therefore recurrent. The stationary probability measure is the uniform distribution on G, that is, m(x) = 1/|G| for every x ∈ G. Indeed,

    Σ_{x∈G} (1/|G|) p(x,y) = (1/|G|) Σ_{x∈G} μ(x^{-1}y) = 1/|G|;

the transition matrix is doubly stochastic (both row and column sums are equal to 1).
If in addition the random walk is also symmetric, then the distinct eigenvalues of P,

    1 = λ_0 > λ_1 > ⋯ > λ_q ≥ −1,

are all real, with multiplicities mult(λ_i) and Σ_{i=0}^q mult(λ_i) = |G|. By Theorem 3.29, we have mult(λ_0) = 1, while λ_q = −1 if and only if the random walk has period 2.
4.22 Lemma. For a symmetric, irreducible random walk on the finite group G, one has

    p^{(n)}(x,x) = (1/|G|) Σ_{i=0}^q mult(λ_i) λ_i^n.
Proof. We have p^{(n)}(x,x) = μ^{(n)}(o) = p^{(n)}(y,y) for all x, y ∈ G. Therefore

    p^{(n)}(x,x) = (1/|G|) Σ_{y∈G} p^{(n)}(y,y) = (1/|G|) tr(P^n),

where tr(P^n) is the trace (sum of the diagonal elements) of the matrix P^n, which coincides with the sum of the eigenvalues (taking their multiplicities into account). □
In this specific case, the inequalities of Lemmas 4.12 and 4.14 become

    ‖p^{(n)}(x,·) − m‖_1 ≤ √(|G| p^{(2n)}(x,x) − 1) = √(Σ_{i=1}^q mult(λ_i) λ_i^{2n}) ≤ √(|G| − 1) λ*^n,    (4.23)

where the measure m is equidistribution on G, that is, m(x) = 1/|G|, and λ* = max{λ_1, −λ_q}.
4.24 Example (Random walk on the hypercube). The hypercube is the (additively written) Abelian group G = Z_2^d, where Z_2 = {0,1} is the group with two elements and addition modulo 2. We can view it as a (non-oriented) graph with vertex set Z_2^d, and with edges between every pair of points which differ in exactly one component. This graph has the form of a hypercube in d dimensions. Every point has d neighbours. According to Example 4.3, simple random walk is the Markov chain which moves from a point to any of its neighbours with equal probability 1/d. This is the symmetric random walk on the group Z_2^d whose law μ is the equidistribution on the points (vectors) e_i = (δ_i(j))_{j=1,…,d}. The associated transition matrix is irreducible, but its period is 2.
In order to compute the eigenvalues of P, we introduce the set Ξ_d = {−1, 1}^d ⊂ ℝ^d, and define for ε = (ε_1, …, ε_d) ∈ Ξ_d the function (column vector) f_ε : Z_2^d → ℝ as follows. For x = (x_1, …, x_d) ∈ Z_2^d,

    f_ε(x) = ε_1^{x_1} ε_2^{x_2} ⋯ ε_d^{x_d},

where of course (−1)^0 = 1 and (−1)^1 = −1. It is immediate that

    f_ε(x + y) = f_ε(x) f_ε(y) for all x, y ∈ Z_2^d.
Multiplying the matrix P by the vector f_ε,

    P f_ε(x) = (1/d) Σ_{i=1}^d f_ε(x + e_i) = ((1/d) Σ_{i=1}^d f_ε(e_i)) f_ε(x) = ((d − 2k(ε))/d) f_ε(x),

where k(ε) = |{i : ε_i = −1}|, and thus Σ_i ε_i = d − 2k(ε).
4.25 Exercise. Show that the functions f_ε, where ε ∈ Ξ_d, are linearly independent.
Now, P is a (2^d × 2^d)-matrix, and we have found 2^d linearly independent eigenfunctions (eigenvectors). For each k ∈ {0, …, d}, there are (d choose k) elements ε ∈ Ξ_d such that k(ε) = k. We conclude that we have found all eigenvalues of P, and they are

    λ_k = 1 − 2k/d with multiplicity mult(λ_k) = (d choose k), k = 0, …, d.
By Lemma 4.22,

    p^{(n)}(x,x) = (1/2^d) Σ_{k=0}^d (d choose k) (1 − 2k/d)^n.
Since P is periodic, we cannot use this random walk for approximating the equidistribution on Z_2^d. We modify P, defining

    Q = (1/(d+1)) I + (d/(d+1)) P.

This describes simple random walk on the graph which is obtained from the hypercube by adding to the edge set a loop at each vertex. Now every point has d+1 neighbours, including the point itself. The new random walk has law μ given by μ(0) = μ(e_i) = 1/(d+1), i = 1, …, d. It is irreducible and aperiodic. We find Q f_ε = λ′_{k(ε)} f_ε, where

    λ′_k = 1 − 2k/(d+1) with multiplicity mult(λ′_k) = (d choose k).
Therefore

    q^{(n)}(x,x) = (1/2^d) Σ_{k=0}^d (d choose k) (1 − 2k/(d+1))^n.    (4.26)
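The spectral computation can be confirmed numerically; this sketch (with d = 4, a value picked for illustration) builds Q on Z_2^d, checks the eigenvalues 1 − 2k/(d+1) with multiplicities (d choose k), and compares (4.26) with a matrix power.

```python
import numpy as np
from itertools import product
from math import comb

d = 4
X = list(product([0, 1], repeat=d))
idx = {x: i for i, x in enumerate(X)}

# Lazy SRW Q = I/(d+1) + d/(d+1) P on the hypercube.
Q = np.zeros((2**d, 2**d))
for x in X:
    Q[idx[x], idx[x]] = 1.0 / (d + 1)
    for i in range(d):
        y = tuple(x[j] ^ (j == i) for j in range(d))   # flip coordinate i
        Q[idx[x], idx[y]] = 1.0 / (d + 1)

eigs = np.sort(np.linalg.eigvalsh(Q))
expected = np.sort([1 - 2 * k / (d + 1)
                    for k in range(d + 1) for _ in range(comb(d, k))])
assert np.allclose(eigs, expected)

# Formula (4.26) versus the matrix power.
n = 3
qn = np.linalg.matrix_power(Q, n)[0, 0]
formula = sum(comb(d, k) * (1 - 2 * k / (d + 1)) ** n
              for k in range(d + 1)) / 2 ** d
assert abs(qn - formula) < 1e-12
```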
Furthermore, λ′_1 = −λ′_d = (d−1)/(d+1) = λ*. Applying (4.23), we obtain

    ‖q^{(n)}(x,·) − m‖_1 ≤ √(2^d − 1) ((d−1)/(d+1))^n.
We can use this upper bound in order to estimate the necessary number of steps after which the random walk (Z_2^d, Q) approximates the uniform distribution m with an error smaller than e^{−C} (where C > 0): we have to solve the inequality

    √(2^d − 1) ((d−1)/(d+1))^n ≤ e^{−C},

and find

    n ≥ (C + log √(2^d − 1)) / log(1 + 2/(d−1)),

which is asymptotically (as d → ∞) of the order of (¼ log 2) d² + (C/2) d. We see that for large d, the contribution coming from C is negligible in comparison with the first, quadratic term.
Observe however that the upper bound on q^{(2n)}(x,x) that has led us to this estimate can be improved by performing an asymptotic evaluation (for d → ∞) of the right hand term in (4.26), a nice combinatorial-analytic exercise.
4.27 Example (The Ehrenfest model). In relation with the discussion of Boltzmann's Theorem H of statistical mechanics, P. and T. Ehrenfest have proposed the following model in 1911. An urn (or box) contains N molecules. Furthermore, the box is separated into two halves (sides) A and B by a wall with a small membrane, see Figure 9. In each of the successive time instants, a single molecule chosen randomly among all N molecules crosses the membrane to the other half of the box.
Figure 9 (a box with sides A and B, separated by a membrane)
We ask the following two questions. (1) How can one describe the equilibrium, that is, the state of the box after a long time period: what is the approximate probability that side A of the box contains precisely k of the N molecules? (2) If initially side A is empty, how long does one have to wait for reaching the equilibrium, that is, how long does it take until the approximation of the equilibrium is good?
As a Markov chain, the Ehrenfest model is described on the state space X = {0, 1, …, N}, where the states represent the number of molecules in side A.
The transition matrix, denoted P̃ for a reason that will become apparent immediately, is given by

    p̃(j, j−1) = j/N (j = 1, …, N)  and  p̃(j, j+1) = (N−j)/N (j = 0, …, N−1),

where p̃(j, j−1) is the probability, given j particles in side A, that the randomly chosen molecule belongs to side A and moves to side B. Analogously, p̃(j, j+1) corresponds to the passage of a molecule from side B to side A.
This Markov chain cannot be described as a random walk on some group. However, let us reconsider SRW P on the hypercube Z_2^N. We can subdivide the points of the hypercube into the classes

    C_j = {x ∈ Z_2^N : x has N − j zeros}, j = 0, …, N.

If i, j ∈ {0, …, N} and x ∈ C_i then p(x, C_j) = p̃(i, j). This means that our partition satisfies the condition (1.29), so that one can construct the factor chain. The transition matrix of the latter is P̃.
Starting with the Ehrenfest model, we can obtain the finer hypercube model by imagining that the molecules have labels 1 through N, and that the state x = (x_1, …, x_N) ∈ Z_2^N indicates that the molecules labeled i with x_i = 1 are currently on (or in) side A, while the others are on side B. Thus, passing from the hypercube to the Ehrenfest model means that we forget about the labels and just count the number of molecules in A (resp. B).
4.28 Exercise. Suppose that (X,P) is reversible with respect to the measure m(·), and let (X̃, P̃) be a factor chain of (X,P) such that m(x̃) < ∞ for each class x̃ in the partition X̃ of X. Show that (X̃, P̃) is reversible.
We get that P̃ is reversible with respect to the measure on {0, …, N} given by

    m̃(j) = m(C_j) = (1/2^N) (N choose j).

This answers question (1).
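A numerical sketch of this answer (N = 10 is an arbitrary choice): the Ehrenfest matrix P̃ satisfies detailed balance with respect to the binomial measure m̃(j) = 2^{−N} (N choose j), which is therefore stationary.

```python
import numpy as np
from math import comb

N = 10
Pt = np.zeros((N + 1, N + 1))
for j in range(N + 1):
    if j >= 1:
        Pt[j, j - 1] = j / N            # a molecule leaves side A
    if j <= N - 1:
        Pt[j, j + 1] = (N - j) / N      # a molecule enters side A

m = np.array([comb(N, j) / 2 ** N for j in range(N + 1)])

# Detailed balance m(i) p(i,j) = m(j) p(j,i), hence stationarity m P = m.
for i in range(N + 1):
    for j in range(N + 1):
        assert abs(m[i] * Pt[i, j] - m[j] * Pt[j, i]) < 1e-12
assert np.allclose(m @ Pt, m)
```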
The matrix P̃ has period 2, and Theorem 3.28 cannot be applied. In this sense, question (2) is not well-posed. As in the case of the hypercube, we can modify the transition probabilities, considering the factor chain of Q, given by

    Q̃ = (1/(N+1)) I + (N/(N+1)) P̃.

(Here, I is the identity matrix over {0, …, N}.) This means that at each step, no molecule crosses the membrane (with probability 1/(N+1)), or one random
molecule crosses (with probability 1/(N+1) for each molecule). We obtain

    q̃(j, j−1) = j/(N+1) (j = 1, …, N),
    q̃(j, j+1) = (N−j)/(N+1) (j = 0, …, N−1), and
    q̃(j, j) = 1/(N+1) (j = 0, …, N).

Then Q̃ is again reversible with respect to the probability measure m̃. Since C_0 = {0}, we have q̃^{(n)}(0,0) = q^{(n)}(0,0). Therefore the bound of Lemma 4.12, applied to Q̃, becomes
    ‖q̃^{(n)}(0,·) − m̃‖_1² ≤ 2^N q̃^{(2n)}(0,0) − 1,

which leads to the same estimate of the number of steps to reach stationarity as in the case of the hypercube: the approximation error is smaller than e^{−C}, if

    n ≥ (C + log √(2^N − 1)) / log(1 + 2/(N−1)) ~ (¼ log 2) N² as N → ∞.
This holds when the starting point is 0 (or N), which means that at the beginning of the process, side A (or side B, respectively) is empty. The reason is that the classes C_0 and C_N consist of single elements. For a general starting point j ∈ {0, …, N}, we can use the estimate of Theorem 4.15 with λ* = (N−1)/(N+1), so that

    ‖q̃^{(n)}(j,·) − m̃‖_1 ≤ √(2^N/(N choose j) − 1) ((N−1)/(N+1))^n.
Thus, the approximation error is smaller than e^{−C}, if

    n ≥ (C + log √(2^N/(N choose j) − 1)) / log(1 + 2/(N−1)) ~ (N/4) log(2^N/(N choose j)) as N → ∞.    (4.29)
4.30 Exercise. Use Stirling's formula to show that when N is even and j = N/2, the asymptotic behaviour of the error estimate in (4.29) is

    (N/4) log(2^N/(N choose N/2)) ~ (N log N)/8 as N → ∞.
Thus, a good approximation of the equilibrium is obtained after a number of steps of order N log N, when at the beginning both sides A and B contain the same number of molecules, while our estimate gives a number of steps of order N², if at the beginning all molecules stay in one of the two sides.
C The Poincaré inequality
In many cases, λ_1 and λ_min cannot be computed explicitly. Then one needs methods for estimating these numbers. The first main task is to find good upper bounds for λ_1. A range of methods uses the geometry of the graph of the Markov chain in order to find such bounds. Here we present only the most basic method of this type, taken from the seminal paper by Diaconis and Stroock [15].
As above, we assume that X is finite and that P is reversible with invariant probability measure m. We can consider the discrete probability space (X,m). For any function f : X → ℝ, its mean and variance with respect to m are

    E_m(f) = Σ_x f(x) m(x) = (f, 1_X)  and  Var_m(f) = E_m((f − E_m(f))²) = (f,f) − (f, 1_X)²,

with the inner product (·,·) as in (4.4).
4.31 Exercise. Verify that

    Var_m(f) = ½ Σ_{x,y∈X} (f(y) − f(x))² m(x) m(y).
The Dirichlet norm or Dirichlet sum of a function f : X → ℝ is

    D(f) = (∇f, ∇f) = ½ Σ_{e∈E} (f(e^+) − f(e^-))² / r(e)
         = ½ Σ_{x,y∈X} (f(x) − f(y))² m(x) p(x,y).    (4.32)
The following is well known matrix analysis.

4.33 Proposition.

    1 − λ_1 = min { D(f)/Var_m(f) : f : X → ℝ non-constant }.
Proof. First of all, it is well known that

    λ_1 = max { (Pf, f)/(f,f) : f ≠ 0, f ⊥ 1_X }.    (4.34)

Indeed, the eigenspace with respect to the largest eigenvalue λ_o = 1 is spanned by the function (column vector) 1_X, so that λ_1 is the largest eigenvalue of P acting on the orthogonal complement of 1_X: we have an orthonormal basis f_x, x ∈ X, of ℓ²(X,m) consisting of eigenfunctions of P with associated eigenvalues λ_x, such that f_o = 1_X, λ_o = 1, and λ_x < 1 for all x ≠ o. If f ⊥ 1_X then f is a linear combination f = Σ_{x≠o} (f, f_x) f_x, and

    (Pf, f) = Σ_{x≠o} (f, f_x)(P f_x, f) = Σ_{x≠o} λ_x (f, f_x)² ≤ λ_1 Σ_{x≠o} (f, f_x)² = λ_1 (f,f).

The maximum in (4.34) is attained for any λ_1-eigenfunction.
Since D(f) = (∇*∇f, f) = (f − Pf, f) by Exercise 4.10, we can rewrite (4.34) as

    1 − λ_1 = min { D(f)/(f,f) : f ≠ 0, f ⊥ 1_X }.

Now, if f is non-constant, then we can write the orthogonal decomposition

    f = (f, 1_X) 1_X + g = E_m(f) 1_X + g, where g ⊥ 1_X.

We then have D(f) = D(g) for the Dirichlet norm, and Var_m(f) = Var_m(g) = (g, g). Therefore the set of values over which the minimum is taken does not change when we replace the condition f ⊥ 1_X with f non-constant. □
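The proposition can be probed numerically; in the sketch below (lazy SRW on a 4-cycle, chosen for illustration) every non-constant f satisfies D(f) ≥ (1 − λ_1) Var_m(f), with equality for a λ_1-eigenfunction.

```python
import numpy as np

n = 4
I = np.eye(n)
# Lazy SRW on the 4-cycle, reversible w.r.t. the uniform measure m.
P = 0.5 * I + 0.25 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))
m = np.full(n, 1.0 / n)

def D(f):    # Dirichlet sum D(f) = ((I - P) f, f) in l^2(X, m)
    return float(((I - P) @ f) @ (f * m))

def Var(f):  # Var_m(f) = (f, f) - (f, 1_X)^2
    return float((f * f) @ m - (f @ m) ** 2)

lam1 = np.sort(np.linalg.eigvalsh(P))[-2]   # second largest eigenvalue

# Every non-constant f has D(f)/Var_m(f) >= 1 - lambda_1 ...
rng = np.random.default_rng(1)
for _ in range(500):
    f = rng.standard_normal(n)
    assert D(f) >= (1 - lam1) * Var(f) - 1e-9

# ... with equality for a lambda_1-eigenfunction.
g = np.array([1.0, 0.0, -1.0, 0.0])
assert abs(D(g) - (1 - lam1) * Var(g)) < 1e-12
```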
We now consider paths in the oriented graph Γ(P) with edge set E. We choose a length element ℓ : E → (0, ∞) with ℓ(ě) = ℓ(e) for each e ∈ E. If γ = [x_0, x_1, …, x_n] is such a path and E(γ) = {[x_0,x_1], …, [x_{n−1},x_n]} stands for the set of (oriented) edges on that path, then we write

    |γ|_ℓ = Σ_{e∈E(γ)} ℓ(e)

for its length with respect to ℓ(·). For ℓ(·) ≡ 1, we obtain the ordinary length (number of edges) |γ|_1 = n. The other typical choice is ℓ(e) = r(e), in which case we get the resistance length |γ|_r of γ.
We select for any ordered pair of points x, y ∈ X, x ≠ y, a path γ_{x,y} from x to y. The following definition relies on the choice of all those γ_{x,y} and the length element ℓ(·).
4.35 Definition. The Poincaré constant of the finite network N = (X, E, r) (with normalized resistances such that Σ_{e∈E} 1/r(e) = 1) is

    κ_ℓ = max_{e∈E} κ_ℓ(e), where κ_ℓ(e) = (r(e)/ℓ(e)) Σ_{x,y : e∈E(γ_{x,y})} |γ_{x,y}|_ℓ m(x) m(y).

For simple random walk on a finite graph, we have κ_1 = κ_r.
4.36 Theorem (Poincaré inequality). The second largest eigenvalue of the reversible Markov chain (X,P), resp. the associated network N = (X,E,r), satisfies

    λ_1 ≤ 1 − 1/κ_ℓ,

with respect to any length element ℓ(·) on E.
Proof. Let f : X → ℝ be any function, and let x ≠ y and γ_{x,y} = [x_0, x_1, …, x_n]. Then by the Cauchy–Schwarz inequality

    (f(y) − f(x))² = (Σ_{i=1}^n √ℓ([x_{i−1},x_i]) · (f(x_i) − f(x_{i−1}))/√ℓ([x_{i−1},x_i]))²
    ≤ Σ_{i=1}^n ℓ([x_{i−1},x_i]) · Σ_{i=1}^n (f(x_i) − f(x_{i−1}))²/ℓ([x_{i−1},x_i])
    = |γ_{x,y}|_ℓ Σ_{e∈E(γ_{x,y})} (∇f(e) r(e))²/ℓ(e).    (4.37)

Therefore, using the formula of Exercise 4.31,

    Var_m(f) ≤ ½ Σ_{x,y∈X, x≠y} |γ_{x,y}|_ℓ m(x) m(y) Σ_{e∈E(γ_{x,y})} (∇f(e) r(e))²/ℓ(e)
    = ½ Σ_{e∈E} (∇f(e))² r(e) · [(r(e)/ℓ(e)) Σ_{x,y∈X : e∈E(γ_{x,y})} |γ_{x,y}|_ℓ m(x) m(y)]
    ≤ κ_ℓ D(f),

since the term in square brackets is κ_ℓ(e). Together with Proposition 4.33, this proves the inequality. □
The applications of this inequality require a careful choice of the paths γ_{x,y}, x, y ∈ X.
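To see the mechanics, here is a brute-force sketch (SRW on a 6-cycle with one arbitrary choice of shortest paths, preferring the clockwise direction for ties) that computes κ_1(e) from the chosen paths and checks the resulting bound λ_1 ≤ 1 − 1/κ_1:

```python
import numpy as np

n = 6

def path(x, y):
    # Shortest path on the n-cycle, clockwise in case of a tie;
    # returned as a list of vertices.
    fwd = (y - x) % n
    step = 1 if fwd <= n - fwd else -1
    out, v = [x], x
    while v != y:
        v = (v + step) % n
        out.append(v)
    return out

# kappa_1(e) = (1/|E|) sum over ordered pairs (x,y) with e on gamma_{x,y}
# of |gamma_{x,y}|_1 deg(x) deg(y); here deg = 2, |E| = 2n oriented edges.
E_count, deg = 2 * n, 2
oriented = [(u, (u + 1) % n) for u in range(n)] + \
           [((u + 1) % n, u) for u in range(n)]
kappa = {}
for e in oriented:
    s = 0.0
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            p = path(x, y)
            if e in zip(p, p[1:]):
                s += (len(p) - 1) * deg * deg
    kappa[e] = s / E_count
kappa1 = max(kappa.values())

# Compare the bound with the true second largest eigenvalue of SRW.
P = (np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 2
lam1 = np.sort(np.linalg.eigvalsh(P))[-2]
assert lam1 <= 1 - 1 / kappa1 + 1e-12
```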
96 Chapter 4. Reversible Markov chains
For simple random walk on a finite graph Γ = (X,E), we have for the stationary probability and the associated resistances

    m(x) = deg(x)/|E| and r(e) = |E|.

(Recall that m and thus also r(·) have to be normalized such that m(X) = 1.) In particular, κ_r = κ_1, and

    κ_1(e) = (1/|E|) Σ_{x,y : e∈E(γ_{x,y})} |γ_{x,y}|_1 deg(x) deg(y)
    ≤ κ* = (1/|E|) max_{x,y∈X} |γ_{x,y}|_1 · (max_{x∈X} deg(x))² · max_{e∈E} γ(e), where
    γ(e) = |{(x,y) ∈ X² : e ∈ γ_{x,y}}|.    (4.38)
Thus, λ_1 ≤ 1 − 1/κ* for SRW. It is natural to use shortest paths, that is, |γ_{x,y}|_1 = d(x,y). In this case, max_{x,y∈X} |γ_{x,y}|_1 = diam(Γ) is the diameter of the graph Γ. If the graph is regular, i.e., deg(x) = deg is constant, then |E| = |X| deg, and

    κ_1(e) = (deg/|X|) Σ_{k=1}^{diam(Γ)} k γ_k(e), where
    γ_k(e) = |{(x,y) ∈ X² : |γ_{x,y}|_1 = k, e ∈ γ_{x,y}}|.    (4.39)
4.40 Example (Random walk on the hypercube). We refer to Example 4.24, and start with SRW on Z_2^d, as in that example. We have X = Z_2^d and

    E = {[x, x + e_i] : x ∈ Z_2^d, i = 1, …, d}.

Thus, |X| = 2^d, |E| = d·2^d, diam(Γ) = d, and deg(x) = d for all x.
We now describe a natural shortest path γ_{x,y} from x = (x_1, …, x_d) to y = (y_1, …, y_d) ≠ x. Let 1 ≤ i(1) < i(2) < ⋯ < i(k) ≤ d be the coordinates where y_i ≠ x_i, that is, y_i = x_i + 1 modulo 2. Then d(x,y) = k. We let

    γ_{x,y} = [x = x_0, x_1, …, x_k = y], where x_j = x_{j−1} + e_{i(j)}, j = 1, …, k.
We first compute the number γ(e) of (4.38) for e = [u, u + e_i] with u = (u_1, …, u_d) ∈ Z_2^d. We have e ∈ E(γ_{x,y}) precisely when x_j = u_j for j = i, …, d, y_i = u_i + 1 mod 2, and y_j = u_j for j = 1, …, i−1. There are 2^{i−1} free choices for the first i−1 coordinates of x and 2^{d−i} free choices for the last d−i coordinates of y. Thus, γ(e) = 2^{d−1} for every edge e. We get

    κ* = (1/(d·2^d)) · d · d² · 2^{d−1} = d²/2, whence λ_1 ≤ 1 − 2/d².
Our estimate of the spectral gap 1 − λ_1 ≥ 2/d² misses the true value 1 − λ_1 = 2/d by the factor of 1/d.
Next, we can improve this crude bound by computing κ_1(e) precisely. We need the numbers γ_k(e) of (4.39). With e = [u, u + e_i], x and y as above such that e ∈ E(γ_{x,y}), we must have (mod 2)

    x = (x_1, …, x_{i−1}, u_i, u_{i+1}, …, u_d)  and  y = (u_1, …, u_{i−1}, u_i + 1, y_{i+1}, …, y_d).

If d(x,y) = k then x and y differ in precisely k coordinates. One of them is the i-th coordinate. There remain precisely (d−1 choose k−1) free choices for the other coordinates where x and y differ. This number of choices is γ_k(e). We get

    κ_1 = κ_1(e) = (d/2^d) Σ_{k=1}^d k (d−1 choose k−1) = (d/2^d) (d+1) 2^{d−2} = d(d+1)/4,

and the new estimate 1 − λ_1 ≥ 4/(d(d+1)) is only slightly better, missing the true value by the factor of 2/(d+1).
4.41 Exercise. Compute the Poincaré constant $\kappa_1$ for the Ehrenfest model of Example 4.27, and compare the resulting estimate with the true value of $\lambda_1$, as above.
[Hint: it will turn out after some combinatorial efforts that $\kappa_1(e)$ is constant over all edges.]
4.42 Example (Card shuffling via random transpositions). What is the purpose of shuffling a deck of $N$ cards? We want to simulate the equidistribution on all $N!$ permutations of the cards. We might imagine an urn containing $N!$ decks of cards, each in a different order, and pick one of those decks at random. This is of course not possible in practice, among other reasons because $N!$ is too big. A reasonable algorithm is to first pick at random one among the $N$ cards (with probability $1/N$ each), then to pick at random one among the remaining $N-1$ cards (with probability $1/(N-1)$ each), and so on. We will indeed end up with a random permutation of the cards, such that each permutation occurs with the same probability $1/N!$.
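The sequential algorithm just described is, in essence, the classical Fisher–Yates shuffle. A minimal sketch (the function name is ours, not the book's):

```python
# Pick a uniformly random card among those not yet chosen, repeatedly.
# Each of the N! orderings is produced with probability 1/N!.
import random

def random_permutation(n, rng=random):
    """Return a uniformly random permutation of [0, ..., n-1]."""
    remaining = list(range(n))
    result = []
    while remaining:
        # one of the remaining cards, each with equal probability
        k = rng.randrange(len(remaining))
        result.append(remaining.pop(k))
    return result
```

At step $i$ there are $N - i + 1$ candidates, so a fixed outcome has probability $\frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{1} = 1/N!$, matching the text.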
Card shuffling can be formalized as a random walk on the symmetric group $S_N$ of all permutations of the set $\{1, \dots, N\}$ (an enumeration of the cards). The $n$-th shuffle corresponds to a random permutation $Y_n$ ($n = 1, 2, \dots$) in $S_N$, and a fair shuffler will perform the single shuffles such that they are independent. The random permutation obtained after $n$ shuffles will be the product $Y_1 \cdots Y_n$. Thus, if the law $\mu$ of $Y_n$ is such that the resulting random walk is irreducible (compare with Lemma 4.21), then we have a Markov chain whose stationary distribution is equidistribution on $S_N$. If, in addition, this random walk is also aperiodic, then the
98 Chapter 4. Reversible Markov chains
convergence theorem tells us that for large $n$, the distribution of $Y_1 \cdots Y_n$ will be a good approximation of that uniform distribution. This explains the true purpose of card shuffling, although one may guess that most card shufflers are not aware of such a justification of their activity.
One of the most common methods of shuffling, the riffle shuffle, has been analyzed by Bayer and Diaconis [4] in a piece of work that has become very popular; see also Mann [43] for a nice exposition. Here, we consider another shuffling model, or random walk on $S_N$, generated by random transpositions.
Throughout this example, we shall write $x, y, \dots$ for permutations of the set $\{1, \dots, N\}$, and $\mathrm{id}$ for the identity. Also, we write the composition of permutations from left to right, that is, $(xy)(i) = y\bigl(x(i)\bigr)$. The transposition of the (distinct) elements $i, j$ is denoted $\tau_{i,j}$. The law $\mu$ of our random walk is equidistribution on the set $T$ of all transpositions. Thus,
\[
p(x,y) = \begin{cases} \dfrac{2}{N(N-1)}, & \text{if } y = x\tau \text{ for some } \tau \in T,\\[4pt] 0, & \text{otherwise.} \end{cases}
\]
(Note that if $y = x\tau$ then $x = y\tau$.) The stationary probability measure and the resistances of the edges are given by
\[
m(x) = \frac{1}{N!} \quad\text{and}\quad r(e) = \frac{N(N-1)}{2}\,N!, \quad\text{where } e = [x,\ x\tau],\ x \in S_N,\ \tau \in T.
\]
For $m < N$, we consider $S_m$ as a subgroup of $S_N$ via the identification
\[
S_m = \{x \in S_N : x(i) = i \text{ for } i = m+1, \dots, N\}.
\]
Claim 1. Every $x \in S_N \setminus S_{N-1}$ has a unique decomposition
\[
x = y\,\tau, \quad\text{where } y \in S_{N-1} \text{ and } \tau \in T_N = \{\tau_{j,N} : j = 1, \dots, N-1\}.
\]
Proof. Since $x \notin S_{N-1}$, we must have $x(N) = j \in \{1, \dots, N-1\}$. Set $y = x\,\tau_{j,N}$. Then $y(N) = \tau_{j,N}\bigl(x(N)\bigr) = N$. Therefore $y \in S_{N-1}$ and $x = y\,\tau_{j,N}$. If there is another decomposition $x = y'\,\tau_{j',N}$ then $\tau_{j,N}\,\tau_{j',N} = y^{-1}y' \in S_{N-1}$, whence $j = j'$ and $y = y'$. $\square$
Thus, every $x \in S_N$ has a unique decomposition
\[
x = \tau_{j(1),m(1)} \cdots \tau_{j(k),m(k)} \tag{4.43}
\]
with $0 \le k \le N-1$, $2 \le m(1) < \dots < m(k) \le N$ and $1 \le j(i) < m(i)$ for all $i$.
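The proof of Claim 1 is constructive, and iterating it produces (4.43) by peeling off transpositions $\tau_{j,m}$ for $m = N, N-1, \dots, 2$. A minimal sketch in Python (0-indexed, composition from left to right as in the text; the function name is ours, not the book's):

```python
# Unique decomposition (4.43): repeatedly apply Claim 1 to the largest index.
# Permutations are 0-indexed lists: perm[i] = x(i); (x*y)(i) = y(x(i)).

def decompose(perm):
    """Return factors [(j, m), ...] with strictly increasing m and j < m
    such that perm = tau_{j(1),m(1)} * ... * tau_{j(k),m(k)}."""
    x = list(perm)
    factors = []
    for m in range(len(x) - 1, 0, -1):
        j = x[m]                      # j = x(m)
        if j != m:
            factors.append((j, m))
            # replace x by y = x * tau_{j,m}: swap the values j and m
            x = [m if v == j else (j if v == m else v) for v in x]
    assert x == list(range(len(perm)))    # what remains is the identity
    return factors[::-1]
```

For example, the cycle $x = (1\,2\,3)$, i.e. `[1, 2, 0]` in 0-indexed form, decomposes as $\tau_{0,1}\,\tau_{0,2}$, consistent with the ordering constraints of (4.43).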
In the graph $\Gamma(P)$, we now consider only those edges $[x,\ x\tau]$ and $[x\tau,\ x]$, where $x \in S_m$ and $\tau \in T_{m'} = \{\tau_{j,m'} : j < m'\}$ with $m' > m$. Then we obtain a spanning tree of $\Gamma(P)$, that is, a subgraph that contains all vertices but no cycle. (A cycle is a
sequence $[x_0, x_1, \dots, x_{k-1}, x_k = x_0]$ such that $k \ge 3$, $x_0, \dots, x_{k-1}$ are distinct, and $[x_{i-1}, x_i] \in E$ for $i = 1, \dots, k$.) The tree is rooted; the root vertex is $\mathrm{id}$.
In Figure 10, this tree is shown for $S_4$. Since the graph is symmetric, we have drawn non-oriented edges. Each edge $[x,\ x\tau]$ is labelled with the transposition $\tau = \tau_{i,j} = (i\,j)$, written in cycle notation. The permutations corresponding to the vertices are obtained by multiplying the transpositions along the edges on the shortest path (geodesic) from $\mathrm{id}$ to the respective vertex. Those permutations are again written in cycle notation in Figure 10.
[Figure 10. The spanning tree of $S_4$, rooted at $\mathrm{id}$; edges are labelled by the transpositions (12), (13), (23), (14), (24), (34), and vertices by the corresponding products in cycle notation.]
If $x \in S_N \setminus \{\mathrm{id}\}$ is decomposed into transpositions as in (4.43), then we choose $\pi_{\mathrm{id},x} = [x_0 = \mathrm{id},\ x_1, \dots, x_k = x]$ with $x_i = \tau_{j(1),m(1)} \cdots \tau_{j(i),m(i)}$. This is the shortest path from $\mathrm{id}$ to $x$ in the spanning tree. Then we let $\pi_{x,y} = x \cdot \pi_{\mathrm{id},\,x^{-1}y}$.
For a transposition $\tau = \tau_{j,m}$, where $j < m$, we say that an edge $e$ of $\Gamma(P)$ is of type $\tau$ if it has the form $e = [u,\ u\tau]$ with $u \in S_N$. We want to determine the number $\kappa_+$ of (4.38).
Claim 2. If $e = [u,\ u\tau]$ is an edge of type $\tau$ then
\[
\gamma(e) = \bigl|\{x \in S_N \setminus \{\mathrm{id}\} : \pi_{\mathrm{id},x} \text{ contains an edge of type } \tau\}\bigr|.
\]
Proof. Let $\Pi(e) = \bigl\{(x,y) \in S_N^2 : x \ne y,\ e \in E(\pi_{x,y})\bigr\}$ and
\[
\Sigma(\tau) = \bigl\{x \in S_N \setminus \{\mathrm{id}\} : \pi_{\mathrm{id},x} \text{ contains an edge of type } \tau\bigr\}.
\]
Then $\gamma(e) = |\Pi(e)|$ by definition. We show that the mapping $(x,y) \mapsto x^{-1}y$ is a bijection from $\Pi(e)$ to $\Sigma(\tau)$.
First of all, if $(x,y) \in \Pi(e)$ then by definition of $\pi_{x,y}$, the edge $[x^{-1}u,\ x^{-1}u\tau]$ belongs to $\pi_{\mathrm{id},\,x^{-1}y}$. Therefore $x^{-1}y \in \Sigma(\tau)$.
Second, if $w \in \Sigma(\tau)$ then the decomposition (4.43) of $w$ (in the place of $x$) contains $\tau$. We can write this decomposition as $w = w_1\,\tau\,w_2$. Then the unique edge of type $\tau$ on $\pi_{\mathrm{id},w}$ is $[w_1,\ w_1\tau]$. We set $x = u\,w_1^{-1}$ and $y = u\,\tau\,w_2$. Then $x^{-1}y = w$, so that $\pi_{x,y} = x \cdot \pi_{\mathrm{id},w}$ contains the edge $[x\,w_1,\ x\,w_1\tau] = e$. Thus, the mapping is surjective, and since the edge of type $\tau$ on $\pi_{\mathrm{id},w}$ is unique, the mapping is also one-to-one (injective). $\square$
So we next have to compute the cardinality of $\Sigma(\tau)$. Let $\tau = (j\,m)$ with $j < m$. By (4.43), every $x \in \Sigma(\tau)$ can be written uniquely as $x = u\,\tau\,y$, where $u \in S_{m-1}$ (and any such $u$ may occur) and $y$ is any element of the form $y = \tau_{j(1),m(1)} \cdots \tau_{j(k),m(k)}$ with $m < m(1) < \dots < m(k) \le N$ and $j(i) < m(i)$ for all $i$. (This includes the case $k = 0$, $y = \mathrm{id}$.) Thus, we have precisely $(m-1)!$ choices for $u$. On the other hand, since every element of $S_N$ can be written uniquely as $w\,y$, where $w \in S_m$ and $y$ has the same form as above, we conclude that the set of all valid elements $y$ forms a set of representatives of the right cosets of $S_m$ in $S_N$:
\[
S_N = \bigcup_{y} S_m\,y.
\]
The number of those cosets is $N!/m!$, and this is also the number of choices for $y$ in the decomposition $x = u\,\tau\,y$. Therefore, if $e$ is an edge of type $\tau = (j\,m)$ then
\[
\gamma(e) = |\Sigma(\tau)| = (m-1)!\,\frac{N!}{m!} = \frac{N!}{m}.
\]
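As a sanity check (hypothetical code, not from the book), one can confirm $|\Sigma(\tau)| = N!/m$ for $N = 4$ by decomposing all 24 permutations as in (4.43) and counting those whose decomposition contains the factor $\tau_{j,m}$. The code below is 0-indexed, so the book's $m$ corresponds to `m + 1`:

```python
# Brute-force count of |Sigma(tau)| for N = 4, using the peeling
# procedure behind Claim 1 to compute the decomposition (4.43).
from itertools import permutations
from math import factorial

def decompose(perm):
    # unique decomposition (4.43); 0-indexed values, factors (j, m), j < m
    x = list(perm)
    factors = []
    for m in range(len(x) - 1, 0, -1):
        j = x[m]
        if j != m:
            factors.append((j, m))
            x = [m if v == j else (j if v == m else v) for v in x]
    return factors[::-1]

N = 4
for m in range(1, N):          # 0-indexed m = 1, ..., N-1 (book: 2, ..., N)
    for j in range(m):
        count = sum((j, m) in decompose(p) for p in permutations(range(N)))
        # book formula N!/m with the 1-indexed m, i.e. m + 1 here
        assert count == factorial(N) // (m + 1)
```

The counts come out as 12, 8 and 6 for the book's $m = 2, 3, 4$, in agreement with $N!/m$.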
The maximum is obtained when $m = 2$, that is, when $\tau = (1\,2)$. We compute
\[
\kappa_+ = \frac{1}{N!}\,(N-1)\,\frac{N(N-1)}{2}\,\frac{N!}{2} = \frac{N(N-1)^2}{4},
\quad\text{and}\quad
\lambda_1 = \lambda_1(P) \le 1 - \frac{4}{N(N-1)^2}.
\]
Again, this random walk has period 2, while we need an aperiodic one in order to approximate the equidistribution on $S_N$. Therefore we modify the random transposition of each single shuffle as follows: we select independently and at random (with probability $1/N$ each) two indices $i, j \in \{1, \dots, N\}$ and exchange the corresponding cards when $i \ne j$. When $i = j$, no card is moved. The transition probabilities of the resulting random walk on $S_N$ are
\[
q(x,y) = \begin{cases} 1/N, & \text{if } y = x,\\ 2/N^2, & \text{if } y = x\tau \text{ with } \tau \in T,\\ 0, & \text{in all other cases.} \end{cases}
\]
In terms of transition matrices, $Q = \frac{1}{N}\,I + \frac{N-1}{N}\,P$. Therefore
\[
\lambda_1(Q) = \frac{1}{N} + \frac{N-1}{N}\,\lambda_1(P) \le 1 - \frac{4}{N^2(N-1)}.
\]
By Exercise 4.17, $\lambda_{\min}(Q) \ge -1 + \frac{2}{N}$. Therefore
\[
\lambda_+(Q) \le 1 - \frac{4}{N^2(N-1)}.
\]
Equation (4.23) leads to
\[
\|q^{(n)}(x,\cdot) - m\|_1 \le \sqrt{N! - 1}\,\Bigl(1 - \frac{4}{N^2(N-1)}\Bigr)^{n}.
\]
Thus, if we want to be sure that $\|q^{(n)}(x,\cdot) - m\|_1 < e^{-C}$ then we can choose
\[
n \ge \Bigl(C + \tfrac{1}{2}\log(N! - 1)\Bigr) \Big/ \Bigl(-\log\Bigl(1 - \frac{4}{N^2(N-1)}\Bigr)\Bigr)
\approx \frac{N^3}{4}\,C + \frac{N^4}{8}\,\log N.
\]
The above bound for $\lambda_+(Q)$ can be improved by additional combinatorial efforts. However, also in this case the true value is known: $\lambda_+(Q) = 1 - \frac{2}{N}$; see Diaconis and Shahshahani [14].
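For small $N$ this is easy to verify directly. The sketch below (not from the book; names are ours) builds $Q$ on $S_4$ with NumPy and compares its spectrum with the exact value $1 - 2/N$ and with the Poincaré bound:

```python
# Lazy random transposition chain Q on S_N, checked numerically for N = 4:
# lambda_+(Q) = 1 - 2/N, and lambda_1(Q) <= 1 - 4/(N^2 (N-1)).
from itertools import permutations
import numpy as np

def transposition_chain(N):
    states = list(permutations(range(N)))
    index = {s: k for k, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for s in states:
        Q[index[s], index[s]] = 1.0 / N          # i = j: no card moved
        for i in range(N):
            for j in range(i + 1, N):
                t = list(s)
                t[i], t[j] = t[j], t[i]          # neighbour differing by a transposition
                Q[index[s], index[tuple(t)]] = 2.0 / N**2
    return Q

N = 4
Q = transposition_chain(N)
lam = np.sort(np.linalg.eigvalsh(Q))
lam_plus = max(abs(lam[0]), lam[-2])
assert abs(lam_plus - (1 - 2 / N)) < 1e-10
assert lam[-2] <= 1 - 4 / (N**2 * (N - 1)) + 1e-12
```

For $N = 4$ the spectrum of $Q$ is $\{1, 1/2, 1/4, 0, -1/2\}$, so indeed $\lambda_+(Q) = 1/2 = 1 - 2/N$, well inside the Poincaré bound $1 - 1/12$.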
D Recurrence of infinite networks

In this section, we assume that $\mathcal{N} = (X, E, r)$ is an infinite network associated with a reversible, irreducible Markov chain $(X, P)$. We want to establish a set of recurrence (resp. transience) criteria that can actually be applied in a variety of cases. The network is called recurrent (resp. transient) if $(X, P)$ has the respective property.
We shall need very basic properties of Hilbert spaces, namely the Cauchy–Schwarz inequality and the fact that any non-empty closed convex set in a Hilbert space has a unique element with minimal norm.
The Dirichlet space $\mathcal{D}(\mathcal{N})$ associated with the network consists of all functions $f$ on $X$ (not necessarily in $\ell^2(X,m)$) such that $\nabla f \in \ell^2_\flat(E,r)$. The Dirichlet norm $D(f)$ of such a function was defined in (4.32). The kernel of this quasi-norm consists of the constant functions on $X$. On the space $\mathcal{D}(\mathcal{N})$, we define an inner product with respect to a reference point $o \in X$:
\[
(f,g)_D = (f,g)_{D,o} = (\nabla f, \nabla g) + f(o)g(o).
\]
4.44 Lemma. (a) $\mathcal{D}(\mathcal{N})$ is a Hilbert space.
(b) For each $x \in X$, there is a constant $C_x > 0$ such that
\[
C_x^{-1}\,(f,f)_{D,o} \le (f,f)_{D,x} \le C_x\,(f,f)_{D,o} \quad\text{for all } f \in \mathcal{D}(\mathcal{N}).
\]
That is, changing the reference point $o$ gives rise to an equivalent Hilbert space norm.
(c) Convergence of a sequence of functions in $\mathcal{D}(\mathcal{N})$ implies their pointwise convergence.
Proof. Let $x \in X \setminus \{o\}$. By connectedness of the graph $\Gamma(P)$, there is a path $\pi_{o,x} = [o = x_0, x_1, \dots, x_k = x]$ with edges $e_i = [x_{i-1}, x_i] \in E$. Then for $f \in \mathcal{D}(\mathcal{N})$, using the Cauchy–Schwarz inequality as in Theorem 4.36,
\[
\bigl(f(x) - f(o)\bigr)^2
= \biggl(\sum_{i=1}^{k} \frac{f(x_i) - f(x_{i-1})}{\sqrt{r(e_i)}}\,\sqrt{r(e_i)}\biggr)^{\!2}
\le c_x\,D(f),
\]
where $c_x = \sum_{i=1}^{k} r(e_i) = |\pi_{o,x}|_r$. Therefore
\[
f(x)^2 \le 2\bigl(f(x) - f(o)\bigr)^2 + 2f(o)^2 \le 2c_x\,D(f) + 2f(o)^2,
\]
and
\[
(f,f)_{D,x} = D(f) + f(x)^2 \le (2c_x + 1)\,D(f) + 2f(o)^2 \le C_x\,(f,f)_{D,o},
\]
D. Recurrence of infinite networks 103
where $C_x = \max\{2c_x + 1,\ 2\}$. Exchanging the roles of $o$ and $x$, we get $(f,f)_{D,o} \le C_x\,(f,f)_{D,x}$ with the same constant $C_x$. This proves (b).
We next show that $\mathcal{D}(\mathcal{N})$ is complete. Let $(f_n)$ be a Cauchy sequence in $\mathcal{D}(\mathcal{N})$, and let $x \in X$. Then, by (b),
\[
\bigl(f_m(x) - f_n(x)\bigr)^2 \le (f_m - f_n,\ f_m - f_n)_{D,x} \to 0 \quad\text{as } m,n \to \infty.
\]
Therefore there is a function $f$ on $X$ such that $f_n \to f$ pointwise. On the other hand, as $m,n \to \infty$,
\[
\bigl(\nabla(f_m - f_n),\ \nabla(f_m - f_n)\bigr) = D(f_m - f_n) \le (f_m - f_n,\ f_m - f_n)_{D,o} \to 0.
\]
Thus $(\nabla f_n)$ is a Cauchy sequence in the Hilbert space $\ell^2_\flat(E,r)$. Hence, there is $\phi \in \ell^2_\flat(E,r)$ such that $\nabla f_n \to \phi$ in that Hilbert space. Convergence of a sequence of functions in the latter implies pointwise convergence. Thus
\[
\phi(e) = \lim_{n\to\infty} \frac{f_n(e^+) - f_n(e^-)}{r(e)} = \nabla f(e)
\]
for each edge $e \in E$. We obtain $D(f) = (\phi,\phi) < \infty$, so that $f \in \mathcal{D}(\mathcal{N})$. To conclude the proof of (a), we must show that $(f_n - f,\ f_n - f)_{D,o} \to 0$ as $n \to \infty$. But this is true, since both $D(f_n - f) = (\nabla f_n - \phi,\ \nabla f_n - \phi)$ and $\bigl(f_n(o) - f(o)\bigr)^2$ tend to 0, as we have just seen.
The proof of (c) is contained in what we have proved above. $\square$
4.45 Exercise. Extension of Exercise 4.10. Show that $\nabla^*(\nabla f) = \mathcal{L}f$ for every $f \in \mathcal{D}(\mathcal{N})$, even when the graph $\Gamma(P)$ is not locally finite.
[Hint: use the Cauchy–Schwarz inequality once more to check that for each $x$, the sum $\sum_{y\,:\,[x,y]\in E} |f(x) - f(y)|\,a(x,y)$ is finite.]
We denote by $\mathcal{D}_0(\mathcal{N})$ the closure in $\mathcal{D}(\mathcal{N})$ of the linear space $\ell_0(X)$ of all finitely supported real functions on $X$. Since $\nabla$ is a bounded operator, $\ell^2(X,m) \subseteq \mathcal{D}_0(\mathcal{N})$ as sets, but in general the Hilbert space norms are not comparable, and equality in the place of $\subseteq$ does in general not hold.
Recall the definitions (2.15) and (2.16) of the restriction of $P$ to a subset $A$ of $X$ and the associated Green kernel $G_A(\cdot,\cdot)$, which is finite by Lemma 2.18.

4.46 Lemma. Suppose that $A \subseteq X$ is finite, $x \in A$, and let $f \in \ell_0(X)$ be such that $\mathrm{supp}(f) \subseteq A$. Then
\[
\bigl(\nabla f,\ \nabla G_A(\cdot,x)\bigr) = m(x)\,f(x).
\]
Proof. The functions $f$ and $G_A(\cdot,x)$ are 0 outside $A$, and we can use Exercise 4.45:
\[
\bigl(\nabla f,\ \nabla G_A(\cdot,x)\bigr) = \bigl(f,\ (I - P)\,G_A(\cdot,x)\bigr)
= \sum_{y} f(y)\Bigl(G_A(y,x) - \sum_{w \in A} p(y,w)\,G_A(w,x)\Bigr)
= \bigl(f,\ (I - P_A)\,G_A(\cdot,x)\bigr) = (f,\ 1_x),
\]
where in the middle sum the terms with $w \notin A$ vanish, since $G_A(w,x) = 0$ for such $w$. In the last step, we have used (2.17) with $z = 1$. $\square$
4.47 Lemma. If $(X,P)$ is transient, then $G(\cdot,x) \in \mathcal{D}_0(\mathcal{N})$ for every $x \in X$.
Proof. Let $A \subseteq B$ be two finite subsets of $X$ containing $x$. Setting $f = G_A(\cdot,x)$, Lemma 4.46 yields
\[
\bigl(\nabla G_A(\cdot,x),\ \nabla G_A(\cdot,x)\bigr) = \bigl(\nabla G_A(\cdot,x),\ \nabla G_B(\cdot,x)\bigr) = m(x)\,G_A(x,x).
\]
Analogously, setting $f = G_B(\cdot,x)$,
\[
\bigl(\nabla G_B(\cdot,x),\ \nabla G_B(\cdot,x)\bigr) = m(x)\,G_B(x,x).
\]
Therefore
\[
D\bigl(G_B(\cdot,x) - G_A(\cdot,x)\bigr)
= \bigl(\nabla G_B(\cdot,x), \nabla G_B(\cdot,x)\bigr) - 2\bigl(\nabla G_A(\cdot,x), \nabla G_B(\cdot,x)\bigr) + \bigl(\nabla G_A(\cdot,x), \nabla G_A(\cdot,x)\bigr)
= m(x)\bigl(G_B(x,x) - G_A(x,x)\bigr).
\]
Now let $(A_k)_{k\ge1}$ be an increasing sequence of finite subsets of $X$ with union $X$. Using (2.15), we see that $p^{(n)}_{A_k}(x,y) \to p^{(n)}(x,y)$ monotonically from below as $k \to \infty$, for each fixed $n$. Therefore, by monotone convergence, $G_{A_k}(x,x)$ tends to $G(x,x)$ monotonically from below. Hence, by the above, the sequence of functions $\bigl(G_{A_k}(\cdot,x)\bigr)_{k\ge1}$ is a Cauchy sequence in $\mathcal{D}(\mathcal{N})$. By Lemma 4.44, it converges in $\mathcal{D}(\mathcal{N})$ to its pointwise limit, which is $G(\cdot,x)$. Thus, $G(\cdot,x)$ is the limit in $\mathcal{D}(\mathcal{N})$ of a sequence of finitely supported functions. $\square$
4.48 Definition. Let $x \in X$ and $i_0 \in \mathbb{R}$. A flow with finite power from $x$ to $\infty$ with input $i_0$ (also called unit flow when $i_0 = 1$) in the network $\mathcal{N}$ is a function $\phi \in \ell^2_\flat(E,r)$ such that
\[
\nabla^*\phi(y) = \frac{i_0}{m(x)}\,1_x(y) \quad\text{for all } y \in X.
\]
Its power is $(\phi,\phi)$.¹

¹ Many authors call $(\phi,\phi)$ the energy of $\phi$, but in the correct physical interpretation, this should be the power.
The condition means that Kirchhoff's node law is satisfied at every point except $x$:
\[
\sum_{e \in E\,:\,e^- = y} \phi(e) = \begin{cases} 0, & y \ne x,\\ i_0, & y = x. \end{cases}
\]
As explained at the beginning of this chapter, we may think of the network as a system of tubes; each edge $e$ is a tube with length $r(e)$ cm and cross-section 1 cm², and the tubes are connected at their endpoints (vertices) according to the given graph structure. The network is filled with (incompressible) liquid, and at the source $x$, liquid is injected at a constant rate of $i_0$ liters per second. Requiring that this be possible with finite power (effort) $(\phi,\phi)$ is absurd if the network is finite (unless $i_0 = 0$). The main purpose of this section is to show that the existence of such flows characterizes transient networks: even though the network is filled, it is so big at infinity that the permanently injected liquid can flow off towards infinity at the cost of a finite effort. With this interpretation, recurrent networks correspond more to our intuition of the real world. An analogous interpretation can of course be given in terms of voltages and electric current.
4.49 Definition. The capacity of a point $x \in X$ is
\[
\mathrm{cap}(x) = \inf\{D(f) : f \in \ell_0(X),\ f(x) = 1\}.
\]
4.50 Lemma. It holds that $\mathrm{cap}(x) = \min\{D(f) : f \in \mathcal{D}_0(\mathcal{N}),\ f(x) = 1\}$, and the minimum is attained by a unique function in this set.
Proof. First of all, consider the closure $\overline{C}$ of $C = \{f \in \ell_0(X) : f(x) = 1\}$ in $\mathcal{D}(\mathcal{N})$. Every function in $\overline{C}$ must be in $\mathcal{D}_0(\mathcal{N})$, and (since convergence in $\mathcal{D}(\mathcal{N})$ implies pointwise convergence) has value 1 in $x$. We see that the inclusion $\overline{C} \subseteq \{f \in \mathcal{D}_0(\mathcal{N}) : f(x) = 1\}$ holds. Conversely, let $f \in \mathcal{D}_0(\mathcal{N})$ with $f(x) = 1$. By definition of $\mathcal{D}_0(\mathcal{N})$ there is a sequence of functions $f_n$ in $\ell_0(X)$ such that $f_n \to f$ in $\mathcal{D}(\mathcal{N})$, and in particular $z_n^{-1} := f_n(x) \to f(x) = 1$. But then $z_n f_n \in C$ and $z_n f_n \to f$ in $\mathcal{D}(\mathcal{N})$, since in addition to convergence in the point $x$ we have $D(z_n f_n - f_n) = (z_n - 1)^2 D(f_n) \to 0 \cdot D(f) = 0$, that is, $f_n$ and $z_n f_n$ have the same limit in $\mathcal{D}(\mathcal{N})$. We conclude that
\[
\overline{C} = \{f \in \mathcal{D}_0(\mathcal{N}) : f(x) = 1\}.
\]
This is a closed convex set in the Hilbert space. It is a basic theorem in Hilbert space theory that such a set possesses a unique element with minimal norm; see e.g. Rudin [Ru, Theorem 4.10]. Thus, if this element is $f_0$ then
\[
D(f_0) = \min\{D(f) : f \in \overline{C}\} = \inf\{D(f) : f \in C\} = \mathrm{cap}(x),
\]
since $D(\cdot)$ is continuous. $\square$
The flow criterion is now part (b) of the following useful set of necessary and sufficient transience criteria.

4.51 Theorem. For the network $\mathcal{N}$ associated with a reversible Markov chain $(X,P)$, the following statements are equivalent.
(a) The network is transient.
(b) For some ($\Longleftrightarrow$ every) $x \in X$, there is a flow from $x$ to $\infty$ with non-zero input and finite power.
(c) For some ($\Longleftrightarrow$ every) $x \in X$, one has $\mathrm{cap}(x) > 0$.
(d) The constant function $1$ does not belong to $\mathcal{D}_0(\mathcal{N})$.
Proof. (a) $\Rightarrow$ (b). If the network is transient, then $G(\cdot,x) \in \mathcal{D}_0(\mathcal{N})$ by Lemma 4.47. We define $\phi = \frac{i_0}{m(x)}\,\nabla G(\cdot,x)$. Then $\phi \in \ell^2_\flat(E,r)$, and by Exercise 4.45
\[
\nabla^*\phi = \frac{i_0}{m(x)}\,\mathcal{L}G(\cdot,x) = \frac{i_0}{m(x)}\,1_x.
\]
Thus, $\phi$ is a flow from $x$ to $\infty$ with input $i_0$ and finite power.
(b) $\Rightarrow$ (c). Suppose that there is a flow $\phi$ from $x$ to $\infty$ with input $i_0 \ne 0$ and finite power. We may normalize $\phi$ such that $i_0 = 1$. Now let $f \in \ell_0(X)$ with $f(x) = 1$. Then
\[
(\nabla f,\ \phi) = (f,\ \nabla^*\phi) = \Bigl(f,\ \frac{1}{m(x)}\,1_x\Bigr) = f(x) = 1.
\]
Hence, by the Cauchy–Schwarz inequality,
\[
1 = |(\nabla f,\ \phi)|^2 \le (\nabla f, \nabla f)\,(\phi,\phi) = D(f)\,(\phi,\phi).
\]
We obtain $\mathrm{cap}(x) \ge 1/(\phi,\phi) > 0$.
(c) $\Rightarrow$ (d). This follows from Lemma 4.50: we have $\mathrm{cap}(x) = 0$ if and only if there is $f \in \mathcal{D}_0(\mathcal{N})$ with $f(x) = 1$ and $D(f) = 0$, that is, $f = 1$ is in $\mathcal{D}_0(\mathcal{N})$. Indeed, connectedness of the network implies that a function with $D(f) = 0$ must be constant.
(c) $\Rightarrow$ (a). Let $A \subseteq X$ be finite, with $x \in A$. Set $f = G_A(\cdot,x)/G_A(x,x)$. Then $f \in \ell_0(X)$ and $f(x) = 1$. We use Lemma 4.46 and get
\[
\mathrm{cap}(x) \le D(f) = \frac{1}{G_A(x,x)^2}\,\bigl(\nabla G_A(\cdot,x),\ \nabla G_A(\cdot,x)\bigr) = \frac{m(x)}{G_A(x,x)}.
\]
We obtain that $G_A(x,x) \le m(x)/\mathrm{cap}(x)$ for every finite $A \subseteq X$ containing $x$. Now take an increasing sequence $(A_k)_{k\ge1}$ of finite sets containing $x$, whose union is $X$. Then, by monotone convergence,
\[
G(x,x) = \lim_{k\to\infty} G_{A_k}(x,x) \le m(x)/\mathrm{cap}(x) < \infty. \qquad\square
\]
4.52 Exercise. Prove the following in the transient case. The flow from $x$ to $\infty$ with input 1 and minimal power is given by $\phi = \nabla G(\cdot,x)/m(x)$, its power (the resistance between $x$ and $\infty$) is $G(x,x)/m(x)$, and $\mathrm{cap}(x) = m(x)/G(x,x)$.
The flow criterion became popular through the work of T. Lyons [42]. A few years earlier, Yamasaki [50] had proved the equivalence of the statements of Theorem 4.51 for locally finite networks in a less common terminology of potential theory on networks. The interpretation in terms of Markov chains is not present in [50]; instead, finiteness of the Green kernel $G(x,y)$ is formulated in a non-probabilistic spirit. For non locally finite networks, see Soardi and Yamasaki [48].
If $A \subseteq X$ then we write $E_A$ for the set of edges with both endpoints in $A$, and $\partial A$ for the set of all edges $e$ with $e^- \in A$ and $e^+ \in X \setminus A$ (the edge boundary of $A$). Below in Example 4.63 we shall need the following.
4.53 Lemma. Let $\phi$ be a flow from $x$ to $\infty$ with input $i_0$, and let $A \subseteq X$ be finite with $x \in A$. Then
\[
\sum_{e \in \partial A} \phi(e) = i_0.
\]
Proof. Recall that $\phi(\check{e}) = -\phi(e)$ for each edge. Thus,
\[
\sum_{e \in E\,:\,e^- = y} \phi(e) = i_0\,1_x(y) \quad\text{for each } y \in X, \quad\text{and}\quad
\sum_{y \in A}\ \sum_{e \in E\,:\,e^- = y} \phi(e) = i_0.
\]
If $e \in E_A$ then both $e$ and $\check{e}$ appear precisely once in the above double sum, and the two of them contribute to the sum by $\phi(\check{e}) + \phi(e) = 0$. Thus, the sum reduces to all those edges $e$ which have only one endpoint in $A$, that is,
\[
\sum_{y \in A}\ \sum_{e \in E\,:\,e^- = y} \phi(e) = \sum_{e \in \partial A} \phi(e). \qquad\square
\]
Before using this, we consider a corollary of Theorem 4.51:

4.54 Exercise (Definition). A subnetwork $\mathcal{N}' = (X', E', r')$ of $\mathcal{N} = (X, E, r)$ is a connected graph with vertex set $X' \subseteq X$ and symmetric edge set $E' \subseteq E_{X'}$ such that $r'(e) \ge r(e)$ for each $e \in E'$. (That is, $a'(x,y) \le a(x,y)$ for all $x, y \in X'$.) We call it an induced subnetwork if $r'(e) = r(e)$ for each $e \in E'$.
Use the flow criterion to show that transience of a subnetwork $\mathcal{N}'$ implies transience of $\mathcal{N}$.
Thus, if simple random walk on some infinite, connected, locally finite graph is recurrent, then SRW on any subgraph is also recurrent.
4.55 Corollary. Let $P$ and $Q$ be the transition matrices of two irreducible, reversible Markov chains on the same state space $X$, and let $D_P$ and $D_Q$ be the associated Dirichlet norms. Suppose that there is a constant $c > 0$ such that
\[
c\,D_Q(f) \le D_P(f) \quad\text{for each } f \in \ell_0(X).
\]
Then transience of $(X,Q)$ implies transience of $(X,P)$.

Proof. The inequality implies that for the capacities associated with $P$ and $Q$ (respectively), one has
\[
\mathrm{cap}_P(x) \ge c\,\mathrm{cap}_Q(x).
\]
The statement now follows from criterion (c) of Theorem 4.51. $\square$
We remark here that the notation $D_P(f)$ is slightly ambiguous, since the resistances of the edges in the network depend on the choice of the reversing measure $m_P$ for $P$: if we multiply the latter by a constant, then the Dirichlet norm is multiplied by that constant. However, none of the properties that we are studying change, and in the above inequality one only has to adjust the value of $c > 0$.
Now let $\Gamma = (X, E)$ be an arbitrary connected, locally finite, symmetric graph, and let $d(\cdot,\cdot)$ be the graph distance, that is, $d(x,y)$ is the minimal length (number of edges) of a path from $x$ to $y$. For $k \in \mathbb{N}$, we define the $k$-fuzz $\Gamma^{(k)}$ of $\Gamma$ as the graph with the same vertex set $X$, where two points $x, y$ are connected by an edge if and only if $1 \le d(x,y) \le k$.
4.56 Proposition. Let $\Gamma$ be a connected graph with uniformly bounded vertex degrees. Then SRW on $\Gamma$ is recurrent if and only if SRW on the $k$-fuzz $\Gamma^{(k)}$ is recurrent.
Proof. Recurrence or transience of SRW on $\Gamma$ does not depend on the presence of loops (they have no effect on flows). Therefore we may assume that $\Gamma$ has no loops. Then, as a network, $\Gamma$ is a subnetwork of $\Gamma^{(k)}$. Thus, recurrence of SRW on $\Gamma^{(k)}$ implies recurrence of SRW on $\Gamma$.
Conversely, let $x, y \in X$ with $1 \le d = d(x,y) \le k$. Choose a path $\pi_{x,y} = [x = x_0, x_1, \dots, x_d = y]$ in $X$. Then for any function $f$ on $X$,
\[
\bigl(f(y) - f(x)\bigr)^2 = \biggl(\sum_{i=1}^{d} 1 \cdot \bigl(f(x_i) - f(x_{i-1})\bigr)\biggr)^{\!2}
\le d \sum_{e \in E(\pi_{x,y})} \bigl(f(e^+) - f(e^-)\bigr)^2
\]
by the Cauchy–Schwarz inequality. Here, $E(\pi_{x,y}) = \{[x_0,x_1], \dots, [x_{d-1},x_d]\}$ stands for the set of edges of $X$ on that path.
We write $\Pi = \{\pi_{x,y} : x, y \in X,\ 1 \le d(x,y) \le k\}$. Since the graph $X$ has uniformly bounded vertex degrees, there is a bound $M = M_k$ such that any edge $e$ of $X$ lies on at most $M$ paths in $\Pi$ with length at most $k$. (Exercise: verify
E. Random walks on integer lattices 109
this fact!) We obtain for $f \in \ell_0(X)$ and the Dirichlet forms associated with SRW on $\Gamma$ and $\Gamma^{(k)}$ (respectively)
\[
\begin{aligned}
D_{\Gamma^{(k)}}(f) &= \frac{1}{2} \sum_{(x,y)\,:\,1 \le d(x,y) \le k} \bigl(f(y) - f(x)\bigr)^2
\le \frac{1}{2} \sum_{(x,y)\,:\,1 \le d(x,y) \le k} d(x,y) \sum_{e \in E(\pi_{x,y})} \bigl(f(e^+) - f(e^-)\bigr)^2\\
&\le \frac{k}{2} \sum_{\pi \in \Pi}\ \sum_{e \in E(\pi)} \bigl(f(e^+) - f(e^-)\bigr)^2
= \frac{k}{2} \sum_{e \in E}\ \sum_{\pi \in \Pi\,:\,e \in E(\pi)} \bigl(f(e^+) - f(e^-)\bigr)^2\\
&\le \frac{k}{2} \sum_{e \in E} M\,\bigl(f(e^+) - f(e^-)\bigr)^2 = kM\,D_{\Gamma}(f).
\end{aligned}
\]
Thus, we can apply Corollary 4.55 with $c = 1/(kM)$, and find that recurrence of SRW on $\Gamma$ implies recurrence of SRW on $\Gamma^{(k)}$. $\square$
E Random walks on integer lattices

When we speak of $\mathbb{Z}^d$ as a graph, we have in mind the $d$-dimensional lattice whose vertex set is $\mathbb{Z}^d$, and two points are neighbours if their Euclidean distance is 1, that is, they differ by 1 in precisely one coordinate. In particular, $\mathbb{Z}$ will stand for the graph which is a two-way infinite path. We shall now discuss recurrence of SRW on $\mathbb{Z}^d$.

Simple random walk on $\mathbb{Z}^d$

Since $\mathbb{Z}^d$ is an Abelian group, SRW is the random walk on this group in the sense of (4.18) (written additively) whose law is the equidistribution on the set of integer unit vectors $\{\pm e_1, \dots, \pm e_d\}$.
4.57 Example (Dimension $d = 1$). We first propose different ways for showing that SRW on $\mathbb{Z}$ is recurrent. This is the infinite drunkard's walk of Example 3.5 with $p = q = 1/2$.
4.58 Exercise. (a) Use the flow criterion for proving recurrence of SRW on $\mathbb{Z}$.
(b) Use (1.50) to compute $p^{(2n)}(0,0)$ explicitly: show that
\[
p^{(2n)}(0,0) = \frac{1}{2^{2n}} \binom{2n}{n}.
\]
(c) Use generating functions, and in particular (3.6), to verify that $G(0,0|z) = 1/\sqrt{1 - z^2}$, and deduce the above formula for $p^{(2n)}(0,0)$ by writing down the power series expansion of that function.
(d) Use Stirling's formula to obtain the asymptotic evaluation
\[
p^{(2n)}(0,0) \sim \frac{1}{\sqrt{\pi n}}. \tag{4.59}
\]
Deduce recurrence from this formula.
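The closed formula in (b) and the asymptotics (4.59) are easy to cross-check numerically; the following sketch (not part of the exercise; the function name is ours) iterates the convolution of SRW on $\mathbb{Z}$ directly:

```python
# Cross-check: p^(2n)(0,0) = C(2n,n)/2^(2n) for SRW on Z, and the
# Stirling asymptotics p^(2n)(0,0) ~ 1/sqrt(pi*n).
from math import comb, pi, sqrt

def return_probs(steps):
    """p^(n)(0,0) for SRW on Z, n = 0..steps, via direct convolution."""
    dist = {0: 1.0}
    probs = [1.0]
    for _ in range(steps):
        new = {}
        for k, p in dist.items():
            new[k - 1] = new.get(k - 1, 0.0) + p / 2
            new[k + 1] = new.get(k + 1, 0.0) + p / 2
        dist = new
        probs.append(dist.get(0, 0.0))
    return probs

probs = return_probs(40)
for n in range(1, 21):
    assert abs(probs[2 * n] - comb(2 * n, n) / 4**n) < 1e-12
# the ratio to 1/sqrt(pi*n) approaches 1 (it is roughly 1 - 1/(8n))
assert abs(probs[40] * sqrt(pi * 20) - 1) < 0.02
```

The odd-step return probabilities are 0, reflecting the period 2 of the walk.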
Before passing to dimension 2, we consider the following variant $Q$ of SRW $P$ on $\mathbb{Z}$:
\[
q(k,\,k \pm 1) = 1/4, \quad q(k,k) = 1/2, \quad\text{and}\quad q(k,l) = 0 \text{ if } |k - l| \ge 2. \tag{4.60}
\]
Then $Q = \frac{1}{2}I + \frac{1}{2}P$, and Exercise 1.48 yields
\[
G_Q(0,0|z) = \frac{1}{\sqrt{1 - z}}. \tag{4.61}
\]
This can also be computed directly, as in Examples 2.10 and 3.5. Therefore
\[
q^{(n)}(0,0) = \frac{1}{4^n}\binom{2n}{n} \sim \frac{1}{\sqrt{\pi n}} \quad\text{as } n \to \infty. \tag{4.62}
\]
This return probability can also be determined by combinatorial arguments: of the $n$ steps, a certain number $k$ will go by 1 unit to the right, and then the same number of steps must go by one unit to the left, each of those steps with probability $1/4$. There are $\binom{n}{k}\binom{n-k}{k}$ distinct possibilities to select those $k$ steps to the right and $k$ steps to the left. The remaining $n - 2k$ steps must correspond to loops (where the current position in $\mathbb{Z}$ remains unchanged), each one with probability $1/2 = 2/4$. Thus
\[
q^{(n)}(0,0) = \frac{1}{4^n} \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{k}\binom{n-k}{k}\,2^{\,n-2k}.
\]
The resulting identity, namely that the last sum over $k$ equals $\binom{2n}{n}$, is a priori not completely obvious.²
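While not a proof, the identity can at least be confirmed by direct summation for small $n$ (a quick check, not from the book):

```python
# Verify sum_k C(n,k) C(n-k,k) 2^(n-2k) = C(2n,n) for n = 0, ..., 29.
from math import comb

for n in range(30):
    s = sum(comb(n, k) * comb(n - k, k) * 2**(n - 2 * k)
            for k in range(n // 2 + 1))
    assert s == comb(2 * n, n)
```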
4.63 Example (Dimension $d = 2$). (a) We first show how one can use the flow criterion for proving recurrence of SRW on $\mathbb{Z}^2$. Recall that all edges have conductance 1. Let $\phi$ be a flow from $0 = (0,0)$ to $\infty$ with input $i_0 > 0$. Consider the box $A_n = \{(k,l) \in \mathbb{Z}^2 : |k|, |l| \le n\}$, where $n \ge 1$. Then Lemma 4.53 and the Cauchy–Schwarz inequality yield
\[
i_0^2 = \Bigl(\sum_{e \in \partial A_n} \phi(e) \cdot 1\Bigr)^{\!2} \le |\partial A_n| \sum_{e \in \partial A_n} \phi(e)^2.
\]
² The author acknowledges an email exchange with Ch. Krattenthaler on this point.
The sets $\partial A_n$ are disjoint subsets of the edge set $E$. Therefore
\[
(\phi,\phi) \ge \frac{1}{2} \sum_{n=1}^{\infty}\ \sum_{e \in \partial A_n} \phi(e)^2 \ge \sum_{n=1}^{\infty} \frac{i_0^2}{2\,|\partial A_n|} = \infty,
\]
since $|\partial A_n| = 8n + 4$. Thus, every flow from the origin to $\infty$ with non-zero input has infinite power: the random walk is recurrent.
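The count $|\partial A_n| = 8n + 4$ (each of the four sides of the box of side length $2n+1$ contributes $2n+1$ outgoing edges) can be confirmed by brute force (hypothetical code, not from the book):

```python
# Count the oriented edges leaving the box A_n = {|k|,|l| <= n} in Z^2.
def boundary_edge_count(n):
    count = 0
    for k in range(-n, n + 1):
        for l in range(-n, n + 1):
            for dk, dl in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if not (abs(k + dk) <= n and abs(l + dl) <= n):
                    count += 1
    return count

for n in range(1, 8):
    assert boundary_edge_count(n) == 8 * n + 4
```

Since $\sum_n 1/(8n+4)$ diverges like the harmonic series, the power bound above is indeed infinite.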
(b) Another argument is the following. Let two walkers perform the one-dimensional
SRW simultaneously and independently, each one with starting point 0.
Their joint trajectory, viewed in Z², visits only the set of points (k, l) with k + l
even. The resulting Markov chain on this state space moves from (k, l) to any of
the four points (k ± 1, l ± 1) with probability p₁(k, k ± 1) · p₁(l, l ± 1) = 1/4,
where p₁(·, ·) now stands for the transition probabilities of SRW on Z. The graph
of this doubled Markov chain is the one with the dotted edges in Figure 11. It is
isomorphic with the lattice Z², and the transition probabilities are preserved under
this isomorphism. Hence SRW on Z² satisfies

\[ p^{(2n)}(0,0) = \Bigl( \frac{1}{2^{2n}} \binom{2n}{n} \Bigr)^{\!2} \sim \frac{1}{\pi n}. \tag{4.64} \]

Thus G(0,0) = ∞.
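Formula (4.64) can be cross-checked against an exact dynamic-programming computation of the n-step distribution of SRW on Z² (a verification sketch of ours, not from the text; fractions keeps everything exact):

```python
from fractions import Fraction
from math import comb

def return_prob_Z2(steps):
    # exact distribution after `steps` steps of SRW on Z^2, started at the origin
    dist = {(0, 0): Fraction(1)}
    for _ in range(steps):
        new = {}
        for (x, y), p in dist.items():
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                key = (x + dx, y + dy)
                new[key] = new.get(key, Fraction(0)) + p / 4
        dist = new
    return dist.get((0, 0), Fraction(0))

# p^(2n)(0,0) = (2^{-2n} binom(2n, n))^2, checked exactly for small n
for n in range(1, 6):
    assert return_prob_Z2(2 * n) == Fraction(comb(2 * n, n) ** 2, 4 ** (2 * n))
```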
Figure 11. The grids Z and Z².
4.65 Example (Dimension d = 3). Consider the random walk on Z with transition
matrix Q given by (4.60). Again, we let three independent walkers start at 0 and move
according to Q. Their joint trajectory is a random walk on the Abelian group Z³
with law μ given by

\[ \mu(\pm e_1) = \mu(\pm e_2) = \mu(\pm e_3) = 1/16, \]
\[ \mu(\pm e_1 \pm e_2) = \mu(\pm e_1 \pm e_3) = \mu(\pm e_2 \pm e_3) = 1/32, \]
\[ \text{and} \quad \mu(\pm e_1 \pm e_2 \pm e_3) = 1/64. \]
112 Chapter 4. Reversible Markov chains
We write Q₃ for its transition matrix. It is symmetric (reversible with respect to the
counting measure) and transient, since its n-step return probabilities to the origin
are

\[ q_3^{(n)}(0,0) = \Bigl( \frac{1}{4^n} \binom{2n}{n} \Bigr)^{\!3} \sim \frac{1}{(\pi n)^{3/2}}, \]

whence G(0,0) < ∞ for the associated Green function.
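The values of μ can be checked by enumerating the product distribution of the three independent one-dimensional steps (an illustrative sketch of ours; the dictionary names are not from the text):

```python
from fractions import Fraction
from itertools import product

# one walker: step -1 or +1 with probability 1/4 each, stay put with probability 1/2
step = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

# law of the joint step of three independent walkers
mu = {}
for s in product(step, repeat=3):
    mu[s] = step[s[0]] * step[s[1]] * step[s[2]]

assert mu[(1, 0, 0)] == Fraction(1, 16)    # mu(e_1)
assert mu[(1, -1, 0)] == Fraction(1, 32)   # mu(e_1 - e_2)
assert mu[(1, 1, -1)] == Fraction(1, 64)   # mu(e_1 + e_2 - e_3)
assert sum(mu.values()) == 1
```

(The remaining mass, μ(0) = 1/8, is the probability that all three walkers loop.)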
If we think of Z³ as made up of cubes, then the edges of the associated network and the
corresponding resistances are as follows: each side of any cube has resistance 16,
each diagonal of each face (square) of any cube has resistance 32, and each diag-
onal of any cube has resistance 64. In particular, the corresponding network is a
subnetwork of the one of SRW on the 3-fuzz of the lattice, if we put resistance 64
on all edges of the latter. Thus, SRW on the 3-fuzz is transient by Exercise 4.54,
and Proposition 4.56 implies that also SRW on Z³ is transient.
4.66 Example (Dimension d > 3). Since Z^d for d > 3 contains Z³ as a subgraph,
transience of SRW on Z^d now follows from Exercise 4.54.

We remark that these are just a few among many different ways of showing
recurrence of SRW on Z and Z², and transience of SRW on Z^d for d ≥ 3. The most
typical and powerful method is to use characteristic functions (Fourier transform);
see the historical paper of Pólya [47].
General random walks on the group Z^d are classical sums of i.i.d. Z^d-valued
random variables Z_n = Y₁ + ⋯ + Y_n, where the distribution of the Y_k is a probability
measure μ on Z^d and the starting point is 0. We now want to state criteria for
recurrence/transience without necessarily assuming reversibility (symmetry of μ).
First, we introduce the (absolute) moment of order k associated with μ:

\[ |\mu|_k = \sum_{x \in \mathbb{Z}^d} |x|^k \, \mu(x). \]

(|x| is the Euclidean length of the vector x.) If |μ|₁ is finite, the mean vector

\[ \bar\mu = \sum_{x \in \mathbb{Z}^d} x \, \mu(x) \]

describes the average displacement in a single step of the random walk with law μ.

4.67 Theorem. In arbitrary dimension d, if |μ|₁ < ∞ and \bar\mu ≠ 0, then the random
walk with law μ is transient.
Proof. We use the strong law of large numbers (in the multidimensional version).
Let (Y_n)_{n≥1} be a sequence of independent R^d-valued random variables with
common distribution μ. If |μ|₁ < ∞, then

\[ \lim_{n \to \infty} \frac{1}{n}(Y_1 + \dots + Y_n) = \bar\mu \]

with probability 1.

In our particular case, given that \bar\mu ≠ 0, the set of trajectories

\[ A = \Bigl\{ \omega \in \Omega : \text{there is } n_0 \text{ such that } \Bigl| \tfrac{1}{n} Z_n(\omega) - \bar\mu \Bigr| < |\bar\mu| \text{ for all } n \ge n_0 \Bigr\} \]

contains the event [lim_n Z_n/n = \bar\mu] and has probability 1. (Standard exercise:
verify that the latter event as well as A belong to the σ-algebra generated by the
cylinder sets.) But for every ω ∈ A, one may have Z_n(ω) = 0 only for finitely
many n. In other words, with probability 1, the random walk (Z_n) returns to the
origin no more than finitely many times. Therefore H(0,0) = 0, and we have
transience by Theorem 3.2. ∎
In the one-dimensional case, the most general recurrence criterion is due to
Chung and Ornstein [12].

4.68 Theorem. Let μ be a probability distribution on Z with |μ|₁ < ∞ and \bar\mu = 0.
Then every state of the random walk with law μ is recurrent.

In other words, Z decomposes into essential classes, on each of which the
random walk with law μ is recurrent.

4.69 Exercise. Show that since \bar\mu = 0, the number of those essential classes is
finite, except when μ is the point mass at 0.
For the proof of Theorem 4.68, we need the following auxiliary lemma.

4.70 Lemma. Let (X, P) be an arbitrary Markov chain. For all x, y ∈ X and
N ∈ N,

\[ \sum_{n=0}^{N} p^{(n)}(x,y) \le U(x,y) \sum_{n=0}^{N} p^{(n)}(y,y). \]
Proof.

\[ \sum_{n=0}^{N} p^{(n)}(x,y) = \sum_{n=0}^{N} \sum_{k=0}^{n} u^{(n-k)}(x,y)\, p^{(k)}(y,y)
 = \sum_{k=0}^{N} p^{(k)}(y,y) \sum_{n=0}^{N-k} u^{(n)}(x,y)
 \le U(x,y) \sum_{k=0}^{N} p^{(k)}(y,y), \]

since \sum_{n=0}^{N-k} u^{(n)}(x,y) \le U(x,y). ∎
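Since U(x, y) ≤ 1, the lemma in particular bounds the partial Green sums at y by the diagonal one. This can be illustrated on a small chain (an illustration of ours; the 3-state transition matrix is arbitrary test data):

```python
from fractions import Fraction

# an arbitrary irreducible 3-state chain
P = [[Fraction(0), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
     [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

N = 20
Pn = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]  # P^0 = I
Gx = [Fraction(0)] * 3       # partial sums sum_{n<=N} p^(n)(x, y), with y = 0
for n in range(N + 1):
    for x in range(3):
        Gx[x] += Pn[x][0]
    Pn = matmul(Pn, P)

# Lemma 4.70 with U(x, 0) <= 1: each partial sum is at most the diagonal one
assert all(Gx[x] <= Gx[0] for x in range(3))
```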
Proof of Theorem 4.68. We use once more the law of large numbers, but this time
the weak version is sufficient.

Let (Y_n)_{n≥1} be a sequence of independent real-valued random variables with
common distribution μ. If |μ|₁ < ∞, then

\[ \lim_{n \to \infty} \Pr\Bigl[ \Bigl| \tfrac{1}{n}(Y_1 + \dots + Y_n) - \bar\mu \Bigr| > \varepsilon \Bigr] = 0 \]

for every ε > 0.

Since in our case \bar\mu = 0, this means in terms of the transition probabilities that

\[ \lim_{n \to \infty} \sigma_n = 1, \quad \text{where} \quad \sigma_n = \sum_{k \in \mathbb{Z} : |k| \le n\varepsilon} p^{(n)}(0,k). \]
Let M, N ∈ N and ε = 1/M. (We also assume that Nε ∈ N.) Then, by
Lemma 4.70,

\[ \sum_{n=0}^{N} p^{(n)}(0,k) \le \sum_{n=0}^{N} p^{(n)}(k,k) = \sum_{n=0}^{N} p^{(n)}(0,0) \quad \text{for all } k \in \mathbb{Z}. \]

Therefore we also have

\[ \sum_{n=0}^{N} p^{(n)}(0,0) \ge \frac{1}{2N\varepsilon + 1} \sum_{k : |k| \le N\varepsilon}\; \sum_{n=0}^{N} p^{(n)}(0,k)
 = \frac{1}{2N\varepsilon + 1} \sum_{n=0}^{N} \sum_{k : |k| \le N\varepsilon} p^{(n)}(0,k)
 \ge \frac{1}{2N\varepsilon + 1} \sum_{n=0}^{N} \sigma_n, \]

where in the last step we have replaced Nε with nε. Since σ_n → 1,
\[ \lim_{N \to \infty} \frac{1}{2N\varepsilon + 1} \sum_{n=0}^{N} \sigma_n
 = \lim_{N \to \infty} \frac{N}{2N\varepsilon + 1} \cdot \frac{1}{N} \sum_{n=0}^{N} \sigma_n
 = \frac{1}{2\varepsilon} = \frac{M}{2}. \]

We infer that G(0,0) ≥ M/2 for every M ∈ N, whence G(0,0) = ∞ and the random
walk is recurrent. ∎

Combining the last two theorems, we obtain the following.

4.71 Corollary. Let μ be a probability measure on Z with finite first moment and
μ(0) < 1. Then the random walk with law μ is recurrent if and only if \bar\mu = 0.
Compare with Example 3.5 (infinite drunkard's walk): in this case, μ = p δ₁ + q δ₋₁,
and \bar\mu = p − q; as we know, one has recurrence if and only if p = q (= 1/2).

What happens when |μ|₁ = ∞? We present a class of examples.

4.72 Proposition. Let μ be a symmetric probability distribution on Z such that for
some real exponent α > 0,

\[ 0 < \lim_{k \to \infty} k^{1+\alpha}\, \mu(k) < \infty. \]

Then the random walk with law μ is recurrent if α ≥ 1, and transient if α < 1.
Observe that for α > 1 recurrence follows from Theorem 4.68.

In dimension 2, there is an analogue of Corollary 4.71 in the presence of finite
second moment.

4.73 Theorem. Let μ be a probability distribution on Z² with |μ|₂ < ∞. Then
the random walk with law μ is recurrent if and only if \bar\mu = 0.

(The "only if" follows from Theorem 4.67.)
Finally, the behaviour of the simple random walk on Z³ generalizes as follows,
without any moment condition.

4.74 Theorem. Let μ be a probability distribution on Z^d whose support generates
a subgroup that is at least 3-dimensional. Then each state of the random walk with
law μ is transient.

The subgroup generated by supp(μ) consists of all elements of Z^d which can
be written as a sum of finitely many elements of supp(μ) ∪ (−supp(μ)). It is known
from the structure theory of Abelian groups that any subgroup of Z^d is isomorphic
with Z^{d'} for some d' ≤ d. The number d' is the dimension (often called the rank)
of the subgroup to which the theorem refers. In particular, every irreducible random
walk on Z^d, d ≥ 3, is transient.
We omit the proofs of Proposition 4.72 and of Theorems 4.73 and 4.74. They
rely on the use of characteristic functions, that is, harmonic analysis. For a detailed
treatment, see the monograph by Spitzer [Sp, §8]. A very nice and accessible
account of recurrence of random walks on Z^d is given by Lesigne [Le].
Chapter 5
Models of population evolution
In this chapter, we shall study three classes of processes that can be interpreted as
theoretical models for the random evolution of populations. While in this book we
maintain discrete time and discrete state space, the second and the third of those
models will go slightly beyond ordinary Markov chains (although they may of
course be interpreted as Markov chains on more complicated state spaces).
A Birth-and-death Markov chains
In this section we consider Markov chains whose state space is X = {0, 1, …, N}
with N ∈ N, or X = N₀ (in which case we write N = ∞). We assume that
there are non-negative parameters p_k, q_k and r_k (0 ≤ k ≤ N) with q₀ = 0
and (if N < ∞) p_N = 0, which satisfy p_k + q_k + r_k = 1, such that whenever
k − 1, k, k + 1 ∈ X (respectively) one has

\[ p(k, k+1) = p_k, \quad p(k, k-1) = q_k \quad \text{and} \quad p(k,k) = r_k, \]

while p(k, l) = 0 if |k − l| > 1. See Figure 12. The finite and infinite drunkard's
walk and the Ehrenfest urn model are all of this type.
walk and the Ehrenfest urn model are all of this type.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............................................................
0
................................................
................................................
........... . . . . . . . . . . .
. . . . . . . . . . . . ........
. . . . . . . .
r
0
................................................. . . . . . . . . . . .
. . . . . . . . . . . .
................................................

0
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............
............
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
q
1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............................................................
1
................................................
................................................
........... . . . . . . . . . . .
. . . . . . . . . . . . ........
. . . . . . . .
r
1
................................................. . . . . . . . . . . .
. . . . . . . . . . . .
................................................

1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............
............
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
q
2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............................................................
2
................................................
................................................
........... . . . . . . . . . . .
. . . . . . . . . . . . ........
. . . . . . . .
r
2
................................................. . . . . . . . . . . .
. . . . . . . . . . . .
................................................

2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............
............
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
q
3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .............................................................
3
................................................
................................................
........... . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
........
r
3
................................................. . . . . . . . . . . .
. . . . . . . . . . . .
....................

3
. . . . . . . . . . . . . . . . . . . .............
............
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
q
4
...
Figure 12
Such a chain is called random walk on N₀, or on {0, 1, …, N}, respectively. It
is also called a birth-and-death Markov chain. This latter name comes from the
following interpretation. Consider a population which can have any number k of
members, where k ∈ N₀. These numbers are the states of the process, and we
consider the evolution of the size of the population in discrete time steps n =
0, 1, 2, … (e.g., year by year), so that Z_n = k means that at time n the population
has k members. If at some time it has k members, then in the next step it can
increase by one (with probability p_k, the birth rate), maintain the same size (with
probability r_k) or, if k > 0, decrease by one individual (with probability q_k, the
death rate). The Markov chain describes the random evolution of the population
size. In general, the state space for this model is N₀, but when the population size
cannot exceed N, one takes X = {0, 1, …, N}.
Finite birth-and-death chains
We first consider the case when N < ∞, and limit ourselves to the following three
cases.

(a) Reflecting boundaries: p_k > 0 for each k with 0 ≤ k < N and q_k > 0 for
every k with 0 < k ≤ N.

(b) Absorbing boundaries: p_k, q_k > 0 for 0 < k < N, while p₀ = q_N = 0
and r₀ = r_N = 1.

(c) Mixed case (state 0 is absorbing and state N is reflecting): p_k, q_k > 0 for
0 < k < N, and also q_N > 0, while p₀ = 0 and r₀ = 1.

In the reflecting case, we do not necessarily require that r₀ = r_N = 0; the main
point is that the Markov chain is irreducible. In case (b), the states 0 and N are
absorbing, and the irreducible class {1, …, N−1} is non-essential. In case (c),
only the state 0 is absorbing, while {1, …, N} is a non-essential irreducible class.
For the birth-and-death model, the last case is the most natural one: if in some
generation the population dies out, then it remains extinct. Starting with k ≥ 1
individuals, the quantity F(k, 0) is then the probability of extinction, while
1 − F(k, 0) is the survival probability.
We now want to compute the generating functions F(k, m | z). We start with the
case when k < m and state 0 is reflecting. As in the specific case of Example 1.46,
Theorem 1.38 (d) leads to a linear recursion in k:

\[ F(0, m \mid z) = r_0 z\, F(0, m \mid z) + p_0 z\, F(1, m \mid z) \quad \text{and} \]
\[ F(k, m \mid z) = q_k z\, F(k-1, m \mid z) + r_k z\, F(k, m \mid z) + p_k z\, F(k+1, m \mid z) \]

for k = 1, …, m−1. We write for z ≠ 0

\[ F(k, m \mid z) = Q_k(1/z)\, F(0, m \mid z), \quad k = 0, \dots, m. \]

With the change of variable t = 1/z, we find that the Q_k(t) are polynomials of
degree k which satisfy the recursion

\[ Q_0(t) = 1, \quad p_0 Q_1(t) = t - r_0, \quad \text{and} \]
\[ p_k Q_{k+1}(t) = (t - r_k) Q_k(t) - q_k Q_{k-1}(t), \quad k \ge 1. \tag{5.1} \]

It does not matter here whether state N is absorbing or reflecting.
Analogously, if k > m and state N is reflecting, then we write for z ≠ 0

\[ F(k, m \mid z) = Q^*_k(1/z)\, F(N, m \mid z), \quad k = m, \dots, N. \]

Again with t = 1/z, we obtain the downward recursion

\[ Q^*_N(t) = 1, \quad q_N Q^*_{N-1}(t) = t - r_N, \quad \text{and} \]
\[ q_k Q^*_{k-1}(t) = (t - r_k) Q^*_k(t) - p_k Q^*_{k+1}(t), \quad k \le N-1. \tag{5.2} \]

The last equation is the same as in (5.1), but with different initial values, working
downwards instead of upwards.
Now recall that F(m, m | z) = 1. We get the following.

5.3 Lemma. If the state 0 is reflecting and 0 ≤ k ≤ m, then

\[ F(k, m \mid z) = \frac{Q_k(1/z)}{Q_m(1/z)}. \]

If the state N is reflecting and m ≤ k ≤ N, then

\[ F(k, m \mid z) = \frac{Q^*_k(1/z)}{Q^*_m(1/z)}. \]
Let us now consider the case when state 0 is absorbing. Again, we want to
compute F(k, m | z) for 1 ≤ k ≤ m ≤ N. Once more, this does not depend on
whether the state N is absorbing or reflecting, or even N = ∞. This time, we write
for z ≠ 0

\[ F(k, m \mid z) = P_k(1/z)\, F(1, m \mid z), \quad k = 1, \dots, m. \]

With t = 1/z, the polynomials P_k(t) have degree k − 1 and satisfy the recursion

\[ P_1(t) = 1, \quad p_1 P_2(t) = t - r_1, \quad \text{and} \]
\[ p_k P_{k+1}(t) = (t - r_k) P_k(t) - q_k P_{k-1}(t), \quad k \ge 2, \tag{5.4} \]

which is basically the same as (5.1) with different initial terms. We get the following.

5.5 Lemma. If the state 0 is absorbing and 1 ≤ k ≤ m, then

\[ F(k, m \mid z) = \frac{P_k(1/z)}{P_m(1/z)}. \]
We omit the analogous case when state N is absorbing and m ≤ k < N. From
those formulas, most of the interesting functions and quantities for our Markov
chain can be derived.

5.6 Example. We consider simple random walk on {0, …, N} with state 0 reflect-
ing and state N absorbing. That is, r_k = 0 for all k < N, p₀ = r_N = 1 and
p_k = q_k = 1/2 for k = 1, …, N−1. We want to compute F(k, N | z) and
F(k, 0 | z) for k < N.
The recursion (5.1) becomes

\[ Q_0(t) = 1, \quad Q_1(t) = t, \quad \text{and} \quad Q_{k+1}(t) = 2t\, Q_k(t) - Q_{k-1}(t) \ \text{for } k \ge 1. \]

This is the well-known recursion for the Chebyshev polynomials of the first kind, that
is, the polynomials that are defined by

\[ Q_k(\cos \varphi) = \cos k\varphi. \]

For real z ≥ 1, we can set 1/z = cos φ. Thus, z = 1/cos φ is strictly increasing as
φ ranges over [0, π/2), with values in [1, ∞), and

\[ F\bigl( k, N \bigm| \tfrac{1}{\cos\varphi} \bigr) = \frac{\cos k\varphi}{\cos N\varphi}. \]

We remark that we can determine the (common) radius of convergence s of the
power series F(k, N | z) for k < N, which are rational functions. We know that
s > 1 and that it is the smallest positive pole of F(k, N | z). Thus, s = 1/cos(π/(2N)).
F(k, 0 | z) is the same as the function F(N−k, N | z) in the reversed situation
where state 0 is absorbing and state N is reflecting. Therefore, in our example,

\[ F(k, 0 \mid z) = \frac{P_{N-k-1}(1/z)}{P_{N-1}(1/z)}, \]

where

\[ P_0(t) = 1, \quad P_1(t) = 2t, \quad \text{and} \quad P_{k+1}(t) = 2t\, P_k(t) - P_{k-1}(t) \ \text{for } k \ge 1. \]

We recognize these as the Chebyshev polynomials of the second kind, which are
defined by

\[ P_k(\cos \varphi) = \frac{\sin (k+1)\varphi}{\sin \varphi}. \]

We conclude that for our random walk with 0 reflecting and N absorbing,

\[ F\bigl( k, 0 \bigm| \tfrac{1}{\cos\varphi} \bigr) = \frac{\sin (N-k)\varphi}{\sin N\varphi}, \]
and that the (common) radius of convergence of the power series F(k, 0 | z) (1 ≤
k < N) is s′ = 1/cos(π/N). We can also compute G(0, 0 | z) = 1/(1 − z F(1, 0 | z))
in these terms:

\[ G\bigl( 0, 0 \bigm| \tfrac{1}{\cos\varphi} \bigr) = \frac{\tan N\varphi}{\tan \varphi},
 \quad \text{and} \quad G(0, 0 \mid z) = \frac{(1/z)\, P_{N-1}(1/z)}{Q_N(1/z)}. \]

In particular, G(0, 0 | z) has radius of convergence r = s = 1/cos(π/(2N)). The values at
z = 1 of our functions are obtained by letting φ → 0 from the right. For example,
the expected number of visits in the starting point 0 before absorption in the state
N is G(0, 0 | 1) = N.
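Both the Chebyshev property of the Q_k and the value G(0, 0 | 1) = N can be checked numerically. A sketch of ours (the helper names and the truncation parameter are not from the text):

```python
import math

def cheb_Q(k, t):
    # Chebyshev polynomials of the first kind via the recursion of the example
    q0, q1 = 1.0, t
    if k == 0:
        return q0
    for _ in range(k - 1):
        q0, q1 = q1, 2 * t * q1 - q0
    return q1

phi = 0.3
for k in range(8):
    assert abs(cheb_Q(k, math.cos(phi)) - math.cos(k * phi)) < 1e-12

def green_00(N, steps=20000):
    # expected visits to 0 before absorption at N: sum_n p^(n)(0,0),
    # truncated after `steps` steps (the tail decays geometrically)
    dist = [0.0] * (N + 1)
    dist[0] = 1.0
    total = 0.0
    for _ in range(steps):
        total += dist[0]
        new = [0.0] * (N + 1)
        new[1] += dist[0]              # state 0 is reflecting: p_0 = 1
        for k in range(1, N):
            new[k - 1] += dist[k] / 2
            new[k + 1] += dist[k] / 2
        dist = new                     # mass at state N is absorbed (dropped)
    return total

assert abs(green_00(5) - 5) < 1e-6     # G(0,0|1) = N
```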
We return to general finite birth-and-death chains. If both 0 and N are reflecting,
then the chain is irreducible. It is also reversible. Indeed, reversibility with respect
to a measure m on {0, 1, …, N} just means that m(k) p_k = m(k+1) q_{k+1} for all
k < N. Up to the choice of the value of m(0), this recursion has just one solution.
We set

\[ m(0) = 1 \quad \text{and} \quad m(k) = \frac{p_0 \cdots p_{k-1}}{q_1 \cdots q_k}, \quad k \ge 1. \tag{5.7} \]

Then the unique stationary probability measure is m₀(k) = m(k) / Σ_{j=0}^{N} m(j). In
particular, in view of Theorem 3.19, the expected return time to the origin is

\[ \mathbb{E}_0(t^0) = \sum_{j=0}^{N} m(j). \]
We now want to compute the expected time to reach 0, starting from any state
k > 0. This is \mathbb{E}_k(s^0) = F′(k, 0 | 1). However, we shall not use the formula of
Lemma 5.3 for this computation. Since k − 1 is a cut point between k and 0, we
have by Exercise 1.45 that

\[ \mathbb{E}_k(s^0) = \mathbb{E}_k(s^{k-1}) + \mathbb{E}_{k-1}(s^0) = \dots = \sum_{i=0}^{k-1} \mathbb{E}_{i+1}(s^i). \]

Now U(0, 0 | z) = r₀ z + p₀ z F(1, 0 | z). We differentiate with respect to z and take into
account that both r₀ + p₀ = 1 and F(1, 0 | 1) = 1:

\[ \mathbb{E}_1(s^0) = F'(1, 0 \mid 1) = \frac{U'(0, 0 \mid 1) - 1}{p_0} = \frac{\mathbb{E}_0(t^0) - 1}{p_0}
 = \sum_{j=1}^{N} \frac{p_1 \cdots p_{j-1}}{q_1 \cdots q_j}. \]
We note that this number does not depend on p₀. Indeed, the stopping time s⁰
depends only on what happens until the first visit in state 0 and not on the outgoing
probabilities at 0. In particular, if we consider the mixed case of a birth-and-death
chain where the state 0 is absorbing and the state N reflecting, then F(1, 0 | z) and
F′(1, 0 | 1) are the same as above. We can use the same argument in order to compute
F′(i+1, i | 1), by making the state i absorbing and considering our chain on the
set {i, …, N}. Therefore

\[ \mathbb{E}_{i+1}(s^i) = \sum_{j=i+1}^{N} \frac{p_{i+1} \cdots p_{j-1}}{q_{i+1} \cdots q_j}. \]

We summarize our computations, which in particular give the expected time until
extinction for the true birth-and-death chain of the mixed model (c).
5.8 Proposition. Consider a birth-and-death chain on {0, …, N}, where N is
reflecting. Starting at k > 0, the expected time until first reaching state 0 is

\[ \mathbb{E}_k(s^0) = \sum_{i=0}^{k-1} \sum_{j=i+1}^{N} \frac{p_{i+1} \cdots p_{j-1}}{q_{i+1} \cdots q_j}. \]
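The proposition can be verified against the one-step-analysis linear system h(k) = 1 + q_k h(k−1) + r_k h(k) + p_k h(k+1), h(0) = 0, solved exactly. A verification sketch of ours; the chain parameters are arbitrary test values:

```python
from fractions import Fraction as Fr

N = 4
p = [Fr(1), Fr(3, 10), Fr(3, 10), Fr(3, 10), Fr(0)]   # p_N = 0
q = [Fr(0), Fr(2, 5), Fr(2, 5), Fr(2, 5), Fr(1, 2)]   # q_0 = 0, N reflecting
r = [1 - p[k] - q[k] for k in range(N + 1)]

def E_formula(k):
    # Proposition 5.8: sum over i of the expected passage times E_{i+1}(s^i)
    total = Fr(0)
    for i in range(k):
        for j in range(i + 1, N + 1):
            num = Fr(1)
            for l in range(i + 1, j):          # p_{i+1} ... p_{j-1}
                num *= p[l]
            den = Fr(1)
            for l in range(i + 1, j + 1):      # q_{i+1} ... q_j
                den *= q[l]
            total += num / den
    return total

# independent check: Gaussian elimination on the tridiagonal system for h(1..N)
A = [[Fr(0)] * N for _ in range(N)]
b = [Fr(1)] * N
for k in range(1, N + 1):
    i = k - 1
    A[i][i] = 1 - r[k]
    if k > 1:
        A[i][i - 1] = -q[k]
    if k < N:
        A[i][i + 1] = -p[k]
for i in range(N):                  # forward elimination
    for ii in range(i + 1, N):
        f = A[ii][i] / A[i][i]
        for jj in range(N):
            A[ii][jj] -= f * A[i][jj]
        b[ii] -= f * b[i]
h = [Fr(0)] * N
for i in range(N - 1, -1, -1):      # back substitution
    s = b[i] - sum(A[i][j] * h[j] for j in range(i + 1, N))
    h[i] = s / A[i][i]

for k in range(1, N + 1):
    assert E_formula(k) == h[k - 1]
```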
Infinite birth-and-death chains

We now turn our attention to birth-and-death chains on N₀, again limiting ourselves
to two natural cases:

(a) The state 0 is reflecting: p_k > 0 for every k ≥ 0 and q_k > 0 for every k ≥ 1.

(b) The state 0 is absorbing: p_k, q_k > 0 for all k ≥ 1, while p₀ = 0 and r₀ = 1.

Most answers to questions regarding case (b) will be contained in what we shall
find out about the irreducible case, so we first concentrate on (a). Then the Markov
chain is again reversible with respect to the same measure m as in (5.7), this time
defined on the whole of N₀. We first address the question of recurrence/transience.
5.9 Theorem. Suppose that the random walk on N₀ is irreducible (state 0 is re-
flecting). Set

\[ S = \sum_{m=1}^{\infty} \frac{q_1 \cdots q_m}{p_1 \cdots p_m} \quad \text{and} \quad
 T = \sum_{m=1}^{\infty} \frac{p_0 \cdots p_{m-1}}{q_1 \cdots q_m}. \]

Then

(i) the random walk is transient if S < ∞,

(ii) the random walk is null-recurrent if S = ∞ and T = ∞, and

(iii) the random walk is positive recurrent if S = ∞ and T < ∞.
Proof. We use the flow criterion of Theorem 4.51. Our network has vertex set N₀
and two oppositely oriented edges between m − 1 and m for each m ∈ N, plus
possibly additional loops [m, m] at some or all m. The loops play no role for the
flow criterion, since every flow must have value 0 on each of them. With m as in
(5.7), the conductance of the edge [m, m+1] is

\[ a(m, m+1) = \frac{p_0 p_1 \cdots p_m}{q_1 \cdots q_m}; \quad a(0, 1) = p_0. \]

There is only one flow with input i₀ = 1 from 0 to ∞, namely the one where
φ([m−1, m]) = 1 and φ([m, m−1]) = −1 along the two oppositely oriented
edges between m − 1 and m. Its power is

\[ \langle \phi, \phi \rangle = \sum_{m=1}^{\infty} \frac{1}{a(m-1, m)} = \frac{S + 1}{p_0}. \]

Thus, there is a flow from 0 to ∞ with finite power if and only if S < ∞: recurrence
holds if and only if S = ∞. In that case, we have positive recurrence if and only
if the total mass of the invariant measure is finite. The latter holds precisely when
T < ∞. ∎
5.10 Examples. (a) The simplest example to illustrate the last theorem is the
one-sided drunkard's walk, whose absorbing variant has been considered in Ex-
ample 2.10. Here we consider the reflecting version, where p₀ = 1, p_k = p,
q_k = q = 1 − p for k ≥ 1, and r_k = 0 for all k.
Figure 13
Theorem 5.9 implies that this random walk is transient when p > 1/2, null recurrent
when p = 1/2, and positive recurrent when p < 1/2.

More generally, suppose that we have p₀ = 1, r_k = 0 for all k ≥ 0, and
p_k ≥ 1/2 + ε for all k ≥ 1, where ε > 0. Then it is again straightforward that
the random walk is transient, since S < ∞. If on the other hand p_k ≤ 1/2 for
all k ≥ 1, then the random walk is recurrent, and positive recurrent if in addition
p_k ≤ 1/2 − ε for all k.
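For the one-sided drunkard's walk the series S and T are geometric, so the classification can be illustrated numerically. A sketch of ours; the closed-form limits (q/p)/(1 − q/p) and 1/(q − p) follow from the geometric series:

```python
def S_partial(p, terms):
    # S = sum_{m>=1} (q/p)^m with p_k = p, q_k = q = 1 - p for k >= 1
    q = 1.0 - p
    return sum((q / p) ** m for m in range(1, terms + 1))

def T_partial(p, terms, p0=1.0):
    # T = sum_{m>=1} p0 * p^{m-1} / q^m
    q = 1.0 - p
    return sum(p0 * p ** (m - 1) / q ** m for m in range(1, terms + 1))

# p = 0.6 > 1/2: S converges to (q/p)/(1 - q/p) = 2, so the walk is transient
assert abs(S_partial(0.6, 500) - 2.0) < 1e-9
# p = 0.4 < 1/2: T converges to 1/(q - p) = 5, so the walk is positive recurrent
assert abs(T_partial(0.4, 500) - 5.0) < 1e-9
# p = 1/2: every term of S equals 1, so S (and T) diverge: null recurrence
assert S_partial(0.5, 500) == 500.0
```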
(b) In view of the last example, we now ask if we can still have recurrence when
all outgoing probabilities satisfy p_k > 1/2. We consider the example where

\[ p_0 = 1, \quad r_k = 0 \ \text{for all } k \ge 0, \quad \text{and} \quad p_k = \frac{1}{2} + \frac{c}{k}, \quad \text{where } c > 0. \tag{5.11} \]
Figure 14
We start by observing that the inequality log(1 + t) − log(1 − t) ≥ 2t holds for all
real t ∈ [0, 1). Therefore

\[ \log \frac{p_k}{q_k} = \log\Bigl( 1 + \frac{2c}{k} \Bigr) - \log\Bigl( 1 - \frac{2c}{k} \Bigr) \ge \frac{4c}{k}, \]

whence

\[ \log \frac{p_1 \cdots p_m}{q_1 \cdots q_m} \ge 4c \sum_{k=1}^{m} \frac{1}{k} \ge 4c \log m. \]

We get

\[ S \le \sum_{m=1}^{\infty} \frac{1}{m^{4c}}, \]

and see that S < ∞ if c > 1/4.
On the other hand, when c ≤ 1/4 then q_k / p_k ≥ (2k − 1)/(2k + 1), and

\[ \frac{q_1 \cdots q_m}{p_1 \cdots p_m} \ge \frac{1}{2m + 1}, \]

so that S = ∞.

We have shown that the random walk of (5.11) is recurrent if c ≤ 1/4, and
transient if c > 1/4. Recurrence must be null recurrence, since clearly T = ∞.

Our next computations will lead to another, direct and more elementary proof
of Theorem 5.9 that does not involve the flow criterion.
Theorem 1.38 (d) and Proposition 1.43 (b) imply that for k ≥ 1

\[ F(k, k-1 \mid z) = q_k z + r_k z\, F(k, k-1 \mid z) + p_k z\, F(k+1, k-1 \mid z) \quad \text{and} \]
\[ F(k+1, k-1 \mid z) = F(k+1, k \mid z)\, F(k, k-1 \mid z). \]

Therefore

\[ F(k, k-1 \mid z) = \frac{q_k z}{1 - r_k z - p_k z\, F(k+1, k \mid z)}. \tag{5.12} \]
This will allow us to express F(k, k−1 | z), as well as U(0, 0 | z) and G(0, 0 | z), as
continued fractions. First, we define a new stopping time:

\[ s^k_i = \min\{ n \ge 0 : Z_n = k,\ |Z_m - k| \le i \text{ for all } m \le n \}. \]

Setting

\[ f^{(n)}_i(l, k) = \Pr_l[\, s^k_i = n \,] \quad \text{and} \quad F_i(l, k \mid z) = \sum_{n=0}^{\infty} f^{(n)}_i(l, k)\, z^n, \]

the number F_i(l, k) = F_i(l, k | 1) is the probability, starting at l, to reach k before
leaving the interval [k − i, k + i].
5.13 Lemma. If |z| ≤ r, where r = r(P) is the radius of convergence of G(l, k | z),
then

\[ \lim_{i \to \infty} F_i(l, k \mid z) = F(l, k \mid z). \]
Proof. It is clear that the radius of convergence of the power series F_i(l, k | z) is at
least s(l, k) ≥ r. We know from Lemma 3.66 that F(l, k | r) < ∞.

For fixed n, the sequence ( f^{(n)}_i(l, k) ) is monotone increasing in i, with limit
f^{(n)}(l, k) as i → ∞. In particular, f^{(n)}_i(l, k) |z^n| ≤ f^{(n)}(l, k) r^n. Therefore we
can use dominated convergence (the integral being summation over n in our power
series) to conclude. ∎
5.14 Exercise. Prove that for i ≥ 1,

\[ F_i(k+1, k-1 \mid z) = F_{i-1}(k+1, k \mid z)\, F_i(k, k-1 \mid z). \]
In analogy with (5.12), we can compute the functions F_i(k, k−1 | z) recursively:

\[ F_0(k, k-1 \mid z) = 0, \quad \text{and} \]
\[ F_i(k, k-1 \mid z) = \frac{q_k z}{1 - r_k z - p_k z\, F_{i-1}(k+1, k \mid z)} \quad \text{for } i \ge 1. \tag{5.15} \]

Note that the denominator of the last fraction cannot have any zero in the domain
of convergence of the power series F_i(k, k−1 | z). We use (5.15) in order to compute
the probability F(k+1, k).
5.16 Theorem. We have

\[ F(k+1, k) = 1 - \frac{1}{1 + S(k)}, \quad \text{where} \quad
 S(k) = \sum_{m=1}^{\infty} \frac{q_{k+1} \cdots q_{k+m}}{p_{k+1} \cdots p_{k+m}}. \]
Proof. We prove by induction on i that for all k, i ∈ N₀,

\[ F_i(k+1, k) = 1 - \frac{1}{1 + S_i(k)}, \quad \text{where} \quad
 S_i(k) = \sum_{m=1}^{i} \frac{q_{k+1} \cdots q_{k+m}}{p_{k+1} \cdots p_{k+m}}. \]

Since S₀(k) = 0, the statement is true for i = 0. Suppose that it is true for i − 1
(for every k ≥ 0). Then by (5.15)

\[ F_i(k+1, k) = \frac{q_{k+1}}{1 - r_{k+1} - p_{k+1} F_{i-1}(k+2, k+1)} \]
\[ = 1 - \frac{p_{k+1} \bigl( 1 - F_{i-1}(k+2, k+1) \bigr)}{p_{k+1} \bigl( 1 - F_{i-1}(k+2, k+1) \bigr) + q_{k+1}} \]
\[ = 1 - \cfrac{1}{1 + \cfrac{q_{k+1}}{p_{k+1}} \cdot \cfrac{1}{1 - F_{i-1}(k+2, k+1)}} \]
\[ = 1 - \cfrac{1}{1 + \cfrac{q_{k+1}}{p_{k+1}} \bigl( 1 + S_{i-1}(k+1) \bigr)}, \]

which implies the proposed formula for i (and all k).
Letting i → ∞, the theorem now follows from Lemma 5.13. ∎
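The induction formula can be double-checked by evaluating the recursion (5.15) at z = 1 and comparing with 1 − 1/(1 + S_i(k)). A sketch of ours, with arbitrarily chosen constant parameters p_k = 3/5, q_k = 3/10:

```python
from fractions import Fraction

# an arbitrary irreducible birth-and-death chain (constant rates, for the test)
def pk(k): return Fraction(3, 5)
def qk(k): return Fraction(3, 10) if k >= 1 else Fraction(0)
def rk(k): return 1 - pk(k) - qk(k)

def F_i(i, k):
    # recursion (5.15) at z = 1: probability F_i(k, k-1)
    if i == 0:
        return Fraction(0)
    return qk(k) / (1 - rk(k) - pk(k) * F_i(i - 1, k + 1))

def S_i(i, k):
    # partial sum S_i(k) from the proof of Theorem 5.16
    total, ratio = Fraction(0), Fraction(1)
    for m in range(1, i + 1):
        ratio *= qk(k + m) / pk(k + m)
        total += ratio
    return total

for i in range(6):
    for k in range(1, 4):
        assert F_i(i, k + 1) == 1 - 1 / (1 + S_i(i, k))
```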
We deduce that

\[ F(l, k) = F(l, l-1) \cdots F(k+1, k) = \prod_{j=k}^{l-1} \frac{S(j)}{1 + S(j)}, \quad \text{if } l > k, \tag{5.17} \]

while we always have F(l, k) = 1 when l < k, since the Markov chain must exit
the finite set {0, …, k−1} with probability 1. We can also compute

\[ U(k, k) = p_k\, F(k+1, k) + q_k\, F(k-1, k) + r_k = 1 - \frac{p_k}{1 + S(k)}. \]

Since S = S(0) for the number defined in Theorem 5.9, we recover the recurrence
criterion from above. Furthermore, we obtain formulas for the Green function at
z = 1:
5.18 Corollary. When state 0 is reflecting, then in the transient case,

\[ G(l, k) = \begin{cases}
 \dfrac{1 + S(k)}{p_k}, & \text{if } l \le k, \\[2ex]
 \dfrac{S(k)}{p_k} \displaystyle\prod_{j=k+1}^{l-1} \frac{S(j)}{1 + S(j)}, & \text{if } l > k.
\end{cases} \]
When state 0 is absorbing, that is, for the true birth-and-death chain, the for-
mula of (5.17) gives the probability of extinction F(k, 0), when the initial population
size is k ≥ 1.

5.19 Corollary. For the birth-and-death chain on N₀ with absorbing state 0, ex-
tinction occurs almost surely when S = ∞, while the survival probability is positive
when S < ∞.

5.20 Exercise. Suppose that S = ∞. In analogy with Proposition 5.8, derive a
formula for the expected time \mathbb{E}_k(s^0) until extinction, when the initial population
size is k ≥ 1.
Let us next return briefly to the computation of the generating functions $F(k, k-1\,|\,z)$ and $F_i(k, k-1\,|\,z)$ via (5.12) and the finite recursion (5.15), respectively. Precisely in the same way, one finds an analogous finite recursion for computing $F(k, k+1\,|\,z)$, namely
\[
F(0, 1\,|\,z) = \frac{p_0 z}{1 - r_0 z}\,, \quad \text{and} \quad F(k, k+1\,|\,z) = \frac{p_k z}{1 - r_k z - q_k z\, F(k-1, k\,|\,z)}\,. \tag{5.21}
\]
We summarize.
126 Chapter 5. Models of population evolution
5.22 Proposition. For $k \ge 1$, the functions $F(k, k+1\,|\,z)$ and $F(k, k-1\,|\,z)$ can be expressed as finite and infinite continued fractions, respectively. For $z \in \mathbb C$ with $|z| \le r(P) = 1/\rho(P)$,
\[
F(k, k+1\,|\,z) = \cfrac{p_k z}{1 - r_k z - \cfrac{q_k\, p_{k-1} z^2}{1 - r_{k-1} z - \cfrac{q_{k-1}\, p_{k-2} z^2}{\;\ddots\; - \cfrac{q_1\, p_0 z^2}{1 - r_0 z}}}}\,, \quad \text{and}
\]
\[
F(k, k-1\,|\,z) = \cfrac{q_k z}{1 - r_k z - \cfrac{p_k\, q_{k+1} z^2}{1 - r_{k+1} z - \cfrac{p_{k+1}\, q_{k+2} z^2}{1 - r_{k+2} z - \ddots}}}\,.
\]
The last infinite continued fraction is of course intended as the limit of the finite continued fractions $F_i(k, k-1\,|\,z)$, the approximants that are obtained by stopping after the $i$-th division, i.e., with $1 - r_{k+i-1} z$ as the last denominator.
There are well-known recursion formulas for writing the $i$-th approximant as a quotient of two polynomials. For example,
\[
F_i(1, 0\,|\,z) = \frac{A_i(z)}{B_i(z)}\,,
\]
where
\[
A_0(z) = 0, \quad B_0(z) = 1, \quad A_1(z) = q_1 z, \quad B_1(z) = 1 - r_1 z,
\]
and, for $i \ge 1$,
\[
A_{i+1}(z) = (1 - r_{i+1} z)\, A_i(z) - p_i\, q_{i+1} z^2 A_{i-1}(z), \qquad
B_{i+1}(z) = (1 - r_{i+1} z)\, B_i(z) - p_i\, q_{i+1} z^2 B_{i-1}(z).
\]
To get the analogous formulas for $F_i(k, k-1\,|\,z)$ and $F(k, k-1\,|\,z)$, one just has to adapt the indices accordingly. This opens the door between birth-and-death Markov chains and the classical theory of analytic continued fractions, which is in turn closely linked with orthogonal polynomials. A few references for that theory are the books by Wall [Wa] and Jones and Thron [J-T], and the memoir by Askey and Ismail [2]. Its application to birth-and-death chains appears, for example, in the work of Good [28], Karlin and McGregor [34] (implicitly), Gerl [24] and Woess [52].
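The recursion for $A_i, B_i$ is easy to check against the nested recursion (5.15) for the approximants $F_i(1, 0\,|\,z)$. The constant rates and the evaluation point in the sketch below are an arbitrary test case, not data from the text.

```python
# Compare two ways of computing the approximants F_i(1, 0 | z):
# (i) the nested recursion (5.15), (ii) the quotient A_i(z) / B_i(z).
# The rates p_k, q_k, r_k and the point z are an arbitrary illustration.
K = 40
p = [0.3] * K
q = [0.5] * K
r = [0.2] * K
z = 0.5

def F(i, k):
    """F_i(k, k-1 | z) via the recursion (5.15); F_0 = 0."""
    if i == 0:
        return 0.0
    return q[k] * z / (1 - r[k] * z - p[k] * z * F(i - 1, k + 1))

# Polynomial recursion for numerators A_i and denominators B_i,
# evaluated directly at the point z.
A = [0.0, q[1] * z]
B = [1.0, 1 - r[1] * z]
for i in range(1, 10):
    A.append((1 - r[i + 1] * z) * A[i] - p[i] * q[i + 1] * z**2 * A[i - 1])
    B.append((1 - r[i + 1] * z) * B[i] - p[i] * q[i + 1] * z**2 * B[i - 1])

vals = [(F(i, 1), A[i] / B[i]) for i in range(1, 11)]
print(vals[-1])
```

The two columns of `vals` should agree to machine precision for every depth $i$.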
More examples
Birth-and-death Markov chains on $\mathbb N_0$ provide a wealth of simple examples. In concluding this section, we consider a few of them.
5.23 Example. It may be of interest to compare some of the features of the infinite drunkard's walk on $\mathbb Z$ of Example 3.5 and of its reflecting version on $\mathbb N_0$ of Example 5.10 (a) with the same parameters $p$ and $q = 1 - p$, see Figure 13. We shall use the indices $\mathbb Z$ and $\mathbb N$ to distinguish between the two examples.
First of all, we know that the random walk on $\mathbb Z$ is transient when $p \ne 1/2$ and null recurrent when $p = 1/2$, while the walk on $\mathbb N_0$ is transient when $p > 1/2$, null recurrent when $p = 1/2$, and positive recurrent when $p < 1/2$.
Next, note that for $l > k \ge 0$, the generating function $F(l, k\,|\,z) = F(1, 0\,|\,z)^{l-k}$ is the same for both examples, and
\[
F(1, 0\,|\,z) = \frac{1}{2pz}\Bigl(1 - \sqrt{1 - 4pq z^2}\Bigr).
\]
We have already computed $U_{\mathbb Z}(0, 0\,|\,z)$ in (3.6), and $U_{\mathbb N}(0, 0\,|\,z) = z\, F(1, 0\,|\,z)$. We obtain
\[
G_{\mathbb Z}(0, 0\,|\,z) = \frac{1}{\sqrt{1 - 4pq z^2}} \quad \text{and} \quad G_{\mathbb N}(0, 0\,|\,z) = \frac{2p}{2p - 1 + \sqrt{1 - 4pq z^2}}\,.
\]
We compute the asymptotic behaviour of the $2n$-step return probabilities to the origin, as $n \to \infty$. For the random walk on $\mathbb Z$, this can be done as in Example 4.58:
\[
p^{(2n)}_{\mathbb Z}(0, 0) = p^n q^n \binom{2n}{n} \sim \frac{(4pq)^n}{\sqrt{\pi n}}\,.
\]
For the random walk on $\mathbb N_0$, we first consider $p < 1/2$ (positive recurrence). The period is $d = 2$, and $\mathbb E_0\, t^0 = U_{\mathbb N}'(0, 0\,|\,1) = 2q/(q - p)$. Therefore, using Exercise 3.52,
\[
p^{(2n)}_{\mathbb N}(0, 0) \to \frac{q - p}{q}\,, \quad \text{if } p < 1/2.
\]
If $p = 1/2$, then $G_{\mathbb N}(0, 0\,|\,z) = G_{\mathbb Z}(0, 0\,|\,z)$. Indeed, if $(S_n)$ is the random walk on $\mathbb Z$, then $\bigl(|S_n|\bigr)$ is a Markov chain with the same transition probabilities as the random walk on $\mathbb N_0$. It is the factor chain as described in (1.29), where the partition of $\mathbb Z$ has the blocks $\{-k, k\}$, $k \in \mathbb N_0$. Therefore
\[
p^{(2n)}_{\mathbb N}(0, 0) = p^{(2n)}_{\mathbb Z}(0, 0) \sim \frac{1}{\sqrt{\pi n}}\,, \quad \text{if } p = 1/2.
\]
If $p > 1/2$, then we use the fact that $p^{(2n)}_{\mathbb N}(0, 0)$ is the coefficient of $z^{2n}$ in the Taylor series expansion at 0 of $G_{\mathbb N}(0, 0\,|\,z)$, or (since that function depends only on $z^2$) equivalently, the coefficient of $z^n$ in the Taylor series expansion at 0 of the function
\[
\tilde G(z) = \frac{2p}{2p - 1 + \sqrt{1 - 4pq z}} = \frac{1}{2q(z - 1)}\Bigl((p - q) - \sqrt{1 - 4pq z}\Bigr).
\]
This function is analytic in the open disk $\{z \in \mathbb C : |z| < z_0\}$, where $z_0 = 1/(4pq)$. Standard methods in complex analysis (Darboux method, based on the Riemann–Lebesgue lemma; see Olver [Ol, p. 310]) yield that the Taylor series coefficients behave like those of the function
\[
\tilde H(z) = \frac{1}{2q(z_0 - 1)}\Bigl((p - q) - \sqrt{1 - 4pq z}\Bigr).
\]
The $n$-th coefficient ($n \ge 1$) of the latter is
\[
\frac{-1}{2q(z_0 - 1)}\,(-4pq)^n \binom{1/2}{n} = \frac{1}{2q(z_0 - 1)}\,\frac{(4pq)^n}{n\, 2^{2n-1}} \binom{2n-2}{n-1}.
\]
Therefore, with a use of Stirling's formula,
\[
p^{(2n)}_{\mathbb N}(0, 0) \sim \frac{p}{1 - 4pq}\,\frac{(4pq)^n}{n \sqrt{\pi n}}\,, \quad \text{if } p > 1/2.
\]
The spectral radii are
\[
\rho(P_{\mathbb Z}) = 2\sqrt{pq} \quad \text{and} \quad \rho(P_{\mathbb N}) = \begin{cases} 1, & \text{if } p \le 1/2,\\ 2\sqrt{pq}, & \text{if } p > 1/2. \end{cases}
\]
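The constant in the $p > 1/2$ asymptotics above can be probed numerically. The sketch below (for the arbitrary choice $p = 3/4$) computes the exact return probabilities $p^{(2n)}_{\mathbb N}(0, 0)$ of the reflecting walk by dynamic programming over the transition probabilities, and compares them with $\frac{p}{1-4pq}\,(4pq)^n/(n\sqrt{\pi n})$; the agreement improves like $O(1/n)$.

```python
from math import pi, sqrt

# Reflecting walk on {0, 1, 2, ...}: p(0,1) = 1, p(k,k+1) = p, p(k,k-1) = q.
# Compare the exact 2n-step return probability with the asymptotic formula
# p / (1 - 4pq) * (4pq)^n / (n * sqrt(pi * n)), here for p = 3/4.
p = 0.75
q = 1 - p
n = 400
steps = 2 * n

dist = [0.0] * (steps + 2)
dist[0] = 1.0
for _ in range(steps):
    new = [0.0] * (steps + 2)
    new[1] += dist[0]                 # reflection: from 0 the walk moves to 1
    for k in range(1, steps + 1):
        new[k + 1] += p * dist[k]     # step up
        new[k - 1] += q * dist[k]     # step down
    dist = new

exact = dist[0]
asym = p / (1 - 4 * p * q) * (4 * p * q) ** n / (n * sqrt(pi * n))
print(exact, asym, exact / asym)
```

At $n = 400$ the ratio of the two quantities should already be close to 1.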
After these calculations, we turn to issues with a less combinatorial-analytic flavour. We can realize our random walks on $\mathbb Z$ and on $\mathbb N_0$ on one probability space, so that they can be compared (a coupling). For this purpose, start with a probability space $(\Omega, \mathcal A, \Pr)$ on which one can define a sequence of i.i.d. $\{\pm 1\}$-valued random variables $(Y_n)_{n \ge 1}$ with distribution $\mu = p\,\delta_1 + q\,\delta_{-1}$. This probability space may be, for example, the product space $\bigl(\{-1, 1\}, \mu\bigr)^{\mathbb N}$, where $\mathcal A$ is the product $\sigma$-algebra of the discrete one on $\{-1, 1\}$. In this case, $Y_n$ is the $n$-th projection $\Omega \to \{-1, 1\}$. Now define
\[
S_n = k_0 + Y_1 + \dots + Y_n = S_{n-1} + Y_n.
\]
This is the infinite drunkard's walk on $\mathbb Z$ with $p_{\mathbb Z}(k, k+1) = p$ and $p_{\mathbb Z}(k+1, k) = q$, starting at $k_0 \in \mathbb Z$. Analogously, let $k_0 \in \mathbb N_0$ and define
\[
Z_0 = k_0 \quad \text{and} \quad Z_n = |Z_{n-1} + Y_n| = \begin{cases} 1, & \text{if } Z_{n-1} = 0,\\ Z_{n-1} + Y_n, & \text{if } Z_{n-1} > 0. \end{cases}
\]
This is just the reflecting random walk on $\mathbb N_0$ of Figure 13. We now suppose that for both walks, the starting point is $k_0 \ge 0$. It is clear that $Z_n \ge S_n$, and we are interested in their difference. We say that a reflection occurs at time $n$ if $Z_{n-1} = 0$ and $Y_n = -1$. At each reflection, the difference increases by 2 and then remains unchanged until the next reflection. Thus, for arbitrary $n$, we have $Z_n - S_n = 2R_n$, where $R_n$ is the (random) number of reflections that occur up to (and including) time $n$.
Now suppose that $p > 1/2$. Then the reflecting walk is transient: with probability 1, it visits 0 only finitely many times, so that there can only be finitely many reflections. That is, $R_n$ remains constant from a (random) index onwards, which can also be expressed by saying that $R_\infty = \lim_n R_n$ is almost surely finite, whence $R_n/\sqrt n \to 0$ almost surely. By the law of large numbers, $S_n/n \to p - q$ almost surely, and $\bigl(S_n - n(p - q)\bigr)\big/\bigl(2\sqrt{pq\, n}\bigr)$ is asymptotically standard normal $N(0, 1)$ by the central limit theorem. Therefore
\[
\frac{Z_n}{n} \to p - q \ \text{ almost surely, and } \ \frac{Z_n - n(p - q)}{2\sqrt{pq\, n}} \to N(0, 1) \ \text{ in law, if } p > 1/2.
\]
In particular, $p - q = 2p - 1$ is the linear speed or rate of escape, as the random walk tends to $+\infty$.
If $p = 1/2$ then the law of large numbers tells us that $S_n/n \to 0$ almost surely and $S_n/\sqrt n$ is asymptotically normal. We know that the sequence $\bigl(|S_n|\bigr)$ is a model of $(Z_n)$, i.e., it is a Markov chain with the same transition probabilities as $(Z_n)$. Therefore
\[
\frac{Z_n}{n} \to 0 \ \text{ almost surely, and } \ \frac{Z_n}{\sqrt n} \to |N|(0, 1) \ \text{ in law, if } p = 1/2,
\]
where $|N|(0, 1)$ is the distribution of the absolute value of a standard normal random variable; the density is $\varphi(t) = \sqrt{2/\pi}\; e^{-t^2/2}\, \mathbf 1_{[0, \infty)}(t)$.
Finally, if $p < 1/2$ then for each $\varepsilon > 0$, the function $f(k) = k^{1+\varepsilon}$ on $\mathbb N_0$ is integrable with respect to the stationary probability measure, which is given by $m(0) = c = (q - p)/(2q)$ and $m(k) = c\, p^{k-1}/q^k$ for $k \ge 1$. Therefore the Ergodic Theorem 3.55 implies that
\[
\frac1n \sum_{k=0}^{n-1} Z_k^{1+\varepsilon} \to \sum_{k=1}^{\infty} k^{1+\varepsilon}\, m(k) \quad \text{almost surely.}
\]
In particular, $Z_n^{1+\varepsilon}/n \to 0$, so that
\[
\frac{Z_n}{n^{1/(1+\varepsilon)}} \to 0 \quad \text{almost surely for every } \varepsilon > 0. \qquad \square
\]
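The coupling of $S_n$ and $Z_n$ and the identity $Z_n - S_n = 2R_n$ can be illustrated directly in a short simulation; the parameter $p = 0.7$ below is an arbitrary choice.

```python
import random

# Couple the walk S_n on Z and the reflecting walk Z_n on N_0 by driving
# both with the same i.i.d. +-1 increments Y_n, and track the number R_n of
# reflections (times n with Z_{n-1} = 0 and Y_n = -1).
random.seed(1)
p = 0.7
S = Z = 0                  # common starting point k_0 = 0
R = 0
for n in range(10000):
    Y = 1 if random.random() < p else -1
    S += Y
    if Z == 0:
        if Y == -1:
            R += 1         # a reflection occurs
        Z = 1
    else:
        Z += Y
    assert Z - S == 2 * R and Z >= S   # the coupling identity holds surely
print(S, Z, R)
```

The assertion inside the loop never fires: the identity is a pathwise fact, not merely a statement in distribution.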
Our next example is concerned with $\rho$-recurrence.
5.24 Example. We let $p > 1/2$ and consider the following slight modification of the reflecting random walk of the last example and Example 5.10 (a): $p_0 = 1$, $p_1 = q_1 = 1/2$, and $p_k = p$, $q_k = q = 1 - p$ for $k \ge 2$, while $r_k = 0$ for all $k$.
Figure 15 [the chain on $\mathbb N_0$ with $p(0, 1) = 1$, $p(1, 2) = p(1, 0) = 1/2$, and $p(k, k+1) = p$, $p(k, k-1) = q$ for $k \ge 2$]
The function $F(2, 1\,|\,z)$ is the same as in the preceding example,
\[
F(2, 1\,|\,z) = \frac{1}{2pz}\Bigl(1 - \sqrt{1 - 4pq z^2}\Bigr).
\]
By (5.12),
\[
F(1, 0\,|\,z) = \frac{\tfrac12 z}{1 - \tfrac12 z\, F(2, 1\,|\,z)}\,,
\]
and $U(0, 0\,|\,z) = z\, F(1, 0\,|\,z)$. We compute
\[
U(0, 0\,|\,z) = \frac{2p z^2}{4p - 1 + \sqrt{1 - 4pq z^2}}\,.
\]
We use a basic result from complex analysis, Pringsheim's theorem: for a power series with non-negative coefficients, its radius of convergence is a singularity; see e.g. Hille [Hi, p. 133]. Therefore the radius of convergence $s$ of $U(0, 0\,|\,\cdot)$ is the smallest positive singularity of that function, that is, the zero $s = 1/\sqrt{4pq}$ of the square root expression. We compute for $p > 1/2$
\[
U(0, 0\,|\,s) = \frac{1}{(4p - 1)(2 - 2p)} \quad \begin{cases} = 1, & \text{if } p = \tfrac34,\\ < 1, & \text{if } \tfrac12 < p < \tfrac34,\\ > 1, & \text{if } \tfrac34 < p < 1. \end{cases}
\]
Next, we recall Proposition 2.28: the radius of convergence $r = 1/\rho(P)$ of the Green function is the largest positive real number for which the power series $U(0, 0\,|\,r)$ converges and has a value $\le 1$. Thus, when $p > \tfrac34$ then $r$ must be the unique solution of the equation $U(0, 0\,|\,z) = 1$ in the real interval $(0, s)$, which turns out to be $r = \sqrt{(4p - 2)/p}$. We have $U'(0, 0\,|\,r) < \infty$ in this case. When $\tfrac12 < p \le \tfrac34$, we conclude that $r = s$, and compute $U'(0, 0\,|\,s) = \infty$. Therefore we have the following:
If $\tfrac12 < p < \tfrac34$ then $\rho(P) = 2\sqrt{pq}$, and the random walk is $\rho$-transient.
B. The GaltonWatson process 131
If $p = \tfrac34$ then $\rho(P) = \tfrac{\sqrt 3}{2}$, and the random walk is $\rho$-null-recurrent.
If $\tfrac34 < p < 1$ then $\rho(P) = \sqrt{p/(4p - 2)}$, and the random walk is $\rho$-positive-recurrent.
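The closed form $r = \sqrt{(4p-2)/p}$ can be confirmed by solving $U(0, 0\,|\,z) = 1$ numerically; the value $p = 0.9 > 3/4$ below is an arbitrary choice.

```python
from math import sqrt

# Solve U(0,0|z) = 1 on (0, s) by bisection and compare with the closed
# form r = sqrt((4p - 2)/p), for a value p > 3/4.
p = 0.9
q = 1 - p

def U(z):
    return 2 * p * z**2 / (4 * p - 1 + sqrt(1 - 4 * p * q * z**2))

s = 1 / sqrt(4 * p * q)        # radius of convergence of U(0,0|.)
lo, hi = 0.0, s                # U is increasing, with U(0) = 0 and U(s) > 1
for _ in range(100):
    mid = (lo + hi) / 2
    if U(mid) < 1:
        lo = mid
    else:
        hi = mid

r_closed = sqrt((4 * p - 2) / p)
print(lo, r_closed)
```

Bisection is valid here because $U(0, 0\,|\,z)$ is strictly increasing on $(0, s)$.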
B The Galton–Watson process
Citing from Norris [No, p. 171], "the original branching process was considered by Galton and Watson in the 1870s while seeking a quantitative explanation for the phenomenon of the disappearance of family names, even in a growing population." Their model is based on the concept that family names are passed on from fathers to sons. (We shall use gender-neutral terminology.)
In general, a Galton–Watson process describes the evolution of successive generations of a population under the following assumptions.
• The initial generation number 0 has one member, the ancestor.
• The number of children (offspring) of any member of the population (in any generation) is random and follows the offspring distribution $\mu$.
• The $\mu$-distributed random variables that represent the number of children of each of the members of the population in all the generations are independent.
Thus, $\mu$ is a probability distribution on $\mathbb N_0$, the non-negative integers: $\mu(k)$ is the probability to have $k$ children. We exclude the degenerate cases where $\mu = \delta_1$, that is, where every member of the population has precisely one offspring deterministically, or where $\mu = \delta_0$ and there is no offspring at all.
The basic question is: what is the probability that the population will survive forever, and what is the probability of extinction?
To answer this question, we set up a simple Markov chain model: let $N^{(n)}_j$, $j \ge 1$, $n \ge 0$, be a double sequence of independent random variables with identical distribution $\mu$. We write $M_n$ for the random number of members in the $n$-th generation. The sequence $(M_n)$ is the Galton–Watson process. If $M_n = k$ then we can label the members of that generation by $j = 1, \dots, k$. For each $j$, the $j$-th member has $N^{(n)}_j$ children, so that $M_{n+1} = N^{(n)}_1 + \dots + N^{(n)}_k$. We see that
\[
M_{n+1} = \sum_{j=1}^{M_n} N^{(n)}_j. \tag{5.25}
\]
Since this is a sum of i.i.d. random variables, its distribution depends only on the value of $M_n$ and not on past values. This is the Markov property of $(M_n)_{n \ge 0}$, and
the transition probabilities are
\[
p(k, l) = \Pr[M_{n+1} = l \mid M_n = k] = \Pr\Bigl[\,\sum_{j=1}^{k} N^{(n)}_j = l\Bigr] = \mu^{(k)}(l), \quad k, l \in \mathbb N_0, \tag{5.26}
\]
where $\mu^{(k)}$ is the $k$-th convolution power of $\mu$, see (4.19) (but note that the group operation is addition here):
\[
\mu^{(k)}(l) = \sum_{j=0}^{l} \mu^{(k-1)}(l - j)\, \mu(j), \qquad \mu^{(0)}(l) = \delta_0(l).
\]
In particular, 0 is an absorbing state for the Markov chain $(M_n)$ on $\mathbb N_0$, and the initial state is $M_0 = 1$. We are interested in the probability of absorption, which is nothing but the number
\[
F(1, 0) = \Pr[\exists\, n \ge 1 : M_n = 0] = \lim_{n \to \infty} \Pr[M_n = 0].
\]
The last identity holds because the events $[M_n = 0] = [\exists\, k \le n : M_k = 0]$ are increasing with limit (union) $[\exists\, n \ge 1 : M_n = 0]$. Let us now consider the probability generating functions of $\mu$ and of $M_n$,
\[
f(z) = \sum_{l=0}^{\infty} \mu(l)\, z^l \quad \text{and} \quad g_n(z) = \sum_{k=0}^{\infty} \Pr[M_n = k]\, z^k = \mathbb E\bigl(z^{M_n}\bigr), \tag{5.27}
\]
where $0 \le z \le 1$. Each of these functions is non-negative and monotone increasing on the interval $[0, 1]$ and has value 1 at $z = 1$. We have $g_n(0) = \Pr[M_n = 0]$. Using (5.26), we now derive a recursion formula for $g_n(z)$. We have $g_0(z) = z$ and $g_1(z) = f(z)$, since the distribution of $M_1$ is $\mu$. For $n \ge 1$,
\[
g_n(z) = \sum_{k, l = 0}^{\infty} \Pr[M_n = l \mid M_{n-1} = k]\, \Pr[M_{n-1} = k]\, z^l = \sum_{k=0}^{\infty} \Pr[M_{n-1} = k]\, f_k(z), \ \text{ where } \ f_k(z) = \sum_{l=0}^{\infty} \mu^{(k)}(l)\, z^l.
\]
We have $f_0(z) = 1$ and $f_1(z) = f(z)$. For $k \ge 1$, we can use the product formula for power series and compute
\[
f_k(z) = \sum_{l=0}^{\infty} \sum_{j=0}^{l} \bigl(\mu^{(k-1)}(l - j)\, z^{l-j}\bigr)\bigl(\mu(j)\, z^j\bigr) = \Bigl(\,\sum_{m=0}^{\infty} \mu^{(k-1)}(m)\, z^m\Bigr)\Bigl(\,\sum_{j=0}^{\infty} \mu(j)\, z^j\Bigr) = f_{k-1}(z)\, f(z).
\]
Therefore $f_k(z) = f(z)^k$, and we have obtained the following.
5.28 Lemma. For $0 \le z \le 1$,
\[
g_n(z) = \sum_{k=0}^{\infty} \Pr[M_{n-1} = k]\, f(z)^k = g_{n-1}\bigl(f(z)\bigr) = \underbrace{f \circ \dots \circ f}_{n \text{ times}}\,(z) = f\bigl(g_{n-1}(z)\bigr).
\]
In particular, we see that
\[
g_1(0) = \mu(0) \quad \text{and} \quad g_n(0) = f\bigl(g_{n-1}(0)\bigr). \tag{5.29}
\]
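Lemma 5.28 and the recursion (5.29) are easy to check against a direct computation with the convolution powers $\mu^{(k)}$ of (5.26). The binary offspring distribution $\mu(0) = 0.4$, $\mu(2) = 0.6$ below is an arbitrary example, not one from the text.

```python
# Check g_N(0) = f(g_{N-1}(0)) from (5.29) against a direct computation of
# Pr[M_N = 0] via the transition probabilities p(k, l) = mu^(k)(l) of (5.26).
# The offspring distribution mu(0) = 0.4, mu(2) = 0.6 is an arbitrary example.
mu = {0: 0.4, 2: 0.6}
mu_list = [mu.get(k, 0.0) for k in range(3)]

def f(z):
    """Probability generating function of mu."""
    return sum(w * z**k for k, w in mu.items())

def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

N = 5
dist = [0.0, 1.0]                      # distribution of M_0 = 1
for _ in range(N):
    new = [0.0]
    power = [1.0]                      # mu^(0) = delta_0
    for k, prob in enumerate(dist):
        while len(new) < len(power):
            new.append(0.0)
        for l, w in enumerate(power):  # add prob * mu^(k) to the new law
            new[l] += prob * w
        power = convolve(power, mu_list)
    dist = new                         # distribution of M_n

g = 0.0                                # g_0(0) = 0
for _ in range(N):
    g = f(g)                           # iterate (5.29)

print(dist[0], g)
```

Both numbers are $\Pr[M_5 = 0]$, computed once by brute-force convolution and once by composing $f$ with itself.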
Letting $n \to \infty$, continuity of the function $f$ implies that the extinction probability $F(1, 0)$ must be a fixed point of $f$, that is, a point where $f(z) = z$. To understand the location of the fixed point(s) of $f$ in the interval $[0, 1]$, note that $f$ is a convex function on that interval with $f(0) = \mu(0)$ and $f(1) = 1$. We do not have $f(z) \equiv z$, since $\mu \ne \delta_1$. Also, unless $f(z) = \mu(0) + \mu(1)z$ with $\mu(1) = 1 - \mu(0)$, we have $f''(z) > 0$ on $(0, 1)$. Therefore there are at most two fixed points, and one of them is $z = 1$. When is there a second one? This depends on the slope of the graph of $f(z)$ at $z = 1$, that is, on the left-sided derivative $f'(1-) = \sum_{n=1}^{\infty} n\, \mu(n)$. This is the expected offspring number. (It may be infinite.)
If $f'(1-) \le 1$, then $f(z) > z$ for all $z < 1$, and we must have $F(1, 0) = 1$. See Figure 16 a.
Figure 16 a [the graph $y = f(z)$ staying above the diagonal $y = z$, meeting it only at $(1, 1)$]    Figure 16 b [the graph $y = f(z)$ crossing the diagonal $y = z$ at a second fixed point below $(1, 1)$]
If $1 < f'(1-) \le \infty$ then besides $z = 1$, there is a second fixed point $\xi < 1$, and convexity implies $f'(\xi) < 1$. See Figure 16 b. Since $f'(z)$ is increasing in $z$, we get that $f'(z) \le f'(\xi)$ on $[0, \xi]$. Therefore $\xi$ is an attracting fixed point of $f$ on $[0, \xi]$, and (5.29) implies that $g_n(0) \to \xi$.
We summarize the results.
5.30 Theorem. Let $\mu$ be the non-degenerate offspring distribution of a Galton–Watson process and
\[
\bar\mu = \sum_{n=1}^{\infty} n\, \mu(n)
\]
be the expected offspring number.
If $\bar\mu \le 1$, then extinction occurs almost surely.
If $1 < \bar\mu \le \infty$ then the extinction probability is the unique non-negative number $\xi < 1$ such that
\[
\sum_{k=0}^{\infty} \mu(k)\, \xi^k = \xi,
\]
and the probability that the population survives forever is $1 - \xi > 0$.
If $\bar\mu = 1$, the Galton–Watson process is called critical; if $\bar\mu < 1$, it is called subcritical; and if $\bar\mu > 1$, the process is called supercritical.
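For a supercritical example one can locate $\xi$ by iterating (5.29). With the offspring distribution $\mu(0) = q$, $\mu(2) = p$ and $p > 1/2$ (here $p = 0.7$, an arbitrary choice), the fixed point equation $p\xi^2 - \xi + q = 0$ has the solutions $\xi = 1$ and $\xi = q/p$, so the iteration from 0 should converge to $q/p$.

```python
# Extinction probability for the offspring distribution mu(0) = q, mu(2) = p
# with p = 0.7 > 1/2: iterate g_n(0) = f(g_{n-1}(0)) as in (5.29).
# Here f(z) = q + p z^2, whose fixed points are 1 and q/p.
p = 0.7
q = 1 - p

def f(z):
    return q + p * z * z

xi = 0.0
for _ in range(200):
    xi = f(xi)             # monotone convergence to the smallest fixed point

print(xi, q / p)
```

Convergence is geometric with rate $f'(\xi) = 2q < 1$, so 200 iterations are far more than enough.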
5.31 Exercise. Let $t$ be the time until extinction of the Galton–Watson process with offspring distribution $\mu$. Show that $\mathbb E(t) < \infty$ if $\bar\mu < 1$.
[Hint: use $\mathbb E(t) = \sum_n \Pr[t > n]$ and relate $\Pr[t > n]$ with the functions of (5.27).]
The Galton–Watson process as the basic example of a branching process is very well described in the literature. Standard monographs are the ones of Harris [Har] and Athreya and Ney [A-N], but the topic is also presented on different levels in various books on Markov chains and stochastic processes. For example, a nice treatment is given by Lyons with Peres [L-P]. Here, we shall not further develop the detailed study of the behaviour of $(M_n)$.
However, we make one step backwards and have a look beyond counting the number $M_n$ of members in the $n$-th generation. Instead, we look at the complete information about the generation tree of the population. Let $N = \sup\{n : \mu(n) > 0\}$, and write
\[
\Sigma = \begin{cases} \{1, \dots, N\}, & \text{if } N < \infty,\\ \mathbb N, & \text{if } N = \infty. \end{cases}
\]
Any member $x$ of the population will have a certain number $k$ of children, which we denote by $x1, x2, \dots, xk$. If $x$ itself is not the ancestor $\varepsilon$ of the population, then it is an offspring of some member $u$ of the population, that is, $x = uj$, where $j \in \Sigma$. Thus, we can encode $x$ by a sequence (word) $x = x_1 \cdots x_n$ with length $|x| = n \ge 0$ and $x_1, \dots, x_n \in \Sigma$.
The set of all such sequences is denoted $\Sigma^*$. This includes the empty sequence or word $\varepsilon$, which stands for the ancestor. If $x$ has the form $x = uj$, then the predecessor of $x$ is $x^- = u$. We can draw an edge from $u$ to $x$ in this case. Thus, $\Sigma^*$ becomes an infinite $N$-ary tree with root $\varepsilon$, where in the case $N = \infty$ this means that every vertex $u$ has countably many forward neighbours (namely those $x$ with $x^- = u$), while the number of forward neighbours is $N$ when $N < \infty$. For $u, x \in \Sigma^*$, we shall write
\[
u \preccurlyeq x, \quad \text{if } x = u\, j_1 \cdots j_l \ \text{ with } j_1, \dots, j_l \in \Sigma \ (l \ge 0).
\]
This means that $u$ lies on the shortest path in $\Sigma^*$ from $\varepsilon$ to $x$.
A full genealogical tree is a finite or infinite subtree $T$ of $\Sigma^*$ which contains the root and has the property
\[
uk \in T \implies uj \in T, \quad j = 1, \dots, k-1. \tag{5.32}
\]
(We use the word "full" because the tree is thought to describe the generations throughout all times, while later on, we shall consider initial pieces of such trees up to some generation.) Note that in this way, our tree is what is often called a "rooted plane tree" or "planted plane tree". Here, "rooted" refers to the fact that it is equipped with a distinguished root, and "plane" means that isomorphisms of such objects do not only have to preserve the root, but also the drawing of the tree in the plane. For example, the two trees in Figure 17 are isomorphic as usual rooted graphs, but not as planted plane trees.
Figure 17 [two planted plane trees which are isomorphic as rooted graphs, but whose branches are drawn in a different left-to-right order]
A full genealogical tree $T$ represents a possible genealogical tree of our population throughout all lifetimes. If it has finite height, then this means that the population dies out, and if it has infinite height, the population survives forever. The $n$-th generation consists of all vertices of $T$ at distance $n$ from the root, and the height of $T$ is the supremum over all $n$ for which the $n$-th generation is non-empty.
We can now construct a probability space on which one can define a denumerable collection of random variables $N_u$, $u \in \Sigma^*$, which are i.i.d. with distribution $\mu$. This is just the infinite product space
\[
(\Omega_{\mathrm{GW}}, \mathcal A_{\mathrm{GW}}, \Pr_{\mathrm{GW}}) = \prod_{\Sigma^*} (S, \mu),
\]
where $S = \operatorname{supp}(\mu) = \{k \in \mathbb N_0 : \mu(k) > 0\}$ and the $\sigma$-algebra on $S$ is of course the family of all subsets of $S$. Thus, $\mathcal A_{\mathrm{GW}}$ is the $\sigma$-algebra generated by all sets of the form
\[
B_{x, k} = \bigl\{(k_u)_{u \in \Sigma^*} \in S^{\Sigma^*} : k_x = k\bigr\}, \quad x \in \Sigma^*, \ k \in S,
\]
and if $x_1, \dots, x_r \in \Sigma^*$ are distinct and $k_1, \dots, k_r \in S$ then
\[
\Pr_{\mathrm{GW}}(B_{x_1, k_1} \cap \dots \cap B_{x_r, k_r}) = \mu(k_1) \cdots \mu(k_r).
\]
The random variable $N_u$ is just the projection
\[
N_u(\omega) = k_u, \quad \text{if } \omega = (k_u)_{u \in \Sigma^*}.
\]
What is then the (random) Galton–Watson tree, i.e., the full genealogical tree $T(\omega)$ associated with $\omega$? We can build it up recursively. The root $\varepsilon$ belongs to $T(\omega)$. If $u \in \Sigma^*$ belongs to $T(\omega)$, then among its successors, precisely those points $uj$ belong to $T(\omega)$ for which $1 \le j \le N_u(\omega)$.
Thus, we can also interpret $T(\omega)$ as the connected component of the root $\varepsilon$ in a percolation process. We start with the tree structure on the whole of $\Sigma^*$ and decide at random to keep or delete edges: the edge from any $u \in \Sigma^*$ to $uj$ is kept when $j \le N_u(\omega)$, and deleted when $j > N_u(\omega)$. Thus, we obtain several connected components, each of which is a tree; we have a random forest. Our Galton–Watson tree is $T(\omega) = T_\varepsilon(\omega)$, the component of $\varepsilon$. Every other connected component in that forest is also a Galton–Watson tree with another root. As a matter of fact, if we only look at forward edges, then every $u \in \Sigma^*$ can thus be considered as the root of a Galton–Watson tree $T_u$ whose distribution is the same for all $u$. Furthermore, $T_u$ and $T_v$ are independent unless $u \preccurlyeq v$ or $v \preccurlyeq u$.
Coming back to the original Galton–Watson process, $M_n(\omega)$ is now of course the number of elements of $T(\omega)$ which have length (height) $n$.
5.33 Exercise. Consider the Galton–Watson process with non-degenerate offspring distribution $\mu$. Assume that $\bar\mu \le 1$. Let $T$ be the resulting random tree, and let
\[
|T| = \sum_{n=0}^{\infty} M_n
\]
be its size (number of vertices). This is the total number of the population, which is almost surely finite.
(a) Show that the probability generating function
\[
g(z) = \sum_{k=1}^{\infty} \Pr\bigl[|T| = k\bigr]\, z^k = \mathbb E\bigl(z^{|T|}\bigr)
\]
satisfies the functional equation
\[
g(z) = z\, f\bigl(g(z)\bigr),
\]
where $f$ is the probability generating function of $\mu$ as in (5.27).
(b) Compute the distribution of the population size $|T|$, when the offspring distribution is
\[
\mu = q\, \delta_0 + p\, \delta_2, \quad 0 < p \le 1/2, \ q = 1 - p.
\]
(c) Compute the distribution of the population size $|T|$, when the offspring distribution is the geometric distribution
\[
\mu(k) = q\, p^k, \quad k \in \mathbb N_0, \quad 0 < p \le 1/2, \ q = 1 - p. \tag{5.34}
\]
One may observe that while the construction of our probability space is simple, it is quite abundant. We start with a very big tree but keep only a part of it as $T(\omega)$. There are, of course, other possible models which are not such that large parts of the space $\Omega$ may remain unused; see the literature, and also the next section.
In our model, the offspring distribution $\mu$ is a probability measure on $\mathbb N_0$, and the number of children of any member of the population is always finite, so that the Galton–Watson tree is always locally finite. We can also admit the case where the number of children may be infinite (countable), that is, $\mu(\infty) > 0$. Then $\Sigma = \mathbb N$, and if $u \in \Sigma^*$ is a member of our population which has infinitely many children, then this means that $uj \in T$ for every $j \in \Sigma$. The construction of the underlying probability space remains the same. In this case, we speak of an extended Galton–Watson process in order to distinguish it from the usual case.
5.35 Exercise. Show that when the offspring distribution satisfies $\mu(\infty) > 0$, then the population survives with positive probability.
5.36 Example. We conclude with an example that will link this section with the one-sided infinite drunkard's walk on $\mathbb N_0$ which is reflecting at 0, that is, $p(0, 1) = 1$. See Example 5.10 (a) and Figure 13. We consider the recurrent case, where $p(k, k+1) = p \le 1/2$ and $p(k, k-1) = q = 1 - p$ for $k \ge 1$. Then $t^0$, the return time to 0, is almost surely finite. We let
\[
M_k = \sum_{n=1}^{t^0} \mathbf 1_{[Z_{n-1} = k,\, Z_n = k+1]}
\]
be the number of upward crossings of the edge from $k$ to $k+1$. We shall work out that this is a Galton–Watson process. We have $M_0 = 1$, and a member of the $k$-th generation is just a single crossing of $[k, k+1]$. Its offspring consists of all crossings of $[k+1, k+2]$ that occur before the next crossing of $[k, k+1]$.
That is, we suppose that $(Z_{n-1}, Z_n) = (k, k+1)$. The drunkard will return to state $k$ with probability 1 after time $n$, and we have to consider the number of times when he makes a step from $k+1$ to $k+2$ before that next return from $k+1$ to $k$. The steps which the drunkard makes right after those successive returns to state $k+1$, and before the next visit to $k$, are independent, and will lead to $k+2$ with probability $p$ each time, and at last from $k+1$ to $k$ with probability $q$. This is just the scheme of subsequent Bernoulli trials until the first "failure" (= step to the left), where the failure probability is $q$. Therefore we see that the offspring distribution is indeed the same for each crossing of an edge, and it is the geometric distribution of (5.34).
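The claim about the offspring distribution can be probed by simulation: in each excursion of the reflecting walk (here with the arbitrary choice $p = 0.3$), the number $M_1$ of upward crossings of $[1, 2]$ should follow the geometric distribution (5.34), so in particular $\Pr[M_1 = 0] = q$ and $\mathbb E(M_1) = p/q$.

```python
import random

# Simulate excursions of the reflecting walk (p = 0.3) from 0 back to 0 and
# count M_1, the number of upward crossings of the edge [1, 2] per excursion.
# By the discussion above, M_1 should be geometric as in (5.34).
random.seed(2)
p = 0.3
q = 1 - p

counts = []
for _ in range(50000):
    state = 1          # the first step 0 -> 1 is deterministic
    m1 = 0
    while state != 0:
        if random.random() < p:
            if state == 1:
                m1 += 1        # an upward crossing of [1, 2]
            state += 1
        else:
            state -= 1
    counts.append(m1)

mean = sum(counts) / len(counts)
frac_zero = sum(c == 0 for c in counts) / len(counts)
print(mean, p / q, frac_zero, q)
```

With 50000 excursions the empirical mean and the empirical mass at 0 match $p/q$ and $q$ up to a few standard errors.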
This alone does not yet prove rigorously that $(M_k)$ is a Galton–Watson process with offspring distribution $\mu$. To verify this, we provide an additional explanation of the genealogical tree that corresponds to the above interpretation of the offspring of an individual as the upcrossings of $[k+1, k+2]$ in between two subsequent upcrossings of $[k, k+1]$.
Consider any possible finite trajectory of the random walk that returns to 0 in the last step and not earlier. This is a sequence $\mathbf k = (k_0, k_1, \dots, k_N)$ in $\mathbb N_0$ such that $k_0 = k_N = 0$, $k_1 = 1$, $k_{n+1} = k_n \pm 1$ and $k_n \ne 0$ for $0 < n < N$. The number $N$ must be even, and $k_{N-1} = 1$.
With such a sequence, we associate a rooted plane tree $T(\mathbf k)$ that is constructed in recursive steps $n = 1, \dots, N-1$ as follows. We start with the root $\varepsilon$, which is the current vertex of the tree at step 1. At step $n$, we suppose to have already drawn the part of the tree that corresponds to $(k_0, \dots, k_n)$, and we have marked a current vertex, say $u$, of that current part of the tree. If $n = N-1$, we are done. Otherwise, there are two cases. (1) If $k_{n+1} = k_n - 1$, then the tree remains unchanged, but we mark the predecessor of $u$ as the new current vertex. (2) If $k_{n+1} = k_n + 1$, then we introduce a new vertex, say $v$, that is connected to $u$ by an edge in the tree and becomes the new current vertex. When the procedure ends, we are back at $\varepsilon$ as the current vertex.
Another way is to say that the vertex set of $T(\mathbf k)$ consists of all those initial subsequences $(k_0, \dots, k_n)$ of $\mathbf k$, where $0 < n < N$ and $k_n = k_{n-1} + 1$. The predecessor of the vertex corresponding to $(k_0, \dots, k_n)$ with $n > 1$ is the longest proper initial subsequence of $(k_0, \dots, k_n)$ that is itself a vertex and ends at the level $k_n - 1$. The root corresponds to $(k_0, k_1) = (0, 1)$.
This is the walk-to-tree coding with depth-first traversal. For example, the trajectory $(0, 1, 2, 3, 2, 3, 4, 3, 4, 3, 2, 1, 2, 3, 2, 3, 2, 3, 2, 1, 0)$ induces the first of the two trees in Figure 17. (The initial and final 0 are there automatically. We might as well consider only sequences in $\mathbb N$ that start and end at 1.)
Conversely, when we start with a rooted plane tree, we can read its "contour", which is a trajectory $\mathbf k$ as above (with the initial and final 0 added). We leave it to the reader to describe how this trajectory is obtained by a recursive algorithm.
The number of vertices of $T(\mathbf k)$ is $|T(\mathbf k)| = N/2$. Any vertex at level (= distance from the root) $k$ corresponds to precisely one step from $k$ to $k+1$ within the trajectory $\mathbf k$. (It also encodes the next occurring step from $k+1$ to $k$.) Thus, each vertex encodes an upward crossing, and $T(\mathbf k)$ is the genealogical tree that we have described at the beginning, of which we want to prove that it is a Galton–Watson tree with offspring distribution $\mu$.
The correspondence between k and T = T(k) is one-to-one. The probability
that our random walk trajectory until the first return to 0 induces T is

    Pr^RW(T) = Pr_0[Z_0 = k_0, ..., Z_N = k_N] = q (pq)^{N/2 − 1} = q^{|T|} p^{|T|−1},

where the superscript RW means "random walk". Now, further above a general
construction of a probability space was given on which one can realize a Galton–
Watson tree with general offspring distribution μ. For our specific example where
μ is as in (5.34), we can also use a simpler model.
Namely, it is sufficient to have a sequence (X_n)_{n≥1} of i.i.d. {±1}-valued random
variables with Pr^GW[X_n = 1] = p and Pr^GW[X_n = −1] = q. The superscript
GW refers to the Galton–Watson tree that we are going to construct. We consider
the value +1 as "success" and −1 as "failure". With probability one, both values
+1 and −1 occur infinitely often. With (X_n) we can build up recursively a random
genealogical tree T, based on the breadth-first order of any rooted plane tree. This
is the linear order where for vertices u, v we have u < v if either the distances to
the root satisfy |u| < |v|, or |u| = |v| and u is further to the left than v.
At the beginning, the only vertex of the tree is the root ε. For each success
before the first failure, we draw one offspring of the root. At the first failure, ε
is declared processed. At each subsequent step, we consider the next unprocessed
vertex of the current tree in the breadth-first order, and we give it one offspring
for each success before the next failure. When that failure occurs, that vertex is
processed, and we turn to the next vertex in the list. The process ends when no
more unprocessed vertex is available; otherwise, it continues. By construction,
the offspring numbers of different vertices are i.i.d. and geometrically distributed.
Thus, we obtain a Galton–Watson tree as proposed. Since

    μ̄ ≤ 1  ⟺  p ≤ 1/2,

that tree is a.s. finite in our case. The probability that it will be a given finite
rooted plane tree T is obtained as follows: each edge of T must correspond to a
success, and for each vertex, there must be one failure (when that vertex stops
creating offspring). Thus, the first 2|T| − 1 members of (X_n) must consist of |T| − 1
successes and |T| failures. Therefore
    Pr^GW(T) = q^{|T|} p^{|T|−1} = Pr^RW(T).

We have obtained a one-to-one correspondence between random walk trajectories
until the first return to 0 and Galton–Watson trees with geometric offspring distri-
bution, and that correspondence preserves the probability measure.
This proves that the random tree created from the upcrossings of the drunkard's
walk with p ≤ 1/2 is indeed the proposed Galton–Watson tree.
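The breadth-first construction from successes and failures can be sketched as follows. This is a hypothetical rendering of ours (the list representation and the convention of treating an exhausted flip sequence as failures are our own choices).

```python
# Build the genealogical tree from a sequence of +1 (success) / -1 (failure)
# coin flips: each vertex, taken in breadth-first order, receives one child
# per success read before its terminating failure.

def tree_from_flips(flips):
    """Return the offspring numbers, in breadth-first order, of the tree
    built from an iterable of +1 and -1 values."""
    it = iter(flips)
    offspring = []           # offspring[i] = number of children of vertex i
    unprocessed = 1          # the root is waiting to be processed
    while unprocessed > 0:
        count = 0
        for x in it:
            if x == 1:
                count += 1   # one more child of the current vertex
            else:
                break        # failure: the current vertex is processed
        offspring.append(count)
        unprocessed += count - 1
    return offspring
```

For instance, the flip sequence (+1, +1, −1, +1, −1, −1, −1) produces a tree with 4 vertices, using 2·4 − 1 = 7 flips: 3 successes (the edges) and 4 failures (one per vertex), in line with the counting argument above.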
We remark that the above correspondence between drunkard's walks and Galton–
Watson trees was first described by Harris [30].

5.37 Exercise. Show that when p > 1/2 in Example 5.36, then (M_k) is not a
Galton–Watson process.
C Branching Markov chains
We now combine the two models: Markov chain and Galton–Watson process.
We imagine that at a given time n, the members of the n-th generation of a finite
population occupy various points (sites) of the state space of a Markov chain (X, P).
Multiple occupancies are allowed, and the initial generation has only one member
(the ancestor). The population evolves according to a Galton–Watson process with
offspring distribution μ, and at the same time performs random moves according
to the underlying Markov chain. In order to create the members of the next generation
plus the sites that they occupy, each member of the n-th generation produces its k
children, according to the underlying Galton–Watson process, with probability μ(k),
and then dies (or we may say that it splits into k new members of the population).
Each of those new members which are thus born at a site x ∈ X then moves
instantly to a random new site y with probability p(x, y), independently of all
others and independently of the past. In this way, we get the next generation and
the positions of its members. Here, we always suppose that the offspring distribution
lives on ℕ_0 (that is, μ(∞) = 0) and is non-degenerate (we do not have μ(0) = 1
or μ(1) = 1). We do allow that μ(0) > 0, in which case the process may die out
with positive probability.
The construction of a probability space on which this model may be realized
can be elaborated at various levels of rigour. One that comprises all the available
information is the following.
A single trajectory should consist of a full generation tree T, where to each
vertex u of T (= element of the population) we attach an element x_u ∈ X, which
is the position of u. If v is a successor of u in T, then v occupies site y with
probability p(x, y), given that u occupies x. This has to be independent of all the
other members of the population.
Thus (recalling that S is the support of μ), our space is

    Ω^BMC = (S × X)^{Σ*} = { ω = (k_u, x_u)_{u∈Σ*} : k_u ∈ S, x_u ∈ X }.

It is once more equipped with the product σ-algebra 𝒜^BMC of the discrete one
on S × X. For ω ∈ Ω^BMC, let ω̄ = (k_u)_{u∈Σ*} be its projection onto Ω^GW.
The associated Galton–Watson tree is T(ω) = T(ω̄), defined as at the end of the
preceding section.
Now let τ be any finite subtree of Σ* containing the root, and choose an element
a_u ∈ X for every u ∈ τ. For every u ∈ τ, we let

    k(u) = k_τ(u) = max{ j : uj ∈ τ },

in particular k(u) = 0 if u has no successor in τ. With these data, we associate the
basic event (analogous to the cylinder sets of Section 1.B)

    D(τ; a_u, u ∈ τ) = { ω = (k_u, x_u)_{u∈Σ*} : x_u = a_u and k_u ≥ k(u) for all u ∈ τ }.

Since k_u = N_u(ω) is the number of children of u in the Galton–Watson process, the
condition k_u ≥ k(u) means that whenever uj ∈ τ then this is one of the children
of u. Thus, D(τ; a_u, u ∈ τ) is the event that τ is part of T(ω) and that each member
u ∈ τ of the population occupies the site a_u ∈ X.
Given a starting point x ∈ X, the probability measure governing the branching
Markov chain is now the unique measure on 𝒜^BMC with

    Pr^BMC_x[ D(τ; a_u, u ∈ τ) ] = δ_x(a_ε) μ⟨k(ε), ∞) ∏_{u∈τ, u≠ε} μ⟨k(u), ∞) p(a_{u⁻}, a_u),    (5.38)

where u⁻ denotes the predecessor of u, and μ⟨j, ∞) = μ({ k : k ≥ j }).
We can now introduce the random variables that describe the branching Markov
chain BMC(X, P, μ) starting at x ∈ X, where μ is the offspring distribution. If
ω = (k_u, x_u)_{u∈Σ*} then for u ∈ Σ*, we write

    Z_u(ω) = x_u

for the site occupied by the element u. Of course, we are only interested in Z_u(ω)
when u ∈ T(ω), that is, when u belongs to the population that descends from the
ancestor ε. The branching Markov chain is then the (random) Galton–Watson tree
T together with the family of random variables (Z_u)_{u∈T} indexed by the Galton–
Watson tree T. The Markov property extended to this setting says the following.
5.39 Facts. (1) For j ∈ ℕ and x, y ∈ X,

    Pr^BMC_x[ j ∈ T, Z_j = y ] = μ⟨j, ∞) p(x, y).

(2) If u, u′ ∈ Σ* are such that neither u ≼ u′ nor u′ ≼ u, then the families

    ( T_u ; (Z_v)_{v∈T_u} )  and  ( T_{u′} ; (Z_{v′})_{v′∈T_{u′}} )

are independent.
(3) Given that Z_u = y, the family (Z_v)_{v∈T_u} is BMC(X, P, μ) starting at y.
(4) In particular, if T contains a ray { u_n = j_1 ⋯ j_n : n ≥ 0 } (infinite path
starting at ε) then (Z_{u_n})_{n≥0} is a Markov chain on X with transition matrix P.
Property (3) also comprises the generalization of time-homogeneity. The num-
ber of members of the n-th generation that occupy the site y ∈ X is

    M^y_n(ω) = | { u ∈ T(ω) : |u| = n, Z_u(ω) = y } |.

Thus, the underlying Galton–Watson process (number of members of the n-th gen-
eration) is

    M_n = Σ_{y∈X} M^y_n,

while

    M_y = Σ_{n=0}^∞ M^y_n

is the total number of occupancies of the site y ∈ X during the whole lifetime of
the BMC. The random variable M_y takes its values in ℕ_0 ∪ {∞}. The following
may be quite clear.
5.40 Lemma. Let u = j_1 ⋯ j_n ∈ Σ* and x, y ∈ X. Then

    Pr^BMC_x[ u ∈ T, Z_u = y ] = p^(n)(x, y) ∏_{i=1}^n μ⟨j_i, ∞).
Proof. We use induction on n. If n = 1 and u = j_1 then this is (5.39.1). Now
suppose the statement is true for u = j_1 ⋯ j_n and let v = u j_{n+1} ∈ Σ*. Then
v ∈ T implies u ∈ T. Using this and (5.39.3),

    Pr^BMC_x[ v ∈ T, Z_v = y ]
    = Σ_{w∈X} Pr^BMC_x[ v ∈ T, Z_v = y | u ∈ T, Z_u = w ] · Pr^BMC_x[ u ∈ T, Z_u = w ]
    = Σ_{w∈X} Pr^BMC_x[ u j_{n+1} ∈ T, Z_{u j_{n+1}} = y | u ∈ T, Z_u = w ] · p^(n)(x, w) ∏_{i=1}^n μ⟨j_i, ∞)
    = Σ_{w∈X} Pr^BMC_w[ j_{n+1} ∈ T, Z_{j_{n+1}} = y ] · p^(n)(x, w) ∏_{i=1}^n μ⟨j_i, ∞)
    = Σ_{w∈X} p^(n)(x, w) p(w, y) ∏_{i=1}^{n+1} μ⟨j_i, ∞).

This leads to the proposed statement. □
5.41 Exercise. (a) Deduce the following from Lemma 5.40. When the initial site
is x, then the expected number of members of the n-th generation that occupy the
site y is

    E^BMC_x(M^y_n) = p^(n)(x, y) μ̄^n,

where (recall) μ̄ is the expected offspring number.
[Hint: write M^y_n as a sum of indicator functions of events as those in Lemma 5.40.]
(b) Let x, y ∈ X and p^(k)(x, y) > 0. Show that Pr^BMC_x[ M^y_k ≥ 1 ] > 0.
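The identity in part (a) is easy to check by simulation. The following Monte Carlo sketch is ours, not the book's; the two-state chain and the offspring law are toy choices made for illustration only.

```python
import random

# Monte Carlo illustration of Exercise 5.41(a): E[M_n^y] = p^(n)(x,y) * mu_bar^n.
# Toy data: X = {0,1} with p(x,y) = 1/2 for all x,y, and mu = (1/4, 1/2, 1/4),
# so mu_bar = 1 and p^(3)(0,0) * mu_bar^3 = 1/2.

def bmc_generation(P, mu, x, n, rng):
    """Run BMC(X, P, mu) from x for n generations; return the list of sites
    occupied by the n-th generation (empty if the population died out)."""
    sites = [x]
    for _ in range(n):
        nxt = []
        for s in sites:
            k = rng.choices(range(len(mu)), weights=mu)[0]    # offspring number
            for _ in range(k):                                # each child moves
                nxt.append(rng.choices(range(len(P)), weights=P[s])[0])
        sites = nxt
    return sites

rng = random.Random(1)
P = [[0.5, 0.5], [0.5, 0.5]]
mu = [0.25, 0.5, 0.25]
trials = 20000
est = sum(bmc_generation(P, mu, 0, 3, rng).count(0) for _ in range(trials)) / trials
# Exercise 5.41(a) predicts the value 0.5 for this estimate.
```

The estimate `est` should be close to 0.5, up to Monte Carlo error.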
The question of recurrence or transience becomes more subtle for BMC than for
ordinary Markov chains. We ask for the probability that throughout the lifetime of
the process, some site y is occupied by infinitely many members in the successive
generations of the population. In analogy with Section 3.A, we define the quantities

    H^BMC(x, y) = Pr^BMC_x[ M_y = ∞ ],  x, y ∈ X.

5.42 Theorem. One either has (a) H^BMC(x, y) = 1 for all x, y ∈ X, or (b)
0 < H^BMC(x, y) < 1 for all x, y ∈ X, or (c) H^BMC(x, y) = 0 for all x, y ∈ X.
Before the proof, we need the following.

5.43 Lemma. For all x, y, y′ ∈ X, one has Pr^BMC_x[ M_y = ∞, M_{y′} < ∞ ] = 0.
Proof. Let k be such that p^(k)(y, y′) > 0. First of all, using continuity of the
probability along increasing sequences,

    Pr^BMC_x[ M_y = ∞, M_{y′} < ∞ ] = lim_{m→∞} Pr^BMC_x( [M_y = ∞] ∩ F_m ),

where F_m = [ M^{y′}_n = 0 for all n ≥ m ]. Now, [M_y = ∞] ∩ F_m is the limit
(intersection) of the decreasing sequence of the events A_{m,r}, where

    A_{m,r} = [ there are n(1), ..., n(r) ≥ m with n(j) > n(j−1) + k
                such that M^y_{n(j)} ≥ 1 for all j ] ∩ F_m
            ⊆ B_{m,r} = [ there are n(1), ..., n(r) ≥ m with n(j) > n(j−1) + k
                such that M^y_{n(j)} ≥ 1 and M^{y′}_{n(j)+k} = 0 for all j ].

If ω ∈ B_{m,r} then there are (random) elements u(j) ∈ T(ω) with |u(j)| = n(j)
such that Z_{u(j)}(ω) = y. Since n(j) > n(j−1) + k, the initial parts with
height k of the generation trees rooted at the u(j) are independent. None of the
descendants of the u(j) after k generations occupies site y′. Therefore, if we let
σ = Pr^BMC_y[ M^{y′}_k = 0 ] then σ < 1 by Exercise 5.41 (b), and

    Pr^BMC_x( A_{m,r} ) ≤ Pr^BMC_x( B_{m,r} ) ≤ σ^r.

Letting r → ∞, we obtain Pr^BMC_x( [M_y = ∞] ∩ F_m ) = 0, as required. □
Proof of Theorem 5.42. We first fix y. Let x, x′ ∈ X be such that p(x′, x) > 0,
and choose k ≥ 1 such that μ(k) > 0. The probability that the BMC starting at
x′ produces k children, which all move to x, is

    Pr^BMC_{x′}[ N_ε = k = M^x_1 ] = μ(k) p(x′, x)^k.

Therefore

    H^BMC(x′, y) ≥ Pr^BMC_{x′}[ N_ε = k = M^x_1, M_y = ∞ ]
    = μ(k) p(x′, x)^k Pr^BMC_{x′}[ M_y = ∞ | N_ε = k = M^x_1 ].
The last factor coincides with the probability that BMC with k particles (instead
of one) starting at x evolves such that M_y = ∞. That is, we have independent
replicas M_{y,1}, ..., M_{y,k} of M_y descending from each of the k children of ε that
are now occupying site x, and

    H^BMC(x′, y) / ( μ(k) p(x′, x)^k ) ≥ Pr^BMC_x[ M_{y,1} + ⋯ + M_{y,k} = ∞ ]
    = 1 − Pr^BMC_x[ M_{y,1} < ∞, ..., M_{y,k} < ∞ ]
    = 1 − ( 1 − H^BMC(x, y) )^k ≥ H^BMC(x, y).
Irreducibility now implies that whenever H^BMC(x, y) > 0 for some x, then we have
H^BMC(x′, y) > 0 for all x′ ∈ X. In the same way,

    1 − H^BMC(x′, y) = Pr^BMC_{x′}[ M_y < ∞ ]
    ≥ μ(k) p(x′, x)^k Pr^BMC_x[ M_{y,1} + ⋯ + M_{y,k} < ∞ ]
    = μ(k) p(x′, x)^k ( 1 − H^BMC(x, y) )^k.
Again, irreducibility implies that whenever H^BMC(x, y) < 1 for some x, then
H^BMC(x′, y) < 1 for all x′ ∈ X. Now Lemma 5.43 implies that when we have
H^BMC(x, y) = 1 for some x, y, then this holds for all x, y ∈ X, see the next
exercise. □

5.44 Exercise. Show that indeed Lemma 5.43 together with the preceding argu-
ments yields the final part of the proof of the theorem.
The last theorem justifies the following definition.

5.45 Definition. The branching Markov chain (X, P, μ) is called strongly recurrent
if H^BMC(x, y) = 1, weakly recurrent if 0 < H^BMC(x, y) < 1, and transient if
H^BMC(x, y) = 0 for some (equivalently, all) x, y ∈ X.

Contrary to ordinary Markov chains, we do not have a zero-one law here as in
Theorem 3.2: we shall see examples for each of the three regimes of Definition 5.45.
Before this, we shall undertake some additional efforts in order to establish a general
transience criterion.
Embedded process

We introduce a new, embedded Galton–Watson process whose population is just
the set of elements of the original Galton–Watson tree T that occupy the starting
point x ∈ X of BMC(X, P, μ). We define a sequence (W^x_n)_{n≥0} of subsets of T:
we start with W^x_0 = {ε}, and for n ≥ 1,

    W^x_n = { v ∈ T : Z_v = x, |{ u ∈ T : ε ≠ u ≼ v, Z_u = x }| = n }.

Thus, v ∈ W^x_n means that if ε = u_0, u_1, ..., u_r = v are the successive points on
the shortest path in T from ε to v, then (Z_{u_k})_{k=0,...,r} starts at x and returns
to x precisely n times. In particular, the definition of W^x_1 should remind the reader
of the first return stopping time t^x defined for ordinary Markov chains in (1.26).
We set Y^x_n = |W^x_n|.
5.46 Lemma. The sequence (Y^x_n)_{n≥0} is an extended Galton–Watson process with
non-degenerate offspring distribution

    ν_x(m) = Pr^BMC_x[ Y^x_1 = m ],  m ∈ ℕ_0 ∪ {∞}.

Its expected offspring number is

    ν̄_x = U(x, x | μ̄) = Σ_{n=1}^∞ u^(n)(x, x) μ̄^n,

where μ̄ is the expected offspring number in BMC(X, P, μ), and U(x, x | ·) is the
generating function of the first return probabilities to x for the Markov chain (X, P),
as defined in (1.37).
Proof. We know from (5.39.3) that for distinct elements u ∈ W^x_n, the families
(Z_v)_{v∈T_u} are independent copies of BMC(X, P, μ) starting at x. Therefore the
random numbers |{ v ∈ W^x_{n+1} : u ≼ v }|, where u ∈ W^x_n, are independent and
all have the same distribution as Y^x_1. Now,

    W^x_{n+1} = ⋃_{u∈W^x_n} { v ∈ W^x_{n+1} : u ≼ v },

a disjoint union. This makes it clear that (Y^x_n)_{n≥0} is a (possibly extended) Galton–
Watson process with offspring distribution ν_x. The sets W^x_n, n ∈ ℕ, are its succes-
sive generations. In order to describe ν_x, we consider the events [v ∈ W^x_1].
5.47 Exercise. Prove that if v = j_1 ⋯ j_n ∈ Σ* then

    Pr^BMC_x[ v ∈ W^x_1 ] = u^(n)(x, x) ∏_{i=1}^n μ⟨j_i, ∞).
We resume the proof of Lemma 5.46 by observing that

    Y^x_1 = Σ_{v∈Σ*} 1_{[v ∈ W^x_1]} = Σ_{n=1}^∞ Σ_{|v|=n} 1_{[v ∈ W^x_1]}.

Now, writing v = j_1 ⋯ j_n,

    E^BMC_x[ Σ_{|v|=n} 1_{[v∈W^x_1]} ] = Σ_{v=j_1⋯j_n} Pr^BMC_x[ v ∈ W^x_1 ]
    = Σ_{j_1,...,j_n ∈ ℕ} u^(n)(x, x) ∏_{i=1}^n μ⟨j_i, ∞)
    = u^(n)(x, x) ∏_{i=1}^n ( Σ_{j_i∈ℕ} μ⟨j_i, ∞) )
    = u^(n)(x, x) μ̄^n,    (5.48)

since Σ_{j≥1} μ⟨j, ∞) = Σ_{j≥1} Σ_{k≥j} μ(k) = Σ_k k μ(k) = μ̄.
This leads to the proposed formula for ν̄_x = E^BMC_x(Y^x_1). At last, we show that ν_x
is non-degenerate, that is, Pr^BMC_x[ Y^x_1 = 1 ] < 1, since clearly ν_x(0) < 1 (e.g., by
Exercise 5.47). By assumption, μ(1) < 1. If the population dies out at the first
step then also Y^x_1 = 0. That is, Pr^BMC_x[ Y^x_1 = 0 ] ≥ μ(0), and if μ(0) > 0 then
Pr^BMC_x[ Y^x_1 = 1 ] < 1. So assume that μ(0) = 0. Then there is m ≥ 2 such that
μ(m) > 0. By irreducibility, there is n ≥ 1 such that u^(n)(x, x) > 0. In particular,
there are x_0, x_1, ..., x_n in X with x_0 = x_n = x and x_k ≠ x for 1 ≤ k ≤ n − 1
such that p(x_{k−1}, x_k) > 0. We can consider the subtree τ of Σ* with height n that
consists of all elements u ∈ {1, ..., m}* ⊆ Σ* with |u| ≤ n. Then (5.38) yields

    Pr^BMC_x[ Y^x_1 ≥ m^n ] ≥ Pr^BMC_x[ τ ⊆ T, Z_u = x_k for all u ∈ τ with |u| = k ]
    = μ⟨m, ∞)^{1+m+⋯+m^{n−1}} ∏_{k=1}^n p(x_{k−1}, x_k)^{m^k} > 0,

and ν_x is non-degenerate. □
5.49 Theorem. For an irreducible Markov chain (X, P), BMC(X, P, μ) is tran-
sient if and only if μ̄ ≤ 1/ρ(P).

Proof. For BMC starting at x, the total number of occupancies of x is

    M_x = Σ_{n=0}^∞ Y^x_n.

We can apply Theorem 5.30 to the embedded Galton–Watson process (Y^x_n)_{n≥0},
taking into account Exercise 5.35 in the case when Pr^BMC_x[ Y^x_1 = ∞ ] > 0. Namely,
Pr^BMC_x[ M_x < ∞ ] = 1 precisely when the embedded process has average offspring
≤ 1, that is, when U(x, x | μ̄) ≤ 1. By Proposition 2.28, the latter holds if and only
if μ̄ ≤ 1/ρ(P). □
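The criterion is easy to evaluate numerically for the drunkard's walk on ℤ with up-probability p. The sketch below is our own illustration, using the classical facts that here U(0,0|z) = 1 − √(1 − 4pq z²) (diverging past its radius of convergence) and ρ(P) = 2√(pq); it is not code from the book.

```python
import math

# Mean offspring of the embedded GW process for the drunkard's walk on Z:
# nu_bar = U(0,0|mu_bar) = 1 - sqrt(1 - 4 p q mu_bar^2), diverging when the
# argument exceeds the radius of convergence s = 1/rho(P), rho(P) = 2 sqrt(pq).

def U(p, z):
    t = 1 - 4 * p * (1 - p) * z * z
    return 1 - math.sqrt(t) if t >= 0 else float("inf")

p = 0.7
rho = 2 * math.sqrt(p * (1 - p))   # ~0.9165, so 1/rho(P) ~ 1.0911
print(U(p, 1.0))   # mu_bar = 1.0 <= 1/rho: subcritical, BMC transient
print(U(p, 1.2))   # mu_bar = 1.2 >  1/rho: supercritical, BMC recurrent
```

For μ̄ = 1 the embedded mean offspring is 0.6 < 1 (transient regime), while for μ̄ = 1.2 > 1/ρ(P) the series for U(x,x|μ̄) diverges (recurrent regime), in agreement with Theorem 5.49.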
Regarding the last theorem, it was observed by Benjamini and Peres [6]
that BMC(X, P, μ) is transient when μ̄ < 1/ρ(P) and recurrent when μ̄ >
1/ρ(P). Transience in the critical case μ̄ = 1/ρ(P) was settled by Gantert
and Müller [23].¹
While the last theorem provides a good tool to distinguish between the transient
and the recurrent regime, at the moment (2009) no general criterion of compara-
ble simplicity is known to distinguish between weak and strong recurrence. The
following is quite obvious.
5.50 Lemma. Let x ∈ X. BMC(X, P, μ) is strongly recurrent if and only if

    Pr^BMC_x[ Z_u = x for some u ∈ T \ {ε} ] = 1,

or equivalently (when |X| ≥ 2), for all y ∈ X \ {x},

    Pr^BMC_y[ Z_u = x for some u ∈ T \ {ε} ] = 1.

For this it is necessary that μ(0) = 0.
Proof. We have strong recurrence if and only if the extinction probability of the
embedded Galton–Watson process (Y^x_n) is 0. A quick look at Theorem 5.30 and
Figures 16 a, b convinces us that this is true if and only if ν̄_x > 1 and ν_x(0) = 0.
Since ν_x is non-degenerate, ν_x(0) = 0 implies ν̄_x > 1. Therefore we have strong
recurrence if and only if ν_x(0) = Pr^BMC_x[ Y^x_1 = 0 ] = 0. Since

    Pr^BMC_x[ Y^x_1 = 0 ] = 1 − Pr^BMC_x[ Z_u = x for some u ∈ T \ {ε} ],

the proposed criterion follows.
If μ(0) > 0 then with positive probability, the underlying Galton–Watson tree
T consists only of ε, in which case Y^x_1 = 0. Therefore Pr^BMC_x[ M_x = ∞ ] ≤
Pr^BMC_x[ Y^x_1 > 0 ] < 1, and recurrence cannot be strong. This proves the first criterion.
It is clear that the second criterion is necessary for strong recurrence: if infinitely
many members of the population occupy site x, when the starting point is y ≠ x, then
at least one member distinct from ε must occupy x. Conversely, suppose that the
criterion is satisfied. Then necessarily μ(0) = 0, and each member of the non-
empty first generation moves to some y ∈ X. If one of them stays at x (when
p(x, x) > 0) then we have an element u ∈ T \ {ε} such that Z_u = x. If one of
them has moved to y ≠ x, then our criterion guarantees that one of its descendants
will come back to x with probability 1. Therefore the first criterion is satisfied, and
we have strong recurrence. □
¹The above short proof of Theorem 5.49 is the outcome of a discussion between Nina Gantert and
the author.
5.51 Exercise. Elaborate the details of the last argument rigorously.

We remark that in our definition of strong recurrence we mainly have in mind
the case when μ(0) = 0: there is always at least one offspring. In the case when
μ(0) > 0, we have the obvious inequality 0 ≤ H^BMC(x, y) ≤ 1 − z, where
z is the extinction probability of the underlying Galton–Watson process. One
then may strengthen Theorem 5.42 by showing that one of H^BMC(x, y) = 1 − z,
0 < H^BMC(x, y) < 1 − z or H^BMC(x, y) = 0 holds for all x, y ∈ X. Then one
may redefine strong recurrence by H^BMC(x, y) = 1 − z. We leave the details of
the modified proof of Theorem 5.42 to the interested reader.
We note the following consequence of (5.39.4) and Lemma 5.50.

5.52 Exercise. Show that if the irreducible Markov chain (X, P) is recurrent and the
offspring distribution satisfies μ(0) = 0 then BMC(X, P, μ) is strongly recurrent.

There is one class of Markov chains where a complete description of strong
recurrence is available, namely random walks on groups as considered in (4.18).
Here we run into a small conflict of notation, since in the present chapter, μ stands
for the offspring distribution of a Galton–Watson process and not for the law of a
random walk on a group. Therefore we just write P for the (irreducible) transition
matrix of our random walk on X = G and recall that it satisfies

    p^(n)(x, y) = p^(n)(gx, gy)  for all x, y, g ∈ G and n ∈ ℕ_0.
An obvious, but important consequence is the following.

5.53 Lemma. If (Z_u)_{u∈T} is BMC(G, P, μ) starting at x ∈ G and g ∈ G, then
(gZ_u)_{u∈T} is (a realization of) BMC(G, P, μ) starting at y = gx.

(By "a realization of" we mean the following: we have chosen a concrete
construction of a probability space and associated family of random variables that
are our model of BMC. The family (gZ_u)_{u∈T} is not exactly the one of this model,
but it has the same distribution.)
5.54 Theorem. Let (G, P) be an irreducible random walk on the group G. Then
BMC(G, P, μ) is strongly recurrent if and only if the offspring distribution μ sat-
isfies μ(0) = 0 and μ̄ > 1/ρ(P).
Proof. We know that transience holds precisely when μ̄ ≤ 1/ρ(P). So we assume
that μ(0) = 0 and μ̄ > 1/ρ(P). Then there is k ∈ ℕ such that

    δ = p^(k)(x, x) μ̄^k > 1,
which is independent of x by group invariance. We fix this k and construct another
family of embedded Galton–Watson processes of the BMC. Suppose that u ∈ T.
We define recursively a sequence of subsets V^u_n of T (as long as they are non-empty):

    V^u_0 = {u},  V^u_1 = { v ∈ T_u : |v| = |u| + k, Z_v = Z_u },  and  V^u_{n+1} = ⋃_{w∈V^u_n} V^w_1.

In words, if |u| = n_0 then V^u_1 consists of all descendants of u in generation number
n_0 + k that occupy the same site as u, and V^u_{n+1} consists of all descendants of the
elements in V^u_n that belong to generation number n_0 + (n+1)k and occupy again
the same site. Then ( |V^u_n| )_{n≥0} is a Galton–Watson process, which we call shortly
the (u, k)-process. By Lemma 5.53, all the (u, k)-processes, where u ∈ T (and
k is fixed), have the same offspring distribution. Suppose that Z_u = y. Then,
by Exercise 5.41 (a), the average of this offspring distribution is p^(k)(y, y) μ̄^k =
δ > 1. By Theorem 5.30, we have z < 1 for the extinction probability of the
(u, k)-process, and z does not depend on y = Z_u.
Now recall that we assume μ(0) = 0 and that μ(1) ≠ 1, so that μ⟨2, ∞) > 0.
We set u(m) = 1⋯12, the word starting with m − 1 letters 1 and the last
letter 2, where m ∈ ℕ. Then the predecessor of u(m) (the word 1⋯1 with length
m − 1) is in T almost surely, so that Pr[ u(m) ∈ T ] = μ⟨2, ∞) > 0. For m_1 ≠ m_2,
none of u(m_1) or u(m_2) is a predecessor of the other. Therefore (5.39.2) implies
that all the (u(m), k)-processes are mutually independent, and so are the events

    F_m = [ u(m) ∈ T, the (u(m), k)-process survives ].

Since Pr(F_m) = (1 − z) μ⟨2, ∞) > 0 is constant, the complements of the F_m satisfy
Pr( ⋂_m F^c_m ) = 0. On ⋃_m F_m, at least one of the (u(m), k)-processes survives, and
all of its members belong to T. Therefore

    Pr_x[ M_y = ∞ for some y ∈ X ] = 1.

On the other hand, by Lemma 5.43,

    Pr_x[ M_x < ∞, M_y = ∞ for some y ∈ X ] ≤ Σ_{y∈X} Pr_x[ M_x < ∞, M_y = ∞ ] = 0.

Thus Pr_x[ M_x < ∞ ] = 0, as proposed. □
We see that in the group invariant case, weak recurrence never occurs. Now we
construct examples where one can observe the phase transition from transience via
weak recurrence to strong recurrence, as the average offspring number μ̄ increases.
We start with two irreducible Markov chains (X_1, P_1) and (X_2, P_2) and connect
the two state spaces at a single root o. That is, we assume that X_1 ∩ X_2 = {o}.
(Or, in other words, we identify roots o_i ∈ X_i, i = 1, 2, to become one common
point o, while keeping the rest of the X_i disjoint.) Then we choose parameters
α_1, α_2 = 1 − α_1 > 0 and define a new Markov chain (X, P), where X = X_1 ∪ X_2
and P is given as follows (where i = 1, 2):
    p(x, y) = p_i(x, y),                        if x, y ∈ X_i and x ≠ o,
    p(x, y) = α_i p_i(o, y),                    if x = o and y ∈ X_i \ {o},
    p(x, y) = α_1 p_1(o, o) + α_2 p_2(o, o),    if x = y = o, and
    p(x, y) = 0                                 in all other cases.    (5.55)

In words, if the Markov chain at some time has its current state in X_i \ {o}, then it
evolves in the next step according to P_i, while if the current state is o, then a coin
is tossed (whose outcomes are 1 or 2 with probability α_1 and α_2, respectively) in
order to decide whether to proceed according to p_1(o, ·) or p_2(o, ·).
The new Markov chain is irreducible, and o is a cut point between X_1 \ {o}
and X_2 \ {o} (and vice versa) in the sense of Definition 1.42. It is immediate from
Theorem 1.38 that

    U(o, o | z) = α_1 U_1(o, o | z) + α_2 U_2(o, o | z).

Let s_i and s be the radii of convergence of the power series U_i(o, o | z) (i = 1, 2)
and U(o, o | z), respectively. Then (since these power series have non-negative
coefficients) s = min{s_1, s_2}.
5.56 Lemma. min{ρ(P_1), ρ(P_2)} ≤ ρ(P) ≤ max{ρ(P_1), ρ(P_2)}.
If (X_i, P_i) is not ρ(P_i)-positive-recurrent for i = 1, 2, then

    ρ(P) = max{ρ(P_1), ρ(P_2)}.
Proof. We use Proposition 2.28. Let r_i = 1/ρ(P_i) and r = 1/ρ(P) be the radii of
convergence of the respective Green functions.
If z_0 = min{r_1, r_2} then U_i(o, o | z_0) ≤ 1 for i = 1, 2, whence U(o, o | z_0) ≤ 1.
Therefore z_0 ≤ r.
Conversely, if z > max{r_1, r_2} then U_i(o, o | z) > 1 for i = 1, 2, whence
U(o, o | z) > 1. Therefore r < z.
Finally, if none of the (X_i, P_i) is ρ(P_i)-positive-recurrent then we know from Ex-
ercise 3.71 that r_i = s_i. With z_0 as above, if z > z_0 = min{s_1, s_2}, then at
least one of the power series U_1(o, o | z) and U_2(o, o | z) diverges, so that certainly
U(o, o | z) > 1. Again by Proposition 2.28, r < z. Therefore r = z_0, which proves
the last statement of the lemma. □
5.57 Proposition. If P on X_1 ∪ X_2 with X_1 ∩ X_2 = {o} is defined as in (5.55),
then BMC(X, P, μ) is strongly recurrent if and only if BMC(X_i, P_i, μ) is strongly
recurrent for i = 1, 2.
Proof. Suppose first that BMC(X_i, P_i, μ) is strongly recurrent for i = 1, 2. Then
Pr^BMC_y[ Z^i_u = o for some u ∈ T \ {ε} ] = 1 for each y ∈ X_i, where (Z^i_u)_{u∈T} is of
course BMC(X_i, P_i, μ). Now, within X_i \ {o}, BMC(X, P, μ) and BMC(X_i, P_i, μ)
evolve in the same way. Therefore

    Pr^BMC_y[ Z^i_u = o for some u ∈ T \ {ε} ] = Pr^BMC_y[ Z_u = o for some u ∈ T \ {ε} ]

for all y ∈ X_i \ {o}, i = 1, 2. That is, the second criterion of Lemma 5.50 is
satisfied for BMC(X, P, μ).
Conversely, suppose that for at least one i ∈ {1, 2}, BMC(X_i, P_i, μ) is not
strongly recurrent. Then, once more by Lemma 5.50, there is y ∈ X_i \ {o} such
that Pr^BMC_y[ Z^i_u ≠ o for all u ∈ T ] > 0. Again, this probability coincides with
Pr^BMC_y[ Z_u ≠ o for all u ∈ T ], and BMC(X, P, μ) cannot be strongly recurrent. □
We now can construct a simple example where all three phases can occur.

5.58 Example. Let X_1 and X_2 be two copies of the additive group ℤ. Let P_1 (on X_1)
and P_2 (on X_2) be infinite drunkard's walks as in Example 3.5 with the parameters p_i
and q_i = 1 − p_i for i = 1, 2. We assume that 1/2 < p_1 < p_2. Thus ρ(P_i) = √(4 p_i q_i),
and ρ(P_1) > ρ(P_2). We can use (3.6) to see that these two random walks are
ρ-null-recurrent. We now connect the two walks at o = 0 as in (5.55). As above,
we write (X, P) for the resulting Markov chain. Its graph looks like an infinite
cross, that is, four half-lines emanating from o. The outgoing probabilities at o are
α_1 p_1 in direction East, α_1 q_1 in direction West, α_2 p_2 in direction North and α_2 q_2
in direction South. Along the horizontal line of the cross, all other Eastbound tran-
sition probabilities (to the next neighbour) are p_1 and all Westbound probabilities
are q_1. Analogously, along the vertical line of the cross, all Northbound transition
probabilities (except those at o) are p_2 and all Southbound transition probabilities
are q_2. The reader is invited to draw a figure.
By Lemma 5.56, ρ(P) = ρ(P_1). We conclude: in our example, BMC(X, P, μ)
is

    transient if and only if μ̄ ≤ 1/√(4 p_1 q_1),
    recurrent if and only if μ̄ > 1/√(4 p_1 q_1),
    strongly recurrent if and only if μ(0) = 0 and μ̄ > 1/√(4 p_2 q_2).
Further examples can be obtained from arbitrary random walks on countable
groups. One can proceed as in the last example, using the important fact that such
a random walk can never be ρ-positive-recurrent. This goes back to a theorem of
Guivarc'h [29], compare with [W2, Theorem 7.8].
The properties of the branching Markov chain (X, P, μ) give us the possibility to
give probabilistic interpretations of the Green function of the Markov chain (X, P),
as well as of ρ-recurrence and ρ-transience. We summarize.
For real z > 0, consider BMC(X, P, μ) with initial point x, where the offspring
distribution μ has mean μ̄ = z. By Exercise 5.41, the Green function of
the Markov chain (X, P),

    G(x, y | z) = Σ_{n=0}^∞ E^BMC_x(M^y_n) = E^BMC_x(M_y),

is the expected number of occupancies of the site y ∈ X during the lifetime of the
branching Markov chain. Also, we know from Lemma 5.46 that

    U(x, x | z) = E^BMC_x(Y^x_1)

is the average offspring number in the embedded Galton–Watson process Y^x_n =
|W^x_n|, where (recall) W^x_n consists of all elements u in T with the property that along
the shortest path in the tree T from ε to u, the n-th return to site x occurs at u.
Clearly r = 1/ρ(P) is the maximum value of z = μ̄ for which BMC(X, P, μ)
is transient, or, equivalently, for which the process (Y^x_n) dies out almost surely.
Now consider in particular BMC(X, P, μ) with μ̄ = r. If the Markov chain
(X, P) is ρ-transient, then the embedded process (Y^x_n) is sub-critical: its average
offspring number is < 1, and we do not only have Pr^BMC_x[ M_y < ∞ ] = 1, but also
E^BMC_x(M_y) < ∞ for all x, y ∈ X. In the ρ-recurrent case, (Y^x_n) is critical: its
average offspring number is = 1, and while Pr^BMC_x[ M_y < ∞ ] = 1, the expected
number of occupancies of y is E^BMC_x(M_y) = ∞ for all x, y ∈ X. We can also
consider the expected height in T of an element in the first generation W^x_1 of the
embedded process. By a straightforward adaptation of (5.48), this is

    E^BMC_x[ Σ_u |u| 1_{[u∈W^x_1]} ] = r U′(x, x | r).

It is finite when (X, P) is ρ-positive recurrent, and infinite when (X, P) is ρ-null-
recurrent.
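The "straightforward adaptation" can be spelled out: each word u of length n in W^x_1 contributes u^{(n)}(x,x) μ̄^n in expectation, exactly as in (5.48), so that with μ̄ = r,

```latex
\mathbb{E}^{\mathrm{BMC}}_x\Bigl[\sum_u |u|\,\mathbf{1}_{[u\in W^x_1]}\Bigr]
  \;=\; \sum_{n=1}^{\infty} n\,u^{(n)}(x,x)\,\bar\mu^{\,n}
  \;=\; r\sum_{n=1}^{\infty} n\,u^{(n)}(x,x)\,r^{\,n-1}
  \;=\; r\,U'(x,x \mid r),
```

and the last series is precisely the derivative of the first-return generating function at its radius-of-convergence endpoint, which is finite exactly in the ρ-positive-recurrent case.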
Chapter 6

Elements of the potential theory of transient Markov chains

A Motivation. The finite case

At the centre of classical potential theory stands the Laplace equation

    Δf = 0 on Ω ⊆ ℝ^d,    (6.1)

where Δ is the Laplace operator on ℝ^d and Ω is a relatively compact open domain.
A typical problem is to find a function f ∈ C(Ω̄), twice differentiable and a solution
of (6.1) in Ω, which satisfies

    f |_{∂Ω} = g ∈ C(∂Ω)    (6.2)

(Dirichlet problem). The function g represents the boundary data.
Let us consider the simple example where the dimension is d = 2 and Ω is the
interior of the square whose vertices are the points (1, 1), (−1, 1), (−1, −1) and
(1, −1). A typical method for approximating the solution of the problem (6.1)–(6.2)
consists in subdividing the square by a partition of its sides into 2n pieces of length
ε = 1/n; see Figure 18.

Figure 18

For the second order partial derivatives we then substitute the symmetric differences

    ∂²f/∂x² (x, y) ≈ [ f(x + ε, y) − 2 f(x, y) + f(x − ε, y) ] / ε²,
    ∂²f/∂y² (x, y) ≈ [ f(x, y + ε) − 2 f(x, y) + f(x, y − ε) ] / ε².
We then write down the difference equation obtained in this way from the Laplace equation in the points

$(x,y) = (i\delta,\, j\delta), \quad i,j = -n+1, \dots, 0, \dots, n-1:$

$\dfrac{f\bigl((i{+}1)\delta,\, j\delta\bigr) - 2f\bigl(i\delta,\, j\delta\bigr) + f\bigl((i{-}1)\delta,\, j\delta\bigr)}{\delta^2} + \dfrac{f\bigl(i\delta,\, (j{+}1)\delta\bigr) - 2f\bigl(i\delta,\, j\delta\bigr) + f\bigl(i\delta,\, (j{-}1)\delta\bigr)}{\delta^2} = 0.$

Setting $h(i,j) = f(i\delta, j\delta)$ and dividing by 4, we get

$\tfrac{1}{4}\bigl(h(i{+}1,j) + h(i{-}1,j) + h(i,j{+}1) + h(i,j{-}1)\bigr) - h(i,j) = 0,$   (6.3)

that is, $h(i,j)$ must coincide with the arithmetic average of the values of $h$ at the points which are neighbours of $(i,j)$ in the square grid. As $i$ and $j$ vary, this becomes a system of $(2n-1)^2$ linear equations in the unknown variables $h(i,j)$, $i,j = -n+1, \dots, 0, \dots, n-1$; the function $g$ on $\partial O$ yields the prescribed values

$h(\pm n, j) = g(\pm 1,\, j\delta)$ and $h(i, \pm n) = g(i\delta,\, \pm 1).$   (6.4)

This system can be resolved by various methods (and of course, the original partial differential equation can be solved by the classical method of separation of the variables). The observation which is of interest for us is that our equations are linked with simple random walk on $\mathbb{Z}^2$, see Example 4.63. Indeed, if $P$ is the transition matrix of the latter, then we can rewrite (6.3) as

$h(i,j) = Ph(i,j), \quad i,j = -n+1, \dots, 0, \dots, n-1,$

where the action of $P$ on functions is defined by (3.16). This suggests a strong link with Markov chain theory and, in particular, that the solution of the equations (6.3)–(6.4) can be found as well as interpreted probabilistically.
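The linear system (6.3)–(6.4) can be solved numerically by iterating the averaging property itself. Below is a minimal sketch; the function name, the sweep count and the harmonic boundary data $g(x,y) = x^2 - y^2$ are illustrative choices, not taken from the text.

```python
# A minimal numerical sketch of (6.3)-(6.4): each interior grid value is
# relaxed towards the average of its four neighbours until the mean value
# property holds.  Names and boundary data are illustrative.

def solve_dirichlet_on_square(n, g, sweeps=5000):
    """Grid points (i, j) with i, j in {-n, ..., n}; values on the boundary
    max(|i|, |j|) = n are prescribed by g, interior values are relaxed."""
    h = {}
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            h[i, j] = g(i, j) if max(abs(i), abs(j)) == n else 0.0
    for _ in range(sweeps):
        for i in range(-n + 1, n):
            for j in range(-n + 1, n):
                h[i, j] = 0.25 * (h[i + 1, j] + h[i - 1, j]
                                  + h[i, j + 1] + h[i, j - 1])
    return h

# Boundary data from g(x, y) = x^2 - y^2, a harmonic function, sampled at
# the grid scale x = i/n, y = j/n.
n = 6
h = solve_dirichlet_on_square(n, lambda i, j: (i * i - j * j) / n ** 2)
```

Since $x^2 - y^2$ is also discrete harmonic for the five-point scheme, the iteration reproduces it exactly in the limit, which makes the example easy to check.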
Harmonic functions and Dirichlet problem for finite Markov chains

Let $(X,P)$ be a finite, irreducible Markov chain (that is, $X$ is finite). We choose and fix a subset $X^o \subsetneq X$, which we call the interior, and its complement $\partial X = X \setminus X^o$, the boundary, both non-empty. We suppose that $X^o$ is connected in the sense that

$P_{X^o} = \bigl(p(x,y)\bigr)_{x,y \in X^o},$

the restriction of $P$ to $X^o$ in the sense of Definition 2.14, is irreducible. (This means that the subgraph of $\Gamma(P)$ induced by $X^o$ is strongly connected; for any pair of points $x, y \in X^o$ there is an oriented path from $x$ to $y$ whose points all lie in $X^o$.)

We call a function $h\colon X \to \mathbb{R}$ harmonic on $X^o$, if $h(x) = Ph(x)$ for every $x \in X^o$, where (recall) $Ph(x) = \sum_{y \in X} p(x,y)h(y)$.

A. Motivation. The finite case 155

As in Chapter 4, our Laplace operator is $P - I$, acting on functions as the product of a matrix with column vectors. Harmonicity has become a mean value property: in each $x \in X^o$, the value $h(x)$ is the weighted mean of the values of $h$, computed with the weights $p(x,y)$, $y \in X$.

We denote by $H(X^o) = H(X^o, P)$ the linear space of all functions on $X$ which are harmonic on $X^o$. Later on, we shall encounter the following in a more general context. We have already seen it in the proof of Theorem 3.29.

6.5 Lemma (Maximum principle). Let $h \in H(X^o)$ and $M = \max_X h(x)$. Then there is $y \in \partial X$ such that $h(y) = M$.
If $h$ is non-constant then $h(x) < M$ for every $x \in X^o$.
Proof. We modify the transition matrix $P$ by setting

$\tilde p(x,y) = p(x,y)$, if $x \in X^o$, $y \in X$;
$\tilde p(x,x) = 1$, if $x \in \partial X$;
$\tilde p(x,y) = 0$, if $x \in \partial X$ and $y \ne x$.

We obtain a new transition matrix $\tilde P$, with one non-essential class $X^o$ and all points in $\partial X$ as absorbing states. (We have made all elements of $\partial X$ absorbing.) Indeed, by our assumptions, also with respect to $\tilde P$, we have $x \to y$ for all $x \in X^o$, $y \in X$.

We observe that $h \in H(X^o, P)$ if and only if $h(x) = \tilde P h(x)$ for all $x \in X$ (not only for those in $X^o$). In particular, $\tilde P^n h(x) = h(x)$ for every $x \in X$.

Suppose that there is $x \in X^o$ with $h(x) = M$. Take any $x' \in X$. Then $\tilde p^{(n)}(x,x') > 0$ for some $n$. We get

$M = h(x) = \tilde p^{(n)}(x,x')\,h(x') + \sum_{y \ne x'} \tilde p^{(n)}(x,y)\,h(y) \le \tilde p^{(n)}(x,x')\,h(x') + \sum_{y \ne x'} \tilde p^{(n)}(x,y)\,M = \tilde p^{(n)}(x,x')\,h(x') + \bigl(1 - \tilde p^{(n)}(x,x')\bigr)M,$

whence $h(x') \ge M$. Since $M$ is the maximum, $h(x') = M$. Thus, $h$ must be constant.
In particular, if $h$ is non-constant, it cannot assume its maximum in $X^o$. □
Let $s = s^{\partial X}$ be the hitting time of $\partial X$, see (1.26):

$s^{\partial X} = \inf\{n \ge 0 : Z_n \in \partial X\}$

for the Markov chain $(Z_n)$ associated with $P$. Note that substituting $P$ with $\tilde P$, as defined in the preceding proof, does not change $s$. Corollary 2.9, applied to $\tilde P$, yields that $\Pr_x[s^{\partial X} < \infty] = 1$ for every $x \in X$. We set

$\nu_x(y) = \Pr_x[s < \infty,\ Z_s = y], \quad y \in \partial X.$   (6.6)

Then, for each $x \in X$, the measure $\nu_x$ is a probability distribution on $\partial X$, called the hitting distribution of $\partial X$.
6.7 Theorem (Solution of the Dirichlet problem). For every function $g\colon \partial X \to \mathbb{R}$ there is a unique function $h \in H(X^o, P)$ such that $h(y) = g(y)$ for all $y \in \partial X$. It is given by

$h(x) = \int_{\partial X} g \, d\nu_x.$

Proof. (1) For fixed $y \in \partial X$,

$x \mapsto \nu_x(y) \quad (x \in X)$

defines a harmonic function. Indeed, if $x \in X^o$, we know that $s \ge 1$, and applying the Markov property,

$\nu_x(y) = \sum_{w \in X} \Pr_x[Z_1 = w,\ s < \infty,\ Z_s = y]$
$= \sum_{w \in X} p(x,w)\, \Pr_x[s < \infty,\ Z_s = y \mid Z_1 = w]$
$= \sum_{w \in X} p(x,w)\, \Pr_w[s < \infty,\ Z_s = y]$
$= \sum_{w \in X} p(x,w)\, \nu_w(y).$

Thus,

$h(x) = \int_{\partial X} g \, d\nu_x = \sum_{y \in \partial X} g(y)\, \nu_x(y)$

is a convex combination of harmonic functions. Therefore $h \in H(X^o, P)$. Furthermore, for $y \in \partial X$,

$\nu_y(y) = 1$ and $\nu_y(y') = 0$ for all $y' \in \partial X$, $y' \ne y$.

We see that $h(y) = g(y)$ for every $y \in \partial X$.

(2) Having found the harmonic extension of $g$ to $X^o$, we have to show its uniqueness. Let $h'$ be another harmonic function which coincides with $g$ on $\partial X$. Then $h' - h$ is harmonic, and

$h(y) - h'(y) = 0$ for all $y \in \partial X$.

By the maximum principle, $h - h' \le 0$. Analogously $h' - h \le 0$. Therefore $h'$ and $h$ coincide. □
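Theorem 6.7 can be tried out on a small example: simple random walk on $X = \{0,1,2,3,4\}$ with interior $X^o = \{1,2,3\}$ (gambler's ruin). The sketch below computes the harmonic extension by iterating the mean value property; with boundary data $g = \mathbf{1}_{\{4\}}$ the result is the hitting probability $\nu_x(\{4\})$, which equals $x/4$ for this walk. Names and the iterative scheme are illustrative, not from the text.

```python
# Theorem 6.7 on a small finite chain: the harmonic extension of boundary
# data g is obtained by iterating h(x) <- sum_y p(x,y) h(y) on the interior,
# keeping the prescribed boundary values fixed.  Names are illustrative.

def harmonic_extension(boundary_values, interior, neighbours, sweeps=500):
    """Relax the mean value property on the interior points."""
    h = dict(boundary_values)
    for x in interior:
        h[x] = 0.0
    for _ in range(sweeps):
        for x in interior:
            h[x] = sum(p * h[y] for y, p in neighbours[x])
    return h

interior = [1, 2, 3]
neighbours = {x: [(x - 1, 0.5), (x + 1, 0.5)] for x in interior}

# Boundary data g = indicator of the state 4: the harmonic extension gives
# the hitting probabilities nu_x({4}) = x/4 for this walk.
h = harmonic_extension({0: 0.0, 4: 1.0}, interior, neighbours)
```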
From the last theorem, we see that the potential theoretic task to solve the Dirichlet problem has a probabilistic solution in terms of the hitting distributions. This will be the leading viewpoint in the present chapter, namely, to develop some elements of the potential theory associated with a stochastic transition matrix $P$ and the associated Laplace operator $P - I$ under the viewpoint of its probabilistic interpretation.

We return to the Dirichlet problem for finite Markov chains, i.e., chains with finite state space. Above, we have adopted our specific hypotheses on $X^o$ and $\partial X$ only in order to clarify the analogy with the continuous setting. For a general finite Markov chain, not necessarily irreducible, we define the linear space of harmonic functions on $X$:

$H = H(X,P) = \{h\colon X \to \mathbb{R} \mid h(x) = Ph(x) \text{ for all } x \in X\}.$

Then we have the following.
6.8 Theorem. Let $(X,P)$ be a finite Markov chain, and denote its essential classes by $C_i$, $i \in I = \{1, \dots, m\}$.

(a) If $h$ is harmonic on $X$, then $h$ is constant on each $C_i$.

(b) For each function $g\colon I \to \mathbb{R}$ there is a unique function $h \in H(X,P)$ such that for all $i \in I$ and $x \in C_i$ one has $h(x) = g(i)$.

Proof. (a) Let $M_i = \max_{C_i} h$, and let $x \in C_i$ such that $h(x) = M_i$. As in the proof of Lemma 6.5, if $x' \in C_i$, we choose $n$ with $p^{(n)}(x,x') > 0$. Then

$M_i = h(x) = P^n h(x) \le \bigl(1 - p^{(n)}(x,x')\bigr)\,M_i + p^{(n)}(x,x')\,h(x'),$

and $h(x') \ge M_i$. Hence $h(x') = M_i$.

(b) Let

$s = s^{X_{\mathrm{ess}}} = \inf\{n \ge 0 : Z_n \in X_{\mathrm{ess}}\},$

where $X_{\mathrm{ess}} = C_1 \cup \dots \cup C_m$. By Corollary 2.9, we have $\Pr_x[s < \infty] = 1$ for each $x$. Therefore

$\nu_x(i) = \Pr_x[s < \infty,\ Z_s \in C_i]$   (6.9)

defines a probability distribution on $I$. As above,

$h(x) = \sum_{i \in I} g(i)\,\nu_x(i)$

defines the unique harmonic function on $X$ with value $g(i)$ on $C_i$, $i \in I$. We leave the details as an exercise to the reader. □
Note, for the finite case, the analogy with Corollary 3.23 concerning the stationary probability measures.

6.10 Exercise. Elaborate the details from the end of the last proof, namely, that $h(x) = \sum_{i \in I} g(i)\,\nu_x(i)$ is the unique harmonic function on $X$ with value $g(i)$ on $C_i$.
B Harmonic and superharmonic functions. Invariant and excessive measures

In Theorem 6.8 we have described completely the harmonic functions in the finite case. From now on, our focus will be on the infinite case, but most of the results will be valid also when $X$ is finite. However, we shall work under the following restriction.

We always assume that $P$ is irreducible on $X$.¹

We do not specify any subset of $X$ as a boundary: in the infinite case, the boundary will be a set of new points, to be added to $X$ at infinity. We shall also admit the situation when $P$ is a substochastic matrix. In this case, the measures $\Pr_x$ ($x \in X$) on the trajectory space, as constructed in Section 1.B, are no more probability measures. In order to correct this defect, we can add an absorbing state $\dagger$ to $X$. We extend the transition probabilities to $X \cup \{\dagger\}$:

$p(\dagger,\dagger) = 1$ and $p(x,\dagger) = 1 - \sum_{y \in X} p(x,y)$, $x \in X$.   (6.11)

Now the measures on the trajectory space of $X \cup \{\dagger\}$, which we still denote by $\Pr_x$ ($x \in X$), become probability measures. We can think of $\dagger$ as a tomb: in any state $x$, the Markov chain $(Z_n)$ may die (= be absorbed by $\dagger$) with probability $p(x,\dagger)$.

We add the state $\dagger$ to $X$ only when the matrix $P$ is strictly substochastic in some $x$, that is, $\sum_y p(x,y) < 1$. From now on, speaking of the trajectory space $(\Omega, \mathcal{A}, \Pr_x)$, $x \in X$, this will refer to $(X,P)$, when $P$ is stochastic, and to $(X \cup \{\dagger\}, P)$, otherwise.
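The extension (6.11) is mechanical; a minimal sketch (the dictionary representation and the state name "tomb" are illustrative choices), applied to a two-state strictly substochastic example:

```python
# A sketch of the extension (6.11): a substochastic matrix P on X is
# completed to a stochastic matrix on X + {tomb} by sending the missing
# mass of each row to an absorbing tomb state.  Names are illustrative.

def add_tomb(P):
    """P: dict mapping x -> dict of transition probabilities (row sums <= 1).
    Returns the extended stochastic matrix on the states of P plus 'tomb'."""
    Q = {x: dict(row) for x, row in P.items()}
    for x, row in Q.items():
        deficit = 1.0 - sum(row.values())
        if deficit > 0:
            row["tomb"] = deficit
    Q["tomb"] = {"tomb": 1.0}
    return Q

# Substochastic example: from y, the chain "dies" with probability 1/2.
P = {"x": {"y": 1.0}, "y": {"x": 0.5}}
Q = add_tomb(P)
```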
Harmonic and superharmonic functions

All functions $f\colon X \to \mathbb{R}$ considered in the sequel are supposed to be $P$-integrable:

$\sum_{y \in X} p(x,y)\,|f(y)| < \infty$ for all $x \in X$.   (6.12)

¹ Of course, potential and boundary theory of non-irreducible chains are also of interest. Here, we restrict the exposition to the basic case.

In particular, (6.12) holds for every function when $P$ has finite range, that is, when $\{y \in X : p(x,y) > 0\}$ is finite for each $x$.
As previously, we define the transition operator $P$,

$Pf(x) = \sum_{y \in X} p(x,y)\,f(y).$
We repeat that our discrete analogue of the Laplace operator is $P - I$, where $I$ is the identity operator. We also repeat the definition of harmonic functions.

6.13 Definition. A real function $h$ on $X$ is called harmonic if $h(x) = Ph(x)$, and superharmonic if $h(x) \ge Ph(x)$ for every $x \in X$.

We denote by

$H = H(X,P) = \{h\colon X \to \mathbb{R} \mid Ph = h\},$
$H^+ = \{h \in H \mid h(x) \ge 0 \text{ for all } x \in X\}$ and
$H^\infty = \{h \in H \mid h \text{ is bounded on } X\}$   (6.14)

the linear space of all harmonic functions, the cone of the non-negative harmonic functions and the space of bounded harmonic functions. Analogously, we define $\mathcal{S} = \mathcal{S}(X,P)$, the space of all superharmonic functions, $\mathcal{S}^+$ and $\mathcal{S}^\infty$. (Note that $\mathcal{S}$ is not a linear space.)
The following is analogous to Lemma 6.5. We assume of course irreducibility and that $|X| > 1$.

6.15 Lemma (Maximum principle). If $h \in H$ and there is $x \in X$ such that $h(x) = M = \max_X h$, then $h$ is constant. Furthermore, if $M \ne 0$, then $P$ is stochastic.

Proof. We use irreducibility. If $x' \in X$ and $p^{(n)}(x,x') > 0$, then as in the proof of Lemma 6.5,

$M = h(x) \le \sum_{y \ne x'} p^{(n)}(x,y)\,M + p^{(n)}(x,x')\,h(x') \le \bigl(1 - p^{(n)}(x,x')\bigr)M + p^{(n)}(x,x')\,h(x'),$

where in the second inequality we have used substochasticity. As above it follows that $h(x') = M$. In particular, the constant function $h \equiv M$ is harmonic. Thus, if $M \ne 0$, then the matrix $P$ must be stochastic, since $X$ has more than one element. □
6.16 Exercise. Deduce the following in at least two different ways.

If $X$ is finite and $P$ is irreducible and strictly substochastic in some point, then $H = \{0\}$.
We next exhibit two simple properties of superharmonic functions.

6.17 Lemma. (1) If $h \in \mathcal{S}^+$ then $P^n h \in \mathcal{S}^+$ for each $n$, and either $h \equiv 0$ or $h(x) > 0$ for every $x$.

(2) If $h_i$, $i \in I$, is a family of superharmonic functions and $h(x) = \inf_I h_i(x)$ defines a $P$-integrable function, then also $h$ is superharmonic.

Proof. (1) Since $0 \le Ph \le h$, the $P$-integrability of $h$ implies that of $Ph$ and, inductively, also of $P^n h$. Furthermore, the transition operator is monotone: if $f \le g$ then $Pf \le Pg$. In particular, $P^n h \le h$.

Suppose that $h(x) = 0$ for some $x$. Then for each $n$,

$0 = h(x) \ge \sum_y p^{(n)}(x,y)\,h(y).$

Since $h \ge 0$, we must have $h(y) = 0$ for every $y$ with $x \xrightarrow{n} y$. Irreducibility implies $h \equiv 0$.

(2) By monotonicity of $P$, we have $Ph \le Ph_i \le h_i$ for every $i \in I$. Therefore $Ph \le \inf_I h_i = h$. □
6.18 Exercise. Show that in statement (2) of Lemma 6.17, $P$-integrability of $h = \inf_I h_i$ follows from $P$-integrability of the $h_i$, if (i) the set $I$ is finite, or if (ii) the $h_i$ are uniformly bounded below (e.g., non-negative).
In the case when $P$ is strictly substochastic in some state $x$, the elements of $X$ cannot be recurrent. Indeed, if we pass to the stochastic extension of $P$ on $X \cup \{\dagger\}$, we know that the irreducible class $X$ is non-essential, whence non-recurrent by Theorem 3.4 (b).

In general, in the transient (irreducible) case, there is a fundamental family of functions in $\mathcal{S}^+$:

6.19 Lemma. If $(X,P)$ is transient, then for each $y \in X$, the function $G(\cdot,y)$, defined by $x \mapsto G(x,y)$, is superharmonic and positive. There is at most one $y \in X$ for which $G(\cdot,y)$ is a constant function. If $P$ is stochastic, then $G(\cdot,y)$ is non-constant for every $y$.

Proof. We know from (1.34) that

$P\,G(\cdot,y) = G(\cdot,y) - \mathbf{1}_y.$

Therefore $G(\cdot,y) \in \mathcal{S}^+$.
Suppose that there are $y_1, y_2 \in X$, $y_1 \ne y_2$, such that the functions $G(\cdot,y_i)$ are constant. Then, by Theorem 1.38 (b),

$F(y_1,y_2) = \dfrac{G(y_1,y_2)}{G(y_2,y_2)} = 1$ and $F(y_2,y_1) = \dfrac{G(y_2,y_1)}{G(y_1,y_1)} = 1.$

Now Proposition 1.43 (a) implies $F(y_1,y_1) \ge F(y_1,y_2)\,F(y_2,y_1) = 1$, and $y_1$ is recurrent, a contradiction.

Finally, if $P$ is stochastic, then every constant function is harmonic, while $G(\cdot,y)$ is strictly superharmonic at $y$, so that it cannot be constant. □
6.20 Exercise. Show that $G(\cdot,y)$ is constant for the substochastic Markov chain illustrated in Figure 19.

[Figure 19: two states $x$ and $y$, with $p(x,y) = 1$ and $p(y,x) = 1/2$, and no other transitions.]
The following is the fundamental result in this section. (Recall our assumption of irreducibility and that $|X| > 1$, while $P$ may be substochastic.)

6.21 Theorem. $(X,P)$ is recurrent if and only if every non-negative superharmonic function is constant.

Proof. a) Suppose that $(X,P)$ is recurrent.

First step. We show that $\mathcal{S}^+ = H^+$:

Let $h \in \mathcal{S}^+$. We set $g(x) = h(x) - Ph(x)$. Then $g$ is non-negative and $P$-integrable. We have

$\sum_{k=0}^{n} P^k g(x) = \sum_{k=0}^{n} \bigl(P^k h(x) - P^{k+1} h(x)\bigr) = h(x) - P^{n+1} h(x).$

Suppose that $g(y) > 0$ for some $y$. Then

$\sum_{k=0}^{n} p^{(k)}(x,y)\, g(y) \le \sum_{k=0}^{n} P^k g(x) \le h(x)$

for each $n$, and

$G(x,y) \le h(x)/g(y) < \infty,$

a contradiction. Thus $g \equiv 0$, and $h$ is harmonic. In particular, substochasticity implies that the constant function $\mathbf{1}$ is superharmonic, whence harmonic, and $P$ must be stochastic. (We know this already from the fact that otherwise, $X$ is a non-essential class in $X \cup \{\dagger\}$.)
Second step. Let $h \in \mathcal{S}^+ = H^+$, and let $x_1, x_2 \in X$. We set $M_i = h(x_i)$, $i = 1,2$. Then $h_i(x) = \min\{h(x), M_i\}$ is a superharmonic function by Lemma 6.17, hence harmonic by the first step. But $h_i$ assumes its maximum $M_i$ in $x_i$ and must be constant by the maximum principle (Lemma 6.15): $\min\{h(x), M_i\} = M_i$ for all $x$. This yields

$h(x_1) = \min\{h(x_1), h(x_2)\} = h(x_2),$

and $h$ is constant.

b) Conversely, suppose that $\mathcal{S}^+ = \{\text{constants}\}$. Then, by Lemma 6.19, $(X,P)$ cannot be transient: otherwise there is $y \in X$ for which $G(\cdot,y) \in \mathcal{S}^+$ is non-constant. □
6.22 Exercise. The definition of harmonic and superharmonic functions does of course not require irreducibility.

(a) Show that when $P$ is substochastic, not necessarily irreducible, then the function $F(\cdot,y)$ is superharmonic for each $y$.

(b) Show that $(X,P)$ is irreducible if and only if every non-negative, non-zero superharmonic function is strictly positive in each point.

(c) Assume in addition that $P$ is stochastic. Show the following. If a superharmonic function attains its minimum in some point $x$ then it has the same value in every $y$ with $x \to y$.

(d) Show for stochastic $P$ that irreducibility is equivalent with the minimum principle for superharmonic functions: if a superharmonic function attains a minimum in some point then it is constant.
Invariant and excessive measures

As above, we suppose that $(X,P)$ is irreducible, $|X| \ge 2$ and $P$ substochastic. We continue, with the proof of Theorem 6.26, the study of invariant measures initiated in Section 3.B. Recall that a measure $\nu$ on $X$ is given as a row vector $\bigl(\nu(x)\bigr)_{x \in X}$. Here, we consider only non-negative measures. In analogy with $P$-integrability of functions, we allow only measures which satisfy

$\nu P(y) = \sum_{x \in X} \nu(x)\,p(x,y) < \infty$ for all $y \in X$.   (6.23)

The action of the transition operator is multiplication with $P$ on the right: $\nu \mapsto \nu P$. We recall from Definition 3.17 that a measure $\nu$ on $X$ is called invariant or stationary, if $\nu = \nu P$. Furthermore, $\nu$ is called excessive or superinvariant, if $\nu(y) \ge \nu P(y)$ for every $y \in X$. We denote by $\mathcal{I} = \mathcal{I}(X,P)$ and $\mathcal{E}^+ = \mathcal{E}^+(X,P)$ the cones of all invariant and superinvariant measures, respectively.

Theorem 3.19 and Corollary 3.23 describe completely the invariant measures in the case when $X$ is finite and $P$ stochastic, not necessarily irreducible. On the other hand, if $X$ is finite and $P$ irreducible, but strictly substochastic in some point, then the unique invariant measure is $\nu \equiv 0$. In fact, in this case, there are no harmonic functions $\ne 0$ (see Exercise 6.16). In other words, the matrix $P$ does not have $\lambda = 1$ as an eigenvalue.
The following is analogous to Lemmas 6.17 and 6.19.

6.24 Exercise. Prove the following.

(1) If $\nu \in \mathcal{E}^+$ then $\nu P^n \in \mathcal{E}^+$ for each $n$, and either $\nu \equiv 0$ or $\nu(x) > 0$ for every $x$.

(2) If $\nu_i$, $i \in I$, is a family of excessive measures, then also $\nu(x) = \inf_I \nu_i(x)$ is excessive.

(3) If $(X,P)$ is transient, then for each $x \in X$, the measure $G(x,\cdot)$, defined by $y \mapsto G(x,y)$, is excessive.
Next, we want to know whether there also are excessive measures in the recurrent case. To this purpose, we recall the last exit probabilities $\ell^{(n)}(x,y)$ and the associated generating function $L(x,y|z)$ defined in (3.56) and (3.57), respectively. We know from Lemma 3.58 that

$L(x,y) = \sum_{n=0}^{\infty} \ell^{(n)}(x,y) = L(x,y|1),$

the expected number of visits in $y$ before returning to $x$, is finite. Setting $z = 1$ in the second and third identities of Exercise 3.59, we get the following.

6.25 Corollary. In the recurrent as well as in the transient case, for each $x \in X$, the measure $L(x,\cdot)$, defined by $y \mapsto L(x,y)$, is finite and excessive.

Indeed, in Section 3.F, we have already used the fact that $L(x,\cdot)$ is invariant in the recurrent case.
6.26 Theorem. Let $(X,P)$ be substochastic and irreducible. Then $(X,P)$ is recurrent if and only if there is a non-zero invariant measure $\nu$ such that each excessive measure is a multiple of $\nu$, that is,

$\mathcal{E}^+(X,P) = \{c\,\nu : c \ge 0\}.$

In this case, $P$ must be stochastic.

Proof. First, assume that $P$ is recurrent. Then we know e.g. from Theorem 6.21 that $P$ must be stochastic (since constant functions are harmonic). We also know, from Corollary 6.25, that there is an excessive measure $\nu$ satisfying $\nu(y) > 0$ for all $y$. (Take $\nu = L(x,\cdot)$ for some $x$.) We construct the $\nu$-reversal $\check P$ of $P$ as in (3.30) by

$\check p(x,y) = \nu(y)\,p(y,x)/\nu(x).$   (6.27)

Excessivity of $\nu$ yields that $\check P$ is substochastic. Also, it is straightforward to prove that

$\check p^{(n)}(x,y) = \nu(y)\,p^{(n)}(y,x)/\nu(x).$

Summing over $n$, we see that also $\check P$ is recurrent, whence stochastic. Thus, $\nu$ must be invariant. If $\sigma$ is any other excessive measure, and we define the function $h(x) = \sigma(x)/\nu(x)$, then as in (3.31), we find that $\check P h \le h$. By Theorem 6.21, $h$ must be constant, that is, $\sigma = c\,\nu$ for some $c \ge 0$.

To prove the converse implication, we just observe that in the transient case, the measure $\sigma = G(x,\cdot)$ satisfies $\sigma P = \sigma - \delta_x$. It is excessive, but not invariant. □
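The $\nu$-reversal (6.27) is easy to experiment with on a finite state space. The sketch below builds $\check p(x,y) = \nu(y)p(y,x)/\nu(x)$ for a deterministic 3-cycle with the counting measure (invariant, hence excessive) and checks the iterate identity $\check p^{(n)}(x,y) = \nu(y)p^{(n)}(y,x)/\nu(x)$ for $n = 2$. The example is illustrative, not from the text.

```python
# A sketch of the nu-reversal (6.27): p_check(x, y) = nu(y) p(y, x) / nu(x).
# The example chain and all names are illustrative.

def reversal(P, nu):
    n = len(P)
    return [[nu[y] * P[y][x] / nu[x] for y in range(n)] for x in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]      # deterministic cycle 0 -> 1 -> 2 -> 0
nu = [1.0, 1.0, 1.0]       # counting measure: invariant, hence excessive
Pc = reversal(P, nu)       # the reversed cycle 0 -> 2 -> 1 -> 0
```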
C Induced Markov chains

We now introduce and study an important probabilistic notion for Markov chains, whose relevance for potential theoretic issues will become apparent in the next sections.

Suppose that $(X,P)$ is irreducible and substochastic. Let $A$ be an arbitrary non-empty subset of $X$. The hitting time $t^A = \inf\{n > 0 : Z_n \in A\}$ defined in (1.26) is not necessarily a.s. finite. We define

$p^A(x,y) = \Pr_x[t^A < \infty,\ Z_{t^A} = y].$

If $y \notin A$, then $p^A(x,y) = 0$. If $y \in A$,

$p^A(x,y) = \sum_{n=1}^{\infty} \ \sum_{x_1,\dots,x_{n-1} \in X \setminus A} p(x,x_1)\,p(x_1,x_2) \cdots p(x_{n-1},y).$   (6.28)

We observe that

$\sum_{y} p^A(x,y) = \Pr_x[t^A < \infty] \le 1.$

In other words, the matrix $P^A = \bigl(p^A(x,y)\bigr)_{x,y \in A}$ is substochastic. The Markov chain $(A, P^A)$ is called the Markov chain induced by $(X,P)$ on $A$.


We observe that irreducibility of (X. 1) implies irreducibility of the induced
chain: for .. , (. = ,) there are n > 0 and .
1
. . . . . .
n-1
X such that
(.. .
1
)(.
1
. .
2
) (.
n-1
. ,) > 0. Let i
1
< < i
n-1
be the indices for
which .
i
j
. Then

(.. .
i
1
)

(.
i
1
. .
i
2
)

(.
i
m1
. ,) > 0, and .
n
,
with respect to 1

.
In general, the matrix 1

is not stochastic. If it is stochastic, that is,


Pr
x
t

< o| = 1 for all . .


then we call the set recurrent for (X. 1). If the Markov chain (X. 1) is recurrent,
then every non-empty subset of X is recurrent for (X. 1). Conversely, if there exists
a nite recurrent subset of X, then (X. 1) must be recurrent.
6.29 Exercise. Prove the last statement as a reminder of the methods of Chapter 3.

On the other hand, even when $(X,P)$ is transient, one can very well have (infinite) proper subsets of $X$ that are recurrent.
6.30 Example. Consider the random walk on the Abelian group $\mathbb{Z}^2$ whose law $\mu$ in the sense of (4.18) (additively written) is given by

$\mu\bigl((1,0)\bigr) = \alpha_1, \quad \mu\bigl((0,1)\bigr) = \alpha_2, \quad \mu\bigl((-1,0)\bigr) = \alpha_3, \quad \mu\bigl((0,-1)\bigr) = \alpha_4,$

and $\mu\bigl((k,l)\bigr) = 0$ in all other cases, where $\alpha_i > 0$ and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$. Thus

$p\bigl((k,l),(k{+}1,l)\bigr) = \alpha_1, \quad p\bigl((k,l),(k,l{+}1)\bigr) = \alpha_2,$
$p\bigl((k,l),(k{-}1,l)\bigr) = \alpha_3, \quad p\bigl((k,l),(k,l{-}1)\bigr) = \alpha_4, \quad (k,l) \in \mathbb{Z}^2.$

6.31 Exercise. Show that this random walk is recurrent if and only if $\alpha_1 = \alpha_3$ and $\alpha_2 = \alpha_4$.

Setting

$A = \{(k,l) \in \mathbb{Z}^2 : k + l \text{ is even}\},$

one sees immediately that $A$ is a recurrent set for any choice of the $\alpha_i$. Indeed, $\Pr_{(k,l)}[t^A = 2] = 1$ for every $(k,l) \in A$. The induced chain is given by

$p^A\bigl((k,l),(k{+}2,l)\bigr) = \alpha_1^2, \quad p^A\bigl((k,l),(k{+}1,l{+}1)\bigr) = 2\alpha_1\alpha_2,$
$p^A\bigl((k,l),(k,l{+}2)\bigr) = \alpha_2^2, \quad p^A\bigl((k,l),(k{-}1,l{+}1)\bigr) = 2\alpha_2\alpha_3,$
$p^A\bigl((k,l),(k{-}2,l)\bigr) = \alpha_3^2, \quad p^A\bigl((k,l),(k{-}1,l{-}1)\bigr) = 2\alpha_3\alpha_4,$
$p^A\bigl((k,l),(k,l{-}2)\bigr) = \alpha_4^2, \quad p^A\bigl((k,l),(k{+}1,l{-}1)\bigr) = 2\alpha_1\alpha_4,$
$p^A\bigl((k,l),(k,l)\bigr) = 2\alpha_1\alpha_3 + 2\alpha_2\alpha_4.$
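In Example 6.30 the induced chain arises from exactly two steps of the walk, so its table of transition probabilities can be verified by convolving the step distribution $\mu$ with itself and reading off the displacements. A sketch with concrete (asymmetric) illustrative weights:

```python
# Verifying the induced chain of Example 6.30: since the walk returns to the
# parity class A after exactly two steps, the induced transition
# probabilities are the two-step probabilities, i.e. the self-convolution of
# mu.  The concrete weights alpha_i are an illustrative choice.

from itertools import product

alpha = {(1, 0): 0.1, (0, 1): 0.2, (-1, 0): 0.3, (0, -1): 0.4}

two_step = {}
for s, t in product(alpha, repeat=2):
    d = (s[0] + t[0], s[1] + t[1])
    two_step[d] = two_step.get(d, 0.0) + alpha[s] * alpha[t]
```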
6.32 Example. Consider the infinite drunkard's walk on $\mathbb{Z}$ (see Example 3.5) with parameters $p$ and $q = 1 - p$. The random walk is recurrent if and only if $p = q = 1/2$.

(1) Set $A = \{0, 1, \dots, N\}$. Being finite, the set $A$ is recurrent if and only if the random walk itself is recurrent. The induced chain has the following non-zero transition probabilities:

$p^A(k{-}1,k) = p, \quad p^A(k,k{-}1) = q \quad (k = 1, \dots, N),$ and
$p^A(0,0) = p^A(N,N) = \min\{p,q\}.$

Indeed, if starting at state $N$ the first return to $A$ occurs in $N$, the first step of $(Z_n)$ has to go from $N$ to $N+1$, after which $(Z_n)$ has to return to $N$. This means that $p^A(N,N) = p\,F(N{+}1,N)$, and in Examples 2.10 and 3.5, we have computed $F(N{+}1,N) = F(1,0) = \bigl(1 - |p - q|\bigr)/(2p)$, leading to $p^A(N,N) = \min\{p,q\}$.
By symmetry, $p^A(0,0)$ has the same value.

In particular, if $p > q$, the induced chain is strictly substochastic only at the point $N$. Conversely, if $p < q$, the only point of strict substochasticity is $0$.

(2) Set $A = \mathbb{N}_0$. The transition probabilities of the induced chain coincide with those of the original random walk in each point $k > 0$. Reasoning as above in (1), we find

$p^{\mathbb{N}_0}(0,1) = p$ and $p^{\mathbb{N}_0}(0,0) = \min\{p,q\}.$

We see that the set $\mathbb{N}_0$ is recurrent if and only if $p \ge q$. Otherwise, the only point of strict substochasticity is $0$.
6.33 Lemma. If the set $A$ is recurrent for $(X,P)$ then

$\Pr_x[t^A < \infty] = 1$ for all $x \in X$.

Proof. Factoring with respect to the first step, one has even when the set $A$ is not recurrent

$\Pr_x[t^A < \infty] = \sum_{y \in A} p(x,y) + \sum_{y \in X \setminus A} p(x,y)\, \Pr_y[t^A < \infty]$ for all $x \in X$.

(Observe that in case $y = Z_1 \in A$, one has $t^A = 1$.) In particular, if we have $\Pr_y[t^A < \infty] = 1$ for every $y \in A$, then the function $h(x) = \Pr_x[t^A < \infty]$ is harmonic and assumes its maximum value 1. By the maximum principle (Lemma 6.15), $h \equiv 1$. □

Observe that the last lemma generalizes Theorem 3.4 (b) in the irreducible case. Indeed, one may as well introduce the basic potential theoretic setup (in particular, the maximum principle) at the initial stage of developing Markov chain theory and thereby simplify a few of the proofs in the first chapters.
The following is intuitively obvious, but laborious to formalize.

6.34 Theorem. If $A \subset B \subset X$ then $(P^B)^A = P^A$.

Proof. Let $(Z^B_n)$ be the Markov chain relative to $(B, P^B)$. It is a random subsequence of the original Markov chain $(Z_n)$, which can be realized on the trajectory space associated with $(X,P)$ (which includes the tomb state $\dagger$). We use the random variable $v^B$ introduced in Section 1.C (number of visits in $B$), and define $w^B_n(\omega) = k$, if $n \le v^B(\omega)$ and $k$ is the instant of the $n$-th return visit to $B$. Then

$Z^B_n = Z_{w^B_n}$, if $n \le v^B$, and $Z^B_n = \dagger$, otherwise.   (6.35)

Let $t^A_B$ be the stopping time of the first visit of $(Z^B_n)$ in $A$. Since $A \subset B$, we have for every trajectory $\omega \in \Omega$ that $t^A(\omega) = \infty$ if and only if $t^A_B(\omega) = \infty$. Furthermore, $t^A(\omega) \ge t^B(\omega)$. Hence, if $t^A(\omega) < \infty$, then (6.35) implies

$Z^B_{t^A_B(\omega)}(\omega) = Z_{t^A(\omega)}(\omega),$

that is, the first return visits in $A$ of $(Z^B_n)$ and of $(Z_n)$ take place at the same point. Consequently, for $x, y \in A$,

$(p^B)^A(x,y) = \Pr_x[t^A_B < \infty,\ Z^B_{t^A_B} = y] = \Pr_x[t^A < \infty,\ Z_{t^A} = y] = p^A(x,y).$ □
If $A$ and $B$ are two arbitrary non-empty subsets of $X$ (not necessarily such that one is contained in the other), we define the restriction of $P$ to $A \times B$ by

$P_{A,B} = \bigl(p(x,y)\bigr)_{x \in A,\, y \in B}.$   (6.36)

In particular, if $A = B$, we have $P_{A,A} = P_A$, the restriction of $P$ to $A$ as defined in Definition 2.14. Recall (2.15) and the associated Green function $G_A(x,y)$ for $x, y \in A$, which is finite by Lemma 2.18.

6.37 Lemma. $P^A = P_A + P_{A,\,X\setminus A}\; G_{X\setminus A}\; P_{X\setminus A,\,A}$.
Proof. We use formula (6.28), factorizing with respect to the first step. If $x, y \in A$, then the induced chain starting at $x$ and going to $y$ either moves to $y$ immediately, or else exits $A$ and re-enters into $A$ only at the last step, and the re-entrance must occur at $y$:

$p^A(x,y) = p(x,y) + \sum_{v \in X\setminus A} p(x,v)\, \Pr_v[t^A < \infty,\ Z_{t^A} = y].$   (6.38)

We now factorize with respect to the last step, using the Markov property:

$\Pr_v[t^A < \infty,\ Z_{t^A} = y] = \sum_{w \in X\setminus A} \Pr_v[t^A < \infty,\ Z_{t^A - 1} = w,\ Z_{t^A} = y]$
$= \sum_{w \in X\setminus A} \sum_{n=1}^{\infty} \Pr_v[t^A = n,\ Z_{n-1} = w,\ Z_n = y]$
$= \sum_{w \in X\setminus A} \sum_{n=1}^{\infty} \Pr_v[Z_n = y,\ Z_{n-1} = w,\ Z_i \notin A \text{ for all } i < n]$
$= \sum_{w \in X\setminus A} \sum_{n=0}^{\infty} \Pr_v[Z_n = w,\ Z_i \notin A \text{ for all } i \le n]\; p(w,y)$
$= \sum_{w \in X\setminus A} G_{X\setminus A}(v,w)\, p(w,y).$

[In the last step we have used (2.15).] Thus

$p^A(x,y) = p(x,y) + \sum_{v \in X\setminus A} \sum_{w \in X\setminus A} p(x,v)\, G_{X\setminus A}(v,w)\, p(w,y).$ □
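On a finite state space, the formula of Lemma 6.37 gives a direct way to compute the induced chain: the Green matrix of the complement is the Neumann series $G_{X\setminus A} = \sum_{k \ge 0} (P_{X\setminus A})^k$. The sketch below applies it to reflecting random walk on $\{0,1,2,3\}$ with $A = \{0,3\}$; function names and the truncation of the series are illustrative choices.

```python
# A sketch of Lemma 6.37 on a finite chain: the induced matrix is the
# restriction to A plus excursions through the complement, weighted by the
# Green matrix of the complement (summed as a Neumann series).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def induced_chain(P, A_idx, terms=500):
    n = len(P)
    C_idx = [i for i in range(n) if i not in A_idx]

    def block(rows, cols):
        return [[P[i][j] for j in cols] for i in rows]

    PAA = block(A_idx, A_idx)
    PAC = block(A_idx, C_idx)
    PCA = block(C_idx, A_idx)
    Q = block(C_idx, C_idx)
    m = len(C_idx)
    # Green matrix of the complement: G = sum_{k >= 0} Q^k
    G = [[float(i == j) for j in range(m)] for i in range(m)]
    Qk = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(terms):
        Qk = matmul(Qk, Q)
        G = [[G[i][j] + Qk[i][j] for j in range(m)] for i in range(m)]
    T = matmul(matmul(PAC, G), PCA)
    return [[PAA[i][j] + T[i][j] for j in range(len(A_idx))]
            for i in range(len(A_idx))]

# Reflecting walk on {0, 1, 2, 3}; induce on A = {0, 3}.
P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
PA = induced_chain(P, [0, 3])
```

From 0 the walk must step to 1, and from 1 it reaches 0 before 3 with probability 2/3; accordingly the computed induced chain has $p^A(0,0) = 2/3$ and $p^A(0,3) = 1/3$, and both rows are stochastic since the set is recurrent here.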
6.39 Theorem. Let $\nu \in \mathcal{E}^+(X,P)$, $A \subset X$ and $\nu_A$ the restriction of $\nu$ to $A$. Then $\nu_A \in \mathcal{E}^+(A, P^A)$.

Proof. Let $x \in A$. Then

$\nu_A(x) = \nu(x) \ge \nu P(x) = \nu_A P_A(x) + \nu_{X\setminus A}\, P_{X\setminus A,\,A}(x).$

Hence

$\nu_A \ge \nu_A P_A + \nu_{X\setminus A}\, P_{X\setminus A,\,A},$

and by symmetry,

$\nu_{X\setminus A} \ge \nu_{X\setminus A}\, P_{X\setminus A} + \nu_A\, P_{A,\,X\setminus A}.$

Applying $\sum_{k=0}^{n-1} P^k_{X\setminus A}$ from the right, the last relation yields

$\nu_{X\setminus A} \ge \nu_{X\setminus A}\, P^n_{X\setminus A} + \nu_A\, P_{A,\,X\setminus A} \Bigl(\sum_{k=0}^{n-1} P^k_{X\setminus A}\Bigr) \ge \nu_A\, P_{A,\,X\setminus A} \Bigl(\sum_{k=0}^{n-1} P^k_{X\setminus A}\Bigr)$

for every $n \ge 1$. By monotone convergence,

$\nu_A\, P_{A,\,X\setminus A} \Bigl(\sum_{k=0}^{n-1} P^k_{X\setminus A}\Bigr) \to \nu_A\, P_{A,\,X\setminus A}\; G_{X\setminus A}$

pointwise, as $n \to \infty$. Therefore

$\nu_{X\setminus A} \ge \nu_A\, P_{A,\,X\setminus A}\; G_{X\setminus A}.$

Combining the inequalities and applying Lemma 6.37,

$\nu_A \ge \nu_A P_A + \nu_A\, P_{A,\,X\setminus A}\; G_{X\setminus A}\; P_{X\setminus A,\,A} = \nu_A P^A,$

as proposed. □

6.40 Exercise. Prove the dual to the above result for superharmonic functions: if $h \in \mathcal{S}^+(X,P)$ and $A \subset X$, then the restriction of $h$ to the set $A$ satisfies $h_A \in \mathcal{S}^+(A, P^A)$.
D Potentials, Riesz decomposition, approximation

With Theorems 6.21 and 6.26, we have completed the description of all positive superharmonic functions and excessive measures in the recurrent case. Therefore, in the rest of this chapter,

we assume that $(X,P)$ is irreducible and transient.

This means that

$0 < G(x,y) < \infty$ for all $x, y \in X$.

6.41 Definition. A $G$-integrable function $f\colon X \to \mathbb{R}$ is one that satisfies

$\sum_{y} G(x,y)\,|f(y)| < \infty$

for each $x \in X$. In this case,

$g(x) = Gf(x) = \sum_{y \in X} G(x,y)\,f(y)$

is called the potential of $f$, while $f$ is called the charge of $g$.
If we set $f^+(x) = \max\{f(x), 0\}$ and $f^-(x) = \max\{-f(x), 0\}$ then $f$ is $G$-integrable if and only if $f^+$ and $f^-$ have this property, and $Gf = Gf^+ - Gf^-$. In the sequel, when studying potentials $Gf$, we shall always assume tacitly $G$-integrability of $f$. The support of $f$ is, as usual, the set $\mathrm{supp}(f) = \{x \in X : f(x) \ne 0\}$.
6.42 Lemma. (a) If $g$ is the potential of $f$, then $f = (I - P)g$. Furthermore, $P^n g \to 0$ pointwise.

(b) If $f$ is non-negative, then $g = Gf \in \mathcal{S}^+$, and $g$ is harmonic on $X \setminus \mathrm{supp}(f)$, that is, $Pg(x) = g(x)$ for every $x \in X \setminus \mathrm{supp}(f)$.

Proof. We may suppose that $f \ge 0$. (Otherwise, decomposing $f = f^+ - f^-$, the extension to the general case is immediate.)
Since all terms are non-negative, convergence of the involved series is absolute, and

$P\,Gf = G\,Pf = \sum_{n=1}^{\infty} P^n f = Gf - f.$

This implies the first part of (a) as well as (b). Furthermore,

$P^n g(x) = G P^n f(x) = \sum_{k=n}^{\infty} P^k f(x)$

is the $n$-th rest of a convergent series, so that it tends to 0. □
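For a finite, strictly substochastic (hence transient) chain, the potential $g = Gf$ of Definition 6.41 and the identity $f = (I - P)g$ of Lemma 6.42 (a) can be checked directly by summing the Neumann series for $G$. The matrix below is an illustrative example, not from the text.

```python
# A sketch of Definition 6.41 and Lemma 6.42(a): the potential g = Gf is
# summed as G f = sum_n P^n f, and the charge is recovered as f = (I - P) g.

def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def potential(P, f, terms=2000):
    g = list(f)            # the n = 0 term of sum_n P^n f
    Pkf = list(f)
    for _ in range(terms):
        Pkf = matvec(P, Pkf)
        g = [a + b for a, b in zip(g, Pkf)]
    return g

P = [[0.4, 0.3],
     [0.2, 0.5]]           # row sums < 1: strictly substochastic, transient
f = [1.0, 2.0]
g = potential(P, f)
residual = [gi - Pgi for gi, Pgi in zip(g, matvec(P, g))]   # (I - P) g
```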
Formally, one has $G = \sum_{n=0}^{\infty} P^n = (I - P)^{-1}$ (geometric series), but as already mentioned in Section 1.D, one has to pay attention on which space of functions (or measures) one considers $G$ to act as an operator. For the $G$-integrable functions $f$ we have seen that $(I-P)Gf = G(I-P)f = f$. But in general, it is not true that $G(I-P)f = f$, even when $(I-P)f$ is a $G$-integrable function. For example, if $P$ is stochastic and $f(x) \equiv c > 0$, then $(I-P)f = 0$ and $G(I-P)f = 0 \ne f$.
6.43 Riesz decomposition theorem. If $u \in \mathcal{S}^+$ then there are a potential $g = Gf$ and a function $h \in H^+$ such that

$u = Gf + h.$

The decomposition is unique.

Proof. Since $u \ge 0$ and $u \ge Pu$, non-negativity of $P$ implies that for every $x \in X$ and every $n \ge 0$,

$P^n u(x) \ge P^{n+1} u(x) \ge 0.$

Therefore, there is the limit function

$h(x) = \lim_{n\to\infty} P^n u(x).$

Since $0 \le h \le u$ and $u$ is $P$-integrable, Lebesgue's theorem on dominated convergence implies

$Ph(x) = P\bigl(\lim_{n\to\infty} P^n u\bigr)(x) = \lim_{n\to\infty} P^n (Pu)(x) = \lim_{n\to\infty} P^{n+1} u(x) = h(x),$

and $h$ is harmonic. We set

$f = u - Pu.$

This function is non-negative, and $P^k$-integrable along with $u$ and $Pu$. In particular, $P^k f = P^k u - P^{k+1} u$ for every $k \ge 0$:

$u - P^{n+1} u = \sum_{k=0}^{n} (P^k u - P^{k+1} u) = \sum_{k=0}^{n} P^k f.$

Letting $n \to \infty$ we obtain

$u - h = \sum_{k=0}^{\infty} P^k f = Gf = g.$

This proves existence of the decomposition. Suppose now that $u = g_1 + h_1$ is another decomposition. We have $P^n u = P^n g_1 + h_1$ for every $n$. By Lemma 6.42 (a), $P^n g_1 \to 0$ pointwise. Hence $P^n u \to h_1$, so that $h_1 = h$. Therefore also $g_1 = g$ and, again by Lemma 6.42 (a),

$f_1 = (I-P)g_1 = (I-P)g = f.$ □
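The proof of Theorem 6.43 is entirely constructive: $h = \lim_n P^n u$ and $f = u - Pu$. The sketch below carries this out for a finite, strictly substochastic chain; for such a chain $H^+ = \{0\}$ (Exercise 6.16), so the harmonic part vanishes and $u$ is a pure potential. The chain and the superharmonic function are illustrative choices.

```python
# A numerical sketch of the Riesz decomposition u = Gf + h: the harmonic
# part is h = lim_n P^n u, the charge is f = u - Pu, and g = Gf is summed
# as a Neumann series.

def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

P = [[0.0, 0.5],
     [0.5, 0.0]]                      # strictly substochastic, transient
u = [2.0, 3.0]                        # superharmonic: u >= Pu holds

# harmonic part: iterate u -> Pu
h = list(u)
for _ in range(200):
    h = matvec(P, h)

# charge f = u - Pu and potential part g = Gf = sum_n P^n f
f = [a - b for a, b in zip(u, matvec(P, u))]
g = list(f)
Pkf = list(f)
for _ in range(200):
    Pkf = matvec(P, Pkf)
    g = [a + b for a, b in zip(g, Pkf)]
```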
6.44 Corollary. (1) If $g$ is a non-negative potential then the only function $h \in H^+$ with $g \ge h$ is $h \equiv 0$.

(2) If $u \in \mathcal{S}^+$ and there is a potential $g = Gf$ with $g \ge u$, then $u$ is the potential of a non-negative function.

Proof. (1) By Lemma 6.42 (a), we have $h = P^n h \le P^n g \to 0$ pointwise, as $n \to \infty$.

(2) We write $u = Gf_1 + h_1$ with $h_1 \in H^+$ and $f_1 \ge 0$ (Riesz decomposition). Then $h_1 \le g$, and $h_1 \equiv 0$ by (1). □
We now illustrate what happens in the case when the state space is finite.

The finite case

(I) If $X$ is finite and $P$ is irreducible but strictly substochastic in some point, then $H = \{0\}$, see Exercise 6.16. Consequently every positive superharmonic function is a potential $Gf$, where $f \ge 0$. In particular, the constant function $\mathbf{1}$ is superharmonic, and there is a function $\varphi \ge 0$ such that $G\varphi \equiv 1$. Let $u$ be a superharmonic function that assumes negative values. Setting $M = -\min_X u(x) > 0$, the function $x \mapsto u(x) + M$ becomes a non-negative superharmonic function. Therefore every superharmonic function (not necessarily positive) can be written in the form

$u = G(f - M\varphi),$

where $f \ge 0$.

(II) Assume that $X$ is finite and $P$ stochastic. If $P$ is irreducible then $(X,P)$ is recurrent, and all superharmonic functions are constant: indeed, by Theorem 6.21, this is true for non-negative superharmonic functions. On the other hand, the constant functions are harmonic, and every superharmonic function can be written as $u + M$, where $M$ is constant and $u \in \mathcal{S}^+$.

(III) Let us now assume that $(X,P)$ is finite, stochastic, but not irreducible, with the essential classes $C_i$, $i \in I = \{1, \dots, m\}$. Consider $X_{\mathrm{ess}}$, the union of the essential classes, and the probability distributions $\nu_x$ on $I$ as in (6.9). The set

$X^o = X \setminus X_{\mathrm{ess}}$

is assumed to be non-empty. (Otherwise, $(X,P)$ decomposes into a finite number of irreducible Markov chains, the restrictions to the essential classes, which do not communicate among each other, and to each of them one can apply what has been said in (II).)
Let u (X. 1). Then the restriction of u to C
i
is superharmonic for 1
C
i
.
The Markov chain (C
i
. 1
C
i
) is recurrent by Theorem 3.4 (c). If g(i ) = min
C
i
u,
172 Chapter 6. Elements of the potential theory of transient Markov chains
then u[
C
i
g(i )

(C
i
. 1
C
i
). Hence u[
C
i
is constant by Lemma 6.17 (1) or
Theorem 6.21,
u[
C
i
g(i ).
We set
h(x) = ∫_I g dν_x = Σ_{i∈I} g(i) ν_x(i),
see Theorem 6.8 (b). Then h is the unique harmonic function on X which satisfies h(x) = g(i) for each i ∈ I, x ∈ C_i, and so u − h ∈ 𝒮(X, P) and u(y) − h(y) = 0 for each y ∈ X_ess. Exercise 6.22 (c) implies that u − h ≥ 0 on the whole of X. (Indeed, if the minimum of u − h is attained at x, then there is y ∈ X_ess such that x → y, and the minimum is also attained in y.) We infer that u − h ≥ 0, and
(u − h)|_{X₀} ∈ 𝒮⁺(X₀, P_{X₀}).
6.45 Exercise. Deduce that there is a unique function f on X₀ such that
(u − h)|_{X₀} = G_{X₀} f = Gf,
and f ≥ 0 with supp(f) ⊂ X₀. (Note here that (X₀, P_{X₀}) is substochastic, but not necessarily irreducible.)
We continue by observing that G(x, y) = 0 for every x ∈ X_ess and y ∈ X₀, so that we also have u − h = Gf on the whole of X. We conclude that every function u ∈ 𝒮(X, P) can be uniquely represented as
u(x) = Gf(x) + ∫_I g dν_x,
where f and g are functions on X₀ and I, respectively.
Let us return to the study of positive superharmonic functions in the case where (X, P) is irreducible, transient, not necessarily stochastic. The following theorem will be of basic importance when X is infinite.
6.46 Approximation theorem. If h ∈ 𝒮⁺(X, P) then there is a sequence of potentials g_n = Gf_n, f_n ≥ 0, such that g_n(x) ≤ g_{n+1}(x) for all x and n, and
lim_{n→∞} g_n(x) = h(x).
Proof. Let A be a finite subset of X. We define the reduced function of h on A: for x ∈ X,
R_A[h](x) = inf { u(x) : u ∈ 𝒮⁺, u(y) ≥ h(y) for all y ∈ A }.
The reduced function is also defined when A is infinite, and since h is superharmonic,
R_A[h] ≤ h.
In particular, we have
R_A[h](x) = h(x) for all x ∈ A.
Furthermore, R_A[h] ∈ 𝒮⁺ by Lemma 6.17 (2). Let f₀(x) = h(x), if x ∈ A, and f₀(x) = 0, otherwise. f₀ is non-negative and finitely supported. (It is here that we use the assumption of finiteness of A for the first time.) In particular, the potential Gf₀ exists and is finite on X. Also, Gf₀ ≥ f₀. Thus Gf₀ is a positive superharmonic function that satisfies Gf₀(y) ≥ h(y) for all y ∈ A. By definition of the reduced function, we get R_A[h] ≤ Gf₀. Now we see that R_A[h] is a positive superharmonic function majorized by a potential, and Corollary 6.44 (2) implies that there is a function f = f_{h,A} ≥ 0 such that
R_A[h] = Gf.
Let B be another finite subset of X, containing A. Then R_B[h] is a positive superharmonic function that majorizes h on the set A. Hence
R_B[h] ≥ R_A[h], if B ⊃ A.
Now we can conclude the proof of the theorem. Let (A_n) be an increasing sequence of finite subsets of X such that X = ⋃_n A_n, and let
g_n = R_{A_n}[h].
Then each g_n is the potential of a non-negative function f_n, we know that g_n ≤ g_{n+1} ≤ h, and g_n coincides with h on A_n. □
The approximation theorem applies in particular to positive harmonic functions. In the Riesz decomposition of such a function h, the potential is 0. Nevertheless, h can not only be approximated from below by potentials, but the latter can be chosen such as to coincide with h on arbitrarily large finite subsets of X.
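For a finite state space, the reduced functions can be computed by a simple fixed-point iteration, anticipating the matrix description R_A[h] = F_A h of Theorem 6.54 below. The sketch (plain Python; the 5-state substochastic birth-and-death chain is an illustrative assumption) exhibits the monotone approximation of the superharmonic function h ≡ 1 along an increasing sequence of sets:

```python
def reduced(P, h, A, iters=4000):
    # R_A[h] = F_A h: the minimal solution of  v = h on A,  v = Pv off A;
    # iterating from v = 0 converges upward to it
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [h[x] if x in A
             else sum(P[x][w] * v[w] for w in range(n)) for x in range(n)]
    return v

# illustrative substochastic birth-and-death chain on {0,...,4}
n = 5
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i + 1 < n:
        P[i][i + 1] = 0.4
    if i - 1 >= 0:
        P[i][i - 1] = 0.2

h = [1.0] * n                      # superharmonic, since P is substochastic
g1 = reduced(P, h, {0})            # = Pr[ hit {0} ]
g2 = reduced(P, h, {0, 1, 2})      # larger set, larger reduced function
g3 = reduced(P, h, set(range(n)))  # A = X gives back h itself
```

As in the proof above, g1 ≤ g2 ≤ h pointwise, and each g_n coincides with h on its set A_n.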
E Balayage and domination principle
For A ⊂ X and x, y ∈ X we define
F_A(x, y) = Σ_{n=0}^{∞} Pr_x[Z_n = y, Z_j ∉ A for 0 ≤ j < n] · 1_A(y),
L_A(x, y) = Σ_{n=0}^{∞} Pr_x[Z_n = y, Z_j ∉ A for 0 < j ≤ n] · 1_A(x).
(6.47)
Thus, F_A(x, y) = Pr_x[Z_{s_A} = y] is the probability that the first visit in the set A of the Markov chain starting at x occurs at y. On the other hand, L_A(x, y) is the expected number of visits in the point y before re-entering A, when Z₀ = x ∈ A.
F_{y}(x, y) = F(x, y) = G(x, y)/G(y, y) coincides with the probability to reach y starting from x, defined in (1.27). In the same way, L_{x}(x, y) = L(x, y) = G(x, y)/G(x, x) is the quantity defined in (3.57). In particular, Lemma 3.58 implies that L_A(x, y) ≤ L(x, y) is finite even when the Markov chain is recurrent.
Paths and their weights have been considered at the end of Chapter 1. In that notation,
F_A(x, y) = w({ π ∈ Π(x, y) : π meets A only in the terminal point }), and
L_A(x, y) = w({ π ∈ Π(x, y) : π meets A only in the initial point }).
6.48 Exercise. Prove the following duality between F_A and L_A: let P̂ be the reversal of P with respect to some excessive (positive) measure ν, as defined in (6.27). Then
L̂_A(x, y) = ν(y) F_A(y, x) / ν(x) and F̂_A(x, y) = ν(y) L_A(y, x) / ν(x),
where F̂_A(x, y) and L̂_A(x, y) are the quantities of (6.47) relative to P̂.
The following two identities are obvious:
x ∈ A ⟹ F_A(x, ·) = δ_x, and y ∈ A ⟹ L_A(·, y) = 1_y. (6.49)
(It is always useful to think of functions as column vectors and of measures as row vectors, whence the distinction between 1_y and δ_x.) We recall for the following that we consider the restriction P_{X∖A} of P to X ∖ A and the associated Green function G_{X∖A} on the whole of X, taking the value 0 if x ∈ A or y ∈ A.
6.50 Lemma. (a) G = G_{X∖A} + F_A G.  (b) G = G_{X∖A} + G L_A.
Proof. We show only (a); statement (b) follows from (a) by duality (6.48). For x, y ∈ X,
p^(n)(x, y) = Pr_x[Z_n = y, s_A > n] + Pr_x[Z_n = y, s_A ≤ n]
= p^(n)_{X∖A}(x, y) + Σ_{w∈A} Pr_x[Z_n = y, s_A ≤ n, Z_{s_A} = w]
= p^(n)_{X∖A}(x, y) + Σ_{w∈A} Σ_{k=0}^{n} Pr_x[Z_n = y, s_A = k, Z_k = w]
= p^(n)_{X∖A}(x, y) + Σ_{w∈A} Σ_{k=0}^{n} Pr_x[s_A = k, Z_k = w] Pr_x[Z_n = y | Z_k = w]
= p^(n)_{X∖A}(x, y) + Σ_{w∈A} Σ_{k=0}^{n} Pr_x[s_A = k, Z_k = w] p^(n−k)(w, y).
Summing over all n and applying (as so often) the Cauchy formula for the product of two absolutely convergent series,
G(x, y) = G_{X∖A}(x, y) + Σ_{w∈A} ( Σ_{k=0}^{∞} Pr_x[s_A = k, Z_k = w] ) ( Σ_{n=0}^{∞} p^(n)(w, y) )
= G_{X∖A}(x, y) + Σ_{w∈A} Pr_x[s_A < ∞, Z_{s_A} = w] G(w, y)
= G_{X∖A}(x, y) + Σ_{w∈A} F_A(x, w) G(w, y),
as proposed. □
The interpretation of statement (a) in terms of weights of paths is as follows. Recall that G(x, y) is the weight of the set of all paths from x to y. It can be decomposed as follows: we have those paths that remain completely in the complement of A (their contribution to G(x, y) is G_{X∖A}(x, y)), and every other path must possess a first entrance time into A; factorizing with respect to that time, one obtains that the overall weight of the latter set of paths is Σ_{w∈A} F_A(x, w) G(w, y). The interpretation of statement (b) is analogous, decomposing with respect to the last visit in A.
6.51 Corollary. F_A G = G L_A.
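Both identities of Lemma 6.50 and the corollary can be checked numerically. In the sketch below (plain Python; the 4-state chain and the set A = {0, 3} are illustrative assumptions, not from the text), F_A is obtained from its one-step equations, and L_A from the unwinding L_A(x, y) = δ_x(y) + Σ_w p(x, w) G_{X∖A}(w, y) for x ∈ A:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def green(P, terms=4000):
    # G = sum_{n>=0} P^n (spectral radius < 1 for this substochastic chain)
    n = len(P)
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in G]
    for _ in range(terms):
        Pk = matmul(Pk, P)
        for i in range(n):
            for j in range(n):
                G[i][j] += Pk[i][j]
    return G

def green_restricted(P, A):
    # G_{X\A}: Green function of the restriction, = 0 if x in A or y in A
    n = len(P)
    Q = [[0.0 if (i in A or j in A) else P[i][j] for j in range(n)]
         for i in range(n)]
    GQ = green(Q)
    return [[0.0 if (i in A or j in A) else GQ[i][j] for j in range(n)]
            for i in range(n)]

def F_matrix(P, A, iters=4000):
    # F_A(x,y): fixed point of  F = delta on rows in A,  F = P F elsewhere
    n = len(P)
    F = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        F = [[(1.0 if x == y else 0.0) if x in A
              else sum(P[x][w] * F[w][y] for w in range(n))
              for y in range(n)] for x in range(n)]
    return F

# illustrative substochastic birth-and-death chain on {0,...,3}
n = 4
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i + 1 < n:
        P[i][i + 1] = 0.4
    if i - 1 >= 0:
        P[i][i - 1] = 0.2

A = {0, 3}
G, Gr, F = green(P), green_restricted(P, A), F_matrix(P, A)
L = [[(1.0 if x == y else 0.0) + sum(P[x][w] * Gr[w][y] for w in range(n))
      if x in A else 0.0 for y in range(n)] for x in range(n)]

FG, GL = matmul(F, G), matmul(G, L)   # F_A G  and  G L_A
```

Both G = G_{X∖A} + F_A G and G = G_{X∖A} + G L_A then hold entrywise up to the truncation error, and F_A G = G L_A as in the corollary.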
There is also a link with the induced Markov chain, as follows.
6.52 Lemma. The matrix P_A over A satisfies
P_A = P_{A,·} F_A = L_A P_{·,A}.
Proof. We can rewrite (6.38) with s_A in the place of t_A, since these two stopping times coincide when the initial point is not in A: for x, y ∈ A,
p_A(x, y) = p(x, y) + Σ_{w∉A} p(x, w) Pr_w[s_A < ∞, Z_{s_A} = y]
= p(x, y) 1_A(y) + Σ_{w∉A} p(x, w) F_A(w, y)
= Σ_{w∈X} p(x, w) F_A(w, y).
Observing that (6.28) implies p̂_A(x, y) = ν(y) p_A(y, x)/ν(x) (where ν is an excessive measure for P), the second identity follows by duality. □
6.53 Lemma. (1) If h ∈ 𝒮⁺(X, P), then F_A h(x) = Σ_y F_A(x, y) h(y) is finite and
F_A h(x) ≤ h(x) for all x ∈ X.
(2) If ν ∈ ℰ⁺(X, P), then νL_A(y) = Σ_x ν(x) L_A(x, y) is finite and
νL_A(y) ≤ ν(y) for all y ∈ X.
Proof. As usual, (2) follows from (1) by duality. We prove (1). By the approximation theorem (Theorem 6.46) we can find a sequence of potentials g_n = Gf_n with f_n ≥ 0 and g_n ≤ g_{n+1}, such that lim_n g_n = h pointwise on X. The f_n can be chosen to have finite support. Lemma 6.50 implies
F_A g_n = F_A Gf_n = Gf_n − G_{X∖A} f_n ≤ g_n ≤ h.
By the monotone convergence theorem,
F_A h = F_A ( lim_{n→∞} g_n ) = lim_{n→∞} ( F_A g_n ) ≤ h,
which proves the claims. □
Recall the definition of the reduced function on A of a positive superharmonic function h: for x ∈ X,
R_A[h](x) = inf { u(x) : u ∈ 𝒮⁺, u(y) ≥ h(y) for all y ∈ A }.
Analogously, one defines the reduced measure on A of an excessive measure ν:
R_A[ν](x) = inf { μ(x) : μ ∈ ℰ⁺, μ(y) ≥ ν(y) for all y ∈ A }.
We are now able to describe the reduced functions and measures in terms of matrix operators.
6.54 Theorem. (a) If h ∈ 𝒮⁺ then R_A[h] = F_A h. In particular, R_A[h] is harmonic in every point of X ∖ A, while R_A[h] ≡ h on A.
(b) If ν ∈ ℰ⁺ then R_A[ν] = νL_A. In particular, R_A[ν] is invariant in every point of X ∖ A, while R_A[ν] ≡ ν on A.
Proof. Always by duality, it is sufficient to prove (a).
1.) If x ∈ X ∖ A and y ∈ A, we factorize with respect to the first step: by (6.49),
F_A(x, y) = p(x, y) + Σ_{w∉A} p(x, w) F_A(w, y) = Σ_{w∈X} p(x, w) F_A(w, y).
In particular,
F_A h(x) = P(F_A h)(x), x ∈ X ∖ A.
2.) If x ∈ A then by Lemma 6.52 and the dual of Theorem 6.39 (Exercise 6.40),
P(F_A h)(x) = Σ_y (P F_A)(x, y) h(y) = P_A h(x) ≤ h(x).
3.) Again by (6.49),
F_A h(x) = h(x) for all x ∈ A.
Combining 1.), 2.) and 3.), we see that
F_A h ∈ { u ∈ 𝒮⁺ : u(y) ≥ h(y) for all y ∈ A }.
Therefore R_A[h] ≤ F_A h.
4.) Now let u ∈ 𝒮⁺ with u(y) ≥ h(y) for every y ∈ A. By Lemma 6.53, for every x ∈ X,
u(x) ≥ Σ_y F_A(x, y) u(y) ≥ Σ_y F_A(x, y) h(y) = F_A h(x).
Therefore R_A[h] ≥ F_A h. □
In particular, let f be a non-negative, G-integrable function and g = Gf its potential. By Corollary 6.44 (2), R_A[g] must be a potential. Indeed, by Corollary 6.51,
R_A[g] = F_A Gf = G L_A f
is the potential of L_A f.
Analogously, if μ is a non-negative, G-integrable measure (that is, μG(y) = Σ_x μ(x) G(x, y) < ∞ for all y), then its potential is the excessive measure ν = μG. In this case,
R_A[ν] = μ F_A G
is the potential of the measure μF_A.
6.55 Definition. (1) If f is a non-negative, G-integrable function on X, then the balayée of f is the function f_A = L_A f.
(2) If μ is a non-negative, G-integrable measure on X, then the balayée of μ is the measure μ_A = μF_A.
The meaning of balayée (French, balayer = to sweep out) is the following: if one considers the potential g = Gf only on the set A, the function (the charge) f contains superfluous information. The latter can be eliminated by passing to the charge L_A f, which has the same potential on the set A, while on the complement of A that potential is as small as possible.
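The sweeping-out effect can be made concrete. In the sketch below (plain Python; the chain, the set A and the charge f are illustrative assumptions), the balayée is computed via f_A(x) = f(x) + P(G_{X∖A} f)(x) for x ∈ A, an unwinding of f_A = L_A f, and its potential agrees with Gf on A while being strictly smaller off A:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def green(P, terms=4000):
    # G = sum_{n>=0} P^n (converges for this strictly substochastic chain)
    n = len(P)
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in G]
    for _ in range(terms):
        Pk = matmul(Pk, P)
        for i in range(n):
            for j in range(n):
                G[i][j] += Pk[i][j]
    return G

def green_restricted(P, A):
    n = len(P)
    Q = [[0.0 if (i in A or j in A) else P[i][j] for j in range(n)]
         for i in range(n)]
    GQ = green(Q)
    return [[0.0 if (i in A or j in A) else GQ[i][j] for j in range(n)]
            for i in range(n)]

def apply_kernel(K, f):
    return [sum(k * v for k, v in zip(row, f)) for row in K]

# illustrative substochastic birth-and-death chain on {0,...,3}
n = 4
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i + 1 < n:
        P[i][i + 1] = 0.4
    if i - 1 >= 0:
        P[i][i - 1] = 0.2

A = {2, 3}
G, Gr = green(P), green_restricted(P, A)

f = [1.0, 0.5, 0.0, 0.0]          # a charge, not supported inside A
g = apply_kernel(G, f)            # its potential Gf

# balayee f_A = L_A f, supported on A
PGrf = apply_kernel(P, apply_kernel(Gr, f))
fA = [f[x] + PGrf[x] if x in A else 0.0 for x in range(n)]
gA = apply_kernel(G, fA)          # = R_A[Gf], the swept-out potential
```

On A the two potentials coincide; off A, gA is strictly below g, since g is strictly superharmonic on supp(f).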
An important application of the preceding results is the following.
6.56 Theorem (Domination principle). Let f be a non-negative, G-integrable function on X, with support contained in A. If h ∈ 𝒮⁺ is such that h(x) ≥ Gf(x) for every x ∈ A, then h ≥ Gf on the whole of X.
Proof. By (6.49), L_A f = f. Lemma 6.53 and Corollary 6.51 imply
h(x) ≥ F_A h(x) = Σ_y F_A(x, y) h(y) ≥ Σ_y F_A(x, y) Gf(y) = F_A Gf(x) = G(L_A f)(x) = Gf(x)
for every x ∈ X. □
6.57 Exercise. Give direct proofs of all statements of the last section, concerning excessive and invariant measures, where we just relied on duality.
In particular, formulate and prove directly the dual domination principle for excessive measures.
6.58 Exercise. Use the domination principle to show that
G(x, y) ≥ F(x, w) G(w, y).
Dividing by G(y, y), this leads to Proposition 1.43 (a) for z = 1.
Chapter 7
The Martin boundary of transient Markov chains
A Minimal harmonic functions
As in the preceding chapter, we always suppose that (X, P) is irreducible and P substochastic. We want to undertake a more detailed study of harmonic and superharmonic functions.
We know that 𝒮⁺ = 𝒮⁺(X, P), besides the 0 function, contains all non-negative constant functions. As we have already stated, 𝒮⁺ is a cone with vertex 0: if u ∈ 𝒮⁺ ∖ {0}, then the ray (half-line) { a·u : a ≥ 0 } starting at 0 and passing through u is entirely contained in 𝒮⁺. Furthermore, the cone 𝒮⁺ is convex: if u₁, u₂ are non-negative superharmonic functions and a₁, a₂ ≥ 0 then a₁u₁ + a₂u₂ ∈ 𝒮⁺. (Since we have a cone with vertex 0, it is superfluous to require that a₁ + a₂ = 1.)
A base of a cone with vertex w is a subset B such that each element of the cone different from w can be uniquely written as w + a·(u − w) with a > 0 and u ∈ B. Let us fix a reference point (origin) o ∈ X. Then the set
B = { u ∈ 𝒮⁺ : u(o) = 1 } (7.1)
is a base of the cone 𝒮⁺. Indeed, if v ∈ 𝒮⁺ and v ≠ 0, then v(x) > 0 for each x ∈ X by Lemma 6.17. Hence u = (1/v(o))·v ∈ B, and we can write v = a·u with a = v(o).
Finally, we observe that 𝒮⁺, as a subset of the space of all functions X → ℝ, carries the topology of pointwise convergence: a sequence of functions f_n : X → ℝ converges to the function f if and only if f_n(x) → f(x) for every x ∈ X. This is the product topology on ℝ^X.
We shall say that our Markov chain has finite range if for every x ∈ X there is only a finite number of y ∈ X with p(x, y) > 0.
7.2 Theorem. (a) 𝒮⁺ is closed and B is compact in the topology of pointwise convergence.
(b) If P has finite range then H⁺ is closed.
Proof. (a) In order to verify compactness of B, we observe first of all that B is closed. Let (u_n) be a sequence of functions in 𝒮⁺ that converges pointwise to the function u : X → ℝ. Then, by Fatou's lemma (where the action of P represents the integral),
Pu = P(liminf u_n) ≤ liminf Pu_n ≤ liminf u_n = u,
and u is superharmonic. Let x ∈ X. By irreducibility we can choose k = k_x such that p^(k)(o, x) > 0. Then for every u ∈ B,
1 = u(o) ≥ P^k u(o) ≥ p^(k)(o, x) u(x),
and
u(x) ≤ C_x = 1 / p^(k_x)(o, x). (7.3)
Thus, B is contained in the compact set ∏_{x∈X} [0, C_x]. Being closed, B is compact.
(b) If (h_n) is a sequence of non-negative harmonic functions that converges pointwise to the function h, then h ∈ 𝒮⁺ by (a). Furthermore, for each x ∈ X, the summation in Ph_n(x) = Σ_y p(x, y) h_n(y) is finite. Therefore we may exchange summation and limit,
Ph = P(lim h_n) = lim Ph_n = lim h_n = h,
and h is harmonic. □
We see that 𝒮⁺ is a convex cone with compact base B, which contains H⁺ as a convex sub-cone. When P does not have finite range, that sub-cone is not necessarily closed. It should be intuitively clear that in order to know 𝒮⁺ (and consequently also H⁺) it will be sufficient to understand ∂B, the set of extremal points of the convex set B. Recall that an element u of a convex set is called extremal if it cannot be written as a convex combination a·u₁ + (1 − a)·u₂ (0 < a < 1) of distinct elements u₁, u₂ of the same set. Our next aim is to determine the elements of ∂B.
In the transient case we know from Lemma 6.19 that for each y ∈ X, the function x ↦ G(x, y) belongs to 𝒮⁺ and is strictly superharmonic in the point y. However, it does not belong to B. Hence, we normalize by dividing by its value in o, which is non-zero by irreducibility.
7.4 Definition. (i) The Martin kernel is
K(x, y) = F(x, y) / F(o, y), x, y ∈ X.
(ii) A function h ∈ H⁺ is called minimal if h(o) = 1, and if h₁ ∈ H⁺ with h ≥ h₁ in each point, then h₁/h is constant.
Note that in the recurrent case, the Martin kernel is also defined and is constant ≡ 1, and 𝒮⁺ = H⁺ = {non-negative constant functions} by Theorem 6.21. Thus, we may limit our attention to the transient case, in which
K(x, y) = G(x, y) / G(o, y). (7.5)
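For a finite transient chain, the Martin kernels can be tabulated directly from (7.5). The sketch below (plain Python; the 3-state strictly substochastic chain is an illustrative assumption) also verifies that K(x, y) − PK(x, y) = δ_x(y)/G(o, y): each K(·, y) is harmonic except at y, where it is strictly superharmonic (this identity reappears in the proof of Lemma 7.18):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def green(P, terms=3000):
    # G = sum_{n>=0} P^n (spectral radius of this substochastic P is < 1)
    n = len(P)
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in G]
    for _ in range(terms):
        Pk = matmul(Pk, P)
        for i in range(n):
            for j in range(n):
                G[i][j] += Pk[i][j]
    return G

# illustrative irreducible, strictly substochastic chain on X = {0,1,2}
P = [[0.0, 0.5, 0.0],
     [0.25, 0.0, 0.5],
     [0.0, 0.25, 0.0]]
G = green(P)
n, o = 3, 0

# Martin kernel K(x,y) = G(x,y)/G(o,y); every column is normalized at o
K = [[G[x][y] / G[o][y] for y in range(n)] for x in range(n)]
PK = matmul(P, K)   # P acts on the first variable
```

Since PG = G − I, the gap K(x, y) − PK(x, y) equals 1/G(o, y) exactly when x = y and vanishes otherwise.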
7.6 Theorem. If (X, P) is transient, then the extremal elements of B are the Martin kernels and the minimal harmonic functions:
∂B = { K(·, y) : y ∈ X } ∪ { h ∈ H⁺ : h is minimal }.
Proof. Let u be an extremal element of B. Write its Riesz decomposition (Theorem 6.43): u = Gf + h with f ≥ 0 and h ∈ H⁺.
Suppose that both Gf and h are non-zero. By Lemma 6.17, the values of these functions in o are (strictly) positive, and we can define
u₁ = (1/Gf(o)) Gf, u₂ = (1/h(o)) h ∈ B.
Since Gf(o) + h(o) = u(o) = 1, we can write u as a convex combination u = a·u₁ + (1 − a)·u₂ with 0 < a = Gf(o) < 1. But u₁ is strictly superharmonic in at least one point, while u₂ is harmonic, so that u₁ ≠ u₂. This contradicts extremality of u. Therefore u is a potential or a harmonic function.
Case 1. u = Gf, where f ≥ 0. Let A = supp(f). This set must have at least one element y. Suppose that A has more than one element. Consider the restrictions f₁ and f₂ of f to {y} and A ∖ {y}, respectively. Then u = Gf₁ + Gf₂, and as above, setting a = Gf₁(o), we can rewrite this identity as a convex combination,
u = a·Gg₁ + (1 − a)·Gg₂, where g₁ = (1/a) f₁ and g₂ = (1/(1 − a)) f₂.
By assumption, u is extremal. Hence we must have Gg₁ = Gg₂ and thus (by Lemma 6.42) also g₁ = g₂, a contradiction.
Consequently A = {y} and f = a·1_y with a > 0. Since a·G(o, y) = Gf(o) = u(o) = 1, we find
u = K(·, y),
as proposed.
Case 2. u = h ∈ H⁺. We have to prove that h is minimal. By hypothesis, h(o) = 1. Suppose that h ≥ h₁ for a function h₁ ∈ H⁺. If h₁ = 0 or h₁ = h, then h₁/h is constant. Otherwise, setting h₂ = h − h₁, both h₁ and h₂ are strictly positive harmonic functions (Lemma 6.17). As above, we obtain a convex combination
h = h₁(o) · (h₁ / h₁(o)) + h₂(o) · (h₂ / h₂(o)).
But then we must have (1/h_i(o)) h_i = h. In particular, h₁/h is constant.
Conversely, we must verify that the functions K(·, y) and the minimal harmonic functions are elements of ∂B.
Consider first the function K(·, y) with y ∈ X. It can be written as the potential Gf, where f = (1/G(o, y)) 1_y. Suppose that
K(·, y) = a·u₁ + (1 − a)·u₂
with 0 < a < 1 and u_i ∈ B. Then u₁ ≤ G((1/a) f), and u₁ is dominated by a potential. By Corollary 6.44, it must itself be a potential u₁ = Gf₁ with f₁ ≥ 0. If supp(f₁) contained some w ∈ X different from y, then u₁, and therefore also K(·, y), would be strictly superharmonic in w, a contradiction. We conclude that supp(f₁) = {y}, and u₁ = c·G(·, y) for some constant c > 0. Since u₁(o) = 1, we must have c = 1/G(o, y), and u₁ = K(·, y). It follows that also u₂ = K(·, y). This proves that K(·, y) ∈ ∂B.
Now let h ∈ H⁺ be a minimal harmonic function. Suppose that
h = a·u₁ + (1 − a)·u₂
with 0 < a < 1 and u_i ∈ B. None of the functions u₁ and u₂ can be strictly superharmonic in some point (since otherwise also h would have this property). We obtain a·u₁ ∈ H⁺ and h ≥ a·u₁. By minimality of h, the function u₁/h is constant. Since u₁(o) = 1 = h(o), we must have u₁ = h and thus also u₂ = h. This proves that h ∈ ∂B. □
We shall now exhibit two general criteria that are useful for recognizing the minimal harmonic functions. Let h be an arbitrary positive, non-zero superharmonic function. We use h to define a new transition matrix P_h = (p_h(x, y))_{x,y∈X}:
p_h(x, y) = p(x, y) h(y) / h(x). (7.7)
The Markov chain with these transition probabilities is called the h-process, or also Doob's h-process, see his fundamental paper [17].
We observe that in the notation that we have introduced in 1.B, the random variables of this chain remain always Z_n, the projections of the trajectory space onto X. What changes is the probability measure on Ω (and consequently also the distributions of the Z_n). We shall write Pr^h_x for the family of probability measures on (Ω, 𝒜) with starting point x ∈ X which govern the h-process. If h is strictly superharmonic in some point, then recall that we have to add the tomb state † as in (6.11).
The construction of P_h is similar to that of the reversed chain with respect to an excessive measure as in (6.27). In particular,
p^(n)_h(x, y) = p^(n)(x, y) h(y) / h(x), and G_h(x, y) = G(x, y) h(y) / h(x), (7.8)
where G_h denotes the Green function associated with P_h in the transient case.
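The relations (7.7) and (7.8) are easy to check numerically. In the sketch below (plain Python; the matrices are illustrative assumptions), h is taken to be the potential G·1, a positive, strictly superharmonic function, so that P_h comes out strictly substochastic, in accordance with Exercise 7.9 (1) below; the Green functions transform exactly as in (7.8):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def green(P, terms=3000):
    # G = sum_{n>=0} P^n (spectral radius < 1 for this substochastic P)
    n = len(P)
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in G]
    for _ in range(terms):
        Pk = matmul(Pk, P)
        for i in range(n):
            for j in range(n):
                G[i][j] += Pk[i][j]
    return G

# illustrative irreducible, strictly substochastic chain on X = {0,1,2}
P = [[0.0, 0.5, 0.0],
     [0.25, 0.0, 0.5],
     [0.0, 0.25, 0.0]]
G = green(P)
n = 3

h = [sum(row) for row in G]      # h = G.1: positive, strictly superharmonic
# Doob transform (7.7): p_h(x,y) = p(x,y) h(y) / h(x)
Ph = [[P[x][y] * h[y] / h[x] for y in range(n)] for x in range(n)]
Gh = green(Ph)                   # Green function of the h-process
```

Since Ph(x) = h(x) − 1 here, each row sum of P_h is 1 − 1/h(x) < 1, and G_h(x, y) = G(x, y) h(y)/h(x) entrywise.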
7.9 Exercise. Prove the following simple facts.
(1) The matrix P_h is stochastic if and only if h ∈ H⁺.
(2) One has u ∈ 𝒮(X, P) if and only if ũ = u/h ∈ 𝒮(X, P_h). Furthermore, u is harmonic with respect to P if and only if ũ is harmonic with respect to P_h.
(3) A function u ∈ H⁺(X, P) with u(o) = 1 is minimal harmonic with respect to P if and only if h(o)·ũ ∈ H⁺(X, P_h) is minimal harmonic with respect to P_h.
We recall that H^∞ = H^∞(X, P) denotes the linear space of all bounded harmonic functions.
7.10 Lemma. Let (X, P) be an irreducible Markov chain with stochastic transition matrix P. Then H^∞ = {constants} if and only if the constant harmonic function 1 is minimal.
Proof. Suppose that H^∞ = {constants}. Let h₁ be a positive harmonic function with 1 ≥ h₁. Then h₁ is bounded, whence constant by the assumption. Therefore h₁/1 is constant, and 1 is a minimal harmonic function.
Conversely, suppose that the harmonic function 1 is minimal. If h ∈ H^∞ then there is a constant M such that h₁ = h + M is a positive function. Since P is stochastic, h₁ is harmonic. But h₁ is also bounded, so that 1 ≥ c·h₁ for some c > 0. By minimality of 1, the ratio h₁/1 must be constant. Therefore also h is constant. □
Setting u = h in Exercise 7.9 (3), we obtain the following corollary (valid also when P is substochastic, since what matters is stochasticity of P_h).
7.11 Corollary. A function h ∈ H⁺(X, P) is minimal if and only if
H^∞(X, P_h) = {constants}.
If C is a compact, convex set in the Euclidean space ℝ^d, and if the set ∂C of its extremal points is finite, then it is known that every element x ∈ C can be written as a weighted average x = Σ_{c∈∂C} ν(c)·c of the elements of ∂C. The numbers ν(c) make up a probability measure on ∂C. (Note that in general, ∂C is not the topological boundary of C.) If ∂C is infinite, the weighted sum has to be replaced by an integral x = ∫_{∂C} c dν(c), where ν is a probability measure on ∂C. In general, this integral representation is not unique: for example, the interior points of a rectangle (or a disk) can be written in different ways as weighted averages of the four vertices (or the points on the boundary circle of the disk, respectively). However, the representation does become unique if the set C is a simplex: a triangle in dimension 2, a tetrahedron in dimension 3, etc.; in dimension d, a simplex has d + 1 extremal points.
We use these observations as a motivation for the study of the compact convex set B, base of the cone 𝒮⁺. We could appeal to Choquet's representation theory of convex cones in topological linear spaces, see for example Phelps [Ph]. However, in our direct approach regarding the Markov chain (X, P), we shall obtain a more detailed, specific understanding. In the next sections, we shall prove (among other things) the following results:
• The base B is a simplex (usually infinite dimensional) in the sense that every element of B can be written uniquely as the integral of the elements of ∂B with respect to a suitable probability measure.
• Every minimal harmonic function can be approximated by a sequence of functions K(·, y_n), where y_n ∈ X.
We shall obtain these results via the construction of the Martin compactification, a compactification of the state space X defined by the Martin kernel.
B The Martin compactification
Preamble on compactifications
Given the countably infinite set X, by a compactification of X we mean a compact topological Hausdorff space X̂ containing X such that
• the set X is dense in X̂, and
• in the induced topology, X ⊂ X̂ is discrete.
The set X̂ ∖ X is called the boundary or ideal boundary of X in X̂. We consider two compactifications of X as equal, that is, equivalent, if the identity function X → X extends to a homeomorphism between the two. Also, we consider one compactification bigger than a second one if the identity X → X extends to a continuous surjection from the first onto the second. The following topological exercise does not require Cantor–Bernstein or other deep theorems.
7.12 Exercise. Two compactifications of X are equivalent if and only if each of them is bigger than the other one.
[Hint: use sequences in X.]
Given a family F of real valued functions on X, there is a standard way to associate with F a compactification (in general not necessarily Hausdorff) of X. In the following theorem, we limit ourselves to those hypotheses that will be used in the sequel.
7.13 Theorem. Let F be a denumerable family of bounded functions on X. Then there exists a unique (up to equivalence) compactification X̂ = X̂_F of X such that
(a) every function f ∈ F extends to a continuous function on X̂ (which we still denote by f), and
(b) the family F separates the boundary points: if ξ, η ∈ X̂ ∖ X are distinct, then there is f ∈ F with f(ξ) ≠ f(η).
Proof. 1.) Existence (construction).
For x ∈ X, we write 1_x for the indicator function of the point x. We add all those indicator functions to F, setting
F* = F ∪ { 1_x : x ∈ X }. (7.14)
For each f ∈ F*, there is a constant C_f such that |f(x)| ≤ C_f for all x ∈ X. Consider the topological product space
Π_F = ∏_{f∈F*} [−C_f, C_f] = { φ : F* → ℝ | φ(f) ∈ [−C_f, C_f] for all f ∈ F* }.
The topology on Π_F is the one of pointwise convergence: φ_n → φ if and only if φ_n(f) → φ(f) for every f ∈ F*. A neighbourhood base at φ ∈ Π_F is given by the finite intersections of sets of the form { ψ ∈ Π_F : |ψ(f) − φ(f)| < ε }, as f ∈ F* and ε > 0 vary.
We can embed X into Π_F via the map
ι : X → Π_F, ι(x) = φ_x, where φ_x(f) = f(x) for f ∈ F*.
If x, y are two distinct elements of X then φ_x(1_x) = 1 ≠ 0 = φ_y(1_x). Therefore ι is injective. Furthermore, the neighbourhood { ψ ∈ Π_F : |ψ(1_x) − φ_x(1_x)| < 1 } of ι(x) = φ_x contains none of the functions φ_y with y ∈ X ∖ {x}. This means that ι(X), with the induced topology, is a discrete subset of Π_F. Thus we can identify X with ι(X). [Observe how the enlargement of F by the indicator functions has been crucial for this reasoning.]
Now X̂ = X̂_F is defined as the closure of X in Π_F. It is clear that this is a compactification of X in our sense. Each ξ ∈ X̂ ∖ X is a function F* → ℝ with |ξ(f)| ≤ C_f. By the construction of X̂, there must be a sequence (x_n) of distinct points in X that converges to ξ, that is, f(x_n) = φ_{x_n}(f) → ξ(f) for every f ∈ F*. We prefer to think of ξ as a limit point of X in a more geometrical way, and define f(ξ) = ξ(f) for f ∈ F. Observe that since φ_{x_n}(1_x) = 0 when x_n ≠ x, we have 1_x(ξ) = ξ(1_x) = 0 for every x ∈ X, as it should be.
If (x_n) is an arbitrary sequence in X which converges to ξ in the topology of X̂, then for each f ∈ F one has
f(x_n) = φ_{x_n}(f) → ξ(f) = f(ξ).
Thus, f has become a continuous function on X̂. Finally, F separates the points of X̂ ∖ X: if ξ ≠ η are two distinct boundary points, then they are also distinct in
their original definition as functions on F*. Hence there is f ∈ F* such that ξ(f) ≠ η(f). Since ξ(1_x) = η(1_x) = 0 for every x ∈ X, we must have f ∈ F. With the reversed notation that we have introduced above, f(ξ) ≠ f(η).
2.) Uniqueness. To show uniqueness of X̂ up to homeomorphism, suppose that X̃ is another compactification of X with properties (a) and (b). We only use those defining properties in the proof, so that the roles of X̂ and X̃ can be exchanged (symmetry). In order to distinguish the continuous extension of a function f ∈ F to X̂ from the one to X̃, in this proof we shall write f̂ for the former and f̃ for the latter.
We construct a function τ : X̃ → X̂: if x ∈ X ⊂ X̃ then we set τ(x) = x (the latter seen as an element of X̂).
If ξ̃ ∈ X̃ ∖ X, then there must be a sequence (x_n) in X such that in the topology of X̃, one has x_n → ξ̃. We show that x_n (= τ(x_n)) has a limit in X̂ ∖ X: let ξ̂ ∈ X̂ be an accumulation point of (x_n) in the latter compact space. If ξ̂ ∈ X then x_n = ξ̂ for infinitely many n, which contradicts the fact that x_n → ξ̃ in X̃. Therefore every accumulation point of (x_n) in X̂ lies in the boundary X̂ ∖ X. Suppose there is another accumulation point η̂ ∈ X̂ ∖ X. Then there is f ∈ F with f̂(ξ̂) ≠ f̂(η̂). As f̂ is continuous, we find that the real sequence (f(x_n)) possesses the two distinct real accumulation points f̂(ξ̂) and f̂(η̂). But this is impossible, since with respect to the compactification X̃, we have that f(x_n) → f̃(ξ̃).
Therefore there is ξ̂ ∈ X̂ ∖ X such that x_n → ξ̂ in the topology of X̂. We define τ(ξ̃) = ξ̂. This mapping is well defined: if (y_n) is another sequence that tends to ξ̃ in X̃, then the union of the two sequences (x_n) and (y_n) also tends to ξ̃ in X̃, so that the argument used a few lines above shows that the union of those two sequences must also converge to ξ̂ in the topology of X̂.
By construction, τ is continuous. [Exercise: in case of doubts, prove this.] Since X is dense in both compactifications, τ is surjective.
In the same way (by symmetry), we can construct a continuous surjection X̂ → X̃ which extends the identity mapping X → X. It must be the inverse of τ (by continuity, since this is true on X). We conclude that τ is a homeomorphism. □
We indicate two further, equivalent ways to construct the compactification X̂.
1.) Let us say that x_n → ∞ for a sequence in X if for every finite subset A of X, there are only finitely many n with x_n ∈ A. Consider the set
X^∞ = { (x_n) ∈ X^ℕ : x_n → ∞ and (f(x_n)) converges for every f ∈ F }.
On X^∞, we consider the following equivalence relation:
(x_n) ~ (y_n) ⟺ lim f(x_n) = lim f(y_n) for every f ∈ F.
The boundary of our compactification is X^∞/~, that is,
X̂ = X ∪ (X^∞/~).
The topology is defined via convergence: on X, it is discrete; a sequence (x_n) in X converges to a boundary point ξ if (x_n) ∈ X^∞ and (x_n) belongs to ξ as an equivalence class under ~.
2.) Consider the countable family of functions F* as in (7.14). Each f ∈ F* is bounded by a constant C_f. We choose weights w_f > 0 such that Σ_f w_f C_f < ∞. Then we define a metric on X:
θ(x, y) = Σ_{f∈F*} w_f |f(x) − f(y)|. (7.15)
In this metric, X is discrete, while a sequence (x_n) which tends to ∞ in the above sense is a Cauchy sequence if and only if (f(x_n)) converges in ℝ for every f ∈ F. The completion of (X, θ) is (homeomorphic with) X̂.
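The metric (7.15) can be written down verbatim in code. In the sketch below (plain Python; the chain and the weight choice w_f = 2^{-i}/C_f are illustrative assumptions), F consists of the Martin kernels K(x, ·) of a small chain, enlarged by the indicator functions as in (7.14):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def green(P, terms=3000):
    # G = sum_{n>=0} P^n for a strictly substochastic finite P
    n = len(P)
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in G]
    for _ in range(terms):
        Pk = matmul(Pk, P)
        for i in range(n):
            for j in range(n):
                G[i][j] += Pk[i][j]
    return G

# illustrative 3-state substochastic chain, reference point o = 0
P = [[0.0, 0.5, 0.0],
     [0.25, 0.0, 0.5],
     [0.0, 0.25, 0.0]]
G = green(P)
n, o = 3, 0
K = [[G[x][y] / G[o][y] for y in range(n)] for x in range(n)]

# F* = F ∪ {1_x}: each "function" is stored as its list of values over X
funcs = [K[x] for x in range(n)]                                   # F
funcs += [[1.0 if y == z else 0.0 for y in range(n)] for z in range(n)]

# weights with sum_f w_f C_f < infinity
weights = [2.0 ** (-i) / max(abs(v) for v in f)
           for i, f in enumerate(funcs)]

def theta(x, y):
    # the metric (7.15)
    return sum(w * abs(f[x] - f[y]) for w, f in zip(weights, funcs))
```

On an infinite X one would enumerate F* and truncate the sum; the point here is only that θ is a genuine metric in which X is discrete, thanks to the indicator functions.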
Observation: if F contains only constant functions, then X̂ is the one-point compactification: X̂ = X ∪ {∞}, and convergence to ∞ is defined as in 1.) above.
7.16 Exercise. Elaborate the details regarding the constructions in 1.) and 2.) and show that the compactifications obtained in this way are (equivalent with) X̂_F.
After this preamble, we can now give the definition of the Martin compactification.
7.17 Definition. Let (X, P) be an irreducible, (sub)stochastic Markov chain. The Martin compactification of X with respect to P is defined as X̂(P) = X̂_F, the compactification in the sense of Theorem 7.13 with respect to the family of functions F = { K(x, ·) : x ∈ X }. The Martin boundary ℳ = ℳ(P) = X̂(P) ∖ X is the ideal boundary of X in this compactification.
Note that all the functions K(x, ·) of Definition 7.4 are bounded. Indeed, Proposition 1.43 (a) implies that
K(x, y) = F(x, y) / F(o, y) ≤ 1 / F(o, x) = C_x
for every y ∈ X. By (7.15), the topology of X̂(P) is induced by a metric (as it has to be, since X̂(P) is a compact separable Hausdorff space). If ξ ∈ ℳ, then we write of course K(x, ξ) for the value of the extended function K(x, ·) at ξ.
If (X, P) is recurrent, F = {1}, and the Martin boundary consists of one element only. We note at this point that for recurrent Markov chains, another notion of Martin compactification has been introduced, see Kemeny and Snell [35]. In the transient case, we have the following, where (attention) we now consider K(·, ξ) as a function on X for every ξ ∈ X̂(P). In our notation, when applied from the left to the Martin kernel, the transition operator P acts on the first variable of K(·, ·).
7.18 Lemma. If (X, P) is transient and ξ ∈ ℳ then K(·, ξ) is a positive superharmonic function. If P has finite range at x ∈ X (that is, for the given x, the set { y ∈ X : p(x, y) > 0 } is finite), then the function K(·, ξ) is harmonic in x.
Proof. By construction of ℳ, there is a sequence (y_n) in X, tending to ∞, such that K(·, y_n) → K(·, ξ) pointwise on X. Thus K(·, ξ) is the pointwise limit of the superharmonic functions K(·, y_n) and consequently a superharmonic function.
By (1.34) and (7.5),
PK(x, y_n) = Σ_{y : p(x,y)>0} p(x, y) K(y, y_n) = K(x, y_n) − δ_x(y_n) / G(o, y_n).
If the summation is finite, it can be exchanged with the limit as n → ∞. Since y_n ≠ x for all but (at most) finitely many n, we have that δ_x(y_n) → 0. Therefore PK(x, ξ) = K(x, ξ). □
In particular, if the Markov chain has finite range (at every point), then for every ξ ∈ ℳ, the function K(·, ξ) is positive harmonic with value 1 in o.
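A concrete illustration, under the following assumptions (not from the text): a nearest-neighbour random walk with upward drift (p = 0.7, q = 0.3) on the finite interval {0, …, 10}, killed when it steps outside; a substochastic, transient toy model. The columns K(·, y) stabilize as y approaches either end of the interval, producing two distinct limit functions, and the right-hand limit is harmonic at all interior points, as in Lemma 7.18:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def green(P, terms=5000):
    # G = sum_{n>=0} P^n; killing at both ends makes the spectral radius < 1
    n = len(P)
    G = [[float(i == j) for j in range(n)] for i in range(n)]
    Pk = [row[:] for row in G]
    for _ in range(terms):
        Pk = matmul(Pk, P)
        for i in range(n):
            for j in range(n):
                G[i][j] += Pk[i][j]
    return G

N, p, q, o = 10, 0.7, 0.3, 5
n = N + 1
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i + 1 < n:
        P[i][i + 1] = p
    if i - 1 >= 0:
        P[i][i - 1] = q
G = green(P)
K = [[G[x][y] / G[o][y] for y in range(n)] for x in range(n)]

right = [K[x][N] for x in range(n)]   # kernel column at the right end
left = [K[x][0] for x in range(n)]    # kernel column at the left end
```

Since a nearest-neighbour path from x to y > x cannot jump over intermediate points, K(x, y) is independent of y once y exceeds both x and o, and similarly towards the left end; the two stabilized columns play the role of K(·, ξ) for the two "directions of escape".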
Another construction in case of finite range
The last observation allows us to describe a fourth, more specific construction of the Martin compactification in the case when (X, P) is transient and has finite range.
Let B = { u ∈ 𝒮⁺ : u(o) = 1 } be the base of the cone 𝒮⁺, defined in (7.1), with the topology of pointwise convergence. We can embed X into B via the map y ↦ K(·, y). Indeed, this map is injective (one-to-one): suppose that K(·, y₁) = K(·, y₂) for two distinct elements y₁, y₂ ∈ X. Then
1 / F(o, y₁) = K(y₁, y₁) = K(y₁, y₂) = F(y₁, y₂) / F(o, y₂)
and
1 / F(o, y₂) = K(y₂, y₂) = K(y₂, y₁) = F(y₂, y₁) / F(o, y₁).
We deduce F(y₁, y₂) F(y₂, y₁) = 1, which implies U(y₁, y₁) = 1, see Exercise 1.44. This contradicts transience.
Now we identify X with its image in B. The Martin compactification is then the closure of X in B. Indeed, among the properties which characterize X̂(P) according to Theorem 7.13, the only one which is not immediate is that X ≅ { K(·, y) : y ∈ X } is discrete in B: let us suppose that (y_n) is a sequence of distinct elements of X such that K(·, y_n) converges pointwise. By finite range, the limit function is harmonic. In particular, it cannot be one of the functions K(·, y), where y ∈ X, as the latter is strictly superharmonic at y. In other words, no element of X can be an accumulation point of (y_n).
B. The Martin compactication 189
We remark at this point that in the original article of Doob [17] and in the book of Kemeny, Snell and Knapp [K-S-K] it is not required that $X$ be discrete in the Martin compactification. In their setting, the compactification can always be described as the closure of (the embedding of) $X$ in $\mathcal{B}$. However, in the case when $P$ does not have finite range, the compact space thus obtained may be smaller than in our construction, which follows the one of Hunt [32]. In fact, it can happen that there are $y \in X$ and $\xi \in \mathcal{M}$ such that $K(\cdot,y) = K(\cdot,\xi)$: in our construction, they are considered distinct in any case (since $\xi$ is a limit point of a sequence that tends to $\infty$), while in the construction of [17] and [K-S-K], $\xi$ and $y$ would be identified. A discussion, in the context of random walks on trees with infinite vertex degrees, can be found in Section 9.E after Example 9.47. In a few words, one can say that for most probabilistic purposes the smaller compactification is sufficient, while for more analytically flavoured issues it is necessary to maintain the original discrete topology on $X$.
We now want to state a first fundamental theorem regarding the Martin compactification. Let us first remark that $\hat X(P)$, as a compact metric space, carries a natural $\sigma$-algebra, namely the Borel $\sigma$-algebra, which is generated by the collection of all open sets. Speaking of a random variable with values in $\hat X(P)$, we mean a function from the trajectory space $(\Omega,\mathcal{A})$ to $\hat X(P)$ which is measurable with respect to that $\sigma$-algebra.
7.19 Theorem (Convergence to the boundary). If $(X,P)$ is stochastic and transient then there is a random variable $Z_\infty$ taking its values in $\mathcal{M}$ such that for each $x \in X$,
$$\lim_{n\to\infty} Z_n = Z_\infty \quad \text{$\Pr_x$-almost surely}$$
in the topology of $\hat X(P)$.
In terms of the trajectory space, the meaning of this statement is the following. Let
$$\Omega_\infty = \bigl\{\omega = (x_n) \in \Omega : \text{there is } x_\infty \in \mathcal{M} \text{ such that } x_n \to x_\infty \text{ in the topology of } \hat X(P)\bigr\}. \tag{7.20}$$
Then
$$\Omega_\infty \in \mathcal{A} \quad\text{and}\quad \Pr_x(\Omega_\infty) = 1 \ \text{ for every } x \in X.$$
Furthermore,
$$Z_\infty(\omega) = x_\infty \quad (\omega \in \Omega_\infty)$$
defines a random variable which is measurable with respect to the Borel $\sigma$-algebra on $\hat X(P)$.
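As a concrete illustration of Theorem 7.19 (an example assumed here, not taken from the text), consider the transient random walk on $\mathbb{Z}$ with $p(k,k+1) = 0.7$ and $p(k,k-1) = 0.3$. Its boundary points are the two ends of $\mathbb{Z}$, and the law of large numbers forces $Z_n \to +\infty$ almost surely, so $Z_\infty$ is concentrated on a single boundary point. A quick simulation:

```python
import random

# Simulate the drifted walk on Z with p(k,k+1)=0.7, p(k,k-1)=0.3
# (an assumed example): every trajectory escapes towards +infinity,
# illustrating Z_n -> Z_infty a.s. in Theorem 7.19.
random.seed(1)

def endpoint(x, n):
    # run n steps of the walk (drift +0.4 per step) started at x
    for _ in range(n):
        x += 1 if random.random() < 0.7 else -1
    return x

samples = [endpoint(0, 2000) for _ in range(200)]
# all 200 sampled trajectories have drifted far to the right
assert min(samples) > 0
print(min(samples), max(samples))
```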
When $P$ is strictly substochastic at some point, we have to modify the statement of the theorem. As we have seen in Section 7.B, in this case one introduces the absorbing (tomb) state $\dagger$ and extends $P$ to $X \cup \{\dagger\}$. The construction of the Martin boundary remains unchanged, and does not involve the additional point $\dagger$. However, the trajectory space $(\Omega,\mathcal{A})$ with the probability measures $\Pr_x$, $x \in X$, now refers to $X \cup \{\dagger\}$. In reality, in order to construct it, we do not need all sequences in $X \cup \{\dagger\}$: it is sufficient to consider
$$\Omega = X^{\mathbb{N}_0} \cup \Omega_\dagger, \quad\text{where}\quad \Omega_\dagger = \bigl\{\omega = (x_n) : \text{there is } k \ge 0 \text{ with } x_n \in X \text{ for all } n \le k \text{ and } x_n = \dagger \text{ for all } n > k\bigr\}. \tag{7.21}$$
Indeed, once the Markov chain has reached $\dagger$, it has to stay there forever. We write $\mathfrak{e}(\omega) = k$ for $\omega \in \Omega_\dagger$, with $k$ as in the definition of $\Omega_\dagger$, and $\mathfrak{e}(\omega) = \infty$ for $\omega \in \Omega \setminus \Omega_\dagger$. Thus,
$$\mathfrak{e} = \mathfrak{t}^\dagger - 1$$
is a stopping time, the exit time from $X$: the last instant when the Markov chain is in $X$. With $\Omega_\dagger$ and $\Omega_\infty$ as in (7.21) and (7.20), respectively, we now define
$$\Omega_{\mathfrak{e}} = \Omega_\dagger \cup \Omega_\infty, \quad\text{and}\quad Z_{\mathfrak{e}} : \Omega_{\mathfrak{e}} \to \hat X(P), \qquad Z_{\mathfrak{e}}(\omega) = \begin{cases} Z_{\mathfrak{e}(\omega)}(\omega), & \omega \in \Omega_\dagger,\\ Z_\infty(\omega), & \omega \in \Omega_\infty.\end{cases}$$
In this setting, Theorem 7.19 reads as follows.

7.22 Theorem. If $(X,P)$ is transient then for each $x \in X$,
$$\lim_{n\to\mathfrak{e}} Z_n = Z_{\mathfrak{e}} \quad \text{$\Pr_x$-almost surely}$$
in the topology of $\hat X(P)$.

Theorem 7.19 arises as a special case. Note that Theorem 7.22 comprises the following statements.

(a) $\Omega_{\mathfrak{e}}$ belongs to the $\sigma$-algebra $\mathcal{A}$;

(b) $\Pr_x(\Omega_{\mathfrak{e}}) = 1$ for every $x \in X$;

(c) $Z_{\mathfrak{e}} : \Omega_{\mathfrak{e}} \to \hat X(P)$ is measurable with respect to the Borel $\sigma$-algebra of $\hat X(P)$.

The most difficult part is the proof of (b), which will be elaborated in the next section. Thereafter, we shall deduce from Theorem 7.19, resp. 7.22, that

• every minimal harmonic function is a Martin kernel $K(\cdot,\xi)$, with $\xi \in \mathcal{M}$;

• every positive harmonic function $h$ has an integral representation $h(x) = \int_{\mathcal{M}} K(x,\cdot)\,d\nu_h$, where $\nu_h$ is a Borel measure on $\mathcal{M}$.
C. Supermartingales, superharmonic functions, and excessive measures 191
The construction of the Martin compactification is an abstract one. In the study of specific classes of Markov chains, the state space typically carries an algebraic or geometric structure, and the transition probabilities are in some sense adapted to this structure; compare with the examples in earlier chapters. In this context, one looks for a concrete description of the Martin compactification in terms of that underlying structure. In Chapters 8 and 9, we shall explain some examples; various classes of examples are treated in detail in the book of Woess [W2]. We observe at this point that in all cases where the Martin boundary is known explicitly in this sense, one also knows a simpler and more direct (structure-specific) method than that of the proof of Theorem 7.19 for showing almost sure convergence of the Markov chain to the geometric boundary.
C Supermartingales, superharmonic functions, and excessive measures
This section follows the exposition by Dynkin [Dy], which gives the clearest and most readable account of Martin boundary theory for denumerable Markov chains so far available in the literature (old and good, and certainly not obsolete). The method for proving Theorem 7.22 presented here, which combines the study of non-negative supermartingales with time reversal, goes back to the paper by Hunt [32].

Readers who are already familiar with martingale theory can skip the first part. Also, since our state space $X$ is countable, we can limit ourselves to a very elementary approach to this theory.
I. Non-negative supermartingales
Let $\Omega$ be as in (7.21), with the associated $\sigma$-algebra $\mathcal{A}$ and the probability measure $\Pr$ (one of the measures $\Pr_x$, $x \in X$). Even when $P$ is stochastic, we shall need the additional absorbing state $\dagger$.

Let $Y_0, Y_1, \ldots, Y_N$ be a finite sequence of random variables $\Omega \to X \cup \{\dagger\}$, and let $W_0, W_1, \ldots, W_N$ be a sequence of real valued, composed random variables of the form
$$W_n = \varphi_n(Y_0,\ldots,Y_n), \quad\text{with}\quad \varphi_n : (X \cup \{\dagger\})^{n+1} \to [0,\infty).$$
7.23 Definition. The sequence $W_0,\ldots,W_N$ is called a supermartingale with respect to $Y_0,\ldots,Y_N$, if for each $n \in \{1,\ldots,N\}$ one has
$$\mathbb{E}(W_n \mid Y_0,\ldots,Y_{n-1}) \le W_{n-1} \quad\text{almost surely}.$$

Here, $\mathbb{E}(\,\cdot \mid Y_0,\ldots,Y_{n-1})$ denotes conditional expectation with respect to the $\sigma$-algebra generated by $Y_0,\ldots,Y_{n-1}$. On the practical level, the inequality means that for all $x_0,\ldots,x_{n-1} \in X \cup \{\dagger\}$ one has
$$\sum_{y \in X\cup\{\dagger\}} \varphi_n(x_0,\ldots,x_{n-1},y)\,\Pr[Y_0 = x_0,\ldots,Y_{n-1} = x_{n-1},\ Y_n = y] \;\le\; \varphi_{n-1}(x_0,\ldots,x_{n-1})\,\Pr[Y_0 = x_0,\ldots,Y_{n-1} = x_{n-1}], \tag{7.24}$$
or, equivalently,
$$\mathbb{E}\bigl(W_n\,g(Y_0,\ldots,Y_{n-1})\bigr) \le \mathbb{E}\bigl(W_{n-1}\,g(Y_0,\ldots,Y_{n-1})\bigr) \tag{7.25}$$
for every function $g : (X\cup\{\dagger\})^{n} \to [0,\infty)$.
7.26 Exercise. Verify the equivalence between (7.24) and (7.25). Refresh your
knowledge about conditional expectation by elaborating the equivalence of those
two conditions with the supermartingale property.
Clearly, Definition 7.23 also makes sense when the sequences $Y_n$ and $W_n$ are infinite ($N = \infty$). Setting $g \equiv 1$, it follows from (7.25) that the expectations $\mathbb{E}(W_n)$ form a decreasing sequence. In particular, if $W_0$ is integrable, then so are all $W_n$.
Extending, or specifying, the definition given in Section 1.B, a random variable $\tau$ with values in $\mathbb{N}_0 \cup \{\infty\}$ is called a stopping time with respect to $Y_0,\ldots,Y_N$ (or with respect to the infinite sequence $Y_0, Y_1, \ldots$), if for each integer $n \le N$,
$$[\tau \le n] \in \mathcal{A}(Y_0,\ldots,Y_n),$$
the $\sigma$-algebra generated by $Y_0,\ldots,Y_n$. As usual, $W_\tau$ denotes the random variable defined on the set $\{\omega \in \Omega : \tau(\omega) < \infty\}$ by $W_\tau(\omega) = W_{\tau(\omega)}(\omega)$. If $\tau_1$ and $\tau_2$ are two stopping times with respect to the $Y_n$, then so are $\tau_1 \wedge \tau_2 = \inf\{\tau_1,\tau_2\}$ and $\tau_1 \vee \tau_2 = \sup\{\tau_1,\tau_2\}$.
7.27 Lemma. Let $\sigma$ and $\tau$ be two stopping times with respect to the sequence $(Y_n)$ such that $\sigma \le \tau$. Then
$$\mathbb{E}(W_\sigma) \ge \mathbb{E}(W_\tau).$$
Proof. Suppose first that $W_0$ is integrable and $N$ is finite, so that $\sigma \le \tau \le N$. As usual, we write $\mathbf{1}_A$ for the indicator function of an event $A$. We decompose
$$W_\sigma = \sum_{n=0}^{N} W_n \mathbf{1}_{[\sigma = n]} = W_0\,\mathbf{1}_{[\sigma \le 0]} + \sum_{n=1}^{N} W_n\bigl(\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\sigma \le n-1]}\bigr) = \sum_{n=0}^{N} W_n \mathbf{1}_{[\sigma \le n]} - \sum_{n=0}^{N-1} W_{n+1}\mathbf{1}_{[\sigma \le n]}.$$
We infer that $W_\sigma$ is integrable, since the sum is finite and the $W_n$ are integrable. We decompose $W_\tau$ in the same way and observe that $[\sigma \le N] = [\tau \le N] = \Omega$, so that $\mathbf{1}_{[\sigma \le N]} = \mathbf{1}_{[\tau \le N]}$. We obtain
$$W_\sigma - W_\tau = \sum_{n=0}^{N-1} W_n\bigl(\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\tau \le n]}\bigr) - \sum_{n=0}^{N-1} W_{n+1}\bigl(\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\tau \le n]}\bigr).$$
For each $n$, the random variable $\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\tau \le n]}$ is non-negative and measurable with respect to $\mathcal{A}(Y_0,\ldots,Y_n)$. The latter $\sigma$-algebra is generated by the disjoint events (atoms) $[Y_0 = x_0,\ldots,Y_n = x_n]$ ($x_0,\ldots,x_n \in X\cup\{\dagger\}$), and every $\mathcal{A}(Y_0,\ldots,Y_n)$-measurable function must be constant on each of those sets. (This fact also stands behind the equivalence between (7.24) and (7.25).) Therefore we can write
$$\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\tau \le n]} = g_n(Y_0,\ldots,Y_n),$$
where $g_n$ is a non-negative function on $(X\cup\{\dagger\})^{n+1}$. Now (7.25) implies
$$\mathbb{E}\bigl(W_{n+1}(\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\tau \le n]})\bigr) \le \mathbb{E}\bigl(W_n(\mathbf{1}_{[\sigma \le n]} - \mathbf{1}_{[\tau \le n]})\bigr)$$
for every $n$, whence $\mathbb{E}(W_\sigma - W_\tau) \ge 0$.

If $W_0$ does not have finite expectation, we can apply the preceding inequality to the supermartingale $(W_n \wedge c)_{n=0,\ldots,N}$. By monotone convergence,
$$\mathbb{E}(W_\sigma) = \lim_{c\to\infty}\mathbb{E}\bigl((W \wedge c)_\sigma\bigr) \ge \lim_{c\to\infty}\mathbb{E}\bigl((W \wedge c)_\tau\bigr) = \mathbb{E}(W_\tau).$$
Finally, if the sequence is infinite, we may apply the inequality to the stopping times $\sigma \wedge N$ and $\tau \wedge N$ and use again monotone convergence, this time for $N \to \infty$. $\square$
Let $(r_n)$ be a finite or infinite sequence of real numbers, and let $[a,b]$ be an interval. Then the number of downward crossings of the interval by the sequence is
$$D^{\downarrow}\bigl((r_n) \,\big|\, [a,b]\bigr) = \sup\Bigl\{k \ge 0 : \text{there are } n_1 < n_2 < \cdots < n_{2k} \text{ with } r_{n_i} \ge b \text{ for } i = 1,3,\ldots,2k-1 \text{ and } r_{n_j} \le a \text{ for } j = 2,4,\ldots,2k\Bigr\}.$$
In case the sequence is finite and terminates with $r_N$, one must require that $n_{2k} \le N$ in this definition, and the supremum is a maximum. For an infinite sequence,
$$\lim_{n\to\infty} r_n \in [-\infty,\infty] \text{ exists} \iff D^{\downarrow}\bigl((r_n)\,\big|\,[a,b]\bigr) < \infty \text{ for every interval } [a,b]. \tag{7.28}$$
Indeed, if $\liminf r_n < a < b < \limsup r_n$ then $D^{\downarrow}\bigl((r_n)\,\big|\,[a,b]\bigr) = \infty$. Observe that in this reasoning, it is sufficient to consider only the countably many intervals with rational endpoints.

Analogously, one defines the number of upward crossings of an interval by a sequence (notation: $D^{\uparrow}$).
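The crossing count can be computed by a greedy scan, which realizes the supremum in the definition: wait for a value $\ge b$, then for a value $\le a$, and repeat. The following helper is an illustration added here, not part of the text.

```python
# Greedy scan computing the downward-crossing count D((r_n) | [a,b]):
# alternately wait for a value >= b and then for a value <= a; the
# greedy choice realizes the supremum in the definition.
def downward_crossings(r, a, b):
    count, want_high = 0, True
    for v in r:
        if want_high and v >= b:
            want_high = False
        elif not want_high and v <= a:
            want_high = True
            count += 1
    return count

# A convergent sequence crosses [0.2, 0.8] downwards exactly once ...
assert downward_crossings([1 / (n + 1) for n in range(1000)], 0.2, 0.8) == 1
# ... while an oscillating sequence crosses once per full period,
# in line with criterion (7.28):
assert downward_crossings([1, 0, 1, 0, 1, 0], 0.25, 0.75) == 3
print("crossing counter OK")
```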
7.29 Lemma. Let $(W_n)$ be a non-negative supermartingale with respect to the sequence $(Y_n)$. Then for every interval $[a,b] \subset \mathbb{R}$,
$$\mathbb{E}\Bigl(D^{\downarrow}\bigl((W_n)\,\big|\,[a,b]\bigr)\Bigr) \le \frac{1}{b-a}\,\mathbb{E}(W_0).$$
Proof. We suppose first to have a finite supermartingale $W_0,\ldots,W_N$ ($N < \infty$). We define a sequence of stopping times relative to $Y_0,\ldots,Y_N$, starting with $\tau_0 = 0$. If $n$ is odd,
$$\tau_n = \begin{cases} \min\{i \ge \tau_{n-1} : W_i \ge b\}, & \text{if such } i \text{ exists};\\ N, & \text{otherwise}.\end{cases}$$
If $n > 0$ is even,
$$\tau_n = \begin{cases} \min\{j \ge \tau_{n-1} : W_j \le a\}, & \text{if such } j \text{ exists};\\ N, & \text{otherwise}.\end{cases}$$
Setting $d = D^{\downarrow}\bigl(W_0,\ldots,W_N \,\big|\, [a,b]\bigr)$, we get $\tau_n = N$ for $n \ge 2d+2$. Furthermore, $\tau_n = N$ also for $n \ge N$. We choose an integer $m \ge N/2$ and consider
$$\widetilde W = W_{\tau_1} + \underbrace{\sum_{j=1}^{m}\bigl(W_{\tau_{2j+1}} - W_{\tau_{2j}}\bigr)}_{(1)} = \underbrace{\sum_{i=1}^{d}\bigl(W_{\tau_{2i-1}} - W_{\tau_{2i}}\bigr)}_{(2)} + W_{\tau_{2d+1}}.$$
(We have used the fact that $W_{\tau_{2d+2}} = W_{\tau_{2d+3}} = \cdots = W_{\tau_{2m+1}} = W_N$.) Applying Lemma 7.27 to term (1) gives
$$\mathbb{E}(\widetilde W) = \mathbb{E}(W_{\tau_1}) + \sum_{j=1}^{m}\Bigl(\mathbb{E}(W_{\tau_{2j+1}}) - \mathbb{E}(W_{\tau_{2j}})\Bigr) \le \mathbb{E}(W_{\tau_1}) \le \mathbb{E}(W_0).$$
The expression (2) leads to $\widetilde W \ge (b-a)\,d$ and thus also to
$$\mathbb{E}(\widetilde W) \ge (b-a)\,\mathbb{E}(d).$$
Combining these inequalities, we find that
$$\mathbb{E}\Bigl(D^{\downarrow}\bigl(W_0,\ldots,W_N \,\big|\, [a,b]\bigr)\Bigr) \le \frac{1}{b-a}\,\mathbb{E}(W_0).$$
If the sequences $(Y_n)$ and $(W_n)$ are infinite, we can let $N$ tend to $\infty$ in the latter inequality, and (always by monotone convergence) the proposed statement follows. $\square$
II. Supermartingales and superharmonic functions
As an application of Lemma 7.29, we obtain the limit theorem for non-negative supermartingales.

7.30 Theorem. Let $(W_n)_{n\ge 0}$ be a non-negative supermartingale with respect to $(Y_n)_{n\ge 0}$ such that $\mathbb{E}(W_0) < \infty$. Then there is an integrable (whence almost surely finite) random variable $W_\infty$ such that
$$\lim_{n\to\infty} W_n = W_\infty \quad\text{almost surely}.$$
Proof. Let $[a_i,b_i]$, $i \in \mathbb{N}$, be an enumeration of all intervals with non-negative rational endpoints. For each $i$, let
$$d_i = D^{\downarrow}\bigl((W_n)\,\big|\,[a_i,b_i]\bigr).$$
By Lemma 7.29, each $d_i$ is integrable and thus almost surely finite. Let
$$\widetilde\Omega = \bigcap_{i\in\mathbb{N}} [d_i < \infty].$$
Then $\Pr(\widetilde\Omega) = 1$, and for every $\omega \in \widetilde\Omega$, each interval $[a_i,b_i]$ is crossed downwards only finitely many times by $\bigl(W_n(\omega)\bigr)$. By (7.28), there is $W_\infty = \lim_n W_n \in [0,\infty]$ almost surely. By Fatou's lemma, $\mathbb{E}(W_\infty) \le \liminf_n \mathbb{E}(W_n) \le \mathbb{E}(W_0) < \infty$. Consequently, $W_\infty$ is a.s. finite. $\square$
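Lemma 7.29 and Theorem 7.30 can be observed in a small Monte Carlo experiment on an assumed example (not from the book): the symmetric random walk on the non-negative integers, started at 5 and absorbed at 0, is a non-negative martingale with $\mathbb{E}(W_0) = 5$, so the expected number of downward crossings of $[1,3]$ is at most $5/(3-1) = 5/2$; moreover each trajectory converges (here to 0).

```python
import random

# Monte Carlo check of the downcrossing bound of Lemma 7.29 on the
# symmetric walk on {0,1,2,...} started at 5 and absorbed at 0
# (an assumed example): E(D((W_n) | [1,3])) <= E(W_0)/(b-a) = 2.5.
random.seed(7)

def downcrossings(seq, a, b):
    count, want_high = 0, True
    for v in seq:
        if want_high and v >= b:
            want_high = False
        elif not want_high and v <= a:
            want_high, count = True, count + 1
    return count

def sample_crossings(steps=10000):
    w, path = 5, [5]
    for _ in range(steps):      # truncate very long trajectories;
        if w == 0:              # truncation can only lower the count
            break
        w += random.choice((-1, 1))
        path.append(w)
    return downcrossings(path, 1, 3)

mean = sum(sample_crossings() for _ in range(400)) / 400
assert mean <= 2.5              # the bound of Lemma 7.29
print("observed mean number of downcrossings:", round(mean, 2))
```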
This theorem applies to positive superharmonic functions. Consider the Markov chain $(Z_n)_{n\ge 0}$. Recall that $\Pr = \Pr_x$ for some $x \in X$. Let $u : X \to \mathbb{R}$ be a non-negative function. We extend $u$ to $X \cup \{\dagger\}$ by setting $u(\dagger) = 0$. By (7.24), the sequence of real-valued random variables $\bigl(u(Z_n)\bigr)_{n\ge 0}$ is a supermartingale with respect to $(Z_n)_{n\ge 0}$ if and only if for every $n$ and all $x_0,\ldots,x_{n-1}$
$$\sum_{y\in X\cup\{\dagger\}} \delta_x(x_0)\,p(x_0,x_1)\cdots p(x_{n-2},x_{n-1})\,p(x_{n-1},y)\,u(y) \;\le\; \delta_x(x_0)\,p(x_0,x_1)\cdots p(x_{n-2},x_{n-1})\,u(x_{n-1}),$$
that is, if and only if $u$ is superharmonic.
7.31 Corollary. If $u \in \mathcal{S}^+(X,P)$ then $\lim_{n\to\infty} u(Z_n)$ exists and is $\Pr_x$-almost surely finite for every $x \in X$.
If $P$ is strictly substochastic at some point, then the probability that $Z_n$ dies (becomes absorbed by $\dagger$) is positive, and on the corresponding set $\Omega_\dagger$ of trajectories, $u(Z_n)$ tends to 0. What is interesting for us is that in any case, the set of trajectories in $X^{\mathbb{N}_0}$ along which $u(Z_n)$ does not converge has measure 0.
7.32 Exercise. Prove that for all $x, y \in X$,
$$\lim_{n\to\infty} G(Z_n,y) = 0 \quad \text{$\Pr_x$-almost surely}.$$
III. Supermartingales and excessive measures
A specific example of a positive superharmonic function is $K(\cdot,y)$, where $y \in X$. Therefore $K(Z_n,y)$ converges almost surely, and the limit is 0 by Exercise 7.32. However, in order to prove Theorem 7.22, we must verify instead that $\Pr_x(\Omega_{\mathfrak{e}}) = 1$, or equivalently, that $\lim_{n\to\mathfrak{e}} K(y,Z_n)$ exists $\Pr_x$-almost surely for all $x, y \in X$. We shall first prove this with respect to $\Pr_o$ (where the starting point is the same origin $o$ as in the definition of the Martin kernel).

Recall that we are thinking of functions on $X$ as column vectors and of measures as row vectors. In particular, $G(x,\cdot)$ is a measure on $X$. Now let $\mu$ be an arbitrary probability measure on $X$. We observe that
$$\mu G(y) = \sum_{x}\mu(x)\,G(x,y) \le \sum_{x}\mu(x)\,G(y,y) = G(y,y)$$
is finite for every $y$. Then $\nu = \mu G$ is an excessive measure by the dual of Lemma 6.42 (b). Furthermore, we can write
$$\mu G(y) = \varphi(y)\,G(o,y), \quad\text{where}\quad \varphi(y) = \sum_{x}\mu(x)\,K(x,y). \tag{7.33}$$
We may consider the function $\varphi$ as the density of the excessive measure $\mu G$ with respect to the measure $G(o,\cdot)$. We extend $\varphi$ to $X \cup \{\dagger\}$ by setting $\varphi(\dagger) = 0$.
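On a finite strictly substochastic chain (the same illustrative matrix as in the earlier sketch; an assumption, not the book's example), both (7.33) and the excessivity of $\nu = \mu G$ can be verified directly, since $\mu G P = \mu(G - I) = \mu G - \mu \le \mu G$:

```python
import numpy as np

# Direct check of (7.33) and of excessivity of nu = mu*G on an
# illustrative strictly substochastic chain (an assumed example).
P = np.array([[0.0, 0.5, 0.3, 0.0],
              [0.2, 0.0, 0.4, 0.3],
              [0.1, 0.3, 0.0, 0.4],
              [0.0, 0.2, 0.3, 0.4]])
o = 0
G = np.linalg.inv(np.eye(4) - P)
K = G / G[o, :]

mu = np.array([0.25, 0.25, 0.25, 0.25])   # arbitrary probability measure (row vector)
nu = mu @ G                               # the measure mu*G
phi = mu @ K                              # phi(y) = sum_x mu(x) K(x,y)

assert np.allclose(nu, phi * G[o, :])     # (7.33): mu*G(y) = phi(y) G(o,y)
# nu is excessive: nu*P = mu*(G - I) = nu - mu <= nu componentwise.
assert np.allclose(nu @ P, nu - mu)
print("phi =", np.round(phi, 3))
```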
We choose a finite subset $V \subset X$ that contains the origin $o$. As above, we define the exit time of $V$:
$$\mathfrak{e}_V = \sup\{n : Z_n \in V\}. \tag{7.34}$$
Contrary to $\mathfrak{e} = \mathfrak{t}^\dagger - 1$, this is not a stopping time, as the property that $\mathfrak{e}_V = k$ requires that $Z_n \notin V$ for all $n > k$. Since our Markov chain is transient and $o \in V$,
$$\Pr_o[0 \le \mathfrak{e}_V < \infty] = 1,$$
that is, $\Pr_o(\Omega_V) = 1$, where $\Omega_V = \{\omega \in \Omega : 0 \le \mathfrak{e}_V(\omega) < \infty\}$. Observe that for $x \notin V$, it can occur with positive probability that $(Z_n)$ never enters $V$, in which case $\mathfrak{e}_V = -\infty$; thus $\mathfrak{e}_V$ is finite only for those trajectories starting from $x$ that visit $V$.
Given an arbitrary interval $[a,b]$, we want to control the number of its upward crossings by the sequence $\varphi(Z_0),\ldots,\varphi(Z_{\mathfrak{e}_V})$, where $\varphi$ is as in (7.33). Note that this sequence has a random length that is almost surely finite. For $n \ge 0$ and $\omega \in \Omega_V$ we set
$$Z_{\mathfrak{e}_V - n}(\omega) = \begin{cases} Z_{\mathfrak{e}_V(\omega)-n}(\omega), & n \le \mathfrak{e}_V(\omega);\\ \dagger, & n > \mathfrak{e}_V(\omega).\end{cases}$$
Then with probability 1 (that is, on the event $[\mathfrak{e}_V < \infty]$)
$$D^{\uparrow}\bigl(\varphi(Z_0),\varphi(Z_1),\ldots,\varphi(Z_{\mathfrak{e}_V}) \,\big|\, [a,b]\bigr) = \lim_{N\to\infty} D^{\downarrow}\bigl(\varphi(Z_{\mathfrak{e}_V}),\varphi(Z_{\mathfrak{e}_V-1}),\ldots,\varphi(Z_{\mathfrak{e}_V-N}) \,\big|\, [a,b]\bigr). \tag{7.35}$$
With these ingredients we obtain the following.

7.36 Proposition. (1) $\mathbb{E}_o\bigl(\varphi(Z_{\mathfrak{e}_V})\bigr) \le 1$.

(2) $\varphi(Z_{\mathfrak{e}_V}), \varphi(Z_{\mathfrak{e}_V-1}), \ldots, \varphi(Z_{\mathfrak{e}_V-N})$ is a supermartingale with respect to $Z_{\mathfrak{e}_V}, Z_{\mathfrak{e}_V-1}, \ldots, Z_{\mathfrak{e}_V-N}$ and the measure $\Pr_o$ on the trajectory space.
Proof. First of all, we compute, for $x, y \in V$,
$$\Pr_x[Z_{\mathfrak{e}_V} = y] = \sum_{n=0}^{\infty}\Pr_x[\mathfrak{e}_V = n,\ Z_n = y].$$
(Note that more precisely, we should write $\Pr_x[0 \le \mathfrak{e}_V < \infty,\ Z_{\mathfrak{e}_V} = y]$ in the place of $\Pr_x[Z_{\mathfrak{e}_V} = y]$.) The event $[\mathfrak{e}_V = n]$ depends only on those $Z_k$ with $k \ge n$ (the "future"), and not on $Z_0,\ldots,Z_{n-1}$ (the "past"). Therefore $\Pr_x[\mathfrak{e}_V = n,\ Z_n = y] = p^{(n)}(x,y)\,\Pr_y[\mathfrak{e}_V = 0]$, and
$$\Pr_x[Z_{\mathfrak{e}_V} = y] = G(x,y)\,\Pr_y[\mathfrak{e}_V = 0]. \tag{7.37}$$
Taking into account that $0 \le \mathfrak{e}_V < \infty$ $\Pr_o$-almost surely, we now get
$$\mathbb{E}_o\bigl(\varphi(Z_{\mathfrak{e}_V})\bigr) = \sum_{y\in V}\varphi(y)\,\Pr_o[Z_{\mathfrak{e}_V} = y] = \sum_{y\in V}\varphi(y)\,G(o,y)\,\Pr_y[\mathfrak{e}_V = 0] = \sum_{y\in V}\mu G(y)\,\Pr_y[\mathfrak{e}_V = 0] = \sum_{x\in X}\mu(x)\sum_{y\in V}G(x,y)\,\Pr_y[\mathfrak{e}_V = 0] = \sum_{x\in X}\mu(x)\sum_{y\in V}\Pr_x[Z_{\mathfrak{e}_V} = y] \le 1.$$
This proves (1). To verify (2), we first compute, for $x_0 \in V$ and $x_1,\ldots,x_n \in X$,
$$\Pr_o[Z_{\mathfrak{e}_V} = x_0,\ Z_{\mathfrak{e}_V-1} = x_1,\ \ldots,\ Z_{\mathfrak{e}_V-n} = x_n]$$
(since we have $\mathfrak{e}_V \ge n$ if $Z_{\mathfrak{e}_V-n} \ne \dagger$)
$$= \sum_{k=n}^{\infty}\Pr_o[\mathfrak{e}_V = k,\ Z_k = x_0,\ Z_{k-1} = x_1,\ \ldots,\ Z_{k-n} = x_n] = \sum_{k=n}^{\infty} p^{(k-n)}(o,x_n)\,p(x_n,x_{n-1})\cdots p(x_1,x_0)\,\Pr_{x_0}[\mathfrak{e}_V = 0] = G(o,x_n)\,p(x_n,x_{n-1})\cdots p(x_1,x_0)\,\Pr_{x_0}[\mathfrak{e}_V = 0].$$
We now check (7.24) with $\varphi_n(x_0,\ldots,x_n) = \varphi(x_n)$. Since $\varphi(\dagger) = 0$, we only have to sum over elements $x \in X$. Furthermore, if $Y_n = Z_{\mathfrak{e}_V-n} \in X$ then we must have $\mathfrak{e}_V \ge n$ and $Z_{\mathfrak{e}_V-k} \in X$ for $k = 0,\ldots,n$. Hence it is sufficient to consider only the case when $x_0,\ldots,x_{n-1} \in X$:
$$\sum_{x\in X}\varphi(x)\,\Pr_o[Z_{\mathfrak{e}_V} = x_0,\ \ldots,\ Z_{\mathfrak{e}_V-n+1} = x_{n-1},\ Z_{\mathfrak{e}_V-n} = x]
= \sum_{x\in X}\varphi(x)\,G(o,x)\,p(x,x_{n-1})\cdots p(x_1,x_0)\,\Pr_{x_0}[\mathfrak{e}_V = 0]
= \Bigl(\sum_{x\in X}\mu G(x)\,p(x,x_{n-1})\Bigr)\,p(x_{n-1},x_{n-2})\cdots p(x_1,x_0)\,\Pr_{x_0}[\mathfrak{e}_V = 0]
\le \mu G(x_{n-1})\,p(x_{n-1},x_{n-2})\cdots p(x_1,x_0)\,\Pr_{x_0}[\mathfrak{e}_V = 0]
= \varphi(x_{n-1})\,\Pr_o[Z_{\mathfrak{e}_V} = x_0,\ \ldots,\ Z_{\mathfrak{e}_V-n+1} = x_{n-1}].$$
In the inequality we have used that $\mu G$ is an excessive measure. $\square$
This proposition provides the main tool for the proof of Theorem 7.22.
7.38 Corollary.
$$\mathbb{E}_o\Bigl(D^{\uparrow}\bigl((\varphi(Z_n))_{n\le\mathfrak{e}} \,\big|\, [a,b]\bigr)\Bigr) \le \frac{1}{b-a}.$$

Proof. If $V \subset X$ is finite, then (7.35), Lemma 7.29 and Proposition 7.36 imply, by virtue of the monotone convergence theorem, that
$$\mathbb{E}_o\Bigl(D^{\uparrow}\bigl(\varphi(Z_0),\varphi(Z_1),\ldots,\varphi(Z_{\mathfrak{e}_V}) \,\big|\, [a,b]\bigr)\Bigr) \le \frac{1}{b-a}\,\mathbb{E}_o\bigl(\varphi(Z_{\mathfrak{e}_V})\bigr) \le \frac{1}{b-a}.$$
We choose a sequence of finite subsets $V_k$ of $X$ containing $o$, such that $V_k \subset V_{k+1}$ and $\bigcup_k V_k = X$. Then $\lim_{k\to\infty}\mathfrak{e}_{V_k} = \mathfrak{e}$, and
$$\lim_{k\to\infty} D^{\uparrow}\Bigl(\bigl(\varphi(Z_n)\bigr)_{n\le\mathfrak{e}_{V_k}} \,\big|\, [a,b]\Bigr) = D^{\uparrow}\Bigl(\bigl(\varphi(Z_n)\bigr)_{n\le\mathfrak{e}} \,\big|\, [a,b]\Bigr).$$
Using monotone convergence once more, the result follows. $\square$
IV. Proof of the boundary convergence theorem
Now we can finally prove Theorem 7.22. We have to prove the statements (a), (b), (c) listed after the theorem. We start with (a), $\Omega_{\mathfrak{e}} \in \mathcal{A}$, whose proof is a standard exercise in measure theory. (In fact, we have previously omitted such detailed considerations on several occasions, but it is good to go through this explicitly on at least one occasion.)

It is clear that $\Omega_\dagger \in \mathcal{A}$, since it is a countable union of basic cylinder sets. On the other hand, we now prove that $\Omega_\infty$ can be obtained by countably many intersections and unions, starting with cylinder sets in $\mathcal{A}$. First of all, $\Omega_\infty = \bigcap_x \Omega_x$, where
$$\Omega_x = \bigl\{\omega = (x_n) \in X^{\mathbb{N}_0} : \lim_{n\to\infty} K(x,x_n) \text{ exists in } \mathbb{R}\bigr\}.$$
Now $\bigl(K(x,x_n)\bigr)$ is a bounded sequence, whence by (7.28)
$$\Omega_x = \bigcap_{[a,b] \text{ rational}} \Lambda_x([a,b]), \quad\text{where}\quad \Lambda_x([a,b]) = \bigl\{(x_n) : D^{\downarrow}\bigl((K(x,x_n))\,\big|\,[a,b]\bigr) < \infty\bigr\}.$$
We show that $X^{\mathbb{N}_0} \setminus \Lambda_x([a,b]) \in \mathcal{A}$ for any fixed interval $[a,b]$: this is
$$\bigl\{(x_n) : D^{\downarrow}\bigl((K(x,x_n))\,\big|\,[a,b]\bigr) = \infty\bigr\} = \bigcap_{k}\ \bigcup_{l,m \ge k}\Bigl(\{(x_n) : K(x,x_l) \ge b\} \cap \{(x_n) : K(x,x_m) \le a\}\Bigr).$$
Each set $\{(x_n) : K(x,x_l) \ge b\}$ depends only on the value $K(x,x_l)$ and is the union of all cylinder sets of the form $C(y_0,\ldots,y_l)$ with $K(x,y_l) \ge b$. Analogously, $\{(x_n) : K(x,x_m) \le a\}$ is the union of all cylinder sets $C(y_0,\ldots,y_m)$ with $K(x,y_m) \le a$.
(b) We set $\mu = \delta_x$ and apply Corollary 7.38: then $\varphi(y) = K(x,y)$, whence
$$\lim_{n\to\mathfrak{e}} K(x,Z_n) \ \text{ exists $\Pr_o$-almost surely for each } x.$$
Therefore, $(Z_n)$ converges $\Pr_o$-almost surely in the topology of $\hat X(P)$. In other terms, $\Pr_o(\Omega_{\mathfrak{e}}) = 1$. In order to see that the initial point $o$ can be replaced with any $x_0 \in X$, we use irreducibility. There are $k \ge 0$ and $y_1,\ldots,y_{k-1} \in X$ such that
$$p(o,y_1)\,p(y_1,y_2)\cdots p(y_{k-1},x_0) > 0.$$
Therefore
$$p(o,y_1)\,p(y_1,y_2)\cdots p(y_{k-1},x_0)\,\Pr_{x_0}(\Omega\setminus\Omega_{\mathfrak{e}}) = p(o,y_1)\,p(y_1,y_2)\cdots p(y_{k-1},x_0)\,\Pr_{x_0}\bigl(C(x_0)\cap(\Omega\setminus\Omega_{\mathfrak{e}})\bigr) = \Pr_o\bigl(C(o,y_1,\ldots,y_{k-1},x_0)\cap(\Omega\setminus\Omega_{\mathfrak{e}})\bigr) = 0.$$
We infer that $\Pr_{x_0}(\Omega\setminus\Omega_{\mathfrak{e}}) = 0$.
(c) Since $X$ is discrete in the topology of $\hat X(P)$, it is clear that the restriction of $Z_{\mathfrak{e}}$ to $\Omega_\dagger$ is measurable. We prove that also the restriction to $\Omega_\infty$ is measurable with respect to the Borel $\sigma$-algebra on $\mathcal{M}$. In view of the construction of the Martin boundary (see in particular the approach using the completion of the metric (7.15) on $X$), a base of the topology is given by the collection of all finite intersections of sets of the form
$$B_{x,\xi,\varepsilon} = \bigl\{\eta \in \mathcal{M} : |K(x,\eta) - K(x,\xi)| < \varepsilon\bigr\},$$
where $x \in X$, $\xi \in \mathcal{M}$ and $\varepsilon > 0$ vary. We prove that $Z_{\mathfrak{e}}^{-1}[B_{x,\xi,\varepsilon}] \in \mathcal{A}$ for each of those sets. Write $c = K(x,\xi)$. Then
$$Z_{\mathfrak{e}}^{-1}[B_{x,\xi,\varepsilon}] = \bigl\{\omega = (x_n) \in \Omega_\infty : |K(x,x_\infty) - c| < \varepsilon\bigr\} = \Bigl\{\omega = (x_n) \in \Omega_\infty : \bigl|\lim_{n\to\infty} K(x,x_n) - c\bigr| < \varepsilon\Bigr\}.$$

7.39 Exercise. Prove in analogy with (a) that the latter set belongs to the $\sigma$-algebra $\mathcal{A}$.
This concludes the proof of Theorem 7.22. $\square$

Theorem 7.19 follows immediately from Theorem 7.22. Indeed, if $P$ is stochastic then $\Pr_x(\Omega_\dagger) = 0$ for every $x \in X$, and $\mathfrak{e} = \infty$. In this case, adding $\dagger$ to the state space is not needed and is just convenient for the technical details of the proofs.
D The Poisson–Martin integral representation theorem

Theorems 7.19 and 7.22 show that the Martin compactification $\hat X = \hat X(P)$ provides a geometric model for the limit points of the Markov chain (always considering the transient case). With respect to each starting point $x \in X$, we consider the distribution $\nu_x$ of the random variable $Z_{\mathfrak{e}}$: for a Borel set $B \subset \hat X$,
$$\nu_x(B) = \Pr_x[Z_{\mathfrak{e}} \in B].$$
D. The PoissonMartin integral representation theorem 201
In particular, if $y \in X$, then (7.37) (with $V = X$) yields
$$\nu_x(y) = G(x,y)\Bigl(1 - \sum_{w\in X} p(y,w)\Bigr) = G(x,y)\,p(y,\dagger). \tag{7.40}$$
If $f : \hat X \to \mathbb{R}$ is $\nu_x$-integrable then
$$\mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\bigr) = \int_{\hat X} f\,d\nu_x. \tag{7.41}$$
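Formula (7.40) is easy to test by simulation on the illustrative substochastic chain used in the earlier sketches (an assumption, not the book's example): there the chain dies almost surely, so $\nu_x$ is carried by $X$ and is the distribution of the last state visited before absorption.

```python
import numpy as np

# Monte Carlo check of (7.40) on an illustrative strictly substochastic
# chain (assumed example).  The chain dies a.s., so nu_x is a probability
# measure on X, with nu_x(y) = G(x,y) * p(y,dagger),
# where p(y,dagger) = 1 - sum_w p(y,w) is the defect of row y.
rng = np.random.default_rng(0)
P = np.array([[0.0, 0.5, 0.3, 0.0],
              [0.2, 0.0, 0.4, 0.3],
              [0.1, 0.3, 0.0, 0.4],
              [0.0, 0.2, 0.3, 0.4]])
G = np.linalg.inv(np.eye(4) - P)
defect = 1.0 - P.sum(axis=1)              # p(y, dagger)
nu_theory = G[0, :] * defect              # nu_0, for starting point x = 0
cum = np.cumsum(P, axis=1)                # cumulative rows for sampling

counts = np.zeros(4)
trials = 20000
for _ in range(trials):
    x = 0
    while True:
        u = rng.random()
        if u >= cum[x, -1]:               # absorbed at dagger: Z_e = x
            counts[x] += 1
            break
        x = int(np.searchsorted(cum[x], u, side="right"))

assert abs(nu_theory.sum() - 1.0) < 1e-9  # the chain dies a.s.
assert np.allclose(counts / trials, nu_theory, atol=0.02)
print(np.round(counts / trials, 3), "vs", np.round(nu_theory, 3))
```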
7.42 Theorem. The measure $\nu_x$ is absolutely continuous with respect to $\nu_o$, and (a realization of) its Radon–Nikodym density is given by $d\nu_x/d\nu_o = K(x,\cdot)$. Namely, if $B \subset \hat X$ is a Borel set then
$$\nu_x(B) = \int_B K(x,\cdot)\,d\nu_o.$$
Proof. As above, let $V$ be a finite subset of $X$ and $\mathfrak{e}_V$ the exit time from $V$. We assume that $o, x \in V$. Applying formula (7.37) once with starting point $x$ and once with starting point $o$, we find
$$\Pr_x[Z_{\mathfrak{e}_V} = y] = K(x,y)\,\Pr_o[Z_{\mathfrak{e}_V} = y]$$
for every $y \in V$. Let $f : \hat X \to \mathbb{R}$ be a continuous function. Then
$$\mathbb{E}_x\bigl(f(Z_{\mathfrak{e}_V})\bigr) = \sum_{y\in V} f(y)\,\Pr_x[Z_{\mathfrak{e}_V} = y] = \sum_{y\in V} f(y)\,K(x,y)\,\Pr_o[Z_{\mathfrak{e}_V} = y] = \mathbb{E}_o\bigl(f(Z_{\mathfrak{e}_V})\,K(x,Z_{\mathfrak{e}_V})\bigr).$$
We now take, as above, an increasing sequence of finite sets $V_k$ with limit (union) $X$. Then $\lim_k Z_{\mathfrak{e}_{V_k}} = Z_{\mathfrak{e}}$ almost surely with respect to $\Pr_x$ and $\Pr_o$. Since $f$ and $K(x,\cdot)$ are continuous functions on the compact set $\hat X$, Lebesgue's dominated convergence theorem implies that one can exchange limit and expectation. Thus
$$\mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\bigr) = \mathbb{E}_o\bigl(f(Z_{\mathfrak{e}})\,K(x,Z_{\mathfrak{e}})\bigr),$$
that is,
$$\int_{\hat X} f(\xi)\,d\nu_x(\xi) = \int_{\hat X} f(\xi)\,K(x,\xi)\,d\nu_o(\xi)$$
for every continuous function $f : \hat X \to \mathbb{R}$. Since the indicator functions of open sets in $\hat X$ can be approximated by continuous functions, it follows that
$$\nu_x(B) = \int_B K(x,\cdot)\,d\nu_o$$
for every open set $B$. But the open sets generate the Borel $\sigma$-algebra, and the result follows. $\square$

(For any measure $\nu$, we always write $\nu(u) = \nu(\{u\})$ for the mass of a singleton, as in (7.40).)

(We can also use the following reasoning: two Borel measures on a compact metric space coincide if and only if the integrals of all continuous functions coincide. For more details regarding Borel measures on metric spaces, see for example the book by Parthasarathy [Pa].)
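On the illustrative finite substochastic chain (assumed, as before), Theorem 7.42 reduces to an exact matrix identity: $\nu_x(y) = G(x,y)\,p(y,\dagger)$, so the density $\nu_x(y)/\nu_o(y)$ is precisely the Martin kernel $K(x,y)$.

```python
import numpy as np

# Theorem 7.42 on an illustrative finite substochastic chain (assumed
# example): nu_x(y) = G(x,y) p(y,dagger), hence the Radon-Nikodym
# density nu_x(y)/nu_o(y) equals K(x,y) exactly.
P = np.array([[0.0, 0.5, 0.3, 0.0],
              [0.2, 0.0, 0.4, 0.3],
              [0.1, 0.3, 0.0, 0.4],
              [0.0, 0.2, 0.3, 0.4]])
o = 0
G = np.linalg.inv(np.eye(4) - P)
K = G / G[o, :]
defect = 1.0 - P.sum(axis=1)

nu = G * defect                  # nu[x, y] = nu_x(y) (defect broadcast over columns)
density = nu / nu[o, :]          # (d nu_x / d nu_o)(y)
assert np.allclose(density, K)
print("d(nu_x)/d(nu_o) = K(x,.) verified")
```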
We observe that by irreducibility, also $\nu_o$ is absolutely continuous with respect to $\nu_x$, with Radon–Nikodym density $1/K(x,\cdot)$. Thus, all the limit measures $\nu_x$, $x \in X$, are mutually absolutely continuous. We add another useful proposition involving the measures $\nu_x$.
7.43 Proposition. If $f : \hat X \to \mathbb{R}$ is a continuous function then
$$\mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\bigr) = \sum_{y\in X} f(y)\,\nu_x(y) + \lim_{n\to\infty} P^n f(x) = \sum_{y\in X} f(y)\,G(x,y)\,p(y,\dagger) + \lim_{n\to\infty} P^n f(x).$$
Proof. We decompose
$$\mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\bigr) = \mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\,\mathbf{1}_{\Omega_\dagger}\bigr) + \mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\,\mathbf{1}_{\Omega_\infty}\bigr).$$
The first term can be rewritten as
$$\mathbb{E}_x\bigl(f(Z_{\mathfrak{e}})\,\mathbf{1}_{[\mathfrak{e}<\infty]}\bigr) = \sum_{y\in X} f(y)\,\Pr_x[\mathfrak{e} < \infty,\ Z_{\mathfrak{e}} = y] = \sum_{y\in X} f(y)\,\nu_x(y).$$
Using continuity of $f$ and dominated convergence, the second term can be written as
$$\lim_{n\to\infty}\mathbb{E}_x\bigl(f(Z_n)\,\mathbf{1}_{[\mathfrak{e}\ge n]}\bigr).$$
On the set $[\mathfrak{e} \ge n]$ we have $Z_k \in X$ for each $k \le n$. Hence
$$\mathbb{E}_x\bigl(f(Z_n)\,\mathbf{1}_{[\mathfrak{e}\ge n]}\bigr) = \sum_{y\in X} f(y)\,\Pr_x[Z_n = y] = P^n f(x).$$
Combining these relations, we obtain the first of the proposed identities. The second one follows from (7.40). $\square$
The support of a (non-negative) Borel measure $\nu$ is the set
$$\operatorname{supp}(\nu) = \{\xi : \nu(V) > 0 \text{ for every neighbourhood } V \text{ of } \xi\}.$$
If we set $h = \int_{\hat X} K(\cdot,\xi)\,d\nu(\xi)$ then
$$Ph = \int_{\hat X} PK(\cdot,\xi)\,d\nu(\xi). \tag{7.44}$$
Indeed, if $P$ has finite range at $x \in X$ then $Ph(x) = \sum_y p(x,y)\,h(y)$ is a finite sum which can be exchanged with the integral. Otherwise, we choose an enumeration $y_k$, $k = 1,2,\ldots$, of the $y$ with $p(x,y) > 0$. Then
$$\sum_{k=1}^{n} p(x,y_k)\,h(y_k) = \int_{\hat X}\sum_{k=1}^{n} p(x,y_k)\,K(y_k,\xi)\,d\nu(\xi)$$
for every $n$. Using monotone convergence as $n \to \infty$, we get (7.44). In particular, $h$ is a superharmonic function.
We have now arrived at the point where we can prove the second main theorem of Martin boundary theory, after the one concerning convergence to the boundary.

7.45 Theorem (Poisson–Martin integral representation). Let $(X,P)$ be substochastic, irreducible and transient, with Martin compactification $\hat X$ and Martin boundary $\mathcal{M}$. Then for every function $h \in \mathcal{S}^+(X,P)$ there is a Borel measure $\nu_h$ on $\hat X$ such that
$$h(x) = \int_{\hat X} K(x,\xi)\,d\nu_h(\xi) \quad\text{for every } x \in X.$$
If $h$ is harmonic then $\operatorname{supp}(\nu_h) \subset \mathcal{M}$.
Proof. We exclude the trivial case $h \equiv 0$. Then we know (from the minimum principle) that $h(x) > 0$ for every $x$, and we can consider the $h$-process (7.7). By (7.8), the Martin kernel associated with $P_h$ is $K_h(x,y) = K(x,y)\,h(o)/h(x)$. In view of the properties that characterize the Martin compactification, we see that $\hat X(P_h) = \hat X(P)$, and that for every $x \in X$
$$K_h(x,\cdot) = K(x,\cdot)\,\frac{h(o)}{h(x)} \quad\text{on } \hat X. \tag{7.46}$$
Let $\tilde\nu_x$ be the distribution of $Z_{\mathfrak{e}}$ with respect to the $h$-process with starting point $x$, that is, $\tilde\nu_x(B) = \Pr^h_x[Z_{\mathfrak{e}} \in B]$. [At this point, we recall once more that when working with the trajectory space, the mappings $Z_n$ and $Z_{\mathfrak{e}}$ defined on the latter do not change when we consider a modified process; what changes is the probability measure on the trajectory space.] We apply Theorem 7.42 to the $h$-process, setting $B = \hat X$, and use the fact that $K_h(x,\cdot) = d\tilde\nu_x/d\tilde\nu_o$:
$$1 = \tilde\nu_x(\hat X) = \int_{\hat X} K_h(x,\cdot)\,d\tilde\nu_o.$$
We set $\nu_h = h(o)\,\tilde\nu_o$ and multiply by $h(x)$. Then (7.46) implies the proposed integral representation $h(x) = \int_{\hat X} K(x,\cdot)\,d\nu_h$.

Let now $h$ be a harmonic function. Suppose that $y \in \operatorname{supp}(\nu_h)$ for some $y \in X$. Since $X$ is discrete in $\hat X$, we must have $\nu_h(y) > 0$. We can decompose $\nu_h = a\,\delta_y + \nu'$, where $a = \nu_h(y)$ and $\operatorname{supp}(\nu') \subset \hat X \setminus \{y\}$. Then we get $h(x) = a\,K(x,y) + h'(x)$, where $h'(x) = \int_{\hat X} K(x,\cdot)\,d\nu'$. But $K(\cdot,y)$ and $h'$ are superharmonic functions, and the first of the two is strictly superharmonic at $y$. Therefore also $h$ must be strictly superharmonic at $y$, a contradiction. $\square$
The proof has provided us with a natural choice for the measure $\nu_h$ in the integral representation: for a Borel set $B \subset \hat X$,
$$\nu_h(B) = h(o)\,\Pr^h_o[Z_{\mathfrak{e}} \in B]. \tag{7.47}$$
7.48 Lemma. Let $h_1, h_2$ be two strictly positive superharmonic functions, let $a_1, a_2 > 0$ and $h = a_1 h_1 + a_2 h_2$. Then $\nu_h = a_1 \nu_{h_1} + a_2 \nu_{h_2}$.
Proof. Let $o = x_0$ and $x_1, \ldots, x_k \in X \cup \{\dagger\}$. Then, by construction of the $h$-process,
$$h(o)\,\Pr^h_o[Z_0 = x_0,\ldots,Z_k = x_k] = h(o)\,\frac{p(x_0,x_1)\,h(x_1)}{h(x_0)}\cdots\frac{p(x_{k-1},x_k)\,h(x_k)}{h(x_{k-1})} = p(x_0,x_1)\cdots p(x_{k-1},x_k)\,h(x_k)$$
$$= a_1\,p(x_0,x_1)\cdots p(x_{k-1},x_k)\,h_1(x_k) + a_2\,p(x_0,x_1)\cdots p(x_{k-1},x_k)\,h_2(x_k) = a_1 h_1(o)\,\Pr^{h_1}_o[Z_0 = x_0,\ldots,Z_k = x_k] + a_2 h_2(o)\,\Pr^{h_2}_o[Z_0 = x_0,\ldots,Z_k = x_k].$$
We see that the identity between measures
$$h(o)\,\Pr^h_o = a_1 h_1(o)\,\Pr^{h_1}_o + a_2 h_2(o)\,\Pr^{h_2}_o$$
is valid on all cylinder sets, and therefore on the whole $\sigma$-algebra $\mathcal{A}$. If $B$ is a Borel set in $\hat X$ then we get
$$\nu_h(B) = h(o)\,\Pr^h_o[Z_{\mathfrak{e}} \in B] = a_1 h_1(o)\,\Pr^{h_1}_o[Z_{\mathfrak{e}} \in B] + a_2 h_2(o)\,\Pr^{h_2}_o[Z_{\mathfrak{e}} \in B] = a_1 \nu_{h_1}(B) + a_2 \nu_{h_2}(B),$$
as proposed. $\square$
7.49 Exercise. Let $h \in \mathcal{S}^+(X,P)$. Use (7.8), (7.40) and (7.47) to show that
$$\nu_h(y) = G(o,y)\bigl(h(y) - Ph(y)\bigr).$$
In particular, let $y \in X$ and $h = K(\cdot,y)$. Show that $\nu_h = \delta_y$.
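The formula of the exercise can be confirmed numerically on the illustrative substochastic chain used in the earlier sketches (an assumption, not the book's example): taking $h = K(\cdot,y_0)$, the measure $\nu_h(y) = G(o,y)\bigl(h(y) - Ph(y)\bigr)$ comes out as the point mass $\delta_{y_0}$.

```python
import numpy as np

# Exercise 7.49, checked numerically on an illustrative strictly
# substochastic chain (assumed example): for h = K(.,y0) one gets
# nu_h(y) = G(o,y)(h(y) - Ph(y)) = delta_{y0}(y).
P = np.array([[0.0, 0.5, 0.3, 0.0],
              [0.2, 0.0, 0.4, 0.3],
              [0.1, 0.3, 0.0, 0.4],
              [0.0, 0.2, 0.3, 0.4]])
o, y0 = 0, 2
G = np.linalg.inv(np.eye(4) - P)
K = G / G[o, :]

h = K[:, y0]                           # the superharmonic function K(.,y0)
nu_h = G[o, :] * (h - P @ h)           # nu_h(y) = G(o,y)(h(y) - Ph(y))
assert np.allclose(nu_h, np.eye(4)[y0])
print("nu_h =", np.round(nu_h, 6))
```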
In general, the measure in the integral representation of a positive (super)harmonic function is not necessarily unique. We still have to face the question under which additional properties it does become unique. Prior to that, we show that every minimal harmonic function is of the form $K(\cdot,\xi)$ with $\xi \in \mathcal{M}$, always under the hypotheses of (sub)stochasticity, irreducibility and transience.

7.50 Theorem. Let $h$ be a minimal harmonic function. Then there is a point $\xi \in \mathcal{M}$ such that the unique measure $\nu$ on $\hat X$ which gives rise to an integral representation $h = \int_{\hat X} K(\cdot,\eta)\,d\nu(\eta)$ is the point mass $\nu = \delta_\xi$. In particular,
$$h = K(\cdot,\xi).$$
Proof. Suppose that we have
$$h = \int_{\hat X} K(\cdot,\eta)\,d\nu(\eta).$$
By Theorem 7.45, such an integral representation does exist. We have $\nu(\hat X) = 1$ because $h(o) = K(o,\eta) = 1$ for all $\eta \in \hat X$. Suppose that $B \subset \hat X$ is a Borel set with $0 < \nu(B) < 1$. Set
$$h_B(x) = \frac{1}{\nu(B)}\int_B K(x,\eta)\,d\nu(\eta) \quad\text{and}\quad h_{\hat X\setminus B}(x) = \frac{1}{\nu(\hat X\setminus B)}\int_{\hat X\setminus B} K(x,\eta)\,d\nu(\eta).$$
Then $h_B$ and $h_{\hat X\setminus B}$ are positive superharmonic with value 1 at $o$, and
$$h = \nu(B)\,h_B + \bigl(1 - \nu(B)\bigr)\,h_{\hat X\setminus B}$$
is a convex combination of two functions in the base $\mathcal{B}$ of the cone $\mathcal{S}^+$. Therefore we must have $h = h_B = h_{\hat X\setminus B}$. In particular,
$$\int_B h(x)\,d\nu(\eta) = \nu(B)\,h(x) = \int_B K(x,\eta)\,d\nu(\eta)$$
for every $x \in X$ and every Borel set $B \subset \hat X$ (if $\nu(B) = 0$ or $\nu(B) = 1$, this is trivially true). It follows that for each $x \in X$, one has $K(x,\eta) = h(x)$ for $\nu$-almost every $\eta$. Since $X$ is countable, we also have $\nu(\Xi) = 1$, where
$$\Xi = \{\eta \in \hat X : K(x,\eta) = h(x) \text{ for all } x \in X\}.$$
This set must be non-empty. Therefore there must be $\xi \in \Xi$ such that $h = K(\cdot,\xi)$. If $\eta \ne \xi$ then $K(\cdot,\eta) \ne K(\cdot,\xi)$ by the construction of the Martin compactification. In other words, $\Xi$ cannot contain more than the point $\xi$, and $\nu = \delta_\xi$. Since $h$ is harmonic, we must have $\xi \in \mathcal{M}$. $\square$
We define the minimal Martin boundary $\mathcal{M}_{\min}$ as the set of all $\xi \in \mathcal{M}$ such that $K(\cdot,\xi)$ is a minimal harmonic function. By now, we know that every minimal harmonic function arises in this way.

7.51 Corollary. For a point $\xi \in \mathcal{M}$ one has $\xi \in \mathcal{M}_{\min}$ if and only if the limit distribution of the associated $h$-process with $h = K(\cdot,\xi)$ is $\nu_{K(\cdot,\xi)} = \delta_\xi$.
Proof. The "only if" is contained in Theorem 7.50.

Conversely, let $\nu_{K(\cdot,\xi)} = \delta_\xi$. Suppose that $K(\cdot,\xi) = a_1 h_1 + a_2 h_2$ for two positive superharmonic functions $h_1, h_2$ with $h_i(o) = 1$ and constants $a_1, a_2 > 0$. Since $h_i(o) = 1$, the $\nu_{h_i}$ are probability measures. By Lemma 7.48,
$$a_1\,\nu_{h_1} + a_2\,\nu_{h_2} = \delta_\xi.$$
This implies $\nu_{h_1} = \nu_{h_2} = \delta_\xi$, so that $K(\cdot,\xi)$ is an extreme point of the base $\mathcal{B}$.

Now suppose that $K(\cdot,\xi) = K(\cdot,y)$ for some $y \in X$. (This can occur only when $P$ does not have finite range, since finite range implies that $K(\cdot,\xi)$ is harmonic, while $K(\cdot,y)$ is not.) But then we know from Exercise 7.49 that $\nu_h = \delta_y \ne \delta_\xi$, in contradiction with the initial assumption. By Theorem 7.6, it only remains that $h = K(\cdot,\xi)$ is a minimal harmonic function. $\square$
Note a small subtlety in the last lines, where the proof relies on the fact that we
distinguish M from , X even when 1(. ) = 1(. ,). Recall that this is
because we wanted X to be discrete in the Martin compactication, following the
approach of Hunt [32]. It may be instructive to reect about the necessary modi-
cations in the original approach of Doob [17], where would not be distinguished
from ,.
We can combine the last characterization of $M_{\min}$ with Proposition 7.43 to obtain the following.

7.52 Lemma. $M_{\min}$ is a Borel set in $\widehat X$.
Proof. As we have seen in (7.15), the topology of $\widehat X$ is induced by a metric $\theta(\cdot,\cdot)$. Let $\xi \in M$. For a positive superharmonic function $h$ with $h(o) = 1$, we consider the $h$-process and apply Proposition 7.43 to the starting point $o$ and the continuous function $\varphi_m = e^{-m\,\theta(\cdot,\xi)}$:
\[ \int_{\widehat X} \varphi_m\, d\nu_h = E^h_o\bigl[\varphi_m(Z_{\zeta-1})\bigr] = \sum_{x\in X} \varphi_m(x)\,\nu_h(x) + \lim_{n\to\infty} \sum_{y\in X} p^{(n)}(o,y)\, h(y)\, \varphi_m(y). \]
If $m \to \infty$ then $\varphi_m \to \mathbf 1_{\{\xi\}}$ pointwise, and by dominated convergence $\sum_{x\in X} \varphi_m(x)\,\nu_h(x)$ tends to $0$, while the integral on the left hand side tends to $\nu_h(\{\xi\})$. Therefore
\[ \lim_{m\to\infty}\, \lim_{n\to\infty} \sum_{y\in X} p^{(n)}(o,y)\, h(y)\, \varphi_m(y) = \nu_h(\{\xi\}). \]
D. The PoissonMartin integral representation theorem 207
Setting $h = K(\cdot,\xi)$ and applying Corollary 7.51, we see that
\[ M_{\min} = \Bigl\{\xi \in M : \lim_{m\to\infty}\, \lim_{n\to\infty} \sum_{y\in X} p^{(n)}(o,y)\, K(y,\xi)\, e^{-m\,\theta(y,\xi)} = 1 \Bigr\}. \]
Thus we have characterized $M_{\min}$ as the set of points $\xi \in M$ in which the triple limit (the third being summation over $y$) of a certain sequence of continuous functions on the compact set $M$ is equal to $1$. Therefore, $M_{\min}$ is a Borel set by standard measure theory on metric spaces. $\square$

We remark here that in general, $M_{\min}$ can very well be a proper subset of $M$. Examples where this occurs arise, among others, in the setting of Cartesian products of Markov chains, see Picardello and Woess [46] and [W2, §28.B]. We can now deduce the following result on uniqueness of the integral representation.
7.53 Theorem (Uniqueness of the representation). If $h \in H^+$ then the unique measure $\nu$ on $\widehat X$ such that
\[ \nu(M \setminus M_{\min}) = 0 \quad\text{and}\quad h(x) = \int_{\widehat X} K(x,\cdot)\, d\nu \ \text{ for all } x \in X \]
is given by $\nu = \nu_h$, defined in (7.47).
Proof. 1.) Let us first verify that $\nu_h(M \setminus M_{\min}) = 0$. We may suppose that $h(o) = 1$. Let $\varphi, g \colon \widehat X \to \mathbb R$ be two continuous functions. Then
\[ E^h_o\bigl[\varphi(Z_n)\, g(Z_{n+m})\, \mathbf 1_{[\zeta > n+m]}\bigr] = \sum_{x,y \in X} p_h^{(n)}(o,x)\,\varphi(x)\, p_h^{(m)}(x,y)\, g(y) = \sum_{x \in X} p^{(n)}(o,x)\, h(x)\, \varphi(x)\, E^h_x\bigl[g(Z_m)\, \mathbf 1_{[\zeta > m]}\bigr]. \]
Letting $m \to \infty$, by dominated convergence
\[ E^h_o\bigl[\varphi(Z_n)\, g(Z_\infty)\, \mathbf 1_{\Omega_\infty}\bigr] = \sum_{x \in X} p^{(n)}(o,x)\, h(x)\, \varphi(x)\, E^h_x\bigl[g(Z_\infty)\, \mathbf 1_{\Omega_\infty}\bigr], \qquad(7.54) \]
since on $\Omega_\infty$ we have $\zeta = \infty$ and $Z_m \to Z_\infty$. Now, on $\Omega_\infty$ we also have $Z_\infty \in M$. Considering the restriction $g|_M$ of $g$ to $M$, and applying (7.41) and Theorem 7.42 to the $h$-process, we see that
\[ E^h_x\bigl[g(Z_\infty)\, \mathbf 1_{\Omega_\infty}\bigr] = E^h_x\bigl[g|_M(Z_{\zeta-1})\bigr] = \int_M K_h(x,\xi)\, g(\xi)\, d\nu_h(\xi). \qquad(7.55) \]
Hence we can rewrite the right hand side of (7.54) as
\[
\int_M \sum_{x\in X} p^{(n)}(o,x)\, h(x)\, \varphi(x)\, K_h(x,\xi)\, g(\xi)\, d\nu_h(\xi)
= \int_M \sum_{x\in X} p^{(n)}(o,x)\, \varphi(x)\, K(x,\xi)\, g(\xi)\, d\nu_h(\xi)
= \int_M \sum_{x\in X} p^{(n)}_{K(\cdot,\xi)}(o,x)\, \varphi(x)\, g(\xi)\, d\nu_h(\xi)
= \int_M E^{K(\cdot,\xi)}_o\bigl[\varphi(Z_n)\,\mathbf 1_{[\zeta>n]}\bigr]\, g(\xi)\, d\nu_h(\xi).
\]
If $n \to \infty$ then as in (7.55), with $x = o$ and with $K(\cdot,\xi)$ in the place of $h$,
\[ E^{K(\cdot,\xi)}_o\bigl[\varphi(Z_n)\,\mathbf 1_{[\zeta>n]}\bigr] \to E^{K(\cdot,\xi)}_o\bigl[(\varphi\,\mathbf 1_M)(Z_{\zeta-1})\bigr] = \int_M \varphi(\eta)\, d\nu_{K(\cdot,\xi)}(\eta). \]
In the same way,
\[ E^h_o\bigl[\varphi(Z_n)\, g(Z_\infty)\,\mathbf 1_{\Omega_\infty}\bigr] \to E^h_o\bigl[\varphi(Z_\infty)\, g(Z_\infty)\,\mathbf 1_{\Omega_\infty}\bigr] = \int_M \varphi(\xi)\, g(\xi)\, d\nu_h(\xi). \]
Combining these equations, we get
\[ \int_M \varphi(\xi)\, g(\xi)\, d\nu_h(\xi) = \int_M \Bigl(\int_M \varphi(\eta)\, d\nu_{K(\cdot,\xi)}(\eta)\Bigr)\, g(\xi)\, d\nu_h(\xi). \]
This is true for any choice of the continuous function $g$ on $\widehat X$. One deduces that
\[ \varphi(\xi) = \int_M \varphi(\eta)\, d\nu_{K(\cdot,\xi)}(\eta) \quad\text{for $\nu_h$-almost every } \xi \in M. \qquad(7.56) \]
This is valid for every continuous function $\varphi$ on $\widehat X$. The boundary $M$ is a compact space with the metric $\theta(\cdot,\cdot)$. It has a denumerable dense subset $\{\xi_k : k \in \mathbb N\}$. We consider the countable family of continuous functions $\varphi_{k,m} = e^{-m\,\theta(\cdot,\xi_k)}$ on $\widehat X$. Then $\nu_h(B_{k,m}) = 0$, where $B_{k,m}$ is the set of all $\xi$ which do not satisfy (7.56) with $\varphi = \varphi_{k,m}$. Then also $\nu_h(B) = 0$, where $B = \bigcup_{k,m} B_{k,m}$. If $\xi \in M \setminus B$ then for every $m$ and $k$
\[ e^{-m\,\theta(\xi,\xi_k)} = \int_M e^{-m\,\theta(\eta,\xi_k)}\, d\nu_{K(\cdot,\xi)}(\eta). \]
There is a subsequence of $(\xi_k)$ which tends to $\xi$. Passing to the limit,
\[ 1 = \int_M e^{-m\,\theta(\eta,\xi)}\, d\nu_{K(\cdot,\xi)}(\eta). \]
If now $m \to \infty$, then the last right hand term tends to $\nu_{K(\cdot,\xi)}(\{\xi\})$. Therefore $\nu_{K(\cdot,\xi)} = \delta_\xi$, and via Corollary 7.51 we deduce that $\xi \in M_{\min}$. Consequently
\[ \nu_h(M \setminus M_{\min}) \le \nu_h(B) = 0. \]
2.) We show uniqueness. Suppose that we have a measure $\nu$ with the stated properties. We can again suppose without loss of generality that $h(o) = 1$. Then $\nu$ and $\nu_h$ are probability measures. Let $\varphi \colon \widehat X \to \mathbb R$ be continuous. Applying (7.41), Proposition 7.43 and (7.40) to the $h$-process,
\[
\int_{M_{\min}} \varphi(\xi)\, d\nu_h(\xi)
= \sum_{y\in X} \varphi(y)\, G_h(o,y)\, \Bigl(1 - \sum_{w\in X} p_h(y,w)\Bigr) + \lim_{n\to\infty} P_h^n \varphi(o)
= \sum_{y\in X} \varphi(y)\, G(o,y)\, \Bigl(h(y) - \sum_{w\in X} p(y,w)\, h(w)\Bigr) + \lim_{n\to\infty} \sum_{x\in X} p^{(n)}(o,x)\, \varphi(x)\, h(x). \qquad(7.57)
\]
Choose $\eta \in M_{\min}$. Substitute $h$ with $K(\cdot,\eta)$ in the last identity. Corollary 7.51 gives
\[
\varphi(\eta) = \int_{M_{\min}} \varphi(\xi)\, d\nu_{K(\cdot,\eta)}(\xi)
= \sum_{y\in X} \varphi(y)\, G(o,y)\, \Bigl(K(y,\eta) - \sum_{w\in X} p(y,w)\, K(w,\eta)\Bigr) + \lim_{n\to\infty} \sum_{x\in X} p^{(n)}(o,x)\, \varphi(x)\, K(x,\eta).
\]
Integrating the last expression with respect to $\nu$ over $M_{\min}$, the sums and the limit can be exchanged with the integral (dominated convergence), and we obtain precisely the last line of (7.57). Thus
\[ \int_{M_{\min}} \varphi(\eta)\, d\nu(\eta) = \int_{M_{\min}} \varphi(\xi)\, d\nu_h(\xi) \]
for every continuous function $\varphi$ on $\widehat X$: the measures $\nu$ and $\nu_h$ coincide. $\square$
E Poisson boundary. Alternative approach to the integral
representation
If $\nu$ is a Borel measure on $M$ then
\[ h = \int_{M_{\min}} K(\cdot,\xi)\, d\nu(\xi) \]
defines a non-negative harmonic function. Indeed, by monotone convergence (applied to the summation occurring in $Ph$), one can exchange the integral and the application of $P$, and each of the functions $K(\cdot,\xi)$ with $\xi \in M_{\min}$ is harmonic. If $u$ is a positive superharmonic function then by Theorem 7.45
\[ u(x) = \sum_{y\in X} K(x,y)\,\nu_u(y) + \int_{M_{\min}} K(x,\xi)\, d\nu_u(\xi). \]
Set $g(x) = \sum_{y\in X} K(x,y)\,\nu_u(y)$ and $h(x) = \int_{M_{\min}} K(x,\xi)\, d\nu_u(\xi)$. Then, as we just observed, $h$ is harmonic, and $\nu_h = \nu_u|_M$ by Theorem 7.53.

In view of Exercise 7.49, we find that $g = Gf$, where $f = u - Pu$. In this way we have re-derived the Riesz decomposition $u = Gf + h$ of the superharmonic function $u$, with more detailed information regarding the harmonic part.
The constant function $\mathbf 1 = \mathbf 1_X$ is harmonic precisely when $P$ is stochastic, and superharmonic in general. If we set $B = \widehat X$ in Theorem 7.42, then we see that the measure on $\widehat X$ which gives rise to the integral representation of $\mathbf 1_X$ in the sense of Theorem 7.53 is the measure $\nu_o$. That is, for any Borel set $B \subset \widehat X$,
\[ \nu_{\mathbf 1}(B) = \nu_o(B) = \Pr_o[Z_{\zeta-1} \in B]. \]
If $P$ is stochastic then $\zeta = \infty$, and $\nu_o$ as well as all the other measures $\nu_x$, $x \in X$, are probability measures on $M_{\min}$. If $P$ is strictly substochastic in some point, then $\Pr_x[\zeta < \infty] > 0$ for every $x$.

7.58 Exercise. Construct examples where $\Pr_x[\zeta < \infty] = 1$ for every $x$, that is, the Markov chain does not escape to infinity, but vanishes (is absorbed by $\dagger$) almost surely.
[Hint: modify an arbitrary recurrent Markov chain suitably.]
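One way to realize the hint, as an illustrative sketch (the choice of simple random walk on $\mathbb Z$ and of the killing probability $1/2$ are ours, not from the text): make the recurrent simple random walk strictly substochastic at the origin by absorbing it there with probability $1/2$ on each visit. Recurrence forces infinitely many visits to $0$, so the chain is absorbed almost surely. A small Monte Carlo check:

```python
import random

def run_killed_srw(start=0, kill_p=0.5, horizon=5000):
    """Simple random walk on Z, made strictly substochastic at the
    origin: each visit to 0 is absorbed (sent to the cemetery state)
    with probability kill_p.  Returns True if absorption happened
    within `horizon` steps."""
    x = start
    for _ in range(horizon):
        if x == 0 and random.random() < kill_p:
            return True          # absorbed by the cemetery state
        x += 1 if random.random() < 0.5 else -1
    return False

random.seed(7)
n_paths = 200
absorbed = sum(run_killed_srw() for _ in range(n_paths))
print(absorbed / n_paths)   # close to 1: recurrence keeps sending the
                            # walk back to 0, so it is killed a.s.
```

Within the finite horizon a small fraction of paths may survive, but the absorption probability is $1$ in the limit, since the number of visits to $0$ is almost surely infinite.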
In general, we can write the Riesz decomposition of the constant function $\mathbf 1$ on $X$:
\[ \mathbf 1 = h_0 + G f_0 \quad\text{with } h_0(x) = \nu_x(M) \text{ for every } x \in X. \qquad(7.59) \]
Thus, $h_0 \equiv 0 \iff \zeta < \infty$ almost surely $\iff \mathbf 1_X$ is a potential.

In the sequel, when we speak of a function $\varphi$ on $M$, then we tacitly assume that $\varphi$ is extended to the whole of $\widehat X$ by setting $\varphi = 0$ on $X$. Thus, a $\nu_o$-integrable function $\varphi$ on $M$ is intended to be one that is integrable with respect to $\nu_o|_M$. (Again, these subtleties do not have to be considered when $P$ is stochastic.) The Poisson integral of $\varphi$ is the function
\[ h(x) = \int_M K(x,\xi)\,\varphi(\xi)\, d\nu_o(\xi) = \int_M \varphi\, d\nu_x = E_x\bigl[\varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty}\bigr], \quad x \in X. \qquad(7.60) \]
It defines a harmonic function (not necessarily positive). Indeed, we can decompose $\varphi = \varphi^+ - \varphi^-$ and consider the non-negative measures $d\nu^\pm(\xi) = \varphi^\pm(\xi)\, d\nu_o(\xi)$. Since $\nu_o(M \setminus M_{\min}) = 0$ (Theorem 7.53), the functions $h^\pm = \int_M K(\cdot,\xi)\, d\nu^\pm(\xi)$ are harmonic, and $h = h^+ - h^-$. If $\varphi$ is a bounded function then also $h$ is bounded. Conversely, the following holds.
7.61 Theorem. Every bounded harmonic function is the Poisson integral of a bounded measurable function on $M$.

Proof. Claim. For every bounded harmonic function $h$ on $X$ there are constants $a, b \in \mathbb R$ such that $a\, h_0 \le h \le b\, h_0$.
To see this, we start with $b \ge 0$ such that $h(x) \le b$ for every $x$. That is, $u = b\,\mathbf 1_X - h$ is a non-negative superharmonic function. Then $u = \tilde h + b\, Gf_0$, where $\tilde h = b\, h_0 - h$ is harmonic. Therefore $0 \le P^n u = \tilde h + b\, P^n Gf_0 \to \tilde h$ as $n \to \infty$, whence $\tilde h \ge 0$.

For the lower bound, we apply this reasoning to $-h$.
The claim being verified, we now set $c = b - a$ and write $c\, h_0 = h_1 + h_2$, where $h_1 = b\, h_0 - h$ and $h_2 = h - a\, h_0$. The $h_i$ are non-negative harmonic functions. By Lemma 7.48,
\[ \nu_{h_1} + \nu_{h_2} = c\, \nu_{h_0} = c\, \mathbf 1_M \nu_o, \]
where $\mathbf 1_M \nu_o$ is the restriction of $\nu_o$ to $M$. In particular, both $\nu_{h_i}$ are absolutely continuous with respect to $\mathbf 1_M \nu_o$ and have non-negative Radon--Nikodym densities $\varphi_i$ supported on $M$ with respect to $\nu_o$. Thus, for $i = 1, 2$,
\[ h_i = \int_M K(\cdot,\xi)\, \varphi_i(\xi)\, d\nu_o(\xi). \]
Adding the two integrals, we see that $\varphi_1 + \varphi_2 = c$, $\mathbf 1_M \nu_o$-almost everywhere, whence the $\varphi_i$ are $\nu_o$-almost everywhere bounded. We now get
\[ h = h_2 + a\, h_0 = \int_M K(\cdot,\xi)\, \varphi(\xi)\, d\nu_o(\xi), \quad\text{where } \varphi = \varphi_2 + a\, \mathbf 1_M. \]
This is the proposed Poisson integral. $\square$
The last proof becomes simpler when $P$ is stochastic.

We underline that by the uniqueness theorem (Theorem 7.42), the bounded harmonic function $h$ determines the function $\varphi$ uniquely $\nu_o$-almost everywhere.
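To see the Poisson integral (7.60) at work in a concrete case, here is a hypothetical side example in Python (the tree model and its string encoding are ours, anticipating the random walks on trees of Chapter 9). For simple random walk on the homogeneous tree of degree $3$, $Z_n$ converges to a random end; take $\varphi$ to be the indicator of the set of ends whose first letter is '1'. Its Poisson integral $h(x) = \int_M \varphi\, d\nu_x$ is a non-constant bounded harmonic function with $h(o) = 1/3$ at the root, by symmetry:

```python
import random

def srw_step(w, rng):
    """One step of simple random walk on the homogeneous tree of degree 3,
    with vertices encoded as words over {'1','2','3'} in which no letter
    equals its predecessor; the empty word is the root o."""
    if not w:
        return rng.choice('123')                  # root has three neighbours
    nbrs = [w[:-1]] + [w + c for c in '123' if c != w[-1]]
    return rng.choice(nbrs)                       # parent plus two children

def first_letter(n_steps, rng):
    """First letter of the position after n_steps; for large n this is
    a.s. the first letter of the limiting end Z_infinity of the walk."""
    w = ''
    for _ in range(n_steps):
        w = srw_step(w, rng)
    return w[:1]

rng = random.Random(1)
n_paths = 2000
hits = sum(first_letter(60, rng) == '1' for _ in range(n_paths))
print(hits / n_paths)   # Monte Carlo estimate of h(o); close to 1/3
```

The walk escapes at linear speed, so after 60 steps the first letter of its position agrees with that of the limit end except on an event of tiny probability.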
For the following measure-theoretic considerations, we continue to use the essential part of the trajectory space, namely $\Omega = X^{\mathbb N_0} \cup \Omega_\dagger$ as in (7.21). We write (by slight abuse of notation) $\mathcal A$ for the $\sigma$-algebra restricted to that set, on which all our probability measures $\Pr_x$ live. Let $\sigma$ be the shift operator on $\Omega$, namely
\[ \sigma(x_0, x_1, x_2, \dots) = (x_1, x_2, x_3, \dots). \]
Its action extends to any extended real random variable $W$ defined on $\Omega$ by $\sigma W(\omega) = W(\sigma\omega)$.

Following once more the lucid presentation of Dynkin [Dy], we say that such a random variable is terminal or final, if $\sigma W = W$ and, in addition, $W \equiv 0$ on $\Omega_\dagger$. Analogously, an event $\Lambda \in \mathcal A$ is called terminal or final, if its indicator function $\mathbf 1_\Lambda$ has this property. The terminal events form a $\sigma$-algebra of subsets of the ordinary trajectory space $X^{\mathbb N_0}$. Every non-negative terminal random variable can be approximated in the standard way by non-negative simple terminal random variables (i.e., linear combinations of indicator functions of terminal events).

"Terminal" means that the value of $W(\omega)$ does not depend on the deletion, insertion or modification of an initial (finite) piece of a trajectory within $X^{\mathbb N_0}$.

The basic example of a terminal random variable is as follows: let $\varphi \colon M_{\min} \to [-\infty, \infty]$ be measurable, and define
\[ W(\omega) = \begin{cases} \varphi\bigl(Z_\infty(\omega)\bigr), & \text{if } \omega \in \Omega_\infty,\\ 0, & \text{otherwise.} \end{cases} \]
There is a direct relation between terminal random variables and harmonic functions, which will lead us to the conclusion that every terminal random variable has the above form.
7.62 Proposition. Let $W \ge 0$ be a terminal random variable satisfying $0 < E_o(W) < \infty$. Then $h(x) = E_x(W) < \infty$, and $h$ is harmonic on $X$.
The probability measures with respect to the $h$-process are given by
\[ \Pr{}^h_x(\Lambda) = \frac{1}{h(x)}\, E_x(\mathbf 1_\Lambda\, W), \quad \Lambda \in \mathcal A,\ x \in X. \]

Proof. (Note that in the last formula, expectation $E_x$ always refers to the ordinary probability measure $\Pr_x$ on the trajectory space.)
If $W$ is an arbitrary (not necessarily terminal) random variable on $\Omega$, we can write $W(\omega) = W\bigl(Z_0(\omega), Z_1(\omega), \dots\bigr)$. Let $y_1, \dots, y_k \in X$ ($k \ge 1$) and denote, as usual, by $E_x(\,\cdot \mid Z_1 = y_1, \dots, Z_k = y_k)$ expectation with respect to the probability measure $\Pr_x(\,\cdot \mid Z_1 = y_1, \dots, Z_k = y_k)$. By the Markov property and time-homogeneity,
\[
E_x\bigl[\mathbf 1_{[Z_1 = y_1, \dots, Z_k = y_k]}\, W(Z_k, Z_{k+1}, \dots)\bigr]
= \Pr_x[Z_1 = y_1, \dots, Z_k = y_k]\; E_x\bigl[W(Z_k, Z_{k+1}, \dots) \mid Z_1 = y_1, \dots, Z_k = y_k\bigr]
= p(x,y_1) \cdots p(y_{k-1}, y_k)\; E_x\bigl[W(Z_k, Z_{k+1}, \dots) \mid Z_k = y_k\bigr]
= p(x,y_1) \cdots p(y_{k-1}, y_k)\; E_{y_k}\bigl[W(Z_0, Z_1, \dots)\bigr]. \qquad(7.63)
\]
In particular, if $W \ge 0$ is terminal, that is, $W(Z_0, Z_1, Z_2, \dots) = W(Z_1, Z_2, \dots)$, then for arbitrary $y$,
\[ E_x(\mathbf 1_{[Z_1 = y]}\, W) = p(x,y)\, E_y(W). \]
Thus, if $E_x(W) < \infty$ and $p(x,y) > 0$ then $E_y(W) < \infty$. Irreducibility now implies that when $E_o(W) < \infty$ then $E_x(W) < \infty$ for all $x$. In this case, using the fact that $W \equiv 0$ on $\Omega_\dagger$,
\[ E_x(W) = E_x\Bigl[\sum_{y \in X \cup \{\dagger\}} \mathbf 1_{[Z_1 = y]}\, W\Bigr] = \sum_{y\in X} E_x(\mathbf 1_{[Z_1 = y]}\, W) = \sum_{y\in X} p(x,y)\, E_y(W): \]
the function $h$ is harmonic.
In order to prove the proposed formula for $\Pr^h_x$, we only need to verify it for an arbitrary cylinder set $\Lambda = C(a_0, a_1, \dots, a_k)$, where $a_0, \dots, a_k \in X$. (We do not need to consider $a_j = \dagger$, since the $h$-process does not visit $\dagger$ when it starts in $X$.) For our cylinder set,
\[ \Pr{}^h_x(\Lambda) = \delta_x(a_0)\, p_h(a_0,a_1) \cdots p_h(a_{k-1},a_k) = \frac{h(a_k)}{h(x)}\, \delta_x(a_0)\, p(a_0,a_1) \cdots p(a_{k-1},a_k). \]
On the other hand, we apply (7.63) and get, using that $W$ is terminal,
\[ \frac{1}{h(x)}\, E_x(\mathbf 1_\Lambda\, W) = \frac{1}{h(x)}\, \delta_x(a_0)\, p(a_0,a_1) \cdots p(a_{k-1},a_k)\, E_{a_k}(W), \]
which coincides with $\Pr^h_x(\Lambda)$ as claimed. $\square$
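The $h$-process with transition probabilities $p_h(x,y) = p(x,y)\,h(y)/h(x)$ underlies the formula just proved. A small self-contained check (our own example, not from the text): for the biased nearest-neighbour walk on $\mathbb Z$ with $p(x,x+1) = p$, $p(x,x-1) = q$, the function $h(x) = (q/p)^x$ is harmonic, and the resulting $h$-process is again a nearest-neighbour walk, with the drift reversed:

```python
from fractions import Fraction

p, q = Fraction(2, 3), Fraction(1, 3)       # biased walk on Z: right prob p

def kernel(x, y):
    return p if y == x + 1 else q if y == x - 1 else Fraction(0)

def h(x):
    return (q / p) ** x                     # harmonic: p h(x+1) + q h(x-1) = h(x)

def kernel_h(x, y):
    """Doob h-transform: p_h(x,y) = p(x,y) h(y) / h(x)."""
    return kernel(x, y) * h(y) / h(x)

for x in range(-3, 4):
    # h is harmonic, hence the transformed kernel is stochastic:
    assert kernel(x, x + 1) * h(x + 1) + kernel(x, x - 1) * h(x - 1) == h(x)
    assert kernel_h(x, x + 1) + kernel_h(x, x - 1) == 1
    assert kernel_h(x, x + 1) == q          # the h-process has reversed drift
```

Exact rational arithmetic (`Fraction`) makes the identities hold with equality rather than up to rounding.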
The first part of the last proposition remains of course valid for any final random variable $W$ that is $E_o$-integrable: then it is $E_x$-integrable for every $x \in X$, and $h(x) = E_x(W)$ is harmonic. If $W$ is (essentially) bounded then $h$ is a bounded harmonic function.
7.64 Theorem. Let $W$ be a bounded, terminal random variable, and let $\varphi$ be the bounded measurable function on $M$ such that $h(x) = E_x(W)$ satisfies
\[ h(x) = \int_M \varphi\, d\nu_x. \]
Then
\[ W = \varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty} \quad \Pr_x\text{-almost surely for every } x \in X. \]
Proof. As proposed, we let $\varphi$ be the function on $M$ that appears in the Poisson integral representation of $h$. Then $W' = \varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty}$ is a bounded terminal random variable that satisfies
\[ E_x(W') = h(x) = E_x(W) \quad\text{for all } x \in X. \]
Therefore the second part of Proposition 7.62 implies that
\[ \int_\Lambda W\, d\Pr_x = \int_\Lambda W'\, d\Pr_x \quad\text{for every } \Lambda \in \mathcal A. \]
The result follows. $\square$
7.65 Corollary. Every terminal random variable $W$ is of the form
\[ W = \varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty} \quad \Pr_x\text{-almost surely for every } x \in X, \]
where $\varphi$ is a measurable function on $M$.

Proof. For bounded $W$, this follows from Theorems 7.61 and 7.64. If $W$ is non-negative, choose $n \in \mathbb N$ and set $W_n = \min\{n, W\}$ (pointwise). This is a bounded, non-negative terminal random variable, whence there is a bounded, non-negative function $\varphi_n$ on $M$ such that
\[ W_n = \varphi_n(Z_\infty)\,\mathbf 1_{\Omega_\infty} \quad \Pr_x\text{-almost surely for every } x \in X. \]
If we set $\varphi = \limsup_n \varphi_n$ (pointwise on $M$) then $W = \varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty}$.

Finally, if $W$ is arbitrary, then we decompose $W = W^+ - W^-$. We get $W^\pm = \varphi^\pm(Z_\infty)\,\mathbf 1_{\Omega_\infty}$. Then we can set $\varphi = \varphi^+ - \varphi^-$ (choosing the value to be $0$ whenever this is of the indefinite form $\infty - \infty$; note that $W^+ = 0$ where $W^- > 0$ and vice versa). $\square$
For the following, we write $\nu = \mathbf 1_M \nu_o$ for the restriction of $\nu_o$ to $M$.

The pair $(M, \nu)$, as a measure space with the Borel $\sigma$-algebra, is called the Poisson boundary of $(X, P)$. It is a probability space if and only if the matrix $P$ is stochastic, in which case $\nu = \nu_o$. Since $\nu(M \setminus M_{\min}) = 0$, we can identify $(M, \nu)$ with $(M_{\min}, \nu)$. Besides being large enough for providing a unique integral representation of all bounded harmonic functions, the Poisson boundary is also the right model for the distinguishable limit points at infinity which the Markov chain $(Z_n)$ can attain. Note that for describing those limit points, we do not only need the topological model (the Martin compactification), but also the distribution of the limit random variable. Theorem 7.64 and Corollary 7.65 show that the Poisson boundary is the finest model for distinguishing the behaviour of $(Z_n)$ at infinity.

On the other hand, in many cases the Poisson boundary can be smaller than $M_{\min}$ in the sense that the support of $\nu$ does not contain all points of $M_{\min}$. In particular, we shall say that the Poisson boundary is trivial, if $\operatorname{supp}(\nu)$ consists of a single point. In formulating this, we primarily have in mind the case when $P$ is stochastic. In the stochastic case, triviality of the Poisson boundary amounts to the (weak) Liouville property: all bounded harmonic functions are constant.

When $P$ is strictly substochastic in some point, recall that it may also happen that $\zeta < \infty$ almost surely, in which case $\nu_x(M) = 0$ for all $x$, and there are no non-zero bounded harmonic functions. In this case, the Poisson boundary is empty (to be distinguished from trivial).
7.66 Exercise. As in (7.59), let $h_0$ be the harmonic part in the Riesz decomposition of the superharmonic function $\mathbf 1$ on $X$. Show that the following statements are equivalent.
(a) The Poisson boundary of $(X, P)$ is trivial.
(b) The function $h_0$ of (7.59) is non-zero, and every bounded harmonic function is a constant multiple of $h_0$.
(c) One has $h_0(o) \ne 0$, and $\dfrac{1}{h_0(o)}\, h_0$ is a minimal harmonic function.
We next deduce the following theorem of convergence to the boundary.

7.67 Theorem (Probabilistic Fatou theorem). If $\varphi$ is a $\nu_o$-integrable function on $M$ and $h$ its Poisson integral, then
\[ \lim_{n\to\infty} h(Z_n) = \varphi(Z_\infty) \quad \Pr_o\text{-almost surely on } \Omega_\infty. \]
Proof. Suppose first that $\varphi$ is bounded. By Corollary 7.31, $W = \lim_{n\to\infty} h(Z_n)$ exists $\Pr_x$-almost surely. This $W$ is a terminal random variable. (From the lines preceding the corollary, we see that $W \equiv 0$ on $\Omega_\dagger$.) Since $h$ is bounded, we can use Lebesgue's theorem (dominated convergence) to obtain
\[ E_x(W) = \lim_{n\to\infty} E_x\bigl[h(Z_n)\bigr] = \lim_{n\to\infty} P^n h(x) = h(x) \]
for every $x \in X$. Now Theorem 7.64 implies that $W = \varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty}$, as claimed.
Next, suppose that $\varphi$ is non-negative and $\nu_o$-integrable. Let $N \in \mathbb N$ and define $\varphi_N = \varphi\,\mathbf 1_{[\varphi \le N]}$ and $\psi_N = \varphi\,\mathbf 1_{[\varphi > N]}$. Then $\varphi_N + \psi_N = \varphi$. Let $g_N$ and $h_N$ be the Poisson integrals of $\varphi_N$ and $\psi_N$, respectively. These two functions are harmonic, and $g_N(x) + h_N(x) = h(x)$.

We can write
\[ h(x) = E_x(Y), \quad\text{where } Y = \varphi(Z_\infty)\,\mathbf 1_{\Omega_\infty}, \]
\[ g_N(x) = E_x(V_N), \quad\text{where } V_N = \varphi_N(Z_\infty)\,\mathbf 1_{\Omega_\infty}, \]
and
\[ h_N(x) = E_x(Y_N), \quad\text{where } Y_N = \psi_N(Z_\infty)\,\mathbf 1_{\Omega_\infty}. \]
Since $\varphi_N$ is bounded, we know from the first part of the proof that
\[ \lim_{n\to\infty} g_N(Z_n) = V_N \quad \Pr_o\text{-almost surely on } \Omega_\infty. \]
Furthermore, we know from Corollary 7.31 that $W = \lim_{n\to\infty} h(Z_n)$ and $W_N = \lim_{n\to\infty} h_N(Z_n)$ exist and are terminal random variables.

We have $W = V_N + W_N$ and $Y = V_N + Y_N$, and need to show that $W = Y$ almost surely. We cannot apply the dominated convergence theorem to $h_N(Z_n)$, as $n \to \infty$, but by Fatou's lemma,
\[ E_x(W_N) \le \lim_{n\to\infty} E_x\bigl[h_N(Z_n)\bigr] = h_N(x). \]
Therefore
\[ E_x\bigl[|W - Y|\bigr] = E_x\bigl[|W_N - Y_N|\bigr] \le E_x(W_N) + E_x(Y_N) \le 2\, h_N(x). \]
Now by irreducibility, there is $C_x > 0$ such that $h_N(x) \le C_x\, h_N(o)$, see (7.3). Since $\varphi$ is $\nu_o$-integrable, $h_N(o) = \int_{[\varphi > N]} \varphi\, d\nu_o \to 0$ as $N \to \infty$. Therefore $W - Y = 0$ $\Pr_x$-almost surely, as proposed.

Finally, in general we can decompose $\varphi = \varphi^+ - \varphi^-$ and apply what we just proved to the positive and negative parts. $\square$
Besides the Riesz decomposition (see above), also the approximation theorem (Theorem 6.46) can be easily deduced by the methods developed in this section: if $h \in H^+$ and $V \subset X$ is finite, then by transience of the $h$-process one has
\[ \sum_{y \in V} \Pr{}^h_x\bigl[Z_{t^V} = y\bigr] \ \begin{cases} = 1, & \text{if } x \in V,\\ \le 1, & \text{otherwise,}\end{cases} \]
where $t^V$ is the time of the last visit to $V$. Applying (7.37) to the $h$-process, this relation can be rewritten as
\[ \sum_{y \in V} G(x,y)\, h(y)\, \Pr{}^h_y\bigl[t^V = 0\bigr] \ \begin{cases} = h(x), & \text{if } x \in V,\\ \le h(x), & \text{otherwise.}\end{cases} \]
If we choose an increasing sequence of finite sets $V_n$ with union $X$, we obtain $h$ as the pointwise limit from below of a sequence of potentials.
Alternative approach to the Poisson–Martin integral representation

The above way of deducing the approximation theorem, involving Martin boundary theory, is of course far more complicated than the one in Section 7.D. Conversely, the integral representation can also be deduced directly from the approximation theorem, without prior use of the theorem on convergence to the boundary.

Alternative proof of the integral representation theorem. Let $h \in H^+$. By Theorem 6.46, there is a sequence of non-negative functions $f_n$ on $X$ such that $g_n(x) = Gf_n(x) \to h(x)$ pointwise. We rewrite
\[ Gf_n(x) = Gf_n(o) \sum_{y\in X} K(x,y)\, \nu_n(y), \quad\text{where}\quad \nu_n(y) = \frac{G(o,y)\, f_n(y)}{Gf_n(o)}. \]
Then $\nu_n$ is a probability distribution on the set $X$, which is discrete in the topology of $\widehat X$. We can consider $\nu_n$ as a Borel measure on $\widehat X$ and rewrite
\[ Gf_n(x) = Gf_n(o) \int_{\widehat X} K(x,\cdot)\, d\nu_n. \]
Now, the set of all Borel probability measures on a compact metric space is compact in the topology of convergence in law (weak convergence), see for example [Pa]. This implies that there are a subsequence $(\nu_{n_k})$ and a probability measure $\nu$ on $\widehat X$ such that
\[ \int_{\widehat X} \varphi\, d\nu_{n_k} \to \int_{\widehat X} \varphi\, d\nu \]
for every continuous function $\varphi$ on $\widehat X$. But the functions $K(x,\cdot)$ are continuous, and in the limit we obtain
\[ h(x) = h(o) \int_{\widehat X} K(x,\cdot)\, d\nu. \]
This provides an integral representation of $h$ with respect to the Borel measure $h(o)\,\nu$. $\square$
This proof appears simpler than the road we have taken in order to achieve the integral representation. Indeed, this is the approach chosen in the original paper by Doob [17]. It uses a fundamental and rather profound theorem, namely the one on compactness of the set of Borel probability measures on a compact metric space. (This can be seen as a general version of Helly's principle, or as a special case of Alaoglu's theorem of functional analysis: the dual of the Banach space of all continuous functions on $\widehat X$ is the space of all signed Borel measures; in the weak topology, the set of all probability measures is a closed subset of the unit ball and thus compact.) After this proof of the integral representation, one still needs to deduce from the latter the other theorems regarding convergence to the boundary, the minimal boundary, and uniqueness of the representation. In conclusion, the probabilistic approach presented here, which is due to Hunt [32], has the advantage of being based only on a relatively elementary version of martingale theory.

Nevertheless, in lectures where time is short, it may be advantageous to deduce the integral representation directly from the approximation theorem as above, and then prove that all minimal harmonic functions are Martin kernels, precisely as in Theorem 7.50. After this, one may state the theorems on convergence to the boundary and uniqueness of the representation without proof.

This approach is supported by the observations at the end of Section B: in all classes of specific examples of Markov chains where one is able to elaborate a concrete description of the Martin compactification, one also has at hand a direct proof of convergence to the boundary that relies on the specific features of the respective example, but is usually much simpler than the general proof of the convergence theorem.

What we mean by "concrete description" is the following. Imagine that we have a class of Markov chains on some state space $X$ which carries a certain geometric, algebraic or combinatorial structure (e.g., a hyperbolic graph or group, an integer lattice, or an infinite tree). Suppose also that the transition probabilities of the Markov chain are adapted to that structure. Then we are looking for a natural compactification of that structure, a priori defined in the respective geometric, algebraic or combinatorial terms, maybe without thinking yet about the probabilistic model (the Markov chain and its transition matrix). Then we want to know if this model may also serve as a concrete description of the Martin compactification.

In Chapter 9, we shall carry out this program in the class of examples which is simplest for this purpose, namely for random walks on trees. Before that, we insert a brief chapter on random walks on lattices.
Chapter 8
Minimal harmonic functions on Euclidean lattices
In this chapter, we consider irreducible random walks on the Abelian group $\mathbb Z^d$ in the sense of (4.18), but written additively. Thus,
\[ p^{(n)}(k, l) = \mu^{(n)}(l - k) \quad\text{for } k, l \in \mathbb Z^d, \]
where $\mu$ is a probability measure on $\mathbb Z^d$ and $\mu^{(n)}$ its $n$-th convolution power, given by $\mu^{(1)} = \mu$ and
\[ \mu^{(n)}(k) = \sum_{k_1 + \dots + k_n = k} \mu(k_1) \cdots \mu(k_n) = \sum_{l\in\mathbb Z^d} \mu^{(n-1)}(l)\, \mu(k - l). \]
(Compare with simple random walk on $\mathbb Z^d$, where $\mu$ is equidistribution on the set of integer unit vectors.) Recall from Lemma 4.21 that irreducibility means $\bigcup_n \operatorname{supp}(\mu^{(n)}) = \mathbb Z^d$, with $\operatorname{supp}(\mu^{(n)}) = \{k_1 + \dots + k_n : k_i \in \operatorname{supp}(\mu)\}$. The action of the transition matrix on functions $f \colon \mathbb Z^d \to \mathbb R$ is
\[ Pf(k) = \sum_{l\in\mathbb Z^d} f(l)\, \mu(l - k) = \sum_{m\in\mathbb Z^d} f(k + m)\, \mu(m). \qquad(8.1) \]
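The convolution recursion for $\mu^{(n)}$ is easy to implement for finitely supported $\mu$; a minimal sketch (the dict encoding of measures is ours):

```python
def convolve(mu, nu):
    """Convolution on Z^d of two finitely supported measures,
    encoded as dicts {lattice point (tuple): mass}."""
    out = {}
    for k, a in mu.items():
        for l, b in nu.items():
            s = tuple(x + y for x, y in zip(k, l))
            out[s] = out.get(s, 0.0) + a * b
    return out

# Simple random walk on Z^2: mu equidistributed on the four unit vectors.
mu = {(1, 0): 0.25, (-1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25}

mu_n = mu
for _ in range(3):                 # mu^(4) = mu * mu * mu * mu
    mu_n = convolve(mu_n, mu)

print(sum(mu_n.values()))          # total mass stays 1
print(mu_n[(0, 0)])                # p^(4)(0,0) = 36/256 = 0.140625
```

The printed return probability agrees with the closed-form count of the $36$ closed $4$-step paths of simple random walk on $\mathbb Z^2$.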
We first study the bounded harmonic functions, that is, the Poisson boundary. The following theorem was first proved by Blackwell [8], while the proof given here goes back to a paper by Dynkin and Malyutov [19], who attribute it to A. M. Leontovich.

8.2 Theorem. All bounded harmonic functions with respect to $\mu$ are constant.

Proof. The proof is based on the following.
Let $h \in H^\infty$ and $l \in \operatorname{supp}(\mu)$. Then we claim that
\[ h(k + l) \le h(k) \quad\text{for every } k \in \mathbb Z^d. \qquad(8.3) \]
Proof of the claim. Let $c > 0$ be a constant such that $|h(k)| \le c$ for all $k$. We set $g(k) = h(k + l) - h(k)$. We have to verify that $g \le 0$. First of all,
\[ Pg(k) = \sum_{m\in\mathbb Z^d} \bigl(h(k + l + m) - h(k + m)\bigr)\,\mu(m) = g(k). \]
(Note that we have used in a crucial way the fact that $\mathbb Z^d$ is an Abelian group.) Therefore $g \in H^\infty$, and $|g(k)| \le 2c$ for every $k$.
Suppose by contradiction that
\[ b = \sup_{k \in \mathbb Z^d} g(k) > 0. \]
We observe that for each $N \in \mathbb N$ and every $k \in \mathbb Z^d$,
\[ \sum_{n=0}^{N-1} g(k + n l) = h(k + N l) - h(k) \le 2c. \]
We choose $N$ sufficiently large so that $N\,\frac b2 > 2c$. We shall now find $k \in \mathbb Z^d$ such that $g(k + n l) > \frac b2$ for all $n < N$; in view of the preceding inequality this yields a contradiction, so that $b > 0$ is impossible.

For arbitrary $k \in \mathbb Z^d$ and $n < N$,
\[ p^{(n)}(k, k + nl) = \mu^{(n)}(nl) \ge \bigl(\mu(l)\bigr)^n \ge \bigl(\mu(l)\bigr)^{N-1} = a, \]
where $0 < a < 1$. Since $b\bigl(1 - \frac a2\bigr) < b$, there must be $k \in \mathbb Z^d$ with
\[ g(k) > b\bigl(1 - \tfrac a2\bigr). \]
Observing that $P^n g = g$, we obtain for this $k$ and for $0 \le n < N$
\[
b\Bigl(1 - \frac{\mu^{(n)}(nl)}{2}\Bigr) \le b\Bigl(1 - \frac a2\Bigr) < g(k)
= g(k + nl)\,\mu^{(n)}(nl) + \sum_{m \ne nl} g(k + m)\,\mu^{(n)}(m)
\le g(k + nl)\,\mu^{(n)}(nl) + \sum_{m \ne nl} b\,\mu^{(n)}(m)
= g(k + nl)\,\mu^{(n)}(nl) + b\bigl(1 - \mu^{(n)}(nl)\bigr).
\]
Simplifying, we get $g(k + nl) > \frac b2$ for every $n < N$, as proposed. This completes the proof of (8.3).
If $h \in H^\infty$ then we can apply (8.3) both to $h$ and to $-h$ and obtain
\[ h(k + l) = h(k) \quad\text{for every } k \in \mathbb Z^d \text{ and every } l \in \operatorname{supp}(\mu). \]
Now let $k \in \mathbb Z^d$ be arbitrary. By irreducibility, we can find $n > 0$ and elements $l_1, \dots, l_n \in \operatorname{supp}(\mu)$ such that $k = l_1 + \dots + l_n$. Then by the above,
\[ h(0) = h(l_1) = h(l_1 + l_2) = \dots = h(l_1 + \dots + l_{n-1}) = h(k), \]
and $h$ is constant. $\square$
Besides the constant functions, it is easy to spot another class of functions on $\mathbb Z^d$ that are harmonic for the transition operator (8.1). For $c \in \mathbb R^d$, let
\[ \varphi_c(k) = e^{c \cdot k}, \quad k \in \mathbb Z^d. \qquad(8.4) \]
(Here, $c \cdot k$ denotes the standard scalar product in $\mathbb R^d$.) Then $\varphi_c$ is $P$-integrable if and only if
\[ \varphi(c) = \sum_{k\in\mathbb Z^d} e^{c \cdot k}\, \mu(k) \qquad(8.5) \]
is finite, and in this case,
\[ P\varphi_c(k) = \sum_{l\in\mathbb Z^d} e^{c \cdot k}\, e^{c \cdot (l-k)}\, \mu(l - k) = \varphi(c)\, \varphi_c(k). \qquad(8.6) \]
Note that $\varphi(c) = P\varphi_c(0)$. If $\varphi(c) = 1$ then $\varphi_c$ is a positive harmonic function. For the reference point $o$, our natural choice is the origin $0$ of $\mathbb Z^d$, so that $\varphi_c(o) = 1$.
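These formulas are easy to verify numerically; a sketch with an arbitrary choice of $\mu$ (ours, not from the text): for the biased nearest-neighbour walk on $\mathbb Z$ with $\mu(1) = p$, $\mu(-1) = q$, one has $\varphi(c) = p e^c + q e^{-c}$, and $c = \log(q/p)$ satisfies $\varphi(c) = 1$:

```python
import math

# Biased nearest-neighbour walk on Z: mu(+1) = p, mu(-1) = q, p + q = 1.
p, q = 0.7, 0.3
mu = {1: p, -1: q}

def phi(c):
    """phi(c) = sum_k e^{c k} mu(k), as in (8.5)."""
    return sum(math.exp(c * k) * mass for k, mass in mu.items())

def P(f):
    """Transition operator (8.1): Pf(k) = sum_m f(k + m) mu(m)."""
    return lambda k: sum(f(k + m) * mass for m, mass in mu.items())

c = math.log(q / p)                # phi(c) = p*(q/p) + q*(p/q) = q + p = 1
f_c = lambda k: math.exp(c * k)    # the exponential function phi_c of (8.4)

for k in range(-3, 4):             # check (8.6): P phi_c = phi(c) * phi_c
    assert abs(P(f_c)(k) - phi(c) * f_c(k)) < 1e-12
assert abs(phi(c) - 1.0) < 1e-12   # so phi_c is a positive harmonic function
```

So for this walk the exponentials $e^{c\cdot k}$ with $c \in \{0, \log(q/p)\}$ are exactly the normalized harmonic exponentials.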
8.7 Theorem. The minimal harmonic functions for the transition operator (8.1) are precisely the functions $\varphi_c$ with $\varphi(c) = 1$.

Proof. A. Let $h$ be a minimal harmonic function. For $l \in \mathbb Z^d$, set $h_l(k) = h(k+l)/h(l)$. Then, as in the proof of Theorem 8.2,
\[ P h_l(k) = \frac{1}{h(l)} \sum_{m \in \mathbb Z^d} h(k + m + l)\,\mu(m) = h_l(k). \]
Hence $h_l \in \mathcal B$. If $n \ge 1$, by iterating (8.1), we can write the identity $P^n h = h$ as
\[ h(k) = \sum_{l \in \mathbb Z^d} h_l(k)\, h(l)\, \mu^{(n)}(l). \]
That is, $h = \sum_l a_l\, h_l$ with $a_l = h(l)\,\mu^{(n)}(l)$, and $h$ is a convex combination of the functions $h_l$ with $a_l > 0$ (which happens precisely when $l \in \operatorname{supp}(\mu^{(n)})$). By minimality of $h$ we must have $h_l = h$ for each $l \in \operatorname{supp}(\mu^{(n)})$. This is true for every $n$, and $h_l = h$ for every $l \in \mathbb Z^d$, that is,
\[ h(k + l) = h(k)\, h(l) \quad\text{for all } k, l \in \mathbb Z^d. \]
Now let $e_i$ be the $i$-th unit vector in $\mathbb Z^d$ ($i = 1, \dots, d$) and $c \in \mathbb R^d$ the vector whose $i$-th coordinate is $c_i = \log h(e_i)$. Then
\[ h(k) = e^{c \cdot k}, \]
and harmonicity of $h$ implies $\varphi(c) = 1$.
B. Conversely, let $c \in \mathbb R^d$ with $\varphi(c) = 1$. Consider the $\varphi_c$-process:
\[ p_{\varphi_c}(k, l) = p(k, l)\, \frac{\varphi_c(l)}{\varphi_c(k)} = e^{c \cdot (l-k)}\, \mu(l - k). \]
These are the transition probabilities of the random walk on $\mathbb Z^d$ whose law is the probability distribution $\mu_c$, where
\[ \mu_c(k) = e^{c \cdot k}\, \mu(k), \quad k \in \mathbb Z^d. \]
We can apply Theorem 8.2 to $\mu_c$ in the place of $\mu$, and infer that all bounded harmonic functions with respect to $P_{\varphi_c}$ are constant. Corollary 7.11 yields that $\varphi_c$ is minimal with respect to $P$. $\square$
8.8 Exercise. Check carefully all steps of the proofs to show that the last two theorems are also valid if, instead of irreducibility, one only assumes that $\operatorname{supp}(\mu)$ generates $\mathbb Z^d$ as a group.
Since we have developed Martin boundary theory only in the irreducible case, we return to this assumption. We set
\[ C = \{c \in \mathbb R^d : \varphi(c) = 1\}. \qquad(8.9) \]
This set is non-empty, since it contains $0$. By Theorem 8.7, the minimal Martin boundary is parametrised by $C$. For a sequence $(c_n)$ we have $c_n \to c$ if and only if $\varphi_{c_n} \to \varphi_c$ pointwise on $\mathbb Z^d$. Now recall that the topology on the Martin boundary is that of pointwise convergence of the Martin kernels $K(\cdot,\xi)$, $\xi \in M$. Thus, the bijection $C \to M_{\min}$ induced by $c \mapsto \varphi_c$ is a homeomorphism. Theorem 7.53 implies the following.
8.10 Corollary. For every positive function $h$ on $\mathbb Z^d$ which is harmonic with respect to $\mu$ there is a unique Borel measure $\nu$ on $C$ such that
\[ h(k) = \int_C e^{c \cdot k}\, d\nu(c) \quad\text{for all } k \in \mathbb Z^d. \]
8.11 Exercise (Alternative proof of Theorems 8.2 and 8.7). A shorter proof of the two theorems can be obtained as follows.
• Start with part A of the proof of Theorem 8.7, showing that every minimal harmonic function has the form $\varphi_c$ with $c \in C$.
• Arguing as before Corollary 8.10, infer that there is a subset $C'$ of $C$ such that the mapping $c \mapsto \varphi_c$ ($c \in C'$) induces a homeomorphism from $C'$ to $M_{\min}$. It follows that every positive harmonic function $h$ has a unique integral representation as in Corollary 8.10 with $\nu(C \setminus C') = 0$.
• Now consider a bounded harmonic function $h$. Show that the representing measure must be $\nu = c\,\delta_0$, a multiple of the point mass at $0$.
[Hint: if $\operatorname{supp}\nu \ne \{0\}$, show that the function $h$ cannot be bounded.]
Theorem 8.2 follows.
• Now conclude with part B of the proof of Theorem 8.7 without any change, showing a posteriori that $C' = C$.
We remark that the proof outlined in the last exercise is shorter than the road taken above only because it uses the highly non-elementary integral representation theorem. Thus, our completely elementary approach to the proofs of Theorems 8.2 and 8.7 is in reality more economic.

For the rest of this chapter, we assume, in addition to irreducibility, that $P$ has finite range, that is, $\operatorname{supp}(\mu)$ is finite. We analyze the properties of the function $\varphi$:
\[ \varphi(c) = \sum_{k\in\operatorname{supp}(\mu)} e^{c \cdot k}\, \mu(k) \]
is a finite sum of exponentials, hence a convex function, defined and differentiable on the whole of $\mathbb R^d$. (It is of course also convex on the set where it is finite when $\mu$ does not have finite support.)
8.12 Lemma. $\lim_{|c| \to \infty} \varphi(c) = \infty$.

Proof. (8.6) implies that $P^n \varphi_c = \varphi(c)^n\, \varphi_c$. Hence, applying (8.5) and (8.6) to the probability measure $\mu^{(n)}$,
\[ \varphi(c)^n = \sum_{k\in\operatorname{supp}(\mu^{(n)})} e^{c \cdot k}\, \mu^{(n)}(k). \]
By irreducibility, we can find $n \in \mathbb N$ and $\delta > 0$ such that
\[ \sum_{k=1}^n \mu^{(k)}(\pm e_i) \ge \delta \quad\text{for } i = 1, \dots, d. \]
Therefore
\[ \sum_{k=1}^n \varphi(c)^k \ge \sum_{k=1}^n \sum_{i=1}^d \Bigl(e^{c \cdot e_i}\, \mu^{(k)}(e_i) + e^{-c \cdot e_i}\, \mu^{(k)}(-e_i)\Bigr) \ge \delta \sum_{i=1}^d \bigl(e^{c_i} + e^{-c_i}\bigr), \]
from which the lemma follows. $\square$
224 Chapter 8. Minimal harmonic functions on Euclidean lattices

8.13 Exercise. Deduce the following. The set {c ∈ ℝ^d : φ(c) ≤ 1} is compact and convex, and its topological boundary is the set C of (8.9). Furthermore, the function φ assumes its absolute minimum in the unique point c_min which is the solution of the equation

    Σ_{k ∈ supp(μ)} e^{c·k} μ(k) k = 0.

In particular,

    c_min = 0 ⟺ μ̄ = 0,   where μ̄ = Σ_{k ∈ ℤ^d} μ(k) k.

(The vector μ̄ is the average displacement of the random walk in one step.) Thus

    C = {0} ⟺ μ̄ = 0.

Otherwise, c_min belongs to the interior of the set {φ ≤ 1}, and the latter is homeomorphic to the closed unit ball in ℝ^d, while the boundary C is homeomorphic to the unit sphere S^{d−1} = {u ∈ ℝ^d : |u| = 1} in ℝ^d.
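To make the exercise concrete, here is a small numerical illustration (our own, not from the book): for d = 1 and the measure μ(1) = 0.7, μ(−1) = 0.3, we have φ(c) = 0.7e^c + 0.3e^{−c}. The sketch locates c_min by bisection on φ′ and then the two points of C = {φ = 1}, which form the "sphere" S⁰ of the d = 1 case.

```python
import math

# Illustrative sketch (our own example): phi(c) = sum_k e^{c k} mu(k)
# for mu(1) = 0.7, mu(-1) = 0.3 on Z; find c_min and the set C = {phi = 1}.
mu = {1: 0.7, -1: 0.3}

def phi(c):
    return sum(math.exp(c * k) * p for k, p in mu.items())

def dphi(c):  # derivative: sum_k e^{c k} mu(k) k, as in Exercise 8.13
    return sum(math.exp(c * k) * p * k for k, p in mu.items())

def bisect(f, a, b, tol=1e-12):
    # plain bisection; assumes f changes sign on [a, b]
    while b - a > tol:
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

c_min = bisect(dphi, -10.0, 10.0)            # unique minimum of phi
# Since mu_bar = 0.4 != 0, phi(c_min) < 1, and C consists of exactly
# two points (here c = ln(3/7) and c = 0), one on each side of c_min:
c_left = bisect(lambda c: phi(c) - 1, -10.0, c_min)
c_right = bisect(lambda c: phi(c) - 1, c_min, 10.0)
```

Here c_min = ½ ln(3/7) solves 0.7e^c − 0.3e^{−c} = 0, and φ(c_min) = 2√0.21 < 1, in line with the exercise.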
8.14 Corollary. Let μ be a probability measure on ℤ^d that gives rise to an irreducible random walk.

(1) If μ̄ = 0 then all positive μ-harmonic functions are constant, and the minimal Martin boundary consists of a single point.

(2) Otherwise, the minimal Martin boundary M_min is homeomorphic with the set C and with the unit sphere S^{d−1} in ℝ^d.
So far, we have determined the minimal Martin boundary M_min and its topology, and we know how to describe all positive harmonic functions with respect to μ. These results are due to Doob, Snell and Williamson [18], Choquet and Deny [11] and Hennequin [31]. However, they do not provide the complete knowledge of the full Martin compactification. We still have the following questions. (a) Do there exist non-minimal elements in the Martin boundary? (b) What is the topology of the full Martin compactification of ℤ^d with respect to μ? This includes, in particular, the problem of determining the directions of convergence in ℤ^d along which the functions f_c (c ∈ C) arise as pointwise limits of Martin kernels. The answers to these questions require very serious work. They are due to Ney and Spitzer [45]. Here, we display the results without proof.
8.15 Theorem. Let μ be a probability measure with finite support on ℤ^d which gives rise to an irreducible random walk which is transient (⟺ μ̄ ≠ 0 or d ≥ 3).

(a) If μ̄ = 0 then the Martin compactification coincides with the one-point compactification of ℤ^d.

(b) If μ̄ ≠ 0 then the Martin boundary is homeomorphic with the unit sphere S^{d−1}. The topology of the Martin compactification is obtained as the closure of the immersion of ℤ^d into the unit ball via the mapping

    k ↦ k / (1 + |k|).

If (k_n) is a sequence in ℤ^d such that k_n/(1 + |k_n|) → u ∈ S^{d−1} then K(·, k_n) → f_c, where c is the unique vector in ℝ^d such that φ(c) = 1 and the gradient ∇φ(c) is collinear with u.

The proof of this theorem in the original paper of Ney and Spitzer [45] requires a large amount of subtle use of characteristic function theory. A shorter proof, also quite subtle (communicated to the author by M. Babillot), is presented in §25.B of the monograph [W2].
In particular, we see from Theorem 8.15 that M = M_min. From Theorem 8.2 we know that the Poisson boundary is always trivial. In the case μ̄ ≠ 0 it is easy to find the boundary point to which (Z_n) converges almost surely in the topology of the Martin compactification. Indeed, by the law of large numbers,

    (1/n) Z_n → μ̄   almost surely.

Therefore

    Z_n / (1 + |Z_n|) → μ̄ / |μ̄|   almost surely.
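This convergence of directions is easy to watch numerically. The following simulation sketch is our own illustration (the step distribution is an arbitrary choice with mean displacement μ̄ = (0.4, 0.2)):

```python
import math
import random

# Simulate n steps of a random walk on Z^2 with mu((1,0)) = 0.5,
# mu((0,1)) = 0.3, mu((-1,0)) = 0.1, mu((0,-1)) = 0.1, so mu_bar = (0.4, 0.2);
# compare Z_n/(1 + |Z_n|) with the limit direction mu_bar/|mu_bar|.
random.seed(1)  # fixed seed, for a reproducible illustration
steps = [((1, 0), 0.5), ((0, 1), 0.3), ((-1, 0), 0.1), ((0, -1), 0.1)]
n = 200_000
x = y = 0
for _ in range(n):
    u = random.random()
    acc = 0.0
    for (dx, dy), prob in steps:
        acc += prob
        if u < acc:
            x += dx
            y += dy
            break

norm = 1 + math.hypot(x, y)                 # the immersion k -> k/(1+|k|)
direction = (x / norm, y / norm)
mb = (0.4, 0.2)                             # mean displacement mu_bar
target = (mb[0] / math.hypot(*mb), mb[1] / math.hypot(*mb))
```

For n this large, `direction` is already very close to `target` ≈ (0.894, 0.447).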
In the following figure, we illustrate the Martin compactification in the case d = 2 and μ̄ ≠ 0. It is a fish-eye's view of the world, seen from the origin.
Figure 20
Chapter 9

Nearest neighbour random walks on trees

In this chapter we shall study a large class of Markov chains for which many computations are accessible: we assume that the transition matrix P is adapted to a specific graph structure of the underlying state space X. Namely, the graph Γ(P) of Definition 1.6 is supposed to be a tree, so that we speak of a random walk on that tree. In general, the use of the term random walk refers to a Markov chain whose transition probabilities are adapted in some way to a graph or group structure that the state space carries. Compare with random walks on groups (4.18), where adaptedness is expressed in terms of the group operation. If we start with a locally finite graph, then simple random walk is the standard example of a Markov chain adapted to the graph structure, see Example 4.3. More generally, we can consider nearest neighbour random walks, such that

    p(x, y) > 0 if and only if x ∼ y,   (9.1)

where x ∼ y means that the vertices x and y are neighbours.

Trees lend themselves particularly well to computations with generating functions. At the basis stands Proposition 1.43 (b), concerning cut points. We shall be mainly interested in infinite trees, but will also come back to finite ones.
Figure 21
A Basic facts and computations

Recall that a tree T is a finite or infinite, symmetric graph that is connected and contains no cycle. (A cycle in a graph is a sequence [x_0, x_1, ..., x_{k−1}, x_k = x_0] such that k ≥ 3, the vertices x_0, ..., x_{k−1} are distinct, and x_{i−1} ∼ x_i for i = 1, ..., k.) All our trees have to be finite or denumerable.
We choose and fix a reference point (root) o ∈ T. (Later, o will also serve as the reference point in the construction of the Martin kernel.) We write |x| = d(x, o) for the length of x ∈ T. If |x| ≥ 1 then we define the predecessor x⁻ of x to be the unique neighbour of x which is closer to the root: |x⁻| = |x| − 1.
In a tree, a geodesic arc or path is a finite sequence π = [x_0, x_1, ..., x_{k−1}, x_k] of distinct vertices such that x_{j−1} ∼ x_j for j = 1, ..., k. A geodesic ray or just ray is a one-sided infinite sequence π = [x_0, x_1, x_2, ...] of distinct vertices such that x_{j−1} ∼ x_j for all j ∈ ℕ. A two-sided infinite geodesic or just geodesic is a sequence π = [..., x_{−2}, x_{−1}, x_0, x_1, x_2, ...] of distinct vertices such that x_{j−1} ∼ x_j for all j ∈ ℤ. In each of those cases, we have d(x_i, x_j) = |j − i| for the graph distance in T. An infinite tree that is not locally finite may not possess any ray. (For example, it can be an infinite star, where a root has infinitely many neighbours, all of which have degree 1.)

A crucial property of a tree is that for every pair of vertices x, y, there is a unique geodesic arc π(x, y) starting at x and ending at y.
If x, y ∈ T are distinct then we define the associated cone of T as

    T_{x,y} = {w ∈ T : y ∈ π(x, w)}.

(That is, w lies behind y when seen from x.) This is (or spans) a subtree of T. See Figure 22. (The boundary ∂T_{x,y} appearing in that figure will be explained in the next section.)
Figure 22
On some occasions, it will be (technically) convenient to consider the following variants of the cones. Let [x, y] be an (oriented) edge of T. We augment T_{x,y} by x and write T_{[x,y]} for the resulting subtree, which we call a branch of T. Given P on T, we define P_{[x,y]} on T_{[x,y]} by

    p_{[x,y]}(v, w) = p(v, w), if v, w ∈ T_{[x,y]}, v ≠ x;   p_{[x,y]}(x, y) = 1.   (9.2)

Then F(y, x|z) is the same for P on T and for P_{[x,y]} on T_{[x,y]}. Indeed, if the random walk starts at y it cannot leave T_{[x,y]} before arriving at x. (Compare with Exercise 2.12.)
The following is the basic ingredient for dealing with nearest neighbour random walks on trees.

9.3 Proposition. (a) If x, y ∈ T and w lies on the geodesic arc π(x, y) then

    F(x, y|z) = F(x, w|z) F(w, y|z).

(b) If x ∼ y then

    F(y, x|z) = p(y, x) z + Σ_{w : w∼y, w≠x} p(y, w) z F(w, y|z) F(y, x|z)
              = p(y, x) z / ( 1 − Σ_{w : w∼y, w≠x} p(y, w) z F(w, y|z) ).

Proof. Statement (a) follows from Proposition 1.43 (b), because w is a cut point between x and y. Statement (b) follows from (a) and Theorem 1.38:

    F(y, x|z) = p(y, x) z + Σ_{w : w∼y, w≠x} p(y, w) z F(w, x|z),

and F(w, x|z) = F(w, y|z) F(y, x|z) by (a). □
9.4 Exercise. Deduce from Proposition 9.3 (b) that

    F′(y, x|z) = F(y, x|z)² / ( p(y, x) z² ) + Σ_{w : w∼y, w≠x} ( F(y, x|z)² p(y, w) / p(y, x) ) F′(w, y|z).

(This will be used in Section H.)
On the basis of these formulas, there is a simple algorithm to compute all the generating functions on a finite tree T. They are determined by the functions F(y, x|z), where x and y run through all ordered pairs of neighbours in T.

If y is a leaf of T, that is, a vertex which has a unique neighbour x in T, then F(y, x|z) = p(y, x) z. (In the stochastic case, we have p(y, x) = 1, but below we shall also refer to the case when P is substochastic.)

Otherwise, Proposition 9.3 (b) says that F(y, x|z) is computed in terms of the functions F(w, y|z), where w varies among the neighbours of y that are distinct from x. For the latter, one has to consider the sub-branches T_{[y,w]} of T_{[x,y]}. Their sizes are all smaller than that of T_{[x,y]}. In this way, the size of the branches reduces step by step until one arrives at the leaves. We describe the algorithm, in which every oriented edge [x, y] is recursively labeled by F(x, y|z).
(1) For each leaf y and its unique neighbour x, label the oriented edge [y, x] with F(y, x|z) = p(y, x) z (= z when P is stochastic).

(2) Take any edge [y, x] which has not yet been labeled, but such that all the edges [w, y] (where w ≠ x) already carry their label F(w, y|z). Label [y, x] with the rational function

    F(y, x|z) = p(y, x) z / ( 1 − Σ_{w : w∼y, w≠x} p(y, w) z F(w, y|z) ).

Since the tree is finite, the algorithm terminates after |E(T)| steps, where E(T) is the set of oriented edges (two oppositely oriented edges between any pair of neighbours). After that, one can use Proposition 9.3 (a) to compute F(x, y|z) for arbitrary x, y ∈ T: if π(x, y) = [x = x_0, x_1, ..., x_k = y] then

    F(x, y|z) = F(x_0, x_1|z) F(x_1, x_2|z) ⋯ F(x_{k−1}, x_k|z)

is the product of the labels along the edges of π(x, y). Next, recall Theorem 1.38:

    U(x, x|z) = Σ_{y∼x} p(x, y) z F(y, x|z)

is obtained from the labels of the ingoing edges at x ∈ T, and finally

    G(x, y|z) = F(x, y|z) / ( 1 − U(y, y|z) ).
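The labeling algorithm is straightforward to implement. The following sketch (our own, with a toy path 0 – 1 – 2 as the finite tree) computes the labels F(y, x|z) at a fixed numeric value of z by the recursion of step (2), and then assembles G(x, y|z) from them:

```python
# Sketch (our own toy example, not from the book): label every oriented
# edge (y, x) of a finite tree with F(y, x | z), evaluated at numeric z,
# using the recursion of Proposition 9.3 (b); then compute G(x, y | z).

def first_passage_values(nbrs, p, z):
    """nbrs: vertex -> list of neighbours (a finite tree);
    p: (v, w) -> transition probability; z: a number.
    Returns F[(y, x)] = F(y, x | z) for every oriented edge [y, x]."""
    F = {}
    def f(y, x):
        if (y, x) not in F:
            # at a leaf y the sum is empty, so F(y, x | z) = p(y, x) z there
            s = sum(p[(y, w)] * z * f(w, y) for w in nbrs[y] if w != x)
            F[(y, x)] = p[(y, x)] * z / (1 - s)
        return F[(y, x)]
    for y in nbrs:
        for x in nbrs[y]:
            f(y, x)
    return F

def green(nbrs, p, z, path):
    """G(x, y | z) for x = path[0], y = path[-1], where `path` is the
    geodesic arc pi(x, y): product of labels, divided by 1 - U(y, y | z)."""
    F = first_passage_values(nbrs, p, z)
    y = path[-1]
    Fxy = 1.0
    for a, b in zip(path, path[1:]):
        Fxy *= F[(a, b)]
    U = sum(p[(y, w)] * z * F[(w, y)] for w in nbrs[y])
    return Fxy / (1 - U)

# simple random walk on the path 0 - 1 - 2
nbrs = {0: [1], 1: [0, 2], 2: [1], }
p = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.5, (2, 1): 1.0}
```

The recursion terminates because each call moves to a strictly smaller sub-branch, exactly as in the size argument above; for a symbolic answer one would carry z as an indeterminate instead of a number.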
The reader is invited to carry out these computations for her/his favorite examples of random walks on finite trees. The method can also be extended to certain classes of infinite trees and random walks, see below. In a similar way, one can compute the expected hitting times E_y(t^x), where x, y ∈ T. If x ≠ y, this is F′(y, x|1). However, here it is better to proceed differently, by first computing the stationary measure. The following is true for any tree, finite or not.
9.5 Fact. P is reversible.

Indeed, for our root vertex o ∈ T, we define m(o) = 1. Then we can construct the reversing measure m recursively:

    m(x) = m(x⁻) p(x⁻, x) / p(x, x⁻).

That is, if π(o, x) = [o = x_0, x_1, ..., x_{k−1}, x_k = x] then

    m(x) = ( p(x_0, x_1) p(x_1, x_2) ⋯ p(x_{k−1}, x_k) ) / ( p(x_1, x_0) p(x_2, x_1) ⋯ p(x_k, x_{k−1}) ).   (9.6)
9.7 Exercise. Change of base point: write m_o for the measure of (9.6) with respect to the root o. Verify that when we choose a different point y as the root, then

    m_o(x) = m_o(y) m_y(x).

The measure m of (9.6) is not a probability measure. We remark that one need not always use (9.6) in order to compute m. Also, for reversibility, we are usually only interested in that measure up to multiplication with a constant. For simple random walk on a locally finite tree, we can always use m(x) = deg(x), and for a symmetric random walk, we can also take m to be the counting measure.
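The recursion (9.6) amounts to one outward sweep from the root. A minimal sketch (our own example tree and probabilities), together with a check of the reversibility relation m(x) p(x, y) = m(y) p(y, x):

```python
from collections import deque

# Sketch (our own example): compute the reversing measure m of (9.6)
# by m(o) = 1 and m(x) = m(x-) p(x-, x) / p(x, x-), sweeping outward
# from the root o with a breadth-first search.

def reversing_measure(nbrs, p, o):
    m = {o: 1.0}
    q = deque([o])
    while q:
        v = q.popleft()
        for w in nbrs[v]:
            if w not in m:              # v is the predecessor of w
                m[w] = m[v] * p[(v, w)] / p[(w, v)]
                q.append(w)
    return m

# toy example: the path 0 - 1 - 2 with a drift away from 0 at vertex 1
nbrs = {0: [1], 1: [0, 2], 2: [1]}
p = {(0, 1): 1.0, (1, 0): 0.3, (1, 2): 0.7, (2, 1): 1.0}
m = reversing_measure(nbrs, p, 0)
```

Here m(1) = 1/0.3 and m(2) = 0.7/0.3, and m(v) p(v, w) = m(w) p(w, v) holds on every edge.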
9.8 Proposition. The random walk is positive recurrent if and only if the measure m of (9.6) satisfies m(T) = Σ_{x∈T} m(x) < ∞. In this case,

    E_x(t^x) = m(T) / m(x).

When y ∼ x then

    E_y(t^x) = m(T_{x,y}) / ( m(y) p(y, x) ).

Proof. The criterion for positive recurrence and the formula for E_x(t^x) are those of Lemma 4.2.
For the last statement, we assume first that y = o and x ∼ o. We mentioned already that E_o(t^x) = E_o(t^x_{[x,o]}), where t^x_{[x,o]} is the first passage time to the point x for the random walk on the branch T_{[x,o]} with transition probabilities given by (9.2). If the latter walk starts at x, then its first step goes to o. Therefore

    E_x(t^x_{[x,o]}) = 1 + E_o(t^x).

Now the measure m_{[x,o]} with respect to base point o that makes the random walk on the branch T_{[x,o]} reversible is given by

    m_{[x,o]}(w) = m(w), if w ∈ T_{[x,o]} \ {x},   and   m_{[x,o]}(x) = p(o, x).

Applying to the branch what we stated above for the random walk on the whole tree, we have

    E_x(t^x_{[x,o]}) = m_{[x,o]}(T_{[x,o]}) / m_{[x,o]}(x) = ( m(T_{x,o}) + p(o, x) ) / p(o, x).

Therefore

    E_o(t^x) = m(T_{x,o}) / p(o, x).

Finally, if y is arbitrary and x ∼ y then we can use Exercise 9.7 to see how the formula has to be adapted when we change the base point from o to y. □
Let x, y ∈ T be distinct and π(y, x) = [y = y_0, y_1, ..., y_k = x]. By Exercise 1.45,

    E_y(t^x) = Σ_{j=1}^k E_{y_{j−1}}(t^{y_j}).

Thus, we have nice explicit formulas for all the expected first passage times in the positive recurrent case, and in particular, when the tree is finite.
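These formulas are easy to evaluate in practice. A sketch (our own toy example: SRW on the path 0 – 1 – 2 – 3, so m(x) = deg(x)) that sums the one-step formula E_a(t^b) = m(T_{b,a}) / (m(a) p(a, b)) along the geodesic from y to x:

```python
# Sketch (our own example): expected first passage times for simple
# random walk on a finite tree via Proposition 9.8 and Exercise 1.45.

nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # the path 0 - 1 - 2 - 3

def cone(x, y):
    """Vertex set of T_{x,y}: the part of the tree behind y, seen from x
    (i.e. the component of y after removing the edge [x, y])."""
    seen, stack = {y}, [y]
    while stack:
        v = stack.pop()
        for w in nbrs[v]:
            if w != x and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def m(v):
    return len(nbrs[v])                 # m = deg for simple random walk

def expected_hit(path):
    """E_y(t^x) for y = path[0], x = path[-1], path = pi(y, x):
    add E_a(t^b) = m(T_{b,a}) / (m(a) p(a, b)) over consecutive pairs."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        p_ab = 1 / len(nbrs[a])         # SRW transition probability
        total += sum(m(v) for v in cone(b, a)) / (m(a) * p_ab)
    return total
```

For this path, E_1(t^0) = 5 and E_3(t^0) = 1 + 3 + 5 = 9; the total mass m(T) = 6 gives the return time E_o(t^o) = 6 = 2(|T| − 1), matching item b) of the cat example below.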
Let us now answer the questions of Example 1.3 (the cat), which concerns the simple random walk on a finite tree.

a) The probability that the cat will ever return to the root vertex o is 1, since the random walk is recurrent (as T is finite).

The probability that it will return at the n-th step (and not before) is u^(n)(o, o). The explicit computation of that number may be tedious, depending on the structure of the tree. In principle, one can proceed by computing the rational function U(o, o|z) via the algorithm described above: start at the leaves and compute recursively all functions F(x, x⁻|z). Since deg(o) = 1 in our example, we have U(o, o|z) = z F(v, o|z), where v is the unique vertex with v⁻ = o. Then one expands that rational function as a power series and reads off the n-th coefficient.

b) The average time that the cat needs to return to o is E_o(t^o). Since m(x) = deg(x) and deg(o) = 1, we get E_o(t^o) = Σ_{x∈T} deg(x) = 2(|T| − 1). Indeed, the sum of the vertex degrees is the number of oriented edges, which is twice the number of non-oriented edges. In a tree T, that last number is |T| − 1.
c) The probability that the cat returns to the root before visiting a certain subset Y of the set of leaves of the tree can be computed as follows: cut off the leaves of that set, and consider the resulting truncated Markov chain, which is substochastic at the truncation points. For that new random walk on the truncated tree T∘ we have to compute the functions F∘(x, x⁻) = F∘(x, x⁻|1) following the same algorithm as above, but now starting with the leaves of T∘. (The superscript ∘ refers of course to the truncated random walk.) Then the probability that we are looking for is U∘(o, o) = F∘(v, o), where v is the unique neighbour of o.

A different approach to the same question is as follows: the probability to return to o before visiting the chosen set Y of leaves is the same as the probability to reach o from v before visiting Y. The latter is related with the Dirichlet problem for finite Markov chains, see Theorem 6.7. We have to set ∂T = {o} ∪ Y and T∘ = T \ ∂T. Then we look for the unique harmonic function h in H(T∘, P) that satisfies h(o) = 1 and h(y) = 0 for all y ∈ Y. The value h(v) is the required probability. We remark that the same problem also has a nice interpretation in terms of electric currents and voltages, see [D-S].
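For simple random walk, "harmonic" means equal to the average of the neighbouring values, so the Dirichlet problem can be solved by repeated averaging. A sketch with our own toy example (path 0 – 1 – 2 – 3, ∂T = {0, 3}, boundary values h(0) = 1, h(3) = 0):

```python
# Sketch (our own example): solve the Dirichlet problem for SRW on a
# finite tree by Gauss-Seidel sweeps. Interior values converge to the
# unique harmonic function with the given boundary values.

nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # the path 0 - 1 - 2 - 3
boundary = {0: 1.0, 3: 0.0}                     # h(o) = 1, h = 0 on Y

h = {v: boundary.get(v, 0.5) for v in nbrs}     # arbitrary interior start
for _ in range(200):                            # repeated averaging
    for v in nbrs:
        if v not in boundary:
            h[v] = sum(h[w] for w in nbrs[v]) / len(nbrs[v])
```

Here h(1) = 2/3 and h(2) = 1/3: starting at the neighbour of the root, the walk reaches o before the leaf 3 with probability 2/3. (On a tree the same values also drop out of the F∘ recursion; the averaging view is the electric-network one of [D-S].)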
d) We ask for the expected number of visits to a given leaf y before returning to o. This is L(o, y) = L(o, y|1), as defined in (3.57). Since p(o, v) = 1, this number is the same as the expected number of visits to y before visiting o, when the random walk starts at v. This time, we chop off the root o and consider the resulting truncated random walk on the new tree T∘ = T \ {o}, which is strictly substochastic at v. Then the number we are looking for is G∘(v, y) = G∘(v, y|1), where G∘ is the Green function of the truncated walk. This can again be computed via the algorithm described above. Note that U∘(y, y) = F∘(y⁻, y), since y⁻ is the only neighbour of the leaf y. Therefore

    G∘(v, y) = F∘(v, y) / ( 1 − F∘(y⁻, y) ).

One can again relate this to the Dirichlet problem. This time, we have to set ∂T = {o, y}. We look for the unique function h that is harmonic on T∘ = T \ ∂T which satisfies h(o) = 0 and h(y) = 1. Then h(x) = F∘(x, y) for every x, so that

    G∘(v, y) = h(v) / ( 1 − h(y⁻) ).
B The geometric boundary of an infinite tree

After this prelude we shall now concentrate on infinite trees and transient nearest neighbour random walks. We want to see how the boundary theory developed in Chapter 7 can be implemented in this situation. For this purpose, we first describe the natural geometric compactification of an infinite tree.

9.9 Exercise. Show that a locally finite, infinite tree possesses at least one ray.

As mentioned above, an infinite tree that is not locally finite might not possess any ray. In the sequel we assume to have a tree that does possess rays. We do not assume local finiteness, but it may be good to keep in mind that case. A ray describes a path from its starting point x_0 to infinity. Thus, it is natural to use rays in order to distinguish different directions of going to infinity. We have to clarify when two rays define the same point at infinity.
9.10 Definition. Two rays π = [x_0, x_1, x_2, ...] and π′ = [y_0, y_1, y_2, ...] in the tree T are called equivalent, if their symmetric difference is finite, or equivalently, there are i, j ∈ ℕ_0 such that x_{i+n} = y_{j+n} for all n ≥ 0.

An end of T is an equivalence class of rays.
9.11 Exercise. • Prove that equivalence of rays is indeed an equivalence relation.

• Show that for every point x ∈ T and end ξ of T, there is a unique geodesic ray π(x, ξ) starting at x that is a representative of ξ (as an equivalence class).

• Show that for every pair of distinct ends ξ, η of T, there is a unique geodesic π(ξ, η) = [..., x_{−2}, x_{−1}, x_0, x_1, x_2, ...] such that π(x_0, ξ) = [x_0, x_{−1}, x_{−2}, ...] and π(x_0, η) = [x_0, x_1, x_2, ...].
Figure 23. Two equivalent rays.
We write ∂T for the set of all ends of T. Analogously, when x, y ∈ T are distinct, we define

    ∂T_{x,y} = {ξ ∈ ∂T : y ∈ π(x, ξ)},

compare with Figure 22. This is the set of ends of T which have a representative ray that lies in T_{x,y}. Equivalently, we may consider it as the set of ends of the tree T_{x,y}. (Attention: work out why this identification is legitimate!)

When x = o, the root, then we just write T_y = T_{o,y} = T_{y⁻,y} and ∂T_y = ∂T_{o,y} = ∂T_{y⁻,y}.

If ξ, η ∈ T ∪ ∂T are distinct, then their confluent ξ ∧ η is the vertex with maximal length on π(o, ξ) ∩ π(o, η), see Figure 24. If on the other hand ξ = η, then we set ξ ∧ η = η. The only case in which ξ ∧ η is not a vertex of T is when ξ = η ∈ ∂T.

For vertices, we have

    |x ∧ y| = ½ ( |x| + |y| − d(x, y) ),   x, y ∈ T.
Figure 24
9.12 Lemma. For all . j. T L dT ,
[ . j[ _ min{[ . [. [ . j[].
Proof. We assume without loss of generality that . j. are distinct and set . = .j
and , = . . Then we either have , (o. .) or , (.. ) \ {.].
If , (o. .) then
[ . j[ = [.[ _ [,[ = min{[ . [

= [,[
. [ . j[

_ [,[
].
234 Chapter 9. Nearest neighbour random walks on trees
If $y \in \pi(x,\xi) \setminus \{x\}$ then $|\zeta \wedge \eta| = |x|$, and the statement also holds. (The reader is invited to visualize the two cases by figures.) $\square$

We can use the confluents in order to define a new metric on $T \cup \partial T$:
\[
\theta(\xi,\eta) = \begin{cases} e^{-|\xi \wedge \eta|}, & \text{if } \xi \ne \eta,\\ 0, & \text{if } \xi = \eta. \end{cases}
\]
9.13 Proposition. $\theta$ is an ultrametric on $T \cup \partial T$, that is,
\[
\theta(\xi,\eta) \le \max \{ \theta(\xi,\zeta),\, \theta(\zeta,\eta) \} \quad \text{for all } \xi, \eta, \zeta \in T \cup \partial T.
\]
Convergence of a sequence $(x_n)$ of vertices or ends in that metric is as follows.
If $x \in T$ then $x_n \to x$ if and only if $x_n = x$ for all $n \ge n_0$.
If $\xi \in \partial T$ then $x_n \to \xi$ if and only if $|x_n \wedge \xi| \to \infty$.
$T$ is a discrete, dense subset of this metric space. The space is totally disconnected: every point has a neighbourhood base consisting of open and closed sets. If the tree $T$ is locally finite, then $T \cup \partial T$ is compact.

Proof. The ultrametric inequality follows from Lemma 9.12, so that $\theta$ is a metric.

Let $x \in T$ and $|x| = k$. Then $|x \wedge \xi| \le k$ and consequently $\theta(x,\xi) \ge e^{-k}$ for every $\xi \in T \cup \partial T$ with $\xi \ne x$. Therefore the open ball with radius $e^{-k}$ centred at $x$ contains only $x$. This means that $T$ is discrete.

The two statements about convergence are now immediate, and the second implies that $T$ is dense.

A neighbourhood base of $\xi \in \partial T$ is given by the family of all sets
\[
\{ \eta \in T \cup \partial T : \theta(\xi,\eta) < e^{-k+1} \}
= \{ \eta \in T \cup \partial T : |\xi \wedge \eta| > k-1 \}
= \{ \eta \in T \cup \partial T : |\xi \wedge \eta| \ge k \}
= \{ \eta \in T \cup \partial T : \theta(\xi,\eta) \le e^{-k} \}
= T_{x_k} \cup \partial T_{x_k},
\]
where $k \in \mathbb N$ and $x_k$ is the point on $\pi(o,\xi)$ with $|x_k| = k$. These sets are open as well as closed balls.
Finally, assume that $T$ is locally finite. Since $T$ is dense, it is sufficient to show that every sequence in $T$ has a convergent subsequence in $T \cup \partial T$. Let $(x_n)$ be such a sequence. If there is $x \in T$ such that $x_n = x$ for infinitely many $n$, then we have a subsequence converging to $x$. We now exclude this case.

There are only finitely many cones $T_y = T_{o,y}$ with $y \sim o$. Since $x_n \ne o$ for all but finitely many $n$, there must be $y_1 \sim o$ such that $T_{y_1}$ contains infinitely many of the $x_n$. Again, there are only finitely many cones $T_w$ with $w^- = y_1$, so that there must be $y_2$ with $y_2^- = y_1$ such that $T_{y_2}$ contains infinitely many of the $x_n$. We now proceed inductively and construct a sequence $(y_k)$ such that $y_{k+1}^- = y_k$ and each cone $T_{y_k}$ contains infinitely many of the $x_n$. Then $\pi = [o, y_1, y_2, \dots]$ is a ray. If $\xi$ is the end represented by $\pi$, then it is clearly an accumulation point of $(x_n)$. $\square$
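The metric $\theta$ is easy to experiment with. In the following sketch (our own illustration, not from the book), a vertex is encoded as the string of child labels along the geodesic from the root $o$ (the empty string), so that the confluent of two vertices is just the longest common prefix of their label strings:

```python
import math
from itertools import takewhile

# Vertices encoded as strings of child labels; the root is o = "".
# Then pi(o, u) is the set of prefixes of u, and the confluent u ∧ v
# is the longest common prefix of the two label strings.

def confluent(u: str, v: str) -> str:
    """The confluent u ∧ v: longest common prefix of u and v."""
    return "".join(a for a, _ in takewhile(lambda p: p[0] == p[1], zip(u, v)))

def d(u: str, v: str) -> int:
    """Graph distance; consistent with |u ∧ v| = (|u| + |v| - d(u,v)) / 2."""
    c = len(confluent(u, v))
    return (len(u) - c) + (len(v) - c)

def theta(u: str, v: str) -> float:
    """theta(u, v) = exp(-|u ∧ v|) if u != v, and 0 if u == v."""
    return 0.0 if u == v else math.exp(-len(confluent(u, v)))

# Ultrametric inequality of Proposition 9.13 on a few sample points:
pts = ["", "0", "01", "010", "02", "1", "120"]
for a in pts:
    for b in pts:
        for c in pts:
            assert theta(a, c) <= max(theta(a, b), theta(b, c)) + 1e-12
```

The discreteness of $T$ is visible here as well: $\theta(x,\cdot) \ge e^{-|x|}$ on all points other than $x$ itself.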
If $T$ is locally finite, then we set
\[
\widehat T = T \cup \partial T,
\]
and this is a compactification of $T$ in the sense of Section 7.B. The ideal boundary of $T$ is $\partial T$.

However, when $T$ is not locally finite, $T \cup \partial T$ is not compact. Indeed, if $y$ is a vertex which has infinitely many neighbours $x_n$, $n \in \mathbb N$, then $(x_n)$ has no convergent subsequence in $T \cup \partial T$. This defect can be repaired as follows. Let
\[
T^{\infty} = \{ y \in T : \deg(y) = \infty \}
\]
be the set of all vertices with infinite degree. We introduce a new set
\[
T^* = \{ y^* : y \in T^{\infty} \},
\]
disjoint from $T \cup \partial T$, such that the mapping $y \mapsto y^*$ is one-to-one on $T^{\infty}$. We call the elements of $T^*$ the improper vertices. Then we define
\[
\widehat T = T \cup \partial^* T, \quad \text{where } \partial^* T = T^* \cup \partial T. \tag{9.14}
\]
For distinct points $x, y \in T$, we define $\widehat T_{x,y}$ accordingly: it consists of all vertices in $T_{x,y}$, of all improper vertices $w^*$ with $w \in T_{x,y}$, and of all ends $\xi \in \partial T$ which have a representative ray that lies in that cone. The topology on $\widehat T$ is such that each singleton $\{x\}$, $x \in T$, is open (so that $T$ is discrete). A neighbourhood base of $\xi \in \partial T$ is given by all $\widehat T_{x,y}$ that contain $\xi$, and it is sufficient to take only the sets $\widehat T_y = \widehat T_{o,y}$, where $y \in \pi(o,\xi) \setminus \{o\}$. A neighbourhood base of $y^* \in T^*$ is given by the family of all sets that are finite intersections of sets $\widehat T_{v,w}$ that contain $y$, and it is sufficient to take just all the finite intersections of sets of the form $\widehat T_{x,y}$, where $x \sim y$.
We explain what convergence of a sequence $(w_n)$ of elements of $T \cup \partial T$ means in this topology.
We have $w_n \to x \in T$ precisely when $w_n = x$ for all but finitely many $n$.
We have $w_n \to \xi \in \partial T$ precisely when $|w_n \wedge \xi| \to \infty$.
We have $w_n \to y^* \in T^*$ precisely when for every finite set $A$ of neighbours of $y$, one has $\pi(y, w_n) \cap A = \emptyset$ for all but finitely many $n$. (That is, $w_n$ "rotates" around $y$.)
Convergence of a sequence $(x_n^*)$ of improper vertices is as follows.
We have $x_n^* \to x^* \in T^*$ or $x_n^* \to \xi \in \partial T$ precisely when, in the above sense, $x_n \to x^*$ or $x_n \to \xi$, respectively.

9.15 Exercise. Verify in detail that this is indeed the convergence of sequences induced by the topology introduced above. Prove that $\widehat T$ is compact.

The following is a useful criterion for convergence of a sequence of vertices to an end.

9.16 Lemma. A sequence $(x_n)$ of vertices of $T$ converges to an end if and only if
\[
|x_n \wedge x_{n+1}| \to \infty.
\]
Proof. The "only if" is straightforward and left to the reader.

For sufficiency of the criterion, we first observe that confluents can also be defined when improper vertices are involved: for $x^*, w^* \in T^*$, $y \in T$, and $\xi \in \partial T$,
\[
x^* \wedge y = x \wedge y, \quad x^* \wedge w^* = x \wedge w, \quad \text{and} \quad x^* \wedge \xi = x \wedge \xi.
\]
Let $(x_n)$ satisfy $|x_n \wedge x_{n+1}| \to \infty$. By Lemma 9.12,
\[
|x_m \wedge x_n| \ge \min \{ |x_k \wedge x_{k+1}| : k = m, \dots, n-1 \},
\]
which tends to $\infty$ as $m \to \infty$ and $n > m$. Suppose that $(x_n)$ has two distinct accumulation points $\eta, \zeta$ in $\widehat T$. Let $v = \eta \wedge \zeta$, a vertex of $T$. Then there must be infinitely many $m$ and $n > m$ such that $x_m \wedge x_n = v$, a contradiction. Therefore $(x_n)$ must have a limit in $\widehat T$. This cannot be a vertex. If it were an improper vertex $x^*$, then again $x_m \wedge x_n = x$ for infinitely many $m$ and $n > m$, another contradiction. Therefore the limit must be an end. $\square$
The reader may note that the difficulty that has led us to introducing the improper vertices arises because we insist that $T$ has to be discrete in our compactification. The following will also be needed later on for the study of random walks, both when the tree is locally finite and when it is not.

A function $f \colon T \to \mathbb R$ is called locally constant if the set of edges
\[
\{ [x,y] \in E(T) : f(x) \ne f(y) \}
\]
is finite.

9.17 Exercise. Show that the locally constant functions constitute a linear space $\mathcal L$ which is spanned by the set
\[
\mathcal L_0 = \{ \mathbf 1_{T_{x,y}} : [x,y] \in E(T) \}
\]
of the indicator functions of all branches of $T$. (Once more, edges are oriented here!)

We can now apply Theorem 7.13: there is a unique compactification of $T$ which contains $T$ as a discrete, dense subset with the following properties: (a) every function in $\mathcal L_0$, and therefore also every function in $\mathcal L$, extends continuously, and (b) for every pair of points $\xi \ne \eta$ in the associated ideal boundary, there is an extended function that assumes different values at $\xi$ and $\eta$.

It is now obvious that this is just the compactification described above, $\widehat T = T \cup T^* \cup \partial T$. Indeed, the continuous extension of $\mathbf 1_{T_{x,y}}$ is just $\mathbf 1_{\widehat T_{x,y}}$, and it is clear that those functions separate the points of $\partial^* T$.
C Convergence to ends and identification of the Martin boundary

As one may expect in view of the efforts undertaken in the last section, we shall show that for a transient nearest neighbour random walk on a tree $T$, the Martin compactification coincides with the geometric compactification. At the end of Section 7.B, we pointed out that in most known concrete examples where one is able to achieve that goal, one can also give a direct proof of boundary convergence. In the present case, this is particularly simple.
9.18 Theorem. Let $(Z_n)$ be a transient nearest neighbour random walk with stochastic transition matrix on the infinite tree $T$. Then $Z_n$ converges almost surely to a random end of $T$: there is a $\partial T$-valued random variable $Z_{\infty}$ such that in the topology of the geometric compactification $\widehat T$,
\[
\lim_{n \to \infty} Z_n = Z_{\infty} \quad \operatorname{Pr}_x\text{-almost surely for every starting point } x \in T.
\]
Proof. Consider the following subset of the trajectory space $\Omega = T^{\mathbb N_0}$:
\[
\Omega_0 = \bigl\{ \omega = (x_n)_{n \ge 0} \in \Omega : x_n \sim x_{n-1},\;
|\{ n \in \mathbb N_0 : x_n = y \}| < \infty \text{ for every } y \in T \bigr\}.
\]
Then $\operatorname{Pr}_x(\Omega_0) = 1$ for every starting point $x$, and of course $Z_n(\omega) = x_n$. Now let $\omega = (x_n) \in \Omega_0$. We define recursively a subsequence $(x_{\tau_k})$, starting with
\[
\tau_0 = \max \{ n : x_n = x_0 \}, \quad \text{and} \quad
\tau_{k+1} = \max \{ n : x_n = x_{\tau_k + 1} \}. \tag{9.19}
\]
By construction of $\Omega_0$, $\tau_k$ is well defined for each $k$. The points $x_{\tau_k}$, $k \in \mathbb N_0$, are all distinct, and $x_{\tau_{k+1}} \sim x_{\tau_k}$. Thus $[x_{\tau_0}, x_{\tau_1}, x_{\tau_2}, \dots]$ is a ray and defines an end $\xi$ of $T$. Also by construction, $x_n \in T_{x_0, x_{\tau_k}}$ for all $k \ge 1$ and all $n \ge \tau_k$. Therefore $x_n \to \xi$. $\square$
In probabilistic terminology, the exit times $\tau_k$ are non-negative random variables (but not stopping times). If we recall the notation of (7.34), then $\tau_k = e_{B(x_0,k)}$, the exit time from the ball $B(x_0,k) = \{ y \in T : d(y,x_0) \le k \}$ in the graph metric of $T$. We see that by transience, those exit times are almost surely finite even in the case when $T$ is not locally finite and the balls may be infinite. The following is of course quite obvious from the beginning.
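The recursion (9.19) can be run on a concrete trajectory. The sketch below is our own illustration (vertices encoded as label strings from the root, as earlier in our examples); it assumes the finite list `traj` already contains all visits to the listed vertices, which is eventually the case along a transient path:

```python
def exit_times(traj):
    """Exit times of (9.19): tau_0 = max{n : x_n = x_0},
    tau_{k+1} = max{n : x_n = x_{tau_k + 1}}.
    Assumes every vertex appearing in traj has all of its visits inside traj."""
    last_visit = {}
    for n, x in enumerate(traj):
        last_visit[x] = n                  # final occurrence of each vertex
    taus = [last_visit[traj[0]]]
    while taus[-1] + 1 < len(traj):
        taus.append(last_visit[traj[taus[-1] + 1]])
    return taus

def extracted_ray(traj):
    """The ray [x_{tau_0}, x_{tau_1}, ...] from the proof of Theorem 9.18."""
    return [traj[t] for t in exit_times(traj)]

# A short trajectory with backtracking; root "" is left for good at time 2:
traj = ["", "0", "", "1", "10", "1", "10", "100", "1000"]
assert extracted_ray(traj) == ["", "1", "10", "100", "1000"]
```

Successive entries of the extracted ray differ by exactly one label, i.e. they are neighbours, exactly as in the proof.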
9.20 Corollary. If the tree $T$ contains no geodesic ray, then every nearest neighbour random walk on $T$ is recurrent.

Note that when $T$ is not locally finite, then it may well happen that its diameter $\sup \{ d(x,y) : x, y \in T \}$ is infinite, while $T$ possesses no ray.
Let us now study the Martin compactification. Setting $z = 1$ in Proposition 9.3 (a), we obtain for the Martin kernel with respect to the root $o$,
\[
K(x,y) = \frac{F(x,y)}{F(o,y)}
= \frac{F(x, x \wedge y)\, F(x \wedge y, y)}{F(o, x \wedge y)\, F(x \wedge y, y)}
= \frac{F(x, x \wedge y)}{F(o, x \wedge y)} = K(x, x \wedge y).
\]
If we write $\pi(o,x) = [o = v_0, v_1, \dots, v_k = x]$ then
\[
K(x,y) = \begin{cases}
K(x,o) = F(x,o), & \text{for } y \in T_{v_1, o},\\
K(x, v_j), & \text{for } y \in T_{v_j} \setminus T_{v_{j+1}} \quad (j = 1, \dots, k-1),\\
K(x,x) = 1/F(o,x), & \text{for } y \in T_x.
\end{cases}
\]
This can be rewritten as
\[
K(x,y) = K(x,o) + \sum_{j=1}^{k} \bigl( K(x,v_j) - K(x,v_{j-1}) \bigr)\, \mathbf 1_{T_{v_j}}(y). \tag{9.21}
\]
We see that $K(x, \cdot)$ is a locally constant function, which leads us to the following.

9.22 Theorem. Let $P$ be the stochastic transition matrix of a transient nearest neighbour random walk on the infinite tree $T$. Then the Martin compactification of $(T,P)$ coincides with the geometric compactification $\widehat T$. The continuous extension of the Martin kernel to the Martin boundary $\partial^* T = T^* \cup \partial T$ is given by
\[
K(x, y^*) = K(x,y) \text{ for } y^* \in T^*, \quad \text{and} \quad K(x,\xi) = K(x, x \wedge \xi) \text{ for } \xi \in \partial T.
\]
Each function $K(\cdot, \xi)$, where $\xi \in \partial T$, is harmonic. The minimal Martin boundary is the space of ends $\partial T$ of $T$.
Proof. We know from the preceding computation that for each $x \in T$, the kernel $K(x, \cdot)$ is locally constant on $T$. Therefore it has a continuous extension to $\widehat T$. This extension is the one stated in the theorem: when $y_n \to y^*$ then $x \wedge y_n = x \wedge y$ for all but finitely many $n$, and $K(x, y_n) = K(x, x \wedge y) = K(x,y)$ for those $n$. Analogously, if $y_n \to \xi$ then $x \wedge y_n = x \wedge \xi$, and thus $K(x, y_n) = K(x, x \wedge \xi)$ for all but finitely many $n$.
We next show that $K(\cdot, \xi)$ is harmonic when $\xi \in \partial T$. (When $T$ is locally finite, this is clear, because all extended kernels are harmonic when $P$ has finite range.) Let $x \in T$ and let $v$ be the point on $\pi(o,\xi)$ with $v^- = x \wedge \xi$. Then $K(y,\xi) = K(y,v)$ for all $y \sim x$, and $K(x,\xi) = K(x,v)$. Now the function $K(\cdot, v) = G(\cdot, v)/G(o,v)$ is harmonic in every point except $v$. In particular, it is harmonic in $x$. Therefore $K(\cdot, \xi)$ is harmonic in every $x$.

The extended kernel separates the boundary points:

1.) If $w^*, y^* \in T^*$ are distinct, then the functions $K(\cdot, w^*) = K(\cdot, w)$ and $K(\cdot, y^*) = K(\cdot, y)$ are distinct, since the first is harmonic everywhere except at $w$, while the second is not harmonic at $y$.

2.) If $y^* \in T^*$ and $\xi \in \partial T$ then $K(\cdot, \xi)$ is harmonic, while $K(\cdot, y^*)$ is strictly superharmonic at $y$. Therefore the two functions do not coincide.

3.) The interesting case is the one where $\xi, \eta \in \partial T$ are distinct. Let $x = \xi \wedge \eta$ and let $y$ be the neighbour of $x$ on $\pi(x,\eta)$, see Figure 25.
[Figure 25]
Then, since $F(o,y) = F(o,x)F(x,y)$,
\[
\frac{K(y,\xi)}{K(y,\eta)} = \frac{K(y,x)}{K(y,y)}
= \frac{F(o,y)\,F(y,x)}{F(o,x)} = F(x,y)F(y,x) \le U(x,x)
\]
by Exercise 1.44. Now $U(x,x) < 1$ by transience, so that $K(\cdot, \xi) \ne K(\cdot, \eta)$.

We see that the Martin compactification is $\widehat T$. The last step is to show that $K(\cdot, \xi)$ is a minimal harmonic function for every end $\xi \in \partial T$. Suppose that
\[
K(\cdot, \xi) = a\, h_1 + (1-a)\, h_2 \quad (0 < a < 1)
\]
for two positive harmonic functions with $h_i(o) = 1$.
Let $x \in T$, and let $y$ be an arbitrary point on the ray $\pi(x,\xi)$. Lemma 6.53, applied to $A = \{y\}$, implies that $h(x) \ge F(x,y)h(y)$ for every positive harmonic function $h$. [This can be seen more directly: $G_h(x,y) = G(x,y)h(y)/h(x)$ is the Green kernel associated with the $h$-process, and $G_h(x,y) = F_h(x,y)\,G_h(y,y) \le G_h(y,y)$. Dividing by $G_h(y,y)$ and multiplying by $h(x)$, one gets the inequality.]

On the other hand, our choice of $y$ implies, by Proposition 9.3 (a) as so often, that $K(x,\xi) = F(x,y)K(y,\xi)$. Therefore
\[
K(x,\xi) = a\, h_1(x) + (1-a)\, h_2(x)
\ge a\, F(x,y)h_1(y) + (1-a)\, F(x,y)h_2(y)
= F(x,y)\,K(y,\xi) = K(x,\xi).
\]
The inequality in the middle cannot be strict, and we deduce that
\[
F(x,y)\,h_i(y) = h_i(x) \quad (i = 1,2)
\]
for all $x \in T$ and all $y \in \pi(x,\xi)$. In particular, choose $y = x \wedge \xi$. Then, applying the last formula to the points $x$ and $o$ (in the place of $x$),
\[
h_i(x) = \frac{h_i(x)}{h_i(o)} = \frac{F(x,y)\,h_i(y)}{F(o,y)\,h_i(y)} = K(x,y) = K(x,\xi).
\]
We conclude that $K(\cdot, \xi)$ is minimal. $\square$
We now study the family of limit distributions $(\nu_x)_{x \in T}$ on the space of ends $\partial T$, given by
\[
\nu_x(B) = \operatorname{Pr}_x[Z_{\infty} \in B], \quad B \text{ a Borel set in } \partial T.
\]
(We can also consider $\nu_x$ as a measure on the compact set $\partial^* T = T^* \cup \partial T$ that does not charge $T^*$.) For any fixed $x$, the Borel $\sigma$-algebra of $\partial T$ is generated by the family of sets $\partial T_{x,y}$, where $y$ varies in $T \setminus \{x\}$. Therefore $\nu_x$ is determined by the measures of those sets.
9.23 Proposition. Let $x, y \in T$ be distinct, and let $w$ be the neighbour of $y$ on the arc $\pi(x,y)$. Then
\[
\nu_x(\partial T_{x,y}) = F(x,y)\, \frac{1 - F(y,w)}{1 - F(w,y)F(y,w)}.
\]
Proof. Note that $\partial T_{x,y} = \partial T_{w,y}$. If $Z_0 = x$ and $Z_{\infty} \in \partial T_{x,y}$ then $s^y < \infty$: the random walk has to pass through $y$, since $y$ is a cut point between $x$ and $T_{x,y}$. Therefore, via the Markov property,
\[
\nu_x(\partial T_{x,y}) = \operatorname{Pr}_x[Z_{\infty} \in \partial T_{x,y},\, s^y < \infty]
= \sum_{n=1}^{\infty} \operatorname{Pr}_x[Z_{\infty} \in \partial T_{x,y} \mid s^y = n]\, \operatorname{Pr}_x[s^y = n]
= \sum_{n=1}^{\infty} \operatorname{Pr}_y[Z_{\infty} \in \partial T_{x,y}]\, \operatorname{Pr}_x[s^y = n]
= F(x,y)\, \nu_y(\partial T_{w,y}) = F(x,y) \bigl( 1 - \nu_y(\partial T_{y,w}) \bigr),
\]
since $\partial T_{w,y} = \partial T \setminus \partial T_{y,w}$. In particular,
\[
\nu_w(\partial T_{w,y}) = F(w,y) \bigl( 1 - \nu_y(\partial T_{y,w}) \bigr)
\quad \text{and} \quad
\nu_y(\partial T_{y,w}) = F(y,w) \bigl( 1 - \nu_w(\partial T_{w,y}) \bigr).
\]
From these two equations, we compute
\[
\nu_w(\partial T_{w,y}) = F(w,y)\, \frac{1 - F(y,w)}{1 - F(w,y)F(y,w)}.
\]
Finally, we have $\nu_x(\partial T_{x,y}) = F(x,w)\, \nu_w(\partial T_{w,y})$, since the random walk starting at $x$ must pass through $w$ when $Z_{\infty} \in \partial T_{x,y}$. Recalling that $F(x,y) = F(x,w)F(w,y)$, this leads to the proposed formula. $\square$
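As a quick sanity check (ours, not from the book), the two linear equations in the proof can be solved for arbitrary values $a = F(w,y)$ and $b = F(y,w)$ with $ab < 1$, and the solution compared against the first equation and the second equation directly, in exact rational arithmetic:

```python
from fractions import Fraction

def nu_pair(a, b):
    """Solve u = a(1 - v), v = b(1 - u) for
    u = nu_w(dT_{w,y}) and v = nu_y(dT_{y,w}),
    given a = F(w,y) and b = F(y,w) with ab < 1."""
    u = a * (1 - b) / (1 - a * b)
    v = b * (1 - a) / (1 - a * b)
    return u, v

# Exact check with some rational hitting probabilities:
for a in (Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)):
    for b in (Fraction(1, 2), Fraction(2, 5)):
        u, v = nu_pair(a, b)
        assert u == a * (1 - v)   # first of the two equations
        assert v == b * (1 - u)   # second of the two equations
```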
The next formula follows from the fact that $\nu_x(\partial T) = 1$:
\[
\sum_{y \,:\, y \sim x} F(x,y)\, \frac{1 - F(y,x)}{1 - F(x,y)F(y,x)} = 1 \quad \text{for every } x \in T. \tag{9.24}
\]
We are primarily interested in $\nu_o = \nu^{\mathbf 1}$, the probability measure on $\partial T$ that appears in the integral representation of the constant harmonic function $\mathbf 1$. As mentioned already, the sets $\partial T_x = \partial T_{o,x}$, where $x \ne o$, are a base of the topology on $\partial T$. By the above,
\[
\nu_o(\partial T_x) = F(o,x)\, \frac{1 - F(x,x^-)}{1 - F(x^-,x)F(x,x^-)}, \tag{9.25}
\]
where (recall) $x^-$ is the predecessor of $x$ on $\pi(o,x)$. We see that the support of $\nu_o$ is
\[
\operatorname{supp}(\nu_o)
= \{ \xi \in \partial T : \nu_o(\partial T_x) > 0 \text{ for all } x \in \pi(o,\xi),\ x \ne o \}
= \{ \xi \in \partial T : F(x,x^-) < 1 \text{ for all } x \in \pi(o,\xi),\ x \ne o \}.
\]
We call an end $\xi$ of $T$ transient if $F(x,x^-) < 1$ for all $x \in \pi(o,\xi) \setminus \{o\}$, and we call it recurrent otherwise. This terminology is justified by the following observations: if $x \sim y$ then the function $z \mapsto F(y,x|z)$ is the generating function associated with the first passage time to $x$ for the random walk $P_{x,y}$ on the branch $T_{x,y}$. Thus, $F(y,x) = 1$ if and only if $P_{x,y}$ on the branch $T_{x,y}$ is recurrent. In this case we shall also say, more sloppily, that the cone $T_{x,y}$ is recurrent, and call it transient otherwise. We see that an end $\xi$ is transient if and only if each cone $T_x$ with $x \in \pi(o,\xi) \setminus \{o\}$ is transient.

9.26 Exercise. Show that when $x \sim y$ and $F(y,x) = 1$, then $F(w,v) = 1$ for all $v, w \in T_{x,y}$ with $v \sim w$ and $d(v,y) < d(w,y)$. Thus, when $T_{x,y}$ is recurrent, then all ends in $\partial T_{x,y}$ are recurrent.

Conclude that transience of an end does not depend on the choice of the root $o$. $\square$
We see that for a transient random walk on $T$, there must be at least one transient end, and that all the measures $\nu_x$, $x \in T$, have the same support, which consists precisely of all transient ends. This describes the Poisson boundary. Furthermore, the transient ends together with the origin $o$ span a subtree $T^{\mathrm{tr}}$ of $T$, the transient skeleton of $(T,P)$. Namely, by the above,
\[
T^{\mathrm{tr}} = \{o\} \cup \{ x \ne o : F(x,x^-) < 1 \} \tag{9.27}
\]
is such that when $x \in T^{\mathrm{tr}} \setminus \{o\}$ then $x^- \in T^{\mathrm{tr}}$, and $x$ must have at least one forward neighbour. Then, by construction, $\partial T^{\mathrm{tr}}$ consists precisely of all transient ends. Therefore $Z_{\infty} \in \partial T^{\mathrm{tr}}$ $\operatorname{Pr}_x$-almost surely for every $x$. On its way to the limit at infinity, the random walk $(Z_n)$ can of course make substantial detours into $T \setminus T^{\mathrm{tr}}$.

Note that $T^{\mathrm{tr}}$ depends on the choice of the root, $T^{\mathrm{tr}} = T^{\mathrm{tr}}_o$. For $x \sim o$, we have three cases.

(a) If $F(x,o) < 1$ and $F(o,x) < 1$, then $T^{\mathrm{tr}}_x = T^{\mathrm{tr}}_o$.

(b) If $F(x,o) < 1$ but $F(o,x) = 1$, then $T^{\mathrm{tr}}_x = T^{\mathrm{tr}}_o \setminus \{o\}$.

(c) If $F(x,o) = 1$ but $F(o,x) < 1$, then $T^{\mathrm{tr}}_x = T^{\mathrm{tr}}_o \cup \{x\}$.

Proceeding by induction on the distance $d(o,x)$, we see that $T^{\mathrm{tr}}_x$ and $T^{\mathrm{tr}}_o$ only differ by at most finitely many vertices from the geodesic segment $\pi(o,x)$.
It may also be instructive to spend some thoughts on $T^{\mathrm{tr}} = T^{\mathrm{tr}}_o$ as an induced subnetwork of $T$ in the sense of Exercise 4.54:
\[
a(x,y) = m(x)\,p(x,y), \quad \text{if } x, y \in T^{\mathrm{tr}},\ x \sim y.
\]
With respect to those conductances, we get new transition probabilities that are reversible with respect to a new measure $m^{\mathrm{tr}}$, namely
\[
m^{\mathrm{tr}}(x) = \sum_{y \sim x \,:\, y \in T^{\mathrm{tr}}} a(x,y) = m(x)\,p(x, T^{\mathrm{tr}})
\quad \text{and} \quad
p^{\mathrm{tr}}(x,y) = p(x,y)/p(x, T^{\mathrm{tr}}), \quad \text{if } x, y \in T^{\mathrm{tr}},\ x \sim y. \tag{9.28}
\]
The resulting random walk is $(Z_n)$, conditioned to stay in $T^{\mathrm{tr}}$.
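The renormalization (9.28) is a pointwise operation. Under the hedge that the encoding below (a transition matrix as a nested dict, a subtree as a vertex set) is our own and not the book's, it can be sketched as:

```python
def induced_walk(p, subtree):
    """Restrict transition probabilities p (dict: x -> {y: p(x,y)}) to the
    vertex set `subtree` and renormalize as in (9.28):
    p_tr(x, y) = p(x, y) / p(x, subtree)."""
    p_tr = {}
    for x in subtree:
        mass = sum(q for y, q in p[x].items() if y in subtree)   # p(x, T^tr)
        p_tr[x] = {y: q / mass for y, q in p[x].items() if y in subtree}
    return p_tr

# Toy example: star with centre "o" and leaves "a", "b", "c"; drop "c".
p = {
    "o": {"a": 0.25, "b": 0.25, "c": 0.5},
    "a": {"o": 1.0}, "b": {"o": 1.0}, "c": {"o": 1.0},
}
p_tr = induced_walk(p, {"o", "a", "b"})
```

Each row of the restricted matrix is again stochastic; here `p_tr["o"]` puts mass 1/2 on each of "a" and "b".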
Let us now look at some examples.

9.29 Example (The homogeneous tree). This is the tree $T = T_q$ where every vertex has degree $q+1$, with $q \ge 2$. (When $q = 1$ this is $\mathbb Z$, visualized as the two-way-infinite path.)
[Figure 26]
We consider the simple random walk. There are many different ways to see that it is transient. For example, one can use the flow criterion of Theorem 4.51. We choose a root $o$ and define the flow $\phi$ by $\phi(e) = 1/\bigl( (q+1)q^{n-1} \bigr) = -\phi(\check e)$ if $e = [x^-, x]$ with $|x| = n$. Then it is straightforward that $\phi$ is a flow from $o$ to $\infty$ with input $1$ and with finite power.

The hitting probability $F(x,y)$ must be the same for every pair of neighbours $x, y$. Thus, formula (9.24) becomes $(q+1)\,F(x,y)\bigl(1 - F(x,y)\bigr)\big/\bigl(1 - F(x,y)^2\bigr) = 1$, whence $F(x,y) = 1/q$. For an arbitrary pair of vertices (not necessarily neighbours), we get
\[
F(x,y) = q^{-d(x,y)}, \quad x, y \in T.
\]
We infer that the distribution of $Z_{\infty}$, given that $Z_0 = o$, is equidistribution on $\partial T$:
\[
\nu_o(\partial T_x) = \frac{1}{(q+1)\,q^{n-1}}, \quad \text{if } |x| = n \ge 1.
\]
We call it equidistribution because it is invariant under "rotations" of the tree around $o$. (In graph theoretical terminology, it is invariant under the group of all self-isometries of the tree that fix $o$.) Every end is transient, that is, the Poisson boundary (as a set) is $\operatorname{supp}(\nu_o) = \partial T$.
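A numerical cross-check of this example (our own, in exact rational arithmetic): plugging $F(x,y) = q^{-d(x,y)}$ into (9.25) should reproduce $\nu_o(\partial T_x) = 1/\bigl((q+1)q^{n-1}\bigr)$, and the $(q+1)q^{n-1}$ cones at level $n$ should carry total mass $1$:

```python
from fractions import Fraction

def nu_o(q, n):
    """nu_o(dT_x) for |x| = n >= 1 on T_q, computed from (9.25)
    with F(o,x) = q^(-n) and F = 1/q between neighbours."""
    F_ox = Fraction(1, q) ** n
    F = Fraction(1, q)
    return F_ox * (1 - F) / (1 - F * F)

for q in (2, 3, 5):
    for n in (1, 2, 4):
        assert nu_o(q, n) == Fraction(1, (q + 1) * q ** (n - 1))
    # the (q+1)q^(n-1) cones at level n partition dT:
    assert (q + 1) * q ** 2 * nu_o(q, 3) == 1
```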
The Martin kernel at $\xi \in \partial T$ is given by
\[
K(x,\xi) = q^{-\operatorname{hor}(x,\xi)}, \quad \text{where } \operatorname{hor}(x,\xi) = d(x, x \wedge \xi) - d(o, x \wedge \xi).
\]
Below, we shall immediately come back to the function $\operatorname{hor}$. The Poisson-Martin integral representation theorem now says that for every positive harmonic function $h$ on $T = T_q$, there is a unique Borel measure $\nu^h$ on $\partial T$ such that
\[
h(x) = \int_{\partial T} q^{-\operatorname{hor}(x,\xi)} \, d\nu^h(\xi).
\]
Let us now give a geometric meaning to the function $\operatorname{hor}$. In an arbitrary tree, we can define for $x \in T$ and $\xi \in T \cup \partial T$ the Busemann function or horocycle index of $x$ with respect to $\xi$ by
\[
\operatorname{hor}(x,\xi) = d(x, x \wedge \xi) - d(o, x \wedge \xi). \tag{9.30}
\]

9.31 Exercise. Show that for $x \in T$ and $\xi \in \partial T$,
\[
\operatorname{hor}(x,\xi) = \lim_{y \to \xi} \operatorname{hor}(x,y) = \lim_{y \to \xi} \bigl( d(x,y) - d(o,y) \bigr).
\]

The Busemann function should be seen in analogy with classical hyperbolic geometry, where one starts with a geodesic ray $\gamma = \bigl( \gamma(t) \bigr)_{t \ge 0}$, that is, an isometric embedding of the interval $[0,\infty)$ into the hyperbolic plane (or another suitable metric space), and considers the Busemann function
\[
\operatorname{hor}(x,\gamma) = \lim_{t \to \infty} \Bigl( d\bigl(x, \gamma(t)\bigr) - d\bigl(o, \gamma(t)\bigr) \Bigr).
\]
A horocycle in a tree with respect to an end $\xi$ is a level set of the Busemann function $\operatorname{hor}(\cdot, \xi)$:
\[
\operatorname{Hor}_k = \operatorname{Hor}_k(\xi) = \{ x \in T : \operatorname{hor}(x,\xi) = k \}, \quad k \in \mathbb Z. \tag{9.32}
\]
This is the analogue of a horocycle in the hyperbolic plane: in the Poincaré disk model of the latter, a horocycle at a boundary point $\xi$ on the unit circle (the boundary of the hyperbolic disk) is a circle inside the disk that is tangent to the unit circle at $\xi$.
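With the same string encoding of vertices as before (our own convention: a vertex is the label sequence of the geodesic from the root, and an end is an infinite label sequence, approximated here by a long prefix), the horocycle index (9.30) is a two-line computation:

```python
def lcp(u: str, v: str) -> str:
    """Longest common prefix, i.e. the confluent of the geodesics from the root."""
    n = 0
    while n < min(len(u), len(v)) and u[n] == v[n]:
        n += 1
    return u[:n]

def hor(x: str, xi: str) -> int:
    """hor(x, xi) = d(x, x ∧ xi) - d(o, x ∧ xi); here xi is a long prefix of
    an end, assumed to extend strictly beyond the confluent x ∧ xi."""
    c = lcp(x, xi)                      # the confluent x ∧ xi
    return (len(x) - len(c)) - len(c)   # d(x, c) - d(o, c)

# Along the ray towards xi the index decreases by 1 at each step,
# so the horocycles Hor_k are the level sets of this "height":
xi = "0" * 50
assert [hor("0" * n, xi) for n in range(4)] == [0, -1, -2, -3]
```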
Let us consider another example.

9.33 Example. We now construct a new tree, here denoted again $T$, by attaching a ray ("hair") at each vertex of the homogeneous tree $T_q$, see Figure 27. Again, we consider simple random walk on this tree. It is transient by Exercise 4.54, since $T_q$ is a subgraph on which SRW is transient.
[Figure 27]
The hair attached at $x \in T_q$ has one end, for which we write $\eta_x$. Since simple random walk on a single ray (a standard birth-and-death chain with forward and backward probabilities equal to $1/2$) is recurrent, each of the ends $\eta_x$, $x \in T_q$, is recurrent.

Let $\hat x$ be the neighbour of $x$ on the hair at $x$. Then $F(\hat x, x) = 1$. Again, by homogeneity of the structure, $F(x,y)$ is the same for all pairs $x, y$ of neighbours in $T$ that belong both to $T_q$. We infer that $T^{\mathrm{tr}} = T_q$, and formula (9.24) at $x$ becomes
\[
(q+1)\, \frac{F(x,y)\bigl(1 - F(y,x)\bigr)}{1 - F(x,y)F(y,x)}
+ \underbrace{F(x,\hat x)\, \frac{1 - F(\hat x,x)}{1 - F(x,\hat x)F(\hat x,x)}}_{=\,0} = 1.
\]
Again, $F(x,y) = 1/q$ for neighbours $x, y \in T_q \subset T$. The limit distribution $\nu_o$ on $\partial T$ is again "uniform distribution" on $\partial T_q \subset \partial T$. What we mean here is that we know already that $\nu_o(\{ \eta_x : x \in T_q \}) = 0$, so that we can think of $\nu_o$ as a Borel probability measure on the compact subset $\partial T_q$ of $\partial T$, and $\nu_o$ is equidistributed on that set in the above sense. (Of course, $o$ is chosen to be in $T_q$.)
It may be noteworthy that in this example, the set $\{ \eta_x : x \in T_q \}$ of recurrent ends is dense in $\partial T$ in the topology of the geometric compactification of $T$. On the other hand, each $\eta_x$ is isolated, in that the singleton $\{\eta_x\}$ is open and closed.

Note also that of course each of the recurrent as well as of the transient ends $\xi$ defines a minimal harmonic function $K(\cdot,\xi)$. However, since $\nu_x(\eta_y) = 0$ for all $x, y$, every bounded harmonic function is constant on each hair. (This can also be easily verified directly via the linear recursion that a harmonic function satisfies on each hair.) That is, it arises from a bounded harmonic function $h$ for SRW on $T_q$ such that its value is $h(x)$ along the hair attached at $x$.
For computing the (extended) Martin kernel, we shall need $F(x, \hat x)$. We know that $F(y,x) = 1/q$ for all $y \sim x$, $y \in T_q$. By Proposition 9.3 (b),
\[
F(x, \hat x) = \frac{p(x,\hat x)}{1 - \sum_{y \sim x,\, y \in T_q} p(x,y)\,F(y,x)}
= \frac{\frac{1}{q+2}}{1 - \frac{q+1}{(q+2)\,q}} = \frac{q}{q^2 + q - 1}.
\]
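A quick cross-check of this computation (ours, in exact rational arithmetic; every vertex of $T_q$ has degree $q+2$ in $T$, namely $q+1$ tree neighbours plus the first hair vertex, so $p(x,\cdot) = 1/(q+2)$):

```python
from fractions import Fraction

def F_hat(q):
    """F(x, x^) on the hairy tree: p(x, x^) / (1 - sum over the q+1
    neighbours y in T_q of p(x,y) F(y,x)), with p = 1/(q+2), F(y,x) = 1/q."""
    p = Fraction(1, q + 2)
    return p / (1 - (q + 1) * p * Fraction(1, q))

for q in (2, 3, 4, 7):
    assert F_hat(q) == Fraction(q, q * q + q - 1)
```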
Now let $x, y \in T_q$ be distinct (not necessarily neighbours), $w \in \pi(x, \eta_x)$ (a generic point on the hair at $x$), and $\xi \in \partial T_q \subset \partial T$. We need to compute $K(w,\xi)$, $K(w, \eta_y)$ and $K(w, \eta_x)$ in order to cover all possible cases.

(i) We have $F(w,x) = 1$ by recurrence of the end $\eta_x$. Therefore
\[
K(w,\xi) = F(w,x)\,K(x,\xi) = K(x,\xi) = q^{-\operatorname{hor}(x,\xi)}.
\]

(ii) We have $w \wedge \eta_y = w \wedge y$. Therefore
\[
K(w, \eta_y) = K(w, w \wedge y) = K(w,y) = F(w,x)\,K(x,y) = K(x,y) = q^{-\operatorname{hor}(x,y)}.
\]

(iii) We have $w \wedge \eta_x = w$ for every $w \in \pi(x, \eta_x)$, so that $K(w, \eta_x) = K(w,w) = 1/F(o,w)$. In order to compute this explicitly, we use harmonicity of the function $h_x = K(\cdot, \eta_x)$. Write $\pi(x, \eta_x) = [x = w_0, w_1 = \hat x, w_2, \dots]$. Then
\[
h_x(w_0) = \frac{1}{F(o,x)} = q^{|x|}
\quad \text{and} \quad
h_x(w_1) = \frac{1}{F(o,x)\,F(x,\hat x)} = \frac{q^2+q-1}{q}\, q^{|x|},
\]
and $h_x(w_n) = \tfrac12 \bigl( h_x(w_{n-1}) + h_x(w_{n+1}) \bigr)$ for $n \ge 1$. This can be rewritten as
\[
h_x(w_{n+1}) - h_x(w_n) = h_x(w_n) - h_x(w_{n-1}) = h_x(w_1) - h_x(w_0) = \frac{q^2-1}{q}\, q^{|x|}.
\]
We conclude that $h_x(w_n) = h_x(w_0) + n \bigl( h_x(w_1) - h_x(w_0) \bigr)$, and find $K(w, \eta_x)$.
We also write the general formula for the kernel at $\eta_y$ (since $w$ varies and $\eta_y$ is fixed, we have to exchange the roles of $x$ and $y$ with respect to the above!):
\[
K(w, \eta_y) = \begin{cases}
q^{|y|} \Bigl( 1 + d(w,y)\, \dfrac{q^2-1}{q} \Bigr), & \text{if } w \in \pi(y, \eta_y),\\[1ex]
q^{-\operatorname{hor}(x,y)}, & \text{if } w \in \pi(x, \eta_x),\ x \in T_q,\ x \ne y.
\end{cases}
\]
Thus, for every positive harmonic function $h$ on $T$ there is a Borel measure $\nu^h$ on $\partial T$ such that
\[
h(w) = h_1(w) + h_2(w), \quad \text{where, for } w \in \pi(x, \eta_x),\ x \in T_q,
\]
\[
h_1(w) = \int_{\partial T_q} q^{-\operatorname{hor}(x,\xi)} \, d\nu^h(\xi) \quad \text{and}
\]
\[
h_2(w) = q^{|x|} \Bigl( 1 + d(w,x)\, \frac{q^2-1}{q} \Bigr)\, \nu^h(\eta_x)
+ \sum_{y \in T_q,\, y \ne x} q^{-\operatorname{hor}(x,y)}\, \nu^h(\eta_y).
\]
The function $h_1$ in this decomposition is constant on each hair.
D The integral representation of all harmonic functions
Before considering further examples, let us return to the general integral representation of harmonic functions. If $h$ is positive harmonic for a transient nearest neighbour random walk on a tree $T$, then the measure $\nu^h$ on $\partial T$ in the Poisson-Martin integral representation of $h$ is $h(o)$ times the limit distribution of the $h$-process, see (7.47). The Green function of the $h$-process is $G_h(x,y) = G(x,y)h(y)/h(x)$, and $G_h(x,x) = G(x,x)$. Furthermore, the associated hitting probabilities are $F_h(x,y) = F(x,y)h(y)/h(x)$. We can apply formula (9.25) to the $h$-process, replacing $F$ with $F_h$:
\[
\nu^h(\partial T_x) = F(o,x)\, \frac{h(x) - F(x,x^-)h(x^-)}{1 - F(x^-,x)F(x,x^-)}.
\]
Conversely, if we start with $\nu$ on $\partial T$, then the associated harmonic function is easily computed on the basis of (9.21). If $x \in T$ and the geodesic arc from $o$ to $x$ is $\pi(o,x) = [o = v_0, v_1, \dots, v_k = x]$ then
\[
h(x) = \int_{\partial T} K(x,\xi)\, d\nu
= K(x,o)\, \nu(\partial T)
+ \sum_{j=1}^{k} \bigl( K(x,v_j) - K(x,v_{j-1}) \bigr)\, \nu(\partial T_{v_j}). \tag{9.34}
\]
Note that $K(x,v_j) - K(x,v_{j-1}) = K(x,v_j) \bigl( 1 - F(v_j,v_{j-1})F(v_{j-1},v_j) \bigr)$.
We see that the integral in (9.34) takes a particularly simple form due to the fact that the integrand is the extension to the boundary of a locally constant function, and we do not need the full strength of Lebesgue's integration theory here. This will allow us to extend the Poisson-Martin representation to get an integral representation over the boundary of all (not necessarily positive) harmonic functions.

Before that, we need two further identities for generating functions that are specific to trees.

9.35 Lemma. For a transient nearest neighbour random walk on a tree $T$,
\[
G(x,x|z)\, p(x,y)\, z = \frac{F(x,y|z)}{1 - F(x,y|z)F(y,x|z)} \quad \text{if } y \sim x, \quad \text{and}
\]
\[
G(x,x|z) = 1 + \sum_{y \,:\, y \sim x} \frac{F(x,y|z)F(y,x|z)}{1 - F(x,y|z)F(y,x|z)}
\]
for all $z$ with $|z| < r(P)$, and also for $z = r(P)$.
Proof. We use Proposition 9.3 (b) (exchanging $x$ and $y$). It can be rewritten as
\[
F(x,y|z) = p(x,y)z + \bigl( U(x,x|z) - p(x,y)z\,F(y,x|z) \bigr) F(x,y|z).
\]
Regrouping,
\[
p(x,y)z \bigl( 1 - F(x,y|z)F(y,x|z) \bigr) = F(x,y|z) \bigl( 1 - U(x,x|z) \bigr). \tag{9.36}
\]
Since $G(x,x|z) = 1 / \bigl( 1 - U(x,x|z) \bigr)$, the first identity follows. For the second identity, we multiply the first one by $F(y,x|z)$ and sum over $y \sim x$ to get
\[
\sum_{y \,:\, y \sim x} \frac{F(x,y|z)F(y,x|z)}{1 - F(x,y|z)F(y,x|z)}
= \sum_{y \,:\, y \sim x} p(x,y)z\; G(y,x|z) = G(x,x|z) - 1
\]
by (1.34). $\square$
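A hedged sanity check of Lemma 9.35 at $z = 1$ (ours, not from the book) for SRW on the homogeneous tree $T_q$: there $p(x,y) = 1/(q+1)$ and $F(x,y) = F(y,x) = 1/q$ for neighbours, so $U(x,x) = (q+1)\cdot\frac{1}{q+1}\cdot\frac{1}{q} = 1/q$ and hence $G(x,x) = q/(q-1)$:

```python
from fractions import Fraction

def check_lemma_935(q):
    p = Fraction(1, q + 1)      # p(x,y) for SRW on T_q
    F = Fraction(1, q)          # F(x,y) = F(y,x) for neighbours
    U = (q + 1) * p * F         # U(x,x) = sum_y p(x,y) F(y,x) = 1/q
    G = 1 / (1 - U)             # G(x,x) = q/(q-1)
    # First identity at z = 1:  G(x,x) p(x,y) = F / (1 - F^2)
    assert G * p == F / (1 - F * F)
    # Second identity:  G(x,x) = 1 + (q+1) F^2 / (1 - F^2)
    assert G == 1 + (q + 1) * F * F / (1 - F * F)
    return G

for q in (2, 3, 5, 10):
    check_lemma_935(q)
```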
For the following, recall that $T_x = T_{o,x}$ for $x \ne o$. For convenience we write $T_o = T$.

A signed measure $\nu$ on the collection of all sets
\[
\mathcal F_o = \{ \partial T_x : x \in T \}
\]
is a set function $\nu \colon \mathcal F_o \to \mathbb R$ such that for every $x$,
\[
\nu(\partial T_x) = \sum_{y \,:\, y^- = x} \nu(\partial T_y).
\]
When $\deg(x) = \infty$, the last series has to converge absolutely. Then we use formula (9.34) in order to define $\int_{\partial T} K(x,\xi)\, d\nu$. The resulting function of $x$ is called the Poisson transform of the measure $\nu$.
9.37 Theorem. Suppose that $P$ defines a transient nearest neighbour random walk on the tree $T$.
A function $h\colon T \to \mathbb{R}$ is harmonic with respect to $P$ if and only if it is of the form
$$h(x) = \int_{\partial T} K(x,\cdot)\,d\nu,$$
where $\nu$ is a signed measure on $\mathcal{F}_o$. The measure $\nu$ is determined by $h$, that is, $\nu = \nu_h$, where
$$\nu_h(\partial T) = h(o) \quad\text{and}\quad \nu_h(\partial T_x) = F(o,x)\,\frac{h(x) - F(x,x^-)\,h(x^-)}{1 - F(x^-,x)\,F(x,x^-)}, \quad x \neq o.$$
Proof. We have to verify two principal facts.
First, we start with $h$ and have to show that $\nu_h$, as defined in the theorem, is a signed measure on $\mathcal{F}_o$, and that $h$ is the Poisson transform of $\nu_h$.
Second, we start with $\nu$ and define $h$ by (9.34). We have to show that $h$ is harmonic, and that $\nu = \nu_h$.
1.) Given the harmonic function $h$, we claim that for any $x \in T$,
$$h(x) = \sum_{y\,:\,y \sim x} F(x,y)\,\frac{h(y) - F(y,x)\,h(x)}{1 - F(x,y)\,F(y,x)}. \tag{9.38}$$
We can regroup the terms and see that this is equivalent with
$$\Bigl(1 + \sum_{y\,:\,y \sim x} \frac{F(x,y)\,F(y,x)}{1 - F(x,y)\,F(y,x)}\Bigr)\,h(x) = \sum_{y\,:\,y \sim x} \frac{F(x,y)}{1 - F(x,y)\,F(y,x)}\,h(y).$$
By Lemma 9.35, this reduces to
$$G(x,x)\,h(x) = \sum_{y \sim x} G(x,x)\,p(x,y)\,h(y),$$
which is true by harmonicity of h.
If we set $x = o$, then (9.38) says that $\nu_h(\partial T) = \sum_{y \sim o} \nu_h(\partial T_y)$. Suppose that $x \neq o$. Then by (9.38),
$$\begin{aligned}
\sum_{y\,:\,y^- = x} \nu_h(\partial T_y) &= F(o,x) \sum_{y\,:\,y^- = x} F(x,y)\,\frac{h(y) - F(y,x)\,h(x)}{1 - F(x,y)\,F(y,x)} \\
&= F(o,x)\,\Bigl( h(x) - F(x,x^-)\,\frac{h(x^-) - F(x^-,x)\,h(x)}{1 - F(x,x^-)\,F(x^-,x)} \Bigr) \\
&= F(o,x)\,\frac{h(x) - F(x,x^-)\,h(x^-)}{1 - F(x,x^-)\,F(x^-,x)} = \nu_h(\partial T_x).
\end{aligned}$$
We have shown that $\nu_h$ is indeed a signed measure on $\mathcal{F}_o$. Now we check that $\int_{\partial T} K(x,\cdot)\,d\nu_h = h(x)$. For $x = o$ this is true by definition. So let $x \neq o$. Using the same notation as in (9.34), we simplify
$$\bigl(K(x,x_j) - K(x,x_{j-1})\bigr)\,\nu_h(\partial T_{x_j}) = F(x,x_j)\,\bigl(h(x_j) - F(x_j,x_{j-1})\,h(x_{j-1})\bigr),$$
whence we obtain a telescope sum
$$\int_{\partial T} K(x,\cdot)\,d\nu_h = K(x,o)\,h(o) + \sum_{j=1}^{k} \bigl(F(x,x_j)\,h(x_j) - F(x,x_{j-1})\,h(x_{j-1})\bigr) = F(x,x)\,h(x) = h(x).$$
2.) Given $\nu$, let $h(x) = \int_{\partial T} K(x,\cdot)\,d\nu$.
9.39 Exercise. Show that h is harmonic at o.
Resuming the proof of Theorem 9.37, we suppose again that $x \neq o$ and use the notation of (9.34). With the fixed index $k = |x|$, we consider the function
$$g(w) = K(w,o)\,\nu(\partial T) + \sum_{j=1}^{k} \bigl(K(w,x_j) - K(w,x_{j-1})\bigr)\,\nu(\partial T_{x_j}), \quad w \in T.$$
Since for $i \le j$ one has $K(x_i,x_j) = K(x_i,x_i)$, we have $h(x_j) = g(x_j)$ for all $j \le k$. In particular, $h(x) = g(x)$ and $h(x^-) = g(x^-)$. Also,
$$h(y) = g(y) + \bigl(K(y,y) - K(y,x)\bigr)\,\nu(\partial T_y), \quad \text{when } y^- = x.$$
Recalling that $P\,K(\cdot,v) = K(\cdot,v) - \frac{1}{G(o,v)}\,\mathbf{1}_v(\cdot)$, we first compute
$$P g(x) = g(x) - \frac{1}{G(o,x)}\,\nu(\partial T_x).$$
Now, using Lemma 9.35,
$$\begin{aligned}
P h(x) &= P g(x) + \sum_{y\,:\,y^- = x} p(x,y)\,\frac{1 - F(x,y)\,F(y,x)}{F(o,x)\,F(x,y)}\,\nu(\partial T_y) \\
&= h(x) - \frac{1}{G(o,x)}\,\nu(\partial T_x) + \frac{1}{F(o,x)} \sum_{y\,:\,y^- = x} \frac{1}{G(x,x)}\,\nu(\partial T_y) \\
&= h(x) - \frac{1}{G(o,x)}\,\nu(\partial T_x) + \frac{1}{G(o,x)} \sum_{y\,:\,y^- = x} \nu(\partial T_y) = h(x).
\end{aligned}$$
Finally, to show that $\nu_h(\partial T_x) = \nu(\partial T_x)$ for all $x \in T$, we use induction on $k = |x|$. The statement is trivially true for $x = o$. So let once more $|x| = k \ge 1$ and $\pi(o,x) = [x_0 = o,\, x_1,\, \dots,\, x_k = x]$, and assume that $\nu_h(\partial T_{x_j}) = \nu(\partial T_{x_j})$ for all $j < k$. Since we know that $\int_{\partial T} K(x,\cdot)\,d\nu_h = h(x) = \int_{\partial T} K(x,\cdot)\,d\nu$, the induction hypothesis and (9.34) yield that
$$\bigl(K(x,x) - K(x,x_{k-1})\bigr)\,\nu_h(\partial T_x) = \bigl(K(x,x) - K(x,x_{k-1})\bigr)\,\nu(\partial T_x).$$
Therefore $\nu_h(\partial T_x) = \nu(\partial T_x)$. $\square$
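The formula for $\nu_h$ can be checked by hand in the simplest case. The following sketch (my own illustration, not from the book) takes SRW on the homogeneous tree with degree $q+1$, where $F(x,x^-) = 1/q$ and $F(o,x) = q^{-|x|}$, and the constant harmonic function $h \equiv 1$; the numbers $\nu_h(\partial T_x)$ produced by the theorem must then be finitely additive over the branches:

```python
# nu_h for h = 1 on the homogeneous tree of degree q+1 with SRW (a sketch):
#   nu_h(dT)   = h(o) = 1,
#   nu_h(dT_x) = F(o,x) * (1 - F(x,x^-)) / (1 - F(x^-,x) F(x,x^-)),
# with F(x,x^-) = 1/q and F(o,x) = q^(-|x|).
q = 2
F = 1.0 / q

def nu(k):
    # nu_h(dT_x) for a vertex x with |x| = k >= 1
    return F**k * (1.0 - F) / (1.0 - F * F)

# at the root, dT splits into the q+1 branches dT_y, y ~ o
assert abs((q + 1) * nu(1) - 1.0) < 1e-12

# for x != o, dT_x splits into the q branches at the forward neighbours of x
for k in range(1, 6):
    assert abs(q * nu(k + 1) - nu(k)) < 1e-12
```

For $q = 2$ this gives $\nu_h(\partial T_y) = 1/3$ for each of the three neighbours $y$ of $o$, summing to $h(o) = 1$ as required.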
In the case when $h$ is a positive harmonic function, the measure $\nu_h$ is a non-negative measure on $\mathcal{F}_o$. Since the sets in $\mathcal{F}_o$ generate the Borel $\sigma$-algebra of $\partial T$, one can justify with a little additional effort that $\nu_h$ extends to a Borel measure on $\partial T$. In this way, one can deduce the Poisson–Martin integral theorem directly, without having to go through the whole machinery of Chapter 7.
Some references are due here. The results of this and the preceding section are basically all contained in the seminal article of Cartier [Ca]. Previous results, regarding the case of random walks on free groups (whose Cayley graphs are homogeneous trees), can be found in the note of Dynkin and Malyutov [19]. The part of the proof of Theorem 9.22 regarding minimality of the extended kernels $K(\cdot,\xi)$, $\xi \in \partial T$, is based on an argument of Derriennic [13], once more in the context of free groups. The integral representation of arbitrary harmonic functions is also contained in [Ca], but was also proved more or less independently by various different methods in the paper of Korányi, Picardello and Taibleson [38] and by Steger, see [21], as well as in some other work.
All those references concern only the locally finite case. The extension to arbitrary countable trees goes back to an idea of Soardi, see [10].
E Limits of harmonic functions at the boundary
The Dirichlet problem at infinity
We now consider another potential theoretic problem. In Chapter 6 we have studied and solved the Dirichlet problem for finite Markov chains with respect to a subset of the state space which serves as the boundary. Now let once more $T$ be an infinite tree and $P$ the stochastic transition matrix of a nearest neighbour random walk on $T$. The Dirichlet problem at infinity is the following. Given a continuous function $\varphi$ on $\partial^* T = \widehat{T} \setminus T$, is there a continuous extension of $\varphi$ to $\widehat{T}$ that is harmonic on $T$?
This problem is related with the limit distributions $\nu_x$, $x \in T$, of the random walk. We have been slightly ambiguous when speaking about the (common) support of these probability measures: since we know that $Z_\infty \in \partial T$ almost surely, so far we have considered them as measures on $\partial T$. In the spirit of the construction of the geometric compactification (which coincides with the Martin compactification here) and the general theorem of convergence to the boundary, the measures should a priori be considered to live on the compact set $\partial^* T = \widehat{T} \setminus T = \partial T \cup T^*$. This is the viewpoint that we adopt in the present section, which is more topology-oriented. If $T$ is locally finite, then of course there is no difference, and in general, none of the two interpretations regarding where the limit distributions live is incorrect.
In any case, now $\operatorname{supp}(\nu_o)$ is the compact subset of $\partial^* T$ that consists of all points $\xi \in \partial^* T$ with the property that $\nu_o(V) > 0$ for every neighbourhood $V$ of $\xi$ in $\widehat{T}$. We point out that $\operatorname{supp}(\nu_x) = \operatorname{supp}(\nu_o)$ for every $x \in T$, and that $\nu_x$ is absolutely continuous with respect to $\nu_o$ with Radon–Nikodym density
$$\frac{d\nu_x}{d\nu_o} = K(x,\cdot),$$
see Theorem 7.42.
9.40 Proposition. If the Dirichlet problem at infinity admits a solution for every continuous function $\varphi$ on the boundary, then the solution is unique and given by the harmonic function
$$h(x) = \int_{\partial^* T} \varphi\,d\nu_x = \int_{\partial^* T} \varphi\,K(x,\cdot)\,d\nu_o.$$
Proof. If $h$ is the solution with boundary data given by $\varphi$, then $h$ is a bounded harmonic function. By Theorem 7.61, there is a bounded measurable function $\psi$ on the Martin boundary $\mathcal{M} = \widehat{T} \setminus T$ such that
$$h(x) = \int_{\partial^* T} \psi\,d\nu_x.$$
By the probabilistic Fatou theorem 7.67,
$$h(Z_n) \to \psi(Z_\infty) \quad \text{Pr}_o\text{-almost surely.}$$
Since $h$ provides the continuous harmonic extension of $\varphi$, and since $Z_n \to Z_\infty$ $\text{Pr}_o$-almost surely,
$$h(Z_n) \to \varphi(Z_\infty) \quad \text{Pr}_o\text{-almost surely.}$$
We conclude that $\psi$ and $\varphi$ coincide $\nu_o$-almost surely. This proves that the solution of the Dirichlet problem is as stated, whence unique. $\square$
Let us remark here that one can state the Dirichlet problem for an arbitrary Markov chain and with respect to the ideal boundary in an arbitrary compactification of the infinite state space. In particular, it can always be stated with respect to the Martin boundary. In the latter context, Proposition 9.40 is correct in full generality.
There are some degenerate cases: if the random walk is recurrent and $|\partial^* T| \ge 2$, then the Dirichlet problem does not admit solution, because there is some non-constant continuous function on the boundary, while all bounded harmonic functions are constant. On the other hand, when $|\partial^* T| = 1$, then the constant functions provide the trivial solutions to the Dirichlet problem.
The last proposition leads us to the following local version of the Dirichlet problem.
9.41 Definition. (a) A point $\xi \in \partial^* T$ is called regular for the Dirichlet problem if for every continuous function $\varphi$ on $\partial^* T$, its Poisson integral $h(x) = \int_{\partial^* T} \varphi\,d\nu_x$ satisfies
$$\lim_{x \to \xi} h(x) = \varphi(\xi).$$
(b) We say that the Green kernel vanishes at $\xi$, if
$$\lim_{y \to \xi} G(y,o) = 0.$$
We remark that $\lim_{y \to \xi} G(y,o) = 0$ if and only if $\lim_{y \to \xi} G(y,x) = 0$ for some ($\Longleftrightarrow$ every) $x \in T$. Indeed, let $k = d(x,o)$. Then $p^{(k)}(x,o) > 0$, and
$$G(y,x)\,p^{(k)}(x,o) \le G(y,o).$$
Also, if $x^* \in T^*$ and $\{y_k : k \in \mathbb{N}\}$ is an enumeration of the neighbours of $x$ in $T$, then the Green kernel vanishes at $x^*$ if and only if
$$\lim_{k \to \infty} G(y_k,x) = 0.$$
This holds because if $(w_n)$ is an arbitrary sequence in $T$ that converges to $x^*$, then $k(n) \to \infty$, where $k(n)$ is the unique index such that $w_n \in T_{x,y_{k(n)}}$: then
$$G(w_n,x) = F(w_n,y_{k(n)})\,G(y_{k(n)},x) \to 0.$$
Regularity of a point $\xi \in \partial^* T$ means that
$$\lim_{y \to \xi} \nu_y = \delta_\xi \quad \text{weakly,}$$
where weak convergence of a sequence of finite measures means that the integrals of any continuous function converge to its integral with respect to the limit measure.
The following is quite standard.
9.42 Lemma. A point $\xi \in \partial^* T$ is regular for the Dirichlet problem if and only if for every set $\partial^* T_{v,w} = \widehat{T}_{v,w} \setminus T_{v,w}$ that contains $\xi$ ($v, w \in T$, $v \sim w$),
$$\lim_{y \to \xi} \nu_y(\partial^* T_{v,w}) = 1.$$
Proof. The indicator function $\mathbf{1}_{\partial^* T_{v,w}}$ is continuous. Therefore regularity implies that the above limit is 1.
Conversely, assume that the limit is 1. Let $\varphi$ be a continuous function on the compact set $\partial^* T$, and let $M = \max |\varphi|$. Write $h$ for the Poisson integral of $\varphi$. We have $\varphi(\xi) = \int_{\partial^* T} \varphi(\xi)\,d\nu_y(\eta)$. Given $\varepsilon > 0$, we first choose $\widehat{T}_{v,w}$ such that $|\varphi(\xi) - \varphi(\eta)| < \varepsilon$ for all $\eta \in \partial^* T_{v,w}$. Then, if $y \in T_{v,w}$ is close enough to $\xi$, we have $\nu_y(\partial^* T_{w,v}) < \varepsilon$. For such $y$,
$$|h(y) - \varphi(\xi)| \le \int_{\partial^* T_{v,w}} |\varphi(\xi) - \varphi(\eta)|\,d\nu_y(\eta) + \int_{\partial^* T_{w,v}} |\varphi(\xi) - \varphi(\eta)|\,d\nu_y(\eta)$$
$$\le \varepsilon\,\nu_y(\partial^* T_{v,w}) + 2M\,\nu_y(\partial^* T_{w,v}) < (1 + 2M)\,\varepsilon.$$
This concludes the proof. $\square$
9.43 Theorem. Consider a transient nearest neighbour random walk on the tree $T$.
(a) A point $\xi \in \partial^* T$ is regular for the Dirichlet problem if and only if the Green kernel vanishes at $\xi$, and in this case, $\xi \in \operatorname{supp}(\nu_o)$.
(b) The regular points form a Borel set that has $\nu_x$-measure 1.
Proof. Consider the set
$$B = \bigl\{\xi \in \partial^* T : \lim_{y \to \xi} G(y,x) = 0\bigr\}.$$
Since $G(w,o) \le G(y,o)$ for all $w \in T_y$, we can write
$$B = \bigcap_{n \in \mathbb{N}}\ \bigcup_{y\,:\,G(y,o) < 1/n} \partial^* T_y,$$
so that $B$ is a Borel set. We know that it is independent of the choice of $x$. Our first claim is that $\nu_x(B) = 1$ for some and hence every $x \in T$. We take $x = o$. Consider the event
$$\widetilde{\Omega} = \bigl\{\omega \in \Omega : Z_0(\omega) = o,\ Z_{n+1}(\omega) \sim Z_n(\omega) \text{ for all } n,\ G\bigl(Z_n(\omega),o\bigr) \to 0,\ Z_n(\omega) \to Z_\infty(\omega) \in \partial T\bigr\}.$$
Then $\text{Pr}_o(\widetilde{\Omega}) = 1$, see Exercise 7.32. For $\xi \in \partial T$, let us write $\pi_k(\xi)$ for the point on $\pi(o,\xi)$ at distance $k$ from $o$. For $\omega \in \widetilde{\Omega}$, the sequence $\bigl(Z_n(\omega)\bigr)$ visits every point on the ray from $o$ to $Z_\infty(\omega)$. Recall the sequence of exit times $e_k(\omega) = \max\bigl\{n : Z_n(\omega) = \pi_k\bigl(Z_\infty(\omega)\bigr)\bigr\}$. Then $e_k(\omega) \to \infty$ and
$$G\bigl(\pi_k\bigl(Z_\infty(\omega)\bigr),\,o\bigr) = G\bigl(Z_{e_k(\omega)}(\omega),\,o\bigr) \to 0.$$
We obtain
$$\nu_o\bigl(\bigl\{\xi \in \partial T : G\bigl(\pi_k(\xi),o\bigr) \to 0\bigr\}\bigr) = \text{Pr}_o\bigl[Z_\infty \in \bigl\{\xi \in \partial T : G\bigl(\pi_k(\xi),o\bigr) \to 0\bigr\}\bigr] = 1.$$
If $y \to \xi$ and $G\bigl(\pi_k(\xi),o\bigr) \to 0$, then $y \wedge \xi = \pi_k(\xi)$ with $k = k(y) \to \infty$, and
$$G(y,o) = F(y,\,y \wedge \xi)\,G\bigl(\pi_k(\xi),o\bigr) \le G\bigl(\pi_k(\xi),o\bigr) \to 0.$$
This shows that $\nu_o(B) = 1$.
We now prove statement (a), and then statement (b) follows from the above.
Suppose that the Green kernel vanishes at $\xi \in \partial^* T$. Let $\widehat{T}_{v,w}$ be a neighbourhood of $\xi$, where $v \sim w$. Its complement in $\widehat{T}$ is $\widehat{T}_{w,v}$. Let $y \in T_{v,w}$. Then
$$\nu_y(\widehat{T}_{w,v}) = F(y,w)\,\nu_w(\widehat{T}_{w,v}) \le G(y,w) \to 0, \quad \text{as } y \to \xi.$$
Therefore any finite intersection $U = \bigcap_{j=1}^{n} \widehat{T}_{v_j,w_j}$ of such neighbourhoods of $\xi$ satisfies $\nu_y(U) \to 1$ as $y \to \xi$. Lemma 9.42 yields that $\xi$ is regular, and we also conclude that $\nu_o(U) > 0$ for every basic neighbourhood of $\xi$, whence $\xi \in \operatorname{supp}(\nu_o)$.
Conversely, suppose that $\xi$ is regular. If $\nu_o = \delta_\xi$, then $\xi$ must be an end, and since $\nu_x(B) = 1$, the Green kernel vanishes at $\xi$. So suppose that $\operatorname{supp}(\nu_o)$ has at least two elements. One of them must be $\xi$. There must be a basic neighbourhood $\widehat{T}_{v,w}$ ($v \sim w$) of $\xi$ such that its complement contains some element of $\operatorname{supp}(\nu_o)$. Let $\varphi = \mathbf{1}_{\partial^* T_{w,v}}$ and $h$ its Poisson integral. By assumption,
$$0 = \lim_{y \to \xi} h(y) = \lim_{y \to \xi} F(y,w)\,\underbrace{\nu_w(\partial^* T_{w,v})}_{>\,0}.$$
Therefore $G(y,w) = F(y,w)\,G(w,w) \to 0$ as $y \to \xi$. $\square$
The proof of the following corollary is left as an exercise.
9.44 Corollary (Exercise). The Dirichlet problem at infinity admits solution if and only if the Green kernel vanishes at infinity, that is, for every $\varepsilon > 0$ there is a finite subset $A \subset T$ such that
$$G(x,o) < \varepsilon \quad \text{for all } x \in T \setminus A.$$
In this case, $\operatorname{supp}(\nu_o) = \partial^* T$, the full boundary.
Let us next consider some examples.
9.45 Examples. (a) For simple random walk on the homogeneous tree of Example 9.29 and Figure 26, we have $F(x,y) = q^{-d(x,y)}$. We see that the Green kernel vanishes at infinity, and the Dirichlet problem is solvable.
(b) For simple random walk on the tree of Example 9.33 and Figure 27, all the hairs give rise to recurrent ends $\eta_x$, $x \in T_q$. If $w \in \pi(x,\eta_x)$ then $F(w,x) = 1$, so that $\eta_x$ is not regular for the Dirichlet problem. On the other hand, the Green kernel of simple random walk on $T$ clearly vanishes at infinity on the subtree $T_q$, and all ends in $\partial T_q \subset \partial T$ are regular for the Dirichlet problem. The set $\partial T_q$ is a closed subset of the boundary, and every continuous function $\varphi$ on $\partial T_q$ has a continuous extension to $\widehat{T}$ which is harmonic on $T$. For the extended function, the values at the ends $\eta_x$, $x \in T_q$, are forced by $\varphi$, since the harmonic function is constant on each hair.
As $\partial T_q$ is dense in $\partial T$, in the spirit of this section we should consider the support of the limit distribution to be $\operatorname{supp}(\nu_o) = \partial T$, although the random walk does not converge to one of the recurrent ends.
In this example, as well as in (a), the measure $\nu_o$ is continuous: $\nu_o(\xi) = 0$ for every $\xi \in \partial T$.
We also note that each non-regular (recurrent) end is itself an isolated point in $\partial T$.
We now consider an example with a non-regular point that is not isolated.
9.46 Example. We construct a tree starting with the half-line $\mathbb{N}_0$, whose end is $\varpi = \infty$. At each point $k$ we attach a finite path of length $\ell(k)$ (a finite hair). At the end of each of those hairs, we attach a copy of the binary tree by its root. (The binary tree is the tree where the root, as well as any other vertex $x$, has precisely two forward neighbours: the root has degree 2, while all other points have degree 3.) See Figure 28. We write $T$ for the resulting tree.
Figure 28
Our root vertex is $o = 0$ on the backbone $\mathbb{N}_0$. We consider once more simple random walk. If $w$ is one of the vertices on one of the attached copies of the binary tree, then $F(w,w^-)$ is the same as on the homogeneous tree with degree 3. (Compare with Exercise 2.12 and the considerations following (9.2).) From Example 9.29, we know that $F(w,w^-) = 1/2$. Exercise 9.26 implies that $F(x,x^-) < 1$ for every $x \neq o$. All ends are transient, and $\operatorname{supp}(\nu_o) = \partial T$, the full boundary. The Green kernel vanishes at every end of each of the attached binary trees. That is, every end in $\partial T \setminus \{\varpi\}$ is regular for the Dirichlet problem. We now show that with a suitable choice of the lengths $\ell(k)$ of the hairs, the end $\varpi$ is not regular. We suppose that $\ell(k) \to \infty$ as $k \to \infty$. Then it is easy to understand that $F(k,k-1) \to 1$.
Indeed, consider first a tree $\widetilde{T}$ similar to ours, but with $\ell(k) = \infty$ for each $k$, that is, each of the hairs is an infinite ray (no binary tree). SRW on this tree is recurrent (why?), so that $\widetilde{F}(1,0) = 1$ for the associated hitting probability. Then let $\widetilde{F}_n(1,0)$ be the probability that SRW on $\widetilde{T}$ starting at 1 reaches 0 before leaving the ball $\widetilde{T}_n$ with radius $n$ around 0 in the graph metric of $\widetilde{T}$. Then $\lim_{n \to \infty} \widetilde{F}_n(1,0) = \widetilde{F}(1,0) = 1$ by monotone convergence. On the other hand, in our tree $T$, consider the cone $T_k = T_{0,k}$. If $n(k) = \min\{\ell(m) : m \ge k\}$, then the ball of radius $n(k)$ centred at the vertex $k$ in $T_k$ is isomorphic with the ball $\widetilde{T}_{n(k)}$ in $\widetilde{T}$. Therefore
$$F(k,k-1) \ge \widetilde{F}_{n(k)}(1,0) \to 1 \quad \text{as } k \to \infty.$$
Next, we write $k'$ for the neighbour of the vertex $k \in \mathbb{N}$ that lies on the hair attached at $k$. We have $F(k',k) \ge \bigl(\ell(k) - 1\bigr)\big/\ell(k)$, since the latter quotient is the probability that SRW on $T$ starting at $k'$ reaches $k$ before the root of the binary tree attached at the $k$-th hair. (To see this, we only need to consider the drunkard's walk of Example 1.46 on $\{0, 1, \dots, \ell(k)\}$ with $p = q = 1/2$.)
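The drunkard's-walk value quoted here is easy to confirm numerically. A minimal sketch (my own, not from the book) solves the harmonic equations $u(j) = \frac{1}{2}\bigl(u(j-1) + u(j+1)\bigr)$ on $\{0,\dots,\ell\}$, with absorption at both endpoints, by repeated sweeps:

```python
# Gambler's-ruin check (a sketch): for the drunkard's walk on {0,...,l} with
# p = q = 1/2, the probability of reaching 0 before l, started at 1, is (l-1)/l.
def ruin_prob(l, sweeps=20000):
    u = [0.0] * (l + 1)
    u[0] = 1.0                       # absorbed at 0 counts as success
    for _ in range(sweeps):          # Gauss-Seidel sweeps on the interior
        for j in range(1, l):
            u[j] = 0.5 * (u[j - 1] + u[j + 1])
    return u[1]

for l in (2, 5, 10):
    assert abs(ruin_prob(l) - (l - 1) / l) < 1e-6
```

The exact solution is linear in the starting point, which is where the quotient $(\ell-1)/\ell$ comes from.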
Now Proposition 9.3 (b) yields for SRW on $T$
$$\begin{aligned}
1 - F(k,k-1) &= 1 - \frac{1}{3 - F(k+1,k) - F(k',k)} \\
&= \frac{\bigl(1 - F(k+1,k)\bigr) + \bigl(1 - F(k',k)\bigr)}{1 + \bigl(1 - F(k+1,k)\bigr) + \bigl(1 - F(k',k)\bigr)} \le \bigl(1 - F(k+1,k)\bigr) + \bigl(1 - F(k',k)\bigr).
\end{aligned}$$
Recursively, we deduce that for each $r \ge k$,
$$1 - F(k,k-1) \le \bigl(1 - F(r+1,r)\bigr) + \sum_{m=k}^{r} \bigl(1 - F(m',m)\bigr).$$
We let $r \to \infty$ and get
$$1 - F(k,k-1) \le \sum_{m=k}^{\infty} \bigl(1 - F(m',m)\bigr).$$
Therefore
$$\sum_{k=1}^{\infty} \bigl(1 - F(k,k-1)\bigr) \le \sum_{k=1}^{\infty} k\,\bigl(1 - F(k',k)\bigr) \le \sum_{k=1}^{\infty} \frac{k}{\ell(k)}.$$
If we choose $\ell(k) > k$ such that the last series converges (for example, $\ell(k) = k^3$), then
$$F(n,0) = \prod_{k=1}^{n} F(k,k-1) \to \prod_{k=1}^{\infty} F(k,k-1) > 0$$
(because for a sequence of numbers $a_k \in (0,1)$, the infinite product $\prod_k a_k$ is $> 0$ if and only if $\sum_k (1 - a_k) < \infty$), and the end $\varpi = \infty$ is non-regular.
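The whole estimate can be explored numerically. The sketch below (my own, not from the book) uses two consequences of the first-passage recursions of this chapter: on a hair of length $\ell$ with the binary tree at its far end, $F(k',k) = \ell/(\ell+1)$ (slightly sharper than the bound $(\ell-1)/\ell$ used above), and on the backbone $F(k,k-1) = 1\big/\bigl(3 - F(k+1,k) - F(k',k)\bigr)$. Truncating the backbone at $K$ and starting the backward recursion from the two extreme surrogate values $0$ and $1$ for $F(K+1,K)$ brackets $F(n,0)$:

```python
def passage_product(n, K, hair, tail):
    # backward recursion for b_k = F(k, k-1), k = K, ..., 1;
    # 'tail' is the surrogate value used for F(K+1, K)
    b = tail
    bs = {}
    for k in range(K, 0, -1):
        b = 1.0 / (3.0 - b - hair(k))
        bs[k] = b
    p = 1.0
    for k in range(1, n + 1):
        p *= bs[k]               # F(n,0) = prod_{k=1}^n F(k, k-1)
    return p

cubic = lambda k: k**3 / (k**3 + 1.0)        # F(k',k) for l(k) = k^3
lo = passage_product(50, 4000, cubic, 0.0)   # lower bound for F(50, 0)
hi = passage_product(50, 4000, cubic, 1.0)   # upper bound
assert 0.3 < lo <= hi < 0.6   # bounded away from 0: the end varpi is non-regular

# contrast: constant hairs with l(k) = 1 give F(k',k) = 1/2, F(k,k-1) -> 1/2,
# and F(n,0) decays geometrically to 0
short = passage_product(50, 4000, lambda k: 0.5, 1.0)
assert short < 1e-6
```

With $\ell(k) = k^3$ the product stabilises near $0.4$, while with bounded hairs it vanishes geometrically, matching the dichotomy proved above.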
Next, we give an example of a non-simple random walk, also involving vertices with infinite degree.
9.47 Example. Let again $T = T_{s-1}$ be the homogeneous tree with degree $s \ge 3$, but this time we also allow that $s = \infty$. We colour the non-oriented edges of $T$ by the numbers (colours) in the set $\mathcal{I}$, where $\mathcal{I} = \{1,\dots,s\}$ when $s < \infty$, and $\mathcal{I} = \mathbb{N}$ when $s = \infty$. This colouring is such that every vertex is incident with precisely one edge of each colour. We choose probabilities $p_i > 0$ ($i \in \mathcal{I}$) such that $\sum_i p_i = 1$ and define the following symmetric nearest neighbour random walk:
$$p(x,y) = p(y,x) = p_i, \quad \text{if the edge } [x,y] \text{ has colour } i.$$
See Figure 29, where $s = 3$.
Figure 29
We remark here that this is a random walk on the group with the presentation
$$G = \langle a_i,\ i \in \mathcal{I} \mid a_i^2 = o \text{ for all } i \rangle,$$
where (recall) we denote by $o$ the unit element of $G$. For readers who are not familiar with group presentations, it is easy to describe $G$ directly. It consists of all words
$$x = a_{i_1} a_{i_2} \cdots a_{i_n}, \quad \text{where } n \ge 0,\ i_j \in \mathcal{I},\ i_{j+1} \neq i_j \text{ for all } j.$$
The number $n$ is the length of the word, $|x| = n$. When $n = 0$, we obtain the empty word $o$. The group operation is concatenation of words followed by cancellations. Namely, if the last letter of $x$ coincides with the first letter of $y$, then both are cancelled in the product $xy$, because of the relation $a_i^2 = o$. If in the remaining word (after concatenation and cancellation) one still has an $a_j^2$ in the middle, this also has to be cancelled, and so on, until one gets a square-free word. The latter is the product of $x$ and $y$ in $G$. In particular, for $x$ as above, the inverse is $x^{-1} = a_{i_n} \cdots a_{i_2} a_{i_1}$. In the terminology of combinatorial group theory, $G$ is the free product over all $i \in \mathcal{I}$ of the 2-element groups $\{o, a_i\}$ with $a_i^2 = o$.
The tree is the Cayley graph of $G$ with respect to the symmetric set of generators $S = \{a_i : i \in \mathcal{I}\}$: the set of vertices of the graph is $G$, and two elements $x, y$ are neighbours in the graph if $y = x a_i$ for some $a_i \in S$. Our random walk is a random walk on that group. Its law $\mu$ is supported by the set of generators, and $\mu(a_i) = p_i$ for each $i$.
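The word arithmetic just described is mechanical enough to code. A minimal sketch (my own illustration, not from the book) represents a reduced word as a tuple of colours; since the factors carry no adjacent repeated letters, all cancellation happens at the seam:

```python
# Words in G as tuples of colours; the product is concatenation followed by
# repeated cancellation of a_i^2 at the seam (a sketch).
def mult(x, y):
    x = list(x)
    y = list(y)
    # cascade: each cancellation may expose a new matching pair at the seam
    while x and y and x[-1] == y[0]:
        x.pop()
        y.pop(0)
    return tuple(x + y)

x = (1, 2, 3)                         # x = a_1 a_2 a_3
assert mult(x, (3, 2, 1)) == ()       # x * x^{-1} = o (the empty word)
assert mult(x, (3, 1)) == (1, 2, 1)   # a_1 a_2 a_3 * a_3 a_1 = a_1 a_2 a_1
assert mult((), x) == x               # o is the unit element
```

The inverse of a word is simply its reversal, matching $x^{-1} = a_{i_n} \cdots a_{i_1}$ above.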
We now want to compute the Green function and the functions $F(x,y|z)$. We observe that $F(y,x|z) = F_i(z)$ is the same for all edges $[x,y]$ with colour $i$. Furthermore, the function $G(x,x|z) = G(z)$ is the same for all $x \in T$. We know from Theorem 1.38 and Proposition 9.3 that
$$G(z) = \frac{1}{1 - \sum_i p_i\,z\,F_i(z)}$$
and
$$F_i(z) = \frac{p_i\,z}{1 - \sum_{j \neq i} p_j\,z\,F_j(z)} = \frac{p_i\,z}{\dfrac{1}{G(z)} + p_i\,z\,F_i(z)}. \tag{9.48}$$
We obtain a second order equation for the function $F_i(z)$, whose two solutions are
$$\Bigl(-1 \pm \sqrt{1 + 4\,p_i^2\,z^2\,G(z)^2}\Bigr)\Big/\bigl(2\,p_i\,z\,G(z)\bigr).$$
Since $F_i(0) = 0$, while $G(0) = 1$, the right solution is
$$F_i(z) = \frac{\sqrt{1 + 4\,p_i^2\,z^2\,G(z)^2} - 1}{2\,p_i\,z\,G(z)}. \tag{9.49}$$
Combining (9.48) and (9.49), we obtain an implicit equation for the function $G(z)$:
$$G(z) = \Phi\bigl(z\,G(z)\bigr), \quad \text{where } \Phi(t) = 1 + \frac{1}{2} \sum_i \Bigl(\sqrt{1 + 4\,p_i^2\,t^2} - 1\Bigr). \tag{9.50}$$
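Equation (9.50) is easy to explore numerically. The following sketch (my own illustration, not from the book) treats $s = 3$ with $p_i = 1/3$, i.e., SRW on the homogeneous tree of degree 3, for which the classical values $G(1) = (s-1)/(s-2) = 2$ and $\rho(P) = 2\sqrt{s-1}/s$ are both reproduced by (9.50):

```python
import math

def Phi(t, p):
    # Phi(t) = 1 + (1/2) * sum_i (sqrt(1 + 4 p_i^2 t^2) - 1), cf. (9.50)
    return 1.0 + 0.5 * sum(math.sqrt(1.0 + 4.0*pi*pi*t*t) - 1.0 for pi in p)

def G(z, p, iters=200):
    # smallest solution of G = Phi(z*G): monotone fixed-point iteration from 1
    g = 1.0
    for _ in range(iters):
        g = Phi(z * g, p)
    return g

s = 3
p = [1.0 / s] * s
assert abs(G(1.0, p) - (s - 1) / (s - 2)) < 1e-9   # G(1) = 2: transient

# 1/r = rho(P) = min_{t>0} Phi(t)/t; for equal weights this is 2*sqrt(s-1)/s
rho = min(Phi(t, p) / t for t in (k * 0.001 for k in range(1, 20000)))
assert abs(rho - 2.0 * math.sqrt(s - 1) / s) < 1e-4
```

The monotone iteration converges to the leftmost intersection point, in accordance with the analytic-continuation argument below.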
We can analyze this formula by use of some basic facts from the theory of complex functions. The function $G(z)$ is defined by a power series and is analytic in the disk of convergence around the origin. (It can be extended analytically beyond that disk.) Furthermore, the coefficients are non-negative, and by Pringsheim's theorem (which we have used already in Example 5.24), its radius of convergence must be a singularity. Thus, $r = r(P)$ is the smallest positive singularity of $G(z)$. We are led to studying the equation (9.50) for $z, t \ge 0$.
For $t \ge 0$, the function $t \mapsto \Phi(t)$ is monotone increasing, convex, with $\Phi(0) = 1$ and $\Phi'(0) = 0$. For $t \to \infty$, it approaches the asymptote with equation $y = t - \frac{s-2}{2}$ in the $(t,y)$-plane. For $0 < z < r(P)$, by (9.50), $G(z)$ is the $y$-coordinate of a point where the curve $y = \Phi(t)$ intersects the line $y = \frac{1}{z}\,t$, see Figures 30 and 31.
Case 1. $s = 2$. The asymptote is $y = t$; there is a unique intersection point of $y = \Phi(t)$ and $y = \frac{1}{z}\,t$ for each fixed $z \in (0,1)$, and the angle of intersection is non-zero.
By the implicit function theorem, there is a unique analytic solution of the equation (9.50) for $G(z)$ in some neighbourhood of $z$. This gives $G(z)$, which is analytic at each real $z \in (0,1)$. On the other hand, if $z > 1$, the curve $y = \Phi(t)$ and the line $y = \frac{1}{z}\,t$ do not intersect in any point with real coordinates: there is no real solution of (9.50). It follows that $r = 1$. Furthermore, we see from Figure 30 that $G(z) \to \infty$ when $z \to 1$ from the left. Therefore the random walk is recurrent. Our random walk is symmetric, whence reversible with respect to the counting measure, which has infinite mass. We conclude that recurrence is null recurrence.
Figure 30
Case 2. $3 \le s \le \infty$. The asymptote intersects the $y$-axis at $-\frac{s-2}{2} < 0$. By the convexity of $\Phi$, there is a unique tangent line to the curve $y = \Phi(t)$ ($t \ge 0$) that emanates from the origin, as one can see from Figure 31.
Let $\rho$ be the slope of that tangent line. Its slope is smaller than that of the asymptote, that is, $\rho < 1$. For $0 < z \le 1$, the line $y = \frac{1}{z}\,t$ intersects the curve $y = \Phi(t)$ in a unique point with positive real coordinates, and the same argument
as in Case 1 shows that $G(\cdot)$ is analytic at $z$. For $1 < z < 1/\rho$, there are two intersection points with non-zero angles of intersection. Therefore both give rise to an analytic solution of the equation (9.50) for $G(z)$ in a complex neighbourhood of $z$. Continuity of $G(z)$ and analytic continuation imply that $G(z) = \sum_n p^{(n)}(x,x)\,z^n$ is finite and coincides with the $y$-coordinate of the intersection point that is more to the left.
Figure 31
On the other hand, for $z > 1/\rho$ there is no real solution of (9.50). We conclude that the radius of convergence of $G(z)$ is $r = 1/\rho$. We also find a formula for $\rho = \rho(P)$. Namely, $\rho = \Phi'(t_0)$, where $t_0$ is the unique positive solution of the equation $\Phi'(t) = \Phi(t)/t$, which can be written as
$$\frac{1}{2} \sum_i \Bigl(1 - \frac{1}{\sqrt{1 + 4\,p_i^2\,t^2}}\Bigr) = 1.$$
We also have $\rho(P) = \min\{\Phi(t)/t : t > 0\}$.
In particular, $G(1)$ is finite, and the random walk is transient. (This can of course be shown in various different ways, including the flow criterion.) Also, $G(r) = \lim_{z \to r-} G(z) = \Phi(t_0)$ is finite, so that the random walk is also $\rho$-transient.
Let us now consider the Dirichlet problem at infinity in Case 2. By equation (9.49), we clearly have $F_i = F_i(1) < 1$ for each $i$. If $s = \infty$, then we also observe that by (9.48)
$$F_i \le p_i\,G(1) \to 0, \quad \text{as } i \to \infty.$$
Thus we always have $\bar{F} = \max_i F_i < 1$. For arbitrary $y$,
$$G(y,o) = F(y,o)\,G(o,o) \le \bar{F}^{\,|y|}\,G(o,o).$$
Thus,
$$\lim_{y \to \xi} G(y,o) = 0 \quad \text{for every end } \xi \in \partial T.$$
When $3 \le s < \infty$, we see that the Dirichlet problem admits a solution for every continuous function on $\partial T$.
Now consider the case when $s = \infty$. Every vertex has infinite degree, and $T^*$ is in one-to-one correspondence with $T$. We see from the above that the Green kernel vanishes at every end. We still have to show that it vanishes at every improper vertex; compare with the remarks after Definition 9.41. By the spatial homogeneity of our random walk (i.e., because it is a random walk on a group), it is sufficient to show that the Green kernel vanishes at $o^*$. The neighbours of $o$ are the points $a_i$, $i \ge 1$, where the colour of the edge $[o,a_i]$ is $i$. We have $G(a_i,o) = F_i\,G(o,o)$, which tends to 0 when $i \to \infty$. Now $y \to o^*$ means that $i(y) \to \infty$, where $i = i(y)$ is such that $y \in T_{o,a_i}$.
We have shown that the Green kernel vanishes at every boundary point, so that the Dirichlet problem is solvable also when $s = \infty$.
At this point we can make some comments on why it has been preferable to introduce the improper vertices. This is because we want, for a general irreducible, transient Markov chain $(X,P)$, that the state space $X$ remains discrete in the Martin compactification. Recall what has been said in Section 7.B (before Theorem 7.19): in the original article of Doob [17] it is not required that $X$ be discrete in the Martin compactification. In that setting, the compactification is the closure of (the embedding of) $X$ in $\mathcal{B}$, the base of the cone of all positive superharmonic functions. In the situation of the last example, this means that the compactification is $T \cup \partial T$, but a sequence that converges to an improper vertex $x^*$ in our setting will then converge to the original vertex $x$.
Now, for this smaller compactification of the tree with $s = \infty$ in the last example, the Dirichlet problem cannot admit solution. Indeed, there every vertex is the limit of a sequence of ends, so that a continuous function $\varphi$ on $\partial T$ forces the values of the continuous extension (if it exists) at each vertex before we can even start to consider the Poisson integral. For example, if $x \sim y$, then the continuous extension of the function $\mathbf{1}_{\partial T_{x,y}}$ to $T \cup \partial T$ in the wrong compactification is $\mathbf{1}_{T_{x,y} \cup \partial T_{x,y}}$. It is not harmonic at $x$ and at $y$.
We see that for the Dirichlet problem it is really relevant to have a compactification in which the state space remains discrete.
We remark here that Dirichlet regularity for points of the Martin boundary of
a Markov chain was rst studied by Knapp [37]. Theorem 9.43 (a) and Corol-
lary 9.44 are due to Benjamini and Peres [5] and, for trees that are not necessarily
locally nite, Cartwright, Soardi and Woess [10]. Example 9.46, with a more
general and more complicated proof, is due to Amghibech [1]. The paper [5] also
contains an example with an uncountable set of non-regular points that is dense in
the boundary.
262 Chapter 9. Nearest neighbour random walks on trees
A radial Fatou theorem

We conclude this section with some considerations on the Fatou theorem. Its classical versions are non-probabilistic and concern convergence of the Poisson integral $h(x)$ of an integrable function $\varphi$ on the boundary. Here, $\varphi$ is not assumed to be continuous, and the Dirichlet problem at infinity may not admit solution. We look for a more restricted variant of convergence of $h$ when approaching the boundary. Typically, $x \to \xi$ (a boundary point) in a specific way (non-tangentially, radially, etc.), and we want to know whether $h(x) \to \varphi(\xi)$. Since $h$ does not change when $\varphi$ is modified on a set of $\nu_o$-measure $0$, such a result can in general only hold $\nu_o$-almost surely. This is of course similar to (but not identical with) the probabilistic Fatou theorem 7.67, which states convergence along almost every trajectory of the random walk. We prove the following Fatou theorem on radial convergence.

9.51 Theorem. Consider a transient nearest neighbour random walk on the countable tree $T$. Let $\varphi$ be a $\nu_o$-integrable function on the space of ends $\partial T$, and $h(x)$ its Poisson integral. Then, for $\nu_o$-almost every $\xi \in \partial T$,
$$\lim_{k\to\infty} h\bigl(v_k(\xi)\bigr) = \varphi(\xi),$$
where $v_k(\xi)$ is the vertex on $\pi(o, \xi)$ at distance $k$ from $o$.
Proof. Similarly to the proof of Theorem 9.43, we define the event
$$\Omega' = \bigl\{\omega \in \Omega : Z_0(\omega) = o,\ Z_{n+1}(\omega) \sim Z_n(\omega) \text{ for all } n,\ Z_n(\omega) \to Z_\infty(\omega) \in \partial T,\ h\bigl(Z_n(\omega)\bigr) \to \varphi\bigl(Z_\infty(\omega)\bigr)\bigr\}.$$
Then $\Pr_o(\Omega') = 1$. Precisely as in the proof of Theorem 9.43, we consider the exit times
$$\sigma_k = \sigma_k(\omega) = \max\bigl\{n : Z_n(\omega) = v_k\bigl(Z_\infty(\omega)\bigr)\bigr\}$$
for $\omega \in \Omega'$. Then $\sigma_k \to \infty$, whence
$$h\bigl(v_k(Z_\infty)\bigr) = h(Z_{\sigma_k}) \to \varphi(Z_\infty) \quad\text{on } \Omega'.$$
Therefore, setting
$$\mathcal{T} = \bigl\{\xi \in \partial T : h\bigl(v_k(\xi)\bigr) \to \varphi(\xi)\bigr\},$$
we have $\nu_o(\mathcal{T}) = \Pr_o[Z_\infty \in \mathcal{T}] = 1$, as proposed. □
9.52 Exercise. Verify that the set $\mathcal{T}$ defined at the end of the last proof is a Borel subset of $\partial T$.

Note that the improper vertices make no appearance in Theorem 9.51, since $\nu_o(T^*) = 0$.

Theorem 9.51 is due to Cartier [Ca]. Much more work has been done regarding Fatou theorems on trees; see e.g. the monograph by Di Biase [16].
F The boundary process, and the deviation from the limit geodesic

We know that a transient nearest neighbour random walk on a tree $T$ converges almost surely to a random end. We next want to study how and when the initial pieces with lengths $k \in \mathbb{N}_0$ of $\pi(Z_0, Z_n)$ stabilize. Recall the exit times $\sigma_k$ that were introduced in the proof of Theorem 9.18: $\sigma_k = n$ means that $n$ is the last instant when $d(Z_0, Z_n) = k$. In this case, $Z_n = v_k(Z_\infty)$, where (as above) for an end $\xi$, we write $v_k(\xi)$ for the $k$-th point on $\pi(o, \xi)$.
9.53 Definition. For a transient nearest neighbour random walk on a tree $T$, the boundary process is $W_k = v_k(Z_\infty) = Z_{\sigma_k}$, and the extended boundary process is $(W_k, \sigma_k)$, $k \ge 0$.

When studying the boundary process, we shall always assume that $Z_0 = o$. Then $|W_k| = k$, and for $x \in T$ with $|x| = k$, we have $\Pr_o[W_k = x] = \nu_o(\partial T_x)$.
9.54 Exercise. Setting $z = 1$, deduce the following from Proposition 9.3 (b). For $x \in T \setminus \{o\}$,
$$\Pr_x[Z_n \in T_x \setminus \{x\} \text{ for all } n \ge 1] = p(x, x^-)\,\frac{1 - F(x, x^-)}{F(x, x^-)}.$$
[Hint: decompose the probability on the left hand side into terms that correspond to first moving from $x$ to some forward neighbour $y$ and then never going back to $x$.]
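As a small numerical illustration (not part of the book), the identity of Exercise 9.54 can be checked for simple random walk on the homogeneous tree $T_{q+1}$, where $F(x, x^-) = 1/q$ is the smallest solution of the first-step equation $F = \frac{1}{q+1} + \frac{q}{q+1}F^2$. The function names below are our own.

```python
# Sanity check of Exercise 9.54 for SRW on the homogeneous tree T_{q+1}
# (every vertex has degree q+1).  Illustrative sketch, not from the book.

def first_passage_prob(q, iterations=10_000):
    """Smallest fixed point of F = 1/(q+1) + q/(q+1) * F**2, namely F = 1/q."""
    F = 0.0
    for _ in range(iterations):
        F = 1.0 / (q + 1) + q / (q + 1) * F * F
    return F

def stay_in_cone_direct(q):
    """Pr_x[Z_n in T_x \\ {x} for all n >= 1]: first step forward
    (probability q/(q+1)), then never come back down to x (1 - F)."""
    F = first_passage_prob(q)
    return (q / (q + 1)) * (1.0 - F)

def stay_in_cone_formula(q):
    """Right-hand side of Exercise 9.54: p(x, x^-) (1 - F)/F, p = 1/(q+1)."""
    F = first_passage_prob(q)
    return (1.0 / (q + 1)) * (1.0 - F) / F
```

Both expressions evaluate to $(q-1)/(q+1)$, as the hint's decomposition predicts.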
9.55 Proposition. The extended boundary process $(W_k, \sigma_k)_{k \ge 1}$ is a (non-irreducible) Markov chain with state space $T^{\mathrm{tr}} \times \mathbb{N}_0$. Its transition probabilities are
$$\Pr_o[W_k = y,\ \sigma_k = n \mid W_{k-1} = x,\ \sigma_{k-1} = m] = \frac{F(x, x^-)}{F(y, x)}\,\frac{1 - F(y, x)}{1 - F(x, x^-)}\,\frac{p(x, y)}{p(x, x^-)}\, f^{(n-m)}(y, x),$$
where $y \in T^{\mathrm{tr}}$ with $|y| = k$, $x = y^-$, and $n - m \in \mathbb{N}$ is odd.
Proof. For the purpose of this proof, the quantity computed in Exercise 9.54 is denoted
$$g(x) = \Pr_x[Z_i \in T_x \setminus \{x\} \text{ for all } i \ge 1].$$
Let $[o = x_0, x_1, \dots, x_k]$ be any geodesic arc in $T^{\mathrm{tr}}$ that starts at $o$, and let $m_1 < m_2 < \cdots < m_k$ be positive integers such that $m_j - m_{j-1} \in \mathbb{N}_{\mathrm{odd}}$ for all $j$. (Of course, $\mathbb{N}_{\mathrm{odd}}$ denotes the odd positive integers.) We consider the following events in the trajectory space:
$$A_k = [W_k = x_k,\ \sigma_k = m_k],$$
$$B_k = [W_{k-1} = x_{k-1},\ \sigma_{k-1} = m_{k-1},\ W_k = x_k,\ \sigma_k = m_k] = A_{k-1} \cap A_k,$$
$$C_k = [W_1 = x_1,\ \sigma_1 = m_1,\ \dots,\ W_k = x_k,\ \sigma_k = m_k], \quad\text{and}$$
$$D_k = [\,|Z_n| > k \text{ for all } n > m_k].$$
We have to show that $\Pr_o[A_k \mid C_{k-1}] = \Pr_o[A_k \mid A_{k-1}]$, and we want to compute this number. A difficulty arises because the two conditioning events depend on all future times after $\sigma_{k-1} = m_{k-1}$. Therefore we also consider the events
$$A_k^* = [Z_{m_k} = x_k],$$
$$B_k^* = [Z_{m_{k-1}} = x_{k-1},\ Z_{m_k} = x_k,\ |Z_i| \ge k \text{ for } i = m_{k-1}+1, \dots, m_k],$$
and
$$C_k^* = \bigl[Z_{m_j} = x_j \text{ for } j = 1, \dots, k,\ \ |Z_i| \ge j \text{ for } i = m_{j-1}+1, \dots, m_j,\ j = 2, \dots, k\bigr].$$
Each of them depends only on $Z_0, \dots, Z_{m_k}$. Now we can apply the Markov property as follows:
$$\Pr_o(A_k) = \Pr_o(D_k \cap A_k^*) = \Pr_o(D_k \mid A_k^*)\,\Pr_o(A_k^*) = g(x_k)\,\Pr_o(A_k^*),$$
and analogously
$$\Pr_o(B_k) = g(x_k)\,\Pr_o(B_k^*) \quad\text{and}\quad \Pr_o(C_k) = g(x_k)\,\Pr_o(C_k^*).$$
Thus, noting that $C_k^* = B_k^* \cap C_{k-1}^*$,
$$\Pr_o(A_k \mid C_{k-1}) = \frac{\Pr_o(C_k)}{\Pr_o(C_{k-1})} = \frac{g(x_k)}{g(x_{k-1})}\,\frac{\Pr_o(C_k^*)}{\Pr_o(C_{k-1}^*)} = \frac{g(x_k)}{g(x_{k-1})}\,\Pr_o(B_k^* \mid C_{k-1}^*)$$
$$= \frac{g(x_k)}{g(x_{k-1})}\,\Pr_o(B_k^* \mid A_{k-1}^*) = \frac{g(x_k)}{g(x_{k-1})}\,\frac{\Pr_o(B_k^*)}{\Pr_o(A_{k-1}^*)} = \frac{\Pr_o(B_k)}{\Pr_o(A_{k-1})} = \Pr_o(A_k \mid A_{k-1}),$$
as required. Now that we know that the process is Markovian, we assume that $x_{k-1} = x$, $x_k = y$, $m_{k-1} = m$ and $m_k = n$, and easily compute
$$\Pr_o(B_k^*) = \Pr_o(A_{k-1}^*)\,\Pr_x[Z_{n-m} = y,\ Z_i \ne x \text{ for } i = 1, \dots, n-m] = \Pr_o(A_{k-1}^*)\,\ell^{(n-m)}(x, y),$$
where $\ell^{(n-m)}(x, y)$ is the last exit probability of (3.56). By reversibility (9.5) and Exercise 3.59, we have for the generating function $L(x, y|z)$ of the $\ell^{(n)}(x, y)$ that
$$m(x)\,L(x, y|z) = \frac{m(x)\,G(x, y|z)}{G(x, x|z)} = \frac{m(y)\,G(y, x|z)}{G(x, x|z)} = m(y)\,F(y, x|z).$$
Therefore $\ell^{(n-m)}(x, y) = m(y)\,f^{(n-m)}(y, x)\big/m(x)$, and of course we also have $m(y)\,p(y, x)\big/m(x) = p(x, y)$. Putting things together, the transition probability from $(x, m)$ to $(y, n)$ is
$$\Pr_o(A_k \mid A_{k-1}) = \frac{g(y)}{g(x)}\,\ell^{(n-m)}(x, y),$$
which reduces to the stated formula. □
Note that the transition probabilities in Proposition 9.55 depend only on $x$, $y$ and the increment $\Delta_{k+1} = \sigma_{k+1} - \sigma_k$. Thus, also the process $(W_k, \Delta_k)_{k \ge 1}$ is a Markov chain, whose transition probabilities are
$$\Pr_o[W_{k+1} = y,\ \Delta_{k+1} = n \mid W_k = x,\ \Delta_k = m] = \frac{F(x, x^-)}{F(y, x)}\,\frac{1 - F(y, x)}{1 - F(x, x^-)}\,\frac{p(x, y)}{p(x, x^-)}\, f^{(n)}(y, x). \tag{9.56}$$
Here, $n$ has to be odd, so that the state space is $T^{\mathrm{tr}} \times \mathbb{N}_{\mathrm{odd}}$. Also, the process factorizes with respect to the projection $(x, n) \mapsto x$.
9.57 Corollary. The boundary process $(W_k)_{k \ge 1}$ is also a Markov chain. If $x, y \in T^{\mathrm{tr}}$ with $|x| = k$ and $y^- = x$, then
$$\Pr_o[W_{k+1} = y \mid W_k = x] = \frac{\nu_o(\partial T_y)}{\nu_o(\partial T_x)} = F(x, x^-)\,\frac{1 - F(y, x)}{1 - F(x, x^-)}\,\frac{p(x, y)}{p(x, x^-)}.$$
We shall use these observations in the next sections.
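As a quick illustration (our own, not from the book), for simple random walk on the homogeneous tree $T_{q+1}$ one has $F(x, x^-) = F(y, x) = 1/q$ and $p(x, y) = p(x, x^-) = 1/(q+1)$, so the formula of Corollary 9.57 collapses to probability $1/q$ for each of the $q$ forward neighbours, consistent with $\nu_o(\partial T_y)/\nu_o(\partial T_x) = 1/q$ by symmetry:

```python
# Corollary 9.57 evaluated for SRW on T_{q+1}; illustrative sketch only.
from fractions import Fraction

def boundary_step_prob(q):
    """Pr_o[W_{k+1} = y | W_k = x] for one fixed forward neighbour y of x
    on the homogeneous tree T_{q+1}, via the formula of Corollary 9.57."""
    F_x = Fraction(1, q)         # F(x, x^-)
    F_y = Fraction(1, q)         # F(y, x); equal to F_x by homogeneity
    p_fwd = Fraction(1, q + 1)   # p(x, y)
    p_back = Fraction(1, q + 1)  # p(x, x^-)
    return F_x * (1 - F_y) / (1 - F_x) * p_fwd / p_back
```

Summing over the $q$ forward neighbours gives total probability $1$, as it must for a Markov chain.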
9.58 Exercise. Compute the transition probabilities of the extended boundary process in Example 9.29.
[Hint: note that the probabilities $f^{(n)}(x, x^-)$ can be obtained explicitly from the closed formula for $F(x, x^-|z)$, which we know from previous computations.]
Now that we have some idea of how the boundary process evolves, we want to know how far $(Z_n)$ can deviate from the limit geodesic $\pi(o, Z_\infty)$ in between the exit times. That is, we want to understand how $d\bigl(Z_n, \pi(o, Z_\infty)\bigr)$ behaves, where $(Z_n)$ is a transient nearest neighbour random walk on an infinite tree $T$. Here, we just consider one basic result of this type that involves the limit distributions on $\partial T$.

9.59 Theorem. Suppose that there is a decreasing function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ with $\lim_{t\to\infty} \varphi(t) = 0$ such that
$$\nu_x(\partial T_{x,y}) \le \varphi\bigl(d(x, y)\bigr) \quad \text{for all } x, y \in T\ (x \ne y).$$
Then, whenever $(r_n)$ is an increasing sequence in $\mathbb{N}$ such that $\sum_n \varphi(r_n) < \infty$, one has
$$\limsup_{n\to\infty} \frac{d\bigl(Z_n, \pi(x, Z_\infty)\bigr)}{r_n} \le 1 \quad \Pr_x\text{-almost surely}$$
for every starting point $x \in T$.

In particular, if
$$\sup\{F(x, y) : x, y \in T,\ x \sim y\} = \lambda < 1,$$
then
$$\limsup_{n\to\infty} \frac{d\bigl(Z_n, \pi(x, Z_\infty)\bigr)}{\log n} \le \frac{1}{\log(1/\lambda)} \quad \Pr_x\text{-almost surely for every } x.$$
Proof. The assumptions imply that each $\nu_x$ is a continuous measure, that is, $\nu_x(\xi) = 0$ for every end $\xi$ of $T$. Therefore $\partial T$ must be uncountable. We choose and fix an end $\xi \in \partial T$, and assume without loss of generality that the starting point is $Z_0 = o$.
Figure 32. The end $\xi$, the points $o$ and $Z_n$, and the geodesic $\pi(\xi, Z_\infty)$ with the confluents $Z_n \wedge Z_\infty$ and $\xi \wedge Z_\infty$.
From Figure 32 one sees the following: if $|Z_n \wedge Z_\infty| > |\xi \wedge Z_\infty|$, then
$$d\bigl(Z_n, \pi(o, Z_\infty)\bigr) = d\bigl(Z_n, \pi(\xi, Z_\infty)\bigr).$$
(Note that $\pi(\xi, Z_\infty)$ is a bi-infinite geodesic.) Therefore, if $r > 0$, then
$$\Pr_o\bigl[d(Z_n, \pi(o, Z_\infty)) \ge r,\ |Z_n \wedge Z_\infty| > |\xi \wedge Z_\infty|\bigr] \le \Pr_o\bigl[d(Z_n, \pi(\xi, Z_\infty)) \ge r\bigr]$$
$$= \sum_{x \in T} \Pr_o[Z_n = x]\ \Pr_x\bigl[d(x, \pi(\xi, Z_\infty)) \ge r\bigr] \le \sum_{x \in T} \Pr_o[Z_n = x]\ \varphi(r) = \varphi(r),$$
since $d\bigl(x, \pi(\xi, Z_\infty)\bigr) \ge r$ implies that $Z_\infty \in \partial T_{x,y}$, where $y$ is the element on the ray $\pi(x, \xi)$ at distance $r$ from $x$. Now consider the sequence of events
$$A_n = \bigl[d(Z_n, \pi(o, Z_\infty)) \ge r_n,\ |Z_n \wedge Z_\infty| > |\xi \wedge Z_\infty|\bigr]$$
in the trajectory space. Then, by the above, $\sum_n \Pr_o(A_n) < \infty$ and, by the Borel–Cantelli lemma, $\Pr_o(\limsup_n A_n) = 0$. We know that $\Pr_o[Z_\infty \ne \xi] = 1$, since $\nu_o$ is continuous. That is, $|\xi \wedge Z_\infty| < \infty$ almost surely. On the other hand, $|Z_n \wedge Z_\infty| \to \infty$. We see that
$$\Pr_o\Bigl(\liminf_n \bigl[\,|Z_n \wedge Z_\infty| > |\xi \wedge Z_\infty|\,\bigr]\Bigr) \ge \Pr_o[Z_\infty \ne \xi] = 1.$$
Therefore $\Pr_o\bigl(\limsup_n \bigl[d(Z_n, \pi(o, Z_\infty)) \ge r_n\bigr]\bigr) = 0$, which implies the proposed general result.
In the specific case when $\lambda = \sup\{F(x, y) : x, y \in T,\ x \sim y\} < 1$, we can set $\varphi(t) = \lambda^t$. If we choose $r_n = \bigl\lceil (1 + \varepsilon)\log n \big/ \log(1/\lambda) \bigr\rceil$, where $\varepsilon > 0$, then we see that $\varphi(r_n) \le 1/n^{1+\varepsilon}$, whence
$$\limsup_{n\to\infty} \frac{d\bigl(Z_n, \pi(x, Z_\infty)\bigr)}{\log n} \le \frac{1 + \varepsilon}{\log(1/\lambda)} \quad \Pr_x\text{-almost surely},$$
and this holds for every $\varepsilon > 0$. □
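The key numerical step of this last argument can be checked directly: with $\varphi(t) = \lambda^t$ and the above choice of $r_n$, one indeed has $\varphi(r_n) \le n^{-(1+\varepsilon)}$, so the series $\sum_n \varphi(r_n)$ converges. The sketch below (our own, with arbitrary sample values of $\lambda$ and $\varepsilon$) verifies this inequality:

```python
# Check of the choice r_n in the proof of Theorem 9.59: phi(r_n) <= n^-(1+eps).
# Illustrative only; lam and eps are free parameters.
import math

def phi_of_r(lam, eps, n):
    """phi(r_n) = lam ** ceil((1+eps) log n / log(1/lam))."""
    r_n = math.ceil((1 + eps) * math.log(n) / math.log(1 / lam))
    return lam ** r_n
```

Since $r_n \ge (1+\varepsilon)\log n/\log(1/\lambda)$, taking $\lambda$ to that power gives at most $n^{-(1+\varepsilon)}$.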
The $(\log n)$-estimate in the second part of the theorem was first proved for random walks on free groups by Ledrappier [40]. A simplified generalization is presented in [44], but it contains a trivial error at the end (the exponential function does not vary regularly at infinity), which was observed by Gilch [26].

The boundary process was used by Lalley [39] in the context of finite range random walks on free groups.
G Some recurrence/transience criteria

So far, we have always assumed transience. We now want to present a few (of the many) criteria for transience and recurrence of a nearest neighbour random walk with stochastic transition matrix $P$ on a tree $T$.

In view of reversibility (9.5), we already have a criterion for positive recurrence, see Proposition 9.8. We can use the flow criterion of Theorem 4.51 to study transience. If $x \ne o$ and $\pi(o, x) = [o = x_0, x_1, \dots, x_{k-1}, x_k = x]$, then the resistance of the edge $e = [x^-, x] = [x_{k-1}, x_k]$ is
$$r(e) = \frac{1}{p(x_0, x_1)} \prod_{i=1}^{k-1} \frac{p(x_i, x_{i-1})}{p(x_i, x_{i+1})}.$$
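The displayed resistance formula is easy to evaluate along a geodesic. The helper below is a hypothetical aid for experiments (its argument convention is our own, not from the book): it takes the backward and forward probabilities along the ray and returns $r(e)$ for the last edge.

```python
# r(e) for the edge [x_{k-1}, x_k], from the resistance formula above.
# Illustrative helper; index 0 of 'back' is a dummy (only fwd[0] = p(x_0, x_1)
# enters, and the product runs over i = 1..k-1).

def edge_resistance(back, fwd):
    """back[i] = p(x_i, x_{i-1}), fwd[i] = p(x_i, x_{i+1}) for i = 0..k-1."""
    r = 1.0 / fwd[0]
    for i in range(1, len(fwd)):
        r *= back[i] / fwd[i]
    return r
```

For simple random walk on the homogeneous tree of degree $3$ all ratios cancel and every edge on the ray has $r(e) = 3$; on a ray with forward probability $2/3$ and backward probability $1/3$, the resistances decay geometrically.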
There are various simple choices of unit flows from $o$ to $\infty$ which we can use to test for transience.

9.60 Example. The simple flow $\theta$ on a locally finite tree $T$ with root $o$ and $\deg(x) \ge 2$ for all $x \ne o$ is defined recursively as
$$\theta(o, x) = \frac{1}{\deg(o)}, \text{ if } x^- = o, \quad\text{and}\quad \theta(x, y) = \frac{\theta(x^-, x)}{\deg(x) - 1}, \text{ if } y^- = x \ne o.$$
Of course, $\theta(x, x^-) = -\theta(x^-, x)$. With respect to simple random walk, its power is just
$$\langle \theta, \theta \rangle = \sum_{x \in T \setminus \{o\}} \theta(x^-, x)^2.$$
This sum can be nicely interpreted as the total area of a square tiling that is filled into the strip $[0, 1] \times [0, \infty)$ in the plane. Above the base segment we draw $\deg(o)$ squares with side length $1/\deg(o)$. Each of them corresponds to one of the edges $[o, x]$ with $x^- = o$. If we already have drawn the square corresponding to an edge $[x^-, x]$, then we subdivide its top side into $\deg(x) - 1$ segments of equal length. Each of them becomes the base segment of a new square that corresponds to one of the edges $[x, y]$ with $y^- = x$. See Figure 33. If the total area of the tiling is finite, then the random walk is transient.
Figure 33. A tree and the associated square tiling.
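As a concrete instance of the tiling computation (our own sketch, under the assumption of a tree with $\deg(o) = 2$ and $\deg(x) = 3$ for all other vertices, i.e. two forward neighbours everywhere): the edges at depth $k$ carry squares of side $2^{-k}$, and there are $2^k$ of them, so the total area is $\sum_{k\ge 1} 2^k (2^{-k})^2 = 1$. The tiling fills the unit square, the power of the simple flow is finite, and SRW on this tree is transient.

```python
# Total area of the squares for edges up to a given depth, for the tree with
# two forward neighbours at every vertex.  Illustrative sketch.

def tiling_area(depth):
    return sum((2 ** k) * (2.0 ** -k) ** 2 for k in range(1, depth + 1))
```

The partial sums are $1 - 2^{-\text{depth}}$, converging to total area $1$.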
This square tiling associated with SRW on a locally finite tree was first considered by Gerl [25]. More generally, square tilings associated with random walks on planar graphs appear in the work of Benjamini and Schramm [7].

Another choice is to take an end $\xi$ and send a unit flow $\theta_\xi$ from $o$ to $\xi$: if $\pi(o, \xi) = [o = x_0, x_1, x_2, \dots]$, then this flow is given by $\theta(e_i) = 1$ and $\theta(\check e_i) = -1$ for the oriented edge $e_i$ from $x_{i-1}$ to $x_i$, while $\theta(e) = 0$ for all edges that do not lie on the ray $\pi(o, \xi)$. If $\theta_\xi$ has finite power, then the random walk is transient. We obtain the following criterion.
9.61 Corollary. If $T$ has an end $\xi$ such that for the geodesic ray $\pi(o, \xi) = [o = x_0, x_1, x_2, \dots]$,
$$\sum_{k=1}^{\infty} \prod_{i=1}^{k} \frac{p(x_i, x_{i-1})}{p(x_i, x_{i+1})} < \infty,$$
then the random walk is transient.
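To see the criterion in action (our own illustration, with sample values): along a ray where the walk steps forward with probability $2/3$ and backward with probability $1/3$, the $k$-th product equals $(1/2)^k$ and the series sums to $1 < \infty$, so such an end certifies transience.

```python
# Partial sums of the series in Corollary 9.61 along a ray.
# back_probs[i] = p(x_i, x_{i-1}), fwd_probs[i] = p(x_i, x_{i+1}),
# both indexed from 1 (entry 0 is a dummy).  Illustrative sketch.

def transience_series(back_probs, fwd_probs, K):
    total, prod = 0.0, 1.0
    for k in range(1, K + 1):
        prod *= back_probs[k] / fwd_probs[k]
        total += prod
    return total
```

A bounded, convergent partial sum indicates that the unit flow to this end has finite power.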
9.62 Exercise. Is it true that any end that satisfies the criterion of Corollary 9.61 is a transient end?

When $T$ itself is a half-line (ray), then the condition of Corollary 9.61 is the necessary and sufficient criterion of Theorem 5.9 for birth-and-death chains. In general, the criterion is far from being necessary, as is shown, for example, by simple random walk on $T_q$.
In any case, the criterion says that the ingoing transition probabilities (towards the root) are strongly dominated by the outgoing ones. We want to formulate a more general result in the same spirit. Let $\varphi$ be any strictly positive function on $T$. We define $\lambda = \lambda_\varphi : T \setminus \{o\} \to (0, \infty)$ and $g = g_\varphi : T \to [0, \infty)$ by
$$\lambda(x) = \frac{1}{p(x, x^-)\,\varphi(x)} \sum_{y\,:\,y^- = x} p(x, y)\,\varphi(y),$$
$$g(o) = 0, \quad\text{and, if } x \ne o \text{ and } \pi(o, x) = [o = x_0, x_1, \dots, x_n = x], \text{ then}$$
$$g(x) = \varphi(x_1) + \sum_{k=1}^{n-1} \frac{\varphi(x_{k+1})}{\lambda(x_1)\cdots\lambda(x_k)}. \tag{9.63}$$
Admitting the value $\infty$, we can extend $g = g_\varphi$ to $\partial T$ by setting
$$g(\xi) = \varphi(x_1) + \sum_{k=1}^{\infty} \frac{\varphi(x_{k+1})}{\lambda(x_1)\cdots\lambda(x_k)}, \quad\text{if } \pi(o, \xi) = [o = x_0, x_1, \dots]. \tag{9.64}$$
9.65 Lemma. Suppose that $T$ is such that $\deg(x) \ge 2$ for all $x \in T \setminus \{o\}$. Then the function $g = g_\varphi$ of (9.63) satisfies
$$Pg(x) = g(x) \text{ for all } x \ne o, \quad\text{and}\quad Pg(o) = P\varphi(o) > 0 = g(o);$$
in particular, it is subharmonic: $Pg \ge g$.
Proof. Since $\deg(x) \ge 2$, we have $\lambda(x) > 0$ for all $x \in T \setminus \{o\}$. Let us define $a(o) = 1$ and, recursively for $x \ne o$,
$$a(x) = a(x^-)/\lambda(x).$$
Then $g(x)$ is also defined recursively by
$$g(o) = 0 \quad\text{and}\quad g(x) = g(x^-) + a(x^-)\,\varphi(x), \quad x \ne o.$$
Therefore, if $x \ne o$, then
$$Pg(x) = p(x, x^-)\,g(x^-) + \sum_{y\,:\,y^-=x} p(x, y)\bigl(g(x) + a(x)\,\varphi(y)\bigr)$$
$$= p(x, x^-)\bigl(g(x) - a(x^-)\,\varphi(x)\bigr) + \bigl(1 - p(x, x^-)\bigr)g(x) + a(x)\,\lambda(x)\,p(x, x^-)\,\varphi(x) = g(x),$$
since $a(x)\,\lambda(x) = a(x^-)$. It is clear that $Pg(o) = P\varphi(o) = \sum_x p(o, x)\,\varphi(x)$. □
9.66 Corollary. If $\deg(x) \ge 2$ for all $x \in T \setminus \{o\}$ and $g = g_\varphi$ is bounded, then the random walk is transient.

Proof. If $g(x) < M$ for all $x$, then $M - g(x)$ is a non-constant, positive superharmonic function. Theorem 6.21 yields transience. □
We have defined the function $g_\varphi = g_\varphi^o$ with respect to the root $o$, and we can also define an analogous function $g_\varphi^v$ with respect to another reference vertex $v$. (Predecessors have then to be considered with respect to $v$ instead of $o$.) Then, with $\lambda(x)$ defined with respect to $o$ as above, we have the following:
$$\text{If } v \sim o, \text{ then } g_\varphi^o(x) = \varphi(v) + \frac{1}{\lambda(v)}\,g_\varphi^v(x) \quad\text{for all } x \in T_{o,v} \setminus \{v\}.$$
9.67 Corollary. If there is a cone $T_{v,w}$ ($v \sim w$) of $T$ such that $\deg(x) \ge 2$ for all $x \in T_{v,w}$, and the function $g = g_\varphi$ of (9.63) is bounded on that cone, then the random walk on $T$ is transient, and every element of $\partial T_{v,w}$ is a transient end.

Proof. By induction on $d(o, v)$, we infer from the above formula that $g = g^o$ is bounded on $T_{v,w}$ if and only if $g^v$ is bounded on $T_{v,w}$. Equivalently, it is bounded on the branch $T_{v,w} \cup \{v\}$. On that branch, we have the random walk whose transition matrix coincides with the restriction of $P$ along each oriented edge of the branch, with the only exception that it moves from $v$ to $w$ with probability $1$. See (9.2). We can now apply Corollary 9.66 to that random walk on the branch, with $g^v$ in the place of $g$. It follows that the walk on the branch is transient. We know that this holds if and only if $F(w, v) < 1$, which implies transience of $P$ on $T$.

Furthermore, if $g$ is bounded on $T_{v,w}$, then it is bounded on every sub-cone $T_{x,y}$, where $x \sim y$ and $x \in \pi(w, y)$. Therefore $F(y, x) < 1$ for all those edges $[x, y]$. This says that all ends of $T_{v,w}$ are transient. □
Another by-product of Corollaries 9.66 and 9.67 is the following criterion.

9.68 Corollary. Suppose that there are a bounded, strictly positive function $\varphi$ on $T \setminus \{o\}$ and a number $\lambda > 0$ such that
$$\sum_{y\,:\,y^- = x} p(x, y)\,\varphi(y) \ge \lambda\,p(x, x^-)\,\varphi(x) \quad\text{for all } x \in T \setminus \{o\}.$$
If $\lambda > 1$, then every end of $T$ is transient.

Proof. We can choose $\varphi(o) > 0$ arbitrarily. Since $\varphi$ is bounded and $\lambda(x) \ge \lambda > 1$ for all $x \ne o$, we have $\sup g_\varphi \le \sup\varphi\,\sum_n \lambda^{-n} < \infty$. □

We remark that the criteria of the last corollaries can also be rewritten in terms of the conductances $a(x, y) = m(x)\,p(x, y)$. Therefore, if we find an arbitrary subtree of $T$ to which one of them applies with respect to a suitable root vertex, then we get transience of that subtree as a subnetwork, and thus also of the random walk on $T$ itself. See Exercise 4.54.
We now consider the extension (9.64) of $g = g_\varphi$ to the space of ends of $T$.

9.69 Theorem. Suppose that $T$ is such that $\deg(x) \ge 2$ for all $x \in T \setminus \{o\}$.
(a) If $g(\xi) < \infty$ for all $\xi \in \partial T$, then the random walk is transient.
(b) If the random walk is transient, then $g(\xi) < \infty$ for $\nu_o$-almost every $\xi \in \partial T$.

Proof. (a) Suppose that the random walk is recurrent. By Corollary 9.66, $g$ cannot be bounded. There is a vertex $w(1) \ne o$ such that $g\bigl(w(1)\bigr) \ge 1$. Let $v(1) = w(1)^-$. Then $F\bigl(w(1), v(1)\bigr) = 1$, that is, the cone $T_{v(1),w(1)}$ is recurrent. By Corollary 9.67, $g$ is unbounded on $T_{v(1),w(1)}$, and we find $w(2) \ne w(1)$ in $T_{v(1),w(1)}$ such that $g\bigl(w(2)\bigr) \ge 2$.

We now proceed inductively. Given $w(n) \in T_{v(n-1),w(n-1)} \setminus \{w(n-1)\}$ with $g\bigl(w(n)\bigr) \ge n$, we set $v(n) = w(n)^-$. Recurrence of $T_{v(n),w(n)}$ implies via Corollary 9.67 that $g$ is unbounded on that cone, and there must be $w(n+1) \in T_{v(n),w(n)} \setminus \{w(n)\}$ with $g\bigl(w(n+1)\bigr) \ge n+1$.

The points $w(n)$ constructed in this way lie on an infinite ray. If $\xi$ is the corresponding end, then $g(\xi) \ge g\bigl(w(n)\bigr) \to \infty$. This contradicts the assumption of finiteness of $g$ on $\partial T$.
(b) Suppose transience. Recall that $PG(x, o) = G(x, o)$ when $x \ne o$, while $PG(o, o) = G(o, o) - 1$. Taking Lemma 9.65 into account, we see that the function $h(x) = g(x) + c\,G(x, o)$ is positive harmonic, where $c = P\varphi(o)$ is chosen in order to compensate the strict subharmonicity of $g$ at $o$. By Corollary 7.31 from the section about martingales, we know that $\lim h(Z_n)$ exists and is $\Pr_o$-almost surely finite. Also, by Exercise 7.32, $\lim G(Z_n, o) = 0$ almost surely. Therefore
$$\lim_{n\to\infty} g(Z_n) \text{ exists and is } \Pr_o\text{-almost surely finite.}$$
Proceeding once more as in the proof of Theorem 9.43, we have $\sigma_k \to \infty$ $\Pr_o$-almost surely, where $\sigma_k = \max\{n : Z_n = v_k(Z_\infty)\}$. Then
$$\lim_{k\to\infty} g\bigl(v_k(Z_\infty)\bigr) < \infty \quad \Pr_o\text{-almost surely.}$$
Therefore
$$\nu_o\bigl(\{\xi \in \partial T : g(\xi) < \infty\}\bigr) = \Pr_o\Bigl[Z_\infty \in \bigl\{\xi \in \partial T : \lim_{k\to\infty} g\bigl(v_k(\xi)\bigr) < \infty\bigr\}\Bigr] = 1,$$
as proposed. □
9.70 Exercise. Prove the following strengthening of Theorem 9.69 (a).
Suppose that there is a cone $T_{v,w}$ ($v \sim w$) of $T$ such that $\deg(x) \ge 2$ for all $x \in T_{v,w}$, and the extension (9.64) of $g = g_\varphi$ to the boundary satisfies $g(\xi) < \infty$ for every $\xi \in \partial T_{v,w}$. Then the random walk is transient. Furthermore, every end $\xi \in \partial T_{v,w}$ is transient.
The simplest choice for $\varphi$ is $\varphi \equiv 1$. In this case, the function $g = g_1$ has the following form:
$$g(o) = 0, \quad g(x) = 1 \text{ for } x \sim o, \quad\text{and}$$
$$g(x) = 1 + \sum_{k=1}^{n-1} \prod_{i=1}^{k} \frac{p(x_i, x_{i-1})}{1 - p(x_i, x_{i-1})}, \tag{9.71}$$
if $\pi(o, x) = [o = x_0, x_1, \dots, x_n = x]$ with $n \ge 2$.
9.72 Examples. In the following examples, we always choose $\varphi \equiv 1$, so that $g = g_1$ is as in (9.71).

(a) For simple random walk on the homogeneous tree with degree $q + 1$ of Example 9.29 and Figure 26, we have for the function $g = g_1$
$$g(x) = 1 + q^{-1} + \dots + q^{-|x|+1} \quad\text{for } x \ne o.$$
The function $g$ is bounded, and SRW is transient, as we know.
(b) Consider SRW on the tree of Example 9.33 and Figure 27. There, the recurrent ends are dense in $\partial T$, and $g(\eta_x) = \infty$ for each of them. Nevertheless, the random walk is transient.

(c) Next, consider SRW on the tree $T$ of Figure 28 in Example 9.46.
For the vertex $k \ge 2$ on the backbone, we have $g(k) = 2 - 2^{-k+2}$.
For its neighbour $k'$ on the finite hair with length $\ell(k)$ attached at $k$, the value is $g(k') = 2 - 2^{-k+1}$.
The function increases linearly along that hair, and for the root $o_k$ of the binary tree attached at the endpoint of that hair, $g(o_k) = g(k) + 2^{-k+1}\,\ell(k)$.
Finally, if $x$ is a vertex on that binary tree and $d(x, o_k) = m$, then $g(x) = g(o_k) + 2^{-k+1}(1 - 2^{-m})$.
We obtain the following values of $g$ on $\partial T$:
$$g(\varpi) = 2 \quad\text{and}\quad g(\xi) = 2 - 2^{-k+2} + 2^{-k+1}\bigl(\ell(k) + 1\bigr), \quad\text{if } \xi \in \partial T_{o_k}.$$
If we choose $\ell(k)$ big enough, e.g. $\ell(k) = k\,2^k$, then the function $g$ is everywhere finite, but unbounded on $\partial T$.
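The unboundedness claim in (c) is a one-line calculation: plugging $\ell(k) = k\,2^k$ into the displayed end-value formula gives $2 - 2^{-k+2} + 2k + 2^{-k+1}$, which grows like $2k$. The sketch below (our own, merely evaluating the formula from the example) makes this explicit:

```python
# End values of g in Example 9.72 (c) with the suggested choice l(k) = k 2^k,
# evaluated from the displayed formula.  Illustrative arithmetic only.

def g_end_value(k):
    hair_length = k * 2 ** k   # l(k) = k 2^k, as suggested in the text
    return 2 - 2.0 ** (-k + 2) + 2.0 ** (-k + 1) * (hair_length + 1)
```

Each value is finite, but the sequence over $k$ is unbounded, matching the conclusion of the example.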
(d) Finally, we give an example which shows that for the transience criterion of Theorem 9.69 it is essential to have no vertices with degree 1 (at least in some cone of the tree). Consider the half-line $\mathbb{N}$ with root $o = 1$, and attach a dangling edge at each $k \in \mathbb{N}$. See Figure 34. SRW on this tree is recurrent, but the function $g$ is bounded.
Figure 34
9.73 Exercise. Show that for the random walk of Example 9.47, when $3 \le s \le \infty$, the function $g$ is bounded.
[Hint: when $\max_i p_i < 1/2$ this is straightforward. For the general case show that
$$\max\Bigl\{\frac{p_i}{1-p_i}\,\frac{p_j}{1-p_j} : i, j,\ i \ne j\Bigr\} < 1$$
and use this fact appropriately.]
With respect to $\varphi \equiv 1$, the function $g = g_1$ of (9.71) was introduced by Bajunaid, Cohen, Colonna and Singman [3] (it is denoted $H$ there). The proofs of the related results have been generalized and simplified here. Corollary 9.68 is part of a criterion of Gilch and Müller [27], who used a different method for the proof.
Trees with finitely many cone types

We now introduce a class of infinite, locally finite trees and random walks which comprises a variety of interesting examples and allows many computations. This includes a practicable recurrence criterion that is both necessary and sufficient.
We start with $(T, P)$, where $P$ is a nearest neighbour transition matrix on the locally finite tree $T$. As above, we fix an origin $o \in T$. For $x \in T \setminus \{o\}$, we consider the cone $T_x = T_{x^-,x}$ of $x$ as a labelled tree with root $x$. The labels are the probabilities $p(v, w)$, $v, w \in T_x$ ($v \sim w$).

9.74 Definition. Two cones are isomorphic if there is a root-preserving bijection between the two that also preserves neighbourhood as well as the labels of the edges. A cone type is an isomorphism class of cones $T_x$, $x \ne o$.
The pair $(T, P)$ is called a tree with finitely many cone types if the number of distinct cone types is finite.

We write $\mathcal{I}$ for the finite set of cone types. The type (in $\mathcal{I}$) of $x \in T \setminus \{o\}$ is the cone type of $T_x$ and will be denoted by $\tau(x)$. Suppose that $\tau(x) = i$. Let $d(i, j)$ be the number of neighbours of $x$ in $T_x$ that are of type $j$. We denote
$$\bar p(i, j) = \sum_{y\,:\,y^- = x,\ \tau(y) = j} p(x, y), \quad\text{and}\quad \bar p(i) = p(x, x^-) = 1 - \sum_j \bar p(i, j).$$
As the notation indicates, those numbers depend only on $i$ and $j$, resp. (for the backward probability) only on $i$. In particular, we must have $\sum_j \bar p(i, j) < 1$. We also admit leaves, which have no forward neighbour, in which case $\bar p(i) = 1$.
Next, we augment the graph of cone types by the vertex o (the root of T ). In
this new graph
o
, we draw d(o. i ) edges from o to i , where d(o. i ) is the number
of neighbours . of o in T with |(.) = i . Each of those edges carries one of the
labels (o. .), where . ~ o and |(.) = i . As above, we write
p(o. i ) =

x : x~o, t(x)=i
(o. .).
Then

i
p(o. i ) = 1. The original tree with its transition probabilities can be
recovered from the graph
o
as the directed cover. It consists of all oriented paths
in
o
that start at o. Since we have multiple edges, such a path is not described
fully by its sequence of vertices; a path is a nite sequence of oriented edges of
with the property that the terminal vertex of one edge has to coincide with the
initial vertex of the next edge in the path. This includes the empty path without any
edge that starts and ends at o. (This path is denoted o.) Given two such paths in

o
, here denoted .. ,, we have ,
-
= . as vertices of the tree if , extends the path
. by one edge at the end. Then (.. ,) is the label of that nal edge from
o
. The
backward probability is then (,. .) = p( ), if the path , terminates at .
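The directed-cover construction can be sketched in a few lines. The data below encode the augmented graph $\Gamma_o$ for SRW on $T_2$ with hairs with $q = 2$ (as reconstructed from Example 9.75 (b) and Figure 35: three edges from $o$ to type 1 and one to type 2, all with label $1/4$; two loops and one edge to type 2 at type 1, each $1/4$; one loop at type 2 with label $1/2$); the data structures and function names are our own. Expanding all paths reproduces the tree, and at each vertex the forward labels plus the backward probability sum to $1$:

```python
# Directed cover from the augmented cone-type graph Gamma_o.
# Illustrative sketch for SRW on T_2 with hairs (q = 2), not from the book.
from fractions import Fraction

QUARTER = Fraction(1, 4)
EDGES = {                       # type -> list of (target type, edge label)
    'o': [('1', QUARTER)] * 3 + [('2', QUARTER)],
    '1': [('1', QUARTER)] * 2 + [('2', QUARTER)],
    '2': [('2', Fraction(1, 2))],
}
BACK = {'1': QUARTER, '2': Fraction(1, 2)}   # backward probability by type

def check_directed_cover(depth):
    """Expand all paths of Gamma_o up to the given depth and verify that
    at each tree vertex the outgoing probabilities sum to 1."""
    frontier = [('o', Fraction(0))]          # (type, backward probability)
    for _ in range(depth):
        new_frontier = []
        for typ, back in frontier:
            forward = sum(label for _, label in EDGES[typ])
            assert forward + back == 1       # stochastic at this vertex
            new_frontier += [(t, BACK[t]) for t, _ in EDGES[typ]]
        frontier = new_frontier
    return True
```

Using exact rationals avoids any rounding issues when verifying stochasticity.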
9.75 Examples. (a) Consider simple random walk on the homogeneous tree with degree $q + 1$ of Example 9.29 and Figure 26. There is one cone type, $\mathcal{I} = \{1\}$; we have $d(1, 1) = q$, and each of the $q$ loops at vertex (= type) $1$ carries the label (probability) $1/(q+1)$. Furthermore, $d(o, 1) = q + 1$, and each of the $q+1$ edges from $o$ to $1$ carries the label $1/(q+1)$.

(b) Consider simple random walk on the tree of Example 9.33 and Figure 27. There are two cone types, $\mathcal{I} = \{1, 2\}$, where $1$ is the type of any vertex of the homogeneous tree, and $2$ is the type of any vertex on one of the hairs. We have $d(1, 1) = q$, and each of the $q$ loops at vertex (type) $1$ carries the label $1/(q+2)$, while $d(1, 2) = d(2, 2) = 1$ with labels $1/(q+2)$ and $1/2$, respectively.
Furthermore, $d(o, 1) = q + 1$, and each of the $q+1$ edges from $o$ to $1$ carries the label (probability) $1/(q+2)$, while $d(o, 2) = 1$ and $\bar p(o, 2) = 1/(q+2)$.
Figure 35 refers to those two examples with $q = 2$.
Figure 35. The graphs $\Gamma_o$ for SRW on $T_2$ and on $T_2$ with hairs.
(c) Consider the random walk of Example 9.47 in the case when s < ∞. Again,
we have finitely many cone types. As a set, Σ = {1, ..., s}. The graph structure is
that of a complete graph: there is an oriented edge [i, j] for every pair of distinct
elements i, j. We have d(i, j) = 1 and p(i, j) = p(j) = α_j. (As a matter of fact,
when some of the α_j coincide, we have a smaller number of distinct cone types
and higher multiplicities d(i, j), but we can as well maintain the same model.)

In addition, in the augmented graph Σ^o, there is an edge [o, i] with d(o, i) = 1
and p(o, i) = α_i for each i. The reader is invited to draw a figure.

(d) Another interesting example is the following. Let 0 < α < 1 and Σ =
{1, −1}. We let d(1, 1) = 2, d(−1, 1) = d(−1, −1) = 1 and d(1, −1) = 0. The
probability labels are α/2 at each of the loops at vertex (type) 1 as well as at the
edge from −1 to 1. Furthermore, the label at the loop at vertex −1 is 1 − α.

For the augmented graph Σ^o, we let d(o, −1) = 1 with p(o, −1) = 1 − α, and
d(o, 1) = 2 with each of the resulting two edges having label α/2.

276 Chapter 9. Nearest neighbour random walks on trees

The graph Σ^o is shown in Figure 36. The reader is invited to draw the resulting
tree and transition probabilities. We shall reconsider that example later on.
[Figure 36. Random walk on T_2 in horocyclic layers: the graph Σ^o of
Example 9.75 (d), with the edge labels α/2 and 1 − α as described there.]
We return to general trees with finitely many cone types. If τ(x) = i then
F(x, x⁻|z) = F_i(z) depends only on the type i of x. Proposition 9.3 (b) leads to
a finite system of algebraic equations of degree ≤ 2,

  F_i(z) = p(i) z + ∑_j p(i, j) z F_j(z) F_i(z).   (9.76)

(If x is a terminal vertex then F_i(z) = z.) In Example 9.47, we have already
worked with these equations, and we shall return to them later on.

We now define the non-negative matrix

  A = (a(i, j))_{i,j ∈ Σ} with a(i, j) = p(i, j)/p(i).   (9.77)

Let ρ(A) be its largest non-negative eigenvalue; compare with Proposition 3.44.
This number can be seen as an overall average or balance of quotients of forward
and backward probabilities.

9.78 Theorem. Let (T, P) be a random walk on a tree with finitely many cone
types, and let A be the associated matrix according to (9.77). Then the random
walk is

  positive recurrent if and only if ρ(A) < 1,
  null recurrent if and only if ρ(A) = 1, and
  transient if and only if ρ(A) > 1.

Proof. First, we consider positive recurrence. We have to show that m(T) < ∞
for the measure of (9.6) if and only if ρ(A) < 1.
Let T_i = T_x be a cone in T with type τ(x) = i. We define a measure m_i on T_i by

  m_i(x) = 1 and m_i(y) = (p(y⁻, y)/p(y, y⁻)) m_i(y⁻) for y ∈ T_x \ {x}.

Then

  m(T) = 1 + ∑_i (p(o, i)/p(i)) m_i(T_i).

We need to show that m_i(T_i) < ∞ for all i. For n ≥ 0, let T_i^n = T_x^n be the ball
of radius n centred at x in the cone T_x (all vertices at graph distance ≤ n from x).
Then

  m_i(T_i^0) = 1 and m_i(T_i^n) = 1 + ∑_j a(i, j) m_j(T_j^{n−1}), n ≥ 1,

for each i. Consider the column vectors m = (m_i(T_i))_i and m^(n) = (m_i(T_i^n))_i,
n ≥ 1, as well as the vector 1 over Σ with all entries equal to 1. Then we see that

  m^(n) = 1 + A m^(n−1) = 1 + A·1 + A²·1 + ... + Aⁿ·1,

and m^(n) → m as n → ∞. By Proposition 3.44, this limit is finite if and only if
ρ(A) < 1.
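The Neumann series step above is easy to check numerically. The following sketch uses a made-up 2×2 matrix A with ρ(A) < 1 (a hypothetical example, not one of the book's), and verifies that the iteration m^(n) = 1 + A m^(n−1) converges to (I − A)^{−1} 1, as Proposition 3.44 asserts:

```python
import numpy as np

# Hypothetical cone-type matrix with spectral radius 1/2 < 1 (positive recurrent case).
A = np.array([[0.3, 0.2],
              [0.1, 0.4]])
rho = max(abs(np.linalg.eigvals(A)))
assert rho < 1

one = np.ones(2)
m_n = np.zeros(2)
for _ in range(200):
    m_n = one + A @ m_n      # m^(n) = 1 + A m^(n-1) = 1 + A·1 + ... + A^n·1
limit = np.linalg.solve(np.eye(2) - A, one)
print(np.allclose(m_n, limit))   # True: the increasing limit is (I - A)^{-1} 1
```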
Next, we show that transience holds if and only if ρ(A) > 1.

We start with the "if" part. Assume that ρ = ρ(A) > 1. Once more by
Proposition 3.44, there is an irreducible class J of the matrix A such that
ρ(A) = ρ(A_J), where A_J is the restriction of A to J. By the Perron–Frobenius
theorem, there is a column vector h = (h(i))_{i ∈ J} with strictly positive entries such
that A_J h = ρ(A) h.

We fix a vertex x_0 of T with τ(x_0) ∈ J and consider the subtree T_J of T_{x_0}
which is spanned by all vertices y ∈ T_{x_0} which have the property that τ(w) ∈ J for
all w ∈ π(x_0, y). Note that T_J is infinite. Indeed, every vertex in T_J has at least
one successor in T_J, because the matrix A_J is non-zero and irreducible.

We define φ(x) = h(τ(x)) for x ∈ T_J. This is a bounded, strictly positive
function, and

  ∑_{y ∈ T_J : y⁻ = x} p(x, y) φ(y) = ρ p(x, x⁻) φ(x) for all x ∈ T_J.

Therefore Corollary 9.68 applies with λ = ρ(A) > 1 to T_J as a subnetwork of T,
and transience follows; compare with the remarks after the proof of that corollary.

Finally, we assume transience and have to show that ρ(A) > 1.
Consider the transient skeleton T^tr as a subnetwork, and the associated transition
probabilities of (9.28). Then (T^tr, P^tr) is also a tree with finitely many cone types:
if x_1, x_2 ∈ T^tr have the same cone type in (T, P), then they also have the same cone
type in (T^tr, P^tr). These transient cone types are just

  Σ^tr = {i ∈ Σ : F_i(1) < 1}.

Furthermore, it follows from Exercise 9.26 that when i ∈ Σ^tr and j → i
with respect to the non-negative matrix A (i.e., a^(n)(j, i) > 0 for some n), then
j ∈ Σ^tr. Therefore, Σ^tr is a union of irreducible classes with respect to A. From
(9.28), we also see that the matrix A^tr associated with P^tr according to (9.77) is just
the restriction of A to Σ^tr. Proposition 3.44 implies ρ(A) ≥ ρ(A^tr). The proof will
be completed if we show that ρ(A^tr) > 1.

For this purpose, we may assume without loss of generality that T = T^tr,
P = P^tr and Σ = Σ^tr. Consider the diagonal matrix

  D(z) = diag(F_i(z))_{i ∈ Σ},

and let I be the identity matrix over Σ. For z = 1, we can rewrite (9.76) as

  1 − F_i(1) = ∑_j F_i(1) a(i, j) (1 − F_j(1)).

Equivalently, the non-negative, irreducible matrix

  Q = (I − D(1))^{−1} D(1) A (I − D(1))   (9.79)

is stochastic. Thus, ρ(Q) = 1, and therefore also ρ(D(1)A) = 1. The (i, j)-
element of the last matrix is F_i(1) a(i, j). It is 0 precisely when a(i, j) = 0, while
F_i(1) a(i, j) < a(i, j) strictly, when a(i, j) > 0. Therefore Proposition 3.44 in
combination with Exercise 3.43 (applying the latter to the irreducible classes of A)
yields ρ(A) > ρ(D(1)A) = 1.

The above recurrence criterion extends the one for "homesick" random walk
of Lyons [41]; it was proved by Nagnibeda and Woess [44], and a similar result
in completely different terminology had been obtained by Gairat, Malyshev,
Menshikov and Pelikh [22]. (In [44], the proof that ρ(A) = 1 implies null
recurrence is somewhat sloppy.)

9.80 Exercise. Compute the largest eigenvalue ρ(A) of A for each of the random
walks of Example 9.75.
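For Exercise 9.80, candidate answers can at least be checked mechanically. The sketch below assumes the reading in which a(i, j) collects all d(i, j) parallel edges of the graph Σ, i.e., a(i, j) = d(i, j) p(i, j)/p(i) with p(i, j) the label of a single edge; it builds A for Examples 9.75 (a), (b) and (d) with q = 2 and α = 1/2:

```python
import numpy as np

def rho(A):
    """Largest modulus of an eigenvalue of A."""
    return max(abs(np.linalg.eigvals(np.array(A, dtype=float))))

q = 2
# (a) SRW on T_q: one type; q loops with label 1/(q+1), backward prob p(1) = 1/(q+1).
A_a = [[q]]
# (b) T_q with hairs: p(1) = 1/(q+2) (degree q+2), p(2) = 1/2 on the hairs.
A_b = [[q, 1.0],
       [0.0, 1.0]]
# (d) with alpha = 1/2: p(1) = 1 - alpha, p(-1) = alpha/2.
alpha = 0.5
A_d = [[2 * (alpha / 2) / (1 - alpha), 0.0],
       [1.0, (1 - alpha) / (alpha / 2)]]

# Largest eigenvalues: 2, 2 and 2, so all three walks are transient by Theorem 9.78.
print(rho(A_a), rho(A_b), rho(A_d))
```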
H Rate of escape and spectral radius

We now plan to give a small glimpse at some results concerning the asymptotic
behaviour of the graph distances d(Z_n, o), which will turn out to be related with
the spectral radius ρ(P).

A sum S_n = X_1 + ... + X_n of independent, integer (or real) random variables
defines a random walk on the additive group of integer (or real) numbers, compare
with (4.18). If the X_k are integrable then the law of large numbers implies that

  (1/n) d(S_n, 0) = (1/n)|S_n| → ℓ almost surely, where ℓ = |E(X_1)|.

We can ask whether analogous results hold for an arbitrary Markov chain with
respect to some metric on the underlying state space. A natural choice is of course
the graph metric, when the graph of the Markov chain is symmetric. Here, we shall
address this in the context of trees, but we mention that there is a wealth of results
for different types of Markov chains, and we shall only scratch the surface of this
topic.

Recall the definition (2.29) of the spectral radius of an irreducible Markov chain
(resp. an irreducible class). The following is true for arbitrary graphs in the place
of trees.

9.81 Theorem. Let X be a connected, symmetric graph and P the transition matrix
of a nearest neighbour random walk on X. Suppose that there is c_0 > 0 such that
p(x, y) ≥ c_0 whenever x ~ y, and that ρ(P) < 1. Then there is a constant ℓ > 0
such that

  liminf_{n→∞} (1/n) d(Z_n, Z_0) ≥ ℓ   Pr_x-almost surely for every x ∈ X.
Proof. We set ρ = ρ(P) and claim that

  p^(n)(x, y) ≤ (ρ/c_0)^{d(x,y)} ρⁿ for all x, y ∈ X and n ≥ 0.

This is true when x = y by Theorem 2.32. In general, let d = d(x, y). We have
by assumption p^(d)(y, x) ≥ c_0^d, and therefore

  p^(n)(x, y) c_0^d ≤ p^(n+d)(x, x) ≤ ρ^{n+d}.

The claimed inequality follows by dividing by c_0^d.

Note that we must have deg(x) ≤ M = ⌈1/c_0⌉ for every x ∈ X, where M ≥ 2.
This implies that for each x ∈ X and k ≥ 1,

  |{y ∈ X : d(x, y) = k}| ≤ M(M − 1)^{k−1}.

Also, if x ~ y then ρ² ≥ p^(2)(x, x) ≥ p(x, y)p(y, x) ≥ c_0², so that c_0 ≤ ρ. Since
ρ < 1, we can now choose a real number ℓ > 0 such that (ρ/c_0²)^ℓ < 1/ρ. Consider
the sets

  A_n = [d(Z_n, Z_0) < ℓn]

in the trajectory space. We have

  Pr_x(A_n) = ∑_{y : d(x,y) < ℓn} p^(n)(x, y)
    ≤ ∑_{y : d(x,y) < ℓn} (ρ/c_0)^{d(x,y)} ρⁿ
    ≤ ρⁿ (1 + ∑_{k=1}^{⌈ℓn⌉} ∑_{y : d(x,y) = k} (ρ/c_0)^k)
    ≤ ρⁿ (1 + ∑_{k=1}^{⌈ℓn⌉} M(M − 1)^{k−1} (ρ/c_0)^k)
    = ρⁿ (1 + M(ρ/c_0) · (((M − 1)ρ/c_0)^{⌈ℓn⌉} − 1)/((M − 1)ρ/c_0 − 1))
    ≤ C ((ρ/c_0²)^ℓ ρ)ⁿ,

where C > 0 is a constant depending only on M, ρ and c_0. Therefore
∑_n Pr_x(A_n) < ∞, and by the Borel–Cantelli lemma, Pr_x(limsup A_n) = 0, or
equivalently,

  Pr_x( ∪_{k≥1} ∩_{n≥k} A_n^c ) = 1

for the complements A_n^c of the A_n. But this says that Pr_x-almost surely, one has
d(Z_n, Z_0) ≥ ℓn for all but finitely many n.
We see that a "reasonable" random walk with ρ(P) < 1 moves away from
the starting point at linear speed. ("Reasonable" means that p(x, y) ≥ c_0 along
each edge.) So we next ask when it is true that ρ(P) < 1. We start with simple
random walk. When Γ is an arbitrary symmetric, locally finite graph, then we write
ρ(Γ) = ρ(P) for the spectral radius of the transition matrix P of SRW on Γ.

9.82 Exercise. Show that for simple random walk on the homogeneous tree T_q
with degree q + 1 ≥ 2, the spectral radius is

  ρ(T_q) = 2√q/(q + 1).
[Hints. Variant 1: use the computations of Example 9.47. Variant 2: consider the
factor chain (Ẑ_n) on ℕ_0, where Ẑ_n = |Z_n| = d(Z_n, o) for the simple random
walk (Z_n) on T_q starting at o. Then determine ρ(P̂) from the computations in
Example 3.5.]
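The second hint can also be carried out numerically. The sketch below (an illustration, not part of the exercise) iterates the distribution of the factor chain Ẑ_n on ℕ_0 for q = 2 and compares p^(2n)(o, o)^{1/2n} with the closed formula 2√q/(q + 1); convergence is slow because of a polynomial correction factor, so only a modest tolerance is asserted:

```python
import math

# Factor chain of SRW on T_q: from 0 go to 1 w.p. 1; from k >= 1 go to k-1
# w.p. 1/(q+1) and to k+1 w.p. q/(q+1).  Then p^(2n)(o,o)^(1/2n) -> rho(T_q).
q = 2
steps = 2000                       # an even number of steps
probs = [0.0] * (steps + 2)
probs[0] = 1.0
for _ in range(steps):
    new = [0.0] * (steps + 2)
    for k, p in enumerate(probs):
        if p == 0.0:
            continue
        if k == 0:
            new[1] += p
        else:
            new[k - 1] += p / (q + 1)
            new[k + 1] += p * q / (q + 1)
    probs = new

est = probs[0] ** (1.0 / steps)    # numerical estimate of the spectral radius
exact = 2 * math.sqrt(q) / (q + 1)
print(abs(est - exact) < 0.01)     # True: close to 2*sqrt(q)/(q+1)
```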
For the following, T need not be locally nite.
9.83 Theorem. Let T be a tree with root o and P a nearest neighbour transition
matrix with the property that p(x, x⁻) ≤ 1 − α for each x ∈ T \ {o}, where
1/2 < α < 1. Then

  ρ(P) ≤ 2√(α(1 − α))

and

  liminf_{n→∞} (1/n) d(Z_n, Z_0) ≥ 2α − 1   Pr_x-almost surely for every x ∈ T.

Furthermore,

  F(x, x⁻) ≤ (1 − α)/α for every x ∈ T \ {o}.

In particular, if T is locally finite then the Green kernel vanishes at infinity.

Proof. We define the function g = g_α on ℕ_0 by

  g_α(0) = 1 and g_α(n) = (1 + (2α − 1)n) ((1 − α)/α)^{n/2} for n ≥ 1.

Then g(1) = 2√(α(1 − α)), and a straightforward computation shows that

  α g(n + 1) + (1 − α) g(n − 1) = 2√(α(1 − α)) g(n) for n ≥ 1.

Also, g is decreasing. On our tree T, we define the function φ by φ(x) = g_α(|x|),
where (recall) |x| = d(x, o). Then we have for the transition matrix P of our
random walk

  Pφ(o) = g_α(1) = 2√(α(1 − α)) g_α(0) = 2√(α(1 − α)) φ(o),

and for x ≠ o

  Pφ(x) − 2√(α(1 − α)) φ(x) = (p(x, x⁻) − (1 − α)) (g_α(|x| − 1) − g_α(|x| + 1)) ≤ 0,

where the second factor is > 0. We see that when p(x, x⁻) ≤ 1 − α for all x ≠ o
then Pφ ≤ 2√(α(1 − α)) φ and

  p^(n)(o, o) = p^(n)(o, o) φ(o) ≤ Pⁿφ(o) ≤ (2√(α(1 − α)))ⁿ φ(o) = (2√(α(1 − α)))ⁿ

for all n. The proposed inequality for ρ(P) follows.
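The "straightforward computation" for g_α can be verified mechanically. The following sketch checks the value g(1), the three-term recurrence, and monotonicity for a few values of α:

```python
import math

# Check of the properties of g_alpha used in the proof:
# g(1) = 2*sqrt(alpha*(1-alpha)), the three-term recurrence, and g decreasing.
def g(alpha, n):
    return (1 + (2 * alpha - 1) * n) * ((1 - alpha) / alpha) ** (n / 2)

for alpha in (0.6, 0.75, 0.9):
    lam = 2 * math.sqrt(alpha * (1 - alpha))
    assert math.isclose(g(alpha, 1), lam)
    for n in range(1, 20):
        lhs = alpha * g(alpha, n + 1) + (1 - alpha) * g(alpha, n - 1)
        assert math.isclose(lhs, lam * g(alpha, n))
    # g is decreasing on N_0 when 1/2 < alpha < 1:
    assert all(g(alpha, n + 1) < g(alpha, n) for n in range(20))
print("all checks passed")
```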
Next, we consider the rate of escape. For this purpose, we construct a coupling
of our random walk (Z_n) on T and the random walk (sum of i.i.d. random variables)
(S_n) on ℤ with transition probabilities p(k, k + 1) = α and p(k, k − 1) = 1 − α
for k ∈ ℤ. Namely, on the state space X = T × ℤ, we consider the Markov chain
with transition matrix Q given by

  q((o, k), (y, k + 1)) = p(o, y) α and
  q((o, k), (y, k − 1)) = p(o, y) (1 − α), if y⁻ = o,

  q((x, k), (x⁻, k − 1)) = p(x, x⁻), if x ≠ o,

  q((x, k), (y, k + 1)) = p(x, y) α/(1 − p(x, x⁻)) and
  q((x, k), (y, k − 1)) = p(x, y) (1 − α − p(x, x⁻))/(1 − p(x, x⁻)), if x ≠ o, y⁻ = x.

All other transition probabilities are = 0. The two projections of X onto T and onto
ℤ are compatible with these transition probabilities, that is, we can form the two
corresponding factor chains in the sense of (1.30). We get that the Markov chain
on X with transition matrix Q is (Z_n, S_n)_{n≥0}. Now, our construction is such that
when Z_n walks backwards then S_n moves to the left: if |Z_{n+1}| = |Z_n| − 1 then
S_{n+1} = S_n − 1. In the same way, when S_{n+1} = S_n + 1 then |Z_{n+1}| = |Z_n| + 1.
We conclude that

  |Z_n| ≥ S_n for all n, provided that |Z_0| ≥ S_0.

In particular, we can start the coupled Markov chain at (x, |x|). Then S_n can be
written as S_n = |x| + X_1 + ... + X_n, where the X_n are i.i.d. integer random variables
with Pr[X_k = 1] = α and Pr[X_k = −1] = 1 − α. By the law of large numbers,
S_n/n → 2α − 1 almost surely. This leads to the lower estimate for the velocity of
escape of (Z_n) on T.

With the same starting point (x, |x|), we see that (Z_n) cannot reach x⁻ before
(S_n) reaches |x| − 1 for the first time. That is, in our coupling we have

  t^ℤ_{|x|−1} ≤ t^T_{x⁻}   Pr_{(x,|x|)}-almost surely,

where the two first passage times refer to the random walks on ℤ and T, respectively.
Therefore,

  F_T(x, x⁻) = Pr_{(x,|x|)}[t^T_{x⁻} < ∞] ≤ Pr_{(x,|x|)}[t^ℤ_{|x|−1} < ∞] = F_ℤ(|x|, |x| − 1).

We know from Examples 3.5 and 2.10 that F_ℤ(|x|, |x| − 1) = F_ℤ(1, 0) = (1 − α)/α.
This concludes the proof.

9.84 Exercise. Show that when p(x, x⁻) = 1 − α for each x ∈ T \ {o}, where
1/2 < α < 1, then the three inequalities of Theorem 9.83 become equalities.

[Hint: verify that (|Z_n|)_{n≥0} is a Markov chain.]
9.85 Corollary. Let T be a locally finite tree with deg(x) ≥ q + 1 for all x ∈ T.
Then the spectral radius of simple random walk on T satisfies ρ(T) ≤ ρ(T_q).

Indeed, in that case, Theorem 9.83 applies with α = q/(q + 1).

Thus, when deg(x) ≥ 3 for all x then ρ(T) < 1. We next ask what happens
with the spectral radius of SRW when there are vertices with degree 2.

We say that an unbranched path of length N in a symmetric graph Γ is a path
[x_0, x_1, ..., x_N] of distinct vertices such that deg(x_k) = 2 for k = 1, ..., N − 1.
The following is true in any graph (not necessarily a tree).

9.86 Lemma. Let Γ be a locally finite, connected symmetric graph. If Γ contains
unbranched paths of arbitrary length, then ρ(Γ) = 1.

Proof. Write p_ℤ^(n)(0, 0) for the transition probabilities of SRW on ℤ that we have
computed in Exercise 4.58 (b). We know that ρ(ℤ) = 1 in that example.

Given any n ∈ ℕ, there is an unbranched path in Γ with length 2n + 2. If x is
its midpoint, then we have for SRW on Γ that

  p^(2n)(x, x) ≥ p_ℤ^(2n)(0, 0),

since within the first 2n steps, our random walk cannot leave that unbranched path,
where it evolves like SRW on ℤ. We know from Theorem 2.32 that p^(2n)(x, x) ≤
ρ(Γ)^{2n}. Therefore

  ρ(Γ) ≥ p_ℤ^(2n)(0, 0)^{1/(2n)} → 1 as n → ∞.

Thus, ρ(Γ) = 1.
Now we want to know what happens when there is an upper bound on the
lengths of the unbranched paths. Again, our considerations are valid for an arbitrary
symmetric graph Γ = (X, E). We can construct a new graph Γ̃ by replacing
each non-oriented edge of Γ (= pair of oppositely oriented edges with the same
endpoints) with an unbranched path of length k (depending on the edge). The vertex
set X of Γ is a subset of the vertex set of Γ̃. We call Γ̃ a subdivision of Γ, and the
maximum of those numbers k is the maximal subdivision length of Γ̃ with respect
to Γ.

In particular, we write Γ^(N) for the subdivision of Γ where each non-oriented
edge is replaced by a path of the same length N. Let (Z_n) be SRW on Γ^(N). Since
the vertex set X^(N) of Γ^(N) contains X as a subset, we can define the following
sequence of stopping times:

  t_0 = 0,  t_j = inf{n > t_{j−1} : Z_n ∈ X, Z_n ≠ Z_{t_{j−1}}} for j ≥ 1.

The set of those n is non-empty with probability 1, in which case the infimum is a
minimum.
9.87 Lemma. (a) The increments t_j − t_{j−1} (j ≥ 1) are independent and identically
distributed with probability generating function

  E_{x_0}(z^{t_j − t_{j−1}}) = ∑_{n=1}^∞ Pr_x[t_1 = n] zⁿ = φ(z) = 1/Q_N(1/z),  x_0, x ∈ X,

where Q_N(t) is the N-th Chebyshev polynomial of the first kind; see Example 5.6.

(b) If y is a neighbour of x in Γ then

  ∑_{n=1}^∞ Pr_x[t_1 = n, Z_{t_1} = y] zⁿ = (1/deg(x)) φ(z).

(c) For any x ∈ X,

  ∑_{n=0}^∞ Pr_x[Z_n = x, t_1 > n] zⁿ = ψ(z) = (1/z) P_{N−1}(1/z) / Q_N(1/z),

where P_{N−1}(t) is the (N − 1)-st Chebyshev polynomial of the second kind.
Proof. (a) Let Z_{t_{j−1}} = x ∈ X ⊂ X^(N), and let S(x) be the set consisting of x and
the neighbours of x in the original graph Γ. In Γ^(N), this set becomes a star-shaped
graph S^(N)(x) with centre x, where for each terminal vertex y ∈ S(x) \ {x} there
is a path from x to y with length N. Up to the stopping time t_j, (Z_n) is SRW
on S^(N)(x). But for the latter simple random walk, we can construct the factor
chain Ẑ_n = d(Z_n, x), with d(·, ·) being the graph distance in S^(N)(x). This is the
birth-and-death chain on {0, ..., N} which we have considered in Example 5.6. Its
transition probabilities are p(0, 1) = 1 and p(k, k ± 1) = 1/2 for k = 1, ..., N − 1.
Then t_j is the first instant n after time t_{j−1} when Ẑ_n = N. But this just says that

  Pr_{x_0}[t_j − t_{j−1} = k] = f^(k)(0, N),

the probability that (Ẑ_n) starting at 0 will first hit the state N at time k. The associated
generating function F(0, N|z) = 1/Q_N(1/z) was computed in Example 5.6.

Since this distribution does not depend on the specific starting point x_0 nor on
x = Z_{t_{j−1}}, the increments must be independent. The precise argument is left as
an exercise.

(b) In S^(N)(x) every terminal vertex y occurs as Z_{t_j} with the same probability
1/deg(x). This proves statement (b).

(c) Pr_x[Z_n = x, t_1 > n] is the probability that SRW on S^(N)(x) returns to x at
time n before visiting any of the endpoints of that star. With the same factor chain
argument as above, this is the probability to return to 0 for the simple birth-and-
death Markov chain on {0, 1, ..., N} with state 0 reflecting and state N absorbing.
Therefore ψ(z) is the Green function at 0 for that chain, which was computed at
the end of Example 5.6.
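Part (a) of the lemma can be checked numerically: the generating function h_0(z) = E_0(z^{first hitting time of N}) of the birth-and-death chain solves a small linear system, and for t ≥ 1 the Chebyshev polynomial of the first kind can be evaluated as Q_N(t) = cosh(N·arccosh t). A sketch:

```python
import math
import numpy as np

# Check of Lemma 9.87 (a): for the birth-and-death chain on {0,...,N} with
# p(0,1) = 1 and p(k,k+1) = p(k,k-1) = 1/2, the generating function
# h_0(z) = E_0(z^{first hitting time of N}) equals 1/Q_N(1/z).
def phi(N, z):
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    A[0, 0], A[0, 1] = 1.0, -z          # h_0 = z * h_1   (state 0 reflecting)
    for k in range(1, N):
        A[k, k] = 1.0                   # h_k = (z/2) * (h_{k-1} + h_{k+1})
        A[k, k - 1] = A[k, k + 1] = -z / 2
    A[N, N] = 1.0
    b[N] = 1.0                          # h_N = 1        (state N absorbing)
    return np.linalg.solve(A, b)[0]

for N in (1, 2, 3, 5):
    for z in (0.3, 0.6, 0.9):
        assert math.isclose(phi(N, z), 1 / math.cosh(N * math.acosh(1 / z)))
print("phi(z) = 1/Q_N(1/z) confirmed for several N and z")
```

For N = 2 this reproduces the explicit value φ(z) = z²/(2 − z²).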
9.88 Exercise. (1) Complete the proof of the fact that the increments t_j − t_{j−1} are
independent. Does this remain true for an arbitrary subdivision of Γ in the place of
Γ^(N)?

(2) Use Lemma 9.87 (b) and induction on k to show that for all x, y ∈ X ⊂ X^(N),

  ∑_{n=0}^∞ Pr_x[t_k = n, Z_n = y] zⁿ = p^(k)(x, y) φ(z)^k,

where p^(k)(x, y) refers to SRW on Γ.

[Hint: Lemma 9.87 (b) is that statement for k = 1.]
9.89 Theorem. Let Γ = (X, E) be a connected, locally finite symmetric graph.

(a) The spectral radii of SRW on Γ and its subdivision Γ^(N) are related by

  ρ(Γ^(N)) = cos( arccos ρ(Γ) / N ).

(b) If Γ̃ is an arbitrary subdivision of Γ with maximal subdivision length N,
then

  ρ(Γ) ≤ ρ(Γ̃) ≤ ρ(Γ^(N)).

Proof. (a) We write G^(N)(x, y|z) and G(x, y|z) for the Green functions of SRW
on Γ^(N) and Γ, respectively. We claim that for x, y ∈ X ⊂ X^(N),

  G^(N)(x, y|z) = G(x, y | φ(z)) ψ(z),   (9.90)

where φ(z) and ψ(z) are as in Lemma 9.87. Before proving this, we explain how
it implies the formula for ρ(Γ^(N)).

All involved functions in (9.90) arise as power series with non-negative coef-
ficients that are ≤ 1. Their radii of convergence are the respective smallest positive
singularities (by Pringsheim's theorem, already used several times). Let r(Γ^(N)) be
the (common) radius of convergence of all the functions G^(N)(x, y|z). By (9.90),
it is the minimum of the radii of convergence of G(x, y | φ(z)) and ψ(z).

We know from Example 5.6 that the radii of convergence of φ(z) and ψ(z)
(which are the functions F(0, N|z) and G(0, 0|z) of that example, respectively)
coincide and are equal to s = 1/cos(π/(2N)). Therefore the function G(x, y | φ(z)) is
finite and analytic for each z ∈ (0, s) that satisfies φ(z) < r(Γ) = 1/ρ(Γ), while the
function has a singularity at the point where φ(z) = r(Γ), that is, Q_N(1/z) = ρ(Γ).
The unique solution of this last equation for z ∈ (0, s) is z = 1/cos( arccos ρ(Γ) / N ). This
must be r(Γ^(N)), which yields the stated formula for ρ(Γ^(N)).

So we now prove (9.90).
We consider the random variables ν_n = max{j : t_j ≤ n} and decompose, for
x, y ∈ X ⊂ X^(N),

  G^(N)(x, y|z) = ∑_{k=0}^∞ ∑_{n=0}^∞ Pr_x[Z_n = y, ν_n = k] zⁿ =: ∑_{k=0}^∞ W_k(x, y|z).

When Z_n = y and ν_n = k then we cannot have Z_{t_k} = y′ ≠ y, since the random
walk cannot visit any new point in X (i.e., other than y′) in the time interval [t_k, n].
Therefore Z_{t_k} = y and Z_i ∉ X \ {y} when t_k < i < n. Using Lemma 9.87 and
Exercise 9.88 (2),

  W_k(x, y|z)
   = ∑_{m=0}^∞ ∑_{n=m}^∞ Pr_x[t_k = m, Z_m = Z_n = y, Z_i ∉ X \ {y} (m < i < n)] zⁿ
   = ∑_{m=0}^∞ Pr_x[t_k = m, Z_m = y] z^m ∑_{n=m}^∞ Pr_x[Z_n = y, Z_i ∉ X \ {y} (m < i < n) | Z_m = y] z^{n−m}
   = p^(k)(x, y) φ(z)^k ∑_{n=0}^∞ Pr_y[Z_n = y, Z_i ∉ X \ {y} (0 < i < n)] zⁿ
   = p^(k)(x, y) φ(z)^k ψ(z),

where p^(k)(x, y) refers to SRW on Γ. We conclude that

  G^(N)(x, y|z) = ∑_{k=0}^∞ p^(k)(x, y) φ(z)^k ψ(z),

which is (9.90).
(b) The proof of the inequality uses the network setting of Section 4.A, and
in particular, Proposition 4.11. We write X and X̃ for the vertex sets, and P and
P̃ for the transition operators (matrices) of SRW on Γ and Γ̃, respectively. We
distinguish the inner products on the associated ℓ²-spaces as well as the Dirichlet
norms of (4.32) by an index Γ, resp. Γ̃. Since P is self-adjoint, we have

  ‖P‖ = sup{ (Pf, f)_Γ / (f, f)_Γ : f ∈ ℓ_0(X), f ≠ 0 }.

Let X̃ be the vertex set of Γ̃. We define a map g : X̃ → X as follows. For
x ∈ X ⊂ X̃, we set g(x) = x. If x ∈ X̃ \ X is one of the new vertices on
one of the inserted paths of the subdivision, then g(x) is the closer one among the
two endpoints in X of that path. When x is the midpoint of such an inserted path,
we have to choose one of the two endpoints as g(x). Given f ∈ ℓ_0(X), we let
f̃ = f ∘ g ∈ ℓ_0(X̃). The following is simple.

9.91 Exercise. Show that

  (f̃, f̃)_Γ̃ ≥ (f, f)_Γ and D_Γ̃(f̃) = D_Γ(f).

Having done this exercise, we can resume the proof of the theorem. For arbitrary
f ∈ ℓ_0(X),

  (f, f)_Γ − (Pf, f)_Γ = D_Γ(f) = D_Γ̃(f̃) = (f̃, f̃)_Γ̃ − (P̃f̃, f̃)_Γ̃
    ≥ (1 − ‖P̃‖)(f̃, f̃)_Γ̃ ≥ (1 − ‖P̃‖)(f, f)_Γ.

That is, for arbitrary f ∈ ℓ_0(X),

  (Pf, f)_Γ ≤ ‖P̃‖ (f, f)_Γ.

We infer that ρ(Γ) = ‖P‖ ≤ ‖P̃‖ = ρ(Γ̃).

Since Γ^(N) is in turn a subdivision of Γ̃, this also yields ρ(Γ̃) ≤ ρ(Γ^(N)).
Combining the last theorem with Lemma 9.86 and Corollary 9.85 we get the
following.

9.92 Theorem. Let T be a locally finite tree without vertices of degree 1. Then
SRW on T satisfies ρ(T) < 1 if and only if there is a finite upper bound on the
lengths of all unbranched paths in T.

Another typical proof of the last theorem is via comparison of Dirichlet forms
and quasi-isometries (rough isometries), which is more elegant but also a bit less
elementary. Compare with Theorem 10.9 in [W2]. Here, we also get a numerical
bound. When deg(x) ≥ q + 1 for all vertices x with deg(x) > 2 and the upper
bound on the lengths of unbranched paths is N, then

  ρ(T) ≤ cos( arccos ρ(T_q) / N ).
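The numerical bound is easy to tabulate. The following sketch evaluates it for q = 2 and several values of N; N = 1 recovers ρ(T_2) itself, and the bound increases towards 1 as N grows, consistently with Lemma 9.86:

```python
import math

# The numerical bound rho(T) <= cos(arccos(rho(T_q)) / N), evaluated for q = 2.
rho_T2 = 2 * math.sqrt(2) / 3           # = rho(T_2), Exercise 9.82 with q = 2
bounds = [math.cos(math.acos(rho_T2) / N) for N in (1, 2, 3, 5, 10)]
print([round(b, 4) for b in bounds])
# N = 1 gives rho(T_2) back; the bound is strictly increasing in N and < 1.
assert math.isclose(bounds[0], rho_T2)
assert all(b1 < b2 < 1 for b1, b2 in zip(bounds, bounds[1:]))
```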
The following exercise takes us again into the use of the Dirichlet sum of a
reversible Markov chain, as at the end of the proof of Theorem 9.89. It provides a
tool for estimating ρ(P) for non-simple random walks.

9.93 Exercise. Let T be as in Theorem 9.92, and let P be the transition matrix of a
nearest neighbour random walk on the locally finite tree T with associated measure
m according to (9.6). Suppose the following holds.

(i) There is c > 0 such that m(x) p(x, y) ≥ c for all x, y ∈ T with x ~ y.

(ii) There is M < ∞ such that m(x) ≤ M deg(x) for every x ∈ T.

Show that the spectral radii of P and of SRW on T are related by

  1 − ρ(P) ≥ (c/M)(1 − ρ(T)).

In particular, ρ(P) < 1 when T is as in Theorem 9.92.

[Hint: establish inequalities between the ℓ² and Dirichlet norms of finitely supported
functions with respect to the two reversible Markov chains. Relate D(f)/(f, f)
with the spectral radius.]
We next want to relate the rate of escape with natural projections of a tree onto
the integers. Recall the definition (9.30) of the horocycle function hor(x, ξ) of a
vertex x with respect to the end ξ of a tree T with a chosen root o. As above, we
write v_n(ξ) for the n-th vertex on the geodesic ray π(o, ξ). Once more, we do not
require T to be locally finite.

9.94 Proposition. Let (x_n) be a sequence in a tree T with x_{n−1} ~ x_n for all n.
Then the following statements are equivalent.

(i) There is a constant a ≥ 0 (the rate of escape of the sequence) such that

  lim_{n→∞} (1/n)|x_n| = a.

(ii) There is an end ξ ∈ ∂T and a constant b ≥ 0 such that

  lim_{n→∞} (1/n) d(x_n, v_{⌊bn⌋}(ξ)) = 0.

(iii) For some (equivalently, for every) end ξ ∈ ∂T, there is a constant a_ξ ∈ ℝ
such that

  lim_{n→∞} (1/n) hor(x_n, ξ) = a_ξ.

Furthermore, we have the following.

(1) The numbers a, b and a_ξ of the three statements are related by a = b = |a_ξ|.

(2) If b > 0 in statement (ii), then x_n → ξ.

(3) If a_ξ < 0 in statement (iii), then x_n → ξ, while if a_ξ > 0 then one has
lim x_n ∈ ∂T \ {ξ}.
Proof. (i) ⇒ (ii). If a = 0 then we set b = 0 and see that (ii) holds for arbitrary
ξ ∈ ∂T. So suppose a > 0. Then

  |x_m ∧ x_{m+1}| = (1/2)(|x_m| + |x_{m+1}| − d(x_m, x_{m+1})) = ma + o(m) → ∞.

By Lemma 9.16, there is ξ ∈ ∂T such that x_n → ξ. Thus,

  |x_n ∧ ξ| = lim_{m→∞} |x_n ∧ x_m| ≥ lim_{m→∞} min{|x_i ∧ x_{i+1}| : i = n, ..., m − 1} = na + o(n).

Since |x_n ∧ ξ| ≤ |x_n| = na + o(n), we see that |x_n ∧ ξ| = na + o(n). Therefore,
on one hand

  d(x_n, x_n ∧ ξ) = |x_n| − |x_n ∧ ξ| = o(n),

while on the other hand x_n ∧ ξ and v_{⌊an⌋}(ξ) lie both on π(o, ξ), so that

  d(x_n ∧ ξ, v_{⌊an⌋}(ξ)) = o(n).

We conclude that d(x_n, v_{⌊an⌋}(ξ)) = o(n), as proposed. (The reader is invited to
visualize these arguments by a figure.)

(ii) ⇒ (iii). Let ξ be the end specified in (ii). We have

  d(x_n, x_n ∧ ξ) = d(x_n, π(o, ξ)) ≤ d(x_n, v_{⌊bn⌋}(ξ)) = o(n).

Also,

  d(x_n ∧ ξ, v_{⌊bn⌋}(ξ)) ≤ d(x_n, v_{⌊bn⌋}(ξ)) = o(n).

Therefore

  |x_n ∧ ξ| = |v_{⌊bn⌋}(ξ)| + o(n) = bn + o(n), and
  hor(x_n, ξ) = d(x_n, x_n ∧ ξ) − |x_n ∧ ξ| = −bn + o(n).

Thus, statement (iii) holds with respect to ξ with a_ξ = −b. It is clear that x_n → ξ
when b > 0, and that we do not have to specify ξ when b = 0.

To complete this step of the proof, let η ≠ ξ be another end of T, and assume
b > 0. Let v = ξ ∧ η. Then x_n ∧ ξ ∈ π(v, ξ) and x_n ∧ η = v for n sufficiently
large. For such n,

  hor(x_n, η) = d(x_n, x_n ∧ η) − |x_n ∧ η| = d(x_n, v) − |v|
   = d(x_n, x_n ∧ ξ) + |x_n ∧ ξ| − 2|v| = bn + o(n).

(The reader is again invited to draw a figure.) Thus, statement (iii) holds with
respect to η with a_η = b.

(iii) ⇒ (i). We suppose to have one end ξ such that hor(x_n, ξ)/n → a_ξ. Without
loss of generality, we may assume that x_0 = o. Since for any x,

  |x| = d(x, x ∧ ξ) + |x ∧ ξ| and hor(x, ξ) = d(x, x ∧ ξ) − |x ∧ ξ|,

we have

  liminf_{n→∞} |x_n|/n ≥ lim_{n→∞} |hor(x_n, ξ)|/n = |a_ξ|.

Next, note that every point on π(o, x_n) is some x_k with k ≤ n. In particular,
x_n ∧ ξ = x_{k(n)} with k(n) ≤ n.

Consider first the case when a_ξ < 0. Then

  |x_{k(n)}| = d(x_n, x_{k(n)}) − hor(x_n, ξ) ≥ −hor(x_n, ξ) → ∞,

so that k(n) → ∞, and

  |x_n|/n = (hor(x_n, ξ) + 2|x_n ∧ ξ|)/n = hor(x_n, ξ)/n + (2|hor(x_{k(n)}, ξ)|/k(n)) · (k(n)/n),

where k(n)/n ≤ 1, which implies

  limsup_{n→∞} |x_n|/n ≤ a_ξ + 2|a_ξ| = |a_ξ|.

The same argument also applies when a_ξ = 0, regardless of whether k(n) → ∞
or not.

Finally, consider the case when a_ξ > 0. We claim that k(n) is bounded. In-
deed, if k(n) had a subsequence k(n′) tending to ∞, then along that subsequence
hor(x_{k(n′)}, ξ) = a_ξ k(n′) + o(k(n′)) → ∞, while we must have hor(x_{k(n)}, ξ) ≤ 0
for all n. Therefore

  |x_n|/n = hor(x_n, ξ)/n + 2|x_{k(n)}|/n → a_ξ as n → ∞,

since |x_{k(n)}|/n → 0.
Proposition 9.94 contains information about the rate of escape of a nearest
neighbour random walk on a tree T. Namely, lim |Z_n|/n exists a.s. if and only
if lim hor(Z_n, ξ)/n exists a.s., and in that case, the former limit is the absolute
value of the latter. The process S_n = hor(Z_n, ξ) has the integer line ℤ as its state
space, but in general, it is not a Markov chain. We next consider a class of simple
examples where this horocyclic projection is Markovian.
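For SRW on T_q the horocyclic projection is indeed Markovian: each step goes to the predecessor (towards the chosen end) with probability 1/(q + 1) and away from it with probability q/(q + 1). A simulation sketch (an illustration, not from the book) shows the linear rate (q − 1)/(q + 1) predicted by the law of large numbers together with Proposition 9.94:

```python
import random

# hor(Z_n, w) for SRW on T_q: +1 w.p. q/(q+1) (a step away from the end w),
# -1 w.p. 1/(q+1) (a step to the predecessor), so it is a drifted walk on Z.
q = 2
rng = random.Random(1)                  # fixed seed, so the run is reproducible
n = 20000
S = 0
for _ in range(n):
    S += 1 if rng.random() < q / (q + 1) else -1
drift = (q - 1) / (q + 1)               # = |a_xi|, the rate of escape on T_q
print(abs(S / n - drift) < 0.05)        # True (deviation of many standard deviations)
```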
9.95 Example. Choose and fix an end ϖ of the homogeneous tree T_q, and let
hor(x) = hor(x, ϖ) be the Busemann function with respect to ϖ. Recall the
definition (9.32) of the horocycles Hor_k, k ∈ ℤ, with respect to that end. Instead
of the "radial" picture of T_q of Figure 26, we can look at it in horocyclic layers
with each Hor_k on a horizontal line.

This is the "upper half plane" drawing of the tree. We can think of T_q as an
infinite genealogical tree, where the horocycles are the successive, infinite genera-
tions and ϖ is the "mythical ancestor" from which all elements of the population
descend. For each k, every element x in the k-th generation Hor_k has precisely one
predecessor x^† ∈ Hor_{k−1} and q successors in Hor_{k+1} (vertices y with y^† = x).
Analogously to the notation for x⁻ as the neighbour of x on π(x, o), the predecessor
x^† is the neighbour of x on π(x, ϖ).
[Figure: the "upper half plane" drawing of T_2, with the horocycles Hor_k on
horizontal lines and the end ϖ at the top.]
.
.
.
.
.
.
. . .
. . .
Hor
-3
Hor
-2
Hor
-1
Hor
0
Hor
1
.
.
.
.
.
.

o
-
Figure 37
We look at the simplest type of nearest neighbour transition probabilities on T_q that are compatible with this generation structure. For a fixed parameter α ∈ (0, 1), define

    p(x⁻, x) = α/q and p(x, x⁻) = 1 − α, x ∈ T_q.

For the resulting random walk (Z_n) on the tree, S_n = hor(Z_n) clearly defines a factor chain in the sense of (1.29). Its state space is Z, and its (nearest neighbour) transition probabilities are

    p(k, k+1) = α and p(k, k−1) = 1 − α, k ∈ Z.

This is the infinite drunkard's walk on Z of Example 3.5 with p = α. (Attention: our q here is not 1 − p, but the branching number of T_q!) If we have the starting point Z_0 = x and k = hor(x), then we can represent S_n as a sum

    S_n = k + X_1 + ⋯ + X_n,
Chapter 9. Nearest neighbour random walks on trees
where the X_j are independent with Pr[X_j = 1] = α and Pr[X_j = −1] = 1 − α. The classical law of large numbers tells us that S_n/n → 2α − 1 almost surely. Proposition 9.94 implies immediately that on the tree,

    lim_{n→∞} |Z_n|/n = |2α − 1| Pr_x-almost surely for every x.
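The drift of the horocyclic factor chain can be checked empirically. The following sketch (our own illustration, not part of the book) simulates S_n = hor(Z_n), the drunkard's walk on Z with up-probability α, and compares S_n/n with 2α − 1; the function name and parameters are ours.

```python
# Simulate the horocyclic factor chain S_n = hor(Z_n) of Example 9.95:
# a random walk on Z with Pr[step = +1] = alpha, Pr[step = -1] = 1 - alpha.
import random

def drift_estimate(alpha: float, n: int, seed: int = 1) -> float:
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        s += 1 if rng.random() < alpha else -1
    return s / n

alpha = 0.7
print(drift_estimate(alpha, 200_000), 2 * alpha - 1)  # both close to 0.4
```

By Proposition 9.94, |Z_n|/n on the tree then tends to the absolute value |2α − 1| of this drift.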
We next claim that Pr_x-almost surely for every starting point x ∈ T_q,

    Z_n → Z_∞ ∈ ∂T \ {ω} when α > 1/2, and Z_n → ω when α ≤ 1/2.

In the first case, Z_∞ is a true random variable, while in the second case, the limiting end is deterministic. In fact, when α ≠ 1/2, this limiting behaviour follows again from Proposition 9.94. The drift-free case α = 1/2 requires some additional reasoning.
9.96 Exercise. Show that for every α ∈ (0, 1), the random walk is transient (even for α = 1/2).
[Hint: there are various possibilities to verify this. One is to compute the Green function, another is to use the fact that we have finitely many cone types; see Figure 35.]
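One way to carry out the Green-function hint numerically is the following sketch. It rests on our own first-step derivation (hedged, not the book's solution): writing f for the probability of ever reaching the predecessor x⁻ from x, and h for the probability of ever reaching a fixed child x from x⁻, first-step decompositions give f = (1−α) + αf², h = α/q + (α(q−1)/q)fh + (1−α)h², and U(x, x|1) = (1−α)h + αf; transience means U(x, x|1) < 1.

```python
# Compute the return probability U(x,x|1) of the walk of Example 9.95 on T_q
# by fixed-point iteration; the equations above are our own derivation.
def return_probability(alpha: float, q: int) -> float:
    f = h = 0.0
    for _ in range(100_000):   # monotone iteration -> minimal solutions
        f = (1 - alpha) + alpha * f * f
        h = alpha / q + alpha * (q - 1) / q * f * h + (1 - alpha) * h * h
    return (1 - alpha) * h + alpha * f

print(return_probability(0.5, 2))  # 0.75 < 1: transient even for alpha = 1/2
```

For α = 1/2 and q = 2 this gives U(x, x|1) = 3/4, so G(x, x|1) = 4 < ∞ and the walk is transient, in accordance with the exercise.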
Now we can show that Z_n → ω when α = 1/2. In this case, S_n = hor(Z_n) is recurrent on Z. In terms of the tree, this means that Z_n visits Hor_0 infinitely often; there is a random subsequence (n_k) such that Z_{n_k} ∈ Hor_0. We know from Theorem 9.18 that (Z_n) converges almost surely to a random end. This is also true for the subsequence (Z_{n_k}). Since hor(Z_{n_k}) = 0 for all k, this random end cannot be distinct from ω.
9.97 Exercise. Compute the Green and Martin kernels for the random walk of Example 9.95. Show that the Dirichlet problem at infinity is solvable if and only if α > 1/2.
We mention that Example 9.95 can be interpreted in terms of products of random affine transformations over a non-archimedean local field (such as the p-adic numbers), compare with Cartwright, Kaimanovich and Woess [9]. In that paper, a more general version of Proposition 9.94 is proved. It goes back to previous work of Kaimanovich [33].
Rate of escape on trees with finitely many cone types
We now consider (T, P) with finitely many cone types, as in Definition 9.74. Consider the associated matrix A over the set I of cone types, as in (9.77). Here, we assume to have irreducible cone types, that is, the matrix A is irreducible. In terms of the tree T, this means that for all i, j ∈ I, every cone with type i contains a sub-cone with type j. Recall the definition of the functions F_i(z), i ∈ I, and the associated system (9.76) of algebraic equations.
9.98 Lemma. If (T, P) has finitely many, irreducible cone types and the random walk is transient, then for every i ∈ I,

    F_i(1) < 1 and F_i′(1) < ∞.

Proof. We always have F_i(1) ≤ 1 for every i. By transience, F_j(1) < 1 for at least one j: otherwise every passage from a vertex to its predecessor would be almost sure, and the walk would return to the root infinitely often. If F_j(1) < 1 for some j then (9.76) yields F_i(1) < 1 for every i with a(i, j) > 0. Now irreducibility yields that F_i(1) < 1 for all i ∈ I.
Besides the diagonal matrix D(z) = diag(F_i(z))_{i∈I} that we introduced in the proof of Theorem 9.78, we consider

    P̄ = diag(p(i))_{i∈I}.

Then we can write the formula of Exercise 9.4 as

    D′(z) = (1/z²) D(z)² P̄^{−1} + D(z)² A D′(z),

where D′(z) refers to the elementwise derivative. We also know that ρ(Q) = 1 for the matrix Q of (9.79), and also that ρ(D(1)A) = 1. Proposition 3.42 and Exercise 3.43 imply that

    ρ(D(z)² A) ≤ ρ(D(1)² A) < 1 for each z ∈ [0, 1].

Therefore the inverse matrix (I − D(z)² A)^{−1} exists, is non-negative, and depends continuously on z ∈ [0, 1]. We deduce that

    D′(z) = (1/z²) (I − D(z)² A)^{−1} D(z)² P̄^{−1}

is finite in each (diagonal) entry for every z ∈ [0, 1]. □
The last lemma implies that the Dirichlet problem at infinity admits a solution (Corollary 9.44) and that

    limsup_n d(Z_n, π(o, Z_∞))/log n ≤ C < ∞ almost surely

(Theorem 9.59).
Now recall the boundary process (W_k)_{k≥1} of Definition 9.53, the increments

    δ_{k+1} = σ_{k+1} − σ_k

of the exit times, and the associated Markov chain (W_k, δ_k)_{k≥1}. The formula of Corollary 9.57 shows that the transition probabilities of (W_k) depend only on the types of the points. That is, we can build the I-valued factor chain (τ(W_k))_{k≥1}, whose transition matrix is just the matrix Q = (q(i, j))_{i,j∈I} of (9.79). It is irreducible and finite, so that it admits a unique stationary probability measure ν̄ on I. Note that ν̄(i) is the asymptotic frequency of cone type i in the boundary process: by the ergodic theorem (Theorem 3.55),

    lim_{n→∞} (1/n) Σ_{k=1}^{n} 1_i(τ(W_k)) = ν̄(i) Pr_o-almost surely.

The sum appearing on the left hand side is the number of all vertices that have cone type i among the first n points (after o) on the ray π(o, Z_∞). For the following, we write

    f^{(n)}(j) = f^{(n)}(y, y⁻), where y ∈ T \ {o}, τ(y) = j.
9.99 Lemma. If (T, P) has finitely many, irreducible cone types and the random walk is transient, then the sequence (τ(W_k), δ_k)_{k≥1} is a positive recurrent Markov chain with state space I × N_odd. Its transition probabilities are

    q̃((i, m), (j, n)) = q(i, j) f^{(n)}(j)/F_j(1).

Its stationary probability measure is given by

    ν̃(j, n) = Σ_i ν̄(i) q̃((i, m), (j, n)) (independent of m).

Proof. We have that f^{(n)}(j) > 0 for every odd n: if x has type j and y is a forward neighbour of x, then f^{(2n+1)}(x, x⁻) ≥ (p(x, y)p(y, x))^n p(x, x⁻). We see that irreducibility of Q implies irreducibility of Q̃. It is a straightforward exercise to compute that ν̃ is a probability measure and that ν̃ Q̃ = ν̃. □
9.100 Theorem. If (T, P) has finitely many, irreducible cone types and the random walk is transient, then

    lim_{n→∞} |Z_n|/n = ℓ Pr_o-almost surely, where 1/ℓ = Σ_i ν̄(i) F_i′(1)/F_i(1).
Proof. Consider the projection g : I × N_odd → N, (i, n) ↦ n. Then

    ∫_{I×N_odd} g dν̃ = Σ_{(j,n)∈I×N_odd} n ν̃(j, n)
                     = Σ_{i,j} Σ_{n∈N} ν̄(i) q(i, j) n f^{(n)}(j)/F_j(1)
                     = Σ_j ν̄(j) F_j′(1)/F_j(1) = 1/ℓ,

where ℓ is defined by the formula in the statement of the theorem. The ergodic theorem implies that

    lim_{k→∞} (1/k) Σ_{n=1}^{k} g(τ(W_n), δ_n) = 1/ℓ Pr_o-almost surely.

The sum on the left hand side is σ_k − σ_0, and we get that k/σ_k → ℓ Pr_o-almost surely.
We now define the integer valued random variables

    k(n) = max{k : σ_k ≤ n}.

Then k(n) → ∞ Pr_o-almost surely, Z_{σ_{k(n)}} = W_{k(n)}, and since Z_n ∈ T_{W_{k(n)}}, the cone rooted at the random vertex W_{k(n)},

    |Z_n| − k(n) = |Z_n| − |W_{k(n)}| = d(Z_n, W_{k(n)}) ≤ n − σ_{k(n)} < σ_{k(n)+1} − σ_{k(n)} = δ_{k(n)+1}.

(The middle ≤ follows from the nearest neighbour property.) Now

    0 ≤ (n − σ_{k(n)})/n ≤ δ_{k(n)+1}/n ≤ (σ_{k(n)+1} − σ_{k(n)})/σ_{k(n)} → 0 Pr_o-almost surely,

as n → ∞, because σ_{k+1}/σ_k → 1 as k → ∞. Consequently,

    (|Z_n| − k(n))/n → 0 and σ_{k(n)}/n → 1.

We conclude that

    |Z_n|/n = (|Z_n| − k(n))/n + (k(n)/σ_{k(n)}) · (σ_{k(n)}/n) → ℓ Pr_o-almost surely,

as n → ∞. □
The last theorem is taken from [44].
9.101 Exercise. Show that the formula for the rate of escape in Theorem 9.100 can be rewritten as

    1/ℓ = Σ_i ν̄(i) F_i(1) / ( p(i)(1 − F_i(1)) ).
9.102 Example. As an application, let us compute the rate of escape for the random walk of Example 9.47. When I is finite, there are finitely many cone types,

    d(i, j) = 1 and p(i, j) = α_j when j ≠ i, and p(i) = α_i.

The numbers F_i(1) are determined by (9.49), and

    q(i, j) = α_j F_i(1)(1 − F_j(1)) / ( α_i(1 − F_i(1)) ), j ≠ i.

We compute the stationary probability measure ν̄ for Q. Introducing the auxiliary term

    H = Σ_j ν̄(j) F_j(1) / ( α_j(1 − F_j(1)) ),
the equation ν̄ Q = ν̄ becomes

    ( H − (ν̄(i)/α_i) · F_i(1)/(1 − F_i(1)) ) · α_i(1 − F_i(1)) = ν̄(i), i ∈ I.

Therefore

    ν̄(i) = H α_i (1 − F_i(1))/(1 + F_i(1)) = (H/G(1)) · F_i(1)/(1 + F_i(1))².
The last identity holds because α_i(1 − F_i(1)²) = F_i(1)/G(1) by (9.48), which also implies that

    Σ_i F_i(1)/(1 + F_i(1)) = G(1) − G(1) Σ_i α_i F_i(1) = 1

by (1.34). Now, since ν̄ is a probability measure,

    ν̄(i) = ( F_i(1)/(1 + F_i(1))² ) / ( Σ_j F_j(1)/(1 + F_j(1))² ).
Combining all those identities with the formula of Exercise 9.101, we compute with some final effort that the rate of escape is

    ℓ = (1/G(1)) Σ_i F_i(1)/(1 + F_i(1))².
I first learnt this specific formula from T. Steger in the 1980s.
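As a plausibility check of the formula above (our own numerical sketch, not part of the book): with identical weights α_i = 1/(q+1), the walk of Example 9.47 becomes the simple random walk on the homogeneous tree T_q, whose rate of escape is classically (q−1)/(q+1); the function name and iteration scheme below are ours.

```python
# For alpha_i = 1/(q+1), all F_i(1) coincide and system (9.49) reduces to
# F = alpha + q*alpha*F^2; fixed-point iteration from 0 gives the minimal
# non-negative solution.  Then apply ell = (1/G(1)) * sum_i F_i/(1+F_i)^2.
def rate_of_escape(q: int) -> float:
    alpha = 1.0 / (q + 1)
    F = 0.0
    for _ in range(10_000):
        F = alpha + q * alpha * F * F
    G = 1.0 / (1.0 - (q + 1) * alpha * F)   # G(1) = 1/(1 - sum_i alpha_i F_i(1))
    return (1.0 / G) * (q + 1) * F / (1.0 + F) ** 2

for q in range(2, 7):
    print(q, rate_of_escape(q), (q - 1) / (q + 1))
```

For q = 2 this yields F = 1/2, G(1) = 2 and ℓ = 1/3 = (q−1)/(q+1), as expected.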
Solutions of all exercises
Exercises of Chapter 1
Exercise 1.22. This is straightforward, but formalizing it correctly needs some work. Working with the trajectory space, we first assume that A is a cylinder in A_m, that is, A = C(a_0, …, a_{m−1}, x) with a_0, …, a_{m−1} ∈ X. Then, by the Markov property as stated in Definition 1.7, combined with (1.19),

    Pr_ν[Z_n = y | A]
    = Σ_{x_{m+1},…,x_{n−1}∈X} Pr_ν[ Z_n = y, Z_j = x_j (j = m+1, …, n−1) | Z_m = x, Z_i = a_i (i = 0, …, m−1) ]
    = Σ_{x_{m+1},…,x_{n−1}∈X} Pr_x[ Z_{n−m} = y, Z_j = x_{m+j} (j = 1, …, n−m−1) ]
    = Pr_x[Z_{n−m} = y] = p^{(n−m)}(x, y).

Next, a general set A ∈ A_m with the stated properties must be a finite or countable disjoint union of cylinders C_i ∈ A_m, i ∈ I, each of the form C(a_0, …, a_{m−1}, x) with certain a_0, …, a_{m−1} ∈ X. Therefore

    Pr_ν[Z_n = y | C_i] = p^{(n−m)}(x, y) whenever Pr_ν(C_i) > 0.

So by the rules of conditional probability,

    Pr_ν[Z_n = y | A] = (1/Pr_ν(A)) Σ_i Pr_ν([Z_n = y] ∩ C_i)
    = (1/Pr_ν(A)) Σ_i Pr_ν[Z_n = y | C_i] Pr_ν(C_i)
    = (1/Pr_ν(A)) Σ_i p^{(n−m)}(x, y) Pr_ν(C_i) = p^{(n−m)}(x, y).

The last statement of the exercise is a special case of the first one. □
Exercise 1.25. Fix n ∈ N_0 and let x_0, …, x_{n+1} ∈ X. For any k ∈ N_0, the event A_k = [t = k, Z_{k+j} = x_j (j = 0, …, n)] is in A_{k+n}. Therefore we can apply Exercise 1.22:

    Pr_ν[Z_{t+n+1} = x_{n+1}, Z_{t+j} = x_j (j = 0, …, n)]
    = Σ_{k=0}^{∞} Pr_ν[Z_{k+n+1} = x_{n+1}, Z_{k+j} = x_j (j = 0, …, n), t = k]
    = Σ_{k=0}^{∞} Pr_ν[Z_{k+n+1} = x_{n+1} | A_k] Pr_ν(A_k)
    = Σ_{k=0}^{∞} p(x_n, x_{n+1}) Pr_ν(A_k)
    = p(x_n, x_{n+1}) Pr_ν[Z_{t+j} = x_j (j = 0, …, n)].

Dividing by Pr_ν[Z_{t+j} = x_j (j = 0, …, n)], we see that the statements of the exercise are true. (The computation of the initial distribution is straightforward.) □
Exercise 1.31. We start with the "if" part, which has already been outlined. We define P̃r by P̃r(Ã) = Pr_ν(π^{−1}(Ã)) for Ã ∈ Ã and have to show that under this probability measure, the sequence of n-th projections Z̃_n : Ω̃ → X̃ is a Markov chain with the proposed transition probabilities p̃(x̃, ỹ) and initial distribution ν̃. For this, we only need to show that P̃r = P̃r_ν̃, and equality has only to be checked for cylinder sets because of the uniqueness of the extended measure. Thus, let Ã = C(x̃_0, …, x̃_n) ∈ Ã. Then

    π^{−1}(Ã) = ⋃_{x_0∈x̃_0, …, x_n∈x̃_n} C(x_0, …, x_n),

and (inductively)

    P̃r(Ã) = Σ_{x_0∈x̃_0, …, x_n∈x̃_n} ν(x_0) p(x_0, x_1) ⋯ p(x_{n−1}, x_n)
    = Σ_{x_0∈x̃_0} ν(x_0) Σ_{x_1∈x̃_1} p(x_0, x_1) ⋯ Σ_{x_n∈x̃_n} p(x_{n−1}, x_n)
    = ν̃(x̃_0) p̃(x̃_0, x̃_1) ⋯ p̃(x̃_{n−1}, x̃_n) = P̃r_ν̃(Ã).

For the "only if", suppose that (π(Z_n)) is (for every starting point x ∈ X) a Markov chain on X̃ with transition probabilities p̃(x̃, ỹ). Then, given two classes x̃, ỹ ∈ X̃, for every x_0 ∈ x̃ we have

    p̃(x̃, ỹ) = Pr_{x_0}[π(Z_1) = ỹ | π(Z_0) = x̃] = Pr_{x_0}[Z_1 ∈ ỹ] = Σ_{y∈ỹ} p(x_0, y),

as required. □
Exercise 1.41. We decompose with respect to the first step. For n ≥ 1,

    u^{(n)}(x, x) = Σ_{y∈X} Pr_x[t^x = n, Z_1 = y] = Σ_{y∈X} p(x, y) Pr_x[t^x = n | Z_1 = y]
    = Σ_{y∈X} p(x, y) Pr_y[s^x = n − 1] = Σ_{y∈X} p(x, y) f^{(n−1)}(y, x).

Multiplying by z^n and summing over all n, we get the formula of Theorem 1.38 (c). □
Exercise 1.44. This works precisely as in the proof of Proposition 1.43.

    u^{(n)}(x, x) = Pr_x[t^x = n]
    ≥ Pr_x[t^x = n, s^y ≤ n]
    = Σ_{k=0}^{n} Pr_x[s^y = k] Pr_x[t^x = n | s^y = k]
    = Σ_{k=0}^{n} f^{(k)}(x, y) f^{(n−k)}(y, x),

since u^{(n−k)}(y, x) = f^{(n−k)}(y, x) when x ≠ y. □
Exercise 1.45. We have

    E_x(s^y | s^y < ∞) = Σ_{n=1}^{∞} n Pr_x[s^y = n | s^y < ∞]
    = Σ_{n=1}^{∞} n Pr_x[s^y = n]/Pr_x[s^y < ∞]
    = Σ_{n=1}^{∞} n f^{(n)}(x, y)/F(x, y|1) = F′(x, y|1)/F(x, y|1).

More precisely, in the case when z = 1 is on the boundary of the disk of convergence of F(x, y|z), that is, when s(x, y) = 1, we can apply the theorem of Abel: F′(x, y|1) has to be replaced with F′(x, y|1−), the left limit along the real line. (Actually, we just use the monotone convergence theorem, interpreting Σ_n n f^{(n)}(x, y) z^n as an integral with respect to the counting measure and letting z → 1 from below.)

If w is a cut point between x and y, then Proposition 1.43 (b) implies

    F′(x, y|z)/F(x, y|z) = F′(x, w|z)/F(x, w|z) + F′(w, y|z)/F(w, y|z),

and the formula follows by setting z = 1. □
Exercise 1.47. (a) If p = q = 1/2 then we can rewrite the formula for F′(j, N|z)/F(j, N|z) of Example 1.46 as

    F′(j, N|z)/F(j, N|z) = (2/z²) · ( λ_1(z)/(λ(z) − 1) ) · [ N(λ(z)^N + 1)/(λ(z)^N − 1) − j(λ(z)^j + 1)/(λ(z)^j − 1) ],

where λ(z) = λ_1(z)/λ_2(z). We have λ_1(1) = λ(1) = 1. Letting z → 1, we see that

    E_j(s^N | s^N < ∞) = lim_{λ→1} (2/(λ − 1)) [ N(λ^N + 1)/(λ^N − 1) − j(λ^j + 1)/(λ^j − 1) ].

This can be calculated in different ways. For example, using that

    (λ^N − 1)(λ^j − 1) ~ (λ − 1)² jN as λ → 1,

we compute

    (2/(λ − 1)) [ N(λ^N + 1)/(λ^N − 1) − j(λ^j + 1)/(λ^j − 1) ]
    ~ (2/((λ − 1)³ jN)) [ (N − j)(λ^{N+j} − 1) − (N + j)(λ^N − λ^j) ]
    = (2/((λ − 1)² jN)) [ (N − j) Σ_{k=0}^{N+j−1} λ^k − (N + j) Σ_{k=j}^{N−1} λ^k ]
    = (2/((λ − 1)² jN)) [ (N − j) Σ_{k=1}^{N+j−1} (λ^k − 1) − (N + j) Σ_{k=j}^{N−1} (λ^k − 1) ]
    = (2/((λ − 1) jN)) [ (N − j) Σ_{k=1}^{N+j−1} Σ_{m=0}^{k−1} λ^m − (N + j) Σ_{k=j}^{N−1} Σ_{m=0}^{k−1} λ^m ]
    = (2/((λ − 1) jN)) [ (N − j) Σ_{k=1}^{N+j−1} Σ_{m=1}^{k−1} (λ^m − 1) − (N + j) Σ_{k=j}^{N−1} Σ_{m=1}^{k−1} (λ^m − 1) ]
    ~ (2/(jN)) [ (N − j) Σ_{k=1}^{N+j−1} Σ_{m=1}^{k−1} m − (N + j) Σ_{k=j}^{N−1} Σ_{m=1}^{k−1} m ]
    = (N² − j²)/3.
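The value (N² − j²)/3 can be cross-checked by exact linear algebra (our own verification, not the book's method): for the walk with both barriers absorbing, w_j = E_j(s^N ; s^N < ∞) satisfies w_j = Pr_j[s^N < ∞] + (w_{j−1} + w_{j+1})/2 with w_0 = w_N = 0 and Pr_j[s^N < ∞] = j/N, and then E_j(s^N | s^N < ∞) = w_j/(j/N).

```python
# Solve the tridiagonal system for w_1, ..., w_{N-1} exactly (Thomas algorithm
# over rationals) and compare the conditional expectations with (N^2 - j^2)/3.
from fractions import Fraction

N = 12
a = [Fraction(-1, 2)] * (N - 1)            # sub-diagonal
b = [Fraction(1)] * (N - 1)                # diagonal
c = [Fraction(-1, 2)] * (N - 1)            # super-diagonal
d = [Fraction(j, N) for j in range(1, N)]  # right-hand side h_j = j/N
for i in range(1, N - 1):                  # forward elimination
    m = a[i] / b[i - 1]
    b[i] -= m * c[i - 1]
    d[i] -= m * d[i - 1]
w = [Fraction(0)] * (N - 1)
w[-1] = d[-1] / b[-1]
for i in range(N - 3, -1, -1):             # back substitution (w_N = 0)
    w[i] = (d[i] - c[i] * w[i + 1]) / b[i]
for j in range(1, N):
    assert w[j - 1] / Fraction(j, N) == Fraction(N * N - j * j, 3)
print("E_j(s^N | s^N < oo) = (N^2 - j^2)/3 verified for N =", N)
```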
(b) If the state 0 is reflecting, then the linear recursion

    F(j, N|z) = qz F(j−1, N|z) + pz F(j+1, N|z), j = 1, …, N−1,

remains the same, as well as the boundary value F(N, N|z) = 1. The boundary value at 0 has to be replaced with the equation F(0, N|z) = z F(1, N|z). Again, for |z| < 1/(2√(pq)), the solution has the form

    F(j, N|z) = a λ_1(z)^j + b λ_2(z)^j,

but now

    a + b = az λ_1(z) + bz λ_2(z) and a λ_1(z)^N + b λ_2(z)^N = 1.

Solving in a and b yields

    F(j, N|z) = [ (1 − zλ_1(z)) λ_2(z)^j − (1 − zλ_2(z)) λ_1(z)^j ] / [ (1 − zλ_1(z)) λ_2(z)^N − (1 − zλ_2(z)) λ_1(z)^N ].

In particular, Pr_j[s^N < ∞] = F(j, N|1) = 1, as it must be (why?), and E_j(s^N) = F′(j, N|1) is computed similarly as before. We omit those final details. □
Exercise 1.48. Formally, we have G(z) = (I − zP)^{−1}, and this is completely justified when X is finite. Thus,

    G_a(z) = ( I − z(aI + (1−a)P) )^{−1} = (1/(1−az)) ( I − ((z−az)/(1−az)) P )^{−1} = (1/(1−az)) G( (z−az)/(1−az) ).

For general X, we can argue by approximation with respect to finite subsets, or as follows:

    G_a(x, y|z) = Σ_{n=0}^{∞} z^n Σ_{k=0}^{n} \binom{n}{k} a^{n−k} (1−a)^k p^{(k)}(x, y)
    = Σ_{k=0}^{∞} ((1−a)/a)^k p^{(k)}(x, y) Σ_{n=k}^{∞} \binom{n}{k} (az)^n.

Since

    Σ_{n=k}^{∞} \binom{n}{k} (az)^n = (1/(1−az)) ( az/(1−az) )^k for |az| < 1,

the formula follows. □
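The matrix form of the identity is easy to confirm numerically. The following sketch (our own check, with an arbitrary small stochastic matrix of our choosing) compares both sides of G_a(z) = (1/(1−az)) G((z−az)/(1−az)).

```python
# Numerical confirmation of the resolvent identity for P_a = aI + (1-a)P.
import numpy as np

P = np.array([[0.1, 0.9, 0.0],
              [0.3, 0.2, 0.5],
              [0.4, 0.4, 0.2]])
a, z = 0.3, 0.5
I = np.eye(3)
lhs = np.linalg.inv(I - z * (a * I + (1 - a) * P))
w = (z - a * z) / (1 - a * z)              # the rescaled argument of G
rhs = np.linalg.inv(I - w * P) / (1 - a * z)
print(np.max(np.abs(lhs - rhs)))           # tiny rounding error only
```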
Exercise 1.55. Let π = [x = x_0, x_1, …, x_n = x] be a path in Π(x, x) \ {[x]}. Since x_n = x, we have 1 ≤ k ≤ n, where k = min{j ≥ 1 : x_j = x}. Then

    π_1 = [x_0, x_1, …, x_k] ∈ Π⁻(x, x), π_2 = [x_k, x_{k+1}, …, x_n] ∈ Π(x, x), and π = π_1 ∘ π_2.

This decomposition is unique, which proves the first formula. We deduce

    G(x, x|z) = w(Π(x, x)|z) = w([x]|z) + w(Π⁻(x, x) ∘ Π(x, x)|z)
    = 1 + w(Π⁻(x, x)|z) w(Π(x, x)|z) = 1 + U(x, x|z) G(x, x|z).

Analogously, if π = [x = x_0, x_1, …, x_n = y] is a path in Π(x, y) then we let m = min{j ≥ 0 : x_j = y}. We get

    π_1 = [x_0, x_1, …, x_m] ∈ Π°(x, y), π_2 = [x_m, x_{m+1}, …, x_n] ∈ Π(y, y), and π = π_1 ∘ π_2.

Once more, the decomposition is unique, and

    G(x, y|z) = w(Π(x, y)|z) = w(Π°(x, y) ∘ Π(y, y)|z) = F(x, y|z) G(y, y|z).

Regarding Theorem 1.38 (c), every π ∈ Π⁻(x, x) has a unique decomposition π = [x, y] ∘ π_2, where [x, y] ∈ E(Γ(P)) and π_2 ∈ Π°(y, x). That is,

    Π⁻(x, x) = ⋃_{y : [x,y]∈E(Γ(P))} [x, y] ∘ Π°(y, x),

which yields the formula for U(x, x|z). In the same way, let y ≠ x and π ∈ Π°(x, y). Then either π = [x, y], which is possible only when [x, y] ∈ E(Γ(P)), or else π = [x, w] ∘ π_2, where [x, w] ∈ E(Γ(P)) and π_2 ∈ Π°(w, y). Thus, noting that Π°(y, y) = {[y]},

    Π°(x, y) = ⋃_{w : [x,w]∈E(Γ(P))} [x, w] ∘ Π°(w, y), y ≠ x,

which yields Theorem 1.38 (d) in terms of weights of paths. □
Exercises of Chapter 2
Exercise 2.6. (a) ⟹ (b). If C is essential and x → y, then C = C(x) → C(y) in the partial order of irreducible classes. Since C is maximal in that order, C(y) = C, whence y ∈ C.
(b) ⟹ (c). Let x ∈ C and x → y. By assumption, y ∈ C, that is, x ↔ y.
(c) ⟹ (a). Let C(y) be any irreducible class such that C → C(y) in the partial order of irreducible classes. Choose x ∈ C. Then x → y. By assumption, also y → x, whence C(y) = C(x) = C. Thus, C is maximal in the partial order. □
Exercise 2.12. Let (X_1, P_1) and (X_2, P_2) be Markov chains. An isomorphism between (X_1, P_1) and (X_2, P_2) is a bijection φ : X_1 → X_2 such that

    p_2(φx_1, φy_1) = p_1(x_1, y_1) for all x_1, y_1 ∈ X_1.

Note that this definition does not require that the matrix P is stochastic. An automorphism of (X, P) is an isomorphism of (X, P) onto itself. Then we have the following obvious fact.

If there is an automorphism φ of (X, P) such that φx = x′ and φy = y′ then G(x, y|z) = G(x′, y′|z).

Next, let y ≠ x and define the branch

    B_{y,x} = {w ∈ X : there is a path from x to w which meets y at most at its endpoint}.

We let P_{y,x} be the restriction of the matrix P to that branch. We say that the branches B_{y,x} and B_{y′,x′} are isomorphic, if there is an isomorphism φ of (B_{y,x}, P_{y,x}) onto (B_{y′,x′}, P_{y′,x′}) such that φx = x′ and φy = y′. Again, after formulating this definition, the following fact is obvious.

If B_{y,x} and B_{y′,x′} are isomorphic, then F(x, y|z) = F(x′, y′|z).

Indeed, before reaching y for the first time, the Markov chain starting at x can never leave B_{y,x}. □
Exercise 2.26. If (X, P) is irreducible and aperiodic and x, y ∈ X, then there is k = k_{x,y} such that p^{(k)}(x, y) > 0. Also, by Lemma 2.22, there is m_x such that p^{(m)}(x, x) > 0 for all m ≥ m_x. Therefore p^{(q)}(x, y) > 0 for all q with q − k_{x,y} ≥ m_x. □
Exercise 2.34. For the two-element set C = {x, y} in question, the truncated transition matrix is

    P_C = ( 0 1/2 ; 1/4 1/2 ).

Therefore, using (1.36),

    G_C(y, y|z) = 1/det(I − zP_C) = 8/(8 − 4z − z²).

Its radius of convergence is the root of the denominator with smallest absolute value. The spectral radius is the inverse of that root, that is, the largest eigenvalue. We get

    ρ(P_C) = (1 + √3)/4.
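The eigenvalue computation is immediate to confirm numerically (our own check of the value above):

```python
# Largest eigenvalue of the truncated matrix P_C, compared with (1 + sqrt(3))/4.
import numpy as np

P_C = np.array([[0.0, 0.5],
                [0.25, 0.5]])
rho = max(np.linalg.eigvals(P_C).real)
print(rho, (1 + 3 ** 0.5) / 4)   # both 0.6830...
```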
Exercises of Chapter 3
Exercise 3.7. If y is transient, then

    Σ_{n=0}^{∞} Pr_ν[Z_n = y] = Σ_x ν(x) G(x, y) = Σ_x ν(x) F(x, y) G(y, y) ≤ G(y, y) < ∞,

since F(x, y) ≤ 1. In particular, Pr_ν[Z_n = y] → 0. □
Exercise 3.13. If C is a finite, essential class and x ∈ C, then by (3.11),

    Σ_{y∈C} F(x, y|z) (1 − z)/(1 − U(y, y|z)) = 1.

Suppose that some and hence every element of C is transient. Then U(y, y) < 1 for all y ∈ C, so that finiteness of C yields

    lim_{z→1−} Σ_{y∈C} F(x, y|z) (1 − z)/(1 − U(y, y|z)) = 0,

a contradiction. □
Exercise 3.14. We always think of X̃ as a partition of X. We realize the factor chain (Z̃_n) on the trajectory space of (X, P) (instead of its own trajectory space), which is legitimate by Exercise 1.31: Z̃_n = π(Z_n). Since x ∈ x̃, the first visit of Z̃_n to the set x̃ cannot occur after the first visit of Z_n to the point x. That is, t^{x̃} ≤ t^x. Therefore, if x is recurrent, Pr_x[t^x < ∞] = 1, then also

    Pr_{x̃}[t^{x̃} < ∞] = Pr_x[t^{x̃} < ∞] = 1.

In the same way, if x is positive recurrent, then also

    E_{x̃}(t^{x̃}) = E_x(t^{x̃}) ≤ E_x(t^x) < ∞. □
Exercise 3.18. We have

    ν(X) ≥ Σ_{y∈X} νP(y) = Σ_{x∈X} Σ_{y∈X} ν(x) p(x, y) = ν(X).

Thus, we cannot have νP(y) < ν(y) for any y ∈ X. □
Exercise 3.22. Let ε = ν(y)/2. We can find a finite subset A of X such that ν(X \ A) < ε. As in the proof of Theorem 3.19, for 0 < z < 1,

    2ε = ν(y) ≤ Σ_{x∈A} ν(x) F(x, y|z) (1 − z)/(1 − U(y, y|z)) + ε.

Therefore

    Σ_{x∈A} ν(x) F(x, y|z) (1 − z)/(1 − U(y, y|z)) ≥ ε.

Suppose first that U(y, y|1) < 1. Then the left hand side in the last inequality tends to 0 as z → 1, a contradiction. Therefore y must be recurrent, and we can apply de l'Hospital's rule. We find

    Σ_{x∈A} ν(x) F(x, y) (1/U′(y, y|1)) ≥ ε.

Thus, U′(y, y|1) < ∞, and y is positive recurrent. □
Exercise 3.24. We write C_d = C_0. Let i ∈ {1, …, d}. If y ∈ C_i and p(x, y) > 0 then x ∈ C_{i−1}. Therefore

    m_C(C_i) = Σ_{y∈C_i} Σ_{x∈C_{i−1}} m_C(x) p(x, y) = Σ_{x∈C_{i−1}} m_C(x) Σ_{y∈C_i} p(x, y) = m_C(C_{i−1}),

since the inner sum equals 1. Thus, m_C(C_i) = 1/d. Now write m_i for the stationary probability measure of P^d_C on C_i. We claim that

    m_i(x) = d m_C(x) if x ∈ C_i, and m_i(x) = 0 otherwise.

By the above, this is a probability measure on C_i. If y ∈ C_i, then

    m_i(y) = d Σ_{x∈X} m_C(x) p^{(d)}(x, y) = m_i P^d(y)

(the summand is > 0 only if x ∈ C_i), as claimed. The same result can also be deduced by observing that

    t^x_{P^d} = t^x_P / d, if Z_0 = x,

where the indices P^d and P refer to the respective Markov chains. □
Exercise 3.27. (We omit the figure.) Since p(k, 1) = 1/2 for all k, while every other column of the transition matrix contains a 0, we have t(P) = 1/2. Theorem 3.26 implies that the Markov chain is positive recurrent. The stationary probability measure must satisfy

    m(1) = Σ_{k∈N} m(k) p(k, 1) = 1/2 and m(k+1) = m(k) p(k, k+1) = m(k)/2.

Thus, m(k) = 2^{−k}, and for every j,

    Σ_{k∈N} |p^{(n)}(j, k) − 2^{−k}| ≤ 2^{−n+1}.
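The convergence bound can be verified exactly (our own check with rational arithmetic, starting at j = 1): from state 1 only the states 1, …, n+1 are reachable in n steps, so the full sum over N is computable in finitely many terms.

```python
# Exact check of  sum_k |p^(n)(1,k) - 2^{-k}| <= 2^{-n+1}  for the chain
# with p(k,1) = p(k,k+1) = 1/2.
from fractions import Fraction

dist = {1: Fraction(1)}             # distribution of Z_0 under Pr_1
for n in range(1, 13):
    new = {}
    for k, pr in dist.items():      # one step: to 1 or to k+1, w.p. 1/2 each
        new[1] = new.get(1, Fraction(0)) + pr / 2
        new[k + 1] = new.get(k + 1, Fraction(0)) + pr / 2
    dist = new
    K = max(dist)                   # states above K carry no mass, tail = 2^{-K}
    total = sum(abs(dist.get(k, Fraction(0)) - Fraction(1, 2 ** k))
                for k in range(1, K + 1)) + Fraction(1, 2 ** K)
    assert total <= Fraction(1, 2 ** (n - 1))
print("bound verified for n = 1, ..., 12")
```

In fact the computation shows that the total variation sum equals 2^{−n} exactly for this starting point.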
Exercise 3.32. Set h(x) = ν(x)/m(x). Then

    P̂h(x) = Σ_y ( m(y) p(y, x)/m(x) ) ( ν(y)/m(y) ) = νP(x)/m(x).

Thus P̂h = h if and only if νP = ν. □
Exercise 3.41. In addition to the normalization Σ_x h(x)ν(x) = 1 of the right and left ρ-eigenvectors of A, we can also normalize such that Σ_x ν(x) = 1. With those conditions, the eigenvectors are unique.

Consider first the y-column a_ρ(·, y) of A_ρ. Since it is a right ρ-eigenvector of A, there must be a constant c(y) depending on y such that a_ρ(x, y) = c(y)h(x). On the other hand, the x-row a_ρ(x, ·) = c(·)h(x) is a left ρ-eigenvector of A, and since h(x) > 0, also c(·) is a left ρ-eigenvector. Therefore there is a constant γ such that c(y) = γ ν(y) for all y. □
Exercise 3.43. With

    p(x, y) = a(x, y)h(y)/(ρ(A)h(x)) and q(x, y) = b(x, y)h(y)/(ρ(A)h(x))

we have that P is stochastic and q(x, y) ≤ p(x, y) for all x, y ∈ X. If Q is also stochastic, then

    Σ_y ( p(x, y) − q(x, y) ) = 0,

with each summand ≥ 0, for every x, whence P = Q.

Now suppose that B is irreducible and B ≠ A. Then Q is irreducible and strictly substochastic in at least one row. By Proposition 2.31, ρ(Q) < 1. But ρ(B) = ρ(Q)ρ(A), so that ρ(B) < ρ(A). Finally, if B is not irreducible and dominated by A, let C = (1/2)(A + B). Then C is irreducible, dominates B and is dominated by A. Furthermore, there must be x, y ∈ X such that a(x, y) > 0 and b(x, y) = 0, so that c(x, y) < a(x, y). By Proposition 3.42 we have that max{|λ| : λ ∈ spec(B)} ≤ ρ(C), and by the above, ρ(C) < ρ(A) strictly. □
Exercise 3.45. We just have to check that every step of the proof that

    ρ(A) = min{t > 0 : there is g : X → (0, ∞) with Ag ≤ t g}

remains valid even when X is an infinite (countable) set. This is indeed the case, with some care where the Heine–Borel theorem is used. Here one can use the classical diagonal method for extracting a subsequence (g_{k(n)}) that converges pointwise. □
Exercise 3.52. (1) Let (Z_n)_{n≥0} be the Markov chain on X with transition matrix P. We know from Theorem 2.24 that Z_n can return to the starting point only when n is a multiple of d. Thus,

    t^x_{P^d} = t^x_P / d, if Z_0 = x,

where the indices P^d and P refer to the respective Markov chains, as mentioned above. Therefore E_x(t^x_P) < ∞ if and only if E_x(t^x_{P^d}) < ∞.

(2) This follows by applying Theorem 3.48 to the irreducible, aperiodic Markov chain (C_i, P^d_{C_i}).

(3) Let r = j − i if j ≥ i and r = j − i + d if j < i. Then we know from Theorem 2.24 that for w ∈ C_j we have p^{(m)}(x, w) > 0 only if d | m − r. We can write

    p^{(nd+r)}(x, y) = Σ_{w∈C_j} p^{(r)}(x, w) p^{(nd)}(w, y).

By (2), p^{(nd)}(w, y) → d m(y) for all w ∈ C_j. Since Σ_{w∈C_j} p^{(r)}(x, w) = 1, we get (by dominated convergence) that for x ∈ C_i and y ∈ C_j,

    lim_{n→∞} p^{(nd+j−i)}(x, y) = d m(y). □
Exercise 3.59. The first identity is clear when x = y, since L(x, x|z) = 1. Suppose that x ≠ y. Then

    p^{(n)}(x, y) = Σ_{k=0}^{n−1} Pr_x[Z_k = x, Z_j ≠ x (j = k+1, …, n), Z_n = y]
    = Σ_{k=0}^{n−1} p^{(k)}(x, x) ℓ^{(n−k)}(x, y) = Σ_{k=0}^{n} p^{(k)}(x, x) ℓ^{(n−k)}(x, y),

since ℓ^{(0)}(x, y) = 0. The formula now follows once more from the product rule for power series.

For the second formula, we decompose with respect to the last step: for n ≥ 2,

    Pr_x[t^x = n] = Σ_{y≠x} Pr_x[Z_j ≠ x (j = 1, …, n−1), Z_{n−1} = y, Z_n = x]
    = Σ_{y≠x} ℓ^{(n−1)}(x, y) p(y, x) = Σ_y ℓ^{(n−1)}(x, y) p(y, x),

since ℓ^{(n−1)}(x, x) = 0 for n ≥ 2. The identity Pr_x[t^x = n] = Σ_y ℓ^{(n−1)}(x, y) p(y, x) also remains valid when n = 1. Multiplying by z^n and summing over all n, we get the proposed formula.

The proof of the third formula is completely analogous to the previous one. □
Exercise 3.60. By Theorem 1.38 and Exercise 3.59,

    G(x, y|z) G(y, x|z) = F(x, y|z) G(y, y|z) F(y, x|z) G(x, x|z)
                        = G(x, x|z) L(x, y|z) G(y, y|z) L(y, x|z).

For real z ∈ (0, 1), we can divide by G(x, x|z) G(y, y|z) and get the proposed identity

    L(x, y|z) L(y, x|z) = F(x, y|z) F(y, x|z).

Since x ↔ y, we have L(x, y|1) > 0 and L(y, x|1) > 0. But

    L(x, y|1) L(y, x|1) = F(x, y) F(y, x) ≤ 1,

so that we must have L(x, y|1) < ∞ and L(y, x|1) < ∞. □
Exercise 3.61. If x is a recurrent state and L(x, y|z) > 0 for z > 0 then x ↔ y (since x is essential). Using the suggestion and Exercise 3.59,

    Σ_{y∈X} G(x, x|z) L(x, y|z) = 1/(1 − z), or equivalently, Σ_{y∈X} L(x, y|z) = (1 − U(x, x|z))/(1 − z).

Letting z → 1, the formula follows. □
Exercise 3.64. Following the suggestion, we consider an arbitrary initial distribution ν and choose a state x ∈ X. Then we define t_0 = s^x and, as before, t_k = inf{n > t_{k−1} : Z_n = x} for k ≥ 1. Then we let Y_0 = Σ_{n=0}^{s^x} f(Z_n), while Y_k for k ≥ 1 remains as before. The Y_k, k ≥ 0, are independent, and for k ≥ 1, they all have the same distribution. In particular,

    (1/k) S_{t_k}(f) = (1/k) Y_0 + (1/k) Σ_{j=1}^{k} Y_j → (1/m(x)) ∫_X f dm.

The proof now proceeds precisely as before. □
Exercise 3.65. Let Z̃_n = (Z_n, Z_{n+1}). It is immediate that this is a Markov chain. For (x, y), (v, w) ∈ X̃, write p̃((x, y), (v, w)) for its transition probabilities. Then

    p̃((x, y), (v, w)) = p(y, w) if v = y, and = 0 otherwise.

By induction (prove this in detail!),

    p̃^{(n)}((x, y), (v, w)) = p^{(n−1)}(y, v) p(v, w).

From this formula, it becomes apparent that the new Markov chain inherits irreducibility and aperiodicity from the old one. The limit theorem for (X, P) implies that

    p̃^{(n)}((x, y), (v, w)) → m(v) p(v, w), as n → ∞.

Thus, the stationary probability distribution for the new chain is given by m̃(x, y) = m(x) p(x, y). We can apply the ergodic theorem with f = 1_{(x,y)} to (Z̃_n) and get

    (1/N) Σ_{n=0}^{N−1} 1_{[Z_n = x]} 1_{[Z_{n+1} = y]} = (1/N) Σ_{n=0}^{N−1} 1_{(x,y)}(Z̃_n) → m̃(x, y) = m(x) p(x, y)

almost surely. □
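The n-step formula for the pair chain is easy to test on a concrete example (our own illustration, with a small matrix of our choosing):

```python
# Build the pair chain for a 2-state matrix and compare matrix powers with
# p~^(n)((x,y),(v,w)) = p^(n-1)(y,v) p(v,w).
import numpy as np

P = np.array([[0.2, 0.8],
              [0.6, 0.4]])
S = [(x, y) for x in range(2) for y in range(2)]   # states of the pair chain
Pt = np.zeros((4, 4))
for i, (x, y) in enumerate(S):
    for j, (v, w) in enumerate(S):
        if v == y:
            Pt[i, j] = P[y, w]
n = 5
lhs = np.linalg.matrix_power(Pt, n)
for i, (x, y) in enumerate(S):
    for j, (v, w) in enumerate(S):
        rhs = np.linalg.matrix_power(P, n - 1)[y, v] * P[v, w]
        assert abs(lhs[i, j] - rhs) < 1e-12
print("pair-chain formula verified for n =", n)
```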
Exercise 3.70. This is proved exactly as Theorem 3.9. For 0 < z < r,

    (1 − U(x, x|z))/(1 − U(y, y|z)) = G(y, y|z)/G(x, x|z) ≥ p^{(l)}(y, x) p^{(k)}(x, y) z^{k+l},

where k and l are chosen such that p^{(k)}(x, y) > 0 and p^{(l)}(y, x) > 0. Therefore, in the ρ-recurrent case, once again via de l'Hospital's rule,

    U′(x, x|r)/U′(y, y|r) = lim_{z→r−} G(y, y|z)/G(x, x|z) ≥ p^{(l)}(y, x) p^{(k)}(x, y) r^{k+l} > 0. □
Exercise 3.71. The function z ↦ U(x, x|z) is monotone increasing and differentiable for z ∈ (0, s). It follows from Proposition 2.28 that r must be the unique solution in (0, s) of the equation U(x, x|z) = 1. In particular, U′(x, x|r) < ∞. □
Exercises of Chapter 4

Exercise 4.7. For the norm of ∇f,

    (∇f, ∇f) = (1/2) Σ_{e∈E} (1/r(e)) ( f(e⁺) − f(e⁻) )² ≤ Σ_{e∈E} (1/r(e)) ( f(e⁺)² + f(e⁻)² )
    = Σ_{x∈X} ( Σ_{e : e⁺=x} 1/r(e) + Σ_{e : e⁻=x} 1/r(e) ) f(x)²
    = Σ_{x∈X} 2 m(x) f(x)² = 2 (f, f).

Regarding the adjoint operator, we have for every finitely supported θ : E → R

    (∇f, θ) = (1/2) Σ_{e∈E} ( f(e⁺) − f(e⁻) ) θ(e)
    = (1/2) Σ_{x∈X} ( Σ_{e : e⁺=x} f(x)θ(e) − Σ_{e : e⁻=x} f(x)θ(e) )
    = Σ_{x∈X} f(x) Σ_{e : e⁺=x} θ(e)   (since θ(e) = −θ(ě))
    = (f, g), where g(x) = (1/m(x)) Σ_{e : e⁺=x} θ(e).

Therefore ∇*θ = g. □
Exercise 4.10. It is again sufficient to verify this for finitely supported f. Since 1/(m(x) r(e)) = p(x, y) for e = [x, y], we have

    ∇*(∇f)(x) = (1/m(x)) Σ_{e : e⁺=x} ( f(e⁺) − f(e⁻) )/r(e)
    = Σ_{y : [x,y]∈E} ( f(x) − f(y) ) p(x, y) = f(x) − Pf(x),

as proposed. □
Exercise 4.17. If Qh = λh then Ph = (a + (1 − a)λ)h. Therefore

    λ_min(P) = a + (1 − a) λ_min(Q) ≥ a − (1 − a),

with equality precisely when λ_min(Q) = −1.

If P^k ≥ aI then we can write P^k = aI + (1 − a)Q, where Q is stochastic. By the first part, λ_min(P^k) ≥ 2a − 1. When k is odd, λ_min(P^k) = ( λ_min(P) )^k. □
Exercise 4.20. We have

    Pr[Y_1 Y_2 = x] = Σ_y Pr[Y_1 = y, Y_1 Y_2 = x] = Σ_y Pr[Y_1 = y, Y_2 = y^{−1}x]
    = Σ_y Pr[Y_1 = y] Pr[Y_2 = y^{−1}x] = Σ_y μ_1(y) μ_2(y^{−1}x).

This is μ_1 * μ_2(x). □
Exercise 4.25. We use induction on d. Thus, we need to indicate the dimension in the notation χ_ε = χ^{(d)}_ε.

For d = 1, linear independence is immediate. Suppose linear independence holds for d − 1. For ε = (ε_1, …, ε_d) ∈ E_d we let ε′ = (ε_1, …, ε_{d−1}) ∈ E_{d−1}, so that ε = (ε′, ε_d). Analogously, if x = (x_1, …, x_d) ∈ Z_2^d, then we write x = (x′, x_d). Now suppose that for a set of real coefficients c_ε, we have

    Σ_{ε∈E_d} c_ε χ^{(d)}_ε(x) = 0 for all x ∈ Z_2^d.

Since χ^{(d)}_ε(x) = ε_d^{x_d} χ^{(d−1)}_{ε′}(x′), we can rewrite this as

    Σ_{ε′∈E_{d−1}} ( c_{(ε′,1)} + (−1)^{x_d} c_{(ε′,−1)} ) χ^{(d−1)}_{ε′}(x′) = 0 for all x′ ∈ Z_2^{d−1}, x_d ∈ Z_2.

By the induction hypothesis, with x_d = 0 and x_d = 1, respectively,

    c_{(ε′,1)} + c_{(ε′,−1)} = 0 and c_{(ε′,1)} − c_{(ε′,−1)} = 0 for all ε′ ∈ E_{d−1}.

Therefore c_{(ε′,1)} = c_{(ε′,−1)} = 0, that is, c_ε = 0 for all ε ∈ E_d, completing the induction argument. □
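In fact the characters are pairwise orthogonal, which gives linear independence directly; this is easy to verify for a small dimension (our own numerical illustration):

```python
# The 2^d characters chi_eps(x) = prod_i eps_i^{x_i} of Z_2^d form an
# orthogonal family: their Gram matrix is 2^d times the identity.
import itertools
import numpy as np

d = 4
eps = list(itertools.product([1, -1], repeat=d))    # E_d
pts = list(itertools.product([0, 1], repeat=d))     # Z_2^d
M = np.array([[np.prod([e[i] ** x[i] for i in range(d)]) for x in pts]
              for e in eps])
assert np.array_equal(M @ M.T, 2 ** d * np.eye(2 ** d, dtype=int))
print("characters pairwise orthogonal, Gram matrix = 2^d * I")
```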
Exercise 4.28. We define the measure m̃ on X̃ by m̃(x̃) = Σ_{x∈x̃} m(x). Then

    m̃(x̃) p̃(x̃, ỹ) = Σ_{x∈x̃} m(x) Σ_{y∈ỹ} p(x, y) = Σ_{y∈ỹ} Σ_{x∈x̃} m(y) p(y, x) = m̃(ỹ) p̃(ỹ, x̃). □
Exercise 4.30. Stirling's formula says that

    N! ~ (N/e)^N √(2πN), as N → ∞.

Therefore

    \binom{N}{N/2} ~ (N/e)^N √(2πN) / ( (N/(2e))^N πN ) = 2^N √(2/(πN))

and

    (N/4) log ( 1 / ( 2^{−N} \binom{N}{N/2} ) ) ~ (N/4) log √(πN/2) = (N/8) ( log N + log(π/2) ) ~ (N log N)/8,

as N → ∞.
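The central binomial asymptotics are quick to check numerically (our own illustration; exact rational arithmetic avoids overflow for large N):

```python
# Ratio binom(N, N/2) / (2^N * sqrt(2/(pi*N))) should tend to 1 as N grows.
import math
from fractions import Fraction

for N in (100, 1000, 10000):
    exact = Fraction(math.comb(N, N // 2), 2 ** N)
    approx = math.sqrt(2 / (math.pi * N))
    print(N, float(exact) / approx)   # ratio tends to 1 (error ~ 1/(4N))
```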
Exercise 4.41. In the Ehrenfest model, the state space is $X = \{0, \dots, N\}$, the edge set is $E = \{ [j-1, j] : j = 1, \dots, N \}$,
$$m(j) = \binom{N}{j} 2^{-N}, \quad \text{and} \quad r([j-1, j]) = 2^N \Big/ \binom{N-1}{j-1}.$$
For $i < k$, we have the obvious choices $\pi_{i,k} = [i, i+1, \dots, k]$ and $\pi_{k,i} = [k, k-1, \dots, i]$. We get
$$\kappa([j, j-1]) = \kappa([j-1, j]) = \frac{1}{2^N \binom{N-1}{j-1}} \underbrace{\sum_{i=0}^{j-1} \sum_{k=j}^{N} (k - i) \binom{N}{i} \binom{N}{k}}_{=\, S_1 - S_2}.$$
We compute
$$S_1 = \sum_{i=0}^{j-1} \binom{N}{i} \sum_{k=j}^{N} k \binom{N}{k} = N \sum_{i=0}^{j-1} \binom{N}{i} \sum_{k=j}^{N} \binom{N-1}{k-1} = N \sum_{i=0}^{j-1} \binom{N}{i} \Bigl( 2^{N-1} - \sum_{k=0}^{j-2} \binom{N-1}{k} \Bigr)$$
and
$$S_2 = \sum_{i=0}^{j-1} i \binom{N}{i} \sum_{k=j}^{N} \binom{N}{k} = N \sum_{i=1}^{j-1} \binom{N-1}{i-1} \sum_{k=j}^{N} \binom{N}{k} = N \sum_{i=0}^{j-2} \binom{N-1}{i} \Bigl( 2^{N} - \sum_{k=0}^{j-1} \binom{N}{k} \Bigr).$$
Therefore, using $\binom{N}{i} - \binom{N-1}{i} = \binom{N-1}{i-1}$,
$$S_1 - S_2 = N\, 2^{N-1} \sum_{i=0}^{j-1} \binom{N}{i} - N\, 2^{N} \sum_{i=0}^{j-2} \binom{N-1}{i}$$
$$= N\, 2^{N-1} \sum_{i=0}^{j-1} \binom{N}{i} - N\, 2^{N-1} \sum_{i=0}^{j-1} \binom{N-1}{i} - N\, 2^{N-1} \sum_{i=0}^{j-2} \binom{N-1}{i} + N\, 2^{N-1} \binom{N-1}{j-1}$$
$$= N\, 2^{N-1} \sum_{i=1}^{j-1} \binom{N-1}{i-1} - N\, 2^{N-1} \sum_{i=0}^{j-2} \binom{N-1}{i} + N\, 2^{N-1} \binom{N-1}{j-1} = N\, 2^{N-1} \binom{N-1}{j-1}.$$

We conclude that $\kappa(e) = N/2$ for each edge. Thus, our estimate for the spectral gap becomes $1 - \lambda_1 \ge 2/N$, which misses the true value $2/(N+1)$ by the (asymptotically as $N \to \infty$) negligible factor $N/(N+1)$.¹
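The binomial identity for $S_1 - S_2$ derived above can be verified exactly in integer arithmetic for small $N$ — a check of mine, not in the book:

```python
import math

def S1(N, j):
    return (sum(math.comb(N, i) for i in range(j))
            * sum(k * math.comb(N, k) for k in range(j, N + 1)))

def S2(N, j):
    return (sum(i * math.comb(N, i) for i in range(j))
            * sum(math.comb(N, k) for k in range(j, N + 1)))

# S1 - S2 = N * 2^(N-1) * C(N-1, j-1), exactly, for every edge [j-1, j]
identity_holds = all(
    S1(N, j) - S2(N, j) == N * 2 ** (N - 1) * math.comb(N - 1, j - 1)
    for N in range(2, 13) for j in range(1, N + 1)
)
```

Dividing by $2^N \binom{N-1}{j-1}$ then gives $\kappa(e) = N/2$ directly.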
Exercise 4.45. We have to take care of the case when $\deg(x) = \infty$. Following the suggestion,
$$\sum_{y : y \sim x} |f(x) - f(y)|\, a(x, y) = \sum_{y : y \sim x} \bigl( |f(x) - f(y)|\, \sqrt{a(x, y)} \bigr) \sqrt{a(x, y)}$$
$$\le \sqrt{ \sum_{y : y \sim x} \bigl( f(x) - f(y) \bigr)^2 a(x, y) }\; \sqrt{ \sum_{y : y \sim x} a(x, y) } \le \sqrt{ D(f)\, m(x) },$$
which is finite, since $f \in D(N)$. Now both sums
$$\sum_{y : y \sim x} \bigl( f(x) - f(y) \bigr)\, a(x, y) = m(x)\, \nabla^*(\nabla f)(x) \quad \text{and} \quad \sum_{y : y \sim x} f(x)\, a(x, y) = m(x)\, f(x)$$
are absolutely convergent. Therefore we may separate the differences:
$$m(x)\, \nabla^*(\nabla f)(x) = \sum_{y : y \sim x} f(x)\, a(x, y) - \sum_{y : y \sim x} f(y)\, a(x, y) = m(x) \bigl( f(x) - Pf(x) \bigr).$$
This is the proposed formula.
Exercise 4.52. Let $\phi = \nabla G(\cdot, x)/m(x)$. Since $G(\cdot, x) \in D(N)$, we can apply Exercise 4.45 to compute the power of $\phi$:
$$\langle \phi, \phi \rangle = \frac{1}{m(x)^2} \bigl\langle G(\cdot, x),\, \nabla^* \nabla G(\cdot, x) \bigr\rangle = \frac{1}{m(x)^2} \bigl\langle G(\cdot, x),\, (I - P)\, G(\cdot, x) \bigr\rangle = \frac{1}{m(x)^2} \bigl\langle G(\cdot, x),\, 1_x \bigr\rangle = \frac{G(x, x)}{m(x)}.$$
From the proof of Theorem 4.51, we see that for finite $A \subset X$,
$$\frac{m(x)}{G_A(x, x)} \le \operatorname{cap}(x) \le \frac{1}{\langle \phi, \phi \rangle} = \frac{m(x)}{G(x, x)}.$$
If we let $A \uparrow X$, we get the proposed formula for $\operatorname{cap}(x)$.

¹ The author acknowledges input from Theresia Eisenkölbl regarding the computation of $S_1 - S_2$.

Finally, we justify writing "The flow from $x$ to $\infty$ with ...": the set of all flows from $x$ to $\infty$ with input 1 is closed and convex in $\ell^2(E, r)$, so that it has indeed a unique element with minimal norm.
Exercise 4.54. Suppose that $\phi'$ is a flow from $x \in X'$ to $\infty$ with input 1 and finite power in the subnetwork $N' = (X', E', r')$ of $N = (X, E, r)$. We can extend $\phi'$ to a function $\phi$ on $E$ by setting
$$\phi(e) = \begin{cases} \phi'(e), & \text{if } e \in E',\\ 0, & \text{if } e \in E \setminus E'. \end{cases}$$
Then $\phi$ is a flow from $x \in X$ to $\infty$ with input 1. Its power in $\ell^2(E, r)$ reduces to
$$\sum_{e \in E'} \phi'(e)^2\, r(e) \le \sum_{e \in E'} \phi'(e)^2\, r'(e),$$
which is finite by assumption. Thus, also $N$ is transient.
Exercise 4.58. (a) The edge set is $\{ [k, k+1] : k \in \mathbb{Z} \}$. All resistances are $= 1$. Suppose that $\phi$ is a flow with input 1 from state 0 to $\infty$. Let $\phi([0, 1]) = t$. Then $\phi([0, -1]) = 1 - t$. Furthermore, we must have $\phi([k, k+1]) = t$ and $\phi([-k, -k-1]) = 1 - t$ for all $k \ge 0$. Therefore
$$\langle \phi, \phi \rangle = \sum_{k \ge 0} \bigl( t^2 + (1 - t)^2 \bigr) = \infty.$$
Thus, every flow from 0 to $\infty$ with input 1 has infinite power, so that SRW is recurrent.

(b) We use the classical formula "number of favourable cases divided by the number of possible cases", which is justified when all cases are equally likely. Here, a single case is a trajectory (path) of length $2n$ that starts at state 0. There are $2^{2n}$ such trajectories, each of which has probability $1/2^{2n}$.

If the walker has to be back at state 0 at the $2n$-th step, then of those $2n$ steps, $n$ must go right and the other $n$ must go left. Thus, we have to select from the time interval $\{1, \dots, 2n\}$ the subset of those $n$ instants when the walker goes right. We conclude that the number of favourable cases is $\binom{2n}{n}$. This yields the proposed formula for $p^{(2n)}(0, 0)$.

(c) We use (3.6) with $p = q = 1/2$ and get $U(0, 0|z) = 1 - \sqrt{1 - z^2}$. By binomial expansion,
$$G(0, 0|z) = \frac{1}{\sqrt{1 - z^2}} = (1 - z^2)^{-1/2} = \sum_{n=0}^{\infty} (-1)^n \binom{-1/2}{n} z^{2n}.$$
The coefficient of $z^{2n}$ is
$$p^{(2n)}(0, 0) = (-1)^n \binom{-1/2}{n} = \frac{1}{4^n} \binom{2n}{n}.$$
(d) The asymptotic evaluation has already been carried out in Exercise 4.30. We see that $\sum_n p^{(n)}(0, 0) = \infty$.
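Parts (b) and (c) agree, as a brute-force enumeration over all $\pm 1$ step sequences confirms (my check; exponential enumeration, so small $n$ only):

```python
import math
from itertools import product

def return_prob(n):
    # fraction of the 2^(2n) equally likely paths of length 2n ending at 0
    count = sum(1 for steps in product((-1, 1), repeat=2 * n) if sum(steps) == 0)
    return count / 4 ** n

ok = all(abs(return_prob(n) - math.comb(2 * n, n) / 4 ** n) < 1e-12
         for n in range(1, 8))
```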
Exercise 4.69. The irreducible class of 0 is the additive semigroup $S$ generated by $\operatorname{supp}(\mu)$. Since it is essential, $k \to 0$ for every $k \in S$. But this means that $-k \in S$, so that $S$ is a subgroup of $\mathbb{Z}$. Under the stated assumptions, $S \ne \{0\}$. But then there must be $k_0 \in \mathbb{N}$ such that $S = \{ k_0 n : n \in \mathbb{Z} \}$. In particular, $S$ must have finite index in $\mathbb{Z}$. The irreducible (whence essential) classes are the cosets of $S$ in $\mathbb{Z}$, so that there are only finitely many classes.
Exercises of Chapter 5
Exercise 5.14. We modify the Markov chain by restricting the state space to $\{0, 1, \dots, k+1+i\}$. We make the point $k+1+i$ absorbing, while all other transition probabilities within that set remain the same. Then the generating functions $F_i(k+1, k-1|z)$, $F_i(k, k-1|z)$ and $F_{i-1}(k+1, k|z)$ with respect to the original chain coincide with the functions $F(k+1, k-1|z)$, $F(k, k-1|z)$ and $F(k+1, k|z)$ of the modified chain. Since $k$ is a cut point between $k-1$ and $k+1$, the formula now follows from Proposition 1.43 (b), applied to the modified chain.
Exercise 5.20. In the null-recurrent case ($S = \infty$), we know that $\mathsf{E}_k(s^0) = \infty$. Let us therefore suppose that our birth-and-death chain is positive recurrent. Then $\mathsf{E}_0(t^0) = \sum_{j=0}^{\infty} m(j)$. We can then proceed precisely as in the computations that led to Proposition 5.8 to find that
$$\mathsf{E}_k(s^0) = \sum_{i=0}^{k-1} \sum_{j=i+1}^{\infty} \frac{p_{i+1} \cdots p_{j-1}}{q_{i+1} \cdots q_j}.$$
Exercise 5.31. We have for $n \ge 1$
$$\Pr[t > n] = 1 - \Pr[M_n = 0] = 1 - f_n(0).$$
Using the mean value theorem of differential calculus and monotonicity of $f'(z)$,
$$1 - f_n(0) = f(1) - f\bigl( f_{n-1}(0) \bigr) = \bigl( 1 - f_{n-1}(0) \bigr)\, f'(\xi) \le \bigl( 1 - f_{n-1}(0) \bigr)\, \bar\mu,$$
where $f_{n-1}(0) < \xi < 1$. Inductively,
$$1 - f_n(0) \le \bigl( 1 - \mu(0) \bigr)\, \bar\mu^{\,n-1}.$$
Therefore
$$\mathsf{E}(t) = \sum_{n=0}^{\infty} \Pr[t > n] \le 1 + \bigl( 1 - \mu(0) \bigr) \sum_{n=1}^{\infty} \bar\mu^{\,n-1} = 1 + \frac{1 - \mu(0)}{1 - \bar\mu}.$$
Exercise 5.35. With probability $\mu(\infty)$, the first generation is infinite, that is, $M_1 = \infty$. But then
$$\Pr[\, N_v < \infty \text{ for all } v \in T \mid M_1 = \infty \,] = 0:$$
at least one of the members of the first generation has infinitely many children. Repeating the argument, at least one of the latter must have infinitely many children, and so on (inductively). We conclude that, conditionally upon the event $[M_1 = \infty]$, all generations are infinite:
$$\Pr[\, M_n = \infty \text{ for all } n \ge 1 \mid M_1 = \infty \,] = 1.$$
Exercise 5.33. (a) If the ancestor has no offspring, $M_1 = 0$, then $|T| = 1$. Otherwise,
$$|T| = 1 + |T_1| + |T_2| + \dots + |T_{M_1}|,$$
where $T_j$ is the subtree of $T$ rooted at the $j$-th offspring of the ancestor. Since these trees are i.i.d., we get
$$\mathsf{E}\bigl( z^{|T|} \bigr) = \mu(0)\, z + \sum_{k=1}^{\infty} \Pr[M_1 = k]\; \mathsf{E}\bigl( z^{1 + |T_1| + \dots + |T_k|} \bigr) = \sum_{k=0}^{\infty} \mu(k)\, z\, g(z)^k.$$
(b) We have $f(z) = q + p z^2$. We get a quadratic equation for $g(z)$. Among the two solutions, the right one must be monotone increasing for $z > 0$ near 0. Therefore
$$g(z) = \frac{1}{2pz} \Bigl( 1 - \sqrt{1 - 4pq\, z^2} \Bigr).$$
It is now a straightforward task to use the binomial theorem to expand $g(z)$ into a power series. The series coefficients are the desired probabilities.

(c) In this case, $f(z) = q/(1 - pz)$, and
$$g(z) = \frac{1}{2p} \Bigl( 1 - \sqrt{1 - 4pq\, z} \Bigr).$$
The computation of the probabilities $\Pr[\, |T| = k \,]$ is almost the same as in (b) and also left to the reader.
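For part (b), the expansion can be cross-checked by iterating the fixed-point equation $g(z) = z\bigl( q + p\, g(z)^2 \bigr)$ on truncated power series; the coefficients come out as Catalan numbers times $p^n q^{n+1}$. This is my verification, with an arbitrary choice of $p$:

```python
import math

p, q, D = 0.6, 0.4, 15      # offspring: 2 children w.p. p, none w.p. q

def mul_trunc(a, b):
    # product of two polynomials, truncated at degree D
    c = [0.0] * (D + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= D:
                c[i + j] += ai * bj
    return c

g = [0.0] * (D + 1)
for _ in range(D + 1):
    g2 = mul_trunc(g, g)
    new = [0.0] * (D + 1)
    for i in range(D):
        # coefficient of z^{i+1} in z*(q + p*g^2)
        new[i + 1] = q * (1 if i == 0 else 0) + p * g2[i]
    g = new

catalan = lambda n: math.comb(2 * n, n) // (n + 1)
ok = all(abs(g[2 * n + 1] - catalan(n) * p ** n * q ** (n + 1)) < 1e-10
         for n in range(0, 7))
```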
Exercise 5.37. When $p > 1/2$, the drunkard's walk is transient. Therefore $Z_n \to \infty$ almost surely. We infer that
$$\Pr_0[\, t^0 = \infty,\; Z_n \to \infty \,] = \Pr_0[\, t^0 = \infty \,] = 1 - F(1, 0) > 0.$$
On the event $[\, t^0 = \infty,\; Z_n \to \infty \,]$, every edge is crossed. Therefore
$$\Pr[\, M_k > 0 \text{ for all } k \,] > 0.$$
If $(M_k)$ were a Galton–Watson process then it would be supercritical. But the average number of upcrossings of $[k+1, k+2]$ that take place after an upcrossing of $[k, k+1]$ and before another upcrossing of $[k, k+1]$ may occur (i.e., before the drunkard returns from state $k+1$ to state $k$) coincides with
$$\mathsf{E}_0(M_1) = \sum_{n=1}^{\infty} \Pr_0[\, Z_{n-1} = 1,\; Z_n = 2,\; n < t^0 \,]
= \sum_{n=1}^{\infty} \Pr_1[\, Z_{n-1} = 1,\; Z_n = 2,\; Z_i \ne 0\ (i < n) \,]$$
$$= \sum_{n=1}^{\infty} \frac{p}{q}\, \Pr_1[\, Z_{n-1} = 1,\; Z_n = 0,\; Z_i \ne 0\ (i < n) \,] = \frac{p}{q}\, F(1, 0) = 1.$$
But if the average offspring number is 1 (and the offspring distribution is non-degenerate, which must be the case here) then the Galton–Watson process dies out almost surely, a contradiction.
Exercise 5.41. (a) Following the suggestion, we decompose
$$M_n^y = \sum_{u \,:\, |u| = n} 1_{[\, u \in T,\; Z_u = y \,]}.$$
Therefore, using Lemma 5.40,
$$\mathsf{E}_x^{\mathrm{BMC}}(M_n^y) = \sum_{u \,:\, |u| = n} \Pr_x^{\mathrm{BMC}}[\, u \in T,\; Z_u = y \,]
= \sum_{j_1, \dots, j_n} p^{(n)}(x, y) \prod_{i=1}^{n} \mu\bigl( [j_i, \infty) \bigr) = p^{(n)}(x, y) \prod_{i=1}^{n} \sum_{j_i} \mu\bigl( [j_i, \infty) \bigr).$$
Since $\sum_{j} \mu\bigl( [j, \infty) \bigr) = \bar\mu$, the formula follows.

(b) By part (a), we have $\mathsf{E}_x^{\mathrm{BMC}}(M_k^y) > 0$. Therefore $\Pr_x^{\mathrm{BMC}}[\, M_k^y \ne 0 \,] > 0$.
Exercise 5.44. To conclude, we have to show that when there are $x, y$ such that $H^{\mathrm{BMC}}(x, y) = 1$ then $H^{\mathrm{BMC}}(x, y') = 1$ for every $y' \in X$. If $\Pr_x^{\mathrm{BMC}}[\, M^y = \infty \,] = 1$, then
$$\Pr_x^{\mathrm{BMC}}[\, M^{y'} < \infty \,] = \Pr_x^{\mathrm{BMC}}[\, M^y = \infty,\; M^{y'} < \infty \,],$$
which is $= 0$ by Lemma 5.43.
Exercise 5.47. Let $u(0) = \varnothing$, $u(i) = j_1 \cdots j_i$, $i = 1, \dots, n-1$, be the predecessors of $u(n) = u$ in the tree of all potential individuals, and let $\sigma$ be the subtree spanned by the $u(i)$. Let $y_1, \dots, y_{n-1} \in X$ and set $y_n = x$. Then we can consider the event $D(\sigma;\, a_u,\, u \in \sigma)$, where $a_{u(j)} = y_j$, and (5.38) becomes
$$\Pr_x^{\mathrm{BMC}}[\, u \in T,\; Z_u = x,\; Z_{u(i)} = y_i\ (i = 1, \dots, n-1) \,] = \prod_{i=1}^{n} \mu\bigl( [j_i, \infty) \bigr)\, p\bigl( y_{i-1}, y_i \bigr),$$
where $y_0 = x$. Now we can compute
$$\Pr_x^{\mathrm{BMC}}[\, W_x \,] = \sum_{y_1, \dots, y_{n-1} \in X \setminus \{x\}} \Pr_x^{\mathrm{BMC}}[\, u \in T,\; Z_{u(i)} = y_i\ (i = 1, \dots, n) \,]$$
$$= \prod_{i=1}^{n} \mu\bigl( [j_i, \infty) \bigr) \sum_{y_1, \dots, y_{n-1} \in X \setminus \{x\}} p(x, y_1)\, p(y_1, y_2) \cdots p(y_{n-1}, x).$$
The last sum is $u^{(n)}(x, x)$.
Exercise 5.51. It is definitely left to the reader to formulate this with complete rigour in terms of events in the underlying probability space and their probabilities.
Exercise 5.52. If $\mu(0) = 0$ then each of the elements $u_n = 1^n = 1 \cdots 1$ ($n \ge 0$ times) belongs to $T$ with probability 1. We know from (5.39.4) that $(Z_{u_n})_{n \ge 0}$ is a Markov chain on $X$ with transition matrix $P$. If $(X, P)$ is recurrent then this Markov chain returns to the starting point $x$ infinitely often with probability 1. That is, $H^{\mathrm{BMC}}(x, x) = 1$.
Exercises of Chapter 6
Exercise 6.10. The function $x \mapsto v_x(i)$ has the following properties: if $x \in C_i$ then $v_x(i) = \sum_y p(x, y)\, v_y(i) = 1$, since $y \in C_i$ when $p(x, y) > 0$. If $x \in C_j$ with $j \ne i$ then $v_x(i) = 0$. If $x$ lies in none of the essential classes, then we can decompose with respect to the first step and obtain also $v_x(i) = \sum_y p(x, y)\, v_y(i)$. Therefore $h(x) = \sum_i g(i)\, v_x(i)$ is harmonic, and has value $g(i)$ on $C_i$ for each $i$.

If $h'$ is another harmonic function with value $g(i)$ on $C_i$ for every $i$, then $g = h' - h$ is harmonic with value 0 on $X_{\mathrm{ess}}$. By a straightforward adaptation of the maximum principle (Lemma 6.5), $g$ must assume its maximum on $X_{\mathrm{ess}}$, and the same holds for $-g$. Therefore $g \equiv 0$.
Exercise 6.16. Variant 1. If the matrix $P$ is irreducible and strictly substochastic in some row, then we know that it cannot have 1 as an eigenvalue.

Variant 2. Let $h \in \mathcal{H}$. Let $x_0$ be such that $h(x_0) = \max h$. Then, repeating the argument from the proof of the maximum principle, $h(y) = h(x_0)$ for every $y$ with $p(x_0, y) > 0$. By irreducibility, $h \equiv c$ is constant. If now $w_0$ is such that $\sum_y p(w_0, y) < 1$ then $c = \sum_y p(w_0, y)\, c$, which is possible only when $c = 0$.
Exercise 6.18. If $I$ is finite then $|\inf_I h_i| \le \sum_{i \in I} |h_i|$ is $P$-integrable.

If $h_i(x) \ge -C$ for all $x \in X$ and all $i \in I$ then $-C \le \inf_I h_i \le h_{i_0}$ for any fixed $i_0 \in I$, so that $|\inf_I h_i| \le |C| + |h_{i_0}|$, a $P$-integrable upper bound.
Exercise 6.20. We have
$$P = \begin{pmatrix} 0 & 1 \\ 1/2 & 0 \end{pmatrix} \quad \text{and} \quad G = (I - P)^{-1} = \begin{pmatrix} 2 & 2 \\ 1 & 2 \end{pmatrix},$$
and $G(\cdot, 2) \equiv 2$.
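The $2 \times 2$ inversion is quickly confirmed in exact arithmetic (a sketch):

```python
from fractions import Fraction

half = Fraction(1, 2)
P = [[0, 1], [half, 0]]
I = [[1, 0], [0, 1]]

# A = I - P, then invert via the 2x2 adjugate formula
A = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
G = [[A[1][1] / det, -A[0][1] / det],
     [-A[1][0] / det, A[0][0] / det]]
```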
Exercise 6.40. This can be done by a straightforward adaptation of the proof of
Theorem 6.39 and is left entirely to the reader.
Exercise 6.22. (a) By Theorem 1.38 (d), $F(\cdot, y)$ is harmonic at every point except $y$. At $y$, we have
$$\sum_{w} p(y, w)\, F(w, y) = U(y, y) \le 1 = F(y, y).$$
(b) The "only if" is part of Lemma 6.17. Conversely, suppose that every non-negative, non-zero superharmonic function is strictly positive. For any $y \in X$, the function $F(\cdot, y)$ is superharmonic and has value 1 at $y$. The assumption yields that $F(x, y) > 0$ for all $x, y$, which is the same as irreducibility.

(c) Suppose that $u$ is superharmonic and that $x$ is such that $u(x) \le u(y)$ for all $y \in X$. Since $P$ is stochastic,
$$\sum_{y} p(x, y) \underbrace{\bigl( u(y) - u(x) \bigr)}_{\ge\, 0} \le 0.$$
Thus, $u(y) = u(x)$ whenever $x \xrightarrow{1} y$, and consequently whenever $x \to y$.

(d) Assume first that $(X, P)$ is irreducible. Let $u$ be superharmonic, and let $x_0$ be such that $u(x_0) = \min_X u(x)$. Since $P$ is stochastic, the function $u - u(x_0)$ is again superharmonic, non-negative, and assumes the value 0. By Lemma 6.17 (1), $u - u(x_0) \equiv 0$.

Conversely, assume that the minimum principle for superharmonic functions is valid. If $F(x, y) = 0$ for some $x, y$ then the superharmonic function $F(\cdot, y)$ assumes 0 as its minimum. But this function cannot be constant, since $F(y, y) = 1$. Therefore we cannot have $F(x, y) = 0$ for any $x, y$.
Exercise 6.24. (1) If $0 \le \nu P \le \nu$ then inductively
$$\nu P^n \le \nu P^{n-1} \le \dots \le \nu P \le \nu.$$
(2) For every $i \in I$, we have $\nu P \le \nu_i P \le \nu_i$. Thus, $\nu P \le \inf_I \nu_i = \nu$.

(3) This is immediate from (3.20).
Exercise 6.29. If $\Pr_x[\, t^A < \infty \,] = 1$ for every $x$, then the Markov chain must return to $A$ infinitely often with probability 1. Since $A$ is finite, there must be at least one $y \in A$ that is visited infinitely often:
$$1 = \Pr_x[\, \exists\, y \in A : Z_n = y \text{ for infinitely many } n \,] \le \sum_{y \in A} H(x, y).$$
Thus there must be $y \in A$ such that $H(x, y) = U(x, y)\, H(y, y) > 0$. We see that $H(y, y) > 0$, which is equivalent with recurrence of the state $y$ (Theorem 3.2).
Exercise 6.31. We can compute the mean vector of the law of this random walk: $\bar\mu = \begin{pmatrix} p_1 - p_3 \\ p_2 - p_4 \end{pmatrix}$, and appeal to Theorem 4.68.

Since that theorem was not proved here, we can also give a direct proof.

We first prove recurrence when $p_1 = p_3$ and $p_2 = p_4$. Then we have a symmetric Markov chain with $m(\mathbf{x}) = 1$ and resistances $1/p_1$ on the horizontal edges and $1/p_2$ on the vertical edges of the square grid. Let $N'$ be the resulting network, and let $N$ be the network with the same underlying graph but all resistances equal to $\max\{1/p_1, 1/p_2\}$. The Markov chain associated with $N$ is simple random walk on $\mathbb{Z}^2$, which we know to be recurrent from Example 4.63. By Exercise 4.54, also $N'$ is recurrent.

If we do not have $p_1 = p_3$ and $p_2 = p_4$, then let $c = (c_1, c_2) \in \mathbb{R}^2$ and define the function $f_c(\mathbf{x}) = e^{c_1 x_1 + c_2 x_2}$ for $\mathbf{x} = (x_1, x_2) \in \mathbb{Z}^2$. Then one verifies easily that
$$P f_c = \lambda_c\, f_c, \quad \text{where } \lambda_c = p_1 e^{c_1} + p_3 e^{-c_1} + p_2 e^{c_2} + p_4 e^{-c_2}.$$
Then also $P^n f_c = \lambda_c^n\, f_c$, so that $\rho(P) \le \lambda_c$ for every $c$. By elementary calculus,
$$\min\{ \lambda_c : c \in \mathbb{R}^2 \} = 2\bigl( \sqrt{p_1 p_3} + \sqrt{p_2 p_4} \bigr) < 1.$$
Therefore $\rho(P) < 1$, and we have transience.
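The calculus step is easy to check numerically: the minimum of $\lambda_c$ is attained at $c_1 = \frac12 \log(p_3/p_1)$, $c_2 = \frac12 \log(p_4/p_2)$, and equals $2(\sqrt{p_1 p_3} + \sqrt{p_2 p_4})$. The drifted probabilities below are an ad hoc example of mine:

```python
import math

p1, p2, p3, p4 = 0.4, 0.25, 0.2, 0.15     # sums to 1, with drift

def lam(c1, c2):
    return (p1 * math.exp(c1) + p3 * math.exp(-c1)
            + p2 * math.exp(c2) + p4 * math.exp(-c2))

c1 = 0.5 * math.log(p3 / p1)   # stationary point of p1*e^c + p3*e^{-c}
c2 = 0.5 * math.log(p4 / p2)
lam_min = lam(c1, c2)
target = 2 * (math.sqrt(p1 * p3) + math.sqrt(p2 * p4))
```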
Exercise 6.45. We know that $g = u - h \ge 0$ and $g \equiv 0$ on $X_{\mathrm{ess}}$. We can proceed as in the proof of the Riesz decomposition: set $f = g - Pg$, which is $\ge 0$. Then $\operatorname{supp}(f)$ is contained in the set of non-essential states, and
$$g - P^{n+1} g = \sum_{k=0}^{n} P^k f.$$
Now $p^{(n+1)}(x, y) \to 0$ as $n \to \infty$, whenever $y$ is non-essential. Therefore $P^{n+1} g \to 0$, and $g = Gf$. Uniqueness is immediate, as in the last line of the proof of Theorem 6.43.
Exercise 6.48. The notation $\check\Pi(\cdot, \cdot)$ will refer to sets of paths in $\Gamma(\check P)$, and $\check w(\cdot)$ to their weights with respect to $\check P$. Then it is clear that the mapping
$$\pi = [x_0, x_1, \dots, x_n] \mapsto \check\pi = [x_n, \dots, x_1, x_0]$$
is a bijection between $\{\, \pi \in \Pi(x, y) : \pi \text{ meets } A \text{ only in the terminal point} \,\}$ and $\{\, \check\pi \in \check\Pi(y, x) : \check\pi \text{ meets } A \text{ only in the initial point} \,\}$. It is also a bijection between $\{\, \pi \in \Pi(x, y) : \pi \text{ meets } A \text{ only in the initial point} \,\}$ and $\{\, \check\pi \in \check\Pi(y, x) : \check\pi \text{ meets } A \text{ only in the terminal point} \,\}$.

By a straightforward computation, $\check w(\check\pi) = \nu(x_0)\, w(\pi) / \nu(x_n)$ for any path $\pi = [x_0, x_1, \dots, x_n]$. Summing over all paths in the respective sets, the formulas follow.
Exercise 6.57. This is left entirely to the reader.
Exercise 6.58. We fix $w$ and $y$. Set $h = G(\cdot, y)$ and $f = \dfrac{G(w, y)}{G(w, w)}\, 1_w$. Then
$$Gf(x) = F(x, w)\, G(w, y).$$
We have $h(w) = Gf(w)$. By the domination principle, $h(x) \ge Gf(x)$ for all $x$.
Exercises of Chapter 7
Exercise 7.9. (1) We have $\sum_y p_h(x, y) = \frac{1}{h(x)}\, Ph(x)$, which is $= 1$ if and only if $Ph(x) = h(x)$.

(2) In the same way,
$$P_h \tilde u(x) = \sum_y \frac{p(x, y)\, h(y)}{h(x)} \cdot \frac{u(y)}{h(y)} = \frac{1}{h(x)}\, Pu(x),$$
where $\tilde u = u/h$, which is $\le \tilde u(x)$ if and only if $Pu(x) \le u(x)$.

(3) Suppose that $u$ is minimal harmonic with respect to $P$. We know from (2) that $h(o)\, \tilde u$ is harmonic with respect to $P_h$, where $\tilde u = u/h$ as above. Suppose that $h(o)\, \tilde u \ge \tilde h_1$, where $\tilde h_1 \in \mathcal{H}^+(X, P_h)$. By (2), $h_1 = h\, \tilde h_1 \in \mathcal{H}^+(X, P)$. On the other hand, $u \ge \frac{1}{h(o)}\, h_1$. Since $u$ is minimal, $u/h_1$ is constant. But then also $h(o)\, \tilde u / \tilde h_1$ is constant. Thus, $h(o)\, \tilde u$ is minimal harmonic with respect to $P_h$.

The converse is proved in the same way and is left entirely to the reader.
Exercise 7.12. We suppose to have two compactifications $\hat X$ and $\tilde X$ of $X$ and continuous surjections $\tau : \hat X \to \tilde X$ and $\sigma : \tilde X \to \hat X$ such that $\tau(x) = \sigma(x) = x$ for all $x \in X$.

Let $\xi \in \hat X$. There is a sequence $(x_n)$ in $X$ such that $x_n \to \xi$ in the topology of $\hat X$. By continuity of $\tau$, $x_n = \tau(x_n) \to \tau(\xi)$ in the topology of $\tilde X$. But then by continuity of $\sigma$, $x_n = \sigma(x_n) \to \sigma\bigl( \tau(\xi) \bigr)$ in the topology of $\hat X$. Therefore $\sigma\bigl( \tau(\xi) \bigr) = \xi$. Hence $\tau$ is injective, whence bijective, and $\sigma$ is the inverse mapping.
Exercise 7.16. We start with the second model, and will also see that it is equivalent with the first one.

First of all, $X$ is discrete because when $x \in X$ and $y \ne x$ then $\theta(x, y) \ge w_{1_x} |1_x(x) - 1_x(y)| = w_{1_x}$, so that the open ball with centre $x$ and radius $w_{1_x}$ consists only of $x$. Next, let $(x_n)$ be a Cauchy sequence in the metric $\theta$. Then $\bigl( f(x_n) \bigr)$ must be a Cauchy sequence in $\mathbb{R}$, and $\lim_n f(x_n)$ exists for every $f \in F^*$.

Conversely, suppose that $\bigl( f(x_n) \bigr)$ is a Cauchy sequence in $\mathbb{R}$ for every $f \in F^*$. Given $\varepsilon > 0$, there is a finite subset $F' \subset F^*$ such that $\sum_{f \in F^* \setminus F'} w_f\, C_f < \varepsilon/4$. Now let $N'$ be such that $|f(x_n) - f(x_m)| < C_f\, \varepsilon/(2S)$ for all $n, m \ge N'$ and all $f \in F'$, where $S = \sum_{f \in F^*} w_f\, C_f$. Then for such $n, m$,
$$\theta(x_n, x_m) \le \sum_{f \in F^* \setminus F'} w_f \bigl( |f(x_n)| + |f(x_m)| \bigr) + \sum_{f \in F'} w_f\, C_f\, \varepsilon/(2S) \le \varepsilon.$$
Thus, $(x_n)$ is a Cauchy sequence in the metric $\theta(\cdot, \cdot)$.

Let $(x_n)$ be such a sequence. If there is $x \in X$ such that $x_{n_k} = x$ for an infinite subsequence $(n_k)$, then $1_x(x_{n_k}) = 1$ for all $k$. Since $\bigl( 1_x(x_n) \bigr)$ converges, the limit is 1. We see that every Cauchy sequence $(x_n)$ is such that either $x_n = x$ for some $x \in X$ and all but finitely many $n$, or else $(x_n)$ tends to $\infty$ and $\bigl( f(x_n) \bigr)$ is a convergent sequence in $\mathbb{R}$ for every $f \in F$.

At this point, recall the general construction of the completion of the metric space $(X, \theta)$. The completion consists of all equivalence classes of Cauchy sequences, where $(x_n) \sim (y_n)$ if $\theta(x_n, y_n) \to 0$. In our case, this means that either $x_n = y_n = x$ for some $x \in X$ and all but finitely many $n$, or as above that both sequences tend to $\infty$ and $\lim f(x_n) = \lim f(y_n)$ for every $f \in F$. The embedding of $X$ in this space of equivalence classes is via the equivalence classes of constant sequences in $X$, and the extended metric is $\theta(\xi, \eta) = \lim_n \theta(x_n, y_n)$, where $(x_n)$ and $(y_n)$ are arbitrary representatives of the equivalence classes $\xi$ and $\eta$, respectively.

At this point, we see that the completion coincides with the space of the first model, including the notion of convergence in that model.

We now write $\hat X$ for this completion. We show compactness. Since $X$ is dense in $\hat X$, it is sufficient to show that every sequence $(x_n)$ in $X$ has a convergent subsequence. Now, the sequence $\bigl( f(x_n) \bigr)$ is bounded for every $f \in F^*$. Since $F^*$ is countable, we can use the well-known diagonal argument to extract a subsequence $(x_{n_k})$ such that $\bigl( f(x_{n_k}) \bigr)$ converges for every $f \in F^*$. We know that this is a Cauchy sequence, whence it converges to some element of $\hat X$.

By construction, every $f \in F$ extends to a continuous function on $\hat X$. Finally, if $\xi, \eta \in \hat X \setminus X$ are distinct, then let the sequences $(x_n)$ and $(y_n)$ be representatives of those two respective equivalence classes. They are not equivalent, so that there must be $f \in F$ such that $\lim_n f(x_n) \ne \lim_n f(y_n)$. That is, $f(\xi) \ne f(\eta)$. We conclude that $F$ separates the boundary points.

By Theorem 7.13, $\hat X$ is (equivalent with) $\hat X_F$.
Exercise 7.26. Let us refresh our knowledge about conditional expectation: we have the probability space $(\Omega, \mathcal{A}, \Pr)$, and a sub-$\sigma$-algebra $\mathcal{A}_{n-1}$, which in our case is the one generated by $(Y_0, \dots, Y_{n-1})$. We take a non-negative, real, $\mathcal{A}$-measurable random variable $W$, in our case $W = W_n$. We can define a measure $Q_W$ on $\mathcal{A}$ by $Q_W(A) = \int_A W \, d\Pr$. Both $\Pr$ and $Q_W$ are now considered as measures on $\mathcal{A}_{n-1}$, and $Q_W$ is absolutely continuous with respect to $\Pr$. Therefore, it has an $\mathcal{A}_{n-1}$-measurable Radon–Nikodym density, which is $\Pr$-almost surely unique. By definition, this is $\mathsf{E}(W \mid \mathcal{A}_{n-1})$. If $W$ itself is $\mathcal{A}_{n-1}$-measurable then it is itself that density. In general, $Q_W$ can also be defined on $\mathcal{A}_{n-1}$ via integration: for every non-negative $\mathcal{A}_{n-1}$-measurable function (random variable) $V$ on $\Omega$,
$$\int_\Omega V \, dQ_W = \int_\Omega V W \, d\Pr.$$
Thus, by the construction of the conditional expectation, the latter is characterized as the a.s. unique $\mathcal{A}_{n-1}$-measurable random variable $\mathsf{E}(W \mid \mathcal{A}_{n-1})$ that satisfies
$$\int_\Omega V\, \mathsf{E}(W \mid \mathcal{A}_{n-1}) \, d\Pr = \int_\Omega V W \, d\Pr,$$
that is,
$$\mathsf{E}\bigl( V\, \mathsf{E}(W \mid \mathcal{A}_{n-1}) \bigr) = \mathsf{E}(V W)$$
for every $V$ as above.

In our case, $\mathcal{A}_{n-1}$ is generated by the atoms $[\, Y_0 = x_0, \dots, Y_{n-1} = x_{n-1} \,]$, where $x_0, \dots, x_{n-1} \in X \cup \{\dagger\}$. Every $\mathcal{A}_{n-1}$-measurable function is constant on each of those atoms. In other words, every such function has the form $V = g(Y_0, \dots, Y_{n-1})$. We can reformulate: $\mathsf{E}(W \mid Y_0, \dots, Y_{n-1})$ is the a.s. unique $\mathcal{A}_{n-1}$-measurable random variable that satisfies
$$\mathsf{E}\bigl( g(Y_0, \dots, Y_{n-1})\, \mathsf{E}(W \mid \mathcal{A}_{n-1}) \bigr) = \mathsf{E}\bigl( g(Y_0, \dots, Y_{n-1})\, W \bigr)$$
for every function $g : (X \cup \{\dagger\})^n \to [0, \infty)$.

If we now set $W = W_n$, a function of $(Y_0, \dots, Y_n)$, then the supermartingale property $\mathsf{E}(W_n \mid Y_0, \dots, Y_{n-1}) \le W_{n-1}$ implies
$$\mathsf{E}\bigl( g(Y_0, \dots, Y_{n-1})\, W_n \bigr) = \mathsf{E}\bigl( g(Y_0, \dots, Y_{n-1})\, \mathsf{E}(W_n \mid \mathcal{A}_{n-1}) \bigr) \le \mathsf{E}\bigl( g(Y_0, \dots, Y_{n-1})\, W_{n-1} \bigr)$$
for every $g$ as above. Conversely, if the inequality between the second and the third term holds for every such $g$, then the measures $Q_{W_n}$ and $Q_{W_{n-1}}$ on $\mathcal{A}_{n-1}$ satisfy $Q_{W_n} \le Q_{W_{n-1}}$. But then their Radon–Nikodym derivatives with respect to $\Pr$ also satisfy
$$\mathsf{E}(W_n \mid Y_0, \dots, Y_{n-1}) = \frac{dQ_{W_n}}{d\Pr} \le \frac{dQ_{W_{n-1}}}{d\Pr} = \mathsf{E}(W_{n-1} \mid Y_0, \dots, Y_{n-1}) = W_{n-1}$$
almost surely. (The last identity holds because $W_{n-1}$ is $\mathcal{A}_{n-1}$-measurable.)

So now we see that (7.25) is equivalent with the supermartingale property. If we take $g = 1_{(x_0, \dots, x_{n-1})}$, where $(x_0, \dots, x_{n-1}) \in (X \cup \{\dagger\})^n$, then (7.25) specializes to (7.24). Conversely, every function $g : (X \cup \{\dagger\})^n \to [0, \infty)$ is a non-negative linear combination of such indicator functions. Therefore (7.24) implies (7.25).
Exercise 7.32. The function $G(\cdot, y)$ is superharmonic and bounded by $G(y, y)$. Thus, $\bigl( G(Z_n, y) \bigr)$ is a bounded, positive supermartingale, and must converge almost surely. Let $W$ be the limit random variable. By dominated convergence, $\mathsf{E}_x(W) = \lim_n \mathsf{E}_x\bigl( G(Z_n, y) \bigr)$. But
$$\mathsf{E}_x\bigl( G(Z_n, y) \bigr) = \sum_{w} p^{(n)}(x, w)\, G(w, y) = \sum_{k=n}^{\infty} p^{(k)}(x, y),$$
which tends to 0 as $n \to \infty$. Therefore $\mathsf{E}_x(W) = 0$, so that (being non-negative) $W = 0$ $\Pr_x$-almost surely.
Exercise 7.39. Let $k, l, r \in \mathbb{N}$ and consider the set
$$A_{k,l,r} = \Bigl\{ \omega = (x_n) \in X^{\mathbb{N}_0} : K(x, x_r) < c + \varepsilon - \tfrac1k - \tfrac1l \Bigr\}.$$
This set is a union of basic cylinder sets, since it depends only on the $r$-th projection $x_r$ of $\omega$. We invite the reader to check that
$$\bigcup_{k=1}^{\infty} \bigcup_{l=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{r=n}^{\infty} A_{k,l,r} = \Lambda_{\limsup} = \Bigl\{ \omega = (x_n) \in X^{\mathbb{N}_0} : \limsup_{n \to \infty} K(x, x_n) < c + \varepsilon \Bigr\}.$$
Therefore the latter set is in $\mathcal{A}$.

We leave it entirely to the reader to work out that analogously,
$$\Lambda_{\liminf} = \Bigl\{ \omega = (x_n) \in X^{\mathbb{N}_0} : \liminf_{n \to \infty} K(x, x_n) > c - \varepsilon \Bigr\} \in \mathcal{A}.$$
Then our set is $\Lambda_{\limsup} \cap \Lambda_{\liminf} \in \mathcal{A}$.
Exercise 7.49. By (7.8), the Green function of the $h$-process is
$$G_h(x, y) = G(x, y)\, h(y)/h(x).$$
Therefore (7.47) and (7.40) yield
$$\nu_h(v) = h(o)\, G_h(o, v) \Bigl( 1 - \sum_{w \in X} p_h(v, w) \Bigr) = G(o, v)\, h(v) \Bigl( 1 - \sum_{w \in X} p(v, w)\, \frac{h(w)}{h(v)} \Bigr) = G(o, v) \Bigl( h(v) - \sum_{w \in X} p(v, w)\, h(w) \Bigr),$$
as proposed. Setting $h = K(\cdot, y)$, we get for $v \in X$
$$\nu_h(v) = G(o, v) \Bigl( K(v, y) - \sum_{w \in X} p(v, w)\, K(w, y) \Bigr) = \frac{G(o, v)}{G(o, y)} \bigl( G(v, y) - PG(\cdot, y)(v) \bigr) = \delta_y(v).$$
Exercise 7.58. Let $(X, Q)$ be irreducible and recurrent. In particular, $Q$ is stochastic. Choose $w \in X$ and define a transition matrix $P$ by
$$p(x, y) = \begin{cases} q(x, y)/2, & \text{if } x = w,\\ q(x, y), & \text{if } x \ne w. \end{cases}$$
Thus, $p(w, \dagger) = 1/2$ and $p(x, \dagger) = 0$ when $x \ne w$. Then $F(x, w)$ is the same for $P$ and $Q$, because $F(x, w)$ does not depend on the outgoing probabilities at $w$; compare with Exercise 2.12. Thus $F(x, w) = 1$ for all $x$. For the chain extended to $X \cup \{\dagger\}$, the state $w$ is a cut point between any $x \in X \setminus \{w\}$ and $\dagger$. Via Theorem 1.38 and Proposition 1.43,
$$F(w, \dagger) = \frac12 + \sum_{x \in X} p(w, x)\, F(x, w)\, F(w, \dagger) = \frac12 + \frac12\, F(w, \dagger).$$
We see that $F(w, \dagger) = 1$, and for $x \in X \setminus \{w\}$, we also have $F(x, \dagger) = F(x, w)\, F(w, \dagger) = 1$. Thus, the Markov chain with transition matrix $P$ is absorbed by $\dagger$ almost surely for every starting point.
Exercise 7.66. (a) $\Longrightarrow$ (b). Let $\operatorname{supp}(\nu) = \{\xi\}$. Then $h_0(x) = \nu_x(\{\xi\}) = K(x, \xi)\, \nu(\{\xi\})$ is non-zero. If $h$ is a bounded harmonic function then there is a bounded measurable function $\varphi$ on $\mathcal{M}$ such that
$$h(x) = \int_{\mathcal{M}} K(x, \cdot)\, \varphi \, d\nu = K(x, \xi)\, \varphi(\xi)\, \nu(\{\xi\}) = \varphi(\xi)\, h_0(x).$$
(b) $\Longrightarrow$ (c). Let $h_1$ be a non-negative harmonic function such that $h_0 \ge h_1$. Then $h_1$ is bounded, and by the hypothesis, $h_1/h_0$ is constant. That is, $\frac{1}{h_0(o)}\, h_0$ is minimal.

(c) $\Longrightarrow$ (a). If $\frac{1}{h_0(o)}\, h_0$ is minimal then it must be a Martin kernel $K(\cdot, \xi)$. Thus
$$h_0(x) = h_0(o) \int_{\mathcal{M}_{\min}} K(x, \cdot) \, d\delta_\xi = \int_{\mathcal{M}_{\min}} K(x, \cdot) \, d\nu.$$
By the uniqueness of the integral representation, $\nu = h_0(o)\, \delta_\xi$.
Exercises of Chapter 8
Exercise 8.8. In the proof of Theorem 8.2, irreducibility is only used in the last 4 lines. Without any change, we have $h(k + l) = h(k)$ for every $k \in \mathbb{Z}^d$ and every $l \in \operatorname{supp}(\mu)$. But then also $h(k - l) = h(k)$. Suppose that $\operatorname{supp}(\mu)$ generates $\mathbb{Z}^d$ as a group, and let $k \in \mathbb{Z}^d$. Then we can find $n > 0$ and elements $l_1, \dots, l_n \in \operatorname{supp}(\mu)$ such that $k = \pm l_1 \pm \dots \pm l_n$. Again, we get $h(0) = h(k)$.

In part A of the proof of Theorem 8.7, irreducibility is not used up to the point where we obtained that $h_l = h(l)\, h$ for every $l \in \operatorname{supp}(\mu^{(n)})$ and each $n \ge 0$. That is, for such $l$ and every $k \in \mathbb{Z}^d$, $h(k + l) = h(k)\, h(l)$. But then also
$$h(k) = h(k - l + l) = h(k - l)\, h(l).$$
With $k = 0$, we find $h(-l) = 1/h(l)$, and then in general
$$h(k - l) = h(k)/h(l) = h(k)\, h(-l).$$
Now, as above, if $\operatorname{supp}(\mu)$ generates $\mathbb{Z}^d$ as a group, then every $l \in \mathbb{Z}^d$ has the form $l = l_1 - l_2$, where $l_i \in \operatorname{supp}(\mu^{(n_i)})$ for some $n_i \in \mathbb{N}_0$ ($i = 1, 2$). But then
$$h(k + l) = h(k + l_1 - l_2) = h(k)\, h(l_1)\, h(-l_2) = h(k)\, h(l),$$
and this is true for all $k, l \in \mathbb{Z}^d$.

In part B of the proof of Theorem 8.7, irreducibility is not used directly. We should go back a little bit and see whether irreducibility is needed for Corollary 7.11, or for Exercise 7.9 and Lemma 7.10 which lead to that corollary. There, the crucial point is that we are allowed to define the $h$-process for $h = \varphi_c$, since that function does not vanish at any point.
Exercise 8.11. Part A of the proof of Theorem 8.7 remains unchanged: every minimal harmonic function has the form $\varphi_c$ with $c \in C$. Pointwise convergence within that set of functions is the same as usual convergence in the set $C$. The topology on $\mathcal{M}_{\min}$ is the one of pointwise convergence. This yields that there is a subset $C'$ of $C$ such that the mapping $c \mapsto \varphi_c$ ($c \in C'$) is a homeomorphism from $C'$ to $\mathcal{M}_{\min}$. Therefore every positive harmonic function $h$ has a unique integral representation
$$h(k) = \int_{C'} e^{c \cdot k} \, d\nu(c) \quad \text{for all } k \in \mathbb{Z}^d.$$
Suppose that there is $c \ne 0$ in $\operatorname{supp}(\nu)$. Let $B$ be the open ball in $\mathbb{R}^d$ with centre $c$ and radius $|c|/2$. The cone $\{\, t\,\mathbf{x} : t \ge 0,\ \mathbf{x} \in B \,\}$ opens with the angle $\pi/3$ at its vertex 0. Therefore, if $k \in \mathbb{Z}^d \setminus \{0\}$ is in that cone, then
$$\mathbf{x} \cdot k \ge \cos(\pi/3)\, |\mathbf{x}|\, |k| \ge |k|\, |c|/4 \quad \text{for all } \mathbf{x} \in B.$$
For such $k$, we get $h(k) \ge e^{|k|\, |c|/4}\, \nu(B)$, which is unbounded. We see that when $h$ is a bounded harmonic function, then $\operatorname{supp}(\nu)$ contains no non-zero element. That is, $\nu$ is a multiple of the point mass at 0, and $h$ is constant. Theorem 8.2 follows.

As suggested, we now conclude our reasoning with part B of the proof of Theorem 8.7 without any change. This shows also that $C'$ cannot be a proper subset of $C$.
Exercise 8.13. Since the function $\varphi$ is convex, the set $\{ c \in \mathbb{R}^d : \varphi(c) \le 1 \}$ is convex. Since $\varphi$ is continuous, that set is closed, and by Lemma 8.12, it is bounded, whence compact. Its interior is $\{ c \in \mathbb{R}^d : \varphi(c) < 1 \}$, so that its topological boundary is $C$. As $\varphi$ is strictly convex, it has a unique minimum, which is the point where the gradient of $\varphi$ is 0. This leads to the proposed equation for the point where the minimum is attained.
Exercises of Chapter 9
Exercise 9.4. This is straightforward by the quotient rule.
Exercise 9.7. We first show this when $y \sim o$. Then $m_o(y) = p(o, y)/p(y, o) = 1/m_y(o)$. We have to distinguish two cases.

(1) If $x \in T_{o,y}$ then $y = x_1$ in the formula (9.6) for $m_o(x)$, and
$$m_o(x) = \frac{p(o, y)}{p(y, o)} \cdot \frac{p(y, x_2) \cdots p(x_{k-1}, x_k)}{p(x_2, y) \cdots p(x_k, x_{k-1})} = m_o(y)\, m_y(x).$$
(2) If $x \in T_{y,o}$ then we can exchange the roles of $o$ and $y$ in the first case and get
$$m_y(x) = m_y(o)\, m_o(x) = m_o(x)/m_o(y).$$
The rest of the proof is by induction on the distance $d(y, o)$ in $T$. We suppose the statement is true for $y^-$, that is, $m_o(x) = m_o(y^-)\, m_{y^-}(x)$ for all $x$. Applying the initial argument to $y^-$ in the place of $o$, we get $m_{y^-}(x) = m_{y^-}(y)\, m_y(x)$. Since $m_o(y^-)\, m_{y^-}(y) = m_o(y)$, the argument is complete.
Exercise 9.9. Among the finitely many cones $T_x$, where $x \sim o$, at least one must be infinite. Let this be $T_{x_1}$. We now proceed by induction. If we have already found a geodesic arc $[o, x_1, \dots, x_n]$ such that $T_{x_n}$ is infinite, then among the finitely many $T_y$ with $y^- = x_n$, at least one must be infinite. Let this be $T_{x_{n+1}}$.

In this way, we get a ray $[o, x_1, x_2, \dots]$.
Exercise 9.11. Reflexivity and symmetry of the relation are clear. For transitivity, let $\pi = [x_0, x_1, \dots]$, $\pi' = [y_0, y_1, \dots]$ and $\pi'' = [z_0, z_1, \dots]$ be rays such that $\pi$ and $\pi'$ as well as $\pi'$ and $\pi''$ are equivalent. Then there are $i, j$ and $k, l$ such that $x_{i+n} = y_{j+n}$ and $y_{k+n} = z_{l+n}$ for all $n$. Then $x_{(i+k)+n} = y_{(j+k)+n} = z_{(j+l)+n}$ for all $n$, so that $\pi$ and $\pi''$ are also equivalent.

For the second statement, let $\pi = [y_0, y_1, \dots]$ be a geodesic ray that represents the end $\xi$. For $x \in X$, consider $\pi(x, y_0) = [x = x_0, x_1, \dots, x_m = y_0]$. Let $j$ be minimal in $\{0, \dots, m\}$ such that $x_j \in \pi$. That is, $x_j = y_k$ for some $k$. Then
$$\pi' = [x = x_0, x_1, \dots, x_j = y_k, y_{k+1}, \dots]$$
is a geodesic ray equivalent with $\pi$ that starts at $x$. Uniqueness follows from the fact that a tree has no cycles.

For two ends $\xi \ne \eta$, let $\pi = \pi(o, \xi) = [o = w_0, w_1, \dots]$ and $\pi' = \pi(o, \eta) = [o = y_0, y_1, \dots]$. These rays are not equivalent. Let $k$ be minimal such that $w_{k+1} \ne y_{k+1}$. Then $k \ge 0$, and the rays $[w_{k+1}, w_{k+2}, \dots]$ and $[y_{k+1}, y_{k+2}, \dots]$ must be disjoint, since otherwise there would be a cycle in $T$. We can set $x_0 = y_k = w_k$, $x_n = y_{k+n}$ and $x_{-n} = w_{k+n}$ for $n > 0$. Then $[\dots, x_{-2}, x_{-1}, x_0, x_1, x_2, \dots]$ is a geodesic with the required properties. Uniqueness follows again from the fact that a tree has no cycles.
Exercise 9.15. The first case (convergence to a vertex $x$) is clear, since the topology is discrete on $T$.

By definition, $w_n \to \xi \in \partial T$ if and only if for every $y \in \pi(o, \xi)$, there is $n_y$ such that $w_n \in T_y$ for all $n \ge n_y$. Now, if $w_n \in T_y$ then $\pi(o, y)$ is part of $\pi(o, w_n)$ as well as of $\pi(o, \xi)$. That is, $w_n \wedge \xi \in T_y$, so that $|w_n \wedge \xi| \ge |y|$ for all $n \ge n_y$. Therefore $|w_n \wedge \xi| \to \infty$. Conversely, if $|w_n \wedge \xi| \to \infty$ then for each $y \in \pi(o, \xi)$ there is $n_y$ such that $|w_n \wedge \xi| \ge |y|$ for all $n \ge n_y$, and in this case, we must have $w_n \in T_y$.

Again by definition of the topology, $w_n \to y^*$ if and only if for every finite set $A$ of neighbours of $y$, there is $n_A$ such that $w_n \notin T_{x,y}$ for all $x \in A$ and all $n \ge n_A$. But for $x \sim y$, one has $w_n \in T_{x,y}$ if and only if $x \in \pi(y, w_n)$.

Finally, since $x_n^* \in \hat T_{u,v}$ if and only if $x_n \in T_{u,v}$, and since all types of convergence are based on inclusions of the latter type, the statement about convergence of $(x_n^*)$ follows.

We now prove compactness. Since the topology has a countable base, we just show sequential compactness. By construction, $T$ is dense in $\hat T$. Therefore it is enough to show that every sequence $(x_n)$ of elements of $T$ has a subsequence that converges in $\hat T$. If there is $x \in T$ such that $x_n = x$ for infinitely many $n$, then we have such a subsequence. So we may assume (passing to a subsequence, if necessary) that all $x_n$ are distinct. We use an elementary inductive procedure, similar to Exercise 9.9.

If for every $y \sim o$, the cone $T_y$ contains only finitely many $x_n$, then $x_n \to o^*$, and we are done. Otherwise, there is $y_1 \sim o$ such that $T_{y_1}$ contains infinitely many $x_n$, and we can pass to the next step.

If for every $y$ with $y^- = y_1$, the cone $T_y$ contains only finitely many $x_n$, then $x_{n_k} \to y_1^*$ for the subsequence of those $x_n$ that are in $T_{y_1}$, and we are done. Otherwise, there is $y_2$ with $y_2^- = y_1$ such that $T_{y_2}$ contains infinitely many $x_n$, and we pass to the third step.

Inductively, we either find $y_k$ and a subsequence of $(x_n)$ that converges to $y_k^*$, or else we find a sequence $o \sim y_1, y_2, y_3, \dots$ with $y_{k+1}^- = y_k$ for all $k$, such that each $T_{y_k}$ contains infinitely many $x_n$. The ray $[o, y_1, y_2, \dots]$ represents an end of $T$, and it is immediate that $(x_n)$ has a subsequence that converges to that end.
Exercise 9.17. If ϕ_1, ϕ_2 ∈ L and λ_1, λ_2 ∈ R then

  {[x,y] ∈ E(T) : λ_1 ϕ_1(x) + λ_2 ϕ_2(x) ≠ λ_1 ϕ_1(y) + λ_2 ϕ_2(y)}
    ⊆ {[x,y] ∈ E(T) : ϕ_1(x) ≠ ϕ_1(y)} ∪ {[x,y] ∈ E(T) : ϕ_2(x) ≠ ϕ_2(y)},

which is finite.

Next, let ϕ ∈ L. In order to prove that ϕ is in the linear span of L_0, we proceed by induction on the cardinality n of the set {e ∈ E(T) : ϕ(e^+) ≠ ϕ(e^−)}, which has to be even in our setting, since we have oriented edges. If n = 0 then ϕ ≡ c, and we can write ϕ = c·1_{T̂_{x,y}} + c·1_{T̂_{y,x}}.

Now suppose the statement is true for n − 2. There must be an edge [x,y] ∈ {e ∈ E(T) : ϕ(e^+) ≠ ϕ(e^−)} such that ϕ(u) = ϕ(v) for all [u,v] ∈ E(T_{x,y}). That is, ϕ ≡ c is constant on T̂_{x,y}. Let g = ϕ + (ϕ(x) − c)·1_{T̂_{x,y}}. Then g(x) = g(y), and the number of edges along which g differs is n − 2. By the induction hypothesis, g is a linear combination of functions 1_{T̂_{u,v}}, where u ∼ v. Therefore ϕ also has this property.
Exercise 9.26. We use the first formula of Proposition 9.3 (b). If F(y,x) = 1 then

  p(y,x) + Σ_{w ≠ x : w ∼ y} p(y,w) F(w,y) = 1.

Since Σ_{w : w ∼ y} p(y,w) = 1, this yields F(w,y) = 1 for all w ≠ x with w ∼ y. The reader can now proceed by induction on the distance from y in an obvious manner.

330 Solutions of all exercises

Now let ξ be a transient end, and let x be a new base point. Write π(x, ξ) = [x = x_0, x_1, x_2, ...]. Then x ∧ ξ = x_k for some k, so that [x_k, x_{k+1}, ...] ⊂ π(o, ξ) and x_{n+1}^− = x_n for all n ≥ k. Therefore F(x_{n+1}, x_n) < 1 for all n ≥ k. The first part of the exercise implies that this holds for all n ≥ 0, which shows that ξ is also a transient end with respect to the base point x.
Exercise 9.31. Given x, let w = x ∧ ξ. This is a point on π(o, ξ) as well as on π(o, x). If y ∈ T_w then d(x,y) = d(x,w) + d(w,y) and d(o,y) = d(o,w) + d(w,y), and x ∧ y = w. We see that hor(x, ξ) = hor(x, y) = d(x,y) − d(o,y) is constant for all y ∈ T_w, when x is given.
Exercise 9.39. If x ∼ o then

  h(x) = K(x,o) ν(∂T) + (K(x,x) − K(x,o)) ν(∂T_x)
       = F(x,o) ν(∂T) + ((1 − F(o,x)F(x,o)) / F(o,x)) ν(∂T_x)
       = F(x,o) ν(∂T) + ((1 − U(o,o)) / p(o,x)) ν(∂T_x)

by (9.36). Therefore

  Σ_{x∼o} p(o,x) h(x) = (Σ_{x∼o} p(o,x) F(x,o)) ν(∂T) + (1 − U(o,o)) Σ_{x∼o} ν(∂T_x)
                      = U(o,o) ν(∂T) + (1 − U(o,o)) ν(∂T) = ν(∂T) = h(o),

as proposed.
Exercise (Corollary) 9.44. If the Green kernel vanishes at infinity then it vanishes at every boundary point, and the Dirichlet problem at infinity admits a solution. Conversely, if lim_{x→ξ} G(x,o) = 0 for every ξ ∈ ∂*T, then the function g on T̂ is continuous, where g(x) = G(x,o) for x ∈ T and g(ξ) = 0 for every ξ ∈ ∂*T. Given ε > 0, every ξ ∈ ∂*T has an open neighbourhood V_ξ in T̂ on which g < ε. Then V = ⋃_{ξ ∈ ∂*T} V_ξ is open and contains ∂*T. Thus, the complement T̂ \ V is compact and contains no boundary point, so it is a finite set of vertices, outside of which g < ε.
Exercise 9.52. For k ∈ N and any end ξ ∈ ∂T, let ϕ_k(ξ) = h(ξ_k), where ξ_k denotes the k-th vertex on the ray π(o, ξ). This is a continuous function on ∂T. It is a standard fact that the set of all points where a sequence of continuous functions converges to a given Borel measurable function is a Borel set.
Exercise 9.54.

  Pr_x[Z_n ∈ T_x \ {x} for all n ≥ 1] = Σ_{y : y^− = x} p(x,y) (1 − F(y,x))
    = 1 − p(x, x^−) − Σ_{y : y^− = x} p(x,y) F(y,x)
    = (p(x, x^−) − p(x, x^−) F(x, x^−)) / F(x, x^−)

by Proposition 9.3 (b), and the formula follows.
Exercise 9.58. We know that on T_q, the function F(y,x|z) is the same for all pairs of neighbours x, y. It coincides with the function F(1,0|z) of the factor chain (|Z_n|)_{n≥0} on N_0, which is the infinite drunkard's walk with reflecting barrier at 0 and forward transition probability p = q/(q+1). Therefore

  F(y,x|z) = ((q+1)/(2qz)) (1 − √(1 − ρ² z²)),  where ρ = 2√q/(q+1).

Using binomial expansion, we get

  F(y,x|z) = (1/√q) Σ_{n=1}^∞ (−1)^{n−1} binom(1/2, n) (ρz)^{2n−1}.

Therefore

  f^{(2n−1)}(y,x) = ((ρ/2)^{2n−1} / (n√q)) binom(2n−2, n−1).

With these values,

  Pr_o[W_{k+1} = y, e_{k+1} = m + 2n − 1 | W_k = x, e_k = m] = f^{(2n−1)}(y,x).
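The coefficient formula above lends itself to a numerical cross-check (my own, not part of the book): iterating the first-passage equation F(z) = z/(q+1) + (q/(q+1)) z F(z)² for the nearest-neighbour step on T_q as a truncated power series must reproduce f^{(2n−1)}(y,x) = q^{n−1} binom(2n−2, n−1) / (n (q+1)^{2n−1}), which is the closed form above with (ρ/2)^{2n−1}/√q simplified. The choice q = 3 and the truncation order are arbitrary.

```python
from fractions import Fraction
from math import comb

q = 3        # arbitrary branching parameter (the tree T_q has degree q + 1)
N = 12       # truncation order of the power series

def mul(a, b):
    """Truncated product of two power series given as coefficient lists."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

# iterate F(z) = z/(q+1) + (q/(q+1)) * z * F(z)^2; after N rounds the
# coefficients up to order N-1 are exact
F = [Fraction(0)] * N
for _ in range(N):
    F2 = mul(F, F)
    G = [Fraction(0)] * N
    G[1] = Fraction(1, q + 1)
    for i in range(N - 1):
        G[i + 1] += Fraction(q, q + 1) * F2[i]
    F = G

# closed form for the odd coefficients f^{(2n-1)}(y, x)
closed = {2 * n - 1: Fraction(q ** (n - 1) * comb(2 * n - 2, n - 1),
                              n * (q + 1) ** (2 * n - 1))
          for n in range(1, 6)}
```

Exact rational arithmetic avoids any rounding issues in the comparison.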
Exercise 9.62. Let ξ be an end that satisfies the criterion of Corollary 9.61. Let π(o, ξ) = [o = x_0, x_1, x_2, ...]. Suppose that ξ is recurrent. Then there is k such that F(x_{n+1}, x_n) = 1 for all n ≥ k. Since all involved properties are independent of the base point, we may suppose without loss of generality that k = 0. Write x_1 = x. Then the random walk on the branch T_{[o,x]} with transition matrix P_{[o,x]} is recurrent. By reversibility, we can view that branch as a recurrent network in the sense of Section 4.D. It contains the ray π(o, ξ) as an induced subnetwork, recurrent by Exercise 4.54. The latter inherits the conductances a(x_{i−1}, x_i) of its edges from the branch, and the transition probabilities on the ray satisfy

  p_ray(x_i, x_{i−1}) / p_ray(x_i, x_{i+1}) = a(x_i, x_{i−1}) / a(x_i, x_{i+1}) = p(x_i, x_{i−1}) / p(x_i, x_{i+1}).

But since

  Σ_{k=1}^∞ Π_{i=1}^k p_ray(x_i, x_{i−1}) / p_ray(x_i, x_{i+1}) < ∞,

this birth-and-death chain is transient by Theorem 5.9 (i), a contradiction.
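The summability condition invoked here (Theorem 5.9 (i)) can be illustrated with a quick numerical sketch of mine. The chain below is hypothetical: a ray on which every ratio p_ray(x_i, x_{i−1})/p_ray(x_i, x_{i+1}) equals a constant r < 1, so the double sum collapses to a geometric series with value r/(1 − r).

```python
r = 0.25          # constant backward/forward ratio along the ray (hypothetical)
total = 0.0
prod = 1.0
for k in range(1, 200):       # partial sums of sum_k prod_{i<=k} r
    prod *= r
    total += prod
geometric_limit = r / (1 - r)
```

Bounded partial sums are exactly what Theorem 5.9 (i) needs to classify the birth-and-death chain on the ray as transient, which is the contradiction derived in the solution.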
Exercise 9.70. We can consider the branch T_{[y,u]} and the associated random walk P_{[y,u]}. We know that we can replace g = g_o with g_u in the assumption of the exercise. With u in the place of the root, we can apply Theorem 9.69 (a): finiteness of g_u on ∂T_{y,u} implies that P_{[y,u]} is transient. Therefore F(y, u) < 1, so that also P on T is transient.

Furthermore, we can apply the same argument to any sub-branch T_{[x,y]} of T_{[y,u]}, where x is closer to u than y, and get F(y, x) < 1. Thus, if ξ ∈ ∂T_{y,u} and π(u, ξ) = [u = x_0, x_1, x_2, ...] then F(x_n, x_{n−1}) < 1 for all n ≥ 1: the end ξ is transient.
Exercise 9.73. We can order the p_i such that p_{i+1} ≤ p_i for all i. Then also p_i/(1 − p_i) ≤ p_j/(1 − p_j) whenever i ≥ j. In particular, we have

  λ = (p_1/(1 − p_1)) (p_2/(1 − p_2)),  where λ = max{ (p_i/(1 − p_i)) (p_j/(1 − p_j)) : i, j, i ≠ j }.

Now the inequality p_1 + p_2 < 1 readily implies that λ < 1. If x ∈ T \ {o} and π(o, x) = [o = x_0, x_1, ..., x_n] then for i = 1, ..., n − 1,

  (p(x_i, x_{i−1}) / (1 − p(x_i, x_{i−1}))) · (p(x_{i+1}, x_i) / (1 − p(x_{i+1}, x_i))) ≤ λ,

so that

  Π_{i=1}^k p(x_i, x_{i−1}) / (1 − p(x_i, x_{i−1})) ≤ λ^{k/2}, if k is even,
                                                   ≤ (p_1/(1 − p_1)) λ^{(k−1)/2}, if k is odd.

It follows that g is bounded by M = 1 / ((1 − p_1)(1 − √λ)).
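A small numeric sanity check of the bound (again mine, not in the book): with hypothetical values p_1 ≥ p_2 ≥ p_3 satisfying p_1 + p_2 < 1, the worst case for the product bound alternates the two largest ratios, and the resulting series stays below M = 1/((1 − p_1)(1 − √λ)).

```python
from math import sqrt

ps = sorted([0.45, 0.35, 0.10], reverse=True)   # hypothetical p_i with p_1 + p_2 < 1
lam = (ps[0] / (1 - ps[0])) * (ps[1] / (1 - ps[1]))
M = 1.0 / ((1 - ps[0]) * (1 - sqrt(lam)))

# worst case: alternate the two largest ratios p_1/(1-p_1), p_2/(1-p_2)
g = 1.0
prod = 1.0
for k in range(1, 400):
    p = ps[(k + 1) % 2]          # k = 1 uses p_1, k = 2 uses p_2, ...
    prod *= p / (1 - p)
    g += prod
```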
Exercise 9.80. This computation of the largest eigenvalue of 2 × 2 matrices is left entirely to the reader.

Exercise 9.82. Variant 1. In Example 9.47, we have s = q + 1 and p_i = 1/(q+1). The function ϕ(t) becomes

  ϕ(t) = ½ (√((q+1)² + 4t²) − (q−1)),

and the unique positive solution of the equation ϕ′(t) = ϕ(t)/t is easily computed:

  ρ(P) = 2√q / (q+1).
Variant 2. Z̃_n = |Z_n| is the infinite drunkard's walk on N_0 (reflecting at state 0) with forward probability p = q/(q+1), where q + 1 is the vertex degree. In this example, G(o,o|z) = G̃(0,0|z), because in our factor chain, the class corresponding to the state 0 has only the vertex o in its preimage under the natural projection. But G̃(0,0|z) is the function G_N(0,0|z) computed in Example 5.23. (Attention: the q of that example is 1 − p.) That is,

  G(o,o|z) = 2q / (q − 1 + √((q+1)² − 4q z²)).

The smallest positive singularity of this function is r(P) = (q+1)/(2√q), and ρ(P) = 1/r(P).
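The two variants can be checked against each other numerically. This sketch is my own; it assumes the form of ϕ reconstructed above and an arbitrary q, minimizes ϕ(t)/t by a crude grid scan (the tangency condition ϕ′(t) = ϕ(t)/t is the critical-point equation of that quotient), and compares the result with ρ(P) = 2√q/(q+1) = 1/r(P).

```python
from math import sqrt

q = 3

def phi(t):
    return 0.5 * (sqrt((q + 1) ** 2 + 4 * t * t) - (q - 1))

# variant 1: rho(P) as the minimum of phi(t)/t over t > 0 (grid scan up to t = 20)
rho_scan = min(phi(0.001 * k) / (0.001 * k) for k in range(1, 20000))

# variant 2: reciprocal of the smallest positive singularity of G(o,o|z)
r_singularity = (q + 1) / (2 * sqrt(q))
rho_closed = 2 * sqrt(q) / (q + 1)
```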
Exercise 9.84. Let C_n = {x ∈ T : |x| = n}, n ≥ 0, be the classes corresponding to the projection x ↦ |x|. Then C_0 = {o} and p̄(0,1) = p(o, C_1) = 1. For n ≥ 1 and x ∈ C_n, we have

  p̄(n, n−1) = p(x, x^−) = 1 − α

and

  p̄(n, n+1) = p(x, C_{n+1}) = 1 − p(x, x^−) = α.

These numbers are independent of the specific choice of x ∈ C_n, so that we have indeed a factor chain. The latter is the infinite drunkard's walk on N_0 (reflecting at state 0) with forward probability p = α. The relevant computations can be found in Example 5.23.
Exercise 9.88. (1) We prove the following by induction on n.

• For all k_1, ..., k_n ∈ N and x ∈ X,

  Pr_x[t_1 = k_1, t_2 = k_1 + k_2, ..., t_n = k_1 + ··· + k_n] = f^{(k_1)}(0,N) f^{(k_2)}(0,N) ··· f^{(k_n)}(0,N),

where f^{(k_i)}(0,N) refers to SRW on the integer interval {0, ..., N}. This implies immediately that the increments of the stopping times are i.i.d.

For n = 1, we have already proved that formula. Suppose that it is true for n − 1. We have, using the strong Markov property,

  Pr_x[t_1 = k_1, t_2 = k_1 + k_2, ..., t_n = k_1 + ··· + k_n]
    = Σ_{y ∈ X} Pr_x[t_1 = k_1, Z_{k_1} = y] · Pr_x[t_2 = k_1 + k_2, ..., t_n = k_1 + ··· + k_n | t_1 = k_1, Z_{k_1} = y]
  (*)
    = Σ_{y ∈ X} Pr_x[t_1 = k_1, Z_{k_1} = y] · Pr_y[t_1 = k_2, t_2 = k_2 + k_3, ..., t_{n−1} = k_2 + ··· + k_n]
    = Σ_{y ∈ X} Pr_x[t_1 = k_1, Z_{k_1} = y] · f^{(k_2)}(0,N) ··· f^{(k_n)}(0,N)
    = Pr_x[t_1 = k_1] · f^{(k_2)}(0,N) ··· f^{(k_n)}(0,N)
    = f^{(k_1)}(0,N) f^{(k_2)}(0,N) ··· f^{(k_n)}(0,N).

We remark that (*) holds since t_k is the stopping time of the (k−1)-st visit in a point of X distinct from the previous one after the time t_1.

If the subdivision is arbitrary, then the distribution of t_1 depends on the starting point in X. Therefore the increments cannot be identically distributed. They also cannot be independent, since in this situation, the distribution of t_2 − t_1 depends on the point Z_{t_1}.
(2) We know that the statement is true for k = 1. Suppose it holds for k − 1. Then, again by the strong Markov property,

  Pr_x[t_k = n, Z_n = y] = Σ_{m=0}^{n} Σ_{v ∈ X} Pr_x[t_1 = m, Z_m = v, t_k = n, Z_n = y]
    = Σ_{m=0}^{n} Σ_{v ∈ X} Pr_x[t_1 = m, Z_m = v] · Pr_v[t_{k−1} = n − m, Z_{n−m} = y].

(Note that in reality, we cannot have m = 0 or n = 0; the associated probabilities are 0.) We deduce, using the product formula for power series,

  Σ_{n=0}^∞ Pr_x[t_k = n, Z_n = y] z^n
    = Σ_{v ∈ X} Σ_{n=0}^∞ Σ_{m=0}^{n} Pr_x[t_1 = m, Z_m = v] Pr_v[t_{k−1} = n − m, Z_{n−m} = y] z^n
    = Σ_{v ∈ X} (Σ_{m=0}^∞ Pr_x[t_1 = m, Z_m = v] z^m) (Σ_{n=0}^∞ Pr_v[t_{k−1} = n, Z_n = y] z^n)
    = Σ_{v ∈ X} (p(x,v) ϕ(z)) (p^{(k−1)}(v,y) ϕ(z)^{k−1}),

which yields the stated formula.
Exercise 9.91. We notice that the reversing measure m̃ of SRW on X̃ satisfies m̃|_X = m. The resistance of every edge is 1. Therefore

  (f̃, f̃)_{X̃} = Σ_{x ∈ X̃} f̃(x)² m̃(x) ≥ Σ_{x ∈ X} f(x)² m(x) = (f, f)_X.

Along any edge of the inserted path of the subdivision that replaces an original edge [x,y] of X, the difference of f̃ is 0, with precisely one exception, where the difference is f(y) − f(x). Thus, the contribution of that inserted path to D_{X̃}(f̃) is (f(y) − f(x))². Therefore D_{X̃}(f̃) = D_X(f).
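The bookkeeping in this argument can be mirrored on a toy example (my own, not from the book): subdivide each edge of a small path by one midpoint, extend f to a function f̃ that is constant on each inserted path except for a single jump, and compare the Dirichlet sums.

```python
edges = [(0, 1), (1, 2)]                 # a path with unit conductances (toy example)
f = {0: 0.0, 1: 2.0, 2: 3.0}
D = sum((f[u] - f[v]) ** 2 for u, v in edges)

# subdivision: one midpoint per edge; the extension carries the value f(u),
# so the jump f(v) - f(u) happens along exactly one inserted edge
sub_edges = []
f_tilde = dict(f)
for i, (u, v) in enumerate(edges):
    m = ('mid', i)
    f_tilde[m] = f[u]
    sub_edges += [(u, m), (m, v)]
D_sub = sum((f_tilde[u] - f_tilde[v]) ** 2 for u, v in sub_edges)
```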
Exercise 9.93. Conditions (i) and (ii) imply that

  D_P(f) ≥ ε D_T(f)  and  (f, f)_P ≤ M (f, f)_T  for every f ∈ ℓ_0(X).

Here, the index P obviously refers to the reversible Markov chain with transition matrix P and associated reversing measure m, while the index T refers to SRW. We have already seen that (f, f)_P − (Pf, f)_P = D_P(f). Therefore, since

  ρ(P) = sup{ (Pf, f)_P / (f, f)_P : f ∈ ℓ_0(X), f ≠ 0 },

we get

  1 − ρ(P) = inf{ D_P(f) / (f, f)_P : f ∈ ℓ_0(X), f ≠ 0 }
           ≥ (ε/M) inf{ D_T(f) / (f, f)_T : f ∈ ℓ_0(X), f ≠ 0 } = (ε/M)(1 − ρ(T)),

as proposed.
Exercise 9.96. We use the cone types. There are two of them. Type 1 corresponds to T_x, where x ∈ π(o, ξ), x ≠ o; that is, ∂T_x contains ξ. Type 2 corresponds to any T_y that looks downwards in Figure 37, so that T_y is a q-ary tree rooted at y, and ξ ∉ ∂T_y.

If x has type 1, then T_x contains precisely one neighbour of x with the same type, namely the next vertex on π(o, ξ), for which the transition probability from x is 1 − α. Also, T_x contains q − 1 neighbours y of x with type 2, and p(x,y) = α/q for each of them.

If y has type 2, then all of its q neighbours w ∈ T_y also have type 2, and p(y,w) = α/q for each of them.

Therefore

  A = ( q(1−α)/α    (q−1)α/(q(1−α)) )
      (     0            α/(1−α)    ).

The two eigenvalues are the elements in the principal diagonal of A, and at least one of them is > 1. This yields transience.
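Since the matrix A is upper triangular, its eigenvalues are the diagonal entries q(1−α)/α and α/(1−α). A quick scan (my own check, using these reconstructed diagonal entries) confirms that for every α ∈ (0,1) and already for q = 2, at least one eigenvalue exceeds 1.

```python
q = 2   # smallest relevant branching number; larger q only increases the first entry
worst = min(max(q * (1 - a) / a, a / (1 - a))
            for a in (i / 1000 for i in range(1, 1000)))
```

The minimum of the larger eigenvalue over the grid is attained near the crossing point of the two curves and stays strictly above 1.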
Exercise 9.97. The functions F_−(z) = F(x, x^~|z) and F_+(z) = F(x^~, x|z), where x^~ is the neighbour of x on π(x, ξ), are independent of x, compare with Exercise 2.12. Proposition 9.3 leads to the two quadratic equations

  F_−(z) = (1−α) z + α z F_−(z)²

and

  F_+(z) = (α/q) z + (q−1)(α/q) z F_−(z) F_+(z) + (1−α) z F_+(z)².

Since F_−(0) = 0, the right one of the two solutions of the quadratic equation for F_−(z) is

  F_−(z) = (1 − √(1 − 4α(1−α) z²)) / (2αz).

One next has to solve the quadratic equation for F_+(z). By the same argument, the right solution is the one with the minus sign in front of the square root. We let F_− = F_−(1) and F_+ = F_+(1). In the end, we find after elementary computations

  F_− = (1−α)/α if α ≥ 1/2,  F_− = 1 if α ≤ 1/2,
  F_+ = 1/q if α ≥ 1/2,      F_+ = α/((1−α)q) if α ≤ 1/2.

We see that when α > 1/2 then F_−(1) < 1 and F_+(1) < 1. Then the Green kernel vanishes at infinity, and the Dirichlet problem is solvable. On the other hand, when α ≤ 1/2 then the random walk converges almost surely to ξ, so that supp ν_o does not coincide with the full boundary: the Dirichlet problem at infinity does not admit a solution in this case.

We next compute

  U(x,x) = U = α F_− + (1−α) F_+ = min{α, 1−α} (q+1)/q.

For x, y ∈ T_q, let v = v(x,y) be the point on π(x,y) which minimizes hor(v) = hor(v, ξ). In other words, this is the first common point on the geodesic rays π(x, ξ) and π(y, ξ), the confluent of x and y with respect to ξ. (Recall that x ∧ y is the confluent with respect to o.) With this notation,

  G(x,y) = F_−^{d(x,v)} F_+^{d(y,v)} / (1 − U).

We now compute the Martin kernel. It is immediate that

  K(x, ξ) = F_−^{hor(x)} = ((1−α)/α)^{hor(x)} if α ≥ 1/2,  and  K(x, ξ) = 1 if α ≤ 1/2.

We leave to the reader the geometric reasoning that leads to the formula for the Martin kernel at η ∈ ∂T_q \ {ξ}: setting λ = √(F_− F_+) = min{(1−α)/(αq), α/((1−α)q)}^{1/2},

  K(x, η) = K(x, ξ) λ^{2 hor(v(o,η)) − 2 hor(v(x,η))}.
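The case distinctions for F_− and F_+ can be verified by substituting them back into the two quadratic equations at z = 1, together with the identity U = αF_− + (1−α)F_+ = min{α, 1−α}(q+1)/q. A short check of mine:

```python
q = 3
checks = []
for a in (0.2, 0.5, 0.8):
    Fm = (1 - a) / a if a >= 0.5 else 1.0
    Fp = 1.0 / q if a >= 0.5 else a / ((1 - a) * q)
    # residuals of the two quadratic equations at z = 1
    res1 = Fm - ((1 - a) + a * Fm ** 2)
    res2 = Fp - (a / q + (q - 1) * (a / q) * Fm * Fp + (1 - a) * Fp ** 2)
    U = a * Fm + (1 - a) * Fp
    checks.append((res1, res2, U - min(a, 1 - a) * (q + 1) / q))
```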
Exercise 9.101. We can rewrite

  D(1)^{−1} D′(1) = (I − D(1)D(1))^{−1} D T^{−1}

and, with a few transformations,

  (I − D(1))^{−1} (I − D(1)D(1)) = (I − Q D(1)) (I − D(1))^{−1}.

The last identity implies

  o (I − D(1))^{−1} (I − D(1)D(1)) = o.

Therefore, if 1 denotes the column vector with all entries equal to 1, then

  ℓ = o D(1)^{−1} D′(1) 1 = o (I − D(1))^{−1} D T^{−1} 1,

which is the proposed formula.
Bibliography
A Textbooks and other general references
[A-N] Athreya, K. B., Ney, P. E., Branching processes. Grundlehren Math. Wiss. 196, Springer-Verlag, New York 1972. 134
[B-M-P] Baldi, P., Mazliak, L., Priouret, P., Martingales and Markov chains. Solved exercises and elements of theory. Translated from the 1998 French original, Chapman & Hall/CRC, Boca Raton 2002. xiii
[Be] Behrends, E., Introduction to Markov chains. With special emphasis on rapid mixing. Adv. Lectures Math., Vieweg, Braunschweig 2000. xiii
[Br] Brémaud, P., Markov chains. Gibbs fields, Monte Carlo simulation, and queues. Texts Appl. Math. 31, Springer-Verlag, New York 1999. xiii, 6
[Ca] Cartier, P., Fonctions harmoniques sur un arbre. Symposia Math. 9 (1972), 203–270. xii, 250, 262
[Ch] Chung, K. L., Markov chains with stationary transition probabilities. 2nd edition, Grundlehren Math. Wiss. 104, Springer-Verlag, New York 1967. xiii
[Di] Diaconis, P., Group representations in probability and statistics. IMS Lecture Notes Monogr. Ser. 11, Institute of Mathematical Statistics, Hayward, CA, 1988. 83
[D-S] Doyle, P. G., Snell, J. L., Random walks and electric networks. Carus Math. Monogr. 22, Math. Assoc. America, Washington, DC, 1984. xv, 78, 80, 231
[Dy] Dynkin, E. B., Boundary theory of Markov processes (the discrete case). Russian Math. Surveys 24 (1969), 1–42. xv, 191, 212
[F-M-M] Fayolle, G., Malyshev, V. A., Menshikov, M. V., Topics in the constructive theory of countable Markov chains. Cambridge University Press, Cambridge 1995. x
[F1] Feller, W., An introduction to probability theory and its applications, Vol. I. 3rd edition, Wiley, New York 1968. x
[F2] Feller, W., An introduction to probability theory and its applications, Vol. II. 2nd edition, Wiley, New York 1971. x
[F-S] Flajolet, Ph., Sedgewick, R., Analytic combinatorics. Cambridge University Press, Cambridge 2009. xiv
[Fr] Freedman, D., Markov chains. Corrected reprint of the 1971 original, Springer-Verlag, New York 1983. xiii
[H] Häggström, O., Finite Markov chains and algorithmic applications. London Math. Soc. Student Texts 52, Cambridge University Press, 2002. xiii
[H-L] Hernández-Lerma, O., Lasserre, J. B., Markov chains and invariant probabilities. Progr. Math. 211, Birkhäuser, Basel 2003. xiv
[Hal] Halmos, P., Measure theory. Van Nostrand, New York, N.Y., 1950. 8, 9
[Har] Harris, T. E., The theory of branching processes. Corrected reprint of the 1963 original, Dover Phoenix Ed., Dover Publications, Inc., Mineola, NY, 2002. 134
[Hi] Hille, E., Analytic function theory, Vols. I–II. Chelsea Publ. Comp., New York 1962. 130
[I-M] Isaacson, D. L., Madsen, R. W., Markov chains. Theory and applications. Wiley Ser. Probab. Math. Statist., Wiley, New York 1976. xiii, 6, 53
[J-T] Jones, W. B., Thron, W. J., Continued fractions. Analytic theory and applications. Encyclopedia Math. Appl. 11, Addison-Wesley, Reading, MA, 1980. 126
[K-S] Kemeny, J. G., Snell, J. L., Finite Markov chains. Reprint of the 1960 original, Undergrad. Texts Math., Springer-Verlag, New York 1976. ix, xiii, 1
[K-S-K] Kemeny, J. G., Snell, J. L., Knapp, A. W., Denumerable Markov chains. 2nd edition, Grad. Texts in Math. 40, Springer-Verlag, New York 1976. xiii, xiv, 189
[La] Lawler, G. F., Intersections of random walks. Probab. Appl., Birkhäuser, Boston, MA, 1991. x, xv
[Le] Lesigne, E., Heads or tails. An introduction to limit theorems in probability. Translated from the 2001 French original by A. Pierrehumbert, Student Math. Library 28, Amer. Math. Soc., Providence, RI, 2005. 115
[Li] Lindvall, T., Lectures on the coupling method. Corrected reprint of the 1992 original, Dover Publ., Mineola, NY, 2002. x, 63
[L-P] Lyons, R., with Peres, Y., Probability on trees and networks. Book in preparation; available online at http://mypage.iu.edu/~rdlyons/prbtree/prbtree.html xvi, 78, 80, 134
[No] Norris, J. R., Markov chains. Cambridge Ser. Stat. Probab. Math. 2, Cambridge University Press, Cambridge 1998. xiii, 131
[Nu] Nummelin, E., General irreducible Markov chains and non-negative operators. Cambridge Tracts in Math. 83, Cambridge University Press, Cambridge 1984. xiv
[Ol] Olver, F. W. J., Asymptotics and special functions. Academic Press, San Diego, CA, 1974. 128
[Pa] Parthasarathy, K. R., Probability measures on metric spaces. Probab. and Math. Stat. 3, Academic Press, New York, London 1967. 10, 202, 217
[Pe] Petersen, K., Ergodic theory. Cambridge University Press, Cambridge 1990. 69
[Pi] Pintacuda, N., Catene di Markov. Edizioni ETS, Pisa 2000. xiii
[Ph] Phelps, R. R., Lectures on Choquet's theorem. Van Nostrand Math. Stud. 7, New York, 1966. 184
[Re] Revuz, D., Markov chains. Revised edition, Mathematical Library 11, North-Holland, Amsterdam 1984. xiii, xiv
[R] Révész, P., Random walk in random and nonrandom environments. World Scientific, Teaneck, NJ, 1990. x
[Ru] Rudin, W., Real and complex analysis. 3rd edition, McGraw-Hill, New York 1987. 105
[SC] Saloff-Coste, L., Lectures on finite Markov chains. In Lectures on probability theory and statistics, École d'été de probabilités de Saint-Flour XXVI–1996 (ed. by P. Bernard), Lecture Notes in Math. 1665, Springer-Verlag, Berlin 1997, 301–413. x, xiii, 78, 83
[Se] Seneta, E., Non-negative matrices and Markov chains. Revised reprint of the second (1981) edition, Springer Ser. Statist., Springer-Verlag, New York 2006. ix, x, xiii, 6, 40, 53, 57, 75
[So] Soardi, P. M., Potential theory on infinite networks. Lecture Notes in Math. 1590, Springer-Verlag, Berlin 1994. xv
[Sp] Spitzer, F., Principles of random walks. 2nd edition, Grad. Texts in Math. 34, Springer-Verlag, New York 1976. x, xv, 115
[St1] Stroock, D., Probability theory, an analytic view. 2nd edition, Cambridge University Press, New York 2000. xiv
[St2] Stroock, D., An introduction to Markov processes. Grad. Texts in Math. 230, Springer-Verlag, Berlin 2005. xiii
[Wa] Wall, H. S., Analytic theory of continued fractions. Van Nostrand, Toronto 1948. 126
[W1] Woess, W., Catene di Markov e teoria del potenziale nel discreto. Quaderni dell'Unione Matematica Italiana 41, Pitagora Editrice, Bologna 1996. v, x, xi, xiv
[W2] Woess, W., Random walks on infinite graphs and groups. Cambridge Tracts in Mathematics 138, Cambridge University Press, Cambridge 2000 (paperback reprint 2008). vi, x, xv, xvi, 78, 151, 191, 207, 225, 287
B Research-specic references
[1] Amghibech, S., Criteria of regularity at the end of a tree. In Séminaire de probabilités XXXII, Lecture Notes in Math. 1686, Springer-Verlag, Berlin 1998, 128–136. 261
[2] Askey, R., Ismail, M., Recurrence relations, continued fractions, and orthogonal polynomials. Mem. Amer. Math. Soc. 49 (1984), no. 300. 126
[3] Bajunaid, I., Cohen, J. M., Colonna, F., Singman, D., Trees as Brelot spaces. Adv. in Appl. Math. 30 (2003), 706–745. 273
[4] Bayer, D., Diaconis, P., Trailing the dovetail shuffle to its lair. Ann. Appl. Probab. 2 (1992), 294–313. 98
[5] Benjamini, I., Peres, Y., Random walks on a tree and capacity in the interval. Ann. Inst. H. Poincaré Prob. Stat. 28 (1992), 557–592. 261
[6] Benjamini, I., Peres, Y., Markov chains indexed by trees. Ann. Probab. 22 (1994), 219–243. 147
[7] Benjamini, I., Schramm, O., Random walks and harmonic functions on infinite planar graphs using square tilings. Ann. Probab. 24 (1996), 1219–1238. 269
[8] Blackwell, D., On transient Markov processes with a countable number of states and stationary transition probabilities. Ann. Math. Statist. 26 (1955), 654–658. 219
[9] Cartwright, D. I., Kaimanovich, V. A., Woess, W., Random walks on the affine group of local fields and of homogeneous trees. Ann. Inst. Fourier (Grenoble) 44 (1994), 1243–1288. 292
[10] Cartwright, D. I., Soardi, P. M., Woess, W., Martin and end compactifications of non locally finite graphs. Trans. Amer. Math. Soc. 338 (1993), 679–693. 250, 261
[11] Choquet, G., Deny, J., Sur l'équation de convolution μ = μ∗σ. C. R. Acad. Sci. Paris 250 (1960), 799–801. 224
[12] Chung, K. L., Ornstein, D., On the recurrence of sums of random variables. Bull. Amer. Math. Soc. 68 (1962), 30–32. 113
[13] Derriennic, Y., Marche aléatoire sur le groupe libre et frontière de Martin. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 32 (1975), 261–276. 250
[14] Diaconis, P., Shahshahani, M., Generating a random permutation with random transpositions. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 57 (1981), 159–179. 101
[15] Diaconis, P., Stroock, D., Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab. 1 (1991), 36–61. x, 93
[16] Di Biase, F., Fatou type theorems. Maximal functions and approach regions. Progr. Math. 147, Birkhäuser, Boston 1998. 262
[17] Doob, J. L., Discrete potential theory and boundaries. J. Math. Mech. 8 (1959), 433–458. 182, 189, 206, 217, 261
[18] Doob, J. L., Snell, J. L., Williamson, R. E., Application of boundary theory to sums of independent random variables. In Contributions to probability and statistics, Stanford University Press, Stanford 1960, 182–197. 224
[19] Dynkin, E. B., Malyutov, M. B., Random walks on groups with a finite number of generators. Soviet Math. Dokl. 2 (1961), 399–402. 219, 250
[20] Erdős, P., Feller, W., Pollard, H., A theorem on power series. Bull. Amer. Math. Soc. 55 (1949), 201–204. x
[21] Figà-Talamanca, A., Steger, T., Harmonic analysis for anisotropic random walks on homogeneous trees. Mem. Amer. Math. Soc. 110 (1994), no. 531. 250
[22] Gairat, A. S., Malyshev, V. A., Menshikov, M. V., Pelikh, K. D., Classification of Markov chains that describe the evolution of random strings. Uspekhi Mat. Nauk 50 (1995), 5–24; English translation in Russian Math. Surveys 50 (1995), 237–255. 278
[23] Gantert, N., Müller, S., The critical branching Markov chain is transient. Markov Process. Related Fields 12 (2006), 805–814. 147
[24] Gerl, P., Continued fraction methods for random walks on N and on trees. In Probability measures on groups, VII, Lecture Notes in Math. 1064, Springer-Verlag, Berlin 1984, 131–146. 126
[25] Gerl, P., Rekurrente und transiente Bäume. In Sém. Lothar. Combin. (IRMA Strasbourg) 10 (1984), 80–87; available online at http://www.emis.de/journals/SLC/ 269
[26] Gilch, L., Rate of escape of random walks on free products. J. Austral. Math. Soc. 83 (2007), 31–54. 267
[27] Gilch, L., Müller, S., Random walks on directed covers of graphs. Preprint, 2009. 273
[28] Good, I. J., Random motion and analytic continued fractions. Proc. Cambridge Philos. Soc. 54 (1958), 43–47. 126
[29] Guivarc'h, Y., Sur la loi des grands nombres et le rayon spectral d'une marche aléatoire. Astérisque 74 (1980), 47–98. 151
[30] Harris, T. E., First passage and recurrence distributions. Trans. Amer. Math. Soc. 73 (1952), 471–486. 140
[31] Hennequin, P. L., Processus de Markoff en cascade. Ann. Inst. H. Poincaré 18 (1963), 109–196. 224
[32] Hunt, G. A., Markoff chains and Martin boundaries. Illinois J. Math. 4 (1960), 313–340. 189, 191, 206, 218
[33] Kaimanovich, V. A., Lyapunov exponents, symmetric spaces and a multiplicative ergodic theorem for semisimple Lie groups. J. Soviet Math. 47 (1989), 2387–2398. 292
[34] Karlin, S., McGregor, J., Random walks. Illinois J. Math. 3 (1959), 66–81. 126
[35] Kemeny, J. G., Snell, J. L., Boundary theory for recurrent Markov chains. Trans. Amer. Math. Soc. 106 (1963), 405–520. 187
[36] Kingman, J. F. C., The ergodic theory of Markov transition probabilities. Proc. London Math. Soc. 13 (1963), 337–358.
[37] Knapp, A. W., Regular boundary points in Markov chains. Proc. Amer. Math. Soc. 17 (1966), 435–440. 261
[38] Korányi, A., Picardello, M. A., Taibleson, M. H., Hardy spaces on nonhomogeneous trees. With an appendix by M. A. Picardello and W. Woess. Symposia Math. 29 (1987), 205–265. 250
[39] Lalley, St., Finite range random walk on free groups and homogeneous trees. Ann. Probab. 21 (1993), 2087–2130. 267
[40] Ledrappier, F., Asymptotic properties of random walks on free groups. In Topics in probability and Lie groups: boundary theory, CRM Proc. Lecture Notes 28, Amer. Math. Soc., Providence, RI, 2001, 117–152. 267
[41] Lyons, R., Random walks and percolation on trees. Ann. Probab. 18 (1990), 931–958. 278
[42] Lyons, T., A simple criterion for transience of a reversible Markov chain. Ann. Probab. 11 (1983), 393–402. 107
[43] Mann, B., How many times should you shuffle a deck of cards? In Topics in contemporary probability and its applications, Probab. Stochastics Ser., CRC, Boca Raton, FL, 1995, 261–289. 98
[44] Nagnibeda, T., Woess, W., Random walks on trees with finitely many cone types. J. Theoret. Probab. 15 (2002), 383–422. 267, 278, 295
[45] Ney, P., Spitzer, F., The Martin boundary for random walk. Trans. Amer. Math. Soc. 121 (1966), 116–132. 224, 225
[46] Picardello, M. A., Woess, W., Martin boundaries of Cartesian products of Markov chains. Nagoya Math. J. 128 (1992), 153–169. 207
[47] Pólya, G., Über eine Aufgabe der Wahrscheinlichkeitstheorie betreffend die Irrfahrt im Straßennetz. Math. Ann. 84 (1921), 149–160. 112
[48] Soardi, P. M., Yamasaki, M., Classification of infinite networks and its applications. Circuits, Syst. Signal Proc. 12 (1993), 133–149. 107
[49] Vere-Jones, D., Geometric ergodicity in denumerable Markov chains. Quart. J. Math. Oxford 13 (1962), 7–28. 75
[50] Yamasaki, M., Parabolic and hyperbolic infinite networks. Hiroshima Math. J. 7 (1977), 135–146. 107
[51] Yamasaki, M., Discrete potentials on an infinite network. Mem. Fac. Sci., Shimane Univ. 13 (1979), 31–44.
[52] Woess, W., Random walks and periodic continued fractions. Adv. in Appl. Probab. 17 (1985), 67–84. 126
[53] Woess, W., Generating function techniques for random walks on graphs. In Heat kernels and analysis on manifolds, graphs, and metric spaces (Paris, 2002), Contemp. Math. 338, Amer. Math. Soc., Providence, RI, 2003, 391–423. xiv
List of symbols and notation
This list contains a selection of the most important symbols and notation.
Numbers
N = {1, 2, ...} the natural numbers (positive integers)
N_0 = {0, 1, 2, ...} the non-negative integers
N_odd = {1, 3, 5, ...} the odd natural numbers
Z the integers
R the real numbers
C the complex numbers
Markov chains
X typical symbol for the state space of a Markov chain
P, also Q typical symbols for transition matrices
C, C(x) irreducible class (of state x)
G(x,y|z) Green function (1.32)
F(x,y|z), U(x,y|z) first passage time generating functions (1.37)
L(x,y|z) generating function of last exit probabilities (3.56)
Probability space ingredients
Ω trajectory space, see (1.8)
A σ-algebra generated by all cylinder sets (1.9)
Pr_x and Pr_ν probability measure on the trajectory space with respect to starting point x, resp. initial distribution ν, see (1.10)
Pr(·) probability of a set in the σ-algebra
Pr[·] probability of an event described by a logical expression
Pr[· | ·] conditional probability
E expectation (expected value)
Random times
s, t typical symbols for stopping times (1.24)
s_W, s_x first passage time = time (≥ 0) of first visit to the set W, resp. the point x (1.26)
t_W, t_x time (≥ 1) of first visit to the set W, resp. the point x after the start
e_V, e_k exit time from the set V, resp. (in a tree) from the ball with radius k around the starting point (7.34), (9.19)
Measures
μ, ν typical symbols for measures on X, R, Z^d, a group, etc.
μ̄ mean or mean vector of the probability measure μ on Z or Z^d
supp support of a measure or function
m_C the stationary probability measure of a positive recurrent essential class of a Markov chain (3.19)
m (a) the stationary probability measure of a positive recurrent, irreducible Markov chain, (b) the reversing measure of a reversible Markov chain, not necessarily with finite mass (4.1)
Graphs, trees
Γ typical notation for a (usually directed) graph
V(Γ) vertex set of Γ
E(Γ) (oriented) edge set of Γ
Γ(P) graph of the Markov chain with transition matrix P (1.6)
T typical notation for a tree
T_q homogeneous tree with degree q + 1
π according to context, (a) path in a graph, (b) projection map, or (c) 3.14159...
Π a set of paths
π(ξ, η) geodesic arc, ray or two-way infinite geodesic from ξ to η in a tree
T̂ end compactification of the tree T (9.14)
∂T space of ends of the tree T
T_∞ set of vertices of the tree T with infinite degree
T^* set of improper vertices
∂*T boundary of the non locally finite tree T, consisting of ends and improper vertices (9.14)
Reversible Markov chains
m reversing measure (4.1)
∇ difference operator associated with a Markov chain (4.6)
L = I − P Laplace operator associated with a Markov chain
spec(P) spectrum of P
λ typical notation for an eigenvalue of P
Groups, vectors, functions
G typical notation for a group
S permutation group (symmetric group)
e_i unit vector in Z^d
0 according to context (a) constant function with value 0, (b) zero column vector in Z^d
1 according to context (a) constant function with value 1, (b) column vector in Z^d with all coordinates equal to 1
1_A indicator function of the set or event A
ϕ(t_0−) limit of ϕ(t) as t → t_0 from below (t real)
Galton–Watson process and branching Markov chains
GW abbreviation for Galton–Watson
BMC abbreviation for branching Markov chain
BMC(X, P, μ) branching Markov chain with state space X, transition matrix P and offspring distribution μ
Σ = {1, ..., N} or Σ = N set of possible offspring numbers, interpreted as an alphabet
Σ* set of all words over Σ, viewed as the N-ary tree (possibly with N = ∞)
T full genealogical tree, a random or deterministic subtree of Σ* with property (5.32), typically a GW tree
deterministic finite subtree of Σ* containing the root ε
Potential and boundary theory
H space of harmonic functions
H^+ cone of non-negative harmonic functions
H^∞ space of bounded harmonic functions
S set of superharmonic functions
S^+ cone of non-negative superharmonic functions
E cone of excessive measures
X̂ a compactification of the discrete set X
ξ, η typical notation for elements of the boundary X̂ \ X and for elements of the boundary of a tree
X̂(P) the Martin compactification of (X, P)
M = X̂(P) \ X the Martin boundary
Index
balayée, 177
birth-and-death Markov chain, 116
Borel σ-algebra, 189
boundary process, 263
branching Markov chain
strongly recurrent, 144
transient, 144
weakly recurrent, 144
Busemann function, 244
Chebyshev polynomials, 119
class
aperiodic, 36
essential, 30
irreducible, 28
null recurrent, 48
positive recurrent, 48
recurrent, 45
transient, 45
communicating states, 28
compactication
(general), 184
conductance, 78
cone
base of a ∼, 179
convex, 179
cone type, 274
confluent, 233
continued fraction, 123
convex set of states, 31
convolution, 86
coupling, 63, 128
cut point in a graph, 22
degree of a vertex, 79
directed cover, 274
Dirichlet norm, 93
Dirichlet problem
for finite Markov chains, 154
Dirichlet problem at infinity, 251
downward crossings, 193
drunkard's walk
finite, 2
on N_0, absorbing, 33
on N_0, reflecting, 122, 137
on Z, 46
Ehrenfest model, 90
end of a tree, 232
ergodic coefficient, 53
exit time, 190, 196, 237
expectation, 13
expected value, 13
extremal element of a convex set, 180
factor chain, 17
Fatou theorem
probabilistic, 215
radial, 262
final
event, 212
random variable, 212
finite range, 159, 179
first passage times, 15
flow in a network, 104
function
G-integrable, 169
P-integrable, 158
harmonic, 154, 157, 159
superharmonic, 159
Galton–Watson process, 131
critical, 134
extended, 137
offspring distribution, 131
subcritical, 134
supercritical, 134
Galton–Watson tree, 136
geodesic arc from x to y, 227
geodesic from ξ to η, 232
geodesic ray from x to ξ, 232
graph
adjacency matrix of a ∼, 79
bipartite, 79
distance in a ∼, 108
k-fuzz of a ∼, 108
locally finite, 79
of a Markov chain, 4
oriented, 4
regular ∼, 96
subdivision of a ∼, 283
symmetric, 79
Green function, 17
h-process, 182
harmonic function, 154, 157, 159
hitting times, 15
horocycle, 244
horocycle index, 244
hypercube
random walk on the ∼, 88
ideal boundary, 184
improper vertices, 235
initial distribution, 3, 5
irreducible cone types, 293
Kirchhoff's node law, 81
Laplacian, 81
leaf of a tree, 228
Liouville property, 215
local time, 14
locally constant function, 236
Markov chain, 5
automorphism of a ∼, 303
birth-and-death, 116
induced, 164
irreducible, 28
isomorphism of a ∼, 303
reversible, 78
time homogeneous, 5
Markov property, 5
Martin boundary, 187
Martin compactication, 187
Martin kernel, 180
matrix
primitive, 59
stochastic, 3
substochastic, 34
maximum principle, 56, 155, 159
measure
excessive, 49
invariant, 49
reversing, 78
stationary, 49
minimal harmonic function, 180
minimal Martin boundary, 206
minimum principle, 162
network, 80
recurrent, 102
transient, 102
null class, 48
null recurrent
state, 47
offspring distribution, 131
non-degenerate, 131
path
nite, 26
length of a ∼, 26
resistance length of a ∼, 94
weight of a ∼, 26
period of an irreducible class, 36
Poincar constant, 95
Poisson boundary, 214
Poisson integral, 210
Poisson transform, 248
positive recurrent
class, 48
state, 47
potential
of a function, 169
of a measure, 177
predecessor of a vertex, 227
Pringsheim's theorem, 130
random variable, 13
random walk
nearest neighbour, 226
on N_0, 116
on a group, 86
simple, 79
recurrent
ρ-, 75
class, 45
network, 102
set, 164
state, 43
reduced
function, 172, 176
measure, 176
regular boundary point, 252
resistance, 80
simple random walk
on integer lattices, 109
simple random walk (SRW), 79
spectral radius of a Markov chain, 40
state
absorbing, 31
ephemeral, 35
essential, 30
null recurrent, 47
positive recurrent, 47
recurrent, 43
transient, 43
state space, 3
stochastic matrix, 3
stopping time, 14
strong Markov property, 14
subharmonic function, 269
subnetwork, 107
substochastic matrix, 34
superharmonic function, 159
supermartingale, 191
support of a Borel measure, 202
terminal
event, 212
random variable, 212
time reversal, 56
topology of pointwise convergence, 179
total variation, 52
trajectory space, 6
transient
ρ-, 75
class, 45
network, 102
skeleton, 242
state, 43
transition matrix, 3
transition operator, 159
tree
(2-sided infinite) geodesic in a ∼, 227
branch of a ∼, 227
cone of a ∼, 227
cone type of a ∼, 273
end of a ∼, 232
geodesic arc in a ∼, 227
geodesic ray in a ∼, 227
horocycle in a ∼, 244
leaf of a ∼, 228
recurrent end in a ∼, 241
transient end in a ∼, 241
ultrametric, 234
unbranched path, 283
unit ow, 104
upward crossings, 194
walk-to-tree coding, 138