
Lectures on Probability and Statistical Models

Phil Pollett
Professor of Mathematics
The University of Queensland


© These materials can be used for any educational purpose provided they are not altered.

13 Markov chains
Imprecise (intuitive) definition. A Markov process is a
random process that forgets its past, in the following
sense:

Pr(Future = y | Present = x and Past = z) = Pr(Future = y | Present = x).

Thus, given the past and the present state of the process,
only the present state is of use in predicting the future.

Markov chains
Equivalently,

Pr(Future = y and Past = z | Present = x)
    = Pr(Future = y | Present = x) Pr(Past = z | Present = x),

so that, given the present state of the process, its past and
its future are independent. If the set of states S is discrete,
then the process is called a Markov chain.
Remark. At first sight this definition might appear to cover
only trivial examples, but note that the current state could
be complicated and could include a record of the recent
past.

Andrei Andreyevich Markov

(Born: 14/06/1856, Ryazan, Russia; Died: 20/07/1922, St Petersburg, Russia)

Markov is famous for his pioneering work on Markov chains, which launched the theory of stochastic processes. His early work was in number theory, analysis, continued fractions, limits of integrals, approximation theory and convergence of series.

Markov chains
Example. There are two rooms, labelled A and B. There is
a spider, initially in Room A, hunting a fly that is initially in
Room B. They move from room to room independently:
every minute each changes rooms (with probability p for the
spider and q for the fly) or stays put, with the
complementary probabilities. Once in the same room, the
spider eats the fly and the hunt ceases.

The hunt can be represented as a Markov chain with three states: (0) the spider and the fly are in the same room (the hunt has ended), (1) the spider is in Room A and the fly is in Room B, and (2) the spider is in Room B and the fly is in Room A.

Markov chains
Eventually we will be able to answer questions like "What is the probability that the hunt lasts more than two minutes?"

Let Xn be the state of the process at time n (that is, after n minutes). Then Xn ∈ S = {0, 1, 2}. The set S is called the state space. The initial state is X0 = 1. State 0 is called an absorbing state, because the process remains there once it is reached.

Markov chains
Definition. A sequence {Xn, n = 0, 1, . . . } of random variables is called a discrete-time stochastic process; Xn usually represents the state of the process at time n. If {Xn} takes values in a discrete state space S, then it is called a Markov chain if

Pr(Xm+1 = j | Xm = i, Xm−1 = im−1, . . . , X0 = i0)
    = Pr(Xm+1 = j | Xm = i),                          (1)

for all time points m and all states i0, . . . , im−1, i, j ∈ S. If the right-hand side of (1) is the same for all m, then the Markov chain is said to be time-homogeneous.

Markov chains
We will consider only time-homogeneous chains, and we
shall write
pij^(n) = Pr(Xm+n = j | Xm = i) = Pr(Xn = j | X0 = i)

for the n-step transition probabilities, and

pij := pij^(1) = Pr(Xm+1 = j | Xm = i) = Pr(X1 = j | X0 = i)

for the 1-step transition probabilities (or simply transition probabilities).

Markov chains
By the law of total probability, we have that
Σ_{j∈S} pij^(n) = Σ_{j∈S} Pr(Xn = j | X0 = i) = 1,

and in particular Σ_{j∈S} pij = 1.

The matrix P^(n) = (pij^(n), i, j ∈ S) is called the n-step transition matrix and P = (pij, i, j ∈ S) is called the 1-step transition matrix (or simply transition matrix).

Markov chains
Remarks. (1) Matrices like this (with non-negative entries and all row sums equal to 1) are called stochastic matrices. Writing 1 = (1, 1, . . . )^T (where T denotes transpose), we see that P1 = 1. Hence, P (and indeed any stochastic matrix) has an eigenvector 1 corresponding to the eigenvalue λ = 1.

(2) We may usefully set P^(0) = I, where, as usual, I denotes the identity matrix:

pij^(0) = δij := { 1 if i = j,
                   0 if i ≠ j.
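Both remarks are easy to verify numerically. Here is a minimal Python/NumPy sketch (the matrix is an arbitrary illustrative choice, not from the lecture):

import numpy as np

# Any stochastic matrix: non-negative entries, rows summing to 1.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

print(P.sum(axis=1))        # [1. 1.]  (row sums)
print(P @ np.ones(2))       # [1. 1.]  i.e. P 1 = 1: eigenvector 1, eigenvalue 1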

Markov chains
Example. Returning to the hunt, the three states were: (0) the spider and the fly are in the same room, (1) the spider is in Room A and the fly is in Room B, and (2) the spider is in Room B and the fly is in Room A. Since the spider changes rooms with probability p and the fly changes rooms with probability q,

P = [ 1   0            0
      r   (1−p)(1−q)   pq
      r   pq           (1−p)(1−q) ],

where r = p(1−q) + q(1−p) = p + q − 2pq = 1 − [(1−p)(1−q) + pq].

Markov chains
For example, if p = 1/4 and q = 1/2, then

P = [ 1    0    0
      1/2  3/8  1/8
      1/2  1/8  3/8 ].

What is the chance that the hunt is over by n minutes? Can we calculate the chance of being in each of the various states after n minutes?
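Both questions can be explored numerically. Here is a minimal NumPy sketch (our own illustration) that builds P from p and q and raises it to the n-th power; the entry (P^n)[1, 0] is the probability that the hunt, started in state 1, has ended within n minutes (anticipating the result P^(n) = P^n established below):

import numpy as np

p, q = 0.25, 0.5
r = p * (1 - q) + q * (1 - p)          # probability the hunt ends in one step
P = np.array([[1.0, 0.0,               0.0],
              [r,   (1 - p) * (1 - q), p * q],
              [r,   p * q,             (1 - p) * (1 - q)]])

# (P^n)[1, 0]: probability the hunt, started in state 1, is over within n minutes.
for n in (1, 2, 3, 15):
    print(n, np.linalg.matrix_power(P, n)[1, 0])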

Markov chains
By the law of total probability, we have

pij^(n+m) = Pr(Xn+m = j | X0 = i)
          = Σ_{k∈S} Pr(Xn+m = j | Xn = k, X0 = i) Pr(Xn = k | X0 = i).

But,

Pr(Xn+m = j | Xn = k, X0 = i)
    = Pr(Xn+m = j | Xn = k)     (Markov property)
    = Pr(Xm = j | X0 = k)       (time homogeneity),

Markov chains
and so, for all m, n ≥ 1,

pij^(n+m) = Σ_{k∈S} pik^(n) pkj^(m),   i, j ∈ S,

or, equivalently, in terms of transition matrices, P^(n+m) = P^(n) P^(m). Thus, in particular, we have P^(n) = P^(n−1) P (remembering that P := P^(1)). Therefore,

P^(n) = P^n,   n ≥ 1.

Note that since P^(0) = I = P^0, this expression is valid for all n ≥ 0.
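A quick numerical check of the Chapman-Kolmogorov equations, using the hunt's transition matrix (again a sketch of our own):

import numpy as np

P = np.array([[1.0, 0.0,   0.0],
              [0.5, 0.375, 0.125],
              [0.5, 0.125, 0.375]])

n, m = 2, 3
lhs = np.linalg.matrix_power(P, n + m)                               # P^(n+m)
rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)    # P^(n) P^(m)
print(np.allclose(lhs, rhs))                                         # True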

Markov chains
Example. Returning to the hunt, if the spider and the fly
change rooms with probability p = 1/4 and q = 1/2,
respectively, then

P = [ 1    0    0
      1/2  3/8  1/8
      1/2  1/8  3/8 ].

A simple calculation gives

P^2 = [ 1    0     0
        3/4  5/32  3/32
        3/4  3/32  5/32 ],
Markov chains

P^3 = [ 1    0      0
        7/8  9/128  7/128
        7/8  7/128  9/128 ],

et cetera, and, to four decimal places,

P^15 = [ 1       0       0
         1.0000  0.0000  0.0000
         1.0000  0.0000  0.0000 ].

Recall that X0 = 1, so p10^(n) is the probability that the hunt ends within n minutes. What, then, is the probability that the hunt lasts more than two minutes? Answer: 1 − p10^(2) = 1 − 3/4 = 1/4.

Markov chains
Arbitrary initial conditions. What if we are unsure about
where the process starts?
Let πj^(n) = Pr(Xn = j) and define a row vector

π^(n) = (πj^(n), j ∈ S),

being the distribution of the chain at time n.

Suppose that we know the initial distribution π^(0), that is, the distribution of X0 (in the previous example we had π^(0) = (0, 1, 0)).

Markov chains
By the law of total probability, we have
πj^(n) = Pr(Xn = j) = Σ_{i∈S} Pr(Xn = j | X0 = i) Pr(X0 = i)
       = Σ_{i∈S} πi^(0) pij^(n),

and so π^(n) = π^(0) P^n, n ≥ 0.

Definition. If π^(n) = π is the same for all n, then π is called a stationary distribution. If lim_{n→∞} π^(n) exists and equals π, then π is called a limiting distribution.

Markov chains
Example. Returning to the hunt with p = 1/4 and q = 1/2,
suppose that, at the beginning of the hunt, each creature is
equally likely to be in either room, so that
π^(0) = (1/2, 1/4, 1/4).

Then,

π^(n) = π^(0) P^n = (1/2, 1/4, 1/4) [ 1    0    0
                                      1/2  3/8  1/8
                                      1/2  1/8  3/8 ]^n.

Markov chains
For example,

π^(3) = (1/2, 1/4, 1/4) [ 1    0    0
                          1/2  3/8  1/8
                          1/2  1/8  3/8 ]^3

      = (1/2, 1/4, 1/4) [ 1    0      0
                          7/8  9/128  7/128
                          7/8  7/128  9/128 ]

      = (15/16, 1/32, 1/32).

So, if, initially, each creature is equally likely to be in either room, then the probability that the hunt ends within 3 minutes is 15/16.
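The same computation in NumPy (a sketch of ours) reproduces π^(3):

import numpy as np

P = np.array([[1.0, 0.0,   0.0],
              [0.5, 0.375, 0.125],
              [0.5, 0.125, 0.375]])
pi0 = np.array([0.5, 0.25, 0.25])          # initial distribution

pi3 = pi0 @ np.linalg.matrix_power(P, 3)
print(pi3)                                 # [0.9375 0.03125 0.03125] = (15/16, 1/32, 1/32)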
Markov chains
The two-state chain. Let S = {0, 1} and let

P = [ 1−p  p
      q    1−q ],

where p, q ∈ (0, 1). It can be shown that

P = (1/(p+q)) [ 1   p ] [ 1  0 ] [ q   p ]
              [ 1  −q ] [ 0  r ] [ 1  −1 ],

where r = 1 − p − q. This is of the form P = V D V^−1. Check it! (The procedure is called diagonalization.)

Markov chains
This is good news because

P^2 = (V D V^−1)(V D V^−1) = V D (V^−1 V) D V^−1 = V (D I D) V^−1 = V D^2 V^−1.

Similarly, P^n = V D^n V^−1 for all n ≥ 1. Hence,

P^(n) = (1/(p+q)) [ 1   p ] [ 1  0   ] [ q   p ]
                  [ 1  −q ] [ 0  r^n ] [ 1  −1 ]

      = (1/(p+q)) [ q + p r^n   p − p r^n ]
                  [ q − q r^n   p + q r^n ].

Markov chains
Thus we have an explicit expression for the n-step transition
probabilities.

Remark. The above procedure generalizes to any Markov chain with a finite state space.
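The explicit expression is easy to sanity-check against direct matrix powering (a sketch of ours; the values of p, q and n are arbitrary choices):

import numpy as np

p, q = 0.3, 0.6                            # any p, q in (0, 1)
r = 1 - p - q
P = np.array([[1 - p, p],
              [q, 1 - q]])

n = 5
explicit = np.array([[q + p * r**n, p - p * r**n],
                     [q - q * r**n, p + q * r**n]]) / (p + q)
print(np.allclose(np.linalg.matrix_power(P, n), explicit))   # True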

Markov chains
If the initial distribution is π^(0) = (a, b), then, since π^(n) = π^(0) P^n,

Pr(Xn = 0) = (q + (ap − bq) r^n) / (p + q),
Pr(Xn = 1) = (p − (ap − bq) r^n) / (p + q).

(You should check this for n = 0 and n = 1.) Notice that when ap = bq, we have

Pr(Xn = 0) = 1 − Pr(Xn = 1) = q/(p + q),

for all n ≥ 0, so that π = (q/(p+q), p/(p+q)) is a stationary distribution.
Markov chains
Notice also that |r| < 1, since p, q ∈ (0, 1). Therefore, π is also a limiting distribution because

lim_{n→∞} Pr(Xn = 0) = q/(p + q),
lim_{n→∞} Pr(Xn = 1) = p/(p + q).

Remark. If, for a general Markov chain, a limiting distribution π exists, then it is a stationary distribution, that is, πP = π (π is a left eigenvector corresponding to the eigenvalue 1).

For details (and the converse), you will need a more advanced course on Stochastic Processes.
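The left-eigenvector characterization suggests a numerical recipe (a sketch of ours, not from the lecture): take a left eigenvector of P for eigenvalue 1 and normalize it to sum to 1.

import numpy as np

p, q = 0.3, 0.6
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Left eigenvectors of P are right eigenvectors of P transposed.
vals, vecs = np.linalg.eig(P.T)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = v / v.sum()
print(pi)                                  # [2/3 1/3] = (q/(p+q), p/(p+q))
print(np.allclose(pi @ P, pi))             # True: pi P = pi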

Markov chains
Example. Max (a dog) is subjected to a series of trials, in each of which he is given a choice of going to a dish to his left, containing tasty food, or a dish to his right, containing food with an unpleasant taste.

Suppose that if, on any given occasion, Max goes to the left, then he will return there on the next occasion with probability 0.99, while if he goes to the right, he will do so on the next occasion with probability 0.1 (Max is smart, but he is not infallible).

[Photo: Poppy and Max]

Markov chains
Let Xn be 0 or 1 according as Max chooses the dish to the
left or the dish to the right on trial n. Then, {Xn } is a
two-state Markov chain with p = 0.01 and q = 0.9 and hence
r = 0.09. Therefore, if the first dish is chosen at random (at
time n = 1), then Max chooses the tasty food on the n-th
trial with probability
90/91 − (89/182)(0.09)^(n−1),
the long-term probability being 90/91.
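This closed form is easy to check against direct computation of π^(n) = π^(1) P^(n−1) (a NumPy sketch of ours, with state 0 = left dish as in the lecture):

import numpy as np

p, q = 0.01, 0.9
P = np.array([[1 - p, p],
              [q, 1 - q]])
pi1 = np.array([0.5, 0.5])                 # first dish chosen at random at trial n = 1

for n in (1, 2, 10, 50):
    prob_left = (pi1 @ np.linalg.matrix_power(P, n - 1))[0]
    closed_form = 90/91 - (89/182) * 0.09**(n - 1)
    print(n, prob_left, closed_form)       # the two columns agree; both tend to 90/91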

Markov chains
Birth-death chains. Their state space S is either the integers, the non-negative integers, or {0, 1, . . . , N}, and jumps of size greater than 1 are not permitted; their transition probabilities are therefore of the form pi,i+1 = ai, pi,i−1 = bi and pii = 1 − ai − bi, with pij = 0 otherwise.

The birth probabilities (ai) and the death probabilities (bi) are strictly positive and satisfy ai + bi ≤ 1, except perhaps at the boundaries of S, where they could be 0. If ai = a and bi = b, the chain is called a random walk.

Markov chains
Gambler's ruin. A gambler successively wagers a single unit in an even-money game. Xn is his capital after n bets and S = {0, 1, . . . , N}. If his capital reaches N he stops and leaves happy, while state 0 corresponds to bust. Here ai = bi = 1/2, except at the boundaries (0 and N are absorbing states). It is easy to show that the player goes bust with probability 1 − i/N if his initial capital is i.
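The ruin probability 1 − i/N is easily confirmed by simulation (a Monte Carlo sketch of ours; the parameter values are arbitrary):

import random

def ruin_probability(i, N, trials=100_000):
    """Estimate the probability that a fair gambler starting with capital i
    hits 0 before reaching N."""
    bust = 0
    for _ in range(trials):
        x = i
        while 0 < x < N:
            x += random.choice((-1, 1))    # even-money bet: win or lose one unit
        bust += (x == 0)
    return bust / trials

print(ruin_probability(3, 10))             # close to 1 - 3/10 = 0.7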

Markov chains
The Ehrenfest diffusion model. N particles are allowed to pass through a small aperture between two chambers A and B. We assume that at each time epoch n, a single particle, chosen uniformly at random from the N, passes through the aperture.

Let Xn be the number in chamber A at time n. Then S = {0, 1, . . . , N} and, for i ∈ S, ai = 1 − i/N and bi = i/N. In this model, 0 and N are reflecting barriers. It is easy to show that the stationary distribution is binomial B(N, 1/2).
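The binomial stationary distribution can be verified numerically (a sketch of ours): build P from ai and bi and check that πP = π for π = B(N, 1/2).

import numpy as np
from math import comb

N = 10
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        P[i, i + 1] = 1 - i / N            # a_i: the chosen particle enters chamber A
    if i > 0:
        P[i, i - 1] = i / N                # b_i: the chosen particle leaves chamber A

pi = np.array([comb(N, i) for i in range(N + 1)]) / 2**N   # B(N, 1/2)
print(np.allclose(pi @ P, pi))             # True: pi is stationary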

Markov chains
Population models. Here Xn is the size of the population at time n (for example, at the end of the n-th breeding cycle, or at the time of the n-th census). S = {0, 1, . . . }, or S = {0, 1, . . . , N} when there is an upper limit N on the population size (frequently interpreted as the carrying capacity). Usually 0 is an absorbing state, corresponding to population extinction, and N is reflecting.

Markov chains
Example. Take S = {0, 1, . . . } with a0 = 0 and, for i ≥ 1, ai = a > 0 and bi = b > 0, where a + b = 1. It can be shown that extinction occurs with probability 1 when a ≤ b, and with probability (b/a)^i when a > b, where i is the initial population size. This is a good simple model for a population of cells: a = λ/(λ + μ) and b = μ/(λ + μ), where μ and λ are, respectively, the death and the cell division rates.
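The extinction probability (b/a)^i for a > b can be checked by simulation (a sketch of ours; the cap that truncates large populations is our own approximation, harmless here because a surviving path almost never returns to 0 from a large size):

import random

def extinction_prob(i, a, cap=100, trials=10_000):
    """Estimate extinction probability from initial size i; birth w.p. a,
    death w.p. b = 1 - a. Paths reaching `cap` are counted as surviving."""
    extinct = 0
    for _ in range(trials):
        x = i
        while 0 < x < cap:
            x += 1 if random.random() < a else -1
        extinct += (x == 0)
    return extinct / trials

a, i = 0.6, 3
print(extinction_prob(i, a), ((1 - a) / a)**i)   # estimate vs (b/a)^i = 0.296...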

Markov chains
The logistic model. This has S = {0, . . . , N}, with 0 absorbing and N reflecting, and, for i = 1, . . . , N − 1,

ai = λ(1 − i/N) / (μ + λ(1 − i/N)),    bi = μ / (μ + λ(1 − i/N)).

Here λ and μ are birth and death rates. Notice that the birth and the death probabilities depend on i only through i/N, a quantity which is proportional to the population density: i/N = (i/Area)/(N/Area). Models with this property are called density dependent.

Markov chains
Telecommunications. (1) A communications link in a telephone network has N circuits. One circuit is held by each call for its duration. Calls arrive at rate λ > 0 and are completed at rate μ > 0. Let Xn be the number of calls in progress at the n-th time epoch (when an arrival or a departure occurs). Then, S = {0, . . . , N}, with 0 and N both reflecting barriers, and, for i = 1, . . . , N − 1,

ai = λ/(λ + iμ),    bi = iμ/(λ + iμ).

Markov chains
(2) At a node in a packet-switching network, data packets are stored in a buffer of size N. They arrive at rate λ > 0 and are transmitted one at a time (in the order in which they arrive) at rate μ > 0. Let Xn be the number of packets yet to be transmitted just after the n-th time epoch (an arrival or a departure). Then, S = {0, . . . , N}, with 0 and N both reflecting barriers, and, for i = 1, . . . , N − 1,

ai = λ/(λ + μ),    bi = μ/(λ + μ).
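All of the birth-death examples above share one construction, so a single helper covers them. Here is a sketch of ours (the names lam and mu, and the boundary convention a(N) = 0, b(0) = 0, are our own choices, since the slides specify probabilities only at interior states), instantiated with the telephone-link probabilities:

import numpy as np

def birth_death_matrix(N, a, b):
    """Transition matrix on {0, ..., N} for birth probs a(i) and death probs b(i);
    we force a(N) = 0 and b(0) = 0 at the boundaries."""
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        ai = a(i) if i < N else 0.0
        bi = b(i) if i > 0 else 0.0
        if i < N:
            P[i, i + 1] = ai
        if i > 0:
            P[i, i - 1] = bi
        P[i, i] = 1 - ai - bi
    return P

lam, mu, N = 2.0, 1.0, 5                   # call arrival and completion rates
P = birth_death_matrix(N, a=lambda i: lam / (lam + i * mu),
                          b=lambda i: i * mu / (lam + i * mu))
print(P.sum(axis=1))                       # every row sums to 1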

Markov chains
Genetic models. The simplest of these is the Wright-Fisher model. There are N individuals, each of two genetic types, A-type and a-type. Mutation (if any) occurs at birth. We assume that A-types are selectively superior in that the relative survival rate of A-type over a-type individuals in successive generations is γ > 1. Let Xn be the number of A-type individuals, so that N − Xn is the number of a-type.

Markov chains
Wright and Fisher postulated that the composition of the
next generation is determined by N Bernoulli trials, where
the probability pi of producing an A-type offspring is given
by

[i(1 ) + (N i)]
pi = ,
[i(1 ) + (N i)] + [i + (N i)(1 )]

where and are the respective mutation probabilities. We


have S = {0, . . . , N } and
 
N
pij = pji (1 pi )N j , i, j S.
j
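In code, each row of P is just a Binomial(N, pi) pmf. Here is a sketch of ours (the parameter names gamma, alpha and beta follow the symbols above):

import numpy as np
from math import comb

def wright_fisher_matrix(N, gamma, alpha, beta):
    """Wright-Fisher transition matrix: from state i, the next generation is
    N Bernoulli trials with A-type probability p_i."""
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        num = gamma * (i * (1 - alpha) + (N - i) * beta)
        p_i = num / (num + i * alpha + (N - i) * (1 - beta))
        for j in range(N + 1):
            P[i, j] = comb(N, j) * p_i**j * (1 - p_i)**(N - j)
    return P

P = wright_fisher_matrix(N=20, gamma=1.5, alpha=0.01, beta=0.02)
print(np.allclose(P.sum(axis=1), 1.0))     # True: each row is a Binomial pmf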
