Sie sind auf Seite 1von 78

Probability

Prof. F.P. Kelly


Lent 1996

These notes are maintained by Andrew Rogers. Comments and corrections to soc-archim-notes@lists.cam.ac.uk.

Revision: 1.1 Date: 1998/06/24 14:38:21

The following people have maintained these notes. June 2000 June 2000 date Kate Metcalfe Andrew Rogers

Contents
Introduction 1 Basic Concepts 1.1 Sample Space . . . . . 1.2 Classical Probability . 1.3 Combinatorial Analysis 1.4 Stirlings Formula . . . 2 The Axiomatic Approach 2.1 The Axioms . . . . . . 2.2 Independence . . . . . 2.3 Distributions . . . . . 2.4 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v 1 1 1 2 2 5 5 7 8 9 11 11 14 16 18 18 23 23 26 27 27 28 31 34 34 36 37 42 47 50 57 64

. . . .

. . . .

. . . .

. . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

3 Random Variables 3.1 Expectation . . . . . . . . . . 3.2 Variance . . . . . . . . . . . . 3.3 Indicator Function . . . . . . . 3.4 Inclusion - Exclusion Formula 3.5 Independence . . . . . . . . . 4 Inequalities 4.1 Jensens Inequality . . . . . 4.2 Cauchy-Schwarz Inequality . 4.3 Markovs Inequality . . . . . 4.4 Chebyshevs Inequality . . . 4.5 Law of Large Numbers . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

5 Generating Functions 5.1 Combinatorial Applications . . . . . . 5.2 Conditional Expectation . . . . . . . 5.3 Properties of Conditional Expectation 5.4 Branching Processes . . . . . . . . . 5.5 Random Walks . . . . . . . . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

6 Continuous Random Variables 6.1 Jointly Distributed Random Variables . . . . . . . . . . . . . . . . . 6.2 Transformation of Random Variables . . . . . . . . . . . . . . . . . . 6.3 Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . iii

iv 6.4 6.5

CONTENTS
Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . Multivariate normal distribution . . . . . . . . . . . . . . . . . . . . 67 71

Introduction
These notes are based on the course Probability given by Prof. F.P. Kelly in Cambridge in the Lent Term 1996. This typed version of the notes is totally unconnected with Prof. Kelly. Other sets of notes are available for different courses. At the time of typing these courses were: Probability Analysis Methods Fluid Dynamics 1 Geometry Foundations of QM Methods of Math. Phys Waves (etc.) General Relativity Combinatorics They may be downloaded from http://www.istari.ucam.org/maths/ or http://www.cam.ac.uk/CambUniv/Societies/archim/notes.htm or you can email soc-archim-notes@lists.cam.ac.uk to get a copy of the sets you require. Discrete Mathematics Further Analysis Quantum Mechanics Quadratic Mathematics Dynamics of D.E.s Electrodynamics Fluid Dynamics 2 Statistical Physics Dynamical Systems Bifurcations in Nonlinear Convection

Copyright (c) The Archimedeans, Cambridge University. All rights reserved.


Redistribution and use of these notes in electronic or printed form, with or without modication, are permitted provided that the following conditions are met: 1. Redistributions of the electronic les must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in printed form must reproduce the above copyright notice, this list of conditions and the following disclaimer. 3. All materials derived from these notes must display the following acknowledgement: This product includes notes developed by The Archimedeans, Cambridge University and their contributors. 4. Neither the name of The Archimedeans nor the names of their contributors may be used to endorse or promote products derived from these notes. 5. Neither these notes nor any derived products may be sold on a for-prot basis, although a fee may be required for the physical act of copying. 6. You must cause any edited versions to carry prominent notices stating that you edited them and the date of any change. THESE NOTES ARE PROVIDED BY THE ARCHIMEDEANS AND CONTRIBUTORS AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ARCHIMEDEANS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE NOTES, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Chapter 1

Basic Concepts
1.1 Sample Space
Suppose we have an experiment with a set of outcomes. Then is called the sample space. A potential outcome ! 2 is called a sample point. For instance, if the experiment is tossing coins, then = fH T g, or if the experiment was tossing two dice, then = f(i j ) : i j 2 f1 : : : 6gg. A subset A of is called an event. An event A occurs is when the experiment is performed, the outcome ! 2 satises ! 2 A. For the coin-tossing experiment, then the event of a head appearing is A = fH g and for the two dice, the event rolling a four would be A = f(1 3) (2 2) (3 1)g.

1.2 Classical Probability


If is nite, = f!1 : : : !n g, and each of the then the probability of event A occurring is
P(A) =

n sample points is equally likely

jAj j j

Example. Choose r digits from a table of random numbers. Find the probability that for 0 k 9, 1. no digit exceeds k , 2.

k is the greatest digit drawn. Ak = f(a1 : : : ar ) : 0 ai k i = 1 : : : rg :

Solution. The event that no digit exceeds k is

+1 . Now jAk j = (k + 1)r , so that P(Ak ) = k10 Let Bk be the event that k is the greatest digit drawn. Then Bk = Ak n Ak;1 . Also r ;kr Ak;1 Ak , so that jBk j = (k + 1)r ; kr . Thus P(Bk ) = (k+1) 10r

CHAPTER 1. BASIC CONCEPTS

The problem of the points


Players A and B play a series of games. The winner of a game wins a point. The two players are equally skillful and the stake will be won by the rst player to reach a target. They are forced to stop when A is within 2 points and B within 3 points. How should the stake be divided? Pascal suggested that the following continuations were equally likely AAAA AAAB AABA ABAA BAAA AABB ABBA ABAB BABA BAAB BBAA ABBB BABB BBAB BBBA BBBB

This makes the ratio 11 : 5. It was previously thought that the ratio should be 6 : 4 on considering termination, but these results are not equally likely.

1.3 Combinatorial Analysis


The fundamental rule is: Suppose r experiments are such that the rst may result in any of n 1 possible outcomes and such that for each of the possible outcomes of the rst i ; 1 experiments there are ni possible outcomes to experiment i. Let ai be the outcome of experiment i. r Then there are a total of i=1 ni distinct r-tuples (a1 : : : ar ) describing the possible outcomes of the r experiments.

Proof. Induction.

1.4 Stirlings Formula


For functions g (n) and h(n), we say that g is asymptotically equivalent to h and write g(n) ! 1 as n ! 1. g(n) h(n) if h (n) Theorem 1.1 (Stirlings Formula). As n ! 1,

log p
and thus n!

Pn Proof. log n! = 1 log k . Now Zn


1

We rst prove the weak form of Stirlings formula, that log(n!)

2 nnn e;n.

n! !0 2 nnn e;n n log n.

and

n log n ; n + 1 log n! (n + 1) log(n + 1) ; n: log n! Divide by n log n and let n ! 1 to sandwich n log n between terms that tend to 1. Therefore log n! n log n.

R z logx dx = z log z ; z + 1, and so


1 1

log xdx

n X

log k

Z n+1
1

log xdx

1.4. STIRLINGS FORMULA


Now we prove the strong form. Proof. For x > 0, we have

Now integrate from 0 to y to obtain Let hn


n

1 < 1 ; x + x2 : 1 ; x + x2 ; x3 < 1 + x

y ; y2 =2 + y3 =3 ; y4 =4 < log(1 + y) < y ; y2 =2 + y3 =3:


!e 1 = log nn n+1=2 . Then we obtain

so is convergent. Let the limit be A. We have obtained

1 1 1 1 12n2 ; 12n3 hn ; hn+1 12n2 + 6n3 : 1 . Thus h is a decreasing sequence, and 0 For n 2, 0 hn ; hn+1 P n n21 P n 1 h2 ;hn+1 ( h ; h ) . Therefore hn is bounded below, decreasing 2 r r +1 r=2 1 r

n! eAnn+1=2 e;n :
r;1 Ir;2 by integrating by parts. Therefore I 2n = r Now In is decreasing, so
We need a trick to nd A. Let Ir

=2 sinr

(2n)! (2n n!)2

. We obtain the recurrence Ir

=2 and I2n+1 =

(2n n!)2 (2n+1)! .

I2n+1 I2n I2n+1


2

I2n

I2n;1 = 1 + 1 ! 1: I2n+1 2n
2n + 1 2 ! 2 : n e2A e2A

But by substituting our formula in, we get that

Therefore e2A

=2

as required.

1 by

1 playing silly buggers with log 1 + n

CHAPTER 1. BASIC CONCEPTS

Chapter 2

The Axiomatic Approach


2.1 The Axioms
Let be a sample space. Then probability P is a real valued function dened on subsets of satisfying :1.

P(A)

1 for A

2. P(

) = 1,

3. for a nite or innite sequence A1 i P(Au ).

A2

of disjoint events, P(

Ai ) =

The number P(A) is called the probability of event A. We can look at some distributions here. Consider an arbitrary nite or countable = f!1 !2 : : : g and an arbitrary collection fp1 p2 : : : g of non-negative numbers with sum 1. If we dene
P(A) =

i:!i 2A

pi

it is easy to see that this function satises the axioms. The numbers p1 p2 : : : are called a probability distribution. If is nite with n elements, and if p1 = p2 = = 1 we recover the classical denition of probability. pn = n = f0 1 : : : g and attach to outcome r the Another example would be to let r probability pr = e; r! for some > 0. This is a distribution (as may be easily veried), and is called the Poisson distribution with parameter . Theorem 2.1 (Properties of P). A probability P satises 1. P(Ac ) = 1 ; P(A), 2. P(

) = 0,

3. if A 4. P(A

B then P(A)

P(B ),

B ) = P(A) + P(B ) ; P(A \ B ).


5

CHAPTER 2. THE AXIOMATIC APPROACH

Proof. Note that = A Ac , and A \ Ac = . Thus 1 = P( ) = P(A)+ P(Ac ). Now we can use this to obtain P( ) = 1 ; P( c ) = 0. If A B , write B = A (B \ Ac ), so that P(B ) = P(A) + P(B \ Ac ) P(A). Finally, write A B = A (B \ Ac ) and B = (B \ A) (B \ Ac ). Then P(A B ) = P(A) + P(B \ Ac ) and P(B ) = P(B \ A) + P(B \ Ac ), which gives the result. Theorem 2.2 (Booles Inequality). For any A1
P P

n
1

Ai Ai

! X n ! X 1
i i

A2
P(Ai ) P(Ai )

1
1

Proof. Let B1 = A1 and then inductively let Bi disjoint and i Bi = i Ai . Therefore

S = Ai n i1;1 Bk .

Thus the Bi s are

Ai = P
=

X X
i i

P(Bi ) P(Ai )

Bi

as Bi

Ai :

Theorem 2.3 (Inclusion-Exclusion Formula).


P

n
1

Ai =

Proof. We know that P(A1 A2 ) = P(A1 ) + P(A2 ) ; P(A1 \ A2 ). Thus the result is true for n = 2. We also have that
P(A1

S f1 ::: ng S 6=

0 1 \ (;1)jSj;1 P@ Aj A :
j 2S

An ) = P(A1
n i

An;1 ) + P(An ) ; P((A1

An;1 ) \ An ) :

But by distributivity, we have


P

Ai = P

n;1
1

Ai + P(An ) ; P

n;1
1

(Ai \ An ) :

Application of the inductive hypothesis yields the result. Corollary (Bonferroni Inequalities).

according as r is even or odd. Or in other words, if the inclusion-exclusion formula is truncated, the error has the sign of the omitted term and is smaller in absolute value. Note that the case r = 1 is Booles inequality.

S f1 ::: rg S 6=

0 1 \ (;1)jSj;1 P@ Aj A or P
j 2S

n
1

Ai

2.2. INDEPENDENCE

Proof. The result is true for n = 2. If true for n ; 1, then it is true for n and 1 r n ; 1 by the inductive step above, which expresses a n-union in terms of two n ; 1 unions. It is true for r = n by the inclusion-exclusion formula. Example (Derangements). After a dinner, the n guests take coats at random from a pile. Find the probability that at least one guest has the right coat. Solution. Let Ak be the event that guest k has his1 own coat. n We want P( i=1 Ai ). Now,

P(Ai1

\ Air ) =

(n ; r)! n!

by counting the number of ways of matching guests and coats after taken theirs. Thus

i 1 : : : ir have

i1 < <ir n

P(Ai1

\ Air ) =

n (n ; r)! = 1 r! r n!
+ (;1) n!
n;1

and the required probability is


P

which tends to 1 ; e;1 as n ! 1. Furthermore, let Pm(n) be the probability that exactly m guests take the right coat. Then P0 (n) ! e;1 and n! P0(n) is the number of derangements of n objects. Therefore
Pm (n) =

i=1

1+1+ Ai = 1 ; 2! 3!

n 1 P0(n ; m) (n ; m)! m n! ; 1 ; m) e = P0(n m! ! m! as n ! 1:

2.2 Independence
Denition 2.1. Two events A and B are said to be independent if
P(A \ B ) = P(A) P(B ) :

More generally, a collection of events Ai , i 2 I are independent if


P

\ ! Y
Ai =
i2J

Example. Two fair dice are thrown. Let A1 be the event that the rst die shows an odd number. Let A2 be the event that the second die shows an odd number and nally let A3 be the event that the sum of the two numbers is odd. Are A1 and A2 independent? Are A1 and A3 independent? Are A1 , A2 and A3 independent?
1 Im

for all nite subsets J

I.

i2J

P(Ai )

not being sexist, merely a lazy typist. Sex will be assigned at random...

CHAPTER 2. THE AXIOMATIC APPROACH

Solution. We rst calculate the probabilities of the events A1 , A2 , A3 , A1 \A2 , A1 \A3 and A1 \ A2 \ A3 . Event Probability
18 36 1 =2 1 2

A1 A2 A3 A1 \ A2 A1 \ A3 A1 \ A2 \ A3

As above,
6 3 36 3 3 36 3 3 36 1 =2 1 =4 1 =4

Thus by a series of multiplications, we can see that A1 and A2 are independent, A1 and A3 are independent (also A2 and A3 ), but that A1 , A2 and A3 are not independent. Now we wish to state what we mean by 2 independent experiments 2. Consider 1 = f 1 : : : g and 2 = f 1 : : : g with associated probability distributions fp1 : : : g and fq1 : : : g. Then, by 2 independent experiments, we mean the sample space 1 2 with probability distribution P(( i j )) = pi qj . Now, suppose A 1 and B 2 . The event A can be interpreted as an event in 1 2 , namely A 2 , and similarly for B . Then
P(A \ B ) =

X
i j

2A 2B

p i qj =

X X
i

2A

pi

2B

qj = P(A) P(B )

which is why they are called independent experiments. The obvious generalisation to n experiments can be made, but for an innite sequence of experiments we mean a sample space 1 : : : satisfying the appropriate formula 8n 2 N . 2 You might like to nd the probability that n independent tosses of a biased coin with the probability of heads p results in a total of r heads.

2.3 Distributions
The binomial distribution with parameters n and p, 0 i n;i probabilities pi = n i p (1 ; p) .

p 1 has = f0 : : : ng and
! 1, p ! 0 with np =

Theorem 2.4 (Poisson approximation to binomial). If n held xed, then

n pr (1 ; p)n;r ! e; r : r! r
2 or

more generally, n.

2.4. CONDITIONAL PROBABILITY


Proof.

n pr (1 ; p)n;r = n(n ; 1) : : : (n ; r + 1) pr (1 ; p)n;r r r! n ; 1 : : : n ; r + 1 (np)r (1 ; p)n;r =n n n n r! r n;i+1 r n ;r Y = 1 ; 1 ; n r! n n i=1


; r! e r = e; r! :

!1

Suppose an innite sequence of independent trials is to be performed. Each trial results in a success with probability p 2 (0 1) or a failure with probability 1 ; p. Such a sequence is called a sequence of Bernoulli trials. The probability that the rst success occurs after exactly r failures is pr = p(1 ; p)r . This is the geometric distribution with parameter p. Since 1 0 pr = 1, the probability that all trials result in failure is zero.

2.4 Conditional Probability


Denition 2.2. Provided P(B ) be

> 0, we dene the conditional probability of AjB 3 to


P(AjB ) = P(A \ B ) : P(B )

Whenever we write P(AjB ), we assume that P(B ) > 0. Theorem 2.5. 1. P(A \ B ) = P(AjB ) P(B ),

Note that if A and B are independent then P(AjB ) = P(A).

4. the function P(

A\B jC ) , 3. P(AjB \ C ) = P(P (B jC )

2. P(A \ B \ C ) = P(AjB \ C ) P(B jC ) P(C ),

jB ) restricted to subsets of B is a probability function on B .

Proof. Results 1 to 3 are immediate from the denition of conditional probability. For result 4, note that A \ B B , so P(A \ B ) P(B ) and thus P(AjB ) 1. P(B jB ) = 1 (obviously), so it just remains to show the last axiom. For disjoint Ai s,
P

(Ai \ B )) Ai B = P( iP (B ) i P P(A \ B) i = i P(B X ) = P(Ai jB ) as required.


i

3 read

A given B .

10

CHAPTER 2. THE AXIOMATIC APPROACH

Theorem 2.6 (Law of total probability). Let B1


P(A) =

X
i

B2 : : : be a partition of

. Then

P(AjBi ) P(Bi ) :

Proof.

P(AjBi ) P(Bi ) =

P(A \ Bi )

=P

= P(A)

A \ Bi

as required.

Solution. Let A be the event that he goes broke before reaching $m, and let H or T be the outcome of the rst toss. We condition on the rst toss to get P(A) = P(AjH ) P(H ) + P(AjT ) P(T ). But P(AjH ) = px+1 and P(AjT ) = px;1 . Thus we obtain the recurrence

Example (Gamblers Ruin). A fair coin is tossed repeatedly. At each toss a gambler wins $1 if a head shows and loses $1 if tails. He continues playing until his capital reaches m or he goes broke. Find px , the probability that he goes broke if his initial capital is $x.

px+1 ; px = px ; px;1 : x. Note that px is linear in x, with p0 = 1, pm = 0. Thus px = 1 ; m Theorem 2.7 (Bayes Formula). Let B1 B2 : : : be a partition of . Then P(AjBi ) P(Bi ) P(Bi jA) = P : j P(AjBj ) P(Bj )
Proof.
P(Bi jA) = P(A \ Bi ) P(A)

AjBi ) P(Bi ) = PP(P j (AjBj ) P(Bj )

by the law of total probability.

Chapter 3

Random Variables
Let be nite or countable, and let p!

= P(f!g) for ! 2

Denition 3.1. A random variable X is a function X

: 7! R.

Note that random variable is a somewhat inaccurate term, a random variable is neither random nor a variable. Example. If = f(i j ) 1 Y by X (i j ) = i + j and Y (i

i j tg, then we can dene random variables X and j ) = maxfi j g

Let RX be the image of under X . When the range is nite or countable then the random variable is said to be discrete. We write P(X = xi ) for !:X (!)=xi p! , and for B R

P(X

2 B) =

x2B

P(X

= x) :

Then

(P(X = x) x 2 RX )
is the distribution of the random variable X . Note that it is a probability distribution over RX .

3.1 Expectation
Denition 3.2. The expectation of a random variable X is the number
E

X] =

!2

pw X (!)

provided that this sum converges absolutely. 11

12 Note that
E

CHAPTER 3. RANDOM VARIABLES

X] =
= = =

X
!2

X X
x

pw X (!)

x2RX !:X (!)=x

X X

p! X (!) p!

x2RX !:X (!)=x x2RX

xP(X = x) :

Absolute convergence allows the sum to be taken in any order. If X is a positive random variable and if +1. If

!2

p! X (!) = 1 we write E X ] =

X
x2RX x 0

xP(X = x) = 1 and xP(X = x) = ;1

x2RX x<0
then E

X ] is undened.
= r) = e; r! , then E X ] =
r

Example. If P(X

Solution.

X] =

1 X
r=0

re; rr!
1 X

= e;

= e; e = ( r ; 1)! r=1

r;1

Example. If P(X

; pr (1 ; p)n;r then E X ] = np. = r) = n r

3.1. EXPECTATION
Solution.
E

13

X] =

rpr (1 ; p)n;r n r r=0 n X ! pr (1 ; p)n;r = r r!(nn; r)! r=0 n X ; 1)! = n (r ;(n pr (1 ; p)n;r 1)!( n ; r )! r=1 n X ; 1)! = np (r ;(n pr;1 (1 ; p)n;r 1)!( n ; r )! r=1 n ;1 n ; 1)! X pr (1 ; p)n;1;r = np (r( )!( n ; r )! r=1 n ;1 n ; 1 X r n;1;r = np r p (1 ; p) r=1 = np

n X

For any function f : R 7! R the composition of f and X denes a new random variable f and X denes the new random variable f (X ) given by

Example. dened by

f (X )(w) = f (X (w)): If a, b and c are constants, then a + bX and (X ; c)2 are random variables
(a + bX )(w) = a + bX (w) (X ; c)2 (w) = (X (w) ; c)2 :
and

Note that E Theorem 3.1. 1. If X 2. 3. 4.

X ] is a constant.

0 then E X ] 0. If X 0 and E X ] = 0 then P(X = 0) = 1. If a and b are constants then E a + bX ] = a + bE X ]. For any random variables X , Y then E X + Y ] = E X ] + E Y ].

5. E Proof.

X ] is the constant which minimises E (X ; c)2 1. X 0 means Xw 0 8 w 2


So E

X] =

!2

p! X (!) 0

2. If 9!

with p!

> 0 and X (!) > 0 then E X ] > 0, therefore P(X = 0) = 1.

14 3.
E

CHAPTER 3. RANDOM VARIABLES

a + bX ] =

X
!2 !2

=a

(a + bX (!)) p!

p! + b

= a + E X] :
4. Trivial. 5. Now
E

!2

p! X (!)

(X ; c)2 = E =E =E =E

(X ; E X ] + E X ] ; c)2 (X ; E X ])2 + 2(X ; E X ])(E X ] ; c) + (E X ] ; c)]2 ] (X ; E X ])2 + 2(E X ] ; c)E (X ; E X ])] + (E X ] ; c)2 (X ; E X ])2 + (E X ] ; c)2 :

This is clearly minimised when c = E Theorem 3.2. For any random variables X1
E

X ]. X2 :::: Xn
E

"X # X n n
i=1

Xi =

i=1

Xi ]

Proof.
E

"X # n
i=1

Xi = E
=E

"n ;1 X
i=1 "n ;1 X i=1

Xi + Xn

Xi + E X ]

Result follows by induction.

3.2 Variance
Var X = E X 2 ; E X ]2 = E X ; E X ]]2 = p Standard Deviation = Var X
Theorem 3.3. Properties of Variance (i) Var X 0 if Var X = 0, then P(X = E X ]) = 1 Proof - from property 1 of expectation (ii) If a b constants, Var (a + bX ) = b2 Var X for Random Variable X
2

3.2. VARIANCE
Proof.

15

Var a + bX = E a + bX ; a ; bE X ]] = b2 E X ; E X ]] = b2 Var X
(iii) Var X Proof.
E

= E X 2 ; E X ]2

X ; E X ]]2 = E X 2 ; 2X E X ] + (E X ])2 = E X 2 ; 2E X ] E X ] + E X ]2 = E X 2 ; (E X ])2


= r) = pqr with r = 0 1 2:::

Example. Let X have the geometric distribution P(X q and Var X = q2 . and p + q = 1. Then E X ] = p p Solution.

X E X] = rpqr = pq rqr;1 r=0 r=0 1 X 1 d (qr ) = pq d 1 = pq dq 1 ; q r=0 dq


1

1 X

q = pq(1 ; q);2 = p
1 X
r=0

X2 =

r2 p2 q2r
1 X
r=1

= pq

r(r + 1)qr;1 ;

1 X
r=1

rqr;1

2 ; 1 = 2q ; q = pq( (1 ; q)3 (1 ; q)2 p2 p 2 Var X = E X ; E X ]2 2 = 2q ; q ; q

p2 q =p 2

Denition 3.3. The co-variance of random variables X and Y is:

Cov(X Y ) = E (X ; E X ])(Y ; E Y ])] The correlation of X and Y is: Corr(X Y ) = pCov(X Y ) Var X Var Y

16 Linear Regression Theorem 3.4. Proof.

CHAPTER 3. RANDOM VARIABLES

Var (X + Y ) = Var X + Var Y + 2Cov(X Y )

Var (X + Y ) = E (X + Y )2 ; E X ] ; E Y ] 2 = E (X ; E X ])2 + (Y ; E Y ])2 + 2(X ; E X ])(Y ; E Y ]) = Var X + Var Y + 2Cov(X Y )

3.3 Indicator Function


Denition 3.4. The Indicator Function I

A] of an event A
0
if ! if !

is the function

I A](w) = 1
NB that I 1.
E E

2A 2 = A:

(3.1)

A] is a random variable

I A]] = P(A) X I A]] = p! I A](w)


= P(A)
!2

2. 3. 4.

I A c ] = 1 ; I A] I A \ B ] = I A]I B ] I A B ] = I A] + I B ] ; I A]I B ] I A B ](!) = 1 if ! 2 A or ! 2 B I A B ](!) = I A](!) + I B ](!) ; I A]I B ](!) WORKS!

Example. n couples are arranged randomly around a table such that males and females alternate. Let N = The number of husbands sitting next to their wives. Calculate

3.3. INDICATOR FUNCTION


the E

17

N ] and the Var N .

N=
E

n X i=1 "

I Ai ] I Ai ]

N] = E
= =

n X i=1
E

Ai =

event couple i are together

n X

i=1 n 2 X i=1

I Ai ]]

Thus E
E

2 n !2 3 X N2 = E4 I Ai ] 5 2 0i=1 13 2 n X X = E 4 @ I Ai ] + 2 I Ai ]I Aj ]A5
i=1 i j 2 = nE I Ai ] + n(n ; 1)E (I

2 =2 N] = nn

A1 ]I A2 ])] 2 E I Ai ]2 = E I Ai ]] = n (I A1 ]I A2 ])] = I E A1 \ B2 ]] = P(A1 \ A2 ) = P(A1 ) P(A2 jA1 ) 2 1 1 ; n;2 2 =n n;1n;1 n;1n;1 Var N = E N 2 ; E N ]2 2 (1 + 2(n ; 2)) ; 2 = n; 1 2( n ; 2) = n;1

18

CHAPTER 3. RANDOM VARIABLES

3.4 Inclusion - Exclusion Formula


N
1

"N #
1

Ai =

N !c \ c

Ai = I

" \ N !c # c
1

Ai
1

=1;I =1; =1; =


N X
1

"\ N #
1

Ai

Ac i

N Y N Y
1 1

I Ac i]
(1 ; I Ai ])

I Ai ] ;

i1 i2 I A1 ]I A2 ]

+ ::: + (;1)j+1
Take Expectation

i1 i2 ::: ij

I A1 ]I A2 ]:::I Aj ] + :::

"N #
1

Ai = P
=
N X
1

N
1

Ai

! X
i1 i2 P(A1 \ A2 )
P

P(Ai ) ;

+ ::: + (;1)j+1

i1 i2 ::: ij

;A

i1 \ Ai2 \ :::: \ Aij + :::

3.5 Independence
Denition 3.5. Discrete random variables for any x1 :::xn :
P(X1

X1 ::: Xn are independent if and only if


N Y
1

= x1 X2 = x2 :::::::Xn = xn ) =

P(Xi

= xi )
!
If R

Theorem 3.5 (Preservation of Independence). X1 ::: Xn are independent random variables and f1 f2 :::fn are functions R then f1 (X1 ):::fn (Xn ) are independent random variables

3.5. INDEPENDENCE
Proof.
P(f1 (X1 ) = y1

19

: : : fn (Xn ) = yn ) =
= =

X
x1 :f1 (X1 )=y1 ::: xn :fn (Xn )=yn N Y X N Y
1 1

P(X1

= x1 : : : Xn = xn ) = xi )

xi :fi (Xi )=yi

P(Xi

P(fi (Xi ) = yi )

Theorem 3.6. If X1 :::::Xn are independent random variables then:


E

"Y N # Y N
1

Xi =

NOTE that E

P X ] = P E X ] without requiring independence.


i i

Xi ]

Proof. Write Ri for RXi the range of Xi


E

"Y N #
1

Xi =
= =

X
1

x1 2R1 xn 2Rn N X Y N Y
1

::::

x1 ::xn P(X1 = x1 X2 = x2 ::::::: Xn = xn )


= xi )

xi 2Ri
E

P(Xi

Xi ]

Theorem 3.7. If X1 tion R ! R then:

::: Xn are independent random variables and f1 ::::fn are funcE

"Y N
1

fi (Xi ) =

# Y N
1

E fi (Xi )]

Proof. Obvious from last two theorems! Theorem 3.8. If X1

::: Xn are independent random variables then:


Var

! X n n X
i=1

Xi =

i=1

Var Xi

20 Proof.

CHAPTER 3. RANDOM VARIABLES

Var

! n X

2 n !23 " n #2 X 5 X Xi = E 4 Xi ; E Xi i=1 i=1 i=1 2 3 " n #2 X X X = E 4 Xi2 + Xi Xj 5 ; E Xi i i=1 i6=j X 2 X X 2 X


= = =

X
i X i

Xi +

Xi2 ; E Xi ]2

i6=j

Xi Xj ] ;

Xi ] ;

i6=j

Xi ] E Xj ]

Var Xi

Theorem 3.9. If X1 then

::: Xn are independent identically distributed random variables


n 1X 1 Var X Var n Xi = n i i=1

Proof.

n 1X 1 Var X Var n Xi = n i 2 i=1 n X = 1 Var X

1 Var X =n i

n2 i=1

Example. Experimental Design. Two rods of unknown lengths a b. A rule can measure the length but with but with error having 0 mean (unbiased) and variance 2 . Errors independent from measurement to measurement. To estimate a b we could take separate measurements A B of each rod.
E

A] = a E B] = b

Var A = Var B =

2 2

3.5. INDEPENDENCE
Can we do better? YEP! Measure a + b as X and a ; b as Y
E

21

X ] = a + b Var X = E Y]=a;b Var Y = X +Y =a E


2 Y 1 Var X + 2 =2 X ;Y =b E 2 X Y 1 Var ; 2 =2
2

2 2

So this is better. Example. Non standard dice. You choose 1 then I choose one. Around this cycle

a ! B P(A B ) = 2 3.

So the relation A better that B is not transitive.

22

CHAPTER 3. RANDOM VARIABLES

Chapter 4

Inequalities
4.1 Jensens Inequality
A function f

(a b) ! R is convex if

f (px + qy) pf (x) + (1 ; p)f (y) - 8x y 2 (a b) - 8p 2 (0 1)


Strictly convex if strict inequality holds when x 6= y

f is concave if ;f is convex. f is strictly concave if ;f is strictly convex 23

24 Concave

CHAPTER 4. INEQUALITIES

neither concave or convex. 00 We know that if f is twice differentiable and f (x) 00 convex and strictly convex if f (x) 0 forx 2 (a b). Example.

0 for x 2 (a b) the if f is

f (x) is strictly convex on (0 1)


Example.

1 f (x) = x 2
00

f (x) = ; log x 1 f (x) = ; x


0

f (x) = ;x log x f (x) = ;(1 + logx) 1 f (x ) = ; x 0


0 00

Strictly concave.

f (x = x3 is strictly convex on (0 1) but not on (;1 1) Theorem 4.1. Let f : (a b) ! R be a convex function. Then:
Example.

n X i=1

pi f (xi ) f

n X i=1

convex then equality holds if and only if all xs are equal.


E

x1 : : : Xn 2 (a b), p1 : : : pn 2 (0 1) and

Pp

pi xi

i = 1. Further more if f is strictly

f (X )] f (E X ])

4.1. JENSENS INEQUALITY


Proof. By induction on n n = 1 nothing to prove Assume results holds up to n-1. Consider x1 ::: xn

25

Pp

i=1

n = 2 denition of convexity. 2 (a b), p1 ::: pn 2 (0 1) and


n X i=2

For i = 2:::n set pi


0

pi = 1; p

such that

pi = 1
0

Then by the inductive hypothesis twice, rst for n-1, then for 2

n X
1

pi fi (xi ) = p1 f (x1 ) + (1 ; p1 )

n X i=2

p i f (x i )
0

p1 f (x1 ) + (1 ; p1 )f f p1 x1 + (1 ; p1 )
=f
f is strictly convex n are equal. But then

n X i=2
0

pi xi
0

n X i=2

n X i=1

pi xi

pi xi

3 and not all the x0i s equal then we assume not all of x2 :::xn
n X i=2

(1 ; pj )
So the inequality is strict.

pi f (xi ) (1 ; pj )f
0

n X i=2

pi xi
0

Corollary (AM/GM Inequality). Positive real numbers x1 1

n !n Y

Equality holds if and only if x 1 Proof. Let

= x2 =
P(X

i=1

xi

n i=1 i = xn

n 1X x

: : : xn

then f (x) = ; log x is a convex function on (0 So


E

1 = xi ) = n

1).

Therefore

n 1X log x
1

f (x)] f (E x]) (Jensens Inequality) ;E log x] log E x] 1]


1 n !n Y

; log
n 1X

n 1X x
1

i=1

xi

n i=1 xi

2]

For strictness since f strictly convex equation holds in [1] and hence [2] if and only if

x1 = x2 =

= xn

26

CHAPTER 4. INEQUALITIES

If f : (a b) ! R is a convex function then it can be shown that at each point y a linear function y + y x such that

2 (a b)9

f (x) y + y x f (y ) = y + y y

x 2 (a b)
0

If f is differentiable at y then the linear function is the tangent f (y ) + (x ; y )f

(y )

Let y

= E X ], = y and = y

X] So for any random variable X taking values in (a b)


E E

f (E X ]) = + f (X )]

+ X] = + E X] = f (E X ])
E

4.2 Cauchy-Schwarz Inequality


Theorem 4.2. For any random variables X
E

XY ]2

Y, X2

Y2

Proof. For a

b 2 R Let LetZ = aX ; bY Then0 E Z 2 = E (aX ; bY )2 = a2 E X 2 ; 2abE XY ] + b2 E Y 2

quadratic in a with at most one real root and therefore has discriminant

0.

4.3. MARKOVS INEQUALITY


Take b 6= 0
E

27

XY ]2

X2

Y2

Corollary.

jCorr(X Y )j

1
; E X ] and Y ; E Y ]

Proof. Apply Cauchy-Schwarz to the random variables X

4.3 Markovs Inequality


Theorem 4.3. If X is any random variable with nite mean then,
P(jX j

a)

jX j]

for any a

Proof. Let

A = jX j a Then jX j aI A]
Take expectation
E E

jX j] jX j]

aP(A) aP(jX j a)

4.4 Chebyshevs Inequality


Theorem 4.4. Let X be a random variable with E
P(jX j

X2 X2
2

1. Then 8

Proof.

I jX j

x2 8x
2

28 Then

CHAPTER 4. INEQUALITIES

I jX j
Take Expectation
P(jX j

x2
2

x2 = E X 2 2 2

Note 1. The result is distribution free - no assumption about the distribution of X (other than E X 2 1). 2. It is the best possible inequality, in the following sense

X=+
=;
Then P(jX j
E P(jX j

with probability with probability

22

) = c2 X2 = c

22 = 0 with probability 1 ; c2

c c

E X2 ) = c2 = 2

3. If

= E X ] then applying the inequality to X ;


P(X

gives
2

Var X

Often the most useful form.

4.5 Law of Large Numbers


Theorem 4.5 (Weak law of large numbers). Let X1 X2 ::::: be a sequences of independent identically distributed random variables with Variance 2 1 Let

Sn =
Then

n X i=1

Xi

n; 0, P S n

! 0 as n ! 1

4.5. LAW OF LARGE NUMBERS


Proof. By Chebyshevs Inequality
P

29

Sn ; n
=

E E

( Snn ; )2
2

Thus P

properties of expectation n2 2 Sn Since E S ] = n = Var n 2 n 2 2 But Var Sn = n Sn ; n 2 = 2 !0 n n2 2 n 2

(Sn ; n )2

Example. Then

A1 A2 ::: are independent events, each with probability p. Let Xi = I Ai ]. Sn = nA = number of times A occurs n n number of trials
= E I Ai ]] = P(Ai ) = p

Theorem states that


P

Sn ; p n

! 0 as n ! 1

Which recovers the intuitive denition of probability. Example. A Random Sample of size n is a sequence X1 X2 identically distributed random variables (n observations)

: : : Xn of independent

Xi X = i=1 n is called the SAMPLE MEAN Theorem states that provided the variance of Xi is nite, the probability that the sample mean differs from the mean of the distribution by more than approaches 0 as n ! 1.
We have shown the weak law of large numbers. Why weak? numbers.
P

Pn

9 a strong form of larger

Sn ! n Sn n

as n ! 1

=1

This is NOT the same as the weak form. What does this mean? ! 2 determines

n = 1 2 :::
or it doesnt. as n ! 1

as a sequence of real numbers. Hence it either tends to


P

(!) ! ! : Snn

=1

30

CHAPTER 4. INEQUALITIES

Chapter 5

Generating Functions
In this chapter, assume that X is a random variable taking values in the range 0 Let pr = P(X = r) r = 0 1 2 : : :

1 2 : : :.

Denition 5.1. The Probability Generating Function (p.g.f) of the random variable X,or of the distribution pr = 0 1 2 : : : , is

X X p(z ) = E z X = z r P(X = r) = pr z r
1 1
r=0 r=0

This p(z ) is a polynomial or a power series. If a power series then it is convergent for jz j 1 by comparison with a geometric series.

jp(z )j
Example.

X
r

pr jz jr

X
r

pr = 1

Theorem 5.1. The distribution of X is uniquely determined by the p.g.f p(z ). Proof. We know that we can differential p(z) term by term for jz j

6 p(z ) = E z X = 1 6 1 + z + :::z z 1 ; z6 =6 1;z

pr = 1 6 r = 1 ::: 6

p (z ) = p1 + 2p2z + : : : and so p (0) = p1 (p(0) = p0 )


0 0

Repeated differentiation gives

p(i) (z ) =
and has p(i)

1 r! X r;i (r ; i)! pr z
r=i

= 0 = i!pi Thus we can recover p0 p1 : : : from p(z)


31

32 Theorem 5.2 (Abels Lemma).


E

CHAPTER 5. GENERATING FUNCTIONS

0 X ] = rlim !1 p (z )

Proof.

p0 (z ) =
For z

1 X
r=i

rpr z r;1
1 X
r=i

jz j 1

2 (0 1), p0 (z ) is a non decreasing function of z and is bounded above by


E

X] =

rpr

Choose

0, N large enough that

N X r=i

rpr

X] ;
N X

Then

lim r !1
True 8

1 X

0 and so

rpr z r;1 rlim rpr z r;1 = rpr ! 1 r=i r=i r=i


E

N X

0 X ] = rlim !1 p (z )

Usually p0 (z ) is continuous at z=1, then E Recall p(z ) = Theorem 5.3.


E

X ] = p0 (1). z 1 ; z6 6 1;z

00 X (X ; 1)] = zlim !1 p (z ) 1 X
r=2

Proof.

p00 (z ) =
Proof now the same as Abels Lemma

r(r ; 1)pz r;2

Theorem 5.4. Suppose that X1 X2 : : : Xn are independent random variables with p.g.fs p1 (z ) p2 (z ) : : : pn (z ). Then the p.g.f of

X1 + X2 + : : : Xn
is

p1 (z )p2 (z ) : : : pn (z )

33 Proof.
E

z X1 +X2 +:::Xn = E z X1 :z X2 : : : :z Xn = E z X1 E z X2 : : : E z Xn = p1 (z )p2 (z ) : : : pn (z )


r

Example. Suppose X has Poisson Distribution


P(X

= r) = e; r!
E

r = 0 1 :::
r

Then

zX =

z r e; r! r=0 = e; e; z = e; (1;z)
00

1 X

Lets calculate the variance of X


0

p = e; (1;z)
0 0

p = 2 e; (1;z)

Then
E

X ] = zlim p (z ) = p (1)( Since p (z ) continuous at z = 1 )E X ] = !1


0

X (X ; 1)] = p (1) = 2 Var X = E X 2 ; E X ]2 = E X (X ; 1)] + E X ] ; E X ]2


00

= =

+ ;

Example. Suppose that Y has a Poisson Distribution with parameter . If X and Y are independent then:
E

z X +Y = E z X E z Y = e; (1;z) e; (1;z) = e;( + )(1;z)

But this is the p.g.f of a Poisson random variable with parameter (rst theorem of the p.g.f) this must be the distribution for X + Y Example. X has a binomial Distribution,
P(X E

. By uniqueness

r n;r = r) = n r p (1 ; p)

zX =

= (pz + 1 ; p)n

n n X r n;r r r p (1 ; p) z r=0

r = 0 1 :::

34 This shows that X = Y1 + Y2 + random variables each with


P(Yi

CHAPTER 5. GENERATING FUNCTIONS

+ Yn . Where Y1 + Y2 +
P(Yi

+ Yn are independent

= 1) = p

= 0) = 1 ; p

Note if the p.g.f factorizes look to see if the random variable can be written as a sum.

5.1 Combinatorial Applications


Tile a (2

n) bathroom with (2 1) tiles. How many ways can this be done? Say fn fn = fn;1 + fn;2 f0 = f1 = 1 F (z ) =
1 X
n=2

Let

1 X
n=0

fn z n
1 X

fn z n = fn;1z n + fn;2 z n fn z n =
1 X
n=2

fn;1z n +

F (z ) ; f0 ; zf1 = z (F (z ) ; f0 ) + z 2 F (z ) F (z )(1 ; z ; z 2 ) = f0 (1 ; z ) + zf1 = 1 ; z + z = 1:


Since f0 Let

n=0

fn;2 z n

= f1 = 1, then F (z ) = 1;z1;z2
p 1+ 5 = 1 2 p 1; 5 = 2 2

1 F (z ) = (1 ; z )(1 ; 2z) 1

n , that is fn , is The coefcient of z1

= (1 ; 1 z ) ; (1 ; 2 z ) 1 2 1 X nn 1 = ; 1 1z ; 1 2 n=0

1 X
n=0

nzn
2

fn =

1 ; 1

+1 +1 ( n ; n 1 2 )

5.2 Conditional Expectation


Let X and Y be random variables with joint distribution
P(X

= x Y = y)

5.2. CONDITIONAL EXPECTATION


Then the distribution of X is
P(X

35

= x) =

X
y2Ry

P(X

= x Y = y)

This is often called the Marginal distribution for X . The conditional distribution for X given by Y = y is
P(X

x Y = y) = xjY = y) = P(XP= (Y = y)

Denition 5.2. The conditional expectation of X given Y


E

X = xjY = y] =

= y is,

x2Rx

xP(X = xjY = y) X jY ] dened by

The conditional Expectation of X given Y is the random variable E


E

X jY ] (!) = E X jY = Y (!)]

X jY ] : ! R Example. Let X1 X2 : : : Xn be independent identically distributed random variables with P(X1 = 1) = p and P(X1 = 0) = 1 ; p. Let Y = X1 + X2 + + Xn
Thus E Then
P(X1
1 = 1 Y = r) = 1jY = r) = P(XP (Y = r)

+ Xn = r ; 1) 2+ = P(X1 = 1 XP (Y = r) + + Xn = r ; 1) = P(X1 ) P(X2P (Y = r) ; n ; 1 r ;1 p ;1 p (1 ; p)n;r = r; n pr (1 ; p)n;r r

r =n
Then

;n;1 1 ;; = r n
r

X1 jY = r] = 0 r =n 1 Y (! ) X1 jY = Y (!)] = n
E

P(X1

= 0jY = r) + 1

P(X1

= 1jY = r)

Therefore E

Note a random variable - a function of Y .

1Y X1 jY ] = n

36

CHAPTER 5. GENERATING FUNCTIONS

5.3 Properties of Conditional Expectation


Theorem 5.5.
E E

X jY ]] = E X ]
= y) E X jY = y]

Proof.
E E

X jY ]] =
= =

X
y2Ry

X
y y

P(Y

XX
x

P(Y

= y)

xP(X = xjY = y)

x2Rx

P(X

= xjY = y)

= E X]

Theorem 5.6. If X and Y are independent then


E

Proof.

X jY ] = E X ] If X and Y are independent then for any y 2 R y X X E X jY = y ] = xP(X = xjY = y) = xP(X = x) = E X ]


x2Rx x

Example. Let X1 independent of X1

X2 : : : be i.i.d.r.vs with p.g.f p(z ). Let N be a random variable X2 : : : with p.g.f h(z ). What is the p.g.f of: X1 + X2 + + XN
= =
1 X 1 X
E

z X1 + ::: Xn = E

z X1 + ::: Xn jN
P(N P(N

n=0 n=0

= n) E z X1 + ::: Xn jN = n = n) (p(z ))n

= h(p(z ))
Then for example
E

d h(p(z )) X1 + : : : Xn ] = dz z=1
0 0

= h (1)p (1) = E N ] E X1 ] 2 d 2 h(p(z )) and hence Exercise Calculate dz Var X1 + : : : Xn In terms of Var N and Var X1

5.4. BRANCHING PROCESSES

37

5.4 Branching Processes

ation of population. Assume. 1.

X0 X1 : : : sequence of random variables. Xn number of individuals in the nth generX0 = 1


2. Each individual lives for unit time then on death produces k offspring, probability fk . fk = 1

3. All offspring behave independently.

Xn+1 = Y1n + Y2n + + Ynn Where Yin are i.i.d.r.vs. Yin number of offspring of individual i in generation n.
Assume 1. 2.

Let F(z) be the probability generating function ofY in .

f0 0 f0 + f1 1

F (z ) =
Let

1 X

n=0

h i fk z k = E z Xi = E z Yin

Fn (z ) = E z Xn
Then F1 (z ) = F (z ) the probability generating function of the offspring distribution. Theorem 5.7.

Fn+1 (z ) = Fn (F (z )) = F (F (: : : (F (z )) : : : )) Fn (z ) is an n-fold iterative formula.

38 Proof.

CHAPTER 5. GENERATING FUNCTIONS

Fn+1 (z ) = E z Xn+1 = E E z Xn+1 jXn


= = = =
1 X 1 X 1 X
n=0 n=0 n=0
P(Xn P(Xn P(Xn P(Xn

= k) E z Xn+1 jXn = k
n n n = k) E z Y1 +Y2 + +Yn n n = k ) E z Y1 : : : E z Yn

h h

i i

1 X

n=0 = Fn (F (z ))

= k) (F (z ))k

Theorem 5.8. Mean and Variance of population size If m = and


2

1 X
k=0

kfk 1
(k ; m)2 fk
1

1 X

k=0

Mean and Variance of offspring distribution. Then E Xn ] = mn

Var Xn =
Proof. Prove by calculating F
0

2 mn;1 (mn ;1)

n
00

m;1

m 6= 1 m=1

(5.1)

(z ), F (z ) Alternatively

Xn ] = E E Xn jXn;1 ]] = E mjXn;1 ] = mE Xn;1 ] = mn by induction (Xn ; mXn;1 )2 = E E (Xn ; mXn;1 )2 jXn = E Var (Xn jXn;1 )] = E 2 Xn;1 = 2 mn;1
E
2 2 2 2 n;1 Xn ; 2mE XnXn;1 ] + m2 E Xn ;1 = m

Thus
E

5.4. BRANCHING PROCESSES


Now calculate

39

Xn Xn;1 ] = E E Xn Xn;1 jXn;1 ]] = E Xn;1 E Xn jXn;1 ]] = E Xn;1 mXn;1] 2 = mE Xn ;1 2 2 n;1 Then E Xn = m + m2 E Xn;1 ]2 2 Var Xn = E Xn ; E Xn ]2 2 2 2 n;1 2 = m2 E Xn ;1 + m ; m E Xn;1 ] = m2 Var Xn;1 + 2 mn;1 = m4 Var Xn;2 + 2 (mn;1 + mn ) = m2(n;1) Var X1 + 2 (mn;1 + mn + + m2n;3 ) = 2 mn;1 (1 + m + + mn )
E

To deal with extinction we need to be careful with limits as n ! 1. Let

An = Xn = 0
and let A =

= Extinction occurs by generation n


1
1

An

= the event that extinction ever occurs


Can we calculate P(A) from P(An )? More generally let An be an increasing sequence

A1 A2 : : :
and dene

A = nlim !1 An =
Dene Bn for n

1
1

An

B1 = A1 Bn = An \

n;1

i=1 c = An \ An;1

Ai

!c

40

CHAPTER 5. GENERATING FUNCTIONS

Bn for n 1 are disjoint events and


1
i=1 n

Ai = Ai =

1
i=1 n i=1

Bi Bi
1
i=1

i=1

i=1

Ai = P
=
1

Bi

1 X

P(Bi )

= nlim !1 = nlim !1 = nlim !1


Thus
P

n X n
1

P(Bi )

i=1 n i=1

Bi Ai

= nlim !1 P(An ) lim A = nlim n!1 n !1 P(An ) lim P(An ) = nlim !1 P(Xn = 0) Say =q
n!1

Probability is a continuous set function. Thus


P(extinction ever occurs) =

Note P(Xn

= 0), n = 1 2 3 : : : is an increasing sequence so limit exists. But


P(Xn

= 0) = Fn (0)

Fn is the p.g.f of Xn

So

q = nlim !1 Fn (0)
Also

F (q) = F nlim !1 Fn (0) = nlim !1 F (Fn (0)) = nlim !1 Fn+1 (0) Thus F (q ) = q

Since F is continuous

5.4. BRANCHING PROCESSES


q is called the Extinction Probability. Alternative Derivation

41

q=

X X
k

P(X1

= k) P(extinctionjX1 = k)

= P(X1 = k) qk = F (q)
Theorem 5.9. The probability of extinction, q , is the smallest positive root of the equation F (q ) = q . m is the mean of the offspring distribution. If m Proof.

1 then q = 1

while if m

1thenq 1
0

F (1) = 1 F (z ) =
00

m=

1 X
0

kfk = zlim !1 F (z )
0

1 X
j =z

j (j ; 1)z j;2 in 0 z 1 Since f0 + f1 1 Also F (0) = f0 0


1 then let

Thus if m

1, there does not exists a q 2 (0 1) with F (q) = q. If m

be the smallest positive root of F (z ) = z then

1. Further,

F (0) F ( ) = F (F (0)) F ( ) = Fn (0) 8n 1 q = nlim F !1 n (0) 0 q= Since q is a root of F (z ) = z

42

CHAPTER 5. GENERATING FUNCTIONS

5.5 Random Walks


Let X1

X2 : : : be i.i.d.r.vs. Let Sn = S0 + X1 + X2 +
+ Xn
Where, usually S0

=0

Then Sn

(n = 0 1 2 : : : is a 1 dimensional Random Walk.

We shall assume

Xn = 1

;1

with probability p with probability q

(5.2)

This is a simple random walk. If p = q

=1 2 then the random walk is called symmetric

Example (Gamblers Ruin). You have an initial fortune of A and I have an initial fortune of B . We toss coins repeatedly I win with probability p and you win with probability q . What is the probability that I bankrupt you before you bankrupt me?

5.5. RANDOM WALKS


Set a = A + B and z

43

= B Stop a random walk starting at z when it hits 0 or a.

Let pz be the probability that the random walk hits a before it hits 0, starting from z . Let qz be the probability that the random walk hits 0 before it hits a, starting from z . After the rst step the gamblers fortune is either z ; 1 or z + 1 with prob p and q respectively. From the law of total probability.

pz = qpz;1 + ppz+1 0 z a Also p0 = 0 and pa = 1. Must solve pt2 ; t + q = 0. p p 1 ; 4pq = 1 1 ; 2p = 1 or q t= 1 2 p 2p p General Solution for p 6= q is q z pz = A + B p A + B = 0A = 1 q a 1;
p
and so

pz =
If p = q , the general solution is A + Bz

q a 1; p

q z 1; p

To calculate qz , observe that this is the same problem with p respectively. Thus

z pz = a

q z replaced by p q a ; z

q z q a p ; p qz = q a if p 6= q ; 1 p

44 or

CHAPTER 5. GENERATING FUNCTIONS

Thus qz + pz

= 1 and so on, as we expected, the game ends with probability one. P(hits 0 before a) = qz q a q z p ; (p) if p 6= q qz = q a ; 1 p a ;z Or = if p = q

z qz = a ; z if p = q

What happens as a ! 1?
P( paths hit 0 ever) = P(hits 0 ever) =

1
a=z+1 a!1

path hits 0 before it hits a

lim P(hits 0 before a) = alim !1 qz

Let G be the ultimate gain or loss.

=1

q = p

p q p=q
(5.3)

G= a;z ;z
E

G] = apz ; z
0

with probability pz with probability qz if p 6= q if p = q

(5.4)

Fair game remains fair if the coin is fair then then games based on it have expected reward 0.

Duration of a Game Let Dz be the expected time until the random walk hits 0 or a, starting from z . Is Dz nite? Dz is bounded above by x the mean of geometric random variables (number of windows of size a before a window with all +1 0 s or ;10s). Hence Dz is nite. Consider the rst step. Then
E duration] = E E duration j rst step]]

Dz = 1 + pDz+1 + qDz;1

= p (E duration j rst step up]) + q (E duration j rst step down]) = p(1 + Dz+1 ) + q(1 + Dz;1 ) Equation holds for 0 z a with D0 = Da = 0. Lets try for a particular solution Dz = Cz Cz = Cp (z + 1) + Cq (z ; 1) + 1 1 C = q ; p for p 6= q

5.5. RANDOM WALKS


Consider the homogeneous relation

45

pt2 ; t + q = 0
General Solution for p 6= q is

t1 = 1

t2 = q p

q z+ z Dz = A + B p q=p Substitute z = 0 a to get A and B


q z 1 ; p z ; a Dz = q ; p q;p1; q a p

p 6= q

If p = q then a particular solution is ;z 2 . General solution

Dz ; z 2 + A + Bz
Substituting the boundary conditions given.,

Dz = z (a ; z )
Example. Initial Capital. p 0.5 0.45 0.45 q 0.5 0.55 0.55 z 90 9 90 a 100 10 100
P(ruin) 0.1 0.21 0.87

p=q
E gain] 0 -1.1 -77 E duration] 900 11 766

Stop the random walk when it hits 0 or a. We have absorption at 0 or a. Let

Uz n = P(r.w. hits 0 at time nstarts at z) Uz n+1 = pUz+1 n + qUz;1 n U0 n = Ua n = 0 n 0 Ua 0 = 1Uz 0 = 0 0 z a


Let Uz

0 z a

n 0

1 X

n=0

Uz nsn :
1 2:::

Now multiply by sn+1 and add for n = 0

Where U0 (s) = 1 and Ua (s) = 0 Look for a solution

Uz (s) = psUz+1(s) + qsUz;1 (s)

Ux(s) = ( (s))z (s)

46 Must satisfy

CHAPTER 5. GENERATING FUNCTIONS

(s) = ps (( (s))2 + qs
Two Roots,
1

(s)

(s) = 1

p1 ; 4pqs2
2ps

Every Solution of the form

Uz (s) = A(s) ( 1 (s))z + B (s) ( 2 (s))z Substitute U0 (s) = 1 and Ua (s) = 0.A(s) + B (s) = 1 and A(s) ( 1 (s))a + B (s) ( 2 (s))a = 0
a ( (s))z ; ( (s))z ( (s))a 2 1 2 Uz (s) = ( 1 (s)) ( a ; ( (s))a ( s )) 1 2 q 1 (s) 2 (s) = p recall quadratic q ( 1 (s))a;z ; ( 2 (s))a;z Uz (s) = p ( 1 (s))a ; ( 2 (s))a

But

Same method give generating function for absorption probabilities at the other barrier. Generating function for the duration of the game is the sum of these two generating functions.

Chapter 6

Continuous Random Variables


In this chapter we drop the assumption that given a probability p on some subset of . For example, spin a pointer, and let ! 2 = ! : 0 ! 2 . Let
P(!

id nite or countable. Assume we are give the position at which it stops, with

2 0 ]) =

Denition 6.1. A continuous random variable X is a function X


P(a

(0

2 ) : ! R for which

X (!) b) =

Zb
a

f (x)dx

Where f (x) is a function satisfying 1. 2.

f (x) 0 R +1 f (x)dx = 1 ;1

The function f is called the Probability Density Function. For example, if X (! ) random variable with p.d.f

= ! given position of the pointer then x is a continuous

f (x) =

(1
0
2

(0 x 2 )
otherwise

(6.1)

This is an example of a uniformly distributed random variable. On the interval

02 ]

47

48

CHAPTER 6. CONTINUOUS RANDOM VARIABLES

in this case. Intuition about probability density functions is based on the approximate relation.
P(X

2 x x + x x]) =

Z x+x x
x

f (z )dz

Proofs however more often use the distribution function

F (x) = P(X x) F (x) is increasing in x.

If X is a continuous random variable then

F (x) =
0

Zx

and so F is continuous and differentiable.

;1

f (z )dz

F (x) = f (x)
(At any point x where then fundamental theorem of calculus applies). The distribution function is also dened for a discrete random variable,

F (x) =
and so F is a step function.

!:X (!) x

p!

In either case
P(a

X b) = P(X b) ; P(X a) = F (b) ; F (a)

49 Example. The exponential distribution. Let

;x 0 x 1 F (x) = 1 ; e 0 x 0
The corresponding pdf is

(6.2)

f (x) = e; x

0 x 1

this is known as the exponential distribution with parameter . If X has this distribution then
P(X

X x + z) x + z jX z ) = P(P (X z ) ; (x+z ) = e e; z = e; x = P(X x)

This is known as the memoryless property of the exponential distribution. Theorem 6.1. If X is a continuous random variable with pdf f (x) and h(x) is a continuous strictly increasing function with h;1 (x) differentiable then h(x) is a continuous random variable with pdf

; d h;1 (x) fh (x) = f h;1 (x) dx

Proof. The distribution function of h(X ) is


P(h(X )

; ; x) = P X h;1 (x) = F h;1 (x)


d dx P(h(X ) x)

Since h is strictly increasing and F is the distribution function of X Then.

is a continuous random variable with pdf as claimed f h . Note usually need to repeat proof than remember the result.

50 Example. Suppose X

CHAPTER 6. CONTINUOUS RANDOM VARIABLES

Y = ; log x

U 0 1] that is it is uniformly distributed on 0 1] Consider


P(Y

y) = P(; log X y) ; = P X e;Y


=

Z1
e

= 1 ; e;Y
Thus Y is exponentially distributed. More generally Theorem 6.2. Let U U 0 1]. For any continuous distribution function F, the random variable X dened by X = F ;1 (u) has distribution function F . Proof.
P(X

1dx

; x) = P F ;1 (u) x = P(U F (x)) = F (x) U 0 1]

Remark 1. a bit more messy for discrete random variables


P(X

= Xi ) = pi

i = 0 1 :::
j X i=0

Let

X = xj if
2. useful for simulations

j ;1 X i=0

pi U

pi

U U 0 1]

6.1 Jointly Distributed Random Variables


For two random variables X and Y the joint distribution function is

F (x y) = P(X x Y
Let

y)

F : R2 ! 0 1]

FX (x) = P(Xz x) = P(X x Y 1) = F ( x 1) = ylim !1 F (x y)

6.1. JOINTLY DISTRIBUTED RANDOM VARIABLES


This is called the marginal distribution of X. Similarly

51

FY (x) = F (1 y) X1 X2 : : : Xn are jointly distributed continuous random variables if for a set c 2 R b


P((X1

X2 : : : Xn ) 2 c) =

ZZ

(x1

::: xn )2c

f (x1 : : : xn )dx1 : : : dxn

For some function f called the joint probability density function satisfying the obvious conditions. 1.

f (x1 : : : xn )dx1 0
2.

ZZ
(n = 2)

Z
Rn

f (x1 : : : xn )dx1 : : : dxn = 1

Example.

F (x y) = P(X x Y y) Zx Zy = f (u v)dudv @ 2 F (x y) and so f (x y ) = @x@y provided dened at (x y ). If X and y are jointly continuous random
;1 ;1

Theorem 6.3. variables then they are individually continuous.


P(X

Proof. Since X and Y are jointly continuous random variables

2 A) = P(X 2 A Y 2 (;1 +1))

=
where fX (x) = is the pdf of X .

Z Z1

Z1
;1

A ;1 = fA fX (x)dx

f (x y)dxdy

f (x y)dy

Jointly continuous random variables X and Y are Independent if Then P(X

f (x y) = fX (x)fY (y) 2 A Y 2 B ) = P(X 2 A) P(Y 2 B )


n Y

Similarly jointly continuous random variables X 1

: : : Xn are independent if fXi (xi )

f (x1 : : : xn ) =

i=1

52

CHAPTER 6. CONTINUOUS RANDOM VARIABLES

Where fXi (xi ) are the pdfs of the individual random variables.

Example. Two points X and Y are tossed at random and independently onto a line segment of length L. What is the probability that:

jX ; Y j

l?

Suppose that at random means uniformly so that

1 f (x y) = L 2

x y 2 0 L]2

6.1. JOINTLY DISTRIBUTED RANDOM VARIABLES


Desired probability

53

ZZ
A

f (x y)dxdy

of A = area L2 2 L2 ; 2 1 2 (L ; l ) = L2 2 = 2Ll ; l

L2

Example (Buffons Needle Problem). A needle of length l is tossed at random onto a oor marked with parallel lines a distance L apart l L. What is the probability that the needle intersects one of the parallel lines.

Let 2 0 2 ] be the angle between the needle and the parallel lines and let x be the distance from the bottom of the needle to the line closest to it. It is reasonable to suppose that X is distributed Uniformly.

X U 0 L]
and X and are independent. Thus

U0 )

f (x ) = l1 0 x L and 0

54

CHAPTER 6. CONTINUOUS RANDOM VARIABLES


The needle intersects the line if and only if X

ZZ Z

sin

The event A

A
0

f (x )dxd
sin d L

=l

Denition 6.2. The expectation or mean of a continuous random variable X is


E

l = 2L

provided not both of

R 1 xf (x)dx and R 0 xf (x)dx are innite ;1 ;1


f (x) = p 1 e
2
;

X] =

Z1

;1

xf (x)dx

Example (Normal Distribution). Let

(x; )2 2 2

;1

x 1

This is non-negative for it to be a pdf we also need to check that

Z1

Make the substitution z

= x;

;1
. Then

f (x)dx = 1 I = p1 2 1 =p 2 dy

Z1

Z ;1 Z1

(x; )2 2 2 dx
;

Thus I 2

= 21

Z1
;1

x2

dx

Z1
;1

y2

;1

z2

dz

Z1Z1 1 e =2 ;1 ;1 Z2 Z1 = 21 re 0 0
=

(y2 +x2 ) 2 dxdy 2 2

Z2
0

drd

d =1

Therefore I = 1. A random variable with the pdf f(x) given above has a Normal distribution with parameters and 2 we write this as

X N
The Expectation is
E

X] = p 1 2 = p1 2

Z1 Z;1 1
;1

xe

(x; )2 2 2 dx
;

(x ; )e

(x; )2 2 2 dx +

p1 2

Z1
;1

(x; )2 2 2 dx:

6.1. JOINTLY DISTRIBUTED RANDOM VARIABLES


The rst term is convergent and equals zero by symmetry, so that
E

55

X] = 0 +
=

Theorem 6.4. If X is a continuous random variable then,


E

X] =

Z1
0

P(X

x) dx ; x) dx =
= =

Z1
0

P(X

;x) dx

Proof.

Z1
0

P(X

Z1 Z1
x Z0 1 Z 1

f (y)dy dx

Z 0 1 Z0 y Z0 1 Z0
0 0

I y x]f (y)dydx

dxf (y)dy

Similarly result follows.

Z1
0

=
P(X

yf (y)dy yf (y)dy

;x) dx =

;1

Note This holds for discrete random variables and is useful as a general way of nding the expectation whether the random variable is discrete or continuous. If X takes values in the set 0 1 : : : ] Theorem states
E

X] =

1 X

n=0

P(X

n)

and a direct proof follows

1 X
n=0

P(X

n) =
= =

1 X 1 X
n=0 m=0

I m n]P(X = m) I m n]

1 X 1 X 1 X

m=0 n=0 m=0

P(X

= m)

mP(X = m)

Theorem 6.5. Let X be a continuous random variable with pdf f (x) and let h(x) be a continuous real-valued function. Then provided

Z1
;1

jh(x)j f (x)dx
E

h(x)] =

Z1
;1

h(x)f (x)dx

56 Proof.

CHAPTER 6. CONTINUOUS RANDOM VARIABLES

Z1
0

P(h(X )

y) dy
= = =
0

Z 1 "Z
x:h(x)
0 0

Z 1Z Z Z
x:h(x)

f (x)dx dy

x:h(x) 0 "Z h(x)


0 0

I h(x) y]f (x)dxdy


0

dy f (x)dx

Similarly

Z1
0

=
P(h(X )

x:h(x)

;y) = ;

h(x)f (x)dy
0

x:h(x)

h(x)f (x)dy

So the result follows from the last theorem. Denition 6.3. The variance of a continuous random variable X is

Var X = E (X ; E X ])2
Note The properties of expectation and variance are the same for discrete and continuous random variables just replace with in the proofs. Example.

Var X = E X 2 ; E X ]2 =
Example. Suppose X
P(Z

Z1

;1
2

x2 f (x)dx ;
] Let z = X ;

Z1
;1
then

xf (x)dx

Let

p 1 e 2 2 dx 2 Z;1 z 1 u2 x ; p e 2 du u= = ;1 2 = (z ) The distribution function of a N (0 1) random variable Z N (0 1)

z z) = P X ; = P(X + z)
=

(x; )2

6.2. TRANSFORMATION OF RANDOM VARIABLES


What is the variance of Z ?

57

Var X = E Z 2 ; E Z ]2 Last term is zero Z 1 z2 1 =p z 2e 2 dz 2 ;1 Z 1 z2 1 z2 + e 2 dz = ; p1 ze 2 2 ;1 ;1 =0+1=1 Var X = 1


; ; ;

Variance of X ?

X= + z Thus E X ] = we know that already Var X = 2 Var Z Var X = 2 X ( 2)

6.2 Transformation of Random Variables


Suppose X1

X2 : : : Xn have joint pdf f (x1 : : : xn ) let Y1 = r1 (X1 X2 : : : Xn ) Y2 = r2 (X1 X2 : : : Xn ) Yn = rn (X1 X2 : : : Xn )


. . .

Let R 2 Rn be such that


P((X1

X2 : : : Xn ) 2 R) = 1

Let S be the image of R under the above transformation suppose the transformation from R to S is 1-1 (bijective).

58 Then 9 inverse functions

CHAPTER 6. CONTINUOUS RANDOM VARIABLES

x1 = s1 (y1 y2 : : : yn ) x2 = s2 (y1 y2 : : : yn ) : : : xn = sn (y1 y2 : : : yn )


@si exists and is continuous at every point (y1 Assume that @y j @s1 @y1 @sn @y1
. . .

y2 : : : yn ) in S

J=
If A

@s1 : : : @y n
..

:::

@sn @yn

. . .

(6.3)

R
P((X1

: : : Xn ) 2 A) 1] =
=

Z Z
A B

Z Z

f (x1 : : : xn )dx1 : : : dxn f (s1 : : : sn ) jJ j dy1 : : : dyn

Where B is the image of A

= P((Y1 : : : Yn ) 2 B ) 2]
Since transformation is 1-1 then [1],[2] are the same Thus the density for Y1

: : : Yn is

g((y1 y2 : : : yn ) = f (s1 (y1 y2 : : : yn ) : : : sn (y1 y2 : : : yn)) jJ j y1 y2 : : : yn 2 S


= 0 otherwise.
Example (density of products and quotients). Suppose that (X

f (x y) = 4xy
0
Let U

Y ) has density
(6.4)

for 0 x Otherwise.

10 y 1

=X Y

and V

= XY

6.2. TRANSFORMATION OF RANDOM VARIABLES

59

X = UV x = uv r @x = 1 v @u 2 u 2 @y = ;1 v 1 @u 2 u 3 2
Therefore jJ j = 21 u and so

Y= V rU v y= u r @x = 1 u @v 2 v 1 @y = p @v 2 uv :

g(u v) = 21u (4xy)

p v = 21 u 4 uv u = 2u v if (u v) 2 D = 0 Otherwise:

Note U and V are NOT independent

g(u v) = 2 u v I (u v ) 2 D ] A be the n n

not product of the two identities. When the transformations are linear things are simpler still. Let invertible matrix.

0Y 1 0 X 1 1 1 B C B . . . . = A A: @.A @ .C
Yn Xn
jJ j = det A;1 = det A;1
Then the pdf of (Y1

: : : Yn ) is

Example. Suppose X1 X2 have the pdf f (x1 x2 ). Calculate the pdf of X1 + X2 . Let Y = X1 + X2 and Z = X2 . Then X1 = Y ; Z and X2 = Z .

1 f (A;1 g) g(y1 : : : n ) = det A

;1 A;1 = 1 0 1

(6.5)

1 det A;1 = 1 det A

60 Then

CHAPTER 6. CONTINUOUS RANDOM VARIABLES

g(y z ) = f (x1 x2 ) = f (y ; z z ) joint distributions of Y and X .


Marginal density of Y is

g(y) =
or g (y ) =

Z1
;1

Z;1 1

f (y ; z z )dz

;1

y 1

f (z y ; z )dz By change of variable

If X1 and X2 are independent, with pgfs f1 and f2 then

f (x1 x2 ) = f (x1 )f (x2 ) Z1 and then g (y ) = f (y ; z )f (z )dz


;1
- the convolution of f1 and f2

x ^ is a median if

For the pdf f(x) x ^ is a mode if f (^ x)

f (x)8x

Zx ^
;1

f (x)dx ;

Z1
x ^

f (x)dx = 1 2

For a discrete random variable, x ^ is a median if


P(X

If X1

: : : Xn is a sample from the distribution then recall that the sample mean is n
n 1X X
1

1 or P(X x 1 x ^) 2 ^) 2
i

Let Y1 : : : Yn (the statistics) be the values of X1 : : : Xn arranged in increasing order. Then the sample median is Y n+1 if n is odd or any value in 2

Yn 2 Y n+1 2

if n is even

If $Y_n = \max\{X_1, \dots, X_n\}$ and $X_1, \dots, X_n$ are iidrvs with distribution function $F$ and density $f$, then
\[ P(Y_n \le y) = P(X_1 \le y, \dots, X_n \le y) = (F(y))^n. \]
Thus the density of $Y_n$ is
\[ g(y) = \frac{d}{dy} (F(y))^n = n (F(y))^{n-1} f(y). \]
Similarly $Y_1 = \min\{X_1, \dots, X_n\}$ has
\[ P(Y_1 \le y) = 1 - P(X_1 \ge y, \dots, X_n \ge y) = 1 - (1 - F(y))^n, \]
and so the density of $Y_1$ is $n (1 - F(y))^{n-1} f(y)$.

What about the joint density of $Y_1, Y_n$?
\[
\begin{aligned}
G(y_1, y_n) &= P(Y_1 \le y_1, Y_n \le y_n) \\
&= P(Y_n \le y_n) - P(Y_n \le y_n, Y_1 > y_1) \\
&= P(Y_n \le y_n) - P(y_1 < X_1 \le y_n,\ y_1 < X_2 \le y_n,\ \dots,\ y_1 < X_n \le y_n) \\
&= (F(y_n))^n - (F(y_n) - F(y_1))^n.
\end{aligned}
\]
Thus the pdf of $Y_1, Y_n$ is
\[ g(y_1, y_n) = \frac{\partial^2}{\partial y_1 \partial y_n} G(y_1, y_n) = n(n-1) (F(y_n) - F(y_1))^{n-2} f(y_1) f(y_n), \quad -\infty < y_1 \le y_n < \infty, \]
and $0$ otherwise.

What happens if the mapping is not 1-1? For example, if $X$ has density $f(x)$, what is the density $g(x)$ of $|X|$? For $0 \le a < b$,
\[ P(|X| \in (a, b)) = \int_a^b (f(x) + f(-x))\, dx, \qquad \text{so} \qquad g(x) = f(x) + f(-x), \quad x \ge 0. \]

Suppose $X_1, \dots, X_n$ are iidrvs. What is the pdf of $Y_1, \dots, Y_n$, the order statistics?
\[ g(y_1, \dots, y_n) = \begin{cases} n!\, f(y_1) \cdots f(y_n) & y_1 \le y_2 \le \dots \le y_n \\ 0 & \text{otherwise.} \end{cases} \tag{6.6} \]
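(Not part of the original notes.) A quick check, assuming numpy, of the formulas for the minimum and maximum; for exponentials with parameter $\lambda$, $F(y) = 1 - e^{-\lambda y}$, so $P(Y_1 \le y) = 1 - e^{-n\lambda y}$, i.e. the minimum is itself exponential with parameter $n\lambda$. The parameters below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    n, lam, reps = 5, 2.0, 200_000

    # Each row is one sample X1, ..., Xn of exponentials with parameter lam.
    X = rng.exponential(scale=1 / lam, size=(reps, n))
    Y1 = X.min(axis=1)
    Yn = X.max(axis=1)

    y = 0.3
    F = 1 - np.exp(-lam * y)
    print((Y1 <= y).mean(), 1 - (1 - F) ** n)   # both ~0.95
    print((Yn <= y).mean(), F ** n)             # both ~0.019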

Example. Suppose $X_1, \dots, X_n$ are iidrvs, exponentially distributed with parameter $\lambda$. Let
\[ Z_1 = Y_1, \quad Z_2 = Y_2 - Y_1, \quad \dots, \quad Z_n = Y_n - Y_{n-1}, \]
where $Y_1, \dots, Y_n$ are the order statistics of $X_1, \dots, X_n$. What is the distribution of the $Z$'s? We have $Z = AY$, where
\[ A = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & -1 & 1 \end{pmatrix}, \qquad \det(A) = 1. \tag{6.7} \]
Hence
\[
\begin{aligned}
h(z_1, \dots, z_n) &= g(y_1, \dots, y_n) = n!\, f(y_1) \cdots f(y_n) \\
&= n!\, \lambda^n e^{-\lambda y_1} \cdots e^{-\lambda y_n} = n!\, \lambda^n e^{-\lambda(y_1 + \cdots + y_n)} \\
&= n!\, \lambda^n e^{-\lambda(n z_1 + (n-1) z_2 + \cdots + z_n)} \\
&= \prod_{i=1}^{n} (i\lambda)\, e^{-i\lambda z_{n+1-i}}.
\end{aligned}
\]
Thus $h(z_1, \dots, z_n)$ is expressed as the product of $n$ density functions, with $Z_{n+1-i} \sim \exp(i\lambda)$ (exponentially distributed with parameter $i\lambda$) and $Z_1, \dots, Z_n$ independent.
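(Not part of the original notes.) A small simulation, assuming numpy, of the spacings: the claim gives $E[Z_k] = 1/((n + 1 - k)\lambda)$ and independence of the spacings, which shows up here as near-zero correlation. The parameters and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    n, lam, reps = 4, 1.5, 100_000

    X = rng.exponential(scale=1 / lam, size=(reps, n))
    Y = np.sort(X, axis=1)                  # order statistics Y1 <= ... <= Yn
    Z = np.diff(Y, axis=1, prepend=0.0)     # Z1 = Y1, Zk = Yk - Y(k-1)

    # Z_{n+1-i} ~ exp(i * lam), so E[Z_k] = 1 / ((n + 1 - k) * lam).
    for k in range(1, n + 1):
        print(k, Z[:, k - 1].mean(), 1 / ((n + 1 - k) * lam))

    # Independence: correlation between the first and last spacings is ~0.
    print(np.corrcoef(Z[:, 0], Z[:, n - 1])[0, 1])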

Example. Let $X$ and $Y$ be independent $N(0,1)$ random variables. Let $D = R^2 = X^2 + Y^2$ and $\tan\Theta = Y/X$, so that
\[ d = x^2 + y^2, \qquad \theta = \arctan\frac{y}{x}. \]
The Jacobian of the forward transformation is
\[ \det \begin{pmatrix} \frac{\partial d}{\partial x} & \frac{\partial d}{\partial y} \\[4pt] \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} \end{pmatrix} = \det \begin{pmatrix} 2x & 2y \\[4pt] \dfrac{-y/x^2}{1 + (y/x)^2} & \dfrac{1/x}{1 + (y/x)^2} \end{pmatrix} = 2, \tag{6.8} \]
so for the inverse transformation $|J| = \frac{1}{2}$. Now
\[ f(x, y) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}, \]
and thus
\[ g(d, \theta) = \frac{1}{4\pi} e^{-d/2}, \qquad 0 \le d < \infty,\ 0 \le \theta \le 2\pi. \]
But this is just the product of the densities
\[ g_D(d) = \frac{1}{2} e^{-d/2}, \quad 0 \le d < \infty, \qquad g_\Theta(\theta) = \frac{1}{2\pi}, \quad 0 \le \theta \le 2\pi, \]
so $D$ and $\Theta$ are independent, with $D$ exponentially distributed with mean 2 and $\Theta \sim U[0, 2\pi]$.

Note this is useful for the simulation of normal random variables. We know we can simulate a random variable with distribution function $F$ by $X = F^{-1}(U)$ with $U \sim U[0,1]$, but this is difficult for an $N(0,1)$ random variable since
\[ F(x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz \]
is difficult to invert. Instead let $U_1$ and $U_2$ be independent $U[0,1]$. Let $R^2 = -2\log U_1$, so that $R^2$ is exponential with mean 2, and let $\Theta = 2\pi U_2$, so that $\Theta \sim U[0, 2\pi]$. Then
\[ X = R\cos\Theta = \sqrt{-2\log U_1}\, \cos(2\pi U_2), \qquad Y = R\sin\Theta = \sqrt{-2\log U_1}\, \sin(2\pi U_2) \]
are independent $N(0,1)$ random variables.
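(Not part of the original notes.) A minimal sketch, assuming numpy, of this simulation recipe (the Box-Muller method), checking that the output has approximately standard normal mean, variance and distribution function at 1, and that the two coordinates are uncorrelated.

    import math
    import numpy as np

    rng = np.random.default_rng(3)
    N = 1_000_000

    U1 = rng.uniform(size=N)
    U2 = rng.uniform(size=N)

    R = np.sqrt(-2 * np.log(U1))     # R^2 is exponential with mean 2
    Theta = 2 * np.pi * U2           # Theta is uniform on [0, 2*pi]

    X = R * np.cos(Theta)
    Y = R * np.sin(Theta)

    Phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))     # standard normal cdf at 1
    print(X.mean(), X.var())                           # ~0 and ~1
    print(np.corrcoef(X, Y)[0, 1])                     # ~0
    print((X <= 1).mean(), Phi_1)                      # both ~0.8413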

Example (Bertrand's Paradox). Calculate the probability that a random chord of a circle of radius 1 has length greater than $\sqrt{3}$, the length of the side of an inscribed equilateral triangle. There are at least 3 interpretations of a random chord.

(1) The ends are independently and uniformly distributed over the circumference. Then
\[ \text{answer} = \frac{1}{3}. \]

(2) The chord is perpendicular to a given diameter and the point of intersection is uniformly distributed over the diameter. The chord is longer than $\sqrt{3}$ exactly when its distance $a$ from the centre satisfies $a^2 + \left( \frac{\sqrt{3}}{2} \right)^2 \le 1$, i.e. $|a| \le \frac{1}{2}$, so
\[ \text{answer} = \frac{1}{2}. \]

(3) The foot of the perpendicular to the chord from the centre of the circle is uniformly distributed over the interior of the circle. The chord is longer than $\sqrt{3}$ exactly when the foot lies in the interior circle of radius $\frac{1}{2}$, so
\[ \text{answer} = \frac{\pi \left( \frac{1}{2} \right)^2}{\pi \cdot 1^2} = \frac{1}{4}. \]
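(Not part of the original notes.) A simulation sketch, assuming numpy, of the three interpretations; it reproduces the three different answers 1/3, 1/2 and 1/4.

    import numpy as np

    rng = np.random.default_rng(4)
    N = 500_000
    root3 = np.sqrt(3.0)

    # (1) Two endpoints uniform on the circumference of the unit circle.
    a, b = rng.uniform(0, 2 * np.pi, size=(2, N))
    chord1 = 2 * np.abs(np.sin((a - b) / 2))

    # (2) Chord perpendicular to a fixed diameter, intersection uniform on it.
    d = rng.uniform(-1, 1, size=N)
    chord2 = 2 * np.sqrt(1 - d ** 2)

    # (3) Foot of the perpendicular uniform over the interior of the circle.
    r = np.sqrt(rng.uniform(size=N))        # radius of a uniform point in the disc
    chord3 = 2 * np.sqrt(1 - r ** 2)

    for chord in (chord1, chord2, chord3):
        print((chord > root3).mean())       # ~1/3, ~1/2, ~1/4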

6.3 Moment Generating Functions


If $X$ is a continuous random variable then the analogue of the pgf is the moment generating function, defined by
\[ m(\theta) = E\left[ e^{\theta X} \right] \]
for those $\theta$ such that $m(\theta)$ is finite. Thus
\[ m(\theta) = \int_{-\infty}^{\infty} e^{\theta x} f(x)\, dx, \]
where $f(x)$ is the pdf of $X$.

Theorem 6.6. The moment generating function determines the distribution of $X$, provided $m(\theta)$ is finite for some interval containing the origin.

Proof. Not proved.

Theorem 6.7. If $X$ and $Y$ are independent random variables with moment generating functions $m_X(\theta)$ and $m_Y(\theta)$, then $X + Y$ has the moment generating function
\[ m_{X+Y}(\theta) = m_X(\theta)\, m_Y(\theta). \]

Proof.
\[ E\left[ e^{\theta(X+Y)} \right] = E\left[ e^{\theta X} e^{\theta Y} \right] = E\left[ e^{\theta X} \right] E\left[ e^{\theta Y} \right] = m_X(\theta)\, m_Y(\theta). \]

Theorem 6.8. The $r$th moment of $X$, i.e. the expected value $E[X^r]$, is the coefficient of $\frac{\theta^r}{r!}$ in the series expansion of $m(\theta)$.

Proof. Sketch of proof:
\[ e^{\theta X} = 1 + \theta X + \frac{\theta^2}{2!} X^2 + \dots \]
\[ E\left[ e^{\theta X} \right] = 1 + \theta E[X] + \frac{\theta^2}{2!} E\left[ X^2 \right] + \dots \]
Example. Recall that $X$ has an exponential distribution with parameter $\lambda$ if it has density $\lambda e^{-\lambda x}$ for $0 \le x < \infty$. Then
\[ E\left[ e^{\theta X} \right] = \int_0^{\infty} e^{\theta x} \lambda e^{-\lambda x}\, dx = \lambda \int_0^{\infty} e^{-(\lambda - \theta)x}\, dx = \frac{\lambda}{\lambda - \theta} = m(\theta) \qquad \text{for } \theta < \lambda. \]
Hence
\[ E[X] = m'(0) = \left. \frac{\lambda}{(\lambda - \theta)^2} \right|_{\theta = 0} = \frac{1}{\lambda}, \qquad E\left[ X^2 \right] = m''(0) = \left. \frac{2\lambda}{(\lambda - \theta)^3} \right|_{\theta = 0} = \frac{2}{\lambda^2}. \]
Thus
\[ \operatorname{Var} X = E\left[ X^2 \right] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}. \]
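(Not part of the original notes.) A tiny numerical illustration, in plain Python, of reading moments off this mgf by differentiating at the origin; the value of $\lambda$ and the finite-difference step are arbitrary choices.

    lam = 3.0

    def m(theta):
        # mgf of an exponential with parameter lam, valid for theta < lam
        return lam / (lam - theta)

    h = 1e-4
    m1 = (m(h) - m(-h)) / (2 * h)             # ~ m'(0)  = E[X]    = 1/lam
    m2 = (m(h) - 2 * m(0) + m(-h)) / h ** 2   # ~ m''(0) = E[X^2]  = 2/lam^2

    print(m1, 1 / lam)
    print(m2, 2 / lam ** 2)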

Example. Suppose $X_1, \dots, X_n$ are iidrvs, each exponentially distributed with parameter $\lambda$. Claim: $X_1 + \dots + X_n$ has a gamma distribution $\Gamma(n, \lambda)$ with parameters $n$ and $\lambda$, that is, with density
\[ \frac{\lambda^n e^{-\lambda x} x^{n-1}}{(n-1)!}, \qquad 0 \le x < \infty. \]
(We can check that this is a density by integrating it by parts and showing that it equals 1.) Now
\[ E\left[ e^{\theta(X_1 + \dots + X_n)} \right] = E\left[ e^{\theta X_1} \right] \cdots E\left[ e^{\theta X_n} \right] = \left( E\left[ e^{\theta X_1} \right] \right)^n = \left( \frac{\lambda}{\lambda - \theta} \right)^n. \]
Suppose that $Y \sim \Gamma(n, \lambda)$. Then
\[ E\left[ e^{\theta Y} \right] = \int_0^{\infty} e^{\theta x}\, \frac{\lambda^n e^{-\lambda x} x^{n-1}}{(n-1)!}\, dx = \left( \frac{\lambda}{\lambda - \theta} \right)^n \int_0^{\infty} \frac{(\lambda - \theta)^n e^{-(\lambda - \theta)x} x^{n-1}}{(n-1)!}\, dx = \left( \frac{\lambda}{\lambda - \theta} \right)^n. \]
Hence the claim, since the moment generating function characterises the distribution.
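(Not part of the original notes.) A Monte Carlo sketch, assuming numpy, comparing the empirical mgf of $X_1 + \dots + X_n$ with $(\lambda/(\lambda - \theta))^n$, and its mean and variance with the gamma values $n/\lambda$ and $n/\lambda^2$. The parameters are arbitrary, with $\theta < \lambda$.

    import numpy as np

    rng = np.random.default_rng(5)
    n, lam, reps = 3, 2.0, 500_000

    # Sums of n independent exponentials with parameter lam.
    S = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)

    theta = 0.5
    print(np.exp(theta * S).mean(), (lam / (lam - theta)) ** n)  # both ~2.37
    print(S.mean(), n / lam)                                     # both ~1.5
    print(S.var(), n / lam ** 2)                                 # both ~0.75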
Example (Normal Distribution). Let $X \sim N(\mu, \sigma^2)$. Then
\[
\begin{aligned}
E\left[ e^{\theta X} \right] &= \int_{-\infty}^{\infty} e^{\theta x}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left( \frac{-1}{2\sigma^2} \left( x^2 - 2\mu x + \mu^2 - 2\sigma^2\theta x \right) \right) dx \\
&= e^{\theta\mu + \frac{\theta^2\sigma^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left( \frac{-1}{2\sigma^2} \left( x - \mu - \sigma^2\theta \right)^2 \right) dx \\
&= e^{\theta\mu + \frac{\theta^2\sigma^2}{2}},
\end{aligned}
\]
since the remaining integral equals 1 (it is the integral of the density of an $N(\mu + \theta\sigma^2, \sigma^2)$ random variable). So $e^{\theta\mu + \theta^2\sigma^2/2}$ is the moment generating function of an $N(\mu, \sigma^2)$ random variable.

Theorem 6.9. Suppose $X$, $Y$ are independent with $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$. Then

1. $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$;

2. $aX \sim N(a\mu_1, a^2\sigma_1^2)$.

Proof. 1.
\[ E\left[ e^{\theta(X+Y)} \right] = E\left[ e^{\theta X} \right] E\left[ e^{\theta Y} \right] = e^{\theta\mu_1 + \frac{\theta^2\sigma_1^2}{2}}\, e^{\theta\mu_2 + \frac{\theta^2\sigma_2^2}{2}} = e^{\theta(\mu_1 + \mu_2) + \frac{\theta^2(\sigma_1^2 + \sigma_2^2)}{2}}, \]
which is the moment generating function for $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.

2.
\[ E\left[ e^{\theta(aX)} \right] = E\left[ e^{(\theta a)X} \right] = e^{(\theta a)\mu_1 + \frac{(\theta a)^2\sigma_1^2}{2}} = e^{\theta(a\mu_1) + \frac{\theta^2 a^2\sigma_1^2}{2}}, \]
which is the moment generating function of $N(a\mu_1, a^2\sigma_1^2)$.

6.4 Central Limit Theorem

Let $X_1, \dots, X_n$ be iidrvs with mean 0 and variance $\sigma^2$, each $X_i$ having density $f$, so $\operatorname{Var} X_i = \sigma^2$. Then
\[ \operatorname{Var}(X_1 + \dots + X_n) = n\sigma^2, \]
\[ \operatorname{Var}\left( \frac{X_1 + \dots + X_n}{n} \right) = \frac{\sigma^2}{n}, \]
\[ \operatorname{Var}\left( \frac{X_1 + \dots + X_n}{\sqrt{n}} \right) = \sigma^2. \]

Theorem 6.10. Let $X_1, \dots, X_n$ be iidrvs with $E[X_i] = \mu$ and $\operatorname{Var} X_i = \sigma^2 < \infty$, and let
\[ S_n = \sum_{i=1}^{n} X_i. \]
Then for all $(a, b)$ such that $-\infty \le a \le b \le \infty$,
\[ \lim_{n \to \infty} P\left( a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b \right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz, \]
the integrand being the pdf of an $N(0,1)$ random variable.

Proof. Sketch of proof. WLOG take $\mu = 0$ and $\sigma^2 = 1$ (we can replace $X_i$ by $\frac{X_i - \mu}{\sigma}$). The mgf of $X_i$ is
\[ m_{X_i}(\theta) = E\left[ e^{\theta X_i} \right] = 1 + \theta E[X_i] + \frac{\theta^2}{2} E\left[ X_i^2 \right] + \frac{\theta^3}{3!} E\left[ X_i^3 \right] + \dots = 1 + \frac{\theta^2}{2} + \frac{\theta^3}{3!} E\left[ X_i^3 \right] + \dots \]
The mgf of $\frac{S_n}{\sqrt{n}}$ is
\[ E\left[ e^{\theta \frac{S_n}{\sqrt{n}}} \right] = E\left[ e^{\frac{\theta}{\sqrt{n}}(X_1 + \dots + X_n)} \right] = E\left[ e^{\frac{\theta}{\sqrt{n}} X_1} \right] \cdots E\left[ e^{\frac{\theta}{\sqrt{n}} X_n} \right] = \left( m_{X_1}\!\left( \frac{\theta}{\sqrt{n}} \right) \right)^n \]
\[ = \left( 1 + \frac{\theta^2}{2n} + \frac{\theta^3 E\left[ X_1^3 \right]}{3!\, n^{3/2}} + \dots \right)^n \to e^{\theta^2/2} \quad \text{as } n \to \infty, \]
which is the mgf of an $N(0,1)$ random variable.

Note that if $S_n \sim \operatorname{Bin}(n, p)$, so that $X_i = 1$ with probability $p$ and $X_i = 0$ with probability $1 - p$, then
\[ \frac{S_n - np}{\sqrt{npq}} \simeq N(0,1). \]
This is called the normal approximation to the binomial distribution; it applies as $n \to \infty$ with $p$ constant. Earlier we discussed the Poisson approximation to the binomial, which applies when $n \to \infty$ and $np$ is constant.
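(Not part of the original notes.) A simulation sketch, assuming numpy, of the normal approximation: for $S_n \sim \operatorname{Bin}(n, p)$ the probability $P(a \le (S_n - np)/\sqrt{npq} \le b)$ is compared with $\Phi(b) - \Phi(a)$. The values of $n$, $p$, $a$, $b$ are arbitrary.

    import math
    import numpy as np

    rng = np.random.default_rng(6)
    n, p, reps = 1000, 0.3, 200_000
    q = 1 - p

    S = rng.binomial(n, p, size=reps)
    Z = (S - n * p) / math.sqrt(n * p * q)    # standardised binomial sums

    def Phi(x):
        # standard normal distribution function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    a, b = -1.0, 2.0
    print(((Z >= a) & (Z <= b)).mean())       # empirical probability
    print(Phi(b) - Phi(a))                    # ~0.8186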

Example. There are two competing airlines, and $n$ passengers each select one of the two planes at random. The number of passengers in plane one is
\[ S \sim \operatorname{Bin}\left( n, \tfrac{1}{2} \right), \qquad \frac{S - np}{\sqrt{npq}} \simeq N(0,1). \]
Suppose each plane has $s$ seats and let $f(s) = P(S > s)$. Then
\[ f(s) = P(S > s) = P\left( \frac{S - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} > \frac{s - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} \right) \approx 1 - \Phi\left( \frac{2s - n}{\sqrt{n}} \right). \]
Therefore if $n = 1000$ and $s = 537$ then $f(s) \approx 0.01$. The two planes hold 1074 seats, only 74 in excess of the number of passengers.
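(Not part of the original notes.) The same calculation in plain Python, using math.erf for $\Phi$; it reproduces $f(537) \approx 0.01$ for $n = 1000$.

    import math

    def Phi(x):
        # standard normal distribution function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    n, s = 1000, 537
    f = 1 - Phi((2 * s - n) / math.sqrt(n))   # normal approximation to P(S > s)
    print(f)                                  # ~0.0097, i.e. about 1%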

Example. An unknown fraction $p$ of the electorate vote Labour. It is desired to find $p$ with an error not exceeding 0.005. How large should the sample be?

Let the fraction of Labour voters in the sample be $p'$. We can never be certain (without complete enumeration) that $|p' - p| \le 0.005$. Instead choose $n$ so that the event $|p' - p| \le 0.005$ has probability $\ge 0.95$. Now
\[ P(|p' - p| \le 0.005) = P(|S_n - np| \le 0.005\, n) = P\left( \frac{|S_n - np|}{\sqrt{npq}} \le 0.005 \sqrt{\frac{n}{pq}} \right). \]
Choose $n$ such that this probability is at least 0.95. Since
\[ \int_{-1.96}^{1.96} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = 2\Phi(1.96) - 1 \approx 0.95, \]
we must choose $n$ so that
\[ 0.005 \sqrt{\frac{n}{pq}} \ge 1.96. \]
We don't know $p$, but $pq \le \frac{1}{4}$, with the worst case $p = q = \frac{1}{2}$. It is therefore enough that
\[ n \ge \left( \frac{1.96}{0.005} \right)^2 \frac{1}{4} \approx 40\,000. \]
If we replace 0.005 by 0.01 then $n \ge 10\,000$ will be sufficient, and if we replace 0.005 by 0.045 then $n \ge 475$ will suffice. Note that the answer does not depend upon the total population.
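(Not part of the original notes.) The sample-size arithmetic in plain Python, using the worst case $pq = 1/4$.

    def sample_size(eps, z=1.96):
        # smallest n with eps * sqrt(n / pq) >= z in the worst case pq = 1/4
        return (z / eps) ** 2 / 4

    for eps in (0.005, 0.01, 0.045):
        print(eps, sample_size(eps))   # ~38416, ~9604, ~474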


6.5 Multivariate normal distribution


Let $X_1, \dots, X_n$ be iid $N(0,1)$ random variables with joint density
\[ g(x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-\frac{x_i^2}{2}} = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2} \sum_{i=1}^{n} x_i^2} = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2} \tilde{x}^T \tilde{x}}. \]
Write
\[ \tilde{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} \]
and let $\tilde{z} = \tilde{\mu} + A\tilde{X}$, where $A$ is an invertible matrix, so that $\tilde{x} = A^{-1}(\tilde{z} - \tilde{\mu})$. The density of $\tilde{z}$ is
\[ f(z_1, \dots, z_n) = \frac{1}{(2\pi)^{n/2} |\det A|}\, e^{-\frac{1}{2} \left( A^{-1}(\tilde{z} - \tilde{\mu}) \right)^T \left( A^{-1}(\tilde{z} - \tilde{\mu}) \right)} = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2} (\tilde{z} - \tilde{\mu})^T \Sigma^{-1} (\tilde{z} - \tilde{\mu})}, \]
where $\Sigma = AA^T$. This is the multivariate normal density, $\tilde{z} \sim MVN(\tilde{\mu}, \Sigma)$.

Now $\operatorname{Cov}(z_i, z_j) = E[(z_i - \mu_i)(z_j - \mu_j)]$, which is the $(i, j)$ entry of
\[ E\left[ (\tilde{z} - \tilde{\mu})(\tilde{z} - \tilde{\mu})^T \right] = E\left[ (A\tilde{X})(A\tilde{X})^T \right] = A\, E\left[ \tilde{X}\tilde{X}^T \right] A^T = A I A^T = AA^T = \Sigma, \]
so $\Sigma$ is the covariance matrix.

If the covariance matrix of the MVN distribution is diagonal, say
\[ \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}, \]
then the components of the random vector $\tilde{z}$ are independent, since
\[ f(z_1, \dots, z_n) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{1/2} \sigma_i}\, e^{-\frac{1}{2} \left( \frac{z_i - \mu_i}{\sigma_i} \right)^2}. \]
This is not necessarily true if the distribution is not MVN: recall sheet 2, question 9.
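(Not part of the original notes.) A sketch, assuming numpy, of the construction $\tilde{z} = \tilde{\mu} + A\tilde{X}$: the sample covariance of the simulated $\tilde{z}$ should be close to $\Sigma = AA^T$. The matrix $A$ and the vector $\tilde{\mu}$ below are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(7)
    n, reps = 3, 200_000

    A = np.array([[2.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.5, -1.0, 3.0]])   # any invertible matrix
    mu = np.array([1.0, -2.0, 0.5])
    Sigma = A @ A.T

    X = rng.standard_normal(size=(reps, n))   # iid N(0,1) components
    Z = mu + X @ A.T                          # each row is one sample of z

    print(np.round(Z.mean(axis=0), 2))              # ~mu
    print(np.round(np.cov(Z, rowvar=False), 2))     # ~Sigma
    print(Sigma)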


Example (bivariate normal). The density
\[ f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right) \left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right) \right], \]
with $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < +1$, is the joint density of a bivariate normal random variable. It is an example of the multivariate normal density with
\[ \Sigma^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\[4pt] \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}. \]
Here $E[X_i] = \mu_i$ and $\operatorname{Var} X_i = \sigma_i^2$, while $\operatorname{Cov}(X_1, X_2) = \sigma_1\sigma_2\rho$ and
\[ \operatorname{Correlation}(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sigma_1\sigma_2} = \rho. \]
