These notes are maintained by Andrew Rogers. Comments and corrections to soc-archim-notes@lists.cam.ac.uk.
The following people have maintained these notes: Kate Metcalfe (June 2000) and Andrew Rogers (June 2000).
Contents
Introduction

1 Basic Concepts
1.1 Sample Space
1.2 Classical Probability
1.3 Combinatorial Analysis
1.4 Stirling's Formula

2 The Axiomatic Approach
2.1 The Axioms
2.2 Independence
2.3 Distributions
2.4 Conditional Probability

3 Random Variables
3.1 Expectation
3.2 Variance
3.3 Indicator Function
3.4 Inclusion-Exclusion Formula
3.5 Independence

4 Inequalities
4.1 Jensen's Inequality
4.2 Cauchy-Schwarz Inequality
4.3 Markov's Inequality
4.4 Chebyshev's Inequality
4.5 Law of Large Numbers

5 Generating Functions
5.1 Combinatorial Applications
5.2 Conditional Expectation
5.3 Properties of Conditional Expectation
5.4 Branching Processes
5.5 Random Walks

6 Continuous Random Variables
6.1 Jointly Distributed Random Variables
6.2 Transformation of Random Variables
6.3 Moment Generating Functions
6.4 Central Limit Theorem
6.5 Multivariate normal distribution
Introduction
These notes are based on the course Probability given by Prof. F.P. Kelly in Cambridge in the Lent Term 1996. This typed version of the notes is totally unconnected with Prof. Kelly.

Other sets of notes are available for different courses. At the time of typing these courses were: Probability, Analysis, Methods, Fluid Dynamics 1, Geometry, Foundations of QM, Methods of Math. Phys, Waves (etc.), General Relativity, Combinatorics, Discrete Mathematics, Further Analysis, Quantum Mechanics, Quadratic Mathematics, Dynamics of D.E.s, Electrodynamics, Fluid Dynamics 2, Statistical Physics, Dynamical Systems, and Bifurcations in Nonlinear Convection.

They may be downloaded from
http://www.istari.ucam.org/maths/ or
http://www.cam.ac.uk/CambUniv/Societies/archim/notes.htm
or you can email soc-archim-notes@lists.cam.ac.uk to get a copy of the sets you require.
Chapter 1
Basic Concepts
1.1 Sample Space
Suppose we have an experiment with a set $\Omega$ of outcomes. Then $\Omega$ is called the sample space. A potential outcome $\omega \in \Omega$ is called a sample point. For instance, if the experiment is tossing a coin, then $\Omega = \{H, T\}$, and if the experiment is tossing two dice, then $\Omega = \{(i,j) : i,j \in \{1,\dots,6\}\}$. A subset $A$ of $\Omega$ is called an event. An event $A$ occurs if, when the experiment is performed, the outcome $\omega \in \Omega$ satisfies $\omega \in A$. For the coin-tossing experiment, the event of a head appearing is $A = \{H\}$, and for the two dice, the event of rolling a four would be $A = \{(1,3), (2,2), (3,1)\}$.
In the classical approach, where the $|\Omega|$ sample points are equally likely, the probability of an event $A$ is
$$P(A) = \frac{|A|}{|\Omega|}.$$
Example. Choose $r$ digits from a table of random numbers. Find the probability that, for $0 \le k \le 9$,
1. no digit exceeds $k$,
2. $k$ is the greatest digit drawn.

Solution. Let $A_k$ be the event that no digit exceeds $k$. Then $|A_k| = (k+1)^r$, so that
$$P(A_k) = \left(\frac{k+1}{10}\right)^r.$$
Let $B_k$ be the event that $k$ is the greatest digit drawn. Then $B_k = A_k \setminus A_{k-1}$, and since $A_{k-1} \subseteq A_k$,
$$|B_k| = (k+1)^r - k^r, \qquad P(B_k) = \frac{(k+1)^r - k^r}{10^r}.$$
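These formulas are easy to sanity-check by simulation. Below is a small added sketch (not part of the original notes); the values $k = 7$ and $r = 3$ are arbitrary illustrations.

```python
import random

def p_greatest_digit(k, r, trials=200_000):
    """Estimate P(B_k): the greatest of r random digits is exactly k."""
    hits = sum(
        max(random.randrange(10) for _ in range(r)) == k
        for _ in range(trials)
    )
    return hits / trials

k, r = 7, 3
exact = ((k + 1) ** r - k ** r) / 10 ** r   # ((k+1)^r - k^r) / 10^r = 0.169
approx = p_greatest_digit(k, r)
print(exact, approx)
```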
This makes the ratio 11 : 5. It was previously thought that the ratio should be 6 : 4 on considering termination, but these results are not equally likely.
Proof. Induction.
Theorem (Stirling's Formula). As $n \to \infty$,
$$n! \sim \sqrt{2\pi n}\, n^n e^{-n}.$$

We first prove the weaker statement that $\log n! \sim n \log n$. Since $\log x$ is increasing,
$$\int_1^n \log x\, dx \le \sum_{k=1}^{n} \log k = \log n! \le \int_1^{n+1} \log x\, dx,$$
and therefore
$$n \log n - n + 1 \le \log n! \le (n+1)\log(n+1) - n.$$
Divide by $n \log n$ and let $n \to \infty$ to sandwich $\frac{\log n!}{n \log n}$ between terms that tend to 1. Therefore $\log n! \sim n \log n$.

For the full result, let
$$h_n = \log n! - \left(n + \tfrac{1}{2}\right)\log n + n,$$
so that $n! = e^{h_n}\, n^{n+1/2} e^{-n}$. Using the expansion
$$1 - x + x^2 - x^3 < \frac{1}{1+x} < 1 - x + x^2, \qquad x > 0,$$
one obtains
$$\frac{1}{12n^2} - \frac{1}{12n^3} \le h_n - h_{n+1} \le \frac{1}{12n^2} + \frac{1}{6n^3}.$$
For $n \ge 2$, $0 \le h_n - h_{n+1} \le \frac{1}{n^2}$. Thus $(h_n)$ is a decreasing sequence, and
$$h_2 - h_{n+1} = \sum_{r=2}^{n} (h_r - h_{r+1}) \le \sum_{r=2}^{\infty} \frac{1}{r^2} < \infty,$$
so $(h_n)$ is also bounded below. Therefore $h_n \to A$ for some constant $A$, and
$$n! \sim e^A\, n^{n+1/2} e^{-n}.$$

We need a trick to find $A$. Let $I_r = \int_0^{\pi/2} \sin^r \theta\, d\theta$. Then $I_r = \frac{r-1}{r} I_{r-2}$ by integrating by parts, and therefore
$$I_{2n} = \frac{(2n)!}{(2^n n!)^2}\, \frac{\pi}{2}, \qquad I_{2n+1} = \frac{(2^n n!)^2}{(2n+1)!}.$$
Now $I_n$ is decreasing in $n$, so
$$1 \le \frac{I_{2n}}{I_{2n+1}} \le \frac{I_{2n-1}}{I_{2n+1}} = 1 + \frac{1}{2n} \to 1.$$
Substituting $n! \approx e^A n^{n+1/2} e^{-n}$ into the ratio gives
$$\frac{I_{2n}}{I_{2n+1}} = \frac{\pi}{2}\, \frac{(2n)!\,(2n+1)!}{(2^n n!)^4} \approx \frac{\pi(2n+1)}{e^{2A}\, n} \to \frac{2\pi}{e^{2A}}.$$
Since the ratio tends to 1, $e^{2A} = 2\pi$, and so $A = \frac{1}{2}\log 2\pi$, as required.
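As an added numerical aside (not in the original notes), the quality of Stirling's approximation is easy to check; the relative error behaves like $\frac{1}{12n}$.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * n^n * e^(-n)."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

# the ratio n! / stirling(n) tends to 1 as n grows
for n in (1, 5, 10, 50):
    print(n, math.factorial(n) / stirling(n))
```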
Chapter 2

The Axiomatic Approach

2.1 The Axioms

A probability $P$ is a real-valued function defined on the events of a sample space $\Omega$, satisfying:
1. $0 \le P(A) \le 1$ for every event $A$,
2. $P(\Omega) = 1$,
3. for any finite or countable collection $A_1, A_2, \dots$ of disjoint events,
$$P\left(\bigcup_i A_i\right) = \sum_i P(A_i).$$

The number $P(A)$ is called the probability of event $A$.

We can look at some distributions here. Consider an arbitrary finite or countable $\Omega = \{\omega_1, \omega_2, \dots\}$ and an arbitrary collection $\{p_1, p_2, \dots\}$ of non-negative numbers with sum 1. If we define
$$P(A) = \sum_{i : \omega_i \in A} p_i,$$
it is easy to see that this function satisfies the axioms. The numbers $p_1, p_2, \dots$ are called a probability distribution. If $\Omega$ is finite with $n$ elements, and if $p_1 = p_2 = \dots = p_n = \frac{1}{n}$, we recover the classical definition of probability.

Another example would be to let $\Omega = \{0, 1, \dots\}$ and attach to outcome $r$ the probability $p_r = e^{-\lambda}\frac{\lambda^r}{r!}$ for some $\lambda > 0$. This is a distribution (as may be easily verified), and is called the Poisson distribution with parameter $\lambda$.

Theorem 2.1 (Properties of P). A probability $P$ satisfies
1. $P(A^c) = 1 - P(A)$,
2. $P(\emptyset) = 0$,
3. if $A \subseteq B$ then $P(A) \le P(B)$,
4. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Proof. Note that $\Omega = A \cup A^c$ and $A \cap A^c = \emptyset$. Thus $1 = P(\Omega) = P(A) + P(A^c)$. Now we can use this to obtain $P(\emptyset) = 1 - P(\emptyset^c) = 0$. If $A \subseteq B$, write $B = A \cup (B \cap A^c)$, so that $P(B) = P(A) + P(B \cap A^c) \ge P(A)$. Finally, write $A \cup B = A \cup (B \cap A^c)$ and $B = (B \cap A) \cup (B \cap A^c)$. Then $P(A \cup B) = P(A) + P(B \cap A^c)$ and $P(B) = P(B \cap A) + P(B \cap A^c)$, which gives the result.

Theorem 2.2 (Boole's Inequality). For any events $A_1, A_2, \dots$,
$$P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i) \qquad\text{and}\qquad P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i).$$

Proof. Let $B_1 = A_1$ and $B_i = A_i \setminus \bigcup_{k=1}^{i-1} B_k$. The $B_i$ are disjoint, with $\bigcup_i B_i = \bigcup_i A_i$, so
$$P\left(\bigcup_i A_i\right) = P\left(\bigcup_i B_i\right) = \sum_i P(B_i) \le \sum_i P(A_i), \qquad\text{as } B_i \subseteq A_i.$$
Theorem (Inclusion-Exclusion Formula).
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{\substack{S \subseteq \{1,\dots,n\} \\ S \ne \emptyset}} (-1)^{|S|-1}\, P\left(\bigcap_{j \in S} A_j\right).$$

Proof. We know that $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$. Thus the result is true for $n = 2$. We also have that
$$P\left(\bigcup_{i=1}^{n} A_i\right) = P\left(\bigcup_{i=1}^{n-1} A_i\right) + P(A_n) - P\left(\bigcup_{i=1}^{n-1} (A_i \cap A_n)\right).$$
Application of the inductive hypothesis yields the result.

Corollary (Bonferroni Inequalities).
$$P\left(\bigcup_{i=1}^{n} A_i\right) \le \text{ or } \ge \sum_{\substack{S \subseteq \{1,\dots,n\} \\ 1 \le |S| \le r}} (-1)^{|S|-1}\, P\left(\bigcap_{j \in S} A_j\right),$$
according as $r$ is even or odd. In other words, if the inclusion-exclusion formula is truncated, the error has the sign of the first omitted term and is smaller in absolute value. Note that the case $r = 1$ is Boole's inequality.

Proof. The result is true for $n = 2$. If true for $n - 1$, then it is true for $n$ and $1 \le r \le n - 1$ by the inductive step above, which expresses an $n$-fold union in terms of two $(n-1)$-fold unions. It is true for $r = n$ by the inclusion-exclusion formula.
Example (Derangements). After a dinner, the $n$ guests take coats at random from a pile. Find the probability that at least one guest has his own coat.

Solution. Let $A_k$ be the event that guest $k$ has his own coat. We want $P\left(\bigcup_{i=1}^{n} A_i\right)$. Now,
$$P(A_{i_1} \cap \dots \cap A_{i_r}) = \frac{(n-r)!}{n!}$$
by counting the number of ways of matching guests and coats after $i_1, \dots, i_r$ have taken theirs. Thus
$$\sum_{i_1 < \dots < i_r \le n} P(A_{i_1} \cap \dots \cap A_{i_r}) = \binom{n}{r} \frac{(n-r)!}{n!} = \frac{1}{r!},$$
and by inclusion-exclusion,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = 1 - \frac{1}{2!} + \frac{1}{3!} - \dots + (-1)^{n-1}\frac{1}{n!},$$
which tends to $1 - e^{-1}$ as $n \to \infty$. Furthermore, let $P_m(n)$ be the probability that exactly $m$ guests take the right coat. Then $P_0(n) \to e^{-1}$, and $n!\, P_0(n)$ is the number of derangements of $n$ objects. Therefore
$$P_m(n) = \binom{n}{m}\frac{(n-m)!\, P_0(n-m)}{n!} = \frac{P_0(n-m)}{m!} \to \frac{e^{-1}}{m!} \quad\text{as } n \to \infty.$$
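The limit $P_0(n) \to e^{-1}$ shows up quickly in simulation. An added sketch ($n = 8$ is an arbitrary choice):

```python
import math
import random

def p_no_match(n, trials=100_000):
    """Estimate the probability that a random assignment of n coats
    gives no guest their own coat (a derangement)."""
    count = 0
    coats = list(range(n))
    for _ in range(trials):
        random.shuffle(coats)
        if all(coats[i] != i for i in range(n)):
            count += 1
    return count / trials

est = p_no_match(8)
print(est, math.exp(-1))  # both close to 0.368
```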
2.2 Independence

Definition 2.1. Two events $A$ and $B$ are said to be independent if
$$P(A \cap B) = P(A)\, P(B).$$
More generally, a collection of events $A_i$, $i \in I$, are independent if
$$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i)$$
for all finite subsets $J \subseteq I$.

Example. Two fair dice are thrown. Let $A_1$ be the event that the first die shows an odd number. Let $A_2$ be the event that the second die shows an odd number, and finally let $A_3$ be the event that the sum of the two numbers is odd. Are $A_1$ and $A_2$ independent? Are $A_1$ and $A_3$ independent? Are $A_1$, $A_2$ and $A_3$ independent?

1. (Footnote to the derangements example.) I'm not being sexist, merely a lazy typist. Sex will be assigned at random...
Solution. We first calculate the probabilities of the events $A_1$, $A_2$, $A_3$, $A_1 \cap A_2$, $A_1 \cap A_3$ and $A_1 \cap A_2 \cap A_3$:
$$P(A_1) = \frac{18}{36} = \frac{1}{2}, \quad P(A_2) = \frac{18}{36} = \frac{1}{2}, \quad P(A_3) = \frac{18}{36} = \frac{1}{2},$$
$$P(A_1 \cap A_2) = \frac{3 \cdot 3}{36} = \frac{1}{4}, \quad P(A_1 \cap A_3) = \frac{3 \cdot 3}{36} = \frac{1}{4}, \quad P(A_1 \cap A_2 \cap A_3) = 0.$$
Thus by a series of multiplications, we can see that $A_1$ and $A_2$ are independent, $A_1$ and $A_3$ are independent (also $A_2$ and $A_3$), but that $A_1$, $A_2$ and $A_3$ are not independent.

Now we wish to state what we mean by two independent experiments. Consider $\Omega_1 = \{\alpha_1, \dots\}$ and $\Omega_2 = \{\beta_1, \dots\}$ with associated probability distributions $\{p_1, \dots\}$ and $\{q_1, \dots\}$. Then, by two independent experiments, we mean the sample space $\Omega_1 \times \Omega_2$ with probability distribution
$$P\left((\alpha_i, \beta_j)\right) = p_i q_j.$$
Now, suppose $A \subseteq \Omega_1$ and $B \subseteq \Omega_2$. The event $A$ can be interpreted as an event in $\Omega_1 \times \Omega_2$, namely $A \times \Omega_2$, and similarly for $B$. Then
$$P(A \cap B) = \sum_{\alpha_i \in A}\, \sum_{\beta_j \in B} p_i q_j = \sum_{\alpha_i \in A} p_i \sum_{\beta_j \in B} q_j = P(A)\, P(B),$$
which is why they are called independent experiments. The obvious generalisation to $n$ experiments can be made, but for an infinite sequence of experiments we mean a sample space $\Omega_1 \times \Omega_2 \times \dots$ satisfying the appropriate formula for all $n \in \mathbb{N}$.

(Exercise: you might like to find the probability that $n$ independent tosses of a biased coin with probability of heads $p$ result in a total of $r$ heads.)
2.3 Distributions

The binomial distribution with parameters $n$ and $p$, $0 \le p \le 1$, has $\Omega = \{0, \dots, n\}$ and probabilities
$$p_i = \binom{n}{i} p^i (1-p)^{n-i}.$$

Suppose $n \to \infty$ and $p \to 0$ with $np = \lambda$ held fixed (or, more generally, $np \to \lambda$). Then
$$\binom{n}{r} p^r (1-p)^{n-r} \to e^{-\lambda} \frac{\lambda^r}{r!},$$
the Poisson distribution with parameter $\lambda$.

Suppose an infinite sequence of independent trials is to be performed. Each trial results in a success with probability $p \in (0,1)$ or a failure with probability $1 - p$. Such a sequence is called a sequence of Bernoulli trials. The probability that the first success occurs after exactly $r$ failures is $p_r = p(1-p)^r$. This is the geometric distribution with parameter $p$. Since $\sum_{r=0}^{\infty} p_r = 1$, the probability that all trials result in failure is zero.
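The Poisson approximation to the binomial can be seen numerically. An added sketch ($\lambda = 2$ and $r = 3$ are illustrative choices):

```python
from math import comb, exp, factorial

def binom_pmf(n, p, r):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(lam, r):
    return exp(-lam) * lam ** r / factorial(r)

# n grows, p shrinks with n*p = 2 fixed; the binomial pmf approaches Poisson(2)
lam, r = 2.0, 3
for n in (10, 100, 1000):
    print(n, binom_pmf(n, lam / n, r), poisson_pmf(lam, r))
```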
2.4 Conditional Probability

Definition. Provided $P(B) > 0$, the conditional probability of $A$ given $B$ (read "$A$ given $B$") is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Whenever we write $P(A \mid B)$, we assume that $P(B) > 0$.

Theorem 2.5.
1. $P(A \cap B) = P(A \mid B)\, P(B)$,
2. $P(A \cap B \cap C) = P(A \mid B \cap C)\, P(B \mid C)\, P(C)$,
3. $P(A \mid B \cap C) = \dfrac{P(A \cap B \mid C)}{P(B \mid C)}$,
4. the function $P(\,\cdot \mid B)$ restricted to subsets of $B$ is a probability on $B$.

Proof. Results 1 to 3 are immediate from the definition of conditional probability. For result 4, note that $A \cap B \subseteq B$, so $P(A \cap B) \le P(B)$ and thus $P(A \mid B) \le 1$. $P(B \mid B) = 1$ (obviously), so it just remains to show the last axiom. For disjoint $A_i$'s,
$$P\left(\bigcup_i A_i \,\middle|\, B\right) = \frac{P\left(\bigcup_i (A_i \cap B)\right)}{P(B)} = \frac{\sum_i P(A_i \cap B)}{P(B)} = \sum_i P(A_i \mid B).$$

Theorem 2.6 (Law of Total Probability). Let $B_1, B_2, \dots$ be a partition of $\Omega$. Then
$$P(A) = \sum_i P(A \mid B_i)\, P(B_i).$$

Proof.
$$\sum_i P(A \mid B_i)\, P(B_i) = \sum_i P(A \cap B_i) = P\left(\bigcup_i (A \cap B_i)\right) = P(A),$$
as required.

Example (Gambler's Ruin). A fair coin is tossed repeatedly. At each toss a gambler wins \$1 if a head shows and loses \$1 if tails. He continues playing until his capital reaches $m$ or he goes broke. Find $p_x$, the probability that he goes broke if his initial capital is $\$x$.

Solution. Let $A$ be the event that he goes broke before reaching $\$m$, and let $H$ or $T$ be the outcome of the first toss. We condition on the first toss to get $P(A) = P(A \mid H)\, P(H) + P(A \mid T)\, P(T)$. But $P(A \mid H) = p_{x+1}$ and $P(A \mid T) = p_{x-1}$. Thus we obtain the recurrence
$$p_{x+1} - p_x = p_x - p_{x-1}.$$
Note that $p_x$ is linear in $x$, with $p_0 = 1$ and $p_m = 0$. Thus $p_x = 1 - \frac{x}{m}$.

Theorem 2.7 (Bayes' Formula). Let $B_1, B_2, \dots$ be a partition of $\Omega$. Then
$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_j P(A \mid B_j)\, P(B_j)}.$$

Proof.
$$P(B_i \mid A) = \frac{P(A \cap B_i)}{P(A)} = \frac{P(A \mid B_i)\, P(B_i)}{\sum_j P(A \mid B_j)\, P(B_j)},$$
using the law of total probability in the denominator.
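Bayes' formula is mechanical to apply; here is an added illustration with invented numbers (a rare condition and an imperfect test):

```python
def bayes(priors, likelihoods):
    """P(B_i | A) for a partition B_1, B_2, ..., given the priors P(B_i)
    and the likelihoods P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)          # law of total probability: P(A)
    return [j / total for j in joint]

# B1 = "has the condition", B2 = "does not"; A = "test is positive"
posterior = bayes([0.01, 0.99], [0.95, 0.05])
print(posterior[0])  # P(B1 | A) is only about 0.16 despite the accurate test
```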
Chapter 3

Random Variables

Let $\Omega$ be finite or countable, and let $p_\omega = P(\{\omega\})$ for $\omega \in \Omega$.

Definition 3.1. A random variable $X$ is a function $X : \Omega \to \mathbb{R}$.

Note that "random variable" is a somewhat inaccurate term: a random variable is neither random nor a variable.

Example. If $\Omega = \{(i,j) : 1 \le i, j \le 6\}$, we can define random variables $X$ and $Y$ by $X(i,j) = i + j$ and $Y(i,j) = \max\{i, j\}$.

Let $R_X$ be the image of $\Omega$ under $X$. When this image is finite or countable, the random variable is said to be discrete. We write $P(X = x_i)$ for $\sum_{\omega : X(\omega) = x_i} p_\omega$, and for $B \subseteq \mathbb{R}$,
$$P(X \in B) = \sum_{x \in B} P(X = x).$$
Then
$$(P(X = x),\ x \in R_X)$$
is the distribution of the random variable $X$. Note that it is a probability distribution over $R_X$.
3.1 Expectation

Definition 3.2. The expectation of a random variable $X$ is the number
$$E[X] = \sum_{\omega \in \Omega} p_\omega\, X(\omega),$$
provided this sum converges absolutely. Note that
$$E[X] = \sum_{\omega \in \Omega} p_\omega X(\omega) = \sum_{x \in R_X} \sum_{\omega : X(\omega) = x} p_\omega X(\omega) = \sum_{x \in R_X} x \sum_{\omega : X(\omega) = x} p_\omega = \sum_{x \in R_X} x\, P(X = x).$$
Absolute convergence allows the sum to be taken in any order.

If $X$ is a positive random variable and $\sum_{\omega \in \Omega} p_\omega X(\omega) = \infty$, we write $E[X] = +\infty$. If
$$\sum_{x \in R_X,\ x \ge 0} x\, P(X = x) = \infty \qquad\text{and}\qquad \sum_{x \in R_X,\ x < 0} x\, P(X = x) = -\infty,$$
then $E[X]$ is undefined.

Example. If $P(X = r) = e^{-\lambda}\frac{\lambda^r}{r!}$, then $E[X] = \lambda$.

Solution.
$$E[X] = \sum_{r=0}^{\infty} r\, e^{-\lambda} \frac{\lambda^r}{r!} = \lambda e^{-\lambda} \sum_{r=1}^{\infty} \frac{\lambda^{r-1}}{(r-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$
Example. If $P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$, then $E[X] = np$.

Solution.
$$\begin{aligned}
E[X] &= \sum_{r=0}^{n} r \binom{n}{r} p^r (1-p)^{n-r} \\
&= \sum_{r=1}^{n} r\, \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r} \\
&= np \sum_{r=1}^{n} \frac{(n-1)!}{(r-1)!\,(n-r)!}\, p^{r-1} (1-p)^{n-r} \\
&= np \sum_{r=0}^{n-1} \binom{n-1}{r} p^r (1-p)^{n-1-r} \\
&= np.
\end{aligned}$$
For any function $f : \mathbb{R} \to \mathbb{R}$, the composition of $f$ and $X$ defines a new random variable $f(X)$ given by
$$f(X)(\omega) = f(X(\omega)).$$

Example. If $a$, $b$ and $c$ are constants, then $a + bX$ and $(X - c)^2$ are random variables defined by
$$(a + bX)(\omega) = a + bX(\omega) \qquad\text{and}\qquad (X - c)^2(\omega) = (X(\omega) - c)^2.$$

Theorem 3.1 (Properties of Expectation).
1. If $X \ge 0$ then $E[X] \ge 0$.
2. If $X \ge 0$ and $E[X] = 0$ then $P(X = 0) = 1$.
3. If $a$ and $b$ are constants then $E[a + bX] = a + bE[X]$.
4. For any random variables $X$, $Y$, $E[X + Y] = E[X] + E[Y]$.
5. $E[X]$ is the constant that minimises $E[(X - c)^2]$.

Proof.
1. $E[X] = \sum_{\omega \in \Omega} p_\omega X(\omega) \ge 0$, since every term is non-negative.
2. If $\exists\, \omega$ with $p_\omega > 0$ and $X(\omega) > 0$, then $E[X] > 0$; so $E[X] = 0$ forces $X(\omega) = 0$ for every $\omega$ with $p_\omega > 0$, i.e. $P(X = 0) = 1$.
3. $$E[a + bX] = \sum_{\omega \in \Omega} (a + bX(\omega))\, p_\omega = a \sum_{\omega \in \Omega} p_\omega + b \sum_{\omega \in \Omega} p_\omega X(\omega) = a + bE[X].$$
4. Trivial.
5. Now
$$\begin{aligned}
E\left[(X - c)^2\right] &= E\left[(X - E[X] + E[X] - c)^2\right] \\
&= E\left[(X - E[X])^2 + 2(X - E[X])(E[X] - c) + (E[X] - c)^2\right] \\
&= E\left[(X - E[X])^2\right] + 2(E[X] - c)\, E[X - E[X]] + (E[X] - c)^2 \\
&= E\left[(X - E[X])^2\right] + (E[X] - c)^2.
\end{aligned}$$
This is clearly minimised when $c = E[X]$.

Theorem 3.2. For any random variables $X_1, X_2, \dots, X_n$,
$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i].$$

Proof.
$$E\left[\sum_{i=1}^{n} X_i\right] = E\left[\sum_{i=1}^{n-1} X_i + X_n\right] = E\left[\sum_{i=1}^{n-1} X_i\right] + E[X_n],$$
and the result follows by induction.
3.2 Variance

Definition. The variance of a random variable $X$ is
$$\operatorname{Var} X = E\left[(X - E[X])^2\right] = E[X^2] - E[X]^2,$$
and the standard deviation is $\sqrt{\operatorname{Var} X}$.

Theorem 3.3 (Properties of Variance).
1. $\operatorname{Var} X \ge 0$; if $\operatorname{Var} X = 0$ then $P(X = E[X]) = 1$. This follows from property 2 of expectation.
2. If $a$, $b$ are constants, then $\operatorname{Var}(a + bX) = b^2 \operatorname{Var} X$.

Proof.
$$\operatorname{Var}(a + bX) = E\left[(a + bX - a - bE[X])^2\right] = b^2 E\left[(X - E[X])^2\right] = b^2 \operatorname{Var} X.$$

3. $\operatorname{Var} X = E[X^2] - E[X]^2$.

Proof.
$$E\left[(X - E[X])^2\right] = E\left[X^2 - 2X E[X] + E[X]^2\right] = E[X^2] - E[X]^2.$$
Example. Let $X$ have the geometric distribution $P(X = r) = pq^r$ for $r = 0, 1, 2, \dots$, with $p + q = 1$. Then $E[X] = \frac{q}{p}$ and $\operatorname{Var} X = \frac{q}{p^2}$.

Solution.
$$E[X] = \sum_{r=0}^{\infty} r p q^r = pq \sum_{r=1}^{\infty} r q^{r-1} = pq (1 - q)^{-2} = \frac{q}{p}.$$
Using $r^2 = r(r+1) - r$,
$$E[X^2] = \sum_{r=0}^{\infty} r^2 p q^r = pq \sum_{r=1}^{\infty} \left(r(r+1) q^{r-1} - r q^{r-1}\right) = pq\left(\frac{2}{(1-q)^3} - \frac{1}{(1-q)^2}\right) = \frac{2q}{p^2} - \frac{q}{p}.$$
Hence
$$\operatorname{Var} X = E[X^2] - E[X]^2 = \frac{2q}{p^2} - \frac{q}{p} - \frac{q^2}{p^2} = \frac{q}{p^2}.$$
Definition. The covariance of random variables $X$ and $Y$ is
$$\operatorname{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right].$$
The correlation of $X$ and $Y$ is
$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var} X\, \operatorname{Var} Y}}.$$

3.3 Indicator Function

Definition. The indicator function $I[A]$ of an event $A \subseteq \Omega$ is the function
$$I[A](\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases} \tag{3.1}$$

Note that $I[A]$ is a randomariable, and
1. $E[I[A]] = P(A)$,
2. $I[A^c] = 1 - I[A]$,
3. $I[A \cap B] = I[A]\, I[B]$,
4. $I[A \cup B] = I[A] + I[B] - I[A]\, I[B]$.
Example. $n$ couples are arranged randomly around a table such that males and females alternate. Let $N$ be the number of husbands sitting next to their wives. Calculate $E[N]$ and $\operatorname{Var} N$.

Solution. Let $A_i$ be the event that couple $i$ sit next to each other, so that
$$N = \sum_{i=1}^{n} I[A_i].$$
Each husband has two neighbouring female seats, and his wife is equally likely to occupy any of the $n$ female seats, so $P(A_i) = \frac{2}{n}$. Thus
$$E[N] = E\left[\sum_{i=1}^{n} I[A_i]\right] = \sum_{i=1}^{n} E\left[I[A_i]\right] = \sum_{i=1}^{n} \frac{2}{n} = 2.$$
Also,
$$E[N^2] = E\left[\left(\sum_{i=1}^{n} I[A_i]\right)^2\right] = E\left[\sum_{i=1}^{n} I[A_i]^2 + 2 \sum_{i < j} I[A_i]\, I[A_j]\right] = n E\left[I[A_1]^2\right] + n(n-1)\, E\left[I[A_1] I[A_2]\right].$$
Now $E[I[A_i]^2] = E[I[A_i]] = \frac{2}{n}$, and
$$E\left[I[A_1] I[A_2]\right] = E\left[I[A_1 \cap A_2]\right] = P(A_1 \cap A_2) = P(A_1)\, P(A_2 \mid A_1) = \frac{2}{n}\left(\frac{1}{n-1} \cdot \frac{1}{n-1} + \frac{n-2}{n-1} \cdot \frac{2}{n-1}\right).$$
Therefore
$$E[N^2] = 2 + \frac{2\left(1 + 2(n-2)\right)}{n-1},$$
and
$$\operatorname{Var} N = E[N^2] - E[N]^2 = \frac{2\left(1 + 2(n-2)\right)}{n-1} - 2 = \frac{2(n-2)}{n-1}.$$
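A Monte Carlo check of $E[N] = 2$ (an addition to the notes; the labelling of alternate seats below is one concrete model of the arrangement):

```python
import random

def mean_adjacent_couples(n, trials=20_000):
    """Husbands occupy fixed alternate seats; the wives are a random
    permutation of the female seats. The female seats adjacent to
    husband i are taken to be seats i-1 (mod n) and i."""
    total = 0
    seats = list(range(n))
    for _ in range(trials):
        random.shuffle(seats)   # seats[j] = wife sitting in female seat j
        for i in range(n):
            if seats[i] == i or seats[(i - 1) % n] == i:
                total += 1
    return total / trials

print(mean_adjacent_couples(6))  # close to E[N] = 2
```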
3.4 Inclusion-Exclusion Formula

Using indicator functions,
$$I\left[\bigcup_{i=1}^{N} A_i\right] = 1 - I\left[\left(\bigcup_{i=1}^{N} A_i\right)^c\right] = 1 - I\left[\bigcap_{i=1}^{N} A_i^c\right] = 1 - \prod_{i=1}^{N} I[A_i^c] = 1 - \prod_{i=1}^{N}\left(1 - I[A_i]\right).$$
Expanding the product,
$$I\left[\bigcup_{i=1}^{N} A_i\right] = \sum_i I[A_i] - \sum_{i_1 < i_2} I[A_{i_1}]\, I[A_{i_2}] + \dots + (-1)^{j+1} \sum_{i_1 < i_2 < \dots < i_j} I[A_{i_1}] \cdots I[A_{i_j}] + \dots$$
Taking expectations,
$$P\left(\bigcup_{i=1}^{N} A_i\right) = \sum_i P(A_i) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \dots + (-1)^{j+1} \sum_{i_1 < \dots < i_j} P(A_{i_1} \cap \dots \cap A_{i_j}) + \dots,$$
which recovers the inclusion-exclusion formula.
3.5 Independence

Definition 3.5. Discrete random variables $X_1, \dots, X_n$ are independent if and only if, for any $x_1, \dots, x_n$,
$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i).$$

Theorem 3.5 (Preservation of Independence). If $X_1, \dots, X_n$ are independent random variables and $f_1, f_2, \dots, f_n$ are functions $\mathbb{R} \to \mathbb{R}$, then $f_1(X_1), \dots, f_n(X_n)$ are independent random variables.

Proof.
$$P(f_1(X_1) = y_1, \dots, f_n(X_n) = y_n) = \sum_{\substack{x_1 : f_1(x_1) = y_1 \\ \cdots \\ x_n : f_n(x_n) = y_n}} P(X_1 = x_1, \dots, X_n = x_n) = \sum_{\substack{x_1 : f_1(x_1) = y_1 \\ \cdots \\ x_n : f_n(x_n) = y_n}} \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} P(f_i(X_i) = y_i).$$

Theorem. If $X_1, \dots, X_n$ are independent random variables, then
$$E\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} E[X_i].$$
Note that, unlike the additivity of expectation, this requires independence.

Proof. Write $R_i$ for the range of $X_i$. Then
$$E\left[\prod_{i=1}^{n} X_i\right] = \sum_{x_1 \in R_1} \cdots \sum_{x_n \in R_n} x_1 x_2 \cdots x_n\, P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} \sum_{x_i \in R_i} x_i\, P(X_i = x_i) = \prod_{i=1}^{n} E[X_i].$$

Theorem. If $X_1, \dots, X_n$ are independent random variables and $f_1, \dots, f_n$ are functions $\mathbb{R} \to \mathbb{R}$, then
$$E\left[\prod_{i=1}^{n} f_i(X_i)\right] = \prod_{i=1}^{n} E\left[f_i(X_i)\right].$$
This is immediate from the two preceding theorems.

Theorem. If $X_1, \dots, X_n$ are independent random variables, then
$$\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var} X_i.$$

Proof.
$$\begin{aligned}
\operatorname{Var}\left(\sum_i X_i\right) &= E\left[\left(\sum_i X_i\right)^2\right] - \left(E\left[\sum_i X_i\right]\right)^2 \\
&= \sum_i E[X_i^2] + \sum_{i \ne j} E[X_i X_j] - \sum_i E[X_i]^2 - \sum_{i \ne j} E[X_i]\, E[X_j] \\
&= \sum_i \left(E[X_i^2] - E[X_i]^2\right) = \sum_i \operatorname{Var} X_i,
\end{aligned}$$
since $E[X_i X_j] = E[X_i]\, E[X_j]$ for $i \ne j$ by independence.

Corollary. If $X_1, \dots, X_n$ are independent identically distributed random variables, then
$$\operatorname{Var}\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \operatorname{Var} X_1.$$

Proof.
$$\operatorname{Var}\left(\frac{1}{n} \sum_i X_i\right) = \frac{1}{n^2} \operatorname{Var}\left(\sum_i X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var} X_i = \frac{1}{n} \operatorname{Var} X_1.$$
Example (Experimental Design). Two rods have unknown lengths $a$ and $b$. A rule can measure length, but with an error having mean 0 (unbiased) and variance $\sigma^2$; errors are independent from measurement to measurement. To estimate $a$ and $b$ we could take separate measurements $A$ and $B$ of each rod, with
$$E[A] = a, \qquad E[B] = b, \qquad \operatorname{Var} A = \operatorname{Var} B = \sigma^2.$$
Can we do better? Yes: measure $a + b$ as $X$ and $a - b$ as $Y$, and estimate $a$ by $\frac{X+Y}{2}$ and $b$ by $\frac{X-Y}{2}$. Then
$$E\left[\frac{X+Y}{2}\right] = a, \qquad E\left[\frac{X-Y}{2}\right] = b,$$
and
$$\operatorname{Var}\left(\frac{X+Y}{2}\right) = \frac{1}{4}\left(\operatorname{Var} X + \operatorname{Var} Y\right) = \frac{\sigma^2}{2},$$
and similarly for $\frac{X-Y}{2}$. So this is better: the same two measurements give both estimates with half the variance.

Example (Non-standard dice). You choose one die, then I choose one of the others. With suitably non-standard dice $A$, $B$ and $C$ one can arrange that, around the cycle $A \to B \to C \to A$, each die beats the next with probability $\frac{2}{3}$; "beats" need not be transitive, so whichever die you choose, I can choose one that beats it.
Chapter 4
Inequalities
4.1 Jensen's Inequality

A function $f : (a, b) \to \mathbb{R}$ is convex if
$$f(p x_1 + q x_2) \le p f(x_1) + q f(x_2)$$
for all $x_1, x_2 \in (a, b)$ and all $p, q \ge 0$ with $p + q = 1$. It is strictly convex if strict inequality holds whenever $x_1 \ne x_2$ and $p, q > 0$. A function $f$ is (strictly) concave if $-f$ is (strictly) convex, and a function may of course be neither concave nor convex.

We know that if $f$ is twice differentiable and $f''(x) \ge 0$ for $x \in (a, b)$, then $f$ is convex, and strictly convex if $f''(x) > 0$ for $x \in (a, b)$.

Example. $f(x) = \log x$ has $f''(x) = -\frac{1}{x^2} < 0$: strictly concave.

Example. $f(x) = x^3$ is strictly convex on $(0, \infty)$ but not on $(-\infty, \infty)$.

Theorem 4.1 (Jensen's Inequality). Let $f : (a, b) \to \mathbb{R}$ be a convex function. Then
$$\sum_{i=1}^{n} p_i f(x_i) \ge f\left(\sum_{i=1}^{n} p_i x_i\right)$$
for all $x_1, \dots, x_n \in (a, b)$ and $p_1, \dots, p_n \in (0, 1)$ such that $\sum_{i=1}^{n} p_i = 1$. In other words, for a random variable $X$ taking values in $(a, b)$,
$$E[f(X)] \ge f(E[X]).$$
If $f$ is strictly convex, equality holds only when all the $x_i$ are equal.
Proof. By induction on $n$. The case $n = 2$ is just the definition of convexity. Assume the result holds for $n - 1$. Then, by the inductive hypothesis twice, first for $n - 1$, then for 2,
$$\sum_{i=1}^{n} p_i f(x_i) = p_1 f(x_1) + (1 - p_1) \sum_{i=2}^{n} \frac{p_i}{1 - p_1} f(x_i) \ge p_1 f(x_1) + (1 - p_1)\, f\left(\sum_{i=2}^{n} \frac{p_i}{1 - p_1}\, x_i\right) \ge f\left(p_1 x_1 + (1 - p_1) \sum_{i=2}^{n} \frac{p_i}{1 - p_1}\, x_i\right) = f\left(\sum_{i=1}^{n} p_i x_i\right).$$
If $f$ is strictly convex, $n \ge 3$ and not all the $x_i$'s are equal, then we may assume that not all of $x_2, \dots, x_n$ are equal. By the inductive hypothesis, the first inequality is then strict:
$$(1 - p_1) \sum_{i=2}^{n} \frac{p_i}{1 - p_1} f(x_i) > (1 - p_1)\, f\left(\sum_{i=2}^{n} \frac{p_i}{1 - p_1}\, x_i\right).$$
So the inequality is strict.

Corollary (AM-GM Inequality). Given positive real numbers $x_1, \dots, x_n$,
$$\left(\prod_{i=1}^{n} x_i\right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i.$$

Proof. Let $X$ take the values $x_1, \dots, x_n$ with $P(X = x_i) = \frac{1}{n}$. The function $f(x) = -\log x$ is strictly convex on $(0, \infty)$, so by Jensen's inequality
$$-\log\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right) \le \frac{1}{n} \sum_{i=1}^{n}\left(-\log x_i\right), \tag{1}$$
that is,
$$\frac{1}{n} \sum_{i=1}^{n} \log x_i \le \log\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right). \tag{2}$$
Exponentiating gives the result. For strictness: since $f$ is strictly convex, equality holds in (1), and hence in (2), if and only if
$$x_1 = x_2 = \dots = x_n.$$
If $f : (a, b) \to \mathbb{R}$ is a convex function, then it can be shown that at each point $y \in (a, b)$ there exists a linear function $\alpha_y + \beta_y x$ such that
$$f(x) \ge \alpha_y + \beta_y x \quad\text{for } x \in (a, b), \qquad f(y) = \alpha_y + \beta_y y.$$
If $f$ is differentiable at $y$, then the line is the tangent, with $\beta_y = f'(y)$. This gives a second proof of Jensen's inequality: let $y = E[X]$, $\alpha = \alpha_y$ and $\beta = \beta_y$. Then
$$E[f(X)] \ge E[\alpha + \beta X] = \alpha + \beta E[X] = f(E[X]).$$
4.2 Cauchy-Schwarz Inequality

Theorem 4.2. For any random variables $X$ and $Y$,
$$E[XY]^2 \le E[X^2]\, E[Y^2].$$

Proof. For $a \in \mathbb{R}$,
$$0 \le E\left[(aX - Y)^2\right] = a^2 E[X^2] - 2a\, E[XY] + E[Y^2],$$
a quadratic in $a$ with at most one real root, and therefore with discriminant $\le 0$. Hence
$$E[XY]^2 \le E[X^2]\, E[Y^2].$$

Corollary. $\left|\operatorname{Corr}(X, Y)\right| \le 1$.

Proof. Apply Cauchy-Schwarz to the random variables $X - E[X]$ and $Y - E[Y]$.
4.3 Markov's Inequality

Theorem 4.3 (Markov's Inequality). For any random variable $X$ and any $a > 0$,
$$P(|X| \ge a) \le \frac{E[|X|]}{a}.$$

Proof. Let $A = \{|X| \ge a\}$. Then $|X| \ge a\, I[A]$. Taking expectations,
$$E[|X|] \ge a\, P(A) = a\, P(|X| \ge a).$$
4.4 Chebyshev's Inequality

Theorem 4.4 (Chebyshev's Inequality). Let $X$ be a random variable with $E[X^2] < \infty$. Then for all $\epsilon > 0$,
$$P(|X| \ge \epsilon) \le \frac{E[X^2]}{\epsilon^2}.$$

Proof. For all $x$,
$$I[|x| \ge \epsilon] \le \frac{x^2}{\epsilon^2},$$
so $I[|X| \ge \epsilon] \le \frac{X^2}{\epsilon^2}$. Taking expectations,
$$P(|X| \ge \epsilon) \le \frac{E[X^2]}{\epsilon^2}.$$

Notes.
1. The result is distribution free: no assumption is made about the distribution of $X$ (other than $E[X^2] < \infty$).
2. It is the best possible inequality, in the following sense. Fix $c > 0$ and $\sigma^2 \le c^2$, and let
$$X = \begin{cases} +c & \text{with probability } \frac{\sigma^2}{2c^2}, \\ -c & \text{with probability } \frac{\sigma^2}{2c^2}, \\ 0 & \text{with probability } 1 - \frac{\sigma^2}{c^2}. \end{cases}$$
Then $E[X^2] = \sigma^2$ and
$$P(|X| \ge c) = \frac{\sigma^2}{c^2} = \frac{E[X^2]}{c^2},$$
so equality holds with $\epsilon = c$.
3. If $\mu = E[X]$, applying the inequality to $X - \mu$ gives
$$P(|X - \mu| \ge \epsilon) \le \frac{\operatorname{Var} X}{\epsilon^2}.$$
4.5 Law of Large Numbers

Theorem 4.5 (Weak Law of Large Numbers). Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2 < \infty$, and let
$$S_n = \sum_{i=1}^{n} X_i.$$
Then for all $\epsilon > 0$,
$$P\left(\left|\frac{S_n}{n} - \mu\right| \ge \epsilon\right) \to 0 \quad\text{as } n \to \infty.$$

Proof. By Chebyshev's inequality applied to $\frac{S_n}{n} - \mu$,
$$P\left(\left|\frac{S_n}{n} - \mu\right| \ge \epsilon\right) \le \frac{E\left[\left(\frac{S_n}{n} - \mu\right)^2\right]}{\epsilon^2} = \frac{E\left[(S_n - n\mu)^2\right]}{n^2 \epsilon^2} = \frac{\operatorname{Var} S_n}{n^2 \epsilon^2} = \frac{n\sigma^2}{n^2 \epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \to 0 \quad\text{as } n \to \infty.$$

Example. Let $A_1, A_2, \dots$ be independent events, each with probability $p$, and let $X_i = I[A_i]$. Then $\mu = E[I[A_i]] = P(A_i) = p$ and
$$\frac{S_n}{n} = \frac{n_A}{n} = \frac{\text{number of times } A \text{ occurs}}{\text{number of trials}},$$
so the theorem states that
$$P\left(\left|\frac{S_n}{n} - p\right| \ge \epsilon\right) \to 0 \quad\text{as } n \to \infty,$$
which recovers the intuitive (frequentist) definition of probability.

Example. A random sample of size $n$ is a sequence $X_1, X_2, \dots, X_n$ of independent identically distributed random variables ($n$ observations). The quantity
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
is called the sample mean. The theorem states that, provided the variance of $X_i$ is finite, the probability that the sample mean differs from the mean of the distribution by more than $\epsilon$ approaches 0 as $n \to \infty$.

We have shown the weak law of large numbers. Why "weak"? There is also a strong law of large numbers:
$$P\left(\frac{S_n}{n} \to \mu \text{ as } n \to \infty\right) = 1.$$
This is not the same as the weak form. What does it mean? Each $\omega \in \Omega$ determines a sequence
$$\frac{S_n(\omega)}{n}, \qquad n = 1, 2, \dots,$$
which either converges to $\mu$ or it doesn't; the strong law states that the set of $\omega$ for which it converges to $\mu$ has probability 1.
Chapter 5
Generating Functions
In this chapter, assume that $X$ is a random variable taking values in the range $0, 1, 2, \dots$, and let $p_r = P(X = r)$, $r = 0, 1, 2, \dots$.

Definition 5.1. The probability generating function (p.g.f.) of the random variable $X$, or of the distribution $p_r$, $r = 0, 1, 2, \dots$, is
$$p(z) = E\left[z^X\right] = \sum_{r=0}^{\infty} z^r\, P(X = r) = \sum_{r=0}^{\infty} p_r z^r.$$
This $p(z)$ is a polynomial or a power series. If a power series, then it is convergent for $|z| \le 1$ by comparison with a geometric series, since
$$|p(z)| \le \sum_r p_r |z|^r \le \sum_r p_r = 1.$$

Example. For a fair die, $p_r = \frac{1}{6}$ for $r = 1, \dots, 6$, and
$$p(z) = E\left[z^X\right] = \frac{z + z^2 + \dots + z^6}{6} = \frac{z}{6}\, \frac{1 - z^6}{1 - z}.$$

Theorem 5.1. The distribution of $X$ is uniquely determined by the p.g.f. $p(z)$.

Proof. We know that we can differentiate $p(z)$ term by term for $|z| < 1$:
$$p^{(i)}(z) = \sum_{r=i}^{\infty} \frac{r!}{(r-i)!}\, p_r z^{r-i},$$
and hence $p^{(i)}(0) = i!\, p_i$, so the distribution $p_0, p_1, \dots$ can be recovered from $p$.

Theorem 5.2 (Abel's Lemma).
$$E[X] = \lim_{z \to 1^-} p'(z).$$

Proof. For $0 < z < 1$,
$$p'(z) = \sum_{r=1}^{\infty} r p_r z^{r-1}$$
is a non-decreasing function of $z$, bounded above by $E[X] = \sum_r r p_r$, so $\lim_{z \to 1^-} p'(z) \le E[X]$. Choose $\epsilon > 0$ and $N$ such that $\sum_{r=1}^{N} r p_r \ge E[X] - \epsilon$. Then
$$\lim_{z \to 1^-} p'(z) \ge \lim_{z \to 1^-} \sum_{r=1}^{N} r p_r z^{r-1} = \sum_{r=1}^{N} r p_r \ge E[X] - \epsilon.$$
True for all $\epsilon > 0$, and so $\lim_{z \to 1^-} p'(z) = E[X]$ (the same argument covers the case $E[X] = \infty$). If $p'$ is continuous at 1, then $E[X] = p'(1)$.

Theorem 5.3.
$$E[X(X-1)] = \lim_{z \to 1^-} p''(z).$$

Proof.
$$p''(z) = \sum_{r=2}^{\infty} r(r-1)\, p_r z^{r-2},$$
and the proof is now the same as for Abel's Lemma.

Theorem 5.4. Suppose that $X_1, X_2, \dots, X_n$ are independent random variables with p.g.f.s $p_1(z), p_2(z), \dots, p_n(z)$. Then the p.g.f. of $X_1 + X_2 + \dots + X_n$ is $p_1(z)\, p_2(z) \cdots p_n(z)$.

Proof.
$$E\left[z^{X_1 + X_2 + \dots + X_n}\right] = E\left[z^{X_1} z^{X_2} \cdots z^{X_n}\right] = E\left[z^{X_1}\right] \cdots E\left[z^{X_n}\right] = p_1(z)\, p_2(z) \cdots p_n(z),$$
using independence.
Example. Suppose $X$ has a Poisson distribution,
$$P(X = r) = e^{-\lambda} \frac{\lambda^r}{r!}, \qquad r = 0, 1, \dots.$$
Then
$$E\left[z^X\right] = \sum_{r=0}^{\infty} z^r e^{-\lambda} \frac{\lambda^r}{r!} = e^{-\lambda} e^{\lambda z} = e^{-\lambda(1-z)}.$$
Differentiating,
$$p'(z) = \lambda e^{-\lambda(1-z)}, \qquad p''(z) = \lambda^2 e^{-\lambda(1-z)}.$$
Then
$$\operatorname{Var} X = E[X(X-1)] + E[X] - E[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$

Example. Suppose that $Y$ also has a Poisson distribution, with parameter $\mu$. If $X$ and $Y$ are independent, then
$$E\left[z^{X+Y}\right] = E\left[z^X\right] E\left[z^Y\right] = e^{-\lambda(1-z)} e^{-\mu(1-z)} = e^{-(\lambda+\mu)(1-z)}.$$
But this is the p.g.f. of a Poisson random variable with parameter $\lambda + \mu$. By uniqueness (the first theorem on p.g.f.s), this must be the distribution of $X + Y$.

Example. $X$ has a binomial distribution,
$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}, \qquad r = 0, 1, \dots, n,$$
so
$$E\left[z^X\right] = \sum_{r=0}^{n} \binom{n}{r} (pz)^r (1-p)^{n-r} = (pz + 1 - p)^n.$$
This factorization shows that $X = Y_1 + Y_2 + \dots + Y_n$, where $Y_1, \dots, Y_n$ are independent random variables with
$$P(Y_i = 1) = p, \qquad P(Y_i = 0) = 1 - p,$$
since $pz + 1 - p$ is the p.g.f. of each $Y_i$.

Note: if the p.g.f. factorizes, look to see if the random variable can be written as a sum.
5.1 Combinatorial Applications

Example. Tile a $(2 \times n)$ bathroom with $(2 \times 1)$ tiles. How many ways can this be done? Say $f_n$; by considering the left-most tiles (either one vertical, or two horizontal),
$$f_n = f_{n-1} + f_{n-2}, \qquad f_0 = f_1 = 1.$$
Let
$$F(z) = \sum_{n=0}^{\infty} f_n z^n.$$
Multiplying the recurrence by $z^n$ and summing,
$$\sum_{n=2}^{\infty} f_n z^n = \sum_{n=2}^{\infty} f_{n-1} z^n + \sum_{n=2}^{\infty} f_{n-2} z^n,$$
that is, $F(z) - f_0 - f_1 z = z(F(z) - f_0) + z^2 F(z)$. Since $f_0 = f_1 = 1$, this gives
$$F(z) = \frac{1}{1 - z - z^2}.$$
Writing
$$\alpha_1 = \frac{1 + \sqrt{5}}{2}, \qquad \alpha_2 = \frac{1 - \sqrt{5}}{2},$$
we have $1 - z - z^2 = (1 - \alpha_1 z)(1 - \alpha_2 z)$, and by partial fractions
$$F(z) = \frac{1}{(1 - \alpha_1 z)(1 - \alpha_2 z)} = \frac{1}{\alpha_1 - \alpha_2}\left(\frac{\alpha_1}{1 - \alpha_1 z} - \frac{\alpha_2}{1 - \alpha_2 z}\right) = \frac{1}{\alpha_1 - \alpha_2} \sum_{n=0}^{\infty} \left(\alpha_1^{n+1} - \alpha_2^{n+1}\right) z^n.$$
Therefore
$$f_n = \frac{\alpha_1^{n+1} - \alpha_2^{n+1}}{\alpha_1 - \alpha_2}.$$
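The recurrence and the closed form can be checked against each other (an added sketch):

```python
from math import sqrt

def f_rec(n):
    """f_n from the recurrence f_n = f_{n-1} + f_{n-2}, f_0 = f_1 = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def f_closed(n):
    """Closed form (alpha1^(n+1) - alpha2^(n+1)) / (alpha1 - alpha2)."""
    a1 = (1 + sqrt(5)) / 2
    a2 = (1 - sqrt(5)) / 2
    return round((a1 ** (n + 1) - a2 ** (n + 1)) / (a1 - a2))

print([f_rec(n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```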
5.2 Conditional Expectation

Let $X$ and $Y$ be discrete random variables with joint distribution $P(X = x, Y = y)$. Then
$$P(X = x) = \sum_{y \in R_Y} P(X = x, Y = y).$$
This is often called the marginal distribution for $X$. The conditional distribution for $X$ given $Y = y$ is
$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}.$$

Definition. The conditional expectation of $X$ given $Y = y$ is
$$E[X \mid Y = y] = \sum_{x \in R_X} x\, P(X = x \mid Y = y).$$
The conditional expectation of $X$ given $Y$ is the random variable $E[X \mid Y]$ defined by
$$E[X \mid Y](\omega) = E[X \mid Y = Y(\omega)],$$
so that $E[X \mid Y] : \Omega \to \mathbb{R}$.

Example. Let $X_1, X_2, \dots, X_n$ be independent identically distributed random variables with $P(X_1 = 1) = p$ and $P(X_1 = 0) = 1 - p$, and let $Y = X_1 + X_2 + \dots + X_n$. Then
$$P(X_1 = 1 \mid Y = r) = \frac{P(X_1 = 1, Y = r)}{P(Y = r)} = \frac{p \binom{n-1}{r-1} p^{r-1} (1-p)^{n-r}}{\binom{n}{r} p^r (1-p)^{n-r}} = \frac{\binom{n-1}{r-1}}{\binom{n}{r}} = \frac{r}{n}.$$
Then
$$E[X_1 \mid Y = r] = 0 \cdot P(X_1 = 0 \mid Y = r) + 1 \cdot P(X_1 = 1 \mid Y = r) = \frac{r}{n},$$
so $E[X_1 \mid Y = Y(\omega)] = \frac{1}{n} Y(\omega)$. Therefore
$$E[X_1 \mid Y] = \frac{1}{n} Y.$$
5.3 Properties of Conditional Expectation

Theorem (Tower Property).
$$E\left[E[X \mid Y]\right] = E[X].$$

Proof.
$$E\left[E[X \mid Y]\right] = \sum_{y \in R_Y} P(Y = y)\, E[X \mid Y = y] = \sum_{y} P(Y = y) \sum_{x \in R_X} x\, P(X = x \mid Y = y) = \sum_{x} \sum_{y} x\, P(X = x, Y = y) = E[X].$$

Theorem. Let $X_1, X_2, \dots$ be i.i.d. random variables with p.g.f. $p(z)$, and let $N$ be a random variable, independent of $X_1, X_2, \dots$, with p.g.f. $h(z)$. Then the p.g.f. of $X_1 + X_2 + \dots + X_N$ is $h(p(z))$.

Proof. Conditioning on $N$,
$$E\left[z^{X_1 + \dots + X_N}\right] = \sum_{n=0}^{\infty} E\left[z^{X_1 + \dots + X_N} \mid N = n\right] P(N = n) = \sum_{n=0}^{\infty} \left(p(z)\right)^n P(N = n) = h(p(z)).$$

Then, for example,
$$E[X_1 + \dots + X_N] = \left.\frac{d}{dz}\, h(p(z))\right|_{z=1} = h'(1)\, p'(1) = E[N]\, E[X_1].$$
Exercise: calculate $\left.\frac{d^2}{dz^2}\, h(p(z))\right|_{z=1}$ and hence $\operatorname{Var}(X_1 + \dots + X_N)$ in terms of $\operatorname{Var} N$ and $\operatorname{Var} X_1$.
5.4 Branching Processes

Let $X_n$ be the size of the $n$th generation of a population, with $X_0 = 1$ and
$$X_{n+1} = Y_1^n + Y_2^n + \dots + Y_{X_n}^n,$$
where the $Y_i^n$ are i.i.d. random variables, $Y_i^n$ being the number of offspring of individual $i$ in generation $n$. Write $f_k = P(Y_i^n = k)$ for the offspring distribution, and assume
1. $f_0 > 0$,
2. $f_0 + f_1 < 1$.

Let
$$F(z) = \sum_{k=0}^{\infty} f_k z^k = E\left[z^{Y_i^n}\right]$$
be the p.g.f. of the offspring distribution, and let
$$F_n(z) = E\left[z^{X_n}\right].$$
Then $F_1(z) = F(z)$.

Theorem 5.7. $F_{n+1}(z) = F_n(F(z)) = F(F(\cdots F(z) \cdots)) = F(F_n(z))$.

Proof. Conditioning on $X_n$,
$$F_{n+1}(z) = E\left[z^{X_{n+1}}\right] = \sum_{k=0}^{\infty} P(X_n = k)\, E\left[z^{X_{n+1}} \mid X_n = k\right] = \sum_{k=0}^{\infty} P(X_n = k)\, E\left[z^{Y_1^n + \dots + Y_k^n}\right] = \sum_{k=0}^{\infty} P(X_n = k)\, (F(z))^k = F_n(F(z)).$$

Theorem 5.8. Let $m = \sum_k k f_k$ and $\sigma^2 = \sum_k (k - m)^2 f_k$ be the mean and variance of the offspring distribution. Then
$$E[X_n] = m^n, \qquad \operatorname{Var} X_n = \begin{cases} \sigma^2 m^{n-1}\, \dfrac{m^n - 1}{m - 1} & m \ne 1, \\ n \sigma^2 & m = 1. \end{cases} \tag{5.1}$$

Proof. One can prove this by calculating $F_n'(z)$ and $F_n''(z)$. Alternatively,
$$E[X_n] = E\left[E[X_n \mid X_{n-1}]\right] = E[m X_{n-1}] = m\, E[X_{n-1}] = m^n$$
by induction. For the variance,
$$E\left[(X_n - m X_{n-1})^2 \mid X_{n-1}\right] = \operatorname{Var}(X_n \mid X_{n-1}) = \sigma^2 X_{n-1},$$
so taking expectations,
$$E[X_n^2] - 2m\, E[X_n X_{n-1}] + m^2 E[X_{n-1}^2] = \sigma^2 m^{n-1}.$$
Now
$$E[X_n X_{n-1}] = E\left[E[X_n X_{n-1} \mid X_{n-1}]\right] = E\left[X_{n-1}\, E[X_n \mid X_{n-1}]\right] = E\left[X_{n-1}\, m X_{n-1}\right] = m\, E[X_{n-1}^2],$$
so $E[X_n^2] = \sigma^2 m^{n-1} + m^2 E[X_{n-1}^2]$. Thus
$$\operatorname{Var} X_n = E[X_n^2] - E[X_n]^2 = m^2 E[X_{n-1}^2] + \sigma^2 m^{n-1} - m^2 E[X_{n-1}]^2 = m^2 \operatorname{Var} X_{n-1} + \sigma^2 m^{n-1}.$$
Iterating,
$$\operatorname{Var} X_n = m^4 \operatorname{Var} X_{n-2} + \sigma^2 (m^{n-1} + m^n) = \dots = m^{2(n-1)} \operatorname{Var} X_1 + \sigma^2 \left(m^{n-1} + m^n + \dots + m^{2n-3}\right) = \sigma^2 m^{n-1} \left(1 + m + \dots + m^{n-1}\right),$$
which gives the result on summing the geometric series.
To deal with extinction we need to take some care with limits. Let
$$A_n = \{X_n = 0\}$$
be the event that the population is extinct by generation $n$, and let $A$ be the event that it ever becomes extinct. Since $A_1 \subseteq A_2 \subseteq \cdots$ (once extinct, always extinct),
$$A = \lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n.$$
Define $B_1 = A_1$ and, for $n \ge 2$,
$$B_n = A_n \cap \left(\bigcup_{i=1}^{n-1} A_i\right)^c = A_n \cap A_{n-1}^c.$$
The $B_i$ are disjoint, with $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$ and $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i = A_n$. Hence
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(B_i) = \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} B_i\right) = \lim_{n \to \infty} P(A_n).$$
Thus the extinction probability is
$$q = P(A) = \lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} P(X_n = 0).$$
Note that $P(X_n = 0) = F_n(0)$, since $F_n$ is the p.g.f. of $X_n$. So
$$q = \lim_{n \to \infty} F_n(0).$$
Also, since $F$ is continuous,
$$F(q) = F\left(\lim_{n \to \infty} F_n(0)\right) = \lim_{n \to \infty} F(F_n(0)) = \lim_{n \to \infty} F_{n+1}(0) = q.$$
Thus $F(q) = q$. Alternatively, conditioning on the size of the first generation,
$$q = \sum_{k} P(X_1 = k)\, P(\text{extinction} \mid X_1 = k) = \sum_{k} P(X_1 = k)\, q^k = F(q),$$
since each of the $k$ families started by the first generation must die out, independently, each with probability $q$.

Theorem 5.9. The probability of extinction, $q$, is the smallest positive root of the equation $F(q) = q$. If $m$, the mean of the offspring distribution, satisfies $m \le 1$, then $q = 1$, while if $m > 1$ then $q < 1$.

Proof. By assumptions 1 and 2, $F$ is strictly increasing and strictly convex on $[0, 1]$, with $F(0) = f_0 > 0$ and $F(1) = 1$. Also
$$m = \sum_{k} k f_k = \lim_{z \to 1^-} F'(z).$$
If $m \le 1$, then $F(z) > z$ for all $z \in [0, 1)$, so the only root of $F(q) = q$ in $[0, 1]$ is $q = 1$. If $m > 1$, then by convexity $F(z) = z$ has exactly one root $\alpha$ in $[0, 1)$. Further, since $F$ is increasing, $F_n(0) \le \alpha$ implies $F_{n+1}(0) = F(F_n(0)) \le F(\alpha) = \alpha$, so by induction $F_n(0) \le \alpha$ for all $n$, and hence $q = \lim_n F_n(0) \le \alpha < 1$. Since $F(q) = q$, $q$ is a root, so $q = \alpha$: the smallest positive root.
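Since $q = \lim_n F_n(0)$, the extinction probability can be computed by simply iterating the offspring p.g.f. at 0. An added sketch; the two offspring distributions are illustrative:

```python
def extinction_prob(f, n_iter=200):
    """q = lim F_n(0): iterate the offspring p.g.f. starting from 0.
    f lists the offspring probabilities f_0, f_1, f_2, ..."""
    F = lambda z: sum(fk * z ** k for k, fk in enumerate(f))
    q = 0.0
    for _ in range(n_iter):
        q = F(q)
    return q

# mean m = 1.25 > 1: q is the smaller root of 0.25 + 0.25q + 0.5q^2 = q, i.e. 0.5
q_super = extinction_prob([0.25, 0.25, 0.5])
# mean m = 0.75 <= 1: extinction is certain
q_sub = extinction_prob([0.5, 0.25, 0.25])
print(q_super, q_sub)
```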
5.5 Random Walks

Let $X_1, X_2, \dots$ be i.i.d. random variables, and let
$$S_n = S_0 + X_1 + X_2 + \dots + X_n,$$
where usually $S_0 = 0$. Then $(S_n)_{n \ge 0}$ is a random walk. We shall assume that
$$X_n = \begin{cases} +1 & \text{with probability } p, \\ -1 & \text{with probability } q = 1 - p, \end{cases} \tag{5.2}$$
the simple random walk.
Example (Gambler's Ruin). You have an initial fortune of $A$ and I have an initial fortune of $B$. We toss coins repeatedly; I win with probability $p$ and you win with probability $q = 1 - p$. What is the probability that I bankrupt you before you bankrupt me?

Set $a = A + B$ and $z = B$, and consider a simple random walk started from $z$. Let $p_z$ be the probability that the random walk hits $a$ before it hits 0, starting from $z$, and let $q_z$ be the probability that it hits 0 before $a$, starting from $z$. After the first step, the walk is at either $z + 1$ or $z - 1$, with probability $p$ and $q$ respectively. From the law of total probability,
$$p_z = p\, p_{z+1} + q\, p_{z-1}, \qquad 0 < z < a,$$
with $p_0 = 0$ and $p_a = 1$. Trying $p_z = t^z$, we must solve $p t^2 - t + q = 0$, whose roots are
$$t = \frac{1 \pm \sqrt{1 - 4pq}}{2p} = \frac{1 \pm |1 - 2p|}{2p} = 1 \ \text{ or } \ \frac{q}{p}.$$
The general solution for $p \ne q$ is
$$p_z = A + B \left(\frac{q}{p}\right)^z,$$
and the boundary conditions ($A + B = 0$ from $p_0 = 0$) give
$$p_z = \frac{1 - \left(\frac{q}{p}\right)^z}{1 - \left(\frac{q}{p}\right)^a}.$$
If $p = q$, the general solution is $A + Bz$, giving
$$p_z = \frac{z}{a}.$$
To calculate $q_z$, observe that this is the same problem with $p$, $q$ and $z$ replaced by $q$, $p$ and $a - z$ respectively. Thus
$$q_z = \frac{\left(\frac{q}{p}\right)^a - \left(\frac{q}{p}\right)^z}{\left(\frac{q}{p}\right)^a - 1} \quad\text{if } p \ne q, \qquad\qquad q_z = \frac{a - z}{a} \quad\text{if } p = q.$$
Thus $q_z + p_z = 1$, and so, as we expected, the game ends with probability one.

What happens as $a \to \infty$?
$$P(\text{hits 0 ever}) = \lim_{a \to \infty} q_z = \begin{cases} \left(\frac{q}{p}\right)^z & p > q, \\ 1 & p \le q. \end{cases} \tag{5.3}$$
Let $G$ be the gain when the game ends, starting from $z$, so that
$$G = \begin{cases} a - z & \text{if the walk hits } a \text{ first}, \\ -z & \text{if the walk hits } 0 \text{ first}. \end{cases}$$
Then
$$E[G] = (a - z)\, p_z + (-z)\, q_z = a p_z - z. \tag{5.4}$$
If $p = q = \frac{1}{2}$, then $p_z = \frac{z}{a}$ and $E[G] = 0$: a fair game remains fair. If the coin is fair, then games based on it have expected reward 0.
Duration of a Game. Let $D_z$ be the expected time until the random walk hits 0 or $a$, starting from $z$. Is $D_z$ finite? $D_z$ is bounded above by $a$ times the mean of a geometric random variable (the number of windows of size $a$ before a window consisting entirely of $+1$'s or entirely of $-1$'s), hence $D_z$ is finite. Consider the first step. Then
$$E[\text{duration}] = E\left[E[\text{duration} \mid \text{first step}]\right] = p\left(E[\text{duration} \mid \text{first step up}]\right) + q\left(E[\text{duration} \mid \text{first step down}]\right) = p(1 + D_{z+1}) + q(1 + D_{z-1}),$$
so
$$D_z = 1 + p D_{z+1} + q D_{z-1}, \qquad 0 < z < a,$$
with $D_0 = D_a = 0$. Let's try for a particular solution $D_z = Cz$:
$$Cz = Cp(z + 1) + Cq(z - 1) + 1 \implies C = \frac{1}{q - p} \quad\text{for } p \ne q.$$
The complementary solutions are the solutions of $p t^2 - t + q = 0$, namely $t_1 = 1$ and $t_2 = \frac{q}{p}$, so the general solution for $p \ne q$ is
$$D_z = \frac{z}{q - p} + A + B\left(\frac{q}{p}\right)^z.$$
For $p = q$, try the particular solution $-z^2$; the general solution is
$$D_z = -z^2 + A + Bz.$$
Substituting the boundary conditions gives, for $p = q$,
$$D_z = z(a - z).$$

Example (initial capital $z$, target $a$):

p      q      z    a     P(ruin)   E[gain]   E[duration]
0.5    0.5    90   100   0.1       0         900
0.45   0.55   9    10    0.21      -1.1      11
0.45   0.55   90   100   0.87      -77       766
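The numbers in the table follow directly from the formulas above; this added sketch recomputes them:

```python
def ruin_prob(p, z, a):
    """q_z: probability of hitting 0 before a, starting from z."""
    q = 1 - p
    if p == q:
        return (a - z) / a
    r = q / p
    return (r ** a - r ** z) / (r ** a - 1)

def duration(p, z, a):
    """D_z: expected duration of the game."""
    q = 1 - p
    if p == q:
        return z * (a - z)
    pz = 1 - ruin_prob(p, z, a)       # probability of hitting a first
    return z / (q - p) - (a / (q - p)) * pz

for p, z, a in [(0.5, 90, 100), (0.45, 9, 10), (0.45, 90, 100)]:
    qz = ruin_prob(p, z, a)
    gain = a * (1 - qz) - z           # E[gain] = a*p_z - z
    print(p, z, a, round(qz, 2), round(gain, 1), round(duration(p, z, a)))
```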
Let $U_{z,n}$ be the probability that the walk, started from $z$, is absorbed at 0 at time $n$, and let
$$U_z(s) = \sum_{n=0}^{\infty} U_{z,n}\, s^n.$$
Conditioning on the first step, $U_z(s)$ must satisfy
$$U_z(s) = ps\, U_{z+1}(s) + qs\, U_{z-1}(s), \qquad 0 < z < a,$$
with $U_0(s) = 1$ and $U_a(s) = 0$. Trying $U_z(s) = (\lambda(s))^z$ gives
$$\lambda(s) = ps\, (\lambda(s))^2 + qs,$$
with two roots
$$\lambda_1(s),\ \lambda_2(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2ps}.$$
The general solution is
$$U_z(s) = A(s)\, (\lambda_1(s))^z + B(s)\, (\lambda_2(s))^z.$$
Substituting $U_0(s) = 1$ and $U_a(s) = 0$ gives $A(s) + B(s) = 1$ and $A(s)(\lambda_1(s))^a + B(s)(\lambda_2(s))^a = 0$, so
$$U_z(s) = \frac{(\lambda_1(s))^a (\lambda_2(s))^z - (\lambda_1(s))^z (\lambda_2(s))^a}{(\lambda_1(s))^a - (\lambda_2(s))^a}.$$
But $\lambda_1(s)\, \lambda_2(s) = \frac{q}{p}$ (recall the quadratic), so this simplifies to
$$U_z(s) = \left(\frac{q}{p}\right)^z \frac{(\lambda_1(s))^{a-z} - (\lambda_2(s))^{a-z}}{(\lambda_1(s))^a - (\lambda_2(s))^a}.$$
The same method gives the generating function for absorption probabilities at the other barrier. The generating function for the duration of the game is the sum of these two generating functions.
Chapter 6

Continuous Random Variables

So far we have assumed that $\Omega$ is finite or countable. Now consider, for example, spinning a pointer and recording the position at which it stops, with
$$P(\omega \in [0, \theta]) = \frac{\theta}{2\pi}, \qquad \theta \in [0, 2\pi).$$

Definition. A continuous random variable $X$ is a function $X : \Omega \to \mathbb{R}$ for which
$$P(a \le X(\omega) \le b) = \int_a^b f(x)\, dx,$$
where $f$ is a function satisfying
1. $f(x) \ge 0$,
2. $\int_{-\infty}^{+\infty} f(x)\, dx = 1$.

The function $f$ is called the probability density function (pdf). For example, if $X(\omega) = \omega$ is the position of the pointer, then $X$ is a continuous random variable with pdf
$$f(x) = \begin{cases} \frac{1}{2\pi} & x \in [0, 2\pi], \\ 0 & \text{otherwise}, \end{cases} \tag{6.1}$$
the uniform distribution on $[0, 2\pi]$.
Intuition about probability density functions is based on the approximate relation
$$P(X \in [x, x + \delta x]) = \int_x^{x + \delta x} f(z)\, dz \approx f(x)\, \delta x.$$

Definition. The distribution function of a random variable $X$ is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(z)\, dz$$
for a continuous random variable, so that
$$F'(x) = f(x)$$
(at any point $x$ where the fundamental theorem of calculus applies). The distribution function is also defined for a discrete random variable,
$$F(x) = \sum_{\omega : X(\omega) \le x} p_\omega,$$
in which case $F$ is a step function. In either case,
$$P(a < X \le b) = F(b) - F(a).$$

Example. The distribution function
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & 0 \le x < \infty, \\ 0 & x < 0, \end{cases} \tag{6.2}$$
has corresponding pdf
$$f(x) = \lambda e^{-\lambda x}, \qquad 0 \le x < \infty;$$
this is known as the exponential distribution with parameter $\lambda$. If $X$ has this distribution, then
$$P(X > x + z \mid X > z) = \frac{P(X > x + z)}{P(X > z)} = \frac{e^{-\lambda(x+z)}}{e^{-\lambda z}} = P(X > x).$$
This is known as the memoryless property of the exponential distribution.

Theorem 6.1. If $X$ is a continuous random variable with pdf $f(x)$, and $h(x)$ is a continuous, strictly increasing function with $h^{-1}(x)$ differentiable, then $h(X)$ is a continuous random variable with pdf
$$f_{h(X)}(x) = f\left(h^{-1}(x)\right)\, \frac{d}{dx} h^{-1}(x).$$

Proof. The distribution function of $h(X)$ is
$$P(h(X) \le x) = P\left(X \le h^{-1}(x)\right) = F\left(h^{-1}(x)\right),$$
since $h$ is strictly increasing. Differentiating with respect to $x$ shows that $h(X)$ is a continuous random variable with pdf as claimed. Note: it is usually easier to repeat this proof than to remember the result.

Example. Suppose $X \sim U[0, 1]$, the uniform distribution on $[0, 1]$, and let $Y = -\log X$. Then
$$P(Y \le y) = P(-\log X \le y) = P\left(X \ge e^{-y}\right) = \int_{e^{-y}}^{1} 1\, dx = 1 - e^{-y}.$$
Thus $Y$ is exponentially distributed (with parameter 1). More generally:

Theorem 6.2. Let $U \sim U[0, 1]$. For any continuous distribution function $F$, the random variable $X$ defined by $X = F^{-1}(U)$ has distribution function $F$.

Proof.
$$P(X \le x) = P\left(F^{-1}(U) \le x\right) = P(U \le F(x)) = F(x).$$

Remarks.
1. A similar result holds for discrete random variables: if $P(X = x_i) = p_i$, $i = 0, 1, \dots$, take $U \sim U[0, 1]$ and let $X = x_j$ if
$$\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i.$$
2. This is useful for simulations.
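Theorem 6.2 can be demonstrated directly. An added sketch of inverse-transform sampling for the exponential distribution, where $F^{-1}(u) = -\frac{\log(1-u)}{\lambda}$:

```python
import math
import random

def sample_exponential(lam):
    """F(x) = 1 - exp(-lam*x), so F^{-1}(u) = -log(1 - u) / lam."""
    u = random.random()
    return -math.log(1 - u) / lam

lam = 2.0
samples = [sample_exponential(lam) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 1/lam = 0.5
```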
6.1 Jointly Distributed Random Variables

For two random variables $X$ and $Y$, the joint distribution function is
$$F(x, y) = P(X \le x, Y \le y), \qquad F : \mathbb{R}^2 \to [0, 1].$$
Random variables $X_1, X_2, \dots, X_n$ are jointly distributed continuous random variables if
$$P\left((X_1, X_2, \dots, X_n) \in C\right) = \int \cdots \int_{(x_1, \dots, x_n) \in C} f(x_1, \dots, x_n)\, dx_1 \cdots dx_n$$
for some function $f$, called the joint probability density function, satisfying the obvious conditions:
1. $f(x_1, \dots, x_n) \ge 0$,
2. $\int \cdots \int_{\mathbb{R}^n} f(x_1, \dots, x_n)\, dx_1 \cdots dx_n = 1$.

Example ($n = 2$).
$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\, dv\, du,$$
and so
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y},$$
provided this is defined at $(x, y)$. If $X$ and $Y$ are jointly continuous random variables, then for $A \subseteq \mathbb{R}$,
$$P(X \in A) = \int_A \int_{-\infty}^{\infty} f(x, y)\, dy\, dx = \int_A f_X(x)\, dx,$$
where
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$
is the pdf of $X$ (its marginal density). Jointly continuous random variables $X_1, \dots, X_n$ are independent if
$$f(x_1, \dots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i),$$
where the $f_{X_i}(x_i)$ are the pdfs of the individual random variables.
Example. Two points $X$ and $Y$ are tossed at random and independently onto a line segment of length $L$. What is the probability that $|X - Y| \le l$?

By independence, the joint density is
$$f(x, y) = \frac{1}{L^2}, \qquad (x, y) \in [0, L]^2.$$
Writing $A = \{(x, y) : |x - y| \le l\}$,
$$P(|X - Y| \le l) = \iint_A f(x, y)\, dx\, dy = \frac{\text{area of } A}{L^2} = \frac{L^2 - 2 \cdot \frac{1}{2}(L - l)^2}{L^2} = \frac{2Ll - l^2}{L^2}.$$
Example (Buffon's Needle Problem). A needle of length $l$ is tossed at random onto a floor marked with parallel lines a distance $L$ apart, $l \le L$. What is the probability that the needle intersects one of the parallel lines?

Let $\Theta \in [0, \pi]$ be the angle between the needle and the parallel lines and let $X$ be the distance from the bottom of the needle to the line closest to it. It is reasonable to suppose that

$$X \sim U[0, L], \qquad \Theta \sim U[0, \pi],$$

and that $X$ and $\Theta$ are independent. Thus

$$f(x, \theta) = \frac{1}{\pi L}, \qquad 0 \le x \le L \text{ and } 0 \le \theta \le \pi.$$

The needle intersects a line if and only if $X \le l \sin \Theta$. Writing $A$ for this event,

$$P(A) = \iint_A f(x, \theta)\,dx\,d\theta = \int_0^{\pi} \frac{l \sin \theta}{\pi L}\,d\theta = \frac{2l}{\pi L}.$$
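Since $P(\text{intersection}) = 2l/(\pi L)$, counting crossings gives a Monte Carlo estimate of $\pi$. A minimal sketch (parameter names are ours):

```python
import math
import random

def buffon_estimate(n, l=1.0, L=1.0, seed=0):
    """Drop n needles; a needle crosses a line iff X <= l*sin(Theta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta = rng.uniform(0.0, math.pi)   # angle with the parallel lines
        x = rng.uniform(0.0, L)             # distance to the nearest line
        if x <= l * math.sin(theta):
            hits += 1
    # P(hit) = 2l/(pi*L), so pi is estimated by 2*l*n / (L*hits)
    return 2.0 * l * n / (L * hits)

print(buffon_estimate(100_000))  # roughly 3.14
```

The estimate converges slowly (error of order $1/\sqrt{n}$), which is the usual Monte Carlo rate.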
The expectation of a continuous random variable $X$ with pdf $f$ is

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx,$$

provided the integral converges absolutely.

Example (Normal distribution). Let

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$

We claim that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Substituting $y = \frac{x-\mu}{\sigma}$,

$$I = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy.$$

Thus, changing to polar coordinates,

$$I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz = \frac{1}{2\pi} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-\frac{y^2+z^2}{2}}\,dy\,dz = \frac{1}{2\pi} \int_0^{2\pi}\!\!\int_0^{\infty} r e^{-\frac{r^2}{2}}\,dr\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} d\theta = 1.$$

Therefore $I = 1$. A random variable with the pdf $f(x)$ given above has a Normal distribution with parameters $\mu$ and $\sigma^2$; we write this as $X \sim N(\mu, \sigma^2)$. The expectation is

$$E[X] = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (x - \mu)\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx + \mu\,\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx.$$

The first integral vanishes by symmetry and the second equals $\mu \cdot 1$, so $E[X] = 0 + \mu = \mu$.
Theorem. If $X$ is a continuous random variable then

$$E[X] = \int_0^{\infty} P(X \ge x)\,dx - \int_0^{\infty} P(X \le -x)\,dx.$$

Proof.

$$\int_0^{\infty} P(X \ge x)\,dx = \int_0^{\infty} \int_x^{\infty} f(y)\,dy\,dx = \int_0^{\infty}\!\int_0^{\infty} I[y \ge x]\, f(y)\,dy\,dx = \int_0^{\infty} \left( \int_0^{y} dx \right) f(y)\,dy = \int_0^{\infty} y f(y)\,dy.$$

Similarly

$$\int_0^{\infty} P(X \le -x)\,dx = -\int_{-\infty}^{0} y f(y)\,dy,$$

and the result follows since $E[X] = \int_{-\infty}^{0} y f(y)\,dy + \int_0^{\infty} y f(y)\,dy$.

Note: This holds for discrete random variables as well, and is useful as a general way of finding the expectation whether the random variable is discrete or continuous. If $X$ takes values in $\{0, 1, 2, \dots\}$ the theorem states

$$E[X] = \sum_{n=1}^{\infty} P(X \ge n),$$

since

$$\sum_{n=1}^{\infty} P(X \ge n) = \sum_{n=1}^{\infty} \sum_{m=0}^{\infty} I[m \ge n]\, P(X = m) = \sum_{m=0}^{\infty} P(X = m) \sum_{n=1}^{\infty} I[m \ge n] = \sum_{m=0}^{\infty} m\, P(X = m) = E[X].$$
Theorem 6.5. Let $X$ be a continuous random variable with pdf $f(x)$ and let $h(x)$ be a continuous real-valued function. Then, provided

$$\int_{-\infty}^{\infty} |h(x)|\, f(x)\,dx < \infty,$$

$$E\big[h(X)\big] = \int_{-\infty}^{\infty} h(x) f(x)\,dx.$$

Proof.

$$\int_0^{\infty} P\big(h(X) \ge y\big)\,dy = \int_0^{\infty} \left[ \int_{x : h(x) \ge y} f(x)\,dx \right] dy = \int_{x : h(x) \ge 0} \left[ \int_0^{h(x)} dy \right] f(x)\,dx = \int_{x : h(x) \ge 0} h(x) f(x)\,dx.$$

Similarly

$$\int_0^{\infty} P\big(h(X) \le -y\big)\,dy = -\int_{x : h(x) < 0} h(x) f(x)\,dx.$$

So the result follows from the last theorem.

Definition 6.3. The variance of a continuous random variable $X$ is

$$\operatorname{Var} X = E\big[(X - E[X])^2\big].$$
Note: The properties of expectation and variance are the same for discrete and continuous random variables; just replace $\sum$ by $\int$ in the proofs. For example,

$$\operatorname{Var} X = E[X^2] - E[X]^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left( \int_{-\infty}^{\infty} x f(x)\,dx \right)^2.$$

Example. Suppose $X \sim N(\mu, \sigma^2)$. Let $Z = \frac{X-\mu}{\sigma}$. Then

$$P(Z \le z) = P\left( \frac{X-\mu}{\sigma} \le z \right) = P(X \le \mu + \sigma z) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\mu + \sigma z} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{u^2}{2}}\,du,$$

so $Z \sim N(0, 1)$. What is the variance of $X$? Writing $X = \mu + \sigma Z$,

$$\operatorname{Var} X = E\big[(X - \mu)^2\big] = \sigma^2 E[Z^2] = \sigma^2,$$

since $E[Z^2] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-\frac{z^2}{2}}\,dz = 1$ (integrate by parts).
Suppose $(X_1, X_2, \dots, X_n)$ has joint pdf $f$ and that

$$P\big((X_1, X_2, \dots, X_n) \in R\big) = 1$$

for some region $R \subseteq \mathbb{R}^n$. Let $S$ be the image of $R$ under a transformation, and suppose the transformation from $R$ to $S$ is 1-1 (bijective), with inverse $x_i = s_i(y_1, \dots, y_n)$ for each point $(y_1, y_2, \dots, y_n)$ in $S$. The Jacobian is

$$J = \det \begin{pmatrix} \frac{\partial s_1}{\partial y_1} & \cdots & \frac{\partial s_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial s_n}{\partial y_1} & \cdots & \frac{\partial s_n}{\partial y_n} \end{pmatrix}. \tag{6.3}$$

If $A \subseteq R$ has image $B \subseteq S$, then

$$P\big((X_1, \dots, X_n) \in A\big)\ [1] = \int \!\cdots\! \int_A f(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \int \!\cdots\! \int_B f\big(s_1(y), \dots, s_n(y)\big)\, |J|\,dy_1 \cdots dy_n = P\big((Y_1, \dots, Y_n) \in B\big)\ [2].$$

Since the transformation is 1-1, [1] and [2] are the same. Thus the density for $Y_1, \dots, Y_n$ is

$$g(y_1, \dots, y_n) = f\big(s_1(y_1, \dots, y_n), \dots, s_n(y_1, \dots, y_n)\big)\, |J|, \qquad (y_1, \dots, y_n) \in S.$$
Example. Suppose $(X, Y)$ has density

$$f(x, y) = 4xy \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1; \qquad 0 \text{ otherwise}. \tag{6.4}$$

Let $U = X/Y$ and $V = XY$. Then

$$X = \sqrt{UV}, \quad x = \sqrt{uv}, \qquad Y = \sqrt{V/U}, \quad y = \sqrt{v/u},$$

$$\frac{\partial x}{\partial u} = \frac{1}{2}\sqrt{\frac{v}{u}}, \quad \frac{\partial x}{\partial v} = \frac{1}{2}\sqrt{\frac{u}{v}}, \qquad \frac{\partial y}{\partial u} = -\frac{1}{2}\sqrt{\frac{v}{u^3}}, \quad \frac{\partial y}{\partial v} = \frac{1}{2\sqrt{uv}}.$$

Therefore $|J| = \frac{1}{2u}$, and so $(U, V)$ has density

$$g(u, v) = 4\sqrt{uv}\,\sqrt{\frac{v}{u}}\,\frac{1}{2u} = \frac{2v}{u} \quad \text{if } (u, v) \in D; \qquad 0 \text{ otherwise},$$

that is, $g(u, v) = \frac{2v}{u}\, I\big[(u, v) \in D\big]$, where $D$ is the image of $[0,1]^2$ under the transformation. Note that $U$ and $V$ are not independent, since $g(u, v)$ is not the product of the two marginal densities.

When the transformations are linear things are simpler still. Let $A$ be an $n \times n$ invertible matrix and set

$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = A \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}.$$

Then

$$|J| = \left| \det A^{-1} \right| = \left| \det A \right|^{-1},$$

and the pdf of $(Y_1, \dots, Y_n)$ is

$$g(y_1, \dots, y_n) = \frac{f\big(A^{-1}\mathbf{y}\big)}{|\det A|}.$$
Example. Suppose $X_1, X_2$ have the joint pdf $f(x_1, x_2)$. Calculate the pdf of $X_1 + X_2$. Let $Y = X_1 + X_2$ and $Z = X_2$. Then $X_1 = Y - Z$ and $X_2 = Z$, so

$$A^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad \left| \det A^{-1} \right| = 1.$$

Thus $g(y, z) = f(y - z, z)$, and the pdf of $Y$ is

$$g(y) = \int_{-\infty}^{\infty} f(y - z, z)\,dz, \tag{6.5}$$

or equivalently

$$g(y) = \int_{-\infty}^{\infty} f(z, y - z)\,dz, \qquad -\infty < y < \infty.$$
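As a concrete check of the convolution formula: for independent $U[0,1]$ variables it gives the triangular density, $g(y) = y$ on $[0,1]$ and $g(y) = 2 - y$ on $[1,2]$. A crude midpoint Riemann sum (illustrative only) agrees:

```python
def f_uniform(x):
    """pdf of U[0,1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_at(y, steps=100_000):
    """g(y) = integral over z in [0,1] of f(y - z) * f(z) dz, via a midpoint sum."""
    h = 1.0 / steps
    return sum(f_uniform(y - (k + 0.5) * h) * f_uniform((k + 0.5) * h)
               for k in range(steps)) * h

print(convolve_at(0.5))  # ~0.5, matching g(y) = y at y = 0.5
print(convolve_at(1.5))  # ~0.5, matching g(y) = 2 - y at y = 1.5
```

The integration range can be restricted to $[0,1]$ because $f(z)$ vanishes outside it.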
Definition. $\hat{x}$ is a mode of $X$ if $f(\hat{x}) \ge f(x)$ for all $x$; $\hat{x}$ is a median if

$$\int_{-\infty}^{\hat{x}} f(x)\,dx = \frac{1}{2} = \int_{\hat{x}}^{\infty} f(x)\,dx.$$

For a discrete random variable, $\hat{x}$ is a median if

$$P(X \le \hat{x}) \ge \frac{1}{2} \quad \text{and} \quad P(X \ge \hat{x}) \ge \frac{1}{2}.$$

If $X_1, \dots, X_n$ is a sample from the distribution then recall that the sample mean is

$$\frac{1}{n} \sum_{i=1}^{n} X_i.$$

Let $Y_1, \dots, Y_n$ (the order statistics) be the values of $X_1, \dots, X_n$ arranged in increasing order. Then the sample median is $Y_{\frac{n+1}{2}}$ if $n$ is odd, or any value in

$$\left[ Y_{\frac{n}{2}},\ Y_{\frac{n}{2}+1} \right]$$

if $n$ is even.
The distribution function of $Y_n = \max\{X_1, \dots, X_n\}$ is $P(Y_n \le y) = (F(y))^n$, so its pdf is

$$n\,(F(y))^{n-1} f(y).$$

Similarly $Y_1 = \min\{X_1, \dots, X_n\}$ has pdf $n\,(1 - F(y))^{n-1} f(y)$. What is the joint density of $Y_1$ and $Y_n$?

$$\begin{aligned} G(y_1, y_n) &= P(Y_1 \le y_1,\, Y_n \le y_n) \\ &= P(Y_n \le y_n) - P(Y_n \le y_n,\, Y_1 > y_1) \\ &= P(Y_n \le y_n) - P(y_1 < X_1 \le y_n,\ y_1 < X_2 \le y_n,\ \dots,\ y_1 < X_n \le y_n) \\ &= (F(y_n))^n - (F(y_n) - F(y_1))^n. \end{aligned}$$

Thus the pdf of $(Y_1, Y_n)$ is

$$g(y_1, y_n) = \frac{\partial^2}{\partial y_1\,\partial y_n} G(y_1, y_n) = n(n-1)\,(F(y_n) - F(y_1))^{n-2} f(y_1) f(y_n), \qquad -\infty < y_1 \le y_n < \infty,$$

and $g(y_1, y_n) = 0$ otherwise.

What happens if the mapping is not 1-1? Consider $X \mapsto |X|$: for $0 \le a < b$,

$$P\big(|X| \in (a, b)\big) = \int_a^b \big( f(x) + f(-x) \big)\,dx,$$

so $|X|$ has density $f(x) + f(-x)$ for $x \ge 0$.

Suppose $X_1, \dots, X_n$ are iidrvs. What is the joint pdf of $Y_1, \dots, Y_n$, the order statistics?

$$g(y_1, \dots, y_n) = \begin{cases} n!\, f(y_1) \cdots f(y_n) & y_1 \le y_2 \le \cdots \le y_n \\ 0 & \text{otherwise.} \end{cases} \tag{6.6}$$
Example. Suppose $X_1, \dots, X_n$ are iidrvs, each exponentially distributed with parameter $\lambda$, and let $Y_1, \dots, Y_n$ be the order statistics. Define the spacings $Z_1 = Y_1$ and $Z_i = Y_i - Y_{i-1}$ for $i = 2, \dots, n$. What is the joint density of the $Z$'s? We have $Z = AY$, where

$$A = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}. \tag{6.7}$$

Since $\det(A) = 1$ and $y_1 + \cdots + y_n = n z_1 + (n-1) z_2 + \cdots + z_n$,

$$h(z_1, \dots, z_n) = g(y_1, \dots, y_n) = n!\, f(y_1) \cdots f(y_n) = n!\, \lambda^n e^{-\lambda(y_1 + \cdots + y_n)} = n!\, \lambda^n e^{-\lambda(n z_1 + (n-1) z_2 + \cdots + z_n)} = \prod_{i=1}^{n} \lambda i\, e^{-\lambda i z_{n+1-i}}.$$

Thus $h(z_1, \dots, z_n)$ is expressed as the product of $n$ density functions, with $Z_{n+1-i} \sim \exp(\lambda i)$ (exponentially distributed with parameter $\lambda i$) and $Z_1, \dots, Z_n$ independent.
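The key algebraic step is the identity $y_1 + \cdots + y_n = n z_1 + (n-1) z_2 + \cdots + z_n$ when $y_i = z_1 + \cdots + z_i$; a quick numerical check with some hypothetical spacings:

```python
def check_spacing_identity(z):
    """Given spacings z_i, rebuild y_i = z_1 + ... + z_i and compare
    sum(y_i) with the weighted sum n*z_1 + (n-1)*z_2 + ... + 1*z_n."""
    n = len(z)
    y, running = [], 0.0
    for zi in z:
        running += zi
        y.append(running)
    lhs = sum(y)
    rhs = sum((n - i) * zi for i, zi in enumerate(z))  # weights n, n-1, ..., 1
    return lhs, rhs

lhs, rhs = check_spacing_identity([0.3, 1.1, 0.2, 0.7])
assert abs(lhs - rhs) < 1e-12
```

Each $z_i$ appears in exactly $n + 1 - i$ of the partial sums $y_i, y_{i+1}, \dots, y_n$, which is all the identity says.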
Example. Let $X$ and $Y$ be independent $N(0,1)$ random variables, and let

$$D = R^2 = X^2 + Y^2, \qquad \tan \Theta = Y/X,$$

so that

$$d = x^2 + y^2, \qquad \theta = \arctan \frac{y}{x}.$$

Then

$$|J| = \left| \det \begin{pmatrix} 2x & 2y \\ \frac{-y/x^2}{1 + (y/x)^2} & \frac{1/x}{1 + (y/x)^2} \end{pmatrix} \right| = \frac{2x^2}{x^2 + y^2} + \frac{2y^2}{x^2 + y^2} = 2. \tag{6.8}$$

The joint density of $X$ and $Y$ is

$$f(x, y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\, \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} = \frac{1}{2\pi} e^{-\frac{x^2 + y^2}{2}}.$$

Thus

$$g(d, \theta) = \frac{1}{4\pi} e^{-\frac{d}{2}}, \qquad 0 \le d < \infty,\ 0 \le \theta \le 2\pi.$$

Since this factorises as

$$g_D(d) = \frac{1}{2} e^{-\frac{d}{2}},\ 0 \le d < \infty, \qquad g_\Theta(\theta) = \frac{1}{2\pi},\ 0 \le \theta \le 2\pi,$$

$D$ and $\Theta$ are independent, with $D$ exponentially distributed with mean 2 and $\Theta \sim U[0, 2\pi]$.

Note this is useful for the simulation of normal random variables. We know we can simulate a random variable with continuous distribution function $F$ by $X = F^{-1}(U)$ with $U \sim U[0,1]$, but this is difficult for an $N(0,1)$ random variable since

$$F(x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\,dz$$

has no closed-form inverse. Instead, let $R^2 = -2 \log U_1$, so that $R^2$ is exponentially distributed with mean 2, and let $\Theta = 2\pi U_2$, so that $\Theta \sim U[0, 2\pi]$. Now let

$$X = R \cos \Theta = \sqrt{-2 \log U_1}\, \cos(2\pi U_2), \qquad Y = R \sin \Theta = \sqrt{-2 \log U_1}\, \sin(2\pi U_2).$$

Then $X$ and $Y$ are independent $N(0,1)$ random variables.
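This construction is the Box–Muller method, and it translates directly into code. A sketch, with a sample-moment check on the output:

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent U(0,1] samples to two independent N(0,1) samples."""
    r = math.sqrt(-2.0 * math.log(u1))  # R^2 = -2 log U1 is Exp with mean 2
    theta = 2.0 * math.pi * u2          # Theta is U[0, 2*pi]
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(1)
samples = []
for _ in range(50_000):
    # 1 - random() lies in (0, 1], so log() never sees zero
    x, y = box_muller(1.0 - rng.random(), rng.random())
    samples.extend([x, y])

mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean ** 2
print(round(mean, 2), round(var, 2))  # close to 0 and 1
```

Note the guard `1.0 - rng.random()`: Python's `random()` returns values in $[0, 1)$, and $\log 0$ would fail.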
Example (Bertrand's Paradox). Calculate the probability that a random chord of a circle of radius 1 has length greater than $\sqrt{3}$, the length of the side of an inscribed equilateral triangle. There are at least 3 interpretations of a random chord.

(1) The ends are independently and uniformly distributed over the circumference. Then

$$\text{answer} = \frac{1}{3}.$$

(2) The chord is perpendicular to a given diameter and the point of intersection is uniformly distributed over the diameter. The chord at distance $a$ from the centre is longer than $\sqrt{3}$ if and only if

$$a^2 + \left( \frac{\sqrt{3}}{2} \right)^2 \le 1,$$

i.e. $|a| \le \frac{1}{2}$, so

$$\text{answer} = \frac{1}{2}.$$

(3) The foot of the perpendicular to the chord from the centre of the circle is uniformly distributed over the interior of the circle. The chord is longer than $\sqrt{3}$ if and only if its foot lies within a concentric circle of radius $\frac{1}{2}$, so

$$\text{answer} = \frac{\pi \left( \frac{1}{2} \right)^2}{\pi \cdot 1^2} = \frac{1}{4}.$$
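The three interpretations really do give different answers, as a Monte Carlo sketch makes vivid (the sampling constructions below are our own encodings of (1)–(3)):

```python
import math
import random

def bertrand(n=200_000, seed=2):
    """Estimate P(chord length > sqrt(3)) under the three interpretations."""
    rng = random.Random(seed)
    c1 = c2 = c3 = 0
    target = math.sqrt(3.0)
    for _ in range(n):
        # (1) two uniform endpoints on the circumference; length = 2 sin(|a-b|/2)
        a, b = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
        if 2.0 * math.sin(abs(a - b) / 2.0) > target:
            c1 += 1
        # (2) intersection point uniform on a diameter; length = 2 sqrt(1 - d^2)
        d = rng.uniform(-1.0, 1.0)
        if 2.0 * math.sqrt(1.0 - d * d) > target:
            c2 += 1
        # (3) foot of perpendicular uniform over the disc; radius has density 2r
        r = math.sqrt(rng.random())
        if 2.0 * math.sqrt(1.0 - r * r) > target:
            c3 += 1
    return c1 / n, c2 / n, c3 / n

print(bertrand())  # roughly (1/3, 1/2, 1/4)
```

The paradox is resolved by noticing that "uniformly random chord" is not a single well-defined distribution.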
Definition. The moment generating function of a random variable $X$ is

$$m(\theta) = E\big[ e^{\theta X} \big],$$

provided $m(\theta)$ is finite. If $X$ has a density $f(x)$, then

$$m(\theta) = \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx.$$

Theorem 6.6. The moment generating function determines the distribution of $X$, provided $m(\theta)$ is finite for some interval containing the origin.

Proof. Not proved.

Theorem 6.7. If $X$ and $Y$ are independent random variables with moment generating functions $m_X(\theta)$ and $m_Y(\theta)$, then $X + Y$ has the moment generating function

$$m_{X+Y}(\theta) = m_X(\theta)\, m_Y(\theta).$$

Proof.

$$E\big[ e^{\theta(X+Y)} \big] = E\big[ e^{\theta X} e^{\theta Y} \big] = E\big[ e^{\theta X} \big]\, E\big[ e^{\theta Y} \big] = m_X(\theta)\, m_Y(\theta).$$

Theorem 6.8. The $r$th moment of $X$, i.e. the expected value of $X^r$, $E[X^r]$, is the coefficient of $\frac{\theta^r}{r!}$ in the series expansion of $m(\theta)$.

Proof. Sketch:

$$e^{\theta X} = 1 + \theta X + \frac{\theta^2}{2!} X^2 + \cdots$$

$$E\big[ e^{\theta X} \big] = 1 + \theta E[X] + \frac{\theta^2}{2!} E[X^2] + \cdots$$

Example. Suppose $X$ is exponentially distributed with parameter $\lambda$. Then

$$m(\theta) = E\big[ e^{\theta X} \big] = \int_0^{\infty} e^{\theta x}\, \lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{-(\lambda - \theta)x}\,dx = \frac{\lambda}{\lambda - \theta} \qquad \text{for } \theta < \lambda.$$

Hence

$$E[X] = m'(0) = \left. \frac{\lambda}{(\lambda - \theta)^2} \right|_{\theta = 0} = \frac{1}{\lambda}, \qquad E[X^2] = m''(0) = \left. \frac{2\lambda}{(\lambda - \theta)^3} \right|_{\theta = 0} = \frac{2}{\lambda^2}.$$

Thus

$$\operatorname{Var} X = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
Example. Suppose $X_1, \dots, X_n$ are iidrvs, each exponentially distributed with parameter $\lambda$. Claim: $X_1 + \cdots + X_n$ has a gamma distribution $\Gamma(n, \lambda)$ with parameters $n$ and $\lambda$, that is, with density

$$\frac{\lambda^n e^{-\lambda x} x^{n-1}}{(n-1)!}, \qquad 0 \le x < \infty.$$

(We can check that this is a density by integrating it by parts repeatedly and showing that it equals 1.) Now

$$E\big[ e^{\theta(X_1 + \cdots + X_n)} \big] = E\big[ e^{\theta X_1} \big] \cdots E\big[ e^{\theta X_n} \big] = \left( E\big[ e^{\theta X_1} \big] \right)^n = \left( \frac{\lambda}{\lambda - \theta} \right)^n.$$

Suppose that $Y \sim \Gamma(n, \lambda)$. Then

$$E\big[ e^{\theta Y} \big] = \int_0^{\infty} e^{\theta y}\, \frac{\lambda^n e^{-\lambda y} y^{n-1}}{(n-1)!}\,dy = \left( \frac{\lambda}{\lambda - \theta} \right)^n \int_0^{\infty} \frac{(\lambda - \theta)^n e^{-(\lambda - \theta)y} y^{n-1}}{(n-1)!}\,dy = \left( \frac{\lambda}{\lambda - \theta} \right)^n,$$

since the final integrand is the $\Gamma(n, \lambda - \theta)$ density and so integrates to 1. Hence the claim, since the moment generating function characterizes the distribution.
Example (Normal Distribution). Let $X \sim N(\mu, \sigma^2)$. Substituting $z = \frac{x - \mu}{\sigma}$ and completing the square,

$$E\big[ e^{\theta X} \big] = \int_{-\infty}^{\infty} e^{\theta x}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = e^{\mu\theta + \frac{\sigma^2 \theta^2}{2}}.$$

Theorem. Suppose $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent. Then:

1. $X + Y \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$;
2. $aX \sim N(a\mu_1,\ a^2\sigma_1^2)$.

Proof.

1. $$E\big[ e^{\theta(X+Y)} \big] = E\big[ e^{\theta X} \big] E\big[ e^{\theta Y} \big] = e^{\mu_1\theta + \frac{\sigma_1^2\theta^2}{2}}\, e^{\mu_2\theta + \frac{\sigma_2^2\theta^2}{2}} = e^{(\mu_1 + \mu_2)\theta + \frac{(\sigma_1^2 + \sigma_2^2)\theta^2}{2}},$$ which is the mgf of $N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$.

2. $$E\big[ e^{\theta a X} \big] = m_X(a\theta) = e^{\mu_1 a\theta + \frac{\sigma_1^2 a^2 \theta^2}{2}},$$ which is the mgf of $N(a\mu_1,\ a^2\sigma_1^2)$.
Suppose the $X_i$ are iidrvs with mean $\mu$ and variance $\sigma^2$. Then

$$\operatorname{Var}(X_1 + \cdots + X_n) = n\sigma^2, \qquad \operatorname{Var}\left( \frac{X_1 + \cdots + X_n}{n} \right) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}\left( \frac{X_1 + \cdots + X_n}{\sqrt{n}} \right) = \sigma^2.$$

Theorem (Central limit theorem). Let $X_1, \dots, X_n$ be iidrvs with $E[X_i] = \mu$ and $\operatorname{Var} X_i = \sigma^2 < \infty$, and let

$$S_n = \sum_{i=1}^{n} X_i.$$

Then for all $(a, b)$ such that $-\infty \le a \le b \le \infty$,

$$\lim_{n \to \infty} P\left( a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b \right) = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\,dz,$$

the right-hand side being the probability that an $N(0,1)$ random variable lies in $(a, b)$.
Proof. The mgf of $X_i$ is

$$m_{X_i}(\theta) = E\big[ e^{\theta X_i} \big] = 1 + \theta E[X_i] + \frac{\theta^2}{2!} E[X_i^2] + \frac{\theta^3}{3!} E[X_i^3] + \cdots$$

Without loss of generality take $\mu = 0$ and $\sigma^2 = 1$ (we can replace $X_i$ by $\frac{X_i - \mu}{\sigma}$). The mgf of $\frac{S_n}{\sqrt{n}}$ is

$$E\left[ e^{\frac{\theta S_n}{\sqrt{n}}} \right] = E\left[ e^{\frac{\theta}{\sqrt{n}}(X_1 + \cdots + X_n)} \right] = E\left[ e^{\frac{\theta}{\sqrt{n}} X_1} \right] \cdots E\left[ e^{\frac{\theta}{\sqrt{n}} X_n} \right] = \left( m_{X_1}\!\left( \frac{\theta}{\sqrt{n}} \right) \right)^n = \left( 1 + \frac{\theta^2}{2n} + \frac{\theta^3 E[X^3]}{3!\, n^{3/2}} + \cdots \right)^n \to e^{\frac{\theta^2}{2}} \quad \text{as } n \to \infty,$$

which is the mgf of an $N(0,1)$ random variable.

Note: if $S_n \sim \operatorname{Bin}(n, p)$, so that $X_i = 1$ with probability $p$ and $X_i = 0$ with probability $1 - p$, then

$$\frac{S_n - np}{\sqrt{npq}} \simeq N(0, 1).$$

This is called the normal approximation to the binomial distribution. It applies as $n \to \infty$ with $p$ constant. Earlier we discussed the Poisson approximation to the binomial, which applies when $n \to \infty$ with $np$ constant.

Example. There are two competing airlines; $n$ passengers each select one of the two planes at random. The number of passengers in plane one is

$$S \sim \operatorname{Bin}\!\left( n, \tfrac{1}{2} \right).$$

Suppose each plane has $s$ seats and let

$$f(s) = P(S > s) = P\left( \frac{S - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} > \frac{s - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} \right) \simeq 1 - \Phi\!\left( \frac{2s - n}{\sqrt{n}} \right).$$

If $n = 1000$ and $s = 537$ then $f(s) \approx 0.01$: the planes hold 1074 seats in total, only 74 in excess of the number of passengers.
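The airline figure can be reproduced with the normal approximation, using `math.erf` to build $\Phi$ (a sketch; the 0.01 value is the one quoted above):

```python
import math

def phi(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def overflow_prob(n, s):
    """Normal approximation to P(Bin(n, 1/2) > s): standardise with
    mean n/2 and standard deviation sqrt(n)/2."""
    return 1.0 - phi((s - n / 2.0) / (math.sqrt(n) / 2.0))

print(round(overflow_prob(1000, 537), 3))  # ~0.01
```

A continuity correction (using $s + \tfrac{1}{2}$) would sharpen the approximation slightly, but the notes' rougher version already matches to two decimal places.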
Example. An unknown fraction $p$ of the electorate vote Labour. It is desired to find $p$ with an error not exceeding 0.005. How large should the sample be?

Let the fraction of Labour voters in the sample be $p'$. We can never be certain (without complete enumeration) that $|p' - p| \le 0.005$. Instead choose $n$ so that the event $|p' - p| \le 0.005$ has probability $\ge 0.95$. Writing $S_n$ for the number of Labour voters in the sample, so that $p' = S_n/n$,

$$P\big( |p' - p| \le 0.005 \big) = P\left( \frac{|S_n - np|}{\sqrt{npq}} \le 0.005\sqrt{\frac{n}{pq}} \right) \simeq \int_{-0.005\sqrt{n/pq}}^{0.005\sqrt{n/pq}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx.$$

Since

$$\int_{-1.96}^{1.96} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx = 2\Phi(1.96) - 1 = 0.95,$$

we must choose $n$ so that $0.005\sqrt{n/pq} \ge 1.96$. In the worst case $p = q = \frac{1}{2}$, so $pq \le \frac{1}{4}$ and it suffices that

$$0.005 \times 2\sqrt{n} \ge 1.96, \qquad \text{i.e. } n \ge \left( \frac{1.96}{0.01} \right)^2 = 38416.$$
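The worst-case bound $pq \le \frac{1}{4}$ turns the sample-size question into a one-line calculation:

```python
import math

def sample_size(error=0.005, z=1.96):
    """Required n with z * sqrt(pq/n) <= error in the worst case pq = 1/4,
    i.e. n = (z / (2 * error))^2, rounded to the nearest integer."""
    return round((z / (2.0 * error)) ** 2)

print(sample_size())  # 38416
```

Halving the permitted error quadruples the required sample, since $n$ grows like $1/\text{error}^2$.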
Finally, consider the multivariate normal distribution. Write

$$\tilde{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}$$

for a vector of independent $N(0,1)$ random variables, and let

$$\tilde{Z} = \tilde{\mu} + A\tilde{X},$$

where $A$ is an invertible matrix, so that $\tilde{x} = A^{-1}(\tilde{z} - \tilde{\mu})$. The density of $\tilde{Z}$ is

$$f(z_1, \dots, z_n) = \frac{1}{(2\pi)^{n/2}\, |\det A|}\, e^{-\frac{1}{2} \left( A^{-1}(\tilde{z} - \tilde{\mu}) \right)^{T} \left( A^{-1}(\tilde{z} - \tilde{\mu}) \right)} = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2} (\tilde{z} - \tilde{\mu})^{T} \Sigma^{-1} (\tilde{z} - \tilde{\mu})},$$

where $\Sigma = A A^{T}$ is the covariance matrix. This is the multivariate normal (MVN) distribution.

If the covariance matrix of the MVN distribution is diagonal, say

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix},$$

then the components of the random vector $\tilde{Z}$ are independent, since the density factorises:

$$f(z_1, \dots, z_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{1}{2} \left( \frac{z_i - \mu_i}{\sigma_i} \right)^2}.$$

In the bivariate case ($n = 2$) the density can be written

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\!\left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\},$$

with $\sigma_1, \sigma_2 > 0$ and $-1 \le \rho \le +1$. Here

$$\Sigma^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\ \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

and $\rho$ is the correlation: $E\big[ (X_1 - \mu_1)(X_2 - \mu_2) \big] = \rho\sigma_1\sigma_2$.
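The representation $\tilde{Z} = \tilde{\mu} + A\tilde{X}$ also says how to simulate an MVN vector: factor $\Sigma = AA^{T}$ (e.g. by Cholesky decomposition) and apply $A$ to independent $N(0,1)$ samples. A $2 \times 2$ sketch, verifying that the factor reproduces the bivariate covariance matrix above (function names are ours):

```python
import math

def cholesky_2x2(s1, s2, rho):
    """Lower-triangular A with A * A^T equal to the covariance matrix
    [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]."""
    a11 = s1
    a21 = rho * s2
    a22 = s2 * math.sqrt(1.0 - rho ** 2)
    return [[a11, 0.0], [a21, a22]]

def reassemble(s1, s2, rho):
    """Return the entries of A * A^T for the factor above."""
    A = cholesky_2x2(s1, s2, rho)
    c11 = A[0][0] ** 2
    c12 = A[0][0] * A[1][0]
    c22 = A[1][0] ** 2 + A[1][1] ** 2
    return c11, c12, c22

c11, c12, c22 = reassemble(2.0, 3.0, 0.5)
print(c11, c12, c22)  # s1^2, rho*s1*s2, s2^2 = 4.0, 3.0, 9.0
```

Applying this $A$ to a pair of independent standard normals (for instance, from the Box–Muller method earlier) then yields a sample with exactly this covariance structure.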