Chapter 5

Chapter 5 sections

Discrete univariate distributions:


5.2 Bernoulli and Binomial distributions
Just skim 5.3 Hypergeometric distributions
5.4 Poisson distributions
Just skim 5.5 Negative Binomial distributions
Continuous univariate distributions:
5.6 Normal distributions
5.7 Gamma distributions
Just skim 5.8 Beta distributions
Multivariate distributions
Just skim 5.9 Multinomial distributions
5.10 Bivariate normal distributions


Families of distributions

How:
Parameter and parameter space
pf/pdf and cdf – new notation: f(x | parameters)
Mean, variance and the m.g.f. ψ(t)
Features, connections to other distributions, approximation
Reasoning behind a distribution
Why:
Natural justification for certain experiments
A model for the uncertainty in an experiment
“All models are wrong, but some are useful” – George Box


Bernoulli distributions
Def: Bernoulli distributions – Bernoulli(p)
A r.v. X has the Bernoulli distribution with parameter p if P(X = 1) = p
and P(X = 0) = 1 − p. The pf of X is

    f(x|p) = p^x (1 − p)^(1−x)   for x = 0, 1
             0                   otherwise

Parameter space: p ∈ [0, 1]

In an experiment with only two possible outcomes, “success” and
“failure”, let X = the number of successes. Then X ∼ Bernoulli(p) where
p is the probability of success.
E(X) = p, Var(X) = p(1 − p) and ψ(t) = E(e^{tX}) = pe^t + (1 − p)

The cdf is F(x|p) = 0      for x < 0
                    1 − p  for 0 ≤ x < 1
                    1      for x ≥ 1
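
As a quick numerical check of these formulas, a minimal Python sketch using scipy.stats (the parameter value is illustrative):

```python
import numpy as np
from scipy import stats

p = 0.3  # illustrative value, not from the slides
X = stats.bernoulli(p)

# mean and variance match E(X) = p and Var(X) = p(1 - p)
mean, var = X.stats(moments="mv")
assert np.isclose(mean, p) and np.isclose(var, p * (1 - p))

# mgf at a few t values: E(e^{tX}) summed over the support {0, 1}
for t in (-1.0, 0.5, 2.0):
    mgf = sum(np.exp(t * x) * X.pmf(x) for x in (0, 1))
    assert np.isclose(mgf, p * np.exp(t) + (1 - p))
```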


Binomial distributions

Def: Binomial distributions – Binomial(n, p)


A r.v. X has the Binomial distribution with parameters n and p if X has
the pf
    f(x|n, p) = C(n, x) p^x (1 − p)^(n−x)  for x = 0, 1, . . . , n
                0                          otherwise

    (writing C(n, x) = n!/(x!(n − x)!) for the binomial coefficient)

Parameter space: n is a positive integer and p ∈ [0, 1]

If X is the number of “successes” in n independent tries where prob. of
success is p each time, then X ∼ Binomial(n, p)
Theorem 5.2.1
If X1 , X2 , . . . , Xn form n Bernoulli trials with parameter p
(i.e. are i.i.d. Bernoulli(p)) then X = X1 + · · · + Xn ∼ Binomial(n, p)
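
Theorem 5.2.1 is easy to see empirically; a short simulation sketch (illustrative n, p and sample size) compares a sum of i.i.d. Bernoulli(p) draws with the Binomial(n, p) pf:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000  # illustrative values

# sum of n i.i.d. Bernoulli(p) variables, many times over
sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

# empirical frequencies vs. the Binomial(n, p) pf
for x in range(n + 1):
    print(f"x={x:2d}  empirical={np.mean(sums == x):.4f}  "
          f"pf={stats.binom.pmf(x, n, p):.4f}")
```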


Binomial distributions

Let X ∼ Binomial(n, p)
E(X ) = np, Var(X ) = np(1 − p)
To find the m.g.f. of X write X = X1 + · · · + Xn where the Xi ’s are
i.i.d. Bernoulli(p). Then ψi(t) = pe^t + 1 − p and we get

    ψ(t) = ∏_{i=1}^{n} ψi(t) = ∏_{i=1}^{n} (pe^t + 1 − p) = (pe^t + 1 − p)^n

cdf: F(x|n, p) = Σ_{t=0}^{x} C(n, t) p^t (1 − p)^(n−t) = yikes!

Theorem 5.2.2
If Xi ∼ Binomial(ni, p), i = 1, . . . , k and the Xi ’s are independent, then
X = X1 + · · · + Xk ∼ Binomial(Σ_{i=1}^{k} ni, p)
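
In practice the cdf is simply evaluated numerically; a short sketch that also spot-checks Theorem 5.2.2 by simulation (illustrative values):

```python
import numpy as np
from scipy import stats

# the cdf has no closed form, so evaluate it numerically
print(stats.binom.cdf(3, n=10, p=0.3))   # P(X <= 3) for X ~ Binomial(10, 0.3)

# Theorem 5.2.2: Binomial(2, p) + Binomial(5, p) behaves like Binomial(7, p)
rng = np.random.default_rng(1)
s = rng.binomial(2, 0.3, 200_000) + rng.binomial(5, 0.3, 200_000)
print(np.mean(s == 4), stats.binom.pmf(4, 7, 0.3))   # close to each other
```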


Example: Blood testing (Example 5.2.7)


The setup:
1000 people need to be tested for a disease that affects 0.2% of
all people.
The test is guaranteed to detect the disease if it is present in a
blood sample.

Task: Find all the people that have the disease.


Strategy: Test 1000 samples
What’s the expected number of people that have the disease?
Any assumptions you need to make?
Strategy (611):
Divide the people into 10 groups of 100.
For each group take a portion of each of the 100 blood samples
and combine into one sample.
Then test the combined blood samples (10 tests).


Example: Blood testing (Example 5.2.7) – continued

Strategy (611):
If all of these tests are negative then none of the 1000 people
have the disease. Total number of tests needed: 10
If one of these tests is positive then we test each of the 100
people in that group. Total number of tests needed: 110
...
If all of the 10 tests are positive we end up having to do 1010 tests
Is this strategy better?
What is the expected number of tests needed?
When does this strategy lose?


Example: Blood testing (Example 5.2.7) – continued

Let Yi = 1 if the test for group i is positive and Yi = 0 otherwise.
Let Y = Y1 + · · · + Y10 = the number of groups where every
individual has to be tested.
Total number of tests needed: T = 10 + 100Y .
Let Zi = number of people in group i that have the disease,
i = 1, . . . , 10. Then Zi ∼ Binomial(100, 0.002)
Then Yi is a Bernoulli(p) r.v. where

    p = P(Yi = 1) = P(Zi > 0) = 1 − P(Zi = 0)
      = 1 − C(100, 0) · 0.002^0 · (1 − 0.002)^100 = 1 − 0.998^100 ≈ 0.181
Then Y ∼ Binomial(10, 0.181)
E(T) = E(10 + 100Y) = 10 + 100 E(Y) = 10 + 100(10 × 0.181) = 191
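
A small sketch reproducing these numbers in plain Python (values from the example):

```python
# group-testing expectation from Example 5.2.7
p_disease = 0.002
group_size, n_groups = 100, 10

p_group_pos = 1 - (1 - p_disease) ** group_size      # P(Yi = 1) ~ 0.1814
expected_tests = n_groups + group_size * n_groups * p_group_pos
print(p_group_pos, expected_tests)   # ~0.1814, ~191.4 -- far below 1000 tests
```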


Example: Blood testing (Example 5.2.7) – continued

When does this strategy (611) lose?


Worst case scenario:

    P(T ≥ 1000) = P(Y ≥ 9.9) = P(Y = 10) = C(10, 10) · 0.181^10 · 0.819^0 ≈ 3.8 × 10^−8
Question: can we go further – a 611-A strategy?
Any further improvement?


Hypergeometric distributions
Def: Hypergeometric distributions
A random variable X has the Hypergeometric distribution with
parameters N, M and n if it has the pf
    f(x|N, M, n) = C(N, x) C(M, n − x) / C(N + M, n)

Parameter space: N, M and n are nonnegative integers with n ≤ N + M

Reasoning:
Say we have a finite population with N items of type I and M items
of type II.
Let X be the number of items of type I when we draw n items
without replacement from that population.
Then X has the hypergeometric distribution
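
scipy.stats has this distribution under a different parameterization; a hedged sketch of the mapping (illustrative counts, not from the text):

```python
from math import comb
from scipy import stats

N, M, n = 7, 13, 5          # book's parameterization: N type I, M type II, n draws
x = 2

# the pf directly from binomial coefficients
pf = comb(N, x) * comb(M, n - x) / comb(N + M, n)

# scipy.stats.hypergeom uses (total population, # type I, sample size)
pf_scipy = stats.hypergeom.pmf(x, N + M, N, n)
print(pf, pf_scipy)   # identical
```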

Hypergeometric distributions

Binomial: sampling with replacement (effectively an infinite population)
Hypergeometric: sampling without replacement from a finite
population
You can also think of the Hypergeometric distribution as a sum of
dependent Bernoulli trials
Limiting situation:
Theorem 5.3.4: If the sample size n is much smaller than the
total population N + M, then the Hypergeometric distribution with
parameters N, M and n will be nearly the same as the Binomial
distribution with parameters n and p = N/(N + M)
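
A quick numerical illustration of Theorem 5.3.4 (a hedged sketch with illustrative sizes, not values from the text):

```python
from scipy import stats

N, M, n = 2_000, 8_000, 10          # n much smaller than N + M
p = N / (N + M)
for x in range(4):
    print(x,
          round(stats.hypergeom.pmf(x, N + M, N, n), 5),
          round(stats.binom.pmf(x, n, p), 5))   # nearly identical columns
```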

Poisson distributions

Def: Poisson distributions – Poisson(λ)
A random variable X has the Poisson distribution with mean λ if it has
the pf

    f(x|λ) = e^−λ λ^x / x!  for x = 0, 1, 2, . . .
             0              otherwise

Parameter space: λ > 0

Show that:
    f(x|λ) is a pf
    E(X) = λ
    Var(X) = λ
    ψ(t) = e^{λ(e^t − 1)}

The cdf: F(x|λ) = Σ_{k=0}^{x} e^−λ λ^k / k! = yikes.
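
A minimal check of these facts with scipy.stats (illustrative λ; the infinite sums are truncated far into the tail):

```python
import numpy as np
from scipy import stats

lam = 3.5
X = stats.poisson(lam)

mean, var = X.stats(moments="mv")
assert np.isclose(mean, lam) and np.isclose(var, lam)

# the pf sums to 1
assert np.isclose(sum(X.pmf(k) for k in range(200)), 1.0)

# mgf at t = 0.7 vs. the closed form e^{lam(e^t - 1)}
t = 0.7
mgf = sum(np.exp(t * k) * X.pmf(k) for k in range(200))
assert np.isclose(mgf, np.exp(lam * (np.exp(t) - 1)))
```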


Why Poisson?

The Poisson distribution is useful for modeling uncertainty in
counts / arrivals
Examples:
How many calls arrive at a switch board in one hour?
How many buses pass while you wait at the bus stop for 10 min?
How many bird nests are there in a certain area?
Under certain conditions (Poisson postulates) the Poisson
distribution can be shown to be the distribution of the number of
arrivals (Poisson process). However, the Poisson distribution is
often used as a model for uncertainty of counts in other types of
experiments.
The Poisson distribution can also be used as an approximation to
the Binomial(n, p) distribution when n is large and p is small.


Poisson Postulates
For t ≥ 0, let Xt be a random variable with possible values in N0
(Think: Xt = number of arrivals from time 0 to time t)
(i) Start with no arrivals: X0 = 0
(ii) Arrivals in disjoint time periods are ind.: Xs and Xt − Xs ind. if s < t
(iii) Number of arrivals depends only on period length:
Xs and Xt+s − Xt are identically distributed
(iv) Arrival probability is proportional to period length, if length is small:
    lim_{t→0} P(Xt = 1)/t = λ

(v) No simultaneous arrivals: lim_{t→0} P(Xt > 1)/t = 0

If (i)–(v) hold then for any nonnegative integer n

    P(Xt = n) = e^−λt (λt)^n / n!,    that is, Xt ∼ Poisson(λt)
Can be defined in terms of spatial areas too.
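
One way to see the postulates at work: approximate the process on a fine grid where each small interval independently contains an arrival with probability λ·Δt; the count over [0, t] is then Binomial(steps, λ·Δt), which approaches Poisson(λt). A sketch with illustrative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, t, steps, reps = 2.0, 3.0, 10_000, 100_000
dt = t / steps

# sum of `steps` independent Bernoulli(lam*dt) indicators per replication
counts = rng.binomial(steps, lam * dt, size=reps)

for n in range(4):
    print(n, np.mean(counts == n), stats.poisson.pmf(n, lam * t))
```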

Properties of the Poisson Distributions


Useful recursive property: P(X = x) = (λ/x) P(X = x − 1) for x ≥ 1

Theorem 5.4.4: Sum of Poissons is a Poisson


If X1 , . . . , Xk are independent r.v. and Xi ∼ Poisson(λi) for all i, then

    X1 + · · · + Xk ∼ Poisson(Σ_{i=1}^{k} λi)

Theorem 5.4.5: Approximation to Binomial


Let Xn ∼ Binomial(n, pn), where 0 < pn < 1 for all n and {pn}_{n=1}^∞ is a
sequence so that lim_{n→∞} npn = λ. Then

    lim_{n→∞} f_{Xn}(x|n, pn) = e^−λ λ^x / x! = f_{Poisson}(x|λ)

for all x = 0, 1, 2, . . .
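
A numerical look at Theorem 5.4.5, holding npn = λ = 2 fixed while n grows (illustrative values):

```python
from scipy import stats

lam, x = 2.0, 3
for n in (10, 100, 1000, 10_000):
    print(n, stats.binom.pmf(x, n, lam / n))    # approaches the limit below
print("limit:", stats.poisson.pmf(x, lam))      # e^{-2} 2^3 / 3!
```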

Example: Poisson as approximation to Binomial


Recall the disease testing example. We had

    X = Σ_{i=1}^{1000} Xi ∼ Binomial(1000, 0.002) and Y ∼ Binomial(10, 0.181)

With λ = np, the Poisson approximation gives X ≈ Poisson(2).
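
A sketch comparing the exact Binomial(1000, 0.002) pf with the Poisson(2) approximation:

```python
from scipy import stats

n, p = 1000, 0.002
lam = n * p   # = 2
for x in range(5):
    print(x, stats.binom.pmf(x, n, p), stats.poisson.pmf(x, lam))
```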


Geometric distributions

Def: Geometric distributions – Geometric(p)
A random variable X has the Geometric distribution with parameter p if
it has the pf

    f(x|p) = p(1 − p)^x  for x = 0, 1, 2, . . .
             0           otherwise

Parameter space: 0 < p < 1

Say we have an infinite sequence of Bernoulli trials with
parameter p.
X = number of “failures” before the first “success”. Then
X ∼ Geometric(p)


Negative Binomial distributions


Def: Negative Binomial distributions – NegBinomial(r, p)
A random variable X has the Negative Binomial distribution with
parameters r and p if it has the pf

    f(x|r, p) = C(r + x − 1, x) p^r (1 − p)^x  for x = 0, 1, 2, . . .
                0                              otherwise

Parameter space: 0 < p < 1 and r a positive integer.

Say we have an infinite sequence of Bernoulli trials with
parameter p.
X = number of “failures” before the r th “success”. Then
X ∼ NegBinomial(r, p)
Geometric(p) = NegBinomial(1, p)
Theorem 5.5.2: If X1 , . . . , Xr are i.i.d. Geometric(p) then
X = X1 + · · · + Xr ∼ NegBinomial(r, p)
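
Theorem 5.5.2 checked by simulation; note numpy's geometric counts trials up to the first success, so we subtract 1 to count failures as in the definition above (illustrative r and p):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
r, p, reps = 4, 0.3, 200_000

# failures before the first success, r independent copies per replication
fails = rng.geometric(p, size=(reps, r)) - 1
s = fails.sum(axis=1)

# scipy's nbinom(n=r, p) already counts failures before the r-th success
for x in range(5):
    print(x, np.mean(s == x), stats.nbinom.pmf(x, r, p))
```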


Gamma distributions
The Gamma function: Γ(α) = ∫_0^∞ x^(α−1) e^−x dx
    Γ(1) = 1 and Γ(1/2) = √π
    Γ(α) = (α − 1)Γ(α − 1) if α > 1

Def: Gamma distributions – Gamma(α, β)


A continuous r.v. X has the gamma distribution with parameters α and
β if it has the pdf
    f(x|α, β) = (β^α / Γ(α)) x^(α−1) e^−βx  for x > 0
                0                           otherwise

Parameter space: α > 0 and β > 0

Gamma(1, β) is the same as the exponential distribution with
parameter β, Expo(β)

Properties of the gamma distributions


ψ(t) = (β/(β − t))^α , for t < β.
E(X) = α/β and Var(X) = α/β²
If X1 , . . . , Xk are independent Gamma(αi, β) r.v. then

    X1 + · · · + Xk ∼ Gamma(Σ_{i=1}^{k} αi, β)
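
A short check of the moments and the additivity property; note scipy parameterizes the gamma by shape α and scale 1/β (illustrative values):

```python
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0
X = stats.gamma(alpha, scale=1 / beta)
mean, var = X.stats(moments="mv")
assert np.isclose(mean, alpha / beta) and np.isclose(var, alpha / beta**2)

# sum of independent Gamma(2, beta) and Gamma(5, beta) ~ Gamma(7, beta)
rng = np.random.default_rng(4)
s = rng.gamma(2.0, 1 / beta, 100_000) + rng.gamma(5.0, 1 / beta, 100_000)
print(s.mean(), 7 / beta)   # close
```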


Properties of the gamma distributions

Theorem 5.7.9: Exponential distribution is memoryless


Let X ∼ Expo(β) and let t > 0. Then for any h > 0

P(X ≥ t + h|X ≥ t) = P(X ≥ h)

Theorem 5.7.12: Times between arrivals in a Poisson process


Let Zk be the time until the k th arrival in a Poisson process with rate β.
Let Y1 = Z1 and Yk = Zk − Z_{k−1} for k ≥ 2.
Then Y1 , Y2 , Y3 , . . . are i.i.d. with the exponential distribution with
parameter β.
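
A quick numerical check of Theorem 5.7.9 via the exponential survival function (illustrative β, t and h):

```python
import numpy as np
from scipy import stats

beta, t, h = 1.5, 2.0, 0.7
X = stats.expon(scale=1 / beta)

# P(X >= t + h | X >= t) = P(X >= h)
lhs = X.sf(t + h) / X.sf(t)
assert np.isclose(lhs, X.sf(h))
```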


Beta distributions

Def: Beta distributions – Beta(α, β)
A continuous r.v. X has the beta distribution with parameters α and β if
it has the pdf

    f(x|α, β) = (Γ(α + β) / (Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1)  for 0 < x < 1
                0                                              otherwise

Parameter space: α > 0 and β > 0

Beta(1, 1) = Uniform(0, 1)
Used to model a r.v. that takes values between 0 and 1.
The Beta distributions are often used as prior distributions for
probability parameters, e.g. the p in the Binomial distribution.
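
To illustrate the prior role (a hedged sketch, not from the text): with a Beta(α, β) prior on p and x successes in n Binomial trials, conjugacy gives the posterior Beta(α + x, β + n − x). Illustrative numbers:

```python
from scipy import stats

alpha, beta_ = 2.0, 2.0      # prior Beta(2, 2) on p
n, x = 20, 14                # observe 14 successes in 20 trials

posterior = stats.beta(alpha + x, beta_ + n - x)   # Beta(16, 8)
print(posterior.mean())                            # 16/24 ~ 0.667
print(posterior.interval(0.95))                    # central 95% interval
```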


[Figure slide: Beta distribution pdfs]


Why Normal?
Works well in practice: many physical experiments
have distributions that are approximately normal.
Central Limit Theorem: the sum of many i.i.d. random
variables is approximately normally distributed.
Mathematically convenient – especially the
multivariate normal distribution.
Can explicitly obtain the distribution of many
functions of a normally distributed random variable.
Marginal and conditional distributions of a
multivariate normal are also normal (multivariate or
univariate).

Developed by Gauss and then Laplace in the early 1800s.
Also known as the Gaussian distributions.
[Portraits: Gauss and Laplace]

Normal distributions

Def: Normal distributions – N(µ, σ²)
A continuous r.v. X has the normal distribution with mean µ and
variance σ² if it has the pdf

    f(x|µ, σ²) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)),  −∞ < x < ∞

Parameter space: µ ∈ R and σ² > 0

Show:
    ψ(t) = exp(µt + σ²t²/2)
    E(X) = µ
    Var(X) = σ²
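
A simulation-based check of these facts (illustrative µ and σ):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, 1_000_000)
print(x.mean(), x.var())   # ~ 1.0 and ~ 4.0

# mgf at t = 0.3 vs. exp(mu*t + sigma^2 t^2 / 2)
t = 0.3
print(np.exp(t * x).mean(), np.exp(mu * t + sigma**2 * t**2 / 2))
```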


The Bell curve
[Figure: the bell-shaped normal pdf]

Standard normal

Standard normal distribution: N(0, 1)
The normal distribution with µ = 0 and σ² = 1 is called the standard
normal distribution, and its pdf and cdf are denoted φ(x) and Φ(x).

The cdf of a normal distribution cannot be expressed in closed
form and is evaluated using numerical approximations.
Φ(x) is tabulated in the back of the book. Many calculators and
programs such as R, Matlab, Excel etc. can calculate Φ(x).
Φ(−x) = 1 − Φ(x)
Φ^−1(p) = −Φ^−1(1 − p)


Properties of the normal distributions

Theorem 5.6.4: Linear transformation of a normal is still normal


If X ∼ N(µ, σ²) and Y = aX + b where a and b are constants and
a ≠ 0, then
    Y ∼ N(aµ + b, a²σ²)

Let F be the cdf of X , where X ∼ N(µ, σ²). Then

    F(x) = Φ((x − µ)/σ)

and

    F^−1(p) = µ + σ Φ^−1(p)


Example: Measured Voltage

Suppose the measured voltage, X , in a certain electric circuit has the
normal distribution with mean 120 and standard deviation 2.
1. What is the probability that the measured voltage is between 118
   and 122?
2. Below what value will 95% of the measurements be?
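
A hedged sketch of both computations, standardizing to Φ and using scipy for the numbers:

```python
from scipy import stats

X = stats.norm(loc=120, scale=2)

# 1. P(118 <= X <= 122) = Phi(1) - Phi(-1)
print(X.cdf(122) - X.cdf(118))   # ~ 0.6827

# 2. the 95th percentile: mu + sigma * Phi^{-1}(0.95)
print(X.ppf(0.95))               # ~ 123.29
```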


Properties of the normal distributions

Theorem 5.6.7: Linear combination of ind. normals is a normal


Let X1 , . . . , Xk be independent r.v. and Xi ∼ N(µi, σi²) for i = 1, . . . , k .
Then
    X1 + · · · + Xk ∼ N(µ1 + · · · + µk , σ1² + · · · + σk²)

Also, if a1 , . . . , ak and b are constants where at least one ai is not zero:

    a1 X1 + · · · + ak Xk + b ∼ N(b + Σ_{i=1}^{k} ai µi , Σ_{i=1}^{k} ai²σi²)

In particular:
The sample mean: X̄n = (1/n) Σ_{i=1}^{n} Xi
If X1 , . . . , Xn are a random sample from N(µ, σ²), what is the
distribution of the sample mean?
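
By the theorem with ai = 1/n and b = 0, X̄n ∼ N(µ, σ²/n). A simulation sketch (illustrative values matching the voltage example):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 120.0, 2.0, 3, 200_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(xbar.mean(), xbar.var(), sigma**2 / n)   # variance ~ sigma^2 / n
```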

Example: Measured voltage – continued

Suppose the measured voltage, X , in a certain electric circuit has the
normal distribution with mean 120 and standard deviation 2.
If three independent measurements of the voltage are made, what
is the probability that the sample mean X̄3 will lie between 118
and 120?
Find x that satisfies P(|X̄3 − 120| ≤ x) = 0.95


Area under the curve
[Figure: areas under the normal curve]

Lognormal distributions
Def: Lognormal distributions
If log(X ) ∼ N(µ, σ 2 ) then we say that X has the Lognormal distribution
with parameters µ and σ 2 .

The support of the lognormal distribution is (0, ∞).
Often used to model time before failure.

Example:
Let X and Y be independent random variables such that
log(X ) ∼ N(1.6, 4.5) and log(Y ) ∼ N(3, 6). What is the
distribution of the product XY ?
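
By Theorem 5.6.7, log(XY) = log X + log Y ∼ N(1.6 + 3, 4.5 + 6) = N(4.6, 10.5), so XY is lognormal with parameters 4.6 and 10.5; a simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
reps = 500_000
x = np.exp(rng.normal(1.6, np.sqrt(4.5), reps))   # log X ~ N(1.6, 4.5)
y = np.exp(rng.normal(3.0, np.sqrt(6.0), reps))   # log Y ~ N(3, 6)

logs = np.log(x * y)
print(logs.mean(), logs.var())   # ~ 4.6 and ~ 10.5
```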

Bivariate normal distributions

Def: Bivariate normal
Two continuous r.v. X1 and X2 have the bivariate normal distribution
with means µ1 and µ2 , variances σ1² and σ2² and correlation ρ if they
have the joint pdf

    f(x1, x2) = 1/(2π(1 − ρ²)^(1/2) σ1 σ2)
        × exp{ −1/(2(1 − ρ²)) [ (x1 − µ1)²/σ1² − 2ρ (x1 − µ1)(x2 − µ2)/(σ1 σ2) + (x2 − µ2)²/σ2² ] }   (1)

Parameter space: µi ∈ R, σi² > 0 for i = 1, 2 and −1 < ρ < 1
(the pdf in (1) requires |ρ| < 1)


Bivariate normal pdf
[Figures: bivariate normal pdfs with different ρ, and their contours]

Bivariate normal as linear combination

Theorem 5.10.1: Bivariate normal from two ind. standard normals
Let Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) be independent.
Let µi ∈ R, σi² > 0 for i = 1, 2 and −1 < ρ < 1 and let

    X1 = σ1 Z1 + µ1
    X2 = σ2 (ρZ1 + √(1 − ρ²) Z2) + µ2     (2)

Then the joint distribution of X1 and X2 is bivariate normal with
parameters µ1 , µ2 , σ1², σ2² and ρ

Theorem 5.10.2 (part 1) – the other way
Let X1 and X2 have the pdf in (1). Then there exist independent
standard normal r.v. Z1 and Z2 so that (2) holds.
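
Theorem 5.10.1 doubles as a recipe for simulating bivariate normal pairs; a minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(8)
mu1, mu2, s1, s2, rho, reps = 3.0, 5.0, 2.0, 3.0, 0.6, 300_000

z1 = rng.standard_normal(reps)
z2 = rng.standard_normal(reps)
x1 = s1 * z1 + mu1
x2 = s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu2

# marginal means/variances and the correlation match the target parameters
print(x1.mean(), x2.mean(), x1.var(), x2.var())
print(np.corrcoef(x1, x2)[0, 1])   # ~ 0.6
```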


Properties of a bivariate normal

Theorem 5.10.2 (part 2)
Let X1 and X2 have the pdf in (1). Then the marginal distributions are

    X1 ∼ N(µ1 , σ1²) and X2 ∼ N(µ2 , σ2²)

and the correlation between X1 and X2 is ρ.

Theorem 5.10.4: The conditional is normal
Let X1 and X2 have the pdf in (1). Then the conditional distribution of
X2 given that X1 = x1 is (univariate) normal with

    E(X2 |X1 = x1) = µ2 + ρσ2 (x1 − µ1)/σ1  and
    Var(X2 |X1 = x1) = (1 − ρ²)σ2²


Properties of a bivariate normal


Theorem 5.10.3: Uncorrelated ⇒ Independent
Let X1 and X2 have the bivariate normal distribution. Then X1 and X2
are independent if and only if they are uncorrelated.

Only holds for the multivariate normal distribution


One of the very convenient properties of the normal distribution

Theorem 5.10.5: Linear combinations are normal
Let X1 and X2 have the pdf in (1) and let a1 , a2 and b be constants,
with a1 and a2 not both zero. Then Y = a1 X1 + a2 X2 + b is normally
distributed with

    E(Y) = a1 µ1 + a2 µ2 + b and
    Var(Y) = a1²σ1² + a2²σ2² + 2a1 a2 ρσ1 σ2

This extends what we already had for independent normals



Example

Let X1 and X2 have the bivariate normal distribution with means
µ1 = 3, µ2 = 5, variances σ1² = 4, σ2² = 9 and correlation ρ = 0.6.
a) Find the distribution of X2 − 2X1
b) What is the expected value of X2 , given that we observed X1 = 2?
c) What is the probability that X1 > X2 ?
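
A hedged sketch of the three computations, using Theorems 5.10.4 and 5.10.5 (the numbers follow from the stated parameters):

```python
import numpy as np
from scipy import stats

mu1, mu2, v1, v2, rho = 3.0, 5.0, 4.0, 9.0, 0.6
s1, s2 = np.sqrt(v1), np.sqrt(v2)

# a) Y = X2 - 2 X1 (Theorem 5.10.5 with a1 = -2, a2 = 1, b = 0)
ey = -2 * mu1 + mu2                                   # -1
vy = 4 * v1 + v2 + 2 * (-2) * 1 * rho * s1 * s2       # 10.6
print("a)", ey, vy)

# b) E(X2 | X1 = 2) (Theorem 5.10.4)
print("b)", mu2 + rho * s2 * (2 - mu1) / s1)          # 4.1

# c) P(X1 > X2) = P(X1 - X2 > 0) with X1 - X2 ~ N(-2, 5.8)
d_var = v1 + v2 - 2 * rho * s1 * s2                   # 5.8
print("c)", 1 - stats.norm.cdf(0, loc=-2, scale=np.sqrt(d_var)))  # ~ 0.203
```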


Multivariate normal – Matrix notation

The pdf of an n-dimensional normal distribution, X ∼ N(µ, Σ):

    f(x) = (1/((2π)^(n/2) |Σ|^(1/2))) exp( −(1/2)(x − µ)^T Σ^−1 (x − µ) )

where µ = (µ1 , . . . , µn)^T , x = (x1 , . . . , xn)^T and

    Σ = [ σ1²   σ1,2  σ1,3  · · ·  σ1,n
          σ2,1  σ2²   σ2,3  · · ·  σ2,n
          σ3,1  σ3,2  σ3²   · · ·  σ3,n
           ⋮     ⋮     ⋮    ⋱      ⋮
          σn,1  σn,2  σn,3  · · ·  σn²  ]

µ is the mean vector and Σ is called the variance-covariance matrix.


Multivariate normal – Matrix notation

The same things hold for the multivariate normal distribution as for the
bivariate. Let X ∼ N(µ, Σ):
Linear combinations of X are normal
AX + b is (multivariate) normal for fixed matrix A and vector b
The marginal distribution of Xi is normal with mean µi and
variance σi2
The off-diagonal elements of Σ are the covariances between
individual elements of X, i.e. Cov(Xi , Xj ) = σi,j .
The joint marginal distributions are also normal where the mean
and covariance matrix are found by picking the corresponding
elements from µ and rows and columns from Σ.
The conditional distributions are also normal (multivariate or
univariate)
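
A closing sketch with scipy's multivariate normal (illustrative 3-dimensional µ and Σ, not from the text):

```python
import numpy as np
from scipy import stats

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

X = stats.multivariate_normal(mean=mu, cov=Sigma)
print(X.pdf(mu))   # density at the mean

# marginal of X1: normal with mean mu[0] and variance Sigma[0, 0]
rng = np.random.default_rng(9)
draws = rng.multivariate_normal(mu, Sigma, size=200_000)
print(draws[:, 0].mean(), draws[:, 0].var())   # ~ 0.0 and ~ 2.0

# joint marginal of (X1, X3): pick rows/columns 0 and 2 from mu and Sigma
idx = [0, 2]
print(mu[idx], Sigma[np.ix_(idx, idx)])
```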
