Chapter 5

Chapter 5 sections

Discrete univariate distributions:


5.2 Bernoulli and Binomial distributions
Just skim 5.3 Hypergeometric distributions
5.4 Poisson distributions
Just skim 5.5 Negative Binomial distributions
Continuous univariate distributions:
5.6 Normal distributions
5.7 Gamma distributions
Just skim 5.8 Beta distributions
Multivariate distributions
Just skim 5.9 Multinomial distributions
5.10 Bivariate normal distributions


Families of distributions

How:
Parameter and parameter space
pf/pdf and cdf – new notation: f(x | parameters)
Mean, variance and the m.g.f. ψ(t)
Features, connections to other distributions, approximation
Reasoning behind a distribution
Why:
Natural justification for certain experiments
A model for the uncertainty in an experiment
“All models are wrong, but some are useful” – George Box


Bernoulli distributions
Def: Bernoulli distributions – Bernoulli(p)
A r.v. X has the Bernoulli distribution with parameter p if P(X = 1) = p
and P(X = 0) = 1 − p. The pf of X is

    f(x|p) = p^x (1 − p)^(1−x)   for x = 0, 1
             0                   otherwise

Parameter space: p ∈ [0, 1]

In an experiment with only two possible outcomes, “success” and
“failure”, let X = the number of successes. Then X ∼ Bernoulli(p) where
p is the probability of success.
E(X) = p, Var(X) = p(1 − p) and ψ(t) = E(e^{tX}) = pe^t + (1 − p)

The cdf is F(x|p) = 0      for x < 0
                    1 − p  for 0 ≤ x < 1
                    1      for x ≥ 1
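
As a quick numerical check of these formulas, a minimal Python sketch using scipy.stats (the parameter value is illustrative):

```python
import numpy as np
from scipy import stats

p = 0.3  # illustrative value, not from the slides
X = stats.bernoulli(p)

# mean and variance match E(X) = p and Var(X) = p(1 - p)
mean, var = X.stats(moments="mv")
assert np.isclose(mean, p) and np.isclose(var, p * (1 - p))

# mgf at a few t values: E(e^{tX}) summed over the support {0, 1}
for t in (-1.0, 0.5, 2.0):
    mgf = sum(np.exp(t * x) * X.pmf(x) for x in (0, 1))
    assert np.isclose(mgf, p * np.exp(t) + (1 - p))
```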


Binomial distributions

Def: Binomial distributions – Binomial(n, p)


A r.v. X has the Binomial distribution with parameters n and p if X has
the pf
    f(x|n, p) = C(n, x) p^x (1 − p)^(n−x)  for x = 0, 1, . . . , n
                0                          otherwise

    (writing C(n, x) = n!/(x!(n − x)!) for the binomial coefficient)

Parameter space: n is a positive integer and p ∈ [0, 1]

If X is the number of “successes” in n independent tries where prob. of
success is p each time, then X ∼ Binomial(n, p)
Theorem 5.2.1
If X1 , X2 , . . . , Xn form n Bernoulli trials with parameter p
(i.e. are i.i.d. Bernoulli(p)) then X = X1 + · · · + Xn ∼ Binomial(n, p)
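
Theorem 5.2.1 is easy to see empirically; a short simulation sketch (illustrative n, p and sample size) compares a sum of i.i.d. Bernoulli(p) draws with the Binomial(n, p) pf:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000  # illustrative values

# sum of n i.i.d. Bernoulli(p) variables, many times over
sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

# empirical frequencies vs. the Binomial(n, p) pf
for x in range(n + 1):
    print(f"x={x:2d}  empirical={np.mean(sums == x):.4f}  "
          f"pf={stats.binom.pmf(x, n, p):.4f}")
```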


Binomial distributions

Let X ∼ Binomial(n, p)
E(X ) = np, Var(X ) = np(1 − p)
To find the m.g.f. of X write X = X1 + · · · + Xn where the Xi ’s are
i.i.d. Bernoulli(p). Then ψi(t) = pe^t + 1 − p and we get

    ψ(t) = ∏_{i=1}^{n} ψi(t) = ∏_{i=1}^{n} (pe^t + 1 − p) = (pe^t + 1 − p)^n

cdf: F(x|n, p) = Σ_{t=0}^{x} C(n, t) p^t (1 − p)^(n−t) = yikes!

Theorem 5.2.2
If Xi ∼ Binomial(ni, p), i = 1, . . . , k and the Xi ’s are independent, then
X = X1 + · · · + Xk ∼ Binomial(Σ_{i=1}^{k} ni, p)
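
In practice the cdf is simply evaluated numerically; a short sketch that also spot-checks Theorem 5.2.2 by simulation (illustrative values):

```python
import numpy as np
from scipy import stats

# the cdf has no closed form, so evaluate it numerically
print(stats.binom.cdf(3, n=10, p=0.3))   # P(X <= 3) for X ~ Binomial(10, 0.3)

# Theorem 5.2.2: Binomial(2, p) + Binomial(5, p) behaves like Binomial(7, p)
rng = np.random.default_rng(1)
s = rng.binomial(2, 0.3, 200_000) + rng.binomial(5, 0.3, 200_000)
print(np.mean(s == 4), stats.binom.pmf(4, 7, 0.3))   # close to each other
```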


Example: Blood testing (Example 5.2.7)


The setup:
1000 people need to be tested for a disease that affects 0.2% of
all people.
The test is guaranteed to detect the disease if it is present in a
blood sample.

Task: Find all the people that have the disease.


Strategy: Test 1000 samples
What’s the expected number of people that have the disease?
Any assumptions you need to make?
Strategy (611):
Divide the people into 10 groups of 100.
For each group take a portion of each of the 100 blood samples
and combine into one sample.
Then test the combined blood samples (10 tests).


Example: Blood testing (Example 5.2.7) – continued

Strategy (611):
If all of these tests are negative then none of the 1000 people
have the disease. Total number of tests needed: 10
If one of these tests is positive then we test each of the 100
people in that group. Total number of tests needed: 110
...
If all of the 10 tests are positive we end up having to do 1010 tests
Is this strategy better?
What is the expected number of tests needed?
When does this strategy lose?


Example: Blood testing (Example 5.2.7) – continued

Let Yi = 1 if the test for group i is positive and Yi = 0 otherwise.
Let Y = Y1 + · · · + Y10 = the number of groups where every
individual has to be tested.
Total number of tests needed: T = 10 + 100Y .
Let Zi = number of people in group i that have the disease,
i = 1, . . . , 10. Then Zi ∼ Binomial(100, 0.002)
Then Yi is a Bernoulli(p) r.v. where

    p = P(Yi = 1) = P(Zi > 0) = 1 − P(Zi = 0)
      = 1 − C(100, 0) · 0.002^0 · (1 − 0.002)^100 = 1 − 0.998^100 ≈ 0.181
Then Y ∼ Binomial(10, 0.181)
E(T) = E(10 + 100Y) = 10 + 100 E(Y) = 10 + 100(10 × 0.181) = 191
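
A small sketch reproducing these numbers in plain Python (values from the example):

```python
# group-testing expectation from Example 5.2.7
p_disease = 0.002
group_size, n_groups = 100, 10

p_group_pos = 1 - (1 - p_disease) ** group_size      # P(Yi = 1) ~ 0.1814
expected_tests = n_groups + group_size * n_groups * p_group_pos
print(p_group_pos, expected_tests)   # ~0.1814, ~191.4 -- far below 1000 tests
```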


Example: Blood testing (Example 5.2.7) – continued

When does this strategy (611) lose?


Worst case scenario:

    P(T ≥ 1000) = P(Y ≥ 9.9) = P(Y = 10) = C(10, 10) · 0.181^10 · 0.819^0 ≈ 3.8 × 10^−8
Question: can we go further – a 611-A strategy?
Any further improvement?


Hypergeometric distributions
Def: Hypergeometric distributions
A random variable X has the Hypergeometric distribution with
parameters N, M and n if it has the pf
    f(x|N, M, n) = C(N, x) C(M, n − x) / C(N + M, n)

Parameter space: N, M and n are nonnegative integers with n ≤ N + M

Reasoning:
Say we have a finite population with N items of type I and M items
of type II.
Let X be the number of items of type I when we draw n items
without replacement from that population.
Then X has the hypergeometric distribution
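
scipy.stats has this distribution under a different parameterization; a hedged sketch of the mapping (illustrative counts, not from the text):

```python
from math import comb
from scipy import stats

N, M, n = 7, 13, 5          # book's parameterization: N type I, M type II, n draws
x = 2

# the pf directly from binomial coefficients
pf = comb(N, x) * comb(M, n - x) / comb(N + M, n)

# scipy.stats.hypergeom uses (total population, # type I, sample size)
pf_scipy = stats.hypergeom.pmf(x, N + M, N, n)
print(pf, pf_scipy)   # identical
```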

Hypergeometric distributions

Binomial: sampling with replacement (effectively an infinite population)
Hypergeometric: sampling without replacement from a finite
population
You can also think of the Hypergeometric distribution as a sum of
dependent Bernoulli trials
Limiting situation:
Theorem 5.3.4: If the sample size n is much smaller than the
total population N + M, then the Hypergeometric distribution with
parameters N, M and n will be nearly the same as the Binomial
distribution with parameters n and p = N/(N + M)
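
A quick numerical illustration of Theorem 5.3.4 (a hedged sketch with illustrative sizes, not values from the text):

```python
from scipy import stats

N, M, n = 2_000, 8_000, 10          # n much smaller than N + M
p = N / (N + M)
for x in range(4):
    print(x,
          round(stats.hypergeom.pmf(x, N + M, N, n), 5),
          round(stats.binom.pmf(x, n, p), 5))   # nearly identical columns
```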

Poisson distributions

Def: Poisson distributions – Poisson(λ)
A random variable X has the Poisson distribution with mean λ if it has
the pf

    f(x|λ) = e^−λ λ^x / x!  for x = 0, 1, 2, . . .
             0              otherwise

Parameter space: λ > 0

Show that:
    f(x|λ) is a pf
    E(X) = λ
    Var(X) = λ
    ψ(t) = e^{λ(e^t − 1)}

The cdf: F(x|λ) = Σ_{k=0}^{x} e^−λ λ^k / k! = yikes.
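
A minimal check of these facts with scipy.stats (illustrative λ; the infinite sums are truncated far into the tail):

```python
import numpy as np
from scipy import stats

lam = 3.5
X = stats.poisson(lam)

mean, var = X.stats(moments="mv")
assert np.isclose(mean, lam) and np.isclose(var, lam)

# the pf sums to 1
assert np.isclose(sum(X.pmf(k) for k in range(200)), 1.0)

# mgf at t = 0.7 vs. the closed form e^{lam(e^t - 1)}
t = 0.7
mgf = sum(np.exp(t * k) * X.pmf(k) for k in range(200))
assert np.isclose(mgf, np.exp(lam * (np.exp(t) - 1)))
```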


Why Poisson?

The Poisson distribution is useful for modeling uncertainty in
counts / arrivals
Examples:
How many calls arrive at a switch board in one hour?
How many buses pass while you wait at the bus stop for 10 min?
How many bird nests are there in a certain area?
Under certain conditions (Poisson postulates) the Poisson
distribution can be shown to be the distribution of the number of
arrivals (Poisson process). However, the Poisson distribution is
often used as a model for uncertainty of counts in other types of
experiments.
The Poisson distribution can also be used as an approximation to
the Binomial(n, p) distribution when n is large and p is small.


Poisson Postulates
For t ≥ 0, let Xt be a random variable with possible values in N0
(Think: Xt = number of arrivals from time 0 to time t)
(i) Start with no arrivals: X0 = 0
(ii) Arrivals in disjoint time periods are ind.: Xs and Xt − Xs ind. if s < t
(iii) Number of arrivals depends only on period length:
Xs and Xt+s − Xt are identically distributed
(iv) Arrival probability is proportional to period length, if length is small:
    lim_{t→0} P(Xt = 1)/t = λ

(v) No simultaneous arrivals: lim_{t→0} P(Xt > 1)/t = 0

If (i)–(v) hold then for any nonnegative integer n

    P(Xt = n) = e^−λt (λt)^n / n!,    that is, Xt ∼ Poisson(λt)
Can be defined in terms of spatial areas too.
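
One way to see the postulates at work: approximate the process on a fine grid where each small interval independently contains an arrival with probability λ·Δt; the count over [0, t] is then Binomial(steps, λ·Δt), which approaches Poisson(λt). A sketch with illustrative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, t, steps, reps = 2.0, 3.0, 10_000, 100_000
dt = t / steps

# sum of `steps` independent Bernoulli(lam*dt) indicators per replication
counts = rng.binomial(steps, lam * dt, size=reps)

for n in range(4):
    print(n, np.mean(counts == n), stats.poisson.pmf(n, lam * t))
```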

Properties of the Poisson Distributions


Useful recursive property: P(X = x) = (λ/x) P(X = x − 1) for x ≥ 1

Theorem 5.4.4: Sum of Poissons is a Poisson


If X1 , . . . , Xk are independent r.v. and Xi ∼ Poisson(λi) for all i, then

    X1 + · · · + Xk ∼ Poisson(Σ_{i=1}^{k} λi)

Theorem 5.4.5: Approximation to Binomial


Let Xn ∼ Binomial(n, pn), where 0 < pn < 1 for all n and {pn}_{n=1}^∞ is a
sequence so that lim_{n→∞} npn = λ. Then

    lim_{n→∞} f_{Xn}(x|n, pn) = e^−λ λ^x / x! = f_{Poisson}(x|λ)

for all x = 0, 1, 2, . . .
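
A numerical look at Theorem 5.4.5, holding npn = λ = 2 fixed while n grows (illustrative values):

```python
from scipy import stats

lam, x = 2.0, 3
for n in (10, 100, 1000, 10_000):
    print(n, stats.binom.pmf(x, n, lam / n))    # approaches the limit below
print("limit:", stats.poisson.pmf(x, lam))      # e^{-2} 2^3 / 3!
```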

Example: Poisson as approximation to Binomial


Recall the disease testing example. We had

    X = Σ_{i=1}^{1000} Xi ∼ Binomial(1000, 0.002) and Y ∼ Binomial(10, 0.181)

With λ = np, the Poisson approximation gives X ≈ Poisson(2).
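
A sketch comparing the exact Binomial(1000, 0.002) pf with the Poisson(2) approximation:

```python
from scipy import stats

n, p = 1000, 0.002
lam = n * p   # = 2
for x in range(5):
    print(x, stats.binom.pmf(x, n, p), stats.poisson.pmf(x, lam))
```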


Geometric distributions

Def: Geometric distributions – Geometric(p)
A random variable X has the Geometric distribution with parameter p if
it has the pf

    f(x|p) = p(1 − p)^x  for x = 0, 1, 2, . . .
             0           otherwise

Parameter space: 0 < p < 1

Say we have an infinite sequence of Bernoulli trials with
parameter p.
X = number of “failures” before the first “success”. Then
X ∼ Geometric(p)


Negative Binomial distributions


Def: Negative Binomial distributions – NegBinomial(r, p)
A random variable X has the Negative Binomial distribution with
parameters r and p if it has the pf

    f(x|r, p) = C(r + x − 1, x) p^r (1 − p)^x  for x = 0, 1, 2, . . .
                0                              otherwise

Parameter space: 0 < p < 1 and r a positive integer.

Say we have an infinite sequence of Bernoulli trials with
parameter p.
X = number of “failures” before the r th “success”. Then
X ∼ NegBinomial(r, p)
Geometric(p) = NegBinomial(1, p)
Theorem 5.5.2: If X1 , . . . , Xr are i.i.d. Geometric(p) then
X = X1 + · · · + Xr ∼ NegBinomial(r, p)
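
Theorem 5.5.2 checked by simulation; note numpy's geometric counts trials up to the first success, so we subtract 1 to count failures as in the definition above (illustrative r and p):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
r, p, reps = 4, 0.3, 200_000

# failures before the first success, r independent copies per replication
fails = rng.geometric(p, size=(reps, r)) - 1
s = fails.sum(axis=1)

# scipy's nbinom(n=r, p) already counts failures before the r-th success
for x in range(5):
    print(x, np.mean(s == x), stats.nbinom.pmf(x, r, p))
```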


Gamma distributions
The Gamma function: Γ(α) = ∫_0^∞ x^(α−1) e^−x dx
    Γ(1) = 1 and Γ(1/2) = √π
    Γ(α) = (α − 1)Γ(α − 1) if α > 1

Def: Gamma distributions – Gamma(α, β)


A continuous r.v. X has the gamma distribution with parameters α and
β if it has the pdf
    f(x|α, β) = (β^α / Γ(α)) x^(α−1) e^−βx  for x > 0
                0                           otherwise

Parameter space: α > 0 and β > 0

Gamma(1, β) is the same as the exponential distribution with
parameter β, Expo(β)

Properties of the gamma distributions


ψ(t) = (β/(β − t))^α , for t < β.
E(X) = α/β and Var(X) = α/β²
If X1 , . . . , Xk are independent Gamma(αi, β) r.v. then

    X1 + · · · + Xk ∼ Gamma(Σ_{i=1}^{k} αi, β)
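
A short check of the moments and the additivity property; note scipy parameterizes the gamma by shape α and scale 1/β (illustrative values):

```python
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0
X = stats.gamma(alpha, scale=1 / beta)
mean, var = X.stats(moments="mv")
assert np.isclose(mean, alpha / beta) and np.isclose(var, alpha / beta**2)

# sum of independent Gamma(2, beta) and Gamma(5, beta) ~ Gamma(7, beta)
rng = np.random.default_rng(4)
s = rng.gamma(2.0, 1 / beta, 100_000) + rng.gamma(5.0, 1 / beta, 100_000)
print(s.mean(), 7 / beta)   # close
```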


Properties of the gamma distributions

Theorem 5.7.9: Exponential distribution is memoryless


Let X ∼ Expo(β) and let t > 0. Then for any h > 0

P(X ≥ t + h|X ≥ t) = P(X ≥ h)

Theorem 5.7.12: Times between arrivals in a Poisson process


Let Zk be the time until the k th arrival in a Poisson process with rate β.
Let Y1 = Z1 and Yk = Zk − Z_{k−1} for k ≥ 2.
Then Y1 , Y2 , Y3 , . . . are i.i.d. with the exponential distribution with
parameter β.
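
A quick numerical check of Theorem 5.7.9 via the exponential survival function (illustrative β, t and h):

```python
import numpy as np
from scipy import stats

beta, t, h = 1.5, 2.0, 0.7
X = stats.expon(scale=1 / beta)

# P(X >= t + h | X >= t) = P(X >= h)
lhs = X.sf(t + h) / X.sf(t)
assert np.isclose(lhs, X.sf(h))
```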


Beta distributions

Def: Beta distributions – Beta(α, β)
A continuous r.v. X has the beta distribution with parameters α and β if
it has the pdf

    f(x|α, β) = (Γ(α + β) / (Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1)  for 0 < x < 1
                0                                              otherwise

Parameter space: α > 0 and β > 0

Beta(1, 1) = Uniform(0, 1)
Used to model a r.v. that takes values between 0 and 1.
The Beta distributions are often used as prior distributions for
probability parameters, e.g. the p in the Binomial distribution.
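
To illustrate the prior role (a hedged sketch, not from the text): with a Beta(α, β) prior on p and x successes in n Binomial trials, conjugacy gives the posterior Beta(α + x, β + n − x). Illustrative numbers:

```python
from scipy import stats

alpha, beta_ = 2.0, 2.0      # prior Beta(2, 2) on p
n, x = 20, 14                # observe 14 successes in 20 trials

posterior = stats.beta(alpha + x, beta_ + n - x)   # Beta(16, 8)
print(posterior.mean())                            # 16/24 ~ 0.667
print(posterior.interval(0.95))                    # central 95% interval
```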


[Figure slide: Beta distribution pdfs]


Why Normal?
Works well in practice: many physical experiments
have distributions that are approximately normal.
Central Limit Theorem: the sum of many i.i.d. random
variables is approximately normally distributed.
Mathematically convenient – especially the
multivariate normal distribution.
Can explicitly obtain the distribution of many
functions of a normally distributed random variable.
Marginal and conditional distributions of a
multivariate normal are also normal (multivariate or
univariate).

Developed by Gauss and then Laplace in the early 1800s.
Also known as the Gaussian distributions.
[Portraits: Gauss and Laplace]

Normal distributions

Def: Normal distributions – N(µ, σ²)
A continuous r.v. X has the normal distribution with mean µ and
variance σ² if it has the pdf

    f(x|µ, σ²) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)),  −∞ < x < ∞

Parameter space: µ ∈ R and σ² > 0

Show:
    ψ(t) = exp(µt + σ²t²/2)
    E(X) = µ
    Var(X) = σ²
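
A simulation-based check of these facts (illustrative µ and σ):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, 1_000_000)
print(x.mean(), x.var())   # ~ 1.0 and ~ 4.0

# mgf at t = 0.3 vs. exp(mu*t + sigma^2 t^2 / 2)
t = 0.3
print(np.exp(t * x).mean(), np.exp(mu * t + sigma**2 * t**2 / 2))
```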


The Bell curve
[Figure: the bell-shaped normal pdf]

Standard normal

Standard normal distribution: N(0, 1)
The normal distribution with µ = 0 and σ² = 1 is called the standard
normal distribution, and its pdf and cdf are denoted φ(x) and Φ(x).

The cdf of a normal distribution cannot be expressed in closed
form and is evaluated using numerical approximations.
Φ(x) is tabulated in the back of the book. Many calculators and
programs such as R, Matlab, Excel etc. can calculate Φ(x).
Φ(−x) = 1 − Φ(x)
Φ^−1(p) = −Φ^−1(1 − p)


Properties of the normal distributions

Theorem 5.6.4: Linear transformation of a normal is still normal


If X ∼ N(µ, σ²) and Y = aX + b where a and b are constants and
a ≠ 0, then
    Y ∼ N(aµ + b, a²σ²)

Let F be the cdf of X , where X ∼ N(µ, σ²). Then

    F(x) = Φ((x − µ)/σ)

and

    F^−1(p) = µ + σ Φ^−1(p)


Example: Measured Voltage

Suppose the measured voltage, X , in a certain electric circuit has the
normal distribution with mean 120 and standard deviation 2.
1. What is the probability that the measured voltage is between 118
   and 122?
2. Below what value will 95% of the measurements be?
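
A hedged sketch of both computations, standardizing to Φ and using scipy for the numbers:

```python
from scipy import stats

X = stats.norm(loc=120, scale=2)

# 1. P(118 <= X <= 122) = Phi(1) - Phi(-1)
print(X.cdf(122) - X.cdf(118))   # ~ 0.6827

# 2. the 95th percentile: mu + sigma * Phi^{-1}(0.95)
print(X.ppf(0.95))               # ~ 123.29
```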


Properties of the normal distributions

Theorem 5.6.7: Linear combination of ind. normals is a normal


Let X1 , . . . , Xk be independent r.v. and Xi ∼ N(µi, σi²) for i = 1, . . . , k .
Then
    X1 + · · · + Xk ∼ N(µ1 + · · · + µk , σ1² + · · · + σk²)

Also, if a1 , . . . , ak and b are constants where at least one ai is not zero:

    a1 X1 + · · · + ak Xk + b ∼ N(b + Σ_{i=1}^{k} ai µi , Σ_{i=1}^{k} ai²σi²)

In particular:
The sample mean: X̄n = (1/n) Σ_{i=1}^{n} Xi
If X1 , . . . , Xn are a random sample from N(µ, σ²), what is the
distribution of the sample mean?
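
By the theorem with ai = 1/n and b = 0, X̄n ∼ N(µ, σ²/n). A simulation sketch (illustrative values matching the voltage example):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 120.0, 2.0, 3, 200_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(xbar.mean(), xbar.var(), sigma**2 / n)   # variance ~ sigma^2 / n
```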

Example: Measured voltage – continued

Suppose the measured voltage, X , in a certain electric circuit has the
normal distribution with mean 120 and standard deviation 2.
If three independent measurements of the voltage are made, what
is the probability that the sample mean X̄3 will lie between 118
and 120?
Find x that satisfies P(|X̄3 − 120| ≤ x) = 0.95


Area under the curve
[Figure: areas under the normal curve]

Lognormal distributions
Def: Lognormal distributions
If log(X ) ∼ N(µ, σ 2 ) then we say that X has the Lognormal distribution
with parameters µ and σ 2 .

The support of the lognormal distribution is (0, ∞).
Often used to model time before failure.

Example:
Let X and Y be independent random variables such that
log(X ) ∼ N(1.6, 4.5) and log(Y ) ∼ N(3, 6). What is the
distribution of the product XY ?
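
By Theorem 5.6.7, log(XY) = log X + log Y ∼ N(1.6 + 3, 4.5 + 6) = N(4.6, 10.5), so XY is lognormal with parameters 4.6 and 10.5; a simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
reps = 500_000
x = np.exp(rng.normal(1.6, np.sqrt(4.5), reps))   # log X ~ N(1.6, 4.5)
y = np.exp(rng.normal(3.0, np.sqrt(6.0), reps))   # log Y ~ N(3, 6)

logs = np.log(x * y)
print(logs.mean(), logs.var())   # ~ 4.6 and ~ 10.5
```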

Bivariate normal distributions

Def: Bivariate normal
Two continuous r.v. X1 and X2 have the bivariate normal distribution
with means µ1 and µ2 , variances σ1² and σ2² and correlation ρ if they
have the joint pdf

    f(x1, x2) = 1/(2π(1 − ρ²)^(1/2) σ1 σ2)
        × exp{ −1/(2(1 − ρ²)) [ (x1 − µ1)²/σ1² − 2ρ (x1 − µ1)(x2 − µ2)/(σ1 σ2) + (x2 − µ2)²/σ2² ] }   (1)

Parameter space: µi ∈ R, σi² > 0 for i = 1, 2 and −1 < ρ < 1
(the pdf in (1) requires |ρ| < 1)


Bivariate normal pdf
[Figures: bivariate normal pdfs with different ρ, and their contours]

Bivariate normal as linear combination

Theorem 5.10.1: Bivariate normal from two ind. standard normals
Let Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) be independent.
Let µi ∈ R, σi² > 0 for i = 1, 2 and −1 < ρ < 1 and let

    X1 = σ1 Z1 + µ1
    X2 = σ2 (ρZ1 + √(1 − ρ²) Z2) + µ2     (2)

Then the joint distribution of X1 and X2 is bivariate normal with
parameters µ1 , µ2 , σ1², σ2² and ρ

Theorem 5.10.2 (part 1) – the other way
Let X1 and X2 have the pdf in (1). Then there exist independent
standard normal r.v. Z1 and Z2 so that (2) holds.
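
Theorem 5.10.1 doubles as a recipe for simulating bivariate normal pairs; a minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(8)
mu1, mu2, s1, s2, rho, reps = 3.0, 5.0, 2.0, 3.0, 0.6, 300_000

z1 = rng.standard_normal(reps)
z2 = rng.standard_normal(reps)
x1 = s1 * z1 + mu1
x2 = s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu2

# marginal means/variances and the correlation match the target parameters
print(x1.mean(), x2.mean(), x1.var(), x2.var())
print(np.corrcoef(x1, x2)[0, 1])   # ~ 0.6
```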


Properties of a bivariate normal

Theorem 5.10.2 (part 2)
Let X1 and X2 have the pdf in (1). Then the marginal distributions are

    X1 ∼ N(µ1 , σ1²) and X2 ∼ N(µ2 , σ2²)

and the correlation between X1 and X2 is ρ.

Theorem 5.10.4: The conditional is normal
Let X1 and X2 have the pdf in (1). Then the conditional distribution of
X2 given that X1 = x1 is (univariate) normal with

    E(X2 |X1 = x1) = µ2 + ρσ2 (x1 − µ1)/σ1  and
    Var(X2 |X1 = x1) = (1 − ρ²)σ2²


Properties of a bivariate normal


Theorem 5.10.3: Uncorrelated ⇒ Independent
Let X1 and X2 have the bivariate normal distribution. Then X1 and X2
are independent if and only if they are uncorrelated.

Only holds for the multivariate normal distribution


One of the very convenient properties of the normal distribution

Theorem 5.10.5: Linear combinations are normal
Let X1 and X2 have the pdf in (1) and let a1 , a2 and b be constants,
with a1 and a2 not both zero. Then Y = a1 X1 + a2 X2 + b is normally
distributed with

    E(Y) = a1 µ1 + a2 µ2 + b and
    Var(Y) = a1²σ1² + a2²σ2² + 2a1 a2 ρσ1 σ2

This extends what we already had for independent normals



Example

Let X1 and X2 have the bivariate normal distribution with means
µ1 = 3, µ2 = 5, variances σ1² = 4, σ2² = 9 and correlation ρ = 0.6.
a) Find the distribution of X2 − 2X1
b) What is the expected value of X2 , given that we observed X1 = 2?
c) What is the probability that X1 > X2 ?
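
A hedged sketch of the three computations, using Theorems 5.10.4 and 5.10.5 (the numbers follow from the stated parameters):

```python
import numpy as np
from scipy import stats

mu1, mu2, v1, v2, rho = 3.0, 5.0, 4.0, 9.0, 0.6
s1, s2 = np.sqrt(v1), np.sqrt(v2)

# a) Y = X2 - 2 X1 (Theorem 5.10.5 with a1 = -2, a2 = 1, b = 0)
ey = -2 * mu1 + mu2                                   # -1
vy = 4 * v1 + v2 + 2 * (-2) * 1 * rho * s1 * s2       # 10.6
print("a)", ey, vy)

# b) E(X2 | X1 = 2) (Theorem 5.10.4)
print("b)", mu2 + rho * s2 * (2 - mu1) / s1)          # 4.1

# c) P(X1 > X2) = P(X1 - X2 > 0) with X1 - X2 ~ N(-2, 5.8)
d_var = v1 + v2 - 2 * rho * s1 * s2                   # 5.8
print("c)", 1 - stats.norm.cdf(0, loc=-2, scale=np.sqrt(d_var)))  # ~ 0.203
```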


Multivariate normal – Matrix notation

The pdf of an n-dimensional normal distribution, X ∼ N(µ, Σ):

    f(x) = (1/((2π)^(n/2) |Σ|^(1/2))) exp( −(1/2)(x − µ)^T Σ^−1 (x − µ) )

where µ = (µ1 , . . . , µn)^T , x = (x1 , . . . , xn)^T and

    Σ = [ σ1²   σ1,2  σ1,3  · · ·  σ1,n
          σ2,1  σ2²   σ2,3  · · ·  σ2,n
          σ3,1  σ3,2  σ3²   · · ·  σ3,n
           ⋮     ⋮     ⋮    ⋱      ⋮
          σn,1  σn,2  σn,3  · · ·  σn²  ]

µ is the mean vector and Σ is called the variance-covariance matrix.


Multivariate normal – Matrix notation

The same things hold for the multivariate normal distribution as for the
bivariate. Let X ∼ N(µ, Σ):
Linear combinations of X are normal
AX + b is (multivariate) normal for fixed matrix A and vector b
The marginal distribution of Xi is normal with mean µi and
variance σi2
The off-diagonal elements of Σ are the covariances between
individual elements of X, i.e. Cov(Xi , Xj ) = σi,j .
The joint marginal distributions are also normal where the mean
and covariance matrix are found by picking the corresponding
elements from µ and rows and columns from Σ.
The conditional distributions are also normal (multivariate or
univariate)
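
A closing sketch with scipy's multivariate normal (illustrative 3-dimensional µ and Σ, not from the text):

```python
import numpy as np
from scipy import stats

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

X = stats.multivariate_normal(mean=mu, cov=Sigma)
print(X.pdf(mu))   # density at the mean

# marginal of X1: normal with mean mu[0] and variance Sigma[0, 0]
rng = np.random.default_rng(9)
draws = rng.multivariate_normal(mu, Sigma, size=200_000)
print(draws[:, 0].mean(), draws[:, 0].var())   # ~ 0.0 and ~ 2.0

# joint marginal of (X1, X3): pick rows/columns 0 and 2 from mu and Sigma
idx = [0, 2]
print(mu[idx], Sigma[np.ix_(idx, idx)])
```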
