David Nott
standj@nus.edu.sg
Department of Statistics and Applied Probability
National University of Singapore
GEM2900: Understanding Uncertainty & Statistical Thinking DSAP, NUS, Semester 2, 2008/2009 – 1
The normal distribution
The probability density function (pdf) of the normal distribution is symmetric, with a
smooth bell-shape.
The formula for the pdf of the N(μ, σ²) distribution is:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The pdf is largest at μ, the expected value, and the spread of the density curve
is controlled by σ, the standard deviation: the larger σ is, the wider and flatter the curve.
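As a rough illustration, the pdf formula can be typed in directly and compared with a library implementation. This is only a sketch: the function name `normal_pdf` and the values μ = 10, σ = 1 are assumptions chosen for the example, not something fixed by the course.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 10.0, 1.0  # illustrative values only
for x in (8.0, 9.0, 10.0, 11.0, 12.0):
    # The hand-written formula and the standard-library pdf should agree.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The printed values are symmetric about μ and largest at x = μ, matching the description of the bell shape.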
[Figure: plot of a normal pdf f(x); x-axis from 5 to 15, y-axis from 0.0 to 0.4.]
Until conversion to the euro, the German ten mark bill had a picture
of Gauss, a picture of the Gaussian density curve, and even the
formula for the Gaussian density on it.
Brownian motion
One problem in which the Gaussian distribution arises is in the description of the
physical phenomenon of Brownian motion. Albert Einstein’s mathematical
description of Brownian motion is widely regarded as having convinced most
physicists of the existence of atoms.
Brownian motion is named after Robert Brown, who observed the highly erratic
motion of pollen grains in a drop of water.
The motion of the grains can be thought of as arising from a large number of
collisions with the water molecules. Brownian motion is an idealized model for the
coordinates of the motion.
For a particle starting at the origin and followed for a period of time t, the
distribution of the x or y coordinate will be N(0, σ²t) for some σ² > 0.
You will understand better later why the summing of a large number of independent
perturbations from molecular collisions results in a Gaussian distribution.
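A small simulation can make the N(0, σ²t) claim plausible. The sketch below is not a physical model: it assumes, purely for illustration, that the x coordinate is the sum of many independent kicks of equal size and random sign, scaled so that the total variance is σ²t. The values of `t`, `sigma`, the number of kicks and the random seed are arbitrary choices.

```python
import random
import statistics

def simulate_endpoint(t, sigma, n_steps, rng):
    """x coordinate at time t, built as a sum of n_steps independent kicks.

    Each kick is +step or -step with equal probability, where step is
    chosen so that the variance of the sum is sigma^2 * t.
    """
    step = sigma * (t / n_steps) ** 0.5
    return sum(rng.choice((-step, step)) for _ in range(n_steps))

rng = random.Random(0)   # fixed seed so the run is repeatable
t, sigma = 2.0, 1.5      # illustrative values only
endpoints = [simulate_endpoint(t, sigma, 100, rng) for _ in range(2000)]

sample_mean = statistics.fmean(endpoints)
sample_var = statistics.pvariance(endpoints)
print("sample mean:", sample_mean, "(theory: 0)")
print("sample variance:", sample_var, "(theory:", sigma ** 2 * t, ")")
```

The sample mean and variance should land close to 0 and σ²t, and a histogram of `endpoints` would look roughly bell-shaped.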
The paths traced out by the process tend to be highly erratic (in fact,
although they are continuous they are not differentiable anywhere, for
those of you who have done enough math to know what that means).
A century and a half before Brown a draper from Delft, Antony Van
Leeuwenhoek, had observed Brownian motion.
Other scientists had guessed the correct explanation but had not
done any calculations that could be compared to experiments.
Calculating with the normal distribution
[Figure: the standard normal pdf plotted against z values from −3 to 3; y-axis from 0.0 to 0.4.]
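How to find normal probabilities?
Use a computer.
Most statistical software packages will calculate P(X < c) for X distributed
normal with mean μ and standard deviation σ.
E.g. R has the command pnorm().
If we wish to calculate a probability P(a < X < b), then we can use
a computer to calculate P(X < a) and P(X < b) and then use
$$P(a < X < b) = P(X < b) - P(X < a).$$
Probabilities for X can also be written in terms of the standardized variable
$$Z = \frac{X - \mu}{\sigma},$$
which has a standard normal distribution.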
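The same calculation can be sketched in Python, using `statistics.NormalDist` from the standard library in place of R's `pnorm()`. The function names and the numbers μ = 100, σ = 15, a = 85, b = 115 are illustrative assumptions only.

```python
from statistics import NormalDist

def interval_probability(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), as P(X < b) - P(X < a)."""
    X = NormalDist(mu, sigma)
    return X.cdf(b) - X.cdf(a)

def interval_probability_via_z(a, b, mu, sigma):
    """Same probability, computed after standardizing to Z = (X - mu) / sigma."""
    Z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

mu, sigma = 100.0, 15.0  # illustrative values only
direct = interval_probability(85.0, 115.0, mu, sigma)
via_z = interval_probability_via_z(85.0, 115.0, mu, sigma)
print(direct, via_z)
```

Both routes should give the same answer up to rounding, which is the point of standardizing: every normal probability reduces to a standard normal one.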
Example: Marilyn vos Savant’s IQ
Earlier in the course I mentioned Marilyn vos Savant, who was listed for a time in
the Guinness Book of Records as the person with the world’s highest IQ.
She popularized the Monty Hall problem with her column in Parade magazine. Her
presentation of the problem and her (correct) solution, generated a lot of heated
discussion at the time.
IQ scores are often assumed to follow a normal distribution, calibrated so that the
mean is 100 and the standard deviation is 15.
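Example: (cont.)
With X denoting an IQ score, the probability of a score above 228, the value considered in this example, is:
$$P(X > 228) = P\left(\frac{X-\mu}{\sigma} > \frac{228-\mu}{\sigma}\right) = P\left(Z > \frac{228-100}{15}\right) = P(Z > 8.53)$$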
So the probability we need is P(Z > 8.53), where Z has a standard normal distribution. But
how do we work out what this is? One option is to look the probability up in a table, like the one
given in the textbook, but 8.53 is well beyond the upper limit of the values in the table. You
won’t be required to read normal tables for anything in this course.
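A computer can go where the table cannot. The sketch below evaluates P(Z > z) for z = (228 − 100)/15 through the complementary error function `math.erfc`; the helper name `upper_tail` is an assumption for illustration. It also shows why the obvious route, one minus the cdf, is unreliable this far out: the cdf is so close to 1 that the subtraction loses essentially all precision.

```python
import math
from statistics import NormalDist

def upper_tail(z):
    """P(Z > z) for standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = (228 - 100) / 15               # standardized score from the example
tail = upper_tail(z)               # accurate even for very small probabilities
naive = 1 - NormalDist().cdf(z)    # suffers from rounding: cdf(z) is almost exactly 1

print("z =", z)
print("tail probability via erfc:", tail)
print("1 - cdf(z):", naive)
```

The tail probability is extremely small but not zero, which suggests that the normal model should not be taken too literally at such extreme scores.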
Consider a normally distributed random variable X with mean μ and variance σ², and the random
variable Z = (X − μ)/σ, which is standard normal.
The 68-95-99.7 rule
[Figure: normal pdf with the x-axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ and μ + 3σ.]
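The rule named in the heading is usually stated as follows: about 68%, 95% and 99.7% of the probability of a normal distribution lies within one, two and three standard deviations of the mean. A quick check, with `within` as a hypothetical helper name:

```python
from statistics import NormalDist

def within(k):
    """P(mu - k*sigma < X < mu + k*sigma); by standardizing, this is P(-k < Z < k)."""
    Z = NormalDist()
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
```

Because the calculation is done on the standardized variable Z, the result does not depend on the particular values of μ and σ.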
Why is the normal distribution so important?
[Figures: probability distributions of the sum of one, two, three, four and five dice, each plotted against "values" on an x-axis with ticks from 1 to 29.]
The Central Limit Theorem (CLT)
NOTE: We assume here that the value obtained on one die roll does not
affect the value obtained on another die roll; i.e. we are assuming
independence.
If we do not have independence, then the CLT may not hold.
But there are many different sets of assumptions under which the CLT
holds.
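The dice illustration can be reproduced exactly, without simulation. The sketch below assumes fair six-sided dice and independent rolls, as in the note above, and builds the distribution of the sum one die at a time; `dice_sum_pmf` is a name chosen for this example.

```python
def dice_sum_pmf(n_dice):
    """Exact pmf of the sum of n_dice independent fair dice, as {total: probability}."""
    pmf = {0: 1.0}  # before any roll, the sum is 0 with probability 1
    for _ in range(n_dice):
        updated = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                # Independence: each face adds to the running total with probability 1/6.
                updated[total + face] = updated.get(total + face, 0.0) + prob / 6
        pmf = updated
    return pmf

for n_dice in range(1, 6):
    pmf = dice_sum_pmf(n_dice)
    print(f"Sum of {n_dice} dice")
    for total in sorted(pmf):
        bar = "#" * round(pmf[total] * 200)  # crude text bar chart
        print(f"{total:3d} {pmf[total]:.4f} {bar}")
```

With one die the bars are flat, with two they form a triangle, and from three dice onwards the outline starts to resemble a bell curve.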
Why is the CLT (and the normal distribution) important?
It explains why many real-life observed variables have a signal +
noise form with the noise following a normal distribution.
Connection between binomial and normal distribution
Assume X follows a binomial distribution with parameters n and p.
Then we can view X as the sum of n independent random variables,
each having a Bernoulli distribution with parameter p.
Hence, if Y is a normally distributed random variable with mean np and
variance np(1 − p), then
$$P(X = x) \approx P\left(x - \tfrac{1}{2} \le Y \le x + \tfrac{1}{2}\right) = \int_{x-\frac{1}{2}}^{x+\frac{1}{2}} f_Y(u)\,du \approx f_Y(x),$$
where
$$f_Y(y) = \frac{1}{\sqrt{2\pi np(1-p)}} \exp\left(-\frac{(y-np)^2}{2np(1-p)}\right)$$
is the probability density function of Y.
NOTE: a typical rule of thumb is that this approximation is valid if np ≥ 5 and n(1 − p) ≥ 5.
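To see how close the approximation is, the exact binomial probability can be put next to both normal approximations. This is a sketch under assumed values n = 20 and p = 0.5 (which satisfy the rule of thumb); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """Exact P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_approximations(x, n, p):
    """Two approximations to P(X = x) from Y ~ N(np, np(1 - p)).

    Returns (P(x - 1/2 <= Y <= x + 1/2), f_Y(x)).
    """
    Y = NormalDist(n * p, math.sqrt(n * p * (1 - p)))  # second argument is the SD
    interval = Y.cdf(x + 0.5) - Y.cdf(x - 0.5)
    density = Y.pdf(x)
    return interval, density

n, p = 20, 0.5  # illustrative values only
for x in (5, 10, 15):
    interval, density = normal_approximations(x, n, p)
    print(x, binomial_pmf(x, n, p), interval, density)
```

The interval version is typically the closer of the two, since it integrates the density over the unit-width interval around x rather than using its height at x alone.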
You find that your GEM2900 lecturer is becoming increasingly
difficult, unreasonable and paranoid as the semester progresses.
As you don’t have time for this kind of thing you decide to simply
guess an answer at random for each question.
What is the probability that you pass the CA, that is, the
probability that you get 50 or more questions correct?
Let X be your score. If you guess each question at random, then
X ∼ Binomial(100, 0.25). Using the result that for a binomial with parameters
n and p the mean is np and the variance is np(1 − p), we have E(X) = 25,
Var(X) = 300/16 and SD(X) = √(300/16) ≈ 4.33.
Let Y be a normal random variable with the same mean and variance as X, i.e.
Y ∼ N(25, 300/16).
Then
$$\begin{aligned}
P(X \ge 50) &\approx P(Y \ge 49.5) \\
&= P\left(\frac{Y-\mu}{\sigma} \ge \frac{49.5-\mu}{\sigma}\right) \\
&= P\left(Z \ge \frac{49.5-25}{\sqrt{300/16}}\right) \\
&= P(Z \ge 5.66) \\
&\approx 7.6 \times 10^{-9}
\end{aligned}$$
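The calculation can be checked numerically, and the exact binomial tail computed alongside it for comparison. The sketch below uses only the numbers in the example (n = 100, p = 0.25, a pass mark of 50); the helper names are assumptions for illustration.

```python
import math

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), by summing the pmf."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

def normal_upper_tail(z):
    """P(Z >= z) for standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p = 100, 0.25
mean = n * p                        # 25
sd = math.sqrt(n * p * (1 - p))     # square root of 300/16
z = (49.5 - mean) / sd              # continuity-corrected standardized pass mark

approx = normal_upper_tail(z)
exact = binomial_upper_tail(50, n, p)
print("z =", z)
print("normal approximation:", approx)
print("exact binomial tail:", exact)
```

Both answers say that passing by pure guessing is essentially impossible, but they need not agree closely: the normal approximation is least reliable far out in the tails.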
Connection between Poisson and normal distribution
Assume X follows a Poisson distribution with parameter λ.
Then, for any positive integer n, we can view X as the sum of n independent random variables,
each having a Poisson distribution with parameter λ/n.
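Hence, if Y is a normally distributed random variable with mean λ and
variance λ, then
$$P(X = x) \approx P\left(x - \tfrac{1}{2} \le Y \le x + \tfrac{1}{2}\right) = \int_{x-\frac{1}{2}}^{x+\frac{1}{2}} f_Y(u)\,du \approx f_Y(x),$$
where
$$f_Y(y) = \frac{1}{\sqrt{2\pi\lambda}} \exp\left(-\frac{(y-\lambda)^2}{2\lambda}\right)$$
is the probability density function of Y.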
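As with the binomial case, the quality of the approximation can be inspected numerically. The sketch below assumes λ = 20 purely for illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def poisson_pmf(x, lam):
    """Exact P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def normal_approximations(x, lam):
    """Two approximations to P(X = x) from Y ~ N(lam, lam).

    Returns (P(x - 1/2 <= Y <= x + 1/2), f_Y(x)).
    """
    Y = NormalDist(lam, math.sqrt(lam))  # mean lam, standard deviation sqrt(lam)
    interval = Y.cdf(x + 0.5) - Y.cdf(x - 0.5)
    density = Y.pdf(x)
    return interval, density

lam = 20.0  # illustrative value only
for x in (10, 15, 20, 25, 30):
    interval, density = normal_approximations(x, lam)
    print(x, poisson_pmf(x, lam), interval, density)
```

Rerunning with a small λ, such as 2, gives visibly worse agreement, because the Poisson distribution is then strongly skewed.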