GEM2900: Understanding Uncertainty & Statistical Thinking

David Nott
standj@nus.edu.sg
Department of Statistics and Applied Probability
National University of Singapore

Semester 2, 2008/2009
The normal distribution

• The normal distribution, also called the Gaussian distribution, can be used to model continuous random variables.

• The normal distribution has two parameters, μ and σ², called the (population) mean and (population) variance, respectively.

• If the random variable X follows a normal distribution with parameters μ and σ² (also denoted by X ∼ N(μ, σ²)), then E[X] = μ and Var[X] = σ².

Woolfson (2008, Chapter 10)

The normal distribution (cont.)

• The probability density function (pdf) of the normal distribution is symmetric, with a smooth bell shape. The formula for the pdf of the N(μ, σ²) distribution is

  f(x) = (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) )

  (a short R sketch evaluating this pdf follows below).

• The pdf is largest at μ, the expected value, and the spread of the density curve is controlled by σ, the standard deviation.

• Areas of regions underneath the pdf represent probabilities.

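A minimal R sketch (not part of the original slides) evaluating and plotting this pdf with R's dnorm() function; the values of μ and σ below are arbitrary illustrative choices.

    ## Evaluate and plot the N(mu, sigma^2) pdf with dnorm().
    mu    <- 10                        # illustrative mean
    sigma <- 2                         # illustrative standard deviation
    curve(dnorm(x, mean = mu, sd = sigma),
          from = mu - 4 * sigma, to = mu + 4 * sigma,
          xlab = "x", ylab = "f(x)", main = "Normal pdf")   # smooth bell shape
    dnorm(mu, mean = mu, sd = sigma)   # the pdf takes its largest value at x = mu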
The normal distribution (cont.)

[Figure: normal pdfs with the same mean but different spreads; f(x) plotted against x over the range 5 to 15]

The normal distribution (cont.)

• As mentioned earlier, the normal distribution is sometimes also called the Gaussian distribution, after the German mathematician Carl Friedrich Gauss.

• The Gaussian distribution, however, was not first discovered by Gauss; this is an example of what is known as Stigler's law (the law that no discovery is named after its original discoverer). Stigler's law is of course itself an example of Stigler's law ...

• Until the conversion to the euro, the German ten-mark note carried a picture of Gauss, a picture of the Gaussian density curve, and even the formula for the Gaussian density.

The normal distribution (cont.)

[Image: presumably the German ten-mark note described on the previous slide, showing Gauss and the Gaussian density curve]
Brownian motion

• One problem in which the Gaussian distribution arises is the description of the physical phenomenon of Brownian motion. Albert Einstein's mathematical description of Brownian motion is widely regarded as having convinced most physicists of the existence of atoms.

• Brownian motion is named after Robert Brown, who observed the highly erratic motion of pollen grains in a drop of water.

• The motion of the grains can be thought of as arising from a large number of collisions with the water molecules. Brownian motion is an idealized model for the coordinates of this motion.

• For a particle starting at the origin and followed for a period of time t, the distribution of the x or y coordinate will be N(0, σ²t) for some σ² > 0.

• You will understand better later in the course why summing a large number of independent perturbations from molecular collisions results in a Gaussian distribution.
Brownian motion

• It is possible to simulate paths of Brownian motion on a computer; a small R sketch is given after this list.

• The Java applet at http://www.ms.uky.edu/~mai/java/stat/brmo.html simulates a Brownian motion in two dimensions.

• The paths traced out by the process tend to be highly erratic (in fact, although they are continuous they are nowhere differentiable, for those of you who have done enough math to know what that means).
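Since the applet link above may no longer work, here is a minimal R sketch (not from the original slides) that simulates a two-dimensional Brownian motion path by summing independent normal increments; the number of steps and the choice σ² = 1 are illustrative assumptions.

    ## Simulate a 2-D Brownian motion path over the time interval [0, 1].
    set.seed(1)                               # for reproducibility
    n  <- 5000                                # number of time steps
    dt <- 1 / n                               # length of each step
    dx <- rnorm(n, mean = 0, sd = sqrt(dt))   # x-increments ~ N(0, dt)
    dy <- rnorm(n, mean = 0, sd = sqrt(dt))   # y-increments ~ N(0, dt)
    x  <- c(0, cumsum(dx))                    # x-coordinate, starting at the origin
    y  <- c(0, cumsum(dy))                    # y-coordinate, starting at the origin
    plot(x, y, type = "l", main = "Simulated 2-D Brownian motion")
    ## At time t = 1 each coordinate is approximately N(0, t), i.e. N(0, sigma^2 t)
    ## with sigma^2 = 1.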

Brownian motion

• Although Robert Brown wasn't the first to observe Brownian motion (Stigler's law again), he was certainly the most systematic experimenter to look at this phenomenon.

• He looked not just at pollen grains suspended in a water drop, but also at many other things, including scrapings of particles from the Sphinx, which he had access to in his work as a curator at the British Museum. (There was some question about whether only living things were subject to Brownian motion, and apparently he regarded the Sphinx as undeniably, certifiably dead.)

Brownian motion

• A century and a half before Brown, a draper from Delft, Antony van Leeuwenhoek, had observed Brownian motion.

• Among other things, he had looked at scrapings from the unbrushed teeth of old men.

• Einstein was the first scientist to really take an interest in Brownian motion who also had the mathematical ability to describe it analytically.

• Other scientists had guessed the correct explanation but had not done any calculations that could be compared with experiments.

Calculating with the normal distribution

• A normal distribution is specified by its mean μ and its variance σ².

• The standard normal distribution has mean μ = 0 and variance σ² = 1.

• If a random variable is distributed as standard normal, it is typically denoted by Z rather than X, and standard normal observations are known as z-values.

• The standard normal distribution is symmetric about μ = 0, hence P(Z < −c) = P(Z > c) for all c ∈ R.

• Probabilities of (continuous) random variables are given by areas under pdf curves. For example, suppose Z is standard normal. Then the probability P(0 < Z < 1), i.e. the probability that Z is between zero and one, is equal to the area of the shaded region on the next slide.

Calculating with the normal distribution (cont.)

[Figure: standard normal pdf plotted against z values from −3 to 3, with the area between z = 0 and z = 1 shaded]

Calculating with the normal distribution (cont.)

How to find normal probabilities?

• Use a computer. All statistical software packages will calculate P(X < c) for X normally distributed with mean μ and standard deviation σ. For example, R has the command pnorm().

• If we wish to calculate a probability P(a < X < b), then we can use a computer to calculate P(X < a) and P(X < b) and then use

  P(a < X < b) = P(X < b) − P(X < a)

  (a short R sketch follows this list).

• Alternatively, the probabilities can be calculated "by hand" using a technique known as standardisation.
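A minimal R sketch (not in the original slides) illustrating both points with pnorm(); the mean 100 and standard deviation 15 are arbitrary illustrative values.

    ## P(85 < X < 130) for X ~ N(100, 15^2), using P(X < b) - P(X < a).
    pnorm(130, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)
    ## P(0 < Z < 1) for a standard normal Z (the shaded area on the previous slide):
    pnorm(1) - pnorm(0)                # approximately 0.34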
Calculating with the normal distribution (cont.)

Standardisation

• The standardised version of a random variable X with (population) mean μ and (population) standard deviation σ is

  Z = (X − μ)/σ

  (a small R check follows this list).

• The mean of Z is zero; the standard deviation of Z is one.

• If X has a normal distribution, then Z has a standard normal distribution! This result does not necessarily hold for other distributions.
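A minimal R check (not in the original slides) that standardisation gives the same answer as working on the original scale directly; the numbers 100, 15 and 130 are arbitrary.

    ## P(X < 130) for X ~ N(100, 15^2), computed two ways.
    pnorm(130, mean = 100, sd = 15)    # directly on the original scale
    pnorm((130 - 100) / 15)            # via the standardised value z = 2
    ## both give about 0.977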

Calculating with the normal distribution (cont.)

Example: Marilyn vos Savant's IQ

• Earlier in the course I mentioned Marilyn vos Savant, who was listed for a time in the Guinness Book of Records as the person with the world's highest IQ.

• She popularized the Monty Hall problem in her column in Parade magazine. Her presentation of the problem and her (correct) solution generated a lot of heated discussion at the time.

• IQ is a rather controversial concept, as intelligence is certainly a multi-faceted trait. (If you are interested in the history and abuses of IQ testing, you might like to read Stephen Jay Gould's (1981) book "The Mismeasure of Man".)

• IQ scores are often assumed to follow a normal distribution, calibrated so that the mean is 100 and the standard deviation is 15.

• Marilyn vos Savant's IQ is 228. What is the probability of someone randomly chosen from the general population having an IQ score larger than this?
Calculating with the normal distribution (cont.)

Example: (cont.)

Let X have a normal distribution with mean 100 and standard deviation 15. The question asks for P(X > 228).

• Method One: using a computer

  • In R, 1-pnorm(228, mean=100, sd=15) gives a value of 0 to machine precision ...

  • Suppose her IQ were a mere 150. Then 1-pnorm(150, mean=100, sd=15) gives 0.0004.
Calculating with the normal distribution (cont.)

Example: (cont.)

• Method Two: using standardisation

• We have:

  P(X > 228) = P( (X − μ)/σ > (228 − μ)/σ ) = P( Z > (228 − 100)/15 ) = P(Z > 8.53)

So the probability we need is P(Z > 8.53), where Z has a standard normal distribution. But how do we work out what this is? We have to look this probability up in a table, like the one given in the textbook. Actually, 8.53 is well beyond the upper limit of the values in the table. You won't be required to read normal tables for anything in this course.

Calculating with the normal distribution (cont.)

• Consider a normally distributed random variable X with mean μ and variance σ², and the random variable Z = (X − μ)/σ, which is standard normal.

• Then, for c > 0,

  P(μ − cσ < X < μ + cσ) = P(−c < Z < c)

• For example,

  P(μ − σ < X < μ + σ) = P(−1 < Z < 1) ≈ 0.682
  P(μ − 2σ < X < μ + 2σ) = P(−2 < Z < 2) ≈ 0.954
  P(μ − 3σ < X < μ + 3σ) = P(−3 < Z < 3) ≈ 0.997

• This is sometimes called the 68-95-99.7 rule (a quick R check follows this list).

• If data are normally distributed, roughly 68% (≈ 2/3) of observations should be within one standard deviation (SD) of the mean, and roughly 95% of observations should be within two SDs of the mean.
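A quick R check of the rule (not from the original slides), using pnorm() for the standard normal distribution:

    ## Probability of falling within 1, 2 and 3 standard deviations of the mean.
    pnorm(1) - pnorm(-1)   # about 0.683
    pnorm(2) - pnorm(-2)   # about 0.954
    pnorm(3) - pnorm(-3)   # about 0.997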

The 68-95-99.7 rule

[Figure: the normal probability density function with mean μ and standard deviation σ, showing the probability of falling into the regions μ ± σ, μ ± 2σ and μ ± 3σ: 68%, 95% and 99.7% respectively]

Why is the normal distribution so important?

Motivating example: sums of dice

• The next five slides show the probabilities of the sums of n = 1, 2, 3, 4 and 5 dice.

• What happens to the shape of the probabilities as the number of dice n increases?

• Amazingly, this will happen with (almost) any random variable: as long as n is large enough, the probabilities for the sum (and mean) will start to follow a normal bell-shaped curve.
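A small simulation sketch in R (not part of the original slides) reproducing the idea behind the next five slides: it estimates the probabilities of the sum of n dice by simulation and plots them for n = 1, ..., 5.

    ## Estimated probabilities for the sum of n fair dice, n = 1, ..., 5.
    set.seed(1)
    par(mfrow = c(1, 5))               # five plots side by side
    for (n in 1:5) {
      sums <- replicate(20000, sum(sample(1:6, n, replace = TRUE)))
      barplot(table(sums) / length(sums),
              main = paste("Sum of", n, ifelse(n == 1, "die", "dice")))
    }
    par(mfrow = c(1, 1))               # reset the plotting layout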

Why is the normal distribution so important? (cont.)

[Figure: bar chart of the probabilities for the sum of one die (values 1 to 30 on the horizontal axis)]
[Figure: bar chart of the probabilities for the sum of two dice]
[Figure: bar chart of the probabilities for the sum of three dice]
[Figure: bar chart of the probabilities for the sum of four dice]
[Figure: bar chart of the probabilities for the sum of five dice]
The Central Limit Theorem (CLT)

• Consider a random variable, such as the number shown on a die roll.

• Suppose we observe n realisations of the random variable, i.e. we roll the die n times.

• If n is large enough, then the distribution of the sum (and mean) of the n values approximately follows a normal distribution.

NOTE: We assume here that the value obtained on one die roll does not affect the value obtained on another die roll, i.e. we are assuming independence. If we do not have independence, then the CLT may not hold. But there are many different sets of assumptions under which the CLT holds.
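A minimal R sketch (not from the original slides) illustrating the CLT for die rolls: it compares the simulated distribution of the sum of n = 30 independent rolls with a normal curve having the same mean and variance (for a fair die, one roll has mean 3.5 and variance 35/12).

    ## CLT illustration: sum of 30 independent fair die rolls vs a normal curve.
    set.seed(1)
    n     <- 30
    sums  <- replicate(50000, sum(sample(1:6, n, replace = TRUE)))
    mu    <- n * 3.5                   # mean of the sum
    sigma <- sqrt(n * 35 / 12)         # standard deviation of the sum
    hist(sums, breaks = 30, freq = FALSE, main = "Sum of 30 die rolls")
    curve(dnorm(x, mean = mu, sd = sigma), add = TRUE)   # normal approximation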
What does the CLT tell us?

• It shows that a normally distributed random variable can be regarded as the sum (or mean) of a large number of small random contributions.

• Often it can be argued that variables observed in the real physical world are subject to a large number of different sources of variability.

• It is therefore not very surprising that many real-life variables are of the form signal + noise, where the noise has an approximate normal distribution.

Why is the CLT (and the normal distribution) important?

• It explains why many real-life observed variables have a signal + noise form, with the noise following a normal distribution.

• In statistics, very commonly used quantities are sums or means of observations, so the CLT tells us that these quantities have approximate normal distributions.

• Hence many methods in statistics rely on the normal distribution.

Connection between binomial and normal distribution

Assume X follows a binomial distribution with parameters n and p. Then we can view X as the sum of n independent random variables, each having a Bernoulli distribution with parameter p.

Hence, if Y is a normally distributed random variable with mean np and variance np(1 − p), then

  P(X = x) ≈ P(x − 1/2 ≤ Y ≤ x + 1/2) = ∫_{x−1/2}^{x+1/2} f_Y(u) du ≈ f_Y(x),

where f_Y(y) = (1/√(2πnp(1 − p))) exp( −(y − np)² / (2np(1 − p)) ) is the probability density function of Y.

NOTE: a typical rule of thumb is that this approximation is valid if np ≥ 5 and n(1 − p) ≥ 5.
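A small R sketch (not from the original slides) comparing the exact binomial probabilities with this normal approximation, using the Binomial(20, 0.4) example shown on the next slide.

    ## Binomial(20, 0.4) pmf vs the N(8, 4.8) density evaluated at the integers.
    n <- 20; p <- 0.4
    x <- 0:n
    exact  <- dbinom(x, size = n, prob = p)
    approx <- dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p)))
    round(cbind(x, exact, approx), 4)  # the two columns agree closely near the mean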
Connection between binomial and normal distribution (cont.)

[Figure: Binomial(20, 0.4) pmf with the N(8, 4.8) pdf overlaid, x from 0 to 20]

Connection between binomial and normal distribution (cont.)

[Figure: Binomial(20, 0.5) pmf with the N(10, 5) pdf overlaid, x from 0 to 20]

Connection between binomial and normal distribution (cont.)

• You find that your GEM2900 lecturer is becoming increasingly difficult, unreasonable and paranoid as the semester progresses.

• He has just set a continual assessment (CA) for you to do on the IVLE, containing 100 multiple choice questions with 4 options each.

• As you don't have time for this kind of thing, you decide to simply guess an answer at random for each question.

• What is the probability that you pass the CA (that is, what is the probability that you obtain a score of 50 or more correct)?

Connection between binomial and normal distribution (cont.)

• Let X be your score. If you guess each question at random then clearly X ∼ Binomial(100, 0.25). Using the result that a binomial with parameters n and p has mean np and variance np(1 − p), we have E(X) = 25 and SD(X) = √(300/16) ≈ 4.33.

• Let Y be a normal random variable with the same mean and variance as X, i.e. Y ∼ N(25, 300/16).

• Then

  P(X ≥ 50) ≈ P(Y ≥ 49.5)
            = P( (Y − μ)/σ ≥ (49.5 − μ)/σ )
            = P( Z ≥ (49.5 − 25)/√(300/16) )
            = P(Z ≥ 5.66)
            ≈ 7.6 × 10⁻⁹
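The same answer can be obtained in R, both exactly and via the normal approximation; a minimal sketch (not from the original slides):

    ## Probability of scoring 50 or more out of 100 when guessing with p = 0.25.
    1 - pbinom(49, size = 100, prob = 0.25)                    # exact binomial tail
    1 - pnorm(49.5, mean = 25, sd = sqrt(100 * 0.25 * 0.75))   # approx. 7.6e-9, as above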

Connection between Poisson and normal distribution

Assume X follows a Poisson distribution with parameter λ. Then we can view X as the sum of n independent random variables, each having a Poisson distribution with parameter λ/n.

Hence, if Y is a normally distributed random variable with mean λ and variance λ, then

  P(X = x) ≈ P(x − 1/2 ≤ Y ≤ x + 1/2) = ∫_{x−1/2}^{x+1/2} f_Y(u) du ≈ f_Y(x),

where f_Y(y) = (1/√(2πλ)) exp( −(y − λ)² / (2λ) ) is the probability density function of Y.
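As in the binomial case, a small R sketch (not from the original slides) comparing the exact Poisson probabilities with the normal density, for the Poisson(125) example on the next slide.

    ## Poisson(125) pmf vs the N(125, 125) density evaluated at the integers.
    lambda <- 125
    x <- 100:150
    round(cbind(x,
                exact  = dpois(x, lambda),
                approx = dnorm(x, mean = lambda, sd = sqrt(lambda))), 4)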

Connection between Poisson and normal distribution (cont.)

[Figure: Poisson(125) pmf with the N(125, 125) pdf overlaid, x from 100 to 160]

