GEM2900: Understanding Uncertainty & Statistical Thinking

David Nott
standj@nus.edu.sg
Department of Statistics and Applied Probability
National University of Singapore

Semester 2, 2008/2009
The normal distribution

• The normal distribution, also called the Gaussian distribution, can be used to model continuous random variables.

• The normal distribution has two parameters, μ and σ², called the (population) mean and (population) variance, respectively.

• If the random variable X follows a normal distribution with parameters μ and σ² (also denoted by X ∼ N(μ, σ²)), then E[X] = μ and Var[X] = σ².

Woolfson (2008, Chapter 10)

The normal distribution (cont.)

• The probability density function (pdf) of the normal distribution is symmetric, with a smooth bell shape. The formula for the pdf of the N(μ, σ²) distribution is

  f(x) = (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) )

  (a short R sketch evaluating this pdf follows below).

• The pdf is largest at μ, the expected value, and the spread of the density curve is controlled by σ, the standard deviation.

• Areas of regions underneath the pdf represent probabilities.

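A minimal R sketch (not part of the original slides) evaluating and plotting this pdf with R's dnorm() function; the values of μ and σ below are arbitrary illustrative choices.

    ## Evaluate and plot the N(mu, sigma^2) pdf with dnorm().
    mu    <- 10                        # illustrative mean
    sigma <- 2                         # illustrative standard deviation
    curve(dnorm(x, mean = mu, sd = sigma),
          from = mu - 4 * sigma, to = mu + 4 * sigma,
          xlab = "x", ylab = "f(x)", main = "Normal pdf")   # smooth bell shape
    dnorm(mu, mean = mu, sd = sigma)   # the pdf takes its largest value at x = mu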
The normal distribution (cont.)

[Figure: normal pdfs with the same mean but different spreads; f(x) plotted against x over the range 5 to 15]

The normal distribution (cont.)

• As mentioned earlier, the normal distribution is sometimes also called the Gaussian distribution, after the German mathematician Carl Friedrich Gauss.

• The Gaussian distribution, however, was not first discovered by Gauss; this is an example of what is known as Stigler's law (the law that no discovery is named after its original discoverer). Stigler's law is of course itself an example of Stigler's law ...

• Until the conversion to the euro, the German ten-mark note carried a picture of Gauss, a picture of the Gaussian density curve, and even the formula for the Gaussian density.

The normal distribution (cont.)

[Image: presumably the German ten-mark note described on the previous slide, showing Gauss and the Gaussian density curve]
Brownian motion

• One problem in which the Gaussian distribution arises is the description of the physical phenomenon of Brownian motion. Albert Einstein's mathematical description of Brownian motion is widely regarded as having convinced most physicists of the existence of atoms.

• Brownian motion is named after Robert Brown, who observed the highly erratic motion of pollen grains in a drop of water.

• The motion of the grains can be thought of as arising from a large number of collisions with the water molecules. Brownian motion is an idealized model for the coordinates of this motion.

• For a particle starting at the origin and followed for a period of time t, the distribution of the x or y coordinate will be N(0, σ²t) for some σ² > 0.

• You will understand better later in the course why summing a large number of independent perturbations from molecular collisions results in a Gaussian distribution.
Brownian motion

• It is possible to simulate paths of Brownian motion on a computer; a small R sketch is given after this list.

• The Java applet at http://www.ms.uky.edu/~mai/java/stat/brmo.html simulates a Brownian motion in two dimensions.

• The paths traced out by the process tend to be highly erratic (in fact, although they are continuous they are nowhere differentiable, for those of you who have done enough math to know what that means).
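Since the applet link above may no longer work, here is a minimal R sketch (not from the original slides) that simulates a two-dimensional Brownian motion path by summing independent normal increments; the number of steps and the choice σ² = 1 are illustrative assumptions.

    ## Simulate a 2-D Brownian motion path over the time interval [0, 1].
    set.seed(1)                               # for reproducibility
    n  <- 5000                                # number of time steps
    dt <- 1 / n                               # length of each step
    dx <- rnorm(n, mean = 0, sd = sqrt(dt))   # x-increments ~ N(0, dt)
    dy <- rnorm(n, mean = 0, sd = sqrt(dt))   # y-increments ~ N(0, dt)
    x  <- c(0, cumsum(dx))                    # x-coordinate, starting at the origin
    y  <- c(0, cumsum(dy))                    # y-coordinate, starting at the origin
    plot(x, y, type = "l", main = "Simulated 2-D Brownian motion")
    ## At time t = 1 each coordinate is approximately N(0, t), i.e. N(0, sigma^2 t)
    ## with sigma^2 = 1.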

Brownian motion

• Although Robert Brown wasn't the first to observe Brownian motion (Stigler's law again), he was certainly the most systematic experimenter to look at this phenomenon.

• He looked not just at pollen grains suspended in a water drop, but also at many other things, including scrapings of particles from the Sphinx, which he had access to in his work as a curator at the British Museum. (There was some question about whether only living things were subject to Brownian motion, and apparently he regarded the Sphinx as undeniably, certifiably dead.)

Brownian motion

• A century and a half before Brown, a draper from Delft, Antony van Leeuwenhoek, had observed Brownian motion.

• Among other things, he had looked at scrapings from the unbrushed teeth of old men.

• Einstein was the first scientist to really take an interest in Brownian motion who also had the mathematical ability to describe it analytically.

• Other scientists had guessed the correct explanation but had not done any calculations that could be compared with experiments.

Calculating with the normal distribution

• A normal distribution is specified by its mean μ and its variance σ².

• The standard normal distribution has mean μ = 0 and variance σ² = 1.

• If a random variable is distributed as standard normal, it is typically denoted by Z rather than X, and standard normal observations are known as z-values.

• The standard normal distribution is symmetric about μ = 0, hence P(Z < −c) = P(Z > c) for all c ∈ R.

• Probabilities of (continuous) random variables are given by areas under pdf curves. For example, suppose Z is standard normal. Then the probability P(0 < Z < 1), i.e. the probability that Z is between zero and one, is equal to the area of the shaded region on the next slide.

Calculating with the normal distribution (cont.)

[Figure: standard normal pdf plotted against z values from −3 to 3, with the area between z = 0 and z = 1 shaded]

Calculating with the normal distribution (cont.)

How to find normal probabilities?

• Use a computer. All statistical software packages will calculate P(X < c) for X normally distributed with mean μ and standard deviation σ. For example, R has the command pnorm().

• If we wish to calculate a probability P(a < X < b), then we can use a computer to calculate P(X < a) and P(X < b) and then use

  P(a < X < b) = P(X < b) − P(X < a)

  (a short R sketch follows this list).

• Alternatively, the probabilities can be calculated "by hand" using a technique known as standardisation.
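A minimal R sketch (not in the original slides) illustrating both points with pnorm(); the mean 100 and standard deviation 15 are arbitrary illustrative values.

    ## P(85 < X < 130) for X ~ N(100, 15^2), using P(X < b) - P(X < a).
    pnorm(130, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)
    ## P(0 < Z < 1) for a standard normal Z (the shaded area on the previous slide):
    pnorm(1) - pnorm(0)                # approximately 0.34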
Calculating with the normal distribution (cont.)

Standardisation

• The standardised version of a random variable X with (population) mean μ and (population) standard deviation σ is

  Z = (X − μ)/σ

  (a small R check follows this list).

• The mean of Z is zero; the standard deviation of Z is one.

• If X has a normal distribution, then Z has a standard normal distribution! This result does not necessarily hold for other distributions.
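A minimal R check (not in the original slides) that standardisation gives the same answer as working on the original scale directly; the numbers 100, 15 and 130 are arbitrary.

    ## P(X < 130) for X ~ N(100, 15^2), computed two ways.
    pnorm(130, mean = 100, sd = 15)    # directly on the original scale
    pnorm((130 - 100) / 15)            # via the standardised value z = 2
    ## both give about 0.977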

Calculating with the normal distribution (cont.)

Example: Marilyn vos Savant's IQ

• Earlier in the course I mentioned Marilyn vos Savant, who was listed for a time in the Guinness Book of Records as the person with the world's highest IQ.

• She popularized the Monty Hall problem in her column in Parade magazine. Her presentation of the problem and her (correct) solution generated a lot of heated discussion at the time.

• IQ is a rather controversial concept, as intelligence is certainly a multi-faceted trait. (If you are interested in the history and abuses of IQ testing, you might like to read Stephen Jay Gould's (1981) book "The Mismeasure of Man".)

• IQ scores are often assumed to follow a normal distribution, calibrated so that the mean is 100 and the standard deviation is 15.

• Marilyn vos Savant's IQ is 228. What is the probability of someone randomly chosen from the general population having an IQ score larger than this?
Calculating with the normal distribution (cont.)

Example: (cont.)

Let X have a normal distribution with mean 100 and standard deviation 15. The question asks for P(X > 228).

• Method One: using a computer

  • In R, 1-pnorm(228, mean=100, sd=15) gives a value of 0 to machine precision ...

  • Suppose her IQ were a mere 150. Then 1-pnorm(150, mean=100, sd=15) gives 0.0004.
Calculating with the normal distribution (cont.)

Example: (cont.)

• Method Two: using standardisation

• We have:

  P(X > 228) = P( (X − μ)/σ > (228 − μ)/σ ) = P( Z > (228 − 100)/15 ) = P(Z > 8.53)

So the probability we need is P(Z > 8.53), where Z has a standard normal distribution. But how do we work out what this is? We have to look this probability up in a table, like the one given in the textbook. Actually, 8.53 is well beyond the upper limit of the values in the table. You won't be required to read normal tables for anything in this course.

Calculating with the normal distribution (cont.)

• Consider a normally distributed random variable X with mean μ and variance σ², and the random variable Z = (X − μ)/σ, which is standard normal.

• Then, for c > 0,

  P(μ − cσ < X < μ + cσ) = P(−c < Z < c)

• For example,

  P(μ − σ < X < μ + σ) = P(−1 < Z < 1) ≈ 0.682
  P(μ − 2σ < X < μ + 2σ) = P(−2 < Z < 2) ≈ 0.954
  P(μ − 3σ < X < μ + 3σ) = P(−3 < Z < 3) ≈ 0.997

• This is sometimes called the 68-95-99.7 rule (a quick R check follows this list).

• If data are normally distributed, roughly 68% (≈ 2/3) of observations should be within one standard deviation (SD) of the mean, and roughly 95% of observations should be within two SDs of the mean.
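A quick R check of the rule (not from the original slides), using pnorm() for the standard normal distribution:

    ## Probability of falling within 1, 2 and 3 standard deviations of the mean.
    pnorm(1) - pnorm(-1)   # about 0.683
    pnorm(2) - pnorm(-2)   # about 0.954
    pnorm(3) - pnorm(-3)   # about 0.997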

The 68-95-99.7 rule

[Figure: the normal probability density function with mean μ and standard deviation σ, showing the probability of falling into the regions μ ± σ, μ ± 2σ and μ ± 3σ: 68%, 95% and 99.7% respectively]

Why is the normal distribution so important?

Motivating example: sums of dice

• The next five slides show the probabilities of the sums of n = 1, 2, 3, 4 and 5 dice.

• What happens to the shape of the probabilities as the number of dice n increases?

• Amazingly, this will happen with (almost) any random variable: as long as n is large enough, the probabilities for the sum (and mean) will start to follow a normal bell-shaped curve.
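A small simulation sketch in R (not part of the original slides) reproducing the idea behind the next five slides: it estimates the probabilities of the sum of n dice by simulation and plots them for n = 1, ..., 5.

    ## Estimated probabilities for the sum of n fair dice, n = 1, ..., 5.
    set.seed(1)
    par(mfrow = c(1, 5))               # five plots side by side
    for (n in 1:5) {
      sums <- replicate(20000, sum(sample(1:6, n, replace = TRUE)))
      barplot(table(sums) / length(sums),
              main = paste("Sum of", n, ifelse(n == 1, "die", "dice")))
    }
    par(mfrow = c(1, 1))               # reset the plotting layout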

Why is the normal distribution so important? (cont.)

[Figure: bar chart of the probabilities for the sum of one die (values 1 to 30 on the horizontal axis)]
[Figure: bar chart of the probabilities for the sum of two dice]
[Figure: bar chart of the probabilities for the sum of three dice]
[Figure: bar chart of the probabilities for the sum of four dice]
[Figure: bar chart of the probabilities for the sum of five dice]
The Central Limit Theorem (CLT)

• Consider a random variable, such as the number shown on a die roll.

• Suppose we observe n realisations of the random variable, i.e. we roll the die n times.

• If n is large enough, then the distribution of the sum (and mean) of the n values approximately follows a normal distribution.

NOTE: We assume here that the value obtained on one die roll does not affect the value obtained on another die roll, i.e. we are assuming independence. If we do not have independence, then the CLT may not hold. But there are many different sets of assumptions under which the CLT holds.
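A minimal R sketch (not from the original slides) illustrating the CLT for die rolls: it compares the simulated distribution of the sum of n = 30 independent rolls with a normal curve having the same mean and variance (for a fair die, one roll has mean 3.5 and variance 35/12).

    ## CLT illustration: sum of 30 independent fair die rolls vs a normal curve.
    set.seed(1)
    n     <- 30
    sums  <- replicate(50000, sum(sample(1:6, n, replace = TRUE)))
    mu    <- n * 3.5                   # mean of the sum
    sigma <- sqrt(n * 35 / 12)         # standard deviation of the sum
    hist(sums, breaks = 30, freq = FALSE, main = "Sum of 30 die rolls")
    curve(dnorm(x, mean = mu, sd = sigma), add = TRUE)   # normal approximation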
What does the CLT tell us?

• It shows that a normally distributed random variable can be regarded as the sum (or mean) of a large number of small random contributions.

• Often it can be argued that variables observed in the real physical world are subject to a large number of different sources of variability.

• It is therefore not very surprising that many real-life variables are of the form signal + noise, where the noise has an approximate normal distribution.

Why is the CLT (and the normal distribution) important?

• It explains why many real-life observed variables have a signal + noise form, with the noise following a normal distribution.

• In statistics, very commonly used quantities are sums or means of observations, so the CLT tells us that these quantities have approximate normal distributions.

• Hence many methods in statistics rely on the normal distribution.

Connection between binomial and normal distribution

Assume X follows a binomial distribution with parameters n and p. Then we can view X as the sum of n independent random variables, each having a Bernoulli distribution with parameter p.

Hence, if Y is a normally distributed random variable with mean np and variance np(1 − p), then

  P(X = x) ≈ P(x − 1/2 ≤ Y ≤ x + 1/2) = ∫_{x−1/2}^{x+1/2} f_Y(u) du ≈ f_Y(x),

where f_Y(y) = (1/√(2πnp(1 − p))) exp( −(y − np)² / (2np(1 − p)) ) is the probability density function of Y.

NOTE: a typical rule of thumb is that this approximation is valid if np ≥ 5 and n(1 − p) ≥ 5.
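A small R sketch (not from the original slides) comparing the exact binomial probabilities with this normal approximation, using the Binomial(20, 0.4) example shown on the next slide.

    ## Binomial(20, 0.4) pmf vs the N(8, 4.8) density evaluated at the integers.
    n <- 20; p <- 0.4
    x <- 0:n
    exact  <- dbinom(x, size = n, prob = p)
    approx <- dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p)))
    round(cbind(x, exact, approx), 4)  # the two columns agree closely near the mean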
Connection between binomial and normal distribution (cont.)

[Figure: Binomial(20, 0.4) pmf with the N(8, 4.8) pdf overlaid, x from 0 to 20]

Connection between binomial and normal distribution (cont.)

[Figure: Binomial(20, 0.5) pmf with the N(10, 5) pdf overlaid, x from 0 to 20]

Connection between binomial and normal distribution (cont.)

• You find that your GEM2900 lecturer is becoming increasingly difficult, unreasonable and paranoid as the semester progresses.

• He has just set a continual assessment (CA) for you to do on the IVLE, containing 100 multiple choice questions with 4 options each.

• As you don't have time for this kind of thing, you decide to simply guess an answer at random for each question.

• What is the probability that you pass the CA (that is, what is the probability that you obtain a score of 50 or more correct)?

Connection between binomial and normal distribution (cont.)

• Let X be your score. If you guess each question at random then clearly X ∼ Binomial(100, 0.25). Using the result that a binomial with parameters n and p has mean np and variance np(1 − p), we have E(X) = 25 and SD(X) = √(300/16) ≈ 4.33.

• Let Y be a normal random variable with the same mean and variance as X, i.e. Y ∼ N(25, 300/16).

• Then

  P(X ≥ 50) ≈ P(Y ≥ 49.5)
            = P( (Y − μ)/σ ≥ (49.5 − μ)/σ )
            = P( Z ≥ (49.5 − 25)/√(300/16) )
            = P(Z ≥ 5.66)
            ≈ 7.6 × 10⁻⁹
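The same answer can be obtained in R, both exactly and via the normal approximation; a minimal sketch (not from the original slides):

    ## Probability of scoring 50 or more out of 100 when guessing with p = 0.25.
    1 - pbinom(49, size = 100, prob = 0.25)                    # exact binomial tail
    1 - pnorm(49.5, mean = 25, sd = sqrt(100 * 0.25 * 0.75))   # approx. 7.6e-9, as above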

Connection between Poisson and normal distribution

Assume X follows a Poisson distribution with parameter λ. Then we can view X as the sum of n independent random variables, each having a Poisson distribution with parameter λ/n.

Hence, if Y is a normally distributed random variable with mean λ and variance λ, then

  P(X = x) ≈ P(x − 1/2 ≤ Y ≤ x + 1/2) = ∫_{x−1/2}^{x+1/2} f_Y(u) du ≈ f_Y(x),

where f_Y(y) = (1/√(2πλ)) exp( −(y − λ)² / (2λ) ) is the probability density function of Y.
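As in the binomial case, a small R sketch (not from the original slides) comparing the exact Poisson probabilities with the normal density, for the Poisson(125) example on the next slide.

    ## Poisson(125) pmf vs the N(125, 125) density evaluated at the integers.
    lambda <- 125
    x <- 100:150
    round(cbind(x,
                exact  = dpois(x, lambda),
                approx = dnorm(x, mean = lambda, sd = sqrt(lambda))), 4)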

Connection between Poisson and normal distribution (cont.)

[Figure: Poisson(125) pmf with the N(125, 125) pdf overlaid, x from 100 to 160]

