
ST1051-ST3905-ST5005-ST6030

ST1051 - Introduction to Probability and Statistics


ST3905 - Applied Probability and Statistics
ST5005 - Introduction to Probability and Statistics
ST6030 - Foundations of Statistical Data Analytics

Eric Wolsztynski

eric.w@ucc.ie

Department of Statistics
School of Mathematical Sciences
University College Cork, Ireland

2015-2016
Version 1.0
ST1051-ST3905-ST5005-ST6030

Acknowledgment
These lecture notes make use of former material written by Dr Kingshuk Roy Choudhury and Dr Supratik Roy for previous course syllabi. This material largely used [Dekking et al 2005].

However, the structure of the course was completely reviewed in 2014-15 and updated again for 2015-16. Updates are based mainly on [Rice 1995].

All mistakes and inaccuracies are the sole responsibility of their author, Eric Wolsztynski.

For any comment or query about this document, please contact eric.w@ucc.ie.

IPS 2
ST1051-ST3905-ST5005-ST6030
Course information

References
[1] J. A. Rice, Mathematical Statistics and Data Analysis, 2nd Edition, ITP Duxbury Press 1995

[2] J. L. Devore, Probability and Statistics for Engineering and the Sciences, 3rd Edition, Brooks-Cole 1991

[3] F. M. Dekking, C. Kraaikamp, H. P. Lopuhaä and L. E. Meester, A Modern Introduction to Probability and Statistics, Springer 2005

[4] B.W. Lindgren, Statistical Theory, Fourth Edition, Chapman & Hall, 1993

[5] D.A. Berry and B.W. Lindgren, Statistics: Theory and Methods, 2nd edition, 1995

[6] MIT OpenCourseWare (MIT online lecture material): http://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/

[7] J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference, 4th Edition, Dekker 2014

[8] B. S. Everitt and T. Hothorn, A Handbook of Statistical Analyses Using R, Second Edition, Chapman & Hall
2010

[9] M. J. Crawley, Statistics: an Introduction Using R, Wiley 2005

[10] R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. URL http://www.R-project.org/.

IPS 3
ST1051-ST3905-ST5005-ST6030
Course information

Timetable

This module is taught in Period 1

Lectures: Mondays 3-4pm in BHSC G01
          Fridays 3-4pm in WGB G05

Tutorials: Fridays 11am-12pm in Windle ANLT
           Fridays 4-5pm in WGB G05

Practicals (ST1051): Monday 4-5pm in lab WGB G34 (TBC)
                     Tuesday 3-4pm in lab WGB G33 (TBC)

IPS 4
ST1051-ST3905-ST5005-ST6030
Course information

Assessment

ST1051/ST3905:

2 home assignments (10 + 10 marks)
+ 90-minute exam (80 marks)

ST5005/ST6030:

3 home assignments (10 + 10 + 30 marks)
+ 90-minute exam (50 marks)

IPS 5
ST1051-ST3905-ST5005-ST6030
Course information

Module objective

To provide an understanding of fundamental notions of Probability and Statistics, and explore basic probability and statistical notions underlying hypothesis-driven data analytic methods.

IPS 6
ST1051-ST3905-ST5005-ST6030
Outline

1 Motivation

2 Elements of Probability Theory

3 Discrete Random Variables

4 Continuous Random Variables

5 Limit theorems

6 Statistical Inference

7 Estimation

8 Hypothesis Testing
IPS 7
ST1051-ST3905-ST5005-ST6030
Motivation

Section I

Motivation

IPS 8
ST1051-ST3905-ST5005-ST6030
Motivation
General concepts

Probability? Statistics?

Focus on random or unpredictable phenomena

Goal is usually to understand, represent, describe or predict

Probability theory aims at describing reality: a mathematical framework for representing real-life phenomena

Statistics aims at providing models and techniques to analyse observations: a data-driven approach

The central feature is always the information (data).

IPS 9
ST1051-ST3905-ST5005-ST6030
Motivation
General concepts

Statistics consists in the collection and analysis of data.

Probability theory provides a mathematical foundation for statistics.
IPS 10
ST1051-ST3905-ST5005-ST6030
Motivation
Examples

Typical examples
Business, financial mathematics and actuarial science:
decision making, investment strategies

trading (high-probability trading, return plans, strategies, ...)

insurance / pensions (premium pricing, risk assessment, ...)

Engineering:
tracking mobile terminals in wireless networks

image and video processing

Medical and biostatistics:
clinical trials

diagnostic and prognostic analyses

genomics
IPS 11
ST1051-ST3905-ST5005-ST6030
Motivation
Examples

Why probability and statistics: space shuttle Challenger

[Dekking et al 2005]

On 28th January 1986, the space shuttle Challenger exploded about one minute after it had taken off from the launch pad at Kennedy Space Center in Florida

Root cause of the disaster: failure of O-rings (sealed joints that link the rocket boosters)

Apparently, a management decision was made to overrule the engineers' recommendation not to launch

IPS 12
ST1051-ST3905-ST5005-ST6030
Motivation
Examples

Why probability and statistics: space shuttle Challenger

The Challenger launch was the 24th of the space shuttle program, and we can look at the data on the number of failed O-rings, available from previous launches

Each rocket has three O-rings, and two rocket boosters are used per launch

Because low temperatures are known to adversely affect the O-rings, we also look at the corresponding launch temperature

IPS 13
ST1051-ST3905-ST5005-ST6030
Motivation
Examples

Figure: number of failed O-rings per mission (figure omitted)

There are 23 dots: one time the boosters could not be recovered from the ocean; temperatures are rounded to the nearest degree Fahrenheit; in case of two or more equal data points, these are shifted slightly

IPS 14
ST1051-ST3905-ST5005-ST6030
Motivation
Examples

Modelling...
The probability p(t) that an individual O-ring fails should depend on the launch temperature t. Use the data to calibrate this model (a Binomial distribution) and estimate the expected number of failures per launch, 6p(t).

IPS 15
ST1051-ST3905-ST5005-ST6030
Motivation
Examples

Aftermath...

Combining these with estimated probabilities of other events needed for a complete failure of the joint, the estimated probability of failure is 0.023...

Six field joints imply that the probability of at least one complete failure is 1 − (1 − 0.023)^6 ≈ 0.13

Would you hop on the shuttle?
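This back-of-envelope calculation is easy to reproduce in R (the course's computing environment, [10]); a small sketch using the estimated per-joint failure probability above:

    p.joint <- 0.023        # estimated probability that one field joint fails completely
    1 - (1 - p.joint)^6     # P(at least one of six joints fails) = 0.13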

IPS 16
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory

Section II

Elements of Probability Theory

IPS 17
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory

Outline

Introduction

Events and set operations

Computing probabilities

Conditional probability and independence

Random variables and distributions

IPS 18
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Introduction

Probability

Probability, chance, randomness, likelihood, ...

Probability theory aims at representing chance phenomena mathematically.

Mathematics allows us to organise the information and its complexity.

Ultimately, probabilities are always ratios of counts.

IPS 19
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Introduction

Outcomes, events, and sample spaces

Random or unpredictable phenomenon = experiment with outcomes

The outcomes are elements of a sample space Ω

Subsets of Ω are called events

An event is assigned a probability, between 0 and 1, that expresses its likelihood

Sample spaces: sets whose elements describe the outcomes

Basic experiment: the tossing of a coin; 2 possible outcomes: heads and tails. Sample space Ω = {H, T}

IPS 20
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Introduction

Outcomes, events, and sample spaces


A commuter drives through a sequence of 3 intersections with traffic lights. Each time, she either stops (s) or continues (c). The sample space is the set of all possible outcomes:

Ω = {ccc, ccs, css, csc, sss, ssc, scc, scs}

Experiment: ask the next person we meet on the street in which month her birthday falls. Sample space:

Ω = {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec}

Question: the length of time between successive earthquakes in Nice (France) that are greater than a given magnitude may also be considered an experiment. What is the sample space for this experiment?
IPS 21
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Introduction

Products of sample spaces


Common scenario: same experiment performed several times

Ex: throw a coin twice. What is the sample space?

Ω = {H, T} × {H, T} = {(H, H), (H, T), (T, H), (T, T)}

If we had a fair coin, i.e., P(H) = P(T), then

P((H, H)) = P((H, T)) = P((T, H)) = P((T, T)) = 1/4

Generally, for two experiments with sample spaces Ω₁ and Ω₂, the sample space for the combined experiment is

Ω = Ω₁ × Ω₂ = {(ω₁, ω₂) : ω₁ ∈ Ω₁, ω₂ ∈ Ω₂}

If |Ω₁| = r and |Ω₂| = s, then |Ω₁ × Ω₂| = rs
IPS 22
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Introduction

Events
Recall: subsets of the sample space are called events

Event A occurs if the experiment outcome is an element of set A

Example (birthday experiment): events = outcomes corresponding to a long month (31 days):
L = {Jan, Mar, May, Jul, Aug, Oct, Dec}

Events may be combined according to the usual set operations

Example: event = the months having "r" in their name:
R = {Jan, Feb, Mar, Apr, Sep, Oct, Nov, Dec}

Then long months having the letter "r" are
L ∩ R = {Jan, Mar, Oct, Dec}
IPS 23
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Events and set operations

Events

The set L ∩ R is called the intersection of L and R and occurs if both L and R occur

Similarly, we have the union A ∪ B of two sets A and B, which occurs if at least one of the events A and B occurs

Another common operation is taking complements

The event A^c = {ω ∈ Ω : ω ∉ A} is called the complement of A; it occurs if and only if A does not occur

The complement of Ω is denoted ∅, the empty set, or impossible event

IPS 24
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Events and set operations

Sets and set operations and Events

Events A and B are disjoint or mutually exclusive if A and B have no outcomes in common, i.e. A ∩ B = ∅.
Ex: {the birthday falls in a long month} ∩ {Feb} = ∅

Event A implies event B if the outcomes of A also lie in B: A ⊂ B

De Morgan's laws: for any two events A and B,
(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Events and set operations

Sets and set operations and Events


Let:
J be the event "John is to blame"

M be the event "Mary is to blame"

Express the following two statements in terms of the events J, J^c, M, M^c:
"It is certainly not true that neither John nor Mary is to blame"

"John or Mary is to blame, or both"

Check the equivalence of the statements by means of De Morgan's laws

IPS 26
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Events and set operations

Disjoint and contained events

Minimal and maximal intersection of two sets.

IPS 27
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability

Probability = measure of how likely it is that an event occurs

A probability is a ratio of counts:

P(A) = (number of outcomes in which event A occurs) / (total number of possible outcomes)

The number P(A) is called the probability that A occurs

We assign a probability to each event

Since each event has to be assigned a probability, we speak of a probability function

IPS 28
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability

Definition: a probability function P on a finite sample space Ω assigns to each event A in Ω a number P(A) in [0, 1] such that

(i) P(Ω) = 1

(ii) P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅

IPS 29
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability
Recall:
(i) P(Ω) = 1

(ii) P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅

(i) states that the outcome of the experiment is always an element of the sample space

(ii) is the additivity property of a probability function

(ii) implies additivity of the probability function over more than two sets

If A, B, C are disjoint events, then (A ∪ B) ∩ C = ∅ and

P(A ∪ B ∪ C) = P(A ∪ B) + P(C) = P(A) + P(B) + P(C)
IPS 30
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability

Example: to decide whether Peter or Paul has to wash the dishes, we may toss a coin

We consider this coin fair, which implies heads and tails are equally likely to occur

So we put P({H}) = P({T}) = 1/2

We write {H} for the set consisting of the single element H, because a probability function is defined on events, not on outcomes

IPS 31
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability

Example: due to an asymmetric distribution of the mass over the coin, the coin may not be completely fair

For example, P(H) = 0.4999 and P(T) = 0.5001

Bernoulli experiment: two possible outcomes, say "failure" and "success", with probabilities 1 − p and p of occurring, where p ∈ [0, 1]

Example: buying a ticket in a lottery with 10,000 tickets and only one prize, where "success" stands for winning the prize; then p = 10⁻⁴

IPS 32
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability
How should we assign probabilities in the experiment where we ask for the birthday month?

P(Jan) = P(Feb) = ... = P(Dec) = 1/12

What about long/short months?

P(Jan) = 31/365 and P(Apr) = 30/365

Assuming that one in every four years is a leap year, how would you assign a probability to each month?

When outcomes are real numbers (e.g. time to the next earthquake), it is impossible to assign a positive probability to each outcome (there are just too many outcomes!)

IPS 33
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability: extensions to non-disjoint events

In general, additivity of P implies that the probability of an event is obtained by summing the probabilities of the outcomes belonging to the event

Exercise: compute P(L) and P(R) in the birthday experiment, where

L = {Jan, Mar, May, Jul, Aug, Oct, Dec}

R = {Jan, Feb, Mar, Apr, Sep, Oct, Nov, Dec}

IPS 34
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability: extensions to non-disjoint events

What rule computes probabilities of events A and B that are not disjoint?

Note that we can write

A = (A ∩ B) ∪ (A ∩ B^c)

which is a disjoint union

Hence
P(A) = P(A ∩ B) + P(A ∩ B^c)

IPS 35
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability: extensions to non-disjoint events

We can split A ∪ B in the same way with B and B^c

We obtain (A ∪ B) ∩ B and (A ∪ B) ∩ B^c

These boil down to B and A ∩ B^c respectively

Thus
P(A ∪ B) = P(B) + P(A ∩ B^c)

IPS 36
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Probability: extensions to non-disjoint events

Recall: for any two events A and B,

P(A) = P(A ∩ B) + P(A ∩ B^c)

P(A ∪ B) = P(B) + P(A ∩ B^c)

Eliminating P(A ∩ B^c) from these 2 equations, we obtain the rule:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

From the additivity property we can also find a way to compute probabilities of complements of events:

since A ∪ A^c = Ω, P(A^c) = 1 − P(A)

IPS 37
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Counting methods: combinations and permutations


Permutation: ordered arrangement of objects

Given a set of size n and a sample of size k, there are...

with replacement: n^k different ordered samples

without replacement:

A_k^n = n!/(n − k)! = n(n − 1)...(n − k + 1)

different ordered samples

Corollary: the number of orderings of n elements is

n! = n(n − 1)(n − 2)...1

Ex: there are 5! = 120 ways to line up five children


IPS 38
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Counting methods: combinations and permutations


Combinations:

C_k^n = (n choose k) = n!/(k!(n − k)!)

enumerates the number of possible combinations of k out of n items

Using C_k^n = n!/(k!(n − k)!) implies that order does not matter

Application: these binomial coefficients occur in

(a + b)^n = Σ_{k=0}^{n} C_k^n a^k b^{n−k}

(try with a = b = 1)
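These counting rules are built into R; a brief sketch matching the formulas above:

    factorial(5)          # 5! = 120 ways to line up five children
    choose(10, 3)         # C_3^10 = 10!/(3! 7!) = 120 combinations of 3 out of 10
    sum(choose(4, 0:4))   # binomial theorem with a = b = 1: (1 + 1)^4 = 16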
IPS 39
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Counting methods: contingency tables


Ex: at a particular police checkpoint, 20% of females fail a breath test for drunken driving. The corresponding percentage for males is 40%. Of the individuals tested, 70% are male.

1 How likely is it that a randomly selected individual passes the breath test?

2 How likely is it that a randomly selected male passes the breath test?

3 Suppose that an individual fails a breath test; what is the probability this individual is female?

IPS 40
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Computing probabilities

Ex: at a particular police checkpoint, 20% of females fail a breath test for drunken driving. The corresponding percentage for males is 40%. Of the individuals tested, 70% are male...

First, pick a hypothetical number of participants, then apply these proportions in a contingency table:

                 Gender
Breath test   Male   Female   Total
Pass           420      240     660
Fail           280       60     340
Total          700      300   1,000

Now we can answer the questions (cf. tutorial)...
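A short R sketch of the same table, to read the answers off as ratios of counts (the tutorial works these out by hand; the numbers are the hypothetical counts above):

    tab <- matrix(c(420, 280, 240, 60), nrow = 2,
                  dimnames = list(test = c("Pass", "Fail"),
                                  gender = c("Male", "Female")))
    sum(tab["Pass", ]) / sum(tab)               # 1. P(pass) = 660/1000
    tab["Pass", "Male"] / sum(tab[, "Male"])    # 2. P(pass | male) = 420/700
    tab["Fail", "Female"] / sum(tab["Fail", ])  # 3. P(female | fail) = 60/340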

IPS 41
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Conditional probability

Ex [Rice 1995 p.15]: digitalis therapy is beneficial to patients with a particular heart condition, but it has a risk of intoxication (a serious side-effect that is difficult to diagnose).
For diagnosis purposes, the concentration of digitalis in the blood was measured in 135 patients and the results are arranged as follows:

T+ high blood concentration (positive test)

T− low blood concentration (negative test)

D+ toxicity (disease present)

D− no toxicity (disease absent)
IPS 42
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Conditional probability
T+ high blood concentration (positive test)

T− low blood concentration (negative test)

D+ toxicity (disease present)

D− no toxicity (disease absent)

        Toxicity
        D+   D−   Total
T+      25   14     39
T−      18   78     96
Total   43   92    135

IPS 43
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Conditional probability
Converting the frequencies to proportions (out of 135):

        D+   D−   Total          D+     D−     Total
T+      25   14     39    T+    .185   .104    .289
T−      18   78     96    T−    .135   .578    .711
Total   43   92    135    Total .318   .682   1.000

From the table: P(T+) = .289, P(D+) = .318.

If one knows that the test for high blood concentration was positive, what is the probability of disease (toxicity)?

P(D+ | T+) = P(D+ ∩ T+) / P(T+) = 25/39 = .185/.289 = .640 = 64%

IPS 44
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Definition of Conditional Probability

Definition: the conditional probability of A given B is

P(A | B) = P(A ∩ B) / P(B) if P(B) > 0, and 0 otherwise

The multiplication rule follows: for any events A and B,

P(A ∩ B) = P(A | B) P(B)

Show that P(A | B) + P(A^c | B) = 1

Let B be a fixed conditioning event and define Q(A) = P(A | B) for events A ⊂ Ω; then Q is a probability function and hence satisfies all the rules

IPS 45
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

The law of total probability and Bayes rule


2001: the EC introduced massive testing of cattle to determine infection with the transmissible form of Bovine Spongiform Encephalopathy (BSE, "mad cow disease") [Dekking et al 2005]

As no test is 100% accurate, most tests have the problem of false positives and false negatives

False positive: the test says the cow is infected, although it actually isn't

False negative: an infected cow is not detected by the test

Let B = "cow has BSE" and T = "test comes up positive"

"Test the test" by analyzing samples from cows that are known to be infected or known to be healthy, and so determine the effectiveness of the test.
IPS 46
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Results may be summarized as follows: an infected cow has a 70% chance of testing positive, and a healthy cow just 10%; i.e. P(T | B) = 0.70, P(T | B^c) = 0.10

What is the probability P(T) that an arbitrary cow tests positive?

The tested cow is either infected or it is not: T occurs in combination with B or with B^c (no other possibilities)

In terms of events, T = (T ∩ B) ∪ (T ∩ B^c), so that

P(T) = P(T ∩ B) + P(T ∩ B^c)

P(T ∩ B) = P(T | B) P(B)

P(T ∩ B^c) = P(T | B^c) P(B^c)

P(T) = P(T | B) P(B) + P(T | B^c) P(B^c)

IPS 47
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Recall:

P(T) = P(T | B) P(B) + P(T | B^c) P(B^c)

This is an application of the law of total probability:

computing a probability through conditioning on several disjoint events that make up the whole sample space

Suppose P(B) = 0.02; then:

P(T) = 0.70 × 0.02 + 0.10 × (1 − 0.02) = 0.112

IPS 48
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Total probability

Exercise: calculate P(T) when P(T | B) = 0.99 and P(T | B^c) = 0.05

The law of total probability:

Suppose B_1, B_2, ..., B_m are disjoint events such that ∪_{i=1}^m B_i = Ω

The probability of an arbitrary event A can be expressed as:

P(A) = Σ_{i=1}^m P(A | B_i) P(B_i)

IPS 49
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Bayes Theorem
Suppose a cow tests positive; what is the probability it really has BSE?

I.e. what is P(B | T) given P(T | B)? We have:

P(B | T) = P(T ∩ B) / P(T) = P(T | B) P(B) / [P(T | B) P(B) + P(T | B^c) P(B^c)]

So with P(B) = 0.02 we find

P(B | T) = (0.70 × 0.02) / (0.70 × 0.02 + 0.10 × (1 − 0.02)) = 0.125

Similarly: P(B | T^c) = 0.0068

This is not a very good test; a perfect test would result in P(B | T) = 1 and P(B | T^c) = 0. The formula used above is Bayes' rule.
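The whole computation as a sketch in R:

    prior <- 0.02                                 # P(B)
    pTgB  <- 0.70                                 # P(T | B)
    pTgBc <- 0.10                                 # P(T | B^c)
    pT    <- pTgB * prior + pTgBc * (1 - prior)   # law of total probability: 0.112
    pTgB * prior / pT                             # P(B | T)   = 0.125
    (1 - pTgB) * prior / (1 - pT)                 # P(B | T^c) = 0.0068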
IPS 50
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Bayes Theorem

Bayes' rule:
Suppose the events B_1, B_2, ..., B_m are disjoint and ∪_{i=1}^m B_i = Ω. Then

P(B_i | A) = P(A | B_i) P(B_i) / Σ_{j=1}^m P(A | B_j) P(B_j)

It follows from P(B_i | A) P(A) = P(A | B_i) P(B_i) in combination with the law of total probability applied to P(A)

Mad cow example: calculate P(B | T) and P(B | T^c) if P(T | B) = 0.99 and P(T | B^c) = 0.05

IPS 51
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence
Consider the three probabilities

P(B) = 0.02, P(B | T) = 0.125, P(B | T^c) = 0.0068

If we know nothing about a cow, we would say that there is a 2% chance it is infected (B)

But if we know it tested positive (T), we can say there is a 12.5% chance the cow is infected

If it tested negative (T^c), there is only a 0.68% chance

Knowing whether T occurs affects our assessment of the likelihood of B
IPS 52
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence

Consider the three probabilities

P(B) = 0.02, P(B | T) = 0.125, P(B | T^c) = 0.0068

Imagine the opposite: the test is useless

Whether the cow is infected is unrelated to the outcome of the test, and knowing the outcome of the test does not change our probability of B: P(B | T) = P(B)

In this case we would call B independent of T

IPS 53
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence

Definition:

An event A is called independent of B if P(A | B) = P(A)

By application of the multiplication rule, if A is independent of B, then

P(A ∩ B) = P(A | B) P(B) = P(A) P(B)

On the other hand, if P(A ∩ B) = P(A) P(B), then P(A | B) = P(A) follows from the definition of conditional probability

Therefore A independent of B is equivalent to P(A ∩ B) = P(A) P(B)

IPS 54
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence
Finally, by definition of conditional probability, if A is independent of B, then

P(B | A) = P(A ∩ B) / P(A) = P(A) P(B) / P(A) = P(B)

that is, B is independent of A

To show that A and B are independent it suffices to prove just one of the following:
P(A | B) = P(A)
P(B | A) = P(B)
P(A ∩ B) = P(A) P(B)
where A may be replaced by A^c, or B by B^c, or both

If one of these statements holds, all of them are true


IPS 55
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence: example [Rice 1995 p.22]

A card is selected at random from a deck. Let A = "card is an ace" and D = "card is a diamond".

Knowing the card is an ace gives no information about its suit

Checking formally for independence: P(A) = 4/52 = 1/13 and P(D) = 1/4

Also, P(A ∩ D) = 1/52

Since P(A) P(D) = (1/4) × (1/13) = 1/52, the events are in fact independent

IPS 56
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence of more than two events

Events A_1, A_2, ..., A_m are called independent if

P(∩_{i=1}^m A_i) = Π_{i=1}^m P(A_i)

This holds if any subset is replaced by complements

Suppose A and B are independent, and B and C are independent. Are A and C then independent?

IPS 57
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence of more than two events: example

Perform two independent tosses of a coin

Let A = {heads on toss 1}, B = {heads on toss 2}, and C = {the two tosses are equal}

P(A) = P(B) = 1/2,

P(C) = P(A ∩ B) + P(A^c ∩ B^c) = 1/4 + 1/4 = 1/2

A, B are independent by assumption

IPS 58
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Conditional probability and independence

Independence of more than two events: example


Given that the first toss is heads (A occurs), C occurs if and only if the second toss is heads as well (B occurs), so

P(C | A) = P(B | A) = P(B) = 1/2 = P(C)

By symmetry, P(C | B) = P(C)

So all pairs taken from A, B, C are independent: the three are called pairwise independent

But P(A ∩ B ∩ C) = P(A ∩ B) = 1/4, whereas P(A) P(B) P(C) = 1/8

And P(A ∩ B ∩ C^c) = P(∅) = 0, whereas P(A) P(B) P(C^c) = 1/8
IPS 59
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Random variables

A random variable is a variable of interest whose values are not known in advance and are subject to chance (variability).

Each possible value of a r.v. has an associated likelihood, or probability (or mass, depending on the context).

A r.v. is actually a mapping defined over the whole sample space, i.e. it is a function. The development of random variables is associated with measure theory.

IPS 60
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Discrete random variables


Let Ω be a sample space. A discrete random variable is a function X : Ω → ℝ that takes on a finite number of values a_1, a_2, ..., a_n or an infinite number of values a_1, a_2, ...

In a way, a discrete random variable X transforms a sample space Ω to a more tangible sample space Ω̃, whose events are more relevant

Example: two throws with a die and the corresponding sum and maximum

For instance, S = "sum" transforms

Ω = {(1, 1), (1, 2), ..., (1, 6), (2, 1), ..., (6, 5), (6, 6)}

to Ω̃ = {2, ..., 12}

M = "maximum" transforms Ω to Ω̃ = {1, ..., 6}
IPS 61
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Formally, we must determine the probability distribution of X, i.e. describe how the probability mass is distributed over the possible values of X

Once a discrete r.v. X is introduced, we can list the possible values of X and their corresponding probabilities, and the sample space is no longer important

This information is contained in the probability mass function (pmf) of X

Ex (maximum): what is the pmf of M?

M = a    1     2     3     4     5     6
p(a)   1/36  3/36  5/36  7/36  9/36  11/36
IPS 62
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

The probability mass function p of a discrete random variable X is the function p : ℝ → [0, 1] defined by

p(a) = P(X = a) for −∞ < a < ∞.

If X is a discrete random variable that takes on the values a_1, a_2, ..., then

p(a_i) > 0 and Σ_i p(a_i) = 1

and p(a) = 0 for all other a.

IPS 63
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Definition: the distribution function F of a random variable X is the function F : ℝ → [0, 1] defined by

F(a) = P(X ≤ a) for −∞ < a < ∞.

Both the probability mass function and the distribution function of a discrete random variable X contain all the probabilistic information of X

The probability distribution of X is determined by either of them

IPS 64
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Properties of pmf and cdf

Example plots for M (figure omitted)

IPS 65
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Properties of the distribution function F of a random variable X:

1 For a ≤ b one has that F(a) ≤ F(b)

2 Since F(a) is a probability, 0 ≤ F(a) ≤ 1, and

lim_{a→+∞} F(a) = 1
lim_{a→−∞} F(a) = 0

3 F is right-continuous, i.e., one has

lim_{ε↓0} F(a + ε) = F(a)

NB: a ≤ b implies that the event {X ≤ a} is contained in the event {X ≤ b}

Conversely, any function F satisfying 1, 2, and 3 is the distribution function of some random variable
IPS 66
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Exercise: let X be a discrete random variable, and let a be such that p(a) > 0. Show that

F(a) = P(X < a) + p(a)

IPS 67
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Continuous random variables

Let Ω be a sample space. A continuous random variable is a function X : Ω → ℝ that can take on any value a ∈ ℝ

We no longer consider the mass of each possible value of X

Instead we consider the likelihood that X ∈ (a, b) for a < b

Example: the pH level X of some chemical compound can take any value between 0 and 14. We would then evaluate e.g. the probability that 5.5 ≤ X ≤ 6.5.

IPS 68
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Continuous random variables


The probability density function (pdf) f(x) of X is an integrable function such that

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Conditions on f:
f(x) ≥ 0 for all x
∫_{−∞}^{+∞} f(x) dx = 1

The cdf of a continuous r.v. X is defined as

F(x) = ∫_{−∞}^x f(u) du = P(X ≤ x)

IPS 69
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Expectation and variance


Definition: the expected value of a discrete random variable X is defined as

E(X) = Σ_{x_i} x_i p(x_i)

Definition: the expected value of a continuous random variable X is defined as

E(X) = ∫_{−∞}^{+∞} x f(x) dx

E(X) is said to exist if the corresponding sum or integral exists.

IPS 70
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Definition: the variance of a random variable X is defined as

Var(X) = E[(X − E(X))²] = E(X²) − E(X)²

The variance of a discrete r.v. X is obtained from the pmf:

Var(X) = Σ_{x_i} x_i² p(x_i) − (Σ_{x_i} x_i p(x_i))²

The variance of a continuous r.v. X is obtained from the pdf:

Var(X) = ∫_{−∞}^{+∞} x² f(x) dx − (∫_{−∞}^{+∞} x f(x) dx)²

IPS 71
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

The standard deviation of a r.v. X is

σ(X) = √Var(X)

It has the same dimension as the measure itself: e.g. if X is expressed in metres, then so is σ(X)

IPS 72
ST1051-ST3905-ST5005-ST6030
Elements of Probability Theory
Random variables and distributions

Some properties of E(X) and Var(X)

Expectation:

E(aX) = a E(X), a constant
E(XY) = E(X) E(Y) if X and Y are independent
E(a + bX) = a + b E(X) (linearity)
E(X + Y) = E(X) + E(Y) (linearity)
E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i]

Variance:

Var(aX) = a² Var(X), a constant

Var(a + X) = Var(X), a constant
IPS 73
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables

Section III

Discrete Random Variables

IPS 74
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables

Outline

The Binomial distribution

The Geometric distribution

The Poisson distribution

IPS 75
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

Binomial experiments
Consider an experiment with outcomes 1 ("success") and 0 ("failure"), performed five times

Then Ω = {0, 1} × {0, 1} × {0, 1} × {0, 1} × {0, 1}

Consider A = "exactly one experiment was a success"

This event is given by the set

A = {(0, 0, 0, 0, 1), (0, 0, 0, 1, 0), (0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0)}

Let success have probability p and failure probability 1 − p

Then P(A) = 5(1 − p)⁴p, since there are five outcomes in the event A, each having probability (1 − p)⁴p
IPS 76
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

Binomial experiments

Exercise: what is the probability of the event B = "exactly two experiments were successful"?

IPS 77
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

The Bernoulli and Binomial distributions


The Bernoulli distribution is used to model an experiment with only two possible outcomes, often referred to as "success" and "failure", usually encoded as 1 and 0.

Definition: a discrete random variable X has a Bernoulli distribution with parameter p, where 0 ≤ p ≤ 1, if its probability mass function is given by

p_X(1) = P(X = 1) = p

and

p_X(0) = P(X = 0) = 1 − p

Notation: X ∼ Ber(p).


IPS 78
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

Suppose you attend, completely unprepared, a multiple-choice exam

It consists of 10 questions, and each question has four alternatives (of which only one is correct)

You will pass the exam if you answer six or more questions correctly

You decide to answer each of the questions in a random way, in such a way that the answer to one question is not affected by the answers to the others

What is the probability that you will pass?

IPS 79
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

Bernoulli / Binomial

Setting, for i = 1, 2, ..., 10:

R_i = 1 if the i-th answer is correct, 0 if the i-th answer is wrong

The number of correct answers X is given by X = Σ_{i=1}^{10} R_i

Exercise: calculate the probability that you answered the first question correctly and the second one incorrectly
IPS 80
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

X attains only the values 0, 1, ..., 10

Let us first consider the case X = 0

Since the answers to the different questions do not influence each other, we conclude that the events {R_1 = a_1}, ..., {R_10 = a_10} are independent for every choice of the a_i, where each a_i is 0 or 1

We have
P(X = 0) = P(R_1 = 0, R_2 = 0, ..., R_10 = 0)
         = P(R_1 = 0) P(R_2 = 0) ... P(R_10 = 0)
         = (3/4)^10

The probability that we have answered exactly one question correctly equals

P(X = 1) = 10 × (1/4) × (3/4)⁹
IPS 81
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

The probability of observing k independent successes is

P(X = k) = C_k^{10} (1/4)^k (3/4)^{10−k}

When order matters:

Choose k different objects out of an ordered list of n objects:

n possibilities for the first object

n − 1 possibilities for the second object

n − 2 possibilities for the third object

...

n − (k − 1) possibilities for the k-th object

So there are n(n − 1)...(n − (k − 1)) ways to choose the k objects
IPS 82
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

When order does not matter:

Any two arrangements represent the same choice if they are composed of the same objects

Thus a single choice of a collection of k objects corresponds to k! ordered arrangements

So the distinct number of choices is obtained by dividing the number of ordered arrangements by k!

The probability that you will pass is P(X ≥ 6) ≈ 0.0197

It pays to study, doesn't it?!
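A one-line check in R, where X ∼ Bin(10, 1/4):

    1 - pbinom(5, size = 10, prob = 1/4)   # P(X >= 6) = 0.0197
    sum(dbinom(6:10, 10, 1/4))             # same answer, summing the pmf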

IPS 83
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Binomial distribution

Definition: a discrete random variable X has a Binomial distribution with parameters n and p, where n = 1, 2, ... and 0 ≤ p ≤ 1, if its probability mass function is given by

p_X(k) = P(X = k) = C_k^n p^k (1 − p)^{n−k}

for k = 0, 1, ..., n

We denote this distribution by Bin(n, p)

The expectation of a Binomial distribution Bin(n, p) is

E(X) = np

Its variance is

Var(X) = np(1 − p)

IPS 84
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Geometric distribution

The Geometric distribution


Example of infinite experiments: the geometric experiment

1 Each observation falls into one of two categories, either "success" or "failure"

2 The probability of a success, call it p, is the same for each observation

3 The observations are all independent (this allows us to multiply probabilities)

4 The variable of interest is the number of trials required to obtain the first success

IPS 85
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Geometric distribution

Definition: a discrete random variable X has a Geometric distribution with parameter p, where 0 < p ≤ 1, if its probability mass function is given by

p_X(k) = P(X = k) = (1 − p)^{k−1} p

for k = 1, 2, ...

We denote this distribution by Geo(p)

The expectation of a Geometric distribution Geo(p) is

E(X) = Σ_{k=1}^∞ k p (1 − p)^{k−1} = 1/p

Its variance is

Var(X) = (1 − p)/p²
IPS 86
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Geometric distribution

Geometric distribution

Exercise:
Let X have a Geo(p) distribution. For n ≥ 0, show that P(X > n) = (1 − p)^n.
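A numerical check in R; note that R's geometric distribution counts the failures Y = X − 1 before the first success, so P(X > n) = P(Y ≥ n) = 1 − pgeom(n − 1, p):

    p <- 0.3; n <- 4
    1 - pgeom(n - 1, p)   # P(X > 4) = 0.2401
    (1 - p)^n             # matches (1 - p)^n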

IPS 87
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Geometric distribution

Memoryless property
For n, k = 0, 1, 2, ... one has

P(X > n + k | X > k) = P(X > n)

We have:

P(X > n + k | X > k) = P({X > k + n} ∩ {X > k}) / P(X > k)
                     = P(X > k + n) / P(X > k)
                     = (1 − p)^{n+k} / (1 − p)^k
                     = (1 − p)^n
                     = P(X > n)
IPS 88
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Poisson distribution

The Poisson distribution


One may be interested in counts per unit time/space interval

If counts per unit interval are typically relatively low (rare events), the situation may be modelled by a Poisson distribution

Example: we observe the number X of incoming calls (events) at a call centre per hour

Two assumptions:
Homogeneity: the rate λ at which events occur is constant over time/space

Independence: the numbers of events in disjoint intervals are independent of each other

Homogeneity implies that we require, for any unit interval,

E(X) = λ
IPS 89
ST1051-ST3905-ST5005-ST6030
Discrete Random Variables
The Poisson distribution

Definition: a discrete random variable X has a Poisson distribution with parameter λ > 0 if its probability mass function p is given by

p(k) = P(X = k) = e^{−λ} λ^k / k!

for k = 0, 1, 2, ...

We denote this distribution by Poi(λ).

Derivation of the expectation of a Poisson r.v. X with rate λ:

E(X) = Σ_{k=0}^∞ k e^{−λ} λ^k / k!
     = λ e^{−λ} Σ_{k=1}^∞ λ^{k−1} / (k − 1)!
     = λ e^{−λ} Σ_{j=0}^∞ λ^j / j! = λ

The variance can be derived in a similar way:

Var(X) = λ
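Both results can be checked numerically in R (truncating the infinite sum far into the tail):

    lambda <- 2.5
    k <- 0:100                           # tail beyond 100 is negligible here
    m <- sum(k * dpois(k, lambda))       # E(X), comes out as 2.5
    sum(k^2 * dpois(k, lambda)) - m^2    # Var(X), also 2.5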
IPS 90
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables

Section IV

Continuous Random Variables

IPS 91
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables

Outline

Continuous random variables

The Uniform distribution

The Exponential distribution

The Normal distribution

Moments

IPS 92
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables

Many experiments have outcomes that take values on a continuous scale (e.g. of weight, length, duration, etc.)

Probability density functions may be seen as a (never-ending) process of refinement from discrete random variables

Ex: a discrete r.v. takes on the value 6.283 with probability p. This value may be refined (updated at a smaller scale), and then the probability p is spread over the outcomes 6.2830, 6.2831, ..., 6.2839

Each of these new values is taken on with a probability smaller than p, and the sum of the ten probabilities is p

IPS 93
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables

Continuing the refinement process to more and more decimals, the probabilities of the possible values of the outcomes become smaller and smaller, approaching zero

However, the probability that the possible values lie in some fixed interval [a, b] will settle down

IPS 94
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables


A random variable X is continuous if for some function f : ℝ → ℝ and for any numbers a, b with a ≤ b,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

and f satisfies f(x) ≥ 0 for all x and ∫_{−∞}^{+∞} f(x) dx = 1. We call f the probability density function (or probability density) of X.

IPS 95
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables


P(a ≤ X ≤ b) = area under the probability density function f on the interval [a, b] (figure omitted)

IPS 96
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables

Let X be a continuous random variable. Then for any ε > 0, we have:

P(a − ε ≤ X ≤ a + ε) = ∫_{a−ε}^{a+ε} f(x) dx

For ε → 0, it follows that P(X = a) = 0 for all a

For any constants a, b:

P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

IPS 97
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables


For small ε > 0,

P(a − ε ≤ X ≤ a + ε) = ∫_{a−ε}^{a+ε} f(x) dx ≈ 2εf(a)

Hence f(a) can be interpreted as a (relative) measure of how likely it is that X will be near a

f(a) is not a probability: f(a) can be arbitrarily large

Ex: the function

f(x) = 0 if x ≤ 0; 1/(2√x) if 0 < x < 1; 0 if x ≥ 1

is a probability density function (f(x) → ∞ as x ↓ 0, yet its integral over (0, 1) is 1)
IPS 98
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables


Discrete r.v.s do not have a probability density function f

Continuous r.v.s do not have a probability mass function p

But both have a distribution function F(a) = P(X ≤ a)

For a < b, the event {X ≤ b} is a disjoint union of the events {X ≤ a} and {a < X ≤ b}

We can express the probability that X lies in an interval (a, b] directly in terms of F for both cases:

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)

For a continuous r.v., F(b) = ∫_{−∞}^b f(x) dx

IPS 99
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables - example

Suppose we want to make a probability model for an experiment that can be described as "an object hits a disc of radius r in a completely arbitrary way". We are interested in the distance X between the hitting point and the center of the disc.

Since distances cannot be negative, we have F(b) = P(X ≤ b) = 0 when b < 0

Since the object hits the disc, we have F(b) = 1 when b > r

The probability of hitting any region is proportional to the area of that region

IPS 100
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables - example

The original disc has area πr²

The inner disc defined by the hitting point has radius b and area πb²

We should put F(b) = P(X ≤ b) = πb²/(πr²) = b²/r² for 0 ≤ b ≤ r

The pdf f of X is equal to 0 outside the interval [0, r] and, for 0 ≤ x ≤ r,

f(x) = dF(x)/dx = (1/r²) d(x²)/dx = 2x/r²

IPS 101
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Continuous random variables - exercise

Exercise:
Compute, for the darts example, the probability that 0 < X ≤ r/2, and the probability that r/2 < X ≤ r.

IPS 102
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Continuous random variables

Expected value of a functional of a random variable

Let g(x) be a function of a r.v. X; then g(X) is also a random variable

Then, for the discrete and continuous cases respectively,

E[g(X)] = Σ_x g(x) P(X = x)

and

E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx

E[g(X)] is said to exist if the corresponding sum or integral exists

IPS 103
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Uniform distribution

The Uniform distribution

The Uniform distribution corresponds to an experiment where the outcome is completely arbitrary, except that we know that it lies between certain bounds

Example: measure for a long time the emission of radioactive particles of some material

IPS 104
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Uniform distribution

The Uniform distribution

Suppose the experiment consists of recording, in each hour, at what times the particles are emitted

Then the outcomes will lie in the interval [0, 60] minutes

The measurements must not concentrate in any temporal way (in our physical world anyway)

"Not concentrating in any way" means that subintervals of the same length should have the same probability

The pdf should be constant on [0, 60]

IPS 105
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Uniform distribution

The Uniform distribution

A continuous r.v. has a Uniform distribution on the interval [α, β] if its probability density function f is given by

f(x) = 0 if x ∉ [α, β], and f(x) = 1/(β − α) for α ≤ x ≤ β

We denote this distribution by U(α, β).

Exercise:
Argue that the distribution function F of a r.v. that has a U(α, β) distribution is given by F(x) = 0 if x < α, F(x) = 1 if x > β, and F(x) = (x − α)/(β − α) for α ≤ x ≤ β.

IPS 106
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Uniform distribution

The Uniform distribution


The pdf and the distribution function of a U(0, 1/3) distribution (figure omitted)

IPS 107
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Exponential distribution

The Exponential distribution


Describes how long until something happens (e.g. time between emissions of particles from a radioactive source)

May be obtained as a continuous-time limit of the Geometric distribution

Also has the memoryless property:

P(X > s + t | X > s) = P(X > t)

Notation: Exp(λ), with rate λ > 0

If X ∼ Exp(λ), then the range of X is ℝ₊

Cumulative distribution function:

F(a) = 1 − e^{−λa} for a ≥ 0

Probability density function:

f(x) = λ e^{−λx} for x ≥ 0
IPS 108
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Exponential distribution

The Exponential distribution


Exponential densities for various rates λ (figure omitted; en.wikipedia.org)

IPS 109
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Illustration
Example: relative frequency histogram of the lifetimes of a computer component (figure omitted)

What happens if one uses finer bins (with a large enough sample)?
IPS 110
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Using finer bins (classes), the histogram approaches a smooth curve (figure omitted):

This bell-shaped curve is typical of the Normal distribution


IPS 111
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

The Normal distribution

The normal distribution has two parameters: its mean μ and its standard deviation σ

Notation: N(μ, σ²)

If X ∼ N(μ, σ²), then the range of X is ℝ, with μ ∈ ℝ and σ > 0

The density is given by

f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}

IPS 112
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

The Normal distribution


The shape of the normal distribution varies according to the values of μ and σ

However, the distribution is always bell-shaped and symmetric about the mean μ

IPS 113
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

The Standard Normal distribution: probability table


To find probabilities, we would need to integrate the pdf

But integrating this pdf is not straightforward

Instead, use a table of standard normal probabilities

The normal table gives the areas to the right for a series of z-values, i.e. right-hand tails P(Z ≥ z)

IPS 114
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

The Standard Normal distribution: probability table

Table 1. Areas in the Tail of the Standard Normal Distribution

This table gives the probability that a standardised normal variable will be at least z = (x − μ)/σ, where μ is the mean and σ is the standard deviation of the normal variable.

z     .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

The Standard Normal distribution: exercises

Find P(Z < 0.45)

Find P(Z < 1.03)

Find P(0.36 ≤ Z ≤ 1.04)

Find P(0.48 ≤ Z ≤ 0.60)

Find P(−1.96 ≤ Z ≤ 0.63)
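The same probabilities are available in R via pnorm, which returns P(Z ≤ z) (upper tails are 1 - pnorm(z)):

    pnorm(0.45)                  # P(Z < 0.45)
    pnorm(1.03)                  # P(Z < 1.03)
    pnorm(1.04) - pnorm(0.36)    # P(0.36 <= Z <= 1.04)
    pnorm(0.60) - pnorm(0.48)    # P(0.48 <= Z <= 0.60)
    pnorm(0.63) - pnorm(-1.96)   # P(-1.96 <= Z <= 0.63)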

IPS 116
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Percentiles of the Normal distribution


We can also use the standard normal table to find percentiles
of the standard normal distribution

Ex: 80th percentile = value below which lies 80% of the


distribution, i.e. P(Z < P0.80 ) = 0.80

We can use the table of probabilities, working backwards

Since P(Z > P0.80 ) = 0.20, we rather look for 0.2000 within
the table

We read that
P(Z > 0.84) = 0.2005
P(Z > 0.85) = 0.1977
Therefore 0384 < P0.80 < 0.85

Approximating, we get P 0.841 IPS 117
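In R, percentiles come directly from the quantile function qnorm, with no backward table search:

    qnorm(0.80)   # the 80th percentile, 0.8416
    qnorm(0.95)   # the 95th percentile asked for in the next exercise, 1.6449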


ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Percentiles of the Normal distribution

Exercise: find the 95th percentile of the standard normal distribution...

This knowledge will become useful in later sections...

IPS 118
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Standardization

Most commonly, normal r.v.s are not standard, i.e. μ ≠ 0 and/or σ ≠ 1

Standardizing them allows one to apply the table of standard normal probabilities

Standardization: given a r.v. X ∼ N(μ, σ²),

Z = (X − μ)/σ ∼ N(0, 1)

This principle is also (implicitly) fundamental in many statistical inference methods

IPS 119
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Standardization: example
A life assurance company has established that the lifetimes of a certain subgroup of policy-holders are normally distributed with a mean of 72 years and a standard deviation of 4 years, i.e. the continuous lifetime H ∼ N(72, 4²)

What percentage of policy-holders lives longer than 78 years?

IPS 120
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Standardization: example

Standardize:

P(H > 78) = P((H − 72)/4 > (78 − 72)/4) = P(Z > 1.50)
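In R, either standardise by hand or let pnorm do it:

    pnorm(1.5, lower.tail = FALSE)     # P(Z > 1.50) = 0.0668
    1 - pnorm(78, mean = 72, sd = 4)   # same: about 6.7% live past 78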

IPS 121
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Adding Normal variables

The sum (or difference) of independent normal variables is also normally distributed

Suppose we have n independent normal variables X_1, X_2, ..., X_n where X_i ∼ N(μ_i, σ_i), i = 1, ..., n

Then, for a sequence of constants {a_1, ..., a_n},

Y = Σ_{i=1}^n a_i X_i ∼ N( Σ_{i=1}^n a_i μ_i , √(Σ_{i=1}^n a_i² σ_i²) )

IPS 122
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Adding Normal variables: exercises

A chemical detergent is made by mixing 2 ingredients, A and B.

The volumes of A are normally distributed with a mean of 50ml and a standard deviation of 1.5ml.

The volumes of B are normally distributed with a mean of 75ml and a standard deviation of 2.5ml.

The detergent is made by mixing 2 parts of A with 3 parts of B.

What proportion of detergent will have volumes greater than 330ml?
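A sketch of one reading of this exercise in R, assuming the five component volumes are independent draws, so the total is V = A₁ + A₂ + B₁ + B₂ + B₃ in the sense of the sum rule above:

    m <- 2 * 50 + 3 * 75                # E(V) = 325 ml
    s <- sqrt(2 * 1.5^2 + 3 * 2.5^2)    # sd(V) = sqrt(23.25) = 4.82 ml
    1 - pnorm(330, mean = m, sd = s)    # P(V > 330) = 0.150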

IPS 123
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
The Normal distribution

Adding Normal variables: exercises

At a certain bank, account balances are normally distributed with a mean of 1700 and a standard deviation of 100. A random sample of n accounts is taken.

What is the distribution of the sample total?

NB: each account balance is normally distributed with mean 1700 and standard deviation 100.

IPS 124
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Moments

E[X^k] is called the k-th raw moment of X, if the expectation exists, where k is any positive integer.

E[|X|^k] is called the k-th absolute moment.

E[(X − E[X])^k] is called the k-th central moment.

IPS 125
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Moment Generating Function


Special expectation: the Moment Generating Function (MGF)

φ_X(t) = E[e^{tX}]

if it exists. The MGF is an alternative way of specifying the distribution of a random variable.

For a continuous distribution,

φ_X(t) = ∫_{−∞}^{+∞} e^{tx} f(x) dx
       = ∫_{−∞}^{+∞} (1 + tx + t²x²/2! + ...) f(x) dx
       = 1 + t E[X] + t² E[X²]/2! + ...
IPS 126
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Moment Generating Function

Useful properties:
1 Moments as derivatives at 0:

lim_{t→0} d^k φ_X(t)/dt^k = E[X^k]

2 If X, Y are independent,

φ_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = φ_X(t) φ_Y(t)

IPS 127
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Examples of MGFs
X exponential, parameterized here by its mean β (i.e. f(x) = (1/β) e^{−x/β}, rate λ = 1/β):

φ_X(t) = ∫_0^∞ e^{tx} (1/β) e^{−x/β} dx
       = (1/β) ∫_0^∞ e^{x(t − 1/β)} dx
       = (1/β) [ e^{x(t − 1/β)} / (t − 1/β) ]_0^∞
       = (1/β) ( lim_{x→∞} e^{x(t − 1/β)}/(t − 1/β) − 1/(t − 1/β) )
       = 1/(1 − βt)

as long as t < 1/β, since then the upper limit vanishes
IPS 128
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Examples of MGFs
X ∼ N(μ, σ²):

φ_X(t) = ∫ e^{tx} (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx

Completing the square in the exponent gives, with μ' = μ + tσ²,

φ_X(t) = e^{μt + σ²t²/2} ∫ (1/(σ√(2π))) e^{−(x−μ')²/(2σ²)} dx

The integrand is another Gaussian density, with mean μ' and the same σ

Therefore the integral equals 1, and

φ_X(t) = e^{μt + σ²t²/2}
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Characteristic function
The MGF of a random variable does not always exist

Its characteristic function ψ_X always exists, where

ψ_X(t) = E[e^{itX}] = ∫ e^{itx} f(x) dx

is the Fourier transform of f(x)

Connection with the MGF: ψ_X(t) = φ_X(it)

For independent X and Y, characteristic functions multiply: ψ_{X+Y}(t) = ψ_X(t) ψ_Y(t), so their logarithms are additive


IPS 130
ST1051-ST3905-ST5005-ST6030
Continuous Random Variables
Moments

Cumulants
The log of the characteristic function is used to generate cumulants κ_n:

log ψ_X(t) = Σ_{n=1}^∞ κ_n (it)^n / n!

Cumulants are related to moments:

κ₁ = E[X]
κ₂ = E[X²] − E[X]²
κ₃ = 2E[X]³ − 3E[X]E[X²] + E[X³]
...

Their relationship to centred moments is simpler


IPS 131
ST1051-ST3905-ST5005-ST6030
Limit theorems

Section V

Limit theorems

IPS 132
ST1051-ST3905-ST5005-ST6030
Limit theorems

Outline

Motivation

Limit theorems

IPS 133
ST1051-ST3905-ST5005-ST6030
Limit theorems
Motivation

The Normal distribution: utility

Central Limit Theorem: under very general conditions, the distribution of the sum of a large number of mutually independent r.v.s may be approximated by a Normal distribution

This is a very important result that allows one to use the Normal distribution in a very large variety of situations

Confidence intervals, hypothesis tests, and regression models (describing relationships between variables) are some of the key elements of the theory of Statistics that largely rely on the normal distribution

IPS 134
ST1051-ST3905-ST5005-ST6030
Limit theorems
Limit theorems

Chebyshev's inequality

Let X_1, ..., X_n be a sequence of iid r.v.s with E(X_i) = μ and Var(X_i) = σ². Let

X̄_n = (1/n) Σ_{i=1}^n X_i

Then for any ε > 0, Chebyshev's inequality states that

P(|X̄_n − μ| > ε) ≤ Var(X̄_n)/ε² = σ²/(nε²)

IPS 135
ST1051-ST3905-ST5005-ST6030
Limit theorems
Limit theorems

The Law of Large Numbers

Let X_1, ..., X_n be a sequence of iid r.v.s with E(X_i) = μ and Var(X_i) = σ². Let

X̄_n = (1/n) Σ_{i=1}^n X_i

Then for any ε > 0,

P(|X̄_n − μ| > ε) → 0

as n → ∞

IPS 136
ST1051-ST3905-ST5005-ST6030
Limit theorems
Limit theorems

Definition: convergence in distribution

Let X_1, X_2, ... be a sequence of r.v.s with cdfs F_1, F_2, ..., and let X be a r.v. with cdf F. We say that X_n converges in distribution to X if

lim_{n→∞} F_n(x) = F(x)

at every point x where F is continuous

IPS 137
ST1051-ST3905-ST5005-ST6030
Limit theorems
Limit theorems

The Central Limit Theorem

Let X_1, X_2, ... be a sequence of iid r.v.s having mean 0 and variance σ², and common distribution function F and MGF M defined in a neighbourhood of 0. Let

S_n = Σ_{i=1}^n X_i

Then, for −∞ < x < ∞,

lim_{n→∞} P( S_n/(σ√n) ≤ x ) = Φ(x)

where Φ(x) denotes the cdf of the Standard Normal distribution

IPS 138
ST1051-ST3905-ST5005-ST6030
Limit theorems
Limit theorems

Examples:

Let X_1, ..., X_n be iid N(μ, σ²); then

Z_n = (S_n − nμ)/(σ√n) = (Σ_{i=1}^n X_i − nμ)/(σ√n) ∼ N(0, 1)

Let X_1, ..., X_12 (i.e. n = 12) be iid U(0, 1); then

S_12 − 6 = Σ_{i=1}^{12} X_i − 6 ∼ N(0, 1) approximately

(since X ∼ U(0, 1) has E(X) = 1/2 and Var(X) = 1/12, so S_12 has mean 6 and variance 1)
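A quick R simulation of the second example:

    z <- replicate(10000, sum(runif(12)) - 6)   # 10,000 draws of S_12 - 6
    c(mean(z), sd(z))                           # close to 0 and 1
    hist(z, breaks = 40)                        # bell-shaped, approximately N(0,1)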


Section VI

Statistical Inference


Outline

Exploratory Analysis and Descriptive statistics

Sampling

Exploratory data analysis


Probability? Statistics?


Statistics!

Moneyball (2011)


Population parameters

Statistical inference consists in estimating population features

Ex: population of N = 393 hospitals in a given country, for which the mean number of discharges is

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = 814.6$$

The population total (total number of discharges) is

$$\tau = \sum_{i=1}^{N} x_i = N\mu = 320{,}138$$

Population variance on number of discharges per hospital:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

Simple Random Sampling (SRS)

A sample of size n is picked to represent a population of size N

Most elementary form of sampling is SRS

Each sample of size n has the same probability of occurrence

There are $\binom{N}{n}$ such samples (without replacement)

Each item gets picked at most once

Can be performed using a (pseudo-)random generator, balls in an urn, etc. (see the R sketch below)
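For instance, an SRS of n = 25 hospitals out of the N = 393 of the earlier example may be drawn in R as follows (a sketch; the indices 1:393 simply label the hospitals):

    set.seed(1)
    srs <- sample(1:393, size = 25)   # SRS without replacement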


A sample of size n is picked to represent a population of size N

The sample mean number of discharges approximates μ with

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

An estimate of the population total (number of discharges) is then

$$\hat{\tau} = N\bar{X}$$

The population variance on the number of discharges per hospital is estimated with

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$


Stratified Random Sampling

For a variety of reasons a population may be partitioned into groups (strata)

These strata can then be sampled independently

Ex:
- human populations organised in geographical areas
- Irish pupils stratified by school
- shipments of goods stratified by carrier size (large, medium and small)

A final sample is obtained by combining the results from the strata


Consider L strata, each of size N_l, l = 1, ..., L (before sampling):

The sample mean for each stratum is

$$\bar{X}_l = \frac{1}{n_l}\sum_{i=1}^{n_l} X_{il}$$

which estimates μ_l, and the overall population mean estimate is

$$\bar{X} = \sum_{l=1}^{L} \frac{N_l \bar{X}_l}{N} = \sum_{l=1}^{L} W_l \bar{X}_l$$

The estimate of σ_l² is

$$s_l^2 = \frac{1}{n_l - 1}\sum_{i=1}^{n_l} (X_{il} - \bar{X}_l)^2$$

Cluster Sampling

In stratified random sampling, all strata get sampled from

This requirement may be unrealistic in some cases

Cluster sampling consists in grouping the population into clusters

SRS is then applied by selecting whole clusters

Usually produces greater sampling error than random or stratified sampling

But loss of precision may be outweighed by the efficiency of data collection


Systematic Sampling

Sampling at regular intervals

Ex: select every 10th member of a list

Requires the sequence of members to be random (i.e. not sorted) so as to avoid bias


Exploratory Data Analysis


The durations of 272 eruptions of the Old Faithful geyser at
Yellowstone National Park, Wyoming, USA, were recorded from
1st to 15th Aug 1985 (in seconds)


Exploratory Data Analysis


The durations of 272 eruptions of the Old Faithful geyser at
Yellowstone National Park, Wyoming, USA, were recorded
from 1st to 15th Aug 1985 (in seconds)

The variety in the lengths of the eruptions indicates that randomness is involved

By exploring the dataset we might learn about this randomness:
- which durations are more likely to occur?
- is there something like "the typical duration" of an eruption?
- do the durations vary symmetrically around the center of the dataset?


Yellowstone data: duration (seconds) of 272 eruptions


216 108 200 137 272 173 282 216 117 261 110 235 252 105 282
130 105 288 96 255 108 105 207 184 272 216 118 245 231 266
258 268 202 242 230 121 112 290 110 287 261 113 274 105 272
199 230 126 278 120 288 283 110 290 104 293 223 100 274 259
134 270 105 288 109 264 250 282 124 282 242 118 270 240 119
304 121 274 233 216 248 260 246 158 244 296 237 271 130 240
132 260 112 289 110 258 280 225 [......] 200 250 260 270 145 240
250 113 275 255 226 122 266 245 110 265 131 288 110 288 246
238 254 210 262 135 280 126 261 248 112 276 107 262 231 116
270 143 282 112 230 205 254 144 288 120 249 112 256 105 269
240 247 245 256 235 273 245 145 251 133 267 113 111 257 237
140 249 141 296 174 275 230 125 262 128 261 132 267 214 270
249 229 235 267 120 257 286 272 111 255 119 135 285 247 129
265 109 268
(Source: W. Härdle, Smoothing Techniques with Implementation in S, Springer, New York, 1991; Table 3, page 201)

Exploratory Data Analysis

In order to retrieve this type of information, just listing the observed durations does not help much

Somehow we must summarize the observed data

We could start by computing the mean of the data, which is 209.3 for the Old Faithful data

However, this is a poor summary of the dataset, because there is a lot more information in the observed durations

How do we get hold of this?


Ordered durations
96 100 102 104 105 105 105 105 105 105 107 107 108 108 108 108 109
109 109 110 110 110 110 110 110 110 111 111 112 112 112 112 112 112
112 112 113 113 113 113 115 115 116 116 117 118 118 118 119 119 119
120 120 120 120 121 121 121 122 122 124 125 125 126 126 126 128 129
130 130 131 132 132 132 133 134 134 135 135 136 137 138 139 140 141
142 143 144 144 145 145 149 157 158 168 173 174 184 199 200 200 202
205 207 210 210 214 214 216 216 216 216 221 223 224 225 226 226 229
230 230 230 230 230 231 231 233 235 235 235 237 237 238 238 240 240
240 240 240 240 242 242 243 244 244 245 245 245 245 [.....] 274 274
275 275 275 275 276 276 276 276 277 278 278 278 279 280 280 282 282
282 282 282 282 283 284 285 286 287 288 288 288 288 288 288 289 289
290 290 291 293 294 294 296 296 296 300 302 304 306
Middle elements (136th and 137th) = 240, much closer to max
(306) than to min (96) - implies asymmetry

Numerical summaries

A range of descriptive statistics can be used to build a numerical summary of a sample

This is useful in particular to focus on specific features

Depending on the nature and characteristics of the sampled data, some statistics are more adequate than others

Their names usually carry the terms "sample" or "empirical"

Some require the ordered sample

$$X_{[1]}, \ldots, X_{[n]}$$


Sample {x₁, ..., xₙ} = empirical information

Each observation xᵢ has empirical probability 1/n

Under usual regularity conditions, large sample theory highlights probabilistic notions

Ex: sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i \left(\frac{1}{n}\right)$$

compared with

$$E(X) = \sum_{x} x\,p(x) \quad \text{(discrete)}$$

$$E(X) = \int x\,f(x)\,dx \quad \text{(continuous)}$$


Common numerical summaries

Median:
- more robust than the sample mean
- is the value x_M such that P(X < x_M) = 0.5
- from an ordered sample X₍₁₎, ..., X₍ₙ₎: median = X₍₍ₙ₊₁₎/₂₎

Ex: median of (1, 3, 4, 7, 9) is 4
Ex: median of (1, 3, 4, 7, 100) is still 4
Ex: median of (1, 3, 4, 7, 9, 10) is 4 + 0.5(7 − 4) = 5.5 (interpolation)
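These examples can be checked directly in R (note that for an even sample size, R's median() averages the two middle values, which matches the interpolation above):

    median(c(1, 3, 4, 7, 9))       # 4
    median(c(1, 3, 4, 7, 100))     # 4
    median(c(1, 3, 4, 7, 9, 10))   # 5.5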


Given a sample X = {X₁, ..., Xₙ}, the most commonly used statistics are:

For centrality: the sample mean X̄ and/or median X₍₍ₙ₊₁₎/₂₎

For shape: the empirical quartiles

$$q_n(0.25) = Q_1(X) = X_{[(n+1)/4]}$$

$$q_n(0.75) = Q_3(X) = X_{[3(n+1)/4]}$$

For variability: the sample standard deviation sₙ or variance

$$s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

or the inter-quartile range IQR(X) = Q₃(X) − Q₁(X)


Numerical summaries

A typical summary of a sample X = {X₁, ..., Xₙ} would include:
- min(X) (or alternative)
- Q₁(X)
- median(X)
- Q₃(X)
- max(X) (or alternative)

This summary matches that provided by a typical boxplot
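In R, summary() returns this five-number summary (plus the mean) and boxplot() draws the matching plot; a sketch using R's built-in faithful data (eruption durations in minutes, a close cousin of the dataset above):

    x <- faithful$eruptions
    summary(x)    # Min, 1st Qu., Median, Mean, 3rd Qu., Max
    boxplot(x)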


Robust numerical summaries


Often with real datasets one has to deal with outlying values

Typically, outliers are defined as values that stand outside

(Q1 (X ) 1.5 IQR, Q3 (X ) + 1.5 IQR)

They may affect summaries significantly and prompt the use


of robust statistics:
use the median rather the the sample mean

use Q1 (X ) 1.5 IQR or qn (0.02) rather than min(X )

use Q3 (X ) + 1.5 IQR or qn (0.98) rather than max(X )

A boxplot should represent these outliers


Ex: sepal width on the Iris data (50 flowers from each of 3 species of iris)

[Boxplots of "Iris data (2nd component)", sepal width ranging from about 2.0 to 4.0]

Source: Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole

Histogram of the Old Faithful data


The histogram reveals the asymmetry of the dataset and the fact that the elements accumulate somewhere near 120 and 270, which was not clear from the list of values


Drawing a histogram

Whenever feasible let the software do it! (see the R sketch below)

Know the theory behind the histogram, so as to interpret it and modify its calibration

Essential features:
1 Total area under graph taken to represent 1
2 Rectangles put on bin-widths
3 let m = 1 + 3.3 log₁₀(n) be the number of bins, or
4 let b = 3.49 s n^{−1/3} be the bin-width, where s is the sample standard deviation

Also refer to the Normal Reference Curve method
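A sketch in R, comparing the default binning (Sturges' rule, essentially the 1 + 3.3 log₁₀(n) formula above) with an explicit number of bins; note that hist() treats breaks as a suggestion only:

    x <- faithful$waiting   # any numeric vector will do here
    par(mfrow = c(1, 2))
    hist(x)                 # default: Sturges' rule
    m <- ceiling(1 + 3.3 * log10(length(x)))
    hist(x, breaks = m)     # approximately m bins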


Results of different bin-widths


Interfailure times data (in CPU seconds)

30 113 81 115 9 2 91 112 15 138 50 77 24 108 88 670 120 26 114


325 55 242 68 422 180 10 1146 600 15 36 4 0 8 227 65 176 58
457 300 97 263 452 255 197 193 6 79 816 1351 148 21 233 134
357 193 236 31 369 748 0 232 330 365 1222 543 10 16 529 379 44
129 810 290 300 529 281 160 828 1011 445 296 1755 1064 1783
860 983 707 33 868 724 2323 2930 1461 843 12 261 1800 865
1435 30 143 108 0 3110 1247 943 700 875 245 729 1897 447 386
446 122 990 948 1082 22 75 482 5509 100 10 1071 371 790 6150
3321 1045 648 5485 1160 1864 4116
Source: J.D. Musa, A. Iannino, and K. Okumoto. Software
reliability: measurement, prediction, application. McGraw-Hill,
New York, 1987; Table on page 305


The empirical distribution function


Cumulative representation of the data

The empirical cumulative distribution function Fₙ of the data is

$$F_n(x) = \frac{1}{n}\,\#\{\text{elements in the dataset} \le x\}$$
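In R, ecdf() builds this step function directly (a sketch, with x a numeric vector of observations):

    Fn <- ecdf(x)   # empirical cdf
    plot(Fn)
    Fn(240)         # proportion of observations <= 240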
n


Boxplot
Another way of summarising the underlying data distribution

Given a sample, a boxplot indicates quartiles and outliers:


[Boxplot of the "Faithful data" (vertical axis from about 50 to 90)]


Scatterplot

To investigate the relationship between two or more variables

Given x and y , the dataset consists of pairs of observations:


(x1 , y1 ), (x2 , y2 ), . . ., (xn , yn ) (bivariate dataset)

Does y depend on x? If so, can we describe their relationship?

A first step is to plot the points (xi , yi ) for i = 1, 2, . . . , n

This plot is called a scatterplot


Scatterplot
Example: daily readings of air quality values in NYC,
1st May - 30th Sept 1973 (R dataset airquality)

Ozone: Mean ozone in parts per billion from 1300 to 1500


hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band 4000-7700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and


1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at


La Guardia Airport.


[Scatterplots "Air quality, NYC, May-Sep 1973": Temperature (degrees F) vs Ozone (parts per billion), and Wind (miles per hour) vs Ozone (parts per billion)]
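These two panels can be reproduced from R's built-in airquality data, e.g.:

    par(mfrow = c(1, 2))
    plot(airquality$Ozone, airquality$Temp,
         xlab = "Ozone (parts per billion)",
         ylab = "Temperature (degrees F)",
         main = "Air quality, NYC, May-Sep 1973")
    plot(airquality$Ozone, airquality$Wind,
         xlab = "Ozone (parts per billion)",
         ylab = "Wind (miles per hour)",
         main = "Air quality, NYC, May-Sep 1973")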


Section VII

Estimation


Outline

Statistical Inference

Estimation

Confidence intervals

Linear regression


Statistical inference
Detection:
Discrete probabilities (most of the time)

Hypothesis testing: minimise probability of incorrect decision

Estimation:
Discrete or continuous probabilities

Classical statistics: estimate a real number, not a r.v.


(e.g. the mass of an electron)

Bayesian inference: estimate a r.v. (with its distribution)

Always minimise (some form of) estimation error

The central feature is always the information (data).

Statistical inference techniques can easily be applied very badly.



Estimation, estimators and estimates

Why estimation?

An estimate t is a realization of a random variable T ...

We cannot say anything with certainty about which of the estimators is closer to the parameter of interest

When is one estimate better than another?

Does there exist a best possible estimate?

How likely is it that an estimate lies within a given distance from the parameter?


Estimators

Let t = h(x₁, x₂, ..., xₙ) be an estimate based on the dataset x₁, x₂, ..., xₙ

Then t is a realization of the random variable T = h(X₁, X₂, ..., Xₙ)

The random variable T is called an estimator

The word "estimator" refers to the method or device for estimation

This is distinguished from "estimate", which refers to the actual value computed from a dataset

Note that estimators are special cases of sample statistics


An estimator is a statistic - an appropriate function of the sample observations that gives us an estimated value for the unknown parameter

Ex: the sample mean X̄ and sample variance s² are estimators for the mean μ and variance σ² of the Normal distribution N(μ, σ²)

There can be more than one feasible estimator

How to choose one of them or select a best one?

We need criteria to determine what is desirable

The two most common criteria used are (a) Unbiasedness (b) Minimum Variance


Unbiasedness and minimum variance

Unbiasedness: If T is an estimator for θ, then T is called unbiased if

$$E[T] = \theta$$

Ex: for a Normal sample from N(μ, σ²), E[X̄] = μ

Minimum Variance: If an estimator T of θ achieves minimum variance, then under regular conditions T achieves the best possible estimation accuracy

The sample mean often turns out to be a Minimum Variance unbiased estimator for its expected value

For N(μ, σ²), X̄ is a Minimum Variance unbiased estimator for μ

Harder to check, but feasible



Constructing good estimators: Maximum Likelihood

Let

$$L(\theta) = f(x_1, x_2, \ldots, x_n; \theta)$$

be the joint pdf of X₁, X₂, ..., Xₙ

For a given set of observations (x₁, x₂, ..., xₙ), a value θ̂ at which L(θ) is maximum is called a maximum likelihood estimate (MLE) of θ

That is, θ̂ is a value of θ that satisfies

$$f(x_1, x_2, \ldots, x_n; \hat{\theta}) = \max_{\theta} f(x_1, x_2, \ldots, x_n; \theta)$$


Maximum Likelihood Estimator

MLE:

$$f(x_1, x_2, \ldots, x_n; \hat{\theta}) = \max_{\theta} f(x_1, x_2, \ldots, x_n; \theta)$$

What makes this estimator attractive? Under very general conditions on the density or pmf:
- θ̂_ML converges to the true θ in probability as the sample size increases
- as the sample size increases, √n(θ̂ − θ) converges to a Normal distribution with mean 0


MLE: interfailure data example

Ex: the sample is a realization of random variables X₁, X₂, ..., Xₙ, with n = 135, and Xᵢ ~ Exp(θ)

Let the sample be denoted by x₁, ..., xₙ

The pdf is

$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x, \theta > 0$$

Joint pdf is

$$L(\theta) = \prod_i f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta} = \theta^{-n}\, e^{-\sum_{i=1}^n x_i / \theta}$$

Note that L(θ) is a strictly positive function of θ

Therefore we can write

$$\ln L(\theta) = -n \ln\theta - \frac{\sum_{i=1}^n x_i}{\theta}$$

Note that ln L(θ) is a differentiable function of θ, and

$$\frac{d \ln L(\theta)}{d\theta} = -\frac{n}{\theta} + \frac{\sum_{i=1}^n x_i}{\theta^2} = 0 \implies \hat{\theta} = \bar{x}$$

Further,

$$\frac{d^2 \ln L(\theta)}{d\theta^2}\bigg|_{\theta=\hat{\theta}} = \frac{n}{\hat{\theta}^2} - \frac{2\sum_{i=1}^n x_i}{\hat{\theta}^3} = -\frac{n}{\bar{x}^2} < 0$$

which implies that θ̂ is a local maximum of the likelihood function


X̄ is an unbiased estimator for θ, because E[X̄] = E[X₁] = θ

The estimate is θ̂ = x̄ = 656.8815

It can be shown that X̄ is also the Minimum Variance one among all possible Unbiased estimators [but beyond our scope now]
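A numerical sketch (assuming the 135 interfailure times are stored in a vector x): minimising the negative log-likelihood derived above recovers the sample mean.

    # x: vector of the n = 135 interfailure times, assumed available
    negloglik <- function(theta) length(x) * log(theta) + sum(x) / theta
    optimize(negloglik, interval = c(1, 5000))$minimum   # approx. 656.88
    mean(x)                                              # 656.8815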


How reliable is our estimator?

So we have a supposedly good estimator, X̄

This is still a rv and therefore will vary in an unpredictable manner from one sample to another if we repeat the experiment

We can find a range of values within which we can claim that the true value lies with a high probability

This is called a Confidence Interval

To find a confidence interval, we need to know the probability distribution of the estimator statistic

In our example, we need to find the pdf for X̄


Confidence Intervals

A confidence interval is an interval in which we are very confident the population parameter of interest lies

The level of confidence is stated and is frequently 95%

When we estimate a statistic about a population (e.g. a mean), we calculate a single estimate, known as a point estimate

This makes no use or mention of the sampling error

Knowledge of the standard error of the estimate will allow us to give a measure of the sampling error

The standard error (and the sampling distribution) is used to calculate a confidence interval

Confidence Intervals for the sample mean

The confidence interval for the mean based on X̄ is obtained from the Normal distribution N(μ, σ²)

Given a sample of size n and a critical value Z, this CI is

$$\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$$

In practice the standard deviation σ is not known, and one usually uses its sample estimate instead

If a confidence level of say 95% is set, then the significance level is 100% − 95% = 5%

The critical value Z sets the level of confidence

Z is the percentile of the Standard Normal distribution yielding a rhs area of half the significance level

Ex: for a 95% CI, one needs to remove the most extreme 2.5% from each tail of the distribution

i.e. one truncates the Standard Normal distribution beyond (−Z, Z) = (−1.96, 1.96)

P(|Z| > 1.96) = 0.05

[Standard Normal density with both tails beyond ±1.96 shaded, for the statistic Z = (X̄ − μ)/(σ/√n)]


Ex: a random sample of 50 transactions was selected from a travel agency

The mean value was 732.16 and the standard deviation was 83.14

The 95% CI for the mean transaction value is given by

$$\bar{X} \pm Z \frac{s}{\sqrt{n}} = 732.16 \pm 1.96 \frac{83.14}{\sqrt{50}} = 732.16 \pm 23.05 = (709.11,\ 755.21)$$

We can be 95% confident that the true (population) mean transaction value lies between 709.11 and 755.21
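The same interval in R:

    xbar <- 732.16; s <- 83.14; n <- 50
    z <- qnorm(0.975)                   # 1.959964
    xbar + c(-1, 1) * z * s / sqrt(n)   # 709.11 755.21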


Confidence Intervals

In the previous example, a 99% CI would be wider: because of the higher confidence level, more values must be included

Generally speaking, 2 parameters control the width of a CI: the sample size and the level of confidence

The CI will be narrower when either increasing n or decreasing Z

It is usually preferred to increase the sample size

If not possible, one must then decrease the confidence level


Confidence Intervals for a proportion

Consider an estimated proportion p̂ of a certain sub-population

Under usual conditions the maximum likelihood estimator for a proportion given a sample of n observations is p̂ = n₊/n, where n₊ is the number of observations in the sub-population

The associated confidence interval is given by

$$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$


Ex: from a random sample of 250 people in a certain electoral ward, 65 were in favour of a proposed amendment to the constitution

Find a 99% CI for the proportion of all people in favour...

We have p̂ = 65/250 = 0.26, Z = 2.575 and

$$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.26 \pm 2.575\sqrt{\frac{0.26 \times 0.74}{250}} = 0.26 \pm 0.0714$$

This means that we can be 99% confident that the true (population) proportion of people in favour of the proposed amendment to the constitution is between 18.66% and 33.14%
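The same interval in R (qnorm(0.995) gives the exact critical value 2.5758, so the bounds differ from the above only in rounding):

    p.hat <- 65 / 250
    z <- qnorm(0.995)
    p.hat + c(-1, 1) * z * sqrt(p.hat * (1 - p.hat) / 250)   # approx. (0.189, 0.331)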


Sample size determination

When we estimate a population feature using a statistic, we do not know in advance how wide the CI will be

If we use too small a sample size, the confidence interval may be too wide to be meaningful

If we use too large a sample size, the confidence interval may be unnecessarily narrow, meaning valuable resources were wasted in the process

If we have some previous knowledge of the population variability, we can calculate the sample size required to estimate the population feature to within a stated range (precision) with a stated level of confidence


Sample size for a sample mean

Given a population variance σ² and a critical value Z, the sample size required to estimate a mean to within an allowable error ε is

$$n = \frac{Z^2 \sigma^2}{\epsilon^2}$$

Usually, the population standard deviation will not be known, but an estimate, say s, may be available e.g. from a pilot study or from similar previous studies

We can substitute s into the formula

The accuracy of our calculated sample size depends on the accuracy of the previous estimate of the standard deviation


Ex: from a pilot study, the weekly bank charges to private customers were found to have a standard deviation of 10

How large a sample would be needed to estimate the population mean bank charge to within 1.50 with 95% confidence?

We have Z = 1.96, s = 10, ε = 1.50 so

$$n = \left(\frac{1.96 \times 10}{1.50}\right)^2 = 170.74$$

i.e. we must use n = 171

Always round up!!!
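In R:

    z <- qnorm(0.975); s <- 10; eps <- 1.5
    ceiling((z * s / eps)^2)   # 171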


Sample size for a proportion

Consider an estimated proportion p̂ of a certain sub-population

The sample size required to estimate this proportion to within an allowable error ε and with 100(1 − α)% confidence is determined by

$$n = \frac{Z^2\,\hat{p}(1-\hat{p})}{\epsilon^2}$$

The allowable error ε is expressed as a decimal, i.e. ε ∈ (0, 1)


Ex: after performing a study, we judge the confidence interval about an estimated proportion p̂ = 0.20 to be too wide

We wish to repeat the study so that we estimate the proportion to within 4 percentage points with 95% confidence...

p̂ = 0.20, Z = 1.96, ε = 0.04

The sample size required to do so is

$$n = \frac{1.96^2 \times 0.20 \times 0.80}{0.04^2} = 384.16$$

i.e. we must take n = 385

Always round up!!!


Regression

Let Y be a random variable and x a deterministic variable (that is, non-random)

Given a random sample (x₁, Y₁), ..., (xₙ, Yₙ) we want to find a mathematical relationship that expresses Y in terms of x

The variable x is called the independent variable and Y is called the dependent or response variable

In the case of simple linear regression, the model that we propose is of the form

$$Y = \beta_0 + \beta_1 x + \epsilon$$

where ε is an error term


We assume that each observation Yᵢ of Y satisfies

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

where εᵢ ~ N(0, σ²) for i = 1, ..., n, and that the random variables εᵢ are independent

Note that we take for granted that the variance of εᵢ is the same for all values of i

Note also that
- the Yᵢ's are observed
- the xᵢ's are known
- the εᵢ's are unobservable


Regression and Least Squares

For this model, the best estimators of the parameters β₀ and β₁, that is, the minimum variance unbiased estimators of β₀ and β₁, are obtained using the method of least squares

We define the sum

$$SS = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2$$

The estimators β̂₀ and β̂₁ of β₀ and β₁ by the method of least squares are the values of β₀ and β₁ that minimize the sum SS


We set the normal equations

$$\frac{\partial SS}{\partial \beta_0} = -2\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\frac{\partial SS}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (Y_i - \beta_0 - \beta_1 x_i) = 0$$

The solutions of these equations are

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$$


Example: tensile strength

We want to determine how the tensile strength of a certain alloy depends on the percentage of zinc it contains. We have the following data:

% of zinc:        4.7  4.8  4.9  5.0  5.1
Tensile strength: 1.2  1.4  1.5  1.5  1.7

Consider the simple linear regression model: Y = β₀ + β₁x + ε, where x is the percentage of zinc and Y is the tensile strength


We have x̄ = 4.9, ȳ = 1.46, Σᵢ₌₁⁵ xᵢ² = 120.15 and Σᵢ₌₁⁵ xᵢyᵢ = 35.88

Then,

$$\hat{\beta}_1 = \frac{\sum_{i=1}^5 x_i y_i - 5\bar{x}\bar{y}}{\sum_{i=1}^5 x_i^2 - 5\bar{x}^2} = \frac{35.88 - 5(4.9)(1.46)}{120.15 - 5(4.9)^2} = 1.1$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 1.46 - (1.1)(4.9) = -3.93$$

Thus the prediction equation is given by ŷ = −3.93 + 1.1x
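The same fit can be checked with R's lm() (a sketch, entering the five data points by hand):

    zinc <- c(4.7, 4.8, 4.9, 5.0, 5.1)
    strength <- c(1.2, 1.4, 1.5, 1.5, 1.7)
    coef(lm(strength ~ zinc))   # intercept -3.93, slope 1.1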


Example: air quality (R dataset)

[Scatterplots "Air quality, NYC, May-Sep 1973" with fitted lines: Temperature (degrees F) vs Ozone (parts per billion), and Wind (miles per hour) vs Ozone (parts per billion)]

Temp ≈ 69.4 + 0.20 Ozone (ρ = 0.698)

Wind ≈ 12.6 − 0.07 Ozone (ρ = −0.602)



Section VIII

Hypothesis Testing


Outline

Concepts in hypothesis testing

One-sample, one-sided tests of the population mean

One-sample, one-sided tests of the population proportion

One-sample, two-sided tests

Two-sample tests

Goodness-of-fit tests

Testing for significance in linear regression

Summary


Hypothesis Testing
We know that X̄ is an unbiased estimator

We further have a Confidence Interval at the 95% confidence level for the unknown parameter

Can we check whether the unknown parameter actually takes values that are not dependent on the sample?

[Note that the range of values given in the Confidence Interval is dependent on the sample]

i.e. instead of deriving ranges of values (even if in probability) from the sample, can we start out by making assumptions about the range of possible values and then test the assumption based on the sample?

Hypothesis Testing: Interfailure example

Ex: we have x̄ = 656.8815 for the unknown parameter θ

Can we make a hypothesis that the true value of the unknown θ is equal to 700?

Well we definitely can!

Having made that hypothesis, we need to test it based on the sample of observations

NB: we cannot assume that, say, θ < 500, since θ is not the rate of an Exponential distribution


Forming Hypotheses

Typically, hypotheses are expressed as restrictions of possible values for the true unknown parameter

i.e. they represent a partition of the parameter space

Ex: parameter space is Θ = ℝ₊; i.e. θ ∈ ℝ₊

Null hypothesis H₀: θ = θ₀ where θ₀ = 700 is to be tested

We can see that H₀ represents a proper subset of Θ

H₀ is assumed true until data indicate otherwise

Typically, H₀ assumes "no effect"


We also need to formally define an alternative hypothesis H_A

H_A typically states that a significant effect was observed

The set represented by H_A is not allowed any overlap with the set represented by H₀

H_A contains values that would lead us to reject the null H₀

Ex: given H₀: θ = θ₀, a reasonable H_A is H_A: θ = θ₁ = 600


Some standard forms of Rejection Regions

Forms of hypothesis                               Reject if
H₀: θ = θ₀ vs H_a: θ = θ₁, θ₀ < θ₁                X̄ > k
H₀: θ = θ₀ vs H_a: θ = θ₁, θ₀ > θ₁                X̄ < k
H₀: θ = θ₀ vs H_a: θ ≠ θ₀                         X̄ < k₁ or X̄ > k₂
H₀: θ ≤ θ₀ vs H_a: θ > θ₀                         X̄ > k
H₀: θ = θ₀ vs H_a: θ > θ₀                         X̄ > k
H₀: θ = θ₀ vs H_a: θ < θ₀                         X̄ < k

Note that
P[N(0, 1) < −1.645] = P[N(0, 1) > 1.645] = 0.05
P[N(0, 1) < −1.96] = P[N(0, 1) > 1.96] = 0.025

Errors in detection

Recall: one seeks to retain or reject a null hypothesis H₀ on the basis of evidence. Let us denote by H₁ the alternative hypothesis.

                   H₀ is true         H₁ is true
H₀ is accepted     Correct decision   Type II error
H₁ is accepted     Type I error       Correct decision

The Null hypothesis can never be proven

A Type I error occurs when H₀ is true but rejected

P(Type I error) = significance level of the test

P(Type II error) = false negative rate


Test statistic

We test the hypotheses based on the sample

To do this, we can only use functions of sample values that do not involve the unknown θ

Hence the use of a statistic... but how to pick one?

Idea: if
- the statistic in question, say T, is an unbiased estimator for θ
- and the underlying model distribution has finite variance so that the weak Law of Large Numbers applies
then for large samples T will be quite close in probability to the true value θ

T will then reflect the behaviour of the unknown θ

If T increases we'd expect θ to increase, and vice-versa


p-value

The test procedure becomes: Reject H₀ if T > t_c for some unknown but computable t_c

If T < t_c, we will say that based on this sample we fail to reject H₀. Now how to decide on t_c?

The p-value is the probability of obtaining a value of the test statistic
- at least as extreme as the one computed from the sample data
- under the assumption that the null hypothesis is true

The smaller the p-value, the less likely H₀ is to be true and therefore the more evidence there is against it

Typically, reject H₀ if p < 0.05 (i.e. 5%)

If the decision is to reject H₀, the results are termed statistically significant



Finding the p-value

In the example, to set up the test:
- use an unbiased estimator, e.g. T = X̄
- fix the Type I error (i.e. significance level) e.g. at 5%

We need to find t_c such that

Type I Error = P(Reject H₀ | H₀ is True) = P(T > t_c | θ = 700) = 0.05

i.e.

P(X̄ > t_c | θ = 700) = 0.05

We need to know the distribution of the test statistic...



One-sided tests of the population mean

Let us consider the case where the hypothesized value μ₀ is an upper bound on the true population mean:

H₀: μ ≤ μ₀

Where the population variance is known, one defines the z-test statistic for a sample of size n as

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

(for a Normal population or n > 30)

Then H₀ is to be rejected if z ≥ z_α, where z_α is the P₁₀₀₍₁₋α₎ percentile of the Normal distribution (e.g. P₉₅ = 1.645)

NB: this example implements an upper-tail test. In a lower-tail test, H₀: μ ≥ μ₀ is rejected when z ≤ −z_α.

One-sided tests of the population mean

[Standard Normal density with the upper tail shaded: P(Z > 1.645 | H₀) = 0.05, for the statistic z = (x̄ − μ₀)/(σ/√n)]

z-test in R

Given values n, xbar and mu0, the one-sided z-test may be carried out by comparing the test statistic

    z = (xbar-mu0)/(sigma/sqrt(n))

with the critical value set e.g. for alpha=.05:

    z.alpha = qnorm(1-alpha)

Alternatively, to obtain a p-value, one may instead instruct:

    pval = pnorm(z, lower.tail=FALSE)
    pval > alpha    # TRUE: fail to reject H0


Case when σ is unknown: the t-test

Where the population variance is unknown, one uses the t-test instead of the z-test. The t-test statistic for a sample of size n is defined using the sample standard deviation s as

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad df = n - 1$$

(for a Normal population or n > 30)

Then H₀: μ ≤ μ₀ is to be rejected if t ≥ t_α, where t_α is the P₁₀₀₍₁₋α₎ percentile of the Student t-distribution with n − 1 dfs

NB: this example implements an upper-tail test. In a lower-tail test, H₀: μ ≥ μ₀ is rejected when t ≤ −t_α.


If X₁, X₂, ..., Xₙ is a sample from N(μ, σ²), then:

$$\bar{X} \sim N(\mu, \sigma^2/n)$$

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 / \sigma^2 \sim \chi^2(n-1)$$

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$$

These properties are useful for deriving p-values

The test statistic is often standardized in some way so as to use known probabilities to derive the p-value


A continuous rv has a t-distribution with parameter m, also called the degrees of freedom, where m ≥ 1 is an integer, if its probability density is given by

$$f(x) = k_m \left(1 + \frac{x^2}{m}\right)^{-\frac{m+1}{2}}$$

for x ∈ ℝ, where

$$k_m = \frac{\Gamma\left(\frac{m+1}{2}\right)}{\Gamma(m/2)\sqrt{m\pi}}$$

and

$$\Gamma(u) = \int_0^{\infty} e^{-x} x^{u-1}\,dx$$


In this case, we find z_L, z_U such that

P(t(n−1) < z_L) = P(t(n−1) > z_U) = 0.025

Everything else is the same as in the known-σ case:

$$P\left(\bar{X} - z_U \frac{s}{\sqrt{n}} < \mu < \bar{X} - z_L \frac{s}{\sqrt{n}}\right) = 0.95$$

X̄ = 23.78778, s = 0.07827513, n = 23

Using R:
    qt(0.025,22) = -2.073873
    qt(0.975,22) = 2.073873

i.e. z_L = −2.073873 and z_U = 2.073873

So the 95% CI becomes

$$\left(23.788 - 2.074 \frac{0.078}{\sqrt{23}},\; 23.788 + 2.074 \frac{0.078}{\sqrt{23}}\right) = (23.754,\ 23.822)$$

t-test in R

The (very popular) t-test is readily available, with synopsis:

    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95, ...)

Be careful during implementation:

    t.test(1:10, y = c(7:20))                  # p-value = .00001855
    t.test(1:10, y = c(7:20, 200))             # p-value = .1245
    t.test(1:10, y = c(7:20), alt = "less")    # comment?
    t.test(1:10, y = c(7:20), alt = "greater") # comment?


One-sided tests of the population proportion

The null hypothesis of an upper-tail test of a population proportion is formulated as

H₀: p ≤ p₀

where p₀ is a hypothesized upper bound on the true population proportion p

Under an adequately randomized sample, and when np₀ and n(1 − p₀) are > 10, the one-proportion z-test is defined as

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

The null hypothesis is to be rejected when z ≥ z_α, where z_α is the 100(1 − α) percentile of the standard Normal distribution

In a lower-tail test, H₀: p ≥ p₀ is rejected when z ≤ −z_α


Testing proportions in R

As for the z-test for the mean, implementation is direct

Given values n, pbar and p0, the one-sided z-test may be carried out by comparing the test statistic

    z = (pbar-p0)/sqrt(p0*(1-p0)/n)

with the critical value set e.g. for alpha=.05:

    z.alpha = qnorm(1-alpha)

Alternatively, to obtain a p-value, one may instead instruct:

    pval = pnorm(z, lower.tail=FALSE)
    pval > alpha    # TRUE: fail to reject H0


Two-sided tests of the population mean

For a two-sided z-test, one must check whether

z ≤ −z_{α/2} or z ≥ z_{α/2}

In R, the two-tailed p-value of the statistic may be obtained by

    pval = 2 * pnorm(abs(z), lower.tail=FALSE)

The two-sided t-test is derived using the argument alternative:

    t.test(x, alternative="two.sided")

There is actually no need to specify it, as it is the default value.


Two-sided tests of the population proportion

A two-tailed test on proportions may be implemented in R as


follows (e.g. at the 5% significance level):
z = (pbar-p0) / sqrt(p0*(1-p0)/n)
alpha = .05
z.half.alpha = qnorm(1-alpha/2)
pval = 2 * pnorm(abs(z), lower.tail=FALSE)


Coal example: the Z-test

How can we combine these values into a confidence statement about the true gross calorific content of Osterfeld 262DE27?

First assume that the unknown variance component, or equivalently, the standard deviation σ, is known

Under this assumption X̄ ~ N(μ, σ²/n)

Here σ and n are known, μ is unknown

Moreover, X̄ − μ ~ N(0, σ²/n), that is a distribution which is free of the unknown parameter

Using standardization,

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Example: gross calorific content of coal

When a shipment of coal is traded, a number of its properties should be known accurately, because the value of the shipment is determined by them

Gross calorific value characterizes the heat content (in megajoules per kilogram, MJ/kg)

The ISO 1928 method is carried out to determine its value

Resulting measurement errors are known to be approximately normal, with a standard deviation of about 0.1 MJ/kg

Laboratories that operate according to standard procedures receive ISO certificates

The next table shows a number of such ISO 1928 measurements for a shipment of Osterfeld coal coded 262DE27

Example: gross calorific content of coal

Gross calorific value measurements for Osterfeld 262DE27:

23.870 23.730 23.712 23.760 23.640 23.850 23.840 23.860


23.940 23.830 23.877 23.700 23.796 23.727 23.778 23.740
23.890 23.780 23.678 23.771 23.860 23.690 23.800

[Source: A.M.H. van der Veen and A.J.M. Broos. Interlaboratory study programme "ILS coal characterization" - reported data. Technical report, NMi Van Swinden Laboratorium B.V., The Netherlands, 1996]


Coal example: the Z-test

We need to find two points, z_L and z_U such that

P(Z < z_L) = P(Z > z_U) = 0.025

This will give us the probability equation

$$P\left(z_L < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_U\right) = P\left(z_L \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_U \frac{\sigma}{\sqrt{n}}\right) = P\left(\bar{X} - z_U \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} - z_L \frac{\sigma}{\sqrt{n}}\right) = 0.95$$

From tables or software, z_U = 1.96 and z_L = −1.96


Coal example - confidence interval

From the data, we compute x̄ₙ = 23.788

Using the given σ = 0.1 and α = 0.05, we find the 95% CI:

$$\left(23.788 - 1.96\frac{0.1}{\sqrt{23}},\; 23.788 + 1.96\frac{0.1}{\sqrt{23}}\right)$$

i.e.

(23.747, 23.829) MJ/kg
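In R:

    xbar <- 23.788; sigma <- 0.1; n <- 23
    xbar + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)   # 23.747 23.829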


Two-sample z-test

In a two-sample z-test, one compares the means of two samples w.r.t. a hypothesized difference in means d₀, using a test statistic of the form

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$


Paired t-test

In a paired t-test, one compares the mean d̄ of the differences between two samples with a hypothesized difference in means d₀, using a test statistic of the form

$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}, \quad df = n - 1$$

Recall the synopsis for the t-test:

    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95, ...)

In the paired case, both x and y must be specified, and be the same length. Example:

    t.test(x, y, alt = "less", paired = TRUE)

Other tests in R
F-test to compare the variances of two samples from normal
populations: var.test()
x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)
var.test(x, y)

Testing for null correlation (Pearson's coefficient):

    z <- rnorm(30, mean = 0, sd = 2)
    cor.test(y, z)

Nonparametric tests (cf. next section):
wilcox.test() (means), ks.test() (distributional fit, e.g. Normality), ...

And many more...


Pearson's Chi-Square Goodness-of-Fit Test

Testing for the nature of a distribution is also very useful and often needed

Let X be a random variable whose probability density (or mass) function f_X(x) is unknown

We want to test the null hypothesis H₀: f_X(x) = f₀(x) against the alternative hypothesis H_A: f_X(x) ≠ f₀(x), where f₀ is a given distribution function

Pearson's χ² goodness-of-fit test checks whether the observed frequencies are consistent with those expected under f₀(x)


1 Divide the set S_X of possible values of X into k disjoint and exhaustive classes (or intervals)

2 Take a random sample of size n from the population X

3 Calculate

$$D^2 = \sum_{j=1}^{k} \frac{(n_j - m_j)^2}{m_j}$$

where n_j is the number of observations in the j-th class and m_j the expected frequency under H₀

4 If H₀ is true and if n is large enough, then D² ~ χ²_{k−r−1} (approximately), where r is the number of unknown parameters of the function f₀(x) that we must estimate

5 Reject H₀ at the significance level α iff D² > χ²_{α,k−r−1}


Example of a Goodness of Fit

Suppose we have a random sample from a discrete random variable summarized in the following table

value in sample:  0   1   2   3   >3
frequency:       31  33  22  12   2

Test the null hypothesis that this random sample comes from a Poisson distribution with λ = 1 at significance level α = 0.05

Compute the probability P[X = 0] = e⁻¹ 1⁰/0! = 0.3678 and multiply by the sample size n to get the expected number of 0s if the sample really came from Poi(1)

Similarly, estimate probabilities for X = 1, 2, 3, and P[X > 3], and multiply by n = 100

value in sample:    0      1      2     3     >3
frequency:         31     33     22    12     2
estimated freq:  36.78  36.78  18.39  6.13  1.899

$$D^2 = \sum_{j=1}^{k} \frac{(n_j - m_j)^2}{m_j} = 7.63 < 7.81 \;\left(= \chi^2_{0.05,\,5-1-1}\right)$$

so we fail to reject H₀ at the 5% level
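The same computation in R (a sketch; the last class pools all values above 3):

    obs <- c(31, 33, 22, 12, 2)
    p0 <- c(dpois(0:3, lambda = 1), ppois(3, lambda = 1, lower.tail = FALSE))
    m <- 100 * p0                # expected frequencies under H0
    D2 <- sum((obs - m)^2 / m)   # approx. 7.63
    D2 > qchisq(0.95, df = 3)    # FALSE: fail to reject H0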


Testing for significance of regression coefficients

Tests are based on two quantities:

$$SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 = n s_x^2$$

where s_x² is the sample variance of the xᵢ's, and the sum of squared errors (or residuals)

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

Additionally we define the quantity called Mean Squared Error

$$MSE = \frac{SSE}{n-2}$$
n2

Regression Tests

To test the null hypothesis H₀: β₀ = β₀₀ we use the statistic

$$T_0 := \frac{\hat{\beta}_0 - \beta_{00}}{\sqrt{\dfrac{MSE \sum_i x_i^2}{n\,SS_x}}} \sim t_{n-2}$$

We then reject H₀ at significance level α if and only if

|T₀| > t_{α/2,n−2} if H_A: β₀ ≠ β₀₀
T₀ > t_{α,n−2} if H_A: β₀ > β₀₀
T₀ < −t_{α,n−2} if H_A: β₀ < β₀₀


Regression Tests

To test the null hypothesis H₀: β₁ = β₁₀ we use the statistic

$$T_0 := \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\dfrac{MSE}{SS_x}}} \sim t_{n-2}$$

We then reject H₀ at significance level α if and only if

|T₀| > t_{α/2,n−2} if H_A: β₁ ≠ β₁₀
T₀ > t_{α,n−2} if H_A: β₁ > β₁₀
T₀ < −t_{α,n−2} if H_A: β₁ < β₁₀


There is an easy way to calculate SSE

First find the Total Sum of Squares

$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$$

Then find the Regression Sum of Squares

$$SSR = \hat{\beta}_1^2\,SS_x = SST - SSE$$

so that SSE = SST − SSR


Tensile strength example: testing for significance

Test H₀: β₁ = 0 against H_A: β₁ ≠ 0

SST = Σᵢ₌₁⁵ yᵢ² − 5ȳ² = 10.79 − 5(1.46)² = 0.132

SSR = β̂₁² SS_x = (0.1)(1.1)² = 0.121

Thus, SSE = SST − SSR = 0.011

Note that the test statistic reduces to √(SSR/MSE), or

$$T_0 = \sqrt{\frac{0.121}{0.011/(5-2)}} = \sqrt{33} = 5.744563$$

From Tables, t₀.₀₂₅,₃ = 3.18, hence we reject the null hypothesis
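This hand calculation can be verified with summary(lm()) on the tensile data (a sketch):

    zinc <- c(4.7, 4.8, 4.9, 5.0, 5.1)
    strength <- c(1.2, 1.4, 1.5, 1.5, 1.7)
    summary(lm(strength ~ zinc))$coefficients
    # the 'zinc' row shows t value approx. 5.745 and p-value approx. 0.0105 < 0.05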

Summary

1 What we need to do a test:
 1 null and alternative hypotheses
 2 a test statistic T
 3 a significance level α
 4 a rejection region C* (where, if the test statistic value lies in it, we reject H₀)

2 Form of the test: Reject H₀ if T ∈ C*

3 Two possible errors:
 1 Type I error = P[Reject H₀ | H₀ True]
 2 Type II error = P[Fail to Reject H₀ | H₀ False]

It is the Type I error which is fixed, by setting it equal to the significance level α