
Applied Statistics and Decision Support

Sampling, Confidence Interval and Hypothesis Testing

Christopher Grigoriou

Executive MBA – HEC Lausanne


2007-2008
Sampling: Careful with convenience samples!

World War II: A statistical study to decide where to put more armor on bombers

It seems bombers came back with holes in them, sometimes lots of holes.
Somebody in the brass suggested that they put more armor in the places
that had the most holes, figuring those places must be getting hit more often.

Distribution of the sample mean

A machine is set up so that the average content of juice per bottle equals µ.
The actual amounts per bottle are distributed around this average
with a standard deviation σ = 5cl. Consider a sample of 50 bottles.

$x_i$: content of a randomly selected bottle
$\bar{x}$: average content per bottle for the sample

Distribution of $\bar{x}$:
$E(\bar{x}) = \mu$
$SD(\bar{x}) = \sigma/\sqrt{n}$
Shape: Normal

Assumptions:
Population standard deviation σ known
Large sample
Large population
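The formula for the standard deviation of the sample mean can be checked numerically. The sketch below uses σ = 5cl and n = 50 from the slide; the machine setting `mu = 50.0` and the Normal shape of individual bottle contents are assumptions made only for this illustration.

```python
import math
import random
import statistics

sigma = 5.0   # population standard deviation in cl (from the slide)
n = 50        # bottles per sample (from the slide)
mu = 50.0     # ASSUMPTION: hypothetical machine setting, not given in the slide

# Theoretical standard deviation of the sample mean: sigma / sqrt(n)
sd_theory = sigma / math.sqrt(n)

# Simulation: draw many samples of n bottles and record each sample mean.
# ASSUMPTION: individual bottle contents follow Normal(mu, sigma).
random.seed(0)
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(2000)
]
sd_simulated = statistics.stdev(sample_means)

print(f"Theoretical SD of the sample mean: {sd_theory:.3f} cl")
print(f"Simulated SD of the sample mean:   {sd_simulated:.3f} cl")
```

The two printed values should be close to each other, which is what the formula predicts.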
Example: Fraction of heads when tossing a fair coin repeatedly

[Figure: four bar charts showing the distribution of the fraction of heads for n = 1, n = 2, n = 3 and n = 4 tosses. Horizontal axis: fraction of heads (0 to 1); vertical axis: probability.]
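The distributions in this example can be computed exactly from the binomial formula. The following is a minimal sketch; the function name `head_fraction_distribution` is introduced here for illustration only.

```python
from fractions import Fraction
from math import comb

def head_fraction_distribution(n):
    """Return {fraction of heads: probability} for n tosses of a fair coin."""
    # k heads out of n tosses has probability C(n, k) / 2^n.
    return {Fraction(k, n): Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

for n in (1, 2, 3, 4):
    dist = head_fraction_distribution(n)
    row = ", ".join(f"{float(f):.2f}: {float(p):.3f}" for f, p in dist.items())
    print(f"n = {n}: {row}")
```

As n grows, the probability mass concentrates around 0.5, which is the behaviour the distribution of the sample mean describes.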
Confidence intervals: Motivation

The latest poll (1,100 respondents) reveals that 54% of the population supports
the government's budgetary decisions. The margin of error is ± 3%.

==> Point estimate: 54%
Margin of error: ± 3%
==> Confidence interval: [51%, 57%]

Interpretation: If many such polls were taken, the true percentage of the population supporting the government would lie within the calculated interval in most instances.

Confidence intervals: Motivation (2)

100 measurements of the time required to check a customer in
Average check-in time = 3.5 minutes

(a) How reliable is this estimate?

(b) How certain can we be that on a given day the actual average check-in time is between 3 and 4 minutes?

(c) Can we define a duration such that we are 99% certain that the actual average check-in time does not exceed this duration?

Confidence intervals: The basic idea

Population (µ, σ, p) ==> Sampling ==> Sample
Sample ==> Summarising data ==> Estimates: $\bar{x}$, s, $\hat{p}$, with SE = SD($\bar{x}$)
Estimates ==> Inference: Confidence interval ==> Population

First focus on the sample mean, then do a similar analysis for proportions.

Example

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48cl.
Calculate a 90% and a 95% confidence interval for the average content.

Assume that the population standard deviation σ = 5cl.
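One way to work through this example is sketched below, assuming the sample mean is Normally distributed with known σ as stated earlier. The helper name `mean_confidence_interval` is hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, level):
    """Two-sided z-interval for the mean when sigma is known."""
    # z leaves (1 - level) / 2 in each tail, e.g. about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Values from the slide: x_bar = 48cl, sigma = 5cl, n = 100 bottles.
intervals = {
    level: mean_confidence_interval(48.0, 5.0, 100, level)
    for level in (0.90, 0.95)
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} confidence interval: [{low:.2f}, {high:.2f}] cl")
```

The 95% interval is wider than the 90% interval: more confidence costs precision.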

Sample size

One day there was a fire in a wastebasket in the dean’s office
and in rushed a physicist, a chemist and a statistician.

The physicist immediately started to work on how much energy would have to be removed from the fire to stop combustion.

The chemist worked on which reagent would have to be added to the fire to prevent oxidation.

While they were doing this, the statistician was setting fire to all the other wastebaskets in the office.

“What are you doing?” they asked.

The statistician replied, “Well, to solve the problem, obviously, you need a large sample size…”

Sample size

What sample size is required to estimate the average contents
to within 0.5cl at the 95% confidence level?
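The required sample size follows from setting the margin of error z·σ/√n equal to the target precision and solving for n. The sketch below assumes σ = 5cl, carried over from the bottling example (the slide itself does not restate it); the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, level):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Solve z * sigma / sqrt(n) = margin for n, then round up.
    return ceil((z * sigma / margin) ** 2)

# ASSUMPTION: sigma = 5cl as in the earlier bottling example.
n_required = required_sample_size(sigma=5.0, margin=0.5, level=0.95)
print(f"Required sample size: {n_required} bottles")
```

Because n grows with the square of 1/margin, halving the margin of error requires roughly four times as many bottles.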

Hypothesis testing

Population (µ, σ, p) ==> Sampling ==> Sample
Sample ==> Summarising data ==> Estimates: $\bar{x}$, s, $\hat{p}$, with SE = SD($\bar{x}$)
Inference: Hypotheses H0, Ha about the population
Hypothesis test: Observed data ==> Reject H0?

Carrying out a hypothesis test: The classical approach

Step 1. Null hypothesis: H0: µ = m. Alternative hypothesis: Ha: µ ≠ m
==> If H0 is true, the observed sample mean $\bar{x} \sim N(m, \sigma/\sqrt{n})$, that is, Normal with mean m and standard deviation σ/√n

Step 2. Sample size ==> Calculate $SD(\bar{x}) = \sigma/\sqrt{n}$

Step 3. Significance level: How unlikely does the observed value have to be before we decide to reject H0?

Step 4. Acceptance region: Determine the range of values for which H0 will not be rejected

Step 5. Take the sample and see if the observed value justifies rejection of H0

Example

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 36 bottles yields an average content of 48.5cl.
Test the hypothesis that the average content per bottle is 50cl
at the 5% significance level.

Assume that the population standard deviation σ = 5cl.
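The five steps of the classical approach can be traced on this example as follows. This is a sketch under the slide's assumptions (σ known, sample mean approximately Normal); the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu = 50, Ha: mu != 50 (two-tailed test).
m = 50.0
sigma, n = 5.0, 36      # from the slide
x_bar = 48.5            # observed sample mean, from the slide
alpha = 0.05            # significance level, from the slide

# Step 2: standard deviation of the sample mean.
se = sigma / sqrt(n)

# Step 3: critical z-value leaving alpha / 2 in each tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 4: acceptance region for the sample mean under H0.
lower, upper = m - z_crit * se, m + z_crit * se

# Step 5: reject H0 only if the observed mean falls outside the region.
reject = not (lower <= x_bar <= upper)

print(f"Acceptance region: [{lower:.2f}, {upper:.2f}] cl")
print("Reject H0" if reject else "Do not reject H0")
```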

The impact of sample size

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48.8cl.
Test the hypothesis that the average content per bottle is 50cl
at the 5% significance level.

Compare the conclusion to that based on the 36 bottle sample.

Assume that the population standard deviation σ = 5cl.
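The comparison can be sketched by computing the standardised test statistic for both samples. The function name `two_tailed_z_test` is hypothetical; the numbers come from this slide and the previous 36-bottle example.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, m, sigma, n, alpha=0.05):
    """Return (z statistic, whether H0: mu = m is rejected)."""
    z = (x_bar - m) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > z_crit

# (sample size, observed mean) for the two samples described in the slides.
results = {
    n: two_tailed_z_test(x_bar, m=50.0, sigma=5.0, n=n)
    for n, x_bar in ((36, 48.5), (100, 48.8))
}
for n, (z, reject) in results.items():
    print(f"n = {n}: z = {z:.2f}, reject H0: {reject}")
```

Although the 100-bottle mean is closer to 50cl, its smaller standard error makes the deviation significant, while the 36-bottle deviation is not.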

One-tailed tests

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 36 bottles yields an average content of 48.5cl.
Can you reject the hypothesis that the average content per bottle is less than or
equal to 45cl in favour of the alternative that it exceeds 45cl (5% significance level)?

Assume that the population standard deviation σ = 5cl.
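For a one-tailed test the whole significance level sits in one tail. A minimal sketch for this example, with H0: µ ≤ 45 and Ha: µ > 45 as stated in the slide:

```python
from math import sqrt
from statistics import NormalDist

m, sigma, n = 45.0, 5.0, 36   # hypothesised bound and slide values
x_bar, alpha = 48.5, 0.05

# Standardised distance of the observed mean above the hypothesised bound.
z = (x_bar - m) / (sigma / sqrt(n))

# Upper-tail critical value: all of alpha is placed in the right tail.
z_crit = NormalDist().inv_cdf(1 - alpha)

reject = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```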

Exercise: Formulating H0

The manager claims that the average content of juice per bottle is less than 50cl.
The machine operator disagrees. A sample of 100 bottles yields an average
content of 49cl per bottle.

Does this sample allow the manager to claim he is right (5% significance level)?

Assume that the population standard deviation σ = 5 cl.
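One possible formulation, offered here as an assumption rather than the official solution, puts the manager's claim in the alternative: H0: µ ≥ 50 and Ha: µ < 50. The sketch below expresses the rejection rule as a threshold on the sample mean.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMED formulation: H0: mu >= 50, Ha: mu < 50 (the manager's claim).
m, sigma, n = 50.0, 5.0, 100
x_bar, alpha = 49.0, 0.05

se = sigma / sqrt(n)

# Lower-tail test: reject H0 when the sample mean falls below this threshold.
threshold = m - NormalDist().inv_cdf(1 - alpha) * se

reject = x_bar < threshold
print(f"Reject H0 if the sample mean is below {threshold:.2f} cl")
print("Sample supports the manager" if reject else "Sample is inconclusive")
```

Putting the claim to be demonstrated in Ha means the data must provide evidence for it; failing to reject H0 would not prove the operator right.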


How do you choose a significance level?

D. Bowers, Statistics from Scratch


Computer-based approach: How likely am I to observe such an extreme value?

Compute significance level of sample result
Reject H0 if significance level lower than the required level

Result stated as: The observed difference is significant at the x% level
==> P-value

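[Figure: two sampling distributions centred at m, with the observed $\bar{x}$ marked on the horizontal axis.
2-tailed test: H0: µ = m, Ha: µ ≠ m. The P-value is the sum of two tail areas of P/2 each.
1-tailed test: H0: µ ≤ m, Ha: µ > m. The P-value is the single tail area P beyond $\bar{x}$.]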


Example

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48.8cl.
Calculate the P-value for the hypothesis that the average content per bottle
equals 50cl.

Assume that the population standard deviation σ = 5cl.
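A sketch of the P-value calculation for this example, treating it as a two-tailed test of H0: µ = 50 with known σ. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

m, sigma, n, x_bar = 50.0, 5.0, 100, 48.8   # values from the slide

# Standardised distance between the observed mean and the hypothesised mean.
z = (x_bar - m) / (sigma / sqrt(n))

# Two-tailed P-value: probability of a deviation at least this large
# in either direction if H0 is true.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

A P-value below 0.05 means the difference is significant at the 5% level, in line with the classical test on the same sample.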

Unknown population standard deviation σ

If large sample: estimate σ

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

Replace σ by s in the analysis
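The estimate s can be computed directly from the formula or with the standard library. The data below are hypothetical values invented for this sketch, and far fewer than a large sample would contain.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL bottle contents in cl, for illustration only.
contents = [49.2, 50.1, 48.7, 51.0, 49.5, 50.3, 47.9, 50.8]

x_bar = mean(contents)

# Direct translation of the formula: squared deviations divided by n - 1.
s_manual = sqrt(sum((x - x_bar) ** 2 for x in contents) / (len(contents) - 1))

# statistics.stdev uses the same n - 1 denominator.
s_library = stdev(contents)

print(f"s (formula) = {s_manual:.4f}, s (statistics.stdev) = {s_library:.4f}")
```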

Small samples

1. Population must follow a Normal distribution
2a. σ known: Sample mean follows Normal distribution
2b. σ unknown: Replace σ by estimate s
    Sample mean follows t-distribution with n-1 degrees of freedom

Tail probability for a given t-score:

           Degrees of freedom
t-score    1      5      10     20     30     50     100
0.0        0.500  0.500  0.500  0.500  0.500  0.500  0.500
1.0        0.250  0.182  0.170  0.165  0.163  0.161  0.160
2.0        0.148  0.051  0.037  0.030  0.027  0.025  0.024
2.5        0.121  0.027  0.016  0.011  0.009  0.008  0.007
3.0        0.102  0.015  0.007  0.004  0.003  0.002  0.002

t-score for a given tail probability:

Tail           Degrees of freedom
probability    1      5      10     20     30     50     100
50.0%          0.00   0.00   0.00   0.00   0.00   0.00   0.00
10.0%          3.08   1.48   1.37   1.33   1.31   1.30   1.29
5.0%           6.31   2.02   1.81   1.72   1.70   1.68   1.66
2.5%           12.71  2.57   2.23   2.09   2.04   2.01   1.98
1.0%           31.82  3.36   2.76   2.53   2.46   2.40   2.36

[Figure: density curves of N(0,1), t(2) and t(10), plotted from -3 to 3.]
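To illustrate case 2b, the sketch below builds a 95% confidence interval from a small sample using the 2.5% row of the t-table above. The 11 data values are hypothetical, and the population is assumed to be Normal as required for small samples.

```python
from math import sqrt
from statistics import mean, stdev

# 2.5% tail t-scores copied from the table (degrees of freedom -> t-score).
T_SCORE_2_5_PCT = {1: 12.71, 5: 2.57, 10: 2.23, 20: 2.09, 30: 2.04, 50: 2.01, 100: 1.98}

# HYPOTHETICAL sample of 11 bottle contents in cl, for illustration only.
sample = [49.1, 50.4, 48.6, 51.2, 49.8, 50.0, 47.9, 50.7, 49.3, 48.8, 50.5]

n = len(sample)
df = n - 1                      # degrees of freedom
x_bar, s = mean(sample), stdev(sample)

# Two-sided 95% interval: 2.5% in each tail of the t-distribution.
t_crit = T_SCORE_2_5_PCT[df]
margin = t_crit * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"df = {df}, t = {t_crit}, 95% CI: [{low:.2f}, {high:.2f}] cl")
```

With 10 degrees of freedom the multiplier is 2.23 rather than the Normal value of about 1.96, so the interval is wider to reflect the extra uncertainty from estimating σ.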
Confidence intervals for Proportions

Example: Sample of 80 customers
60 reply they are satisfied with the service they received

Calculate a 95% confidence interval for the proportion of satisfied customers

Observed value $\hat{p}$:

Standard deviation of $\hat{p}$:

Distribution of $\hat{p}$:

95% confidence interval for the true proportion p:
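One way to fill in these quantities is sketched below. It assumes the usual Normal approximation for a sample proportion, with standard deviation sqrt(p̂(1 − p̂)/n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

satisfied, n = 60, 80           # from the slide

# Observed proportion of satisfied customers.
p_hat = satisfied / n

# Estimated standard deviation of p_hat (Normal approximation assumed).
se = sqrt(p_hat * (1 - p_hat) / n)

# 95% interval: about 1.96 standard deviations either side of p_hat.
z = NormalDist().inv_cdf(0.975)
low, high = p_hat - z * se, p_hat + z * se

print(f"p_hat = {p_hat:.2f}, SD = {se:.4f}")
print(f"95% confidence interval: [{low:.3f}, {high:.3f}]")
```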

Newspaper reports

The latest poll (1,100 respondents) reveals that 54% of the population supports
the government's budgetary decisions. The margin of error is ± 3%.

==> Point estimate: 54%
Margin of error: ± 3%
==> Confidence interval: [51%, 57%]

Observed value $\hat{p}$:

Standard deviation of $\hat{p}$:

Confidence level of interval:
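The confidence level implied by the reported margin of error can be backed out as sketched below, again assuming the Normal approximation for a sample proportion. The poll figures come from the slide; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.54, 1100           # poll result and number of respondents
margin = 0.03                   # reported margin of error

# Estimated standard deviation of the observed proportion.
se = sqrt(p_hat * (1 - p_hat) / n)

# Number of standard deviations covered by the margin of error.
z = margin / se

# Probability mass within +/- z standard deviations of the centre.
confidence = 2 * NormalDist().cdf(z) - 1

print(f"SD of p_hat = {se:.4f}, z = {z:.2f}, confidence level = {confidence:.1%}")
```

The result is close to 95%, the level most commonly used for such polls.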