
# Applied Statistics and Decision Support

## Sampling, Confidence Interval and Hypothesis Testing

Christopher Grigoriou

Executive MBA – HEC Lausanne, 2007-2008

## Sampling: Careful with convenience samples!

World War II: a statistical study to decide where to put more armor on bombers

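It seems bombers came back with holes in them, sometimes lots of holes.
Somebody in the brass suggested putting more armor in the places
that had the most holes, figuring those places must be getting hit most often.
But the returning bombers are a convenience sample: planes hit in other places may never have made it back to be counted.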

## Distribution of the sample mean

A machine is set up so that the average content of juice per bottle equals µ.
The actual amounts per bottle are distributed around this average
with a standard deviation σ = 5cl. Consider a sample of 50 bottles.

$x_i$: content of a randomly selected bottle

$\bar{x}$: average content per bottle for the sample

Distribution of $\bar{x}$: $E(\bar{x}) = \mu$, $SD(\bar{x}) = \sigma/\sqrt{n}$, shape: Normal

Assumptions: population standard deviation σ known, large sample, large population
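As a rough numerical illustration of these formulas, the sketch below plugs in the slide's values (σ = 5cl, n = 50) and, assuming the Normal shape holds, computes the chance that the sample mean lands within 1cl of µ. The 1cl tolerance is an illustrative choice and is not part of the original example.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 5.0      # population standard deviation in cl (from the slide)
N = 50           # sample size (from the slide)
TOLERANCE = 1.0  # illustrative tolerance in cl (assumption, not in the slide)

# Standard deviation of the sample mean: sigma / sqrt(n)
sd_of_mean = SIGMA / sqrt(N)

# P(|x_bar - mu| < TOLERANCE): mu cancels out, so work with the centred mean
centred_mean = NormalDist(mu=0.0, sigma=sd_of_mean)
prob_within = centred_mean.cdf(TOLERANCE) - centred_mean.cdf(-TOLERANCE)

print(f"SD of sample mean: {sd_of_mean:.3f} cl")
print(f"P(sample mean within {TOLERANCE} cl of mu): {prob_within:.3f}")
```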

## Tossing a fair coin repeatedly

[Figure: four bar charts of the distribution of the sample mean for n = 1, n = 2, n = 3 and n = 4 tosses; horizontal axis from 0 to 1, vertical axis showing probability.]
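The four panels can be reproduced numerically. The sketch below is one way to do it, assuming each toss is coded as 1 for heads and 0 for tails so that the sample mean is the proportion of heads; the function name `sample_mean_distribution` is made up for this illustration.

```python
from math import comb

def sample_mean_distribution(n, p_heads=0.5):
    """Exact distribution of the proportion of heads in n independent tosses."""
    return {
        k / n: comb(n, k) * p_heads**k * (1 - p_heads) ** (n - k)
        for k in range(n + 1)
    }

for n in (1, 2, 3, 4):
    dist = sample_mean_distribution(n)
    # Round only for display; the dictionary itself keeps exact floats
    print(n, {round(mean, 2): round(prob, 4) for mean, prob in dist.items()})
```

As n grows, the probabilities pile up around 0.5 and the bar chart starts to resemble the Normal shape assumed for the sample mean.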
## Confidence intervals: Motivation

The latest poll (1,100 respondents) reveals that 54% of the population supports
the government's budgetary decisions. The margin of error is ± 3%.

==> Point estimate: 54%
Margin of error: ± 3%
==> Confidence interval: [51%, 57%]

Interpretation: if many such polls were taken, the true percentage of the population supporting the government would lie in the calculated interval in most instances.

## Confidence intervals: Motivation (2)

100 measurements of the time required to check a customer in.
Average check-in time = 3.5 minutes.

(b) How certain can we be that on a given day the actual average check-in time is between 3 and 4 minutes?

(c) Can we define a duration such that we are 99% certain that the actual average check-in time does not exceed this duration?

## Confidence intervals: The basic idea

[Diagram: Population ($\mu$, $\sigma$, $p$) → Sampling → Sample → Summarising data → Estimates ($\bar{x}$, $s$, $\hat{p}$, with $SE = SD(\bar{x})$) → Inference: confidence interval → Population.]

First focus on the sample mean, then do a similar analysis for proportions.

## Example

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48cl.
Calculate a 90% and a 95% confidence interval for the average content.

Assume that the population standard deviation σ = 5cl.
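A minimal sketch of how this example could be worked, assuming the sample mean is Normally distributed with standard deviation σ/√n as on the earlier slide. The interval is then the sample mean plus or minus z times σ/√n, where z is the standard Normal quantile for the chosen confidence level. The helper name `mean_confidence_interval` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    # z such that the central area under N(0, 1) equals `confidence`
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

ci_90 = mean_confidence_interval(x_bar=48.0, sigma=5.0, n=100, confidence=0.90)
ci_95 = mean_confidence_interval(x_bar=48.0, sigma=5.0, n=100, confidence=0.95)

print(f"90% CI: [{ci_90[0]:.2f}, {ci_90[1]:.2f}] cl")
print(f"95% CI: [{ci_95[0]:.2f}, {ci_95[1]:.2f}] cl")
```

Asking for more confidence widens the interval: the 95% interval contains the 90% one.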

## Sample size
One day there was a fire in a wastebasket in the dean’s office, and in rushed a physicist, a chemist and a statistician.

The physicist immediately started to work on how much energy would have to be removed from the fire to stop combustion.

The chemist worked on which reagent would have to be added to the fire to prevent oxidation.

While they were doing this, the statistician was setting fire to all the other wastebaskets in the office.

“What are you doing?” the others asked.

The statistician replied, “Well, to solve the problem, obviously, you need a large sample size…”

## Sample size
What sample size is required to estimate the average contents
to within 0.5cl at the 95% confidence level?
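One way to approach this question, assuming the population standard deviation is still σ = 5cl as in the bottle example (the slide itself does not restate it): the margin of error z·σ/√n must not exceed 0.5cl, which rearranges to n ≥ (z·σ/0.5)². The helper name `required_sample_size` is made up for this sketch.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Smallest n for which z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# sigma = 5 cl is carried over from the earlier example (assumption)
n_required = required_sample_size(sigma=5.0, margin=0.5, confidence=0.95)
print(f"Required sample size: {n_required}")
```

Because n grows with the square of σ/margin, halving the margin of error roughly quadruples the sample size needed.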

## Hypothesis testing

[Diagram: Population ($\mu$, $\sigma$, $p$; hypotheses $H_0$, $H_a$) → Sampling → Sample → Summarising data → Estimates ($\bar{x}$, $s$, $\hat{p}$, with $SE = SD(\bar{x})$) → Inference → Hypothesis test on the observed data: reject $H_0$?]

## Carrying out a hypothesis test: The classical approach

Step 1. Null hypothesis $H_0: \mu = m$; alternative hypothesis $H_a: \mu \neq m$.
==> If $H_0$ is true, then the observed sample mean $\bar{x} \sim N(m, \sigma/\sqrt{n})$, where the second parameter is the standard deviation.

Step 2. Sample size ==> calculate $SD(\bar{x}) = \sigma/\sqrt{n}$.

Step 3. Significance level: how unlikely does the observed value have to be for us to decide to reject $H_0$?

Step 4. Acceptance region: determine the range of values for which $H_0$ will not be rejected.

Step 5. Take the sample and see whether the observed value justifies rejecting $H_0$.

## Example
A machine is set up such that the average content of juice per bottle equals µ.
A sample of 36 bottles yields an average content of 48.5cl.
Test the hypothesis that the average content per bottle is 50cl
at the 5% significance level.

Assume that the population standard deviation σ = 5cl.
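The five steps of the classical approach could be applied to this example roughly as follows. This is a sketch under the slide's assumptions (σ known, Normal sample mean, two-tailed test of H0: µ = 50); the function name `acceptance_region` is made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def acceptance_region(m, sigma, n, alpha):
    """Range of sample means for which H0: mu = m is not rejected (two-tailed)."""
    sd_of_mean = sigma / sqrt(n)                   # Step 2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Step 3
    return m - z_crit * sd_of_mean, m + z_crit * sd_of_mean  # Step 4

# Step 1: H0: mu = 50, Ha: mu != 50
low, high = acceptance_region(m=50.0, sigma=5.0, n=36, alpha=0.05)

# Step 5: compare the observed sample mean with the acceptance region
x_bar = 48.5
reject_h0 = not (low <= x_bar <= high)

print(f"Acceptance region: [{low:.2f}, {high:.2f}] cl")
print(f"Observed mean {x_bar} cl, reject H0: {reject_h0}")
```

With these numbers the observed mean falls inside the acceptance region, so the sample does not justify rejecting H0 at the 5% level.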

## The impact of sample size

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48.8cl.
Test the hypothesis that the average content per bottle is 50cl
at the 5% significance level.

Assume that the population standard deviation σ = 5cl.
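To see what the larger sample changes, the sketch below compares this sample with the earlier one of 36 bottles averaging 48.5cl. It uses the standardised distance z = (x̄ − m)/(σ/√n), which is equivalent to checking the acceptance region: H0 is rejected when |z| exceeds the two-tailed critical value. The constant and function names are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

M, SIGMA, ALPHA = 50.0, 5.0, 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a 5% two-tailed test

def z_score(x_bar, n):
    """Distance of the sample mean from M, in units of SD(x_bar)."""
    return (x_bar - M) / (SIGMA / sqrt(n))

for x_bar, n in [(48.5, 36), (48.8, 100)]:
    z = z_score(x_bar, n)
    print(f"n={n}, x_bar={x_bar}: z={z:.2f}, reject H0: {abs(z) > z_crit}")
```

The second sample mean is closer to 50cl than the first, yet it leads to rejection: with n = 100 the standard deviation of the sample mean shrinks from about 0.83cl to 0.5cl, so a smaller deviation already counts as unlikely.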

## One-tailed tests

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 36 bottles yields an average content of 48.5cl.
Can you reject the hypothesis that the average content per bottle is less than or
equal to 45cl in favour of the alternative that it exceeds 45cl (5% significance level)?

Assume that the population standard deviation σ = 5cl.
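A possible way to work this one-tailed test, sketched under the same assumptions as before (σ known, Normal sample mean). Here H0: µ ≤ 45 and Ha: µ > 45, so the whole 5% rejection probability sits in the upper tail and the critical quantile is taken at 1 − α instead of 1 − α/2. The function name `upper_critical_value` is made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def upper_critical_value(m, sigma, n, alpha):
    """Sample mean above which H0: mu <= m is rejected in favour of mu > m."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed: all of alpha in one tail
    return m + z_crit * sigma / sqrt(n)

critical = upper_critical_value(m=45.0, sigma=5.0, n=36, alpha=0.05)
x_bar = 48.5
reject_h0 = x_bar > critical

print(f"Reject H0 if the sample mean exceeds {critical:.2f} cl")
print(f"Observed mean {x_bar} cl, reject H0: {reject_h0}")
```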

## Exercise: Formulating H0

The manager claims that the average content of juice per bottle is less than 50cl.
The machine operator disagrees. A sample of 100 bottles yields an average
content of 49cl per bottle.

Does this sample allow the manager to claim he is right (5% significance level)?

Assume that the population standard deviation σ = 5cl.
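One possible way to set this exercise up, offered as a sketch and not as the official solution: since the manager wants the data to support µ < 50, that claim goes into the alternative hypothesis, and the test is one-tailed on the lower side. The value 1.645 is the standard Normal critical value for a 5% one-tailed test.

```latex
H_0: \mu \ge 50, \qquad H_a: \mu < 50

z = \frac{\bar{x} - 50}{\sigma/\sqrt{n}} = \frac{49 - 50}{5/\sqrt{100}} = -2.0

z = -2.0 < -1.645 \;\Rightarrow\; \text{reject } H_0 \text{ at the 5\% level}
```

Under this formulation the sample supports the manager's claim. Had the claim been placed in H0 instead, failing to reject it would not have been evidence in its favour.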

D. Bowers, Statistics from Scratch

## Computer-based approach: How likely am I to observe such an extreme value?

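Compute the significance level of the sample result ==> P-value

Reject $H_0$ if this significance level is lower than the required level.

[Figure: sampling distribution with $m$ and $\bar{x}$ marked on the horizontal axis; shaded tail areas P/2 and P/2 for the 2-tailed test, P for the 1-tailed test.]

2-tailed test: $H_0: \mu = m$, $H_a: \mu \neq m$

1-tailed test: $H_0: \mu \le m$, $H_a: \mu > m$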

## Example

A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48.8cl.
Calculate the P-value for the hypothesis that the average content per bottle
equals 50cl.

Assume that the population standard deviation σ = 5cl.
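A minimal sketch of the P-value calculation for this example, assuming a Normal sample mean with known σ. For the two-tailed test the P-value is the probability, under H0, of a sample mean at least as far from m as the one observed, in either direction. The function name `p_value` and its `two_tailed` flag are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_value(x_bar, m, sigma, n, two_tailed=True):
    """P-value for H0: mu = m (two-tailed) or H0: mu <= m vs Ha: mu > m."""
    z = (x_bar - m) / (sigma / sqrt(n))
    if two_tailed:
        return 2 * NormalDist().cdf(-abs(z))  # area P/2 in each tail
    return 1 - NormalDist().cdf(z)            # area P in the upper tail

p_two = p_value(x_bar=48.8, m=50.0, sigma=5.0, n=100)
print(f"Two-tailed P-value: {p_two:.4f}")
print(f"Reject H0 at the 5% level: {p_two < 0.05}")
```

The P-value is below 5% but above 1%, so the decision depends on the required significance level, which is exactly the extra information the P-value gives over a plain reject or do-not-reject answer.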

## Unknown population standard deviation σ

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Replace σ by $s$ in the analysis.
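The formula for s can be checked on a small data set. The sketch below uses five made-up bottle contents (hypothetical values, not from the lecture) and compares a direct implementation with `statistics.stdev` from the Python standard library, which also divides by n − 1.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

contents = [49.2, 50.1, 48.7, 51.0, 50.0]  # hypothetical contents in cl

s = sample_sd(contents)
print(f"s = {s:.3f} cl (statistics.stdev gives {stdev(contents):.3f} cl)")
```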

## Small samples

1. Population must follow a Normal distribution

2a. σ known: sample mean follows Normal distribution

2b. σ unknown: replace σ by estimate s; sample mean follows t-distribution with n-1 degrees of freedom

Tail probability (one tail) by t-score (rows) and degrees of freedom (columns):

| t-score | 1 | 5 | 10 | 20 | 30 | 50 | 100 |
|---|---|---|---|---|---|---|---|
| 0.0 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 |
| 1.0 | 0.250 | 0.182 | 0.170 | 0.165 | 0.163 | 0.161 | 0.160 |
| 2.0 | 0.148 | 0.051 | 0.037 | 0.030 | 0.027 | 0.025 | 0.024 |
| 2.5 | 0.121 | 0.027 | 0.016 | 0.011 | 0.009 | 0.008 | 0.007 |
| 3.0 | 0.102 | 0.015 | 0.007 | 0.004 | 0.003 | 0.002 | 0.002 |

t-score by tail probability (rows) and degrees of freedom (columns):

| Tail probability | 1 | 5 | 10 | 20 | 30 | 50 | 100 |
|---|---|---|---|---|---|---|---|
| 50.0% | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 10.0% | 3.08 | 1.48 | 1.37 | 1.33 | 1.31 | 1.30 | 1.29 |
| 5.0% | 6.31 | 2.02 | 1.81 | 1.72 | 1.70 | 1.68 | 1.66 |
| 2.5% | 12.71 | 2.57 | 2.23 | 2.09 | 2.04 | 2.01 | 1.98 |
| 1.0% | 31.82 | 3.36 | 2.76 | 2.53 | 2.46 | 2.40 | 2.36 |

[Figure: density curves of N(0,1), t(2) and t(10), plotted from -3 to 3.]
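Entries of the first table can be recomputed without a statistics package. The sketch below integrates the t density numerically with Simpson's rule; this is an illustrative approach using only the Python standard library, not necessarily how the table was produced, and the function names are made up.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry half the probability lies above 0; subtract the area on [0, t]
    return 0.5 - total * h / 3

for df in (1, 5, 10, 100):
    print(df, round(t_upper_tail(2.0, df), 3))
```

As the degrees of freedom increase, the tail probabilities approach those of the standard Normal distribution, which is why the rightmost columns of the tables change so little.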

## Confidence intervals for proportions

Observed value $\hat{p}$:

Standard deviation of $\hat{p}$:

Distribution of $\hat{p}$:

95% confidence interval for the true proportion $p$:
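The slide leaves these four items blank. For reference, the usual large-sample (Normal approximation) results are sketched below, written in the same $N(\text{mean}, \text{standard deviation})$ notation used for the sample mean. They are standard textbook formulas and an assumption about what the lecture filled in.

```latex
\hat{p} = \frac{\text{number of successes in the sample}}{n}

SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

\hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right) \quad \text{approximately, for large } n

\text{95\% confidence interval: } \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
```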

## Newspaper reports

The latest poll (1,100 respondents) reveals that 54% of the population supports
the government's budgetary decisions. The margin of error is ± 3%.

==> Point estimate: 54%
Margin of error: ± 3%
==> Confidence interval: [51%, 57%]

Observed value $\hat{p}$:

Standard deviation of $\hat{p}$:
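The newspaper's ± 3% can be checked with a short calculation. The sketch below assumes the poll is a simple random sample, that the Normal approximation applies, and that the reported margin corresponds to a 95% confidence level (the report does not say so explicitly).

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.54   # observed proportion (from the poll)
n = 1100       # number of respondents (from the poll)
confidence = 0.95  # assumed confidence level behind the reported margin

# Estimated standard deviation of p_hat: sqrt(p_hat * (1 - p_hat) / n)
sd_p_hat = sqrt(p_hat * (1 - p_hat) / n)

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sd_p_hat

print(f"SD of p_hat: {sd_p_hat:.4f}")
print(f"Margin of error: ±{margin:.1%}")
print(f"95% CI: [{p_hat - margin:.1%}, {p_hat + margin:.1%}]")
```

Under these assumptions the margin comes out just under 3 percentage points, which is consistent with the rounded figures in the report.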
