
DADM-1

How Many German Tanks?


• WW II: Allied forces wanted to know how
many tanks the German military was
producing. They tried traditional espionage
techniques, but these kept yielding
absurdly high estimates.
Number of Tiles in the Bag?
• A bag contains numbered tiles
• You randomly pull six tiles from the bag:
10, 23, 17, 9, 35, and 3
• Can you come up with an estimator that
uses this sample of numbers to estimate the
total number of tiles in the bag?
Estimator?
• 2 × Sample Maximum: The biggest integer in our six-number
sample is 35, so twice that is 2 × 35 = 70. That is far
bigger than the real value (42).
• 2 × Sample Mean: The mean of the six-number sample
(10, 23, 17, 9, 35, and 3) is just over 16, so twice the
mean is about 32. That is definitely better than 70, but
it is still off by about 25%.
• 2 × Sample Median: The median is 13.5, which gives an
estimate of 27, but that differs from the actual value by
about 35%.

Clearly, these estimators just aren’t good enough.


Better Estimator?
Population Maximum ≈ Sample Maximum + (Sample Maximum / Sample Size) – 1

35 + (35 / 6) – 1 ≈ 39.8, which is just a hair under 40. The
actual maximum value is 42, so our estimate of 40 is
only about 5% less than the actual value!
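The four candidate estimators can be compared side by side. The sketch below simply recomputes the numbers from the tile example; the dictionary keys are illustrative labels, not terminology from the slides.

```python
from statistics import mean, median

# Tile numbers drawn in the example
sample = [10, 23, 17, 9, 35, 3]
n = len(sample)
m = max(sample)

# Each entry is one candidate estimator of the largest tile number in the bag
estimates = {
    "2 * max": 2 * m,
    "2 * mean": 2 * mean(sample),
    "2 * median": 2 * median(sample),
    "max + max/n - 1": m + m / n - 1,
}

for name, value in estimates.items():
    print(f"{name:>16}: {value:.2f}")
```

Only the last estimator lands close to the true value of 42 for this particular sample; a single sample does not by itself show which estimator is best on average.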
German Tank Problem
• Calculating the population maximum using the
serial numbers from the captured tanks yielded an
estimate of only 256 per month. And after the war
when all the official German documents were
analyzed, it was found that the true value was 255
per month. That’s only 1 less than was estimated!
Statistical Inference

Statistical inference: Acquire information and draw
conclusions about populations from samples. Two
broad methods:
– Estimation
Estimate an unknown population parameter using
sample statistics
– Hypothesis testing
Make a hypothesis about unknown population
parameters and test it against the evidence in the sample
Estimation

Estimator: Random variable based on
sample statistics that is used in estimation
– Point Estimator: Uses a single value
Ex: Infer that the population mean is 7
– Interval Estimator: Uses a range of values
and specifies the level of confidence
Ex: Infer that µ is between 6.8 and 7.2 with 95%
confidence
Estimator Properties

Some important properties of an
estimator:
• Unbiased vs. biased
• Consistent vs. not consistent
• Relatively efficient vs. not relatively
efficient
Unbiasedness

Unbiased estimator: Estimator whose
expected value equals the population
parameter that it estimates

Does being unbiased rely on having
a large enough sample size?
Direction of Bias

Sometimes we can sign the bias
• Upward bias: E[estimator] > parameter
• Downward bias: E[estimator] < parameter

For a Uniform population, is the sample range
an unbiased estimator of the population range?
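One way to explore the question about the sample range is a small simulation. The sketch below is only an illustration: the Uniform(0, 1) population, the sample size of 6, the number of trials, and the function name are all assumptions chosen for the example.

```python
import random


def mean_sample_range(n, trials, low=0.0, high=1.0, seed=0):
    """Average the sample range (max - min) over many Uniform(low, high) samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        sample = [rng.uniform(low, high) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials


# The population range is 1.0 for Uniform(0, 1)
avg_range = mean_sample_range(n=6, trials=20000)
print(f"average sample range: {avg_range:.3f} (population range: 1.000)")
```

The simulated average stays below the population range, which is what a downward bias looks like: a sample can never span more than the population it was drawn from.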
Consistency

Consistent estimator: As the sample size grows
without bound, the estimator converges
to the true population parameter
Consistent?
Consistency
Efficiency
Part I
Single Population
Hypothesis Testing
Hypothesis testing can be used to determine whether
a statement about the value of a population parameter
should or should not be rejected.
The null hypothesis, denoted by H0 , is a tentative
assumption about a population parameter.
The alternative hypothesis, denoted by Ha, is the
opposite of what is stated in the null hypothesis.
The hypothesis testing procedure uses data from a
sample to test the two competing statements
indicated by H0 and Ha.
Developing Null and Alternative Hypotheses

Null Hypothesis as an Assumption to be Challenged

Example:
The label on a soft drink bottle states that it
contains 67.6 fluid ounces.
Null Hypothesis:
The label is correct. µ ≥ 67.6 ounces.
Alternative Hypothesis:
The label is incorrect. µ < 67.6 ounces.
Hypotheses about a Population Mean

The equality part of the hypotheses always appears
in the null hypothesis.
In general, a hypothesis test about the value of a
population mean µ must take one of the following
three forms (where µ0 is the hypothesized value of
the population mean).

One-tailed (lower-tail)   One-tailed (upper-tail)   Two-tailed
H0: µ ≥ µ0                H0: µ ≤ µ0                H0: µ = µ0
Ha: µ < µ0                Ha: µ > µ0                Ha: µ ≠ µ0
Null and Alternative Hypotheses
A major west coast city provides one of the most
comprehensive emergency medical services in the
world. Operating in a multiple hospital system
with approximately 20 mobile medical units, the
service goal is to respond to medical emergencies
with a mean time of 12 minutes or less.
The director of medical services wants to
formulate a hypothesis test that could use a sample
of emergency response times to determine whether
or not the service goal of 12 minutes or less is being
achieved.
Null and Alternative Hypotheses

H0: µ ≤ 12
The emergency service is meeting
the response goal; no follow-up
action is necessary.

Ha: µ > 12
The emergency service is not
meeting the response goal;
appropriate follow-up action is
necessary.

where: µ = mean response time for the population
of medical emergency requests
Type I Error
Because hypothesis tests are based on sample data,
we must allow for the possibility of errors.
A Type I error is rejecting H0 when it is true.
The probability of making a Type I error when the
null hypothesis is true as an equality is called the
level of significance.
Applications of hypothesis testing that only control
the Type I error are often called significance tests.
Type II Error

A Type II error is accepting H0 when it is false.
It is difficult to control for the probability of making
a Type II error.
Statisticians avoid the risk of making a Type II
error by using “do not reject H0” and not “accept H0”.
Steps of Hypothesis Testing

Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test
statistic.

p-Value Approach
Step 4. Use the value of the test statistic to compute the
p-value.
Step 5. Reject H0 if p-value < α.
Steps of Hypothesis Testing

Critical Value Approach
Step 4. Use the level of significance α to determine the
critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection
rule to determine whether to reject H0.
One-Tailed Tests About a Population Mean:
σ Known
Example: Metro EMS
The response times for a random sample of 40
medical emergencies were tabulated. The sample
mean is 13.25 minutes. The population standard
deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis
test, with a .05 level of significance, to determine
whether the service goal of 12 minutes or less is
being achieved.
One-Tailed Tests About a Population Mean:
σ Known
• p-Value and Critical Value Approaches

H0: µ ≤ 12
Ha: µ > 12

α = .05

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{13.25 - 12}{3.2 / \sqrt{40}} = 2.47$$
One-Tailed Tests About a Population Mean:
σ Known
• p-Value Approach

For z = 2.47, cumulative probability = .9932.
p-value = 1 − .9932 = .0068

Because p-value = .0068 < α = .05, we reject H0.

There is sufficient statistical evidence
to infer that Metro EMS is not meeting
the response goal of 12 minutes.
One-Tailed Tests About a Population Mean:
σ Known
• Critical Value Approach

For α = .05, z_.05 = 1.645
Reject H0 if z > 1.645

Because 2.47 > 1.645, we reject H0.

There is sufficient statistical evidence
to infer that Metro EMS is not meeting
the response goal of 12 minutes.
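The Metro EMS calculation can be reproduced with Python's standard library. This is a minimal sketch using the numbers from the example; the variable names are illustrative, and `statistics.NormalDist` stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist

# Metro EMS example: H0: mu <= 12 versus Ha: mu > 12, sigma known
x_bar, mu_0, sigma, n, alpha = 13.25, 12.0, 3.2, 40, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))      # test statistic
p_value = 1.0 - NormalDist().cdf(z)         # upper-tail area
z_crit = NormalDist().inv_cdf(1.0 - alpha)  # critical value z_alpha

reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical value = {z_crit:.3f}")
print("reject H0" if reject else "do not reject H0")
```

The p-value computed this way differs slightly from the table-based .0068 because the table lookup rounds z to 2.47 first; both approaches lead to the same decision.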
p-Value Approach to
Two-Tailed Hypothesis Testing
Compute the p-value using the following three steps:
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), find the area under
the standard normal curve to the right of z.
If z is in the lower tail (z < 0), find the area under
the standard normal curve to the left of z.
3. Double the tail area obtained in step 2 to obtain
the p –value.
The rejection rule:
Reject H0 if the p-value < α .
Critical Value Approach to
Two-Tailed Hypothesis Testing
The critical values will occur in both the lower and
upper tails of the standard normal curve.
Use the standard normal probability distribution
table to find zα/2 (the z-value with an area of α/2 in
the upper tail of the distribution).
The rejection rule is:
Reject H0 if z < -zα/2 or z > zα/2.
Two-Tailed Tests About a Population Mean:
σ Known
• Example: Glow Toothpaste
The production line for Glow toothpaste is
designed to fill tubes with a mean weight of 6 oz.
Periodically, a sample of 30 tubes will be selected in
order to check the filling process.
Quality assurance procedures call for the
continuation of the filling process if the sample
results are consistent with the assumption that the
mean filling weight for the population of toothpaste
tubes is 6 oz.; otherwise the process will be adjusted.
Two-Tailed Tests About a Population Mean:
σ Known
Example: Glow Toothpaste
Assume that a sample of 30 toothpaste tubes
provides a sample mean of 6.1 oz. The population
standard deviation is believed to be 0.2 oz.
Perform a hypothesis test, at the .03 level of
significance, to help determine whether the filling
process should continue operating or be stopped and
corrected.
Two-Tailed Tests About a Population Mean:
σ Known
• p-Value and Critical Value Approaches

H0: µ = 6
Ha: µ ≠ 6

α = .03

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{6.1 - 6}{.2 / \sqrt{30}} = 2.74$$
Two-Tailed Tests About a Population Mean:
σ Known
• p-Value Approach

For z = 2.74, cumulative probability = .9969
p-value = 2(1 − .9969) = .0062

Because p-value = .0062 < α = .03, we reject H0.

There is sufficient statistical evidence to
infer that the alternative hypothesis is true
(i.e. the mean filling weight is not 6 ounces).
Two-Tailed Tests About a Population Mean:
σ Known
• Critical Value Approach

For α/2 = .03/2 = .015, z_.015 = 2.17
Reject H0 if z < -2.17 or z > 2.17

Because 2.74 > 2.17, we reject H0.

There is sufficient statistical evidence to
infer that the alternative hypothesis is true
(i.e. the mean filling weight is not 6 ounces).
Confidence Interval Approach to
Two-Tailed Tests About a
Population Mean
Select a simple random sample from the population
and use the value of the sample mean x̄ to develop
the confidence interval for the population mean µ.

If the confidence interval contains the hypothesized
value µ0, do not reject H0. Otherwise, reject H0.
Confidence Interval Approach
The 97% confidence interval for µ is

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 6.1 \pm 2.17\left(.2 / \sqrt{30}\right) = 6.1 \pm .07924$$

or 6.02076 to 6.17924
Because the hypothesized value for the
population mean, µ0 = 6, is not in this interval,
the hypothesis-testing conclusion is that the
null hypothesis, H0: µ = 6, can be rejected.
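The three routes to the Glow toothpaste decision (p-value, critical value, confidence interval) can be checked together. The sketch below uses the example's numbers; variable names are illustrative, and `statistics.NormalDist` replaces the table lookups.

```python
from math import sqrt
from statistics import NormalDist

# Glow toothpaste example: H0: mu = 6 versus Ha: mu != 6, sigma known
x_bar, mu_0, sigma, n, alpha = 6.1, 6.0, 0.2, 30, 0.03

std_err = sigma / sqrt(n)
z = (x_bar - mu_0) / std_err

# p-value approach: double the tail area beyond |z|
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Critical value approach: z_{alpha/2} leaves alpha/2 in the upper tail
z_half = NormalDist().inv_cdf(1.0 - alpha / 2)

# Confidence interval approach: reject H0 if mu_0 falls outside the interval
ci = (x_bar - z_half * std_err, x_bar + z_half * std_err)
reject = not (ci[0] <= mu_0 <= ci[1])

print(f"z = {z:.2f}, p-value = {p_value:.4f}, z_alpha/2 = {z_half:.2f}")
print(f"{100 * (1 - alpha):.0f}% CI: {ci[0]:.4f} to {ci[1]:.4f}")
print("reject H0" if reject else "do not reject H0")
```

For a two-tailed test these three approaches always agree, because each one asks whether the sample mean lies more than z_{α/2} standard errors from µ0.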
Tests About a Population Mean:
σ Unknown
• Test Statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

This test statistic has a t distribution
with n - 1 degrees of freedom.
Tests About a Population Mean:
σ Unknown
Rejection Rule: p-Value Approach
Reject H0 if p-value < α
Rejection Rule: Critical Value Approach
H0: µ ≥ µ0    Reject H0 if t < -tα
H0: µ ≤ µ0    Reject H0 if t > tα
H0: µ = µ0    Reject H0 if t < -tα/2 or t > tα/2


p -Values and the t Distribution
The format of the t distribution table provided in most
statistics textbooks does not have sufficient detail
to determine the exact p-value for a hypothesis test.
However, we can still use the t distribution table to
identify a range for the p-value.
An advantage of computer software packages is that
the computer output will provide the p-value for the
t distribution.
Example: Highway Patrol
A State Highway Patrol periodically samples
vehicle speeds at various locations on a particular
roadway. The sample of vehicle speeds is used to
test the hypothesis H0: µ ≤ 65.
The locations where H0 is rejected are deemed the
best locations for radar traps. At Location F, a
sample of 64 vehicles shows a mean speed of 66.2
mph with a standard deviation of 4.2 mph. Use α
= .05 to test the hypothesis.
One-Tailed Test About a Population Mean:
σ Unknown
• p-Value and Critical Value Approaches

H0: µ ≤ 65
Ha: µ > 65

α = .05

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{66.2 - 65}{4.2 / \sqrt{64}} = 2.286$$
One-Tailed Test About a Population Mean:
σ Unknown
• p-Value Approach

For t = 2.286, the p-value must be less than .025
(for t = 1.998) and greater than .01 (for t = 2.387).
.01 < p-value < .025

Because p-value < α = .05, we reject H0.

We are at least 95% confident that the mean speed
of vehicles at Location F is greater than 65 mph.
One-Tailed Test About a Population Mean:
σ Unknown
• Critical Value Approach

For α = .05 and d.f. = 64 – 1 = 63, t_.05 = 1.669
Reject H0 if t > 1.669

Because 2.286 > 1.669, we reject H0.

We are at least 95% confident that the mean speed
of vehicles at Location F is greater than 65 mph.
Location F is a good candidate for a radar trap.
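The Highway Patrol test statistic is easy to verify in code. Python's standard library has no t distribution, so this sketch takes the critical value 1.669 from the slide's table lookup rather than computing it; the variable names are illustrative.

```python
from math import sqrt

# Highway Patrol example: H0: mu <= 65 versus Ha: mu > 65, sigma unknown
x_bar, mu_0, s, n = 66.2, 65.0, 4.2, 64

# Critical value t_.05 with 63 d.f., copied from the slide (table lookup)
t_crit = 1.669

t = (x_bar - mu_0) / (s / sqrt(n))
reject = t > t_crit

print(f"t = {t:.3f}, d.f. = {n - 1}, critical value = {t_crit}")
print("reject H0" if reject else "do not reject H0")
```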
Tests About a Population Proportion

Test Statistic

$$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$$

where:

$$\sigma_{\bar{p}} = \sqrt{\frac{p_0 (1 - p_0)}{n}}$$

assuming np ≥ 5 and n(1 – p) ≥ 5


Tests About a Population Proportion

Rejection Rule: p-Value Approach
Reject H0 if p-value < α
Rejection Rule: Critical Value Approach
H0: p ≤ p0    Reject H0 if z > zα
H0: p ≥ p0    Reject H0 if z < -zα
H0: p = p0    Reject H0 if z < -zα/2 or z > zα/2


Two-Tailed Test About a
Population Proportion
Example: National Safety Council (NSC)
For a Christmas and New Year’s week, the
National Safety Council estimated that 500 people
would be killed and 25,000 injured on the nation’s
roads. The NSC claimed that 50% of the accidents
would be caused by drunk driving.
A sample of 120 accidents showed that 67 were
caused by drunk driving. Use these data to test the
NSC’s claim with α = .05.
Two-Tailed Test About a
Population Proportion
• p-Value and Critical Value Approaches

H0: p = .5
Ha: p ≠ .5

α = .05

$$\sigma_{\bar{p}} = \sqrt{\frac{p_0 (1 - p_0)}{n}} = \sqrt{\frac{.5(1 - .5)}{120}} = .045644$$

$$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{(67/120) - .5}{.045644} = 1.28$$
Two-Tailed Test About a
Population Proportion
• p-Value Approach

For z = 1.28, cumulative probability = .8997
p-value = 2(1 − .8997) = .2006

Because p-value = .2006 > α = .05, we cannot reject H0.


Two-Tailed Test About a
Population Proportion
• Critical Value Approach

For α/2 = .05/2 = .025, z_.025 = 1.96
Reject H0 if z < -1.96 or z > 1.96

Because -1.96 < 1.278 < 1.96, we cannot reject H0.
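The National Safety Council test can be sketched as follows, using the counts from the example. Variable names are illustrative, and `statistics.NormalDist` replaces the normal table.

```python
from math import sqrt
from statistics import NormalDist

# NSC example: H0: p = .5 versus Ha: p != .5
drunk_driving, n, p_0, alpha = 67, 120, 0.5, 0.05

p_bar = drunk_driving / n               # sample proportion
std_err = sqrt(p_0 * (1 - p_0) / n)     # standard error under H0
z = (p_bar - p_0) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed

reject = p_value < alpha
print(f"p_bar = {p_bar:.4f}, std. error = {std_err:.6f}, z = {z:.3f}")
print(f"p-value = {p_value:.4f}:", "reject H0" if reject else "cannot reject H0")
```

The p-value here is computed from the unrounded z of about 1.278, so it comes out slightly above the table-based .2006.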
Part II
Two Populations
Inferences About the Difference Between
Two Population Means: σ 1 and σ 2 Known

• Interval Estimation of µ 1 – µ 2
• Hypothesis Tests About µ 1 – µ 2
Sampling Distribution of x̄1 − x̄2

Expected Value

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

Standard Deviation (Standard Error)

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where: σ1 = standard deviation of population 1
σ2 = standard deviation of population 2
n1 = sample size from population 1
n2 = sample size from population 2
Interval Estimation of µ1 - µ2:
σ 1 and σ 2 Known
Interval Estimate

$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where:
1 - α is the confidence coefficient
Hypothesis Tests About µ 1 − µ 2:
σ 1 and σ 2 Known
Hypotheses

Left-tailed            Right-tailed           Two-tailed
H0: µ1 − µ2 ≥ D0       H0: µ1 − µ2 ≤ D0       H0: µ1 − µ2 = D0
Ha: µ1 − µ2 < D0       Ha: µ1 − µ2 > D0       Ha: µ1 − µ2 ≠ D0

Test Statistic

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Hypothesis Tests About µ 1 − µ 2:
σ 1 and σ 2 Known
Example: Par, Inc.
Can we conclude, using α = .01, that the
mean driving distance of Par, Inc. golf balls is
greater than the mean driving distance of Rap, Ltd.
golf balls?
Hypothesis Tests About µ 1 − µ 2:
σ 1 and σ 2 Known
• p-Value and Critical Value Approaches

1. Develop the hypotheses.
H0: µ1 - µ2 ≤ 0
Ha: µ1 - µ2 > 0
where:
µ1 = mean distance for the population
of Par, Inc. golf balls
µ2 = mean distance for the population
of Rap, Ltd. golf balls

2. Specify the level of significance. α = .01


Hypothesis Tests About µ1 − µ2:
σ1 and σ2 Known
• p-Value and Critical Value Approaches

3. Compute the value of the test statistic.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(235 - 218) - 0}{\sqrt{\dfrac{(15)^2}{120} + \dfrac{(20)^2}{80}}} = \frac{17}{2.62} = 6.49$$
Hypothesis Tests About µ1 − µ2:
σ1 and σ2 Known
• p-Value Approach

4. Compute the p-value.
For z = 6.49, the p-value < .0001.

5. Determine whether to reject H0.
Because p-value < α = .01, we reject H0.
At the .01 level of significance, the sample evidence
indicates the mean driving distance of Par, Inc. golf
balls is greater than the mean driving distance of Rap,
Ltd. golf balls.
Hypothesis Tests About µ1 − µ2:
σ1 and σ2 Known
• Critical Value Approach

4. Determine the critical value and rejection rule.
For α = .01, z_.01 = 2.33
Reject H0 if z > 2.33

5. Determine whether to reject H0.
Because z = 6.49 > 2.33, we reject H0.
The sample evidence indicates the mean driving
distance of Par, Inc. golf balls is greater than the mean
driving distance of Rap, Ltd. golf balls.
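The Par, Inc. versus Rap, Ltd. comparison can be reproduced as a short sketch. The sample means, standard deviations, and sample sizes are the ones appearing in the worked test statistic; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Par, Inc. (sample 1) versus Rap, Ltd. (sample 2); sigma_1 and sigma_2 known
x1_bar, sigma1, n1 = 235.0, 15.0, 120
x2_bar, sigma2, n2 = 218.0, 20.0, 80
d_0, alpha = 0.0, 0.01

std_err = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((x1_bar - x2_bar) - d_0) / std_err
p_value = 1.0 - NormalDist().cdf(z)         # right-tailed test
z_crit = NormalDist().inv_cdf(1.0 - alpha)

reject = z > z_crit
print(f"std. error = {std_err:.3f}, z = {z:.2f}, critical value = {z_crit:.2f}")
print("reject H0" if reject else "do not reject H0")
```

Without intermediate rounding the statistic is about 6.48; the 6.49 shown in the slide comes from dividing 17 by the rounded standard error 2.62. The decision is unaffected.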
Small Sample Case

• Each of the two populations is normally
distributed.
• The two samples are independent.
• At least one of the samples is small, n < 30.
• The values of the population variances are
unknown.
• The variances of the two populations are equal:
σ1² = σ2²
T stat

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \, \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Confidence Interval

$$(\bar{X}_1 - \bar{X}_2) \pm t \, \sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \, \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where df = n1 + n2 − 2
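The pooled-variance t statistic for the small-sample case can be sketched as a function. The slides give no numerical example here, so the summary statistics below are hypothetical, and the function name `pooled_t` is an assumption made for illustration.

```python
from math import sqrt


def pooled_t(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic for two independent samples.

    var1 and var2 are the sample variances S1^2 and S2^2.
    Returns (t, degrees of freedom, standard error).
    """
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (equal-variance assumption)
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    std_err = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / std_err
    return t, df, std_err


# Hypothetical summary statistics, not taken from the slides
t, df, std_err = pooled_t(mean1=5.0, mean2=4.0, var1=2.0, var2=2.0, n1=10, n2=12)
print(f"t = {t:.4f} with {df} d.f. (standard error {std_err:.4f})")
```

The same standard error, multiplied by the table value of t for n1 + n2 − 2 degrees of freedom, gives the half-width of the confidence interval.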
Inferences About the Difference Between
Two Population Means: Matched Samples
With a matched-sample design each sampled item
provides a pair of data values.
This design often leads to a smaller sampling error
than the independent-sample design because
variation between sampled items is eliminated as a
source of sampling error.
Inferences About the Difference Between
Two Population Means: Matched Samples
Example: Express Deliveries
A Chicago-based firm has documents that must
be quickly distributed to district offices throughout
the U.S. The firm must decide between two delivery
services, UPX (United Parcel Express) and INTEX
(International Express), to transport its documents.
Inferences About the Difference Between
Two Population Means: Matched Samples
Example: Express Deliveries
In testing the delivery times of the two services,
the firm sent two reports to a random sample of its
district offices with one report carried by UPX and
the other report carried by INTEX. Do the data on
the next slide indicate a difference in mean delivery
times for the two services? Use a .05 level of
significance.
Inferences About the Difference Between
Two Population Means: Matched Samples

Delivery Time (Hours)

District Office    UPX    INTEX    Difference
Seattle             32       25             7
Los Angeles         30       24             6
Boston              19       15             4
Cleveland           16       15             1
New York            15       13             2
Houston             18       15             3
Atlanta             14       15            -1
St. Louis           10        8             2
Milwaukee            7        9            -2
Denver              16       11             5
Inferences About the Difference Between
Two Population Means: Matched Samples
• p-Value and Critical Value Approaches
1. Develop the hypotheses.
H0: µd = 0
Ha: µd ≠ 0
Let µd = the mean of the difference values for the
two delivery services for the population
of district offices
Inferences About the Difference Between
Two Population Means: Matched Samples
• p-Value and Critical Value Approaches
2. Specify the level of significance. α = .05

3. Compute the value of the test statistic.

$$\bar{d} = \frac{\sum d_i}{n} = \frac{7 + 6 + \dots + 5}{10} = 2.7$$

$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{76.1}{9}} = 2.9$$

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{2.7 - 0}{2.9 / \sqrt{10}} = 2.94$$
Inferences About the Difference Between
Two Population Means: Matched Samples
• p-Value Approach
4. Compute the p-value.

For t = 2.94 and df = 9, the p-value is between
.01 and .02. (This is a two-tailed test, so we double
the upper-tail areas of .005 and .01.)

5. Determine whether to reject H0.
Because p-value < α = .05, we reject H0.
We are at least 95% confident that there is a
difference in mean delivery times for the two
services.
Inferences About the Difference Between
Two Population Means: Matched Samples
• Critical Value Approach
4. Determine the critical value and rejection rule.

For α = .05 and df = 9, t_.025 = 2.262.
Reject H0 if t < -2.262 or t > 2.262

5. Determine whether to reject H0.
Because t = 2.94 > 2.262, we reject H0.
We are at least 95% confident that there is a
difference in mean delivery times for the two
services.
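The matched-sample calculation can be reproduced directly from the delivery-time table. In this sketch the critical value 2.262 is copied from the slide (the standard library has no t distribution), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Delivery times in hours, in the district-office order of the table
upx = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
intex = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

# One difference per district office (the matched pairs)
diffs = [u - i for u, i in zip(upx, intex)]
n = len(diffs)

d_bar = mean(diffs)   # mean difference
s_d = stdev(diffs)    # sample standard deviation of the differences
t = (d_bar - 0) / (s_d / sqrt(n))

t_crit = 2.262        # t_.025 with 9 d.f., from the slide
reject = abs(t) > t_crit

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.3f}, t = {t:.3f}")
print("reject H0" if reject else "do not reject H0")
```

Working with the unrounded standard deviation gives t of roughly 2.936 rather than 2.94; the slide rounds s_d to 2.9 before dividing.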
Inferences About the Difference Between
Two Population Proportions

• Interval Estimation of p1 - p2
• Hypothesis Tests About p1 - p2
Sampling Distribution of p̄1 − p̄2

Expected Value

$$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$$

Standard Deviation (Standard Error)

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$

where: n1 = size of sample taken from population 1
n2 = size of sample taken from population 2
Sampling Distribution of p1 − p2

The sample sizes are sufficiently
large if all of these conditions
are met:
n1p1 ≥ 5    n1(1 - p1) ≥ 5
n2p2 ≥ 5    n2(1 - p2) ≥ 5


Interval Estimation of p1 - p2

Interval Estimate

$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\bar{p}_1 (1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2 (1 - \bar{p}_2)}{n_2}}$$
Hypothesis Tests about p1 - p2

Hypotheses
We focus on tests involving no difference between
the two population proportions (i.e. p1 = p2)

Left-tailed          Right-tailed         Two-tailed
H0: p1 − p2 ≥ 0      H0: p1 − p2 ≤ 0      H0: p1 − p2 = 0
Ha: p1 − p2 < 0      Ha: p1 − p2 > 0      Ha: p1 − p2 ≠ 0
Hypothesis Tests about p1 - p2

• Standard Error of p̄1 − p̄2 when p1 = p2 = p

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{p (1 - p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

• Pooled Estimator of p when p1 = p2 = p

$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$$
Hypothesis Tests about p1 - p2

• Test Statistic

$$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p} (1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$
Hypothesis Tests about p1 - p2

Example: Market Research Associates


Can we conclude, using a .05 level of significance,
that the proportion of households aware of the
client’s product increased after the new advertising
campaign?
Hypothesis Tests about p1 - p2

• p-Value and Critical Value Approaches

1. Develop the hypotheses.
H0: p1 - p2 ≤ 0
Ha: p1 - p2 > 0
p1 = proportion of the population of households
“aware” of the product after the new campaign
p2 = proportion of the population of households
“aware” of the product before the new campaign
Hypothesis Tests about p1 - p2

• p-Value and Critical Value Approaches

2. Specify the level of significance. α = .05

3. Compute the value of the test statistic.

$$\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$$

$$z = \frac{(.48 - .40) - 0}{.0514} = \frac{.08}{.0514} = 1.56$$
Hypothesis Tests about p1 - p2

• p-Value Approach
4. Compute the p-value.
For z = 1.56, the p-value = .0594

5. Determine whether to reject H0.
Because p-value > α = .05, we cannot reject H0.
We cannot conclude that the proportion of households
aware of the client’s product increased after the new
campaign.
Hypothesis Tests about p1 - p2

• Critical Value Approach
4. Determine the critical value and rejection rule.
For α = .05, z_.05 = 1.645
Reject H0 if z > 1.645

5. Determine whether to reject H0.
Because 1.56 < 1.645, we cannot reject H0.
We cannot conclude that the proportion of households
aware of the client’s product increased after the new
campaign.
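The Market Research Associates test can be checked with a short sketch using the sample sizes and sample proportions from the worked computation. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample 1: after the campaign; sample 2: before the campaign
n1, p1_bar = 250, 0.48
n2, p2_bar = 150, 0.40
alpha = 0.05

# Pooled estimate of p, used because H0 assumes p1 = p2
p_pooled = (n1 * p1_bar + n2 * p2_bar) / (n1 + n2)
std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p1_bar - p2_bar) / std_err
p_value = 1.0 - NormalDist().cdf(z)  # right-tailed test

reject = p_value < alpha
print(f"pooled p = {p_pooled:.2f}, std. error = {std_err:.4f}, z = {z:.3f}")
print(f"p-value = {p_value:.4f}:", "reject H0" if reject else "cannot reject H0")
```

The p-value from the unrounded z is a little larger than the table-based .0594, which uses z rounded to 1.56; either way it exceeds .05.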
Inferences About Population
Variances
Inferences about Two Populations Variances
Inference about a Population Variance
Inferences About a Population Variance

A variance can provide important decision-making
information.
Consider the production process of filling containers
with a liquid detergent product.
The mean filling weight is important, but also is the
variance of the filling weights.
By selecting a sample of containers, we can compute
a sample variance for the amount of detergent placed
in a container.
If the sample variance is excessive, overfilling and
underfilling may be occurring even though the mean
is correct.
Inferences About a Population
Variance
• Chi-Square Distribution
• Interval Estimation of σ 2
• Hypothesis Testing
Chi-Square Distribution
The chi-square distribution is the distribution of a sum
of squared standardized normal random variables such as
z1² + z2² + z3² and so on.
The chi-square distribution is based on sampling
from a normal population.
The sampling distribution of (n - 1)s²/σ² has a chi-square
distribution whenever a simple random sample
of size n is selected from a normal population.
We can use the chi-square distribution to develop
interval estimates and conduct hypothesis tests
about a population variance.
Examples of Sampling Distribution of (n - 1)s²/σ²

[Figure: chi-square density curves of (n - 1)s²/σ² with 2, 5, and 10 degrees of freedom.]
Chi-Square Distribution
We will use the notation χ²_α to denote the value for the
chi-square distribution that provides an area of α to
the right of the stated χ²_α value.

For example, there is a .95 probability of obtaining a χ²
(chi-square) value such that

$$\chi^2_{.975} \le \chi^2 \le \chi^2_{.025}$$
Interval Estimation of σ²

$$\chi^2_{.975} \le \frac{(n - 1)s^2}{\sigma^2} \le \chi^2_{.025}$$

[Figure: chi-square density; 95% of the possible χ² values lie between χ²_.975 and χ²_.025, with an area of .025 in each tail.]
Interval Estimation of σ²

There is a (1 – α) probability of obtaining a χ² value
such that

$$\chi^2_{(1 - \alpha/2)} \le \chi^2 \le \chi^2_{\alpha/2}$$

Substituting (n – 1)s²/σ² for the χ² we get

$$\chi^2_{(1 - \alpha/2)} \le \frac{(n - 1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}$$

Performing algebraic manipulation we get

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{(1 - \alpha/2)}}$$
Interval Estimation of σ²
• Interval Estimate of a Population Variance

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{(1 - \alpha/2)}}$$

where the χ² values are based on a chi-square
distribution with n - 1 degrees of freedom and
where 1 - α is the confidence coefficient.
Interval Estimation of σ
Taking the square root of the upper and lower
limits of the variance interval provides the confidence
interval for the population standard deviation.

$$\sqrt{\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}} \le \sigma \le \sqrt{\frac{(n - 1)s^2}{\chi^2_{(1 - \alpha/2)}}}$$
Interval Estimation of σ 2
Example: Buyer’s Digest (A)
Buyer’s Digest rates thermostats manufactured
for home temperature control. In a recent test,
10 thermostats manufactured by ThermoRite
were selected and placed in a test room that was
maintained at a temperature of 68°F. The
temperature readings of the ten thermostats are
shown on the next slide.
Interval Estimation of σ 2

Example: Buyer’s Digest (A)


We will use the 10 readings below to develop a
95% confidence interval estimate of the population
variance.

Thermostat 1 2 3 4 5 6 7 8 9 10
Temperature 67.4 67.8 68.2 69.3 69.5 67.0 68.1 68.6 67.9 67.2
Interval Estimation of σ²

For n - 1 = 10 - 1 = 9 d.f. and α = .05

Selected Values from the Chi-Square Distribution Table

Degrees of   Area in Upper Tail
Freedom      .99     .975    .95     .90     .10      .05      .025     .01
5            0.554   0.831   1.145   1.610   9.236    11.070   12.832   15.086
6            0.872   1.237   1.635   2.204   10.645   12.592   14.449   16.812
7            1.239   1.690   2.167   2.833   12.017   14.067   16.013   18.475
8            1.647   2.180   2.733   3.490   13.362   15.507   17.535   20.090
9            2.088   2.700   3.325   4.168   14.684   16.919   19.023   21.666
10           2.558   3.247   3.940   4.865   15.987   18.307   20.483   23.209

Our χ²_.975 value: 2.700 (9 d.f., upper-tail area .975)
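Interval Estimation of σ²

For n - 1 = 10 - 1 = 9 d.f. and α = .05

$$2.700 \le \frac{(n - 1)s^2}{\sigma^2} \le \chi^2_{.025}$$

[Figure: chi-square density; the area in the upper tail to the right of 2.700 is .975, leaving .025 in the lower tail.]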
Interval Estimation of σ²

For n - 1 = 10 - 1 = 9 d.f. and α = .05, the 9 d.f. row of the
chi-square distribution table gives our χ²_.025 value: 19.023
(column for upper-tail area .025).
Interval Estimation of σ²

n - 1 = 10 - 1 = 9 degrees of freedom and α = .05

$$2.700 \le \frac{(n - 1)s^2}{\sigma^2} \le 19.023$$

[Figure: chi-square density; area of .025 in the lower tail below 2.700 and area of .025 in the upper tail above 19.023.]
Interval Estimation of σ²
• Sample variance s² provides a point estimate of σ².

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{6.3}{9} = .70$$

A 95% confidence interval for the population variance is
given by:

$$\frac{(10 - 1)(.70)}{19.02} \le \sigma^2 \le \frac{(10 - 1)(.70)}{2.70}$$

.33 < σ² < 2.33
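The interval for the thermostat readings can be recomputed from the raw data. The standard library has no chi-square quantile function, so this sketch copies the two table values (2.700 and 19.023) from the chi-square table for 9 degrees of freedom; variable names are illustrative.

```python
from math import sqrt
from statistics import variance

# ThermoRite temperature readings from the example
readings = [67.4, 67.8, 68.2, 69.3, 69.5, 67.0, 68.1, 68.6, 67.9, 67.2]
n = len(readings)
s2 = variance(readings)  # sample variance (divides by n - 1)

# Chi-square table values for 9 d.f., copied from the slide
chi2_975 = 2.700    # upper-tail area .975
chi2_025 = 19.023   # upper-tail area .025

var_low = (n - 1) * s2 / chi2_025
var_high = (n - 1) * s2 / chi2_975

# Square roots of the limits give the interval for the standard deviation
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(f"s^2 = {s2:.2f}")
print(f"95% CI for variance: {var_low:.2f} to {var_high:.2f}")
print(f"95% CI for std. dev.: {sd_low:.2f} to {sd_high:.2f}")
```

Note that the larger table value goes in the denominator of the lower limit, which is why the interval is not symmetric around s².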


Hypothesis Testing
About a Population Variance
Left-Tailed Test
• Hypotheses

$$H_0: \sigma^2 \ge \sigma_0^2 \qquad H_a: \sigma^2 < \sigma_0^2$$

where σ0² is the hypothesized value
for the population variance

• Test Statistic

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
Hypothesis Testing
About a Population Variance
Left-Tailed Test (continued)
• Rejection Rule
Critical value approach: Reject H0 if χ² ≤ χ²_(1−α)

p-Value approach: Reject H0 if p-value < α

where χ²_(1−α) is based on a chi-square
distribution with n - 1 d.f.
Hypothesis Testing
About a Population Variance
Right-Tailed Test
• Hypotheses

$$H_0: \sigma^2 \le \sigma_0^2 \qquad H_a: \sigma^2 > \sigma_0^2$$

where σ0² is the hypothesized value
for the population variance

• Test Statistic

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
Hypothesis Testing
About a Population Variance
Right-Tailed Test (continued)
• Rejection Rule
Critical value approach: Reject H0 if χ² ≥ χ²_α

p-Value approach: Reject H0 if p-value < α

where χ²_α is based on a chi-square
distribution with n - 1 d.f.
Hypothesis Testing
About a Population Variance
Two-Tailed Test
• Hypotheses

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_a: \sigma^2 \ne \sigma_0^2$$

where σ0² is the hypothesized value
for the population variance

• Test Statistic

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
Hypothesis Testing
About a Population Variance
Two-Tailed Test (continued)
• Rejection Rule
Critical value approach:
Reject H0 if χ² ≤ χ²_(1−α/2) or χ² ≥ χ²_α/2

p-Value approach:
Reject H0 if p-value < α

where χ²_(1−α/2) and χ²_α/2 are based on a
chi-square distribution with n - 1 d.f.
Hypothesis Testing
About a Population Variance
Example: Buyer’s Digest (B)

Recall that Buyer’s Digest is rating
ThermoRite thermostats. Buyer’s Digest gives an
“acceptable” rating to a thermostat with a
temperature variance of 0.5 or less.

We will conduct a hypothesis test (with α = .10)
to determine whether the ThermoRite thermostat’s
temperature variance is “acceptable”.
Hypothesis Testing
About a Population Variance
Example: Buyer’s Digest (B)
Using the 10 readings, we will conduct a
hypothesis test (with α = .10) to determine whether
the ThermoRite thermostat’s temperature variance is
“acceptable”.

Thermostat 1 2 3 4 5 6 7 8 9 10
Temperature 67.4 67.8 68.2 69.3 69.5 67.0 68.1 68.6 67.9 67.2
Hypothesis Testing
About a Population Variance
• Hypotheses

$$H_0: \sigma^2 \le 0.5 \qquad H_a: \sigma^2 > 0.5$$

• Rejection Rule
Reject H0 if χ² > 14.684


Hypothesis Testing
About a Population Variance
For n - 1 = 10 - 1 = 9 d.f. and α = .10

Selected Values from the Chi-Square Distribution Table

Degrees of   Area in Upper Tail
Freedom      .99     .975    .95     .90     .10      .05      .025     .01
5            0.554   0.831   1.145   1.610   9.236    11.070   12.832   15.086
6            0.872   1.237   1.635   2.204   10.645   12.592   14.449   16.812
7            1.239   1.690   2.167   2.833   12.017   14.067   16.013   18.475
8            1.647   2.180   2.733   3.490   13.362   15.507   17.535   20.090
9            2.088   2.700   3.325   4.168   14.684   16.919   19.023   21.666
10           2.558   3.247   3.940   4.865   15.987   18.307   20.483   23.209


Hypothesis Testing
About a Population Variance
Rejection Region

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{9 s^2}{.5}$$

[Figure: chi-square density; reject H0 if χ² falls in the upper tail beyond 14.684 (area in upper tail = .10).]
Hypothesis Testing
About a Population Variance

The sample variance s² = 0.7

• Test Statistic

$$\chi^2 = \frac{9(.7)}{.5} = 12.6$$
Conclusion
Because χ2 = 12.6 is less than 14.684, we cannot
reject H0. The sample variance s2 = .7 is insufficient
evidence to conclude that the temperature variance
for ThermoRite thermostats is unacceptable.
Hypothesis Testing
About a Population Variance

• Using the p-Value
The rejection region for the ThermoRite
thermostat example is in the upper tail; thus, the
appropriate p-value is less than .90 (χ² = 4.168)
and greater than .10 (χ² = 14.684).
Because the p-value > α = .10, we cannot
reject the null hypothesis.
The sample variance of s² = .7 is insufficient
evidence to conclude that the temperature
variance is unacceptable (> .5).

The exact p-value is .18156.
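The exact p-value can be reproduced without a statistics package. The sketch below is one possible approach, not necessarily how the slide's figure was obtained: for an odd number of degrees of freedom the chi-square upper-tail area follows from the complementary error function plus a finite sum, via the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(−y) / Γ(a + 1) for the regularized upper incomplete gamma function. The function name is hypothetical.

```python
from math import erfc, exp, gamma, sqrt
from statistics import variance


def chi2_upper_tail_odd_df(x, df):
    """Upper-tail area P(chi-square with odd `df` > x)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd degrees of freedom")
    y = x / 2.0
    tail = erfc(sqrt(y))  # df = 1 case, i.e. Q(1/2, y)
    a = 0.5
    while a < df / 2.0:   # step a = 1/2, 3/2, ... up to df/2
        tail += y**a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return tail


# Buyer's Digest (B): H0: sigma^2 <= 0.5 versus Ha: sigma^2 > 0.5
readings = [67.4, 67.8, 68.2, 69.3, 69.5, 67.0, 68.1, 68.6, 67.9, 67.2]
n, sigma0_sq, alpha = len(readings), 0.5, 0.10

chi2_stat = (n - 1) * variance(readings) / sigma0_sq
p_value = chi2_upper_tail_odd_df(chi2_stat, n - 1)

reject = p_value < alpha
print(f"chi-square = {chi2_stat:.1f}, p-value = {p_value:.5f}")
print("reject H0" if reject else "cannot reject H0")
```

As a sanity check, the same function applied to the table value 14.684 with 9 d.f. returns an upper-tail area of about .10.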
Inferences About Two Population Variances

We may want to compare the variances in:
• product quality resulting from two different
production processes,
• temperatures for two heating devices, or
• assembly times for two assembly methods.
We use data collected from two independent random
samples, one from population 1 and another from
population 2.

The two sample variances will be the basis for making
inferences about the two population variances.
Hypothesis Testing About the
Variances of Two Populations
One-Tailed Test
• Hypotheses

$$H_0: \sigma_1^2 \le \sigma_2^2 \qquad H_a: \sigma_1^2 > \sigma_2^2$$

Denote the population providing the
larger sample variance as population 1.

• Test Statistic

$$F = \frac{s_1^2}{s_2^2}$$
Hypothesis Testing About the
Variances of Two Populations
One-Tailed Test (continued)
• Rejection Rule
Critical value approach: Reject H0 if F > Fα

where the value of Fα is based on an
F distribution with n1 - 1 (numerator)
and n2 - 1 (denominator) d.f.

p-Value approach: Reject H0 if p-value < α


Hypothesis Testing About the
Variances of Two Populations
Two-Tailed Test
• Hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \ne \sigma_2^2$$

Denote the population providing the
larger sample variance as population 1.

• Test Statistic

$$F = \frac{s_1^2}{s_2^2}$$
Hypothesis Testing About the
Variances of Two Populations
Two-Tailed Test (continued)
• Rejection Rule
Critical value approach: Reject H0 if F > Fα/2

where the value of Fα/2 is based on an
F distribution with n1 - 1 (numerator)
and n2 - 1 (denominator) d.f.

p-Value approach: Reject H0 if p-value < α


Hypothesis Testing About the
Variances of Two Populations
Example: Buyer’s Digest (C)
Buyer’s Digest has conducted the same test, as
was described earlier, on another 10 thermostats,
this time manufactured by TempKing. The
temperature readings of the ten thermostats are
listed on the next slide.
We will conduct a hypothesis test with α = .10 to see
if the variances are equal for ThermoRite’s thermostats
and TempKing’s thermostats.
Hypothesis Testing About the
Variances of Two Populations
Example: Buyer’s Digest (C)
ThermoRite Sample

Thermostat 1 2 3 4 5 6 7 8 9 10
Temperature 67.4 67.8 68.2 69.3 69.5 67.0 68.1 68.6 67.9 67.2

TempKing Sample

Thermostat 1 2 3 4 5 6 7 8 9 10
Temperature 67.7 66.4 69.2 70.1 69.5 69.7 68.1 66.6 67.3 67.5
Hypothesis Testing About the
Variances of Two Populations
• Hypotheses
H0: σ1² = σ2² (TempKing and ThermoRite thermostats
have the same temperature variance)
Ha: σ1² ≠ σ2² (Their variances are not equal)

• Rejection Rule
The F distribution table (on next slide) shows that
with α = .10, 9 d.f. (numerator), and 9 d.f. (denominator),
F_.05 = 3.18.
Reject H0 if F > 3.18
Hypothesis Testing About the
Variances of Two Populations
Selected Values from the F Distribution Table

Denominator   Area in      Numerator Degrees of Freedom
Degrees of    Upper
Freedom       Tail         7      8      9      10     15
8             .10          2.62   2.59   2.56   2.54   2.46
              .05          3.50   3.44   3.39   3.35   3.22
              .025         4.53   4.43   4.36   4.30   4.10
              .01          6.18   6.03   5.91   5.81   5.52

9             .10          2.51   2.47   2.44   2.42   2.34
              .05          3.29   3.23   3.18   3.14   3.01
              .025         4.20   4.10   4.03   3.96   3.77
              .01          5.61   5.47   5.35   5.26   4.96
Hypothesis Testing About the
Variances of Two Populations

TempKing’s sample variance is 1.768
ThermoRite’s sample variance is .700

• Test Statistic

$$F = \frac{s_1^2}{s_2^2} = \frac{1.768}{.700} = 2.53$$

• Conclusion
We cannot reject H0. F = 2.53 < F_.05 = 3.18.
There is insufficient evidence to conclude that
the population variances differ for the two
thermostat brands.
Hypothesis Testing About the
Variances of Two Populations
• Determining and Using the p-Value

Area in Upper Tail            .10    .05    .025   .01
F Value (df1 = 9, df2 = 9)    2.44   3.18   4.03   5.35

Because F = 2.53 is between 2.44 and 3.18, the area
in the upper tail of the distribution is between .10
and .05.
But this is a two-tailed test; after doubling the
upper-tail area, the p-value is between .20 and .10.
Because α = .10, we have p-value > α and therefore
we cannot reject the null hypothesis.
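The F statistic for the two thermostat brands can be recomputed from the raw readings. This sketch copies the critical value F_.05 = 3.18 (9 and 9 d.f.) from the table, since the standard library has no F distribution; variable names are illustrative.

```python
from statistics import variance

# Temperature readings from the two samples in the example
thermorite = [67.4, 67.8, 68.2, 69.3, 69.5, 67.0, 68.1, 68.6, 67.9, 67.2]
tempking = [67.7, 66.4, 69.2, 70.1, 69.5, 69.7, 68.1, 66.6, 67.3, 67.5]

var_thermorite = variance(thermorite)
var_tempking = variance(tempking)

# Population 1 is the one with the larger sample variance, so F >= 1
f_stat = max(var_tempking, var_thermorite) / min(var_tempking, var_thermorite)

f_crit = 3.18  # F_.05 with 9 and 9 d.f., from the table (alpha = .10, two-tailed)
reject = f_stat > f_crit

print(f"TempKing s^2 = {var_tempking:.3f}, ThermoRite s^2 = {var_thermorite:.3f}")
print(f"F = {f_stat:.2f}:", "reject H0" if reject else "cannot reject H0")
```

Putting the larger sample variance in the numerator is what allows a two-tailed test to be run against a single upper-tail critical value F_{α/2}.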
References/Sources
• Ken Black
• Anderson Sweeney Williams
• Levin Rubin
