Slides Prepared by

JOHN S. LOUCKS
St. Edward’s University

© 2003 South-Western /Thomson Learning™


Slide 1
Chapter 12
Tests of Goodness of Fit and Independence
• Goodness of Fit Test: A Multinomial Population
• Test of Independence
• Goodness of Fit Test: Poisson and Normal Distributions

Slide 2
Goodness of Fit Test: A Multinomial Population

1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed
frequency, fi, for each of the k categories.
3. Assuming H0 is true, compute the expected
frequency, ei, in each category by multiplying the
category probability by the sample size.
continued

Slide 3
Goodness of Fit Test: A Multinomial Population

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Rejection rule:
Using test statistic: Reject H0 if $\chi^2 > \chi^2_\alpha$
Using p-value: Reject H0 if p-value < α
(where α is the significance level and
there are k - 1 degrees of freedom)
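As an illustration added here (not part of the original slides), steps 3 to 5 can be sketched in Python. The helper names `chi2_sf` and `multinomial_gof` are hypothetical. `chi2_sf` assumes integer degrees of freedom and computes the upper-tail chi-square probability (the quantity Excel's `CHIDIST` returns) from a standard closed-form recurrence, so only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(nu + 2) = Q(nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) for odd df or Q(0) = 0 for even df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu, term = 0.0, 0, math.exp(-x / 2)
    else:
        q, nu, term = math.erfc(math.sqrt(x / 2)), 1, math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    while nu < df:
        q += term          # add the recurrence term for the current nu
        nu += 2
        term *= x / nu     # term(nu) = term(nu - 2) * x / nu
    return q


def multinomial_gof(observed, probs):
    """Steps 3-5: expected frequencies, chi-square statistic, k - 1 d.f., p-value."""
    n = sum(observed)
    expected = [p * n for p in probs]          # e_i = p_i * sample size
    chi_sq = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df, chi2_sf(chi_sq, df)


# Finger Lakes Homes (A): Colonial, Ranch, Split-Level, A-Frame
chi_sq, df, p_value = multinomial_gof([30, 20, 35, 15], [0.25] * 4)
print(chi_sq, df, round(p_value, 4))  # 10.0 3 0.0186
```

Applied to the Finger Lakes Homes (A) counts, the sketch gives χ² = 10 with 3 degrees of freedom and a p-value of about .0186, matching the Excel value worksheet for this example.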

Slide 4
Example: Finger Lakes Homes (A)

Finger Lakes Homes manufactures four models of
prefabricated homes: a two-story colonial, a ranch, a
split-level, and an A-frame. To help in production
planning, management would like to determine whether
previous customer purchases indicate that there is a
preference in the style selected.
The number of homes sold of each model for 100
sales over the past two years is shown below.
Model Colonial Ranch Split-Level A-Frame
# Sold 30 20 35 15

Slide 5
Example: Finger Lakes Homes (A)

▪ Notation
pC = popul. proportion that purchase a colonial
pR = popul. proportion that purchase a ranch
pS = popul. proportion that purchase a split-level
pA = popul. proportion that purchase an A-frame
▪ Hypotheses
H0: pC = pR = pS = pA = .25
Ha: The population proportions are not
pC = .25, pR = .25, pS = .25, and pA = .25

Slide 6
Example: Finger Lakes Homes (A)

▪ Expected Frequencies
e1 = .25(100) = 25   e2 = .25(100) = 25
e3 = .25(100) = 25   e4 = .25(100) = 25
▪ Test Statistic

$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(35-25)^2}{25} + \frac{(15-25)^2}{25} = 1 + 1 + 4 + 4 = 10$$

Slide 7
Example: Finger Lakes Homes (A)

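▪ Rejection Rule
With α = .05 and
k - 1 = 4 - 1 = 3
degrees of freedom, $\chi^2_{.05} = 7.81$.

[Figure: chi-square distribution; do not reject H0 if $\chi^2 \le 7.81$, reject H0 if $\chi^2 > 7.81$]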

Slide 8
Example: Finger Lakes Homes (A)

▪ Conclusion
$\chi^2 = 10 > 7.81$, so we reject the assumption that there
is no home style preference, at the .05 level of
significance.

Slide 9
Using Excel to Conduct a Goodness of Fit Test

▪ Worksheet (showing data)


A B C D E F G
1 House Style
2 1 Col
3 2 Ran
4 3 Ran
5 4 Afr
6 5 Col
7 6 Spl
8 7 Afr
9 8 Col
10 9 Afr
11 10 Ran
12 11 Spl
Note: Rows 13-101 are not shown.

Slide 10
Using Excel to Conduct a Goodness of Fit Test

▪ Formula Worksheet
C D E F G H I
1 Hyp. Observed Expect. Sq'd. Sq.Diff./
2 Categ. Prop. Frequency Freq. Diff. Diff. Exp.Freq.
3 Col. 0.25 =COUNTIF(B2:B101,"Col") =D3*$E$7 =E3-F3 =G3^2 =H3/F3
4 Ranch 0.25 =COUNTIF(B2:B101,"Ran") =D4*$E$7 =E4-F4 =G4^2 =H4/F4
5 Split-L 0.25 =COUNTIF(B2:B101,"Spl") =D5*$E$7 =E5-F5 =G5^2 =H5/F5
6 A-Fr. 0.25 =COUNTIF(B2:B101,"Afr") =D6*$E$7 =E6-F6 =G6^2 =H6/F6
7 Total =SUM(E3:E6) =SUM(I3:I6)
8
9 Categories 4
10 Test Statistic =I7
11 Degr. of Free. =E9-1
12 p-Value =CHIDIST(E10,E11)
Note: Columns A-B and rows 13-101 are not shown.

Slide 11
Using Excel to Conduct a Goodness of Fit Test

▪ Value Worksheet
C D E F G H I
1 Hyp. Observed Expect. Sq'd. Sq.Diff./
2 Categ. Prop. Frequency Freq. Diff. Diff. Exp.Freq.
3 Col. 0.25 30 25 5 25 1
4 Ranch 0.25 20 25 -5 25 1
5 Split-L 0.25 35 25 10 100 4
6 A-Fr. 0.25 15 25 -10 100 4
7 Total 100 10
8
9 Categories 4
10 Test Statistic 10
11 Degr. of Free. 3
12 p-Value 0.0186
Note: Columns A-B and rows 13-101 are not shown.

Slide 12
Using Excel to Conduct a Goodness of Fit Test

▪ Using the p-Value
• The value worksheet shows that the resulting p-value is 0.0186.
• The rejection rule is “Reject H0 if p-value < α.”
• Because .0186 < .05, we reject H0 and conclude that
the population proportions are not all equal to .25
(we reject the assumption that there is no home style
preference).

Slide 13
Test of Independence: Contingency Tables

1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed
frequency, fij, for each cell of the contingency table.
3. Compute the expected frequency, eij, for each cell.

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

Slide 14
Test of Independence: Contingency Tables

4. Compute the test statistic.

$$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

5. Reject H0 if $\chi^2 > \chi^2_\alpha$, where α is the significance level
and, with n rows and m columns, there are (n - 1)(m - 1)
degrees of freedom.
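A minimal Python sketch of steps 3 to 5 for a contingency table follows; it is an illustration added here, and `independence_test` is a hypothetical helper name. It returns the expected frequencies, the χ² statistic, and the degrees of freedom; the critical value χ²α is still read from a chi-square table.

```python
def independence_test(table):
    """Return expected counts, the chi-square statistic, and (rows - 1)(cols - 1) d.f."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Step 3: e_ij = (row i total)(column j total) / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Step 4: sum of (f_ij - e_ij)^2 / e_ij over every cell
    chi_sq = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    # Step 5: degrees of freedom for n rows and m columns
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_sq, df


# Finger Lakes Homes (B); columns are Colonial, Ranch, Split-Level, A-Frame
observed = [
    [18, 6, 19, 12],   # lower price category
    [12, 14, 16, 3],   # higher price category
]
expected, chi_sq, df = independence_test(observed)
print(expected)              # [[16.5, 11.0, 19.25, 8.25], [13.5, 9.0, 15.75, 6.75]]
print(round(chi_sq, 4), df)  # 9.1486 3
```

With the Finger Lakes Homes (B) counts, the sketch gives χ² ≈ 9.1486 with 3 degrees of freedom, which exceeds the critical value 7.81 used in that example.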

Slide 15
Example: Finger Lakes Homes (B)

Each home sold can be classified according to
price and to style. Finger Lakes Homes’ manager
would like to determine if the price of the home and
the style of the home are independent variables.
The number of homes sold for each model and
price for the past two years is shown below. For
convenience, the price of the home is listed as either
$99,000 or less or more than $99,000.
Price Colonial Ranch Split-Level A-Frame
≤ $99,000 18 6 19 12
> $99,000 12 14 16 3

Slide 16
Example: Finger Lakes Homes (B)

▪ Hypotheses
H0: Price of the home is independent of the style
of the home that is purchased
Ha: Price of the home is not independent of the
style of the home that is purchased
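▪ Expected Frequencies
Observed frequencies with row and column totals:
Price Colonial Ranch Split-Level A-Frame Total
≤ $99K 18 6 19 12 55
> $99K 12 14 16 3 45
Total 30 20 35 15 100
Expected frequencies, eij = (Row i Total)(Column j Total)/100:
Price Colonial Ranch Split-Level A-Frame Total
≤ $99K 16.50 11.00 19.25 8.25 55
> $99K 13.50 9.00 15.75 6.75 45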

Slide 17
Example: Finger Lakes Homes (B)

▪ Test Statistic

$$\chi^2 = \frac{(18-16.5)^2}{16.5} + \frac{(6-11)^2}{11} + \cdots + \frac{(3-6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.1486$$

▪ Rejection Rule
With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.81$.
Reject H0 if $\chi^2 > 7.81$.
▪ Conclusion
Because $\chi^2 = 9.1486 > 7.81$, we reject H0, the assumption that the price of the
home is independent of the style of the home
that is purchased.

Slide 18
Using Excel to Conduct a Test of Independence

▪ Worksheet (showing data entered)


A B C D E
1 Home Price ($) Style
2 1 >99K Colonial
3 2 <=99K Ranch
4 3 >99K Ranch
5 4 <=99K A-Frame
6 5 <=99K Colonial
7 6 <=99K Split-Level
8 7 >99K A-Frame
9 8 >99K Colonial
10 9 <=99K A-Frame
Note: Rows 11-101 are not shown.
Slide 19
Using Excel to Conduct a Test of Independence

▪ Worksheet (showing Pivot Table)


D E F G H I J
1 Count of Home Style
2 Price ($) Colonial Ranch Split-Lev. A-Frame Grand Tot.
3 <=99K 18 6 19 12 55
4 >99K 12 14 16 3 45
5 Grand Total 30 20 35 15 100
6
7
8
Note: Columns A-C (sample data) are not shown.

Slide 20
Using Excel to Conduct a Test of Independence

▪ Formula Worksheet
D E F G H I J
1 Count of Home Style
2 Price ($) Colonial Ranch Split-Lev. A-Frame Grand Tot.
3 <=99K 18 6 19 12 55
4 >99K 12 14 16 3 45
5 Grand Total 30 20 35 15 100
6
7 Expected Frequencies
8 Colonial Ranch Split-Lev. A-Frame
9 <=99K =F5*J3/J5 =G5*J3/J5 =H5*J3/J5 =I5*J3/J5
10 >99K =F5*J4/J5 =G5*J4/J5 =H5*J4/J5 =I5*J4/J5
11
12 Chi-Sq. =CHIINV(G13,3)
13 p-Value =CHITEST(F3:I4,F9:I10)
Note: Columns A-C (sample data) are not shown.
Slide 21
Using Excel to Conduct a Test of Independence

▪ Value Worksheet
D E F G H I J
1 Count of Home Style
2 Price ($) Colonial Ranch Split-Lev. A-Frame Grand Tot.
3 <=99K 18 6 19 12 55
4 >99K 12 14 16 3 45
5 Grand Total 30 20 35 15 100
6
7 Expected Frequencies
8 Colonial Ranch Split-Lev. A-Frame
9 <=99K 16.50 11.00 19.25 8.25
10 >99K 13.50 9.00 15.75 6.75
11
12 Chi-Sq. 9.1486
13 p-Value 0.0274
Note: Columns A-C (sample data) are not shown.
Slide 22
Goodness of Fit Test: Poisson Distribution

1. Set up the null and alternative hypotheses.
H0: Population has a Poisson probability distribution.
Ha: Population does not have a Poisson probab. distrib.
2. Select a random sample and
a. Record the observed frequency fi for each value of
the Poisson random variable.
b. Compute the mean number of occurrences, μ.
3. Compute the expected frequency of occurrences ei for
each value of the Poisson random variable.
continued

Slide 23
Goodness of Fit Test: Poisson Distribution

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Rejection rule:
Using test statistic: Reject H0 if $\chi^2 > \chi^2_\alpha$
Using p-value: Reject H0 if p-value < α
(where α is the significance level and
there are k - 2 degrees of freedom).

Slide 24
Example: Troy Parking Garage

▪ Poisson Distribution Goodness of Fit Test
In studying the need for an additional entrance to a
city parking garage, a consultant has recommended an
approach that is applicable only in situations where
the number of cars entering during a specified time
period follows a Poisson distribution.

Slide 25
Example: Troy Parking Garage

A random sample of 100 one-minute time intervals
resulted in the customer arrivals listed below. A
statistical test must be conducted to see if the
assumption of a Poisson distribution is reasonable.

# Arrivals 0 1 2 3 4 5 6 7 8 9 10 11 12
Frequency 0 1 4 10 14 20 12 12 9 8 6 3 1

Slide 26
Example: Troy Parking Garage

▪ Hypotheses
H0: Number of cars entering the garage during
a one-minute interval is Poisson distributed.
Ha: Number of cars entering the garage during a
one-minute interval is not Poisson distributed.

Slide 27
Example: Troy Parking Garage

▪ Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of μ = 600/100 = 6
Hence,

$$f(x) = \frac{6^x e^{-6}}{x!}$$

Slide 28
Example: Troy Parking Garage

▪ Expected Frequencies

x f(x) 100f(x) x f(x) 100f(x)
0 .0025 .25 7 .1377 13.77
1 .0149 1.49 8 .1033 10.33
2 .0446 4.46 9 .0688 6.88
3 .0892 8.92 10 .0413 4.13
4 .1339 13.39 11 .0225 2.25
5 .1606 16.06 12 or more .0201 2.01
6 .1606 16.06 Total 1.0000 100.00
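The estimate of μ and the probabilities in the table above can be reproduced with the short Python sketch below (an illustration, not from the slides; `poisson_pmf` is a hypothetical helper name). The last category collects the remaining upper-tail probability P(X ≥ 12) so that the probabilities sum to 1.

```python
import math

# Troy Parking Garage sample: number of one-minute intervals with x arrivals
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]      # x = 0, 1, ..., 12
n = sum(frequency)                                            # 100 intervals
total_arrivals = sum(x * f for x, f in enumerate(frequency))  # 600 cars
mu = total_arrivals / n                                       # estimate of mu


def poisson_pmf(x: int, mu: float) -> float:
    """f(x) = mu^x e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


probs = {str(x): poisson_pmf(x, mu) for x in range(12)}
# "12 or more" collects the remaining upper-tail probability
probs["12 or more"] = 1 - sum(probs.values())

for label, p in probs.items():
    print(label, round(p, 4), round(n * p, 2))   # x, f(x), expected frequency 100 f(x)
```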

Slide 29
Example: Troy Parking Garage

▪ Observed and Expected Frequencies

i fi ei fi - ei
0 or 1 or 2 5 6.20 -1.20
3 10 8.92 1.08
4 14 13.39 0.61
5 20 16.06 3.94
6 12 16.06 -4.06
7 12 13.77 -1.77
8 9 10.33 -1.33
9 8 6.88 1.12
10 or more 10 8.38 1.62

Slide 30
Example: Troy Parking Garage

▪ Test Statistic

$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.62)^2}{8.38} = 3.268$$

▪ Rejection Rule
With α = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f. (where
k = number of categories and p = number of
population parameters estimated), $\chi^2_{.05} = 14.07$.
Reject H0 if $\chi^2 > 14.07$.
▪ Conclusion
We cannot reject H0. There’s no reason to doubt
the assumption of a Poisson distribution.
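A sketch of the grouped test follows, assuming the same categories as the slides (0 to 2 combined, 3 through 9, and 10 or more combined) and μ = 6 estimated from the sample. The helper `poisson_pmf` is a hypothetical name, and only the standard library is used.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """f(x) = mu^x e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu, n = 6.0, 100
# Categories used on the slides: "0 or 1 or 2", 3, 4, ..., 9, "10 or more"
observed = [5, 10, 14, 20, 12, 12, 9, 8, 10]
probs = [sum(poisson_pmf(x, mu) for x in range(3))]     # P(X <= 2)
probs += [poisson_pmf(x, mu) for x in range(3, 10)]     # P(X = 3), ..., P(X = 9)
probs.append(1 - sum(probs))                            # P(X >= 10)
expected = [n * p for p in probs]

chi_sq = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
k = len(observed)
df = k - 1 - 1   # k - p - 1, with p = 1 parameter (mu) estimated from the sample
print(round(chi_sq, 3), df)   # 3.268 7
```

The result, χ² ≈ 3.268 with 7 degrees of freedom, is well below the critical value 14.07, in line with the conclusion above.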

Slide 31
Using Excel to Conduct a
Poisson Distribution Goodness of Fit Test
▪ Formula Worksheet
A B C D E F
1 Number Observed Expected Sq'd. Sq.Diff./
2 of Arriv. Frequency Frequency Differ. Differ. Exp.Freq.
3 0, 1, or 2 5 =POISSON(2,6,TRUE)*B12 =B3-C3 =D3^2 =E3/C3
4 3 10 =POISSON(3,6,FALSE)*B12 =B4-C4 =D4^2 =E4/C4
5 4 14 =POISSON(4,6,FALSE)*B12 =B5-C5 =D5^2 =E5/C5
6 5 20 =POISSON(5,6,FALSE)*B12 =B6-C6 =D6^2 =E6/C6
7 6 12 =POISSON(6,6,FALSE)*B12 =B7-C7 =D7^2 =E7/C7
8 7 12 =POISSON(7,6,FALSE)*B12 =B8-C8 =D8^2 =E8/C8
9 8 9 =POISSON(8,6,FALSE)*B12 =B9-C9 =D9^2 =E9/C9
10 9 8 =POISSON(9,6,FALSE)*B12 =B10-C10 =D10^2 =E10/C10
11 10 or more 10 =(1-POISSON(9,6,TRUE))*B12 =B11-C11 =D11^2 =E11/C11
12 Total =SUM(B3:B11) =SUM(F3:F11)
13 Categories 9
14 Test Statistic =F12
15 Degrees of Freedom =C13-2
16 p-Value =CHIDIST(C14,C15)

Slide 32
Using Excel to Conduct a
Poisson Distribution Goodness of Fit Test
▪ Value Worksheet
A B C D E F
1 Number Observed Expected Sq'd. Sq.Diff./
2 of Arriv. Frequency Frequency Differ. Differ. Exp.Freq.
3 0, 1, or 2 5 6.197 -1.197 1.433 0.2312
4 3 10 8.924 1.076 1.159 0.1299
5 4 14 13.385 0.615 0.378 0.0282
6 5 20 16.062 3.938 15.505 0.9653
7 6 12 16.062 -4.062 16.502 1.0274
8 7 12 13.768 -1.768 3.125 0.2270
9 8 9 10.326 -1.326 1.758 0.1702
10 9 8 6.884 1.116 1.246 0.1810
11 10 or more 10 8.392 1.608 2.584 0.3079
12 Total 100 3.2681
13 Categories 9
14 Test Statistic 3.2681
15 Degrees of Freedom 7
16 p-Value 0.8591

Slide 33
Using Excel to Conduct a
Poisson Distribution Goodness of Fit Test
▪ Using the p-Value
• The value worksheet shows a p-value of .8591.
• The rejection rule is “Reject H0 if p-value < α.”
• With .8591 > α = .05, we cannot reject the null
hypothesis that the number of cars entering the
garage during a one-minute interval is Poisson
distributed.

Slide 34
Goodness of Fit Test: Normal Distribution

1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Compute the mean and standard deviation.
b. Define intervals of values so that the expected
frequency is at least 5 for each interval.
c. For each interval, record the observed frequencies.
3. Compute the expected frequency, ei, for each interval.
continued

Slide 35
Goodness of Fit Test: Normal Distribution

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject H0 if $\chi^2 > \chi^2_\alpha$ (where α is the significance level
and there are k - 3 degrees of freedom).

Slide 36
Example: Victor Computers

▪ Normal Distribution Goodness of Fit Test
Victor Computers manufactures and sells a
general purpose microcomputer. As part of a study
to evaluate sales personnel, management wants to
determine if the annual sales volume (number of
units sold by a salesperson) follows a normal
probability distribution.

Slide 37
Example: Victor Computers

A simple random sample of 30 of the salespeople
was taken, and the numbers of units they sold are listed below.

33 43 44 45 52 52 56 58 63 64
64 65 66 68 70 72 73 73 74 75
83 84 85 86 91 92 94 98 102 105

(mean = 71, standard deviation = 18.54)
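The mean and standard deviation reported above can be checked with Python's standard `statistics` module. The slides do not say which divisor was used, so this sketch assumes the sample standard deviation (divisor n - 1), which reproduces 18.54.

```python
import statistics

# Units sold by the 30 sampled salespeople (values from the slide)
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
mean = statistics.mean(units_sold)
sd = statistics.stdev(units_sold)   # sample standard deviation (n - 1 divisor)
print(mean, round(sd, 2))
```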

Slide 38
Example: Victor Computers

▪ Hypotheses
H0: The population of number of units sold
has a normal distribution with mean 71
and standard deviation 18.54.
Ha: The population of number of units sold
does not have a normal distribution with
mean 71 and standard deviation 18.54.

Slide 39
Example: Victor Computers

▪ Interval Definition
To satisfy the requirement of an expected
frequency of at least 5 in each interval we will
divide the normal distribution into 30/5 = 6
equal probability intervals.
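The interval boundaries can be computed with `statistics.NormalDist.inv_cdf` from Python's standard library. This sketch rounds each z value to two decimals to mimic a printed standard normal table; that is an assumption about how the slide's values were obtained, but it reproduces them.

```python
from statistics import NormalDist

mean, sd = 71, 18.54
k = 6                      # 30 / 5 = 6 equal-probability intervals
std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

boundaries = []
for i in range(1, k):
    # z value with cumulative area i/k, rounded to two decimals like a printed z table
    z = round(std_normal.inv_cdf(i / k), 2)
    boundaries.append(round(mean + z * sd, 2))

print(boundaries)   # [53.02, 63.03, 71.0, 78.97, 88.98]
```

Without rounding z, the boundaries shift slightly (the top one becomes about 88.94), but none of the 30 sample values falls between the two versions, so the observed counts would be the same.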

Slide 40
Example: Victor Computers

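▪ Interval Definition

[Figure: normal curve with mean 71 divided into six intervals, each with area 1.00/6 = .1667; boundaries at 53.02, 63.03, 71, 78.97, and 88.98, where, for example, 88.98 = 71 + .97(18.54)]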

Slide 41
Example: Victor Computers

▪ Observed and Expected Frequencies

i fi ei (fi - ei)²/ei
Less than 53.02 6 5 0.2
53.02 to 63.03 3 5 0.8
63.03 to 71.00 6 5 0.2
71.00 to 78.97 5 5 0
78.97 to 88.98 4 5 0.2
More than 88.98 6 5 0.2
Total 30 30 1.6 = χ²

Slide 42
Example: Victor Computers

▪ Test Statistic

$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.60$$

▪ Rejection Rule
With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.,
$\chi^2_{.05} = 7.81$. Reject H0 if $\chi^2 > 7.81$.
▪ Conclusion
We cannot reject H0. There is little evidence to
support rejecting the assumption that the population
is normally distributed with μ = 71 and σ = 18.54.
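The observed counts and the test statistic can be checked with a short Python sketch. It uses the interval boundaries from the slides and `bisect.bisect_right` to place each value; treating a value that equals a boundary as belonging to the upper interval is an assumption, since the slides do not say (no sample value falls exactly on a boundary).

```python
from bisect import bisect_right

units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
# Interval boundaries from the slides (six equal-probability intervals)
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]

observed = [0] * (len(boundaries) + 1)
for value in units_sold:
    # index of the interval containing value (a tie goes to the upper interval)
    observed[bisect_right(boundaries, value)] += 1

k = len(observed)
expected = [len(units_sold) / k] * k     # 30 / 6 = 5 per interval
chi_sq = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = k - 2 - 1   # k - p - 1, with p = 2 parameters (mean and sd) estimated
print(observed, round(chi_sq, 2), df)   # [6, 3, 6, 5, 4, 6] 1.6 3
```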

Slide 43
Using Excel to Conduct a
Normal Distribution Goodness of Fit Test
▪ Formula Worksheet
A B C D E F
1 Number of Observed Expected Sq'd. Sq.Differ./
2 Units Sold Frequency Frequency Differ. Differ. Exp.Freq.
3 Less than 53.02 6 5 =B3-C3 =D3^2 =E3/C3
4 53.02 to 63.03 3 5 =B4-C4 =D4^2 =E4/C4
5 63.03 to 71.00 6 5 =B5-C5 =D5^2 =E5/C5
6 71.00 to 78.97 5 5 =B6-C6 =D6^2 =E6/C6
7 78.97 to 88.98 4 5 =B7-C7 =D7^2 =E7/C7
8 88.98 and Over 6 5 =B8-C8 =D8^2 =E8/C8
9 Total =SUM(B3:B8) =SUM(F3:F8)
10 Categories 6
11 Test Statistic =F9
12 Degrees of Freedom =C10-3
13 p-Value =CHIDIST(C11,C12)

Slide 44
Using Excel to Conduct a
Normal Distribution Goodness of Fit Test
▪ Value Worksheet
A B C D E F
1 Number of Observed Expected Sq'd. Sq.Differ./
2 Units Sold Frequency Frequency Differ. Differ. Exp.Freq.
3 Less than 53.02 6 5 1 1 0.2
4 53.02 to 63.03 3 5 -2 4 0.8
5 63.03 to 71.00 6 5 1 1 0.2
6 71.00 to 78.97 5 5 0 0 0.0
7 78.97 to 88.98 4 5 -1 1 0.2
8 88.98 and Over 6 5 1 1 0.2
9 Total 30 1.6
10 Categories 6
11 Test Statistic 1.600
12 Degrees of Freedom 3
13 p-Value 0.6594

Slide 45
Using Excel to Conduct a
Normal Distribution Goodness of Fit Test
▪ Using the p-Value
• The value worksheet shows a p-value of .6594.
• The rejection rule is “Reject H0 if p-value < α.”
• With .6594 > α = .05, we cannot reject the
assumption that the number of units sold by a
salesperson follows a normal distribution.

Slide 46
End of Chapter 12
