$O_i = E_i$
The $\chi^2$ distribution is essentially a continuous distribution; however, its
character of continuity is maintained only when the individual frequencies of
the variate values remain greater than or equal to 5. So, in applying the
$\chi^2$ test for goodness of fit or for the dependency of variables
in a contingency table, the cell frequency should not be less than 5. In
practical problems we can combine a few values of small frequencies into
one to get a pooled frequency of at least 5.
Key Statistic
The results of Chi-Square test cannot be accurate if the cell frequencies
in a contingency table are less than 5.
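The pooling rule can be sketched in Python. This is only an illustrative sketch: the function name `pool_small_cells`, the choice to merge adjacent categories from left to right, the decision to apply the threshold to the expected frequencies, and the sample numbers are all assumptions made here, not part of the text.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent categories until every pooled expected frequency is >= minimum."""
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            # The running cell is large enough, so close it.
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_exp > 0:
        # A small leftover tail is merged into the last closed cell.
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Hypothetical frequencies: the last two categories are too small on their own.
obs, exp = pool_small_cells([12, 20, 9, 3, 1], [10, 22, 8, 3.5, 1.5])
print(obs, exp)
```

Note that pooling reduces the number of categories, so the degrees of freedom must be counted on the pooled table.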
10.2.5 Practical applications of Chi-Square test
In inferential statistics, the Chi-Square test can also be applied to
discrete distributions. In using the Chi-Square test, we need no assumptions
regarding the shape of the sampling distribution. The applications of the
Chi-Square test include testing:
Statistics for Management Unit 10
Sikkim Manipal University Page No. 398
- the significance of sample variances
- the goodness of fit of a theoretical distribution
- the independence in a contingency table, or whether the observed results
are consistent with the expected segregations in breeding experiments
in genetics
The first is a parametric test and the other two are non-parametric tests.
10.2.6 Uses of Chi-Square test
The $\chi^2$ test is used broadly to:
- Test goodness of fit for one way classification or for one variable only
- Test independence or interaction for more than one row or column in the
form of a contingency table concerning several attributes
- Test population variance $\sigma^2$ through confidence intervals suggested by
the $\chi^2$ test
10.2.7 Degrees of freedom
The number of degrees of freedom for n observations is n - k and is usually
denoted by $\nu$, where k is the number of independent linear constraints
imposed upon them.
Example 1
For example, if we are asked to write any four numbers, we are free to choose
all four. If a restriction is imposed that the sum of these numbers should
be 50, then the freedom of choice is reduced to three numbers only, and so
the degrees of freedom are now 3.
If $\chi^2$ is defined as the sum of the squares of n independent standardized
normal variates, and k linear constraints are imposed upon them (for example,
through the estimation of some population parameter from the sample), then
the degrees of freedom n are replaced by n - k. If the sum of squares is taken
about the sample mean instead of the population mean, then n is replaced by
$n - 1 = \nu$. This is because one linear constraint has been imposed.
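The counting argument can be illustrated with a few lines of Python. This is a minimal sketch; the helper name `degrees_of_freedom` and the sample values are assumptions chosen only for illustration.

```python
def degrees_of_freedom(n_observations, n_constraints):
    """Return n - k, the number of values that can still be chosen freely."""
    return n_observations - n_constraints


# Example 1: four numbers whose sum must be 50.
free_choices = [10, 15, 5]              # any three numbers may be chosen
forced_value = 50 - sum(free_choices)   # the fourth is fixed by the constraint

# Deviations from the sample mean always add up to zero,
# which is the single linear constraint behind n - 1.
sample = [4, 8, 6, 10]                  # hypothetical observations
mean = sum(sample) / len(sample)
deviations = [x - mean for x in sample]

print(degrees_of_freedom(4, 1))
print(forced_value)
print(sum(deviations))
```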
Key Statistic
The Chi-Square distribution has only one parameter, that is, the degrees
of freedom.
10.2.8 Levels of significance
Tables have been prepared for the values of P, where P is the probability of
getting a value of $\chi^2 > \chi_0^2$, $\chi_0^2$ being an observed value. From these
tables, we can find the value of P corresponding to an observed value of
$\chi^2$ and then proceed to test whether the difference between observed and
theoretical frequencies is significant or not. The smaller the value of P, the
greater the divergence between fact and theory, so small values lead us to
suspect the hypothesis. Not only do small values of P lead us to suspect
the hypothesis, but a value of P very near to unity may also lead to a similar
result. Thus, if P = 1, then $\chi^2 = 0$, showing that there is perfect agreement
between fact and theory, and this is a very improbable event. There are two
conventional levels of significance:
- If P < 0.05, we say that the observed value of $\chi^2$ is significant at
the 5% level of significance.
- Similarly, if P < 0.01, the value is significant at the 1% level.
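Where printed tables are not at hand, P can be computed directly. The sketch below uses only the Python standard library and relies on the standard finite-series identities for the upper tail of the Chi-Square distribution with a whole number of degrees of freedom. The function name `chi_square_p_value` is an assumption made here, and the text itself works from tables rather than from such a formula.

```python
import math


def chi_square_p_value(chi2_observed, df):
    """P(chi-square with `df` degrees of freedom exceeds chi2_observed)."""
    if chi2_observed <= 0:
        return 1.0
    half = chi2_observed / 2.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd d.f.: erfc(sqrt(x/2)) plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * chi2_observed / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2_observed / (2 * r + 1)
    return tail + total


# Table values quoted in this unit for the 5% level should give P close to 0.05.
for value, df in [(3.84, 1), (7.81, 3), (9.49, 4)]:
    print(df, round(chi_square_p_value(value, df), 4))
```

A calculated $\chi^2$ is then significant at the 5% level when the returned P is below 0.05, and at the 1% level when it is below 0.01.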
10.2.9 Interpretation of Chi-Square values
After ascertaining the $\chi^2$ value, we refer to the $\chi^2$ table, which
comprises columns headed with symbols such as 0.05 for the 5% level of
significance, 0.01 for the 1% level of significance, etc. The left-hand side
indicates the degrees of freedom. If the calculated value of $\chi^2$ falls in
the acceptance region, the null hypothesis $H_0$ is accepted; if it falls in
the rejection region, $H_0$ is rejected. Figure 10.2 depicts the acceptance and
rejection regions of the Chi-Square distribution.
Fig. 10.2: Acceptance and Rejection Regions under Chi-Square Distribution
Key Statistic
The Chi-Square curve lies on the positive side of the x-axis because
Chi-Square values are never negative.
10.3 Applications of Chi-Square test
10.3.1 Tests for independence of attributes
In the test for independence, the null hypothesis is that the row and column
variables are independent of each other. As studied earlier, hypothesis
testing is carried out under the assumption that the null hypothesis is
true.
The following are the properties of the test for independence:
- The data are the observed frequencies
- The data is arranged in the form of a contingency table
- The degrees of freedom $\nu$ can be calculated as:
$\nu = (\text{Number of rows} - 1) \times (\text{Number of columns} - 1)$
- The test for independence has a Chi-Square distribution and is always a
right tail test.
- The expected value is computed by taking the row total, multiplying it
by the column total and dividing by the grand total. That is:
$E = \dfrac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$
- The test statistic value does not change, if the order of the rows or
columns is interchanged. Also the value does not change even if the
rows and columns are interchanged.
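The properties listed above can be sketched in Python. The function names and the 2 x 3 table below are hypothetical and chosen only for illustration; the sketch computes the expected values, the degrees of freedom and the test statistic, and checks that interchanging rows and columns leaves the statistic unchanged.

```python
def expected_frequencies(table):
    """E = (row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def degrees_of_freedom(table):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


observed = [[10, 20, 30],
            [20, 20, 20]]                           # hypothetical 2 x 3 table
transposed = [list(col) for col in zip(*observed)]  # rows and columns swapped

print(expected_frequencies(observed))
print(degrees_of_freedom(observed))
print(chi_square_statistic(observed), chi_square_statistic(transposed))
```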
Solved Problem 1
Calculate the degrees of freedom for a contingency table with three rows
and two columns.
Solution: The degrees of freedom, denoted by $\nu$, are calculated as:
$\nu = (\text{Number of rows} - 1) \times (\text{Number of columns} - 1)$
$\nu = (3 - 1) \times (2 - 1) = 2$
Hence, a contingency table with three rows and two columns has two
degrees of freedom.
Solved Problem 2
Table 10.1 depicts the production in three shifts and the number of defective
goods that turned out in three weeks. Test at 5% level of significance
whether weeks and shifts are independent.
Table 10.1: Production of Defective Goods in Three Shifts
Shift    1st Week   2nd Week   3rd Week   Total
I           15          5         20        40
II          20         10         20        50
III         25         15         20        60
Total       60         30         60       150
Solution: Table 10.1a depicts the observed and expected values required
to calculate $\chi^2$.
Table 10.1a: Observed and Expected Values
$O_i$   $E_i$ = (Row Total x Column Total) / Grand Total   $(O_i - E_i)^2$   $(O_i - E_i)^2 / E_i$
15      (40 x 60) / 150 = 16                                      1               0.0625
20      (50 x 60) / 150 = 20                                      0               0.0000
25      (60 x 60) / 150 = 24                                      1               0.0417
5       (40 x 30) / 150 = 8                                       9               1.1250
10      (50 x 30) / 150 = 10                                      0               0.0000
15      (60 x 30) / 150 = 12                                      9               0.7500
20      (40 x 60) / 150 = 16                                     16               1.0000
20      (50 x 60) / 150 = 20                                      0               0.0000
20      (60 x 60) / 150 = 24                                     16               0.6667
                                                                 $\chi^2_{cal}$ = 3.6459
The steps to calculate $\chi^2$ are described as follows:
1. Null hypothesis $H_0$: The week and shifts are independent
Alternate hypothesis $H_1$: The week and shifts are dependent
2. Level of significance is 5% and degrees of freedom
d.f. = (3 - 1) x (3 - 1) = 4
$\chi^2_{tab} = 9.49$
3. Test statistic
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
$\chi^2_{cal} = 3.6459$
4. Conclusion: Since $\chi^2_{cal}$ (3.6459) < $\chi^2_{tab}$ (9.49), $H_0$ is accepted. Hence,
the attributes week and shifts are independent.
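The arithmetic of Solved Problem 2 can be checked with a short Python sketch. The variable names are illustrative; the table value 9.49 is the one quoted in the solution. Because the script does not round the individual terms, its total can differ from the text's 3.6459 in the fourth decimal place.

```python
# Observed defective goods from Table 10.1 (rows: shifts I-III, columns: weeks).
observed = [
    [15, 5, 20],
    [20, 10, 20],
    [25, 15, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2_cal = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi2_cal += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_tab = 9.49   # table value for 4 d.f. at the 5% level, as quoted in the text
decision = "accept H0" if chi2_cal < chi2_tab else "reject H0"

print(round(chi2_cal, 4), df, decision)
```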
Solved Problem 3
Out of 1000 people surveyed, 600 belonged to urban areas and the rest to rural
areas. Among the 500 who visited other states, 400 belonged to urban areas.
Test at the 5% level of significance whether area and visiting other states are
dependent.
Solution: Table 10.2 depicts the information given in solved problem 3 in a
tabulated form.
Table 10.2: People Belonging to Urban and Rural Areas
Other States Urban Rural Total
Visited 400 100 500
Not Visited 200 300 500
Total 600 400 1000
Table 10.2a depicts the observed and expected values for the calculation of $\chi^2$.
Table 10.2a: Observed and Expected Values
$O_i$   $E_i$ = (Row Total x Column Total) / Grand Total   $(O_i - E_i)^2$   $(O_i - E_i)^2 / E_i$
400     300                                                    10000             33.33
200     300                                                    10000             33.33
100     200                                                    10000             50.00
300     200                                                    10000             50.00
                                                               $\chi^2_{cal}$ = 166.66
The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: Area and visit are independent.
Alternate hypothesis $H_1$: They are dependent.
2. Level of significance is 5% and degrees of freedom
d.f. = (2 - 1) x (2 - 1) = 1
$\chi^2_{tab} = 3.84$
3. Test statistic
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
$\chi^2_{cal} = 166.66$
4. Conclusion: Since $\chi^2_{cal}$ (166.66) > $\chi^2_{tab}$ (3.84), $H_0$ is rejected. Hence, the
area and visit are dependent.
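For a 2 x 2 table, the same statistic can be cross-checked with the well-known shortcut $\chi^2 = \dfrac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$, where a, b, c, d are the four cell frequencies and N is their total. The text does not use this shortcut; the sketch below, with the assumed function name `chi_square_2x2`, is offered only as a check on Solved Problem 3.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-Square for the 2 x 2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Table 10.2: visited (urban 400, rural 100), not visited (urban 200, rural 300).
chi2_cal = chi_square_2x2(400, 100, 200, 300)
chi2_tab = 3.84   # table value for 1 d.f. at the 5% level, as quoted in the text
decision = "reject H0" if chi2_cal > chi2_tab else "accept H0"

print(round(chi2_cal, 2), decision)
```

The unrounded value is about 166.67; the text's 166.66 comes from adding the rounded terms of Table 10.2a.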
10.3.2 Test of goodness of fit
The test of goodness of fit measures how well a statistical model fits a set
of observations. The test measures and summarises the differences, if any,
between the observed values and the values expected under the
considered statistical model. The test results help us know whether
the samples are drawn from identical distributions or not. The degrees of
freedom are n - 1, where n is the number of categories; when the hypothesised
distribution is uniform, the expected value of each category is equal to the
average of the observed values.
Solved Problem 4
A personnel manager is interested in determining whether
absenteeism is greater on one day of the week than on another day of the
week. The record for the past years is available. Table 10.3 depicts the
absenteeism for each working day over a week. Test whether absenteeism
is uniformly distributed over the week.
Table 10.3: Comparison of Data about Absenteeism
Days of Week          Monday   Tuesday   Wednesday   Thursday   Friday
Number of absentees     66       57         54          48        75
Solution: If the absenteeism is uniformly distributed over the week, then the
expected number of absentees per day is given by:
$E_i = \dfrac{66 + 57 + 54 + 48 + 75}{5} = 60$
Table 10.3a depicts the expected values required for the
calculation of $\chi^2$ for the data related to Solved Problem 4.
Table 10.3a: Observed and Expected Values for Calculation of $\chi^2$
$O_i$   $E_i$   $(O_i - E_i)^2$   $(O_i - E_i)^2 / E_i$
66      60          36                0.6000
57      60           9                0.1500
54      60          36                0.6000
48      60         144                2.4000
75      60         225                3.7500
                                      $\chi^2_{cal}$ = 7.5000
The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: The observed frequencies fit the uniform
distribution.
2. Alternate hypothesis $H_1$: The observed frequencies do not fit the
uniform distribution.
3. Level of significance is 5% and degrees of freedom (d.f.) = (5 - 1) = 4
$\chi^2_{tab} = 9.49$
4. Test statistic
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
$\chi^2_{cal} = 7.50$
5. Conclusion: Since $\chi^2_{cal}$ (7.5) < $\chi^2_{tab}$ (9.49), $H_0$ is accepted. In other
words, we conclude at the 5% level of significance that absenteeism is
uniformly distributed and is independent of the days of the week.
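The uniform goodness-of-fit calculation of Solved Problem 4 can be reproduced with a few lines of Python. This is a minimal sketch with illustrative variable names; the table value 9.49 is the one quoted in the solution.

```python
# Observed absentees from Table 10.3.
absentees = {"Monday": 66, "Tuesday": 57, "Wednesday": 54,
             "Thursday": 48, "Friday": 75}

# Under a uniform distribution every day has the same expected count.
expected = sum(absentees.values()) / len(absentees)

chi2_cal = sum((o - expected) ** 2 / expected for o in absentees.values())
df = len(absentees) - 1
chi2_tab = 9.49   # table value for 4 d.f. at the 5% level, as quoted in the text
decision = "accept H0" if chi2_cal < chi2_tab else "reject H0"

print(expected, round(chi2_cal, 4), df, decision)
```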
Solved Problem 5
According to a theory in Genetics, the proportion of beans of A, B, C and D
types in a generation should be 9:3:3:1. In an experiment with 1600 beans,
the frequency of bean of A, B, C and D type was observed to be 882, 313,
287 and 118 respectively. Does the result support the theory?
Solution: The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: The result supports the theory
Alternate hypothesis $H_1$: The result does not support the theory
2. Level of significance is 5% and degrees of freedom (d.f.) = (4 - 1) = 3
$\chi^2_{tab} = 7.81$
3. Test statistic
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
Table 10.4 depicts the observed and expected values for calculation of $\chi^2$
for Solved Problem 5.
Table 10.4: Observed and Expected Values for Calculation of $\chi^2$
$O_i$   $E_i$                      $(O_i - E_i)^2$   $(O_i - E_i)^2 / E_i$
882     (1600 x 9) / 16 = 900          324               0.36
313     (1600 x 3) / 16 = 300          169               0.56
287     (1600 x 3) / 16 = 300          169               0.56
118     (1600 x 1) / 16 = 100          324               3.24
                                                         $\chi^2_{cal}$ = 4.72
4. Conclusion: Since $\chi^2_{cal}$ (4.72) < $\chi^2_{tab}$ (7.81), $H_0$ is accepted. Therefore,
the result supports the theory.
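When the theory specifies a ratio rather than equal shares, the expected frequencies are obtained by splitting the total in that ratio. The following sketch, with illustrative variable names, repeats the calculation of Solved Problem 5. Without rounding the individual terms the statistic comes to about 4.727, slightly above the 4.72 obtained from the rounded terms, and the conclusion is unchanged.

```python
observed = [882, 313, 287, 118]   # beans of types A, B, C and D
ratio = [9, 3, 3, 1]              # theoretical proportions 9:3:3:1

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi2_tab = 7.81   # table value for 3 d.f. at the 5% level, as quoted in the text
decision = "accept H0" if chi2_cal < chi2_tab else "reject H0"

print(expected, round(chi2_cal, 3), df, decision)
```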
Solved Problem 6
The following table gives the classification of 100 workers according to
gender and the nature of work. Test whether nature of work is independent
of the gender of the worker.
Table 10.5
Skilled Unskilled Total
Males 40 20 60
Females 10 30 40
Total 50 50 100
The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis $H_0$: The nature of work is independent of the gender
of the worker
2. Level of significance is 5% and degrees of freedom (d.f.) =
(r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
$\chi^2_{tab} = 3.84$
3. Test statistic
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
Table 10.5a depicts the observed and expected values for calculation of $\chi^2$
for Solved Problem 6.
Table 10.5a: Observed and Expected Values for Calculation of $\chi^2$
$O_i$   $E_i$   $(O_i - E_i)^2$   $(O_i - E_i)^2 / E_i$
40      30         100                3.333
10      20         100                5.000
20      30         100                3.333
30      20         100                5.000
                                      $\chi^2_{cal}$ = 16.666
4. Conclusion: Since $\chi^2_{cal}$ (16.666) > $\chi^2_{tab}$ (3.84), $H_0$ is rejected. Therefore,
the null hypothesis that gender and nature of work are independent is
rejected.
10.3.3 Test for comparing variance
When we have to use $\chi^2$ as a test of population variance, then,
$H_0: \sigma_s^2 = \sigma_p^2$ and $H_A: \sigma_s^2 \neq \sigma_p^2$
$\chi^2 = \dfrac{\sigma_s^2}{\sigma_p^2}(n - 1)$
Where $\sigma_s^2$ = variance of the sample
$\sigma_p^2$ = variance of the population
(n - 1) = degrees of freedom, n being the number of items in the
sample.
Then by comparing the calculated value with the table value of $\chi^2$ for (n - 1)
degrees of freedom at a given level of significance, we may either accept or
reject the null hypothesis. If the calculated value of $\chi^2$ is less than the table value,
the null hypothesis is accepted, but if the calculated value is equal to or greater
than the table value, the hypothesis is rejected.
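The variance test can be sketched as follows. The function name and the numbers (a sample of 10 items with variance 30 tested against a population variance of 20) are hypothetical, chosen only to show the mechanics; 16.92 is the standard table value of $\chi^2$ for 9 degrees of freedom at the 5% level.

```python
def variance_chi_square(sample_variance, population_variance, n):
    """Chi-Square statistic (sample variance / population variance) * (n - 1)."""
    return sample_variance / population_variance * (n - 1)


# Hypothetical data: n = 10 items, sample variance 30, hypothesised variance 20.
chi2_cal = variance_chi_square(sample_variance=30.0, population_variance=20.0, n=10)
chi2_tab = 16.92   # table value for n - 1 = 9 d.f. at the 5% level

# Decision rule from the text: accept H0 only if the calculated value is smaller.
decision = "accept H0" if chi2_cal < chi2_tab else "reject H0"

print(chi2_cal, decision)
```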
Self Assessment Questions
1. The $\chi^2$ test is a ________ test.
2. A table with 4 rows and 2 columns has ________ degrees of freedom.
3. The $\chi^2$ test is wholly based on ________ data.
4. If there are four rows and five columns in a classification for the $\chi^2$
test, then the number of degrees of freedom is equal to ________.
5. If the calculated $\chi^2$ value is less than the tabulated $\chi^2$ value, then the
null hypothesis is ________.
Activity
Objective Questions:
1. What is the appropriate test to use if you want to determine whether
there is evidence that the proportion of successes is higher in group 1
than in group 2 and we have obtained independent samples from the
two groups?
i) The Z test
ii) The Chi-Square test
iii) Both of the above
iv) None of the above
2. Which of the following values cannot occur in a Chi-Square
distribution?
i) 100.0
ii) 38.4
iii) 0.61
iv) -2.45
3. What test would you use to determine whether a set of observed
frequencies differ from their corresponding expected frequencies?
i) The t test for dependent samples
ii) The Chi-Square test
iii) The t test for independent samples
iv) The F test
4. When using the Chi-Square test for differences in two proportions with
a contingency table that has r rows and c columns, how many degrees
of freedom will the test statistic have?
i) n - 1
ii) $n_1 + n_2 - 2$
iii) (r - 1) x (c - 1)
iv) (r - 1) + (c - 1)
5. When testing for independence in a contingency table with 3 rows
and 4 columns, how many degrees of freedom will the test statistic
have?
i) 5
ii) 6
iii) 7
iv) 12
6. Which of the following is true about the Chi-Square distribution?
i) It is a skewed distribution
ii) Its shape depends on the number of degrees of freedom
iii) As the degrees of freedom increase, the Chi-Square distribution
becomes more symmetrical
iv) All of the above
7. What other name is used for a contingency table?
i) A cross-classification table
ii) An ANOVA table
iii) A histogram
iv) None of the above
Solutions to Objective Questions
1. i) The Z test
2. iv) -2.45
3. ii) The Chi-Square test
4. iii) (r - 1) x (c - 1)
5. ii) 6
6. iv) All of the above
7. i) A cross-classification table
10.4 Summary
Let us recapitulate the important concepts discussed in this unit:
- Chi-Square test is a non-parametric test. The important applications of
Chi-Square test are the tests for independence of attributes, the test of
goodness of fit and the test for specified variance.
- $\chi^2$ describes the magnitude of discrepancy between the observed and the
expected frequencies. The value of $\chi^2$ is calculated as:
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(O_1 - E_1)^2}{E_1} + \dfrac{(O_2 - E_2)^2}{E_2} + \dfrac{(O_3 - E_3)^2}{E_3} + \cdots + \dfrac{(O_n - E_n)^2}{E_n}$
Where $O_1, O_2, O_3, \ldots, O_n$ are the observed frequencies and $E_1, E_2,
E_3, \ldots, E_n$ are the corresponding expected or theoretical frequencies.
- An important criterion for applying the Chi-Square test is that the sample
size should be very large.
10.5 Glossary
Chi-Square test: It is a non-parametric test, that is, a test that requires no
rigid assumptions about the parameters of the population.
Level of significance: The probability of rejecting the null hypothesis when
it is true (type I error), conventionally set at a number such as 0.05 (5%).
The observed significance level, also called the P-value of the test, is the
chance of getting a sample like the one being analysed if the null hypothesis
were true. If the P-value is less than the chosen level of significance, the
null hypothesis is rejected in favour of the alternative, because a small
P-value implies that getting such a sample was highly unlikely, suggesting
that the null hypothesis is probably not true.
10.6 Terminal Questions
5. 400 items of each (material) were given treatment x and y to enhance
the strength of the material. 80 gained strength by treatment x and 20
gained strength by treatment y. Does the gain in strength depend on
the treatment?
6. The demand for a particular spare part was found to vary from day to
day. Table 10.6 depicts the information obtained in a sample study.
Test the hypothesis that the number demanded depends upon the day.
Table 10.6: Spare Part Demand from Monday to Saturday
Days                Mon    Tue    Wed    Thur   Fri    Sat
Quantity Demanded   1124   1125   1110   1120   1126   1115
7. In a survey of 200 boys, of which 75 were intelligent, 40 of the intelligent
boys had skilled fathers, while 85 of the unintelligent boys had unskilled
fathers. Can we say on the basis of this information that skilled fathers
had intelligent boys?
8. The number of car accidents per month in a town was as follows: 6, 9, 4,
12, 8, 20, 14, 15, 2, and 10. Test the hypothesis that the number of
accidents is the same every month.
9. In a particular industry, postgraduates, graduates and undergraduates are
in the ratio 2:3:5. A firm belonging to the industry had 400, 550 and 1050
postgraduates, graduates and undergraduates, respectively, on its payroll.
Do they follow the earlier observation about the industry?
10. Three hundred digits were chosen at random from a set of tables. The
frequencies of the digits were as follows:
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 28 29 33 31 26 35 32 30 31 25
Using Chi-square test assess the hypothesis that the digits were
distributed in equal numbers in the table.
10.7 Answers
Self Assessment Questions
1. Non-parametric
2. 3
3. Sample
4. 12
5. Not Rejected
Terminal Questions
1. $\chi^2_{cal}$ = 41.142, $H_0$ rejected
2. $\chi^2_{cal}$ = 0.179, $H_0$ accepted
3. $\chi^2_{cal}$ = 8.888, $H_0$ rejected
4. $\chi^2_{cal}$ = 26.6, $H_0$ rejected
5. $\chi^2_{cal}$ = 6.6667, $H_0$ rejected
6. $\chi^2_{cal}$ = 2.864, $H_0$ accepted
10.8 Case Study
Automobile Preference
A market research firm in an Asian country made a survey to see if there
was any association between a person's nationality and their preference in
the make of automobile they purchased. Table 10.7 depicts the sample
information obtained.
Table 10.7: Types of Automobile Purchased in Various Countries
                Pakistan   China   India   Sri Lanka   Nepal
Maruti Suzuki      40        28      30       25         50
Opel               32        35      29       39         35
Lancer             24        40      27       28         29
Ford               40        20      40       26         40
Fiat               26        10      35       35         46
Discussion Questions:
i. Indicate the appropriate null and alternative hypotheses to test whether
the make of automobile purchased is dependent on an individual's
nationality.
ii. Using the critical value approach of the Chi-Square test at the 1%
significance level, does it appear that there is a relationship between
automobile purchase and nationality?
iii. Verify the result of Question ii by using the p-value approach of the
Chi-Square test.
iv. What has to be the significance level in order that there appears a
breakeven situation between dependency of nationality and
automobile preference?
v. What is your comment about the results?