
Chi Square Tests

Test of more than two Proportions


Test of Independence of Attributes
Test of Goodness of Fit

QAM – II by Gaurav Garg (IIM Lucknow)


Testing of Proportions
• Example:
• Sharing of patient records is a controversial issue in health care.
• A survey was conducted in NCR, Bangalore, and Hyderabad.
• 500 respondents in each location were asked whether they object
to their records being shared by insurance companies/hospitals.
• The results are summarized in the following table:

Object to Record                Location
Sharing             NCR      Bangalore    Hyderabad
Yes                 410      295          335
No                   90      205          165



• We wish to test whether the proportion of respondents who
object to their records being shared differs across the
three locations.
• This is the test of difference of more than two
proportions.
• When we have only two populations having population
proportions π1 and π2
• And, we wish to test
• H0: π1 = π2 H1: π1 ≠ π2
• We use N(0,1) distribution for this.
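• And the Test Statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{\bar{p}\,(1-\bar{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim N(0,1)$$
  where p1, p2 are the sample proportions, n1, n2 are the sample sizes,
  and $\bar{p}$ denotes the pooled sample proportion.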
• In the given example, we have three populations.
• We wish to test
• H0: π1 = π2 = π3 (All the proportions are the same)
• H1: Not all π1, π2, π3 are equal
• The table of data shown in the example is called a
Contingency Table.
• Contingency Tables are used to classify sample
observations according to two or more characteristics.
• A Contingency Table is useful in situations involving
multiple population proportions.
• Let a contingency table have r rows and c columns.
• Then it will have r x c cells.
• The test statistic is
$$\chi^2_c = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
• fo = observed frequency in a particular cell of the contingency table.
• fe = expected frequency in a particular cell
  = [row total x column total] / Grand Total
• $\chi^2_c$ follows the Chi-Square distribution with (r-1)(c-1) d.f.
• The test is considered a right-tailed test.
• We reject H0 in favor of H1 at the α x 100% level if $\chi^2_c > \chi^2(\alpha)$.
• Assumptions:
• Total sample size should be large (more than 50).
• Each cell in the contingency table has an expected frequency of at
least FIVE.



• Calculation of Expected frequencies:

Object to Record                        Location
Sharing            NCR             Bangalore       Hyderabad       Total
Yes                fo = 410        fo = 295        fo = 335        1040
                   fe = 346.667    fe = 346.667    fe = 346.667
No                 fo = 90         fo = 205        fo = 165        460
                   fe = 153.333    fe = 153.333    fe = 153.333
Total              500             500             500             1500

• For example, fe for (Yes, NCR) = 1040 x 500 / 1500 = 346.667
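The expected-frequency rule (row total x column total / grand total) can be sketched in a few lines of Python. This is only an illustrative sketch; the function name `expected_frequencies` is a hypothetical helper and not part of the lecture material.

```python
# Observed counts from the patient-record table
# (rows: Yes, No; columns: NCR, Bangalore, Hyderabad).
observed = [
    [410, 295, 335],
    [90, 205, 165],
]

def expected_frequencies(table):
    """Return fe = row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(["Yes", "No"], expected):
    print(label, [round(fe, 3) for fe in row])
```

Because every location has the same sample size (500), all cells in a row share the same expected frequency.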



• Calculation of Test Statistic (each cell shows (fo - fe)^2 / fe):

Object to Record                Location
Sharing             NCR        Bangalore    Hyderabad
Yes                 11.571     7.700        0.3926
No                  26.159     17.409       0.888

$$\chi^2_c = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 64.1196$$

• Distribution of Test Statistic: $\chi^2$ with (2-1)(3-1) = 2 d.f.


• Critical Value at 5% level of significance is 5.991
• Since 64.1196 > 5.991, we reject H0 at 5% level.
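As a rough check of the arithmetic above, the whole test can be sketched in plain Python. The helper name `chi_square_statistic` is an assumption made for illustration; the critical value 5.991 is simply the one quoted on the slide, not computed here.

```python
observed = [
    [410, 295, 335],  # Yes
    [90, 205, 165],   # No
]

def chi_square_statistic(table):
    """Return (statistic, d.f.) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total
            stat += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square_statistic(observed)
CRITICAL_5PCT_2DF = 5.991  # taken from the slide
print(round(stat, 2), df, "reject H0" if stat > CRITICAL_5PCT_2DF else "do not reject H0")
```

Without intermediate rounding the statistic comes out at about 64.12, which agrees with the 64.1196 above up to rounding of the individual cells.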



• Example:
• A Retirement Plan study discussed the results of
a sample of 2001 Americans ages 50 to 70 who
were employed full time or part time. The results
were as follows:

                                  PLANS
GENDER    Not Work   Start Own   Work Full   Work Part   Don't   Other
          for Pay    Business    Time        Time        Know
MALE      257        115         103         457         27      42
FEMALE    359        87          49          436         34      35

• Is there evidence of a significant difference
among the plans for retirement with respect to
gender? (α = 0.05)
Chi Square Test for Independence of Attributes
• The $\chi^2$ test for equality of more than two
proportions can be used to test the independence of
two categorical variables or two attributes.
• We test if the two attributes A and B of a given
population are independent or not.
• Attribute A is divided into r classes A1, A2,…Ar.
• Attribute B is divided into c classes B1, B2,…Bc.
• We are given the data in an r x c contingency table.
• H0: Attributes A and B are independent
• H1: Attributes A and B are not independent
• The method is exactly the same as discussed for testing the
equality of proportions.
• Example: The meal plan selected by 200 students is shown
below:
Class              Number of free meals per week
Standing       20/week    10/week    none    Total
Fresh          24         32         14      70
Sophomore      22         26         12      60
Junior         10         14         6       30
Senior         14         16         10      40
Total          70         88         42      200

• We wish to test if "Class Standing" is independent of "Number of
Meals per Week".
• Calculation of expected frequencies:

Class              Number of free meals per week
Standing       20/week    10/week    none    Total
Fresh          24.5       30.8       14.7    70
Sophomore      21.0       26.4       12.6    60
Junior         10.5       13.2       6.3     30
Senior         14.0       17.6       8.4     40
Total          70         88         42      200

• Test Statistic:
$$\chi^2_c = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 0.709$$
• Critical Value at 5% significance level and 6 d.f. is
12.592.


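• The test statistic is $\chi^2_c = 0.709$; the critical value $\chi^2(0.05)$ with 6 d.f. is 12.592.

[Figure: Chi-square distribution with 6 d.f. The rejection region (α = 0.05) lies to the right of 12.592; the computed value 0.709 falls in the "do not reject H0" region.]

• Here, $\chi^2_c = 0.709 < \chi^2(0.05) = 12.592$,
so do not reject H0 at the 5% level of significance.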
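The meal-plan calculation can be reproduced with a short sketch. As an extra sanity check on the tabulated critical value, the code also evaluates the upper-tail probability of a chi-square variable, using the closed-form series that holds when the d.f. is even. Both helper names (`chi_square_statistic`, `upper_tail_even_df`) are hypothetical and introduced only for this illustration.

```python
import math

observed = [
    [24, 32, 14],  # Fresh
    [22, 26, 12],  # Sophomore
    [10, 14, 6],   # Junior
    [14, 16, 10],  # Senior
]

def chi_square_statistic(table):
    """Return (statistic, d.f.) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = sum(
        (fo - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i, row in enumerate(table)
        for j, fo in enumerate(row)
    )
    return stat, (len(table) - 1) * (len(table[0]) - 1)

def upper_tail_even_df(x, df):
    """P(chi-square with even d.f. exceeds x), via the finite series."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even d.f.")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

stat, df = chi_square_statistic(observed)
print(round(stat, 3), df)
print(round(upper_tail_even_df(12.592, df), 3))  # area to the right of the critical value
```

The area to the right of 12.592 is about 0.05, which is what "critical value at the 5% level" means for a right-tailed test.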



• We always have
$$\sum_{\text{all cells}} f_o = \sum_{\text{all cells}} f_e$$
• If we approximate some expected frequency, we
have to make sure that the above condition is satisfied.
• In these problems, data is of discrete type
• Chi – Square distribution is a continuous
distribution.
• It loses its validity if any expected frequency is less
than FIVE.
• In such case, the expected frequency is pooled with
the preceding or succeeding frequency.
• D.f. is reduced by one for one such pooling.
• We do not make any assumption about the
distribution of parent population.
• Example:
• Two researchers adopted different sampling techniques
while investigating the same group of customers to find the
number of customers falling in different buying-intelligence
levels.
• The results are as follows:
                   No. of Customers in Each Level
Researcher    Below      Average    Above      Genius    Total
              Average               Average
1             86         60         44         10        200
2             40         33         25         2         100
Total         126        93         69         12        300

• Are the sampling techniques adopted by the two researchers
significantly different? (Use α = 0.05)
• Calculation of Expected Frequencies:

                   No. of Customers in Each Level
Researcher    Below      Average    Above      Genius    Total
              Average               Average
1             84         62         46         8         200
2             42         31         23         4         100
Total         126        93         69         12        300

• H0: No Difference
• H1: Difference
• Calculation for Chi-Square (for Researcher 2, the "Genius" cell has
fe = 4 < 5, so it is pooled with "Above Average"):

fo              fe              (fo - fe)^2 / fe
86              84              0.048
60              62              0.064
44              46              0.087
10              8               0.500
40              42              0.095
33              31              0.129
25 + 2 = 27     23 + 4 = 27     0
Total 300       300             0.923

• α = 0.05
• d.f. = (2-1)(4-1) – 1 = 2
• Critical Value = 5.991
• Do not Reject H0
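The pooling step in this example can be sketched as follows. This is an illustrative sketch only: `pool_small_cells` is a hypothetical helper that merges a cell with fe below 5 into the preceding cell of the same row, which is one of the two options ("preceding or succeeding") mentioned earlier.

```python
observed = [
    [86, 60, 44, 10],  # Researcher 1
    [40, 33, 25, 2],   # Researcher 2
]

def expected_frequencies(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def pool_small_cells(fo_row, fe_row, minimum=5):
    """Merge each cell with fe < minimum into the preceding cell.

    Returns the pooled fo list, the pooled fe list and the number of poolings.
    A small first cell has no predecessor and is left as it is.
    """
    fo_out, fe_out, pooled = [], [], 0
    for fo, fe in zip(fo_row, fe_row):
        if fe < minimum and fo_out:
            fo_out[-1] += fo
            fe_out[-1] += fe
            pooled += 1
        else:
            fo_out.append(fo)
            fe_out.append(fe)
    return fo_out, fe_out, pooled

expected = expected_frequencies(observed)
stat, pooled_total = 0.0, 0
for fo_row, fe_row in zip(observed, expected):
    fo_p, fe_p, pooled = pool_small_cells(fo_row, fe_row)
    pooled_total += pooled
    stat += sum((o - e) ** 2 / e for o, e in zip(fo_p, fe_p))

# One d.f. is lost for each pooling, as stated in the notes above.
df = (len(observed) - 1) * (len(observed[0]) - 1) - pooled_total
print(round(stat, 3), df)
```

Only one cell is pooled here, so the d.f. drops from 3 to 2, matching the slide.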
Chi Square Test for Goodness of Fit

• It is a powerful test given by Karl Pearson.


• It enables us to find if the deviation of the
experiment from theory is just by chance or due
to inadequacy of the theory to fit the observed
data.
• The test statistic is exactly the same as discussed
earlier.
• The only difference is in calculating the
expected frequencies.



• Example:

• Company A has recently conducted aggressive advertising
campaigns to maintain and possibly increase its share of the market
for fabric softener (45%, given the 40% and 15% shares below).
• Its main competitor, company B, has 40% of the market.
• A number of other competitors account for the remaining 15%.
• To determine if the market shares changed after the marketing
campaigns, the marketing manager of company A solicited the
preferences of a random sample of 200 customers.
• 102 indicated preference for A, 82 for B and the remaining 16 for the
others.
• Can the analyst infer that customer preferences have changed?
• (Use α = 0.05)



• H0: Fit is good.
• H1: Fit is not good.
• The test statistic is
$$\chi^2_c = \sum \frac{(f_o - f_e)^2}{f_e}$$
• fo = observed frequency of a particular class
• fe = expected frequency of that class
  = Total frequency x probability of that class
• $\chi^2_c$ follows the Chi-Square distribution with (k-1) d.f., where k is the number of classes.
• The test is considered a right-tailed test.
• We reject H0 in favor of H1 at the α x 100% level if $\chi^2_c > \chi^2(\alpha)$.
• Assumptions:
• Total sample size should be large (more than 50).
• Each class has an expected frequency of at least FIVE.
• Example: Consider the fabric softener companies
example.
• H0: Fit is good. (data are perfectly fitted to the theory)
• H1: Fit is not good.
fo           fe                   (fo - fe)^2 / fe
102          200 x 0.45 = 90      1.60
82           200 x 0.40 = 80      0.05
16           200 x 0.15 = 30      6.53
Total 200    200                  8.18
• α = 0.05
• d.f. = 3 – 1 = 2
• Critical Value = 5.991
• Reject H0
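The fabric softener calculation can be sketched in Python as below. The function name `goodness_of_fit` is a hypothetical helper, and the critical value is the one quoted above rather than something the code derives.

```python
observed = [102, 82, 16]     # preferences for A, B, others
shares = [0.45, 0.40, 0.15]  # market shares under H0

def goodness_of_fit(fo, probabilities):
    """Return (statistic, expected frequencies) for a goodness-of-fit test."""
    n = sum(fo)
    fe = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(fo, fe))
    return stat, fe

stat, expected = goodness_of_fit(observed, shares)
df = len(observed) - 1
CRITICAL_5PCT_2DF = 5.991  # taken from the slide
print(round(stat, 2), df, "reject H0" if stat > CRITICAL_5PCT_2DF else "do not reject H0")
```

Most of the statistic comes from the "others" class, where 16 customers were observed against 30 expected.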
• Example:
• Suppose a die is tossed 60 times.
• It is observed that
• 1 comes 8 times
• 2 comes 6 times
• 3 comes 12 times
• 4 comes 8 times
• 5 comes 12 times
• 6 comes 14 times
• If the die is fair, expected frequencies are 10, 10,
10, 10, 10, and 10.



• We want to examine if the difference between observed
frequencies and expected frequencies is significant or not.
• If this difference is significant, then the die is not fair.
• To test this we use Chi Square Test for Goodness of Fit.
fo          fe    (fo - fe)^2 / fe
8           10    0.4
6           10    1.6
12          10    0.4
8           10    0.4
12          10    0.4
14          10    1.6
Total 60    60    4.8

• α = 0.05, d.f. = 6 – 1 = 5
• Critical Value = 11.070
• Do not reject H0 : Fit is good or die is fair.
• Example:
• Following are the number of aircraft accidents
during the various days of the week. Find if the
accidents are uniformly distributed over the week.
Days:          Mon   Tue   Wed   Thu   Fri   Sat   Sun
No. of Acc.:   14    16    8     12    11    9     14

• (Answer: $\chi^2_c$ = 4.17)
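Both the die example and the aircraft-accident example test a uniform distribution, so every class has the same expected frequency (total / number of classes). A minimal sketch, with `uniform_chi_square` as a hypothetical helper name:

```python
def uniform_chi_square(fo):
    """Chi-square statistic and d.f. when all classes are equally likely."""
    fe = sum(fo) / len(fo)  # same expected frequency for every class
    stat = sum((o - fe) ** 2 / fe for o in fo)
    return stat, len(fo) - 1

die = [8, 6, 12, 8, 12, 14]             # faces 1..6, 60 tosses
accidents = [14, 16, 8, 12, 11, 9, 14]  # Mon..Sun

die_stat, die_df = uniform_chi_square(die)
acc_stat, acc_df = uniform_chi_square(accidents)
print(round(die_stat, 2), die_df)
print(round(acc_stat, 2), acc_df)
```

For the accident data there are 7 classes, so the statistic has 6 d.f. If α = 0.05 is assumed (the exercise does not state a level), the relevant critical value is the 12.592 quoted earlier for 6 d.f.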



• Example:
• A survey of 320 families with 5 children each
revealed the following distribution:
No. of boys:       5     4     3      2     1     0
No. of girls:      0     1     2      3     4     5
No. of families:   14    56    110    88    40    12

• Is this result consistent with the hypothesis that
male and female births are equally probable?
• Hint: Use Binomial Distribution
• (Answer: $\chi^2_c$ = 7.16)



• H0 : male and female births are equally probable
• H1 : male and female births are not equally probable
• Success: A randomly chosen child is a girl
• X = no. of girl children in a family with 5 children
• Under H0 , X~B(5, 1/2)
• P(X=0) = $\binom{5}{0}(1/2)^0(1 - 1/2)^{5-0} = \binom{5}{0}(1/2)^5$ = 0.03125
• P(X=1) = $\binom{5}{1}(1/2)^5$ = 0.15625
• P(X=2) = $\binom{5}{2}(1/2)^5$ = 0.31250
• P(X=3) = $\binom{5}{3}(1/2)^5$ = 0.31250
• P(X=4) = $\binom{5}{4}(1/2)^5$ = 0.15625
• P(X=5) = $\binom{5}{5}(1/2)^5$ = 0.03125
• Calculation of Chi Square:
fo           fe                      (fo - fe)^2 / fe
14           320 x P(X=0) = 10       1.60
56           320 x P(X=1) = 50       0.72
110          320 x P(X=2) = 100      1.00
88           320 x P(X=3) = 100      1.44
40           320 x P(X=4) = 50       2.00
12           320 x P(X=5) = 10       0.40
Total 320    320                     7.16

• α = 0.05, d.f. = 6 – 1 = 5
• Critical Value = 11.070
• Do not reject H0
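The binomial fit can be sketched as below, with the class probabilities generated from B(5, 1/2) instead of being typed in. Variable names are illustrative, and the observed counts are ordered by number of girls (0 to 5), as in the table.

```python
import math

observed = [14, 56, 110, 88, 40, 12]  # families with 0, 1, ..., 5 girls
n_children, p_girl = 5, 0.5           # X ~ B(5, 1/2) under H0

# Binomial probabilities P(X = k) for k = 0..5
probs = [
    math.comb(n_children, k) * p_girl ** k * (1 - p_girl) ** (n_children - k)
    for k in range(n_children + 1)
]

n_families = sum(observed)
expected = [n_families * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameter is estimated: p = 1/2 comes from H0

print([round(e, 1) for e in expected])
print(round(stat, 2), df)
```

No degree of freedom is lost for estimation here because the success probability is fixed by the hypothesis rather than estimated from the data.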
Goodness of Fit test for Normal Distribution
• In many statistical analyses, we assume that the data
has normal distribution.
• Using Chi-Square test, we can verify this.
• Example:
• We wish to examine if following data has normal
distribution:
5.65 5.44 5.42 5.40 5.53 5.34 5.54 5.45 5.52 5.41
5.57 5.40 5.53 5.54 5.55 5.62 5.56 5.46 5.44 5.51
5.47 5.40 5.47 5.61 5.53 5.32 5.67 5.29 5.49 5.55
5.77 5.57 5.42 5.58 5.58 5.50 5.32 5.50 5.53 5.58
5.61 5.45 5.44 5.25 5.56 5.63 5.50 5.57 5.67 5.36
• We wish to examine if the given sample has an N(μ, σ)
distribution.
• Sample mean ($\bar{x}$) is the estimate of μ.
• Sample standard deviation ($s_1$) is the estimate of σ.
• Now, arrange the given data in class intervals.
• Test Statistic:
$$\chi^2_c = \sum \frac{(f_o - f_e)^2}{f_e}$$
• fo = observed frequency of a particular class interval
• fe = expected frequency of that class interval
  = Total frequency x probability of that class interval
• $\chi^2_c$ follows the Chi-Square distribution with k-1-2 = k-3 d.f.


• Example: Consider the previous example.
• Probability of each class is the probability that a random
variable X falls in that class, where $X \sim N(\bar{x}, s_1)$,
$\bar{x} = 5.5014$, $s_1 = 0.10583$.

Class Interval    fo    Probability                  fe
Below 5.25        0     P(X<5.25) = 0.0088           0.438127
5.25 - 5.35       5     P(5.25<X<5.35) = 0.0675      3.375546
5.35 - 5.45       10    P(5.35<X<5.45) = 0.2373      11.86612
5.45 - 5.55       17    P(5.45<X<5.55) = 0.3634      18.16841
5.55 - 5.65       14    P(5.55<X<5.65) = 0.2429      12.14483
5.65 - 5.75       3     P(5.65<X<5.75) = 0.0707      3.536422
5.75 - 5.85       1     P(5.75<X<5.85) = 0.0089      0.445844
Above 5.85        0     P(X>5.85) = 0.0005           0.024697
Total             50

• Some fe are less than 5.
• After pooling, we get

fo       fe          (fo - fe)^2 / fe
9        7.820636    0.177850
10       11.86610    0.293475
14       12.14480    0.283384
17       18.16840    0.075140
Total                0.829849

• α = 0.05, d.f. = 8 – 3 – 4 = 1
• Critical Value = 3.841
• Do not reject H0


• We can take different class intervals.
• Minimum number of classes (after pooling) must be 4,
so that (k-3) is a positive integer.
Class Interval    fo    Probability    fe
Below 5.25        0     0.0088         0.438127
5.25 - 5.50       21    0.4860         24.29801
5.50 - 5.75       28    0.4959         24.79333
Above 5.75        1     0.0094         0.470541

• Some fe are less than 5.
• After pooling, we get only 2 classes.
• We cannot apply the Chi-Square test.


• H0: Given data follows Normal Distribution
• H1: Given data does not follow Normal Distribution
• $X \sim N(\bar{x}, s_1)$, $\bar{x} = 5.5014$, $s_1 = 0.10583$

Class Interval    fo    Probability                  fe          (fo - fe)^2 / fe
Below 5.40        9     P(X<5.40) = 0.1690           8.449827    0.035822
5.40 - 5.45       8     P(5.40<X<5.45) = 0.1446      7.229963    0.082014
5.45 - 5.50       7     P(5.45<X<5.50) = 0.1811      9.056342    0.466915
5.50 - 5.55       10    P(5.50<X<5.55) = 0.1822      9.112070    0.086525
5.55 - 5.60       8     P(5.55<X<5.60) = 0.1473      7.364258    0.054882
Above 5.60        8     P(X>5.60) = 0.1758           8.787539    0.070579
Total             50                                             0.796737
• α = 0.05, d.f. = 6 – 1 – 2 = 3,
• Critical Chi Square = 7.815
• Computed Chi Square = 0.796737
• Do not reject H0
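The class probabilities in this table can be reproduced from the normal CDF. The sketch below uses `math.erf` from the standard library to build the CDF; the observed frequencies, mean and standard deviation are copied from the table, and `normal_cdf` is a hypothetical helper name.

```python
import math

mean, sd = 5.5014, 0.10583            # sample estimates from the slide
cuts = [5.40, 5.45, 5.50, 5.55, 5.60] # class boundaries
observed = [9, 8, 7, 10, 8, 8]        # fo for the six classes

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# CDF at each boundary, padded with 0 and 1 for the two open-ended classes
cdf = [0.0] + [normal_cdf(c, mean, sd) for c in cuts] + [1.0]
probs = [upper - lower for lower, upper in zip(cdf, cdf[1:])]

n = sum(observed)
expected = [n * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2  # two parameters (mean, sd) were estimated

print([round(p, 4) for p in probs])
print(round(stat, 3), df)
```

The result should agree with the tabulated 0.796737 up to the rounding of the mean and standard deviation.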
• Example:
• Weekly demand of a product is assumed to be normally
distributed.
• Use a goodness of fit test and the following data to test
this assumption.
• Use α = 0.10, sample mean = 24.5, sample standard
deviation = 3
18 20 22 27 22
25 22 27 25 24
26 23 20 24 26
27 25 19 21 25
26 25 31 29 25
25 28 26 28 24



Goodness of Fit test for Exponential Distribution
• Example:
• Among 100 vacuum tubes used in an experiment,
  - 46 had a service life of less than 20 hours,
  - 19 had a service life of 20 hours or more but less than 40 hours,
  - 17 had a service life of 40 hours or more but less than 60 hours,
  - 12 had a service life of 60 hours or more but less than 80 hours,
  - 6 had a service life of 80 hours or more.
• Test at 1% level of significance whether the lifetimes
may be regarded as a sample from an exponential
population with a mean of 40 hours.
• A continuous random variable X is said to have an Exponential
Distribution with mean 1/λ if its probability density function
is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\ \lambda > 0 \\ 0, & \text{otherwise} \end{cases}$$
• Also, $P(X \le x) = 1 - e^{-\lambda x}$
• We use the above formula to obtain probabilities of class
intervals.
• This distribution depends on a single parameter λ = 1/mean.
• When λ is not given, it is estimated from the sample using
the sample mean:
$$\hat{\lambda} = 1/\text{sample mean} = 1/\bar{x}$$
• Test Statistic will have a Chi Square distribution with
• d.f. = (k – 1) – no. of estimates = (k – 1) – 1 = k – 2


• Example: Consider the previous example.
• Population Mean = 40, λ = 1/40

Class Interval    fo     Probability    fe          (fo - fe)^2 / fe
Below 20          46     0.393469       39.34693    1.124949
20 – 40           19     0.238651       23.86512    0.991799
40 – 60           17     0.144749       14.47493    0.440485
60 – 80           12     0.087795       8.779488    1.181356
Above 80          6      0.135335       13.53353    4.193589
Total             100                   100         7.932177

• Computed Chi Square = 7.932177


• For α = 0.01, d.f. = 5-1=4,
• Critical Chi Square = 13.277
• Do not reject H0


• Example: Consider the same example with unknown
population mean.
• The mean is estimated from the sample: estimated mean = 32.6.

Class Interval    fo     Probability    fe          (fo - fe)^2 / fe
Below 20          46     0.458546       45.85459    0.000461
20 – 40           19     0.248282       24.82816    1.3681
40 – 60           17     0.134433       13.44331    0.940993
60 – 80           12     0.072789       7.278934    3.06205
Above 80          6      0.08595        8.595016    0.78349
Total             100                               6.155094

• Computed Chi Square = 6.155094


• For α = 0.01, d.f. = 5-2=3,
• Critical Chi Square = 11.345
• Do not reject H0
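Both exponential fits (known mean 40 and estimated mean 32.6) can be sketched with one function. Two things in this sketch are assumptions, not statements from the slides: the helper name `exponential_fit`, and the way the mean is estimated, namely from class midpoints 10, 30, 50, 70 with 90 standing in for the open class "80 or more". That choice does reproduce the 32.6 shown above.

```python
import math

observed = [46, 19, 17, 12, 6]  # fo for the five lifetime classes
upper_bounds = [20, 40, 60, 80] # class boundaries in hours

def exponential_fit(fo, bounds, mean):
    """Chi-square statistic for an exponential distribution with the given mean."""
    # P(X <= x) = 1 - exp(-x / mean), padded with 0 and 1 for the end classes
    cdf = [0.0] + [1.0 - math.exp(-b / mean) for b in bounds] + [1.0]
    probs = [upper - lower for lower, upper in zip(cdf, cdf[1:])]
    n = sum(fo)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(fo, probs))

# Case 1: mean given by the hypothesis, d.f. = 5 - 1 = 4
stat_known = exponential_fit(observed, upper_bounds, 40.0)

# Case 2: mean estimated from grouped data (ASSUMED midpoints), d.f. = 5 - 2 = 3
midpoints = [10, 30, 50, 70, 90]
est_mean = sum(f * m for f, m in zip(observed, midpoints)) / sum(observed)
stat_estimated = exponential_fit(observed, upper_bounds, est_mean)

print(round(stat_known, 3))
print(round(est_mean, 1), round(stat_estimated, 3))
```

Estimating the mean costs one degree of freedom, which is why the second case is compared with the critical value for 3 d.f. rather than 4.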
• Example:
• No. of automobile accidents per day in a particular city is believed
to have a Poisson Distribution.
• A sample of 80 days during the past year gives the following data:
Number of Accidents      0     1     2     3    4
Observed Freq (days)     34    25    11    7    3

• Do these data support the belief?
• Use α = 0.05.
• Hint: Probability mass function of Poisson Distribution is
$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
• MEAN = λ
• λ is not given.
• You can estimate it from the sample.
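As a starting point for this exercise, λ can be estimated by the sample mean and the expected frequencies computed from the Poisson probabilities. This is a partial sketch, not a full solution: it assumes the last class is read as "4 or more" so that the probabilities sum to 1, and it stops before pooling, which is still needed because some expected frequencies fall below 5.

```python
import math

accidents = [0, 1, 2, 3, 4]
days = [34, 25, 11, 7, 3]  # observed frequencies over 80 days

n = sum(days)
lam = sum(x * f for x, f in zip(accidents, days)) / n  # sample mean estimates λ

def poisson_pmf(x, lam):
    """p(x) = exp(-λ) λ^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

probs = [poisson_pmf(x, lam) for x in accidents[:-1]]
probs.append(1.0 - sum(probs))  # ASSUMPTION: last class treated as "4 or more"
expected = [n * p for p in probs]

print(lam)
print([round(e, 2) for e in expected])
```

After pooling the small classes, the d.f. would be (number of classes – 1) – 1, since one parameter (λ) is estimated from the sample.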



Summary
• Contingency Table
• Chi Square test for differences in more than two
proportions
• Chi Square test for independence of attributes
• Chi Square test for Goodness of Fit
• Test Statistic:
$$\chi^2_c = \sum \frac{(f_o - f_e)^2}{f_e}$$
• Test Statistic follows Chi Square distribution with d.f.
  - Contingency Table: (No. of rows – 1) (No. of columns – 1)
  - Goodness of Fit: No. of Classes – 1 – No. of Estimates
  - If we pool some frequency, d.f. is reduced.
  - If we estimate some parameter, d.f. is reduced.
