
Categorical Data Analysis

& Chi-Square Applications
lilik_sugiharti@yahoo.co.id

lilik for mm 1
Introduction
• You are the manager of The Sheraton Hotel Group.
Guests who are satisfied with the quality of services
during their stay are more likely to return on a future
vacation and to recommend the hotel to friends and
relatives. To assess the quality of services being provided
by your hotels, guests are encouraged to complete a
satisfaction survey when they check out. You need to
analyze the data from these surveys to determine the
overall satisfaction with the service provided, the
likelihood that the guests will return to the hotel, and the
reasons some guests indicate that they will not return.

Chi-Square Applications

χ² tests:
• Test between proportions
• Test of independence
• Test of normality

χ² test for the difference
between two proportions
• Comparing the tallies or counts of categorical
responses between two independent groups →
two-way cross-classification table
(contingency table)
• H0: there is no difference between the two
population proportions
– H0: p1 = p2
• H1: the two population proportions are not the same
– H1: p1 ≠ p2

Characteristics of
The Chi-Square Distribution
• It is never negative
• There is a family of chi-square
distributions
– The shape of the chi-square distribution does
not depend on the size of the sample, but on the
number of categories used (k)
• It is positively skewed
– As the number of d.f. increases, the
distribution begins to approximate the normal
distribution
CHI-SQUARE DISTRIBUTION
[Figure: chi-square distribution curves for df = 3, df = 5, and df = 10; horizontal axis: χ²]
Chi-Square Test
• Compares several proportions (multinomial test)
• One of the nonparametric, or distribution-free, tests of hypothesis
• Data: nominal-scale or ordinal-scale
• The test statistic is:
$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$
• The χ² test statistic is the squared difference between the
observed and expected frequencies, divided by the expected
frequency, in each cell of the table, summed over all cells
• f0 is the observed frequency in a particular cell of a contingency table
• fe is the theoretical or expected frequency in a particular cell if the null
hypothesis is true

Row variable    Column variable (group)
                1          2          Totals
Successes       X1         X2         X
Failures        n1-X1      n2-X2      n-X
Totals          n1         n2         n
• X1= number of successes in group 1
• X2= number of successes in group 2
• n1-X1= number of failures in group 1
• n2-X2= number of failures in group 2
• X= X1+ X2 is the total number of successes
• n-X=(n1-X1)+(n2-X2) is the total number of
failures
• n1=the sample size in group 1
• n2=the sample size in group 2
• n=n1+n2 is the total sample size
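Under H0: p1 = p2, one common way to obtain the expected frequencies for this table is to pool the two groups into a single estimate of the proportion of successes. The symbol \bar{p} below is introduced here for illustration and does not appear in the slides:

```latex
\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}, \qquad
f_e(\text{successes, group } j) = n_j \bar{p}, \qquad
f_e(\text{failures, group } j) = n_j (1 - \bar{p}), \quad j = 1, 2
```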

Example: are you likely to choose
this hotel again?

Choose hotel    Hotel
again?          Sheraton Lagoon    Sheraton Nusa Dua    Total
Yes             163                154                  317
No              64                 108                  172
Total           227                262                  489


Minitab output
Chi-Square Test: Lagoon, Nusa Dua

Expected counts are printed below observed counts

         Lagoon   Nusa Dua   Total
    1       163        154     317
         147.16     169.84

    2        64        108     172
          79.84      92.16

Total       227        262     489

Chi-Sq = 1.706 + 1.478 +
         3.144 + 2.724 = 9.053
DF = 1, P-Value = 0.003
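The Minitab figures can be checked by hand. The following is a minimal sketch in plain Python, not Minitab's own procedure; the function name `chi_square_2x2` is hypothetical. It computes each expected count as row total × column total / grand total and uses the fact that, with 1 degree of freedom, the upper-tail probability can be written with the complementary error function.

```python
import math

def chi_square_2x2(table):
    """Return (expected counts, chi-square statistic, p-value) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    # Valid only for df = 1: P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return expected, chi_sq, p_value

# Rows: yes / no; columns: Sheraton Lagoon / Sheraton Nusa Dua
hotel = [[163, 154], [64, 108]]
expected, chi_sq, p_value = chi_square_2x2(hotel)
print([[round(fe, 2) for fe in row] for row in expected])
print(f"Chi-Sq = {chi_sq:.3f}, DF = 1, P-Value = {p_value:.3f}")
```

The p-value shortcut applies only to a 2x2 table; larger tables need the chi-square distribution with more degrees of freedom.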
Chi-Square Test (cont.)
• The chi-square test is used to:
– Test whether an observed set of frequencies
could have come from a hypothesized
population distribution
– Determine whether the sample observations
come from a particular distribution such as the
normal distribution
– Test whether two traits or characteristics are related,
using contingency table analysis
(Test of Independence)

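Rejection and non-rejection
area

Reject H0 if χ² > χ²U, where χ²U is the upper-tail critical value;
otherwise do not reject H0

[Figure: chi-square distribution starting at 0 on the χ² axis; the area (1-α) to the left of the critical value is the region of non-rejection, and the area α to the right is the region of rejection]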
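The decision rule can be sketched as a table lookup. The critical values below are copied from a standard chi-square table for a few small degrees of freedom; the names `CRITICAL_VALUES` and `decide` are hypothetical, and α = 0.05 for the hotel example is an assumption, since the slides do not state a significance level for it.

```python
# Upper-tail critical values of the chi-square distribution,
# copied from a standard statistical table: {df: {alpha: value}}.
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
}

def decide(chi_sq, df, alpha):
    """Apply the rule: reject H0 if the statistic exceeds the critical value."""
    critical = CRITICAL_VALUES[df][alpha]
    return "reject H0" if chi_sq > critical else "do not reject H0"

# Hotel example: chi-square = 9.053 with 1 degree of freedom;
# alpha = 0.05 is assumed here for illustration.
print(decide(9.053, df=1, alpha=0.05))
```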

• If the null hypothesis is true, the computed χ²
statistic should be close to zero because the squared
difference between what is actually observed in each cell,
f0, and what is theoretically expected, fe, would be very
small
• On the other hand, if H0 is false, and there are real
differences in the population proportions, the computed χ²
statistic is expected to be large. This is because the
difference between what is actually observed in each cell
and what is theoretically expected will be magnified
when the differences are squared

Goodness-of-Fit Test:
Equal Expected Frequencies
• The purpose of Goodness-of-Fit Test is to
compare an observed set of frequencies (fo)
to an expected set of frequencies (fe).
• Ho : no difference between fo and fe
• H1 : there is a difference between fo & fe
• The critical value is a chi-square value with (k
- 1) degrees of freedom, where k is the
number of categories

Goodness-of-Fit Test:
Unequal Expected Frequencies
• Example:
A lecturer expects the following distribution of exam
grades: A = 40%, B = 40%, and C = 20%. The exam
results show the following distribution of grades:
A: 30 students, B: 20 students, C: 10 students
Test at the 10% level of significance whether
this grade distribution matches the lecturer's
expectation.
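A minimal sketch of how this example could be worked through, assuming the usual goodness-of-fit steps: convert the expected percentages into expected frequencies, sum (fo - fe)²/fe, and compare with the critical value for k - 1 = 2 degrees of freedom. The critical value 4.605 is taken from a standard chi-square table, and the variable names are illustrative.

```python
import math

observed = {"A": 30, "B": 20, "C": 10}
expected_share = {"A": 0.40, "B": 0.40, "C": 0.20}

n = sum(observed.values())
# Expected frequency = total number of students * expected share
expected = {grade: n * share for grade, share in expected_share.items()}

chi_sq = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)
df = len(observed) - 1  # k - 1 categories

critical = 4.605  # chi-square table value for df = 2, alpha = 0.10
# For df = 2 the upper-tail probability has the closed form exp(-x / 2)
p_value = math.exp(-chi_sq / 2)

decision = "reject H0" if chi_sq > critical else "do not reject H0"
print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.3f}: {decision}")
```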

Limitations of Chi-Square
• If there are only two cells, the expected
frequency in each cell should be 5 or more
• For more than two cells, Chi-Square should
not be used if more than 20% of the
expected frequency cells have expected
frequency less than 5.

Example

Level of Management         fo     fe
Foreman                     30     32
Supervisor                  110    113
Manager                     86     87
Middle Manager              23     24
Assistant vice president    5      2
Vice president              5      4
Senior vice president       4      1
TOTAL                       263    263
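Combining the three vice president levels, whose expected frequencies are below 5, into one category gives:

Level of Management         fo     fe
Foreman                     30     32
Supervisor                  110    113
Manager                     86     87
Middle Manager              23     24
Vice president              14     7
TOTAL                       263    263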
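The effect of the small expected frequencies can be sketched in a few lines. The helper names below (`share_of_small_cells`, `merge_tail`, `chi_square`) are hypothetical; the sketch checks the 20% rule on the seven original levels, combines the last three into one category as in the second table, and compares the statistic before and after.

```python
levels = [
    ("Foreman", 30, 32),
    ("Supervisor", 110, 113),
    ("Manager", 86, 87),
    ("Middle Manager", 23, 24),
    ("Assistant vice president", 5, 2),
    ("Vice president", 5, 4),
    ("Senior vice president", 4, 1),
]

def share_of_small_cells(cells, minimum=5):
    """Fraction of cells whose expected frequency is below the minimum."""
    small = sum(1 for _, _, fe in cells if fe < minimum)
    return small / len(cells)

def chi_square(cells):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for _, fo, fe in cells)

def merge_tail(cells, count, label):
    """Combine the last `count` cells into a single cell named `label`."""
    head, tail = cells[:-count], cells[-count:]
    merged_fo = sum(fo for _, fo, _ in tail)
    merged_fe = sum(fe for _, _, fe in tail)
    return head + [(label, merged_fo, merged_fe)]

merged = merge_tail(levels, 3, "Vice president")

print(f"before: {share_of_small_cells(levels):.0%} small cells, "
      f"chi-square = {chi_square(levels):.3f}")
print(f"after:  {share_of_small_cells(merged):.0%} small cells, "
      f"chi-square = {chi_square(merged):.3f}")
```

Before merging, most of the statistic comes from the three small cells, which is why the limitation matters.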

Goodness-of-Fit Test for Normality
• Purpose: To test whether the observed
frequencies in a frequency distribution
match the theoretical normal distribution.
• Procedure:
– Determine the mean and standard deviation
of the frequency distribution.
– Compute the z-value for the lower class limit
and the upper class limit for each class.
– Determine fe for each category
– Use the chi-square goodness-of-fit test to
determine if fo coincides with fe.

EXAMPLE: Distribution of Salary
μ = 54.03, σ = 13.76

Salary ($000)   Frequency
20 – 30         4
30 – 40         20
40 – 50         41
50 – 60         44
60 – 70         29
70 – 80         16
80 – 90         2
90 – 100        4
TOTAL           160
Salary ($000)   Z value           Area     fe
Under 30        Under -1.75       0.0401   6.416
30 – 40         -1.75 to -1.02    0.1138   18.208
40 – 50         -1.02 to -0.29    0.2320   37.120
50 – 60         -0.29 to 0.43     0.2805   44.880
60 – 70         0.43 to 1.16      0.2106   33.696
70 – 80         1.16 to 1.89      0.0936   14.976
80 or more      Over 1.89         0.0294   4.704
Total                             1        160

$z = \frac{X - \mu}{\sigma}$
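The expected frequencies can be approximated without a printed z table. This is a sketch under stated assumptions: it uses the mean 54.03 and standard deviation 13.76 from the example, the class boundaries 30 to 80 with open-ended first and last classes, and the error function for the normal CDF. Because z is not rounded to two decimals here, the results differ slightly from a table lookup.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, n = 54.03, 13.76, 160

# Class boundaries in $000; the first and last classes are open-ended.
boundaries = [30, 40, 50, 60, 70, 80]
labels = ["Under 30", "30 - 40", "40 - 50", "50 - 60",
          "60 - 70", "70 - 80", "80 or more"]

# Cumulative probability at each boundary, padded with 0 and 1
cdf_at = [0.0] + [normal_cdf((b - mean) / sd) for b in boundaries] + [1.0]
# Area of each class = difference between successive cumulative values
areas = [upper - lower for lower, upper in zip(cdf_at, cdf_at[1:])]
expected = [n * area for area in areas]

for label, area, fe in zip(labels, areas, expected):
    print(f"{label:<12} area = {area:.4f}  fe = {fe:.3f}")
```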

Calculation for Chi-Square

Salary ($000)   fo     fe        fo – fe   (fo – fe)²   (fo – fe)²/fe
Under 30        4      6.416     -2.416    5.837        0.910
30 – 40         20     18.208    1.792     3.211        0.176
40 – 50         41     37.120    3.880     15.054       0.406
50 – 60         44     44.880    -0.880    0.774        0.017
60 – 70         29     33.696    -4.696    22.052       0.654
70 – 80         16     14.976    1.024     1.049        0.070
80 or more      6      4.704     1.296     1.680        0.357
Total           160    160                              χ² = 2.590
• Suppose we knew the mean and standard
deviation of the population but wished to find
whether some sample data conform to the
normal distribution,
d.f. = k - 1
• On the other hand, if we don't know the mean
and standard deviation of the population but we
wish to test whether some sample data follow
the normal distribution,
d.f. = k – p – 1
(where p is the number of population
parameters being estimated from the sample
data)
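As a worked illustration of the two rules, take the salary example with k = 7 classes. The first expression assumes the mean 54.03 and standard deviation 13.76 were estimated from the sample (so p = 2), which the slides do not state explicitly:

```latex
\text{d.f.} = k - p - 1 = 7 - 2 - 1 = 4
\qquad\text{versus}\qquad
\text{d.f.} = k - 1 = 7 - 1 = 6 \quad \text{(parameters known)}
```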
Contingency Table Analysis
• Contingency table analysis is used to test whether
two traits or variables are related.
→ Two-way classification table
• Each observation is classified according to two
variables.
• d.f.: (number of rows - 1)(number of columns - 1).
• The expected frequency (fe) is computed as:
$f_e = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$
• Coefficient of Contingency, where N is the grand total:
$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$
Example
A production manager investigates the defect rate of the
production machines. Observation of the items produced
gives the following results:

Condition    Machine 1    Machine 2    Machine 3
Defective    12           15           6
Good         88           105          74

Are the defects caused by the machine, or are they
merely due to chance? Test with α = 0.05
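One way this example could be worked through is sketched below. It assumes the usual test of independence: expected frequency = row total × column total / grand total, with (rows - 1)(columns - 1) degrees of freedom. The function name `contingency_test` is hypothetical, and the critical value 5.991 is taken from a standard chi-square table.

```python
def contingency_test(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency: row total * column total / grand total
            fe = row_totals[i] * col_totals[j] / n
            chi_sq += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df

# Rows: defective / good; columns: machine 1, 2, 3
machines = [[12, 15, 6], [88, 105, 74]]
chi_sq, df = contingency_test(machines)

critical = 5.991  # chi-square table value for df = 2, alpha = 0.05
decision = "reject H0" if chi_sq > critical else "do not reject H0"
print(f"chi-square = {chi_sq:.3f}, df = {df}: {decision}")
```

Not rejecting H0 here would mean the data give no evidence that the defect rate depends on the machine.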

Example
A research institute investigates whether there is a
relationship between the newspaper read and the
social class of the reader. The results are as
follows:

             Newspaper
Class        A      B      C
Upper        170    124    90
Middle       120    112    100
Lower        130    90     88

Test with α = 0.1
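For this 3 x 3 table, the sketch below computes the chi-square statistic and also the coefficient of contingency, taken here as C = sqrt(χ² / (χ² + N)) with N the grand total. The critical value 7.779 comes from a standard chi-square table, and the variable names are illustrative.

```python
import math

# Observed counts: rows = social class (upper, middle, lower),
# columns = newspaper A, B, C.
observed = [
    [170, 124, 90],
    [120, 112, 100],
    [130, 90, 88],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected frequency
        chi_sq += (fo - fe) ** 2 / fe

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Coefficient of contingency: strength of association between 0 and 1
c = math.sqrt(chi_sq / (chi_sq + n))

critical = 7.779  # chi-square table value for df = 4, alpha = 0.10
reject_h0 = chi_sq > critical
print(f"chi-square = {chi_sq:.3f}, df = {df}, C = {c:.3f}, "
      f"reject H0: {reject_h0}")
```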
