Categorical Variables
Case where both the explanatory (independent)
variable and the response (dependent) variable
are qualitative (Chapter 7 includes the case
where both are binary, i.e. have 2 levels)
Association: The distributions of responses
differ among the levels of the explanatory
variable (e.g. party affiliation by gender)
Contingency Tables
Cross-tabulations of frequency counts where the
rows (typically) represent the levels of the
explanatory variable and the columns represent
the levels of the response variable.
Numbers within the table represent the numbers
of individuals falling in the corresponding
combination of levels of the two variables
Row and column totals are called the marginal
distributions for the two variables
Observed counts (season by region):

Season    40-49S   50-59S   60-79S    Total
Autumn       370      526      980     1876
Winter       452      624     1200     2276
Spring       273      513      995     1781
Summer       422     1059     1751     3232
Total       1517     2722     4926     9165
Column percentages (% within region):

Season            40-49S         50-59S         60-79S
Autumn              24.4           19.3           19.9
Winter              29.8           22.9           24.4
Spring              18.0           18.8           20.2
Summer              27.8           38.9           35.5
Total% (n)  100.0 (1517)   100.0 (2722)   100.0 (4926)
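The marginal totals and column percentages can be reproduced from the observed counts. The following is a minimal sketch in plain Python; the variable names (`counts`, `row_totals`, `col_totals`, `col_pct`) are illustrative choices, not part of the original slides.

```python
# Observed counts: rows are seasons, columns are regions.
seasons = ["Autumn", "Winter", "Spring", "Summer"]
regions = ["40-49S", "50-59S", "60-79S"]
counts = [
    [370, 526, 980],
    [452, 624, 1200],
    [273, 513, 995],
    [422, 1059, 1751],
]

# Marginal distributions: row totals, column totals, and the grand total.
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Distribution of season within each region (column percentages).
col_pct = [
    [100.0 * counts[i][j] / col_totals[j] for j in range(len(regions))]
    for i in range(len(seasons))
]

for season, row in zip(seasons, col_pct):
    print(season, [round(p, 1) for p in row])
```

If the three columns of percentages were roughly equal in every row, there would be little evidence of association between season and region.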
[Figure: clustered bar chart by season (Autumn, Winter, Spring, Summer) with one bar per region (40-49S, 50-59S, 60-79S); vertical axis ticks at 10.00, 20.00, 30.00]
General r x c table notation:

          Col 1   Col 2   ...   Col c   Total
Row 1     n11     n12     ...   n1c     n1.
Row 2     n21     n22     ...   n2c     n2.
...
Row r     nr1     nr2     ...   nrc     nr.
Total     n.1     n.2     ...   n.c     n..
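Test Statistic:
$$\chi^2_{obs} = \sum \frac{(f_o - f_e)^2}{f_e}$$
where the sum is over all cells, $f_o$ is the observed count and $f_e$ is the expected count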
Expected counts (fe):

Season    40-49S   50-59S   60-79S    Total
Autumn     310.5    557.2   1008.3     1876
Winter     376.7    676.0   1223.3     2276
Spring     294.8    529.0    957.3     1781
Summer     535.0    959.9   1737.1     3232
Total       1517     2722     4926     9165
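The expected counts are consistent with the usual rule under independence, fe = (row total x column total) / n. A short sketch, assuming that rule and using illustrative variable names:

```python
# Observed counts: rows are seasons (Autumn, Winter, Spring, Summer),
# columns are regions (40-49S, 50-59S, 60-79S).
counts = [
    [370, 526, 980],
    [452, 624, 1200],
    [273, 513, 995],
    [422, 1059, 1751],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Expected count for each cell if season and region were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

Each row of expected counts sums to the observed row total, so the margins of the table are preserved.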
Computing $\chi^2_{obs}$:

Region   Season     fo       fe   (fo-fe)^2   ((fo-fe)^2)/fe
40-49S   Autumn    370    310.5     3540.25       11.4017713
40-49S   Winter    452    376.7     5670.09       15.0520042
40-49S   Spring    273    294.8      475.24       1.61207598
40-49S   Summer    422    535.0    12769          23.8672897
50-59S   Autumn    526    557.2      973.44       1.74702082
50-59S   Winter    624    676.0     2704          4
50-59S   Spring    513    529.0      256          0.48393195
50-59S   Summer   1059    959.9     9820.81       10.2310762
60-79S   Autumn    980   1008.3      800.89       0.79429733
60-79S   Winter   1200   1223.3      542.89       0.44379138
60-79S   Spring    995    957.3     1421.29       1.4846861
60-79S   Summer   1751   1737.1      193.21       0.11122561
Total                                             71.2291706
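The hand calculation uses expected counts rounded to one decimal, which is why its total (71.229) differs slightly from the 71.189 that SPSS reports. The sketch below computes the statistic both ways; the variable names are illustrative, and the unrounded expected counts assume fe = (row total x column total) / n.

```python
# Observed and (rounded) expected counts in the order of the table:
# region by region, and Autumn, Winter, Spring, Summer within each region.
fo = [370, 452, 273, 422, 526, 624, 513, 1059, 980, 1200, 995, 1751]
fe = [310.5, 376.7, 294.8, 535.0, 557.2, 676.0,
      529.0, 959.9, 1008.3, 1223.3, 957.3, 1737.1]

# Statistic from the rounded expected counts, as in the hand calculation.
chi2_rounded = sum((o - e) ** 2 / e for o, e in zip(fo, fe))

# Statistic from unrounded expected counts.
row_totals = [1876, 2276, 1781, 3232]   # seasons
col_totals = [1517, 2722, 4926]         # regions
n = 9165
# Outer loop over regions so the order matches fo.
fe_exact = [r * c / n for c in col_totals for r in row_totals]
chi2_exact = sum((o - e) ** 2 / e for o, e in zip(fo, fe_exact))

print(round(chi2_rounded, 3), round(chi2_exact, 3))
```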
SEASON * REGION Crosstabulation

                                       REGION
SEASON                       40-49S   50-59S   60-79S    Total
Autumn   Count                  370      526      980     1876
         Expected Count       310.5    557.2   1008.3   1876.0
         % within REGION      24.4%    19.3%    19.9%    20.5%
Winter   Count                  452      624     1200     2276
         Expected Count       376.7    676.0   1223.3   2276.0
         % within REGION      29.8%    22.9%    24.4%    24.8%
Spring   Count                  273      513      995     1781
         Expected Count       294.8    529.0    957.3   1781.0
         % within REGION      18.0%    18.8%    20.2%    19.4%
Summer   Count                  422     1059     1751     3232
         Expected Count       535.0    959.9   1737.1   3232.0
         % within REGION      27.8%    38.9%    35.5%    35.3%
Total    Count                 1517     2722     4926     9165
         Expected Count      1517.0   2722.0   4926.0   9165.0
         % within REGION     100.0%   100.0%   100.0%   100.0%
Chi-Square Tests

                                 Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square               71.189(a)    6   .000
Likelihood Ratio                 71.337       6   .000
Linear-by-Linear Association     23.418       1   .000
N of Valid Cases                 9165

The Asymp. Sig. (2-sided) column is the P-value
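SPSS prints the P-value as .000 because it is far below the displayed precision. For an even number of degrees of freedom the chi-squared upper-tail probability has a closed form, so it can be checked without a statistics library. The helper name `chi2_sf_even_df` is hypothetical, and the degrees of freedom assume the usual (r-1)(c-1) rule for an r x c table.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with even df.

    Uses the identity P(X >= x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

r, c = 4, 3                  # 4 seasons, 3 regions
df = (r - 1) * (c - 1)       # 6 degrees of freedom
p_value = chi2_sf_even_df(71.189, df)
print(df, p_value)
```

The resulting P-value is many orders of magnitude below .001, which agrees with the .000 shown in the output.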
Residual Analysis
Once dependence has been determined from a chi-squared test, we are often interested in determining which
cells contributed
Residual: fo-fe measures the difference between the
observed and expected counts
A positive residual implies more observed than expected; a negative residual implies fewer
A residual's practical importance depends on the size of fe
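Region   Season     fo       fe    fo-fe
40-49S   Autumn    370    310.5     59.5
40-49S   Winter    452    376.7     75.3
40-49S   Spring    273    294.8    -21.8
40-49S   Summer    422    535.0   -113.0
50-59S   Autumn    526    557.2    -31.2
50-59S   Winter    624    676.0    -52.0
50-59S   Spring    513    529.0    -16.0
50-59S   Summer   1059    959.9     99.1
60-79S   Autumn    980   1008.3    -28.3
60-79S   Winter   1200   1223.3    -23.3
60-79S   Spring    995    957.3     37.7
60-79S   Summer   1751   1737.1     13.9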
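A short sketch of the residual calculation follows. The raw residual fo-fe is what the slide defines. The Pearson residual, (fo-fe)/sqrt(fe), is an added assumption here: it is one common way to account for the size of fe, and its square is the cell's contribution to the chi-squared statistic.

```python
import math

# (region, season, observed count fo, expected count fe) as listed in the table.
cells = [
    ("40-49S", "Autumn", 370, 310.5),
    ("40-49S", "Winter", 452, 376.7),
    ("40-49S", "Spring", 273, 294.8),
    ("40-49S", "Summer", 422, 535.0),
    ("50-59S", "Autumn", 526, 557.2),
    ("50-59S", "Winter", 624, 676.0),
    ("50-59S", "Spring", 513, 529.0),
    ("50-59S", "Summer", 1059, 959.9),
    ("60-79S", "Autumn", 980, 1008.3),
    ("60-79S", "Winter", 1200, 1223.3),
    ("60-79S", "Spring", 995, 957.3),
    ("60-79S", "Summer", 1751, 1737.1),
]

for region, season, fo, fe in cells:
    residual = fo - fe                   # raw residual, as defined on the slide
    pearson = residual / math.sqrt(fe)   # scaled residual (assumption, see above)
    print(f"{region} {season:<6} {residual:8.1f} {pearson:7.2f}")

# Cell with the largest contribution (fo - fe)^2 / fe to the test statistic.
largest = max(cells, key=lambda c: (c[2] - c[3]) ** 2 / c[3])
print("largest contribution:", largest[0], largest[1])
```

In these data the largest contributions come from the 40-49S region, with fewer summer cases and more autumn and winter cases than expected.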
2x2 Tables
Each variable has 2 levels
Explanatory variable: groups (typically based
on demographics, exposure, or treatment)
Response variable: outcome (typically
presence or absence of a characteristic)
Measures of association
Relative Risk (Prospective Studies)
Odds Ratio (Prospective or Retrospective)
Absolute Risk (Prospective Studies)
                Outcome   Outcome   Group
                Present   Absent    Total
Group 1         n11       n12       n1.
Group 2         n21       n22       n2.
Outcome Total   n.1       n.2       n..
Relative Risk
Ratio of the probability that the outcome
characteristic is present for one group, relative to
the other
Sample proportions with characteristic from
groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
Relative Risk
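Estimated Relative Risk:
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$
95% Confidence Interval for Population Relative Risk:
$$\left( RR\,e^{-1.96\sqrt{v}} \; , \; RR\,e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828$$
$$v = \frac{1-\hat{\pi}_1}{n_{11}} + \frac{1-\hat{\pi}_2}{n_{21}}$$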
Relative Risk
Interpretation
Conclude that the probability that the outcome is
present is higher (in the population) for group 1 if
the entire interval is above 1
Conclude that the probability that the outcome is
present is lower (in the population) for group 1 if
the entire interval is below 1
Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1
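Example (here the groups are the columns, TNF and Other, and the outcome is COC):

            TNF    Other    Total
COC           7        4       11
No COC      240      734      974
Total       247      738      985

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$RR = \frac{.0283}{.0054} = 5.24$$
$$v = \frac{1-.0283}{7} + \frac{1-.0054}{4} = .3874$$
$$95\%\ CI: \left( 5.24\,e^{-1.96\sqrt{.3874}} \; , \; 5.24\,e^{1.96\sqrt{.3874}} \right) = (1.55 \; , \; 17.76)$$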
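The relative risk interval can be checked with a few lines of Python. This is a sketch of the formulas on the slides; the function name `relative_risk_ci` and its argument names are hypothetical. Because the code does not round the proportions before dividing, it gives RR of about 5.23 and limits that differ from the slide values in the second decimal.

```python
import math

def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Estimated relative risk, v, and the CI (RR e^{-z sqrt(v)}, RR e^{z sqrt(v)})."""
    p1 = n11 / n1_total          # sample proportion with the outcome, group 1
    p2 = n21 / n2_total          # sample proportion with the outcome, group 2
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21
    factor = math.exp(z * math.sqrt(v))
    return rr, v, (rr / factor, rr * factor)

# TNF group: 7 of 247 with COC; Other group: 4 of 738 with COC.
rr, v, (lower, upper) = relative_risk_ci(7, 247, 4, 738)
print(round(rr, 2), round(v, 4), round(lower, 2), round(upper, 2))
```

The whole interval lies above 1, so the conclusion on the slide is unchanged by the rounding.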
Odds Ratio
Odds of an event is the probability it occurs
divided by the probability it does not occur
Odds ratio is the odds of the event for group 1
divided by the odds of the event for group 2
Sample odds of the outcome for each group:
$$odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$
Odds Ratio
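Estimated Odds Ratio:
$$OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}$$
95% Confidence Interval for Population Odds Ratio:
$$\left( OR\,e^{-1.96\sqrt{v}} \; , \; OR\,e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828$$
$$v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$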
Odds Ratio
Interpretation
Conclude that the probability that the outcome is
present is higher (in the population) for group 1 if
the entire interval is above 1
Conclude that the probability that the outcome is
present is lower (in the population) for group 1 if
the entire interval is below 1
Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1
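Example: n11 = 32, n12 = 138, n21 = 105, n22 = 263 (n11 + n12 = 170, n21 + n22 = 368, overall total 538)
$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58$$
$$v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$
$$95\%\ CI: \left( 0.58\,e^{-1.96\sqrt{0.0518}} \; , \; 0.58\,e^{1.96\sqrt{0.0518}} \right) = (0.37 \; , \; 0.91)$$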
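The odds ratio example can be reproduced in the same way. This is a minimal sketch of the formulas above; the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio, v, and the CI (OR e^{-z sqrt(v)}, OR e^{z sqrt(v)})."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    factor = math.exp(z * math.sqrt(v))
    return odds_ratio, v, (odds_ratio / factor, odds_ratio * factor)

odds_ratio, v, (lower, upper) = odds_ratio_ci(32, 138, 105, 263)
print(round(odds_ratio, 2), round(v, 4), round(lower, 2), round(upper, 2))
```

Because the interval is built on the log scale, it is not symmetric around the estimate: 0.58 is closer to the lower limit than to the upper one.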
Absolute Risk
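Difference between the proportions with the
outcome characteristic in the 2 groups
Sample proportions with characteristic from
groups 1 and 2:
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$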
Absolute Risk
Estimated Absolute Risk:
$$AR = \hat{\pi}_1 - \hat{\pi}_2$$
95% Confidence Interval for Population
Absolute Risk:
$$AR \pm 1.96\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_{2.}}}$$
Absolute Risk
Interpretation
Conclude that the probability that the outcome is
present is higher (in the population) for group 1 if
the entire interval is positive
Conclude that the probability that the outcome is
present is lower (in the population) for group 1 if
the entire interval is negative
Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 0
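Example (TNF vs Other, outcome COC):
$$AR = .0283 - .0054 = .0229$$
$$95\%\ CI: .0229 \pm 1.96\sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016 \; , \; 0.0442)$$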
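A short check of the absolute risk interval, using the TNF (7 of 247) and Other (4 of 738) groups from the relative risk example. The function name `absolute_risk_ci` is hypothetical; the calculation follows the formula AR +/- 1.96 times the square root of the sum of the two estimated variances.

```python
import math

def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Difference in sample proportions with a large-sample confidence interval."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    return ar, z * se, (ar - z * se, ar + z * se)

ar, margin, (lower, upper) = absolute_risk_ci(7, 247, 4, 738)
print(round(ar, 4), round(margin, 4), round(lower, 4), round(upper, 4))
```

The upper limit is the estimate plus the margin, roughly .0229 + .0213, and the whole interval is positive.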
                              ALCOHOL
SICKDAYS    Without Risk   Hardly any Risk   Some-Considerable Risk   Total
0 days               347               154                       52     553
1-6 days             113                63                       25     201
7+ days              145                56                       34     235
Total                605               273                      111     989
Measures of Association
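Goodman and Kruskal's Gamma:
$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$
where C is the number of concordant pairs and D is the number of discordant pairs
Kendall's $\tau_b$:
$$\hat{\tau}_b = \frac{C - D}{0.5\sqrt{\left(n_{..}^2 - \sum n_{i.}^2\right)\left(n_{..}^2 - \sum n_{.j}^2\right)}}$$
Example (sick days by alcohol risk):
$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$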
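The counts of concordant and discordant pairs can be obtained directly from the sick days by alcohol risk table. The sketch below assumes the standard definitions: a pair is concordant when one subject is higher on both variables and discordant when it is higher on one and lower on the other. The tau-b formula used is the standard tie-corrected form, which may not be written the same way on the original slide. All function and variable names are illustrative.

```python
import math

# Rows: sick days (0, 1-6, 7+); columns: alcohol risk (without, hardly any, some-considerable).
table = [
    [347, 154, 52],
    [113, 63, 25],
    [145, 56, 34],
]

def concordant_discordant(t):
    """Count concordant (C) and discordant (D) pairs in an ordinal two-way table."""
    rows, cols = len(t), len(t[0])
    c = d = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):      # only cells in lower rows
                for m in range(cols):
                    if m > j:                  # lower and to the right: concordant
                        c += t[i][j] * t[k][m]
                    elif m < j:                # lower and to the left: discordant
                        d += t[i][j] * t[k][m]
    return c, d

C, D = concordant_discordant(table)
gamma = (C - D) / (C + D)

n = sum(sum(row) for row in table)
row_sq = sum(sum(row) ** 2 for row in table)
col_sq = sum(sum(col) ** 2 for col in zip(*table))
tau_b = (C - D) / (0.5 * math.sqrt((n ** 2 - row_sq) * (n ** 2 - col_sq)))

print(C, D, round(gamma, 4), round(tau_b, 3))
```

Both measures agree with the values .062 and .035 in the SPSS Symmetric Measures output. Tau-b is smaller in magnitude than gamma because its denominator also accounts for tied pairs.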
Symmetric Measures

                                        Value   Asymp. Std. Error(a)   Approx. T   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b     .035                   .030       1.187           .235
                     Gamma               .062                   .052       1.187           .235
N of Valid Cases                          989