CONTINGENCY TABLES
DIFFERENCE OF PROPORTIONS
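Suppose the two rows of a 2-by-2 table are independent binomial
samples of sizes $n_1$ and $n_2$, with success probabilities $\pi_1$
and $\pi_2$ and sample proportions $p_1$ and $p_2$. The difference
$\pi_1 - \pi_2$ is estimated by $p_1 - p_2$, with estimated standard error
$$SE(p_1 - p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
[Large-sample] $(1-\alpha)100\%$ CI [due to Wald]:
$$(p_1 - p_2) \pm z_{\alpha/2}\, SE(p_1 - p_2)$$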
EXAMPLE # 1
The following table is from a report on the relationship between
aspirin use and myocardial infarction (heart attacks) by the
Physicians' Health Study Research Group at Harvard Medical
School. The Physicians' Health Study was a five-year
randomized study testing whether regular intake of aspirin
reduces mortality from cardiovascular disease. Every other day,
the male physicians participating in the study took either one
aspirin tablet or a placebo. The study was blind: the
physicians in the study did not know which type of pill they
were taking.
(a) Estimate the probability of suffering myocardial
infarction (MI) for both placebo and aspirin groups.
(b) Construct a 95% CI for the true difference of
probabilities of heart attack between male physicians who
took placebo and those who took aspirin. From this,
determine whether aspirin is effective in diminishing the risk of
heart attack.
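The two parts of this example can be sketched in a few lines of Python. The counts used below (189 MI cases among 11,034 placebo takers, 104 among 11,037 aspirin takers) are the figures commonly cited for this study; the table itself is not reproduced in this text, so treat them as an assumption. The function name `wald_ci_diff` is illustrative only.

```python
import math

def wald_ci_diff(x1, n1, x2, n2, z=1.96):
    """Wald CI for pi1 - pi2 from two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return p1, p2, diff - z * se, diff + z * se

# Assumed counts (not shown in the text): MI cases and group sizes,
# placebo first, aspirin second.
p_placebo, p_aspirin, lo, hi = wald_ci_diff(189, 11034, 104, 11037)

print(f"P(MI | placebo) = {p_placebo:.4f}")
print(f"P(MI | aspirin) = {p_aspirin:.4f}")
print(f"95% CI for the difference: ({lo:.4f}, {hi:.4f})")
```

If the whole interval lies above 0, the data suggest a lower MI probability in the aspirin group under these assumed counts.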
RELATIVE RISK
For 2-by-2 tables, the relative risk (RR) is the ratio
$$RR = \frac{\pi_1}{\pi_2}$$
where it can be any non-negative number. RR = 1.0 iff
$\pi_1 = \pi_2$.
The RR matters because a difference of a given fixed size
carries more weight when the proportions of success (at all
levels of $X$) are close to 0 or 1. For instance, the difference
is 0.009 both for (a) 0.010 versus 0.001 and for (b) 0.410
versus 0.401, yet (a) is more striking because one proportion
is 10 times the other. RR may therefore be more meaningful for
public health implications than the difference of proportions
alone, which can be misleading when the $\pi_i$ are close to
0 or 1.
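The sampling distribution of the sample RR is highly skewed
unless the sample sizes are quite large. For large samples, an
approximate [Wald] $(1-\alpha)100\%$ CI for the true log RR is given by:
$$\log\left(\frac{p_1}{p_2}\right) \pm z_{\alpha/2}\sqrt{\frac{1-p_1}{n_1 p_1} + \frac{1-p_2}{n_2 p_2}}$$
Exponentiating the endpoints gives a CI for RR itself.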
EXAMPLE # 2
Refer to the aspirin use and myocardial infarction (heart
attacks) study by the Physicians' Health Study Research
Group at Harvard Medical School.
(a) Estimate and interpret the RR of heart attack
between male physicians who took placebo and those
who took aspirin.
(b) Construct a 95% CI for the true RR of heart attack
between male physicians who took placebo and those
who took aspirin.
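A minimal sketch of both parts, building the interval on the log scale and exponentiating its endpoints. As before, the counts (189 of 11,034 for placebo, 104 of 11,037 for aspirin) are the commonly cited figures for this study and are an assumption here, and `relative_risk_ci` is a hypothetical helper name.

```python
import math

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """Sample RR with a Wald CI built on the log scale."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # (1 - p) / (n * p) simplifies to (1 - p) / x
    se_log = math.sqrt((1 - p1) / x1 + (1 - p2) / x2)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Assumed counts: placebo first, aspirin second.
rr, rr_lo, rr_hi = relative_risk_ci(189, 11034, 104, 11037)
print(f"RR = {rr:.2f}, 95% CI = ({rr_lo:.2f}, {rr_hi:.2f})")
```

An RR of about 1.8 would mean the estimated MI probability in the placebo group is roughly 80% higher than in the aspirin group.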
ODDS RATIO
For a probability of success $\pi$, the odds (of success)
are defined to be
$$\text{odds} = \frac{\pi}{1-\pi}$$
from which we can get
$$\pi = \frac{\text{odds}}{\text{odds}+1}$$
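For 2-by-2 tables, the odds ratio ($\theta$) is the ratio
$$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$$
where it can be any non-negative number.
Sample odds ratio: $\hat{\theta} = \dfrac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$
When $X$ and $Y$ are independent, $\theta = 1.0$.
$\theta > 1.0$: higher success rate for row [X level] 1
$\theta < 1.0$: higher success rate for row [X level] 2
Values of $\theta$ farther from 1.0 in either direction represent stronger
association between $X$ and $Y$.
$\theta$ is orientation invariant (unlike RR): it is unchanged when rows and columns are interchanged.
$\theta$ may be viewed as the cross-product ratio of joint probabilities,
$\theta = \pi_{11}\pi_{22}/(\pi_{12}\pi_{21})$, when both variables are treated as responses (interdependence).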
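The sampling distribution of $\hat{\theta}$ is highly skewed unless
the sample sizes are quite large. For large samples, an
approximate $(1-\alpha)100\%$ CI for the
true $\log\theta$ [which is symmetric about 0] is given by:
$$\log\hat{\theta} \pm z_{\alpha/2}\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$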
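As an illustration of the large-sample interval for the log odds ratio, the sketch below uses the standard Wald form with standard error $\sqrt{1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}}$. The cell counts are the commonly cited aspirin-study figures (an assumption, since the table is not reproduced in this text), and `odds_ratio_ci` is a hypothetical helper name.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Sample odds ratio with a Wald CI built on the log scale."""
    theta = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lo = math.exp(math.log(theta) - z * se_log)
    hi = math.exp(math.log(theta) + z * se_log)
    return theta, lo, hi

# Assumed cells: (placebo MI, placebo no MI, aspirin MI, aspirin no MI).
theta, or_lo, or_hi = odds_ratio_ci(189, 10845, 104, 10933)
print(f"OR = {theta:.2f}, 95% CI = ({or_lo:.2f}, {or_hi:.2f})")
```

An interval that excludes 1.0 is evidence against independence of the row and column variables.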
If some cell counts ($n_{ij}$) are 0, then $\hat{\theta}$ can either be 0 or $\infty$,
or even undefined if both entries in a row or column are 0. To
adjust for this, an amended estimator is given by
$$\tilde{\theta} = \frac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}$$
i.e., an adjustment of 0.5 is made to each cell count (this also
applies to $SE(\log\hat{\theta})$).
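Note that $\theta = RR \times \dfrac{1-\pi_2}{1-\pi_1}$.
Hence, whenever direct estimation of RR is not
possible, one can estimate $\theta$ instead, and use it to
approximate RR, as long as $\pi_1 \approx 0$ and $\pi_2 \approx 0$.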
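The 0.5 adjustment and the rare-outcome approximation of RR by the odds ratio can both be checked numerically. In this sketch the zero-cell table is purely hypothetical, the aspirin counts are the commonly cited figures for that study (an assumption, as the table is not reproduced here), and `odds_ratio` is an illustrative helper.

```python
def odds_ratio(n11, n12, n21, n22, c=0.0):
    """Sample odds ratio; c=0.5 gives the amended estimator."""
    return ((n11 + c) * (n22 + c)) / ((n12 + c) * (n21 + c))

# Hypothetical table with an empty cell (n12 = 0).
zero_cell = (5, 0, 3, 7)
try:
    plain = odds_ratio(*zero_cell)
except ZeroDivisionError:
    plain = float("inf")  # the unadjusted estimate is infinite
amended = odds_ratio(*zero_cell, c=0.5)
print(f"plain = {plain}, amended = {amended:.2f}")

# Rare outcome (assumed aspirin-study counts): OR is close to RR.
p1, p2 = 189 / 11034, 104 / 11037
theta = odds_ratio(189, 10845, 104, 10933)
rr = p1 / p2
print(f"OR = {theta:.3f}, RR = {rr:.3f}")
```

The ratio of the two measures equals $(1-p_2)/(1-p_1)$, which is near 1 when both proportions are small.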
ODDS RATIO AND CASE-CONTROL STUDIES
In most case-control studies, the marginal distribution of the
response variable is fixed by the sampling design.
Because such studies are retrospective, one can only construct conditional
distributions for the explanatory variable, within levels of
the response outcome of interest. In this case, only $\theta$ can
be estimated, thanks to its invariance to orientation.
Thus, for relatively rare successes [usually rare diseases],
RR is usually approximated by $\hat{\theta}$.
TESTS OF INDEPENDENCE
Consider $H_0$: the cell probabilities equal certain fixed values $\{\pi_{ij}\}$.
For a sample of size $n$ with cell counts $\{n_{ij}\}$, the values
$\{\mu_{ij} = n\pi_{ij}\}$ are expected frequencies, i.e. $\{E(n_{ij})\}$
when $H_0$ is true.
To arrive at a decision, $\{n_{ij}\}$ is compared with $\{\mu_{ij}\}$: if
$H_0$ is true, the differences $\{n_{ij} - \mu_{ij}\}$ must be small, i.e. larger
differences provide stronger evidence against $H_0$.
Test statistics used to make such comparisons have large-sample
$\chi^2$ distributions.
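Recall the chi-squared distribution with $df$ degrees of freedom, $\chi^2(df)$:
Mean: $df$
Variance: $2\,df$
$\chi^2(df) = \text{Gamma}(df/2,\ 2)$ [shape, scale]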
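PEARSON STATISTIC
$$X^2 = \sum_{i,j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$$
score statistic
minimum at 0 if all $n_{ij} = \mu_{ij}$
p-value: $P(\chi^2_{df} \ge X^2_{obs})$
$$G^2 = 2\sum_{i,j} n_{ij}\log\left(\frac{n_{ij}}{\mu_{ij}}\right)$$
likelihood-ratio statistic [based on multinomial assumption]
minimum at 0 if all $n_{ij} = \mu_{ij}$
p-value: $P(\chi^2_{df} \ge G^2_{obs})$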
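In two-way tables, the null hypothesis of statistical independence
has the form
$$H_0: \pi_{ij} = \pi_{i+}\,\pi_{+j} \quad \text{for all } i, j$$
so that $\mu_{ij} = n\pi_{ij} = n\,\pi_{i+}\,\pi_{+j}$.
Note: the estimated expected frequencies are $\{\hat{\mu}_{ij} = n\,p_{i+}\,p_{+j} = n_{i+}\,n_{+j}/n\}$.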
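For testing independence in I x J contingency tables,
the $X^2$ and $G^2$ statistics use the estimated expected frequencies $\hat{\mu}_{ij}$:
$$X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}, \qquad G^2 = 2\sum_{i,j} n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right)$$
Both have large-sample $\chi^2$ distributions with $df = (I-1)(J-1)$.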
Recall:
The degrees of freedom are obtained by taking the difference
between the number of parameters [cell probabilities] under the alternative
[for which there are $IJ - 1$ non-redundant parameters] and null
[for which there are $(I-1)+(J-1)$ non-redundant parameters]
hypotheses, i.e.,
$$df = (IJ - 1) - [(I-1) + (J-1)] = (I-1)(J-1)$$
EXAMPLE # 4
The following table, from the 2000 General Social
Survey, cross-classifies gender and political party
identification. Subjects indicated whether they identified
more strongly with the Democratic or Republican party
or as Independents. The table also contains estimated
expected frequencies for $H_0$: independence between
gender and political party identification.
Determine whether a significant association between gender
and political party identification exists.
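A sketch of the test for this example, computing the estimated expected frequencies, both statistics, and the degrees of freedom. The counts (females 762, 327, 468; males 484, 239, 477 for Democrat, Independent, Republican) are the commonly cited figures for this survey table; the table is not reproduced in this text, so they are an assumption. The p-value line relies on the closed form $e^{-x/2}$, which is exact for the chi-squared tail only when df = 2.

```python
import math

def expected_counts(table):
    """Estimated expected frequencies n_i+ * n_+j / n under independence."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]

def independence_stats(table):
    """Pearson X^2, likelihood-ratio G^2, and df for an I x J table."""
    mu = expected_counts(table)
    pairs = [(o, e) for r_obs, r_exp in zip(table, mu)
             for o, e in zip(r_obs, r_exp)]
    x2 = sum((o - e) ** 2 / e for o, e in pairs)
    g2 = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df

# Assumed counts: rows = (female, male), columns = (Dem, Ind, Rep).
gss = [[762, 327, 468], [484, 239, 477]]
x2, g2, df = independence_stats(gss)
p_value = math.exp(-x2 / 2)  # chi-squared tail, valid only for df = 2
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}, p = {p_value:.2e}")
```

For other degrees of freedom a chi-squared tail routine from a statistics library would be needed in place of the closed form.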
RESIDUALS FOR CELLS
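A cell-by-cell comparison of observed and estimated expected
frequencies helps us better understand the nature of the
evidence.
However, it is insufficient to rely simply on the
raw cell differences $n_{ij} - \hat{\mu}_{ij}$. The standardized residual
$$r_{ij} = \frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}\,(1 - p_{i+})(1 - p_{+j})}}$$
follows a [large-sample] standard normal distribution under $H_0$.
Large $|r_{ij}|$ (as compared to 0): evidence of lack of fit of $H_0$ in that cell,
i.e., at a significance level $\alpha$, one expects about $100\alpha\%$ of the
standardized residuals to fall beyond the corresponding cutoff (about $\pm 2$, or $\pm 3$ if there are many cells) by chance
alone under $H_0$.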
EXAMPLE # 5
Refer to the gender and political party identification
example. The following table shows the standardized
residuals for testing independence in the previous
example. Interpret the computed standardized
residuals in relation to the observed global result for
testing independence between gender and political party
identification.
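The standardized residuals for this example can be sketched as follows, using the form $(n_{ij} - \hat{\mu}_{ij}) / \sqrt{\hat{\mu}_{ij}(1-p_{i+})(1-p_{+j})}$. The counts are again the commonly cited figures for the survey table (an assumption, as the table is not reproduced in this text), and `standardized_residuals` is an illustrative name.

```python
import math

def standardized_residuals(table):
    """Standardized residuals for the independence model in an I x J table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    out = []
    for i, row in enumerate(table):
        res_row = []
        for j, obs in enumerate(row):
            mu = row_tot[i] * col_tot[j] / n
            var = mu * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            res_row.append((obs - mu) / math.sqrt(var))
        out.append(res_row)
    return out

# Assumed counts: rows = (female, male), columns = (Dem, Ind, Rep).
gss = [[762, 327, 468], [484, 239, 477]]
resid = standardized_residuals(gss)
for label, row in zip(("female", "male"), resid):
    print(label, [round(r, 2) for r in row])
```

Under these counts the Democrat and Republican cells carry the large residuals, while the Independent cells stay well inside $\pm 2$.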
STANDARDIZED RESIDUALS
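Notice that the residuals for the females are the negatives
of those for the males. In general, the raw residuals $n_{ij} - \hat{\mu}_{ij}$ in each
column must sum to 0, as the observed counts and the
expected frequencies are constrained by the same row
and column totals. In particular, for 2 x J tables,
$$n_{1j} - \hat{\mu}_{1j} = -(n_{2j} - \hat{\mu}_{2j})$$
and the same sign pattern carries over to the standardized residuals.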
PARTITIONING
Recall: Let $X_1^2$ and $X_2^2$ be independent $\chi^2$ RVs with degrees of
freedom $\nu_1$ and $\nu_2$, respectively. Then
$$X^2 = X_1^2 + X_2^2 \sim \chi^2(\nu_1 + \nu_2)$$
In essence, this enables one to separate/collapse rows or columns
of I x J tables into several sub-tables, and obtain $X^2$ or $G^2$ statistics
whose sum over the partitioned components is the
global statistic.
Consider: For a test of independence in a 2 x J table, a $\chi^2$
statistic can be broken down into $J - 1$ components: [1] the
first two columns; [2] collapsing of the first two columns, then
compared with the 3rd column; [3] collapsing of the first three
columns, then compared with the 4th column, etc. until the Jth
column is considered. In particular, this is true for $G^2$, for which the partition is exact.
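The column-by-column partition described above can be verified numerically for $G^2$. The sketch below uses a 2 x 3 table, so there are two components; the counts are the commonly cited gender by party figures and are an assumption, and `g2` is a hypothetical helper name.

```python
import math

def g2(table):
    """Likelihood-ratio statistic for independence in an I x J table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                total += obs * math.log(obs * n / (row_tot[i] * col_tot[j]))
    return 2 * total

# Assumed 2 x 3 counts: rows = (female, male), columns = (Dem, Ind, Rep).
full = [[762, 327, 468], [484, 239, 477]]

# Component 1: first two columns only.
first_two = [row[:2] for row in full]
# Component 2: first two columns collapsed, compared with the third.
collapsed = [[row[0] + row[1], row[2]] for row in full]

g2_full = g2(full)
g2_parts = g2(first_two) + g2(collapsed)
print(f"global G2 = {g2_full:.4f}, sum of components = {g2_parts:.4f}")
```

Each component has df = 1, and the two add up to the global df = 2.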
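While it might seem more natural to obtain $G^2$ for separate 2 x 2
tables that pair each column with a particular one (say, the last),
such components are not independent and do not sum to the global $G^2$.
SOME COMMENTS ON CHI-SQUARED TESTS
These tests require a large sample size $n$
relative to the number of cells $IJ$. Moreover, $X^2$
still provides a decent approximation even if some expected
frequencies are as small as 1.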
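For small samples, the large-sample approximation can be replaced by
an exact test (Fisher's exact test for 2 x 2 tables), which conditions on
the row totals $\{n_{i+}\}$ and column totals $\{n_{+j}\}$. A small-sample null
probability distribution for the cell counts that does not
depend on unknown parameters results from considering the
set of tables having the same row and column totals. Under this
condition, each table $\{n_{ij}\}$ is determined by $n_{11}$, which under $H_0: \theta = 1$
is hypergeometric with
$$P(n_{11}) = \frac{\binom{n_{1+}}{n_{11}}\binom{n_{2+}}{n_{+1} - n_{11}}}{\binom{n}{n_{+1}}}$$
Hence, the p-value equals the sum of hypergeometric probabilities
for outcomes at least as favorable to $H_a$ as the observed table, e.g. for
$H_0: \theta = 1$ versus $H_a: \theta > 1$.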
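A minimal sketch of the one-sided exact p-value, summing hypergeometric probabilities over tables with the observed margins and an $n_{11}$ at least as large as the one observed. The 2 x 2 table used here is hypothetical and chosen only for illustration, as is the helper name `fisher_one_sided`; the alternative is assumed to be $\theta > 1$.

```python
from math import comb

def fisher_one_sided(n11, n12, n21, n22):
    """P(N11 >= n11) under H0: theta = 1, with both margins fixed."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    n = row1 + row2
    denom = comb(n, col1)
    p = 0.0
    # Larger n11 means stronger evidence for theta > 1.
    for t in range(n11, min(row1, col1) + 1):
        p += comb(row1, t) * comb(row2, col1 - t) / denom
    return p

# Hypothetical small table, for illustration only.
p_exact = fisher_one_sided(3, 1, 1, 3)
print(f"one-sided exact p-value = {p_exact:.4f}")
```

With margins of 4 in every row and column, only $n_{11} = 3$ and $n_{11} = 4$ contribute, giving $(16 + 1)/70$.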