
SPECIAL CONTRIBUTION

biostatistics

Introduction to Biostatistics: Part 4, Statistical Inference Techniques in
Hypothesis Testing

Gary M Gaddis, MD, PhD*
Monica L Gaddis, PhD†
Kansas City, Missouri

From the Departments of Emergency Health Services* and Surgery,† Truman
Medical Center, University of Missouri-Kansas City School of Medicine.

Received for publication September 1, 1989. Accepted for publication
March 30, 1990.

Address for reprints: Monica L Gaddis, PhD, Department of Surgery, Truman
Medical Center, 2301 Holmes, Kansas City, Missouri 64108.

Statistical methods used to test the null hypothesis are termed tests of
significance. Selection of an appropriate test of significance is dependent
on the type of data to be analyzed and the number of groups to be compared.
Parametric tests of significance are based on the parameters, mean, standard
deviation, and variance, and thus are used appropriately when interval or
ratio data are analyzed. The t-test and analysis of variance (ANOVA) are
examples of parametric tests of significance. Assumptions regarding the data
to be analyzed when using the t-test or ANOVA include normality of the
populations from which the sample data are drawn, homogeneity of the
variances of the populations from which the sample data are drawn, and
independence of the data points within a sample group. The t-test is the
appropriate test of significance to use if there are only two groups to
compare. If there are three or more groups to compare, ANOVA is the
appropriate test. ANOVA holds the preset α level constant. While ANOVA will
imply a significant difference between the groups compared, a multiple
comparison test will define which of the three or more groups differ
significantly. [Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 4,
statistical inference techniques in hypothesis testing. Ann Emerg Med July
1990;19:820-825.]

INTRODUCTION
The research process follows an organized, stepwise pattern. A problem
is identified, the research hypothesis is generated, methods of data collec-
tion are devised, and the statistical analysis of the data to be collected is
designed. Calculation of measures of central tendency and variability is
easily completed, but alone these numbers have only descriptive value.
Making a decision to reject or accept the null hypothesis (H0) requires
much more extensive statistical analysis of the data.
Statistical methods used to test the null or statistical hypothesis (H0) are
termed tests of significance.1 Recall from Part 3 of this series [May
1990;19:591-597] that hypothesis testing involves accepting or rejecting
H0.2 Selection of an appropriate test of significance is dependent on several
factors, including the number of groups to be compared and the type of
data to be analyzed. This fourth in the series of six articles will address the
concepts of parametric statistical inference techniques in hypothesis test-
ing.

PARAMETRIC VERSUS NONPARAMETRIC METHODS


The mean and the standard deviation (SD) of a population describe a
normally distributed population.3 (Because the SD is computed as the
square root of the variance, it can also be said that the variance describes
a normal distribution.) Not only are the mean, median, and mode
equal in a normal distribution of data, but known percentages of data fall
within set SDs from the mean with a normally distributed set of data. The
mean, SD, and variance of a population are termed parameters of that pop-
ulation. Parametric statistical methods are based on these parameters. 1
Thus, given the relationship between these parameters and normality, the
underlying assumption of parametric statistical methods is that the data
being analyzed are normally distributed. If the data are not normally
distributed and cannot be defined as interval or ratio data, other statistical
methods appropriately termed nonparametric statistical methods are used.

19:7 July 1990 Annals of Emergency Medicine 820/137

BIOSTATISTICS
Gaddis & Gaddis

FIGURE. Flow chart for multiple comparison decisions (adapted from Hopkins
and Chadbourn [1967] and Keppel [1973]). Node labels recoverable from the
chart: set α level; F test; planned comparisons; control of experimentwise
error rate? (yes/no); LSD test; layer method (Newman-Keuls test, Duncan
test); experimentwise method; contrasts with control only (Dunnett test);
small or large number of contrasts (small: Dunn test; large: more than two
means involved in contrast? no: Tukey test; yes: Scheffé test).

In addition to differences in type of data analyzed and the assessment of
normality of the data, there are other characteristics possessed by these
two classifications of statistical tests that illustrate their inherent
differences. First, parametric tests prove to be more powerful than
nonparametric tests. That is, if a difference between groups truly exists,
all else being the same, that difference would more likely be found using the
parametric test. Furthermore, more information about the data is generated
from parametric tests.1 However important these differences are, the
nonparametric statistical test should not be discounted. Because not all
data are normally distributed and not all are of an interval or ratio scale,
nonparametric methods that are sound in their mathematical theory often
offer the only legitimate means of data analysis available.

PARAMETRIC STATISTICAL INFERENCE TESTS
t-Test
Student's t-test (t-test) is the parametric statistical method with which
researchers are most often familiar. It is certainly the most common
statistical method reported in the medical literature.1 The t-test is used to
accept or reject H0. It is simplistic in that a comparison between two groups
can be made and a decision rendered without further analysis. Yet the t-test
is powerful; it is a parametric method that mathematically and theoretically
is based on the means, SDs, and variances of the data.

The t-test also requires that several assumptions regarding the data be made
prior to use. If the data do not meet the assumptions, then the t-test is not
the appropriate method to use. Assumptions of the t-test include the
following: 1) The populations from which the samples were drawn should
approach a normal distribution; 2) the variances of the populations from
which sample 1 and sample 2 were drawn should be equal or nearly equal; and
3) the observations within a population or sample group should be
independent, ie, "not paired, matched, correlated, or interdependent in any
way."4

While these assumptions are important, the t-test is robust enough to be an
appropriate test if an assumption is not met in the strictest sense
(excepting the assumption of independence, which must be met at all
times).4,5 However, this is not to say that it is appropriate to use the
t-test for nominal or ordinal data or data that do not come from a normally
or near-normally distributed population.

While the t-test is used to compare two sample groups, the experimental
design of the study must be considered because not all t-tests are the same.
Consideration of the following is important: 1) Are the observations between
groups independent (as is the case for a control vs experimental group
design), so that a nonpaired t-test is appropriate? 2) Are the observations
between groups dependent (as is the case for a pretest/post-test design), so
that a paired t-test is appropriate? 3) Are the groups equal or unequal in
size? 4) Is the comparison between a population mean and sample mean or
between two sample means? 5) Is the direction of the difference between the
two groups known or unknown? If a direction of difference is postulated, the
t-test is termed a one-tailed test. If no direction of difference is
postulated, the t-test is termed two-tailed.

A very common experimental design in the medical literature is a situation in
which there are two different independent groups, a control group and an
experimental group. For example, suppose a new drug is being tested to see if
it will decrease arterial pressure in persons with hypertension. Two sample
groups would be selected by random assignment. Group 1 will receive a placebo
while group 2 will receive the drug in question. The alpha (α) level is
preset. (Because the drug in question is hypothesized to lower arterial
pressure, a direction of change is postulated, and this data should be tested
by a one-tailed t-test.) The data are collected, descriptive statistics are
calculated, and the t value is computed. The t-test calculation is easily
referenced.4-6

Once a t value is obtained, the researcher should consult a table of critical
values for t with the appropriate α level and degrees of freedom. If the
calculated t value is greater than the critical t, H0 is rejected and it is
concluded that the medication in question does lower diastolic arterial
pressure in hypertensives. If the calculated t value is less than the
critical t, H0 is accepted as tenable.

Another experimental design common to the medical literature is the
pretest/post-test design. This results in dependent or related data between
groups (repeated measure) and is analyzed using the paired t-test.

For example, a new thrombolytic agent is developed that is postulated to halt
the progression of a myocardial infarction. Patients entering the emergency
department with an evolving myocardial infarction undergo Doppler
echocardiography to assess ejection fraction. Following this procedure, the
experimental thrombolytic agent is administered. Two days later, ejection
fraction is assessed again. Pre- and post-thrombolytic administration data
are compared using a paired t-test so that patients serve as their own
controls. The lack of a significant difference between pre- and
post-treatment ejection fraction estimates is expected if the drug is
efficacious.

The t-test is the method of choice when making a single comparison between
two groups whose data meet the assumptions required of parametric analysis
methods. However, what is done if the experimental design consists of three
or more groups to be compared? The researcher may incorrectly compare these
groups using several t-tests. For example, if an experiment consisted of one
control group (C), and three experimental groups (E1, E2, E3), the
comparisons made using t-tests would be C versus E1, C versus E2, C versus
E3, E1 versus E2, E1 versus E3, and E2 versus E3. While this seems logical
and certainly easy, it is improper and can lead to serious errors in drawing
conclusions from the data.1,4-6

When several groups from an experiment are compared using "multiple t-tests,"
the probability of making a type I error (rejecting a true H0) is increased
as the number of comparisons made using independent t-tests increases.4 The
increase in α level can be calculated as follows:

Step 1
Number of comparisons:
X = no. of groups in experiment
C = no. of comparisons

$$C = \frac{X(X - 1)}{2}$$

Step 2
Corrected α level:

$$\alpha_{\text{corrected}} = 1 - (1 - \alpha)^{C}$$

Example: As shown above, with four groups, there can be a maximum of
$4(4 - 1)/2 = 6$ paired comparisons. If the original α level was P = .05, the
corrected α will be $1 - (1 - .05)^{6} = .26$. Thus, there is now a .26 chance
of inappropriately rejecting the null hypothesis (type I error) in at least
one of the six comparisons made.4 In most studies, this would be
unacceptable! Should multiple t-tests be made among dependent groups, the
corrected α levels are even greater than those calculated for independent
groups.4 Thus, multiple t-tests should not be accepted as a legitimate means
of data analysis for the comparison of more than two groups.4,6
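The two-step calculation above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `number_of_comparisons` and `corrected_alpha` are hypothetical, the group labels follow the C, E1, E2, E3 example in the text, and the formula assumes the comparisons are independent, as the text does.

```python
from itertools import combinations


def number_of_comparisons(groups: int) -> int:
    """Step 1: number of pairwise comparisons among X groups, C = X(X - 1)/2."""
    return groups * (groups - 1) // 2


def corrected_alpha(alpha: float, comparisons: int) -> float:
    """Step 2: chance of at least one type I error across C independent tests."""
    return 1 - (1 - alpha) ** comparisons


groups = ["C", "E1", "E2", "E3"]        # one control and three experimental groups
pairs = list(combinations(groups, 2))   # every pairwise t-test that could be run
c = number_of_comparisons(len(groups))
inflated = corrected_alpha(0.05, c)

print(pairs)
print(c, round(inflated, 2))  # 6 0.26
```

Raising the number of groups in `groups` shows how quickly the corrected α level grows as more pairwise t-tests are added.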

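The two single-comparison designs described in this section, independent control and experimental groups and a pretest/post-test design, can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names and all data values are hypothetical, the nonpaired calculation assumes the pooled-variance form of the t-test, and the critical values are the standard table entries for α = .05 (one-tailed with 10 degrees of freedom, two-tailed with 5 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(sample_1, sample_2):
    """Pooled-variance (nonpaired) t statistic and its degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    t = (mean(sample_1) - mean(sample_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def paired_t(before, after):
    """Paired t statistic computed on the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical diastolic pressures (mm Hg) for placebo and drug groups.
placebo = [98, 102, 95, 100, 97, 104]
drug = [90, 94, 88, 93, 91, 96]
t_ind, df_ind = independent_t(placebo, drug)
CRITICAL_ONE_TAILED_DF10 = 1.812   # table value, alpha = .05, one-tailed
reject_h0 = t_ind > CRITICAL_ONE_TAILED_DF10   # direction postulated: drug lowers pressure

# Hypothetical ejection fractions (%) before and two days after treatment.
pre = [45, 50, 38, 42, 55, 47]
post = [44, 51, 38, 40, 56, 47]
t_pair, df_pair = paired_t(pre, post)
CRITICAL_TWO_TAILED_DF5 = 2.571    # table value, alpha = .05, two-tailed
h0_tenable = abs(t_pair) < CRITICAL_TWO_TAILED_DF5

print(round(t_ind, 2), df_ind, reject_h0)
print(round(t_pair, 2), df_pair, h0_tenable)
```

With these made-up numbers the nonpaired comparison exceeds its critical t, so H0 is rejected, while the paired comparison does not, so H0 is accepted as tenable, which mirrors the two outcomes discussed in the text.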



ANALYSIS OF VARIANCE
Analysis of variance (ANOVA) has long been an accepted method of comparing
three or more groups from one experiment. The advantages of ANOVA over
multiple t-tests include the following:6 1) The α level is held constant at
the preset level with ANOVA, while the α level for multiple t-tests increases
as the number of comparisons increases;4 2) one ANOVA is less cumbersome to
calculate than are several t-tests; and 3) ANOVA is a more powerful data
analysis method than is the t-test. ANOVA is the appropriate statistical
method to test for differences among more than two groups. Often, it is
assumed that ANOVA is used to determine if there is a difference among the
means of these groups rather than among the groups' collective values. This
is an incorrect assumption. While the mean describes a group in a meaningful
way, it is simply a descriptor of the group. Many statistical references will
discuss ANOVA as a comparison between means, but intragroup and intergroup
variability is what is actually being analyzed.

It is also of value to understand how ANOVA relates to the theory of
hypothesis testing. Without the tedium of a guided tour through the
calculation of ANOVA, a simple explanation of ANOVA follows.

A test of the null hypothesis can be made in terms of two sets of differences
(subjects participate in only one treatment, ie, subjects are "nested" within
treatments). "One of these sets of differences is obtained by comparison of
differences among treatment groups, referred to as external or between-group
differences. The other set is obtained by comparison of differences among
subjects receiving the same treatment within a treatment group, termed
internal or within-group differences. Between-group differences are a result
of the combined influence of the experimental treatment plus experimental
error. Within-group differences are the result of experimental error
alone."7 The comparison ratio:

$$\frac{\text{Between-group differences}}{\text{Within-group differences}}$$

is sensitive to the effects of experimental treatment and can be written as:

$$\frac{\text{Treatment effect} + \text{experimental error}}{\text{Experimental error}}$$

Assuming that the experimental error rate estimates are approximately equal,
any influence of treatment will result in a ratio that is greater than 1.7
The above example of hypothesis testing illustrates the general theory behind
the mathematical calculations of ANOVA.

Just as the t-test involves calculation of a t-statistic, which is compared
with a critical t, ANOVA involves calculation of an F-ratio, which is
compared with a critical F-ratio. The F-ratio answers the question, Is "the
variability between the groups large enough in comparison to the variability
of data within each group to justify the conclusion that two or more of the
groups differ?"6 If the variability between groups is large enough, we can
conclude that there is a significant difference between groups. The F-ratio
is defined as follows:

$$F\text{-ratio} = \frac{\text{Between-groups variance}}{\text{Within-groups variance}}$$

ANOVA is not just one simply defined computation. The experimental design
possibilities are numerous with ANOVA. By using one test, several factors
(eg, drugs, dose levels, dose times) can be analyzed for relationship at one
time. The number of F-ratios calculated in an ANOVA is directly related to
the number of factors in the experimental design. Thus, each ANOVA
computation is unique to the experimental design being tested. It is the
researcher's responsibility to ensure that the appropriate ANOVA is used,
given the design of the study.

The assumptions for ANOVA are the same as those for the t-test.4-6 To
reiterate: 1) The populations from which the samples are drawn should
approach normal distribution; 2) the variances of the populations from which
the samples were drawn should be equal or nearly equal; and 3) the
observations within groups must be independent.

These assumptions can usually be met by random sampling and by use of a good
measurement scale.6 The more that the above assumptions for ANOVA are
violated, the more likely a type I or type II error will be made.6

As with the t-test, ANOVA is robust enough to be an appropriate test if the
above assumptions are not strictly met (excepting the assumption of
independence, which must be met at all times).4,5 When the compared groups
have equal values of n, population variances need not be homogenous. Also,
normality of the population distributions may be violated to a limited degree
without consequence.4-6 Finally, because ANOVA is calculated using a
parameter (variance), it is considered to be a parametric statistical
analysis method and its use should be limited to interval and ratio scale
data.

Thus, there exist many similarities between the t-test and ANOVA. This can
further be extended to the calculated t from the t-test and to the F-ratio
from ANOVA. If an ANOVA was being used instead of the t-test to compare two
groups, it would be found that $F = t^{2}$ for these data.4,5

MULTIPLE COMPARISON METHODS
Following a significant F test, the next logical step would be to ask, Which
of the groups compared in the ANOVA are significantly different? This
question can be answered by the use of multiple comparison procedures. "All
are essentially based upon the t-test but include appropriate corrections for
the fact that more than one comparison is being made."1

There exist numerous legitimate methods of multiple comparison, each looking
for unplanned yet "interesting" differences in the experimental data, but
operating under a different set of rules and assumptions.5 The test that is
selected for use should be the test that meets the needs of the researcher
and the design of the study. But overall, it is important to remember that
the reason for using ANOVA and a multiple comparison method is ultimately to
control the experimentwise error rate (the type I error rate for all
comparisons) while at the same time making several different comparisons.7
The experimentwise error rate can be limited by reducing the number of
comparisons made or reducing the error rate within each comparison. Because
most researchers do not want such imposing conditions placed on their work,
as would be the case by limiting the number of comparisons allowed, the only
other way to control the experimentwise error rate is
to control the type I error rate within each comparison; hence, the purpose
behind multiple comparison techniques. However, it is important to note that
in reducing the type I error rate in such a way, there will be an increase in
the type II error rate. Thus, before progressing, the researcher must
determine which is more detrimental to the work, making type I errors or
making type II errors.7

A summary flow chart of the selection of multiple comparison tests is shown
(Figure). Use of this figure will help guide the researcher to select the
test most appropriate for the experimental design tested and research
questions asked. This flow chart, developed by Hopkins and Chadbourn8 and
modified by Keppel,7 was intended to show the similarities and differences
between some of the various multiple comparison methods. It should not be
used as a "fixed and rigid plan for analysis."7 For the purposes of this
article, this chart serves as a logical guide to aid the reader in the
understanding of multiple comparison methods.

Before any multiple comparison test, an α level is determined. Next, an F
test is performed. If a significant F-ratio is obtained, the process of data
analysis continues to determine which groups differ statistically. The test
of Least Significant Difference (LSD) is an option if the researcher wishes
to control the comparisonwise error rate (individual type I error rates for
each comparison) and if a small number of comparisons, relative to the total
number of comparisons possible, are to be made. However, if the
experimentwise error rate (type I error rate for all comparisons) must be
held constant, other methods of multiple comparison must be considered.

There are two ways to control the experimentwise error rate. These include
the layer or stepwise method and the experimentwise method. The layer method
gradually adjusts the type I error rate. The experimentwise method holds the
type I error rate constant for a set of comparisons. The Newman-Keuls test
and Duncan test are examples of layer methods.

If an experimentwise method is selected, the type of comparisons to be made
will determine the multiple comparison method selected. If comparisons are
made only between a control group and experimental groups, the Dunnett test
is an option of multiple comparison to consider. However, if the group
comparisons are between any groups, there are other test options. The Dunn
test could be considered if there are only a few comparisons to be made. If
there are a large number of comparisons to be made, the Tukey test or the
Scheffé test might be considered. The Tukey test assumes that the groups
being compared are of equal size and is appropriate in the simple comparison
of one group with another. The Scheffé test is based on the F statistic and
thus is less affected by violations of the assumptions of normality and
homogeneity of variances. Should comparisons be desired between complex
combinations of groups, the Scheffé test will be sensitive in detecting real
differences.7

While not included in the flow chart, the Bonferroni t-test is a multiple
comparison method frequently used in medical literature. The Bonferroni
t-test adjusts the preset α level by the number of comparisons to be
made.1,9

$$\alpha_{\text{adj}} = \frac{\alpha_p}{n}$$

where $\alpha_p$ is the preset α level and n is the number of comparisons to
be made. "If each comparison is made using the critical t corresponding to
$\alpha_p/n$, the error rate for all comparisons taken as a group will be at
most $\alpha_p$."1 Thus the preset α level is protected. However, the
Bonferroni t-test becomes very conservative as the number of comparisons made
increases.1

Finally, as previously noted, confidence intervals may be more useful than
multiple comparison tests in analysis of intergroup similarity.2,3,9
"Confidence intervals: 1) Show the degree of uncertainty in each comparison
in an easily interpretable way; 2) make it easier to assess the practical
significance of a difference as well as the statistical significance; and 3)
are less likely to lead non-statisticians to the invalid conclusion that
nonsignificantly different sample means imply equal population means."9

The above discussion of multiple comparison methods and their uses is a basic
overview of just a few of the possible options available to the researcher.
There are other legitimate methods that have not been included in this
discussion because of space limitations. Furthermore, statistical procedures
and opinions on multiple comparison theory are continually evolving. The
researcher is free to select whatever multiple comparison method is desired
as long as the method is appropriate for the experimental design and research
questions asked.

SUMMARY
In conclusion, when selecting the method for hypothesis testing, simplicity
and familiarity must be pushed aside for assurance that the data being
analyzed meet the defined assumptions required for use of a given test. For
the t-test and ANOVA, these assumptions include normality of the populations
from which the data come, homogeneity of the variances of the sample
populations, and independence of the data points within a sample group.

If the experimental design consists of only two groups, the t-test is
appropriate to test for a significant difference between these groups.
However, if there are three or more groups to compare, the t-test is
inappropriate because the preset α level will increase with the number of
comparisons made.

ANOVA is a powerful statistical test to determine simultaneously if there is
a significant difference among three or more groups. While the F-ratio will
tell if significance among any of the groups exists, it gives no information
regarding which of the groups differs.

Thus, following a significant F-ratio, a multiple comparison test can be
selected that will define which of the three or more groups is different. The
multiple comparison method selection is based on the experimental design and
the research questions asked.

REFERENCES
1. Glantz SA: Primer of Biostatistics, ed 2. New York, McGraw-Hill Book Co,
1987.
2. Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 3, Sensitivity,
specificity, predictive value and hypothesis testing. Ann Emerg Med
1990;19:591-597.
3. Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 2, Descriptive
analysis. Ann Emerg Med 1990;19:309-315.
4. Hopkins KD, Glantz GV: Basic Statistics for the Behavioral Sciences.
Englewood Cliffs, New Jersey, Prentice-Hall, Inc, 1978.
5. Sokal RR, Rolph FJ: Biometry, ed 2. New York, WH Freeman and Co, 1981.
6. Elston RC, Johnson WD: Essentials of Biostatistics. Philadelphia, FA Davis
Co, 1987.
7. Keppel G: Design and Analysis: A Researcher's Handbook. Englewood Cliffs,
New Jersey, Prentice-Hall, Inc, 1973.
8. Hopkins KD, Chadbourn RA: A schema for proper utilization of multiple
comparisons in research and a case study. Amer Educ Res J 1967;4:407-412.
9. SAS Institute Inc: SAS/STAT® User's Guide, Release 6.03 edition. Cary,
North Carolina, SAS Institute Inc, 1988, p 1028.

