biostatistics
Statistical methods used to test the null hypothesis are termed tests of significance. Selection of an appropriate test of significance is dependent on the type of data to be analyzed and the number of groups to be compared. Parametric tests of significance are based on the parameters, mean, standard deviation, and variance, and thus are used appropriately when interval or ratio data are analyzed. The t-test and analysis of variance (ANOVA) are examples of parametric tests of significance. Assumptions regarding the data to be analyzed when using the t-test or ANOVA include normality of the populations from which the sample data are drawn, homogeneity of the variances of the populations from which the sample data are drawn, and independence of the data points within a sample group. The t-test is the appropriate test of significance to use if there are only two groups to compare. If there are three or more groups to compare, ANOVA is the appropriate test. ANOVA holds the preset α level constant. While ANOVA will imply a significant difference between the groups compared, a multiple comparison test will define which of the three or more groups differ significantly. [Gaddis GM, Gaddis ML: Introduction to biostatistics: Part 4, statistical inference techniques in hypothesis testing. Ann Emerg Med July 1990;19:820-825.]

Gary M Gaddis, MD, PhD*
Monica L Gaddis, PhD†
Kansas City, Missouri

From the Departments of Emergency Health Services* and Surgery,† Truman Medical Center, University of Missouri-Kansas City School of Medicine.

Received for publication September 1, 1989. Accepted for publication March 30, 1990.

Address for reprints: Monica L Gaddis, PhD, Department of Surgery, Truman Medical Center, 2301 Holmes, Kansas City, Missouri 64108.

INTRODUCTION
The research process follows an organized, stepwise pattern. A problem
is identified, the research hypothesis is generated, methods of data collec-
tion are devised, and the statistical analysis of the data to be collected is
designed. Calculation of measures of central tendency and variability is
easily completed, but alone these numbers have only descriptive value.
Making a decision to reject or accept the null hypothesis (Ho) requires
much more extensive statistical analysis of the data.
Statistical methods used to test the null or statistical hypothesis (Ho) are
termed tests of significance.1 Recall from Part 3 of this series [May
1990;19:591-597] that hypothesis testing involves accepting or rejecting
Ho.2 Selection of an appropriate test of significance is dependent on several
factors, including the number of groups to be compared and the type of
data to be analyzed. This fourth in the series of six articles will address the
concepts of parametric statistical inference techniques in hypothesis test-
ing.

FIGURE. Flow chart for multiple comparison decisions (adapted from Hopkins and Chadbourn [1967] and Keppel [1973]). [Flow chart not reproduced; recoverable labels: set α level; planned comparisons; F test; control of experimentwise error rate? (yes/no); experimentwise method; LSD test; layer method; small/large; Dunn test; Dunnett test; more than two means involved in contrast (no/yes); Tukey test; Scheffé test.]

138/821 Annals of Emergency Medicine 19:7 July 1990

methods appropriately termed nonparametric statistical methods are used.

In addition to differences in type of data analyzed and the assessment of normality of the data, there are other characteristics possessed by these two classifications of statistical tests that illustrate their inherent differences. First, parametric tests prove to be more powerful than nonparametric tests. That is, if a difference between groups truly exists, all else being the same, that difference would more likely be found using the parametric test. Furthermore, more information about the data is generated from parametric tests.1 However important these differences are, the nonparametric statistical test should not be discounted. Because not all
data are normally distributed and not all are of an interval or ratio scale, nonparametric methods that are sound in their mathematical theory often offer the only legitimate means of data analysis available.

PARAMETRIC STATISTICAL INFERENCE TESTS
t-Test
Student's t-test (t-test) is the parametric statistical method with which researchers are most often familiar. It is certainly the most common statistical method reported in the medical literature.1 The t-test is used to accept or reject Ho. It is simplistic in that a comparison between two groups can be made and a decision rendered without further analysis. Yet the t-test is powerful; it is a parametric method that mathematically and theoretically is based on the means, SDs, and variances of the data.

The t-test also requires that several assumptions regarding the data be made prior to use. If the data do not meet the assumptions, then the t-test is not the appropriate method to use. Assumptions of the t-test include the following: 1) The populations from which the samples were drawn should approach a normal distribution; 2) the variances of the populations from which sample 1 and sample 2 were drawn should be equal or nearly equal; and 3) the observations within a population or sample group should be independent, ie, "not paired, matched, correlated, or interdependent in any way."4

While these assumptions are important, the t-test is robust enough to be an appropriate test if an assumption is not met in the strictest sense (excepting the assumption of independence, which must be met at all times).4,5 However, this is not to say that it is appropriate to use the t-test for nominal or ordinal data or data that do not come from a normally or near-normally distributed population.

While the t-test is used to compare two sample groups, the experimental design of the study must be considered because not all t-tests are the same. Consideration of the following is important: 1) Are the observations between groups independent (as is the case for a control vs experimental group design), so that a nonpaired t-test is appropriate? 2) Are the observations between groups dependent (as is the case for a pretest/post-test design), so that a paired t-test is appropriate? 3) Are the groups equal or unequal in size? 4) Is the comparison between a population mean and sample mean or between two sample means? 5) Is the direction of the difference between the two groups known or unknown? If a direction of difference is postulated, the t-test is termed a one-tailed test. If no direction of difference is postulated, the t-test is termed two-tailed.

A very common experimental design in the medical literature is a situation in which there are two different independent groups, a control group and an experimental group. For example, suppose a new drug is being tested to see if it will decrease arterial pressure in persons with hypertension. Two sample groups would be selected by random assignment. Group 1 will receive a placebo while group 2 will receive the drug in question. The alpha (α) level is preset. (Because the drug in question is hypothesized to lower arterial pressure, a direction of change is postulated, and this data should be tested by a one-tailed t-test.) The data are collected, descriptive statistics are calculated, and the t value is computed. The t-test calculation is easily referenced.4-6

Once a t value is obtained, the researcher should consult a table of critical values for t with the appropriate α level and degrees of freedom. If the calculated t value is greater than the critical t, Ho is rejected and it is concluded that the medication in question does lower diastolic arterial pressure in hypertensives. If the calculated t value is less than the critical t, Ho is accepted as tenable.

Another experimental design common to the medical literature is the pretest/post-test design. This results in dependent or related data between groups (repeated measure) and is analyzed using the paired t-test.

For example, a new thrombolytic agent is developed that is postulated to halt the progression of a myocardial infarction. Patients entering the emergency department with an evolving myocardial infarction undergo Doppler echocardiography to assess ejection fraction. Following this procedure, the experimental thrombolytic agent is administered. Two days later, ejection fraction is assessed again. Pre- and post-thrombolytic administration data are compared using a paired t-test so that patients serve as their own controls. The lack of a significant difference between pre- and post-treatment ejection fraction estimates is expected if the drug is efficacious.

The t-test is the method of choice when making a single comparison between two groups whose data meet the assumptions required of parametric analysis methods. However, what is done if the experimental design consists of three or more groups to be compared? The researcher may incorrectly compare these groups using several t-tests. For example, if an experiment consisted of one control group (C), and three experimental groups (E1, E2, E3), the comparisons made using t-tests would be C versus E1, C versus E2, C versus E3, E1 versus E2, E1 versus E3, and E2 versus E3. While this seems logical and certainly easy, it is improper and can lead to serious errors in drawing conclusions from the data.1,4-6

When several groups from an experiment are compared using "multiple t-tests," the probability of making a type I error (rejecting a true Ho) is increased as the number of comparisons made using independent t-tests increases.4 The increase in α level can be calculated as follows:

Step 1
Number of comparisons:
X = no. of groups in experiment
C = no. of comparisons:
$C = \dfrac{X(X - 1)}{2}$

Step 2
Corrected α level:
$\alpha_{\text{corrected}} = 1 - (1 - \alpha)^{C}$

Example: As shown above, with four groups, there can be a maximum of $4(4 - 1)/2 = 6$ paired comparisons. If the original α level was P = .05, the corrected α will be $1 - (1 - .05)^{6} = .26$. Thus, there is now a .26 chance of inappropriately rejecting the null hypothesis (type I error) in at least one of the six comparisons made.4 In most studies, this would be unacceptable! Should multiple t-tests be made among dependent groups, the corrected α levels are even greater than those calculated for independent groups.4 Thus, multiple t-tests should not be accepted as a legitimate means of data analysis for the comparison of more than two groups.4,6
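
The two-step calculation of the inflated α level can be reproduced with a few lines of code. The following is a minimal sketch in Python, not part of the original article; the function names `number_of_comparisons` and `corrected_alpha` are hypothetical and chosen only for illustration. It assumes, as the formula does, that the comparisons are independent.

```python
from itertools import combinations


def number_of_comparisons(groups: int) -> int:
    """Step 1: C = X(X - 1)/2 pairwise comparisons among X groups."""
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    return groups * (groups - 1) // 2


def corrected_alpha(alpha: float, comparisons: int) -> float:
    """Step 2: chance of at least one type I error in C independent tests."""
    return 1 - (1 - alpha) ** comparisons


# The article's example: one control group and three experimental groups.
labels = ["C", "E1", "E2", "E3"]
pairs = list(combinations(labels, 2))  # every pairwise t-test that would be run
c = number_of_comparisons(len(labels))
inflated = corrected_alpha(0.05, c)

print(pairs)
print(c, round(inflated, 2))  # 6 0.26
```

Enumerating the pairs with `itertools.combinations` gives the same six comparisons listed in the text (C versus E1 through E2 versus E3), and the computed value agrees with the .26 reported in the example.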

Jersey, Prentice-Hall, Inc, 1978.
5. Sokal RR, Rohlf FJ: Biometry, ed 2. New York, WH Freeman and Co, 1981.
6. Elston RC, Johnson WD: Essentials of Biostatistics. Philadelphia, FA Davis Co, 1987.
7. Keppel G: Design and Analysis: A Researcher's Handbook. Englewood Cliffs, New Jersey, Prentice-Hall, Inc, 1973.
8. Hopkins KD, Chadbourn RA: A schema for proper utilization of multiple comparisons in research and a case study. Amer Educ Res J 1967;4:407-412.
9. SAS Institute Inc: SAS/STAT® User's Guide, Release 6.03 edition. Cary, North Carolina, SAS Institute Inc, 1988, p 1028.