
A Note on Interpretation of the Paired-Samples t Test

Author(s): Donald W. Zimmerman


Source: Journal of Educational and Behavioral Statistics, Vol. 22, No. 3 (Autumn, 1997), pp. 349-360
Published by: American Educational Research Association and American Statistical Association
Stable URL: http://www.jstor.org/stable/1165289

This content downloaded from 128.240.233.146 on Fri, 25 Dec 2015 18:58:05 UTC
All use subject to JSTOR Terms and Conditions

TEACHER'S CORNER

A Note on Interpretation of the Paired-Samples t Test


Donald W. Zimmerman
Carleton University

Keywords: correlated samples, difference scores, independent samples, matched pairs, nonindependence, paired samples, power, t test, Type I error, Type II error

Explanations of advantages and disadvantages of paired-samples experimental designs in textbooks in education and psychology frequently overlook the change in the Type I error probability which occurs when an independent-samples t test is performed on correlated observations. This alteration of the significance level can be extreme even if the correlation is small. By comparison, the loss of power of the paired-samples t test on difference scores due to reduction of degrees of freedom, which typically is emphasized, is relatively slight. Although paired-samples designs are appropriate and widely used when there is a natural correspondence or pairing of scores, researchers have not often considered the implications of undetected correlation between supposedly independent samples in the absence of explicit pairing.

Many experimental designs in education, psychology, and social sciences employ paired or matched observations. A familiar example is repeated measures on the same subjects over a period of time. Some significance tests of location, including the independent-samples Student t test, are not appropriate for these designs, because the measures usually are correlated rather than independent.

Researchers typically analyze paired data using the paired-samples t test, which essentially is a one-sample Student t test performed on difference scores. Applied statisticians generally are aware of the advantages and disadvantages of this test. First, the correlation associated with pairing or matching of observations reduces the standard error of the difference between means, so the error term differs from that of the independent-samples test. This is apparent from the equation

$$\sigma^2_{\bar{X}-\bar{Y}} = \sigma^2_{\bar{X}} + \sigma^2_{\bar{Y}} - 2\rho\,\sigma_{\bar{X}}\,\sigma_{\bar{Y}}.$$

The correlation term reduces the variance of the difference between means and increases the t ratio. In the context of interval estimation, the reduced standard error results in a narrower confidence interval. For this reason, an experimental design involving paired observations can more accurately detect differences in which a researcher is interested. Similar logic applies to within-subjects ANOVA as opposed to independent-groups ANOVA.

Note. This research was supported by a Carleton University research grant. A listing of the computer program, written in Turbo BASIC, Version 1.0 (Borland, Inc.), can be obtained by writing to the author at 15078 Eagle Place, Surrey, BC V3R 4W2, Canada.
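
As a rough numerical illustration of the equation for the variance of the difference between means (a sketch, not part of the original note), the following Python snippet evaluates the standard error of the difference for several correlations. The function name `se_difference`, the unit standard deviations, and the choice of 10 scores per group are assumptions made only for this illustration.

```python
import math


def se_difference(sigma_x, sigma_y, rho, n):
    """Standard error of the difference between two means of n scores each.

    Implements var(Xbar - Ybar) = var(Xbar) + var(Ybar)
    - 2 * rho * sd(Xbar) * sd(Ybar), with sd(Xbar) = sigma_x / sqrt(n).
    """
    se_x = sigma_x / math.sqrt(n)
    se_y = sigma_y / math.sqrt(n)
    return math.sqrt(se_x ** 2 + se_y ** 2 - 2.0 * rho * se_x * se_y)


if __name__ == "__main__":
    # Hypothetical setting: unit standard deviations, 10 scores per group.
    for rho in (-0.5, 0.0, 0.2, 0.5):
        print(f"rho = {rho:+.1f}  SE = {se_difference(1.0, 1.0, rho, 10):.4f}")
```

A positive correlation shrinks the standard error relative to the uncorrelated case, and a negative correlation enlarges it.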
Second, this gain is partly offset by a loss of degrees of freedom. The one-sample t statistic based on n pairs is evaluated at n - 1 degrees of freedom, while the two-sample t is evaluated at 2n - 2 degrees of freedom. Therefore, authors emphasize that the paired-samples test is preferable if the two groups are highly correlated, while the independent-samples test is the better choice if they are uncorrelated or only slightly correlated. Authors usually do not advise explicit matching or pairing of subjects in an experimental design and subsequent use of a paired-samples t test, unless this procedure produces a substantial correlation. For example, Kurtz (1965) summarized the thinking of many investigators as follows.

The advantage of pairing is seen to depend on the closeness of the relationship established between the two sets of observations as a result of pairing. If a sufficiently high relationship is established, the reduction of the variance of the difference more than compensates for the degrees of freedom lost as a result of pairing; if only a low correlation is established, the gains resulting from reduction of the variance of the difference may be more than offset by the loss of degrees of freedom. (p. 213)

More recently, Hays (1988) wrote,

Such matching may be less efficient than the comparison of unmatched random groups, unless the factor used in matching introduces a relatively strong positive relationship between the means. Although a positive relationship, reflected in a positive covariance term, does reduce the standard error of the difference, this procedure also halves the number of degrees of freedom. Dealing with a sample of N pairs gives only groups of N cases each. Thus, if the factor entering into the matching is only slightly relevant to the differences between the groups or is even irrelevant to such differences, matching is not a desirable procedure. (p. 315)

And Edwards (1979) noted that

the average value of the covariance must be sufficiently large to offset the fact that for the same number of observations, MS_ST will have fewer degrees of freedom than MS_W and will thus require a larger value of F for significance. (p. 128)

See also introductory textbooks by Howell (1987, pp. 204-206), Loether and McTavish (1993, p. 554), and Pagano (1986, pp. 301-304). These recommendations are typical of many authors, although the relative emphasis placed on reduction of the standard error and reduction of degrees of freedom varies from one text to another.

The simulations in the present study reveal that this advice must be qualified and that pairing sometimes is associated with a large difference in the efficiency of the two significance tests when the correlation is quite small. In the case of naturally paired data, even correlations of .10, .15, or .20 make the paired-samples t test mandatory in order to protect against distortion of the significance level. These conclusions are based on examination of power functions as well as degrees of freedom and Type I errors.

The present study also compares the two tests from another point of view and focuses attention on an aspect of the problem which has been overlooked. The comparison of the two procedures frequently made in textbooks fails to take account of an important effect: Nonindependence of observations depresses both Type I error probabilities and the power of the test to detect differences. In other words, a correlation between samples that are believed to be independent compromises not only the efficiency but also the validity of the significance test. Furthermore, the change that occurs is quite large.

Many years ago, Cochran (1947), Scheffé (1959), Walsh (1947), and others discovered that violation of the independence assumption underlying the t and F tests distorts Type I and Type II error probabilities. (See also a recent study by Zimmerman, Williams, & Zumbo, 1993.) However, investigators have not considered these results in the context of paired-samples experimental designs. The present note examines some implications of nonindependence of observations, as investigated in these studies, for interpretation of the paired-samples t statistic.

Paired Data and Nonindependence of Observations


A simulation study consisted of performing independent-samples Student t tests and paired-samples t tests on samples from a normal population. Although it is possible to calculate the power of these tests analytically, a comparison of the two tests is not possible without taking into consideration the change in Type I error probabilities discussed above. In the present study, a computer algorithm induced correlations ranging from -.50 to .50 by adding a multiple of one random variable to each of two other random variables, the multiplicative constant being chosen to produce the desired correlation coefficient.

The algorithm generated N(0, 1) normal deviates by the method of Box and Muller (1958), based on the transformation $X = (-2 \log U_1)^{1/2} \cos 2\pi U_2$, where $U_1$ and $U_2$ are uniformly distributed pseudorandom numbers on the interval (0, 1). In successive replications, constants were added to all scores in one group in increments of .5σ, 1.25σ, or 1.5σ in order to determine both Type I and Type II errors. Sample sizes ranged from 10 to 80. The study performed both one-tailed and two-tailed tests at the .05 significance level. Each data point represents 10,000 replications of the sampling procedure and subsequent significance tests. The purpose of the simulations was to illustrate the arguments in the present note, and they were not intended to be an exhaustive study of properties of the t test.
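
The sampling scheme just described can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the author's Turbo BASIC program: the function names are hypothetical, the rescaling to unit variance is an added convenience, and the handling of negative correlations (adding the common variable to one score and subtracting it from the other) is an assumption, since the text does not say how negative values were produced.

```python
import math
import random


def box_muller(u1, u2):
    """Box-Muller transform: X = (-2 log U1)^(1/2) * cos(2 * pi * U2)."""
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def standard_normal(rng):
    # 1 - rng.random() lies in (0, 1], which avoids log(0).
    return box_muller(1.0 - rng.random(), rng.random())


def correlated_pair(rho, rng):
    """Return (x, y), each N(0, 1), with population correlation rho.

    Assumed construction: x = e1 + c*w and y = e2 + s*c*w, where e1, e2, w
    are independent N(0, 1), c = sqrt(|rho| / (1 - |rho|)), s = sign(rho).
    Dividing by sqrt(1 + c^2) restores unit variance.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    c = math.sqrt(abs(rho) / (1.0 - abs(rho)))
    sign = -1.0 if rho < 0 else 1.0
    scale = math.sqrt(1.0 + c * c)
    w = standard_normal(rng)
    x = (standard_normal(rng) + c * w) / scale
    y = (standard_normal(rng) + sign * c * w) / scale
    return x, y


def sample_correlation(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(1997)  # arbitrary seed
    pairs = [correlated_pair(0.30, rng) for _ in range(10000)]
    xs, ys = zip(*pairs)
    print(f"sample correlation: {sample_correlation(xs, ys):.3f}")
```

With this construction the covariance of the two scores equals c^2 / (1 + c^2) in absolute value, which is |rho| by the choice of c, so the multiplicative constant plays the role described in the text.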

Two Concomitant Effects: Failure to Maintain the Significance Level and Reduction of the Power of the Test

First, consider the two curves in the lower section of Figure 1, which plots the probability of rejecting the null hypothesis as a function of the correlation between paired observations, for both the independent-samples t test and the paired-samples t test, when the null hypothesis is false. There were 20 pairs of observations, and the difference between the means of the two populations was 1.5σ. It is apparent that the efficiency of the paired-samples test increases systematically, while that of the independent-samples test decreases, as the correlation increases from -.50 to .50. When the correlation is zero, the independent-samples test is slightly more powerful than the paired-samples test. This result is consistent with our previous discussion, although investigators do not usually consider negative correlations in the present context.

Examination of the upper section of Figure 1, again based on 20 pairs of observations, reveals a somewhat different pattern. In the simulations represented in this graph, there were no differences between population means, so that the curves represent the probabilities of Type I errors. The paired-samples test maintains the probability close to the .05 significance level despite the increasing correlation. The independent-samples test, however, exhibits a rather large change as the correlation increases. Even a correlation of only .10 or .20 has a substantial influence on this test. Because of this change in the Type I error probability, the values plotted for the independent-samples test in the lower section of Figure 1 cannot be interpreted as the power of the test. Consequently, the values are not comparable to those of the paired-samples test.

Implications of the alteration of the significance level are further illustrated by Figure 2. The upper section of the figure shows power functions of both tests. In this graph, there are 20 pairs of scores, the correlation is zero, and the difference between means increases from 0 to 4.5σ in increments of .5σ. Apparently, the independent-samples test is slightly more powerful than the paired-samples test. The difference between the two curves is accounted for by the fact that the paired-samples test is based on 9 degrees of freedom (critical value of t of 2.262), while the independent-samples test is based on 18 degrees of freedom (critical value of t of 2.101).

In the data plotted in the lower section, the correlation between paired observations is .30. In this case, the paired-samples test dominates the independent-samples test. However, the Type I error probability of the independent-samples test declines to .023, while that of the paired-samples test remains close to .05. For this reason, the two "power" curves are not comparable. Similarly, in the lower section of Figure 1, one cannot conclude that the independent-samples test is preferable for negative correlations, because of the large difference in Type I error probabilities exhibited in the upper section.
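
The decline of the Type I error probability described above can be checked with a small Monte Carlo sketch in Python. It is illustrative only and rests on several assumptions: it uses 10 pairs, so that the tests have 9 and 18 degrees of freedom and the critical values 2.262 and 2.101 quoted in the text apply; it draws normal deviates with the standard library's `random.gauss` rather than the Box and Muller method; and all function names are hypothetical.

```python
import math
import random

T_CRIT_DF18 = 2.101  # two-tailed .05 critical value, independent test (from the text)
T_CRIT_DF9 = 2.262   # two-tailed .05 critical value, paired test (from the text)


def correlated_sample(n, rho, rng):
    """n pairs of N(0, 1) scores with population correlation rho (0 <= rho < 1)."""
    xs, ys = [], []
    for _ in range(n):
        shared = math.sqrt(rho) * rng.gauss(0.0, 1.0)
        xs.append(shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
        ys.append(shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def sum_sq(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def t_independent(xs, ys):
    """Pooled-variance Student t for two groups of equal size n (df = 2n - 2)."""
    n = len(xs)
    pooled_var = (sum_sq(xs) + sum_sq(ys)) / (2 * n - 2)
    return (mean(xs) - mean(ys)) / math.sqrt(pooled_var * 2.0 / n)


def t_paired(xs, ys):
    """One-sample Student t on difference scores (df = n - 1)."""
    d = [a - b for a, b in zip(xs, ys)]
    n = len(d)
    return mean(d) / math.sqrt(sum_sq(d) / (n - 1) / n)


def rejection_rates(n, rho, reps, seed):
    """Proportion of two-tailed rejections under a true null hypothesis."""
    rng = random.Random(seed)
    reject_ind = reject_pair = 0
    for _ in range(reps):
        xs, ys = correlated_sample(n, rho, rng)
        reject_ind += abs(t_independent(xs, ys)) > T_CRIT_DF18
        reject_pair += abs(t_paired(xs, ys)) > T_CRIT_DF9
    return reject_ind / reps, reject_pair / reps


if __name__ == "__main__":
    ind, pair = rejection_rates(n=10, rho=0.30, reps=10000, seed=1997)
    print(f"independent: {ind:.3f}   paired: {pair:.3f}")
```

With a few thousand replications the independent-samples rate typically falls well below .05 while the paired-samples rate stays near .05, in line with the pattern described above; exact values vary with the seed.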

The third curve in Figure 2, labeled adjusted, represents the paired-samples test performed at the .023 significance level. This adjustment of the significance level to allow for the change in Type I error probability makes the two functions comparable. The modified curve remains slightly below that of the independent-samples test over the entire range of differences between means. This slight disparity of the two curves apparently reflects differences in degrees of freedom. It is evident from these figures that even a moderate correlation between observations has a stronger influence on the probability of Type II errors and power than does reduction of degrees of freedom from 18 to 9.

[Figure 1: two panels, n = 20; horizontal axis: Correlation, from -0.5 to 0.5; curves: t-Independent and t-Paired.]
FIGURE 1. Probability of rejecting H0 by the independent-samples t test and the paired-samples t test as a function of correlation
Note. The difference between population means is zero in the upper section and 1.5σ in the lower section.

[Figure 2: two panels, n = 20, ρ = 0 (upper) and ρ = .30 (lower); horizontal axis: Difference in Standard Units; curves: t-Independent, t-Paired, and (lower panel only) t-Adjusted.]
FIGURE 2. Probability of rejecting H0 by the independent-samples t test, the paired-samples t test, and the paired-samples t test with an adjusted significance level as a function of the difference between means (increments of .5σ)
Note. The correlation is zero in the upper section and .30 in the lower section.

Table 1 provides simulation data for sample sizes of 10, 20, 40, and 80 for both one-tailed and two-tailed tests. The difference between means increased in increments of 1.25σ. It is evident that depression of the Type I error probability of the independent-samples test occurs consistently for all sample sizes examined. Furthermore, the relative advantage of the paired-samples test for correlated samples is apparent for all sample sizes.

Conclusions
Inspection of Figures 1 and 2 and Table 1 certainly confirms the widespread belief among researchers and applied statisticians that one should substitute the paired-samples t test for the independent-samples test whenever subjects are coupled or matched in some way in an experimental design. The magnitude of the effect produced by slight correlations probably is greater than most researchers realize. The present results disclose that even a correlation of .10 or .20 seriously distorts the significance level of the t statistic based on independent samples. When power functions are examined, it is apparent that advantages of the paired-samples test are not negligible for small correlations and are exceptional for correlations as high as .40 or .50.

We now examine the problem from another point of view. In making comparisons in the present context, one can ask two distinct questions. The first question is, What gain in efficiency results from using a matched-pairs experimental design instead of an independent-samples design, if matching induces a correlation? The answer to this question is found by comparing the curve representing the paired-samples test in the lower section of Figure 1 with the horizontal broken line. The line represents a constant probability of .308, which is the power of the independent-samples test when the correlation is zero. This comparison makes it clear that the advantage of the paired-samples design becomes greater as the correlation increases from 0 to .50, and that the advantage is quite large for higher correlations. This outcome is consistent with the usual interpretation of the two tests. Of course, the amount of gain depends on the parameters chosen for this particular example. The figure also reveals that a negative correlation results in a loss rather than a gain.

A second question is, What loss occurs if one performs the independent-samples t test inappropriately on measures which are correlated? This question is somewhat more complicated, but it has significant practical applications. The answer can be found by inspecting the two curves (open circles and filled circles) in the lower section of Figure 1. These curves reveal that the difference in the probabilities of rejecting the null hypothesis for the two tests becomes larger as the correlation increases. The paired-samples test dominates when a positive correlation exceeds about .05, and the independent-samples test dominates when the correlation is negative or zero.

TABLE 1
Probability of rejecting H0 by independent-samples t test and paired-samples t test for various numbers of pairs (n) and degrees of correlation between samples (ρ): one-tailed and two-tailed tests

                        ρ = 0               ρ = .20              ρ = .40
  n   Difference   t indep.  t paired   t indep.  t paired   t indep.  t paired

One-tailed tests
 10       0          .049      .050       .035      .051       .022      .051
          1          .297      .283       .284      .331       .249      .391
          2          .716      .683       .734      .766       .762      .860
          3          .952      .937       .964      .968       .980      .993
 20       0          .048      .047       .034      .052       .019      .051
          1          .295      .285       .288      .345       .261      .417
          2          .724      .711       .748      .797       .786      .891
          3          .957      .951       .977      .984       .988      .997
 40       0          .052      .051       .034      .051       .019      .050
          1          .309      .304       .288      .350       .258      .426
          2          .744      .735       .755      .807       .792      .898
          3          .962      .961       .977      .986       .988      .996
 80       0          .050      .050       .033      .048       .017      .051
          1          .317      .314       .279      .346       .267      .435
          2          .743      .736       .761      .812       .791      .899
          3          .962      .958       .978      .986       .989      .997

Two-tailed tests
 10       0          .051      .050       .030      .048       .016      .051
          1          .199      .182       .170      .211       .149      .270
          2          .583      .531       .601      .634       .609      .758
          3          .899      .863       .926      .931       .946      .976
 20       0          .050      .048       .031      .051       .014      .049
          1          .195      .185       .175      .229       .145      .288
          2          .603      .580       .622      .684       .636      .798
          3          .921      .903       .941      .953       .961      .988
 40       0          .051      .051       .030      .050       .013      .052
          1          .218      .215       .178      .238       .143      .302
          2          .630      .620       .636      .710       .652      .820
          3          .930      .923       .944      .963       .966      .991
 80       0          .051      .051       .030      .049       .013      .051
          1          .215      .210       .185      .246       .149      .316
          2          .627      .617       .649      .719       .672      .834
          3          .926      .922       .952      .969       .972      .992

Note. Differences are in units of 1.25σ.

As mentioned earlier, however, the upper section of Figure 1 discloses that the differences in the lower section cannot be interpreted as differences in the power of the test, because the Type I error probability changes as the correlation changes. The two sections of this figure together suggest that a correlation spuriously elevates or depresses the entire power function of the independent-samples test. Instead of referring to power differences, one must state simply that nonindependence compromises the validity of the test and makes the power to detect differences uninterpretable. Although explicit matching is efficient only for positive correlations, this spurious alteration of the significance level occurs for both positive and negative correlations.
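
The direction and rough size of this alteration can be illustrated with a large-sample approximation, sketched below in Python. The approximation is not given in the original note and is offered only as an assumption-laden illustration: with equal variances and correlation ρ, the true variance of the difference between means is proportional to (1 - ρ), whereas the pooled denominator of the independent-samples statistic ignores ρ, so under a true null hypothesis the statistic behaves approximately like a standard normal deviate multiplied by (1 - ρ)^(1/2). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_rejection_rate_under_h0(rho, alpha=0.05, two_tailed=True):
    """Approximate actual Type I error rate of the independent-samples test
    applied to scores with correlation rho (normal approximation, large n)."""
    std_normal = NormalDist()
    tail = alpha / 2.0 if two_tailed else alpha
    z_crit = std_normal.inv_cdf(1.0 - tail)
    # The statistic is roughly sqrt(1 - rho) * Z, so rejection requires
    # Z to exceed z_crit / sqrt(1 - rho).
    one_tail = 1.0 - std_normal.cdf(z_crit / math.sqrt(1.0 - rho))
    return 2.0 * one_tail if two_tailed else one_tail


if __name__ == "__main__":
    for rho in (-0.4, -0.2, 0.0, 0.1, 0.2, 0.4):
        rate = approx_rejection_rate_under_h0(rho)
        print(f"rho = {rho:+.1f}  approximate two-tailed rate = {rate:.3f}")
```

Under this approximation positive correlations push the rate below the nominal .05 and negative correlations push it above, and the approximate values for ρ = .20 and ρ = .40 are of the same order as the zero-difference entries for the independent-samples test in Table 1.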

Authors have not often asked the second question, even though it has practical implications for research. Perhaps an experimenter is unaware of some incidental pairing which induces a correlation between measures of a dependent variable. In other words, a researcher may believe samples to be independent when in reality they are correlated, although perhaps only slightly. Violation of random assignment of subjects to experimental treatments is one possible source of such a correlation, which can invalidate the independent-samples t test. Another source was identified and studied by Coren and Hakstian (1990) and by Zumbo (1996). These investigators examined designs in which each subject contributes two scores to the data pool (for example, measures of two eyes, two ears, and so on, in perceptual research). Researchers sometimes analyze this kind of data as if all measures are independent, ignoring the correlation induced by pairing. This kind of violation is sometimes difficult to detect in otherwise well-designed experiments and probably occurs more often in research studies than is generally realized. Undoubtedly, it can markedly influence the significance level and the probability of rejecting H0. For this reason, the hazards of inappropriately using an independent-samples test probably are more serious than the loss of degrees of freedom resulting from using a paired-samples test when it is not required.

Sometimes researchers fail to identify negative correlations or overlook the fact that negative correlations in paired data have effects quite different from positive correlations (see Figure 1). It is apparent from the equation presented earlier that they result in wider confidence intervals and decreased sensitivity of the paired-samples design. A negative relationship between naturally paired subjects is conceivable in some practical research contexts. For example, Hays (1988, p. 314) suggested that measures of personality dominance of husband-wife pairs could be negatively correlated if highly dominant women are paired with men having low dominance ratings. Matching on the basis of husband-wife pairs therefore could elevate the probability of Type I errors of the independent-samples test and at the same time reduce the power of the paired-samples test, as indicated in Figure 1. One can envision other negative relationships of this sort in education, psychology, and the social sciences. Many of these relationships may be quite difficult to detect, although some can be avoided easily in experimental designs.

The example in Table 2 illustrates some of the practical implications of these conclusions. Suppose a researcher believes that the scores in the two left-hand columns, labeled X and Y, comprise independent samples. A Student t test fails to reject H0 at the .05 significance level. Now, assume that there exists an unknown correspondence of scores as indicated in the next three columns. The second X and Y columns are permutations of the two left-hand columns with the hidden pairing now displayed. In fact, these scores are computer-generated samples from a population in which the correlation between X and Y was .10 and the difference between population means was 4.65. The sample correlation turned out to be .139. Despite this relatively small correlation, which many investigators might consider insignificant, a paired-samples t test now rejects H0 at the .05 significance level.

Let us now look at the same data from another point of view. Suppose an experimenter is aware of the pairing indicated in the table, but believes the small correlation to be unimportant and performs an independent-samples t test in order to take advantage of more degrees of freedom. The result is failure to reject H0, although a paired-samples t test would have a different outcome. If the existence of pairing or matching is known, this kind of oversight is not likely to occur and can be corrected easily. However, it is impossible to know from most published experiments whether or not a hidden pairing might exist. Clearly, even a slight correlation has serious consequences.

TABLE 2
Example of a design in which initially there is an undetected correspondence of values

   t indep.               t paired
   X      Y        Pair     X      Y     D = Y - X

  25     34          1     17     45        28
  32     39          2     25     35        10
  43     34          3     16     27        11
  16     30          4     24     34        10
  34     35          5     43     46         3
  25     46          6     18     30        12
  17     23          7     34     29        -5
  18     27          8     25     39        14
  29     43          9     36     23       -13
  24     45         10     34     43         9
  34     28         11     32     34         2
  36     29         12     29     28        -1

Note. An independent-samples Student t test was first performed without consideration of possible pairing of scores. Then, pairing was recognized, and a one-sample Student t test (i.e., a paired-samples t test) was performed on difference scores.
Independent: t = 2.052, df = 22, p > .05. Paired: t = 2.210, df = 11, p < .05. ρ_xy = .100. r_xy = .139.
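
The statistics reported for Table 2 can be recomputed directly from the paired columns. The Python sketch below is an illustrative check, not the author's program; the function names are hypothetical, and the two-tailed .05 critical values for 22 and 11 degrees of freedom (2.074 and 2.201) are taken from standard t tables rather than from the note.

```python
import math

# Paired X and Y columns of Table 2 (pairs 1 through 12).
X = [17, 25, 16, 24, 43, 18, 34, 25, 36, 34, 32, 29]
Y = [45, 35, 27, 34, 46, 30, 29, 39, 23, 43, 34, 28]

T_CRIT_DF22 = 2.074  # standard table value, two-tailed .05 (not from the note)
T_CRIT_DF11 = 2.201  # standard table value, two-tailed .05 (not from the note)


def mean(values):
    return sum(values) / len(values)


def sum_sq(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def t_independent(xs, ys):
    """Pooled-variance Student t for equal group sizes, df = 2n - 2."""
    n = len(xs)
    pooled_var = (sum_sq(xs) + sum_sq(ys)) / (2 * n - 2)
    return (mean(ys) - mean(xs)) / math.sqrt(pooled_var * 2.0 / n)


def t_paired(xs, ys):
    """One-sample Student t on D = Y - X, df = n - 1."""
    d = [b - a for a, b in zip(xs, ys)]
    n = len(d)
    return mean(d) / math.sqrt(sum_sq(d) / (n - 1) / n)


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / math.sqrt(sum_sq(xs) * sum_sq(ys))


if __name__ == "__main__":
    t_ind, t_pair, r = t_independent(X, Y), t_paired(X, Y), pearson_r(X, Y)
    print(f"independent: t = {t_ind:.3f}, reject H0: {abs(t_ind) > T_CRIT_DF22}")
    print(f"paired:      t = {t_pair:.3f}, reject H0: {abs(t_pair) > T_CRIT_DF11}")
    print(f"sample correlation r = {r:.3f}")
```

The independent-samples statistic does not depend on how the scores are ordered within each column, so the same value is obtained from the two left-hand columns of the table.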

To summarize the conclusions to be drawn from the simulations in the present study: First, if one has a choice between an experimental design with two groups that are unquestionably independent and a design in which subjects are explicitly matched or paired by the experimenter, the former is preferable unless matching produces a substantial correlation. Because of sampling variability, the value of this correlation depends on n. For the sample sizes in the present article, it might be in the range of, say, .30 to .40. Ideally, matching would produce an even higher correlation.

This conclusion is consistent with recommendations of almost all textbook authors, although it should be emphasized that it depends on the assumption that the correlation prior to matching is close to zero. Another way to express this conclusion is that when samples are correlated and an experimental effect exists, the paired-samples test yields a narrower confidence interval. However, to determine the advantage of deliberate pairing, that interval cannot be compared to the one resulting from the independent-samples test performed on the same data. It should be compared to the one resulting from the independent-samples test performed on uncorrelated data when the experimental effect is the same.

Second, if subjects are naturally paired, and an independent-samples design is out of the question (such as a pretest-posttest design using the same subjects), a paired-samples test is mandatory unless one has strong evidence of independence, that is, again, a near-zero correlation between the pretest and posttest measures. Such evidence is not likely to arise in practice. This conclusion is consistent with the recommendations of many but not all authors.

Third, the most unsuspected conclusion to be drawn from the present simulations is this: A researcher employing an independent-samples design should go to great lengths to ensure that supposedly independent groups of subjects are not correlated in some manner that is not obvious. If correlation is suspected, the researcher should redesign the experiment to take it into consideration. Failure to identify even slight dependence of measures can change the significance level and make the results of a statistical test uninterpretable. The present simulations indicate that a correlation coefficient of .10 or .15 is not sufficient evidence of independence, not even for relatively small sample sizes.

Again, it should be emphasized that several distinct questions are relevant to the choice of a significance test. Given that scores of subjects in two groups are independent, should one deliberately induce a correlation through pairing in order to obtain the greater precision of the paired-samples design? What correlation should be required to make that procedure worthwhile? On the other hand, given that scores of subjects in two groups are already paired, what absolute value of the correlation should not be exceeded in order to take advantage of the larger number of degrees of freedom of the independent-samples design?

The present simulation data suggest that the answers to these questions are not the same. For the sample sizes in the present study, the answer to the first is "in the range of .30 to .40 or higher," and the answer to the second is "in the range of .05 to .15 or lower." In any case, choice of an appropriate test cannot be determined solely by the error term in the denominator of the t statistic and calculation of the power of the test. It must also take into consideration a possible alteration of the significance level by a correlation between observations.

References

Box, G. E. P., & Muller, M. (1958). A note on the generation of normal deviates. Annals of Mathematical Statistics, 29, 610-611.
Cochran, W. G. (1947). Some consequences when the assumptions in the analysis of variance are not satisfied. Biometrics, 3, 22-38.
Coren, S., & Hakstian, A. R. (1990). Methodological implications of interaural correlation: Count heads not ears. Perception and Psychophysics, 48, 291-294.
Edwards, A. L. (1979). Multiple regression and the analysis of variance and covariance. New York: Freeman.
Hays, W. L. (1988). Statistics (4th ed.). New York: Holt, Rinehart, & Winston.
Howell, D. C. (1987). Statistical methods for psychology. Boston: Duxbury Press.
Kurtz, K. H. (1965). Foundations of psychological research. Boston: Allyn & Bacon.
Loether, H. J., & McTavish, D. G. (1993). Descriptive and inferential statistics: An introduction (4th ed.). Boston: Allyn & Bacon.
Pagano, R. R. (1986). Understanding statistics in the behavioral sciences (2nd ed.). St. Paul, MN: West.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Walsh, J. E. (1947). Concerning the effect of intraclass correlation on certain significance tests. Annals of Mathematical Statistics, 18, 88-96.
Zimmerman, D. W., Williams, R. H., & Zumbo, B. D. (1993). Effect of nonindependence of sample observations on some parametric and nonparametric statistical tests. Communications in Statistics: Simulation and Computation, 22, 779-789.
Zumbo, B. D. (1996). Randomization test for coupled data. Perception and Psychophysics, 58, 471-478.

Author

DONALD W. ZIMMERMAN is Professor of Psychology (Retired), Carleton University, Ottawa, Canada, and can currently be reached at 15078 Eagle Place, Surrey, BC V3R 4W2, Canada; zimmerma@direct.ca. He specializes in psychology.

Received August 10, 1996
Revision received September 3, 1996
Second revision received October 20, 1996
Accepted December 3, 1996
