Publisher: Taylor & Francis

The American Statistician


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/utas20

A Consonance Criterion for Choosing Sample Size


W. Alan Nicewander (a) & James M. Price (b)
(a) Personnel Testing Division, Defense Manpower Data Center, Seaside, CA 93955, USA
(b) Department of Psychology, Oklahoma State University, Stillwater, OK 74078, USA
Published online: 17 Feb 2012.

To cite this article: W. Alan Nicewander & James M. Price (1997) A Consonance Criterion for Choosing Sample Size, The
American Statistician, 51:4, 311-317

To link to this article: http://dx.doi.org/10.1080/00031305.1997.10474404

Sample sizes determined by the proposed criterion ensure that subjectively important parameter estimates will be statistically significant; alternatively, subjectively trivial values will be excluded from confidence intervals for those parameters. Formulas are given for simple mean differences, multiple correlations, and mean comparison methods in multiple group settings, with suggestions for extension to larger, factorial designs. Although no assumptions are necessary about the true values of unknown parameters or a desired level of power, adequate power is provided for important results. Only central distributions are required for either the exact iterative solutions given or their closed form approximations.

KEY WORDS: Effect size; Mean difference; Power; Scheffé's method; Tukey's method.

W. Alan Nicewander is Chief, Personnel Testing Division, Defense Manpower Data Center, Seaside, CA 93955. James M. Price is Associate Professor, Department of Psychology, Oklahoma State University, Stillwater, OK 74078 (E-mail: psyjmp@vms.ucc.okstate.edu). The authors thank Richard Harris, Frank Schmidt, the associate editor, and the referees for their helpful comments on an earlier version of this paper.

© 1997 American Statistical Association. The American Statistician, November 1997, Vol. 51, No. 4.

1. INTRODUCTION

For both the consultant and the client one of the most vexing problems in the design of an experimental study is the determination of the number of observations to include. Even more vexing is the situation in which a critical or important difference is apparent in the sample data, but statistical statements do not preclude trivial interpretations (e.g., nonsignificance or zero as a possible value for a parameter). Previously proposed methods for determining sample size $n$ have generally been based on optimizing certain aspects of statistical inference. For example, sample size is often chosen so as to yield a preferred value of power (at some specified value of $\alpha$) for an effect size that has been designated as important by the investigator (e.g., Scheffé 1959; Cohen 1988). Alternatively, sample size may be determined by a preference for a given absolute width of a confidence interval (e.g., Stein 1945).

The present inquiry proposes a consonance criterion as an alternative to both the power based and the absolute interval width calculations for sample size. This criterion allows quick and easy sample size determination without the necessity for making assumptions about factors that influence the parameters of the noncentral distribution of a statistic (e.g., the configuration of population parameters). All calculations are based on using the central distribution of the statistic involved. For single degrees of freedom methods the consonance criterion produces formulas that are similar, or identical, to some power-based methods, especially the recent work of Harris and Quade (1992) and Willan (1994). Following an explication of the consonance criterion in the case of two-sample differences, we present generalizations to more complex situations, approximation formulas for all cases presented, power computations, and a discussion of the other proposals just mentioned.

2. THE CONSONANCE CRITERION

As used in this inquiry, consonance means bringing into agreement the values of sample statistics and the statistical statements based upon them. As such, our use of the term is quite similar to Gabriel's (1969) usage in the context of multiple comparison techniques. In what follows we assume that observations are normal and independent, with a common population standard deviation $\sigma$, and that samples are to have an equal number of observations $n$.

Let $\delta$ represent the difference between the population means of two groups in standard deviation units, that is,

$$\delta = |\mu_1 - \mu_2| / \sigma. \tag{1}$$

(In the context of a designed experiment $\delta$ is often designated as the effect size.) Let $\delta_c$ be such a standardized difference that is deemed critical or important by the investigator. The proposed criterion for determining the value for $n$ is:

    Choose $n$ so that if the sample estimate $d$ of $\delta$ is equal to or greater than $\delta_c$, then a $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ will just exclude zero.

This very simple criterion brings into accord (or consonance) the parameter estimation aspect of a study (i.e., estimating $\mu_1 - \mu_2$) and the inferential aspect (i.e., hypothesis testing). Whenever $d$ equals or exceeds $\delta_c$ in magnitude, the null hypothesis $H_0$: $\mu_1 - \mu_2 = 0$ will always be rejected at the $\alpha$ level of significance; likewise, a $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ will never include zero as a possible value. We consider both aspects in the following two sections. In these sections, and the supporting tables, it should be understood that $\delta$ is used as a convenient, unit-free metric for mean differences (especially in situations in which $\sigma$ may not be known to the experimenter), and that all probabilistic statements are oriented towards the difference $\mu_1 - \mu_2$ in raw score form.

2.1 Two-Group Hypothesis Testing

In order to compute the minimum sample size that will always lead to the rejection of $H_0$ when the estimated effect size equals or exceeds $\delta_c$, it is necessary to show the relationship between the estimated effect size and the sample value of the $t$ statistic. Let

$$d = \frac{|\bar{X}_1 - \bar{X}_2|}{s_p} \tag{2}$$
be the sample estimate of the effect size, where

$$s_p^2 = \frac{s_1^2 + s_2^2}{2} \tag{3}$$

is a pooled unbiased estimate of the (assumed common) variance, and $s_1^2$ and $s_2^2$ are the unbiased variance estimates for the two samples. It is easy to show that

$$|t| = d\sqrt{n/2} \tag{4}$$

is the value of the sample $t$ for testing the hypothesis that the group means are equal. In order to find, a priori, the value of $n$ for which one will always reject $H_0$ whenever $d$ equals or exceeds $\delta_c$, one must adjust $n$ iteratively until

$$\delta_c\sqrt{n/2} = t_{1-\alpha/2;\,2(n-1)} \tag{5}$$

where $t_{1-\alpha/2;\,2(n-1)}$ is the two-tailed critical value of $t$ for $2(n - 1)$ df at the $\alpha$ level of significance. Equivalently, the consonant sample size can be found by solving, iteratively, the implicit equation

$$n = \frac{2\,t^2_{1-\alpha/2;\,2(n-1)}}{\delta_c^2}. \tag{6}$$

For example, suppose that an effect size of $\delta_c = .5$ is deemed to be critical. What should $n$ be in order that $H_0$ is always rejected (at $\alpha = .05$, two tailed) when the sample estimate of $\delta$ is equal to or greater than $\delta_c = .5$? Trying successive values for $n$ until the equation holds, it will be found that when $n$ is set to 32, $t = 2.00$, just exceeding the .05 critical value of $t$ for 62 df. A sample of size $n = 32$ will always lead to the rejection of $H_0$ in experiments that yield an estimated effect size of .5 or greater; samples smaller than $n = 32$ may well yield estimated effect sizes of .5 or greater, but without the certainty of rejecting $H_0$.

2.2 Interval Estimation of a Difference

In the context of effect sizes a confidence interval can be thought of as the set of all numbers that, based on the data, are not excluded as possible values of the true effect size, on the basis of chance. Suppose that a sample size is wanted so that if some important difference (in terms of $\delta$) is found in the sample means, then the $(1 - \alpha)100\%$ confidence interval located by this obtained difference will exclude zero as a possible population value. In order to find this value of $n$, which distinguishes important differences from trivial ones, one must solve (again iteratively) the following equation:

$$|\bar{X}_1 - \bar{X}_2| = t_{1-\alpha/2;\,2(n-1)}\; s_p\sqrt{2/n}. \tag{7}$$

Dividing both sides by $s_p$ gives

$$d = t_{1-\alpha/2;\,2(n-1)}\sqrt{2/n} \tag{8}$$

or equivalently

$$d\sqrt{n/2} = t_{1-\alpha/2;\,2(n-1)}. \tag{9}$$

If, as before, $\delta_c$ is substituted for $d$ in Equation (8), then an equation identical to (5) is obtained; further simplification leads to Equation (6), and one can solve for sample sizes such that if the (standardized) mean difference $d$ is $\delta_c$ or greater, then the value of zero will fall outside the $(1 - \alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$.

Stein (1945; Steel and Torrie 1960, pp. 86-87) proposed a somewhat similar method for sample size determination that is based on setting the absolute width of a confidence interval to some a priori value. Such an interval will have a desired absolute precision (say, $\pm 5$ mm), but may also include 0 as a possible value; the present method employs a specified relative level of precision to guarantee exclusion of 0.

For the sake of simplicity and brevity only the hypothesis testing approach is detailed in the following sections. However, a derivation based on the estimation aspect will yield identical formulas.

3. GENERALIZATIONS TO MULTIPLE-GROUP EXPERIMENTS

In what follows the same principles will be used for other techniques: 1) specify a sample difference that is important, 2) restate the appropriate test statistic in terms of that difference, and 3) solve for the value of $n$ that sets the statistic just equal to the $\alpha$-level critical value. The resulting formulas are simple enough for presentation to students at any level, and may well be easier for even occasional users to remember. The methods to be presented are quite straightforward to derive; their derivations are included here for completeness and to point out similarities to, and differences from, the standard power-based techniques.

For multiple-group studies matters are a bit more complicated, and there are at least two ways in which to generalize the consonance criterion discussed above. In the simplest of these generalizations one might want to determine $n$ such that if the two most extreme sample means differ by $\delta_c$ or more, one will always reject the hypothesis of equal group means using: 1) Scheffé's method, or 2) Tukey's method. These two situations are considered separately below.

In this section and those that follow we make use of what we call the Tang Condition. In an unpublished paper in 1938 (Pearson and Hartley 1953), P. C. Tang proposed a worst case situation for power calculations for omnibus tests in the analysis of variance (ANOVA). In this scenario one assumes that (with the population means ordered from largest to smallest)

$$\begin{aligned}
\mu_1 &= \mu + (\delta_c\sigma/2) \\
\mu_2 &= \mu \\
\mu_3 &= \mu \\
&\;\;\vdots \\
\mu_{J-1} &= \mu \\
\mu_J &= \mu - (\delta_c\sigma/2)
\end{aligned} \tag{10}$$

where $\mu$ is the grand mean (see, e.g., Cohen 1988, p. 277; Scheffé 1959, pp. 63-64). This configuration of population
means represents the minimum departure from $H_0$ such that at least one pair of population means differs by $\delta_c\sigma$. As such, this arrangement yields the minimum important value for the noncentrality parameter of the appropriate sampling distribution. If one calculates the sample size under the assumptions of the Tang Condition, then the value of $n$ produced will be an upper bound to the sample size actually needed to produce a specified level of power.

The consonance criterion makes no assumptions about population means or desired power. Aside from the later comparisons of $n$ and power produced by different methods, we will assume that the configuration of sample means is as described by (10) when population means are replaced by corresponding sample means and the common variance $\sigma^2$ is replaced by $MS_{error}$.

3.1 Scheffé's Method of Simultaneous Confidence Intervals

Given an experiment with $J \ge 2$ treatment conditions in a single factor layout, let

$$d_{max} = \frac{\bar{X}_L - \bar{X}_S}{\sqrt{MS_{error}}} \tag{11}$$

be the sample estimate of $\delta$ for the pair of sample means with the largest difference, where $\bar{X}_L$ and $\bar{X}_S$ are, respectively, the largest and smallest sample means. The $t$ test statistic for this comparison is given by

$$t = d_{max}\sqrt{n/2}. \tag{12}$$

In order to satisfy Scheffé's criterion, with a common consonant sample size, $n$ is adjusted iteratively until

$$\frac{n\delta_c^2}{2(J-1)} = F_{1-\alpha;\,(J-1);\,J(n-1)}. \tag{13}$$

Equivalently, this consonance criterion-based sample size can be found using an iterative solution to the implicit equation

$$n = \frac{2(J-1)\,F_{1-\alpha;\,(J-1);\,J(n-1)}}{\delta_c^2}. \tag{14}$$

A bonus of this approach results from the isomorphism between the omnibus ANOVA $F$ test and the maximum normalized comparison among means. If the value of $n$ given by the previous equations is used in a $J$-sample layout, both the ANOVA $F$ and the $t$ for the maximum pairwise comparison of means will be significant whenever $d_{max} \ge \delta_c$. Put another way, when the ANOVA $F$ test is significant and $d_{max} \ge \delta_c$, at least one pairwise mean comparison will also be significant by Scheffé's method.

3.2 Tukey's Method for the Maximum Pairwise Difference

If, in the multiple-group situation just described, one prefers using Tukey's method to investigate differences in group means, then the following approach can be used. As before, let $d_{max}$ be the largest standardized pairwise difference in the sample means. Then the studentized range test statistic $q$ is given by

$$q = \frac{\bar{X}_L - \bar{X}_S}{\sqrt{MS_{error}/n}} = d_{max}\sqrt{n} \tag{15}$$

where $MS_{error}$ is the mean-square error from a corresponding one-way ANOVA. Given this relationship between maximum sample effect size $d_{max}$ and the studentized range statistic $q$, it is obvious from previously developed logic that the value of $n$ that will guarantee rejection of the hypothesis of equality of the population means when $d_{max} \ge \delta_c$ must satisfy

$$\delta_c\sqrt{n} = q_{1-\alpha;\,J;\,J(n-1)} \tag{16}$$

where $q_{1-\alpha;\,J;\,J(n-1)}$ is the $1 - \alpha$ percentile point of the sampling distribution of $q$. An alternative implicit formula for $n$ is

$$n = \frac{q^2_{1-\alpha;\,J;\,J(n-1)}}{\delta_c^2}. \tag{17}$$

4. COMPARISONS OF SAMPLE SIZES

Table 1 lists the multiple-sample ($J \ge 2$) consonance based sample sizes for both the Scheffé and Tukey methods, along with the standard sample size estimates based on the Tang Condition, as a function of values of $\delta_c$ and the number of groups $J$. The standard Tang-based sample sizes require a (user-fixed) value of power for their computation; for this comparison, power was arbitrarily set to .8. All $F$-based sample sizes were determined by means of an iterative routine written in the DATA step of SAS that was programmed to mimic the actions of a user employing the appendixes of a statistics text, the same method that was actually used to find the $q$-based values of $n$. Stepping from an initial value of $n = 2$, the consonant sample sizes were found by computing the critical value of $F$ (using FINV) and integrating the central $F$ distribution (using PROBF) until the convergence dictated by Equation (14) was attained. The standard sample size values were determined by the same method, except that for each value of $n$ the value of the noncentrality parameter was computed for the Tang Condition and the appropriate noncentral $F$ distribution integrated (again using SAS's PROBF function). (Note: Under the Tang Condition, the noncentrality parameter used by PROBF is $\lambda = (n\delta_c^2)/2$. The parameter for entering the Pearson and Hartley charts is $\phi = \sqrt{\lambda/J}$.)

Table 1 shows that the consonance criterion generally leads to smaller sample sizes than does the Tang Upper Bound when power is set at .8 for the standard method, with the greatest differences in values of $n$ occurring for smaller numbers of samples. The Tang Upper Bound values of $n$ change monotonically with power, whereas the consonant values of $n$ are constant for a given design and value of $\delta_c$. Therefore, neither method will necessarily always produce lower values of $n$; however, it should be expected that, in situations involving higher values of power and smaller numbers of samples, the consonance criterion will require smaller sample sizes. The actual power associated with consonance-based sample sizes is discussed in the next section.
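
The stepping procedure described in this section can be illustrated for the two-group case of Equation (5). The following is a minimal Python sketch, not the authors' SAS routine: the function names `t_two_sided_p` and `consonant_n_two_groups` are hypothetical, the central $t$ tail area is obtained with Simpson's rule instead of a statistical library, and rejection is checked through the two-sided p-value, which is equivalent to comparing $\delta_c\sqrt{n/2}$ with the critical value.

```python
import math


def t_two_sided_p(t_obs, df, steps=400):
    """Two-sided p-value P(|T| > t_obs) for a central t with df degrees of freedom."""
    # Log of the normalizing constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Composite Simpson's rule for the density over [0, t_obs] (steps is even).
    h = t_obs / steps
    total = pdf(0.0) + pdf(t_obs)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - 2.0 * total * h / 3.0


def consonant_n_two_groups(delta_c, alpha=0.05, n_max=100_000):
    """Smallest per-group n with delta_c * sqrt(n/2) >= t_{1-alpha/2; 2(n-1)}."""
    for n in range(2, n_max + 1):          # step up from n = 2, as in the text
        t_obs = delta_c * math.sqrt(n / 2)  # Equation (4) with d = delta_c
        if t_two_sided_p(t_obs, 2 * (n - 1)) <= alpha:
            return n
    raise ValueError("no n <= n_max satisfies the criterion")


for delta_c in (0.1, 0.5, 1.0):
    print(delta_c, consonant_n_two_groups(delta_c))
```

Under these assumptions the loop should return 770, 32, and 9, the $J = 2$ entries of Table 1.
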

Table 1. A Comparison of Sample Sizes Yielded by the Two Consonance Criterion Methods and the Standard Power-Based Method

                                 Method for Determining Sample Size
$\delta_c$   No. of groups   Consonance F-based   Consonance q-based   Tang Upper Bound (power = .80)
.1           2                  770                  770                 1,571
             3                1,200                1,096                 1,928
             5                1,899                1,490                 2,389
             7                2,519                1,739                 2,726
             9                3,102                1,928                 3,006
.5           2                   32                   32                    64
             3                   49                   44                    79
             5                   77                   60                    97
             7                  102                   70                   110
             9                  125                   78                   122
1.0          2                    9                    9                    17
             3                   14                   12                    21
             5                   20                   16                    25
             7                   26                   18                    29
             9                   32                   20                    31

NOTE: The consonance criterion F-based and q-based sample sizes are those values of $n$ that ensure that the $H_0$'s for $F$ and $q$ (the studentized range statistic) will be rejected if the most different means differ by $\delta_c\sigma$ or more.

It should also be noted that the $q$-based consonant values of $n$ are uniformly smaller than those for the $F$-based consonant values (except for the trivial case of $J = 2$). As a consequence and a second bonus, if one chooses $n$ on the basis of an omnibus $F$ test (Scheffé's method), then whenever the ANOVA $F$ test is significant and $d_{max} \ge \delta_c$, at least one pairwise mean difference will also be significant by Tukey's method.

5. POWER ASSOCIATED WITH CONSONANT SAMPLE SIZES

An advantage of the consonance criterion for determining sample size is that it requires no specification of the power desired to reject a null hypothesis for a given (but unknown) pattern of differences among the population means. One might reasonably be concerned about the actual power associated with using a consonant sample size instead of the standard value. Table 2 shows the power levels associated with the Scheffé and Tukey methods (under the Tang Condition) as a function of values of $\delta_c$, the number of groups $J$, the (unknown) true value of $\delta$, and the sample sizes given in Table 1. Again, SAS's PROBF function was used to integrate the appropriate noncentral $F$ distribution for each of the Scheffé cases. Because of the problems associated with a noncentral studentized range distribution (Hochberg and Tamhane 1987) a noncentral $F$ distribution was used to compute lower bounds to power for the studentized range-determined sample sizes; these values should be considered quite conservative.

Notice that a number of power lower bounds are enclosed in boxes. These boxed values are the lower bounds to power for true values of $\delta$ that are equal to or greater than $\delta_c$. Ordinarily, one is most interested in the performance of a statistical test only for such values of $\delta$ because values that are lower than $\delta_c$ are by definition too small to be of interest to the investigator. Examination of Table 2 indicates that, in such cases, the power of the consonance criterion samples is never less than .5, and is often .80 and above.

The minimum value of .5 arises in the case of $J = 2$ samples when true $\delta$ equals $\delta_c$, in which case power is the probability that the sample estimate $d$ exceeds $\delta_c$, and the mean of the appropriate noncentral $t$ distribution exactly equals the critical value of the test. As true $\delta$ increases, an increasing proportion of the noncentral distribution exceeds the critical value, and power increases. As $J$ increases, this minimum value of power is augmented by the proportion of rejections that occur when several mean differences are combined and yield a significant $F$ statistic. Together, these two factors yield the observed pattern of results in Table 2.

6. APPROXIMATIONS TO CONSONANT SAMPLE SIZES

If one assumes that the population variance is known, then simple equations with closed form solutions for consonant sample sizes can be developed. The resulting values yield approximate values for the true consonant sample size, and may be used as a starting point in the iterative solutions.

The implicit equations for consonant sample size, which require iterative solutions, are given in Equation (6) for the two-group case and in (14) and (17) for the two generalizations to the multiple-group case. Under the assumption of known variance the consonance sample sizes are simple functions of the standard normal distribution, a chi-square distribution, and the distribution of the range statistic, respectively. For convenience the approximations are presented below following the implicit equations giving the exact solution.

                        Exact Solution                                                   Approximate Solution
Two Groups              $n = 2\,t^2_{1-\alpha/2;\,2(n-1)} / \delta_c^2$                    $n \approx 2\,z^2_{1-\alpha/2} / \delta_c^2$
J Groups (Scheffé)      $n = 2(J-1)\,F_{1-\alpha;\,(J-1);\,J(n-1)} / \delta_c^2$           $n \approx 2\,\chi^2_{1-\alpha;\,(J-1)} / \delta_c^2$
J Groups (Tukey)        $n = q^2_{1-\alpha;\,J;\,J(n-1)} / \delta_c^2$                     $n \approx q^2_{1-\alpha;\,J;\,\infty} / \delta_c^2$

Unlike the exact solutions the approximations employ $n$ only on the left-hand side, yielding simple, closed form solutions. Note that in the case of consonant samples for $J$ samples using Tukey's method, the approximate sample sizes are based on the distribution of the range, which is given as the last row (infinite $n$) of the tabled distribution of the studentized range statistic $q$.

It has been our experience, and that of Harris and Quade (1992), that the approximate solutions for the consonance criterion sample sizes rarely differ by more than 1 or 2 (and occasionally 3) from the exact solution. Kupper and Hafner (1989) noted similar empirical discrepancies in power-based approximation formulas, and offered tables for correcting the value of $n$ so derived. Guenther (1981), also in a paper discussing power-based sample size determination, suggested a correction to a similar approximate solution for $n$ in the two-group case that can be used in conjunction with the consonance-based sample sizes for two-group experiments, namely, increase the computed approximate $n$ by $0.25\,z^2_{1-\alpha/2}$ observations. This modification gives values that correspond roughly to the empirical discrepancies mentioned above, and it has the advantage of automatically adjusting for the level of $\alpha$ and the nature of the test (one- or two-tailed). Whether this modification can be improved and/or extended to multiple-group techniques is an open question.
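
The closed-form approximations of this section need only central normal, chi-square, and range quantiles. The sketch below is one way to evaluate them in Python with the standard library alone. The helper names (`chi2_cdf`, `range_cdf`, `quantile`, `approx_n`, `guenther_n`) are hypothetical, and the quantiles are found by numerical integration and bisection instead of being read from tables, so unrounded results may differ slightly from table-based values.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = total = 1.0
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a + 1))


def range_cdf(q, groups, steps=800, lim=8.0):
    """CDF of the range of `groups` independent N(0, 1) variables (Simpson's rule)."""
    def f(x):  # x plays the role of the smallest observation
        return Z.pdf(x) * (Z.cdf(x + q) - Z.cdf(x)) ** (groups - 1)
    h = 2 * lim / steps
    total = f(-lim) + f(lim)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(-lim + i * h)
    return groups * total * h / 3


def quantile(cdf, p, hi=100.0, iterations=60):
    """Invert an increasing CDF on [0, hi] by bisection."""
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def approx_n(delta_c, groups=2, alpha=0.05, method="scheffe"):
    """Known-variance approximation to the consonant per-group n (unrounded)."""
    if method == "scheffe":    # n ~ 2 * chi2_{1-alpha; J-1} / delta_c^2
        crit = 2 * quantile(lambda x: chi2_cdf(x, groups - 1), 1 - alpha)
    elif method == "tukey":    # n ~ q_{1-alpha; J; inf}^2 / delta_c^2
        crit = quantile(lambda q: range_cdf(q, groups), 1 - alpha, hi=20.0) ** 2
    else:
        raise ValueError("method must be 'scheffe' or 'tukey'")
    return crit / delta_c ** 2


def guenther_n(delta_c, alpha=0.05):
    """Two-group approximation plus Guenther's 0.25 * z^2 correction."""
    z = Z.inv_cdf(1 - alpha / 2)
    return 2 * z ** 2 / delta_c ** 2 + 0.25 * z ** 2


print(math.ceil(approx_n(0.5, groups=2)))                  # two groups
print(math.ceil(guenther_n(0.5)))                          # with correction
print(math.ceil(approx_n(0.5, groups=3)))                  # Scheffe, J = 3
print(math.ceil(approx_n(0.5, groups=3, method="tukey")))  # Tukey, J = 3
```

For $J = 2$ both methods reduce to $2z^2_{1-\alpha/2}/\delta_c^2$. With $\delta_c = .5$ and $\alpha = .05$ this rounds up to 31, and Guenther's correction raises it to 32, the exact value in Table 1.
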
Table 2. Conditional Lower Bounds to Power for Consonance Criterion Sample Sizes for Selected Values of J and $\delta$

                                                        True $\delta$
J    n                  .10          .25          .50          .75          1.00         1.25         1.50

$\delta_c = .10$
2    770                .50          .99          .99          .99          .99          .99          .99
3    1,200 (1,096)      .58 (.54)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)
5    1,899 (1,490)      .69 (.57)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)
7    2,519 (1,739)      .76 (.57)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)
9    3,102 (1,928)      .82 (.57)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)    .99 (.99)

$\delta_c = .50$
2    32                 .07          .17          .50          .84          .98          .99          .99
3    49 (47)            .07 (.07)    .18 (.16)    .58 (.53)    .92 (.88)    .99 (.99)    .99 (.99)    .99 (.99)
5    77 (60)            .07 (.06)    .20 (.16)    .69 (.56)    .97 (.92)    .99 (.99)    .99 (.99)    .99 (.99)
7    102 (70)           .07 (.06)    .22 (.16)    .76 (.57)    .99 (.93)    .99 (.99)    .99 (.99)    .99 (.99)
9    125 (78)           .07 (.06)    .23 (.16)    .82 (.57)    .99 (.94)    .99 (.99)    .99 (.99)    .99 (.99)

$\delta_c = 1.00$
2    9                  .05          .08          .17          .32
3    14 (12)            .05 (.05)    .08 (.08)    .19 (.16)    .38 (.33)    .62 (.54)    .82 (.75)    .94 (.89)
5    20 (16)            .05 (.05)    .08 (.07)    .21 (.15)    .47 (.32)    .76 (.56)    .94 (.79)    .99 (.93)
7    26 (18)            .05 (.05)    .08 (.07)    .21 (.15)    .47 (.32)    .76 (.56)    .94 (.79)    .99 (.93)
9    32 (20)            .05 (.05)    .08 (.07)    .23 (.15)    .52 (.32)

NOTE: All values of $n$ are based on the consonance criterion, and are shown in Table 1. For $J > 2$, sample sizes are based on both the omnibus $F$ test and the studentized range statistic (values of $n$ in parentheses). All conditional lower bounds to power are based on rejecting the hypothesis tested by the omnibus $F$ test, that is, power values in parentheses are for sample sizes determined from the studentized range statistic, but power was evaluated in terms of rejecting the omnibus $F$ test (see text).
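
As a rough check on the $J = 2$ rows, power can be approximated with the normal distribution instead of the noncentral distributions used for Table 2. The following Python sketch assumes a known variance; the function name `approx_power_two_groups` is hypothetical, and its output is expected to agree with the tabled values only to within a few hundredths.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power_two_groups(delta, n, alpha=0.05):
    """Normal approximation to the power of the two-tailed two-sample test."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n / 2)  # approximate mean of the test statistic
    upper = 1 - Z.cdf(z_crit - shift)   # rejections in the expected direction
    lower = Z.cdf(-z_crit - shift)      # rejections in the opposite direction
    return upper + lower


n = 32  # consonant sample size for delta_c = .50 (Table 1)
for delta in (0.10, 0.25, 0.50, 0.75, 1.00):
    print(f"{delta:.2f}  {approx_power_two_groups(delta, n):.2f}")
```

At true $\delta = \delta_c = .50$ the approximation gives a value just above .5, in line with the minimum power discussed in Section 5.
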

7. COMPARISON WITH OTHER PROPOSALS

As mentioned at the outset, in the simplest case, involving the simple difference between two sample means, the formulas produced by satisfying the consonance criterion are similar or identical to those produced by others. This is due, in part, to similarities or identities among those other approaches. For instance, both Guenther (1981) and Kupper and Hafner (1989) make use of the formula (modified for consistency with the present notation)

$$n \ge 2\left[(z_{1-\alpha/2} + z_{1-\beta})/\delta\right]^2 \tag{18}$$

which determines sample size for a two-tailed test of significance for two independent samples at the $\alpha$ level of significance with power equal to $1 - \beta$ (exact for known common variance and approximate otherwise). Harris and Quade's (1992, p. 46) formula for this situation is the special case of (18) that occurs when their minimally important difference significant (or MIDS) criterion (that the optimal power for a test of significance is .50) is adopted and $z_{1-\beta} = 0$. Willan (1994), working in the context of controlled management trials, derives a similar formula based on setting power at .50 for the value of (our) $\delta$ that represents the researcher's point of indifference between two competing treatment regimens.

The expression in (18) is not, in itself, hard to derive from a power-based standpoint, and seemingly quite a few others have done so because each paper just mentioned credits a different set of predecessors for its origin. The present paper arrives at (18) by dropping all reference to power in the derivation, relying instead on a requirement of assured significance or certain exclusion of zero from a confidence interval when $|d| \ge \delta_c$. This represents one major difference between the present paper and those of Harris and Quade (1992) and Willan (1994).

A second major difference with the work of Harris and Quade (1992) is our approach to multiple sample settings. Although they concentrate on single degree of freedom methods, Harris and Quade point out, quite correctly, that their MIDS criterion may produce values of $n$ in an ANOVA setting that are larger than necessary for specific (i.e., a priori) contrasts among means, and they recommend methods for reducing $n$ until 50% power is achieved for those contrasts and the chosen method of familywise control of $\alpha$ (p. 40).

The present paper gives simple formulas for tests of mean contrasts by both Scheffé's and Tukey's methods if an a priori approach is taken; these formulas may readily be extended to other criteria for significance. If, however, the more traditional a posteriori approach is appropriate, then we recommend setting $n$ according to (14) or its approximation. Under the sample Tang Condition, if the omnibus $F$ test is significant, then at least one test of pairwise mean differences will also be significant by both Scheffé's and Tukey's methods.
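
Equation (18) is easy to evaluate directly, which makes the link between the power-based and consonance formulas concrete. The Python sketch below is illustrative only: the function name `n_power_based` is hypothetical and the known-variance form of (18) is assumed. Setting power to .50 makes $z_{1-\beta} = 0$, so the formula reduces to the two-group approximation $2z^2_{1-\alpha/2}/\delta^2$.

```python
from math import ceil
from statistics import NormalDist


def n_power_based(delta, alpha=0.05, power=0.80):
    """Per-group n from Equation (18), two-tailed test, known common variance."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) / delta) ** 2


delta_c = 0.5
print(ceil(n_power_based(delta_c, power=0.50)))  # MIDS / consonance approximation
print(ceil(n_power_based(delta_c, power=0.80)))  # conventional power of .80
```

For $\delta = .5$ and $\alpha = .05$ the two calls give 31 and 63, close to the exact entries 32 and 64 for $J = 2$ in Table 1.
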

Willan's (1994) alternative approach for choosing $n$ in clinical management trials represents yet another departure from the single degrees of freedom work previously cited. As the author states (p. 212), the objective of management trials is aimed at deciding which treatment should be used[,] as opposed to explanatory trials, which are conducted to determine whether a difference in treatments exists at all. The management trial approach is in the tradition of Wald (1947), in which one desires strong evidence for one or the other of two distinct hypotheses. As such, Willan's $\delta$ (for his symmetric case, p. 215) is half the value used in the present paper. Substituting this value into the present formulas (and recognizing that his $n$ is the total number of observations, not $n$ per sample) leads to Willan's equation (6). Blind use of the present formulas in the management trial setting would lead to dramatic underestimates (by a factor of 8) relative to Willan's criterion. The approaches taken by all the other works cited in the present paper, as well as our approach, lead to values that would be appropriate for explanatory trials only, without such adjustments.

8. SUMMARY AND DISCUSSION

The investigations that led to the consonance criterion were inspired by a recent paper in which Schmidt (1992) discussed situations in which the values of $n$ were such that the sample estimate of $\delta$ had to exceed $\delta_c$ by 20% or more in order for $H_0$ to be rejected. As previously stated, such results can be vexing (embarrassing?) to both the researcher and the statistical consultant. The consonance criterion was proposed because it determines a value of $n$ such that rejection of $H_0$ is assured whenever the largest sample effect size $d_{max}$ equals or exceeds some critical effect size $\delta_c$. Formulas for determining this value of $n$ were proposed and used to construct tables of sample sizes and minimal traditional power for simple experiments. A striking feature of the sample size table is how large these sample sizes are for effect sizes as small as .10, and how small they are for effect sizes as large as 1.0. This range of values emphasizes an observation made not long ago by Tukey (1986):

    With a reasonable amount of data, things of size $5\sigma$ are nearly trivial to find - anyone should be able to find them - whereas things of $0.05\sigma$ can be nearly impossible to find, once we face the presence of systematic error as well as those very nice errors whose effects come down like $\sigma/\sqrt{n}$ (p. 76).

A double bonus of the generalizations to multiple sample situations is that choosing $n$ for the ANOVA's omnibus $F$ test assures rejection of at least one a posteriori pairwise mean comparison, by either Scheffé's or Tukey's method.

One question that will surely arise in connection with the consonance criterion is, What advantage does this criterion for determining $n$ have over older methods in which one specifies values of $\alpha$, $\delta_c$, and power, and then calculates sample size based on some assumed pattern of differences among the population treatment means? For one thing a

differ by $\delta_c$ or more, under any pattern of differences in the population means. The consulting statistician can tell clients, I have determined a sample size such that if something important happens in your experiment (as measured by $d_{max}$ relative to $\delta_c$), it will be statistically significant. Finally, the approximate solutions, based on the central normal, chi-square, and range distributions, have closed form solutions that are trivial to compute; even the iterative exact solutions are easier to find than those involving noncentral distributions. This simplicity makes the present method easier to present to students or clients, and may prove appealing to those who need such formulas only occasionally.

With a little reflection it is easy to see that the consonance criterion may be used for determining $n$ in other situations. For instance, main effect and cell means tests in multifactor designs may be handled by substituting the appropriate degrees of freedom in Equations (14) and (17) or their approximations. Although derived from a different standpoint, Harris and Quade's (1992) formulas for Pearson's $r$ and $\chi^2$ are also such extensions, which may be further generalized. As an illustration, suppose one wanted to test the multiple correlation coefficient for relating several independent variables to some dependent variable. The consonance criterion sample size in this situation is the value of $n$ that guarantees the rejection of $H_0$: $\rho^2 = 0$ if the (squared) sample multiple correlation exceeds some critical size, say $\rho_c^2$. It is easy to show that the exact implicit equation for the consonance sample size and the (known-variance) approximation are, respectively,

$$n = \frac{p\,F_{1-\alpha;\,p;\,n-p-1}\,(1 - \rho_c^2)}{\rho_c^2} + p + 1$$

and

$$n \approx \frac{\chi^2_{1-\alpha;\,p}\,(1 - \rho_c^2)}{\rho_c^2} + p + 1$$

where $p$ is the number of independent (predictor) variables. For example, given five independent variables, and given that a (squared) multiple correlation of .4 is considered important, the exact formula yields a sample size of $n = 58$, with an approximate sample size of 56.

Cohen (1992) recounts his seemingly futile efforts over three decades to make sample size selection easier for the everyday research worker, through simpler (and fewer) formulas and improved tables. Like him, we hope that our proposed methods will help to reverse the negative answer to Sedlmeier and Gigerenzer's (1989) title question.

[Received September 1993. Revised September 1996.]

REFERENCES

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences (2nd ed.), New York: Academic Press.
Cohen, J. (1992), A Power Primer, Psychological Bulletin, 112, 155-159.
Gabriel, K. R. (1969), Simultaneous Test Procedures-Some Theory of
value of power for a particular configuration of unknown Multiple Comparisons, Annals of Mathematical Statistics, 40,224-250.
population means does not have to be specified in order to Guenther, W. G. (1981), Sample Size Formulas for Normal Theory T
do the calculations. Second, the sample sizes given by the Tests, The American Statistician, 35, 243-244.
Harris, R. J., and Quade, D. (1992), The Minimally Important Difference
consonance criterion deliver their promise of certain rejec- SignificantCriterion for Sample Size, Journal of Educational Statistics,
tion of HO whenever one or more pairs of sample means 17, 27-49.

Hochberg, Y., and Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley.
Kupper, L. L., and Hafner, K. B. (1989), How Appropriate are Popular Sample Size Formulas?, The American Statistician, 43, 101-105.
Pearson, E. S., and Hartley, H. O. (1951), Charts of the Power Function of the Analysis of Variance Tests, Derived from the Non-Central F-Distribution, Biometrika, 38, 112-130.
Scheffé, H. A. (1959), The Analysis of Variance, New York: Wiley.
Schmidt, F. L. (1992), What do Data Really Mean?, American Psychologist, 47, 1173-1180.
Sedlmeier, P., and Gigerenzer, G. (1989), Do Studies of Statistical Power have an Effect on the Power of Studies?, Psychological Bulletin, 105, 309-316.
Steel, R. G. D., and Torrie, J. H. (1960), Principles and Procedures of Statistics, New York: McGraw-Hill.
Stein, C. (1945), A Two-Sample Test for a Linear Hypothesis Whose Power is Independent of Variance, Annals of Mathematical Statistics, 16, 243-258.
Tang, P. C. (1938), unpublished manuscript.
Tukey, J. W. (1986), Sunset Salvo, The American Statistician, 40, 72-76.
Wald, A. (1947), Sequential Analysis (1973 reprint by Dover Publications, New York).
Willan, A. R. (1994), Alternative Approach for Analyzing Management Trials, Controlled Clinical Trials, 15, 211-219.
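
A note on the multiple correlation illustration in Section 8, which states that the implicit equation for the consonance sample size is easy to show. The following is only a sketch of that step, assuming the usual F test of $H_0\colon \rho^2 = 0$ with p predictors. The symbol $R^2$ for the squared sample multiple correlation is introduced here for illustration and does not appear in the original. Because the F statistic increases with $R^2$, it suffices to require rejection when $R^2$ equals the critical value $\rho_c^2$ and then solve for n.

```latex
\begin{align*}
F &= \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
  && \text{(test statistic for } H_0\colon \rho^2 = 0\text{)} \\[4pt]
\frac{\rho_c^2}{1 - \rho_c^2}\cdot\frac{n - p - 1}{p}
  &\ge F_{1-\alpha;\,p;\,n-p-1}
  && \text{(rejection required at } R^2 = \rho_c^2\text{)} \\[4pt]
n &\ge \frac{p\,F_{1-\alpha;\,p;\,n-p-1}\,(1 - \rho_c^2)}{\rho_c^2} + p + 1
  && \text{(solved for } n\text{)}
\end{align*}
```

The critical value on the right depends on n through its denominator degrees of freedom, which is why the equation is implicit. Under these assumptions one would look for the smallest integer n that satisfies the last inequality, for example by trying successive values of n.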
The American Statistician, November 1997, Vol. 51, No. 4 317