
Downloaded by: [141.214.17.222] On: 22 October 2014, At: 20:16

Publisher: Taylor & Francis

Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,

37-41 Mortimer Street, London W1T 3JH, UK

Publication details, including instructions for authors and subscription information:

http://www.tandfonline.com/loi/utas20

W. Alan Nicewander (a) & James M. Price (b)

(a) Personnel Testing Division, Defense Manpower Data Center, Seaside, CA, 93955, USA

(b) Department of Psychology, Oklahoma State University, Stillwater, OK, 74078, USA

Published online: 17 Feb 2012.

To cite this article: W. Alan Nicewander & James M. Price (1997) A Consonance Criterion for Choosing Sample Size, The American Statistician, 51:4, 311-317


A Consonance Criterion for Choosing Sample Size

W. Alan NICEWANDER and James M. PRICE

W. Alan Nicewander is Chief, Personnel Testing Division, Defense Manpower Data Center, Seaside, CA 93955. James M. Price is Associate Professor, Department of Psychology, Oklahoma State University, Stillwater, OK 74078 (E-mail: psyjmp@vms.ucc.okstate.edu). The authors thank Richard Harris, Frank Schmidt, the associate editor, and the referees for their helpful comments on an earlier version of this paper.

© 1997 American Statistical Association. The American Statistician, November 1997, Vol. 51, No. 4, 311

Sample sizes determined by the proposed criterion ensure that subjectively important parameter estimates will be statistically significant; alternatively, subjectively trivial values will be excluded from confidence intervals for those parameters. Formulas are given for simple mean differences, multiple correlations, and mean comparison methods in multiple-group settings, with suggestions for extension to larger, factorial designs. Although no assumptions are necessary about the true values of unknown parameters or a desired level of power, adequate power is provided for important results. Only central distributions are required, both for the exact iterative solutions given and for their closed-form approximations.

KEY WORDS: Effect size; Mean difference; Power; …

1. INTRODUCTION

For both the consultant and the client, one of the most vexing problems in the design of an experimental study is determining the number of observations to include. Even more vexing is the situation in which a critical or important difference is apparent in the sample data, but the statistical statements do not preclude trivial interpretations (e.g., nonsignificance, or zero as a possible value for a parameter). Previously proposed methods for determining the sample size $n$ have generally been based on optimizing certain aspects of statistical inference. For example, sample size is often chosen so as to yield a preferred value of power (at some specified value of $\alpha$) for an effect size that the investigator has designated as important (e.g., Scheffé 1959; Cohen 1988). Alternatively, sample size may be determined by a preference for a given absolute width of a confidence interval (e.g., Stein 1945).

The present inquiry proposes a consonance criterion as an alternative to both the power-based and the absolute-interval-width calculations for sample size. This criterion allows quick and easy sample size determination without the need for assumptions about factors that influence the parameters of the noncentral distribution of a statistic (e.g., the configuration of population parameters). All calculations are based on the central distribution of the statistic involved. For single-degree-of-freedom methods, the consonance criterion produces formulas that are similar, or identical, to some power-based methods, especially the recent work of Harris and Quade (1992) and Willan (1994). Following an explication of the consonance criterion in the case of two-sample differences, we present generalizations to more complex situations, approximation formulas for all cases presented, power computations, and a discussion of the other proposals just mentioned.

2. THE CONSONANCE CRITERION

As used in this inquiry, consonance means bringing into agreement the values of sample statistics and the statistical statements based upon them. As such, our use of the term is quite similar to Gabriel's (1969) usage in the context of multiple comparison techniques. In what follows we assume that observations are normal and independent, with a common population standard deviation $\sigma$, and that samples are to have an equal number of observations $n$.

Let $\delta$ represent the difference between the population means of two groups in standard deviation units, that is,

$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma}. \tag{1}$$

(In the context of a designed experiment, $\delta$ is often designated as the effect size.) Let $\delta_c$ be such a standardized difference that the investigator deems critical or important. The proposed criterion for determining the value of $n$ is:

Choose $n$ so that if the sample estimate $d$ of $\delta$ is equal to or greater than $\delta_c$, then a $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ will just exclude zero.

This very simple criterion brings into accord (or consonance) the parameter estimation aspect of a study (i.e., estimating $\mu_1 - \mu_2$) and the inferential aspect (i.e., hypothesis testing). Whenever $d$ equals or exceeds $\delta_c$ in magnitude, the null hypothesis $H_0\colon \mu_1 - \mu_2 = 0$ will always be rejected at the $\alpha$ level of significance; likewise, a $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ will never include zero as a possible value. We consider both aspects in the following two sections. In these sections, and in the supporting tables, it should be understood that $\delta$ is used as a convenient, unit-free metric for mean differences (especially in situations in which $\sigma$ may not be known to the experimenter), and that all probabilistic statements are oriented towards the difference $\mu_1 - \mu_2$ in raw-score form.

2.1 Two-Group Hypothesis Testing

In order to compute the minimum sample size that will always lead to the rejection of $H_0$ when the estimated effect size equals or exceeds $\delta_c$, it is necessary to show the relationship between the estimated effect size and the sample value of the $t$ statistic. Let

$$d = \frac{|\bar{X}_1 - \bar{X}_2|}{s_p} \tag{2}$$

be the sample estimate of the effect size, where

$$s_p^2 = \frac{s_1^2 + s_2^2}{2} \tag{3}$$

is a pooled unbiased estimate of the (assumed common) variance, and $s_1^2$ and $s_2^2$ are the unbiased variance estimates for the two samples. It is easy to show that

$$|t| = d\sqrt{n/2} \tag{4}$$

is the value of the sample $t$ for testing the hypothesis that the group means are equal. To find, a priori, the value of $n$ for which one will always reject $H_0$ whenever $d$ equals or exceeds $\delta_c$, one must adjust $n$ iteratively until

$$\delta_c\sqrt{n/2} = t_{1-\alpha/2;\,2(n-1)}, \tag{5}$$

where $t_{1-\alpha/2;\,2(n-1)}$ is the two-tailed critical value of $t$ for $2(n-1)$ df at the $\alpha$ level of significance. Equivalently, the consonant sample size can be found by solving, iteratively, the implicit equation

$$n = \frac{2\,t^2_{1-\alpha/2;\,2(n-1)}}{\delta_c^2}. \tag{6}$$

For example, suppose that an effect size of $\delta_c = .5$ is deemed critical. What should $n$ be so that $H_0$ is always rejected (at $\alpha = .05$, two-tailed) when the sample estimate of $\delta$ is equal to or greater than $\delta_c = .5$? Trying successive values of $n$ until the equation holds, one finds that when $n$ is set to 32, $t = 2.00$, just exceeding the .05 critical value of $t$ for 62 df. A sample of size $n = 32$ will always lead to the rejection of $H_0$ in experiments that yield an estimated effect size of .5 or greater; samples smaller than $n = 32$ may well yield estimated effect sizes of .5 or greater, but without the certainty of rejecting $H_0$.

2.2 Interval Estimation of a Difference

In the context of effect sizes, a confidence interval can be thought of as the set of all numbers that, based on the data, are not excluded as possible values of the true effect size on the basis of chance. Suppose that a sample size is wanted so that if some important difference (in terms of $\delta$) is found in the sample means, then the $(1-\alpha)100\%$ confidence interval located by this obtained difference will exclude zero as a possible population value. To find this value of $n$, which distinguishes important differences from trivial ones, one must solve (again iteratively) the following equation, which places the lower limit of the interval exactly at zero:

$$|\bar{X}_1 - \bar{X}_2| = t_{1-\alpha/2;\,2(n-1)}\, s_p\sqrt{2/n}. \tag{7}$$

Dividing both sides by $s_p$ gives

$$d = t_{1-\alpha/2;\,2(n-1)}\sqrt{2/n}, \tag{8}$$

or equivalently

$$d\sqrt{n/2} = t_{1-\alpha/2;\,2(n-1)}. \tag{9}$$

If, as before, $\delta_c$ is substituted for $d$ in Equation (8), then an equation equivalent to (5) is obtained; further simplification leads to Equation (6), and one can solve for sample sizes such that if the (standardized) mean difference $d$ is $\delta_c$ or greater, then the value of zero will fall outside the $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$.

Stein (1945; Steel and Torrie 1960, pp. 86-87) proposed a somewhat similar method for sample size determination that is based on setting the absolute width of a confidence interval to some a priori value. Such an interval will have a desired absolute precision (say, ±5 mm), but may also include 0 as a possible value; the present method employs a specified relative level of precision to guarantee exclusion of 0.

For the sake of simplicity and brevity, only the hypothesis-testing approach is detailed in the following sections. However, a derivation based on the estimation aspect will yield identical formulas.

3. GENERALIZATIONS TO MULTIPLE-GROUP EXPERIMENTS

… techniques: 1) specify a sample difference that is important, 2) restate the appropriate test statistic in terms of that difference, and 3) solve for the value of $n$ that sets the statistic just equal to the $\alpha$-level critical value. The resulting formulas are simple enough for presentation to students at any level, and may well be easier for even occasional users to remember. The methods to be presented are quite straightforward to derive; their derivations are included here for completeness and to point out similarities to, and differences from, the standard power-based techniques.

For multiple-group studies matters are a bit more complicated, and there are at least two ways to generalize the consonance criterion discussed above. In the simplest of these generalizations, one might want to determine $n$ such that if the two most extreme sample means differ by $\delta_c$ or more, one will always reject the hypothesis of equal group means using 1) Scheffé's method or 2) Tukey's method. These two situations are considered separately below.

In this section and those that follow we make use of what we call the Tang Condition. In an unpublished paper in 1938 (Pearson and Hartley 1953), P. C. Tang proposed a worst-case situation for power calculations for omnibus tests in the analysis of variance (ANOVA). In this scenario one assumes that (with the population means ordered from largest to smallest)

$$\mu_1 = \mu + \frac{\delta_c\sigma}{2},\qquad \mu_2 = \mu_3 = \cdots = \mu_{J-1} = \mu,\qquad \mu_J = \mu - \frac{\delta_c\sigma}{2}, \tag{10}$$

where $\mu$ is the grand mean (see, e.g., Cohen 1988, p. 277; Scheffé 1959, pp. 63-64).
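The iterative search behind Equations (5) and (6) in Section 2.1 can be sketched in a few lines. This is a minimal illustration, assuming Python with only the standard library: it steps $n$ upward from 2 and stops at the first $n$ for which $t = \delta_c\sqrt{n/2}$ reaches the two-tailed $\alpha$ level on $2(n-1)$ df. The tail area comes from Simpson's-rule integration of the $t$ density, a numerical choice of this sketch rather than the authors' method, and the function names are hypothetical.

```python
import math


def t_two_sided_p(t: float, df: float, steps: int = 400) -> float:
    """P(|T| >= |t|) for Student's t with df degrees of freedom.

    The density is integrated from 0 to |t| with Simpson's rule
    (steps must be even), so no statistics library is required.
    """
    t = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)


def consonant_n_two_group(delta_c: float, alpha: float = 0.05) -> int:
    """Smallest per-group n for which d >= delta_c guarantees a two-tailed
    rejection at level alpha (Equations 5 and 6)."""
    if delta_c <= 0:
        raise ValueError("delta_c must be positive")
    n = 2
    while True:
        t_at_delta_c = delta_c * math.sqrt(n / 2)    # Equation (4) with d = delta_c
        if t_two_sided_p(t_at_delta_c, 2 * (n - 1)) <= alpha:
            return n
        n += 1


if __name__ == "__main__":
    print(consonant_n_two_group(0.5))  # 32, as in the worked example of Section 2.1
```

Stepping from $n = 2$ mirrors the hand procedure but is slow for small $\delta_c$; starting from a closed-form approximation of the consonant $n$ would be a natural shortcut.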

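The agreement between the interval statement of Section 2.2 and the test of Section 2.1 can also be checked directly. The sketch below assumes Python's standard library and a critical value supplied by the caller: 1.999 is used as an approximate two-tailed .05 critical value of $t$ on 62 df and is not computed here. It compares the interval whose boundary case is Equation (7) with the test of Equation (9) at $n = 32$ over a grid of standardized differences; all names are hypothetical.

```python
import math


def ci_excludes_zero(mean_diff: float, s_p: float, n: int, t_crit: float) -> bool:
    """True when the interval |mean_diff| +/- t_crit * s_p * sqrt(2/n)
    has its lower limit above zero (boundary case: Equation 7)."""
    half_width = t_crit * s_p * math.sqrt(2.0 / n)
    return abs(mean_diff) - half_width > 0.0


def t_test_rejects(d: float, n: int, t_crit: float) -> bool:
    """True when |t| = d * sqrt(n/2) exceeds the critical value (Equations 4 and 9)."""
    return d * math.sqrt(n / 2.0) > t_crit


def statements_agree(n: int, t_crit: float, s_p: float = 1.0) -> bool:
    """Check that the interval and the test reach the same verdict for
    standardized differences d = 0.30, 0.31, ..., 0.70."""
    for k in range(30, 71):
        d = k / 100.0
        if ci_excludes_zero(d * s_p, s_p, n, t_crit) != t_test_rejects(d, n, t_crit):
            return False
    return True


T_CRIT_62 = 1.999  # assumed approximate two-tailed .05 critical value of t on 62 df

if __name__ == "__main__":
    print(ci_excludes_zero(1.5, 3.0, 32, T_CRIT_62))   # True: d = .5 clears zero
    print(ci_excludes_zero(1.47, 3.0, 32, T_CRIT_62))  # False: d = .49 does not
    print(statements_agree(32, T_CRIT_62, s_p=3.0))    # True
```

Because the two checks are the same inequality rearranged, they can only disagree through rounding exactly at the boundary $d = t\sqrt{2/n}$, which the grid avoids.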
This configuration of population means represents the minimum departure from $H_0$ such that at least one pair of population means differs by $\delta_c\sigma$. As such, this arrangement yields the minimum important value for the noncentrality parameter of the appropriate sampling distribution. If one calculates the sample size under the assumptions of the Tang Condition, then the value of $n$ produced will be an upper bound to the sample size actually needed to produce a specified level of power.

The consonance criterion makes no assumptions about population means or desired power. Aside from the later comparisons of $n$ and power produced by different methods, we will assume that the configuration of sample means is as described by (10) when population means are replaced by the corresponding sample means and the common variance $\sigma^2$ is replaced by $MS_{\text{error}}$.

3.1 Scheffé's Method of Simultaneous Confidence Intervals

Given an experiment with $J \ge 2$ treatment conditions in a single-factor layout, let

$$d_{\max} = \frac{\bar{X}_L - \bar{X}_S}{\sqrt{MS_{\text{error}}}} \tag{11}$$

be the sample estimate of $\delta$ for the pair of sample means with the largest difference, where $\bar{X}_L$ and $\bar{X}_S$ are, respectively, the largest and smallest sample means. The $t$ test statistic for this comparison is given by

$$t = d_{\max}\sqrt{n/2}. \tag{12}$$

In order to satisfy Scheffé's criterion with a common consonant sample size, $n$ is adjusted iteratively until

$$\frac{n\,\delta_c^2}{2(J-1)} = F_{1-\alpha;\,(J-1);\,J(n-1)}. \tag{13}$$

Equivalently, this consonance-criterion-based sample size can be found using an iterative solution to the implicit equation

$$n = \frac{2(J-1)\,F_{1-\alpha;\,(J-1);\,J(n-1)}}{\delta_c^2}. \tag{14}$$

A bonus of this approach results from the isomorphism between the omnibus ANOVA $F$ test and the maximum normalized comparison among means. If the value of $n$ given by the previous equations is used in a $J$-sample layout, both the ANOVA $F$ and the $t$ for the maximum pairwise comparison of means will be significant whenever $d_{\max} \ge \delta_c$. Put another way, when the ANOVA $F$ test is significant and $d_{\max} \ge \delta_c$, at least one pairwise mean comparison will also be significant by Scheffé's method.

3.2 Tukey's Method for the Maximum Pairwise Difference

If, in the multiple-group situation just described, one prefers using Tukey's method to investigate differences in group means, then the following approach can be used. As before, let $d_{\max}$ be the largest standardized pairwise difference in the sample means. Then the studentized range test statistic $q$ is given by

$$q = \frac{\bar{X}_L - \bar{X}_S}{\sqrt{MS_{\text{error}}/n}} = d_{\max}\sqrt{n}, \tag{15}$$

where $MS_{\text{error}}$ is the mean-square error from a corresponding one-way ANOVA. Given this relationship between the maximum sample effect size $d_{\max}$ and the studentized range statistic $q$, it follows from the logic developed above that the value of $n$ that will guarantee rejection of the hypothesis of equality of the population means when $d_{\max} \ge \delta_c$ must satisfy

$$\delta_c\sqrt{n} = q_{1-\alpha;\,J;\,J(n-1)}, \tag{16}$$

where $q_{1-\alpha;\,J;\,J(n-1)}$ is the $1-\alpha$ percentile point of the sampling distribution of $q$. An alternative implicit formula for $n$ is

$$n = \frac{q^2_{1-\alpha;\,J;\,J(n-1)}}{\delta_c^2}. \tag{17}$$

Table 1 lists the multiple-sample ($J \ge 2$) consonance-based sample sizes for both the Scheffé and Tukey methods, along with the standard sample size estimates based on the Tang Condition, as a function of $\delta_c$ and the number of groups $J$. The standard Tang-based sample sizes require a (user-fixed) value of power for their computation; for this comparison, power was arbitrarily set to .8. All $F$-based sample sizes were determined by an iterative routine written in the DATA step of SAS that was programmed to mimic the actions of a user employing the appendixes of a statistics text, the same method that was actually used to find the $q$-based values of $n$. Stepping from an initial value of $n = 2$, the consonant sample sizes were found by computing the critical value of $F$ (using FINV) and integrating the central $F$ distribution (using PROBF) until the convergence dictated by Equation (14) was attained. The standard sample size values were determined by the same method, except that for each value of $n$ the noncentrality parameter was computed for the Tang Condition and the appropriate noncentral $F$ distribution was integrated (again using SAS's PROBF function). (Note: Under the Tang Condition, the noncentrality parameter used by PROBF is $\lambda = n\delta_c^2/2$. The parameter for entering the Pearson and Hartley charts is $\phi = \sqrt{\lambda/J} = \sqrt{n\delta_c^2/(2J)}$.)

Table 1 shows that the consonance criterion generally leads to smaller sample sizes than does the Tang Upper Bound when power is set at .8 for the standard method, with the greatest differences in values of $n$ occurring for smaller numbers of samples. The Tang Upper Bound values of $n$ change monotonically with power, whereas the consonant values of $n$ are constant for a given design and value of $\delta_c$. Therefore, neither method will necessarily always produce lower values of $n$; however, in situations involving higher values of power and smaller numbers of samples, the consonance criterion should be expected to require smaller sample sizes. The actual power associated with consonance-based sample sizes is discussed in the next section.

Table 1. A Comparison of Sample Sizes Yielded by the Two Consonance Criterion Methods and the Standard Power-Based Method

| $\delta_c$ | No. of groups ($J$) | Consonance, $F$-based | Consonance, $q$-based | Tang Upper Bound (power = .80) |
|---|---|---|---|---|
| .1 | 2 | 770 | 770 | 1,571 |
| | 3 | 1,200 | 1,096 | 1,928 |
| | 5 | 1,899 | 1,490 | 2,389 |
| | 7 | 2,519 | 1,739 | 2,726 |
| | 9 | 3,102 | 1,928 | 3,006 |
| .5 | 2 | 32 | 32 | 64 |
| | 3 | 49 | 44 | 79 |
| | 5 | 77 | 60 | 97 |
| | 7 | 102 | 70 | 110 |
| | 9 | 125 | 78 | 122 |
| 1.0 | 2 | 9 | 9 | 17 |
| | 3 | 14 | 12 | 21 |
| | 5 | 20 | 16 | 25 |
| | 7 | 26 | 18 | 29 |
| | 9 | 32 | 20 | 31 |

NOTE: The consonance criterion $F$-based and $q$-based sample sizes are those values of $n$ that ensure that the $H_0$s for $F$ and $q$ (the studentized range statistic) will be rejected if the most …

It should also be noted that the $q$-based consonant values of $n$ are uniformly smaller than the $F$-based consonant values (except for the trivial case of $J = 2$). As a consequence, and as a second bonus, if one chooses $n$ on the basis of an omnibus $F$ test (Scheffé's method), then whenever the ANOVA $F$ test is significant and $d_{\max} \ge \delta_c$, at least one pairwise mean difference will also be significant by Tukey's method.

5. POWER ASSOCIATED WITH CONSONANT SAMPLE SIZES

An advantage of the consonance criterion for determining $n$ is that one need not specify the power desired to reject a null hypothesis for a given (but unknown) pattern of differences among the population means. One might reasonably be concerned, however, about the actual power associated with using a consonant sample size instead of the standard value. Table 2 shows the power levels associated with the Scheffé and Tukey methods (under the Tang Condition) as a function of $\delta_c$, the number of groups $J$, the (unknown) true value of $\delta$, and the sample sizes given in Table 1. Again, SAS's PROBF function was used to integrate the appropriate noncentral $F$ distribution for each of the Scheffé cases. Because of the problems associated with a noncentral studentized range distribution (Hochberg and Tamhane 1987), a noncentral $F$ distribution was used to compute lower bounds to power for the studentized-range-determined sample sizes; these values should be considered quite conservative.

Notice that a number of power lower bounds are enclosed in boxes. These boxed values are the lower bounds to power for true values of $\delta$ that are equal to or greater than $\delta_c$. Ordinarily, one is most interested in the performance of a statistical test only for such values of $\delta$, because values lower than $\delta_c$ are by definition too small to be of interest to the investigator. Examination of Table 2 indicates that, in such cases, the power of the consonance criterion samples is never less than .5, and is often .80 and above.

The minimum value of .5 arises in the case of $J = 2$ samples when the true $\delta$ equals $\delta_c$; power is then the probability that the sample estimate $d$ exceeds $\delta_c$, and the mean of the appropriate noncentral $t$ distribution exactly equals the critical value of the test. As the true $\delta$ increases, an increasing proportion of the noncentral distribution exceeds the critical value, and power increases. As $J$ increases, this minimum value of power is augmented by the proportion of rejections that occur when several mean differences combine to yield a significant $F$ statistic. Together, these two factors yield the observed pattern of results in Table 2.

6. APPROXIMATIONS TO CONSONANT SAMPLE SIZES

If one assumes that the population variance is known, then simple equations with closed-form solutions for consonant sample sizes can be developed. The resulting values approximate the true consonant sample size and may be used as starting points for the iterative solutions. The implicit equations for consonant sample size, which require iterative solutions, are given in Equation (6) for the two-group case and in (14) and (17) for the two generalizations to the multiple-group case. Under the assumption of known variance, the consonant sample sizes are simple functions of the standard normal distribution, a chi-square distribution, and the distribution of the range statistic, respectively. For convenience, the approximations are presented below next to the implicit equations giving the exact solutions.

| Design | Exact solution | Approximate solution |
|---|---|---|
| Two groups | $n = 2t^2_{1-\alpha/2;\,2(n-1)}/\delta_c^2$ | $n = 2z^2_{1-\alpha/2}/\delta_c^2$ |
| $J$ groups (Scheffé) | $n = 2(J-1)F_{1-\alpha;\,(J-1);\,J(n-1)}/\delta_c^2$ | $n = 2\chi^2_{1-\alpha;\,(J-1)}/\delta_c^2$ |
| $J$ groups (Tukey) | $n = q^2_{1-\alpha;\,J;\,J(n-1)}/\delta_c^2$ | $n = q^2_{1-\alpha;\,J;\,\infty}/\delta_c^2$ |

Unlike the exact solutions, the approximations contain $n$ only on the left-hand side, yielding simple closed-form solutions. Note that for $J$ samples using Tukey's method, the approximate sample sizes are based on the distribution of the range, which is given as the last row (infinite error df) of the tabled distribution of the studentized range statistic $q$.

It has been our experience, and that of Harris and Quade (1991), that the approximate solutions for the consonance criterion sample sizes usually differ from the exact solution by only 1 or 2 (and occasionally 3). Kupper and Hafner (1989) noted similar empirical discrepancies in power-based approximation formulas, and offered tables for correcting the value of $n$ so derived.
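The closed-form approximations of Section 6 need only quantiles of the normal, chi-square, and range distributions. A minimal standard-library Python sketch follows, under these assumptions: `statistics.NormalDist` supplies normal quantiles, the chi-square CDF is computed from the power series for the regularized incomplete gamma function, the CDF of the range of $J$ standard normals is integrated with Simpson's rule, and both are inverted by bisection. These numerical routines and the function names are illustrative choices, not the authors' procedure.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi2_cdf(x: float, k: int) -> float:
    """Chi-square CDF with k df, via the power series for the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    m = 0
    while term > 1e-16 * total:
        m += 1
        term *= y / (a + m)
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def range_cdf(w: float, J: int, steps: int = 1000) -> float:
    """P(R <= w) for the range R of J independent N(0, 1) variables:
    J * integral of phi(z) * [Phi(z + w) - Phi(z)]**(J - 1) dz, by Simpson's rule."""
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        inner = STD_NORMAL.cdf(z + w) - STD_NORMAL.cdf(z)
        total += weight * STD_NORMAL.pdf(z) * inner ** (J - 1)
    return J * total * h / 3.0


def quantile(cdf, p: float, hi: float) -> float:
    """Invert an increasing CDF on [0, hi] by bisection."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def approx_n_two_group(delta_c: float, alpha: float = 0.05) -> int:
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.ceil(2 * z * z / delta_c ** 2)


def approx_n_scheffe(J: int, delta_c: float, alpha: float = 0.05) -> int:
    chi2 = quantile(lambda x: chi2_cdf(x, J - 1), 1 - alpha, hi=100.0)
    return math.ceil(2 * chi2 / delta_c ** 2)


def approx_n_tukey(J: int, delta_c: float, alpha: float = 0.05) -> int:
    q_inf = quantile(lambda w: range_cdf(w, J), 1 - alpha, hi=20.0)
    return math.ceil(q_inf ** 2 / delta_c ** 2)


if __name__ == "__main__":
    for J in (3, 5):
        print(J, approx_n_scheffe(J, 0.5), approx_n_tukey(J, 0.5))
```

At $\delta_c = .5$ the sketch gives 48 and 76 for the Scheffé approximation with $J = 3$ and $J = 5$, one below the exact $F$-based values of 49 and 77 in Table 1, in line with the small discrepancies described above.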

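For the exact $F$-based column of Table 1, Equation (14) can be solved by the same stepping procedure the authors describe for their SAS routine: start at $n = 2$ and increase $n$ until the central $F$ tail area at $F = n\delta_c^2/[2(J-1)]$ drops to $\alpha$. In this standard-library Python sketch, the regularized incomplete beta function is evaluated with the classical continued fraction (modified Lentz method) in place of SAS's PROBF; that substitution and the function names are assumptions of the sketch.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-14) -> float:
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1: converged
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, d1: int, d2: int) -> float:
    """P(F >= f) for a central F distribution with (d1, d2) df."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def consonant_n_scheffe(J: int, delta_c: float, alpha: float = 0.05) -> int:
    """Smallest per-group n at which n * delta_c**2 / (2(J - 1)) reaches the
    upper-alpha point of F on (J - 1, J(n - 1)) df (Equations 13 and 14)."""
    n = 2
    while True:
        f_at_delta_c = n * delta_c ** 2 / (2 * (J - 1))
        if f_upper_tail(f_at_delta_c, J - 1, J * (n - 1)) <= alpha:
            return n
        n += 1


if __name__ == "__main__":
    print([consonant_n_scheffe(J, 0.5) for J in (2, 3, 5)])  # [32, 49, 77]
```

With $J = 2$ the condition reduces to the two-group Equation (5), because $F_{1-\alpha;1;\nu} = t^2_{1-\alpha/2;\nu}$; with $\alpha = .05$ the sketch gives 32, 49, and 77 for $J = 2$, 3, and 5 at $\delta_c = .5$, the $F$-based entries of Table 1.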
Table 2. Conditional Lower Bounds to Power for Consonance Criterion Sample Sizes for Selected Values of $J$ and $\delta$

| $\delta_c$ | $J$ | $n$ | True $\delta$ = .10 | .25 | .50 | .75 | 1.00 | 1.25 | 1.50 |
|---|---|---|---|---|---|---|---|---|---|
| .10 | 2 | 770 | .50 | .99 | .99 | .99 | .99 | .99 | .99 |
| | 3 | 1,200 (1,096) | .58 (.54) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) |
| | 5 | 1,899 (1,490) | .69 (.57) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) |
| | 7 | 2,519 (1,739) | .76 (.57) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) |
| | 9 | 3,102 (1,928) | .82 (.57) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) | .99 (.99) |
| .50 | 2 | 32 | .07 | .17 | .50 | .84 | .98 | .99 | .99 |
| | 3 | 49 (47) | .07 (.07) | .18 (.16) | .58 (.53) | .92 (.88) | .99 (.99) | .99 (.99) | .99 (.99) |
| | 5 | 77 (60) | .07 (.06) | .20 (.16) | .69 (.56) | .97 (.92) | .99 (.99) | .99 (.99) | .99 (.99) |
| | 7 | 102 (70) | .07 (.06) | .22 (.16) | .76 (.57) | .99 (.93) | .99 (.99) | .99 (.99) | .99 (.99) |
| | 9 | 125 (78) | .07 (.06) | .23 (.16) | .82 (.57) | .99 (.94) | .99 (.99) | .99 (.99) | .99 (.99) |
| 1.00 | 2 | 9 | .05 | .08 | .17 | .32 | | | |
| | 3 | 14 (12) | .05 (.05) | .08 (.08) | .19 (.16) | .38 (.33) | .62 (.54) | .82 (.75) | .94 (.89) |
| | 5 | 20 (16) | .05 (.05) | .08 (.07) | .21 (.15) | .47 (.32) | .76 (.56) | .94 (.79) | .99 (.93) |
| | 7 | 26 (18) | .05 (.05) | .08 (.07) | .21 (.15) | .47 (.32) | .76 (.56) | .94 (.79) | .99 (.93) |
| | 9 | 32 (20) | .05 (.05) | .08 (.07) | .23 (.15) | .52 (.32) | | | |

NOTE: All values of $n$ are based on the consonance criterion, and are shown in Table 1. For $J > 2$, sample sizes are based on both the omnibus $F$ test and the studentized range statistic (values of $n$ in parentheses). All conditional lower bounds to power are based on rejecting the hypothesis tested by the omnibus $F$ test; that is, power values in parentheses are for sample sizes determined from the studentized range statistic, but power was evaluated in terms of rejecting the omnibus $F$ test (see text).
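The .5 floor discussed in Section 5 for $J = 2$ at true $\delta = \delta_c$ can be illustrated under a simplifying assumption: treat the variance as known, so that power is computed from the normal distribution instead of the noncentral $t$ used for Table 2. The values are therefore approximations to, not reproductions of, the tabled lower bounds; the function names are hypothetical and only Python's standard library is used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power_two_group(true_delta: float, n: float, alpha: float = 0.05) -> float:
    """Known-variance approximation to two-tailed power for two groups of size n:
    P(Z > z_crit - shift) + P(Z < -z_crit - shift), shift = true_delta * sqrt(n/2)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_delta * math.sqrt(n / 2)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def consonant_n_known_variance(delta_c: float, alpha: float = 0.05) -> float:
    """Unrounded n = 2 z^2 / delta_c^2; at this n the mean of the test
    statistic equals the critical value when the true delta equals delta_c."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * z_crit ** 2 / delta_c ** 2


if __name__ == "__main__":
    n_c = consonant_n_known_variance(0.5)
    for true_delta in (0.25, 0.5, 0.75, 1.0):
        print(true_delta, round(approx_power_two_group(true_delta, n_c), 3))
```

Evaluated at the unrounded known-variance consonant $n$ for $\delta_c = .5$, the sketch gives power of about .17, .50, .84, and .98 for true $\delta$ of .25, .50, .75, and 1.00, close to the $J = 2$ row for $\delta_c = .50$ in Table 2.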

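Table 2's power values rest on the Tang configuration in (10), whose noncentrality is $\lambda = n\delta_c^2/2$ according to the note in the description of the Table 1 computations. This standard-library Python sketch, with hypothetical function names, builds the Tang means, computes $\lambda = n\sum_j(\mu_j - \bar{\mu})^2/\sigma^2$ and the Pearson-Hartley $\phi = \sqrt{\lambda/J}$, and checks the isomorphism of Section 3.1: when the sample means follow the same pattern, the one-way ANOVA $F$ equals $t^2/(J-1)$ for the extreme pair.

```python
import math


def tang_means(J: int, delta: float, mu: float = 0.0, sigma: float = 1.0) -> list[float]:
    """Tang configuration (10): one mean at mu + delta*sigma/2, one at
    mu - delta*sigma/2, and the remaining J - 2 means at mu."""
    if J < 2:
        raise ValueError("the Tang configuration needs at least two groups")
    half = delta * sigma / 2.0
    return [mu + half] + [mu] * (J - 2) + [mu - half]


def noncentrality(means: list[float], n: int, sigma: float = 1.0) -> float:
    """lambda = n * sum_j (mu_j - grand mean)^2 / sigma^2 for equal group sizes n."""
    grand = sum(means) / len(means)
    return n * sum((m - grand) ** 2 for m in means) / sigma ** 2


def anova_f_from_means(means: list[float], n: int, ms_error: float) -> float:
    """One-way ANOVA F from group means: MS_between / MS_error, with
    MS_between = n * sum_j (xbar_j - grand mean)^2 / (J - 1)."""
    grand = sum(means) / len(means)
    ms_between = n * sum((m - grand) ** 2 for m in means) / (len(means) - 1)
    return ms_between / ms_error


if __name__ == "__main__":
    J, n, delta_c = 3, 49, 0.5
    means = tang_means(J, delta_c)
    lam = noncentrality(means, n)
    print(lam, n * delta_c ** 2 / 2)          # the two agree: lambda = n * delta_c**2 / 2
    print(math.sqrt(lam / J))                  # Pearson-Hartley phi
    t = delta_c * math.sqrt(n / 2)             # t for the extreme pair, Equation (12)
    print(anova_f_from_means(means, n, 1.0), t ** 2 / (J - 1))  # agree up to rounding
```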
Guenther (1981), also in a paper discussing power-based sample size determination, suggested a correction to a similar approximate solution for $n$ in the two-group case that can be used in conjunction with the consonance-based sample sizes for two-group experiments, namely, increase the computed approximate $n$ by $0.25\,z^2_{1-\alpha/2}$ observations. This modification gives values that correspond roughly to the empirical discrepancies mentioned above, and it has the advantage of automatically adjusting for the level of $\alpha$ and the nature of the test (one- or two-tailed). Whether this modification can be improved and/or extended to multiple-group techniques is an open question.

7. COMPARISON WITH OTHER PROPOSALS

As mentioned at the outset, in the simplest case, involving the difference between two sample means, the formulas produced by satisfying the consonance criterion are similar or identical to those produced by others. This is due, in part, to similarities or identities among those other approaches. For instance, both Guenther (1981) and Kupper and Hafner (1989) make use of the formula (modified for consistency with the present notation)

$$n \ge 2\left[\frac{z_{1-\alpha/2} + z_{1-\beta}}{\delta}\right]^2, \tag{18}$$

which determines the sample size for a two-tailed test of significance for two independent samples at the $\alpha$ level of significance with power equal to $1-\beta$ (exact for known common variance and approximate otherwise). Harris and Quade's (1992, p. 46) formula for this situation is the special case of (18) that occurs when their minimally important difference significant (MIDS) criterion (that the optimal power for a test of significance is .50) is adopted and $z_{1-\beta} = 0$. Willan (1994), working in the context of controlled management trials, derives a similar formula based on setting power at .50 for the value of (our) $\delta$ that represents the researcher's point of indifference between two competing treatment regimens.

The expression in (18) is not, in itself, hard to derive from a power-based standpoint, and seemingly quite a few others have done so, because each paper just mentioned credits a different set of predecessors for its origin. The present paper arrives at (18) by dropping all reference to power in the derivation, relying instead on a requirement of assured significance, or certain exclusion of zero from a confidence interval, when $|d| \ge \delta_c$. This represents one major difference between the present paper and those of Harris and Quade (1992) and Willan (1994).

A second major difference with the work of Harris and Quade (1992) is our approach to multiple-sample settings. Although they concentrate on single-degree-of-freedom methods, Harris and Quade point out, quite correctly, that their MIDS criterion may produce values of $n$ in an ANOVA setting that are larger than necessary for specific (i.e., a priori) contrasts among means, and they recommend methods for reducing $n$ until 50% power is achieved for those contrasts and the chosen method of familywise control of $\alpha$ (p. 40).

The present paper gives simple formulas for tests of mean contrasts by both Scheffé's and Tukey's methods if an a priori approach is taken; these formulas may readily be extended to other criteria for significance. If, however, the more traditional a posteriori approach is appropriate, then we recommend setting $n$ according to (14) or its approximation. Under the sample Tang Condition, if the omnibus $F$ test is significant, then at least one test of pairwise mean differences will also be significant by both Scheffé's and Tukey's methods.
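Guenther's correction can be applied to the two-group approximation $n = 2z^2_{1-\alpha/2}/\delta_c^2$ in one line. In this standard-library Python sketch (function name hypothetical), the correction $0.25\,z^2$ is added before rounding up to a whole number of observations per group; the rounding rule and the use of $z_{1-\alpha}$ for a one-tailed test are assumptions of the sketch.

```python
import math
from statistics import NormalDist


def guenther_corrected_n(delta_c: float, alpha: float = 0.05, two_tailed: bool = True) -> int:
    """Two-group approximation 2 z^2 / delta_c^2 plus Guenther's 0.25 z^2 term,
    rounded up to a whole number of observations per group."""
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    approx = 2 * z * z / delta_c ** 2
    return math.ceil(approx + 0.25 * z * z)


if __name__ == "__main__":
    for delta_c in (0.1, 0.5, 1.0):
        print(delta_c, guenther_corrected_n(delta_c))
```

For $\alpha = .05$, two-tailed, the corrected values for $\delta_c = .1$, .5, and 1.0 are 770, 32, and 9, the exact two-group entries of Table 1.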

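Formula (18) and its special cases can be compared numerically. In this standard-library Python sketch (function name hypothetical), setting power to .50 makes $z_{1-\beta} = 0$, the Harris and Quade MIDS case, which coincides with the known-variance consonance approximation $2z^2_{1-\alpha/2}/\delta^2$. The last lines check the factor of 8 that arises in Willan's management-trial setting when $\delta$ is halved and the total, rather than per-group, number of observations is counted.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_power_based(delta: float, power: float, alpha: float = 0.05) -> float:
    """Unrounded per-group n from Equation (18): 2 * ((z_{1-alpha/2} + z_{1-beta}) / delta)**2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)  # z_{1-beta}, with power = 1 - beta
    return 2 * ((z_alpha + z_beta) / delta) ** 2


if __name__ == "__main__":
    delta = 0.5
    print(math.ceil(n_power_based(delta, 0.80)))  # 63: power-based n at power .80
    print(math.ceil(n_power_based(delta, 0.50)))  # 31: MIDS case, z_{1-beta} = 0
    # Willan-style total: delta halved, then both groups counted.
    per_group = n_power_based(delta, 0.50)
    total_half_delta = 2 * n_power_based(delta / 2, 0.50)
    print(total_half_delta / per_group)           # 8.0
```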
Willan's (1994) alternative approach for choosing $n$ in clinical management trials represents yet another departure from the single-degree-of-freedom work previously cited. As the author states (p. 212), the objective of management trials "is aimed at deciding which treatment should be used[,]" as opposed to "explanatory trials, which are conducted to determine whether a difference in treatments exists at all." The management trial approach is in the tradition of Wald (1947), in which one desires strong evidence for one or the other of two distinct hypotheses. As such, Willan's $\delta$ (for his symmetric case, p. 215) is half the value used in the present paper. Substituting this value into the present formulas (and recognizing that his $n$ is the total number of observations, not $n$ per sample) leads to Willan's equation (6). Blind use of the present formulas in the management trial setting would lead to dramatic underestimates (by a factor of 8) relative to Willan's criterion. The approaches taken by all the other works cited in the present paper, as well as our approach, lead to values that would be appropriate for explanatory trials only, without such adjustments.

8. SUMMARY AND DISCUSSION

The investigations that led to the consonance criterion were inspired by a recent paper in which Schmidt (1992) discussed situations in which the values of $n$ were such that the sample estimate of $\delta$ had to exceed $\delta_c$ by 20% or more in order for $H_0$ to be rejected. As previously stated, such results can be vexing (embarrassing?) to both the researcher and the statistical consultant. The consonance criterion was proposed because it determines a value of $n$ such that rejection of $H_0$ is assured whenever the largest sample effect size $d_{\max}$ equals or exceeds some critical effect size $\delta_c$. Formulas for determining this value of $n$ were proposed and used to construct tables of sample sizes and minimal traditional power for simple experiments. A striking feature of the sample size table is how large these sample sizes are for effect sizes as small as .10, and how small they are for effect sizes as large as 1.0. This range of values emphasizes an observation made not long ago by Tukey (1986):

"With a reasonable amount of data, things of size $5\sigma$ are nearly trivial to find - anyone should be able to find them - whereas things of $0.05\sigma$ can be nearly impossible to find, once we face the presence of systematic error as well as those very nice errors whose effects come down like $\sigma/\sqrt{n}$" (p. 76).

A double bonus of the generalizations to multiple-sample situations is that choosing $n$ for the ANOVA's omnibus $F$ test assures rejection of at least one a posteriori pairwise mean comparison, by either Scheffé's or Tukey's method.

One question that will surely arise in connection with the consonance criterion is, What advantage does this criterion for determining $n$ have over older methods in which one specifies values of $\alpha$, $\delta_c$, and power, and then calculates …

… differ by $\delta_c$ or more, under any pattern of differences in the population means. The consulting statistician can tell clients, "I have determined a sample size such that if something important happens in your experiment (as measured by $d_{\max}$ relative to $\delta_c$), it will be statistically significant." Finally, the approximate solutions, based on the central normal, chi-square, and range distributions, have closed-form solutions that are trivial to compute; even the iterative exact solutions are easier to find than those involving noncentral distributions. This simplicity makes the present method easier to present to students or clients, and may prove appealing to those who need such formulas only occasionally.

With a little reflection it is easy to see that the consonance criterion may be used for determining $n$ in other situations. For instance, main effect and cell means tests in multifactor designs may be handled by substituting the appropriate degrees of freedom in Equations (14) and (17) or their approximations. Although derived from a different standpoint, Harris and Quade's (1991) formulas for Pearson's $r$ and $\chi^2$ are also such extensions, which may be further generalized.

As an illustration, suppose one wanted to test the multiple correlation coefficient relating several independent variables to some dependent variable. The consonance criterion sample size in this situation is the value of $n$ that guarantees the rejection of $H_0\colon \rho^2 = 0$ if the (squared) sample multiple correlation exceeds some critical size, say $\rho_c^2$. It is easy to show that the exact implicit equation for the consonance sample size and the (known-variance) approximation are, respectively,

$$n = \frac{p\,F_{1-\alpha;\,p;\,n-p-1}\,(1-\rho_c^2)}{\rho_c^2} + p + 1$$

and …,

where $p$ is the number of independent (predictor) variables. For example, given five independent variables, and given that a (squared) multiple correlation of .4 is considered important, the exact formula yields a sample size of $n = 58$, with an approximate sample size of 56.

Cohen (1992) recounts his seemingly futile efforts over three decades to make sample size selection easier for the everyday research worker, through simpler (and fewer) formulas and improved tables. Like him, we hope that our proposed methods will help to reverse the negative answer to Sedlmeier and Gigerenzer's (1989) title question.

[Received September 1993. Revised September 1996.]

REFERENCES

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences

sample size based on some assumed pattern of differences (2nd ed.), New York: Academic Press.

(19921, A Power Primer, Psychological Bulletin, 112, 155-159.

among the population treatment means? For one thing a Gabriel, K. R. (1969), Simultaneous Test Procedures-Some Theory of

value of power for a particular configuration of unknown Multiple Comparisons, Annals of Mathematical Statistics, 40,224-250.

population means does not have to be specified in order to Guenther, W. G. (1981), Sample Size Formulas for Normal Theory T

do the calculations. Second, the sample sizes given by the Tests, The American Statistician, 35, 243-244.

Harris, R. J., and Quade, D. (1992), The Minimally Important Difference

consonance criterion deliver their promise of certain rejec- SignificantCriterion for Sample Size, Journal of Educational Statistics,

tion of HO whenever one or more pairs of sample means 17, 27-49.


Hochberg, Y., and Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley.

Kupper, L. L., and Hafner, K. B. (1989), "How Appropriate are Popular Sample Size Formulas?," The American Statistician, 43, 101-105.

Pearson, E. S., and Hartley, H. O. (1951), "Charts of the Power Function of the Analysis of Variance Tests, Derived from the Non-Central F-Distribution," Biometrika, 38, 112-130.

Scheffé, H. A. (1959), The Analysis of Variance, New York: Wiley.

Schmidt, F. L. (1992), "What do Data Really Mean?," American Psychologist, 47, 1173-1180.

Sedlmeier, P., and Gigerenzer, G. (1989), "Do Studies of Statistical Power have an Effect on the Power of Studies?," Psychological Bulletin, 105, 309-316.

Steel, R. G. D., and Torrie, J. H. (1960), Principles and Procedures of Statistics, New York: McGraw-Hill.

Stein, C. (1945), "A Two-Sample Test for a Linear Hypothesis Whose Power is Independent of Variance," Annals of Mathematical Statistics, 16, 243-258.

Tang, P. C. (1938), unpublished manuscript.

Tukey, J. W. (1986), "Sunset Salvo," The American Statistician, 40, 72-76.

Wald, A. (1947), Sequential Analysis (1973 reprint by Dover Publications, New York).

Willan, A. R. (1994), "Alternative Approach for Analyzing Management Trials," Controlled Clinical Trials, 15, 211-219.
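The consonance idea summarized in Section 8 can be checked numerically. The following is a minimal, standard-library-only Python sketch, not the authors' code. It finds the smallest integer $n$ for which a sample effect at the critical value is just large enough to force rejection. Two cases are covered. For the multiple correlation, the search solves the exact implicit equation given above. For a simple experiment, the sketch ASSUMES two independent groups of equal size $n$, a pooled standard deviation, and a two-sided test, so that $t = d\sqrt{n/2}$ on $2n-2$ degrees of freedom. This setup may not be the one behind the paper's tables. The "known-variance" values returned here come from letting the error degrees of freedom go to infinity (the $t$ quantile is replaced by $z$, and $p\,F$ by $\chi^2_{1-\alpha;\,p}$). That is our assumption and may differ in detail from the authors' approximation. All function names are hypothetical. The $F$ tail probabilities use the standard continued-fraction (Lentz) evaluation of the regularized incomplete beta function.

```python
import math
from statistics import NormalDist


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny, eps = 1e-300, 1e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1 means convergence
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f: float, d1: float, d2: float) -> float:
    """P(F <= f) for F with (d1, d2) degrees of freedom."""
    return betainc(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2)) if f > 0 else 0.0


def chi2_cdf(x: float, k: float) -> float:
    """P(X <= x) for chi-square with k df (series for the lower incomplete gamma)."""
    if x <= 0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-15 and n < 10000:
        n += 1
        term *= y / (a + n)
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def upper_quantile(cdf, alpha: float) -> float:
    """Point c with cdf(c) = 1 - alpha, found by doubling a bracket and then bisecting."""
    target, lo, hi = 1.0 - alpha, 0.0, 1.0
    while cdf(hi) < target:
        lo, hi = hi, 2.0 * hi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def consonance_n_two_sample(delta_c: float, alpha: float = 0.05) -> tuple[int, int]:
    """(exact, known-variance) per-group n; ASSUMES two groups, pooled SD, t = d*sqrt(n/2)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bound = 2.0 * z * z / delta_c ** 2
    n = max(2, math.floor(bound))  # t quantiles exceed z, so exact n >= bound
    # t^2 on nu df is F(1, nu), so compare with the upper F(1, 2n - 2) quantile.
    while delta_c ** 2 * n / 2.0 < upper_quantile(lambda f: f_cdf(f, 1, 2 * n - 2), alpha):
        n += 1
    return n, max(2, math.ceil(bound))


def consonance_n_multiple_r(rho2_c: float, p: int, alpha: float = 0.05) -> tuple[int, int]:
    """(exact, known-variance) n: smallest n with n-p-1 >= p*F*(1-rho2_c)/rho2_c.
    The approximation (our assumption) replaces p*F_{1-alpha;p,n-p-1} by chi2_{1-alpha;p}."""
    k = (1.0 - rho2_c) / rho2_c
    n = p + 2  # at least one error degree of freedom
    while n - p - 1 < p * k * upper_quantile(lambda f: f_cdf(f, p, n - p - 1), alpha):
        n += 1
    chi2 = upper_quantile(lambda x: chi2_cdf(x, p), alpha)
    return n, math.ceil(chi2 * k + p + 1)


if __name__ == "__main__":
    print(consonance_n_two_sample(0.10), consonance_n_two_sample(1.0))
    # p = 5 predictors; rho2_c = .16 and alpha = .10 are illustrative choices.
    print(consonance_n_multiple_r(0.16, 5, alpha=0.10))
```

Under the assumed two-sample setup, the search gives about 770 per group for $\delta_c = .10$ but only 9 per group for $\delta_c = 1.0$. This reflects the contrast between small and large effect sizes described in Section 8. The multiple-correlation call uses illustrative settings ($p = 5$, $\rho_c^2 = .16$, $\alpha = .10$) that were chosen for this sketch and do not come from the text. The two-sample search can safely start at the normal-theory bound because $t$ quantiles always exceed the corresponding $z$ quantiles. If SciPy is available, `scipy.stats.f.ppf` could replace the hand-written quantile routine.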
