
Sample Size for Binary Response Experiments

Under Various Models

Marie Gaudard, Marvin Karson, and Philip J. Ramsey


University of New Hampshire, Durham, NH 03824

January 5, 2001

Abstract

We consider the problem of sample size determination in binary response experiments. We employ a generalized approach, where the optimality criterion is the classical sample size criterion of balancing type I and type II error probabilities in a test for a treatment effect. Using this approach, we give formulas for calculating sample sizes for experiments involving the Poisson, mixed binomial-Poisson, and negative binomial distributions, and for the binomial model where replication error exists.

Key Words. Binary Data, Binomial Distribution, Fractional Factorial Designs, Negative Binomial Distribution, Poisson Distribution, Replication Error, Sample Size.

INTRODUCTION

Consider a two-level balanced design, such as a 2^k or 2^(k-f) factorial experiment, consisting of I distinct factor-level combinations, or treatments, i = 1, ..., I. Note that I, the number of treatments, is even, and is usually a power of 2. For the ith treatment setting, a run size, or subsample size, of n_i is obtained. In this paper, we consider situations where each subsample measurement is binary. In the binomial case, the subsample of size n_i obtained for a given treatment is assumed to consist of n_i independent Bernoulli trials. Let p_i denote the unknown true probability of a response (success) for a subsample obtained under the conditions of treatment i. If X_ij is the Bernoulli random variable for the jth subsample of treatment i, then p_i = E(X_ij), and the usual estimator of p_i is p̂_i = (Σ_{j=1}^{n_i} X_ij)/n_i. The variance of p̂_i is

\[
\mathrm{Var}(\hat{p}_i) = \frac{p_i(1 - p_i)}{n_i}.
\]
In a 2^k or 2^(k-f) factorial experiment, estimable main effects and interaction effects are of primary interest. In this paper, we consider the problem of determining a sample size for such experiments that tests for main or interaction effects with specified power. Note that, in a binomial experiment, the usual estimator of an estimable effect is a contrast in means of p̂_i's. Any main or interaction effect can be estimated by an expression of the form

\[
\bar{Y}_1 - \bar{Y}_2 = \frac{1}{I/2}\sum_{i=1}^{I/2}\hat{p}_i - \frac{1}{I/2}\sum_{i'=1}^{I/2}\hat{p}_{i'},
\]

where the subscript i runs over the high levels of the factor or interaction, and i' runs over the low levels. A test of E(Ȳ₁) − E(Ȳ₂) = 0 requires the variance of Ȳ₁ − Ȳ₂, which is

\[
\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) = \frac{1}{(I/2)^2}\sum_{i=1}^{I}\frac{p_i(1 - p_i)}{n_i}.
\]

Note that this variance depends on the unknown response probabilities p_i. This means that, for the usual test constructed by dividing the estimator by its standard error, it is impossible to determine how large the subsamples, n_i, must be in order to achieve a specified power for discerning an effect E(Ȳ₁) − E(Ȳ₂).
Bisgaard and Fuller (1995) address this problem for binomial subsamples by modeling p_i as p_i = p̄ + a_i, where p̄ is the overall average response over the treatments, and a_i is the combined effect of the ith factor combination on p_i. To eliminate the dependence of optimum run sizes on the p_i, Bisgaard and Fuller transform the p̂_i scale by the usual variance stabilizing transformation for the binomial case, namely, arcsin(√p̂_i). Therefore, if Ȳ is a mean of M transformed p̂_i's, Ȳ = Σ_{i=1}^{M} arcsin(√p̂_i)/M. The Bisgaard-Fuller optimum run size criterion is based on the expected value of the difference between two independent Ȳ's, say, Ȳ₁ − Ȳ₂, each of M = I/2 terms, representing estimates of arbitrary main effects or interactions on the transformed scale. The sample size determination formula is based on the run size obtained by applying the classical sample size criterion that balances Type I error (α) and Type II error (β) for an arbitrary main or interaction effect on the transformed scale. Bisgaard and Fuller show that any n satisfying

\[
n \geq \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2} \qquad (1)
\]

will solve the classical sample size problem, where Δ₁ is the difference one wants to detect on the transformed scale. It is implicitly assumed that the required sample size, n, will be large enough to ensure approximate normality of the difference Ȳ₁ − Ȳ₂.
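As an illustration of (1), the following Python sketch (ours, not the authors'; the function name and the probabilities passed to it are illustrative) computes the binomial run size when all high-level treatments share a response probability p_high and all low-level treatments share p_low, so that Δ₁ = arcsin(√p_high) − arcsin(√p_low).

```python
import math
from scipy.stats import norm

def binomial_run_size(p_high, p_low, I, alpha=0.05, power=0.90):
    """Smallest n satisfying (1) for the binomial model (equal run sizes)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    # Effect on the arcsin(sqrt(p)) scale
    delta1 = math.asin(math.sqrt(p_high)) - math.asin(math.sqrt(p_low))
    return math.ceil(z**2 / (I * delta1**2))

# Illustrative call: a design with I = 4 treatments, detecting .20 versus .10
print(binomial_run_size(0.20, 0.10, I=4))
```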

METHODOLOGY

We adopt an approach to sample size determination that builds on some features


of the Bisgaard and Fuller (1995) approach. We apply this approach to sample size
determination in a number of useful situations. Our approach can be summarized in

five steps:

1. Model: Model the estimator of the probability of a response, p̂_i, using an

ANOVA-like model;

2. Distribution: Use the model for the data to obtain Var(p̂_i);

3. Variance-Stabilizing Transformation: Find a transformation g of p̂_i that renders

Var(g(p̂_i)) independent of p_i;

4. Classical Sample Size Problem: Formulate the classical sample size problem for
H_0: Δ = 0 versus H_a: Δ = Δ₁, for Δ₁ ≠ 0, where Δ and Δ₁ are effects on the
transformed scale;

5. Solution: Solve the condition imposed by the classical sample size problem to
obtain the sample size allocation constraints.

Step 4 stipulates that

\[
\Delta_1 = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sqrt{\mathrm{Var}(\hat{Y}_1 - \hat{Y}_2)} \qquad (2)
\]

must hold for the transformed effects Ŷ₁ and Ŷ₂, where Δ₁ is the effect of interest on the transformed scale. To see this, consider the following.


Any main or interaction effect of interest can be represented as (1/(I/2)) Σ_{i=1}^{I/2} p_i − (1/(I/2)) Σ_{i'=1}^{I/2} p_{i'}, where the subscript i runs over the high levels of the factor or interaction, and i' runs over the low levels. Thus, we are interested in testing H_0: (1/(I/2)) Σ_{i=1}^{I/2} p_i − (1/(I/2)) Σ_{i'=1}^{I/2} p_{i'} = 0. The obvious test is based on the difference in means Ȳ₁ − Ȳ₂ = (1/(I/2)) Σ_{i=1}^{I/2} p̂_i − (1/(I/2)) Σ_{i'=1}^{I/2} p̂_{i'}. However, as noted in the introduction, the variance of this quantity depends on the unknown p_i, making it impossible to solve for a sample size that gives a predetermined power.

To obtain a test of H_0 that does not depend on the values of the p_i and p_{i'}, one can apply a variance-stabilizing transformation. Recall from the theory of Taylor series that, if g(p) is a differentiable function of p, then, to a first-order approximation, g(p̂) ≈ g(p) + g'(p)(p̂ − p) for values p̂ near p. It follows that, if p̂ is an unbiased estimator for p, then E(g(p̂)) ≈ g(p) and Var(g(p̂)) ≈ g'(p)² Var(p̂). A variance-stabilizing transformation is a transformation g such that Var(g(p̂)), and thus g'(p)² Var(p̂), does not depend on p. This is satisfied when g'(p) ∝ [Var(p̂)]^{-1/2}.
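As an informal check of this property (not part of the original paper; the subsample size and probabilities below are arbitrary), the following Python sketch simulates binomial subsamples and verifies that Var(arcsin(√p̂)) stays close to 1/(4n) across several values of p, even though Var(p̂) itself changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200          # subsample size (illustrative)
reps = 100_000   # Monte Carlo replications

for p in (0.1, 0.3, 0.5, 0.7):
    p_hat = rng.binomial(n, p, size=reps) / n
    g = np.arcsin(np.sqrt(p_hat))          # variance-stabilizing transform
    print(p, p_hat.var(), g.var(), 1 / (4 * n))
    # Var(p_hat) changes with p, but Var(g(p_hat)) stays close to 1/(4n)
```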
Transforming Ȳ₁ − Ȳ₂ using a variance-stabilizing transformation g gives the transformed contrast

\[
\hat{Y}_1 - \hat{Y}_2 = \frac{1}{I/2}\sum_{i=1}^{I/2} g(\hat{p}_i) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(\hat{p}_{i'}).
\]

Since E(Ŷ₁ − Ŷ₂) = (1/(I/2)) Σ_{i=1}^{I/2} E(g(p̂_i)) − (1/(I/2)) Σ_{i'=1}^{I/2} E(g(p̂_{i'})) ≈ (1/(I/2)) Σ_{i=1}^{I/2} g(p_i) − (1/(I/2)) Σ_{i'=1}^{I/2} g(p_{i'}), the test statistic

\[
\frac{\bigl|\hat{Y}_1 - \hat{Y}_2\bigr|}{\sqrt{\mathrm{Var}(\hat{Y}_1 - \hat{Y}_2)}}
\]

is appropriate for testing H_0. The alternative of interest must be formulated on the transformed scale; denote this alternative by Δ₁, where

\[
\Delta_1 = \frac{1}{I/2}\sum_{i=1}^{I/2} g(p_i^0) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(p_{i'}^0),
\]

and where p_i^0 and p_{i'}^0 are the specified alternative parameter values.
To achieve significance level α requires that the critical value c of |Ŷ₁ − Ŷ₂| satisfy

\[
\frac{c}{\sqrt{\mathrm{Var}(\hat{Y}_1 - \hat{Y}_2)}} = z_{1-\alpha/2},
\]

while to achieve power β at the alternative Δ₁ requires that

\[
\frac{\Delta_1 - c}{\sqrt{\mathrm{Var}(\hat{Y}_1 - \hat{Y}_2)}} = z_{1-\beta}.
\]

Both of these conditions are satisfied when (2) holds. In the remainder of the paper, sample sizes are determined by imposing this requirement.
In this paper, we will give sample size formulas for three data distributions: the
Poisson, the mixed binomial-Poisson, and the negative binomial distributions. We
also give a solution to the sample size problem for the binomial model with replication.
The results for the binomial model with replication have application to industrial
situations, where replication error is often of substantial interest. The run sizes in
this situation give guidance on whether ignoring replication error and running a single
replication is prudent in a specific situation.
Our sample size determinations allow for unequal run sizes. The extension to
unequal run sizes and cost considerations in the binomial situation is straightfor-
ward. This extension, together with a generalization to the Poisson case and cost
considerations, is detailed in Karson, Gaudard, and Ramsey (1997).

Note that, as in Bisgaard and Fuller (1995), our solutions only apply in the situation
where the constraints of the sample size problem require a sample size large enough
to ensure the approximate normality of the test statistic. We suggest that the user
run simulations to determine whether approximate normality can safely be assumed
in the specific situation of interest.

The Poisson Model

We consider the case where the response of interest for the ith treatment, X_{i·}, has a Poisson distribution with parameter λ_i. We need to find a sample size, or opportunity unit, that will allow us to detect specified effects. To do this, we approximate the Poisson model with a binomial model. We define p_i = λ_i/n_i, and, as in the binomial case, define p̂_i = X_{i·}/n_i. We model p̂_i by

\[
\hat{p}_i = \mu + \alpha_i + \varepsilon_i,
\]

where the errors ε_i are independent, E(ε_i) = 0, and Var(ε_i) = p_i/n_i. This constitutes Step 1, setting up the model, and Step 2, obtaining the variance of p̂_i.
A variance-stabilizing transformation (Step 3) is needed, and, in the Poisson case, such a transformation is well known to be the square root transformation, g(p̂_i) = √p̂_i. This transformation gives Var(g(p̂_i)) ≈ 1/(4n_i), which is independent of p_i, and E(g(p̂_i)) ≈ g(p_i). Thus the variance on the transformed scale is the same as in the binomial case, and so the condition imposed by the classical sample size problem for the Poisson model is formally the same as in the binomial case, namely,

\[
\sum_{i=1}^{I}\frac{1}{n_i} = \left(\frac{I\,\Delta_1}{z_{1-\alpha/2} + z_{1-\beta}}\right)^2. \qquad (3)
\]

However, Δ₁ is now dependent on the transformation for Poisson data. If all the n_i are equal, it follows that

\[
n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2}. \qquad (4)
\]

Note that Anscombe (1948) argues that the transformation √(X_ij + 3/8) is preferred for large n, both in terms of stabilizing the variance of the Poisson variable at 1/4 and in terms of reducing the bias. If we were to use this corrected form of the square root transformation, replacing √X_{i·} by √(X_{i·} + 3/8), we would be unable to solve explicitly for n. However, this is not a serious drawback, and the reader can certainly incorporate this transformation if desired.

It is also important to note that sample sizes determined using either variance-stabilizing transformation need not ensure approximate normality if the Poisson means of interest are small. In cases involving λ values of 5 or smaller, the practitioner should augment the sample size in order to obtain approximate normality.

Weights

The more general condition (3) permits the analyst to impose an additional criterion which, along with (3), will specify the run sizes in both the binomial and Poisson models. Suppose that w₁, w₂, ..., w_I are a set of treatment weights such that Σ_{i=1}^{I} w_i = 1. If H denotes the right hand side of (3), H = (IΔ₁/(z_{1-α/2} + z_{1-β}))², then choosing

\[
n_i = \frac{1}{w_i H}
\]

satisfies (3) and permits smaller run sizes for higher weights, or larger run sizes for lower weights. For example, a special kind of weight is a cost. Suppose that c_i is the cost of obtaining a subsample for treatment i, so that the total cost of the treatment i run is n_i c_i and the total cost of the experiment is Σ_{i=1}^{I} n_i c_i. Subject to (3), the total cost is minimized if

\[
w_i = \sqrt{c_i}\Big/\sum_{i=1}^{I}\sqrt{c_i}.
\]

Example

As an example, consider an industrial situation where metal parts are produced in


an investment casting operation. Three factors are studied to determine their effects
on flaws: mold design (A), pouring method (B), and temperature (C). A fractional
factorial design is employed with runs at a, b, c, and abc, where the appearance of a letter indicates that the corresponding factor is set at its high level, and the remaining factors are at their low levels. One lot of molds is run through the process at each set of treatment conditions; once completed, all parts in that lot are inspected for flaws and the total number of flaws for that lot is recorded. Note that the assumption of a Poisson model seems reasonable in this case. Denote the probabilities of success for the treatments a, b, c, and abc by p_a, p_b, p_c, and p_abc, respectively. We are interested in testing the main effect of A, H_0: (p_a + p_abc)/2 − (p_b + p_c)/2 = 0, at a significance level of .05 and with power .90 at the alternative that p_a = p_abc = .20 and p_b = p_c = .10. We need to determine how many parts must be tested at each set of treatment conditions.
Applying (4),

\[
n \approx \frac{(1.95996 + 1.28155)^2}{4\left(\sqrt{.2} - \sqrt{.1}\right)^2} \approx 153.1.
\]

Thus, a subsample size of 154 parts for each of the four treatments should provide the desired power, giving a total experiment size of 616. The reader may check that, using Anscombe's transformation, the subsample size would be 156. Note that these sample sizes are large enough to guarantee approximate normality if the probability of a flaw per unit is on the order of .10, or even somewhat smaller.

Consider the situation under a cost structure scenario. We will assume that the cost of the first three treatments, a, b, and c, results in a weighting of .20 for each treatment, and that the resulting weight for the fourth treatment, abc, is .40. Then the reader may check that the required sample sizes are n_a = n_b = n_c = 192 and n_abc = 96. This gives a total experiment size of 672.
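The computations above are easy to reproduce. The following Python sketch (our own illustration, not the authors' code) re-traces both scenarios using formula (4) and the weighting rule n_i = 1/(w_i H) described in the Weights section.

```python
import math
from scipy.stats import norm

alpha, power, I = 0.05, 0.90, 4
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

# Poisson case: g(p) = sqrt(p), so Delta_1 = sqrt(.2) - sqrt(.1)
delta1 = math.sqrt(0.20) - math.sqrt(0.10)

# Equal run sizes, formula (4)
n_equal = z**2 / (I * delta1**2)
print(n_equal)                      # about 153.1 -> round up to 154

# Cost-weighted run sizes: H is the right-hand side of (3), n_i = 1/(w_i * H)
H = (I * delta1 / z) ** 2
weights = {"a": 0.20, "b": 0.20, "c": 0.20, "abc": 0.40}
for trt, w in weights.items():
    print(trt, math.ceil(1 / (w * H)))   # about 192, 192, 192, 96
```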

THE MIXED BINOMIAL-POISSON MODEL

Binary response industrial experiments often have responses which can be consid-
ered either binomial or Poisson. The Poisson model often fits better than the binomial
in situations that otherwise appear to be binomial, but where the underlying assump-
tions of independence or constant probability do not hold. A class of models that
includes both the binomial model, the Poisson model in the limit, and also allows for

a combination of the two is the class of binomial-Poisson mixtures. In this section,


we derive the optimal sample size allocations for this class of models.
Again, consider a single replicate of a two-level balanced design, such as a 2^k or 2^(k-f) factorial experiment, consisting of I treatments, where a subsample size n_i is observed for treatment i, i = 1, ..., I. Each run of n_i subsamples is considered to consist of n_i independent Bernoulli trials. Denote the Bernoulli random variable for the jth subsample of treatment i by X_ij, and let p_i = E(X_ij) denote the unknown true probability of a response for a subsample of treatment i. Then X_{i·}, where X_{i·} = Σ_{j=1}^{n_i} X_ij, is assumed to follow the mixture distribution

\[
X_{i\cdot} \sim \gamma_i\,\mathrm{Bin}(n_i, p_i) + (1 - \gamma_i)\,\mathrm{Pois}(\lambda), \qquad (5)
\]

where 0 ≤ γ_i ≤ 1, where Bin(n_i, p_i) denotes a binomial distribution with parameters n_i and p_i, and where Pois(λ) denotes a Poisson distribution with parameter value λ = n_i p_i. The mixture is a convex combination of these two distributions, with mixture proportion determined by γ_i, and indicates that the response will be binomial with probability γ_i and Poisson with probability 1 − γ_i. The situation where γ_i = 0, namely where X_{i·} is Poisson, was discussed earlier. Note that, when using this mixed binomial-Poisson model, we assume that the Poisson probabilities of X_{i·} exceeding n_i are negligible, so that, for all intents and purposes, P(X_{i·} ≤ n_i) = 1 holds.
Given n_i trials of run i resulting in X_{i·} responses, we model p̂_i = X_{i·}/n_i as before, by

\[
\hat{p}_i = \mu + \alpha_i + \varepsilon_i.
\]

We assume that the errors ε_i are independent and E(ε_i) = 0, so that E(p̂_i) = p_i. Using this model, we show in Appendix B that, if we require that n_i = n for all i, the optimal sample allocation is given by

\[
n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2}.
\]

Here Δ₁ is given by

\[
\Delta_1 = \frac{1}{I/2}\sum_{i=1}^{I/2} g(p_i) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(p_{i'}),
\]

where the variance-stabilizing transformation is g(p_i) = (1/√γ_i) arcsin(√(γ_i p_i)). Note that n depends on the γ_i through Δ₁.
If w₁, w₂, ..., w_I is a set of treatment weights such that Σ_{i=1}^{I} w_i = 1, then choosing

\[
n_i = \frac{1}{w_i H}, \qquad H = \left(\frac{I\,\Delta_1}{z_{1-\alpha/2} + z_{1-\beta}}\right)^2,
\]

satisfies (13) and permits smaller run sizes for higher weights, or larger run sizes for lower weights. In particular, if c_i is the cost of obtaining a subsample for treatment i, the total cost of the treatment i run is n_i c_i, and the total cost of the experiment is Σ_{i=1}^{I} n_i c_i. Subject to (13), the total cost is minimized if

\[
w_i = \sqrt{c_i}\Big/\sum_{i=1}^{I}\sqrt{c_i}.
\]

Example

We provide an example that is consistent with the one previously constructed,


in order to permit comparisons among the examples. Consider now a high speed
operation where metal coupons are being stamped. Three factors are being studied
to determine their effect on defective units: die speed (A), amount of lubrication (B), and type of lubrication (C). Again, we employ a fractional factorial design with runs at the conditions described by a, b, c, and abc. The probabilities of the corresponding treatments are denoted by p_a, p_b, p_c, and p_abc. Again, we are interested in testing the main effect of A, H_0: (p_a + p_abc)/2 − (p_b + p_c)/2 = 0, at a significance level of .05 and with power .90 at the alternative that p_a = p_abc = .20 and p_b = p_c = .10. We will explore the effects of four different mixture scenarios, as indicated in the table below (the γ values are given as γ_a, γ_b, γ_c, γ_abc).


To illustrate the calculation, consider the case where treatments a and abc are considered to be fairly well represented by a binomial distribution, with γ values of .9, and where treatments b and c are thought to be more readily represented by a Poisson distribution, with γ values of .2. Then

\[
\Delta_1 = \left|\frac{1}{2}\sum_{i} g(p_i) - \frac{1}{2}\sum_{i'} g(p_{i'})\right|
         = \left|\frac{1}{\sqrt{.9}}\arcsin\!\left(\sqrt{(.9)(.2)}\right) - \frac{1}{\sqrt{.2}}\arcsin\!\left(\sqrt{(.2)(.1)}\right)\right|
         = .14456,
\]

and

\[
n = \frac{(1.96 + 1.28)^2}{4(.14456)^2} \approx 125.6.
\]
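A short Python sketch (ours, not part of the paper; it simply re-traces the calculation above) for the γ = (.9, .2, .2, .9) scenario:

```python
import math
from scipy.stats import norm

def g(p, gamma):
    """Variance-stabilizing transform for the binomial-Poisson mixture."""
    return math.asin(math.sqrt(gamma * p)) / math.sqrt(gamma)

alpha, power, I = 0.05, 0.90, 4
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

# High-level treatments a, abc: p = .20, gamma = .9; low-level b, c: p = .10, gamma = .2
delta1 = (g(0.20, 0.9) + g(0.20, 0.9)) / 2 - (g(0.10, 0.2) + g(0.10, 0.2)) / 2
n = z**2 / (I * delta1**2)
print(delta1, math.ceil(n))    # about 0.1446 and 126
```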

In the table below, we have rounded up to the next largest sample size. Here N represents the total experiment size. We also consider the cost structure scenario where the cost of the first three treatments, a, b, and c, results in a weighting of .20 for each treatment, and the resulting weight for the fourth treatment, abc, is .40.

Results for these scenarios are also given in the table below. Note that the variance-
stabilizing transformation does result in approximate normality for these sample sizes
for the model values of interest. Again, it is important for the practitioner to check
that approximate normality holds in the specific situation of interest.

γ_a, γ_b, γ_c, γ_abc    Equal Run Sizes                               Weighted Treatment Costs

.9, .2, .2, .9          n_a = n_b = n_c = n_abc = 126, N = 504        n_a = n_b = n_c = 158, n_abc = 79, N = 553

.9, .8, .8, .9          n_a = n_b = n_c = n_abc = 132, N = 528        n_a = n_b = n_c = 165, n_abc = 83, N = 578

.1, .2, .2, .1          n_a = n_b = n_c = n_abc = 153, N = 612        n_a = n_b = n_c = 191, n_abc = 96, N = 669

.1, .8, .8, .1          n_a = n_b = n_c = n_abc = 161, N = 644        n_a = n_b = n_c = 201, n_abc = 101, N = 704

THE NEGATIVE BINOMIAL MODEL

In the case of continuous observable Bernoulli processes, it is often convenient to


design an experiment in such a way that experimentation terminates when a pre-
determined number of responses has been observed. This strategy involves negative
binomial sampling. Here the "sample size" issue is that of determining the number
of "successes" that must be observed before experimentation is terminated.
The issue of determining a sample size adequate to achieve a given power in a
negative binomial situation is addressed by Bisgaard and Gertsbakh (2000). In their
derivation, they apply a first order Taylor series approximation that results in a
sample size that depends only on the variance of the estimator on the original scale.
We present a method that is more direct, and more accurate in certain cases, in that
it does not involve this approximation. We validate our method with simulations of

the achieved power and significance levels in a number of cases.

We will consider negative binomial sampling in a situation where one runs a single replicate of a two-level balanced design consisting of I treatments, where treatment i is applied and the process is run until a fixed number s_i, i = 1, ..., I, of responses is observed. Each treatment results in a random number N_i of runs, which are considered to be independent Bernoulli trials. Thus N_i, the number of runs required to obtain s_i responses under the conditions of treatment i, is a random variable, and we model it with a negative binomial distribution with parameters s_i and p_i, so that

\[
E(N_i) = \frac{s_i}{p_i}, \qquad \mathrm{Var}(N_i) = \frac{s_i(1 - p_i)}{p_i^2}.
\]

Given s_i responses resulting from the N_i runs under the conditions of treatment i, we model p̂_i = s_i/N_i as before, by

\[
\hat{p}_i = \mu + \alpha_i + \varepsilon_i.
\]

We assume that the errors ε_i are independent and that E(ε_i) = 0.


The variance-stabilizing transformation of interest, derived in Appendix C, is g(p̂_i) = −tanh⁻¹(√(1 − p̂_i)). The test that corresponds to this transformation is to reject the null hypothesis of no treatment or interaction effect if

\[
\sqrt{Is}\,\left[\frac{1}{I/2}\sum_{i'=1}^{I/2}\tanh^{-1}\!\left(\sqrt{1 - \hat{p}_{i'}}\right) - \frac{1}{I/2}\sum_{i=1}^{I/2}\tanh^{-1}\!\left(\sqrt{1 - \hat{p}_i}\right)\right] \qquad (6)
\]

exceeds z_{1-α/2} in absolute value. It follows from Appendix C that the sample size condition for the ith treatment response number is

\[
\sum_{i=1}^{I}\frac{1}{s_i} = \left(\frac{I\,\Delta_1}{z_{1-\alpha/2} + z_{1-\beta}}\right)^2.
\]

Note that, if we require that s_i = s for all i, then

\[
s = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2}, \qquad (7)
\]

and weights may be applied as in the previous models.

Note that, in their paper, Bisgaard and Gertsbakh work with what is essentially the transformation arcsinh(√(p⁻¹ − 1)). This is the transformation recommended by Anscombe (1948). It is important to note that the transformation that we employ, −tanh⁻¹(√(1 − p)), is equivalent to the transformation based on the hyperbolic arcsine, as the derivatives of tanh⁻¹(√(1 − p)) and arcsinh(√(p⁻¹ − 1)) are equal. Our decision to use the transformation −tanh⁻¹(√(1 − p)) is based on the ease of its derivation as a variance-stabilizing transformation.

Comparison with Bisgaard-Gertsbakh

The table below lists details for the four examples considered in Bisgaard and
Gertsbakh (2000). Here, PI and P2 refer to the probabilities of response at the low
and high settings, respectively, of the factor or interaction of interest; thus p_i = p₁, i = 1, ..., I/2, and p_{i'} = p₂, i' = 1, ..., I/2. The run sizes in the second column were
computed from (7), and then rounded up to the nearest integer. The "Beta" values
were simulated by generating 50,000 observations from negative binomial distributions
with the appropriate parameter values and determining the proportion of times that
the test in (6) rejected the null hypothesis for the run size given in (7) and the run size
obtained by Bisgaard and Gertsbakh (2000). The same simulated values were used
for the test based on each of the two run sizes. (Note that Bisgaard and Gertsbakh
(2000) indicate that their run sizes do not depend on the actual variance-stabilizing
transformation being used.) The "Alpha" values are obtained by generating 50,000
observations from distributions with parameter value P = (Pl + P2)/2, applying the
test in (6) for both run sizes, and calculating the proportion of times the tests were
rejected.
Note that the standard error for the simulation leading to the "Beta" values is

approximately 0.001342, and for the "Alpha" values is 0.000975. The values obtained
for Type I error and for power indicate that the formula we present in (7) performs quite well, and is more accurate than the Bisgaard-Gertsbakh method, which tends to be conservative, in cases where smaller run sizes are at issue. Also note that (7)
gives a simple formula for calculating the run size, whereas the approach in Bisgaard
and Gertsbakh (2000) requires two steps, one of which is based on a "lookup" table.
Example                            s from (7)   Alpha, Beta        s from B-G   Alpha, Beta

p1 = .075, p2 = .125, I = 8           19        .05192, .90784         20        .05774, .91506

p1 = .250, p2 = .350, I = 16          17        .05320, .90970         18        .05784, .91574

p1 = .00095, p2 = .00105, I = 16     262        .05094, .90060        264        .05158, .90196

p1 = .00875, p2 = .01125, I = 16      42        .05088, .90330         43        .05352, .90728
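The following Python sketch (our own illustration, not the authors' simulation code; the seed and replication count are arbitrary) shows how such a simulation can be organized for the first row of the table, estimating both the achieved power ("Beta") and the achieved significance level ("Alpha") of the test in (6).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def g(p):
    # variance-stabilizing transform for negative binomial sampling
    return -np.arctanh(np.sqrt(1.0 - p))

def reject_rate(p_high, p_low, s, I, alpha=0.05, reps=50_000):
    """Proportion of simulated experiments in which test (6) rejects H0."""
    z_crit = norm.ppf(1 - alpha / 2)
    m = I // 2
    # numpy's negative_binomial counts failures; total runs N = failures + s
    N_high = rng.negative_binomial(s, p_high, size=(reps, m)) + s
    N_low = rng.negative_binomial(s, p_low, size=(reps, m)) + s
    ybar1 = g(s / N_high).mean(axis=1)
    ybar2 = g(s / N_low).mean(axis=1)
    stat = np.abs(ybar1 - ybar2) * np.sqrt(I * s)
    return (stat > z_crit).mean()

# First row of the table: p1 = .075, p2 = .125, I = 8, s = 19
print("power:", reject_rate(0.125, 0.075, s=19, I=8))   # "Beta" column
print("alpha:", reject_rate(0.100, 0.100, s=19, I=8))   # "Alpha" at p = (p1+p2)/2
```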

Example

Consider once again the high speed operation where metal coupons are being
stamped. Three factors are involved: die speed (A), amount of lubrication (B),
and type of lubrication (C). We employ a fractional factorial design with runs at
the conditions described by a, b, c, and abc, and denote the probabilities of the corresponding treatments by p_a, p_b, p_c, and p_abc. We want to design the experiment to test the main effect of A, H_0: (p_a + p_abc)/2 − (p_b + p_c)/2 = 0, at a significance level of .05 and with power .90 at the alternative that p_a = p_abc = .20 and p_b = p_c = .10.
From (7),

\[
s \approx \frac{(1.96 + 1.28)^2}{4\left(-\tanh^{-1}\!\left(\sqrt{1 - .2}\right) + \tanh^{-1}\!\left(\sqrt{1 - .1}\right)\right)^2} \approx 18.68.
\]

Thus, a run size of 19 will provide power .90 at significance level .05. Simulations confirm that the distribution of the test statistic,

\[
\frac{\tfrac{1}{2}\left[-\left(\tanh^{-1}\!\left(\sqrt{1-\hat{p}_a}\right) + \tanh^{-1}\!\left(\sqrt{1-\hat{p}_{abc}}\right)\right) + \left(\tanh^{-1}\!\left(\sqrt{1-\hat{p}_b}\right) + \tanh^{-1}\!\left(\sqrt{1-\hat{p}_c}\right)\right)\right]}{\sqrt{1/(4(19))}},
\]

is well approximated by a normal distribution.
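For completeness, a minimal Python sketch of the arithmetic behind this run size (ours, not the authors'; it simply evaluates (7) with the example's values):

```python
import math
from scipy.stats import norm

z = norm.ppf(0.975) + norm.ppf(0.90)
delta1 = -math.atanh(math.sqrt(1 - 0.20)) + math.atanh(math.sqrt(1 - 0.10))
s = z**2 / (4 * delta1**2)
print(s, math.ceil(s))   # about 18.7 -> 19
```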

THE BINOMIAL MODEL WITH REPLICATION ERROR

Next, we study a binomial response model that allows for replication error. Still in the context of a two-level balanced design, we consider I runs, each of which is replicated R times, so that the Bernoulli random variable for the jth trial of run i in replicate r is denoted by X_irj, i = 1, ..., I, r = 1, ..., R, j = 1, ..., n_ir. Let p_ir = E(X_irj) denote the unknown true probability of a response for the rth replicate of treatment i, and model this by p_ir = μ + α_i + ε_i(r), where μ is the overall treatment mean probability of a response, α_i is the fixed effect of treatment i, and where ε_i(r) is a random effect with E(ε_i(r)) = 0 representing replication error. Note that, unconditional on ε_i(r), E(X_irj) = μ + α_i; we define p_i to be this unconditional expected value, p_i = E(X_irj) = μ + α_i. Thus p_i is interpreted as the overall probability of a response to treatment i.

Consistent with previous notation, we let p̂_ir = (1/n_ir) Σ_{j=1}^{n_ir} X_irj. We model p̂_ir by

\[
\hat{p}_{ir} = \mu + \alpha_i + \varepsilon_{i(r)} + \eta_{ir},
\]

where η_ir is a random effect with E(η_ir) = 0, representing the error due to subsampling within a replication of treatment i. We assume that the ε_i(r) and η_ir are independent.

In Appendix D, we derive the sample size condition for the ith treatment and rth replicate run size, n_ir. This derivation is based on the transformation arcsin((2p̂ − 1)/√(1 + 4nσ_i²)). The sample size result is summarized in formula (18) of Appendix D:

\[
\sum_{i=1}^{I}\sum_{r=1}^{R}\frac{1}{n_{ir}} = \frac{1}{4}\left(\frac{IR\,\Delta_1}{z_{1-\alpha/2} + z_{1-\beta}}\right)^2. \qquad (8)
\]

As usual, Δ₁ is the transformed effect of interest. If all replicates are to have equal run sizes, this common run size, n, must satisfy (19):

\[
n = \frac{I\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{R\left(\sum_{i=1}^{I/2}\arcsin\!\left(\frac{2p_i - 1}{\sqrt{1 + 4n\sigma_i^2}}\right) - \sum_{i'=1}^{I/2}\arcsin\!\left(\frac{2p_{i'} - 1}{\sqrt{1 + 4n\sigma_{i'}^2}}\right)\right)^2}.
\]

Note that this does not provide an explicit expression for n; rather, one must solve
for n implicitly. This is fairly easy, using mathematical or statistical software.
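For instance, one can hand the defining equation to a root finder. A minimal Python sketch (our own illustration; the function name and parameter values are assumed, not from the paper) under the simplifying assumption of a common replication variance and common alternative probabilities at the high and low settings:

```python
import math
from scipy.stats import norm
from scipy.optimize import brentq

def run_size_with_replication(p_high, p_low, I, R, sigma2, alpha=0.05, power=0.90):
    """Solve (19) implicitly for the common run size n.

    p_high, p_low: assumed response probabilities at the high/low settings;
    sigma2: replication error variance (assumed common across treatments).
    """
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    m = I // 2

    def delta_sum(n):
        # bracketed sum in (19), with m identical terms on each side
        g = lambda p: math.asin((2 * p - 1) / math.sqrt(1 + 4 * n * sigma2))
        return m * (g(p_high) - g(p_low))

    # n solves n = I * z**2 / (R * delta_sum(n)**2); find the root of the difference.
    # brentq raises an error if no solution exists in the bracket (large sigma2).
    f = lambda n: n - I * z**2 / (R * delta_sum(n) ** 2)
    return brentq(f, 1.0, 1e6)   # round up to the next integer in practice

# Illustrative call (values chosen to mirror the example below):
print(run_size_with_replication(0.20, 0.10, I=8, R=3, sigma2=0.01**2))
```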
The usual transformation for stabilizing binomial data (which is also the one used by Bisgaard and Fuller (1995)) is arcsin(√p̂). In the case where there is no replication error, the transformation arcsin((2p̂ − 1)/√(1 + 4nσ_i²)) reduces to arcsin(2p̂ − 1). It is interesting to note that this transformation is equivalent to the usual transformation; it can be shown, by differentiating both expressions, that 2 arcsin(√p) = arcsin(2p − 1) + π/2.

Example

We return to our example where metal coupons are being stamped in a high-speed operation. The three factors of interest are: die speed (A), amount of lubrication (B), and type of lubrication (C). We run a full factorial design. Given the experimental constraints, it is possible to replicate the experiment three times. Again, we are interested in testing the main effect of A at significance level .05; this corresponds to a test of H_0: (p_a + p_ab + p_ac + p_abc)/4 − (p_(1) + p_b + p_c + p_bc)/4 = 0. In our first set of computations, we require power .90 at the alternative that p_a = p_ab = p_ac = p_abc = .20 and p_(1) = p_b = p_c = p_bc = .10. In our second set of computations, we require power .90 at the alternative that the response probability is .11 at the "high" level of the factor and .09 at the "low" level.

We assume that the replication error variance is constant, and, as earlier, we denote this replication variance by σ². We also require equal subsample sizes. Results for various values of σ are given in the table below. All fractional subsample sizes were rounded up to the nearest integer, and the overall experiment size was obtained by adding these. Were we to run the experiment with only one replication, assuming that replication error is zero, then the required run size in the first case would be 66, and in the second 1180. Note that, when the replication error is small, the total experiment size for the experiment run with three replications is essentially the same as the total experiment size when the experiment is run with only one replication. We also note that the sample size equation cannot be solved if the replication variance is large compared to the difference one is interested in detecting.

               p1 = .1, p2 = .2                    p1 = .09, p2 = .11

Sigma     Run Size   Experiment Size        Run Size   Experiment Size
.0001        22            528                 394          9456
.001         22            528                 395          9480
.005         22            528                 442         10608
.01          23            552                 700         16800
.05          40            960                  *             *
.06          60           1440                  *             *
.07         156           3744                  *             *
.08           *              *                  *             *

CONCLUSION

The table below summarizes the transformations used in each situation considered in this paper. In most cases, the practitioner will be interested in obtaining equal run sizes, and the power requirement will be based on an assumption of equal response probabilities at those treatments involving the high level of the contrast of interest, say p₁, and at those involving the low level of the contrast of interest, say p₂, as in our examples. The sample size in each situation can then be computed as (z_{1-α/2} + z_{1-β})²/(I Δ₁²), where Δ₁ = g(p₁) − g(p₂). Note that the formula for the negative binomial gives the number of responses that must be observed before the run is terminated.

Replication?   Model              Transformation

No             Binomial           g(p̂_i) = arcsin(√p̂_i)
No             Poisson            g(p̂_i) = √p̂_i
No             Binomial-Poisson   g(p̂_i) = arcsin(√(γ_i p̂_i))/√γ_i
No             Neg. Binomial      g(p̂_i) = −tanh⁻¹(√(1 − p̂_i))
Yes            Binomial           g(p̂_i) = arcsin((2p̂_i − 1)/√(1 + 4n σ_i²))
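To make the summary concrete, here is a small Python sketch (ours, not the authors'; the dictionary keys and helper names are our own) that evaluates the common formula (z_{1-α/2} + z_{1-β})²/(I Δ₁²) for the non-replicated models in the table, in the equal-probability setting described above. The replicated binomial case requires the implicit solve illustrated earlier.

```python
import math
from scipy.stats import norm

# Transformations from the summary table (gamma applies only to the mixture model)
transforms = {
    "binomial":          lambda p, gamma=None: math.asin(math.sqrt(p)),
    "poisson":           lambda p, gamma=None: math.sqrt(p),
    "binomial-poisson":  lambda p, gamma: math.asin(math.sqrt(gamma * p)) / math.sqrt(gamma),
    "negative binomial": lambda p, gamma=None: -math.atanh(math.sqrt(1 - p)),
}

def sample_size(model, p1, p2, I, alpha=0.05, power=0.90, gamma=None):
    """Common run size (or response count, for the negative binomial)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    g = transforms[model]
    delta1 = g(p1, gamma) - g(p2, gamma)
    return (z ** 2) / (I * delta1 ** 2)

for model in ("binomial", "poisson", "negative binomial"):
    print(model, math.ceil(sample_size(model, 0.20, 0.10, I=4)))
```

For the Poisson row, this reproduces the subsample size of 154 from the investment-casting example.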
APPENDIX A. DERIVATION OF BINOMIAL SAMPLE SIZE
FORMULA FOR UNEQUAL RUN SIZES

Consider a two-level balanced design consisting of I treatments, i = 1, ..., I, where each treatment condition is set and a run size, or subsample size, of n is obtained. The subsample of size n obtained for a given treatment condition setting is assumed to consist of n independent Bernoulli trials. Let p_i denote the unknown true probability of a response (success) for a subsample obtained under the conditions of treatment i. Thus, if X_ij is the Bernoulli random variable for the jth subsample of treatment i, then p_i = E(X_ij).

Denote the sum of the X_ij over the trials by X_{i·}, so that X_{i·} = Σ_{j=1}^{n} X_ij. Then, given n subsamples of treatment i resulting in X_{i·} responses, p̂_i = X_{i·}/n is the usual estimator of p_i. A model for p̂_i is

\[
\hat{p}_i = \mu + \alpha_i + \varepsilon_i,
\]

where the errors ε_i are independent, E(ε_i) = 0, and Var(ε_i) = p_i(1 − p_i)/n = Var(p̂_i), i = 1, ..., I, and Σ_{i=1}^{I} α_i = 0.
In a 2^k or 2^(k-f) factorial experiment, estimable main effects and interaction effects are of primary interest. Any estimator of an estimable effect is a contrast in means of p̂_i's. Suppose Ȳ is a mean of M p̂_i's. Then Ȳ = Σ_{i=1}^{M} p̂_i/M estimates E(Ȳ) = μ + Σ_{i=1}^{M} α_i/M, and

\[
\mathrm{Var}(\bar{Y}) = \frac{1}{M^2}\sum_{i=1}^{M} p_i(1 - p_i)/n.
\]

Note that Var(Ȳ) depends on the M p_i's, where p_i = E(p̂_i).

Since the optimum run sizes will depend on the unknown p_i's, we transform the p̂_i scale by the usual variance stabilizing transformation for the binomial case, namely, arcsin(√p̂_i). Denote this variance-stabilizing transformation by g, so that g(p̂_i) = arcsin(√p̂_i). This yields Var(g(p̂_i)) ≈ 1/(4n), which is independent of p_i, and E(g(p̂_i)) ≈ g(p_i). Therefore, if Ȳ is a mean of M transformed p̂_i's, Ȳ = Σ_{i=1}^{M} arcsin(√p̂_i)/M.

Now consider an arbitrary main effect or interaction on the transformed scale. It is estimated by a contrast, Ȳ₁ − Ȳ₂, each of M = I/2 terms. That is,

\[
\bar{Y}_1 - \bar{Y}_2 = \frac{1}{I/2}\sum_{i=1}^{I/2} g(\hat{p}_i) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(\hat{p}_{i'}),
\]

where

\[
E(\bar{Y}_1 - \bar{Y}_2) \approx \frac{1}{I/2}\sum_{i=1}^{I/2} g(p_i) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(p_{i'}) = \Delta,
\]

to a first order Taylor series approximation, and

\[
\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) \approx \frac{1}{In}. \qquad (9)
\]
Thus, Ȳ₁ − Ȳ₂ estimates Δ with variance given by (9). For a given Δ, the classical sample size problem considers the null hypothesis that Δ = 0 versus the alternative that Δ ≠ 0, with the type I error probability, α, specified, and the type II error probability, when Δ = Δ₁, specified for a given Δ₁ ≠ 0 as β. The large sample test rejects the null hypothesis when |Ȳ₁ − Ȳ₂|/√Var(Ȳ₁ − Ȳ₂) > z_{1-α/2}, where z_{1-α/2} is the standard normal (1 − α/2) quantile. The "α critical value" is z_{1-α/2}√Var(Ȳ₁ − Ȳ₂) and the "β critical value" is Δ₁ − z_{1-β}√Var(Ȳ₁ − Ȳ₂), and the classical sample size problem solution equates these two "critical values". Equating the two gives

\[
\Delta_1 = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sqrt{\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2)},
\]

which from (9) gives, as the run size condition,

\[
n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2}. \qquad (10)
\]
APPENDIX B. SAMPLE SIZE DERIVATION IN MIXED


BINOMIAL-POISSON MODEL

Recall that, when using this mixed binomial-Poisson model, we assume that the Poisson probabilities of X_{i·} exceeding n_i are negligible, so that, for all intents and purposes, P(X_{i·} ≤ n_i) = 1 holds. Given n_i trials of run i resulting in X_{i·} responses, we model p̂_i = X_{i·}/n_i, as before, by

\[
\hat{p}_i = \mu + \alpha_i + \varepsilon_i,
\]

where we assume that the errors ε_i are independent and E(ε_i) = 0, so that E(p̂_i) = p_i. This is Step 1, the model for p̂_i.

Step 2 consists of applying the distribution of the response to obtain the variance of p̂_i. Here it is easily shown that Var(p̂_i) = p_i(1 − γ_i p_i)/n_i, i = 1, ..., I.

Since the variance depends on p_i, a variance-stabilizing transformation is needed. Recall from the theory of Taylor series that, if g(p) is a differentiable function of p, then, to a first-order approximation, g(p̂) ≈ g(p) + g'(p)(p̂ − p) for values p̂ near p. It follows that, if p̂ is an unbiased estimator for p, then E(g(p̂)) ≈ g(p) and Var(g(p̂)) ≈ g'(p)² Var(p̂). A variance-stabilizing transformation is a transformation g such that Var(g(p̂)), and thus g'(p)² Var(p̂), does not depend on p. This is satisfied when g'(p) ∝ [Var(p̂)]^{-1/2}.

To obtain a variance-stabilizing transformation in this situation requires that g'(p) = c[p(1 − γp)/n]^{-1/2}, where c is a constant. In the following derivation, we denote the value of p_i that corresponds to a treatment level, and its corresponding mixture proportion γ_i, generically by p and γ, respectively. Using this notation, we require that

\[
g(p) = c\int\left[p(1 - \gamma p)/n\right]^{-1/2} dp
     = c\sqrt{n}\int\frac{dp}{\sqrt{p}\,\sqrt{1 - \gamma p}}
     = \frac{2c\sqrt{n}}{\sqrt{\gamma}}\,\arcsin\!\left(\sqrt{\gamma p}\right).
\]

Reverting now to our notation of p_i, γ_i, and n_i to designate the values associated with the ith treatment combination, the variance-stabilizing transformation suggested by our derivation is, up to constants, arcsin(√(γ_i p_i)). However, note that, from the Taylor series expansion, Var(arcsin(√(γ_i p̂_i))) ≈ γ_i/(4n_i), which depends on γ_i. Also note that E[arcsin(√(γ_i p̂_i))] ≈ arcsin(√(γ_i p_i)) depends on γ_i. This means that the expected value of the transformed contrast, E(Ȳ₁ − Ȳ₂) = Δ, would not in general be 0 under H₀: the effect of interest is zero. To a first degree Taylor series approximation about 0, arcsin(√(γ_i p_i)) ≈ √(γ_i p_i), and so we define g(p̂_i) = (1/√γ_i) arcsin(√(γ_i p̂_i)). This ensures that E[g(p̂_i)] ≈ √p_i, so that Δ will be 0 under H₀. Note that Var(g(p̂_i)) ≈ 1/(4n_i). This is Step 3.
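As a quick symbolic check (ours, not part of the paper), one can verify with sympy that this transformation satisfies g'(p)² · p(1 − γp) = 1/4, which is exactly the variance-stabilizing property used above, since Var(p̂_i) = p_i(1 − γ_i p_i)/n_i.

```python
import sympy as sp

p, gamma = sp.symbols('p gamma', positive=True)
g = sp.asin(sp.sqrt(gamma * p)) / sp.sqrt(gamma)   # transformation for the mixture model
# g'(p)^2 times n*Var(p_hat) = p*(1 - gamma*p); should reduce to 1/4
check = sp.simplify(sp.diff(g, p)**2 * p * (1 - gamma * p))
print(check)   # expected: 1/4
```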


Step 4 consists of formulating the classical sample size problem on the transformed scale. We transform the p̂_i, and denote a mean of the g(p̂_i) by Ȳ, so that Ȳ = Σ_{i=1}^{M} g(p̂_i)/M is our generic notation for such a mean. An arbitrary main effect or interaction on the transformed scale is estimated by a contrast, Ȳ₁ − Ȳ₂, each of M = I/2 terms. That is, as in Appendix A,

\[
\bar{Y}_1 - \bar{Y}_2 = \frac{1}{I/2}\sum_{i=1}^{I/2} g(\hat{p}_i) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(\hat{p}_{i'}),
\]

where

\[
E(\bar{Y}_1 - \bar{Y}_2) \approx \frac{1}{I/2}\sum_{i=1}^{I/2} g(p_i) - \frac{1}{I/2}\sum_{i'=1}^{I/2} g(p_{i'}) = \Delta_1,
\]

and where now

\[
\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) \approx \frac{1}{I^2}\sum_{i=1}^{I}(1/n_i) \qquad (11)
\]

\[
= \frac{1}{nI} \qquad (12)
\]

when all run sizes are equal to n. Thus, Ȳ₁ − Ȳ₂ estimates Δ₁ with variance given by (11). The classical sample size problem condition for the common run size is thus

\[
n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2}. \qquad (13)
\]

Therefore, any n satisfying (13) will solve the classical sample size problem for the binomial-Poisson mixture family. Note again that Δ₁ is model-dependent and must be expressed in the transformed scale.

APPENDIX C. SAMPLE SIZE DETERMINATION FOR THE


NEGATIVE BINOMIAL MODEL

Let N_i denote the number of runs required to obtain s_i responses under the conditions of treatment i, so that N_i has a negative binomial distribution with parameters s_i and p_i. It follows that E(N_i) = s_i/p_i and Var(N_i) = s_i(1 − p_i)/p_i². Given s_i responses resulting from the N_i runs under the conditions of treatment i, we model p̂_i = s_i/N_i by

\[
\hat{p}_i = \mu + \alpha_i + \varepsilon_i,
\]

where μ is the overall treatment mean probability of a response, α_i is the fixed effect of treatment i, and Σ_{i=1}^{I} α_i = 0. We assume that the errors ε_i are independent and that E(ε_i) = 0. This constitutes Step 1, specifying the ANOVA model for the p̂_i.
In Step 2, we apply the relevant response distribution to obtain Var(p̂_i) = Var(s_i/N_i). From a first order Taylor series expansion of p̂_i = s_i/N_i, considered as a function of N_i, around E(N_i) = s_i/p_i, it follows that

\[
\mathrm{Var}(\hat{p}_i) \approx \frac{p_i^2(1 - p_i)}{s_i}.
\]

Step 3 consists of finding a variance-stabilizing transformation, namely, a transformation g such that Var(g(p̂_i)) is approximately constant. Such a transformation is given by

\[
g(\hat{p}_i) = -\tanh^{-1}\!\left(\sqrt{1 - \hat{p}_i}\right).
\]

For this transformation, one can show that Var(g(p̂_i)) ≈ 1/(4s_i).
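A quick symbolic check of this claim (our own sketch, not part of the paper): with Var(p̂_i) ≈ p_i²(1 − p_i)/s_i, the product g'(p)² · p²(1 − p) should equal 1/4, so that Var(g(p̂_i)) ≈ 1/(4s_i).

```python
import sympy as sp

p = sp.symbols('p', positive=True)
g = -sp.atanh(sp.sqrt(1 - p))                  # negative binomial transformation
# g'(p)^2 times s*Var(p_hat) = p^2*(1 - p); should reduce to 1/4
check = sp.simplify(sp.diff(g, p)**2 * p**2 * (1 - p))
print(check)   # expected: 1/4
```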
In Step 4, we formulate the classical sample size problem on the transformed scale. Using the notation introduced in the preceding section and the fact that Var(g(p̂_i)) ≈ 1/(4s_i), it follows that

\[
\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) \approx \frac{1}{I^2}\sum_{i=1}^{I}(1/s_i).
\]

Recall that the classical sample size problem requires solving

\[
\Delta_1 = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sqrt{\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2)}
\]

for the sample size. In the negative binomial case, the classical sample size problem condition for the ith treatment response number is

\[
\sum_{i=1}^{I}\frac{1}{s_i} = \left(\frac{I\,\Delta_1}{z_{1-\alpha/2} + z_{1-\beta}}\right)^2.
\]

Note that, if we require that s_i = s for all i, then

\[
s = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{I\,\Delta_1^2},
\]

and weights may be applied as in the previous models.

APPENDIX D. SAMPLE SIZE DERIVATION FOR THE BINOMIAL


MODEL WITH REPLICATION ERROR

Here, our ANOVA model (Step 1) for p̂_ir is given by

\[
\hat{p}_{ir} = \mu + \alpha_i + \varepsilon_{i(r)} + \eta_{ir},
\]

where η_ir is a random effect with E(η_ir) = 0, representing the error due to subsampling within a replication of treatment i. We assume that the ε_i(r) and η_ir are independent.
Step 2 consists of applying the distribution of the response to obtain the variance of p̂_ir. Note that, given ε_i(r), the variance of X_irj is p_ir(1 − p_ir). Thus, Var(p̂_ir | ε_i(r)) = p_ir(1 − p_ir)/n_ir. Denoting the variance of ε_i(r) by σ_i², the unconditional variance of p̂_ir is given by

\[
\mathrm{Var}(\hat{p}_{ir}) = \frac{p_i(1 - p_i)}{n_{ir}} + \sigma_i^2\left(\frac{n_{ir} - 1}{n_{ir}}\right). \qquad (14)
\]

As expected, the variance depends on p_i, and so a variance-stabilizing transformation is needed. To simplify the problem of obtaining a variance-stabilizing transformation for (14), we make the assumption that the coefficient of σ_i² is near 1. Thus we find a variance-stabilizing transformation g for p̂_ir, where we assume that Var(p̂_ir) = p_i(1 − p_i)/n_ir + σ_i². We seek a transformation g such that g'(p_ir) ∝ Var(p̂_ir)^{-1/2}. The reader can verify that such a transformation is given by

\[
g(\hat{p}_{ir}) = \arcsin\!\left(\frac{2\hat{p}_{ir} - 1}{\sqrt{1 + 4 n_{ir}\sigma_i^2}}\right),
\]

and that g'(p_ir) = (p_i(1 − p_i) + n_ir σ_i²)^{-1/2} = (n_ir Var(p̂_ir))^{-1/2}. Since g is a variance-stabilizing transformation, it follows that Var(g(p̂_ir)) ≈ g'(p_i)² Var(p̂_ir), and so Var(g(p̂_ir)) ≈ 1/n_ir. Note that E(g(p̂_ir)) ≈ arcsin((2p_i − 1)/√(1 + 4 n_ir σ_i²)).
In Step 4, we formulate the classical sample size problem on the transformed scale. An arbitrary main effect or interaction, on the transformed scale, is estimated by a contrast, Ŷ₁ − Ŷ₂, of the form

\[
\hat{Y}_1 - \hat{Y}_2 = \frac{1}{IR/2}\sum_{i=1}^{I/2}\sum_{r=1}^{R} g(\hat{p}_{ir}) - \frac{1}{IR/2}\sum_{i'=1}^{I/2}\sum_{r=1}^{R} g(\hat{p}_{i'r}).
\]

Here

\[
E(\hat{Y}_1 - \hat{Y}_2) \approx \frac{2}{IR}\left[\sum_{i=1}^{I/2}\sum_{r=1}^{R} E\bigl(g(\hat{p}_{ir})\bigr) - \sum_{i'=1}^{I/2}\sum_{r=1}^{R} E\bigl(g(\hat{p}_{i'r})\bigr)\right] \qquad (15)
\]

\[
\approx \frac{2}{I}\left[\sum_{i=1}^{I/2}\arcsin\!\left(\frac{2p_i - 1}{\sqrt{1 + 4 n_{ir}\sigma_i^2}}\right) - \sum_{i'=1}^{I/2}\arcsin\!\left(\frac{2p_{i'} - 1}{\sqrt{1 + 4 n_{i'r}\sigma_{i'}^2}}\right)\right] = \Delta_1.
\]

Also,

\[
\mathrm{Var}(\hat{Y}_1 - \hat{Y}_2) \approx \frac{4}{(IR)^2}\sum_{i=1}^{I}\sum_{r=1}^{R}(1/n_{ir}). \qquad (16)
\]

Solving the classical sample size problem for an alternative L}.l requires finding nir

to satisfy
(17)

From (16), it follows that the run sizes nir must satisfy:

tt~=!(
r=l nir
i=l 4 Z(1-O'/2)
IRL}.l
+ Z(l-J3)
)2 (18)

If we are seeking equal run sizes for all treatments and replications, then from (18),
denoting the common run size by n,

4(Z(1-O'/2) + Z(l-.8») 2
n = IRL}.r

26
It follows from (15) that the common run size for each replicate satisfies:

n = I(z(l-Ci/2) + Z(1_.B))2 (19)


R (2:.I~2
2-1
arcsin ( (2pi-1) ) _ 2:.If}: arcsin ( (2pj,-1)
J1+4no} 2 -1 J1+4no 7'
))2.

REFERENCES

[1] Anscombe, F. J. (1948). "The Transformation of Poisson, Binomial, and Negative-Binomial Data." Biometrika, 35, pp. 246-254.

[2] Bisgaard, S. and Fuller, H. T. (1995). "Sample Size Estimates for 2^(k-p) Designs with Binary Responses." Journal of Quality Technology, 27, 4, pp. 344-354.

[3] Bisgaard, S. and Gertsbakh, I. (2000). "2^(k-p) Experiments with Binary Responses: Inverse Binomial Sampling." Journal of Quality Technology, 32, 2, pp. 148-156.

[4] Karson, M., Gaudard, M., and Ramsey, P. J. (1997). "Sample Size Allocations for Factorial Experiments with Binary Responses." Proceedings of the Section on Quality and Productivity, American Statistical Association, pp. 124-127.

Marie Gaudard is a Professor of Statistics in the Department of Mathematics and Statistics. She is a Senior member of ASQ and RCA.

Marvin Karson is Professor Emeritus of the Department of Decision Sciences.

Philip J. Ramsey is an Adjunct Professor of Statistics in the Department of Mathematics and Statistics.

