
Approximate Is Better than "Exact" for Interval Estimation of Binomial Proportions

Author(s): Alan Agresti and Brent A. Coull


Source: The American Statistician, Vol. 52, No. 2 (May, 1998), pp. 119-126
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2685469
Accessed: 22/11/2010 07:51

Approximate Is Better than "Exact" for Interval Estimation of Binomial Proportions

Alan AGRESTI and Brent A. COULL

Alan Agresti is Professor, Department of Statistics, University of Florida, Gainesville, FL 32611-8545 (E-mail: aa@stat.ufl.edu). Brent A. Coull is a post-doc, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115. This work was partially supported by a grant from the National Institutes of Health. The authors thank the referees and Thomas Santner for helpful suggestions.

© 1998 American Statistical Association. The American Statistician, May 1998, Vol. 52, No. 2, p. 119.

For interval estimation of a proportion, coverage probabilities tend to be too large for "exact" confidence intervals based on inverting the binomial test and too small for the interval based on inverting the Wald large-sample normal test (i.e., sample proportion ± z-score × estimated standard error). Wilson's suggestion of inverting the related score test with null rather than estimated standard error yields coverage probabilities close to nominal confidence levels, even for very small sample sizes. The 95% score interval has similar behavior as the adjusted Wald interval obtained after adding two "successes" and two "failures" to the sample. In elementary courses, with the score and adjusted Wald methods it is unnecessary to provide students with awkward sample size guidelines.

KEY WORDS: Confidence interval; Discrete distribution; Exact inference; Poisson distribution; Small sample; Score test.

1. INTRODUCTION

One of the most basic analyses in statistical inference is forming a confidence interval for a binomial parameter $p$. Let $X$ denote a binomial variate for sample size $n$, and let $\hat{p} = X/n$ denote the sample proportion. Most introductory statistics textbooks present the confidence interval based on the asymptotic normality of the sample proportion and estimating the standard error. This $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} \qquad (1)$$

where $z_c$ denotes the $1 - c$ quantile of the standard normal distribution. This is called the Wald confidence interval for $p$, since it results from inverting the Wald test for $p$; that is, the interval is the set of $p_0$ values having P value exceeding $\alpha$ in testing $H_0: p = p_0$ against $H_a: p \neq p_0$ using the test statistic $z = (\hat{p} - p_0)/\sqrt{\hat{p}(1 - \hat{p})/n}$. Historically, this is surely one of the first confidence intervals proposed for any parameter (see, e.g., Laplace 1812, p. 283).

To avoid approximation, most advanced statistics textbooks recommend the Clopper-Pearson (1934) "exact" confidence interval for $p$, based on inverting equal-tailed binomial tests of $H_0: p = p_0$. It has endpoints that are the solutions in $p_0$ to the equations

$$\sum_{k=x}^{n} \binom{n}{k} p_0^k (1 - p_0)^{n-k} = \alpha/2$$

and

$$\sum_{k=0}^{x} \binom{n}{k} p_0^k (1 - p_0)^{n-k} = \alpha/2,$$

except that the lower bound is 0 when $x = 0$ and the upper bound is 1 when $x = n$. This interval estimator is guaranteed to have coverage probability of at least $1 - \alpha$ for every possible value of $p$. When $x = 1, 2, \ldots, n - 1$, the confidence interval equals

$$\left[1 + \frac{n - x + 1}{x\,F_{2x,\,2(n-x+1),\,1-\alpha/2}}\right]^{-1} < p < \left[1 + \frac{n - x}{(x + 1)\,F_{2(x+1),\,2(n-x),\,\alpha/2}}\right]^{-1},$$

where $F_{a,b,c}$ denotes the $1 - c$ quantile from the $F$ distribution with degrees of freedom $a$ and $b$. Equivalently, the lower endpoint is the $\alpha/2$ quantile of a beta distribution with parameters $x$ and $n - x + 1$, and the upper endpoint is the $1 - \alpha/2$ quantile of a beta distribution with parameters $x + 1$ and $n - x$. Letters to the editor from J. Klotz and from L. Leemis and K. S. Trivedi in the November 1996 issue of this journal (p. 389) showed how simple it is to calculate this interval using Minitab or S-Plus.

A considerable literature exists about these and other, less common, methods of forming confidence intervals for $p$. Santner and Duffy (1989, pp. 33-43) and Vollset (1993) reviewed a variety of methods. It has been known for some time that the Wald interval performs poorly unless $n$ is quite large (e.g., Ghosh 1979, Blyth and Still 1983). The Clopper-Pearson exact interval is typically treated as the "gold standard" (e.g., Böhning 1994; Leemis and Trivedi 1996; Jovanovic and Levy 1997; and most mathematical statistics texts). However, this procedure is necessarily conservative, because of the discreteness of the binomial distribution (Neyman 1935), just as the corresponding exact test (without supplementary randomization on the boundary of the critical region) is conservative. For any fixed parameter value, the actual coverage probability can be much larger than the nominal confidence level unless $n$ is quite large, and we believe it is inappropriate to treat this approach as optimal for statistical practice.

A compromise solution is the confidence interval based on inverting the approximately normal test that uses the null, rather than estimated, standard error; that is, its
endpoints are the $p_0$ solutions to the equations $(\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n} = \pm z_{\alpha/2}$. This confidence interval, apparently first discussed by Edwin B. Wilson (1927), has the form

$$\left(\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\left[\hat{p}(1 - \hat{p}) + z_{\alpha/2}^2/4n\right]/n}\right) \Big/ \left(1 + z_{\alpha/2}^2/n\right). \qquad (2)$$

This inversion of what is the score test for $p$ is called the score confidence interval. (Score tests, and in particular their standard errors, are based on the log likelihood at the null hypothesis value of the parameter, whereas Wald tests are based on the log likelihood at the maximum likelihood estimate; see, e.g., Agresti 1996, pp. 88-95.) This article shows that the score confidence interval tends to perform much better than the exact or Wald intervals in terms of having coverage probabilities close to the nominal confidence level. It can be recommended for use with nearly all sample sizes and parameter values. In addition, we show that a simple adaptation of the Wald interval also performs well even for small samples.

At first glance, the score confidence interval formula seems awkward to interpret, compared to (1). Letting $z = z_{\alpha/2}$, however, the midpoint of this interval is the weighted average

$$\hat{p}\left(\frac{n}{n + z^2}\right) + \frac{1}{2}\left(\frac{z^2}{n + z^2}\right),$$

which falls between $\hat{p}$ and 1/2, with the weight given to $\hat{p}$ approaching 1 asymptotically. This midpoint shrinks the sample proportion towards .5, the shrinking being less severe as $n$ increases. The coefficient of $z$ in the term that is added to and subtracted from this midpoint to form the score confidence interval has square equal to

$$\frac{1}{n + z^2}\left[\hat{p}(1 - \hat{p})\left(\frac{n}{n + z^2}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{z^2}{n + z^2}\right)\right].$$

This has the form of a weighted average of the variance of a sample proportion when $p = \hat{p}$ and the variance of a sample proportion when $p = 1/2$, using $n + z^2$ in place of the usual sample size $n$.

2. COMPARING ACTUAL COVERAGE PROBABILITIES TO NOMINAL CONFIDENCE LEVELS

For a fixed value of a parameter, the actual coverage probability of an interval estimator is the (a priori) probability that the interval contains that value. In many cases, such as with discrete distributions, this varies according to the parameter value. In statistical theory, the confidence coefficient is defined to be the infimum of such coverage probabilities for all possible values of that parameter. Most practitioners, however, probably interpret confidence coefficients in terms of "average performance" rather than "worst possible performance." Thus, a possibly more relevant description of performance is the long-run percentage of times that the procedure is correct when it is used repeatedly for a variety of data sets in various problems with possibly different parameter values.

For any confidence interval procedure for estimating $p$, the actual coverage probability at a fixed value of $p$ is

$$C_n(p) = \sum_{k=0}^{n} I(k, p)\binom{n}{k} p^k (1 - p)^{n-k},$$

where $I(k, p)$ equals 1 if the interval contains $p$ when $X = k$ and equals 0 if it does not contain $p$. We summarize this, using the alternative description of performance, by averaging over the possible values that $p$ can take. We obtained results $\bar{C}_n = \int_0^1 C_n(p)\,g(p)\,dp$ for three beta densities $g(p)$ for this averaging: (1) the uniform distribution (mean = .50, std. dev. = $1/\sqrt{12}$ = .29); (2) bell-shaped with values relatively near the middle (mean = .50, std. dev. = .10); (3) skewed with values relatively near 0 (mean = .10, std. dev. = .05) or, by symmetry, near 1. Due to space considerations, we report results here mainly for the first case, but similar results occurred in the other two cases. Though this evaluation may suggest a Bayesian approach to inference, we restrict attention in this article to comparing the three standard methods described previously, in which the user makes no assumption about such a distribution for $p$.

Table 1 shows the mean of the actual coverage probabilities for the uniform averaging of the parameter values (i.e., $\bar{C}_n$ with $g(p) = 1$, $0 < p < 1$) at various sample sizes, for nominal 95% Wald, score, and exact confidence intervals (the three other methods listed in that table are discussed

Table 1. Mean Coverage Probabilities of Nominal 95% Confidence Intervals for the Binomial Parameter p, with Root Mean Square Errors in Parentheses, for Sampling p from a Uniform Distribution

Method                        n = 5    n = 15   n = 30   n = 50   n = 100
Exact                         .990     .980     .973     .969     .965
                              (.041)   (.031)   (.026)   (.022)   (.017)
Score                         .955     .953     .952     .952     .951
                              (.029)   (.019)   (.014)   (.012)   (.008)
Wald                          .641     .819     .875     .901     .922
                              (.400)   (.238)   (.170)   (.133)   (.094)
Wald with t                   .664     .837     .886     .905     .926
                              (.391)   (.233)   (.167)   (.131)   (.093)
Mid-P                         .978     .964     .958     .955     .953
                              (.033)   (.021)   (.017)   (.013)   (.010)
Continuity-corrected Score    .987     .979     .973     .969     .965
                              (.039)   (.030)   (.025)   (.021)   (.016)
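
The mean coverage probabilities in the first three rows of Table 1 can be approximated with a few lines of code. The following Python fragment is a sketch under stated assumptions, not the computation used for the article: all function names are hypothetical, the Clopper-Pearson endpoints are found by bisection on the binomial tail sums rather than from F or beta quantiles, and the uniform average of $C_n(p)$ is approximated by a midpoint rule on a grid of 2000 points.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def wald(x, n, z):
    """Wald interval (1): p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def score(x, n, z):
    """Score (Wilson) interval (2)."""
    p_hat = x / n
    denom = 1 + z * z / n
    mid = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt((p_hat * (1 - p_hat) + z * z / (4 * n)) / n) / denom
    return mid - half, mid + half


def _bisect(f, lo=0.0, hi=1.0, iters=60):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact(x, n, alpha):
    """Clopper-Pearson interval: each binomial tail sum equals alpha / 2."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: sum(binom_pmf(k, n, p) for k in range(x, n + 1)) - alpha / 2)
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - sum(binom_pmf(k, n, p) for k in range(x + 1)))
    return lower, upper


def coverage(intervals, n, p):
    """C_n(p): probability of the outcomes k whose interval contains p."""
    return sum(binom_pmf(k, n, p)
               for k, (lo, hi) in enumerate(intervals) if lo <= p <= hi)


def mean_coverage(intervals, n, grid=2000):
    """Midpoint-rule approximation to the uniform average of C_n(p)."""
    return sum(coverage(intervals, n, (i + 0.5) / grid) for i in range(grid)) / grid


def mean_coverages(n, alpha=0.05):
    """Approximate mean coverage of the Wald, score, and exact intervals."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    methods = {
        "wald": [wald(x, n, z) for x in range(n + 1)],
        "score": [score(x, n, z) for x in range(n + 1)],
        "exact": [exact(x, n, alpha) for x in range(n + 1)],
    }
    return {name: mean_coverage(ivs, n) for name, ivs in methods.items()}


if __name__ == "__main__":
    for n in (5, 15, 30):
        print(n, {k: round(v, 3) for k, v in mean_coverages(n).items()})
```

Because the grid average is only an approximation, the printed values should be read as close to, rather than identical with, the entries of Table 1.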

in Section 4). The mean actual coverage probabilities for the Wald interval tend to be much too small. On the other hand, the exact interval is very conservative. For instance, for this method, $\bar{C}_n$ = .990 when $n = 5$, .980 when $n = 15$, and .973 when $n = 30$. By contrast, $\bar{C}_n$ for the score method is close to the nominal confidence level, even for $n = 5$ where it is .955. Figure 1, which plots $\bar{C}_n$ as a function of $n$ for the three interval estimators with the uniform and skewed beta weightings, illustrates their performance. Similar results were obtained with the bell-shaped weighting and using .90 nominal confidence coefficient, but are not reported here.

To describe how far actual coverage probabilities typically fall from the nominal confidence level, Table 1 also reports $\sqrt{\int_0^1 (C_n(p) - .95)^2\,dp}$, the uniform-weighted root mean squared error of those probabilities about that confidence level. These values indicate that the variability about the nominal level is much smaller for the score confidence interval than for the Wald or exact confidence intervals. The improved performance of the score method relative to the Wald method is no surprise and simply adds to other evidence of this type accumulated over the years (e.g., Ghosh 1979; Vollset 1993). Some readers, though, may be surprised at just how much better the score method does than the exact method. The exact interval remains quite conservative even for moderately large sample sizes when $p$ tends to be near 0 or 1. The Wald interval is also especially inadequate when $p$ is near 0 or 1, partly a consequence of using $\hat{p}$ as its midpoint when the binomial distribution is highly skewed.

Even though the score intervals tend to have considerably higher actual coverage probabilities than the Wald intervals, they are not necessarily wider. In fact, unless the sample proportions fall near 0 or 1, they are shorter. Direct comparison of the formulas for the two interval widths yields that the score interval is narrower than the Wald interval whenever $\hat{p}$ falls within $\sqrt{(n + z^2)/(8n + 4z^2)}$ of 1/2. In particular, since this term decreases in the limit toward $1/\sqrt{8}$ = .35 as $n$ increases or $|z|$ decreases, the score interval is narrower than the Wald interval whenever $\hat{p}$ falls in (.15, .85) for any $n$ and any nominal confidence level. See Ghosh (1979) for additional results about the relative lengths of the two types of interval. This comparison has limited relevance, since the actual coverage probabilities of the two methods differ. We mention this, however, to stress that the inadequacy of the Wald approach is not that the intervals are too short.

For fixed $n$ and $p$, the expected width of an interval estimator is a useful measure of its performance. Figure 2 illustrates the relative sizes of the expected widths for the nominal 95% Wald, score, and exact intervals by plotting them as a function of $p$, for $n = 15$. For small $n$, the score intervals tend to be much shorter than exact intervals. The narrowness of the Wald intervals as $p$ approaches 0 or 1 reflects the fact that when $x = 0$ or $n$, that interval is degenerate at 0 or at 1. By contrast, when $x = 0$, the score interval is $[0, z^2/(n + z^2)] = [0, 3.84/(n + 3.84)]$ and the exact interval is $[0, 1 - (.025)^{1/n}]$, which is approximately $[0, -\log(.025)/n] = [0, 3.69/n]$; the latter shows an extension of the "rule of 3/n" (Jovanovic and Levy 1997) from the .95 upper confidence bound to .95 confidence limits.

Is anything sacrificed by using the score intervals? Well, since they are not "exact," they are not guaranteed to have coverage probabilities uniformly bounded below by the nominal confidence level, and their actual confidence coefficient (the infimum of such probabilities) is, in fact, well below it. Vollset's (1993) plots of the coverage probabilities as a function of $p$, for various methods, are illuminating for

[Figure 1: two panels, (a) and (b), each plotting mean coverage probability (vertical axis, 0.4 to 1.0) against sample size n (horizontal axis, 0 to 100), with plotting symbols E, S, and W.]

Figure 1. Mean Coverage Probability as a Function of Sample Size for the Nominal 95% Exact (E), Score (S), and Wald (W) Intervals, When p has (a) a Uniform (0,1) Distribution and (b) a Beta Distribution with μ = .10 and σ = .05.
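
The width comparison and the zero-success endpoints discussed in the surrounding text lend themselves to a quick numerical check. The sketch below is illustrative only; the function names are hypothetical, and the half-width of the score interval is written in the algebraically equivalent form $z\sqrt{n\hat{p}(1-\hat{p}) + z^2/4}/(n + z^2)$, which is an assumption of this sketch about how to arrange formula (2).

```python
from math import log, sqrt
from statistics import NormalDist


def wald_halfwidth(p_hat, n, z):
    """Half-width of the Wald interval (1)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


def score_halfwidth(p_hat, n, z):
    """Half-width of the score interval (2), rearranged over (n + z^2)."""
    return z * sqrt(n * p_hat * (1 - p_hat) + z * z / 4) / (n + z * z)


def narrower_radius(n, z):
    """Distance from 1/2 within which the score interval is the narrower one."""
    return sqrt((n + z * z) / (8 * n + 4 * z * z))


def zero_success_upper_bounds(n, alpha=0.05):
    """Upper endpoints when x = 0: score, exact, and the -log(alpha/2)/n approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    score_upper = z * z / (n + z * z)
    exact_upper = 1 - (alpha / 2) ** (1 / n)
    approx_exact = -log(alpha / 2) / n
    return score_upper, exact_upper, approx_exact


if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.975)
    for n in (5, 15, 100, 10_000):
        print(n, round(narrower_radius(n, z), 4))
    print(zero_success_upper_bounds(30))
```

The radius printed for increasing n can be compared with the limiting value $1/\sqrt{8}$ quoted in the text.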

describing the behavior of the methods. The score method has two very narrow regions of values for $p$, one near 0 and one near 1, at which the actual coverage probability falls seriously below the nominal confidence level, and this badly affects the actual confidence coefficient. These regions get closer to 0 and to 1 as $n$ increases. For $n = 10$ with nominal 95% confidence intervals, for instance, there is a minimum coverage of .835 at $p = .018$ and $p = .982$, whereas at $n = 100$, there is a minimum coverage of .838 at $p = .002$ and $p = .998$.

[Figure 2: expected width (vertical axis, 0.0 to 0.6) plotted against p (horizontal axis, 0.0 to 1.0) for the Exact, Wald, and Score intervals.]

Figure 2. A Comparison of Expected Widths for the Nominal 95% Exact, Wald, and Score Intervals When n = 15.

We now explain why this happens. There is a region of values $[0, r)$ for $p$ that falls in the score confidence interval only when $X = 0$. The upper bound $r$ of this region is the lower endpoint of the confidence interval when $X = 1$, which for large $n$ is approximately $(1 + z^2/2 - z\sqrt{4 + z^2}/2)/n$. The coverage probability just below $r$ is approximately $P(X = 0) = [1 - (1 + z^2/2 - z\sqrt{4 + z^2}/2)/n]^n \approx \exp\{-(1 + z^2/2 - z\sqrt{4 + z^2}/2)\}$. The analogous remark applies for values of $p$ near 1. This limiting coverage probability is .800 for nominal 90% intervals, .838 for 95% intervals, and .889 for 99% intervals. See Huwang (1995) for related remarks. In particular, the actual confidence level does not converge to the nominal level as $n$ increases.

Though this may seem problematic, the portion of the [0, 1] parameter space over which the actual coverage probability drops seriously below the nominal confidence level is small. Table 2 illustrates. The proportion of the parameter space for which the coverage probability of the nominal 95% score interval falls below .90 is no more than .01 when $n > 20$. That table also shows that the proportion of parameter values for which the coverage probability is within .02 of .95 is much higher for the score than the exact interval. In fact, the score coverage probability is closer than the exact coverage probability to .95 over more than 90% of the parameter space, for the sample sizes reported.

Table 2. Proportion of Parameter Space for which (a) Nominal 95% Score Interval has Actual Coverage Probability Below .90; (b) Nominal 95% Score and Exact Intervals Have Actual Coverage Probabilities Between .93 and .97; (c) Actual Coverage Probability is Closer to .95 for Score Interval than Exact Interval

         Score coverage     Coverage .93-.97      Coverage closer to .95
  n      prob. below .90    Score      Exact      for Score than Exact
  5      .042               .463       .000       .944
  10     .019               .608       .077       .963
  20     .010               .792       .297       .925
  30     .006               .882       .395       .977
  50     .003               .939       .615       .961
  100    .002               .968       .830       .961

3. THE "ADD TWO SUCCESSES AND TWO FAILURES" ADJUSTED WALD INTERVAL

The poor performance of the Wald interval is unfortunate, since it is the simplest approach to present in elementary statistics courses. We strongly recommend that instructors present the score interval instead. Santner (1998) makes the same recommendation. Of course, many instructors will hesitate to present a formula such as (2) in elementary courses. The shrinkage representation of the score approach suggests, however, that for constructing 95% confidence intervals (for which $z^2 = 1.96^2 \approx 4$ and the midpoint of the score interval is $(X + z^2/2)/(n + z^2) \approx (X + 2)/(n + 4)$) an instructor will not go far wrong in giving the following advice: "Add two successes and two failures and then use the Wald formula (1)." That is, this "adjusted Wald" interval uses the usual simple formula presented in such courses, but with $(n + 4)$ trials and point estimate $\tilde{p} = (X + 2)/(n + 4)$.

The midpoint of this interval, $\tilde{p} = (X + 2)/(n + 4)$, is nearly identical to the midpoint of the 95% score interval. It is identical to the Bayes estimate (mean of the posterior distribution) for the beta prior distribution with parameters 2 and 2, which has mean .50 and standard deviation .224 and which shrinks the sample proportion toward .50 somewhat more than does the uniform prior. This simple adjustment to the ordinary Wald interval changes it from highly liberal to slightly conservative, on the average, and a bit more conservative than the score method. Figure 3 illustrates, showing the mean actual coverage probability $\bar{C}_n$ for the nominal 95% Wald and adjusted Wald intervals as a function of $n$, for the uniform and skewed weightings of $p$. The adjusted Wald confidence interval behaves surprisingly well, even for very small sample sizes.

[Figure 3: two panels, (a) and (b), each plotting mean coverage probability (vertical axis, 0.4 to 1.0) against sample size n (horizontal axis, 0 to 100), with plotting symbols W and A.]

Figure 3. Mean Coverage Probability as a Function of Sample Size for the Nominal 95% Wald (W) and Adjusted Wald (A) Intervals, When p has (a) a Uniform (0,1) Distribution and (b) a Beta Distribution with μ = .10 and σ = .05.

Figure 4 shows the actual coverage probabilities as a function of $p$ for the Wald, adjusted Wald, and Clopper-Pearson exact intervals when $n = 5$ and $n = 10$. The improvement of the adjusted Wald interval over the ordinary Wald interval is dramatic. The adjusted Wald interval also has the advantage, relative to the score interval, of not having spikes with seriously low coverage near $p = 0$ and 1. This is because this interval's rather crude bounds contain 0 when $X = 0$ or 1 and contain 1 when $X = n - 1$ or $n$. For

[Figure 4: six panels plotting coverage probability (vertical axis, 0.70 to 1.00) against p (horizontal axis, 0.0 to 1.0); rows are n = 5 and n = 10, columns are Wald, Adjusted Wald, and Exact.]

Figure 4. A Comparison of Coverage Probabilities for the Nominal 95% Wald, Adjusted Wald, and Exact Intervals.
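
The contrast between the score and adjusted Wald intervals near the boundary can be explored numerically. The following Python sketch is an illustration with hypothetical function names; it assumes that the minimum coverage over a fine grid of p values (step .0001) is an adequate stand-in for the infimum over the whole parameter space, which is only approximately true because coverage jumps at interval endpoints.

```python
from math import comb, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def score(x, n, z=Z95):
    """Score interval (2), written in terms of the count x."""
    mid = (x + z * z / 2) / (n + z * z)
    half = z * sqrt(x * (n - x) / n + z * z / 4) / (n + z * z)
    return mid - half, mid + half


def adjusted_wald(x, n, z=Z95):
    """Wald formula (1) applied after adding two successes and two failures."""
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


def coverage(intervals, n, p):
    """Binomial probability of the outcomes whose interval contains p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k, (lo, hi) in enumerate(intervals)
        if lo <= p <= hi
    )


def min_coverage(method, n, grid=10_000):
    """Smallest coverage over the grid p = 1/grid, 2/grid, ..., (grid - 1)/grid."""
    intervals = [method(x, n) for x in range(n + 1)]
    return min(coverage(intervals, n, i / grid) for i in range(1, grid))


if __name__ == "__main__":
    for name, method in (("score", score), ("adjusted Wald", adjusted_wald)):
        print(name, round(min_coverage(method, 10), 3))
```

For n = 10 the two grid minima should land near the values .835 and .917 quoted in the article for the score and adjusted Wald intervals, respectively.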

instance, the minimum coverage probability for the nominal 95% adjusted Wald interval is .917 for $n = 10$ and never falls below .92 for $n > 10$. The proportion of the parameter space for which the actual coverage probability falls within .02 of .95 is slightly less than reported in Table 2 for the score interval, but the proportion of times its actual coverage probability is closer to .95 than the exact interval is still at least .94 for the sample sizes reported in that table. See Chen (1990) for results about coverage properties of related intervals using Bayes estimates as midpoints.

Introductory statistics textbooks have an awkward time with sample size recommendations for the Wald interval. Most simple recommendations tend to be inadequate (Leemis and Trivedi 1996). Our results suggest that if one tells students to add two successes and two failures before they form the Wald 95% interval, it is not necessary to present such sample size rules, since the "add two successes and two failures" confidence interval behaves adequately for practical application for essentially any $n$ regardless of the value of $p$.

One can use the adjusted Wald interval without regarding its midpoint $\tilde{p} = (X + 2)/(n + 4)$ as the preferred point estimate of $p$. However, this rather strong shrinkage toward .5 might often provide a more appealing estimate than $\hat{p}$. The mean square error of $\tilde{p}$ equals $[np(1 - p) + 16(p - .5)^2]/(n + 4)^2$, which is smaller than that of $\hat{p}$ when $p$ is within $\sqrt{3n^2 + 8n + 4}/(6n + 4)$ of .5; this interval of values of $p$ decreases from (.113, .887) to (.211, .789) as $n$ increases. Interestingly, Wilson (1927) mentioned this shrinkage estimator as a reasonable alternative to the sample proportion or the Laplace estimator $(X + 1)/(n + 2)$. Letting $S$ denote $X$, the number of successes, Wilson stated, "As the distribution of chances of an observation is asymmetric, it is perhaps unfair to take the central value as the best estimate of the true probability; but this is what is actually done in practice. . . . Those who make the usual allowance of $2\sigma$ for drawing an inference would use $(S + 2)/(n + 4)$."

In recognition of his pioneering work, predating the famous articles by Neyman and Pearson on confidence intervals, we suggest that statisticians refer to $\tilde{p} = (X + 2)/(n + 4)$ as the Wilson point estimator of $p$ and refer to the score confidence interval for $p$ as the Wilson method. See Stigler (1997) for an interesting summary of Edwin B. Wilson's career. Other highlights included service as the first professor and head of the Department of Vital Statistics at Harvard School of Public Health in 1922, the Wilson-Hilferty normal approximation for the chi-squared distribution in 1931, and the Wilson-Worcester introduction of the median lethal dose (LD 50) in bioassay.

4. OTHER INTERVAL ESTIMATION METHODS FOR p

Although the focus of this article is comparison of the Wald, score, and exact intervals, which are the methods commonly presented in statistics textbooks, we next briefly discuss some alternative methods. Some elementary textbooks (e.g., Siegel 1988), perhaps recognizing the poor performance of the Wald intervals, suggest using ordinary $t$ confidence intervals for a mean for interval estimation of a proportion. These intervals are wider than the Wald intervals, of course, but we found that mean coverage probabilities are still seriously deficient. Table 1 illustrates for the uniform weighting.

Other, more complex, methods exist for constructing exact confidence intervals, such as presented by Blyth and Still (1983) and Duffy and Santner (1987). Our evaluations of these intervals indicated that they perform better than the Clopper-Pearson intervals but not as well as the score intervals, still showing considerable conservatism. To reduce the conservativeness inherent in exact methods for discrete distributions, many authors recommend using tests and confidence intervals based on the mid-P value, namely half the probability of the observed result plus the probability of more extreme results (Lancaster 1961). The mid-P confidence interval is the inversion of the adaptation of the exact test that uses the mid-P value. Results in Vollset (1993) suggest that the mid-P interval tends to perform well but is somewhat more conservative than the score interval, typically having actual coverage probability greater than (and

[Figure 5: three panels (Wald, Score, Exact), each plotting coverage probability (vertical axis, 0.70 to 1.00) against the Poisson mean (horizontal axis, 0 to 50).]

Figure 5. A Comparison of Coverage Probabilities for the Nominal 95% Wald, Score, and Exact Intervals for a Poisson Mean.
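
Coverage curves of the kind shown in Figure 5 can be approximated with a short computation. The Python sketch below is illustrative and uses hypothetical function names. It assumes the definitions given in Section 5 of the article: the score interval inverts $z = (X - \mu_0)/\sqrt{\mu_0}$, the Wald interval inverts $z = (X - \mu_0)/\sqrt{X}$, and the exact interval equates each null Poisson tail sum to .025. The exact endpoints are found here by bisection rather than from chi-squared quantiles, and the infinite sum over outcomes is truncated far in the upper tail.

```python
from math import exp, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def pois_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu), summed term by term."""
    if x < 0:
        return 0.0
    term = exp(-mu)
    total = term
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total


def wald(x, z=Z95):
    """Wald interval: x +/- z * sqrt(x)."""
    half = z * sqrt(x)
    return x - half, x + half


def score(x, z=Z95):
    """Score interval: the roots in mu of (x - mu)^2 = z^2 * mu."""
    mid = x + z * z / 2
    half = z * sqrt(x + z * z / 4)
    return mid - half, mid + half


def _bisect(f, lo, hi, iters=80):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact(x, alpha=0.05):
    """Exact interval: each Poisson tail probability equals alpha / 2."""
    top = x + 10 * sqrt(x + 1) + 10  # generous upper bracket for the search
    lower = 0.0 if x == 0 else _bisect(
        lambda m: (1 - pois_cdf(x - 1, m)) - alpha / 2, 0.0, top)
    upper = _bisect(lambda m: alpha / 2 - pois_cdf(x, m), 0.0, top)
    return lower, upper


def coverage(method, mu):
    """Probability that the interval produced by `method` contains mu."""
    x_max = int(mu + 12 * sqrt(mu) + 30)  # truncation point for the sum
    total, term = 0.0, exp(-mu)
    for x in range(x_max + 1):
        if x > 0:
            term *= mu / x
        lo, hi = method(x)
        if lo <= mu <= hi:
            total += term
    return total


if __name__ == "__main__":
    for mu in (1.0, 5.0, 20.0):
        print(mu, [round(coverage(m, mu), 3) for m in (wald, score, exact)])
```

Evaluating `coverage` over a fine grid of means between 0 and 50 would trace out curves comparable in shape to the three panels of the figure.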

never much less than) the nominal confidence level. Our evaluations agreed with this, and are also illustrated in Table 1. We feel this is a reasonable method to use, especially if one is concerned that $p$ may be very close to 0 or 1. It is more complex computationally than the score and adjusted Wald intervals, but like those intervals it has the advantage of being shorter than the exact interval.

Yet another alternative method is a continuity-corrected version of the score interval, based on the normal continuity correction for the binomial. This interval approximates the Clopper-Pearson interval, however, and our evaluations and results in Vollset (1993, Fig. 2) suggest that it is often as conservative as the exact interval itself. Again, Table 1 illustrates, and we do not recommend this approach.

Finally, we mention two other methods that perform well. The confidence interval based on inverting the likelihood-ratio test is similar to the score interval in terms of how it compares with the exact interval, but it is more complex to construct. Not surprisingly, Bayesian confidence intervals with beta priors that are only weakly informative also perform well in a frequentist sense (see, e.g., Carlin and Louis 1996, pp. 117-123).

In deciding whether to use the score interval, some may be bothered by its poor coverage for values of $p$ just below the lower boundary of the interval when $X = 1$ and just above the upper boundary of the interval when $X = n - 1$. One could then use an adapted version that replaces the lower endpoint by $-\log(1 - \alpha)/n$ when $X = 1$ and the upper endpoint by $1 + \log(1 - \alpha)/n$ when $X = n - 1$. (e.g., at $p = -\log(1 - \alpha)/n$, $P(X = 0) = [1 + \log(1 - \alpha)/n]^n \approx 1 - \alpha$.) This adaptation improves the minimum coverage considerably. For instance, the nominal 95% interval has minimum coverage probability converging to .895 for large $n$, which is the large-sample coverage probability at $p$ just below the lower endpoint of the interval when $X = 2$.

5. CONCLUSION AND EXTENSIONS

The Clopper-Pearson interval has coverage probabilities bounded below by the nominal confidence level, but the typical coverage probability is much higher than that level. The score and adjusted Wald intervals can have coverage probabilities lower than the nominal confidence level, yet the typical coverage probability is close to that level. In forming a 95% confidence interval, is it better to use an approach that guarantees that the actual coverage probabilities are at least .95 yet typically achieves coverage probabilities of about .98 or .99, or an approach giving narrower intervals for which the actual coverage probability could be less than .95 but is usually quite close to .95? For most applications, we would prefer the latter. The score and adjusted Wald confidence intervals for $p$ provide shorter intervals with actual coverage probability usually nearer the nominal confidence level. In particular, even though the score and adjusted Wald intervals leave something to be desired in terms of satisfying the usual technical definition of "95% confidence," the operational performance of those methods is better than the exact interval in terms of how most practitioners interpret that term.

Results similar to those in this article also hold in other discrete problems. For instance, similar comparisons apply for score, Wald, and exact confidence intervals for a Poisson parameter $\mu$, based on an observation $X$ from that distribution. Figure 5 illustrates, plotting the actual coverage probabilities when the nominal confidence level is .95. Here, the score interval for $\mu$ results from inverting the approximately normal test statistic $z = (X - \mu_0)/\sqrt{\mu_0}$, the Wald interval results from inverting $z = (X - \mu_0)/\sqrt{X}$, and the endpoints of the exact interval, $(1/2)(\chi^2_{2X,\,.025},\ \chi^2_{2(X+1),\,.975})$, result from equating tail sums of null Poisson probabilities to .025 (Garwood 1936; for $n$ independent Poisson observations $X_1, \ldots, X_n$, the same formulas apply if one lets $X = \sum X_i$ and $\mu = E(X) = nE(X_i)$). For another discrete example, see Mehta and Walsh (1992) for a comparison of exact with mid-P confidence intervals for odds ratios or for a common odds ratio in several 2 × 2 contingency tables.

Exact inference has an important place in statistical inference of discrete data, in particular for sparse contingency table problems for which large-sample chi-squared statistics are often unreliable. However, approximate results are sometimes more useful than exact results, because of the inherent conservativeness of exact methods.

[Received February 1997. Revised November 1997.]

REFERENCES

Agresti, A. (1996), An Introduction to Categorical Data Analysis, New York: Wiley.
Blyth, C. R., and Still, H. A. (1983), "Binomial Confidence Intervals," Journal of the American Statistical Association, 78, 108-116.
Böhning, D. (1994), "Better Approximate Confidence Intervals for a Binomial Parameter," Canadian Journal of Statistics, 22, 207-218.
Carlin, B. P., and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, London: Chapman and Hall.
Chen, H. (1990), "The Accuracy of Approximate Intervals for a Binomial Parameter," Journal of the American Statistical Association, 85, 514-518.
Clopper, C. J., and Pearson, E. S. (1934), "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26, 404-413.
Duffy, D. E., and Santner, T. J. (1987), "Confidence Intervals for a Binomial Parameter Based on Multistage Tests," Biometrics, 43, 81-93.
Garwood, F. (1936), "Fiducial Limits for the Poisson Distribution," Biometrika, 28, 437-442.
Ghosh, B. K. (1979), "A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter," Journal of the American Statistical Association, 74, 894-900.
Huwang, L. (1995), "A Note on the Accuracy of an Approximate Interval for the Binomial Parameter," Statistics & Probability Letters, 24, 177-180.
Jovanovic, B. D., and Levy, P. S. (1997), "A Look at the Rule of Three," The American Statistician, 51, 137-139.
Lancaster, H. O. (1961), "Significance Tests in Discrete Distributions," Journal of the American Statistical Association, 56, 223-234.
Laplace, P. S. (1812), Théorie Analytique des Probabilités, Paris: Courcier.
Leemis, L. M., and Trivedi, K. S. (1996), "A Comparison of Approximate Interval Estimators for the Bernoulli Parameter," The American Statistician, 50, 63-68.
Mehta, C. R., and Walsh, S. J. (1992), "Comparison of Exact, Mid-p, and Mantel-Haenszel Confidence Intervals for the Common Odds Ratio Across Several 2×2 Contingency Tables," The American Statistician, 46, 146-150.
Neyman, J. (1935), "On the Problem of Confidence Limits," Annals of Mathematical Statistics, 6, 111-116.
Santner, T. J. (1998), "A Note on Teaching Binomial Confidence Intervals," Teaching Statistics, 20, 20-23.
Santner, T. J., and Duffy, D. E. (1989), The Statistical Analysis of Discrete Data, Berlin: Springer-Verlag.
Siegel, A. F. (1988), Statistics and Data Analysis, New York: Wiley.
Stigler, S. M. (1997), "Edwin Bidwell Wilson," in Leading Personalities in Statistical Sciences, eds. N. L. Johnson and S. Kotz, New York: Wiley, pp. 344-346.
Vollset, S. E. (1993), "Confidence Intervals for a Binomial Proportion," Statistics in Medicine, 12, 809-824.
Wilson, E. B. (1927), "Probable Inference, the Law of Succession, and Statistical Inference," Journal of the American Statistical Association, 22, 209-212.
