
Chapter 26

Confidence Intervals
This chapter continues our study of estimating population parameters from random samples. In Chapter 25, Estimating Parameters from Simple Random Samples, we studied estimators that assign a number to each possible random sample, and the uncertainty of such estimators, measured by their RMSE. (The RMSE is the square root of the expected value of the squared difference between the estimator and the parameter: a measure of the typical size of the error.) Instead of assigning a single number to each sample and reporting the size of a typical error, the methods in this chapter assign an interval to each sample and report the confidence level that the interval contains the parameter. Confidence is a technical term related to probability. The RMSE of an estimator measures the long-run average size of the error in repeated sampling, but the error for any particular sample could be smaller or larger than the RMSE. In the same way, the confidence level is the long-run fraction of intervals that contain the parameter in repeated sampling, but the interval for any particular sample might or might not contain the parameter.
The statement "the interval [92%, 94%] contains the population percentage at confidence level 90%" does not mean that the probability that the population percentage is between 92% and 94% is 90%. (The event that the interval [92%, 94%] contains the population percentage is not random: Either the population percentage is between 92% and 94%, or it is not.) Rather, the statement means that if we were to take samples of size n repeatedly and compute a 90% confidence interval for the population percentage from each sample of size n, the long-run fraction of intervals that contain the population percentage would converge to 90%.
The length of the confidence interval and the confidence level measure how accurately we are able to estimate the parameter from a sample. If a short interval has high confidence, the data allow us to estimate the parameter accurately. Higher confidence generally requires a longer interval, ceteris paribus, and shorter intervals generally have lower confidence levels. Conventional values for the confidence level of confidence intervals include 68%, 90%, 95%, and 99%, but sometimes other values are used. It is crucial to know the confidence level associated with a confidence interval: The interval by itself is meaningless.

Conservative confidence intervals for percentages

In this section, we develop conservative confidence intervals for the population percentage based on the sample percentage, using Chebychev's inequality and an upper bound on the SD of lists that contain only the numbers 0 and 1. Conservative means that the chance that the procedure produces an interval that contains the population percentage is at least as large as claimed. (Later in this chapter we will consider approximate confidence intervals.)
Consider a 0-1 box of N tickets. The population percentage p is the fraction of tickets labeled "1":
$$p = 100\% \times (\#\text{ tickets in the population labeled ``1''})/N.$$
The population percentage is also the population mean of the numbers on all the tickets in the box, ave(box). The sample percentage $\hat{p}$ of a simple random sample (random sample without replacement) of size n from the population of N tickets is
$$\hat{p} = 100\% \times (\#\text{ tickets in the sample labeled ``1''})/n.$$
The sample percentage is the sample mean of the labels on the tickets in the sample. The expected value of the sample percentage is the population percentage p, and the SE of the sample percentage is
$$SE(\hat{p}) = f \times \frac{\sqrt{p(1-p)}}{\sqrt{n}} \le f \times \frac{50\%}{\sqrt{n}},$$
where f is the finite population correction
$$f = \sqrt{\frac{N-n}{N-1}}.$$
Thus $f \times 50\%/\sqrt{n}$ is an upper bound on the SE of the sample percentage.
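
The bound on the SE can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the box size (N = 1000, n = 50) are hypothetical, and percentages are written as fractions between 0 and 1.

```python
import math

def finite_population_correction(N, n):
    """f = sqrt((N - n)/(N - 1)) for a simple random sample of size n from N tickets."""
    return math.sqrt((N - n) / (N - 1))

def se_sample_percentage(p, N, n):
    """SE of the sample percentage for a simple random sample from a 0-1 box."""
    return finite_population_correction(N, n) * math.sqrt(p * (1 - p)) / math.sqrt(n)

def se_upper_bound(N, n):
    """Upper bound f * 50% / sqrt(n), which holds whatever the value of p."""
    return finite_population_correction(N, n) * 0.5 / math.sqrt(n)

# Hypothetical box: N = 1000 tickets, sample size n = 50.
N, n = 1000, 50
for p in (0.1, 0.3, 0.5, 0.9):
    print(p, se_sample_percentage(p, N, n), se_upper_bound(N, n))
```

The SE equals the bound only when p = 50%; for every other population percentage it is strictly smaller.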
Figure 26-1 shows what happens if we center an interval at the sample percentage, and extend the interval down and up from the sample percentage by twice the upper bound on the SE of the sample percentage. When the interval includes the population percentage, we say the interval covers the truth. The interval is random, because it is centered at the sample percentage, which is random. The chance that the random interval will contain the true population percentage is called the coverage probability of the interval. Take a few samples by clicking Take Sample to get the feel of the tool, then increase Samples to Take to 1000 and click Take Sample again. The actual percentage of intervals that cover will vary, but almost always it will be larger than 75%, sometimes nearly 100%. The empirical percentage of intervals that cover is an estimate of the coverage probability of the procedure. Vary the sample size and put a few different lists of zeros and ones into the Population box at the right of the figure, and try a few different sample sizes for each population. You should find that the fraction of intervals that cover the true population percentage stays above 75% (almost without fail), no matter what the population of zeros and ones is.

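Figure 26-1: Conservative Confidence Interval for the Population Percentage

[Interactive figure: samples are drawn without replacement from a 0-1 box that initially contains the tickets 1, 1, 0, 0, 0, 1 (Ave(Box): 0.5, SD(Box): 0.5). Controls: Sample Size (initially 3), Samples to take (initially 1), and Intervals: +/- 2 times the bound on the SE (0-1 box only). The horizontal axis runs from -0.1 to 1.0, and a label reports the percentage of intervals that cover.]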

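Why do these random intervals cover the true population percentage so often? We can show that they should using Chebychev's inequality. Because the SE of the sample percentage $\hat{p}$ satisfies
$$SE(\hat{p}) \le f \times 50\%/\sqrt{n},$$
the event
$$|\hat{p} - p| \le k \times SE(\hat{p})$$
is a subset of the event
$$|\hat{p} - p| \le k \times f \times 50\%/\sqrt{n}.$$
It follows that
$$P(|\hat{p} - p| \le k \times SE(\hat{p})) \le P(|\hat{p} - p| \le k \times f \times 50\%/\sqrt{n}).$$

Chebychev's inequality guarantees that the chance the sample percentage differs from its expected value p by more than k times its standard error is at most $1/k^2$, so
$$1 - 1/k^2 \le P(|\hat{p} - p| \le k \times SE(\hat{p})) \le P(|\hat{p} - p| \le k \times f \times 50\%/\sqrt{n}).$$
That is,
$$P(|\hat{p} - p| \le k \times f \times 50\%/\sqrt{n}) \ge 1 - 1/k^2.$$
Therefore, in the long run in repeated sampling, the fraction of trials in which the sample percentage is within $2 \times f \times 50\%/\sqrt{n}$ of the population percentage p converges to a number that is 75% or larger.
Whenever $\hat{p}$ is within $2 \times f \times 50\%/\sqrt{n}$ of the population percentage p, an interval centered at $\hat{p}$ extending down and up by $2 \times f \times 50\%/\sqrt{n}$ will contain p. That is, the interval
$$\hat{p} \pm 2 \times f \times 50\%/\sqrt{n},$$
which is shorthand for
$$[\hat{p} - 2 \times f \times 50\%/\sqrt{n},\ \hat{p} + 2 \times f \times 50\%/\sqrt{n}],$$
contains p at least 75% of the time, in the long run. Similarly, the fraction of trials in which $\hat{p}$ is within $3 \times f \times 50\%/\sqrt{n}$ of p converges to a number that is 88.89% or larger, so the long-run fraction of intervals $\hat{p} \pm 3 \times f \times 50\%/\sqrt{n}$ that contain p will be 88.89% or larger. The fraction of trials in which $\hat{p}$ is within $4 \times f \times 50\%/\sqrt{n}$ of p converges to a number that is 93.75% or larger, so the long-run fraction of intervals $\hat{p} \pm 4 \times f \times 50\%/\sqrt{n}$ that contain p will be 93.75% or larger, etc.
In general, if we go down and up from the sample percentage by $k \times f \times 50\%/\sqrt{n}$, then in the long run in repeated trials, the resulting intervals will include the true population percentage at least $1 - 1/k^2$ of the time.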
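
The long-run claim can be checked by simulation. The sketch below is illustrative only: the function name, seed, and number of trials are assumptions, and the box is the six-ticket box shown in Figure 26-1. It draws simple random samples, builds the interval of half-width k times the bound on the SE around each sample percentage, and reports the fraction of intervals that cover.

```python
import math
import random

def coverage_fraction(box, n, k, trials, seed=0):
    """Empirical fraction of conservative intervals that cover the population percentage."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    N = len(box)
    p = sum(box) / N                   # population percentage, as a fraction
    f = math.sqrt((N - n) / (N - 1))   # finite population correction
    half_width = k * f * 0.5 / math.sqrt(n)
    covered = 0
    for _ in range(trials):
        sample = rng.sample(box, n)    # simple random sample (without replacement)
        p_hat = sum(sample) / n
        if abs(p_hat - p) <= half_width:
            covered += 1
    return covered / trials

box = [1, 1, 0, 0, 0, 1]               # the box shown in Figure 26-1
for k in (2, 3, 4):
    print(k, coverage_fraction(box, n=3, k=k, trials=2000), ">=", 1 - 1 / k**2)
```

For this small box the empirical coverage should sit well above the guaranteed 1 - 1/k^2, which is what "conservative" means.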
Change the Intervals: value in Figure 26-1 to 3 and to 4 to confirm empirically that this is true.
The interval $\hat{p} \pm k \times f \times (50\%/\sqrt{n})$ is random: Its center depends on the sample percentage $\hat{p}$, which in turn depends on which units (here, tickets) happen to be in the random sample. The probability is in the random sampling procedure, not in the parameter. The parameter is the same, no matter what sample we happen to get: the parameter is a property of the population, not the sample. It is the interval that varies with the random sample. Before the data are collected, the coverage probability is the chance that sampling will result in an interval that contains the parameter.
Taking the sample determines the interval, leaving nothing to chance: The interval the procedure produced either does or does not contain the population percentage. (One could say that after collecting the data, the chance that the interval covers the parameter is either 0 or 100%.) Typically, we never learn whether the interval covers the parameter, but our ignorance is not a probability (at least, not according to the frequency theory of probability used in this book).
The interval the procedure gives for any particular set of data is called a confidence interval. The confidence level of a confidence interval is equal to the coverage probability of the procedure before the data are collected.

Confidence is a word statisticians reserve for this idea. If, before collecting the data, the procedure we are using has a P% chance of producing an interval that covers the true population percentage, then, after collecting the data, the interval the procedure produced is called a P% confidence interval.

Coverage Probability and Confidence Level
Consider a population parameter, and a procedure that produces random intervals. Suppose that the probability that the procedure produces an interval that contains the parameter is P%.
1. The procedure is said to have coverage probability P%.
2. The interval the procedure produces for any particular sample is called a P% confidence interval for the parameter, or a confidence interval for the parameter with confidence level P%.

In repeated sampling, about P% of confidence intervals with confidence level P% will contain (cover) the parameter. About (100 - P)% of the intervals will not cover the parameter. For any particular sample, unless the population parameter is known, we will not know whether the confidence interval covers the parameter.
Chapter 25, Estimating Parameters from Simple Random Samples, summarized the uncertainty of an estimate of a parameter by the mean squared error or root mean squared error of the estimator, which are measures of the average error of the estimator in repeated sampling. A confidence interval is a different way of expressing the uncertainty in an estimate: a range of values that contains the parameter with specified confidence level.
The interpretation of confidence level for a particular interval is analogous to the interpretation of RMSE for a particular value of the estimate: The RMSE is the square root of the long-run average squared error of the estimator in repeated sampling, but for any particular sample, the error could be larger or smaller than the RMSE, and we will not know which unless we know the true value of the parameter. The confidence level measures the long-run fraction of intervals that contain the parameter in repeated sampling, but for any particular sample, the confidence interval either will or will not contain the parameter, and we will not know which unless we know the true value of the parameter.
We can use the approach developed in this section to construct confidence intervals for the population percentage p with other nominal confidence levels, by extending the interval up and down from the sample percentage $\hat{p}$ by larger or smaller amounts. The longer the intervals, the larger the nominal confidence level: the larger the chance that an interval will contain p. The shorter the intervals, the smaller the chance that an interval will contain p. In particular, if we choose k so that
$$1 - 1/k^2 = P\%,$$
then the interval
$$[\hat{p} - k \times f \times 50\%/\sqrt{n},\ \hat{p} + k \times f \times 50\%/\sqrt{n}]$$
is a (nominal) P% confidence interval for the population percentage p.
Conversely, to get a nominal P% conservative confidence interval for the population percentage using a simple random sample, we should take an interval that extends down and up from the sample percentage by $k \times f \times 50\%/\sqrt{n}$, with
$$k = (1 - P/100)^{-1/2}.$$
The actual coverage probability of the interval
$$[\hat{p} - k \times f \times 50\%/\sqrt{n},\ \hat{p} + k \times f \times 50\%/\sqrt{n}]$$
is greater than $(1 - 1/k^2)$, for two reasons. First, the standard error of the sample percentage is less than $f \times (50\%/\sqrt{n})$ unless the population percentage p is 50%. Second, the distribution of the sample percentage is that of a hypergeometric random variable divided by the sample size, n, and such a distribution cannot attain the bound in Chebychev's inequality: Even for the true SE of the sample percentage,
$$SE(\hat{p}) = f \times \sqrt{p(1-p)}/\sqrt{n},$$
the chance that the sample percentage is within $k \times SE(\hat{p})$ of the population percentage p is greater than $1 - 1/k^2$:
$$P(|\hat{p} - p| < k \times SE(\hat{p})) > 1 - 1/k^2.$$
As a result, confidence intervals for the population percentage based on Chebychev's inequality and the upper bound of 50% for the SD of a list of zeros and ones are conservative: the actual confidence level is greater than the nominal confidence level, $(1 - 1/k^2)$. The next section develops a procedure that is not conservative, but that is approximate: The confidence level could be larger or smaller than the nominal level. (The nominal confidence level is close to the actual confidence level when the sample size n is large.)
A population percentage cannot be less than 0%. If the lower endpoint of a confidence interval for a population percentage is negative, it is completely legitimate to replace the lower endpoint by zero: It does not decrease the confidence level. Similarly, a population percentage cannot be greater than 100%. If the upper endpoint of a confidence interval for a population percentage is greater than 100%, it is legitimate to replace the upper endpoint by 100%. The confidence level remains the same. Similarly, if we are constructing a confidence interval for a quantity that cannot be negative (height, weight, or age, for instance), removing negative values from a confidence interval cannot reduce the coverage probability or confidence level.

Confidence Intervals for Restricted Parameters
If some values of a parameter are known to be impossible, excluding those values from a confidence interval does not reduce the confidence level of the confidence interval.
Conversely, including impossible values of a parameter in a confidence interval does not increase the confidence level.
For example, if a confidence interval for a parameter that must be positive has a lower endpoint that is negative, the lower endpoint can be replaced with zero. The confidence level remains the same.
In particular, if the lower endpoint of a confidence interval for a population percentage is negative, the lower endpoint can be replaced with zero. If the upper endpoint of a confidence interval for a population percentage is greater than 100%, the upper endpoint can be replaced with 100%.
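
A conservative interval with a chosen nominal confidence level, truncated to the possible range of a percentage, might be computed as in the sketch below. The function name and the numbers (N = 5000, n = 100, a sample percentage of 96%) are hypothetical and chosen only for illustration; percentages and confidence levels are written as fractions.

```python
import math

def conservative_interval(p_hat, n, N, conf):
    """Chebychev-based conservative interval for a population percentage.

    p_hat and conf are fractions between 0 and 1; the sample is assumed to be
    a simple random sample of size n from a 0-1 box of N tickets.
    """
    k = (1 - conf) ** -0.5                 # solves 1 - 1/k^2 = conf
    f = math.sqrt((N - n) / (N - 1))       # finite population correction
    half_width = k * f * 0.5 / math.sqrt(n)
    # Truncating to [0, 1] removes impossible values without lowering the confidence level.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical survey: 96% of a simple random sample of 100 out of 5000.
low, high = conservative_interval(0.96, n=100, N=5000, conf=0.75)
print(low, high)
```

Here the untruncated upper endpoint would exceed 100%, so it is replaced by 100%.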

Whenever you use a confidence interval, it is crucial to report the confidence level. Otherwise, it is impossible to interpret the result. The choice of the confidence level is essentially arbitrary, but the choice should be made before collecting the data. Common values of the confidence level are 68%, 90%, 95%, and 99%. There is a tradeoff between precision (the length of the confidence interval) and confidence level: Ceteris paribus, higher confidence levels require longer confidence intervals.
The following exercise checks your ability to compute a conservative confidence interval for the population percentage.

Exercise 26-1. The entering class at North Southcentral University contains 600 students. The dean's office seeks to determine the percentage of entering students who have credit cards. The dean's office will take a simple random sample of 40 entering students, interview them, and compute the sample percentage. The office would like to construct a conservative 75% confidence interval for the percentage of entering students who have credit cards. The center of the interval will be the sample percentage.
The interval should extend up and down from the sample percentage by ______
The sample is taken, and the sample percentage is observed to be 86%.
The lower endpoint of the confidence interval should be ______ and the upper endpoint should be ______
The probability that this interval contains the percentage of students in the entering class who have credit cards ______
The confidence level of this interval ______

Conservative confidence intervals for population means of bounded boxes
Recall that percentages are just means of special lists of numbers, lists that contain only zeros and ones. We can find confidence intervals for the means of more general lists of numbers, too.
In the previous section we exploited the fact that the SD of a 0-1 box is at most 1/2 to construct a conservative confidence interval for the population mean of a 0-1 box, that is, the population percentage. The approach can be used not only for 0-1 boxes, but whenever we can find a bound on the SD of the box, so that we can apply Chebychev's inequality. For any box of numbered tickets whatsoever, the sample mean of a simple random sample or random sample with replacement is an unbiased estimator of the population mean of the numbers on the tickets, and the SE of the sample mean is proportional to the SD of the box.
For instance, suppose we know that the numbers on the tickets in the box are all between a and b, with $a \le b$. Then SD(box) is at most $(b-a)/2$. In the special case that a = 0 and b = 1, this implies that the SD of a 0-1 box is at most 50%, as we have seen already.

That in turn implies that if all the numbers in a box are between a and b, the SE of the sample mean of a simple random sample of n draws from the box is at most $f \times (b-a)/(2\sqrt{n})$, where f is the finite population correction. And the SE of the sample mean of n draws with replacement from the box is at most $(b-a)/(2\sqrt{n})$.

Sampling from a Bounded Box
Suppose all the numbers in a box are between a and b, with $a \le b$. Then:
SD(box) is at most $(b-a)/2$.
The SE of the sample mean of n draws with replacement from the box is at most $(b-a)/(2\sqrt{n})$.
The SE of the sample mean of a simple random sample of size n from the box is at most $f \times (b-a)/(2\sqrt{n})$, where f is the finite population correction.

With a bound on the SE, we can use Chebychev's inequality the same way we did for the population percentage to get a confidence interval for the population mean of the numbers on the tickets in a box:

Conservative Confidence Intervals for the Population Mean of a Bounded List
Suppose all the numbers in a box are between a and b, where $a \le b$.
For a simple random sample of size n, the chance that the random interval
$$[(\text{sample mean}) - k \times f \times (b-a)/(2\sqrt{n}),\ (\text{sample mean}) + k \times f \times (b-a)/(2\sqrt{n})]$$
includes the mean of the numbers in the box is at least $1 - 1/k^2$, where f is the finite population correction $\sqrt{(N-n)/(N-1)}$, N is the population size, and n is the sample size.
For random sampling with replacement, the chance that the random interval
$$[(\text{sample mean}) - k \times (b-a)/(2\sqrt{n}),\ (\text{sample mean}) + k \times (b-a)/(2\sqrt{n})]$$
includes the mean of the numbers in the box is at least $1 - 1/k^2$.
In both cases, if the lower endpoint of the interval is less than a, it can be replaced by a, and if the upper endpoint of the interval is greater than b, it can be replaced by b.
These are conservative procedures for constructing confidence intervals: the probability that the intervals they produce cover the true population mean is greater than the probability they claim, $1 - 1/k^2$ (the nominal coverage probability).
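
The procedure in the box above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the text: the function name, the argument convention (N=None meaning sampling with replacement), and the data (hypothetical scores known to lie between 0 and 10) are invented for the example.

```python
import math

def bounded_mean_interval(sample, a, b, k, N=None):
    """Conservative interval for the mean of a box whose numbers lie in [a, b].

    If N is given, the sample is treated as a simple random sample from a box of
    N tickets; if N is None, as a random sample with replacement (f = 1).
    The chance that the procedure covers the population mean is at least 1 - 1/k^2.
    """
    n = len(sample)
    mean = sum(sample) / n
    f = 1.0 if N is None else math.sqrt((N - n) / (N - 1))
    half_width = k * f * (b - a) / (2 * math.sqrt(n))
    # Endpoints outside [a, b] are impossible values and can be trimmed.
    return max(a, mean - half_width), min(b, mean + half_width)

# Hypothetical data: nine scores drawn with replacement, each between 0 and 10.
scores = [7, 8, 6, 9, 10, 7, 8, 9, 5]
print(bounded_mean_interval(scores, a=0, b=10, k=2))   # nominal level at least 75%
```

With only nine draws the interval is wide, which reflects how little the bound (b - a)/2 assumes about the box.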

Approximate confidence intervals for percentages

Confidence intervals for the population percentage based on Chebychev's inequality and the upper bound of 50% for the SD of lists of zeros and ones are conservative: Their true confidence level is greater than their nominal confidence level, $(1 - 1/k^2)$. We could use shorter intervals and still have confidence level $(1 - 1/k^2)$, or we could claim a confidence level higher than $(1 - 1/k^2)$.
How much shorter could the interval be, or how large a confidence level could we claim? It is possible to figure these things out precisely, but we shall follow a standard approximate approach instead, one that we can extend to other situations. We shall use the Central Limit Theorem to develop a procedure that produces shorter confidence intervals for a given nominal confidence level. The new procedure will be approximate instead of conservative: the coverage probability will be close to the nominal coverage probability when the sample size is large, but could be smaller or larger depending on the population percentage, and could be quite different from the nominal coverage probability for small samples from pathological populations.
We shall assume throughout the rest of this chapter that either
the sample is drawn with replacement, or
the sample size n is much, much smaller than the population size N.
With this assumption, we can neglect the finite population correction and act as if the tickets in the sample were drawn independently. (See Chapter 22, Standard Error.) When the tickets are drawn independently, the Central Limit Theorem tells us that as the sample size grows, the normal curve is a better and better approximation to the probability histogram of the sample percentage (and to the probability histogram of the sample mean). The normal approximation to the probability that the sample percentage is in the interval
$$[p - 1.15 \times \sqrt{p(1-p)}/\sqrt{n},\ p + 1.15 \times \sqrt{p(1-p)}/\sqrt{n}]$$
is equal to the area under the normal curve for the corresponding range of values in standard units, $[-1.15, 1.15]$. The area under the normal curve between -1.15 and 1.15 is about 75%:

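[Interactive figure: normal curve with Lower endpoint: -1.15 and Upper endpoint: 1.15. Selected area: 74.99%.]

This is much larger than the bound of $(1 - 1/(1.15)^2) = 24.4\%$ that Chebychev's inequality gives. When the sample percentage $\hat{p}$ is within
$$1.15 \times \sqrt{p(1-p)}/\sqrt{n}$$
of p, p is within
$$1.15 \times \sqrt{p(1-p)}/\sqrt{n}$$
of the sample percentage, so the probability that the interval
$$I = [\hat{p} - 1.15 \times \sqrt{p(1-p)}/\sqrt{n},\ \hat{p} + 1.15 \times \sqrt{p(1-p)}/\sqrt{n}]$$
contains the population percentage p is about 75%: The coverage probability of I is approximately 75%.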
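
The two numbers being compared, the area under the normal curve within 1.15 standard units of zero and the Chebychev lower bound for k = 1.15, can be reproduced with Python's standard library. The function names below are illustrative only.

```python
from statistics import NormalDist

def normal_area(z):
    """Area under the standard normal curve between -z and z."""
    std_normal = NormalDist()          # mean 0, SD 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

def chebychev_bound(k):
    """Chebychev's lower bound on the chance of being within k SEs of the expected value."""
    return max(0.0, 1 - 1 / k**2)

print(f"normal area within 1.15 SE: {normal_area(1.15):.4f}")
print(f"Chebychev bound for k=1.15: {chebychev_bound(1.15):.4f}")
```

The normal area is roughly three times the Chebychev bound at this multiple, which is why the approximate intervals can be so much shorter for the same nominal level.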
Unfortunately, we cannot construct I from the sample alone: the sample determines the center of I, but to find the length of I we need to know $p(1-p)$, which is tantamount to knowing p. If we knew p, we would not be estimating it.
If the sample size n is large, the sample standard deviation s,
$$s = \sqrt{\frac{n}{n-1} \times \hat{p}(1-\hat{p})},$$
where $\hat{p}$ is the sample percentage, is likely to be close to the SD of the population; when that happens,
$$s/\sqrt{n}$$
is close to $SE(\hat{p})$, the standard error of the sample percentage. Therefore, if the sample size is large, but either the sample is small compared to the population or the sample is taken with replacement, the probability that the random interval
$$[\hat{p} - 1.15 \times s/\sqrt{n},\ \hat{p} + 1.15 \times s/\sqrt{n}]$$
contains the population percentage p is about 75%. This interval has not only a random center (the sample percentage), but also a random length (the length depends on the observed value of s, and s is random, because it depends on the random sample).
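
As a rough sketch of the approximate procedure, the code below computes the interval of half-width z times s/sqrt(n) around the sample percentage, with z taken from the normal curve. The function name and the data (120 ones in 400 draws) are hypothetical; the sample is assumed to be drawn with replacement or to be a small fraction of the population.

```python
import math
from statistics import NormalDist

def approximate_interval(x, n, conf):
    """Normal-approximation interval for a population percentage (as a fraction).

    x is the number of tickets labeled "1" among n draws; conf is the nominal
    confidence level as a fraction, for example 0.75.
    """
    p_hat = x / n
    s = math.sqrt(n / (n - 1) * p_hat * (1 - p_hat))   # sample SD of a 0-1 sample
    z = NormalDist().inv_cdf(0.5 + conf / 2)           # about 1.15 when conf = 0.75
    half_width = z * s / math.sqrt(n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical sample: 120 ones in 400 draws, nominal 75% confidence.
print(approximate_interval(120, 400, 0.75))
```

Unlike the conservative interval, the half-width here depends on the data through s, so both the center and the length change from sample to sample.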
Figure 26-2 lets you try the procedure yourself. Each time you click the Take Sample button, a sample is drawn with replacement from the numbers in the box on the right (initially set to a random list of zeros and ones). The sample size initially is set to 30. The controls at the bottom of the figure allow you to change the size of each sample, the number of samples that are taken each time you click the button, and the width of the interval, as a multiple of the estimated SE or the conservative bound on the SE. (The estimated SE is $s/\sqrt{n}$ because we are sampling with replacement; the bound is $0.5/\sqrt{n}$.) A label in the bottom right corner reports the fraction of intervals that cover the population percentage. Intervals that cover are green; those that do not cover are red. A small black dot marks the middle of each interval (the sample percentage). A blue vertical line marks the true population percentage p.

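Figure 26-2: Approximate confidence intervals for the population mean and percentage

[Interactive figure: samples are drawn with replacement from a box that initially contains the tickets 0, 0, 1, 0, 1 (Ave(Box): 0.4, SD(Box): 0.49). Controls: Sample Size (initially 30), Samples to take (initially 1), and Intervals: +/- 1.15 times the estimated SE. A label reports the percentage of intervals that cover.]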

Take a few samples to get the feel of the tool, then increase the Samples to take to 1000, and click the Take Sample button again. The actual percentage of intervals that cover will vary, but should be reasonably close to 75%. Increase Sample size to 200 and try again; the percentage of intervals that cover should be closer to 75%. Try putting a few different lists of zeros and ones into the Population box at the right of the figure, and try a few different sample sizes for each population. When the sample size is large, the fraction of intervals that cover the true population percentage will be very close to 75%.
The following exercises check your ability to compute conservative and approximate confidence intervals for the population percentage, and your ability to determine which method is more appropriate.



Exercise 26-2. I would like to know the fraction of UC Berkeley undergraduates who commute to school from their parents' homes. I send email to students with campus computer accounts until 100 have responded; 3 of the responders were commuters.
An approximate 95% confidence interval for the fraction of UC Berkeley undergraduates who commute to school is from ______ to ______

Exercise 26-3. I would like to know the fraction of homes in Alameda County, California, that have assessed values of $700,000 or more. I take a simple random sample of size 500 from the Alameda County property tax records (somehow). The sample percentage of homes assessed at $700,000 or more is 10%.
An approximate 96% confidence interval for the percentage of homes assessed at $700,000 or more is from ______ to ______

Exercise 26-4. A random sample with replacement of size 20 was taken from a box of tickets. Each ticket in the box is numbered either zero or one. Six of the tickets in the sample are labeled "1"; the rest are labeled "0."
The sample percentage is ______
The sample size ______ large enough to justify assuming that s is close to SD(box) and using a confidence interval based on the normal distribution.
The SE of the sample percentage is ______
A conservative 70% confidence interval for the population percentage is from ______ to ______


Exercise 26-5. A restaurateur plans to change the menu in her restaurant, which specializes in game meats. She is trying to decide whether or not to offer venison goulash on the new menu. Each day for a month, she picks people at random as they come into the restaurant, and asks them whether they would order venison goulash if it were offered. On busy days, she picks more people; on quiet days, she picks fewer people. Suppose that in effect she has a simple random sample of 160 people who eat at her restaurant. Suppose further that the number of diners is much, much larger than the sample. In the sample, 118 say they would order venison goulash if it were offered.
The sample percentage of diners who say they would order venison goulash is ______
The bootstrap estimate of the population standard deviation is ______
A 98% confidence interval for the percentage of diners who would say they would order venison goulash among the population of people who eat at that restaurant would go from ______ to ______

Approximate Confidence Intervals for the Population Mean

Suppose that we seek a confidence interval for the mean of a population (box) of numbers, based on a random sample from the population. The sample mean is an unbiased estimator of the population mean (E(sample mean) = Ave(box)), so it is reasonable to center a confidence interval at the sample mean. How wide should we make an interval centered at the sample mean, for the interval to have a specified probability of covering the population mean?
If we knew the SD of the population or had an upper bound on the SD of the population, we could use Chebychev's inequality to construct a conservative confidence interval for the population mean, as we did earlier in the chapter: the standard error of the sample mean is
$$SE(\text{sample mean}) = SD(\text{box})/\sqrt{n},$$
where n is the sample size. So, for example, the coverage probability of the random interval
$$[(\text{sample mean}) - 2 \times SD(\text{box})/\sqrt{n},\ (\text{sample mean}) + 2 \times SD(\text{box})/\sqrt{n}]$$
is at least 75%.
Typically, however, the SD of the population is not known, so we cannot construct this interval. Moreover, typically we cannot use the conservative approach based on Chebychev's inequality, because there is no upper bound on the SD of a general list of numbers analogous to the upper bound of 50% for the SD of lists that contain only zeros and ones. (As we have seen, if all the numbers are bounded between a and b, with $a \le b$, then $SD(\text{box}) \le (b-a)/2$, but typically we do not know such lower and upper bounds a and b.)
However, the approximate approach to constructing confidence intervals, based on the normal curve, works if the sample size is sufficiently large. The Central Limit Theorem tells us that the probability histogram of the average of n draws with replacement from a box follows the normal curve increasingly well as the number of draws n increases. We also know that the sample standard deviation s is increasingly likely to be an accurate estimate of the SD of the population as n increases. As a result, the probability that the sample mean is within $\pm z \times s/\sqrt{n}$ of the population mean is approximately the same as the area under the normal curve between $-z$ and $z$. For any fixed population (box), the approximation improves as the sample size n increases, for random sampling with replacement. Example 26-1 illustrates calculating an approximate confidence interval for the population mean. The example is dynamic: It will tend to change when you reload the page.

Example 26-1: Approximate Confidence Interval for the Population Mean
To assess scholastic performance, a state administers an achievement test to a simple random sample of 160 high school seniors. There are 40,000 high school seniors in the state. The mean score of the students who took the exam is 104.34 points, and the sample standard deviation of their scores is 13.9 points. Find an approximate 98% confidence interval for the average of the population scores that would have been obtained had every high school senior in the state been administered the achievement test.
Solution. The sample size (160) is a sufficiently small fraction of the population size (40,000) that treating the sample as if it were drawn with replacement is reasonable. The sample size is sufficiently large that the normal approximation to the distribution of the sample mean should be reasonably accurate, and that the sample standard deviation should be close to the standard deviation of the population. The area under the normal curve between $\pm 2.326$ is 98%:

[Interactive figure: normal curve with Lower endpoint: -2.326 and Upper endpoint: 2.326. Selected area: 98%.]

Thus, an approximate 98% confidence interval would be centered at the sample mean, and extend down and up from the sample mean by 2.326 times the estimated standard error of the sample mean. The estimated standard error of the sample mean is
$$13.9/\sqrt{160} = 1.099 \text{ points}.$$
The confidence interval thus should extend down and up from the sample mean by
$$2.326 \times 1.099 = 2.556 \text{ points},$$
so the confidence interval is
[101.784 points, 106.896 points]
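
The arithmetic in Example 26-1 can be reproduced in a few lines. This is a sketch for checking the numbers, not part of the original example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean = 104.34    # points
sample_sd = 13.9        # points
n = 160
conf = 0.98

z = NormalDist().inv_cdf(0.5 + conf / 2)   # normal multiplier, about 2.326
se_hat = sample_sd / math.sqrt(n)          # estimated SE of the sample mean, about 1.099
low = sample_mean - z * se_hat
high = sample_mean + z * se_hat
print(f"z = {z:.3f}, estimated SE = {se_hat:.3f}")
print(f"approximate 98% interval: [{low:.3f}, {high:.3f}]")
```

The finite population correction is ignored here, as in the example, because the sample is a small fraction of the 40,000 seniors.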
The following exercise checks your ability to calculate approximate confidence intervals for the population mean. The exercise is dynamic: The question will tend to change when you reload the page.

Exercise 26-6. To determine the average lifetime of their light-emitting diode (LED) light bulbs, a manufacturer takes a simple random sample of 110 bulbs from a manufacturing lot of 34,000 bulbs. The mean lifetime of the bulbs in the sample is 93.49 thousand hours, and the sample standard deviation of their lifetimes is 8.56 thousand hours.
An approximate 98% confidence interval for the average lifetime of the bulbs in the manufacturing lot would extend from ______ thousand hours (low) to ______ thousand hours (high).

Exact Confidence Intervals for Percentages
We have seen two methods for constructing confidence intervals for a population percentage: a conservative method based on Chebychev's inequality and a bound on SD(box), and an approximate method based on the normal approximation. Conservative means that the coverage probability is at least as high as claimed but could be substantially higher for some populations. Approximate means that the coverage probability is roughly as high as claimed but could be substantially lower (or substantially higher) for some populations. This section develops a third method, which is exact. Exact means that the probability that the random interval covers the true population percentage is just what it is claimed to be (depending on the value of p it can be a bit higher, simply because the binomial distribution is a discrete distribution).
These intervals are rather different from the confidence intervals presented earlier in this chapter, which were of the form (estimate $\pm$ uncertainty). Instead, each of the endpoints is computed from the data, separately. The resulting interval usually is not symmetric around the sample percentage.
We assume that a sample of size n is drawn at random with replacement from a 0-1 box. We want to find a confidence interval for p, the percentage of tickets labeled "1" in the box. Let X be the number of tickets in the sample that are labeled "1." If the true percentage of tickets labeled "1" in the 0-1 box is p, then X has a binomial probability distribution with parameters n and p. We will construct a confidence interval for p by looking at the values of p that are plausible, given the observed value of X. The approach is similar to the approach we took in Chapter 19, Probability Meets Data, and very closely related to hypothesis testing, discussed in Chapter 27, Hypothesis Testing: Does Chance Explain the Results?
Suppose the observed value of X is x. If p were very, very small (close to zero), it would be unlikely to see x or more ones in the sample unless x = 0. So seeing x ones in the sample is evidence that p is not too small. Conversely, if p were very, very large (close to one), it would be unlikely to see x or fewer ones in the sample unless x = n. So observing that X = x limits the plausible range of values of p.
Suppose we want a confidence interval for p with confidence level $1 - \alpha$. Let $p_-$ be the smallest value of q for which
$$\alpha/2 \le P(X \ge x \text{ if } p = q) = {}_nC_x\, q^x (1-q)^{n-x} + {}_nC_{x+1}\, q^{x+1} (1-q)^{n-x-1} + \cdots + {}_nC_n\, q^n (1-q)^0.$$
Similarly, let $p_+$ be the largest value of q for which
$$\alpha/2 \le P(X \le x \text{ if } p = q) = {}_nC_x\, q^x (1-q)^{n-x} + {}_nC_{x-1}\, q^{x-1} (1-q)^{n-x+1} + \cdots + {}_nC_0\, q^0 (1-q)^n.$$
Then the interval $[p_-, p_+]$ is a $1 - \alpha$ confidence interval for p. Intervals constructed this way can be much shorter than the conservative intervals based on Chebychev's inequality and the upper bound on SD(box), but they are still guaranteed to attain at least their nominal confidence level. Confidence intervals based on the normal approximation are generally not much shorter, but their actual confidence level can be substantially lower than their nominal confidence level.
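
The two endpoints can be found numerically by bisection, because P(X >= x) increases with q and P(X <= x) decreases with q. The sketch below is one possible implementation under that observation, not the text's own code; the function names and the tolerance are assumptions. (This construction is commonly known as the Clopper-Pearson interval.)

```python
import math

def upper_tail(x, n, q):
    """P(X >= x) when X has a binomial distribution with parameters n and q."""
    return sum(math.comb(n, j) * q**j * (1 - q)**(n - j) for j in range(x, n + 1))

def lower_tail(x, n, q):
    """P(X <= x) when X has a binomial distribution with parameters n and q."""
    return sum(math.comb(n, j) * q**j * (1 - q)**(n - j) for j in range(0, x + 1))

def exact_interval(x, n, alpha, tol=1e-10):
    """Endpoints [p_minus, p_plus] of the exact 1 - alpha interval, as fractions."""
    # p_minus: smallest q with P(X >= x if p = q) >= alpha/2 (zero when x = 0).
    p_minus = 0.0
    if x > 0:
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if upper_tail(x, n, mid) >= alpha / 2:
                hi = mid       # condition holds: the smallest such q is at or below mid
            else:
                lo = mid
        p_minus = hi
    # p_plus: largest q with P(X <= x if p = q) >= alpha/2 (one when x = n).
    p_plus = 1.0
    if x < n:
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if lower_tail(x, n, mid) >= alpha / 2:
                lo = mid       # condition holds: the largest such q is at or above mid
            else:
                hi = mid
        p_plus = lo
    return p_minus, p_plus

# Hypothetical data: 5 ones in 10 draws with replacement, confidence level 95%.
print(exact_interval(5, 10, alpha=0.05))
```

When x is not n/2 the two endpoints sit at different distances from the sample percentage, which is the asymmetry described above.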

Confidence Intervals for Population Percentiles

We can also use a random sample with replacement to find a confidence interval for a percentile of a population. We shall work out the details for the median; other percentiles can be treated similarly. Unlike the conservative and approximate confidence intervals, and like the exact confidence intervals for the population percentage we just saw, these intervals are not of the form (estimate $\pm$ uncertainty). Instead, the endpoints of the intervals are two of the data. And this approach also leads to exact confidence intervals: The nominal coverage probability is equal to the actual coverage probability.
To begin, suppose we have a random sample of size 10
$$\{X_1, X_2, \ldots, X_{10}\}$$
taken with replacement from a population with median m. Sort the data into increasing order: let $X_{(1)}$ be the smallest datum, $X_{(2)}$ be the second smallest, etc., and let $X_{(10)}$ be the largest datum. (The sorted data are called the order statistics.) Let $A_1$ be the event that the fourth smallest datum, $X_{(4)}$, is less than or equal to the median, and let $A_2$ be the event that the seventh smallest datum, $X_{(7)}$, is greater than or equal to the median. The event $A_1$ occurs unless 7 or more data are greater than the population median, so $A_1^c$ is the event that 7 or more data are greater than the population median. Similarly, the event $A_2$ occurs unless 7 or more data are less than the population median, so $A_2^c$ is the event that 7 or more data are less than the population median. Let $A = A_1 \cap A_2$ be the event that the fourth and seventh order statistics bracket the median. We shall find a lower bound on the probability of A.
Note that if seven or more data are less than the median, then it is not the case that seven or more data are greater than the median, so $A_1^c$ and $A_2^c$ are disjoint. Hence,
$$P(A^c) = P((A_1 \cap A_2)^c) = P(A_1^c \cup A_2^c) = P(A_1^c) + P(A_2^c),$$
and thus
$$P(A) = 1 - P(A^c) = 1 - P(A_1^c) - P(A_2^c).$$
We are done if we can find upper bounds for $P(A_1^c)$ and $P(A_2^c)$.
Recall that the median is the smallest number that at least 50% of the population are less than or equal to. It follows that the probability that a number drawn at random from the population is strictly less than the median is at most 50% (and possibly less), and that the probability that a number drawn at random from the population is strictly greater than the median is at most 50% (and possibly less). The data are drawn from the population independently, so the number of data that are less than the population median has a binomial probability distribution with n trials and $p \le 50\%$, as does the number of data that are greater than the population median.

Let Y be a random variable with a binomial distribution with parameters n = 10 and p = 50%. Thus $P(A_1^c) \le P(Y \ge 7)$, and $P(A_2^c) \le P(Y \ge 7)$. However, $P(Y \ge 7) = P(Y \le 3)$, so
$$P(A) \ge 1 - P(Y \le 3 \text{ or } Y \ge 7) = P(4 \le Y \le 6).$$
Thus the probability that the interval $[X_{(4)}, X_{(7)}]$ contains the population median is at least as large as the probability of observing 4, 5, or 6 successes in 10 independent trials with probability 50% of success in each trial, the highlighted area in Figure 26-3:

Figure 26-3: Binomial probability histogram

[Interactive figure: binomial probability histogram with n: 10 and p: 0.5, with controls for the endpoints of the selected area.]

Theintervalfromthefourthsmallestdatumtotheseventhsmallestdatumisthereforea65.6%
confidenceintervalforthepopulationmedian.
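
The 65.6% figure is the binomial probability P(4 <= Y <= 6) for n = 10 and p = 50%. The sketch below computes that probability for a general pair of order statistics, following the same argument; the function name and its argument convention are hypothetical.

```python
import math

def median_interval_confidence(n, i, j):
    """Lower bound on the chance that [X_(i), X_(j)] contains the population median.

    For n draws with replacement and i < j, the argument in the text gives
    P(i <= Y <= j - 1), where Y is binomial with parameters n and 50%.
    """
    return sum(math.comb(n, y) for y in range(i, j)) / 2**n

# Fourth and seventh order statistics of a sample of size 10.
print(median_interval_confidence(10, 4, 7))   # 0.65625, i.e. about 65.6%
```

Widening the interval to lower and higher order statistics raises the confidence level in discrete steps, so only certain confidence levels are attainable for a given sample size.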

The same idea can be used to find confidence intervals for other percentiles: The probability distribution of the number of data that are less than the $100q$th percentile is binomial with number of trials equal to the number of data, $n$, and probability of success at most $q$, and the probability distribution of the number of data that are greater than the $100q$th percentile is binomial with number of trials equal to the number of data, $n$, and probability of success at most $1 - q$.
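A sketch of how this generalization might be computed follows. It is an illustration under stated assumptions, not a procedure given in the text: the function names `binom_upper_tail` and `percentile_interval_confidence` are hypothetical, and the interval is assumed to run from the $j$th smallest datum to the $k$th smallest datum with $j \le k$. The interval misses the percentile on the low side only if at least $n - j + 1$ data are greater than the percentile, and on the high side only if at least $k$ data are less than it. Those two events are disjoint, so their binomial upper bounds can be subtracted from 1.

```python
from math import comb


def binom_upper_tail(n: int, p: float, m: int) -> float:
    """P(Y >= m) for Y binomial with n trials and success probability p."""
    return sum(
        comb(n, y) * p ** y * (1 - p) ** (n - y)
        for y in range(max(m, 0), n + 1)
    )


def percentile_interval_confidence(n: int, q: float, j: int, k: int) -> float:
    """Lower bound on the chance that [X(j), X(k)] covers the 100q-th percentile."""
    if not (1 <= j <= k <= n and 0 < q < 1):
        raise ValueError("need 1 <= j <= k <= n and 0 < q < 1")
    # X(j) exceeds the percentile only if at least n - j + 1 data are
    # greater than it; each datum is greater with chance at most 1 - q.
    miss_low = binom_upper_tail(n, 1 - q, n - j + 1)
    # X(k) is below the percentile only if at least k data are less
    # than it; each datum is less with chance at most q.
    miss_high = binom_upper_tail(n, q, k)
    return 1 - miss_low - miss_high


# Setting q = 0.5 recovers the median example: n = 10, [X(4), X(7)].
print(percentile_interval_confidence(10, 0.5, 4, 7))  # 0.65625
```

Using the largest allowed success probabilities ($q$ and $1 - q$) is what makes the result a lower bound: a binomial upper-tail probability cannot decrease as the success probability increases.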
The following exercise checks whether you can find a confidence interval for a population median.

Exercise 26-7. Consider finding a 96.5% confidence interval for the median of a population from a random sample with replacement of size 15.
The confidence interval should go from the ___ datum to the ___ datum.

Summary
Suppose we have a procedure for calculating an interval from every possible sample of size $n$ from a population of size $N$ (a box of $N$ numbered tickets). Let $t$ be a parameter of the population. Suppose that if the procedure is applied to a random sample of size $n$, the chance that the resulting interval will contain $t$ is $P$%. Then the interval that results from applying the procedure to any particular random sample of size $n$ is a $P$% confidence interval for $t$. Once the random sample has been drawn, the resulting interval either covers (contains) or does not cover $t$: the probability that the interval covers $t$ is either 0 or 100%. The probability that the interval will cover $t$ before the sample is drawn is called the confidence level of the interval after the sample is drawn. Confidence intervals provide an alternative to reporting a single "best estimate" of a parameter and a summary measure of the uncertainty of the estimate. It is possible to construct conservative confidence intervals for the population percentage from simple random samples or random samples with replacement from 0-1 boxes: For a simple random sample of size $n$, the chance that the random interval
$$[\phi - kf/(2\sqrt{n}),\ \phi + kf/(2\sqrt{n})]$$
covers the population percentage $p$ is at least $1 - 1/k^2$, where $\phi$ is the sample percentage, $f = \sqrt{(N-n)/(N-1)}$ is the finite population correction, $N$ is the population size, and $n$ is the sample size. For random sampling with replacement, the chance that the random interval
$$[\phi - k/(2\sqrt{n}),\ \phi + k/(2\sqrt{n})]$$
includes the population percentage $p$ is at least $1 - 1/k^2$. These are conservative procedures for constructing confidence intervals, because the probability that the intervals they produce cover the true population percentage $p$ (the actual coverage probability) is greater than the probability they claim, $1 - 1/k^2$ (the nominal coverage probability). These procedures can be extremely pessimistic, especially when the sample size $n$ is large and when the true population percentage $p$ is far from 50%: the intervals then are much wider than they need to be for the actual coverage probability to be $1 - 1/k^2$.
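The two conservative intervals differ only in the finite population correction, which a short computation makes concrete. The following is a minimal sketch, not a definitive implementation: the function name `conservative_interval` and its argument names are hypothetical, the sample percentage is passed as a fraction between 0 and 1, and the numbers in the usage lines are made up for illustration.

```python
from math import sqrt
from typing import Optional, Tuple


def conservative_interval(
    phi: float, n: int, k: float, N: Optional[int] = None
) -> Tuple[float, float, float]:
    """Return (low, high, nominal confidence) for the population percentage.

    phi is the sample percentage expressed as a fraction. Pass the
    population size N for a simple random sample (without replacement);
    leave N as None for a random sample with replacement.
    """
    if n < 1 or k <= 0:
        raise ValueError("need n >= 1 and k > 0")
    if N is None:
        f = 1.0  # no correction when sampling with replacement
    else:
        if not (n <= N and N > 1):
            raise ValueError("need n <= N and N > 1")
        f = sqrt((N - n) / (N - 1))  # finite population correction
    half_width = k * f / (2 * sqrt(n))
    return phi - half_width, phi + half_width, 1 - 1 / k ** 2


# Hypothetical numbers: sample percentage 93%, n = 400, k = 2.
print(conservative_interval(0.93, 400, 2))           # with replacement
print(conservative_interval(0.93, 400, 2, N=1000))   # simple random sample
```

With $k = 2$ the nominal confidence level is $1 - 1/4 = 75\%$. The simple random sample interval is narrower than the with-replacement interval because $f < 1$ whenever $n > 1$.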
Suppose that the random sample is drawn with replacement. When the sample size $n$ is large, the central limit theorem ensures that the probability histogram of the sample percentage can be approximated accurately by the normal curve. The expected value of the sample percentage is $p$ and the SE of the sample percentage is $\mathrm{SD(box)}/\sqrt{n}$, where SD(box) is the population SD, $\sqrt{p(1-p)}$, the SD of the list of numbers on the tickets in the box. When $n$ is large, the SD of the sample, $s^*$, tends to be an accurate estimate of SD(box), and the chance that the random interval
$$[\phi - zs^*/\sqrt{n},\ \phi + zs^*/\sqrt{n}]$$
contains $p$ is approximately equal to the area under the normal curve between $\pm z$. Taking $z = 1.96$, for example, gives approximate 95% confidence intervals. The coverage probability of this procedure typically is not exactly the area under the normal curve between $\pm z$, but as the sample size grows, the coverage probability approaches that area.
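As an illustration of the normal-approximation interval for a percentage, here is a minimal sketch in Python. The function name `normal_approx_interval` is hypothetical, and the counts in the usage line are made up. It relies on the fact that for a sample of zeros and ones with a fraction $\phi$ of ones, the SD of the sample is $s^* = \sqrt{\phi(1-\phi)}$, and it finds $z$ from the desired area under the normal curve with the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def normal_approx_interval(
    ones: int, n: int, confidence: float = 0.95
) -> Tuple[float, float]:
    """Approximate confidence interval for the population percentage.

    ones is the number of tickets labeled 1 in a sample of size n drawn
    with replacement. The result is in fractions, not percent.
    """
    if not (0 <= ones <= n and n >= 1 and 0 < confidence < 1):
        raise ValueError("need 0 <= ones <= n, n >= 1, 0 < confidence < 1")
    phi = ones / n                     # sample percentage as a fraction
    s_star = sqrt(phi * (1 - phi))     # SD of the sample of 0s and 1s
    # z such that the area under the normal curve between -z and z
    # equals the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z * s_star / sqrt(n)
    return phi - half_width, phi + half_width


# Hypothetical sample: 372 ones among 400 draws.
print(normal_approx_interval(372, 400))
```

The stated confidence level is only approximate, as the text notes: the actual coverage probability approaches the area under the normal curve as $n$ grows.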
Approximate confidence intervals for the population mean can be constructed similarly, but then it is more common to use
$$s = s^* \sqrt{n/(n-1)}$$
to estimate SD(box) than to use $s^*$. Let $M$ denote the sample mean. For random sampling with replacement, if the sample size $n$ is large, the chance that the random interval
$$[M - zs/\sqrt{n},\ M + zs/\sqrt{n}]$$
covers the population mean is approximately equal to the area under the normal curve between $\pm z$. Again, the coverage probability is not exactly the area under the normal curve between $\pm z$, but it approaches that area as the sample size grows.
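The relationship between $s$ and $s^*$, and the resulting interval for the mean, can be sketched as follows. This is an illustration under stated assumptions: the function name `mean_interval` is hypothetical and the data are synthetic, chosen only to exercise the arithmetic. In Python's standard library, `statistics.pstdev` divides by $n$ and so corresponds to $s^*$, while `statistics.stdev` divides by $n - 1$ and so corresponds to $s$.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev, stdev
from typing import Sequence, Tuple


def mean_interval(
    data: Sequence[float], confidence: float = 0.95
) -> Tuple[float, float]:
    """Approximate confidence interval for the population mean.

    Assumes the data are a large random sample drawn with replacement.
    """
    n = len(data)
    if n < 2 or not (0 < confidence < 1):
        raise ValueError("need at least 2 data and 0 < confidence < 1")
    m = mean(data)    # sample mean M
    s = stdev(data)   # sample standard deviation s (divisor n - 1)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z * s / sqrt(n)
    return m - half_width, m + half_width


# Synthetic data for illustration only: 500 values cycling through 0..9.
data = [i % 10 for i in range(500)]
n = len(data)

# s equals s* times sqrt(n / (n - 1)), up to rounding error.
print(stdev(data), pstdev(data) * sqrt(n / (n - 1)))
print(mean_interval(data))
```

For large $n$ the factor $\sqrt{n/(n-1)}$ is close to 1, so the choice between $s$ and $s^*$ changes the interval very little.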
Confidence intervals can be constructed for population parameters other than percentages and means. For example, one can construct confidence intervals for percentiles of a population using the fact that for random sampling with replacement, the number of data that are less than the $100q$th percentile has a binomial distribution with parameters $n$ and $p \le q$, and the number of data that are greater than the $100q$th percentile has a binomial distribution with parameters $n$ and $p \le 1 - q$.

Key Terms
approximate confidence interval
bootstrap estimate of the standard deviation
Chebychev's inequality
confidence interval
confidence level
conservative confidence interval
coverage probability
expected value
finite population correction f
normal approximation
normal curve
parameter
population mean
population percentage
population SD
probability
sample mean
sample percentage
sample standard deviation s
standard deviation (SD)
standard deviation of the sample s*
© 1997-2015. P.B. Stark. All rights reserved.
Last generated 5/29/2015, 8:04:49 PM. Content last modified 21 January 2013 08:37 PST.
