6: Learning about the Difference in Population Proportions
Part 1: Distribution for a Difference in Sample Proportions
The Independent Samples Scenario
Two samples are said to be independent samples when the measurements in one sample are
not related to the measurements in the other sample. Independent samples are generated in a
variety of ways. Some common ways:
- Random samples are taken separately from two populations and the same response
  variable is recorded for each individual.
- One random sample is taken and a variable is recorded for each individual, but then units
  are categorized as belonging to one population or another, e.g. male/female.
- Participants are randomly assigned to one of two treatment conditions, and the same
  response variable, such as weight loss, is recorded for each individual unit.
If the response variable is categorical, a researcher might compare two independent groups by
looking at the difference between the two proportions.
There are usually two questions of interest about a difference in two population proportions.
First, we want to estimate the value of the difference. Second, often we want to test the
hypothesis that the difference is 0, which would indicate that the two proportions are equal. In
either case, we will need to know about the sampling distribution for the difference in two
sample proportions (from independent samples).
Sampling Distribution for the Difference in Two Sample Proportions
Example: Driving Safely
Question of interest: How much of a difference is there between men and women with regard
to the proportion who have driven a car when they had too much alcohol to drive safely?
Study: Time magazine reported the results of a poll of adult Americans. One question asked
was: Have you ever driven a car when you probably had too much alcohol to drive safely?
Let $p_1$ be the population proportion of men who would respond yes.
Let $p_2$ be the population proportion of women who would respond yes.
We want to learn about $p_1$ and $p_2$ and how they compare to each other. We could estimate the
difference $p_1 - p_2$ with the corresponding difference in the sample proportions $\hat{p}_1 - \hat{p}_2$.
Will it be a good estimate? How close can we expect the difference in sample proportions to be
to the true difference in population proportions (on average)?
Imagine repeating the study many times, each time taking two independent random samples of
sizes $n_1$ and $n_2$, and computing the value of $\hat{p}_1 - \hat{p}_2$. What kind of values could you get for
$\hat{p}_1 - \hat{p}_2$? What would the distribution of the possible $\hat{p}_1 - \hat{p}_2$ values look like? What can we say
about the distribution of the difference in two sample proportions?
Using results about how to work with differences of independent random variables and recalling
the form of the sampling distribution for a sample proportion, the sampling distribution of the
difference in two sample proportions $\hat{p}_1 - \hat{p}_2$ can be determined.
First recall that when working with the difference in two independent random variables:
- the mean of the difference is just the difference in the two means
- the variance of the difference is the sum of the variances
Next, remember that the standard deviation of a sample proportion is $\sqrt{\dfrac{p(1-p)}{n}}$.
So what would the variance of a single sample proportion be? $\dfrac{p(1-p)}{n}$
So let's apply these ideas to our newest statistic of interest, the difference in two sample
proportions $\hat{p}_1 - \hat{p}_2$.
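As a sketch of how the two rules combine (a worked derivation added for illustration, assuming the two samples are independent and using the fact that the mean of a sample proportion is the population proportion):

```latex
\begin{aligned}
E(\hat{p}_1 - \hat{p}_2) &= E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2 \\
\mathrm{Var}(\hat{p}_1 - \hat{p}_2) &= \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)
  = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \\
\mathrm{s.d.}(\hat{p}_1 - \hat{p}_2) &= \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}
\end{aligned}
```

Note that the variances add even though the statistic is a difference; only the means subtract.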
Sampling Distribution of the Difference in Two (Independent) Sample Proportions
If the two sample proportions are based on independent random samples from two populations
and if all of the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$ are at least 10,
then the distribution for the possible $\hat{p}_1 - \hat{p}_2$ will be (approximately)
$N\left(p_1 - p_2,\ \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}\right)$
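One way to get a feel for this result is to simulate it. The sketch below is only an illustration: the population proportions and sample sizes are hypothetical values chosen for the example, and `simulate_differences` is a helper name introduced here, not something from the notes. It repeats the "take two independent samples" step many times and compares the simulated mean and standard deviation of the differences with the values the normal approximation predicts.

```python
import random
import statistics
from math import sqrt


def simulate_differences(p1, n1, p2, n2, reps=2000, seed=1):
    """Return `reps` simulated values of p1_hat - p2_hat from independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population values, chosen only for illustration.
p1, n1, p2, n2 = 0.35, 200, 0.25, 150
diffs = simulate_differences(p1, n1, p2, n2)

theory_mean = p1 - p2
theory_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean {statistics.mean(diffs):.4f}   theory {theory_mean:.4f}")
print(f"simulated s.d. {statistics.stdev(diffs):.4f}   theory {theory_sd:.4f}")
```

With these sample sizes all four success/failure counts are expected to be well above 10, so the simulated values should sit close to the theoretical ones; a histogram of `diffs` would look roughly bell-shaped.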
Since the population proportions $p_1$ and $p_2$ are not known, we will use the data to compute
the standard error of the difference in sample proportions.
Standard Error of the Difference in Sample Proportions
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
The standard error of $\hat{p}_1 - \hat{p}_2$ estimates, roughly, the average distance of the possible
$\hat{p}_1 - \hat{p}_2$ values from $p_1 - p_2$. The possible $\hat{p}_1 - \hat{p}_2$ values result from considering all possible
independent random samples of the same sizes from the same two populations.
Moreover, we can use this standard error to produce a range of values that we can be quite
confident will contain the difference in the population proportions $p_1 - p_2$:
$\hat{p}_1 - \hat{p}_2 \pm (\text{a few})\,\text{s.e.}(\hat{p}_1 - \hat{p}_2)$.
This is the basis for the confidence interval for the difference in population proportions discussed
next in Part 2.
If we are interested in testing hypotheses about the difference in the population rates, we will
need to construct a null standard error of the difference in the sample proportions and use it to
compute a standardized test statistic. That test statistic will have the following basic form:
$\dfrac{\text{Sample statistic} - \text{Null value}}{\text{(Null) standard error}}$
This is the basis for the hypothesis testing about the difference in population proportions covered
in Part 3 of this section of notes.
Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra problem or summary completed in lecture, create your own summary about these concepts.
Stat 250 Gunderson Lecture Notes
6: Learning about the Difference in Population Proportions
Part 2: Confidence Interval for a Difference in Population Proportions
We have two populations from which independent samples are available (or one population for
which two groups can be formed using a categorical variable). The response variable is also
categorical and we are interested in comparing the proportions for the two populations.
Let $p_1$ be the population proportion for the first population.
Let $p_2$ be the population proportion for the second population.
Parameter: the difference in the population proportions $p_1 - p_2$.
Sample estimate: the difference in the sample proportions $\hat{p}_1 - \hat{p}_2$.
Standard error: $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
So we have our estimate of the difference in the two population proportions, namely $\hat{p}_1 - \hat{p}_2$,
and we have its standard error. To make our confidence interval, we need to know the multiplier.
Sample Estimate ± Multiplier × Standard error
As in the case for estimating one population proportion, we assume the sample sizes are
sufficiently large so the multiplier will be a z* value found from using the standard normal
distribution.
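Two Independent Samples z Confidence Interval for $p_1 - p_2$
$\hat{p}_1 - \hat{p}_2 \pm z^{*}\,\text{s.e.}(\hat{p}_1 - \hat{p}_2)$
where
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
and z* is the appropriate multiplier from the N(0,1) distribution.
This interval requires that the sample proportions are based on independent random samples
from the two populations.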
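The interval formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `two_prop_ci` and the counts in the usage lines are hypothetical, and the z* multiplier is taken from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf_level=0.95):
    """z confidence interval for p1 - p2 from two independent samples.

    x1, x2 are the numbers of "yes" responses; n1, n2 are the sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves (1 - conf_level) / 2 in each tail of the N(0,1) distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    estimate = p1_hat - p2_hat
    return estimate - z_star * se, estimate + z_star * se


# Hypothetical counts, for illustration only: 60 of 150 versus 45 of 150.
lower, upper = two_prop_ci(60, 150, 45, 150, conf_level=0.95)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

The function does not check the sample-size condition, so the counts of successes and failures in each sample should be checked separately before relying on the interval.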
Try It! Do Older People Snore More than Younger?
Researchers at the National Sleep Foundation were interested in comparing the proportion of
people who snore for two age populations (1 = older adults defined as over 50 years old and
2 = younger adults defined as between 18 and 30 years old). The following data was obtained
from adults who participated in a sleep lab study.
Group                                                 Snore? Yes    Snore? No    Total
1 = older adults (over 50 years old)                  168           312          480
2 = younger adults (between 18 and 30 years old)      45            135          180
Let $p_2$ represent the population proportion of all younger adults who snore. Provide an estimate
for this population proportion $p_2$. Include the appropriate symbol.
$\hat{p}_2 = 45/180 = 0.25$
We wish to provide a 90% confidence interval to estimate the difference in snoring rates for the
two populations of adults. One of the conditions for that confidence interval to be
valid involves having two independent random samples, which is reasonable from the design of
the study. Validate the remaining assumption.
We need to have at least 10 who do snore and at least 10 who do not snore in each of our two
samples. Here we have 168 and 312 for group 1 and 45 and 135 for group 2, so all four of these
counts are at least 10.
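The same check can be written as a short script. This is only a sketch: `counts_ok` is a helper name introduced here for illustration, and the counts are the ones from the snoring table.

```python
def counts_ok(successes, n, minimum=10):
    """True when both the 'yes' count and the 'no' count reach the minimum."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# (number who snore, sample size) for each group in the snoring study
groups = {
    "older adults": (168, 480),
    "younger adults": (45, 180),
}

for name, (x, n) in groups.items():
    print(f"{name}: yes = {x}, no = {n - x}, condition met? {counts_ok(x, n)}")

all_ok = all(counts_ok(x, n) for x, n in groups.values())
print("All four counts at least 10?", all_ok)
```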
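Provide the 90% confidence interval and give an interpretation of this interval in context.
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\dfrac{0.35(0.65)}{480} + \dfrac{0.25(0.75)}{180}} = 0.0389$
$\hat{p}_1 - \hat{p}_2 \pm z^{*}\,\text{s.e.}(\hat{p}_1 - \hat{p}_2) = (0.35 - 0.25) \pm 1.645(0.0389) = 0.10 \pm 0.064$, which gives $(0.036,\ 0.164)$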
Interpret this interval.
With 90% confidence we estimate the difference in snoring rates for the two populations
of adults to be somewhere between ___3.6%___ and ___16.4%___.
What value do you notice is not in this interval? ___0___
Does there appear to be a significant difference between
the population rates of snoring for older versus younger adults?
Yes    No
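As a cross-check on the hand calculation, the snoring interval can be recomputed directly from the table counts. The sketch below assumes Python's standard-library `statistics.NormalDist` for the z* multiplier; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the snoring table
x1, n1 = 168, 480   # older adults who snore, sample size
x2, n2 = 45, 180    # younger adults who snore, sample size

p1_hat = x1 / n1    # 0.35
p2_hat = x2 / n2    # 0.25

se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# A 90% interval leaves 5% in each tail, so z* is the 95th percentile of N(0,1).
z_star = NormalDist().inv_cdf(0.95)

lower = (p1_hat - p2_hat) - z_star * se
upper = (p1_hat - p2_hat) + z_star * se
contains_zero = lower <= 0 <= upper

print(f"s.e. = {se:.4f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
print("0 inside the interval?", contains_zero)
```

The results should agree with the values worked out by hand: a standard error of about 0.0389 and an interval from about 3.6% to 16.4%, which does not contain 0.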
Part 3: Testing about a Difference in Population Proportions
Testing Hypotheses about the Difference in Two Population Proportions
We have two populations from which independent samples are available (or one population for
which two groups can be formed using a categorical variable). The response variable is also
categorical and we are interested in comparing the proportions for the two populations.
Let $p_1$ be the population proportion for the first population.
Let $p_2$ be the population proportion for the second population.
Parameter: the difference in the population proportions $p_1 - p_2$.
Sample estimate: the difference in the sample proportions $\hat{p}_1 - \hat{p}_2$.
Standard deviation of $\hat{p}_1 - \hat{p}_2$: $\text{s.d.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
Recall that the multiplier in the confidence interval was a z* value. So we will be computing a
Z test statistic for performing a significance test.
The standard error used in constructing the confidence interval for the difference between two
population proportions is not the same as that used for the standardized z test statistic.
We will need to construct the null standard error, the standard error for the statistic when the
null hypothesis is true. Let's start with what the hypotheses will look like.
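Possible null and alternative hypotheses.
1. $H_0$: $p_1 = p_2$ (or $p_1 - p_2 = 0$) versus $H_a$: $p_1 \neq p_2$
2. $H_0$: $p_1 = p_2$ (or $p_1 - p_2 = 0$) versus $H_a$: $p_1 > p_2$
3. $H_0$: $p_1 = p_2$ (or $p_1 - p_2 = 0$) versus $H_a$: $p_1 < p_2$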
Next we need to determine the test statistic and understand the conditions required for the test
to be valid. The general form of the test statistic is:
Test statistic = $\dfrac{\text{Sample statistic} - \text{Null value}}{\text{Standard error}}$
In the case of two population proportions, if the null hypothesis is true, we have $p_1 - p_2 = 0$ or
that the two population proportions are the same, $p_1 = p_2 = p$. What is a reasonable way to
estimate the common population proportion $p$?
$\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
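It may help to see why this weighted average is natural. Writing $x_1$ and $x_2$ for the numbers of "yes" responses in the two samples (notation introduced here for illustration, not used elsewhere in these notes), each product $n_i\hat{p}_i$ is just the count $x_i$, so the pooled estimate is the overall proportion of "yes" responses when the two samples are combined:

```latex
\hat{p} \;=\; \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}
        \;=\; \frac{x_1 + x_2}{n_1 + n_2},
\qquad \text{since } n_1\hat{p}_1 = x_1 \text{ and } n_2\hat{p}_2 = x_2 .
```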
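The general standard error for $\hat{p}_1 - \hat{p}_2$ is given by:
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
but if the null hypothesis is true, then $\hat{p}$ is the best estimate for each population proportion
and should be used in the standard error.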
So, the null standard error for $\hat{p}_1 - \hat{p}_2$ is given by:
$\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
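The step from the general standard error to the null standard error is a substitution followed by factoring. As a sketch, replacing both $\hat{p}_1$ and $\hat{p}_2$ with the common estimate $\hat{p}$ gives:

```latex
\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}
\;=\;
\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
```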
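And the corresponding test statistic is:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
If the null hypothesis is true, this z statistic will have a ___N(0,1)___ distribution. This
distribution is used to find the p-value for the test.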
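To make the pieces concrete, here is a minimal sketch of the pooled z test covering all three forms of the alternative hypothesis. The function name `two_prop_z_test`, the `alternative` labels, and the counts in the usage lines are assumptions made for this illustration; the p-value is computed with the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z test of H0: p1 - p2 = 0. Returns (z, p_value)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    null_se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / null_se

    std_normal = NormalDist()  # the N(0,1) distribution
    if alternative == "greater":        # Ha: p1 > p2, area to the right of z
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: p1 < p2, area to the left of z
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":    # Ha: p1 != p2, area in both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


# Hypothetical counts, for illustration only: 60 of 150 versus 45 of 150.
z, p_value = two_prop_z_test(60, 150, 45, 150, alternative="two-sided")
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

The only difference between the three alternatives is which tail area of the N(0,1) curve is reported; the z statistic itself is the same in every case.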
Conditions: This test requires that the sample proportions are based on independent random
samples from the two populations. Also, all of the quantities $n_1\hat{p}$, $n_1(1-\hat{p})$, $n_2\hat{p}$, and $n_2(1-\hat{p})$
should preferably be at least 10. Note these are checked with the estimate of the common population
proportion $\hat{p}$.
Try It! Taking More Pictures with Cell
Cell phones can now be used for many purposes besides making calls. An initial study found that
more than 75% of young adults (defined as 18-25 years old) use their cell phones for taking
pictures at least 2 times per week. This study also suggested that the proportion of young women
in this age group who use their cell phone to take pictures is higher than that for young men in
this age group. A follow-up study was conducted to investigate this conjecture. The researchers
wish to use a 5% significance level.
State the hypotheses: $H_0$: $p_1 - p_2 = 0$ versus $H_a$: $p_1 - p_2 > 0$ where
$p_1$ represents the population proportion of all young women 18-25 years old who report using
their cell phone to take pictures at least 2 times per week, and
$p_2$ represents the population proportion of all young men 18-25 years old who report using their
cell phone to take pictures at least 2 times per week.
Here are the results:
Age group = 18-25 year olds    Number who report using phone to       Sample Size    Percent
                               take pictures at least 2 times/week
Young Women                    417                                    521            80%
Young Men                      369                                    492            75%
We can assume these samples are independent random samples. Verify the remaining condition
necessary to conduct the Z test.
All of the quantities $n_1\hat{p}$, $n_1(1-\hat{p})$, $n_2\hat{p}$, and $n_2(1-\hat{p})$ should preferably be at least 10. Note we need
to find the estimate of the common population proportion $\hat{p}$ to do this check.
$\hat{p} = \dfrac{417 + 369}{521 + 492} = 0.7759$
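With the pooled estimate in hand, the four quantities in the condition can be computed directly. The short script below is a sketch using the counts from the results table; the variable names are chosen here for illustration.

```python
# Counts from the results table
x1, n1 = 417, 521   # young women: number reporting, sample size
x2, n2 = 369, 492   # young men: number reporting, sample size

p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common population proportion

checks = {
    "n1 * p": n1 * p_pool,
    "n1 * (1 - p)": n1 * (1 - p_pool),
    "n2 * p": n2 * p_pool,
    "n2 * (1 - p)": n2 * (1 - p_pool),
}

for label, value in checks.items():
    print(f"{label:>13} = {value:6.1f}   at least 10? {value >= 10}")

condition_met = all(value >= 10 for value in checks.values())
print("Condition met?", condition_met)
```

All four quantities come out far above 10 here, so the normal approximation for the test statistic is reasonable.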
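Conduct the test.
$\hat{p}_1 = \dfrac{417}{521} = 0.8004$ and $\hat{p}_2 = \dfrac{369}{492} = 0.75$
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.8004 - 0.75}{\sqrt{0.7759(1 - 0.7759)\left(\dfrac{1}{521} + \dfrac{1}{492}\right)}} = \dfrac{0.0504}{0.0262} = 1.92$
p-value = $P(Z \ge 1.92)$ under the N(0,1) distribution = 0.0273 (less than 0.05), so we reject $H_0$.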
Using a 5% significance level, which is the appropriate conclusion?
There is sufficient evidence to demonstrate that the population proportion of all young
women 18-25 years old who take pictures with their phone at least twice per week is
greater than that of the population of all young men 18-25 years old.
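The whole test can be replayed in a few lines as a check on the arithmetic. This is a sketch using the counts from the results table and the standard library's `statistics.NormalDist` for the upper-tail area; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the results table
x1, n1 = 417, 521   # young women
x2, n2 = 369, 492   # young men
alpha = 0.05        # significance level chosen by the researchers

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

null_se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / null_se

# Ha: p1 - p2 > 0, so the p-value is the area to the right of z under N(0,1).
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}, pooled = {p_pool:.4f}")
print(f"null s.e. = {null_se:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0 at the 5% level?", reject_h0)
```

The output should agree with the hand calculation (z of about 1.92 and a p-value of about 0.027), which supports rejecting the null hypothesis at the 5% level.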