06 Learning About Difference in Proportions

Stat 250 Gunderson Lecture Notes
6: Learning about the Difference in Population Proportions
Part 1: Distribution for a Difference in Sample Proportions
The Independent Samples Scenario
Two samples are said to be independent samples when the measurements in one sample are not related to the measurements in the other sample. Independent samples are generated in a variety of ways. Some common ways:
- Random samples are taken separately from two populations and the same response variable is recorded for each individual.
- One random sample is taken and a variable is recorded for each individual, but then units are categorized as belonging to one population or another, e.g. male/female.
- Participants are randomly assigned to one of two treatment conditions, and the same response variable, such as weight loss, is recorded for each individual unit.
If the response variable is categorical, a researcher might compare two independent groups by looking at the difference between the two proportions.
There are usually two questions of interest about a difference in two population proportions. First, we want to estimate the value of the difference. Second, often we want to test the hypothesis that the difference is 0, which would indicate that the two proportions are equal. In either case, we will need to know about the sampling distribution for the difference in two sample proportions (from independent samples).
Sampling Distribution for the Difference in Two Sample Proportions
Example: Driving Safely
Question of interest: How much of a difference is there between men and women with regard to the proportion who have driven a car when they had too much alcohol to drive safely?
Study: Time magazine reported the results of a poll of adult Americans. One question asked was: Have you ever driven a car when you probably had too much alcohol to drive safely?
Let $p_1$ be the population proportion of men who would respond yes.
Let $p_2$ be the population proportion of women who would respond yes.
We want to learn about $p_1$ and $p_2$ and how they compare to each other. We could estimate the difference $p_1 - p_2$ with the corresponding difference in the sample proportions $\hat{p}_1 - \hat{p}_2$.
Will it be a good estimate? How close can we expect the difference in sample proportions to be to the true difference in population proportions (on average)?

Imagine repeating the study many times, each time taking two independent random samples of sizes $n_1$ and $n_2$, and computing the value of $\hat{p}_1 - \hat{p}_2$. What kind of values could you get for $\hat{p}_1 - \hat{p}_2$? What would the distribution of the possible $\hat{p}_1 - \hat{p}_2$ values look like? What can we say about the distribution of the difference in two sample proportions?
Using results about how to work with differences of independent random variables and recalling the form of the sampling distribution for a sample proportion, the sampling distribution of the difference in two sample proportions $\hat{p}_1 - \hat{p}_2$ can be determined.
First recall that when working with the difference in two independent random variables:
- the mean of the difference is just the difference in the two means
- the variance of the difference is the sum of the variances
Next, remember that the standard deviation of a sample proportion is $\sqrt{\frac{p(1-p)}{n}}$.
So what would the variance of a single sample proportion be? $\frac{p(1-p)}{n}$
So let's apply these ideas to our newest statistic of interest, the difference in two sample proportions $\hat{p}_1 - \hat{p}_2$.
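Written out as a worked step, the two rules for differences of independent random variables combine with the mean and variance of a single sample proportion as follows. This is a sketch of the reasoning only; it assumes the two samples are independent, so no covariance term appears.

```latex
\begin{align*}
\mathrm{E}(\hat{p}_1 - \hat{p}_2) &= \mathrm{E}(\hat{p}_1) - \mathrm{E}(\hat{p}_2) = p_1 - p_2 \\
\mathrm{Var}(\hat{p}_1 - \hat{p}_2) &= \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)
  = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \\
\mathrm{s.d.}(\hat{p}_1 - \hat{p}_2) &= \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}
\end{align*}
```

Note that the variances add even though the statistics are subtracted; the standard deviations themselves do not add.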
Sampling Distribution of the Difference in Two (Independent) Sample Proportions
If the two sample proportions are based on independent random samples from two populations and if all of the quantities $n_1 p_1$, $n_1(1-p_1)$, $n_2 p_2$, and $n_2(1-p_2)$ are at least 10, then the distribution for the possible $\hat{p}_1 - \hat{p}_2$ will be (approximately)
$$N\left(p_1 - p_2,\ \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$$
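One way to see this result is to mimic "repeating the study many times" with a small simulation. The sketch below is illustrative only: the population proportions and sample sizes (`P1`, `N1`, `P2`, `N2`) are hypothetical values chosen for the demonstration, not data from the Time poll, and the function names are our own.

```python
import math
import random


def simulate_differences(p1, n1, p2, n2, reps, seed=0):
    """Return `reps` simulated values of p1_hat - p2_hat from independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count the "yes" responses in each of the two independent samples.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


def theoretical_sd(p1, n1, p2, n2):
    """Standard deviation of p1_hat - p2_hat from the sampling distribution result."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population values, chosen only for illustration.
P1, N1, P2, N2 = 0.30, 200, 0.20, 150

diffs = simulate_differences(P1, N1, P2, N2, reps=2000)
sim_mean = sum(diffs) / len(diffs)
sim_sd = math.sqrt(sum((d - sim_mean) ** 2 for d in diffs) / (len(diffs) - 1))

print(f"simulated mean {sim_mean:.4f} vs p1 - p2 = {P1 - P2:.4f}")
print(f"simulated s.d. {sim_sd:.4f} vs formula {theoretical_sd(P1, N1, P2, N2):.4f}")
```

The simulated mean and standard deviation should land close to $p_1 - p_2$ and the formula value, and a histogram of `diffs` would look roughly bell-shaped.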
Since the population proportions $p_1$ and $p_2$ are not known, we will use the data to compute the standard error of the difference in sample proportions.
Standard Error of the Difference in Sample Proportions
$$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
The standard error of $\hat{p}_1 - \hat{p}_2$ estimates, roughly, the average distance of the possible $\hat{p}_1 - \hat{p}_2$ values from $p_1 - p_2$. The possible $\hat{p}_1 - \hat{p}_2$ values result from considering all possible independent random samples of the same sizes from the same two populations.
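As a small computational sketch, the standard error can be wrapped in a helper function. The name `se_diff` and the input values below are illustrative assumptions, not part of the notes.

```python
import math


def se_diff(p1_hat, n1, p2_hat, n2):
    """Standard error of p1_hat - p2_hat for two independent samples."""
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)


# Illustrative values: both sample proportions 0.5, both sample sizes 100.
print(round(se_diff(0.5, 100, 0.5, 100), 4))  # 0.0707
```

Increasing either sample size shrinks the corresponding term under the square root, so the standard error decreases as the samples get larger.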
Moreover, we can use this standard error to produce a range of values that we can be quite confident will contain the difference in the population proportions $p_1 - p_2$:
$$\hat{p}_1 - \hat{p}_2 \pm (\text{a few}) \times \text{s.e.}(\hat{p}_1 - \hat{p}_2)$$
This is the basis for the confidence interval for the difference in population proportions discussed next in Part 2.
If we are interested in testing hypotheses about the difference in the population rates, we will need to construct a null standard error of the difference in the sample proportions and use it to compute a standardized test statistic. That test statistic will have the following basic form:
$$\frac{\text{Sample statistic} - \text{Null value}}{\text{(Null) standard error}}$$
This is the basis for the hypothesis testing about the difference in population proportions covered in Part 3 of this section of notes.
Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write out an extra problem or summary completed in lecture, create your own summary about these concepts.
Part 2: Confidence Interval for a Difference in Population Proportions
We have two populations from which independent samples are available (or one population for which two groups are formed using a categorical variable). The response variable is also categorical and we are interested in comparing the proportions for the two populations.
Let $p_1$ be the population proportion for the first population.
Let $p_2$ be the population proportion for the second population.
Parameter: the difference in the population proportions $p_1 - p_2$.
Sample estimate: the difference in the sample proportions $\hat{p}_1 - \hat{p}_2$.
Standard error: $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
So we have our estimate of the difference in the two population proportions, namely $\hat{p}_1 - \hat{p}_2$, and we have its standard error. To make our confidence interval, we need to know the multiplier.
Sample Estimate ± Multiplier × Standard error
As in the case for estimating one population proportion, we assume the sample sizes are sufficiently large so the multiplier will be a $z^*$ value found using the standard normal distribution.
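Two Independent Samples z Confidence Interval for $p_1 - p_2$
$$\hat{p}_1 - \hat{p}_2 \pm z^* \, \text{s.e.}(\hat{p}_1 - \hat{p}_2)$$
where
$$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
and $z^*$ is the appropriate multiplier from the N(0,1) distribution.
This interval requires that the sample proportions are based on independent random samples from the two populations.
Also, all of the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$ should preferably be at least 10.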
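The interval formula can be sketched as a short function. This is a minimal illustration, not a prescribed method from the notes: the function name `two_prop_z_interval` and the example counts (60 of 100 versus 40 of 100) are hypothetical, and the multiplier $z^*$ is taken from Python's standard-library `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Two independent samples z confidence interval for p1 - p2.

    x1, x2 are the counts of "successes"; n1, n2 are the sample sizes.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* cuts off (1 - confidence)/2 in each tail of the N(0,1) distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts, for illustration only.
lower, upper = two_prop_z_interval(60, 100, 40, 100, confidence=0.95)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

The function does not check the large-sample condition; in practice the four counts should each be at least 10 before the interval is trusted.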
Try It! Do Older People Snore More than Younger?
Researchers at the National Sleep Foundation were interested in comparing the proportion of people who snore for two age populations (1 = older adults, defined as over 50 years old, and 2 = younger adults, defined as between 18 and 30 years old). The following data were obtained from adults who participated in a sleep lab study.
Group                                             Snore? Yes   Snore? No   Total
1 = older adults (over 50 years old)                     168         312     480
2 = younger adults (between 18 and 30 years old)          45         135     180
Let $p_2$ represent the population proportion of all younger adults who snore. Provide an estimate for this population proportion $p_2$. Include the appropriate symbol.
$\hat{p}_2 = 45/180 = 0.25$
We wish to provide a 90% confidence interval to estimate the difference in snoring rates for the two populations of adults. One of the conditions for that confidence interval to be valid involves having two independent random samples, which is reasonable from the design of the study. Validate the remaining assumption.
We need to have at least 10 who do snore and at least 10 who do not snore in each of our two samples. Here we have 168 and 312 for group 1 and 45 and 135 for group 2, so all four of these counts are at least 10.
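The same check can be scripted. In this sketch the dictionary labels and the name `MINIMUM` are our own; the four counts are the ones from the snoring table. Checking the observed counts is equivalent to checking $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$, since for example $n_1\hat{p}_1 = 480(0.35) = 168$.

```python
# Observed counts from the sleep lab study table.
counts = {
    "older, snore": 168,
    "older, do not snore": 312,
    "younger, snore": 45,
    "younger, do not snore": 135,
}
MINIMUM = 10  # each count should preferably be at least this large

for label, count in counts.items():
    status = "ok" if count >= MINIMUM else "too small"
    print(f"{label}: {count} ({status})")

condition_met = all(count >= MINIMUM for count in counts.values())
print("Large-sample condition met:", condition_met)
```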
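Provide the 90% confidence interval and give an interpretation of this interval in context.
$\hat{p}_1 = 168/480 = 0.35$ and $\hat{p}_2 = 0.25$
$$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.35(0.65)}{480} + \frac{0.25(0.75)}{180}} = 0.0389$$
$$\hat{p}_1 - \hat{p}_2 \pm z^* \, \text{s.e.}(\hat{p}_1 - \hat{p}_2) = (0.35 - 0.25) \pm 1.645(0.0389) = 0.10 \pm 0.064$$
This gives the interval (0.036, 0.164), or 3.6% to 16.4%.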
Interpret this interval.
With 90% confidence we estimate the difference in snoring rates for the two populations of adults (older minus younger) to be somewhere between ___3.6%____ and ___16.4%___.
What value do you notice is not in this interval? ___0____
Does there appear to be a significant difference between the population rates of snoring for older versus younger adults?
Yes     No
(Yes, since 0 is not in the interval.)
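The snoring calculation can be reproduced in a few lines. This is a sketch for checking the arithmetic: the variable names are our own, and $z^*$ comes from the standard library's `statistics.NormalDist` instead of a table, so the multiplier is 1.6449 rather than the rounded 1.645.

```python
import math
from statistics import NormalDist

# Counts from the sleep lab study: group 1 = older adults, group 2 = younger adults.
x1, n1 = 168, 480
x2, n2 = 45, 180

p1_hat, p2_hat = x1 / n1, x2 / n2
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# A 90% interval leaves 5% in each tail, so z* is the 95th percentile of N(0,1).
z_star = NormalDist().inv_cdf(0.95)
margin = z_star * se
lower = (p1_hat - p2_hat) - margin
upper = (p1_hat - p2_hat) + margin

print(f"s.e. = {se:.4f}, margin of error = {margin:.3f}")
print(f"90% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

The interval lies entirely above 0, which matches the conclusion that the snoring rates appear to differ.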
Part 3: Testing about a Difference in Population Proportions
Testing Hypotheses about the Difference in Two Population Proportions
We have two populations from which independent samples are available (or one population for which two groups can be formed using a categorical variable). The response variable is also categorical and we are interested in comparing the proportions for the two populations.
Let $p_1$ be the population proportion for the first population.
Let $p_2$ be the population proportion for the second population.
Parameter: the difference in the population proportions $p_1 - p_2$.
Sample estimate: the difference in the sample proportions $\hat{p}_1 - \hat{p}_2$.
Standard deviation of $\hat{p}_1 - \hat{p}_2$: $\text{s.d.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Recall that the multiplier in the confidence interval was a $z^*$ value. So we will be computing a Z test statistic for performing a significance test.
The standard error used in constructing the confidence interval for the difference between two population proportions is not the same as that used for the standardized z test statistic.
We will need to construct the null standard error, the standard error for the statistic when the null hypothesis is true. Let's start with what the hypotheses will look like.
Possible null and alternative hypotheses.
1. $H_0: p_1 = p_2$ (or $p_1 - p_2 = 0$) versus $H_a: p_1 \neq p_2$
2. $H_0: p_1 = p_2$ (or $p_1 - p_2 = 0$) versus $H_a: p_1 > p_2$
3. $H_0: p_1 = p_2$ (or $p_1 - p_2 = 0$) versus $H_a: p_1 < p_2$
Next we need to determine the test statistic and understand the conditions required for the test to be valid. The general form of the test statistic is:
$$\text{Test statistic} = \frac{\text{Sample statistic} - \text{Null value}}{\text{Standard error}}$$
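In the case of two population proportions, if the null hypothesis is true, we have $p_1 - p_2 = 0$ or that the two population proportions are the same, $p_1 = p_2 = p$. What is a reasonable way to estimate the common population proportion $p$?
$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$
That is, the total number of successes in the two samples divided by the combined sample size.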
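A quick sketch shows that this weighted average of the two sample proportions is the same as pooling the counts. The function name `pooled_proportion` and the example counts are hypothetical, chosen only to illustrate the identity.

```python
import math


def pooled_proportion(x1, n1, x2, n2):
    """Estimate of the common proportion p: total successes over total sample size."""
    return (x1 + x2) / (n1 + n2)


# Hypothetical counts: 30 successes out of 100, and 50 successes out of 300.
x1, n1, x2, n2 = 30, 100, 50, 300
p1_hat, p2_hat = x1 / n1, x2 / n2

pooled = pooled_proportion(x1, n1, x2, n2)
weighted = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

print(f"pooled = {pooled:.4f}, weighted average = {weighted:.4f}")
print("same value:", math.isclose(pooled, weighted))
```

Because the larger sample gets more weight, the pooled estimate (0.20 here) sits closer to the second sample proportion than to the first.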
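The general standard error for $\hat{p}_1 - \hat{p}_2$ is given by:
$$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
but if the null hypothesis is true, then $\hat{p}$ is the best estimate for each population proportion and should be used in the standard error.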
So, the null standard error for $\hat{p}_1 - \hat{p}_2$ is given by:
$$\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
And the corresponding test statistic is:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
If the null hypothesis is true, this z statistic will have a _____N(0,1)______ distribution. This distribution is used to find the p-value for the test.
Conditions: This test requires that the sample proportions are based on independent random samples from the two populations. Also, all of the quantities $n_1\hat{p}$, $n_1(1-\hat{p})$, $n_2\hat{p}$, and $n_2(1-\hat{p})$ should preferably be at least 10. Note these are checked with the estimate of the common population proportion $\hat{p}$.
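Putting the pieces together, the test can be sketched as one function that handles all three possible alternative hypotheses. This is an illustration under stated assumptions: the function name `two_prop_z_test`, the `alternative` labels, and the example counts are our own, and the p-value is computed with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """z test of H0: p1 = p2 using the null (pooled) standard error.

    alternative is "two-sided" (p1 != p2), "greater" (p1 > p2), or "less" (p1 < p2).
    Returns the z statistic and the p-value.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    null_se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / null_se

    std_normal = NormalDist()  # N(0,1)
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


# Hypothetical counts, for illustration only: 60 of 100 versus 40 of 100.
z, p_two_sided = two_prop_z_test(60, 100, 40, 100)
print(f"z = {z:.4f}, two-sided p-value = {p_two_sided:.4f}")
```

The one-sided p-values are areas in a single tail, while the two-sided p-value doubles the tail area beyond $|z|$.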
Try It! Taking More Pictures with Cell
Cell phones can now be used for many purposes besides making calls. An initial study found that more than 75% of young adults (defined as 18-25 years old) use their cell phones for taking pictures at least 2 times per week. This study also suggested that the proportion of young women in this age group who use their cell phone to take pictures is higher than that for young men in this age group. A follow-up study was conducted to investigate this conjecture. The researchers wish to use a 5% significance level.
State the hypotheses: $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$, where
$p_1$ represents the population proportion of all young women 18-25 years old who report using their cell phone to take pictures at least 2 times per week, and
$p_2$ represents the population proportion of all young men 18-25 years old who report using their cell phone to take pictures at least 2 times per week.
Here are the results:
Age group = 18-25 year olds                                              Young Women   Young Men
Number who report using phone to take pictures at least 2 times/week             417         369
Sample Size                                                                      521         492
Percent                                                                          80%         75%
We can assume these samples are independent random samples. Verify the remaining condition necessary to conduct the Z test.
All of the quantities $n_1\hat{p}$, $n_1(1-\hat{p})$, $n_2\hat{p}$, and $n_2(1-\hat{p})$ should preferably be at least 10. Note we need to find the estimate of the common population proportion $\hat{p}$ to do this check.
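$$\hat{p} = \frac{417 + 369}{521 + 492} = 0.7759$$
With this estimate, the four quantities are about 404, 117, 382, and 110, so all are at least 10.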
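The condition check can be scripted with the pooled estimate. This is a sketch with variable names of our own choosing; the counts are those reported in the results table.

```python
# Counts from the follow-up study: group 1 = young women, group 2 = young men.
x1, n1 = 417, 521
x2, n2 = 369, 492

# Estimate of the common population proportion under H0.
p_pool = (x1 + x2) / (n1 + n2)

expected_counts = {
    "n1 * p_pool": n1 * p_pool,
    "n1 * (1 - p_pool)": n1 * (1 - p_pool),
    "n2 * p_pool": n2 * p_pool,
    "n2 * (1 - p_pool)": n2 * (1 - p_pool),
}

print(f"pooled estimate = {p_pool:.4f}")
for label, value in expected_counts.items():
    print(f"{label} = {value:.1f}")

condition_met = all(value >= 10 for value in expected_counts.values())
print("Large-sample condition met:", condition_met)
```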
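Conduct the test.
$\hat{p}_1 = \frac{417}{521} = 0.8004$ and $\hat{p}_2 = \frac{369}{492} = 0.75$
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.8004 - 0.75}{\sqrt{0.7759(1 - 0.7759)\left(\frac{1}{521} + \frac{1}{492}\right)}} = \frac{0.0504}{0.0262} = 1.92$$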
p-value = P(Z ≥ 1.92 under the N(0,1) distribution) = 0.0273 (less than 0.05), so we reject $H_0$.
Using a 5% significance level, which is the appropriate conclusion?
- There is sufficient evidence to demonstrate the population proportion of all young women 18-25 years old who take pictures with their phone at least twice per week is greater than that of the population of all young men 18-25 years old. (This is the appropriate conclusion, since we reject $H_0$.)
- There is not sufficient evidence to demonstrate the population proportion of all young women 18-25 years old who take pictures with their phone at least twice per week is greater than that of the population of all young men 18-25 years old.
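The test can be checked numerically. This sketch uses our own variable names and the standard library's `statistics.NormalDist` for the upper-tail area; it keeps full precision throughout, so intermediate values may differ slightly from the rounded hand calculation.

```python
import math
from statistics import NormalDist

# Counts from the follow-up study: group 1 = young women, group 2 = young men.
x1, n1 = 417, 521
x2, n2 = 369, 492
ALPHA = 0.05  # significance level chosen by the researchers

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
null_se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / null_se

# Ha: p1 - p2 > 0, so the p-value is the area to the right of z under N(0,1).
p_value = 1 - NormalDist().cdf(z)

print(f"null s.e. = {null_se:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < ALPHA else "Fail to reject H0")
```

Because the alternative is one-sided, only the upper tail is used; a two-sided alternative would double this p-value.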