SAS Data Analysis Examples: Logit Regression
SAS Data Analysis Examples
Logit Regression
Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.
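
In symbols, the model described above can be sketched as follows. The notation is our own and does not appear in the SAS output: p is the probability that the outcome equals 1, x_1, ..., x_k are the predictor variables, and the betas are the coefficients to be estimated.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

The left-hand form is the log odds; the right-hand form is the same model solved for the probability.
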
Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.
Examples
Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1): win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether the candidate is an incumbent.
Example 2: A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into graduate school. The outcome variable, admit/don't admit, is binary.
Description of the data
For our data analysis below, we are going to expand on Example 2 about getting into graduate school. We have generated hypothetical data, which can be obtained from our website by clicking on binary.sas7bdat. You can store this anywhere you like, but the syntax below assumes it has been stored in the directory c:\data. This dataset has a binary response (outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate school, and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. We start out by looking at some descriptive statistics.
/* Summary statistics for the two continuous predictors */
proc means data="c:\data\binary";
  var gre gpa;
run;
The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
GRE         400    587.7000000    115.5165364    220.0000000    800.0000000
GPA         400      3.3899000      0.3805668      2.2600000      4.0000000
/* One-way tables for rank and admit, plus the admit-by-rank crosstab */
proc freq data="c:\data\binary";
  tables rank admit admit*rank;
run;
The FREQ Procedure

                                   Cumulative    Cumulative
RANK     Frequency     Percent      Frequency       Percent
   1            61       15.25             61         15.25
   2           151       37.75            212         53.00
   3           121       30.25            333         83.25
   4            67       16.75            400        100.00

                                   Cumulative    Cumulative
ADMIT    Frequency     Percent      Frequency       Percent
    0          273       68.25            273         68.25
    1          127       31.75            400        100.00
http://www.ats.ucla.edu/stat/sas/dae/logit.htm
02/06/2016

Table of ADMIT by RANK

ADMIT     RANK

Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|       3|       4|  Total
---------+--------+--------+--------+--------+
       0 |     28 |     97 |     93 |     55 |    273
         |   7.00 |  24.25 |  23.25 |  13.75 |  68.25
         |  10.26 |  35.53 |  34.07 |  20.15 |
         |  45.90 |  64.24 |  76.86 |  82.09 |
---------+--------+--------+--------+--------+
       1 |     33 |     54 |     28 |     12 |    127
         |   8.25 |  13.50 |   7.00 |   3.00 |  31.75
         |  25.98 |  42.52 |  22.05 |   9.45 |
         |  54.10 |  35.76 |  23.14 |  17.91 |
---------+--------+--------+--------+--------+
Total          61      151      121       67      400
            15.25    37.75    30.25    16.75   100.00
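
The Col Pct entries in the crosstab are the share of each rank's applicants that fall in a given admit category. As a quick cross-check, the Python sketch below recomputes the admission rate within each rank from the cell counts shown above. Python is used here only as a calculator; the names counts and admit_rate_by_rank are our own illustration and are not part of SAS or of the original analysis.

```python
# Cell counts copied from the "Table of ADMIT by RANK" output above.
counts = {
    0: {1: 28, 2: 97, 3: 93, 4: 55},   # admit = 0
    1: {1: 33, 2: 54, 3: 28, 4: 12},   # admit = 1
}


def admit_rate_by_rank(counts):
    """Return {rank: percent admitted}, i.e. the Col Pct row for admit = 1."""
    rates = {}
    for rank in counts[1]:
        column_total = counts[0][rank] + counts[1][rank]
        rates[rank] = 100.0 * counts[1][rank] / column_total
    return rates


rates = admit_rate_by_rank(counts)
for rank, pct in sorted(rates.items()):
    print(f"rank {rank}: {pct:.2f}% admitted")
```

The recomputed rates fall steadily from rank 1 to rank 4, which is the pattern the logit model later quantifies.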
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.
Logistic regression, the focus of this page.
Probit regression. Probit analysis will produce results similar to logistic regression. The choice of probit versus logit depends largely on individual preferences.
OLS regression. When used with a binary response variable, this model is known as a linear probability model and can be used as a way to describe conditional probabilities. However, the errors (i.e., residuals) from the linear probability model violate the homoskedasticity and normality of errors assumptions of OLS regression, resulting in invalid standard errors and hypothesis tests. For a more thorough discussion of these and other problems with the linear probability model, see Long (1997, p. 38-40).
Two-group discriminant function analysis. A multivariate method for dichotomous outcome variables.
Hotelling's T². The 0/1 outcome is turned into the grouping variable, and the former predictors are turned into outcome variables. This will produce an overall test of significance but will not give individual coefficients for each variable, and it is unclear to what extent each "predictor" is adjusted for the impact of the other "predictors."
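
The heteroskedasticity problem noted for OLS regression in the list above follows directly from the outcome being binary. A brief sketch in our own notation, where p(x) is the probability that the outcome is 1 given predictors x, which the linear probability model sets equal to a linear function of x:

```latex
\operatorname{Var}(y \mid x) = p(x)\,\bigl(1 - p(x)\bigr), \qquad p(x) = x^{\top}\beta
```

Because this variance changes with x, the errors cannot have constant variance, and because y takes only the values 0 and 1, the errors cannot be normally distributed.
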
Using the logit model
Below we run the logistic regression model. To model 1s rather than 0s, we use the descending option. We do this because by default, proc logistic models 0s rather than 1s; in this case that would mean predicting the probability of not getting into graduate school (admit=0) versus getting in (admit=1). Mathematically, the models are equivalent, but conceptually, it probably makes more sense to model the probability of getting into graduate school versus not getting in. The class statement tells SAS that rank is a categorical variable. The param=ref option after the slash requests dummy coding, rather than the default effects coding, for the levels of rank. For more information on dummy versus effects coding in proc logistic, see our FAQ page: In PROC LOGISTIC why aren't the coefficients consistent with the odds ratios?
proc logistic data="c:\data\binary" descending;   /* model admit=1 rather than admit=0 */
  class rank / param=ref;                          /* dummy (reference) coding for rank */
  model admit = gre gpa rank;
run;
The output from proc logistic is broken into several sections, each of which is discussed below.
The LOGISTIC Procedure

Model Information

Data Set                      DATA.LOGIT    Written by SAS
Response Variable             ADMIT
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read         400
Number of Observations Used         400

Response Profile

Ordered                      Total
  Value     ADMIT        Frequency
      1     1                  127
      2     0                  273

Probability modeled is ADMIT=1.

Class Level Information

Class     Value     Design Variables
RANK      1         1    0    0
          2         0    1    0
          3         0    0    1
          4         0    0    0

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.
The first part of the above output tells us the file being analyzed (c:\data\binary) and the number of observations used. We see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values).
We also see that SAS is modeling admit using a binary logit model and that the probability of admit=1 is being modeled. (If we omitted the descending option, SAS would model admit being 0 and our results would be reversed: every coefficient would change sign and every odds ratio would be inverted.)
Model Fit Statistics

                             Intercept
              Intercept         and
Criterion       Only        Covariates
AIC            501.977        470.517
SC             505.968        494.466
-2 Log L       499.977        458.517

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        41.4590     5        <.0001
Score                   40.1603     5        <.0001
Wald                    36.1390     5        <.0001

Type 3 Analysis of Effects

                        Wald
Effect      DF    Chi-Square    Pr > ChiSq
GRE          1        4.2842        0.0385
GPA          1        5.8714        0.0154
RANK         3       20.8949        0.0001
The portion of the output labeled Model Fit Statistics describes and tests the overall fit of the model. The -2 Log L (499.977) can be used in comparisons of nested models, but we won't show an example of that here.
In the next section of output, the likelihood ratio chi-square of 41.4590 with a p-value of less than 0.0001 tells us that our model as a whole fits significantly better than an empty model. The Score and Wald tests are asymptotically equivalent tests of the same hypothesis tested by the likelihood ratio test. Not surprisingly, these tests also indicate that the model is statistically significant.
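
The figures in the Model Fit Statistics and global test tables are tied together by simple arithmetic, which the Python sketch below reproduces from the printed values. It assumes the usual definitions (AIC adds 2 per estimated parameter to -2 Log L, SC adds log(n) per parameter) and counts six parameters in the full model: the intercept, gre, gpa and three rank dummies. The constant and function names are our own and do not come from SAS.

```python
import math

# Values copied from the Model Fit Statistics table above.
N_OBS = 400                 # Number of Observations Used
NEG2LOGL_NULL = 499.977     # -2 Log L, intercept only
NEG2LOGL_FULL = 458.517     # -2 Log L, intercept and covariates
K_NULL = 1                  # intercept
K_FULL = 6                  # intercept + gre + gpa + 3 rank dummies


def aic(neg2logl, k):
    """Akaike information criterion: -2 Log L plus 2 per parameter."""
    return neg2logl + 2 * k


def sc(neg2logl, k, n):
    """Schwarz criterion: -2 Log L plus log(n) per parameter."""
    return neg2logl + k * math.log(n)


lr_chisq = NEG2LOGL_NULL - NEG2LOGL_FULL   # likelihood ratio chi-square
lr_df = K_FULL - K_NULL                    # its degrees of freedom

print(f"LR chi-square = {lr_chisq:.3f} on {lr_df} df")
print(f"AIC: {aic(NEG2LOGL_NULL, K_NULL):.3f} vs {aic(NEG2LOGL_FULL, K_FULL):.3f}")
print(f"SC:  {sc(NEG2LOGL_NULL, K_NULL, N_OBS):.3f} vs {sc(NEG2LOGL_FULL, K_FULL, N_OBS):.3f}")
```

Because the inputs are rounded to three decimals, the recomputed likelihood ratio statistic agrees with the reported 41.4590 only to about that precision.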
The section labeled Type 3 Analysis of Effects shows the hypothesis tests for each of the variables in the model individually. The chi-square test statistics and associated p-values shown in the table indicate that each of the three variables in the model significantly improves the model fit. For gre and gpa, this test duplicates the test of the coefficients shown below. However, for class variables (e.g., rank), this table gives the multiple degree of freedom test for the overall effect of the variable.
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter      DF    Estimate     Error    Chi-Square    Pr > ChiSq
Intercept       1     -5.5414    1.1381       23.7081        <.0001
GRE             1     0.00226   0.00109        4.2842        0.0385
GPA             1      0.8040    0.3318        5.8714        0.0154
RANK      1     1      1.5514    0.4178       13.7870        0.0002
RANK      2     1      0.8760    0.3667        5.7056        0.0169
RANK      3     1      0.2112    0.3929        0.2891        0.5908
The above table shows the coefficients (labeled Estimate), their standard errors (labeled Standard Error), the Wald Chi-Square statistic, and associated p-values. The coefficients for gre and gpa are statistically significant, as are the terms for rank=1 and rank=2 (versus the omitted category rank=4). The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable.
For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.
For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.
The coefficients for the categories of rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 1, versus an institution with a rank of 4, increases the log odds of admission by 1.55.
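
For a single coefficient, the Wald chi-square in the table is the squared ratio of the estimate to its standard error. The Python sketch below recomputes it from the printed values as a sanity check. The dictionary and function names are our own, and small discrepancies are expected because the printed estimates are rounded (most visibly for gre, which is shown to only three significant digits).

```python
# (estimate, standard error, reported Wald chi-square) copied from the table above.
mle_rows = {
    "Intercept": (-5.5414, 1.1381, 23.7081),
    "GRE":       (0.00226, 0.00109, 4.2842),
    "GPA":       (0.8040, 0.3318, 5.8714),
    "RANK 1":    (1.5514, 0.4178, 13.7870),
    "RANK 2":    (0.8760, 0.3667, 5.7056),
    "RANK 3":    (0.2112, 0.3929, 0.2891),
}


def wald_chi_square(estimate, std_error):
    """Single-parameter Wald statistic: (estimate / standard error) squared."""
    return (estimate / std_error) ** 2


recomputed = {
    name: wald_chi_square(est, se) for name, (est, se, _) in mle_rows.items()
}

for name, (est, se, reported) in mle_rows.items():
    print(f"{name:<10} recomputed {recomputed[name]:8.4f}   reported {reported:8.4f}")
```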
Odds Ratio Estimates

                  Point          95% Wald
Effect         Estimate      Confidence Limits
GRE               1.002       1.000       1.004
GPA               2.235       1.166       4.282
RANK 1 vs 4       4.718       2.080      10.701
RANK 2 vs 4       2.401       1.170       4.927
RANK 3 vs 4       1.235       0.572       2.668

Association of Predicted Probabilities and Observed Responses

Percent Concordant     69.1    Somers' D    0.386
Percent Discordant     30.6    Gamma        0.387
Percent Tied            0.3    Tau-a        0.168
Pairs                 34671    c            0.693
The first table above gives the coefficients as odds ratios. An odds ratio is the exponentiated coefficient, and can be interpreted as the multiplicative change in the odds for a one unit change in the predictor variable. For example, for a one unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of 2.24. For more information on interpreting odds ratios see our FAQ page: How do I interpret odds ratios in logistic regression?
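
To make the link between the coefficient table and the odds ratio table concrete, the Python sketch below exponentiates each printed coefficient and forms a 95% Wald interval as exp(estimate ± z × standard error). This is a minimal illustration under the assumption that the limits are built on the log-odds scale and then exponentiated; the names Z_975 and odds_ratio_with_ci are our own, and results match the SAS table only up to rounding of the printed inputs.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

# (estimate, standard error) copied from the maximum likelihood estimates table.
coefficients = {
    "GPA":         (0.8040, 0.3318),
    "RANK 1 vs 4": (1.5514, 0.4178),
    "RANK 2 vs 4": (0.8760, 0.3667),
    "RANK 3 vs 4": (0.2112, 0.3929),
}


def odds_ratio_with_ci(estimate, std_error, z=Z_975):
    """Return (odds ratio, lower limit, upper limit) for one coefficient."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper


odds_ratios = {
    name: odds_ratio_with_ci(est, se) for name, (est, se) in coefficients.items()
}

for name, (point, lower, upper) in odds_ratios.items():
    print(f"{name:<12} OR = {point:6.3f}   95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval for rank 3 versus 4 contains 1, which is consistent with that coefficient not being statistically significant.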
The output gives a test for the overall effect of rank, as well as coefficients that describe the difference between the reference group (rank=4) and each of the other three groups. We can also test for differences between the other levels of rank. For example, we might want to test for a difference in coefficients for rank=2 and rank=3, that is, to compare the odds of admission for students who attended a university with a rank of 2 to students who attended a university with a rank of 3. We can test this type of hypothesis by adding a contrast statement to the code for proc logistic. The syntax shown below is the same as that shown above, except that it includes a contrast statement. Following the word contrast is the label that will appear in the output, enclosed in single quotes (i.e., 'rank 2 vs. 3'). This is followed by the name of the variable we wish to test hypotheses about (i.e., rank), and a vector that describes the desired comparison (i.e., 0 1 -1). In this case the value computed is the difference between the coefficients for rank=2 and rank=3. After the slash (i.e., /) we use the estimate=parm option to request that the estimate be the difference in coefficients. For more information on use of the contrast statement, see our FAQ page: How can I create contrasts with proc logistic?
proc logistic data="c:\data\binary" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  /* difference between the rank=2 and rank=3 coefficients */
  contrast 'rank 2 vs. 3' rank 0 1 -1 / estimate=parm;
run;
Contrast Test Results

                         Wald
Contrast        DF    Chi-Square    Pr > ChiSq
rank 2 vs. 3     1        5.5052        0.0190

Contrast Rows Estimation and Testing Results

                                       Standard                                      Wald
Contrast        Type    Row  Estimate     Error   Alpha   Confidence Limits    Chi-Square   Pr > ChiSq
rank 2 vs. 3    PARM      1    0.6648    0.2833    0.05    0.1095    1.2200        5.5052       0.0190
Because the models are the same, most of the output produced by the above proc logistic command is the same as before. The only difference is the additional output produced by the contrast statement. Under the heading Contrast Test Results we see the label for the contrast (rank 2 versus 3) along with its degrees of freedom, Wald chi-square statistic, and p-value. Based on the p-value in this table we know that the coefficient for rank=2 is significantly different from the coefficient for rank=3. The second table shows more detailed information, including the actual estimate of the difference (under Estimate), its standard error, confidence limits, test statistic, and p-value. We can see that the estimated difference was 0.6648, indicating that having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 3, increases the log odds of admission by 0.66.
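
The contrast estimate can be traced back to the coefficient table by applying the weights 0, 1, -1 to the three rank coefficients. The Python sketch below does that and then rebuilds the Wald statistic and confidence limits from the reported standard error. The standard error (0.2833) is taken from the contrast output rather than recomputed, because it depends on the covariance between the two estimates, which is not printed. All variable names are our own illustration.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

# Coefficients for rank=1, rank=2, rank=3 (rank=4 is the reference level).
rank_coefs = (1.5514, 0.8760, 0.2112)
weights = (0, 1, -1)        # the contrast vector from the contrast statement
reported_se = 0.2833        # Standard Error from the contrast output (not derived here)

# Weighted sum of coefficients = coefficient for rank=2 minus coefficient for rank=3.
estimate = sum(w * b for w, b in zip(weights, rank_coefs))

wald = (estimate / reported_se) ** 2
lower = estimate - Z_975 * reported_se
upper = estimate + Z_975 * reported_se
odds_ratio = math.exp(estimate)   # odds of admission, rank 2 relative to rank 3

print(f"estimate = {estimate:.4f}, Wald chi-square = {wald:.4f}")
print(f"95% limits = ({lower:.4f}, {upper:.4f}), odds ratio = {odds_ratio:.3f}")
```

Exponentiating the difference expresses the same comparison as an odds ratio for rank 2 relative to rank 3.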
You can also use predicted probabilities to help you understand the model. The contrast statement can be used to estimate predicted probabilities by specifying estimate=prob. In the syntax below we use multiple contrast statements to estimate the predicted probability of admission as gre changes from 200 to 800 (in increments of 100). When estimating the predicted probabilities we hold gpa constant at 3.39 (its mean), and rank at 2. The term intercept followed by a 1 indicates that the intercept for the model is to be included in the estimate.
/* Predicted probability of admission at gre = 200, ..., 800, with gpa at its mean and rank = 2 */
proc logistic data="c:\data\binary" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  contrast 'gre=200' intercept 1 gre 200 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=300' intercept 1 gre 300 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=400' intercept 1 gre 400 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=500' intercept 1 gre 500 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=600' intercept 1 gre 600 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=700' intercept 1 gre 700 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=800' intercept 1 gre 800 gpa 3.3899 rank 0 1 0 / estimate=prob;
run;
Contrast Test Results

                      Wald
Contrast     DF    Chi-Square    Pr > ChiSq
gre=200       1        9.7752        0.0018
gre=300       1       11.2483        0.0008
gre=400       1       13.3231        0.0003
gre=500       1       15.0984        0.0001
gre=600       1       11.2291        0.0008
gre=700       1        3.0769        0.0794
gre=800       1        0.2175        0.6409

Contrast Rows Estimation and Testing Results

                                  Standard                                      Wald
Contrast   Type    Row  Estimate     Error   Alpha   Confidence Limits    Chi-Square   Pr > ChiSq
gre=200    PROB      1    0.1844    0.0715    0.05    0.0817    0.3648        9.7752       0.0018
gre=300    PROB      1    0.2209    0.0647    0.05    0.1195    0.3719       11.2483       0.0008
gre=400    PROB      1    0.2623    0.0548    0.05    0.1695    0.3825       13.3231       0.0003
gre=500    PROB      1    0.3084    0.0443    0.05    0.2288    0.4013       15.0984       0.0001
gre=600    PROB      1    0.3587    0.0399    0.05    0.2847    0.4400       11.2291       0.0008
gre=700    PROB      1    0.4122    0.0490    0.05    0.3206    0.5104        3.0769       0.0794
gre=800    PROB      1    0.4680    0.0685    0.05    0.3391    0.6013        0.2175       0.6409
As with the previous example, we have omitted most of the proc logistic output, because it is the same as before. The predicted probabilities are included in the column labeled Estimate in the second table shown above. Looking at the estimates, we can see that the predicted probability of being admitted is only 0.18 if one's gre score is 200, but increases to 0.47 if one's gre score is 800, holding gpa at its mean (3.39), and rank at 2.
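
The predicted probabilities can also be reproduced by hand from the coefficient table: form the linear predictor (the log odds) and pass it through the inverse logit. The Python sketch below does this for the same seven gre values, with gpa at 3.3899 and rank at 2. The names coef and predicted_probability are our own, and because the printed coefficients are rounded, the results agree with the SAS estimates only to roughly three decimal places.

```python
import math

# Coefficients copied from the maximum likelihood estimates table.
# rank=4 is the reference level, so its coefficient is 0.
coef = {
    "intercept": -5.5414,
    "gre": 0.00226,
    "gpa": 0.8040,
    "rank": {1: 1.5514, 2: 0.8760, 3: 0.2112, 4: 0.0},
}


def predicted_probability(gre, gpa, rank):
    """Inverse logit of the linear predictor for one applicant profile."""
    log_odds = (
        coef["intercept"]
        + coef["gre"] * gre
        + coef["gpa"] * gpa
        + coef["rank"][rank]
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Estimates reported by SAS in the second contrast table above.
sas_estimates = {
    200: 0.1844, 300: 0.2209, 400: 0.2623, 500: 0.3084,
    600: 0.3587, 700: 0.4122, 800: 0.4680,
}

probs = {gre: predicted_probability(gre, 3.3899, 2) for gre in sas_estimates}

for gre in sorted(probs):
    print(f"gre={gre}: recomputed {probs[gre]:.4f}   SAS {sas_estimates[gre]:.4f}")
```

Changing the rank argument shows how the whole curve shifts for applicants from more or less prestigious institutions.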
Things to consider
Empty cells or small cells: You should check for empty or small cells by doing a crosstab between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or it might not run at all.
Separation or quasi-separation (also called perfect prediction): A condition in which the outcome does not vary at some levels of the independent variables. See our page FAQ: What is complete or quasi-complete separation in logistic/probit regression and how do we deal with them? for information on models with perfect prediction.
Sample size: Both logit and probit models require more cases than OLS regression because they use maximum likelihood estimation techniques. It is sometimes possible to estimate models for binary outcomes in datasets with only a small number of cases using exact logistic regression (available with the exact statement in proc logistic). For more information see our data analysis example for exact logistic regression. It is also important to keep in mind that when the outcome is rare, even if the overall dataset is large, it can be difficult to estimate a logit model.
Pseudo-R-squared: Many different measures of pseudo-R-squared exist. They all attempt to provide information similar to that provided by R-squared in OLS regression; however, none of them can be interpreted exactly as R-squared in OLS regression is interpreted. For a discussion of various pseudo-R-squareds see Long and Freese (2006) or our FAQ page What are pseudo R-squareds?
Diagnostics: The diagnostics for logistic regression are different from those for OLS regression. For a discussion of model diagnostics for logistic regression, see Hosmer and Lemeshow (2000, Chapter 5). Note that diagnostics done for logistic regression are similar to those done for probit regression.
By default, proc logistic models the probability of the lower valued category (0 if your variable is coded 0/1), rather than the higher valued category.
References
Hosmer, D. and Lemeshow, S. (2000). Applied Logistic Regression (Second Edition). New York: John Wiley and Sons, Inc.
Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.
See also
How do I interpret odds ratios in logistic regression?
Why are my logistic results reversed?
SAS Annotated Output: proc logistic
SAS Seminar: Logistic Regression in SAS
SAS Links by Topic: Logistic Regression
SAS Textbook Examples: Applied Logistic Regression (Second Edition) by David Hosmer and Stanley Lemeshow
A Tutorial on Logistic Regression (PDF) by Ying So, from SUGI Proceedings, 1995 (courtesy of SAS).
Some Issues in Using PROC LOGISTIC for Binary Logistic Regression (PDF) by David C. Schlotzhauer (courtesy of SAS).
Logistic Regression Examples Using the SAS System by SAS Institute
Logistic Regression Using the SAS System: Theory and Application by Paul D. Allison
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.
2016 UC Regents