SAS Data Analysis Examples - Logit Regression

02/06/2016
Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model, the log odds of the outcome is modeled as a linear combination of the predictor variables.
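To make the log odds scale concrete, the following is a minimal Python sketch (Python is not part of the original SAS example, and the function names logit and inv_logit are illustrative) that converts between a probability and its log odds. It uses the overall admission rate in the example data, 127 admitted out of 400.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Probability corresponding to a log odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Overall admission rate in the example data: 127 admitted out of 400.
p_admit = 127 / 400
log_odds = logit(p_admit)

print(f"probability = {p_admit:.4f}")
print(f"odds        = {p_admit / (1 - p_admit):.4f}")
print(f"log odds    = {log_odds:.4f}")
# Round trip: converting the log odds back recovers the probability.
print(f"back to p   = {inv_logit(log_odds):.4f}")
```

A probability below 0.5 corresponds to negative log odds, which is why a model fitted to these data has a negative linear predictor for many applicants.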
Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.
Examples
Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1): win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether the candidate is an incumbent.
Example 2: A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into graduate school. The outcome variable, admit/don't admit, is binary.
Description of the data
For our data analysis below, we are going to expand on Example 2 about getting into graduate school. We have generated hypothetical data, which can be obtained from our website by clicking on binary.sas7bdat. You can store this anywhere you like, but the syntax below assumes it has been stored in the directory c:\data. This dataset has a binary response (outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate school, and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. We start out by looking at some descriptive statistics.
proc means data="c:\data\binary";
  var gre gpa;
run;
The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
GRE         400    587.7000000    115.5165364    220.0000000    800.0000000
GPA         400      3.3899000      0.3805668      2.2600000      4.0000000
proc freq data="c:\data\binary";
  tables rank admit admit*rank;
run;
The FREQ Procedure

                                  Cumulative    Cumulative
RANK    Frequency     Percent      Frequency       Percent
   1           61       15.25             61         15.25
   2          151       37.75            212         53.00
   3          121       30.25            333         83.25
   4           67       16.75            400        100.00

                                  Cumulative    Cumulative
ADMIT   Frequency     Percent      Frequency       Percent
    0         273       68.25            273         68.25
    1         127       31.75            400        100.00

Table of ADMIT by RANK

ADMIT     RANK

Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|       3|       4|   Total
---------+--------+--------+--------+--------+
       0 |     28 |     97 |     93 |     55 |     273
         |   7.00 |  24.25 |  23.25 |  13.75 |   68.25
         |  10.26 |  35.53 |  34.07 |  20.15 |
         |  45.90 |  64.24 |  76.86 |  82.09 |
---------+--------+--------+--------+--------+
       1 |     33 |     54 |     28 |     12 |     127
         |   8.25 |  13.50 |   7.00 |   3.00 |   31.75
         |  25.98 |  42.52 |  22.05 |   9.45 |
         |  54.10 |  35.76 |  23.14 |  17.91 |
---------+--------+--------+--------+--------+
Total          61      151      121       67       400
            15.25    37.75    30.25    16.75    100.00

http://www.ats.ucla.edu/stat/sas/dae/logit.htm
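The column percentages in the ADMIT by RANK table are the quantities most relevant to the model that follows: they give the share of applicants admitted within each rank. A small Python sketch (not part of the original SAS example; the variable names are illustrative) recomputes them from the cell counts.

```python
# Cell counts from the ADMIT by RANK table (columns are rank 1..4).
not_admitted = [28, 97, 93, 55]
admitted = [33, 54, 28, 12]

# Percent admitted within each rank (the "Col Pct" row for ADMIT = 1).
admit_pct = [
    100.0 * n1 / (n0 + n1) for n0, n1 in zip(not_admitted, admitted)
]

for rank, (n0, n1, pct) in enumerate(
    zip(not_admitted, admitted, admit_pct), start=1
):
    print(f"rank {rank}: {n1}/{n0 + n1} admitted = {pct:.2f}%")
```

The admitted share falls steadily from rank 1 to rank 4, which is the pattern the rank coefficients in the logit model describe on the log odds scale.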
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.
Logistic regression, the focus of this page.
Probit regression. Probit analysis will produce results similar to logistic regression. The choice of probit versus logit depends largely on individual preferences.
OLS regression. When used with a binary response variable, this model is known as a linear probability model and can be used as a way to describe conditional probabilities. However, the errors (i.e., residuals) from the linear probability model violate the homoskedasticity and normality of errors assumptions of OLS regression, resulting in invalid standard errors and hypothesis tests. For a more thorough discussion of these and other problems with the linear probability model, see Long (1997, p. 38-40).
Two-group discriminant function analysis. A multivariate method for dichotomous outcome variables.
Hotelling's T². The 0/1 outcome is turned into the grouping variable, and the former predictors are turned into outcome variables. This will produce an overall test of significance but will not give individual coefficients for each variable, and it is unclear the extent to which each "predictor" is adjusted for the impact of the other "predictors."
Using the logit model
Below we run the logistic regression model. To model 1s rather than 0s, we use the descending option. We do this because, by default, proc logistic models 0s rather than 1s; in this case that would mean predicting the probability of not getting into graduate school (admit=0) versus getting in (admit=1). Mathematically, the models are equivalent, but conceptually, it probably makes more sense to model the probability of getting into graduate school versus not getting in. The class statement tells SAS that rank is a categorical variable. The param=ref option after the slash requests dummy coding, rather than the default effects coding, for the levels of rank. For more information on dummy versus effects coding in proc logistic, see our FAQ page: In PROC LOGISTIC why aren't the coefficients consistent with the odds ratios?
proc logistic data="c:\data\binary" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
run;
The output from proc logistic is broken into several sections, each of which is discussed below.
The LOGISTIC Procedure

Model Information

Data Set                      DATA.LOGIT    Written by SAS
Response Variable             ADMIT
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read         400
Number of Observations Used         400

Response Profile

Ordered                 Total
  Value    ADMIT    Frequency
      1    1              127
      2    0              273

Probability modeled is ADMIT=1.

Class Level Information

Class    Value    Design Variables
RANK     1        1    0    0
         2        0    1    0
         3        0    0    1
         4        0    0    0

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.
The first part of the above output tells us the file being analyzed (c:\data\binary) and the number of observations used. We see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values).
We also see that SAS is modeling admit using a binary logit model and that the probability that admit=1 is being modeled. (If we omitted the descending option, SAS would model the probability that admit=0, and the signs of all the coefficients would be reversed.)
Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             501.977        470.517
SC              505.968        494.466
-2 Log L        499.977        458.517

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        41.4590     5        <.0001
Score                   40.1603     5        <.0001
Wald                    36.1390     5        <.0001

Type 3 Analysis of Effects

                      Wald
Effect    DF    Chi-Square    Pr > ChiSq
GRE        1        4.2842        0.0385
GPA        1        5.8714        0.0154
RANK       3       20.8949        0.0001
The portion of the output labeled Model Fit Statistics describes and tests the overall fit of the model. The -2 Log L values (499.977 for the intercept-only model and 458.517 for the model with covariates) can be used in comparisons of nested models, but we won't show an example of that here.
In the next section of output, the likelihood ratio chi-square of 41.4590 with a p-value of less than 0.0001 tells us that our model as a whole fits significantly better than an empty model. The Score and Wald tests are asymptotically equivalent tests of the same hypothesis tested by the likelihood ratio test; not surprisingly, these tests also indicate that the model is statistically significant.
The section labeled Type 3 Analysis of Effects shows the hypothesis tests for each of the variables in the model individually. The chi-square test statistics and associated p-values shown in the table indicate that each of the three variables in the model significantly improves the model fit. For gre and gpa, this test duplicates the test of the coefficients shown below. However, for class variables (e.g., rank), this table gives the multiple degree of freedom test for the overall effect of the variable.
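The entries of the Model Fit Statistics table are linked by simple arithmetic. The Python sketch below (not part of the original SAS example) assumes the usual definitions, AIC = -2 Log L + 2k and SC = -2 Log L + k log(n), where k is the number of estimated parameters and n the number of observations, and it takes the likelihood ratio chi-square as the difference between the two -2 Log L values. The function and variable names are illustrative.

```python
import math

n_obs = 400            # observations used
neg2ll_null = 499.977  # -2 Log L, intercept only
neg2ll_full = 458.517  # -2 Log L, intercept and covariates
k_null = 1             # intercept
k_full = 6             # intercept + gre + gpa + 3 rank dummies

def aic(neg2ll, k):
    """Akaike information criterion."""
    return neg2ll + 2 * k

def sc(neg2ll, k, n):
    """Schwarz criterion."""
    return neg2ll + k * math.log(n)

# Likelihood ratio test of the full model against the empty model.
lr_chisq = neg2ll_null - neg2ll_full
lr_df = k_full - k_null

print(f"AIC: {aic(neg2ll_null, k_null):.3f} {aic(neg2ll_full, k_full):.3f}")
print(f"SC:  {sc(neg2ll_null, k_null, n_obs):.3f} "
      f"{sc(neg2ll_full, k_full, n_obs):.3f}")
print(f"LR chi-square = {lr_chisq:.3f} on {lr_df} df")
```

Under these assumptions the computed values agree with the reported AIC, SC and likelihood ratio chi-square to the precision shown in the output.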
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -5.5414      1.1381       23.7081        <.0001
GRE           1     0.00226     0.00109        4.2842        0.0385
GPA           1      0.8040      0.3318        5.8714        0.0154
RANK 1        1      1.5514      0.4178       13.7870        0.0002
RANK 2        1      0.8760      0.3667        5.7056        0.0169
RANK 3        1      0.2112      0.3929        0.2891        0.5908
The above table shows the coefficients (labeled Estimate), their standard errors (labeled Standard Error), the Wald chi-square statistic, and associated p-values. The coefficients for gre and gpa are statistically significant, as are the terms for rank=1 and rank=2 (versus the omitted category rank=4). The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable.
For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.
For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.
The coefficients for the categories of rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 1, versus an institution with a rank of 4, increases the log odds of admission by 1.55.
Odds Ratio Estimates

                    Point          95% Wald
Effect           Estimate      Confidence Limits
GRE                 1.002       1.000       1.004
GPA                 2.235       1.166       4.282
RANK 1 vs 4         4.718       2.080      10.701
RANK 2 vs 4         2.401       1.170       4.927
RANK 3 vs 4         1.235       0.572       2.668

Association of Predicted Probabilities and Observed Responses

Percent Concordant     69.1    Somers' D    0.386
Percent Discordant     30.6    Gamma        0.387
Percent Tied            0.3    Tau-a        0.168
Pairs                 34671    c            0.693
The first table above gives the coefficients as odds ratios. An odds ratio is the exponentiated coefficient, and can be interpreted as the multiplicative change in the odds for a one unit change in the predictor variable. For example, for a one unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of 2.24. For more information on interpreting odds ratios see our FAQ page: How do I interpret odds ratios in logistic regression?
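As a check on the relationship between coefficients and odds ratios, the Python sketch below (not part of the original SAS example) exponentiates the reported estimates and builds 95% Wald limits as exp(estimate ± z × standard error). The estimates and standard errors are the rounded values printed in the output, so the results can differ from the SAS table in the last digit; the names coefs, Z_975 and odds_ratio_ci are illustrative.

```python
import math

# (estimate, standard error) pairs copied from the
# Analysis of Maximum Likelihood Estimates table.
coefs = {
    "GRE": (0.00226, 0.00109),
    "GPA": (0.8040, 0.3318),
    "RANK 1 vs 4": (1.5514, 0.4178),
    "RANK 2 vs 4": (0.8760, 0.3667),
    "RANK 3 vs 4": (0.2112, 0.3929),
}

Z_975 = 1.959964  # 97.5th percentile of the standard normal

def odds_ratio_ci(estimate, std_err, z=Z_975):
    """Return (odds ratio, lower limit, upper limit) for a Wald interval."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_err),
        math.exp(estimate + z * std_err),
    )

results = {name: odds_ratio_ci(b, se) for name, (b, se) in coefs.items()}
for name, (or_, lo, hi) in results.items():
    print(f"{name:<12} OR = {or_:.3f}  95% CI = ({lo:.3f}, {hi:.3f})")
```

An interval that contains 1, as for rank 3 versus rank 4, corresponds to a coefficient that is not statistically significant at the 0.05 level.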
The output gives a test for the overall effect of rank, as well as coefficients that describe the difference between the reference group (rank=4) and each of the other three groups. We can also test for differences between the other levels of rank. For example, we might want to test for a difference in coefficients for rank=2 and rank=3, that is, to compare the odds of admission for students who attended a university with a rank of 2, to students who attended a university with a rank of 3. We can test this type of hypothesis by adding a contrast statement to the code for proc logistic. The syntax shown below is the same as that shown above, except that it includes a contrast statement. Following the word contrast is the label that will appear in the output, enclosed in single quotes (i.e., 'rank 2 vs. 3'). This is followed by the name of the variable we wish to test hypotheses about (i.e., rank), and a vector that describes the desired comparison (i.e., 0 1 -1). In this case the value computed is the difference between the coefficients for rank=2 and rank=3. After the slash (i.e., /) we use the estimate=parm option to request that the estimate be the difference in coefficients. For more information on use of the contrast statement, see our FAQ page: How can I create contrasts with proc logistic?
  contrast 'rank 2 vs. 3' rank 0 1 -1 / estimate=parm;
run;
Contrast Test Results

                          Wald
Contrast        DF    Chi-Square    Pr > ChiSq
rank 2 vs. 3     1        5.5052        0.0190

Contrast Rows Estimation and Testing Results

                                         Standard                                        Wald
Contrast        Type   Row   Estimate       Error   Alpha    Confidence Limits     Chi-Square   Pr > ChiSq
rank 2 vs. 3    PARM     1     0.6648      0.2833    0.05    0.1095     1.2200         5.5052       0.0190
Because the models are the same, most of the output produced by the above proc logistic command is the same as before. The only difference is the additional output produced by the contrast statement. Under the heading Contrast Test Results we see the label for the contrast (rank 2 versus 3) along with its degrees of freedom, Wald chi-square statistic, and p-value. Based on the p-value in this table we know that the coefficient for rank=2 is significantly different from the coefficient for rank=3. The second table shows more detailed information, including the actual estimate of the difference (under Estimate), its standard error, confidence limits, test statistic, and p-value. We can see that the estimated difference was 0.6648, indicating that having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 3, increases the log odds of admission by 0.66.
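The contrast estimate can be reproduced by hand from the coefficient table. The Python sketch below (not part of the original SAS example; variable names are illustrative) takes the difference of the rank=2 and rank=3 coefficients and combines it with the standard error that SAS reports for the contrast. That standard error depends on the covariance between the two estimates, which is not printed, so it is copied from the output rather than derived.

```python
import math

b_rank2, b_rank3 = 0.8760, 0.2112  # coefficients versus rank 4
se_diff = 0.2833                   # standard error reported for the contrast
z = 1.959964                       # 97.5th percentile of the standard normal

diff = b_rank2 - b_rank3                       # difference in log odds
wald = (diff / se_diff) ** 2                   # Wald chi-square, 1 df
ci = (diff - z * se_diff, diff + z * se_diff)  # 95% Wald limits
odds_ratio = math.exp(diff)                    # odds ratio, rank 2 vs rank 3

print(f"difference  = {diff:.4f}")
print(f"Wald chi-sq = {wald:.4f}")
print(f"95% limits  = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"odds ratio  = {odds_ratio:.3f}")
```

Because the inputs are rounded, the Wald statistic computed this way differs slightly from the 5.5052 in the output. Exponentiating the difference gives an odds ratio of about 1.94 for rank 2 versus rank 3.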
You can also use predicted probabilities to help you understand the model. The contrast statement can be used to estimate predicted probabilities by specifying estimate=prob. In the syntax below we use multiple contrast statements to estimate the predicted probability of admission as gre changes from 200 to 800 (in increments of 100). When estimating the predicted probabilities we hold gpa constant at 3.39 (its mean), and rank at 2. The term intercept followed by a 1 indicates that the intercept for the model is to be included in the estimate.
  contrast 'gre=200' intercept 1 gre 200 gpa 3.3899 rank 0 1 0 / estimate=prob;
run;
Contrast Test Results

                      Wald
Contrast    DF    Chi-Square    Pr > ChiSq
gre=200      1        9.7752        0.0018
gre=300      1       11.2483        0.0008
gre=400      1       13.3231        0.0003
gre=500      1       15.0984        0.0001
gre=600      1       11.2291        0.0008
gre=700      1        3.0769        0.0794
gre=800      1        0.2175        0.6409

Contrast Rows Estimation and Testing Results

                                     Standard                                        Wald
Contrast    Type   Row   Estimate       Error   Alpha    Confidence Limits     Chi-Square   Pr > ChiSq
gre=200     PROB     1     0.1844      0.0715    0.05    0.0817     0.3648         9.7752       0.0018
gre=300     PROB     1     0.2209      0.0647    0.05    0.1195     0.3719        11.2483       0.0008
gre=400     PROB     1     0.2623      0.0548    0.05    0.1695     0.3825        13.3231       0.0003
gre=500     PROB     1     0.3084      0.0443    0.05    0.2288     0.4013        15.0984       0.0001
gre=600     PROB     1     0.3587      0.0399    0.05    0.2847     0.4400        11.2291       0.0008
gre=700     PROB     1     0.4122      0.0490    0.05    0.3206     0.5104         3.0769       0.0794
gre=800     PROB     1     0.4680      0.0685    0.05    0.3391     0.6013         0.2175       0.6409
As with the previous example, we have omitted most of the proc logistic output, because it is the same as before. The predicted probabilities are included in the column labeled Estimate in the second table shown above. Looking at the estimates, we can see that the predicted probability of being admitted is only 0.18 if one's gre score is 200, but increases to 0.47 if one's gre score is 800, holding gpa at its mean (3.39), and rank at 2.
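The predicted probabilities can be approximated directly from the reported coefficients by computing the linear predictor and applying the inverse logit. The Python sketch below (not part of the original SAS example; names such as predicted_probability are illustrative) assumes the intercept is -5.5414, the sign needed to reproduce the reported probabilities, and uses the rounded estimates from the output, so results agree with the Estimate column only to about three decimal places.

```python
import math

# Coefficients from the Analysis of Maximum Likelihood Estimates table.
INTERCEPT = -5.5414
B_GRE = 0.00226
B_GPA = 0.8040
B_RANK = {1: 1.5514, 2: 0.8760, 3: 0.2112, 4: 0.0}  # rank 4 is the reference

def predicted_probability(gre, gpa, rank):
    """Inverse logit of the linear predictor for one applicant profile."""
    eta = INTERCEPT + B_GRE * gre + B_GPA * gpa + B_RANK[rank]
    return 1.0 / (1.0 + math.exp(-eta))

# gre from 200 to 800 in steps of 100, gpa at its mean, rank fixed at 2.
probs = {
    gre: predicted_probability(gre, 3.3899, 2) for gre in range(200, 801, 100)
}
for gre, p in probs.items():
    print(f"gre = {gre}: predicted probability = {p:.4f}")
```

Changing the rank argument shows how much of the spread in predicted probabilities comes from institutional prestige rather than from gre.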
Things to consider
Empty cells or small cells: You should check for empty or small cells by doing a crosstab between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or it might not run at all.
Separation or quasi-separation (also called perfect prediction): A condition in which the outcome does not vary at some levels of the independent variables. See our page FAQ: What is complete or quasi-complete separation in logistic/probit regression and how do we deal with them? for information on models with perfect prediction.
Sample size: Both logit and probit models require more cases than OLS regression because they use maximum likelihood estimation techniques. It is sometimes possible to estimate models for binary outcomes in datasets with only a small number of cases using exact logistic regression (available with the exact statement in proc logistic). For more information see our data analysis example for exact logistic regression. It is also important to keep in mind that when the outcome is rare, even if the overall dataset is large, it can be difficult to estimate a logit model.
Pseudo-R-squared: Many different measures of pseudo-R-squared exist. They all attempt to provide information similar to that provided by R-squared in OLS regression; however, none of them can be interpreted exactly as R-squared in OLS regression is interpreted. For a discussion of various pseudo-R-squareds see Long and Freese (2006) or our FAQ page What are pseudo R-squareds?
Diagnostics: The diagnostics for logistic regression are different from those for OLS regression. For a discussion of model diagnostics for logistic regression, see Hosmer and Lemeshow (2000, Chapter 5). Note that diagnostics done for logistic regression are similar to those done for probit regression.
By default, proc logistic models the probability of the lower valued category (0 if your variable is coded 0/1), rather than the higher valued category.
References
Hosmer, D. and Lemeshow, S. (2000). Applied Logistic Regression (Second Edition). New York: John Wiley and Sons, Inc.
Long, J. Scott (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.
See also
How do I interpret odds ratios in logistic regression?
Why are my logistic results reversed?
SAS Annotated Output: proc logistic
SAS Seminar: Logistic Regression in SAS
SAS Links by Topic: Logistic Regression
SAS Textbook Examples: Applied Logistic Regression (Second Edition) by David Hosmer and Stanley Lemeshow
A Tutorial on Logistic Regression (PDF) by Ying So, from SUGI Proceedings, 1995 (courtesy of SAS).
Some Issues in Using PROC LOGISTIC for Binary Logistic Regression (PDF) by David C. Schlotzhauer (courtesy of SAS).
Logistic Regression Examples Using the SAS System by SAS Institute
Logistic Regression Using the SAS System: Theory and Application by Paul D. Allison