02/06/2016

SAS Data Analysis Examples: Logit Regression

SAS Data Analysis Examples
Logit Regression
Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.
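
Written out for a model with k predictors, the statement above corresponds to the following equation. The symbols p, x and β are notation introduced here for illustration; they do not appear on the original page.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \Pr(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```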
Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.

Examples
Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1): win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether the candidate is an incumbent.
Example 2: A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into graduate school. The outcome variable, admit/don't admit, is binary.

Description of the data
For our data analysis below, we are going to expand on Example 2 about getting into graduate school. We have generated hypothetical data, which can be obtained from our website by clicking on binary.sas7bdat. You can store this anywhere you like, but the syntax below assumes it has been stored in the directory c:\data. This dataset has a binary response (outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate school, and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. We start out by looking at some descriptive statistics.

proc means data="c:\data\binary";
  var gre gpa;
run;

The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
GRE         400    587.7000000    115.5165364    220.0000000    800.0000000
GPA         400      3.3899000      0.3805668      2.2600000      4.0000000

proc freq data="c:\data\binary";
  tables rank admit admit*rank;
run;

The FREQ Procedure

                                 Cumulative    Cumulative
RANK    Frequency     Percent     Frequency       Percent
   1           61       15.25            61         15.25
   2          151       37.75           212         53.00
   3          121       30.25           333         83.25
   4           67       16.75           400        100.00

                                  Cumulative    Cumulative
ADMIT    Frequency     Percent     Frequency       Percent
    0          273       68.25           273         68.25
    1          127       31.75           400        100.00

http://www.ats.ucla.edu/stat/sas/dae/logit.htm

Table of ADMIT by RANK

ADMIT     RANK

Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|       3|       4|  Total
---------+--------+--------+--------+--------+
       0 |     28 |     97 |     93 |     55 |    273
         |   7.00 |  24.25 |  23.25 |  13.75 |  68.25
         |  10.26 |  35.53 |  34.07 |  20.15 |
         |  45.90 |  64.24 |  76.86 |  82.09 |
---------+--------+--------+--------+--------+
       1 |     33 |     54 |     28 |     12 |    127
         |   8.25 |  13.50 |   7.00 |   3.00 |  31.75
         |  25.98 |  42.52 |  22.05 |   9.45 |
         |  54.10 |  35.76 |  23.14 |  17.91 |
---------+--------+--------+--------+--------+
Total          61      151      121       67      400
            15.25    37.75    30.25    16.75   100.00
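
As a rough check on the cross-tabulation, the column percentages and the unadjusted odds of admission at each rank can be recomputed from the cell counts. The sketch below uses Python rather than SAS, and its function and variable names are illustrative only. The odds it prints are raw odds that do not adjust for gre or gpa, so they will not match the model-based odds ratios reported later.

```python
# Cell counts copied from the ADMIT by RANK table.
not_admitted = {1: 28, 2: 97, 3: 93, 4: 55}
admitted = {1: 33, 2: 54, 3: 28, 4: 12}


def column_percent(rank):
    """Percent admitted among applicants from institutions of this rank."""
    total = admitted[rank] + not_admitted[rank]
    return 100.0 * admitted[rank] / total


def raw_odds(rank):
    """Unadjusted odds of admission (admitted / not admitted) at this rank."""
    return admitted[rank] / not_admitted[rank]


for rank in sorted(admitted):
    print(rank, round(column_percent(rank), 2), round(raw_odds(rank), 3))
```

The admitted percentages fall steadily from rank 1 to rank 4, which is the pattern the rank coefficients in the model pick up.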

Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.

Logistic regression, the focus of this page.
Probit regression. Probit analysis will produce results similar to logistic regression. The choice of probit versus logit depends largely on individual preferences.
OLS regression. When used with a binary response variable, this model is known as a linear probability model and can be used as a way to describe conditional probabilities. However, the errors (i.e., residuals) from the linear probability model violate the homoskedasticity and normality of errors assumptions of OLS regression, resulting in invalid standard errors and hypothesis tests. For a more thorough discussion of these and other problems with the linear probability model, see Long (1997, p. 38-40).
Two-group discriminant function analysis. A multivariate method for dichotomous outcome variables.
Hotelling's T². The 0/1 outcome is turned into the grouping variable, and the former predictors are turned into outcome variables. This will produce an overall test of significance but will not give individual coefficients for each variable, and it is unclear to what extent each "predictor" is adjusted for the impact of the other "predictors."
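
To illustrate the heteroskedasticity point in the OLS item: a 0/1 outcome with probability p of being 1 has variance p(1 - p), so the error variance of a linear probability model changes with the fitted probability instead of staying constant. A minimal Python sketch (the function name is illustrative, not from the page):

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome whose probability of being 1 is p."""
    return p * (1.0 - p)


# Error variance implied at a few different fitted probabilities.
for p in (0.1, 0.3, 0.5, 0.9):
    print(p, round(bernoulli_variance(p), 2))
```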

Using the logit model
Below we run the logistic regression model. To model 1s rather than 0s, we use the descending option. We do this because, by default, proc logistic models 0s rather than 1s; in this case that would mean predicting the probability of not getting into graduate school (admit=0) versus getting in (admit=1). Mathematically, the models are equivalent, but conceptually, it probably makes more sense to model the probability of getting into graduate school versus not getting in. The class statement tells SAS that rank is a categorical variable. The param=ref option after the slash requests dummy coding, rather than the default effects coding, for the levels of rank. For more information on dummy versus effects coding in proc logistic, see our FAQ page: In PROC LOGISTIC why aren't the coefficients consistent with the odds ratios?

proc logistic data="c:\data\binary" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
run;

The output from proc logistic is broken into several sections, each of which is discussed below.

The LOGISTIC Procedure

Model Information

Data Set                       DATA.LOGIT    Written by SAS
Response Variable              ADMIT
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    400
Number of Observations Used    400

Response Profile

Ordered                  Total
  Value    ADMIT     Frequency
      1    1               127
      2    0               273

Probability modeled is ADMIT=1.

Class Level Information

Class    Value    Design Variables
RANK     1          1    0    0
         2          0    1    0
         3          0    0    1
         4          0    0    0

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.
The first part of the above output tells us the file being analyzed (c:\data\binary) and the number of observations used. We see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values). We also see that SAS is modeling admit using a binary logit model and that the probability that admit=1 is being modeled. (If we omitted the descending option, SAS would model the probability that admit=0, and the signs of all the coefficients would be reversed.)
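
The Class Level Information table shows the dummy (reference) coding that param=ref requests: one design variable per non-reference level, with the reference level coded as all zeros. A minimal Python sketch of that coding, assuming the last level (rank=4) is the reference as in the output; the function name is hypothetical:

```python
def reference_code(value, levels=(1, 2, 3, 4)):
    """Return the design variables for `value` under reference (dummy) coding.

    The last entry of `levels` is treated as the reference category, so it
    maps to all zeros; every other level gets a single 1 in its own column.
    """
    non_reference = levels[:-1]
    return [1 if value == level else 0 for level in non_reference]


for rank in (1, 2, 3, 4):
    print(rank, reference_code(rank))
```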

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             501.977        470.517
SC              505.968        494.466
-2 Log L        499.977        458.517

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       41.4590     5        <.0001
Score                  40.1603     5        <.0001
Wald                   36.1390     5        <.0001

Type 3 Analysis of Effects

                      Wald
Effect    DF    Chi-Square    Pr > ChiSq
GRE        1        4.2842        0.0385
GPA        1        5.8714        0.0154
RANK       3       20.8949        0.0001
The portion of the output labeled Model Fit Statistics describes and tests the overall fit of the model. The -2 Log L (499.977 for the intercept-only model, 458.517 for the model with covariates) can be used in comparisons of nested models, but we won't show an example of that here.
In the next section of output, the likelihood ratio chi-square of 41.4590 with a p-value of less than 0.0001 tells us that our model as a whole fits significantly better than an empty model. The Score and Wald tests are asymptotically equivalent tests of the same hypothesis tested by the likelihood ratio test; not surprisingly, these tests also indicate that the model is statistically significant.
The section labeled Type 3 Analysis of Effects shows the hypothesis tests for each of the variables in the model individually. The chi-square test statistics and associated p-values shown in the table indicate that each of the three variables in the model significantly improves the model fit. For gre and gpa, this test duplicates the test of the coefficients shown below. However, for class variables (e.g., rank), this table gives the multiple degree of freedom test for the overall effect of the variable.
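
The entries in the Model Fit Statistics table are tied together by simple arithmetic. The Python sketch below assumes the usual definitions (AIC = -2 Log L + 2k and SC = -2 Log L + k log n, with k estimated parameters and n observations) and takes the likelihood ratio chi-square as the difference between the two -2 Log L values. The variable names are illustrative, and because the inputs are rounded to three decimals the results match the SAS output only approximately.

```python
import math

n_obs = 400                # observations used
neg2logl_null = 499.977    # -2 Log L, intercept only
neg2logl_full = 458.517    # -2 Log L, intercept and covariates
k_null = 1                 # intercept
k_full = 6                 # intercept + gre + gpa + 3 rank design variables


def aic(neg2logl, k):
    """Akaike information criterion from -2 Log L and parameter count."""
    return neg2logl + 2 * k


def sc(neg2logl, k, n):
    """Schwarz criterion from -2 Log L, parameter count and sample size."""
    return neg2logl + k * math.log(n)


lr_chi_square = neg2logl_null - neg2logl_full
lr_df = k_full - k_null

print(round(lr_chi_square, 3), lr_df)
print(round(aic(neg2logl_full, k_full), 3), round(sc(neg2logl_full, k_full, n_obs), 3))
```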


The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                                  Standard          Wald
Parameter      DF    Estimate        Error    Chi-Square    Pr > ChiSq
Intercept       1     -5.5414       1.1381       23.7081        <.0001
GRE             1     0.00226      0.00109        4.2842        0.0385
GPA             1      0.8040       0.3318        5.8714        0.0154
RANK      1     1      1.5514       0.4178       13.7870        0.0002
RANK      2     1      0.8760       0.3667        5.7056        0.0169
RANK      3     1      0.2112       0.3929        0.2891        0.5908
The above table shows the coefficients (labeled Estimate), their standard errors (labeled Standard Error), the Wald Chi-Square statistic, and associated p-values. The coefficients for gre and gpa are statistically significant, as are the terms for rank=1 and rank=2 (versus the omitted category rank=4). The logistic regression coefficients give the change in the log odds of the outcome for a one-unit increase in the predictor variable.
For every one-unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.
For a one-unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.
The coefficients for the categories of rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 1, versus an institution with a rank of 4, increases the log odds of admission by 1.55.
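
Each single-degree-of-freedom Wald chi-square in the coefficient table is the squared ratio of the estimate to its standard error, and its p-value is the upper tail of a chi-square distribution with 1 degree of freedom. The Python sketch below recomputes both from the rounded values in the table, so it agrees with the SAS output only to about two decimals. The function names are illustrative.

```python
import math

# (estimate, standard error) pairs copied from the coefficient table.
estimates = {
    "gre": (0.00226, 0.00109),
    "gpa": (0.8040, 0.3318),
    "rank 1": (1.5514, 0.4178),
    "rank 2": (0.8760, 0.3667),
    "rank 3": (0.2112, 0.3929),
}


def wald_chi_square(estimate, std_error):
    """Squared ratio of a coefficient to its standard error (1 df)."""
    return (estimate / std_error) ** 2


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


for name, (b, se) in estimates.items():
    chi2 = wald_chi_square(b, se)
    print(name, round(chi2, 2), round(p_value_1df(chi2), 4))
```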

Odds Ratio Estimates

                     Point           95% Wald
Effect            Estimate      Confidence Limits
GRE                  1.002       1.000       1.004
GPA                  2.235       1.166       4.282
RANK 1 vs 4          4.718       2.080      10.701
RANK 2 vs 4          2.401       1.170       4.927
RANK 3 vs 4          1.235       0.572       2.668

Association of Predicted Probabilities and Observed Responses

Percent Concordant     69.1    Somers' D    0.386
Percent Discordant     30.6    Gamma        0.387
Percent Tied            0.3    Tau-a        0.168
Pairs                 34671    c            0.693
The first table above gives the coefficients as odds ratios. An odds ratio is the exponentiated coefficient, and can be interpreted as the multiplicative change in the odds for a one-unit change in the predictor variable. For example, for a one-unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of 2.24. For more information on interpreting odds ratios see our FAQ page: How do I interpret odds ratios in logistic regression?
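
The odds ratios and their Wald confidence limits can be reproduced by exponentiating the coefficient and the ends of its interval on the log-odds scale. The Python sketch below uses the rounded estimates and standard errors from the coefficient table and assumes a standard normal quantile of about 1.96 for the 95% limits. The function name is illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(estimate, std_error, z=Z_95):
    """Exponentiate a logit coefficient and its Wald confidence limits."""
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return math.exp(estimate), lower, upper


# gpa, then rank 1 vs 4.
print([round(v, 3) for v in odds_ratio_with_ci(0.8040, 0.3318)])
print([round(v, 3) for v in odds_ratio_with_ci(1.5514, 0.4178)])
```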

The output gives a test for the overall effect of rank, as well as coefficients that describe the difference between the reference group (rank=4) and each of the other three groups. We can also test for differences between the other levels of rank. For example, we might want to test for a difference in coefficients for rank=2 and rank=3, that is, to compare the odds of admission for students who attended a university with a rank of 2, to students who attended a university with a rank of 3. We can test this type of hypothesis by adding a contrast statement to the code for proc logistic. The syntax shown below is the same as that shown above, except that it includes a contrast statement. Following the word contrast is the label that will appear in the output, enclosed in single quotes (here, 'rank 2 vs 3'). This is followed by the name of the variable we wish to test hypotheses about (i.e., rank), and a vector that describes the desired comparison (i.e., 0 1 -1). In this case the value computed is the difference between the coefficients for rank=2 and rank=3. After the slash (i.e., /) we use the estimate=parm option to request that the estimate be the difference in coefficients. For more information on use of the contrast statement, see our FAQ page: How can I create contrasts with proc logistic?

proc logistic data="c:\data\binary" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  contrast 'rank 2 vs 3' rank 0 1 -1 / estimate=parm;
run;

Contrast Test Results

                         Wald
Contrast        DF    Chi-Square    Pr > ChiSq
rank 2 vs. 3     1        5.5052        0.0190

Contrast Rows Estimation and Testing Results

                                         Standard                                        Wald
Contrast        Type    Row    Estimate     Error    Alpha    Confidence Limits    Chi-Square    Pr > ChiSq
rank 2 vs. 3    PARM      1      0.6648    0.2833     0.05    0.1095     1.2200        5.5052        0.0190
Because the models are the same, most of the output produced by the above proc logistic command is the same as before. The only difference is the additional output produced by the contrast statement. Under the heading Contrast Test Results we see the label for the contrast (rank 2 versus 3) along with its degrees of freedom, Wald chi-square statistic, and p-value. Based on the p-value in this table we know that the coefficient for rank=2 is significantly different from the coefficient for rank=3. The second table shows more detailed information, including the actual estimate of the difference (under Estimate), its standard error, confidence limits, test statistic, and p-value. We can see that the estimated difference was 0.6648, indicating that having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 3, increases the log odds of admission by 0.66.
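
The contrast estimate is just the weighted sum of the rank coefficients defined by the contrast vector. The Python sketch below reproduces the 0.6648 from the reported coefficients; the names are illustrative. The standard error is taken from the SAS output as given, since it depends on the covariance between the two coefficients, which is not shown on this page.

```python
# Coefficients for the rank design variables (rank=4 is the reference).
rank_coefficients = [1.5514, 0.8760, 0.2112]   # rank 1, rank 2, rank 3
contrast_vector = [0, 1, -1]                    # rank 2 minus rank 3


def contrast_estimate(weights, coefficients):
    """Linear combination of coefficients defined by the contrast weights."""
    return sum(w * b for w, b in zip(weights, coefficients))


difference = contrast_estimate(contrast_vector, rank_coefficients)
std_error = 0.2833   # copied from the contrast output, not derived here
wald = (difference / std_error) ** 2

print(round(difference, 4), round(wald, 2))
```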
You can also use predicted probabilities to help you understand the model. The contrast statement can be used to estimate predicted probabilities by specifying estimate=prob. In the syntax below we use multiple contrast statements to estimate the predicted probability of admission as gre changes from 200 to 800 (in increments of 100). When estimating the predicted probabilities we hold gpa constant at 3.39 (its mean), and rank at 2. The term intercept followed by a 1 indicates that the intercept for the model is to be included in the estimate.

proc logistic data="c:\data\binary" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  contrast 'gre=200' intercept 1 gre 200 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=300' intercept 1 gre 300 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=400' intercept 1 gre 400 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=500' intercept 1 gre 500 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=600' intercept 1 gre 600 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=700' intercept 1 gre 700 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=800' intercept 1 gre 800 gpa 3.3899 rank 0 1 0 / estimate=prob;
run;

Contrast Test Results

                     Wald
Contrast    DF    Chi-Square    Pr > ChiSq
gre=200      1        9.7752        0.0018
gre=300      1       11.2483        0.0008
gre=400      1       13.3231        0.0003
gre=500      1       15.0984        0.0001
gre=600      1       11.2291        0.0008
gre=700      1        3.0769        0.0794
gre=800      1        0.2175        0.6409

Contrast Rows Estimation and Testing Results

                                     Standard                                        Wald
Contrast    Type    Row    Estimate     Error    Alpha    Confidence Limits    Chi-Square    Pr > ChiSq
gre=200     PROB      1      0.1844    0.0715     0.05    0.0817     0.3648        9.7752        0.0018
gre=300     PROB      1      0.2209    0.0647     0.05    0.1195     0.3719       11.2483        0.0008
gre=400     PROB      1      0.2623    0.0548     0.05    0.1695     0.3825       13.3231        0.0003
gre=500     PROB      1      0.3084    0.0443     0.05    0.2288     0.4013       15.0984        0.0001
gre=600     PROB      1      0.3587    0.0399     0.05    0.2847     0.4400       11.2291        0.0008
gre=700     PROB      1      0.4122    0.0490     0.05    0.3206     0.5104        3.0769        0.0794
gre=800     PROB      1      0.4680    0.0685     0.05    0.3391     0.6013        0.2175        0.6409
As with the previous example, we have omitted most of the proc logistic output, because it is the same as before. The predicted probabilities are included in the column labeled Estimate in the second table shown above. Looking at the estimates, we can see that the predicted probability of being admitted is only 0.18 if one's gre score is 200, but increases to 0.47 if one's gre score is 800, holding gpa at its mean (3.39), and rank at 2.
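
The same predicted probabilities can be approximated by hand: form the linear predictor from the coefficients and pass it through the inverse logit. The Python sketch below uses the rounded coefficients from the coefficient table, with the intercept entered as -5.5414, so its results differ from the SAS estimates in the third decimal place. The function and constant names are illustrative.

```python
import math

# Rounded coefficients from the Analysis of Maximum Likelihood Estimates table.
INTERCEPT = -5.5414
B_GRE = 0.00226
B_GPA = 0.8040
B_RANK = {1: 1.5514, 2: 0.8760, 3: 0.2112, 4: 0.0}  # rank=4 is the reference


def predicted_probability(gre, gpa, rank):
    """Inverse logit of the linear predictor for one applicant profile."""
    log_odds = INTERCEPT + B_GRE * gre + B_GPA * gpa + B_RANK[rank]
    return 1.0 / (1.0 + math.exp(-log_odds))


# gpa held at its mean and rank at 2, as in the contrast statements.
for gre in range(200, 900, 100):
    print(gre, round(predicted_probability(gre, gpa=3.3899, rank=2), 3))
```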

Things to consider
Empty cells or small cells: You should check for empty or small cells by doing a crosstab between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or it might not run at all.
Separation or quasi-separation (also called perfect prediction): A condition in which the outcome does not vary at some levels of the independent variables. See our page FAQ: What is complete or quasi-complete separation in logistic/probit regression and how do we deal with them? for information on models with perfect prediction.
Sample size: Both logit and probit models require more cases than OLS regression because they use maximum likelihood estimation techniques. It is sometimes possible to estimate models for binary outcomes in datasets with only a small number of cases using exact logistic regression (available with the exact statement in proc logistic). For more information see our data analysis example for exact logistic regression. It is also important to keep in mind that when the outcome is rare, even if the overall dataset is large, it can be difficult to estimate a logit model.
Pseudo-R-squared: Many different measures of pseudo-R-squared exist. They all attempt to provide information similar to that provided by R-squared in OLS regression; however, none of them can be interpreted exactly as R-squared in OLS regression is interpreted. For a discussion of various pseudo-R-squareds see Long and Freese (2006) or our FAQ page What are pseudo R-squareds?
Diagnostics: The diagnostics for logistic regression are different from those for OLS regression. For a discussion of model diagnostics for logistic regression, see Hosmer and Lemeshow (2000, Chapter 5). Note that diagnostics done for logistic regression are similar to those done for probit regression.
By default, proc logistic models the probability of the lower valued category (0 if your variable is coded 0/1), rather than the higher valued category.
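
The first two checks in this list can be sketched directly from a cross-tabulation of the outcome against a categorical predictor. The Python example below uses the ADMIT by RANK counts from earlier on this page. The threshold of 5 for a "small" cell is an arbitrary assumption chosen for illustration, not a rule from the page, and the function names are hypothetical.

```python
# Cell counts from the ADMIT by RANK cross-tabulation, keyed by (admit, rank).
crosstab = {
    (0, 1): 28, (0, 2): 97, (0, 3): 93, (0, 4): 55,
    (1, 1): 33, (1, 2): 54, (1, 3): 28, (1, 4): 12,
}


def small_cells(table, threshold=5):
    """Return the (admit, rank) cells whose count falls below `threshold`."""
    return sorted(cell for cell, count in table.items() if count < threshold)


def levels_without_variation(table):
    """Ranks at which every case has the same outcome (a sign of separation)."""
    ranks = {rank for (_, rank) in table}
    return sorted(
        r for r in ranks
        if table.get((0, r), 0) == 0 or table.get((1, r), 0) == 0
    )


print(small_cells(crosstab), levels_without_variation(crosstab))
```

With these data both checks come back empty, which is consistent with the model having converged without warnings.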

References
Hosmer, D. and Lemeshow, S. (2000). Applied Logistic Regression (Second Edition). New York: John Wiley and Sons, Inc.
Long, J. Scott (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

See also
How do I interpret odds ratios in logistic regression?
Why are my logistic results reversed?
SAS Annotated Output: proc logistic
SAS Seminar: Logistic Regression in SAS
SAS Links by Topic: Logistic Regression
SAS Textbook Examples: Applied Logistic Regression (Second Edition) by David Hosmer and Stanley Lemeshow
A Tutorial on Logistic Regression (PDF) by Ying So, from SUGI Proceedings, 1995, courtesy of SAS.
Some Issues in Using PROC LOGISTIC for Binary Logistic Regression (PDF) by David C. Schlotzhauer, courtesy of SAS.
Logistic Regression Examples Using the SAS System by SAS Institute
Logistic Regression Using the SAS System: Theory and Application by Paul D. Allison
