
Binary Logistic Regression with PASW/SPSS

Logistic regression is used to predict a categorical (usually dichotomous) variable from a set of predictor variables. With a categorical dependent variable, discriminant function analysis is usually employed if all of the predictors are continuous and nicely distributed; logit analysis is usually employed if all of the predictors are categorical; and logistic regression is often chosen if the predictor variables are a mix of continuous and categorical variables and/or if they are not nicely distributed (logistic regression makes no assumptions about the distributions of the predictor variables). Logistic regression has been especially popular in medical research, where the dependent variable is whether or not a patient has a disease.
For a logistic regression, the predicted dependent variable is a function of the probability that a particular subject will be in one of the categories (for example, the probability that Suzie Cue has the disease, given her set of scores on the predictor variables).
Using a Single Dichotomous Predictor, Gender of Subject
Let us first consider a simple (bivariate) logistic regression, using subjects' decisions as the dichotomous criterion variable and their gender as a dichotomous predictor variable. I have coded gender with 0 = Female, 1 = Male, and decision with 0 = "Stop the Research" and 1 = "Continue the Research".
Our regression model will be predicting the logit, that is, the natural log of the odds of having made one or the other decision. That is,

$$\ln(\mathit{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = a + bX,$$

where $\hat{Y}$ is the predicted probability of the event which is coded with 1 (continue the research) rather than with 0 (stop the research), $1-\hat{Y}$ is the predicted probability of the other decision, and $X$ is our predictor variable, gender. Some statistical programs (such as SAS) predict the event which is coded with the smaller of the two numeric codes. By the way, if you have ever wondered what is "natural" about the natural log, you can find an answer of sorts at http://www.math.toronto.edu/mathnet/answers/answers_13.html.
Our model will be constructed by an iterative maximum likelihood procedure. The program will start with arbitrary values of the regression coefficients and will construct an initial model for predicting the observed data. It will then evaluate errors in such prediction and change the regression coefficients so as to make the likelihood of the observed data greater under the new model. This procedure is repeated until the model converges, that is, until the differences between the newest model and the previous model are trivial.
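PASW does not show us its iterations, but the idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not PASW's actual code: it uses Newton-Raphson updates (one common way to maximize a logistic likelihood), starts from the arbitrary values a = b = 0, and works from the grouped gender-by-decision counts that appear in the crosstabulation later in this handout (60 of 200 women and 68 of 115 men chose to continue). The helper names `log_likelihood` and `fit_newton` are hypothetical.

```python
import math

# Grouped data taken from the gender x decision crosstabulation:
# (gender X, number who chose "continue" (Y = 1), group size n)
GROUPS = [(0, 60, 200), (1, 68, 115)]


def log_likelihood(a, b, groups=GROUPS):
    """Log likelihood of the observed decisions under ln(odds) = a + b*X."""
    ll = 0.0
    for x, y, n in groups:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))  # predicted P(continue)
        ll += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return ll


def fit_newton(groups=GROUPS, tol=1e-3, max_iter=25):
    """Maximize the likelihood by Newton-Raphson, stopping when no estimate
    changes by tol or more (mirroring the .001 criterion in the output)."""
    a, b = 0.0, 0.0  # arbitrary starting values
    for iteration in range(1, max_iter + 1):
        # Score vector (g_a, g_b) and information matrix [[i_aa, i_ab], [i_ab, i_bb]]
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = y - n * p          # observed minus expected count of "continue"
            w = n * p * (1.0 - p)      # binomial variance of that count
            g_a += resid
            g_b += resid * x
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step = inverse(information) times score, written out for a 2 x 2 matrix
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if max(abs(step_a), abs(step_b)) < tol:
            return a, b, iteration
    return a, b, max_iter


a, b, iterations = fit_newton()
print(f"{a:.3f} {b:.3f} {-2 * log_likelihood(a, b):.3f}")  # -0.847 1.217 399.913
```

Each pass raises the likelihood of the observed data; once the changes in a and b fall below .001 the estimates match the constant and gender coefficients PASW reports, and -2 times the log likelihood matches its -2 Log Likelihood.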
Open the data file at http://core.ecu.edu/psyc/wuenschk/SPSS/Logistic.sav. Click Analyze, Regression, Binary Logistic. Scoot the decision variable into the Dependent box and the gender variable into the Covariates box. The dialog box should now look like this:

[Figure: Logistic Regression dialog box]
Copyright 2009 Karl L. Wuensch - All rights reserved.
Logistic-SPSS.doc
Click OK.
Look at the statistical output. We see that there are 315 cases used in the analysis.
Case Processing Summary
Unweighted Cases(a)                           N   Percent
Selected Cases     Included in Analysis     315     100.0
                   Missing Cases              0        .0
                   Total                    315     100.0
Unselected Cases                              0        .0
Total                                       315     100.0
a. If weight is in effect, see classification table for the total number of cases.
The Block 0 output is for a model that includes only the intercept (which PASW calls the constant). Given the base rates of the two decision options (187/315 = 59% decided to stop the research, 41% decided to allow it to continue), and no other information, the best strategy is to predict, for every case, that the subject will decide to stop the research. Using that strategy, you would be correct 59% of the time.
Classification Table(a,b)
                                         Predicted
                                    decision         Percentage
Observed                         stop   continue       Correct
Step 0   decision   stop          187          0         100.0
                    continue      128          0            .0
         Overall Percentage                               59.4
a. Constant is included in the model.
b. The cut value is .500
Under Variables in the Equation you see that the intercept-only model is ln(odds) = -.379. If we exponentiate both sides of this expression we find that our predicted odds [Exp(B)] = .684. That is, the predicted odds of deciding to continue the research are .684. Since 128 of our subjects decided to continue the research and 187 decided to stop the research, our observed odds are 128/187 = .684.
Variables in the Equation
                      B    S.E.     Wald   df   Sig.   Exp(B)
Step 0   Constant  -.379   .115   10.919    1   .001     .684
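The Block 0 numbers follow directly from the two base rates. In an intercept-only model the fitted probability of "continue" is simply the observed proportion, so the maximum likelihood estimate of the constant is the log of the observed odds. A minimal sketch using only the counts given above:

```python
import math

n_stop, n_continue = 187, 128
n = n_stop + n_continue

# Intercept-only model: the ML estimate of the constant is ln(observed odds of "continue")
b0 = math.log(n_continue / n_stop)
predicted_odds = math.exp(b0)  # Exp(B) for the constant

# Predicting "stop" (the more common decision) for everyone is right n_stop / n of the time
base_rate_accuracy = n_stop / n

print(f"{b0:.3f} {predicted_odds:.3f} {100 * base_rate_accuracy:.1f}")  # -0.379 0.684 59.4
```

These match the B and Exp(B) for the constant and the 59.4% overall percentage in the Block 0 classification table.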
Now look at the Block 1 output. Here PASW has added the gender variable as a predictor. Omnibus Tests of Model Coefficients gives us a Chi-Square of 25.653 on 1 df, significant beyond .001. This is a test of the null hypothesis that adding the gender variable to the model has not significantly increased our ability to predict the decisions made by our subjects.
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step        25.653    1   .000
         Block       25.653    1   .000
         Model       25.653    1   .000
Under Model Summary we see that the -2 Log Likelihood statistic is 399.913. This statistic measures how poorly the model predicts the decisions: the smaller the statistic, the better the model. Although PASW does not give us this statistic for the model that had only the intercept, it can be computed from the base rates as -2[187 ln(187/315) + 128 ln(128/315)] = 425.566. Adding the gender variable reduced the -2 Log Likelihood statistic by 425.566 - 399.913 = 25.653, the χ² statistic we just discussed in the previous paragraph. The Cox & Snell R² can be interpreted like R² in a multiple regression, but cannot reach a maximum value of 1. The Nagelkerke R² can reach a maximum of 1.
Model Summary
          -2 Log        Cox & Snell   Nagelkerke
Step      likelihood    R Square      R Square
1         399.913(a)    .078          .106
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.
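All of the Model Summary statistics can be reproduced from the cell counts. This sketch relies on something true of this particular model: with a single dichotomous predictor, the fitted probability for each gender equals that gender's observed proportion, so no iteration is needed. It uses the standard formulas Cox & Snell R² = 1 - exp(-χ²/N) and Nagelkerke R² = Cox & Snell R² / [1 - exp(-(-2LL of the intercept-only model)/N)]. The helper name `minus_2ll` is hypothetical.

```python
import math

N = 315
# Counts per gender: (n_stop, n_continue)
FEMALE = (140, 60)
MALE = (47, 68)


def minus_2ll(groups):
    """-2 log likelihood when each group's predicted probabilities equal its observed proportions."""
    ll = 0.0
    for counts in groups:
        n = sum(counts)
        ll += sum(k * math.log(k / n) for k in counts if k > 0)
    return -2.0 * ll


m2ll_null = minus_2ll([(FEMALE[0] + MALE[0], FEMALE[1] + MALE[1])])  # intercept only
m2ll_model = minus_2ll([FEMALE, MALE])                               # intercept + gender
chi_sq = m2ll_null - m2ll_model                   # likelihood-ratio chi-square
p_value = math.erfc(math.sqrt(chi_sq / 2.0))      # upper tail of chi-square with 1 df
cox_snell = 1.0 - math.exp(-chi_sq / N)
nagelkerke = cox_snell / (1.0 - math.exp(-m2ll_null / N))

print(f"{m2ll_null:.3f} {m2ll_model:.3f} {chi_sq:.3f}")  # 425.566 399.913 25.653
print(f"{cox_snell:.3f} {nagelkerke:.3f}")               # 0.078 0.106
```

The p-value line uses the fact that a chi-square on 1 df is a squared standard normal, so its upper tail is erfc(√(χ²/2)); here it is far below .001.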
The Variables in the Equation output shows us that the regression equation is

$$\ln(\mathit{ODDS}) = -.847 + 1.217\,\mathit{Gender}.$$
Variables in the Equation
                        B    S.E.     Wald   df   Sig.   Exp(B)
Step 1(a)  gender    1.217   .245   24.757    1   .000    3.376
           Constant  -.847   .154   30.152    1   .000     .429
a. Variable(s) entered on step 1: gender.
We can now use this model to predict the odds that a subject of a given gender will decide to continue the research. The odds prediction equation is $\mathit{ODDS} = e^{a+bX}$. If our subject is a woman (gender = 0), then

$$\mathit{ODDS} = e^{-.847 + 1.217(0)} = e^{-.847} = 0.429.$$

That is, a woman is only .429 as likely to decide to continue the research as she is to decide to stop the research. If our subject is a man (gender = 1), then

$$\mathit{ODDS} = e^{-.847 + 1.217(1)} = e^{.370} = 1.448.$$

That is, a man is 1.448 times as likely to decide to continue the research as to decide to stop the research.
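The odds prediction equation can be evaluated directly. This sketch plugs the rounded coefficients from the Variables in the Equation table into ODDS = e^(a + bX); `predicted_odds` is a hypothetical helper name.

```python
import math

A, B = -0.847, 1.217  # constant and gender slope, as printed in the output


def predicted_odds(gender):
    """Predicted odds of deciding to continue the research: e^(a + b*X)."""
    return math.exp(A + B * gender)


print(f"{predicted_odds(0):.3f} {predicted_odds(1):.3f}")  # 0.429 1.448
```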
We can easily convert odds to probabilities. For our women,

$$\hat{Y} = \frac{\mathit{ODDS}}{1+\mathit{ODDS}} = \frac{0.429}{1.429} = 0.30.$$

That is, our model predicts that 30% of women will decide to continue the research. For our men,

$$\hat{Y} = \frac{\mathit{ODDS}}{1+\mathit{ODDS}} = \frac{1.448}{2.448} = 0.59.$$

That is, our model predicts that 59% of men will decide to continue the research.
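The conversion between odds and probabilities works in both directions, which a short sketch makes concrete. The function names are hypothetical; the odds are the ones computed in the text.

```python
def odds_to_probability(odds):
    """P = ODDS / (1 + ODDS)."""
    return odds / (1.0 + odds)


def probability_to_odds(p):
    """Inverse conversion: ODDS = P / (1 - P)."""
    return p / (1.0 - p)


p_women = odds_to_probability(0.429)
p_men = odds_to_probability(1.448)
print(f"{p_women:.2f} {p_men:.2f}")  # 0.30 0.59
```

Because this model has only one dichotomous predictor, these predicted probabilities are simply the observed proportions in each gender (60/200 and 68/115).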
The Variables in the Equation output also gives us the Exp(B). This is better known as the odds ratio predicted by the model. This odds ratio can be computed by raising the base of the natural log to the bth power, where b is the slope from our logistic regression equation. For our model, $e^{1.217} \approx 3.376$. That tells us that the model predicts that the odds of deciding to continue the research are 3.376 times higher for men than they are for women. For the men, the odds are 1.448, and for the women they are 0.429. The odds ratio is

$$\frac{1.448}{0.429} \approx 3.376.$$
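Working from the raw counts instead of the rounded odds shows why the slope and Exp(B) agree exactly. A minimal sketch, using the observed odds of continuing within each gender from the crosstabulation:

```python
import math

# Observed odds of "continue" within each gender (crosstab counts)
odds_women = 60 / 140
odds_men = 68 / 47

odds_ratio = odds_men / odds_women   # Exp(B) for gender
slope = math.log(odds_ratio)         # b, the gender coefficient
print(f"{odds_ratio:.3f} {slope:.3f}")  # 3.376 1.217
```

Exponentiating the rounded slope, `math.exp(1.217)`, gives about 3.377 rather than 3.376; the difference is only rounding of the printed coefficient.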
The results of our logistic regression can be used to classify subjects with respect to what decision we think they will make. As noted earlier, our model leads to the prediction that the probability of deciding to continue the research is 30% for women and 59% for men. Before we can use this information to classify subjects, we need to have a decision rule. Our decision rule will take the following form: If the probability of the event is greater than or equal to some threshold, we shall predict that the event will take place. By default, PASW sets this threshold to .5. While that seems reasonable, in many cases we may want to set it higher or lower than .5. More on this later. Using the default threshold, PASW will classify a subject into the "Continue the Research" category if the estimated probability is .5 or more, which it is for every male subject. PASW will classify a subject into the "Stop the Research" category if the estimated probability is less than .5, which it is for every female subject.
The Classification Table shows us that this rule allows us to correctly classify 68 / 128 = 53% of the subjects where the predicted event (deciding to continue the research) was observed. This is known as the sensitivity of prediction, the P(correct | event did occur), that is, the percentage of occurrences correctly predicted. We also see that this rule allows us to correctly classify 140 / 187 = 75% of the subjects where the predicted event was not observed. This is known as the specificity of prediction, the P(correct | event did not occur), that is, the percentage of nonoccurrences correctly predicted. Overall our predictions were correct 208 out of 315 times, for an overall success rate of 66%. Recall that it was only 59% for the model with intercept only.
Classification Table(a)
                                         Predicted
                                    decision         Percentage
Observed                         stop   continue       Correct
Step 1   decision   stop          140         47          74.9
                    continue       60         68          53.1
         Overall Percentage                               66.0
a. The cut value is .500
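The decision rule and the resulting table can be sketched as follows. This is an illustration, not PASW's procedure: it assumes grouped data (every subject of a given gender gets that gender's predicted probability) and makes the cut value a parameter; `GROUPS` and `classification_table` are hypothetical names.

```python
# Predicted probability of "continue" and observed counts for each gender
GROUPS = {
    "female": {"p_hat": 60 / 200, "stop": 140, "continue": 60},
    "male": {"p_hat": 68 / 115, "stop": 47, "continue": 68},
}


def classification_table(groups, cut=0.5):
    """Count (observed, predicted) pairs, predicting 'continue' when p_hat >= cut."""
    table = {("stop", "stop"): 0, ("stop", "continue"): 0,
             ("continue", "stop"): 0, ("continue", "continue"): 0}
    for g in groups.values():
        predicted = "continue" if g["p_hat"] >= cut else "stop"
        for observed in ("stop", "continue"):
            table[(observed, predicted)] += g[observed]
    return table


t = classification_table(GROUPS)
sensitivity = t[("continue", "continue")] / (t[("continue", "continue")] + t[("continue", "stop")])
specificity = t[("stop", "stop")] / (t[("stop", "stop")] + t[("stop", "continue")])
overall = (t[("stop", "stop")] + t[("continue", "continue")]) / sum(t.values())
print(f"{sensitivity:.3f} {specificity:.3f} {overall:.3f}")  # 0.531 0.749 0.660
```

Changing `cut` shows how the threshold trades sensitivity against specificity: with `cut=0.6`, for example, neither gender reaches the threshold, everyone is predicted to stop, and the table collapses to the Block 0 pattern.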
We could focus on error rates in classification. A false positive would be predicting that the event would occur when, in fact, it did not. Our decision rule predicted a decision to continue the research 115 times. That prediction was wrong 47 times, for a false positive rate of 47 / 115 = 41%. A false negative would be predicting that the event would not occur when, in fact, it did occur. Our decision rule predicted a decision not to continue the research 200 times. That prediction was wrong 60 times, for a false negative rate of 60 / 200 = 30%.
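These two error rates come straight from the Step 1 classification counts. The sketch follows the definitions used here, where each rate is the share of a given kind of prediction that turned out wrong. Some texts instead divide false positives by the number of observed nonevents (47/187), so check which definition a source is using.

```python
# Step 1 classification counts, keyed by (observed, predicted)
table = {("stop", "stop"): 140, ("stop", "continue"): 47,
         ("continue", "stop"): 60, ("continue", "continue"): 68}

predicted_continue = table[("stop", "continue")] + table[("continue", "continue")]
predicted_stop = table[("stop", "stop")] + table[("continue", "stop")]

# Share of each kind of prediction that was wrong
false_positive_rate = table[("stop", "continue")] / predicted_continue
false_negative_rate = table[("continue", "stop")] / predicted_stop
print(f"{false_positive_rate:.2f} {false_negative_rate:.2f}")  # 0.41 0.30
```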
It has probably occurred to you that you could have used a simple Pearson Chi-Square Contingency Table Analysis to answer the question of whether or not there is a significant relationship between gender and decision about the animal research. Let us take a quick look at such an analysis. In PASW click Analyze, Descriptive Statistics, Crosstabs. Scoot gender into the Rows box and decision into the Columns box. The dialog box should look like this:

[Figure: Crosstabs dialog box]

Now click the Statistics box. Check Chi-Square and then click Continue.
Now click the Cells box. Check Observed Counts and Row Percentages and then click Continue.
Back on the initial page, click OK.
In the Crosstabulation output you will see that 59% of the men and 30% of the women decided to continue the research, just as predicted by our logistic regression.
gender * decision Crosstabulation
                                         decision
                                     stop   continue     Total
gender   Female   Count               140         60       200
                  % within gender   70.0%      30.0%    100.0%
         Male     Count                47         68       115
                  % within gender   40.9%      59.1%    100.0%
Total             Count               187        128       315
                  % within gender   59.4%      40.6%    100.0%
You will also notice that the Likelihood Ratio Chi-Square is 25.653 on 1 df, the same test of significance we got from our logistic regression, and the Pearson Chi-Square is almost the same (25.685). If you are thinking, "Hey, this logistic regression is nearly equivalent to a simple Pearson Chi-Square," you are correct, in this simple case. Remember, however, that we can add additional predictor variables, and those additional predictors can be either categorical or continuous; you can't do that with a simple Pearson Chi-Square.
Chi-Square Tests
                                                Asymp. Sig.
                          Value       df        (2-sided)
Pearson Chi-Square        25.685(b)    1             .000
Likelihood Ratio          25.653       1             .000
N of Valid Cases             315
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.73.
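Both statistics in this table can be computed from the observed and expected cell counts, which makes the near-equivalence with the logistic regression easy to see: the likelihood-ratio statistic G² = 2 Σ O ln(O/E) is exactly the drop in -2 Log Likelihood reported in the Omnibus test. A minimal sketch; `chi_square_tests` is a hypothetical helper.

```python
import math

OBSERVED = [[140, 60],   # female: stop, continue
            [47, 68]]    # male:   stop, continue


def chi_square_tests(table):
    """Return (Pearson X^2, likelihood-ratio G^2, minimum expected count) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = g2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            pearson += (obs - expected) ** 2 / expected
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / expected)
    return pearson, g2, min_expected


pearson, g2, min_exp = chi_square_tests(OBSERVED)
print(f"{pearson:.3f} {g2:.3f} {min_exp:.2f}")  # 25.685 25.653 46.73
```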