Logistic Regression

Extends idea of linear regression to situation where outcome
variable is categorical

Widely used, particularly where a structured model is useful
to explain (=profiling) or to predict

We focus on binary classification
i.e. Y=0 or Y=1

Why Not Linear Regression?
Technically, you can run linear regression with a 0/1 response
variable and obtain an output

But the resulting model will not make sense

For instance:
Predictions will mostly not be 0 or 1
Coefficient interpretation will not make sense

Example: Suppose researchers are interested in potential
relationship between patient age (x) and presence (1) or
absence (0) of disease
Data set includes 20 patients

Patient ID   Age, x   Disease, y      Patient ID   Age, x   Disease, y
1            25       0               11           50       0
2            29       0               12           59       1
3            30       0               13           60       0
4            31       0               14           62       0
5            32       0               15           68       1
6            41       0               16           72       0
7            41       0               17           79       1
8            42       0               18           80       0
9            44       1               19           81       1
10           49       1               20           84       1
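The claim that a straight-line fit to a 0/1 response gives values that are not valid outcomes can be checked on the 20 patients above. The following is a minimal sketch in plain Python using the closed-form least squares formulas; the variable names are illustrative and do not come from the slides.

```python
# Ages and disease indicators transcribed from the 20-patient table.
AGE = [25, 29, 30, 31, 32, 41, 41, 42, 44, 49,
       50, 59, 60, 62, 68, 72, 79, 80, 81, 84]
DISEASE = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
           0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

n = len(AGE)
mean_x = sum(AGE) / n
mean_y = sum(DISEASE) / n

# Closed-form simple least squares: slope = Sxy / Sxx.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(AGE, DISEASE))
sxx = sum((x - mean_x) ** 2 for x in AGE)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values of the straight line at each observed age.
fitted = [intercept + slope * x for x in AGE]
for x, y, f in zip(AGE, DISEASE, fitted):
    print(f"age={x:2d}  y={y}  linear fit={f:6.3f}")
```

None of the fitted values equals 0 or 1, and the fit for the youngest patient falls slightly below 0, so it cannot be read as a probability.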
Simple Example of Logistic Regression (contd)
Plot shows least squares regression line (straight)
and logistic regression line (curved) for disease on age
Least squares regression line is linear, and assumes linear
relationship between variables
In contrast, logistic regression line assumes non-linear
relationship between predictor and response
Patient 11 estimation errors (vertical lines) shown
Simple Example of Logistic Regression (contd)
Patient 11's estimation error greater for linear regression
versus logistic regression
Thus, for this point, and many others, linear regression
does poorer job of estimating disease
Question: How is logistic regression line derived?
First, E(Y|x) is conditional mean of Y, given x
Equals expected value of response, for given predictor
value
Recall linear regression, where response random variable
defined as:
$Y = \beta_0 + \beta_1 x + \varepsilon$
Simple Example of Logistic Regression (contd)
Furthermore, since error term ε has mean = 0, E(Y|x) for linear
regression equals:
$E(Y|x) = \beta_0 + \beta_1 x$

Denote E(Y|x) as π(x), where conditional mean for
logistic regression takes form:
$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$

Function forms s-shaped (sigmoidal) curves, which are
non-linear
Logistic function models dichotomous data well, because
of simplicity and interpretability
Simple Example of Logistic Regression (contd)
Sigmoid min = $\lim_{a \to -\infty} \frac{e^a}{1 + e^a} = 0$, Sigmoid max = $\lim_{a \to \infty} \frac{e^a}{1 + e^a} = 1$
where, 0 ≤ π(x) ≤ 1 interpreted as a probability
π(x) interpreted as probability disease (positive outcome)
present for records X = x
Furthermore, 1 - π(x) interpreted as probability disease
(positive outcome) not present
Recall linear regression error term ε normally distributed,
with mean = 0 and constant variance
However, assumptions regarding error term different for
logistic regression
Simple Example of Logistic Regression (contd)
Because response dichotomous, errors take one of two
forms:
Y = 1 (disease present)
Occurs with probability π(x), probability response positive
ε = 1 - π(x) represents vertical distance between point Y = 1 and
curve π(x) below, for X = x
Y = 0 (disease not present)
Occurs with probability 1 - π(x), probability response negative
ε = 0 - π(x) = -π(x), which represents vertical distance between
point Y = 0 and curve π(x) above, for X = x

Simple Example of Logistic Regression (contd)
Variance of ε = π(x)(1 - π(x)), variance of binomial
distribution
Therefore, logistic regression response Y = π(x) + ε
assumed to follow binomial distribution with probability
of success = π(x)
Transformation for logistic regression,
logit transformation, defined as:
$g(x) = \ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x$

Logit g(x) has useful properties: linear in parameters, continuous,
and ranges from negative to positive infinity
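The relationship between the logit g(x) and the probability π(x) can be sketched in a few lines of Python. The coefficients beta0 and beta1 below are hypothetical values chosen only for illustration; they are not estimates from the slides.

```python
import math

def logistic(g):
    """Map a logit value g(x) to a probability pi(x) strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-g))

def logit(p):
    """Map a probability pi(x) in (0, 1) back to the logit g(x)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.0, 0.5

for x in (-4.0, 0.0, 2.0, 6.0):
    g = beta0 + beta1 * x          # linear in x
    p = logistic(g)                # s-shaped in x, bounded by 0 and 1
    variance = p * (1.0 - p)       # variance of the error term at this x
    print(f"x={x:5.1f}  g(x)={g:6.2f}  pi(x)={p:.4f}  var={variance:.4f}")
```

Applying `logit` to the output of `logistic` returns the original linear predictor, which is the sense in which the logit transformation makes the model linear in x.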
Maximum Likelihood Estimation
Maximum likelihood estimation finds parameter
estimates for which likelihood of observing the data is
maximum

Likelihood function $l(\beta | x)$ is function of
parameters $\beta = \beta_0, \beta_1, \ldots, \beta_m$, which
expresses probability of observed data, x

Maximum likelihood estimators determined by finding
values for $\beta = \beta_0, \beta_1, \ldots, \beta_m$, which maximize
likelihood function $l(\beta | x)$
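A minimal sketch of how such estimates can be found numerically, assuming Newton-Raphson iteration on the log likelihood of a single-predictor model. The slides do not say which algorithm the statistical packages use, and the function names here are illustrative. The data are the 20 patients from the age and disease example.

```python
import math

AGE = [25, 29, 30, 31, 32, 41, 41, 42, 44, 49,
       50, 59, 60, 62, 68, 72, 79, 80, 81, 84]
DISEASE = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
           0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

def prob(b0, b1, x):
    """Logistic probability pi(x) for given coefficients."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1, xs, ys):
    """Sum of y*ln(pi) + (1 - y)*ln(1 - pi) over all records."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_newton(xs, ys, n_iter=25):
    """Newton-Raphson updates for intercept b0 and slope b1, starting at 0."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log likelihood
        h00 = h01 = h11 = 0.0    # entries of the negative Hessian
        for x, y in zip(xs, ys):
            p = prob(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (negative Hessian) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_newton(AGE, DISEASE)
print(f"b0={b0:.3f}  b1={b1:.5f}  logL={log_likelihood(b0, b1, AGE, DISEASE):.3f}")
```

On these data the iteration settles near b0 = -4.372 and b1 = 0.06696, with log likelihood near -10.101, the values quoted for this example elsewhere in the slides.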
Interpreting Logistic Regression Output
Logistic regression of disease on age performed using
SPSS

Coefficients (maximum likelihood estimates) of unknown
parameters β0 and β1, given as b0 = -4.372 and
b1 = 0.06696, respectively
Variables in the Equation

                       B         S.E.    Wald    df   Sig.   Exp(B)
Step 1(a)   Age        .06696    .032    4.315   1    .038   1.069
            Constant   -4.372    1.966   4.948   1    .026   .013
a. Variable(s) entered on step 1: Age.
Interpreting Logistic Regression Output
Thus
$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
is estimated as
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-4.372 + 0.06696(age)}}{1 + e^{-4.372 + 0.06696(age)}}$
with the estimated logit
$\hat{g}(x) = -4.372 + 0.06696(age)$

These equations may then be used to estimate the probability that the
disease is present in a particular patient, given the patient's age. For
example, for a 50-year-old patient, we have
$\hat{g}(x) = -4.372 + 0.06696(50) = -1.024$
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-1.024}}{1 + e^{-1.024}} = 0.26$

Thus, the estimated probability that a 50-year-old patient has the
disease is 26%, and the estimated probability that the disease is not
present is 100% - 26% = 74%.
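The calculation for the 50-year-old patient can be reproduced with a short Python sketch. The coefficients are the SPSS estimates quoted in the slides; the function names are illustrative.

```python
import math

# Maximum likelihood estimates reported in the SPSS output.
B0 = -4.372
B1 = 0.06696

def estimated_logit(age):
    """Estimated logit g-hat(x) = b0 + b1 * age."""
    return B0 + B1 * age

def estimated_probability(age):
    """Estimated probability pi-hat(x) that the disease is present."""
    g = estimated_logit(age)
    return math.exp(g) / (1.0 + math.exp(g))

p50 = estimated_probability(50)
print(f"g-hat(50) = {estimated_logit(50):.3f}")
print(f"P(disease | age 50)    = {p50:.2f}")
print(f"P(no disease | age 50) = {1.0 - p50:.2f}")
```

The same function gives about 0.61 for a 72-year-old patient, the figure used later in the slides when odds are introduced.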
Inference: Are Predictors Significant? (contd)
Denote π(x_i) from fitted model to be π̂_i, where
deviance for logistic regression becomes:
$\text{Deviance } D = -2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{\hat{\pi}_i}{y_i}\right) + (1 - y_i) \ln\left(\frac{1 - \hat{\pi}_i}{1 - y_i}\right) \right]$

Deviance represents remaining error in model, after
predictors accounted for
Analogous to SSE in linear regression
G-statistic for particular predictor derived by subtracting
deviance of model with predictor from deviance of
model without predictor:
$G = \text{deviance(model without predictor)} - \text{deviance(model with predictor)} = -2 \ln\left[\frac{\text{likelihood without predictor}}{\text{likelihood with predictor}}\right]$

Let $n_1 = \sum y_i$ and $n_0 = \sum (1 - y_i)$
Then, for single predictor case G-statistic becomes:
$G = 2\left\{ \sum_{i=1}^{n} \left[ y_i \ln(\hat{\pi}_i) + (1 - y_i) \ln(1 - \hat{\pi}_i) \right] - \left[ n_1 \ln(n_1) + n_0 \ln(n_0) - n \ln(n) \right] \right\}$

Logistic regression example of disease on age produces
log likelihood = -10.101, resulting in G-statistic:
$G = 2\left\{ -10.101 - \left[ 7 \ln(7) + 13 \ln(13) - 20 \ln(20) \right] \right\} = 5.696$

Here, G-statistic follows chi-square distribution with 1
degree of freedom
Null hypothesis assumes β1 = 0
Thus, small p-value $P(\chi^2_1 > 5.696) = 0.017$ indicates age
useful for predicting presence of disease
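The G-statistic for the disease example can be recomputed from the quoted log likelihood and the counts n1 = 7 and n0 = 13. This is a minimal sketch; the function names are illustrative, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(g / 2)).

```python
import math

def g_statistic(log_lik_with_predictor, n1, n0):
    """G = 2 * (log likelihood with predictor - log likelihood without predictor)."""
    n = n1 + n0
    # Log likelihood of the intercept-only model, written as in the slides.
    log_lik_without = n1 * math.log(n1) + n0 * math.log(n0) - n * math.log(n)
    return 2.0 * (log_lik_with_predictor - log_lik_without)

def chi_square_upper_tail_1df(g):
    """P(chi-square with 1 df > g), valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))

G = g_statistic(-10.101, n1=7, n0=13)
p_value = chi_square_upper_tail_1df(G)
print(f"G = {G:.3f}, p-value = {p_value:.3f}")
```

For models with more than one predictor the degrees of freedom change and a general chi-square routine would be needed instead of the 1 df shortcut.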
Wald test is second hypothesis test for assessing
significance of predictor
Under H0 that β1 = 0, Z_Wald statistic follows standard
normal distribution:
$Z_{Wald} = \frac{b_1}{SE(b_1)}$

SPSS results show b1 = 0.06696 and SE(b1) = 0.03223,
leading to Z_Wald = 0.06696 / 0.03223 = 2.08
Thus, P(|z| > 2.08) = 0.038, which indicates age
significant, assuming alpha-level = 0.05
100(1 - alpha)% confidence intervals for coefficients b0
or b1 may be constructed
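A short sketch of the Wald test for the age coefficient, using Python's standard library. The confidence interval shown is the usual form b ± z·SE(b); the slides mention such intervals but do not spell out the formula, so that part is an assumption. The function name is illustrative.

```python
from statistics import NormalDist

def wald_test(b, se, alpha=0.05):
    """Return Z_Wald, two-sided p-value, and a 100(1 - alpha)% interval for b."""
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    z = b / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Assumed interval form: estimate plus or minus critical value times SE.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    interval = (b - z_crit * se, b + z_crit * se)
    return z, p_value, interval

z, p_value, (lower, upper) = wald_test(0.06696, 0.03223)
print(f"Z_Wald = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% interval for slope: ({lower:.4f}, {upper:.4f})")
```

An interval for the slope that excludes 0 agrees with rejecting the null hypothesis at the same alpha-level.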
Model
Slope coefficient β1 interpreted as change in value of
logit, for unit increase in predictor:
$\beta_1 = g(x + 1) - g(x)$

Coefficient β1 in simple logistic regression interpreted for
three different cases, where predictor is:
Dichotomous
Polychotomous, or
Continuous
Concept of odds introduced to facilitate interpretation
Odds defined as probability event occurs divided by
probability event does not occur
Model (contd)
Example: Estimated probability of 72-year-old having
disease 61%, and not having disease 39%
Thus, odds 72-year-old patient has disease:
$\text{odds} = \frac{0.61}{0.39} = 1.56$

Odds > 1, when event more likely to occur than not
Odds < 1, when event less likely to occur than not
Where odds = 1, event just as likely to occur as not
Probability ranges from 0 to 1, however odds ranges from
0 to infinity
Thus, odds indicates how much more likely event occurs,
versus not occurring
Model (contd)
With dichotomous predictor, odds
response occurred (y = 1) for
records (x = 1):
$\frac{\pi(1)}{1 - \pi(1)} = \frac{e^{\beta_0 + \beta_1} / (1 + e^{\beta_0 + \beta_1})}{1 / (1 + e^{\beta_0 + \beta_1})} = e^{\beta_0 + \beta_1}$
Similarly, odds response occurred
for records (x = 0):
$\frac{\pi(0)}{1 - \pi(0)} = \frac{e^{\beta_0} / (1 + e^{\beta_0})}{1 / (1 + e^{\beta_0})} = e^{\beta_0}$
Now, odds ratio defined as odds
response occurred with (x = 1) divided by odds response
occurred with (x = 0):
$\text{odds ratio} = OR = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$

Use of odds ratio widespread in research community
Model for Dichotomous Predictor
Voice Mail Plan variable
used to predict those
leaving company's
service (churn)
Cross tabulation of
churn by membership
in Voice Mail Plan shown

                         VMail = No   VMail = Yes   Total
                         x = 0        x = 1
Churn = False, y = 0     2008         842           2850
Churn = True,  y = 1     403          80            483
Total                    2411         922           3333

Odds of those in plan churning:
$\frac{\pi(1)}{1 - \pi(1)} = \frac{80}{842} = 0.0950$
(contd)
Odds of those not in plan churning:
$\frac{\pi(0)}{1 - \pi(0)} = \frac{403}{2008} = 0.2007$
Therefore, odds ratio indicates those participating in
voice mail plan are less likely to churn, compared to
those not in plan:
$\text{odds ratio} = OR = \frac{\pi(1) [1 - \pi(0)]}{\pi(0) [1 - \pi(1)]} = \frac{80(2008)}{403(842)} = 0.47$

SPSS logistic regression reports b0 = -1.60596 and
b1 = -0.747795, leading to estimated logit:
$\hat{g}(x) = -1.60596 - 0.747795(x)$
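With a single dichotomous predictor, the fitted coefficients can be recovered directly from the cross tabulation: b0 is the log odds for x = 0 and b1 is the log of the odds ratio. A minimal sketch using the Voice Mail Plan counts (variable names are illustrative):

```python
import math

# Counts from the cross tabulation of churn by Voice Mail Plan membership.
churn_in_plan, stay_in_plan = 80, 842        # x = 1
churn_no_plan, stay_no_plan = 403, 2008      # x = 0

odds_in_plan = churn_in_plan / stay_in_plan
odds_no_plan = churn_no_plan / stay_no_plan
odds_ratio = odds_in_plan / odds_no_plan

# For one 0/1 predictor, the logit coefficients follow from the odds.
b0 = math.log(odds_no_plan)
b1 = math.log(odds_ratio)

def churn_probability(x):
    """Estimated probability of churn for x = 1 (in plan) or x = 0 (not in plan)."""
    g = b0 + b1 * x
    return math.exp(g) / (1.0 + math.exp(g))

print(f"odds in plan = {odds_in_plan:.4f}, odds not in plan = {odds_no_plan:.4f}")
print(f"odds ratio = {odds_ratio:.2f}, b0 = {b0:.5f}, b1 = {b1:.6f}")
print(f"P(churn | plan) = {churn_probability(1):.4f}")
print(f"P(churn | no plan) = {churn_probability(0):.4f}")
```

The probabilities returned by `churn_probability` match the raw proportions 80/922 and 403/2411, because a model with one 0/1 predictor reproduces the two group rates exactly.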
(contd)
For customers in plan, their probability of churning is
8.68%:
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-2.3538}}{1 + e^{-2.3538}} = 0.0868$

Proportion of all customers in data set churning is about
14.5%, indicating that participation in plan protects
against churn
P(churn | Voice Mail Plan) = 80/922 = 0.0868, as
computed directly from data
For customers not in plan, their probability of churning is
16.72%:
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-1.60596}}{1 + e^{-1.60596}} = 0.16715$
Model for Polychotomous Predictor
From churn data set, suppose Customer Service Calls
categorized as CSC, where:
CSC = Low: zero or one calls
CSC = Medium: two or three calls
CSC = High: four or more calls
Logistic regression requires coding data set using
indicator variables, where CSC = Low is reference cell:

          CSC_Med   CSC_Hi
Low       0         0
Medium    1         0
High      0         1
(contd)

                 CSC = Low   CSC = Medium   CSC = High   Total
Churn = False    1664        1057           129          2850
Churn = True     214         131            138          483
Total            1878        1188           267          3333

Using CSC = Low as reference class, odds ratios calculated
as follows:
CSC = Medium: $\text{odds ratio} = \frac{131(1664)}{214(1057)} = 0.963687 \approx 0.96$
CSC = High: $\text{odds ratio} = \frac{138(1664)}{214(129)} = 8.31819 \approx 8.32$
Note, odds ratios are same as those reported by Minitab, where
b0 = -2.051, b1 = -0.0369891, and b2 = 2.11844
Thus, estimated probability
of churning:
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}}$, where $\hat{g}(x) = -2.051 - 0.0369891(CSC\_Med) + 2.11844(CSC\_Hi)$
(contd)
For customers with CSC = Low, estimated probability of
churning:
$\hat{g}(x) = -2.051 - 0.0369891(0) + 2.11844(0) = -2.051$
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-2.051}}{1 + e^{-2.051}} = 0.114$

Thus, estimated probability of churning = 11.4%, which
agrees with value calculated directly from data:
P(churn | CSC = Low) = 214/1878 = 0.114
Additionally, customers with CSC = Medium and CSC =
High are found to have estimated probability of churning
equal to 11.0% and 51.7%, respectively
Clearly, company needs to flag customers making 4 or
more calls, before they attrit
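The indicator-variable model for CSC can be checked against the cross tabulation in the same way as the dichotomous case. The sketch below derives the odds ratios, the implied coefficients, and the churn probability for each category from the counts; the dictionary layout and names are illustrative.

```python
import math

# (churn = True, churn = False) counts for each CSC category.
COUNTS = {"Low": (214, 1664), "Medium": (131, 1057), "High": (138, 129)}

def odds(category):
    """Odds of churning within one CSC category."""
    churned, stayed = COUNTS[category]
    return churned / stayed

# Odds ratios relative to the reference cell CSC = Low.
or_medium = odds("Medium") / odds("Low")
or_high = odds("High") / odds("Low")

# Coefficients implied by the reference-cell coding.
b0 = math.log(odds("Low"))
b1 = math.log(or_medium)     # coefficient of CSC_Med
b2 = math.log(or_high)       # coefficient of CSC_Hi

def churn_probability(csc_med, csc_hi):
    """Estimated probability of churn for given indicator values."""
    g = b0 + b1 * csc_med + b2 * csc_hi
    return math.exp(g) / (1.0 + math.exp(g))

p_low = churn_probability(0, 0)
p_medium = churn_probability(1, 0)
p_high = churn_probability(0, 1)
print(f"OR(Medium) = {or_medium:.6f}, OR(High) = {or_high:.5f}")
print(f"b0 = {b0:.3f}, b1 = {b1:.7f}, b2 = {b2:.5f}")
print(f"P(churn): Low {p_low:.3f}, Medium {p_medium:.3f}, High {p_high:.3f}")
```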
(contd)
Wald test for significance of CSC_Med, where SE(b1) =
0.117701:
$Z_{Wald} = \frac{-0.0369891}{0.117701} = -0.31426$

Here, p-value = 0.753, indicating no evidence CSC_Med
versus CSC_Low useful for predicting churn
However, for CSC_Hi, p-value ≈ 0.000, indicating
distinction between CSC_Hi versus CSC_Low helpful for
predicting churn
Natural log of odds ratio of
CSC_Hi to CSC_Low
derived:
$\ln[OR(High, Low)] = \hat{g}(High) - \hat{g}(Low)$
$= [b_0 + b_1(CSC\_Med = 0) + b_2(CSC\_Hi = 1)] - [b_0 + b_1(CSC\_Med = 0) + b_2(CSC\_Hi = 0)]$
$= b_2 = 2.11844$
Model for Continuous Predictor
Now, churn predicted using continuous predictor Day
Minutes
Churners have higher mean Day Minutes = 206.91,
compared to non-churners with mean Day Minutes =
175.18
Question: Is difference significant?
Two-sample t-test with p-value ~ 0.000 rejects Ho,
indicating difference between Day Minutes usage for
churners and non-churners
However, t-test gives no indication how increase in Day
Minutes affects odds customer will churn

(contd)
Minitab results for logistic regression of churn on Day Minutes:

Logistic Regression Table
                                                     Odds     95% CI
Predictor   Coef        SE Coef     Z        P       Ratio    Lower   Upper
Constant    -3.92929    0.202822    -19.37   0.000
Day Mins    0.0112717   0.0009750   11.56    0.000   1.01     1.01    1.01
Log-Likelihood = -1307.129
Test that all slopes are zero: G = 144.035, DF = 1, P-Value = 0.000

For customer with 100 day minutes, estimated
probability of churning = 5.7%:
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-3.92929 + 0.0112717(100)}}{1 + e^{-3.92929 + 0.0112717(100)}} = 0.0572$
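The estimated probability for any Day Minutes value follows from the Minitab coefficients. A minimal sketch (the function name is illustrative), evaluated at 100 and 300 day minutes:

```python
import math

# Coefficients from the Minitab logistic regression of churn on Day Minutes.
B0 = -3.92929
B1 = 0.0112717

def churn_probability(day_minutes):
    """Estimated probability of churn for a given number of day minutes."""
    g = B0 + B1 * day_minutes
    return math.exp(g) / (1.0 + math.exp(g))

p100 = churn_probability(100)
p300 = churn_probability(300)
print(f"P(churn | 100 day minutes) = {p100:.4f}")
print(f"P(churn | 300 day minutes) = {p300:.4f}")
```

Tripling usage from 100 to 300 minutes raises the estimated probability from under 6% to over 36%, which shows how the effect of a continuous predictor on probability depends on where along the curve it is evaluated.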
(contd)
Derived similarly, customer with 300 day minutes has
estimated probability of churning = 36.64%
Deviance test for example has G = 144.035, where
$P(\chi^2_1 > 144.035) \approx 0.000$, concluding strong evidence exists Day
Minutes useful for predicting churn
Wald test for significance of Day Minutes has associated
p-value ≈ 0.000, indicating strong evidence predictor
useful:
$Z_{Wald} = \frac{0.0112717}{0.0009750} = 11.56$

Note, coefficient for Day Minutes equal to natural log of
its odds ratio:
$b_{day\ mins} = 0.0112717 = \ln(1.011335) \approx \ln(1.01)$
(contd)
Coefficient for Day Minutes b1 also derived as follows:
$\ln[OR(\text{day mins})] = \hat{g}(x + 1) - \hat{g}(x) = [b_0 + b_1(x + 1)] - [b_0 + b_1(x)] = b_1 = 0.0112717$

b1 represents estimated change in log odds, for unit
increase in predictor
For example, for every additional day minute a customer
uses, log odds of churning increases by 0.0112717
Odds ratio 1.01 reported by Minitab interpreted as odds
of customer with x + 1 minutes churning, compared to
odds of customer with x minutes churning
(contd)
Analyst may prefer to interpret odds ratio using different
scale, such as 10 or 100 minutes
For constant c, c·b1 represents estimated change in log
odds, for c-unit increase in predictor, derived as:
$\hat{g}(x + c) - \hat{g}(x) = [b_0 + b_1(x + c)] - [b_0 + b_1(x)] = c \, b_1$

For example, with c = 60, estimated change in log odds
c·b1 = 60(0.0112717) = 0.676302
Estimated odds ratio for Customer A with 60 more Day
Minutes churning, compared to Customer B = $e^{0.676302}$ =
1.97, which nearly doubles odds of churning
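The odds ratio for a c-unit increase can be checked numerically. The sketch below uses the Day Minutes coefficients from the Minitab output and confirms that the ratio of odds at x + c and x does not depend on x; the function names and the choice of 100 minutes as a starting point are illustrative.

```python
import math

B0 = -3.92929
B1 = 0.0112717

def odds(day_minutes):
    """Estimated odds of churning, exp of the estimated logit."""
    return math.exp(B0 + B1 * day_minutes)

def odds_ratio_for_increase(c):
    """Odds ratio associated with a c-unit increase in the predictor."""
    return math.exp(c * B1)

or_1 = odds_ratio_for_increase(1)
or_60 = odds_ratio_for_increase(60)
print(f"odds ratio per 1 minute   = {or_1:.6f}")
print(f"odds ratio per 60 minutes = {or_60:.2f}")
# The same ratio is obtained from the odds themselves, whatever the starting point.
print(f"odds(160) / odds(100)     = {odds(160) / odds(100):.2f}")
```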
Multiple Logistic Regression
We examine whether a relationship exists between churn and
the following set of predictors.
1. International plan, a flag variable
2. Voice mail Plan, a flag variable
3. CSC-Hi, a flag variable indicating whether or not a customer
had a high (>=4) level of customer service calls.
4. Account length, continuous
5. Day minutes, continuous
6. Evening minutes, continuous
7. Night minutes, continuous
8. International minutes, continuous

Multiple Logistic Regression
Five continuous and three flag variables chosen from
churn data set to predict binary response and modeled
with Minitab:
                                                            Odds     95% CI
Predictor        Coef        SE Coef     Z        P       Ratio    Lower   Upper
Constant         -8.15980    0.536092    -15.22   0.000
Account Length   0.0008006   0.0014408   0.56     0.578   1.00     1.00    1.00
Day Mins         0.0134755   0.0011192   12.04    0.000   1.01     1.01    1.02
Eve Mins         0.0073029   0.0011695   6.24     0.000   1.01     1.01    1.01
Night Mins       0.0042378   0.0011474   3.69     0.000   1.00     1.00    1.01
Intl Mins        0.0853508   0.0210217   4.06     0.000   1.09     1.05    1.13
Int_l Plan yes   2.03287     0.146894    13.84    0.000   7.64     5.73    10.18
VMail Plan yes   -1.04435    0.150087    -6.96    0.000   0.35     0.26    0.47
CSC-Hi 1         2.67683     0.159224    16.81    0.000   14.54    10.64   19.86
Log-Likelihood = -1036.038
Test that all slopes are zero: G = 686.218, DF = 8, P-Value = 0.000
Multiple Logistic Regression (contd)
Overall regression significant according to G-statistic's
p-value, indicating model useful for predicting churn
However, p-value associated with Wald statistic for
Account Length non-significant
Multiple logistic regression model re-run (not shown)
after removing Account Length, where remaining seven
predictors still significant
Interpreting each predictor's coefficient:
POSITIVE: increase in predictor associated with increase in
probability of churning
NEGATIVE: increase in predictor associated with decrease in
probability of churning
Only membership in Voice Mail Plan reduces probability
of churning
Estimated logit shown:
$\hat{g}(x) = -8.07374 + 0.0134735(DayMins) + 0.0072939(EveMins) + 0.0042223(NightMins) + 0.0853509(IntlMins) + 2.03548(Int\_l\ Plan = Yes) - 1.04356(VMail\ Plan = Yes) + 2.67697(CSC\text{-}Hi = 1)$

where Int_l Plan = Yes, VMail Plan = Yes, and CSC-Hi = 1
represent indicator variables
Example: estimate probability of churning for customer
belonging to no plans with few calls to customer
service (CSC-Hi = 0). Also has 180 day minutes, 200 evening
minutes, 200 night minutes, and 10 international minutes usage.
$\hat{g}(x) = -8.07374 + 0.0134735(180) + 0.0072939(200) + 0.0042223(200) + 0.0853509(10) + 2.03548(0) - 1.04356(0) + 2.67697(0) = -2.491761$
Therefore, probability customer will churn about 7.6%:
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{-2.491761}}{1 + e^{-2.491761}} = 0.076435$

Example: estimate probability of churning for customer
belonging to International Plan but not Voice Mail Plan
with many calls to customer service. High usage includes
300 day, evening, and night minutes, and 20 international minutes.
$\hat{g}(x) = -8.07374 + 0.0134735(300) + 0.0072939(300) + 0.0042223(300) + 0.0853509(20) + 2.03548(1) - 1.04356(0) + 2.67697(1) = 5.842638$
$\hat{\pi}(x) = \frac{e^{\hat{g}(x)}}{1 + e^{\hat{g}(x)}} = \frac{e^{5.842638}}{1 + e^{5.842638}} = 0.997107$
Estimated probability these customers churn = 99.7%!
Clearly company should intervene to prevent these
customers from switching to other carriers
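Both worked examples can be reproduced by evaluating the estimated logit for a customer profile. This is a minimal sketch using the seven coefficients of the re-run model; the dictionary keys and the function names are illustrative and do not come from Minitab.

```python
import math

INTERCEPT = -8.07374
COEF = {
    "day_mins": 0.0134735,
    "eve_mins": 0.0072939,
    "night_mins": 0.0042223,
    "intl_mins": 0.0853509,
    "intl_plan": 2.03548,     # 1 if customer belongs to International Plan
    "vmail_plan": -1.04356,   # 1 if customer belongs to Voice Mail Plan
    "csc_hi": 2.67697,        # 1 if four or more customer service calls
}

def estimated_logit(customer):
    """Intercept plus the sum of coefficient times predictor value."""
    return INTERCEPT + sum(COEF[name] * customer[name] for name in COEF)

def churn_probability(customer):
    """Estimated probability of churn for one customer profile."""
    g = estimated_logit(customer)
    return math.exp(g) / (1.0 + math.exp(g))

low_risk = {"day_mins": 180, "eve_mins": 200, "night_mins": 200, "intl_mins": 10,
            "intl_plan": 0, "vmail_plan": 0, "csc_hi": 0}
high_risk = {"day_mins": 300, "eve_mins": 300, "night_mins": 300, "intl_mins": 20,
             "intl_plan": 1, "vmail_plan": 0, "csc_hi": 1}

for label, customer in (("low usage", low_risk), ("high usage", high_risk)):
    print(f"{label}: g-hat = {estimated_logit(customer):.6f}, "
          f"P(churn) = {churn_probability(customer):.6f}")
```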
