Table of contents
1. Introduction
1.1 Dichotomous dependent variables
1.2 Problems with OLS
2. Odds versus probability
3. The Logit model
3.1 Basic elements
3.2 Maximum likelihood estimation
3.3 PROC LOGISTIC
3.3.1 SAS codes and basic outputs
3.3.2 Wald test for individual significance
3.3.3 Likelihood-ratio, LM and Wald tests for overall significance
3.3.4 Odds ratio estimates
3.3.5 AIC, SC and Generalised R²
3.3.6 Association of predicted probabilities and observed responses
3.3.7 Hosmer-Lemeshow test statistic
4. Class exercises
Introduction

Motivation for Logit model:

- Dichotomous dependent variables;
- Problems with Ordinary Least Squares (OLS) in the face of dichotomous dependent variables;
- Alternative estimation techniques
Dichotomous dependent variables

Often variables in social sciences are dichotomous:

- employed vs. unemployed
- married vs. unmarried
- guilty vs. innocent
- voted vs. didn't vote
Dichotomous dependent variables

- Social scientists frequently wish to estimate regression models with a dichotomous dependent variable;
- Most researchers are aware that something is wrong with OLS in the face of a dichotomous dependent variable, but they do not know what makes dichotomous variables problematic in regression, or what other methods are superior
Dichotomous dependent variables

- The focus of this chapter is on binary Logit models (or logistic regression models) for dichotomous dependent variables;
- Logits have many similarities to OLS, but there are also fundamental differences
Problems with OLS

Examine why OLS regression runs into problems when the dependent variable is 0/1.

Example

- Dataset: penalty.txt
- Comprises 147 penalty cases in the state of New Jersey;
- In all cases the defendant was convicted of first-degree murder with a recommendation by the prosecutor that a death sentence be imposed;
- A penalty trial is conducted to determine whether the defendant should receive a death penalty or life imprisonment;
Problems with OLS

The dataset comprises the following variables:

- DEATH: 1 for a death sentence, 0 for a life sentence
- BLACKD: 1 if the defendant was black, 0 otherwise
- WHITVIC: 1 if the victim was white, 0 otherwise
- SERIOUS: an average rating of the seriousness of the crime evaluated by a panel of judges, ranging from 1 (least serious) to 15 (most serious)

The goal is to regress DEATH on BLACKD, WHITVIC and SERIOUS.
Problems with OLS

- Note that DEATH, which has only two outcomes, follows a Bernoulli(p) distribution with p being the probability of a death sentence. Let Y = DEATH; then
  $\Pr(Y = y) = p^{y}(1-p)^{1-y}, \quad y = 0, 1$
- Recall that Bernoulli trials lead to the Binomial distribution: if we repeat the Bernoulli(p) trial n times and count the number of successes W, the distribution of W follows a Binomial B(n, p) distribution, i.e.,
  $\Pr(W = w) = \binom{n}{w} p^{w}(1-p)^{n-w}, \quad 0 \le w \le n$
- So the Bernoulli distribution is a special case of the Binomial distribution when n = 1.
Problems with OLS
data penalty;
infile 'd:\teaching\ms4225\penalty.txt';
input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC REG;
MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;
Problems with OLS

The REG Procedure
Model: MODEL1
Dependent Variable: DEATH

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 3 2.61611 0.87204 4.11 0.0079
Error 143 30.37709 0.21243
Corrected Total 146 32.99320


Root MSE 0.46090 R-Square 0.0793
Dependent Mean 0.34014 Adj R-Sq 0.0600
Coeff Var 135.50409


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -0.05492 0.12499 -0.44 0.6610
BLACKD 1 0.12197 0.08224 1.48 0.1403
WHITVIC 1 0.05331 0.08411 0.63 0.5272
SERIOUS 1 0.03840 0.01200 3.20 0.0017
Problems with OLS

- The coefficient of SERIOUS is positive and very significant;
- Neither of the two racial variables is significantly different from zero;
- $R^2$ is low;
- The F-test indicates overall significance of the model;
- But... can we trust these results?
Problems with OLS

- Note that if y is a 0/1 variable, then
  $E(y_i) = 1 \cdot \Pr(y_i = 1) + 0 \cdot \Pr(y_i = 0) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i.$
- But based on linear regression, $y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$. Hence
  $E(y_i) = E(\beta_1 + \beta_2 X_i + \varepsilon_i) = \beta_1 + \beta_2 X_i + E(\varepsilon_i) = \beta_1 + \beta_2 X_i.$
- Therefore, $p_i = \beta_1 + \beta_2 X_i$. This is commonly referred to as the linear probability model (LPM).
Problems with OLS

- Accordingly, from the SAS results, a one-point increase in the SERIOUS scale is associated with a 0.038 increase in the probability of a death sentence; the probability of a death sentence for blacks is 0.12 higher than for non-blacks, ceteris paribus. But do these results make sense?
- The LPM $p_i = \beta_1 + \beta_2 X_i$ is actually implausible because $p_i$ is postulated to be a linear function of $X_i$ and thus has no upper and lower bounds. Accordingly, $p_i$ (which is a probability) can be greater than 1 or smaller than 0!
Odds versus probability

- Odds of an event: the ratio of the expected number of times that an event will occur to the expected number of times it will not occur;
- For example, an odds of 4 means we expect 4 times as many occurrences as non-occurrences; an odds of 5/2 (or 5 to 2) means we expect 5 occurrences to every 2 non-occurrences;
- Let p be the probability of an event occurring and o the corresponding odds; then
  $o = p/(1 - p)$ or $p = o/(1 + o)$
Odds versus probability

Relationship between probability and odds:

    Probability   Odds
        0.1       0.11
        0.2       0.25
        0.3       0.43
        0.4       0.67
        0.5       1.00
        0.6       1.50
        0.7       2.33
        0.8       4.00
        0.9       9.00

- $o < 1 \Leftrightarrow p < 0.5$ and $o > 1 \Leftrightarrow p > 0.5$;
- $0 \le o < \infty$ although $0 \le p \le 1$
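The table can be reproduced with a short DATA step (a minimal sketch; the data set name odds_table is arbitrary):

data odds_table;
  do p = 0.1 to 0.9 by 0.1;
    o = p / (1 - p);   /* odds corresponding to probability p */
    output;
  end;
run;
proc print data=odds_table; run;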
Odds versus probability

Death sentence by race of defendant for 147 penalty trials:

           blacks   non-blacks   total
  death      28         22         50
  life       45         52         97
  total      73         74        147

- $o_D = 50/97 = 0.52$; $o_{D|B} = 28/45 = 0.62$; and $o_{D|NB} = 22/52 = 0.42$;
- Hence the ratio of blacks' odds of death to non-blacks' odds of death is 0.62/0.42 = 1.476;
- This means the odds of a death sentence for blacks are 47.6% higher than for non-blacks, or the odds of a death sentence for non-blacks are about 0.68 times the corresponding odds for blacks
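The same table and odds ratio can be obtained from the data (a sketch, assuming the PENALTY data set read in earlier; the RELRISK option prints the case-control odds ratio):

proc freq data=PENALTY;
  tables BLACKD*DEATH / relrisk;   /* 2x2 table with odds-ratio estimate */
run;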
Logit model: basic elements

- The Logit model is based on the following cumulative distribution function of the logistic distribution:
  $p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}}$
- Let $Z_i = \beta_1 + \beta_2 X_i$; then
  $p_i = \frac{1}{1 + e^{-Z_i}} = F(\beta_1 + \beta_2 X_i) = F(Z_i)$
- As $Z_i$ ranges from $-\infty$ to $\infty$, $p_i$ ranges between 0 and 1;
- $p_i$ is non-linearly related to $Z_i$.
Logit model: basic elements

[Figure: graph of the Logit with $\beta_1 = 0$ and $\beta_2 = 1$: an S-shaped curve of $p_i$ against $Z_i$ over the range $-4$ to $4$, rising from near 0 to near 1 and passing through $p_i = 0.5$ at $Z_i = 0$.]
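The plotted values are easy to generate (a minimal sketch; PROC SGPLOT is one of several ways to draw the curve):

data logit_curve;
  do z = -4 to 4 by 0.1;
    p = 1 / (1 + exp(-z));   /* F(Z) with beta1 = 0, beta2 = 1 */
    output;
  end;
run;
proc sgplot data=logit_curve;
  series x=z y=p;
run;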
Logit model: basic elements

- Note that $e^{Z_i} = p_i/(1 - p_i)$, the odds of an event;
- So, $\ln(p_i/(1 - p_i)) = Z_i = \beta_1 + \beta_2 X_i$; in other words, the log of the odds is linear in $X_i$, although $p_i$ and $X_i$ have a non-linear relationship. This is different from the LPM.
Logit model: basic elements

- For a linear model $y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$,
  $\frac{\partial y_i}{\partial X_i} = \beta_2$, a constant;
- But for a Logit model, $p_i = F(\beta_1 + \beta_2 X_i)$, so
  $\frac{\partial p_i}{\partial X_i} = \frac{\partial F(\beta_1 + \beta_2 X_i)}{\partial X_i} = F'(\beta_1 + \beta_2 X_i)\,\beta_2 = f(\beta_1 + \beta_2 X_i)\,\beta_2,$
  where $f(\cdot)$ is the probability density function of the logistic distribution.
- As $f(\beta_1 + \beta_2 X_i)$ is always positive, the sign of $\beta_2$ indicates the direction of the relationship between $p_i$ and $X_i$.
Logit model: basic elements

- Note that for the Logit model
  $f(\beta_1 + \beta_2 X_i) = \frac{e^{-Z_i}}{(1 + e^{-Z_i})^2} = F(\beta_1 + \beta_2 X_i)\,\bigl(1 - F(\beta_1 + \beta_2 X_i)\bigr) = p_i (1 - p_i)$
- Therefore, $\frac{\partial p_i}{\partial X_i} = \beta_2\, p_i (1 - p_i)$. In other words, a 1-unit change in $X_i$ does not produce a constant effect on $p_i$.
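To make the non-constant effect concrete, here is a small sketch evaluating $\beta_2\, p(1-p)$ at several values of p, using the SERIOUS estimate 0.1871 from the PROC LOGISTIC output later in the chapter:

data margeff;
  beta2 = 0.1871;                 /* ML estimate for SERIOUS (see output below) */
  do p = 0.1, 0.3, 0.5, 0.7, 0.9;
    dpdx = beta2 * p * (1 - p);   /* marginal effect on p at probability p */
    output;
  end;
run;
proc print data=margeff; run;

The effect peaks at p = 0.5 and shrinks towards zero as p approaches 0 or 1.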
Maximum Likelihood estimation

- Note that $y_i$ only takes on values of 0 and 1, so the observed log-odds $\ln(y_i/(1 - y_i))$ is undefined and OLS is not an appropriate method of estimation. Maximum likelihood (ML) estimation is usually the technique to adopt;
- ML principle: choose as estimates the parameter values which would maximise the probability of what we have already observed;
- Steps of ML estimation: First, construct the likelihood function by expressing the probability of observing the data as a function of the unknown parameters. Second, find the values of the unknown parameters that make the value of this expression as large as possible.
Maximum Likelihood estimation

- The likelihood function is given by
  $L = \Pr(y_1, y_2, \ldots, y_n) = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_n)$, assuming independent sampling,
  $= \prod_{i=1}^{n} \Pr(y_i)$
- But by definition, $\Pr(y_i = 1) = p_i$ and $\Pr(y_i = 0) = 1 - p_i$. Therefore, $\Pr(y_i) = p_i^{y_i}(1 - p_i)^{1 - y_i}$
Maximum Likelihood estimation

- So,
  $L = \prod_{i=1}^{n} \Pr(y_i) = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i} = \prod_{i=1}^{n} \left(\frac{p_i}{1 - p_i}\right)^{y_i}(1 - p_i)$
- It is usually easier to maximise the log of L than L itself. Taking the log of both sides yields
  $\ln L = \sum_{i=1}^{n}\left[ y_i \ln\!\left(\frac{p_i}{1 - p_i}\right) + \ln(1 - p_i)\right] = \sum_{i=1}^{n} y_i \ln\!\left(\frac{p_i}{1 - p_i}\right) + \sum_{i=1}^{n} \ln(1 - p_i)$
Maximum Likelihood estimation

- Substituting $p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}}$ in $\ln L$ leads to
  $\ln L = \beta_1 \sum_{i=1}^{n} y_i + \beta_2 \sum_{i=1}^{n} X_i y_i - \sum_{i=1}^{n} \ln(1 + e^{\beta_1 + \beta_2 X_i})$
- There are no closed-form solutions for $\beta_1$ and $\beta_2$ when maximising $\ln L$;
- Numerical optimisation is required. SAS uses Fisher's scoring, which is similar in principle to the Newton-Raphson algorithm.
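This log-likelihood can also be maximised directly in SAS; a minimal sketch with PROC NLMIXED (using SERIOUS as the single regressor and the GENERAL log-likelihood specification):

proc nlmixed data=PENALTY;
  parms b1=0 b2=0;                      /* start at zero, as PROC LOGISTIC does */
  eta = b1 + b2*SERIOUS;                /* Z_i = beta1 + beta2*X_i */
  ll  = DEATH*eta - log(1 + exp(eta));  /* i-th term of lnL derived above */
  model DEATH ~ general(ll);            /* maximise the summed log-likelihood */
run;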
Maximum Likelihood estimation

- Suppose $\theta$ is a univariate unknown parameter to be estimated. The Newton-Raphson algorithm derives estimates based on the formula
  $\theta_{new} = \theta_{old} - H^{-1}(\theta_{old})\,U(\theta_{old}),$
  where $H(\cdot)$ and $U(\cdot)$ are the second and first derivatives of the objective function with respect to $\theta$. The algorithm stops when the estimates from successive iterations converge;
- Consider a simple example, where $g(\theta) = -\theta^3 + 3\theta^2 - 5$. So, $U(\theta) = 3\theta(2 - \theta)$ and $H(\theta) = 6(1 - \theta)$;
- The actual maximum and minimum of $g(\theta)$ are located at $\theta = 2$ and $\theta = 0$ respectively;
Maximum Likelihood estimation

- Step 1: Choose an arbitrary initial starting value, say, $\theta_{initial} = 1.5$. So, $U(1.5) = 2.25$ and $H(1.5) = -3$. The new estimate of $\theta$ is therefore $\theta_{new} = 1.5 - 2.25/(-3) = 2.25$;
- Step 2: $\theta_{old} = 2.25$. So, $U(2.25) = -1.6875$ and $H(2.25) = -7.5$. The new estimate of $\theta$ is $\theta_{new} = 2.25 - (-1.6875)/(-7.5) = 2.025$;
- Continue with Steps 3, 4 and so on until convergence;
- Caution: Suppose we start with $\theta_{initial} = 0.5$. If the process is left unchecked, the algorithm will converge to the minimum located at $\theta = 0$!
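A short sketch of these iterations as a DATA step (the data set name newton is arbitrary):

data newton;
  theta = 1.5;                 /* initial value; try 0.5 to see convergence to the minimum */
  do iter = 1 to 10;
    U = 3*theta*(2 - theta);   /* first derivative of g */
    H = 6*(1 - theta);         /* second derivative of g */
    theta = theta - U/H;       /* Newton-Raphson update */
    output;
  end;
run;
proc print data=newton; run;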
Maximum Likelihood estimation

- The only difference between Fisher's scoring and the Newton-Raphson algorithm is that Fisher's scoring uses $E(H(\cdot))$ instead of $H(\cdot)$;
- Our current situation is more complicated in that the unknowns are multivariate. However, the optimisation principle remains the same;
- In practice, we need a set of initial values. PROC LOGISTIC in SAS starts with all coefficients equal to zero.
PROC LOGISTIC: basic elements
data PENALTY;
infile 'd:\teaching\ms4225\penalty.txt';
input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC LOGISTIC DATA=PENALTY DESCENDING; /* DESCENDING makes SAS model Pr(DEATH=1) */
MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;
PROC LOGISTIC: basic elements
The LOGISTIC Procedure

Model Information

Data Set WORK.PENALTY
Response Variable DEATH
Number of Response Levels 2
Number of Observations 147
Model binary logit
Optimization Technique Fisher's scoring

Response Profile

Ordered Total
Value DEATH Frequency

1 1 50
2 0 97

Probability modeled is DEATH=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

PROC LOGISTIC: basic elements
Model Fit Statistics

Intercept
Intercept and
Criterion Only Covariates

AIC 190.491 184.285
SC 193.481 196.247
-2 Log L 188.491 176.285

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 12.2060 3 0.0067
Score 11.6560 3 0.0087
Wald 10.8211 3 0.0127

PROC LOGISTIC: basic elements


The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -2.6516 0.6748 15.4424 <.0001
BLACKD 1 0.5952 0.3939 2.2827 0.1308
WHITVIC 1 0.2565 0.4002 0.4107 0.5216
SERIOUS 1 0.1871 0.0612 9.3342 0.0022

Odds Ratio Estimates

Point 95% Wald
Effect Estimate Confidence Limits

BLACKD 1.813 0.838 3.925
WHITVIC 1.292 0.590 2.832
SERIOUS 1.206 1.069 1.359

Association of Predicted Probabilities and Observed Responses

Percent Concordant 67.2 Somers' D 0.349
Percent Discordant 32.3 Gamma 0.351
Percent Tied 0.5 Tau-a 0.158
Pairs 4850 c 0.675


Wald test for individual significance

- Test of significance of individual coefficients:
  $H_0: \beta_j = 0$ vs. $H_1:$ otherwise
  Instead of reporting the t-stats, PROC LOGISTIC reports the Wald $\chi^2$-stats for the significance of individual coefficients. The reason is that the t-stat is not t-distributed in a Logit model; instead, it has an asymptotic N(0, 1) distribution under the null of $H_0: \beta_j = 0$. The square of a N(0, 1) variable is a $\chi^2$ variable with 1 df. The Wald $\chi^2$-stat is just the square of the usual t-stat.
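For instance, squaring the ratio of estimate to standard error for SERIOUS from the output above reproduces its Wald statistic up to rounding (a quick sketch):

data _null_;
  wald = (0.1871/0.0612)**2;     /* about 9.35; the output prints 9.3342 */
  pval = 1 - probchi(wald, 1);   /* p-value from a chi-square with 1 df */
  put wald= pval=;
run;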
1. Introduction
2. Odds versus probability
3. The Logit model
4. Class exercises
3.1 Basic elements
3.2 Maximum likelihood estimation
3.3 PROC LOGISTIC
Likelihood-ratio, LM and Wald tests for overall signicance

- Test of overall model significance:
  $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (all k slope coefficients zero) vs. $H_1:$ otherwise

  1. Likelihood-ratio test:
     $LR = 2\,[\ln L(\hat{\beta}^{(UR)}) - \ln L(\hat{\beta}^{(R)})] \sim \chi^2_k$
  2. Score (Lagrange-multiplier) (LM) test:
     $LM = [U(\hat{\beta}^{(R)})]'\,[-H(\hat{\beta}^{(R)})]^{-1}\,[U(\hat{\beta}^{(R)})] \sim \chi^2_k$
  3. Wald test:
     $W = [\hat{\beta}^{(UR)}]'\,[-H(\hat{\beta}^{(UR)})]\,[\hat{\beta}^{(UR)}] \sim \chi^2_k$
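As a check, the LR statistic in the output equals the drop in -2 Log L between the intercept-only and full models (a quick sketch using the printed fit statistics):

data _null_;
  lr = 188.491 - 176.285;      /* = 12.206, matching the Likelihood Ratio row */
  pval = 1 - probchi(lr, 3);   /* 3 restrictions: BLACKD, WHITVIC, SERIOUS */
  put lr= pval=;
run;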
Odds ratio estimates

- The odds ratio estimates are obtained by exponentiating the corresponding estimates, i.e., $e^{\hat{\beta}_j}$;
- The (predicted) odds ratio of 1.813 indicates that the odds of a death sentence for black defendants are 81% higher than the odds for other defendants;
- Similarly, the (predicted) odds of death are about 29% higher when the victim is white, notwithstanding the coefficient being insignificant;
- A 1-unit increase in the SERIOUS scale is associated with a 21% increase in the predicted odds of a death sentence
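The printed odds ratios follow directly from the estimates (a quick sketch):

data _null_;
  or_blackd  = exp(0.5952);   /* 1.813 */
  or_whitvic = exp(0.2565);   /* 1.292 */
  or_serious = exp(0.1871);   /* 1.206 */
  put or_blackd= or_whitvic= or_serious=;
run;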
AIC, SC and Generalised R²

Model selection criteria:

1. Akaike's Information Criterion (AIC): $AIC = -2[\ln L - (k + 1)]$
2. Schwarz Bayesian Criterion (SBC or SC): $SC = -2\ln L + (k + 1)\ln(n)$
3. Generalised $R^2 = 1 - e^{-LR/n}$, analogous to the conventional $R^2$ used in linear regression
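These reproduce the Model Fit Statistics shown earlier (a minimal sketch; k = 3 covariates, n = 147):

data _null_;
  m2logl = 176.285;  k = 3;  n = 147;  lr = 12.206;
  aic = m2logl + 2*(k + 1);        /* 184.285 */
  sc  = m2logl + (k + 1)*log(n);   /* 196.247 */
  genr2 = 1 - exp(-lr/n);          /* about 0.08 */
  put aic= sc= genr2=;
run;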
Association of predicted probabilities and observed responses

- For the 147 observations in the sample, there are $\binom{147}{2} = 10731$ ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1s or both 0s on y. These we ignore, leaving 4850 pairs for which one case has a 1 and the other case has a 0;
- For each of these pairs, we ask the following question: based on the estimated model, does the case with a 1 have a higher predicted probability of attaining 1 than the case with a 0?
- If yes, we call the pair concordant; if no, we call the pair discordant; if the two cases have the same predicted values, we call it a tie;
- Obviously, the more concordant pairs, the better the fit of the model.
Association of predicted probabilities and observed responses

- Let C = number of concordant pairs, D = number of discordant pairs, T = number of ties, and N = total number of pairs before eliminating any;
- $\text{Tau-a} = \frac{C - D}{N}$, $\text{Somers' D (SD)} = \frac{C - D}{C + D + T}$, $\text{Gamma} = \frac{C - D}{C + D}$, and $\text{c-stat} = 0.5\,(1 + SD)$
- All 4 measures vary between 0 and 1, with large values corresponding to stronger associations between the predicted and observed values;
- Rules of thumb for minimally acceptable levels of Tau-a, SD, Gamma and c-stat are 0.1, 0.3, 0.3 and 0.65 respectively.
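The printed measures can be recovered from the percentages in the output (a sketch; C, D and T are backed out of the 4850 informative pairs, so the results match only up to rounding):

data _null_;
  pairs = 4850;  n = 10731;
  c = 0.672*pairs;  d = 0.323*pairs;  t = 0.005*pairs;
  somers_d = (c - d)/(c + d + t);   /* 0.349 */
  gamma    = (c - d)/(c + d);       /* 0.351 */
  tau_a    = (c - d)/n;             /* 0.158 */
  c_stat   = 0.5*(1 + somers_d);    /* 0.675 */
  put somers_d= gamma= tau_a= c_stat=;
run;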
Hosmer-Lemeshow goodness of fit test

- The Hosmer-Lemeshow (HL) test is a goodness-of-fit test which may be invoked by adding the LACKFIT option to the MODEL statement under PROC LOGISTIC;
- The HL statistic is calculated as follows. Based on the estimated model, predicted probabilities are generated for all observations. These are sorted by size, then grouped into approximately 10 intervals. Within each interval, the expected frequency is obtained by adding up the predicted probabilities. Expected frequencies are compared with the observed frequencies by the conventional Pearson $\chi^2$ statistic. The df is the number of intervals minus 2;
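A sketch of the invocation (LACKFIT is a standard option on the MODEL statement):

proc logistic data=PENALTY descending;
  model DEATH = BLACKD WHITVIC SERIOUS / lackfit;   /* requests the HL test */
run;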
Hosmer-Lemeshow goodness of fit test

- $HL = \sum_{j=1}^{2G} \frac{(O_j - E_j)^2}{E_j} \sim \chi^2_{G-2}$, where G is the number of intervals, and O and E are the observed and expected frequencies respectively. The LACKFIT output is as follows:
Partition for the Hosmer and Lemeshow Test

DEATH = 1 DEATH = 0
Group Total Observed Expected Observed Expected

1 15 3 2.04 12 12.96
2 15 2 2.78 13 12.22
3 15 3 3.49 12 11.51
4 15 4 4.10 11 10.90
5 15 6 4.89 9 10.11
6 15 6 5.42 9 9.58
7 15 4 5.97 11 9.03
8 15 6 6.77 9 8.23
9 15 7 7.50 8 7.50
10 12 9 7.05 3 4.95

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square DF Pr > ChiSq

3.9713 8 0.8597
With a p-value of 0.8597, the test provides no evidence of lack of fit.
Class exercises
1. Tutorial 1
2. Table 12.4 of Ramanathan (1995), Introductory Econometrics, presents information on the acceptance or rejection to medical school for a sample of 60 applicants, along with a number of their characteristics. The variables are as follows:
ACCEPT=1 if granted acceptance, 0 otherwise;
GPA=cumulative undergraduate grade point average;
BIO=score in the biology portion of the Medical College Admission Test (MCAT);
CHEM=score in the chemistry portion of the MCAT;
Class exercises
PHY=score in the physics portion of the MCAT;
RED=score in the reading portion of the MCAT;
PRB=score in the problem portion of the MCAT;
QNT=score in the quantitative portion of the MCAT;
AGE=age of the applicant;
GENDER=1 for male, 0 for female;
Answer the following questions with the aid of the program and
output medicalsas.txt and medicalout.txt uploaded on the course
website:
Class exercises
1. Write down the estimated Logit model that regresses ACCEPT on all of the above explanatory variables.
2. Test for the overall significance of the model using the LR, LM and Wald tests. Do the three tests provide consistent results?
3. Test for the significance of the individual coefficients using the Wald test.
4. Predict the probability of success of an individual with the following characteristics: GPA=2.96, BIO=7, CHEM=7, PHY=8, RED=5, PRB=7, QNT=5, AGE=25, GENDER=0.
5. Calculate the Generalised R² for the above regression. How well does the model appear to fit the data?
6. AGE and GENDER represent personal characteristics. Test the hypothesis that they jointly have no impact on the probability of success.
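For the joint test in question 6, one option is a Wald test via the TEST statement in PROC LOGISTIC (a sketch, assuming the medical-school data have been read into a data set named MEDICAL; the label "personal" is arbitrary):

proc logistic data=MEDICAL descending;
  model ACCEPT = GPA BIO CHEM PHY RED PRB QNT AGE GENDER;
  personal: test AGE = 0, GENDER = 0;   /* joint Wald test of the two coefficients */
run;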
