Table of contents

1. Introduction
   1.1 Dichotomous dependent variables
   1.2 Problems with OLS
2. Odds versus probability
3. The Logit model
   3.1 Basic elements
   3.2 Maximum likelihood estimation
   3.3 PROC LOGISTIC
       3.3.1 SAS codes and basic outputs
       3.3.2 Wald test for individual significance
       3.3.3 Likelihood-ratio, LM and Wald tests for overall significance
       3.3.4 Odds ratio estimates
       3.3.5 AIC, SC and Generalised R²
       3.3.6 Association of predicted probabilities and observed responses
       3.3.7 Hosmer-Lemeshow test statistic
4. Class exercises
Introduction

Motivation for the Logit model:

Example — Dataset: penalty.txt.

The R² of an OLS fit is low;

Since y_i only takes the values 0 and 1, E(y_i|X_i) = Pr(y_i = 1|X_i) = p_i; therefore OLS postulates p_i = β₁ + β₂X_i. This is commonly referred to as the linear probability model (LPM).
Problems with OLS

The LPM p_i = β₁ + β₂X_i is actually implausible because p_i is postulated to be a linear function of X_i and thus has no upper and lower bounds. Accordingly, p_i (which is a probability) can be greater than 1 or smaller than 0!
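A minimal numeric sketch of this boundedness problem, using hypothetical 0/1 data (not the penalty.txt dataset) and the textbook OLS formulas:

```python
# Hypothetical 0/1 data to illustrate the LPM's boundedness problem.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]  # dichotomous dependent variable

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# OLS slope and intercept for the LPM p_i = b1 + b2 * X_i
b2 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b1 = ybar - b2 * xbar

p_hat = lambda x: b1 + b2 * x  # "probability" linear in x, hence unbounded

print(p_hat(1))  # negative "probability"
print(p_hat(6))  # "probability" above 1
```

For these data b1 = −0.4 and b2 ≈ 0.257, so the fitted "probability" is already negative at x = 1 and above 1 at x = 6.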
Odds versus probability

The odds of an event are o = p/(1 − p), where p is its probability. Note that 0 ≤ o < ∞ although 0 ≤ p ≤ 1.
Odds versus probability
o
D
= 50/97 = 0.52; o
D|B
= 28/45 = 0.62; and
o
D|NB
= 22/52 = 0.42;
This means the odds of death sentence for blacks are 47.6%
higher than non-blacks, or the odds of death sentence for
non-blacks are 0.63 times the corresponding odds for blacks
17 / 44
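A quick check of the slide's arithmetic, using the counts given above (variable names are mine):

```python
# Counts from the slide: 50 death sentences vs 97 others overall,
# 28 vs 45 for black defendants, 22 vs 52 for non-black defendants.
o_D  = 50 / 97   # overall odds of a death sentence
o_B  = 28 / 45   # odds for black defendants
o_NB = 22 / 52   # odds for non-black defendants

odds_ratio = o_B / o_NB   # ~1.47: odds ~47% higher for blacks
inverse    = o_NB / o_B   # ~0.68: odds for non-blacks relative to blacks
print(round(odds_ratio, 2), round(inverse, 2))
```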
Logit model: basic elements

Consider the index β₁ + β₂X_i; let Z_i = β₁ + β₂X_i. Then

p_i = 1/(1 + e^{−Z_i}) = F(β₁ + β₂X_i) = F(Z_i);

As Z_i ranges from −∞ to ∞, p_i ranges between 0 and 1;

p_i is non-linearly related to Z_i.
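These properties of the logistic CDF F can be sketched numerically (a check of my own, not part of the slides):

```python
import math

def F(z):
    """Logistic CDF: maps any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

zs = [-20, -5, -1, 0, 1, 5, 20]
ps = [F(z) for z in zs]

assert all(0 < p < 1 for p in ps)              # bounded between 0 and 1
assert all(a < b for a, b in zip(ps, ps[1:]))  # monotonically increasing in z
assert abs(F(0) - 0.5) < 1e-12                 # symmetric around Z = 0
print(ps)
```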
Logit model: basic elements
Note that e
Z
i
= p
i
/(1 p
i
), the odds of an event;
So, ln(p
i
/(1 p
i
)) = Z
i
=
1
+
2
X
i
; in other words, the log
of the odds is linear in X
i
, although p
i
and X
i
have a
non-linear relationship. This is dierent from the LPM.
20 / 44
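The log-odds identity ln(p_i/(1 − p_i)) = Z_i can be verified numerically; the coefficients below are made up for illustration:

```python
import math

b1, b2 = -0.4, 0.8  # hypothetical coefficients
for x in [-2.0, 0.0, 1.5, 3.0]:
    z = b1 + b2 * x                    # the linear index Z
    p = 1 / (1 + math.exp(-z))         # logistic probability
    log_odds = math.log(p / (1 - p))   # log of the odds
    assert abs(log_odds - z) < 1e-9    # log-odds is exactly linear in x
```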
Logit model: basic elements
(
1
+
2
X
i
)
2
= f (
1
+
2
X
i
)
2
,
where f (.) is the probability density function for the logistic
distribution.
As f (
1
+
2
X
i
) is always positive, the sign of
2
indicates
the direction of the relationship between p
i
and X
i
.
21 / 44
Therefore, ∂p_i/∂X_i = β₂p_i(1 − p_i). In other words, a 1-unit change in X_i does not produce a constant effect on p_i.
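A sketch of this non-constant marginal effect, with hypothetical coefficients, checked against a numerical derivative:

```python
import math

b1, b2 = -0.4, 0.8                      # hypothetical coefficients
p  = lambda x: 1 / (1 + math.exp(-(b1 + b2 * x)))
me = lambda x: b2 * p(x) * (1 - p(x))   # dp/dX = beta2 * p * (1 - p)

# The analytic marginal effect agrees with a central-difference derivative
h = 1e-6
num = (p(1.0 + h) - p(1.0 - h)) / (2 * h)
assert abs(num - me(1.0)) < 1e-6

# The effect of a 1-unit change in X depends on where X is evaluated
print(me(0.0), me(3.0))   # different magnitudes
```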
Maximum Likelihood estimation

Note that y_i only takes on values of 0 and 1, so the sample log-odds ln(y_i/(1 − y_i)) is undefined and OLS is not an appropriate method of estimation. Maximum likelihood (ML) estimation is usually the technique to adopt;

Assuming independent observations, L = ∏_{i=1}^{n} Pr(y_i);

But by definition, Pr(y_i = 1) = p_i and Pr(y_i = 0) = 1 − p_i. Therefore, Pr(y_i) = p_i^{y_i}(1 − p_i)^{1−y_i}.
So,

L = ∏_{i=1}^{n} Pr(y_i) = ∏_{i=1}^{n} p_i^{y_i}(1 − p_i)^{1−y_i} = ∏_{i=1}^{n} (p_i/(1 − p_i))^{y_i}(1 − p_i)

Taking logs,

ln L = ∑_{i=1}^{n} [y_i log(p_i/(1 − p_i)) + log(1 − p_i)] = ∑_{i=1}^{n} y_i log(p_i/(1 − p_i)) + ∑_{i=1}^{n} log(1 − p_i)
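The factoring step above (splitting the Bernoulli log-likelihood into a log-odds term and a log(1 − p) term) can be checked numerically; the data here are made up:

```python
import math

ys = [0, 1, 1, 0, 1]
ps = [0.2, 0.7, 0.9, 0.4, 0.6]  # hypothetical fitted probabilities

# Direct Bernoulli log-likelihood: sum of y*log(p) + (1-y)*log(1-p)
direct = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for p, y in zip(ps, ys))

# Factored form from the slide: sum y*log(p/(1-p)) + sum log(1-p)
factored = (sum(y * math.log(p / (1 - p)) for p, y in zip(ps, ys))
            + sum(math.log(1 - p) for p in ps))

assert abs(direct - factored) < 1e-9
print(direct)
```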
Substituting p_i = 1/(1 + e^{−(β₁+β₂X_i)}) in ln L leads to

ln L = β₁∑_{i=1}^{n} y_i + β₂∑_{i=1}^{n} X_i y_i − ∑_{i=1}^{n} log(1 + e^{β₁+β₂X_i})
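A numerical check that this closed form equals the direct Bernoulli log-likelihood under the logistic substitution (data and coefficients are made up):

```python
import math

xs = [0.5, 1.0, 2.0, 3.5, 4.0]
ys = [0, 0, 1, 1, 1]
b1, b2 = -1.2, 0.9  # hypothetical coefficients

# Closed form from the slide
lnL = (b1 * sum(ys) + b2 * sum(x * y for x, y in zip(xs, ys))
       - sum(math.log(1 + math.exp(b1 + b2 * x)) for x in xs))

# Direct Bernoulli log-likelihood with p_i = 1/(1 + exp(-(b1 + b2*x_i)))
ps = [1 / (1 + math.exp(-(b1 + b2 * x))) for x in xs]
direct = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for p, y in zip(ps, ys))

assert abs(lnL - direct) < 1e-9
print(lnL)
```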
The ML estimates are obtained iteratively, e.g. via the Newton-Raphson algorithm:

θ_new = θ_old − H^{−1}(θ_old)U(θ_old),

where H(.) and U(.) are the second and first derivatives of the objective function with respect to θ. The algorithm stops when the estimates from successive iterations converge;

Step 1: θ_initial = 1.5. So, U(1.5) = 2.25 and H(1.5) = −3. The new estimate of θ is therefore equal to θ_new = 1.5 − 2.25/(−3) = 2.25;

Step 2: θ_old = 2.25. So, U(2.25) = −1.6875 and H(2.25) = −7.5. The new estimate of θ is θ_new = 2.25 − (−1.6875)/(−7.5) = 2.025;

A caution: suppose instead θ_initial = 0.5. If the process is left unchecked, the algorithm will converge to the minimum located at θ = 0!
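The iteration numbers on this slide are consistent with maximising f(θ) = 3θ² − θ³ (my inference — the slide's objective function was lost in extraction), so U(θ) = 6θ − 3θ² and H(θ) = 6 − 6θ. A sketch under that assumption:

```python
def U(t):  # first derivative of f(t) = 3*t**2 - t**3 (assumed objective)
    return 6 * t - 3 * t ** 2

def H(t):  # second derivative
    return 6 - 6 * t

def newton(theta, tol=1e-10, max_iter=100):
    """Newton-Raphson: theta_new = theta_old - U(theta_old)/H(theta_old)."""
    for _ in range(max_iter):
        theta_new = theta - U(theta) / H(theta)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Reproduces the slide's first step: 1.5 -> 2.25
assert abs((1.5 - U(1.5) / H(1.5)) - 2.25) < 1e-12

print(newton(1.5))  # converges to the maximum at theta = 2
print(newton(0.5))  # left unchecked, converges to the minimum at theta = 0
```

Starting from θ = 0.5 the curvature H(0.5) is positive, so the unchecked iteration heads to the minimum at 0 — exactly the caution on the slide.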
Tests for overall significance (k restrictions; θ̂_(UR) and θ̂_(R) denote the unrestricted and restricted estimates):

1. Likelihood-ratio (LR) test: LR = 2[ln L(θ̂_(UR)) − ln L(θ̂_(R))] ∼ χ²_k

2. Score (Lagrange-multiplier) (LM) test: LM = [U(θ̂_(R))]′[−H(θ̂_(R))]^{−1}[U(θ̂_(R))] ∼ χ²_k

3. Wald test: W = θ̂′_(UR)[−H(θ̂_(UR))]θ̂_(UR) ∼ χ²_k
Odds ratio estimates
The odds ratio estimate for a one-unit increase in X_j is e^{β̂_j};

Association of predicted probabilities and observed responses:

Tau-a = (C − D)/N, Somers' D (SD) = (C − D)/(C + D + T), Gamma = (C − D)/(C + D) and C-stat = 0.5(1 + SD),

where C, D and T denote the numbers of concordant, discordant and tied pairs, and N the total number of pairs.
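These measures are simple functions of the pair counts; a sketch with made-up counts (not taken from any SAS output):

```python
def association(C, D, T, N):
    """Rank-correlation measures reported by PROC LOGISTIC.

    C, D, T: concordant, discordant and tied pairs; N: total pairs.
    """
    tau_a    = (C - D) / N
    somers_d = (C - D) / (C + D + T)
    gamma    = (C - D) / (C + D)
    c_stat   = 0.5 * (1 + somers_d)
    return tau_a, somers_d, gamma, c_stat

# Hypothetical pair counts
tau_a, sd, gamma, c = association(C=3000, D=1000, T=100, N=4410)
print(round(gamma, 2), round(c, 3))
```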
Hosmer-Lemeshow goodness of fit test

HL = ∑_{j=1}^{2G} (O_j − E_j)²/E_j ∼ χ²_{G−2}, where G is the number of intervals, and O and E are the observed and predicted frequencies respectively. LACKFIT output is as follows:

Partition for the Hosmer and Lemeshow Test
                   DEATH = 1             DEATH = 0
Group   Total   Observed  Expected   Observed  Expected
  1       15       3        2.04        12       12.96
  2       15       2        2.78        13       12.22
  3       15       3        3.49        12       11.51
  4       15       4        4.10        11       10.90
  5       15       6        4.89         9       10.11
  6       15       6        5.42         9        9.58
  7       15       4        5.97        11        9.03
  8       15       6        6.77         9        8.23
  9       15       7        7.50         8        7.50
 10       12       9        7.05         3        4.95

Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square   DF   Pr > ChiSq
  3.9713      8     0.8597
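The HL statistic can be recomputed from the partition table; up to rounding in the printed expected frequencies, it reproduces the reported chi-square of 3.9713 (G = 10 groups, so df = G − 2 = 8):

```python
# (observed, expected) cells per group, from the partition table
death1 = [(3, 2.04), (2, 2.78), (3, 3.49), (4, 4.10), (6, 4.89),
          (6, 5.42), (4, 5.97), (6, 6.77), (7, 7.50), (9, 7.05)]
death0 = [(12, 12.96), (13, 12.22), (12, 11.51), (11, 10.90), (9, 10.11),
          (9, 9.58), (11, 9.03), (9, 8.23), (8, 7.50), (3, 4.95)]

# HL = sum over all 2G cells of (O - E)^2 / E
hl = sum((o - e) ** 2 / e for o, e in death1 + death0)
print(round(hl, 2))  # ~3.97, matching the reported 3.9713 up to rounding
```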
Class exercises
1. Tutorial 1
2. Table 12.4 of Ramanathan (1995), Introductory Econometrics, presents information on acceptance to or rejection from medical school for a sample of 60 applicants, along with a number of their characteristics. The variables are as follows:
ACCEPT=1 if granted acceptance, 0 otherwise;
GPA=cumulative undergraduate grade point average;
BIO=score in the biology portion of the Medical College
Admission Test (MCAT);
CHEM=score in the chemistry portion of the MCAT;
PHY=score in the physics portion of the MCAT;
RED=score in the reading portion of the MCAT;
PRB=score in the problem portion of the MCAT;
QNT=score in the quantitative portion of the MCAT;
AGE=age of the applicant;
GENDER=1 for male, 0 for female;
Answer the following questions with the aid of the program and
output medicalsas.txt and medicalout.txt uploaded on the course
website:
1. Write down the estimated Logit model that regresses
ACCEPT on all of the above explanatory variables.
2. Test for the overall significance of the model using the LR, LM and Wald tests. Do the three tests provide consistent results?
3. Test for the significance of the individual coefficients using the Wald test.
4. Predict the probability of success of an individual with the
following characteristics: GPA=2.96, BIO=7, CHEM=7,
PHY=8, RED=5, PRB=7, QNT=5, AGE=25, GENDER=0.
5. Calculate the Generalised R² for the above regression. How well does the model appear to fit the data?
6. AGE and GENDER represent personal characteristics. Test
the hypothesis that they jointly have no impact on the
probability of success.