Table of contents
1. Introduction
1.1 Dichotomous dependent variables
1.2 Problems with OLS
2. Odds versus probability
3. The Logit model
3.1 Basic elements
3.2 Maximum likelihood estimation
3.3 PROC LOGISTIC
3.3.1 SAS codes and basic outputs
3.3.2 Wald test for individual significance
3.3.3 Likelihood-ratio, LM and Wald tests for overall significance
3.3.4 Odds ratio estimates
3.3.5 AIC, SC and Generalised R²
3.3.6 Association of predicted probabilities and observed responses
3.3.7 Hosmer-Lemeshow test statistic
4. Class exercises
Introduction

Motivation for Logit model:

- Dichotomous dependent variables;
- Problems with Ordinary Least Squares (OLS) in the face of dichotomous dependent variables;
- Alternative estimation techniques
Dichotomous dependent variables

Often variables in social sciences are dichotomous:

- employed vs. unemployed
- married vs. unmarried
- guilty vs. innocent
- voted vs. didn't vote
Dichotomous dependent variables

- Social scientists frequently wish to estimate regression models with a dichotomous dependent variable;
- Most researchers are aware that something is wrong with OLS in the face of a dichotomous dependent variable, but they do not know what makes dichotomous variables problematic in regression, or what other methods are superior
Dichotomous dependent variables

- The focus of this chapter is on binary Logit models (or logistic regression models) for dichotomous dependent variables;
- Logits have many similarities to OLS, but there are also fundamental differences
Problems with OLS

Examine why OLS regression runs into problems when the dependent variable is 0/1.

Example

- Dataset: penalty.txt
- Comprises 147 penalty cases in the state of New Jersey;
- In all cases the defendant was convicted of first-degree murder with a recommendation by the prosecutor that a death sentence be imposed;
- A penalty trial is conducted to determine whether the defendant should receive a death penalty or life imprisonment;
Problems with OLS

The dataset comprises the following variables:

- DEATH: 1 for a death sentence, 0 for a life sentence
- BLACKD: 1 if the defendant was black, 0 otherwise
- WHITVIC: 1 if the victim was white, 0 otherwise
- SERIOUS: an average rating of the seriousness of the crime evaluated by a panel of judges, ranging from 1 (least serious) to 15 (most serious)

The goal is to regress DEATH on BLACKD, WHITVIC and SERIOUS.
Problems with OLS

- Note that DEATH, which has only two outcomes, follows a Bernoulli(p) distribution with p being the probability of a death sentence. Let Y = DEATH; then
  $\Pr(Y = y) = p^{y}(1-p)^{1-y}, \quad y = 0, 1$
- Recall that Bernoulli trials lead to the Binomial distribution: if we repeat the Bernoulli(p) trial n times and count the number of successes W, the distribution of W follows a Binomial B(n, p) distribution, i.e.,
  $\Pr(W = w) = \binom{n}{w} p^{w}(1-p)^{n-w}, \quad 0 \le w \le n$
- So the Bernoulli distribution is a special case of the Binomial distribution when n = 1.
Problems with OLS
data penalty;
infile 'd:\teaching\ms4225\penalty.txt';
input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC REG;
MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;
Problems with OLS

The REG Procedure
Model: MODEL1
Dependent Variable: DEATH

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 3 2.61611 0.87204 4.11 0.0079
Error 143 30.37709 0.21243
Corrected Total 146 32.99320


Root MSE 0.46090 R-Square 0.0793
Dependent Mean 0.34014 Adj R-Sq 0.0600
Coeff Var 135.50409


Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -0.05492 0.12499 -0.44 0.6610
BLACKD 1 0.12197 0.08224 1.48 0.1403
WHITVIC 1 0.05331 0.08411 0.63 0.5272
SERIOUS 1 0.03840 0.01200 3.20 0.0017
Problems with OLS

- The coefficient of SERIOUS is positive and very significant;
- Neither of the two racial variables is significantly different from zero;
- $R^2$ is low;
- The F-test indicates overall significance of the model;
- But... can we trust these results?
Problems with OLS

- Note that if y is a 0/1 variable, then
  $E(y_i) = 1 \cdot \Pr(y_i = 1) + 0 \cdot \Pr(y_i = 0) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i.$
- But based on linear regression, $y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$. Hence
  $E(y_i) = E(\beta_1 + \beta_2 X_i + \varepsilon_i) = \beta_1 + \beta_2 X_i + E(\varepsilon_i) = \beta_1 + \beta_2 X_i.$
- Therefore, $p_i = \beta_1 + \beta_2 X_i$. This is commonly referred to as the linear probability model (LPM).
Problems with OLS

- Accordingly, from the SAS results, a one-point increase in the SERIOUS scale is associated with a 0.038 increase in the probability of a death sentence; the probability of a death sentence for blacks is 0.12 higher than for non-blacks, ceteris paribus. But do these results make sense?
- The LPM $p_i = \beta_1 + \beta_2 X_i$ is actually implausible because $p_i$ is postulated to be a linear function of $X_i$ and thus has no upper and lower bounds. Accordingly, $p_i$ (which is a probability) can be greater than 1 or smaller than 0!
Odds versus probability

- Odds of an event: the ratio of the expected number of times that an event will occur to the expected number of times it will not occur;
- For example, an odds of 4 means we expect 4 times as many occurrences as non-occurrences; an odds of 5/2 (or 5 to 2) means we expect 5 occurrences to every 2 non-occurrences;
- Let p be the probability of an event occurring and o the corresponding odds; then
  $o = p/(1 - p)$ or $p = o/(1 + o)$
Odds versus probability

Relationship between probability and odds:

    Probability   Odds
        0.1       0.11
        0.2       0.25
        0.3       0.43
        0.4       0.67
        0.5       1.00
        0.6       1.50
        0.7       2.33
        0.8       4.00
        0.9       9.00

- $o < 1 \Leftrightarrow p < 0.5$ and $o > 1 \Leftrightarrow p > 0.5$;
- $0 \le o < \infty$ although $0 \le p \le 1$
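The table can be reproduced with a short DATA step (a minimal sketch; the data set name odds_table is arbitrary):

data odds_table;
  do p = 0.1 to 0.9 by 0.1;
    o = p / (1 - p);   /* odds corresponding to probability p */
    output;
  end;
run;
proc print data=odds_table; run;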
Odds versus probability

Death sentence by race of defendant for 147 penalty trials:

           blacks   non-blacks   total
  death      28         22         50
  life       45         52         97
  total      73         74        147

- $o_D = 50/97 = 0.52$; $o_{D|B} = 28/45 = 0.62$; and $o_{D|NB} = 22/52 = 0.42$;
- Hence the ratio of blacks' odds of death to non-blacks' odds of death is 0.62/0.42 = 1.476;
- This means the odds of a death sentence for blacks are 47.6% higher than for non-blacks, or the odds of a death sentence for non-blacks are about 0.68 times the corresponding odds for blacks
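The same table and odds ratio can be obtained from the data (a sketch, assuming the PENALTY data set read in earlier; the RELRISK option prints the case-control odds ratio):

proc freq data=PENALTY;
  tables BLACKD*DEATH / relrisk;   /* 2x2 table with odds-ratio estimate */
run;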
Logit model: basic elements

- The Logit model is based on the following cumulative distribution function of the logistic distribution:
  $p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}}$
- Let $Z_i = \beta_1 + \beta_2 X_i$; then
  $p_i = \frac{1}{1 + e^{-Z_i}} = F(\beta_1 + \beta_2 X_i) = F(Z_i)$
- As $Z_i$ ranges from $-\infty$ to $\infty$, $p_i$ ranges between 0 and 1;
- $p_i$ is non-linearly related to $Z_i$.
Logit model: basic elements

[Figure: graph of the Logit with $\beta_1 = 0$ and $\beta_2 = 1$: an S-shaped curve of $p_i$ against $Z_i$ over the range $-4$ to $4$, rising from near 0 to near 1 and passing through $p_i = 0.5$ at $Z_i = 0$.]
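The plotted values are easy to generate (a minimal sketch; PROC SGPLOT is one of several ways to draw the curve):

data logit_curve;
  do z = -4 to 4 by 0.1;
    p = 1 / (1 + exp(-z));   /* F(Z) with beta1 = 0, beta2 = 1 */
    output;
  end;
run;
proc sgplot data=logit_curve;
  series x=z y=p;
run;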
Logit model: basic elements

- Note that $e^{Z_i} = p_i/(1 - p_i)$, the odds of an event;
- So, $\ln(p_i/(1 - p_i)) = Z_i = \beta_1 + \beta_2 X_i$; in other words, the log of the odds is linear in $X_i$, although $p_i$ and $X_i$ have a non-linear relationship. This is different from the LPM.
Logit model: basic elements

- For a linear model $y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$,
  $\frac{\partial y_i}{\partial X_i} = \beta_2$, a constant;
- But for a Logit model, $p_i = F(\beta_1 + \beta_2 X_i)$, so
  $\frac{\partial p_i}{\partial X_i} = \frac{\partial F(\beta_1 + \beta_2 X_i)}{\partial X_i} = F'(\beta_1 + \beta_2 X_i)\,\beta_2 = f(\beta_1 + \beta_2 X_i)\,\beta_2,$
  where $f(\cdot)$ is the probability density function of the logistic distribution.
- As $f(\beta_1 + \beta_2 X_i)$ is always positive, the sign of $\beta_2$ indicates the direction of the relationship between $p_i$ and $X_i$.
Logit model: basic elements

- Note that for the Logit model
  $f(\beta_1 + \beta_2 X_i) = \frac{e^{-Z_i}}{(1 + e^{-Z_i})^2} = F(\beta_1 + \beta_2 X_i)\,\bigl(1 - F(\beta_1 + \beta_2 X_i)\bigr) = p_i (1 - p_i)$
- Therefore, $\frac{\partial p_i}{\partial X_i} = \beta_2\, p_i (1 - p_i)$. In other words, a 1-unit change in $X_i$ does not produce a constant effect on $p_i$.
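To make the non-constant effect concrete, here is a small sketch evaluating $\beta_2\, p(1-p)$ at several values of p, using the SERIOUS estimate 0.1871 from the PROC LOGISTIC output later in the chapter:

data margeff;
  beta2 = 0.1871;                 /* ML estimate for SERIOUS (see output below) */
  do p = 0.1, 0.3, 0.5, 0.7, 0.9;
    dpdx = beta2 * p * (1 - p);   /* marginal effect on p at probability p */
    output;
  end;
run;
proc print data=margeff; run;

The effect peaks at p = 0.5 and shrinks towards zero as p approaches 0 or 1.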
Maximum Likelihood estimation

- Note that $y_i$ only takes on values of 0 and 1, so the observed log-odds $\ln(y_i/(1 - y_i))$ is undefined and OLS is not an appropriate method of estimation. Maximum likelihood (ML) estimation is usually the technique to adopt;
- ML principle: choose as estimates the parameter values which would maximise the probability of what we have already observed;
- Steps of ML estimation: First, construct the likelihood function by expressing the probability of observing the data as a function of the unknown parameters. Second, find the values of the unknown parameters that make the value of this expression as large as possible.
Maximum Likelihood estimation

- The likelihood function is given by
  $L = \Pr(y_1, y_2, \ldots, y_n) = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_n)$, assuming independent sampling,
  $= \prod_{i=1}^{n} \Pr(y_i)$
- But by definition, $\Pr(y_i = 1) = p_i$ and $\Pr(y_i = 0) = 1 - p_i$. Therefore, $\Pr(y_i) = p_i^{y_i}(1 - p_i)^{1 - y_i}$
Maximum Likelihood estimation

- So,
  $L = \prod_{i=1}^{n} \Pr(y_i) = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i} = \prod_{i=1}^{n} \left(\frac{p_i}{1 - p_i}\right)^{y_i}(1 - p_i)$
- It is usually easier to maximise the log of L than L itself. Taking the log of both sides yields
  $\ln L = \sum_{i=1}^{n}\left[ y_i \ln\!\left(\frac{p_i}{1 - p_i}\right) + \ln(1 - p_i)\right] = \sum_{i=1}^{n} y_i \ln\!\left(\frac{p_i}{1 - p_i}\right) + \sum_{i=1}^{n} \ln(1 - p_i)$
Maximum Likelihood estimation

- Substituting $p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}}$ in $\ln L$ leads to
  $\ln L = \beta_1 \sum_{i=1}^{n} y_i + \beta_2 \sum_{i=1}^{n} X_i y_i - \sum_{i=1}^{n} \ln(1 + e^{\beta_1 + \beta_2 X_i})$
- There are no closed-form solutions for $\beta_1$ and $\beta_2$ when maximising $\ln L$;
- Numerical optimisation is required. SAS uses Fisher's scoring, which is similar in principle to the Newton-Raphson algorithm.
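This log-likelihood can also be maximised directly in SAS; a minimal sketch with PROC NLMIXED (using SERIOUS as the single regressor and the GENERAL log-likelihood specification):

proc nlmixed data=PENALTY;
  parms b1=0 b2=0;                      /* start at zero, as PROC LOGISTIC does */
  eta = b1 + b2*SERIOUS;                /* Z_i = beta1 + beta2*X_i */
  ll  = DEATH*eta - log(1 + exp(eta));  /* i-th term of lnL derived above */
  model DEATH ~ general(ll);            /* maximise the summed log-likelihood */
run;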
Maximum Likelihood estimation

- Suppose $\theta$ is a univariate unknown parameter to be estimated. The Newton-Raphson algorithm derives estimates based on the formula
  $\theta_{new} = \theta_{old} - H^{-1}(\theta_{old})\,U(\theta_{old}),$
  where $H(\cdot)$ and $U(\cdot)$ are the second and first derivatives of the objective function with respect to $\theta$. The algorithm stops when the estimates from successive iterations converge;
- Consider a simple example, where $g(\theta) = -\theta^3 + 3\theta^2 - 5$. So, $U(\theta) = 3\theta(2 - \theta)$ and $H(\theta) = 6(1 - \theta)$;
- The actual maximum and minimum of $g(\theta)$ are located at $\theta = 2$ and $\theta = 0$ respectively;
Maximum Likelihood estimation

- Step 1: Choose an arbitrary initial starting value, say, $\theta_{initial} = 1.5$. So, $U(1.5) = 2.25$ and $H(1.5) = -3$. The new estimate of $\theta$ is therefore $\theta_{new} = 1.5 - 2.25/(-3) = 2.25$;
- Step 2: $\theta_{old} = 2.25$. So, $U(2.25) = -1.6875$ and $H(2.25) = -7.5$. The new estimate of $\theta$ is $\theta_{new} = 2.25 - (-1.6875)/(-7.5) = 2.025$;
- Continue with Steps 3, 4 and so on until convergence;
- Caution: Suppose we start with $\theta_{initial} = 0.5$. If the process is left unchecked, the algorithm will converge to the minimum located at $\theta = 0$!
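A short sketch of these iterations as a DATA step (the data set name newton is arbitrary):

data newton;
  theta = 1.5;                 /* initial value; try 0.5 to see convergence to the minimum */
  do iter = 1 to 10;
    U = 3*theta*(2 - theta);   /* first derivative of g */
    H = 6*(1 - theta);         /* second derivative of g */
    theta = theta - U/H;       /* Newton-Raphson update */
    output;
  end;
run;
proc print data=newton; run;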
Maximum Likelihood estimation

- The only difference between Fisher's scoring and the Newton-Raphson algorithm is that Fisher's scoring uses $E(H(\cdot))$ instead of $H(\cdot)$;
- Our current situation is more complicated in that the unknowns are multivariate. However, the optimisation principle remains the same;
- In practice, we need a set of initial values. PROC LOGISTIC in SAS starts with all coefficients equal to zero.
PROC LOGISTIC: basic elements
data PENALTY;
infile 'd:\teaching\ms4225\penalty.txt';
input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC LOGISTIC DATA=PENALTY DESCENDING; /* DESCENDING makes SAS model Pr(DEATH=1) */
MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;
PROC LOGISTIC: basic elements
The LOGISTIC Procedure

Model Information

Data Set WORK.PENALTY
Response Variable DEATH
Number of Response Levels 2
Number of Observations 147
Model binary logit
Optimization Technique Fisher's scoring

Response Profile

Ordered Total
Value DEATH Frequency

1 1 50
2 0 97

Probability modeled is DEATH=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

PROC LOGISTIC: basic elements
Model Fit Statistics

Intercept
Intercept and
Criterion Only Covariates

AIC 190.491 184.285
SC 193.481 196.247
-2 Log L 188.491 176.285

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 12.2060 3 0.0067
Score 11.6560 3 0.0087
Wald 10.8211 3 0.0127

PROC LOGISTIC: basic elements


The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -2.6516 0.6748 15.4424 <.0001
BLACKD 1 0.5952 0.3939 2.2827 0.1308
WHITVIC 1 0.2565 0.4002 0.4107 0.5216
SERIOUS 1 0.1871 0.0612 9.3342 0.0022

Odds Ratio Estimates

Point 95% Wald
Effect Estimate Confidence Limits

BLACKD 1.813 0.838 3.925
WHITVIC 1.292 0.590 2.832
SERIOUS 1.206 1.069 1.359

Association of Predicted Probabilities and Observed Responses

Percent Concordant 67.2 Somers' D 0.349
Percent Discordant 32.3 Gamma 0.351
Percent Tied 0.5 Tau-a 0.158
Pairs 4850 c 0.675


Wald test for individual significance

- Test of significance of individual coefficients:
  $H_0: \beta_j = 0$ vs. $H_1:$ otherwise
  Instead of reporting the t-stats, PROC LOGISTIC reports the Wald $\chi^2$-stats for the significance of individual coefficients. The reason is that the t-stat is not t-distributed in a Logit model; instead, it has an asymptotic N(0, 1) distribution under the null of $H_0: \beta_j = 0$. The square of a N(0, 1) variable is a $\chi^2$ variable with 1 df. The Wald $\chi^2$-stat is just the square of the usual t-stat.
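For instance, squaring the ratio of estimate to standard error for SERIOUS from the output above reproduces its Wald statistic up to rounding (a quick sketch):

data _null_;
  wald = (0.1871/0.0612)**2;     /* about 9.35; the output prints 9.3342 */
  pval = 1 - probchi(wald, 1);   /* p-value from a chi-square with 1 df */
  put wald= pval=;
run;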
1. Introduction
2. Odds versus probability
3. The Logit model
4. Class exercises
3.1 Basic elements
3.2 Maximum likelihood estimation
3.3 PROC LOGISTIC
Likelihood-ratio, LM and Wald tests for overall signicance

- Test of overall model significance:
  $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (all k slope coefficients zero) vs. $H_1:$ otherwise

  1. Likelihood-ratio test:
     $LR = 2\,[\ln L(\hat{\beta}^{(UR)}) - \ln L(\hat{\beta}^{(R)})] \sim \chi^2_k$
  2. Score (Lagrange-multiplier) (LM) test:
     $LM = [U(\hat{\beta}^{(R)})]'\,[-H(\hat{\beta}^{(R)})]^{-1}\,[U(\hat{\beta}^{(R)})] \sim \chi^2_k$
  3. Wald test:
     $W = [\hat{\beta}^{(UR)}]'\,[-H(\hat{\beta}^{(UR)})]\,[\hat{\beta}^{(UR)}] \sim \chi^2_k$
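As a check, the LR statistic in the output equals the drop in -2 Log L between the intercept-only and full models (a quick sketch using the printed fit statistics):

data _null_;
  lr = 188.491 - 176.285;      /* = 12.206, matching the Likelihood Ratio row */
  pval = 1 - probchi(lr, 3);   /* 3 restrictions: BLACKD, WHITVIC, SERIOUS */
  put lr= pval=;
run;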
Odds ratio estimates

- The odds ratio estimates are obtained by exponentiating the corresponding estimates, i.e., $e^{\hat{\beta}_j}$;
- The (predicted) odds ratio of 1.813 indicates that the odds of a death sentence for black defendants are 81% higher than the odds for other defendants;
- Similarly, the (predicted) odds of death are about 29% higher when the victim is white, notwithstanding the coefficient being insignificant;
- A 1-unit increase in the SERIOUS scale is associated with a 21% increase in the predicted odds of a death sentence
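The printed odds ratios follow directly from the estimates (a quick sketch):

data _null_;
  or_blackd  = exp(0.5952);   /* 1.813 */
  or_whitvic = exp(0.2565);   /* 1.292 */
  or_serious = exp(0.1871);   /* 1.206 */
  put or_blackd= or_whitvic= or_serious=;
run;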
AIC, SC and Generalised R²

Model selection criteria:

1. Akaike's Information Criterion (AIC): $AIC = -2[\ln L - (k + 1)]$
2. Schwarz Bayesian Criterion (SBC or SC): $SC = -2\ln L + (k + 1)\ln(n)$
3. Generalised $R^2 = 1 - e^{-LR/n}$, analogous to the conventional $R^2$ used in linear regression
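These reproduce the Model Fit Statistics shown earlier (a minimal sketch; k = 3 covariates, n = 147):

data _null_;
  m2logl = 176.285;  k = 3;  n = 147;  lr = 12.206;
  aic = m2logl + 2*(k + 1);        /* 184.285 */
  sc  = m2logl + (k + 1)*log(n);   /* 196.247 */
  genr2 = 1 - exp(-lr/n);          /* about 0.08 */
  put aic= sc= genr2=;
run;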
Association of predicted probabilities and observed responses

- For the 147 observations in the sample, there are $\binom{147}{2} = 10731$ ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1s or both 0s on y. These we ignore, leaving 4850 pairs for which one case has a 1 and the other case has a 0;
- For each of these pairs, we ask the following question: based on the estimated model, does the case with a 1 have a higher predicted probability of attaining 1 than the case with a 0?
- If yes, we call the pair concordant; if no, we call the pair discordant; if the two cases have the same predicted values, we call it a tie;
- Obviously, the more concordant pairs, the better the fit of the model.
Association of predicted probabilities and observed responses

- Let C = number of concordant pairs, D = number of discordant pairs, T = number of ties, and N = total number of pairs before eliminating any;
- $\text{Tau-a} = \frac{C - D}{N}$, $\text{Somers' D (SD)} = \frac{C - D}{C + D + T}$, $\text{Gamma} = \frac{C - D}{C + D}$, and $\text{c-stat} = 0.5\,(1 + SD)$
- All 4 measures vary between 0 and 1, with large values corresponding to stronger associations between the predicted and observed values;
- Rules of thumb for minimally acceptable levels of Tau-a, SD, Gamma and c-stat are 0.1, 0.3, 0.3 and 0.65 respectively.
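The printed measures can be recovered from the percentages in the output (a sketch; C, D and T are backed out of the 4850 informative pairs, so the results match only up to rounding):

data _null_;
  pairs = 4850;  n = 10731;
  c = 0.672*pairs;  d = 0.323*pairs;  t = 0.005*pairs;
  somers_d = (c - d)/(c + d + t);   /* 0.349 */
  gamma    = (c - d)/(c + d);       /* 0.351 */
  tau_a    = (c - d)/n;             /* 0.158 */
  c_stat   = 0.5*(1 + somers_d);    /* 0.675 */
  put somers_d= gamma= tau_a= c_stat=;
run;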
Hosmer-Lemeshow goodness of fit test

- The Hosmer-Lemeshow (HL) test is a goodness-of-fit test which may be invoked by adding the LACKFIT option to the MODEL statement under PROC LOGISTIC;
- The HL statistic is calculated as follows. Based on the estimated model, predicted probabilities are generated for all observations. These are sorted by size, then grouped into approximately 10 intervals. Within each interval, the expected frequency is obtained by adding up the predicted probabilities. Expected frequencies are compared with the observed frequencies by the conventional Pearson $\chi^2$ statistic. The df is the number of intervals minus 2;
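A sketch of the invocation (LACKFIT is a standard option on the MODEL statement):

proc logistic data=PENALTY descending;
  model DEATH = BLACKD WHITVIC SERIOUS / lackfit;   /* requests the HL test */
run;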
Hosmer-Lemeshow goodness of fit test

- $HL = \sum_{j=1}^{2G} \frac{(O_j - E_j)^2}{E_j} \sim \chi^2_{G-2}$, where G is the number of intervals, and O and E are the observed and expected frequencies respectively. The LACKFIT output is as follows:
Partition for the Hosmer and Lemeshow Test

DEATH = 1 DEATH = 0
Group Total Observed Expected Observed Expected

1 15 3 2.04 12 12.96
2 15 2 2.78 13 12.22
3 15 3 3.49 12 11.51
4 15 4 4.10 11 10.90
5 15 6 4.89 9 10.11
6 15 6 5.42 9 9.58
7 15 4 5.97 11 9.03
8 15 6 6.77 9 8.23
9 15 7 7.50 8 7.50
10 12 9 7.05 3 4.95

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square DF Pr > ChiSq

3.9713 8 0.8597
With a p-value of 0.8597, the test provides no evidence of lack of fit.
Class exercises
1. Tutorial 1
2. Table 12.4 of Ramanathan (1995), Introductory Econometrics, presents information on the acceptance or rejection to medical school for a sample of 60 applicants, along with a number of their characteristics. The variables are as follows:
ACCEPT=1 if granted acceptance, 0 otherwise;
GPA=cumulative undergraduate grade point average;
BIO=score in the biology portion of the Medical College Admission Test (MCAT);
CHEM=score in the chemistry portion of the MCAT;
Class exercises
PHY=score in the physics portion of the MCAT;
RED=score in the reading portion of the MCAT;
PRB=score in the problem portion of the MCAT;
QNT=score in the quantitative portion of the MCAT;
AGE=age of the applicant;
GENDER=1 for male, 0 for female;
Answer the following questions with the aid of the program and
output medicalsas.txt and medicalout.txt uploaded on the course
website:
Class exercises
1. Write down the estimated Logit model that regresses ACCEPT on all of the above explanatory variables.
2. Test for the overall significance of the model using the LR, LM and Wald tests. Do the three tests provide consistent results?
3. Test for the significance of the individual coefficients using the Wald test.
4. Predict the probability of success of an individual with the following characteristics: GPA=2.96, BIO=7, CHEM=7, PHY=8, RED=5, PRB=7, QNT=5, AGE=25, GENDER=0.
5. Calculate the Generalised R² for the above regression. How well does the model appear to fit the data?
6. AGE and GENDER represent personal characteristics. Test the hypothesis that they jointly have no impact on the probability of success.
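For the joint test in question 6, one option is a Wald test via the TEST statement in PROC LOGISTIC (a sketch, assuming the medical-school data have been read into a data set named MEDICAL; the label "personal" is arbitrary):

proc logistic data=MEDICAL descending;
  model ACCEPT = GPA BIO CHEM PHY RED PRB QNT AGE GENDER;
  personal: test AGE = 0, GENDER = 0;   /* joint Wald test of the two coefficients */
run;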
