
# GENERALIZED LINEAR MODEL

LOGISTIC REGRESSION

## Generalized Linear Models

Traditional applications of linear models, such as SLR and multiple
linear regression, assume that the response variable is
- Normally distributed
- of constant variance
- Independent

These assumptions fail in common situations:
- The response is either binary (0, 1), or a count
- The response is continuous, but nonnormal

## Some Approaches to These Problems

- Data transformation
  - Induce approximate normality
  - Stabilize variance
  - Simplify model form
- Weighted least squares
  - Often used to stabilize variance
- Generalized linear models (GLM)
  - Approach is about 25-30 years old; unifies linear and nonlinear
    regression models
  - Response distribution is a member of the exponential family
    (normal, exponential, gamma, binomial, Poisson)

## Generalized Linear Models

- Original applications were in the biopharmaceutical sciences
- Lots of recent interest in GLMs in industrial statistics
- GLMs are simple models; they include linear regression and OLS as a
  special case
- Parameter estimation is by maximum likelihood (assume that the
  response distribution is known)
- Inference on parameters is based on large-sample or asymptotic
  theory
- We will consider logistic regression, Poisson regression, then the
  GLM

## Logistic regression: an overview

1. Models with binary outcomes
2. Problems with the linear model for a binary outcome
3. What does the logit mean?
4. Computing p from a log odds

Goal: predicting y and/or the odds of y for a given x.

1. Binary outcomes
- Binary outcomes are outcomes with two possible values: success or
  failure. The outcome is coded 1 if a success occurs, otherwise 0
  (failure). The units of analysis for binary (0, 1) outcomes are
  individuals.
- Occurs often in the biopharmaceutical field: dose-response studies,
  bioassays, clinical trials
- Industrial applications include failure analysis, fatigue testing,
  and reliability testing. Example: functional electrical testing on a
  semiconductor can yield success, in which case the device works, or
  failure with an associated failure mode.

## Binary Response Variables

Possible model:

$$
y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i
    = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i,
\qquad i = 1, 2, \ldots, n, \qquad y_i = 0 \text{ or } 1
$$

The response $y_i$ is a Bernoulli random variable:

$$
P(y_i = 1) = \pi_i \quad \text{with } 0 \le \pi_i \le 1, \qquad
P(y_i = 0) = 1 - \pi_i
$$

$$
E(y_i) = \pi_i = \mathbf{x}_i'\boldsymbol{\beta}, \qquad
\operatorname{Var}(y_i) = \sigma^2_{y_i} = \pi_i(1 - \pi_i)
$$

## 2. Problems With This Model

- The error terms take on only two values, so they can't possibly be
  normally distributed.
- The error distribution is neither identical nor normal.
- The variance of the observations is a function of the mean (see the
  previous slide).
- Heteroskedasticity is a problem here, but still often not a fatal
  one, because it acts in a conservative direction.

## 2. Problems With This Model

A linear response function could result in predicted values that fall
outside the (0, 1) range, which is impossible because

$$
0 \le E(y_i) = \pi_i = \mathbf{x}_i'\boldsymbol{\beta} \le 1
$$

- Nonsensical predictions.
- Bad predictions due to the nonlinear functional form, even within
  reasonable values of y.
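To make the range problem concrete, here is a small Python sketch
(the coefficients are hypothetical, not fitted to any data in these
notes) showing a linear response function producing "probabilities"
outside [0, 1]:

```python
# Hypothetical linear probability model E(y) = b0 + b1*x.
# Coefficients are made up purely for illustration.
b0, b1 = -0.5, 0.02

def linear_prob(x):
    """Linear response function -- not constrained to [0, 1]."""
    return b0 + b1 * x

for x in (10, 40, 90):
    p = linear_prob(x)
    print(x, p, 0.0 <= p <= 1.0)  # the extremes fall outside [0, 1]
```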

## Data for space shuttle launches and static tests prior to the launch of Challenger

Temperature (°F) at launch for 24 launches and static tests, each
recorded together with an indicator of whether at least one O-ring
failure occurred:

53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81

[Scatter plot: at-least-one-O-ring-failure indicator (0.0 to 1.0)
versus temperature at launch (50-80 °F).]

## A solution for nonlinear relationships: Generalized Linear Models

- Linear model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
  (identity transform: no change in $y_i$)
- Generalized linear model: $F(y_i) = \beta_0 + \beta_1 x_i + \varepsilon_i$
  ($F$ is some function such that $F(y)$ is linear in the predictors)
- Logit model: $\log\left(\dfrac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i$

## Binary Response Variables

- There is a lot of empirical evidence that the response function
  should be nonlinear; an S shape is quite logical
- See the scatter plot of the Challenger data
- The logistic response function is a common choice:

$$
E(y) = \frac{\exp(\mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}'\boldsymbol{\beta})}
     = \frac{1}{1 + \exp(-\mathbf{x}'\boldsymbol{\beta})}
$$
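The two algebraic forms of the logistic response function above are
equivalent, and both are always strictly between 0 and 1. A quick
Python check (illustrative only):

```python
import math

def logistic_a(eta):
    """E(y) = exp(eta) / (1 + exp(eta))"""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logistic_b(eta):
    """Equivalent form: 1 / (1 + exp(-eta))"""
    return 1.0 / (1.0 + math.exp(-eta))

for eta in (-5.0, -1.0, 0.0, 1.0, 5.0):
    pa, pb = logistic_a(eta), logistic_b(eta)
    assert abs(pa - pb) < 1e-12   # the two forms agree
    assert 0.0 < pa < 1.0         # always inside (0, 1)

print(logistic_a(0.0))  # 0.5 when the linear predictor is zero
```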

## Logistic Regression Curve

[Figure: left panel, probability $\pi$ versus the predictor, an
S-shaped curve rising from 0.0 to 1.0; right panel, the logit
transform $\operatorname{logit}(\pi)$ versus the predictor, which is
linear. This linearity is the key assumption of the model.]

## The Logistic Response Function

The logistic response function can be easily linearized. Let

$$
\eta = \mathbf{x}'\boldsymbol{\beta} \quad \text{and} \quad \pi = E(y)
$$

Define the logit transform:

$$
\eta = \ln\left(\frac{\pi}{1 - \pi}\right)
$$

Model:

$$
y_i = E(y_i) + \varepsilon_i, \quad \text{where} \quad
E(y_i) = \pi_i = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}
$$

## Maximum Likelihood Estimation in Logistic Regression

The distribution of each observation $y_i$ is

$$
f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}, \qquad i = 1, 2, \ldots, n
$$

The likelihood function is

$$
L(\mathbf{y}, \boldsymbol{\beta}) = \prod_{i=1}^{n} f_i(y_i)
 = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}
$$

We usually work with the log-likelihood:

$$
\ln L(\mathbf{y}, \boldsymbol{\beta}) = \sum_{i=1}^{n} \ln f_i(y_i)
 = \sum_{i=1}^{n} y_i \ln\left(\frac{\pi_i}{1 - \pi_i}\right)
 + \sum_{i=1}^{n} \ln(1 - \pi_i)
$$
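The log-likelihood above can be evaluated directly; the sketch below
(toy numbers, for illustration only) also checks that it agrees with
the more familiar Bernoulli form $\sum_i [y_i \ln \pi_i + (1-y_i)\ln(1-\pi_i)]$:

```python
import math

def bernoulli_loglik(y, pi):
    """Log-likelihood sum_i [ y_i ln(pi_i/(1-pi_i)) + ln(1-pi_i) ]."""
    return sum(yi * math.log(p / (1 - p)) + math.log(1 - p)
               for yi, p in zip(y, pi))

# Toy data: binary responses with assumed success probabilities
y = [1, 0, 1, 1, 0]
pi = [0.8, 0.3, 0.6, 0.9, 0.2]

ll = bernoulli_loglik(y, pi)

# Cross-check against the direct Bernoulli form
ll2 = sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
          for yi, p in zip(y, pi))
assert abs(ll - ll2) < 1e-12
print(ll)
```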

## Maximum Likelihood Estimation in Logistic Regression

- The maximum likelihood estimators (MLEs) of the model parameters are
  those values that maximize the likelihood (or log-likelihood)
  function
- ML has been around since the first part of the previous century
- Often gives estimators that are intuitively pleasing
- MLEs have nice properties: unbiased (for large samples), minimum
  variance (or nearly so), and they have an approximate normal
  distribution when n is large

## Maximum Likelihood Estimation in Logistic Regression

If we have $n_i$ trials at each observation, we can write the
log-likelihood as

$$
\ln L(\mathbf{y}, \boldsymbol{\beta})
 = \sum_{i=1}^{n} y_i \mathbf{x}_i'\boldsymbol{\beta}
 - \sum_{i=1}^{n} n_i \ln\left[1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})\right]
$$

The derivative of the log-likelihood is

$$
\frac{\partial \ln L(\mathbf{y}, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
 = \mathbf{X}'\mathbf{y}
 - \sum_{i=1}^{n} n_i \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}\,\mathbf{x}_i
 = \mathbf{X}'\mathbf{y} - \sum_{i=1}^{n} n_i \pi_i \mathbf{x}_i
 = \mathbf{X}'\mathbf{y} - \mathbf{X}'\boldsymbol{\mu}
$$

(because $\mu_i = n_i \pi_i$)

## Maximum Likelihood Estimation in Logistic Regression

Setting this last result to zero gives the maximum likelihood score
equations:

$$
\mathbf{X}'(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{0}
$$

These look just like the normal equations we have seen before in
linear regression. For $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$,
the equations $\mathbf{X}'(\mathbf{y} - \hat{\mathbf{y}}) = \mathbf{0}$
result from OLS or ML with normal errors. Since
$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} - \mathbf{X}'\mathbf{y} = \mathbf{0}$,
we get $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ and
$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$
(OLS or the normal-theory MLE).

## Maximum Likelihood Estimation in Logistic Regression

Solving the ML score equations in logistic regression isn't quite as
easy, because

$$
\mu_i = n_i \pi_i = n_i \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})},
\qquad i = 1, 2, \ldots, n
$$

- Logistic regression is a nonlinear model
- It turns out that the solution is actually fairly easy, and is based
  on iteratively reweighted least squares (IRLS)
- An iterative procedure is necessary because parameter estimates must
  be updated from an initial guess through several steps
- Weights are necessary because the variance of the observations is
  not constant
- The weights are functions of the unknown parameters
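The IRLS steps described above can be sketched in pure Python for a
single-predictor model. This is an illustrative implementation under
simplifying assumptions (binary responses, no step-halving or
convergence checks), not the source's code; real software adds many
safeguards:

```python
import math

def irls_logistic(x, y, iters=25):
    """Fit logit(pi) = b0 + b1*x to binary y by IRLS (Newton-Raphson).
    Each iteration solves a weighted least squares problem with
    weights w_i = pi_i*(1 - pi_i), which depend on the current
    estimates -- hence 'iteratively reweighted'."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Current fitted probabilities and weights
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [p * (1.0 - p) for p in pi]
        # Score vector X'(y - pi)
        g0 = sum(yi - p for yi, p in zip(y, pi))
        g1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
        # 2x2 weighted information matrix X'WX
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: beta_new = beta + (X'WX)^{-1} X'(y - pi)
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

# Toy data, made up for illustration: successes become more likely as x grows
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = irls_logistic(x, y)
print(b0, b1)  # the slope estimate should be positive
```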

## 3. What does the logit mean?

- The logit model is related to the odds for a binary outcome:
  - odds = Pr(y=1) / Pr(y=0) = p/(1-p)
  - log odds = ln(odds) = ln(p/(1-p))
- (In statistics, all logs refer to the natural log: if $x = e^n$,
  where $e \approx 2.718$, then $\ln(x) = n$.)
- Thus, the logit is the predicted log odds of y for a given value
  of x.

## Interpretation of the Parameters in Logistic Regression

The log-odds at $x$ is

$$
\eta(x) = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x
$$

The log-odds at $x + 1$ is

$$
\eta(x + 1) = \ln\left(\frac{\pi(x+1)}{1 - \pi(x+1)}\right) = \beta_0 + \beta_1 (x + 1)
$$

The difference is

$$
\eta(x + 1) - \eta(x) = \beta_1
$$
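A numeric sanity check of the identity above, with hypothetical
coefficients (chosen only for illustration): the change in log-odds
for a one-unit increase in x equals $\beta_1$ regardless of x.

```python
import math

# Hypothetical coefficients, for illustration only
b0, b1 = -2.0, 0.7

def pi(x):
    """Logistic success probability at x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit(p):
    """Log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# The log-odds change for a one-unit increase in x equals b1,
# no matter where on the x axis we look.
for x in (-3.0, 0.0, 2.5):
    diff = logit(pi(x + 1)) - logit(pi(x))
    assert abs(diff - b1) < 1e-9
print("log-odds difference equals b1 =", b1)
```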

## Interpretation of the Parameters in Logistic Regression

The odds ratio is found by taking antilogs:

$$
OR = \frac{\text{Odds}_{x+1}}{\text{Odds}_x} = e^{\beta_1}
$$

The odds ratio is interpreted as the estimated multiplicative change
in the odds of success associated with a one-unit increase in the
value of the predictor variable.

## 4. Computing p from a log odds

The formal statement of the logit model is (again)

$$
\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i
$$

Then, writing $\pi(x) = p$ and exponentiating both sides:

$$
\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}
$$

Solving for $p$ gives the predicted probability:

$$
\hat{p} = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
$$
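The two transforms above are exact inverses of each other, which is
easy to verify numerically (illustrative sketch):

```python
import math

def p_from_logodds(eta):
    """p = e^eta / (1 + e^eta)"""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logodds_from_p(p):
    """eta = ln(p / (1 - p))"""
    return math.log(p / (1.0 - p))

# Round trip: the transforms are inverses of each other
for eta in (-2.0, 0.0, 1.5):
    assert abs(logodds_from_p(p_from_logodds(eta)) - eta) < 1e-12

print(p_from_logodds(0.0))  # 0.5: log-odds of zero means 50-50
```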

## Inference on the Model Parameters

Likelihood ratio tests (LRT):
- A LRT can be used to compare a full model with a reduced model of
  interest.
- Analogous to the extra-sum-of-squares technique for comparing full
  and reduced models.
- The LRT compares twice the logarithm of the ratio of the likelihood
  of the full model (FM) to the likelihood of the reduced model (RM),
  giving the test statistic

$$
LR = 2\ln\left(\frac{L(FM)}{L(RM)}\right) = 2\left[\ln L(FM) - \ln L(RM)\right]
$$

For large samples, when the reduced model is correct, the test
statistic LR follows a chi-square distribution with df equal to the
difference in the number of parameters between the full and reduced
models. If $LR > \chi^2_{\alpha,\,df}$, we would reject the claim that
the reduced model is appropriate.
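For a one-parameter comparison (df = 1), the chi-square tail
probability can be computed in plain Python via the identity
$P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The
log-likelihood values below are hypothetical, chosen so that LR
reproduces the statistic G = 5.944 quoted for the Challenger fit later
in these notes:

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x), via the erfc identity."""
    return math.erfc(math.sqrt(x / 2.0))

# LR statistic: twice the log-likelihood difference between models.
# These log-likelihood values are hypothetical, for illustration.
lnL_fm, lnL_rm = -10.158, -13.130
LR = 2.0 * (lnL_fm - lnL_rm)
print(LR, chi2_sf_df1(LR))

# Cross-check: G = 5.944 with 1 df should give P close to 0.015
assert abs(chi2_sf_df1(5.944) - 0.015) < 0.001
```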

## Inference on the Model Parameters

The LR approach can be used to provide a test for significance of the
logistic regression:
- Use the current model (fit to the data) as the full model and
  compare it to a reduced model with a constant probability of
  success. The constant-probability-of-success model is

$$
E(y) = p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}
$$

  i.e., a logistic regression model with no regressor variables.
- The MLE of $p$ in the reduced model is just $y/n$, where
  $y = \sum_i y_i$.
- Substituting this into the log-likelihood function gives the maximum
  value of the likelihood function for the reduced model as

$$
\ln L(RM) = y \ln(y) + (n - y)\ln(n - y) - n\ln(n)
$$

Therefore, the LRT statistic for testing significance of regression is

$$
LR = 2\left\{ \sum_{i=1}^{n}\left[ y_i \ln \hat{\pi}_i + (n_i - y_i)\ln(1 - \hat{\pi}_i) \right]
 - \left[ y\ln(y) + (n - y)\ln(n - y) - n\ln(n) \right] \right\}
$$

## Testing Goodness of Fit (GOF)

- The GOF of the logistic regression model can also be assessed using
  a LRT procedure.
- This test compares the current model to a saturated model, where
  each observation (or group of observations when $n_i > 1$) is
  allowed to have its own parameter (a success probability).
- The fitted success probabilities are

$$
\hat{p}_i = \frac{e^{\mathbf{x}_i'\hat{\boldsymbol{\beta}}}}{1 + e^{\mathbf{x}_i'\hat{\boldsymbol{\beta}}}}
$$

- The deviance is defined as twice the difference in log-likelihoods
  between the saturated model and the current (full) model that has
  been fit to the data:

$$
D = 2\ln\left(\frac{L(\text{saturated model})}{L(FM)}\right)
 = 2\sum_{i=1}^{n}\left[ y_i \ln\left(\frac{y_i}{n_i \hat{p}_i}\right)
 + (n_i - y_i)\ln\left(\frac{n_i - y_i}{n_i(1 - \hat{p}_i)}\right) \right]
$$

## Testing Goodness of Fit (GOF)

- In calculating the deviance, we take
  $y\ln\left(\frac{y}{n\hat{p}}\right) \equiv 0$ if $y = 0$, and
  $(n - y)\ln\left(\frac{n - y}{n(1 - \hat{p})}\right) \equiv 0$ if
  $y = n$.
- When the logistic regression model is an adequate fit to the data
  and the sample size is large, the deviance has a chi-square
  distribution with $n - p$ df, where p is the number of parameters in
  the model.
- Small values of the deviance (or a large P-value) imply that the
  model provides a satisfactory fit to the data, while large values of
  the deviance imply that the current model is not adequate.
- A good rule of thumb is to divide the deviance by its number of
  degrees of freedom: if the ratio $D/(n - p)$ is much greater than
  unity, the current model is not an adequate fit to the data.
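The deviance formula, including the $y = 0$ and $y = n$ conventions,
can be sketched directly in Python (the grouped data below are made up
for illustration):

```python
import math

def deviance(y, n, p_hat):
    """Binomial deviance
    D = 2 * sum[ y ln(y/(n p)) + (n-y) ln((n-y)/(n(1-p))) ],
    with the convention that a term is 0 when y = 0 or y = n."""
    d = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        if yi > 0:
            d += yi * math.log(yi / (ni * pi))
        if yi < ni:
            d += (ni - yi) * math.log((ni - yi) / (ni * (1.0 - pi)))
    return 2.0 * d

# Toy grouped data: y successes in n trials, with hypothetical
# fitted probabilities p_hat from some logistic model.
y = [1, 4, 8, 9]
n = [10, 10, 10, 10]
p_hat = [0.12, 0.38, 0.75, 0.92]
D = deviance(y, n, p_hat)
print(D)  # compare D/(number of groups - p) to 1 as a rule of thumb
```

Note that when the fitted probability exactly matches the observed
proportion in every group, the deviance is zero.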

- The Pearson chi-square GOF statistic can be compared to a chi-square
  distribution with $n - p$ degrees of freedom.
- Small values of the statistic (or a large P-value) imply that the
  model provides a satisfactory fit to the data.
- The Pearson chi-square statistic can also be divided by its number
  of df, $n - p$, and the ratio compared to unity. If the ratio
  greatly exceeds unity, the GOF of the model is questionable.

## The Hosmer-Lemeshow (HL) goodness-of-fit statistic

- When there are no replicates on the regressor variables, the
  observations can be grouped to perform a GOF test called the
  Hosmer-Lemeshow test.
- In this procedure, the observations are classified into g groups
  based on the estimated probabilities of success.
- Generally, about 10 groups are used (when g = 10 the groups are
  called the "deciles of risk"), and the observed numbers of successes
  $O_j$ and failures $N_j - O_j$ are compared with the expected
  frequencies $N_j \bar{p}_j$ and $N_j(1 - \bar{p}_j)$ in each group,
  where $N_j$ is the number of observations in the jth group and
  $\bar{p}_j$ is the average estimated success probability in the jth
  group.
- The HL statistic is really just a Pearson chi-square GOF statistic
  comparing observed and expected frequencies:

$$
HL = \sum_{j=1}^{g} \frac{(O_j - N_j \bar{p}_j)^2}{N_j \bar{p}_j (1 - \bar{p}_j)}
$$

- If the fitted logistic regression model is correct, the HL statistic
  follows a chi-square distribution with g - 2 df when the sample size
  is large.
- Large values of HL imply that the model is not an adequate fit to
  the data. It is also useful to compute the ratio of HL to its number
  of df, g - 2, with values close to unity implying an adequate fit.
- Hypotheses: $H_0$: the model fit is adequate; $H_1$: the model fit
  is not adequate.
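The grouping-and-comparing procedure can be sketched as follows. This
is an illustrative implementation on made-up data; production code
handles tied probabilities and empty groups more carefully:

```python
import math

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow statistic: sort cases by fitted probability,
    split into g roughly equal groups, and compare observed vs
    expected successes within each group."""
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    hl = 0.0
    for j in range(g):
        group = pairs[j * n // g : (j + 1) * n // g]
        if not group:
            continue
        Nj = len(group)
        Oj = sum(yi for _, yi in group)          # observed successes
        pbar = sum(pi for pi, _ in group) / Nj   # mean fitted probability
        Ej = Nj * pbar                           # expected successes
        hl += (Oj - Ej) ** 2 / (Nj * pbar * (1.0 - pbar))
    return hl

# Toy data (made up): fitted probabilities and observed outcomes
p_hat = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95]
y     = [0,    0,   0,   0,   1,   0,   1,   1,   1,   1   ]
HL = hosmer_lemeshow(y, p_hat, g=5)
print(HL)  # compare to chi-square with g - 2 df
```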

## Likelihood Inference on the Model Parameters

The deviance can also be used to test hypotheses about subsets of the
model parameters. Procedure:

- The full model is $\mathbf{X}\boldsymbol{\beta} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2$,
  where $\boldsymbol{\beta}$ has p parameters and $\boldsymbol{\beta}_2$ has
  r parameters.
- This full model has deviance $\lambda(\boldsymbol{\beta})$.
- Hypotheses: $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ versus
  $H_1: \boldsymbol{\beta}_2 \ne \mathbf{0}$.
- The reduced model is $\mathbf{X}_1\boldsymbol{\beta}_1$, with deviance
  $\lambda(\boldsymbol{\beta}_1)$.
- The difference in deviance between the reduced and full models is

$$
\lambda(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)
 = \lambda(\boldsymbol{\beta}_1) - \lambda(\boldsymbol{\beta})
$$

  with r degrees of freedom.
- $\lambda(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ has a chi-square
  distribution under $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$; large values
  imply that $H_0$ should be rejected.

## Inference on the Model Parameters

Tests on individual model coefficients can also be done using Wald
inference. This uses the result that the MLEs have an approximate
normal distribution, so the distribution of

$$
Z_0 = \frac{\hat{\beta}}{se(\hat{\beta})}
$$

is standard normal if the true value of the parameter is zero. Some
computer programs report the square of $Z_0$ (which is chi-square),
and others calculate the P-value using the t distribution.

## Logistic Regression with 1 Predictor

- Response: presence/absence of a characteristic
- Predictor: numeric variable observed for each case
- Model: $\pi(x) \equiv$ probability of presence at predictor level x:

$$
\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
$$

- $\beta_1 = 0 \Rightarrow$ P(Presence) is the same at each level of x
- $\beta_1 > 0 \Rightarrow$ P(Presence) increases as x increases

The model is fit with statistical software such as SPSS, SAS, R or
STATA (or in a matrix language). Primary interest is in estimating and
testing hypotheses regarding $\beta_1$.

Large-sample test (Wald test):
- $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \ne 0$
- Test statistic: $X^2_{obs} = \left(\dfrac{\hat{\beta}_1}{se(\hat{\beta}_1)}\right)^2$
- Rejection region: $X^2_{obs} \ge \chi^2_{\alpha,\,1}$
- P-value: $P(\chi^2_1 \ge X^2_{obs})$

Note: some software packages perform this as an equivalent Z-test or
t-test.

## Odds Ratio

Interpretation of the regression coefficient ($\beta_1$):
- In linear regression, the slope coefficient is the change in the
  mean response as x increases by 1 unit
- In logistic regression, with
  $\text{odds}(x) = \dfrac{\pi(x)}{1 - \pi(x)}$, we can show that

$$
\frac{\text{odds}(x + 1)}{\text{odds}(x)} = e^{\beta_1}
$$

- Thus $e^{\beta_1}$ represents the (multiplicative) change in the
  odds of the outcome as x increases by 1 unit
- If $\beta_1 = 0$, the odds and probability are the same at all x
  levels ($e^{\beta_1} = 1$)
- If $\beta_1 > 0$, the odds and probability increase as x increases
  ($e^{\beta_1} > 1$)
- If $\beta_1 < 0$, the odds and probability decrease as x increases
  ($e^{\beta_1} < 1$)

## 95% Confidence Interval for Odds Ratio

- Step 1: Construct a 95% CI for $\beta_1$:

$$
\left(\hat{\beta}_1 - 1.96\,se(\hat{\beta}_1),\;
 \hat{\beta}_1 + 1.96\,se(\hat{\beta}_1)\right)
$$

- Step 2: Raise $e \approx 2.718$ to the lower and upper bounds of
  the CI:

$$
\left(e^{\hat{\beta}_1 - 1.96\,se(\hat{\beta}_1)},\;
 e^{\hat{\beta}_1 + 1.96\,se(\hat{\beta}_1)}\right)
$$

- If the entire interval is above 1, conclude a positive association
- If the entire interval is below 1, conclude a negative association
- If the interval contains 1, we cannot conclude there is an
  association
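The two-step recipe above is a one-liner in code. The slope estimate
and standard error below are hypothetical, for illustration:

```python
import math

def odds_ratio_ci(beta_hat, se, z=1.96):
    """95% CI for the odds ratio: build the CI for beta, then
    exponentiate both endpoints."""
    lo, hi = beta_hat - z * se, beta_hat + z * se
    return math.exp(lo), math.exp(hi)

# Hypothetical slope estimate and standard error
lo, hi = odds_ratio_ci(0.693, 0.25)
print(lo, hi)

assert lo < math.exp(0.693) < hi  # point estimate lies inside the CI
if lo > 1.0:
    print("entire interval above 1: positive association")
```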

## Ex. Sex ratios in insects (the proportion of all individuals that are males)

In the species in question, it has been observed that the sex ratio
is highly variable, and an experiment was set up to see whether
population density was involved in determining the fraction of males.

| Density | females | males |
|--------:|--------:|------:|
|       1 |       1 |     0 |
|       4 |       3 |     1 |
|      10 |       7 |     3 |
|      22 |      18 |     4 |
|      55 |      22 |    33 |
|     121 |      41 |    80 |
|     210 |      52 |   158 |
|     444 |      79 |   365 |

It certainly looks as if there are proportionally more males at high
density, but we should plot the data as proportions to see this more
clearly.

## Enter the data into R

Make the table above into a data frame, with Density, females and
males as columns.

Question: is there a trend in the proportion of males in the
population (i.e., is the sex ratio density-dependent)?

- The response variable: a matched pair of counts that we wish to
  analyse as proportion data.
- The explanatory variable: population density.
- First, bind together the vectors of male and female counts into a
  single object that will be the response in the analysis:
  y <- cbind(males, females)
  y will be interpreted in the model as the proportion of all
  individuals that were male.
- Then fit the generalized linear model with family = binomial.
- Evidently, a logarithmic transformation of the explanatory variable
  is likely to improve the model fit (population density is involved
  in determining the fraction of males).

[R output: model summary showing the intercept and slope estimates.]

- The slope is highly significantly steeper than zero
  (proportionately more males at higher population density).
- See whether a log transformation of the explanatory variable
  reduces the residual deviance below 22.091.
- If the residual deviance > residual degrees of freedom, we call
  this OVERDISPERSION: there is extra, unexplained variation, over
  and above the binomial variance assumed by the model specification.
- How to overcome overdispersion?
  - By transformation
  - Or use quasi-likelihood (in the family argument), e.g.
    glm(y ~ log(x), family = quasibinomial) for a binomial response.
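The overdispersion check is a simple ratio. The sketch below uses the
residual deviance 22.091 quoted above; the residual df of 6 is an
assumption (8 data rows minus 2 fitted parameters), and the
square-root scaling of standard errors mirrors what quasi-likelihood
does:

```python
import math

# Residual deviance from the R output quoted in these notes;
# residual df assumed to be 8 rows - 2 parameters = 6.
residual_deviance = 22.091
residual_df = 6

dispersion = residual_deviance / residual_df
print(dispersion)

if dispersion > 1.0:
    # Quasi-likelihood effectively inflates standard errors
    # by sqrt(dispersion)
    se_inflation = math.sqrt(dispersion)
    print("overdispersed; SEs scaled by about", round(se_inflation, 3))
```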

## The analysis of deviance table

- The Deviance column shows the differences between models as
  variables are added to the model in turn.
- The deviances are approximately chi-square distributed with the
  stated degrees of freedom.
- It is necessary to add the test="Chisq" argument to get the
  approximate chi-square tests.
- If there is more than one predictor, to test which predictors
  should stay in or be removed from the model, we can use the
  function:
  > drop1(model, test="Chisq")

Measure of fit:
- The deviance shows how well the model fits the data.
- To compare two models' deviances, use a likelihood ratio test and
  compare against a chi-square distribution.

## To do the reverse transformation

Back-transform the model coefficients (inverse logit) to the
probability scale.

Model checking: plot the model.
1. Residuals vs fitted values
2. Normal plot
3. Diagnostic checking: Cook's distance, etc.

> par(mfrow=c(2,2))
> plot(model)

- There is no pattern in the residuals against the fitted values.
- The normal plot is reasonably linear.
- Point no. 4 is highly influential (it has a large Cook's distance),
  but the model is still significant with the point omitted.

## Conclusion of the example

- We conclude that the proportion of animals that are males increases
  significantly with increasing density.
- The logistic model is linearized by logarithmic transformation of
  the explanatory variable (population density).
- Draw the fitted line through the scatter plot:

  xv <- seq(0,6,0.1)
  plot(log(density),p,ylab="Proportion male")
  lines(xv,predict(model,list(density=exp(xv)),type="response"))

- The use of type="response" back-transforms from the logit scale to
  the S-shaped proportion scale.

## Example 2: Challenger Data

Temperature (°F) at launch, with an indicator of O-ring failure, for
the 24 launches tabulated earlier:

53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81

## A Logistic Regression Model for the Challenger Data

Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015

Goodness-of-fit tests:

| Method          | Chi-Square | DF | P     |
|-----------------|-----------:|---:|------:|
| Pearson         |     14.049 | 15 | 0.522 |
| Deviance        |     15.759 | 15 | 0.398 |
| Hosmer-Lemeshow |     11.834 |    | 0.159 |

The fitted model is

$$
\hat{y} = \frac{\exp(10.875 - 0.17132x)}{1 + \exp(10.875 - 0.17132x)}
$$

Note that the fitted function has been extended down to 31 °F, the
temperature at which Challenger was launched.

## Odds Ratio for the Challenger Data

$$
\hat{OR} = e^{-0.17132} \approx 0.84
$$

- This implies that every decrease of one degree in temperature
  increases the odds of O-ring failure by about 1/0.84 = 1.19, or 19
  percent.
- The temperature at the Challenger launch was 22 degrees below the
  lowest observed launch temperature, so now

$$
\hat{OR} = e^{22(-0.17132)} \approx 0.0231
$$

- This results in an increase in the odds of failure of
  1/0.0231 ≈ 43.
- There's a big extrapolation here, but if you had known this prior
  to launch, what decision would you have made?
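The odds-ratio arithmetic above is easy to reproduce from the fitted
slope:

```python
import math

# Fitted slope from the Challenger logistic regression above
b1 = -0.17132

# Odds ratio for a one-degree increase in temperature
or_1 = math.exp(b1)
assert abs(or_1 - 0.84) < 0.005

# A one-degree *decrease* multiplies the odds of failure by 1/0.84
assert abs(1.0 / or_1 - 1.19) < 0.01

# 22 degrees colder than the coldest observed launch
or_22 = math.exp(22 * b1)
assert abs(or_22 - 0.0231) < 0.0005
print(1.0 / or_22)  # roughly a 43-fold increase in the odds of failure
```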

## Example 3: Another Logistic Regression Example: The Pneumoconiosis Data

A 1959 article in Biometrics reported the data.

(Source: Linear Regression Analysis 5E, Montgomery, Peck & Vining.)

## Diagnostic Checking

[Slides with the data table and diagnostic plots are not reproduced
here.]

## Useful qualities of the logit for social analysis

- The use of odds in the outcome variable makes the model more
  sensitive to changes near p = 0 or p = 1 than to changes near
  p = .5.
  - This is appropriate in that small absolute changes in proportions
    near 0 or 1 tend to reflect bigger effects than small absolute
    changes in proportions near .5.
- The use of the log function in the outcome variable makes the model
  sensitive to relative changes in proportions rather than absolute
  changes.
  - This is appropriate in that explanatory variables often have
    multiplicative effects on the odds.

## Advantages of the logit model over the linear regression model for binary outcomes

1. The logit of the outcome tends to have a linear relationship with
   the explanatory variables. (This is the most important advantage!)
2. The logit of the outcome can go to +∞ or -∞, so it is impossible
   to have meaningless predictions for the outcome variable.
3. The logit model produces results equivalent to those of a
   homoskedastic model.

## One important disadvantage of the logit model: estimation

- A given individual either will or will not have the outcome, so the
  observed p = 0 or p = 1 for all cases.
- What is the logit when p = 0? When p = 1? (It is undefined in both
  cases.)
- This problem makes it impossible to do least squares estimation of
  a logit model:
  - least squares estimates minimize (observed - expected)^2, and the
    logit of the observed is always undefined!
  - it is impossible to directly standardize logits, so there is no
    true r or r^2 for a logit model.

## Solving the estimation problem for logit models

- Logit models are not solved by least squares estimation, but by a
  completely different procedure called maximum likelihood
  estimation.
  - least squares procedures are based on the notion of a sampling
    distribution: a universe of possible samples coming from a single
    true population parameter.
  - maximum likelihood procedures are based on the notion of a
    universe of possible population parameters that could produce the
    one observed sample.
- Standard errors are comparable in the two procedures, but the
  underlying logic differs.

## Summary of this lecture

You should be able to do the following:
- explain the problems (in order of importance) with using a linear
  regression model when there is a binary outcome.
- define a logit model in equations and in words.
- explain why a logit model often overcomes the problems of a linear
  regression model.
- look at the output of a logit model and be able to:
  - predict y,
  - predict the odds of y,
  - predict the log odds of y for a given x,
  - and express the slope as an odds ratio.