
Epidemiology 9521B

Logistic Regression

Chapter 5 - Logistic Regression


5.5.6 Modeling Ordinal and Nominal Categorical Outcomes
5.7 Sample Size, Power and Detectable Effects

Yun-Hee Choi
Department of Epidemiology and Biostatistics
Western University

Mar. 14, 2013

Learning Objectives

By the end of this lecture you should be able to

1. understand modelling approaches for outcomes with more than two levels
2. calculate sample size, power and detectable effect sizes

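5.5.6 Ordinal Categorical Outcomes

Proportional Odds Model
- A generalization of the logistic model to a multilevel categorical response with ordered categories.
- Suppose the severity of back pain ($y$) is measured on a 5-level scale, $j = 1, 2, 3, 4, 5$. The model is

$$\operatorname{logit}\Pr(y \ge j) = \log\frac{\Pr(y \ge j)}{\Pr(y < j)} = \alpha_j + \beta x$$

- Equivalently, based on the cumulative probability that the response is not greater than a chosen category,

$$\operatorname{logit}\Pr(y \le j) = \log\frac{\Pr(y \le j)}{\Pr(y > j)} = \alpha_j - \beta x$$

  The slope $\beta$ is the same in both forms; the intercepts $\alpha_j$ take different values in each.
- In the cumulative form, $\alpha_j$ is unique to response level $j$ and represents the log odds of falling into or below category $j$ when $x = 0$.
- The intercepts are ordered across the four cut points: $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_4$.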

5.5.6 Ordinal Categorical Outcomes

Proportional Odds Model
- Assume that $\beta$ is identical for every cut point, $j = 1, \dots, c - 1$.
- In the cumulative form $\operatorname{logit}\Pr(Y \le j) = \alpha_j - \beta x$, $\beta$ represents the log odds ratio associated with a one-unit change in $x$, i.e., the increase in the log odds of falling into or below any category associated with a one-unit decrease in $x$.
- $e^{\beta}$ represents the proportional odds of falling into or below any category for a one-unit decrease in $x$:

$$\frac{P(Y \le j \mid X = x_2)/P(Y > j \mid X = x_2)}{P(Y \le j \mid X = x_1)/P(Y > j \mid X = x_1)} = \exp\{\beta(x_1 - x_2)\}$$

  independently of $j$; hence the name Proportional Odds Model.
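The proportional odds property can be checked numerically. The sketch below is in Python rather than SAS, and the intercepts, slope and covariate values are hypothetical numbers chosen only for illustration. It assumes the cumulative form logit Pr(Y <= j) = alpha_j - beta * x and shows that the odds ratio comparing x2 with x1 is the same at every cut point.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def cumulative_odds(alpha_j, beta, x):
    """Odds of Y <= j under logit(Pr(Y <= j)) = alpha_j - beta * x."""
    p = expit(alpha_j - beta * x)
    return p / (1.0 - p)

# Hypothetical parameter values, for illustration only.
alphas = [-1.0, 0.2, 1.1, 2.5]   # one intercept per cut point j = 1..4
beta = 0.4
x1, x2 = 3.0, 1.0

# Odds ratio (x2 versus x1) at each cut point.
ratios = [cumulative_odds(a, beta, x2) / cumulative_odds(a, beta, x1)
          for a in alphas]
expected = math.exp(beta * (x1 - x2))

for j, r in enumerate(ratios, start=1):
    print(f"cut point {j}: odds ratio = {r:.4f}")
print(f"exp(beta * (x1 - x2)) = {expected:.4f}")
```

Every cut point gives the same ratio because the intercept cancels when the two odds are divided.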

Epidemiology 9521B

Logistic Regression

SAS example
proc logistic data=wcgs descending;
model typchd69 = age smoke ;
run;
Response Profile

Ordered Value    TYPCHD69    Total Frequency
1                3           51
2                2           71
3                1           134
4                0           2897

Analysis of Maximum Likelihood Estimates

Parameter      DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept 3    1     -8.0670     0.5781            194.7288           <.0001
Intercept 2    1     -7.1652     0.5672            159.5814           <.0001
Intercept 1    1     -6.3660     0.5617            128.4496           <.0001
AGE            1      0.0760     0.0114             44.5381           <.0001
smoke          1      0.6381     0.1348             22.4010           <.0001
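To see what these estimates imply, the following Python sketch converts them into odds ratios and predicted category probabilities. It rests on two assumptions that the slides do not state explicitly: that with the `descending` option "Intercept j" belongs to the cumulative logit of Pr(TYPCHD69 >= j), and that `smoke` is coded 0/1. The 50-year-old smoker is a hypothetical covariate pattern.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

# Estimates copied from the SAS output above.
# Assumption: "Intercept j" models logit Pr(TYPCHD69 >= j).
INTERCEPTS = {3: -8.0670, 2: -7.1652, 1: -6.3660}
B_AGE, B_SMOKE = 0.0760, 0.6381

def category_probs(age, smoke):
    """Predicted probability of each TYPCHD69 level (smoke assumed 0/1)."""
    eta = B_AGE * age + B_SMOKE * smoke
    at_least = {j: expit(a + eta) for j, a in INTERCEPTS.items()}
    return {
        0: 1.0 - at_least[1],
        1: at_least[1] - at_least[2],
        2: at_least[2] - at_least[3],
        3: at_least[3],
    }

# Proportional odds ratios: the same for every cut point.
or_smoke = math.exp(B_SMOKE)        # smokers versus non-smokers
or_age_10 = math.exp(10 * B_AGE)    # per 10-year increase in age

probs = category_probs(age=50, smoke=1)   # hypothetical subject
print(f"OR smoke = {or_smoke:.2f}, OR per 10 years of age = {or_age_10:.2f}")
for level, p in sorted(probs.items()):
    print(f"Pr(TYPCHD69 = {level}) = {p:.4f}")
```

Under this reading, smoking multiplies the odds of being in a higher outcome category by about 1.89, whichever cut point is used.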

5.5.6 Nominal Categorical Outcomes

Multinomial Logistic Model
- When there is no natural ordering in a categorical response, each response level follows a logistic model with a selected level specified as the reference:

$$\log\frac{\Pr(y = j)}{\Pr(y = 1)} = \alpha_j + \beta_j x,$$

  using the first level ($y = 1$) as the reference.
- The regression coefficients for each level are unique.
- The outcome represents the log relative risk rather than a log odds.
- $\beta_j$ represents the change in the log relative risk for level $j$ relative to the reference level associated with a unit increase in $x$.
- $e^{\beta_j}$ = relative risk ratio.
- This could be an attractive alternative when the proportional odds assumption is not satisfied.

SAS example
proc logistic data=wcgs ;
class typchd69 (ref=0);
model typchd69 = age smoke /link=glogit;
run;

Analysis of Maximum Likelihood Estimates

Parameter    TYPCHD69    DF    Estimate    Standard Error    Wald Chi-Square
Intercept    1           1     -6.7666     0.7582            79.6430
Intercept    2           1     -7.3720     1.0250            51.7323
Intercept    3           1     -9.0303     1.2246            54.3787
AGE          1           1      0.0692     0.0154            20.2179
AGE          2           1      0.0703     0.0208            11.4094
AGE          3           1      0.1009     0.0244            17.0809
smoke        1           1      0.7689     0.1854            17.1919
smoke        2           1      0.6417     0.2478             6.7065
smoke        3           1      0.3560     0.2848             1.5628
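The generalized-logit estimates can be turned into relative risk ratios and predicted probabilities as in the Python sketch below. It takes TYPCHD69 = 0 as the reference level, as the `ref=0` option indicates, and assumes `smoke` is coded 0/1. The covariate pattern at the end is hypothetical.

```python
import math

# (intercept, AGE, smoke) for each non-reference level of TYPCHD69,
# copied from the SAS output above. Level 0 is the reference.
COEF = {
    1: (-6.7666, 0.0692, 0.7689),
    2: (-7.3720, 0.0703, 0.6417),
    3: (-9.0303, 0.1009, 0.3560),
}

def multinomial_probs(age, smoke):
    """Predicted probability of each level from the generalized-logit fit."""
    rel = {0: 1.0}   # exp(0) for the reference level
    for level, (a, b_age, b_smoke) in COEF.items():
        rel[level] = math.exp(a + b_age * age + b_smoke * smoke)
    total = sum(rel.values())
    return {level: r / total for level, r in rel.items()}

# Relative risk ratios for smoking, one per non-reference level.
rrr_smoke = {level: math.exp(c[2]) for level, c in COEF.items()}

probs = multinomial_probs(age=50, smoke=1)   # hypothetical subject
for level, rrr in rrr_smoke.items():
    print(f"level {level} vs 0: RRR for smoke = {rrr:.2f}")
for level, p in sorted(probs.items()):
    print(f"Pr(TYPCHD69 = {level}) = {p:.4f}")
```

Unlike the proportional odds fit, the smoking effect here differs by outcome level, which is what the level-specific coefficients allow.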

5.7 Sample Size Calculation


Sample size calculation for two-sided tests of $H_0: \beta_j = 0$ with type 1 error $\alpha$ to achieve power $\gamma$:

For linear regression,

$$n = \frac{(z_{1-\alpha/2} + z_\gamma)^2 \, \sigma^2_{y|X}}{(\beta_j^a)^2 \, \sigma^2_{x_j} \, (1 - \rho_j^2)}$$

For logistic regression,

$$n = \frac{(z_{1-\alpha/2} + z_\gamma)^2}{(\beta_j^a)^2 \, \sigma^2_{x_j} \, p(1 - p)(1 - \rho_j^2)}$$

- $\beta_j^a$ is the hypothesized value of $\beta_j$ under the alternative,
- $z_\gamma$ is the $100\gamma$th percentile of the standard normal distribution,
- $\sigma_{x_j}$ is the standard deviation of $X_j$,
- $\rho_j$ is the multiple correlation of $X_j$ with the other covariates,
- $p$ is the marginal prevalence of the outcome.
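A minimal Python sketch of the logistic regression sample size formula follows. The function name and the example inputs (an odds ratio of 2 for a binary exposure with 50% prevalence, outcome prevalence 0.3) are hypothetical and not taken from the lecture.

```python
import math
from statistics import NormalDist

def logistic_sample_size(beta_a, sd_x, p, rho2=0.0, alpha=0.05, power=0.80):
    """Sample size for a two-sided test of H0: beta_j = 0.

    beta_a : hypothesized log odds ratio per unit of X_j
    sd_x   : standard deviation of X_j
    p      : marginal prevalence of the outcome
    rho2   : squared multiple correlation of X_j with the other covariates
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_gamma = z.inv_cdf(power)
    n = (z_alpha + z_gamma) ** 2 / (
        beta_a ** 2 * sd_x ** 2 * p * (1 - p) * (1 - rho2)
    )
    return math.ceil(n)   # round up to a whole subject

# Hypothetical scenario, for illustration only.
beta_a = math.log(2.0)            # odds ratio of 2
sd_x = math.sqrt(0.5 * 0.5)       # binary exposure, prevalence 0.5
n_simple = logistic_sample_size(beta_a, sd_x, p=0.3)
n_adjusted = logistic_sample_size(beta_a, sd_x, p=0.3, rho2=0.3)
print(n_simple, n_adjusted)
```

This normal-approximation formula will not generally match the output of `proc power`, which uses different approximations.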

5.7 Sample Size, Power, and Detectable Effects


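For predetermined $n$, power can be obtained by

$$\gamma = 1 - \Phi\left[ z_{1-\alpha/2} - |\beta_j^a| \, \sigma_{x_j} \sqrt{n \, p(1 - p)(1 - \rho_j^2)} \right]$$

where $\Phi$ is the standard normal cumulative distribution function.

Minimum detectable effect (on the log-odds scale) by

$$\pm\beta_j^a = \frac{z_{1-\alpha/2} + z_\gamma}{\sigma_{x_j} \sqrt{n \, p(1 - p)(1 - \rho_j^2)}}$$

For $\alpha = 0.05$ and 80% power ($\gamma = 0.8$),

$$z_{1-\alpha/2} = 1.96, \qquad z_\gamma = 0.842$$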
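The power and minimum detectable effect formulas can be sketched in Python as below. The function names and the scenario (n = 400, outcome prevalence 0.3, a binary exposure with 50% prevalence) are hypothetical. The final lines check that the power evaluated at the minimum detectable effect returns the nominal 80%.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def logistic_power(beta_a, sd_x, n, p, rho2=0.0, alpha=0.05):
    """Power of a two-sided test of H0: beta_j = 0 with n subjects."""
    shift = abs(beta_a) * sd_x * math.sqrt(n * p * (1 - p) * (1 - rho2))
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha / 2) - shift)

def min_detectable_effect(sd_x, n, p, rho2=0.0, alpha=0.05, power=0.80):
    """Smallest |beta_j| (log-odds scale) detectable with the given power."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_sum / (sd_x * math.sqrt(n * p * (1 - p) * (1 - rho2)))

# Quantiles quoted on the slide.
z_alpha = Z.inv_cdf(0.975)
z_gamma = Z.inv_cdf(0.80)

# Hypothetical scenario, for illustration only.
mde = min_detectable_effect(sd_x=0.5, n=400, p=0.3)
power_at_mde = logistic_power(mde, sd_x=0.5, n=400, p=0.3)
print(f"z_alpha = {z_alpha:.3f}, z_gamma = {z_gamma:.3f}")
print(f"MDE (log odds) = {mde:.3f}, odds ratio = {math.exp(mde):.2f}")
print(f"power at MDE = {power_at_mde:.3f}")
```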

5.7 Sample Size, Power, and Detectable Effects


- When $X_j$ is binary with prevalence $f_j$, $\sigma_{x_j} = \sqrt{f_j(1 - f_j)}$.
- When $X_j$ is continuous with standard deviation $\sigma_{x_j}$, it is important to recognize that sample size, power and detectable effect do not depend on the units in which $X_j$ is measured, provided $\beta_j^a$ and $\sigma_{x_j}$ are expressed in the same units: only their product enters the formulas.

Variance inflation factor (VIF):

$$\text{VIF} = \frac{\operatorname{var}_p(\hat\beta_j)}{\operatorname{var}_1(\hat\beta_j)} = \frac{1}{1 - \rho_j^2},$$

where $\operatorname{var}_1(\hat\beta_j)$ is the variance of $\hat\beta_j$ in the simple model and $\operatorname{var}_p(\hat\beta_j)$ is the variance of $\hat\beta_j$ in the multivariable setting with $p$ covariates.

$\rho_j^2$ is known as $R^2$, i.e., the proportion of the variance of $x_j$ explained by the regression relationship with the other covariates.

The sample size required for a logistic regression model with $p$ covariates can be obtained as

$$n_p = n_1 \times \text{VIF},$$

where $n_1$ is obtained from a simple logistic regression model.
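Both points above, unit invariance and the VIF adjustment, can be verified with a few lines of Python. All inputs are hypothetical: a continuous predictor measured in years (SD 5, log odds ratio 0.12 per year), then re-expressed in months, and a squared multiple correlation of 0.3.

```python
import math

def n_logistic(beta_a, sd_x, p, rho2=0.0, z_sum=1.96 + 0.842):
    """Unrounded sample size from the logistic regression formula
    (alpha = 0.05 two-sided, 80% power by default)."""
    return z_sum ** 2 / (beta_a ** 2 * sd_x ** 2 * p * (1 - p) * (1 - rho2))

# Hypothetical predictor measured in years ...
n_years = n_logistic(beta_a=0.12, sd_x=5.0, p=0.2)
# ... and the same predictor measured in months: beta shrinks by 12,
# the standard deviation grows by 12, and their product is unchanged.
n_months = n_logistic(beta_a=0.12 / 12, sd_x=5.0 * 12, p=0.2)

# VIF adjustment: n_p = n_1 * VIF with VIF = 1 / (1 - rho^2).
rho2 = 0.3
vif = 1 / (1 - rho2)
n_adjusted = n_logistic(beta_a=0.12, sd_x=5.0, p=0.2, rho2=rho2)

print(f"n (years) = {n_years:.1f}, n (months) = {n_months:.1f}")
print(f"VIF = {vif:.5f}, n1 * VIF = {n_years * vif:.1f}, direct = {n_adjusted:.1f}")
```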

5.7 Sample Size, Power, and Detectable Effects


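Using published results, given the published sample size $\tilde{n}$ and standard error $\widetilde{SE}(\hat\beta_j)$:

Sample size

$$n = \frac{(z_{1-\alpha/2} + z_\gamma)^2 \, \tilde{n} \, \widetilde{SE}^2(\hat\beta_j)}{(\beta_j^a)^2}$$

Power

$$\gamma = 1 - \Phi\left[ z_{1-\alpha/2} - \frac{|\beta_j^a|}{\sqrt{\tilde{n}/n} \; \widetilde{SE}(\hat\beta_j)} \right]$$

Minimum detectable effect (on the log-odds scale)

$$\pm\beta_j^a = (z_{1-\alpha/2} + z_\gamma) \sqrt{\tilde{n}/n} \; \widetilde{SE}(\hat\beta_j)$$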
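The three calculations based on a published standard error can be sketched in Python as follows. The function names and the inputs (a published study of 500 subjects reporting a standard error of 0.25 for the log odds ratio, and a target log odds ratio of log 1.5) are hypothetical. The logic is that the standard error in a new study of size n is approximated by the published one scaled by the square root of n_pub / n.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def n_from_published(beta_a, n_pub, se_pub, alpha=0.05, power=0.80):
    """Required n, scaling the published SE of beta_j to a new sample size."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_sum ** 2 * n_pub * se_pub ** 2 / beta_a ** 2

def power_from_published(beta_a, n, n_pub, se_pub, alpha=0.05):
    """Power with n subjects; the projected SE is se_pub * sqrt(n_pub / n)."""
    se_new = se_pub * math.sqrt(n_pub / n)
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha / 2) - abs(beta_a) / se_new)

def mde_from_published(n, n_pub, se_pub, alpha=0.05, power=0.80):
    """Minimum detectable |beta_j| on the log-odds scale with n subjects."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_sum * se_pub * math.sqrt(n_pub / n)

# Hypothetical published result, for illustration only.
beta_a, n_pub, se_pub = math.log(1.5), 500, 0.25
n_req = n_from_published(beta_a, n_pub, se_pub)
print(f"required n = {math.ceil(n_req)}")
print(f"power at that n = {power_from_published(beta_a, n_req, n_pub, se_pub):.3f}")
print(f"MDE at that n = {mde_from_published(n_req, n_pub, se_pub):.4f}")
```

The three functions are mutually consistent: the power at the required n is the nominal 80%, and the minimum detectable effect at that n is the hypothesized effect.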

5.7 Sample Size, Power, and Detectable Effects

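Note that
- you have to use the SE of $\hat\beta_j$, not the SE of the odds ratio $e^{\hat\beta_j}$.
- Using the 95% confidence interval $(L, U)$ of the odds ratio $e^{\hat\beta_j}$, we can obtain

$$\widetilde{SE}(\hat\beta_j) = \frac{\log(U/L)}{3.92},$$

  where $3.92 = 2 \times 1.96$ is the width of a 95% interval in standard error units on the log-odds scale.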
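A short Python sketch of this conversion is given below. The function name and the confidence interval (1.2, 2.7) are hypothetical. The function uses the exact normal quantile, so for a 95% interval the divisor is 3.9199 rather than the rounded 3.92.

```python
import math
from statistics import NormalDist

def se_from_or_ci(lower, upper, level=0.95):
    """Standard error of the log odds ratio recovered from a Wald
    confidence interval (lower, upper) reported for the odds ratio."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.log(upper / lower) / (2 * z)

# Hypothetical published interval, for illustration only.
se = se_from_or_ci(1.2, 2.7)
print(f"SE of the log odds ratio = {se:.4f}")
```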

SAS: proc power

proc power;
twosamplefreq test=pchi
oddsratio=2
refproportion=0.25
npergroup=.
power= 0.8;
run;


SAS: proc power


The POWER Procedure
Pearson Chi-square Test for Two Proportions

Fixed Scenario Elements
Distribution                      Asymptotic normal
Method                            Normal approximation
Reference (Group 1) Proportion    0.25
Odds Ratio                        2
Nominal Power                     0.8
Number of Sides                   2
Null Odds Ratio                   1
Alpha                             0.05

Computed N Per Group
Actual Power    N Per Group
0.800           152

SAS: proc power

proc power;
logistic
alpha = 0.05
vardist(Duration) = normal(4, 1.5)
testpredictor = Duration
testoddsratio = 1.7
responseprob = 0.65
ntotal = .
power = 0.8 ;
run;


SAS: proc power


The POWER Procedure
Likelihood Ratio Chi-Square Test for One Predictor

Fixed Scenario Elements
Method                           Shieh-O'Brien approximation
Alpha                            0.05
Response Probability             0.65
Test Predictor                   Duration
Odds Ratio for Test Predictor    1.7
Unit for Test Pred Odds Ratio    1
Nominal Power                    0.8
Total Number of Bins             10

Computed N Total
Actual Power    N Total
0.805           70

SAS: proc power

data adjn;
rho2 = 0.3;
vif = 1/(1-rho2);
n1 = 70;
n = n1*vif;
run;
proc print;
run;
Obs    rho2    vif        n1    n
1      0.3     1.42857    70    100
