Logistic Regression
Jia Li
Department of Statistics
The Pennsylvania State University
Email: jiali@stat.psu.edu
http://www.stat.psu.edu/jiali
- Goal: preserve linear classification boundaries.
- By the Bayes rule, classify to the most probable class, so the boundary between classes k and l is the set of x where

  \Pr(G = k \mid X = x) = \Pr(G = l \mid X = x) .
Assumptions
\log \frac{\Pr(G = 1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{10} + \beta_1^T x

\log \frac{\Pr(G = 2 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{20} + \beta_2^T x

\vdots

\log \frac{\Pr(G = K-1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{(K-1)0} + \beta_{K-1}^T x
- For any pair of classes k and l:

  \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \beta_{k0} - \beta_{l0} + (\beta_k - \beta_l)^T x .

- Number of parameters: (K-1)(p+1).
- Denote the entire parameter set by \theta = \{\beta_{10}, \beta_1, \beta_{20}, \beta_2, \ldots, \beta_{(K-1)0}, \beta_{K-1}\}.
- The model implies

  \Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, \ldots, K-1,

  \Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)} .
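These probabilities are easy to compute directly; a minimal numpy sketch (function and variable names are mine, with row l of B holding (β_{l0}, β_l^T)):

```python
import numpy as np

def class_probs(x, B):
    # B: (K-1) x (p+1) array; row l holds (beta_l0, beta_l^T).
    # x: length-p input. Returns the length-K vector of posteriors.
    x1 = np.concatenate(([1.0], x))   # prepend 1 for the intercept
    scores = B @ x1                   # beta_l0 + beta_l^T x, l = 1..K-1
    e = np.exp(scores)
    denom = 1.0 + e.sum()             # shared denominator
    return np.concatenate((e, [1.0])) / denom   # class K gets numerator 1
```

By construction the K entries are positive and sum to one, which is the point of the parameterization.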
- Similarities:
  - Both attempt to estimate \Pr(G = k \mid X = x).
  - Both have linear classification boundaries.
- Difference:
  - Linear regression on the indicator matrix approximates \Pr(G = k \mid X = x) by a linear function of x; the fitted values are not guaranteed to fall between 0 and 1, nor to sum to 1.
  - In logistic regression, \Pr(G = k \mid X = x) is a nonlinear function of x; it is guaranteed to range between 0 and 1 and to sum to 1.
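The difference shows up on a toy 1-D example (synthetic data, not from the slides): least squares on a 0/1 indicator can produce fitted "probabilities" above 1, while the logistic transform of any linear score stays strictly inside (0, 1). Here the same least-squares coefficients are reused inside the sigmoid purely for illustration, not as the logistic MLE:

```python
import numpy as np

# Tiny synthetic two-class sample with one outlying x value.
x = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])   # indicator of class 1

# Linear regression on the indicator: fitted values are unbounded.
X = np.column_stack((np.ones_like(x), x))
a, b = np.linalg.lstsq(X, y, rcond=None)[0]
lin_fit = a + b * x                        # exceeds 1 at x = 10

# The logistic transform of the same linear score is bounded.
logit_fit = 1.0 / (1.0 + np.exp(-(a + b * x)))
```

After running, `lin_fit.max()` is above 1 while every entry of `logit_fit` lies strictly in (0, 1).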
Binary Classification
- If y_i = 1, i.e., g_i = 1, the i-th term of the log-likelihood is \log p(x_i; \beta).
- If y_i = 0, i.e., g_i = 2, the i-th term is \log(1 - p(x_i; \beta)).
p(x; \beta) = \Pr(G = 1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)}

1 - p(x; \beta) = \Pr(G = 2 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}
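A numerically stable way to evaluate p(x; β) is via log(1 + e^s); a small sketch (the name `p_binary` is mine, and x is assumed to carry a leading 1 so the intercept is absorbed into β):

```python
import numpy as np

def p_binary(x, beta):
    # p(x; beta) = e^{beta^T x} / (1 + e^{beta^T x}),
    # computed as exp(s - log(1 + e^s)) to avoid overflow for large s.
    s = beta @ x
    return np.exp(s - np.logaddexp(0.0, s))

# Pr(G = 2 | X = x) is simply the complement 1 - p_binary(x, beta).
```

With `beta = 0` this returns 0.5, and even for very large scores it stays finite instead of overflowing in `exp`.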
- The element on the jth row and nth column (counting from 0) is:

  \frac{\partial^2 L(\beta)}{\partial \beta_j \, \partial \beta_n}
  = -\sum_{i=1}^{N} \frac{(1 + e^{\beta^T x_i}) e^{\beta^T x_i} x_{ij} x_{in} - (e^{\beta^T x_i})^2 x_{ij} x_{in}}{(1 + e^{\beta^T x_i})^2}
  = -\sum_{i=1}^{N} \left( x_{ij} x_{in} \, p(x_i; \beta) - x_{ij} x_{in} \, p(x_i; \beta)^2 \right)
  = -\sum_{i=1}^{N} x_{ij} x_{in} \, p(x_i; \beta)(1 - p(x_i; \beta)) .
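The closed form can be verified numerically against finite differences of the log-likelihood on synthetic data (data, seed, and names below are mine, assuming each x_i carries a leading 1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack((np.ones(20), rng.normal(size=(20, 2))))  # rows x_i^T
y = rng.integers(0, 2, size=20).astype(float)
beta = 0.5 * rng.normal(size=3)

def loglik(b):
    # L(beta) = sum_i [ y_i beta^T x_i - log(1 + e^{beta^T x_i}) ]
    s = X @ b
    return float(y @ s - np.logaddexp(0.0, s).sum())

p = 1.0 / (1.0 + np.exp(-(X @ beta)))
H_closed = -(X.T * (p * (1.0 - p))) @ X   # -sum_i x_ij x_in p_i (1 - p_i)

# Central finite differences of L for each entry (j, n).
eps = 1e-4
H_num = np.zeros((3, 3))
for j in range(3):
    for n in range(3):
        ej = eps * np.eye(3)[j]
        en = eps * np.eye(3)[n]
        H_num[j, n] = (loglik(beta + ej + en) - loglik(beta + ej - en)
                       - loglik(beta - ej + en) + loglik(beta - ej - en)) / (4 * eps ** 2)
```

The two matrices agree to roughly the accuracy of the difference scheme, confirming the sign and the p(1 - p) weighting.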
Pseudo Code
1. \beta \leftarrow 0.
2. Compute y by setting its elements to
   y_i = 1 if g_i = 1, and y_i = 0 if g_i = 2, for i = 1, 2, \ldots, N.
3. Compute p by setting its elements to
   p(x_i; \beta) = \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}, \quad i = 1, 2, \ldots, N.
4. Compute the diagonal matrix W whose ith diagonal element is
   p(x_i; \beta)(1 - p(x_i; \beta)), i = 1, 2, \ldots, N.
5. z \leftarrow X\beta + W^{-1}(y - p).
6. \beta \leftarrow (X^T W X)^{-1} X^T W z.
7. If the stopping criterion is met, stop; otherwise go back to step 3.
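The steps above can be sketched as a short numpy implementation (a minimal sketch with names of my choosing; no safeguards against perfect separation or non-convergence):

```python
import numpy as np

def irls_logistic(X, g, n_iter=25, tol=1e-8):
    """Steps 1-7 above. X is N x (p+1) with a leading column of ones;
    g holds class labels in {1, 2}. Returns the fitted beta."""
    beta = np.zeros(X.shape[1])               # 1. beta <- 0
    y = (g == 1).astype(float)                # 2. indicator response
    for _ in range(n_iter):
        s = X @ beta
        p = 1.0 / (1.0 + np.exp(-s))          # 3. fitted probabilities
        w = p * (1.0 - p)                     # 4. diagonal of W
        z = s + (y - p) / w                   # 5. z <- X beta + W^{-1}(y - p)
        beta_new = np.linalg.solve((X.T * w) @ X, (X.T * w) @ z)  # 6. weighted LS
        if np.max(np.abs(beta_new - beta)) < tol:                 # 7. stopping rule
            return beta_new
        beta = beta_new
    return beta
```

At convergence the score equations X^T(y - p) = 0 hold, which is a convenient check on the fit.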
Computational Efficiency
1. \beta \leftarrow 0.
2. Compute y by setting its elements to
   y_i = 1 if g_i = 1, and y_i = 0 if g_i = 2, for i = 1, 2, \ldots, N.
3. Compute p by setting its elements to
   p(x_i; \beta) = \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}, \quad i = 1, 2, \ldots, N.
4. Compute the N \times (p+1) matrix \tilde{X} by multiplying the ith row of matrix X by p(x_i; \beta)(1 - p(x_i; \beta)), i = 1, 2, \ldots, N:

   X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{pmatrix}, \qquad
   \tilde{X} = \begin{pmatrix} p(x_1; \beta)(1 - p(x_1; \beta)) \, x_1^T \\ p(x_2; \beta)(1 - p(x_2; \beta)) \, x_2^T \\ \vdots \\ p(x_N; \beta)(1 - p(x_N; \beta)) \, x_N^T \end{pmatrix}

5. \beta \leftarrow \beta + (X^T \tilde{X})^{-1} X^T (y - p).
6. If the stopping criterion is met, stop; otherwise go back to step 3.
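One update of this streamlined iteration can be sketched in numpy; since X^T \tilde{X} = X^T W X, it is algebraically identical to the explicit W/z step of the basic pseudo code (function names are mine):

```python
import numpy as np

def irls_step_fast(X, beta, y):
    """beta <- beta + (X^T Xt)^{-1} X^T (y - p), where row i of Xt is
    p_i (1 - p_i) x_i^T. Avoids forming the diagonal W and the adjusted
    response z explicitly."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    Xt = X * (p * (1.0 - p))[:, None]   # scale row i by p_i (1 - p_i)
    return beta + np.linalg.solve(X.T @ Xt, X.T @ (y - p))
```

The saving is purely organizational: one row-scaling of X replaces building W and z, while the solved linear system is the same.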
Example
\Pr(G = 1 \mid X = x) = \frac{e^{0.7679 - 0.6816 X_1 - 0.3664 X_2}}{1 + e^{0.7679 - 0.6816 X_1 - 0.3664 X_2}}

\Pr(G = 2 \mid X = x) = \frac{1}{1 + e^{0.7679 - 0.6816 X_1 - 0.3664 X_2}}

- The classification rule is:

  \hat{G}(x) = \begin{cases} 1 & 0.7679 - 0.6816 X_1 - 0.3664 X_2 \geq 0 \\ 2 & 0.7679 - 0.6816 X_1 - 0.3664 X_2 < 0 \end{cases}
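Taking the fitted coefficients as (0.7679, -0.6816, -0.3664), the rule reduces to the sign of a single linear score:

```python
# Two-class rule from the example; predict class 1 when the
# linear score is nonnegative, class 2 otherwise.
def classify(x1, x2):
    score = 0.7679 - 0.6816 * x1 - 0.3664 * x2
    return 1 if score >= 0 else 2
```

For instance, the origin (0, 0) has score 0.7679 and is assigned to class 1, while (2, 2) has a negative score and goes to class 2.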
- Within the training data set, the classification error rate is 28.12%.
- Sensitivity: 45.9%.
- Specificity: 85.8%.
Multiclass Case (K \geq 3)
- When K \geq 3, \beta is a (K-1)(p+1)-vector:

  \beta = \begin{pmatrix} \bar{\beta}_1 \\ \bar{\beta}_2 \\ \vdots \\ \bar{\beta}_{K-1} \end{pmatrix}
  = \begin{pmatrix} \beta_{10} \\ \beta_{11} \\ \vdots \\ \beta_{1p} \\ \beta_{20} \\ \vdots \\ \beta_{2p} \\ \vdots \\ \beta_{(K-1)0} \\ \vdots \\ \beta_{(K-1)p} \end{pmatrix}
- Let \bar{\beta}_l = \begin{pmatrix} \beta_{l0} \\ \beta_l \end{pmatrix}, with the convention \bar{\beta}_K = 0 for the reference class K, and x_i augmented with a leading 1.
- The likelihood function becomes

  L(\beta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \beta)
  = \sum_{i=1}^{N} \log \left( \frac{e^{\bar{\beta}_{g_i}^T x_i}}{1 + \sum_{l=1}^{K-1} e^{\bar{\beta}_l^T x_i}} \right)
  = \sum_{i=1}^{N} \left[ \bar{\beta}_{g_i}^T x_i - \log \left( 1 + \sum_{l=1}^{K-1} e^{\bar{\beta}_l^T x_i} \right) \right]
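This log-likelihood can be evaluated directly; a minimal numpy sketch (function and argument names are mine, assuming each x_i carries a leading 1 and the reference class K has \bar{\beta}_K = 0):

```python
import numpy as np

def multiclass_loglik(B, X, g):
    """L(beta) = sum_i [ betabar_{g_i}^T x_i - log(1 + sum_l e^{betabar_l^T x_i}) ].

    B: (K-1) x (p+1) array whose row l is betabar_l (reference class K
    implicitly has betabar_K = 0). X: N x (p+1) with leading ones.
    g: labels in {1, ..., K}."""
    S = X @ B.T                                       # scores betabar_l^T x_i
    log_denom = np.log1p(np.exp(S).sum(axis=1))       # log(1 + sum_l e^{...})
    S_full = np.column_stack((S, np.zeros(len(X))))   # score 0 for class K
    own = S_full[np.arange(len(X)), np.asarray(g) - 1]  # betabar_{g_i}^T x_i
    return float((own - log_denom).sum())
```

A quick sanity check: with B = 0 every class has probability 1/K, so the value is N log(1/K) regardless of the data.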
- The first-order partial derivatives are

  \frac{\partial L(\beta)}{\partial \beta_{kj}} = \sum_{i=1}^{N} x_{ij} \left( I(g_i = k) - p_k(x_i; \beta) \right)
- Matrix form.
- y is the concatenated indicator vector of dimension N(K-1):

  y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{K-1} \end{pmatrix}, \qquad
  y_k = \begin{pmatrix} I(g_1 = k) \\ I(g_2 = k) \\ \vdots \\ I(g_N = k) \end{pmatrix}, \quad 1 \leq k \leq K-1 .

- p is the concatenated vector of fitted probabilities of dimension N(K-1):

  p = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_{K-1} \end{pmatrix}, \qquad
  p_k = \begin{pmatrix} p_k(x_1; \beta) \\ p_k(x_2; \beta) \\ \vdots \\ p_k(x_N; \beta) \end{pmatrix}, \quad 1 \leq k \leq K-1 .
\frac{\partial L(\beta)}{\partial \beta} = X^T (y - p), \qquad
\frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -X^T W X .

- The formula for updating \beta^{new} in the binary classification case holds for the multiclass case, with y, p, W, and the design matrix redefined as above.
Computation Issues
Simulation
LDA Result
\hat{\beta} = (0.3288, -1.3275)^T .

\Pr(G = 1 \mid X = x) = \frac{e^{0.3288 - 1.3275 x}}{1 + e^{0.3288 - 1.3275 x}} .
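Taking the fitted coefficients as (0.3288, -1.3275), the one-dimensional decision boundary sits where the linear score crosses zero, i.e. where the two classes are equally likely:

```python
import numpy as np

# Fitted model: Pr(G=1 | X=x) = e^{0.3288 - 1.3275 x} / (1 + e^{0.3288 - 1.3275 x}).
p1 = lambda x: 1.0 / (1.0 + np.exp(-(0.3288 - 1.3275 * x)))

# Pr(G=1 | X=x) = 1/2 exactly where 0.3288 - 1.3275 x = 0.
x_star = 0.3288 / 1.3275   # about 0.2477
```

Points with x below x_star are classified to class 1, points above it to class 2.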