Supervised learning
$y \in \{0, 1\}$
Figure 2: Classification as linear regression (1)
You can classify as follows:
if $h_\theta(x) \ge 0.5$, predict $y = 1$;
if $h_\theta(x) < 0.5$, predict $y = 0$.
Figure 3: Classification as linear regression (2)
Classification :: Logistic regression
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} \tag{1}$$
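A minimal numpy sketch of the hypothesis in (1) (the names `sigmoid` and `h` are my own; $X$ is assumed to carry a leading column of ones for the intercept):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    # Hypothesis (1): h_theta(x) = g(theta^T x) for each row x of X.
    # X has shape (n, m+1), with x_{i0} = 1; theta has shape (m+1,).
    return sigmoid(X @ theta)
```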
Figure 4: Sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$
Interpretation of $h_\theta(x)$
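The usual reading (standard for logistic regression, though the detail is lost in this extract): $h_\theta(x)$ is the estimated probability that $y = 1$ on input $x$, i.e. $h_\theta(x) = P(y = 1 \mid x; \theta)$, and hence $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$.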
Linear Decision Boundary
Figure 5: Linear decision boundary
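For a concrete illustration (the parameter values here are my own, chosen for the example): with $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$ and $\theta = (-3, 1, 1)^T$, we predict $y = 1$ exactly when $\theta^T x \ge 0$, i.e. $x_1 + x_2 \ge 3$; the line $x_1 + x_2 = 3$ is the decision boundary.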
Figure 6: Non-linear decision boundary
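An illustrative non-linear case (again, the parameter values are my own): adding quadratic features, $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$ with $\theta = (-1, 0, 0, 1, 1)^T$ predicts $y = 1$ when $x_1^2 + x_2^2 \ge 1$, so the decision boundary is the unit circle.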
More complicated decision boundary
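With higher-order polynomial features, the same mechanism can produce arbitrarily complex boundaries, at the cost of a growing risk of overfitting (addressed by regularization later in these notes).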
How to choose parameters? :: Cost function
Notation: $J(\theta) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Cost}(h_\theta(x_i), y_i)$

Linear regression:
$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Cost}(h_\theta(x_i), y_i) = \frac{1}{n}\sum_{i=1}^{n} \tfrac{1}{2}\left(h_\theta(x_i) - y_i\right)^2.$$
Logistic regression: using the squared error function does not guarantee finding a global minimum, because of the non-linearity of
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}:$$
$J(\theta)$ would be non-convex, with many local minima.
How to choose parameters? :: Cost function (2)
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases} \tag{2}$$
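A sketch of cost (2) in numpy, averaged over the training set as in the notation for $J(\theta)$ above (the `eps` clipping is my own addition to avoid $\log 0$):

```python
import numpy as np

def cost(theta, X, y):
    # Cross-entropy cost (2): -log(h) when y = 1, -log(1 - h) when y = 0,
    # averaged over the n training examples.
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))    # h_theta(x_i) for all i
    eps = 1e-12                               # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```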
How to choose parameters?
How to choose parameters? :: Gradient descent
$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n} y_i \log(h_\theta(x_i)) + (1 - y_i)\log(1 - h_\theta(x_i))\right] \tag{3}$$

Goal: $\min_\theta J(\theta)$.

Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} \tag{4}$$
}
How to choose parameters? :: Gradient descent (2)

Repeat {
$$\theta_j := \theta_j - \alpha \sum_{i=1}^{n} \left(h_\theta(x_i) - y_i\right) x_{ij} \tag{5}$$
}

Have you already met this update? Yes, in linear regression. But $h_\theta(x)$ is now the sigmoid of $\theta^T x$ rather than a linear function, so the two algorithms are not the same.
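A minimal batch gradient descent sketch implementing update (5) (names are my own; the $1/n$ factor the slides omit is applied here for numerical stability, which is equivalent to rescaling $\alpha$):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # Simultaneously updates all theta_j via (5).
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x_i) for all i
        grad = X.T @ (p - y) / len(y)            # (1/n) * sum_i (h - y_i) x_{ij}
        theta -= alpha * grad
    return theta
```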
Logistic regression :: Multi-class classification
One-vs-all
Figure 8: One-vs-all
One-vs-all (2)
New instance $x$: predict the class $i$ whose classifier is most confident, i.e. $\hat{y} = \arg\max_i h_\theta^{(i)}(x)$.
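A sketch of one-vs-all built on the `gradient_descent` helper above (function names are my own):

```python
import numpy as np

def one_vs_all_train(X, y, classes, alpha=0.1, iters=1000):
    # Fit one binary logistic classifier per class: label 1 for class c, 0 otherwise.
    return {c: gradient_descent(X, (y == c).astype(float), alpha, iters)
            for c in classes}

def one_vs_all_predict(thetas, x):
    # Pick the class whose classifier reports the highest h_theta(x).
    scores = {c: 1.0 / (1.0 + np.exp(-(x @ th))) for c, th in thetas.items()}
    return max(scores, key=scores.get)
```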
Addressing overfitting :: Regularization :: Cost function
$$J(\theta) = -\frac{1}{n}\left[\sum_{i=1}^{n} y_i \log(h_\theta(x_i)) + (1 - y_i)\log(1 - h_\theta(x_i))\right] + \frac{\lambda}{2m}\sum_{j=1}^{m} \theta_j^2$$

Repeat {
$$\theta_0 := \theta_0 - \alpha \sum_{i=1}^{n} \left(h_\theta(x_i) - y_i\right) x_{i0} \tag{6}$$
$$\theta_j := \theta_j - \alpha \left[\sum_{i=1}^{n} \left(h_\theta(x_i) - y_i\right) x_{ij} + \frac{\lambda}{m}\,\theta_j\right] \qquad (j \ge 1) \tag{7}$$
}
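A regularized variant of the earlier descent sketch, following updates (6) and (7) (the intercept $\theta_0$ is left unpenalized; dividing $\lambda$ by the parameter count mirrors the $\lambda/m$ of (7) under the assumption that $m$ counts features):

```python
import numpy as np

def gradient_descent_reg(X, y, lam=1.0, alpha=0.1, iters=1000):
    # Regularized logistic regression via updates (6)-(7).
    n, m1 = X.shape                      # n examples, m1 = m + 1 parameters
    theta = np.zeros(m1)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (p - y) / n         # data term, scaled by 1/n
        reg = (lam / m1) * theta         # shrinkage term ~ (lambda/m) * theta_j
        reg[0] = 0.0                     # theta_0 is not regularized, per (6)
        theta -= alpha * (grad + reg)
    return theta
```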
For more details, refer to