
K. V. Vorontsov
Machine Learning (lecture notes)
http://www.ccas.ru/voron
voron@ccas.ru
Comments, suggestions and corrections are welcome at vokov@forecsys.ru.
Supplementary materials (surveys, links, case studies, etc.) are collected on the wiki resource www.MachineLearning.ru.

Contents

1 — p. 4 (1.1, subsections 1.1.1–1.1.6, pp. 4–8; 1.2, subsections 1.2.1–1.2.7, pp. 9–16)
2 — p. 18 (2.1, pp. 18–21; 2.2, pp. 22–23; 2.3, pp. 25–29; 2.4, pp. 32–39, incl. 2.4.1 the EM algorithm)
3 — p. 42 (3.1, pp. 42–45; 3.2, pp. 47–48, incl. 3.2.2 the STOLP algorithm)
4 — p. 51 (4.1–4.5, pp. 51–73; 4.6 ROC curves, p. 78)
5 — p. 80 (5.1–5.6, pp. 80–99)
6 — p. 102 (6.1–6.2, pp. 102–110)
7 — p. 113 (7.1–7.3, pp. 113–133)

,
. .

1.1

Let X be a set of objects and Y a set of answers, and let y*: X → Y be an unknown target function whose values y_i = y*(x_i) are known only on a finite set of objects {x_1, . . . , x_ℓ} ⊂ X. The pairs "object–answer" (x_i, y_i) are called precedents, and the set X^ℓ = (x_i, y_i)_{i=1}^{ℓ} is called the training sample. The task is to recover from X^ℓ the dependence y*, that is, to build a decision function a: X → Y that approximates the target function y*(x), and not only on the objects of the training sample but on the whole set X. The function a must also admit an efficient computer implementation; for this reason it is also called an algorithm.
1.1.1

A feature f of an object x is the result of measuring some characteristic of the object; formally, a mapping f: X → D_f, where D_f is the set of admissible values of the feature. Depending on the nature of D_f, features fall into several types:
- if D_f = {0, 1}, then f is a binary feature;
- if D_f is a finite set, then f is a nominal feature;
- if D_f is a finite ordered set, then f is an ordinal feature;
- if D_f = R, then f is a quantitative feature.
, Df1 = = Dfn ,
, .

f1 , . . . , fn . f1 (x), . . . , fn (x)
x X. X , X = Df1 . . . Dfn .
X , n, :

F = ||f_j(x_i)||_{ℓ×n} = ( f_1(x_1) . . . f_n(x_1); . . . ; f_1(x_ℓ) . . . f_n(x_ℓ) ).   (1.1)
.

1.1.2

The type of the task is determined by the set of answers Y.
- If Y = {1, . . . , M}, this is classification (classification) into M disjoint classes: X splits into the classes K_y = {x ∈ X : y*(x) = y}, and a(x) answers the question "which class does x belong to?". This setting is also called pattern recognition.
- If Y = {0, 1}^M, this is classification into M classes that may overlap.
- If Y = R, this is regression estimation.
- Forecasting is a special case in which the object x ∈ X describes the past behaviour of a process and the answer y ∈ Y its future behaviour.
1.1.3

. 1.1. A = {g(x, ) | }, g : X Y ,
, (search space).
1.1. n fj : X R, j = 1, . . . , n
= (1 , . . . , n ) = Rn :
g(x, θ) = Σ_{j=1}^{n} θ_j f_j(x),   Y = R;

g(x, θ) = sign( Σ_{j=1}^{n} θ_j f_j(x) ),   Y = {−1, +1}.

, .
,
.
1.2. (xi , yi ) R2 , i = 1, . . . ,
. n fj (x) = xj1 , g(x, )
1.1 n 1 x.
X
(tting) (training, learning)1 a A.
1

, (learning machine),
, (training sample).

. . .

. 1.2. (learning algorithm) : (X Y ) A,


X = (xi , yi )i=1 a A. , a X . .
, .
X a = (X ).
a x y = a(x).
. ,
, .
1.1.4

. 1.3. (loss function) L (a, x),


a x. L (a, x) = 0,
a(x) .
. 1.4. a X :

Q(a, X^ℓ) = (1/ℓ) Σ_{i=1}^{ℓ} L(a, x_i).   (1.2)

The functional Q [4] is called the empirical risk of the algorithm a on the sample (x_i, y_i)_{i=1}^{ℓ}. The empirical risk minimization method builds

μ(X^ℓ) = arg min_{a∈A} Q(a, X^ℓ).   (1.3)


, 0 1, .
L (a, x) = 1 , a x,
Q a X .
, Y R:
L (a, x) = [a(x) 6= y (x)] ,
2 ;
L (a, x) = |a(x) y (x)| ; Q
a X ;
L (a, x) = (a(x) y (x))2 ; Q a X ; .
, (empirical risk minimization, ERM), , A a, Q
X :
aA

1.3. (Y = R) n fj : X R, j = 1, . . . , n,
, :

μ(X^ℓ) = arg min_θ Σ_{i=1}^{ℓ} ( g(x_i, θ) − y_i )².

[] = 0, [] = 1.

1.1.5

X , . ,
fj (x) y (x)
. , , .
x . y (x), , .
.
y (x)
X Y
p(x, y),
X = (xi , yi )i=1 .
(independent identically distributed, i.i.d.).
, y (x)
p(x, y) = p(x)p(y|x), p(y|x) = (y y (x)), (z) -.
. g(x, ), y (x),
(x, y, ),
p(x, y). , X ,
.
X ,
p(x, y) 
: p(X ) = p (x1 , y1 ), . . . , (x , y ) = p(x1 , y1 ) p(x , y ). p(x, y)
(x, y, ), (likelihood):
L(θ, X^ℓ) = Π_{i=1}^{ℓ} φ(x_i, y_i, θ).

, . , , L(, X )
.
. , , [13].
, , a (x) (x, y, ) .
.
L ln L,
( ) :

−ln L(θ, X^ℓ) = −Σ_{i=1}^{ℓ} ln φ(x_i, y_i, θ) → min_θ.   (1.4)

. . .

(1.2),
L (a , x) = ln (x, y, ). (xi , yi ) ,
(xi , yi , ) L (a , x).

(x, y, ) ,
.
1.4. g(x, ).
, (x,
) = g(x, ) y (x) 2 

N(ε; 0, σ) = (1/(σ sqrt(2π))) exp( −ε² / (2σ²) ).


(x, y, ) = p(x)(y | x, ) = p(x)N g(x, ) y (x); 0, 2 .
, C0 C1 , :

2
ln (x, y, ) = ln p(x)N g(x, ) y (x); 0, 2 = C0 + C1 g(x, ) y (x) .

,
: , . (
) . .

1.1.6

. Q(a, X ) a,
, a X k = (xi , yi )ki=1 .
,
, , , (overtraining) (overtting). .
,
, . X , ,
x xi X . x = xi
yi . .
, . . :
, .
(generalization ability) Q((X ), X k ) , X X k .
, X X k ,
X.

. 1.5. ,


P_{X^ℓ, X^k}[ Q( μ(X^ℓ), X^k ) > ε ] < η.   (1.5)

, (1 ) .
:
X X k Q((X ), X k ) 6 1 .

(1.5) . 60-
. . . . [5, 6, 7].
[34],
, .
,
.
X L = (xi , yi )Li=1 . N
Xn Xnk
k = L. n = 1, . . . , N an = (Xn )
Qn = Q(an , Xnk ). Qn
(cross-validation, CV):
CV(μ, X^L) = (1/N) Σ_{n=1}^{N} Q( μ(X_n^ℓ), X_n^k ).   (1.6)

, X L [48]. , N 20 100.
tq- (tq-fold cross-validation),
q ( ) , , . X L - t
q . N = tq .
, t .
: , ,
- L .
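The cross-validation estimate (1.6) can be sketched as follows. This is a minimal illustration, not part of the original text: the fold-splitting scheme, the toy nearest-mean classifier, and all names here are illustrative assumptions.

```python
import random

def cross_val_error(fit, predict, X, y, n_splits=5, seed=0):
    """Average test error over N train/test splits (formula 1.6)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    errors = []
    for test in folds:
        ts = set(test)
        train = [i for i in idx if i not in ts]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors.append(sum(predict(model, X[i]) != y[i] for i in test) / len(test))
    return sum(errors) / len(errors)

# toy "nearest mean" classifier on one-dimensional data
def fit(X, y):
    means = {}
    for c in set(y):
        pts = [x for x, t in zip(X, y) if t == c]
        means[c] = sum(pts) / len(pts)
    return means

def predict(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

X = [0.1, 0.3, 0.2, 2.1, 2.3, 1.9, 0.0, 2.2, 0.25, 2.05]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
print(cross_val_error(fit, predict, X, y))  # well-separated classes -> 0.0
```

With well-separated classes every fold is classified correctly, so the estimate is zero; on harder data it approximates the expected test error.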

1.2

,
, .
1.2.1

1.5. . ,


. , , , , . .
(, , , ).
, , , , , . . , ,
. ,
: ( ); ; ; ;
. ,
, .
1.6. .
60-70- .

, .
, , , , , .
: , .
, , . ,
. , , , , .
.
:
. .
(score) , . , . (credit scoring).
,
,
. , :
, ,
. .
1.7. (churn prediction)
, ,
, .
. , , , . , .
, : , , , ,
, . . ,
. , , -


, xi ti . , ti + t.
, , .
1.2.2

1.8. 1886
.
, ,
. ,
. 928 -.
y = 32 x, x , y
. ,
.
:
, , : .
, . , , .
, , .
1.9. .
.
, , , ( ). ,
. .


.
. , : , -
, , , . , ,
.
. , , , , .
1.10. -,
, . ,
, , 1 5. .


, , .
,
. ,
. ,
99% .
.
.
(collaborative ltering).
.
2006 Netix,
Internet, 1 , 10% ,
Netix Cinematch (. http://www.netflixprize.com). ,
Cinematch 10% .
, 70%
. .
1.2.3

(ranking) .
,
. .
, . , (),
, .
(learning to rank).
1.11. , , . , , , . (, )
( ). , , .
.
: , , , . .
, . .
1.12. 1.10 . ,


, . , Netix,
. ,
.
,
.
1.2.4

(clustering) (classication)
, yi = y (xi ). xi ,
() , , . .
, .
1.13. . ,
. , . :
, , .
.
( , , , . .).
. ,
. , , .
, ( , ) , (semisupervised learning). , , .
1.14. . , , (,
), (,
). , .
, .
, . , , ,
, , . . ,
.


1.2.5


(association rule induction) ,


.
1.15. (market basket analysis)
, (, )
, .
,
(-), . , .
fj (xi ) = 1 , i- j- . , ,
. , , c 60%
. - ,
: .
,
,
.
, .
1.16. (term extraction) ,
(. 1.14), . , ,
. , , .
1.2.6

,
.
, . , ,
.
(1.6).
,
.
. ; ; , .
.
,
. ,
.


, . ,
, , , ;
.
. , . ,
, , , . , ( ). . UCI ( , ),
http://archive.ics.uci.edu/ml. ,
, [30].
.
. ;
, . ,
, ,
. , , -
.
, http://poligon.MachineLearning.ru. .
, , . , .

.
. , , , .
, ,
, , . http://www.kaggle.com, http://tunedit.org. ,
: http://poligon.MachineLearning.ru, http://mlcomp.org.


1.2.7


. , .
. [10]. , [0, 1] .
. 1. r [0, 1], = [r < p] 1 p 0
1 p.
. 2. r [0, 1], F0 = 0, F1 , . . . , Fk1 , Fk = 1,
, F1 6 r < F , j = 1, . . . , k
pj = Fj Fj1 .
. 3. r [0, 1],
R F (x), 0 6 F (x) 6 1, = F 1 (r)
F (x).
Method 4. If r_1, r_2 are independent random numbers uniform on [0, 1], then the Box–Muller transform
ξ_1 = sqrt(−2 ln r_1) sin 2πr_2;
ξ_2 = sqrt(−2 ln r_1) cos 2πr_2;
yields a pair of independent standard normal variates ξ_1, ξ_2 ∼ N(0, 1).

. 5. N (0, 1), = + N (, 2 )
2 .
. 6. n- x = (1 , . . . , n ) i N (0, 1). V n n-,
Rn . x = + V x N (, ) c = V V .
. 7. X k p1 (x), . . . , pk (x).
1, . . . , k w1 , . . . , wk . x X,
Pk
p (x), p(x) = j=1 wj pj (x).
.


. 1. .

. 2. .

. 8. .
X p(x, t), t R . R w(t). x RX, +
p(x, ), p(x) = w(t)p(x, t) dt.
, ,
, .
. 9. Rn = [a1 , b1 ] . . . [an , bn ] G . r = (r1 , . . . , rn ) n
ri , [ai , bi ]. , r ,
r G. r G.
, G .
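The sampling methods above are easy to check empirically. The sketch below is an assumed implementation of Method 4 (the Box–Muller transform); it verifies that the sample mean and variance of the generated variates are close to 0 and 1.

```python
import math, random

def box_muller(rng):
    """Method 4: two independent uniforms on [0,1) -> two independent N(0,1) variates."""
    r1, r2 = rng.random(), rng.random()
    z1 = math.sqrt(-2.0 * math.log(r1)) * math.sin(2.0 * math.pi * r2)
    z2 = math.sqrt(-2.0 * math.log(r1)) * math.cos(2.0 * math.pi * r2)
    return z1, z2

rng = random.Random(42)
sample = [z for _ in range(5000) for z in box_muller(rng)]
mean = sum(sample) / len(sample)
var = sum((z - mean) ** 2 for z in sample) / len(sample)
print(round(mean, 2), round(var, 2))  # near 0 and 1
```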
, .
. 1.
, . , . -
.
, , ,
,
. . . 2 ,
.



. , , , , .
. : ,
.

2.1

X , Y , X Y p(x, y) = P(y)p(x|y).
Py = P(y) . py (x) = p(x|y) 3 .
.
2.1. X = (xi , yi )i=1 p(x, y) = Py py (x). 4 Py py (x) y Y .
2.2. py (x) Py y Y a(x),
.
, .
, p(x, y)
X . , .
2.1.1


x , x y:
Z
P(|y) =
py (x) dx, X.

a : X Y . X
Ay = {x X | a(x) = y}, y Y . ,
y a s, Py P(As |y).
(y, s) Y Y ys y s. yy = 0, ys > 0 y 6= s.
, , .
3

P , p .
() , , .
4


. 2.1.
a:
XX
R(a) =
ys Py P(As |y).
yY sY

, ys = [y 6= s], R(a) a.
2.1.2

2.1. Py
py (x), R(a)
X
a(x) = arg min
ys Py py (x).
sY

yY

. t Y :
XX
R(a) =
ys Py P(As |y) =
yY sY

X
yY

yt Py P(At |y) +

X X

sY \{t} yY

ys Py P(As |y).

P
P(As |y), :
, P(At |y) = 1
sY \{t}
X
X X
R(a) =
yt Py +
(ys yt )Py P(As |y) =
yY

sY \{t} yY

= const(a) +

X Z

sY \{t}

As yY

sY \{t}

As

(ys yt )Py py (x) dx.

(2.1)

P
gs (x) =
ys Py py (x),
yY
X Z

R(a) = const(a) +
gs (x) gt (x) dx.

(2.1)

As . R(a) 
R
|Y | 1 I(As ) = As gs (x) gt (x) dx,
As . I(As ) , As
. t



As = x X gs (x) 6 gt (x), t Y, t 6= s .



, As = x X a(x) = s . , a(x) = s , s = arg min gt (x). gt (x)
tY

t, , R(a),
.
.

, , , .
.


2.2. Py py (x) , yy = 0 ys y y, s Y ,

a(x) = arg max_{y∈Y} λ_y P_y p_y(x).   (2.2)

. (2.1) 2.1. ys , s, t Y
ys yt = t [y = t] s [y = s].
,
X
(ys yt )Py py (x) = t Pt pt (x) s Ps ps (x) = gt (x) gs (x),
yY

gy (x) = y Py py (x) y Y . 2.1


, a(x) = s x, gs (x) s Y .

s t x X , (2.2) y = s y = t:
t Pt pt (x) = s Ps ps (x). x, , s, t, R(a).
y x
P(y|x). , py (x) Py :
P(y|x) =

py (x)Py
p(x, y)
.
= P
ps (x)Ps
p(x)
sY

x, , P(y|x) . x:
X
R(x) =
y P(y|x).
yY

. (2.2) :
a(x) = arg max y P(y|x).
yY

(2.2) .
R(a), , .
(y 1),
. (Py |Y1 | ), x y
py (x) x.


2.1.3

2.1. ,
p(x, y) = Py py (x),

X .


y Xy = (xi , yi )i=1 yi = y .
Py . ,
P̂_y = ℓ_y / ℓ,   ℓ_y = |X_y|,   y ∈ Y,   (2.3)

Py y . ,
Py . (2.3) , .
. ,
(unbalanced classes) ; , ,
. ,
,

. Py
(2.3), .
,
, X m Xy ,
.
2.3. X m = {x1 , . . . , xm }, p(x).
p(x), p(x) X.
. , x X n fj : X R, j = 1, . . . , n. x =
= (1 , . . . , n ) X = Rn , j = fj (x).
2.1. f1 (x), . . . , fn (x) . ,
p_y(x) = p_{y1}(ξ_1) · · · p_{yn}(ξ_n),   y ∈ Y,   (2.4)

pyj (j ) j- y.
, n , n- . , ,
(2.4), (nave Bayes).
pyj () (2.4)
(2.2).


a(x) = arg max_{y∈Y} ( ln λ_y P_y + Σ_{j=1}^{n} ln p_{yj}(ξ_j) ).   (2.5)



. , () , () .
.
, , ??.
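The naive Bayes decision rule (2.5) can be sketched in a few lines. This is an illustrative assumption, not the text's own implementation: per-feature densities are taken to be one-dimensional normals (as in formula 2.4 with normal p_yj), and λ_y = 1.

```python
import math

def fit_naive_bayes(X, y):
    """Estimate class priors P_y and per-feature normal densities p_yj (cf. 2.4)."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        model[c] = (len(rows) / len(X), stats)
    return model

def nb_predict(model, x):
    """arg max_y [ ln P_y + sum_j ln p_yj(x_j) ], i.e. formula (2.5) with lambda_y = 1."""
    def score(c):
        prior, stats = model[c]
        s = math.log(prior)
        for xj, (mu, var) in zip(x, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        return s
    return max(model, key=score)

X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
y = [0, 0, 0, 1, 1, 1]
m = fit_naive_bayes(X, y)
print(nb_predict(m, (0.05, 0.1)), nb_predict(m, (1.0, 1.0)))  # 0 1
```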

2.2

py (x) x X. x (2.2).
2.2.1

. . ,
, (2.5). , .
. X , |X| m.
xi , X m = (xi )m
i=1 :
p̂(x) = (1/m) Σ_{i=1}^{m} [ x_i = x ].   (2.6)

, |X| m, , , ,
.
Local estimate. Let X = R. By the definition of density, p(x) = lim_{h→0} (1/(2h)) P[x − h, x + h], where P[a, b] denotes the probability of the interval [a, b]. Fixing a window half-width h and replacing the probability of falling into [x − h, x + h] by its frequency in the sample gives the estimate

p̂_h(x) = (1/(2mh)) Σ_{i=1}^{m} [ |x − x_i| < h ].   (2.7)

ph (x) -, , (2.2)
y Y . - [57, 56]:


p̂_h(x) = (1/(mh)) Σ_{i=1}^{m} K( (x − x_i)/h ),   (2.8)

where the kernel K(z) is a nonnegative even function with ∫ K(z) dz = 1.


ph (x) , K(z), , ,
R
: ph (x) dx = 1 h.

 ,
 . 3, . 25.
K(z) = 12 |z| < 1 (2.7).
K(z) = [z = 0] h = 1 (2.6).
(2.7) , ,
ph (x) p(x)
m h.
2.3 ([56, 57, 25]). :
1) X m ,
p(x);
R
2) K(z) , : X K 2 (z) dz < ;
3) hm , lim hm = 0 lim mhm = .
m

phm (x) p(x) m x X,


O(m2/5 ).

. n
fj : X R, j = 1, . . . , n.
x X [9, 17]:


p̂_h(x) = (1/m) Σ_{i=1}^{m} Π_{j=1}^{n} (1/h_j) K( (f_j(x) − f_j(x_i)) / h_j ).   (2.9)
, xi . , .

(2.8), , .
. X (x, x ), , . (2.8) :


p̂_h(x) = (1/(m V(h))) Σ_{i=1}^{m} K( ρ(x, x_i) / h ),   (2.10)

V (h) , , ph (x) . (2.10) K ,


, [22].
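The one-dimensional estimates (2.7)–(2.8) can be sketched directly. This is an illustrative assumption: the rectangular kernel K(z) = ½·[|z| < 1] is used, under which (2.8) reduces exactly to the window estimate (2.7).

```python
def parzen_density(x, sample, h):
    """Parzen-Rosenblatt estimate (2.8) with the rectangular kernel
    K(z) = 0.5 * [|z| < 1], equivalent to the window estimate (2.7)."""
    m = len(sample)
    return sum(0.5 * (abs((x - xi) / h) < 1) for xi in sample) / (m * h)

sample = [0.0, 0.1, 0.2, 0.9, 1.0]
print(parzen_density(0.1, sample, h=0.25))  # 3 of 5 points in the window -> 1.2
```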
2.2.2


(2.10) y Y :



p̂_{y,h}(x) = (1/(ℓ_y V(h))) Σ_{i=1}^{ℓ} [ y_i = y ] K( ρ(x, x_i) / h ),   (2.11)


K , h . V (h)
y, (2.2) - arg max
. (2.11)
Py = y / (2.2):

a(x; X^ℓ, h) = arg max_{y∈Y} λ_y Σ_{i=1}^{ℓ} [ y_i = y ] K( ρ(x, x_i) / h ).   (2.12)

X .
, (2.12)
h K.
h .
h 0 , ph (x) . h . ,
h . :

LOO(h, X^ℓ) = Σ_{i=1}^{ℓ} [ a( x_i; X^ℓ \ {x_i}, h ) ≠ y_i ] → min_h,

a(x; X \ xi , h) , X xi . LOO(h) ,
h .
h(x). X
, .
h
X, . ,
x X

(k+1)
(k + 1)- h(x) = x, x
, , x.
V (h) y, , py,h(x) (x) Xy y Y .
k , h .
K
. ,
ph (x). . G ph (x)
x. E, Q, T , ( , . 3), ,
x h.


Fig. 3. Common kernels K(z): E - Epanechnikov; Q - quartic; T - triangular; G - Gaussian; and the rectangular kernel.

. (x, x ) , ,
.
. (curse of dimensionality). (. ??), (. ??). ,
.
, ??.

2.3

, X m = {x1 , . . . , xm } , p(x) = (x; ),


. X m (maximum likelihood).

, , py (x), y Y
. ,
.
2.3.1


X = Rn , n .

. 2.2.

n
1
N (x; , ) = (2) 2 || 2 exp 12 (x ) 1 (x ) ,

x Rn ,

n- ()
() Rn Rnn . ,
, , .


Rn , , , :
Z
N (x; , ) dx = 1;
Z
Ex = xN (x; , ) dx = ;
Z

E(x )(x ) = (x )(x ) N (x; , ) dx = .


. , = diag(12 , . . . , n2 ),
, .
, = 2 In , .
,
,
. , ,
= V SV , V = (v1 , . . . , vn )
, 1 , . . . , n , S , S = diag(1 , . . . , n ). 1 = V S 1 V , ,
(x ) 1 (x ) = (x ) V S 1 V (x ) = (x ) S 1 (x ).
, x = V x
.
S . V .
.
2.3.2

2.4. n-
py (x) = N (x; y , y ),

y Y.

.
, .
. , s t,
s Ps ps (x) = t Pt pt (x),
ln ps (x) ln pt (x) = Cst ,
Cst = ln(t Pt /s Ps ) , x.
, ln py (x) x:
ln py (x) = n2 ln 2 12 ln |y | 12 (x y ) 1
y (x y ).


s = t ,
:
x 1 (s t ) 12 s 1 s + 12 t 1 t = Cst ;
(x st ) 1 (s t ) = Cst ;
st = 21 (s + t ) .

. ,
s, t (s Ps = t Pt ),
, (s = t = In ).
, .
,
, .
, , R(a).
(s = t 6= In ), , -
, .
(s Ps 6= t Pt ), .
,
,
.

. . ,
,
, .
2, -, -.
. , ,
(x s ) 1 (x s ) = (x t ) 1 (x t );
kx s k = kx t k ;

p
ku vk (u v) 1 (u v) Rn , . , .
, . () (nearest
mean classier), .


. X m = {x1 , . . . , xm },
(x; ). :
m

p(X ; ) = p(x1 , . . . , xm ; ) =

m
Y

(xi ; ).

i=1

, ,
p(X m ; ) ,
. ,
[13, 16].
:
m

L(X ; ) =

m
X
i=1

ln (xi ; ) max .

(2.13)

. , ( (x; ) ):
m

L(X m ; ) =
ln (xi ; ) = 0.

i=1

(2.14)

. (, )
, (2.14).
2.5. X m = (x1 , . . . , xm ). (x; ) N (x; , ),
(2.13),
μ̂ = (1/m) Σ_{i=1}^{m} x_i;   Σ̂ = (1/m) Σ_{i=1}^{m} (x_i − μ̂)(x_i − μ̂)^T.

. .
. 2.3. X m , m ) (x; ) = 0 . (X
m
m ) = 0 .
, X , , EX m (X
: E
=

, E
= .

=
,

.
:
Σ̂ = (1/(m−1)) Σ_{i=1}^{m} (x_i − μ̂)(x_i − μ̂)^T.   (2.15)


y
.
y
Xy = {xi X | yi = y} y Y .
(2.2). , (plug-in).
y :
y
y
, . , , .
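The maximum-likelihood estimates for a normal sample, and the unbiased correction (2.15), can be sketched in one dimension; this snippet and its names are illustrative assumptions, not part of the text.

```python
def gauss_mle(X):
    """One-dimensional normal sample: MLE mean, MLE (biased) variance with
    the 1/m factor, and the unbiased variance with the 1/(m-1) factor (2.15)."""
    m = len(X)
    mu = sum(X) / m
    ss = sum((x - mu) ** 2 for x in X)
    return mu, ss / m, ss / (m - 1)

mu, v_mle, v_unb = gauss_mle([1.0, 2.0, 3.0, 4.0])
print(mu, v_mle, round(v_unb, 4))  # 2.5 1.25 1.6667
```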

, .
.
, , ,
.
, y < n,
y .
,
.
, y , .
. 1 ,
,
y
1

y (x st ).
, , . .
, , .
. , fj (x) yj yj , , :


1
( yj )2
, y Y, j = 1, . . . , n.
pyj () =
exp
2
2yj
2yj
, , y
y .
.
yj
yj y Y j = 1, . . . , n.
2.3.3

1936 . . , , ,


[41]. ,
.
. ,

X
1
(xi
yi )(xi
yi )
|Y | i=1

2.4, , ,
-, . :

a(x) = arg max y Py py (x) =
yY

1
1
y
y +x
= arg max ln(y Py ) 21
y =
yY
| {z }
|
{z
}
y

= arg max x y + y .
yY

(2.16)

().
,
. , , , .

, , [28]:

R(a) = 21 k1 2 k ,
(r) = N (x; 0, 1) .

. : ( ).

, . ,

, v .
, :
+ In )v = v + v = ( + )v.
(
.
+ diag
[28]. ,
(1 )
,
[2]; ,
, , . , - .
.
.


Fig. 4. Sorted likelihood values λ_t for a sample of m = 25 objects with q = 3 noise objects and p = 10: the largest gap max_t(λ_t − λ_{t+1}) over t ∈ {m − p, . . . , m − q − 1} separates the noise objects from the rest.

. , .
(features selection) ??.
: , .
,
n- . . . [28]. ,

(. . R(a) ).
 (2) (2)
(2)
y , y : y Y , y Rn
. (2)
: (x) = x y . n 2
, (x) .
 (3) (3)

y , y : y Y ,
(3)
y Rn .
,
( ).
22. [28].
. ,
. , ,
, , .
, , .
(principal
component analysis), 5.4.
. , (x; ), (robust ).


.
X m . xi X m
: 1 > . . . > m .
i = (xi ; ),
, , ()
. : p q, t {m p, . . . , m q 1},
t t+1 . (m t)
, . . 4. . , , ,
.

, .
,
,
,
, .

2.4

, -
, .
2.2. X k :
p(x) =

k
X

wj pj (x),

j=1

k
X

wj = 1,

wj > 0,

j=1

pj (x) j- , wj . (x; ) , pj (x) = (x; j ).


, x p(x)
j- {w1 , . . . , wk },
x pj (x).
, , X m p(x), k ,
= (w1 , . . . , wk , 1 , . . . , k ).
2.4.1

EM-

, , , .
EM (expectation-maximization).
.
(hidden) G, .
, , .
, , .


EM- . E- (expectation) G . - (maximization)


G .
2.1. EM-
1: ;
2:
3:
G := EStep();
4:
:= MStep(, G);
5: G .
. . [27].
[39] -. , , , [26]. .
E- (expectation). p(x, j ) , x j- .
p(x, j ) = p(x) P(j | x) = wj pj (x).
gij P(j | xi ).
, xi j- . . G = (gij )mk = (g1 , . . . , gj ),
gj j- G. -
, :
k
X

gij = 1 i = 1, . . . , .

j=1

wj , j , gij :
g_ij = w_j p_j(x_i) / Σ_{s=1}^{k} w_s p_s(x_i)   for all i, j.   (2.17)

E- EM.
M- (maximization). , gij
, ( ) .
Q() = ln

m
Y
i=1

p(xi ) =

m
X
i=1

ln

k
X
j=1

wj pj (xi ) max .


Pk

j=1

wj = 1. :

X

X

m
k
k
X
L(; X ) =
ln
wj pj (xi )
wj 1 .
m

i=1

j=1

j=1

wj :
m

X
pj (xi )
L
= 0,
=
Pk
wj
w
p
(x
)
s
s
i
s=1
i=1

(2.18)

j = 1, . . . , k.

wj , k ,
j i:
k
m X
X
i=1 j=1

wj pj (xi )

Pk

s=1 ws ps (xi )
{z
}
=1

k
X

wj ,

j=1

| {z }
=1

= m.
(2.18) wj , = m,
, (2.17),
:
w_j = (1/m) Σ_{i=1}^{m} w_j p_j(x_i) / Σ_{s=1}^{k} w_s p_s(x_i) = (1/m) Σ_{i=1}^{m} g_ij,   j = 1, . . . , k.   (2.19)

, - wj > 0
, .
j , , pj (x) (x; j ):
m

X
wj
wj pj (xi )
L X

=
pj (xi ) =
ln pj (xi ) =
Pk
Pk
j

j
j
w
p
(x
)
w
p
(x
)
s
s
i
s
s
i
s=1
s=1
i=1
i=1
=

m
X
i=1

X
gij
ln pj (xi ) =
gij ln pj (xi ) = 0,
j
j i=1

j = 1, . . . , k.



θ_j := arg max_θ Σ_{i=1}^{m} g_ij ln φ(x_i; θ),   j = 1, . . . , k.   (2.20)

, M- wj (2.19) j
k (2.20). ,
.
EM [39, 67, 47].
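The E-step (2.17) and M-step (2.19)–(2.20) can be sketched for a one-dimensional Gaussian mixture. This is an illustrative assumption, not the algorithm listing from the text: initialization, the fixed iteration count, and the variance floor are all choices made for the sketch.

```python
import math

def em_gauss_mixture(X, k=2, iters=50):
    """EM for a 1-D Gaussian mixture: E-step (2.17), M-step (2.19)-(2.20)."""
    m = len(X)
    w = [1.0 / k] * k
    mu = [X[i * (m - 1) // (k - 1)] for i in range(k)]  # spread initial means (k >= 2)
    var = [1.0] * k
    for _ in range(iters):
        # E-step: posteriors g[i][j] = w_j p_j(x_i) / sum_s w_s p_s(x_i)
        g = []
        for x in X:
            p = [w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            s = sum(p)
            g.append([pj / s for pj in p])
        # M-step: weighted maximum-likelihood updates of w_j, mu_j, var_j
        for j in range(k):
            gj = sum(g[i][j] for i in range(m))
            w[j] = gj / m
            mu[j] = sum(g[i][j] * X[i] for i in range(m)) / gj
            var[j] = sum(g[i][j] * (X[i] - mu[j]) ** 2 for i in range(m)) / gj + 1e-6
    return w, mu, var

X = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
w, mu, var = em_gauss_mixture(X)
print([round(v, 1) for v in sorted(mu)])  # [0.0, 5.0]
```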


2.2. -
:
X m = {x1 , . . . , xm };
k ;
= (wj , j )kj=1 ;
;
:
= (wj , j )kj=1 ;
1: EM (X m , k, , );
2:
3:
E- (expectation):

i = 1, . . . , m, j = 1, . . . , k
wj (xi ; j )
gij0 := gij ; gij := Pk
;
w
(x
;

)
s
i
s
s=1
4:
M- (maximization):
j = 1, . . . , k
m
m
P
1 P
gij ;
j := arg max gij ln (xi ; ); wj :=
i=1
m i=1
0
| > ;
5: max |gij gij
i,j

6: (wj , j )kj=1 ;

. ,
Q() G . , [0, 1].
2.2. E- G (2.17). M-
k (2.20),
X m gj .
EM-. (2.20) M- .
, ,
E-. - (generalized EM-algorithm, GEM) [39].
. EM .
.
,
,
. ( )
, . k ,


2.3. EM-
:
X m = {x1 , . . . , xm };
R ;
m0 , ;
;
:
k ;
= (wj , j )kj=1 ;
1: :

1 := arg max

m
P

ln (xi ; );

w1 := 1;

k := 1;

i=1

2: k := 2, 3, . . .
3:
:

U := {xi X m : p(xi ) < max p(xj )/R};


j

4:
5:
6:

|U | < m0
k;

k- :
P
k := arg max
ln (xi ; ); wk := m1 |U |;

7:

xi U

wj := wj (1 wk ), j = 1, . . . , k 1;
EM (X m , k, , );

.
.
EM-
.
. ,
xi ,
p(xi ). .
EM-,
. ,
.
2.3. 1
k = 1. . p(xi ) R
, xi . ,
; p(xi )
;
P0 . 3 U , . m0 ,
, . 6
, ,


U . , - . 7
EM-.
EM-. Q()
. EM : ,
,
.
.
EM- (stochastic EMalgorithm, SEM) [1, . 207]. 2.2
, M- ( 4)
j := arg max

m
X

gij ln (xi ; )

i=1

, ,
X
j := arg max
ln (xi ; ),

xi Xj

Xj X m .
xi X m j(i) {gij : j = 1, . . . , k}, xi X m
Xj(i) .
[1] k
, kmax . , |Xj | 6 m0 , .
.
SEM , . SEM EM, .
, SEM Q(), .
2.4.2

M- , () . (2.20)
, .
.
2.6. n- (x; j ) = N (x; j , j ) j = (j , j ), j Rn , j Rnn , j = 1, . . . , k.


(2.20)
m
1 X
gij xi ,

j =
mwj i=1

j =

j = 1, . . . , k;

m
1 X
gij (xi
j )(xi
j ) ,
mwj i=1

j = 1, . . . , k.

, M- . i- j- gij
, E-.

. , , . ,
2.6 .
. . , ,
. (, )
.
, , .
c . , ,
. , ,
, .
2.7. n- 2
2
(j , j ), j = (j1 , . . . , jn ), j = diag(j1
, . . . , jn
) , j = 1, . . . , k:
(x; j ) = N (x; j , j ) =

n
Y
d=1

jd


1  d jd 2
exp
,
2
jd
2

x = (1 , . . . , n ).

(2.20)

jd

m
1 X
=
gij xid ,
mwj i=1

jd
=

d = 1, . . . , n;

m
1 X
gij (xid
jd )2 ,
mwj i=1

d = 1, . . . , n;

xi = (xi1 , . . . , xin ) X m .


. N (x; j , j )
jd , jd xi = (xi1 , . . . , xin ):

2
ln N (xi ; j , j ) = jd
(xid jd );
jd

1
3
ln N (xi ; j , j ) = jd
+ jd
(xid jd )2 .
jd

jd , jd :
2
jd
3
jd

m
X
i=1

m
X
i=1

gij (xid jd ) = 0;

2
gij jd
(xid jd )2 = 0.

, jd , jd i,

(2.19), .
, , j = j2 In . : ,
, .
,
.
. pj (x) = N (x; j , j ) j


pj (x) = Nj exp 21 2j (x, j ) ,
n

Nj = (2) 2 (j1 jn )1 , j (x, x )


n- X:
2j (x, x ) =

n
X
d=1

2
jd
|d d |2 ,

x = (1 , . . . , n ),

x = (1 , . . . , n ).

j (x, j ), x. pj (x) x j .
f (x), x X, .
2.4.3

, .
Y = {1, . . . , M }, y Y
py (x) Xy = {(xi , yi ) X | yi = y}.

Fig. 5. A mixture-of-Gaussians Bayesian classifier as a three-layer network: the first layer evaluates the component densities p_{y1}(x), . . . , p_{yk_y}(x) for each class y; the second layer sums them with weights w_{yj} into the class densities; the output takes arg max over the classes to produce a(x).

py (x), y Y ,
ky . n- 2
2
yj = (yj1 , . . . , yjn ), yj = diag(yj1
, . . . , yjn
), j = 1, . . . , ky :
py (x) =

ky
X

wyj pyj (x),

j=1

pyj (x) = N (x; yj , yj ),

ky
X

wyj = 1,

wyj > 0.

j=1

. (2.2), pyj (x)


x yj :
a(x) = arg max y Py
yY

ky
X


wyj Nyj exp 21 2yj (x, yj ) ,
{z
}
|
j=1

(2.21)

pyj (x)

Nyj = (2) 2 (yj1 yjn )1 .


, , 5.
k1 + + kM pyj (x), y Y , j = 1, . . . , ky .
x,
x yj , x.
M ,
wyj .
x , pyj (x).
arg max, x .
, x yj yj (x, yj ), j = 1, . . . , ky .
, .
c RBF- (radial basis function network).
.
RBF- py (x) EM-. yj yj


j = 1, . . . , ky . , yj (x, yj ) yj . 2.3 .
EM-. (. 6), EM-
. , , . , yj
,
. . ,
(j-) (y-) . ,
, .
EM- ,
7.


, . , , , ,
, -
. , , ,
, . ,
. 5 .
X. , ,
, ( , ).
similarity-based learning distance-based learning.

3.1

X : X X [0, ).
y : X Y ,
X = (xi , yi )i=1 , yi = y (xi ). Y
. a : X Y , y (x) X.
3.1.1

u X
x1 , . . . , x u:
(2)
()
(u, x(1)
u ) 6 (u, xu ) 6 6 (u, xu ),
(i)

xu i- u. , i-
(i)
(i)
u yu = y (xu ). , u X
.
. 3.1. X u y Y , y (u, X ) :

a(u; X^ℓ) = arg max_{y∈Y} Γ_y(u, X^ℓ);   Γ_y(u, X^ℓ) = Σ_{i=1}^{ℓ} [ y_u^{(i)} = y ] w(i, u);   (3.1)

w(i, u) i- u. y (u, X ) u y.
5

.
,
.


w(i, u).
, i. (i)
, u xu ,
, .
X a. , , , - ,
. a(u; X ) X , , u. (lazy learning),
(eager learning),
, .
(case-based reasoning, CBR). , u y?
: ,
y, .
w(i, u),
, .
3.1.2

(nearest neighbor, NN)


u X , :
w(i, u) = [i = 1];

a(u; X ) = yu(1) .

, , .
NN X .
. :
. , ,
, ,
.
, .
, .
.
k (k nearest neighbors, kNN). , u ,
(i)
k xu , i = 1, . . . , k:
w(i, u) = [i 6 k];

a(u; X , k) = arg max


yY

k
X

[yu(i) = y].

i=1

k = 1 , ,
. k = , , .


, k . k
(leave-one-out, LOO). xi X ,
k .

LOO(k, X^ℓ) = Σ_{i=1}^{ℓ} [ a( x_i; X^ℓ \ {x_i}, k ) ≠ y_i ] → min_k.

, xi
, xi xi , ()
LOO(k) k = 1.
kNN: k u , u ,
k .
k . kNN , .
, k. ,
wi , i- :
w(i, u) = [i 6 k] wi ;

a(u; X , k) = arg max


yY

k
X

[yu(i) = y]wi .

i=1

wi . , , ( wi = k+1i
k
: ; 1,
2; ). , , ,
: wi = q i , q (0, 1)
. LOO, k.
kNN.
. .
( , )

. , .

O() . . ,
O(ln ) .

, .


3.1.3

wi (i)
(u, xu ), i. K(z),
(i) 
[0, ). w(i, u) = K h1 (u, xu ) (3.1),

a(u; X^ℓ, h) = arg max_{y∈Y} Σ_{i=1}^{ℓ} [ y_u^{(i)} = y ] K( ρ(u, x_u^{(i)}) / h ).   (3.2)

h , k. u h, xi u yi .
,
, , ,
.
h . LOO(h), , ,
;
.
h ,
X. , . .
K(z), [0, 1],
. h , k
(k+1)
u : h(u) = (u, xu ).

a(u; X^ℓ, k) = arg max_{y∈Y} Σ_{i=1}^{k} [ y_u^{(i)} = y ] K( ρ(u, x_u^{(i)}) / ρ(u, x_u^{(k+1)}) ).   (3.3)
,
, (, )
. 2.2.2.
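The Parzen-window classifier (3.2) can be sketched directly. The Gaussian kernel, the one-dimensional metric ρ(u, x) = |u − x|, and the data here are illustrative assumptions.

```python
import math

def parzen_classify(u, X, y, h):
    """Parzen-window classifier (3.2): kernel-weighted vote of the sample,
    with the Gaussian kernel K(z) = exp(-z^2/2) and rho(u, x) = |u - x|."""
    scores = {}
    for xi, yi in zip(X, y):
        w = math.exp(-0.5 * ((u - xi) / h) ** 2)
        scores[yi] = scores.get(yi, 0.0) + w
    return max(scores, key=scores.get)

X = [0.0, 0.2, 0.1, 1.0, 1.2, 1.1]
y = ['a', 'a', 'a', 'b', 'b', 'b']
print(parzen_classify(0.15, X, y, h=0.3), parzen_classify(1.05, X, y, h=0.3))  # a b
```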
3.1.4


Kh (u, x) = K h1 (u, x)
u. (u, x) , , . , xi u yi , hi :

a(u; X^ℓ) = arg max_{y∈Y} Σ_{i=1}^{ℓ} [ y_i = y ] γ_i K( ρ(u, x_i) / h_i ),   γ_i ≥ 0, h_i > 0.   (3.4)

, (3.3) , hi
xi , u.


3.1.
:
X ;
:
i , i = 1, . . . , (3.4);
1: : i = 0; i = 1, . . . , ;
2:
3:
xi X ;
4:
a(xi ) 6= yi
5:
i := i + 1;
6:

[3] . Y = {1, +1}


; i ; K(z)
; : 1
u. , K(z) = z1 z+a
,
.
(3.4) 2 i , hi . 3.1. i , , hi
K . : xi , yi xi , i . 3 ,
. , .
[3].
3.1 , ,
, . , ,
. , 3.1 .
3.1 : ;
; ( 1)
i ; - ; ( i )
; hi .
.

, EM-, 2.4. , , . :


.
(3.1), u , ,
.

3.2

. . , , , .
.
. ,
. , , . , .
, .
,
, . , .
, .
3.2.1

(3.1) , , .
. 3.2. (margin) xi X , a(u) = arg max y (u),
yY

M(x_i) = Γ_{y_i}(x_i) − max_{y∈Y\{y_i}} Γ_y(x_i).

.
, .

, : , , , , , . 6.
,
.
.
( , ),
. , . .

Fig. 6. Margins M_i of the training objects, i = 1, . . . , 200, sorted in ascending order.

, .
,
.

.
(3.1), , .

.
. -
, , .
. . , , , , .
. .

, . ,
. , ,
kNN . , ,
.
3.2.2

STOLP

STOLP [11].
w(i, u).
a(u; ) (3.1), X .


3.2. STOLP
:
X ;
;
0 ;
:
X ;
1: xi X , xi :
2:
M (xi , X ) <
3:
X 1 := X \ {xi }; := 1;
4: :


:

:= arg max M (xi , X ) y Y ;


xi Xy

5: 6= X ;
6:
, a(u; ) :

7:
8:
9:

E := {xi X \ : M (xi , ) < 0};


|E| < 0
;
:
xi := arg min M (x, ); := {xi };
xE

M (xi , ) xi a(xi ; ).
, xi , , .
, ,
, .
3.2 ( 13). X xi M (xi , X ), .
= 0, . .
( 4).

. xi ,
. ,
0 . 0 = 0,
a(u; ), ,
.
,
. . ,
, , .


STOLP .
X \ , . O(||2 ).
, .
, . , ,
. 3.2,
.
STOLP : , . , . ,
.


,
.
. , ,
, ,
.

4.1

, Y = {1, +1}.
a(x, w) = sign f (x, w), w . f (x, w) . ,
. f (x, w) > 0, a x +1,
1. f (x, w) = 0 .
, a(x, w) ,
w, X = (xi , yi )i=1 .
. 4.1. Mi (w) = yi f (xi , w) (margin) xi
a(x, w) = sign f (x, w).
Mi (w) < 0, a(x, w) xi . Mi (w), xi .

. 
L Mi (w) , L (M ) , : [M < 0] 6 L (M ). :

Q(w, X^ℓ) = Σ_{i=1}^{ℓ} [ M_i(w) < 0 ] ≤ Q̃(w, X^ℓ) = Σ_{i=1}^{ℓ} L( M_i(w) ) → min_w.   (4.1)

. 7.
, , . - (SVM),
, ,
AdaBoost, . ??.
, .
.
, : L (M ) ,
.


Fig. 7. Continuous upper bounds on the threshold loss [M < 0]:
Q(M) = (1 − M)² - quadratic;
V(M) = (1 − M)_+ - piecewise linear;
S(M) = 2(1 + e^M)^{−1} - sigmoid;
L(M) = log2(1 + e^{−M}) - logarithmic;
E(M) = e^{−M} - exponential.

- . , X Y ,
f (x, w) p(x, y|w).
w X
. w,
X (
). X ( ,
p(x, y|w)), :

p(X^ℓ | w) = Π_{i=1}^{ℓ} p(x_i, y_i | w) → max_w;

L(w, X^ℓ) = ln p(X^ℓ | w) = Σ_{i=1}^{ℓ} ln p(x_i, y_i | w) → max_w.   (4.2)

(4.2) (4.1), ,
,

ln p(xi , yi |w) = L yi f (xi , w) .

p(x, y|w), f L . ,
L , , ,
.
. ,
p(x, y|w) p(w).
, p(x, y|w) , . ,
p(x, y|w) p(w).


, p(w)
p(w; ), , .
.
,
, .
-, X , w
. , , p(X , w; ) = p(X |w)p(w; ). ,
:
L_γ(w, X^ℓ) = ln p(X^ℓ, w; γ) = Σ_{i=1}^{ℓ} ln p(x_i, y_i | w) + ln p(w; γ) → max_w.   (4.3)

L : (4.2)
, .
, .
. w Rn , .
. , :



1
kwk2
1
ln p(w; ) = ln
exp
= kwk2 + const(w),
n/2
(2)
2
2
const(w) , w, ,
(4.3). w,
.
.
. w Rn , .



n
X
1
kwk1
1
ln p(w; C) = ln
exp
= kwk1 + const(w), kwk1 =
|wj |.
(2C)n
C
C
j=1
,
. 2C 2 .
, . - ,
w = 0.
w:
Q(w) =

X
i=1

Li (w) +

n
1 X
|wj | min,
w
C j=1


Li (w) = L (yi f (xi , w)) . , . wj



uj = 12 |wj | + wj

vj = 12 |wj |wj . wj = uj vj |wj | = uj +vj .
, -:
Q(u, v) =

X
i=1

n
1 X
(uj + vj ) min;
Li (u v) +
u,v
C j=1

uj > 0,

vj > 0,

j = 1, . . . , n.

j uj > 0 vj > 0 ,
, Q(u, v)
, . C , - 2n . C j,
, uj = vj = 0, , wj = 0. ,
j- , .
, , wj j- , (features selection). C . C, wj .
. p(w), w Rn wj ,
, Cj wj :
 X

n
wj2
1

p(w) =
exp
.
2C
(2)n/2 C1 Cn
j
j=1

(4.4)

Cj wj :
Q(w) =

X
i=1


n 
wj2
1X
Li (w) +
ln Cj +
min .
w,C
2 j=1
Cj

. Cj 0,
wj , .
Cj , wj , .

4.2

X ; Y = {1, 1} ; n fj : X R, j = 1, . . . , n.
x = (x1 , . . . , xn ) Rn , xj = fj (x), x.


. 8. ( ).

x w Rn , :


a(x, w) = sign( ⟨w, x⟩ − w_0 ) = sign( Σ_{j=1}^{n} w_j f_j(x) − w_0 ).   (4.5)

hw, xi = 0 , Rn . x w, x +1, 1.
w0 . , , fj (x) 1, w0 wj .
-.
, 8. , , , . : . ,
. 100
1 , .
, , .
,
.
.
, xj = fj (x) n ,
. wj .
wj , j- , ,


Fig. 9. Activation functions φ(z):
φ(z) = [z > 0] - threshold;
σ(z) = (1 + e^{−z})^{−1} - sigmoid (S);
th(z) = 2σ(2z) − 1 - hyperbolic tangent (T);
ln(z + sqrt(z² + 1)) - logarithmic (L);
exp(−z²/2) - Gaussian (G);
z - linear (Z).

. w0 ,
+1, 1.
(z) = sign(z), , .
. (z) = th(z) , . . 9.
, (4.5) . 1943 [52].
. , . 100 /.
, ,
102 .
1011 ,
103 104 . ,
.
, ,
(, , ), , , , . ,
50- ,
.

. ,
.
. ,
.
,
, .


4.3



X = (xi , yi ) i=1 , xi Rn , yi {1, 1}.
w Rn , :

Q(w, X ) =

X
i=1


L hw, xi i yi min .
w

(4.6)

Q(w) .
w, , w
Q.

Q(w) n

Q (w) = wj j=1 :
w := w Q (w),

> 0 ,
(learning rate). , L ,
:
w := w − η Σ_{i=1}^{ℓ} L′( ⟨w, x_i⟩ y_i ) x_i y_i.   (4.7)

Instead of computing the full sum at every step, a single object (x_i, y_i) is picked (for example, at random) and the weights are updated by its term alone:

w := w − η L′( ⟨w, x_i⟩ y_i ) x_i y_i.   (4.8)

(stochastic gradient, SG) , . 4.1.


, .
.

1
1
, wj := random 2n
, 2n
. .

hy, fj i
, j = 1, . . . , n,
(4.9)
hfj , fj i

fj = fj (xi ) i=1 j- , y = (yi )i=1 .

.
wj :=

4.1 Q . . , .
1/. .


4.1. .
:
X ; ; .
:
w1 , . . . , wn ;
1: wj , j = 1, . . . , n;
2:
:

3:
4:
5:

6:
7:
8:

P
Q := i=1 L hw, xi i yi ;

xi X (, );

a(xi , w) :

i := L hw, xi i yi ;

 :
w := w L hw, xi i yi xi yi ;
:
Q := (1 )Q + i ;
Q / w ;

SG.

.
, , .
,
, .
SG.
Q, , ,
, .
n . ,
, ,
, , , .
, .
hw, xi i, L ,
(4.8). wj , .
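The single-object update (4.8) can be sketched for the quadratic loss L(M) = (M − 1)², which yields the delta rule discussed below; cycling over the sample instead of sampling at random, the learning rate, and the toy data are illustrative assumptions.

```python
def sgd_delta(X, y, lr=0.1, epochs=100):
    """Stochastic gradient (4.8) with quadratic loss, i.e. the delta rule:
    w := w - lr * (<w, x_i> - y_i) * x_i, one object per update."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # cycle over the sample
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

# the constant feature x^0 = 1 absorbs the bias term -w0
X = [(1, 0.0), (1, 0.2), (1, 2.0), (1, 2.2)]
y = [-1, -1, 1, 1]
w = sgd_delta(X, y)
signs = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
print(signs)  # [-1, -1, 1, 1]
```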


4.3.1

. , , L (M ) = (M 1)2 .


w := w − η( ⟨w, x_i⟩ − y_i ) x_i.   (4.10)

1960 (delta-rule),
ADALINE [66].

, Y = R, a(x) = hw, xi hw, xi i yi 2 .

. 1957 , . ,
.
, .
. , ,
. , . w.
. , fj (x) {0, 1}. ,
yi {1, 1}. , a(xi )
yi . .
1. a(xi ) yi , .
2. a(xi ) = 1 yi = 1, w .
wj , fj (xi ) 6= 0;
. w := w + xi , > 0 .
3. a(xi ) = 1 yi = 1, : w := w xi .
[45]:
if ⟨w, x_i⟩ y_i < 0 then w := w + η x_i y_i.   (4.11)

, (4.8), - L (M ) = (M )+ . (4.8)
, .
,
.
4.1 (, 1962 [55]). X = Rn , Y = {1, 1}, X
w ,
hw,
xi i yi > i = 1, . . . , . 4.1 (4.11)
, , w0 , > 0,
. w0 = 0,

 2
D
, D = max kxk.
tmax =

xX


. w t- wt , kwk
= 1:
[
cos(w,
wt ) =

hw,
wt i
.
kwt k

t- wt1
x, y, : hx, wt1 i y < 0.
(4.11) .
, :


w,
wt = w,
wt1 + hw,
xi y > w,
wt1 + > w,
w0 + t.
, kxk < D, :


kwt k2 = kwt1 k2 + 2 kxk2 + 2 x, wt1 y < kwt1 k2 + 2 D2 < kw0 k2 + t 2 D2 .
:
hw,
w0 i + t
[
cos(w,
wt ) > p
t .
kw0 k2 + t 2 D2

. , t x X , hx, wt i y < 0,
.
w0 = 0,
.


cos 6 1 t/D 6 1, tmax = (D/)2 .
,
. ,
.
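The Hebb rule (4.11), whose convergence on a linearly separable sample is guaranteed by Novikoff's theorem, can be sketched as follows; the AND-like toy data and the epoch cap are illustrative assumptions.

```python
def perceptron(X, y, lr=1.0, max_epochs=100):
    """Hebb rule (4.11): on a mistake (<w, x> * y <= 0) set w := w + lr * x * y.
    On a linearly separable sample the loop terminates by Novikoff's theorem."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if sum(wj * xj for wj, xj in zip(w, xi)) * yi <= 0:
                w = [wj + lr * xj * yi for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:
            return w
    return w

# separable AND-like data; the constant feature plays the role of the bias
X = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
y = [-1, -1, -1, 1]
w = perceptron(X, y)
ok = all(sum(wj * xj for wj, xj in zip(w, xi)) * yi > 0 for xi, yi in zip(X, y))
print(ok)  # True
```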
4.3.2

, .
, , 6.2.
, , . [50].
.
. kxi k , , . :
x^j := (x^j − x^j_min) / (x^j_max − x^j_min)   or   x^j := (x^j − x̄^j) / σ^j,   j = 1, . . . , n,

where x^j_min, x^j_max, x̄^j, σ^j are respectively the minimum, maximum, mean and standard deviation of the j-th feature.


. , .
1. ,
, .
, . , ,
. (shuing).
2. , ,
. .
,
,
, .
3. , . , . ,
: , .
.
(weights decay). ,
Q(w) :

Q (w) = Q(w) + kwk2 .


2
: Q (w) = Q (w) + w.

w := w(1 ) Q (w).
, (1 ), . , ,
. .
,
, .
.
1. , ,

t. ,
P
P
2

=
,
<
,
t = 1/t.
t 0,
t=1 t
t=1 t
2.  ,
Q w Q (w) min.

[8]. ,
ADALINE = kxi k2 .


.
, .
(jog of weights). ,
(stochastic local search).
. .
Q(w, X ) ,
. (early stopping): -
(. ??), ,
, , .

4.4

, . ,
. -, . -,
.
4.4.1

,
, . : ? , .
. , Y = {1, +1}, n fj : X R, j = 1, . . . , n. X = Rn ,
: x (f1 (x), . . . , fn (x)).
4.1. X Y . X = (xi , yi )i=1 p(x, y) = Py py (x) = P(y|x)p(x),
Py , py (x) , P(y|x) y Y .
. 4.2. p(x), x Rn ,
p(x) = exp c() h, xi + b(, ) + d(x, ) , Rn ,
, b, c, d .
.
: , , , , , -, .


4.1.
Rn Rnn = 1 = :

1
n
N (x; , ) = (2) 2 || 2 exp 12 (x ) 1 (x ) =

= exp 1 x 21 1 1 12 x 1 x n2 ln(2) 21 ln || .
| {z } |
{z
} |
{z
}
h,xi

b(,)

d(x,)

4.2. py (x)
, d ,
y .

. , a(x) = arg max y P(y|x), y y.


yY

a(x) = sign + P(+1|x) P(1|x) = sign

P(+1|x)

P(1|x) +

4.2. 4.1, 4.2, f1 (x), . . . , fn (x)


, :

1) : a(x) = sign hw, xi w0 ,
w0 = ln( /+ ), w , + ;
2) x X
y {1, +1}
:

1
P(y|x) = hw, xi y , (z) = 1+ez .
.
, py (x) y :


P+ p+ (x)
P(+1|x)
=
= exp h(c+ ()+ c () ), xi + b+ (, + ) b (, ) + ln PP+ .
P(1|x)
P p (x)
|
|
{z
}
{z
}
w=const(x)

const(x)

w x
. , x,
. , , hw, xi.
P(+1|x)
= ehw,xi .
P(1|x)

P(1|x) + P(+1|x) = 1, P(1|x) P(+1|x) hw, xi:




P(+1|x) = + hw, xi ; P(1|x) = hw, xi .

, : P(y|x) = hw, xi y .

P(1|x) = + P(+1|x), hw, xi ln + = 0, , .



Fig. 10. The logarithmic loss function log2(1 + e^{−M_i}).

4.4.2

Fig. 11. The threshold loss [M_i < 0] and its sigmoid approximation σ(−M_i).

. w X :

L(w, X ) = log2

Y
i=1

p(xi , yi ) max .
w

, p(x, y) = P(y|x)p(x), p(x) w.


4.2 
: P(y|x) = hw, xi y . ,

L(w, X ) =

X
i=1


log2 hw, xi i yi + const(w) max .
w

Maximizing L(w, X^ℓ) is equivalent to minimizing the functional Q̃(w, X^ℓ) of the form (4.1):

Q̃(w, X^ℓ) = Σ_{i=1}^{ℓ} log2( 1 + exp(−⟨w, x_i⟩ y_i) ) → min_w.   (4.12)

Thus the loss function is L(M) = log2(1 + e^{−M}).
e
. Q(w),

σ′(z) = σ(z)( 1 − σ(z) ) = σ(z)σ(−z),

the stochastic gradient step takes the form

w := w + η y_i x_i σ( −⟨w, x_i⟩ y_i ),   (4.13)

where (x_i, y_i) is the object chosen at the current iteration.
(xi , yi ) , .


. - , ,
. 10. (4.13) , , (4.11), . 11:


w := w + yi xi hw, xi i yi < 0 .

xi . ,
Mi (w) = hw, xi i yi , . ,
, ,
. (margin)
, [31, 60].
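The logistic update (4.13) can be sketched as follows; cycling over the sample, the learning rate, and the data are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_sgd(X, y, lr=0.5, epochs=200):
    """Logistic update (4.13): w := w + lr * y_i * x_i * sigma(-<w, x_i> y_i)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            m = sum(wj * xj for wj, xj in zip(w, xi)) * yi
            g = sigmoid(-m)
            w = [wj + lr * yi * xj * g for wj, xj in zip(w, xi)]
    return w

# constant feature for the bias; classes separated along the second feature
X = [(1, 0.0), (1, 0.3), (1, 2.0), (1, 2.3)]
y = [-1, -1, 1, 1]
w = logistic_sgd(X, y)
probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]  # P(+1 | x)
print([round(p) for p in probs])  # [0, 0, 1, 1]
```

The trained weights also give probability estimates P(+1 | x) = σ(⟨w, x⟩), which is the practical advantage of the logistic model discussed above.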

. w
-, , . 5.5.5.
,
(,
n n) . . , x
P (+1|x).
. ()
, .
n|Y | + n(n + 1)/2 (
), n
( w). ? ,
, , .
. ,

.
.
,
( ), -
( ).
.
.
, 4.2.


. , , ( ), , .
, nn .
4.4.3

. , X = {0, 1}n , , py (x) ,


, .
,
.

(score): fj (x) = 1, fj x, wj . w0 .

(scoring)
, , , , , , .
wj fj , sign(wj ) ,
. , , .


(scorecard), , ,
. 12. . , .
. . 12.

hw, xi , x +1: P(+1|x) = hw, xi.
, x
:
X
X

R(x) =
y P(y|x) =
y hw, xi y ,
yY

yY

y y.




. 4.2 , P(y|x) = hw, xi y
.
. ,
.

, 4.2 , a(x) = sign f (x, w) .
, P(+1|x) f (x, w).
, , - :

P(+1|x) = f (x, w) + .

(, ) R2

X
i=1

log P(yi |xi ) =

log yi f (xi , w) + yi

i=1

max
,

4.5 Support vector machines

6070- . . , [7].
,
, .
,
, , . ,
4.5.2, .
90-
(support vector machine, SVM) [38]. . [35, 61].
SVM . -,
SVM , ,
. -, :

. ; . ,


. ,
, .
4.5.1 The linearly separable case

Consider the binary classification task with objects described by $n$ numeric features: $X = \mathbb R^n$, $Y = \{-1,+1\}$. The linear classifier is
$$a(x) = \mathrm{sign}\Bigl(\sum_{j=1}^{n}w_jx^j - w_0\Bigr) = \mathrm{sign}\bigl(\langle w,x\rangle - w_0\bigr), \tag{4.14}$$
where $x = (x^1,\dots,x^n)$ is the feature description of object $x$, and $w = (w_1,\dots,w_n)\in\mathbb R^n$, $w_0\in\mathbb R$ are weight parameters. The equation $\langle w,x\rangle = w_0$ defines a separating hyperplane in $\mathbb R^n$.
Suppose the sample $X^\ell = (x_i,y_i)_{i=1}^{\ell}$ is linearly separable: there exist $w$, $w_0$ such that
$$Q(w,w_0) = \sum_{i=1}^{\ell}\bigl[y_i(\langle w,x_i\rangle - w_0)\leqslant 0\bigr] = 0.$$

. . ,
. , .
. ,
. : , (margin)
. [32, 59, 65].
If the sample is separable, the separating hyperplane is not unique: $a(x)$ does not change when $w$ and $w_0$ are multiplied by the same positive constant. It is convenient to fix the normalization by
$$\min_{i=1,\dots,\ell}\ y_i\bigl(\langle w,x_i\rangle - w_0\bigr) = 1. \tag{4.15}$$
Then the band $\bigl\{x\colon -1\leqslant\langle w,x\rangle - w_0\leqslant 1\bigr\}$ contains no training objects and separates the two classes, see Fig. 13.
. w.
. , , , (4.15).
,
.


Fig. 13. The optimal separating hyperplane. The nearest objects $x_-$ and $x_+$ of the two classes lie on the borders of the separating band; $w$ is its normal vector.

Let $x_-$ and $x_+$ be training objects of classes $-1$ and $+1$ lying on the borders of the band. The width of the band is the projection of $x_+-x_-$ onto the unit normal:
$$\Bigl\langle\frac{w}{\|w\|},\,x_+-x_-\Bigr\rangle = \frac{\langle w,x_+\rangle - \langle w,x_-\rangle}{\|w\|} = \frac{(w_0+1)-(w_0-1)}{\|w\|} = \frac{2}{\|w\|}.$$
The width is maximal when $\|w\|$ is minimal. Hence the optimal separating hyperplane solves the quadratic program
$$\begin{cases}\langle w,w\rangle \to \min;\\ y_i\bigl(\langle w,x_i\rangle - w_0\bigr)\geqslant 1,\quad i=1,\dots,\ell.\end{cases} \tag{4.16}$$
For a linearly separable sample the problem (4.16) has a unique solution.
. (4.16) , .
4.5.2 The linearly non-separable case

When the sample is not linearly separable, the constraints of (4.16) cannot all be satisfied. Introduce slack variables $\xi_i\geqslant 0$ measuring the violation of the margin condition on each object $x_i$, $i=1,\dots,\ell$, and penalize the total slack; (4.16) becomes the soft-margin problem:
$$\begin{cases}\dfrac12\langle w,w\rangle + C\displaystyle\sum_{i=1}^{\ell}\xi_i \to \min_{w,\,w_0,\,\xi};\\ y_i\bigl(\langle w,x_i\rangle - w_0\bigr)\geqslant 1-\xi_i,\quad i=1,\dots,\ell;\\ \xi_i\geqslant 0,\quad i=1,\dots,\ell,\end{cases} \tag{4.17}$$
where the parameter $C$ trades off maximization of the band width against minimization of the total error.

Fig. 14. The hinge loss upper-bounds the threshold loss: $[M_i<0]\leqslant(1-M_i)_+$.

As before, for $Y = \{-1,+1\}$ define the margin of object $x_i$ as
$$M_i(w,w_0) = y_i\bigl(\langle w,x_i\rangle - w_0\bigr).$$
The classifier (4.14) errs on $x_i$ exactly when $M_i<0$; $M_i\in(-1,+1)$ means $x_i$ falls inside the separating band; $M_i\geqslant 1$ means $x_i$ is classified correctly with a margin.
In (4.17) the slack is tied to the margin: minimality of $\sum_i\xi_i$ together with $\xi_i\geqslant 0$ and $\xi_i\geqslant 1-M_i$ forces $\xi_i = (1-M_i)_+$. Hence (4.17) is equivalent to the unconstrained minimization of $Q$ over $\xi_i$:
$$Q(w,w_0) = \sum_{i=1}^{\ell}\bigl(1-M_i(w,w_0)\bigr)_+ + \frac1{2C}\|w\|^2 \to \min_{w,w_0}. \tag{4.18}$$
Since $[M_i<0]\leqslant(1-M_i)_+$ (Fig. 14), the functional (4.18) upper-bounds the number of training errors plus the regularization term $\frac1{2C}\|w\|^2$.

[Mi < 0] 6 (1 Mi )+ , . 14, (4.18)


( ), kwk2 ,
1
.
2C
[M < 0] -
L (M ) = (1 M )+ .
L (M ) .
w. , w, .
, , . ,
. . [23].
(4.18) (4.3),

p(xi , yi |w) = z1 exp (1 Mi (w, w0 ))+ ,
w


kwk2
,
p(w; C) = z2 exp
2C


w0 . z1 , z2 , C .
,
,
, , SVM, .
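The unconstrained form (4.18) can be minimized directly by subgradient descent; a minimal sketch (the toy data and step size are illustrative assumptions, not a production SVM solver):

```python
import numpy as np

def svm_subgradient(X, y, C=10.0, eta=0.01, epochs=500):
    # Subgradient descent on Q = sum_i (1 - M_i)_+ + ||w||^2 / (2C),
    # where M_i = y_i (<w, x_i> - w0) is the margin of object i.
    l, n = X.shape
    w, w0 = np.zeros(n), 0.0
    for _ in range(epochs):
        M = y * (X @ w - w0)
        viol = M < 1                  # objects with (1 - M_i)_+ > 0
        w -= eta * (w / C - (y[viol, None] * X[viol]).sum(axis=0))
        w0 -= eta * y[viol].sum()
    return w, w0

X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-2.0, -3.0]])
y = np.array([+1, +1, -1, -1])
w, w0 = svm_subgradient(X, y)
preds = np.sign(X @ w - w0)
```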
. (4.17):

X
X
 X
1
i i =
i Mi (w, w0 ) 1 + i
i
L (w, w0 , ; , ) = kwk2 + C
2
i=1
i=1
i=1

X
 X

1
= kwk2
i Mi (w, w0 ) 1
i i + i C ,
2
i=1
i=1

= (1 , . . . , ) , w; = (1 , . . . , )
, = (1 , . . . , ).
- (4.17) :

L (w, w0 , ; , ) min max;

w,w0 , ,

i > 0, i > 0, i > 0, i = 1, . . . , ;

i = 0 Mi (w, w0 ) = 1 i , i = 1, . . . , ;

i = 0 i = 0, i = 1, . . . , ;
.

The necessary conditions at the saddle point of the Lagrangian give:
$$\frac{\partial\mathscr L}{\partial w} = w - \sum_{i=1}^{\ell}\alpha_iy_ix_i = 0 \ \Longrightarrow\ w = \sum_{i=1}^{\ell}\alpha_iy_ix_i; \tag{4.19}$$
$$\frac{\partial\mathscr L}{\partial w_0} = -\sum_{i=1}^{\ell}\alpha_iy_i = 0 \ \Longrightarrow\ \sum_{i=1}^{\ell}\alpha_iy_i = 0; \tag{4.20}$$
$$\frac{\partial\mathscr L}{\partial\xi_i} = -\alpha_i-\eta_i+C = 0 \ \Longrightarrow\ \eta_i+\alpha_i = C,\quad i=1,\dots,\ell. \tag{4.21}$$

By (4.19) the vector $w$ is a linear combination of the training objects $x_i$, and only objects with $\alpha_i>0$ contribute to it.
Definition 4.3. An object $x_i$ with $\alpha_i>0$ is called a support vector.
From (4.21) and $\eta_i\geqslant 0$ it follows that $0\leqslant\alpha_i\leqslant C$. Together with the complementary slackness conditions relating $\alpha_i$, $\eta_i$, $\xi_i$ and $M_i$, this splits the objects $x_i$, $i=1,\dots,\ell$, into three types:
1. $\alpha_i=0$; $\eta_i=C$; $\xi_i=0$; $M_i\geqslant 1$ — peripheral objects: $x_i$ does not enter the expansion of $w$ and is classified correctly;
2. $0<\alpha_i<C$; $0<\eta_i<C$; $\xi_i=0$; $M_i=1$ — support vectors on the border of the band;
3. $\alpha_i=C$; $\eta_i=0$; $\xi_i>0$; $M_i<1$ — support vectors violating the band: inside the band but correctly classified ($0<\xi_i<1$, $0<M_i<1$), on the hyperplane ($\xi_i=1$, $M_i=0$), or misclassified ($\xi_i>1$, $M_i<0$).
Substituting (4.19)-(4.21) back into the Lagrangian eliminates $w$, $w_0$, $\xi_i$ and $\eta_i$, leaving a problem in the dual variables $\alpha_i$ alone:
$$\begin{cases}-\mathscr L(\alpha) = -\displaystyle\sum_{i=1}^{\ell}\alpha_i + \frac12\sum_{i=1}^{\ell}\sum_{j=1}^{\ell}\alpha_i\alpha_jy_iy_j\langle x_i,x_j\rangle \to \min_\alpha;\\ 0\leqslant\alpha_i\leqslant C,\quad i=1,\dots,\ell;\\ \displaystyle\sum_{i=1}^{\ell}\alpha_iy_i = 0.\end{cases} \tag{4.22}$$

This is a quadratic program with a convex objective and linear constraints, solvable by standard methods. From its solution $\alpha$, the vector $w$ is recovered by (4.19), and $w_0$ from any border support vector: $w_0 = \langle w,x_i\rangle - y_i$. For numerical stability it is better to take the median over all border support vectors:
$$w_0 = \mathrm{med}\bigl\{\langle w,x_i\rangle - y_i\colon\ \alpha_i>0,\ M_i=1,\ i=1,\dots,\ell\bigr\}. \tag{4.23}$$
The resulting classifier is
$$a(x) = \mathrm{sign}\Bigl(\sum_{i=1}^{\ell}\alpha_iy_i\langle x_i,x\rangle - w_0\Bigr). \tag{4.24}$$

, ,
, i 6= 0. a(x) ,
. (sparsity); SVM
, .
i , . ( ) SVM.
, ; , .


. C . , C. , , ,
C, .
, ,
- , . C, , i .
. ,
, .
4.5.3 Kernel functions and the nonlinear generalization

.
X H : X H. H , ,
( , X , ,
). H .
, (xi ), xi , SVM , .
, hx, x i X h(x), (x )i H.
: H
, , ,
, .
Definition 4.4. A function $K\colon X\times X\to\mathbb R$ is a kernel function if it can be represented as $K(x,x') = \langle\psi(x),\psi(x')\rangle_H$ for some mapping $\psi\colon X\to H$ into a space $H$ with an inner product.
(4.22), (4.24)
,
. , hx, x i K(x, x ). , a : X Y .
, H
, .
, .
, , . K(x, x ), , SVM.

(featureless recognition), (kNN, RBF .) .


Which functions $K(x,x')$ are kernels? The classical answer is given by Mercer's theorem.
Theorem 4.3 (Mercer, 1909 [53]). A symmetric function $K(x,x') = K(x',x)$ is a kernel if and only if it is non-negative definite:
$$\int_X\!\!\int_X K(x,x')\,g(x)\,g(x')\,dx\,dx' \geqslant 0 \quad\text{for every } g\colon X\to\mathbb R.$$
Definition 4.5. $K(x,x')$ is non-negative definite if for every finite set $X^p = (x_1,\dots,x_p)\subset X$ the $p\times p$ Gram matrix $K = \|K(x_i,x_j)\|$ satisfies $z^{\mathsf T}Kz\geqslant 0$ for all $z\in\mathbb R^p$.

.
, , . , , . ,
. ,
, .
.
In practice kernels are usually obtained constructively from the following closure properties [19, 20].
1. $K(x,x') = \langle x,x'\rangle$ is a kernel.
2. The constant function $K(x,x')=1$ is a kernel.
3. The product of kernels $K(x,x') = K_1(x,x')K_2(x,x')$ is a kernel.
4. For any function $\varphi\colon X\to\mathbb R$, $K(x,x') = \varphi(x)\varphi(x')$ is a kernel.
5. A non-negative linear combination $K(x,x') = \alpha_1K_1(x,x') + \alpha_2K_2(x,x')$ with $\alpha_1,\alpha_2\geqslant 0$ is a kernel.
6. For any mapping $\varphi\colon X\to X$ and kernel $K_0$, $K(x,x') = K_0\bigl(\varphi(x),\varphi(x')\bigr)$ is a kernel.
7. For any function $s\colon X\times X\to\mathbb R$, $K(x,x') = \int_X s(x,z)\,s(x',z)\,dz$ is a kernel.
8. $K(x,x') = k(x-x')$ is a kernel if and only if the Fourier transform $F[k](\omega) = (2\pi)^{-\frac n2}\int_X e^{-i\langle\omega,x\rangle}k(x)\,dx$ is non-negative.
9. The limit of a pointwise convergent sequence of kernels is a kernel.
10. For any kernel $K_0$ and any function $f\colon\mathbb R\to\mathbb R$ representable as a convergent power series with non-negative coefficients, $K(x,x') = f\bigl(K_0(x,x')\bigr)$ is a kernel; in particular, $f(z)=e^{z}$ and $f(z)=\frac1{1-z}$ generate kernels.

. , : , ,
(RBF-), . ,
.
. , . ,
.
Example 4.2. Let $X=\mathbb R^2$ and $K(u,v) = \langle u,v\rangle^2$, $u=(u_1,u_2)$, $v=(v_1,v_2)$. A direct expansion shows that $K$ is a kernel:
$$K(u,v) = \langle u,v\rangle^2 = (u_1v_1+u_2v_2)^2 = u_1^2v_1^2 + u_2^2v_2^2 + 2u_1v_1u_2v_2 = \bigl\langle(u_1^2,\,u_2^2,\,\sqrt2\,u_1u_2),\ (v_1^2,\,v_2^2,\,\sqrt2\,v_1v_2)\bigr\rangle.$$
Here $H=\mathbb R^3$ and $\psi\colon\mathbb R^2\to\mathbb R^3$, $(u_1,u_2)\mapsto(u_1^2,\,u_2^2,\,\sqrt2\,u_1u_2)$. Linear separating surfaces in $H$ correspond to quadratic surfaces in the original space $X$.
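The kernel identity of Example 4.2 can be verified numerically; a minimal sketch:

```python
import numpy as np

def K(u, v):
    # Quadratic kernel K(u, v) = <u, v>^2
    return np.dot(u, v) ** 2

def psi(u):
    # Feature map psi: R^2 -> R^3, (u1, u2) -> (u1^2, u2^2, sqrt(2) u1 u2)
    return np.array([u[0] ** 2, u[1] ** 2, np.sqrt(2.0) * u[0] * u[1]])

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
lhs = K(u, v)
rhs = np.dot(psi(u), psi(v))
```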
Example 4.3 (polynomial kernel). Let $X=\mathbb R^n$ and
$$K(u,v) = \langle u,v\rangle^d.$$
The components of $\psi(u)$ are all monomials $(u^1)^{d_1}\cdots(u^n)^{d_n}$ with $d_1+\dots+d_n = d$, so the space $H$ has dimension $C_{n+d-1}^{d}$. Separating surfaces in $X$ are polynomial surfaces of degree $d$ in $u^1,\dots,u^n$.
Example 4.4. Let $X=\mathbb R^n$ and
$$K(u,v) = \bigl(\langle u,v\rangle + 1\bigr)^d.$$
Now $H$ is spanned by all monomials of degree up to $d$ in $u^1,\dots,u^n$, and separating surfaces in $X$ are all polynomial surfaces of degree at most $d$.
SVM . a(x)
(4.24) hxi , xi K(xi , x).
, h . i = 0
, i = h + 1, . . . , , a(x)
X

h
a(x) = sign
i yi K(xi , x) w0 .
i=1

Fig. 15. The support vector machine (SVM) as a two-layer network: the hidden layer computes the kernel values $K(x,x_1),\dots,K(x,x_h)$ on the support vectors, the output neuron combines them with weights $\alpha_iy_i$ and threshold $w_0$, followed by sign.

X = Rn , a(x) ,
. ,
. , SVM, .
-, ,
.
-, : i
K(x, xi ), , xi .
4.5.
,

K(u, v) = th k0 + k1 hu, vi .

k0 k1 . , k0 < 0 k1 < 0 [36].


. th z (z) = 1+e1z .

, K(u, v) ? (4.22) . , ,
. ,
0 6 i 6 C , .

- , .
Example 4.6 (radial basis functions). The Gaussian RBF kernel is
$$K(u,v) = \exp\bigl(-\beta\|u-v\|^2\bigr), \qquad \beta>0.$$
The value $K(x_i,x)$ decays with the distance from $x$ to $x_i$, so the classifier becomes a weighted vote of Gaussian 'bumps' centred at the support vectors.


, i yi .
+1 , 1 . ,
x .
2.4.3 RBF-, EM-.
. ,
. SVM-RBF EM-RBF. SVM
,
. , SVM-RBF
. , EM-RBF .
SVM.
, .
, .
.
SVM.
. - .
. ,
, .
C .
(RVM). ,
w - . w, .
(relevance vector machine,
RVM). ;
[64, 33].
, (SVM) w xi :
w=

X
i=1

i yi xi ,

(4.25)


i ,
xi . SVM , , -,
. RVM ,
.
RVM (4.25), ,
i . , , SVM. , i i :
!

X
1
2i
p() =
exp

(2)/2 1
2i
i=1

, (4.4),
, .
, , SVM,
. ,
, SVM,
, , , . ,
.

4.6 ROC curves and threshold selection


 , Y = {1, +1},
a(x, w) = sign f (x, w) w0 , w0 R . .
4.2 w0 : w0 = ln + , +
+1 1 .
.
ROC-, , , .
(receiver operating characteristic, ROC curve) . II ,
1941 , .
: , , , , . .
ROC- .

Consider two quality characteristics of a classifier $a$ on the sample $X^\ell$. The false positive rate (FPR) is the fraction of negative objects classified as positive:
$$\mathrm{FPR}(a,X^\ell) = \frac{\sum_{i=1}^{\ell}[y_i=-1]\,[a(x_i)=+1]}{\sum_{i=1}^{\ell}[y_i=-1]}.$$


Algorithm 4.2. Building the ROC curve and computing AUC
Input: sample $X^\ell$; scoring function $f(x) = \langle w,x\rangle$;
Output: the points $(\mathrm{FPR}_i,\mathrm{TPR}_i)$, $i=0,\dots,\ell$, of the ROC curve; the area AUC under it.
1: $\ell_- := \sum_{i=1}^{\ell}[y_i=-1]$ — the number of objects of class $-1$; $\ell_+ := \sum_{i=1}^{\ell}[y_i=+1]$ — of class $+1$;
2: sort the sample $X^\ell$ by decreasing $f(x_i)$;
3: initialize: $(\mathrm{FPR}_0,\mathrm{TPR}_0) := (0,0)$; $\mathrm{AUC} := 0$;
4: for $i := 1,\dots,\ell$
5:   if $y_i = -1$ then step right:
6:     $\mathrm{FPR}_i := \mathrm{FPR}_{i-1} + \frac1{\ell_-}$;  $\mathrm{TPR}_i := \mathrm{TPR}_{i-1}$;  $\mathrm{AUC} := \mathrm{AUC} + \frac1{\ell_-}\mathrm{TPR}_i$;
7:   else step up:
8:     $\mathrm{FPR}_i := \mathrm{FPR}_{i-1}$;  $\mathrm{TPR}_i := \mathrm{TPR}_{i-1} + \frac1{\ell_+}$;
The quantity $1-\mathrm{FPR}(a)$ is called the true negative rate (TNR), or specificity, of $a$. Symmetrically, the true positive rate (TPR), also called sensitivity, is the fraction of positive objects classified as positive:
$$\mathrm{TPR}(a,X^\ell) = \frac{\sum_{i=1}^{\ell}[y_i=+1]\,[a(x_i)=+1]}{\sum_{i=1}^{\ell}[y_i=+1]}.$$

The ROC curve shows how FPR and TPR change as the threshold $w_0$ runs over all values; every ROC curve goes from $(0,0)$ to $(1,1)$. The ideal classifier passes through $(0,1)$; a random one lies near the diagonal from $(0,0)$ to $(1,1)$. Algorithm 4.2 exploits the fact that it suffices to vary $w_0$ over the $\ell$ values $f(x_i)=\langle w,x_i\rangle$, which requires a single sort of the sample. The aggregate quality over all thresholds is the area under the ROC curve (AUC), accumulated along the way in Algorithm 4.2.
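A direct transcription of Algorithm 4.2 (the toy scores and labels are illustrative):

```python
import numpy as np

def roc_auc(scores, labels):
    # Walk objects in decreasing score order, stepping right (FPR) on
    # negatives and up (TPR) on positives; accumulate AUC as the area
    # of the resulting staircase.
    order = np.argsort(-scores)
    n_pos = np.sum(labels == +1)
    n_neg = np.sum(labels == -1)
    fpr, tpr, auc = 0.0, 0.0, 0.0
    curve = [(0.0, 0.0)]
    for i in order:
        if labels[i] == -1:
            fpr += 1.0 / n_neg
            auc += tpr / n_neg
        else:
            tpr += 1.0 / n_pos
        curve.append((fpr, tpr))
    return curve, auc

scores = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
labels = np.array([+1, +1, -1, +1, -1])
curve, auc = roc_auc(scores, labels)
```

For this sample 5 of the 6 positive-negative pairs are ordered correctly, so the AUC equals $5/6$.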


Now let $Y=\mathbb R$. The regression task: given a feature space $X$ and a sample $X^\ell = (x_i,y_i)_{i=1}^{\ell}$ with $y_i = y^*(x_i)$ produced by an unknown target dependence $y^*\colon X\to Y$, build a function $a\colon X\to Y$ approximating $y^*$.

5.1 Least squares

Fix a parametric family of regression models $g(x,\alpha)$, $\alpha\in\mathbb R^p$. The method of least squares minimizes the residual sum of squares over the sample $X^\ell$:
$$Q(\alpha,X^\ell) = \sum_{i=1}^{\ell}\bigl(g(x_i,\alpha)-y_i\bigr)^2, \tag{5.1}$$
$$\alpha^* = \arg\min_{\alpha\in\mathbb R^p}Q(\alpha,X^\ell). \tag{5.2}$$

(5.2)

If $g(x,\alpha)$ is differentiable in $\alpha$, a necessary condition of minimum is the system of $p$ equations:
$$\frac{\partial Q}{\partial\alpha}(\alpha,X^\ell) = 2\sum_{i=1}^{\ell}\bigl(g(x_i,\alpha)-y_i\bigr)\frac{\partial g}{\partial\alpha}(x_i,\alpha) = 0. \tag{5.3}$$

5.2 Nonparametric regression

Nonparametric (kernel) smoothing estimates the value $a(x)$ at each point $x$ directly from nearby sample objects, much like the metric classification methods of Section 2.2.2; it requires only the pairwise distances $\rho(x,x')$ on $X$.

5.2.1 Kernel smoothing

Take the simplest local model $g(x,\alpha)=\alpha$, $\alpha\in\mathbb R$ — a constant — but fit it separately at each point $x\in X$, weighting the objects by their proximity to $x$:
$$Q(\alpha;X^\ell) = \sum_{i=1}^{\ell}w_i(x)\bigl(\alpha-y_i\bigr)^2 \to \min_{\alpha\in\mathbb R}.$$
The weights $w_i$ should decrease with the distance $\rho(x,x_i)$, which is expressed through a non-increasing kernel $K\colon[0,\infty)\to[0,\infty)$:
$$w_i(x) = K\Bigl(\frac{\rho(x,x_i)}{h}\Bigr),$$
where $h$ is the window width: the smaller $h$, the faster the weights decay as $x_i$ moves away from $x$. Setting $\frac{\partial Q}{\partial\alpha}=0$ yields the Nadaraya-Watson estimate:
$$a_h(x;X^\ell) = \frac{\sum_{i=1}^{\ell}y_iw_i(x)}{\sum_{i=1}^{\ell}w_i(x)} = \frac{\sum_{i=1}^{\ell}y_iK\bigl(\frac{\rho(x,x_i)}{h}\bigr)}{\sum_{i=1}^{\ell}K\bigl(\frac{\rho(x,x_i)}{h}\bigr)}, \tag{5.4}$$
i.e. $a(x)$ is a weighted average of the $y_i$ over the objects $x_i$ nearest to $x$. In the one-dimensional case $X=\mathbb R$ one takes $\rho(x,x_i)=|x-x_i|$. Formula (5.4) parallels the metric classification formulas of Section 2.3.

: a(x) yi xi , x.
X = R1 (x, xi ) = |x xi |.
(5.4) ,
2.3 .
Theorem 5.1 ([25]). Suppose that:
1) the sample $X^\ell = (x_i,y_i)_{i=1}^{\ell}$ is i.i.d. with density $p(x,y)$;
2) the kernel satisfies $\int_0^\infty K(r)\,dr<\infty$ and $\lim_{r\to\infty}rK(r)=0$;
3) the conditional distribution $p(y|x)$ has a bounded second moment: $E(y^2|x) = \int_Y y^2p(y|x)\,dy<\infty$ for all $x\in X$;
4) the width $h=h_\ell$ satisfies $\lim_{\ell\to\infty}h_\ell=0$ and $\lim_{\ell\to\infty}\ell h_\ell=\infty$.
Then $a_h(x;X^\ell)\to E(y|x)$ in probability at every point $x$ where $E(y|x)$, $D(y|x)$ and $p(x)$ are continuous and $p(x)>0$.
Thus the quality of the estimate hinges on a proper choice of the width $h$.
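A minimal sketch of the Nadaraya-Watson estimate (5.4) with a Gaussian kernel (the test function and width below are illustrative):

```python
import numpy as np

def nadaraya_watson(x, X, y, h):
    # Kernel-weighted average (5.4) with the Gaussian kernel K(r) = exp(-r^2/2)
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

X = np.linspace(0.0, 1.0, 21)
y = X ** 2
a = nadaraya_watson(0.5, X, y, h=0.05)   # should be close to 0.5^2 = 0.25
```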
5.2.2 Choosing the kernel and the window width

The estimate $a_h(x;X^\ell)$ depends both on the kernel $K$ and, far more strongly, on the width $h$.


K ,
ah (x). ah (x)
, K(r).


. 3.


KG (r) = exp 12 r2 KQ (r) = (1 r2 )2 |r| < 1 .
K(r) , K(r) = 0 r > 1, xi , (x, xi ) < h. (5.4) x.
X = R1
xi . , x.
h . (h 0) ah (x)
, .
h
. ,
h .
, X.
, . h(x),
x. ,

(x,xi )
wi (x) = K h(x) .
h(x) (k+1)
x k + 1- : hk (x) = (x, xx ). ,
hk (x) , , wi (x)
ahk (x) , . hk (x)
, - , , KQ .
. h k xi ,
. , h 0.
(leave-one-out, LOO):

LOO(h, X ) =

X
i=1


2
ah xi ; X \{xi } yi min,
h

h k.
5.2.3 Robust smoothing

. ,


Algorithm 5.1. LOWESS — robust locally weighted smoothing
Input: sample $X^\ell$;
Output: object weights $\gamma_i$, $i=1,\dots,\ell$;
1: initialize: $\gamma_i := 1$, $i=1,\dots,\ell$;
2: repeat
3:   compute the leave-one-out smoothed values, $i=1,\dots,\ell$:
$$a_i := a_h\bigl(x_i;X^\ell\setminus\{x_i\}\bigr) = \frac{\sum_{j\neq i}y_j\gamma_jK\bigl(\frac{\rho(x_i,x_j)}{h(x_i)}\bigr)}{\sum_{j\neq i}\gamma_jK\bigl(\frac{\rho(x_i,x_j)}{h(x_i)}\bigr)};$$
4:   recompute the object weights from the residuals: $\gamma_i := \tilde K\bigl(|a_i-y_i|\bigr)$, $i=1,\dots,\ell$;
5: until the $\gamma_i$ stabilize;

The residual $\varepsilon_i = \bigl|a_h(x_i;X^\ell\setminus\{x_i\})-y_i\bigr|$ is large for outliers $(x_i,y_i)$; multiplying the kernel weights $w_i(x)$ by $\gamma_i = \tilde K(\varepsilon_i)$ with a non-increasing $\tilde K$ suppresses their influence on the fit. Iterating steps 3-4 makes $a_h$ progressively less sensitive to outliers — the smoothing becomes robust (locally weighted scatter plot smoothing, LOWESS [37]). Cleveland [37] recommends the quartic kernel rescaled by the median residual:
$$\tilde K(\varepsilon_i) = K_Q\Bigl(\frac{\varepsilon_i}{6\,\mathrm{med}\{\varepsilon_j\}}\Bigr).$$
5.2.4 Local linear regression

X = R1 ah (x) y (x) xi , . ??. ,


xi ( ) x.
, .
x X a(u) = , a(u) = (u x) + .


wi = wi (x), di = xi x :

Q(, ; X ) =

X
i=1

wi di + yi

2

min .
,R

= 0 Q
= 0, Q

2 2, :

ah (x; X ) =

i=1

wi d2i

w i yi

i=1

wi

i=1

i=1

wi d2i

wi di

i=1

P

i=1

i=1

wi di

w i d i yi
2

X = Rn
a(u) = (u x) + (. ). x X,
.

5.3 Multiple linear regression


Let $f_1(x),\dots,f_n(x)$ be numeric features, $f_j\colon X\to\mathbb R$, $j=1,\dots,n$. The linear regression model in $\mathbb R^n$ is
$$g(x,\alpha) = \sum_{j=1}^{n}\alpha_jf_j(x).$$
Introduce matrix notation: $F = \bigl(f_j(x_i)\bigr)_{\ell\times n}$ — the object-feature matrix, $y = (y_i)_{\ell\times 1}$ — the target vector, $\alpha = (\alpha_j)_{n\times 1}$ — the parameter vector. Then
$$Q(\alpha) = \|F\alpha - y\|^2,$$
and the condition (5.3) becomes
$$\frac{\partial Q}{\partial\alpha}(\alpha) = 2F^{\mathsf T}(F\alpha - y) = 0,$$
the normal system $F^{\mathsf T}F\alpha = F^{\mathsf T}y$. If the $n\times n$ matrix $F^{\mathsf T}F$ is non-singular, the solution is
$$\alpha^* = (F^{\mathsf T}F)^{-1}F^{\mathsf T}y = F^{+}y,$$
where $F^{+} = (F^{\mathsf T}F)^{-1}F^{\mathsf T}$ is the pseudoinverse of $F$. The minimal value is
$$Q(\alpha^*) = \|P_Fy - y\|^2,$$
where $P_F = FF^{+} = F(F^{\mathsf T}F)^{-1}F^{\mathsf T}$ is the projection matrix: $P_Fy$ is the projection of $y$ onto the linear span of the columns of $F$, and the residual $(P_Fy-y)$ is orthogonal to that span. Geometrically, least squares projects the target vector onto the column space of $F$.
5.3.1 Singular value decomposition

Any $\ell\times n$ matrix of rank $n$ admits a singular value decomposition (SVD)
$$F = VDU^{\mathsf T}$$
with the following properties:
1) $D = \mathrm{diag}\bigl(\sqrt{\lambda_1},\dots,\sqrt{\lambda_n}\bigr)$ is an $n\times n$ diagonal matrix, where $\lambda_1,\dots,\lambda_n$ are the common non-negative eigenvalues of $F^{\mathsf T}F$ and $FF^{\mathsf T}$;
2) $V = (v_1,\dots,v_n)$ is an $\ell\times n$ orthogonal matrix, $V^{\mathsf T}V = I_n$, whose columns $v_j$ are eigenvectors of $FF^{\mathsf T}$ corresponding to $\lambda_1,\dots,\lambda_n$;
3) $U = (u_1,\dots,u_n)$ is an $n\times n$ orthogonal matrix, $U^{\mathsf T}U = I_n$, whose columns $u_j$ are eigenvectors of $F^{\mathsf T}F$ corresponding to $\lambda_1,\dots,\lambda_n$.
With the SVD in hand, all quantities of interest are computed directly. The pseudoinverse:
$$F^{+} = \bigl(UDV^{\mathsf T}VDU^{\mathsf T}\bigr)^{-1}UDV^{\mathsf T} = UD^{-1}V^{\mathsf T} = \sum_{j=1}^{n}\frac1{\sqrt{\lambda_j}}\,u_jv_j^{\mathsf T};$$
the least-squares solution:
$$\alpha^* = F^{+}y = UD^{-1}V^{\mathsf T}y = \sum_{j=1}^{n}\frac1{\sqrt{\lambda_j}}\,u_j\bigl(v_j^{\mathsf T}y\bigr); \tag{5.5}$$
the fitted vector:
$$F\alpha^* = P_Fy = (VDU^{\mathsf T})UD^{-1}V^{\mathsf T}y = VV^{\mathsf T}y = \sum_{j=1}^{n}v_j\bigl(v_j^{\mathsf T}y\bigr); \tag{5.6}$$
the squared norm of the solution:
$$\|\alpha^*\|^2 = \|D^{-1}V^{\mathsf T}y\|^2 = \sum_{j=1}^{n}\frac1{\lambda_j}\bigl(v_j^{\mathsf T}y\bigr)^2. \tag{5.7}$$
Thus, once the SVD is computed, solving the least-squares problem and analysing its solution is cheap.
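A minimal numeric check of (5.5): the least-squares solution recovered through the SVD. Note that in the text's notation $F = VDU^{\mathsf T}$, so the first factor returned by `numpy.linalg.svd` plays the role of $V$:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(30, 4))                 # l x n object-feature matrix
alpha_true = np.array([1.0, -2.0, 0.5, 3.0])
y = F @ alpha_true                           # noiseless targets

# SVD-based least squares (5.5): alpha* = U D^{-1} V^T y
V, d, Ut = np.linalg.svd(F, full_matrices=False)   # F = V diag(d) Ut
alpha = Ut.T @ ((V.T @ y) / d)
```

On noiseless data the true parameter vector is recovered exactly (up to rounding).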


5.3.2 Multicollinearity

If the matrix $\Sigma = F^{\mathsf T}F$ is ill-conditioned, the solution of the normal system becomes unstable. Ill-conditioning arises when some features are (nearly) linear combinations of others — multicollinearity; in the extreme case $\mathrm{rk}\,\Sigma = m < n$. The degree of instability is measured by the condition number
$$\mu(\Sigma) = \|\Sigma\|\,\|\Sigma^{-1}\| = \frac{\max_{u\colon\|u\|=1}\|\Sigma u\|}{\min_{u\colon\|u\|=1}\|\Sigma u\|} = \frac{\lambda_{\max}}{\lambda_{\min}},$$
the ratio of the extreme eigenvalues. A matrix is considered ill-conditioned when $\mu(\Sigma)\gtrsim 10^2\ldots10^4$. Writing $z = \Sigma^{-1}u$, the relative error of the solution is amplified up to $\mu(\Sigma)$ times:
$$\frac{\|\delta z\|}{\|z\|} \leqslant \mu(\Sigma)\,\frac{\|\delta u\|}{\|u\|}.$$
The small eigenvalues in the denominators of (5.7) then produce very large weights $|\alpha_j|$ of opposite signs; the prediction $g(x,\alpha)$ may look reasonable on the training objects, yet such solutions are unstable and generalize poorly, with individual terms $\alpha_jf_j$ far exceeding the target in magnitude.
5.3.3 Ridge regression

To stabilize the solution, add a penalty on the norm $\|\alpha\|$ to $Q$:
$$Q_\tau(\alpha) = \|F\alpha - y\|^2 + \tau\|\alpha\|^2,$$
where $\tau$ is a non-negative regularization parameter. The modified functional has a unique minimum:
$$\alpha^*_\tau = \bigl(F^{\mathsf T}F + \tau I_n\bigr)^{-1}F^{\mathsf T}y.$$
Adding $\tau$ to the diagonal of $F^{\mathsf T}F$ improves its conditioning — hence the name ridge regression. Through the SVD:
$$\alpha^*_\tau = \bigl(UD^2U^{\mathsf T}+\tau I_n\bigr)^{-1}UDV^{\mathsf T}y = U\bigl(D^2+\tau I_n\bigr)^{-1}DV^{\mathsf T}y = \sum_{j=1}^{n}\frac{\sqrt{\lambda_j}}{\lambda_j+\tau}\,u_j\bigl(v_j^{\mathsf T}y\bigr);$$
and the fitted vector:
$$F\alpha^*_\tau = VDU^{\mathsf T}\alpha^*_\tau = V\,\mathrm{diag}\Bigl(\frac{\lambda_j}{\lambda_j+\tau}\Bigr)V^{\mathsf T}y = \sum_{j=1}^{n}\frac{\lambda_j}{\lambda_j+\tau}\,v_j\bigl(v_j^{\mathsf T}y\bigr). \tag{5.8}$$

Compared with (5.6), the projections of $y$ onto the eigen-directions are multiplied by the factors $\frac{\lambda_j}{\lambda_j+\tau}\in(0,1)$, i.e. shrunk. The norm of the solution decreases accordingly:
$$\|\alpha^*_\tau\|^2 = \sum_{j=1}^{n}\frac{\lambda_j}{(\lambda_j+\tau)^2}\bigl(v_j^{\mathsf T}y\bigr)^2 < \sum_{j=1}^{n}\frac1{\lambda_j}\bigl(v_j^{\mathsf T}y\bigr)^2 = \|\alpha^*\|^2.$$
In the English-language literature this effect is called shrinkage or weight decay [44].
Regularization also reduces the effective dimensionality of the solution. For ordinary least squares
$$\mathrm{tr}\,F(F^{\mathsf T}F)^{-1}F^{\mathsf T} = \mathrm{tr}\,(F^{\mathsf T}F)^{-1}F^{\mathsf T}F = \mathrm{tr}\,I_n = n,$$
whereas for ridge regression the effective dimensionality is smaller, decreasing from $n$ towards $0$ as $\tau$ grows:
$$n_\tau = \mathrm{tr}\,F\bigl(F^{\mathsf T}F+\tau I_n\bigr)^{-1}F^{\mathsf T} = \mathrm{tr}\,\mathrm{diag}\Bigl(\frac{\lambda_j}{\lambda_j+\tau}\Bigr) = \sum_{j=1}^{n}\frac{\lambda_j}{\lambda_j+\tau} < n.$$
The choice of $\tau$ is a trade-off: a too small $\tau$ leaves the problem ill-conditioned, a too large one biases the solution towards zero; $\tau$ can be selected by cross-validation, cf. Section 1.1.6. A practical heuristic is to take $\tau$ in the range $[0.1,\,0.4]$ after normalizing the columns of $F$, or to choose it from a target condition number $M_0$:
$$M_0 = \mu\bigl(F^{\mathsf T}F+\tau I_n\bigr) = \frac{\lambda_{\max}+\tau}{\lambda_{\min}+\tau},$$
whence $\tau\approx\lambda_{\max}/M_0$.
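A minimal sketch of ridge regression and its shrinkage effect (the synthetic data are illustrative):

```python
import numpy as np

def ridge(F, y, tau):
    # alpha*_tau = (F^T F + tau I)^{-1} F^T y
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + tau * np.eye(n), F.T @ y)

rng = np.random.default_rng(1)
F = rng.normal(size=(50, 3))
y = F @ np.array([2.0, 0.0, -1.0]) + 0.01 * rng.normal(size=50)
th0 = ridge(F, y, 0.0)    # ordinary least squares
th1 = ridge(F, y, 10.0)   # regularized solution
```

Increasing $\tau$ shrinks the norm of the solution, as derived above.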
5.3.4 LASSO

A different regularizer constrains the $L_1$ norm of the weights:
$$\begin{cases}Q(\alpha) = \|F\alpha - y\|^2 \to \min_\alpha;\\ \displaystyle\sum_{j=1}^{n}|\alpha_j| \leqslant \varkappa.\end{cases} \tag{5.9}$$
As $\varkappa$ decreases, more and more coefficients $\alpha_j$ become exactly zero, so (5.9) simultaneously fits the model and selects features (least absolute shrinkage and selection operator, LASSO [63]).
To remove the non-differentiable absolute values, substitute $\alpha_j = \alpha_j^+-\alpha_j^-$ with $\alpha_j^{\pm}\geqslant 0$; then $Q$ is minimized subject to $2n$ non-negativity constraints and one linear constraint:
$$\sum_{j=1}^{n}\bigl(\alpha_j^++\alpha_j^-\bigr)\leqslant\varkappa; \qquad \alpha_j^+\geqslant 0; \qquad \alpha_j^-\geqslant 0.$$
When $\alpha_j^+=\alpha_j^-=0$, the weight $\alpha_j$ vanishes and the $j$-th feature is excluded.
Both ridge and LASSO shrink the weights, but LASSO zeroes some of them exactly, performing selection, while ridge only shrinks all of them smoothly; Fig. 16, following [44], compares the coefficient paths of the two methods.

Fig. 16. Coefficient paths $\{\alpha_j\}$ of ridge regression (left, against $\sigma=1/\tau$) and LASSO (right, against $\varkappa$) on the UCI.cancer data [44].

5.3.5 Least squares with sign constraints

. , , fj , y.
(. ??). , y,
.
Q(, X ) n :
(
Q() = kF yk2 min;

j > 0;

j = 1, . . . , n.

-, . j > 0 ,
, fj , , .
j- .

5.4 Principal component analysis

Principal component analysis (PCA) builds a small number of new features as linear combinations of the original ones, so that the original feature descriptions are restored from them as accurately as possible. PCA is an unsupervised method: it works with the matrix $F$ alone and ignores the target vector $y$.

, PCA , ,
, , .
. n fj (x), j = 1, . . . , n.
,  : xi f1 (xi ), . . . , fn (xi ) , i = 1, . . . , . F ,
:


x1
f1 (x1 ) . . . fn (x1 )

...
...
...
= . . . .
Fn =
x
f1 (x ) . . . fn (x )

zi = g1 (xi ), . . . , gm (xi ) Z = Rm , m < n:


z1
g1 (x1 ) . . . gm (x1 )

...
...
...
= . . . .
Gm =
z
g1 (x ) . . . gm (x )
We seek $m$ new features $g_s(x)$, $s=1,\dots,m$, $m<n$, with descriptions $z_i = \bigl(g_1(x_i),\dots,g_m(x_i)\bigr)\in\mathbb R^m$ forming the $\ell\times m$ matrix $G$, such that the old features are restored linearly:
$$\hat f_j(x) = \sum_{s=1}^{m}g_s(x)\,u_{js}, \qquad j=1,\dots,n,\ x\in X,$$
with a transformation matrix $U = (u_{js})_{n\times m}$; in matrix form, $\hat x = zU^{\mathsf T}$. The restoration should be as accurate as possible on the sample:
$$\Delta^2(G,U) = \sum_{i=1}^{\ell}\|\hat x_i - x_i\|^2 = \sum_{i=1}^{\ell}\|z_iU^{\mathsf T} - x_i\|^2 = \|GU^{\mathsf T}-F\|^2 \to \min_{G,U}, \tag{5.10}$$
where $\|A\|^2 = \mathrm{tr}\,AA^{\mathsf T} = \mathrm{tr}\,A^{\mathsf T}A$ is the Frobenius norm. Without loss of generality, $\mathrm{rk}\,G = \mathrm{rk}\,U = m \leqslant \mathrm{rk}\,F$. The minimum of (5.10) is described by the following theorem.
Theorem 5.2. If $m\leqslant\mathrm{rk}\,F$, the minimum of $\Delta^2(G,U)$ is attained when the columns of $U$ are orthonormal eigenvectors of $F^{\mathsf T}F$ corresponding to its $m$ largest eigenvalues, $G = FU$, and the columns of $G$ are orthogonal.


. :
(
2 /G = (GU F )U = 0;
2 /U = G (GU F ) = 0.
G U ,
(
G = F U (U U )1 ;
U = F G(G G)1 .

(5.11)

2 (G, U ) GU , (5.10) R: GU = (GR)(R1 U ). R ,


U U G G . , .
U (5.10).
G
U U , , , Smm , S 1 U U S 1 = Im .
GS
, S G
GS)T

Tmm , T (S G
= diag(1 , . . . , m ) . T T = Im .

R = ST . G = GR,
U = R1 U .
GS)T

G G = T (S G
= ;
U U = T 1 (S 1 U U S 1 )T 1 = (T T )1 = Im .
U G U (5.10) GU = G
. G U (5.11).
G G U U :
(
G = F U;
U = F G.
, U = F F U . ,
U F F , 1 , . . . , m .
, , G = F F G,
G F F , .
G U 2 (G, U ), :
2 (G, U ) = kF GU k2 = tr(F U G )(F GU ) = tr F (F GU ) =
= tr F F tr F GU = kF k2 tr U U =
n
m
n
X
X
X
2
= kF k tr =
j
j =
j ,
j=1

j=1

j=m+1

1 , . . . , n F F . 2 ,
1 , . . . , m m n .

u1 , . . . , um , , .
5.2 .


. m = n, 2 (G, U ) = 0.
F = GU :
F = GU = V DU , G = V D = D2 . V : V V = Im . ,
. 85, 5.2.
m < n, F GU .
GU F
() n m .
. G G = ,
g1 , . . . , gm . U
. m = n,
U : F = GU G = F U .

kG yk2 min .

U , G = GU U = GU F , = U . ,
F GU .
,
U : = U U = U .
- ,
G G :
= 1 G y = D1 V y;
G = V D = V V y.
= U - , ,
, (5.5)(5.7) m 6 n , n m .
. (5.5)(5.7). ,
. , .
How many principal components should be kept? Order the eigenvalues of $F^{\mathsf T}F$: $\lambda_1\geqslant\dots\geqslant\lambda_n\geqslant 0$. The relative reconstruction error of keeping $m$ components is
$$E(m) = \frac{\|GU^{\mathsf T}-F\|^2}{\|F\|^2} = \frac{\lambda_{m+1}+\dots+\lambda_n}{\lambda_1+\dots+\lambda_n} \leqslant \varepsilon.$$
Given a tolerance $\varepsilon\in[0,1]$, take the smallest $m$ — the effective dimensionality of the sample — for which $E(m)\leqslant\varepsilon$. In practice one examines how $E(m)$ decreases with $m$: if the data lie near an $m$-dimensional subspace, the gap $E(m-1)-E(m)$ is large at that $m$ and $E(m)$ drops sharply.
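A minimal PCA sketch via the SVD: for synthetic rank-2 data in $\mathbb R^3$, two components reconstruct the matrix almost exactly (the names and data are illustrative):

```python
import numpy as np

def pca(F, m):
    # Principal components from the SVD F = V D U^T: G = F U_m are the
    # new features; the squared reconstruction error equals the sum of
    # the discarded eigenvalues lambda_{m+1} + ... + lambda_n.
    V, d, Ut = np.linalg.svd(F, full_matrices=False)
    U_m = Ut[:m].T             # n x m matrix of principal directions
    G = F @ U_m                # l x m matrix of projections
    err = np.sum(d[m:] ** 2)   # ||G U_m^T - F||^2
    return G, U_m, err

rng = np.random.default_rng(2)
# 100 objects in R^3 generated from 2 latent dimensions => rank 2
F = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0, 0.0], [0.0, 1.0, 0.5]])
G, U_m, err = pca(F, 2)
rel_err = err / np.sum(F ** 2)   # E(m) from the text
```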
.  .
m = 2 g1 (xi ), g2 (xi ) , i = 1, . . . , ,
. -
. , , .
, . g1
g2 . , ,
, , .

.
(. ??) (. 7.2.2).

5.5 Nonlinear regression

When the regression model $g(x,\alpha)$ is not linear in the parameters, the least-squares problem generally has no closed-form solution and is solved by iterative numerical optimization; in the methods below each iteration reduces to an auxiliary linear least-squares problem.
5.5.1 Nonlinear least squares

Consider an arbitrary parametric model $f(x,\alpha)$, $\alpha\in\mathbb R^p$:
$$Q(\alpha,X^\ell) = \sum_{i=1}^{\ell}\bigl(f(x_i,\alpha)-y_i\bigr)^2 \to \min_\alpha.$$
Minimize $Q$ by the Newton-Raphson method. Starting from an initial approximation $\alpha^0 = (\alpha_1^0,\dots,\alpha_p^0)$, iterate
$$\alpha^{t+1} := \alpha^t - h_t\bigl(Q''(\alpha^t)\bigr)^{-1}Q'(\alpha^t),$$
where $Q'(\alpha^t)$ is the gradient, $Q''(\alpha^t)$ the Hessian of $Q$ at $\alpha^t$, and $h_t$ the step size. The components are:
$$\frac{\partial Q}{\partial\alpha_j}(\alpha) = 2\sum_{i=1}^{\ell}\bigl(f(x_i,\alpha)-y_i\bigr)\frac{\partial f}{\partial\alpha_j}(x_i,\alpha);$$
$$\frac{\partial^2Q}{\partial\alpha_j\partial\alpha_k}(\alpha) = 2\sum_{i=1}^{\ell}\frac{\partial f}{\partial\alpha_j}(x_i,\alpha)\frac{\partial f}{\partial\alpha_k}(x_i,\alpha) - 2\underbrace{\sum_{i=1}^{\ell}\bigl(f(x_i,\alpha)-y_i\bigr)\frac{\partial^2f}{\partial\alpha_j\partial\alpha_k}(x_i,\alpha)}_{\approx\,0}.$$

The second term of the Hessian can be dropped: it is small when the residuals are small, or when $f$ is nearly linear in $\alpha$ near $\alpha^t$, since linearizing $f$ at $\alpha^t$,
$$f(x_i,\alpha) = f(x_i,\alpha^t) + \sum_{j=1}^{p}\frac{\partial f}{\partial\alpha_j}(x_i,\alpha^t)\bigl(\alpha_j-\alpha_j^t\bigr),$$
kills the second derivatives of $f$. Dropping it yields the Newton-Gauss method. Introduce the $\ell\times p$ Jacobian matrix $F_t = \bigl(\frac{\partial f}{\partial\alpha_j}(x_i,\alpha^t)\bigr)$ and the vector $f_t = \bigl(f(x_i,\alpha^t)\bigr)_{\ell\times1}$ at the $t$-th iteration. The iteration becomes
$$\alpha^{t+1} := \alpha^t - h_t\underbrace{\bigl(F_t^{\mathsf T}F_t\bigr)^{-1}F_t^{\mathsf T}\bigl(f_t-y\bigr)}_{\beta},$$
where $\beta$ solves the linear least-squares problem $\|F_t\beta - (f_t-y)\|^2\to\min$. Thus each iteration of nonlinear least squares reduces to an ordinary linear one, and the same numerical machinery (SVD, ridge regularization) applies to it.
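A minimal Newton-Gauss sketch for the illustrative model $f(x,\alpha)=\alpha_0e^{\alpha_1x}$ on noiseless data (model, data and starting point are assumptions for the demo):

```python
import numpy as np

def gauss_newton(x, y, theta, iters=20):
    # Newton-Gauss: linearize f at theta and take the least-squares step
    # theta := theta - (F_t^T F_t)^{-1} F_t^T (f_t - y)
    for _ in range(iters):
        e = np.exp(theta[1] * x)
        f = theta[0] * e                               # f(x, theta)
        J = np.stack([e, theta[0] * x * e], axis=1)    # Jacobian F_t
        theta = theta - np.linalg.solve(J.T @ J, J.T @ (f - y))
    return theta

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)                 # data from theta = (2, -1.5)
theta = gauss_newton(x, y, np.array([1.8, -1.3]))
```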
5.5.2 Generalized additive models and backfitting

, , f (x, )


5.2. (backtting).
:
F, y ;
:
j (x) , .
1: :

:= fj (x);
j (x) := j fj (x), j = 1, . . . , n;
2:
3:
j = 1, . . . , n
n
P
k (fk (xi )), i = 1, . . . , ;
4:
zi := yi
5:

6:

k=1,k6=j

j := arg min

Qj :=

i=1

i=1

2
(fj (x)) zi ;

2
j (fj (x)) zi ;

7: Qj

.
f (x, ) =

n
X

j (fj (x)),

j=1

j : R R ,
. , j , (5.1).
1986 [43]. 5.2.
j .
, j (x) = j fj (x), j .
j , ,
.

X
2
P
j (fj (xi )) yi nk=1,k6=j k (fk (xi ))
Q(j , X ) =
min
j
|
{z
}
i=1

zi =const(j )


Zj = fj (xi ), zi i=1 . : , , -.


5.5.3 Generalized linear models

. . .

, f (x, ) , g(f ) f y. , :

2
X

Pn
g

f
(x
)
Q(, X ) =
minn ,

y
i
i
j=1 j j
R
|
{z
}
i=1

zi

g(f ) .
, . g(z) zi :
g(z) = g(zi ) + g (zi )(z zi ).


Q Q,
:

Q(,
X ) =
=


2
X

Pn
g(zi ) + g (zi )

f
(x
)

y
=
j
j
i
i
i
j=1
i=1

X
i=1

2 Pn
 2
yi g(zi )

f
(x
)

z
+
minn .
g (zi )
j
j
i
i

j=1
g (z )
R
| {z }
{z i }
|
wi

yi

wi y.
. , Q() .
5.5.4 Choosing the loss function

L (a, y) a Y
y Y . , a(x) X = (xi , yi )i=1 :

Q(a, X ) =

X
i=1


L a(xi ), y(xi ) min .
a : XY

, L (a, y) = (a y)2 , Q , .
.
,
, , .
. 1.4, . , .
, .


- . |a y|
.
5.1.
. a(x),
, (, ). (a y)2
a y .
. a < y : L (a, y) =
= c1 |a y|, c1 . a > y , ,
.
, : L (a, y) = c2 |a y|. c1 c2
. ,
, - .
5.2.
a(x), xi xi+1 . (a y)2
. ,
, a(x). , 1 , , 1 ,
. ( ),
a(x), . , x1 , . . . , x ,
Q(a) =

1
X
i=1



sign a(xi ) y(xi ) y(xi+1 ) y(xi ) .

a Q(a) y(x1 ), . . . , y(x+1 ). wi = |y(xi+1 ) y(xi )|


wi ,
. , - .
. Q(a, X ) ,  , , 1
(a y)2 , , L (a, y) = 1 exp 2
, . Q(a, X ) ; , ,
.


5.5.5 Logistic regression and IRLS

Return to the two-class task with $y_i\in\{-1,+1\}$ and the logistic loss of Section 4.4:
$$Q(w) = \sum_{i=1}^{\ell}\ln\bigl(1+\exp(-\langle w,x_i\rangle y_i)\bigr) = -\sum_{i=1}^{\ell}\ln\sigma\bigl(\langle w,x_i\rangle y_i\bigr) \to \min_w,$$
where $\sigma(z) = (1+e^{-z})^{-1}$. The functional $Q(w)$ is smooth and convex; minimize it by the Newton-Raphson method:
$$w^{t+1} := w^t - h_t\bigl(Q''(w^t)\bigr)^{-1}Q'(w^t),$$
with $Q'(w^t)$ the gradient, $Q''(w^t)$ the Hessian of $Q(w)$ at $w^t$, and $h_t$ the step size (for $h_t\equiv1$, the pure Newton method).
Denote $\sigma_i = \sigma\bigl(y_i\langle w,x_i\rangle\bigr)$ and use $\sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr)$. The components of the gradient:
$$\frac{\partial Q(w)}{\partial w_j} = -\sum_{i=1}^{\ell}(1-\sigma_i)\,y_if_j(x_i), \qquad j=1,\dots,n;$$
of the Hessian:
$$\frac{\partial^2Q(w)}{\partial w_j\partial w_k} = \sum_{i=1}^{\ell}(1-\sigma_i)\,\sigma_i\,f_j(x_i)f_k(x_i), \qquad j,k=1,\dots,n.$$
Matrix notation turns the Newton step into a weighted least-squares step. Put
$$F = \bigl(f_j(x_i)\bigr)_{\ell\times n};\quad \Gamma = \mathrm{diag}\Bigl(\sqrt{(1-\sigma_i)\sigma_i}\Bigr);\quad \tilde F = \Gamma F;\quad \tilde y_i = y_i\sqrt{(1-\sigma_i)/\sigma_i},\ \tilde y = (\tilde y_i)_{\ell\times1}.$$
Then $Q'(w) = -\tilde F^{\mathsf T}\tilde y$ and $Q''(w) = \tilde F^{\mathsf T}\tilde F$, so the iteration is
$$w^{t+1} := w^t + h_t\bigl(\tilde F^{\mathsf T}\tilde F\bigr)^{-1}\tilde F^{\mathsf T}\tilde y,$$
the ordinary least-squares solution for the reweighted matrix $\tilde F$ and modified targets $\tilde y$.

Algorithm 5.3. IRLS — iteratively reweighted least squares
Input: matrix $F$, target vector $y$;
Output: weight vector $w$;
1: initialize by ordinary least squares: $w := (F^{\mathsf T}F)^{-1}F^{\mathsf T}y$;
2: for $t := 1,2,3,\dots$
3:   $z := Fw$;
4:   $\gamma_i := \sqrt{\bigl(1-\sigma(z_i)\bigr)\sigma(z_i)}$, $i=1,\dots,\ell$;
5:   $\tilde F := \mathrm{diag}(\gamma_1,\dots,\gamma_\ell)\,F$;
6:   $\tilde y_i := y_i\sqrt{\bigl(1-\sigma(z_i)\bigr)/\sigma(z_i)}$, $i=1,\dots,\ell$;
7:   choose the step $h_t$;
8:   $w := w + h_t\bigl(\tilde F^{\mathsf T}\tilde F\bigr)^{-1}\tilde F^{\mathsf T}\tilde y$;
9: until the values $\sigma(z_i)$ stabilize;
10: return $w$.


Each iteration thus solves a weighted least-squares problem:
$$\tilde Q(w) = \|\tilde Fw-\tilde y\|^2 = \sum_{i=1}^{\ell}(1-\sigma_i)\,\sigma_i\Bigl(\langle w,x_i\rangle - \underbrace{y_i\sqrt{(1-\sigma_i)/\sigma_i}\,\gamma_i^{-1}}_{\hat y_i}\Bigr)^2 \to \min_w.$$
This explains the name iteratively reweighted least squares (IRLS). First, the objects are reweighted at every iteration: the weight $(1-\sigma_i)\sigma_i$ is largest when $\sigma_i\approx\frac12$, i.e. when $\langle w^t,x_i\rangle$ is near the decision boundary, and small for confidently classified objects. Second, the effective targets $\hat y_i$ are recomputed at every iteration from the current $w^t$, so each step fits a linear regression to a moving target.
5.6 Support vector regression

The SVM construction of Section 4.5 carries over to regression. Let $X=\mathbb R^n$, $Y=\mathbb R$ and the model be linear, $a(x) = \langle w,x\rangle - w_0$, with parameters $w\in\mathbb R^n$, $w_0\in\mathbb R$. Regularized least squares (cf. Section 5.3.3) would minimize
Fig. 17. The $\varepsilon$-insensitive loss $|z|_\varepsilon$ compared with the quadratic loss $z^2$.

$$Q(a,X^\ell) = \sum_{i=1}^{\ell}\bigl(\langle w,x_i\rangle - w_0 - y_i\bigr)^2 + \tau\|w\|^2 \to \min_{w,w_0},$$
which is ridge regression again and yields a dense solution. To obtain a sparse, support-vector solution, replace the quadratic loss by the $\varepsilon$-insensitive loss $|z|_\varepsilon = \max\bigl(0,|z|-\varepsilon\bigr)$ (Fig. 17), which does not penalize deviations of $a(x_i)$ from $y_i$ smaller than $\varepsilon$:
$$Q_\varepsilon(a,X^\ell) = \sum_{i=1}^{\ell}\bigl|\langle w,x_i\rangle - w_0 - y_i\bigr|_\varepsilon + \tau\langle w,w\rangle \to \min_{w,w_0},$$
by analogy with (4.18).
As in the SVM, pass to a constrained form with $C = \frac1{2\tau}$. Introduce slack variables $\xi_i^+$ and $\xi_i^-$ for overshooting and undershooting the $\varepsilon$-tube:
$$\xi_i^+ = \bigl(a(x_i)-y_i-\varepsilon\bigr)_+, \qquad \xi_i^- = \bigl(y_i-a(x_i)-\varepsilon\bigr)_+, \qquad i=1,\dots,\ell.$$
Then the minimization becomes a quadratic program with linear constraints in $w$, $w_0$, $\xi_i^+$, $\xi_i^-$:
$$\begin{cases}\dfrac12\langle w,w\rangle + C\displaystyle\sum_{i=1}^{\ell}\bigl(\xi_i^++\xi_i^-\bigr) \to \min_{w,\,w_0,\,\xi^+,\,\xi^-};\\ y_i-\varepsilon-\xi_i^- \leqslant \langle w,x_i\rangle - w_0 \leqslant y_i+\varepsilon+\xi_i^+, \quad i=1,\dots,\ell;\\ \xi_i^-\geqslant 0,\ \xi_i^+\geqslant 0, \quad i=1,\dots,\ell.\end{cases} \tag{5.13}$$

K(xi , xj ). , :

P
P

L (+ , ) = (
+ +
) + (
i
i
i i )yi

i=1
i=1

(
;
12
i i )(j j )K(xi , xj ) max
+ ,
i,j=1

i = 1, . . . , ;
0 6
0 6 +

i 6 C,
i 6 C,

(
i + i ) = 0.
i=1

xi , i = 1, . . . , :

1. |a(xi ) yi | < ; +
i = i = i = i = 0.
a(xi ) [yi , yi + ]
. xi w ,
.

2. a(xi ) = yi + ; 0 < +
i < C; i = 0; i = i = 0.
+
+

3. a(xi ) = yi ; 0 <
i < C; i = 0; i = i = 0.
+

+
4. a(xi ) > yi + ; i = C; i = 0; i = a(xi ) yi > 0; i = 0.
+

+
5. a(xi ) < yi ;
i = C; i = 0; i = yi a(xi ) > 0; i = 0.
25 . 4 5 .

:
a(x) =

X
i=1

+
(
i i )K(xi , x) w0 ;

w0 -, 2 3:
(
yi + , xi 2;
hw, xi i w0 =
yi , xi 3.
, , w0 , .
.
. C , ,
, .
SVM-
[61].


, .
, , . (articial
neural networks, ANN) .
, .
.
ANN ,
, .

6.1 The completeness problem

, (??) .
. [54]. .
6.1.1 Boolean functions implemented by neurons

A single neuron with a threshold activation implements many boolean functions of binary arguments $x^1$, $x^2$, cf. Fig. 18:
$$x^1\vee x^2 = \bigl[x^1+x^2-\tfrac12>0\bigr];$$
$$x^1\wedge x^2 = \bigl[x^1+x^2-\tfrac32>0\bigr];$$
$$\neg x^1 = \bigl[-x^1+\tfrac12>0\bigr].$$
However, the function $x^1\oplus x^2 = [x^1\neq x^2]$ (exclusive or, XOR) cannot be implemented by a single neuron over the inputs $x^1$, $x^2$, because its classes are not linearly separable, see Fig. 20. There are two ways out. The first is to add a nonlinear feature:
$$x^1\oplus x^2 = \bigl[x^1+x^2-2x^1x^2-\tfrac12>0\bigr]$$
is implemented by a single neuron over the three inputs $x^1$, $x^2$, $x^1x^2$.

Fig. 18. Single neurons implementing the elementary boolean functions.

Fig. 19. A two-layer network implementing XOR.

Fig. 20. Geometry of the boolean functions: conjunction, disjunction and negation are linearly separable in $(x^1,x^2)$; XOR is not, but becomes separable after the two-layer construction.

The second way out is to compose a network of several neurons. For example, XOR can be expressed through a disjunction and a conjunction computed by a first layer and combined by an output neuron, cf. Fig. 19:
$$x^1\oplus x^2 = \Bigl[\,(x^1\vee x^2) - (x^1\wedge x^2) - \tfrac12 > 0\,\Bigr].$$
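The threshold-neuron constructions above can be checked directly; a minimal sketch:

```python
import numpy as np

def neuron(w, b, x):
    # McCulloch-Pitts threshold unit: [<w, x> - b > 0]
    return int(np.dot(w, x) - b > 0)

def xor_two_layer(x1, x2):
    # First layer computes OR and AND; the output neuron computes
    # [OR(x1,x2) - AND(x1,x2) - 1/2 > 0], which is XOR.
    u_or = neuron([1, 1], 0.5, [x1, x2])
    u_and = neuron([1, 1], 1.5, [x1, x2])
    return neuron([1, -1], 0.5, [u_or, u_and])

table = [xor_two_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```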

, . . 21.
n H
. M
. .6 . ,
, (hidden layers).
, , .
, . , .
6

. (,
) , x0 , x1 , . . . , xn , . , , .


6.1.2 Universal approximation results

. . .

: ( ) ?
.
1. . , ,
[29].
2. ,

n- . , , , , .
3. 1900 23 , ,
, XX . :
n
. . . [14].
6.1 (, 1957). n
[0, 1]n
:
1

f (x , x , . . . , x ) =

2n+1
X
k=1

hk

X
n
i=1


ik (x ) ,
i

hk , ik , ik f .
, 2n + 1 . ,
, , , . , :
ik , hk f ,
.
4. , n . ,
X
, F ,
[62].
. 6.1. F X,
x, x X f F , f (x) 6= f (x ).
6.2 (, 1948). X , C(X) X , F C(X),
(1 F ) X. F C(X).


. ,
( )
- ( )
[21]. , ,
.
. 6.2. F C(X) : R R, f F (f ) F .
6.3 (, 1998). X , C(X) X , F
C(X), ,
(1 F ) X. F C(X).

: ,
.
( )
, .
,
. . - , ,
, .

6.2 Multilayer neural networks

, , ( ), , . 80- ,
, ,
.
, , , .
.
(error back-propagation) [58].
6.2.1 The back-propagation derivation

Consider a two-layer network, Fig. 21, with $X=\mathbb R^n$, $Y=\mathbb R^M$. The output layer consists of $M$ neurons with activation functions $\sigma_m$ and weights $w_{hm}$, $m=1,\dots,M$; the hidden layer of $H$ neurons with activations $\sigma_h$ and weights $w_{jh}$, $h=1,\dots,H$; the inputs are $v^j(x_i) \equiv f_j(x_i) = x_i^j$, $j=1,\dots,J$, with $J=n$. Let $w$ denote the full vector of weights.

Fig. 21. A two-layer network with $n$ inputs, $H$ hidden neurons and $M$ output neurons.

The forward pass on an object $x_i$:
$$a^m(x_i) = \sigma_m\Bigl(\sum_{h=0}^{H}w_{hm}\,u^h(x_i)\Bigr); \qquad u^h(x_i) = \sigma_h\Bigl(\sum_{j=0}^{J}w_{jh}\,v^j(x_i)\Bigr). \tag{6.1}$$
The (here quadratic) loss on $x_i$:
$$Q(w) = \frac12\sum_{m=1}^{M}\bigl(a^m(x_i)-y_i^m\bigr)^2. \tag{6.2}$$
Its partial derivative with respect to the network outputs,
$$\frac{\partial Q(w)}{\partial a^m} = a^m(x_i)-y_i^m = \varepsilon_i^m,$$
is simply the error of the $m$-th output on $x_i$. For a hidden neuron define
$$\varepsilon_i^h = \frac{\partial Q(w)}{\partial u^h} = \sum_{m=1}^{M}\varepsilon_i^m\,\sigma_m'\,w_{hm},$$
which can be read as the error of hidden neuron $h$, obtained by propagating the output errors $\varepsilon_i^m$ backwards through the same weights $w_{hm}$ — as if the network ran in reverse with the errors as inputs; hence the name back-propagation. Here $\sigma_m'$ is the derivative of the output activation at its current argument; for the sigmoid it is computed from the activations themselves:
$$\sigma_m' = \sigma_m(1-\sigma_m) = a^m(x_i)\bigl(1-a^m(x_i)\bigr).$$
All weight gradients now follow by the chain rule:
$$\frac{\partial Q(w)}{\partial w_{hm}} = \frac{\partial Q(w)}{\partial a^m}\,\frac{\partial a^m}{\partial w_{hm}} = \varepsilon_i^m\,\sigma_m'\,u^h(x_i), \qquad m=1,\dots,M,\ h=0,\dots,H; \tag{6.3}$$
$$\frac{\partial Q(w)}{\partial w_{jh}} = \frac{\partial Q(w)}{\partial u^h}\,\frac{\partial u^h}{\partial w_{jh}} = \varepsilon_i^h\,\sigma_h'\,v^j(x_i), \qquad h=1,\dots,H,\ j=0,\dots,J. \tag{6.4}$$

. , .
,
, . 6.1.
.
. ,
O(Hn+HM ) .
.
back-propagation
.
.
, ,
, ,
. , back-propagation : , , - .
.
.


Algorithm 6.1. Stochastic gradient training with back-propagation
Input: sample $X^\ell = (x_i,y_i)_{i=1}^{\ell}$, $x_i\in\mathbb R^n$, $y_i\in\mathbb R^M$; number of hidden neurons $H$; learning rate $\eta$;
Output: weights $w_{jh}$, $w_{hm}$;
1: initialize with small random values:
   $w_{jh} := \mathrm{random}\bigl(-\frac1{2n},\frac1{2n}\bigr)$;  $w_{hm} := \mathrm{random}\bigl(-\frac1{2H},\frac1{2H}\bigr)$;
2: repeat
3:   pick a random object $(x_i,y_i)\in X^\ell$;
4:   forward pass:
     $u_i^h := \sigma_h\bigl(\sum_{j=0}^{J}w_{jh}x_i^j\bigr)$, $h=1,\dots,H$;
     $a_i^m := \sigma_m\bigl(\sum_{h=0}^{H}w_{hm}u_i^h\bigr)$, $\varepsilon_i^m := a_i^m - y_i^m$, $m=1,\dots,M$;
     $Q_i := \sum_{m=1}^{M}(\varepsilon_i^m)^2$;
5:   backward pass:
     $\varepsilon_i^h := \sum_{m=1}^{M}\varepsilon_i^m\,\sigma_m'\,w_{hm}$, $h=1,\dots,H$;
6:   gradient step:
     $w_{hm} := w_{hm} - \eta\,\varepsilon_i^m\,\sigma_m'\,u_i^h$, $h=0,\dots,H$, $m=1,\dots,M$;
     $w_{jh} := w_{jh} - \eta\,\varepsilon_i^h\,\sigma_h'\,x_i^j$, $j=0,\dots,n$, $h=1,\dots,H$;
7:   update the running loss estimate: $Q := \frac{\ell-1}{\ell}Q + \frac1{\ell}Q_i$;
8: until $Q$ stabilizes;

, Q, . , , .
H. , ,
.
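The formulas (6.3)-(6.4) are conveniently verified against numerical differentiation; a minimal sketch for one output neuron (the network shapes are illustrative):

```python
import numpy as np

def net(W1, W2, x):
    # Forward pass (6.1): u = sigma(W1 x), a = sigma(<W2, u>)
    s = lambda z: 1.0 / (1.0 + np.exp(-z))
    u = s(W1 @ x)
    return s(W2 @ u), u

def backprop_grads(W1, W2, x, y):
    # Analytic gradients of Q = (a - y)^2 / 2 via (6.3)-(6.4)
    a, u = net(W1, W2, x)
    d2 = (a - y) * a * (1 - a)       # output error times sigma'
    g2 = d2 * u                      # dQ/dW2, formula (6.3)
    d1 = d2 * W2 * u * (1 - u)       # back-propagated hidden errors times sigma'
    g1 = np.outer(d1, x)             # dQ/dW1, formula (6.4)
    return g1, g2

rng = np.random.default_rng(3)
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=3)
x = np.array([0.4, -0.7])
y = 1.0
g1, g2 = backprop_grads(W1, W2, x, y)

# Central finite differences for comparison
eps = 1e-6
def q(W1_, W2_):
    return (net(W1_, W2_, x)[0] - y) ** 2 / 2
W1p = W1.copy(); W1p[1, 0] += eps
W1m = W1.copy(); W1m[1, 0] -= eps
num1 = (q(W1p, W2) - q(W1m, W2)) / (2 * eps)
W2p = W2.copy(); W2p[0] += eps
W2m = W2.copy(); W2m[0] -= eps
num2 = (q(W1, W2p) - q(W1, W2m)) / (2 * eps)
```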
6.2.2 Practical refinements of back-propagation


, 4.3.2 . , .
.
.
 1 16.1
 1
2k , 2k , k , .


( , ) ,
. 9.
.
, , H . - , . ,
. , -
,
, 7 . , , . ,
,
.
. ,
, . , Q(w),
. , ,
Q(w), . ,
. [50] .
1. (
), ,
.
2. .
,
:
jh =

2Q
2
wjh

+ ,

, , , , ,
. / Q(w),
.
back-propagation.
7

, . ,
.
(bagging),
. ??. ??.


3. ,
,
. , , (batch learning).
. , .
6.2.3 Choosing the network structure

, ,
, , , . : , , .
.
. ,
, .
.
, . ,
. . ,
,
, .
H ,
.
1. . ( ) , , ,
. ( )
, ,
, ,
. ,
( ).
2. H , , Q(X k ).
, H, . ,
H,

.
. H . ,


. . , - .
, , ,
. .
.
, , , . ,
1.52 , . , ,
, .

- . Q(X k ) , ,
.
A more systematic approach is optimal brain damage (OBD) [51, 42]: remove the connections whose removal least increases $Q$. Suppose the network has been trained to a local minimum $w$ of $Q$. Expand $Q$ in the increment $\delta$:
$$Q(w+\delta) = Q(w) + \tfrac12\delta^{\mathsf T}H(w)\delta + o\bigl(\|\delta\|^2\bigr),$$
where $H(w) = \Bigl(\frac{\partial^2Q(w)}{\partial w_{jh}\partial w_{j'h'}}\Bigr)$ is the Hessian; the linear term vanishes at a minimum. Assuming, for simplicity, that the Hessian is diagonal,
$$\delta^{\mathsf T}H(w)\delta = \sum_{j=0}^{J}\sum_{h=1}^{H}\delta_{jh}^2\,\frac{\partial^2Q(w)}{\partial w_{jh}^2}.$$
Zeroing the weight $w_{jh}$ corresponds to $w_{jh}+\delta_{jh}=0$, so the increase of $Q$ caused by removing this connection — its salience — is
$$S_{jh} = w_{jh}^2\,\frac{\partial^2Q(w)}{\partial w_{jh}^2}.$$
OBD removes the $d$ connections with the smallest salience, where $d$ is a parameter of the method, then continues training; the cycle is repeated while the quality $Q$ does not degrade. Weights with persistently small salience are natural candidates for removal.

wjh , h- j- .
h- .
OBD P . H
Sj =
h=1 Sjh ,
Sj .
whm , m- h- .
(M = 1), h- .
PM
Sh = m=1 Shm .


, . ,
, , . ,
.
,
- , . 7.1 ,
. 7.3 ,
.

7.1 Clustering

( ) .
X = {x1 , . . . , x } X
(x, x ). ,
, , ,
, . xi X () yi .
a : X Y , x X y Y . Y ,
, .
,
. -, . ,
, , .
. -, , , . -, , , ,
.
( ) (
) , yi ,
Y .
, .
:
, X
( , , ).


, ( ).
,
( ).
( ).
. , .
, .
,
, ,
. . (taxonomy). , .
, , .

, XVIII . 30 , 7
: , , , , , , .
,
. , .
. , .
: , , .

:
,
, .
: ,
,
.

,
.


,
- , , . .

.
, .


. , , - . , FOREL -,
. ,
,
[18]. [46].
7.1.1 Graph algorithms and FOREL

. ,
ij = (xi , xj ).
, , , .
. R (i, j), ij > R.
. ,
R [min ij , max ij ],
. .
,
, .
( ) .


Algorithm 7.1. Clustering by the shortest unclosed path
1: sort the pairs $(i,j)$ by increasing distance $\rho_{ij}$;
2-4: build the shortest unclosed path (the minimum spanning tree) by greedily adding the shortest edges that do not create cycles;
5: delete the $K-1$ longest edges of the tree; the remaining connected components are the $K$ clusters.
R ij . :
. R
, [16].
.
. . .
. R, . R
. R,
.
1 ,
.
(),
. ,
, 14 7.1. 5 K 1
, K .
, K
. , , , .
- () .
, ,
.
.
O(3 ) .
FOREL ( ) 1967
.
, [12, 11].
.


x0 X R.
xi X , (xi , x0 ) 6 R, x0
. ,
, , .
, .
. x0
, .
, X
, .
, . , ,
X. ,
. , 6
X
x0 := arg min
(x, x ).
xK0

x K0

. O() ,
O( 2 ), . , ,
O() , O(1).
FOREL
, R, x0 . 7.2 ,
. 9 . ,
, , . ,
, , :
.

. R, . , R .
R.
7.2 x0
. [11]
( 10..20) . ,
. ,
.
.
7.1.2 Quality functionals for clustering

:
yi xi , .


Algorithm 7.2. FOREL
1: initialize the set of unclustered objects: $U := X^\ell$;
2: while $U\neq\varnothing$:
3:   pick a random centre $x_0\in U$;
4:   repeat
5:     form the ball of radius $R$ around $x_0$: $K_0 := \{x_i\in U \mid \rho(x_i,x_0)\leqslant R\}$;
6:     move the centre to the centroid: $x_0 := \frac1{|K_0|}\sum_{x_i\in K_0}x_i$;
7:   until $x_0$ stops changing;
8:   mark the objects of $K_0$ as a new cluster and remove them: $U := U\setminus K_0$;
9: optionally post-process the resulting clusters;
10: assign each $x_i\in X^\ell$ the label of its cluster;
Internal quality criteria compare intra- and inter-cluster distances (here $y_i$ denotes the cluster assigned to $x_i$).
Mean intra-cluster distance:
$$F_0 = \frac{\sum_{i<j}[y_i=y_j]\,\rho(x_i,x_j)}{\sum_{i<j}[y_i=y_j]} \to \min.$$
Mean inter-cluster distance:
$$F_1 = \frac{\sum_{i<j}[y_i\neq y_j]\,\rho(x_i,x_j)}{\sum_{i<j}[y_i\neq y_j]} \to \max.$$
When cluster centres $\mu_y$, $y\in Y$, can be computed, analogous criteria use them and cost linear rather than quadratic time. The sum of mean squared intra-cluster distances to the centres:
$$\Phi_0 = \sum_{y\in Y}\frac1{|K_y|}\sum_{i\colon y_i=y}\rho^2(x_i,\mu_y) \to \min, \qquad K_y = \{x_i\in X^\ell \mid y_i=y\};$$
the normalization by $|K_y|$ keeps large clusters from dominating the criterion. The sum of inter-centre distances:
$$\Phi_1 = \sum_{y\in Y}\rho^2(\mu_y,\mu) \to \max,$$
where $\mu$ is the centre of the whole sample. Since the two kinds of criteria pull in opposite directions, they are often combined as ratios:
$$F_0/F_1 \to \min, \qquad \Phi_0/\Phi_1 \to \min.$$
119

Algorithm 7.3. EM clustering
1: initialize, for every $y\in Y$:
   $w_y := 1/|Y|$;  $\mu_y :=$ a randomly chosen object of the sample;
   $\sigma_{yj}^2 := \frac1{|Y|\,\ell}\sum_{i=1}^{\ell}\bigl(f_j(x_i)-\mu_{yj}\bigr)^2$, $j=1,\dots,n$;
2: repeat
3:   E-step (expectation):
     $g_{iy} := \dfrac{w_y\,p_y(x_i)}{\sum_{z\in Y}w_z\,p_z(x_i)}$, $y\in Y$, $i=1,\dots,\ell$;
4:   M-step (maximization):
     $w_y := \frac1\ell\sum_{i=1}^{\ell}g_{iy}$, $y\in Y$;
     $\mu_{yj} := \frac1{\ell w_y}\sum_{i=1}^{\ell}g_{iy}f_j(x_i)$, $y\in Y$, $j=1,\dots,n$;
     $\sigma_{yj}^2 := \frac1{\ell w_y}\sum_{i=1}^{\ell}g_{iy}\bigl(f_j(x_i)-\mu_{yj}\bigr)^2$, $y\in Y$, $j=1,\dots,n$;
5:   assign clusters: $y_i := \arg\max_{y\in Y}g_{iy}$, $i=1,\dots,\ell$;
6: until the $y_i$ stop changing;

7.1.3 Statistical algorithms: EM and k-means

, .
.
EM-. , . 2.4.
7.1 ( ). X ,
X
X
p(x) =
wy py (x),
wy = 1,
yY

yY

py (x) y, wy y.

py (x), . , ,
, . , . -,
. , , . ,
.


7.2 ( ). 
x X = Rn n : x f1 (x), . . . , fn (x) .
y Y n- py (x) 2
2
y = (y1 , . . . , yn ) y = diag(y1
, . . . , yn
):

n
py (x) = (2) 2 (y1 yn )1 exp 21 2y (x, y ) ,

2y (x, x ) =

n
P

j=1

2
2
yj
|fj (x)fj (x )|2 yj
.

, EM 2.3. , 2.7 .
7.3.
, EM-
. E- giy . giy , xi X
y Y . M- (y , y ),
giy .
7.3 ,
. , 2.3.
k-, 7.4, EM-. , EM- xi giy = P{yi = y}. k-
(k-means) .
, k-means .
. EM,

y = y In .
, k-means,
. 7.3
5 E- :
yi := arg min y (xi , y ),

j = 1, . . . , n;

yY

giy := [yi = y],

j = 1, . . . , n,

y Y.

, EM k-means ,
.
, k-means
FOREL. , FOREL R,
k-means .
k-means. [18, . 110] 7.4. [18, . 98]
, , xi


Algorithm 7.4. k-means clustering
1: initialize the centres $\mu_y$, $y\in Y$;
2: repeat
3:   assign every object to the nearest centre (the analogue of the E-step):
     $y_i := \arg\min_{y\in Y}\rho(x_i,\mu_y)$, $i=1,\dots,\ell$;
4:   recompute every centre as the mean of its cluster (the analogue of the M-step):
     $\mu_{yj} := \dfrac{\sum_{i=1}^{\ell}[y_i=y]\,f_j(x_i)}{\sum_{i=1}^{\ell}[y_i=y]}$, $y\in Y$, $j=1,\dots,n$;
5: until the $y_i$ stop changing;

, . 4
i, 3. 1967 ,
0 .
k-means . 1 . k :
; ,
.
,
.
, .
k ,
.
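A minimal k-means sketch following Algorithm 7.4 (synthetic well-separated blobs; initializing the centres with random sample objects is one common choice):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=4):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # centres = random objects
    for _ in range(iters):
        # step 3 (E-like): assign every object to its nearest centre
        labels = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)
        # step 4 (M-like): recompute every centre as the mean of its cluster
        mu = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, mu

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
labels, mu = kmeans(X, 2)
```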
. EM k-means (semi-supervised
learning), xi y (xi ). U , U X .
, , . ,
, . . (x, x ) -
: , ,
, .
: E- ( 3)
xi U giy := [y = y (xi )], xi X \ U giy .
.


7.1.4 Hierarchical clustering

. . .

,
, , .
.
.
.
. , . 7.5.
.

For one-element clusters the distance is given:
$$R\bigl(\{x\},\{x'\}\bigr) = \rho(x,x').$$
When clusters $U$ and $V$ merge into $W = U\cup V$, the distance from $W$ to any other cluster $S$ is recomputed from the already known $R(U,V)$, $R(U,S)$, $R(V,S)$ by the Lance-Williams formula:
$$R(U\cup V,\,S) = \alpha_U\,R(U,S) + \alpha_V\,R(V,S) + \beta\,R(U,V) + \gamma\,\bigl|R(U,S)-R(V,S)\bigr|,$$
where $\alpha_U$, $\alpha_V$, $\beta$, $\gamma$ are numeric parameters (Lance, Williams, 1967 [49, 24]). Particular choices of the parameters reproduce the standard inter-cluster distances [18]:
- nearest neighbour (single linkage): $R^{\mathrm s}(W,S) = \min_{w\in W,\,s\in S}\rho(w,s)$;  $\alpha_U=\alpha_V=\tfrac12$, $\beta=0$, $\gamma=-\tfrac12$;
- farthest neighbour (complete linkage): $R^{\mathrm f}(W,S) = \max_{w\in W,\,s\in S}\rho(w,s)$;  $\alpha_U=\alpha_V=\tfrac12$, $\beta=0$, $\gamma=\tfrac12$;
- group average: $R^{\mathrm g}(W,S) = \frac1{|W||S|}\sum_{w\in W}\sum_{s\in S}\rho(w,s)$;  $\alpha_U=\frac{|U|}{|W|}$, $\alpha_V=\frac{|V|}{|W|}$, $\beta=\gamma=0$;
- distance between centres: $R^{\mathrm c}(W,S) = \rho^2\Bigl(\sum_{w\in W}\frac{w}{|W|},\ \sum_{s\in S}\frac{s}{|S|}\Bigr)$;  $\alpha_U=\frac{|U|}{|W|}$, $\alpha_V=\frac{|V|}{|W|}$, $\beta=-\alpha_U\alpha_V$, $\gamma=0$;
- Ward's distance: $R^{\mathrm w}(W,S) = \frac{|S||W|}{|S|+|W|}\,\rho^2\Bigl(\sum_{w\in W}\frac{w}{|W|},\ \sum_{s\in S}\frac{s}{|S|}\Bigr)$;  $\alpha_U=\frac{|S|+|U|}{|S|+|W|}$, $\alpha_V=\frac{|S|+|V|}{|S|+|W|}$, $\beta=-\frac{|S|}{|S|+|W|}$, $\gamma=0$.
Experience gives no universal answer to which inter-cluster distance is best; the choice depends on the desired shape of the clusters.

,
. : ?
, .


Algorithm 7.5. Agglomerative hierarchical clustering (Lance-Williams)
1: start with singletons: $t := 1$;  $C_1 := \bigl\{\{x_1\},\dots,\{x_\ell\}\bigr\}$;
2: for $t := 2,\dots,\ell$ ($\ell-1$ merges):
3:   find in $C_{t-1}$ the two nearest clusters:
     $(U,V) := \arg\min_{U\neq V}R(U,V)$;  $R_t := R(U,V)$;
4:   merge them into $W = U\cup V$:
     $C_t := C_{t-1}\cup\{W\}\setminus\{U,V\}$;
5:   for every $S\in C_t$ compute $R(W,S)$ by the Lance-Williams formula;

The result is conveniently drawn as a dendrogram with the $t$-th merge placed at height $R_t$; an intersection-free drawing exists when the distance $R$ is monotone:
$$R_2\leqslant R_3\leqslant\dots\leqslant R_\ell.$$
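A minimal agglomerative sketch using the single-linkage instance of the Lance-Williams update ($\alpha_U=\alpha_V=\tfrac12$, $\beta=0$, $\gamma=-\tfrac12$, which reproduces the minimum); the 1-d toy points are illustrative:

```python
import numpy as np

def agglomerative_single_link(D, k):
    # Merge nearest clusters; update distances by the Lance-Williams rule
    # for single linkage: R(U+V, S) = R/2 + R/2 - |R - R|/2 = min(R(U,S), R(V,S)).
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)
    clusters = [[i] for i in range(len(D))]
    active = list(range(len(D)))
    while len(active) > k:
        _, u, v = min((D[u, v], u, v) for ui, u in enumerate(active)
                      for v in active[ui + 1:])
        for s in active:
            if s not in (u, v):
                D[u, s] = D[s, u] = (0.5 * D[u, s] + 0.5 * D[v, s]
                                     - 0.5 * abs(D[u, s] - D[v, s]))
        clusters[u] += clusters[v]
        active.remove(v)
    return [sorted(clusters[u]) for u in active]

pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
D = np.abs(pts[:, None] - pts[None, :])
out = agglomerative_single_link(D, 2)
```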

, . , Rt . ,
,
, . Ct
.
, , - .
, . ,
, , .
, -
.
Theorem 7.1 (Milligan, 1979). If the Lance-Williams parameters satisfy
1) $\alpha_U\geqslant 0$, $\alpha_V\geqslant 0$;
2) $\alpha_U+\alpha_V+\beta\geqslant 1$;
3) $\min\{\alpha_U,\alpha_V\}+\gamma\geqslant 0$,
then the resulting distance $R$ is monotone. In particular, single linkage, complete linkage, group average and Ward's distance are monotone, while the distance between centres is not.

, Rt .
, . , ,
. R R .
, , . , ,
. .
, Rt . R
.
Rt /(U , V ),
Rt = R(U, V ) ,
t- , U V .
, R ;
, . ,
, , , R R . ,
.
,
, . :
U = V = (1 )/2,

= 0,

< 1.

> 0 < 0.
: = 0,25 [24].
. 7.5 3. O(2 )
. , O(3 ) .
.
, .

,
(U, V ) : R(U, V ) 6 . , , .
, .
7.6.
, , 7.5, R :
. 7.1 (, 1978). R ,
> 0 - U V - U V
- W = U V :



S R(U V, S) < , R(U, V ) 6 S R(S, U ) < R(S, V ) < .

125

7.6.
1: C1 :


t := 1; Ct = {x1 }, . . . , {x } ;
2: 
;
3: P () := (U, V ) U, V Ct , R(U, V ) 6 ;
4: t = 2, . . . ,
(t ):
P () =
5:
6:
, P () 6= ;
7:
P () :
(U, V ) := arg min R(U, V );
(U,V )P ()

8:
9:
10:
11:
12:

Rt := R(U, V );
U V , W = U V :
Ct := Ct1 {W } \ {U, V };
S Ct
R(W, S) -;
R(W, S) 6 

P () := P () (W, S) ;

7.2 ( , 1984). ,
R :
1) U > 0, V > 0;
2) U + V + min{, 0} > 1;
3) min{U , V } + > 0.
7.2 7.1, , , , . R
.
7.6 2 6. : , P ()
, 7.5; , P (). .
Ct n1 , P ()
(U, V ) Ct . n2
R(U, V ), . n1 n2 ,
. , , .
n1 = n2 = 20.

.
|Rt+1 Rt |,


Ct . K = t + 1.
K0 6 K 6 K1
t, K1 + 1 6 t 6 K0 + 1.

, .
. , , . ,
, .
,
. , . , .
.
, .
,
, .
[18].
.

7.2 Kohonen networks

,
, xi yi .
, xi , ,
.
, X : X X R,
.
7.2.1 Vector quantization

Let $X=\mathbb R^n$, $Y=\{1,\dots,M\}$ with a moderate number of classes $M$, let $X^\ell = \{x_i\}_{i=1}^{\ell}$ be the sample, and build a classifier $a\colon X\to Y$. Represent each class $m$ by a prototype $w_m\in\mathbb R^n$, $m=1,\dots,M$, and classify every $x\in X$ by the nearest prototype:
$$a(x) = \arg\min_{m\in Y}\rho(x,w_m). \tag{7.1}$$
The winning class is the one whose prototype is closest to $x$; rule (7.1) is therefore called WTA (winner takes all).

Fig. 22. The nearest-prototype rule (7.1) as a network: blocks computing the distances $\rho(x,w_1),\dots,\rho(x,w_M)$ followed by $\arg\min$.

How are the prototypes $w_m$ trained? Minimize the mean squared distance of the objects to the prototypes of their winning classes:
$$Q(w_1,\dots,w_M) = \frac12\sum_{i=1}^{\ell}\rho^2\bigl(x_i,\,w_{a(x_i)}\bigr) \to \min_{\{w_m\}}.$$
For the Euclidean metric $\rho(x,w) = \|x-w\|$,
$$\frac{\partial Q}{\partial w_m} = \sum_{i=1}^{\ell}\bigl(w_m-x_i\bigr)\bigl[a(x_i)=m\bigr],$$
and, as in Algorithm 4.1, stochastic gradient descent gives the online update
$$w_m := w_m + \eta\,(x_i-w_m)\bigl[a(x_i)=m\bigr], \tag{7.2}$$
where $\eta$ is the learning rate: after presenting an object $x_i$ of winning class $m$, the prototype $w_m$ is shifted towards $x_i$ and the others do not move.
Rules (7.1)-(7.2) can be drawn as a network (Fig. 22) whose first-layer blocks compute distances $\rho(x,w_m)$ instead of inner products $\langle x,w_m\rangle$; such architectures are called Kohonen networks [40]. The procedure as a whole is learning vector quantization (LVQ): the sample of $\ell$ objects is 'quantized' by $M$ prototype ('codebook') vectors, each object being coded by its nearest prototype; the number $M$ and the initial positions of the prototypes are parameters of the method.


CWTA. A known drawback of WTA is that badly initialized prototypes may never win and therefore never learn ('dead' prototypes). A common remedy is the conscience mechanism, CWTA (conscience WTA):
$$a(x) = \arg\min_{m\in Y}C_m\,\rho(x,w_m), \tag{7.3}$$
where $C_m$ grows with the number of wins of prototype $m$, penalizing too-frequent winners and letting the others learn.
WTM. Another modification updates not only the winner but, to a lesser degree, all prototypes — the more strongly, the closer they are to $x_i$. Take a non-increasing kernel $K(\rho)$ on $[0,+\infty)$ with $K(0)=1$, e.g. $K(\rho)=\exp(-\beta\rho^2)$, $\beta>0$. The WTA update becomes WTM (winner takes most):
$$w_m := w_m + \eta\,(x_i-w_m)\,K\bigl(\rho(x_i,w_m)\bigr), \qquad m=1,\dots,M. \tag{7.4}$$
For the degenerate kernel $K\bigl(\rho(x_i,w_m)\bigr) = [a(x_i)=m]$, rule (7.4) reduces to (7.2). WTM drags all prototypes towards dense regions of the data, which helps against dead prototypes but may over-concentrate them; in practice $\beta$ is often increased in the course of learning, gradually narrowing the neighbourhood.
7.2.2 Kohonen self-organizing maps

Self-organizing maps (SOM) extend WTM by arranging the prototypes at the nodes of a flat rectangular grid of size $M\times H$, so that neighbourhood is measured not in the input space but on the grid. After training, the grid becomes a two-dimensional 'map' of the multidimensional sample, on which its cluster structure can be inspected visually. The prototypes are $w_{mh}\in\mathbb R^n$, $m=1,\dots,M$,

Algorithm 7.7. Training a Kohonen self-organizing map
Input: sample $X^\ell$; learning rate $\eta$;
Output: prototypes $w_{mh}\in\mathbb R^n$, $m=1,\dots,M$, $h=1,\dots,H$;
1: initialize with small random values: $w_{mh} := \mathrm{random}\bigl(-\frac1{2MH},\frac1{2MH}\bigr)$;
2: repeat
3:   pick a random object $x_i\in X^\ell$;
4:   WTA: find the winning node nearest to $x_i$:
     $(m_i,h_i) := \arg\min_{(m,h)\in Y}\rho(x_i,w_{mh})$;
5:   for all nodes $(m,h)\in Y$ in a grid neighbourhood of $(m_i,h_i)$:
6:     WTM: move the prototype towards $x_i$:
       $w_{mh} := w_{mh} + \eta\,(x_i-w_{mh})\,\tilde K\bigl(r\bigl((m_i,h_i),(m,h)\bigr)\bigr)$;
7: until convergence;
$h=1,\dots,H$. Here $Y = \{1,\dots,M\}\times\{1,\dots,H\}$ is the set of grid nodes, and $a(x)$ returns the node $(m,h)\in Y$ whose prototype is nearest to $x$. Unlike WTM in the input space, the neighbourhood in step 6 is measured by the Euclidean distance on the grid:
$$r\bigl((m_i,h_i),(m,h)\bigr) = \sqrt{(m-m_i)^2+(h-h_i)^2},$$
with a non-increasing kernel $\tilde K(r)$, e.g. $\tilde K(r)=\exp(-\beta r^2)$. The winner and its grid neighbours are all dragged towards $x_i$, so prototypes of neighbouring nodes tend to be close in the input space as well, and the map $a(x)$ becomes a topology-preserving projection of the space $X$ onto the grid. In practice the rate $\eta$ and the neighbourhood width are gradually decreased during training.

Fig. 23. Kohonen maps for the UCI.house-votes data (two categories, 17 binary features), with three colourings of the grid.
.

- .
k ,
. ,
. . , ,
, , . 23.
.
n , . (m, h) j- wm,h .
. j- , .


, ,
.

, , .
, .
, . , ,
.
.
. , , .

.
. . , [44]. ,
, , .
.
, .
.

, . .
, , .
.
7.2.3 Kohonen networks for supervised tasks

.
,
X = {xi }i=1 yi = y (xi ).
, ( X) , yi , y . WTA - .
WTM .


- . (7.1) ,
, v1 , . . . , vM :
a(x) = vm (x) =

M
X

m=1



vm m (x) = m ;

(7.5)

m (x) = arg min (x, wm );


m=1,...,M

m (x) - x, WTA.
vm
:

Q(v) =

2
1X
a(xi ) yi min;
v
2 i=1

X


Q
=
a(xi ) yi m (xi ) = m = 0.
vm
i=1

a(xi ) (7.5),


P
i=1 yi m (xi ) = m
vm = P 
 .
(x ) = m
m
i
i=1

, vm yi
xi , m- . a(x) , m- , vm . , a(x)
- .
WTM
K(). a(x) - vm M :

M
X
K (x, wm )
vm PM
a(x) =
.
s=1 K (x, ws )
m=1

Q(v)
M M . .
wm , () vm :


wm := wm (wm xi )K (xi , wm ) ; 
K (xi , wm )

;
vm := vm (a(xi ) yi ) PM
s=1 K (xi , ws )

, wm vm , .
, , ,
back-propagation.


7.3 Multidimensional scaling

Suppose only pairwise distances between the objects are known. Multidimensional scaling (MDS) assigns to the objects $X^\ell = \{x_1,\dots,x_\ell\}$ coordinate vectors $\tilde x_i = (\tilde x_i^1,\dots,\tilde x_i^n)\in\mathbb R^n$ whose Euclidean distances
$$d_{ij}^2 = \sum_{d=1}^{n}\bigl(\tilde x_i^d-\tilde x_j^d\bigr)^2$$
approximate the given distances $R_{ij} = \rho(x_i,x_j)$ on the set of pairs $(i,j)\in D$ where they are defined. The coordinates are found by minimizing the weighted stress
$$S(\tilde X) = \sum_{(i,j)\in D}w_{ij}\bigl(d_{ij}-R_{ij}\bigr)^2 \to \min$$
over the $\ell\times n$ matrix of coordinates $(\tilde x_i^d)$. The dimensionality $n$ is small; for $n=2$ the result is displayed as a scatter plot. Since the stress is invariant under rigid motions, the solution is defined only up to translation, rotation and reflection; moreover the functional is multiextremal ($S>0$ in general), so usually only a local minimum is found.
The weights $w_{ij}$ control which distances are reproduced more accurately; a common choice is $w_{ij} = R_{ij}^{-\varkappa}$: $\varkappa>0$ (e.g. $\varkappa=2$) favours accurate reproduction of small distances, i.e. the local structure, while $\varkappa<0$ favours large ones. Direct minimization of $S(\tilde X)$ over all $\ell n$ variables is expensive, which motivates the incremental procedure below.

Place the points one at a time. Suppose a subset $U\subset X^\ell$ has already been placed; position a new object $x$ by choosing its coordinates $(\tilde x^1,\dots,\tilde x^n)$ to minimize
$$S(\tilde x) = \sum_{x_i\in U}w_i\bigl(d_i(\tilde x)-R_i\bigr)^2 \to \min,$$
where $R_i = \rho(x,x_i)$ are the given distances and $d_i(\tilde x)$ the Euclidean distances from $\tilde x$ to the already placed points. Minimize $S(\tilde x)$ by the Newton-Raphson method. A reasonable initial point is the weighted centroid of the placed points with weights emphasizing the nearest ones, e.g. $c_i = R_i^{-2}$:
$$\tilde x^{(0)} = \frac{\sum_{x_i\in U}c_i\tilde x_i}{\sum_{x_i\in U}c_i};$$
then iterate
$$\tilde x^{(t+1)} := \tilde x^{(t)} - h_t\bigl(S''(\tilde x^{(t)})\bigr)^{-1}S'(\tilde x^{(t)}),$$
where $S'$ and $S''$ are the gradient and the Hessian of $S$ at $\tilde x^{(t)}$, and $h_t$ is the step size, e.g. $h_t\equiv1$.
.
.
xa xai
di
=
;
xa
di

X 

S
Ri
a
a
1

=
2
w
x

x
;
i
i
xa
di
xi U
!

2
X
Ri
Ri xa xai
2S

+1 ;
=2
wi
xa xa
d
d
d
i
i
i
xi U
2
X Ri  xa xa   xb xb 
S
i
i
=2
wi
.
a
b
x x
di
di
di
x U
i

Since $n\leqslant3$ in visualization tasks, the Newton system is tiny and inverting the Hessian costs $O(n^3)=O(1)$ per iteration. Iterate until $S(\tilde x)$ stops decreasing or the shift $\|\tilde x^{(t+1)}-\tilde x^{(t)}\|$ becomes small. Denote this subroutine — the placement of a point $x$ relative to an already placed set $U$ — by the operation used in Algorithm 7.8 below.
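A minimal gradient-descent sketch for the stress function (plain gradient steps over all points instead of the per-point Newton iterations described above; the data, step size and iteration count are illustrative assumptions):

```python
import numpy as np

def stress(X2, R):
    # S = sum_{i<j} (d_ij - R_ij)^2 with unit weights
    d = np.sqrt(((X2[:, None] - X2[None]) ** 2).sum(-1))
    iu = np.triu_indices(len(X2), 1)
    return ((d[iu] - R[iu]) ** 2).sum()

def mds_step(X2, R, eta=0.01):
    # One gradient step: dS/dx_i = sum_j 2 (d_ij - R_ij) (x_i - x_j) / d_ij
    d = np.sqrt(((X2[:, None] - X2[None]) ** 2).sum(-1))
    np.fill_diagonal(d, 1.0)          # avoid division by zero on the diagonal
    coef = 2.0 * (d - R) / d
    np.fill_diagonal(coef, 0.0)
    grad = (coef[:, :, None] * (X2[:, None] - X2[None])).sum(axis=1)
    return X2 - eta * grad

rng = np.random.default_rng(5)
P = rng.normal(size=(6, 4))           # original objects in R^4
R = np.sqrt(((P[:, None] - P[None]) ** 2).sum(-1))
X2 = rng.normal(size=(6, 2))          # random initial 2-d configuration
s0 = stress(X2, R)
for _ in range(300):
    X2 = mds_step(X2, R)
s1 = stress(X2, R)
```

Gradient descent lowers the stress of the initial random configuration.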
. 7.8 , xi , xj .
. , ,
, ,
. 34 , .
( )
(0, 0) (0, Rij ). xk ,


7.8.
:
Rij , , ;
K ;
:
xi (x1i , . . . , xni ), i = 1, . . . , ;
1: :

U := ;
2: |U | < K
:

3:
x := arg max min Rij ;
xi X \U

xj U

4:
(x, U );
5:
U := U {x};
6: :
7:
:

x := arg max S(x);


xi U

8:
(x, U \ {x});
9: x X \ U
10:
(x, U );

, min{Rki , Rkj } . xk ( ) , ijk


.

. K
. , ,
, .
- .
K = , ; , , O(2 ) .
O(K 2 ) + O(K). K,
.
n = 2 .
, . . , .
, , , .
. - , ,
.

136

. . .

,
. ;
Rij ; dij ; (i, j) D.
,
. dij (Rij ),
, dij (R)
(, 90%) R.

. , , , .
7.1.
, [15] ( ) . ,
, Rij (, ).
, ( ), .


[1] . ., . ., . ., . .
: . .: , 1989.
[2] . ., . ., . . : . .: , 1985.
[3] . ., . ., . . . .: , 1970. 320 pp.
[4] . . . .: , 1979.
[5] . ., . .
// . 1968. . 181, 4. . 781784.
[6] . ., . .
// . 1971.
. 16, 2. . 264280.
[7] . ., . . . .: ,
1974.
[8] . . : , . .: , 2001.


[9] . . // . 1969. . 14, 1. . 156


161.
[10] . ., . . . .: , 1976.
[11] . . . :
, 1999.
[12] . ., . ., . . . : , 1985.
[13] . . .: , 1975.
[14] . . // .
. 1958. . 114, 5. . 953956.
[15] . . . : , 2006.
[16] . . . .: -, 2003.
[17] . ., . ., . ., . . . .
: , 1996.
[18] . . . .: , 1988.
[19] . . ,
. . 2005.
http://www.recognition.mccme.ru/pub/RecognitionLab.html/methods.html.
[20] . . . . 2006.
http://www.recognition.mccme.ru/pub/RecognitionLab.html/slt.pdf.
[21] / . . , . . -, . . ,
. . , . . , . . , . . . : , 1998. 296 .
[22] . . . .: -, 2004.
[23] . ., . . . .: , 1986.
[24] . ., . . // / . . . . .: , 1986.
. 269301.
[25] . . .: , 1993.


[26] ., .
. : , 2004.
[27] . . // . , , 1965. Pp. 3845.
[28] . . : , , .
.: , 2000.
[29] . . . .: , 1986.
[30] Asuncion A., Newman D. UCI machine learning repository: Tech. rep.: University of
California, Irvine, School of Information and Computer Sciences, 2007.
http://www.ics.uci.edu/mlearn/MLRepository.html.
[31] Bartlett P. The sample complexity of pattern classification with neural networks:
the size of the weights is more important than the size of the network // IEEE
Transactions on Information Theory. 1998. Vol. 44, no. 2. Pp. 525–536.
http://discus.anu.edu.au/bartlett.
[32] Bartlett P., Shawe-Taylor J. Generalization performance of support vector machines
and other pattern classifiers // Advances in Kernel Methods. MIT Press,
Cambridge, USA, 1999. Pp. 43–54.
http://citeseer.ist.psu.edu/bartlett98generalization.html.
[33] Bishop C. M. Pattern Recognition and Machine Learning. Springer, Series:
Information Science and Statistics, 2006. 740 pp.
[34] Boucheron S., Bousquet O., Lugosi G. Theory of classification: A survey of some
recent advances // ESAIM: Probability and Statistics. 2005. no. 9. Pp. 323–375.
http://www.econ.upf.edu/lugosi/esaimsurvey.pdf.
[35] Burges C. J. C. A tutorial on support vector machines for pattern recognition //
Data Mining and Knowledge Discovery. 1998. Vol. 2, no. 2. Pp. 121–167.
http://citeseer.ist.psu.edu/burges98tutorial.html.
[36] Burges C. J. C. Geometry and invariance in kernel based methods // Advances in
Kernel Methods / Ed. by B. Scholkopf, C. C. Burges, A. J. Smola. MIT Press,
1999. Pp. 89–116.
[37] Cleveland W. S. Robust locally weighted regression and smoothing scatter plots //
Journal of the American Statistical Association. 1979. Vol. 74, no. 368.
Pp. 829–836.
[38] Cortes C., Vapnik V. Support-vector networks // Machine Learning. 1995.
Vol. 20, no. 3. Pp. 273–297.
http://citeseer.ist.psu.edu/cortes95supportvector.html.
[39] Dempster A. P., Laird N. M., Rubin D. B. Maximum likelihood from incomplete
data via the EM algorithm // J. of the Royal Statistical Society, Series B. 1977.
no. 34. Pp. 1–38.


[40] Durbin R., Rummelhart D. E. Product units: A computationally powerful
and biologically plausible extension to backpropagation networks // Neural
Computation. 1989. Vol. 1, no. 4. Pp. 133–142.
[41] Fisher R. A. The use of multiple measurements in taxonomic problems // Ann.
Eugen. 1936. no. 7. Pp. 179–188.
[42] Hassibi B., Stork D. G. Second order derivatives for network pruning: Optimal brain
surgeon // Advances in Neural Information Processing Systems / Ed. by S. J. Hanson,
J. D. Cowan, C. L. Giles. Vol. 5. Morgan Kaufmann, San Mateo, CA, 1993.
Pp. 164–171.
http://citeseer.ist.psu.edu/hassibi93second.html.
[43] Hastie T., Tibshirani R. Generalized additive models // Statistical Science. 1986.
Vol. 1. Pp. 297–318.
http://citeseer.ist.psu.edu/hastie95generalized.html.
[44] Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning.
Springer, 2001. 533 pp.
http://www-stat.stanford.edu/tibs/ElemStatLearn.
[45] Hebb D. The organization of behavior. New York: Wiley, 1949.
[46] Jain A., Murty M., Flynn P. Data clustering: A review // ACM Computing
Surveys. 1999. Vol. 31, no. 3. Pp. 264–323.
http://citeseer.ifi.unizh.ch/jain99data.html.
[47] Jordan M. I., Xu L. Convergence results for the EM algorithm to mixtures of experts
architectures: Tech. Rep. A.I. Memo No. 1458: MIT, Cambridge, MA, 1993.
[48] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and
model selection // 14th International Joint Conference on Artificial Intelligence,
Palais de Congres Montreal, Quebec, Canada. 1995. Pp. 1137–1145.
http://citeseer.ist.psu.edu/kohavi95study.html.
[49] Lance G. N., Williams W. T. A general theory of classification sorting strategies. 1.
hierarchical systems // Comp. J. 1967. no. 9. Pp. 373–380.
[50] LeCun Y., Bottou L., Orr G. B., Muller K.-R. Efficient BackProp // Neural
Networks: tricks of the trade. Springer, 1998.
[51] LeCun Y., Denker J., Solla S., Howard R. E., Jackel L. D. Optimal brain damage //
Advances in Neural Information Processing Systems II / Ed. by D. S. Touretzky.
San Mateo, CA: Morgan Kaufmann, 1990.
http://citeseer.ist.psu.edu/lecun90optimal.html.
[52] McCulloch W. S., Pitts W. A logical calculus of ideas immanent in nervous activity //
Bulletin of Mathematical Biophysics. 1943. no. 5. Pp. 115–133.
[53] Mercer J. Functions of positive and negative type and their connection with the
theory of integral equations // Philos. Trans. Roy. Soc. London. 1909. Vol. A,
no. 209. Pp. 415–446.


[54] Minsky M., Papert S. Perceptrons: an Introduction to Computational Geometry.
MIT Press, 1968.
[55] Novikoff A. B. J. On convergence proofs on perceptrons // Proceedings of the
Symposium on the Mathematical Theory of Automata. Vol. 12. Polytechnic
Institute of Brooklyn, 1962. Pp. 615–622.
[56] Parzen E. On the estimation of a probability density function and mode // Annals
of Mathematical Statistics. 1962. Vol. 33. Pp. 1065–1076.
http://citeseer.ist.psu.edu/parzen62estimation.html.
[57] Rosenblatt M. Remarks on some nonparametric estimates of a density function //
Annals of Mathematical Statistics. 1956. Vol. 27, no. 3. Pp. 832–837.
[58] Rummelhart D. E., Hinton G. E., Williams R. J. Learning internal representations
by error propagation // Vol. 1 of Computational models of cognition and perception,
chap. 8. Cambridge, MA: MIT Press, 1986. Pp. 319–362.
[59] Shawe-Taylor J., Cristianini N. Robust bounds on generalization from the margin
distribution: Tech. Rep. NC2-TR-1998-029: Royal Holloway, University of London,
1998.
http://citeseer.ist.psu.edu/shawe-taylor98robust.html.
[60] Smola A., Bartlett P., Scholkopf B., Schuurmans D. Advances in large margin
classifiers. MIT Press, Cambridge, MA. 2000.
http://citeseer.ist.psu.edu/article/smola00advances.html.
[61] Smola A., Schoelkopf B. A tutorial on support vector regression: Tech. Rep.
NeuroCOLT2 NC2-TR-1998-030: Royal Holloway College, London, UK, 1998.
http://citeseer.ist.psu.edu/smola98tutorial.html.
[62] Stone M. N. The generalized Weierstrass approximation theorem // Math. Mag.
1948. Vol. 21. Pp. 167–183, 237–254.
[63] Tibshirani R. J. Regression shrinkage and selection via the lasso // Journal of the
Royal Statistical Society. Series B (Methodological). 1996. Vol. 58, no. 1.
Pp. 267–288.
http://citeseer.ist.psu.edu/tibshirani94regression.html.
[64] Tipping M. The relevance vector machine // Advances in Neural Information
Processing Systems, San Mateo, CA. Morgan Kaufmann, 2000.
http://citeseer.ist.psu.edu/tipping00relevance.html.
[65] Vapnik V., Chapelle O. Bounds on error expectation for support vector machines //
Neural Computation. 2000. Vol. 12, no. 9. Pp. 2013–2036.
http://citeseer.ist.psu.edu/vapnik99bounds.html.
[66] Widrow B., Hoff M. E. Adaptive switching circuits // IRE WESCON Convention
Record. 1960. Pp. 96–104.


[67] Wu C. F. G. On the convergence properties of the EM algorithm // The Annals of
Statistics. 1983. no. 11. Pp. 95–103.
http://citeseer.ist.psu.edu/78906.html.
