. .
http://www.ccas.ru/voron
voron@ccas.ru
, .
, vokov@forecsys.ru,
( , ..)
- www.MachineLearning.ru.
.
1 :
1.1 . . . . . . . . . . . . . . . . . . . .
1.1.1 . . . . . . . . . . . . . . . . . . . . . . .
1.1.2 . . . . . . . . . . . . . . . . . . . . . . .
1.1.3 . . . . . . . . . . . . . .
1.1.4 . . . . . . . . . . . . . . . . . . . . . . .
1.1.5 . . . . . . . . . .
1.1.6
1.2 . . . . . . . . . . . . . . . . . . . . . . .
1.2.1 . . . . . . . . . . . . . . . . . . . . . .
1.2.2 . . . . . . . . . . . . . . . .
1.2.3 . . . . . . . . . . . . . . . . . . . . . . .
1.2.4 . . . . . . . . . . . . . . . . . . . . . . .
1.2.5 . . . . . . . . . . . . . . . . . . . .
1.2.6 . . . . . .
1.2.7 . . . . . . . . . . . . .
2
2.1 . . .
2.1.1 . . . . . . . . . . . .
2.1.2 . .
2.1.3
2.2 . . . . . . . . . . . .
2.2.1 . . . . . .
2.2.2 . . . . . . . . . . . . .
2.3 . . . . . . . . . .
2.3.1 . . . . .
2.3.2 . . . . . . . . . . .
2.3.3 . . . . . . . . .
2.4 . . . . . . . . . . . . . .
2.4.1 EM- . . . . . . . . . . . . . . . . . . . . .
2.4.2
2.4.3
. . . . . . . . . . . . . . . . 39
3
3.1 . . . . . . . .
3.1.1 . . . . .
3.1.2 . . . . . . . . . . . . . .
3.1.3 . . . . . . . . . . . . . .
3.1.4 . . . . . . . . . . .
3.2 . . . . . . . . . . . . . . . . . .
3.2.1 . . . . . . . . . . . . . . .
3.2.2 STOLP
4
4.1 . . . . . . . . .
4.2 . . . . . . . . . . . . . . . . . . . . .
4.3 . . . . . . . . . . . . . . . . . . . . .
4.3.1 . . . . . . . . . . . . . . . . . . .
4.3.2 . .
4.4 . . . . . . . . . . . . . . . . . . . . . . . . . .
4.4.1 . . . . . . . . . . . . . .
4.4.2
4.4.3 . . . . . .
4.5 . . . . . . . . . . . . . . . . . . . . . . . . . .
4.5.1 . . . . . . . . . . . . . . . . . . .
4.5.2 . . . . . . . . . . . . . . . . . .
4.5.3 . . . . . . . . . . . . . . . .
4.6 ROC- . . . . . . . .
5
5.1 . . . . . . . . . . . . . . . . . . . . . .
5.2 : . . . . . . . . .
5.2.1 . . . . . . . . . . . . . . . . . . . .
5.2.2 . . . . . . . . . . . . . . . . . . .
5.2.3 :
5.2.4 . . . . . . . . . . . . . . . . . . .
5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3.1 . . . . . . . . . . . . . . . . . . . . .
5.3.2 . . . . . . . . . . . . . . . .
5.3.3 . . . . . . . . . . . . . . . . . . . . . . . .
5.3.4 . . . . . . . . . . . . . . . . . . . . . . . . .
5.3.5 . . . . . . . . . . . . . . . . .
5.4 . . . . . . . . . . . . . . . . . . . . . . . .
5.5 . . . . . . . . . . . .
5.5.1 . . . . . . . . . . . . . . . . . .
5.5.2 . . . . .
5.5.3 . . . . . . . . . . . . . . . . . .
5.5.4 . . . . . . . . . . . . . . . .
5.5.5 . . 98
5.6 . . . . . . . . . . . . . . . 99
6
6.1 . . . . . . . . . . . . . . . . . . . . . .
6.1.1 . . . . . . . . . . .
6.1.2 .
6.2 . . . . . . . . . . . . . . .
6.2.1 . . . .
6.2.2 . . . . . .
6.2.3 . . . . . . . . . . .
7
7.1 . . . . . . . . . . . . . . .
7.1.1 . . . . .
7.1.2 . . . .
7.1.3 . . . . . . . . . . .
7.1.4 . . . . . . . . .
7.2 . . . . . . . . . . . . . . . . . . . . . .
7.2.1 . . . . . . . .
7.2.2 . . . .
7.2.3
7.3 . . . . . . . . . . . . . .
. . .
,
. .
1.1
(feature) f x . f : X Df , Df
. , a : X Y
.
Df .
Df = {0, 1}, f ;
Df , f ;
Df , f ;
Df = R, f .
, Df1 = = Dfn ,
, .
f1 , . . . , fn . f1 (x), . . . , fn (x)
x X. X , X = Df1 . . . Dfn .
X , n, :
\[ F = \|f_j(x_i)\|_{\ell \times n} = \begin{pmatrix} f_1(x_1) & \dots & f_n(x_1) \\ \dots & \dots & \dots \\ f_1(x_\ell) & \dots & f_n(x_\ell) \end{pmatrix}. \tag{1.1} \]
1.1.2
Y
.
Y = {1, . . . , M }, (classication) M . X
Ky = {x X : y (x) = y}, a(x) x?.
(pattern recognition).
Y = {0, 1}M , M . M
.
Y = R, (regression estimation).
(forecasting) , x X
x, y Y .
1.1.3
. 1.1. A = {g(x, ) | }, g : X Y ,
, (search space).
1.1. n fj : X R, j = 1, . . . , n
= (1 , . . . , n ) = Rn :
\[ g(x, \theta) = \sum_{j=1}^{n} \theta_j f_j(x), \quad Y = \mathbb{R}; \]
\[ g(x, \theta) = \operatorname{sign} \sum_{j=1}^{n} \theta_j f_j(x) \]
, .
,
.
1.2. (xi , yi ) R2 , i = 1, . . . ,
. n fj (x) = xj1 , g(x, )
1.1 n 1 x.
X
(tting) (training, learning)1 a A.
1
, (learning machine),
, (training sample).
. . .
\[ Q(a, X^\ell) = \frac{1}{\ell} \sum_{i=1}^{\ell} \mathscr{L}(a, x_i). \tag{1.2} \]
(1.3)
1.3. (Y = R) n fj : X R, j = 1, . . . , n,
, :
\[ \mu(X^\ell) = \arg\min_{\theta} \sum_{i=1}^{\ell} \bigl( g(x_i, \theta) - y_i \bigr)^2 . \]
[] = 0, [] = 1.
1.1.5
X , . ,
fj (x) y (x)
. , , .
x . y (x), , .
.
y (x)
X Y
p(x, y),
X = (xi , yi )i=1 .
(independent identically distributed, i.i.d.).
, y (x)
p(x, y) = p(x)p(y|x), p(y|x) = (y y (x)), (z) -.
. g(x, ), y (x),
(x, y, ),
p(x, y). , X ,
.
X ,
p(x, y)
: p(X ) = p (x1 , y1 ), . . . , (x , y ) = p(x1 , y1 ) p(x , y ). p(x, y)
(x, y, ), (likelihood):
\[ L(\theta, X^\ell) = \prod_{i=1}^{\ell} \varphi(x_i, y_i, \theta). \]
, . , , L(, X )
.
. , , [13].
, , a (x) (x, y, ) .
.
L ln L,
( ) :
\[ -\ln L(\theta, X^\ell) = -\sum_{i=1}^{\ell} \ln \varphi(x_i, y_i, \theta) \to \min_{\theta}. \tag{1.4} \]
. . .
(1.2),
L (a , x) = ln (x, y, ). (xi , yi ) ,
(xi , yi , ) L (a , x).
(x, y, ) ,
.
1.4. g(x, ).
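Assume additionally that the errors \varepsilon(x, \theta) = g(x, \theta) - y^*(x) are normally distributed with zero mean and variance \sigma^2:
\[ N(\varepsilon; 0, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\Bigl( -\frac{\varepsilon^2}{2\sigma^2} \Bigr). \]
Then the density model takes the form
\[ \varphi(x, y, \theta) = p(x)\,\varphi(y \mid x, \theta) = p(x)\, N\bigl( g(x, \theta) - y^*(x);\, 0, \sigma^2 \bigr). \]
Hence, up to constants C_0 and C_1 that do not depend on \theta, the loss is quadratic:
\[ -\ln \varphi(x, y, \theta) = -\ln \Bigl( p(x)\, N\bigl( g(x, \theta) - y^*(x);\, 0, \sigma^2 \bigr) \Bigr) = C_0 + C_1 \bigl( g(x, \theta) - y^*(x) \bigr)^2 . \]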
,
: , . (
) . .
1.1.6
. Q(a, X ) a,
, a X k = (xi , yi )ki=1 .
,
, , , (overtraining) (overtting). .
,
, . X , ,
x xi X . x = xi
yi . .
, . . :
, .
(generalization ability) Q((X ), X k ) , X X k .
, X X k ,
X.
. 1.5. ,
PX ,X k Q((X ), X k ) > < .
(1.5)
, (1 ) .
:
X X k Q((X ), X k ) 6 1 .
(1.5) . 60-
. . . . [5, 6, 7].
[34],
, .
,
.
X L = (xi , yi )Li=1 . N
Xn Xnk
k = L. n = 1, . . . , N an = (Xn )
Qn = Q(an , Xnk ). Qn
(cross-validation, CV):
\[ \mathrm{CV}(\mu, X^L) = \frac{1}{N} \sum_{n=1}^{N} Q\bigl( \mu(X_n^\ell), X_n^k \bigr). \tag{1.6} \]
, X L [48]. , N 20 100.
tq- (tq-fold cross-validation),
q ( ) , , . X L - t
q . N = tq .
, t .
: , ,
- L .
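The cross-validation estimate (1.6) can be sketched in a few lines. This is a minimal illustration, not the author's code: the names `cross_validation`, `learn_mean` and `mse` are hypothetical, the splits are drawn at random, and the "learning method" is a trivial constant predictor chosen only to keep the example short.

```python
import random


def cross_validation(learn, quality, sample, k, n_splits=20, seed=0):
    """Average test quality over n_splits random splits, as in (1.6).

    learn(train) plays the role of the method mu; quality(model, test)
    plays the role of the functional Q on the held-out part of length k.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_splits):
        shuffled = sample[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:k], shuffled[k:]
        total += quality(learn(train), test)
    return total / n_splits


def learn_mean(train):
    """Toy learning method (an assumption): predict the mean training target."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def mse(model, data):
    """Mean squared error of the model on the given objects."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Ten objects with a constant target: any split gives zero test error.
sample = [(float(x), 2.0) for x in range(10)]
cv = cross_validation(learn_mean, mse, sample, k=3)
```

Here L = 10 and k = 3, so each training part has length 7; N = 20 matches the order of magnitude mentioned in the text.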
1.2
,
, .
1.2.1
1.5. . ,
. , , , , . .
(, , , ).
, , , , , . . , ,
. ,
: ( ); ; ; ;
. ,
, .
1.6. .
60-70- .
, .
, , , , , .
: , .
, , . ,
. , , , , .
.
:
. .
(score) , . , . (credit scoring).
,
,
. , :
, ,
. .
1.7. (churn prediction)
, ,
, .
. , , , . , .
, : , , , ,
, . . ,
. , , -
, xi ti . , ti + t.
, , .
1.2.2
1.8. 1886
.
, ,
. ,
. 928 -.
y = 32 x, x , y
. ,
.
:
, , : .
, . , , .
, , .
1.9. .
.
, , , ( ). ,
. .
.
. , : , -
, , , . , ,
.
. , , , , .
1.10. -,
, . ,
, , 1 5. .
, , .
,
. ,
. ,
99% .
.
.
(collaborative ltering).
.
2006 Netix,
Internet, 1 , 10% ,
Netix Cinematch (. http://www.netflixprize.com). ,
Cinematch 10% .
, 70%
. .
1.2.3
(ranking) .
,
. .
, . , (),
, .
(learning to rank).
1.11. , , . , , , . (, )
( ). , , .
.
: , , , . .
, . .
1.12. 1.10 . ,
, . , Netix,
. ,
.
,
.
1.2.4
(clustering) (classication)
, yi = y (xi ). xi ,
() , , . .
, .
1.13. . ,
. , . :
, , .
.
( , , , . .).
. ,
. , , .
, ( , ) , (semisupervised learning). , , .
1.14. . , , (,
), (,
). , .
, .
, . , , ,
, , . . ,
.
1.2.5
. . .
,
.
, . , ,
.
(1.6).
,
.
. ; ; , .
.
,
. ,
.
, . ,
, , , ;
.
. , . ,
, , , . , ( ). . UCI ( , ),
http://archive.ics.uci.edu/ml. ,
, [30].
.
. ;
, . ,
, ,
. , , -
.
, http://poligon.MachineLearning.ru. .
, , . , .
.
. , , , .
, ,
, , . http://www.kaggle.com, http://tunedit.org. ,
: http://poligon.MachineLearning.ru, http://mlcomp.org.
1.2.7
. . .
. , .
. [10]. , [0, 1] .
. 1. r [0, 1], = [r < p] 1 p 0
1 p.
. 2. r [0, 1], F0 = 0, F1 , . . . , Fk1 , Fk = 1,
, F1 6 r < F , j = 1, . . . , k
pj = Fj Fj1 .
. 3. r [0, 1],
R F (x), 0 6 F (x) 6 1, = F 1 (r)
F (x).
. 4. r1 , r2 , [0, 1], -
\[ \xi_1 = \sqrt{-2 \ln r_1}\, \sin 2\pi r_2; \qquad \xi_2 = \sqrt{-2 \ln r_1}\, \cos 2\pi r_2; \]
: 1 , 2 N (0, 1).
. 5. N (0, 1), = + N (, 2 )
2 .
. 6. n- x = (1 , . . . , n ) i N (0, 1). V n n-,
Rn . x = + V x N (, ) c = V V .
. 7. X k p1 (x), . . . , pk (x).
1, . . . , k w1 , . . . , wk . x X,
Pk
p (x), p(x) = j=1 wj pj (x).
.
. 1. .
. 2. .
. 8. .
X p(x, t), t R . R w(t). x RX, +
p(x, ), p(x) = w(t)p(x, t) dt.
, ,
, .
. 9. Rn = [a1 , b1 ] . . . [an , bn ] G . r = (r1 , . . . , rn ) n
ri , [ai , bi ]. , r ,
r G. r G.
, G .
, .
. 1.
, . , . -
.
, , ,
,
. . . 2 ,
.
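Several of the sampling statements above (the Bernoulli variable of Statement 1, the discrete distribution of Statement 2, and the pair of normal variables of Statement 4) can be sketched as follows. This is an illustrative sketch; the function names are hypothetical, and the shift `1.0 - rng.random()` is an added guard against taking the logarithm of zero.

```python
import math
import random


def bernoulli(p, rng):
    """Statement 1: return 1 with probability p and 0 with probability 1 - p."""
    return 1 if rng.random() < p else 0


def discrete(probs, rng):
    """Statement 2: return j (1-based) with probability probs[j - 1].

    The cumulative sums play the role of F_0 = 0, F_1, ..., F_k = 1.
    """
    r = rng.random()
    cumulative = 0.0
    for j, p in enumerate(probs, start=1):
        cumulative += p
        if r < cumulative:
            return j
    return len(probs)  # guards against rounding in the last cumulative sum


def box_muller(rng):
    """Statement 4: two independent N(0, 1) values from two uniform ones."""
    r1 = 1.0 - rng.random()  # lies in (0, 1], so log(r1) is finite
    r2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(r1))
    return (radius * math.sin(2.0 * math.pi * r2),
            radius * math.cos(2.0 * math.pi * r2))


rng = random.Random(0)
draws = [z for _ in range(20000) for z in box_muller(rng)]
mean = sum(draws) / len(draws)
var = sum((z - mean) ** 2 for z in draws) / len(draws)
```

By Statement 5, a value from N(mu, sigma^2) is then obtained as `mu + sigma * z`.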
. , , , , .
. : ,
.
2.1
X , Y , X Y p(x, y) = P(y)p(x|y).
Py = P(y) . py (x) = p(x|y) 3 .
.
2.1. X = (xi , yi )i=1 p(x, y) = Py py (x). 4 Py py (x) y Y .
2.2. py (x) Py y Y a(x),
.
, .
, p(x, y)
X . , .
2.1.1
x , x y:
\[ P(\Omega \mid y) = \int_{\Omega} p_y(x)\, dx, \quad \Omega \subset X. \]
a : X Y . X
Ay = {x X | a(x) = y}, y Y . ,
y a s, Py P(As |y).
(y, s) Y Y ys y s. yy = 0, ys > 0 y 6= s.
, , .
3
P , p .
() , , .
4
. 2.1.
a:
\[ R(a) = \sum_{y \in Y} \sum_{s \in Y} \lambda_{ys} P_y\, P(A_s \mid y). \]
, ys = [y 6= s], R(a) a.
2.1.2
2.1. Py
py (x), R(a)
\[ a(x) = \arg\min_{s \in Y} \sum_{y \in Y} \lambda_{ys} P_y\, p_y(x). \]
. t Y :
XX
R(a) =
ys Py P(As |y) =
yY sY
X
yY
yt Py P(At |y) +
X X
sY \{t} yY
ys Py P(As |y).
P
P(As |y), :
, P(At |y) = 1
sY \{t}
X
X X
R(a) =
yt Py +
(ys yt )Py P(As |y) =
yY
sY \{t} yY
= const(a) +
X Z
sY \{t}
As yY
sY \{t}
As
(2.1)
P
gs (x) =
ys Py py (x),
yY
X Z
R(a) = const(a) +
gs (x) gt (x) dx.
(2.1)
As . R(a)
R
|Y | 1 I(As ) = As gs (x) gt (x) dx,
As . I(As ) , As
. t
As = x X gs (x) 6 gt (x), t Y, t 6= s .
, As = x X a(x) = s . , a(x) = s , s = arg min gt (x). gt (x)
tY
t, , R(a),
.
.
, , , .
.
2.2. Py py (x) , yy = 0 ys y y, s Y ,
\[ a(x) = \arg\max_{y \in Y} \lambda_y P_y\, p_y(x). \tag{2.2} \]
. (2.1) 2.1. ys , s, t Y
ys yt = t [y = t] s [y = s].
,
X
(ys yt )Py py (x) = t Pt pt (x) s Ps ps (x) = gt (x) gs (x),
yY
py (x)Py
p(x, y)
.
= P
ps (x)Ps
p(x)
sY
x, , P(y|x) . x:
X
R(x) =
y P(y|x).
yY
. (2.2) :
\[ a(x) = \arg\max_{y \in Y} \lambda_y P(y \mid x). \]
(2.2) .
R(a), , .
(y 1),
. (Py |Y1 | ), x y
py (x) x.
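Rule (2.2) can be illustrated with a short sketch. It assumes that the priors P_y and the class densities p_y(x) are known exactly; the one-dimensional Gaussian densities, the penalty values, and all names below are hypothetical choices made for the example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at the point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_classify(x, priors, densities, penalties):
    """Rule (2.2): pick the class y maximizing lambda_y * P_y * p_y(x)."""
    return max(priors, key=lambda y: penalties[y] * priors[y] * densities[y](x))


priors = {-1: 0.5, +1: 0.5}
densities = {
    -1: lambda x: normal_pdf(x, -1.0, 1.0),
    +1: lambda x: normal_pdf(x, +1.0, 1.0),
}
equal_penalties = {-1: 1.0, +1: 1.0}
costly_miss = {-1: 1.0, +1: 10.0}  # errors on class +1 cost ten times more

label_equal = bayes_classify(-0.5, priors, densities, equal_penalties)
label_costly = bayes_classify(-0.5, priors, densities, costly_miss)
```

With equal penalties the point x = -0.5 goes to the nearer class -1; raising the penalty for class +1 moves the decision boundary and the same point is assigned to +1.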
2.1.3
2.1. ,
p(x, y) = Py py (x),
X .
y Xy = (xi , yi )i=1 yi = y .
Py . ,
y
Py = , y = |Xy |, y Y,
(2.3)
Py y . ,
Py . (2.3) , .
. ,
(unbalanced classes) ; , ,
. ,
,
. Py
(2.3), .
,
, X m Xy ,
.
2.3. X m = {x1 , . . . , xm }, p(x).
p(x), p(x) X.
. , x X n fj : X R, j = 1, . . . , n. x =
= (1 , . . . , n ) X = Rn , j = fj (x).
2.1. f1 (x), . . . , fn (x) . ,
\[ p_y(x) = p_{y1}(\xi_1) \cdots p_{yn}(\xi_n), \quad y \in Y, \tag{2.4} \]
pyj (j ) j- y.
, n , n- . , ,
(2.4), (nave Bayes).
pyj () (2.4)
(2.2).
n
X
j=1
. , () , () .
.
, , ??.
2.2
py (x) x X. x (2.2).
2.2.1
. . ,
, (2.5). , .
. X , |X| m.
xi , X m = (xi )m
i=1 :
\[ \hat p(x) = \frac{1}{m} \sum_{i=1}^{m} [x_i = x]. \tag{2.6} \]
, |X| m, , , ,
.
. X = R. 1
P [x h, x + h], P [a, b] [a, b].
, p(x) = lim 2h
h0
, , [x h, x + h], h ,
:
\[ \hat p_h(x) = \frac{1}{2mh} \sum_{i=1}^{m} \bigl[\, |x - x_i| < h \,\bigr]. \tag{2.7} \]
ph (x) -, , (2.2)
y Y . - [57, 56]:
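\[ \hat p_h(x) = \frac{1}{mh} \sum_{i=1}^{m} K\Bigl( \frac{x - x_i}{h} \Bigr), \tag{2.8} \]
where K(z) is a function called the kernel; it is even and normalized: \int K(z)\, dz = 1.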
ph (x) , K(z), , ,
R
: ph (x) dx = 1 h.
,
. 3, . 25.
K(z) = 12 |z| < 1 (2.7).
K(z) = [z = 0] h = 1 (2.6).
(2.7) , ,
ph (x) p(x)
m h.
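The estimates (2.7) and (2.8) can be sketched for a one-dimensional sample as follows. The function names are hypothetical; the rectangular kernel reproduces (2.7), and the Gaussian kernel is one common smooth choice. The integral is approximated by a plain Riemann sum only to check the normalization numerically.

```python
import math


def rectangular(z):
    """Kernel K(z) = 1/2 [|z| < 1]; with it (2.8) turns into (2.7)."""
    return 0.5 if abs(z) < 1.0 else 0.0


def gaussian(z):
    """Gaussian kernel, normalized so that its integral equals 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def parzen_density(x, sample, h, kernel):
    """Parzen-Rosenblatt estimate (2.8) with window width h."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


sample = [0.0, 1.0, 2.0]
p_rect = parzen_density(0.5, sample, h=1.0, kernel=rectangular)

# Numerical check that the smooth estimate integrates to about 1.
step = 0.01
grid = [-10.0 + step * i for i in range(2200)]
integral = sum(parzen_density(x, sample, 1.0, gaussian) for x in grid) * step
```

At x = 0.5 two of the three sample points fall inside the window of half-width 1, so the rectangular estimate equals 2 / (2 * 3 * 1) = 1/3.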
2.3 ([56, 57, 25]). :
1) X m ,
p(x);
R
2) K(z) , : X K 2 (z) dz < ;
3) hm , lim hm = 0 lim mhm = .
m
. n
fj : X R, j = 1, . . . , n.
x X [9, 17]:
m
n
fj (x) fj (xi )
1 XY 1
K
.
(2.9)
ph (x) =
m i=1 j=1 hj
hj
, xi . , .
(2.8), , .
. X (x, x ), , . (2.8) :
m
X
1
(x, xi )
K
,
(2.10)
ph (x) =
mV (h) i=1
h
(2.10) y Y :
1 X
(x, xi )
[yi = y] K
,
(2.11)
py,h (x) =
y V (h) i=1
h
K , h . V (h)
y, (2.2) - arg max
. (2.11)
Py = y / (2.2):
X
i=1
(x, xi )
[yi = y] K
h
(2.12)
X .
, (2.12)
h K.
h .
h 0 , ph (x) . h . ,
h . :
LOO(h, X ) =
X
i=1
a xi ; X \xi , h =
6 yi min,
h
a(x; X \ xi , h) , X xi . LOO(h) ,
h .
h(x). X
, .
h
X, . ,
x X
(k+1)
(k + 1)- h(x) = x, x
, , x.
V (h) y, , py,h(x) (x) Xy y Y .
k , h .
K
. ,
ph (x). . G ph (x)
x. E, Q, T , ( , . 3), ,
x h.
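Fig. 3. Commonly used kernels: E, Epanechnikov; Q, quartic; T, triangular; G, Gaussian; and the rectangular kernel (plot omitted; horizontal axis from -2.0 to 2.0, vertical axis from 0.0 to 1.0).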
. (x, x ) , ,
.
. (curse of dimensionality). (. ??), (. ??). ,
.
, ??.
2.3
X = Rn , n .
. 2.2.
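\[ N(x; \mu, \Sigma) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\Bigl( -\tfrac12 (x - \mu)^{\mathsf T} \Sigma^{-1} (x - \mu) \Bigr), \quad x \in \mathbb{R}^n, \]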
n- ()
() Rn Rnn . ,
, , .
Rn , , , :
Z
N (x; , ) dx = 1;
Z
Ex = xN (x; , ) dx = ;
Z
2.4. n-
py (x) = N (x; y , y ),
y Y.
.
, .
. , s t,
s Ps ps (x) = t Pt pt (x),
ln ps (x) ln pt (x) = Cst ,
Cst = ln(t Pt /s Ps ) , x.
, ln py (x) x:
ln py (x) = n2 ln 2 12 ln |y | 12 (x y ) 1
y (x y ).
s = t ,
:
x 1 (s t ) 12 s 1 s + 12 t 1 t = Cst ;
(x st ) 1 (s t ) = Cst ;
st = 21 (s + t ) .
. ,
s, t (s Ps = t Pt ),
, (s = t = In ).
, .
,
, .
, , R(a).
(s = t 6= In ), , -
, .
(s Ps 6= t Pt ), .
,
,
.
. . ,
,
, .
2, -, -.
. , ,
(x s ) 1 (x s ) = (x t ) 1 (x t );
kx s k = kx t k ;
p
ku vk (u v) 1 (u v) Rn , . , .
, . () (nearest
mean classier), .
. X m = {x1 , . . . , xm },
(x; ). :
m
p(X ; ) = p(x1 , . . . , xm ; ) =
m
Y
(xi ; ).
i=1
, ,
p(X m ; ) ,
. ,
[13, 16].
:
\[ L(X^m; \theta) = \sum_{i=1}^{m} \ln \varphi(x_i; \theta) \to \max_{\theta}. \tag{2.13} \]
. , ( (x; ) ):
m
L(X m ; ) =
ln (xi ; ) = 0.
i=1
(2.14)
. (, )
, (2.14).
2.5. X m = (x1 , . . . , xm ). (x; ) N (x; , ),
(2.13),
\[ \hat\mu = \frac{1}{m} \sum_{i=1}^{m} x_i; \qquad \hat\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x_i - \hat\mu)(x_i - \hat\mu)^{\mathsf T}. \]
. .
. 2.3. X m , m ) (x; ) = 0 . (X
m
m ) = 0 .
, X , , EX m (X
: E
=
, E
= .
=
,
.
:
m
.
m1
\[ \hat\Sigma = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \hat\mu)(x_i - \hat\mu)^{\mathsf T}. \tag{2.15} \]
y
.
y
Xy = {xi X | yi = y} y Y .
(2.2). , (plug-in).
y :
y
y
, . , , .
, .
.
, , ,
.
, y < n,
y .
,
.
, y , .
. 1 ,
,
y
1
y (x st ).
, , . .
, , .
. , fj (x) yj yj , , :
1
( yj )2
, y Y, j = 1, . . . , n.
pyj () =
exp
2
2yj
2yj
, , y
y .
.
yj
yj y Y j = 1, . . . , n.
2.3.3
1936 . . , , ,
[41]. ,
.
. ,
X
1
(xi
yi )(xi
yi )
|Y | i=1
2.4, , ,
-, . :
a(x) = arg max y Py py (x) =
yY
1
1
y
y +x
= arg max ln(y Py ) 21
y =
yY
| {z }
|
{z
}
y
= arg max x y + y .
yY
(2.16)
().
,
. , , , .
, , [28]:
R(a) = 21 k1 2 k ,
(r) = N (x; 0, 1) .
. : ( ).
, . ,
, v .
, :
+ In )v = v + v = ( + )v.
(
.
+ diag
[28]. ,
(1 )
,
[2]; ,
, , . , - .
.
.
[Figure 4: plot omitted. Horizontal axis: t = 1, ..., 26, with the points m - p and m - q marked; the largest jump max(\pi_t - \pi_{t+1}) is marked on the curve.]
. 4. , m = 25 q = 3 p = 10 .
, 6, .
. , .
(features selection) ??.
: , .
,
n- . . . [28]. ,
(. . R(a) ).
(2) (2)
(2)
y , y : y Y , y Rn
. (2)
: (x) = x y . n 2
, (x) .
(3) (3)
y , y : y Y ,
(3)
y Rn .
,
( ).
22. [28].
. ,
. , ,
, , .
, , .
(principal
component analysis), 5.4.
. , (x; ), (robust ).
.
X m . xi X m
: 1 > . . . > m .
i = (xi ; ),
, , ()
. : p q, t {m p, . . . , m q 1},
t t+1 . (m t)
, . . 4. . , , ,
.
, .
,
,
,
, .
2.4
, -
, .
2.2. X k :
p(x) =
k
X
wj pj (x),
j=1
k
X
wj = 1,
wj > 0,
j=1
EM-
, , , .
EM (expectation-maximization).
.
(hidden) G, .
, , .
, , .
gij = 1 i = 1, . . . , .
j=1
wj , j , gij :
\[ g_{ij} = \frac{w_j\, p_j(x_i)}{\sum_{s=1}^{k} w_s\, p_s(x_i)} \quad \text{for all } i, j. \tag{2.17} \]
E- EM.
M- (maximization). , gij
, ( ) .
\[ Q(\Theta) = \ln \prod_{i=1}^{m} p(x_i) = \sum_{i=1}^{m} \ln \sum_{j=1}^{k} w_j\, p_j(x_i) \to \max_{\Theta}. \]
Pk
j=1
wj = 1. :
X
X
m
k
k
X
L(; X ) =
ln
wj pj (xi )
wj 1 .
m
i=1
j=1
j=1
wj :
m
X
pj (xi )
L
= 0,
=
Pk
wj
w
p
(x
)
s
s
i
s=1
i=1
(2.18)
j = 1, . . . , k.
wj , k ,
j i:
k
m X
X
i=1 j=1
wj pj (xi )
Pk
s=1 ws ps (xi )
{z
}
=1
k
X
wj ,
j=1
| {z }
=1
= m.
(2.18) wj , = m,
, (2.17),
:
m
wj pj (xi )
1 X
1 X
gij ,
wj =
=
Pk
m i=1 s=1 ws ps (xi )
m i=1
(2.19)
j = 1, . . . , k.
, - wj > 0
, .
j , , pj (x) (x; j ):
m
X
wj
wj pj (xi )
L X
=
pj (xi ) =
ln pj (xi ) =
Pk
Pk
j
j
j
w
p
(x
)
w
p
(x
)
s
s
i
s
s
i
s=1
s=1
i=1
i=1
=
m
X
i=1
X
gij
ln pj (xi ) =
gij ln pj (xi ) = 0,
j
j i=1
j = 1, . . . , k.
\[ \theta_j := \arg\max_{\theta} \sum_{i=1}^{m} g_{ij} \ln \varphi(x_i; \theta), \quad j = 1, \dots, k. \tag{2.20} \]
, M- wj (2.19) j
k (2.20). ,
.
EM [39, 67, 47].
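Algorithm 2.2. General scheme of the EM algorithm
Input:
    sample X^m = {x_1, ..., x_m};
    k, the number of mixture components;
    \Theta = (w_j, \theta_j)_{j=1}^{k}, the initial approximation of the mixture parameters;
    \delta, the stopping parameter;
Output:
    \Theta = (w_j, \theta_j)_{j=1}^{k}, the optimized parameter vector of the mixture;
1: PROCEDURE EM (X^m, k, \Theta, \delta);
2: repeat
3:     E-step (expectation): for all i = 1, ..., m, j = 1, ..., k
           g^0_{ij} := g_{ij};  g_{ij} := w_j \varphi(x_i; \theta_j) / \sum_{s=1}^{k} w_s \varphi(x_i; \theta_s);
4:     M-step (maximization): for all j = 1, ..., k
           \theta_j := \arg\max_{\theta} \sum_{i=1}^{m} g_{ij} \ln \varphi(x_i; \theta);  w_j := (1/m) \sum_{i=1}^{m} g_{ij};
5: while \max_{i,j} |g_{ij} - g^0_{ij}| > \delta;
6: return (w_j, \theta_j)_{j=1}^{k};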
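The loop of Algorithm 2.2 can be sketched for a mixture of one-dimensional Gaussians, where the M-step has the closed form given later for Gaussian components. This is an illustration under stated assumptions, not the author's implementation: the names, the iteration cap `max_iter`, the lower bound on sigma, and the synthetic data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at the point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(xs, weights, mus, sigmas, delta=1e-6, max_iter=500):
    """EM for a 1-D Gaussian mixture; the parameter lists are updated in place."""
    m, k = len(xs), len(weights)
    g = [[0.0] * k for _ in range(m)]
    for _ in range(max_iter):
        # E-step: posterior probabilities g_ij of component j for object x_i.
        g_old, g = g, []
        for x in xs:
            joint = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(joint)
            g.append([v / total for v in joint])
        # M-step: weighted maximum likelihood estimates for each component.
        for j in range(k):
            nj = sum(g[i][j] for i in range(m))
            weights[j] = nj / m
            mus[j] = sum(g[i][j] * xs[i] for i in range(m)) / nj
            var = sum(g[i][j] * (xs[i] - mus[j]) ** 2 for i in range(m)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)  # guard against collapse
        # Stop when the hidden variables have stabilized.
        change = max(abs(g[i][j] - g_old[i][j]) for i in range(m) for j in range(k))
        if change <= delta:
            break
    return weights, mus, sigmas


rng = random.Random(0)
xs = [rng.gauss(-3.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
weights, mus, sigmas = em_gaussian_mixture(xs, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

On this well-separated sample the estimated centers end up close to -3 and 3; on harder data the result depends on the initial approximation, as the text notes.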
. ,
Q() G . , [0, 1].
2.2. E- G (2.17). M-
k (2.20),
X m gj .
EM-. (2.20) M- .
, ,
E-. - (generalized EM-algorithm, GEM) [39].
. EM .
.
,
,
. ( )
, . k ,
2.3. EM-
:
X m = {x1 , . . . , xm };
R ;
m0 , ;
;
:
k ;
= (wj , j )kj=1 ;
1: :
1 := arg max
m
P
ln (xi ; );
w1 := 1;
k := 1;
i=1
2: k := 2, 3, . . .
3:
:
4:
5:
6:
|U | < m0
k;
k- :
P
k := arg max
ln (xi ; ); wk := m1 |U |;
7:
xi U
wj := wj (1 wk ), j = 1, . . . , k 1;
EM (X m , k, , );
.
.
EM-
.
. ,
xi ,
p(xi ). .
EM-,
. ,
.
2.3. 1
k = 1. . p(xi ) R
, xi . ,
; p(xi )
;
P0 . 3 U , . m0 ,
, . 6
, ,
U . , - . 7
EM-.
EM-. Q()
. EM : ,
,
.
.
EM- (stochastic EMalgorithm, SEM) [1, . 207]. 2.2
, M- ( 4)
j := arg max
m
X
gij ln (xi ; )
i=1
, ,
X
j := arg max
ln (xi ; ),
xi Xj
Xj X m .
xi X m j(i) {gij : j = 1, . . . , k}, xi X m
Xj(i) .
[1] k
, kmax . , |Xj | 6 m0 , .
.
SEM , . SEM EM, .
, SEM Q(), .
2.4.2
M- , () . (2.20)
, .
.
2.6. n- (x; j ) = N (x; j , j ) j = (j , j ), j Rn , j Rnn , j = 1, . . . , k.
(2.20)
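\[ \hat\mu_j = \frac{1}{m w_j} \sum_{i=1}^{m} g_{ij}\, x_i, \quad j = 1, \dots, k; \]
\[ \hat\Sigma_j = \frac{1}{m w_j} \sum_{i=1}^{m} g_{ij}\, (x_i - \hat\mu_j)(x_i - \hat\mu_j)^{\mathsf T}, \quad j = 1, \dots, k. \]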
, M- . i- j- gij
, E-.
. , , . ,
2.6 .
. . , ,
. (, )
.
, , .
c . , ,
. , ,
, .
2.7. n- 2
2
(j , j ), j = (j1 , . . . , jn ), j = diag(j1
, . . . , jn
) , j = 1, . . . , k:
(x; j ) = N (x; j , j ) =
n
Y
d=1
jd
1 d jd 2
exp
,
2
jd
2
x = (1 , . . . , n ).
(2.20)
jd
m
1 X
=
gij xid ,
mwj i=1
jd
=
d = 1, . . . , n;
m
1 X
gij (xid
jd )2 ,
mwj i=1
d = 1, . . . , n;
xi = (xi1 , . . . , xin ) X m .
. N (x; j , j )
jd , jd xi = (xi1 , . . . , xin ):
2
ln N (xi ; j , j ) = jd
(xid jd );
jd
1
3
ln N (xi ; j , j ) = jd
+ jd
(xid jd )2 .
jd
jd , jd :
2
jd
3
jd
m
X
i=1
m
X
i=1
gij (xid jd ) = 0;
2
gij jd
(xid jd )2 = 0.
, jd , jd i,
(2.19), .
, , j = j2 In . : ,
, .
,
.
. pj (x) = N (x; j , j ) j
pj (x) = Nj exp 21 2j (x, j ) ,
n
n
X
d=1
2
jd
|d d |2 ,
x = (1 , . . . , n ),
x = (1 , . . . , n ).
j (x, j ), x. pj (x) x j .
f (x), x X, .
2.4.3
, .
Y = {1, . . . , M }, y Y
py (x) Xy = {(xi , yi ) X | yi = y}.
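[Figure 5: diagram omitted. The input x feeds the component densities p_{11}(x), ..., p_{1k_1}(x), ..., p_{M1}(x), ..., p_{Mk_M}(x); within each class y the components are summed with weights w_{y1}, ..., w_{yk_y}; the class sums are multiplied by \lambda_y P_y and passed to arg max, which outputs a(x).]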
. 5. .
py (x), y Y ,
ky . n- 2
2
yj = (yj1 , . . . , yjn ), yj = diag(yj1
, . . . , yjn
), j = 1, . . . , ky :
py (x) =
ky
X
j=1
ky
X
wyj = 1,
wyj > 0.
j=1
ky
X
wyj Nyj exp 21 2yj (x, yj ) ,
{z
}
|
j=1
(2.21)
pyj (x)
j = 1, . . . , ky . , yj (x, yj ) yj . 2.3 .
EM-. (. 6), EM-
. , , . , yj
,
. . ,
(j-) (y-) . ,
, .
EM- ,
7.
, . , , , ,
, -
. , , ,
, . ,
. 5 .
X. , ,
, ( , ).
similarity-based learning distance-based learning.
3.1
X : X X [0, ).
y : X Y ,
X = (xi , yi )i=1 , yi = y (xi ). Y
. a : X Y , y (x) X.
3.1.1
u X
x1 , . . . , x u:
(2)
()
(u, x(1)
u ) 6 (u, xu ) 6 6 (u, xu ),
(i)
xu i- u. , i-
(i)
(i)
u yu = y (xu ). , u X
.
. 3.1. X u y Y , y (u, X ) :
\[ \Gamma_y(u, X^\ell) = \sum_{i=1}^{\ell} \bigl[ y_u^{(i)} = y \bigr]\, w(i, u), \tag{3.1} \]
w(i, u) i- u. y (u, X ) u y.
5
.
,
.
w(i, u).
, i. (i)
, u xu ,
, .
X a. , , , - ,
. a(u; X ) X , , u. (lazy learning),
(eager learning),
, .
(case-based reasoning, CBR). , u y?
: ,
y, .
w(i, u),
, .
3.1.2
\[ a(u; X^\ell) = y_u^{(1)}. \]
, , .
NN X .
. :
. , ,
, ,
.
, .
, .
.
k (k nearest neighbors, kNN). , u ,
(i)
k xu , i = 1, . . . , k:
\[ w(i, u) = [i \le k]; \qquad \Gamma_y(u, X^\ell) = \sum_{i=1}^{k} \bigl[ y_u^{(i)} = y \bigr]. \]
k = 1 , ,
. k = , , .
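The k nearest neighbors rule and the leave-one-out criterion for choosing k can be sketched as follows. The sketch assumes a one-dimensional object space with distance |a - b|; the names and the toy sample are hypothetical, and ties between classes are not handled in any special way.

```python
def knn_classify(u, sample, k, distance=lambda a, b: abs(a - b)):
    """Assign u to the class that is most frequent among its k nearest objects."""
    neighbors = sorted(sample, key=lambda xy: distance(u, xy[0]))[:k]
    votes = {}
    for _, y in neighbors:
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)


def loo_error(sample, k):
    """Leave-one-out: count objects misclassified when removed from the sample."""
    errors = 0
    for i, (x, y) in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        if knn_classify(x, rest, k) != y:
            errors += 1
    return errors


sample = [(0.0, "a"), (1.0, "a"), (2.0, "a"),
          (10.0, "b"), (11.0, "b"), (12.0, "b")]
errors_by_k = {k: loo_error(sample, k) for k in (1, 3, 5)}
```

For k = 1 the criterion gives no errors on this sample, while for k = 5 every remaining neighbor votes and the removed object is always outvoted by the other class, which illustrates the degenerate behavior for large k.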
, k . k
(leave-one-out, LOO). xi X ,
k .
\[ \mathrm{LOO}(k, X^\ell) = \sum_{i=1}^{\ell} \Bigl[ a\bigl( x_i;\, X^\ell \setminus \{x_i\},\, k \bigr) \ne y_i \Bigr] \to \min_{k}. \]
, xi
, xi xi , ()
LOO(k) k = 1.
kNN: k u , u ,
k .
k . kNN , .
, k. ,
wi , i- :
w(i, u) = [i 6 k] wi ;
k
X
[yu(i) = y]wi .
i=1
wi . , , ( wi = k+1i
k
: ; 1,
2; ). , , ,
: wi = q i , q (0, 1)
. LOO, k.
kNN.
. .
( , )
. , .
O() . . ,
O(ln ) .
, .
3.1.3
wi (i)
(u, xu ), i. K(z),
(i)
[0, ). w(i, u) = K h1 (u, xu ) (3.1),
X
i=1
[yu(i)
(i)
(u, xu )
= y] K
.
h
(3.2)
h , k. u h, xi u yi .
,
, , ,
.
h . LOO(h), , ,
;
.
h ,
X. , . .
K(z), [0, 1],
. h , k
(k+1)
u : h(u) = (u, xu ).
!
k
(i)
X
(u,
x
)
u
.
(3.3)
a(u; X , k) = arg max
[yu(i) = y] K
(k+1)
yY
(u, xu )
i=1
,
, (, )
. 2.2.2.
3.1.4
Kh (u, x) = K h1 (u, x)
u. (u, x) , , . , xi u yi , hi :
X
i=1
(u, xi )
[yi = y] i K
hi
i > 0, hi > 0.
(3.4)
, (3.3) , hi
xi , u.
3.1.
:
X ;
:
i , i = 1, . . . , (3.4);
1: : i = 0; i = 1, . . . , ;
2:
3:
xi X ;
4:
a(xi ) 6= yi
5:
i := i + 1;
6:
.
(3.1), u , ,
.
3.2
. . , , , .
.
. ,
. , , . , .
, .
,
, . , .
, .
3.2.1
(3.1) , , .
. 3.2. (margin) xi X , a(u) = arg max y (u),
yY
.
, .
, : , , , , , . 6.
,
.
.
( , ),
. , . .
Fig. 6. Margins M_i, i = 1, ..., 200 (plot omitted; vertical axis: margin, from -0.6 to 0.8; horizontal axis: i, from 0 to 200).
.
, .
,
.
.
(3.1), , .
.
. -
, , .
. . , , , , .
. .
, . ,
. , ,
kNN . , ,
.
3.2.2
STOLP
STOLP [11].
w(i, u).
a(u; ) (3.1), X .
3.2. STOLP
:
X ;
;
0 ;
:
X ;
1: xi X , xi :
2:
M (xi , X ) <
3:
X 1 := X \ {xi }; := 1;
4: :
:
5: 6= X ;
6:
, a(u; ) :
7:
8:
9:
M (xi , ) xi a(xi ; ).
, xi , , .
, ,
, .
3.2 ( 13). X xi M (xi , X ), .
= 0, . .
( 4).
. xi ,
. ,
0 . 0 = 0,
a(u; ), ,
.
,
. . ,
, , .
STOLP .
X \ , . O(||2 ).
, .
, . , ,
. 3.2,
.
STOLP : , . , . ,
.
,
.
. , ,
, ,
.
4.1
, Y = {1, +1}.
a(x, w) = sign f (x, w), w . f (x, w) . ,
. f (x, w) > 0, a x +1,
1. f (x, w) = 0 .
, a(x, w) ,
w, X = (xi , yi )i=1 .
. 4.1. Mi (w) = yi f (xi , w) (margin) xi
a(x, w) = sign f (x, w).
Mi (w) < 0, a(x, w) xi . Mi (w), xi .
.
L Mi (w) , L (M ) , : [M < 0] 6 L (M ). :
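\[ Q(w, X^\ell) = \sum_{i=1}^{\ell} \bigl[ M_i(w) < 0 \bigr] \le \widetilde{Q}(w, X^\ell) = \sum_{i=1}^{\ell} \mathscr{L}\bigl( M_i(w) \bigr) \to \min_{w}. \tag{4.1} \]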
. 7.
, , . - (SVM),
, ,
AdaBoost, . ??.
, .
.
, : L (M ) ,
.
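The majorization [M < 0] <= L(M) behind (4.1) is easy to check numerically. The sketch below uses the five standard approximations of the threshold loss (quadratic, piecewise linear, sigmoid, logarithmic, exponential); the dictionary name and the grid of margins are arbitrary choices made for the illustration.

```python
import math

# Continuous approximations of the threshold loss [M < 0].
LOSSES = {
    "Q": lambda M: (1.0 - M) ** 2,                  # quadratic
    "V": lambda M: max(0.0, 1.0 - M),               # piecewise linear
    "S": lambda M: 2.0 / (1.0 + math.exp(M)),       # sigmoid
    "L": lambda M: math.log2(1.0 + math.exp(-M)),   # logarithmic
    "E": lambda M: math.exp(-M),                    # exponential
}


def threshold(M):
    """Threshold loss: 1 if the margin is negative, else 0."""
    return 1.0 if M < 0 else 0.0


margins = [m / 10.0 for m in range(-50, 51)]
majorizes = {
    name: all(loss(M) >= threshold(M) for M in margins)
    for name, loss in LOSSES.items()
}
```

Every entry of `majorizes` is true on this grid, so minimizing any of these losses minimizes an upper bound on the number of training errors.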
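Fig. 7. Continuous approximations of the threshold loss function [M < 0] (plot omitted; horizontal axis: margin M):
Q(M) = (1 - M)^2, quadratic;
V(M) = (1 - M)_+, piecewise linear;
S(M) = 2(1 + e^{M})^{-1}, sigmoid;
L(M) = \log_2(1 + e^{-M}), logarithmic;
E(M) = e^{-M}, exponential.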
- . , X Y ,
f (x, w) p(x, y|w).
w X
. w,
X (
). X ( ,
p(x, y|w)), :
p(X |w) =
Y
i=1
X
i=1
(4.2)
(4.2) (4.1), ,
,
ln p(xi , yi |w) = L yi f (xi , w) .
p(x, y|w), f L . ,
L , , ,
.
. ,
p(x, y|w) p(w).
, p(x, y|w) , . ,
p(x, y|w) p(w).
, p(w)
p(w; ), , .
.
,
, .
-, X , w
. , , p(X , w; ) = p(X |w)p(w; ). ,
:
L (w, X ) = ln p(X , w; ) =
X
i=1
(4.3)
L : (4.2)
, .
, .
. w Rn , .
. , :
1
kwk2
1
ln p(w; ) = ln
exp
= kwk2 + const(w),
n/2
(2)
2
2
const(w) , w, ,
(4.3). w,
.
.
. w Rn , .
n
X
1
kwk1
1
ln p(w; C) = ln
exp
= kwk1 + const(w), kwk1 =
|wj |.
(2C)n
C
C
j=1
,
. 2C 2 .
, . - ,
w = 0.
w:
Q(w) =
X
i=1
Li (w) +
n
1 X
|wj | min,
w
C j=1
X
i=1
n
1 X
(uj + vj ) min;
Li (u v) +
u,v
C j=1
uj > 0,
vj > 0,
j = 1, . . . , n.
j uj > 0 vj > 0 ,
, Q(u, v)
, . C , - 2n . C j,
, uj = vj = 0, , wj = 0. ,
j- , .
, , wj j- , (features selection). C . C, wj .
. p(w), w Rn wj ,
, Cj wj :
X
n
wj2
1
p(w) =
exp
.
2C
(2)n/2 C1 Cn
j
j=1
(4.4)
Cj wj :
Q(w) =
X
i=1
n
wj2
1X
Li (w) +
ln Cj +
min .
w,C
2 j=1
Cj
. Cj 0,
wj , .
Cj , wj , .
4.2
X ; Y = {1, 1} ; n fj : X R, j = 1, . . . , n.
x = (x1 , . . . , xn ) Rn , xj = fj (x), x.
. 8. ( ).
x w Rn , :
X
n
j=1
wj fj (x) w0 .
(4.5)
hw, xi = 0 , Rn . x w, x +1, 1.
w0 . , , fj (x) 1, w0 wj .
-.
, 8. , , , . : . ,
. 100
1 , .
, , .
,
.
.
, xj = fj (x) n ,
. wj .
wj , j- , ,
56
. . .
1.5
1.0
0.5
(z) = [z > 0]
(z) = (1 + ez )1
th(z) =2(2z) 1
ln(z + z 2 + 1)
exp(z 2 /2)
z
0.0
-0.5
T
-1.0
-1.5
0.0
0.5
1.0
1.5
2.0
2.5
3.0
;
(S);
(T);
(L);
(G);
(Z);
. 9. (z).
. w0 ,
+1, 1.
(z) = sign(z), , .
. (z) = th(z) , . . 9.
, (4.5) . 1943 [52].
. , . 100 /.
, ,
102 .
1011 ,
103 104 . ,
.
, ,
(, , ), , , , . ,
50- ,
.
. ,
.
. ,
.
,
, .
4.3
X = (xi , yi ) i=1 , xi Rn , yi {1, 1}.
w Rn , :
\[ Q(w, X^\ell) = \sum_{i=1}^{\ell} \mathscr{L}\bigl( \langle w, x_i \rangle y_i \bigr) \to \min_{w}. \tag{4.6} \]
Q(w) .
w, , w
Q.
Q(w) n
Q (w) = wj j=1 :
w := w Q (w),
> 0 ,
(learning rate). , L ,
:
\[ w := w - \eta \sum_{i=1}^{\ell} \mathscr{L}'\bigl( \langle w, x_i \rangle y_i \bigr)\, x_i y_i. \tag{4.7} \]
(xi , yi ) w,
w . , (xi , yi ) ,
:
\[ w := w - \eta\, \mathscr{L}'\bigl( \langle w, x_i \rangle y_i \bigr)\, x_i y_i. \tag{4.8} \]
hy, fj i
, j = 1, . . . , n,
(4.9)
hfj , fj i
fj = fj (xi ) i=1 j- , y = (yi )i=1 .
.
wj :=
4.1 Q . . , .
1/. .
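Algorithm 4.1. Stochastic gradient (SG) method.
Input:
    X^\ell, the training sample; \eta, the learning rate; \lambda, the smoothing parameter of the functional estimate.
Output:
    the weights w_1, ..., w_n;
1: initialize the weights w_j, j = 1, ..., n;
2: initialize the current estimate of the functional:
       Q := \sum_{i=1}^{\ell} \mathscr{L}(\langle w, x_i \rangle y_i);
3: repeat
4:     choose an object x_i from X^\ell (for example, at random);
5:     compute the output a(x_i, w) and the error:
           \varepsilon_i := \mathscr{L}(\langle w, x_i \rangle y_i);
6:     make a gradient descent step:
           w := w - \eta \mathscr{L}'(\langle w, x_i \rangle y_i) x_i y_i;
7:     update the estimate of the functional:
           Q := (1 - \lambda) Q + \lambda \varepsilon_i;
8: until the value of Q stabilizes and/or the weights w stop changing;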
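Algorithm 4.1 can be sketched for a linear classifier as follows. Several choices here are assumptions made for the illustration and are not fixed by the text: the logistic loss L(M) = ln(1 + e^{-M}), a fixed number of steps in place of the stabilization test, small random initial weights, and the toy sample with a constant third feature playing the role of the threshold.

```python
import math
import random


def loss(M):
    """Logistic loss as a smooth upper bound of the threshold loss (an assumption)."""
    return math.log(1.0 + math.exp(-M))


def loss_derivative(M):
    """Derivative of the logistic loss with respect to the margin M."""
    return -1.0 / (1.0 + math.exp(M))


def sgd(sample, eta=0.1, lam=0.05, n_steps=5000, seed=0):
    """Stochastic gradient descent for a(x, w) = sign <w, x>."""
    rng = random.Random(seed)
    n = len(sample[0][0])
    w = [rng.uniform(-1.0 / (2 * n), 1.0 / (2 * n)) for _ in range(n)]

    def margin(weights, x, y):
        return y * sum(wj * xj for wj, xj in zip(weights, x))

    # Step 2: initial estimate of the functional.
    Q = sum(loss(margin(w, x, y)) for x, y in sample)
    for _ in range(n_steps):
        x, y = rng.choice(sample)                     # step 4
        M = margin(w, x, y)
        eps = loss(M)                                 # step 5
        step = eta * loss_derivative(M)
        w = [wj - step * xj * y for wj, xj in zip(w, x)]  # step 6, rule (4.8)
        Q = (1.0 - lam) * Q + lam * eps               # step 7
    return w, Q


sample = [((2.0, 1.0, 1.0), +1), ((1.0, 3.0, 1.0), +1), ((3.0, 2.0, 1.0), +1),
          ((-1.0, -2.0, 1.0), -1), ((-2.0, -1.0, 1.0), -1), ((-3.0, -2.0, 1.0), -1)]
w, Q = sgd(sample)
margins = [y * sum(wj * xj for wj, xj in zip(w, x)) for x, y in sample]
```

On this linearly separable sample all final margins are positive; the smoothed estimate Q tracks the loss on recently visited objects rather than the exact value of the functional.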
SG.
.
, , .
,
, .
SG.
Q, , ,
, .
n . ,
, ,
, , , .
, .
hw, xi i, L ,
(4.8). wj , .
4.3.1
. , , L (M ) = (M 1)2 .
\[ w := w - \eta \bigl( \langle w, x_i \rangle - y_i \bigr)\, x_i. \tag{4.10} \]
1960 (delta-rule),
ADALINE [66].
, Y = R, a(x) = hw, xi hw, xi i yi 2 .
. 1957 , . ,
.
, .
. , ,
. , . w.
. , fj (x) {0, 1}. ,
yi {1, 1}. , a(xi )
yi . .
1. a(xi ) yi , .
2. a(xi ) = 1 yi = 1, w .
wj , fj (xi ) 6= 0;
. w := w + xi , > 0 .
3. a(xi ) = 1 yi = 1, : w := w xi .
[45]:
\[ \text{if } \langle w, x_i \rangle y_i < 0 \text{ then } w := w + \eta\, x_i y_i. \tag{4.11} \]
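Rule (4.11) can be sketched as a complete training loop. One deliberate deviation is labeled here: the condition is written as `<= 0` rather than `< 0`, so that training can start from w = 0 (with a strict inequality a zero vector would never be corrected). The function names, the epoch cap, and the toy sample are hypothetical.

```python
def dot(w, x):
    """Inner product <w, x>."""
    return sum(wj * xj for wj, xj in zip(w, x))


def perceptron(sample, eta=1.0, max_epochs=100):
    """Rosenblatt's rule: correct w on every misclassified object."""
    w = [0.0] * len(sample[0][0])
    updates = 0
    for _ in range(max_epochs):
        changed = False
        for x, y in sample:
            if y * dot(w, x) <= 0:  # non-strict, see the note above
                w = [wj + eta * xj * y for wj, xj in zip(w, x)]
                updates += 1
                changed = True
        if not changed:  # a full pass without corrections: sample separated
            break
    return w, updates


sample = [((2.0, 1.0, 1.0), +1), ((1.0, 3.0, 1.0), +1), ((3.0, 2.0, 1.0), +1),
          ((-1.0, -2.0, 1.0), -1), ((-2.0, -1.0, 1.0), -1), ((-3.0, -2.0, 1.0), -1)]
w, updates = perceptron(sample)
```

On this sample a single correction on the first object already separates the classes, which agrees with the finite number of corrections guaranteed for linearly separable samples.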
, (4.8), - L (M ) = (M )+ . (4.8)
, .
,
.
4.1 (, 1962 [55]). X = Rn , Y = {1, 1}, X
w ,
hw,
xi i yi > i = 1, . . . , . 4.1 (4.11)
, , w0 , > 0,
. w0 = 0,
2
D
, D = max kxk.
tmax =
xX
. w t- wt , kwk
= 1:
[
cos(w,
wt ) =
hw,
wt i
.
kwt k
t- wt1
x, y, : hx, wt1 i y < 0.
(4.11) .
, :
w,
wt = w,
wt1 + hw,
xi y > w,
wt1 + > w,
w0 + t.
, kxk < D, :
kwt k2 = kwt1 k2 + 2 kxk2 + 2 x, wt1 y < kwt1 k2 + 2 D2 < kw0 k2 + t 2 D2 .
:
hw,
w0 i + t
[
cos(w,
wt ) > p
t .
kw0 k2 + t 2 D2
. , t x X , hx, wt i y < 0,
.
w0 = 0,
.
cos 6 1 t/D 6 1, tmax = (D/)2 .
,
. ,
.
4.3.2
, .
, , 6.2.
, , . [50].
.
. kxi k , , . :
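\[ x^j := \frac{x^j - x^j_{\min}}{x^j_{\max} - x^j_{\min}} \quad \text{or} \quad x^j := \frac{x^j - x^j_{\text{mean}}}{x^j_{\text{std}}}, \qquad j = 1, \dots, n, \]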
xjmin , xjmax , xj , xj , , j- .
. , .
1. ,
, .
, . , ,
. (shuing).
2. , ,
. .
,
,
, .
3. , . , . ,
: , .
.
(weights decay). ,
Q(w) :
=
,
<
,
t = 1/t.
t 0,
t=1 t
t=1 t
2. ,
Q w Q (w) min.
[8]. ,
ADALINE = kxi k2 .
.
, .
(jog of weights). ,
(stochastic local search).
. .
Q(w, X ) ,
. (early stopping): -
(. ??), ,
, , .
4.4
, . ,
. -, . -,
.
4.4.1
,
, . : ? , .
. , Y = {1, +1}, n fj : X R, j = 1, . . . , n. X = Rn ,
: x (f1 (x), . . . , fn (x)).
4.1. X Y . X = (xi , yi )i=1 p(x, y) = Py py (x) = P(y|x)p(x),
Py , py (x) , P(y|x) y Y .
. 4.2. p(x), x Rn ,
p(x) = exp c() h, xi + b(, ) + d(x, ) , Rn ,
, b, c, d .
.
: , , , , , -, .
4.1.
Rn Rnn = 1 = :
1
n
N (x; , ) = (2) 2 || 2 exp 12 (x ) 1 (x ) =
= exp 1 x 21 1 1 12 x 1 x n2 ln(2) 21 ln || .
| {z } |
{z
} |
{z
}
h,xi
b(,)
d(x,)
4.2. py (x)
, d ,
y .
P(+1|x)
P(1|x) +
P+ p+ (x)
P(+1|x)
=
= exp h(c+ ()+ c () ), xi + b+ (, + ) b (, ) + ln PP+ .
P(1|x)
P p (x)
|
|
{z
}
{z
}
w=const(x)
const(x)
w x
. , x,
. , , hw, xi.
P(+1|x)
= ehw,xi .
P(1|x)
. 10.
log2 1 + eMi .
4.4.2
. w X :
L(w, X ) = log2
Y
i=1
p(xi , yi ) max .
w
L(w, X ) =
X
i=1
log2 hw, xi i yi + const(w) max .
w
L(w, X ) e
Q(w,
X ), (4.1):
\[ \widetilde{Q}(w, X^\ell) = \sum_{i=1}^{\ell} \log_2 \bigl( 1 + \exp( -\langle w, x_i \rangle y_i ) \bigr) \to \min_{w}. \tag{4.12} \]
, L (M ) = log2 1+eM
.
e
. Q(w),
. - , ,
. 10. (4.13) , , (4.11), . 11:
w := w + yi xi hw, xi i yi < 0 .
xi . ,
Mi (w) = hw, xi i yi , . ,
, ,
. (margin)
, [31, 60].
. w
-, , . 5.5.5.
,
(,
n n) . . , x
P (+1|x).
. ()
, .
n|Y | + n(n + 1)/2 (
), n
( w). ? ,
, , .
. ,
.
.
,
( ), -
( ).
.
.
, 4.2.
. , , ( ), , .
, nn .
4.4.3
yY
y y.
. 4.2 , P(y|x) = hw, xi y
.
. ,
.
, 4.2 , a(x) = sign f (x, w) .
, P(+1|x) f (x, w).
, , - :
P(+1|x) = f (x, w) + .
(, ) R2
X
i=1
log yi f (xi , w) + yi
i=1
max
,
4.5
6070- . . , [7].
,
, .
,
, , . ,
4.5.2, .
90-
(support vector machine, SVM) [38]. . [35, 61].
SVM . -,
SVM , ,
. -, :
. ; . ,
. ,
, .
4.5.1
,
n- : X = Rn , Y = {1, +1}.
:
\[ a(x) = \operatorname{sign} \Bigl( \sum_{j=1}^{n} w_j x^j - w_0 \Bigr) = \operatorname{sign} \bigl( \langle w, x \rangle - w_0 \bigr), \tag{4.14} \]
x = (x1 , . . . , xn ) x; w = (w1 , . . . , wn ) Rn
w0 R . hw, xi = w0
, Rn .
, X = (xi , yi )i=1
w, w0 ,
Q(w, w0 ) =
X
i=1
yi (hw, xi i w0 ) 6 0
. . ,
. , .
. ,
. : , (margin)
. [32, 59, 65].
. , : a(x) , w w0 .
,
(4.15)
min yi hw, xi i w0 = 1.
i=1,...,
x : 1 6 hw, xi w0 6 1 ,
, . . 13.
. w.
. , , , (4.15).
,
.
x+
x
. 13. .
x x+ . w .
.
, .
x x+ 1 +1 ,
.
(w0 + 1) (w0 1)
2
hw, x+ i hw, x i
w
=
=
.
=
(x+ x ),
kwk
kwk
kwk
kwk
, w .
, : w w0 ,
- w :
\[ \begin{cases} \langle w, w \rangle \to \min; \\ y_i \bigl( \langle w, x_i \rangle - w_0 \bigr) \ge 1, \quad i = 1, \dots, \ell. \end{cases} \tag{4.16} \]
. (4.16) , .
4.5.2
, ,
, .
i > 0, xi , i = 1, . . . , .
(4.16) - :
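\[ \begin{cases} \dfrac12 \langle w, w \rangle + C \sum\limits_{i=1}^{\ell} \xi_i \to \min\limits_{w, w_0, \xi}; \\ y_i \bigl( \langle w, x_i \rangle - w_0 \bigr) \ge 1 - \xi_i, \quad i = 1, \dots, \ell; \\ \xi_i \ge 0, \quad i = 1, \dots, \ell. \end{cases} \tag{4.17} \]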
C
.
. Y = {1, +1}
(margin) xi
Mi (w, w0 ) = yi hw, xi i w0 .
\[ Q(w, w_0) = \sum_{i=1}^{\ell} \bigl( 1 - M_i(w, w_0) \bigr)_+ + \frac{1}{2C} \|w\|^2 \to \min_{w, w_0}. \tag{4.18} \]
w0 . z1 , z2 , C .
,
,
, , SVM, .
. (4.17):
X
X
X
1
i i =
i Mi (w, w0 ) 1 + i
i
L (w, w0 , ; , ) = kwk2 + C
2
i=1
i=1
i=1
X
X
1
= kwk2
i Mi (w, w0 ) 1
i i + i C ,
2
i=1
i=1
= (1 , . . . , ) , w; = (1 , . . . , )
, = (1 , . . . , ).
- (4.17) :
w,w0 , ,
i = 0 Mi (w, w0 ) = 1 i , i = 1, . . . , ;
i = 0 i = 0, i = 1, . . . , ;
.
. :
X
L
i yi xi = 0
=w
w
i=1
L
= i i + C = 0
i
i yi xi ;
(4.19)
i=1
X
L
i yi = 0
=
w0
i=1
w=
(4.20)
i yi = 0;
i=1
i + i = C,
i = 1, . . . , .
(4.21)
(4.19) , w
xi , , i > 0.
. 4.3. i > 0, xi (support vector).
(4.21) i > 0 0 6 i 6 C.
, ,
i , i , i Mi .
, xi , i = 1, . . . , :
1. i = 0; i = C; i = 0; Mi > 1.
xi w. .
2. 0 < i < C; 0 < i < C; i = 0; Mi = 1.
xi . .
3. i = C; i = 0; i > 0; Mi < 1.
xi , (0 < i < 1, 0 < Mi < 1), (i = 1, Mi = 0),
(i > 1, Mi < 0).
xi .
(4.21) ,
i i , i .
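\[ \begin{cases} -\mathscr{L}(\lambda) = -\sum\limits_{i=1}^{\ell} \lambda_i + \dfrac12 \sum\limits_{i=1}^{\ell} \sum\limits_{j=1}^{\ell} \lambda_i \lambda_j y_i y_j \langle x_i, x_j \rangle \to \min\limits_{\lambda}; \\ 0 \le \lambda_i \le C, \quad i = 1, \dots, \ell; \\ \sum\limits_{i=1}^{\ell} \lambda_i y_i = 0. \end{cases} \tag{4.22} \]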
,
, , . , , . , .
, . w (4.19). w0 xi w0 w0 = hw, xi i yi .
w0 , :
w0 = med hw, xi i yi : i > 0, Mi = 1, i = 1, . . . , .
(4.23)
:
\[ a(x) = \operatorname{sign} \Bigl( \sum_{i=1}^{\ell} \lambda_i y_i \langle x_i, x \rangle - w_0 \Bigr). \tag{4.24} \]
, ,
, i 6= 0. a(x) ,
. (sparsity); SVM
, .
i , . ( ) SVM.
, ; , .
. C . , C. , , ,
C, .
, ,
- , . C, , i .
. ,
, .
4.5.3
.
X H : X H. H , ,
( , X , ,
). H .
, (xi ), xi , SVM , .
, hx, x i X h(x), (x )i H.
: H
, , ,
, .
. 4.4. K : X X R (kernel function),
K(x, x ) = h(x), (x )i : X H,
H .
(4.22), (4.24)
,
. , hx, x i K(x, x ). , a : X Y .
, H
, .
, .
, , . K(x, x ), , SVM.
(featureless recognition), (kNN, RBF .) .
. K(x, x )
? , .
4.3 (, 1909 [53]). K(x, x ) , K(x, x ) = K(x , x), :
R R,
K(x, x )g(x)g(x )dxdx > 0 g : X R.
X X
.
. 4.5. K(x, x ) ,
X p = (x1 , . . . , xp ) X K = kK(xi , xj )k p p : z Kz > 0 z Rp .
.
, , . , , . ,
. ,
, .
.
[19, 20].
1. K(x, x ) = hx, x i .
2. K(x, x ) = 1 .
3. K(x, x ) = K1 (x, x )K2 (x, x ) .
4. : X R K(x, x ) = (x)(x ) .
5. K(x, x ) =
= 1 K1 (x, x ) + 2 K2 (x, x ) .
6. : X X K0 : K(x, x ) = K0 ((x), (x )).
7. s : X X
,
R R
8. K(x, x ) = k(x R x ) ,
n
- F [k]() = (2) 2 X eih,xi k(x) dx .
9. - .
. , : , ,
(RBF-), . ,
.
. , . ,
.
4.2. X = R2 K(u, v) = hu, vi2 , u = (u1 , u2 ),
v = (v1 , v2 ). ,
. :
\[ K(u, v) = \langle u, v \rangle^2 = \langle (u_1, u_2), (v_1, v_2) \rangle^2 = (u_1 v_1 + u_2 v_2)^2 = u_1^2 v_1^2 + u_2^2 v_2^2 + 2 u_1 v_1 u_2 v_2 = \bigl\langle (u_1^2, u_2^2, \sqrt{2}\, u_1 u_2),\ (v_1^2, v_2^2, \sqrt{2}\, v_1 v_2) \bigr\rangle . \]
K
H = R3 .
\psi : \mathbb{R}^2 \to \mathbb{R}^3 : (u_1, u_2) \mapsto (u_1^2, u_2^2, \sqrt{2}\, u_1 u_2). H
X.
, .
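The identity of Example 4.2 can be verified numerically: the quadratic kernel on R^2 equals an ordinary inner product after the map (u_1, u_2) -> (u_1^2, u_2^2, sqrt(2) u_1 u_2). The function names and the test points below are arbitrary choices for the illustration.

```python
import math


def kernel(u, v):
    """Quadratic kernel K(u, v) = <u, v>^2 on R^2."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2


def psi(u):
    """Explicit map R^2 -> R^3 that straightens the quadratic kernel."""
    return (u[0] ** 2, u[1] ** 2, math.sqrt(2.0) * u[0] * u[1])


def dot(a, b):
    """Inner product in the straightening space."""
    return sum(ai * bi for ai, bi in zip(a, b))


points = [(1.0, 2.0), (-0.5, 3.0), (2.0, -1.5), (0.0, 1.0)]
max_gap = max(abs(kernel(u, v) - dot(psi(u), psi(v)))
              for u in points for v in points)
```

The gap stays at the level of rounding error, so a classifier that is linear in the three new features is a second-order surface in the original plane, without the map ever being computed inside the kernel.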
4.3. . X = Rn ,
K(u, v) = hu, vid .
(u) (u1 )d1 (un )dn
d1 , . . . , dn ,
d1 + + dn = d. , d
H, Cn+d1
. H ,
d u1 , . . . , un .
4.4. X = Rn ,
d
K(u, v) = hu, vi + 1 ,
H d u1 , . . . , un .
H d.
X.
SVM . a(x)
(4.24) hxi , xi K(xi , x).
, h . i = 0
, i = h + 1, . . . , , a(x)
X
h
a(x) = sign
i yi K(xi , x) w0 .
i=1
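[Figure 15: diagram omitted. The inputs x^1, ..., x^n feed a hidden layer of h elements K(x, x_1), ..., K(x, x_h) through the weights x_1^1, ..., x_h^n; the hidden outputs are weighted by \lambda_1 y_1, ..., \lambda_h y_h and summed together with a constant input weighted by w_0; the sum is passed through sign to produce the output.]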
. 15. (SVM) .
X = Rn , a(x) ,
. ,
. , SVM, .
-, ,
.
-, : i
K(x, xi ), , xi .
4.5.
,
K(u, v) = th k0 + k1 hu, vi .
, K(u, v) ? (4.22) . , ,
. ,
0 6 i 6 C , .
- , .
4.6. (radial basis
functions, RBF) ,
\[ K(u, v) = \exp\bigl( -\beta \|u - v\|^2 \bigr), \]
. K(xi , x) x xi . , .
, i yi .
+1 , 1 . ,
x .
2.4.3 RBF-, EM-.
. ,
. SVM-RBF EM-RBF. SVM
,
. , SVM-RBF
. , EM-RBF .
SVM.
, .
, .
.
SVM.
. - .
. ,
, .
C .
(RVM). ,
w - . w, .
(relevance vector machine,
RVM). ;
[64, 33].
, (SVM) w xi :
w=
X
i=1
i yi xi ,
(4.25)
i ,
xi . SVM , , -,
. RVM ,
.
RVM (4.25), ,
i . , , SVM. , i i :
!
X
1
2i
p() =
exp
(2)/2 1
2i
i=1
, (4.4),
, .
, , SVM,
. ,
, SVM,
, , , . ,
.
4.6
ROC-
, Y = {1, +1},
a(x, w) = sign f (x, w) w0 , w0 R . .
4.2 w0 : w0 = ln + , +
+1 1 .
.
ROC-, , , .
(receiver operating characteristic, ROC curve) . II ,
1941 , .
: , , , , . .
ROC- .
.
X
(false positive rate, FPR):
\[ \mathrm{FPR}(a, X^\ell) = \frac{\sum_{i=1}^{\ell} [y_i = -1]\,[a(x_i) = +1]}{\sum_{i=1}^{\ell} [y_i = -1]}. \]
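Algorithm 4.2. Efficient construction of the ROC curve
Input:
    training sample X^\ell; f(x) = \langle w, x \rangle, the discriminant function;
Output:
    (FPR_i, TPR_i)_{i=0}^{\ell}, the sequence of points of the ROC curve;
    AUC, the area under the ROC curve.
1: \ell_- := \sum_{i=1}^{\ell} [y_i = -1], the number of objects of class -1;
   \ell_+ := \sum_{i=1}^{\ell} [y_i = +1], the number of objects of class +1;
2: sort the sample X^\ell in descending order of f(x_i);
3: put the first point at the origin:
       (FPR_0, TPR_0) := (0, 0); AUC := 0;
4: for i := 1, ..., \ell
5:     if y_i = -1 then
6:         move one step to the right:
               FPR_i := FPR_{i-1} + 1/\ell_-; TPR_i := TPR_{i-1};
               AUC := AUC + (1/\ell_-) TPR_i;
7:     else
8:         move one step up:
               FPR_i := FPR_{i-1}; TPR_i := TPR_{i-1} + 1/\ell_+;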
1 FPR(a)
(true negative rate, TNR) a. 1 .
Y
(true positive rate, TPR), a:
$$\mathrm{TPR}(a, X^\ell) = \frac{\sum_{i=1}^{\ell}[y_i = +1]\,[a(x_i) = +1]}{\sum_{i=1}^{\ell}[y_i = +1]}.$$
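The ROC construction of Algorithm 4.2 (sort by decreasing score, step right on a negative object, step up on a positive one) can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function name `roc_curve_auc` and the toy data are assumptions, labels are assumed to be $-1/+1$, and tied scores are not treated specially.

```python
def roc_curve_auc(scores, labels):
    """Return the ROC points (FPR_i, TPR_i) and the AUC for labels in {-1, +1}."""
    n_neg = sum(1 for y in labels if y == -1)
    n_pos = sum(1 for y in labels if y == +1)
    # Step 2: visit objects in order of decreasing discriminant value f(x_i).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr, auc = [0.0], [0.0], 0.0          # step 3: start at the origin
    for i in order:
        if labels[i] == -1:                    # step right
            fpr.append(fpr[-1] + 1.0 / n_neg)
            tpr.append(tpr[-1])
            auc += tpr[-1] / n_neg             # add the area of one vertical strip
        else:                                  # step up
            fpr.append(fpr[-1])
            tpr.append(tpr[-1] + 1.0 / n_pos)
    return fpr, tpr, auc

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [+1, +1, -1, +1, -1, -1]
fpr, tpr, auc = roc_curve_auc(scores, labels)
```

On this toy sample 8 of the 9 (positive, negative) pairs are ordered correctly, so the accumulated area equals 8/9.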
5.1
g(x, ),
Rp . X :
$$Q(\alpha, X^\ell) = \sum_{i=1}^{\ell}\bigl(g(x_i,\alpha) - y_i\bigr)^2. \tag{5.1}$$
() , , X :
$$\alpha^* = \arg\min_{\alpha\in\mathbb{R}^p} Q(\alpha, X^\ell). \tag{5.2}$$
. g(x, )
, p p :
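$$\frac{\partial Q}{\partial\alpha}(\alpha, X^\ell) = 2\sum_{i=1}^{\ell}\bigl(g(x_i,\alpha) - y_i\bigr)\frac{\partial g}{\partial\alpha}(x_i,\alpha) = 0. \tag{5.3}$$
5.2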
,
,
2.2.2. a(x) x . , X (x, x ).
5.2.1
, g(x, ) = , R. , ,
wi (x), x,
a(x) = g(x, ). ,
x X.
a(x) = x X,
:
$$Q(\alpha; X^\ell) = \sum_{i=1}^{\ell} w_i(x)\bigl(\alpha - y_i\bigr)^2 \to \min_{\alpha\in\mathbb{R}}.$$
wi , (x, xi ). , ,
K : [0, ) [0, ), :
$$w_i(x) = K\Bigl(\frac{\rho(x,x_i)}{h}\Bigr).$$
h .
h, wi (x) xi x.
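Setting $\partial Q/\partial\alpha = 0$ gives the Nadaraya-Watson kernel smoothing formula:
$$a_h(x; X^\ell) = \frac{\sum_{i=1}^{\ell} y_i w_i(x)}{\sum_{i=1}^{\ell} w_i(x)} = \frac{\sum_{i=1}^{\ell} y_i K\bigl(\frac{\rho(x,x_i)}{h}\bigr)}{\sum_{i=1}^{\ell} K\bigl(\frac{\rho(x,x_i)}{h}\bigr)}. \tag{5.4}$$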
: a(x) yi xi , x.
X = R1 (x, xi ) = |x xi |.
(5.4) ,
2.3 .
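A minimal sketch of the kernel smoothing estimate $a_h(x) = \sum_i y_i K(\rho(x,x_i)/h) \big/ \sum_i K(\rho(x,x_i)/h)$ in the one-dimensional case $\rho(x,x_i) = |x - x_i|$. The Gaussian kernel, the function names and the sample values are assumptions chosen for illustration.

```python
import math

def gaussian_kernel(r):
    """K(r) = exp(-r^2 / 2), one common choice of a non-increasing kernel."""
    return math.exp(-0.5 * r * r)

def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Weighted average of ys with weights w_i(x) = K(|x - x_i| / h)."""
    weights = [kernel(abs(x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # Can happen with a compactly supported kernel and no neighbours within h.
        raise ValueError("no training point has a non-zero weight at x")
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
estimate = nadaraya_watson(1.5, xs, ys, h=0.5)
constant_fit = nadaraya_watson(0.7, xs, [2.0] * 4, h=0.3)
```

Because the estimate is a convex combination of the $y_i$, it always lies between the smallest and largest response, and a constant response is reproduced exactly.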
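Theorem 5.1 ([25]). Suppose the following conditions hold:
1) the sample $X^\ell = (x_i, y_i)_{i=1}^{\ell}$ is simple (i.i.d.), drawn from a distribution
with density $p(x, y)$;
2) the kernel $K(r)$ satisfies $\int_0^\infty K(r)\,dr < \infty$ and $\lim_{r\to\infty} rK(r) = 0$;
3) the dependence to be recovered, given by the density $p(y|x)$, satisfies
$E(y^2|x) = \int_Y y^2 p(y|x)\,dy < \infty$ for every $x \in X$;
4) the sequence $h_\ell$ is such that $\lim_{\ell\to\infty} h_\ell = 0$ and $\lim_{\ell\to\infty} \ell h_\ell = \infty$.
Then $a_{h_\ell}(x; X^\ell) \to E(y|x)$ in probability
at every point $x \in X$ where $E(y|x)$, $p(x)$ and $D(y|x)$ are continuous and $p(x) > 0$.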
,
h.
5.2.2
.
ah (x; X ) , K
h.
K ,
ah (x). ah (x)
, K(r).
. 3.
$K_G(r) = \exp\bigl(-\tfrac12 r^2\bigr)$ (Gaussian) and $K_Q(r) = (1 - r^2)^2\,\bigl[\,|r| < 1\,\bigr]$ (quartic).
K(r) , K(r) = 0 r > 1, xi , (x, xi ) < h. (5.4) x.
X = R1
xi . , x.
h . (h 0) ah (x)
, .
h
. ,
h .
, X.
, . h(x),
x. ,
$w_i(x) = K\Bigl(\frac{\rho(x,x_i)}{h(x)}\Bigr)$.
h(x) (k+1)
x k + 1- : hk (x) = (x, xx ). ,
hk (x) , , wi (x)
ahk (x) , . hk (x)
, - , , KQ .
. h k xi ,
. , h 0.
(leave-one-out, LOO):
$$\mathrm{LOO}(h, X^\ell) = \sum_{i=1}^{\ell}\Bigl(a_h\bigl(x_i; X^\ell\setminus\{x_i\}\bigr) - y_i\Bigr)^2 \to \min_h,$$
h k.
5.2.3
. ,
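Algorithm 5.1. LOWESS: locally weighted smoothing.
Input:
$X^\ell$, the training sample;
Output:
coefficients $\gamma_i$, $i = 1, \dots, \ell$;
1: initialization: $\gamma_i := 1$, $i = 1, \dots, \ell$;
2: repeat
3:   compute the leave-one-out estimates at every object:
     $a_i := a_h\bigl(x_i; X^\ell\setminus\{x_i\}\bigr) = \dfrac{\sum_{j=1,\, j\ne i}^{\ell} y_j\gamma_j K\bigl(\frac{\rho(x_i,x_j)}{h(x_i)}\bigr)}{\sum_{j=1,\, j\ne i}^{\ell} \gamma_j K\bigl(\frac{\rho(x_i,x_j)}{h(x_i)}\bigr)}$, $i = 1, \dots, \ell$;
4:   from the errors $\varepsilon_i = |a_i - y_i|$ compute the coefficients $\gamma_i$:
     $\gamma_i := \tilde K\bigl(|a_i - y_i|\bigr)$; $i = 1, \dots, \ell$;
5: until the coefficients $\gamma_i$ stabilize;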
i = ah xi ; X \{xi } yi , (xi , yi )
, .
i ), K
,
wi (x) i = K(
, K(r).
i , i , ah , ,
, i . , , , . 5.1.
ah , i . , . (locally
weighted scatter plot smoothing, LOWESS) [37].
, , , , (robust).
K().
: (1) 6 6 () ,
t .
K()
= 6 (t) .
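[37]: $\tilde K(\varepsilon) = K_Q\Bigl(\dfrac{\varepsilon}{6\,\operatorname{med}\{\varepsilon_i\}}\Bigr)$,
where $\operatorname{med}\{\varepsilon_i\}$ is the median of the errors.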
5.2.4
wi = wi (x), di = xi x :
$$Q(\alpha, \beta; X^\ell) = \sum_{i=1}^{\ell} w_i\bigl(\alpha d_i + \beta - y_i\bigr)^2 \to \min_{\alpha,\beta\in\mathbb{R}}.$$
= 0 Q
= 0, Q
2 2, :
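$$a_h(x; X^\ell) = \frac{\sum_{i=1}^{\ell} w_i d_i^2 \,\sum_{i=1}^{\ell} w_i y_i \;-\; \sum_{i=1}^{\ell} w_i d_i \,\sum_{i=1}^{\ell} w_i d_i y_i}{\sum_{i=1}^{\ell} w_i \,\sum_{i=1}^{\ell} w_i d_i^2 \;-\; \Bigl(\sum_{i=1}^{\ell} w_i d_i\Bigr)^2}.$$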
X = Rn
a(u) = (u x) + (. ). x X,
.
5.3
f1 (x), . . . , fn (x) ,
fj : X R , j = 1, . . . , n. Rn :
$$g(x, \alpha) = \sum_{j=1}^{n} \alpha_j f_j(x).$$
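In matrix notation: $F = \bigl(f_j(x_i)\bigr)_{\ell\times n}$ is the object-feature matrix;
$y = (y_i)_{\ell\times 1}$ is the target vector; $\alpha = (\alpha_j)_{n\times 1}$ is the coefficient vector.
The functional $Q$ takes the form
$$Q(\alpha) = \|F\alpha - y\|^2.$$
The necessary condition (5.3) in matrix form reads
$$\frac{\partial Q}{\partial\alpha}(\alpha) = 2F^{\mathsf T}(F\alpha - y) = 0,$$
whence $F^{\mathsf T}F\alpha = F^{\mathsf T}y$ (the normal system). If the $n\times n$ matrix $F^{\mathsf T}F$
is nonsingular, the solution is
$$\alpha^* = (F^{\mathsf T}F)^{-1}F^{\mathsf T}y = F^+ y.$$
The matrix $F^+ = (F^{\mathsf T}F)^{-1}F^{\mathsf T}$ is the pseudoinverse of $F$. Substituting the solution gives
$$Q(\alpha^*) = \|P_F y - y\|^2,$$
where $P_F = FF^+ = F(F^{\mathsf T}F)^{-1}F^{\mathsf T}$ is the projection matrix.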
. PF y
y F . (PF y y) y
. Q( ) = kPF y yk2 , y . ,
y F .
. ,
F . ,
.
5.3.1
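An arbitrary $\ell\times n$ matrix of rank $n$ admits a
singular value decomposition (SVD)
$$F = V D U^{\mathsf T},$$
with the following properties (see Theorem 5.2):
1) the $n\times n$ matrix $D$ is diagonal, $D = \operatorname{diag}\bigl(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}\bigr)$, where $\lambda_1, \dots, \lambda_n$ are the common non-zero eigenvalues of $F^{\mathsf T}F$ and $FF^{\mathsf T}$;
2) the $\ell\times n$ matrix $V = (v_1, \dots, v_n)$ is orthogonal, $V^{\mathsf T}V = I_n$, and its columns $v_j$
are eigenvectors of $FF^{\mathsf T}$ corresponding to $\lambda_1, \dots, \lambda_n$;
3) the $n\times n$ matrix $U = (u_1, \dots, u_n)$ is orthogonal, $U^{\mathsf T}U = I_n$, and its columns $u_j$
are eigenvectors of $F^{\mathsf T}F$ corresponding to $\lambda_1, \dots, \lambda_n$.
Given the decomposition, the pseudoinverse is
$$F^+ = (U D V^{\mathsf T} V D U^{\mathsf T})^{-1} U D V^{\mathsf T} = U D^{-1} V^{\mathsf T} = \sum_{j=1}^{n} \frac{1}{\sqrt{\lambda_j}}\, u_j v_j^{\mathsf T};$$
the least-squares solution vector is
$$\alpha^* = F^+ y = U D^{-1} V^{\mathsf T} y = \sum_{j=1}^{n} \frac{1}{\sqrt{\lambda_j}}\, u_j \bigl(v_j^{\mathsf T} y\bigr); \tag{5.5}$$
the least-squares approximation of the target vector $y$ is
$$F\alpha^* = P_F y = (V D U^{\mathsf T})\, U D^{-1} V^{\mathsf T} y = V V^{\mathsf T} y = \sum_{j=1}^{n} v_j \bigl(v_j^{\mathsf T} y\bigr); \tag{5.6}$$
and the squared norm of the coefficient vector is
$$\|\alpha^*\|^2 = y^{\mathsf T} V D^{-1} U^{\mathsf T} U D^{-1} V^{\mathsf T} y = y^{\mathsf T} V D^{-2} V^{\mathsf T} y = \sum_{j=1}^{n} \frac{1}{\lambda_j}\bigl(v_j^{\mathsf T} y\bigr)^2. \tag{5.7}$$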
, , .
,
. , SVD,
.
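The SVD expressions for the least-squares solution, $\alpha^* = \sum_j \lambda_j^{-1/2} u_j (v_j^{\mathsf T}y)$, the fit $F\alpha^* = \sum_j v_j (v_j^{\mathsf T}y)$ and the norm $\|\alpha^*\|^2 = \sum_j \lambda_j^{-1}(v_j^{\mathsf T}y)^2$, can be checked with NumPy. The random data and variable names are assumptions for illustration; note that `numpy.linalg.svd` returns the singular values $\sqrt{\lambda_j}$ and the transposed right factor.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(20, 3))       # object-feature matrix, full column rank
y = rng.normal(size=20)            # target vector

# NumPy convention: F = V @ diag(s) @ Ut, with s_j = sqrt(lambda_j).
V, s, Ut = np.linalg.svd(F, full_matrices=False)
U = Ut.T
proj = V.T @ y                     # the scalars v_j^T y

alpha = U @ (proj / s)                      # solution vector
fitted = V @ proj                           # projection of y onto the columns of F
alpha_norm_sq = np.sum(proj ** 2 / s ** 2)  # squared norm of the solution

# Reference solution from a general least-squares solver.
alpha_ref = np.linalg.lstsq(F, y, rcond=None)[0]
```

Small singular values $s_j$ enter the solution through the factor $1/s_j$, which is why near-collinear features inflate $\|\alpha^*\|$.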
5.3.2
= F F , .
.
, . ,
. , m < n.
, .
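$$\mu(\Sigma) = \|\Sigma\|\,\|\Sigma^{-1}\| = \frac{\max\limits_{u\colon \|u\|=1}\|\Sigma u\|}{\min\limits_{u\colon \|u\|=1}\|\Sigma u\|} = \frac{\lambda_{\max}}{\lambda_{\min}},$$
where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $\Sigma$,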
. , () & 102 . . . 104 .
. , z = 1 u, () :
$$\frac{\|\delta z\|}{\|z\|} \le \mu(\Sigma)\,\frac{\|\delta u\|}{\|u\|}.$$
- . (5.7) , , . -
, x
g(x, ). ,
, j fj .
5.3.3
Q , kk:
$$Q_\tau(\alpha) = \|F\alpha - y\|^2 + \tau\|\alpha\|^2,$$
. , Q , . , .
Q () , :
$$\alpha^*_\tau = (F^{\mathsf T}F + \tau I_n)^{-1}F^{\mathsf T}y.$$
, In . (ridge
regression). ,
. ,
. 2.3.3
.
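The regularized solution in terms of the singular value decomposition:
$$\alpha^*_\tau = (U D^2 U^{\mathsf T} + \tau I_n)^{-1} U D V^{\mathsf T} y = U (D^2 + \tau I_n)^{-1} D V^{\mathsf T} y = \sum_{j=1}^{n} \frac{\sqrt{\lambda_j}}{\lambda_j + \tau}\, u_j \bigl(v_j^{\mathsf T} y\bigr).$$
The regularized least-squares approximation of the target vector $y$:
$$F\alpha^*_\tau = V D U^{\mathsf T}\alpha^*_\tau = V \operatorname{diag}\Bigl(\frac{\lambda_j}{\lambda_j + \tau}\Bigr) V^{\mathsf T} y = \sum_{j=1}^{n} \frac{\lambda_j}{\lambda_j + \tau}\, v_j \bigl(v_j^{\mathsf T} y\bigr). \tag{5.8}$$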
(5.6), -
y F F .
j
(0, 1). , j +
(5.7) :
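$$\|\alpha^*_\tau\|^2 = \bigl\|D^2 (D^2 + \tau I_n)^{-1} D^{-1} V^{\mathsf T} y\bigr\|^2 = \sum_{j=1}^{n} \frac{\lambda_j}{(\lambda_j + \tau)^2}\bigl(v_j^{\mathsf T} y\bigr)^2 < \sum_{j=1}^{n} \frac{1}{\lambda_j}\bigl(v_j^{\mathsf T} y\bigr)^2 = \|\alpha^*\|^2.$$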
(shrinkage)
(weight decay) [44].
. ,
. , .
, .
,
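$$\operatorname{tr} F(F^{\mathsf T}F)^{-1}F^{\mathsf T} = \operatorname{tr}\,(F^{\mathsf T}F)^{-1}F^{\mathsf T}F = \operatorname{tr} I_n = n.$$
With regularization the effective dimension takes a value between 0 and $n$, not necessarily an integer:
$$n_{\text{eff}} = \operatorname{tr} F(F^{\mathsf T}F + \tau I_n)^{-1}F^{\mathsf T} = \operatorname{tr}\operatorname{diag}\Bigl(\frac{\lambda_j}{\lambda_j + \tau}\Bigr) = \sum_{j=1}^{n} \frac{\lambda_j}{\lambda_j + \tau} < n.$$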
. 0 -: .
: 0. ,
. , . 1.1.6 , , ??. , ,
.
. [0.1, 0.4], F
( ).
, :
$$M_0 = \mu(F^{\mathsf T}F + \tau I_n) = \frac{\lambda_{\max} + \tau}{\lambda_{\min} + \tau},$$
whence $\tau^* \approx \lambda_{\max}/M_0$.
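The ridge solution $\alpha^*_\tau = (F^{\mathsf T}F + \tau I_n)^{-1}F^{\mathsf T}y$ can be computed from one SVD for any $\tau$, together with the effective dimension $\sum_j \lambda_j/(\lambda_j+\tau)$. The sketch below is illustrative only; the helper name `ridge_svd` and the random data are assumptions.

```python
import numpy as np

def ridge_svd(F, y, tau):
    """Ridge regression through the SVD F = V diag(sqrt(lambda)) U^T.

    Returns the coefficient vector and the effective dimension
    sum_j lambda_j / (lambda_j + tau).
    """
    V, s, Ut = np.linalg.svd(F, full_matrices=False)
    lam = s ** 2                               # eigenvalues of F^T F
    proj = V.T @ y                             # the scalars v_j^T y
    alpha = Ut.T @ (s / (lam + tau) * proj)    # sum_j sqrt(lam_j)/(lam_j+tau) u_j (v_j^T y)
    n_eff = float(np.sum(lam / (lam + tau)))
    return alpha, n_eff

rng = np.random.default_rng(1)
F = rng.normal(size=(30, 4))
y = rng.normal(size=30)

alpha_tau, n_eff = ridge_svd(F, y, tau=1.0)
alpha_ols, n_ols = ridge_svd(F, y, tau=0.0)   # tau = 0 gives ordinary least squares

# Direct solution of the regularized normal system, for comparison.
alpha_direct = np.linalg.solve(F.T @ F + 1.0 * np.eye(4), F.T @ y)
```

Since the SVD does not depend on $\tau$, scanning a grid of $\tau$ values costs one decomposition plus a cheap re-weighting per value.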
5.3.4
, .
-,
:
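$$\begin{cases} Q(\alpha) = \|F\alpha - y\|^2 \to \min\limits_{\alpha};\\[2pt] \sum\limits_{j=1}^{n} |\alpha_j| \le \kappa; \end{cases} \tag{5.9}$$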
. (5.9) , -. ,
j . () ,
. ,
, .
(LASSO, least absolute shrinkage and selection operator) [63].
, ,
(5.9) . j : j = j+ j .
Q , (5.9)
, 2n -:
$$\sum_{j=1}^{n}\bigl(\alpha_j^+ + \alpha_j^-\bigr) \le \kappa; \qquad \alpha_j^+ \ge 0; \qquad \alpha_j^- \ge 0.$$
, j+ = j = 0,
j j- .
. . ,
. ,
,
. . 16, [44],
, .
. -
, .
[Fig. 16. Two panels plotting the coefficients $\{\alpha_j\}$ against the regularization parameter.]
5.3.5
. , , fj , y.
(. ??). , y,
.
Q(, X ) n :
$$\begin{cases} Q(\alpha) = \|F\alpha - y\|^2 \to \min\limits_{\alpha};\\ \alpha_j \ge 0; \quad j = 1, \dots, n. \end{cases}$$
-, . j > 0 ,
, fj , , .
j- .
5.4
, ,
, , ,
, .
(principal component analysis, PCA)
, . PCA (unsupervised learning),
F y.
, PCA , ,
, , .
. n fj (x), j = 1, . . . , n.
, : xi f1 (xi ), . . . , fn (xi ) , i = 1, . . . , . F ,
:
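$$F_{\ell\times n} = \begin{pmatrix} f_1(x_1) & \dots & f_n(x_1)\\ \dots & \dots & \dots\\ f_1(x_\ell) & \dots & f_n(x_\ell) \end{pmatrix} = \begin{pmatrix} x_1\\ \dots\\ x_\ell \end{pmatrix}.$$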
zi = g1 (xi ), . . . , gm (xi ) Z = Rm , m < n:
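$$G_{\ell\times m} = \begin{pmatrix} g_1(x_1) & \dots & g_m(x_1)\\ \dots & \dots & \dots\\ g_1(x_\ell) & \dots & g_m(x_\ell) \end{pmatrix} = \begin{pmatrix} z_1\\ \dots\\ z_\ell \end{pmatrix}.$$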
,
, U = (ujs )nm :
$$\hat f_j(x) = \sum_{s=1}^{m} g_s(x)\,u_{js}, \qquad j = 1, \dots, n,$$
x X,
: x = zU . x
x,
m. G, U , :
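$$\Delta^2(G, U) = \sum_{i=1}^{\ell}\|\hat x_i - x_i\|^2 = \sum_{i=1}^{\ell}\|z_i U^{\mathsf T} - x_i\|^2 = \|GU^{\mathsf T} - F\|^2 \to \min_{G,U}, \tag{5.10}$$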
. , kAk2 = tr AA = tr AA, tr
.
, G U : rk G = rk U = m.
U = GU G,
G
m. , m 6 rk F .
(5.10) .
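Theorem 5.2. If $m \le \operatorname{rk} F$, then the minimum of $\Delta^2(G, U)$ is attained
when the columns of $U$ are the eigenvectors of $F^{\mathsf T}F$ corresponding to its $m$ largest eigenvalues. In that case $G = FU$, and the matrices $U$ and $G$ are orthogonal.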
. :
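$$\begin{cases} \partial\Delta^2/\partial G = (GU^{\mathsf T} - F)\,U = 0;\\ \partial\Delta^2/\partial U = G^{\mathsf T}(GU^{\mathsf T} - F) = 0. \end{cases}$$
Since $G$ and $U$ have full rank,
$$\begin{cases} G = FU(U^{\mathsf T}U)^{-1};\\ U = F^{\mathsf T}G(G^{\mathsf T}G)^{-1}. \end{cases} \tag{5.11}$$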
Tmm , T (S G
= diag(1 , . . . , m ) . T T = Im .
R = ST . G = GR,
U = R1 U .
GS)T
G G = T (S G
= ;
U U = T 1 (S 1 U U S 1 )T 1 = (T T )1 = Im .
U G U (5.10) GU = G
. G U (5.11).
G G U U :
$$\begin{cases} G = FU;\\ U\Lambda = F^{\mathsf T}G. \end{cases}$$
, U = F F U . ,
U F F , 1 , . . . , m .
, , G = F F G,
G F F , .
G U 2 (G, U ), :
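$$\Delta^2(G, U) = \|F - GU^{\mathsf T}\|^2 = \operatorname{tr}\,(F^{\mathsf T} - UG^{\mathsf T})(F - GU^{\mathsf T}) = \operatorname{tr} F^{\mathsf T}(F - GU^{\mathsf T}) =$$
$$= \operatorname{tr} F^{\mathsf T}F - \operatorname{tr} F^{\mathsf T}GU^{\mathsf T} = \|F\|^2 - \operatorname{tr} U\Lambda U^{\mathsf T} = \|F\|^2 - \operatorname{tr}\Lambda = \sum_{j=1}^{n}\lambda_j - \sum_{j=1}^{m}\lambda_j = \sum_{j=m+1}^{n}\lambda_j,$$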
1 , . . . , n F F . 2 ,
1 , . . . , m m n .
u1 , . . . , um , , .
5.2 .
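Theorem 5.2 can be illustrated numerically: take $U$ as the top-$m$ eigenvectors of $F^{\mathsf T}F$, set $G = FU$, and compare the residual $\|GU^{\mathsf T} - F\|^2$ with the sum of the discarded eigenvalues. This is a sketch under assumptions: the helper name `pca`, the random matrix and $m = 2$ are illustrative, and, as in the theorem, the data are not centred.

```python
import numpy as np

def pca(F, m):
    """Return G = F U, the n x m matrix U of leading eigenvectors of F^T F,
    and all eigenvalues sorted in decreasing order."""
    lam, vecs = np.linalg.eigh(F.T @ F)    # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]          # reorder to descending
    lam, vecs = lam[order], vecs[:, order]
    U = vecs[:, :m]
    G = F @ U
    return G, U, lam

rng = np.random.default_rng(2)
F = rng.normal(size=(50, 5))

G, U, lam = pca(F, m=2)
residual = np.linalg.norm(G @ U.T - F) ** 2   # squared Frobenius norm
relative_error = residual / lam.sum()          # the quantity E(m)
```

The ratio `relative_error` is the fraction of $\|F\|^2$ lost by keeping only $m$ components, which is the usual criterion for choosing $m$.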
. m = n, 2 (G, U ) = 0.
F = GU :
F = GU = V DU , G = V D = D2 . V : V V = Im . ,
. 85, 5.2.
m < n, F GU .
GU F
() n m .
. G G = ,
g1 , . . . , gm . U
. m = n,
U : F = GU G = F U .
kG yk2 min .
U , G = GU U = GU F , = U . ,
F GU .
,
U : = U U = U .
- ,
G G :
$\beta^* = \Lambda^{-1}G^{\mathsf T}y = D^{-1}V^{\mathsf T}y$;
$G\beta^* = VD\beta^* = VV^{\mathsf T}y$.
= U - , ,
, (5.5)(5.7) m 6 n , n m .
. (5.5)(5.7). ,
. , .
. F . m . . F F : 1 > . . . > n > 0.
[0, 1], ,
m, F :
$$E(m) = \frac{\|GU^{\mathsf T} - F\|^2}{\|F\|^2} = \frac{\lambda_{m+1} + \dots + \lambda_n}{\lambda_1 + \dots + \lambda_n} \le \varepsilon.$$
E(m) , n m.
, E(m) m.
,
. E(m) m, : E(m 1) E(m), ,
E(m) .
. .
m = 2 g1 (xi ), g2 (xi ) , i = 1, . . . , ,
. -
. , , .
, . g1
g2 . , ,
, , .
.
(. ??) (. 7.2.2).
5.5
, ,
,
. ,
, ,
.
:
.
5.5.1
f (x, )
Rp :
$$Q(\alpha, X^\ell) = \sum_{i=1}^{\ell}\bigl(f(x_i,\alpha) - y_i\bigr)^2.$$
Q . 0 = (10 , . . . , p0 )
$$\alpha^{t+1} := \alpha^t - h_t\bigl(Q''(\alpha^t)\bigr)^{-1}Q'(\alpha^t),$$
Q (t ) Q t , Q (t ) ( ) Q t , ht ,
, .
:
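$$\frac{\partial Q(\alpha)}{\partial\alpha_j} = 2\sum_{i=1}^{\ell}\bigl(f(x_i,\alpha) - y_i\bigr)\frac{\partial f}{\partial\alpha_j}(x_i,\alpha).$$
$$\frac{\partial^2 Q(\alpha)}{\partial\alpha_j\,\partial\alpha_k} = 2\sum_{i=1}^{\ell}\frac{\partial f}{\partial\alpha_j}(x_i,\alpha)\frac{\partial f}{\partial\alpha_k}(x_i,\alpha) + \underbrace{2\sum_{i=1}^{\ell}\bigl(f(x_i,\alpha) - y_i\bigr)\frac{\partial^2 f}{\partial\alpha_j\,\partial\alpha_k}(x_i,\alpha)}_{\text{set to } 0 \text{ under linearization}}.$$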
f , .
.
. f (
),
t :
$$f(x_i,\alpha) = f(x_i,\alpha^t) + \sum_{j=1}^{p}\frac{\partial f}{\partial\alpha_j}(x_i,\alpha^t)\bigl(\alpha_j - \alpha_j^t\bigr).$$
t
f . , .
2f
j
(xi , ). .
k
.
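$F_t = \Bigl(\dfrac{\partial f}{\partial\alpha_j}(x_i,\alpha^t)\Bigr)_{i=1,\dots,\ell}^{j=1,\dots,p}$ is the $\ell\times p$ matrix of first derivatives at the $t$-th iteration; $f_t = \bigl(f(x_i,\alpha^t)\bigr)_{i=1,\dots,\ell}$ is the vector of values of $f$ at the $t$-th iteration. The $t$-th iteration step then takes the form
$$\alpha^{t+1} := \alpha^t - h_t\,\underbrace{(F_t^{\mathsf T}F_t)^{-1}F_t^{\mathsf T}(f_t - y)}_{\beta},$$
where $\beta$ is the solution of the linear least-squares problem $\|F_t\beta - (f_t - y)\|^2 \to \min\limits_{\beta}$.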
. ,
( ), .
5.5.2
, , f (x, )
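Algorithm 5.2. Backfitting.
Input:
$F$, $y$: the object-feature matrix and the target vector;
Output:
$\varphi_j(x)$: the feature transformation functions.
1: zero approximation:
   $\alpha$ := solution of the multivariate linear regression problem with features $f_j(x)$;
   $\varphi_j(x) := \alpha_j f_j(x)$, $j = 1, \dots, n$;
2: repeat
3:   for $j = 1, \dots, n$
4:     $z_i := y_i - \sum_{k=1,\,k\ne j}^{n}\varphi_k\bigl(f_k(x_i)\bigr)$, $i = 1, \dots, \ell$;
5:     $\varphi_j := \arg\min\limits_{\varphi}\sum_{i=1}^{\ell}\bigl(\varphi(f_j(x_i)) - z_i\bigr)^2$;
6:     $Q_j := \sum_{i=1}^{\ell}\bigl(\varphi_j(f_j(x_i)) - z_i\bigr)^2$;
7: until the values $Q_j$ stabilize.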
$$f(x, \alpha) = \sum_{j=1}^{n}\varphi_j\bigl(f_j(x)\bigr),$$
j : R R ,
. , j , (5.1).
1986 [43]. 5.2.
j .
, j (x) = j fj (x), j .
j , ,
.
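$$Q(\varphi_j, X^\ell) = \sum_{i=1}^{\ell}\Bigl(\varphi_j\bigl(f_j(x_i)\bigr) - \underbrace{\Bigl(y_i - \sum_{k=1,\,k\ne j}^{n}\varphi_k\bigl(f_k(x_i)\bigr)\Bigr)}_{z_i = \mathrm{const}(\varphi_j)}\Bigr)^2 \to \min_{\varphi_j}.$$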
Zj = fj (xi ), zi i=1 . : , , -.
5.5.3
, f (x, ) , g(f ) f y. , :
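$$Q(\alpha, X^\ell) = \sum_{i=1}^{\ell}\Bigl(g\Bigl(\underbrace{\textstyle\sum_{j=1}^{n}\alpha_j f_j(x_i)}_{z_i}\Bigr) - y_i\Bigr)^2 \to \min_{\alpha\in\mathbb{R}^n},$$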
g(f ) .
, . g(z) zi :
$$g(z) = g(z_i) + g'(z_i)(z - z_i).$$
Q Q,
:
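$$\tilde Q(\alpha, X^\ell) = \sum_{i=1}^{\ell}\Bigl(g(z_i) + g'(z_i)\Bigl(\sum_{j=1}^{n}\alpha_j f_j(x_i) - z_i\Bigr) - y_i\Bigr)^2 =$$
$$= \sum_{i=1}^{\ell}\underbrace{\bigl(g'(z_i)\bigr)^2}_{w_i}\Bigl(\sum_{j=1}^{n}\alpha_j f_j(x_i) - \underbrace{\Bigl(z_i + \frac{y_i - g(z_i)}{g'(z_i)}\Bigr)}_{\tilde y_i}\Bigr)^2 \to \min_{\alpha\in\mathbb{R}^n}.$$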
wi y.
. , Q() .
5.5.4
L (a, y) a Y
y Y . , a(x) X = (xi , yi )i=1 :
$$Q(a, X^\ell) = \sum_{i=1}^{\ell}\mathscr{L}\bigl(a(x_i), y(x_i)\bigr) \to \min_{a\colon X\to Y}.$$
, L (a, y) = (a y)2 , Q , .
.
,
, , .
. 1.4, . , .
, .
- . |a y|
.
5.1.
. a(x),
, (, ). (a y)2
a y .
. a < y : L (a, y) =
= c1 |a y|, c1 . a > y , ,
.
, : L (a, y) = c2 |a y|. c1 c2
. ,
, - .
5.2.
a(x), xi xi+1 . (a y)2
. ,
, a(x). , 1 , , 1 ,
. ( ),
a(x), . , x1 , . . . , x ,
$$Q(a) = \sum_{i=1}^{\ell-1}\operatorname{sign}\bigl(a(x_i) - y(x_i)\bigr)\bigl(y(x_{i+1}) - y(x_i)\bigr).$$
5.5.5
, , . 4.4.
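$$Q(w) = \sum_{i=1}^{\ell}\ln\bigl(1 + \exp(-w^{\mathsf T}x_i\,y_i)\bigr) = -\sum_{i=1}^{\ell}\ln\sigma\bigl(w^{\mathsf T}x_i\,y_i\bigr) \to \min_{w},$$
where $\sigma(z) = (1 + e^{-z})^{-1}$ is the sigmoid function.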
w - Q(w).
,
, yi {1, +1}. , t-
wt+1 :
$$w^{t+1} := w^t - h_t\bigl(Q''(w^t)\bigr)^{-1}Q'(w^t),$$
Q (wt ) () Q(w) wt ,
Q (wt ) () Q(w) wt , ht
, 1,
.
.
$\sigma_i = \sigma(y_i\,w^{\mathsf T}x_i)$,
$\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$.
( ) Q(w):
$$\frac{\partial Q(w)}{\partial w_j} = -\sum_{i=1}^{\ell}(1 - \sigma_i)\,y_i f_j(x_i), \qquad j = 1, \dots, n.$$
( ) Q(w):
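$$\frac{\partial^2 Q(w)}{\partial w_j\,\partial w_k} = -\frac{\partial}{\partial w_k}\sum_{i=1}^{\ell}(1 - \sigma_i)\,y_i f_j(x_i) = \sum_{i=1}^{\ell}(1 - \sigma_i)\sigma_i\, f_j(x_i) f_k(x_i), \qquad j = 1, \dots, n, \quad k = 1, \dots, n.$$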
:
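$F_{\ell\times n} = \bigl(f_j(x_i)\bigr)$ is the object-feature matrix;
$\Gamma_{\ell\times\ell} = \operatorname{diag}\bigl(\sqrt{(1 - \sigma_i)\sigma_i}\bigr)$ is the diagonal weight matrix;
$\tilde F = \Gamma F$ is the weighted object-feature matrix;
$\tilde y_i = y_i\sqrt{(1 - \sigma_i)/\sigma_i}$, $\tilde y = (\tilde y_i)_{i=1}^{\ell}$ is the weighted response vector.
In this notation the product of the inverse Hessian and the gradient coincides, up to sign, with the solution of a
multivariate linear regression problem:
$$\bigl(Q''(w)\bigr)^{-1}Q'(w) = -(F^{\mathsf T}\Gamma^2F)^{-1}F^{\mathsf T}\Gamma\tilde y = -(\tilde F^{\mathsf T}\tilde F)^{-1}\tilde F^{\mathsf T}\tilde y = -\tilde F^+\tilde y.$$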
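Algorithm 5.3. IRLS: iteratively reweighted least squares.
Input:
$F$, $y$: the object-feature matrix and the response vector;
Output:
$w$: the coefficient vector of the linear combination.
1: zero approximation, ordinary multivariate linear regression:
   $w := (F^{\mathsf T}F)^{-1}F^{\mathsf T}y$;
2: for $t := 1, 2, 3, \dots$
3:   $z_i := y_i\,(Fw)_i$ for all $i = 1, \dots, \ell$;
4:   $\gamma_i := \sqrt{(1 - \sigma(z_i))\,\sigma(z_i)}$ for all $i = 1, \dots, \ell$;
5:   $\tilde F := \operatorname{diag}(\gamma_1, \dots, \gamma_\ell)\,F$;
6:   $\tilde y_i := y_i\sqrt{(1 - \sigma(z_i))/\sigma(z_i)}$ for all $i = 1, \dots, \ell$;
7:   choose the gradient step $h_t$;
8:   $w := w + h_t(\tilde F^{\mathsf T}\tilde F)^{-1}\tilde F^{\mathsf T}\tilde y$;
9:   if the values $\sigma(z_i)$ have changed little relative to the previous iteration then
10:    stop the iterations and exit the loop;
11: end of the loop over $t$.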
:
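$$Q(w) = \|\tilde Fw - \tilde y\|^2 = \sum_{i=1}^{\ell}\underbrace{(1 - \sigma_i)\sigma_i}_{\gamma_i^2}\Bigl(w^{\mathsf T}x_i - \underbrace{y_i/\sigma_i}_{\tilde y_i/\gamma_i}\Bigr)^2 \to \min_{w}.$$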
,
, .
(iteratively reweighted least squares, IRLS)
. -, , i wt xi .
i , 12 .
. -, wt xi yi .
wt+1 , wt .
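The Newton-Raphson step for logistic regression, written as a weighted least-squares problem with weights $\gamma_i = \sqrt{(1-\sigma_i)\sigma_i}$ and responses $\tilde y_i = y_i\sqrt{(1-\sigma_i)/\sigma_i}$, where $\sigma_i = \sigma(y_i\,w^{\mathsf T}x_i)$ and $y_i \in \{-1,+1\}$, can be sketched with NumPy. The function name `irls_logistic`, the fixed step `step=1.0`, the clipping constant and the synthetic data are assumptions for illustration; the sketch presumes the classes overlap, since on linearly separable data the weights diverge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def irls_logistic(F, y, n_iter=20, step=1.0):
    """Iteratively reweighted least squares for labels y in {-1, +1}."""
    w = np.linalg.lstsq(F, y, rcond=None)[0]        # zero approximation: plain least squares
    for _ in range(n_iter):
        sigma = sigmoid(y * (F @ w))                # sigma_i = sigma(y_i <w, x_i>)
        sigma = np.clip(sigma, 1e-10, 1 - 1e-10)    # guard against division by zero
        gamma = np.sqrt((1.0 - sigma) * sigma)      # object weights
        F_tilde = gamma[:, None] * F                # weighted object-feature matrix
        y_tilde = y * np.sqrt((1.0 - sigma) / sigma)
        # Newton direction = least-squares solution of F_tilde @ delta ~ y_tilde.
        delta = np.linalg.lstsq(F_tilde, y_tilde, rcond=None)[0]
        w = w + step * delta
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.where(x + rng.normal(size=200) > 0, 1.0, -1.0)   # noisy, overlapping classes
F = np.column_stack([np.ones(200), x])                  # constant feature plus x

w = irls_logistic(F, y)
# Gradient of Q(w) = sum_i ln(1 + exp(-y_i <w, x_i>)) at the result.
grad = -F.T @ ((1.0 - sigmoid(y * (F @ w))) * y)
```

At convergence the gradient vanishes, which is a convenient stopping test in place of a fixed iteration count.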
5.6
5 ,
, X = Rn , Y = R, a(x) = hw, xi w0 ,
w Rn w0 R .
(. 5.3.3) ,
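[Fig. 17. Loss functions for regression: the piecewise-linear $\varepsilon$-insensitive function $|z|_\varepsilon$ with $\varepsilon = 1$ and the quadratic function $z^2$.]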
w:
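$$Q_\varepsilon(a, X^\ell) = \sum_{i=1}^{\ell}\bigl|\langle w, x_i\rangle - w_0 - y_i\bigr|_\varepsilon + \tau\langle w, w\rangle \to \min_{w, w_0}, \tag{5.12}$$
where $|z|_\varepsilon = \bigl(|z| - \varepsilon\bigr)_+$ is the $\varepsilon$-insensitive loss function shown in Fig. 17, applied to the residual $a(x_i) - y_i$.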
(4.18).
, (5.12)
. , ; ; , ; . , SVM- SVM , .
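Let $C = \frac{1}{2\tau}$. Introduce the additional variables $\xi_i^+$ and $\xi_i^-$, the losses incurred when $a(x_i)$ overshoots and undershoots the response, respectively:
$\xi_i^+ = \bigl(a(x_i) - y_i - \varepsilon\bigr)_+$,
$\xi_i^- = \bigl(-a(x_i) + y_i - \varepsilon\bigr)_+$,
$i = 1, \dots, \ell$.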
(5.12) - wi , w0 , i+ i :
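$$\begin{cases} \dfrac12\langle w, w\rangle + C\sum\limits_{i=1}^{\ell}\bigl(\xi_i^+ + \xi_i^-\bigr) \to \min\limits_{w,\,w_0,\,\xi^+,\,\xi^-};\\[4pt] y_i - \varepsilon - \xi_i^- \le \langle w, x_i\rangle - w_0 \le y_i + \varepsilon + \xi_i^+, \quad i = 1, \dots, \ell;\\[2pt] \xi_i^- \ge 0, \quad \xi_i^+ \ge 0, \quad i = 1, \dots, \ell. \end{cases} \tag{5.13}$$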
+
i , i , i = 1, . . . , , hxi , xj i
K(xi , xj ). , :
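$$\begin{cases} \mathscr{L}(\lambda^+, \lambda^-) = -\varepsilon\sum\limits_{i=1}^{\ell}\bigl(\lambda_i^- + \lambda_i^+\bigr) + \sum\limits_{i=1}^{\ell}\bigl(\lambda_i^- - \lambda_i^+\bigr)y_i - \dfrac12\sum\limits_{i,j=1}^{\ell}\bigl(\lambda_i^- - \lambda_i^+\bigr)\bigl(\lambda_j^- - \lambda_j^+\bigr)K(x_i, x_j) \to \max\limits_{\lambda^+,\lambda^-};\\[4pt] 0 \le \lambda_i^+ \le C, \quad 0 \le \lambda_i^- \le C, \quad i = 1, \dots, \ell;\\[2pt] \sum\limits_{i=1}^{\ell}\bigl(-\lambda_i^- + \lambda_i^+\bigr) = 0. \end{cases}$$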
xi , i = 1, . . . , :
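1. $|a(x_i) - y_i| < \varepsilon$; $\lambda_i^+ = \lambda_i^- = \xi_i^+ = \xi_i^- = 0$.
The response $a(x_i)$ lies inside the interval $[y_i - \varepsilon,\ y_i + \varepsilon]$
and is treated as exact. The object $x_i$ does not affect the vector $w$ and is not
a support object.
2. $a(x_i) = y_i + \varepsilon$; $0 < \lambda_i^+ < C$; $\lambda_i^- = 0$; $\xi_i^+ = \xi_i^- = 0$.
3. $a(x_i) = y_i - \varepsilon$; $0 < \lambda_i^- < C$; $\lambda_i^+ = 0$; $\xi_i^+ = \xi_i^- = 0$.
4. $a(x_i) > y_i + \varepsilon$; $\lambda_i^+ = C$; $\lambda_i^- = 0$; $\xi_i^+ = a(x_i) - y_i - \varepsilon > 0$; $\xi_i^- = 0$.
5. $a(x_i) < y_i - \varepsilon$; $\lambda_i^- = C$; $\lambda_i^+ = 0$; $\xi_i^- = y_i - a(x_i) - \varepsilon > 0$; $\xi_i^+ = 0$.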
25 . 4 5 .
:
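$$a(x) = \sum_{i=1}^{\ell}\bigl(\lambda_i^- - \lambda_i^+\bigr)K(x_i, x) - w_0;$$
the parameter $w_0$ is determined from the objects of types 2 and 3:
$$\langle w, x_i\rangle - w_0 = \begin{cases} y_i + \varepsilon, & x_i \text{ is an object of type 2};\\ y_i - \varepsilon, & x_i \text{ is an object of type 3}. \end{cases}$$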
, , w0 , .
.
. C , ,
, .
SVM-
[61].
, .
, , . (artificial
neural networks, ANN) .
, .
.
ANN ,
, .
6.1
, (??) .
. [54]. .
6.1.1
, , ,
x1 x2 , . . 18:
$x^1 \vee x^2 = \bigl[x^1 + x^2 - \tfrac12 > 0\bigr]$;
$x^1 \wedge x^2 = \bigl[x^1 + x^2 - \tfrac32 > 0\bigr]$;
$\neg x^1 = \bigl[-x^1 + \tfrac12 > 0\bigr]$;
[Fig. 18. A single neuron with inputs $x^1$, $x^2$, weights $1$, $1$ and threshold $1/2$.]
[Fig. 19. A two-layer network with inputs $x^1$, $x^2$, hidden thresholds $1/2$ and $3/2$, output threshold $1/2$ and output $(x^1 \oplus x^2)$.]
[Fig. 20. Four panels on the unit square $[0,1]^2$; the last two show XOR implemented with the added feature $x^1x^2$ and XOR implemented with a two-layer network.]
, , .
. ,
, - -
-, 19:
$x^1 \oplus x^2 = \bigl[(x^1 \vee x^2) - (x^1 \wedge x^2) - \tfrac12 > 0\bigr]$.
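The threshold neurons $[x^1 + x^2 - \tfrac12 > 0]$ (OR), $[x^1 + x^2 - \tfrac32 > 0]$ (AND), $[-x^1 + \tfrac12 > 0]$ (NOT) and the two-layer XOR network $[(x^1 \vee x^2) - (x^1 \wedge x^2) - \tfrac12 > 0]$ can be written out directly. The function names below are assumptions made for this small sketch.

```python
def step(z):
    """Threshold activation: the indicator [z > 0]."""
    return 1 if z > 0 else 0

def neuron_or(x1, x2):
    return step(x1 + x2 - 0.5)

def neuron_and(x1, x2):
    return step(x1 + x2 - 1.5)

def neuron_not(x1):
    return step(-x1 + 0.5)

def network_xor(x1, x2):
    """Two-layer network: the hidden layer computes OR and AND, the output neuron combines them."""
    return step(neuron_or(x1, x2) - neuron_and(x1, x2) - 0.5)

truth_table = {(a, b): network_xor(a, b) for a in (0, 1) for b in (0, 1)}
```

No single threshold neuron on the inputs $x^1, x^2$ alone reproduces this table, which is why the second layer is needed.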
, . . 21.
n H
. M
. .6 . ,
, (hidden layers).
, , .
, . , .
6
. (,
) , x0 , x1 , . . . , xn , . , , .
6.1.2
: ( ) ?
.
1. . , ,
[29].
2. ,
n- . , , , , .
3. 1900 23 , ,
, XX . :
n
. . . [14].
6.1 (, 1957). n
[0, 1]n
:
$$f(x^1, x^2, \dots, x^n) = \sum_{k=1}^{2n+1} h_k\Bigl(\sum_{i=1}^{n}\varphi_{ik}(x^i)\Bigr),$$
hk , ik , ik f .
, 2n + 1 . ,
, , , . , :
ik , hk f ,
.
4. , n . ,
X
, F ,
[62].
. 6.1. F X,
x, x X f F , f (x) 6= f (x ).
6.2 (, 1948). X , C(X) X , F C(X),
(1 F ) X. F C(X).
. ,
( )
- ( )
[21]. , ,
.
. 6.2. F C(X) : R R, f F (f ) F .
6.3 (, 1998). X , C(X) X , F
C(X), ,
(1 F ) X. F C(X).
: ,
.
( )
, .
,
. . - , ,
, .
6.2
, , ( ), , . 80- ,
, ,
.
, , , .
.
(error back-propagation) [58].
6.2.1
,
, . 21. . X = Rn , Y = RM .
. M m am , m = 1, . . . , M .
H h uh , h = 1, . . . , H.
[Fig. 21. Multilayer network: input layer of $n$ features $x^1, \dots, x^n$ plus a constant unit; hidden layer of $H$ neurons with weights $w_{jh}$ and thresholds $w_{0h}$; output layer of $M$ neurons with weights $w_{hm}$ and thresholds $w_{0m}$; outputs $a^1, \dots, a^M$.]
h- m- whm .
( ),
v j , j = 1, . . . , J wjh . . , v j
j- : v j (x) fj (x) xj , J = n. w .
xi :
$$a^m(x_i) = \sigma_m\Bigl(\sum_{h=0}^{H} w_{hm}\,u^h(x_i)\Bigr); \qquad u^h(x_i) = \sigma_h\Bigl(\sum_{j=0}^{J} w_{jh}\,v^j(x_i)\Bigr). \tag{6.1}$$
xi
( ):
$$Q(w) = \frac12\sum_{m=1}^{M}\bigl(a^m(x_i) - y_i^m\bigr)^2. \tag{6.2}$$
Q .
:
$$\frac{\partial Q(w)}{\partial a^m} = a^m(x_i) - y_i^m = \varepsilon_i^m.$$
, Q am m
i xi .
:
$$\frac{\partial Q(w)}{\partial u^h} = \sum_{m=1}^{M}\bigl(a^m(x_i) - y_i^m\bigr)\sigma_m' w_{hm} = \sum_{m=1}^{M}\varepsilon_i^m\sigma_m' w_{hm} = \varepsilon_i^h.$$
, m
i ,
h
i . m ,
, (6.1). ,
m
$\sigma_m' = \sigma_m(1 - \sigma_m) = a^m(x_i)\bigl(1 - a^m(x_i)\bigr)$.
, hi m
i , ,
h
m
i m , .
whm , , , (
):
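[Diagram: the hidden-layer error $\varepsilon_i^h$ is obtained by sending the values $\varepsilon_i^1\sigma_1', \dots, \varepsilon_i^M\sigma_M'$ backwards through the weights $w_{h1}, \dots, w_{hM}$.]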
am uh , Q :
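$$\frac{\partial Q(w)}{\partial w_{hm}} = \frac{\partial Q(w)}{\partial a^m}\frac{\partial a^m}{\partial w_{hm}} = \varepsilon_i^m\sigma_m'\,u^h(x_i), \qquad m = 1, \dots, M, \quad h = 0, \dots, H; \tag{6.3}$$
$$\frac{\partial Q(w)}{\partial w_{jh}} = \frac{\partial Q(w)}{\partial u^h}\frac{\partial u^h}{\partial w_{jh}} = \varepsilon_i^h\sigma_h'\,v^j(x_i), \qquad h = 1, \dots, H, \quad j = 0, \dots, J; \tag{6.4}$$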
. , .
,
, . 6.1.
.
. ,
O(Hn+HM ) .
.
back-propagation
.
.
, ,
, ,
. , back-propagation : , , - .
.
.
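Algorithm 6.1. Training a two-layer network by back-propagation.
Input:
$X^\ell = (x_i, y_i)_{i=1}^{\ell}$, the training sample, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}^M$;
$H$, the number of neurons in the hidden layer;
$\eta$, the learning rate;
Output:
the synaptic weights $w_{jh}$, $w_{hm}$;
1: initialize the weights:
   $w_{jh} := \mathrm{random}\bigl(-\frac{1}{2n}, \frac{1}{2n}\bigr)$;
   $w_{hm} := \mathrm{random}\bigl(-\frac{1}{2H}, \frac{1}{2H}\bigr)$;
2: repeat
3:   choose an object $(x_i, y_i) \in X^\ell$ (for example, at random);
4:   forward pass:
     $u_i^h := \sigma_h\bigl(\sum_{j=0}^{J} w_{jh}x_i^j\bigr)$, $h = 1, \dots, H$;
     $a_i^m := \sigma_m\bigl(\sum_{h=0}^{H} w_{hm}u_i^h\bigr)$, $m = 1, \dots, M$;
     $\varepsilon_i^m := a_i^m - y_i^m$, $m = 1, \dots, M$;
     $Q_i := \sum_{m=1}^{M}(\varepsilon_i^m)^2$;
5:   backward pass:
     $\varepsilon_i^h := \sum_{m=1}^{M}\varepsilon_i^m\sigma_m' w_{hm}$, $h = 1, \dots, H$;
6:   gradient step:
     $w_{hm} := w_{hm} - \eta\,\varepsilon_i^m\sigma_m' u_i^h$, $h = 0, \dots, H$, $m = 1, \dots, M$;
     $w_{jh} := w_{jh} - \eta\,\varepsilon_i^h\sigma_h' x_i^j$, $j = 0, \dots, n$, $h = 1, \dots, H$;
7:   $Q := \frac{\ell-1}{\ell}Q + \frac{1}{\ell}Q_i$;
8: until $Q$ stabilizes;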
, Q, . , , .
H. , ,
.
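The forward and backward passes of back-propagation for one hidden layer with sigmoid activations can be sketched and checked against a numerical derivative. Assumptions in this sketch: the constant input is taken as $+1$ (the original may use a different sign convention), the weight matrices are stored as `W1` of shape $H \times (n+1)$ and `W2` of shape $M \times (H+1)$ with column 0 holding the thresholds, and all names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    v = np.append(1.0, x)          # inputs with the constant unit v^0 = 1
    u = sigmoid(W1 @ v)            # hidden outputs u^h
    u_ext = np.append(1.0, u)      # hidden outputs with the constant unit u^0 = 1
    a = sigmoid(W2 @ u_ext)        # network outputs a^m
    return v, u, u_ext, a

def loss(x, y, W1, W2):
    a = forward(x, W1, W2)[3]
    return 0.5 * np.sum((a - y) ** 2)

def gradients(x, y, W1, W2):
    """Gradients of the single-object loss with respect to W1 and W2."""
    v, u, u_ext, a = forward(x, W1, W2)
    eps_out = a - y                          # output errors eps^m
    delta_out = eps_out * a * (1.0 - a)      # eps^m * sigma'_m
    eps_hid = W2[:, 1:].T @ delta_out        # errors propagated back to the hidden layer
    delta_hid = eps_hid * u * (1.0 - u)      # eps^h * sigma'_h
    grad_W2 = np.outer(delta_out, u_ext)     # dQ/dw_hm
    grad_W1 = np.outer(delta_hid, v)         # dQ/dw_jh
    return grad_W1, grad_W2

rng = np.random.default_rng(0)
n, H, M = 3, 4, 2
W1 = rng.uniform(-1 / (2 * n), 1 / (2 * n), size=(H, n + 1))
W2 = rng.uniform(-1 / (2 * H), 1 / (2 * H), size=(M, H + 1))
x, y = rng.normal(size=n), np.array([0.0, 1.0])

g1, g2 = gradients(x, y, W1, W2)

# Central finite difference for one hidden-layer weight, used as a sanity check.
h = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 1] += h
Wm[0, 1] -= h
numeric = (loss(x, y, Wp, W2) - loss(x, y, Wm, W2)) / (2 * h)
```

One stochastic gradient step is then `W1 -= eta * g1; W2 -= eta * g2` for a chosen learning rate `eta`; the backward pass costs about as much as the forward pass.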
6.2.2
, 4.3.2 . , .
.
.
1 16.1
1
2k , 2k , k , .
( , ) ,
. 9.
.
, , H . - , . ,
. , -
,
, 7 . , , . ,
,
.
. ,
, . , Q(w),
. , ,
Q(w), . ,
. [50] .
1. (
), ,
.
2. .
,
:
jh =
2Q
2
wjh
+ ,
, , , , ,
. / Q(w),
.
back-propagation.
7
, . ,
.
(bagging),
. ??. ??.
3. ,
,
. , , (batch learning).
. , .
6.2.3
, ,
, , , . : , , .
.
. ,
, .
.
, . ,
. . ,
,
, .
H ,
.
1. . ( ) , , ,
. ( )
, ,
, ,
. ,
( ).
2. H , , Q(X k ).
, H, . ,
H,
.
. H . ,
. . , - .
, , ,
. .
.
, , , . ,
1.52 , . , ,
, .
- . Q(X k ) , ,
.
. (optimal
brain damage, OBD) [51, 42] , Q
. .
OBD ,
Q w ,
:
$$Q(w + \delta) = Q(w) + \tfrac12\,\delta^{\mathsf T}H(w)\,\delta + o(\|\delta\|^2),$$
2
, . H(w) = wjhQ(w)
wj h
, ,
, ,
. , .
H(w) ,
$$\delta^{\mathsf T}H(w)\,\delta = \sum_{j=0}^{J}\sum_{h=1}^{H}\delta_{jh}^2\,\frac{\partial^2 Q(w)}{\partial w_{jh}^2}.$$
wjh wjh + jh = 0.
(salience) , 2
2 Q(w)
Q(w) : Sjh = wjh
.
2
wjh
OBD , d , Sjh . d
.
Q. d
Q.
, , .
wjh , h- j- .
h- .
OBD P . H
Sj =
h=1 Sjh ,
Sj .
whm , m- h- .
(M = 1), h- .
PM
Sh = m=1 Shm .
, . ,
, , . ,
.
,
- , . 7.1 ,
. 7.3 ,
.
7.1
( ) .
X = {x1 , . . . , x } X
(x, x ). ,
, , ,
, . xi X () yi .
a : X Y , x X y Y . Y ,
, .
,
. -, . ,
, , .
. -, , , . -, , , ,
.
( ) (
) , yi ,
Y .
, .
:
, X
( , , ).
, ( ).
,
( ).
( ).
. , .
, .
,
, ,
. . (taxonomy). , .
, , .
, XVIII . 30 , 7
: , , , , , , .
,
. , .
. , .
: , , .
:
,
, .
: ,
,
.
,
.
,
- , , . .
.
, .
. , , - . , FOREL -,
. ,
,
[18]. [46].
7.1.1
. ,
ij = (xi , xj ).
, , , .
. R (i, j), ij > R.
. ,
R [min ij , max ij ],
. .
,
, .
( ) .
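Algorithm 7.1. Shortest open path.
1: find the pair of points $(i, j)$ with the smallest $\rho_{ij}$ and join them by an edge;
2: while unconnected points remain in the sample
3:   find an unconnected point that is closest to some connected point;
4:   join these two points by an edge;
5: remove the $K - 1$ longest edges;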
R ij . :
. R
, [16].
.
. . .
. R, . R
. R,
.
1 ,
.
(),
. ,
, 14 7.1. 5 K 1
, K .
, K
. , , , .
- () .
, ,
.
.
O(3 ) .
FOREL ( ) 1967
.
, [12, 11].
.
x0 X R.
xi X , (xi , x0 ) 6 R, x0
. ,
, , .
, .
. x0
, .
, X
, .
, . , ,
X. ,
. , 6
$$x_0 := \arg\min_{x\in K_0}\sum_{x'\in K_0}\rho(x, x').$$
. O() ,
O( 2 ), . , ,
O() , O(1).
FOREL
, R, x0 . 7.2 ,
. 9 . ,
, , . ,
, , :
.
. R, . , R .
R.
7.2 x0
. [11]
( 10..20) . ,
. ,
.
.
7.1.2
:
yi xi , .
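Algorithm 7.2. The FOREL clustering algorithm.
1: initialize the set of unclustered points:
   $U := X^\ell$;
2: while there are unclustered points, $U \ne \varnothing$:
3:   take an arbitrary point $x_0 \in U$;
4:   repeat
5:     form a cluster, the ball of radius $R$ centred at $x_0$:
       $K_0 := \{x_i \in U \mid \rho(x_i, x_0) \le R\}$;
6:     move the centre $x_0$ to the centre of mass of the cluster:
       $x_0 := \frac{1}{|K_0|}\sum_{x_i\in K_0}x_i$;
7:   until the centre $x_0$ stabilizes;
8:   mark all points of $K_0$ as clustered:
     $U := U \setminus K_0$;
9: apply Algorithm 7.1 to the set of centres of all clusters found;
10: assign every object $x_i \in X^\ell$ to the cluster with the nearest centre;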
, . ,
.
:
$$F_0 = \frac{\sum_{i<j}[y_i = y_j]\,\rho(x_i, x_j)}{\sum_{i<j}[y_i = y_j]} \to \min.$$
i<j [yi = yj ]
:
$$F_1 = \frac{\sum_{i<j}[y_i \ne y_j]\,\rho(x_i, x_j)}{\sum_{i<j}[y_i \ne y_j]} \to \max.$$
y , y Y ,
, .
:
$$\Phi_0 = \sum_{y\in Y}\frac{1}{|K_y|}\sum_{i\colon y_i = y}\rho^2(x_i, \mu_y) \to \min,$$
Ky = {xi X | yi = y} y.
, . , , 0
Ky , , |Ky | .
:
$$\Phi_1 = \sum_{y\in Y}\rho^2(\mu_y, \mu) \to \max,$$
.
,
, :
$F_0/F_1 \to \min$,
$\Phi_0/\Phi_1 \to \min$.
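Algorithm 7.3. Clustering with the EM algorithm.
1: initial approximation for all clusters $y \in Y$:
   $w_y := 1/|Y|$;
   $\mu_y$ := a random object of the sample;
   $\sigma_{yj}^2 := \frac{1}{\ell|Y|}\sum_{i=1}^{\ell}\bigl(f_j(x_i) - \mu_{yj}\bigr)^2$, $j = 1, \dots, n$;
2: repeat
3:   E-step (expectation):
     $g_{iy} := \dfrac{w_y\,p_y(x_i)}{\sum_{z\in Y}w_z\,p_z(x_i)}$, $y \in Y$, $i = 1, \dots, \ell$;
4:   M-step (maximization):
     $w_y := \frac{1}{\ell}\sum_{i=1}^{\ell}g_{iy}$, $y \in Y$;
     $\mu_{yj} := \frac{1}{\ell w_y}\sum_{i=1}^{\ell}g_{iy}f_j(x_i)$, $y \in Y$, $j = 1, \dots, n$;
     $\sigma_{yj}^2 := \frac{1}{\ell w_y}\sum_{i=1}^{\ell}g_{iy}\bigl(f_j(x_i) - \mu_{yj}\bigr)^2$, $y \in Y$, $j = 1, \dots, n$;
5:   assign the objects to clusters by the Bayes rule:
     $y_i := \arg\max\limits_{y\in Y}g_{iy}$, $i = 1, \dots, \ell$;
6: until the $y_i$ stop changing;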
7.1.3
, .
.
EM-. , . 2.4.
7.1 ( ). X ,
X
$$p(x) = \sum_{y\in Y}w_y\,p_y(x), \qquad \sum_{y\in Y}w_y = 1,$$
py (x) y, wy y.
py (x), . , ,
, . , . -,
. , , . ,
.
7.2 ( ).
x X = Rn n : x f1 (x), . . . , fn (x) .
y Y n- py (x) 2
2
y = (y1 , . . . , yn ) y = diag(y1
, . . . , yn
):
n
py (x) = (2) 2 (y1 yn )1 exp 21 2y (x, y ) ,
2y (x, x ) =
n
P
j=1
2
2
yj
|fj (x)fj (x )|2 yj
.
, EM 2.3. , 2.7 .
7.3.
, EM-
. E- giy . giy , xi X
y Y . M- (y , y ),
giy .
7.3 ,
. , 2.3.
k-, 7.4, EM-. , EM- xi giy = P{yi = y}. k-
(k-means) .
, k-means .
. EM,
y = y In .
, k-means,
. 7.3
5 E- :
yi := arg min y (xi , y ),
j = 1, . . . , n;
yY
j = 1, . . . , n,
y Y.
, EM k-means ,
.
, k-means
FOREL. , FOREL R,
k-means .
k-means. [18, . 110] 7.4. [18, . 98]
, , xi
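Algorithm 7.4. Clustering with the k-means algorithm.
1: form an initial approximation of the centres $\mu_y$ of all clusters $y \in Y$;
2: repeat
3:   assign the objects to the nearest centres (the analogue of the E-step):
     $y_i := \arg\min\limits_{y\in Y}\rho(x_i, \mu_y)$, $i = 1, \dots, \ell$;
4:   compute the new positions of the centres (the analogue of the M-step):
     $\mu_{yj} := \dfrac{\sum_{i=1}^{\ell}[y_i = y]\,f_j(x_i)}{\sum_{i=1}^{\ell}[y_i = y]}$, $y \in Y$, $j = 1, \dots, n$;
5: until the $y_i$ stop changing;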
, . 4
i, 3. 1967 ,
0 .
k-means . 1 . k :
; ,
.
,
.
, .
k ,
.
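The two alternating steps of k-means, assigning every object to the nearest centre and then moving each centre to the mean of its objects, can be sketched in plain Python. The function name `kmeans`, the rule of keeping the old centre for an empty cluster, and the toy data are assumptions of this sketch; the result depends on the initial centres.

```python
def kmeans(points, init_centers, max_iter=100):
    """Hard-assignment clustering with squared Euclidean distance."""
    centers = [list(c) for c in init_centers]     # work on a copy of the initial centres
    labels = None
    for _ in range(max_iter):
        # Analogue of the E-step: index of the nearest centre for every object.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(x, centers[k])))
            for x in points
        ]
        if new_labels == labels:                   # assignments stopped changing
            break
        labels = new_labels
        # Analogue of the M-step: every centre moves to the mean of its objects.
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:                            # an empty cluster keeps its old centre
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, init_centers=[(0, 0), (10, 10)])
```

Running the procedure from several different initial centres and keeping the best partition is a common way to reduce the sensitivity to initialization.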
. EM k-means (semi-supervised
learning), xi y (xi ). U , U X .
, , . ,
, . . (x, x ) -
: , ,
, .
: E- ( 3)
xi U giy := [y = y (xi )], xi X \ U giy .
.
7.1.4
,
, , .
.
.
.
. , . 7.5.
.
R({x}, {x }) = (x, x ).
.
U V W = U V .
W S R(U, V ),
R(U, S) R(V, S), :
$$R(U \cup V, S) = \alpha_U R(U, S) + \alpha_V R(V, S) + \beta R(U, V) + \gamma\,|R(U, S) - R(V, S)|,$$
U , U , , . .
1967 [49, 24].
R(W, S)
W S.
- [18]:
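Nearest-neighbour distance:
$R^{\min}(W, S) = \min\limits_{w\in W,\,s\in S}\rho(w, s)$; $\alpha_U = \alpha_V = \tfrac12$, $\beta = 0$, $\gamma = -\tfrac12$;
Farthest-neighbour distance:
$R^{\max}(W, S) = \max\limits_{w\in W,\,s\in S}\rho(w, s)$; $\alpha_U = \alpha_V = \tfrac12$, $\beta = 0$, $\gamma = \tfrac12$;
Group-average distance:
$R^{\mathrm{avg}}(W, S) = \frac{1}{|W||S|}\sum\limits_{w\in W}\sum\limits_{s\in S}\rho(w, s)$; $\alpha_U = \frac{|U|}{|W|}$, $\alpha_V = \frac{|V|}{|W|}$, $\beta = \gamma = 0$;
Distance between centres:
$R^{\mathrm{c}}(W, S) = \rho^2\Bigl(\sum\limits_{w\in W}\frac{w}{|W|},\ \sum\limits_{s\in S}\frac{s}{|S|}\Bigr)$; $\alpha_U = \frac{|U|}{|W|}$, $\alpha_V = \frac{|V|}{|W|}$, $\beta = -\alpha_U\alpha_V$, $\gamma = 0$;
Ward distance:
$R^{\mathrm{W}}(W, S) = \frac{|S||W|}{|S|+|W|}\,\rho^2\Bigl(\sum\limits_{w\in W}\frac{w}{|W|},\ \sum\limits_{s\in S}\frac{s}{|S|}\Bigr)$; $\alpha_U = \frac{|S|+|U|}{|S|+|W|}$, $\alpha_V = \frac{|S|+|V|}{|S|+|W|}$, $\beta = \frac{-|S|}{|S|+|W|}$, $\gamma = 0$.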
,
. : ?
, .
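Algorithm 7.5. Agglomerative clustering of Lance-Williams.
1: initialize the set of clusters $C_1$:
   $t := 1$; $C_t = \bigl\{\{x_1\}, \dots, \{x_\ell\}\bigr\}$;
2: for all $t = 2, \dots, \ell$ ($t$ is the iteration number):
3:   find the two closest clusters in $C_{t-1}$:
     $(U, V) := \arg\min\limits_{U\ne V}R(U, V)$;
     $R_t := R(U, V)$;
4:   remove the clusters $U$ and $V$ and add the merged cluster $W = U \cup V$:
     $C_t := C_{t-1} \cup \{W\} \setminus \{U, V\}$;
5:   for all $S \in C_t$
6:     compute the distance $R(W, S)$ by the Lance-Williams formula;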
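The agglomerative loop with the update $R(U\cup V, S) = \alpha_U R(U,S) + \alpha_V R(V,S) + \beta R(U,V) + \gamma|R(U,S) - R(V,S)|$ can be sketched for the case of constant coefficients. Assumptions: the function name `lance_williams` is illustrative, the input is a full symmetric distance matrix, and only fixed coefficients are supported (the defaults $\alpha_U = \alpha_V = \tfrac12$, $\beta = 0$, $\gamma = -\tfrac12$ give the nearest-neighbour distance; $\gamma = +\tfrac12$ gives the farthest-neighbour distance). Size-dependent coefficients, as in the group-average or Ward distances, would need to be recomputed at each merge.

```python
from itertools import combinations

def lance_williams(rho, alpha_u=0.5, alpha_v=0.5, beta=0.0, gamma=-0.5):
    """Agglomerative clustering; returns the list of merges (U, V, R_t)."""
    clusters = [frozenset([i]) for i in range(len(rho))]
    # Inter-cluster distances, keyed by the unordered pair of clusters.
    R = {}
    for a, b in combinations(clusters, 2):
        (i,), (j,) = a, b
        R[frozenset((a, b))] = rho[i][j]
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters.
        u, v = min(combinations(clusters, 2), key=lambda pair: R[frozenset(pair)])
        r_uv = R[frozenset((u, v))]
        w = u | v
        rest = [s for s in clusters if s != u and s != v]
        # Recompute distances from the merged cluster using only stored distances.
        for s in rest:
            r_us = R[frozenset((u, s))]
            r_vs = R[frozenset((v, s))]
            R[frozenset((w, s))] = (alpha_u * r_us + alpha_v * r_vs
                                    + beta * r_uv + gamma * abs(r_us - r_vs))
        clusters = rest + [w]
        merges.append((u, v, r_uv))
    return merges

xs = [0, 1, 3, 7]                                   # four points on a line
rho = [[abs(a - b) for b in xs] for a in xs]
single = [r for _, _, r in lance_williams(rho)]
complete = [r for _, _, r in lance_williams(rho, gamma=0.5)]
```

The point of the recurrence is that the original objects are never revisited after initialization: each merge touches only the stored cluster-to-cluster distances.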
. Rt , t- . , R
,
: R2 6 R3 6 . . . 6 R .
, . , Rt . ,
,
, . Ct
.
, , - .
, . ,
, , .
, -
.
7.1 (, 1979). ,
:
1) $\alpha_U \ge 0$, $\alpha_V \ge 0$;
2) $\alpha_U + \alpha_V + \beta \ge 1$;
3) $\min\{\alpha_U, \alpha_V\} + \gamma \ge 0$.
R . ,
.
. . , ,
, .
, Rt .
, . , ,
. R R .
, , . , ,
. .
, Rt . R
.
Rt /(U , V ),
Rt = R(U, V ) ,
t- , U V .
, R ;
, . ,
, , , R R . ,
.
,
, . :
$\alpha_U = \alpha_V = (1 - \beta)/2$,
$\gamma = 0$,
$\beta < 1$.
> 0 < 0.
: = 0,25 [24].
. 7.5 3. O(2 )
. , O(3 ) .
.
, .
,
(U, V ) : R(U, V ) 6 . , , .
, .
7.6.
, , 7.5, R :
. 7.1 (, 1978). R ,
> 0 - U V - U V
- W = U V :
S R(U V, S) < , R(U, V ) 6 S R(S, U ) < R(S, V ) < .
7.6.
1: C1 :
t := 1; Ct = {x1 }, . . . , {x } ;
2:
;
3: P () := (U, V ) U, V Ct , R(U, V ) 6 ;
4: t = 2, . . . ,
(t ):
P () =
5:
6:
, P () 6= ;
7:
P () :
(U, V ) := arg min R(U, V );
(U,V )P ()
8:
9:
10:
11:
12:
Rt := R(U, V );
U V , W = U V :
Ct := Ct1 {W } \ {U, V };
S Ct
R(W, S) -;
R(W, S) 6
P () := P () (W, S) ;
7.2 ( , 1984). ,
R :
1) $\alpha_U \ge 0$, $\alpha_V \ge 0$;
2) $\alpha_U + \alpha_V + \min\{\beta, 0\} \ge 1$;
3) $\min\{\alpha_U, \alpha_V\} + \gamma \ge 0$.
7.2 7.1, , , , . R
.
7.6 2 6. : , P ()
, 7.5; , P (). .
Ct n1 , P ()
(U, V ) Ct . n2
R(U, V ), . n1 n2 ,
. , , .
n1 = n2 = 20.
.
|Rt+1 Rt |,
Ct . K = t + 1.
K0 6 K 6 K1
t, K1 + 1 6 t 6 K0 + 1.
, .
. , , . ,
, .
,
. , . , .
.
, .
,
, .
[18].
.
7.2
,
, xi yi .
, xi , ,
.
, X : X X R,
.
7.2.1
(7.1)
, x.
, x, -, (7.1)
WTA (winner takes all).
[Fig. 22. Algorithm (7.1) drawn as a network: inputs $x^1, \dots, x^n$; units $\rho(x, w_1), \dots, \rho(x, w_M)$ with weights $w_m^j$; output $\arg\min \to a(x)$.]
a(x) wm . ,
:
$$Q(w_1, \dots, w_M) = \frac12\sum_{i=1}^{\ell}\rho^2\bigl(x_i, w_{a(x_i)}\bigr) \to \min_{\{w_m\}}.$$
2 i=1
, : (x, w) = kxwk.
Q wm :
$$\frac{\partial Q}{\partial w_m} = \sum_{i=1}^{\ell}(w_m - x_i)\,\bigl[a(x_i) = m\bigr].$$
wm
4.1, .
, 6
$$w_m := w_m + \eta\,(x_i - w_m)\,\bigl[a(x_i) = m\bigr], \tag{7.2}$$
xi X ,
, . : xi m, wm
xi , .
(7.2)
, ,
. (7.1) ,
hx, wm i
(x, wm ), , . . 22. wm
, . ,
, -.
. .
[40].
(learning
vector quantization, LVQ) , xi
M wm , , , () . , M ,
.
M , .
CWTA. WTA ,
, . .
(7.1) .
CWTA (conscience WTA):
$$a(x) = \arg\min_{m\in Y}C_m\,\rho(x, w_m), \tag{7.3}$$
Cm m- . ,
.
WTM. WTA , , - wm . , , ,
xi .
[0, +) K(). K(0) = 1. K() = exp(2 )
> 0. WTA WTM (winner takes most):
$$w_m := w_m + \eta\,(x_i - w_m)\,K\bigl(\rho(x_i, w_m)\bigr), \qquad m = 1, \dots, M. \tag{7.4}$$
xi ,
xi , .
, (7.2)
(7.4), K (xi , wm ) = [a(xi ) = m].
,
. , , -.
, WTM
. .
7.2.2
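Algorithm 7.7. Training a Kohonen map by the stochastic gradient method.
Input:
$X^\ell$, the training sample;
$\eta$, the learning rate;
Output:
the weight vectors $w_{mh}$, $m = 1, \dots, M$, $h = 1, \dots, H$;
1: initialize the weights:
   $w_{mh} := \mathrm{random}\bigl(-\frac{1}{2MH}, \frac{1}{2MH}\bigr)$;
2: repeat
3:   choose an object $x_i$ from $X^\ell$ at random;
4:   WTA: compute the coordinates of the node that $x_i$ falls into:
     $(m_i, h_i) := a(x_i) \equiv \arg\min\limits_{(m,h)\in Y}\rho(x_i, w_{mh})$;
5:   for all $(m, h) \in Y$ sufficiently close to $(m_i, h_i)$
6:     WTM: make a gradient step:
       $w_{mh} := w_{mh} + \eta\,(x_i - w_{mh})\,K\bigl(r((m_i, h_i), (m, h))\bigr)$;
7: until the clustering stabilizes;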
h = 1, . . . , H. , Y ,
Y = {1, . . . , M } {1, . . . , H}.
a(x) (m, h) Y , , x.
, .
, . 7.7. xi 3 WTA.
(mi , hi ). , WTM, ,
, xi .
, 6, (7.4), (x, x ), ,
Y :
$$r\bigl((m_i, h_i), (m, h)\bigr) = \sqrt{(m - m_i)^2 + (h - h_i)^2}.$$
. ( a(x)) X ,
.
- .
k ,
. ,
. . , ,
, , . 23.
.
n , . (m, h) j- wm,h .
. j- , .
, ,
.
, , .
, .
, . , ,
.
.
. , , .
.
. . , [44]. ,
, , .
.
, .
.
, . .
, , .
.
7.2.3
.
,
X = {xi }i=1 yi = y (xi ).
, ( X) , yi , y . WTA - .
WTM .
- . (7.1) ,
, v1 , . . . , vM :
$$a(x) = v_{m^*(x)} = \sum_{m=1}^{M}v_m\,\bigl[m^*(x) = m\bigr]; \tag{7.5}$$
m (x) - x, WTA.
vm
:
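$$Q(v) = \frac12\sum_{i=1}^{\ell}\bigl(a(x_i) - y_i\bigr)^2 \to \min_{v};$$
$$\frac{\partial Q}{\partial v_m} = \sum_{i=1}^{\ell}\bigl(a(x_i) - y_i\bigr)\bigl[m^*(x_i) = m\bigr] = 0.$$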
a(xi ) (7.5),
$$v_m = \frac{\sum_{i=1}^{\ell}y_i\,\bigl[m^*(x_i) = m\bigr]}{\sum_{i=1}^{\ell}\bigl[m^*(x_i) = m\bigr]}.$$
, vm yi
xi , m- . a(x) , m- , vm . , a(x)
- .
WTM
K(). a(x) - vm M :
$$a(x) = \sum_{m=1}^{M}v_m\,\frac{K\bigl(\rho(x, w_m)\bigr)}{\sum_{s=1}^{M}K\bigl(\rho(x, w_s)\bigr)}.$$
Q(v)
M M . .
wm , () vm :
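$$w_m := w_m - \eta\,(w_m - x_i)\,K\bigl(\rho(x_i, w_m)\bigr);$$
$$v_m := v_m - \eta\,\bigl(a(x_i) - y_i\bigr)\,\frac{K\bigl(\rho(x_i, w_m)\bigr)}{\sum_{s=1}^{M}K\bigl(\rho(x_i, w_s)\bigr)};$$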
, wm vm , .
, , ,
back-propagation.
7.3
X , ,
.
(multidimensional scaling, MDS) . X = {x1 , . . . , x } X.
Rij = (xi , xj ) (i, j) D.
xi X xi = (x1i , . . . , xni ) Rn ,
dij xi xj
$$d_{ij}^2 = \sum_{d=1}^{n}\bigl(x_i^d - x_j^d\bigr)^2$$
are as close as possible to $R_{ij}$ for all $(i, j) \in D$.
-;
, :
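$$S(X^\ell) = \sum_{(i,j)\in D}w_{ij}\,(d_{ij} - R_{ij})^2 \to \min,$$
where the minimum is taken over the $\ell n$ variables $\bigl(x_i^d\bigr)_{i=1,\dots,\ell}^{d=1,\dots,n}$.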
n . , n = 2
(scatter plot). , , (S > 0),
, ,
.
.
,
, D .
wij . wij = (Rij ) .
< 0 ;
> 0 . = 2,
, ; , .
S(X ) n ,
, .
.
.
. ,
x X x (x1 , . . . , xn ), :
$$S(x) = \sum_{x_i\in U}w_i\bigl(d_i(x) - R_i\bigr)^2 \to \min,$$
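$$\frac{\partial S}{\partial x^a} = 2\sum_{x_i\in U}w_i\Bigl(1 - \frac{R_i}{d_i}\Bigr)\bigl(x^a - x_i^a\bigr);$$
$$\frac{\partial^2 S}{\partial x^a\,\partial x^a} = 2\sum_{x_i\in U}w_i\Bigl(\frac{R_i}{d_i}\Bigl(\frac{x^a - x_i^a}{d_i}\Bigr)^2 - \frac{R_i}{d_i} + 1\Bigr);$$
$$\frac{\partial^2 S}{\partial x^a\,\partial x^b} = 2\sum_{x_i\in U}w_i\,\frac{R_i}{d_i}\,\frac{x^a - x_i^a}{d_i}\,\frac{x^b - x_i^b}{d_i}.$$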
, n 6 3
,
O(n3 ).
, S(x) x ,
kx(t+1) x(t) k.
, x U X (x, U ).
. 7.8 , xi , xj .
. , ,
, ,
. 34 , .
( )
(0, 0) (0, Rij ). xk ,
7.8.
:
Rij , , ;
K ;
:
xi (x1i , . . . , xni ), i = 1, . . . , ;
1: :
U := ;
2: |U | < K
:
3:
x := arg max min Rij ;
xi X \U
xj U
4:
(x, U );
5:
U := U {x};
6: :
7:
:
8:
(x, U \ {x});
9: x X \ U
10:
(x, U );
,
. ;
Rij ; dij ; (i, j) D.
,
. dij (Rij ),
, dij (R)
(, 90%) R.
. , , , .
7.1.
, [15] ( ) . ,
, Rij (, ).
, ( ), .
[1] . ., . ., . ., . .
: . .: , 1989.
[2] . ., . ., . . : . .: , 1985.
[3] . ., . ., . . . .: , 1970. 320 pp.
[4] . . . .: , 1979.
[5] . ., . .
// . 1968. . 181, 4. . 781784.
[6] . ., . .
// . 1971.
. 16, 2. . 264280.
[7] . ., . . . .: ,
1974.
[8] . . : , . .: , 2001.
[26] ., .
. : , 2004.
[27] . . // . , , 1965. Pp. 3845.
[28] . . : , , .
.: , 2000.
[29] . . . .: , 1986.
[30] Asuncion A., Newman D. UCI machine learning repository: Tech. rep.: University of
California, Irvine, School of Information and Computer Sciences, 2007.
http://www.ics.uci.edu/mlearn/MLRepository.html.
[31] Bartlett P. The sample complexity of pattern classification with neural networks:
the size of the weights is more important than the size of the network // IEEE
Transactions on Information Theory. 1998. Vol. 44, no. 2. Pp. 525–536.
http://discus.anu.edu.au/bartlett.
[32] Bartlett P., Shawe-Taylor J. Generalization performance of support vector machines
and other pattern classifiers // Advances in Kernel Methods. MIT Press,
Cambridge, USA, 1999. Pp. 43–54.
http://citeseer.ist.psu.edu/bartlett98generalization.html.
[33] Bishop C. M. Pattern Recognition and Machine Learning. Springer, Series:
Information Science and Statistics, 2006. 740 pp.
[34] Boucheron S., Bousquet O., Lugosi G. Theory of classification: A survey of some
recent advances // ESAIM: Probability and Statistics. 2005. no. 9. Pp. 323–375.
http://www.econ.upf.edu/lugosi/esaimsurvey.pdf.
[35] Burges C. J. C. A tutorial on support vector machines for pattern recognition //
Data Mining and Knowledge Discovery. 1998. Vol. 2, no. 2. Pp. 121–167.
http://citeseer.ist.psu.edu/burges98tutorial.html.
[36] Burges C. J. C. Geometry and invariance in kernel based methods // Advances in
Kernel Methods / Ed. by B. Schölkopf, C. C. Burges, A. J. Smola. MIT Press,
1999. Pp. 89–116.
[37] Cleveland W. S. Robust locally weighted regression and smoothing scatter plots //
Journal of the American Statistical Association. 1979. Vol. 74, no. 368.
Pp. 829–836.
[38] Cortes C., Vapnik V. Support-vector networks // Machine Learning. 1995.
Vol. 20, no. 3. Pp. 273–297.
http://citeseer.ist.psu.edu/cortes95supportvector.html.
[39] Dempster A. P., Laird N. M., Rubin D. B. Maximum likelihood from incomplete
data via the EM algorithm // J. of the Royal Statistical Society, Series B. 1977.
no. 34. Pp. 1–38.