Backpropagation
11 Multilayer Perceptron

[Figure: three-layer network. Inputs -> First Layer -> Second Layer -> Third Layer; an R - S1 - S2 - S3 network.]
11 Example

[Figure: example classification problem.]
11 Elementary Decision Boundaries

First Boundary:

    a^1_1 = \mathrm{hardlim}([-1 \;\; 0]\,p + 0.5)

Second Boundary:

    a^1_2 = \mathrm{hardlim}([0 \;\; {-1}]\,p + 0.75)

First Subnetwork
[Figure: Inputs -> Individual Decisions -> AND Operation. Two hardlim neurons with weight rows [-1 0] and [0 -1] and biases 0.5 and 0.75 feed an AND neuron with weights [1 1] and bias -1.5.]
11 Elementary Decision Boundaries

Third Boundary:

    a^1_3 = \mathrm{hardlim}([1 \;\; 0]\,p - 1.5)

Fourth Boundary:

    a^1_4 = \mathrm{hardlim}([0 \;\; 1]\,p - 0.25)

Second Subnetwork
[Figure: Inputs -> Individual Decisions -> AND Operation. Two hardlim neurons with weight rows [1 0] and [0 1] and biases -1.5 and -0.25 feed an AND neuron with weights [1 1] and bias -1.5.]
11 Total Network

    W^1 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b^1 = \begin{bmatrix} 0.5 \\ 0.75 \\ -1.5 \\ -0.25 \end{bmatrix}

    W^2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad b^2 = \begin{bmatrix} -1.5 \\ -1.5 \end{bmatrix}

    W^3 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad b^3 = \begin{bmatrix} -0.5 \end{bmatrix}

[Figure: 2-4-2-1 network. p (2x1); W^1 (4x2), b^1 (4x1) -> a^1 (4x1); W^2 (2x4), b^2 (2x1) -> a^2 (2x1); W^3 (1x2), b^3 (1x1) -> a^3 (1x1).]
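The total network above can be checked numerically. The following is a minimal NumPy sketch (not from the original slides); the `hardlim` helper and the test points are illustrative:

```python
import numpy as np

def hardlim(n):
    """Hard limit transfer function: 1 if n >= 0, else 0."""
    return (n >= 0).astype(float)

# Weights and biases from the Total Network slide.
W1 = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]], dtype=float)
b1 = np.array([0.5, 0.75, -1.5, -0.25])
W2 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
b2 = np.array([-1.5, -1.5])
W3 = np.array([[1, 1]], dtype=float)
b3 = np.array([-0.5])

def network(p):
    """2-4-2-1 hardlim network; returns 1.0 inside the decision region."""
    a1 = hardlim(W1 @ p + b1)   # four elementary boundaries
    a2 = hardlim(W2 @ a1 + b2)  # two AND subnetworks
    a3 = hardlim(W3 @ a2 + b3)  # final OR combination
    return float(a3[0])
```

For example, `network(np.array([0.0, 0.0]))` lies in the first AND region and returns 1.0, while `network(np.array([1.0, 0.0]))` lies in neither region and returns 0.0.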
11 Function Approximation Example

[Figure: 1-2-1 network. Two log-sigmoid neurons in the first layer (weights w^1_{1,1}, w^1_{2,1}; biases b^1_1, b^1_2) feed one linear neuron in the second layer (weights w^2_{1,1}, w^2_{1,2}; bias b^2).]

First-layer transfer function (log-sigmoid):

    f^1(n) = \frac{1}{1 + e^{-n}}

Second-layer transfer function (linear):

    f^2(n) = n

Nominal second-layer parameters:

    w^2_{1,1} = 1, \quad w^2_{1,2} = 1, \quad b^2 = 0
11 Nominal Response

[Figure: nominal response of the 1-2-1 network, a^2 versus p for p in [-2, 2].]
11 Parameter Variations

[Figure: four panels showing the network response over p in [-2, 2] as individual parameters (the second-layer weights w^2_{1,1} and w^2_{1,2} and the biases) are varied about their nominal values.]
11 Multilayer Network

[Figure: Input -> First Layer -> Second Layer -> Third Layer. p (R x 1); W^1 (S^1 x R), b^1 (S^1 x 1), f^1 -> a^1 (S^1 x 1); W^2 (S^2 x S^1), b^2 (S^2 x 1), f^2 -> a^2 (S^2 x 1); W^3 (S^3 x S^2), b^3 (S^3 x 1), f^3 -> a^3 (S^3 x 1).]

    a^0 = p

    a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), \quad m = 0, 1, \ldots, M-1

    a = a^M
11 Performance Index

Training Set:

    \{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}

Vector Case:

    F(x) = E[e^T e] = E[(t - a)^T (t - a)]

Approximate (single-sample) performance index:

    \hat{F}(x) = (t(k) - a(k))^T (t(k) - a(k))

Chain Rule Example:

    f(n) = \cos(n), \quad n = e^{2w} \;\Rightarrow\; f(n(w)) = \cos(e^{2w})

    \frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \cdot \frac{d n(w)}{dw} = (-\sin(n))(2 e^{2w}) = (-\sin(e^{2w}))(2 e^{2w})

Application to the gradient:

    \frac{\partial \hat{F}}{\partial w^m_{i,j}} = \frac{\partial \hat{F}}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial w^m_{i,j}}, \qquad \frac{\partial \hat{F}}{\partial b^m_i} = \frac{\partial \hat{F}}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial b^m_i}
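The chain rule example can be verified with a finite-difference check. This sketch is illustrative (the evaluation point w0 = 0.3 and step size h are arbitrary choices):

```python
import math

def f_of_w(w):
    """Composite function f(n(w)) = cos(e^{2w})."""
    return math.cos(math.exp(2.0 * w))

def df_dw(w):
    """Analytic derivative via the chain rule: (-sin(e^{2w})) * (2 e^{2w})."""
    n = math.exp(2.0 * w)
    return -math.sin(n) * 2.0 * n

# Central-difference approximation at an arbitrary point.
w0, h = 0.3, 1e-6
numeric = (f_of_w(w0 + h) - f_of_w(w0 - h)) / (2.0 * h)
```

The numeric and analytic values agree to several decimal places, confirming the factorization df/dw = (df/dn)(dn/dw).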
11 Gradient Calculation

    n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j}\, a^{m-1}_j + b^m_i

    \frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j, \qquad \frac{\partial n^m_i}{\partial b^m_i} = 1

Sensitivity:

    s^m_i \equiv \frac{\partial \hat{F}}{\partial n^m_i}

Gradient:

    \frac{\partial \hat{F}}{\partial w^m_{i,j}} = s^m_i\, a^{m-1}_j, \qquad \frac{\partial \hat{F}}{\partial b^m_i} = s^m_i
11 Steepest Descent

    w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i

Matrix form:

    W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m

    s^m \equiv \frac{\partial \hat{F}}{\partial n^m} = \begin{bmatrix} \partial \hat{F} / \partial n^m_1 \\ \partial \hat{F} / \partial n^m_2 \\ \vdots \\ \partial \hat{F} / \partial n^m_{S^m} \end{bmatrix}
11 Jacobian Matrix

    \frac{\partial n^{m+1}}{\partial n^m} = \begin{bmatrix} \partial n^{m+1}_1 / \partial n^m_1 & \partial n^{m+1}_1 / \partial n^m_2 & \cdots & \partial n^{m+1}_1 / \partial n^m_{S^m} \\ \vdots & & \vdots \\ \partial n^{m+1}_{S^{m+1}} / \partial n^m_1 & \partial n^{m+1}_{S^{m+1}} / \partial n^m_2 & \cdots & \partial n^{m+1}_{S^{m+1}} / \partial n^m_{S^m} \end{bmatrix}

    \frac{\partial n^{m+1}_i}{\partial n^m_j} = w^{m+1}_{i,j}\, \frac{\partial a^m_j}{\partial n^m_j} = w^{m+1}_{i,j}\, \dot{f}^m(n^m_j), \qquad \dot{f}^m(n^m_j) = \frac{\partial f^m(n^m_j)}{\partial n^m_j}

    \frac{\partial n^{m+1}}{\partial n^m} = W^{m+1}\, \dot{F}^m(n^m), \qquad \dot{F}^m(n^m) = \begin{bmatrix} \dot{f}^m(n^m_1) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n^m_2) & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n^m_{S^m}) \end{bmatrix}
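The identity \partial n^{m+1} / \partial n^m = W^{m+1} \dot{F}^m(n^m) can be spot-checked numerically. The layer sizes, weights, and evaluation point below are arbitrary illustrative values, assuming a log-sigmoid layer m:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# Hypothetical next-layer parameters: n2 = W2 @ logsig(n1) + b2.
W2 = np.array([[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]])
b2 = np.array([0.1, -0.2, 0.4])
n1 = np.array([0.3, -0.6])

def n2_of_n1(n1):
    return W2 @ logsig(n1) + b2

# Analytic Jacobian: W2 @ diag(f'(n1)), with logsig derivative a(1 - a).
a1 = logsig(n1)
J_analytic = W2 @ np.diag(a1 * (1.0 - a1))

# Finite-difference Jacobian, column by column.
h = 1e-6
J_numeric = np.zeros((3, 2))
for j in range(2):
    d = np.zeros(2)
    d[j] = h
    J_numeric[:, j] = (n2_of_n1(n1 + d) - n2_of_n1(n1 - d)) / (2.0 * h)
```

The two Jacobians agree elementwise, which is exactly the diagonal-times-weight-matrix structure derived above.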
11 Backpropagation (Sensitivities)

    s^m = \frac{\partial \hat{F}}{\partial n^m} = \left(\frac{\partial n^{m+1}}{\partial n^m}\right)^T \frac{\partial \hat{F}}{\partial n^{m+1}} = \dot{F}^m(n^m)\, (W^{m+1})^T\, \frac{\partial \hat{F}}{\partial n^{m+1}}

    s^m = \dot{F}^m(n^m)\, (W^{m+1})^T\, s^{m+1}

The sensitivities are propagated backward through the network:

    s^M \to s^{M-1} \to \cdots \to s^2 \to s^1
11 Initialization (Last Layer)

    s^M_i = \frac{\partial \hat{F}}{\partial n^M_i} = \frac{\partial (t - a)^T (t - a)}{\partial n^M_i} = \frac{\partial \sum_{j=1}^{S^M} (t_j - a_j)^2}{\partial n^M_i} = -2 (t_i - a_i) \frac{\partial a_i}{\partial n^M_i}

    \frac{\partial a_i}{\partial n^M_i} = \frac{\partial a^M_i}{\partial n^M_i} = \frac{\partial f^M(n^M_i)}{\partial n^M_i} = \dot{f}^M(n^M_i)

    s^M_i = -2 (t_i - a_i)\, \dot{f}^M(n^M_i)

    s^M = -2\, \dot{F}^M(n^M)\, (t - a)
11 Summary

Forward Propagation:

    a^0 = p

    a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), \quad m = 0, 1, \ldots, M-1

    a = a^M

Backpropagation:

    s^M = -2\, \dot{F}^M(n^M)\, (t - a)

    s^m = \dot{F}^m(n^m)\, (W^{m+1})^T\, s^{m+1}, \quad m = M-1, \ldots, 2, 1

Weight Update:

    W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m
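The three summary steps can be exercised end to end. The sketch below is illustrative, not code from the slides: a 1-2-1 network (log-sigmoid hidden layer, linear output) trained by the incremental updates above to approximate g(p) = 1 + sin(pi*p/4); the learning rate, sample grid, epoch count, and random initialization are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def g(p):
    """Target function to approximate."""
    return 1.0 + np.sin(np.pi / 4.0 * p)

# 1-2-1 network with small random initial parameters.
W1 = rng.uniform(-0.5, 0.5, (2, 1)); b1 = rng.uniform(-0.5, 0.5, (2, 1))
W2 = rng.uniform(-0.5, 0.5, (1, 2)); b2 = rng.uniform(-0.5, 0.5, (1, 1))

def sse(ps):
    """Sum-squared error of the current network over the sample points."""
    total = 0.0
    for p in ps:
        a1 = logsig(W1 @ np.array([[p]]) + b1)
        a2 = W2 @ a1 + b2
        total += (g(p) - float(a2[0, 0])) ** 2
    return total

ps = np.linspace(-2.0, 2.0, 21)
alpha = 0.1
initial_error = sse(ps)

for epoch in range(2000):
    for p in ps:
        # Forward propagation: a0 = p, a1 = logsig(W1 a0 + b1), a2 = W2 a1 + b2.
        a0 = np.array([[p]])
        a1 = logsig(W1 @ a0 + b1)
        a2 = W2 @ a1 + b2
        e = g(p) - float(a2[0, 0])
        # Backpropagation: s2 = -2 f'2(n2) e, with f'2 = 1 for the linear layer.
        s2 = np.array([[-2.0 * e]])
        # s1 = F'1(n1) (W2)^T s2, using the logsig derivative a1 (1 - a1).
        s1 = (a1 * (1.0 - a1)) * (W2.T @ s2)
        # Steepest-descent weight update.
        W2 -= alpha * s2 @ a1.T; b2 -= alpha * s2
        W1 -= alpha * s1 @ a0.T; b1 -= alpha * s1

final_error = sse(ps)
```

After training, the sum-squared error over the sample grid is far below its initial value, mirroring the convergence behavior shown later in the slides.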
11 Example: Function Approximation

    g(p) = 1 + \sin\!\left(\frac{\pi}{4} p\right)

[Figure: the input p drives both g and the 1-2-1 network; the network output a is compared with the target t = g(p) to form the error e.]
11 Network

[Figure: the 1-2-1 network used for the approximation: log-sigmoid hidden layer (weights w^1_{1,1}, w^1_{2,1}; biases b^1_1, b^1_2), linear output layer (weights w^2_{1,1}, w^2_{1,2}; bias b^2).]

Initial conditions:

    W^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad b^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad W^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad b^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}

[Figure: initial network response over p in [-2, 2].]
11 Forward Propagation

    a^0 = p = 1

    a^1 = f^1(W^1 a^0 + b^1) = \mathrm{logsig}\!\left(\begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}\right) = \mathrm{logsig}\!\left(\begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix}\right)

    a^1 = \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\ \dfrac{1}{1 + e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}

    a^2 = f^2(W^2 a^1 + b^2) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + 0.48 = 0.446

    e = t - a = \left(1 + \sin\!\left(\frac{\pi}{4} p\right)\right) - a^2 = \left(1 + \sin\!\left(\frac{\pi}{4}\right)\right) - 0.446 = 1.261
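These forward-propagation numbers are easy to reproduce. A minimal NumPy sketch using the initial parameters given above:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# Initial parameters of the 1-2-1 network.
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])

a0 = np.array([[1.0]])                  # input p = 1
a1 = logsig(W1 @ a0 + b1)               # log-sigmoid hidden layer
a2 = W2 @ a1 + b2                       # linear output layer
t = 1.0 + np.sin(np.pi / 4.0)           # target g(1)
e = t - float(a2[0, 0])                 # error
```

Evaluating this gives a^1 of approximately [0.321, 0.368], a^2 of 0.446, and e of 1.261, matching the slide.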
11 Transfer Function Derivatives

    \dot{f}^1(n) = \frac{d}{dn}\left(\frac{1}{1 + e^{-n}}\right) = \frac{e^{-n}}{(1 + e^{-n})^2} = \left(1 - \frac{1}{1 + e^{-n}}\right)\left(\frac{1}{1 + e^{-n}}\right) = (1 - a^1)(a^1)

    \dot{f}^2(n) = \frac{d}{dn}(n) = 1
11 Backpropagation

    s^2 = -2\, \dot{F}^2(n^2)\, (t - a) = -2\, \dot{f}^2(n^2)\, (1.261) = -2 (1) (1.261) = -2.522

    s^1 = \dot{F}^1(n^1)\, (W^2)^T s^2 = \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix}

    s^1 = \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}
11 Weight Update

    \alpha = 0.1

    W^2(1) = W^2(0) - \alpha\, s^2 (a^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1 \begin{bmatrix} -2.522 \end{bmatrix} \begin{bmatrix} 0.321 & 0.368 \end{bmatrix}

    W^2(1) = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix}

    b^2(1) = b^2(0) - \alpha\, s^2 = \begin{bmatrix} 0.48 \end{bmatrix} - 0.1 \begin{bmatrix} -2.522 \end{bmatrix} = \begin{bmatrix} 0.732 \end{bmatrix}

    W^1(1) = W^1(0) - \alpha\, s^1 (a^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix}

    b^1(1) = b^1(0) - \alpha\, s^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}
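The full iteration (forward pass, sensitivities, weight update) can be reproduced in a few lines. A minimal sketch using the same initial parameters and alpha = 0.1:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

alpha = 0.1
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])

# Forward propagation for p = 1.
a0 = np.array([[1.0]])
a1 = logsig(W1 @ a0 + b1)
a2 = W2 @ a1 + b2
t = 1.0 + np.sin(np.pi / 4.0)
e = t - float(a2[0, 0])

# Backpropagate the sensitivities (linear output, logsig hidden layer).
s2 = np.array([[-2.0 * e]])              # s2 = -2 f'2(n2) e, f'2 = 1
s1 = (a1 * (1.0 - a1)) * (W2.T @ s2)     # s1 = F'1(n1) (W2)^T s2

# Steepest-descent update.
W2_new = W2 - alpha * s2 @ a1.T
b2_new = b2 - alpha * s2
W1_new = W1 - alpha * s1 @ a0.T
b1_new = b1 - alpha * s1
```

`W2_new` comes out to approximately [0.171, -0.0772] and `b2_new` to 0.732, matching the values on this slide.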
11 Choice of Network Architecture

    g(p) = 1 + \sin\!\left(\frac{i\pi}{4} p\right)

1-3-1 Network
[Figure: four panels showing the 1-3-1 network's approximation of g for i = 1, 2, 4, 8; p in [-2, 2].]
11 Choice of Network Architecture

    g(p) = 1 + \sin\!\left(\frac{6\pi}{4} p\right)

[Figure: approximations of g by 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks; p in [-2, 2].]
11 Convergence

    g(p) = 1 + \sin(\pi p)

[Figure: two training trials, with intermediate network responses numbered 0 through 5 as training progresses; p in [-2, 2].]
11 Generalization

Training Set:

    \{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}

    g(p) = 1 + \sin\!\left(\frac{\pi}{4} p\right), \quad p = -2, -1.6, -1.2, \ldots, 1.6, 2

[Figure: responses of trained 1-2-1 and 1-9-1 networks over p in [-2, 2], compared against the 11 training points.]