
11 Backpropagation

11 Multilayer Perceptron
Inputs, First Layer, Second Layer, Third Layer

[Figure: three-layer feedforward network. The input vector p (R elements) feeds the first layer; layer m has weight matrix W^m, bias vector b^m, net input n^m, transfer function f^m, and output a^m.]

a^1 = f^1(W^1 p + b^1)        a^2 = f^2(W^2 a^1 + b^2)        a^3 = f^3(W^3 a^2 + b^3)

a^3 = f^3(W^3 f^2(W^2 f^1(W^1 p + b^1) + b^2) + b^3)

R - S^1 - S^2 - S^3 Network
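As a sketch of how this composition of layers plays out in code, the following minimal NumPy example runs an input through a three-layer network; the layer sizes, random parameters, and tanh/linear transfer functions are illustrative assumptions, not values from the slides:

```python
import numpy as np

def forward(p, weights, biases, transfer_fns):
    """Propagate input p through each layer: a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})."""
    a = p
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
    return a

# Hypothetical R=3, S1=4, S2=2, S3=1 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4)), rng.standard_normal((1, 2))]
biases  = [rng.standard_normal((4, 1)), rng.standard_normal((2, 1)), rng.standard_normal((1, 1))]
fns     = [np.tanh, np.tanh, lambda n: n]   # f^1, f^2 nonlinear; f^3 linear

p = np.array([[1.0], [-1.0], [0.5]])
a3 = forward(p, weights, biases, fns)   # a^3 = f^3(W^3 f^2(W^2 f^1(W^1 p + b^1) + b^2) + b^3)
```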
11 Example

11 Elementary Decision Boundaries

First Boundary:

a^1_1 = hardlim([-1  0] p + 0.5)

Second Boundary:

a^1_2 = hardlim([0  -1] p + 0.75)

First Subnetwork

[Figure: inputs, individual decisions, AND operation. Neurons with weights [-1 0] and [0 -1] and biases 0.5 and 0.75 produce the individual decisions a^1_1 and a^1_2; an AND neuron with weights [1 1] and bias -1.5 combines them.]
11 Elementary Decision Boundaries

Third Boundary:

a^1_3 = hardlim([1  0] p - 1.5)

Fourth Boundary:

a^1_4 = hardlim([0  1] p - 0.25)

Second Subnetwork

[Figure: inputs, individual decisions, AND operation. Neurons with weights [1 0] and [0 1] and biases -1.5 and -0.25 produce the individual decisions a^1_3 and a^1_4; an AND neuron with weights [1 1] and bias -1.5 combines them.]
11 Total Network

W^1 = [ -1  0 ;  0  -1 ;  1  0 ;  0  1 ]        b^1 = [ 0.5 ;  0.75 ;  -1.5 ;  -0.25 ]

W^2 = [ 1  1  0  0 ;  0  0  1  1 ]        b^2 = [ -1.5 ;  -1.5 ]

W^3 = [ 1  1 ]        b^3 = [ -0.5 ]

Input, Initial Decisions, AND Operations, OR Operation

[Figure: abbreviated notation for the 2-4-2-1 network: p is 2x1, W^1 is 4x2, b^1 is 4x1, W^2 is 2x4, b^2 is 2x1, W^3 is 1x2, b^3 is 1x1.]

a^1 = hardlim(W^1 p + b^1)        a^2 = hardlim(W^2 a^1 + b^2)        a^3 = hardlim(W^3 a^2 + b^3)
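A small NumPy sketch of this total network, using the W^m and b^m above (the test points are arbitrary illustrations):

```python
import numpy as np

hardlim = lambda n: (n >= 0).astype(float)   # hardlim: 1 if n >= 0, else 0

W1 = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b1 = np.array([[0.5], [0.75], [-1.5], [-0.25]])
W2 = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
b2 = np.array([[-1.5], [-1.5]])
W3 = np.array([[1.0, 1.0]])
b3 = np.array([[-0.5]])

def classify(p):
    a1 = hardlim(W1 @ p + b1)    # four individual decision boundaries
    a2 = hardlim(W2 @ a1 + b2)   # two AND operations
    a3 = hardlim(W3 @ a2 + b3)   # final OR operation
    return a3.item()

print(classify(np.array([[0.0], [0.0]])))   # inside the first region -> 1.0
print(classify(np.array([[1.0], [0.5]])))   # between the two regions -> 0.0
```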


11 Function Approximation Example

Input, Log-Sigmoid Layer, Linear Layer

[Figure: 1-2-1 network. Hidden-layer transfer function f^1(n) = 1 / (1 + e^{-n}); output-layer transfer function f^2(n) = n.]

a^1 = logsig(W^1 p + b^1)        a^2 = purelin(W^2 a^1 + b^2)

Nominal Parameter Values

w^1_{1,1} = 10        w^1_{2,1} = 10        b^1_1 = -10        b^1_2 = 10

w^2_{1,1} = 1        w^2_{1,2} = 1        b^2 = 0
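To preview the nominal response plotted on the next slide, this sketch simply evaluates the 1-2-1 network at a few points; only the parameter values above are taken from the slides:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

W1 = np.array([[10.0], [10.0]]); b1 = np.array([[-10.0], [10.0]])
W2 = np.array([[1.0, 1.0]]);     b2 = np.array([[0.0]])

for p in np.linspace(-2, 2, 5):
    a1 = logsig(W1 * p + b1)        # hidden layer: logsig(W^1 p + b^1)
    a2 = (W2 @ a1 + b2).item()      # output layer: purelin(W^2 a^1 + b^2)
    print(f"p = {p:5.2f}   a2 = {a2:.3f}")
```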

11 Nominal Response

[Plot: network response a^2 versus p for the nominal parameters, over -2 <= p <= 2.]

11 Parameter Variations

[Plots: four panels showing how the network response changes as individual parameters (b^2, w^2_{1,1}, w^2_{1,2}, and b^1_2) are varied about their nominal values, each over -2 <= p <= 2.]

11 Multilayer Network

Input, First Layer, Second Layer, Third Layer

[Figure: three-layer network in abbreviated notation. p is Rx1; W^1 is S^1xR, W^2 is S^2xS^1, W^3 is S^3xS^2; each b^m, n^m, and a^m is S^mx1.]

a^1 = f^1(W^1 p + b^1)        a^2 = f^2(W^2 a^1 + b^2)        a^3 = f^3(W^3 a^2 + b^3)

a^3 = f^3(W^3 f^2(W^2 f^1(W^1 p + b^1) + b^2) + b^3)

a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}),    m = 0, 1, 2, ..., M - 1

a^0 = p

a = a^M
11 Performance Index

Training Set

{p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}

Mean Square Error

F(x) = E[e^2] = E[(t - a)^2]

Vector Case

F(x) = E[e^T e] = E[(t - a)^T (t - a)]

Approximate Mean Square Error (Single Sample)

F̂(x) = (t(k) - a(k))^T (t(k) - a(k)) = e^T(k) e(k)

Approximate Steepest Descent

w^m_{i,j}(k+1) = w^m_{i,j}(k) - α ∂F̂/∂w^m_{i,j}        b^m_i(k+1) = b^m_i(k) - α ∂F̂/∂b^m_i
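As a minimal sketch of these two pieces, the following computes the single-sample error F̂ = e^T e and applies one approximate-steepest-descent step; the gradient argument here is a placeholder, since computing it is exactly what the backpropagation derivation on the next slides provides:

```python
import numpy as np

def approx_mse(t, a):
    """Single-sample approximation: F_hat = (t - a)^T (t - a)."""
    e = t - a
    return float(e.T @ e)

def sgd_step(param, grad, alpha=0.1):
    """Approximate steepest descent: x(k+1) = x(k) - alpha * dF_hat/dx."""
    return param - alpha * grad

t = np.array([[1.0]]); a = np.array([[0.4]])
print(approx_mse(t, a))                       # 0.36

x = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])                   # placeholder gradient
print(sgd_step(x, grad))                      # [ 0.95 -2.05]
```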
11 Chain Rule

df(n(w))/dw = df(n)/dn · dn(w)/dw

Example

f(n) = cos(n)        n = e^{2w}        f(n(w)) = cos(e^{2w})

df(n(w))/dw = df(n)/dn · dn(w)/dw = (-sin(n))(2e^{2w}) = (-sin(e^{2w}))(2e^{2w})

Application to Gradient Calculation

∂F̂/∂w^m_{i,j} = (∂F̂/∂n^m_i)(∂n^m_i/∂w^m_{i,j})        ∂F̂/∂b^m_i = (∂F̂/∂n^m_i)(∂n^m_i/∂b^m_i)
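The worked example can be checked numerically; this sketch compares the analytic chain-rule derivative against a central-difference estimate at an arbitrarily chosen point w = 0.3:

```python
import numpy as np

f  = lambda w: np.cos(np.exp(2 * w))                        # f(n(w)) = cos(e^{2w})
df = lambda w: -np.sin(np.exp(2 * w)) * 2 * np.exp(2 * w)   # chain-rule result

w, h = 0.3, 1e-6
numeric = (f(w + h) - f(w - h)) / (2 * h)                   # central difference
print(df(w), numeric)                                       # the two values agree
```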

11 Gradient Calculation

n^m_i = Σ_{j=1}^{S^{m-1}} w^m_{i,j} a^{m-1}_j + b^m_i

∂n^m_i/∂w^m_{i,j} = a^{m-1}_j        ∂n^m_i/∂b^m_i = 1

Sensitivity

s^m_i ≡ ∂F̂/∂n^m_i

Gradient

∂F̂/∂w^m_{i,j} = s^m_i a^{m-1}_j        ∂F̂/∂b^m_i = s^m_i
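In matrix form these gradient expressions reduce to an outer product; a sketch with illustrative shapes (S^m = 2, S^{m-1} = 3) and made-up values:

```python
import numpy as np

s_m    = np.array([[0.2], [-0.5]])            # sensitivities s^m, shape (S^m, 1)
a_prev = np.array([[1.0], [0.3], [0.7]])      # a^{m-1}, shape (S^{m-1}, 1)

grad_W = s_m @ a_prev.T   # dF_hat/dW^m = s^m (a^{m-1})^T, shape (S^m, S^{m-1})
grad_b = s_m              # dF_hat/db^m = s^m
```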

11 Steepest Descent

w^m_{i,j}(k+1) = w^m_{i,j}(k) - α s^m_i a^{m-1}_j        b^m_i(k+1) = b^m_i(k) - α s^m_i

W^m(k+1) = W^m(k) - α s^m (a^{m-1})^T        b^m(k+1) = b^m(k) - α s^m

s^m ≡ ∂F̂/∂n^m = [ ∂F̂/∂n^m_1 ;  ∂F̂/∂n^m_2 ;  ... ;  ∂F̂/∂n^m_{S^m} ]

Next Step: Compute the Sensitivities (Backpropagation)


11 Jacobian Matrix

∂n^{m+1}/∂n^m = [ ∂n^{m+1}_i/∂n^m_j ]    (an S^{m+1} x S^m matrix)

n^{m+1}_i = Σ_{l=1}^{S^m} w^{m+1}_{i,l} a^m_l + b^{m+1}_i

∂n^{m+1}_i/∂n^m_j = w^{m+1}_{i,j} ∂a^m_j/∂n^m_j = w^{m+1}_{i,j} ḟ^m(n^m_j),    where ḟ^m(n^m_j) = ∂f^m(n^m_j)/∂n^m_j

∂n^{m+1}/∂n^m = W^{m+1} Ḟ^m(n^m)

Ḟ^m(n^m) = [ ḟ^m(n^m_1)  0  ...  0 ;  0  ḟ^m(n^m_2)  ...  0 ;  ... ;  0  0  ...  ḟ^m(n^m_{S^m}) ]
11 Backpropagation (Sensitivities)

s^m = ∂F̂/∂n^m = (∂n^{m+1}/∂n^m)^T ∂F̂/∂n^{m+1} = Ḟ^m(n^m) (W^{m+1})^T ∂F̂/∂n^{m+1}

s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1}

The sensitivities are computed by starting at the last layer and then propagating backwards through the network to the first layer:

s^M → s^{M-1} → ... → s^2 → s^1

11 Initialization (Last Layer)

s^M_i = ∂F̂/∂n^M_i = ∂[(t - a)^T (t - a)]/∂n^M_i = ∂[ Σ_{j=1}^{S^M} (t_j - a_j)^2 ]/∂n^M_i = -2(t_i - a_i) ∂a_i/∂n^M_i

∂a_i/∂n^M_i = ∂a^M_i/∂n^M_i = ∂f^M(n^M_i)/∂n^M_i = ḟ^M(n^M_i)

s^M_i = -2(t_i - a_i) ḟ^M(n^M_i)

s^M = -2 Ḟ^M(n^M) (t - a)

11 Summary

Forward Propagation

a^0 = p

a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}),    m = 0, 1, 2, ..., M - 1

a = a^M

Backpropagation

s^M = -2 Ḟ^M(n^M) (t - a)

s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1},    m = M - 1, ..., 2, 1

Weight Update

W^m(k+1) = W^m(k) - α s^m (a^{m-1})^T        b^m(k+1) = b^m(k) - α s^m
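The three steps above fit in a few lines of NumPy. The following is a minimal sketch for the 1-2-1 logsig/purelin network used in the example that follows; the function name and structure are my own, but each line maps directly onto one of the summary equations:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

def train_step(p, t, W1, b1, W2, b2, alpha=0.1):
    """One backpropagation iteration for a 1-2-1 logsig/purelin network."""
    # Forward propagation: a^0 = p, a^1 = logsig(W^1 a^0 + b^1), a^2 = purelin(W^2 a^1 + b^2)
    a0 = np.atleast_2d(p)
    a1 = logsig(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                       # purelin: f(n) = n

    # Backpropagation of sensitivities
    s2 = -2 * 1.0 * (t - a2)                # s^M = -2 F'^M(n^M)(t - a); purelin derivative is 1
    F1 = np.diagflat((1 - a1) * a1)         # logsig derivative (1 - a)(a) on the diagonal
    s1 = F1 @ W2.T @ s2                     # s^m = F'^m(n^m)(W^{m+1})^T s^{m+1}

    # Weight update: W^m(k+1) = W^m(k) - alpha s^m (a^{m-1})^T, b^m(k+1) = b^m(k) - alpha s^m
    W2 = W2 - alpha * s2 @ a1.T
    b2 = b2 - alpha * s2
    W1 = W1 - alpha * s1 @ a0.T
    b1 = b1 - alpha * s1
    return W1, b1, W2, b2
```

Applying this function to the initial conditions on the following slides reproduces the worked iteration there.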
11 Example: Function Approximation

g(p) = 1 + sin(π/4 · p)

[Figure: the target t = g(p) is compared with the output a of the 1-2-1 network to form the error e = t - a.]

11 Network

[Figure: the same 1-2-1 network, with a log-sigmoid hidden layer and a linear output layer.]

a^1 = logsig(W^1 p + b^1)        a^2 = purelin(W^2 a^1 + b^2)


11 Initial Conditions

W^1(0) = [ -0.27 ;  -0.41 ]        b^1(0) = [ -0.48 ;  -0.13 ]

W^2(0) = [ 0.09  -0.17 ]        b^2(0) = [ 0.48 ]

[Plot: initial network response versus the sine-wave target, over -2 <= p <= 2.]
11 Forward Propagation

a^0 = p = 1

a^1 = f^1(W^1 a^0 + b^1) = logsig( [ -0.27 ;  -0.41 ] · 1 + [ -0.48 ;  -0.13 ] ) = logsig( [ -0.75 ;  -0.54 ] )

a^1 = [ 1/(1 + e^{0.75}) ;  1/(1 + e^{0.54}) ] = [ 0.321 ;  0.368 ]

a^2 = f^2(W^2 a^1 + b^2) = purelin( [ 0.09  -0.17 ] [ 0.321 ;  0.368 ] + 0.48 ) = 0.446

e = t - a = ( 1 + sin(π/4 · p) ) - a^2 = ( 1 + sin(π/4 · 1) ) - 0.446 = 1.261
11 Transfer Function Derivatives

ḟ^1(n) = d/dn [ 1/(1 + e^{-n}) ] = e^{-n}/(1 + e^{-n})^2 = ( 1 - 1/(1 + e^{-n}) )( 1/(1 + e^{-n}) ) = (1 - a^1)(a^1)

ḟ^2(n) = d/dn (n) = 1

11 Backpropagation

s^2 = -2 Ḟ^2(n^2)(t - a) = -2 ḟ^2(n^2)(1.261) = -2 · 1 · (1.261) = -2.522

s^1 = Ḟ^1(n^1) (W^2)^T s^2 = [ (1 - a^1_1)(a^1_1)  0 ;  0  (1 - a^1_2)(a^1_2) ] [ 0.09 ;  -0.17 ] (-2.522)

s^1 = [ (1 - 0.321)(0.321)  0 ;  0  (1 - 0.368)(0.368) ] [ -0.227 ;  0.429 ]

s^1 = [ 0.218  0 ;  0  0.233 ] [ -0.227 ;  0.429 ] = [ -0.0495 ;  0.0997 ]

11 Weight Update

α = 0.1

W^2(1) = W^2(0) - α s^2 (a^1)^T = [ 0.09  -0.17 ] - 0.1 (-2.522) [ 0.321  0.368 ]

W^2(1) = [ 0.171  -0.0772 ]

b^2(1) = b^2(0) - α s^2 = [ 0.48 ] - 0.1 (-2.522) = [ 0.732 ]

W^1(1) = W^1(0) - α s^1 (a^0)^T = [ -0.27 ;  -0.41 ] - 0.1 [ -0.0495 ;  0.0997 ] · 1 = [ -0.265 ;  -0.420 ]

b^1(1) = b^1(0) - α s^1 = [ -0.48 ;  -0.13 ] - 0.1 [ -0.0495 ;  0.0997 ] = [ -0.475 ;  -0.140 ]
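This entire iteration can be reproduced in a self-contained sketch; each line mirrors one slide of the worked example, and the commented values are the slide results:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))
alpha = 0.1

W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])

a0 = np.array([[1.0]])
t = 1 + np.sin(np.pi / 4 * a0)                # t = g(1) = 1.707

a1 = logsig(W1 @ a0 + b1)                     # [0.321; 0.368]
a2 = W2 @ a1 + b2                             # 0.446

s2 = -2 * 1.0 * (t - a2)                      # -2.522
s1 = np.diagflat((1 - a1) * a1) @ W2.T @ s2   # [-0.0495; 0.0997]

W2 = W2 - alpha * s2 @ a1.T                   # [0.171, -0.0772]
b2 = b2 - alpha * s2                          # [0.732]
W1 = W1 - alpha * s1 @ a0.T                   # [-0.265; -0.420]
b1 = b1 - alpha * s1                          # [-0.475; -0.140]
```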


11 Choice of Architecture

g(p) = 1 + sin(iπ/4 · p)

1-3-1 Network

[Plots: four panels showing the 1-3-1 network's best approximation of g(p) for i = 1, 2, 4, 8, each over -2 <= p <= 2.]

11 Choice of Network Architecture

g(p) = 1 + sin(6π/4 · p)

[Plots: four panels showing the best approximations by 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks, each over -2 <= p <= 2.]

11 Convergence

g(p) = 1 + sin(πp)

[Plots: two panels showing the sequence of intermediate network responses (iterations numbered 0 through 5) for two different initial conditions, each over -2 <= p <= 2.]

11 Generalization

{p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}

g(p) = 1 + sin(π/4 · p),    p = -2, -1.6, -1.2, ..., 1.6, 2

[Plots: responses of a 1-2-1 network and a 1-9-1 network trained on the 11 points above, each over -2 <= p <= 2.]
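The training set itself is easy to construct; a small sketch (the 1-2-1 versus 1-9-1 comparison would require a full training run, which is omitted here):

```python
import numpy as np

# The 11 training inputs p = -2, -1.6, ..., 1.6, 2 and their targets t = g(p).
p_train = np.arange(-2.0, 2.0 + 1e-9, 0.4)
t_train = 1 + np.sin(np.pi / 4 * p_train)
print(list(zip(np.round(p_train, 1), np.round(t_train, 3))))
```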
