
11 Backpropagation

$$\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T\,\mathbf{s}^{m+1}, \quad \text{for } m = M-1, \ldots, 2, 1. \qquad (11.45)$$

Finally, the weights and biases are updated using the approximate steepest descent rule:
$$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\,\mathbf{s}^m\,(\mathbf{a}^{m-1})^T, \qquad (11.46)$$

$$\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m. \qquad (11.47)$$
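Although the text works through these equations by hand in the example that follows, a compact sketch may help readers see how Eqs. (11.44) through (11.47) fit together. The following is a minimal NumPy illustration, not the book's implementation; the function and variable names (backprop_step, W, b, transfer, d_transfer) are our own.

```python
import numpy as np

def backprop_step(W, b, transfer, d_transfer, p, t, alpha):
    """One incremental (stochastic) backpropagation update for an M-layer network.
    W, b                 : lists of weight matrices and bias (column) vectors, layers 1..M
    transfer, d_transfer : lists of transfer functions and their derivatives
    p, t                 : training input and target (column vectors)
    alpha                : learning rate
    """
    M = len(W)
    a, n = [p], []                            # a[0] is the network input a^0 = p
    for m in range(M):                        # forward propagation
        n.append(W[m] @ a[m] + b[m])
        a.append(transfer[m](n[m]))

    s = [None] * M
    # Eq. (11.44): sensitivity at the output layer
    s[M - 1] = -2 * np.diagflat(d_transfer[M - 1](n[M - 1])) @ (t - a[M])
    # Eq. (11.45): backpropagate the sensitivities, m = M-1, ..., 1
    for m in range(M - 2, -1, -1):
        s[m] = np.diagflat(d_transfer[m](n[m])) @ W[m + 1].T @ s[m + 1]

    # Eqs. (11.46) and (11.47): approximate steepest descent update
    for m in range(M):
        W[m] = W[m] - alpha * s[m] @ a[m].T
        b[m] = b[m] - alpha * s[m]
    return W, b
```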

Example

To illustrate the backpropagation algorithm, let's choose a network and apply it to a particular problem. To begin, we will use the 1-2-1 network that
we discussed earlier in this chapter. For convenience we have reproduced
the network in Figure 11.8.
Next we want to define a problem for the network to solve. Suppose that we
want to use the network to approximate the function
$$g(p) = 1 + \sin\!\left(\frac{\pi}{4}\,p\right) \quad \text{for } -2 \le p \le 2. \qquad (11.48)$$

To obtain our training set we will evaluate this function at several values
of $p$.

[Figure 11.8: Example Function Approximation Network. The 1-2-1 network has input $p$, a two-neuron log-sigmoid hidden layer, $\mathbf{a}^1 = \mathrm{logsig}(\mathbf{W}^1 p + \mathbf{b}^1)$, and a single-neuron linear output layer, $a^2 = \mathrm{purelin}(\mathbf{W}^2 \mathbf{a}^1 + b^2)$.]


Before we begin the backpropagation algorithm we need to choose some initial values for the network weights and biases. Generally these are chosen
to be small random values. In the next chapter we will discuss some reasons for this. For now let's choose the values


$$\mathbf{W}^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad \mathbf{b}^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad \mathbf{W}^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad \mathbf{b}^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}.$$

The response of the network for these initial values is illustrated in Figure
11.9, along with the sine function we wish to approximate.

[Figure 11.9: Initial Network Response. The plot shows the initial network output versus $p$, together with the sine function to be approximated.]
Next, we need to select a training set $\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$. In this
case, we will sample the function at 21 points in the range [-2,2] at equally
spaced intervals of 0.2. The training points are indicated by the circles in
Figure 11.9.
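As an aside (not part of the original text), the training set just described could be generated as follows; NumPy is assumed purely for illustration.

```python
import numpy as np

# 21 equally spaced samples (spacing 0.2) of g(p) = 1 + sin(pi*p/4) on [-2, 2], Eq. (11.48)
p_train = np.linspace(-2.0, 2.0, 21)
t_train = 1 + np.sin(np.pi * p_train / 4)
# p_train[15] is 1.0, the 16th training point used as the first input below
```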
Now we are ready to start the algorithm. The training points can be presented in any order, but they are often chosen randomly. For our initial input we will choose $p = 1$, which is the 16th training point:

$$a^0 = p = 1.$$
The output of the first layer is then

$$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1 a^0 + \mathbf{b}^1) = \mathrm{logsig}\!\left(\begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}\begin{bmatrix} 1 \end{bmatrix} + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}\right) = \mathrm{logsig}\!\left(\begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix}\right)$$

$$= \begin{bmatrix} \dfrac{1}{1+e^{0.75}} \\[2mm] \dfrac{1}{1+e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}.$$

The second layer output is


$$a^2 = f^2(\mathbf{W}^2\mathbf{a}^1 + b^2) = \mathrm{purelin}\!\left(\begin{bmatrix} 0.09 & -0.17 \end{bmatrix}\begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + \begin{bmatrix} 0.48 \end{bmatrix}\right) = \begin{bmatrix} 0.446 \end{bmatrix}.$$

The error would then be

$$e = t - a = \left\{1 + \sin\!\left(\frac{\pi}{4}\,p\right)\right\} - a^2 = \left\{1 + \sin\!\left(\frac{\pi}{4}(1)\right)\right\} - 0.446 = 1.261.$$

The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the backpropagation, recall that we will need the derivatives
of the transfer functions, $\dot{f}^1(n)$ and $\dot{f}^2(n)$. For the first layer

$$\dot{f}^1(n) = \frac{d}{dn}\!\left(\frac{1}{1+e^{-n}}\right) = \frac{e^{-n}}{(1+e^{-n})^2} = \left(1 - \frac{1}{1+e^{-n}}\right)\!\left(\frac{1}{1+e^{-n}}\right) = (1 - a^1)(a^1).$$
For the second layer we have
$$\dot{f}^2(n) = \frac{d}{dn}(n) = 1.$$
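As a side note (a sketch, not from the text), these transfer functions and their derivatives, written in terms of the layer outputs where convenient, look like this:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def d_logsig(a1):
    # derivative of logsig expressed with the layer output: (1 - a^1)(a^1)
    return (1.0 - a1) * a1

def purelin(n):
    return n

def d_purelin(n):
    # derivative of the linear transfer function is 1
    return np.ones_like(n)
```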

We can now perform the backpropagation. The starting point is found at the second layer, using Eq. (11.44):
$$\mathbf{s}^2 = -2\,\dot{\mathbf{F}}^2(\mathbf{n}^2)\,(\mathbf{t} - \mathbf{a}) = -2\left[\dot{f}^2(n^2)\right](1.261) = -2[1](1.261) = -2.522.$$

The first layer sensitivity is then computed by backpropagating the sensitivity from the second layer, using Eq. (11.45):
$$\mathbf{s}^1 = \dot{\mathbf{F}}^1(\mathbf{n}^1)\,(\mathbf{W}^2)^T\,\mathbf{s}^2 = \begin{bmatrix} (1-a^1_1)(a^1_1) & 0 \\ 0 & (1-a^1_2)(a^1_2) \end{bmatrix}\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix}\begin{bmatrix} -2.522 \end{bmatrix}$$

$$= \begin{bmatrix} (1-0.321)(0.321) & 0 \\ 0 & (1-0.368)(0.368) \end{bmatrix}\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix}\begin{bmatrix} -2.522 \end{bmatrix}$$

$$= \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix}\begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}.$$
The final stage of the algorithm is to update the weights. For simplicity, we
will use a learning rate $\alpha = 0.1$. (In Chapter 12 the choice of learning rate
will be discussed in more detail.) From Eq. (11.46) and Eq. (11.47) we have


$$\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha\,\mathbf{s}^2\,(\mathbf{a}^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1\begin{bmatrix} -2.522 \end{bmatrix}\begin{bmatrix} 0.321 & 0.368 \end{bmatrix} = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix},$$

$$b^2(1) = b^2(0) - \alpha\,\mathbf{s}^2 = \begin{bmatrix} 0.48 \end{bmatrix} - 0.1\begin{bmatrix} -2.522 \end{bmatrix} = \begin{bmatrix} 0.732 \end{bmatrix},$$

$$\mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha\,\mathbf{s}^1\,(\mathbf{a}^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}\begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix},$$

$$\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha\,\mathbf{s}^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}.$$

This completes the first iteration of the backpropagation algorithm. We next proceed to randomly choose another input from the training set and
perform another iteration of the algorithm. We continue to iterate until the
difference between the network response and the target function reaches
some acceptable level. (Note that this will generally take many passes
through the entire training set.) We will discuss convergence criteria in
more detail in Chapter 12.
To experiment with the backpropagation calculation for this two-layer network, use the MATLAB Neural Network Design Demonstration Backpropagation Calculation (nnd11bc).
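For readers who prefer to check the arithmetic numerically, the sketch below (a plain NumPy script, not the nnd11bc demonstration) reproduces the first iteration worked out above.

```python
import numpy as np

# Initial weights and biases from the example
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
alpha = 0.1                                  # learning rate

p = np.array([[1.0]])                        # 16th training point
t = 1 + np.sin(np.pi * p / 4)                # target from Eq. (11.48)

# Forward propagation
a0 = p
n1 = W1 @ a0 + b1                            # [-0.75, -0.54]
a1 = 1 / (1 + np.exp(-n1))                   # logsig -> [0.321, 0.368]
a2 = W2 @ a1 + b2                            # purelin -> 0.446
e  = t - a2                                  # error -> 1.261

# Backpropagate the sensitivities, Eqs. (11.44) and (11.45)
s2 = -2 * 1 * e                              # derivative of purelin is 1; gives -2.522
s1 = np.diagflat((1 - a1) * a1) @ W2.T @ s2  # [-0.0495, 0.0997]

# Approximate steepest descent updates, Eqs. (11.46) and (11.47)
W2 = W2 - alpha * s2 @ a1.T                  # [0.171, -0.0772]
b2 = b2 - alpha * s2                         # [0.732]
W1 = W1 - alpha * s1 @ a0.T                  # [-0.265, -0.420]
b1 = b1 - alpha * s1                         # [-0.475, -0.140]
```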

Batch vs. Incremental Training



The algorithm described above is the stochastic gradient descent algorithm, which involves on-line or incremental training, in which the network weights and biases are updated after each input is presented (as with
the LMS algorithm of Chapter 10). It is also possible to perform batch training, in which the complete gradient is computed (after all inputs are applied to the network) before the weights and biases are updated. For
example, if each input occurs with equal probability, the mean square error
performance index can be written
$$F(\mathbf{x}) = E[\mathbf{e}^T\mathbf{e}] = E[(\mathbf{t}-\mathbf{a})^T(\mathbf{t}-\mathbf{a})] = \frac{1}{Q}\sum_{q=1}^{Q}(\mathbf{t}_q - \mathbf{a}_q)^T(\mathbf{t}_q - \mathbf{a}_q). \qquad (11.49)$$

The total gradient of this performance index is
