
Module 2

Back propagation: Training Algorithm - Applications - Network Configurations - Network Paralysis - Local Minima - Temporal Instability.
The expansion of ANNs was under eclipse due to the lack of
algorithms for training multilayer ANNs.
Back propagation is a systematic method of training
multilayer ANNs.
The back propagation algorithm dramatically
expanded the range of problems that can be solved using
ANNs.
INTRODUCTION
BACK PROPAGATION
Back propagation is a systematic method for training
multilayer artificial neural networks
Overcoming the Linear Separability Limitation
The linear separability limitation of single-layer Perceptron
networks can be overcome by adding more layers.
Multilayer networks can perform more general tasks.
The multilayer perceptron, trained by the back
propagation algorithm, is the most widely used
NN.


[Figure: Three-layer neural network. R inputs p feed the first layer; each layer has biases b1-b3, net inputs n1-n3, and outputs a1-a3 (First Layer, Second Layer, Third Layer). *Notations in the figure are incorrect.]
Back Propagation Training algorithm
Network Configuration
[Figure: Artificial neuron. Inputs X1-X4 with weights W1,1-W1,4 and bias b are summed to give NET = XW; the activation F produces OUT.]
The most commonly used activation function for NNs trained with
the back propagation algorithm is the sigmoid function.

The sigmoid function gives a nonlinear gain to the artificial
neuron.
Sigmoid Function
Why is the sigmoid function used in back propagation?

BP requires an activation function that is differentiable everywhere.
The sigmoid function has the additional advantage of
providing a form of automatic gain control.
A multilayer network has more representational
power with nonlinear activation functions.




Sigmoid Function

OUT = 1 / (1 + exp(-NET))

d(OUT)/d(NET) = OUT (1 - OUT)
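As a small illustration (not from the slides), the sigmoid and its derivative can be sketched in Python/NumPy as follows; the function names are chosen here only for clarity.

import numpy as np

def sigmoid(net):
    # OUT = 1 / (1 + exp(-NET))
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(out):
    # d(OUT)/d(NET) = OUT * (1 - OUT); the derivative is expressed in terms of
    # OUT itself, which is part of what makes the sigmoid convenient for BP.
    return out * (1.0 - out)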
Multilayer Back Propagation Network


[Figure: Multilayer back propagation network. R inputs p pass through successive layers (biases b, net inputs n, outputs a); the final layer outputs OUT1, OUT2, OUT3 are compared with Target1, Target2, Target3.]
OBJECTIVE OF TRAINING
TRAINING PAIR
TRAINING SET
TRAINING STEPS
OVERVIEW OF TRAINING
The Steps Required
1. Select a training pair from the training
set; apply the input to the network input.
2. Calculate the output of the network.
3. Calculate the error between the network output
and the target.
4. Adjust the weights of the network in a way that
minimizes the error.
5. Repeat steps 1 to 4 for each vector in the training
set, until the error for the entire set is acceptably
low (see the sketch below).
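As an illustration only, here is a minimal Python/NumPy sketch of this five-step loop for a small two-layer sigmoid network. The XOR training set, network size, initial weights, and learning rate are assumptions chosen for the example, not values from the slides.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Assumed toy training set (XOR input/target pairs).
training_set = [
    (np.array([0.0, 0.0]), np.array([0.0])),
    (np.array([0.0, 1.0]), np.array([1.0])),
    (np.array([1.0, 0.0]), np.array([1.0])),
    (np.array([1.0, 1.0]), np.array([0.0])),
]

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 2))   # input -> hidden weights (assumed size)
b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=(2, 1))   # hidden -> output weights
b2 = np.zeros(1)
eta = 0.5                                  # learning rate (assumed)

for epoch in range(10000):
    total_error = 0.0
    for x, target in training_set:                        # Step 1: pick a training pair
        hidden = sigmoid(x @ W1 + b1)                     # Step 2: forward pass
        out = sigmoid(hidden @ W2 + b2)
        error = target - out                              # Step 3: output error
        total_error += float(np.sum(error ** 2))
        delta_out = error * out * (1 - out)               # Step 4: adjust the weights
        delta_hidden = (W2 @ delta_out) * hidden * (1 - hidden)
        W2 += eta * np.outer(hidden, delta_out)
        b2 += eta * delta_out
        W1 += eta * np.outer(x, delta_hidden)
        b1 += eta * delta_hidden
    if total_error < 0.01:                                # Step 5: repeat until low error
        break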

Forward Pass

Steps 1 and 2 constitute the forward pass.
Signals propagate from the input to the output.

NET = XW.
OUT = F(XW)
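In matrix form this is a one-liner; a minimal Python/NumPy sketch, with the input and weight shapes assumed purely for illustration:

import numpy as np

def forward(X, W, b, F):
    # NET = XW + b, OUT = F(NET)
    return F(X @ W + b)

sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))
X = np.array([0.5, -1.0, 2.0])    # one input vector with three components (assumed)
W = np.full((3, 2), 0.1)          # weights into a layer of two neurons (assumed)
b = np.zeros(2)
print(forward(X, W, b, sigmoid))  # OUT for the two neurons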
Reverse Pass
Step 3 and step 4 constitute the reverse pass.
Weights in the OUTPUT LAYER are adjusted with the
modified delta rule.
Training is more complicated in the HIDDEN LAYERS,
as their outputs have no targets for comparison.
Adjusting Weights of the Output Layer
The training process is as follows.
Consider the weight from neuron p in hidden
layer j to neuron q in the output layer.
The OUTPUT of the neuron q in layer k is
subtracted from the Target value to produce the error
signal.
This is multiplied by the derivative of the activation function
calculated for layer k.

δ = OUT(1 - OUT)(Target - OUT)

[Figure: Weight adjustment for w_qp, the weight from hidden neuron p to output neuron q. The error δ, the derivative F', the training rate η, and OUT_p are multiplied to give Δw_qp, which is added to w_qp(n) to produce w_qp(n+1).]

Adjusting Weights of the Output Layer Contd..
Then δ is multiplied by OUT from the source neuron p in
hidden layer j.
This product is multiplied by the learning rate
(η); typically the learning rate is taken as a value
between 0.01 and 1.0.
The result is added to the weight.
An identical process is done for each weight
proceeding from a neuron in the hidden layer to a
neuron in the output layer.

The following equations illustrate this calculation:
Δw_qp = η δ_q,k OUT_p,j
w_qp(n+1) = w_qp(n) + Δw_qp

w_qp(n) - the value of the weight from neuron p in the
hidden layer to neuron q in the output layer during the
nth iteration
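A minimal Python/NumPy sketch of this output-layer update, applied to all the weights w_qp at once; the numeric values are assumptions used only for illustration.

import numpy as np

def update_output_weights(W, out_hidden, out, target, eta):
    # delta_q,k = OUT_q,k (1 - OUT_q,k) (Target_q - OUT_q,k)
    delta = out * (1.0 - out) * (target - out)
    # Delta w_qp = eta * delta_q,k * OUT_p,j, for every (p, q) pair at once
    W += eta * np.outer(out_hidden, delta)
    return W, delta

# Assumed example values: two hidden neurons and two output neurons.
W = np.array([[0.2, -0.1],
              [0.4, 0.3]])            # rows: hidden neuron p, columns: output neuron q
out_hidden = np.array([0.6, 0.9])     # OUT_p,j
out = np.array([0.55, 0.35])          # OUT_q,k
target = np.array([1.0, 0.0])
W, delta = update_output_weights(W, out_hidden, out, target, eta=0.25)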

Adjusting Weights of the Hidden Layer
Hidden layers have no target vectors, so the training process
described above cannot be used for them.
Back propagation trains the hidden layers by
propagating the output error back through the
network layer by layer, adjusting the weights at each layer.
The same equations as in the previous case can be
utilized here also, i.e.,


Δw_qp = η δ_q,k OUT_p,j
w_qp(n+1) = w_qp(n) + Δw_qp
How are the δs generated for the hidden layers?
First, δ is calculated for each neuron in the output
layer.
It is used to adjust the weights feeding into the
output layer.
The output-layer δs are then propagated back through those same
weights to generate a δ for each neuron in the first
hidden layer.
Each hidden-layer δ is obtained by summing all the weighted δs
feeding back into that neuron and multiplying by the derivative term OUT(1 - OUT).
These δs are used for adjusting the weights of this
hidden layer.
They are then propagated back to all the preceding
layers in a similar way.

δ_p,j = OUT_p,j (1 - OUT_p,j) Σ_q δ_q,k w_qp

[Figure: Output layer (k), hidden layer (j), and previous layer (i). The output-layer deltas δ_1,k, δ_2,k, ..., δ_q,k are propagated back through the weights w_1p, w_2p, ..., w_qp to neuron p in the hidden layer.]
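A minimal Python/NumPy sketch of this back-propagated δ calculation for a hidden layer, reusing the assumed values from the previous sketch.

import numpy as np

def hidden_deltas(out_hidden, W_out, delta_out):
    # delta_p,j = OUT_p,j (1 - OUT_p,j) * sum over q of delta_q,k * w_qp
    return out_hidden * (1.0 - out_hidden) * (W_out @ delta_out)

out_hidden = np.array([0.6, 0.9])     # OUT_p,j for two hidden neurons (assumed)
W_out = np.array([[0.2, -0.1],
                  [0.4, 0.3]])        # rows: hidden neuron p, columns: output neuron q
delta_out = np.array([0.11, -0.08])   # delta_q,k for the two output neurons (assumed)
print(hidden_deltas(out_hidden, W_out, delta_out))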
Derivation of Learning Rule for Back Propagation
Assumptions and Notations.

y_k - the output of the kth neuron, y_k = f(y_in,k)
y_in,k - the net input to neuron k
E - squared error, E = 0.5 (Target - OUT)^2
z_j - the output of the jth hidden-layer neuron
δ - the portion of the error correction applied to a weight
j - hidden layer, k - output layer



∂E/∂w_jk = -(t_k - y_k) ∂y_k/∂w_jk
         = -(t_k - y_k) [∂f(y_in,k)/∂y_in,k] ∂y_in,k/∂w_jk
         = -(t_k - y_k) f'(y_in,k) z_j

let δ_k = (t_k - y_k) f'(y_in,k) = (t_k - y_k) d(OUT)/d(NET)
E = 0.5 (Target - OUT)^2, i.e. E = 0.5 (t_k - y_k)^2
Usually the activation function for the BP network is
either the binary sigmoid function (range [0,1]) or the
bipolar sigmoid function (range [-1,1]). Hence the
above equation for δ_k becomes

δ_k = (t_k - y_k) f'(y_in,k) = (t_k - y_k) OUT (1 - OUT)
∂E/∂w_ji = -(t_k - y_k) ∂y_k/∂w_ji
         = -(t_k - y_k) f'(y_in,k) ∂y_in,k/∂w_ji
         = -δ_k ∂y_in,k/∂z_j · ∂z_j/∂w_ji
         = -δ_k w_jk ∂z_j/∂w_ji
         = -δ_k w_jk f'(z_in,j) x_i
For the Hidden Layer
let δ_j = f'(z_in,j) Σ_k δ_k w_jk (summed over all the output neurons k fed by hidden neuron j)

Now consider the first case (the output layer):
Δw_jk = -η ∂E/∂w_jk = η δ_k z_j

Now consider the second case (the hidden layer):
Δw_ji = -η ∂E/∂w_ji = η δ_j x_i
Bias Adjustments
b_m(k+1) = b_m(k) + η δ_m (for both hidden and output layers)
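A minimal sketch of this bias update in Python/NumPy; the δ values and the learning rate below are assumed only for illustration.

import numpy as np

def update_biases(b, delta, eta):
    # b_m(k+1) = b_m(k) + eta * delta_m (same delta as the incoming weight updates)
    return b + eta * delta

b = np.array([0.1, -0.3])             # assumed bias values
delta = np.array([0.11, -0.08])       # assumed deltas for the layer
b = update_biases(b, delta, eta=0.25)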

Example: Find the equation for the change in weight given by the
back propagation algorithm when the activation function used
is the tan-sigmoid function.
OUT = a = (e^n - e^(-n)) / (e^n + e^(-n))
OUT' = 1 - a^2
Hence δ = (t_k - a)(1 - a^2)
Δw = η (t_k - a)(1 - a^2) OUT
o
Example: For the network shown the initial weights and
biases are chosen to be w1(0) = -1, b1(0) = 1, w2(0) = -2, b2(0) = 1.
An input-target pair is given to be ((p = -1), (T = 1)).
Perform the back propagation algorithm for one iteration.
F = tan-sigmoid function
[Figure: Two-neuron cascade network. The input p passes through weight w1 and bias b1 to give n1 and a1 = F(n1); a1 then passes through weight w2 and bias b2 to give n2 and a2 = F(n2).]
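A minimal Python/NumPy sketch of one such iteration. Note that the slide does not state a learning rate, so η = 0.1 below is an assumption made only so the update can be computed.

import numpy as np

# Initial values from the example; eta is an assumption (not given on the slide).
w1, b1 = -1.0, 1.0
w2, b2 = -2.0, 1.0
p, target = -1.0, 1.0
eta = 0.1

# Forward pass through the two tan-sigmoid neurons.
n1 = w1 * p + b1
a1 = np.tanh(n1)
n2 = w2 * a1 + b2
a2 = np.tanh(n2)

# Backward pass: the deltas use the tanh derivative (1 - a^2).
delta2 = (target - a2) * (1.0 - a2 ** 2)
delta1 = delta2 * w2 * (1.0 - a1 ** 2)

# Weight and bias updates for one iteration.
w2 += eta * delta2 * a1
b2 += eta * delta2
w1 += eta * delta1 * p
b1 += eta * delta1
print(w1, b1, w2, b2)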
Example: For the neural network shown in the figure with the
given initial data, determine the new weights after applying the
sample (0,0) once. Assume the learning rate is 0.3 and
the activation function for both the hidden layer and the
output layer is
f(x) = 1 / (1 + e^(-x))
[Figure: Network with inputs x1 and x2, a hidden layer z, output neuron y1, bias inputs of 1, and initial weights 0.2, 0.1, 0.1, 0.1, -0.2, 0.15.]
Applications of Back propagation algorithm
Short-term Load Forecasting
Image Processing
Online Motor fault detection
Power system stability.
Network Paralysis
During BP training the weights can become very
large.
This forces all or most of the neurons to operate at
large values of OUT.
The derivative of the activation function in this region is very small.
The error sent back for training is therefore also small
(it is proportional to the derivative of OUT).
Hence the training process can come to a virtual
standstill (this is called network paralysis).
It is commonly avoided by reducing the step size.

Local Minima
The back propagation algorithm employs a type of
gradient descent method.
The error surface of a complex network is highly
convoluted, full of hills, valleys, folds, etc.
The network can get trapped in a local minimum
(a shallow valley) when there is a much deeper
minimum nearby. (This problem is known as the local
minima problem.)
It can be avoided by statistical training methods.
Wasserman proposed a combined statistical and gradient
descent method.

[Figure: Objective function plotted against a weight, showing two minima A and B, one much deeper than the other.]
Temporal Instability
The human brain is able to retain existing knowledge
while recording new data.
Conventional ANNs have failed to solve this stability
problem.
Learning a new pattern may erase or modify previously
learned patterns.
In a BPNN, a new set of applied inputs may badly
change the existing weights, so complete retraining is
required.
In real-world problems a NN is exposed to a
continuously changing environment.
A BPNN may then learn nothing, never arriving at
satisfactory weight settings, because the input patterns change continuously.
