[Figure: network layer showing weights w_{1,k}, w_{2,k}, ..., w_{q,k} and w_{1p}, w_{2p}, ..., w_{qp}]
Derivation of Learning Rule for Back Propagation
Assumptions and Notations.
y_k - the output of the k-th output neuron, y_k = f(y_{in,k}).
y_{in,k} - the net input to neuron k.
E - the squared error, E = 0.5 (Target - OUT)^2.
z_j - the output of the j-th hidden-layer neuron.
\delta - the portion of the error used for the weight correction.
j - index over the hidden layer; k - index over the output layer.
For the output layer, the error is

E = 0.5 (Target - OUT)^2 = 0.5 (t_k - y_k)^2.

Differentiating with respect to the hidden-to-output weight w_{jk}:

\frac{\partial E}{\partial w_{jk}} = -(t_k - y_k)\,\frac{\partial y_k}{\partial w_{jk}}
 = -(t_k - y_k)\, f'(y_{in,k})\,\frac{\partial y_{in,k}}{\partial w_{jk}}
 = -(t_k - y_k)\, f'(y_{in,k})\, z_j.

Let

\delta_k = (t_k - y_k)\, f'(y_{in,k}) = (t_k - y_k)\,\frac{d\,OUT}{d\,NET}.
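To make the result above concrete, here is a minimal numerical sketch (assuming a single output neuron with binary-sigmoid activation; all names and values are illustrative) that checks the analytic gradient -(t_k - y_k) f'(y_{in,k}) z_j against a finite-difference estimate:

import numpy as np

# Numerical check of dE/dw_jk = -(t_k - y_k) * f'(y_in,k) * z_j
# (single output neuron, binary sigmoid; names and values illustrative)
f = lambda x: 1.0 / (1.0 + np.exp(-x))        # binary sigmoid
fprime = lambda x: f(x) * (1.0 - f(x))        # its derivative

z = np.array([0.3, -0.7, 0.5])                # hidden-layer outputs z_j
w = np.array([0.2, -0.1, 0.4])                # hidden-to-output weights w_jk
t = 1.0                                       # target t_k

def error(w):
    y_in = np.dot(w, z)                       # net input y_in,k
    return 0.5 * (t - f(y_in)) ** 2           # E = 0.5 (t_k - y_k)^2

y_in = np.dot(w, z)
analytic = -(t - f(y_in)) * fprime(y_in) * z  # formula derived above
eps = 1e-6
numeric = np.array([(error(w + eps * np.eye(3)[j]) - error(w - eps * np.eye(3)[j])) / (2 * eps)
                    for j in range(3)])
print(analytic, numeric)                      # the two should agree closely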
Usually the activation function for a BP network is either the binary sigmoid function (range [0,1]) or the bipolar sigmoid function (range [-1,1]). Hence, for the binary sigmoid, the above equation for \delta_k becomes

\delta_k = (t_k - y_k)\, f'(y_{in,k}) = (t_k - y_k)\, OUT\,(1 - OUT).
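As a small sketch (illustrative names only), \delta_k can be written directly in terms of OUT for both choices of activation; the bipolar-sigmoid derivative 0.5 (1 + OUT)(1 - OUT) is the standard counterpart of the binary case shown above:

# delta_k expressed directly in terms of OUT (illustrative sketch)
def delta_binary(t_k, out_k):
    # binary sigmoid: f'(y_in,k) = OUT * (1 - OUT)
    return (t_k - out_k) * out_k * (1.0 - out_k)

def delta_bipolar(t_k, out_k):
    # bipolar sigmoid: f'(y_in,k) = 0.5 * (1 + OUT) * (1 - OUT)
    return (t_k - out_k) * 0.5 * (1.0 + out_k) * (1.0 - out_k)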
For the hidden-layer weights w_{ji} (from input x_i to hidden unit z_j):

\frac{\partial E}{\partial w_{ji}} = -(t_k - y_k)\,\frac{\partial y_k}{\partial w_{ji}}
 = -(t_k - y_k)\, f'(y_{in,k})\,\frac{\partial y_{in,k}}{\partial w_{ji}}
 = -(t_k - y_k)\, f'(y_{in,k})\, w_{jk}\,\frac{\partial z_j}{\partial w_{ji}}
 = -(t_k - y_k)\, f'(y_{in,k})\, w_{jk}\, f'(z_{in,j})\,\frac{\partial z_{in,j}}{\partial w_{ji}}
 = -\delta_k\, w_{jk}\, f'(z_{in,j})\, x_i.
For the hidden layer, let

\delta_j = \delta_k\, w_{jk}\, f'(z_{in,j})

(summed over all output units k when there is more than one).

The weight corrections then follow from gradient descent with learning rate \eta:

Output layer:  \Delta w_{jk} = -\eta\,\frac{\partial E}{\partial w_{jk}} = \eta\,\delta_k\, z_j

Hidden layer:  \Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}} = \eta\,\delta_j\, x_i
The first case gives the update for the output-layer weights; the second case (hidden layer) gives the update for the weights feeding the hidden units.
Bias Adjustments.

b_m(k+1) = b_m(k) + \eta\,\delta_m

(for both the hidden and the output layers; here k indexes the training iteration).
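Putting the weight and bias corrections together, here is a minimal sketch of one training step for a single-hidden-layer network (binary-sigmoid activations assumed; array names and shapes are illustrative, not part of the original derivation):

import numpy as np

# One training step following the update rules above
# (binary sigmoid everywhere; names and shapes illustrative)
f  = lambda x: 1.0 / (1.0 + np.exp(-x))
fp = lambda out: out * (1.0 - out)            # derivative written in terms of OUT

def bp_step(x, t, V, bv, W, bw, eta):
    # forward pass
    z = f(V.T @ x + bv)                       # hidden-layer outputs z_j
    y = f(W.T @ z + bw)                       # output-layer outputs y_k
    # deltas
    delta_k = (t - y) * fp(y)                 # output-layer delta
    delta_j = fp(z) * (W @ delta_k)           # hidden delta: f'(z_in,j) * sum_k delta_k w_jk
    # weight and bias corrections
    W  += eta * np.outer(z, delta_k)          # delta_w_jk = eta * delta_k * z_j
    bw += eta * delta_k                       # output-layer bias update
    V  += eta * np.outer(x, delta_j)          # delta_w_ji = eta * delta_j * x_i
    bv += eta * delta_j                       # hidden-layer bias update
    return V, bv, W, bw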
Example: Find the equation for the change in weight given by the back-propagation algorithm when the activation function used is the tan-sigmoid function.

OUT = a = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}

f'(n) = 1 - (OUT)^2 = 1 - a^2

\delta_k = (t_k - a)(1 - a^2)

Hence \Delta w = \eta\,(t_k - a)(1 - a^2)\, OUT, where OUT is the output of the preceding layer.
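A short sketch of the resulting update (names illustrative), where out_prev stands for the output of the preceding layer feeding the weight:

# Change in weight for the tan-sigmoid activation (sketch; names illustrative)
def delta_w_tansig(t_k, a, out_prev, eta):
    # a = tanh(n) is the neuron's output; f'(n) = 1 - a^2
    return eta * (t_k - a) * (1.0 - a ** 2) * out_prev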
Example: For the network shown, the initial weights and biases are chosen to be w_1(0) = -1, b_1(0) = 1, w_2(0) = -2, b_2(0) = 1.
An input-target pair is given as (p = -1, T = 1).
Perform the back-propagation algorithm for one iteration.
F = tan-sigmoid function.
[Figure: two-neuron cascade: p → (w_1, b_1) → n_1 → F → a_1 → (w_2, b_2) → n_2 → F → a_2]
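A minimal sketch of this iteration (the learning rate is not specified in the example, so eta = 0.1 is an assumption; names follow the figure placeholder above):

import numpy as np

# One back-propagation iteration for the two-neuron cascade above
# (tan-sigmoid activations; eta = 0.1 is an assumed learning rate)
eta = 0.1
w1, b1, w2, b2 = -1.0, 1.0, -2.0, 1.0
p, T = -1.0, 1.0

# forward pass
n1 = w1 * p + b1;  a1 = np.tanh(n1)
n2 = w2 * a1 + b2; a2 = np.tanh(n2)

# backward pass (f'(n) = 1 - a^2 for tan-sigmoid)
delta2 = (T - a2) * (1.0 - a2 ** 2)           # output-layer delta
delta1 = (1.0 - a1 ** 2) * delta2 * w2        # hidden delta

# weight and bias updates
w2 += eta * delta2 * a1;  b2 += eta * delta2
w1 += eta * delta1 * p;   b1 += eta * delta1
print(w1, b1, w2, b2)                         # weights and biases after one iteration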
Example: For the neural network shown in the figure, with the initial data given, determine the new weights after applying the sample (0,0) once. Assume a learning rate of 0.3 and the activation function for both the hidden layer and the output layer as

f(x) = \frac{1}{1 + e^{-x}}.

[Figure: two-input network (x_1, x_2) with hidden units and output y_1; bias inputs of 1; initial weights 0.2, 0.1, 0.1, 0.1, -0.2, 0.15]
Applications of Back propagation algorithm
Short-term Load Forecasting
Image Processing
Online Motor fault detection
Power system stability.
Network Paralysis
During BP training the weights can become very large.
This forces all or most of the neurons to operate at large (saturated) values of OUT.
The derivative of the activation function in this region is very small.
The error sent back for training is therefore also small (it is proportional to the derivative of OUT).
Hence the training process can come to a virtual standstill (this is called network paralysis).
It is commonly avoided by reducing the step size.
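A small numeric illustration of this effect (illustrative values only): as the net input grows, the binary-sigmoid derivative, and with it the back-propagated error, collapses toward zero.

import numpy as np

# Why large weights stall training: the sigmoid derivative collapses for large net input
f = lambda x: 1.0 / (1.0 + np.exp(-x))
for net in (1.0, 5.0, 10.0, 20.0):
    out = f(net)
    print(net, out, out * (1.0 - out))   # derivative is about 4.5e-5 at net = 10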
Local Minima
The back-propagation algorithm employs a type of gradient-descent method.
The error surface of a complex network is highly convoluted, full of hills, valleys, folds, etc.
The network can get trapped in a local minimum (a shallow valley) when there is a much deeper minimum nearby. (This is known as the local-minima problem.)
It can be avoided by statistical training methods.
Wasserman proposed a combined statistical and gradient-descent method.
[Figure: objective function plotted against a weight, with two minima marked A and B]
Temporal Instability
The human brain is able to retain existing knowledge while still recording new data.
Conventional ANNs have failed to solve this stability problem.
Learning a new pattern may erase or modify previously learned patterns.
In a BPNN, a new set of applied inputs may drastically change the existing weights, so complete retraining is required.
In real-world problems the NN is exposed to a continuously changing environment.
A BPNN may then learn nothing, because the input patterns change continuously and the weights never arrive at satisfactory settings.