Typically the same activation function is used for all neurons in any particular layer, although this is not a requirement.
In a multi-layer network, if the neurons have linear activation functions, the capabilities are no better than those of a single-layer network with a linear activation function.
Hence nonlinear activation functions are used in most cases.
Identity (linear) activation:

$f(x) = x$ for all $x$.

Binary sigmoid:

$f(x) = \dfrac{1}{1 + \exp(-x)}$, with derivative $f'(x) = f(x)\,[1 - f(x)]$.

Bipolar sigmoid:

$g(x) = 2f(x) - 1 = \dfrac{1 - \exp(-x)}{1 + \exp(-x)}$, with derivative $g'(x) = \dfrac{1}{2}\,[1 + g(x)]\,[1 - g(x)]$.
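As a quick numerical check, here is a minimal NumPy sketch of the two sigmoids and their derivatives (the function names are illustrative, not from the notes):

    import numpy as np

    def binary_sigmoid(x):
        # f(x) = 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-x))

    def binary_sigmoid_deriv(x):
        # f'(x) = f(x) [1 - f(x)]
        fx = binary_sigmoid(x)
        return fx * (1.0 - fx)

    def bipolar_sigmoid(x):
        # g(x) = 2 f(x) - 1 = (1 - exp(-x)) / (1 + exp(-x))
        return 2.0 * binary_sigmoid(x) - 1.0

    def bipolar_sigmoid_deriv(x):
        # g'(x) = (1/2) [1 + g(x)] [1 - g(x)]
        gx = bipolar_sigmoid(x)
        return 0.5 * (1.0 + gx) * (1.0 - gx)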
Figure: A single neuron $Y_j$ with inputs $X_1, \ldots, X_i, \ldots, X_n$, connection weights $w_{1j}, \ldots, w_{ij}, \ldots, w_{nj}$, and bias $b_j$ (the bias input is fixed at 1).
$s_j = \mathbf{w}_j \mathbf{x}^T = \sum_{i=1}^{n} x_i w_{ij}$, if the bias is neglected.
If the bias is not neglected, $\mathbf{x}$ includes the bias value as an extra component fixed at 1.
Here row vectors are assumed for the input and the weights.
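A minimal sketch of the net input computation; the values of x, w_j and b_j below are made up for illustration:

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])    # inputs x_1, ..., x_n (row vector)
    w_j = np.array([0.1, 0.4, -0.2])  # weights w_1j, ..., w_nj into neuron j
    b_j = 0.3                         # bias b_j

    s_j = x @ w_j                     # net input with the bias neglected
    s_j_with_bias = x @ w_j + b_j     # same as appending 1 to x and b_j to w_j
    print(s_j, s_j_with_bias)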
Ex 2.2.1 is a direct application of the material learned thus far.
Figure: Example networks with inputs X1, X2 and output Y, drawn with and without a bias unit, and a two-layer variant with hidden units Z1 and Z2 (the edge labels give the weights).
Essentially the above networks are equivalent in most cases. However, the bias becomes essential for certain problems associated with linear separability.
Requirement of Linear Separability
For bipolar signals the outputs for the two classes are -1 and +1.
For unipolar signals they are 0 and 1.
Depending on the number of inputs, the decision boundary can be a line, a plane or a hyperplane. E.g. for two inputs it is a line and for three inputs it is a plane.
If all of the training input vectors for which the correct response is +1 lie on one side of the decision boundary, and all of those for which the correct response is -1 lie on the other side, we say that the problem is linearly separable.
It has been shown that a single layer network can only learn
linearly separable problems.
The trained weights are not unique.
Example (bipolar AND):

Input (x1, x2)   Target
(1,1)            +1
(1,-1)           -1
(-1,1)           -1
(-1,-1)          -1

$x_2 = -x_1 + 1$ is the decision boundary, where $w_1 = w_2 = 1$ and $b = -1$.
This boundary is not unique.
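A quick check (in throwaway code, with names invented for this sketch) that these weights and bias classify all four points correctly:

    # Bipolar AND points and the boundary w1 = w2 = 1, b = -1.
    points = {(1, 1): +1, (1, -1): -1, (-1, 1): -1, (-1, -1): -1}
    w1, w2, b = 1, 1, -1
    for (x1, x2), target in points.items():
        net = w1 * x1 + w2 * x2 + b
        print((x1, x2), target, '+' if net > 0 else '-')  # signs match the targets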
Example (bipolar XOR):

Input (x1, x2)   Target
(1,1)            -1
(1,-1)           +1
(-1,1)           +1
(-1,-1)          -1

This function is not linearly separable.
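A brute-force illustration of this claim (a demonstration over a coarse weight grid, not a proof; the grid is an assumption of this sketch):

    import itertools

    points = {(1, 1): -1, (1, -1): +1, (-1, 1): +1, (-1, -1): -1}
    grid = [i / 2 for i in range(-6, 7)]  # candidate values for w1, w2 and b
    separable = any(
        all(t * (w1 * x1 + w2 * x2 + b) > 0 for (x1, x2), t in points.items())
        for w1, w2, b in itertools.product(grid, repeat=3)
    )
    print(separable)  # False: no (w1, w2, b) on the grid separates the points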
In most cases binary input data can be modified to bipolar data. However, the form of the data can change the problem from one that is solvable to one that cannot be solved.
Binary representation is also not as good as bipolar if we want the net to generalize, i.e. to respond to input data that is similar, but not identical, to the training data.
Bipolar inputs and targets are also better during training, where the adaptive weight change is computed as, for example,

$\Delta w = xy.$
This is the earliest and simplest learning rule for neural networks.
Hebb proposed that if two interconnected neurons are both on at the same time, then the weight between them should be increased.
However, the original statement did not discuss neurons that are connected but do not fire together. Later this case was also included in the Hebb rule, so that the weight is also adjusted (decreased) when the two neurons do not fire together, making the extended rule a stronger learning algorithm.
The weight update can be represented as

$w_i(\text{new}) = w_i(\text{old}) + x_i y.$

Note that if the signals are binary, the update rule cannot distinguish between a training pair in which the input is on and the target is off, and one in which both the input and the target units are off: in both cases $\Delta w_i = x_i y = 0$.
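A short sketch that tabulates $\Delta w = xy$ for every input/target combination, showing that binary signals lose information that bipolar signals keep:

    # Binary signals: three of the four cases give dw = 0, so the rule
    # cannot tell "input on, target off" from "both off".
    for x in (0, 1):
        for y in (0, 1):
            print(f'binary  x={x} y={y} dw={x * y}')

    # Bipolar signals: every case produces a signed, nonzero update.
    for x in (-1, 1):
        for y in (-1, 1):
            print(f'bipolar x={x:+d} y={y:+d} dw={x * y:+d}')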
Hebb Algorithm
$w(\text{new}) = w(\text{old}) + \Delta w$, where $\Delta w = xy$.
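A minimal sketch of one pass of this rule, assuming zero initial weights and treating the bias as a weight whose input is fixed at 1 (hebb_train is a name invented for this example):

    def hebb_train(samples, n_inputs):
        # samples is a list of (input tuple, target) pairs.
        w = [0] * n_inputs
        b = 0
        for x, y in samples:
            for i in range(n_inputs):
                w[i] += x[i] * y  # dw_i = x_i * y
            b += y                # db = 1 * y
        return w, b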
Hebb Net for AND Function
Binary inputs and targets
The truth table for the AND function is:

Input (x1, x2, 1)   Target
(1,1,1)             1
(1,0,1)             0
(0,1,1)             0
(0,0,1)             0
The initial values are w(old) = (0, 0) and b(old) = 0.
The first step of the algorithm is:

Input (x1, x2, 1)   Target   Weight Changes (Δw1, Δw2, Δb)   Weights (w1, w2, b)
                                                             (0,0,0)
(1,1,1)             1        (1,1,1)                         (1,1,1)

The separating line after the first step is $x_2 = -x_1 - 1$.
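A quick check of the net inputs under these weights (plain throwaway code):

    # After the first step w = (1, 1), b = 1, i.e. the boundary
    # x1 + x2 + 1 = 0, or x2 = -x1 - 1.
    w1, w2, b = 1, 1, 1
    for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
        print((x1, x2), w1 * x1 + w2 * x2 + b)
    # All four net inputs are positive, so only the first pattern
    # (target 1) is classified correctly so far.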
Hebb Net for AND Function ...
Binary inputs and targets
Now if we present the second, third and fourth training vectors, the weight changes are given by:

Input (x1, x2, 1)   Target   Weight Changes (Δw1, Δw2, Δb)   Weights (w1, w2, b)
                                                             (1,1,1)
(1,0,1)             0        (0,0,0)                         (1,1,1)
(0,1,1)             0        (0,0,0)                         (1,1,1)
(0,0,1)             0        (0,0,0)                         (1,1,1)
We note that in the above problem, when the target value is zero, no learning occurs and hence there is no weight change.
From this we can see that the binary targets have resulted in a shortcoming in the learning method.
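Running the hypothetical hebb_train sketch from above over all four binary pairs makes this concrete:

    binary_and = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
    w, b = hebb_train(binary_and, n_inputs=2)
    print(w, b)  # [1, 1] 1 -- the three zero-target samples change nothing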
The truth table in this case, with binary inputs and bipolar targets, is:

Input (x1, x2, 1)   Target
(1,1,1)             +1
(1,0,1)             -1
(0,1,1)             -1
(0,0,1)             -1
Presenting the first input:

Input (x1, x2, 1)   Target   Weight Changes (Δw1, Δw2, Δb)   Weights (w1, w2, b)
                                                             (0,0,0)
(1,1,1)             +1       (1,1,1)                         (1,1,1)

The separating line becomes $x_2 = -x_1 - 1$. This is the correct classification for the first input.
Figure: Decision boundary $x_2 = -x_1 - 1$ after the first training step.
With bipolar inputs and targets the truth table is:

Input (x1, x2, 1)   Target
(1,1,1)             +1
(1,-1,1)            -1
(-1,1,1)            -1
(-1,-1,1)           -1
Presenting the first input:target pair yields the same result as before, with the separating line being $x_2 = -x_1 - 1$.
Now the classification is correct for the point (1,1) and also for (-1,-1).
Figure: Decision boundary for the bipolar AND function using the Hebb rule after the third/fourth training steps.
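For comparison, running the same hypothetical hebb_train sketch on the fully bipolar data gives weights that classify all four points correctly:

    bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    w, b = hebb_train(bipolar_and, n_inputs=2)
    print(w, b)  # [2, 2] -2, i.e. the boundary x1 + x2 - 1 = 0
    for (x1, x2), t in bipolar_and:
        net = w[0] * x1 + w[1] * x2 + b
        print((x1, x2), t, '+' if net > 0 else '-')  # all four now correct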