
Accepted Notation

x_i       Input signal to unit X_i.
y_j       Activation (output) of unit Y_j: y_j = f(s_j).
w_ij      Weight on the connection from unit X_i to unit Y_j.
b_j       Bias on unit Y_j.
s_j       Net input to unit Y_j: s_j = b_j + Σ_i x_i w_ij.
W         Weight matrix {w_ij}.
w_j       Vector of weights w_j = (w_1j, w_2j, . . . , w_nj)^T.
θ_j       Threshold for activation of neuron Y_j.
x         Input vector for classification or response, x = (x_1, . . . , x_i, . . . , x_n).
Δw_ij     Change in w_ij: Δw_ij = w_ij(new) - w_ij(old).



Mathematical Models of Activation Functions

Typically the same activation function is used for all neurons in any
particular layer (this is not a requirement).
In a multi-layer network, if the neurons have linear activation
functions, the capabilities are no better than those of a single-layer
network with a linear activation function.
Hence in most cases nonlinear activation functions are used.

Linear Activation Function

    f(x) = x,   for all x.



Mathematical Models of Activation Functions ...

Binary Step Function with Threshold



    f(x) = 1   if x ≥ θ,
           0   if x < θ.

[Figure: plot of the binary step function f(x).]



Mathematical Models of Activation Functions ...
Binary Sigmoid Function

    f(x) = 1 / (1 + exp(-x)).

    f'(x) = f(x) [1 - f(x)].

[Figure: plot of the binary sigmoid f(x).]

This is especially useful when training the backpropagation neural
network, where the derivative of the activation function is needed to
compute the weight updates.
Mathematical Models of Activation Functions ...
Bipolar Sigmoid Function

    g(x) = 2 f(x) - 1 = (1 - exp(-x)) / (1 + exp(-x)).

    g'(x) = (1/2) [1 + g(x)] [1 - g(x)].

The above is actually the binary sigmoid scaled to have a set of
values ranging from -1 to +1.
The bipolar sigmoid is also closely related to the hyperbolic
tangent function:

    h(x) = tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
                   = (1 - exp(-2x)) / (1 + exp(-2x)).
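As a quick check of the formulas above, here is a short Python sketch (illustrative only, not part of the original slides) that evaluates each activation function together with the derivatives written in terms of the function value:

    import numpy as np

    def binary_step(x, theta=0.0):
        # Binary step with threshold theta: 1 if x >= theta, else 0.
        return np.where(x >= theta, 1.0, 0.0)

    def binary_sigmoid(x):
        # f(x) = 1 / (1 + exp(-x)), range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def binary_sigmoid_deriv(f_x):
        # f'(x) = f(x) [1 - f(x)], expressed via the function value.
        return f_x * (1.0 - f_x)

    def bipolar_sigmoid(x):
        # g(x) = 2 f(x) - 1 = (1 - exp(-x)) / (1 + exp(-x)), range (-1, 1).
        return 2.0 * binary_sigmoid(x) - 1.0

    def bipolar_sigmoid_deriv(g_x):
        # g'(x) = 0.5 [1 + g(x)] [1 - g(x)].
        return 0.5 * (1.0 + g_x) * (1.0 - g_x)

    x = np.array([-2.0, 0.0, 2.0])
    print(binary_step(x, 1.0))
    print(binary_sigmoid(x), bipolar_sigmoid(x), np.tanh(x))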
Matrix Manipulations

[Figure: single output unit Y_j with inputs X_1, . . . , X_i, . . . , X_n,
connection weights w_1j, . . . , w_ij, . . . , w_nj, and a bias b_j coming
from a unit whose signal is always 1.]

    s_j = w_j x^T = Σ_{i=1}^{n} x_i w_ij,   if the bias is neglected.

If the bias is not neglected, x is augmented with a component that is
always 1, and the corresponding weight is the bias.
Here row vectors for the input and weights are assumed.
Ex 2.2.1 is a direct application of the material learned thus far.
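To make the computation concrete, the following sketch (hypothetical values, not from Ex 2.2.1) evaluates the net input with and without the bias, and shows the equivalent augmented-vector form:

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])       # input signals x_1 .. x_n
    w_j = np.array([0.2, 0.4, -0.1])     # weights w_1j .. w_nj into unit Y_j
    b_j = 0.3                            # bias on Y_j

    s_no_bias = np.dot(w_j, x)           # s_j = sum_i x_i w_ij
    s_with_bias = b_j + np.dot(w_j, x)   # s_j = b_j + sum_i x_i w_ij

    # Equivalent augmented form: prepend a constant input of 1 whose weight is b_j.
    x_aug = np.concatenate(([1.0], x))
    w_aug = np.concatenate(([b_j], w_j))
    assert np.isclose(s_with_bias, np.dot(w_aug, x_aug))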



McCulloch-Pitts Neuron
Characteristics

The McCulloch-Pitts neuron is considered to be the first neural network.

It is a fixed-weight network that can be used to implement Boolean
functions.
Its characteristics are:
Binary activation (1 = ON, 0 = OFF), i.e. the neuron either fires with an
activation of 1 or does not fire and has an activation of 0.
Neurons are connected by directed, weighted paths.
If w > 0 the connection is excitatory, otherwise it is inhibitory.
All excitatory weights into a neuron are identical; so are all inhibitory
weights.
Each neuron has a fixed threshold for firing. That is if the net input
to the neuron is greater than the threshold it fires.
The threshold is set such that the inhibition is absolute. Any
non-zero inhibitory input will prevent it from firing.
It takes one time step for the signal to pass over one link.
McCulloch-Pitts Neuron
Architecture

[Figure: neuron Y receives excitatory connections of weight w from units
X_1, . . . , X_n and inhibitory connections of weight -p from units
X_{n+1}, . . . , X_{n+m}.]

The inhibition should be absolute. Hence the threshold θ satisfies

    θ > n w - p.

The output neuron will fire if it receives k or more excitatory inputs and
no inhibitory inputs, where

    k w ≥ θ > (k - 1) w.

The activation function is

    f(s) = 1   if s ≥ θ,
           0   if s < θ.
Neural net to perform the AND function

[Figure: X_1 and X_2 connect to Y, each with weight 1.]

Write the truth table for the above network.

Threshold = 2.
Give another configuration of the AND function with threshold 1.



Neural net to perform the OR function

[Figure: X_1 and X_2 connect to Y, each with weight 2.]

Write the truth table for the above network.

Threshold = 2.
Give another configuration of the OR function with Threshold 1.



Neural net to perform the AND NOT function

[Figure: X_1 connects to Y with weight 2; X_2 connects to Y with weight -1.]

Write the truth table for the AND NOT function.

Threshold = 2.
Can you find another configuration for the weights to implement
the AND NOT function?



Neural net to perform the XOR function

[Figure: two-layer net for XOR. X_1 connects to Z_1 with weight 2 and to Z_2
with weight -1; X_2 connects to Z_2 with weight 2 and to Z_1 with weight -1;
Z_1 and Z_2 each connect to Y with weight 2.]

Write the truth table for the XOR function.

Threshold = 2.
Write the layer equations for the XOR function and show that it is
AND NOT combined with an OR function.
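To tie the last four slides together, here is a small Python sketch (my own illustration, not taken from the slides) of the McCulloch-Pitts gates with threshold 2; the XOR net uses two AND NOT units followed by an OR unit, as asked above:

    def mcp_neuron(inputs, weights, theta=2):
        # Fire (1) if the weighted sum reaches the threshold, else 0.
        s = sum(x * w for x, w in zip(inputs, weights))
        return 1 if s >= theta else 0

    def and_gate(x1, x2):     return mcp_neuron((x1, x2), (1, 1))    # threshold 2
    def or_gate(x1, x2):      return mcp_neuron((x1, x2), (2, 2))    # threshold 2
    def and_not_gate(x1, x2): return mcp_neuron((x1, x2), (2, -1))   # x1 AND NOT x2

    def xor_gate(x1, x2):
        # z1 = x1 AND NOT x2, z2 = x2 AND NOT x1, y = z1 OR z2.
        z1 = and_not_gate(x1, x2)
        z2 = and_not_gate(x2, x1)
        return or_gate(z1, z2)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, and_gate(x1, x2), or_gate(x1, x2),
                  and_not_gate(x1, x2), xor_gate(x1, x2))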
Geometric View of Neural Architectures
Single output neuron

Note that the threshold activation can be reduced to the discrete
activation function when the threshold is treated as a bias.

Then the synaptic input can be written as s = w x^T.
For a neuron with two inputs x_1 and x_2, s = 0 is a straight line.
On the x_1 x_2-plane, s > 0 gives one half-plane and s < 0 gives the
other half-plane.
If the weights and the threshold are multiplied by the same constant c,
the decision boundary itself does not move; if c is negative, the
half-planes in which the earlier classification occurred are
interchanged. (pp. 27)
If you have more than two inputs, the boundary s = 0 is a hyperplane in
x_1 x_2 . . . x_n-space. For example, if n = 3 you end up with a plane.



Geometric View of Neural Architectures
Multiple output neurons

Each output neuron will have a summation block s_j = w_j x^T.

For j = 2 output neurons with two inputs, the above gives two straight
lines in the x_1 x_2-plane.
For j = 2 output neurons with more than two inputs, the above gives two
hyperplanes in the n-dimensional input space.
For j > 2 output neurons and several inputs, there is one such hyperplane
for each output neuron.



Example 3.3.5

Note that the threshold activation can be converted to the discrete
activation function by rewriting s ≥ θ as s - θ ≥ 0. By doing this we
are converting a threshold into a bias.

We have to mark the area in which the output turns out to be 1. By
looking at the vertices, we determine it is
x_1 > 0, x_1 < 1, x_2 > 0 and x_2 < 1.
In order to obtain the required output, the threshold function has to
satisfy s - θ > 0, that is, w_1 x_1 + w_2 x_2 - θ > 0. Now compare the
above two items to obtain the required w_1, w_2 and θ.
Even for Exercise 3.3.2 we have a region bounded by a triangle in
which the same computation can be carried out.
Even when building a two-layer network, the arguments used for MCP
networks can be applied, for example in the question on pp. 33.
All of these techniques produce fixed weight ANNs.
Pattern Classification Using ANNs

This is the simplest task a neural network can be trained to


perform.
Each input vector pattern either belongs or does not belong to a
class.
The correct classification (target) for each training input is assumed to be known.
The activation function is either binary or bipolar.
The output of the trained network will be 1 when presented with a
pattern from the class.
If the pattern is not in the class a -1 (or 0 for a binary output) is
output.
Since the output takes only one of two values, the activation is a hard step function.



Biases and Thresholds

The bias acts exactly as a weight on a connection from a unit


whose activation is always 1.
Increasing the bias increases the net input to the unit.
The activation with a bias is

    f(s) =  1   if s ≥ 0,
           -1   if s < 0,        where s = b + Σ_i x_i w_i.

The activation with a threshold is

    f(s) =  1   if s ≥ θ,
           -1   if s < θ,        where s = Σ_i x_i w_i.

Essentially the above are equivalent for most cases. However the
bias becomes essential for certain problems associated with linear
separability.
Requirement of Linear Separability

In pattern classification, the network is trained to output an
indicator showing whether the presented input is a member of one class
or of the other class.

This depends on the decision boundary that determines whether the
neuron fires or not. For the case with a bias it is found by setting
y_in = 0.
The decision boundary is

    b + Σ_i x_i w_i = 0.

If the problem is linearly separable, the two classes lie on either
side of the decision boundary (y_in > 0 or y_in < 0).

The boundary is not unique. For two inputs it can be written as

    x_2 = -(w_1 / w_2) x_1 - b / w_2.
Requirement of Linear Separability ...

For bipolar signals the outputs for the two classes are -1 and +1.
For unipolar signals it is 0 and 1.
Depending on the number of inputs, the decision boundary can be a
line, a plane or a hyperplane; e.g. for two inputs it is a line and
for three inputs it is a plane.
If all of the training input vectors for which the correct response is
+1 lie on one side of the decision boundary, and all those for which
the correct response is -1 lie on the other side, we say that the
problem is linearly separable.
It has been shown that a single layer network can only learn
linearly separable problems.
The trained weights are not unique.



Response Regions for the AND Function

The AND function for bipolar inputs and targets is

    Input (x_1, x_2)    Output (t)
    ( 1,  1)            +1
    ( 1, -1)            -1
    (-1,  1)            -1
    (-1, -1)            -1

[Figure: the four training points in the x_1 x_2-plane, with (1, 1)
marked + and the other three marked -, separated by the decision
boundary.]

x_2 = -x_1 + 1 is the decision boundary, obtained with w_1 = w_2 = 1
and b = -1.
This boundary is not unique.
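A short check in Python (a sketch using the slide's weights, not part of the original) confirms that w_1 = w_2 = 1, b = -1 places the single +1 training point on one side of this boundary and the three -1 points on the other:

    import numpy as np

    # Bipolar AND training set: (x1, x2) -> target t.
    patterns = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

    w = np.array([1.0, 1.0])   # w1, w2 from the slide
    b = -1.0                   # bias from the slide

    for (x1, x2), t in patterns:
        y_in = b + np.dot(w, (x1, x2))     # net input
        y = 1 if y_in >= 0 else -1         # hard bipolar activation
        print((x1, x2), "net =", y_in, "output =", y, "target =", t)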



Response Regions for the OR function

The OR function for bipolar inputs and targets is

    Input (x_1, x_2)    Output (t)
    ( 1,  1)            +1
    ( 1, -1)            +1
    (-1,  1)            +1
    (-1, -1)            -1

One possible decision boundary is given by x_2 = -x_1 - 1, where b = 1
and w_1 = 1 = w_2. Draw the above graphically.

Importance of Bias
If the bias weight was not included in the previous example, the
decision boundary would have been forced to go through the origin.
This changes the problem from a solvable one to an unsolvable one.



Response Regions for the XOR Function

The XOR function for bipolar inputs and targets is given by

    Input (x_1, x_2)    Output (t)
    ( 1,  1)            -1
    ( 1, -1)            +1
    (-1,  1)            +1
    (-1, -1)            -1

[Figure: the four training points in the x_1 x_2-plane; the + and -
labels alternate between adjacent corners, so no single straight line
separates the two classes.]

This is not linearly separable.



Binary/Bipolar Data Representation

In most cases the binary input data can be modified to bipolar data.
However the form of the data can change the problem from one that is
solvable to a problem that cannot be solved.
Binary representation is also not as good as bipolar if we want the
net to generalize, i.e. to respond to input data that is similar, but
not identical, to the training data.
Bipolar inputs and targets are also better during training, when the
adaptive weight change is computed. For example,

    Δw = x y

is the weight change in the Hebb learning algorithm. If either the
training input or the target is 0 (as happens with binary, i.e.
unipolar, data), the update is zero and no learning takes place for
that weight.
Using the bipolar representation, missing data can be distinguished
from mistaken data: a missing component is assigned 0, while a mistake
changes a -1 to +1 or vice versa.
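A tiny illustrative check of the update Δw = x·y under the two encodings (the example values are my own):

    # Hebb weight change for a single connection: dw = x * y.
    def hebb_delta(x, y):
        return x * y

    # Binary (unipolar) encoding: whenever the input or the target is 0,
    # the update vanishes and nothing is learned for that weight.
    print(hebb_delta(1, 0), hebb_delta(0, 1), hebb_delta(0, 0))      # 0 0 0

    # Bipolar encoding: every input/target combination gives a nonzero update.
    print(hebb_delta(1, -1), hebb_delta(-1, -1), hebb_delta(1, 1))   # -1 1 1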
The HEBB Net

This is the earliest and simplest learning rule for neural networks.
Hebb proposed that if two interconnected neurons are both on at
the same time, then the weight between them should be increased.
However, the original statement did not discuss neurons that are
connected but do not fire together. Later this case was also included
in the Hebb rule, so that the weight is also increased when the two
neurons are off at the same time, which makes the rule a stronger
learning algorithm.
The weight update can be represented as

wi (new) = wi (old) + xi y .

Note that if the signals are binary, the update rule cannot
distinguish between the following two input:target conditions, both of
which leave the weight unchanged.
Input is on and the target is off.
Both the input and the target units are off.
Hebb Algorithm

Step 0 Initialize all weights: wi = 0, i = 1, . . . , n.


Step 1 For each input training vector and target output pair, s : t,
do steps 2-4.
Step 2 Set activations for input units: xi = si , i = 1, . . . , n.
Step 3 Set activation for output unit: y = t.
Step 4 Adjust the weight and bias:

wi (new) = wi (old) + xi y for i = 1, . . . , n


b(new) = b(old) + y .

If the bias is considered to be an input signal that is always 1, the

weight change can be written as

    w(new) = w(old) + Δw,    where Δw = x y.
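A minimal sketch of the algorithm above in Python (the function name and data layout are my own, not from the slides); it keeps the bias separate, as in Steps 0-4:

    import numpy as np

    def hebb_train(samples):
        # samples: list of (input_vector, target) pairs s : t.
        n = len(samples[0][0])
        w = np.zeros(n)                      # Step 0: initialize weights
        b = 0.0                              #         and bias
        for s, t in samples:                 # Step 1: loop over training pairs
            x = np.asarray(s, dtype=float)   # Step 2: set input activations
            y = t                            # Step 3: set output activation
            w = w + x * y                    # Step 4: adjust the weights ...
            b = b + y                        #         ... and the bias
        return w, b

    # Example: the bipolar AND data used later in these slides.
    and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    print(hebb_train(and_samples))   # expected: (array([2., 2.]), -2.0)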
Hebb Net for AND Function
Binary inputs and targets

The truth table for the AND function is

    Input (x_1, x_2, 1)    Target
    (1, 1, 1)              1
    (1, 0, 1)              0
    (0, 1, 1)              0
    (0, 0, 1)              0

The initial values are w(old) = (0, 0) and b(old) = 0.
The first step of the algorithm is

    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
                                                                     (0, 0, 0)
    (1, 1, 1)             1        (1, 1, 1)                         (1, 1, 1)

The separating line after the first step is

    x_2 = -x_1 - 1.
Hebb Net for AND Function ...
Binary inputs and targets

Now if we present the second, third and fourth training vectors the
weight change is given by
    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
                                                                     (1, 1, 1)
    (1, 0, 1)             0        (0, 0, 0)                         (1, 1, 1)
    (0, 1, 1)             0        (0, 0, 0)                         (1, 1, 1)
    (0, 0, 1)             0        (0, 0, 0)                         (1, 1, 1)
We note that in the above problem when the target value is zero,
no learning occurs and hence no weight change.
With this we can see that the binary inputs and targets have resulted
in a shortcoming of the learning method.



Hebb Net for the AND function
Binary inputs, bipolar targets

The truth table in this case is

    Input (x_1, x_2, 1)    Target
    (1, 1, 1)              +1
    (1, 0, 1)              -1
    (0, 1, 1)              -1
    (0, 0, 1)              -1

Presenting the first input:

    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
                                                                     (0, 0, 0)
    (1, 1, 1)             1        (1, 1, 1)                         (1, 1, 1)

The separating line becomes x_2 = -x_1 - 1. This is the correct
classification for the first input.



Hebb Net for the AND function ...
Binary inputs, bipolar targets

[Figure: the four binary training points in the x_1 x_2-plane, labelled +
at (1, 1) and - elsewhere, together with the separating line
x_2 = -x_1 - 1.]



Hebb Net for the AND function ...
Binary inputs, bipolar targets

Presenting the other inputs we have the following


    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
    (1, 0, 1)             -1       (-1, 0, -1)                       (0, 1, 0)
    (0, 1, 1)             -1       (0, -1, -1)                       (0, 0, -1)
    (0, 0, 1)             -1       (0, 0, -1)                        (0, 0, -2)
Again we see that the net has not correctly classified all of the
training patterns. Hence an alternative representation of the training
data should be used.



Hebb Net for the AND function
Bipolar inputs and targets

The truth table is given by

    Input (x_1, x_2, 1)    Target
    ( 1,  1, 1)            +1
    ( 1, -1, 1)            -1
    (-1,  1, 1)            -1
    (-1, -1, 1)            -1

[Figure: the four bipolar training points labelled with their targets,
together with the separating line obtained after the first pattern.]

Presenting the first input:target pair yields the same result as
before, with the separating line being x_2 = -x_1 - 1.
Now the classification is correct for the point (1, 1) and also for
(-1, -1).



Hebb Net for the AND function ...
Bipolar inputs and targets

For the second input:target pair we obtain the separating line as
x_2 = 0.

    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
                                                                     (1, 1, 1)
    (1, -1, 1)            -1       (-1, 1, -1)                       (0, 2, 0)

With the third pair we obtain the separating line as x_2 = -x_1 + 1.

    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
                                                                     (0, 2, 0)
    (-1, 1, 1)            -1       (1, -1, -1)                       (1, 1, -1)

Presenting the fourth pair, the separating line does not change from
what we obtained previously even though the weights have changed.

    Input (x_1, x_2, 1)   Target   Weight Changes (Δw_1, Δw_2, Δb)   Weights (w_1, w_2, b)
                                                                     (1, 1, -1)
    (-1, -1, 1)           -1       (1, 1, -1)                        (2, 2, -2)
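As a quick check (a sketch using the same update rule as the earlier hebb_train sketch), this loop reproduces the weight sequence in the tables above:

    # Weight trace for the bipolar AND data; the bias is kept as a third
    # weight attached to a constant input of 1.
    samples = [((1, 1, 1), 1), ((1, -1, 1), -1), ((-1, 1, 1), -1), ((-1, -1, 1), -1)]
    w = [0, 0, 0]                                # (w1, w2, b)
    for x, t in samples:
        w = [wi + xi * t for wi, xi in zip(w, x)]
        print(x, t, w)
    # prints: [1, 1, 1], [0, 2, 0], [1, 1, -1], [2, 2, -2]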
Hebb Net for the AND function ...
Bipolar inputs and targets

[Figure: decision boundary for the bipolar AND function using the Hebb
rule after the third/fourth training step; the line x_2 = -x_1 + 1
separates the + point (1, 1) from the three - points.]

Draw the diagram for the intermediate stages as well.


Character Recognition Example Using Hebb Net

The Problem: Distinguish between the two patterns.



Pattern 1                Pattern 2

# . . . #                . # # # .
. # . # .                # . . . #
. . # . .                # . . . #
. # . # .                # . . . #
# . . . #                . # # # .

The steps involved are as follows:


We use the bipolar representation to convert the patterns into input
values that can be used for training: '#' is assigned +1 and '.' is
assigned -1.
The correct target for the first pattern is +1 and for the second
pattern it is -1. This is used as the classification.
Then use the Hebb rule to find the weights by presenting each
input:target pair in turn.
Check the system with patterns that are similar to the training
patterns but not identical, to see if it still responds with the
correct classification.
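Putting the steps together, here is a sketch of the whole procedure (the helper names and the noisy test pattern are my own; the grids are the two patterns shown above):

    import numpy as np

    PATTERN_1 = ["#...#", ".#.#.", "..#..", ".#.#.", "#...#"]   # target +1
    PATTERN_2 = [".###.", "#...#", "#...#", "#...#", ".###."]   # target -1

    def to_bipolar(pattern):
        # Flatten a grid of '#'/'.' cells into a bipolar vector (+1 / -1).
        return np.array([1 if c == "#" else -1 for row in pattern for c in row])

    # One pass of the Hebb rule over the two input:target pairs.
    w = np.zeros(25)
    b = 0.0
    for pattern, t in [(PATTERN_1, 1), (PATTERN_2, -1)]:
        x = to_bipolar(pattern)
        w += x * t
        b += t

    def classify(pattern):
        s = b + np.dot(w, to_bipolar(pattern))
        return 1 if s >= 0 else -1

    print(classify(PATTERN_1), classify(PATTERN_2))   # expected: 1 -1

    # A pattern similar to the first one (one cell changed) should still give +1.
    noisy = ["#...#", ".#.#.", ".....", ".#.#.", "#...#"]
    print(classify(noisy))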
Missing and Mistaken Data

Binary representation of inputs and targets (0 and 1 levels)


Missing data: cannot be represented.
Mistaken data: a mistaken 1 is represented as 0, and vice versa.
Bipolar representation of inputs and targets (-1 and +1 levels)
Missing data: represented by a 0.
Mistaken data: a mistaken +1 is represented as -1, and vice versa.
In general a net can handle more missing components than wrong
components.
For input data, the above translates to "it is better not to guess".
The reason is that a missing (zero) input component produces no weight
change during training, so it does less harm than a wrong value.



Additional Comments

ANN training is completely ad hoc; sometimes the result will not
converge.

The weight vector can be multiplied by a positive number without
changing the actions of the neurons. This normalization can eliminate
rounding errors, account for input variations, etc.
The value of the learning rate α is significant for convergence as
well as for the rate of convergence.
The order in which the training vectors are presented is also
important; randomizing the order is generally preferable.
In multilayer networks, it is generally accepted that more than two
layers are rarely used.

