
NEURAL NETWORKS AND FUZZY LOGIC

PART 2

LEARNING VECTOR QUANTIZATION


Overview:
LVQ can be understood as a special case of an artificial neural network, one frequently used in classification problems. More precisely, it applies a winner-take-all learning approach in order to adjust its inner parameters for a specific classification problem.
An LVQ system is represented by prototypes W = (w(1), ..., w(n)) which are defined in the feature space of the observed data. In the classic winner-take-all training algorithm one determines, for each data point, the prototype closest to the input according to a given distance measure. The position of this so-called winner prototype is then adapted, i.e. the winner is moved closer to the data point if it classifies it correctly, or moved away if it classifies it incorrectly. Thus, after the training algorithm is completed, the prototypes correctly represent the classes of the data in the feature space and can be used in a simple classification process: an unknown input is assigned to the class defined by the closest prototype in the LVQ network.
Architecture:
The general architecture of the LVQ network is presented below in Fig. 1.

Figure 1. LVQ network architecture


It can be noticed that the network consists of two layers: one input layer and one competitive layer. The nodes in the two layers are completely interconnected, with weights attached to every such connection. A prototype defined in the feature space is represented in the system by the weights targeting a specific output, i.e. the weights represent the components of the prototype vector. Thus the number of output nodes in the competitive layer defines the number of prototypes stored in the network.
The input layer, on the other hand, has a size equal to the feature-space dimension and has the role of distributing the components of the input vectors to all the nodes in the competitive layer.
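
As an illustration of this weight layout, the prototypes can be stored as the columns of a weight matrix. A minimal MATLAB sketch of the data structure, with hypothetical sizes and names (nFeatures, nPrototypes, Cw are illustrative, not prescribed by the model):

% Hypothetical LVQ storage: a 2-dimensional feature space, 3 prototypes.
nFeatures   = 2;                    % size of the input layer
nPrototypes = 3;                    % number of nodes in the competitive layer
W  = rand(nFeatures, nPrototypes);  % column j = weights targeting output node j
Cw = [1 1 2];                       % class label attached to each prototype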
Training:
As mentioned before, the training process uses a winner-take-all technique whose role is to adjust the weights of the winning prototype (i.e. the one closest to the current input). The classic approach is based on a supervised learning algorithm that requires the labeling of the inputs in the training set and also the labeling of the prototypes in the competitive layer (labels indicate the class in the feature space to which a prototype is assigned). At the same time, a learning rate parameter is used, which limits how far the winning prototype moves in the feature space.
The algorithm can be described by the following steps:
STEP 0: Initialize the weight vectors to the first M training vectors, where M is the number of different categories, and set α(0), the initial learning rate. This weight initialization technique is only one of many possible methods.
STEP 1: While the stopping condition is false, do steps 2 to 6:
STEP 2: For each training input x, do steps 3 and 4:
STEP 3: Find the prototype w_winner closest to the current input x using an appropriate measure (in most cases, the Euclidean distance is computed)
STEP 4: Adjust the winning prototype according to the rule:
IF C_winner = C_x THEN
w_winner(t + 1) = w_winner(t) + α(t) · [x − w_winner(t)]   (1)
(i.e. move the prototype closer to the current input)
ELSE
w_winner(t + 1) = w_winner(t) − α(t) · [x − w_winner(t)]   (2)
(i.e. move the prototype away from the current input)
STEP 5: Reduce the learning rate α(t), where t = current training iteration
STEP 6: Check the stop condition. It can be triggered by reaching a predefined number of iterations or by the learning rate dropping below a certain value.
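
A minimal MATLAB sketch of the supervised procedure above. It assumes X (an nSamples x nFeatures matrix of training inputs), C (their class labels), M and maxIter are already defined; the iteration budget and the multiplicative learning-rate decay are illustrative assumptions, not part of the algorithm itself:

W  = X(1:M, :)';            % STEP 0: prototypes = first M training vectors
Cw = C(1:M);                % labels of the prototypes
alpha = 0.5;                % alpha(0), the initial learning rate
for t = 1:maxIter                           % STEP 1
    for k = 1:size(X, 1)                    % STEP 2
        x = X(k, :)';
        d = sqrt(sum((W - x).^2, 1));       % STEP 3: Euclidean distances
        [~, j] = min(d);                    % index of the winning prototype
        if Cw(j) == C(k)                    % STEP 4: rule (1) - attract
            W(:, j) = W(:, j) + alpha*(x - W(:, j));
        else                                %         rule (2) - repel
            W(:, j) = W(:, j) - alpha*(x - W(:, j));
        end
    end
    alpha = 0.9*alpha;                      % STEP 5: reduce the learning rate
    if alpha < 1e-3, break; end             % STEP 6: stop condition
end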
For the Learning Vector Quantization, an alternative unsupervised learning algorithm can be developed. This time the labels of both the prototypes and the input data are unknown, the goal of the training being to gain insight into the data (i.e. to discover how many classes, or clusters, there may be inside the database).
The algorithm is similar to its supervised version:
STEP 0: Initialize the weight vectors to the first M training vectors, where M is the number of different categories, and set α(0), the initial learning rate. This weight initialization technique is only one of many possible methods.
STEP 1: While the stopping condition is false, do steps 2 to 6:
STEP 2: For each training input x, do steps 3 and 4:
STEP 3: Find the prototype w_winner closest to the current input x using an appropriate measure (in most cases, the Euclidean distance is computed)
STEP 4: Adjust the prototypes in the network according to the rules:
w_winner(t + 1) = w_winner(t) + α(t) · [x − w_winner(t)]   (3)
(i.e. move the winning prototype closer to the current input)
and, for all the other prototypes j ≠ winner:
w_{i,j}(t + 1) = w_{i,j}(t) − α(t) · [x_i − w_{i,j}(t)]   (4)
(i.e. move all the prototypes different from the winner away from the current input)
Here w_{i,j} denotes the weight connecting node i in the input layer to node j in the competitive layer. Recall that a prototype j is defined by the collection of weights targeting the same competitive node j.
STEP 5: Reduce the learning rate α(t), where t = current training iteration
STEP 6: Check the stop condition. It can be triggered by reaching a predefined number of iterations or by the learning rate dropping below a certain value.
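
Compared with the supervised MATLAB sketch above, only STEP 4 changes; a fragment of the modified update (the winner index is found exactly as before):

% STEP 4 (unsupervised): rules (3) and (4) - attract the winner, repel the rest
for j = 1:size(W, 2)
    if j == winner
        W(:, j) = W(:, j) + alpha*(x - W(:, j));   % rule (3)
    else
        W(:, j) = W(:, j) - alpha*(x - W(:, j));   % rule (4)
    end
end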

Running the network:


After the training process is completed, the network can be used in classification problems that operate on data from the same feature space as the training data. The process used to map an unlabeled input to one of the classes defined by the prototypes is straightforward: the input is mapped to the class defined by the closest prototype (the distance is implemented by an appropriate measure, usually the Euclidean distance). In the case of the LVQ with unsupervised training, the prototypes must first be labeled, and then the mapping is performed by the same process.
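
The classification step itself takes only a few MATLAB lines, assuming the trained prototype matrix W and the prototype labels Cw from the sketches above:

d = sqrt(sum((W - x).^2, 1));   % distance from input x to every prototype
[~, j] = min(d);                % the closest prototype wins
predictedClass = Cw(j);         % the input inherits the winner's class label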

Example:
Consider the following two-dimensional training database, in which the vectors are also labeled. The figure below presents the evolution of the training algorithm when the inputs are presented in the same order as they appear in the database; the initial state of the prototypes is randomly chosen. For ease of computation the learning rate is considered constant, equal to 0.5.
x1   x2   Class
1    3    1
3    4    1
6    1    2
8    3    2
9    1    2
1    6    1

α = 0.5 (const)
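
To make the first iteration concrete, assume (as one possible random initialization) that the prototypes start at w1 = (2, 2) with label 1 and w2 = (7, 2) with label 2. The first input is x = (1, 3), labeled 1. The distances are ||x − w1|| = sqrt(2) ≈ 1.41 and ||x − w2|| = sqrt(37) ≈ 6.08, so w1 wins; since the labels match, rule (1) gives w1 ← (2, 2) + 0.5 · [(1, 3) − (2, 2)] = (1.5, 2.5).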

Figure 2. Evolution of the LVQ supervised training method

SELF ORGANIZING MAP


Overview:
A Self Organizing Map (SOM), or Kohonen Neural Network, is a neural network trained in an unsupervised manner, able to produce a representation of the input features in a lower dimensional space, called a map. What is important is that the map consists of nodes that are continuously tweaked in the learning stage to resemble the topology of the input data as closely as possible. Moreover, neighboring nodes come to resemble each other, defining regions of resemblance and making the map a good data classifier for subsequent problems running on the same feature space.
Architecture:
A SOM consists of two fully interconnected layers of nodes - see figure 3 - with weights assigned to the connections between the nodes. The input layer has the role of distributing the components of the inputs towards the nodes in the output layer, whereas the output layer nodes define an organized pattern (usually a 2D array) called a map. Each of these nodes has a position in the map and an associated n-dimensional vector defined by the weights of the connections targeting it. When the training of a map is completed, these nodes will resemble the input data and, moreover, neighboring nodes will resemble each other. Of course, the comparison is made based on the vector of weights associated with each node.

Figure 3. The architecture of a self organizing map


Training:
The goal of the training for a SOM is to adjust the weights corresponding to each output in order to create the topological resemblance with the input database. As in the LVQ case, the training of the network is based on the competitive concept, whereby a winning unit, also known as the Best Matching Unit or BMU, is moved closer (in the feature space) to the current input. The main difference resides in the fact that the SOM training algorithm also adjusts the positions of the neighbors of the BMU, bringing them closer to the input as well. Also, the training algorithm for the SOM is unsupervised. The full training algorithm is based on the following steps:
STEP 0: Initialize the weights w_{i,j} (in many cases small random values are used)
STEP 1: While the stopping condition is false, do steps 2 to 6:
STEP 2: Pick an input at random
STEP 3: Compute a measure of resemblance (usually the Euclidean distance) between the input and all the nodes in the map
STEP 4: Extract the closest node -> the BMU is defined
STEP 5: Adjust the weights of the BMU and its neighbors using the rule:
w_{i,j}(t + 1) = w_{i,j}(t) + α(t) · θ(j, t) · [x_i − w_{i,j}(t)]   (5)
where:
t = current iteration of the algorithm
α(t) = learning rate
θ(j, t) = neighborhood function. It activates the updating process depending on the position of node j in the map relative to the BMU and defines a degree of change depending on iteration step t. In some implementations the degree of change is strict: 0 or 1, depending on the position of the node.
x_i = component i of the current input
w_{i,j} = the weight from input node i to output node j
STEP 6: Reduce the learning rate α(t)

The stopping condition of the algorithm may be the completion of a certain (large) number of iterations or the learning rate reaching a predefined value.
Some examples of neighborhood functions are:

θ(j, t) = 1, if ||p_j − p_BMU|| ≤ r(t); 0, otherwise   (6)

θ(j, t) = exp( −||p_j − p_BMU||² / (2σ²(t)) )   (7)

where:
r(t) = the radius of the circle around the BMU - decreases monotonically with t
σ(t) = the width of the Gaussian - decreases monotonically with t
p_j ∈ R², p_BMU ∈ R² = the positions of the current node j and of the BMU in the map
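
A minimal MATLAB sketch of one SOM training iteration using the Gaussian neighborhood (7). It assumes the inputs are stored as the columns of X; the map layout P, the decay factors and the variable names are illustrative assumptions:

% W: nFeatures x nNodes weights; P: 2 x nNodes node positions in the map grid
x = X(:, randi(size(X, 2)));                % STEP 2: pick a random input
d = sqrt(sum((W - x).^2, 1));               % STEP 3: distance to every node
[~, bmu] = min(d);                          % STEP 4: the Best Matching Unit
dMap  = sqrt(sum((P - P(:, bmu)).^2, 1));   % node-to-BMU distances in the map
theta = exp(-dMap.^2 / (2*sigma^2));        % neighborhood function (7)
W = W + alpha * (x - W) .* theta;           % STEP 5: update rule (5)
alpha = 0.95*alpha;  sigma = 0.95*sigma;    % STEP 6 + shrinking neighborhood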
An example of a trained SOM is presented below in figure 4. The map is represented as a 2D array and was trained with three-dimensional vectors representing 24-bit RGB colors. The nodes contain weight vectors that resemble the colors in the input space and, moreover, resemble those of their neighbors.

Figure 4. An example of a trained Self Organizing Map



Running the network:


After the training process is completed, the network can be used in a classification problem, with the a priori condition that all the nodes in the map are labeled. The labeling process is performed by first assigning labels to the input data in the training set and then mapping the data onto the trained nodes. Each node is assigned the class of the majority of inputs that mapped onto it. Note that some nodes may not be assigned to any class, resulting in 'dead' nodes that cannot be used in the classification. The classification process itself is identical to the one presented for the LVQ: for a given unknown input, the distance to all the nodes in the map is computed (usually the Euclidean distance) and the unknown input is given the same class label as the closest node.
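
A sketch of the majority-vote labeling described above, assuming trainLabels holds the class of each training input and bmuOf(k) stores the index of the node that input k mapped onto (both names are illustrative):

nodeLabels = NaN(1, nNodes);              % NaN marks a 'dead', unused node
for j = 1:nNodes
    hits = trainLabels(bmuOf == j);       % labels of inputs mapped to node j
    if ~isempty(hits)
        nodeLabels(j) = mode(hits);       % the majority class wins
    end
end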

BIDIRECTIONAL ASSOCIATIVE MEMORY


Overview:
A Bidirectional Associative Memory (BAM) is a two-layer recurrent network capable of storing pairs of bipolar patterns (a_i, b_i) of sizes n and m, and of retrieving the pattern b_i for the input a_i or vice versa. The information in the network flows from one layer to the other in an oscillatory manner until a stable state is reached, which is equivalent to finding the searched pattern.
Architecture:
The Bidirectional Associative Memory is made up of two layers of neurons activated by the sign function. The input layer contains units which receive the input to the network and send the result of their computation to the output layer. The output of the first layer is thus transported by bidirectional edges to the second layer of units, which then return the result of their computation back to the first layer using the same edges. After a few iterations the network reaches a stable state, in which the information being sent back and forth no longer changes. Last but not least, it must be said that the information is coded using bipolar values: {-1, +1}.

Figure 5. Bidirectional Associative Memory network model


Training:
The BAM network uses a Hebbian-type learning algorithm. The task of the training / learning algorithm is to establish the set of weights w_ij that permit the storing of the vector pairs (a_1, b_1), (a_2, b_2), ..., (a_M, b_M), where M is established to be min(n, m). For improved pattern recovery performance it is important that the vectors are pairwise orthogonal within their respective groups. The expression by which the set of pairs is stored is:

W = a_1^T · b_1 + a_2^T · b_2 + ... + a_M^T · b_M   (8)

where W is the n x m matrix of weights:

W = [ w_11  ...  w_1m
      ...
      w_n1  ...  w_nm ]   (9)
Running the network:


As stated before, the network is based on a recurrent informational flow that passes from one layer to the other using the same bidirectional edges (Figure 5). Using synchronous or asynchronous update, the network always reaches, in a finite number of iterations, a stable state in which the states of the first and second layers of neurons no longer change from iteration to iteration. The expressions that describe the functioning are (for x_0 = a_i):

forward pass - step 0:
y_0 = sign(x_0 · W)   (10)

backward pass - step 1:
x_1 = sign(y_0 · W^T)   (11)

The two computational steps are performed until a pair (x_k, y_k) of input-output states remains unchanged from one iteration to the next. At this point b_i = y_k and the stored information has been recovered.
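
A minimal MATLAB sketch that builds W with rule (8) and recalls a pair with the passes (10)-(11); the patterns are those of Task 3A below, and everything else is an illustrative choice:

A = [ 1 -1  1; -1  1 -1];     % stored a-patterns, one per row (n = 3)
B = [-1 -1;     1  1];        % stored b-patterns, one per row (m = 2)
W = A' * B;                   % rule (8): W = sum over k of a_k' * b_k
x = [1 -1 1];                 % probe the memory with a1
y = sign(x * W);              % forward pass  (10)
while true
    xNew = sign(y * W');      % backward pass (11)
    yNew = sign(xNew * W);
    if isequal(xNew, x) && isequal(yNew, y), break; end   % stable pair found
    x = xNew;  y = yNew;
end
disp(y)                       % prints -1 -1, i.e. the stored pattern b1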

HOPFIELD NETWORKS
Overview:
Hopfield networks represent a recurrent neural network model in which the state of a neuron does not change until the update process is performed on that neuron itself (i.e. the network uses asynchronous update; the probability that two neurons fire simultaneously is zero). Hopfield networks act as memories and are used to recover information, represented in bipolar form, that has been affected by unwanted modifications. They represent a special case of a bidirectional associative memory, although chronologically the Hopfield network was proposed before the BAM.
Architecture:
The architecture of a Hopfield NN is depicted below in figure 6.

Figure 6. Hopfield network model.

One should notice that there is only a single layer of neurons, completely interconnected. The blue, green and magenta traces lead the output states of the neurons towards their neighbors, where the values are fed back as inputs - thus the network can also be described as recurrent. The initial states of the neurons are loaded from the left side through the inputs (black traces). The output of the network is obtained once all the neurons have been updated and their new states no longer differ from the previous ones. At that point the network has reached an energy minimum (the network is relaxed) and the output can be read (right-side black traces in the figure).
Training:
The Hopfield network uses a "one pass" Hebbian-derived training rule. If we want to "imprint" m different stable states (patterns that the network "remembers"), then the training problem reduces to finding adequate weights for the connections between neurons. It is important to mention that the stable states are represented only using bipolar values {1, -1}.
Hebbian learning is implemented by loading the m selected n-dimensional stable states x_1, x_2, ..., x_m and updating the weights (initially set to zero) after each presentation according to the rule:

w_ij <- w_ij + x_i^k · x_j^k,   where i, j = 1, ..., n and i ≠ j   (12)

The symbols x_i^k, x_j^k denote the i-th and j-th components of the stable state x_k. After the presentation of the first vector, the weight matrix describing the connections of the network is given by the following expression:

W_1 = x_1^T · x_1 − I   (13)

where I denotes the n x n identity matrix. Subtraction of the identity matrix guarantees that the main diagonal of W becomes zero, as the Hopfield model implies no connection between a neuron and itself. Another obvious property of W is its symmetry. At this point it is important to mention that the necessary and sufficient conditions for a Hopfield network with asynchronous update to reach a stable state are the two mentioned above: W must be symmetric and have a null main diagonal.
After all the m training vectors (stable states) have been presented, W will be described by:

W = Σ_{k=1}^{m} x_k^T · x_k − m · I   (14)

Best results are obtained when the learning vectors x_1, x_2, ..., x_m are orthogonal or close to orthogonal, just as in the case of any other associative memory.
Running the network:
After the weights are established (i.e. W is computed), the network can be used to recover the memorized patterns. Starting from an arbitrary input state and using asynchronous update, the network will most likely reach the stable state that differs least from the input.
The network runs by randomly selecting a neuron, computing its excitation and then changing its state accordingly - see the expression below.

s_i(t + 1) = +1, if Σ_j w_ij · s_j(t) ≥ θ_i; −1, otherwise   (15)

where:
w_ij = the weight of the connection from the j-th neuron to the current i-th neuron
s_j = the state of the j-th neuron
θ_i = the activation threshold of the current neuron
If all the neurons in the network have been updated and their states did not change, then the network has reached a stable state. From this point on, no update will change the state of the network.
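
A minimal MATLAB sketch for Task 4 below: the two patterns are imprinted with rule (14) and the network is run with asynchronous updates following rule (15). The iteration budget and the corrupted start state are illustrative choices:

X = [ 1 -1  1; -1  1 -1];          % stable states to imprint, one per row
n = size(X, 2);
W = X' * X - size(X, 1)*eye(n);    % rule (14): sum of xk' * xk minus m*I
theta = 0.5*ones(1, n);            % activation thresholds
s = [1 -1 -1];                     % start state: first pattern, one bit flipped
for t = 1:100
    i = randi(n);                  % asynchronous update: one neuron at a time
    if W(i, :)*s' >= theta(i)      % excitation vs. threshold, rule (15)
        s(i) = 1;
    else
        s(i) = -1;
    end
end
disp(s)                            % typically relaxes to one of the stored states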
TASKS:
1. Work through the exercise presented in the LVQ model description. How will the choice of the initial weight values and of the class labels for the prototypes influence the training?
2. What is the role of the neighborhood function in the SOM training algorithm? How is the SOM
training influenced by the choice of initial weights? Can the SOM training be implemented using a
parallel architecture? What about classification?
3. Using MATLAB train a BAM network and test its functionality on the following patterns:
A) ( [1, -1, 1], [-1, -1] )
( [-1, 1, -1], [ 1, 1] )

B) ( [1, -1, 1, -1, 1, -1], [1, 1, -1, -1] )


( [1, 1, 1, -1, -1, -1], [1, -1, 1, -1] )

4. Using MATLAB train a Hopfield network to store the [1, -1, 1] and [-1, 1, -1] as stable states. Test
its functionality using thresholds of 0.5 for the neurons.

