(2.3)

where $x_j$ is the $j$th component of the input vector, and $y_{ij}$ is the $j$th component of the codeword $y_i$. Associated with each codeword $y_i$ is a nearest-neighbor region called the Voronoi region. The set of Voronoi regions partitions the entire space $R^k$. The Voronoi region is defined by:

$$V_i = \{x \in R^k : \|x - y_i\| \le \|x - y_j\|, \ \text{for all } j \ne i\} \qquad (2.4)$$
Fig-2.5(a): A VQ encoder. The input vector $x$ is compared against the codebook entries $x_1, x_2, \ldots, x_N$, and the index $i$ achieving $\min[d(x, x_i)]$, $i = 1, 2, \ldots, N$, is output.

Fig-2.5(b): A VQ decoder. The received index $i$ addresses a look-up table (the same codebook $x_1, x_2, \ldots, x_N$) to produce the output vector $x_i$.
Fig-2.6: Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.

Figure 2.6 is a two-dimensional Voronoi diagram. Here codewords are present in 2-dimensional space. A vector quantizer with minimum encoding distortion is called a Voronoi quantizer or nearest-neighbor quantizer. A good vector quantization system should have low computational complexity and a high compression ratio. It has received great interest in many applications because it can provide a high compression ratio and a simple decoding process. Vector quantization is a well-known lossy compression method which can be applied to both images and signals, including biosignal applications. The general vector quantization deals with vectors, whereas scalar quantization is its special case (dealing with vectors with one element). There are many vector quantization variations which typically perform better, but usually they have larger computational complexity.
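To make the encoder of Fig-2.5(a) and the decoder of Fig-2.5(b) concrete, the following is a minimal NumPy sketch of nearest-neighbor VQ; the function and variable names are illustrative rather than taken from the thesis.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Return, for each input vector, the index of the nearest codeword,
    i.e. the codeword whose Voronoi region contains the vector."""
    # Squared Euclidean distance between every input vector and every codeword.
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Table look-up: replace each transmitted index by its codeword."""
    return codebook[indices]

# Example: 100 four-dimensional vectors quantized with an 8-word codebook.
rng = np.random.default_rng(0)
x = rng.random((100, 4))
codebook = rng.random((8, 4))
x_hat = vq_decode(vq_encode(x, codebook), codebook)
```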
CHAPTER 3
LEARNING VECTOR QUANTIZATION ALGORITHM
3.1 LEARNING VECTOR QUANTIZATION (LVQ) ALGORITHMS FOR IMAGE
COMPRESSION
In 1989 Teuvo Kohonen initiated the study of the prototype generation algorithm called learning vector quantization (LVQ). LVQ is the name used for vector quantization algorithms implemented by training a competitive neural network with a gradient descent algorithm. Gradient descent based minimization allows the development of LVQ algorithms capable of minimizing a broad variety of objective functions that cannot be treated by conventional optimization methods frequently used for developing clustering and vector quantization algorithms [5]. LVQ is a competitive network where the output is known. The algorithm is used to maximize the number of correctly classified inputs. In an LVQ network, target values are available for the input training patterns and the learning is supervised. The LVQ scheme is a simple one and it gives a class of adaptive techniques for constructing vector quantizers. An advantage of the LVQ algorithm is that it creates prototypes that are easy for experts in the field to interpret. It can be applied to pattern recognition, multi-class classification and data compression tasks, e.g. speech recognition, image compression, image processing or custom classification. In many cases, LVQ algorithms can achieve better results than other neural network classifiers in spite of their simple and time-efficient training process. Before describing the LVQ algorithm, neural networks are briefly discussed.
3.2 NEURAL NETWORK (NN)
A neural network (NN) can be defined as a massively parallel distributed processor that
has a natural propensity for storing experiential knowledge and making it available for use [7].
NNs derive their intelligence from the collective behavior of simple computational mechanisms at individual neurons. NNs are useful for recognizing patterns, classifying inputs, and adapting to dynamic environments by learning. However, the internal mapping structure of an
NN is often treated like a black box. Typically, these networks are composed of
collections of processing elements. The nodes in neural networks are called processing elements,
and the directed links (information channels) are called interconnects. An NN topology is
specified by its interconnection scheme, the number of layers and the number of nodes per layer.
Figure 3.1 gives the structure of a three-layered neural network, represented by a set of nodes and arrows. In this figure three types of nodes are present (input/hidden/output). The input
nodes receive the signals and the output nodes encode the concepts (or values) to be assigned.
The nodes in the hidden layers are not directly observable. Hidden layer nodes provide the
required degree of non-linearity for the network. According to the learning process, neural
networks are divided into two kinds:
a) Supervised and
b) Unsupervised.
The difference between them lies in how the networks are trained to recognize and categorize
the objects.
Fig-3.1: The structure of a representative three-layered neural network. Input nodes are marked as $x_1$ and $x_2$. Output nodes are marked as $y_1$, $y_2$ and $y_3$. $w^1_{ij}$ and $w^2_{ij}$ are the weights associated with the links between the input and the hidden layer nodes and the links between the hidden and the output layer nodes, respectively.
3.2.1 SUPERVISED LEARNING
Learning with supervision, or with a teacher, is known as supervised learning. In
this learning method, the network is given input samples from a training data set, along with the
current classification of each sample, and the network produces an output, signifying its best
guess for the classification of each input object. The network compares its output with the
correct, or target output which was specified by the user along with the input data. The network
then adjusts its internal components (connection weights) to make its output agree more closely
with the target output. In this way the network learns the correct classification of its training data
set [8]. We can say that, in this type of learning, training inputs are provided with the desired
outputs. For example, in a classification problem, the learner approximates a function mapping
a vector into classes by looking at input-output examples of the function. The output of the
function can be a continuous value (called regression), or can predict a class label of the input
object (called classification). The task of the supervised learner is to predict the value of the
function for any valid input object after having seen a number of training examples (i.e. pairs of
input and target output). To achieve this, the learner has to generalize from the presented data to
unseen situations in a "reasonable" way [9]. Depending on the nature of the teacher's information, there are two approaches to supervised learning. One is based on the correctness of the decision and the other on the optimization of a training cost criterion. Supervised learning can generate models of two types. Most commonly, supervised learning generates a global model that maps input objects to desired outputs. In some cases, however, the map is implemented as a set of local models (such as in case-based reasoning or the nearest neighbor algorithm). This learning has been used in a wide range of applications including signal and image processing, speech recognition, system identification, automatic diagnosis, prediction of stock prices, signature authentication, detection of events in high-energy physics, etc. [10]. Popular supervised learning algorithms include the Perceptron learning algorithm, the Least Mean Square (LMS) algorithm, and the Back propagation algorithm.
3.2.2 UNSUPERVISED LEARNING
Learning without supervision is known as unsupervised learning. Using no supervision from any teacher, unsupervised networks adapt the weights and verify the results only on the input patterns. Unsupervised networks are also called self-organizing networks. In the unsupervised method, samples are input into the network and the network must determine the correlations between the objects and produce an output in the correct class for each input object. In essence, the unsupervised algorithm must have some internal means of differentiating objects in order to classify them [8]. In this learning, the system parameters are adapted using only the information of the input and are constrained by pre-specified internal rules. These neural networks cluster, code or categorize input data. Similar inputs are classified as
being in the same category, and should activate the same output unit, which corresponds to a
prototype of the category [10]. Unsupervised classification procedures are often based on some
kind of clustering strategy, which forms groups of similar patterns. The clustering technique is
very useful for pattern classification problems. Furthermore, it plays an important role in many
competitive learning networks. Unsupervised learning is very important in neural networks
because it leads to effective dimensionality reduction in the input data. This is achieved by
discovering a smaller number of features to work with than were present in the raw data, based
on statistical regularities in this data [11]. This is important since it is likely to be much more
common in the brain than supervised learning. This kind of learning, using the LVQ technique, has been successfully employed in several application fields. Two very simple classic examples of unsupervised learning are clustering and dimensionality reduction. These types of NNs have been widely used in clustering tasks, feature extraction, data dimensionality reduction, data mining (data organization for exploring and search), information extraction, density approximation, etc. [10]. Unsupervised learning includes competitive learning and the self-organizing map, which are discussed next.
3.3 TYPES OF UNSUPERVISED LEARNING
Now we discuss competitive learning and the self-organizing map in this sub-section.
3.3.1 COMPETITIVE LEARNING (CL)
Competitive learning (CL) networks [12] are unsupervised neural networks where only
the active neurons are allowed to update their weight vectors. A basic competitive learning
model consists of feed forward and lateral networks with fixed output nodes (i.e. fixed number of
clusters). Here the output neurons of a neural network compete among themselves to become
active. As a result only one output neuron is activated at any given time [11]. This is achieved by means of a so-called winner-take-all (WTA) operation [13]. The network, in its simplest form, works in accordance with the WTA strategy. The input and output nodes are assumed to have binary
values (1 or 0). An input pattern x is a sample point in the n-dimensional real or binary vector
space. That is, there are as many output neurons as the number of classes and each output node
represents a pattern category. CL is useful for classification of input patterns into a discrete set of
output classes. The simplest CL network is shown in figure-3.2, in which all inputs are connected to a single layer of output neurons.
Fig-3.2: Competitive learning network with inputs $x_1, x_2, x_3$ and outputs $y_1, y_2, y_3$. The solid lines indicate excitatory connections whereas the dashed lines indicate inhibitory connections.
The winner-take-all operation is implemented by connecting the outputs to the other neurons. As a result of competition, the winner of each iteration, element $i^*$, is the element whose total weighted input is the largest:

$$W_{i^*} \cdot X \ge W_i \cdot X, \quad \text{for all } i \ne i^*, \qquad (3.1)$$

i.e., the unit with the largest activation becomes the winner. In the case of normalized inputs, the unit with $i^*$ produces the smallest activation in terms of

$$\|W_{i^*} - X\| \le \|W_i - X\|, \quad \text{for all } i \ne i^*, \qquad (3.2)$$

that is, the unit with normalized weight closest to the input vector becomes the winner. In fact, the winning neuron can be found by a simple search for the maximum or minimum activation [10]. This neuron updates its weight while the weights of the other neurons remain unchanged. A simple competitive weight updating rule is the following:

$$\Delta w_{ij} = \begin{cases} \eta\,(x_j - w_{ij}) & \text{if } i = i^* \\ 0 & \text{if } i \ne i^* \end{cases} \qquad (3.3)$$

where $\eta$ is a constant, $x_j$ is the $j$th component of the $n$-dimensional input vector and $\Delta w_{ij}$ is the change in the weight vectors. These types of learning algorithms are frequently based on a minimum loss function.
Although Kohonen's competitive learning (KCL) network is originally not a clustering method, it can be used as a prototype generation algorithm called learning vector quantization (LVQ) [12]. In recent years, competitive learning algorithms have been widely used for vector quantization methods. Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model, described next.
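As an illustration, one winner-take-all step in the sense of Eqs. (3.2) and (3.3) might be sketched as follows; NumPy is assumed, and the names and the value of $\eta$ are illustrative.

```python
import numpy as np

def cl_step(x, W, eta=0.05):
    """One competitive learning step: find the winner by minimum distance
    (Eq. 3.2) and move only its weight vector toward x (Eq. 3.3)."""
    i_star = np.argmin(((W - x) ** 2).sum(axis=1))  # winning neuron
    W[i_star] += eta * (x - W[i_star])              # losers stay unchanged
    return i_star
```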
3.3.2 SELF-ORGANIZING MAP (SOM)
SOM [8] or self-organizing feature map (SOFM) is a special type of competitive
learning network, where the neurons have a spatial arrangement, i.e. the neurons are typically
organized in a line or a plane [10]. SOFM has formed a basis for a great deal of research into
applying network models to the problem of codebook design in vector quantization. Professor
Teuvo Kohonen introduced SOM as the concept of classes ordered in a topological map. One
of the most interesting aspects of SOMs is that they learn to classify data without supervision.
This unsupervised Artificial Neural Network (ANN) is mathematically characterized by transforming high-dimensional data into a two-dimensional representation, enabling automatic clustering of the input while preserving higher order topology [8]. It is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. During training, the weight vectors associated with the winner neuron and its neighbors are updated [10]. The SOM architecture is given in figure 3.4. It is sometimes called a Kohonen map. Kohonen's method consists of one layer of neurons and uses the method of competitive learning with a winner-take-all approach. Its architecture consists of input or training data vectors $X$ which are mapped onto a two-dimensional lattice, and each node on the lattice has an associated reference or node weight vector $W$.
Fig-3.4: A Self-Organizing Map, showing a very small Kohonen network of 3 x 3 nodes connected to the input layer ($x_1$, $x_2$, $x_3$) representing a two dimensional vector.
The lines connecting the nodes in Figure 3.4 are only there to represent adjacency and do
not signify a connection as normally indicated when discussing a neural network. There are no
lateral connections between nodes within the lattice. To determine the best matching unit
(BMU), one method is to iterate through all the nodes and calculate the Euclidean distance
between each node's weight vector and the current input vector. The node with a weight vector
closest to the input vector is tagged as the BMU.
The Euclidean distance is given as:

$$D = \sum_{i=0}^{n} (x_i - W_i)^2 \qquad (3.4)$$
The weights of the BMU and neurons close to it in the SOM lattice are adjusted towards the input vector. The magnitude of the change decreases with time and with distance from the BMU. The reference vectors are then updated using the following rule:

$$W(t+1) = W(t) + \Theta(t)\,\alpha(t)\,(X(t) - W(t)) \qquad (3.5)$$

where $t$ represents the time-step and $\alpha$ is a small variable called the learning rate, $0 < \alpha(t) < 1$, which decreases with time and modulates the weight update. $\Theta(t)$ is used to represent the amount of influence a node's distance from the BMU has on its learning. Thus $\Theta(t)$ is given as:

$$\Theta(t) = \exp\left(-\frac{dist^2}{2\sigma^2(t)}\right), \quad t = 1, 2, 3, \ldots \qquad (3.6)$$

where $dist$ represents the distance of a node from the BMU and $\sigma$ is the width of the neighborhood function, which can be calculated as:
$$\sigma(t) = \sigma_0 \exp\left(-\frac{t}{\lambda}\right), \quad t = 1, 2, 3, \ldots \qquad (3.7)$$

where $\sigma_0$ denotes the width of the lattice at time $t_0$, $\lambda$ denotes a time constant, and $t$ is the current time-step (the same as the current iteration of the loop). The decay of the learning rate is calculated at each iteration using the following equation:

$$\alpha(t) = \alpha_0 \exp\left(-\frac{t}{\lambda}\right), \quad t = 1, 2, 3, \ldots \qquad (3.8)$$

where $\alpha_0$ is the learning rate at time $t_0$.
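A compact sketch of one SOM training step combining Eqs. (3.4)-(3.8) is given below; here `grid` is assumed to hold the two-dimensional lattice coordinates of each node, and all names and parameter values are illustrative.

```python
import numpy as np

def som_step(x, W, grid, t, lam, sigma0, alpha0):
    """One SOM update: BMU search (Eq. 3.4), decayed width and learning
    rate (Eqs. 3.7-3.8), neighborhood (Eq. 3.6), weight move (Eq. 3.5)."""
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))       # Eq. (3.4)
    sigma = sigma0 * np.exp(-t / lam)                 # Eq. (3.7)
    alpha = alpha0 * np.exp(-t / lam)                 # Eq. (3.8)
    dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)     # lattice distance to BMU
    theta = np.exp(-dist2 / (2.0 * sigma ** 2))       # Eq. (3.6)
    W += alpha * theta[:, None] * (x - W)             # Eq. (3.5)
    return bmu
```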
SOMs are different from other artificial neural networks in the sense that they use a
neighborhood function to preserve the topological properties of the input space. An SOM enables
us to have an idea about the statistical distribution of the input vectors on the output layer. It is a
competitive process which can also be called vector quantization. Most SOM applications use
numerical data and are popularly employed in several applications e.g. automatic speech
recognition, clinical voice analysis, monitoring of the condition of industrial plants and
processes, cloud classification from satellite images, analysis of electrical signals from the brain,
organization of and retrieval from large document collections, environmental modeling, analysis
and visualization of large collections of statistical data, etc. The SOFM learning method may be viewed as the first of two stages of a classification algorithm, with LVQ performing the second and final stage. Improved classification performance can be obtained by using this algorithm in combination with a supervised learning technique, such as LVQ, described next.
3.4 LEARNING VECTOR QUANTIZATION (LVQ)
LVQ is an online algorithm whose observations are processed one at a time. It is based
on the SOM or Kohonen feature map. It should be noted that LVQ is not a clustering algorithm.
This is a "nearest neighbor" neural net in which each node is designated a class via its desired output.
This learning technique uses the class information to reposition the Voronoi vectors slightly, so
as to improve the quality of the classifier decision regions. Here classes are predefined and the
model vectors are labeled by symbols corresponding to the predefined classes. This
algorithm can be used when we have labeled input data. The basic LVQ algorithm is actually
quite simple. This algorithm improves the separation of classes in the solution suggested by the unsupervised training, for example by the SOM algorithm. For a given input, the method consists in bringing the most activated neuron closer if it is in the right class (supervised training), or pushing it further away in the opposite case. The other neurons (i.e. losers) remain unchanged. Each
neuron thus becomes class-representative. LVQ belongs to the hard vector quantization group. In the hard approach, each input is associated only with the group with the nearest center. The goal is to determine a set of prototypes that best represent each class. It is applicable to p-dimensional unlabeled data [13]. In LVQ, cluster substructure hidden in unlabeled p-dimensional data can be discovered. The architecture of the LVQ is also similar to that of the Kohonen feature map, without a topological structure assumed for the output units. The network has three layers: an input layer, a Kohonen classification layer, and a competitive output layer. The basic architecture of the LVQ neural network is shown in Fig. 3.5, where the LVQ network contains an input layer, a Kohonen layer which learns and performs the classification, and an output layer. The input layer contains one node for each input feature; the Kohonen layer contains equal numbers of nodes for each class; in the output layer, each output node represents a particular class [6], or we can say the output unit has a known class, since it uses supervised learning. Hence, it differs from Kohonen's SOM, which uses unsupervised learning. In terms of neural networks, an LVQ is a feed forward net with one hidden layer of neurons, fully connected with the input layer. A codebook vector (CV) can be seen as a hidden neuron (Kohonen neuron) or a weight vector of the weights between all input neurons and the regarded Kohonen neuron, respectively.
Fig-3.5: LVQ architecture: one hidden layer with Kohonen neurons, adjustable weights (CVs) between the input layer ($x_1, x_2, x_3$) and the hidden layer, and a winner-takes-all output mechanism (one $y = 1$, the others $= 0$).
As we know, LVQ uses the same internal architecture as SOM: a set of n-dimensional input vectors is mapped onto a two-dimensional lattice, and each node on the lattice has an associated n-dimensional reference vector. The learning algorithm in LVQ, i.e., the method of updating the reference vectors, is different from that in SOM. Because LVQ is a supervised
method, during the learning phase, the input data are tagged with their correct class and each
output neuron represents a known category [8]. We define the input vector x as
$$x = (x_1, x_2, x_3, \ldots, x_n)$$

and the reference vector for the $i$th output neuron $w_i$ as

$$w_i = (w_{1i}, w_{2i}, w_{3i}, \ldots, w_{ni}).$$

We define the Euclidean distance between the input vector and the reference vector of the $i$th neuron as:

$$D(i) = \sum_{j=1}^{n} (x_j - w_{ji})^2 \qquad (3.9)$$

The input vectors are compared to the reference vectors and the closest match, for which $D(i)$ is a minimum, is found. The winning reference vector $w_{i^*}$ is then obtained by the formula

$$\|w_{i^*} - x\| \le \|w_i - x\|. \qquad (3.10)$$
The reference vectors are then updated using the following rules:
$$w_{i^*}(\text{new}) = w_{i^*}(\text{old}) + \alpha(t)\,(x - w_{i^*}(\text{old})) \quad \text{if } x \text{ is in the same class as } w_{i^*},$$

$$w_{i^*}(\text{new}) = w_{i^*}(\text{old}) - \alpha(t)\,(x - w_{i^*}(\text{old})) \quad \text{if } x \text{ is in a different class from } w_{i^*},$$

$$w_i(\text{new}) = w_i(\text{old}) \quad \text{if } i \text{ is not the index of the winning reference vector.}$$

The learning rate $0 < \alpha(t) < 1$ should generally be made to decrease monotonically with time and can be defined as:

$$\alpha(t) = \alpha_0\,(1 - t/T) \qquad (3.11)$$
where $\alpha_0$ is the learning rate at time $t_0$ and $T$ is the total number of learning iterations. The LVQ training algorithm aims at producing highly discriminative reference vectors through learning [6]. There are several versions of the LVQ algorithm for which different learning rules are used. LVQ algorithms are a family of training algorithms for nearest-neighbor classifiers, which include OLVQ1 (the optimized version of LVQ1), LVQ2 and its improved versions LVQ2.1, LVQ3, etc. [8]. All these algorithms are intended to be applied as extensions to the previously discussed (O)LVQ1 (Kohonen recommends an initial use of OLVQ1 and continuation with LVQ1, LVQ2.1 or LVQ3 with a low initial learning rate) [19]. OLVQ1 is the same as LVQ1, except that each codebook vector has its own learning rate. The popular LVQ2, LVQ2.1 and LVQ3 algorithms are briefly discussed next.
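Before turning to those variants, the basic LVQ1 loop implied by Eqs. (3.9)-(3.11) can be sketched as follows; this is a simplified one-vector-per-step scheme with illustrative names and parameters.

```python
import numpy as np

def lvq1_train(X, labels, W, W_labels, alpha0=0.1, T=1000):
    """LVQ1 sketch: the nearest reference vector wins (Eq. 3.10); it is
    attracted or repelled depending on class agreement, with the
    learning rate decayed as in Eq. (3.11)."""
    n = len(X)
    for t in range(T):
        alpha = alpha0 * (1.0 - t / T)                  # Eq. (3.11)
        x, c = X[t % n], labels[t % n]
        i = np.argmin(((W - x) ** 2).sum(axis=1))       # winner, Eq. (3.10)
        if W_labels[i] == c:
            W[i] += alpha * (x - W[i])                  # same class: attract
        else:
            W[i] -= alpha * (x - W[i])                  # wrong class: repel
    return W
```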
3.4.1 LVQ2 ALGORITHM
An improved LVQ algorithm, known as the LVQ2 algorithm, is sometimes preferred
because it comes closer in effect to Bayesian decision theory. The same weight or vector update
equations are used as in the standard LVQ, but they only get applied under certain conditions,
namely when:
1. The input vector x is incorrectly classified by the associated Voronoi vector.
2. The next nearest Voronoi vector does give the correct classification, and
3. The input vector x is sufficiently close to the decision boundary (perpendicular bisector plane)
between the Voronoi vector and the nearest Voronoi vector. In this case, both the Voronoi and the nearest Voronoi vectors are updated (using the incorrect and correct classification update equations respectively). In the LVQ2 algorithm, adaptation only occurs in regions with cases of misclassification in order to get finer and better class boundaries.
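A sketch of one LVQ2 step enforcing these three conditions is shown below; the window test follows Kohonen's commonly used relative-distance window, and the parameter values are illustrative.

```python
import numpy as np

def lvq2_step(x, c, W, W_labels, alpha=0.05, w=0.3):
    """Update the two nearest reference vectors only when the winner
    misclassifies x, the runner-up classifies it correctly, and x lies
    inside a window of relative width w around the decision boundary."""
    d = np.sqrt(((W - x) ** 2).sum(axis=1))
    i, j = np.argsort(d)[:2]                    # nearest and next nearest
    in_window = d[i] / d[j] > (1 - w) / (1 + w)
    if W_labels[i] != c and W_labels[j] == c and in_window:
        W[i] -= alpha * (x - W[i])              # push the wrong winner away
        W[j] += alpha * (x - W[j])              # pull the correct one closer
```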
3.4.2 LVQ2.1 and LVQ3 Algorithms
The LVQ2.1 algorithm is an improved version of the LVQ2 algorithm which aims at eliminating LVQ2's detrimental effects. It is to be used only after LVQ1 has been applied. LVQ2.1 allows
adaptation for correctly classifying codebook vectors [19]. The LVQ2.1 algorithm is based
on the idea of shifting the decision boundaries toward the Bayes limits with attractive and
repulsive forces. Here two BMUs are selected and only updated if one belongs to the desired
class and one does not, and the distance ratio is within a defined window.
LVQ3 leads to even more weight-adjusting operations due to less restrictive adaptation rules [19]. LVQ3 has been proposed to ensure that the reference vectors continue approximating the class distributions, but it must be noted that if only one reference vector is assigned to each class, LVQ3 operates the same as LVQ2.1. If both BMUs are of the correct class, they are still updated, but adjusted using an epsilon value (an adjusted learning rate used instead of the global learning rate).
Basically the developer of an LVQ algorithm has to prepare a learning schedule: a plan specifying which LVQ algorithm(s) (LVQ1, OLVQ1, LVQ2.1, etc.) should be used, with which values for the main parameters, at different training phases. Also, the number of codebook vectors for each class must be decided in order to reach high classification accuracy and generalization, while avoiding under- or over-fitting. In LVQ, it is difficult to determine a good number of codebook vectors for a given problem. Here accuracy is highly dependent on the initialization of the model as well as the learning parameters used (learning rate, training iterations, etc.). In the domain of neural networks, LVQ and its extensions are among the best known algorithms for classification. However, these often end at local minima of the distortion surface because they only accept new solutions which maximally reduce the distortion, resulting in suboptimal networks whose performance is inferior to globally optimal networks [20]. LVQ is an alternative to the Generalized Lloyd Algorithm (GLA), better known as the Linde-Buzo-Gray (LBG) [14] algorithm. LVQ algorithms are commonly grouped with GLA in the discussion of VQ techniques. GLA is simple and has relatively good fidelity, so it is a widely used VQ method. This algorithm starts with a good initial codebook, and a global codebook can also be generated. The strategy of LVQ is the same as that of GLA, except that the codevector update function differs. GLA is discussed next.
3.5 GENERALIZED LLOYD ALGORITHM
A well-known codebook design method is the Generalized Lloyd algorithm (GLA) or Linde, Buzo and Gray (LBG) algorithm [14]. In 1980, Linde, Buzo and Gray proposed a VQ design algorithm based on a training sequence, which is known as the LBG algorithm or GLA. This
method operates more in the input domain, clustering the input vectors and moving the centroid
to develop a new and better representation for the next iteration of the codebook. It is similar to
the k-means algorithm. GLA is an iterative gradient descent algorithm that tries to minimize an
average squared error distortion measure. This algorithm plays an important role in the design of
vector quantizer and in nearest neighbor feature clustering for pattern recognition. GLA begins
with a set of input vectors and an initial codebook. For each input vector, a codeword from the
codebook is chosen that yields the minimum distortion. If the sum of distortions from all input
vectors does not improve beyond some threshold, the algorithm stops. Otherwise the codebook is
modified as follows: Each codeword is replaced by the centroid of all input vectors that have
previously chosen it as their output vector. This completes one iteration. Then the GLA
iteratively keeps refining a codebook. In this algorithm, in each iteration the average distortion is
reduced; this corresponds to a local change in the codebook, i.e., the new codebook is not
drastically different from the old codebook. Given an initial codebook, the algorithm leads to the
nearest local codebook, which may not be optimal. As codebook design is a complex
optimization problem, it has many local minima. GLA performance is sensitive to the
initialization of the codebook. The task of codeword search is to find the best-match codeword
from the given codebook for the input vector. This means the nearest codeword
$y_j = (y_{j1}, y_{j2}, \ldots, y_{jk})$ in the codebook $c$ is found for each input vector $x = (x_1, x_2, \ldots, x_k)$ such that the distortion between this codeword and the input vector is the smallest among all codewords. The most common distortion measure between $x$ and $y_j$ is the Euclidean distance:

$$D(j) = \sum_{i=0}^{k} (x_i - y_{ji})^2 \qquad (3.12)$$
Now we describe the GLA steps. It consists of two phases, as shown in figure 3.6.
a) Initialization of the codebook, and
b) Optimization of the codebook.
Fig-3.6: The GLA Procedure.
A. Codebook Initialization
The codebook initialization process is very important. In the initialization phase,
two methods are mainly used: in a random manner and by splitting.
Random initialization. The initial code words are randomly chosen [12]. Generally,
they are chosen inside the convex hull of the input data set [18].
Initialization by splitting. The original GLA algorithm uses a splitting technique to
initialize the codebook. This technique basically doubles the size of the codebook in
every iteration. This procedure starts with one code vector $c_1(0)$ that is set to the average of all training vectors.

Step 1: In a general iteration there will be $N$ code vectors in the codebook, $c_i(0)$, $i = 1, 2, \ldots, N$. Split each code vector into two code vectors $c_i(0)$ and $c_i(0) + r$, where $r$ is a fixed perturbation vector. Set $N \leftarrow 2N$.

Step 2: If there are enough code vectors, stop the splitting process. The current set of $N$ code vectors can now serve as the initial set $c_i(0)$ for the codebook optimization phase. If more code vectors are needed, execute the optimization algorithm on the current set of $N$ entries, to converge them to a better set; then go to Step 1.
B. Codebook Optimization

Step 1: Select a threshold value $\epsilon$, set $k = 0$ and $D(-1) = +\infty$. Start with an initial codebook with code vectors $c_i(k)$ (where $k$ is currently zero, but will be incremented in each iteration). Training vectors are denoted as $T$.

Step 2: For each code vector $c_i(k)$, find the set of all training vectors $T$ that satisfy

$$d(T, c_i) < d(T, c_j), \quad j \ne i. \qquad (3.13)$$

This set or cell (also called a Voronoi region) is denoted as $P_i(k)$. Repeat Step 2 for all values of $i$.

Step 3: Calculate the distortion $D_i(k)$ between each code vector $c_i(k)$ and the set of training vectors $P_i(k)$ found for it in Step 2. Repeat for all $i$, then calculate the average $D(k)$ of all the $D_i(k)$. A distortion $D_i(k)$ for a given $i$ is calculated by computing the distances $d(c_i(k), T_m)$ for all training vectors $T_m$ in the set $P_i(k)$ and then calculating the average distance.
Step 4: If

$$\frac{D(k-1) - D(k)}{D(k-1)} \le \epsilon, \qquad (3.14)$$

stop. Otherwise, continue.

Step 5: Set $k \leftarrow k + 1$, and find new code vectors $c_i(k)$ that are the averages of the training vectors in the cells $P_i(k-1)$ that were computed in Step 2. Go to Step 2. Since the code vectors (including unused ones) are doubled in each splitting step, such doubling might result in a final codebook with many unused codevectors.
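The optimization phase above can be sketched in batch form as follows; empty cells are simply left unchanged here, and all names are illustrative.

```python
import numpy as np

def gla(X, codebook, eps=1e-4, max_iter=100):
    """GLA sketch: Voronoi partition (Step 2), average distortion
    (Step 3), stop test (Step 4, Eq. 3.14), centroid update (Step 5)."""
    prev = None
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)                       # Step 2
        D = d2[np.arange(len(X)), nearest].mean()         # Step 3
        if prev is not None and (prev - D) / prev <= eps: # Step 4
            break
        prev = D
        for i in range(len(codebook)):                    # Step 5
            cell = X[nearest == i]
            if len(cell):
                codebook[i] = cell.mean(axis=0)
    return codebook
```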
3.5.1 GLA Design Algorithm

Begin

1. Set a threshold $\epsilon$ to be a "small" number. Let $X = \{x_1, x_2, x_3, \ldots, x_M\}$, $x_i \in R^L$, $i = 1, 2, \ldots, M$.

2. Select an initial codebook $y = \{y_1, y_2, \ldots, y_N\}$, $y_j \in R^L$, $j = 1, 2, \ldots, N$.

3. Calculate
$$D^{(0)} = \frac{1}{M} \sum_{i=1}^{M} \min_{y_j \in y} d^2(x_i, y_j).$$
Set $\nu = 0$.

4. Set $\nu \leftarrow \nu + 1$, $i = 0$.

5. Set $i = i + 1$. Evaluate
$$\mu_j(x_i) = \begin{cases} 1, & \text{if } d(x_i, y_j) = d_{\min}(x_i) \\ 0, & \text{otherwise,} \end{cases} \quad j = 1, 2, \ldots, N.$$

6. If $i < M$, then go to step 5.
7. Calculate
$$y_j = \frac{\sum_{i=1}^{M} \mu_j(x_i)\, x_i}{\sum_{i=1}^{M} \mu_j(x_i)}, \quad j = 1, 2, \ldots, N.$$

8. Calculate
$$D^{(\nu)} = \frac{1}{M} \sum_{i=1}^{M} \min_{y_j \in Y} d^2(x_i, y_j).$$
9. If
$$\frac{D^{(\nu-1)} - D^{(\nu)}}{D^{(\nu-1)}} > \epsilon,$$
then go to step 4.

End.

CHAPTER 4

FUZZY LEARNING VECTOR QUANTIZATION ALGORITHMS

$$u_j(x_i) = \left[\sum_{l=1}^{N}\left(\frac{d(x_i, y_j)}{d(x_i, y_l)}\right)^{\frac{1}{m-1}}\right]^{-1} \qquad (4.2)$$
where $m$ is a parameter that controls the fuzziness of the membership and is also called the fuzzifier; $\lambda$ is a positive integer and $m$ can be expressed as $m = 1 + \frac{1}{\lambda}$. The codebook vectors are then given by

$$y_j = \frac{\sum_{i=1}^{M} (u_j(x_i))^m\, x_i}{\sum_{i=1}^{M} (u_j(x_i))^m}, \quad j = 1, 2, \ldots, N. \qquad (4.3)$$
At each iteration of FKM some or all of the codevectors will be found to change. The movement of a particular codevector is determined only by its member training vectors. The FKM design algorithm is summarized later. The codebook design process is terminated if the fractional decrease of distortion $\Delta^{(\nu)}$ is below a threshold $\epsilon$; $\epsilon$ is a very small value, normally $10^{-3}$ to $10^{-4}$. $\Delta^{(\nu)}$ is defined as:

$$\Delta^{(\nu)} = \frac{D^{(\nu-1)} - D^{(\nu)}}{D^{(\nu-1)}} \qquad (4.4)$$

where $\nu$ is the index of iterations commencing with $\nu = 1$. The derivation of the fuzzy k-means algorithms was based on the constrained minimization of the objective function [33]:

$$J_m = \sum_{j=1}^{N}\sum_{i=1}^{M} (u_j(x_i))^m\, \|x_i - y_j\|^2 \qquad (4.5)$$

where $1 < m < \infty$. The parameters in this equation, the cluster centroid vectors $y_j$ and the components of the membership vectors $u_j(x_i)$, can be optimized by Lagrange's method.
4.2.1 FKM Design Algorithm

Begin

1. Select a threshold $\epsilon$.

2. Select an initial codebook $y = \{y_1, y_2, \ldots, y_N\}$.

3. Evaluate $D^{(0)}$ according to Eq. (4.1). Set $\nu = 0$.

4. Set $\nu \leftarrow \nu + 1$, $i = 0$.

5. Set $i = i + 1$. Evaluate $u_j(x_i)$ using Eq. (4.2), $j = 1, 2, \ldots, N$.

6. If $i < M$, then go to step 5.

7. Calculate $y_j$ using Eq. (4.3), $j = 1, 2, \ldots, N$.

8. Calculate $D^{(\nu)}$ according to Eq. (4.1).

9. If
$$\frac{D^{(\nu-1)} - D^{(\nu)}}{D^{(\nu-1)}} > \epsilon,$$
then go to step 4.

End.
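A compact sketch of this loop, using the standard fuzzy c-means membership (on squared Euclidean distances) for Eq. (4.2) and the fuzzy-weighted centroid for Eq. (4.3), is given below; names and the small constant added for numerical safety are illustrative.

```python
import numpy as np

def fkm(X, Y, m=1.5, eps=1e-4, max_iter=100):
    """FKM sketch: memberships (Eq. 4.2), fuzzy centroids (Eq. 4.3),
    and the fractional-distortion stop test of Eq. (4.4)."""
    prev = None
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2) + 1e-12
        u = 1.0 / d2 ** (1.0 / (m - 1.0))         # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)         # each row now sums to one
        um = u ** m
        Y = (um.T @ X) / um.sum(axis=0)[:, None]  # Eq. (4.3)
        D = (um * d2).sum() / len(X)              # objective in the Eq. (4.5) sense
        if prev is not None and (prev - D) / prev <= eps:
            break                                 # Eq. (4.4) below threshold
        prev = D
    return Y, u
```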
In recent years, this method has been employed in a wide range of applications, including fuzzy control and machine vision. FKM generally produces better results in codebook design than GLA and also reduces the dependence of the resulting codebook on the selection of the initial codebook. This benefit is generally obtained at the expense of increased computation time, caused by the need to calculate the fuzzy membership and also because more iterations are required due to the slow convergence. To overcome this deficiency, Karayiannis proposed fuzzy vector quantization (FVQ) [24] algorithms as fast alternatives to the FKM algorithm. The FVQ algorithm is popularly employed in speech and speaker recognition and is used to train codebooks in the vector quantization approach [27]. This FVQ algorithm, which is a fast alternative to the FKM algorithm, is described next.
4.3 FUZZY VECTOR QUANTIZATION (FVQ) ALGORITHM
An FVQ model is a set of cluster centers determined using fuzzy c-means (FCM)
clustering to cluster the training dataset. So, it can be said that FVQ usually uses the FCM clustering algorithm to obtain clusters or codebooks. It was proposed by Karayiannis and Pai [9], and Tsekouras [26]. FVQ algorithms are based on a flexible strategy that allows gradual transition of the membership function from soft to hard decisions [26]. FVQ algorithms achieve similar qualities of codebook design to FKM but with much less computational effort. The FVQ algorithm allows each training vector to be assigned to multiple codewords in the early stages of the codebook design. Although the FVQ algorithm reduces the dependence of the resulting codebook on the initial codebook, the codewords are calculated in batch mode. The iterative fuzzy vector quantization approach is based on a gradient descent approach, and the concept of fuzzy logic is introduced into it. In FVQ, the source image is approximated coarsely by fixed
basis blocks, and the codebook is self-trained from the coarsely approximated image, rather than
from an outside training set or the source image itself. Therefore, FVQ is capable of eliminating
the redundancy in the codebook without any side information, in addition to exploiting the self-
similarity in real images effectively. FVQ is a clustering algorithm based on soft decisions that leads to crisp decisions at the end of the codebook design process [33]. The FVQ makes a soft decision about which codeword is closest to the input vector, generating an output vector whose components indicate the relative closeness (membership) of each codeword to the input vector. In the training process, the output vector not only provides the membership description but also provides a detailed codeword distribution in the feature space, which guides the FVQ codebook
updating. The FVQ keeps the fuzziness parameter constant throughout the training process. In
the initialization step of this algorithm, each training vector is assigned to a codebook vector
which is concentrated at a cluster center.
$$u_j(x_i) = \left(1 - \frac{\|x_i - y_j\|^2}{d_{\max}(x_i)}\right)^{\lambda} \qquad (4.6)$$

where $\lambda$ is a positive integer that controls the fuzzification of the clustering process. Each
training vector is assigned to one cluster. Similar to FKM, the FVQ algorithm does not classify fuzzy data [FVQ2]. The advantages of FVQ versus FKM are the elimination of the effect of the initial codebook selection on the quality of clustering and the avoidance of a priori assumptions about the level of fuzziness needed for a clustering task [FVQ2]. FVQ algorithms are
categorized as FVQ1, FVQ2, and FVQ3 which are described below.
4.3.1. Fuzzy Vector Quantization I (FVQ1) [24]
The development of this algorithm is attempted by constructing a family of membership
functions. According to these conditions, the membership function $u_j(x_i)$ approaches unity as $d(x_i, y_j)$ approaches zero and decreases monotonically to zero as the distance $d(x_i, y_j)$ increases from zero to

$$d_{\max}(x_i) = \max_{y_j} \{d(x_i, y_j)\}. \qquad (4.7)$$
Since $\frac{d(x_i, y_j)}{d_{\max}(x_i)}$ is an increasing function of the distance $d(x_i, y_j)$, the membership function $u_j(x_i)$ can be of the form

$$u_j(x_i) = f\big(d(x_i, y_j), d_{\max}(x_i)\big) = \left(1 - \frac{d(x_i, y_j)}{d_{\max}(x_i)}\right)^{\lambda} \qquad (4.8)$$

where $\lambda$ is a positive integer. This family of membership functions has been experimentally
evaluated, mainly because of its simplicity and low computational requirements. Nevertheless,
there may be other functions satisfying the conditions mentioned above that could be used as
well [24]. The vector assignment is based on crisp decisions towards the end of the vector
quantizer design. This can be guaranteed by the minimization with respect to $y_j$ of the discrepancy measure $J_1 = J_1(y_j,\ j = 1, 2, \ldots, k)$, defined as

$$J_1 = \sum_{j=1}^{k}\sum_{i=1}^{M} u_j(x_i)\, \|x_i - y_j\|^2, \qquad (4.9)$$
which results in the formula

$$y_j = \frac{\sum_{i=1}^{M} u_j(x_i)\, x_i}{\sum_{i=1}^{M} u_j(x_i)}, \quad j = 1, 2, \ldots, k. \qquad (4.10)$$
The direct implication of this selection is that the proposed algorithm reduces to the crisp k-
means algorithm after all the training vectors have been transferred from the fuzzy to the crisp
mode [24].
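A short sketch of the FVQ1 quantities of Eqs. (4.7), (4.8) and (4.10) follows; the membership form matches Eq. (4.8) as reconstructed above, and all names are illustrative.

```python
import numpy as np

def fvq1_memberships(X, Y, lam=2):
    """Memberships of Eq. (4.8): 1 when x_i coincides with y_j, falling
    monotonically to 0 as d(x_i, y_j) approaches d_max(x_i) of Eq. (4.7)."""
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))
    d_max = d.max(axis=1, keepdims=True)          # Eq. (4.7)
    return (1.0 - d / d_max) ** lam               # Eq. (4.8)

def fvq1_update(X, u):
    """Codebook update of Eq. (4.10) (membership-weighted means)."""
    return (u.T @ X) / u.sum(axis=0)[:, None]
```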
4.3.2. Fuzzy Vector Quantization 2 (FVQ2) [24]
This algorithm is based on the certainty measures used for training vector assignment by the family of fuzzy k-means algorithms. The codebook vectors can be evaluated in this case by

$$y_j = \frac{\sum_{i=1}^{M} (u_j(x_i))^m\, x_i}{\sum_{i=1}^{M} (u_j(x_i))^m} \qquad (4.11)$$

resulting from the minimization of $J_m = J_m(y_j,\ j = 1, 2, \ldots, k)$, which is also used in fuzzy k-means algorithms [32]. Training vector assignment is entirely based on crisp decisions towards the end of the vector quantizer design. If the assignment of the training vector is based on crisp decisions, the corresponding membership function $u_j(x_i)$ takes the values zero and one. In this case, $(u_j(x_i))^m = u_j(x_i)$ regardless of the value of $m$. Therefore, the codebook vectors are evaluated by the same formula used in the crisp k-means algorithm towards the end of the vector quantizer design [24]. The combination of the formulae used in fuzzy k-means algorithms for evaluating the membership functions and the codebook vectors with the vector assignment strategy results in the FVQ2 algorithm.
4.3.3. Fuzzy Vector Quantization 3 (FVQ3) [24]
The proposed strategy for vector assignment can lead to a broad variety of algorithms,
which employ different schemes for evaluating the membership functions and the codebook
vectors [24]. The evaluation of the membership functions is again based on the formula:
$$u_j(x_i) = \frac{1}{\sum_{l=1}^{k}\left(\dfrac{d(x_i, y_j)}{d(x_i, y_l)}\right)^{\frac{1}{m-1}}} \qquad (4.12)$$
which is associated with the fuzzy k-means algorithm. Since image compression is based on a crisp interpretation of the designed codebook, it is reasonable that the evaluation of each codebook vector be influenced more by its closest training vectors. If $m$ approaches unity asymptotically, this requirement is satisfied by evaluating the codebook vectors using the crisp formula Eq. (4.10) instead of Eq. (4.11). Consider the membership function $u_j(x_i)$ defined in Eq. (4.12). If $m$ is sufficiently close to unity and $u_j(x_i) \approx 1$, then $(u_j(x_i))^m \approx u_j(x_i)$. As the value of $u_j(x_i)$ decreases, $(u_j(x_i))^m < u_j(x_i)$. For a fixed $m$, the difference between $(u_j(x_i))^m$ and $u_j(x_i)$ increases as $u_j(x_i)$ approaches zero. In conclusion, the evaluation of the codebook vectors by Eq. (4.10) guarantees that each codebook vector is not significantly affected by the training vectors that are assigned membership values significantly smaller than unity [24]. Another advantage of this choice is that the codebook evaluation formula Eq. (4.10) is computationally less demanding than Eq. (4.11), since it does not require the computation of $(u_j(x_i))^m$. In addition, the computational burden associated with the evaluation of the membership functions can be significantly moderated by requiring that $m - 1 = \frac{1}{\lambda}$, where $\lambda$ is a positive integer. Such a choice results in a wide range of values of $m$ close to unity, given in terms of $\lambda$ as $m = 1 + \frac{1}{\lambda}$.
4.4 FUZZY LEARNING VECTOR QUANTIZATION (FLVQ)
Fuzzy learning vector quantization (FLVQ) is developed based on learning vector
quantization (LVQ) and extended by using fuzzy theory. It was already discussed in section 3.2
that LVQ is the name used for unsupervised learning algorithms associated with a competitive
neural network. Bezdek et al. [13], [23] originally proposed a batch learning scheme, known as fuzzy learning vector quantization (FLVQ). Karayiannis et al. [4], [33] presented a formal
derivation of batch FLVQ algorithms, which were originally introduced on the basis of intuitive
arguments. This derivation was based on the minimization of a function defined as the average
generalized distance between the feature vectors and the prototypes. This minimization problem
is actually a reformulation of the problem of determining fuzzy c-partitions that was solved by fuzzy c-means algorithms [23], [28]. Reformulation is the process of reducing an objective
function treated by alternating optimization to a function that involves only one set of unknowns,
namely the prototypes [36]. The function resulting from this process is referred to as the
reformulation function. FLVQ [38] has quickly gained popularity as a fairly successful batch
clustering algorithm. FLVQ employs a smaller set of user defined parameters. Both FVQ and
FLVQ are batch procedures, which indicates that in each iteration they process all the training data at once. This algorithm combines local and global information in the computation of a
relative fuzzy membership function [13], [38]. The update equations for FLVQ involve the
membership functions of the fuzzy c-means (FCM) algorithm. This membership function
provides the degree of compatibility of an input pattern with the vague concept represented by a
cluster center or we can say membership functions are used to determine the strength of
attraction between each prototype and the input vectors. FLVQ employs metrical neighbors in
the input space. In this case, the transition from fuzzy to crisp mode is accomplished by
manipulating the fuzziness parameter. FLVQ manipulates the fuzziness parameter throughout the
training process [34]. The fuzzy LVQ method presented here makes use of a fuzzy objective
function. This function is defined as the sum of the squares of the Euclidean distances between each prototype vector and each input vector. Each of these squares is weighted by the
characteristic function of the input vectors [13]. Suppose that there are a set of n training data
vectors $X = \{x_1, x_2, x_3, \ldots, x_n\} \subset R^p$. The objective function of the fuzzy c-means algorithm is

$$J_m(U, V) = \sum_{k=1}^{n}\sum_{i=1}^{c} (u_{ik})^m\, \|x_k - v_i\|^2 \qquad (4.13)$$

where $u_{ik}$ is the membership degree of the $k$th training vector in the $i$th cluster, $U = \{[u_{ik}],\ 1 \le i \le c,\ 1 \le k \le n\}$ is the partition matrix, $V = \{[v_i],\ 1 \le i \le c\}$ the cluster center matrix, and $m \in (1, \infty)$ the fuzziness parameter. The problem is to minimize $J_m(U, V)$ under the following constraint:

$$\sum_{i=1}^{c} u_{ik} = 1, \quad \forall k. \qquad (4.14)$$
To achieve this task, FLVQ follows certain steps [33], which are described in the FLVQ design algorithm presented next.
4.4.1 FLVQ Design Algorithm

Begin

1. Select the number of clusters $c$, the initial values for the cluster centers $v_1, v_2, \ldots, v_c$ and a value for the parameter $\epsilon$.

2. Set the maximum number of iterations $t_{\max}$ and select the initial $m_0$ and the final $m_f$ values for the parameter $m$.

3. For $t = 0, 1, 2, \ldots, t_{\max}$:

(a) Calculate the fuzziness parameter
$$m(t) = m_0 - \frac{t\,(m_0 - m_f)}{t_{\max}}.$$

(b) Set
$$a_{ik}(t) = \left[\sum_{j=1}^{c}\left(\frac{\|x_k - v_i\|}{\|x_k - v_j\|}\right)^{\frac{2}{m(t)-1}}\right]^{-m(t)}.$$

(c) Update the cluster centers according to the following learning rule:
$$v_i(t) = \frac{\sum_{k=1}^{n} a_{ik}(t)\, x_k}{\sum_{k=1}^{n} a_{ik}(t)}.$$

(d) If
$$E(t) = \sum_{i=1}^{c} \|v_i(t) - v_i(t-1)\| < \epsilon,$$
then stop.
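A compact sketch of steps 3(a)-3(d) is given below; it uses the fact that $a_{ik}(t)$ in step (b) equals the fuzzy c-means membership raised to the power $m(t)$, and the names and default parameter values are illustrative.

```python
import numpy as np

def flvq(X, V, m0=3.0, mf=1.1, t_max=50, eps=1e-4):
    """FLVQ sketch: decay m(t) from m0 to mf (step a), compute the
    learning rates a_ik(t) (step b), update the centers (step c), and
    stop when the total center movement E(t) falls below eps (step d)."""
    for t in range(t_max + 1):
        m = m0 - t * (m0 - mf) / t_max                     # step (a)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        u = 1.0 / d2 ** (1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                  # FCM memberships
        a = u ** m                                         # step (b)
        V_new = (a.T @ X) / a.sum(axis=0)[:, None]         # step (c)
        E = np.sqrt(((V_new - V) ** 2).sum(axis=1)).sum()  # step (d)
        V = V_new
        if E < eps:
            break
    return V
```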
(5.2)

where 255 is the peak signal value, and $F_{ij}$ and $\hat{F}_{ij}$