
International Journal of Computer Information Systems,

Vol. 2, No. 5, May 2011


Comparing the Performance of Backpropagation Algorithm
and Genetic Algorithms in Pattern Recognition Problems

Chukwuchekwa Ulumma Joy
Department of Mathematics
Federal University of Technology, Owerri
(NIGERIA)

E-mail: rejoice2k7@yahoo.com

Abstract— Multilayer Perceptrons (MLPs) trained with the Backpropagation (BP) algorithm are known to be very useful in solving a wide variety of real-world problems (such as pattern classification, clustering, function approximation, forecasting, optimization, pattern association and control), but there has been extensive research into alternative training algorithms to backpropagation, which relies on the gradient descent technique. One approach has been to use optimization algorithms, such as the genetic algorithm, that do not depend on derivatives, and to modify the objective function to penalize unwanted weights in the solution. In this paper, using some pattern recognition problems as illustrations, the backpropagation and genetic algorithm training methods are compared in terms of their effectiveness and efficiency in training the networks. The backpropagation algorithm is found to outperform the genetic algorithm in this instance.

Keywords- Backpropagation neural networks; Genetic Algorithms;
Evolving neural networks
I. INTRODUCTION
Multilayer perceptron (MLP) neural networks trained with the backpropagation (BP) algorithm have proved useful in solving a wide variety of real-world problems in various domains. Numerous extensions and modifications have been proposed [1] to improve the results or to achieve some required properties of the trained networks, such as accelerating the convergence speed (fast backpropagation), special learning rules and data representation schemes (e.g., the cumulative delta rule, cascade and batch learning), different error functions (e.g., cubic instead of the standard RMS), alternative transfer (activation) functions for the neurons, and weight distribution (e.g., weight pruning). Despite all of these, one key element has scarcely changed: BP is still based on the gradient descent algorithm for minimizing the network error.

Usually, a gradient descent algorithm is used to adjust the neural network's weights by comparing the target (desired) and actual network outputs when a set of inputs is presented to the network. Despite its popularity for training MLPs, BP has some drawbacks. Its performance depends on the shape of the error surface, the values of the randomly initialized weights and several other parameters; that is, BP depends strongly on good, problem-specific parameter settings [2]. There is also a tendency for the trained neural network to get stuck in local minima.

Numerous attempts have been made to prevent the gradient descent algorithm from becoming trapped in local minima as the training of the network progresses. Genetic algorithms (GAs) usually avoid local minima by searching several regions simultaneously (working on a population of trial solutions). The only information a GA needs is some performance value that indicates how good a given set of weights is; it has no need for gradient information. GAs also place no restrictions on the network topology because they do not require backward propagation of an error signal [2].

In this paper, using some pattern recognition problems as
illustrations, comparisons are made based on the effectiveness
and efficiency of both the backpropagation and the genetic
algorithm training algorithms on the neural networks. The
backpropagation algorithm is found to outperform the genetic
algorithm in this instance.

II. MATERIALS AND METHODS
A. TRAINING MLPs WITH BP
There are many kinds of learning rules, but the one used most often is the delta rule, or back-propagation rule. An NN is trained to map a set of input data by iteratively adjusting the weights. Information from the inputs is fed forward through the network, and the weights between neurons are optimized, usually by backward propagation of the error during the training (learning) phase. The ANN reads the input and output values in the training data set and changes the values of the weighted connections (links) to reduce the difference between the predicted and target values. The prediction error is minimized across many training cycles (epochs) until the network reaches a specified level of accuracy. Stergiou and Siganos [3] give a good explanation of the backpropagation algorithm.
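To make the weight-update step concrete, the sketch below shows one delta-rule update for a single sigmoid output neuron in C++. The paper does not give its implementation at this level of detail, so the function and variable names here are purely illustrative; the default learning rate of 0.5 simply echoes the value used later in Section III.B.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One gradient-descent (delta-rule) update for a single sigmoid output neuron.
// 'inputs' and 'target' form one training pattern; 'weights[0]' is the bias.
void deltaRuleUpdate(const std::vector<double>& inputs, double target,
                     std::vector<double>& weights, double eta = 0.5) {
    // Forward pass: net input u = b + sum_i(w_i * x_i), output y = 1/(1 + e^-u).
    double u = weights[0];
    for (std::size_t i = 0; i < inputs.size(); ++i)
        u += weights[i + 1] * inputs[i];
    double y = 1.0 / (1.0 + std::exp(-u));

    // Error term for a sigmoid output unit: delta = (t - y) * y * (1 - y).
    double delta = (target - y) * y * (1.0 - y);

    // Backward step: move each weight in the direction that reduces the
    // squared difference between the target and the actual output.
    weights[0] += eta * delta;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        weights[i + 1] += eta * delta * inputs[i];
}
```

Repeating such updates over many epochs is what drives the training error down, as described above.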
A.1 JAVANNS
JavaNNS is a simulator for ANNs. It enables one to use predefined networks or create new networks, and to train and analyze them. Details of its usage can be found in the JavaNNS manual (JavaNNS-manual-4.html, 2001-2002, Universität Tübingen).
B. TRAINING THE MLPS USING GENETIC ALGORITHMS

Genetic algorithms (GAs) belong to a family of computational models based on evolution. They are a class of population-based random search algorithms, inspired by the principles of natural evolution, known as Evolutionary Algorithms (EAs). A key element of a genetic algorithm is that it maintains a population of candidate solutions that evolves over time. By selecting suitable parameters to control the GA, high efficiency and good performance can be achieved. GAs were developed by John Holland in 1975. They create an initial population of feasible solutions and then recombine them in a way that guides the search towards only the most promising areas of the state space. Each feasible solution is encoded as a chromosome (string), and each chromosome is assigned a measure of fitness via a fitness function. The fitness of a chromosome determines its ability to survive and produce offspring.

B.1 EVOLVING NEURAL NETWORKS

Combining GAs and ANNs produces a special class of ANNs in which the GA provides another form of adaptation in addition to learning [4]. GAs can be used effectively to find an optimal set of connection weights globally without computing gradient information. GAs have been combined with ANNs at different levels, such as connection weights, network architecture and learning rules. The evolution of architectures enables ANNs to adapt their structures (topologies) to different tasks without human intervention. The evolution of learning rules is a process in which the adaptation of learning rules is achieved through evolution. The evolution of connection weights introduces an adaptive and global approach to training. Fig. 1 [5] shows a typical evolutionary neural network design.

Fig. 1: A typical evolutionary neural network design.

Applying GAs to weight training in ANNs consists of two major phases: deciding on the representation of the connection weights (i.e., whether or not they are encoded as binary strings), and the evolutionary process simulated by the GA, in which search operators such as crossover and mutation have to be chosen in conjunction with the representation scheme. Different representations and search operators can lead to quite different training performance. The evolution stops when the fitness is greater than a predefined value (i.e., the training error is smaller than a certain value) or when the population has converged.
A typical cycle of the evolution of connection weights is as
follows [4].

- Decode each individual (genotype) in the current generation into a set of connection weights and construct a corresponding ANN with those weights.
- Evaluate each ANN by computing the total mean squared error between its actual and target outputs. The fitness of an individual is determined by this error: the higher the error, the lower the fitness.
- Select parents for reproduction based on their fitness.
- Apply search operators, such as crossover and/or mutation, to the parents to generate offspring, which form the next generation.
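A minimal C++ sketch of this cycle is given below. The error, selection, crossover and mutation routines are supplied by the caller, so only the loop itself is shown; all names and the fitness mapping 1/(1 + MSE) are illustrative assumptions and do not come from the GANN source.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Chromosome = std::vector<double>;   // concatenated network weights

// One run of the weight-evolution cycle described in the text.
Chromosome evolveWeights(
        std::vector<Chromosome> population,
        const std::function<double(const Chromosome&)>& networkError,    // decode + MSE
        const std::function<std::size_t(const std::vector<double>&)>& selectIndex,
        const std::function<Chromosome(const Chromosome&, const Chromosome&)>& crossover,
        const std::function<void(Chromosome&)>& mutate,
        std::size_t maxGenerations, double targetError) {
    Chromosome best = population.front();
    double bestError = networkError(best);

    for (std::size_t g = 0; g < maxGenerations && bestError > targetError; ++g) {
        // Decode and evaluate each individual: the higher the error, the lower the fitness.
        std::vector<double> fitness(population.size());
        for (std::size_t i = 0; i < population.size(); ++i) {
            double mse = networkError(population[i]);
            fitness[i] = 1.0 / (1.0 + mse);          // one possible fitness mapping
            if (mse < bestError) { bestError = mse; best = population[i]; }
        }

        // Select parents by fitness, recombine and mutate to form the next generation.
        std::vector<Chromosome> next;
        while (next.size() < population.size()) {
            Chromosome child = crossover(population[selectIndex(fitness)],
                                         population[selectIndex(fitness)]);
            mutate(child);
            next.push_back(child);
        }
        population = std::move(next);
    }
    return best;
}
```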

The ANN's weights can be represented in two different ways: binary and real-valued [6], [7], [8]. In either case, the chromosome is simply a concatenation of the network's weights in a string [5]. The focus in this paper is the real-valued weight representation, in which each gene in the chromosome is a real number rather than a bit. The weights are read off the network in a fixed order (i.e., from left to right and from top to bottom) and placed in a list, so that each chromosome is a vector (list) of weights. The main genetic operators used here are the recombination (crossover) and mutation operators. A mutation operator takes one parent and randomly changes some of the entries in its chromosome to create a child. A crossover operator takes two parents and creates one or two children containing some of the genetic material of each parent.
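For the real-valued representation, the two operators could be sketched as follows. Intermediate crossover and normal (Gaussian) mutation are the variants used later in Section III.A, but the blending and perturbation details shown here are illustrative assumptions rather than the GANN implementation.

```cpp
#include <cstddef>
#include <random>
#include <vector>

using Chromosome = std::vector<double>;   // one real-valued gene per network weight

// Intermediate crossover: each gene of the child is a blend of the two parents.
Chromosome intermediateCrossover(const Chromosome& p1, const Chromosome& p2,
                                 std::mt19937& rng) {
    std::uniform_real_distribution<double> alpha(0.0, 1.0);
    Chromosome child(p1.size());
    for (std::size_t i = 0; i < p1.size(); ++i) {
        double a = alpha(rng);
        child[i] = a * p1[i] + (1.0 - a) * p2[i];
    }
    return child;
}

// Normal (Gaussian) mutation: with a small probability, perturb each gene.
void normalMutation(Chromosome& c, std::mt19937& rng,
                    double mutationRate = 0.1, double sigma = 0.1) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::normal_distribution<double> noise(0.0, sigma);
    for (double& gene : c)
        if (unit(rng) < mutationRate)
            gene += noise(rng);
}
```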

B.2 A SOFTWARE MODEL FOR THE EVOLUTION OF
MULTILAYER PERCEPTRON WEIGHTS

In this section, a software model is constructed for the evolution of MLP network weights using an object-oriented approach. The whole process is carried out using the Unified Modeling Language (UML), which provides a formal framework for the modeling of software systems. The final implementation, called GANN, has been written in the C++ programming language.

A neuron model is the basic information-processing unit in ANNs. The perceptron is the characteristic neuron model in the MLP [9]. It computes a net input signal u as a function f of the input signals x and the free parameters, the bias and weights (b, w). The net input signal is then subjected to an activation function g to produce an output signal y. Two of the most commonly used activation functions are the sigmoid function, g(u) = 1/(1 + e^(-u)), and the hyperbolic tangent function, g(u) = tanh(u).
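A hypothetical sketch of this neuron model (not the actual GANN Perceptron class) might read:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A perceptron neuron model: net input u = b + w.x, output y = g(u).
struct Perceptron {
    double bias;
    std::vector<double> weights;

    // Net input as a function of the input signals x and the free parameters (b, w).
    double netInput(const std::vector<double>& x) const {
        double u = bias;
        for (std::size_t i = 0; i < weights.size(); ++i)
            u += weights[i] * x[i];
        return u;
    }

    // Two commonly used activation functions.
    static double sigmoid(double u) { return 1.0 / (1.0 + std::exp(-u)); }
    static double tanhActivation(double u) { return std::tanh(u); }

    double output(const std::vector<double>& x) const {
        return sigmoid(netInput(x));   // or tanhActivation(netInput(x))
    }
};
```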

The power of neural computation comes from connecting many neurons in a network architecture. The architecture of a neural network refers to the number of neurons and to their arrangement and connectivity. The characteristic network architecture of the multilayer perceptron is the so-called feed-forward architecture [10]. A feed-forward architecture is usually made up of an input layer of nodes (units), one or more hidden layers of neurons, and an output layer of nodes. Information normally proceeds layer by layer from the input layer, through the hidden layers, to the output layer. In this way, an MLP is a feed-forward network architecture of perceptron neuron models [10].
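Propagating a signal through such an architecture can be sketched as follows; the Layer structure and the choice of the hyperbolic tangent activation are illustrative assumptions, not the GANN classes.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// A layer holds a weight matrix plus one bias per neuron; a network is a
// sequence of such layers (hidden layers followed by the output layer).
struct Layer {
    std::vector<std::vector<double>> weights;   // weights[j][i]: input i -> neuron j
    std::vector<double> biases;
};

// Propagate an input vector layer by layer from the input layer,
// through the hidden layers, to the output layer.
std::vector<double> feedForward(const std::vector<Layer>& layers,
                                std::vector<double> signal) {
    for (const Layer& layer : layers) {
        std::vector<double> next(layer.biases.size());
        for (std::size_t j = 0; j < next.size(); ++j) {
            double u = layer.biases[j];
            for (std::size_t i = 0; i < signal.size(); ++i)
                u += layer.weights[j][i] * signal[i];
            next[j] = std::tanh(u);                 // hyperbolic tangent activation
        }
        signal = std::move(next);
    }
    return signal;
}
```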

B.3 THE GANN MODEL

The Unified Modeling Language (UML) is a general
purpose visual modeling language that is used to specify,
visualize, construct, and document the artifacts of a software
system [11]. UML class diagrams usually describe the classes
of the system, the way the classes relate to one another and the
attributes and operations (methods) of the classes.

In order to construct the GANN model for the
multilayer perceptron, a top-down development shall be
followed [11]. This approach normally begins at the highest
conceptual level and works down to the details. In this way, to
create and evolve a conceptual class diagram for the
multilayer perceptron, we iteratively model (i) classes, (ii)
associations, (iii) derived classes and (iv) attributes and
operations. In object-oriented modelling, concepts (objects)
are represented by means of classes. Therefore, a major task is
to identify the main objects of the problem domain. In this
work, the multilayer perceptron is characterized by a neuron
model, network architecture, and associated objective
functional (mean squared error) and training algorithms
(genetic algorithms). The characteristics of the major classes
are as follows:
- Perceptron: the class that represents the concept of the perceptron neuron model is called Perceptron.
- Multilayer perceptron: the class representing the concept of the MLP network architecture is called MultilayerPerceptron.
- Objective functional: the class that represents the concept of the objective functional of the multilayer perceptron is called ObjectiveFunctional.
- Training algorithm: the class that represents the concept of a training algorithm for the multilayer perceptron is called TrainingAlgorithm.
Once the main concepts (classes) in the model have been identified, it is necessary to identify how they relate to one another. For example, the multilayer perceptron is built from a set of neurons (perceptrons). A multilayer perceptron is assigned an objective functional (the mean squared error). An objective functional is improved by a training algorithm (here, a genetic algorithm). Fig. 2 shows a simplified UML class diagram for the GANN, with the main classes and their interrelationships.

Fig. 2: A simplified UML class diagram for the GANN.
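The main classes and their relationships could be rendered roughly as the following C++ skeleton; the member names and signatures are assumptions made for illustration and are not the actual GANN source.

```cpp
#include <vector>

// Skeleton of the main GANN classes and their relationships (illustrative only).
class Perceptron {                 // neuron model
public:
    double bias = 0.0;
    std::vector<double> weights;
};

class MultilayerPerceptron {       // network architecture: built from perceptrons
public:
    std::vector<std::vector<Perceptron>> layers;   // hidden and output layers
};

class ObjectiveFunctional {        // e.g. mean squared error, assigned to a network
public:
    explicit ObjectiveFunctional(MultilayerPerceptron& network) : net(network) {}
    virtual double evaluate() const = 0;           // smaller is better
    virtual ~ObjectiveFunctional() = default;
protected:
    MultilayerPerceptron& net;
};

class TrainingAlgorithm {          // e.g. a genetic algorithm, improves the functional
public:
    explicit TrainingAlgorithm(ObjectiveFunctional& objective) : functional(objective) {}
    virtual void train() = 0;
    virtual ~TrainingAlgorithm() = default;
protected:
    ObjectiveFunctional& functional;
};
```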

C. THE DATASETS
In order to make the results comparable to those of BP-trained MLPs, some pattern recognition (classification) problems/datasets have been used to evaluate the performance of the GA-trained neural networks. The datasets used for neural network learning are split into two parts: one part on which the training is performed, called the training set, and another part on which the performance of the resulting network is measured (to test the generalization ability of the trained network), called the test set. The idea is that the performance of a network on the test set estimates its performance in real use [12]. The following briefly describes the datasets used for this study.

C.1 LOGICAL OPERATORS
Learning logical operations is a traditional benchmark application for neural networks [11]. Here a single MLP is trained to learn the logical operations AND, OR, NAND, NOR, XOR and XNOR. The number of samples in the data set is 4; each sample has 2 input variables and 6 target variables. Table 1 shows the input-target data set for this problem.

C.2 FISHER'S IRIS DATA
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two. This data set was created from the Iris problem data set in the UCI repository of machine-learning databases.

C.3 PIMA INDIAN DIABETES PROBLEM
This data set is used for diagnosing diabetes among Pima
Indians. This data set includes eight inputs and one output.
The patterns are split with 576 for training and 192 for testing,
totalling 768 patterns. All inputs are continuous, and 65.1% of
the patterns are negative for diabetes. This data set was created
from the Pima Indians diabetes problem data set from the
UCI repository of machine-learning databases.

C.4 AIRCRAFT LANDING DATA
The data set consists of image-analysis data of aircraft approaching an aircraft carrier. There are 5 classes, with 200 examples from each class. The patterns are split into 750 for training and 250 for testing, totalling 1000 patterns.

III. EXPERIMENTS

A series of experiments is presented to compare the performance of GA-trained neural networks with that of BP-trained neural networks on the classification problems (datasets) described above. In comparing the two algorithms, one iteration of BP is considered equal to one iteration of the GA. The ability of BP and the GA to reduce the objective function (MSE), and the CPU time used by each algorithm, are measured to determine the effectiveness and efficiency of the algorithms.

Table 1. Logical operations input-target data set.

a   b   AND   OR   NAND   NOR   XOR   XNOR
1   1    1    1     0      0     0     1
1   0    0    1     1      0     1     0
0   1    0    1     1      0     1     0
0   0    0    0     1      1     0     1



A. COMPONENTS OF THE GAs
- The weights of the initial members of the population are chosen randomly from a uniform distribution between -1.0 and 1.0.
- The activation function for both the hidden and output units is the hyperbolic tangent function.
- The crossover probability is set to 0.25 and the mutation rate is set to 0.1.
- The fitness assignment method is linear ranking.
- The selection method is roulette wheel selection.
- The recombination (crossover) method is intermediate crossover, and the normal mutation method is used.
- Training with the GA stops when the best evaluation (mean squared error) reaches 0.01 or the maximum number of generations is reached.
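Gathered in one place, the settings above amount to something like the following configuration sketch; the struct and field names are illustrative assumptions, not taken from the GANN code.

```cpp
// Illustrative grouping of the GA settings listed above (names are assumptions).
struct GAConfig {
    double weightInitMin          = -1.0;   // uniform initial weights in [-1.0, 1.0]
    double weightInitMax          =  1.0;
    double crossoverProbability   =  0.25;
    double mutationRate           =  0.1;
    const char* fitnessAssignment = "LinearRanking";
    const char* selectionMethod   = "RouletteWheel";
    const char* recombination     = "Intermediate";
    const char* mutationMethod    = "Normal";
    double targetMSE              =  0.01;  // stop when the best evaluation reaches this
};
```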
B. COMPONENTS OF THE BP
- The BP algorithm provided by JavaNNS, mentioned in Section A.1, is used.
- BP uses epoch learning, i.e., the weights are updated only once per epoch.
- The mean squared error (MSE) function is used.
- The learning rate is set to 0.5 in all cases.
- The activation function for the input units is linear (identity), while the activation function for both the hidden and output units is the sigmoid function.

Four experiments were designed to compare the performance of the GA and BP: Experiment 1 on the logical operations problem, Experiment 2 on Fisher's Iris data, Experiment 3 on the Pima Indian Diabetes data, and Experiment 4 on the Aircraft landing data.

IV. RESULTS AND DISCUSSION

As shown in Table 2, BP took only 500 epochs, and consequently less time, to train the network on the logical operations problem, whereas the GA needed 40,000 epochs, and consequently much more time, to reach the same percentage of correct classifications. The results also show a very significant change in the performance of the GA-trained network (from 0.0% correct classification at 500 epochs to 100% correct classification at 40,000 epochs), indicating that what the GA needs is more iterations (epochs) to reach the desired accuracy.

Table 2. Results of Experiment 1 - Comparison of results of training the logical operations problem with the BP and GAs.

Training     # of      CPU Time   % Correct        Best Evaluation
Algorithm    Epochs    Used       Classification   (MSE)
BP           500       1          100              0.0
             40,000    8          100              0.0
GA           500       2          0.0              4.4438
             40,000    60         100              0.2

As shown in Table 3, BP took about 100,000 epochs to achieve 100% correct classification, whereas the GA would need more than 150,000 epochs to achieve the same percentage of correct classifications. There is also a significant improvement in the percentage of correct classifications by the GA-trained network as the number of iterations increases from 40,000 to 150,000 (from 70.67% to 96%).

As seen in Table 4, BP took about 100,000 epochs to achieve 100% correct classification, whereas the GA would need more than 150,000 epochs to achieve the same percentage. There is only a marginal improvement in the percentage of correct classifications by the GA-trained network as the number of iterations increases from 40,000 to 150,000 (from 74% to 76.04%).
From Table 5, BP took about 40,000 epochs to achieve 99.6% correct classification, whereas the GA would need more than 150,000 epochs to achieve the same percentage. The results also show that the percentage of correct classifications by BP remains the same in all cases, indicating that about 40,000 iterations suffice for this particular experiment. There is also a significant improvement in the percentage of correct classifications by the GA-trained network as the number of iterations increases from 40,000 to 150,000 (from 46.3% to 86%).

Table 3. Results of Experiment 2 - Comparison of results of
training the Iris data with the BP and GAs.
Training     # of       CPU Time    Best Evaluation   % Correct
Algorithm    Epochs     (Seconds)   (MSE)             Classification
BP           40,000     147         0.013             99.34
             100,000    522         0.0               100
             150,000    935         0.0               100
GA           40,000     2116        0.487             70.67
             100,000    5540        0.1109            95.33
             150,000    8310        0.1026            96


Table 4. Results of Experiment 3 - Comparison of results of
training the Pima Indian Diabetes data with the BP and GAs.


Training     # of       CPU Time    Best Evaluation   % Correct
Algorithm    Epochs     (Seconds)   (MSE)             Classification
BP           40,000     985         0.021             98.9
             100,000    3173        0.0052            100
             150,000    4260        0.0               100
GA           40,000     11320       0.602             74
             100,000    28360       0.5670            75.91
             150,000    42540       0.5543            76.04










Table 5. Results of Experiment 4 - Comparison of results of
training the Aircraft landing data with the BP and GA.

Training     # of       CPU Time    Best Evaluation   % Correct
Algorithm    Epochs     (Seconds)   (MSE)             Classification
BP           40,000     2033        0.0115            99.6
             100,000    3756        0.004             99.6
             150,000    6400        0.004             99.6
GA           40,000     18400       1.0668            46.3
             100,000    46000       0.4634            81.3
             150,000    69000       0.3881            86



V. CONCLUSION

This paper did not, in essence, try to find a training algorithm that can substitute for the BP training algorithm; rather, it investigated what happens when BP and a GA are used to train feedforward neural networks on pattern recognition examples. The results showed that BP outperformed the GA in this instance. The results also confirm that MLPs trained with the BP algorithm are still to be regarded as universal classifiers [13]. The results imply that caution should be taken before using other algorithms as substitutes for the BP algorithm, especially in classification problems. The performance of the GA also indicates that GAs can be used as an alternative training algorithm for MLPs in some cases. To make a thorough comparison of the two algorithms, more complex experiments are required to ascertain the performance of both BP and GAs, especially in applications other than pattern recognition problems.

REFERENCES

[1] S. Udo. Multiple Layer Perceptron Training Using Genetic Algorithms. European Symposium on Artificial Neural Networks (ESANN 2001), (2001).

[2] J. Branke. Evolutionary Algorithms for Neural
Network Design and Training, Proceedings of the
first Nordic Workshop on Genetic Algorithms and its
Applications 1995, Vaasa, Finland, (1995).

[3] C. Stergiou and D. Siganos. Neural Networks. [Internet] Available from: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html, (1996) [Accessed 20/04/10].

[4] X. Yao. Evolving Artificial Neural Networks. Proceedings of the IEEE, Vol. 87, No. 9, pp. 1423-1447, (1999).

[5] D. Rinku. Evolutionary Neural Networks: Design Methodologies. [Internet] Available from: http://ai-depot.com/articles/evolutionary-neural-networks-design-methodologies/, (2003) [Accessed 02/04/10].

[6] M. Mitchell. An introduction to genetic algorithms.
MIT Press, (1996).

[7] D. J. Montana and L. Davis. Training Feedforward
Neural Networks Using Genetic Algorithms,
Proceedings of the International Joint Conference on
Artificial Intelligence, pp. 762-767, (1989).

[8] P. Koehn. Combining genetic algorithms and neural networks: The encoding problem. Master's thesis, University of Tennessee, Knoxville. Available online from: ftp://archive.cis.ohio-state.edu/pub/neuroprose/koehn.encoding.ps.Z, (1994) [Accessed 20/04/10].

[9] C. M. Bishop. Neural Networks for Pattern
Recognition. Clarendon Press: Oxford, (1995).

[10] Rumbaugh, J. et al. The Unified Modeling Language
Reference Manual. Addison Wesley, (1999).

[11] R. Lopez and E. Oñate. A Software Model for the Multilayer Perceptron. IADIS International Conference on Applied Computing 2007, pp. 464-468, (2007).

[12] L. Prechelt. PROBEN1 - A Set of Neural Network Benchmark Problems and Benchmarking Rules. Technical report, Fakultät für Informatik, Universität Karlsruhe. Doc: pub/papers/techreports/1994/1994-21.ps.Z, Data: /pub/neuron/Proben1.tar.gz from ftp.ira.uka.de, (1994).

[13] K. Hornik, M. Stinchcombe and H. White. Multilayer feed-forward networks are universal approximators. Neural Networks, 2(5): 359-366, (1989).


Joy Ulumma Chukwuchekwa received her M.Sc. degree in Intelligent Computer Systems from the University of Glamorgan, United Kingdom, in 2010. She also holds an M.Sc. degree in Applied Mathematics from the Federal University of Technology, Owerri, Nigeria, where she currently lectures. She is a member of the IEEE Computer Society and the Society for Industrial and Applied Mathematics, both based in the USA.
