
International Journal of Computer Information Systems, Vol. 2, No. 5, May 2011

Comparing the Performance of Backpropagation Algorithm and Genetic Algorithms in Pattern Recognition Problems
Chukwuchekwa Ulumma Joy
Department of Mathematics, Federal University of Technology, Owerri (NIGERIA)
E-mail: rejoice2k7@yahoo.com

Abstract- Multilayer Perceptrons (MLPs) trained with the Backpropagation (BP) algorithm are known to be very useful in solving a wide variety of real-world problems (such as pattern classification, clustering, function approximation, forecasting, optimization, pattern association and control). Nevertheless, there has been considerable research on alternative training algorithms to backpropagation, which relies on the gradient descent technique. One approach has been to use optimization algorithms that are not dependent on derivatives, such as the genetic algorithm, to modify the objective function to penalize unwanted weights in the solution. In this paper, using some pattern recognition problems as illustrations, comparisons are made of the effectiveness and efficiency of the backpropagation and genetic algorithm training algorithms. The backpropagation algorithm is found to outperform the genetic algorithm in this instance.

Keywords- Backpropagation neural networks; Genetic Algorithms; Evolving neural networks

I. INTRODUCTION

Multilayer perceptron (MLP) neural networks trained with the backpropagation (BP) algorithm have proved to be useful in solving a wide variety of real-world problems in various domains. Despite numerous extensions and modifications [1] intended to improve the results or to achieve required properties of the trained networks (such as acceleration of the convergence speed, i.e. fast backpropagation; special learning rules and data representation schemes, i.e. the cumulative delta rule, cascade and batch learning; different error functions, i.e. cubic instead of the standard RMS; alternative transfer (activation) functions of the neurons; and weight distribution schemes, i.e. weight pruning), one key element has scarcely changed: BP is still based on the gradient descent algorithm to minimize the network error. Usually a gradient descent algorithm is used to adjust the neural network's weights by comparing the target (desired) and actual network outputs when a set of inputs is presented to the network. Despite its popularity for training MLPs, however, BP has some drawbacks. It depends on the shape of the error surface, the values of the randomly initialized weights and some other parameters; that is, BP depends very much on good, problem-specific parameter settings [2]. There is also the tendency of the trained neural network to get stuck in local minima, and numerous attempts have been made to prevent the gradient descent algorithm from becoming trapped in local minima as the training of the network progresses.

Genetic algorithms (GAs) usually avoid local minima by searching in several regions simultaneously (working on a population of trial solutions). The only information that GAs need is some performance value that indicates how good a given set of weights is; they have no need for gradient information. GAs also place no restrictions on the network topology because they do not require backward propagation of an error signal [2]. In this paper, using some pattern recognition problems as illustrations, comparisons are made of the effectiveness and efficiency of the backpropagation and genetic algorithm training algorithms on the neural networks. The backpropagation algorithm is found to outperform the genetic algorithm in this instance.

II. MATERIALS AND METHODS

A. TRAINING MLPS WITH THE BP

There are many kinds of learning rules, but the one used most often is the delta rule, or backpropagation rule. A NN is trained to map a set of input data to a set of target data by iteratively adjusting the weights. Information from the inputs is fed forward through the network, and the weights between neurons are optimized by backward propagation of the error during the training (learning) phase. The ANN reads the input and output values in the training data set and changes the values of the weighted connections (links) to reduce the difference between the predicted and target values. The error in prediction is minimized across many training cycles (epochs) until the network reaches a specified level of accuracy. Stergiou and Siganos [3] give a good explanation of the backpropagation algorithm.

A.1 JAVANNS

JavaNNS is a simulator for ANNs. It enables one to use predefined networks or create new ones, and to train and analyze them. Details of its usage can be found in the JavaNNS manual (JAVANNS-manual-4.html, 2001-2002, Universität Tübingen).
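To make the gradient-descent weight adjustment concrete, the following is a minimal sketch of delta-rule training for a single sigmoid neuron (an illustration only, not the JavaNNS implementation; the toy data set, learning rate and epoch count are assumed values):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Delta-rule training of one sigmoid neuron: weights are adjusted in the
// direction that reduces the mean squared error between target and output.
int main() {
    // Toy data set: learn the logical AND of two inputs (illustrative only).
    std::vector<std::vector<double>> x = {{1, 1}, {1, 0}, {0, 1}, {0, 0}};
    std::vector<double> t = {1, 0, 0, 0};

    std::vector<double> w = {0.1, -0.2};  // small initial weights
    double b = 0.05, eta = 0.5;           // bias and learning rate (assumed)

    for (int epoch = 0; epoch < 2000; ++epoch) {
        double mse = 0.0;
        for (size_t p = 0; p < x.size(); ++p) {
            double u = b + w[0] * x[p][0] + w[1] * x[p][1];  // net input
            double y = 1.0 / (1.0 + std::exp(-u));           // sigmoid output
            double e = t[p] - y;                             // prediction error
            double delta = e * y * (1.0 - y);                // local gradient
            for (size_t i = 0; i < w.size(); ++i)            // weight update
                w[i] += eta * delta * x[p][i];
            b += eta * delta;
            mse += e * e;
        }
        if (epoch % 500 == 0)
            std::printf("epoch %d  MSE %.4f\n", epoch, mse / x.size());
    }
    return 0;
}
```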


B. TRAINING THE MLPS USING GENETIC ALGORITHMS

Genetic algorithms (GAs) belong to a family of computational models based on evolution. They belong to the class of population-based random search algorithms inspired by the principles of natural evolution, known as Evolutionary Algorithms (EAs). A key element of a genetic algorithm is that it maintains a population of candidate solutions that evolves over time. By selecting suitable parameters to control the GA, high efficiency and good performance can be achieved. GAs were developed by John Holland in 1975. GAs create an initial population of feasible solutions and then recombine them in a way that guides the search towards only the most promising areas of the state space. Each feasible solution is encoded as a chromosome (string), and each chromosome is given a measure of fitness via a fitness function. The fitness of a chromosome determines its ability to survive and produce offspring.

B.1 EVOLVING NEURAL NETWORKS

Combining GAs and ANNs produces a special class of ANNs in which the GA provides another form of adaptation in addition to learning [4]. GAs can be used effectively to find an optimal set of connection weights globally without computing gradient information. GAs have been combined with ANNs at different levels, such as connection weights, network architecture and learning rules. The evolution of architectures enables ANNs to adapt their structures (topologies) to different tasks without human intervention. Evolution of learning rules is a process in which the adaptation of learning rules is achieved through evolution. Evolution of connection weights introduces an adaptive and global approach to training. Fig. 1 [5] shows a typical evolutionary neural network design.

Applying GAs to weight training in ANNs consists of two major phases: deciding on the representation of the connection weights (i.e., whether in the form of binary strings or not), and the evolutionary process simulated by the GA, in which search operators such as crossover and mutation have to be decided in conjunction with the representation scheme. Different representations and search operators can lead to quite different training performance. The evolution stops when the fitness is greater than a predefined value (i.e., the training error is smaller than a certain value) or the population has converged. A typical cycle of the evolution of connection weights is as follows [4]:

1. Decode each individual (genotype) in the current generation into a set of connection weights and construct a corresponding ANN with the weights.
2. Evaluate each ANN by computing its total mean squared error between actual and target outputs. The fitness of an individual is determined by the error: the higher the error, the lower the fitness.
3. Select parents for reproduction based on their fitness.
4. Apply search operators, such as crossover and/or mutation, to the parents to generate offspring, which form the next generation.

Fig. 1: A typical evolutionary neural network design.
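This cycle can be outlined in code. The sketch below assumes a helper evaluate_mse that decodes a chromosome into an ANN and returns its mean squared error on the training set; the binary tournament selection used here is a simplification for illustration and is not taken from the GANN implementation:

```cpp
#include <functional>
#include <random>
#include <vector>

using Chromosome = std::vector<double>;  // one candidate set of connection weights

// Evolve a population of real-valued weight vectors. evaluate_mse is assumed to
// decode a chromosome into an ANN and return its error; lower error = higher fitness.
Chromosome evolve(int num_weights, int pop_size, int generations, double target_mse,
                  const std::function<double(const Chromosome&)>& evaluate_mse,
                  std::mt19937& rng) {
    std::uniform_real_distribution<double> init(-1.0, 1.0);   // initial weights
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::normal_distribution<double> noise(0.0, 0.1);         // normal mutation
    std::uniform_int_distribution<int> pick(0, pop_size - 1);

    // Create the initial population of feasible solutions.
    std::vector<Chromosome> pop(pop_size, Chromosome(num_weights));
    for (auto& c : pop)
        for (auto& w : c) w = init(rng);

    Chromosome best = pop[0];
    double best_err = evaluate_mse(best);

    for (int g = 0; g < generations && best_err > target_mse; ++g) {
        // Steps 1-2: decode and evaluate each individual.
        std::vector<double> err(pop_size);
        for (int i = 0; i < pop_size; ++i) {
            err[i] = evaluate_mse(pop[i]);
            if (err[i] < best_err) { best_err = err[i]; best = pop[i]; }
        }
        // Steps 3-4: select parents by fitness and apply crossover and mutation.
        std::vector<Chromosome> next;
        while (static_cast<int>(next.size()) < pop_size) {
            int a = pick(rng), b = pick(rng);
            const Chromosome& p1 = (err[a] < err[b]) ? pop[a] : pop[b];
            int c = pick(rng), d = pick(rng);
            const Chromosome& p2 = (err[c] < err[d]) ? pop[c] : pop[d];
            Chromosome child(num_weights);
            for (int i = 0; i < num_weights; ++i) {
                child[i] = 0.5 * (p1[i] + p2[i]);              // intermediate crossover
                if (unit(rng) < 0.1) child[i] += noise(rng);   // mutate some genes
            }
            next.push_back(std::move(child));
        }
        pop = std::move(next);
    }
    return best;  // the fittest set of weights found
}
```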

The ANN's weights can be represented in two different ways: binary and real-valued [6]; [7]; [8]. In either case, the representation is simply a concatenation of the network's weights into a string [5]. The focus in this paper is the real-valued weight representation, in which each gene in the chromosome is a real value rather than a bit. The weights are read off the network in a fixed order (i.e., from left to right and from top to bottom) and placed in a list, so that each chromosome is a vector (list) of weights. The main genetic operators used here are the recombination (crossover) and mutation operators. A mutation operator takes one parent and randomly changes some of the entries in its chromosome to create a child. A crossover operator takes two parents and creates one or two children containing some of the genetic material of each parent.

B.2 A SOFTWARE MODEL FOR THE EVOLUTION OF MULTILAYER PERCEPTRON WEIGHTS

In this section, a software model is constructed for the evolution of MLP network weights using an object-oriented approach. The whole process is carried out using the Unified Modeling Language (UML), which provides a formal framework for the modeling of software systems. The final implementation, called GANN, has been written in the C++ programming language.

A neuron model is the basic information processing unit in ANNs. The perceptron is the characteristic neuron model in the MLP [9]. It computes a net input signal u as a function f of the input signals x and the free parameters, the bias and weights (b, w). The net input signal is then subjected to an activation function g to produce an output signal y. Two of the most commonly used activation functions are the sigmoid function, g(u) = 1/(1 + e^(-u)), and the hyperbolic tangent function, g(u) = tanh(u).
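As a small worked sketch of this neuron model (the weight, bias and input values below are illustrative assumptions, not values from GANN), the net input u = b + w·x is passed through either activation function:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Sigmoid and hyperbolic tangent activation functions g(u).
double sigmoid(double u) { return 1.0 / (1.0 + std::exp(-u)); }
double tanh_act(double u) { return std::tanh(u); }

// A perceptron computes the net input u = b + w.x and the output y = g(u).
double perceptron(const std::vector<double>& w, double b,
                  const std::vector<double>& x, double (*g)(double)) {
    double u = b;
    for (size_t i = 0; i < w.size(); ++i) u += w[i] * x[i];
    return g(u);
}

int main() {
    std::vector<double> w = {0.4, -0.7};  // illustrative weights
    std::vector<double> x = {1.0, 0.0};   // one input pattern
    std::printf("sigmoid output: %.4f\n", perceptron(w, 0.1, x, sigmoid));
    std::printf("tanh output:    %.4f\n", perceptron(w, 0.1, x, tanh_act));
    return 0;
}
```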


The power of neural computation comes from connecting many neurons in a network architecture. The architecture of a neural network refers to the number of neurons, their arrangement and their connectivity. The characteristic network architecture in the multilayer perceptron is the so-called feedforward architecture [10]. A feedforward architecture is usually made up of an input layer of nodes (units), one or more hidden layers of neurons, and an output layer of nodes. Information normally proceeds layer by layer from the input layer through the hidden layers and then to the output layer. In this way, an MLP becomes a feedforward network architecture of perceptron neuron models [10].
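As an illustration of this layer-by-layer flow of information (a minimal sketch under assumed data structures, not the GANN implementation), a feedforward pass can be written as:

```cpp
#include <cmath>
#include <vector>

// One layer of a feedforward MLP: weights[j][i] connects input i to neuron j.
struct Layer {
    std::vector<std::vector<double>> weights;
    std::vector<double> biases;
};

// Propagate an input pattern layer by layer through the network.
std::vector<double> forward(const std::vector<Layer>& layers,
                            std::vector<double> signal) {
    for (const Layer& layer : layers) {
        std::vector<double> next(layer.biases.size());
        for (size_t j = 0; j < next.size(); ++j) {
            double u = layer.biases[j];                 // net input
            for (size_t i = 0; i < signal.size(); ++i)
                u += layer.weights[j][i] * signal[i];
            next[j] = 1.0 / (1.0 + std::exp(-u));       // sigmoid activation
        }
        signal = std::move(next);                       // feed to the next layer
    }
    return signal;  // output layer activations
}
```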

B.3 THE GANN MODEL

The Unified Modeling Language (UML) is a general-purpose visual modeling language used to specify, visualize, construct and document the artifacts of a software system [11]. UML class diagrams describe the classes of the system, the way the classes relate to one another, and the attributes and operations (methods) of the classes. In order to construct the GANN model for the multilayer perceptron, a top-down development approach is followed [11]. This approach begins at the highest conceptual level and works down to the details. In this way, to create and evolve a conceptual class diagram for the multilayer perceptron, we iteratively model (i) classes, (ii) associations, (iii) derived classes and (iv) attributes and operations.

In object-oriented modelling, concepts (objects) are represented by means of classes, so a major task is to identify the main objects of the problem domain. In this work, the multilayer perceptron is characterized by a neuron model, a network architecture, an associated objective functional (mean squared error) and a training algorithm (genetic algorithm). The characteristics of the major classes are as follows:

Perceptron: The class that represents the concept of the perceptron neuron model is called Perceptron.
Multilayer perceptron: The class representing the concept of the MLP network architecture is called MultilayerPerceptron.
Objective functional: The class that represents the concept of the objective functional of the multilayer perceptron is called ObjectiveFunctional.
Training algorithm: The class that represents the concept of the training algorithm for a multilayer perceptron is called TrainingAlgorithm.

Once the main concepts (classes) in the model have been identified, it is necessary to identify how they relate to one another. For example, a multilayer perceptron is built from a set of neurons (perceptrons); a multilayer perceptron is assigned an objective functional (mean squared error); and an objective functional is improved by a training algorithm (genetic algorithm). Fig. 2 shows a simplified UML class diagram for the GANN, showing the main classes and their interrelationships; a class skeleton following this design is sketched below.
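The skeleton below is a sketch of the design just described, not the actual GANN source; the member and method names beyond the four class names are assumptions:

```cpp
#include <vector>

// Perceptron: the neuron model, holding a bias and incoming weights.
class Perceptron {
public:
    double output(const std::vector<double>& inputs) const;
private:
    double bias_ = 0.0;
    std::vector<double> weights_;
};

// MultilayerPerceptron: the feedforward architecture, built from Perceptrons.
class MultilayerPerceptron {
public:
    std::vector<double> forward(const std::vector<double>& inputs) const;
    std::vector<double> get_weights() const;          // flatten weights for the GA
    void set_weights(const std::vector<double>& w);   // decode a chromosome
private:
    std::vector<std::vector<Perceptron>> layers_;
};

// ObjectiveFunctional: the mean squared error assigned to a network.
class ObjectiveFunctional {
public:
    explicit ObjectiveFunctional(MultilayerPerceptron* network) : network_(network) {}
    double mean_squared_error() const;
private:
    MultilayerPerceptron* network_;
};

// TrainingAlgorithm: improves an objective functional (here, a genetic algorithm).
class TrainingAlgorithm {
public:
    explicit TrainingAlgorithm(ObjectiveFunctional* objective) : objective_(objective) {}
    void train();
private:
    ObjectiveFunctional* objective_;
};
```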

Fig. 2: A simplified UML class diagram for the GANN.

C. THE DATASETS

In order to make the results comparable to those of the BP-trained MLPs, a number of pattern recognition (classification) problems/datasets were used to evaluate the performance of the GA-trained neural networks. The datasets used for neural network learning are split into two parts: the part on which training is performed is called the training set, and the part on which the performance of the resulting network is measured (to test the generalization ability of the trained network) is called the test set. The idea is that the performance of a network on the test set estimates its performance in real use [12]. The following briefly describes the datasets used for this study.

C.1 LOGICAL OPERATORS

Learning logical operations is a traditional benchmark application for neural networks [11]. Here a single MLP is used to learn a set of logical operations, namely AND, OR, NAND, NOR, XOR and XNOR. The number of samples in the data set is 4, the number of input variables for each sample is 2, and the number of target variables is 6. Table 1 shows the input-target data set for this problem.

C.2 FISHER'S IRIS DATA

The data set contains 3 classes of 50 instances each, where each class refers to a type of Iris plant. One class is linearly separable from the other two. This data set was created from the Iris problem data set from the UCI repository of machine-learning databases.

C.3 PIMA INDIAN DIABETES PROBLEM

This data set is used for diagnosing diabetes among Pima Indians. It includes eight inputs and one output. The patterns are split into 576 for training and 192 for testing, totalling 768 patterns. All inputs are continuous, and 65.1% of the patterns are negative for diabetes. This data set was created


from the Pima Indians diabetes problem data set from the UCI repository of machine-learning databases.

C.4 AIRCRAFT LANDING DATA

The data set provided comes from image analysis of aircraft approaching an aircraft carrier. There are 5 classes, with 200 examples from each class. The patterns are split into 750 for training and 250 for testing, totalling 1000 patterns.

III. EXPERIMENTATION

A series of experiments is discussed to compare the performance of GA-trained neural networks with BP-trained neural networks using some classification problems (datasets). In comparing the two algorithms, one iteration of the BP is considered to be equal to one iteration of the GA. The ability of the BP and the GA to reduce the objective function (MSE) and the amount of time (CPU time) used by each of the algorithms are measured to determine the effectiveness and efficiency of the algorithms.

Table 1. Logical operations input-target data set.

a | b | AND | OR | NAND | NOR | XOR | XNOR
1 | 1 |  1  | 1  |  0   |  0  |  0  |  1
1 | 0 |  0  | 1  |  1   |  0  |  1  |  0
0 | 1 |  0  | 1  |  1   |  0  |  1  |  0
0 | 0 |  0  | 0  |  1   |  1  |  0  |  1

A. COMPONENTS OF THE GA

The weights of the initial members of the population are chosen randomly with a uniform distribution between -1.0 and 1.0. The activation function for both the hidden and output units is the hyperbolic tangent function. The crossover probability is set at 0.25 and the mutation rate is set at 0.1. The fitness assignment method is linear ranking, the selection method is roulette wheel selection, the recombination (crossover) method is intermediate recombination, and the normal mutation method is used. Training with the GA stops when the best evaluation (mean squared error) reaches 0.01 or the maximum number of generations is reached.

B. COMPONENTS OF THE BP

The BP algorithm described in JavaNNS (see Section A.1) is used. The BP uses epoch learning, i.e., the weights are updated only once per epoch. The mean squared error (MSE) function is used. The learning rate is set to 0.5 in all cases. The activation function for the input units is linear (identity), while the activation function for both the hidden and output units is the sigmoid function. These settings are summarized in the sketch below.
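For reference, the settings just listed can be collected into small configuration structures (an illustrative sketch; the field names are assumptions and do not correspond to JavaNNS or GANN identifiers):

```cpp
// GA training settings used in the experiments, as described in the text.
struct GaSettings {
    double weight_init_min = -1.0;   // uniform initial weights
    double weight_init_max =  1.0;
    double crossover_probability = 0.25;
    double mutation_rate = 0.1;      // normal (Gaussian) mutation
    const char* fitness_assignment = "linear ranking";
    const char* selection = "roulette wheel";
    const char* recombination = "intermediate";
    const char* hidden_output_activation = "tanh";
    double stop_mse = 0.01;          // stop when best evaluation reaches this
};

// BP training settings used in the experiments.
struct BpSettings {
    double learning_rate = 0.5;      // the same in all cases
    bool epoch_learning = true;      // weights updated once per epoch
    const char* input_activation = "identity";
    const char* hidden_output_activation = "sigmoid";
};
```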

Experiment 1 was designed to compare the performances of the GA and the BP on the logical operations problem. Experiment 2 was designed to compare their performances on Fisher's Iris data. Experiment 3 was designed to compare their performances on the Pima Indian Diabetes data. Experiment 4 was designed to compare their performances on the Aircraft landing data.

IV. RESULTS AND DISCUSSION

As shown in Table 2, it took the BP only 500 epochs, and consequently less time, to train the logical operations problem, whereas the GA used 40,000 epochs, and consequently much more time, to reach the same percentage of correct classification. The results also show a very significant change in the performance of the GA-trained network (from 0.0% correct classification at 500 epochs to 100% correct classification at 40,000 epochs), showing that what the GA needs is more iterations (epochs) to reach the desired accuracy.

Table 2. Results of Experiment 1 - Comparison of results of training the logical operations problem with the BP and the GA.

Training Algorithm | # of Epochs | CPU Time Used | % Correct Classification | Best Evaluation (MSE)
BP | 500    | 1  | 100 | 0.0
BP | 40,000 | 8  | 100 | 0.0
GA | 500    | 2  | 0.0 | 4.4438
GA | 40,000 | 60 | 100 | 0.2

It took the BP about 100,000 epochs to achieve 100% correct classification, while more than 150,000 epochs would be needed to achieve the same percentage of correct classification using the GA, as shown in Table 3. There is also a significant improvement in the percentage of correct classification of the GA-trained network as


the number of iterations increases from 40,000 to 150,000 (i.e., from 70.67% to 96%).

As seen in Table 4, it took the BP about 100,000 epochs to achieve 100% correct classification, while more than 150,000 epochs would be needed to achieve the same percentage of correct classification using the GA. There is little improvement in the percentage of correct classification of the GA-trained network as the number of iterations increases from 40,000 to 150,000 (i.e., from 74% to 76.04%).

From Table 5, it took the BP about 40,000 epochs to achieve 99.6% correct classification, while more than 150,000 epochs would be needed to achieve the same percentage of correct classification using the GA. The results also show that the percentage of correct classification by the BP remains the same in all the cases, showing that the best number of iterations is 40,000 for this particular experiment. There is also a significant improvement in the percentage of correct classification of the GA-trained network as the number of iterations increases from 40,000 to 150,000 (i.e., from 46.3% to 86%).

Table 3. Results of Experiment 2 - Comparison of results of training the Iris data with the BP and the GA.

Training Algorithm | # of Epochs | CPU Time (Seconds) | Best Evaluation (MSE) | % Correct Classification
BP | 40,000  | 147  | 0.013  | 99.34
BP | 100,000 | 522  | 0.0    | 100
BP | 150,000 | 935  | 0.0    | 100
GA | 40,000  | 2116 | 0.487  | 70.67
GA | 100,000 | 5540 | 0.1109 | 95.33
GA | 150,000 | 8310 | 0.1026 | 96

Table 4. Results of Experiment 3 - Comparison of results of training the Pima Indian Diabetes data with the BP and the GA.

Training Algorithm | # of Epochs | CPU Time (Seconds) | Best Evaluation (MSE) | % Correct Classification
BP | 40,000  | 985   | 0.021  | 98.9
BP | 100,000 | 3173  | 0.0052 | 100
BP | 150,000 | 4260  | 0.0    | 100
GA | 40,000  | 11320 | 0.602  | 74
GA | 100,000 | 28360 | 0.5670 | 75.91
GA | 150,000 | 42540 | 0.5543 | 76.04

Table 5. Results of Experiment 4 - Comparison of results of training the Aircraft landing data with the BP and the GA.

Training Algorithm | # of Epochs | CPU Time (Seconds) | Best Evaluation (MSE) | % Correct Classification
BP | 40,000  | 2033  | 0.0115 | 99.6
BP | 100,000 | 3756  | 0.004  | 99.6
BP | 150,000 | 6400  | 0.004  | 99.6
GA | 40,000  | 18400 | 1.0668 | 46.3
GA | 100,000 | 46000 | 0.4634 | 81.3
GA | 150,000 | 69000 | 0.3881 | 86

V. CONCLUSION

In essence, this paper did not try to find a training algorithm to substitute for the BP training algorithm, but rather investigated what happens when the BP and the GA are used to train feedforward neural networks on pattern recognition examples. The results showed that the BP outperformed the GA in this instance. The results also confirm that MLPs using the BP training algorithm are still regarded as universal classifiers [13]. They imply that caution should be taken before using other algorithms as substitutes for the BP algorithm, especially in classification problems. The performance of the GA also indicated that the GA can be used as an alternative training algorithm for MLPs in some cases. To make a thorough comparison of the two algorithms, more complex experiments are required to ascertain the performance of both the BP and the GA, especially in applications other than pattern recognition problems.

REFERENCES

[1] S. Udo. "Multiple Layer Perceptron Training Using Genetic Algorithms", European Symposium on Artificial Neural Networks (ESANN 2001), (2001).
[2] J. Branke. "Evolutionary Algorithms for Neural Network Design and Training", Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications, Vaasa, Finland, (1995).
[3] C. Stergiou and D. Siganos. "Neural Networks". [Internet] Available from: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html, (1996) [Accessed 20/04/10].
[4] X. Yao. "Evolving Artificial Neural Networks", Proceedings of the IEEE, Vol. 87, No. 9, pp. 1423-1447, (1999).
[5] D. Rinku. "Evolutionary Neural Networks: Design Methodologies". [Internet] Available from: http://ai-depot.com/articles/evolutionary-neural-networks-design-methodologies/, (2003) [Accessed 02/04/10].

[6] M. Mitchell. An Introduction to Genetic Algorithms. MIT Press, (1996).
[7] D. J. Montana and L. Davis. "Training Feedforward Neural Networks Using Genetic Algorithms", Proceedings of the International Joint Conference on Artificial Intelligence, pp. 762-767, (1989).
[8] P. Koehn. "Combining Genetic Algorithms and Neural Networks: The Encoding Problem". Master's thesis, University of Tennessee, Knoxville. Available from: ftp://archive.cis.ohio-state.edu/pub/neuroprose/koehn.encoding.ps.Z, (1994) [Accessed 20/04/10].
[9] C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press: Oxford, (1995).
[10] J. Rumbaugh et al. The Unified Modeling Language Reference Manual. Addison Wesley, (1999).
[11] R. Lopez and E. Oñate. "A Software Model for the Multilayer Perceptron", IADIS International Conference on Applied Computing 2007, pp. 464-468, (2007).
[12] L. Prechelt. "PROBEN1 - A Set of Neural Network Benchmark Problems and Benchmarking Rules". Technical report, Fakultät für Informatik, Universität Karlsruhe. Doc: /pub/papers/techreports/1994/1994-21.ps.Z, data: /pub/neuron/Proben1.tar.gz from ftp.ira.uka.de, (1994).
[13] K. Hornik, M. Stinchcombe and H. White. "Multilayer Feed-forward Networks are Universal Approximators", Neural Networks 2(5): 359-366, (1989).

Joy Ulumma Chukwuchekwa received her M.Sc. degree in Intelligent Computer Systems from the University of Glamorgan, United Kingdom, in 2010. She also holds an M.Sc. degree in Applied Mathematics from the Federal University of Technology, Owerri, Nigeria, where she currently lectures. She is a member of the IEEE Computer Society and the Society for Industrial and Applied Mathematics, both based in the USA.

