Recommended Citation: G. K. Venayagamoorthy and V. G. Gudise, "Comparison of Particle Swarm Optimization and Backpropagation as Training Algorithms for Neural Networks," Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS '03), Institute of Electrical and Electronics Engineers (IEEE), Jan. 2003.
The definitive version is available at https://doi.org/10.1109/SIS.2003.1202255
Comparison of Particle Swarm Optimization and Backpropagation as Training
Algorithms for Neural Networks
Venu G. Gudise and Ganesh K. Venayagamoorthy, Senior Member, IEEE
Department of Electrical and Computer Engineering
University of Missouri - Rolla, USA
Abstract - Particle swarm optimization (PSO), motivated by the social behavior of organisms, is a step up from existing evolutionary algorithms for the optimization of continuous non-linear functions. Backpropagation (BP) is generally used for neural network training. Choosing a proper algorithm for training a neural network is very important. In this paper, a comparative study is made of the computational requirements of the PSO and BP as training algorithms for neural networks. Results are presented for a feedforward neural network learning a non-linear function, and these results show that the feedforward neural network weights converge faster with the PSO than with the BP algorithm.

I. INTRODUCTION

The role of artificial neural networks in present-day applications is gradually increasing, and faster algorithms are being developed for training neural networks. In general, backpropagation is the method used for training neural networks [1]-[5]. Gradient descent, conjugate gradient descent, resilient backpropagation, BFGS quasi-Newton, one-step secant, Levenberg-Marquardt and Bayesian regularization are all different forms of the backpropagation training algorithm [6]-[10]. These algorithms differ in their storage and computational requirements; some are good for pattern recognition and others for function approximation, but each has drawbacks of one kind or another, such as the neural network size it can handle and the associated storage requirements. Certain training algorithms are suitable only for certain types of applications; for example, an algorithm that performs well for pattern recognition may not do so for classification problems and vice versa, and some cannot cater for high accuracy/performance. It is difficult to find a particular training algorithm that is the best for all applications under all conditions all the time.

A newly developed algorithm known as particle swarm optimization is an addition to the existing evolutionary techniques and is based on a simulation of the behavior of a flock of birds or a school of fish. Its main concept is to utilize the communication involved in such swarms or schools. Some previous work on neural network training using particle swarm optimization has been reported [11]-[15], but none of it has compared PSO against conventional training techniques. In this paper, particle swarm optimization is compared with the conventional backpropagation (a gradient descent algorithm) for training a feedforward neural network to learn a non-linear function. The problem considered is how fast and how accurately the neural network weights can be determined by BP and PSO learning a common function. A detailed comparison of BP to PSO is presented with regard to their computational requirements.

The paper is organized as follows: In Section II, the architecture of the feedforward neural network considered in this paper is explained, with the forward path and the backward path for the backpropagation method. In Section III, a brief overview of particle swarm optimization is given and its implementation is explained. Section IV describes how the optimal set of parameters for the PSO is determined. In Section V, different neural network training procedures (incremental and batch) are described. In Section VI, the results of training the neural network with BP and PSO are given, and their performances are compared and contrasted.

II. FEEDFORWARD NEURAL NETWORKS

Neural networks are known to be universal approximators of any non-linear function [16]-[17], and they are generally used for mapping error-tolerant problems whose data are trained in noise. Training algorithms are critical when neural networks are applied to high-speed applications with complex nonlinearities.

A neural network consists of many layers, namely an input layer, a number of hidden layers and an output layer. The input layer and the hidden layer are connected by synaptic links called weights, and likewise the hidden layer and the output layer also have connection weights. When more than one hidden layer exists, weights exist between such layers. Neural networks use some sort of "learning" rule by which the connection weights are determined in order to minimize the error between the neural network output and the desired output. A three-layer feedforward neural network is shown in Fig. 1.

A. Forward Path

The feedforward path equations for the network in Fig. 1, with two input neurons, four hidden neurons and one output neuron, are given below. The first input is x and the second is a bias input (1). The activation a_j of each hidden neuron is given by eq. (1):

    a_j = Σ_i w_ji x_i ,   for j = 1 to 4, i = 1, 2    (1)

where w_ji is the weight and X = [x, 1]^T is the input vector. The hidden neuron outputs d_j are obtained by passing the activations through the sigmoid function (eq. (2)), and the network output is formed from the hidden outputs through the hidden-to-output weights (eq. (3)).

B. Backward Path

In the backward path, the output error e_o (eq. (4)) is propagated back through the hidden-to-output weights to give the decision error e_j at each hidden neuron (eq. (5)). The activation error and the momentum-based weight updates at iteration k are then:

    e_aj = d_j (1 - d_j) e_j    (6)

    ΔV(k) = γ_m ΔV(k-1) + γ e_o d^T    (7a)

    ΔW(k) = γ_m ΔW(k-1) + γ e_a X^T    (7b)

where γ is the learning rate, γ_m is the momentum gain, d is the vector of hidden neuron outputs, and ΔV and ΔW are the changes to the hidden-to-output and input-to-hidden weight matrices respectively.
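To make the forward path of eq. (1) and the updates of eqs. (6), (7a) and (7b) concrete, the following minimal Python sketch implements one incremental BP step for the 2-4-1 network. It assumes sigmoid hidden neurons and a linear output neuron; the exact forms of the error equations (4) and (5), which did not survive extraction, are taken here as e_o = y_target - y and e = V^T e_o, and the class name, learning rate and momentum gain are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(a):
    """Logistic activation of the hidden neurons (the paper's eq. (2))."""
    return 1.0 / (1.0 + np.exp(-a))

class Net241:
    """2-4-1 feedforward network: one data input x plus a bias input,
    four sigmoid hidden neurons, one linear output neuron (Fig. 1)."""

    def __init__(self, rng):
        self.W = rng.uniform(-1.0, 1.0, (4, 2))  # input-to-hidden weights w_ji
        self.V = rng.uniform(-1.0, 1.0, (1, 4))  # hidden-to-output weights
        self.dW = np.zeros_like(self.W)          # previous ΔW, for the momentum term
        self.dV = np.zeros_like(self.V)          # previous ΔV

    def forward(self, x, bias=1.0):
        X = np.array([x, bias])                  # input vector X = [x, bias]^T
        d = sigmoid(self.W @ X)                  # eq. (1), then eq. (2)
        y = float(self.V @ d)                    # network output (assumed linear, eq. (3))
        return X, d, y

    def bp_step(self, x, y_target, gamma=0.1, gamma_m=0.9, bias=1.0):
        """One incremental BP step with momentum, eqs. (6), (7a), (7b)."""
        X, d, y = self.forward(x, bias)
        e_o = y_target - y                       # output error (assumed form of eq. (4))
        e = self.V.ravel() * e_o                 # error backpropagated through V (assumed eq. (5))
        e_a = d * (1.0 - d) * e                  # eq. (6): activation error
        self.dV = gamma_m * self.dV + gamma * e_o * d[np.newaxis, :]  # eq. (7a)
        self.dW = gamma_m * self.dW + gamma * np.outer(e_a, X)        # eq. (7b)
        self.V += self.dV                        # apply the updates (the step Table III lists as eq. (8))
        self.W += self.dW
        return e_o ** 2

# usage: one incremental sweep over the patterns -1:0.1:1 of y = 2x^2 + 1
net = Net241(np.random.default_rng(0))
for x in np.arange(-1.0, 1.0001, 0.1):
    net.bp_step(x, 2.0 * x**2 + 1.0)
```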
III. PARTICLE SWARM OPTIMIZATION
... swarm. The basic concept of the PSO technique lies in accelerating each particle towards its pbest and the gbest locations at each time step. The acceleration has random weights for both the pbest and gbest locations.

Fig. 2 briefly illustrates the concept of PSO, where P_current is the current position, P_modified is the modified position, V_initial is the initial velocity, V_modified is the modified velocity, V_pbest is the velocity considering pbest, and V_gbest is the velocity considering gbest.

Synchronous updates are more costly than asynchronous updates. V_max is the maximum allowable velocity for the particles; if the velocity of a particle exceeds V_max, it is reduced to V_max. Thus, the resolution and fitness of the search depend on V_max: if V_max is too high, particles move beyond good solutions, and if V_max is too low, particles are trapped in local minima. c1 and c2, termed the cognition and social components respectively, are the acceleration constants that change the velocity of a particle towards pbest and gbest (generally to somewhere between pbest and gbest). Velocity determines the tension in the system. A swarm of particles can be used locally or globally in a search space. In the local version of the PSO, gbest is replaced by lbest and the entire procedure is the same.
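As an illustration of these update rules, here is a minimal sketch of one synchronous gbest PSO step in the common inertia-weight form of Shi and Eberhart [22], with the V_max clamping and the cognition (c1) and social (c2) constants described above; the function name, array layout and default parameter values are illustrative assumptions.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.8, c1=2.0, c2=2.0, v_max=2.0, rng=None):
    """One synchronous PSO update for the whole swarm.

    pos, vel, pbest: arrays of shape (particles, dimensions)
    gbest:           array of shape (dimensions,)
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(pos.shape)            # random weights on the cognition pull
    r2 = rng.random(pos.shape)            # random weights on the social pull
    vel = (w * vel
           + c1 * r1 * (pbest - pos)      # cognition: accelerate towards pbest
           + c2 * r2 * (gbest - pos))     # social: accelerate towards gbest
    vel = np.clip(vel, -v_max, v_max)     # velocities exceeding V_max are reduced to V_max
    return pos + vel, vel
```

In the local version, the gbest argument would simply be replaced by each particle's lbest, with the procedure otherwise unchanged.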
... of no convergence; this is a result of the particles exploring and not exploiting the optimum position.

The experiment is then carried out with the optimal parameters obtained above for the inertia weight w, V_max and the search space range, now to determine the cognition (c1) and social (c2) coefficients. It is found that c1 = 2 and c2 = 2 gave the best results (faster global convergence); c1 = 2.5 and c2 = 1.3 also gave good results.

With all these optimum values, the experiment is repeated varying the size of the swarm (number of particles) from 1 to 1000. As the size is increased from 20 to 25 and from 25 to 30, there is an improvement in the convergence rate; the improvement from 20 to 25 is noticeably higher than that from 25 to 30. When the size is increased from 30 to 50, 50 to 100 and 100 to 1000, the performance also improves, but at the cost of a higher computational time. A compromise between the computational time and the performance is a swarm size of 25 particles (for this example). The final optimal set of PSO parameters consists of these values of the inertia weight w, V_max and the search space range, together with c1 = 2, c2 = 2 and a swarm of 25 particles.

V. TRAINING OF FEEDFORWARD NEURAL NETWORKS

A. Incremental Training

B. Batch Training

The comparison of BP and PSO carried out in this paper is based on the batch mode.

VI. RESULTS

In order to compare the training capabilities of the BP and PSO algorithms, a non-linear quadratic function y = 2x^2 + 1, with data points (patterns) in the range (-1, 1), is presented in the batch mode to the feedforward neural network of Fig. 1. The flowchart for implementing the PSO global version (gbest) is given in Fig. 3. The PSO parameters used in this study are those mentioned in Section IV above. For training a neural network using the PSO, the fitness value of each particle (member) of the swarm is the value of the error function evaluated at the current position of the particle, and the position vector of the particle corresponds to the weight matrix of the network. The vector e(particle) in Fig. 3 stores the minimum error encountered by the particles for the input patterns.
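As a sketch of this training scheme, the following Python example encodes each particle's position as the flattened weight matrices of the 2-4-1 network, uses the batch MSE over the patterns -1:0.1:1 of y = 2x^2 + 1 as the fitness, and runs the gbest PSO with a swarm of 25 particles and c1 = c2 = 2 as found in Section IV. The inertia weight, V_max, initialization range and error goal are assumed values, since the paper's final parameter list did not survive extraction; the vector e mirrors the e(particle) of Fig. 3.

```python
import numpy as np

def forward_batch(weights, X):
    """Batch forward pass of the 2-4-1 network for a flat weight vector.
    weights[:8] -> W (4x2, input-to-hidden); weights[8:12] -> V (1x4)."""
    W = weights[:8].reshape(4, 2)
    V = weights[8:12].reshape(1, 4)
    a = X @ W.T                                   # eq. (1) for every pattern
    d = 1.0 / (1.0 + np.exp(-a))                  # eq. (2)
    return (d @ V.T).ravel()                      # eq. (3), linear output

def fitness(weights, X, y_target):
    """Fitness of a particle = batch MSE at its current position."""
    return np.mean((y_target - forward_batch(weights, X)) ** 2)

def train_pso(particles=25, dims=12, w=0.8, c1=2.0, c2=2.0,
              v_max=2.0, init_range=1.0, goal=0.001, max_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.arange(-1.0, 1.0001, 0.1)              # patterns -1:0.1:1
    X = np.column_stack([x, np.ones_like(x)])     # second input is the bias (1)
    y_target = 2.0 * x**2 + 1.0                   # the non-linear function y = 2x^2 + 1

    pos = rng.uniform(-init_range, init_range, (particles, dims))
    vel = np.zeros((particles, dims))
    pbest = pos.copy()
    e = np.array([fitness(p, X, y_target) for p in pos])  # e(particle): best error so far
    gbest = pbest[e.argmin()].copy()

    for it in range(max_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)         # clamp to V_max
        pos += vel
        err = np.array([fitness(p, X, y_target) for p in pos])
        improved = err < e                        # update pbest where the error improved
        pbest[improved] = pos[improved]
        e[improved] = err[improved]
        gbest = pbest[e.argmin()].copy()
        if e.min() <= goal:                       # stop at the error goal (0.001 in Table II)
            break
    return gbest, e.min(), it + 1

if __name__ == "__main__":
    weights, mse, iters = train_pso()
    print(f"best MSE {mse:.5f} after {iters} iterations")
```

With 21 patterns and 12 weight dimensions, each iteration costs particles x patterns forward passes, which is the kind of quantity tallied in Tables II and IV.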
Fig. 4 shows the mean square error (MSE) curves of the neural network with BP and PSO during training with bias value 1. Fig. 5 shows an expanded view of Fig. 4. It is clear that with the PSO the MSE drops to under 0.01 in a few iterations, unlike with BP. Fig. 6 shows the test plots after the neural network is trained with BP and PSO and subjected to the input vectors between -1 and 1. Fig. 7 shows an expanded view of Fig. 6, and it is clear that the neural network trained with the PSO algorithm approximates the nonlinear function better than the one trained with BP. This means that PSO yields a better global convergence than the conventional BP. Fig. 8 shows the MSE versus the number of iterations with BP and PSO during training for the feedforward neural network with a bias of 2. The bias has helped the BP algorithm's performance, but the PSO results are now obtained with 8 particles. It is observed from Table II that PSO still outperforms the BP despite the change in bias.
Fig. 4. Mean square error curves of neural networks during training with BP and PSO for bias 1.
Fig. 7. Magnified view of the above test curves.
Fig. 8. Mean square error curves of neural networks during training with BP and PSO for bias 2.
TABLE I
COMPARISON OF NUMBER OF COMPUTATIONS (MULTIPLICATIONS AND ADDITIONS) INVOLVED IN TRAINING A FEEDFORWARD NEURAL NETWORK OF SIZE 2 x 4 x 1 FOR BP AND PSO (WITH BIAS 1)

TABLE II
COMPARISON OF NUMBER OF COMPUTATIONS (MULTIPLICATIONS AND ADDITIONS) INVOLVED IN TRAINING A FEEDFORWARD NEURAL NETWORK OF SIZE 2 x 4 x 1 FOR BP AND PSO (WITH BIAS 2)

Error = 0.001                       PSO         BP
Patterns (input x)                  -1:0.1:1    -1:0.1:1
Iterations                          194(83      9836(307
Forward path (additions + ...)      348600      148281
Backward path (additions + ...)     71712       438396
Total (forward + backward)          420312      586617
TABLE III
NUMBER OF COMPUTATIONS (MULTIPLICATIONS AND ADDITIONS) INVOLVED IN TRAINING A FEEDFORWARD NEURAL NETWORK OF SIZE n x m x r WITH BP
Rows: eq. 1 (activation); eq. 2 (sigmoid); eq. 3 (output); forward path; eq. 4 (output error); eq. 5 (decision error); eq. 6 (activation error); eq. 7a (ΔV); eq. 7b (ΔW); eq. 8; training; total computations.
TABLE IV
NUMBER OF COMPUTATIONS (MULTIPLICATIONS AND ADDITIONS) INVOLVED IN TRAINING A FEEDFORWARD NEURAL NETWORK OF SIZE n x m x r WITH PSO
Rows include: sigmoids = particles x patterns x m.
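Because the bodies of Tables III and IV were mostly lost, the short sketch below indicates how such per-iteration counts can be tallied for an n x m x r network. Only the sigmoid row of Table IV (particles x patterns x m) is recoverable; the other expressions are standard operation counts given here as assumptions, not the paper's exact entries.

```python
def pso_counts(n, m, r, particles, patterns):
    """Assumed per-iteration operation counts for PSO training of an
    n x m x r feedforward network (forward passes only; the PSO itself
    adds the velocity/position update arithmetic for every weight)."""
    weights = n * m + m * r
    return {
        # eq. (1): n multiplications per hidden neuron, per pattern, per particle
        "hidden multiplications": particles * patterns * m * n,
        # Table IV fragment: one sigmoid per hidden neuron, per pattern, per particle
        "sigmoids": particles * patterns * m,
        # eq. (3): m multiplications per output neuron
        "output multiplications": particles * patterns * r * m,
        # velocity and position updates touch every weight dimension
        "update operations (approx.)": particles * weights,
    }

# example: the paper's 2 x 4 x 1 network, 25 particles, 21 patterns (-1:0.1:1)
print(pso_counts(n=2, m=4, r=1, particles=25, patterns=21))
```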
VII. CONCLUSIONS

A feedforward neural network learning a nonlinear function with the backpropagation and particle swarm optimization algorithms has been presented in this paper. The number of computations required by each algorithm shows that PSO requires fewer computations to achieve the same error goal as the BP. Thus, PSO is the better choice for applications that require fast learning algorithms. An important observation is that when the training points are fewer, the neural network learns the nonlinear function with six times fewer computations with PSO than required by the BP, and in the other cases comparable performance is observed. Moreover, the success of BP depends on choosing a bias value, unlike with PSO. The use of PSO as a training algorithm for recurrent neural networks has been studied and similar results have been found. Further work is to investigate using PSO to optimize PSO parameters for neural network training and other tasks. The concept of the PSO can also be incorporated into the BP algorithm to improve its global convergence rate; this is currently being studied for online learning, which is critical for adaptive real-time identification and control functions.

REFERENCES

[1] P.J. Werbos, "Backpropagation through time: what it does and how to do it", Proceedings of the IEEE, Vol. 78, No. 10, pp. 1550-1560, Oct. 1990.
[2] P.J. Werbos, The Roots of Backpropagation, New York: Wiley, 1994.
[3] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning internal representations by error propagation", in Parallel Distributed Processing, Vol. 1, Chap. 8, eds. D.E. Rumelhart and J.L. McClelland, Cambridge, MA: MIT Press, pp. 318-362, 1986.
[4] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning representations by back-propagating errors", Nature, Vol. 323, pp. 533-536, 1986.
[5] M.T. Hagan, H.B. Demuth, and M. Beale, Neural Network Design, Boston, MA: PWS Publishing Company, 1996.
[6] R. Battiti, "First- and second-order methods for learning: between steepest descent and Newton's method", Neural Computation, Vol. 4, No. 2, pp. 141-166, 1992.
[7] C.H. Chen and H. Lai, "An empirical study of the gradient descent and the conjugate gradient backpropagation neural networks", Proceedings OCEANS '92: Mastering the Oceans Through Technology, Vol. 1, pp. 132-135, 1992.
[8] M.F. Moller, "A scaled conjugate gradient algorithm for fast supervised learning", PB-339 Reprint, Computer Science Department, University of Aarhus, Denmark, 1990.
[9] R. Rosa, A Conjugate-Gradient Based Algorithm for Training of the Feed-Forward Neural Networks, Ph.D. Dissertation, Melbourne: Florida Institute of Technology, September 1989.
[10] M.T. Hagan and M.B. Menhaj, "Training feedforward networks with the Marquardt algorithm", IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 989-993, Nov. 1994.
[11] J. Salerno, "Using the particle swarm optimization technique to train a recurrent neural model", Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 45-49, 1997.
[12] C. Zhang, Y. Li, and H. Shao, "A new evolved artificial neural network and its application", Proceedings of the 3rd World Congress on Intelligent Control and Automation, Hefei, China, Vol. 2, pp. 1065-1068, June 2000.
[13] C. Zhang, H. Shao, and Y. Li, "Particle swarm optimisation for evolving artificial neural network", Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Vol. 4, pp. 2487-2490, 2000.
[14] M. Settles and B. Rylander, "Neural network learning using particle swarm optimizers", Advances in Information Science and Soft Computing, pp. 224-226, 2002.
[15] F. Van den Bergh and A.P. Engelbrecht, "Cooperative learning in neural networks using particle swarm optimizers", South African Computer Journal, Vol. 26, pp. 84-90, 2000.
[16] S. Lawrence, A.C. Tsoi, and A.D. Back, "Function approximation with neural networks and local methods: bias, variance and smoothness", Proceedings of the Australian Conference on Neural Networks ACNN 96, Canberra, Australia, pp. 16-21, 1996.
[17] Meng Jinli and Sun Zhiyi, "Application of combined neural networks in nonlinear function approximation", Proceedings of the 3rd World Congress on Intelligent Control and Automation, Vol. 2, pp. 839-841, 2000.
[18] F.M. Ham and I. Kostanic, Principles of Neurocomputing for Science and Engineering, New York: McGraw-Hill, ISBN 0070259666, 2001.
[19] B. Burton and R.G. Harley, "Reducing the computational demands of continually online trained artificial neural networks for system identification and control of fast processes", in Conf. Rec. IEEE IAS Annual Meeting, Denver, CO, pp. 1836-1843, Oct. 1994.
[20] J. Kennedy and R. Eberhart, "Particle swarm optimization", Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, Vol. IV, pp. 1942-1948, 1995.
[21] Y. Shi and R. Eberhart, "Empirical study of particle swarm optimization", Proceedings of the 1999 Congress on Evolutionary Computation, CEC 99, Vol. 3, 1999.
[22] Y. Shi and R. Eberhart, "A modified particle swarm optimizer", Proceedings of the IEEE International Conference on Evolutionary Computation, Anchorage, Alaska, USA, pp. 69-73, May 1998.
[23] R. Eberhart and Y. Shi, "Particle swarm optimization: developments, applications and resources", Proceedings of the 2001 Congress on Evolutionary Computation, Vol. 1, pp. 81-86, 2001.
[24] Y. Shi and R. Eberhart, "Parameter selection in particle swarm optimization", Proceedings of the Seventh Annual Conference on Evolutionary Programming, pp. 591-601, March 1998.
[25] J. Kennedy, R.C. Eberhart, and Y. Shi, Swarm Intelligence, San Francisco, CA: Morgan Kaufmann Publishers, 2001.
[26] H. Yoshida, Y. Fukuyama, S. Takayama, and Y. Nakanishi, "A particle swarm optimization for reactive power and voltage control in electric power systems considering voltage security assessment", IEEE SMC '99 Conference Proceedings, Vol. 6, pp. 497-502, 1999.