Moth-Flame Optimization for Training Multi-layer Perceptrons

Waleed Yamany a,*, Mohammed Fawzy a, Alaa Tharwat b,c,*, Aboul Ella Hassanien d,e,*

a Fayoum University, Faculty of Computers and Information, Fayoum, Egypt
  Email: wsyOO@fayoum.edu.eg
b Electrical Department, Faculty of Engineering, Suez Canal University, Ismailia, Egypt
c Faculty of Engineering, Ain Shams University, Cairo, Egypt
d Faculty of Computers and Information, Cairo University, Cairo, Egypt
e Faculty of Computers and Information, Beni Suef University, Egypt
* Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net
Abstract-Multi-Layer Perceptron (MLP) is one of the Feed Forward Neural Network (FFNN) types. Searching for the weights and biases of an MLP is important to achieve minimum training error. In this paper, the Moth-Flame Optimizer (MFO) is used to train a Multi-Layer Perceptron (MLP). MFO-MLP searches for the weights and biases of the MLP to achieve minimum error and a high classification rate. Five standard classification datasets are utilized to evaluate the performance of the proposed method. Moreover, three function-approximation datasets are used to test the performance of the proposed method. The proposed method (i.e. MFO-MLP) is compared with four well-known optimization algorithms, namely, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Evolution Strategy (ES). The experimental results prove that the MFO algorithm is very competitive, solves the local optima problem, and achieves a high accuracy.

... optimization techniques to optimize the performance of the NN. On the other hand, stochastic trainers employ stochastic optimization methods to increase the performance of the NN.

In general, an NN consists of three layers, namely, the input, hidden, and output layers. Moreover, a network of connections links the nodes of all layers. The weight of each connection is adjusted during the training process. The trainer can be considered the most important element of any NN. The main goal of the trainer is to train the NN by searching for the optimal weights and biases that obtain the maximal accuracy for new sets (i.e. unknown sets or patterns) of given inputs. In other words, the trainer changes the structural parameters of the NN in every training step (iteration) to enhance the accuracy. When the training stage is finished, the model of the NN is used to predict or estimate the value of a new pattern.
Conclusions and future work are provided in Section V.
II. PRELIMINARIES

Multi-Layer Perceptron (MLP) is an FFNN that consists of one hidden layer, as shown in Fig. (1). After the inputs, weights, and biases are given, the output of the MLP is calculated in the following steps.

1) The weighted sums of the inputs are initially computed by Equation (1):

$$t_j = \sum_{i=1}^{n} (W_{ij} \cdot X_i) - B_j, \quad j = 1, 2, \ldots, h \qquad (1)$$

where $n$ is the number of input nodes, $W_{ij}$ represents the weight from the $i$th node in the input layer ($X_i$) to the $j$th node in the hidden layer ($h_j$), $X_i$ indicates the $i$th input, and $B_j$ represents the bias or threshold of the $j$th hidden node.

2) The output of each hidden node is computed as follows:

$$T_j = \mathrm{sigmoid}(t_j) = \frac{1}{1 + \exp(-t_j)}, \quad j = 1, 2, \ldots, h \qquad (2)$$

3) The final outputs are characterized based on the computed outputs of the hidden nodes:

$$o_k = \sum_{j=1}^{h} (W_{jk} \cdot T_j) - B'_k, \quad k = 1, 2, \ldots, m \qquad (3)$$

$$O_k = \mathrm{sigmoid}(o_k) = \frac{1}{1 + \exp(-o_k)}, \quad k = 1, 2, \ldots, m \qquad (4)$$

As may be seen in Equations (1), (2), (3), and (4), the weights and biases are in charge of characterizing the final output of the MLP from the given inputs. Finding suitable values for the weights and biases so as to accomplish a desirable relation between the inputs and the outputs is the precise meaning of training MLPs.

... the moths are represented in a matrix such that:

$$M = \begin{bmatrix} m_{1,1} & m_{1,2} & \cdots & m_{1,d} \\ m_{2,1} & m_{2,2} & \cdots & m_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n,1} & m_{n,2} & \cdots & m_{n,d} \end{bmatrix} \qquad (5)$$

Assume there exists an array for sorting the moths according to the value of the objective function:

$$OM = \begin{bmatrix} OM_1 \\ OM_2 \\ \vdots \\ OM_n \end{bmatrix} \qquad (6)$$

where $n$ is the number of moths.

Secondly, the other components of the MFO algorithm are the flames, which create another matrix similar to the moth matrix:

$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,d} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n,1} & f_{n,2} & \cdots & f_{n,d} \end{bmatrix} \qquad (7)$$

where $n$ is the number of flames and $d$ is the number of parameters (dimensions). Moreover, assume that there exists an array for sorting the flames according to the value of the objective function:

$$OF = \begin{bmatrix} OF_1 \\ OF_2 \\ \vdots \\ OF_n \end{bmatrix} \qquad (8)$$

where $n$ is the number of flames.

Actually, the moths are the search agents which move around the search space, and the MFO algorithm is defined by the following triple:

$$\mathrm{MFO} = (A, B, C) \qquad (9)$$

where $A$ is a function which creates the random population of moths and their fitness values:

$$A : \phi \rightarrow \{M, OM\} \qquad (10)$$
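To make the forward computation of Equations (1)-(4) concrete, the following minimal NumPy sketch evaluates a one-hidden-layer MLP for a single input vector. The bias-subtraction convention and the example 3-7-1 structure follow the equations and Table (I); the function and array names (`mlp_forward`, `W1`, `B1`, `W2`, `B2`) and the random initialization are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    # Equations (2) and (4): logistic activation 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, B1, W2, B2):
    """Forward pass of a one-hidden-layer MLP.

    x  : input vector of length n
    W1 : (h, n) weights from input layer to hidden layer
    B1 : (h,)   hidden-layer biases (subtracted, as in Eq. (1))
    W2 : (m, h) weights from hidden layer to output layer
    B2 : (m,)   output-layer biases (subtracted, as in Eq. (3))
    """
    t = W1 @ x - B1          # Eq. (1): weighted sums of the inputs
    T = sigmoid(t)           # Eq. (2): hidden-node outputs
    o = W2 @ T - B2          # Eq. (3): weighted sums of the hidden outputs
    return sigmoid(o)        # Eq. (4): final MLP outputs

# Example with the 3-7-1 structure used for the XOR dataset (random weights).
rng = np.random.default_rng(0)
W1, B1 = rng.uniform(-10, 10, (7, 3)), rng.uniform(-10, 10, 7)
W2, B2 = rng.uniform(-10, 10, (1, 7)), rng.uniform(-10, 10, 1)
print(mlp_forward(np.array([1.0, 0.0, 1.0]), W1, B1, W2, B2))
```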
In the $B$ function, the position of each moth is updated with respect to its corresponding flame using Equations (13), (14), and (15):

$$M_i = P(M_i, F_j) \qquad (13)$$

where $P$ is the spiral function, $M_i$ refers to the $i$th moth, and $F_j$ indicates the $j$th flame.

$$P(M_i, F_j) = D_j \cdot e^{bt} \cdot \cos(2\pi t) + F_j \qquad (14)$$

where $D_j$ refers to the distance between the $i$th moth and the $j$th flame, $b$ is a constant for defining the spiral function, and $t$ is a random number between $-1$ and $1$. $D_j$ is computed as:

$$D_j = |F_j - M_i| \qquad (15)$$

Fig. 1: The structure of MLP with $n$ inputs, one hidden layer, and $m$ outputs.

The position of each agent represents the fitness of that particle. The fitness of the $i$th agent is expressed in terms of the average Mean Square Error (MSE). The MSE is used to measure how far the value of the desired output deviates from the value of the actual output, as follows:

$$MSE = \sum_{i=1}^{m} (o_i^k - d_i^k)^2 \qquad (16)$$

where $m$ represents the number of outputs, and $d_i^k$ and $o_i^k$ are the desired and actual outputs, respectively, of the $i$th output unit when the $k$th training sample is used. Hence, the average MSE is calculated by averaging the MSEs of all training samples as follows:

$$\overline{MSE} = \sum_{k=1}^{N} \frac{\sum_{i=1}^{m} (o_i^k - d_i^k)^2}{N} \qquad (17)$$

where $N$ is the total number of training samples. The objective function of the MFO algorithm aims to minimize the average MSE, i.e. $\min F(\vec{V}) = \overline{MSE}$. Generally, MFO iteratively moves the weights and biases of the MLP to minimize the average MSE and converges to a solution that is better than the random initial solutions. Hence, in each iteration the weights and biases are changed and the moths' positions are updated. However, there is no absolute guarantee of finding the global or most optimal solution for the MLP due to the stochastic nature of the MFO algorithm.
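The flame-approach update of Equations (13)-(15) can be sketched as follows. This is a simplified, single-moth illustration: the flame assignment, the sorting of Equations (6)-(8), and the adaptive reduction of the number of flames used in the full MFO algorithm [11] are omitted, and drawing $t$ independently per dimension is an implementation assumption.

```python
import numpy as np

def spiral_update(moth, flame, b=1.0, rng=None):
    """Move one moth towards its flame along a logarithmic spiral (Eqs. 13-15).

    moth, flame : position vectors, e.g. candidate MLP weight/bias vectors
    b           : spiral shape constant (Table II uses b = 1)
    """
    rng = rng or np.random.default_rng()
    D = np.abs(flame - moth)                  # Eq. (15): distance from moth to flame
    t = rng.uniform(-1.0, 1.0, moth.shape)    # random t in [-1, 1], drawn per dimension here
    return D * np.exp(b * t) * np.cos(2 * np.pi * t) + flame   # Eqs. (13)-(14)

# Example: one update step for a 5-dimensional moth.
rng = np.random.default_rng(42)
moth, flame = rng.uniform(-10, 10, 5), rng.uniform(-10, 10, 5)
print(spiral_update(moth, flame, rng=rng))
```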
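To evaluate a moth, its position vector is decoded into the MLP weights and biases, and the average MSE of Equations (16)-(17) is computed over the training samples. The sketch below assumes a simple flat parameter encoding and random example data; both are illustrative choices rather than the authors' exact encoding.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def average_mse(vector, X, D, n, h, m):
    """Objective of Eqs. (16)-(17): per-sample squared error averaged over N samples.

    vector : flat moth position holding all MLP weights and biases
    X, D   : training inputs (N, n) and desired outputs (N, m)
    """
    i = 0
    W1 = vector[i:i + h * n].reshape(h, n); i += h * n   # input-to-hidden weights
    B1 = vector[i:i + h];                   i += h       # hidden biases
    W2 = vector[i:i + m * h].reshape(m, h); i += m * h   # hidden-to-output weights
    B2 = vector[i:i + m]                                 # output biases
    total = 0.0
    for x, d in zip(X, D):
        T = sigmoid(W1 @ x - B1)           # Eqs. (1)-(2)
        O = sigmoid(W2 @ T - B2)           # Eqs. (3)-(4)
        total += np.sum((O - d) ** 2)      # Eq. (16): squared error of one sample
    return total / len(X)                  # Eq. (17): average over the N samples

# Example: fitness of a random moth for a 4-9-1 structure (55 parameters).
rng = np.random.default_rng(1)
n, h, m = 4, 9, 1
moth = rng.uniform(-10, 10, n * h + h + h * m + m)
X = rng.random((16, n)); D = rng.integers(0, 2, (16, m)).astype(float)
print(average_mse(moth, X, D, n, h, m))
```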
... ranged from a simple dataset, such as the XOR dataset, to a more complicated dataset, such as the breast cancer dataset. The XOR dataset consists of eight training and eight testing samples, only two classes, and each sample is represented by three attributes. On the other hand, the breast cancer dataset has 599 training samples, 100 testing samples, two classes, and each sample is represented by nine features. Moreover, another three function-approximation datasets, namely, sigmoid, cosine, and sine, are obtained from [13]. All three datasets have the same MLP structure (1-15-1) and have one attribute. The sigmoid dataset consists of 61 training samples and 121 testing samples, the cosine dataset consists of 31 training samples and 38 testing samples, and the sine dataset consists of 126 training samples and 252 testing samples. The training and testing samples are chosen from each dataset to evaluate the performance of the proposed model.

TABLE I: Datasets description [14].

Dataset         # Attributes   # Training Samples   # Testing Samples   # Classes   MLP Structure
3-bits XOR      3              8                    8                   2           3-7-1
Iris            4              150                  150                 3           4-9-3
Heart           22             80                   187                 2           22-45-1
Breast Cancer   9              599                  100                 2           9-19-1
Balloon         4              16                   16                  2           4-9-1

B. Experimental Setup

The initial parameters of all optimization algorithms are summarized in Table (II). Moreover, the weights and biases are randomly initialized in the range [-10, 10] for all datasets. The population size of all algorithms is 50 for the XOR dataset and 200 for the rest of the datasets. Further, the maximum number of iterations is 250. Furthermore, the structure of the MLP for each dataset is presented in Table (I).

TABLE II: Initial parameters of the optimization algorithms.

Optimization Algorithm   Parameter                          Value
GA                       Crossover                          Single point (probability = 1.0)
                         Mutation                           Uniform (probability = 0.01)
                         Type                               Real coded
PSO                      Topology                           Fully connected
                         Social constant (c2)               1
                         Cognitive constant (c1)            1
                         Inertia constant (w)               0.3
ACO                      Initial pheromone (τ)              1e-06
                         Pheromone update constant          20
                         Pheromone constant (q)             1
                         Global pheromone decay rate (pg)   0.9
                         Local pheromone decay rate         0.5
                         Pheromone sensitivity (a)          1
ES                       λ                                  10
                         σ                                  1
MFO                      b                                  1
                         t                                  [-1, 1]

Due to the different ranges of the attributes, a normalization step is essential for the MLP. In this work, the min-max normalization method is used, which is calculated as in Equation (18). In the min-max normalization method, the variable x is mapped from the interval [a, b] to the interval [c, d] (a short code sketch of this step is given at the end of this section). Moreover, in this research, the number of hidden nodes of the MLPs is assumed to be equal to 2 × N + 1, where N represents the number of attributes of the dataset.

$$x' = \frac{(x - a) \times (d - c)}{(b - a)} + c \qquad (18)$$

Each algorithm is run five times on each dataset, and the average (AVG) and standard deviation (STD) of the best Mean Square Errors (MSEs) in the last iteration of each algorithm are calculated. The best classification rates or test errors of each algorithm are also calculated.

C. Experimental Scenarios

In order to verify the performance of the proposed algorithm (i.e. MFO), four well-known optimization algorithms, namely, PSO, ACO, ES, and GA, are compared with the MFO algorithm on five standard benchmarks and three function-approximation datasets. In this section, two experimental scenarios are performed. In the first experimental scenario, five sub-experiments are performed, in which all optimization algorithms are applied to the five standard MLP datasets. In the second experimental scenario, three sub-experiments are performed, in which all optimization algorithms are applied to the three function-approximation datasets. In each sub-experiment, all optimization algorithms are applied to one dataset.

According to the first scenario, in the first sub-experiment, the XOR dataset is used. As shown in Table (I), the XOR dataset consists of three attributes, eight training samples, eight testing samples, two classes, and one output. In the second sub-experiment, the Iris dataset, which is one of the most common standard datasets, is used. The Iris dataset consists of four attributes, 150 training samples, 150 testing samples, three classes, and three outputs, as shown in Table (I). The Heart dataset is used in the third sub-experiment. As shown in Table (I), the heart dataset consists of 22 attributes, 80 training samples, 187 testing samples, two classes, and one output, and the structure of the MLP is 22-45-1. The fourth sub-experiment is applied to the Breast cancer dataset. The breast cancer dataset consists of nine attributes, 599 training samples, 100 testing samples, two classes, and one output, and the structure of the MLP is 9-19-1. Thus, 209 variables are optimized. In the fifth sub-experiment, the balloon dataset is used. As shown in Table (I), the balloon dataset consists of four attributes, 16 training samples, 16 testing samples, two classes, and one output, and the structure of the MLP is 4-9-1; thus, the dimension of the trainer is 55. The MSEs and classification rates of all sub-experiments are summarized in Table (III) and Fig. (3), respectively.

In the second scenario, the function-approximation datasets are used. In the first sub-experiment, the sigmoid function, which is the simplest function in the function-approximation datasets, is used. The sigmoid dataset consists of one attribute, 61 training samples, 121 testing samples, and one output, and the structure of the MLP is 1-15-1. The cosine function, which is more difficult than the sigmoid function, is used in the second sub-experiment. This dataset consists of one attribute, 31 training samples, 38 testing samples, and one output, and the structure of the MLP is 1-15-1. In the third sub-experiment, the sine function is used. The sine function-approximation dataset consists of one attribute, 126 training samples, 252 testing samples, and one output, and the structure of the MLP is 1-15-1. The results of all sub-experiments in this scenario are summarized in Table (IV) and Fig. (4).
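Referring back to the min-max normalization of Equation (18), a minimal sketch is given below. The function name and the default target interval [0, 1] are assumptions, since the paper does not state which [c, d] interval it uses.

```python
import numpy as np

def min_max_normalize(x, a, b, c=0.0, d=1.0):
    """Map x from the original interval [a, b] to [c, d] (Equation (18))."""
    return (x - a) * (d - c) / (b - a) + c

# Example: scale one attribute column of a dataset to [0, 1].
column = np.array([2.0, 5.0, 9.0])
print(min_max_normalize(column, column.min(), column.max()))
```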
TABLE III: MSE for the XOR, iris, heart, breast cancer, and balloon datasets.

Algorithm   XOR                       Iris                    Heart                   Breast Cancer            Balloon
MFO         1.0189e-09 ± 1.5111e-09   0.0221 ± 0.0028         0.1982 ± 0.006879       0.00022 ± 4.8597e-07     1.3033e-20 ± 2.8898e-20
PSO         0.084050 ± 0.035945       0.228680 ± 0.057235     0.188568 ± 0.008939     0.034881 ± 0.002472      0.000585 ± 0.000749
GA          0.000181 ± 0.000413       0.089912 ± 0.123638     0.093047 ± 0.022460     0.003026 ± 0.001500      5.08e-24 ± 1.06e-23
ACO         0.180328 ± 0.025268       0.405979 ± 0.053775     0.228430 ± 0.004979     0.013510 ± 0.002137      0.004854 ± 0.007760
ES          0.118739 ± 0.011574       0.314340 ± 0.052142     0.192473 ± 0.015174     0.040320 ± 0.002470      0.019055 ± 0.170260
Fig. 3: Classification rates for XOR, iris, heart, breast cancer, and balloon datasets.
TABLE IV: MSE for the sigmoid, cosine, and sine datasets.
Algorithm Sigmoid Cosine Sine
MFO 0.000198 ± 0.000018 0.00035 ± 0.00012 0.192 ± 0.001
PSO 0.023 ± 0.0093 0.0591 ± 0.0211 0.61 ± 0.0711
GA 0.00139 ± 0.001 0.0112 ± 0.00613 0.442 ± 0.06
ACO 0.0241 ± 0.0101 0.0509 ± 0.0111 0.56 ± 0.0512
ES 0.0772 ± 0.0172 0.0872 ± 0.0221 0.73 ± 0.0751
The MFO algorithm achieved superior results compared with the other four algorithms. According to the MSE, the MFO algorithm achieved a relatively minimal MSE, which reflects the high local optima avoidance of this algorithm. The reason for the minimal MSE of the MFO algorithm is its high exploratory behavior, which helps in local optima avoidance. In other words, in the MFO algorithm, half of the iterations are devoted to the exploration of the search space, which changes for every dataset when training MLPs, while the rest of the iterations are devoted to exploitation. High exploitation behavior leads to a rapid convergence towards the global optimum, hence solving the local optima problem. According to the classification rate, the MFO algorithm achieved the highest rate among all other algorithms. The reason for the high classification rate is that MFO has adaptive parameters to smoothly balance between exploitation and exploration. In general, the MFO algorithm is more suitable and effective in the case of difficult and complicated datasets, and it is recommended for optimizing the training process of MLPs.

V. CONCLUSIONS

In this paper, the MFO algorithm is proposed to search for the weights and biases to train an MLP. The proposed algorithm (i.e. MFO) is applied to five standard classification datasets, namely, the XOR, iris, heart, breast cancer, and balloon datasets, and three function-approximation datasets, namely, sigmoid, sine, and cosine. Four well-known optimization algorithms, namely, PSO, GA, ES, and ACO, are also used to train the MLP, and the results of the MFO algorithm are compared with these four optimization algorithms. The results showed that the MFO algorithm is effective in training MLPs and solves the local minimum problem efficiently. Hence, MFO helps in finding the optimal weights and biases and achieves a low MSE and a high classification rate.

REFERENCES

[3] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1, pp. 1-6, 1998.
[4] S. Ghosh-Dastidar and H. Adeli, "Spiking neural networks," International Journal of Neural Systems, vol. 19, no. 04, pp. 295-308, 2009.
[5] T. Kohonen, "Improved versions of learning vector quantization," in IJCNN International Joint Conference on Neural Networks, 1990. IEEE, 1990, pp. 545-550.
[6] G. Bebis and M. Georgiopoulos, "Feed-forward neural networks," IEEE Potentials, vol. 13, no. 4, pp. 27-31, 1994.
[7] A. Tharwat, T. Gaber, M. M. Fouad, V. Snasel, and A. E. Hassanien, "Towards an automated zebrafish-based toxicity test model using machine learning," Proceedings of the International Conference on Communications, Management, and Information Technology (ICCMIT'2015), Procedia Computer Science, vol. 65, pp. 643-651, 2015.
[8] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Basic Books, 1991, vol. 1.
[9] R. J. Williams and J. Peng, "An efficient gradient-based algorithm for on-line training of recurrent network trajectories," Neural Computation, vol. 2, no. 4, pp. 490-501, 1990.
[10] A. Tharwat, T. Gaber, A. E. Hassanien, M. Shahin, and B. Refaat, "SIFT-based Arabic sign language recognition system," in Afro-European Conference for Industrial Advancement, Villejuif, France, September 9-11. Springer, 2015, pp. 359-370.
[11] S. Mirjalili, "Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm," Knowledge-Based Systems, 2015.
[12] A. I. Hafez, H. M. Zawbaa, A. E. Hassanien, and A. A. Fahmy, "Networks community detection using artificial bee colony swarm optimization," in Proceedings of the Fifth International Conference on Innovations in Bio-Inspired Computing and Applications (IBICA), Ostrava, Czech Republic, June 23-25. Springer, 2014, pp. 229-239.
[13] S. Mirjalili, S. M. Mirjalili, and A. Lewis, "Let a biogeography-based optimizer train your multi-layer perceptron," Information Sciences, vol. 269, pp. 188-209, 2014.
[14] C. Blake and C. J. Merz, "UCI repository of machine learning databases," 1998.