
Proceedings of the International Conference on Information Technologies (InfoTech-2016) 20-21 September 2016, Bulgaria

WEB DISTRIBUTED COMPUTING FOR EVOLUTIONARY TRAINING OF ARTIFICIAL NEURAL NETWORKS 1

Todor Balabanov, Delyan Keremedchiev, Ilia Goranov

Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. Georgi Bonchev Str., Block 2, Office 514, 1113 Sofia, Bulgaria
e-mail: todorb@iinf.bas.bg

Abstract: Evolutionary algorithms (EAs) are widely used for training artificial neural networks (ANNs). EAs are computationally attractive because the problem can be separated into smaller pieces, each calculated on a different machine (distributed computing). Distributed computing platforms are well established; the most popular is BOINC, created at Berkeley. A common problem of such platforms is the heterogeneity of the computational environment. One way to address this heterogeneity is to use a widely deployed technology such as AJAX. In this study a web-based distributed computing platform (JavaScript and AJAX) is presented. The platform is used for ANN training with EAs.

Key words: artificial neural networks, evolutionary algorithms, optimization, AJAX

1. INTRODUCTION

Artificial neural networks (ANNs) trace their development back to 1943 and the work of Warren McCulloch and Walter Pitts [1]. In recent developments the most common ANN model is a three-layer network trained with back-propagation of error (BP). This type of ANN is a directed weighted graph. Each node has its own activity, and the strength of the connections between nodes determines how the individual neural elements interact. Conventionally the network is divided into three layers: the first layer accepts information from the external environment and is referred to as the input layer; the third layer presents the network's results to the external environment and is called the output layer.

1 This work was supported by private funding of Velbazhd Software LLC.


Between the input and the output layer stands a hidden layer, which has an essential role in the operation of the network but also hides most of the theoretical unknowns (e.g. what size of the hidden layer, or how many hidden layers, is optimal). In the classical three-layer ANN model only forward connections from input to output are allowed; the presence of feedback connections is characteristic of recurrent ANNs. ANNs are most frequently used for classification or prediction tasks [2][3]. The main task of the classic three-layer ANN is to approximate the function between the input and the output. This process of approximation is called training. Training is the problem of finding values of the network weights for which the network performs the task it was designed for. Once trained, ANNs are extremely effective in practice, but the training process itself is often slow and not very efficient [4][5][6]. Many training algorithms (exact or approximate) have been developed for finding optimal weight values. They are based on gradient methods, evolutionary algorithms (EAs) and heuristic approaches to global optimization [7][8][9][10][11].

When population-based algorithms are used for ANN training, a significant advantage is the possibility to run the training process as a parallel implementation or even as calculations in a distributed environment [12][13]. Calculations in a distributed environment are well suited to large tasks that can be computed in parallel, and ANN training based on algorithms from the EA group is ideal for such an implementation. Different distributed computing platforms exist; the most popular is BOINC with its project SETI@home [14][15].

A major drawback of the most popular platforms for calculations in a distributed environment is the challenge of working in a heterogeneous environment, where the hardware and operating systems of the individual computers vary greatly. This shortcoming is clearly illustrated by the BOINC platform: the creator of a distributed computing project is responsible for developing client programs for almost every configuration (hardware and operating system) that the project should support. Even supporting only 32-bit and 64-bit versions of Windows, Linux and Mac OS X leads to at least six different client applications. The variety of hardware and operating systems is thus a major obstacle to the scalability of a distributed computing platform. At the same time, high scalability can be achieved if the calculations are carried out with a technology that is widespread across hardware platforms and operating systems. Such a technology is the web browser with its JavaScript programming language. Any modern web browser allows asynchronous requests to the web server through the AJAX technology, as sketched below.
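A minimal sketch of this scheme in JavaScript follows; the /work and /result endpoints and the compute() step are illustrative assumptions for the example, not the actual VitoshaTrade API:

// Ask the central server for a work unit, compute it locally in the
// browser, and post the result back; then repeat.
function requestWorkUnit() {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/work', true); // asynchronous AJAX request
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            var unit = JSON.parse(xhr.responseText);
            reportResult(compute(unit));
        }
    };
    xhr.send();
}

// Placeholder for the local calculation (ANN training with DE, Section 3).
function compute(unit) {
    return { id: unit.id, payload: unit }; // illustrative only
}

function reportResult(result) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/result', true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4) {
            requestWorkUnit(); // fetch the next work unit
        }
    };
    xhr.send(JSON.stringify(result));
}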

2. PROBLEM DEFINITION

This study presents a model for ANN training using an evolutionary learning algorithm, differential evolution (DE) [16][17], which belongs to the EA group. The ANN training is done in a distributed environment. The network is represented as JavaScript code, and communication with the central node is performed using


asynchronous AJAX requests. The goal of the ANN training is forecasting the rates of different currencies on the global currency market [18]. Forecasting is done by a mathematical model based on an ANN whose training is carried out with DE in the form of distributed computing. The ANN topology is itself a subject of research, and for this reason the topology is set by the user in the developed system. Eligible topology types are: multilayer, multilayer with feedback, and fully connected. The model is based on the classical ANN as used in models with BP. A linear transfer function is used:

u_i = \sum_{j} w_{ij} x_j    (1)

The transfer function defines how the input signals, in combination with the weight coefficients, influence the activity of the respective neuron. Although models with other types of transfer functions are available, at this stage preference is given to the simplest model based on linearity. The result of the transfer function must be normalized by a threshold function (the selected model uses a sigmoid function with values ranging from 0.0 to 1.0). Normalization is needed because of the varying number of connections between different neurons.

x_i = \frac{1}{1 + e^{-u_i}}    (2)

The sigmoid is preferred as a threshold function because of its suitable properties in terms of differentiability and its horizontal asymptotes towards plus and minus infinity. It is possible to use a binary function or a linear function (to improve performance), but their properties affect the results that the ANN can achieve. If the network works with values from -1 to +1, the hyperbolic tangent can be used as the threshold function. A sketch of the resulting forward pass is given below.
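A compact JavaScript sketch of equations (1) and (2), assuming the network is stored as a square weight matrix w over all neurons and a vector x of current activities (names chosen here for illustration):

// One forward step: linear summation (1) followed by the sigmoid
// threshold function (2) for every neuron in the network.
function forwardPass(w, x) {
    var n = x.length;
    var next = new Array(n);
    for (var i = 0; i < n; i++) {
        var u = 0.0;
        for (var j = 0; j < n; j++) {
            u += w[i][j] * x[j]; // u_i = sum_j w_ij * x_j
        }
        next[i] = 1.0 / (1.0 + Math.exp(-u)); // x_i = 1 / (1 + e^(-u_i))
    }
    return next;
}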

3. PROBLEM SOLUTION

Training is implemented with the help of DE. Within the DE algorithm each chromosome represents a set of weights for a specific ANN topology and a certain number of training examples. The individual chromosomes (weight sets) are therefore sequentially loaded into the structure of the ANN, after which the training examples are fed in. The error which the ANN makes on each example is calculated, and the summed error over the examples defines the fitness (viability) coefficient of the chromosome (the weight set). The way the fitness coefficient is determined is one of the key issues for the success of the overall forecast. With time series it is reasonable to present the training examples in chronological order, which could be a problem for an ANN trained with an algorithm such as BP. It is also reasonable for chronologically earlier training examples to exert less influence on the fitness value, as in the sketch below. Once rated, the individual chromosomes enter the computational scheme of DE through selection, crossover and mutation.
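An illustrative sketch of this fitness evaluation follows; the ann helper object and the linear recency weighting are assumptions for the example, not the exact scheme of the system:

// Fitness of one DE chromosome: load its weights into the ANN, feed the
// training examples in chronological order and accumulate the weighted
// prediction error; later examples contribute more to the total.
function evaluateChromosome(chromosome, examples, ann) {
    ann.loadWeights(chromosome); // assumed helper: copies weights into the ANN
    var m = examples.length;
    var error = 0.0;
    for (var k = 0; k < m; k++) {
        var predicted = ann.predict(examples[k].input); // forward pass
        var diff = predicted - examples[k].target;
        error += ((k + 1) / m) * diff * diff; // recency weight (assumed)
    }
    return error; // lower is better; DE minimizes this value
}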


In a parallel version of the algorithm, local copies of the ANN and the DE population are created and training takes place locally. Each chromosome in the population represents an individual in the search space, and different individuals are grouped relatively close together in that space. In parallel calculations on multiple computers, such grouping of individuals can lead to locally different research areas in the solution space. In this respect the most significant element is the synchronization policy. The synchronization process includes reporting and collecting the best individuals at a common centralized location (in a client-server architecture), as sketched below. This global set of the most viable individuals can be used to seed further local search areas. Instead of a client-server architecture it is possible to implement a peer-to-peer solution without a centralized server, in which every locally running application communicates with the other locally running applications. A major advantage of the proposed distributed system is its extremely high degree of scalability.

The advantages of the proposed model are the following. First, using DE for ANN training avoids the danger of catastrophic forgetting that occurs in BP-based training. Second, DE can train ANNs with recurrent links. Third, DE can train an ANN no matter in which order the training examples are presented. Fourth, various copies of the ANN can be trained in parallel, which improves performance and gives better coverage of the search area.

The proposed model has the following disadvantages. Using an ANN, regardless of the problem being solved, involves a very slow and difficult training process. An ANN is very effective after being trained, but the slow training requires large amounts of computing resources. Using DE slows the training process compared to BP-based algorithms; it is nevertheless preferred because of the many advantages listed above. Combining DE with BP presents an interesting direction for future research, although this cannot be achieved without some compromise with the ANN topology. Finally, even though web technology is relatively clean, developing a distributed computing system is significantly more complicated than writing sequential programs and even more complicated than writing parallel programs, so technological drawbacks are possible.
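The client-server synchronization could be sketched as follows; the /best endpoint and the population helpers are hypothetical names for this example:

// Periodic synchronization with the central node: report the best local
// chromosome, receive the current global best and inject it into the
// local DE population.
function synchronize(population) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/best', true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            var globalBest = JSON.parse(xhr.responseText);
            population.replaceWorst(globalBest); // assumed helper
        }
    };
    xhr.send(JSON.stringify(population.best())); // assumed helper
}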

4. EXPERIMENTAL RESULTS AND DISCUSSION

The performance of JavaScript implementations is the main argument against using the language for large-volume calculations or for computations that require high accuracy. JavaScript falls into the group of scripting languages whose code is not compiled to processor instructions; instead, the code is interpreted by software modules called interpreters.


Fig. 1. Comparison of the speed of information propagation through the ANN during the forward pass, between C++ and JavaScript. The X axis shows the number of neurons; the Y axis shows the execution time (in milliseconds).


Fig. 2. Standard deviation of the time for information propagation through the ANN during the forward pass, for C++ and JavaScript. The X axis shows the number of neurons; the Y axis shows the standard deviation.


At any time the interpreter may suspend the execution of the program code; therefore the calculations are not sufficiently reliable. To check the difference in performance of the proposed AJAX-JavaScript solution, a series of experiments on the propagation of information through the ANN during the forward pass was carried out. The code for the forward pass is developed in two separate modules of the VitoshaTrade [18] system: a module in C++ and a module in JavaScript. It is apparent from Fig. 1 that the two speeds are comparable for networks of 10 to 100 neurons. Each experiment was executed 30 times and the average values are presented in the figure. As regards performance, C++ and JavaScript are comparable, but the computing process in C++ is more stable, as can be seen in Fig. 2, which presents the standard deviation of the time needed for the computation. This difference is mainly due to the presence of an interpreter and a web browser, which are absent in the C++ calculations. In languages of the C++ category, the program is first translated to assembly language and then to machine code. The measurement procedure is sketched below.
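A minimal JavaScript harness of the kind such a measurement implies (30 repetitions, mean and standard deviation); forwardPass is the routine sketched in Section 2, and the use of Date.now() for timing is an assumption of this example:

// Time the forward pass over a number of runs and report the mean and
// the standard deviation of the execution time in milliseconds.
function benchmark(w, x, runs) {
    var times = [];
    for (var r = 0; r < runs; r++) {
        var start = Date.now();
        forwardPass(w, x);
        times.push(Date.now() - start);
    }
    var mean = times.reduce(function (a, b) { return a + b; }, 0) / runs;
    var variance = times.reduce(function (a, t) {
        return a + (t - mean) * (t - mean);
    }, 0) / runs;
    return { mean: mean, deviation: Math.sqrt(variance) };
}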

5. CONCLUSION

Realizing the calculations in a distributed environment as an AJAX web-based system leads to a very high degree of scalability. Practically, the distributed computation can run on any device with a modern web browser able to execute JavaScript and AJAX. The calculation is carried out within the web browser, which is a process in an address space of the operating system; the OS in turn runs on the physical hardware. Although comparable in performance, the calculations are less reliable (because of the interpreter) than implementations in languages like C/C++ or assembler.

REFERENCES

[1] McCulloch, W., W. Pitts (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5(4): 115-133. doi:10.1007/BF02478259.
[2] Zissis, D. (October 2015). A cloud based architecture capable of perceiving and predicting multiple vessel behaviour. Applied Soft Computing 35.
[3] Forrest, M. D. (April 2015). Simulation of alcohol action upon a detailed Purkinje neuron model and a simpler surrogate model that runs >400 times faster. BMC Neuroscience 16(27). doi:10.1186/s12868-015-0162-6.

[4] Werbos, P. J. (1975). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.
[5] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks 61: 85-117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003.
[6] Edwards, C. (25 June 2015). Growing pains for deep learning. Communications of the ACM 58(7): 14-16. doi:10.1145/2771283.


[7] Forouzanfar, M., H. R. Dajani, V. Z. Groza, M. Bolic, S. Rajan (July 2010). Comparison of Feed-Forward Neural Network Training Algorithms for Oscillometric Blood Pressure Estimation. 4th Int. Workshop on Soft Computing Applications. Arad, Romania: IEEE.
[8] de Rigo, D., A. Castelletti, A. E. Rizzoli, R. Soncini-Sessa, E. Weber (January 2005). A selective improvement technique for fastening Neuro-Dynamic Programming in Water Resources Network Management. In: P. Zítek (ed.), Proceedings of the 16th IFAC World Congress. Prague, Czech Republic: IFAC. doi:10.3182/20050703-6-CZ-1902.02172. ISBN 978-3-902661-75-3. Retrieved 30 December 2011.

[9] Ferreira, C. (2006). Designing Neural Networks Using Gene Expression Programming. In: A. Abraham, B. de Baets, M. Köppen, B. Nickolay (eds.), Applied Soft Computing Technologies: The Challenge of Complexity, pp. 517-536. Springer-Verlag.
[10] Da, Y., G. Xiurun (July 2005). An improved PSO-based ANN with simulated annealing technique. In: T. Villmann (ed.), New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks. Elsevier. doi:10.1016/j.neucom.2004.07.002.
[11] Wu, J., E. Chen (May 2009). A Novel Nonparametric Regression Ensemble for Rainfall Forecasting Using Particle Swarm Optimization Technique Coupled with Artificial Neural Network. In: H. Wang, Y. Shen, T. Huang, Z. Zeng (eds.), 6th International Symposium on Neural Networks, ISNN 2009. Springer. doi:10.1007/978-3-642-01513-7_6. ISBN 978-3-642-01215-0.
[12] Rumelhart, D. E., J. McClelland (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
[13] Russell, I. Neural Networks Module. Retrieved 2012.
[14] Anderson, D. P., J. Cobb, E. Korpela, M. Lebofsky, D. Werthimer (November 2002). SETI@home: An experiment in public-resource computing. Communications of the ACM 45(11): 56-61.
[15] Anderson, D. (2004). BOINC: A System for Public-Resource Computing and Storage. In: Proceedings of the 5th IEEE/ACM International GRID Workshop, Pittsburgh, USA.
[16] Storn, R., K. Price (1997). Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341-359.
[17] Price, K. (1999). An introduction to differential evolution. In: D. Corne, M. Dorigo, F. Glover (eds.), New Ideas in Optimization, pp. 79-108. McGraw-Hill, UK.
[18] Balabanov, T. VitoshaTrade - Distributed System for Forex Forecasting by Artificial Neural Networks and Evolutionary Algorithms. https://github.com/TodorBalabanov/VitoshaTrade/tree/master/ajax/