
2011 Seventh International Conference on Natural Computation

A Study of Hybrid Parallel Genetic Algorithm Model


WANG Zhu-rong, JU Tao, CUI Du-wu, HEI Xin-hong
School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
Abstract—Genetic algorithms face low evolution rates and difficulty meeting real-time requirements when handling large-scale combinatorial optimization problems. In this paper, we propose a coarse-grained-master-slave hybrid parallel genetic algorithm model based on multi-core cluster systems. The model integrates the message-passing model and the shared-memory model: we use the message-passing model (MPI) among nodes, corresponding to a coarse-grained Parallel Genetic Algorithm (PGA), and the shared-memory model (OpenMP) within each node, corresponding to a master-slave PGA. The model thus effectively combines the high parallel computing ability of a multi-core cluster system with the inherent parallelism of PGA. On the basis of the proposed model, we implemented a hybrid parallel genetic algorithm (HPGA) with two-layer parallelism of processes and threads and used it to solve several benchmark functions. Theoretical analysis and experimental results show that the proposed model offers versatility and convenience for parallel genetic algorithm design.

Keywords—Genetic Algorithm; Parallel Programming Model; Multi-core cluster system; OpenMP; MPI

I. INTRODUCTION

Genetic Algorithms (GA) are global optimization algorithms with many advantages, such as independence from the problem domain and strong robustness to the type of problem, so they are widely applied in many disciplines. With the rapid development of science and technology, optimization problems keep growing in scale and their search spaces keep growing in complexity, which places higher demands on the solution quality and processing speed of the traditional GA. Solving such complex optimization problems with a traditional GA requires many individuals and a large amount of computation, and its slow evolution cannot meet real-time requirements, so the traditional approach has proved inadequate [1][2]. The inherent parallelism of GA makes it well suited to implementation on large-scale parallel machines. If a Parallel Genetic Algorithm (PGA) is realized by effectively combining this inherent parallelism with the high parallel computing ability of parallel machines, the deficiencies of the traditional GA can be overcome, the solution quality can be greatly increased, and convergence can be sped up. In recent years, PGA research and applications have received widespread attention. Many researchers have devoted themselves to designing various parallel strategies and have applied PGA to a variety of problems on parallel computers. Traditional parallel machines are very expensive and unaffordable for general users, which has kept ordinary users from applying parallel genetic algorithms to large-scale combinatorial optimization problems. As technology advances and prices fall, multi-core CPUs are increasingly popular. Building a cluster system from high-performance multi-core PCs features low investment risk, structural flexibility, scalability, easy implementation, and high cost-performance, so high computing performance can be obtained easily. How to combine the GA well with existing parallel computer systems, design an effective parallel genetic algorithm, and realize the corresponding system is of positive significance for both the theory and the application of GA. Thus the realization model of the parallel genetic algorithm has become an important research direction.

II. RELATED WORK

Research on PGA mainly covers the following aspects: the population size, encoding, and parameter settings that affect the efficiency of PGA [3][4]; implementation models of PGA [5]; hybrid parallel genetic algorithm theory [6][7]; and applications of PGA [8][9]. The studies listed above mainly focus on the inherent characteristics of PGA and only loosely combine parallel genetic algorithms with parallel programming techniques. Can we fundamentally change the previous practice of simulating parallel genetic algorithms with serial methods and realize a truly parallel PGA? Arunadevi et al. [9] studied the hybrid MPI+OpenMP programming model on a multi-core node cluster system and combined the advantages of the two parallel programming models to obtain better performance. Xiaoping et al. [10] realized a master-slave parallel genetic algorithm framework on the basis of MPI, but a master-slave parallel genetic algorithm cannot fully exploit the computing performance of each cluster node because of its communication restrictions. It therefore serves neither the inherent parallelism of PGA nor the high-speed parallel computing performance of a cluster system well, and cannot handle large-scale complex combinatorial optimization problems well.


In order to give full play to the high-speed parallel computing performance of multi-core PCs and the inherent parallelism of hybrid parallel genetic algorithms, we propose a realization model for hybrid parallel genetic algorithms on a multi-core PC cluster. By combining the physical topology of the multi-core PC cluster with the logical structure of the coarse-grained-master-slave hybrid parallel genetic algorithm, and using hybrid MPI and OpenMP programming, we achieve both process-level and thread-level parallelism, realizing a coarse-grained PGA among nodes and a master-slave PGA within each node. The proposed model fundamentally improves the solution quality and convergence speed of PGA, and presents an effective way for general users to apply PGA to complex combinatorial optimization problems at low cost.

III. DESIGN AND IMPLEMENTATION OF THE HPGA MODEL

A. PGA Model Analysis

Parallel genetic algorithms seek to combine the high-speed parallel computing of parallel computers with the inherent parallelism of GA to speed up the search process, maintain and enrich the diversity of the population, reduce the likelihood of premature convergence, and complete searches of complex problems effectively and rapidly. The basic idea of PGA is to parallelize the traditional genetic algorithm through multi-population parallel evolution and a migration operator that exchanges information among populations. The evolution of the populations is assigned to different computing nodes of a parallel computer system for distributed evolution, and the migration operator exchanges excellent genes so that good genetic information spreads rapidly among the sub-populations. The algorithm can overcome the barriers of local convergence and evolve in the direction of the global optimum, thus accelerating convergence and improving the solution quality of the genetic algorithm. Assigning the population to different computing nodes in different ways results in different types of parallel genetic algorithm models. There are currently four such models: master-slave PGA (MPGA), coarse-grained PGA (CPGA), fine-grained PGA (FPGA), and hybrid PGA (HPGA) [1]. The hybrid model is a multi-layer parallel model that combines the first three basic models. There are currently three HPGA variants: coarse-grained-fine-grained, coarse-grained-coarse-grained, and coarse-grained-master-slave [2]. An HPGA is generally designed as a hierarchical structure: the upper layer commonly uses the coarse-grained model, while the lower layer can use any of the other models. Most practical applications use the coarse-grained-master-slave model.

B. HPGA Model Design

Existing realizations of HPGA are either simulated by serial methods on a single computer or implemented on a distributed system.

However, these methods do not fully utilize the inherent parallelism of PGA, owing to communication delay, load balancing and synchronization that are difficult to control, poor real-time behavior, and poor scalability. PGA has also been implemented directly on parallel computers, but the computing cost is too high. Since the physical structure of a multi-core PC cluster system coincides exactly with the logical structure of the coarse-grained-master-slave hybrid parallel genetic algorithm, the algorithm can be mapped well onto a multi-core PC cluster. Figure 1 and Figure 2 illustrate the two topological structures.

Figure 1. Physical topology of Multi-core cluster

Figure 2. HPGA Model

In Figure 1, p1, p2, p3, and p4 represent the four multi-core computing nodes of the PC cluster, and c1, c2 represent the processor cores of each node (the cluster used in this paper consists of four dual-core nodes). In Figure 2, p1, p2, p3, and p4 represent the sub-populations of the upper layer of the HPGA structure, and each sub-population is further divided into several smaller populations. Evolution within a sub-population follows the master-slave PGA, and evolution among the sub-populations follows the coarse-grained PGA. Based on the analysis of the two structures above, the coarse-grained evolution among the sub-populations of the HPGA in Figure 2 can be mapped onto the computing nodes of the multi-core PC cluster for parallel execution, while the master-slave evolution within each sub-population is mapped onto the processor cores of the corresponding node. This realizes two-layer parallelism: among the computing nodes, and among the processor cores within each node. The combination with the multi-core PC cluster gives full play to the parallelism of the coarse-grained-master-slave hybrid PGA and achieves two-layer parallelism among and within populations, which further accelerates convergence and improves the solution quality of the algorithm.


C. HPGA Model Implementation

The implementation of the HPGA model on the multi-core PC cluster is divided into a physical layer and a logic layer. The physical layer is responsible for constructing the specific parallel programming model, and the logic layer for implementing the HPGA.

(1) Physical layer implementation


In order to take full advantage of the hierarchical structure of the multi-core cluster, the physical layer combines the message-passing and shared-memory parallel programming models in hybrid MPI and OpenMP programming. The physical layer has a two-level structure: the upper level handles parallelism between nodes, and the lower level handles parallelism within a node. To achieve two-level parallelism of processes and threads, each node of the multi-core cluster corresponds to one process, and each process is divided into multiple threads according to the number of processor cores.

a) Process-level parallelism: By task decomposition, the problem is divided into parts that communicate infrequently, and each part is assigned to one multi-core computing node (that is, one process). The tasks on the computing nodes are handled in parallel through the message-passing model (MPI), and at fixed intervals, or when certain conditions are met, information is exchanged between nodes so that the overall task is handled effectively, rationally, and uniformly. MPICH2 is used as the concrete message-passing implementation, and communication between nodes is completed by calling the basic MPI functions.

b) Thread-level parallelism: After the process-level task decomposition, the task of each node is further divided into smaller tasks by data decomposition, and the decomposed tasks are assigned to the different cores of the multi-core processor as multiple threads executing in parallel. Within a node we use the shared-memory parallel programming model, OpenMP. The OpenMP programming model provides a group of platform-independent pragmas, run-time library functions, and environment variables that guide the compiler to parallelize the program according to its available parallelism. Application developers therefore do not need to deal explicitly with the technical details of thread creation, synchronization, load balancing, and destruction; these are handled by the compiler and the OpenMP thread library. Developers only need to consider questions such as which code should be executed in a multi-threaded manner and how to restructure the algorithm to obtain better performance on a multi-core processor.
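As a concrete illustration of this two-level setup, the following minimal C sketch starts one MPI process per node and one OpenMP thread per core. The paper does not list its code, so this is only a sketch under our own assumptions: the MPI_THREAD_FUNNELED support level, the compile command, and the printed diagnostics are illustrative choices, not details from the paper.

```c
/* Minimal sketch of the hybrid MPI+OpenMP structure: one MPI process
 * per cluster node, one OpenMP thread per processor core.
 * Build (assuming MPICH2 and GCC): mpicc -fopenmp -o hpga hpga.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided, rank, nprocs;

    /* FUNNELED: only the master thread of each process calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* One thread per core of the node (two on the dual-core nodes). */
    omp_set_num_threads(omp_get_num_procs());

    #pragma omp parallel
    {
        /* Each thread later works on its share of the sub-population;
         * here we only report the process/thread mapping. */
        printf("process %d/%d, thread %d/%d\n", rank, nprocs,
               omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```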

Thread-level parallelism is implemented by calling the pragmas and run-time library functions provided by OpenMP to parallelize the loop parts within a node. Before parallelization, it must be ensured that the parallel execution carries no data dependency. Within each process, only the loop parts are computed by multiple threads; the other code segments are executed by the single MPI process. Each thread decides which part it operates on according to the number of threads and its own thread ID within the process. The main purpose of combining the two parallel programming models is to make full use of the resources of the multi-core cluster, ensure load balancing, and overlap communication with computation, thereby enhancing the performance of the parallel program. With the hybrid model, multiple lightweight OpenMP threads are generated within each MPI process; the main thread (or one designated thread) performs communication while the other threads perform computation. Furthermore, the load-balancing strategy of OpenMP is used to balance the load of the processors as far as possible, so that all cores of the multi-core computers are utilized effectively. The hybrid model greatly reduces the number of communicating processes, because MPI is used only for coarse-grained communication among processes. At the same time, OpenMP shares data quickly through shared memory, which shortens task response time, handles the interaction between the processor cores within a multi-core node well, and reduces communication overhead to a large extent. The model thus effectively achieves process- and thread-level parallelism and improves performance considerably.

(2) Logic layer implementation

The logic layer implements the hybrid parallel genetic algorithm. We adopt the coarse-grained-master-slave HPGA model, which consists of two layers. The upper layer is the coarse-grained PGA; it maps onto the multi-core nodes of the physical layer, which independently execute the GA operations. The lower layer is the master-slave PGA; it maps onto the interior of each node of the multi-core cluster and implements the concrete GA. When implementing the HPGA, we first divide the whole population into a number of sub-populations equal to the number of nodes of the multi-core cluster and assign each sub-population to one computing node. Each node independently executes the GA operations of its sub-population. After a certain interval of evolution, individuals migrate between the populations according to a migration strategy, exchanging population information and maintaining population diversity; this realizes the coarse-grained parallelism between populations.
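The paper does not specify the migration topology. As one common choice, the sketch below migrates each node's best individual around a ring every fixed number of generations; GENOME_LEN, MIGRATE_GAP, and the buffer layout are illustrative assumptions.

```c
/* Coarse-grained layer: ring migration of the best individual,
 * performed by the master thread of each MPI process. */
#include <mpi.h>

#define GENOME_LEN  2    /* e.g. (x, y) for the benchmark functions */
#define MIGRATE_GAP 10   /* generations between migrations */

void migrate(double *best, double *immigrant, int generation,
             int rank, int nprocs)
{
    if (generation % MIGRATE_GAP != 0)
        return;

    int next = (rank + 1) % nprocs;           /* ring neighbours */
    int prev = (rank + nprocs - 1) % nprocs;

    /* Combined send/receive avoids deadlock on the ring: our best
     * individual travels clockwise, the neighbour's best arrives. */
    MPI_Sendrecv(best,      GENOME_LEN, MPI_DOUBLE, next, 0,
                 immigrant, GENOME_LEN, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```

The received immigrant would typically replace the worst individual of the local sub-population, keeping the communication volume per migration at a few doubles per node.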


Inside each multi-core computing node, the master-slave PGA implements the evolution within the node. Specifically, multi-core programming technology generates multiple threads that execute in parallel: individual fitness evaluation and the genetic operations are carried out by different threads. The main thread is responsible for the main genetic operations, while the other threads calculate the individual fitness in parallel. The information exchange between the main thread and the others consists of tasks distributed from the main thread to the other threads and computing results returned from the other threads to the main thread.
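A minimal sketch of this master-slave division inside one node is shown below; the population size and the external fitness function are illustrative assumptions, not values from the paper.

```c
/* Master-slave layer inside one node: the thread team evaluates
 * fitness in parallel; genetic operators stay in the master thread. */
#include <omp.h>

#define POP_SIZE   100
#define GENOME_LEN 2

extern double fitness(const double *individual); /* problem-specific */

void evaluate_population(double pop[POP_SIZE][GENOME_LEN],
                         double fit[POP_SIZE])
{
    /* The iterations are independent (no data dependency), so the
     * loop can be split safely across the slave threads. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < POP_SIZE; i++)
        fit[i] = fitness(pop[i]);

    /* Implicit barrier here: the master thread then performs
     * selection, crossover, and mutation serially. */
}
```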

IV. HPGA MODEL VERIFICATION AND ANALYSIS

A. HPGA Model Verification

On the basis of the programming model above, we designed and implemented a hybrid parallel genetic algorithm (HPGA) and used it to solve several benchmark functions from [14]. The expressions of two of these functions are given below; the main experimental results are then reported in Tables I-III and Figures 3 and 4.
$$f_1(x, y) = 100\,(x^2 - y)^2 + (x - 1)^2, \qquad x, y \in [-20, 20]$$

$$f_2(x, y) = (x^2 + y^2)^{0.25}\left(\sin^2\!\left(50\,(x^2 + y^2)^{0.1}\right) + 1\right), \qquad x, y \in [-100, 100]$$
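For reference, the two test functions translate directly into C as fitness kernels; only the expressions above are from the paper, the code itself is ours.

```c
/* The two benchmark functions from the expressions above.
 * Link with -lm for the math library. */
#include <math.h>

/* f1: Rosenbrock-type valley, global minimum 0 at (1, 1). */
double f1(double x, double y)
{
    double a = x * x - y;
    return 100.0 * a * a + (x - 1.0) * (x - 1.0);
}

/* f2: highly multimodal, global minimum 0 at the origin. */
double f2(double x, double y)
{
    double r2 = x * x + y * y;
    double s  = sin(50.0 * pow(r2, 0.1));
    return pow(r2, 0.25) * (s * s + 1.0);
}
```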

Table I. Optimization results for the test functions (BS = best solution, MBS = mean best solution, SD = standard deviation; * = no reported value)

| Algorithm | f1 BS        | f1 MBS       | f1 SD        | f2 BS        | f2 MBS       | f2 SD        |
|-----------|--------------|--------------|--------------|--------------|--------------|--------------|
| SGA       | 6.31566e-012 | 5.50959e-006 | 7.46393e-006 | 4.69946e-018 | 4.19594e-017 | 1.48088e-016 |
| HPGA      | 3.02829e-014 | 2.17738e-011 | 3.31827e-011 | 2.67745e-019 | 1.57846e-018 | 1.73462e-018 |
| ICSOA     | *            | 3.31e-007    | 3.30e-007    | *            | 0.0460       | 0.0134       |

Table II. Computation time (seconds) required by SGA and HPGA to reach the same accuracy

| Accuracy | SGA f1     | SGA f2    | HPGA f1   | HPGA f2   |
|----------|------------|-----------|-----------|-----------|
| E-9      | 2.141250   | 3.115720  | 0.553485  | 0.826940  |
| E-10     | 3.780353   | 4.345357  | 0.988512  | 1.183552  |
| E-11     | 4.121934   | 4.342384  | 1.124465  | 1.150202  |
| E-12     | 6.043432   | 6.353546  | 1.678623  | 1.667546  |
| E-13     | 6.404434   | 8.546479  | 1.765653  | 2.309091  |
| E-14     | 115.115345 | 110.56775 | 29.898834 | 29.218334 |

Table III. Speedup of HPGA over SGA at the same accuracy (the ratio of the corresponding times in Table II; e.g., 2.141250/0.553485 ≈ 3.86867 for f1 at E-9)

| Accuracy | Speedup f1 | Speedup f2 |
|----------|------------|------------|
| E-9      | 3.86867    | 3.76777    |
| E-10     | 3.82429    | 3.67145    |
| E-11     | 3.66568    | 3.77532    |
| E-12     | 3.60023    | 3.81011    |
| E-13     | 3.62723    | 3.70123    |
| E-14     | 3.85016    | 3.78419    |

Figure 3. Evolution process for function f1

Figure 4. Evolution process for function f2

The results of each experiment are averaged over 30 runs for f1 and f2. In Table I, the reported results are the best solutions (BS), mean best solutions (MBS), and standard deviations (SD); ICSOA denotes the experimental results of [14] under the same experimental conditions. From the results above we can see that HPGA has better performance.

B. Analysis and Discussion

When implementing the thread-level parallelism of the HPGA model, the OpenMP pragmas are invoked to parallelize the FOR-loop sections of the program.

The iterations of a FOR loop are divided into small iteration blocks and assigned to different threads for parallel execution. As the problem size of the PGA is fixed before processing, OpenMP's default static scheduling strategy is used, which divides the task into blocks of equal size. This significantly reduces the probability of conflict when several processors access the same memory area simultaneously, and it also guarantees load balancing between the threads.
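To make this concrete, the sketch below writes out the block partition that the static schedule performs implicitly: each thread derives one contiguous, roughly equal-sized block of iterations from its thread ID before the loop runs. The fitness_of helper is an illustrative stand-in.

```c
/* Explicit version of OpenMP's static schedule: thread t of T
 * handles the contiguous block [n*t/T, n*(t+1)/T) of iterations. */
#include <omp.h>

extern double fitness_of(int i);  /* illustrative fitness lookup */

void evaluate_static(double *fit, int n)
{
    #pragma omp parallel
    {
        int t  = omp_get_thread_num();
        int T  = omp_get_num_threads();
        int lo = (int)((long long)n * t / T);
        int hi = (int)((long long)n * (t + 1) / T);

        for (int i = lo; i < hi; i++)
            fit[i] = fitness_of(i);
    }
}
```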


Analyzing the parallel genetic algorithm programs for the benchmark functions selected in this paper, we can estimate that about 10 percent of the program must be executed serially, including initialization of the parallel environment, broadcasting the parameters of the parallel genetic algorithm, receiving the optimal migrated individuals, and outputting the results. In our setup there are 4 dual-core PCs, i.e., 8 processor cores in total. If we use only the MPI parallel programming model, we have only process-level parallelism, with one process per node, so the maximum theoretical speedup given by Amdahl's law is

$$S_1 = \frac{1}{f + (1 - f)/p} = \frac{1}{0.1 + (1 - 0.1)/4} \approx 3.08 \qquad (1)$$

where $S_1$ is the speedup, $f$ is the proportion of the serial section in the entire algorithm, and $p$ is the number of processors. If we use the hybrid parallel programming model instead, each process can generate two parallel threads in its interior. Analyzing the program of each process, about 80 percent of it can be executed in parallel by two threads, including the genetic operations of selection, crossover, and mutation, and the calculation of individual fitness. The speedup given by Amdahl's law under the hybrid parallel programming model is then

$$S_2 = \frac{1}{0.1 + (1 - 0.1)\,(0.2/4 + 0.8/8)} \approx 4.26 \qquad (2)$$

Comparing the two speedups, we conclude that the speedup obtained with the hybrid parallel programming model is 38.31% higher than with the pure MPI model. In the experimental tests, the maximum measured speedup is 3.86867, which is 90.8 percent of the ideal theoretical value above (the theoretical value does not account for communication cost). Communication cost is the bottleneck for further optimizing parallel genetic algorithms; it is mainly incurred by the migration of selected individuals among the populations. Since we introduced a population pool into the algorithm, the communication between different populations is significantly reduced. At the same time, migration is carried out only at certain intervals, which prevents sub-populations from spreading individual information before they are sufficiently evolved and further reduces the communication cost. Because communication within a node uses shared memory, its cost is negligible. With these strategies, the communication cost of the algorithm is greatly reduced.

V. CONCLUSION

In this paper, we combined the physical topology of the multi-core PC cluster with the logical structure of the coarse-grained-master-slave PGA, integrated the MPI and OpenMP parallel programming models, and put forward a concrete implementation model for hybrid parallel genetic algorithms. We implemented the HPGA with two-layer parallelism of processes and threads, and introduced the concept of a population pool into the migration operation to reduce communication costs. Our next step is further research on the proposed model and the corresponding HPGA, so that it can handle large-scale complex combinatorial optimization and real-world problems.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 60873035) and the Scientific Research Program Fund of the Education Department of Shaanxi Province (No. 2010JK713).

REFERENCES

[1] Zdeněk Konfrst, "Parallel genetic algorithms: advances, computing trends, applications and perspectives," Proceedings of the 18th International Parallel and Distributed Processing Symposium, 2004.
[2] Xue Shengjun, Guo Shaoyong, "The analysis and research of parallel genetic algorithm," IEEE Xplore, 2008:1-4.
[3] Thomaszewski B., Pabst S., Blochinger W., "Parallel techniques for physically based simulation on multi-core processor architectures," Computers and Graphics, 2008(32):25-40.
[4] Cantú-Paz E., "A Survey of Parallel Genetic Algorithms," Urbana-Champaign: Illinois Genetic Algorithms Laboratory, 1996:897-902.
[5] Guan Yu, Xu Bao-wen, "Parallel Genetic Algorithms with Schema Migration," Chinese Journal of Computers, 2003,26(3):294-301.
[6] Lai Xin-sheng, Zhang Ming-yi, "Parallel genetic algorithms with migration scheme based on penetration theory," Chinese Journal of Computers, 2005,28(7):146-152.
[7] Wu Hao-yang, Chang Bing-guo, "A multi-group parallel genetic algorithm based on simulated annealing method," Journal of Software, 2000,11(3):416-420.
[8] Rabenseifner R., Hager G., Jost G., "Hybrid MPI/OpenMP parallel programming on clusters of multi-core SMP nodes," The 17th EUROMICRO International Conference on Parallel, Distributed and Network-based Processing, 2009:427-436.
[9] Meghanathan N., Skelton G.W., "Intelligent transport route planning using parallel genetic algorithms and MPI in high performance computing cluster," The 15th International Conference on Advanced Computing and Communications, 2007:578-583.
[10] Liu Xiao-ping, An Zhu-lin, "Master-slave parallel genetic algorithm framework on MPI," Journal of System Simulation, 2004,16(9):38-41. (in Chinese)
[11] Salman Yussof, Rina Azlin Razali, Ong Hang See, Marina Md Din, "A coarse-grained parallel genetic algorithm with migration for shortest path routing problem," The 11th IEEE International Conference on High Performance Computing and Communications, 2009:615-621.
[12] Dong Li, de Supinski B.R., Nikolopoulos D.S., "Hybrid MPI/OpenMP power-aware computing," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010:1-12.
[13] Ren Zi-wu, San Ye, "Improvement of Real-Valued Genetic Algorithm and Performance Study," Acta Electronica Sinica, 2007,35(2):269-274. (in Chinese)
[14] Xu Guang-hua, Liu Dan, Liang Lin, "Immune clonal selection optimization method with combining mutation strategies," Journal of Xi'an Jiaotong University, 2007,19(2):177-181.

