First International Conference on Emerging Trends in Engineering and Technology
Genetic Algorithm Based Clustering: A Survey

Rahila H. Sheikh
RCERT, Chandrapur (M.S.), Pin: 442403. Email: rahila.patel@gmail.com
M. M. Raghuwanshi
GNIET, Nagpur. Email: m_raghuwanshi@rediffmail.com
Anil N. Jaiswal
GHRCE, Nagpur. Email: jaiswal_an@yahoo.com
Abstract
This survey presents the state of the art in genetic algorithm (GA) based clustering techniques. Clustering is a fundamental and widely applied method for understanding and exploring a data set. Interest in clustering has increased recently due to the emergence of several new application areas, including data mining, bioinformatics, web usage data analysis and image analysis. To enhance the performance of clustering algorithms, genetic algorithms (GAs) are applied to them. GAs are the best-known evolutionary techniques. Their search capability is used to evolve the proper number of clusters and to provide an appropriate clustering. This paper presents some existing GA based clustering algorithms and their application to different problems and domains.
Keywords- Genetic algorithm, clustering algorithms, pattern recognition, GA based clustering.
1. Introduction
Clustering is the unsupervised classification of patterns (or data items) into groups (or clusters). A resulting partition should possess the following properties: (1) homogeneity within the clusters, i.e. data that belong to the same cluster should be as similar as possible, and (2) heterogeneity between clusters, i.e. data that belong to different clusters should be as different as possible. Several algorithms require certain parameters for clustering, such as the number of clusters and the cluster shapes. Many non-GA-based clustering algorithms have been widely used, such as K-means, fuzzy c-means and EM. However, the number of clusters in a data set is not known in most real-life situations. None of these non-GA-based clustering algorithms is capable of efficiently and automatically forming natural groups from all the input patterns, especially when the number of clusters in the data set tends to be large. This is often due to a bad choice of initial cluster centers.
Complex problems such as unsupervised or nonparametric clustering are often dealt with by employing an evolutionary approach. GAs are the best-known evolutionary techniques [1]. The capability of GAs is applied to evolve the proper number of clusters and to provide an appropriate clustering. The parameters of the search space are represented in the form of strings (chromosomes), which are encoded as a combination of cluster centroids. A collection of such chromosomes is called a population. Initially, a random population is created, which represents different solutions in the search space. An objective and fitness function is associated with each chromosome and represents its degree of fitness. Based on the principle of survival of the fittest, a few of the chromosomes are selected and passed into the next generation. Biologically inspired operators such as crossover and mutation are applied to the chromosomes to yield new child chromosomes. The process of selection, crossover and mutation continues for several generations until the termination criterion is satisfied. The fittest chromosome seen in the last generation provides the best solution to the clustering problem.
This paper is organized as follows: Section 2 introduces clustering. Section 3 gives a brief introduction to genetic algorithms. Section 4 contains an overview of existing GA based clustering techniques. Section 5 presents some applications of GA based clustering. Finally, we draw some conclusions in Section 6.
2. Clustering
Let us describe the clustering problem formally. Assume that S is the given data set:
$S = \{x_1, \ldots, x_N\}$, where $x_i \in \mathbb{R}^n$. The goal of clustering is to find $K$ clusters $C_1, C_2, \ldots, C_K$ such that
$$C_i \neq \emptyset \quad \text{for } i = 1, \ldots, K \qquad (1)$$
$$C_i \cap C_j = \emptyset \quad \text{for } i, j = 1, \ldots, K;\ i \neq j \qquad (2)$$
$$\bigcup_{i=1}^{K} C_i = S \qquad (3)$$
and the objects belonging to the same cluster are similar in the sense of the given metric, while the objects belonging to different clusters are dissimilar in the same sense. In other words, we seek a function $f : S \to \{1, \ldots, K\}$ such that for $i = 1, \ldots, K$: $C_i = f^{-1}(i)$, where the $C_i$ satisfy the above conditions. The Euclidean metric can be used to measure the cluster quality. Then the function $f$ is sought such that
$$f = \arg\min_f E_{vq}(c_1, \ldots, c_K) = \arg\min_f \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - c_{f(x_i)} \rVert^2 \qquad (4)$$
where
$$c_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i, \quad k = 1, \ldots, K \qquad (5)$$
Therefore, instead of searching for the function $f$ directly, one can search for the centers of the clusters, i.e. the vectors $c_1, \ldots, c_K$, and implement the function $f$ as
$$f(x) = \arg\min_i \lVert x - c_i \rVert \qquad (6)$$
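As an illustration of equations (4) to (6), the following is a minimal Python sketch rather than an implementation taken from [2]. It assumes that the objective in (4) is the mean of the squared Euclidean distances; the function names and the toy data are hypothetical.

```python
import math

def nearest_center(x, centers):
    """Equation (6): index of the centre closest to x (Euclidean metric)."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))

def quantization_error(points, centers):
    """Equation (4), assumed here to be the mean squared distance
    of each point to its nearest centre."""
    total = 0.0
    for x in points:
        c = centers[nearest_center(x, centers)]
        total += math.dist(x, c) ** 2
    return total / len(points)

def centroids(points, labels, k):
    """Equation (5): mean of the points assigned to each cluster."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += x[d]
    # An empty cluster violates condition (1); this sketch raises instead of guessing.
    if 0 in counts:
        raise ValueError("empty cluster")
    return [[s / counts[j] for s in sums[j]] for j in range(k)]

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
labels = [nearest_center(x, centers) for x in points]
error = quantization_error(points, centers)
```

Searching over the centres and deriving the assignment from them, as the sketch does, keeps the search space continuous and of fixed size K times n.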
978-0-7695-3267-7/08 $25.00 2008 IEEE DOI 10.1109/ICETET.2008.48
Equation (6) assigns each point to the cluster corresponding to the nearest centre [2].
Different starting points and criteria usually lead to different taxonomies of clustering algorithms. A rough but widely agreed framework classifies clustering techniques as hierarchical or partitional, based on the properties of the clusters generated. Hierarchical clustering groups data objects with a sequence of partitions, either from singleton clusters to a cluster including all individuals or vice versa, while partitional clustering directly divides data objects into some prespecified number of clusters without a hierarchical structure [3], [4].
3. Genetic algorithm
Genetic algorithms (GAs) are search and optimization procedures motivated by the principles of natural selection and natural genetics [3]. Some fundamental ideas of genetics are borrowed and used artificially to construct search algorithms that are robust and require minimal problem information. In GAs, the roles of the selection and recombination operators are well defined: the selection operator controls the direction of the search, and the recombination operator generates new regions to search. Genetic algorithms have a large amount of implicit parallelism. GAs search complex, large and multimodal landscapes and provide near-optimal solutions for the objective or fitness function of an optimization problem.
In GAs, the parameters of the search space are encoded in the form of strings (called chromosomes), and a collection of such strings is called a population. Initially, a random population is created, which represents different points in the search space. An objective and fitness function is associated with each string and represents the degree of goodness of the string. Based on the principle of survival of the fittest, a few of the strings are selected and each is assigned a number of copies that go into the mating pool. Biologically inspired operators such as crossover and mutation are applied to these strings to yield a new generation of strings. The process of selection, crossover and mutation continues for a fixed number of generations or until a termination condition is satisfied [1].
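The selection, crossover and mutation cycle described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular cited paper: chromosomes are lists of cluster centres, selection is fitness-proportional, crossover is single-point, mutation adds Gaussian noise, and the best chromosome seen so far is remembered. All function names, parameter values and the fitness definition are hypothetical.

```python
import math
import random

def fitness(chromosome, points):
    """Higher is better: inverse of the summed distance to the nearest centre."""
    total = sum(min(math.dist(x, c) for c in chromosome) for x in points)
    return 1.0 / (1.0 + total)

def roulette_select(population, scores, rng):
    """Fitness-proportional selection of one chromosome for the mating pool."""
    return rng.choices(population, weights=scores, k=1)[0]

def crossover(a, b, rng):
    """Single-point crossover on the list of centres."""
    if len(a) < 2:
        return list(a), list(b)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chromosome, rate, scale, rng):
    """Perturb each coordinate with probability `rate` using Gaussian noise."""
    return [
        tuple(v + rng.gauss(0.0, scale) if rng.random() < rate else v for v in c)
        for c in chromosome
    ]

def run_ga(points, k, pop_size=20, generations=50, rate=0.1, scale=0.5, seed=0):
    rng = random.Random(seed)
    # Initial population: each chromosome is k centres drawn from the data.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    best = max(population, key=lambda ch: fitness(ch, points))
    for _ in range(generations):  # termination: fixed number of generations
        scores = [fitness(ch, points) for ch in population]
        children = []
        while len(children) < pop_size:
            p1 = roulette_select(population, scores, rng)
            p2 = roulette_select(population, scores, rng)
            c1, c2 = crossover(p1, p2, rng)
            children.append(mutate(c1, rate, scale, rng))
            children.append(mutate(c2, rate, scale, rng))
        population = children[:pop_size]
        # Remember the fittest chromosome seen so far (an assumption of this sketch).
        candidate = max(population, key=lambda ch: fitness(ch, points))
        if fitness(candidate, points) > fitness(best, points):
            best = candidate
    return best

# Hypothetical toy data and call.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
best = run_ga(points, k=2)
```

Fixing the random seed makes runs repeatable, which is convenient when comparing operator choices.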
4. Overview of GA based clustering algorithms
Cluster analysis is a technique used to discover patterns and associations within data. More specifically, it is a multivariate statistical procedure that starts with a data set containing information on some variables and attempts to reorganize the data cases into relatively homogeneous groups. One of the major problems researchers encounter with cluster analysis is that different clustering methods can and do generate different solutions for the same data set. What is needed is a technique that discovers the most 'natural' groups in a data set. The research effort by R. Krovi investigated the feasibility of using genetic algorithms for clustering [5].
A hybrid genetic algorithm proposed by K. Krishna and M. N. Murty finds a globally optimal partition of given data into a specified number of clusters [6]. This hybrid GA circumvents expensive crossover operations by using a classical gradient descent algorithm used in clustering, namely the K-means algorithm. In the genetic K-means algorithm (GKA), a K-means operator is defined and used as a search operator instead of crossover. GKA also defines a biased mutation operator specific to clustering, called distance-based mutation. Using finite Markov chain theory, it was proved that GKA converges to the global optimum. GKA searches faster than some of the other evolutionary algorithms used for clustering. One of the important problems in partitional clustering is to find a partition of the given data, with a specified number of clusters, that minimizes the total within-cluster variation (TWCV). GKA addresses this minimization problem [6].
Fast Genetic K-means Algorithm (FGKA) [7] was inspired by GKA but features several improvements over it. Experiments indicate that, while the K-means algorithm might converge to a local optimum, both FGKA and GKA always converge to the global optimum eventually, and FGKA runs much faster than GKA. FGKA starts with an initialization phase, which generates the initial population P0. The population of the next generation, Pi+1, is obtained by applying the following genetic operators sequentially to the current population Pi: selection, mutation and the K-means operator. The evolution continues until the termination condition is reached. The initialization phase randomly generates the initial population P0 of Z solutions, which may contain illegal strings. Illegal strings are permitted in FGKA, but they are treated as the most undesirable solutions by defining their TWCV as $+\infty$ and assigning them lower fitness values. Allowing illegal strings in the evolution process avoids the overhead of illegal string elimination incurred in [6], and thus improves the time performance of the algorithm.
Incremental Genetic K-means Algorithm (IGKA) [8] is an extension of FGKA. IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to calculate the objective value, TWCV, and the cluster centroids incrementally whenever the mutation probability is small. IGKA inherits the salient feature of FGKA of always converging to the global optimum.
The GA-clustering technique proposed in [9] uses the search capability of GAs to determine a fixed number K of cluster centers in $\mathbb{R}^N$, thereby suitably clustering the set of n unlabelled points.
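The TWCV objective minimized by GKA, FGKA and IGKA, together with FGKA's convention of giving illegal strings a TWCV of $+\infty$, can be sketched as below. The sketch assumes that a chromosome is a list holding one cluster label per data point and that a string is illegal when some cluster is empty; the survey does not spell out either detail, so both are assumptions, as are all function names. The K-means operator shown is a single reassignment step and is only an approximation of the operators defined in [6] and [7].

```python
import math

def cluster_centroids(points, labels, k):
    """Centroid of each cluster, or None if some cluster is empty (illegal string)."""
    dim = len(points[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += x[d]
    if 0 in counts:
        return None
    return [[s / counts[j] for s in sums[j]] for j in range(k)]

def twcv(points, labels, k):
    """Total within-cluster variation; +inf marks an illegal string."""
    cents = cluster_centroids(points, labels, k)
    if cents is None:
        return math.inf
    return sum(math.dist(x, cents[lab]) ** 2 for x, lab in zip(points, labels))

def kmeans_operator(points, labels, k):
    """One K-means step: recompute centroids, then reassign each point
    to its nearest centroid. Illegal strings are returned unchanged here."""
    cents = cluster_centroids(points, labels, k)
    if cents is None:
        return list(labels)
    return [min(range(k), key=lambda j: math.dist(x, cents[j])) for x in points]

# Hypothetical toy data.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
improved = kmeans_operator(points, [0, 0, 0, 1], 2)
```

Because an illegal string simply scores $+\infty$, no separate repair or elimination pass is needed before selection.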
The clustering metric adopted in [9] is the sum of the Euclidean distances of the points from their respective cluster centers. The chromosomes, represented as strings of real numbers, encode the centers of a fixed number of clusters. Under limiting conditions, a GA-based clustering technique is also expected to provide an optimal clustering with respect to the clustering metric being considered [9].
GGA (Genetically Guided Algorithm) [10] is a genetically guided approach to optimizing the fuzzy and hard c-means (FCM and HCM) functionals, Jm and J1, which are used as fitness functions. On data sets with several local extrema, the GA approach always avoids the less desirable solutions. Degenerate partitions are always avoided by the GA approach, which provides an effective method for optimizing clustering models whose objective function can be represented in terms of cluster centers.
In [11] the authors proposed a GA-based unsupervised clustering technique that selects cluster centers directly from the data set. This allows it to speed up the fitness evaluation by constructing in advance a look-up table that stores the distances between all pairs of data points, and by using a binary representation rather than a string representation to encode a variable number of cluster centers. More effective versions of the operators for reproduction, crossover and mutation were introduced. Finally, the Davies-Bouldin index [12], [13] was employed to measure the validity of clusters. The algorithm showed a more stable clustering performance.
A variable string length genetic algorithm was used to develop a novel nonparametric clustering technique for the case where the number of clusters is not fixed a priori. Chromosomes in the same population are encoded as real numbers and may have different lengths, since they encode different numbers of clusters. The crossover operator was redefined to handle variable string lengths. A cluster validity index was used as the measure of fitness of a chromosome. The performance of several cluster validity indices, namely the Davies-Bouldin (DB) index, Dunn's index, two of its generalized versions and a recently developed index, in appropriately partitioning a data set was compared in [14] (see also [12], [13]).
Bandyopadhyay S. and Maulik U. exploited the search capability of genetic algorithms to evolve automatically the number of clusters as well as a proper clustering of any data set. A new string representation, comprising both real numbers and the "do not care" symbol, was used to encode a variable number of clusters. The Davies-Bouldin index was used as a measure of the validity of the clusters. The effectiveness and utility of the genetic clustering scheme were demonstrated on a satellite image of a part of the city of Calcutta, where the technique was able to distinguish some characteristic land cover types [15].
Clustering Genetic Algorithm (CGA) was proposed in [2]. It outperforms the k-means algorithm on some tasks. In addition, it is capable of optimizing the number of clusters for tasks with well-formed and well-separated clusters. The framework is the same as in a genetic algorithm, while the individual building blocks of the algorithm were modified and adapted for the clustering task.
In the field of data mining, it is often necessary to perform cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data rather than mixed data sets. LI Jie and G. Xinbo [16] presented a novel clustering algorithm for mixed data sets by modifying the common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm was used to optimize the new cost function and obtain a valid clustering result.
A hybrid genetic based clustering algorithm, called HGA-clustering, was proposed in [17] to explore the proper clustering of data sets. With the cooperation of a tabu list and aspiration criteria, this algorithm achieves a balance between population diversity and convergence speed.
The K-modes algorithm was developed for clustering categorical objects by extending the k-means algorithm. A genetic algorithm was proposed for designing the dissimilarity measure, termed Genetic Distance Measure (GDM), such that the performance of the K-modes algorithm is improved [18].
A semi-supervised clustering algorithm that combines the benefits of supervised and unsupervised learning methods was proposed in [19]. The approach allows unlabeled data with no known class to be used to improve classification accuracy. The objective function of an unsupervised technique, e.g. K-means clustering, is modified to minimize both the cluster dispersion of the input attributes and a measure of cluster impurity based on the class labels. A genetic algorithm optimizes the objective function to produce clusters.
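A combined objective of the kind described for semi-supervised clustering, cluster dispersion plus a class-label impurity term, might look like the sketch below. The survey does not state which impurity measure or weighting [19] uses, so the Gini impurity, the weight `beta` and the convention that `None` marks an unlabelled point are all assumptions made for illustration.

```python
import math
from collections import Counter

def dispersion(points, assign, centers):
    """Mean squared distance of each point to its assigned centre."""
    total = sum(math.dist(x, centers[a]) ** 2 for x, a in zip(points, assign))
    return total / len(points)

def gini_impurity(assign, classes, k):
    """Size-weighted Gini impurity over labelled points; None marks unlabelled data."""
    labelled = [(a, c) for a, c in zip(assign, classes) if c is not None]
    if not labelled:
        return 0.0
    total = 0.0
    for j in range(k):
        members = [c for a, c in labelled if a == j]
        if members:
            counts = Counter(members)
            gini = 1.0 - sum((n / len(members)) ** 2 for n in counts.values())
            total += len(members) / len(labelled) * gini
    return total

def semi_supervised_objective(points, assign, centers, classes, beta=1.0):
    """Quantity a GA would minimise: dispersion plus beta times impurity."""
    k = len(centers)
    return dispersion(points, assign, centers) + beta * gini_impurity(assign, classes, k)

# Hypothetical toy data: the second point has no known class.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
assign = [0, 0, 1, 1]
pure = semi_supervised_objective(points, assign, centers, ["a", None, "b", "b"])
mixed = semi_supervised_objective(points, assign, centers, ["a", "b", "b", "b"])
```

Unlabelled points still contribute to the dispersion term, which is how they can influence the final clusters.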
The K-means Fast Learning Artificial Neural Network (KFLANN) in [20] is a small neural network with two types of parameters, the tolerance and the vigilance. A GA was introduced into KFLANN as a possible way to search the parameter space and to extract suitable values for the tolerance and the vigilance effectively and efficiently. It was also able to determine significant factors that help achieve accurate clustering.
The SPMD (Single Program Multiple Data) algorithm presented in [21] combines a GA with the local search algorithm uphill. The hybrid parallel method not only improves the convergence of the GA but also accelerates its convergence speed. The SPMD algorithm exploits the parallelism of the GA and, at the same time, overcomes its premature and poor convergence properties. The algorithm was applied to typical functions with multiple local minima, the TSP and an engineering computation problem, QCBED, on the authors' own cluster system, THNPSC-1.
Genetic Weighted K-means Algorithm (GWKMA), a hybridization of a genetic algorithm (GA) and a weighted K-means algorithm (WKMA), was proposed by Fang-Xiang Wu et al. [22]. GWKMA encodes each individual by a partitioning table that uniquely determines a clustering, and employs three genetic operators (selection, crossover, mutation) and a WKMA operator. The superiority of GWKMA over WKMA and over other GA-clustering algorithms without the WKMA operator was demonstrated [22].
In [23] the aim was to facilitate the application of user-defined constraints to genetic clustering algorithms. This was achieved by presenting a general penalty function, defined as a normal distribution. The function was added to DAGC, an extensible environment for assembling genetic clustering algorithms. The main idea behind the design of DAGC is to provide researchers with an environment in which to develop and investigate genetic clustering algorithms by selecting building blocks from an extensible library. It also provides users with templates for building their own building blocks. The new version of DAGC is equipped with interfaces for defining new constraints or applying existing ones [23].
Fuzzy clustering, which is one category of clustering method, assigns one sample to multiple clusters according to their degrees of membership. It is more appropriate than hard clustering for analyzing gene expression profiles because a single gene might be involved in multiple genetic functions. Generic clustering methods, however, have the inherent problems that they are sensitive to initialization and can be trapped in local optima. To solve these problems, an evolutionary fuzzy clustering method was proposed that uses a genetic algorithm for clustering and Bayesian validation for evaluation [31].
A hybrid GA-based clustering (HGACLUS) [25] scheme, combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This scheme maximizes clustering success by achieving internal cluster cohesion and external cluster isolation.
The performance of HGACLUS and other methods was compared using simulated data and open microarray gene-expression datasets. HGACLUS was generally found to be more accurate and robust than the other methods, according to the exact validation strategy and the explicit cluster number.
In [26] the authors present data clustering using an improved genetic algorithm (IGA) in which efficient methods of crossover and mutation are implemented. It is further hybridized with the popular Nelder-Mead (NM) simplex search and K-means to exploit the potential of both in the hybrid algorithm. The performance of the hybrid approach was evaluated on a few data clustering problems.
An emotion-based image retrieval method using an interactive genetic algorithm was proposed in [27]. It searches for the goal with a small population size and fewer generations than a conventional genetic algorithm, in order to reduce the user's burden. A sparse fitness evaluation method using clustering and fitness allocation was suggested. This keeps the advantages of the interactive GA and also improves performance by utilizing a large population [27].
Paper [28] describes a novel method for the supervised training of regression systems that can be an alternative to feedforward Artificial Neural Networks (ANNs) trained with the backpropagation algorithm. The proposed methodology is a hybrid structure based on supervised clustering with genetic algorithms and local learning. Supervised Scaled Regression Clustering with Genetic Algorithms (SSRCGA) offers certain advantages related to robustness, generalization performance, feature selection, explanatory behavior, and the additional flexibility of defining the fitness function and the regularization constraints. Computational results of SSRCGA were compared with backpropagation-trained ANNs [28].
S. Saha, S. Bandyopadhyay and U. Maulik proposed a new type of point symmetry based distance [29]. GASDCA (GA with point Symmetry Distance based Clustering Algorithm) is able to detect both convex and non-convex clusters. A Kd-tree based nearest neighbor search is used to reduce the complexity of finding the closest symmetric point. The proposed GASDCA was compared with the existing symmetry based clustering technique SBKM, its modified version Mod-SBKM and the well-known K-means algorithm [29].
Clustering techniques have been a valuable tool for several data analysis applications. However, one of the main difficulties associated with clustering is the validation of the results obtained. Both clustering algorithms and validation criteria have an inductive bias, which can favor datasets with particular characteristics. Besides, different runs of the same algorithm on the same data set may produce different clusters. In [30], traditional clustering and validation techniques were combined with a genetic algorithm to build clusters that better approximate the real distribution of the dataset. The GA employs a fitness function that combines two validation criteria. This combination allows the GA to improve the evaluation of the candidate solutions. Furthermore, the combined approach avoids the individual weaknesses of each criterion [30].
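Several techniques in this section score a chromosome with a cluster validity index, the Davies-Bouldin (DB) index [12] in particular. The sketch below computes the DB index in its usual form, using the mean distance to the centre as the within-cluster scatter; lower values indicate more compact and better separated clusters. How each cited paper converts the index into a fitness value is not stated in this survey, so the reciprocal used at the end is only one plausible, assumed choice. The function name and toy data are hypothetical.

```python
import math

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index for a hard partition with at least two
    non-empty clusters and distinct centres."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [x for x, lab in zip(points, labels) if lab == j]
        if not members:
            raise ValueError("empty cluster")
        # Within-cluster scatter: mean distance of members to their centre.
        scatter.append(sum(math.dist(x, centers[j]) for x in members) / len(members))
    total = 0.0
    for i in range(k):
        # Worst-case ratio of combined scatter to centre separation.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k

# Hypothetical toy data: a natural split and a poor split of the same points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])
poor = davies_bouldin(points, [0, 1, 0, 1], [(5.0, 0.0), (5.0, 2.0)])
fitness_good = 1.0 / good  # assumed fitness: reciprocal of the index
```

Because the index does not require the number of clusters to be fixed, it can compare chromosomes that encode different numbers of clusters.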
5. Applications of GA based clustering

The previous section surveyed various GA based clustering algorithms. This section discusses a few applications of GA based clustering algorithms.
Paper [24] applies a genetic algorithm to production simulation. The simulation is treated as a detailed, stochastic, multi-modal function that describes a performance statistic, and the author tries to optimize (or at least improve) the performance of the system. A model of a real-world production line for printed circuit boards, which has many products and must often be retooled or reconfigured, was used. Since the product line is always changing, with half of the products turning over within a year, the job of configuring and fine-tuning the production line is never ending. The paper shows that a genetic algorithm attached to the simulation model can provide excellent support for this process. This combination can be used to obtain quick and stable results that indicate the direction towards improved production [24].
In microarray data analysis, clustering is a method that groups thousands of genes by the similarity of their expression levels, helping to analyze gene expression profiles. This method has been used for identifying unknown functions of genes. The fuzzy clustering method assigns one sample to multiple groups according to their degrees of membership. It is more appropriate for analyzing gene expression profiles, because a single gene might be involved in multiple functions. An evolutionary fuzzy clustering method with knowledge-based evaluation was proposed in [32].
In [33], image segmentation was formulated as an optimization problem, and the evolutionary strategy of genetic algorithms was adopted for clustering small regions in colour feature space. The approach incorporates k-means unsupervised clustering into the genetic algorithm to guide the evolutionary search for the optimal or sub-optimal data partition, a task that requires a non-trivial search because of its intrinsic NP-complete nature. The appropriate genetic coding for this task was also discussed [33].
Image compression using genetic clustering algorithms based on the pixels of the image was addressed in [34]. A GA was used to obtain an ordered representation of the image, and clustering was then performed to obtain the compression.
To solve the problem of error estimation on real-life data sets, the optimization technique of the genetic algorithm was applied to a new adaptive cluster validity index, called the Gene Index (GI).
The algorithm applies a GA to adjust the weighting factors of the adaptive cluster validity index in order to train an optimal cluster validity index [34].
Paper [35] presents a genetic algorithm for document clustering. This algorithm calculates an approximation of the optimum value of k and finds the best grouping of the documents into these k clusters. It was evaluated on sets of documents returned by a query to a search engine.
The modified variable string length genetic algorithm (MVGA) was proposed in [36] for text clustering. This algorithm automatically evolves the optimal number of clusters and provides a proper clustering of the data set. The chromosome is encoded as a string of real numbers with special indices that indicate the location of each gene. More effective versions of the operators for selection, crossover and mutation were introduced in MVGA, which can also automatically adjust the balance between population diversity and selective pressure over the generations.
A GA was applied to enhance the performance of clustering algorithms in mobile ad hoc networks. The authors optimized the recently proposed weighted clustering algorithm (WCA). In the proposed technique, each clusterhead handles the maximum possible number of mobile nodes in its cluster in order to facilitate the optimal operation of the medium access control (MAC) protocol. Consequently, it results in the minimum number of clusters and hence of clusterheads [37].
Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not straightforward and is a hard algorithmic problem. Therefore, an approach to the automatic clustering of the Gene Ontology was proposed that incorporates a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm [38].
To solve the task clustering problem on an unbounded number of processors, a new genetic algorithm approach called the Genetic Convex Clustering Algorithm (GCCA) was proposed. The main idea is to assign tasks to locations in convex groups. Arbitrary execution times were considered, and a novel crossover operator for the task clustering problem was developed [39].
A genetic algorithm with species and sexual selection (GAS3) was proposed to solve unimodal and multi-modal functions of various difficulties [40]. GAS3 uses a sex determination method to determine the sex (male or female) of the members of the population. Each female member is considered a niche in the population, and species (cluster) formation takes place around these niches. Species formation is based on the Euclidean distance between female and male members. Each species contains one female member and zero or more male members. A parameter-less cluster formation technique is used. Clusters are merged based on a performance evaluation criterion.
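The species formation step described for GAS3 can be illustrated with a short sketch. It assumes that the sex of each member has already been determined and that each male joins the species of its nearest female in Euclidean distance; the survey only says that formation is based on this distance, so the nearest-female rule, like the function name and the toy data, is an assumption.

```python
import math

def form_species(females, males):
    """Group each male with its nearest female (Euclidean distance).

    Returns a dict mapping the index of each female to the list of males
    in her species, so every species has one female and zero or more males.
    """
    species = {i: [] for i in range(len(females))}
    for m in males:
        nearest = min(range(len(females)), key=lambda i: math.dist(m, females[i]))
        species[nearest].append(m)
    return species

# Hypothetical population members in a two-dimensional search space.
females = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
males = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]
species = form_species(females, males)
```

No cluster count or radius has to be supplied, which matches the parameter-less character of the formation step.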
6. Conclusion
The capability of GAs has been applied to evolving the proper number of clusters and providing an appropriate clustering. Many GA based clustering algorithms have been studied in this survey. Some have been applied to small data sets and some to large data sets. GA based clustering techniques can be used in many application areas, such as production simulation, image segmentation, document clustering, image compression, gene expression analysis and text clustering. GAs have mostly been applied to clustering algorithms such as K-means and fuzzy c-means, which are distance based. GAs are yet to be applied to other clustering algorithms.
References
[1] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, New York, 1989.
[2] Petra Kudova, Clustering Genetic Algorithm, IEEE, DOI 10.1109/DEXA.2007.65, 2007.
[3] Rui Xu and Donald Wunsch, Survey of Clustering Algorithms, IEEE Transactions on Neural Networks, Vol. 16, No. 3, May 2005.
[4] Jiawei Han and M. Kamber, Data Mining: Concepts and Techniques, second edition, Elsevier.
[5] R. Krovi, Genetic algorithms for clustering: a preliminary investigation, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, Vol. iv, pp. 540-544, 7-10 Jan 1992.
[6] K. Krishna and M. N. Murty, Genetic K-Means Algorithm, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 29, No. 3, June 1999.
[7] Yi Lu, Shiyong Lu, Farshad Fotouhi, FGKA: A Fast Genetic K-means Clustering Algorithm, SAC'04, Nicosia, Cyprus, March 2004, ACM 1-58113-812-1/03/04.
[8] Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, An Incremental Genetic K-means Algorithm and its Application in Gene Expression Data Analysis, BMC Bioinformatics, 2004.
[9] U. Maulik, S. Bandyopadhyay, Genetic algorithm-based clustering technique, Pattern Recognition 33, 2000.
[10] L. O. Hall, I. B. Ozyurt and J. C. Bezdek, Clustering with a genetically optimized approach, IEEE Trans. on Evolutionary Computation, 1999.
[11] H. J. Lin, F. W. Yang and Y. T. Kao, An Efficient GA-based Clustering Technique, Tamkang Journal of Science and Engineering, Vol. 8, No. 2, pp. 113-122, 2005.
[12] D. L. Davies and D. W. Bouldin, A Cluster Separation Measure, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 1, pp. 224-227, 1979.
[13] J. C. Bezdek, Some New Indexes of Cluster Validity, IEEE Trans. Systems, Man, and Cybernetics - Part B:
[14] S. Bandyopadhyay and U. Maulik, Nonparametric Genetic Clustering: Comparison of Validity Indices, IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 31, pp. 120-125, 2001.
[15] S. Bandyopadhyay and U. Maulik, Genetic Clustering for Automatic Evolution of Clusters and Application to Image Classification, Pattern Recognition, Vol. 35, pp. 1197-1208, 2002.
[16] LI Jie, G. Xinbo, A GA-Based Clustering Algorithm for Large Data Sets With Mixed Numeric and Categorical Values, IEEE, Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'03), 0-7695-1957-1/03, 2003.
[17] Y. Liu, Kefe and X. Liz, A Hybrid Genetic Based Clustering Algorithm, Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[18] S. Chiang, S. C. Chu, Y. C. Hsin and M. H. Wang, Genetic Distance Measure for K-Modes Algorithm, International Journal of Innovative Computing, Information and Control (ICIC), ISSN 1349-4198, Volume 2, Number 1, pp. 33-40, February 2006.
[19] Ayhan D., K. P. Bennett, M. J. Embrechts, Semi-Supervised Clustering Using Genetic Algorithms.
[20] Yin Xiang, Alex Tay Leng Phuan, Genetic Algorithm Based K-Means Fast Learning Artificial Neural Network, Nanyang Technological University.
[21] Zhihui D., Meng D., Sanli Li, Shuyou Li, Mengyue Wu and Jing Zhu, Massively Parallel SPMD Algorithm for Cluster Computing: Combining Genetic Algorithm with Uphill.
[22] Fang-Xiang Wu, Anthony J. Kusalik and W. J. Zhang, Genetic Weighted K-means for Large-Scale Clustering Problems, University of Saskatchewan, Canada.
[23] Omid Bushehrian, Saeed Parsa, Genetic Clustering with Constraints, Journal of Research and Practice in Information Technology, Vol. 39, No. 1, February 2007.
[24] Robert Entriken, Genetic Algorithms With Cluster Analysis For Production Simulation, Proceedings of the 1997 Winter Simulation Conference, 1997.
[25] H. Pan, J. Zhu, Danfu Han, Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data, Geno., Prot. & Bioinfo., Vol. 1, No. 4, November 2003.
[26] V. Katari, S. C. Satapathy, JVR Murthy, P. Reddy, A Hybridized Improved Genetic Algorithm with Variable Length Chromosome for Image Clustering, International Journal of Computer Science and Network Security, Vol. 7, No. 11, November 2007.
[27] Joo-Young Lee and Sung-Bae Cho, Sparse Fitness Evaluation for Reducing User Burden in Interactive Genetic Algorithm, IEEE International Fuzzy Systems Conference Proceedings, August 22-25, 1999, Seoul, Korea.
[28] Mark J. Embrechts, Dirk Devogelaere, Supervised Scaled Regression Clustering: an Alternative to Neural Networks, 0-7695-0619-4/00, IEEE, 2000.
[29] S. Saha, S. Bandyopadhyay, U. Maulik, A New Symmetry-Based Genetic Clustering Algorithm, Machine Intelligence Unit, Indian Statistical Institute, India.
[30] M. C. Naldi and Andre C. P. L. F. de Carvalho, Clustering Using A Genetic Algorithm Combining Validation Criteria, ESANN'2007 proceedings - European Symposium on Artificial Neural Networks, Bruges (Belgium), 25-27 April 2007.
[31] Han-Saem Park and Sung-Bae Cho, Evolutionary Fuzzy Clustering for Gene Expression Profile Analysis, SCIS&ISIS2006, Tokyo, Japan, September 20-24, 2006.
[32] Han-Saem Park, Si-Ho Yoo, and Sung-Bae Cho, Evolutionary Fuzzy Clustering Algorithm with Knowledge-Based Evaluation and Applications for Gene Expression, Journal of Computational and Theoretical Nanoscience, Vol. 2, 110, 2005.
[33] Vitorino R., Fernando M., Image Colour Segmentation by Genetic Algorithms, CVRM - Centro de Geosistemas, Instituto Superior Técnico, Av. Rovisco Pais, Lisboa, Portugal.
[34] Merlo, Caram, Fernández, Britos, Rossi and García-Martínez R., Genetic-Algorithm Based Image Compression, SBAI - Simpósio Brasileiro de Automação Inteligente, São Paulo, SP, 08-10 de Setembro de 1999.
[35] A. Casillas, M. T. Gonzalez de Lena, and R. Martínez, Document Clustering into an unknown number of clusters using a Genetic Algorithm.
[36] Wei S. and Soon C. P., Genetic Algorithm-based Text Clustering Technique: Automatic Evolution of Clusters with High Efficiency, Proceedings of the Seventh International Conference on Web-Age Information Management Workshops (WAIMW'06), 0-7695-2705-1/06, IEEE, 2006.
[37] Damla Turgut, Sajal K. Das, Ramez Elmasri and Begumhan, Optimizing Clustering Algorithm in Mobile Ad hoc Networks Using Genetic Algorithmic Approach.
[38] Razib M. O., Safaai D., Rosli M. I., Zalmiyah Z. and Saberi M. M., Automatic Clustering of Gene Ontology by Genetic Algorithm, International Journal of Information Technology, Volume 3, Number 1, December 18, 2005.
[39] J. E. P. Sanchez, D. Trystram, A New Genetic Convex Clustering Algorithm for Parallel Time Minimization with Large Communication Delays, Parallel Computing: Current & Future Issues of High-End Computing, Proceedings of the International Conference ParCo 2005.
[40] M. M. Raghuwanshi and O. G. Kakde, Distributed Quasi Steady-State Genetic Algorithm with Niches and Species, International Journal of Computation and Intelligence Review (IJCIR), Vol. 3(2), pp. 155-164, 2006.