
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 58, NO. 7, JULY 2009

Application of Support Vector Machine and Genetic Algorithm for Improved Blood Cell Recognition
Stanislaw Osowski, Robert Siroić, Tomasz Markiewicz, and Krzysztof Siwek
Abstract: This paper presents the application of a genetic algorithm (GA) and a support vector machine (SVM) to the recognition of blood cells based on the image of the bone marrow aspirate. The main task of the GA is the selection of the features used by the SVM in the final recognition and classification of cells. The automatic recognition system has been developed, and the results of its numerical verification are presented and discussed. They show that the application of the GA is a powerful tool for the selection of the diagnostic features, leading to a significant improvement of the accuracy of the whole system.

Index Terms: Blood cell recognition, genetic algorithm (GA), support vector machine (SVM).

I. INTRODUCTION

The relative counting and assessment of the blood cells in the bone marrow of patients are very informative in clinical practice. They are particularly important for patients suffering from leukemia, both for observing the development stage of the illness and for planning the treatment [1], [2]. To achieve a proper diagnosis of the disease, we have to recognize the cells at different stages of their development and calculate their relative quantity in the aspirated bone marrow. There are different cell lines in the bone marrow, the most important of which are the granulocytic and lymphocytic (white blood cell) and erythrocytic (red blood cell) series [1], [2]. The blood cells in the human bone marrow are continuously developing, transforming themselves from one type to another within the same development line. In the development of the white blood cells, the specialists recognize the myeloblast, promyelocyte, myelocyte, metamyelocyte, band neutrophil, and segmented neutrophil. In the case of the erythrocytic line, three different stages are recognized: 1) the basophilic erythroblast; 2) the polychromatic erythroblast; and 3) the pyknotic erythroblast.

Manuscript received October 26, 2007; revised January 31, 2008. First published November 11, 2008; current version published June 10, 2009. This work was supported by the Polish Ministry of Science and Higher Education. The Associate Editor coordinating the review process for this paper was Dr. Dario Petri. S. Osowski is with the Institute of the Theory of Electrical Engineering, Measurement and Information Systems, Warsaw University of Technology, 00-661 Warsaw, Poland, and also with the Military University of Technology, 00-908 Warsaw, Poland (e-mail: sto@iem.pw.edu.pl). R. Siroić and K. Siwek are with the Warsaw University of Technology, 00-661 Warsaw, Poland. T. Markiewicz is with the Warsaw University of Technology, 00-661 Warsaw, Poland, and also with the Military Institute of the Health Services, 00-909 Warsaw, Poland. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIM.2008.2006726

In the lymphocyte line, we recognize the prolymphocyte and lymphocyte cells. The most difficult problem is the recognition between two neighboring cells in their development line, since the cells are very similar and the border point between two neighbors is not well defined (even for specialists). There is a relatively small number of scientific papers devoted to this peculiar problem. Beksac et al. [3] have presented a system of blood cell recognition applying a multilayer perceptron (MLP), but the accuracy is far from good (39% of misclassifications) in the recognition of 16 types of cells. In addition, Theera-Umpon and Gader [4] have reported the recognition of six types of cells by using a multilayer perceptron with a misclassification rate of 42%. Recently, Osowski and Markiewicz [5] and Siroić et al. [6] have presented the first results of the recognition of 17 blood cell types, with an averaged recognition error, compared with that of a human expert, of 18.7%.

During the past years, some new semiautomatic blood cell counters have appeared on the market [7], [8], offering a differential count of segmented neutrophils, band neutrophils, monocytes, myelocytes, blast cells, metamyelocytes, erythroblasts (without differentiation of the types), and lymphocytes (without differentiation of the types). They apply the backpropagation neural network (MLP) as the recognizing and classifying tool. The reported accuracy (the discrepancy between the output of the system and the human expert score) for six types of granulocyte families is about 15%. However, the system does not have the capacity to classify immature or abnormal cells of all kinds with the accuracy needed in hospital practice [7]. In addition, there is a rigorous procedure for checking the quality of the images before applying this semiautomatic system. Therefore, it is treated as a subsidiary tool. Much work should be done to provide efficiency comparable to that of the human expert.

The most important task in pattern recognition is selecting the proper diagnostic features, which describe the image by numerical values and enable the automatic system to perform the recognition [9]. There are many ways in which the features can be generated for the image of the cells extracted from the bone marrow aspirate. Usually, we can generate many (over a hundred) features using texture, geometrical, and statistical analyses of the cell image. Not all of them are equally good and applicable in the recognition process. Some may assume similar values for different blood cells and thus possess no discriminative properties; such features may be treated as noise from the recognition point of view. Some are strongly correlated with the others and may dominate the whole set, decreasing its efficiency.



Fig. 1. Sample images of the bone marrow aspirate at 1000× magnification.

Thus, the important problem is finding the optimal set of the most important features, leading to the highest efficiency of the recognition. Many different approaches have been proposed for the selection of the most important features, including correlation analysis, analysis of the mean and variance of the data belonging to different classes, principal and independent component analysis, application of the linear support vector machine (SVM) [10], [11], etc. This paper is concerned with the selection of the best features using a genetic algorithm (GA) cooperating with the Gaussian kernel SVM classifier working in the one-against-one mode. It will be shown that the features selected by the GA provide an improved recognition rate and outperform the classical methods of feature selection.

II. PROBLEM DESCRIPTION

The main task of this work is the development of a system with the highest possible recognition rate of the blood cells based on the image of the bone marrow aspirate of the patient. The bone marrow aspirate usually contains many different cells that should be recognized and classified to enable medical experts to diagnose the patient. Two typical examples of the image of the bone marrow aspirate are shown in Fig. 1. It contains the immature blood cells in their development line (the large dark objects of different shapes and sizes containing the well visible nucleus and transparent cytoplasm), as well as slightly visible shadows (cells without a nucleus) and smudges. The cells under recognition have been denoted by numbers. Fig. 1(a) mainly shows myeloblasts (4) and one neutrophilic myelocyte (6). In Fig. 1(b), we can see a variety of cells, including three different types of erythroblasts (1, 2, and 3), neutrophilic myelocytes (6), a neutrophilic metamyelocyte (7), band neutrophils (8), segmented neutrophils (9), and lymphocytes (11).

The first step in the whole procedure is the extraction of the individual cells from the smear image. We have developed an automatic system of cell extraction based on morphological operations, such as erosion, dilation, closing, opening, smoothing, and the watershed transformation [12], [13]. All operations have been implemented in Matlab [14].
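The extraction stage itself was implemented by the authors in Matlab. Purely as an illustrative sketch of the same kind of processing (thresholding, morphological cleaning, and watershed splitting of touching cells), a Python version using SciPy and scikit-image might look as follows; the library choice, the Otsu threshold, and the structuring-element sizes are assumptions, not the paper's code.

```python
from scipy import ndimage as ndi
from skimage import color, filters, morphology, segmentation

def extract_cell_labels(rgb_image):
    """Rough sketch: threshold the stained cells, clean the mask with
    morphological opening/closing, and split touching cells by watershed."""
    gray = color.rgb2gray(rgb_image)
    # Stained cells are dark on a bright background: threshold and invert.
    mask = gray < filters.threshold_otsu(gray)
    mask = morphology.opening(mask, morphology.disk(3))   # remove small smudges
    mask = morphology.closing(mask, morphology.disk(3))   # fill small holes
    # Watershed on the distance transform separates touching cells.
    distance = ndi.distance_transform_edt(mask)
    markers, _ = ndi.label(distance > 0.5 * distance.max())
    return segmentation.watershed(-distance, markers, mask=mask)
```

Each nonzero label in the returned image would correspond to one extracted cell, which is then passed to the feature-generation stage described next.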

The image processing system analyzes the bone marrow image, performs all the morphological operations, and, as a result, produces the separated cell images that then undergo the next postprocessing stages, leading to the diagnostic features.

The choice of the numerical features describing the image is quite important. The features are numerical values that should be stable for all representatives of the same class but differ substantially for cells belonging to different classes. In generating the features, we have tried to follow the human expert by stressing the details on which they base the recognition. Such details include the size of the cells, shape, granulation, and the distribution and intensity of colors. We have elaborated preprocessing steps that extract features imitating or quantifying the details that the human expert takes into account. Additionally, these features should characterize the image in such a way as to suppress the differences within the same class and enhance the differences between cells belonging to different classes. In our approach to the problem, we have relied on features belonging to four main groups: 1) textural; 2) geometrical; 3) statistical; and 4) morphological [5].

The textural features reflect the statistical arrangement of the pixels in the image. The sum and difference histograms of the gray levels of neighboring pixels are analyzed at different directions, and based on such analysis, the mean value, angular second moment, contrast, and entropy are treated as the features. They are determined for the red, green, and blue colors, independently for the nucleus and the cytoplasm. We have applied the Unser method of textural features [15], generating them for the original image and for images of reduced resolution (with four- and eightfold reductions applied only to the nucleus). Up to 106 features have been generated this way.

The geometrical features describe different aspects of the geometry of the cell and use parameters describing the area, radius, perimeter, symmetry (the difference in length between lines perpendicular to the major axis, drawn to the cell boundary in both directions), compactness (perimeter²/area), concavity (the extent to which the actual boundary of a cell lies inside each chord between nonadjacent boundary points), the lengths of the major and minor axes, etc. Up to 19 geometrical features have been created this way. No color information is used here.
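As a concrete illustration of the Unser-style texture features mentioned above, the minimal sketch below computes the sum and difference histograms for a single displacement direction of one 8-bit channel and derives the mean, angular second moment, contrast, and entropy from them; the displacement directions, color channels, and resolution reductions actually used by the authors are not reproduced here.

```python
import numpy as np

def unser_features(channel, dx=1, dy=0, levels=256):
    """Sum/difference histogram texture features (Unser method) for one
    displacement (dx, dy) over a single integer-valued image channel."""
    a = channel[:channel.shape[0] - dy, :channel.shape[1] - dx].astype(int)
    b = channel[dy:, dx:].astype(int)
    s = a + b                        # sum image, values in 0 .. 2*(levels-1)
    d = a - b + (levels - 1)         # difference image shifted to be nonnegative
    hs = np.bincount(s.ravel(), minlength=2 * levels - 1) / s.size
    hd = np.bincount(d.ravel(), minlength=2 * levels - 1) / d.size
    i = np.arange(2 * levels - 1)
    mean = 0.5 * np.sum(i * hs)
    asm = np.sum(hs ** 2) * np.sum(hd ** 2)              # angular second moment
    contrast = np.sum((i - (levels - 1)) ** 2 * hd)
    entropy = -(np.sum(hs[hs > 0] * np.log(hs[hs > 0]))
                + np.sum(hd[hd > 0] * np.log(hd[hd > 0])))
    return {"mean": mean, "asm": asm, "contrast": contrast, "entropy": entropy}
```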


The statistical parameters refer to the color distribution contained in the cell image. We have created the features based on the histograms of the image matrix and of the gradient matrix of the image for the three color components: red, green, and blue. The mean value, variance, skewness, and kurtosis of both histograms are used as the features. Up to 27 features have been generated this way.

The last set of features refers to the results of morphological operations, such as erosion and dilation, performed a few times on the cell image. Such features include the relative size and number of cells before and after applying these morphological operations. Twelve features have been defined this way.

The described methods of feature generation produce a very rich group of parameters containing 164 features. However, they are of different discrimination abilities. Some of them are strongly correlated with the others; some perform like noise, reducing in this way the overall efficiency of the recognizing system. The important problem is thus the assessment of the quality of each feature and the selection of the best set of features. The result of this ranking is used for the elimination of the least important features.

III. SVM CLASSIFIER

The SVM is a powerful solution to classification problems. In this paper, it has been used for the recognition and classification of cells, but it can also be applied to feature selection [16]. The main advantage of the SVM network used as a classifier is its very good generalization ability and its extremely powerful learning procedure, which leads to the global minimum of the defined error function [16], [17]. In principle, the SVM is a linear machine working in the high-dimensional feature space formed by the nonlinear mapping of the N-dimensional input vector x into a K-dimensional feature space (K > N) through the use of the function \varphi(x). The separation of two classes is performed by the hyperplane defined in the form g(x) = w^T \varphi(x) + b = 0, where \varphi(x) = [\varphi_1(x), \varphi_2(x), \ldots, \varphi_K(x)]^T, w = [w_1, w_2, \ldots, w_K]^T is the weight vector of the network, and b is the bias. The learning of the SVM network working in the classification mode is aimed at the maximization of the separation margin between two classes, denoted here as d_1 = 1 and d_2 = -1. Mathematically, it corresponds to the minimization of the cost function \Phi(w, \xi) defined as [16]

\min \Phi(w, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{p} \xi_i    (1)

with the constraints

d_i [w^T \varphi(x_i) + b] \geq 1 - \xi_i    (2a)
\xi_i \geq 0    (2b)

for i = 1, 2, \ldots, p, where C > 0 is a user-specified constant representing the regularization coefficient, \xi_i is the nonnegative slack variable, and p is the number of given learning data pairs (x_i, d_i). The solution of this optimization problem is established by defining the Lagrangian function, finally leading to a quadratic programming problem with respect to the Lagrange multipliers \alpha_i [16], [17]. All operations in the learning and testing modes use the so-called kernel function, satisfying the Mercer conditions [17]. The kernel is a scalar function defined as the inner product of the vector functions \varphi(x_i) and \varphi(x), i.e., K(x_i, x) = \varphi^T(x_i)\varphi(x). The best known kernel functions are the radial Gaussian, polynomial, spline, and sigmoidal functions. The final learning problem of the SVM is transformed into the solution of the so-called dual problem defined with respect to the Lagrange multipliers \alpha [16], [17], i.e.,

\max_{\alpha} Q(\alpha) = \sum_{i=1}^{p} \alpha_i - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} \alpha_i \alpha_j d_i d_j K(x_i, x_j)    (3)

with the constraints (i = 1, 2, \ldots, p)

\sum_{i=1}^{p} \alpha_i d_i = 0    (4a)
0 \leq \alpha_i \leq C.    (4b)

The solution of the dual problem results in the optimal weight vector w_o = \sum_{i=1}^{p} \alpha_i d_i \varphi(x_i). Observe that, among the p learning data points, only a limited number N_{sv} (those for which the associated Lagrange multipliers are nonzero) take an active part in forming the solution. The output signal y(x) of the SVM after learning is described in the form [16], [17]

y(x) = \sum_{i=1}^{p} \alpha_i d_i K(x_i, x) + b.    (5)

Vector x represents one class when y(x) is positive and the alternative class when y(x) is negative.

The important point in designing the SVM classifier is the choice of the kernel function. The introductory experiments have shown that the best results are obtained with the Gaussian kernel, and this particular kernel has been applied in all the experiments. The hyperparameter \sigma of the Gaussian function and the regularization constant C have been adjusted by repeating the learning experiments for a set of predefined values (a grid search) and choosing the best values on the validation data sets. Their optimal values are those for which the classification error on the validation data set was the smallest. For normalized data, the optimal values of \sigma applied in the experiments varied from 0.5 to 1.1, whereas the regularization constant C was set equal to 100 for all class pairs. The SVM networks were trained using a modified Platt algorithm [18] implemented in Matlab [14]. To deal with the problem of many classes, we have applied the one-against-one approach [19]. In this method, many SVM networks are trained to discriminate between all combinations of two classes of data. In this way, for M classes, we have to train M(M - 1)/2 individual SVM networks. In the retrieval mode, vector x is assigned to the class with the highest number of winnings among all combinations of classes. The main advantage of the one-against-one technique is its efficiency in dealing with two-class recognition problems. At the same time, it provides the best balance between the sample numbers of the two classes under recognition, leading to better performance of the SVM classifier system and an improvement of the classification accuracy.
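The classifiers in this paper were trained in Matlab with a modified Platt algorithm. As a hedged sketch of the same scheme (Gaussian kernel, one-against-one multiclass handling, and a grid search of sigma and C evaluated on a fixed validation set), a scikit-learn version might look like the following; note that scikit-learn parameterizes the Gaussian kernel by gamma = 1/(2*sigma^2), and the grids shown here are placeholders rather than the values used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVC

def train_gaussian_svm(x_learn, y_learn, x_valid, y_valid):
    """Gaussian-kernel SVM (one-against-one multiclass decomposition) with
    sigma and C chosen by a grid search on a fixed validation set."""
    x = np.vstack([x_learn, x_valid])
    y = np.concatenate([y_learn, y_valid])
    # -1 marks samples always used for learning, 0 marks validation samples.
    fold = np.r_[-np.ones(len(y_learn)), np.zeros(len(y_valid))]
    sigmas = np.array([0.5, 0.7, 0.9, 1.1])              # placeholder sigma grid
    grid = {"C": [1, 10, 100, 1000], "gamma": 1.0 / (2.0 * sigmas ** 2)}
    search = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                          grid, cv=PredefinedSplit(fold))
    search.fit(x, y)
    return search.best_estimator_, search.best_params_
```

SVC performs the one-against-one decomposition internally, so the M(M - 1)/2 pairwise classifiers and their voting do not have to be coded by hand.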


Fig. 2. Feature ranking for the recognition of promyelocytes and myelocytes by applying (a) the discrimination coefficient S_{AB}(f) ranking and (b) the linear multiple-input SVM ranking.

IV. FEATURE-ASSESSMENT METHOD

The choice of the best set of features forming the input vector x for cell recognition is a very important step that may substantially increase the efficiency of the whole system. There are many different approaches to this problem [5], [10], [11]. Some of them investigate the discriminative ability of each individual feature, use the features with the highest measure, and eliminate strongly correlated features. The most often used approaches include the method based on the analysis of the variance and mean of the data samples belonging to each class. The variance of the features describing the cells belonging to the same class should be as small as possible. At the same time, the difference of the mean feature values for the data belonging to different classes should be maximized. To take both measures into account, the so-called discrimination coefficient S_{AB}(f), describing the discriminative value of feature f for the recognition between classes A and B, is defined as [10]

S_{AB}(f) = \frac{|c_A(f) - c_B(f)|}{\sigma_A(f) + \sigma_B(f)}.    (6)

The quantities c_A(f), c_B(f), \sigma_A(f), and \sigma_B(f) denote the mean values and standard deviations of feature f for classes A and B, respectively. Observe that a particular feature may be very good for the recognition between two chosen classes and useless for some others. Therefore, class-oriented features should be considered: for each two-class recognition problem, the features may be arranged in decreasing order of this coefficient, and a limited number of the best features may be used.
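A minimal sketch of the discrimination coefficient (6), computed per feature for one pair of classes, is given below; the array layout is an assumption.

```python
import numpy as np

def discrimination_coefficient(xa, xb, eps=1e-12):
    """S_AB(f) = |c_A(f) - c_B(f)| / (sigma_A(f) + sigma_B(f)), per feature.
    xa, xb: (samples, features) arrays for classes A and B."""
    num = np.abs(xa.mean(axis=0) - xb.mean(axis=0))
    den = xa.std(axis=0) + xb.std(axis=0) + eps   # eps guards against zero spread
    return num / den

# Features ordered from most to least discriminative for this class pair:
# ranking = np.argsort(-discrimination_coefficient(xa, xb))
```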

However, it is a known fact that the discriminative ability of a feature may change when it is used in cooperation with other features [10]. Therefore, more reliable results are expected if we investigate the discriminative properties of the features within the whole feature set. The best methods of this kind include the approach based on the application of the linear SVM [5], [11]. In this approach, the linear SVM classifier is simultaneously fed with the whole set of candidate features generated in the introductory phase, and the classifier is trained to recognize all classes. As a result of learning, the values of the weights leading from the features to the classifier become differentiated in a natural way, according to the importance of the features. The feature selection method is based on the idea that the absolute values of the weights of a trained linear SVM classifier produce a feature ranking: the feature associated with a larger weight is more important than one associated with a smaller weight. In this way, the features may easily be arranged in decreasing order, from the most to the least important. Usually, the importance of feature f changes for different pairs of classes, so this analysis should also be done separately for each pair.

Fig. 2 shows the typical distribution of the discriminative abilities of all 164 generated features for the recognition of the cells belonging to the promyelocyte and myelocyte families. T, G, S, and M correspond to the sets of textural, geometrical, statistical, and morphological features, respectively. Fig. 2(a) shows the results of applying the discrimination coefficient S_{AB}(f), whereas Fig. 2(b) shows the linear SVM ranking. The x-axis represents the index of the particular feature. The ranking value in Fig. 2(a) is the value of S_{AB}(f), and that in Fig. 2(b) is the absolute value of the weight connecting the appropriate feature (the input node) with the output of the linear SVM. As shown, the ranking coefficients of different features vary over a large range: from very small (close to zero) to very high. However, significant features are found in all four of the aforementioned families, which means that all the methods of feature generation are important for cell recognition. It is also interesting to find that the two ranking methods produce different results: the statistical experiments have shown that, in the set of the 30 best features, only 60% are common to both methods. The main source of this discrepancy is the statistical dependence among the features. Individual assessment of the features does not take their mutual dependence into account, and mutually dependent features deteriorate the accuracy of recognition [10].
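For reference, a minimal sketch of the linear SVM weight ranking described above, for one two-class problem, might look as follows; the regularization setting and the number of retained features are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def linear_svm_ranking(x_pair, y_pair, n_best=30):
    """Rank the features of one two-class problem by the absolute values of
    the weights of a trained linear SVM; return the n_best feature indices."""
    clf = LinearSVC(C=1.0, max_iter=10000).fit(x_pair, y_pair)
    weights = np.abs(clf.coef_).ravel()       # one weight per input feature
    return np.argsort(-weights)[:n_best]      # most important features first
```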


Selection of features that work together automatically eliminates this problem. The verification of both approaches by checking their results in the classification system has confirmed that the linear SVM ranking is more efficient and produces better classification results [5]. However, the arrangement of features according to their discriminative power does not solve all the selection problems, since the optimal number of the best features is still unknown. The method of trials is very time consuming, because it should be repeated for all combinations of two classes. In this paper, we propose a new genetic approach, leading to the optimal choice of the best features and, at the same time, their optimal quantity.

V. GA APPROACH

GAs are adaptive heuristic search algorithms of optimization based on the evolutionary ideas of natural selection and genetics [20], [21]. The basic concept of the GA is to simulate the processes in a natural system (i.e., inheritance, mutation, selection, and crossover) that are necessary for evolution, leading to the survival of the fittest. Traditionally, candidate solutions are represented in binary as strings of 0s and 1s and are called chromosomes. In our solution, each gene (bit) of the chromosome represents one feature (a value of 1 means inclusion of the feature, and a value of 0 means deletion from the actual set of features). A GA consists of selecting parents for reproduction, performing crossover between the parents, and applying mutation to the bits representing the children. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. For each individual, the error of the problem representation is calculated and, based on it, the fitness function is determined (typically, the fitness is the inverse of the error function). In the GA, the fitness is maximized (i.e., the error function is minimized). In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are stochastically selected from the current population (based on their fitness values) and modified (recombined and possibly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. The GA can be summarized in five steps [22].
1) Randomly generate the initial population M(0).
2) Compute and save the fitness u(m) for each individual m in the current population M(k).
3) Define the selection probabilities p(m) for each individual m in M(k) such that p(m) is proportional to u(m).
4) Generate M(k + 1) by probabilistically selecting individuals from M(k) to produce offspring via the genetic operators.
5) Repeat from step 2 until a satisfactory solution (the maximum of the defined fitness function) is obtained.
The GA operates with a population of a limited number of candidates. Only one of these is the best, but the other members of the population represent sample points in other regions of the search space, where a better solution may later be found. The GA performs a selection process in which the most fit members of the population survive and the least fit members are eliminated.

The most common type is roulette-wheel selection. In this method, the individuals are assigned probabilities of being selected that are directly proportional to their fitness. Two individuals are then randomly chosen based on these probabilities and produce offspring. The selection process should guide the evolutionary algorithm toward ever-better solutions. Crossover attempts to combine elements of existing solutions to create a new solution with some of the features of each parent. In the most typical single-point crossover, a locus of the chromosome is chosen, at which the remaining alleles are swapped from one parent to the other. The elements of existing solutions are thus combined to find new and possibly better points in the search space. Crossover, however, does not always occur [22]. Sometimes, based on a set probability, no crossover occurs, and the parents are directly copied to the new population. The probability of crossover occurring is usually about 70%. Mutation makes random changes in one or more members of the current population, yielding a new candidate solution, which may be better or worse than the existing population members. In the mutation process, we loop through all the alleles of all the individuals, and if an allele is selected for mutation, it is either changed by reversing 0 to 1 (or 1 to 0) or replaced with a new (usually random) value in the case of a nonbinary string representation of the problem. The probability of mutation is usually between one tenth and two tenths of one percent.

In our research, we have applied the GA to determine the best diagnostic features used for the classification of the blood cells. We have used a binary code representation of the individual features: the value 1 means the inclusion of the particular feature, whereas the value 0 indicates the absence of this feature from vector x. Each chromosome is thus associated with the input vector x applied to the SVM classifier. In all the experiments, an elitist strategy of passing the two fittest population members to the next generation was used. This guarantees that the fitness never declines from one generation to the next, which is a desirable property in our application. Two data sets are involved in GA-based training: 1) the learning set and 2) the validation set. The classifier is trained on the learning data set and then tested on the validation data set. The testing error on the validation data set forms the basis for the definition of the fitness function: the fitness is defined as the error function taken with a minus sign. The GA maximizes the value of the fitness function (equivalently, it minimizes the error function) by performing the subsequent operations of the selection of the parents, the crossover between the parents, and, finally, the mutation. Roulette-wheel selection has been applied. The general diagram illustrating the GA applied to the best feature selection is shown in Fig. 3 [6]. The described process is repeated until a termination condition has been reached. The common terminating conditions are as follows: a solution satisfying the minimum criteria is found, a fixed number of generations is reached, the allocated computation time is exhausted, or the highest ranking solution reaches a fitness plateau such that successive iterations no longer produce better results.


TABLE I LIST OF THE BLOOD CELLS USED IN TESTING ONLY

Fig. 3. Diagram illustrating the GA applied to feature selection.

We have applied combinations of these conditions. GAs are a very effective way of quickly finding a reasonable solution to a complex problem. They do an excellent job of searching through a large and complex search space and are most effective in search spaces about which little is known. They can produce solutions that solve the problem in ways that may never have been considered before. The application of GAs to feature selection is superior to other methods here: the search space is very large and poorly understood, no strict mathematical solution of the problem is available, and traditional search methods are extremely tedious and do not lead to acceptable efficiency of the system. Observe that the GA simultaneously selects the most important features forming the components of the best input vector x and determines the optimal size of this vector. This is a unique fusion of the required demands, which is not available in the classical approaches to the problem.
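As a hedged sketch of the feature-selection GA described in this section, the loop below uses binary chromosomes over the candidate features, a fitness equal to minus the validation misclassification rate of the SVM, roulette-wheel selection, single-point crossover, bit-flip mutation, and an elitist strategy that keeps the two fittest individuals. The `evaluate_error` callback and the default parameter values are placeholders for the experimental settings reported in Section VI.

```python
import numpy as np

def ga_feature_selection(evaluate_error, n_features=164, pop_size=100,
                         generations=50, p_cross=0.88, p_mut=0.01, elite=2,
                         rng=np.random.default_rng(0)):
    """evaluate_error(mask) -> validation misclassification rate of the SVM
    trained on the features where mask == 1. Returns the best mask found."""
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    pop[0] = 1                                    # include the all-ones chromosome
    for _ in range(generations):
        fitness = np.array([-evaluate_error(ind) for ind in pop])
        order = np.argsort(-fitness)
        new_pop = [pop[i].copy() for i in order[:elite]]        # elitist strategy
        p = fitness - fitness.min() + 1e-9                      # roulette wheel:
        p /= p.sum()                                            # shift to positive
        while len(new_pop) < pop_size:
            a, b = pop[rng.choice(pop_size, 2, p=p)]
            if rng.random() < p_cross:                          # single-point crossover
                cut = rng.integers(1, n_features)
                a, b = np.r_[a[:cut], b[cut:]], np.r_[b[:cut], a[cut:]]
            for child in (a, b):
                flips = rng.random(n_features) < p_mut          # bit-flip mutation
                child[flips] ^= 1
                new_pop.append(child.copy())
        pop = np.array(new_pop[:pop_size])
    errors = [evaluate_error(ind) for ind in pop]
    return pop[int(np.argmin(errors))]
```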

VI. RESULTS OF NUMERICAL EXPERIMENTS

A. Database

The database of the blood cells used in all the experiments has been created in cooperation with the Hematology Hospital in Warsaw, Poland. The bone marrow smear samples have been collected from 48 patients suffering from different types of leukemia, i.e., AML-M2, AML-M3, AML-M4, AML-M5, CLL, ALL, and lymphoma. The smears have been processed by applying the standard May-Grünwald-Giemsa method. The acquired images were digitized using an Olympus microscope at 1000× magnification and a digital camera with a resolution of 1712 × 1368 pixels, and the pictures were saved in red-green-blue (RGB) format. More than 5000 blood cells taking part in the experiments have been acquired this way. Three families of cells have been used in the experiments. The first family belongs to the erythrocytic line and contains three succeeding stages of cell development, i.e., the basophilic, polychromatic, and pyknotic erythroblasts. They are very difficult to recognize, since the neighboring cells in the development process are very similar, and the recognition between two succeeding stages is very hard, even for the human expert. The second family is composed of six neighboring cells of the granulocytic line of development, i.e., the myelo/mono blasts, promyelocytes, neutrophilic myelocytes, neutrophilic metamyelocytes, band neutrophils, and segmented neutrophils. They represent the succeeding stages of white blood cell development, which are very hard to recognize. The third set represents the lymphocyte family, which is composed of only two representatives, i.e., the prolymphocyte and the lymphocyte. Both classes are very similar to each other and are difficult to recognize. The available data set was split into two parts: one part containing two thirds of the data has been used in learning, and the remaining one third has been used for testing only. The learning set (two thirds of the data) has in turn been split into two halves: the first half was used for pure learning of the SVM classifier, and the second half was used for calculating the fitness function (the validation of the SVM model). The remaining one third of the data has only been used for testing the trained classifiers. An overview of all the cell types under investigation and the number of samples used in testing (one third of the whole set) is presented in Table I. In our further investigations, we will refer to a particular class by its individual number. The first experiments have been performed using all the features (no ranking) and then repeated using only the 30 best features selected by applying the linear SVM ranking. Twenty runs of both types of experiments have been performed.
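A small sketch of this data split (two thirds for learning, half of which serves as the validation set for the fitness function, and one third held out for testing) is shown below; the stratified splitting is an assumption rather than a detail stated in the paper.

```python
from sklearn.model_selection import train_test_split

def split_cells(x_all, y_all, seed=0):
    """2/3 learning (half of it used for validation/fitness), 1/3 for testing."""
    x_rest, x_test, y_rest, y_test = train_test_split(
        x_all, y_all, test_size=1 / 3, stratify=y_all, random_state=seed)
    x_learn, x_valid, y_learn, y_valid = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_learn, y_learn), (x_valid, y_valid), (x_test, y_test)
```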


TABLE II CONFUSION MATRIX AT THE RECOGNITION OF THE ERYTHROBLAST, GRANULOCYTE, AND LYMPHOCYTE CELL FAMILIES WITH THE APPLICATION OF LINEAR SVM RANKING

TABLE III COMPARATIVE RESULTS OF THE BLOOD CELL RECOGNITION ERROR IN THE FIRST EXPERIMENT (THE BEST RUN OF GA)

The results generated by the system were compared with the decisions of the human expert, and all discrepancies were treated as misclassifications. For each class of cells, the average relative error was calculated as the ratio of all the misclassifications to the total number of cells of that class. Additionally, the total error of recognition over all classes was calculated as the ratio of all the misclassifications (over all cells) to the total number of cells taking part in the testing. Note that this is not the same as the mean of the average errors of all the classes. In this paper, we analyze only the errors of testing, using the data that did not take part in learning (one third of the whole data set).

With the application of the linear SVM ranking, the overall mean misclassification error was equal to 22.5%, whereas the corresponding result with no ranking was 27.5%. However, the most important conclusion is drawn from the so-called confusion matrix. Table II presents such a matrix, corresponding to the results of the recognition of the cells not taking part in learning, with the application of the linear SVM ranking. The erythroblast, granulocyte, and lymphocyte families have been encircled using thick lines. The diagonal entries represent the numbers of properly recognized samples of each class. Each entry outside the diagonal indicates an error: the entry in the (i, j)th position of the matrix means a false assignment of the ith class to the jth class. It is evident that most misclassifications occur for the neighboring cells within the same development line; distant misclassifications are very rare. The important problem is thus the selection of features capable of reducing the misclassifications between the neighboring cells of both families. To solve this problem, we have applied the GA for feature selection. To get an objective assessment of the results, we have performed 20 runs of the GA procedure with different starting conditions of the initial population and different divisions of the data into the learning and testing parts. The error function used in the definition of the fitness function was equal to the relative number of misclassifications on the validation data set (half of the learning data).
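A short sketch of these error measures (per-class relative error, total error over all test cells, and the confusion matrix, with the expert labels as the reference) is shown below; the variable names are assumptions, and every class is assumed to be present in the test set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def recognition_errors(y_expert, y_system, n_classes=11):
    """Per-class and total misclassification rates plus the confusion matrix,
    taking the human expert labels as the reference."""
    cm = confusion_matrix(y_expert, y_system, labels=list(range(n_classes)))
    per_class = 1.0 - np.diag(cm) / cm.sum(axis=1)   # error rate for each class
    total = 1.0 - np.trace(cm) / cm.sum()            # error over all test cells
    return per_class, total, cm
```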

B. Numerical Results of the GA Experiments

In the GA approach to feature selection, we have used all the generated features as potential candidates, which together form the chromosome. With 164 features, the length of the chromosome is also equal to 164. A value of 0 at the ith position of the chromosome is interpreted as the deletion of the ith feature from vector x. The initial population of randomly generated chromosomes was equal to 100. Among them was the chromosome composed of all 1s (all the features taken into account). The GA has used the following parameters: a chromosome size of 164, a population size of 100, a mutation rate of 0.01, and a crossover rate of 0.88. The experiments have been repeated separately for all two-class recognizing SVM networks. The best results of the GA have been compared with the results obtained with the application of all features (no ranking) and also with the best results corresponding to the 30 features selected using the linear SVM ranking. Table III depicts the comparative results of the relative recognition error of the cells using the proposed GA method (the results of the best run), the selection of the 30 best features using the linear SVM ranking, and no feature ranking at all. All errors have been computed as the discrepancy between the results of our system and the human expert score used as the reference. The errors have been calculated as the ratio of the number of misclassified samples to the total number of cells belonging to the appropriate class. The mean error of recognition was calculated as the ratio of the number of misclassified samples to the total number of samples. As shown, the GA results are always among the best for all the classes. The mean error rate is better by almost 25% than the SVM ranking results and by almost 40% than the results without ranking. However, this increase in accuracy has been paid for by an expensive computational cost. With 11 classes of cells, the total computation time of learning was almost 6 h on a 2.4-GHz personal computer with 512 MB of random access memory. Note, however, that testing the trained SVM network is done practically immediately (in less than 1 s). Fortunately for users, the learning procedure of the system is done only once, provided that the learning set is representative of the cells under consideration. Table IV presents the confusion matrix for the classification of the three erythroblast family members, the six members of the granulocytic family, and the two lymphocyte members. It corresponds to the matrix (Table II) obtained with the application of the linear SVM ranking. It is evident that the number of misclassified neighboring cells has been significantly reduced.


TABLE IV CONFUSION MATRIX AT THE RECOGNITION OF THE ERYTHROBLAST, GRANULOCYTE, AND LYMPHOCYTE CELL FAMILIES WITH THE APPLICATION OF GA

TABLE VI NUMBER OF FEATURES FOR DIFFERENT COMBINATIONS OF CLASSES

TABLE V STATISTICAL RESULTS OF THE BLOOD CELL RECOGNITION ERROR WITH THE APPLICATION OF GA

Table V presents the statistical measures of the accuracy of the GA approach to the recognition of the considered blood cells. The statistics (the mean values and the standard deviations) have been estimated based on 20 independent runs of the GA. We see that the results of the individual runs of the GA are close to each other (the standard deviations take small values). However, the accuracy of recognition varies from class to class. For some classes (e.g., 5, 7, and 8), the relative misclassification rate is rather high. This is due to the high similarity among the neighbors of these classes. Moreover, the numbers of samples representing these classes are among the smallest. Additionally, we should take into account that the data for the experiments have been collected from 48 patients suffering from different kinds of leukemia at various stages of development of the illness. The images under classification have been registered from smears gathered in the hospital over a few years, and their quality was also very different. This yields a great variation among the cells belonging to the same class, making the recognition problem extremely difficult. An important advantage of the application of the GA is the simultaneous determination of the number of the most significant diagnostic features, since the optimal vector immediately provides a full set of features. This is in contrast to the classical methods of feature ranking, which deliver only the ordering of the features without any evidence of their optimal quantity. It is interesting to compare the numbers of features selected by the GA for the recognition between different classes of cells. Table VI depicts these numbers for different combinations of classes (the best run).

The results show that, for each two-class combination, the SVM network should apply a different number of features. As shown, these numbers significantly differ from combination to combination. Fig. 4 shows the histogram of all the features selected by the GA for the three- and six-class problems. The y-axis represents the number of times the particular feature has been selected as important (the maximum number is equal to the number of two-class combinations of all classes). For example, in the three-class problem, there were three SVM pairs for which separate sets of optimal features had to be selected; if a feature was selected twice, it was not selected for one pair of classes. Fig. 4(a) shows the recognition of three classes (three independent SVM networks), whereas Fig. 4(b) shows that of six classes (15 independent SVM networks). As shown, there is a great difference in the discrimination values of the features. Some of them have been selected quite frequently, but some have not been chosen at all (almost one third of the features for the three-class problem). None of the features has been selected by all SVM classifiers.

VII. CONCLUSION

This paper has presented the application of the GA to feature selection for the recognition of the neighboring blood cells belonging to the same development line. The numerical experiments performed for the set of six neighboring cell types of granulocytes, three neighboring erythroblasts, and two members of the lymphocyte family have confirmed the high efficiency of the proposed approach in comparison with the other, classical feature selection methods. The important advantage of the GA method is that it combines the ranking of the features with the determination of their optimal number. Applying the GA, we were able to increase the accuracy of the blood cell recognition by more than 25% (in relative terms) with respect to the best classical method of feature selection (linear SVM ranking). An important point in this solution is the application of the SVM network classifier. Its choice is motivated by the high efficiency of the learning algorithm, the relative insensitivity to the number of learning data, and the very good generalization ability. The selection procedures based on the GA have been specialized for this classifier (the features are adjusted to the recognition of two classes only). This approach leads to the high specialization of each classifier and results in an overall increase in accuracy.


Fig. 4. Histogram of the features selected by the GA in the recognition system. (a) Recognition of three classes and (b) recognition of six classes.

The practical comparison with other solutions has shown its superiority. Additional parallel tests performed with a multilayer perceptron in the role of the classifier have resulted in a significant increase in the misclassification rate.

REFERENCES
[1] J. M. Bennett, D. Catovsky, M. T. Daniel, G. Flandrin, D. A. Galton, H. R. Gralnick, and C. Sultan, "Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group," Br. J. Haematol., vol. 33, no. 4, pp. 451-458, Aug. 1976.
[2] K. Lewandowski and A. Hellmann, Hematology Atlas. Gdansk, Poland: Multimedia Med., 2001.
[3] M. Beksac, M. S. Beksac, V. B. Tippi, H. A. Duru, M. U. Karakas, and A. Nurcakar, "An artificial intelligent diagnostic system on differential recognition of hematopoietic cells from microscopic images," Cytometry, vol. 30, no. 3, pp. 145-150, Jun. 1997.
[4] N. Theera-Umpon and P. Gader, "System-level training of neural networks for counting white blood cells," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 1, pp. 48-53, Feb. 2002.
[5] S. Osowski and T. Markiewicz, "Support vector machine for recognition of white blood cells in leukemia," in Kernel Methods in Bioengineering, Signal and Image Processing, G. Camps-Valls, J. L. Rojo-Alvarez, and M. Martinez-Ramon, Eds. London, U.K.: Idea Group, 2007, pp. 93-123.
[6] R. Siroic, S. Osowski, T. Markiewicz, and K. Siwek, "Support vector machine and genetic algorithm for efficient blood cell recognition," in Proc. IEEE IMTC, Warsaw, Poland, 2007, pp. 1-6.
[7] B. Swolin, P. Simonsson, S. Backman, I. Lofqvist, I. Bredin, and M. Johnsson, "Differential counting of blood leukocytes using automated microscopy and a decision support system based on artificial neural networks: Evaluation of DiffMaster Octavia," Clin. Lab. Haematol., vol. 25, no. 3, pp. 139-147, Jun. 2003.
[8] DiffMaster Octavia DM96, Cellavision AB. [Online]. Available: www.cellavision.com
[9] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification and Scene Analysis. New York: Wiley, 2003.
[10] J. Schurmann, Pattern Classification: A Unified View of Statistical and Neural Approaches. New York: Wiley, 1996.
[11] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Mach. Learn., vol. 46, no. 1-3, pp. 389-422, Jan. 2002.
[12] P. Soille, Morphological Image Analysis: Principles and Applications. Berlin, Germany: Springer-Verlag, 2003.
[13] F. Scotti, "Robust segmentation and measurements techniques of white cells in blood microscope images," in Proc. IEEE IMTC, Sorrento, Italy, 2006, pp. 43-48.
[14] Matlab User Manual, MathWorks, Natick, MA, 2004.
[15] T. Wagner, "Texture analysis," in Handbook of Computer Vision and Applications, B. Jahne, H. Haussecker, and P. Geissler, Eds. Boston, MA: Academic, 1999, pp. 275-309.
[16] B. Schölkopf and A. Smola, Learning With Kernels. Cambridge, MA: MIT Press, 2002.
[17] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[18] J. Platt, "Fast training of SVM using sequential optimization," in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds. Cambridge, MA: MIT Press, 1998, pp. 185-208.
[19] C. W. Hsu and C. J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415-425, Mar. 2002.
[20] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[21] D. Ashlock, Evolutionary Computation for Modeling and Optimization. Berlin, Germany: Springer-Verlag, 2006.
[22] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Berlin, Germany: Springer-Verlag, 1996.

Stanislaw Osowski was born in Poland in 1948. He received the M.Sc., Ph.D., and Dr.Sc. degrees from the Warsaw University of Technology, Warsaw, Poland, in 1972, 1975, and 1981, respectively, all in electrical engineering. He is currently a Professor of electrical engineering with the Institute of the Theory of Electrical Engineering, Measurement and Information Systems, Warsaw University of Technology. He is also with the Military University of Technology, Warsaw. His research and teaching interests include neural networks, optimization techniques, and their applications in biomedical signal and image processing.

Robert Siroić was born in Poland in 1980. He received the M.Sc. degree in electrical engineering from the Warsaw University of Technology, Warsaw, Poland, in 2003. He is currently working toward the Ph.D. degree at the Warsaw University of Technology. His research interests include neural networks and genetic and evolutionary programming.


Tomasz Markiewicz was born in Poland in 1976. He received the M.Sc. and Ph.D. degrees in electrical engineering from Warsaw University of Technology, Warsaw, Poland, in 2001 and 2006, respectively. He is currently with Warsaw University of Technology and the Military Institute of the Health Services, Warsaw. His research interests include neural networks and biomedical signal and image processing.

Krzysztof Siwek was born in Poland in 1971. He received the M.Sc. and Ph.D. degrees in electrical engineering from Warsaw University of Technology, Warsaw, Poland, in 1995 and 2001, respectively. He is currently with Warsaw University of Technology. His research interests include neural networks, signal processing, and genetic algorithms.
