
Medical Image Classification with Multiple Kernel Learning

Hong Wu, Hao Zhang, Chao Li

School of Computer Sci. and Eng.
Univ. of Elec. Sci. & Tech. of China
Chengdu 611731, P. R. China


Medical images are being generated rapidly by hospitals and medical centers. The large volume of medical image data creates a strong need for effective medical image retrieval. Visual characteristics of medical images, such as modality and anatomical region, carry important information that can be used to improve the retrieval process. Although some of this information is contained in DICOM headers, these headers have been reported to contain a relatively high rate of errors. Moreover, in online medical collections, this metadata can be lost when images are compressed. In this paper, we propose an algorithm that classifies medical images according to their visual content. Our method uses multiple kernel learning (MKL) to combine different visual features and adaptively learns the optimal mixing weights for each class. The method is evaluated on a medical image dataset of 1400 images, and the experimental results demonstrate its effectiveness.

Categories and Subject Descriptors

H.3.1 [Content Analysis and Indexing]: Abstracting Methods;
H.3.3 [Information Search and Retrieval]: Information filtering;
I.4.8 [Scene Analysis]: Object recognition

General Terms
Algorithms, Measurement, Experimentation.

Keywords
medical imaging, image classification, multiple kernel learning, feature fusion.

With the progress of digital imaging technologies, a large number of medical images have been generated by hospitals and medical centers in recent years [1]. These visual data play an increasingly important role in diagnosis, treatment planning, and education [2], and they are commonly integrated into PACS, published as online collections, or included in the online content of journals. These large image repositories create a strong need for effective medical image retrieval systems.

Classifying medical images according to visual characteristics such as modality and anatomical region is helpful for medical image retrieval. Although part of this information is normally contained in DICOM headers and many imaging devices are DICOM-compliant, errors in DICOM headers have been reported [2, 3]. On the Internet, DICOM header information can be lost when medical images are converted to formats such as JPEG and GIF, and image captions or annotations often do not capture this information. Consequently, automatic annotation of medical images by classification is regarded as an important step toward improving medical image retrieval, and it has attracted considerable research attention. Since 2005, a medical image annotation subtask has been part of ImageCLEFmed, the medical image retrieval task within the ImageCLEF campaign.

In this paper, we propose to combine various visual features with multiple kernel learning (MKL) for automatic medical image classification. MKL is a method for learning the optimal combination of a set of kernels for SVM-based classification. With MKL, the optimal mixing weights are estimated for each class. We test the proposed method on a dataset of 1400 medical images from 14 classes, selected from ImageCLEFmed 2009, and the experimental results indicate the effectiveness of our method.

The next section reviews previous work related to medical image classification and multiple kernel learning. Our method for medical image classification, including the visual features used and the multiple kernel SVM classifier, is described in Section 3. The dataset, experimental procedure, and results are described in Section 4, followed by conclusions in Section 5.

Different visual features have been used for medical image classification. Among them, edge, shape, and global texture features [4, 5, 6] are commonly used. Local features with a bag-of-features representation, which have been applied successfully to object recognition, are also used for medical image classification [6, 7, 8]. To take advantage of both kinds of features, some works combine different global and local features into a single feature representation [4, 6, 9].

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
ICIMCS'10, December 30-31, 2010, Harbin, China.
Copyright 2010 ACM 978-1-4503-0460-3/10/12...$10.00.

For classification, various algorithms have been used, such as nearest-neighbor classifiers [5, 10], decision trees [11], and support vector machines (SVMs) [8, 9, 11].
Kernels define possibly nonlinear similarities between data points and have been applied successfully to learning algorithms such as SVMs. From a machine learning perspective, different representations give rise to different kernels, and combining different kernels for classification has become a promising approach. Recently, multiple kernel learning (MKL) [12, 13, 14, 15] has been applied to integrating various features, more specifically, to finding a linear mixture of kernels. MKL has the appealing property of always finding the optimal kernel combination, and it converges quickly because it can be wrapped around a regular support vector machine [14].

Some works have applied MKL to object recognition and image classification. Varma and Ray [16] proposed using MKL to fuse various kinds of image features and ran experiments on Caltech-101/256. Similarly, Nilsback and Zisserman [17] applied MKL-based feature fusion to flower image classification. On the other hand, Kumar and Sminchisescu [18] used MKL to estimate the combination weights of spatial pyramid kernels (SPK) [19] built from a single kind of image feature. Lampert and Blaschko [20] used MKL to estimate the degree of contextual relations between objects in multiple-object recognition. Joutou and Yanai [21] employed an MKL-based feature fusion method for food image recognition. In this paper, we use MKL for medical image classification.

3.2 Classifier
Our classifier is an SVM with multiple kernel learning; for this multi-class problem, the one-vs-rest strategy is used. A standard SVM classifier is designed for two-class problems and can handle only a single kernel. We are given $n$ training samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathcal{X}$ is an input vector and $y_i \in \{-1, 1\}$ is its label.

Support vector machines originate from linear classifiers and maximize the margin between the samples of the two classes. Introducing a feature mapping $\phi$ from the input space $\mathcal{X}$ to a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, linear classifiers in $\mathcal{H}$ of the form

$$f(\mathbf{x}) = \langle \mathbf{w}, \phi(\mathbf{x}) \rangle_{\mathcal{H}} + b$$

provide a rich set of flexible classifiers in $\mathcal{X}$. The parameters $(\mathbf{w}, b)$ are determined by solving an equivalent dual optimization problem. The dual depends only on inner products (similarities) of inputs, which can alternatively be computed by means of a kernel function $k$, given by

$$k(\mathbf{x}, \mathbf{x}') = \langle \phi(\mathbf{x}), \phi(\mathbf{x}') \rangle_{\mathcal{H}}.$$

The final decision function can then be written as

$$f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i k(\mathbf{x}_i, \mathbf{x}) + b,$$

where the coefficients $\alpha_i$ and the bias $b$ are obtained from the dual solution.

The multiple kernel learning framework extends the regular SVM formulation with an adaptively weighted combined kernel that fuses different kinds of features. The combined kernel is

$$k(\mathbf{x}, \mathbf{x}') = \sum_{j=1}^{M} \beta_j k_j(\mathbf{x}, \mathbf{x}'), \quad \text{with } \beta_j \ge 0, \; \sum_{j=1}^{M} \beta_j = 1,$$

where $\beta_j$ is the weight of the $j$-th of the $M$ sub-kernels $k_j(\mathbf{x}, \mathbf{x}')$. MKL estimates the optimal weights from the training data. By preparing one sub-kernel for each image feature and estimating the weights with MKL, we obtain an optimal combined kernel. Sonnenburg et al. [14] proposed an efficient MKL algorithm that estimates the optimal weights and the SVM parameters simultaneously by iterating the training steps of a standard SVM. This implementation is available in the SHOGUN machine learning toolbox, distributed from the website of the first author of [14]. In our experiments, we use the MKL library included in the SHOGUN toolbox as the implementation of MKL.

3.1 Visual Features

In this paper, we test four image features: three global features (GLCM texture features, Tamura texture features, and Gabor texture features) and one local feature, ModSIFT.

GLCM Texture Features: The gray level co-occurrence matrix (GLCM) of an image records how often pairs of pixels separated by a given offset occur in the image with each combination of gray levels; entry $(i, j)$ counts the pairs whose first pixel has level $i$ and whose second has level $j$. The distribution in the matrix depends on the angular and distance relationship between the pixels [22]. We use contrast, correlation, energy, and homogeneity features derived from each normalized co-occurrence matrix to represent an image.

Tamura Texture Feature: Based on research into textural features corresponding to human visual perception, Tamura et al. [23] proposed six basic textural features, namely coarseness, contrast, directionality, line-likeness, regularity, and roughness. In image retrieval research, the coarseness, contrast, and directionality features have proven effective, and they are also used in this paper.
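As an illustration of one of the three Tamura features, the sketch below computes Tamura contrast with the standard definition from [23], $F_{con} = \sigma / \alpha_4^{1/4}$ with $\alpha_4 = \mu_4 / \sigma^4$. It is a minimal sketch, not the authors' code: the function name `tamura_contrast` is hypothetical, and it works on a whole region, whereas the paper computes the features on a per-pixel basis (for example, over a local window around each pixel).

```python
import numpy as np


def tamura_contrast(region: np.ndarray) -> float:
    """Tamura contrast F_con = sigma / alpha4 ** 0.25 of a gray-level region.

    alpha4 = mu4 / sigma**4 is the (non-excess) kurtosis of the gray levels.
    """
    x = np.asarray(region, dtype=np.float64).ravel()
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    if var == 0.0:
        return 0.0  # a constant region has no contrast
    alpha4 = np.mean(centered ** 4) / var ** 2
    return float(np.sqrt(var) / alpha4 ** 0.25)


# Usage: a two-level patch (sigma = 127.5, alpha4 = 1) versus a uniform patch.
print(tamura_contrast(np.array([[0, 255], [255, 0]])))  # 127.5
print(tamura_contrast(np.full((4, 4), 7)))              # 0.0
```

Dividing by $\alpha_4^{1/4}$ lowers the contrast of distributions dominated by a few extreme values, which is why the measure is not simply the standard deviation.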

Gabor Texture Feature: Gabor filters have been proposed as models of the responses of the human visual system, and Gabor-filter-based approaches are popular for texture feature extraction. We use the implementation proposed by Manjunath and Ma [24]. The feature is built by filtering the image with a bank of orientation- and scale-sensitive filters and computing statistical measures of the outputs in the frequency domain.

ModSIFT: SIFT features [25] are local features designed to describe an image region in a way that is robust to noise, illumination, scale, translation, and rotation changes. For medical image classification, the rotation invariance of SIFT is not relevant, because the various structures in radiographs are likely to always appear with the same orientation. Moreover, the scale is unlikely to change much between images of the same class. We therefore use ModSIFT [26], a modified version of the SIFT descriptor in which the scale and rotation invariance are removed. In our experiments, ModSIFT is used with a bag-of-features representation.
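The combined kernel and the one-vs-rest decision rule of Section 3.2 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the SHOGUN implementation used in the paper: the sub-kernel matrices, the weights $\beta_j$, the dual coefficients $\alpha_i$, and the biases $b$ are assumed to be already learned (in the paper, SHOGUN's MKL estimates $\beta_j$ and the SVM parameters jointly), and all function names and numbers here are hypothetical. Because MKL learns mixing weights for each class, each one-vs-rest classifier uses its own combined kernel.

```python
import numpy as np


def combined_kernel(sub_kernels, betas):
    """k(x, x') = sum_j beta_j * k_j(x, x'), with beta_j >= 0 and sum_j beta_j = 1."""
    betas = np.asarray(betas, dtype=np.float64)
    if np.any(betas < 0) or not np.isclose(betas.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(beta * np.asarray(K, dtype=np.float64) for beta, K in zip(betas, sub_kernels))


def decision_values(K_test_train, alphas, bias):
    """f(x) = sum_i alpha_i * k(x_i, x) + b for each row (test sample) of K_test_train."""
    return np.asarray(K_test_train) @ np.asarray(alphas) + bias


def one_vs_rest_predict(per_class_models, test_sub_kernels):
    """Pick the class whose binary MKL-SVM gives the largest decision value.

    per_class_models: list of (betas, alphas, bias), one per class (assumed already trained;
                      alphas are the signed dual coefficients).
    test_sub_kernels: one test-vs-train matrix per feature, each of shape (n_test, n_train).
    """
    scores = np.column_stack([
        decision_values(combined_kernel(test_sub_kernels, betas), alphas, bias)
        for betas, alphas, bias in per_class_models
    ])
    return scores.argmax(axis=1)


# Usage with two hypothetical sub-kernels (e.g., Gabor and ModSIFT), 2 test x 3 training samples.
K_gabor = np.array([[0.9, 0.1, 0.2], [0.1, 0.8, 0.3]])
K_modsift = np.array([[0.7, 0.2, 0.1], [0.3, 0.9, 0.2]])
models = [
    ([0.5, 0.5], np.array([1.0, -0.5, -0.5]), 0.0),  # class 0 vs. rest
    ([0.2, 0.8], np.array([-0.5, 1.0, -0.5]), 0.0),  # class 1 vs. rest
]
print(one_vs_rest_predict(models, [K_gabor, K_modsift]))  # -> [0 1]
```

Checking the weight constraints explicitly mirrors the conditions $\beta_j \ge 0$ and $\sum_j \beta_j = 1$ in the combined kernel; the step that actually learns $\beta_j$ is left to an MKL solver.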

4.1 Dataset
In our experiments, we construct a dataset by selecting images from the IRMA dataset, which was used in the annotation subtask of ImageCLEFmed 2009. In the IRMA dataset, image annotation is based on the hierarchical Image Retrieval in Medical Applications (IRMA) code.


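In the $\chi^2$ kernel, $\gamma$ is a kernel parameter and $\chi^2(\mathbf{x}, \mathbf{x}')$ is the chi-squared distance between the feature vectors $\mathbf{x}$ and $\mathbf{x}'$.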

Zhang et al. [28] reported that the best results were obtained when the parameter $\gamma$ of the $\chi^2$ kernel was set to the average $\chi^2$ distance between all training samples. We followed this method to set $\gamma$; for the other kernels, the parameters are tuned by grid search. In the second phase of our experiments, multiple kernel learning is applied to different combinations of features, with each feature using the best kernel determined in the first phase. These tests identify the best combination of features for MKL. The same features are then concatenated into one feature vector and tested with a standard SVM. The goal of this phase is to find the best combination of features for MKL and to compare MKL with a standard SVM.
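A minimal NumPy sketch of the $\chi^2$ kernel with $\gamma$ set to the mean training $\chi^2$ distance, following the heuristic of Zhang et al. [28] described above. The paper does not spell out the $\chi^2$ distance, so the sketch assumes the form $\chi^2(\mathbf{x}, \mathbf{x}') = \frac{1}{2}\sum_k (x_k - x'_k)^2 / (x_k + x'_k)$ for non-negative histogram features, and it assumes that self-pairs are excluded when averaging. The function names are hypothetical.

```python
import numpy as np


def chi2_distances(A, B, eps=1e-12):
    """Pairwise chi-squared distances 0.5 * sum_k (a_k - b_k)^2 / (a_k + b_k)."""
    A = np.asarray(A, dtype=np.float64)[:, None, :]
    B = np.asarray(B, dtype=np.float64)[None, :, :]
    # eps avoids 0/0 for bins that are empty in both histograms (those terms become 0).
    return 0.5 * np.sum((A - B) ** 2 / (A + B + eps), axis=2)


def fit_gamma(X_train):
    """gamma = mean chi2 distance over all distinct pairs of training samples."""
    D = chi2_distances(X_train, X_train)
    off_diagonal = ~np.eye(len(D), dtype=bool)
    return float(D[off_diagonal].mean())


def chi2_kernel(X, Y, gamma):
    """k(x, x') = exp(-chi2(x, x') / gamma)."""
    return np.exp(-chi2_distances(X, Y) / gamma)


# Usage: fit gamma on training histograms once, then reuse it for train and test kernels.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
gamma = fit_gamma(X_train)  # (1 + 1/3 + 1/3) / 3 = 5/9
K_train = chi2_kernel(X_train, X_train, gamma)
```

Normalizing by the mean distance makes typical kernel values land near $e^{-1}$ regardless of the histogram scale, which is one reason this setting avoids a separate grid search over $\gamma$.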

4.3 Experimental Results

Fig. 1. Example images from the IRMA dataset

Our dataset contains 1400 medical images in 14 categories, with 100 images per category. The images of each category were randomly split into two disjoint sets: N = 50 for training and the rest for testing. All experiments are repeated ten times with different randomly selected training and test images, and the average of the per-class classification accuracies is recorded for each run. The final result is reported as the mean and standard deviation over the individual runs. The first-phase results show that the Gabor texture feature and ModSIFT perform better than the other features, and that among the kernels we tried, the $\chi^2$ kernel performs best. The second-phase results show that fusing the Gabor feature and ModSIFT with MKL achieves the best performance among all feature combinations. Table 1 gives the classification accuracy of several methods. With a single feature, the Gabor texture feature achieves an accuracy of 85.28% and ModSIFT achieves 94.87%. When the Gabor feature and ModSIFT are concatenated into a 620-dim vector (120 + 500), a standard SVM achieves 96.01%. Fusing the Gabor feature and ModSIFT with MKL gives the best result, 96.68%. These results indicate that fusing features with MKL can improve medical image classification accuracy and slightly outperforms feature concatenation with a standard SVM.
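The evaluation protocol above (a random per-class split with N = 50 training images, the mean of the per-class accuracies for each run, then the mean and standard deviation over ten runs) can be sketched as follows. The classifier itself is left out; `split_per_class`, `mean_per_class_accuracy`, and `summarize_runs` are hypothetical helpers, and the sample standard deviation (`ddof=1`) is an assumption, since the paper does not say which estimator it uses.

```python
import numpy as np


def split_per_class(labels, n_train, rng):
    """Randomly pick n_train indices per class for training; the rest are for testing."""
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)


def mean_per_class_accuracy(y_true, y_pred):
    """Average over classes of the fraction of each class's test samples classified correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))


def summarize_runs(run_accuracies):
    """Mean and (sample) standard deviation of the per-run accuracies."""
    a = np.asarray(run_accuracies, dtype=np.float64)
    return float(a.mean()), float(a.std(ddof=1))


# Usage: 14 classes x 100 images, N = 50 training images per class.
labels = np.repeat(np.arange(14), 100)
rng = np.random.default_rng(0)
train_idx, test_idx = split_per_class(labels, 50, rng)
print(len(train_idx), len(test_idx))  # 700 700
```

Since every class has the same number of test images here, the mean per-class accuracy equals the overall accuracy; the distinction only matters for unbalanced splits.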

The complete IRMA code is a string of 13 characters, TTTT-DDD-AAA-BBB, in which T stands for technical (image modality), D for directional (body orientation), A for anatomical (body region examined), and B for biological (biological system examined). To classify images at a coarse level, the IRMA code is compressed to 2, 1, 2, and 1 code levels on the T, D, A, and B axes, following Mark Oliver Güld's method [27], which leaves 73 distinct codes in the IRMA dataset. For experimental convenience, we chose 14 of the 73 categories and randomly selected 100 images from each chosen category, constructing a dataset of 1400 images for our experiments. Fig. 1 shows some example images from our dataset.
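The code-level compression can be sketched as keeping a prefix of each axis. This assumes, as our reading of the hierarchical scheme and of [27], that keeping the first 2, 1, 2, and 1 characters of the T, D, A, and B axes yields the coarser class; the helper name and the example code strings are illustrative only.

```python
from collections import Counter


def compress_irma_code(code: str, levels=(2, 1, 2, 1)) -> str:
    """Keep the first levels[k] characters of each axis of a TTTT-DDD-AAA-BBB code."""
    axes = code.split("-")
    if [len(axis) for axis in axes] != [4, 3, 3, 3]:
        raise ValueError(f"not a TTTT-DDD-AAA-BBB IRMA code: {code!r}")
    return "-".join(axis[:n] for axis, n in zip(axes, levels))


# Usage: group images by their coarse class (the code strings here are hypothetical).
codes = ["1121-127-700-500", "1121-120-700-500", "1123-211-500-000"]
print(Counter(compress_irma_code(c) for c in codes))
```

The first two codes differ only in the third digit of the D axis, so they collapse into the same coarse class.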

4.2 Experimental Settings

GLCM texture features, Tamura texture features, Gabor texture features, and ModSIFT are used in our experiments. To calculate the GLCM texture features, four GLCMs are created for each image using offsets of length 1 at orientations $0$, $\pi/4$, $\pi/2$, and $3\pi/4$; contrast, correlation, energy, and homogeneity are computed from each normalized co-occurrence matrix, giving a 16-dim (4 × 4) vector for each image. To extract the Tamura feature, coarseness, contrast, and directionality are computed on a per-pixel basis, and the values are quantized into a three-dimensional histogram (8 × 8 × 8 = 512 bins), forming one 512-dim vector for each image. To extract the Gabor features, Gabor filters with 3 scales and 4 orientations are applied to the image, and the means of the filtered images are quantized into 10 bins, forming a 120-dim (3 × 4 × 10) histogram feature. Following [26], ModSIFT features are extracted at 30 randomly sampled points from each input image, and all extracted ModSIFT descriptors are used to build a vocabulary by K-means with K = 500. Each image is then represented by a 500-bin histogram of visual words.
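The 16-dim GLCM feature described above can be sketched with NumPy. The sketch makes assumptions that the paper does not state: gray levels are linearly quantized to 8 levels, the co-occurrence matrices are not symmetrized, energy is taken as the angular second moment $\sum_{i,j} P(i,j)^2$, and correlation is set to 0 when a marginal has zero variance. The (row, column) offsets $(0, 1)$, $(-1, 1)$, $(-1, 0)$, and $(-1, -1)$ correspond to orientations $0$, $\pi/4$, $\pi/2$, and $3\pi/4$ at distance 1. The function names are hypothetical.

```python
import numpy as np

# (row, col) offsets of length 1 for orientations 0, pi/4, pi/2, 3*pi/4 (rows grow downward).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def quantize(gray, levels=8):
    """Linearly map gray values to integer levels 0..levels-1."""
    g = np.asarray(gray, dtype=np.float64)
    lo, hi = g.min(), g.max()
    if hi == lo:
        return np.zeros(g.shape, dtype=np.intp)
    return np.minimum(((g - lo) / (hi - lo) * levels).astype(np.intp), levels - 1)


def glcm(q, dr, dc, levels):
    """Normalized co-occurrence matrix P[i, j] of level pairs (q[p], q[p + (dr, dc)])."""
    rows, cols = q.shape
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    ref = q[r0:r1, c0:c1]                      # reference pixels
    nbr = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]  # their neighbors at the given offset
    P = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)
    return P / P.sum()


def glcm_props(P):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    energy = np.sum(P ** 2)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    cov = np.sum(P * (i - mu_i) * (j - mu_j))
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0
    return [contrast, correlation, energy, homogeneity]


def glcm_feature(gray, levels=8):
    """4 offsets x 4 statistics = 16-dim texture vector."""
    q = quantize(gray, levels)
    return np.array([v for dr, dc in OFFSETS for v in glcm_props(glcm(q, dr, dc, levels))])


# Usage: a tiny image with vertical stripes; each row of the reshaped output is one orientation.
feat = glcm_feature(np.array([[0, 255], [0, 255]]))
print(feat.reshape(4, 4))
```

For the striped example, the vertical offset sees only identical pixel pairs (zero contrast, correlation 1), while the horizontal and diagonal offsets see only 0-to-255 transitions, which illustrates how the four orientations capture directional texture.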

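Table 1. Classification accuracies for different methods

Method                                   Accuracy (mean ± std)
Gabor texture feature, SVM               85.28 ± 1.92%
ModSIFT, SVM                             94.87 ± 1.58%
Gabor + ModSIFT concatenated, SVM        96.01 ± 0.99%
Gabor + ModSIFT, MKL                     96.68 ± 1.23%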


Our experiments include two phases. In the first phase, an SVM classifier with a single feature is used for classification. The goal of this phase is to determine the best kernel for each feature, and five kernels are compared: RBF, polynomial, linear, sigmoid, and $\chi^2$ kernels. The $\chi^2$ kernel is commonly used in object recognition tasks and has the following form:

$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{1}{\gamma}\chi^2(\mathbf{x}, \mathbf{x}')\right)$$


Recently, MKL has been used to integrate various features by finding a linear mixture of kernels, and it has achieved success in several applications. In this paper, we propose an MKL-based algorithm for medical image classification. MKL can adaptively learn the optimal combination of a set of kernels for an SVM classifier. The experimental results indicate the effectiveness of MKL. In future work, we plan to use more image features, test other MKL implementations, and extend the medical image dataset with more categories and more images.



This work is supported by the National Science Foundation of China under grant 60873185 and by the Key Program of the Youth Science Foundation of UESTC under grant JX0745.

[1] Robb R. A. 1999. Biomedical Imaging, Visualization, and Analysis. Wiley-Liss.

[2] Müller H., Michoux N., Bandon D., and Geissbuhler A. 2004. A review of content-based image retrieval systems in medical applications: clinical benefits and future directions. International Journal of Medical Informatics, 73, 1 (Feb. 2004), 1-23.

[3] Güld M.O., Kohnen M., Keysers D., Schubert H., Wein B.B., Bredno J., and Lehmann T.M. 2002. Quality of DICOM header information for image categorization. In Proceedings of SPIE, 4685, 280-287.

[4] Qiu B., Xiong W., Tian Q., and Xu C.S. 2005. Report for annotation task in ImageCLEFmed 2005. In Working Notes of CLEF 2005. Vienna, Austria.

[5] Deselaers T., Weyand T., Keysers D., Macherey W., and Ney H. 2005. FIRE in ImageCLEF 2005: Combining content-based image retrieval with textual information retrieval. In CLEF 2005 Proceedings, Springer, Lecture Notes in Computer Science (LNCS), 4022, 652-661.

[6] Liu J., Hu Y., Li M., Ma S., and Ma W.Y. 2006. Medical image annotation and retrieval using visual features. In CLEF 2006 Proceedings, Springer, Lecture Notes in Computer Science (LNCS), 4730, 678-685.

[7] Tommasi T., Orabona F., and Caputo B. 2007. CLEF2007 Image Annotation Task: an SVM-based Cue Integration Approach. In Working Notes of CLEF 2007. Budapest, Hungary.

[8] Avni U., Goldberger J., and Greenspan H. 2008. TAU MIPLAB at ImageClef 2008. In Working Notes of CLEF 2008. Aarhus, Denmark.

[9] Tommasi T., Orabona F., and Caputo B. 2008. CLEF2008 Image Annotation Task: an SVM Confidence-Based Approach. In Working Notes of CLEF 2008. Aarhus, Denmark.

[10] Güld M.O., Thies C., Fischer B., and Lehmann T.M. 2005. Combining global features for content-based retrieval of medical images. In Working Notes of CLEF 2005. Vienna, Austria.

[11] Setia L., Teynor A., Halawani A., and Burkhardt H. 2008. Grayscale medical image annotation using local relational features. Pattern Recognition Letters, 29, 15 (Nov. 2008).

[12] Lanckriet G.R.G., Cristianini N., Bartlett P., El Ghaoui L., and Jordan M.I. 2004. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 12 (Dec. 2004), 27-72.

[13] Bach F., Lanckriet G., and Jordan M. 2004. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the International Conference on Machine Learning, ICML'04.

[14] Sonnenburg S., Rätsch G., Schäfer C., and Schölkopf B. 2006. Large scale multiple kernel learning. Journal of Machine Learning Research, 7 (July 2006), 1531-1565.

[15] Rakotomamonjy A., Bach F., Canu S., and Grandvalet Y. 2007. More efficiency in multiple kernel learning. In Proceedings of the International Conference on Machine Learning, ICML'07, 775-782.

[16] Varma M. and Ray D. 2007. Learning the discriminative power-invariance trade-off. In Proceedings of the IEEE International Conference on Computer Vision, ICCV'07.

[17] Nilsback M. and Zisserman A. 2008. Automated flower classification over a large number of classes. In Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP'08.

[18] Kumar A. and Sminchisescu C. 2007. Support kernel machines for object recognition. In Proceedings of the IEEE International Conference on Computer Vision, ICCV'07.

[19] Lazebnik S., Schmid C., and Ponce J. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of IEEE Computer Vision and Pattern Recognition, CVPR'06, 2169-2178.

[20] Lampert C.H. and Blaschko M.B. 2008. A multiple kernel learning approach to joint multi-class object detection. In Proceedings of the German Association for Pattern Recognition Conference, GAPR'08.

[21] Joutou T. and Yanai K. 2009. A food image recognition system with multiple kernel learning. In Proceedings of the 16th IEEE International Conference on Image Processing, ICIP'09, 285-288.

[22] Haralick R. 1979. Statistical and structural approaches to texture. Proceedings of the IEEE, 67, 5, 786-804.

[23] Tamura H., Mori S., and Yamawaki T. 1978. Texture features corresponding to visual perception. IEEE Trans. on Systems, Man, and Cybernetics, 8, 6 (June 1978), 460-473.

[24] Manjunath B. and Ma W. 1996. Texture features for browsing and retrieval of image data. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18, 8 (Aug. 1996), 837-842.

[25] Lowe D.G. 1999. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, ICCV'99, 2, 1150-1157.

[26] Tommasi T., Orabona F., and Caputo B. 2008. Discriminative cue integration for medical image annotation. Pattern Recognition Letters, 29, 15 (Nov. 2008), 1996-2002.

[27] Güld M.O., Keysers D., Deselaers T., Leisten M., Schubert H., Ney H., and Lehmann T.M. 2004. Comparison of Global Features for Categorization of Medical Images. In Proceedings of SPIE, 5371, 211-222.

[28] Zhang J., Marszalek M., Lazebnik S., and Schmid C. 2007. Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. International Journal of Computer Vision, 73, 2 (June 2007), 213-238.