
Efficient Binary Classifier for Prediction of Diabetes Using Data Preprocessing and Support Vector Machine

Madhavi Pradhan¹ and G.R. Bamnote²

¹ Department of Computer Engineering, AISSMS College of Engineering,
University of Pune, Pune, Maharashtra, India
² Department of Computer Science and Engineering, PRMIT&R, SSGBAU,
Amravati, Maharashtra, India
{madhavipradhan,grbamnote}@rediffmail.com

Abstract. Diabetes offers a wealth of opportunity for building classifiers,
because a large amount of patient data is available in the public domain. The
disease affects a large population and therefore costs a great deal of money.
Over the years it spreads to other organs of the body, which makes its impact
lethal. Physicians are therefore interested in early and accurate detection of
diabetes. This paper presents an efficient binary classifier for the detection
of diabetes using data preprocessing and a Support Vector Machine (SVM). In this
study, an attribute evaluator and best-first search are used to reduce the
number of features. The dimension of the input feature vector is reduced from
eight to three. The dataset used is the Pima diabetic dataset from the UCI
repository. A substantial increase in accuracy is observed when data
preprocessing is used.

Keywords: SVM, Diabetes, Classifier, Preprocessing, Binary Classifier.

1 Introduction
Diabetes is a disorder caused by insufficient production of insulin in the body or by
the body's resistance to the insulin it produces. Insulin is a hormone that regulates
the level of glucose in the blood. A lack of insulin leads to an excessive glucose
content in the blood, which can be toxic. 347 million people worldwide have
diabetes [1]. Diabetes can lead to heart disease, which may increase complications
and can be fatal. Given these facts and statistics, there is a need for early and
accurate detection of diabetes.
Diagnosis of diabetes depends on many parameters, and doctors usually need to
compare the results of previous patients to reach a correct diagnosis. A classifier
system that classifies according to previous decisions made by experts, using a
minimal set of features, can therefore help expedite this process. In classification
tasks on clinical datasets, researchers commonly find that a considerable number of
features are not informative because they are either irrelevant or redundant with
respect to the class concept. Ideally, we would like to use the features that have
high predictive power while ignoring or paying less attention to the rest. A
predictive feature set simplifies the pattern representation and the classifier
design, and the resulting classifier is also more efficient.
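
The trade-off between relevant and redundant features can be illustrated with a
small sketch. It is not the authors' procedure: it assumes a correlation-based
subset merit (high feature-class correlation, low feature-feature correlation) as
the attribute evaluator and replaces best-first search with a simpler greedy
forward search. The synthetic data and the names pearson, merit and
forward_select are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def merit(subset, columns, labels):
    """Assumed subset score: rewards relevance to the class, penalises redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, labels):
    """Greedy forward search (a simplification of best-first search, no backtracking)."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        score, feat = max((merit(selected + [f], columns, labels), f) for f in remaining)
        if score <= best:  # no candidate improves the subset: stop
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Synthetic illustration data (not the Pima dataset):
# f0 and f3 drive the class, f1 is a noisy copy of f0 (redundant), f2 is pure noise.
random.seed(0)
n = 400
f0 = [random.gauss(0, 1) for _ in range(n)]
f1 = [v + random.gauss(0, 0.1) for v in f0]
f2 = [random.gauss(0, 1) for _ in range(n)]
f3 = [random.gauss(0, 1) for _ in range(n)]
labels = [1 if a + d > 0 else 0 for a, d in zip(f0, f3)]
columns = [f0, f1, f2, f3]

selected, score = forward_select(columns, labels)
print("selected feature indices:", selected, "merit:", round(score, 3))
```

In this sketch the search is expected to keep only one of the two near-duplicate
features together with the second informative one, and to leave out the noise
feature, which mirrors the idea of discarding redundant and irrelevant attributes.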

© Springer International Publishing Switzerland 2015


S.C. Satapathy et al. (eds.), Proc. of the 3rd Int. Conf. on Front. of Intell. Comput. (FICTA) 2014
– Vol. 1, Advances in Intelligent Systems and Computing 327, DOI: 10.1007/978-3-319-11933-5_15

Feature reduction has been applied to several areas in medicine [2], [3]. Huang et
al. [4] predict type 2 diabetic patients by employing a feature selection technique as
supervised model construction to rank the important attributes affecting diabetes
control. K. Polat and S. Gunes [5] used PCA_ANFIS for diabetes detection and obtained
an accuracy of 89.47%. K. Polat, S. Gunes and A. Aslan [6] used a support vector
machine for diabetes detection and obtained 78.21% accuracy. T. Hasan, Y. Nijat and
T. Feyzullah [7] present a comparative study on Pima Indian diabetes diagnosis using a
Multilayer Neural Network (MLNN) trained by the Levenberg–Marquardt (LM) algorithm
and a Probabilistic Neural Network (PNN). MLNNs have been successfully used to
replace conventional pattern recognition methods in disease diagnosis systems. The
LM algorithm used in that study generally provides faster convergence and better
estimation results than other training algorithms. The classification accuracy of
MLNN with LM obtained by that study using correct training was better than those
obtained by other studies for the conventional validation method. K. Kayaer and T.
Yildirim [8] proposed three different neural network structures, namely Multilayer
Perceptron (MLP), Radial Basis Function (RBF) and General Regression Neural
Network (GRNN). These techniques were applied to the Pima Indians Diabetes (PID)
medical data. The performance of RBF was worse than that of the MLP for all spread
values tried. The performance of the MLP was tested for different types of
back-propagation training algorithms. The best result achieved on the test data was
obtained with the GRNN structure (80.21%). This is very close to the highest true
classification result, which was achieved with the more complex ARTMAP-IC network
(81%). P. Day and A. K. Nandi [9] introduced a genetic-algorithm-based classifier.
D. P. Muni, N. R. Pal and J. Das [10] proposed a Genetic Programming approach to
designing a multiclass classifier.
Numerous classification models have been proposed for the prediction of diabetes, but
it is widely recognized that diabetes is extremely difficult to classify [11]. This
paper presents an efficient binary classifier for the detection of diabetes using data
preprocessing and a support vector machine. We first apply feature selection methods
to identify the most predictive attributes and then use the SVM algorithm for
classification.

2 Proposed Classifier

Mathematically, a classifier can be represented as a function which takes a feature
vector in the p-dimensional search space and assigns to it a label vector from $L_{vc}$:

$$C : S^{p} \rightarrow L_{vc}$$

where
C is the classifier that maps the search space to label vectors,
$S^{p}$ is the p-dimensional search space,
$L_{vc}$ is the set of label vectors.
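
As a rough illustration of such a mapping, the sketch below builds a function C
that takes a feature vector and returns a label. It assumes a linear SVM trained
by stochastic sub-gradient descent on the regularised hinge loss, labels in
{-1, +1}, and standardised (zero-centred) inputs. None of these choices are stated
in the text above, and the toy two-dimensional points, the hyperparameters lam,
eta and epochs, and the helper names are hypothetical.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, eta=0.01, epochs=200, seed=0):
    """Minimise (lam/2)*||w||^2 + hinge loss with stochastic sub-gradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink w (gradient of the L2 penalty); the bias is left unregularised.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # The sample violates the margin: move the hyperplane towards it.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def make_classifier(w, b):
    """Return C: a function from a p-dimensional feature vector to a label."""
    def classify(x):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        return 1 if score >= 0.0 else -1
    return classify


# Hypothetical, already-centred toy samples in a 2-dimensional search space.
train_x = [
    (-2.0, -2.0), (-1.5, -2.5), (-2.5, -1.5), (-1.0, -2.0),  # label -1
    (2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (1.0, 2.0),          # label +1
]
train_y = [-1, -1, -1, -1, 1, 1, 1, 1]

w, b = train_linear_svm(train_x, train_y)
C = make_classifier(w, b)
predictions = [C(x) for x in train_x]
print("weights:", w, "bias:", b)
print("training predictions:", predictions)
```

The closure returned by make_classifier plays the role of C: once training has
fixed w and b, classification is a single evaluation of the decision function
followed by a sign test.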

The objective here is to create C using SVM. During the training phase of the
classifier, samples of the form $\{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\} \in S^{8}$ and associated
