2016 International Conference on Circuit, Power and Computing Technologies [ICCPCT]

Human Heart Disease Prediction System using Data Mining Techniques
Theresa Princy. R
Research Scholar
Department of Information Technology
Christ University Faculty of Engineering,
Bangalore, India-560060.
princy.aida@gmail.com

J. Thomas
Department of Computer Science and Engineering
Christ University Faculty of Engineering,
Bangalore, India-560060

978-1-5090-1277-0/16/$31.00 © 2016 IEEE

Abstract: Health problems are increasing day by day owing to
lifestyle and hereditary factors. Heart disease in particular
has become more common, putting people's lives at risk. Each
individual has different values for blood pressure, cholesterol
and pulse rate, but according to medically proven results the
normal value of blood pressure is 120/90, cholesterol is
[value not given], and pulse rate is 72. This paper surveys
different classification techniques used for predicting the
risk level of each person based on age, gender, blood pressure,
cholesterol and pulse rate. The patient risk level is classified
using data mining classification techniques such as Naive
Bayes, KNN, Decision Tree and Neural Network. The accuracy of
the predicted risk level is higher when a larger number of
attributes is used.
Keywords: Classification techniques, Decision Tree algorithm,
heart disease, KNN, Naive Bayes, Neural Network, risk level.
NOMENCLATURE
KNN K-nearest neighbour algorithm.
ID3 Iterative Dichotomiser 3.
CART Classification and Regression Tree.
CHAID Chi-square Automatic Interaction Detector.
J48 Java Implementation of the C4.5 Algorithm.
ANN Artificial Neural Network.
I. INTRODUCTION
Heart disease is the leading cause of death nowadays. Blood
pressure, cholesterol and pulse rate are the major factors
behind heart disease. Some non-modifiable factors also exist,
and habits such as smoking and drinking are further causes of
heart disease. The heart acts as the operating system of the
human body: if it does not function properly, other parts of
the body are affected as well. Risk factors for heart disease
include family history, high blood pressure, cholesterol, age,
poor diet and smoking. When blood vessels are overstretched,
the risk to the blood vessels increases, which leads to high
blood pressure. Blood pressure is typically measured in terms
of systolic and diastolic values. Systolic pressure is the
pressure in the arteries when the heart muscle contracts, and
diastolic pressure is the pressure in the arteries when the
heart muscle is at rest. An increased level of lipids (fats) in
the blood also causes heart disease: the lipids deposit in the
arteries, so the arteries narrow and blood flow slows. Age is a
non-modifiable risk factor for heart disease. Smoking accounts
for 40% of deaths from heart disease, because it limits the
oxygen level in the blood and damages and tightens the blood
vessels.
Various data mining techniques such as Naive Bayes, KNN,
Decision Tree and Neural Network are used to predict the risk
of heart disease [1]. The KNN algorithm uses a user-defined
value K to find the values of the factors of heart disease. The
decision tree algorithm provides a classified report for heart
disease. The Naive Bayes method predicts heart disease through
probability. The neural network minimizes the prediction error.
In all of these techniques, patient records are classified and
predictions are made continuously. Patient activity is
monitored continuously; if any change occurs, the risk level of
the disease is reported to the patient and the doctor. With the
help of machine learning algorithms and computer technology,
doctors are able to predict heart disease at an earlier stage.
This paper provides an insight into the KNN data mining
technique used to predict heart disease.
II. LITERATURE SURVEY
Various studies have focused on the prediction of heart
disease. Different data mining techniques have been used for
diagnosis, and different methods have achieved different levels
of accuracy [7].
The Naive Bayes classifier assumes conditional independence:
the value of an attribute for a given class is assumed to be
independent of the values of the other attributes. S. Indhumathi
et al. [4] proposed web-based health care detection, predicting
high-risk heart disease using the Naive Bayes algorithm. The
preprocessed data are taken as the training set. Two phases,
classification and prediction, are discussed in that work.
Preprocessing is done in the classification phase and includes
data cleaning, normalization and data reduction. In the
prediction phase the disease types are classified and
predicted, i.e. a training set is formed based on the disease
type and the test set is formed based on the questions. The
predicted results are sent to the doctor.
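The conditional-independence assumption can be illustrated with a small categorical Naive Bayes sketch. This is not the implementation from the cited work: the function names, the two binary attributes, the toy records and the use of Laplace smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(model, row, n_values=2):
    """Return the class maximizing P(class) * prod_i P(value_i | class)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Laplace smoothing (an assumption) avoids zero probabilities
            # for attribute values never seen with this class.
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (high_bp, smoker) -> risk label.
rows = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0)]
labels = ["high", "high", "low", "low", "high", "low"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, (1, 0))
```

Each attribute contributes its own factor to the product, which is exactly where the independence assumption enters: no term depends on more than one attribute at a time.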
An ANN, often just called a "neural network", is a mathematical
or computational model that emulates a biological neural
system. A prediction method for heart disease using a neural
network was proposed by Chaitrali S. Dangare et al. [3]. The
network has three main layers: the input layer, the hidden
layer and the output layer. The input is given to the input
layer and the result is obtained at the output layer. The
actual output is then compared with the expected output. Back
propagation is applied to find the error and to adjust the
weights between the output layer and the preceding hidden
layers. Once back propagation is complete, the forward pass is
run again, and the process continues until the error is
minimized.
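The forward pass and back propagation cycle described above can be sketched for a network with one hidden layer. This is a minimal illustration, not the network from the cited work: the layer sizes, learning rate, epoch count, sigmoid activation and the normalized toy inputs are all assumed values.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass: input layer -> hidden layer -> single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2
               for x, y in data) / len(data)

def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    initial_error = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output unit (proportional to the gradient
            # of the squared error through the sigmoid).
            delta_out = (output - y) * output * (1 - output)
            # Propagate the error back to each hidden unit.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Adjust the weights, then repeat the forward pass next round.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
    return w_hidden, w_out, initial_error

# Hypothetical normalized inputs: (bias, blood pressure, cholesterol) -> risk.
data = [((1.0, 0.2, 0.1), 0.0), ((1.0, 0.3, 0.2), 0.0),
        ((1.0, 0.8, 0.9), 1.0), ((1.0, 0.9, 0.7), 1.0)]
w_hidden, w_out, initial_error = train(data)
final_error = mean_squared_error(data, w_hidden, w_out)
```

Comparing `final_error` with `initial_error` shows the effect of repeating the forward and backward passes: the error after training is lower than the error of the randomly initialized network.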
KNN is a non-parametric method used for classification and
regression. Compared with other machine learning algorithms,
KNN is among the simplest. The algorithm considers the K
closest training examples in the feature space, where K is a
user-defined constant. A test sample is classified by assigning
the label that is most frequent among the K training samples
nearest to it. The literature shows that KNN has strong
consistency results.
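The majority vote among the K nearest training samples can be sketched as follows. The function name, the choice of Euclidean distance and the sample measurements are assumptions for illustration; the text does not specify a distance measure.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` with the most frequent label among its k nearest points."""
    # Sort training samples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical samples: (systolic pressure, pulse rate) -> risk label.
train = [((118, 70), "low"), ((122, 74), "low"), ((125, 72), "low"),
         ((150, 90), "high"), ((160, 95), "high"), ((155, 88), "high")]
label = knn_classify(train, (152, 91), k=3)
```

An odd K is a common choice for two-class problems because it rules out tied votes.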
A decision tree builds classification models in the form of a
tree structure. It breaks the data set into smaller and smaller
subsets while an associated decision tree is incrementally
developed. The decision tree uses a top-down approach: the root
of the tree represents the whole data set and each leaf
represents a subset of it. Risk-level prediction for heart
disease through a hybrid algorithm was proposed by Shovon K.
Pramanik et al. [9]. The hybrid algorithm is a combination of
the KNN algorithm and ID3, and both are used for heart disease
prediction. The KNN algorithm is used to preprocess the data.
The preprocessed data are taken as the training set and are
then classified into a tree structure. The ID3 algorithm is
applied as the classifier to predict heart disease. Incorrect
values are classified through the KNN algorithm.
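ID3 grows the tree top-down by choosing, at each node, the attribute with the highest information gain. The sketch below shows only that selection step, not a full tree builder or the hybrid method of the cited work; the attribute names and records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on `attribute`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """ID3 picks the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical categorical records.
rows = [{"bp": "high", "smoker": "yes"}, {"bp": "high", "smoker": "no"},
        {"bp": "normal", "smoker": "yes"}, {"bp": "normal", "smoker": "no"}]
labels = ["high", "high", "low", "low"]
root = best_attribute(rows, labels, ["bp", "smoker"])
```

In this toy data, splitting on `bp` separates the two classes completely, so it is selected as the root; splitting on `smoker` leaves both subsets evenly mixed and gains nothing.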

III. DATA DESCRIPTION
The data set consists of three types of attributes: input, key
and prediction attributes. Commonly used attributes such as
age, gender, blood pressure, pulse rate and cholesterol are
considered as input attributes, of which age and gender are
non-modifiable. Age is continuous and dynamic in nature,
whereas gender is static and constant. The other parameters
take continuous, random values. To obtain more appropriate
results, additional attributes such as smoking and history of
heart disease were also included in the study. Smoking and
heart disease were the modifiable attributes. Constant values
were assigned to smoking and heart disease to predict the risk
rate of heart disease. The patient id is the key attribute,
which is unique for every user. Using this key attribute, the
patient and the doctor retrieve records. The application takes
care of user authentication. The prediction attribute gives the
risk level of the disease. The risk level was classified into
three levels, namely low risk, high risk and normal risk, which
indicate less than 50%, greater than 50% and 0 respectively [9].

IV. WORK FLOW DESIGN
Heart disease is one of the most distressing conditions that
can affect an individual. Many prevention methods are
available, but sometimes the disease cannot be avoided. In such
situations it is important to identify the risk in time. The
proposed system first checks all input attributes and
classifies them using the KNN algorithm. The classes are
compared with the standard values. The risk rate of heart
disease is then found with the help of the ID3 algorithm.
After the risk level is found, it is compared with the
patient's previous history, and the report is sent to the
doctor and the patient. The normal measurements of factors such
as blood pressure and pulse rate are given to the patient and
the doctor through the alert message [9].

Figure 1: Flowchart of the risk level prediction system
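Two steps of this workflow, comparing measurements with standard values and mapping a risk percentage to the three risk levels of the data description, can be sketched as below. The record layout, field names and the handling of exactly 50% are assumptions; the reference values are those quoted in the abstract, and the cholesterol reference is omitted because the paper does not state it.

```python
from dataclasses import dataclass

# Reference values quoted in the abstract (blood pressure 120/90, pulse 72).
STANDARD = {"systolic": 120, "diastolic": 90, "pulse_rate": 72}

@dataclass
class PatientRecord:
    patient_id: str      # key attribute, unique per patient
    age: int
    gender: str
    systolic: int
    diastolic: int
    pulse_rate: int
    smoker: int          # 0 = non-smoker, 1 = smoker
    heart_history: int   # 0 = no history of heart disease, 1 = history

def risk_level(risk_percent):
    """Map a risk percentage to the three levels named in the text."""
    if risk_percent == 0:
        return "normal"
    # Assumption: exactly 50% is treated as low risk; the text leaves it open.
    return "high" if risk_percent > 50 else "low"

def exceeded_factors(record):
    """List the measurements that exceed their reference value."""
    return [name for name, limit in STANDARD.items()
            if getattr(record, name) > limit]

# Hypothetical patient used only to exercise the functions.
record = PatientRecord("P001", 54, "M", 150, 95, 70, 1, 0)
alerts = exceeded_factors(record)
```

In a full system the list returned by `exceeded_factors` would feed the alert message sent to the patient and the doctor; here it is simply returned to the caller.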
V. TECHNIQUES USED FOR PREDICTION
A prediction method using the KNN and ID3 algorithms is
proposed in this study. It consists of two modules: a
classifier module followed by a prediction module. In the
classifier module, the data are trained and classified through
the KNN algorithm. All input parameters are observed, and the
data are classified by the attribute age using the KNN
algorithm. The classified data are then provided to the test
data. The KNN algorithm assigns a unique K value to every
group; if an age falls near a group, the record belongs to that
group. Otherwise, the algorithm keeps checking until the record
reaches its respective group.
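One possible reading of this age-grouping step is a nearest-representative assignment, which is equivalent to 1-nearest-neighbour over a set of group centres. The sketch below follows that reading; the group ids and representative ages are hypothetical, since the paper does not list them.

```python
def assign_age_group(age, group_centres):
    """Return the id of the group whose representative age is closest."""
    return min(group_centres, key=lambda gid: abs(group_centres[gid] - age))

def group_records(ages, group_centres):
    """Bucket every age into its nearest group."""
    groups = {gid: [] for gid in group_centres}
    for age in ages:
        groups[assign_age_group(age, group_centres)].append(age)
    return groups

# Hypothetical group ids and representative ages.
group_centres = {1: 25, 2: 45, 3: 65}
groups = group_records([22, 31, 44, 52, 58, 70], group_centres)
```

Each group produced this way can then be handled as one class when per-class standard values are checked.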

In the prediction module, the data are tested and predicted
through the ID3 algorithm. All classes are observed and each
class is verified to find the risk level of heart disease. For
each class there are standard values for blood pressure,
cholesterol and pulse rate. If the test data exceed the values
for that class, the risk level is reported to the patient and
the doctor. The ID3 algorithm provides the tree structure for
the classified data. In the tree, each sub-node represents the
training data of one class. Using this sub-node structure, the
classes of the test data are verified and the risk rate of the
patient is calculated.
VI. RESULTS AND DISCUSSION
Figure 2, shown below, plots the minimum and maximum systolic
ranges and the modeled risk levels of each class. The age range
was grouped into classes by the K-nearest neighbour algorithm.
The risk level of each class was identified with the help of
the ID3 algorithm. In this model, the maximum, minimum and
average risk values of each class are plotted. These risk
values were obtained by comparing the test data with the model
generated for the systolic input factor.

Figure 2: Various risk levels of systolic for input attributes.

Figure 3, shown below, plots the minimum and maximum diastolic
ranges and the modeled risk levels of each class. The age range
was grouped into classes by the K-nearest neighbour algorithm.
The risk level of each class was identified with the help of
the ID3 algorithm. In this model, the maximum, minimum and
average risk values of each class are plotted. These risk
values were obtained by comparing the test data with the model
generated for the diastolic input factor.

Figure 3: Various risk levels of diastolic for input attributes.


Figure 4, shown below, plots the minimum and maximum pulse rate
ranges and the modeled risk levels of each class. The age range
was grouped into classes by the K-nearest neighbour algorithm.
The risk level of each class was identified with the help of
the ID3 algorithm. In this model, the maximum, minimum and
average risk values of each class are plotted. These risk
values were obtained by comparing the test data with the model
generated for the pulse rate input factor.

Figure 4: Various risk levels of pulse rate for input attributes.

Figure 5 shows the accuracy levels for different numbers of
attributes. The accuracy of the prediction increases when
additional attributes such as previous heart disease and
smoking are added. Smoking and previous heart disease were
analyzed together with the other basic attributes to predict
the patient's risk of heart disease. Using the basic attributes
(blood pressure, cholesterol, pulse rate, age and gender), the
accuracy of the prediction was 40.3%; when the two additional
attributes, smoking and previous heart disease, were added, the
accuracy increased to 80.6%.

Figure 5: Accuracy levels of different numbers of attributes.

Both smoking and history of heart disease took static values:
0 indicates not smoking or no history of heart disease, and 1
indicates smoking or a history of heart disease. Adding
prominent attributes increases the accuracy of the prediction
system.

VII. CONCLUSION
The main motivation of this paper is to provide an insight
into detecting the risk rate of heart disease using data mining
techniques. Various data mining techniques and classifiers for
efficient and effective heart disease diagnosis are discussed
in many studies. The analysis shows that authors use various
technologies and different numbers of attributes in their
studies. Hence, different technologies give different precision
depending on the number of attributes considered. Using the KNN
and ID3 algorithms, the risk rate of heart disease was
detected, and the accuracy level was reported for different
numbers of attributes. In future, the number of attributes
could be reduced and the accuracy increased by using other
algorithms.

REFERENCES
[1] K. Sudhakar, M. Manimekalai, "Study of Heart Disease
Prediction using Data Mining," International Journal of
Advanced Research in Computer Science and Software
Engineering, Volume 4, Issue 1, pp. 1157-60, January 2014.
[2] S. U. Amin, K. Agarwal, and R. Beg, "Genetic Neural Network
Based Data Mining in Prediction of Heart Disease Using Risk
Factors," IEEE Conference on Information and Communication
Technologies (ICT 2013), 2013.
[3] Chaitrali S. Dangare, Sulabha S. Apte, "A Data mining
approach for prediction of heart disease using neural
networks," International Journal of Computer Engineering &
Technology (IJCET), Volume 3, Issue 3, pp. 30-40,
October-December 2012.
[4] S. Indhumathi, G. Vijaybaskar, "Web based health care
detection using naive Bayes algorithm," International Journal
of Advanced Research in Computer Engineering & Technology
(IJARCET), Volume 4, Issue 9, pp. 3532-36, September 2015.
[5] G. Purusothaman, P. Krishnakumari, "A Survey of Data
Mining Techniques on Risk Prediction: Heart Disease," Indian
Journal of Science and Technology, Vol. 8(12), June 2015.
[6] R. Chitra, V. Seenivasagam, "Review of heart disease
prediction system using data mining and hybrid intelligent
techniques," ICTACT Journal on Soft Computing, Volume 03,
Issue 04, pp. 605-09, July 2013.

[7] Chaitrali S. Dangare, Sulabha S. Apte, "Improved Study of
Heart Disease Prediction System using Data Mining
Classification Techniques," International Journal of Computer
Applications (0975 888), Volume 47, No. 10, pp. 44-48, June
2012.
[8] Beant Kaur, Williamjeet Singh, "Review on Heart Disease
Prediction System using Data Mining Techniques," International
Journal on Recent and Innovation Trends in Computing and
Communication, Volume 2, Issue 10, pp. 3003-08, October 2014.

[9] Shovon K. Pramanik, Subrata Pramanik, Bimal K. Pramanik, M.
K. Islam Molla and Md. Ekramul Hamid, "Hybrid Classification
Algorithm for Knowledge Acquisition of Biomedical Data,"
International Journal of Advanced Science and Technology, Vol.
44, July 2012.
[10] C. Ordonez, "Association rule discovery with the train and
test approach for heart disease prediction," IEEE Transactions
on Information Technology in Biomedicine: a publication of the
IEEE Engineering in Medicine and Biology Society, vol. 10, no.
2, pp. 334-43, Apr. 2006.
[11] Jyothi Thomas, G. Kulanthaivel, "Preterm Birth Prediction
Using Cuckoo Search Based Fuzzy Min-Max Neural Network,"
International Review on Computer and Software (IRECOS),
Vol. 8, N. 8, pp. 1854-62, August 2013.
