
DENGUE FEVER PREDICTION USING DATA MINING TECHNIQUE

Abstract: Data mining can extract useful knowledge that is hidden in huge volumes of data. Health care is a promising area in which to apply data mining and take advantage of it. Dengue fever is a disease caused by a family of viruses transmitted by mosquitoes. Detecting dengue fever at its onset, using the most important medical symptoms and laboratory data, helps to predict the disease in its early stages. This process consists of three important steps: a manual missing value imputation method is applied to make the data consistent; the most significant attributes for dengue fever are selected; and classification techniques are applied to predict dengue fever and compare their performance. All datasets were collected from the UCI repository, and several classification techniques were applied to them: Decision tree, Random tree, J48 and SMO. WEKA is used as the data mining tool for classifying the data. In conclusion, we propose a new technique to predict dengue fever in its early stages by applying data mining techniques to medical databases.

Keywords: Dengue fever (DF), Decision tree, Support Vector Machine (SVM), Waikato Environment for Knowledge Analysis (WEKA).

I. INTRODUCTION
Dengue fever, also known as break-bone fever, is a disease caused by the dengue virus and transmitted by the Aedes mosquito. Dengue infection is divided into four grades: DHF I, DHF II, DHF III and DHF IV. It can cause life-threatening dengue hemorrhagic fever, whose symptoms include bleeding, low levels of blood platelets, low blood pressure, a metallic taste in the mouth, headache, muscle and joint pain, and rashes [1]. Dengue fever occurs in cycles, and each cycle is present inside the body of an infected person for two weeks or less. It causes abdominal pain, haemorrhage (bleeding), circulatory collapse and dengue hemorrhagic fever. The following is the cycle mechanism by which dengue is transmitted: an Aedes mosquito carries a virus
in its saliva; when it bites a healthy person, the virus enters the person's body and mixes with the person's body fluids. As soon as the single-stranded RNA dengue virus mixes with the white blood cells, it starts reproducing inside them and thus initiates the dengue virus cycle. In severe infection, the virus cycle is prolonged and affects the liver and bone marrow, reducing circulation in the blood vessels; the blood pressure then becomes so low that it cannot supply sufficient blood to all the organs of the body. The infection also impairs the bone marrow, which reduces the number of platelets needed for effective blood clotting and so increases the risk of bleeding [2].

Weka is mainly used in research for solving data mining problems. Weka stands for Waikato Environment for Knowledge Analysis; it was developed at the University of Waikato in New Zealand and implemented in 1997. Weka is open-source software written in Java. It can be used at many different levels and contains modules for data classification and for measuring the accuracy of disease prediction. Weka has been used in bioinformatics for the diagnosis and analysis of dengue disease datasets [3].

II. FEATURE SELECTION METHOD
Feature selection is one of the most important and frequently used data pre-processing techniques in data mining. It has been applied in many areas, namely face recognition, text categorization, finance, customer relationship management and cancer classification. Data often include many unnecessary features: redundant features provide no more information than the already selected features, and irrelevant features provide no useful information in any situation. Feature selection chooses a subset of relevant features from a large feature set in order to reduce the size of the feature space for a robust classification task. A feature selection algorithm combines a search method, which generates new feature subsets, with an evaluation measure, which scores the different feature subsets. The principle of parsimony states that one should aim for models with the smallest possible number of parameters that adequately represent the data. For example, Einstein is quoted as saying that "everything should be as simple as it can be, but not simpler."
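The combination described above, a search method that proposes feature subsets and an evaluation measure that scores them, can be illustrated with a minimal Python sketch. It assumes discrete (0/1) attributes, uses greedy forward selection as the search method and the mutual information between the selected attributes and the class as the evaluation measure; both choices, and the attribute names `fever`, `rash` and `platelets_low`, are hypothetical illustrations rather than the method used in this paper. In the spirit of the parsimony principle, the search stops as soon as no remaining feature raises the score.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def subset_score(rows, labels, subset):
    """Evaluation measure: mutual information I(S; C) = H(C) - H(C | S),
    where S is the joint value of the attributes in `subset`."""
    if not subset:
        return 0.0
    keys = [tuple(row[a] for a in subset) for row in rows]
    n = len(rows)
    conditional = 0.0
    for key, count in Counter(keys).items():
        group = [lab for k, lab in zip(keys, labels) if k == key]
        conditional += (count / n) * entropy(group)
    return entropy(labels) - conditional


def forward_select(rows, labels, attributes, min_gain=1e-9):
    """Search method: greedy forward selection. Repeatedly adds the attribute
    that raises the score most and stops when no attribute adds more than
    `min_gain`, so redundant and irrelevant attributes are left out."""
    selected, best = [], 0.0
    remaining = list(attributes)
    while remaining:
        scores = {a: subset_score(rows, labels, selected + [a]) for a in remaining}
        candidate = max(remaining, key=scores.get)
        if scores[candidate] - best <= min_gain:
            break
        selected.append(candidate)
        best = scores[candidate]
        remaining.remove(candidate)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: the class is positive only when both
    # `fever` and `platelets_low` are 1; `rash` carries no information.
    rows = [{"fever": f, "rash": r, "platelets_low": p}
            for f in (0, 1) for p in (0, 1) for r in (0, 1)]
    labels = [row["fever"] & row["platelets_low"] for row in rows]
    print(forward_select(rows, labels, ["fever", "rash", "platelets_low"]))
```

On the toy data, the search keeps `fever` and `platelets_low` and stops without adding `rash`, which carries no information about the class. On real data the `min_gain` threshold (or a penalty on subset size) matters, because joint mutual information estimated from a small sample tends to grow as extra attributes fragment the records.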
III. DECISION TREE
The decision tree is known as one of the most powerful and widespread tools for prediction and classification. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label. In a study applying this method, the authors note that dengue infection is typically found in hot and humid regions. Doctors need to understand the features of dengue infection in order to categorize patients correctly, since these patients require different treatments. The datasets consist of more than 400 attributes. The authors used the decision tree as their data mining tool and proposed a set of meaningful attributes from the temporal data. To make sure that the test data were truly unseen, each model was tested on a different dataset. The third experiment yielded useful knowledge when two datasets were integrated. Another objective of that research is to detect the day of defervescence of fever, called day0. The day0 date is the critical date for dengue patients, when some patients face a fatal condition. Therefore, physicians need to predict day0 in order to treat the patients, and the authors aim for an intelligent system that can identify the day0 date of each patient.

They set up four experiments, all using the decision tree approach. In the first three experiments, they extract knowledge to classify the type of dengue infection. In the fourth experiment, they try to predict the day of defervescence from the data recorded before the day0 date. They use sensitivity, specificity and accuracy as performance measures, and the approximate accuracy of all four experiments using decision trees is around 96.5%.

IV. FEATURE REDUCTION BY SVM
Feature reduction maps a multi-dimensional space into a lower-dimensional space. Feature extraction includes feature construction, space dimensionality reduction, sparse representations and feature selection; all these techniques are commonly used as pre-processing for machine learning and statistical prediction tasks, including pattern recognition. Although researchers have tackled such issues for many years, there has recently been renewed interest in feature extraction. A feature space with compact features contributes to
classification by cutting down the pre-processing overheads and minimizing the effects of the "peaking phenomenon" in classification, thereby improving the overall performance of classifier-based intrusion detection systems. Classification techniques are effective tools for classifying cancer data and measuring the accuracy, and one effective classification tool is the Support Vector Machine (SVM). In the proposed method, cancer datasets are processed more effectively using feature selection and classification: the feature selection method is based on Ant Colony Optimization (ACO), and the classification method is the Support Vector Machine (SVM). The ACO algorithm is implemented using the Java NetBeans IDE 8.0.1, and the SVM classification is done using Weka 3.6.

A. Existing System
1) For missing values, all the existing methods explained above use automated data mining techniques for missing value imputation. These techniques may fill in approximate or wrong values in many cases, which affects the final results.
2) For feature selection, the existing methods rely on selection algorithms. These techniques may also choose less important attributes, which increases the processing time.

B. Proposed System
We propose a new expert system for predicting dengue fever. Our methodology consists of three major steps.
1) A manual missing value imputation method is used. This reduces false value entries, so our results should improve marginally.
2) To select the most influential attributes for predicting dengue fever, we took expert doctors' opinions and conducted an internet survey. This avoids collecting unnecessary attributes during data collection and helps in the accurate prediction of dengue fever.
3) After pre-processing the data, we use a decision tree to predict dengue fever. This will be implemented using Java NetBeans, and the accuracy will be analysed using SVM classification. We expect this method to give accurate results, as explained in the implementation section.
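The three steps of the proposed system can be sketched end to end as follows. This is a minimal Python illustration under stated assumptions, not the authors' Java NetBeans implementation: attributes are coded 0/1, the label is 1 for dengue and 0 for non-dengue, manually entered values are passed in as a dictionary keyed by (record index, attribute), and the expert-chosen attribute list and all record values are hypothetical. The tree is a plain ID3-style tree grown by information gain (Weka's J48, a C4.5 implementation, additionally uses gain ratio and pruning, which are omitted here), and evaluation reports the sensitivity, specificity and accuracy measures used in the decision tree study described earlier.

```python
import math
from collections import Counter


def impute_manually(rows, manual_values):
    """Step 1: replace every missing value (None) with a value entered by a
    person, keyed by (record index, attribute). A missing value without a
    manual entry raises KeyError instead of being guessed automatically."""
    filled = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for attr, value in row.items():
            if value is None:
                new_row[attr] = manual_values[(i, attr)]
        filled.append(new_row)
    return filled


def select_attributes(rows, attributes):
    """Step 2: keep only the attributes chosen by expert opinion
    (a hypothetical list supplied by the caller)."""
    return [{a: row[a] for a in attributes} for row in rows]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Reduction in class entropy obtained by splitting on `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Step 3: grow a decision tree. An internal node is a tuple
    (attribute, {value: subtree}, majority_label); a leaf is a class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if info_gain(rows, labels, best) <= 1e-12:
        return majority  # no attribute separates the classes any further
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches, majority)


def classify(tree, row):
    """Follow the test at each internal node until a leaf is reached."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority  # value never seen during training
        tree = branches[row[attr]]
    return tree


def evaluate(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) with 1 = dengue."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, (tp + tn) / len(pairs)


if __name__ == "__main__":
    # Hypothetical records; record 5 has a missing platelet value.
    raw = [{"fever": f, "platelets_low": p, "rash": r}
           for f in (0, 1) for p in (0, 1) for r in (0, 1)]
    labels = [row["fever"] & row["platelets_low"] for row in raw]
    raw[5]["platelets_low"] = None
    rows = impute_manually(raw, {(5, "platelets_low"): 0})
    rows = select_attributes(rows, ["fever", "platelets_low"])
    tree = build_tree(rows, labels, ["fever", "platelets_low"])
    print(evaluate(labels, [classify(tree, r) for r in rows]))
```

Step 1 deliberately raises `KeyError` when a missing value has no manual entry, which reflects the contrast drawn with automated imputation in the existing system. Evaluating on the training records, as in the `__main__` block, only checks the mechanics; a held-out set or cross-validation is needed for a meaningful accuracy estimate.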
V. PREDICTION OF AN ARBOVIRUS – DENGUE DISEASE
Dengue infection is an epidemic disease typically found in tropical regions. Its symptoms develop rapidly and violently in a patient within a short time. The World Health Organization (WHO) classifies dengue infection as Dengue Fever (DF) and Dengue Haemorrhagic Fever (DHF), and the symptoms of DHF are divided into four types. A problem arises when an expert misdiagnoses dengue infection. For example, an expert may diagnose a patient as non-dengue or DF even though the patient has DHF, which may cause death if the patient does not receive treatment. Therefore, we selected a data mining approach to solve this problem.

Different classification techniques are used to classify the dataset properly. These techniques include REP Tree, Random tree, SVM, the decision tree approach and spatial data analysis.

In this survey study, [1], which presents a comparative analysis of machine learning techniques for the classification of arboviruses, states that Support Vector Machines (SVM) achieve greater diagnostic accuracy with the increasing conformation. As shown in its experimental results, the support vector machine has the highest classification precision most of the time. However, the support vector machine is very time consuming because it has more parameters and demands more computation time. The following chart shows the results of this comparative analysis.

[Figure: bars for the SVM method and the Decision tree on Sensitivity, Specificity, Accuracy and Risk Rate; y-axis from 0 to 1.2]
Fig. 1: Performance of SVM and Decision Tree Algorithms

[2] presents Data Mining of Dengue Infection Using Decision Tree, in which each dataset consists of more than 400 attributes. To accomplish the knowledge discovery task, the authors employ a decision tree as the data mining tool. They
propose a set of meaningful attributes from the temporal data. Their experiments are divided into four parts. The first two experiments yield useful knowledge for classifying dengue infection from two different datasets, respectively. Another objective of that research is to detect the day of defervescence of fever, called day0. In the end, the authors obtained very low accuracy on day-4 because the tree was overfitted. The experimental results show that the decision tree approach did not suit this task, so the authors suggest selecting a new classification approach in future work.

[3] uses classification techniques to determine, geographically, the cases infected by dengue fever in Jhelum district and the surrounding areas, so that the performance of different classification techniques can be compared. The objective of that study also includes comparing different classification algorithms on its dataset with the help of graphs. All the techniques were implemented using the Weka tool, which covers the whole implementation procedure. After analysing the dataset with each technique, the authors compare the techniques in their conclusion and find that the SVM technique performs best among all the others.

VI. CONCLUSION
Since the infection rate of Aedes aegypti mosquitoes increases the morbidity rate, the decision tree is generated with the Aedes aegypti rate as the root node in order to help prevent further occurrences. The prediction of dengue infection is carried out using the Weka data mining tool and data mining techniques such as decision trees and Support Vector Machines. The model thus helps to predict dengue cases earlier and reduce the mortality rate.

Firstly, there exists a wide class of algorithms and techniques for information extraction and knowledge discovery in medical science. The best results are achieved by balancing experts' knowledge, used to describe the problem and goals, with search capabilities. Hospitals also want to minimize the cost of clinical tests, which can be achieved by employing an appropriate computer-based information and decision support system. Here, data mining plays an important role in producing results faster and more accurately by using various algorithms. By analysing the different techniques in the research mentioned above, we can say that the more
accurate method for medical science is SVM; SVM classification methods are more useful in medical science and disease prediction. However, different methods can be combined to obtain accurate knowledge discovery. Previous studies provide comparative analysis only.

REFERENCES
1. Ouardighi A, Aboutajdine D. A powerful feature selection approach based on mutual information. International Journal of Computer Science and Network Security, 2008, 8(4):116-121.
2. Chen B, Chen L, Chen Y. Efficient ant colony optimization for image feature selection. Signal Processing, 2013, 93(6):1566-1576.
3. Meiri R, Zahavi J. Using simulated annealing to optimize the feature selection problem in marketing applications. European Journal of Operational Research, 2006, 171(3):842-858.
[1] iDengue. Malaysian dengue information. Kuala Lumpur: Malaysian Remote Sensing Agency, Ministry of Science, Technology and Innovation and Ministry of Health Malaysia, 2017.
[2] Song Q O. Dengue vector control in Malaysia: A review for current and alternative strategies. Sains Malaysiana, 2016, 45(5):777-785.
[3] World Health Organization (WHO). Global strategy for dengue prevention and control 2012-2020. Geneva: WHO, 2012.
[4] Webster D P, Farrar J, Rowland J S. Progress towards a dengue vaccine. The Lancet Infectious Diseases, 2009, 9(11):678-687.
[5] Paul B, Tham WL. Controlling dengue: Effectiveness of biological control and vaccine in reducing the prevalence of dengue infection in endemic areas. Health, 2016, 8(1):64-74.
[6] Tham AS. Issue and challenges in Aedes surveillance and control. In Workshop Proceeding Behavior Intervention in Dengue Control of Malaysia, 2000, pp. 15-23.
[7] World Health Organization (WHO). Dengue: Call for urgent interventions for a rapidly expanding emerging disease. Technical paper, Geneva: WHO, 2011.
[8] Mudin RN. Dengue incidence and the prevention and control program in Malaysia. International Medical Journal Malaysia, 2015, 14(1):5-10.
[9] Cheong YL, Burkart K, Leitão PJ, Lakes T. Assessing weather effects on dengue disease in Malaysia. International Journal of Environmental Research and Public Health, 2013, 10(12):6319-6334.
[10] Choi Y, Tang CS, McIver L, Hashizume M, Chan V, Abeyasinghe RR, Huy R. Effects of weather factors on dengue fever incidence and implications for interventions in Cambodia. BMC Public Health, 2016, 16(1):1-7.
