DENGUE FEVER PREDICTION USING DATA MINING TECHNIQUE
Abstract:
Data mining can extract useful knowledge hidden in large volumes of data. Health care is a promising area in which to apply data mining and benefit from it. Dengue fever is a disease caused by a family of viruses transmitted by mosquitoes. Detecting dengue fever at the outset from the most important medical symptoms and laboratory data helps to predict the disease in its early stages. The process consists of three important steps: a manual missing value imputation method is applied to make the data consistent; the most significant attributes for dengue fever are selected; and classification techniques are applied to the dengue fever data and their performance is calculated and compared. All the datasets were collected from the UCI repository, and different classification techniques were applied to them: Decision tree, Random tree, J48 and SMO. WEKA is used as the data mining tool for classifying the data. In conclusion, we propose a new technique to predict dengue fever in the early stages by applying data mining techniques to medical databases.

Keywords: Dengue fever (DF), Decision tree, Support Vector Machine (SVM), Waikato Environment for Knowledge Analysis (WEKA).

I. INTRODUCTION:
Dengue fever is a disease caused by the dengue virus and transmitted by the Aedes mosquito; it is also known as breakbone fever. Dengue infection is divided into four grades: DHF I, DHF II, DHF III and DHF IV. It can cause life-threatening dengue hemorrhagic fever, whose symptoms include bleeding, low blood platelet levels, low blood pressure, a metallic taste in the mouth, headache, muscle and joint pain, and rashes [1]. Dengue fever occurs in cycles, and a cycle persists inside the body of an infected person for two weeks or less. It causes abdominal pain, haemorrhage (bleeding), circulatory collapse and dengue hemorrhagic fever.

The cycle by which dengue is transmitted is as follows. The Aedes mosquito carries the virus in its saliva; when it bites a healthy person, the virus enters the person’s body and mixes with the person’s body fluids. As soon as the single-stranded RNA dengue virus comes into contact with white blood cells, it starts reproducing inside them and thus initiates the dengue virus cycle. In severe infections the virus cycle is prolonged and affects the liver and bone marrow, reducing blood circulation in the blood vessels; blood pressure becomes so low that sufficient blood cannot be supplied to all the organs of the body. The infection also impairs bone marrow function, which reduces the number of platelets needed for effective blood clotting and increases the risk of bleeding [2].

The Weka tool is widely used in research for solving data mining problems. Weka stands for Waikato Environment for Knowledge Analysis; it was developed at the University of Waikato in New Zealand and was implemented in 1997. Weka is open source software written in the Java language. Weka can be used at many different levels. It contains modules for data classification and for measuring accuracy in disease prediction. Weka has been used in bioinformatics for the diagnosis and analysis of dengue disease datasets [3].

II. FEATURE SELECTION METHOD
Feature selection is one of the most important and frequently used techniques in data mining and is applied during data pre-processing. It has been applied in many areas, including face recognition, text categorization, finance, customer relationship management and cancer classification. Data often include many unnecessary features. Redundant features provide no more information than the features already selected, and irrelevant features provide no useful information in any situation. Feature selection is the process of choosing a subset of relevant features from a large set of features in order to reduce the dimensionality of the feature space for a robust classification task. A feature selection algorithm combines a search method, which generates new feature subsets, with an evaluation measure, which scores the different feature subsets. The principle of parsimony states that one should aim for the models with the smallest possible number of parameters that adequately represent the data. For example, Einstein is quoted as saying that “everything should be as simple as it can be, but not simpler.”

III. DECISION TREE
The decision tree is known as one of the most powerful and widespread tools for prediction and classification. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.

In the study reviewed here, the authors note that dengue infection is a disease typically found in hot and humid regions. Doctors need to understand the features of dengue infection in order to categorize patients correctly, since different patients require different treatment. The datasets consist of more than 400 attributes. The authors used the decision tree as a data mining tool and proposed a set of meaningful attributes from the temporal data. Their experiments are divided into four parts, all of which use decision trees. To ensure that the test data were truly unseen, each set of knowledge was tested on a different dataset. The results of the third experiment show the useful knowledge obtained when the two datasets were integrated.

Another objective of that research is to detect the day of defervescence of fever, which is called day0. Day0 is the critical date for dengue patients, on which some patients face a fatal condition. Physicians therefore need to predict day0 in order to treat the patients. The authors aim for an intelligent system that can flag the day0 date of each patient. They set up four experiments. In the first three, they extract knowledge in order to classify the type of dengue infection. In the fourth, they try to predict the day of defervescence from the data recorded before day0. They apply the decision tree approach to all experiments and use sensitivity, specificity and accuracy as performance measures. The approximate accuracy across all four experiments using the decision tree is around 96.5%.

IV. FEATURE REDUCTION BY SVM
Feature reduction maps a multi-dimensional space into a space of lower dimension. Feature extraction includes feature construction, space dimensionality reduction, sparse representations and feature selection; all of these techniques are commonly used as pre-processing for machine learning and statistical prediction tasks, including pattern recognition. Although researchers have tackled these issues for many years, there has recently been renewed interest in feature extraction. A feature space with compact features contributes to classification by cutting pre-processing overheads and minimizing the effects of the “peaking phenomenon” in classification, thereby improving the overall performance of classifier-based intrusion detection systems.

Classification techniques are effective tools for classifying cancer data and measuring accuracy. An effective classification tool is the Support Vector Machine (SVM). In the proposed method, cancer datasets are processed more effectively using feature selection and classification. The feature selection method is based on Ant Colony Optimization (ACO) and the classification method is the Support Vector Machine (SVM). The ACO algorithm is implemented using Java NetBeans IDE 8.0.1. The SVM classification is done using Weka 3.6.

A. Existing System
1) For missing values, the existing methods explained above all use automated data mining missing value imputation techniques. These techniques may fill in approximate or wrong values in many cases, which affects the final results.
2) For feature selection, the existing methods use algorithms. This technique may also choose less important attributes, which increases processing time.

B. Proposed System
We propose a new expert system for predicting dengue fever. Our methodology consists of three major steps.
1) A manual missing value imputation method is used. This reduces false value entries, so that our results improve marginally.
2) To select the most influential attributes for predicting dengue fever, we took the opinion of expert doctors and conducted an internet survey. This avoids collecting unnecessary attributes during data collection and helps in the accurate prediction of dengue fever.
3) After pre-processing the data, we use a decision tree to predict dengue fever. This will be implemented using Java NetBeans, and accuracy will be analysed using SVM classification. As expected, this method gave accurate results, as explained in the implementation section.

V. PREDICTION OF AN ARBOVIRUS – DENGUE DISEASE
Dengue infection is an epidemic disease typically found in tropical regions. Its symptoms develop rapidly and violently in a patient within a short time. The World Health Organization (WHO) classifies dengue infection as Dengue Fever (DF) and Dengue Haemorrhagic Fever (DHF). Symptoms of DHF are divided into 4 types. A problem can arise when an expert misdiagnoses dengue infection. For example, an expert may diagnose a patient as non-dengue or DF even though the patient has DHF. That could cause death if the patient does not receive treatment. Therefore, we selected a data mining approach to solve this problem.

Different classification techniques are used to classify the dataset properly. These techniques are REP Tree, Random tree, SVM, the Decision Tree Approach, and Spatial Data Analysis as well.

In this survey study, [1], which presents a comparative analysis of machine learning techniques for the classification of arbovirus, states that Support Vector Machines (SVM) have greater accuracy in diagnosis with the increasing conformation. As the experimental results show, the support vector machine has the highest classification precision most of the time. However, the support vector machine is very time consuming: because it has more parameters, it demands more computation time. The following chart shows the results of this comparative analysis.

[Figure: bar chart comparing the SVM method and the Decision tree on Sensitivity, Specificity, Accuracy and Risk Rate; vertical axis from 0 to 1.2]
Fig -1: Performance of SVM and Decision Tree Algorithm

[2] presents Data Mining of Dengue Infection Using Decision Tree, in which each dataset consists of more than 400 attributes. To accomplish the knowledge discovery task, the authors employ the decision tree as a data mining tool and propose a set of meaningful attributes from the temporal data. Their experiments are divided into 4 parts. The first two experimental results show useful knowledge for classifying dengue infection from 2 different datasets respectively. Another objective of that research is to detect the day of defervescence of fever, which is called day0. In the end, very low accuracy was obtained on day-4, because the tree was found to overfit. The experimental results show that the decision tree approach did not suit this task, so a new classification approach should be selected in future work.

[3] uses classification techniques to determine, geographically, the infected cases caused by dengue fever in Jhelum district and the surrounding areas, so that the performance of different classification techniques can be compared. The objectives of that study also include comparing different classification algorithms with the help of graphs, based on its dataset. All the techniques were implemented using the Weka tool, and the whole implementation procedure was carried out within it. After the dataset was analysed with each technique, the techniques were compared in the conclusion. From the comparison among all of them, it was concluded that the SVM technique is the best of all.

VI. CONCLUSION
The infection rate of Aedes aegypti mosquitoes increases the morbidity rate; hence the decision tree is generated with the Aegypti rate as the root node, to help prevent further occurrences. The prediction of dengue infection is carried out using the Weka data mining tool and data mining techniques such as the Decision tree and the Support Vector Machine. Thus the model helps to predict dengue cases earlier and reduce the mortality rate.

There exists a wide class of algorithms and techniques for information extraction and knowledge discovery in medical science. The best results are achieved by balancing the knowledge of experts in describing the problem and goals with search capabilities. Hospitals also want to minimize the cost of clinical tests. This can be achieved by employing an appropriate computer-based information and decision support system. Here, data mining plays an important role in delivering results faster and more accurately using various algorithms. From analysing the different techniques in the studies mentioned, we can say that SVM classification is the more accurate and more useful method for medical science and disease prediction. However, different methods can be combined to obtain accurate knowledge discovery. Previous studies are limited to comparative analysis only.

REFERENCES
1. Ouardighi A, Aboutajdine D. A powerful feature selection approach based on mutual information. International Journal of Computer Science and Network Security. 2008; 8(4):116–21.
2. Chen B, Chen L, Chen Y. Efficient ant colony optimization for image feature selection. Signal Process. 2013; 93(6):1566–76.
3. Meiri R, Zahavi J. Using simulated annealing to optimize the feature selection problem in marketing applications. European Journal of Operational Research. 2006; 171(3):842–58.

[1] iDengue. Malaysian dengue information. Kuala Lumpur: Malaysian Remote Sensing Agency, Ministry of Science, Technology and Innovation and Ministry of Health Malaysia, 2017
[2] Song Q O. Dengue vector control in Malaysia: A review for current and alternative strategies. Sains Malaysiana, 2016, 45(5):777-785
[3] World Health Organization (WHO). Global strategy for dengue prevention and control 2012-2020. Geneva: WHO, 2012
[4] Webster D P, Farrar J, Rowland J S. Progress towards a dengue vaccine. The Lancet Infectious Diseases, 2009, 9(11):678-687
[5] Paul B, Tham WL. Controlling dengue: Effectiveness of biological control and vaccine in reducing the prevalence of dengue infection in endemic areas. Health, 2016, 8(1):64-74
[6] Tham AS. Issue and challenges in Aedes surveillance and control. In Workshop Proceeding Behavior Intervention in Dengue Control of Malaysia, 2000, pp. 15-23
[7] World Health Organization (WHO). Dengue: Call for urgent interventions for a rapidly expanding emerging disease. Technical paper, Geneva: WHO, 2011
[8] Mudin RN. Dengue incidence and the prevention and control program in Malaysia. International Medical Journal Malaysia, 2015, 14(1):5-10
[9] Cheong YL, Burkart K, Leitão, PJ, Lakes T. Assessing weather effects on dengue disease in Malaysia. International Journal of Environmental Research and Public Health, 2013, 10(12):6319-6334
[10] Choi Y, Tang CS, McIver L, Hashizume M, Chan V, Abeyasinghe RR, Huy R. Effects of weather factors on dengue fever incidence and implications for interventions in Cambodia. BMC Public Health, 2016, 16(1):1-7
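
As a supplementary illustration of Sections III and V, the sketch below shows how a decision tree of the kind described there (internal nodes test an attribute, branches are test outcomes, leaves hold class labels) can classify records, and how sensitivity, specificity and accuracy are computed from the predictions. It is a minimal sketch under stated assumptions, not the method of any study cited here: the attribute names `platelet_count` and `fever_days`, the thresholds, the hand-built tree and the eight toy records are all hypothetical and carry no clinical meaning. The surveyed studies used Weka; plain Python is used here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class label held by a terminal node


@dataclass
class Node:
    attribute: str                 # attribute tested at this internal node
    test: Callable[[float], bool]  # the test outcome selects a branch
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], record: Dict[str, float]) -> str:
    """Follow test outcomes from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(record[tree.attribute]) else tree.if_false
    return tree.label


def performance(pairs: List[Tuple[str, str]], positive: str = "dengue") -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from (actual, predicted) pairs.

    Assumes the data contain at least one positive and one negative case.
    """
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly rejected
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical hand-built tree; thresholds are illustrative only.
tree = Node(
    "platelet_count", lambda v: v < 100,
    Leaf("dengue"),
    Node("fever_days", lambda v: v >= 3, Leaf("dengue"), Leaf("non-dengue")),
)

# Hypothetical toy records: (attributes, actual class).
records = [
    ({"platelet_count": 80, "fever_days": 2}, "dengue"),
    ({"platelet_count": 150, "fever_days": 4}, "dengue"),
    ({"platelet_count": 200, "fever_days": 1}, "dengue"),
    ({"platelet_count": 90, "fever_days": 1}, "dengue"),
    ({"platelet_count": 220, "fever_days": 1}, "non-dengue"),
    ({"platelet_count": 250, "fever_days": 0}, "non-dengue"),
    ({"platelet_count": 300, "fever_days": 2}, "non-dengue"),
    ({"platelet_count": 210, "fever_days": 2}, "non-dengue"),
]

metrics = performance([(actual, classify(tree, rec)) for rec, actual in records])
print(metrics)  # {'sensitivity': 0.75, 'specificity': 1.0, 'accuracy': 0.875}
```

On these toy records the tree misses one dengue case (a false negative), which is the kind of misdiagnosis Section V warns about and the reason sensitivity is reported alongside accuracy.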