By:
Asep Saefulloh Himawan Arisantoso Moedjiono Nazori AZ
Presented at the International Conference on Computer Science and Information Technology (CSIT-2013), June 2013
INTRODUCTION
We conducted this study to address these problems by applying data mining classification to the AO and SIS datasets.
PROBLEM
The research questions are:
1. Can the C4.5, Naive Bayes, and Neural Network algorithms be applied to predict timely graduation?
2. Which of these algorithms is best at predicting timely graduation?
3. Can the chosen algorithm present the data mining classification results as a prediction of timely graduation?
RESEARCH METHODS
Business/Research Understanding Phase: The data are secondary data taken from the DMQ database stored on a Higher Education Prog. server.
Data Understanding Phase: The DMQ database contains 5842 records. Processing uses 7 attributes (variables) for the prediction of timely graduation: NIM, Student Name, Study of Education, Department, GPA, IMK, and Prediction. Of these 7 attributes, 2 are predictors (GPA and IMK) and 1 is the target attribute (graduating on time).
Modeling Phase: This study uses three algorithms: C4.5, Naive Bayes, and Neural Network.
Evaluation Phase: Evaluation and validation are performed using the confusion matrix and the ROC (Receiver Operating Characteristic) curve.
Deployment Phase: At this stage, the rules of the most accurate model for predicting on-time graduation are applied, and the model can then be used to evaluate new data.
DISCUSSION
This study aims to compare the accuracy of three data mining models, namely the C4.5, Naive Bayes, and Neural Network algorithms, in predicting timely graduation.
C4.5/J48 Algorithm
Steps for building the C4.5 model from the 891 training records:
a. Prepare the training data.
b. Calculate the entropy.
c. Calculate the gain for each attribute and select the attribute with the highest gain. For example, the GPA attribute yields a gain of ...
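The entropy and gain steps can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function names, the discretized attribute values, and the four toy records are all assumptions. Note also that C4.5 as usually described normalizes the gain into a gain ratio, whereas this sketch computes plain information gain, which is what the steps above name.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records (assumption); NOT the study's 891 training records.
rows = [
    {"gpa": "high", "imk": "good", "on_time": "yes"},
    {"gpa": "high", "imk": "poor", "on_time": "yes"},
    {"gpa": "low", "imk": "good", "on_time": "no"},
    {"gpa": "low", "imk": "poor", "on_time": "no"},
]

# Step c: compute the gain of every predictor and keep the highest one.
gains = {name: information_gain(rows, name, "on_time") for name in ("gpa", "imk")}
best_attribute = max(gains, key=gains.get)
```

In this toy data the GPA split separates the classes completely, so it is selected as the root attribute; real data would normally give a less clear-cut result.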
The Naive Bayes method uses the same 891 training records as the C4.5 method.
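For readers unfamiliar with the method, the sketch below shows how a Naive Bayes classifier could be trained on two numeric predictors such as GPA and IMK. It assumes a Gaussian likelihood per attribute, which the source does not state, and every name and data value in it is hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for features, label in zip(rows, labels):
        grouped[label].append(features)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            variance = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, variance + 1e-9))  # small constant avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, features):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, variance) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)
        return score
    return max(model, key=log_posterior)


# Hypothetical (GPA, IMK) pairs (assumption); NOT the study's data.
rows = [(3.6, 3.4), (3.4, 3.5), (3.5, 3.2), (2.1, 2.4), (2.3, 2.0), (2.0, 2.2)]
labels = ["on_time", "on_time", "on_time", "late", "late", "late"]

model = train_gaussian_nb(rows, labels)
prediction_high = predict(model, (3.3, 3.3))
prediction_low = predict(model, (2.2, 2.1))
```

Working in log space keeps the product of small probabilities from underflowing, which is the usual reason for this design choice.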
These results are generated from the training data by a neural network built with Weka's MultilayerPerceptron tool.
A comparison of the test results for the three algorithms, shown in Table 3, finds that Neural Network and C4.5 obtain the highest accuracy, followed by Naive Bayes with the lowest. The measurements used are precision, recall, and accuracy.
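The three measurements all derive from the counts in a binary confusion matrix. The following sketch shows the standard definitions; the function name and the counts are hypothetical and are not taken from Table 3.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts (assumption): 40 true positives, 10 false positives,
# 5 false negatives, 45 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

As a plausibility check on the reported figures, an accuracy of 99.8878% is consistent with 890 correct predictions out of 891 records, although the source does not state the size of the evaluation set.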
ROC Curve
In each test, Weka directly reports the ROC (Receiver Operating Characteristic) values.
For the C4.5 algorithm, the Area Under the Curve (AUC) is 1 for the "graduated on time" class. For the Neural Network, the Area Under the ROC Curve (AUC) is 1 for the "did not graduate on time" class. The AUC is calculated using the formula below.
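The slide's own formula is not reproduced in this text, so the sketch below shows only one common way to compute AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This may differ in form from the formula the authors used; the function name and sample scores are assumptions.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (assumption); label 1 = positive class.
perfect = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
mixed = auc_from_scores([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

An AUC of 1, as reported for C4.5 and the Neural Network, corresponds to the first case, where every positive example is scored above every negative one.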
Of the three models, the highest accuracy, precision, sensitivity, recall, and AUC values are obtained by the C4.5 and Neural Network models, which score equally, followed by the Naive Bayes model, as shown in Table 5 below:
For data mining classification, AUC values can be divided into several groups:
a. 0.90-1.00 = very good classification
b. 0.80-0.90 = good classification
c. 0.70-0.80 = fair classification
d. 0.60-0.70 = poor classification
e. 0.50-0.60 = failure
It can be concluded that the C4.5, Naive Bayes, and Neural Network methods are all classified as very good, because their AUC values fall between 0.90 and 1.00.
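The AUC grouping can be expressed as a small lookup function. This is a sketch under stated assumptions: the source ranges share their endpoints, so assigning each boundary value to the higher group is a choice made here, and the function name and label strings are illustrative.

```python
def auc_category(auc):
    """Map an AUC value to the quality groups listed above."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    # Assumption: a boundary value such as 0.90 belongs to the higher group.
    if auc >= 0.90:
        return "very good"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    if auc >= 0.50:
        return "failure"
    return "below 0.50 (not covered by the scale)"


categories = [auc_category(value) for value in (1.0, 0.85, 0.72, 0.65, 0.5)]
```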
Friday, 10 January 2014
Figure 5. Java-based application for classifying and predicting timely graduation
CONCLUSION
1. C4.5, Naive Bayes, and Neural Network are algorithms ...
2. ... accuracy 100%, while Naive Bayes reaches 99.8878%. All three algorithms are classified as very good, with AUC (Area Under the Curve) values between 0.90 and 1.00, so they can be used for predictive applications.
3. The selected algorithm is used to show NIM, Student Name, GPA, IMK, ...