
Weka Tutorial

By Tresna Maulana Fahrudin

Department of Information and Computer Engineering

Graduate Program of Engineering Technology

Electronics Engineering Polytechnic Institute of Surabaya, Indonesia

1. Download the two files from the following URLs:

a. http://users.aber.ac.uk/rkj/book/wekafull.jar (Weka tool)
b. http://tunedit.org/repo/UCI/breast-w.arff (dataset)

2. Open wekafull.jar, and then click the Explorer button.


3. Click the Open file button and choose the breast-w.arff file.
4. If the data opens successfully, Weka shows all attributes/features. This dataset
consists of 9 features and 2 classes (benign and malignant).
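
To double-check the feature and class counts outside the Explorer, the ARFF header can be read with a few lines of Python. This is only a minimal sketch. The function name `summarize_arff_header` and the toy header are made up for illustration and are not the real contents of breast-w.arff. It also assumes simple unquoted attribute names and that the class is the last attribute.

```python
def summarize_arff_header(text):
    """Return (feature_names, class_labels) read from ARFF header text.

    Assumption: the last @attribute line is the nominal class attribute.
    """
    attributes = []  # (name, type_spec) pairs in file order
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@data"):
            break  # the header ends where the data section begins
        if lower.startswith("@attribute"):
            # "@attribute <name> <type>": split into at most three parts
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec))
    feature_names = [name for name, _ in attributes[:-1]]
    class_spec = attributes[-1][1]  # e.g. "{benign,malignant}"
    class_labels = [v.strip() for v in class_spec.strip("{}").split(",")]
    return feature_names, class_labels


# Toy header with hypothetical attribute names (NOT the real breast-w.arff).
SAMPLE = """% toy example
@relation toy
@attribute feature_a numeric
@attribute feature_b numeric
@attribute Class {benign,malignant}
@data
1,2,benign
"""

features, labels = summarize_arff_header(SAMPLE)
print(len(features), "features;", "classes:", labels)
```

To try it on the downloaded file, pass the text of breast-w.arff instead of `SAMPLE`; the counts should then match what the Explorer shows.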

5. Before running feature selection (attribute selection), first measure the
classification accuracy on the original data, so that you have a baseline to
compare against.
Click the Classify tab, then choose weka > classifiers > bayes > NaiveBayes
As you know, there are four popular algorithms in data mining:

a. Naive Bayes (probability-based),
b. Decision Tree (tree/reasoning-based),
c. Rule Induction (IF-THEN rule-based),
d. and K-Nearest Neighbor (distance-based).

You can apply each algorithm in Weka by following the instructions below:

Naive Bayes: Click the Classify tab, then choose weka > classifiers > bayes > NaiveBayes

Decision Tree: Click the Classify tab, then choose weka > classifiers > trees > J48

Rule Induction: Click the Classify tab, then choose weka > classifiers > rules > JRip

K-Nearest Neighbor: Click the Classify tab, then choose weka > classifiers > lazy > IBk

After you choose an algorithm (for example, Naive Bayes), click the Start button to
run it.
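
To make the "distance-based" label of K-Nearest Neighbor concrete, here is a minimal sketch of the idea in plain Python. It is not Weka's IBk implementation. The function name `knn_predict`, the choice of Euclidean distance, and the toy points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training rows."""
    # Pair every training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    # Count the labels of the k nearest rows and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data (made up, not taken from breast-w.arff).
rows = [(1, 1), (2, 1), (8, 9), (9, 8)]
labels = ["benign", "benign", "malignant", "malignant"]

print(knn_predict(rows, labels, (2, 2)))
print(knn_predict(rows, labels, (8, 8)))
```

The other three algorithms follow different principles (probabilities, trees, rules), so this sketch only illustrates the K-Nearest Neighbor entry of the list.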

The figure above shows the accuracy of the Naive Bayes algorithm on the breast cancer
dataset. In my paper I used four performance measures that several researchers
recommend: Accuracy (TP rate), Precision, Recall, and F-measure. You can look up their
definitions. In this example, the accuracy on the breast cancer dataset is 96%.
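
For readers who want to see how these measures relate to each other, the sketch below computes them from the four cells of a two-class confusion matrix using the standard textbook definitions. The function name `binary_metrics` and the counts are made up for illustration. They are not the results of this tutorial, and the code assumes none of the denominators is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard measures from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives for the class treated as "positive".
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # share of all instances classified correctly
    precision = tp / (tp + fp)     # how many predicted positives are correct
    recall = tp / (tp + fn)        # how many actual positives were found
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In Weka's per-class result table, the "TP Rate" column corresponds to the recall of that class.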
6. Now that we know the accuracy on the original breast cancer dataset (96%), we
want to measure the effect of feature selection. Follow this instruction:

Click the Select attributes tab, then choose weka > attributeSelection > AntSearch

The figure above shows that we chose Ant Colony (AntSearch) as the feature selection
algorithm.

7. Click the Start button to run the algorithm and get the result.
The figure above shows that, according to the Ant Colony algorithm, every feature of the
breast cancer dataset is important: the algorithm selected all 9 of the 9 existing
features as having a high contribution.

With other datasets the result can differ. For example, the Ant Colony algorithm may
select 533 features as important out of 2001 existing features (see my paper).

[Figure: existing features compared with the features after feature selection by GA (Genetic Algorithm)]

8. For example, if the Ant Colony algorithm selects 5 of the 9 existing features,
you can remove the other 4 features from the data.
The figure above shows how: in the feature column, tick the features that the
feature selection result marked as unimportant, and then click the Remove button.
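
The same column removal can be sketched outside the GUI. This is a minimal illustration on in-memory rows, not a replacement for Weka's Remove button. The function name `keep_selected_features`, the 1-based feature numbering, and the toy rows are assumptions made for this example.

```python
def keep_selected_features(rows, selected_numbers):
    """Keep only the chosen feature columns plus the class label.

    rows: list of lists whose last item is the class label.
    selected_numbers: 1-based feature numbers (assumed numbering).
    """
    columns = [n - 1 for n in selected_numbers]  # convert to 0-based indices
    return [[row[c] for c in columns] + [row[-1]] for row in rows]


# Toy rows with four features and a class label (made-up values).
toy_rows = [
    [5, 1, 1, 2, "benign"],
    [8, 10, 10, 7, "malignant"],
]

# Suppose feature selection kept features 1 and 3 (hypothetical result).
reduced = keep_selected_features(toy_rows, [1, 3])
print(reduced)
```

The class label is always kept, because the classifier in the next step still needs it.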

9. As the last step, measure the accuracy on the feature-selected data (not the
original data) by applying the classification algorithm again, using the Naive
Bayes classifier.

- May be useful for you -
