An Introduction to WEKA

Lecture by Limsoon Wong
Slides prepared by Dong Difeng

Lecture at National Yang Ming University, June 2006

Outline
  What is WEKA
  Knowledge Flow
  Explorer
  Why Knowledge Flow
  Cross Validation
  Reference

Copyright 2006 Dong Difeng

What is WEKA
Developed at the University of Waikato in New Zealand
A collection of state-of-the-art machine learning algorithms and data preprocessing tools
Provides implementations of:
  Regression
  Classification
  Clustering
  Association rules
  Feature selection

Knowledge Flow
Experiment 1:
  Type: Classification
  Feature selection: GainRatio; Ranker (top 3)
  Algorithm: ID3
  Training: Weather_nominal.arff
  Test: Weather_nominal.arff
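
The feature selection step of this experiment can be sketched outside WEKA. The following is a minimal illustration, not WEKA's own implementation: it scores each attribute by gain ratio (information gain divided by the split information of the attribute) and keeps the top k. The rows are the standard 14-instance nominal weather data, which is assumed here to match Weather_nominal.arff; the names `entropy`, `gain_ratio`, and `rank_attributes` are hypothetical helpers introduced for this sketch.

```python
from collections import Counter
from math import log2

# Assumption: the standard nominal weather data (last column is the class "play").
ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, col):
    """Information gain of column `col` divided by its split information."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([r[-1] for r in rows]) - remainder
    split_info = entropy([r[col] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def rank_attributes(rows, attrs, k):
    """Return the names of the k attributes with the highest gain ratio."""
    scored = sorted(((gain_ratio(rows, i), a) for i, a in enumerate(attrs)),
                    reverse=True)
    return [a for _, a in scored[:k]]


print(rank_attributes(ROWS, ATTRS, 3))
```

On this data the gain ratios are roughly 0.157 (outlook), 0.152 (humidity), 0.049 (windy), and 0.019 (temperature), so keeping the top 3 drops temperature.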

Knowledge Flow
Source file (.ARFF)
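
For readers who have not seen the ARFF format, here is a small sketch of what such a source file looks like and how its parts fit together. The embedded text is an illustrative excerpt (three data rows only), not the full contents of the lecture's file, and `parse_arff` is a hypothetical helper that handles only nominal attributes; it is not a replacement for WEKA's own loader.

```python
# Illustrative ARFF excerpt: a relation name, nominal attribute declarations,
# and comma-separated data rows after the @data marker.
ARFF_TEXT = """\
% comment lines start with a percent sign
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
"""


def parse_arff(text):
    """Parse a nominal-only ARFF string into (relation, attributes, data)."""
    relation, attributes, data = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            data.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            values = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, data


relation, attributes, data = parse_arff(ARFF_TEXT)
print(relation, [name for name, _ in attributes], len(data))
```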

Explorer
Repeat the same experiment in the Explorer.
Experiment 1:
  Type: Classification
  Feature selection: GainRatio; Ranker (top 3)
  Algorithm: ID3
  Training: Weather_nominal.arff
  Test: Weather_nominal.arff

Why Knowledge Flow


There are some jobs we cannot do in the Explorer:
  Combine feature selection
  Build more complicated systems
Knowledge Flow (KF) describes the process more clearly:
  In the previous example, the Explorer never treated the training and test data as separate
KF helps us access intermediate information from the machine learning method:
  Cross validation

Cross Validation
Experiment 2:
  Type: Classification
  Feature selection: GainRatio; Ranker (top 3)
  Algorithm: ID3
  Training: Weather_nominal.arff (CV)
  Test: Weather_nominal.arff (CV)
  CV type: 3-fold CV

Cross Validation
What do we view in this case?
[Screenshots: Text1 vs. Text2 (1)-(3)]
[Screenshots: Text3 vs. Text4 (1)-(3)]
[Screenshot: Trees]
[Screenshot: Evaluation of result]

Cross Validation
Conclusion:
  Source data are separated into several folds for cross validation
  Feature selection is done separately for each fold, using only that fold's training data
  Different trees are built in different folds
  Classification is evaluated on the overall results across all folds
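
The points in this conclusion can be sketched as a small pipeline. This is only an illustration under stated assumptions, not what WEKA does internally: the folds are formed by simple interleaving (no shuffling or stratification), the tree learner is a bare-bones ID3 that falls back to the majority class for unseen values, and the rows are the standard nominal weather data, assumed to match Weather_nominal.arff. All function names are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [  # assumption: standard nominal weather data, class in last column
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def info_gain(rows, col):
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder


def gain_ratio(rows, col):
    split_info = entropy([r[col] for r in rows])
    return info_gain(rows, col) / split_info if split_info > 0 else 0.0


def id3(rows, cols):
    """Build a tree: a leaf is a class label, a node is (col, branches, majority)."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not cols:
        return majority
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {v: id3([r for r in rows if r[best] == v], rest)
                for v in set(r[best] for r in rows)}
    return (best, branches, majority)


def classify(tree, row):
    while isinstance(tree, tuple):
        col, branches, majority = tree
        if row[col] not in branches:
            return majority  # value not seen in training: fall back
        tree = branches[row[col]]
    return tree


def cross_validate(rows, n_folds=3, top_k=3):
    folds = [rows[i::n_folds] for i in range(n_folds)]
    correct, selected = 0, []
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        # Feature selection uses the training part of this fold only.
        ranked = sorted(range(len(ATTRS)),
                        key=lambda c: gain_ratio(train, c), reverse=True)
        keep = ranked[:top_k]
        selected.append([ATTRS[c] for c in keep])
        tree = id3(train, keep)  # a separate tree per fold
        correct += sum(classify(tree, r) == r[-1] for r in test)
    return correct / len(rows), selected  # accuracy pooled over all folds


accuracy, selected = cross_validate(ROWS)
print(accuracy, selected)
```

Because ranking happens inside the loop, the kept attributes (and therefore the trees) may differ from fold to fold, while the accuracy is pooled over every held-out instance.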

Cross Validation
Experiment 3:
  Type: Classification
  Feature selection: GainRatio; Ranker (top 2)
  Algorithm: ID3
  Training: Weather_nominal.arff
  Test: Weather_nominal.arff

Cross Validation
Ranker top 3 VS. Ranker top 2

Cross Validation
Conclusion:
  The attribute windy was ignored
  In this case, the classifier considers only the attributes that were kept
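
One way to see the effect of dropping windy is to project the data onto the kept attributes and look for instances that become indistinguishable. The sketch below assumes the standard nominal weather data and assumes that the two kept attributes are outlook and humidity (the top two by gain ratio on that data); `project` and `conflicts` are hypothetical helper names.

```python
from collections import Counter, defaultdict

ATTRS = ["outlook", "temperature", "humidity", "windy"]
KEPT = ["outlook", "humidity"]  # assumption: the top-2 attributes by gain ratio
ROWS = [  # assumption: standard nominal weather data, class in last column
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def project(rows, attrs, kept):
    """Keep only the selected attribute columns plus the class label."""
    idx = [attrs.index(a) for a in kept]
    return [tuple(r[i] for i in idx) + (r[-1],) for r in rows]


def conflicts(projected):
    """Map each attribute combination with mixed classes to its class counts."""
    by_key = defaultdict(Counter)
    for *key, label in projected:
        by_key[tuple(key)][label] += 1
    return {k: dict(c) for k, c in by_key.items() if len(c) > 1}


mixed = conflicts(project(ROWS, ATTRS, KEPT))
# Any classifier that sees only the kept attributes must misclassify at least
# the minority instances in each mixed group.
min_errors = sum(sum(c.values()) - max(c.values()) for c in mixed.values())
print(mixed, min_errors)
```

With only outlook and humidity, the rainy instances can no longer be separated by class, since windy was the attribute that distinguished them.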

Reference
http://www.cs.waikato.ac.nz/~ml/
Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
