WEKA: AN INTRODUCTION

WEKA
Waikato Environment for Knowledge Analysis (WEKA)
Developed by the Department of Computer Science, University of Waikato, New Zealand
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
http://www.cs.waikato.ac.nz/ml/weka/

Weka Interfaces
Explorer: Preprocessing, attribute selection, learning, visualization
Knowledge Flow: Visual design of the KDD (knowledge discovery in databases) process
Experimenter: Testing and evaluating machine learning algorithms
Command line

Data Formats
Uses flat text files to describe the data
Can import data from files in various formats, including its own ARFF format, CSV, and the C4.5 file formats
ARFF (Attribute-Relation File Format)
@relation person

@attribute age numeric
@attribute name string
@attribute education {College, Masters, Doctorate}
@attribute class {>50K,<=50K}

@data
Supported Data Types
Numeric
String
Nominal
Date
Relational
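
The ARFF header layout above (one @attribute line per column, closed by @data) can be read with a few lines of plain Java. The following is a minimal sketch, not WEKA's own loader: the class ArffHeaderSketch and the method attributeNames are hypothetical names introduced here, and the sketch assumes attribute names contain no spaces or quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; it is not part of WEKA.
class ArffHeaderSketch {

    // Returns the names declared by @attribute lines, in order,
    // and stops reading at the @data marker.
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // instances follow; the header is finished
            }
            if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": the name is the second token.
                // Assumption: the name is unquoted and has no spaces.
                String[] parts = line.split("\\s+", 3);
                if (parts.length >= 2) {
                    names.add(parts[1]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String header = String.join("\n",
            "@relation person",
            "@attribute age numeric",
            "@attribute name string",
            "@attribute education {College, Masters, Doctorate}",
            "@attribute class {>50K,<=50K}",
            "@data");
        System.out.println(attributeNames(header)); // [age, name, education, class]
    }
}
```

In practice WEKA reads ARFF files itself; the sketch only shows how little structure the header format needs.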

Explorer:
Supports exploratory data analysis
Preprocess: Choose and modify the data being acted on.
Classify: Train and test learning schemes that classify or perform regression.
Cluster: Learn clusters for the data.
Associate: Learn association rules for the data.
Select attributes: Select the most relevant attributes in the data.
Visualize: View an interactive 2D plot of the data.

Explorer Preprocessing:
Loading Data
Open file
Open URL
Open DB
Generate

Native format: ARFF
Supports file conversions

Explorer Applying Filters:
Supervised vs. unsupervised filters
Attribute vs. instance filters
Unsupervised Attribute Filters
Add: Adds a new attribute
Normalize: Scales all numeric values to the range [0, 1] by default
Remove: Removes attributes (see also RemoveType and RemoveUseless)
Unsupervised Instance Filters
Randomize: Randomizes the order of instances in a dataset
RemoveWithValues: Filters out instances with certain attribute values
Supervised Attribute Filters
AttributeSelection: Applies attribute selection methods
Discretize: Converts numeric attributes to nominal ones
Supervised Instance Filters
Resample: Produces a random subsample of a dataset
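
To make the Normalize entry concrete, here is a minimal sketch of min-max scaling for a single numeric attribute, which maps the smallest value to 0 and the largest to 1. It is plain Java rather than the WEKA filter itself; the class MinMaxSketch is a hypothetical name, and mapping a constant column to 0 is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration of min-max scaling; not WEKA's Normalize class.
class MinMaxSketch {

    // Rescales the values of one numeric attribute into [0, 1].
    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant attribute (range 0) is mapped to 0.
            scaled[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] ages = {20, 35, 50}; // made-up example values
        System.out.println(Arrays.toString(normalize(ages))); // [0.0, 0.5, 1.0]
    }
}
```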
Classifiers:
Bayes: BayesNet, NaiveBayes
Trees: ID3, J48
Rules: OneR, Conjunctive Rule
Functions: Linear Regression, RBFNetwork, Multilayer Perceptron
Lazy: KStar, IBk
Miscellaneous: VFI
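
As an illustration of the simplest rule learner in this list, the sketch below follows the basic OneR idea: for each attribute, predict the majority class for every attribute value, then keep the attribute whose rule makes the fewest training errors. It is a simplified stand-in, not WEKA's OneR implementation: it handles nominal attributes only, assumes the class is the last column, and the class name OneRSketch and the toy rows are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified OneR-style attribute choice; not WEKA's OneR class.
class OneRSketch {

    // Returns the index of the attribute whose one-level rule
    // misclassifies the fewest training rows (class = last column).
    static int bestAttribute(String[][] rows) {
        int classIdx = rows[0].length - 1;
        int best = -1;
        int bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < classIdx; a++) {
            // attribute value -> (class label -> count)
            Map<String, Map<String, Integer>> counts = new HashMap<>();
            for (String[] row : rows) {
                counts.computeIfAbsent(row[a], k -> new HashMap<>())
                      .merge(row[classIdx], 1, Integer::sum);
            }
            int errors = 0;
            for (Map<String, Integer> perClass : counts.values()) {
                int total = 0;
                int majority = 0;
                for (int c : perClass.values()) {
                    total += c;
                    majority = Math.max(majority, c);
                }
                errors += total - majority; // rows not in the majority class
            }
            if (errors < bestErrors) {
                bestErrors = errors;
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[][] rows = { // made-up toy data: outlook, temperature, class
            {"sunny", "hot", "no"},
            {"sunny", "mild", "no"},
            {"rainy", "hot", "yes"},
            {"rainy", "mild", "yes"},
        };
        System.out.println(bestAttribute(rows)); // 0: outlook predicts the class without errors
    }
}
```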

Clusterers:
OPTICS
DBScan
SimpleKMeans
Cobweb
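
The k-means idea behind SimpleKMeans can be sketched in a few lines: assign each point to its nearest centre, move each centre to the mean of its points, and repeat until nothing changes. The version below is a one-dimensional sketch with caller-supplied starting centres, which is an assumption made to keep the example deterministic; the class KMeansSketch is a hypothetical name and the code is not WEKA's implementation.

```java
import java.util.Arrays;

// Hypothetical one-dimensional k-means; not WEKA's SimpleKMeans class.
class KMeansSketch {

    // Runs k-means on 1-D points. 'centroids' holds the initial centres
    // and is updated in place; the same array is returned.
    static double[] cluster(double[] points, double[] centroids, int maxIterations) {
        int k = centroids.length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            // Assignment step: each point joins its nearest centre.
            for (double p : points) {
                int nearest = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[nearest])) {
                        nearest = c;
                    }
                }
                sum[nearest] += p;
                count[nearest]++;
            }
            // Update step: move each centre to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) {
                    continue; // assumption: an empty cluster keeps its centre
                }
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) {
                    centroids[c] = updated;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12}; // made-up example values
        double[] centres = cluster(points, new double[]{1, 10}, 100);
        System.out.println(Arrays.toString(centres)); // [2.0, 11.0]
    }
}
```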
Associations:
Apriori
Predictive Apriori
Filtered Associator
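
Association rule learners such as Apriori rank rules by support and confidence. The sketch below computes only these two measures for a given rule; it does not perform the Apriori candidate search. The class RuleMetricsSketch, the helper items, and the shopping baskets are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical support/confidence helpers; not WEKA's Apriori class.
class RuleMetricsSketch {

    // Convenience helper (made up for this sketch) to build an item set.
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Fraction of transactions that contain every item in 'itemSet'.
    static double support(List<Set<String>> transactions, Set<String> itemSet) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemSet)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    // Confidence of the rule lhs => rhs: support(lhs and rhs) / support(lhs).
    // Yields NaN when no transaction contains lhs.
    static double confidence(List<Set<String>> transactions,
                             Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = Arrays.asList( // made-up toy data
            items("bread", "milk"),
            items("bread", "butter"),
            items("bread", "milk", "butter"),
            items("milk"));
        // Two of the four baskets contain both bread and milk.
        System.out.println(support(baskets, items("bread", "milk"))); // 0.5
        // Two of the three bread baskets also contain milk.
        System.out.println(confidence(baskets, items("bread"), items("milk")));
    }
}
```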
Attribute Selection:
Attribute Evaluators
CfsSubsetEval
ClassifierSubsetEval
GainRatioAttributeEval
InfoGainAttributeEval
Search Methods
Best First
Exhaustive Search
Genetic Search
Rank Search
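
An evaluator such as InfoGainAttributeEval scores an attribute by information gain: the entropy of the class minus the entropy of the class given the attribute. The following sketch computes that quantity for one nominal attribute. It is an illustration in plain Java, not WEKA's code; the class InfoGainSketch and the example arrays are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical information-gain computation; not WEKA's InfoGainAttributeEval.
class InfoGainSketch {

    // Shannon entropy (in bits) of a distribution given as raw counts.
    static double entropy(Collection<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of a nominal attribute with respect to the class labels:
    // H(class) - H(class | attribute).
    static double infoGain(String[] attribute, String[] labels) {
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            classCounts.merge(labels[i], 1, Integer::sum);
            byValue.computeIfAbsent(attribute[i], k -> new HashMap<>())
                   .merge(labels[i], 1, Integer::sum);
        }
        double conditional = 0.0;
        for (Map<String, Integer> perClass : byValue.values()) {
            int size = 0;
            for (int c : perClass.values()) {
                size += c;
            }
            // Weight each value's class entropy by how often the value occurs.
            conditional += (double) size / labels.length * entropy(perClass.values());
        }
        return entropy(classCounts.values()) - conditional;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "no", "no"};   // made-up toy data
        String[] informative = {"a", "a", "b", "b"};    // separates the classes fully
        String[] useless = {"a", "b", "a", "b"};        // tells nothing about the class
        System.out.println(infoGain(informative, labels));
        System.out.println(infoGain(useless, labels));
    }
}
```

The first attribute should score close to 1 bit and the second close to 0, which is the ordering a ranking search would use.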
Knowledge Flow Interface:
Data-flow inspired interface to WEKA
Process data in batches or incrementally
Process multiple batches or streams in parallel (each separate flow executes in its own thread)
Chain filters together
Visualize performance of incremental classifiers during processing

Experimenter Interface:
Enables the user to create, run, modify, and analyse experiments more conveniently than processing the schemes individually
Modes of operation: Simple, Advanced
Local and remote experiments are supported

Command Line Interface:
Plain text panel (Simple CLI) from which commands can be entered
java <classname> [<args>]: invokes a Java class with the given arguments (if any)
break: stops the current thread, e.g., a running classifier, in a friendly manner
kill: stops the current thread in an unfriendly fashion
cls: clears the output area
exit: exits the Simple CLI
help [<command>]: lists the available commands, or gives help on the specified command
Weka Operation:
The operating system's command-line interface can also be used after setting the CLASSPATH accordingly.
All the functionality supported by Weka can also be invoked from one's own source code.
Weka Extensions:
BioWeka: Extension library for knowledge discovery in biology
WekaMetal: Meta-learning extension to WEKA
Weka-Parallel: Parallel processing for WEKA
Grid Weka: Grid computing using WEKA

