Chapter 9
Classification: Advanced Methods
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign &
Simon Fraser University
© 2011 Han, Kamber & Pei. All rights reserved.
Chapter Outline
Bayesian Belief Networks
Classification by Backpropagation
Support Vector Machines
Classification by Using Frequent Patterns
Lazy Learners (or Learning from Your Neighbors)
Other Classification Methods
Additional Topics Regarding Classification
Summary
Bayesian Belief Networks
[Figure: a directed acyclic graph over nodes X, Y, Z, and P]
X and Y are the parents of Z, and Y is the parent of P
No dependency between Z and P
Has no loops/cycles
Bayesian Belief Network: An Example
[Figure: FamilyHistory (FH) and Smoker (S) are parents of LungCancer (LC) and Emphysema; LungCancer is a parent of PositiveXRay and Dyspnea]

CPT for LungCancer (LC), given its parents FamilyHistory (FH) and Smoker (S):

       (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9

The probability of a particular combination of values x1, …, xn is derived from the CPTs: P(x1, …, xn) = Π_{i=1}^{n} P(xi | Parents(xi))
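Below is a minimal Python sketch of reading a label's probability off the CPT above; encoding node values as booleans is an illustrative choice, not from the slides.

# Minimal sketch of the LungCancer CPT above; only this one CPT is
# filled in, and boolean encodings of FH/S/LC are illustrative.
cpt_lc = {
    # (FH, S): P(LC = yes | FH, S)
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.7,
    (False, False): 0.1,
}

def p_lung_cancer(lc: bool, fh: bool, s: bool) -> float:
    """P(LC = lc | FH = fh, S = s) read from the CPT."""
    p_true = cpt_lc[(fh, s)]
    return p_true if lc else 1.0 - p_true

# Example: P(LC | FH, ~S) = 0.5, hence P(~LC | FH, ~S) = 0.5
print(p_lung_cancer(True, True, False))   # 0.5
print(p_lung_cancer(False, True, False))  # 0.5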
Classification by Backpropagation
Weakness: long training time; many parameters (e.g., the network topology) are typically determined empirically; poor interpretability of the learned weights and hidden units
Strength: high tolerance to noisy data; ability to classify untrained patterns; well suited for continuous-valued inputs and outputs
A Multilayer Feed-Forward Neural Network
[Figure: an input vector X feeds the input layer; weights wij connect each layer to the next, through a hidden layer to the output layer]
Inputs are fed simultaneously into the units making up the input layer
They are then weighted and fed simultaneously to the units of a hidden layer
The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction
Backpropagation
For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
Steps (a one-step sketch in code follows this list):
Initialize the weights to small random numbers
Propagate the inputs forward through the network
Backpropagate the error, updating the weights and biases
Repeat until a terminating condition is met (e.g., the error falls below a threshold)
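Below is a minimal sketch of one such training step for a tiny one-hidden-layer network with sigmoid units and squared error, using NumPy; the layer sizes, learning rate, and training tuple are illustrative assumptions.

import numpy as np

# One backpropagation step for a 2-3-1 sigmoid network; sizes, learning
# rate, and the training tuple below are made up for illustration.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(3, 2)), np.zeros(3)  # input -> hidden
W2, b2 = rng.normal(scale=0.1, size=(1, 3)), np.zeros(1)  # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = np.array([1.0, 0.0]), np.array([1.0])

# 1. Propagate the inputs forward
h = sigmoid(W1 @ x + b1)          # hidden activations
y = sigmoid(W2 @ h + b2)          # network's prediction

# 2. Backpropagate the error (gradients of (y - target)^2 / 2)
delta_out = (y - target) * y * (1 - y)          # output-layer error term
delta_hid = (W2.T @ delta_out) * h * (1 - h)    # hidden-layer error term

# 3. Update weights and biases to reduce the error
W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid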
A Neuron (= a Perceptron)
[Figure: inputs x0, x1, …, xn with weights w0, w1, …, wn feed a weighted sum, followed by an activation function, producing output y]
The n-dimensional input vector x is mapped to the output y via a weighted sum of the input vector x and the weight vector w, followed by an activation function. For example:
y = sign( Σ_{i=0}^{n} wi xi − μk )
where μk is a threshold.
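As a sketch, the neuron's computation can be written directly; the weights and threshold μk below are made-up values for illustration.

import numpy as np

# The single-neuron mapping y = sign(sum_i w_i * x_i - mu_k);
# the inputs, weights, and mu_k are illustrative.
def neuron_output(x: np.ndarray, w: np.ndarray, mu_k: float) -> int:
    """Weighted sum followed by the sign activation function."""
    return 1 if np.dot(w, x) - mu_k > 0 else -1

x = np.array([0.5, 1.0, -0.3])
w = np.array([0.4, 0.2, 0.9])
print(neuron_output(x, w, mu_k=0.1))  # 1, since 0.2 + 0.2 - 0.27 - 0.1 > 0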
Classification: A Mathematical Mapping
E.g., classifying personal home pages by word counts:
x1: # of occurrences of the word "homepage"
x2: # of occurrences of the word "welcome"
[Figure: training tuples plotted in the (x1, x2) plane]
Discriminative Classifiers
Advantages
Prediction accuracy is generally high
Applications: e.g., handwritten digit recognition, object recognition, speaker identification
SVM: General Philosophy
[Figure: two candidate separating hyperplanes, one with a small margin and one with a large margin; the circled tuples on the margin are the support vectors]
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple with associated class label yi
There are an infinite number of lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH)
SVM: Linearly Separable
A separating hyperplane can be written as W · X + b = 0. The hyperplanes defining the sides of the margin are:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1
H2: w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1
Any training tuples that fall on H1 or H2 are support vectors
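A small numeric sketch of these conditions, with an illustrative weight vector; points satisfying H1 or H2 with equality lie on the margin, whose width is 2/||W||.

import numpy as np

# Margin conditions H1/H2 above; w0 and w = (w1, w2) are illustrative.
w0, w = -1.0, np.array([1.0, 1.0])

def side(x):
    """>= +1 on H1's side of the margin, <= -1 on H2's side."""
    return w0 + w @ x

print(side(np.array([2.0, 1.0])))    #  2.0 -> class +1 side
print(side(np.array([-1.0, 0.5])))   # -1.5 -> class -1 side
print(2.0 / np.linalg.norm(w))       # margin width 2 / ||W|| ~ 1.414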
SVM: Linearly Inseparable
Transform the original input data into a higher-dimensional space, then search for a linear separating hyperplane in the new space
SVM vs. Neural Network
SVM: deterministic algorithm; nice generalization properties
Neural Network: nondeterministic algorithm; to learn complex functions, use a multilayer perceptron (nontrivial)
Representative implementations
LIBSVM: an efficient implementation of SVM supporting multiclass classification, nu-SVM, and one-class SVM, including interfaces for Java, Python, etc.
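As a usage sketch: scikit-learn's SVC class is built on LIBSVM; the toy data here is made up for illustration.

from sklearn.svm import SVC

# Minimal LIBSVM usage via scikit-learn's SVC wrapper; toy data.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]]
y = [-1, -1, 1, 1]

clf = SVC(kernel="linear", C=1.0)   # linear kernel; C controls the soft margin
clf.fit(X, y)

print(clf.support_vectors_)         # the support vectors found by LIBSVM
print(clf.predict([[0.5, 0.5]]))    # -> [-1]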
Associative Classification
Why effective? It explores highly confident associations among multiple attributes, which may overcome constraints introduced by decision-tree induction, which considers only one attribute at a time
[Figure: results on (a) Austral, (b) Cleve, (c) Sonar]
Empirical Results
[Figure: Information Gain (InfoGain vs. IG_UpperBnd) plotted against Support (0 to 700) on (a) Austral, (b) Breast, (c) Sonar]
Feature Selection
Given a set of frequent patterns, both nondiscriminative and redundant patterns exist, which
can cause overfitting
Experimental Results
Scalability Tests
DDPMine: Branch-and-Bound Search
Support is anti-monotone on the pattern tree: sup(child) ≤ sup(parent), i.e., sup(b) ≤ sup(a)
a: constant, a parent node
b: variable, a descendant
Association between information gain and frequency: a pattern's support yields an upper bound on its information gain, which can prune whole subtrees (a sketch follows)
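Below is a generic branch-and-bound skeleton under these two facts; it is a sketch of the search strategy, not DDPMine's actual implementation. The bound used here is the binary entropy of the pattern's support, which upper-bounds the information gain (mutual information) any pattern with that support can provide; the tiny tree is made up.

from dataclasses import dataclass, field
from math import log2

# Branch-and-bound over a pattern tree, using sup(child) <= sup(parent):
# if the frequency-based upper bound on information gain cannot beat the
# best gain found so far, the whole subtree is pruned. The tree and
# numbers below are illustrative, not DDPMine's internals.

@dataclass
class Node:
    support: float                 # relative support of the pattern
    gain: float                    # actual information gain of the pattern
    children: list = field(default_factory=list)

def ig_upper_bound(support: float) -> float:
    # IG of a binary pattern feature is at most H(support), its entropy.
    if support in (0.0, 1.0):
        return 0.0
    return -(support * log2(support) + (1 - support) * log2(1 - support))

def search(node: Node, best_gain: float) -> float:
    if ig_upper_bound(node.support) <= best_gain:
        return best_gain           # prune the whole subtree
    best_gain = max(best_gain, node.gain)
    for child in node.children:    # sup(child) <= sup(node)
        best_gain = search(child, best_gain)
    return best_gain

root = Node(0.9, 0.05, [Node(0.4, 0.30), Node(0.02, 0.01)])
print(search(root, 0.0))           # 0.30; the low-support branch is pruned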
[Table: accuracy comparison of pattern-classification algorithms: Harmony, PatClass (ICDE'07), DDPMine]
Lazy Learner: Instance-Based Methods
Instance-based learning: store training examples and delay the processing (lazy evaluation) until a new instance must be classified
Typical approaches
k-nearest neighbor approach
[Figure: a query point xq among + and − training examples; its k nearest neighbors determine its class]
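A minimal k-nearest-neighbor sketch with Euclidean distance and majority voting; the training tuples, labels, and k are illustrative.

import numpy as np
from collections import Counter

# k-NN: find the k training tuples closest to the query xq and take a
# majority vote over their class labels; data below is made up.
def knn_predict(X_train, y_train, xq, k=3):
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(xq), axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k closest tuples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]          # majority class label

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["-", "-", "-", "+", "+", "+"]
print(knn_predict(X_train, y_train, xq=[4.5, 5.0]))  # "+"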
Case-Based Reasoning (CBR)
Methodology
Challenges
Fuzzy Set Approaches
Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as in a fuzzy membership graph)
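A small sketch of fuzzy membership: a value can belong to several fuzzy sets at once, each to a degree in [0.0, 1.0]. The attribute (income), the piecewise-linear membership functions, and the breakpoints are made up for illustration.

# Degrees of membership of an income value (in $1000s) in the fuzzy
# sets {low, medium, high}, via piecewise-linear ramps; all numbers
# here are illustrative.
def ramp(x, lo, hi, rising=True):
    """Linear transition from 0 to 1 (or 1 to 0) between lo and hi."""
    if x <= lo:
        t = 0.0
    elif x >= hi:
        t = 1.0
    else:
        t = (x - lo) / (hi - lo)
    return t if rising else 1.0 - t

def membership(income):
    return {
        "low": ramp(income, 20, 40, rising=False),
        "medium": min(ramp(income, 20, 40), ramp(income, 60, 80, rising=False)),
        "high": ramp(income, 60, 80),
    }

print(membership(49))  # {'low': 0.0, 'medium': 1.0, 'high': 0.0}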
Multiclass Classification
Error-correcting codes: assign each class a unique bit vector (codeword); here, 7-bit codewords for 4 classes:

Class   Error-Corr. Codeword
C1      1 1 1 1 1 1 1
C2      0 0 0 0 1 1 1
C3      0 0 1 1 0 0 1
C4      0 1 0 1 0 1 0
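A sketch of how such codewords are used: each of the 7 bits is produced by one binary classifier, and an unknown tuple is assigned the class whose codeword is closest in Hamming distance, so a single misfiring classifier can be corrected. The classifier outputs below are made up.

# Decode the 7 binary classifiers' outputs to the nearest codeword.
codewords = {
    "C1": "1111111",
    "C2": "0000111",
    "C3": "0011001",
    "C4": "0101010",
}

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def decode(output_bits: str) -> str:
    return min(codewords, key=lambda c: hamming(codewords[c], output_bits))

# One classifier misfires (0001010 instead of C4's 0101010), yet the
# code still decodes to C4 (Hamming distance 1 vs. 3, 3, 5 for others).
print(decode("0001010"))  # C4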
Semi-Supervised Classification
Co-training: two classifiers f1 and f2 are trained on two separate feature sets of the labeled data
Then f1 and f2 are used to predict the class label for the unlabeled data X
Teach each other: the tuple having the most confident prediction from f1 is added to the set of labeled data for f2, and vice versa (a one-round sketch follows)
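A schematic sketch of one co-training round; the choice of classifier (naive Bayes), the use of predict_proba as the confidence measure, and the feature split are illustrative assumptions.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def cotrain_round(X_lab, y_lab, X_unlab, split):
    """One round: train f1, f2 on disjoint feature views, then let each
    contribute its most confidently predicted unlabeled tuple."""
    v1, v2 = split
    f1 = GaussianNB().fit(X_lab[:, v1], y_lab)
    f2 = GaussianNB().fit(X_lab[:, v2], y_lab)
    # each classifier picks the unlabeled tuple it predicts most confidently
    conf1 = f1.predict_proba(X_unlab[:, v1]).max(axis=1)
    conf2 = f2.predict_proba(X_unlab[:, v2]).max(axis=1)
    i, j = int(np.argmax(conf1)), int(np.argmax(conf2))
    # f1's confident prediction becomes a new label for f2, and vice versa
    new_X = np.vstack([X_unlab[i], X_unlab[j]])
    new_y = np.array([f1.predict(X_unlab[[i]][:, v1])[0],
                      f2.predict(X_unlab[[j]][:, v2])[0]])
    return np.vstack([X_lab, new_X]), np.concatenate([y_lab, new_y])

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(6, 4))
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unlab = rng.normal(size=(10, 4))
X_new, y_new = cotrain_round(X_lab, y_lab, X_unlab, split=([0, 1], [2, 3]))
print(len(y_new))  # 8: two newly labeled tuples were added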
Active Learning
Transfer Learning
[Figure: traditional learning framework vs. transfer learning framework]
Summary
Effective and advanced classification methods: Bayesian belief networks, classification by backpropagation (neural networks), support vector machines (SVM)
Pattern-based classification
Lazy learners (e.g., k-nearest neighbors) and fuzzy set approaches
Multiclass classification
Semi-supervised classification
Active learning
Transfer learning
Surplus Slides
What Is Prediction?
Linear Regression
w1 = Σ_{i=1}^{|D|} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{|D|} (xi − x̄)²
w0 = ȳ − w1 x̄
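A short sketch of this closed-form least-squares fit; the training data D below is made up.

import numpy as np

# Fit y = w0 + w1 * x by the least-squares formulas above; toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

x_bar, y_bar = x.mean(), y.mean()
w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
w0 = y_bar - w1 * x_bar

print(w1, w0)  # 1.97 and 0.11: roughly y = 2x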
Multiple linear regression: involves more than one predictor variable
Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
Nonlinear Regression
Minimal generalization
Prediction
Determine the major factors which influence the prediction
Data relevance analysis: uncertainty measurement,
entropy analysis, expert judgement, etc.
Multi-level prediction: drill-down and roll-up analysis
SVM: Introductory Literature
C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Knowledge Discovery and Data Mining, 2(2), 1998.
Easier than Vapnik's book, but still not introductory level; the examples are not very intuitive
CMAR (Classification based on Multiple Association Rules: Li, Han, Pei, ICDM'01)
Efficiency: uses an enhanced FP-tree that maintains the distribution of class labels among tuples satisfying each frequent itemset
Rule pruning whenever a rule is inserted into the tree
Given two rules R1 and R2, if the antecedent of R1 is more general than that of R2 and conf(R1) ≥ conf(R2), then prune R2
Prunes rules for which the rule antecedent and class are not positively correlated, based on a χ² test of statistical significance
Classification based on generated/pruned rules
If only one rule satisfies tuple X, assign the class label of the rule
If a rule set S satisfies X, CMAR divides S into groups by class label and assigns X the class label of the strongest group, measured by weighted χ²
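A sketch of the generality-based pruning rule above, with rules represented as (antecedent itemset, class, confidence) triples; the representation and the sample rules are illustrative, not CMAR's actual data structures.

# Prune rule R2 whenever a more general R1 (antecedent subset) has
# confidence >= conf(R2); sample rules are made up.
def prune_rules(rules):
    """rules: list of (antecedent_itemset, class_label, confidence)."""
    kept = []
    for ant2, cls2, conf2 in rules:
        dominated = any(
            ant1 <= ant2 and conf1 >= conf2 and (ant1, cls1) != (ant2, cls2)
            for ant1, cls1, conf1 in rules
        )
        if not dominated:
            kept.append((ant2, cls2, conf2))
    return kept

rules = [
    (frozenset({"age=youth"}), "buys=yes", 0.9),
    (frozenset({"age=youth", "income=high"}), "buys=yes", 0.85),  # pruned
    (frozenset({"income=low"}), "buys=no", 0.7),
]
print(prune_rules(rules))  # keeps the first and third rules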
[Figure: training data plotted in the (x1, x2) plane]
Scalar: x, y, w
Input: {(x1, y1), …}
Winnow: update W multiplicatively
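A sketch of the multiplicative Winnow update over boolean features: on a mistake, the weights of the active features are promoted or demoted by a factor alpha. The choices alpha = 2 and threshold = n (the number of features) are conventional illustrative assumptions.

import numpy as np

# Winnow: multiplicative weight update on mistakes; x is a boolean
# feature vector, and alpha/theta below are illustrative choices.
def winnow_update(w, x, y_true, theta, alpha=2.0):
    y_pred = 1 if w @ x >= theta else -1
    if y_pred != y_true:                      # update only on mistakes
        factor = alpha if y_true == 1 else 1.0 / alpha
        w = np.where(x == 1, w * factor, w)   # scale only active features
    return w

w = np.ones(4)                 # start with uniform weights; theta = n = 4
x = np.array([1, 0, 1, 0])
print(winnow_update(w, x, y_true=1, theta=4.0))  # [2. 1. 2. 1.]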