Lecture 21 / 10-09-09 (no class because of placements)
Lecture 22 / 11-09-09 (no class)
Lecture 23 / 12-09-09 (no class)
Lecture 24 / 14-09-09
L22/11-09-09 1
Building Classification Rules
• Direct methods:
  • Extract rules directly from data.
  • e.g., RIPPER, CN2, Holte's 1R
• Indirect methods:
  • Extract rules from other classification models (e.g., decision trees, neural networks, etc.).
  • e.g., C4.5rules
Direct Method: Sequential Covering Algorithm
• Extracts rules directly from data.
Algorithm
1: Let E be the set of training records and A the set of attribute-value pairs {(Aj, vj)}.
2: Let Yo be an ordered set of classes {y1, y2, ..., yk}.
3: Let R = { } be the initial rule (decision) list.
4: for each class y ∈ Yo − {yk} do
5:   while stopping condition is not met do
6:     r ← Learn-One-Rule(E, A, y).
7:     Remove training records from E that are covered by r.
8:     Add r to the bottom of the rule list: R ← R ∨ r.
9:   end while
10: end for
11: Insert the default rule, { } → yk, at the bottom of the rule list R.
Sequential Covering Algorithm
(in short, for your quick reference)
1. Start from an empty rule set.
2. Extract a rule using the Learn-One-Rule function.
3. Remove the training records covered by the rule.
4. Repeat Steps (2) and (3) until the stopping criterion is met.
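The steps above can be sketched in Python. This is a minimal illustration, not RIPPER or CN2: here `learn_one_rule` greedily adds the conjunct that most improves precision, and the stopping condition (require a perfectly precise rule) is a deliberately simple placeholder; the dataset and attribute names are hypothetical.

```python
def covers(rule, record):
    """A rule's antecedent is a dict of attribute -> required value."""
    return all(record.get(a) == v for a, v in rule.items())

def precision(rule, records, target_class):
    covered = [r for r in records if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["class"] == target_class for r in covered) / len(covered)

def learn_one_rule(records, attrs, target_class):
    """General-to-specific growth: repeatedly add the attribute-value
    conjunct that most improves precision on the target class."""
    rule = {}
    while True:
        best, best_prec = None, precision(rule, records, target_class)
        for a, v in attrs:
            if a in rule:
                continue
            cand = dict(rule, **{a: v})
            p = precision(cand, records, target_class)
            if p > best_prec:
                best, best_prec = cand, p
        if best is None:
            return rule
        rule = best

def sequential_covering(records, attrs, classes):
    """Returns an ordered rule list; the last class becomes the default."""
    rule_list = []
    for y in classes[:-1]:
        while any(r["class"] == y for r in records):
            rule = learn_one_rule(records, attrs, y)
            if precision(rule, records, y) < 1.0:   # simplistic stopping test
                break
            rule_list.append((rule, y))
            # remove the training records covered by the rule
            records = [r for r in records if not covers(rule, r)]
    rule_list.append(({}, classes[-1]))             # default rule { } -> yk
    return rule_list

# Tiny illustrative dataset (hypothetical attribute names)
data = [{"Refund": "No", "Status": "Single", "class": "Yes"},
        {"Refund": "Yes", "Status": "Single", "class": "No"},
        {"Refund": "No", "Status": "Married", "class": "No"}]
attrs = [("Refund", "No"), ("Refund", "Yes"),
         ("Status", "Single"), ("Status", "Married")]
rules = sequential_covering(data, attrs, ["Yes", "No"])
```

Each extracted rule is appended to the bottom of the list, so classification later tries the rules in extraction order and falls through to the default.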
Learn-One-Rule function
• Objective: extract a rule that covers as many positive examples as possible and none or only a few negative examples in the training dataset.
Example of Sequential Covering
[Figure: Step 1 of the sequential covering example — the dataset before any rule is extracted]
Example of Sequential Covering…
[Figure: rules R1 and R2 successively covering regions of the dataset]
Aspects of Sequential Covering
• Rule Growing Strategy
• Instance Elimination
• Rule Evaluation
• Stopping Criterion
• Rule Pruning
Rule Growing
• Two common strategies:
  1. General-to-specific
  2. Specific-to-general
[Figure: fragment of an example training table (Tid, ...) used to illustrate the two strategies]
General-to-specific
• Rule growing starts with the empty rule r: { } → y. This rule has poor quality, as it covers all the examples in the training set.
• Conjuncts are subsequently added to improve the quality of the rule.
Specific-to-general
• Rule growing starts from specific rules, each built from a positive example, e.g.:
  Refund=No, Status=Single, Income=85K → (Class=Yes)
  Refund=No, Status=Single, Income=90K → (Class=Yes)
• Conjuncts are then removed to generalize the rule:
  Refund=No, Status=Single → (Class=Yes)
Specific-to-general (example)
Body temp = warm-blooded, Skin cover = hair, Gives birth = yes, Aquatic creature = no, Aerial creature = no, Has legs = yes => Mammals
Rule Evaluation
• Metrics:
  – Accuracy = nc / n
  – Likelihood ratio statistic: R = 2 ∑_{i=1}^{k} f_i log(f_i / e_i), where f_i is the observed and e_i the expected frequency of class i among the instances covered by the rule
  – Laplace = (nc + 1) / (n + k)
  – m-estimate = (nc + kp) / (n + k), or more generally (nc + pm) / (n + m)
where
  n : number of instances covered by the rule
  nc : number of covered instances belonging to the class predicted by the rule
  k : number of classes
  p : prior probability of the predicted class
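The metrics can be computed directly from these counts; a small sketch (the function names are mine):

```python
import math

def accuracy(nc, n):
    # fraction of covered instances that the rule classifies correctly
    return nc / n

def laplace(nc, n, k):
    # Laplace estimate: (nc + 1) / (n + k)
    return (nc + 1) / (n + k)

def m_estimate(nc, n, k, p):
    # m-estimate with m = k: (nc + k*p) / (n + k)
    return (nc + k * p) / (n + k)

def likelihood_ratio(observed, expected):
    """R = 2 * sum_i f_i * log(f_i / e_i) over the k classes."""
    return 2 * sum(f * math.log(f / e)
                   for f, e in zip(observed, expected) if f > 0)

# A rule covering n = 100 instances, nc = 95 of its own class, k = 2 classes:
lap = laplace(95, 100, 2)   # 96/102, slightly below the raw accuracy 0.95
```

Note how the Laplace and m-estimate values shrink the raw accuracy toward the prior, penalising rules with small coverage.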
FOIL's Information Gain (Rule Evaluation contd.)
• If a rule covering p0 positive and n0 negative examples is extended to a rule covering p1 positive and n1 negative examples, FOIL's information gain of the extended rule is:
  Gain = p1 × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
• Q2. Consider two rules:
  – R1: A → C
  – R2: A ∧ B → C
  Suppose R1 is covered by 350 positive examples and 150 negative examples, while R2 is covered by 300 positive examples and 50 negative examples. Compute FOIL's information gain for rule R2 with respect to R1.
• Q4. Page no. 317
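For Q2, the gain can be checked numerically with the standard FOIL formula, where p0, n0 refer to R1 and p1, n1 to R2:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of the extended rule (covering p1 positive,
    n1 negative examples) over the original rule (p0 positive, n0 negative)."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

gain = foil_gain(350, 150, 300, 50)   # R2 w.r.t. R1 in Q2: about 87.65
```

The gain is positive because R2 trades a modest drop in positive coverage (350 → 300) for a large jump in precision (0.7 → 0.857).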
Stopping Criterion and Rule Pruning
• Stopping criterion
  – Compute the gain
  – If the gain is not significant, discard the new rule
• Rule pruning
  – Remove one of the conjuncts in the rule
  – Compare the error rate on a validation set before and after pruning
  – If the error improves, prune the conjunct
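The pruning loop described above can be sketched as reduced-error pruning; the greedy drop-one-conjunct strategy and helper names are illustrative assumptions, not a specific published algorithm:

```python
def rule_error(rule, records, target_class):
    """Error rate of the rule on the records it covers (1.0 if it covers none)."""
    covered = [r for r in records
               if all(r.get(a) == v for a, v in rule.items())]
    if not covered:
        return 1.0
    return sum(r["class"] != target_class for r in covered) / len(covered)

def prune_rule(rule, validation, target_class):
    """Repeatedly remove the conjunct whose removal lowers the validation
    error; stop when no removal helps."""
    improved = True
    while improved and rule:
        improved = False
        base = rule_error(rule, validation, target_class)
        for attr in list(rule):
            cand = {a: v for a, v in rule.items() if a != attr}
            if rule_error(cand, validation, target_class) < base:
                rule, improved = cand, True
                break
    return rule

# Dropping the conjunct B=2 improves the validation error, so it is pruned:
val = [{"A": "1", "B": "2", "class": "No"},
       {"A": "1", "B": "3", "class": "Yes"},
       {"A": "1", "B": "4", "class": "Yes"}]
pruned = prune_rule({"A": "1", "B": "2"}, val, "Yes")
```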
Instance Elimination
• Why do we need to eliminate instances?
  – Otherwise, the next rule extracted is identical to the previous rule
• Why do we remove both positive and negative instances?
  – To ensure that the next rule is different
  – To prevent underestimating the accuracy of the rule
[Figure: rules R1, R2, R3 covering regions of positive (class = +) and negative (class = −) instances]
Indirect Methods for Rule-Based Classifiers and Instance-Based Classifiers
Lecture 26 / 17-09-09
Indirect Methods
(Generating a rule set from a decision tree)
[Figure: a decision tree with root P (No/Yes) branching into subtrees Q and R, with leaves labeled Birds and Reptiles; each root-to-leaf path becomes one rule in the rule set]
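The indirect method can be sketched as follows: walk each root-to-leaf path of the tree and emit one rule per path. The nested-tuple tree encoding is an assumption made for illustration; the node labels mirror the slide's figure.

```python
def tree_to_rules(node, path=()):
    """Each root-to-leaf path becomes one rule:
    a conjunction of (attribute, outcome) tests -> the leaf's class."""
    if isinstance(node, str):                 # leaf node: a class label
        return [(path, node)]
    attr, branches = node                     # internal node: (attr, {outcome: child})
    rules = []
    for outcome, child in branches.items():
        rules += tree_to_rules(child, path + ((attr, outcome),))
    return rules

# Tree shaped like the slide's figure (root P, subtrees Q and R)
tree = ("P", {"No":  ("Q", {"Yes": "Birds", "No": "Reptiles"}),
              "Yes": ("R", {"Yes": "Birds", "No": "Reptiles"})})
rules = tree_to_rules(tree)
# e.g. the first extracted rule reads: P=No AND Q=Yes -> Birds
```

Systems such as C4.5rules then simplify and reorder the extracted rules; that post-processing step is omitted here.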
Advantages of Rule-Based Classifiers
• As highly expressive as decision trees
• Easy to interpret
• Easy to generate
• Can classify new instances rapidly
• Performance comparable to decision trees
Instance–Based Classifiers
Eager Learners vs. Lazy Learners
• Eager learners
  – Decision-tree and rule-based classifiers are examples of eager learners.
  – They are designed to learn a model that maps the input attributes to the class label as soon as training data becomes available.
• Lazy learners
  – They delay the process of modeling the training data until an unseen instance to be classified is presented.
  – Instance-based classifiers belong to this class.
  – They memorize the entire training data and perform classification only when a test instance is presented.
Instance-Based Classifiers
• Store the training records.
• Use the training records to predict the class labels of unseen cases.
[Figure: a set of stored cases (Atr1, ..., AtrN, Class ∈ {A, B, C}) and an unseen case (Atr1, ..., AtrN) to be classified against them]
Instance-Based Classifiers
• Examples:
  – Rote-learner (classifier)
    • Memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.
    • Its drawback is that some test records may not be classified at all because they do not match any instance in the training data.
    • Solution?
  – Nearest neighbor
    • Uses the k "closest" points (nearest neighbors) to perform classification.
Nearest Neighbor Classifiers
• Basic idea: the justification for the nearest-neighbor classifier is captured by the following example:
• If it walks like a duck, quacks like a duck, and looks like a duck, then it's probably a duck.
[Figure: for a test record, compute the distance to the training records and choose the k "nearest" records]
• A nearest-neighbor classifier represents each instance as a data point in a d-dimensional space, where d is the number of attributes.
Nearest-Neighbor Classifiers
• Requires three things:
  – The set of stored records
  – A distance metric to compute the distance between records, e.g. Euclidean distance:
    d(p, q) = √( ∑_i (p_i − q_i)² )
  – The value of k, the number of nearest neighbors to retrieve
[Figure: an unknown record with its 1-, 2-, and 3-nearest neighborhoods]
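The Euclidean distance metric in code:

```python
import math

def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2) over the d attributes."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

d = euclidean((0, 0), (3, 4))   # the classic 3-4-5 triangle: d = 5.0
```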
Nearest Neighbor Classification…
• Choosing the value of k:
  – If k is too small, the classifier is sensitive to noise points
  – If k is too large, the neighborhood may include points from other classes
[Figure: k-nearest-neighbor classification with a large k]
Nearest Neighbor Classification…
• Scaling issues
  – Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
  – Example:
    • height of a person may vary from 1.5 m to 1.8 m
    • weight of a person may vary from 90 lb to 300 lb
    • income of a person may vary from $10K to $1M
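One common fix, sketched here, is min-max scaling of each attribute to [0, 1] before computing distances (the helper name is mine):

```python
def min_max_scale(values):
    """Rescale one attribute's values to [0, 1] so that a wide-range
    attribute (e.g. income) cannot dominate the Euclidean distance."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [1.5, 1.65, 1.8]               # metres
incomes = [10_000, 505_000, 1_000_000]   # dollars
hs = min_max_scale(heights)
inc = min_max_scale(incomes)
# After scaling, both attributes span the same [0, 1] range.
```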
Nearest Neighbor Classification…
• k-NN classifiers are lazy learners
  – They do not build models explicitly
  – Unlike eager learners such as decision-tree induction and rule-based systems
  – Classifying unknown records is relatively expensive
Algorithm
1: Let k be the number of nearest neighbors and D the set of training examples.
2: for each test example z = (x′, y′) do
3:   Compute d(x′, x), the distance between z and every example (x, y) ∈ D.
4:   Select Dz ⊆ D, the set of the k closest training examples to z.
5:   y′ = argmax_v ∑_{(xi, yi) ∈ Dz} I(v = yi)
6: end for
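Steps 3–5 of the algorithm in Python, with plain majority voting (`math.dist` computes Euclidean distance; the data layout is an assumption for illustration):

```python
import math
from collections import Counter

def knn_classify(k, train, x_new):
    """train is a list of (point, label) pairs. Compute the distance from
    x_new to every point, keep the k closest, and majority-vote their labels."""
    nearest = sorted(train, key=lambda xy: math.dist(x_new, xy[0]))[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
label = knn_classify(3, train, (0, 0.5))   # two A's among the 3 nearest
```

Sorting all of D per query makes the lazy-learner cost explicit: the work happens at classification time, not at training time.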
• Once the nearest-neighbor list is obtained, the test sample is classified based on the majority class of its nearest neighbors:
  – Majority voting: y′ = argmax_v ∑_{(xi, yi) ∈ Dz} I(v = yi)
• In the majority-voting approach, every neighbor has the same impact on the classification. (Refer to the figure on slide 15.)
• This makes the classification algorithm sensitive to the choice of k.
• To reduce this sensitivity, we assign a weight to each nearest neighbor xi based on its distance:
  wi = 1 / d(x′, xi)²
• As a result of weighting by distance, training examples that are located far away from z have a weaker impact on the classification.
• Using the distance-weighted voting scheme, the class label is determined as:
  – Distance-weighted voting: y′ = argmax_v ∑_{(xi, yi) ∈ Dz} wi × I(v = yi)
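Distance-weighted voting with wi = 1/d(x′, xi)² can be sketched as below; the small `eps` guarding against a zero distance is my addition:

```python
import math
from collections import defaultdict

def weighted_knn(k, train, x_new, eps=1e-12):
    """Each of the k nearest neighbors votes with weight 1 / d(x', xi)^2."""
    nearest = sorted(train, key=lambda xy: math.dist(x_new, xy[0]))[:k]
    scores = defaultdict(float)
    for x, y in nearest:
        scores[y] += 1.0 / (math.dist(x_new, x) ** 2 + eps)
    return max(scores, key=scores.get)

# Two "B" points outvote one "A" under plain majority voting, but the
# much closer "A" point wins once the votes are weighted by 1/d^2:
train = [((0, 0), "A"), ((1, 0), "B"), ((1.1, 0), "B")]
label = weighted_knn(3, train, (0.1, 0))
```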
Characteristics
• 1. Nearest-neighbor classification is a part of instance-based learning.
• 2. Lazy learners like NN classifiers do not need model building.
• 3. NN classifiers make their predictions based on local information, whereas decision-tree and rule-based classifiers attempt to find a global model that fits the entire input space.
• 4. Appropriate proximity measures play a significant role in NN classifiers.