
Deriving Classification Rules Using the Covering Approach

Chapter 4 - Part 2

Dr Fadi Fayez

Why Rules?
Trees
Big and busy; no part can be understood without reference to the whole, which can confuse users.

Rules
Small, independent chunks of knowledge; can be easier to explain.
e.g. if THIS then THAT; if ANTECEDENT then CONSEQUENCE; if LHS then RHS

Covering Algorithm
For building classification rules.
Separate-and-conquer approach: iteratively keep adding new rules, each covering as many instances of the class of interest (positive instances) as possible, while trying to use as few rules as possible.
Choose the attribute test that maximizes the probability of the desired classification, i.e. that gives the highest accuracy.
Accuracy is measured by p/t, where
p = number of positive examples of the class covered by the new rule
t = total number of instances covered by the new rule (the test being added)

Covering Algorithm (contd.)


Choose the attribute test with maximum p/t.
If two or more tests have equal p/t, choose the one with higher coverage, i.e. the one with greater p.
Instances covered by a newly accepted rule do not need to be considered in the next iteration.
Example: the Prism algorithm (handles no numeric attributes).
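The selection step can be sketched in a few lines of Python (an illustration only; the dictionary-based instance format, the "class" key, and the helper names accuracy and best_test are assumptions, not part of the Prism specification):

def accuracy(instances, attribute, value, target_class):
    """Return (p, t) for the candidate test attribute = value."""
    covered = [inst for inst in instances if inst[attribute] == value]
    t = len(covered)
    p = sum(1 for inst in covered if inst["class"] == target_class)
    return p, t

def best_test(instances, candidate_tests, target_class):
    """Pick the test with maximum p/t, breaking ties by larger p (coverage)."""
    best, best_key = None, (-1.0, -1)
    for attribute, value in candidate_tests:
        p, t = accuracy(instances, attribute, value, target_class)
        if t == 0:
            continue  # test covers no instances at all
        key = (p / t, p)  # accuracy first, coverage as tie-breaker
        if key > best_key:
            best, best_key = (attribute, value), key
    return best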

An Example of the Covering Algorithm

Step 1: If x > 1.2 then class = a.
Step 2: If x > 1.2 and y > 2.6 then class = a.

Continue to Derive More Comprehensive Rules


The rule "if x > 1.2 and y > 2.6, then class = a" covers all a's but one. A new rule "if x > 1.4 and y < 2.4, then class = a" may be added so that all a's are covered.
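As an illustration of how such a rule list classifies instances (the (x, y) points below are hypothetical; the actual instances come from a figure in the textbook that is not reproduced here), a minimal sketch in Python:

def classify(x, y):
    # First rule derived in step 2
    if x > 1.2 and y > 2.6:
        return "a"
    # Additional rule covering the remaining 'a'
    if x > 1.4 and y < 2.4:
        return "a"
    # Everything not covered by either rule
    return "b"

print(classify(1.5, 3.0))  # "a": covered by the first rule
print(classify(1.5, 2.0))  # "a": covered by the second rule
print(classify(1.0, 2.0))  # "b": covered by neither rule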

Prism Algorithm
A simple covering algorithm developed by Cendrowska in 1987. Available in WEKA as weka.classifiers.rules.Prism.

Prism Pseudocode
For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
        (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E
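A minimal runnable rendering of this pseudocode in Python is given below. This is a sketch for illustration, not WEKA's weka.classifiers.rules.Prism; it assumes nominal attributes and instances represented as dictionaries with the class label stored under the key "class".

def prism(instances, attributes, classes):
    rules = []  # list of (conditions, class) pairs
    for c in classes:
        E = list(instances)  # initialize E to the instance set
        while any(inst["class"] == c for inst in E):
            conditions = {}  # rule R with an empty left-hand side
            covered = list(E)
            # refine R until it is perfect or no more attributes can be used
            while (any(inst["class"] != c for inst in covered)
                   and len(conditions) < len(attributes)):
                best, best_key = None, (-1.0, -1)
                for a in attributes:
                    if a in conditions:  # attribute already mentioned in R
                        continue
                    for v in {inst[a] for inst in covered}:
                        subset = [inst for inst in covered if inst[a] == v]
                        p = sum(1 for inst in subset if inst["class"] == c)
                        t = len(subset)
                        key = (p / t, p)  # maximize p/t, break ties on larger p
                        if key > best_key:
                            best, best_key = (a, v), key
                if best is None:
                    break
                a, v = best
                conditions[a] = v  # add A = v to R
                covered = [inst for inst in covered if inst[a] == v]
            rules.append((dict(conditions), c))
            # remove the instances covered by R from E
            E = [inst for inst in E
                 if not all(inst[a] == v for a, v in conditions.items())]
    return rules

The tuple key (p/t, p) implements the tie-breaking rule from the pseudocode: accuracy first, then coverage.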

Prism: Separate and Conquer Approach


Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
First, a rule is identified.
Then, all instances covered by the rule are separated out.
Finally, the remaining instances are "conquered".

Difference from a decision tree's divide-and-conquer method: the subset covered by a rule doesn't need to be explored any further.

A More Comprehensive Example and the Prism Algorithm


Assume we want to derive a rule for recommendation = hard based on the following dataset.

[Table 1.1 on page 4 of the textbook goes here.]

The Candidate Tests and Their Accuracies


age = young: 2/8
age = pre-presbyopic: 1/8
age = presbyopic: 1/8
spectacle prescription = myope: 3/12
spectacle prescription = hypermetrope: 1/12
astigmatism = no: 0/12
astigmatism = yes: 4/12
tear production rate = reduced: 0/12
tear production rate = normal: 4/12

Among the 9 candidates, the following two have the highest accuracy:
astigmatism = yes: 4/12
tear production rate = normal: 4/12
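For reference, these accuracies could be computed as follows (a sketch; the 24 contact lens instances are assumed to be loaded as dictionaries with the class label under the key "recommendation", and are not reproduced here):

candidate_tests = [
    ("age", "young"), ("age", "pre-presbyopic"), ("age", "presbyopic"),
    ("spectacle prescription", "myope"),
    ("spectacle prescription", "hypermetrope"),
    ("astigmatism", "no"), ("astigmatism", "yes"),
    ("tear production rate", "reduced"), ("tear production rate", "normal"),
]

def rate(instances, attribute, value, target="hard"):
    """Return (p, t) for the test attribute = value against recommendation = hard."""
    covered = [i for i in instances if i[attribute] == value]
    p = sum(1 for i in covered if i["recommendation"] == target)
    return p, len(covered)

# With the dataset loaded into `instances`, printing rate(instances, a, v)
# for each (a, v) in candidate_tests reproduces the nine fractions above.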

The First Intermediate Rule


Assume that we break the tie by picking astigmatism = yes at random. Then, we have the first intermediate rule:
If astigmatism = yes, then recommendation = hard.

Now, consider the remaining possible tests in order to refine the rule.

Tests to Refine the Intermediate Rule


age = young: 2/4
age = pre-presbyopic: 1/4
age = presbyopic: 1/4
spectacle prescription = myope: 3/6
spectacle prescription = hypermetrope: 1/6
tear production rate = reduced: 0/6
tear production rate = normal: 4/6

The test tear production rate = normal is the apparent winner. Hence, the intermediate rule becomes
If astigmatism = yes and tear production rate = normal, then recommendation = hard.

[Table 4.9 on page 102 of the textbook goes here.]

More Tests to Get the Perfect Rule

age = young: 2/2
age = pre-presbyopic: 1/2
age = presbyopic: 1/2
spectacle prescription = myope: 3/3
spectacle prescription = hypermetrope: 1/3

Both age = young (2/2) and spectacle prescription = myope (3/3) would make the rule perfect; we include the test spectacle prescription = myope because it has the greater coverage. The rule now is
If astigmatism = yes and tear production rate = normal and spectacle prescription = myope, then recommendation = hard.

Deriving More Rules to Get 100% Coverage


The rule that we just derived covers 3 out of 4 instances that have recommendation = hard. Therefore, we delete these 3 instances and start the process over again.

The Complete Rules List for Recommendation = Hard


Eventually, we will get the following list of rules:
If astigmatism = yes and tear production rate = normal and spectacle prescription = myope, then recommendation = hard.
If age = young and astigmatism = yes and tear production rate = normal, then recommendation = hard.
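Expressed as a simple ordered rule list (the rules themselves are from the slide above; the function name and the dictionary-based instance format are illustrative assumptions):

def recommend_hard(instance):
    """Apply the two derived rules; return True if recommendation = hard is predicted."""
    if (instance["astigmatism"] == "yes"
            and instance["tear production rate"] == "normal"
            and instance["spectacle prescription"] == "myope"):
        return True
    if (instance["age"] == "young"
            and instance["astigmatism"] == "yes"
            and instance["tear production rate"] == "normal"):
        return True
    return False

print(recommend_hard({"age": "young",
                      "spectacle prescription": "hypermetrope",
                      "astigmatism": "yes",
                      "tear production rate": "normal"}))  # True (second rule fires)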

Prism Overfitting Avoidance


Standard PRISM has no overfitting-avoidance strategy.

Prism Limitations
The PRISM algorithm is silent on:
the order in which classes are explored (usually, the majority class first);
the order in which attributes are explored (one could, for example, pre-sort attributes by their correlation with the class).
Standard PRISM also demands that conditions keep being added until the rule is perfect. This is a bad idea: why not prune with information gain, or use an early stopping criterion?
Standard PRISM has no support-based pruning: why not stop learning when the support of the selected rule falls too low?
Currently, these options are unexplored. One possible support-based stopping check is sketched below.
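The following is an exploratory, unverified sketch of such a check, not part of standard PRISM or WEKA; the function name, thresholds, and the (p/t, p) key format are assumptions carried over from the prism() sketch earlier.

def keep_refining(best_key, min_support=2, min_accuracy=1.0):
    """Decide whether to keep adding conditions to the current rule.

    best_key is the (p/t, p) pair of the refinement just selected;
    min_support is a user-chosen minimum number of covered positives.
    """
    accuracy, p = best_key
    if p < min_support:
        return False  # refinement covers too few positives: stop growing the rule
    return accuracy < min_accuracy  # otherwise refine until the rule is perfect

In the prism() sketch above, this check would replace the purity test in the inner loop (with a small restructuring so that the best refinement is computed before deciding whether to continue).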
