Rule-Based Classifier
Classify records by using a collection of "if…then…" rules.
Rule: (Condition) → y
where Condition is a conjunction of attribute tests and y is the class label.
LHS: rule antecedent or condition
RHS: rule consequent
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
(Example)
Name            Blood Type   Class
human           warm         mammals
python          cold         reptiles
salmon          cold         fishes
whale           warm         mammals
frog            cold         amphibians
komodo          cold         reptiles
bat             warm         mammals
pigeon          warm         birds
cat             warm         mammals
leopard shark   cold         fishes
turtle          cold         reptiles
penguin         warm         birds
porcupine       warm         mammals
eel             cold         fishes
salamander      cold         amphibians
gila monster    cold         reptiles
platypus        warm         mammals
owl             warm         birds
dolphin         warm         mammals
eagle           warm         birds
(The Give Birth, Can Fly and Live in Water columns of this table were not preserved.)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name           Blood Type   Give Birth   Can Fly   Live in Water   Class
hawk           warm         no           yes       no              ?
grizzly bear   warm         yes          no        no              ?
Rule R1 covers the hawk => Bird
Rule R3 covers the grizzly bear => Mammal
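As a minimal sketch of what "covers" means here, each rule can be stored as a set of attribute tests that must all hold. The function names `covers` and `triggered` and the lowercase attribute keys are illustrative choices, not part of the slides.

```python
# Each rule: (name, antecedent, class label). The antecedent is a dict of
# attribute = value tests that must ALL hold (a conjunction).
RULES = [
    ("R1", {"give_birth": "no", "can_fly": "yes"}, "Birds"),
    ("R2", {"give_birth": "no", "live_in_water": "yes"}, "Fishes"),
    ("R3", {"give_birth": "yes", "blood_type": "warm"}, "Mammals"),
    ("R4", {"give_birth": "no", "can_fly": "no"}, "Reptiles"),
    ("R5", {"live_in_water": "sometimes"}, "Amphibians"),
]


def covers(antecedent, record):
    """True if the record satisfies every test in the antecedent."""
    return all(record.get(attr) == value for attr, value in antecedent.items())


def triggered(record, rules=RULES):
    """Return (rule name, class label) for every rule that covers the record."""
    return [(name, label) for name, antecedent, label in rules
            if covers(antecedent, record)]


hawk = {"blood_type": "warm", "give_birth": "no",
        "can_fly": "yes", "live_in_water": "no"}
grizzly_bear = {"blood_type": "warm", "give_birth": "yes",
                "can_fly": "no", "live_in_water": "no"}

print(triggered(hawk))          # [('R1', 'Birds')]
print(triggered(grizzly_bear))  # [('R3', 'Mammals')]
```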
[Table: example records with Taxable Income and Class columns; the remaining cells were not preserved]
Accuracy of a rule:
Fraction of records that satisfy both the antecedent and consequent of a rule (over those that satisfy the antecedent)
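The definition can be checked on the vertebrate records listed earlier. The sketch below uses only the Name, Blood Type and Class values, and the rule (Blood Type = warm) → mammals is a hypothetical rule chosen for illustration; `rule_accuracy` is likewise an assumed helper name.

```python
from fractions import Fraction

# (name, blood type, class) for the 20 example vertebrates.
RECORDS = [
    ("human", "warm", "mammals"), ("python", "cold", "reptiles"),
    ("salmon", "cold", "fishes"), ("whale", "warm", "mammals"),
    ("frog", "cold", "amphibians"), ("komodo", "cold", "reptiles"),
    ("bat", "warm", "mammals"), ("pigeon", "warm", "birds"),
    ("cat", "warm", "mammals"), ("leopard shark", "cold", "fishes"),
    ("turtle", "cold", "reptiles"), ("penguin", "warm", "birds"),
    ("porcupine", "warm", "mammals"), ("eel", "cold", "fishes"),
    ("salamander", "cold", "amphibians"), ("gila monster", "cold", "reptiles"),
    ("platypus", "warm", "mammals"), ("owl", "warm", "birds"),
    ("dolphin", "warm", "mammals"), ("eagle", "warm", "birds"),
]


def rule_accuracy(records, antecedent, consequent):
    """Fraction of records satisfying the antecedent that also have the
    consequent class. Returns None if the rule covers no record."""
    covered = [r for r in records if antecedent(r)]
    if not covered:
        return None
    correct = sum(1 for r in covered if r[2] == consequent)
    return Fraction(correct, len(covered))


# Hypothetical rule: (Blood Type = warm) -> mammals
accuracy = rule_accuracy(RECORDS, lambda r: r[1] == "warm", "mammals")
print(accuracy)  # 7/11
```

Eleven records are warm-blooded and seven of those are mammals, so this rule is right for 7 of the 11 records it covers.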
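Name            Blood Type   Give Birth   Can Fly   Live in Water   Class
lemur           warm         yes          no        no              ?
turtle          cold         no           no        sometimes       ?
dogfish shark   cold         yes          no        yes             ?
A lemur triggers rule R3, so it is classified as a mammal.
A turtle triggers both R4 and R5.
A dogfish shark triggers none of the rules.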
Exhaustive rules
There exists a rule for each combination of attribute values, so every record is covered by at least one rule. Together with mutual exclusivity (no record triggers more than one rule), this ensures that every record is covered by exactly one rule.
Rules that are not mutually exclusive
A record may trigger more than one rule. Solution?
Ordered rule set
Name     Blood Type   Give Birth   Can Fly   Live in Water
turtle   cold         no           no        sometimes
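One way to read an ordered rule set is "the first rule that fires wins". The sketch below assumes that the priority order is simply R1 to R5 and that a record matching no rule gets a caller-supplied default; neither assumption is stated on the slides. The lemur and dogfish shark blood types are inferred from which rules the slides say they trigger.

```python
# Rules in priority order (assumed to be R1..R5 as listed).
RULES = [
    ("R1", {"give_birth": "no", "can_fly": "yes"}, "Birds"),
    ("R2", {"give_birth": "no", "live_in_water": "yes"}, "Fishes"),
    ("R3", {"give_birth": "yes", "blood_type": "warm"}, "Mammals"),
    ("R4", {"give_birth": "no", "can_fly": "no"}, "Reptiles"),
    ("R5", {"live_in_water": "sometimes"}, "Amphibians"),
]


def classify(record, rules=RULES, default=None):
    """Return (rule name, label) of the first rule whose tests all hold.
    If no rule fires, return (None, default); the default is an assumption."""
    for name, antecedent, label in rules:
        if all(record.get(a) == v for a, v in antecedent.items()):
            return name, label
    return None, default


lemur = {"blood_type": "warm", "give_birth": "yes",
         "can_fly": "no", "live_in_water": "no"}
turtle = {"blood_type": "cold", "give_birth": "no",
          "can_fly": "no", "live_in_water": "sometimes"}
dogfish_shark = {"blood_type": "cold", "give_birth": "yes",
                 "can_fly": "no", "live_in_water": "yes"}

print(classify(lemur))          # ('R3', 'Mammals')
print(classify(turtle))         # ('R4', 'Reptiles')
print(classify(dogfish_shark))  # (None, None)
```

Under this ordering the turtle, which satisfies both R4 and R5, is labelled by R4 because R4 is checked first.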
[Figure: covering example shown in panels (ii) Step 1, (iii) Step 2 and (iv) Step 3, with rules R1 and R2 added in successive steps]
This approach is called a covering approach because at each stage a rule is identified that covers some of the instances
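The covering loop can be sketched as follows. This is a deliberately simplified version: each stage picks a single attribute = value test (the hypothetical helper `best_single_test`), whereas a fuller learner would keep adding tests to each rule. The toy data and all names are made up for illustration.

```python
from fractions import Fraction


def covers(rule, instance):
    """A rule is a dict of attribute = value tests; all must hold."""
    return all(instance[a] == v for a, v in rule.items())


def best_single_test(instances, target, attributes):
    """Simplified rule finder: the one test with the highest fraction of
    target-class instances among those it covers (ties: more positives)."""
    best_key, best_rule = None, None
    for attr in attributes:
        for value in sorted({inst[attr] for inst in instances}):
            subset = [i for i in instances if i[attr] == value]
            p = sum(1 for i in subset if i["class"] == target)
            if p == 0:
                continue  # a useful rule must cover at least one positive
            key = (Fraction(p, len(subset)), p)
            if best_key is None or key > best_key:
                best_key, best_rule = key, {attr: value}
    return best_rule


def sequential_covering(instances, target, attributes):
    """Find a rule, remove the instances it covers, and repeat until no
    instance of the target class is left."""
    rules, remaining = [], list(instances)
    while any(i["class"] == target for i in remaining):
        rule = best_single_test(remaining, target, attributes)
        rules.append(rule)
        remaining = [i for i in remaining if not covers(rule, i)]
    return rules


# Hypothetical toy data.
DATA = [
    {"shape": "round", "color": "red", "class": "pos"},
    {"shape": "round", "color": "red", "class": "pos"},
    {"shape": "round", "color": "blue", "class": "pos"},
    {"shape": "square", "color": "blue", "class": "pos"},
    {"shape": "square", "color": "blue", "class": "neg"},
    {"shape": "square", "color": "red", "class": "neg"},
    {"shape": "square", "color": "red", "class": "neg"},
    {"shape": "square", "color": "blue", "class": "neg"},
]

rules = sequential_covering(DATA, "pos", ["shape", "color"])
print(rules)  # [{'shape': 'round'}, {'color': 'blue'}]
```

The first stage covers the three round instances; the second stage then has to cover the one remaining positive instance and does so with a much less accurate single-test rule.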
[Figure: scatter plot of class a and class b instances in the x-y plane, with split lines at x = 1.2 and y = 2.6]
Possible rule set for class b:
If x ≤ 1.2 then class = b
If x > 1.2 and y ≤ 2.6 then class = b
More rules could be added for a perfect rule set.
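Written as a predicate, the two rules for class b look like this. The comparison operators are read as "≤" (an assumption, chosen so that the second rule's x > 1.2 complements the first), and the sample points are made up.

```python
def is_class_b(x, y):
    """Apply the two rules for class b; thresholds taken from the slide."""
    if x <= 1.2:                 # If x <= 1.2 then class = b
        return True
    if x > 1.2 and y <= 2.6:     # If x > 1.2 and y <= 2.6 then class = b
        return True
    return False                 # not covered by either rule for class b


# Hypothetical points, one per region of the plane.
print(is_class_b(1.0, 3.0))  # True  (left of x = 1.2)
print(is_class_b(2.0, 2.0))  # True  (right of x = 1.2, below y = 2.6)
print(is_class_b(2.0, 3.0))  # False (right of x = 1.2, above y = 2.6)
```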
Here, each new test (growing the rule) reduces the rule's coverage.
Selecting a test
Goal: maximize the rule's accuracy, p/t
t: total number of instances covered by the rule
p: positive examples of the class covered by the rule
t − p: number of errors made by the rule
We are finished when p/t = 1 or the set of instances can't be split any further.
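A minimal sketch of growing one rule with this criterion, on made-up data: at each step it adds the attribute = value test with the highest p/t among the instances the rule still covers, and stops when p/t = 1 or no attribute is left. Breaking ties in favour of the larger p, and using each attribute at most once, are assumptions of this sketch; `grow_rule` is a hypothetical name.

```python
from fractions import Fraction


def grow_rule(instances, target, attributes):
    """Greedily add tests that maximize p/t. Returns (rule, covered)."""
    rule, covered, unused = {}, list(instances), list(attributes)
    while unused and covered:
        p = sum(1 for inst in covered if inst["class"] == target)
        if p == len(covered):      # p/t = 1: the rule is exact, stop
            break
        best = None                # ((p/t, p), attribute, value)
        for attr in unused:
            for value in sorted({inst[attr] for inst in covered}):
                subset = [i for i in covered if i[attr] == value]
                p_new = sum(1 for i in subset if i["class"] == target)
                key = (Fraction(p_new, len(subset)), p_new)
                if best is None or key > best[0]:
                    best = (key, attr, value)
        _, attr, value = best
        rule[attr] = value         # grow the rule by one test
        unused.remove(attr)
        covered = [i for i in covered if i[attr] == value]
    return rule, covered


# Hypothetical toy data.
DATA = [
    {"size": "small", "color": "red", "class": "pos"},
    {"size": "small", "color": "red", "class": "pos"},
    {"size": "small", "color": "blue", "class": "neg"},
    {"size": "large", "color": "red", "class": "neg"},
    {"size": "large", "color": "blue", "class": "neg"},
    {"size": "large", "color": "blue", "class": "neg"},
    {"size": "small", "color": "blue", "class": "neg"},
    {"size": "large", "color": "red", "class": "pos"},
]

rule, covered = grow_rule(DATA, "pos", ["size", "color"])
print(rule)          # {'color': 'red', 'size': 'small'}
print(len(covered))  # 2
```

The first test, color = red, has p/t = 3/4; adding size = small raises it to 2/2 at the cost of coverage, and the positive instance that is large and red is left for a later rule.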
The numbers on the right show the fraction of correct instances in the set singled out by that choice. In this case, "correct" means that the recommendation is hard.
The rule isn't very accurate: only 4 of the 12 instances it covers are correct. So, it needs further refinement.
Further refinement
Should we stop here? Perhaps. But let's say we are going for exact rules, no matter how complex they become. So, let's refine further.
Further refinement
The result