Decision tree algorithms are quite robust to the presence of noise, especially when methods for
avoiding overfitting are employed.
7. The presence of redundant attributes does not adversely affect the accuracy of decision trees.
An attribute is redundant if it is strongly correlated with another attribute in the data. One of the
two redundant attributes will not be used for splitting once the other attribute has been chosen.
However, if the data set contains many irrelevant attributes, i.e., attributes that are not useful for
the classification task, then some of the irrelevant attributes may be accidentally chosen during the
tree-growing process, which results in a decision tree that is larger than necessary.
8. Since most decision tree algorithms employ a top-down, recursive partitioning approach, the
number of records becomes smaller as we traverse down the tree. At the leaf nodes, the number of
records may be too small to make a statistically significant decision about the class representation
of the nodes. This is known as the data fragmentation problem. One possible solution is to disallow
further splitting when the number of records falls below a certain threshold.
9. A subtree can be replicated multiple times in a decision tree. This makes the decision tree more
complex than necessary and perhaps more difficult to interpret. Such a situation can arise from
decision tree implementations that rely on a single attribute test condition at each internal node.
Since most decision tree algorithms use a divide-and-conquer partitioning strategy, the same
test condition can be applied to different parts of the attribute space, thus leading to the subtree
replication problem.
10. The test conditions described so far in this chapter involve using only a single attribute at a
time. As a consequence, the tree-growing procedure can be viewed as the process of partitioning
the attribute space into disjoint regions until each region contains records of the same class. The
border between two neighboring regions of different classes is known as a decision boundary.
Because each test condition involves only one attribute, these decision boundaries are rectilinear,
i.e., parallel to the coordinate axes. Constructive induction provides another way to partition the
data into homogeneous, nonrectangular regions.
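The top-down, recursive partitioning discussed in items 8 and 10 can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular algorithm: the function names (`grow`, `best_split`, `predict`), the use of Gini impurity to choose splits, the `x[a] <= t` form of each test, and the `min_records` parameter are all hypothetical choices. Each internal node tests a single attribute, so every leaf corresponds to an axis-parallel region, and growth stops once a node holds fewer than `min_records` records, which is the threshold-based remedy for data fragmentation.

```python
from collections import Counter


def gini(labels):
    # Gini impurity of a list of class labels (assumed impurity measure).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    # Class label that occurs most often at a node.
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels):
    """Return (attribute index, threshold) of the single-attribute test
    x[a] <= t with the lowest weighted Gini impurity, or None if no test
    improves on the parent node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for a in range(len(rows[0])):
        for t in sorted({r[a] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (a, t), score
    return best


def grow(rows, labels, min_records=3):
    # Stop on a pure node or when too few records remain (data fragmentation guard).
    if len(set(labels)) == 1 or len(rows) < min_records:
        return majority(labels)
    split = best_split(rows, labels)
    if split is None:
        return majority(labels)
    a, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[a] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[a] > t]
    return {
        "test": (a, t),
        "le": grow([r for r, _ in left], [y for _, y in left], min_records),
        "gt": grow([r for r, _ in right], [y for _, y in right], min_records),
    }


def predict(tree, x):
    # Follow single-attribute tests from the root down to a leaf label.
    while isinstance(tree, dict):
        a, t = tree["test"]
        tree = tree["le"] if x[a] <= t else tree["gt"]
    return tree


# Hypothetical records with two numeric attributes.
rows = [(1, 1), (2, 1), (3, 2), (6, 5), (7, 6), (8, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
tree = grow(rows, labels, min_records=3)
print(tree)                   # {'test': (0, 3), 'le': 'A', 'gt': 'B'}
print(predict(tree, (7, 5)))  # B
```

Raising `min_records` makes the tree stop earlier, so each leaf's majority vote rests on more records at the cost of less homogeneous regions.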

5.4 Rule-Based Classification

In this section, we look at rule-based classifiers, where the learned model is represented as a set of
IF-THEN rules. We first examine how such rules are used for classification. We then study ways
in which they can be generated, either from a decision tree or directly from the training data using
a sequential covering algorithm.
5.4.1 Using IF-THEN Rules for Classification
Rules are a good way of representing information or bits of knowledge. A rule-based classifier
uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form
IF condition THEN conclusion.
An example is rule R1,
R1: IF age = youth AND student = yes THEN buys computer = yes.
The IF-part (or left-hand side) of a rule is known as the rule antecedent or precondition. The
THEN-part (or right-hand side) is the rule consequent. In the rule antecedent, the condition
consists of one or more attribute tests (such as age = youth and student = yes) that are logically
ANDed. The rule's consequent contains a class prediction (in this case, we are predicting whether
a customer will buy a computer). R1 can also be written as

R1: (age = youth) ∧ (student = yes) ⇒ (buys computer = yes).
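To make the antecedent/consequent structure concrete, here is a minimal Python sketch that encodes R1. The `Rule` class, its `covers` and `predict` methods, and the representation of a tuple as a dictionary of attribute values are assumptions made for illustration; only equality tests combined by logical AND are supported.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical IF-THEN rule: IF antecedent THEN consequent."""
    antecedent: dict  # attribute tests, logically ANDed, e.g. {"age": "youth"}
    consequent: tuple  # (class attribute, predicted class label)

    def covers(self, tuple_: dict) -> bool:
        # The antecedent is satisfied only if every attribute test holds.
        return all(tuple_.get(attr) == value
                   for attr, value in self.antecedent.items())

    def predict(self, tuple_: dict):
        # Return the predicted class if the rule covers the tuple, else None.
        return self.consequent[1] if self.covers(tuple_) else None


R1 = Rule({"age": "youth", "student": "yes"}, ("buys computer", "yes"))

x = {"age": "youth", "income": "medium", "student": "yes"}
print(R1.covers(x))   # True
print(R1.predict(x))  # yes
```

Attributes not mentioned in the antecedent (here, `income`) play no part in deciding whether the rule covers the tuple.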

If the condition (that is, all of the attribute tests) in a rule antecedent holds true for a given tuple,
we say that the rule antecedent is satisfied (or simply, that the rule is satisfied) and that the rule
covers the tuple. A rule R can be assessed by its coverage and accuracy. Given a tuple, X, from a
class-labeled data set, D, let n_covers be the number of tuples covered by R, n_correct be the number
of tuples correctly classified by R, and |D| be the number of tuples in D. We can define the
coverage and accuracy of R as

$$\text{coverage}(R) = \frac{n_{\text{covers}}}{|D|}, \qquad \text{accuracy}(R) = \frac{n_{\text{correct}}}{n_{\text{covers}}}.$$

That is, a rule's coverage is the percentage of tuples that are covered by the rule (i.e., whose
attribute values hold true for the rule's antecedent). For a rule's accuracy, we look at the tuples
that it covers and see what percentage of them the rule can correctly classify.
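The following Python sketch computes both measures directly from the definitions of n_covers, n_correct and |D|. The function names `covers` and `coverage_and_accuracy`, the dictionary representation of tuples, and the five-tuple data set are hypothetical and chosen only for illustration. The results are fractions; multiplying by 100 gives the percentages described above. When a rule covers no tuples, accuracy is undefined, and the sketch returns None for it.

```python
def covers(antecedent: dict, tuple_: dict) -> bool:
    # A tuple is covered when all ANDed attribute tests in the antecedent hold.
    return all(tuple_.get(a) == v for a, v in antecedent.items())


def coverage_and_accuracy(antecedent: dict, class_attr: str, class_value, D: list):
    covered = [t for t in D if covers(antecedent, t)]
    n_covers = len(covered)
    # Correctly classified: covered tuples whose class matches the consequent.
    n_correct = sum(1 for t in covered if t[class_attr] == class_value)
    coverage = n_covers / len(D)
    # Accuracy is undefined when the rule covers nothing.
    accuracy = n_correct / n_covers if n_covers else None
    return coverage, accuracy


# Hypothetical class-labeled data set D with five tuples.
D = [
    {"age": "youth", "student": "yes", "buys computer": "yes"},
    {"age": "youth", "student": "yes", "buys computer": "no"},
    {"age": "youth", "student": "no", "buys computer": "no"},
    {"age": "senior", "student": "yes", "buys computer": "yes"},
    {"age": "middle_aged", "student": "no", "buys computer": "yes"},
]

# Antecedent of R1: (age = youth) AND (student = yes); consequent: buys computer = yes.
R1_antecedent = {"age": "youth", "student": "yes"}
print(coverage_and_accuracy(R1_antecedent, "buys computer", "yes", D))  # (0.4, 0.5)
```

Here R1 covers the first two tuples (coverage 2/5 = 40%) and classifies one of them correctly (accuracy 1/2 = 50%).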