
Decision Tree

Group Members

• Hafiz Muhammad Ahmed F2016065123
• Nofil Bhatty F2016065089
• Shahrukh Quddus F2016065120
Introduction

• Decision trees fall under the category of supervised learning.
• Decision trees can be used to solve both regression and classification problems.
• A decision tree uses a tree representation to solve the problem.
• Each leaf node represents a class label.
• Each internal node of the tree represents a test on an attribute (see the sketch below).
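To make this tree representation concrete, here is a minimal sketch, assuming Python with scikit-learn (the library choice and the toy feature encoding are assumptions, not part of the original slides). In the printed tree, internal nodes test attributes and leaves carry class labels.

```python
# Minimal sketch: fit a tiny decision tree and print its structure.
# Assumes scikit-learn is installed; the toy encoding below is illustrative.
from sklearn.tree import DecisionTreeClassifier, export_text

# Four training samples: [age_code, is_student]; label = buys_computer.
# age_code: 0 = "<=30", 1 = ">40"; is_student: 0 = no, 1 = yes.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = ["no", "yes", "yes", "yes"]

clf = DecisionTreeClassifier(criterion="entropy")  # split by information gain
clf.fit(X, y)

# Internal nodes test an attribute; each leaf holds a class label.
print(export_text(clf, feature_names=["age", "student"]))
```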
Attribute Selection Measure

• Information Gain
• Dataset entropy
• Attribute entropy
• Gain
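For reference, these quantities are standardly defined as follows (consistent with the worked example on the next slide), where $p_i$ is the proportion of class $i$ in dataset $D$, and $D_1, \dots, D_v$ are the subsets produced by splitting $D$ on attribute $A$:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$$

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$

The attribute with the highest gain is chosen as the splitting attribute.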
Attribute Selection: Information Gain
Training set D (14 samples):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

Class P: buys_computer = "yes" (9 samples); Class N: buys_computer = "no" (5 samples).

$$\mathrm{Info}(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$

Entropy within each age group:

age     p_i  n_i  I(p_i, n_i)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

$$\mathrm{Info}_{age}(D) = \frac{5}{14}\,I(2,3) + \frac{4}{14}\,I(4,0) + \frac{5}{14}\,I(3,2) = 0.694$$

Here $\frac{5}{14}\,I(2,3)$ means "age <=30" covers 5 of the 14 samples, with 2 yes's and 3 no's. Hence

$$\mathrm{Gain}(age) = \mathrm{Info}(D) - \mathrm{Info}_{age}(D) = 0.940 - 0.694 = 0.246$$

Similarly,

$$\mathrm{Gain}(income) = 0.029, \quad \mathrm{Gain}(student) = 0.151, \quad \mathrm{Gain}(credit\_rating) = 0.048$$
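Age has the highest information gain, so it is selected as the splitting attribute at the root. The following sketch in plain Python (not from the original slides; the helper names `entropy` and `gain` are illustrative) reproduces these numbers from the table above.

```python
import math
from collections import Counter

# The 14 training samples from the slide:
# (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
ATTRS = ["age", "income", "student", "credit_rating"]

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def gain(rows, attr):
    """Gain(A) = Info(D) - Info_A(D), where attr indexes the attribute."""
    base = entropy([r[-1] for r in rows])
    expected = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        expected += len(subset) / len(rows) * entropy(subset)
    return base - expected

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(data, i):.3f}")
# Prints gains of ~0.247, 0.029, 0.152, 0.048; the slide's 0.246 and
# 0.151 come from rounding the intermediate entropies to three decimals.
# Either way, age has the highest gain and becomes the root split.
```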
Advantages

• A decision tree is simple to understand and, after a brief explanation, easy to interpret and construct.
• It requires only a modest amount of training data.
• It can deal with numerical as well as categorical data.
• It is universal, solving both classification and regression problems.
• It is time-efficient even with large data; standard computing resources suffice for analyzing large datasets.
• It is high-performing when searching down a built tree, because the tree traversal algorithm is efficient even for massive data sets.
Disadvantages

• Decision-tree learning algorithms are based on heuristics, which offer no assurance of returning the globally optimal decision tree; i.e., they are greedy in approach.
• Decision-tree learners can generate overly complex trees that fail to generalize the data properly. This is known as overfitting.
• Each split in a tree reduces the dataset under consideration, so the model fitted below a split can become increasingly biased.
• Decision trees can be unstable: even minor perturbations in a data set can produce a drastically different tree.
• It can be difficult to control the size of the tree, yet tree size is critical for the quality of the problem-solving process (see the sketch after this list).
• Decision trees do not work well when there are many uncorrelated variables, since they work by finding interactions between variables.
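Overfitting and tree size can be mitigated by limiting growth or pruning. A minimal sketch, assuming Python with scikit-learn (its `max_depth` and `ccp_alpha` parameters; the dataset choice is illustrative, not from the original slides):

```python
# Minimal sketch: controlling tree size via a depth cap and
# cost-complexity pruning. Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unconstrained tree grows until every leaf is pure (prone to overfitting).
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Capping depth and pruning by cost-complexity yields a smaller, more
# stable tree that typically generalizes better.
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                random_state=0).fit(X, y)

print(full.get_n_leaves(), pruned.get_n_leaves())  # pruned has fewer leaves
```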
Applications

• When the user has an objective to achieve, e.g., maximize profit or minimize cost.
• When there are several possible courses of action.
• When there is a calculable measure of the benefit of the various alternatives.
• When there are events beyond the control of the decision maker, such as environmental factors.
• When there is uncertainty about which outcome will actually happen.
