
# Evaluating Model Accuracy and Error

## Bias check

How well do the predicted values fit the actual values? (Ideally, the model with the lowest bias is the best model.)

## Variance check (Error variance): Σ(actual - predicted)^2 / n

Compare the model error between Training vs. Test/Validation.
(Ideally, low error variance on both Train and Test/Validation indicates the best-fit model.)
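The two checks above can be sketched in a few lines; the actual/predicted values here are made up for illustration:

```python
# Sketch of the bias and error-variance checks described above,
# using illustrative (invented) actual vs. predicted values.

def bias(actual, predicted):
    """Mean error: how far predictions sit from actuals on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def error_variance(actual, predicted):
    """Error variance: sum((actual - predicted)^2) / n."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 12.0, 11.0, 13.0]
predicted = [9.5, 12.5, 11.0, 12.5]

print(bias(actual, predicted))            # near 0 -> low bias
print(error_variance(actual, predicted))  # compare this on Train vs. Test
```

In practice you would compute `error_variance` separately on the training set and the test/validation set and look for both values to be low and close together.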
# MACHINE LEARNING ALGORITHMS

## Decision Trees

Decision Tree algorithms are also called Top-Down Induction of Decision Trees (TDIDT).

Important terminology:

- Root Node: the test/decision point at the top of the tree
- Branch: a path of nodes ending in a leaf
- Leaves: terminal nodes, i.e. the final decisions/conclusions

## Famous TDIDT algorithms

- C5.0 (Quinlan)
- CART (Breiman)
Trees are rules: the conditions along a branch are combined with “AND”, and branches with similar outcomes are connected with “OR”.
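The AND/OR reading of a tree can be written out directly as code. The attributes below follow the classic play-tennis example often used with these algorithms; the exact tree shown is invented for illustration:

```python
# Hypothetical tree expressed as rules: conditions along a branch
# combine with AND; branches with the same outcome combine with OR.

def will_play(outlook: str, humidity: int, windy: bool) -> bool:
    # Branch 1: outlook == "sunny" AND humidity <= 70
    # OR Branch 2: outlook == "overcast"
    # OR Branch 3: outlook == "rainy" AND not windy
    return ((outlook == "sunny" and humidity <= 70)
            or outlook == "overcast"
            or (outlook == "rainy" and not windy))

print(will_play("sunny", 65, False))  # True  (branch 1 fires)
print(will_play("rainy", 80, True))   # False (no branch fires)
```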

## Goal: the smallest tree (fewest nodes) with the smallest error (fewest incorrectly classified records)

Decision trees are:

- Fast
- Robust
- Explicable
## Regression Trees

It turns out that we collect very similar records at each leaf, so we can use the mean (or median) of the records at a leaf as the predicted value for all new records that satisfy the same conditions. Such trees are called Regression Trees.
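A minimal sketch of this idea, with an invented single-split tree and toy data: records reaching the same leaf get the mean of that leaf's training targets.

```python
# Regression-tree prediction via leaf means; the split and data are
# made up for illustration.
from statistics import mean

# Toy training records: (feature x, target y)
train = [(1.0, 10.0), (2.0, 12.0), (8.0, 30.0), (9.0, 34.0)]

SPLIT = 5.0  # one decision node: x < 5 goes left, else right

left_mean = mean(y for x, y in train if x < SPLIT)    # leaf 1: mean of 10, 12
right_mean = mean(y for x, y in train if x >= SPLIT)  # leaf 2: mean of 30, 34

def predict(x: float) -> float:
    """A new record follows the same condition and gets the leaf mean."""
    return left_mean if x < SPLIT else right_mean

print(predict(1.5))  # 11.0
print(predict(7.0))  # 32.0
```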

Two key problems when growing a tree:

- Which attribute to choose (where to start)?
- Where to stop (to avoid overfitting)?
## Attribute Selection Criteria

Main principle: select the attribute that partitions the learning set (dataset) into subsets that are as “PURE” as possible.

## Various measures of purity

- Entropy
- Information Gain
- Gini Index

Note: lower Entropy/Gini Index means purer nodes; for Information Gain, a higher value means a better (purer) split.

We can measure the purity of a leaf/node using the following methods:

- For Classification Trees: Entropy / Gini Index
- For Regression Trees: RMSE / MAPE
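Sketch implementations of these purity measures (the standard definitions, written out in plain Python):

```python
# Purity measures for classification (entropy, Gini) and an error
# measure for regression (RMSE).
from math import log2, sqrt
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)); 0 when one class has p_i = 1."""
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index = 1 - sum(p_i^2); also 0 for a pure node."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def rmse(actual, predicted):
    """Root mean squared error, for regression trees."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
print(entropy(["yes", "no", "yes", "no"]))    # 1.0 (two classes, 50/50)
print(gini(["yes", "no", "yes", "no"]))       # 0.5
```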
## Two Most Popular Decision Tree Algorithms

- C5.0:
  - Multi-way split
  - Information Gain (measure of purity)
  - Pessimistic pruning (to avoid overfitting)
- CART:
  - Binary split
  - Gini Index (measure of purity)
  - Cost-Complexity Pruning (to avoid overfitting)
## C5.0 Algorithm

Measure of purity: Entropy(S) = -Σ p_i log2(p_i)

Entropy becomes zero when the probability of any class (p_i) equals 1, since log(1) = 0.

We can grow the tree until we exhaust the data, but is that the right time to stop?

## HOW TO MINIMIZE THE OVERFIT?

Why prune? To avoid overfitting.

REDUCED ERROR PRUNING / PESSIMISTIC PRUNING

Here ‘f’ is the observed error rate at a node, # misclassified / total, e.g. 2/6, 1/2, 2/6.
The ‘e’ value, a pessimistic estimate of the true error, comes from the confidence-interval formula.

The weighted sum of errors for the lowest layer is calculated as follows:

(6/14) * 0.47 + (2/14) * 0.72 + (6/14) * 0.47 = 0.51

Since the lower layer’s combined error (0.51) is higher than its parent branch’s error, the complete layer is pruned.
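The ‘e’ values above can be reproduced with the standard C4.5-style pessimistic error estimate, an upper confidence bound on the observed error rate f at a node with N records. The 25% confidence level (z ≈ 0.69) is the usual default and is assumed here:

```python
# C4.5-style pessimistic error estimate: an upper confidence bound
# on the observed error rate f over n records. z = 0.69 corresponds
# to the usual 25% confidence level (assumed default, matching the
# 0.47 / 0.72 figures above).
from math import sqrt

def pessimistic_error(f: float, n: int, z: float = 0.69) -> float:
    return ((f + z * z / (2 * n)
             + z * sqrt(f / n - f * f / n + z * z / (4 * n * n)))
            / (1 + z * z / n))

# Leaves with f = 2/6, 1/2, 2/6 -> e = 0.47, 0.72, 0.47
e1 = pessimistic_error(2 / 6, 6)
e2 = pessimistic_error(1 / 2, 2)
weighted = (6 / 14) * e1 + (2 / 14) * e2 + (6 / 14) * e1
print(round(e1, 2), round(e2, 2), round(weighted, 2))  # 0.47 0.72 0.51
```

If this weighted sum exceeds the pessimistic error computed at the parent node, the subtree is replaced by a leaf.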
## CART Algorithm

Measure of purity: Gini Index, Gini(S) = 1 - Σ p_i^2

** Here ‘S’ is the total set of records.