Tibebe Beshah
tibebe.beshah@gmail.com
Topics
Overfitting and underfitting
Data fragmentation and expressiveness
Underfitting: when the model is too simple, both training and test errors are large.
Overfitting
Overfitting means that the model performs poorly on new
examples (e.g. test examples) because it is fitted too closely to the
specific (non-general) nuances of the training examples.
Underfitting
Using too few components/attributes
The model is not large enough to capture important
variability in the data
Tends toward more model error
Overfitting
Using too many components/attributes
Predictions become dependent on the particular training data
Tends toward more estimation error
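The contrast above can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the toy one-dimensional data, and three hypothetical models, namely a constant predictor (too simple), a least-squares line, and a nearest-neighbour lookup that memorizes the training points.

```python
# Toy illustration of underfitting vs. overfitting (all data here is made up).

def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Too simple: always predicts the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """Too flexible: returns the target of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Training targets follow y = 2x plus an alternating +/-0.5 "noise" term.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# Test points lie between the training points and follow the noise-free trend.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [2 * x for x in test_x]

for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    print(f"{name:9s} train MSE={mse(model, train_x, train_y):.3f} "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```

In this sketch the constant model has a large error on both sets, while the memorizer reaches zero training error but a nonzero test error, because it reproduces the noise of individual training points.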
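How to Address Overfitting
[Figure: the data is split three times into a training part and a held-out (test) part; performance is measured on the held-out part each time.]
Hold-out performance, round 1 = 67%
Hold-out performance, round 2 = 60%
Hold-out performance, round 3 = 81%
Average performance = (67 + 60 + 81) / 3 = 69.3%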
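The averaging step can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and the three values are the hold-out performances quoted above.

```python
from statistics import mean

# Hold-out performances (in %) of the three rounds quoted above.
fold_scores = [67, 60, 81]

average = mean(fold_scores)
print(f"Average performance = {average:.1f}%")  # prints "Average performance = 69.3%"
```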
Enhancements to basic decision tree induction
Is measuring accuracy on training data a good
performance indicator?
Using the same set of examples for training and for
evaluation results in an overoptimistic estimate of
model performance.
Performance needs to be tested on data not seen by the
modeling algorithm, i.e., data that was not used for
model building.
Thus:
Separate the data into training and test sets
Use N-fold cross-validation
Subjective aspects too
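Both options can be sketched with the standard library alone. The function names, the 20% test fraction, and the interleaved fold assignment are assumptions chosen for illustration, not a prescribed procedure.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def n_fold_splits(examples, n_folds):
    """Yield (train, test) pairs; each example is in exactly one test fold."""
    items = list(examples)
    for k in range(n_folds):
        test = items[k::n_folds]  # every n_folds-th example, starting at k
        train = [x for i, x in enumerate(items) if i % n_folds != k]
        yield train, test

data = list(range(10))  # stand-in for ten labelled examples

train, test = train_test_split(data)
print("hold-out:", len(train), "training /", len(test), "test examples")

for train_k, test_k in n_fold_splits(data, n_folds=3):
    # A model would be built on train_k and evaluated on test_k here.
    print("test fold:", test_k)
```

With N-fold cross-validation every example is used for testing exactly once, and the N test performances are averaged to give the final estimate.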
Model Evaluation: details
Metrics for Performance Evaluation
How to evaluate the performance of a model?
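Confusion matrix:

                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL     Class=Yes      a (TP)       b (FN)
CLASS      Class=No       c (FP)       d (TN)

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)

\[ \text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Error rate} = 1 - \text{Accuracy} = \frac{b + c}{a + b + c + d} = \frac{FP + FN}{TP + TN + FP + FN} \]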
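As a quick check of the accuracy formula, the sketch below computes accuracy and error rate from the four counts. The function names are illustrative; the counts are those of one of the example matrices that appear later in this section (a = 150, b = 40, c = 60, d = 250).

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all examples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

def error_rate(tp, fn, fp, tn):
    """Fraction of all examples that were misclassified."""
    return (fp + fn) / (tp + fn + fp + tn)

# a = TP, b = FN, c = FP, d = TN
a, b, c, d = 150, 40, 60, 250
print(accuracy(a, b, c, d))    # 0.8
print(error_rate(a, b, c, d))  # 0.2
```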
Accuracy and error rate

                          Predicted Class
                          Class = 1    Class = 0
Actual     Class = 1      f11          f10
Class      Class = 0      f01          f00
Consider a 2-class problem:
Number of Class 0 examples = 9990
Number of Class 1 examples = 10
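One way to see why these class counts matter is to evaluate a trivial model on them. The sketch below assumes a hypothetical model that predicts Class 0 for every example; that model is an illustration, not something taken from the slides.

```python
n_class0, n_class1 = 9990, 10

# Hypothetical model: predict Class 0 for every example.
correct = n_class0            # every Class 0 example is right, every Class 1 is wrong
total = n_class0 + n_class1
acc = correct / total
detected_class1 = 0           # the model never predicts Class 1

print(f"accuracy = {acc:.1%}, "
      f"Class 1 examples detected = {detected_class1} of {n_class1}")
```

Under this assumption the accuracy is 99.9% even though no Class 1 example is ever detected, so accuracy alone can be misleading when the classes are this unbalanced.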
              PREDICTED CLASS
                +      -
ACTUAL    +    150     40
CLASS     -     60    250

              PREDICTED CLASS
                +      -
ACTUAL    +    250     45
CLASS     -      5    200
\[ \text{Weighted accuracy} = \frac{w_1 a + w_4 d}{w_1 a + w_2 b + w_3 c + w_4 d} \]
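Reading the weights w1 to w4 as the per-cell weights of a weighted accuracy measure (an assumption, since the slide formula survives only in part), the computation could be sketched as follows. The function name and the example weight are hypothetical.

```python
def weighted_accuracy(a, b, c, d, w1=1, w2=1, w3=1, w4=1):
    """(w1*a + w4*d) / (w1*a + w2*b + w3*c + w4*d), with a=TP, b=FN, c=FP, d=TN."""
    return (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)

# First matrix above: a=150, b=40, c=60, d=250.
# With all weights equal to 1 the measure reduces to plain accuracy.
print(weighted_accuracy(150, 40, 60, 250))  # 0.8

# Hypothetical weighting: a false negative counts five times as much.
print(weighted_accuracy(150, 40, 60, 250, w2=5))
```

Raising the weight of one kind of error lowers the score of models that make many errors of that kind, which plain accuracy cannot express.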
Exercise
              PREDICTED CLASS
                +      -
ACTUAL    +    150     40
CLASS     -     60    250

              PREDICTED CLASS
                +      -
ACTUAL    +    250     45
CLASS     -      5    200