Decision Trees are High Variance
▪ Problem: decision trees tend to overfit
▪ Pruning helps reduce variance, but only to a point
▪ Often not sufficient for the model to generalize well

Improvement: Use Many Trees
Create many different trees, then combine their predictions to reduce variance.

How to Create Multiple Trees?
▪ Use bootstrapping: sample the data with replacement
▪ Create multiple bootstrapped samples (a minimal sampling sketch follows below)
▪ Grow a decision tree from each bootstrapped sample

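A minimal sketch of bootstrap sampling with NumPy (the toy data and seed are only for illustration): drawing the same number of records with replacement means some records repeat while others are left out.

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)                                   # a toy dataset of 10 records
boot = rng.choice(data, size=data.size, replace=True)  # one bootstrapped sample
print(boot)   # some records appear more than once, others not at all
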
Distribution of Data in Bootstrapped Samples
▪ Given a dataset, create n bootstrapped samples
▪ For a given record x, P(record x not selected) = (1 − 1/n)^n
▪ Each bootstrap sample contains approximately 2/3 of the records

[Figure: the curve (1 − 1/n)^n plotted for n from 0 to 100; y-axis from 0.25 to 0.40, x-axis labeled "Number of Bootstrapped Samples (n)".]

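A quick numerical check of the formula (here n is the number of records in the dataset, with each bootstrapped sample drawn at that same size):

import numpy as np

n = np.array([2, 5, 10, 100, 1000, 10000])
p_not_selected = (1 - 1 / n) ** n
print(p_not_selected)       # approaches 1/e ≈ 0.368 as n grows
print(1 - p_not_selected)   # fraction of distinct records per sample ≈ 0.632, roughly 2/3
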
Aggregate Results
▪ Each data point is passed to every tree
▪ Trees vote on (classification) or average (regression) the result for each data point
▪ The trees' votes form a single classifier
▪ Bagging = Bootstrap Aggregating

[Diagram: a data point flows into each tree; the trees vote to form a single classifier, producing the results.]

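A hand-rolled sketch of the full bagging loop (bootstrap, fit one tree per sample, majority vote); the dataset, seed, and number of trees are illustrative:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

n_trees, trees = 25, []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap: sample rows with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

all_preds = np.array([t.predict(X) for t in trees])     # shape (n_trees, n_samples)
majority_vote = (all_preds.mean(axis=0) >= 0.5).astype(int)   # per-point vote across trees

scikit-learn's BaggingClassifier (shown later) wraps up this loop, plus the bookkeeping for out-of-bag records.
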
Bagging Error Calculations
▪ Bootstrapped samples provide a built-in error estimate for each tree
▪ Create each tree from a subset of the data
▪ Measure the error of that tree on the unused samples
▪ Called "Out-of-Bag" (OOB) error (see the sketch below)

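In scikit-learn the OOB estimate is available directly on the bagged model; a minimal sketch (toy data and parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# oob_score=True evaluates each tree on the records left out of its bootstrap sample
BC = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0).fit(X, y)
print(BC.oob_score_)   # aggregated out-of-bag accuracy estimate
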
Calculation of Feature Importance
▪ Fitting a bagged model doesn't produce coefficients the way logistic regression does
▪ Instead, feature importances are estimated using the OOB error
▪ Randomly permute the data for a particular feature and measure the change in accuracy

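scikit-learn exposes this permute-and-measure idea as permutation_importance; the sketch below computes it on a held-out test set rather than the OOB samples, and the dataset and settings are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

BC = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# permute each feature in turn and measure the drop in accuracy
result = permutation_importance(BC, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)   # larger drop in accuracy -> more important feature
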
How Many Trees to Fit?
▪ Bagging performance improvements increase with more trees
▪ Maximum improvement is generally reached by ~50 trees

[Figure: RMSE (cross-validated) vs. number of bagged trees, 0 to 500.]

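A cross-validation sketch of this curve (synthetic data and illustrative settings; exact numbers will vary by dataset):

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)

for n in [1, 10, 50, 200]:
    scores = cross_val_score(BaggingRegressor(n_estimators=n, random_state=0),
                             X, y, scoring="neg_root_mean_squared_error", cv=5)
    print(n, -scores.mean())   # RMSE typically drops sharply, then flattens
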
Strengths of Bagging
Same as decision trees:
▪ Easy to interpret and implement
▪ Heterogeneous input data allowed, no preprocessing required

Specific to bagging:
▪ Less variability than decision trees
▪ Can grow trees in parallel

BaggingClassifier: The Syntax
Import the class containing the classification method.
from sklearn.ensemble import BaggingClassifier

Create an instance of the class.
BC = BaggingClassifier(n_estimators=50)

Fit the instance on the data and then predict the expected value.
BC = BC.fit(X_train, y_train)
y_predict = BC.predict(X_test)

Tune parameters with cross-validation. Use BaggingRegressor for regression.

Reduction in Variance Due to Bagging
▪ For n independent trees, each with variance σ², the bagged variance is σ²/n
▪ However, trees grown on bootstrap samples are correlated (with correlation ρ), so the bagged variance is ρσ² + ((1 − ρ)/n)σ²

[Figure: RMSE (cross-validated) vs. number of bagged trees, 0 to 500.]

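For reference, the second expression follows from the standard variance calculation for the average of identically distributed trees $T_1, \dots, T_n$ with $\mathrm{Var}(T_i) = \sigma^2$ and pairwise correlation $\rho$:

\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right)
  = \frac{1}{n^2}\Big[\sum_i \mathrm{Var}(T_i) + \sum_{i \neq j} \mathrm{Cov}(T_i, T_j)\Big]
  = \frac{1}{n^2}\big[n\sigma^2 + n(n-1)\rho\sigma^2\big]
  = \rho\sigma^2 + \frac{1-\rho}{n}\,\sigma^2

As ρ → 0 this recovers σ²/n, and as n → ∞ it is bounded below by ρσ², which is why de-correlating the trees (next slide) pays off.
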
Introducing More Randomness
▪ Solution: further de-correlate the trees
▪ Use a random subset of the features for each split in each tree
  ▪ Classification: √m of the m features
  ▪ Regression: m/3 of the m features
▪ Called "Random Forest"

[Figure: RMSE (cross-validated) vs. number of trees, 0 to 500, comparing Bagging and Random Forest; the Random Forest curve sits below the Bagging curve.]

How Many Random Forest Trees?
▪ Errors are further reduced for Random Forest relative to Bagging
▪ Grow enough trees until the error settles down
▪ Beyond that point, additional trees won't improve results

[Figure: RMSE (cross-validated) vs. number of trees, 0 to 500; the Random Forest curve sits below the Bagging curve.]

RandomForest: The Syntax
Import the class containing the classification method.
from sklearn.ensemble import RandomForestClassifier

Create an instance of the class.
RC = RandomForestClassifier(n_estimators=100, max_features=10)

Fit the instance on the data and then predict the expected value.
RC = RC.fit(X_train, y_train)
y_predict = RC.predict(X_test)

Tune parameters with cross-validation. Use RandomForestRegressor for regression.

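Random forests also expose feature importances directly after fitting; note that in scikit-learn these are impurity-based importances, rather than the OOB permutation approach described earlier. A minimal sketch with illustrative data and settings:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

RC = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0).fit(X, y)
print(RC.feature_importances_)   # impurity-based importance per feature (values sum to 1)
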
Introducing Even More Randomness
▪ Sometimes additional randomness is desired beyond Random Forest
▪ Solution: select features randomly and create splits randomly, rather than choosing them greedily
▪ Called "Extra Random Trees" (Extremely Randomized Trees)

ExtraTreesClassifier: The Syntax
Import the class containing the classification method.
from sklearn.ensemble import ExtraTreesClassifier

Create an instance of the class.
EC = ExtraTreesClassifier(n_estimators=100, max_features=10)

Fit the instance on the data and then predict the expected value.
EC = EC.fit(X_train, y_train)
y_predict = EC.predict(X_test)

Tune parameters with cross-validation. Use ExtraTreesRegressor for regression.
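
To tie the three ensembles together, a comparison sketch using cross-validation (synthetic data and illustrative settings; no one model wins on every dataset). n_jobs=-1 uses the "grow trees in parallel" strength noted earlier.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=0, n_jobs=-1),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1),
    "extra trees": ExtraTreesClassifier(n_estimators=100, random_state=0, n_jobs=-1),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())   # cross-validated accuracy for each ensemble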