Lecture 14 - Roadmap
Aggregating Trees
Bagging
So, in order to reduce the variance and thus increase the prediction accuracy of an SL method, we could:
- take many training sets from the population
- build separate prediction models, and average the resulting predictions

That is, calculate $\hat{f}^1(x), \hat{f}^2(x), \ldots, \hat{f}^B(x)$ using $B$ training sets, and average them to obtain a single low-variance SL model:
$$\hat{f}_{\mathrm{avg}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^b(x)$$
- Not practical, since we generally do not have access to multiple training sets
- Instead, we can bootstrap!
Bagging:
$$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
Bagging
Bagging is particularly useful for decision trees

We construct $B$ regression trees using $B$ bootstrapped training sets, and average the resulting predictions (sketched below)
- Trees are not pruned, so each of them has high variance but low bias
- Averaging these $B$ trees reduces the variance
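As a concrete illustration, here is a minimal sketch of bagged regression trees in Python, assuming scikit-learn is available; the toy data X, y and the choice B = 100 are illustrative, not taken from the slides:

```python
# Minimal bagging sketch: B unpruned regression trees fit to bootstrap samples,
# with their predictions averaged. Toy data; only illustrates the mechanism.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=n)   # illustrative nonlinear target

B = 100
trees, boot_indices = [], []
for b in range(B):
    idx = rng.integers(0, n, size=n)        # bootstrap sample: draw n obs with replacement
    tree = DecisionTreeRegressor()          # grown deep and left unpruned (low bias, high variance)
    tree.fit(X[idx], y[idx])
    trees.append(tree)
    boot_indices.append(idx)

# Bagged prediction: the average of the B trees' predictions
f_bag = np.mean([tree.predict(X) for tree in trees], axis=0)
```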
Out-of-Bag Error Estimation
It turns out that there is a very easy way to estimate the test error of a bagged model, without using CV

One can show that, on average, each bootstrap sample (and thus each bagged tree) makes use of around 2/3 of the observations (why?)
- The remaining 1/3 are referred to as the out-of-bag (OOB) observations

Then we can predict the response for the $i$-th observation using each of the trees in which that observation was OOB
- This will yield around $B/3$ predictions for the $i$-th observation
- Average them (for regression) or take a majority vote (for classification) to obtain a single prediction for observation $i$
- Then calculate the overall OOB MSE (for regression) or classification error (for classification), as sketched below
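The "around 2/3" comes from the fact that a bootstrap sample of size $n$ leaves out a given observation with probability $(1 - 1/n)^n \approx e^{-1} \approx 1/3$. A sketch of the OOB MSE computation, reusing trees, boot_indices, X, y, n, and B from the bagging sketch above:

```python
# OOB error sketch: each observation is predicted only by trees whose bootstrap
# sample did not contain it, and those predictions are averaged.
oob_sum = np.zeros(n)      # running sum of OOB predictions for each observation
oob_count = np.zeros(n)    # number of trees for which each observation was OOB

for tree, idx in zip(trees, boot_indices):
    in_bag = np.zeros(n, dtype=bool)
    in_bag[idx] = True
    oob = ~in_bag                          # observations left out of this tree's bootstrap sample
    oob_sum[oob] += tree.predict(X[oob])
    oob_count[oob] += 1

has_oob = oob_count > 0                    # guard against obs that were never OOB
oob_pred = oob_sum[has_oob] / oob_count[has_oob]
oob_mse = np.mean((y[has_oob] - oob_pred) ** 2)   # overall OOB MSE
```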
Random Forests
Random forests provide an improvement over bagged trees by way of decorrelating the trees
Similar to the bagging procedure, but each time a split in a tree is considered,
- a random sample of $m$ predictors is chosen as split candidates from the full set of $p$ predictors (typically $m \approx \sqrt{p}$)
Why "decorrelating"?
- If there is one very strong predictor in the dataset, most of the trees will use this predictor in the top split, and all the bagged trees will look similar
- Hence the predictions from the bagged trees will be highly correlated
- Averaging such highly correlated predictions does not lead to a large reduction in variance
Random forests are particularly helpful when we have a large number of correlated predictors; see the sketch below
- e.g., a high-dimensional biological dataset, such as gene expression data (Figure 8.10)
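A sketch of the bagging-versus-random-forest comparison using scikit-learn's RandomForestRegressor, where max_features controls how many predictors are split candidates at each node; X and y are the toy data from the bagging sketch above:

```python
# With max_features=None every split considers all p predictors (this is bagging);
# with max_features="sqrt" each split considers only about sqrt(p) of them.
from sklearn.ensemble import RandomForestRegressor

bag = RandomForestRegressor(n_estimators=100, max_features=None,
                            oob_score=True, random_state=0).fit(X, y)
rf = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                           oob_score=True, random_state=0).fit(X, y)

print(bag.oob_score_, rf.oob_score_)   # OOB R^2 of each ensemble
```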
Boosting
Another approach for improving the performance of decision trees
Like bagging, boosting is a general approach that can be applied to
many SL methods
Boosting is similar to bagging, but the trees are grown sequentially
- Each tree is grown using information from previously grown trees
- No bootstrap sampling is involved
- Instead, each tree $\hat{f}^b$ is fit on a modified version of the original dataset: the current residuals $r_i$, which are updated after each tree is fit via

$$r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)$$

where $\lambda$ is a small shrinkage parameter (see the sketch below)
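A minimal sketch of this residual-update algorithm for boosted regression trees; the tuning values B, lam ($\lambda$), and the tree depth d are illustrative, and X, y are the toy data from the bagging sketch above:

```python
# Boosting sketch: fit small trees sequentially to the current residuals and
# shrink each tree's contribution by lambda before updating the residuals.
from sklearn.tree import DecisionTreeRegressor

B, lam, d = 500, 0.01, 1
r = y.copy()                                   # residuals start at the raw responses
boost_trees = []
for b in range(B):
    tree = DecisionTreeRegressor(max_depth=d)  # a small (depth-d) tree fit to the residuals
    tree.fit(X, r)
    r -= lam * tree.predict(X)                 # r_i <- r_i - lambda * f_hat^b(x_i)
    boost_trees.append(tree)

# Boosted prediction: the sum of the shrunken trees
f_boost = lam * sum(tree.predict(X) for tree in boost_trees)
```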
Boosting: see Figure 8.11