
Advanced Econometrics

Professor: Sukjin Han

November 13, 2018

Lecture 14 - Roadmap

Tree-based methods, continued
- Bagging
- Random forests
- Boosting
Ch. 8 of ISL

Aggregating Trees

The goal is to construct more powerful prediction models
- Bagging
- Random forests
- Boosting

Bagging

In Lecture 5, we learned how to use the bootstrap to, e.g., calculate
standard errors
Here we use it in a totally different context
The decision trees discussed in Lecture 13 suffer from high variance
- e.g., if we split the training data into two halves and fit a decision tree to
  each half, we get quite different results
Bootstrap aggregation (bagging)
- A general-purpose procedure for reducing the variance of a SL method
- Particularly useful in the context of decision trees
Intuition:
- Given a set of n independent obs's Z_1, ..., Z_n, each with variance σ²,
  the variance of the mean Z̄ of the obs's is given by σ²/n

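A quick numerical check of the σ²/n intuition (a minimal sketch; the values of n, σ², and the number of replications are arbitrary choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 50, 4.0, 100_000   # illustrative values (not from the lecture)

# Draw `reps` independent samples of n obs's, each obs with variance sigma2,
# and compare the variance of the sample mean across samples with sigma2 / n.
Z = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
print(Z.mean(axis=1).var())   # close to sigma2 / n = 0.08
```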
Bagging
So, in order to reduce the variance and thus increase the prediction
accuracy of a SL method,
- take many training sets from the population
- build separate prediction models, and average the resulting predictions

That is, calculate f̂^1(x), f̂^2(x), ..., f̂^B(x) using B training sets, and
average them to obtain a single low-variance SL model

    f̂_avg(x) = (1/B) Σ_{b=1}^{B} f̂^b(x)

- Not practical, since we generally do not have access to multiple
  training sets
- Instead, we can bootstrap!

Bagging:

    f̂_bag(x) = (1/B) Σ_{b=1}^{B} f̂^{*b}(x)

where f̂^{*b} is the model fit to the b-th bootstrapped training set
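A minimal sketch of f̂_bag for regression trees, using scikit-learn's DecisionTreeRegressor and a hand-rolled bootstrap loop; the simulated data and B = 100 are illustrative assumptions, not from the lecture:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Illustrative training data (assumed); any regression sample (X, y) would do
n = 200
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

B = 100
trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)      # bootstrap sample (drawn with replacement)
    tree = DecisionTreeRegressor()        # grown deep, not pruned: low bias, high variance
    trees.append(tree.fit(X[idx], y[idx]))

def f_bag(x_new):
    """Bagged prediction: average of the B bootstrap-tree predictions."""
    preds = np.column_stack([t.predict(x_new) for t in trees])
    return preds.mean(axis=1)

print(f_bag(np.array([[0.5]])))           # bagged prediction at x = 0.5
```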
Bagging
Bagging is particularly useful for decision trees
We construct B regression trees using B bootstrapped training sets,
and average the resulting predictions
- Trees are not pruned, so each of them has high variance but low bias
- Averaging these B trees reduces the variance

So far, bagging in regression

How to extend bagging to a classification problem?
- For a given test obs, record the class predicted by each of the B trees,
  and take the majority vote, i.e., the overall prediction is the most
  commonly occurring class among the B predictions
- Figure 8.8
  - The choice of B is not critical, and a large value does not cause
    overfitting

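A sketch of the classification version with an explicit majority vote; again the simulated data and B = 100 are assumptions for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Illustrative binary classification data (assumed)
n = 300
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

B = 100
trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)      # bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def predict_majority(x_new):
    """Overall prediction: the most common class among the B per-tree predictions."""
    votes = np.column_stack([t.predict(x_new) for t in trees]).astype(int)
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=2)
    return counts.argmax(axis=1)          # majority vote for each test obs

print(predict_majority(np.array([[0.2, -0.1]])))
```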
Out-of-Bag Error Estimation

It turns out that there is a very easy way to estimate the test error of a
bagged model, without using CV
One can show that, on average, each bootstrap sample (and thus each
bagged tree) makes use of around 2/3 of the obs's (why?)
- The remaining 1/3 are referred to as the out-of-bag (OOB) obs's
Then we can predict the response for the i-th obs using each of the
trees in which that obs was OOB
- This will yield around B/3 predictions for the i-th obs
- Average them (for regression) or take a majority vote (for
  classification) to obtain a single prediction for i
- Then calculate the overall OOB MSE (for regression) or classification
  error (for classification)

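A sketch of OOB error estimation that tracks, for each bootstrapped tree, which obs's were left out; the simulated data and B are assumed for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Same illustrative regression data as before (assumed)
n = 200
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

B = 200
oob_sum = np.zeros(n)     # running sum of OOB predictions for each obs
oob_count = np.zeros(n)   # number of trees for which each obs was OOB (roughly B/3)

for b in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap sample uses ~2/3 of the obs's
    oob = np.setdiff1d(np.arange(n), idx)       # the remaining ~1/3 are the OOB obs's
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    oob_sum[oob] += tree.predict(X[oob])        # predict obs i only with trees where i was OOB
    oob_count[oob] += 1

oob_pred = oob_sum / oob_count                  # average the ~B/3 predictions per obs
print("OOB MSE:", np.mean((y - oob_pred) ** 2))
print("average OOB appearances per obs:", oob_count.mean())
```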
Bagging

One drawback of bagging is lack of interpretability
We can still calculate a measure of variable importance
- Record the decrease in the RSS (or the Gini index) due to splits over a
  given predictor, averaged over all B trees
- Figure 8.9

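A sketch of this variable-importance measure in scikit-learn, which exposes it as feature_importances_ (the impurity decrease from splits on each predictor, averaged over the trees and normalized); setting max_features=None makes every predictor a split candidate, i.e., plain bagging. The simulated data below are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative data (assumed): only the first two of five predictors matter
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

# max_features=None: all p predictors are split candidates at every split,
# so this is plain bagging of (unpruned) regression trees
bag = RandomForestRegressor(n_estimators=200, max_features=None,
                            random_state=0).fit(X, y)

# Impurity decrease (RSS reduction, for regression) from splits on each
# predictor, averaged over the trees and normalized to sum to one
print(bag.feature_importances_)
```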
Random Forests
Random forests provide an improvement over bagged trees by way of
decorrelating the trees
Similar to the bagging procedure, but each time a split in a tree is
considered,
- a random sample of m predictors is chosen as split candidates from the
  full set of p predictors (typically m ≈ √p)
Why "decorrelating"?
- If there is one very strong predictor in the dataset, most of the trees
  will use this predictor in the top split, and all the bagged trees will look
  similar
- Hence the predictions from the bagged trees will be highly correlated
- Averaging highly correlated predictions does not lead to a large reduction in variance
Random forests are particularly helpful when we have a large number
of correlated predictors
- e.g., high-dimensional biological datasets, such as gene expression data (Figure 8.10)
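In scikit-learn the random-forest tweak amounts to the max_features argument; a sketch, reusing the illustrative simulation from the variable-importance example above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

# Each split considers a fresh random sample of about sqrt(p) predictors,
# which decorrelates the trees relative to bagging (max_features=None)
rf = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                           random_state=0).fit(X, y)
print(rf.predict(X[:3]))
```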
Boosting
Another approach for improving the performance of decision trees
Like bagging, boosting is a general approach that can be applied to
many SL methods
Boosting is similar to bagging, but the trees are grown sequentially
- Each tree is grown using info from previously grown trees
- No bootstrap sampling is involved
- Instead, each tree is fit on a modified version of the original dataset

The goal is to combine a large number of decision trees, f̂^1, ..., f̂^B,
- by "learning slowly"
- In general, SL methods that learn slowly tend to perform well
Given the current model, we fit a tree using the residuals from that
model, rather than the outcome Y, as the response
- Then we add this new tree into the fitted function in order to update
  the residuals
Boosting
Algorithm:
1. Set f̂(x) = 0 and r_i = y_i for all i in the training set
2. For b = 1, ..., B, repeat:
   a. Fit a tree f̂^b with d splits (d + 1 terminal nodes) to the training data (X, r)
   b. Update f̂ by adding in a shrunken version of the new tree:

          f̂(x) ← f̂(x) + λ f̂^b(x)

   c. Update the residuals,

          r_i ← r_i − λ f̂^b(x_i)

3. Output the boosted model,

          f̂(x) = Σ_{b=1}^{B} λ f̂^b(x)

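A sketch that follows the three steps of the algorithm literally, using regression stumps; the simulated data and the values of B, λ, and d are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

B, lam, d = 1000, 0.01, 1            # number of trees, shrinkage, splits per tree
r = y.copy()                         # step 1: f_hat = 0, residuals r_i = y_i
trees = []

for b in range(B):                   # step 2
    tree = DecisionTreeRegressor(max_leaf_nodes=d + 1)   # d splits => d + 1 terminal nodes
    tree.fit(X, r)                   # 2a: fit to (X, r), not to the outcome y
    trees.append(tree)
    r -= lam * tree.predict(X)       # 2b/2c: add the shrunken tree, update the residuals

def f_boost(x_new):
    """Step 3: the boosted model, sum over b of lambda * f^b(x)."""
    return lam * np.sum([t.predict(x_new) for t in trees], axis=0)

print(f_boost(np.array([[0.5]])))
```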
Boosting

Three tuning parameters:
- The number of trees B: unlike the previous methods, boosting can
  overfit if B is too large
- The shrinkage parameter λ ≥ 0: typically 0.01 or 0.001 (small λ
  requires large B)
- The number d of splits in each tree: often d = 1 works well

Figure 8.11

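These three parameters map onto n_estimators, learning_rate, and max_depth in scikit-learn's GradientBoostingRegressor, which with squared-error loss implements essentially the residual-fitting scheme above; a minimal sketch with assumed data and parameter values:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

boost = GradientBoostingRegressor(
    n_estimators=1000,    # B: too large a value can overfit
    learning_rate=0.01,   # lambda: small shrinkage, so a large B is needed
    max_depth=1,          # d = 1: each tree is a stump
).fit(X, y)

print(boost.predict([[0.5]]))
```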
