
Overfitting in decision trees

Emily Fox & Carlos Guestrin

Machine Learning Specialization

University of Washington

2015-2016 Emily Fox & Carlos Guestrin Machine Learning Specialization

Review of loan default prediction

Loan applications enter a review system, which classifies each application as Safe or Risky.

Decision tree review

[Figure: example decision tree. Start at the root and split on Credit (excellent / fair / poor); an excellent application is predicted Safe, while fair and poor applications are routed through further splits on Income (high / low) and Term (3 years / 5 years) down to Safe or Risky leaf predictions.]

Overfitting review

Overfitting in logistic regression

As model complexity grows (e.g., polynomial features of degree 6 vs. degree 20), training error keeps decreasing while true error eventually rises, and the model makes overconfident predictions. A learned classifier ŵ overfits if there exists another classifier w* such that:

training_error(w*) > training_error(ŵ)
true_error(w*) < true_error(ŵ)

Overfitting in decision trees

Decision stump (depth 1): split on x[1]. The root contains 18 points of one class and 13 of the other; after the split on x[1], the two child nodes contain 13/3 and 4/11.

What happens when we increase depth? The decision boundary becomes more complicated, and deeper trees achieve lower training error.

Training error keeps decreasing with tree depth; at depth 10 the training error reaches 0.0. Training error = 0: is this model perfect? NOT PERFECT: zero training error does not mean the model predicts well on new data.


Why does training error decrease with depth?

At the root: 22 Safe, 18 Risky. Splitting on Credit gives child counts of 9/0 (excellent), 9/4 (good), and 4/14 (fair).

Tree            | Training error
(root)          | 0.45
split on credit | 0.20

The error improved by 0.25 because of the split.
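The numbers above follow from majority-vote prediction at each node; a quick sketch to verify them (counts taken from the slide):

```python
# Majority-class training error: the minority count at each node is misclassified.
def node_errors(counts):
    """counts: list of (safe, risky) pairs, one per node."""
    total = sum(s + r for s, r in counts)
    mistakes = sum(min(s, r) for s, r in counts)
    return mistakes / total

root = [(22, 18)]                        # 22 Safe, 18 Risky at the root
after_split = [(9, 0), (9, 4), (4, 14)]  # excellent, good, fair

print(node_errors(root))         # 0.45
print(node_errors(after_split))  # 0.20
```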


Feature split selection algorithm

Given a subset of data M (a node in the tree):

For each feature h_i(x):
1. Split the data of M according to feature h_i(x)
2. Compute the classification error of the split

Choose the feature h*(x) with the lowest classification error.

By design, each split never increases training error.
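A minimal sketch of this selection rule, assuming binary features stored as dicts with a label key "y" (the data layout and names are illustrative, not from the course):

```python
def classification_error(rows):
    """Error of majority-vote prediction on a set of labeled rows."""
    if not rows:
        return 0.0
    positives = sum(r["y"] for r in rows)
    return min(positives, len(rows) - positives) / len(rows)

def best_split(rows, features):
    """Pick the feature whose split yields the lowest weighted child error."""
    best_feature, best_error = None, float("inf")
    for f in features:
        left = [r for r in rows if r[f]]
        right = [r for r in rows if not r[f]]
        # Weighted classification error of the two children.
        err = (len(left) * classification_error(left) +
               len(right) * classification_error(right)) / len(rows)
        if err < best_error:
            best_feature, best_error = f, err
    return best_feature, best_error

data = [
    {"high_income": 1, "good_credit": 1, "y": 1},
    {"high_income": 1, "good_credit": 0, "y": 1},
    {"high_income": 0, "good_credit": 1, "y": 1},
    {"high_income": 0, "good_credit": 0, "y": 0},
]
print(best_split(data, ["high_income", "good_credit"]))  # ('high_income', 0.25)
```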

Decision trees overfitting on loan data

Principle of Occam's razor: simpler trees are better.

"Among competing hypotheses, the one with the fewest assumptions should be selected." - William of Occam, 13th century

Example. Symptoms: S1 and S2.
- Diagnosis 1 (2 diseases): two diseases D1 and D2, where D1 explains S1 and D2 explains S2.
- Diagnosis 2 (1 disease): a single disease D3 explains both symptoms S1 and S2.
Diagnosis 2 is SIMPLER.

Occam's razor for decision trees

When two trees have similar classification error on the validation set, pick the simpler one.

Tree          | Training error | Validation error
Simple        | 0.23           | 0.24
Moderate      | 0.12           | 0.15
Complex       | 0.07           | 0.15  (same validation error as Moderate)
Super complex | 0.00           | 0.18  (overfit)


Which tree is simpler?

[Figure: two candidate trees compared side by side; the one with fewer splits is labeled SIMPLER.]

Modified tree learning problem

Find a simple decision tree with low classification error.

How do we pick simpler trees?
1. Early stopping: stop the learning algorithm before the tree becomes too complex.
2. Pruning: simplify the tree after the learning algorithm terminates.

Early stopping for learning decision trees

Deeper trees mean increasing complexity.

Early stopping condition 1: limit the depth of the tree. Should we restrict tree learning to shallow trees?

[Plot: classification error vs. tree depth, from simple trees to complex trees. Training error keeps decreasing with depth while true error is U-shaped; max_depth marks a cutoff on the depth axis.]

Early stopping condition 1: stop growing the tree once depth = max_depth.

Picking a value for max_depth: use a validation set or cross-validation.
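That selection can be sketched as a simple validation loop; this version uses scikit-learn's DecisionTreeClassifier on synthetic data (the dataset, noise level, and depth range are illustrative assumptions, not from the course):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic data: label depends on the sign of x0, with ~10% label noise.
X = rng.normal(size=(400, 2))
y = ((X[:, 0] > 0) ^ (rng.random(400) < 0.1)).astype(int)
X_train, y_train = X[:300], y[:300]
X_val, y_val = X[300:], y[300:]

# Try each depth; keep the one with lowest validation error.
best_depth, best_err = None, 1.0
for d in range(1, 11):
    model = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_train, y_train)
    val_err = 1.0 - model.score(X_val, y_val)
    if val_err < best_err:
        best_depth, best_err = d, val_err
print(best_depth, best_err)
```

Deeper trees will drive training error down by fitting the label noise, but the validation error typically favors a shallow depth here.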

Early stopping condition 2:

Use classification error to

limit depth of tree

Decision tree recursion review

Loan status at the root: 22 Safe, 18 Risky. Splitting on Credit gives child counts of 16/0 (excellent), 1/2 (fair), and 5/16 (poor). The excellent node is pure, so predict Safe; then build a decision stump with the subset of data where Credit = fair, and another with the subset where Credit = poor.

Split selection for credit = poor

In the Credit = poor node (5 Safe, 16 Risky), no split improves classification error. Stop!

Splits for credit = poor | Classification error
(no split)               | 0.24
split on term            | 0.24
split on income          | 0.24

Early stopping condition 2: no split improves classification error

At the root: 22 Safe, 18 Risky; splitting on Credit gives 16/0 (excellent, a Safe leaf), 1/2 (fair), and 5/16 (poor).

Splits for credit = poor | Classification error
(no split)               | 0.24
split on term            | 0.24
split on income          | 0.24

Since no split improves on 0.24, early stopping condition 2 fires: the Credit = poor node becomes a Risky leaf. Meanwhile, a decision stump is still built with the subset of data where Credit = fair.

Practical notes about stopping when classification error doesn't decrease:

- Typically, stop if the error doesn't decrease by more than some small tolerance ε.

Early stopping condition 3: stop if the number of data points contained in a node is too small

Can we trust nodes with very few points? At the root: 22 Safe, 18 Risky; splitting on Credit gives 16/0 (excellent, Safe), 1/2 (fair), and 5/16 (poor, Risky). The fair node holds only 3 points, too few to split reliably.

Early stopping condition 3: stop when the number of data points in a node <= N_min (example: N_min = 10)

At the root: 22 Safe, 18 Risky; after the Credit split, the fair node (1 Safe, 2 Risky, only 3 points) triggers early stopping condition 3 and becomes a leaf.

Summary of decision trees with early stopping

Early stopping: summary
1. Stop growing at a certain depth.
2. Skip any split that does not cause a sufficient decrease in classification error.
3. Do not split an intermediate node which contains too few data points.

Greedy decision tree learning

Step 1: Start with an empty tree.
Step 2: Select a feature to split the data.
For each split of the tree:
  Step 3: If there is nothing more to do (stopping conditions 1 & 2, or early stopping conditions 1, 2 & 3), make predictions.
  Step 4: Otherwise, go to Step 2 and continue (recurse) on this split.
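The greedy recursion, with all three early stopping conditions wired in, can be sketched as follows (the data layout and parameter names are illustrative, not from the course):

```python
def error(rows):
    """Majority-vote classification error on rows of {feature: 0/1, "y": 0/1}."""
    if not rows:
        return 0.0
    pos = sum(r["y"] for r in rows)
    return min(pos, len(rows) - pos) / len(rows)

def majority(rows):
    return int(sum(r["y"] for r in rows) * 2 >= len(rows))

def build_tree(rows, features, depth=0, max_depth=3, min_error_gain=0.0, n_min=1):
    leaf = {"leaf": majority(rows)}
    # Stopping: pure node or no features left; early stopping conditions 1 and 3.
    if error(rows) == 0.0 or not features or depth >= max_depth or len(rows) <= n_min:
        return leaf
    # Greedy split selection: feature with lowest weighted child error.
    def split_error(f):
        left = [r for r in rows if r[f]]
        right = [r for r in rows if not r[f]]
        return (len(left) * error(left) + len(right) * error(right)) / len(rows)
    f = min(features, key=split_error)
    # Early stopping condition 2: split must improve error by more than min_error_gain.
    if error(rows) - split_error(f) <= min_error_gain:
        return leaf
    rest = [g for g in features if g != f]
    return {"split": f,
            "yes": build_tree([r for r in rows if r[f]], rest, depth + 1,
                              max_depth, min_error_gain, n_min),
            "no": build_tree([r for r in rows if not r[f]], rest, depth + 1,
                             max_depth, min_error_gain, n_min)}

def predict(tree, row):
    while "split" in tree:
        tree = tree["yes"] if row[tree["split"]] else tree["no"]
    return tree["leaf"]

data = [{"a": 1, "y": 1}, {"a": 1, "y": 1}, {"a": 0, "y": 0}, {"a": 0, "y": 0}]
tree = build_tree(data, ["a"])
print(tree)
```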

Overfitting in Decision Trees: Pruning (OPTIONAL)

Stopping condition summary

Stopping conditions:
1. All examples have the same target value.
2. No more features to split on.

Early stopping conditions:
1. Limit tree depth.
2. Do not consider splits that do not cause a sufficient decrease in classification error.
3. Do not split an intermediate node which contains too few data points.

Exploring some challenges with early stopping conditions

Challenge with early stopping condition 1: it is hard to know exactly when to stop.

[Plot: classification error vs. tree depth; training error keeps falling from simple to complex trees while true error is U-shaped, so the right max_depth cutoff is not obvious.]

Is early stopping condition 2 a good idea?

[Plot: classification error vs. tree depth; the algorithm stops because of a zero decrease in classification error, even though deeper splits would have helped.]

Early stopping condition 2: don't stop if error doesn't decrease???

Consider y = x[1] xor x[2]:

x[1]  | x[2]  | y
False | False | False
False | True  | True
True  | False | True
True  | True  | False

At the root there are 2 True and 2 False values, so Error = 2/4 = 0.5.

Consider a split on x[1]: each child node contains 1 True and 1 False, so the error is unchanged.

Tree          | Classification error
(root)        | 0.5
Split on x[1] | 0.5

Consider a split on x[2]: again each child node contains 1 True and 1 False, and the error stays at 0.5.

Tree          | Classification error
(root)        | 0.5
Split on x[1] | 0.5
Split on x[2] | 0.5

Stop now??? Under early stopping condition 2, neither split improves the error, so learning stops at the root.

Final tree with early stopping condition 2

With condition 2, neither split is taken, so the final tree is just the root, predicting a constant value.

Tree                            | Classification error
with early stopping condition 2 | 0.5

Without early stopping condition 2

Split on x[1] (each child has 1 True and 1 False), then split each child on x[2]; the four resulting leaves predict False, True, True, False and classify every example correctly.

Tree                               | Classification error
with early stopping condition 2    | 0.5
without early stopping condition 2 | 0.0
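The xor example is easy to reproduce in code; a self-contained sketch using greedy, error-based splitting (illustrative code, not from the course):

```python
from itertools import product

# y = x1 xor x2: the four points of the truth table.
data = [((x1, x2), x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

def error(rows):
    """Majority-vote classification error."""
    if not rows:
        return 0.0
    pos = sum(y for _, y in rows)
    return min(pos, len(rows) - pos) / len(rows)

def split_error(rows, i):
    """Weighted error after splitting on feature i."""
    left = [r for r in rows if r[0][i]]
    right = [r for r in rows if not r[0][i]]
    return (len(left) * error(left) + len(right) * error(right)) / len(rows)

print(error(data))           # 0.5 at the root
print(split_error(data, 0))  # 0.5: splitting on x1 does not help...
print(split_error(data, 1))  # 0.5: ...nor does splitting on x2

# ...but splitting on x1 AND THEN on x2 separates the classes perfectly:
left = [r for r in data if r[0][0]]
right = [r for r in data if not r[0][0]]
perfect = (len(left) * split_error(left, 1) +
           len(right) * split_error(right, 1)) / len(data)
print(perfect)               # 0.0
```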

Early stopping condition 2: pros and cons

Pros:
- A reasonable heuristic for early stopping to avoid useless splits.

Cons:
- Too short-sighted: we may miss good splits that occur right after useless splits.

Tree pruning

Two approaches to picking simpler trees:
1. Early stopping: stop the learning algorithm before the tree becomes too complex.
2. Pruning: simplify the tree after the learning algorithm terminates.

Pruning: intuition

Train a complex tree, then simplify it later: Complex Tree -> (simplify) -> Simpler Tree.

Pruning motivation: [Plot: classification error vs. tree depth; training error keeps falling from simple to complex trees while true error rises again. Rather than stopping too early, let the tree grow and simplify it after it is built.]

Example 1: Which tree is simpler?

[Figure: two trees over Credit (excellent / fair / poor). The first continues with splits on Income and Term (3 years / 5 years); the second stops right after the Credit split. The second tree is simpler.]

Example 2: Which tree is simpler???

[Figure: two depth-1 trees. One splits on Credit with branches excellent / good / fair / bad / poor; the other splits on Term with branches 3 years / 5 years, ending in Safe / Risky leaves.]

Simple measure of complexity of tree

L(T) = number of leaf nodes in tree T. For example, the stump that splits on Credit (excellent -> Safe, poor -> Risky) has L(T) = 2.

Which tree has lower L(T)? Comparing the two trees from Example 2: the Credit tree has L(T1) = 5 and the Term tree has L(T2) = 2, so T2 is SIMPLER.

Balance simplicity & predictive power

Too complex, risk of overfitting: a deep tree splitting on Credit, then on Income and Term (3 years / 5 years) at several levels.

Too simple, high classification error: a single split on Term (3 years -> Risky, 5 years -> Safe).

Desired total quality format

Want to balance:
i.  How well the tree fits the data
ii. Complexity of the tree

Total cost = measure of fit + measure of complexity

The measure of fit is the classification error: a large value means a bad fit to the training data. A large measure of complexity means the tree is likely to overfit.

Consider specific total cost

Total cost = classification error + λ · (number of leaf nodes):

C(T) = Error(T) + λ L(T)

Balancing fit and complexity with the tuning parameter λ:
- If λ = 0: standard decision tree learning; pick the tree with the lowest training error.
- If λ = ∞: only complexity matters; the best tree is a single leaf (the root).
- If λ is in between: balance fit and complexity, trading training error against tree size.

Use total cost to simplify trees: Complex tree -> (total-quality-based pruning) -> Simpler tree.

Tree pruning algorithm

Pruning intuition: [Figure: tree T over Credit (excellent / fair / poor), with further splits on Income and Term (3 years / 5 years) leading to Risky / Safe leaves.]

Step 1: Consider a split

[Figure: in tree T, the bottom Term? split (3 years -> Risky, 5 years -> Safe) is marked as a candidate for pruning.]

Step 2: Compute total cost C(T) of the tree containing the split

With λ = 0.03 and C(T) = Error(T) + λ L(T):

Tree | Error | #Leaves | Total
T    | 0.25  | 6       | 0.43

(0.25 + 0.03 × 6 = 0.43.)

Step 3: Undo the split to get T_smaller

Replace the candidate split by a leaf node and recompute the cost:

Tree      | Error | #Leaves | Total
T         | 0.25  | 6       | 0.43
T_smaller | 0.26  | 5       | 0.41

(0.26 + 0.03 × 5 = 0.41.)

Step 4: Prune if total cost is lower: C(T_smaller) ≤ C(T)

T_smaller has worse training error (0.26 vs. 0.25) but lower overall cost (0.41 vs. 0.43), so replace the split by a leaf node: YES, prune!

Step 5: Repeat Steps 1-4 for every split

Traverse the tree and decide, for each split, whether it can be pruned.

Decision tree pruning algorithm

Start at the bottom of tree T and traverse up, applying prune_split to each decision node M:

prune_split(T, M):
1. Compute the total cost of tree T: C(T) = Error(T) + λ L(T)
2. Let T_smaller be the tree after pruning the subtree below M
3. Compute the total cost of T_smaller: C(T_smaller) = Error(T_smaller) + λ L(T_smaller)
4. If C(T_smaller) < C(T), prune T to T_smaller
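The algorithm can be sketched bottom-up over a nested-dict tree representation (the representation, the helper names, and the simplification of evaluating cost on the data reaching each node are assumptions for illustration, not from the course):

```python
LAMBDA = 0.03  # tuning parameter λ (value borrowed from the slide example)

def num_leaves(tree):
    if "leaf" in tree:
        return 1
    return num_leaves(tree["yes"]) + num_leaves(tree["no"])

def predict(tree, row):
    while "split" in tree:
        tree = tree["yes"] if row[tree["split"]] else tree["no"]
    return tree["leaf"]

def tree_error(tree, rows):
    return sum(predict(tree, r) != r["y"] for r in rows) / len(rows)

def total_cost(tree, rows):
    """C(T) = Error(T) + lambda * L(T), with the error taken on `rows`."""
    return tree_error(tree, rows) + LAMBDA * num_leaves(tree)

def majority(rows):
    return int(sum(r["y"] for r in rows) * 2 >= len(rows))

def prune(tree, rows):
    """Bottom-up: prune children first, then try replacing this split by a leaf.
    Costs are evaluated on the data reaching each node, a local simplification."""
    if "leaf" in tree:
        return tree
    tree = {"split": tree["split"],
            "yes": prune(tree["yes"], [r for r in rows if r[tree["split"]]]),
            "no": prune(tree["no"], [r for r in rows if not r[tree["split"]]])}
    smaller = {"leaf": majority(rows)}
    # Keep the smaller tree whenever its total cost is lower.
    return smaller if total_cost(smaller, rows) < total_cost(tree, rows) else tree

# A useless split (both leaves predict 1) collapses to a single leaf:
useless = {"split": "term", "yes": {"leaf": 1}, "no": {"leaf": 1}}
rows = [{"term": 1, "y": 1}, {"term": 0, "y": 1}]
print(prune(useless, rows))
```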

Summary of overfitting in decision trees

What you can do now:

- Identify when overfitting occurs in decision trees.
- Prevent overfitting with early stopping:
  - Limit tree depth.
  - Do not consider splits that do not reduce classification error.
  - Do not split intermediate nodes with only few points.
- Prevent overfitting by pruning complex trees:
  - Use a total cost formula that balances classification error and tree complexity.
  - Use total cost to merge potentially complex trees into simpler ones.

Thank you to Dr. Krishna Sridhar, Staff Data Scientist, Dato, Inc.
