Represent rules
Rules can be expressed in English
IF Age <=43 & Gender = Male
& Credit Card Insurance = No
THEN Life Insurance Promotion = No
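The rule above maps directly onto a conditional. The following is a minimal sketch; the function name, argument names, and the choice to return None when the rule does not fire are assumptions made for illustration only.

```python
def life_insurance_promotion(age, gender, credit_card_insurance):
    """Apply the single rule from the slide.

    Returns "No" when the rule fires and None when the rule is silent
    (the slide gives no prediction for the other cases).
    """
    if age <= 43 and gender == "Male" and credit_card_insurance == "No":
        return "No"
    return None


# Hypothetical customer used only to exercise the rule.
prediction = life_insurance_promotion(40, "Male", "No")
```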
Target:
default on a home-equity line of credit (BAD)
Inputs:
The inputs include the amount of the loan, the amount due on the
existing mortgage, the value of the property, and the number of
recent credit inquiries.
HMEQ Dataset
Presume that every two dollars loaned eventually returns three dollars
if the loan is paid off in full.
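The arithmetic behind this presumption can be sketched as follows. A loan paid off in full gains one dollar for every two dollars loaned. The sketch additionally assumes that a defaulted loan loses the full two dollars, which the slide does not state; the function and parameter names are hypothetical.

```python
def expected_profit(p_bad, gain_if_good=1.0, loss_if_bad=2.0):
    """Expected profit per two dollars loaned, given the probability of default.

    gain_if_good: 3 returned minus 2 loaned (from the slide).
    loss_if_bad: ASSUMPTION, the whole 2 dollars is lost on default.
    """
    return (1.0 - p_bad) * gain_if_good - p_bad * loss_if_bad


def break_even_default_rate(gain_if_good=1.0, loss_if_bad=2.0):
    """Default probability at which the expected profit is exactly zero."""
    return gain_if_good / (gain_if_good + loss_if_bad)


threshold = break_even_default_rate()
profit_at_ten_percent = expected_profit(0.10)
```

Under these assumptions a loan is worth making only when the predicted probability of BAD is below one third.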
The Cultivation of Trees
Split Search
Which splits are to be considered?
Splitting Criterion
Which split is best?
Stopping Rule
When should the splitting stop?
Pruning Rule
Should some branches be lopped off?
Decision Trees
Start at the root node (top) and follow the rules until a terminal
node (leaf) is reached.
The tree is fitted to the data by recursive partitioning. Partitioning
refers to segmenting the data into subgroups that are as homogeneous as
possible with respect to the target. In this case, the binary split
(Debt-to-Income Ratio < 45) was chosen. The 5,000 cases were split into
two groups, one with a 5% BAD rate and the other with a 21% BAD rate.
[Figure: root node, n = 5,000, 10% BAD, split on Debt-to-Income Ratio < 45; yes branch, n = 3,350, 5% BAD; no branch, n = 1,650, 21% BAD]
The method is recursive because each subgroup results from splitting a
subgroup from a previous split. Thus, the 3,350 cases in the left child node
and the 1,650 cases in the right child node are split again in similar
fashion.
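Recursive partitioning can be sketched in a few lines. This is an illustration, not the SAS Enterprise Miner algorithm: it assumes numeric inputs, a 0/1 target, Gini impurity as the splitting criterion, and a maximum depth as the only stopping rule. All names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Try every (feature, threshold) pair; return (gain, feature, threshold) or None."""
    n = len(rows)
    parent = gini(labels)
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Split a node, then split each child in the same way (the recursion)."""
    node = {"n": len(labels), "bad_rate": sum(labels) / len(labels)}
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None or split[0] <= 0.0:
        return node  # leaf: no split improves purity
    _, j, t = split
    yes = [i for i, r in enumerate(rows) if r[j] < t]
    no = [i for i, r in enumerate(rows) if r[j] >= t]
    node["feature"], node["threshold"] = j, t
    node["yes"] = grow([rows[i] for i in yes], [labels[i] for i in yes], depth + 1, max_depth)
    node["no"] = grow([rows[i] for i in no], [labels[i] for i in no], depth + 1, max_depth)
    return node


# Toy data: one input (a debt-to-income ratio) and a 0/1 BAD flag.
rows = [[30], [35], [40], [50], [55], [60]]
labels = [0, 0, 0, 1, 1, 0]
tree = grow(rows, labels)
```

Each call to grow handles one node, and the two recursive calls handle its children, which mirrors the description above of splitting the left and right child nodes again in similar fashion.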
Splitting Criteria
Split: Debt-to-Income Ratio < 45
           Left    Right   Total
Not Bad    3196    1304    4500
Bad         154     346     500
Split Criteria
The best split is defined as the one that does the best job of separating
the data into groups where a single class predominates in each
group.
The best split is the one that increases the purity of the subsets by the
greatest amount.
A good split also creates nodes of similar size, or at least does not
create very small nodes.
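The purity gain can be worked through on the Debt-to-Income Ratio < 45 counts given earlier (Left: 3196 Not Bad, 154 Bad; Right: 1304 Not Bad, 346 Bad). The sketch below uses Gini impurity as the purity measure, which is one common choice and an assumption here, since the slides do not name a specific measure. Variable names are hypothetical.

```python
def gini_from_counts(not_bad, bad):
    """Gini impurity of a node from its class counts."""
    n = not_bad + bad
    p = bad / n
    return 2.0 * p * (1.0 - p)


# Counts taken from the Splitting Criteria table.
left_not_bad, left_bad = 3196, 154
right_not_bad, right_bad = 1304, 346

n_left = left_not_bad + left_bad
n_right = right_not_bad + right_bad
n_total = n_left + n_right

# Impurity before the split (all 5,000 cases in one node).
parent_gini = gini_from_counts(left_not_bad + right_not_bad, left_bad + right_bad)

# Impurity after the split: size-weighted average of the two child nodes.
child_gini = (
    n_left * gini_from_counts(left_not_bad, left_bad)
    + n_right * gini_from_counts(right_not_bad, right_bad)
) / n_total

# A positive reduction means the children are purer than the parent.
gini_reduction = parent_gini - child_gini

left_bad_rate = left_bad / n_left
right_bad_rate = right_bad / n_right
```

A split search would compute this reduction for every candidate split and keep the largest one.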
Tests for Choosing Best Split
Easy to understand
Interpretability
Tree-structured presentation
Handling of Outliers
Stunting
Pruning
SAS Process Flow
Input Data Source Node: Variables
Decision Tree Node: Results
Lift Charts
% Response Chart: Cumulative
% Response Chart: Non-Cumulative
Lift Value: Cumulative
Lift Value: Non-Cumulative
% Captured Response: Cumulative
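The three cumulative quantities named in the chart titles above can be computed from scored cases, as in this sketch. It uses the standard definitions (cases ranked by predicted probability, statistics taken over the top fraction); the function name, the depth parameter, and the toy scores are hypothetical and do not come from the SAS output.

```python
def cumulative_chart_values(scores, targets, depth):
    """Return (% response, lift, % captured response) for the top `depth` fraction.

    scores: predicted probability of the event for each case.
    targets: actual 0/1 outcome for each case.
    depth: fraction of cases to keep after ranking, e.g. 0.2 for the top 20%.
    """
    ranked = [y for _, y in sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)]
    k = max(1, round(depth * len(ranked)))
    hits = sum(ranked[:k])            # events found in the top-ranked cases
    total = sum(ranked)               # events in the whole data set
    response = hits / k               # % response
    baseline = total / len(ranked)    # overall event rate
    lift = response / baseline        # lift value
    captured = hits / total           # % captured response
    return response, lift, captured


# Toy scores and outcomes for ten cases, used only for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
response, lift, captured = cumulative_chart_values(scores, targets, depth=0.2)
```

The non-cumulative versions apply the same ratios to each band of ranked cases separately rather than to everything above a cutoff.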
Consequences of a Decision
Decision 1 Decision 0
Consequences of a Decision: Profit matrix (SAS EM)
Decision 1 Decision 0
Profit
NEXT TOPIC: Clustering for Segmentation