http://info.salford-systems.com/jsm-2015-ctw
August 2015
Salford Systems
Course Outline
Demonstration of two classification examples in SPM
o Bank Marketing
o KDD Cup 2009
The learn-sample ROC is 0.94, which should prompt you to examine whether the
model is too good to be true.
The gap between the learn and test ROC values tells us that time does have an impact.
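The learn-versus-test comparison above can be sketched outside SPM as follows. This is a minimal illustration, not SPM's own computation: the helper `roc_auc` and the toy labels and scores are assumptions invented for the example and do not come from the bank marketing data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data: earlier records (learn) and later records (test).
learn_labels = [0, 0, 0, 1, 1, 1]
learn_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
test_labels = [0, 0, 1, 0, 1, 1]
test_scores = [0.2, 0.6, 0.5, 0.3, 0.4, 0.9]

learn_auc = roc_auc(learn_labels, learn_scores)
test_auc = roc_auc(test_labels, test_scores)
gap = learn_auc - test_auc  # a large positive gap is the warning sign
print(f"learn ROC={learn_auc:.2f}  test ROC={test_auc:.2f}  gap={gap:.2f}")
```

A large gap on a time-ordered split suggests that the relationship between predictors and target shifts over time, or that the model has fit noise in the learn sample.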
Copyright Salford Systems 2013
LOGIT Model Coefficients
The learn and test samples perform quite differently under this model, which
means that time is a factor influencing the outcome.
The learn-sample performance also looks too good to be true.
Variable Importance
Duration: this attribute strongly affects the target (e.g., if duration=0
then y='no'). However, the duration is not known before a call is made.
CART gives an initial look at which variables are important; this is useful when
there are many predictors in your dataset.
Root Node Split Very Effective
We can view node details by clicking Tree Details in the CART output window.
The first splitter is month, which the variable importance ranking table also
shows as the most influential predictor.
The whole tree can be viewed with details as well.
[Figure: two panels plotting MV against LSTAT]
Build MARS Model
This output window shows you the number of basis functions in the model
against the performance of the model. Because MARS is a regression engine, the
MSE and R-squared values will still be reported, but can be ignored here.
Summary
Here is where the logistic regression equation is laid out in terms of the basis
functions (transformations of the predictors). Each basis function is
described and the final model is listed at the bottom. This form of output is
especially desired by those who are comfortable with standard regression.
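To make the idea of a logistic equation written in basis functions concrete, here is a minimal sketch. It assumes the standard MARS hinge form max(0, x - knot) and its mirror image; the knot, coefficients, and intercept are hypothetical values chosen for illustration and are not taken from the SPM output.

```python
import math


def hinge(x, knot, direction=1):
    """MARS-style hinge: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def predict_probability(x, intercept, terms):
    """Logistic model on hinge basis functions.

    terms is a list of (coefficient, knot, direction) tuples.
    """
    logit = intercept + sum(c * hinge(x, k, d) for c, k, d in terms)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical model with one predictor and a single knot at 30.
intercept = -2.0
terms = [(0.05, 30.0, 1), (-0.10, 30.0, -1)]

for x in (20, 30, 40):
    print(x, round(predict_probability(x, intercept, terms), 4))
```

Each basis function is zero on one side of its knot, so the fitted logit is piecewise linear in the predictor, which is why the output reads like an ordinary regression equation.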
MARS Plots
The Output window shows a graph of the number of trees in the ensemble
with its corresponding ROC value. The vertical green bar denotes the model
with the optimal ROC: 9 trees at 0.69.
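The selection marked by the green bar amounts to picking the ensemble size with the highest ROC. A minimal sketch follows; only the point (9 trees, 0.69) comes from the slide, and the other values in the hypothetical `roc_by_trees` curve are invented to make the example runnable.

```python
def best_ensemble_size(roc_by_trees):
    """Return (n_trees, roc) with the highest ROC; ties go to the smaller ensemble."""
    return max(roc_by_trees.items(), key=lambda kv: (kv[1], -kv[0]))


# Hypothetical ROC curve; only the entry for 9 trees is from the slide.
roc_by_trees = {1: 0.61, 3: 0.65, 6: 0.68, 9: 0.69, 12: 0.688, 15: 0.685}

n_trees, roc = best_ensemble_size(roc_by_trees)
print(f"optimal model: {n_trees} trees at ROC {roc}")
```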
Using TreeNet for targeted marketing improves on random calling and gives you
an idea of how the predictors affect subscription.
Random Forests
Ensemble of trees built on bootstrap samples
Algorithm:
o Each tree is grown on a bootstrap sample from the learning data
o During tree growing, only P randomly selected predictors are tried at each
node
o By default, P is the square root of the total number of predictors
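The two sources of randomness in the algorithm above can be sketched with the standard library alone. This is an illustration, not SPM's implementation: the function names are hypothetical, and rounding the square root to the nearest integer is an assumption, since implementations differ on how they round.

```python
import math
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_predictors(predictors, rng, n_try=None):
    """Pick the predictors to try at one node, without replacement.

    Defaults to sqrt(total predictors), rounded (the rounding rule is assumed).
    """
    if n_try is None:
        n_try = max(1, round(math.sqrt(len(predictors))))
    return rng.sample(predictors, n_try)


rng = random.Random(0)
predictors = [f"x{i}" for i in range(16)]  # hypothetical predictor names

rows = bootstrap_sample(100, rng)               # rows used to grow one tree
tried = candidate_predictors(predictors, rng)   # predictors tried at one node
print(len(set(rows)), "distinct rows in the bootstrap sample")
print("predictors tried at this node:", tried)
```

A fresh predictor subset is drawn at every node, while the bootstrap sample is drawn once per tree.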
A separate test dataset is provided in the competition, but true target values
were not included
For model-building, we will use a 20% random partition of the training dataset
to monitor performance
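A 20% random holdout like the one described can be sketched as follows. The function name `split_learn_test`, the row count, and the seed are assumptions for illustration; SPM performs this partition through its own settings.

```python
import random


def split_learn_test(n_rows, test_fraction=0.2, seed=0):
    """Randomly partition row indices into (learn, test) index lists."""
    rng = random.Random(seed)  # fixed seed makes the partition repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical training set of 1,000 rows.
learn_idx, test_idx = split_learn_test(1000)
print(len(learn_idx), "learn rows,", len(test_idx), "test rows")
```

Changing the seed changes which rows land in the test partition, which is one reason results can differ between runs.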
We are interested in looking at the CART ranking of important predictors
By forcing the tree to only one split, we can quickly create a tree to access
this information
We cannot compare against the true target values because only the competition
judges saw them
However, we are confident in our results (2 of the above groups used SPM)
Results can vary with the optimal-model selection criterion, random number
seed, etc.