IBM Research
Two parts:
1. Introduction to the use of Data Mining in marketing applications
   (Collaborator: Naoki Abe)
   - What are the problems we address?
   - Comparison of Data Mining and Marketing Science approaches
   - Some of the challenges for Data Mining approaches
2. Customer Wallet and Opportunity Estimation: Analytical Approaches and Applications
   (Collaborators: Claudia Perlich, Rick Lawrence, Srujana Merugu and others)
   - Define the problem
   - Describe analytic solutions
   - Demonstrate performance in real application
Channel optimization
Churn analysis
CRM analytics:
- Relies on primary research (= surveys) to understand needs and wants
- Relies on (more or less) detailed models of customer behavior
- Usually parametric statistical models
Data mining:
- Typically relies on data in a Data Warehouse/Mart
- Uses a minimum of parametric assumptions
- Often attempts to fit the problem into a standard modeling framework: classification, regression, clustering...
Comparison of approaches
[Table: columns Criterion, Marketing, DM; rows not recovered]
[Figure: diagram linking Marketing investment, Pulling levers, Increased equity, Costs, and Return on marketing investment]
Main goals:
- Identify relevant levers
- Quantify their effect
Data definitions
- Potential drivers (marketing activities) are reflected in the components of x_i
  - Price
  - Quality of service
  - etc.
Driver        Coefficient   Std error   Z score (coeff/std)
Inertia       .849          .075        11.34
Quality       .441          .041        10.87
Price         .199          .020         9.86
Convenience   .609          .093         6.56
...           ...           ...          ...
Comparison of approaches
[Table: columns Criterion, Marketing, DM; rows not recovered]
An integrated approach
[Figure: modeling tasks positioned by Time (Now, Next year, Future) and by Revenue (Actual sales, Potential sales): Sales / revenue modeling, Sales forecasting, LTV modeling, Wallet estimation]
2006 IBM Corporation
[Figure: Revenue modeling, Revenue forecast, LTV modeling, Wallet estimation and Lever identification arranged between "Basic concepts" and "Real insight", along the scales Correlation / Causality / Actionability and Passive / Active. Annotation: "Understand effect of potential actions on LTV and Wallet attainment"]
Causes
Dynamics
Effects on company
Marketing campaigns
Cost to company
- Bayesian Networks
  - Motivation: need to address the causality vs. correlation issue; need to formalize domain knowledge about relationships in data
  - Example domain: Customer wallet estimation
[Tables: example cost/benefit matrices indexed by True class vs. Predicted class (bad, good); surviving entries: -C, Profit - C, -Default Amt, Interest]
need be estimated!
- Problem: requires conditional density estimation and regression to solve a classification problem
  - The price is high computational and sample complexity
- Merit: more flexibility and general applicability
  - Business constraints
  - Variability in fixed costs
But is it necessary?
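[Algorithm listing, only partly recovered: steps (2) and (3) each begin "For all ..."; the main loop reads "For t=1 to T do" and includes step (d) f_t = Stochastic(h_i); the output combines h(x, y) over t = 1, ..., T]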
The difference between the current average cost and the cost associated with a particular label is the boosting weight.
[Figure: at learning iteration t, Cost C(x,y) per Label y compared against the Ave Cost E[C(x,y)]; legend: Predicted, Training Labels]
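A minimal sketch of the weight described above, for a single example x. It assumes that the "current average cost" is the expectation of C(x, y) under the label probabilities of the current hypothesis; the function and argument names are illustrative and do not come from the slides.

```python
def boosting_weights(costs, label_probs):
    """Weight of each label y for one example x: average cost minus C(x, y).

    costs[y]       -- cost C(x, y) of predicting label y for this example
    label_probs[y] -- probability the current hypothesis assigns to y
                      (assumption: these sum to 1)
    """
    average_cost = sum(p * c for p, c in zip(label_probs, costs))
    return [average_cost - c for c in costs]


# Three labels with costs 0, 4 and 10; the current hypothesis favors the costly label.
weights = boosting_weights(costs=[0.0, 4.0, 10.0], label_probs=[0.2, 0.3, 0.5])
```

Labels that are cheaper than the current average receive a positive weight, and labels that are more expensive receive a negative one.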
Cost-sensitive boosting outperforms existing methods of cost-sensitive learning as well as classification and regression

            Existing methods
Data Set    Bagging       AvgCost     MetaCost      GBSE
Annealing   1059 ± 174    127 ± 12    207 ± 42      34 ± 4
Solar       5403 ± 397    237 ± 38    5317 ± 390    48 ± 10
KDD-99      319 ± 42      42 ± 8      49 ± 9        2 ± 1
Letter      151 ± 3       92 ± 1      130 ± 2       85 ± 2
Splice      64 ± 5        61 ± 4      50 ± 3        58 ± 4
Satellite   190 ± 10      108 ± 6     104 ± 6       93 ± 6
Active/Query Learning
[Figure: diagram with Domain, Learner and Training Sample]
Query by Committee
- Main idea: query the points at which the agent algorithms disagree the most (to maximize information gain)
- Weakness: the theory requires an ideal (Gibbs) agent learner, and is generally not computationally feasible
[Figure: idealized agents (randomized algorithms) are trained on the input sample; the agents predict on randomly selected points; maximize uncertainty by querying a point with maximum spread]
Query by Bagging/Boosting
[Figure: base learner A is applied to the Input Sample to produce the agents (h2, ..., hT shown)]
It has been observed that active learning can drastically accelerate the rate of learning (e.g. 10- to 100-fold) over passive learning.
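As a rough illustration of the committee idea, the sketch below builds the agents by bagging: one base learner is retrained on bootstrap resamples of the labeled sample, and the pool point on which the agents' votes are most evenly split is queried. The base learner (a 1-D threshold rule), the disagreement measure, and all names are assumptions chosen for brevity, not details from the slides.

```python
import random


def train_stump(sample):
    """Base learner A: threshold rule 'predict 1 if x > thr' with the lowest training error."""
    xs = sorted({x for x, _ in sample})
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

    def errors(thr):
        return sum((x > thr) != bool(y) for x, y in sample)

    return min(candidates, key=errors)


def query_by_bagging(labeled, pool, n_agents=25, seed=0):
    """Return the pool point on which the bagged agents disagree the most."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_agents):
        resample = [rng.choice(labeled) for _ in labeled]  # bootstrap resample
        thresholds.append(train_stump(resample))

    def disagreement(x):
        votes_for_one = sum(x > thr for thr in thresholds)
        return min(votes_for_one, n_agents - votes_for_one)

    return max(pool, key=disagreement)


labeled = [(0.0, 0), (1.0, 0), (2.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
pool = [4.5, 0.5, 8.5]
query = query_by_bagging(labeled, pool)
```

Points deep inside either class get unanimous votes, so the query falls in the unlabeled gap between the classes, where a new label is most informative.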
R = \sum_{t=0}^{\infty} \gamma^t U(s_t, a_t)
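A small worked example of the cumulative reward, assuming the symbol lost in extraction is a discount factor gamma and truncating the sum to a finite list of observed rewards. The function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * U(s_t, a_t) over an observed reward sequence (finite horizon)."""
    return sum((gamma ** t) * u for t, u in enumerate(rewards))


# A campaign that costs 1 now and pays 3 in each of the next two periods.
value = discounted_return([-1.0, 3.0, 3.0], gamma=0.5)
```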
- The retailer's goal is to take a sequence of actions that guides the customer's path so as to maximize the customer's lifetime value
[Figure: customer state diagram with states One Timer, Bargain Hunter, Repeater, Potentially Valuable, Valuable Customer, Loyal Customer and Defector; transitions labelled Campaign A, Campaign B, Campaign D, Campaign E]
- Observed lifetime value reflects only the lifetime value that customers attain under the current marketing policy, and therefore fails to capture their potential lifetime value
- MDP-based lifetime value modeling allows modeling of lifetime value under an optimized marketing policy (= the output of the system!)
[Figure: customer state diagram with states One Timer, Bargain Hunter, Repeater, Potentially Valuable, Valuable Customer, Loyal Customer and Defector; transitions labelled Campaign A through Campaign E]
[Figure: customer state diagram (Repeater, Potentially Valuable, Valuable Customer, Loyal Customer) with transitions labelled Rule b, Rule c, Rule d]
Q_0(s, a) = U(s, a)
Q_{k+1}(s, a) = (1 - \alpha) Q_k(s, a) + \alpha (U(s, a) + \gamma \max_{a'} Q_k(s', a'))
Estimate using function approximation (regression)
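The update above can be sketched on a batch of logged transitions. For brevity this sketch stores Q in a table rather than fitting a regression model, which is a simplification and not the function approximation the slide refers to. The learning rate alpha, the discount gamma, the state and action names, and the reward values are all assumed for illustration.

```python
from collections import defaultdict


def batch_q_learning(transitions, actions, alpha=0.5, gamma=0.9, n_sweeps=500):
    """transitions: list of (s, a, u, s_next) tuples; returns a table Q[(s, a)]."""
    q = defaultdict(float)
    for s, a, u, _ in transitions:          # Q_0(s, a) = U(s, a)
        q[(s, a)] = u
    for _ in range(n_sweeps):
        new_q = dict(q)
        for s, a, u, s_next in transitions:
            target = u + gamma * max(q[(s_next, b)] for b in actions)
            new_q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        q = defaultdict(float, new_q)
    return q


actions = ["mail", "none"]
transitions = [
    ("one_timer", "mail", -1.0, "repeater"),   # mailing costs money now...
    ("one_timer", "none", 0.0, "one_timer"),
    ("repeater", "mail", -1.0, "loyal"),       # ...but moves the customer forward
    ("repeater", "none", 1.0, "repeater"),
    ("loyal", "mail", 4.0, "loyal"),
    ("loyal", "none", 5.0, "loyal"),
]
q = batch_q_learning(transitions, actions)
```

In this toy data, mailing a one-time buyer has a negative immediate reward but the higher Q value, because it leads toward the more profitable states.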
The output policy of the MDP approach (CCOM) invests in initial campaigns to yield greater long-term profits
[Figure: chart comparing the Single and CCOM policies by campaign number; vertical axis from 0 to 80000]
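[Figure: Bayesian network over Economy (E), Marketing (M), Competition (C) and Revenue (R), with one probability table per node; the two columns of each table sum to 1 in every row]
P(E):  0.3  0.7
P(M), two rows:  0.3  0.7
                 0.9  0.1
P(C), two rows:  0.4  0.6
                 0.7  0.3
P(R), rows indexed by M C:
  F F:  0.3  0.7
  T F:  0.9  0.1
  F T:  0.2  0.8
  T T:  0.6  0.4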
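To make the tables concrete, here is a sketch of inference by enumeration over the four binary variables. Several readings of the extracted tables are assumptions, not facts from the slides: Economy is taken as the only parent of both Marketing and Competition, the first column of each table is read as the probability that the variable is true, and the first row of each two-row table is read as Economy = false.

```python
from itertools import product

# Assumed reading of the slide's tables (see the caveats above).
P_E = 0.3
P_M_GIVEN_E = {False: 0.3, True: 0.9}
P_C_GIVEN_E = {False: 0.4, True: 0.7}
P_R_GIVEN_MC = {(False, False): 0.3, (True, False): 0.9,
                (False, True): 0.2, (True, True): 0.6}


def bern(p_true, value):
    """Probability of a boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(e, m, c, r):
    """Joint probability factorized along the network structure."""
    return (bern(P_E, e) * bern(P_M_GIVEN_E[e], m)
            * bern(P_C_GIVEN_E[e], c) * bern(P_R_GIVEN_MC[(m, c)], r))


def probability(query, evidence=None):
    """P(query | evidence), both given as dicts such as {"R": True}."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for e, m, c, r in product([False, True], repeat=4):
        world = {"E": e, "M": m, "C": c, "R": r}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(e, m, c, r)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


p_revenue = probability({"R": True})
p_revenue_given_marketing = probability({"R": True}, {"M": True})
```

Conditioning on Marketing here is purely observational. It mixes the effect of Marketing with that of the Economy, which is the correlation vs. causality issue the slides raise.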
[Figure: network structures for special cases: Class with Variable 1 ... Variable N; Unobserved with Variable 1 ... Variable N (Clustering/Mixture); a chain of State nodes, each with a Symbol]
An Example Bayesian Network
[Figure: several network structures over Economy, Marketing, Competition and Revenue]
Causal Pattern
[Figure: two network structures over Economy, Marketing, Competition and Revenue]
This pattern shows that the causal relationships among E, M, and C are ambiguous
[Figure: network structures over Economy, Marketing, Competition and Revenue]
It can be inferred that Marketing can be ...
[Figure: cross-channel example with Direct Mail (reminder) and Store ($)]
Variable                      action    reward
FULL_LINE_STORE_OF_RES.       0.018     0.004
NON_FL_STORE_OF_RES.          0.012    -0.004
(label lost)                  0.065     0.090
CUR_DIV_PURCHASE_AMT_2_3M     0.099     0.080
CUR_DIV_PURCHASE_AMT_4_6M     0.133     0.091
CUR_DIV_PURCHASE_AMT_1Y       0.162     0.128
CUR_DIV_PURCHASE_AMT_TOT      0.153     0.147
(label lost)                  0.294     0.028
CUR_DIV_N_CATS_2_3M           0.260     0.025
CUR_DIV_N_CATS_4_6M           0.158     0.062
CUR_DIV_N_CATS_TOT            0.062     ...
ACTION (Control Variable)     1.000     0.008
(label lost)                  0.008     1.000
The Challenge: No explicit linking between actions in one channel (mailing) and rewards in another (revenue)
- Very low correlation observed between actions and responses
- Other factors determining lifetime value may dominate the control variable (marketing action) in the estimation of expected value
- The models obtained can be independent of the action and give rise to useless rules!
The Cross-Channel Solution: Learn the relative advantage of competing actions!
[Figure: value of actions a1 and a2 in states s1 and s2, under the Standard Method and under the (advantage) Approximation]
Definition of Advantage
Repeat
  1. Learn
     1.1. A(s,a) := (1 - \alpha) A(s,a) + \alpha (A_{max}(s) + (R(s,a) + \gamma^{\Delta t} V(s') - V(s)) / \Delta t)
     1.2. Use regression to estimate A(s,a)
     1.3. V(s) := (1 - \beta) V(s) + \beta (V(s) + (A_{max-new}(s) - A_{max-old}(s)) / \alpha)
  2. Normalize
     A(s,a) := (1 - \omega) A(s,a) + \omega (A(s,a) - A_{max}(s))
Modifications: 1. Initialization with empirical lifetime value
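The learn and normalize steps can be sketched with lookup tables in place of the regression of step 1.2, which is a deliberate simplification. The rates alpha, beta and omega, the discount gamma, the time step dt, and the state and action names are assumed readings of symbols that were garbled in extraction; treat them as illustrative.

```python
from collections import defaultdict


def learn_step(A, V, s, a, reward, s_next, actions,
               alpha=0.1, beta=0.1, gamma=0.95, dt=1.0):
    """Steps 1.1 and 1.3 for one observed transition (tables instead of regression)."""
    a_max_old = max(A[(s, b)] for b in actions)
    td = (reward + gamma ** dt * V[s_next] - V[s]) / dt
    A[(s, a)] = (1 - alpha) * A[(s, a)] + alpha * (a_max_old + td)               # 1.1
    a_max_new = max(A[(s, b)] for b in actions)
    V[s] = (1 - beta) * V[s] + beta * (V[s] + (a_max_new - a_max_old) / alpha)   # 1.3


def normalize_step(A, s, a, actions, omega=0.1):
    """Step 2: pull A(s, a) toward A(s, a) - A_max(s), so the best action tends to 0."""
    a_max = max(A[(s, b)] for b in actions)
    A[(s, a)] = (1 - omega) * A[(s, a)] + omega * (A[(s, a)] - a_max)


actions = ["a1", "a2"]
A, V = defaultdict(float), defaultdict(float)
learn_step(A, V, "s1", "a1", reward=1.0, s_next="s2", actions=actions)
normalize_step(A, "s1", "a2", actions)
```

After these two calls the rewarded action a1 has a higher advantage than a2 in state s1, which is the relative information the cross-channel approach is meant to learn.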
Evaluation Results
[Figure: two panels plotting Policy Advantage (percentage) against Learning iterations]
Evaluation Method
- Challenge in evaluation: need to evaluate a new policy using data collected by the existing (sampling) policy
- Solution: use bias-corrected estimation of policy advantage using data collected by the sampling policy
- Definition of policy advantage:
  (Discrete Time) Advantage
    A_{\pi}(s, a) := Q_{\pi}(s, a) - \max_{a'} Q_{\pi}(s, a')
  Policy Advantage
    A_{s \sim \pi}(\pi') := E_{\pi}[E_{a \sim \pi'}[A_{\pi}(s, a)]]
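One common way to correct the bias of data gathered under a sampling policy is importance weighting, sketched below. The slides do not say which correction was used, so this choice, together with the function names and the toy numbers, is an assumption.

```python
def policy_advantage(samples, new_policy, sampling_policy, advantage):
    """Importance-weighted estimate of E_s[E_{a ~ new}[A(s, a)]].

    samples         -- (state, action) pairs logged under the sampling policy
    new_policy      -- function (s, a) -> probability under the policy being evaluated
    sampling_policy -- function (s, a) -> probability under the logging policy (must be > 0)
    advantage       -- function (s, a) -> estimated advantage A(s, a)
    """
    total = 0.0
    for s, a in samples:
        weight = new_policy(s, a) / sampling_policy(s, a)
        total += weight * advantage(s, a)
    return total / len(samples)


# Toy data: the logging policy chose uniformly between two actions.
samples = [("s", "a1"), ("s", "a2")]
uniform = lambda s, a: 0.5
always_a2 = lambda s, a: 1.0 if a == "a2" else 0.0
adv = lambda s, a: {"a1": 0.0, "a2": -2.0}[a]

estimate = policy_advantage(samples, always_a2, uniform, adv)
```

The weights discard the logged a1 decision and double-count the a2 decision, so the estimate reflects what the new policy would have done.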
Customer Identifier
Profile History Date
Period Identifier
Product Category Identifier
Channel Identifier
Aggregated Count of Event
Aggregated Revenue
Aggregated Profit
Transaction
Customer
Customer Identifier
Transaction Date
Product Category Identifier
Event Identifier
Channel Identifier
Transaction Revenue
Transaction Profit
Customer Identifier
First Name
Last Name
Age
Gender
Event Identifier
Customer Identifier
Marketing Action Date
Marketing Action
Channel Identifier
Channel Description
Marketing Event
Event
Product Category
Event Identifier
Product Category Identifier
Weight
Model Identifier
Model Type
Model
Model Identifier
Model Type
Model
Event Identifier
Channel Identifier
Event Date
Event Category Description
Fixed Cost
CCOM Output Models
Optional Entity
Outline
- Wallet estimation: problems and solutions
  - The different wallet definitions
  - How can we evaluate wallet models?
  - Modeling approaches
  - Empirical evaluation
Company Revenue
IT Wallet
IBM Sales
OnTarget
MAP
Historical Approaches
Agenda
- Introduction and analytical issues
  - Different wallet definitions
Total Wallet
Served Wallet
Realistic
s = \beta_1 x + \beta_2 r + \beta_3 z + \varepsilon, \varepsilon \sim N(0, \sigma^2)
E(s|r,x,z)
Realistic
L_p(y, \hat{y}) = \begin{cases} p\,(y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p)\,(\hat{y} - y) & \text{if } \hat{y} > y \end{cases}
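A short sketch of this loss, with a check of the property that makes it useful here: the constant prediction that minimizes the total loss over a sample is a p-th quantile of that sample. The helper names are illustrative, and the example uses p = 0.75 rather than 0.8 only so that the minimizer is unique.

```python
def quantile_loss(y, y_hat, p):
    """Under-prediction is charged p per unit, over-prediction (1 - p) per unit."""
    if y >= y_hat:
        return p * (y - y_hat)
    return (1 - p) * (y_hat - y)


def best_constant(ys, p):
    """Sample value that minimizes the total quantile loss when used as a constant prediction."""
    return min(ys, key=lambda c: sum(quantile_loss(y, c, p) for y in ys))


sales = list(range(1, 11))               # toy observations 1, 2, ..., 10
estimate = best_constant(sales, p=0.75)  # lands on a high quantile, not the mean
```

With p above 0.5, under-predicting is penalized more than over-predicting, which pushes the estimate toward the upper part of the distribution.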
[Figure: quantile loss for p=0.8 plotted against the Residual (observed - predicted)]
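K-Nearest Neighbors
- Distance metric: industry match; normalization
[Figure: target company i and its neighbors within the universe of IBM customers with D&B information, plotted by Employees, Revenue and Industry; histogram of the neighbors' IBM Sales (Frequency) with the Wallet Estimate marked]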
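The nearest-neighbor idea can be sketched as follows: restrict to companies in the same industry, rank them by similarity in size, and read the wallet estimate off the distribution of the neighbors' IBM sales. The log-scale distance, the quantile level p, the index rule for the quantile, and the field names are all assumptions made for illustration; the slides do not give these details.

```python
import math


def knn_wallet(target, customers, k=20, p=0.8):
    """Wallet estimate for `target` as a quantile of its neighbors' IBM sales."""
    same_industry = [c for c in customers if c["industry"] == target["industry"]]
    if not same_industry:
        raise ValueError("no customers with a matching industry")

    def distance(c):
        # Assumed normalization: compare company sizes on a log scale.
        return math.hypot(math.log(c["employees"]) - math.log(target["employees"]),
                          math.log(c["revenue"]) - math.log(target["revenue"]))

    neighbors = sorted(same_industry, key=distance)[:k]
    sales = sorted(c["ibm_sales"] for c in neighbors)
    index = min(len(sales) - 1, int(p * len(sales)))
    return sales[index]


customers = [
    {"industry": "bank", "employees": 900, "revenue": 90.0, "ibm_sales": 10.0},
    {"industry": "bank", "employees": 1000, "revenue": 100.0, "ibm_sales": 20.0},
    {"industry": "bank", "employees": 1100, "revenue": 110.0, "ibm_sales": 30.0},
    {"industry": "bank", "employees": 1200, "revenue": 120.0, "ibm_sales": 40.0},
    {"industry": "bank", "employees": 1300, "revenue": 130.0, "ibm_sales": 50.0},
    {"industry": "retail", "employees": 1000, "revenue": 100.0, "ibm_sales": 999.0},
]
target = {"industry": "bank", "employees": 1050, "revenue": 105.0}
wallet = knn_wallet(target, customers, k=5, p=0.5)
```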
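Quantile Regression
- Traditional Regression:
  Estimation of the conditional expected value by minimizing the sum of squares
    \min_{\beta} \sum_{i=1}^{n} (y_i - f(x_i, \beta))^2
- Quantile Regression:
  Minimize the sum of quantile losses
    \min_{\beta} \sum_{i=1}^{n} L_p(y_i, f(x_i, \beta))
    L_p(y, \hat{y}) = \begin{cases} p\,(y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p)\,(\hat{y} - y) & \text{if } \hat{y} > y \end{cases}
- Implementation: assume a linear function; solve by linear programming
[Figure: quantile regression loss function]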
[Figure: decision tree with yes/no splits (example split: Sales<100K); each leaf shows a histogram of IBM Sales (Frequency) with the Wallet Estimate marked]
- Baselines
  - Constant model
  - Traditional regression models for expected values (for skewed distributions, the expected value is actually a high quantile)
- Conclusions
  - If there is a time-lagged variable, the linear quantile model is best
  - Quanting (using decision trees) and the quantile tree perform comparably
  - Generalized kNN is not competitive
- MAP Components:
  - Web interface with customer information
  - Analytical component: wallet estimates
  - Workshops with sales personnel to review and correct the wallet predictions
  - Shift of resources towards customers with lower wallet share
The MAP tool captures expert feedback from the Client Facing teams
[Figure: MAP workflow with components Data Integration (Transaction Data, D&B Data), Wallet models (Predicted Opportunity), Post-processing, Insight Delivery and Capture (Web Interface; MAP interview process, all Integrated and Aligned Coverages), Analytics and Validation (Expert-validated Opportunity), Resource Assignments]
The objective here is to use expert feedback (i.e. validated revenue opportunity) from last year's workshops to evaluate our latest opportunity models
- Neighborhood sizes 20
- Prediction
- Post-Processing
  - Floor the prediction by the max of the last 3 years' revenue
[Figure: target company i and its neighbors plotted by Employees, Revenue and Industry]
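The flooring step can be written in one line. The sketch assumes that "revenue" means the yearly revenue already realized with the customer, listed oldest first; the function and argument names are illustrative.

```python
def floor_prediction(predicted_wallet, yearly_revenue):
    """Never predict less opportunity than the best of the last three years' revenue."""
    return max(predicted_wallet, max(yearly_revenue[-3:]))


floored = floor_prediction(100.0, [50.0, 120.0, 80.0, 90.0])    # raised to 120.0
unchanged = floor_prediction(200.0, [50.0, 120.0, 80.0, 90.0])  # stays at 200.0
```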
IBM Research
Experts accept
opportunity (45%)
18
Expert Feedback
16
Increase (17%)
14
12
Experts change
opportunity (40%)
10
Decrease (23%)
8
6
4
2
0
0
10
12
14
16
18
Experts reduced
opportunity to 0
20 (15%)
MODEL_OPPTY
90
IBM Research
Observations
- Many accounts are set to zero for external reasons
  - Exclude these from evaluation, since no model can predict this
Evaluation Measures
- Different scales to avoid outlier artifacts
  - Original: e = model - expert
  - Root:     e = root(model) - root(expert)
  - Log:      e = log(model) - log(expert)
- Total of 6 criteria
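The three error scales can be computed as below. The sketch assumes "root" means the square root and "log" the natural logarithm, and that both values are strictly positive (accounts set to zero are excluded, as noted in the observations). How these errors are aggregated into the six criteria is not specified in the slides and is not attempted here.

```python
import math


def scaled_errors(model, expert):
    """Error between model and expert opportunity on the original, root and log scales."""
    return {
        "original": model - expert,
        "root": math.sqrt(model) - math.sqrt(expert),
        "log": math.log(model) - math.log(expert),
    }


errors = scaled_errors(model=400.0, expert=100.0)
```

A fourfold overestimate is an error of 300 on the original scale but only 10 on the root scale and log(4) on the log scale, which is why the transformed scales are less dominated by large accounts.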
[Table: model comparison with columns Rational, DB2, Tivoli; rows include Regression Tree, kNN 50 + flooring, Decomposition Center; annotations (Anchoring) and (Best); values not recovered]
Conclusions
[Figure: Company, IT Wallet, IT spend with IBM, Historical relationship with IBM]
References
- Marketing Science
  - R. Rust, K. Lemon and V. Zeithaml, Return on Marketing: Using Customer Equity to Focus Marketing Strategy, J. of Marketing, 2004.
  - P. Kotler, Marketing Management, Millennium Ed., Prentice-Hall, 2000.
- Cost-sensitive Learning
  - P. Domingos, MetaCost: A general method for making classifiers cost-sensitive, The 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999.
  - N. Abe, B. Zadrozny and J. Langford, An Iterative Method for Multi-class Cost-sensitive Learning, The Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004.
- Active Learning
  - H.S. Seung, M. Opper and H. Sompolinsky, Query by committee, Proceedings of the Fifth Workshop on Computational Learning Theory, 1992.
  - D. Angluin, Queries and concept learning, Machine Learning, 1988.
- Bayesian Networks and Causal Networks
  - K. Murphy, A brief introduction to Bayesian Networks and Graphical Models, http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
  - D. Heckerman, A tutorial on learning with Bayesian Networks, Microsoft Research MSR-TR-95-06, March 1995.
  - J. Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000.
  - P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction, and Search, 2nd Edition, MIT Press, 2000.
Thank you!
srosset@us.ibm.com