

Predicting customer behavior is so difficult that

companies spend millions inundating – and
alienating – customers. Here’s a way to crunch
the data that makes it possible to offer customers
what they want, when they want it.

Knowing What to Sell, When, and to Whom
by V. Kumar, Rajkumar Venkatesan, and Werner Reinartz

You are a chief marketing officer contemplating your company’s quarterly mail-shot to customers. You know that if you can get some of your customers to buy from you, then you’ll have increased the chances that they’ll come back again in the future; second-time customers are more likely to become third-time customers than first-time customers are to become second-time ones, and so on. But the mailing is an expensive proposition, and you know that in the past only about 3% of customers have actually responded to mailings by making a purchase. Shareholders and financial analysts are keeping a close eye on your company’s marketing ROI, so you need to make each contact with your customers count.

You turn to your company’s newly implemented customer relationship management (CRM) system, which tracks what each customer purchases and when. Using these data, you should be able to determine the probability that a given existing customer with a certain buying history will purchase a given product at a given time. This information should enable you not only to target the customers who are most likely to purchase something but also to tailor your offering to what is most likely to appeal to them. And it should prevent you from spending money on customers who won’t follow through (and who might actually be put off by the avalanche of unsolicited offers coming from your organization). All of this should significantly improve your ROI – the benefits from more precise targeting and the reduction in the number of mailings more than outweighing the costs of customization in this digital age.

That, at least, is the theory. Unfortunately, despite the abundance of data

march 2006 131

T O O L K I T • K n o w i n g W h at t o S e l l , W h e n , a n d t o W h o m

that many companies collect, most do a poor job predicting the behavior of their customers. In fact, our research into the purchase patterns of thousands of customers at two large firms suggests that their predictions about whether a particular customer will buy a particular product at a particular time are correct only around 60% of the time, a result that hardly justifies the costs of having a CRM system in the first place. After all, you would accurately predict the outcome of a coin toss 50% of the time. Most companies take studies like this as evidence that it’s impossible to use the past to predict the future, and they revert to the timeworn marketing practice of inundating their customers with offers.

But as we will demonstrate, the poor predictions are not the result of any basic problem with CRM systems or any failure of the predictive power of past behavior. Rather, the problem lies in the limitations of the mathematical methods most companies use to interpret the data. We have developed a new way of predicting customer behavior, based on the work of the Nobel Prize–winning economist Daniel McFadden, that delivers vastly improved results. Indeed, the new methodology ups the odds of successfully predicting a specific purchase by a specific customer at a specific time to about 80%, a number that will have a major impact on any company’s marketing ROI. Using our methodology, managers can actually increase revenues while reducing the frequency of customer contact, evidence that overcommunication does indeed damage a company’s sales.

A Problem of Probabilities

To understand why companies do such a poor job of predicting customer behavior, we must first take a closer look at the methods they use. The most common method involves two separate steps. The first step is to estimate the probability that a customer will choose to purchase a particular product. The second is to estimate the probability that a customer will make a purchase at a particular time. Most firms stop at the first step, which limits their ability to make accurate predictions about the timing of purchases, but even those companies that follow the process through end up with bad data, as we shall see.

The probability that a customer will choose to buy a particular product is assumed to be a function of a range of variables. Some of the variables will be a customer’s demographic data, some will reflect the person’s past purchasing behavior, and still other factors will have to do with the company’s actions, such as the customer’s familiarity with the brand or the nature of the company’s contact with him or her. Marketers using the traditional method determine the relative importance of the variables by looking at a sample of customers, usually those on which the firm has the richest data. Some form of regression analysis is then applied to these data to derive an equation for the desired probability, in which each variable has a coefficient, or weighting, that determines its relative importance. The equation is then used to estimate the product choice probabilities for all the customers on which the firm has enough data, the assumption being that the coefficients of the sample will remain valid for all customers in the future. What marketers get at the end of this exercise is a series of probabilities that tells them (theoretically) which customers are most likely to buy a particular product and which products a particular customer is most likely to buy. It does not, however, tell them anything about when a customer will buy.

The next step in the traditional method is to estimate the probability that a customer will make a purchase at a given time. This probability is a function of the average interval between purchases for all customers in the original sample, adjusted for a number of customer-specific variables, such as the time intervals between the most recent purchases, and how often marketing materials have been sent to each person. Historical data on these variables for a sample of customers are, once again, plugged into a form of regression analysis that produces an equation in which the relative importance of the determining variables is fixed. By feeding fresh data on these variables into the equation, marketers can derive the probability that each customer will buy a product at a particular time. This allows the marketer to determine at which times (for example, in which months) each customer is most likely to buy any of the company’s products.

The joint probability for each customer’s future purchase behavior is calculated by simply multiplying the two probabilities – which products the individual will buy and when. What marketers get from this is a probability cube whose three axes are customers, product groups, and time periods, as illustrated in the exhibit “The Customer Probability Cube.” Marketers can use the cube in various ways. They can identify what products each customer will buy over a period and when his or her purchases are most likely to take place. Or they can identify the customers who are

V. Kumar is the ING Chair Professor and the executive director of the ING Center for Financial Services at the University of Connecticut’s School of Business in Storrs. Rajkumar Venkatesan is an assistant professor and the assistant director of the ING Center for Financial Services at the University of Connecticut’s School of Business. Werner Reinartz is an associate professor at Insead in Fontainebleau, France.

132 harvard business review


most likely to buy each product and the times when the product will be most actively in demand. Using those predictions, they can determine what products to offer to which customers at which times.

All this sounds very reasonable. The relationship between a customer’s decision to purchase and the choice of product seems to be captured by the fact that product choice probabilities are factored into timing probabilities. And in many industries, sample sizes are large and customer data rich. Why then are the numbers so unreliable?

Part of the problem is that the timing of a customer’s purchase is influenced by the type of product purchased. Suppose a customer purchases product A every three months and product B every four months. Let’s further suppose that two months have elapsed since the customer’s most recent purchase, and she bought product B at that time. Clearly, this customer is right now more likely to purchase product A than product B. But the approach of multiplying the product-choice and purchase-timing probabilities from two independent regression equations completely ignores any kind of interdependence between the two probabilities. The result is poor predictions of both when a customer will make a purchase and what product the customer will buy at that time. There are statistical corrections that can deal with this common regression-analysis phenomenon, however, so it is not the main problem with the method.

There’s another source of error in the traditional method that cannot be corrected: the fact that the two probability equations are based on data from a single sample. This gives rise to sampling error, the inaccuracy of results that occurs when a population sample is used to explain the behavior of the total population. To understand how this works, consider the following simple example. Suppose you have 20 million customers, and you want to know how highly they rate your product. You probably would not ask all 20 million of them what they thought and then average all the ratings. Rather, you’d be more likely to put the question to a random sample of 1,000 customers and average their responses. You’d then use their average rating of, say, four stars as a proxy for the entire population’s average rating. The problem is, if you were to take another random sample of 1,000, you might get a three-star average rating. If you took 100 such samples, you would find that the ratings from these samples followed a normal bell-shaped distribution pattern around a mean (say, 4.1 stars) that was closest to the entire population’s true average rating. The chance that the sample you started with actually had the same mean as the population as a whole is infinitesimal. To get close to the true population mean, you would have to repeat the test 100 times or more with different samples or use a much bigger sample.

The traditional approach to estimating probabilities is vulnerable to sampling error precisely because of the implicit assumption in all regression analyses that the weightings, or coefficients, of the independent variables of the sample group are representative of the population as a whole (the population here being all your existing customers and the sample being those existing customers you’re using to determine the relationship). But since that’s highly unlikely to be the case, it follows that the relationships between customer purchase decisions and the determining variables estimated through regression analysis are bound to be inaccurate. If the sampling error is severe enough, the company using this methodology can end up choosing the wrong product to push at the wrong time to the wrong customer – and even using the wrong channels (which channels a company uses are often a big determinant of both product choice and purchase timing).

Unfortunately, most companies have no option but to rely on often relatively small samples to perform the calculations. They frequently lack enough data on all their customers to estimate meaningful relationships between the various drivers of purchasing behavior.
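The star-rating thought experiment used to explain sampling error can be tried out in a few lines of Python. This is only a rough sketch under stated assumptions: the population is scaled down to 200,000 customers so it runs quickly, and the mix of ratings is invented for illustration. It repeats the 1,000-customer survey 100 times and reports how far individual surveys stray from the true population mean.

```python
import random

random.seed(42)  # fixed seed so repeated runs give the same draws

# Hypothetical population of star ratings (1 to 5). 200,000 customers stand in
# for the article's 20 million, and the rating mix below is invented.
population = random.choices([1, 2, 3, 4, 5], weights=[5, 10, 15, 30, 40], k=200_000)
true_mean = sum(population) / len(population)


def sample_mean(ratings, sample_size):
    """Average rating of one random sample drawn without replacement."""
    sample = random.sample(ratings, sample_size)
    return sum(sample) / sample_size


# Repeat the 1,000-customer survey 100 times, as in the thought experiment.
sample_means = [sample_mean(population, 1_000) for _ in range(100)]
grand_mean = sum(sample_means) / len(sample_means)
lowest, highest = min(sample_means), max(sample_means)

print(f"true population mean:         {true_mean:.3f}")
print(f"first survey on its own:      {sample_means[0]:.3f}")
print(f"range across 100 surveys:     {lowest:.3f} to {highest:.3f}")
print(f"mean of the 100 survey means: {grand_mean:.3f}")
```

Any single survey lands somewhere inside the reported range, while the average of all 100 surveys sits much closer to the true mean. How wide the range is depends on the sample size and on how varied the ratings are.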


On top of that, the populations may simply be too big for their computers to handle – imagine trying to work with data from 1 million customers choosing every month from 1,000 products. Yet it’s precisely for companies with large customer populations that an accurate probability cube would create the most value.

Eliminating Sampling Error

So how can companies derive probabilities free of sampling error? The answer lies in a branch of statistical mathematics called Bayesian estimation. The methodology has been around for decades but is only recently entering the marketing mainstream.

Bayesian estimation gets around the problem in the following way. Rather than estimating a single weighting for each variable (as regression analysis does), the formula at the heart of this technique first specifies the range of weightings that could have produced the observed data of the sample being analyzed. Then, through an iterative chain of calculations, it allows the analyst to determine the most probable weightings for the variables involved, those that would most likely have produced the observed data. You can think of a Bayesian estimation as reproducing the dots on a scatter diagram rather than finding the best-fit line, which is what regression analysis does. This kind of calculation has greater predictive power because it reproduces the actual behavior of a sample rather than estimating a set of weightings from one sample and then assuming that those weightings are valid for the whole population.

We have built on the pioneering work of Daniel McFadden to develop a multivariate formula called a likelihood function, which can accurately compute purchase and timing probabilities for a customer population choosing from more than two products. That’s obviously important because most companies offer more than two products and many of their customers – especially those they are likely to use in a sample – will have purchased more than two different ones. While a full discussion of the mathematics of the model is beyond the scope of this article, we provide a summary description of the formula in the exhibit “Estimating the Likelihood of Purchase.” (We refer those interested in a complete exposition of our methodology to our working paper, “A Purchase Sequence Analysis Framework for Targeting Products, Customers and Time Period” and to the paper that describes the work of Daniel McFadden on which our methodology is based: “Social Science Duration Analysis,” by James Heckman and Burton Singer, in the book Longitudinal Analysis of Labor Market Data.)

Estimating and using our likelihood function requires special software such as Gauss or MATLAB. We start by plugging into the program the actual purchasing behavior of all customers in our sample, specifying what each purchased and when for a given period. Next, we input all other data we have on each customer in the sample – age, sex, average time between purchases, and so on. The program software then processes the data through our likelihood function and iteratively applies different weightings to each variable until the function approaches the range of coefficients most likely to reproduce the behaviors observed at the beginning. In other words, the software reverse-engineers the scatter diagram, in which the dots are purchases of different products at different times by different customers in the sample.

The Customer Probability Cube

To better target marketing initiatives, companies need to be able to predict what products customers will buy and when. Marketers use a probability cube to determine the probabilities of purchase along three dimensions: the customers, the products, and time. The cube shown here is for a company selling four products. The numbered cells indicate that there’s a 90% chance that in the first quarter, customer 1 will buy product 1, a 10% chance that he will buy product 2, a 60% chance that he will buy product 3, and a 20% chance that he will buy product 4. This cube also allows the firm to identify which customers are most likely to buy product 1, for example, in Q1 as well as all the products that customer 1 is likely to buy in all four quarters.

[Figure: a cube with three axes: customers (C1 through Cn), products/categories (P1 through P4), and timing in quarters (Q1 through Q4). The Q1 cells for customer C1 read 0.9, 0.1, 0.6, and 0.2 for products P1 through P4.]
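The two ways of reading the probability cube described in the exhibit can be sketched as simple lookups. The layout below (a nested dictionary keyed by customer, then quarter) is an assumption made for illustration. Only customer C1's first-quarter row (0.9, 0.1, 0.6, 0.2) comes from the exhibit, and every other number is invented.

```python
# Hypothetical probability cube: cube[customer][quarter] is a list of purchase
# probabilities, one per product. Only C1's Q1 row comes from the exhibit.
PRODUCTS = ["P1", "P2", "P3", "P4"]

cube = {
    "C1": {"Q1": [0.9, 0.1, 0.6, 0.2], "Q2": [0.2, 0.3, 0.1, 0.4]},
    "C2": {"Q1": [0.3, 0.7, 0.2, 0.1], "Q2": [0.5, 0.1, 0.6, 0.2]},
    "C3": {"Q1": [0.6, 0.2, 0.1, 0.8], "Q2": [0.1, 0.2, 0.3, 0.1]},
}


def products_for_customer(cube, customer, quarter):
    """Rank products by purchase probability for one customer in one quarter."""
    row = cube[customer][quarter]
    return sorted(zip(PRODUCTS, row), key=lambda pair: pair[1], reverse=True)


def customers_for_product(cube, product, quarter):
    """Rank customers by their probability of buying one product in one quarter."""
    j = PRODUCTS.index(product)
    ranked = [(customer, quarters[quarter][j]) for customer, quarters in cube.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


c1_q1 = products_for_customer(cube, "C1", "Q1")
p1_q1 = customers_for_product(cube, "P1", "Q1")
print("Customer C1 in Q1, most likely products first:", c1_q1)
print("Product P1 in Q1, most likely customers first:", p1_q1)
```

The first lookup answers "what should we offer this customer this quarter?", and the second answers "whom should we contact about this product this quarter?".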
Of course, for this new approach to be an improvement over traditional methods, it needs to generate more accurate results. To see if this was the case, we first applied it to a large multinational B2B company that sells high-tech products and services to professional and Fortune 500 clients. We examined three years of data (2000 through 2002) on a sample of 20,000 customers to determine coefficients for the customer variables and then applied the resulting equations to all the customers in the database to derive a probability cube covering the four quarters that started in January 2003. We looked at a range of factors related to purchase behavior (such as number of products purchased from different product categories and number of products bought within the same category) and timing (interpurchase times, for example, and frequency of marketing contact).

We then applied the traditional methodology (using the same set of customer variables) to derive a second probability cube. The probabilities obtained by the two methods produced very different numbers, as can be seen in the exhibit “How Different Are Our Numbers?” The exhibit compares the probabilities of a single, very frequent customer choosing to buy one or both of two products over a period of four quarters.

We compared the probabilities we derived by both methods with the actual observed behavior of a number of customers of our B2B firm during 2003 and 2004, and a sample of our findings is given in the exhibit “How Accurate Were We?” Our method was much better at predicting what customers would actually do than the traditional method. When our methodology, for example, predicted that a particular customer had a high probability (defined as more than 50%) of purchasing product 1 in a given quarter, in 85% of cases, the customer did indeed purchase that product. (The hit rates for buying product 2 individually and buying both products 1 and 2 together were 74% and 80% respectively.) But when the traditional methodology indicated that a customer had a high probability of purchasing product 1, the prediction was correct in only 55% of cases. Thus, our new methodology improved the B2B company’s ability to accurately predict customer behavior by about 54%. The main flaw in the traditional method is that even though it accurately predicts which products the customer will buy, it performs poorly in predicting the purchase time. The greater accuracy of our method was also reflected in the reduction of the standard deviation of our predictions. Our predictions typically varied from the outcome by 3.4 months rather than the 4.4 months of the traditional approach.

We performed the same experiment for a large corporation selling financial investment, banking, and insurance products directly to consumers. This time, we used four years of data on 10,000 customers to estimate probabilities over the fifth year. The customer variables we input were the same as those we used for the B2B company, with some adjustments reflecting the different nature of the businesses (customers can’t, for instance, return financial services). The results we got were strikingly similar to the B2B case. Our model predicted actual purchases correctly 71% to 89% of the time, which compared favorably with a hit rate of between 58% and 65% for the traditional model. On average, the improvement in performance for the proposed model is about 33% compared with the traditional model. The average deviation in purchase timing was about 3.1 months for our method compared with 4.2 months for the traditional method.

Estimating the Likelihood of Purchase

At the heart of our method for predicting customer behavior is what we call the likelihood function. The function estimates the likelihood ($L_i$) that a customer or household ($i$) will purchase a given product at a given time:

$$L_i = \prod_{r_i=1}^{R_i} \left[ \prod_{j=1}^{J} f_i(t, j)^{\delta_{ijt}} \right]^{C_{r_i}} S_i(t)^{1 - C_{r_i}}$$

where:

$R_i$ is the number of interpurchase times for customer or household $i$;

$C_{r_i} = 0$ if the $r_i$th interpurchase time extends beyond the observation window, and $C_{r_i} = 1$ otherwise;

$\delta_{ijt} = 1$ if product $j$ is bought by customer or household $i$ at time $t$, and $\delta_{ijt} = 0$ otherwise; the probability that $\delta_{ijt} = 1$ is $P_{ij}(t)$, and the probability that $\delta_{ijt} = 0$ is $1 - P_{ij}(t)$;

and $f_i(\cdot)$ and $S_i(\cdot)$ denote the density and survivor functions, respectively.

The term involving $S_i(t)$ accounts for right censoring of the data, because the end of the data collection period usually does not coincide with a purchase for all households. Consequently, this term does not depend on $P_{ij}(t)$. Standard maximum likelihood methods can be used to estimate the model parameters. We have implemented the model-estimation procedure in a Gauss program.
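To make the structure of the likelihood function more concrete, here is a deliberately simplified sketch for a single customer. It is not the authors' Gauss program. It assumes, purely for illustration, that interpurchase times are exponentially distributed with a constant purchase rate, and that the joint density factors into a timing density multiplied by a fixed product-choice probability. The purchase history and choice probabilities are invented.

```python
import math


def log_likelihood(spells, rate, choice_probs):
    """Log-likelihood for one customer under the simplifying assumptions above.

    spells: list of (interpurchase_time, product) pairs. product is None when
    the spell runs past the observation window (a right-censored spell).
    """
    total = 0.0
    for duration, product in spells:
        if product is None:
            # Censored spell: only the survivor term S(t) = exp(-rate * t) applies.
            total += -rate * duration
        else:
            # Observed purchase: density f(t) = rate * exp(-rate * t),
            # times the assumed probability of choosing this product.
            total += math.log(rate) - rate * duration + math.log(choice_probs[product])
    return total


# Hypothetical purchase history in months; the last spell is censored.
spells = [(3.0, "A"), (4.0, "B"), (3.0, "A"), (2.0, None)]
choice_probs = {"A": 0.6, "B": 0.4}  # invented, held fixed for simplicity

# Crude maximum likelihood: try a grid of purchase rates and keep the best one.
candidate_rates = [k / 100 for k in range(5, 101)]
best_rate = max(candidate_rates, key=lambda r: log_likelihood(spells, r, choice_probs))
print(f"most likely purchase rate: {best_rate:.2f} purchases per month")
```

Under these assumptions the best rate equals the number of observed purchases divided by the total time observed (3 purchases over 12 months). The censored spell still counts toward the time observed, which is the role the survivor term plays in the formula.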


How Different Are Our Numbers?

The charts in this exhibit compare the probabilities of purchase for a single customer over time and across product types. The first chart shows the numbers according to our Bayesian estimation model; the second reflects the traditional method. As the numbers show, the two models predict very different purchasing behaviors for the same customer. The results in the first table indicate that a given customer is expected to buy product 1 (say, a router) in Q1, products 1 and 2 (a router and an antivirus software program) in Q2, and product 2 (an antivirus program) in Q3. The results in the bottom table indicate that the same customer is expected to purchase a router in Q2, a router and an antivirus program in Q3, and an antivirus program in Q4.

B2B FIRM: PROBABILITIES OF PURCHASE USING…

OUR MODEL
                        Q1      Q2      Q3      Q4
Product 1               0.75    0.1     0.17    0.36
Product 2               0.1     0.15    0.66    0.35
Both Products 1 & 2     0.1     0.7     0.1     0.2

TRADITIONAL MODEL
                        Q1      Q2      Q3      Q4
Product 1               0.5     0.7     0.05    0.2
Product 2               0.4     0.1     0.25    0.6
Both Products 1 & 2     0.05    0.15    0.65    0.2
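One way to read the two tables is to apply the article's definition of a high probability (more than 50%) to every cell and list what each model expects the customer to buy in each quarter. The sketch below does this with the exhibit's numbers. The function and variable names are illustrative, not part of the authors' method.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# Probabilities copied from the exhibit, one list per row, ordered Q1 to Q4.
our_model = {
    "Product 1": [0.75, 0.1, 0.17, 0.36],
    "Product 2": [0.1, 0.15, 0.66, 0.35],
    "Both Products 1 & 2": [0.1, 0.7, 0.1, 0.2],
}
traditional_model = {
    "Product 1": [0.5, 0.7, 0.05, 0.2],
    "Product 2": [0.4, 0.1, 0.25, 0.6],
    "Both Products 1 & 2": [0.05, 0.15, 0.65, 0.2],
}


def expected_purchases(table, threshold=0.5):
    """For each quarter, list the outcomes whose probability exceeds the threshold."""
    result = {}
    for index, quarter in enumerate(QUARTERS):
        result[quarter] = [name for name, probs in table.items() if probs[index] > threshold]
    return result


ours = expected_purchases(our_model)
theirs = expected_purchases(traditional_model)
for quarter in QUARTERS:
    print(quarter, "| our model:", ours[quarter], "| traditional:", theirs[quarter])
```

With the 50% threshold, the output matches the reading given in the exhibit text: our model flags product 1 in Q1, both products in Q2, and product 2 in Q3, while the traditional model flags the same sequence one quarter later.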

How Accurate Were We?

This exhibit compares the predictive power of our method with that of the traditional method of estimating purchase probabilities. We see that 85% of the customers of the B2B company who our method predicted would purchase product 1 actually went on to do so. In contrast, only 55% of the customers predicted to be purchasers by the traditional method actually did so.

OUR MODEL
                                        Made a purchase    Did not make a purchase
Of the customers predicted to buy       85%                15%
Of the customers predicted not to buy   13%                87%

TRADITIONAL MODEL
                                        Made a purchase    Did not make a purchase
Of the customers predicted to buy       55%                45%
Of the customers predicted not to buy   41%                59%
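The "about 54%" improvement quoted in the text appears to be the relative gain in hit rate between the two models. A short calculation using the exhibit's percentages shows where that figure comes from. The dictionary layout and function name are illustrative only.

```python
# Percentages from the exhibit: (made a purchase, did not make a purchase).
our_model = {"predicted to buy": (85, 15), "predicted not to buy": (13, 87)}
traditional_model = {"predicted to buy": (55, 45), "predicted not to buy": (41, 59)}


def hit_rate(table):
    """Share of predicted buyers who actually bought, in percent."""
    bought, did_not_buy = table["predicted to buy"]
    return 100 * bought / (bought + did_not_buy)


ours = hit_rate(our_model)
theirs = hit_rate(traditional_model)
relative_improvement = 100 * (ours - theirs) / theirs

print(f"our model hit rate:   {ours:.0f}%")
print(f"traditional hit rate: {theirs:.0f}%")
print(f"relative improvement: {relative_improvement:.1f}%")
```

The gain is 30 percentage points in absolute terms, which is a little over 54% relative to the traditional model's 55% hit rate.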

More Bang for the Marketing Buck

Our experiments highlighted the importance of interdependencies between the variables. Of particular interest was our finding that purchase acceleration was linked to marketing communication in a highly nonlinear fashion. Below a certain threshold frequency of marketing contact, customers were held back from purchasing; but above a certain threshold, customers were put off. In other words, communicating too much can harm you as much as communicating too little. Clearly, many companies may be actively damaging their customer revenues in attempts to make sure that no opportunity for a sale is missed. This finding reinforces anecdotal evidence: How do you, as a customer, feel about the space taken up in your mailbox by special offers from credit card companies?

The corollary is that a careful reduction in communication by these same companies to the right levels would lead not only to lower costs but to an increase in revenues per customer. To test the effectiveness of our methodology in helping companies find those right levels, we conducted a field study to see what impact applying strategies suggested by the model would actually have on the profits and revenues at our two companies, both of which we suspected were guilty of overcommunicating.

We split each of our samples (20,000 customers at the B2B firm and 10,000 at the financial services firm) into a test group and a control group. The communication strategy for the customers in the test groups was determined by the variable relationships and the probability predictions generated by our model. The contact strategy for customers in


the control groups was determined by their company’s traditional approach. Over the course of a year, we collected per-customer data on revenues, costs of sales and communication, number of contacts before a purchase is induced, profit, and return on investment for the sample customers.

The exhibit “What Was the Impact on the Bottom Line?” gives a breakdown of the differences between the two groups at each company for each measure tracked. The communications plans determined by our model resulted in sharp improvements in profitability. At the B2B firm, the new methodology increased profits by an average of $1,600 per customer, representing an improvement in ROI of 160%. Given the sample size of over 20,000 customers, the increase in profits amounted to about $32 million for the sample group alone. Since the company’s entire customer base numbered 200,000, the potential profit improvement would total $320 million. For the financial services firm, the average profitability improvement per customer was about $400, representing an ROI improvement of 200%. Given the sample size of more than 10,000 customers, the increase in profits amounted to over $4 million. Extended to the firm’s total customer population, the profit improvement would amount to $200 million.

A great proportion of this improved profitability, of course, can be attributed to the costs saved by reducing the level of communication (31% at the B2B firm and 26% at the financial services firm). But note that the revenues generated for all product groups also went up. The $365 average per-customer difference in revenues at the B2B firm, for example, implies that sales could be as much as $73 million higher if the new methodology were rolled out to all 200,000 customers. It appears, therefore, that our model does indeed do more than just allow companies to stop spending money on unreceptive customers – it actually helps companies recover sales that their traditional marketing strategies may currently be losing.

The secret to achieving a good marketing ROI is simple: Give customers more of what they truly want and less of what they don’t. It’s always been hard to work out what customers do and don’t want, let alone when they do or don’t want it, so marketers have resorted to offering them everything all the time. Our new technique makes it perfectly feasible for companies to avoid this trap. And thanks to the widespread availability of rich databases, computing power, methodological advancements, and quantitative empirical thinking, the list of companies that can benefit from this approach is large and growing larger. Companies that take advantage of the new technology in the right way will doubly benefit – the overall reduced level of marketing will stop them from alienating customers while making more dollars available for tailored pitches to existing customers and for outreach initiatives to new ones. When companies offer customers what they want, when they want it, sales will rise.

Reprint R0603J

What Was the Impact on the Bottom Line?

Reducing the level of marketing communication with customers, as suggested by our approach, sharply improved returns on marketing investment at both companies we studied. Revenues also increased across the board at both firms for the test group customers. We observed that total revenues from the test groups were higher than revenues for the control groups by about $4 million for the high-tech firm and about $1.7 million for the financial services firm. The impact of rolling this out to the entire customer bases of these firms is clearly significant. At the B2B firm, for example, we estimate a potential revenue improvement of $73 million.

IMPROVEMENTS IN PERFORMANCE (PER CUSTOMER)

HIGH-TECHNOLOGY COMPANY (B2B)
                      Revenue ($)    Profit ($)    ROI (%)
Product 1             605            1,649         150
Product 2             306            1,897         160
Products 1 & 2        198            1,273         170

FINANCIAL SERVICES COMPANY
                      Revenue ($)    Profit ($)    ROI (%)
Product 1             208            591           180
Product 2             247            428           170
Product 3             182            397           180
Products 1 & 2        97             402           200
Products 1 & 3        58             336           220
Products 2 & 3        101            381           220
Products 1, 2, & 3    164            402           210
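The rollout figures quoted in the article follow from multiplying each per-customer gain by a customer count. The sketch below reproduces that arithmetic with the article's rounded numbers. The function name is illustrative, and the size of the financial services firm's total customer base is not stated in the article, so the last line only shows what the quoted $200 million would imply.

```python
def rollout(per_customer_gain, customers):
    """Total gain in dollars if every customer delivers the average per-customer gain."""
    return per_customer_gain * customers


# B2B firm: $1,600 profit and $365 revenue gain per customer.
b2b_sample_profit = rollout(1_600, 20_000)    # sample of about 20,000 customers
b2b_total_profit = rollout(1_600, 200_000)    # entire base of 200,000 customers
b2b_total_revenue = rollout(365, 200_000)

# Financial services firm: about $400 profit gain per customer.
fin_sample_profit = rollout(400, 10_000)      # sample of about 10,000 customers

# Inference only: the customer base implied by the $200 million estimate.
implied_fin_customers = 200_000_000 // 400

print(f"B2B sample profit gain:      ${b2b_sample_profit:,}")
print(f"B2B full-base profit gain:   ${b2b_total_profit:,}")
print(f"B2B full-base revenue gain:  ${b2b_total_revenue:,}")
print(f"Financial sample profit:     ${fin_sample_profit:,}")
print(f"Implied financial customers: {implied_fin_customers:,}")
```

These projections assume that the average gain measured in the sample carries over unchanged to every customer in the full base.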
