The authors present a new model that links the well-known RFM (recency, frequency, and monetary value) paradigm with customer lifetime value (CLV). Although previous researchers have made a conceptual link, none has presented a formal model with a well-grounded behavioral "story." Key to this analysis is the notion of "iso-value" curves, which enable the grouping of individual customers who have different purchasing histories but similar future valuations. Iso-value curves make it easy to visualize the interactions and trade-offs among the RFM measures and CLV. The stochastic model is based on the Pareto/NBD framework to capture the flow of transactions over time and a gamma-gamma submodel for spend per transaction. The authors conduct several holdout tests to demonstrate the validity of the model's underlying components and then use it to estimate the total CLV for a cohort of new customers of the online music site CDNOW. Finally, the authors discuss broader issues and opportunities in the application of this model in actual practice.
The move toward a customer-centric approach to marketing, coupled with the increasing availability of customer transaction data, has led to an interest in both the notion and the calculation of customer lifetime value (CLV). At a purely conceptual level, the calculation of CLV is a straightforward proposition: It is simply the present value of the future cash flows associated with a customer (Pfeifer, Haskins, and Conroy 2005). Because of the challenges associated with forecasting future revenue streams, most empirical research on lifetime value has actually computed customer profitability solely on the basis of customers' prior behavior, but to be true to the notion of CLV, measures should look to the future, not the past. A significant barrier has been the ability to model future revenues appropriately, particularly in the case of a "noncontractual" setting (i.e., where the time at which customers become "inactive" is unobserved) (Bell et al. 2002; Mulhern 1999).

*Peter S. Fader is Frances and Pei-Yuan Chia Professor of Marketing, Wharton School, University of Pennsylvania (e-mail: faderp@wharton.upenn.edu). Bruce G.S. Hardie is Associate Professor of Marketing, London Business School (e-mail: bhardie@london.edu). Ka Lok Lee is Market Research Analyst, Catalina Health Resource (e-mail: kaloklee@alumni.upenn.edu). The authors thank Albert Bemmaor, Paul Farris, Phil Pfeifer, the three anonymous JMR reviewers, and seminar participants at the Yale School of Management and University of Auckland for their helpful comments. The second author acknowledges the support of the London Business School Centre for Marketing.

Many researchers and consultants have developed "scoring models" (i.e., regression-type models) that attempt to predict customers' future behavior (see, e.g., Baesens et al. 2002; Berry and Linoff 2004; Bolton 1998; Malthouse 2003; Malthouse and Blattberg 2005; Parr Rud 2001). In examining this work, we note that measures of customers' prior behavior are key predictors of their future behavior. In the direct marketing literature, it is common practice to summarize customers' prior behavior in terms of their recency (time of most recent purchase), frequency (number of prior purchases), and monetary value (average purchase amount per transaction), that is, their RFM characteristics. However, there are several problems with these models, especially when attempting to develop CLV estimates:

•Scoring models attempt to predict behavior in the next period, but when computing CLV, we are interested not only in Period 2 but also in Periods 3, 4, 5, and so on. It is not clear how a regression-type model can be used to forecast the dynamics of buyer behavior well into the future and then tie it all back into a "present value" for each customer.
•Two periods of purchasing behavior are required: Period 1 to define the RFM variables and Period 2 to arrive at values of the dependent variable(s). It would be better if it were possible to create predictions of future purchasing behavior using only Period 1 data. More generally, it would be nice to be able to leverage all of the available data for model calibration purposes without using any of it to create a dependent variable for a regression-type analysis.
•The developers of these models ignore the problem that the observed RFM variables are only imperfect indicators of underlying behavioral traits (Morrison and Silva-Risso 1995). They fail to recognize that different "slices" of the data will yield different values of the RFM variables and, therefore, different scoring model parameters. This has important implications when the observed data from one period are used to make predictions of future behavior.
The way to overcome these general problems is to develop a formal model of buyer behavior that is rooted in well-established stochastic models of buyer behavior. In developing a model based on the premise that observed behavior is a realization of latent traits, we can use Bayes' theorem to estimate a person's latent traits as a function of observed behavior and then predict future behavior as a function of these latent traits. At the heart of our model is the Pareto/NBD framework (Schmittlein, Morrison, and Colombo 1987) for the flow of transactions over time in a noncontractual setting. An important characteristic of our model is that RFM measures are sufficient statistics for an individual customer's purchasing history. That is, rather than including RFM variables in a scoring model simply because of their predictive performance as explanatory variables, we formally link the observed measures to the latent traits and show that no other information about customer behavior is required to implement our model.

Before conveying our analytical results, we offer a brief exploratory analysis to set the stage for our model development. Consider the purchasing of the cohort of 23,570 people who made their first purchase at CDNOW in the first quarter of 1997. We have data on their initial and subsequent (i.e., repeat) purchases through the end of June 1998 (for further details about this data set, see Fader and Hardie 2001). For presentational simplicity, we initially examine only the relationship between future purchasing and recency/frequency; we introduce monetary value subsequently. We first split the 78-week data set into two periods of equal length and group customers on the basis of frequency (where x denotes the number of repeat purchases in the first 39-week period) and recency (where t_x denotes the time of the last of these purchases). We then compute the average total spend for each of these groups in the following 39-week period. These data appear in Figure 1.

Figure 1: AVERAGE TOTAL SPEND IN WEEKS 40-78 BY RECENCY AND FREQUENCY IN WEEKS 1-39 (three-dimensional bar chart over frequency x and recency t_x)

Figure 2: CONTOUR PLOT OF AVERAGE WEEK 40-78 TOTAL SPEND BY RECENCY AND FREQUENCY
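The grouping step just described (frequency x, recency t_x, and holdout spend per cell) can be sketched in a few lines of Python. The transaction layout and field names below are our own illustrative assumptions, not the paper's data format:

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, week_of_repeat_purchase, spend).
transactions = [
    (1, 5, 21.50), (1, 30, 18.00), (1, 55, 25.00),
    (2, 10, 40.00),
    (3, 38, 15.00), (3, 60, 15.00), (3, 70, 30.00),
]

SPLIT = 39  # end of the calibration period (weeks 1-39)

# Period-1 summaries: frequency x = number of repeat purchases in weeks 1-39,
# recency t_x = week of the last of those purchases (0 if none).
summary = defaultdict(lambda: {"x": 0, "t_x": 0, "holdout_spend": 0.0})
for cust, week, spend in transactions:
    s = summary[cust]
    if week <= SPLIT:
        s["x"] += 1
        s["t_x"] = max(s["t_x"], week)
    else:
        s["holdout_spend"] += spend

# Average weeks 40-78 spend for each (x, t_x) cell -- the raw material of Figure 1.
cells = defaultdict(list)
for s in summary.values():
    cells[(s["x"], s["t_x"])].append(s["holdout_spend"])
avg_spend = {cell: sum(v) / len(v) for cell, v in cells.items()}
```

With real data, most (x, t_x) cells hold only a handful of customers, which is exactly the sparseness problem discussed next.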
It is clear that recency and frequency each has a direct and positive association with future purchasing, and there may be some additional synergies when both measures are high (i.e., in the back corner of Figure 1). However, despite the large number of customers we used to generate this graph, it is still sparse and, therefore, somewhat untrustworthy. Several "valleys" in Figure 1 are simply due to the absence of any observations for particular combinations of recency and frequency. To "fill in the blanks," we must abstract away from the observed data and develop a formal model.
An alternative view of the same relationship appears in Figure 2. This contour plot, or "30,000-foot view" of Figure 1, is an example of an "iso-value" plot; each curve in the graph links customers with equivalent future value despite differences in their prior behavior. Again, it is easy to understand how greater recency and greater frequency are correlated with greater future purchasing, but the jagged nature of these curves emphasizes the dangers of relying solely on observed data in the absence of a model.

We should also note that in using such plots to understand the nature of the relationship between future purchasing and recency/frequency, the patterns observed will depend on the amount of time for which the customers are observed (i.e., the length of Periods 1 and 2).

Despite these limitations of using the kind of data summaries in Figures 1 and 2, there is still clear diagnostic value in the general concept of iso-value curves for customer base analysis. We can extract the main patterns observed in Figure 2 into the stylized version that appears in Figure 3. The shape of these curves should seem fairly intuitive, and they are reasonably consistent with the coarse summary in Figure 2. A formal model will enable us to understand these relationships better, including possible exceptions to the simple curves shown here. Furthermore, it will enable us to create plots that do not depend on the amount of time for which the customers are observed. (A parallel is Schmittlein, Cooper, and Morrison's [1993] work on "80/20 rules" in purchasing, in which they use a model to create a time-invariant measure of customer concentration.)

Overall, the creation and analysis of iso-value curves is an excellent way to summarize and evaluate the CLV for an entire customer base. It can help guide managerial decision making and provide accurate quantitative benchmarks to gauge the "return on investment" for programs that companies use to develop and manage their portfolios of customers.

In the next section, we describe the model that underpins our effort to link RFM and CLV, one for which RFM characteristics are sufficient statistics for an individual customer's purchasing history. After briefly discussing the model results and some holdout validations to assess its efficacy, we discuss the iso-value curves. Then, we use our new model to compute CLV for the entire cohort of CDNOW customers. We conclude with a brief summary of our work and a discussion of promising future research directions.

LINKING RFM WITH FUTURE PURCHASING

The challenge we face is how to generate forward-looking forecasts of CLV. At the heart of any such effort is a model of customer purchasing that accurately characterizes buyer behavior and therefore can be trusted as the basis for any CLV estimates. Ideally, such a model would generate these estimates using only simple summary statistics (e.g., RFM) without requiring more detailed information about each customer's purchasing history.

In developing our model, we assume that monetary value is independent of the underlying transaction process. Although this may seem counterintuitive (e.g., frequent buyers might be expected to spend less per transaction than infrequent buyers), our analysis lends support for the independence assumption. This suggests that the value per transaction (revenue per transaction × contribution margin) can be factored out, and we can focus on forecasting the "flow" of future transactions (discounted to yield a present value). We can then rescale this number of discounted expected transactions (DET) by a monetary value "multiplier" to yield an overall estimate of lifetime value:

(1) CLV = margin × revenue/transaction × DET.

We first develop our submodel for DET. Then, we introduce a separate submodel for expected revenue per transaction. Although we assume independence for these two processes, there are still some notable relationships (particularly between frequency and monetary value) that we explore subsequently.
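As a minimal sketch, Equation 1 is simply a product of three terms. The margin, revenue, and DET values below are hypothetical placeholders for illustration, not estimates from the paper:

```python
def clv(margin, revenue_per_transaction, det):
    """Equation 1: CLV = margin x revenue/transaction x DET,
    where DET is the discounted expected number of future transactions."""
    return margin * revenue_per_transaction * det

# Illustrative inputs only: a 30% contribution margin, a $35 average order,
# and a DET of 4.5 discounted expected transactions.
value = clv(0.30, 35.0, 4.5)
```

The hard part, of course, is DET; the two submodels that follow exist to supply it and the expected revenue per transaction.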
The Pareto/NBD model that Schmittlein, Morrison, and Colombo (1987) developed has proved to be a popular and powerful model in explaining the flow of transactions in a noncontractual setting. Reinartz and Kumar (2000, 2003) provide excellent illustrations. This model is based on the following general assumptions about the repeat buying process:

Figure 3: HYPOTHESIZED RELATIONSHIP BETWEEN RECENCY/FREQUENCY AND CLV (stylized iso-value curves separating "low"-, "medium"-, and "high"-value regions of the recency/frequency space)

•Customers go through two stages in their "lifetime" with a specific firm: They are active for some period of time, and then they become permanently inactive.
•While customers are active, they can place orders whenever they want. The number of orders a customer places in any given time period (e.g., week, month) appears to vary randomly around his or her underlying average rate.
•Customers (while active) vary in their underlying average purchase rates.
•The point at which a customer becomes inactive is unobserved by the firm. The only indicator of this change in status is an unexpectedly long time since the customer's last transaction, and even this is an imperfect indicator; that is, a long hiatus does not necessarily indicate that the customer has become inactive. There is no way for an outside observer to know for sure (thus the need for a model to make a "best guess" about this process).
•Customers become inactive for any number of reasons; thus, the unobserved time at which the customer becomes inactive appears to have a random component.
•The inclination for customers to "drop out" of their relationship with the firm is heterogeneous. In other words, some cus-
418 JOURNAL OF MARKETING RESEARCH, NOVEMBER 2005
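The assumptions above map onto a generative "story": a latent purchase rate λ ~ gamma(r, α), a latent dropout rate μ ~ gamma(s, β), an exponentially distributed unobserved lifetime, and Poisson purchasing while active. The sketch below is our own illustration (not the authors' code): it simulates that story and compares the simulated mean against the closed-form Pareto/NBD result E[X(t)] = (rβ)/(α(s−1))·[1 − (β/(β+t))^(s−1)], using parameter values in line with the CDNOW estimates reported later in the article:

```python
import random

def expected_transactions(r, alpha, s, beta, t):
    """Closed-form Pareto/NBD E[X(t)] (Schmittlein, Morrison, and Colombo 1987)."""
    return (r * beta) / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1))

def simulate_customer(r, alpha, s, beta, T, rng):
    """One draw from the two-stage story over the window [0, T]:
    returns (x, t_x) = (frequency, recency) of the simulated repeat purchases."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # heterogeneous purchase rate (while active)
    mu = rng.gammavariate(s, 1.0 / beta)    # heterogeneous dropout rate
    tau = rng.expovariate(mu)               # unobserved moment of becoming inactive
    horizon = min(tau, T)                   # purchasing stops at "death" or end of window
    x, t, t_x = 0, 0.0, 0.0
    while True:
        t += rng.expovariate(lam)           # Poisson purchasing: exponential gaps
        if t > horizon:
            return x, t_x
        x, t_x = x + 1, t

rng = random.Random(7)
r, alpha, s, beta = 0.55, 10.58, 0.61, 11.67   # cf. the 39-week estimates below
sims = [simulate_customer(r, alpha, s, beta, 39.0, rng) for _ in range(20000)]
mean_x = sum(x for x, _ in sims) / len(sims)
# mean_x should sit near expected_transactions(r, alpha, s, beta, 39.0).
```

Note that the simulator never reveals tau: an analyst sees only (x, t_x), which is exactly why recency and frequency carry the information about the latent traits.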
In Figure 4, we use a set of box-and-whisker plots to summarize the distribution of average transaction value, broken down by the number of repeat purchases in the first 39 weeks. Although we can recognize the slight positive correlation, the variation within each number-of-transactions group dominates the between-group variation. Thus, although there is a slight positive correlation between average transaction value and the number of transactions, we do not believe that it represents a substantial violation of our independence assumption. In the next section, we provide some additional evidence that this modest relationship has no discernible impact on the performance of our model in a holdout setting.

Figure 4: THE DISTRIBUTION OF AVERAGE TRANSACTION VALUE BY NUMBER OF TRANSACTIONS (box-and-whisker plots for 1, 2, ..., 7+ transactions)

MODEL VALIDATION

Before we use the preceding expressions to explore the relationship between RFM and CLV for the case of CDNOW, it is important to verify that the Pareto/NBD submodel for transactions and the gamma-gamma submodel for monetary value each provides accurate predictions of future buying behavior. A careful holdout validation such as this is missing from virtually all CLV-related literature, but it must be conducted before we can make any statements about the model's ability to estimate CLV. This analysis is based on the previously noted 1/10th systematic sample of the cohort (2357 customers), using the first 39 weeks of data for model calibration and holding out the second 39 weeks of data for model validation. (In the next section, however, we use all 78 weeks of data for our CLV analysis.)

We briefly summarize the key statistical results of the fit/forecasting performance of the Pareto/NBD. We refer the interested reader to a more detailed validation of the model in a separate analysis that Fader, Hardie, and Lee (2005b) conducted.

The maximum likelihood estimates of the model parameters are r = .55, α = 10.58, s = .61, and β = 11.67. Using the expression for E[X(t)], we compute the expected number of transactions for the whole group of 2357 customers for each of the 78 weeks (see Figure 5). We note that the Pareto/NBD model predictions accurately track the actual (cumulative) sales trajectory in both the 39-week calibration period and the 39-week forecast period, underforecasting by less than 2% at week 78.

Figure 5: TRACKING CUMULATIVE REPEAT TRANSACTIONS (actual vs. Pareto/NBD cumulative repeat transactions over the 78 weeks)

Figure 6: CONDITIONAL EXPECTATIONS OF PURCHASING (WEEKS 40-78) (actual vs. Pareto/NBD, by number of calibration-period repeat purchases, 0-7+)

Given our desire to use the Pareto/NBD model as a basis for the computation of CLV, we are more interested in our ability to predict individual-level buying behavior in the forecast period (Weeks 40-78) conditional on prior behavior (purchasing in Weeks 1-39). We compute this using the expression for E(Y(t) | X = x, t_x, T) for each of the 2357 customers. In Figure 6, we report these numbers along with the average of the actual number of transactions in the forecast period, broken down by the number of repeat purchases in
the first 39 weeks. Note that for each frequency level (x), we average over customers of differing recency (i.e., different values of t_x). Our ability to closely track the sales data in a holdout period (at both the aggregate and the individual level) gives us confidence in the Pareto/NBD model as the basis for our upcoming CLV calculations.

The next step is to validate the monetary value model. The maximum likelihood estimates of the model parameters are p = 6.25, q = 3.74, and γ = 15.44. The theoretical mean transaction value differs from the mean observed average repeat transaction value per customer by a mere $.09. To visualize the fit of the model, we compute the implied distribution of average transaction value across people using Equation 3 and compare it with a nonparametric density of the observed average transaction values in Figure 7.

The fit is reasonable. However, the theoretical mode of $19 is greater than the observed mode of $15, which corresponds to the typical price of a CD at the time the data were collected. The model is not designed to recognize the existence of threshold price points (e.g., prices ending in .99), so this mismatch is not surprising. The fit of the model could be improved by adding additional parameters, but for the purposes of this article, we choose to forgo such tweaks to maintain model parsimony.

A stronger test of our model for monetary value is to combine it with the Pareto/NBD model for transactions and examine the quality of the predictions of individual-level total spend in the forecast period (Weeks 40-78) of the data set. For each customer, we compute the expected average transaction value conditioned on his or her calibration period frequency and monetary value using Equation 4. We compute two sets of conditional expectations of forecast period monetary value: We obtain the first by multiplying each person's conditional expectation of per-transaction value by his or her actual number of forecast period transactions; we obtain the second by multiplying the person's conditional expectation of per-transaction value by his or her predicted number of forecast period transactions, conditional on the calibration period recency and frequency (i.e., the numbers summarized in Figure 6). These different projections enable us to test each subcomponent of the model separately and together.

In Figure 8, we report these two sets of conditional expectations along with the actual average total value of customers' forecast period transactions, broken down by calibration period frequency (i.e., number of repeat transactions). For each x, we average over customers of differing recency (values of t_x) and calibration period monetary value.

Our comparison of the "expected|actual x_2" numbers with the actual numbers can be viewed as a clean test of the monetary value submodel. In comparing these predictions with the actual numbers, we do not find any particular bias that would lead us to question the assumption of stationarity in the average buying rates or the assumption that average transaction value is independent of frequency. Our comparison of the "expected|expected x_2" numbers with the actual numbers can be viewed as a test of the combined Pareto/NBD + gamma-gamma submodels. The combined submodels provide good conditional predictions of expected total monetary value in the forecast period; thus, we have confidence in using this model to make overall predictions of lifetime value outside the range of the observed data.

CREATING AND ANALYZING ISO-VALUE CURVES

Now that we have demonstrated the substantial validity of the Pareto/NBD submodel for transactions and the gamma-gamma submodel for monetary value, we can use all 78 weeks of data to make predictions about CLV. As we previously noted, our ability to use all the data is a significant advantage that helps distinguish a well-specified stochastic model from a more traditional scoring model; however, it is not immediately clear how the results of a standard two-period scoring model with RFM variables as key predictors can be projected beyond the observed data.
Figure 7: OBSERVED VERSUS THEORETICAL DISTRIBUTION OF AVERAGE TRANSACTION VALUE ACROSS CUSTOMERS (actual nonparametric density vs. the model-based density weighted by actual x; x-axis: average transaction value m_x in dollars)

Figure 8: CONDITIONAL EXPECTATIONS OF MONETARY VALUE (actual vs. expected|actual x_2 vs. expected|expected x_2, by number of transactions in Weeks 1-39)
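The "expected" series behind Figure 8 rests on a conditional expectation of each customer's underlying mean transaction value. Assuming the standard gamma-gamma shrinkage result (consistent with the role Equation 4 plays in the validation above), it can be sketched as follows, using the 39-week parameter estimates reported earlier:

```python
def expected_avg_transaction_value(p, q, gamma, m_x, x):
    """E(M | m_x, x) under the gamma-gamma submodel: a weighted average of the
    population mean p*gamma/(q - 1) and the customer's observed average
    transaction value m_x; the weight on m_x grows with the number of
    observed transactions x."""
    weight = (p * x) / (p * x + q - 1)
    return weight * m_x + (1 - weight) * (p * gamma / (q - 1))

p, q, g = 6.25, 3.74, 15.44                     # 39-week estimates reported above
population_mean = p * g / (q - 1)               # roughly $35.22
shrunk_after_1 = expected_avg_transaction_value(p, q, g, 50.0, 1)    # pulled well below $50
shrunk_after_10 = expected_avg_transaction_value(p, q, g, 50.0, 10)  # much closer to $50
```

This is the regression-to-the-mean effect at work: an observed $50 average based on one transaction is shrunk much more strongly toward the population mean than the same average based on ten transactions.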
We reestimate the models using all 78 weeks of data, and as a final validity check, we compare this "full" model with the 39-week version we used in the previous section. Fortunately, the model remains stable as we double the length of the calibration period. If we take the new 78-week Pareto/NBD parameters and plug them back into the 39-week log-likelihood (LL) function, we notice only a small decrease in model fit (from -9595 to -9608). For the gamma-gamma model, the degree of stability is remarkable: The 78-week parameters provide a 39-week LL of -4661, compared with the optimal 39-week LL of -4659. This provides strong support for our assumption that the submodel governing monetary value is stable over time.

We begin our iso-value analysis by focusing on the relationship between DET and recency/frequency. We then reintroduce monetary value to complete the picture. To obtain the iso-value curves for DET, we evaluate Equation 2 for all recency and frequency combinations (t_x = 0, 1, ..., 78; x = 0, 1, ..., 14). The assumed annual discount rate is 15%, which implies a continuously compounded rate of δ = .0027. The CLV estimates appear as a "waterfall" plot in Figure 9.

Figure 9: DET AS A FUNCTION OF RECENCY AND FREQUENCY (waterfall plot over frequency x and recency t_x)

With the exception of x = 0, DET is an increasing function of recency. However, note that there is a strong interaction with frequency. For low-frequency customers, there is an almost linear relationship between recency and DET. However, this relationship becomes highly nonlinear for high-frequency customers. In other words, for customers who have made a relatively large number of transactions in the past, recency plays a much greater role in the determination of CLV than for customers who have made infrequent purchases in the past.

The iso-value curves that appear in Figure 10 further shed light on this complex relationship. For the high-value customers (i.e., the upper-right area of Figure 10), the iso-value curves reflect the basic shape suggested by Figure 3, but the lower-value regions suggest that the iso-value lines begin to bend backwards. This may seem highly counterintuitive: Someone with frequency of x = 7 and recency of t_x = 35 has an approximate DET of 2, the same as someone with a lower (i.e., worse) frequency of x = 1 and recency of t_x = 30. In general, for people with low recency, higher frequency seems to be a bad thing.

Figure 10: ISO-VALUE REPRESENTATION OF DET (contour plot over recency t_x = 0-78 and frequency x)

To resolve this apparent paradox, consider the two hypothetical customers in Figure 11. If we were certain that both customers were still active in Week 78, we would expect Customer B to have a greater value of DET, given the higher number of prior purchases, but the pattern of purchases strongly suggests that Customer B is no longer active.

Figure 11: THE "INCREASING FREQUENCY" PARADOX (purchase timelines for hypothetical Customers A and B over the 78 weeks)
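The rate conversion used above, and the idea of DET as a discounted flow of expected transactions, can be sketched numerically. The flat 0.02-transactions-per-week flow and the 10-year horizon below are our own illustrative choices; the paper's Equation 2 is a closed-form expression under the Pareto/NBD, not this discrete sum:

```python
import math

annual_rate = 0.15
delta = math.log(1 + annual_rate) / 52   # continuously compounded weekly rate, ~.0027

# DET viewed as a discounted sum over an expected transaction flow.  Here we use
# a flat, hypothetical flow of 0.02 expected transactions per week for 10 years.
weekly_flow = 0.02
det = sum(weekly_flow * math.exp(-delta * t) for t in range(1, 52 * 10 + 1))
```

Discounting matters even at 15% per year: the undiscounted flow would total 0.02 × 520 = 10.4 expected transactions, whereas the discounted sum comes to roughly half of that.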
data at hand. Instead, we rely on the same type of recency/frequency plots we used previously, but we allow monetary value to vary at different levels across plots.5

5We could bring monetary value in as a primary dimension. Indeed, we considered a variety of alternative plots (i.e., recency and monetary value by frequency, and frequency and monetary value by recency), but we found that there was a great deal of redundancy (and confusion) when we attempted to examine the data along different dimensions.

In Figure 13, we show the waterfall and iso-value contour plots for two different levels of observed average transaction value (m_x): We use $20 and $50 as in Figure 12. At first

Figure 13: CLV AS A FUNCTION OF RECENCY AND FREQUENCY FOR AVERAGE TRANSACTION VALUES OF $20 AND $50 (waterfall plots over frequency x and recency t_x for m_x = $20 and m_x = $50, with the corresponding iso-value contour plots)
glance, these waterfall plots show some meaningful similarities to each other and to the earlier plots for DET only (Figures 9 and 10); recency and frequency each has a strong main effect on CLV, they interact positively when both are at high levels, and the flat area on the left-hand side of each graph reflects the "increasing frequency" paradox we discussed previously.

For the most part, these plots are essentially rescaled versions of each other (reflecting the assumption of independence between recency/frequency and monetary value), but the regression-to-the-mean patterns in Figure 12 are also a factor, particularly at lower levels of frequency. As x increases in each plot, the increases in CLV are altered by the diverging curves in that figure. For the case of m_x = $20, the lower curve in Figure 12 shows that the changes associated with increasing frequency are less than linear, but for the case of m_x = $50, the upper curve in Figure 12 shows that there is a greater-than-linear multiplier effect. These differences can be observed when comparing the lower-right portions of the two iso-value graphs.

Another way to understand and appreciate the usefulness of our method for computing CLV is to combine the model-driven RFM-CLV relationship with the actual RFM patterns observed in our data set. In Figure 14, we show the distribution in the recency/frequency space for the 11,506 customers (of 23,560) who made at least one repeat purchase at CDNOW in the 78-week observation period. Note that we have reversed the directions of the recency and frequency axes (for this figure only) to make it possible to observe the patterns here. This figure shows that the majority of repeating customers had only one repeat transaction; from the mass of customers at the back of the distribution, it is clear that many of these single repeat purchases occurred early in the observation period.

Figure 14: DISTRIBUTION OF REPEATING CUSTOMERS IN THE 78-WEEK RECENCY/FREQUENCY SPACE

Essentially, our goal is to integrate this customer distribution along with the iso-value curves to derive a sense of overall CLV for the customer base. To enhance the clarity and interpretability of this combination, we group customers on the basis of their RFM characteristics. This enables us to "close the loop" with traditional RFM segmentation analyses to show how our model can be used for target marketing purposes.

We set aside the 12,054 customers who made no repeat purchases over the 78-week observation period. We assigned each of the remaining 11,506 customers an RFM code in the following manner: We sorted the list of customers in descending order on recency, and we coded the customers in the top tercile (most recent) R = 3, in the second tercile R = 2, and in the third tercile (least recent) R = 1. We then sorted the entire list in descending order on frequency; we coded members of the top tercile (highest number of transactions) F = 3, and so forth. Finally, we sorted the customer list in descending order on average transaction value and coded the customers in the top tercile (highest average transaction value) M = 3, and so forth. We coded the customers who made no repeat purchases R = F = M = 0.

In Table 2, we show the estimate of total CLV for the 28 resulting groups; we report the size of each RFM group in parentheses. Perhaps the most striking observation is the significant contribution of the "zero cell." Although each customer in that cell has a small CLV value (an average expected lifetime value of $4.40 beyond Week 78 for someone who made his or her initial, and only, purchase at CDNOW in the first 12 weeks of the data set), this slight whisper of CLV becomes a loud roar when it is applied to such a large group of customers. This is an important substantive finding from our model. Many managers would assume that after a year and a half of inactivity, a customer has dropped out of the relationship with the firm, but these very light buyers collectively constitute approximately 5% of the total future value of the entire cohort, which is greater than most of the 27 other RFM cells.

Table 2: TOTAL CLV BY RFM GROUP (group sizes in parentheses)

                          Recency
                  R = 1            R = 2            R = 3
No repeats   $5?,000 (12,054)
M = 1 F = 1    $7,700 (1,197)    $9,900 (482)      $1,800 (71)
      F = 2    $2,800 (382)     $15,300 (488)     $17,400 (419)
      F = 3      $300 (57)      $12,500 (256)     $52,900 (484)
M = 2 F = 1    $5,900 (650)      $7,600 (264)      $2,300 (68)
      F = 2    $3,600 (358)     $26,500 (545)     $25,800 (414)
      F = 3      $500 (86)      $37,200 (478)    $203,000 (972)
M = 3 F = 1   $11,300 (676)     $19,700 (371)      $3,700 (57)
      F = 2    $7,300 (329)     $45,900 (504)     $47,900 (396)
      F = 3    $1,000 (101)     $62,700 (447)    $414,900 (954)
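The tercile coding scheme described above can be sketched as follows. The dict field names and the six toy customers are hypothetical, and real data would also need a tie-breaking rule for customers with identical values at a tercile boundary:

```python
def tercile_codes(customers, key):
    """Assign 3/2/1 codes by tercile after sorting descending on `key`
    (3 = top tercile), mirroring the coding scheme described above.
    The code is stored under the key's initial: R, F, or M."""
    ranked = sorted(customers, key=lambda c: c[key], reverse=True)
    n = len(ranked)
    for i, c in enumerate(ranked):
        c[key[0].upper()] = 3 - min(2, i * 3 // n)   # 3, 2, 1 by rank position
    return customers

# Hypothetical repeat buyers (recency in weeks, frequency in transactions,
# monetary value in dollars per transaction).
customers = [
    {"recency": 70, "frequency": 9, "monetary": 55.0},
    {"recency": 50, "frequency": 2, "monetary": 20.0},
    {"recency": 10, "frequency": 1, "monetary": 35.0},
    {"recency": 65, "frequency": 4, "monetary": 15.0},
    {"recency": 30, "frequency": 1, "monetary": 60.0},
    {"recency": 5,  "frequency": 3, "monetary": 25.0},
]
for key in ("recency", "frequency", "monetary"):
    tercile_codes(customers, key)
# Each dict now carries R, F, M codes in {1, 2, 3}; customers with no repeat
# purchases would be set aside first and coded R = F = M = 0, as in the text.
```

Grouping customers by the resulting (R, F, M) triple and summing their model-based CLV estimates reproduces the structure of Table 2.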
Examining the other cells, we find clear evidence of the same patterns we discussed previously for the iso-value curves. For example, within the recency × frequency table associated with each level of the monetary value dimension, there is consistent evidence that high-frequency/low-recency customers are less valuable than those with lower frequency. Not surprisingly, the lower-right cell, which represents high levels on all three dimensions, has the greatest CLV; it shows an average net present value of $435 per customer. This represents approximately 38% of the future value of the entire cohort. For the cohort as a whole, the average CLV is approximately $47 per customer, making the full group of 23,560 customers worth slightly more than $1.1 million.

Table 3 reports the average CLV for each RFM tercile. We note that the greatest variability in CLV is on the recency dimension, followed closely by the frequency dimension. The least variation is on the monetary value dimension. This is consistent with the widely held view that recency is usually a more powerful discriminator than frequency or monetary value (thus, the framework is called RFM rather than, for example, FRM [Hughes 2000]).

SUMMARY AND CONCLUSION

Many recent articles have discussed the importance of determining CLV and understanding how it can be associated with observable behavioral characteristics, but despite this high level of interest in the topic, few have offered a carefully validated statistical model and specific CLV estimates for a large group of customers. From a methodological perspective, we offer several useful contributions to the CLV literature.

The key to our analysis is the use of a forward-looking model that features well-accepted behavioral assumptions rather than relying on only prior purchase data as a mirror for the future. Not only does the model help us translate prior behavior into likely future trends, but it also fills in the many sparse holes (and it smoothes out the random "blips"

CLV forecasts. This uniform use of the data should lead to greater faith in the meaningfulness of these estimates and an improved ability to tie CLV to observed RFM characteristics.

Beyond the performance of the model and its usefulness as a forecasting tool, we uncovered or confirmed several substantive observations:

•There is a highly nonlinear relationship between recency/frequency and future transactions. Standard scoring models (particularly those without solid underlying behavioral theories) would be unlikely to capture this complex relationship.
•The existence of the increasing frequency paradox is clearly visible in all of the iso-value curves. For low levels of recency, customers with greater frequency are likely to have lower future purchasing potential than customers with lower prior purchasing rates.
•The underlying process that drives monetary value per transaction appears to be stable over time and largely independent of recency and frequency. However, these patterns require further testing before they can be accepted as "empirical generalizations." As such, any researcher applying our model to a new data set must test the validity of the assumption of independence between recency/frequency and monetary value. We believe that the analysis reported in the section "Assessing the Independence of the Monetary Value Assumption" and the conditional expectations of monetary value (Figure 8) are good ways to explore this matter.
•Despite the finding that monetary value and frequency are essentially independent, a reasonably strong regression-to-the-mean pattern creates the illusion that they are more tightly connected (Figure 13); that is, there is more regression-to-the-mean in E(M|m_x, x) for customers with a small number of observed transactions (i.e., a low frequency) than for customers with a larger number of observed transactions (i.e., a high frequency). This is a general property of our submodel for dollar expenditure per transaction, which assumes that average transaction values are independent of the transaction process.
•Furthermore, the monetary value process does not conform well to the typical use of normal distributions for transaction amounts and intercustomer heterogeneity. The gamma-
that are present in any observed data set), providing a gamma model is a straightforward way to capture the high
cleaner image of these patterns. These advantages can be degree of skewness we observed in both of the.se
observed clearly in the iso-value curves that we discussed at distributions.
length. A "data-driven" curve-fitting procedure would be •A thorough analysis of the customer base requires careful con-
hard-pressed to provide the same degree of diagnostic value sideration of the zero class (i.e.. the customers who made no
as our model. purchases during the observation period). This is often a very
From the standpoint of managerial implementation, it is a large group, so even though each member may have a small
future lifetime value, collective profitability may still be quite
significant benefit that the model inputs are nothing more
substantial.
than each customer's RFM characteristics. Furthermore, our
•We showed how iso-value curves can be created and used to
model also avoids the need to split the sample into two (or identify customers with different purcha.se histories but similar
more) time f)eriods for calibration purposes. All of the avail- CLVs.
able data can be combined into a single sample to obtain the •Finally, we emphasized the use of several validation tests (par-
ticularly the conditional expectations curves) that .should be
used in all CLV analyses regardless of the underlying model.
Table 3 Given the forward-l(X)king nature of CLV. it is not enough to
. - . • • • -
AVERAGE CLV BY RFM TERCILE consider a model (and compare it with benchmarks) using only
the calibration sample. Furthermore, given the individual-level
Code nature of CLV. it is not enough to use tracking plots or other
purely aggregate summaries to judge the perfonnance of the
/ 2 3 model.
Recency 10 62 201
Frequency 18 5Ü 205 Although all of these contributions are managerially
Monetary value 31 81 160 important, we consider our basic model only a first step
N 3836 3835 3835 toward a complete understanding of how best to understand.
capture, and create additional value for a given group of customers. Natural extensions include (1) introducing marketing-mix variables into the model, (2) adding an optimization layer to the model to help allocate resources most effectively, (3) relaxing the assumption of independence between the distribution of monetary value and the underlying transaction process, and (4) relaxing the assumption of constant contribution margin per transaction.

With respect to the third point, this means we allow for a correlation between the ν parameter in the monetary value submodel and the λ parameter in the transaction model. This could be accommodated by replacing their respective (independent) gamma distributions with a bivariate Sarmanov distribution that has gamma marginals (Park and Fader 2004). Alternatively, this correlation between ν and λ could easily be accommodated by moving to a hierarchical Bayesian formulation of the basic model. With respect to the fourth point, our empirical analysis assumed a gross contribution margin of 30% because we have no information about the actual margins for CDNOW. In practice, we would expect to use customer-level (or at least segment-level) contribution margins. If we observe considerable intracustomer variation in contribution margin, we may wish to modify the monetary value component of the model (see the section "Adding Monetary Value") to model margin per transaction rather than dollar expenditure per transaction. We would need to replace the individual-level gamma distribution, the domain of which is nonnegative, with a skewed distribution defined over the domain of real numbers (to accommodate the transactions on which the company makes a loss).

We also recognize that our context of noncontractual purchasing is a limitation. It is true that, in general, RFM analyses are conducted in such a domain, but the use of CLV goes well beyond it. It is worthwhile to consider how similar models and concepts can be applied to subscriptions, financial services, and other types of business relationships. It is not immediately clear how the model components we used (Pareto/NBD for transaction incidence and gamma-gamma for transaction size) need to be changed to accommodate these other contexts, but we hope that researchers will continue to use stochastic models (rather than scoring models) to address these new situations.

Another limitation is that we examined only one cohort for CDNOW. This is a sizable group of customers but probably not a representative snapshot of the company's entire customer base. Managers need to run the model across multiple cohorts to obtain an accurate picture of the value of an entire customer base. In doing so, the definition of these cohorts becomes an important issue: Should we group customers simply by date of initial purchase (as we did here), or should we group them on the basis of geography, demographics, or mode of acquisition? The answers to these questions should not affect the development of the model per se, but they might have a large influence on how the model is implemented in practice.

With respect to acquisition, it would be desirable if the model could be used to make predictions for groups of potential buyers before the firm targets them for acquisition. It might be possible to "connect the dots" and extrapolate the model parameters from existing cohorts to new ones.

This latter point is an ambitious step. Further research must first replicate and extend the model for a variety of existing cohorts across a variety of firms. We expect that several of our substantive observations may become "empirical generalizations" related to CLV. We hope that other researchers and practitioners will confirm and extend some of our findings and uncover new behavioral patterns using the basic concepts and modeling platform we present herein.

APPENDIX

Our objective is to derive an expression for the present value of a customer's future transaction stream, conditional on observed purchasing history, as implied by the Pareto/NBD model. The Pareto/NBD model is based on the following assumptions:

1. While a customer is active, the number of transactions he or she makes follows a Poisson process with transaction rate λ. This is equivalent to assuming that the time between transactions is distributed exponentially with transaction rate λ.
2. Each customer has an unobserved lifetime of length τ (after which he or she is viewed as being inactive), which is distributed exponentially with dropout rate μ.
3. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α.
4. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β.
5. The transaction rate λ and the dropout rate μ vary independently across customers.

Assume that it is known when each of a customer's x transactions occurred during the period (0, T]; we denote these times by t_1, t_2, ..., t_x:

    [Timeline: 0 < t_1 < t_2 < ... < t_x <= T]

There are two possible ways this pattern of transactions could arise:

1. The customer is still active at the end of the observation period (i.e., τ > T), in which case the individual-level likelihood function is simply the product of the (intertransaction time) exponential density functions and the associated survival function:

    L(λ | t_1, ..., t_x, T, τ > T) = λe^{-λt_1} · λe^{-λ(t_2 - t_1)} ··· λe^{-λ(t_x - t_{x-1})} · e^{-λ(T - t_x)} = λ^x e^{-λT}.
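The five assumptions above define a simple data-generating process, which can be sketched as a small simulation. This is an illustrative sketch, not code from the article; the parameter values are hypothetical, and the observation length T = 78 weeks is assumed only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_customer(lam, mu, T):
    """Simulate one customer's buying over (0, T] under the Pareto/NBD
    assumptions: an unobserved lifetime tau ~ Exponential(mu), and, while
    active, exponentially distributed intertransaction times with rate lam."""
    tau = rng.exponential(1.0 / mu)       # unobserved dropout time
    horizon = min(tau, T)                 # customer buys only while active
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam)   # next intertransaction time
        if t > horizon:
            break
        times.append(t)
    x = len(times)
    t_x = times[-1] if x else 0.0         # recency; t_x = 0 when x = 0
    return x, t_x                         # sufficient statistics (X = x, t_x, T)

# Heterogeneity (assumptions 3-5): independent gamma mixing distributions,
# shape r (resp. s) and rate alpha (resp. beta). Values are hypothetical.
r, alpha, s, beta = 1.0, 10.0, 1.0, 20.0
lam, mu = rng.gamma(r, 1.0 / alpha), rng.gamma(s, 1.0 / beta)
print(simulate_customer(lam, mu, T=78.0))
```

Note that the simulation returns only (x, t_x): consistent with the sufficiency result derived below, the individual transaction times are not needed once x and t_x are recorded.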
Note that we do not require information on when each of the x transactions occurred; the only customer information we require is (X = x, t_x, T). By definition, t_x = 0 when x = 0. In other words, recency (t_x) and frequency (x) are sufficient statistics.

2. The customer became inactive at some (unobserved) time τ in the interval (t_x, T], in which case the individual-level likelihood function is L(λ | X = x, t_x, τ) = λ^x e^{-λτ}.

Removing the conditioning on τ yields the following expression for the individual-level likelihood function:

    L(λ, μ | X = x, t_x, T) = L(λ | X = x, t_x, T, τ > T) P(τ > T | μ) + ∫_{t_x}^{T} L(λ | X = x, t_x, τ) f(τ | μ) dτ

    = λ^x e^{-λT} e^{-μT} + λ^x μ ∫_{t_x}^{T} e^{-λτ} e^{-μτ} dτ

    = (λ^x μ / (λ + μ)) e^{-(λ+μ)t_x} + (λ^{x+1} / (λ + μ)) e^{-(λ+μ)T}.

It follows that the likelihood function for a randomly chosen person with purchase history (X = x, t_x, T) is

(A1) L(r, α, s, β | X = x, t_x, T) = ∫_0^∞ ∫_0^∞ L(λ, μ | X = x, t_x, T) g(λ | r, α) g(μ | s, β) dλ dμ

    = (Γ(r + x) α^r β^s / Γ(r)) { 1 / [(α + T)^{r+x} (β + T)^s] + (s / (r + s + x)) A_0 },

where, for α ≥ β,

    A_0 = ₂F₁(r + s + x, s + 1; r + s + x + 1; (α − β)/(α + t_x)) / (α + t_x)^{r+s+x}
        − ₂F₁(r + s + x, s + 1; r + s + x + 1; (α − β)/(α + T)) / (α + T)^{r+s+x},

and, for α ≤ β,

    A_0 = ₂F₁(r + s + x, r + x; r + s + x + 1; (β − α)/(β + t_x)) / (β + t_x)^{r+s+x}
        − ₂F₁(r + s + x, r + x; r + s + x + 1; (β − α)/(β + T)) / (β + T)^{r+s+x},

where ₂F₁(·) denotes the Gaussian hypergeometric function. (For details of the derivation, see Fader and Hardie 2005.)

The four Pareto/NBD model parameters (r, α, s, β) can be estimated by the method of maximum likelihood in the following manner: Suppose we have a sample of N customers, where customer i had X_i = x_i transactions in the period (0, T_i], with the last transaction occurring at t_{x_i}. The sample log-likelihood function is given by

    LL(r, α, s, β) = Σ_{i=1}^{N} ln[L(r, α, s, β | X_i = x_i, t_{x_i}, T_i)].

The probability that a customer with purchase history (X = x, t_x, T) is active at time T is the probability that the (unobserved) time at which he or she becomes inactive (τ) occurs after T. Referring back to our derivation of the individual-level likelihood function, the application of Bayes' theorem leads to

(A2) P(τ > T | λ, μ, X = x, t_x, T) = L(λ | X = x, t_x, T, τ > T) P(τ > T | μ) / L(λ, μ | X = x, t_x, T)

    = λ^x e^{-(λ+μ)T} / L(λ, μ | X = x, t_x, T).

We can now turn our attention to the derivation of DET as implied by the Pareto/NBD model. The general explicit formula for the computation of CLV is (Rosset et al. 2003)

    CLV = ∫_0^∞ v(t) S(t) d(t) dt,

where, for t > 0 (with t = 0 representing "now"), v(t) is the customer's value at time t, S(t) is the survivor function (i.e., the probability that the customer has remained active to at least time t), and d(t) is a discount factor that reflects the present value of money received at time t. When we factor out the value of each transaction, v(t) becomes the underlying transaction rate λ. It follows that, conditional on λ and μ, the discounted present value at time 0 of a customer's expected transaction stream over his or her lifetime, with continuous compounding at rate of interest δ, is

(A3) DET(δ | λ, μ) = ∫_0^∞ λ e^{-μt} e^{-δt} dt = λ / (μ + δ).

As noted in any introductory finance textbook, an annual discount rate of (100 × d)% is equivalent to a continuously compounded rate of δ = ln(1 + d). If the data are recorded in time units such that there are k periods per year (k = 52 if the data are recorded in weekly units of time), the relevant continuously compounded rate is δ = ln(1 + d)/k.

Standing at time T, the DET for a customer with purchase history (X = x, t_x, T) is the discounted present value of the customer's expected transaction stream over his or her lifetime (Equation A3) times the probability that the customer with this purchase history is active at time T (Equation A2):

(A4) DET(δ | λ, μ, X = x, t_x, T) = DET(δ | λ, μ) × P(τ > T | λ, μ, X = x, t_x, T).

However, we do not observe λ and μ. Therefore, we compute DET(δ | X = x, t_x, T) for a randomly chosen person by taking the expectation of Equation A4 over the joint distribution of λ and μ, updated to account for the information (X = x, t_x, T):

    DET(δ | r, α, s, β, X = x, t_x, T) = ∫_0^∞ ∫_0^∞ DET(δ | λ, μ) λ^x e^{-(λ+μ)T} g(λ | r, α) g(μ | s, β) dλ dμ / L(r, α, s, β | X = x, t_x, T).
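Equation A1 can be implemented directly using a scientific library's Gaussian hypergeometric function. The sketch below is an illustrative implementation (not the authors' code), returning the individual log-likelihood; its correctness can be checked against a brute-force numerical double integral over the gamma mixing distributions. The helper `det_individual` implements Equation A3.

```python
import numpy as np
from scipy.special import gammaln, hyp2f1

def pareto_nbd_ll(r, alpha, s, beta, x, t_x, T):
    """Log of Equation A1: Pareto/NBD likelihood for a customer with
    frequency x, recency t_x, and observation length T (illustrative)."""
    maxab = max(alpha, beta)
    param2 = s + 1.0 if alpha >= beta else r + x   # branch on alpha vs. beta
    z = abs(alpha - beta)
    rsx = r + s + x
    # A_0: difference of two 2F1 terms, evaluated at t_x and at T.
    A0 = (hyp2f1(rsx, param2, rsx + 1.0, z / (maxab + t_x))
          / (maxab + t_x) ** rsx
          - hyp2f1(rsx, param2, rsx + 1.0, z / (maxab + T))
          / (maxab + T) ** rsx)
    log_front = (gammaln(r + x) - gammaln(r)
                 + r * np.log(alpha) + s * np.log(beta))
    body = (1.0 / ((alpha + T) ** (r + x) * (beta + T) ** s)
            + (s / rsx) * A0)
    return log_front + np.log(body)

def det_individual(delta, lam, mu):
    """Equation A3: DET(delta | lam, mu) = lam / (mu + delta), where
    delta = ln(1 + d)/k for an annual discount rate d and k periods/year."""
    return lam / (mu + delta)
```

For numerical stability at scale, a production implementation would keep the `body` computation in log space (e.g., via `logsumexp`); the direct form above is kept for readability.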
Rosset, Saharon, Einat Neumann, Uri Eick, and Nurit Vatnik (2003), "Customer Lifetime Value Models for Decision Support," Data Mining and Knowledge Discovery, 7 (July), 321-39.
Schmittlein, David C., Lee G. Cooper, and Donald G. Morrison (1993), "Truth in Concentration in the Land of (80/20) Laws," Marketing Science, 12 (Spring), 167-83.
Schmittlein, David C., Donald G. Morrison, and Richard Colombo (1987), "Counting Your Customers: Who Are They and What Will They Do Next?" Management Science, 33 (January), 1-24.
Schmittlein, David C., and Robert A. Peterson (1994), "Customer Base Analysis: An Industrial Purchase Process Application," Marketing Science, 13 (Winter), 41-67.
Copyright of Journal of Marketing Research (JMR) is the property of American Marketing Association and its
content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's
express written permission. However, users may print, download, or email articles for individual use.