Beruflich Dokumente
Kultur Dokumente
1This chapter is based on an article with the same title in Journal of Targeting, Measurement and
Analysis for Marketing, 6, 2, 1997. Used with permission.
TABLE 13.1
Response Decile Analysis
Decile Cumulative
Number of Number of Response Response
Decile Individuals Responders Rate Rate Cum Lift
Top 18,110 88 0.49% 0.49% 186
2 18,110 58 0.32% 0.40% 154
3 18,110 50 0.28% 0.36% 138
4 18,110 63 0.35% 0.36% 137
5 18,110 44 0.24% 0.33% 128
6 18,110 48 0.27% 0.32% 123
7 18,110 39 0.22% 0.31% 118
8 18,110 34 0.19% 0.29% 112
9 18,110 23 0.13% 0.27% 105
Bottom 18,110 27 0.15% 0.26% 100
Total 181,100 474 0.26%
2Validation of any response or profit model built from any modeling technique (e.g., discrimi-
nant analysis, logistic regression, neural network, genetic algorithms, or CHAID).
decile, the Cum Lift 154 indicates that when soliciting to the top-two deciles
— the top 20% of the customer file based on the RM model — there is an
expected 1.54 times the number of responders found by randomly soliciting
20% of the file.
As luck would have it, the data analyst finds two additional samples on
which two additional decile analysis validations are performed. Not surpris-
ing, the Cum Lifts for a given decile across the three validations are some-
what different. The reason for this is the expected sample-to-sample variation,
attributed to chance. There is a large variation in the top decile (range is 15
= 197 – 182), and a small variation in decile 2 (range is 6 = 154 – 148). These
results in Table 13.2 raise obvious questions.
1. With many decile analysis validations, how can an average Cum Lift
(for a decile) be defined to serve as an honest estimate of the Cum
Lift? Additionally, how many validations are needed?
2. With many decile analysis validations, how can the variability of an
honest estimate of Cum Lift be assessed? That is, how can the stan-
dard error (a measure of precision of an estimate) of honest estimate
of the Cum Lift be calculated?
The answers to these questions and more lie in the bootstrap methodology.
3 Other resampling methods include: the jackknife, infinitesimal jackknife, delta method, influ-
ence function method, and random subsampling.
4 A sampling distribution can be considered as the frequency curve of a sample statistic from an
where
Sample Mean value is simply the mean of the five numbers in the sample.
|Z a/2 | is the value from the standard normal distribution for a 100(1 – a)%
CI. The |Z a/2 | values are 1.96, 1.64 and 1.28 for 95%, 90% and 80% CIs,
respectively.
Standard Error (SE) of the sample mean has the analytic formula: SE = the
sample standard deviation divided by the square root of the sample size.
For sample A, SE equals 8.31, and the 95% CI for the population mean
is between 4.31 ( = (20.60 – 1.96*8.31) and 36.89 ( = (20.60 + 1.96*8.31). This
is commonly, although not quite exactly, stated as: there is a 95% confidence
that the population mean lies between 4.31 and 36.89. The statistically
correct statement for the 95% CI for the unknown population mean is: if
one repeatedly calculates such intervals from, say, 100 independent random
samples, 95% of the constructed intervals would contain the true popula-
tion mean. Do not be fooled: once the confidence interval is constructed,
the true mean is either in the interval, or not in the interval. Thus, the 95%
confidence refers to the procedure for constructing the interval, not the
observed interval itself.
The above parametric approach for the construction of confidence intervals
for statistics, such as the median or the Cum Lift, does not exist because the
theoretical sampling distributions (which provides the standard errors) of
the median and Cum Lift are not known. If confidence intervals for median
or Cum Lift are desired, then approaches that rely on a resampling meth-
odology like the bootstrap can be used.
Sample B: 0.1 0.1 0.1 0.4 0.5 1.0 1.1 1.3 1.9 1.9 4.7
7 This bootstrap method is the normal approximation. Others are Percentile, B-C Percentile and
Percentile-t.
8 Actually, the sample distribution function is the nonparametric maximum likelihood estimate
10 This calculation arguably insures a bias-reduced estimate. Some analysts question the use of
the bias correction. I feel that this calculation adds precision to the decile analysis validation
when conducting small solicitations, and has no noticeable effect on large solicitations.
As you may have suspected, the sample was drawn from a normal popu-
lation (NP). Thus, it is instructive to compare the performance of the bootstrap
with the theoretically correct parametric chi-square test. The bootstrap (BS)
confidence interval is somewhat wider than the chi-square/NP interval in
Figure 13.1. The BS confidence interval covers values between .50 and 2.47,
which also includes the values within the NP confidence interval (0.93, 2.35).
TABLE 13.3
100 Bootstrapped StDs
1.3476 0.6345 1.7188 … … … … 1.4212 1.6712 1.0758
… … … … … …
… … … … … …
… … … … … …
… … … … … …
… … … … … …
… … … … … …
… … … … … …
… … … … … …
1.3666 1.5388 1.4211 … … … … 1.4467 1.9938 0.5793
FIGURE 13.1
Bootstrap vs. Normal Estimates
Regardless of who has built a database model, once it is ready for imple-
mentation, the question is essentially the same: “how large a sample do I
need to implement a solicitation based on the model to obtain a desired per-
formance quantity?” The database answer, which in this case does not
require most of the traditional input, depends on one of two performance
objectives. Objective one is to maximize the performance quantity for a
specific depth-of-file, namely, the Cum Lift. Determining how large the sam-
ple should be — actually the smallest sample necessary to obtain Cum Lift
value — involves the concepts discussed in the previous section that corre-
spond to the relationship between confidence interval length and sample
size. The following procedure answers the sample size question for objective
one.
1. For a desired Cum Lift value, identify the decile and its confidence
interval containing Cum Lift values closest to the desired value,
based on the decile analysis validation at hand. If the corresponding
confidence interval length is acceptable, then the size of the valida-
tion dataset is the required sample size. Draw a random sample of
that size from the database.
2. If the corresponding confidence interval is too large, then increase
the validation sample size by adding individuals or bootstrapping
a larger sample size until the confidence interval length is acceptable.
Draw a random sample of that size from the database.
3. If the corresponding confidence interval is unnecessarily small,
which indicates that a smaller sample can be used to save time and
cost of data retrieval, then decrease the validation sample size by
deleting individuals or bootstrapping a smaller sample size until the
confidence interval length is acceptable. Draw a random sample of
that size from the database.
The following procedure is used for determining how large the sample
should be. Actually, the smallest sample necessary to obtain the desired
quantity and quality of performance involves:
13.8.1 Illustration
Consider a three-variable response model predicting response from a pop-
ulation with an overall response rate of 4.72%. The decile analysis validation
based on a sample size of 22,600 along with the bootstrap estimates are in
Table 13.6. The 95% margin of errors (|Z a/2 |* SEBS) for the top four decile
95% confidence intervals are considered too large for using the model with
confidence. Moreover, the decile confidence intervals severely overlap. The
top decile has a 95% confidence interval of 160 to 119, with an expected
bootstrap response rate of 6.61% ( = (140/100)*4.72%; 140 is the bootstrap
Cum Lift for the top decile). The top-two decile level has a 95% confidence
interval of 113 to 141, with an expected bootstrap response rate of 5.99% ( =
(127/100)*4.72%; 127 is the bootstrap Cum Lift for the top-two decile levels).
TABLE 13.6
Three-Variable Response Model 95% Bootstrap Decile Validation
Bootstrap Sample Size n = 22,600
Cum 95%
Number of Number of Response Response Bootstrap Margin of 95% Lower 95% Upper
Decile Individuals Responses Rate (%) Rate (%) Cum Lift Cum Lift Error Bound Bound
Top 2,260 150 6.64 6.64 141 140 20.8 119 160
2nd 2,260 120 5.31 5.97 127 127 13.8 113 141
3rd 2,260 112 4.96 5.64 119 119 11.9 107 131
4th 2,260 99 4.38 5.32 113 113 10.1 103 123
5th 2,260 113 5.00 5.26 111 111 9.5 102 121
6th 2,260 114 5.05 5.22 111 111 8.2 102 119
7th 2,260 94 4.16 5.07 107 107 7.7 100 115
8th 2,260 97 4.29 4.97 105 105 6.9 98 112
9th 2,260 93 4.12 4.88 103 103 6.4 97 110
Bottom 2,260 75 3.32 4.72 100 100 6.3 93 106
Total 22,600 1,067
TABLE 13.7
Three-Variable Response Model 95% Bootstrap Decile Validation
Bootstrap Sample Size n = 50,000
Model Bootstrap 95% Margin 95% Lower 95% Upper
Decile Cum Lift Cum Lift of Error Bound Bound
Top 141 140 15.7 124 156
2nd 127 126 11.8 114 138
3rd 119 119 8.0 111 127
4th 113 112 7.2 105 120
5th 111 111 6.6 105 118
6th 111 111 6.0 105 117
7th 107 108 5.6 102 113
8th 105 105 5.1 100 111
9th 103 103 4.8 98 108
Bottom 100 100 4.5 95 104
TABLE 13.8
Three-Variable Response Model 95% Bootstrap Decile Validation
Bootstrap Sample Size n = 75,000
I create a new bootstrap validation sample of size 50,000. The 95% margins
of error in Table 13.7 are still unacceptably large and the decile confidence
intervals still overlap. I further increase the bootstrap sample size to 75,000
in Table 13.8. There is no noticeable change in 95% margins of error, and the
decile confidence intervals overlap.
I recalculate the bootstrap estimates using 80% margins of error on the
bootstrap sample size of 75,000. The results in Table 13.9 show that the
decile confidence intervals have acceptable lengths and are almost non-
overlapping. Unfortunately, the joint confidence levels are quite low: at
least 60%, at least 40% and at least 20% for the top-two, top-three and top-
four decile levels, respectively.
TABLE 13.10
Three-Variable Response Model 95% Bootstrap Decile Validation
Bootstrap Sample Size n = 175,000
Model Bootstrap 95% Margin 95% Lower 95% Upper
Decile Cum Lift Cum Lift of Error Bound Bound
Top 141 140 7.4 133 147
2nd 127 127 5.1 122 132
3rd 119 120 4.2 115 124
4th 113 113 3.6 109 116
5th 111 111 2.8 108 114
6th 111 111 2.5 108 113
7th 107 108 2.4 105 110
8th 105 105 2.3 103 108
9th 103 103 2.0 101 105
Bottom 100 100 1.9 98 102
TABLE 13.11
Bootstrap Model Efficiency
Bootstrap Sample Size n = 175,000
Eight-Variable Response Model A Three-Variable Response Model B
Model Bootstrap 95% Margin of 95% Lower 95% Upper 95% Margin of
Decile Cum Lift Cum Lift Error Bound Bound Error Efficiency Ratio
Top 139 138 8.6 129 146 7.4 86.0%
2nd 128 128 5.3 123 133 5.1 96.2%
3rd 122 122 4.3 117 126 4.2 97.7%
4th 119 119 3.7 115 122 3.6 97.3%
5th 115 115 2.9 112 117 2.8 96.6%
6th 112 112 2.6 109 114 2.5 96.2%
7th 109 109 2.6 107 112 2.4 92.3%
8th 105 105 2.2 103 107 2.3 104.5%
9th 103 103 2.1 101 105 2.0 95.2%
Bottom 100 100 1.9 98 102 1.9 100.0%
13.10 Summary
Traditional validation of database models involves comparing the Cum Lifts
from the calibration and hold-out decile analyses based on the model under
consideration. If the expected shrinkage (difference in decile Cum Lifts
between the two analyses) and the Cum Lifts values themselves are accept-
able, then the model is considered successfully validated and ready to use;
otherwise, the model is reworked until successfully validated. I illustrated
with a response model case study that the single-sample validation provides
neither assurance that the results are not biased, nor any measure of confi-
dence in the Cum Lifts.
I proposed the bootstrap — a computer-intensive approach to statistical
inference — as a methodology for assessing the bias and confidence in the
Cum Lift estimates. I provided a brief introduction to the bootstrap, along
with a simple ten-step procedure for bootstrapping any statistic. I illustrated
the procedure for a decile validation of the response model in the case study.
I compared and contrasted the single-sample and bootstrap decile valida-
tions for the case study. It was clear that the bootstrap provides necessary
information for a complete validation: the biases and the margins of error
of decile Cum Lift estimates.
I addressed the issue of margins of error (confidence levels) that are too
large (low) for the business objective at hand. I demonstrated how to use
the bootstrap to decrease the margins of error.
Then I continued the discussion on the margin of error to a bootstrap
assessment of model implementation performance. I addressed the issue:
“how large a sample do I need to implement a solicitation based on the
model to obtain a desired performance quantity?” Again, I provided a boot-
strap procedure for determining the smallest sample necessary to obtain the
desired quantity and quality of performance and illustrated the procedure
with a three-variable response model.
Last, I showed how the bootstrap decile analysis can be used an assessment
of model efficiency. Continuing with the three-variable response model
study, I illustrated the efficiency of the three-variable response model relative
to the eight-variable alternative model. The latter model was shown to have
more unstable predictions than the former model. The implication is: review
a model with too many variables for justification of each variable’s contri-
bution in the model; otherwise, it can be anticipated that the model has
unnecessarily large prediction error variance. The broader implication is: a
bootstrap decile analysis of model calibration, similar to bootstrap decile
analysis of model validation, can be another technique for variable selection,
and other assessments of model quality.