
Introduction

Part 2 is intended to illustrate how binary classification performance metrics make
it possible for you to put an exact value, in dollars per event, on new information
that relates to a predictive model.

Note that new information will appear to be worth far more if it is compared to
having no forecasting model than if it is compared to the state of partial
knowledge available from the current model. Sellers of information (and data
science consultants!) love to take credit for any information gain they achieve
over the base rate.

Very often, some intermediate state of knowledge is already available for which no
additional spending is required. Evaluating the realistic incremental financial
gain from new information, whether from licensing a third-party commercial database
or from collecting new data internally, is therefore of great practical value: it
sets an upper bound on what your company should be willing to pay to license or
create the new information.

In this case study, your boss has been in discussions with a credit-risk analytics
company that uses advanced machine learning and claims to score individual
probability of default with very high information gain. Let's call the company
Eggertopia. Eggertopia sales representatives claim their pre-processed risk scores
can achieve AUC values as high as .85 or even higher. However, Eggertopia scores
are sold per event, and they are expensive!

Your boss asks you to determine the incremental financial value to the bank of
purchasing Eggertopia risk scores on future credit-card applicants.

Eggertopia agrees to apply its algorithms to generate credit scores for the 400
individuals in the Training and Test Sets. Eggertopia scores do not need to be
combined with anything else to make a model. However, since the scores range from
approximately -600 (best credit risk) to 4900 (most likely to default), they will
need to be standardized and adjusted to fit the -3.5 to 3.5 range of the AUC
Calculator Spreadsheet (below).

AUC_Calculator and Review of AUC Curve.xlsx
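
The standardization step described above can be sketched in Python. This is only an illustration: it assumes "standardized" means a plain z-score and that "adjusted" means clipping to the spreadsheet's range, neither of which is spelled out here. The function name and the sample values are made up.

```python
import statistics

def standardize_scores(raw_scores, lo=-3.5, hi=3.5):
    """Z-score the raw scores, then clip them to [lo, hi].

    Assumption: z-scoring plus clipping is one plausible reading of
    "standardized and adjusted"; the spreadsheet may do it differently.
    """
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population standard deviation
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [min(hi, max(lo, (s - mean) / sd)) for s in raw_scores]

# Made-up raw scores spanning the stated range of about -600 to 4900.
raw = [-600, 150, 900, 2100, 4900]
z = standardize_scores(raw)
```

Because a z-score is a monotone transformation, the ranking of applicants is unchanged, so the AUC of the scores is not affected by this step (clipping could only introduce ties at the extremes).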


You will determine the sustainable AUC of the Eggertopia scores, the sustainable
cost-per-event, and the savings per event, when comparing Eggertopia data to the
base rate forecast.

You will then calculate the incremental savings per event if you compare use of
Eggertopia data to use of your current model developed in Part 1.

Question: What is the AUC of the Eggertopia Scores on the Training Set? Give your
answer to two digits to the right of the decimal point.

.85 r
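
For readers who want to check an AUC outside the spreadsheet, here is a minimal Python sketch. It uses the rank interpretation of AUC (the probability that a randomly chosen defaulter scores higher than a randomly chosen non-defaulter, with ties counted as half), which may not be the exact procedure the AUC Calculator uses. The scores and labels are made up.

```python
def auc(scores, labels):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly.

    labels: 1 = default, 0 = no default. Higher score = riskier.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Made-up example: 3 of the 4 defaulter/non-defaulter pairs are ordered correctly.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)  # 0.75
```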

What is the optimum threshold on the training set to minimize the average cost per
test?

.15 x
.1 r

What is the average cost-per-event at the Training Set optimum threshold?


$640 x
$500 x
$540 x
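
The threshold search and the resulting cost per event can be sketched as below. This is a hedged illustration, not the spreadsheet's method: the per-error costs are passed in as parameters because they are not stated here, and the sample data, candidate thresholds, and dollar amounts are all made up.

```python
def cost_per_event(scores, labels, threshold, cost_fn, cost_fp):
    """Average cost per applicant when scores >= threshold are classified as defaults.

    cost_fn: cost of a missed default (false negative).
    cost_fp: cost of rejecting a good applicant (false positive).
    """
    total = 0.0
    for s, y in zip(scores, labels):
        predicted_default = s >= threshold
        if y == 1 and not predicted_default:
            total += cost_fn
        elif y == 0 and predicted_default:
            total += cost_fp
    return total / len(scores)

def best_threshold(scores, labels, candidates, cost_fn, cost_fp):
    """Return the candidate threshold with the lowest average cost per event."""
    return min(
        candidates,
        key=lambda t: cost_per_event(scores, labels, t, cost_fn, cost_fp),
    )

# Made-up standardized scores, labels, and costs, for illustration only.
scores = [-1.0, -0.2, 0.3, 1.5]
labels = [0, 0, 1, 1]
chosen = best_threshold(scores, labels, [-0.5, 0.0, 0.5], cost_fn=5000, cost_fp=2500)
print(chosen)  # 0.0
```

The threshold is chosen on the Training Set and then reused unchanged on the Test Set, which is what the later questions ask for.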

What is the AUC of the Eggertopia scores on the Test Set?

.85 r

Using the same threshold as used on the training set, what is the cost per event of
the Eggertopia scores on the Test Set? Round to the nearest dollar.

$838 r

If the bank did not have your model, or any other way of forecasting default, what
is the maximum (break-even) price per event that the bank could theoretically pay
for Eggertopia scores? In other words, what is the absolute savings per event from
Eggertopia's scores?

Hint: Calculate the difference between the cost-per-event at a 25% default rate
and the cost-per-event using Eggertopia scores.

$418 x
$423 x
$412 r

What is the True Positive rate of the forecasting model using Eggertopia Scores?

.72 r

What is the Positive Predictive Value (PPV) of the forecasting model using
Eggertopia scores?

Hint: To calculate the PPV, divide the portion of True Positives by the total
number of Positive Classifications. Review confusion matrix definitions and letter
designations on the Information Gain Spreadsheet, [PPV is defined at Cell G41],
obtain True Positive and False Positive Rates from the AUC Calculator Spreadsheet,
and use algebra to solve.

Information Gain Calculator.xlsx

.50 x
.54 x
.52 x
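
The algebra in the hint can be sketched as follows, assuming the usual definition of PPV as true positives divided by all positive classifications. The True Positive rate of .72 and the 25% default rate come from the text above. The False Positive rate used in the example is a placeholder, not the spreadsheet's value, so the printed result is not the quiz answer.

```python
def ppv(tpr, fpr, base_rate):
    """Positive Predictive Value from rates and the default base rate.

    True positives as a share of all events:  tpr * base_rate
    False positives as a share of all events: fpr * (1 - base_rate)
    """
    tp = tpr * base_rate
    fp = fpr * (1 - base_rate)
    if tp + fp == 0:
        raise ValueError("no positive classifications")
    return tp / (tp + fp)

# fpr=0.30 is a hypothetical placeholder; read the real value off the
# AUC Calculator Spreadsheet at the chosen threshold.
example_ppv = ppv(tpr=0.72, fpr=0.30, base_rate=0.25)
print(round(example_ppv, 3))
```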

Incremental Financial Value of Eggertopia Scores

You calculated a cost per event for your own predictive model on Test Set data to
answer Quiz 1 - Part 1, Question 6.

Question: Assuming that the performance of the Eggertopia model and your model both
remain stable on any future data (a big assumption), what is the maximum, or
break-even, price that the bank could pay per score for Eggertopia, given that it
already has your model and data?

$200 r
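
The incremental break-even price is the cost per event of the model the bank already has minus the cost per event with Eggertopia scores. The sketch below assumes a current-model cost of $1,038, which is not stated here: it is inferred from the quiz figures ($838 + $200), and the actual value comes from Part 1, Question 6.

```python
def incremental_break_even(current_cost, cost_with_scores):
    """Maximum price per score worth paying, given the model already in use.

    Returns 0 if the purchased scores do not lower the cost per event.
    """
    return max(0, current_cost - cost_with_scores)

current_model_cost = 1038  # inferred from the quiz figures, not stated in this text
eggertopia_cost = 838      # Test Set cost per event with Eggertopia scores

price_cap = incremental_break_even(current_model_cost, eggertopia_cost)
print(price_cap)  # 200
```

Compared with the $412 saved relative to the base rate, the same scores are worth less than half as much to a bank that already has a working model.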
