A Credit Data Evolution: Evaluating the

Predictive Power of Time-Varying Credit Bureau Data


Ryan Burton
CAPITAL Services

Thomas Brandenburger
South Dakota State University

Alfred Furth
CAPITAL Services

Chet Wiermanski
Aether Analytics LLC

February 6, 2014
Abstract

Data is central to improvements in any predictive model, and credit reporting agencies have

recently enhanced the way their data is stored and reported. This paper discusses how to leverage

the new data by creating credit trend characteristics to account for movements in credit behaviors

that are indicative of future credit performance. A case study was conducted to evaluate the

benefits of the new data coupled with a trend-calculation technique in predicting whether a

customer will charge off their credit card account within the first year of acquisition. Test and

control scorecards were created to evaluate the incremental benefits of adding credit trend

characteristics to a standard credit risk scorecard. Comparing these scorecards, fit statistics

indicated that lift is gained by using trend characteristics. This new technique is one example of

how time-varying data will help credit issuers better predict risk. Consequently, consumers will

benefit from receiving more appropriate offers based on their enhanced risk profile.

Keywords: predictive model; scorecards; logistic regression; data mining; credit cards

1. Introduction

Credit reporting agencies consistently seek to improve the quality of their data, which is

manifested in the reliability of their credit models. A revolutionary enhancement to the way

data is stored and reported is emerging. Until recently, credit balances, payments, past due

amounts, and credit limits were limited to the most recent single point-in-time for each account.

A new enhancement will offer a time series dimension of dollar amounts for credit balances,

payments, past due amounts, and credit limits reported. One way for credit card issuers to

leverage this new data is to create credit trend characteristics that account for movements in

credit behaviors.

In order to assess the potential of this new data and accompanying trend characteristic

technique, a case study was performed on a credit card portfolio. The goal was to ascertain the

benefits of the new data in predicting whether a customer will charge off their credit card

account within the first year of acquisition by using trend characteristics obtained from

time-varying data compared to using standard credit risk characteristics. A standard credit risk

scorecard using static point-in-time characteristics was created and used as the control. Credit

trend characteristics were calculated by estimating the linear trend of the time-varying

characteristics. Combining the trend characteristics with the traditional credit characteristics, a

second scorecard was created and used as the test. Fit statistics were applied to compare the two

models, and the results indicated significant lift was gained by using trend characteristics. This

new technique is one of many likely to be implemented by modelers to leverage the new data

soon available from credit bureau reporting agencies.

2. Background

Credit scoring has evolved into a critical tool for assessing future credit performance

across a variety of important consumer lending dimensions. Lenders invest heavily in the

development of custom credit scoring systems, seeking new data sources and techniques to

improve the performance of their models. Enhanced data is often relied upon to develop more

robust models. Custom credit scores typically rely heavily upon data from credit reporting

agencies. Credit characteristics derived from a consumer credit file typically evaluate a

consumer's lending history at individual points in time and are therefore a static representation of

the consumer's behavior. The few trend elements that could be derived from a traditional

consumer credit report have typically been ignored. The recent availability of monthly time

series dollar amounts populated within credit balances, payments, past due amounts and credit

limit fields may offer significant incremental information over static credit characteristics [1].

Many lenders obtain credit scores and summarized credit characteristics of their credit

portfolios on a monthly basis; however, consumer credit report characteristics evaluated over

multiple points of time have not been relied upon by lenders nor made available from consumer

credit reporting agencies for account acquisition, credit underwriting, or account management

purposes. The reason for not using credit score and/or credit characteristic trends to manage a

credit portfolio stems from the Fair Credit Reporting Act (1996) which requires furnishers and

users of consumer credit report information to ensure that the consumer credit information is

accurate and up to date [2]. For example, if a characteristic was disputed on a consumer's

account with a separate lender, the update would not be reflected in the original lender's

time-varying data. This is why the validated enhanced credit bureau data is a necessity for lenders

using time-varying data. To ensure data accuracy, the U.S. credit reporting system relies heavily

upon consumers to dispute inaccurate information. This requires the ability for consumers to

review data on their credit file and to file a dispute when the consumer believes content on their

credit report is inaccurate. Until recently the process for consumers to view and dispute the

accuracy of time-varying data did not exist. With this process in place, time-varying data can

now be used for pre-approved credit solicitation, account review and credit underwriting for

credit applications.

3. Credit Trend Characteristics

Credit bureau data is the foundation of nearly every underwriting tool used to evaluate

the current risk level of consumers. Due to the data's static nature, there has been a lack of focus

on characteristics that measure the direction or stability of risk. The standard set of credit

characteristics available from credit bureaus and custom credit characteristics derived from

consumer credit report information are usually in the form of the number of current credit

behaviors, time since the occurrence of certain events, and combinations of these two

dimensions.

Credit trend characteristics extract and summarize the trend of information associated

with traditional credit scoring characteristics. A traditional credit scoring data set used to create

a custom credit score has rows of distinct customer information. Columns are populated by credit

characteristics and at least one performance outcome.

In the new enhanced credit data, each account will now have rows representing different

points in time for many credit characteristics. To capture the trend found within each of the

credit characteristics, a credit trend characteristic for each credit characteristic per account can be

calculated [3]. This trend is the rate of change of the credit characteristic with respect to time.

Graphically this is equivalent to the slope of the line of best fit through the data points of that

credit characteristic over time.

[Figure 1: Data time line.]

Figure 1 shows the time line of data used to fit traditional and enhanced models. Traditional

credit characteristics are usually captured in the current month to estimate the performance target
captured in future months. The credit trend characteristics measure the rate of change of

characteristics over the past several months. The formula for estimating the slope in a simple

linear regression model is given in equation (1) where $n$ is the number of observations, and

$y$ and $x$ are the dependent and independent observations respectively.

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} \qquad (1)$$

Applying equation (1), a credit trend characteristic representing the rate of change with respect to

time can be calculated. Equation (2) shows the calculation for the credit trend characteristic, $T_{i,j}$,

corresponding to the credit characteristic $i$ and account $j$ where $n_{i,j}$ is the number of

observations, $t_k$ is the time of the $k$th observation, and $y_{i,j,k}$ is the value of the observation at time $t_k$.

$$T_{i,j} = \frac{n_{i,j} \sum_{k=1}^{n_{i,j}} t_k \, y_{i,j,k} - \sum_{k=1}^{n_{i,j}} t_k \sum_{k=1}^{n_{i,j}} y_{i,j,k}}{n_{i,j} \sum_{k=1}^{n_{i,j}} t_k^2 - \left(\sum_{k=1}^{n_{i,j}} t_k\right)^2}. \qquad (2)$$

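A general logistic regression model utilizing trend and standard characteristics is shown in

equation (3) where $\hat{p}$ is the estimated probability of a binary event and $\beta$ is the estimated parameter

for the $x$ (standard) or $T$ (trend) credit characteristics.

$$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 T_1 + \beta_3 x_2 + \beta_4 T_2 + \cdots + \beta_{2m-1} x_m + \beta_{2m} T_m. \qquad (3)$$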
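
As a rough illustration of how the model in equation (3) turns characteristics into a probability, the sketch below inverts the logit for two accounts. The intercept, the coefficients, and the feature names are hypothetical values chosen only for illustration; they are not fitted values from the case study, and raw characteristic values are used here for simplicity rather than coarse-classified ones.

```python
import math

def charge_off_probability(intercept, coefficients, features):
    """Invert the logit of equation (3): p = 1 / (1 + exp(-log_odds))."""
    log_odds = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical parameters (assumptions for illustration, not fitted values).
intercept = -2.0
coefficients = {"utilization": 1.0, "utilization_trend": 10.0}

# Two hypothetical accounts with the same current utilization but opposite trends.
falling = {"utilization": 0.5, "utilization_trend": -0.03}
rising = {"utilization": 0.5, "utilization_trend": 0.03}

p_falling = charge_off_probability(intercept, coefficients, falling)
p_rising = charge_off_probability(intercept, coefficients, rising)
print(f"falling trend: {p_falling:.3f}, rising trend: {p_rising:.3f}")
```

With a positive coefficient on the trend term, the account with the rising trend receives the higher estimated probability even though both accounts share the same static value.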

Using credit trend characteristics, customer behavior can be defined with more precision. As an

example, two customers with the same current past due balance can be ranked by their credit

worthiness according to their past due balance credit trend characteristic. The customer whose

current past due balance has been growing in size will exhibit a trend up, which would equate to

a positive trend characteristic. A customer whose delinquent balance has been shrinking over

time will exhibit a downward trend. The second customer may be considered less risky.

Figure 2 illustrates another example where trend characteristics improve the predictive

power of models. ID1 and ID2 represent separate customers. Traditionally the information about

utilization for these potential customers only consists of the most recent month. With the new

data, prior month utilizations are also available. In Figure 2 and Table 1, time zero represents the

current month and the negative times represent past months. The goal is to predict future

behavior at positive time points.

[Figure 2: Utilization trend comparison.]

[Table 1: Utilization values.]

Both ID1 and ID2 have the same utilization of 0.5 at time 0. Using a traditional model, they

would be scored the same based on their utilization even though they have different previous

behaviors. ID1 has a decreasing utilization, and ID2 has an increasing utilization. Accounts with

increasing utilization are typically riskier than accounts with decreasing utilization even though

they may have the same current utilization. Using a model that encompasses the past utilization

through trend characteristics, ID1 would appropriately be scored higher than ID2 because ID1

has a negative utilization trend characteristic and ID2 has a positive utilization trend

characteristic.
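
The utilization trends for ID1 and ID2 can be reproduced with a short calculation. The sketch below applies the least-squares slope formula of equation (1) to the monthly utilization values listed in Table 1, with time as the independent variable; the function and variable names are illustrative and not part of the original study.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times, as in equation (1)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_t = sum(times)
    sum_y = sum(values)
    sum_ty = sum(t * y for t, y in zip(times, values))
    sum_tt = sum(t * t for t in times)
    denominator = n * sum_tt - sum_t ** 2
    if denominator == 0:
        raise ValueError("all observation times are identical")
    return (n * sum_ty - sum_t * sum_y) / denominator

# Months relative to the current month (time 0), as in Table 1.
months = list(range(-11, 1))
id1 = [1.00, 0.90, 0.80, 0.85, 0.70, 0.85, 0.75, 0.65, 0.70, 0.75, 0.60, 0.50]
id2 = [0.10, 0.10, 0.15, 0.25, 0.30, 0.15, 0.40, 0.20, 0.45, 0.40, 0.45, 0.50]

trend_id1 = trend_slope(months, id1)
trend_id2 = trend_slope(months, id2)
print(f"ID1 trend: {trend_id1:.3f}")  # -0.033
print(f"ID2 trend: {trend_id2:.3f}")  # 0.036
```

Both results agree with the Trend column of Table 1: the two accounts share a utilization of 0.5 at time 0 but have slopes of opposite sign.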

Since the original characteristics included in the model are also indirectly accounted for in

the credit trend characteristic calculation, there is an inherent issue of the characteristics being

correlated with each other. This correlation when measured empirically was not a factor in the

data for the case study. Further, a simulation analysis showed if the variance of the credit

characteristics is reasonably constant over time, then correlation will not be a major issue.

4. Case Study

A case study was conducted to measure the potential benefits of adding credit trend

characteristics to a standard logistic regression model. Two models were created, a control or

Champion model and a test or Challenger model. Using credit card solicitation data, the models

were made to predict if customers will charge off their credit cards within the next 12 months

after acquisition. To simplify the analysis, the credit characteristics used in the models were

restricted to a subset derived from a credit reporting agency's enhanced data including monthly

credit balances, payments, past due amounts, credit limits, and calculated utilization credit

characteristics.

The candidate credit characteristics of the Champion model were traditional static credit

characteristics from the most recent month. The candidate characteristics for the Challenger

model were the same candidate characteristics as in the Champion model along with their

corresponding calculated credit trend characteristics.

The data was partitioned into a training and validation data set. Sixty percent of the

sample making up 3,779 accounts was allocated to the training data set, and forty percent making

up 2,520 accounts was allocated to the validation data set. As a standard practice, each of the

potential credit characteristics was coarse classified by its weight of evidence before the

variable selection process [4].
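
For readers unfamiliar with weight of evidence, the sketch below uses one common definition: for each bin, the natural log of the share of non-charge-off ("good") accounts in the bin divided by the share of charge-off ("bad") accounts in the bin. The bin labels and counts are hypothetical and are not taken from the case study data; the binning actually used in the study is not described here.

```python
import math

def weights_of_evidence(bins):
    """Return WoE per bin: ln(share of goods in bin / share of bads in bin).

    bins maps a bin label to a (good_count, bad_count) pair.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    for label, (good, bad) in bins.items():
        if good == 0 or bad == 0:
            # WoE is undefined for an empty class; such bins are usually merged.
            raise ValueError(f"bin {label!r} needs both goods and bads")
        woe[label] = math.log((good / total_good) / (bad / total_bad))
    return woe

# Hypothetical (good, bad) counts per utilization-trend bin.
bins = {
    "falling": (500, 20),
    "flat": (300, 30),
    "rising": (200, 50),
}
woe = weights_of_evidence(bins)
for label, value in woe.items():
    print(f"{label}: {value:+.3f}")
```

Under this definition, a positive value marks a bin with proportionally more good accounts than bad ones, and a negative value marks the opposite.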

After coarse classifying the characteristics, the same variable selection process was

conducted for the Champion and Challenger models similar to the method outlined in [5]. Using

the selected credit characteristics, each model was made using logistic regression. The credit

characteristics selected and their Wald Chi-square and p-values in each model are shown in

Tables 2 and 3.

[Table 2: Champion model parameter statistics.]


[Table 3: Challenger model parameter statistics.]

According to the Wald Chi-Square statistic, each parameter in both models is significant at an

$\alpha = 0.1$ level of significance. Tables 2 and 3 show the addition of the credit trend characteristics

changed which original characteristics were significant since the trend characteristics contain

some of the same information about the customer's behavior. As an example, the Credit Line

characteristic is no longer significant after adding the trend characteristics because the trend

characteristics do a better job at explaining that portion of credit worthiness of a consumer.

4.1 Model performance

Table 4 shows the fit statistics of the models including the Kolmogorov-Smirnov (KS) statistic

and the area under the receiver operating characteristic curve (AUC). The magnitude of the fit

statistics is relatively small because the goal was not to build the best model, but to simply

compare the effect of adding the trend characteristics. The Challenger model has higher fit

statistics than the Champion model in both the training and validation datasets.
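
Both fit statistics can be computed directly from model scores and observed outcomes. The sketch below is one possible implementation, assuming a higher score means a riskier account: KS is taken as the largest gap between the cumulative true positive and false positive rates, and AUC is accumulated with the trapezoid rule along the ROC curve. The scores and labels are a hypothetical toy sample, not case study data.

```python
def ks_and_auc(scores, labels):
    """Return (KS, AUC) for risk scores where a higher score means riskier.

    labels: 1 = charged off (event), 0 = did not charge off.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one event and one non-event")
    # Walk thresholds from the highest score down, grouping tied scores.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    ks = auc = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        tpr, fpr = tp / n_bad, fp / n_good
        ks = max(ks, abs(tpr - fpr))
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid rule
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return ks, auc

# Hypothetical toy sample: eight accounts, four of which charged off.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
ks, auc = ks_and_auc(scores, labels)
print(f"KS = {ks:.2f}, AUC = {auc:.2f}")  # KS = 0.50, AUC = 0.75
```

An AUC of 0.5 corresponds to a model that ranks accounts no better than chance, which is why the case study values slightly above 0.5 are described as relatively small.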

[Table 4: Fit statistics.]

Figure 3 shows the receiver operating characteristic (ROC) curves for the Champion and

Challenger models using the validation data set.

[Figure 3: ROC Curve using validation data.]

On the validation data set, at a false positive rate of 0.2, the Challenger model attains a true

positive rate of 0.36 while the Champion model attains a true positive rate of only 0.26. A

company could target the customers least likely to charge off with credit card offers and

could expect significantly fewer losses by utilizing the Challenger model.

5. Conclusion

Traditionally, data provided by credit reporting agencies has been very reliable at

predicting credit worthiness of consumers. However, the addition of a trend component

associated with the credit characteristics studied enhanced the ability to predict credit worthiness

over traditional credit bureau information. Credit reporting agencies are on the brink of providing

new time-varying monthly information which will provide modelers the trend perspective to

better assess future credit performance.

To leverage this new enhanced data, users of consumer credit report information will

need to utilize new modeling techniques and invest in credit bureau aggregation software that

will allow them to create custom credit trend characteristics tailored to specific aspects of the

portfolios and target. By adding credit trend characteristics into their credit scoring and credit

decision support platforms, lenders and users of consumer credit report information should

expect to derive significant improvement across all aspects of the consumer credit life cycle.

References

1. Gaskill D, Kolo B, McGraw T, Wiermanski C. US Patent No 20,120,278,227. Patent and

Trademark Office: Washington, DC, 2012.

2. The Fair Credit Reporting Act of 1996. 15 U.S.C. § 1681, 1996.

3. Morrison J. Marrying Credit Scoring and Time-Series Data. The RMA Journal:

Philadelphia, PA, March 2010.

4. Siddiqi N. Credit Risk Scorecards: Developing and Implementing Intelligent Credit

Scoring. John Wiley & Sons, Inc.: Hoboken, NJ, 2006.

5. Thomas L, Edelman D, Crook J. Credit Scoring and Its Applications. Society for

Industrial and Applied Mathematics: Philadelphia, PA, 2002.


Tables

Table 1: Utilization values.


Time   -11   -10    -9    -8    -7    -6    -5    -4    -3    -2    -1     0   Trend
ID1   1.00  0.90  0.80  0.85  0.70  0.85  0.75  0.65  0.70  0.75  0.60  0.50  -0.033
ID2   0.10  0.10  0.15  0.25  0.30  0.15  0.40  0.20  0.45  0.40  0.45  0.50   0.036

Table 2: Champion model parameter statistics.


Credit Characteristic   Wald Chi-Square   P-Value
Payment                 7.515             0.006
Credit Line             6.450             0.011
Balance                 7.008             0.008
Utilization             3.389             0.066

Table 3: Challenger model parameter statistics.


Credit Characteristic   Wald Chi-Square   P-Value
Credit Line Trend       5.325             0.021
Balance Trend           5.866             0.015
Payment Trend           3.486             0.062
Payment                 13.056            0.001
Utilization Trend       8.194             0.004
Past Due Trend          2.750             0.097
Balance                 3.563             0.059
Utilization             2.953             0.086

Table 4: Fit statistics.


       Training                Validation
       Champion   Challenger   Champion   Challenger
KS     0.131      0.148        0.140      0.215
AUC    0.574      0.604        0.576      0.636

Figures

Figure 1: Data time line.


         Trend Characteristics       Traditional       Performance Target
                                     Characteristics
Month    -24  -23  . . .  -2  -1     Current           1  2  . . .  11  12

Figure 2: Utilization trend comparison.

Figure 3: ROC Curve using validation data.
