
Data Mining in Building Behavioral Scoring Models

Horng-I Hsieh
Graduate Institute of Business Administration
Fu-Jen Catholic University
Taipei County, Taiwan
timon.hsieh@gmail.com

Tsung-Pei Lee
Department of International Trade and Finance
Fu-Jen Catholic University
Taipei County, Taiwan
004167@mail.fju.edu.tw

Tian-Shyug Lee*
Department of Business Administration
Fu-Jen Catholic University
Taipei County, Taiwan
036665@mail.fju.edu.tw

Abstract-Credit scoring and behavioral scoring have become very important credit risk management tasks during the past few years due to the impact of several financial crises. The objective of the proposed study is to explore the performance of behavioral scoring using three commonly discussed data mining techniques: linear discriminant analysis (LDA), backpropagation neural networks (BPN), and support vector machine (SVM). To demonstrate the effectiveness of behavioral scoring using the above-mentioned techniques, behavioral scoring tasks are performed on one bank credit card dataset in Taiwan. As the results reveal, BPN outperforms the other techniques in terms of overall scoring accuracy and hence is an efficient alternative for implementing behavioral scoring tasks.
Keywords-behavioral scoring; multi-class classification; neural networks; support vector machine
* Corresponding author.

I. INTRODUCTION

The credit industry has grown rapidly during the past few decades, and financial institutions could no longer decide whether to grant credit to customers without using various automatic analysis techniques. With increased competition in the financial industry, banks find it harder than before to develop new customers. Because the cost of attracting new customers is rising rapidly, retaining existing customers is a must for all banks. Banks may easily lose customers if they cannot satisfy them with good-quality service; such customers will choose other banks that can provide better personalized services. Since contemporary marketing treats customers as valuable assets, it is crucial to extract knowledge from the large volumes of consumer data collected by banks' credit departments in order to maintain the viability of the credit book and optimize it. Financial institutions invest large amounts of resources to understand and explore the contribution of their existing customers. Through analysis of customer contribution, they can identify valuable customers, make efforts to retain them and enhance their contributions, increase customer satisfaction, build better bank-customer relationships, and locate the causes of customer loss. The analysis results serve as an important reference for marketing and decision making.
However, it is not sufficient to identify valuable customers solely on the basis of their payment histories. A customer who pays his bills on time every month might also be involved in risky behaviors at the same time. As delinquencies and charge-offs soar, the problem of personal bankruptcy has become an international concern, yet it is little understood [1]. Credit risk assessment is therefore crucial for financial institutions, especially in this era of economic globalization. Recently, several financial crises, such as the credit and cash card crises in Taiwan and the subprime mortgage crisis in the US, have not only caused severe recessions in domestic economies but also resulted in a global financial disaster. All of these were triggered by inappropriate credit decisions. Speed and efficiency are key components of the success of credit issuers; however, profitability in the consumer credit industry is based on faster and better decisions. Granting credit to customers with poor credit performance may cause huge losses and may even lead banks into bankruptcy. The adage "pay on the spot and borrow a lot; pay slow and you'll get no dough" fails to attack the problem before it arises. The financial health of a customer may change dramatically and thus must be monitored and managed over the life of the leasing relationship.
Credit scoring and behavioral scoring are the primary techniques for answering these needs; they aim to increase cash flow and reduce losses from bad credit decisions. Many banks continue to focus attention and resources on automating front-end credit scoring processes, but back-end behavioral scoring is the foundation of their success [2]. The objective of this study is to explore the performance of behavioral scoring using several commonly discussed data mining techniques: linear discriminant analysis (LDA), backpropagation neural networks (BPN), and support vector machine (SVM).
The remainder of the paper is organized as follows. An overview of credit and behavioral scoring models is given in Section II. Section III provides a brief outline of LDA, BPN, and SVM. The empirical results of behavioral scoring models using LDA, BPN, and SVM are provided in Section IV. Section V concludes the study and discusses possible areas for further research.
II. CREDIT AND BEHAVIORAL SCORING MODELS

Credit risk management has become one of the most important tasks in the credit industry. Credit scoring and its derivative, behavioral scoring, are the primary techniques that help organizations decide whether to grant credit to consumers based on the credit risk of applicants [3]. The objective of both credit and behavioral scoring models is to assign consumers to either a good credit group that is likely to repay its financial obligation or a bad credit group whose application should be denied because of its high probability of defaulting on the financial obligation. Credit scoring therefore lies in the domain of the more general and widely discussed classification problems [4]. Credit scoring is done on the front end, when a new consumer applies for credit. Behavioral scoring, on the other hand, evaluates the credit risk of existing consumers based on the same principles. In building behavioral scoring models, one uses the credit scoring variables and adds others that describe the consumer's behavior [3].

This research was partially supported by the National Science Council of the Republic of China under Grant Number NSC 97-2221-E-030-011-MY2.

978-1-4244-5392-4/10/$26.00 ©2010 IEEE

Behavioral scoring has gained more and more attention because the explosion of competition in recent years has made it difficult to attract and retain profitable, low-risk customers. Generally, most business revenue, sometimes up to 90 percent, is produced from repeat transactions with existing customers [2]. In addition, the financial health of customers may change over time and thus must be continuously monitored and managed. By forecasting customers' future performance, behavioral scoring models allow financial institutions to make faster and better decisions to retain creditworthy and valuable customers.
Traditional research often treats credit scoring and behavioral scoring models as binary classification problems, an approach that might miss useful information [5]. To discover more knowledge for advanced credit risk management, multi-class classifiers are needed [6]. This study therefore used multi-class models to build behavioral scoring models.
III. RESEARCH METHODOLOGY

A. Artificial Neural Networks
A neural network, modeled on the neural activity of the human brain, is a computer-intensive, algorithmic procedure for transforming inputs into desired outputs using highly interconnected networks of relatively simple processing elements. Neural networks are increasingly found to be useful in modeling non-stationary processes due to their associated memory characteristics and generalization capabilities. Neural networks have been widely used in engineering, science, education, social research, medical research, business, finance, forecasting, and related fields. Neural networks have also been explored in handling credit and behavioral scoring problems [4], and the results show that the scoring accuracies of neural networks are better than those of traditional statistical and parametric approaches.
B. Support Vector Machine
SVM is a novel non-parametric statistical learning algorithm developed by Vapnik [7]. The original SVM was designed to solve binary classification problems and has gained popularity owing to its many attractive features and promising generalization performance. Based on the structural risk minimization (SRM) principle, SVM seeks to minimize an upper bound on the generalization error, in contrast to the empirical risk minimization (ERM) principle implemented in most traditional neural network models. Neural network models tend to fall into local minima. Training an SVM, on the other hand, is equivalent to solving a linearly constrained quadratic programming (QP) problem, so the solution is both global and unique.
To create a classifier, the SVM learns a line or hyperplane that separates the two classes of the data set. The hyperplane that maximizes the margin is called the optimal separating hyperplane (OSH). When the data are not linearly separable, the SVM uses the kernel method to map the input data into a high-dimensional feature space via a nonlinear mapping and then performs a linear separation between the two classes in that space.
The SVM works very well with high-dimensional data and thereby avoids the curse of dimensionality. It has been widely used in modeling credit and behavioral scoring problems, and preliminary evidence suggests that support vector machines may be the most accurate [8]. For a detailed introduction, readers are referred to [7].
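For reference, the linearly constrained QP mentioned above is usually written in the standard soft-margin form below. This is the textbook formulation, not something specific to this study, and the notation is assumed here rather than taken from the paper: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, slack variables $\xi_i$, penalty parameter $C$, feature map $\phi$, and RBF kernel width $\gamma$.

```latex
\begin{align}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1,\dots,N, \\
\text{with kernel}\quad & K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)
= \exp\!\bigl(-\gamma\,\lVert x_i - x_j\rVert^{2}\bigr).
\end{align}
```

Because the objective is convex and the constraints are linear, any local minimum is also a global one, which is the property contrasted with neural network training above.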
IV. EMPIRICAL STUDY

The aim of this study is to explore the performance of behavioral scoring using three commonly discussed data mining techniques: LDA, BPN, and SVM. To demonstrate the effectiveness of behavioral scoring using the above-mentioned techniques, behavioral scoring tasks are performed on one bank credit card dataset in Taiwan. The dataset consists of a total of 10,769 cardholders. Each cardholder record contains 41 variables, such as demographics, credit history, account balances, and payment history, which describe the cardmember's characteristics, credit status, and credit card usage behavior. The dataset is divided into four groups: transactor users, revolver users, inactive users who do not use their credit cards, and bad credit customers. The four groups contain 3835, 2567, 3884, and 483 cardholders, respectively. Table I summarizes the distribution of the dataset.
TABLE I. DISTRIBUTION OF DEPENDENT VARIABLE

Groups    Description            Frequency   Percentage
Group 1   Transactor users       3835        35.61%
Group 2   Revolver users         2567        23.84%
Group 3   Inactive users         3884        36.07%
Group 4   Bad credit customers   483         4.49%
Total                            10769       100.0%

To minimize the possible bias associated with the random sampling of the training and testing samples, researchers tend to use an n-fold cross-validation scheme to evaluate the classification capability of the built model. In n-fold cross-validation, the entire dataset is randomly split into n mutually exclusive subsets (also called folds) of approximately equal size, preserving the ratios of the different populations. The classification model is then trained and tested n times. Each time, the model is built using (n-1) folds as the training sample, and the remaining single fold is retained for testing. The training sample is used to estimate the parameters of the behavioral scoring model, while the retained holdout sample is used to test the generalization capability of the built model. The overall classification accuracy of the built model is then simply the average of the n individual accuracy measures. Five-fold cross-validation is adopted in this study, and the detailed behavioral scoring results for the above-mentioned modeling techniques are summarized as follows.
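The cross-validation scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `stratified_folds` and `cross_validate` are hypothetical, the labels are synthetic placeholders that only reproduce the group sizes of Table I, and `fit_and_score` stands for whatever model-fitting routine returns a test accuracy.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds while preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin, so the per-class count in any two
        # folds differs by at most one.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def cross_validate(labels, fit_and_score, n_folds=5, seed=0):
    """Return the per-fold accuracies and their simple average."""
    folds = stratified_folds(labels, n_folds, seed)
    accuracies = []
    for k, test_idx in enumerate(folds):
        # Train on the other (n - 1) folds, test on the held-out fold.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        accuracies.append(fit_and_score(train_idx, test_idx))
    return accuracies, sum(accuracies) / len(accuracies)


# Group sizes from Table I; the label values are synthetic placeholders.
labels = [1] * 3835 + [2] * 2567 + [3] * 3884 + [4] * 483
folds = stratified_folds(labels)
print([len(fold) for fold in folds])
```

Dealing each group separately is what keeps the small bad-credit group (483 cardholders) represented in every fold; a purely random split could leave one fold with very few such cases.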
A. Linear Discriminant Analysis Model
The stepwise discriminant approach is adopted in building the discriminant analysis behavioral scoring models. Twenty-six of the forty-one independent variables are selected in the final discriminant function. The behavioral scoring classification results of the corresponding testing samples, obtained from the five discriminant functions, are summarized in Table II.
TABLE II. CROSS-VALIDATION RESULTS OF THE LDA MODELS

Behavioral scoring results
Fold number   {1-1}              {2-2}              {3-3}              {4-4}            Average correct classification rate
1             92.31% (708/767)   75.05% (385/513)   68.34% (531/777)   94.85% (92/97)   79.67% (1716/2154)
2             92.70% (711/767)   71.35% (366/513)   72.20% (561/777)   90.72% (88/97)   80.13% (1726/2154)
3             92.18% (707/767)   70.96% (364/513)   70.01% (544/777)   92.78% (90/97)   79.16% (1705/2154)
4             94.00% (721/767)   76.07% (391/514)   72.97% (567/777)   97.92% (94/96)   82.31% (1773/2154)
5             91.66% (703/767)   75.88% (390/514)   72.94% (566/776)   89.58% (86/96)   81.05% (1745/2153)
Mean          92.57%             73.86%             71.29%             93.17%           80.46%

Here {x-y} means that a customer in group x is classified into group y; for the definitions of the groups, please refer to Table I.

From the results in Table II, we can observe that the average correct classification rates for the five folds are 79.67%, 80.13%, 79.16%, 82.31%, and 81.05%, respectively, with a mean of 80.46%. Among the four groups of customers, group 4 achieves the best classification accuracy, between 89.58% and 97.92% with an average accuracy of 93.17%, while group 3 has the worst classification accuracy, ranging from 68.34% to 72.97%.
B. Backpropagation Neural Networks Model
The popular BPN is used in building the behavioral scoring model, and a single-hidden-layer network is used as the network structure. There are 41 input nodes in the input layer and 4 output nodes. Determining the optimal number of hidden nodes is a crucial yet complicated issue, and the most commonly used way of determining it is via experiments or trial and error. We therefore also use the trial-and-error approach, with a range from 43 to 88 neurons, to determine the appropriate number of hidden nodes for the desired networks. The training of a network is implemented with various learning rates ranging from 0.01 to 0.9 (almost no network structure converges with a learning rate greater than 0.9) and training lengths ranging from 56,000 to 500,000 iterations until the network converges. Network weights are reset for each combination of network parameters such as learning rate and momentum.
As the training of any neural network is itself a stochastic process, the reported neural network result is the median value of 20 repeated trials, which avoids possible extreme values due to better or poorly trained networks. The network topology with the highest correct classification rate is considered the optimal network topology. Five neural network behavioral scoring models were built, and the classification results of the corresponding testing samples are summarized in Table III. From the results in Table III, we can observe that the average correct classification rates for the five folds are 94.52%, 93.55%, 93.92%, 94.99%, and 95.12%, respectively, with a mean of 94.42%. Among the four groups of customers, group 4 achieves the best classification accuracy, between 96.88% and 100.00% with an average accuracy of 97.93%, while group 2 has the worst classification accuracy, ranging from 89.67% to 93.39%.
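The trial-and-error search over hidden-node counts and learning rates, with each configuration scored by the median of repeated training runs, can be sketched as follows. This is only an outline under loud assumptions: `train_once` is a hypothetical placeholder that returns a noisy made-up score instead of training a real network, the four learning rates are arbitrary sample points within the 0.01 to 0.9 range, and none of the numbers it produces are results from the paper.

```python
import random
import statistics


def train_once(hidden_nodes, learning_rate, rng):
    """Hypothetical stand-in for one BPN training run; returns a test accuracy.

    Placeholder only: this produces a noisy synthetic score, NOT a trained network.
    """
    base = 0.90 + 0.0005 * min(hidden_nodes - 43, 20) - 0.02 * abs(learning_rate - 0.1)
    return base + rng.gauss(0.0, 0.005)


def median_accuracy(hidden_nodes, learning_rate, trials=20, seed=0):
    """Median test accuracy over repeated runs, damping lucky or unlucky runs."""
    rng = random.Random(seed)
    return statistics.median(
        train_once(hidden_nodes, learning_rate, rng) for _ in range(trials)
    )


def search_topology(hidden_range, learning_rates):
    """Keep the configuration with the highest median accuracy."""
    best = None
    for hidden_nodes in hidden_range:
        for learning_rate in learning_rates:
            acc = median_accuracy(hidden_nodes, learning_rate)
            if best is None or acc > best[0]:
                best = (acc, hidden_nodes, learning_rate)
    return best


# 43 to 88 hidden nodes, as in the text; the learning rates are sample values.
best_acc, best_hidden, best_lr = search_topology(range(43, 89), [0.01, 0.1, 0.5, 0.9])
print(best_hidden, best_lr)
```

Using the median rather than the mean means a single diverged or unusually lucky run among the 20 trials does not shift the reported figure.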

TABLE III. CROSS-VALIDATION RESULTS OF THE BPN MODELS

Behavioral scoring results
Fold number   {1-1}              {2-2}              {3-3}              {4-4}             Average correct classification rate
1             95.83% (735/767)   90.45% (464/513)   95.37% (741/777)   98.97% (96/97)    94.52% (2036/2154)
2             94.39% (724/767)   89.67% (460/513)   94.85% (737/777)   96.91% (94/97)    93.55% (2015/2154)
3             95.05% (729/767)   90.64% (465/513)   94.59% (735/777)   96.91% (94/97)    93.92% (2023/2154)
4             96.87% (743/767)   92.22% (474/514)   94.72% (736/777)   96.88% (93/96)    94.99% (2046/2154)
5             96.48% (740/767)   93.39% (480/514)   94.33% (732/776)   100.00% (96/96)   95.12% (2048/2153)
Mean          95.72%             91.27%             94.77%             97.93%            94.42%

C. Support Vector Machine Model
In this study, the most widely used radial basis function (RBF) [7] [9] is adopted as the kernel function of the SVM. A key problem in SVM modeling is how to select the model parameters correctly, since this choice plays an important role in achieving good generalization performance. However, no general guidelines are available for choosing the free parameters of an SVM model. The selection is usually based on trial and error or on the user's prior knowledge and expertise.
This study used a grid-search method [9] to find the best combination of parameters. Grid search is a straightforward method that tries exponentially growing sequences of $C$ and $\gamma$ to identify good parameters (for example, $C = 2^{-5}, 2^{-3}, 2^{-1}, \ldots, 2^{15}$). Since a complete grid search is time-consuming for a large dataset, a two-step search process, beginning with a coarse grid and followed by a finer grid search, was conducted to select the model parameters. Five SVM behavioral scoring models were built, and the cross-validation results of these models are presented in Table IV. From the results in Table IV, we can observe that the average correct classification rates for the five folds are 94.06%, 94.05%, 93.31%, 94.52%, and 94.84%, respectively, with a mean of 94.16%. Among the four groups of customers, group 4 achieves the best classification accuracy, between 93.75% and 97.94% for each fold with an average accuracy of 95.85%, while group 2 has the worst classification accuracy, ranging from 88.50% to 93.00%.
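The two-step (coarse, then fine) grid search can be illustrated with the short sketch below. Several things here are assumptions rather than details from the paper: the $\gamma$ range $2^{-15}$ to $2^{3}$ follows the example in the guide cited as [9], the fine-grid exponent step of 0.25 and the search window of two exponent units are illustrative choices, and `toy_score` is a hypothetical stand-in for the cross-validated accuracy that a real SVM library would supply.

```python
import math


def exp_grid(lo, hi, step):
    """Return [2**lo, 2**(lo + step), ..., 2**hi] for the given exponent bounds."""
    count = int(round((hi - lo) / step)) + 1
    return [2.0 ** (lo + i * step) for i in range(count)]


def grid_search(score, c_values, gamma_values):
    """Return (best_score, C, gamma) over all combinations of the two grids."""
    return max((score(c, g), c, g) for c in c_values for g in gamma_values)


def two_step_search(score):
    # Step 1: coarse grid with an exponent step of 2.
    coarse_c = exp_grid(-5, 15, 2)
    coarse_gamma = exp_grid(-15, 3, 2)
    _, c0, g0 = grid_search(score, coarse_c, coarse_gamma)

    # Step 2: finer grid (exponent step 0.25, an assumed value) centered on
    # the best coarse point and extending two exponent units on each side.
    ec, eg = math.log2(c0), math.log2(g0)
    fine_c = exp_grid(ec - 2, ec + 2, 0.25)
    fine_gamma = exp_grid(eg - 2, eg + 2, 0.25)
    return grid_search(score, fine_c, fine_gamma)


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy.

    It peaks at C = 2**3.5 and gamma = 2**-6.25, a point the coarse grid misses.
    """
    return -((math.log2(c) - 3.5) ** 2 + (math.log2(gamma) + 6.25) ** 2)


best_score, best_c, best_gamma = two_step_search(toy_score)
print(math.log2(best_c), math.log2(best_gamma))
```

The coarse pass evaluates 11 x 10 = 110 combinations and the fine pass 17 x 17 = 289, far fewer than a single fine grid over the whole range would require.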
TABLE IV. CROSS-VALIDATION RESULTS OF THE SVM MODELS

Behavioral scoring results
Fold number   {1-1}              {2-2}              {3-3}              {4-4}            Average correct classification rate
1             95.83% (735/767)   89.28% (458/513)   94.98% (738/777)   97.94% (95/97)   94.06% (2026/2154)
2             96.22% (738/767)   89.47% (459/513)   94.85% (736/777)   94.85% (92/97)   94.05% (2025/2154)
3             95.05% (729/767)   88.50% (454/513)   94.34% (733/777)   96.91% (94/97)   93.31% (2010/2154)
4             95.96% (736/767)   91.25% (469/514)   95.37% (741/777)   93.75% (90/96)   94.52% (2036/2154)
5             95.70% (734/767)   93.00% (478/514)   95.10% (738/776)   95.83% (92/96)   94.84% (2042/2153)
Mean          95.75%             90.30%             94.93%             95.85%           94.16%

D. Comparison of Results of Different Behavioral Models
Finally, to compare the classification capabilities of the three behavioral scoring models constructed above, the results are summarized in Table V. From Table V, we can conclude that the BPN model has the best behavioral scoring capability in terms of the average classification rate in comparison with LDA and SVM. In terms of group classification accuracies, SVM achieved the best accuracy in groups 1 and 3, while BPN achieved the best accuracy in groups 2 and 4.
TABLE V. GROUP ACCURACY COMPARISONS

Group accuracy   LDA      BPN      SVM
Group 1          92.57%   95.72%   95.75%
Group 2          73.86%   91.27%   90.30%
Group 3          71.29%   94.77%   94.93%
Group 4          93.17%   97.93%   95.85%
Overall          80.46%   94.42%   94.16%
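The per-group comparison in Table V can be checked mechanically. The short sketch below simply transcribes the table's mean accuracies and reports the best model for each row; the dictionary layout and the helper name `best_model` are illustrative choices, not part of the original study.

```python
# Mean accuracies (%) transcribed from Table V.
table_v = {
    "LDA": {"Group 1": 92.57, "Group 2": 73.86, "Group 3": 71.29,
            "Group 4": 93.17, "Overall": 80.46},
    "BPN": {"Group 1": 95.72, "Group 2": 91.27, "Group 3": 94.77,
            "Group 4": 97.93, "Overall": 94.42},
    "SVM": {"Group 1": 95.75, "Group 2": 90.30, "Group 3": 94.93,
            "Group 4": 95.85, "Overall": 94.16},
}


def best_model(row):
    """Return the model with the highest accuracy for the given table row."""
    return max(table_v, key=lambda model: table_v[model][row])


for row in table_v["LDA"]:
    print(row, best_model(row))
```

Note that the margins between BPN and SVM in groups 1 and 3 are a few hundredths of a percentage point, so the choice between the two models is driven mainly by groups 2 and 4.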

V. CONCLUSIONS AND FUTURE RESEARCH

Consumer credit risk management has gained more and more attention during the past few years due to the impact of serious financial crises. Behavioral scoring is a widely used tool with which banks continually assess the ongoing credit risk and consumer behavior of their existing customers. This paper investigates the classification capability of three modeling techniques in behavioral scoring tasks. Experimental results show that the BPN model has better overall classification accuracy than the other models. However, overall classification accuracy is not the only criterion for assessing the capability of a model. For example, if the business strategy of the bank is to build a blacklist of bad credit accounts, the BPN model can be used to accomplish this goal, as it achieved the highest classification accuracy in group 4 (see Table V). On the other hand, if the business strategy is to identify inactive customers who are at risk of being lost to a competitor, then the SVM model should be used for this task.
Future studies may aim at collecting more important independent variables that will increase behavioral scoring accuracy. Combining feature selection tools with classification techniques is also recommended. Integrating other artificial intelligence techniques, such as fuzzy discriminant analysis, genetic algorithms, and gray theory, with support vector machines to further improve behavioral scoring accuracy may also be explored. Other related topics about credit cards, such as customer retention, market basket analysis, profit scoring, and collection scoring models, may also be investigated in future research.

REFERENCES
[1] J.M. Donato, J.C. Schryver, G.C. Hinkel, R.L. Schmoyer Jr., N.W. Grady, and M.R. Leuze, "Mining multi-dimensional data for decision support," Future Gener. Comp. Syst., vol. 15, pp. 433-441, 1999.
[2] M. Banasiak and E. O'Hare, "Behavior scoring," Bus. Credit, vol. 103, pp. 52-55, 2001.
[3] L.C. Thomas, "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," Int. J. Forecast., vol. 16, pp. 149-172, 2000.
[4] T.S. Lee, C.C. Chiu, C.J. Lu, and I.F. Chen, "Credit scoring using the hybrid neural discriminant technique," Expert Syst. Appl., vol. 23, pp. 245-254, 2002.
[5] H.J. Noh, T.H. Roh, and I. Han, "Prognostic personal credit risk model considering censored information," Expert Syst. Appl., vol. 28, pp. 753-762, 2005.
[6] G. Kou, Y. Peng, Y. Shi, M. Wise, and W. Xu, "Discovering credit cardholders' behavior by multiple criteria linear programming," Ann. Oper. Res., vol. 135, pp. 261-274, 2005.
[7] V.N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., NY: Springer, 2000.
[8] J.N. Crook, D.B. Edelman, and L.C. Thomas, "Recent developments in consumer credit risk assessment," Eur. J. Oper. Res., vol. 183, pp. 1447-1465, 2007.
[9] C.W. Hsu, C.C. Chang, and C.J. Lin, "A practical guide to support vector classification," Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, 2003.
