
J. Vis. Commun. Image R. 71 (2020) 102730

Contents lists available at ScienceDirect


journal homepage: www.elsevier.com/locate/jvci

Credit risk assessment of P2P lending platform towards big data based on BP neural network
Yiping Guo
School of Finance and Trade, ZhengZhou ShengDa University of Economics, Business & Management, Zhengzhou 451191, China

Article info

Article history:
Received 10 October 2019
Revised 26 November 2019
Accepted 28 November 2019
Available online 29 November 2019

Keywords:
Peer to peer
Credit risk assessment
Logistic regression
BP neural network
Big data

Abstract

Peer-to-peer (P2P) lending platforms play a significant role in modern financial systems. However, due to improper supervision, credit risk is inevitable. In this paper, we analyze the traditional financial risk and information technology risk of P2P lending platforms. In order to evaluate the performance of assessment algorithms, we present a BP neural network-based algorithm for lending risk assessment. To achieve our task, we crawled large-scale lending data for 2015–2019. Logistic regression is used for comparison with the BP neural network method. Experimental results show that the BP neural network-based algorithm outperforms the traditional logistic regression algorithm and that the proposed method can effectively reduce investor risk.

© 2020 Elsevier Inc. All rights reserved.

This paper has been recommended for acceptance by Maofu Liu.
E-mail address: Guo_Yiping66@hotmail.com

https://doi.org/10.1016/j.jvcir.2019.102730
1047-3203/© 2020 Elsevier Inc. All rights reserved.

1. Introduction

With the development of modern financial systems, peer-to-peer (P2P) lending platforms have been widely adopted, especially in small business lending. In P2P lending, individuals use third-party network platforms to lend capital to other users who need to borrow. P2P lending is a personal loan and a form of direct financing. That is to say, information on capital supply and demand is published and matched directly on the Internet, and the two sides trade directly without traditional financial institutions such as banks. Thus, P2P lending platforms play a significant role in modern financial innovation and are an important component of Internet finance [1–4]. P2P lending makes up for the shortcomings of formal financial institutions and promotes a win-win situation between borrowers and investors: borrowers can get better financing conditions, and investors can get higher interest returns. P2P-based financial platforms can broaden financing and investment channels, boost market consumption, and help form an inclusive financial system. For some small enterprises, P2P lending can simplify the cumbersome process of accessing financing channels and reduce production costs. Therefore, P2P-based lending platforms are of great significance for increasing financial support to small enterprises and breaking financial repression.

However, P2P-based lending platforms lack effective supervision. Because risk control on these platforms is weak, credit risk rises sharply. For example, investigation data show that more than 800 P2P lending platforms have gone bankrupt due to fraud and poor management. P2P lending platforms cover a wide range of areas and are strongly interrelated, which increases the risk of borrowing and lending. A large-scale default may affect the entire financial market. Therefore, it is particularly urgent and important to evaluate the credit of the P2P lending market and build a barrier to safeguard financial security [6,8,9]. The assessment index is an important factor in evaluating the credit risk of a P2P lending platform. Multiple factors can affect the credit of financial systems, such as financial, lending, and credit status. Analyzing these factors is of great significance for credit risk assessment of P2P lending platforms. Designing appropriate indices is another factor for this task. In this paper, we aim to present a BP neural network-based algorithm for lending risk assessment.

2. Related work

Credit risk assessment of P2P lending platforms is a hot research topic in modern financial systems and intelligent applications [6,10,15,19–21]. Freedman et al. [14] pointed out that users tended to hide their true information in order to obtain loans from P2P lending platforms. Even worse, some users leveraged false information to obtain loans, which led investors to invest in borrowers with a higher default risk and increased the risk of capital loss.
Klafft et al. [15] indicated that fraud often occurred because of information asymmetry between investors and borrowers. Thus, credit risk assessment plays an important role in maintaining the stability of the financial order. Two commonly used families of algorithms are statistic-based and network-based methods. Altman et al. [12] proposed the earliest statistic-based method (the Z-score model), which uses financial ratios to evaluate the financial situation of enterprises. Tam et al. [13] leveraged a BP neural network to predict bankruptcy risk based on large-scale bank data samples. The research indicated that the BP neural network-based algorithm outperformed DA, Logit, and KNN based algorithms. One of the major advantages of neural network-based algorithms is that they can handle non-linear data: they can achieve arbitrary non-linear mapping, have good generalization ability, and can obtain high prediction accuracy. Neural network-based algorithms, such as the BP neural network, have a strong ability to learn the distribution of training data. The model can learn and train according to the input pattern samples, construct its reasoning from a large number of complex data mining rules, and effectively adjust its parameters according to changes in the external environment. In the field of economy and finance, large-scale data analysis is unavoidable, and neural network-based methods are a good way to solve this problem. The trained model can fit well the fact that, in the online credit environment, the repayment behavior of online borrowers is susceptible to the influence of the market environment.

Another widely used family is statistic-based methods, such as regression analysis, multivariate discriminant analysis, and the nearest neighbor method. Logistic regression algorithms are widely used in computer vision and intelligent systems [5,7,11,16]. Ohlson et al. [17] first leveraged logistic regression to build a credit classification model and achieved good performance. Dinh et al. [18] pointed out that logistic regression algorithms perform best among all traditional credit risk assessment algorithms. However, logistic regression algorithms require that the data satisfy strict assumptions, so they are difficult to apply in practice. The relationship between financial status and indicators is non-linear, and the indicators are also correlated with each other. Many indicators do not satisfy the hypothetical distribution, such as the orthogonal distribution. In addition, these algorithms cannot cope with large-scale data.

3. Proposed method

In this paper, we leverage a BP neural network for credit risk assessment of P2P lending platforms. In order to highlight the performance of the BP neural network-based algorithm, we run the same experiment with the traditional logistic regression method and compare their performances.

3.1. BP neural network

A BP neural network is a multilayer feedforward neural network whose learning consists of two stages: signal forward propagation and error backward propagation. The architecture of the BP neural network is shown in Fig. 1.

Fig. 1. The basic architecture of BP neural network.

A BP neural network consists of an input layer, a hidden layer, and an output layer. Neurons in adjacent layers are linked together, and neurons in the same layer are not connected. Multiple hidden layers can improve model accuracy, but inevitably increase computational complexity. Since a BP neural network with a single hidden layer can already implement arbitrary nonlinear mapping, we use only one hidden layer in our implementation. Given training samples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the BP neural network aims to learn the parameters that minimize the mean square error:

$$E_k = \frac{1}{2} \sum_{j=1}^{l} \left( \hat{y}_j - y_j \right)^2 \qquad (1)$$

where $\hat{y}_j$ denotes the output value and $y_j$ the corresponding target value. After selecting the BP neural network, we first set parameters for the network. Then, we leverage training samples to train the network and adjust its parameters, so that the weights and thresholds of the neural network adapt to our task. More specifically, we first crawled large-scale lending data for 2015–2019, and we set the number of evaluation indices to 10. Based on our collected dataset, we set the number of input nodes to 10 and the number of nodes in the hidden layer to 10. We set the number of nodes in the output layer to 1 (1 for high-quality users, 0 for high-risk users). We leverage the Sigmoid function as the activation function. The network is trained for 10 epochs with a batch size of 256. The initial learning rate is 0.01, and the learning rate is reduced to 0.005 after 5000 iterations. We set the weight decay and momentum to 0.0002 and 0.9, respectively.

3.2. Index introduction

As discussed above, various indices affect the results of credit risk assessment. In our implementation, we select 10 indices: age, gender, education, job, marital status, income, real estate, car property, mortgage, and car loan. These 10 indices reflect the capital status of users. The output is the credit assessment level. In order to make the data suitable for training a neural network, the data should be quantified, as shown in Table 1. As an example, consider the quantitative index for users' gender. International banks generally acknowledge that women have better credit performance than men and are less likely to default than men.

In our implementation, we gather 460 samples for 2015–2019. The samples consist of 300 high-quality customers and 160 high-risk customers. We divide the samples into training samples and testing samples. The training samples contain 250 high-quality customers and 130 high-risk customers, while the testing samples contain 50 high-quality customers and 30 high-risk customers.

4. Experiment and analysis

Our experiment was conducted on the MATLAB2016a platform. Each training and testing sample can be regarded as a 10-dimensional binary feature vector. First, we use a linear transfer function and the gradient descent algorithm to test different numbers of hidden-layer nodes of the BP neural network. The maximum training time is set to
20000, and the number of hidden-layer nodes is varied from 5 to 16. Table 2 shows the training result. Considering the calculation efficiency and fitting error, we set the number of nodes in the hidden layer to 10. The number of our testing samples is 80; Table 3 shows part of the testing results. The accuracy of credit risk assessment is 93.3%.

In order to highlight the advantages of the BP neural network, we compare the BP neural network-based algorithm with the traditional logistic regression algorithm. The major advantages of logistic regression methods are low computation cost and stable prediction results. We feed the training data into the logistic model for regression and obtain the coefficients of the corresponding variables. Then we feed the test data into the trained regression model to calculate the credit evaluation results. To eliminate the correlation between features, we first carry out principal component analysis (PCA). PCA aims to calculate feature vectors that can represent the original features.

$$J(e) = e^{t} S e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 \qquad (2)$$

where $e$ denotes the unit vector of the projection direction, $S$ denotes the scatter matrix, $x_k$ denotes the k-th training sample, and $m$ denotes the mean value of the samples. To maximize $J(e)$, the scatter matrix should satisfy $Se = \lambda e$. Thus, $\lambda$ denotes an eigenvalue of the matrix $S$, and the solution $e$ is the corresponding eigenvector. In our implementation, we obtain four principal components and the prediction accuracy is 89.36%. Thus, the BP neural network-based algorithm outperforms the traditional logistic regression-based method.

5. Conclusion

In this paper, we propose a BP neural network-based algorithm for credit risk assessment of P2P lending platforms. To achieve our task, we crawled 460 samples for 2015–2019. We conduct experiments to show the effectiveness of the BP neural network for predicting credit levels. A logistic regression-based algorithm is run for comparison. Experimental results show that the BP neural network-based algorithm performs better than the traditional logistic regression-based method.

Table 1
The quantitative results of credit rating.

Index            Quantified as 1             Quantified as 0
Age              30–45                       Others
Gender           Female                      Male
Education        Bachelor degree or above    Others
Job              Business owner              Others
Marital status   Married                     Others
Income           $2000/month or above        Others
Real estate      Yes                         No
Car property     Yes                         No
Mortgage         No                          Yes
Car loan         No                          Yes
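The quantification in Table 1 can be sketched in a few lines of Python. This is only an illustration: the record field names (`age`, `bachelor_or_above`, `monthly_income_usd`, and so on) are hypothetical, and treating the 30–45 age band as inclusive at both ends is an assumption, since the paper does not state how boundary values are handled.

```python
# Hypothetical field names; only the 1/0 coding itself follows Table 1.
def quantify(record):
    """Map one borrower record to the 10-dimensional binary feature vector."""
    return [
        1 if 30 <= record["age"] <= 45 else 0,            # Age (assumed inclusive)
        1 if record["gender"] == "female" else 0,         # Gender
        1 if record["bachelor_or_above"] else 0,          # Education
        1 if record["job"] == "business owner" else 0,    # Job
        1 if record["marital_status"] == "married" else 0,  # Marital status
        1 if record["monthly_income_usd"] >= 2000 else 0,   # Income
        1 if record["owns_real_estate"] else 0,           # Real estate
        1 if record["owns_car"] else 0,                   # Car property
        0 if record["has_mortgage"] else 1,               # Mortgage (Yes -> 0)
        0 if record["has_car_loan"] else 1,               # Car loan (Yes -> 0)
    ]


# Made-up borrower used purely to exercise the function.
example = {
    "age": 38,
    "gender": "male",
    "bachelor_or_above": True,
    "job": "engineer",
    "marital_status": "married",
    "monthly_income_usd": 2500,
    "owns_real_estate": True,
    "owns_car": False,
    "has_mortgage": True,
    "has_car_loan": False,
}
features = quantify(example)
```

Note that the last two indices are coded in the opposite direction from the others, so that in every position a 1 marks the state Table 1 associates with lower credit risk.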

Table 2
The training result for different numbers of nodes in the hidden layer.

Nodes   Iterations   Fitting error     Nodes   Iterations   Fitting error
5       15,500       0.1629            11      7500         0.1382
6       13,500       0.1371            12      7000         0.1493
7       12,500       0.1947            13      6500         0.1162
8       11,000       0.1738            14      6500         0.1427
9       10,500       0.1670            15      6000         0.1285
10      7500         0.1057            16      5500         0.1626
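As a rough illustration of the network being sized in Table 2, the following pure-Python sketch picks the hidden-layer size with the lowest fitting error and trains a one-hidden-layer BP network by per-sample gradient descent on the squared error of Eq. (1). It is a simplification under stated assumptions, not the MATLAB implementation used in the paper: it uses sigmoid units throughout, omits mini-batches, momentum, and weight decay, and trains on synthetic toy data generated by a made-up rule rather than on the crawled lending data. The names `BPNetwork`, `train_step`, and `mean_error` are illustrative.

```python
import math
import random

# Fitting error per hidden-layer size, copied from Table 2.
FITTING_ERROR = {5: 0.1629, 6: 0.1371, 7: 0.1947, 8: 0.1738, 9: 0.1670,
                 10: 0.1057, 11: 0.1382, 12: 0.1493, 13: 0.1162,
                 14: 0.1427, 15: 0.1285, 16: 0.1626}
best_hidden = min(FITTING_ERROR, key=FITTING_ERROR.get)  # lowest fitting error


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row acts as the node's threshold (bias).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Signal forward propagation; returns hidden activations and output."""
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb)))
             for row in self.w_hidden]
        y_hat = sigmoid(sum(w * v for w, v in zip(self.w_out, h + [1.0])))
        return h, y_hat

    def train_step(self, x, y, lr):
        """Error backward propagation for one sample; returns its error."""
        h, y_hat = self.forward(x)
        # Derivative of 0.5 * (y_hat - y)^2 through the output sigmoid.
        delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)
        # Hidden deltas are computed before the output weights change.
        delta_hid = [delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                     for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        xb = list(x) + [1.0]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                row[i] -= lr * delta_hid[j] * v
        return 0.5 * (y_hat - y) ** 2


def mean_error(net, data):
    return sum(0.5 * (net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


# Synthetic toy data (NOT the paper's dataset): the label is 1 when at
# least five of the ten binary indices are set.
rng = random.Random(1)
toy = []
for _ in range(40):
    x = [rng.randint(0, 1) for _ in range(10)]
    toy.append((x, 1 if sum(x) >= 5 else 0))

net = BPNetwork(n_in=10, n_hidden=best_hidden)
error_before = mean_error(net, toy)
for _ in range(100):                      # epochs over the toy data
    for x, y in toy:
        net.train_step(x, y, lr=0.01)     # 0.01 is the paper's initial rate
error_after = mean_error(net, toy)
```

Selecting on fitting error alone gives 10 nodes, which agrees with the choice reported in the text. Comparing `error_before` with `error_after` shows whether training reduced the error on the toy data.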

Table 3
The testing result on 30 testing samples.

Prediction  Expectation  Label     Prediction  Expectation  Label     Prediction  Expectation  Label
0.8129      1            0         0.3280      0            0         0.6852      1            1
0.4382      0            1         0.8936      1            1         0.7372      1            1
0.8593      1            1         0.9027      1            1         0.2859      0            0
0.7736      1            1         0.8543      1            1         0.8173      1            1
0.6682      1            1         0.4692      0            0         0.6439      1            1
0.7328      1            1         0.6480      1            0         0.2782      0            0
0.5783      1            1         0.7363      1            1         0.8773      1            1
0.2730      0            0         0.6539      1            1         0.3957      0            0
0.8603      1            1         0.7301      1            1         0.2730      0            0
0.5705      1            1         0.6836      1            1         0.8485      1            1
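The rows of Table 3 can be checked with a short Python sketch. It assumes each cell reads as a (prediction, expectation, label) triple and that a cut-off of 0.5 turns a prediction into a class. Neither is stated in the paper, although the 0.5 cut-off is consistent with every listed row.

```python
# (prediction, expectation, label) triples copied from Table 3, row by row.
ROWS = [
    (0.8129, 1, 0), (0.3280, 0, 0), (0.6852, 1, 1),
    (0.4382, 0, 1), (0.8936, 1, 1), (0.7372, 1, 1),
    (0.8593, 1, 1), (0.9027, 1, 1), (0.2859, 0, 0),
    (0.7736, 1, 1), (0.8543, 1, 1), (0.8173, 1, 1),
    (0.6682, 1, 1), (0.4692, 0, 0), (0.6439, 1, 1),
    (0.7328, 1, 1), (0.6480, 1, 0), (0.2782, 0, 0),
    (0.5783, 1, 1), (0.7363, 1, 1), (0.8773, 1, 1),
    (0.2730, 0, 0), (0.6539, 1, 1), (0.3957, 0, 0),
    (0.8603, 1, 1), (0.7301, 1, 1), (0.2730, 0, 0),
    (0.5705, 1, 1), (0.6836, 1, 1), (0.8485, 1, 1),
]
THRESHOLD = 0.5  # assumption: cut-off between high-risk (0) and high-quality (1)

# Does thresholding the raw prediction reproduce the "Expectation" column?
thresholded = [1 if p >= THRESHOLD else 0 for p, _, _ in ROWS]
matches_expectation = all(t == e for t, (_, e, _) in zip(thresholded, ROWS))

# Share of listed samples whose expectation agrees with the label.
agreeing = sum(1 for _, e, lab in ROWS if e == lab)
agreement = agreeing / len(ROWS)
```

On these 30 rows the expectation and label agree in 27 cases. Table 3 shows only part of the 80 testing samples, so this excerpt need not reproduce the 93.3% accuracy reported for the test set.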
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] E. Lee, B. Lee, Herding behavior in online P2P lending: An empirical investigation, Electron. Commer. Res. Appl. 11 (5) (2012) 495–503.
[2] S.C. Berger, F. Gleisner, Emergence of financial intermediaries in electronic markets: The case of online P2P lending, BuR Busin. Res. J. 2 (1) (2009).
[3] R. Emekter, Y. Tu, B. Jirasakuldech, M. Lu, Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending, Appl. Econ. 47 (1) (2015) 54–70.
[4] Y. Guo, W. Zhou, C. Luo, C. Liu, H. Xiong, Instance-based credit risk assessment for investment decisions in P2P lending, Eur. J. Oper. Res. 249 (2) (2016) 417–426.
[5] R.A. Virrey, C.D.S. Liyanage, M.I.B.P.H. Petra, P.E. Abas, Visual data of facial expressions for automatic pain detection, J. Vis. Commun. Image Represent. 61 (2019) 209–217.
[6] H. Zhao, L. Wu, Q. Liu, Y. Ge, E. Chen, Investment recommendation in p2p lending: A portfolio perspective with risk management, in: 2014 IEEE International Conference on Data Mining, IEEE, 2014, pp. 1109–1114.
[7] I. Bezzine, M. Kaaniche, S. Boudjit, A. Beghdadi, Sparse optimization of non separable vector lifting scheme for stereo image coding, J. Vis. Commun. Image Represent. 57 (2018) 283–293.
[8] B. Luo, Z. Lin, A decision tree model for herd behavior and empirical evidence from the online P2P lending market, IseB 11 (1) (2013) 141–160.
[9] L. Liao, M. Li, Z. Wang, The intelligent investor: not-fully-marketized interest rate and risk identify: evidence from P2P lending, Econ. Res. J. 7 (2014) 125–137.
[10] C. Luo, H. Xiong, W. Zhou, Y. Guo, G. Deng, Enhancing investment decisions in P2P lending: an investor composition perspective, in: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2011, pp. 292–300.
[11] N. Karimi, M.R. Taban, Nonparametric blind SAR image super resolution based on combination of the compressive sensing and sparse priors, J. Vis. Commun. Image Represent. 55 (2018) 853–865.
[12] E. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Finan. 23 (4) (1968) 589–609.
[13] K.Y. Tam, Neural network models and the prediction of bank bankruptcy, Omega 19 (5) (1991) 429–445.
[14] S. Freedman, G.Z. Jin, Do social networks solve information problems for peer-to-peer lending? Evidence from Prosper.com (2008).
[15] M. Klafft, Online peer-to-peer lending: a lenders' perspective, in: Proceedings of the international conference on E-learning, E-business, enterprise information systems, and E-government, IEEE, July 2008, pp. 371–375.
[16] R. Pramanik, S. Bag, Shape decomposition-based handwritten compound character recognition for bangla ocr, J. Vis. Commun. Image Represent. 50 (2018) 123–134.
[17] J.A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, J. Account. Res. (1980) 109–131.
[18] T.H.T. Dinh, S. Kleimeier, A credit scoring model for Vietnam's retail banking market, Int. Rev. Finan. Anal. 16 (5) (2007) 471–495.
[19] U. Atz, D. Bholat, Peer-to-peer lending and financial innovation in the United Kingdom (No. 598), Bank of England, 2016.
[20] H. Wang, M. Greiner, J.E. Aronson, People-to-people lending: The emerging e-commerce transformation of a financial market, in: People-to-people lending: The emerging e-commerce transformation of a financial market, Springer, Berlin, Heidelberg, 2009, pp. 182–195.
[21] Y.E. Xiangrong, The risks of China's P2P lending models and related regulations, Finan. Regul. Res. 3 (2014) 006.
