Ijecr-Predicting Default For Japanese Smes With Robust

International Journal of Economics,
Commerce and Research (IJECR)

ISSN(P): 2250-0006; ISSN(E): 2319-4472
Vol. 6, Issue 3, Jun 2016, 27-34
TJPRC Pvt. Ltd
PREDICTING DEFAULT FOR JAPANESE SMES WITH ROBUST LOGISTIC REGRESSION
MICHIKO MIYAMOTO
Department of Management Science and Engineering, Akita Prefectural University, Japan
ABSTRACT
Using financial ratio data on nearly 1 million Japanese SMEs, collected by credit guarantee corporations as well as
government-affiliated and private financial institutions involved in SME business in 2010, this study investigates the
indicators needed to measure the credit risk of individual Japanese SMEs by robust logistic regression with the
Bianco and Yohai (BY) estimator. Bianco and Yohai (1996) proposed an estimator that is highly robust in the logistic
regression model; the BY estimator includes a bounded function and a bias correction term. After missing data are
filled in, robust logistic regression extends the conventional logistic model by taking outliers into account in
order to forecast defaulting firms.
KEYWORDS: Credit Risk Assessment, Robust Logistic Regression, SMEs, BY Estimator
Original Article
Received: Apr 22, 2016; Accepted: May 11, 2016; Published: May 23, 2016; Paper Id.: IJECRJUN201604
1. INTRODUCTION
Modeling credit risk specifically for small and medium-sized enterprises (SMEs) is considered important for
financial institutions. SMEs, which account for more than 99% of all businesses in Japan, rarely provide reliable
financial data, so they are informationally opaque. In particular, missing or incomplete data are present in nearly
every use of financial data (Kofman and Sharpe, 2003). Besides missingness, outliers could also seriously distort the
estimated results.
In this paper, the author predicts default probability while accounting for missingness and outliers in credit risk
modeling, using a large database of SMEs in Japan. This paper is organized as follows. Following the introduction in
Section 1, Section 2 reviews the literature on SME credit risk assessment. Section 3 presents the analytical method.
Section 4 describes the data and variables. Section 5 presents the results of the analysis. Finally, a summary of
the research results is given in Section 6.
2. LITERATURE REVIEW
Miyamoto (2013) analyzes the effect of missing data for a small regional bank in Japan and finds that the results
differ from those obtained with complete data only, that is, with incomplete cases omitted. The problem of outliers
has been discussed for over 150 years (Anscombe, 1960). Several methods have been proposed to deal with outliers:
use a non-linear formulation or apply a transformation (log, square root, etc.) to the data; remove suspected
observations; winsorize the data; use dummy variables; use LAD (quantile) regressions, which are less sensitive to
outliers; and weight observations by size of residuals or variance (robust estimation). Many theoretical efforts have
been devoted to developing statistical
procedures that are resistant to small deviations from the assumptions, i.e., robust with respect to outliers and stable
with respect to small deviations from the assumed parametric model, since 1960 (Bellio and Ventura, 2005). Robust
regression models were developed early on: least absolute deviation/values (LAD/LAV) regression, which minimizes |e|
instead of e^2, was proposed. More modern methods include M-estimation, Huber estimates, bisquare estimators, bounded
influence regression, least median of squares, and least trimmed squares. A general theory of robustness is developed
in Huber (1981) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986). Rousseeuw (1994) introduced several robust
regression estimators, including least median of squares (LMS) and least trimmed squares (LTS). The LMS regression
method is highly robust to outliers. For the robust logit model, several alternatives have been proposed, particularly
in terms of systematically downweighting observations (Bondell, 2008). These procedures were introduced by Pregibon
(1982), Copas (1988), Künsch et al. (1989), Carroll and Pederson (1993), and Bianco and Yohai (1996).
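To make the contrast between minimizing e^2 and minimizing |e| concrete, the following minimal Python sketch compares a least-squares location estimate (the mean), an LAD location estimate (the median), and a winsorized mean on a single ratio. The data values, the function name winsorize, and the nearest-rank quantile rule are illustrative assumptions and are not taken from the study.

```python
import statistics

def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clamp values outside two empirical quantiles (simple nearest-rank rule, assumed here)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

# Hypothetical financial ratio with one gross outlier (illustrative numbers only).
ratios = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.95, 1.05, 0.85, 50.0]

ls_location = statistics.mean(ratios)      # minimizes the sum of squared residuals e^2
lad_location = statistics.median(ratios)   # minimizes the sum of absolute residuals |e|
winsorized_mean = statistics.mean(winsorize(ratios, 0.1, 0.9))

print(f"least squares (mean): {ls_location:.3f}")
print(f"LAD (median):         {lad_location:.3f}")
print(f"winsorized mean:      {winsorized_mean:.3f}")
```

The single outlier pulls the mean far from the bulk of the data, while the median and the winsorized mean stay close to it.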
There is little literature on the robust logit model in credit risk assessment. Shen et al. (2010) predict default
probability with the consideration of outliers. They directly extended the logit estimation method by applying the
forward search method of Atkinson and Cheng (1990) and Atkinson and Riani (2001) to data on Taiwanese companies. They
found that the robust logit method is substantially superior to the logit method under all validation tools; however,
the superiority of robust logit is less pronounced for out-of-sample forecasts. Hauser and Booth (2011) compared the
classification and prediction of bankrupt firms by robust logistic regression with the Bianco and Yohai (BY) estimator
versus maximum likelihood (ML) logistic regression. They applied the methodologies to a sample of 24 non-financial U.S.
firms that filed for bankruptcy in 2008-2009 and a sample of 48 non-financial U.S. firms that did not file for
bankruptcy in the same period, using financial ratio data of those corporations from 2006 and 2007. They concluded that
BY robust logistic regression should be used as a robustness check on ML logistic regression and that, if any
difference exists, BY robust logistic regression should be used as the primary classifier of bankrupt firms. Miyamoto
(2015) compared general linear regression with three types of robust logit methods that take possible outliers into
account. Using financial ratio data on nearly 4,955 loans to entrepreneurs and small enterprises extended by one bank
located in a provincial city of Japan over the period 2002 to 2004, that study found that robust logistic regression
yields credit assessment results that differ from those of the general linear model. ROA, for example, is negative and
statistically highly significant only for the regular BY logistic regression, while it is not for the other methods.
This paper compares the results for the complete dataset with those obtained after filling in missing data. Then, an
alternative method given by the Bianco and Yohai (BY) estimator is used to estimate the credit assessment of SMEs in Japan.
3. LOGISTIC AND ROBUST LOGISTIC ESTIMATION

3.1 Logistic Estimation
Logistic regression is the traditionally preferred statistical technique for credit risk modeling. The model under
consideration regards the binary outcome y, such as default or no default, as a Bernoulli variable with probability function.
Impact Factor (JCC): 4.5976
Index Copernicus Value (ICV): 6.1
(1)
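A standard way to write a Bernoulli probability function with a logistic link, consistent with the description above, is sketched below. The notation (x_i for the vector of financial ratios of firm i, \beta for the coefficient vector, \pi_i for the default probability, F for the logistic distribution function, d for the deviance component) is assumed for illustration and may differ from the author's own notation.

```latex
P(y_i \mid x_i) = \pi_i^{\,y_i}\,(1-\pi_i)^{1-y_i}, \qquad y_i \in \{0,1\}, \qquad
\pi_i = F(x_i^{\top}\beta) = \frac{1}{1+\exp(-x_i^{\top}\beta)} .

% Maximum likelihood is then equivalent to minimizing the sum of deviance components:
\hat{\beta}_{ML} = \arg\min_{\beta} \sum_{i=1}^{n} d(x_i^{\top}\beta;\, y_i), \qquad
d(s;\, y) = -y \ln F(s) - (1-y) \ln\bigl(1-F(s)\bigr) .
```

Because d is unbounded, a single badly fitted observation can dominate this sum, which is the motivation for the robust variants discussed next.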
3.2 Robust Logistic Estimation
A generalization of equation (1) is shown in equation (2).
(2)
3.3 Robust Logistic Estimation by Bianco and Yohai (1996)

Bianco and Yohai (1996) constructed a consistent and more robust version of Pregibon's (1982) estimator by
working with a bounded function and defining
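The Bianco and Yohai construction is commonly stated as minimizing, over the coefficient vector, the sum of a bounded function of each observation's deviance plus a bias correction term. The Python sketch below evaluates such an objective under stated assumptions: the bounded function is rho(t) = t - t^2/(2c) for t <= c and c/2 otherwise, the correction is G(F) + G(1 - F) with G(u) equal to the integral of rho'(-ln v) from 0 to u, and the tuning constant c = 0.5 is a conventional choice. The function names, the data, and the coefficient values are hypothetical, and no optimizer is included, so this is not a full estimator.

```python
import math

def logistic(s):
    """Numerically stable logistic distribution function F(s)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

def softplus(z):
    """ln(1 + exp(z)) computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def deviance(s, y):
    """d(s; y) = -y ln F(s) - (1 - y) ln(1 - F(s)) for y in {0, 1}."""
    return softplus(-s) if y == 1 else softplus(s)

def rho(t, c=0.5):
    """Bounded function (assumed form): t - t^2/(2c) for t <= c, else c/2."""
    return t - t * t / (2.0 * c) if t <= c else c / 2.0

def bias_term(u, c=0.5):
    """G(u) = integral from 0 to u of rho'(-ln v) dv, in closed form for the rho above."""
    threshold = math.exp(-c)
    if u <= threshold:
        return 0.0  # rho'(-ln v) vanishes for v <= exp(-c)
    return u + (u * math.log(u) - u) / c + threshold / c

def linear_predictor(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def by_objective(beta, X, y, c=0.5):
    """Sum of the bounded deviance term and the bias correction over all observations."""
    total = 0.0
    for xi, yi in zip(X, y):
        s = linear_predictor(beta, xi)
        p = logistic(s)
        total += rho(deviance(s, yi), c) + bias_term(p, c) + bias_term(1.0 - p, c)
    return total

# Hypothetical data: an intercept plus one ratio; the third firm is a gross outlier.
X = [[1.0, 0.2], [1.0, -0.4], [1.0, 8.0]]
y = [1, 0, 0]
beta = [0.0, 1.5]

ml_loss = sum(deviance(linear_predictor(beta, xi), yi) for xi, yi in zip(X, y))
by_loss = by_objective(beta, X, y)
print(f"sum of deviances (ML criterion): {ml_loss:.3f}")
print(f"bounded objective (BY-type):     {by_loss:.3f}")
```

In this toy example the outlying firm contributes a deviance of about 12 to the maximum likelihood criterion, whereas its contribution to the bounded objective cannot exceed c/2 plus the correction term.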
4. DATA
In this paper, the author analyzes credit risk assessment using a financial dataset consisting of nearly 1 million
Japanese SMEs, collected by credit guarantee corporations as well as government-affiliated or private financial
institutions involved in SME business in 2010. The data provide information on each firm's balance sheet, income
statement, and default status. Table 1 describes the variables used in this study. Definitions of SMEs vary among
countries. SMEs in Japan are defined under Article 2, Paragraph 1 of the Small and Medium-sized Enterprise Basic Act,
and the term "small enterprises" is defined under Article 2, Paragraph 5 of said act. According to these definitions,
most companies in this research are considered SMEs.
The data plotted in Figure 1 show that there are some outliers even after filling in missing values.
There are two approaches to default modelling: 1) discriminant analysis, and 2) probit and logit analysis.
Discriminant analysis, like probit and logit analysis, is used to determine which variables discriminate between two
(or more) naturally occurring groups, for example, default and no default.
(3)
(4)
Table 1: A List of Independent Variables
Figure 1: Plots of Dataset after Filling Missing Values
5. RESULTS
5.1 Comparison of Complete Data, Multiple Imputations, and BY-Estimator
The first logit analysis uses only those cases with complete information. This is the most common and easiest
approach: researchers, either consciously or by default in a statistical analysis, drop informants who do not have
complete data on the variables of interest (Pigott, 2001). However, results of such analyses could be biased. Multiple
imputation is a general approach to the problem of missing data (King et al., 2001). It aims to allow for the
uncertainty about the missing data by creating several different plausible imputed datasets and appropriately
combining the results obtained from each of them. The author uses the notation proposed by Little and Rubin (2002) to
analyze data that are assumed to be missing at random (MAR). Results of the logit analysis using the complete dataset,
the multiple imputed dataset, and robust regression using the BY estimator with the multiple imputed dataset are shown
in Table 2.
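The step of combining results across imputed datasets is usually carried out with Rubin's rules: the pooled coefficient is the average of the per-dataset estimates, and its variance adds the average within-imputation variance to an inflated between-imputation variance. The study does not spell out its combining rule, so the following Python sketch is an assumption, and the coefficient and standard error values are hypothetical.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their sampling variances with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical logit coefficient for one ratio, estimated on m = 5 imputed datasets.
coefs = [0.52, 0.47, 0.55, 0.50, 0.46]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.11]

pooled_coef, pooled_var = pool_rubin(coefs, [se ** 2 for se in std_errors])
pooled_se = pooled_var ** 0.5
print(f"pooled coefficient: {pooled_coef:.3f}, pooled standard error: {pooled_se:.3f}")
```

The pooled standard error is larger than the typical single-dataset standard error because it also reflects the disagreement between imputations, which is how the uncertainty about the missing values enters the inference.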
Table 2: Complete Data, Multiple Imputation, and BY Logistic Regression
The signs of some coefficients are reversed between the complete data, the multiple imputed data, and the robust
multiple imputed data. Sales (log), retained earnings/total assets, and ln(current assets/current liabilities) are
negative and insignificant for the complete data, while they are positive and significant for the multiple imputed data
as well as the robust multiple imputed data. For the complete data, net income/sales is positive and insignificant and
EBIT/interest expense is positive and significant, while both are negative and significant for the multiple imputed
data as well as the robust multiple imputed data. ROA is positive but insignificant for the complete data, whereas it
is positive and significant for the multiple imputed data and the robust multiple imputed data. Operating profit/total
assets is negative and insignificant for the complete data, but negative and significant for the latter two datasets.
Total debt/total assets is positive and significant for all datasets. Cash equivalent/total assets, quick
assets/current assets, and fixed liabilities/total debt are negative and significant for all datasets.
5.2 Credit Risk Model Validation
ROC (Receiver Operating Characteristic) and CAP (Cumulative Accuracy Profile) analyses are two ways of
evaluating a credit risk model. Both ROC and CAP analyses provide satisfactory assessments of the accuracy of
credit ratings (Irwin and Irwin, 2012). The key idea underlying ROC and CAP analysis is that diagnosis involves a
trade-off between default and non-default (that is, between true and false positives) and that this trade-off varies
with the stringency of the threshold used to decide whether an alarm is sounded. Financial analysts have used ROC
analysis to assess credit-rating systems and indicators of financial crisis (e.g., Basel Committee on Banking
Supervision, 2005; Engelmann, Hayden, and Tasche, 2003; Sobehart and Keenan, 2001; Van Gool, Verbeke, Sercu, and
Baesens, 2011).
Details on how to construct a ROC curve and on other validation methods are given in BCBS Working Paper No. 14
(2005). A rating model's performance is better the steeper the ROC curve is at the left end and the closer the ROC
curve's position is to the point (0,1). Similarly, the larger the area under the ROC curve, the better the model
(BCBS, 2005). The ROC curve for small enterprises is shown in Figure 1, and that for individual entrepreneurs is shown
in Figure 2.
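The construction of a ROC curve and the area under it can be sketched in a few lines of Python. The sketch sweeps a cut-off from the riskiest score downward and records the false alarm rate and the hit rate at each step; the scores, the default flags, and the function names are hypothetical, and the trapezoidal rule is one common way to compute the area.

```python
def roc_points(scores, defaults):
    """Return ROC points (false alarm rate, hit rate); a higher score means a riskier firm.

    defaults holds 1 for a defaulted firm and 0 otherwise. Tied scores are moved together.
    """
    n_bad = sum(defaults)
    n_good = len(defaults) - n_bad
    ranked = sorted(zip(scores, defaults), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    hits = false_alarms = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            hits += ranked[j][1]
            false_alarms += 1 - ranked[j][1]
            j += 1
        points.append((false_alarms / n_good, hits / n_bad))
        i = j
    return points

def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical model scores and observed defaults for eight firms.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
defaults = [1, 1, 0, 1, 0, 0, 0, 0]

curve = roc_points(scores, defaults)
area = area_under_curve(curve)
print(f"area under the ROC curve: {area:.4f}")
```

In this toy example one non-defaulting firm is ranked above one defaulting firm, so the curve falls just short of the point (0,1) and the area is slightly below 1.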
The Accuracy Ratio (AR; exactly equivalent to the Gini coefficient) measures the trade-off between the selection rate
of goods and that of bads. If the score model is random, at any given cut-off the proportion of goods passing the
cut-off will be the same as the proportion of bads. This would give an AR of 0%. On the other hand, with a perfect
scorecard, it would be possible to select all of the goods (100%) and none of the bads (0%). The resulting AR would be
100%. Practical experience shows that the AR tends to take values in the range of 50% to 80%. In this study, the AR is
0.739 for the complete data, 0.753 for multiple imputation, and 0.7454 for the BY estimator. The result for multiple
imputation, which includes outliers, has the highest AR; however, outliers may distort the estimation result and
decrease its accuracy.
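The AR is linked to the area under the ROC curve (AUC) by a simple identity (Engelmann, Hayden, and Tasche, 2003). As a worked check under that identity, the AR values reported above correspond to the AUC values below; the AUC figures are derived here for illustration and are not reported in the study.

```latex
AR = 2\,AUC - 1 \quad\Longleftrightarrow\quad AUC = \frac{AR + 1}{2}

\text{Complete data:} \quad AUC = \frac{0.739 + 1}{2} = 0.8695

\text{Multiple imputation:} \quad AUC = \frac{0.753 + 1}{2} = 0.8765

\text{BY estimator:} \quad AUC = \frac{0.7454 + 1}{2} = 0.8727
```

On either scale, the three models differ by roughly one percentage point or less.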
6. CONCLUSIONS
The purpose of this study is to investigate the indicators needed for credit risk measurement for SMEs in Japan,
using robust logistic regression with the Bianco and Yohai (BY) estimator, which takes possible outliers into account.
The result from the BY estimator is compared with those of traditional logistic regression on the complete data and on
the multiple imputed data. The analyses in this study show that removing such outliers might improve the accuracy of
the analyses.
Figure 2: ROC Curve

7. ACKNOWLEDGMENTS
This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C)
24530355 from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.
8. REFERENCES
1. Anscombe, F. J. & Guttman, I. (1960) Rejection of outliers. Technometrics, 2: 123-147.
2. Basel Committee on Banking Supervision. (2005, May). Studies on the Validation of Internal Rating Systems, BIS Working
Paper No. 14. (Basel, Switzerland: Bank for International Settlements).
3. Bellio, & Ventura, (2005) An Introduction to Robust Estimation with functions; online:
http://www.dst.unive.it/rsr/BelVenTutorial.pdf.
4. Bondell, H. D. (2008). A characteristic function approach to the biased sampling model, with application to robust logistic
regression. Journal of Statistical Planning and Inference 138(3): 742-755.
5. Carroll, R. J. and Pederson, S. (1993) On robust estimation in the logistic regression model. J. R. Statist. Soc. B, 55, 693-706.
6. Copas, J. B. (1988) Binary regression models for contaminated data (with discussion). J. R. Statist. Soc. B, 50, 225-265.
7. Engelmann, B., Hayden, E. and Tasche, D. (2003), Testing Rating Accuracy, Credit Risk, Vol. 16, January, pp. 82-86.
8. Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence
Functions. New York: John Wiley.
9. Huber, P. J. (1981). Robust Statistics. New York: John Wiley.
10. Irwin, J. R. and Irwin, C. T. (2012). Appraising Credit Ratings: Does the CAP Fit Better than the ROC?. [IMF Working paper
12/122]. Retrieved February 27, 2014 from http://www.imf.org/external/pubs/ft/wp/2012/wp12122.pdf.
11. King, G., Honaker, J., Joseph, A. and Scheve, K. (2001). Analyzing Incomplete Political Science Data: An Alternative
Algorithm for Multiple Imputation. The American Political Science Review 95(1): pp. 49-69.
12. Kofman, P. & Sharpe, I. (2003) Using multiple imputation in the analysis of incomplete observations in finance, Journal of
Financial Econometrics 1, pp. 216-249.
13. Künsch, H. R., Stefanski, L. A. and Carroll, R. J. (1989) Conditionally unbiased bounded influence estimation in general
regression models, with applications to generalized linear models. J. Am. Statist. Assoc., 84, 460-466.
14. Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. New York: Wiley.
15. Miyamoto, M. (2013) Analyzing Effects of Missing Data in Credit Risk Assessment of a Small Bank, Proceedings of 2013
International Conference on Business and Social Sciences (ICBASS), pp. 522-528.
16. Miyamoto, M. (2015) Predicting Default for a Small Bank with Robust Logistic Regression. Proceedings of 2016 Seoul
International Conference on Social Sciences and Management, forthcoming.
17. Pigott, T. D. (2001) A Review of Methods for Missing Data, Educational Research and Evaluation, vol. 7, no. 4, pp. 353-383.
18. Pregibon, D. (1982) Resistant Fits for Some Commonly Used Logistic Models with Medical Applications, Biometrics, 38(2),
pp. 485-498.
19. Powers, J. M., & Cookson, P. W. Jr. (1999). The politics of school choice research. Educational Policy, 13(1), 104-122.
doi:10.1177/0895904899131009
20. Sobehart, J., and Keenan, S. (2001). Measuring Default Accurately. Credit Risk Special Report, Risk, pp. S31-S33.
21. Van Gool, J., Verbeke, W., Sercu, P. and Baesens, B. (2011). Credit scoring for microfinance: is it worth it? International
Journal of Finance and Economics, Vol. 17, No. 2, pp. 103-123.
