
Knowledge-Based Systems 89 (2015) 459–470

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Detecting the financial statement fraud: The analysis of the differences between data mining techniques and experts’ judgments
Chi-Chen Lin a,1, An-An Chiu b,2, Shaio Yan Huang c,3, David C. Yen d,⇑
a Department of Accounting Information, National Taipei University of Business, No. 321, Sec. 1, Jinan Rd., Zhongzheng, Taipei, Taiwan
b Department of International Trade, Feng Chia University, No. 100, Wenhwa Rd., Seatwen, Taichung 40724, Taiwan
c Department of Accounting and Information Technology, National Chung Cheng University, 168 University Rd., Min-Hsiung, Chia-Yi 62102, Taiwan
d School of Economics and Business, 226 Netzer Administration Bldg., SUNY College at Oneonta, Oneonta, NY 13820, United States

Article info

Article history:
Received 4 November 2014
Received in revised form 14 August 2015
Accepted 18 August 2015
Available online 24 August 2015

Keywords:
Fraud factor
Fraud triangle
Data mining

Abstract

The objective of this study is to examine all aspects of the fraud triangle using data mining techniques, and to employ available public information as proxy variables to evaluate such attributes as pressure/incentive, opportunity, and attitude/rationalization, based on the findings of prior studies in this subject field and on the Statement on Auditing Standards. The second objective is to discuss whether or not the suggestions of the experts agree with the results obtained from adopting those novel techniques. Specifically, this study uses both expert questionnaires and data mining techniques to sort out the different fraud factors and then rank their importance. The data mining methods employed in this research include Logistic Regression, Decision Trees (CART), and Artificial Neural Networks (ANNs). Empirically, the ANNs and CART approaches achieve correct classification rates of 91.2% (ANNs) and 90.4% (CART) on the training samples and 92.8% (ANNs) and 90.3% (CART) on the testing samples, which is more accurate than the logistic model, which reaches only 83.7% and 88.5%, respectively, in assessing the fraud presence. In addition, the type II error of ANNs drops significantly to 23.9%, compared with 43.3% and 27.8% for the CART and logistic models. Finally, the differences between the data mining tools and the expert judgments are also compared to provide more insights as a research contribution.
© 2015 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +1 607 436 3458 (office); fax: +1 607 436 2543.
E-mail addresses: c97ve47@yahoo.com.tw (C.-C. Lin), aachiu@fcuoa.fcu.edu.tw (A.-A. Chiu), actsyh@yahoo.com.tw (S.Y. Huang), David.Yen@oneonta.edu (D.C. Yen).
1 Tel.: +886 2 23226362.
2 Tel.: +886 4 24517250x4076.
3 Tel.: +886 5 2720411x34501; fax: +886 5 2721197.

http://dx.doi.org/10.1016/j.knosys.2015.08.011
0950-7051/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

After the occurrence of several major scandals (e.g., Enron Corp., Tyco, and WorldCom Inc.), the loss of market capitalization resulting from the reported financial statement fraud is estimated to be about $460 billion [39]. In 2014, the Association of Certified Fraud Examiners (ACFE) reported that U.S. organizations lose almost 5 percent of their revenue to fraud, and that the Gross Domestic Product (GDP) based annual fraud estimate for the U.S. alone is around $3.7 trillion (ACFE, 2014). Sorkin [41] reported that 343 criminals and 189 civil defendants were involved in fraudulent activities that harmed more than 120,000 victims, with a total value of more than $8 billion, in recent years in the United States. Financial fraud is becoming an increasingly serious problem and, as a result, effectively detecting accounting fraud has always been an important but rather complex task for accounting professionals [29,13,37,34]. Examining financial fraud is in fact one of the hot issues, given that the economic and social fallout from fraud can be massive [22]. After the AICPA issued SAS No. 82, a greater responsibility was imposed on auditors to detect fraud in general, and to deal with the effective management of fraud in particular. However, this standard did not provide more specific and objective guidelines. Following the issuance of SAS No. 99 and the Sarbanes–Oxley Act, preventing fraud through more rigorous internal control oversight became a major focus, and it has stimulated and inspired numerous academic studies [42,33,12,18] in this subject area.

A prolific area of prior research has focused on using different tools and techniques to improve fraud detection, such as analytical procedures, ratio analysis, regression analysis, score propagation over an auction network (SPAN), and checklists [16,19,48]. However, the previous studies may result in too many fraud risk factors to identify the importance of each fraud
factor. Nevertheless, identifying and ranking the importance of the fraud risk factors becomes a critical issue, since limited budgets are always one of the main concerns encountered by today’s businesses. This paper tries to rank the importance of the fraud factors in order to meet the aforementioned challenges of limited budgets and restricted resources. Ranking the importance of the fraud factors may provide a significant advantage to auditors and managers in enhancing the efficiency of fraud detection and critical evaluation.

Nowadays, auditing practices have to be conducted in a timely manner to cope with the increasing number of financial statement fraud cases. Novel techniques such as data mining are claimed to have advanced classification and prediction capabilities and can be employed to facilitate the auditors’ role in successfully accomplishing the task of fraud detection. There has been a limited use of data mining techniques for the detection of financial statement fraud [38]. Data mining plays an important role in financial fraud detection, as it is often applied to extract and uncover the hidden truths behind very large quantities of data [29]. Lin et al. [23] conducted an expert questionnaire survey to evaluate the fraud factors using Lawshe’s approach. The result of this expert questionnaire shows that 32 factors can be regarded as suitable measurements for the continuing assessment of fraud detection. Following the study of Lin et al. [23], the first objective of this study is to use different tools and techniques, such as the logistic regression model and data mining, to examine the ranking of the fraud factors and to test the effectiveness of the fraud detection tools using published financial data.

Furthermore, Lin et al. [23] listed the top five fraud factors as, in order, “Poor performance”, “The need for external financing”, “Financial distress”, “Insufficient board oversight”, and “Competition or market saturation”. However, the judgments of the experts were made merely according to their own experience and specialized knowledge. To resolve this limitation, the second objective of this paper is to discuss whether or not the suggestions of the experts agree with the results obtained from adopting those novel techniques, such as the logistic regression model and data mining. The authors hope to use these techniques to verify the judgments of the experts against the real financial situation. In addition, most previous studies tend to use surveys or subjective measurements to identify the fraud factors; as a result, the data sets are unavailable to other researchers or users, and it is difficult to perform empirical research to verify their correctness. To bridge this gap, this study uses public information as proxy variable measurements, so that the results are available to other researchers or users for public scrutiny [24].

2. Literature review

2.1. Fraud triangle

The fraud triangle theory was developed by Cressey [10] and has been widely used by professionals as a useful theoretical model to explain why most frauds occur. This theory posits that fraud is likely to occur because of the availability of one or more of the three elements of the fraud triangle (pressure, opportunity, or rationalization) [1]. Sixty percent of all fraud incidents involved an insider [36]. Srivastava et al. [44] indicated that in the accounting profession, there has been increased attention on the responsibility of auditors to adequately assess the risk of fraudulent financial reporting. In fact, the fraud risk factors in the newest fraud standards (e.g., SAS 99, ISA 240, ASA 240, and/or TSAS 43) are all based on “the fraud triangle”. Understanding the fraud triangle is essential to evaluating financial fraud [38]. The fraud triangle describes the probability of financial reporting fraud, which depends on three factors: incentives/pressures, opportunities, and attitudes/rationalization of financial statement fraud. Gozman and Currie [14] suggested that the potential for fraud is increased where there are incentives, often in the form of the need to meet targets or hide losses. Management will face incentives or pressures to resort to fraudulent practice. Opportunity exists when, for example, the absence of controls or ineffective controls provides an opportunity for fraud to be perpetrated. Rationalization depends on the individuals and the circumstances they are facing, and occurs when the perpetrator constructs a justification for the fraud.

2.2. Experts’ decision

Because of limited budgets, how to identify the fraud factors and rank their importance becomes a critical issue. Prior research determines the relative importance of fraud factors by using the Analytic Hierarchy Process (AHP) to determine the relative weighting of each individual item. Apostolou and Hassell [2] used the decisions of experts, such as Big 5 auditors, internal auditors, and accounting academics, through AHP to determine the relative importance of the 14 fraud risk factors identified in SAS No. 53. Further, Apostolou et al. [3] presented the 25 red flags identified in SAS No. 82. They used the experts’ decision technique to assess the relative importance of the fraud factors in three factor groups: management characteristics and influence over the control environment, industry conditions, and operating and financial stability characteristics. Mock and Turner [27] also examined the response to SAS No. 82 from three large international audit firms. Of the three audit firms examined, they found that two attempted to reach an assessment through some form of formal scoring system. Fraud risk factors in the newest fraud standards (SAS 99, ISA 240, ASA 240, and TSAS 43) are all based on “the fraud triangle”. Lin et al. [23] used Lawshe’s approach, and 32 factors were considered by experts to be suitable measurements for the continuing assessment of fraud detection. The same study further adopted AHP to calculate the weightings of the individual measurement items and to rank the importance of the factors for the three aspects of the fraud triangle. Their research indicated that, among the fraud triangle dimensions, the highest weight is “Pressure/Incentive”, the next is “Opportunity”, and the lowest is “Attitude/rationalization”. In terms of the categories within each dimension, the top five most important factors are, in order, “Poor performance”, “The need for external financing”, “Financial distress”, “Insufficient board oversight”, and “Competition or market saturation”. Specifically, 11 of the 32 factors belong to the pressure/incentive dimension, 15 belong to the opportunity dimension, and the remaining 6 belong to the attitude/rationalization dimension.

Experts make judgments according to their work experience and professional knowledge. The results of the experts’ decisions on the relative importance of fraud risk factors might differ from the real situation. To bridge this gap, this research analyzes these differences through data mining techniques [24].

2.3. Detecting tools

Traditional analytical review, which mainly involves ratio analysis, has had rather limited success in identifying fraud. One of the problems with ratio analysis is the subjectivity involved in identifying the ratios that are likely to indicate fraud [16,17]. The study of Nigrini and Mittermaier [30] discussed various analytical procedures which
auditors can employ during the planning stage. Data mining is deemed one of the possible approaches and also a critical vehicle for extracting implicit, previously unknown, and potentially valuable knowledge. In addition, data mining is widely recognized as an iterative process within which progress is defined by the discovery of various relationships, through either automatic or manual methods. In this subject area, there has been limited use of data mining techniques so far to detect financial statement fraud. The data mining techniques used include logistic regression, neural networks, decision trees, and text mining. These techniques are reviewed in the following paragraphs [38].

Based on data mining techniques, a response model can be built as a decision model for the prediction or classification of a domain problem, potentially like expert systems [25]. Binary logistic regression is applied to generate dichotomous prediction models [20]. Moreover, binary logistic regression is deemed appropriate because it provides significance tests on the parameter estimates and enables researchers to generate a probability of fraud for each firm and to examine the classification accuracy. This study uses a logistic regression model to test the relationship between the fraud risk factors identified by the expert questionnaires and the likelihood of fraud being committed.

Neural networks constitute one of the most widely used techniques in data mining [9]. Chen et al. [8] showed empirically that a neural network not only provides promising prediction accuracy, but may also have better detecting power and a lower misclassification cost compared with a logit model and auditor judgments. The same article suggests that an artificial intelligence technique performs quite well in identifying the presence of a fraud lawsuit, and hence it can be a supportive tool for practitioners.

Sugumaran et al. [45] indicated that decision trees are also one of the popular methods for feature selection. Decision trees are constructed from many nodes and branches at different stages and with various conditions. They are multistage decision systems in which classes are sequentially rejected until an accepted class is finally reached. To this end, the feature space is split into unique regions, corresponding to the classes, in a sequential manner [46]. Finally, Magnusson et al. [26] used text mining and demonstrated that the language of quarterly reports provided an indication of the change in the company’s financial status.

Probabilistic-model-based fraud detection methods are mainly utilized in detecting financial fraud. For example, Srivastava et al. [43] modeled the sequence of operations in credit card transaction processing using a hidden Markov model (HMM) and showed how it can be used for the detection of fraud. Further, Xing and Girolami [49] employed Latent Dirichlet Allocation (LDA) to build user profile signatures, assuming that any significant, unexplainable deviation from the normal activity of an individual user may be strongly correlated with fraudulent activity [31].

It is notable that the traditional solutions applied to stock market security may not be sufficient for identifying the attackers, and further the attack plans, from the analysis of existing events. Didimo et al. [11] presented a new system, VISFAN, for the visual analysis of financial activity networks. It provides analysts with an effective tool to discover potential financial crimes such as money laundering and fraud. Moreover, Olszewski [32] proposed a fraud detection method based on user account visualization to conduct threshold-type detection.

3. Research methodology

This study applies the three most representative financial fraud detection tools, which are addressed as follows.

Table 1
Result of expert questionnaires (Lin et al. [23]).

Dimension                 Factor
Pressure/incentive        Meet analysts’ forecasts
                          The need for external financing_Taiwan Corporate Credit Risk Index
                          The need for external financing_debt/equity ratio
                          The need for external financing_(cash from operations – mean capital expenditures)/current assets
                          Poor performance_return on assets
                          Poor performance_return on equity
                          Poor performance_at least two annual net losses
                          Poor performance_at least two annual negative cash flows from operations
                          Financial distress_going concern opinion
                          Financial distress_the higher probability of bankruptcy
                          Competition or market saturation_higher growth
Opportunity               Related party transaction_percentage of sales-related party transaction
                          Insufficient board oversight_consecutive changes in insider holdings
                          Related party transaction_percentage of purchases-related party transaction
                          Related party transaction_the ratio of the assurance of related-party
                          Complex transactions_equity investment ratio
                          Complex transactions_foreign re-investment
                          Complex transactions_the percentage of sales which are foreign
                          Insufficient board oversight_family firm
                          Insufficient board oversight_deviation between voting and cash flow rights
                          Insufficient board oversight_percentage of management ownership
                          Insufficient board oversight_percentage of blockholders ownership
                          Internal control environment_internal control statement about significant deficiencies
                          Internal control environment_turnover frequency of internal auditor
                          Internal control environment_firm age
                          Other opportunity_audited by a Big 4
Attitude/rationalization  Ethic_Historical restate frequency
                          Earning manipulation
                          Top management turnover_CEO turnover frequency
                          Top management turnover_CFO turnover frequency
                          CPA turnover frequency
                          Non-separating ownership and control_CEO duality
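
The factors in Table 1 were screened by Lin et al. [23] using Lawshe’s approach. As a minimal sketch of the content validity ratio on which that approach is conventionally based, CVR = (n_e – N/2)/(N/2), where n_e is the number of experts who rate an item as essential and N is the panel size, the calculation might look as follows. The function name, the panel size, the item names, and the vote counts are hypothetical and are not taken from the study.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: -1 when no expert rates the item essential, +1 when all do."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 20 experts rating three hypothetical candidate items.
votes = {"item_a": 18, "item_b": 12, "item_c": 10}
cvr = {name: content_validity_ratio(k, 20) for name, k in votes.items()}
```

An item would then be retained when its CVR exceeds the critical value for the given panel size; that cutoff is not reproduced here.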

(1) Logistic regression model

This study uses a logistic regression model to test the relationship between the fraud risk factors identified by the expert questionnaires and the likelihood of fraud being committed. Stepwise regression is a common traditional statistical technique used to perform feature selection [40,47]. To select important variables from a given large set of features, it starts by selecting the best predictors of the dependent variable. Therefore, this study uses stepwise regression with the logistic model to obtain the factors with a significant effect.

(2) Artificial Neural Networks (ANNs)

The ANNs methodology can be employed to design useful non-linear systems that accept large numbers of inputs, with the design based solely on instances of the input–output relationships. The neural network gains knowledge about the transformation to be performed by iteratively learning from a sufficient training set of samples, or input–output training pairs. A well-trained network can perform the transformation correctly and also possesses some generalization capability [28].

The back-propagation (BP) training algorithm is one of the main algorithms for training feed-forward artificial neural networks (ANNs). In addition, back-propagation can be regarded as one of the most well-known and commonly used methods. It is categorized as a supervised learning model. The purpose of back-propagation training is to obtain the weight of each edge so as to minimize the sum of squared errors between the actual value and the predicted value [8]. Unlike other learning algorithms, the back-propagation algorithm learns and adjusts the weights in a backward manner, which simply means that the error at the output is propagated back toward the input to update the weights. For this reason, the back-propagation algorithm is considered suitable for this study.

(3) Decision tree

A decision tree is a predictive model with a hierarchical or tree structure. It is used mostly in the area of classification and prediction. The construction of decision-tree-based classifiers does not require any domain knowledge or parameter setting, and therefore it is appropriate for exploratory knowledge discovery [15]. The main advantage of decision trees is that the technique provides a meaningful way of representing acquired knowledge and hence makes it easy to extract IF–THEN classification rules. Breiman et al. [7] developed the decision tree algorithm “classification and regression trees (CART)”, a non-parametric statistical method that constructs a decision tree by selecting, from a large number of explanatory variables, those that are most critical in determining the response variable [21]. CART is a single procedure that can be used to analyze either categorical or continuous data using the same tree technology.
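
The splitting step that a CART-style classification tree repeats at each node can be sketched as follows. This is a minimal illustration, assuming the Gini impurity criterion that is commonly used for classification trees and a single numeric feature; the function names and the toy data are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split x <= threshold.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, gini(labels)) when no split lowers the impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (low + high) / 2, score
    return best_threshold, best_score


# Toy data: a hypothetical debt/equity ratio and a fraud label (1 = fraud).
de_ratio = [0.4, 0.7, 0.9, 1.1, 2.3, 3.0]
fraud = [0, 0, 0, 0, 1, 1]
threshold, impurity = best_split(de_ratio, fraud)
```

Each split of this kind corresponds to one IF–THEN rule of the form "IF feature <= threshold THEN go left"; a full tree applies the same search recursively to the two resulting subsets.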

Table 2
Summary of fraud factors.

Dimension                 Factor                Definition
Pressure/incentive        P1_MAF                Latest analysts’ forecast of earnings per share in the event year minus the company’s realized earnings per share before fraud
                          P2.1_TCRI             Taiwan Corporate Credit Risk Index
                          P2.2_DEratio          Debt/equity ratio
                          P2.3_NEF              (Cash from operations – mean capital expenditures)/current assets
                          P3.1_ROA              After-tax return on assets
                          P3.2_ROE              Return on equity
                          P3.3_2Netloss         Dummy variable with a value of 1 if the firm had at least two annual net losses before the event year, otherwise 0
                          P3.4_2NCFO            Dummy variable with a value of 1 if the firm had at least two annual negative cash flows from operations before the event year, otherwise 0
                          P4.1_ZFC              Estimated by the Zmijewski [50] prediction model
                          P4.2_GC               Dummy variable with a value of 1 if the company received a going concern opinion before the event year, otherwise 0
                          P5_Hgrowth            The difference in sales growth rate before the event year
Opportunity               O1.1_RTPsales         Percentage of sales-related party transaction
                          O1.2_RPTpurchase      Percentage of purchases-related party transaction
                          O1.3_RPTassurance     The ratio of the assurance of related-party
                          O2.1_EIratio          Equity investment ratio = total equity investment to total stockholders’ equity
                          O2.2_Reinvestment     The percentage of the firm’s re-investment
                          O2.3_Fsales           The percentage of sales which are foreign
                          O3.1_InsiderHchange   Insider holdings two years before the event year – insider holdings one year before the event year
                          O3.2_Family           Dummy variable with a value of 1 if the company is a family firm, otherwise 0
                          O3.3_Devation         Deviation between voting and cash flow rights
                          O3.4_ManagOwn         Percentage of management ownership
                          O3.5_BlockOwn         Percentage of blockholders ownership
                          O4.1_ICdeficiency     Dummy variable with a value of 1 if the company’s internal control statement reports significant deficiencies, otherwise 0
                          O4.2_InAuditorTF      Number of internal auditor switches in the past three years
                          O4.3_FirmAge          Time from establishment registration to the event year
                          O5_Big4               Dummy variable with a value of 1 if the firm is audited by a Big 4 audit firm, otherwise 0
Attitude/rationalization  A1_EM                 Performance-matched discretionary accruals
                          A2_CEOtunoverF        Number of CEO switches in the past three years
                          A3_CFOturnoverF       Number of CFO switches in the past three years
                          A4_CPAturnoverF       Number of CPA switches in the past three years
                          A5_CEOduality         Dummy variable with a value of 1 if the chairperson of the board holds the managerial position of CEO or president, otherwise 0
                          A6_Hrestatement       Number of earnings-affected restatements in the two years before the event year
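
Several of the definitions in Table 2 are simple arithmetic on published figures. The sketch below computes P2.2_DEratio, P2.3_NEF, and P3.3_2Netloss for one firm; the function names, argument names, and sample numbers are hypothetical, and treating "mean capital expenditures" as a plain average over the years supplied is an assumption rather than something the table specifies.

```python
def de_ratio(total_debt: float, total_equity: float) -> float:
    """P2.2_DEratio: debt/equity ratio."""
    return total_debt / total_equity


def nef(cash_from_ops: float, capex_history: list, current_assets: float) -> float:
    """P2.3_NEF: (cash from operations - mean capital expenditures) / current assets.

    Assumption: the mean is a simple average of the yearly values supplied.
    """
    mean_capex = sum(capex_history) / len(capex_history)
    return (cash_from_ops - mean_capex) / current_assets


def two_net_losses(net_incomes_before_event: list) -> int:
    """P3.3_2Netloss: 1 if there were at least two annual net losses, otherwise 0."""
    losses = sum(1 for income in net_incomes_before_event if income < 0)
    return 1 if losses >= 2 else 0


# Hypothetical firm; all monetary figures are in the same (arbitrary) unit.
example = {
    "P2.2_DEratio": de_ratio(300.0, 150.0),
    "P2.3_NEF": nef(40.0, [20.0, 30.0, 10.0], 100.0),
    "P3.3_2Netloss": two_net_losses([-5.0, 12.0, -3.0]),
}
```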
Table 3
Descriptive statistics for risk factors from the fraud triangle.
Fraud column: 1 = fraud firms (N = 129), 0 = non-fraud firms (N = 447).

Risk factor           Fraud  Mean     Median   Standard error  Wilcoxon  p value
Pressure/incentive
P1_MAF                1      0.789    0.640    7.791           2.812     0.005
                      0      0.781    0.280    12.128
P2.1_TCRI             1      7.465    7        1.885           7.884     0.000
                      0      5.862    6.000    1.840
P2.2_DEratio          1      2.329    1.222    8.253           5.490     0.000
                      0      1.027    0.788    1.399
P2.3_NEF              1      0.016    0.030    0.403           4.002     0.000
                      0      0.169    0.117    0.392
P3.1_ROA              1      0.002    0.026    0.183           5.662     0.000
                      0      0.070    0.064    0.090
P3.2_ROE              1      0.113    0.021    0.447           4.866     0.000
                      0      0.075    0.067    0.168
P3.3_2Netloss         1      0.155    0        0.363           2.326     0.020
                      0      0.085    0        0.279
P3.4_2NCFO            1      0.225    0        0.419           1.942     0.052
                      0      0.152    0        0.360
P4.1_ZFC              1      7.468    7.231    1.547           6.871     0.000
                      0      6.460    6.539    1.147
P4.2_GC               1      0.031    0        0.174           0.877     0.381
                      0      0.049    0        0.217
P5_Hgrowth            1      0.166    0.073    1.367           0.210     0.833
                      0      0.258    0.036    6.167
Opportunity
O1.1_RTPsales         1      0.105    0.030    0.182           1.282     0.200
                      0      0.107    0.030    0.162
O1.2_RPTpurchase      1      0.130    0.010    0.225           1.338     0.181
                      0      0.113    0.000    0.220
O1.3_RPTassurance     1      0.165    0.010    0.491           1.258     0.209
                      0      0.095    0.000    0.176
O2.1_EIratio          1      0.279    0.137    0.394           0.898     0.369
                      0      0.347    0.126    1.405
O2.2_Reinvestment     1      0.408    0.255    0.432           1.676     0.094
                      0      0.304    0.233    0.272
O2.3_Fsales           1      0.331    0.176    0.343           0.020     0.984
                      0      0.340    0.213    0.351
O3.1_InsiderHchange   1      3.003    0.930    10.807          1.342     0.180
                      0      0.180    0.380    10.665
O3.2_Family           1      0.109    0        0.312           2.706     0.007
                      0      0.045    0        0.207
O3.3_Devation         1      0.413    0.316    0.324           4.498     0.000
                      0      0.533    0.455    0.345
O3.4_ManagOwn         1      0.013    0.003    0.028           2.634     0.008
                      0      0.017    0.005    0.029
O3.5_BlockOwn         1      0.215    0.175    0.149           6.160     0.000
                      0      0.313    0.297    0.171
O4.1_ICdeficiency     1      0.109    0        0.312           2.706     0.007
                      0      0.045    0        0.207
O4.2_InAuditorTF      1      0.234    0        0.554           0.700     0.484
                      0      0.197    0        0.494
O4.3_FirmAge          1      22.488   21       11.169          0.083     0.934
                      0      26.734   21       85.126
O5_Big4               1      0.628    1.000    0.485           3.572     0.000
                      0      0.783    1.000    0.413
Attitude/rationalization
A1_EM                 1      0.217    0.029    0.577           3.112     0.002
                      0      0.030    0.029    0.669
A2_CEOtunoverF        1      0.664    0        0.835           1.448     0.148
                      0      0.570    0        0.860
A3_CFOturnoverF       1      0.906    1        1.075           2.698     0.007
                      0      0.588    0        0.725
A4_CPAturnoverF       1      0.273    0        0.513           2.144     0.032
                      0      0.168    0        0.386
A5_CEOduality         1      0.336    0        0.474           1.234     0.217
                      0      0.280    0        0.449
A6_Hrestatement       1      0.234    0        0.646           5.102     0.000
                      0      0.045    0        0.304

P1_MAF: analyst’s forecast error; P2.1_TCRI: Taiwan Corporate Credit Risk Index; P2.2_DEratio: Debt/equity ratio; P2.3_NEF: (cash from operations – mean capital expenditures)/current assets; P3.1_ROA: After-tax return on assets; P3.2_ROE: Return on equity; P3.3_2Netloss: Dummy variable with a value of 1 if the firm had at least two annual net losses before the event year, otherwise 0; P3.4_2NCFO: Dummy variable with a value of 1 if the firm had at least two annual negative cash flows from operations before the event year, otherwise 0; P4.1_ZFC: Estimated by the Zmijewski [50] prediction model; P4.2_GC: Dummy variable with a value of 1 if the company received a going concern opinion before the event year, otherwise 0; P5_Hgrowth: The difference in sales growth rate before the event year; O1.1_RTPsales: Percentage of sales-related party transaction; O1.2_RPTpurchase: Percentage of purchases-related party transaction; O1.3_RPTassurance: The ratio of the assurance of related-party; O2.1_EIratio: Equity investment ratio; O2.2_Reinvestment: The percentage of the firm’s re-investment; O2.3_Fsales: The percentage of sales which are foreign; O3.1_InsiderHchange: Changes in insider holdings before two event years; O3.2_Family: Dummy variable with a value of 1 if the company is a family firm, otherwise 0; O3.3_Devation: Deviation between voting and cash flow rights; O3.4_ManagOwn: Percentage of management ownership; O3.5_BlockOwn: Percentage of blockholders ownership; O4.1_ICdeficiency: Dummy variable with a value of 1 if the company’s internal control statement reports significant deficiencies, otherwise 0; O4.2_InAuditorTF: Number of internal auditor switches in the past three years; O4.3_FirmAge: Time from establishment registration to the event year; O5_Big4: Dummy variable with a value of 1 if the firm is audited by a Big 4 audit firm, otherwise 0; A1_EM: Performance-matched discretionary accruals; A2_CEOtunoverF: Number of CEO switches in the past three years; A3_CFOturnoverF: Number of CFO switches in the past three years; A4_CPAturnoverF: Number of CPA switches in the past three years; A5_CEOduality: Dummy variable with a value of 1 if the chairperson of the board holds the managerial position of CEO or president, otherwise 0; A6_Hrestatement: Number of earnings-affected restatements in the two years before the event year.
p value ≤ 0.10 level.
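
The Wilcoxon column in Table 3 compares each factor between the fraud and non-fraud groups. As a minimal sketch, assuming the two-sample rank-sum form of the test (the two groups have unequal sizes, 129 and 447) with a normal approximation and no tie or continuity correction, the statistic could be computed as follows. The function names and the toy samples are hypothetical; the paper does not state which variant or software produced its values, so this is not expected to reproduce the table.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(sample_a, sample_b):
    """Z statistic and two-sided p value for the Wilcoxon rank-sum test."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n_a])                        # rank sum of the first sample
    mean_w = n_a * (n_a + n_b + 1) / 2          # expected rank sum under H0
    var_w = n_a * n_b * (n_a + n_b + 1) / 12    # variance under H0, no ties
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail area
    return z, p


# Toy samples standing in for one factor measured in the two groups.
fraud_group = [7.1, 8.0, 7.6, 9.2, 6.9]
control_group = [5.8, 6.1, 6.5, 5.2, 6.0, 7.0]
z, p = rank_sum_z(fraud_group, control_group)
```

A rank-based test of this kind is chosen because it does not require the factor values to be normally distributed.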

When the response variable is categorical, CART produces a classification tree.

4. Empirical results

4.1. Sample selection and study period

This research identifies fraud samples which had been included in the prosecution and judgment cases against major securities crimes released by the Taiwan Securities and Futures Bureau, and in the group litigation cases announced by the Securities and Futures Investors Protection Center, during the period 1998–2010 [8]. These cases are chosen because they have the following features: (1) the firm is a publicly traded company; (2) the suspicion of fraud is the primary ground for the suit; (3) according to the Securities and Futures Investors Protection Center and the Judicial Yuan of the Republic of China Law and Regulations Retrieving System, the case involves a violation of Articles 20 and 20-1 of the Securities and Exchange Act [8]; and (4) financial and insurance companies are excluded.

Control firms are matched based on such attributes as year, asset size, industry, and trade market in the year preceding the event year [4]. To avoid oversampling, which may result in choice-based sample biases, this research increases the sample of normal companies to improve the model fit [50]. Actual frequency rates of fraud firms in Taiwan are in general less than 20%. The authors use the “matched-pair sampling” technique by matching one fraud firm with four non-fraud firms (i.e., 1:4), selecting non-sued firms from the same time period and a similar industry as the sued firms, except when the number of companies in the specific industry is insufficient. 129 fraud companies were selected from 1998 to 2010. This study uses the method proposed by Beaver [5] to choose the matching samples, and this method also eliminates companies in the banking and finance sectors. Due to the industry and economic scale requirements, only 447 non-fraud companies meet the selection criteria. This study repeats the matching process to address the shortfall of the other 69 control samples.

Further, this study divides the whole sample dataset into a training dataset and a testing dataset. The former is used to construct the forecast model, while the latter is used to test the effectiveness of the forecast model. Through the random sampling
ages of the non-fraud companies are ultimately selected as the training dataset and the rest are classified as the testing dataset.

The data were obtained from several sources, including the Taiwan Economic Journal (TEJ) database, public prospectuses, the Securities and Futures Investors Protection Center, the Financial Supervisory Commission of the Executive Yuan, and the Judicial Yuan of the Republic of China Law and Regulations Retrieving System.

4.2. Risk factors of fraud detection

According to the results obtained by Lin et al. [23] using Lawshe’s approach, 32 factors are considered by experts to be suitable measurements for the continuing assessment of fraud detection. Specifically, 11 of the 32 factors belong to the pressure/incentive dimension, 15 belong to the opportunity dimension, and the remaining 6 are related to the attitude/rationalization dimension. The results of the expert questionnaires are provided in Table 1.

4.3. Descriptive statistics and univariate analysis

This study randomly divides the whole sample dataset into a training dataset (70%) and a testing dataset (30%). The former is used to construct the forecast model, while the latter is used to test the effectiveness of the forecast model.

Following the study of Lin et al. [23], this study uses the 32 fraud factors in three dimensions, as shown in Table 2. Further, Table 3 presents the means, medians, and results of the Wilcoxon tests. Since the data do not conform to the assumed normal distribution, we use the Wilcoxon signed-rank test to evaluate the differences in means between the fraud and non-fraud firms in terms of the independent variables.

All the pressure/incentive factors except two (P4.2_GC, P5_Hgrowth) differ between the fraud and non-fraud firms. Only category five, “Competition or market saturation”, has no factor that differs significantly between the fraud and non-fraud firms. This result indicates that the probability of fraud rises if a firm exhibits a large number of such conditions as analyst’s forecast error, poor performance, a need for more external financing, and financial distress.

In terms of the opportunity factors, seven factors have created a significant difference. Namely, they are O2.2_Reinvestment which
method, 70 percentages of the fraud companies and 70 percent- is the category of the ‘‘Complex transaction”; O3.2_Family,

O3.3_Devation, O3.4_ManagOwn, and O3.5_BlockOwn, which represent the category of “Insufficient board oversight”; O4.1_ICdeficiency, which indicates the category of “Internal control environment”; and O5_Big4, which represents the category of “Other opportunity”. Only one category, “Related party transaction”, fails to show a significant difference. This finding indicates that the probability of fraud rises if a firm has a large number of complex transactions, has insufficient board oversight, operates in an inadequate internal control environment, and is audited by a non-Big Four audit firm.
Among the attitude/rationalization factors, four factors show a significant difference: A1_EM, A3_CFOturnoverF, A4_CPAturnoverF, and A6_Hrestatement. This result indicates that the probability of fraud rises if a firm exhibits more earnings manipulation, a higher frequency of top management turnover, a rising frequency of CPA turnover, and an increasing frequency of historical restatements.

4.4. Result of logistic model

Table 4 shows that nine factors are selected by the stepwise regression. Three factors have a significant negative correlation with fraud (O2.1_EIratio, O3.4_ManagOwn, and O3.5_BlockOwn). Six factors have a significant positive correlation with fraud (P2.1_TCRI, P5_Hgrowth, O2.2_Reinvestment, O3.1_InsiderHchange, A1_EM, and A6_Hrestatement). All factor effects are consistent with prior research except that of O2.1_EIratio. O2.1_EIratio (equity investment ratio) belongs to the category of “Complex transactions”. In prior research, complex transactions are positively related to fraud, which differs from the finding of this research. Moreover, there are two significant fraud factors found in each of the pressure/incentive dimension and the opportunity and attitude/rationalization dimensions.

Table 4
Result of logistic model (stepwise).

Fraud factors          Beta     Std. error   Wald     Sig.
Intercept              3.744    0.818        20.935   0
P2.1_TCRI              0.572    0.112        26.286   0
P5_Hgrowth             0.381    0.204        3.487    0.062
O2.1_EIratio           0.875    0.62         1.996    0.158
O2.2_Reinvestment      1.375    0.788        3.05     0.081
O3.1_InsiderHchange    0.047    0.019        6.06     0.014
O3.4_ManagOwn          18.058   8.712        4.297    0.038
O3.5_BlockOwn          5.071    1.318        14.795   0
A1_EM                  0.84     0.32         6.875    0.009
A6_Hrestatement        0.993    0.342        8.424    0.004

-2 Log Likelihood      205.063      Chi-Square   87.974
Cox and Snell          .275         Nagelkerke   .418

P2.1_TCRI: Taiwan Corporate Credit Risk Index; P5_Hgrowth: The difference in sales growth rate before event year; O2.1_EIratio: Equity investment ratio; O2.2_Reinvestment: The percentage of firms re-investment; O3.1_InsiderHchange: Changes in insider holdings before two event years; O3.4_ManagOwn: Percentage of management ownership; O3.5_BlockOwn: Percentage of blockholders ownership; A1_EM: Performance-matched discretionary accruals; A6_Hrestatement: Number of earnings-affected restatements in two years before event year.

Table 5 indicates the importance of the variables included in the prediction model. The most critical factor is P2.1_TCRI, which is in the category of “The need for external financing” in the pressure/incentive dimension. The next most critical factor is O3.5_BlockOwn, which is in the category of “Insufficient board oversight” in the opportunity dimension. The top five fraud factors are “The need for external financing”, “Insufficient board oversight”, “Historical restate frequency”, “Competition or market saturation”, and “Insufficient board oversight”, respectively.

Table 5
Importance of variables in the logistic model.

Nodes                  Importance   Rank
P2.1_TCRI              0.2468       1
O3.5_BlockOwn          0.1748       2
A6_Hrestatement        0.1181       3
P5_Hgrowth             0.1154       4
O3.1_InsiderHchange    0.1006       5
A1_EM                  0.0984       6
O2.2_Reinvestment      0.0580       7
O2.1_EIratio           0.0451       8
O3.4_ManagOwn          0.0429       9

The result of the logistic prediction analysis provided in Table 6 shows that the total accuracy rate is 83.7% for the training dataset and 88.5% for the testing dataset. As for the detecting power (1 − β) for fraud firms, the prediction accuracy rate is 56.7% with the training dataset, while the detecting power with the testing dataset is around 71.8%. Prior research focused on the total accuracy rate, but for society as a whole, the economic loss triggered by a Type II error is much more significant. The logistic model has a high total accuracy rate and, on the other hand, may have a low Type II error.

Table 6
Result of logistic prediction.

Practical classification    Model classification
                            Non-Fraud Co.   Fraud Co.   Prediction rate (%)
Training dataset
  Non-Fraud Co.             290             27          91.4
  Fraud Co.                 39              51          56.7
  Total accuracy rate                                   83.7
Testing dataset
  Non-Fraud Co.             118             8           93.6
  Fraud Co.                 11              28          71.8
  Total accuracy rate                                   88.5

Another purpose of this research is to compare the prediction model results with the experts’ decisions [23]. Five of the nine significant logistic factors match the top five important categories of AHP [23] provided in Table 7.
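As a cross-check on Table 4, the Wald column appears to be the squared ratio of each coefficient to its standard error, and the Sig. column the upper-tail probability of a chi-square distribution with one degree of freedom. The following is a minimal sketch under that assumption; the function names are hypothetical and only a few rows of Table 4 are copied in.

```python
import math

# (Beta, Std. error) pairs copied from Table 4 as printed there.
TABLE4 = {
    "P5_Hgrowth": (0.381, 0.204),
    "O2.1_EIratio": (0.875, 0.62),
    "O3.4_ManagOwn": (18.058, 8.712),
}


def wald_statistic(beta, std_error):
    """Wald chi-square statistic for one coefficient: (beta / SE)^2."""
    return (beta / std_error) ** 2


def wald_p_value(wald):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(wald / 2.0))


for name, (beta, se) in TABLE4.items():
    w = wald_statistic(beta, se)
    print(f"{name:15s} Wald={w:6.3f}  p={wald_p_value(w):.3f}")
```

Because the table reports rounded coefficients, the recomputed values agree with the printed Wald and Sig. entries only approximately.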

Table 7
Ranking of factor importance in the logistic and AHP models.

Rank  Result of logistic model                                              Result of AHP (Lin et al. [23])
1     P2.1_TCRI: The need for external financing (Pressure/Incentive)       Poor performance (Pressure/Incentive)
2     O3.5_BlockOwn: Insufficient board oversight (Opportunity)             The need for external financing (Pressure/Incentive)
3     A6_Hrestatement: Ethic (Attitude/rationalization)                     Financial distress (Pressure/Incentive)
4     P5_Hgrowth: Competition or market saturation (Pressure/Incentive)     Insufficient board oversight (Opportunity)
5     O3.1_InsiderHchange: Insufficient board oversight (Opportunity)       Competition or market saturation (Pressure/Incentive)
6     A1_EM: Earning manipulation (Attitude/rationalization)
7     O2.2_Reinvestment: Complex transactions (Opportunity)
8     O2.1_EIratio: Complex transactions (Opportunity)
9     O3.4_ManagOwn: Insufficient board oversight (Opportunity)

Within the top five categories obtained from the study of Lin et al. [23], only “Competition or market saturation” is unmatched with our logistic model. In addition, all top five categories belong to the Pressure/Incentive and Opportunity dimensions. Surprisingly, two factors in the Attitude/rationalization dimension, “Historical restate frequency” and “Earning manipulation”, are found only in the logistic result.

4.5. Result of decision tree

Table 8 depicts the splitting factors and the factors that are important, in the order in which they appear in the decision tree. In Panel b, eighteen factors are selected by CART. Among these factors, eight belong to the pressure/incentive dimension, seven belong to the opportunity dimension, and the last three belong to the attitude/rationalization dimension. The most important factor is P2.1_TCRI, which falls into the category of “The need for external financing” in the pressure/incentive dimension. The next most important factor is O3.3_Devation, which is from the category of “Insufficient board oversight” in the opportunity dimension. The top five important factors include “The need for external financing”, “Insufficient board oversight”, “Poor performance”, “Internal control environment”, and finally “Financial distress”. These categories are all part of the Pressure/Incentive and Opportunity dimensions.
From the CART prediction analysis shown in Table 9, the total accuracy rate is 90.4% for the training dataset and 90.3% for the testing dataset. With regard to the detecting power (1 − β) for fraud firms, the prediction accuracy rate is 72.2% with the training dataset, while the detecting power with the testing dataset is about 81.6%. The CART model may have a high total accuracy rate and also a lower Type II error.

Table 8
Splitting factors and factor importance of the CART prediction result.

Panel a: The splitting factors
First splitter     P3.1_ROA
Second splitter    O3.5_BlockOwn
Third splitter     P2.1_TCRI, P1_MAF
Fourth splitter    O3.3_Devation
Fifth splitter     P3.2_ROE

Panel b: Factor importance of the CART prediction model
Factor               Importance   Rank
P2.1_TCRI            0.1554       1
O3.3_Devation        0.1165       2
P3.1_ROA             0.1001       3
O4.2_InAuditorTF     0.0482       4
P4.1_ZFC             0.0482       5
P2.2_DEratio         0.0482       6
A1_EM                0.0482       7
O1.1_RTPsales        0.0482       8
A3_CFOturnoverF      0.0482       9
O1.2_RPTpurchase     0.0482       10
O2.3_Fsales          0.0482       11
A6_Hrestatement      0.0482       12
P2.3_NEF             0.0482       13
P5_Hgrowth           0.0482       14
O2.2_Reinvestment    0.0482       15
P3.2_ROE             0.0479       16
O3.5_BlockOwn        0.0015       17
P1_MAF               0.0000       18

Table 9
Result of CART prediction.

Practical classification    Model classification
                            Non-Fraud Co.   Fraud Co.   Prediction rate (%)
Training dataset
  Non-Fraud Co.             315             15          95.4
  Fraud Co.                 25              66          72.2
  Total accuracy rate                                   90.4
Testing dataset
  Non-Fraud Co.             109             8           93.1
  Fraud Co.                 7               31          81.6
  Total accuracy rate                                   90.3

Table 11
Importance of factors in the ANNs prediction model.

Nodes                  Importance   Rank
P4.1_ZFC               0.1106       1
A6_Hrestatement        0.0903       2
P2.1_TCRI              0.0898       3
O3.5_BlockOwn          0.0716       4
A3_CFOturnoverF        0.068        5
O3.1_InsiderHchange    0.0574       6
A1_EM                  0.0527       7
P3.2_ROE               0.0513       8
O4.1_ICdeficiency      0.0454       9
P1_MAF                 0.0366       10
A2_CEOtunoverF         0.0333       11
P3.1_ROA               0.0319       12
O4.2_InAuditorTF       0.0302       13
O3.4_ManagOwn          0.0274       14
O3.2_Family            0.0259       15
O3.3_Devation          0.0245       16
O1.1_RTPsales          0.0211       17
P5_Hgrowth             0.0207       18
P3.4_2NCFO             0.0202       19
O5_Big4                0.0171       20
O1.3_RPTassurance      0.0116       21
O2.1_EIratio           0.0113       22
P2.3_NEF               0.0107       23
P4.2_GC                0.0103       24
A5_CEOduality          0.009        25
A4_CPAturnoverF        0.0077       26
P2.2_DEratio           0.0033       27
O4.3_FirmAge           0.0032       28
O2.3_Fsales            0.0025       29
O2.2_Reinvestment      0.0024       30
O1.2_RPTpurchase       0.0013       31
P3.3_2Netloss          0.0009       32

Table 10
Ranking of top 10 important factors in CART and AHP model.

Rank  Result of CART model                                                  Result of AHP (Lin et al. [23])
1     P2.1_TCRI: The need for external financing (Pressure/Incentive)       Poor performance (Pressure/Incentive)
2     O3.3_Devation: Insufficient board oversight (Opportunity)             The need for external financing (Pressure/Incentive)
3     P3.1_ROA: Poor performance (Pressure/Incentive)                       Financial distress (Pressure/Incentive)
4     O4.2_InAuditorTF: Internal control environment (Opportunity)          Insufficient board oversight (Opportunity)
5     P4.1_ZFC: Financial distress (Pressure/Incentive)                     Competition or market saturation (Pressure/Incentive)
6     P2.3_NEF: The need for external financing (Pressure/Incentive)
7     A6_Hrestatement: Ethic (Attitude/rationalization)
8     O2.3_Fsales: Complex transactions (Opportunity)
9     O1.2_RPTpurchase: Related party transaction (Opportunity)
10    A3_CFOturnoverF: CFO turnover frequency (Attitude/rationalization)
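The category comparison behind Table 10 amounts to asking which of the experts' top five categories do not appear among the categories of the CART top 10. A minimal sketch of that check follows; the list names and the helper `unmatched` are hypothetical, and the category labels are copied from Table 10.

```python
# Categories of the CART top 10 factors, in rank order (Table 10, left column).
CART_TOP10 = [
    "The need for external financing",
    "Insufficient board oversight",
    "Poor performance",
    "Internal control environment",
    "Financial distress",
    "The need for external financing",
    "Ethic",
    "Complex transactions",
    "Related party transaction",
    "CFO turnover frequency",
]

# Experts' top five categories from the AHP result (Table 10, right column).
AHP_TOP5 = [
    "Poor performance",
    "The need for external financing",
    "Financial distress",
    "Insufficient board oversight",
    "Competition or market saturation",
]


def unmatched(expert_categories, model_categories):
    """Expert categories that never occur among the model's categories."""
    model_set = set(model_categories)
    return [c for c in expert_categories if c not in model_set]


print(unmatched(AHP_TOP5, CART_TOP10))
# → ['Competition or market saturation']
```

The same helper can be applied to the other models by swapping in their category lists.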

Table 10 shows the ranking of the top 10 fraud factors in the CART and AHP [23] models. The fraud factors of the CART prediction model are in line with the decisions of the experts obtained from the study of Lin et al. [23]. In the AHP and CART results, all of the top five categories are in the Pressure/Incentive and Opportunity dimensions. The categories that appear in both the CART and AHP models include the following three: “The need for external financing”, “Poor performance”, and “Financial distress”.
In AHP [23], among the top five categories, only “Competition or market saturation” is unmatched with the CART result. Moreover, two fraud factors, “Historical restate frequency” and “CFO turnover frequency”, are in the Attitude/rationalization dimension of the CART prediction model.

4.6. Result of Artificial Neural Networks (ANNs)

Table 11 shows the ranking of all fraud factors obtained from the ANNs prediction model. The most important factor in the ANNs result is P4.1_ZFC (the higher probability of bankruptcy), which is in the “Financial distress” category in the pressure/incentive dimension. The second most important factor is A6_Hrestatement (historical restate frequency) in the attitude/rationalization dimension. Specifically, among the top five critical factors, the pressure/incentive and the opportunity dimensions have two factors each and the attitude/rationalization dimension has only one factor. In summary, four of the bottom five factors are in the opportunity dimension.
From the ANNs prediction analysis provided in Table 12, the total accuracy rate is 91.2% for the training dataset and 92.8% for the testing dataset. With regard to the detecting power (1 − β) for fraud firms, the prediction accuracy rate is 76.1% with the training dataset, while the detecting power with the testing dataset is 82.9%. The ANNs model may have a high total accuracy rate and also a lower Type II error.

Table 12
Result of ANNs prediction.

Practical classification    Model classification
                            Non-Fraud Co.   Fraud Co.   Prediction rate (%)
Training dataset
  Non-Fraud Co.             304             15          95.2
  Fraud Co.                 21              67          76.1
  Total accuracy rate                                   91.2
Testing dataset
  Non-Fraud Co.             123             5           96.1
  Fraud Co.                 7               34          82.9
  Total accuracy rate                                   92.8

Table 13 shows the top 10 fraud factors from the ANNs and AHP [23] models. Some of the fraud factors of the ANNs prediction model match the decisions of the experts. However, in AHP

Table 13
Ranking of top 10 important factors in ANNs and AHP model.

Rank  Result of ANNs model                                                  Result of AHP (Lin et al. [23])
1     P4.1_ZFC: Financial distress (Pressure/Incentive)                     Poor performance (Pressure/Incentive)
2     A6_Hrestatement: Ethic (Attitude/rationalization)                     The need for external financing (Pressure/Incentive)
3     P2.1_TCRI: The need for external financing (Pressure/Incentive)       Financial distress (Pressure/Incentive)
4     O3.5_BlockOwn: Insufficient board oversight (Opportunity)             Insufficient board oversight (Opportunity)
5     A3_CFOturnoverF: CFO turnover frequency (Attitude/rationalization)    Competition or market saturation (Pressure/Incentive)
6     O3.1_InsiderHchange: Insufficient board oversight (Opportunity)
7     A1_EM: Earning manipulation (Attitude/rationalization)
8     P3.2_ROE: Poor performance (Pressure/Incentive)
9     O4.1_ICdeficiency: Internal control environment (Opportunity)
10    P1_MAF: Meet analysts’ forecasts (Pressure/Incentive)

Table 14
Comparison with factor importance.

Rank 1
  Logistic: The need for external financing (Pressure/Incentive) P2.1_TCRI
  CART:     The need for external financing (Pressure/Incentive) P2.1_TCRI
  ANNs:     Financial distress (Pressure/Incentive) P4.1_ZFC
  AHP:      Poor performance (Pressure/Incentive)
Rank 2
  Logistic: Insufficient board oversight (Opportunity) O3.5_BlockOwn
  CART:     Insufficient board oversight (Opportunity) O3.3_Devation
  ANNs:     Historical restate frequency_Ethic (Attitude/rationalization) A6_Hrestatement
  AHP:      The need for external financing (Pressure/Incentive)
Rank 3
  Logistic: Historical restate frequency_Ethic (Attitude/rationalization) A6_Hrestatement
  CART:     Poor performance (Pressure/Incentive) P3.1_ROA
  ANNs:     The need for external financing (Pressure/Incentive) P2.1_TCRI
  AHP:      Financial distress (Pressure/Incentive)
Rank 4
  Logistic: Competition or market saturation (Pressure/Incentive) P5_Hgrowth
  CART:     Internal control environment (Opportunity) O4.2_InAuditorTF
  ANNs:     Insufficient board oversight (Opportunity) O3.5_BlockOwn
  AHP:      Insufficient board oversight (Opportunity)
Rank 5
  Logistic: Insufficient board oversight (Opportunity) O3.1_InsiderHchange
  CART:     Financial distress (Pressure/Incentive) P4.1_ZFC
  ANNs:     CFO turnover frequency (Attitude/rationalization) A3_CFOturnoverF
  AHP:      Competition or market saturation (Pressure/Incentive)
Rank 6
  Logistic: Earning manipulation (Attitude/rationalization) A1_EM
  CART:     The need for external financing (Pressure/Incentive) P2.3_NEF
  ANNs:     Insufficient board oversight (Opportunity) O3.1_InsiderHchange
  AHP:      Related party transaction (Opportunity)
Rank 7
  Logistic: Complex transactions (Opportunity) O2.2_Reinvestment
  CART:     Historical restate frequency_Ethic (Attitude/rationalization) A6_Hrestatement
  ANNs:     Earning manipulation (Attitude/rationalization) A1_EM
  AHP:      Complex transactions (Opportunity)
Rank 8
  Logistic: Complex transactions (Opportunity) O2.1_EIratio
  CART:     Complex transactions (Opportunity) O2.3_Fsales
  ANNs:     Poor performance (Pressure/Incentive) P3.2_ROE
  AHP:      Meet analysts’ forecasts (Pressure/Incentive)
Rank 9
  Logistic: Insufficient board oversight (Opportunity) O3.4_ManagOwn
  CART:     Related party transaction (Opportunity) O1.2_RPTpurchase
  ANNs:     Internal control environment (Opportunity) O4.1_ICdeficiency
  AHP:      CPA turnover frequency (Attitude/rationalization)
Rank 10
  Logistic: -
  CART:     CFO turnover frequency (Attitude/rationalization) A3_CFOturnoverF
  ANNs:     Meet analysts’ forecasts (Pressure/Incentive) P1_MAF
  AHP:      Internal control environment (Opportunity)

Table 15
Comparison with predict performance.

Accuracy                            Method
                                    Logistic              CART                  ANNs
                                    Training   Testing    Training   Testing    Training   Testing
Overall classification accuracy     83.7%      88.5%      90.4%      90.3%      91.2%      92.8%
Accuracy prediction rate            91.4%      93.6%      95.4%      93.1%      95.2%      96.1%
  (Type I error)                    (8.6%)     (6.4%)     (4.6%)     (6.9%)     (4.8%)     (3.9%)
Detecting power                     56.7%      71.8%      72.2%      81.6%      76.1%      82.9%
  (Type II error)                   (43.3%)    (28.2%)    (27.8%)    (18.4%)    (23.9%)    (17.1%)
Misclassification cost
  1000                              9700       6318       6228       4122       5354       3831
  5000                              48,497     31,585     31,137     20,609     26,769     19,153
  10,000                            96,993     63,169     62,273     41,217     53,537     38,305
  100,000                           969,921    631,681    622,721    412,161    535,361    383,041
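The accuracy and error rows of Table 15 follow from the confusion counts in the prediction tables, and Section 4.8 gives a formula for the misclassification cost. The sketch below illustrates both under stated assumptions: the function names are hypothetical; the operator in the printed formula is read as multiplication; “percentage of distress” is read as the share of fraud firms; and rates and shares are passed as fractions. The cost function simply transcribes the formula as stated and is not claimed to reproduce the cost figures printed in Table 15.

```python
def error_rates(nonfraud_as_nonfraud, nonfraud_as_fraud,
                fraud_as_nonfraud, fraud_as_fraud):
    """Return (overall accuracy, Type I error, Type II error) in percent.

    Following the usage in Table 15, a Type I error is a non-fraud firm
    classified as fraud and a Type II error is a fraud firm classified
    as non-fraud.
    """
    n_nonfraud = nonfraud_as_nonfraud + nonfraud_as_fraud
    n_fraud = fraud_as_nonfraud + fraud_as_fraud
    correct = nonfraud_as_nonfraud + fraud_as_fraud
    overall = 100.0 * correct / (n_nonfraud + n_fraud)
    type1 = 100.0 * nonfraud_as_fraud / n_nonfraud
    type2 = 100.0 * fraud_as_nonfraud / n_fraud
    return overall, type1, type2


def misclassification_cost(type1, type2, fraud_share, cost_ratio):
    """(p + q) / o as stated in Section 4.8 (assumed reading, see lead-in)."""
    p = type1 * fraud_share * cost_ratio          # Type I term
    q = type2 * (1.0 - fraud_share)               # Type II term
    o = fraud_share * cost_ratio + (1.0 - fraud_share)
    return (p + q) / o


# Logistic model, training dataset counts from Table 6.
overall, type1, type2 = error_rates(290, 27, 39, 51)
print(f"overall={overall:.2f}%  Type I={type1:.2f}%  Type II={type2:.2f}%")
```

For the logistic counts in Table 6, the recomputed rates agree with the printed figures to within about 0.1 percentage point; the small gaps appear to come from rounding in the source.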

result, three of the top five categories are matched and only “Competition or market saturation” and “Poor performance” are unmatched with the ANNs result. Moreover, three fraud factors, “Historical restate frequency”, “CFO turnover frequency”, and “Earning manipulation”, are in the Attitude/rationalization dimension of the ANNs prediction model.

4.7. Comparison with factor importance

Table 14 shows the fraud factors in the different methods and compares the experts’ decisions (AHP, from Lin et al. [23]) with the empirical results. The experts’ decisions indicate that the most important category of fraud detection is “Poor performance”, which is also a category selected by the CART and ANNs analyses but not by the logistic analysis. The second critical category selected by experts is “The need for external financing”, which is also selected by all prediction models. In Table 14, the decisions of experts are most consistent with the CART prediction model, and only two of all factors (historical restate frequency and CFO turnover frequency) of the CART model are unmatched with the AHP analysis. In other words, the experts’ decisions are very consistent with empirical situations.
Two factors, “P2.1_TCRI (The need for external financing)” and “A6_Hrestatement (Ethic)”, are included in the top 10 fraud factors in the logistic, CART, and ANNs prediction models, but the factor “A6_Hrestatement” is not included in the AHP analysis. In addition, five factors, “O3.5_BlockOwn (Insufficient board oversight)”, “A1_EM (Earning manipulation)”, “P4.1_ZFC (Financial distress)”, “O3.1_InsiderHchange (Insufficient board oversight)”, and “A3_CFOturnoverF (CFO turnover frequency)”, are included in two of the prediction models, but two of these factors, “A1_EM” and “A3_CFOturnoverF”, are not included in the AHP analysis.

4.8. Comparison with predict performance

Table 15 summarizes the classification results for the training and testing datasets. The result shows that the classification accuracy of ANNs is greater than that of CART, and the classification accuracy of CART is greater than that of logistic, in both the training and testing samples. However, it may be more important to classify a fraud firm correctly than to classify a non-fraud firm correctly. Therefore, an analysis of Type I and Type II errors is conducted. A Type II error is viewed as more serious than a Type I error. Both datasets in Table 15 reveal that the detecting power of ANNs is much stronger than that of CART and logistic. In other words, ANNs is better at classifying a fraud firm.
Berardi and Zhang [6] argue that the relative costs of Type I and Type II errors affect the solutions obtained. We set the cost multiple to 1000, 5000, 10,000, and 100,000 in turn to investigate whether the cost gap between a Type I error and a Type II error affects the model performance.

Misclassification Cost = (p + q)/o
Where: p = Type I error × percentage of fraud × ratio of cost
q = Type II error × percentage of non-fraud
o = percentage of distress × ratio of cost + percentage of non-distress

Evidence from Table 15 shows that ANNs again has a lower overall misclassification cost than CART, and CART has a lower overall misclassification cost than logistic. As a result, the performance of ANNs and CART is clearly better than that of logistic.

5. Conclusion

With an increasing number of companies resorting to these unfair practices, auditors have become quite overburdened with the task of detecting fraud. Therefore, the continuous improvement and development of financial market monitoring and surveillance systems with higher analytical capabilities to capture fraud is essential to guarantee and preserve an efficient and effective market. This study uses data mining techniques to sort out the fraud factors and then rank their importance. The methods employed in this research are Logistic Regression, Decision Trees (CART), and Artificial Neural Networks (ANNs). The study selects 129 fraud cases and 447 non-fraud cases to test the machine learning expert system that estimates the likelihood of fraud. Empirically, the ANNs and CART approaches achieve correct classification rates of 91.2% (ANNs) and 90.4% (CART) on the training sample and 92.8% (ANNs) and 90.3% (CART) on the testing sample, which is more accurate than the logistic model, which reaches only 83.7% and 88.5%, respectively, in assessing the presence of fraud. In addition, the Type II error of ANNs drops dramatically to 23.9%, compared with 43.3% for the logistic model and 27.8% for CART. Furthermore, the rate of correct classification is higher than that of the models developed by [35,17].
This study investigates the differences between the judgments of the experts (i.e., AHP from Lin et al. [23]) and the empirical results of the prediction models. Two fraud factors are included in the top 10 of all prediction models. These fraud factors are “P2.1_TCRI (i.e., Taiwan Corporate Credit Risk Index)” and “A6_Hrestatement (i.e., Historical restate frequency)”, which belong to “The need for external financing” category in the pressure/incentive dimension, and “Historical

restate frequency” in the attitude/rationalization dimension, respectively. This result differs from the judgments of the experts obtained in the study of Lin et al. [23]. The prediction models show that all three dimensions of the fraud triangle play important roles, whereas the AHP result of Lin et al. [23] shows that only the pressure/incentive and opportunity dimensions are in the top five categories. This gap warns that auditors and users of financial statements should pay more attention to the attitude/rationalization dimension, especially when the firm has a high frequency of financial restatements.
In addition, five factors, “O3.5_BlockOwn (i.e., Insufficient board oversight)”, “A1_EM (i.e., Earning manipulation)”, “P4.1_ZFC (i.e., Financial distress)”, “O3.1_InsiderHchange (i.e., Insufficient board oversight)”, and “A3_CFOturnoverF (i.e., CFO turnover frequency)”, are included in two of the prediction models. However, two of these factors, “A1_EM” and “A3_CFOturnoverF”, are not included in the top 10 categories of the experts’ judgments in Lin et al. [23]. This gap signals that auditors and users of financial statements should be alert to the attitude/rationalization dimension, especially when the firm manipulates earnings and has a high frequency of CFO turnover.
From the top 10 critical fraud factors in the different prediction models, the judgments of experts are most consistent with the CART prediction model. Only two of the fraud factors (historical restate frequency and CFO turnover frequency) of the CART model are unmatched with the AHP results. In other words, the decisions of the experts are very consistent with the empirical results.
Compared with prior research, this research may make the following contributions. First, this research, based on the fraud triangle adopted in the fraud standards (SAS 99 and TSAS 43), develops different prediction models without using subjective measurements. Second, this research compares the effectiveness of different tools for detecting fraud and identifies the gaps between the judgments of the experts and the different prediction models.
This study has some practical implications for accounting practitioners, internal auditors, and fraud examiners. It provides prescriptive information on which fraud detection and prevention methods work best. In other words, it suggests that a simple binary technique may not be sufficiently useful in identifying fraud risks. However, a neural network system performs quite well and might be a supportive tool for practitioners. This also implies that ongoing innovation in artificial intelligence is necessary and could be employed to facilitate the evaluation of auditing evidence. Accounting practitioners and their management may wish to consider investing in these methods in order to prevent costly frauds in their organizations and respond to the pressing demands of regulatory agencies and legal requirements such as those imposed by the Sarbanes–Oxley Act of 2002.
There are some limitations of this research. As the sampling period of the study is thirteen years, most of the prosecuted companies may have been delisted. In addition, the earlier financial statements are difficult to access because of the long study period; therefore, samples with incomplete financial data must be eliminated, which may affect the prediction rate. Moreover, as the study scope covers only the listed and OTC companies and excludes corporations in other markets, the applicability of the models to other markets may need to be investigated further.
Regarding the prediction models, this study considers several widely used techniques; other algorithms available in the literature, such as genetic algorithms for classification, could also be applied. However, it is rather difficult to conduct a comprehensive study that includes all existing classification techniques. Thus, for future work, other classifiers or clustering methods can also be applied and compared with the prediction models provided by this study in order to reach a more reliable conclusion.

References

[1] W. Albrecht, Fraud Examination, Southwestern, New York, 2003.
[2] B.A. Apostolou, J.M. Hassell, An overview of the analytic hierarchy process and its use in accounting research, J. Account. Lit. 12 (1) (1993) 1–28.
[3] B.A. Apostolou, J.M. Hassell, S.A. Webber, G.E. Summers, The relative importance of management fraud risk factors, Behav. Res. Account. 13 (1) (2001) 24–31.
[4] M.S. Beasley, An empirical investigation of the relation between board of director composition and financial statement fraud, Account. Rev. 71 (4) (1996) 443–460.
[5] W. Beaver, Financial ratios as predictors of failure, J. Account. Res. 4 (3) (1966) 71–111.
[6] V.L. Berardi, G.P. Zhang, The effect of misclassification costs on neural network classifiers, Decis. Sci. 30 (3) (1999) 659–682.
[7] L. Breiman, J. Friedman, R. Olshen, S. Stone, Classification and Regression Trees, Chapman and Hall/CRC Press, Boca Raton, FL, 1984.
[8] H.J. Chen, S.Y. Huang, Y.N. Shih, C.T. Hsiao, Discussing the financial fraud factor detection, Chin. Manage. Rev. 12 (4) (2009) 1–22.
[9] H.J. Chen, S.Y. Huang, C.S. Lin, Using financial indicator approach to construct a TSEC & OTC listed companies litigation warning model, Int. Res. J. Financ. Econ. 38 (2) (2010) 78–93.
[10] D.R. Cressey, Other People’s Money, Rev. Ed., Patterson Smith Publishing Corporation, Montclair, New Jersey, 1973.
[11] W. Didimo, G. Liotta, F. Montecchiani, P. Palladino, An advanced network visualization system for financial crime detection, in: Pacific Visualization Symposium (PacificVis), IEEE, 2011, pp. 203–210.
[12] P.R. Gillett, N. Uddin, CFO intentions of fraudulent financial reporting, Audit.: J. Pract. Theory 24 (1) (2005) 55–76.
[13] S. Goode, D. Lacey, Detecting complex account fraud in the enterprise: the role of technical and non-technical controls, Decis. Support Syst. 50 (4) (2011) 702–714.
[14] D. Gozman, W. Currie, The role of investment management systems in regulatory compliance: a post-financial crisis study of displacement mechanisms, J. Inform. Technol. 29 (1) (2014) 44–58.
[15] J. Han, M. Kamber, Data Mining Concepts and Techniques, Morgan Kaufmann Publishers, 2006.
[16] C.E. Hogan, Z. Rezaee, R.A. Riley, U.K. Velury, Financial statement fraud: insights from the academic literature, Audit.: J. Pract. Theory 27 (2) (2008) 231–252.
[17] K. Kaminski, T. Wetzel, L. Guan, Can financial ratios detect fraudulent financial reporting?, Manage. Audit. J. 19 (1) (2004) 15–28.
[18] D.S. Kerr, U.S. Murthy, The importance of the CobiT framework IT processes for effective internal control over financial reporting in organizations: an international survey, Inform. Manage. 50 (7) (2013) 590–597.
[19] E. Kirkos, C. Spathis, Y. Manolopoulos, Data mining techniques for the detection of fraudulent financial statement, Expert Syst. Appl. 32 (4) (2007) 995–1003.
[20] T.S. Lee, Y.H. Yeh, Corporate governance and financial distress: evidence from Taiwan, Corp. Gov. Int. Rev. 12 (3) (2004) 378–388.
[21] T.S. Lee, C.C. Chiu, Y.C. Chou, C.J. Lu, Mining the customer credit using classification and regression tree and multivariate adaptive regression splines, Comput. Stat. Data Anal. 50 (4) (2006) 1113–1130.
[22] C. Lennox, P. Lisowsky, J. Pittman, Tax aggressiveness and accounting fraud, J. Account. Res. 51 (4) (2013) 739–778.
[23] C.C. Lin, S.Y. Huang, A.A. Chiu, Fraud detection using fraud triangle risk factors with Analytic hierarchy process, in: 2012 Annual Meeting of the American Accounting Association, 2012.
[24] S.Y. Huang, C.C. Lin, A.A. Chiu, Using data mining techniques to identify and rank the fraud factors, in: American Accounting Association Annual Meeting and Conference on Teaching and Learning in Accounting, 2014.
[25] C.L. Lu, T.C. Chen, A study of applying data mining approach to the information disclosure for Taiwan’s stock market investors, Expert Syst. Appl. 36 (2) (2009) 3536–3542.
[26] C. Magnusson, A. Arppe, T. Eklund, B. Back, H. Vanharanta, A. Visa, The language of quarterly reports as an indicator of change in the company’s financial status, Inform. Manage. 42 (4) (2005) 561–574.
[27] T.J. Mock, J.L. Turner, Auditor identification of fraud risk factors and their impact on audit programs, Int. J. Audit. 9 (1) (2005) 59–77.
[28] S. Mukkamala, A.H. Sung, A. Abraham, Intrusion detection using an ensemble of intelligent paradigms, J. Netw. Comput. Appl. 28 (2) (2004) 167–182.
[29] E.W.T. Ngai, Y. Hu, Y.H. Wong, Y. Chen, X. Sun, The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature, Decis. Support Syst. 50 (3) (2011) 559–569.
[30] M.J. Nigrini, L.J. Mittermaier, The use of Benford’s Law as an aid in analytical procedures, Audit.: J. Pract. Theory 16 (1997) 52–67 (Fall).
[31] D. Olszewski, A probabilistic approach to fraud detection in telecommunications, Knowl.-Based Syst. 26 (2012) 246–258.
[32] D. Olszewski, Fraud detection using self-organizing map visualizing the user profiles, Knowl.-Based Syst. 70 (2014) 324–334.

[33] S. Owusu-Ansah, G.D. Moyes, P.B. Oyelere, D. Hay, An empirical analysis of the likelihood of detecting fraud in New Zealand, Manag. Audit. J. 17 (4) (2002) 192–204.
[34] P.F. Pai, M.F. Hsu, M.C. Wang, A support vector machine-based model for detecting top management fraud, Knowl.-Based Syst. 24 (2) (2011) 314–321.
[35] O. Persons, Using financial statement data to identify factors associated with fraudulent financial reporting, J. Appl. Bus. Res. 11 (3) (1995) 38–46.
[36] C. Posey, T.L. Roberts, P.B. Lowry, R.J. Bennett, J.F. Courtney, Insiders’ protection of organizational information assets: development of a systematics-based taxonomy and theory of diversity for protection-motivated behaviors, MIS Quart. 37 (4) (2013) 1189–1210.
[37] L. Purda, D. Skillicorn, Accounting variables, deception, and a bag of words: assessing the tools of fraud detection, Contemp. Account. Res. (2014), http://dx.doi.org/10.1111/1911-3846.12089.
[38] P. Ravisankar, V. Ravi, G. Raghava Rao, I. Bose, Detection of financial statement fraud and feature selection using data mining techniques, Decis. Support Syst. 50 (2) (2011) 491–500.
[39] Z. Rezaee, Causes, consequences, and deterrence of financial statement fraud, Crit. Perspect. Account. 16 (3) (2005) 277–298.
[40] K.S. Shin, Y.J. Lee, A genetic algorithm application in bankruptcy prediction modeling, Expert Syst. Appl. 23 (3) (2002) 321–328.
[41] A.R. Sorkin, Pulling Back the Curtain on Fraud Inquiries, The New York Times, December 6, 2010.
[42] C. Spathis, Detecting false financial statements using published data: some evidence from Greece, Manage. Audit. J. 17 (4) (2002) 179–191.
[43] A. Srivastava, A. Undu, S. Sural, A.K. Majumdar, Credit card fraud detection using hidden Markov model, IEEE Trans. Dependable Secure Comput. 5 (1) (2008) 37–48.
[44] R.P. Srivastava, T.J. Mock, J.L. Turner, Bayesian fraud risk formula for financial statement audits, Abacus 45 (1) (2009) 66–87.
[45] V. Sugumaran, V. Muralidharan, K.I. Ramachandran, Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing, Mech. Syst. Signal Process. 21 (2) (2007) 930–942.
[46] S. Theodoridis, K. Koutroumbas, Pattern Recognition, vol. 885, Academic Press, 2006.
[47] C.F. Tsai, Feature selection in bankruptcy prediction, Knowl.-Based Syst. 22 (2) (2009) 120–127.
[48] S. Tsang, Y.S. Koh, G. Dobbie, S. Alam, SPAN: finding collaborative frauds in online auctions, Knowl.-Based Syst. 71 (2014) 389–408.
[49] D. Xing, M. Girolami, Employing Latent Dirichlet allocation for fraud detection in telecommunications, Pattern Recogn. Lett. 28 (13) (2007) 1727–1734.
[50] M.E. Zmijewski, Methodological issues related to the estimation of financial distress prediction models, J. Account. Res. 22 (1) (1984) 59–82.
