Beruflich Dokumente
Kultur Dokumente
Behrouz Far
Electrical and Computer Engineering pattern [3]. Early analysis of big data was centred
Calgary University primarily on data volume, for example, general public
Calgary, Canada database [4], biometrics, financial analysis [5], [6].
far@ucalgary.ca For frauds, the credit card is an easy and friendly
target because without any risk a significant amount
Abstract-The goal of data analytics is to of money is obtained within a short period [7]. To
commit credit card fraud, fraudsters try to steal
delineate hidden patterns and use them to
sensitive information such as credit card number,
support informed decisions in a variety of bank account and social security number [8], [6].
situations. Credit card fraud is escalating Fraudsters try to make every fraudulent transaction
significantly with the advancement of legitimate which makes fraud detection a challenging
modernized technology and became an easy problem [9]. Increased credit card transactions show
that approximately 70% of the people in the US can
target for frauds. Credit card fraud has
fall into the trap of these fraudsters [10].
highly imbalanced publicly available Credit card dataset is highly imbalanced dataset
datasets. In this paper, we apply many because it carries more legitimate transactions as
supervised machine learning algorithms to compared to the fraudulent one [11]. That means
detect credit card fraudulent transactions prediction will get very high accuracy score without
detecting a fraud transaction. To handle this kind of
using a real-world dataset. Furthermore, we
problem one better way is to class distribution, i.e.,
employ these algorithms to implement a sampling minority classes [12]. In sampling minority,
super classifier using ensemble learning class training example can be increased in proportion
methods. We identify the most important to the majority class to raise the chance of correct
variables that may lead to higher accuracy prediction by the algorithm [13].
in credit card fraudulent transaction In this paper, we use ten machine learning models
and compare their Accuracy, TPR, FPR, G-mean,
detection. Additionally, we compare and Recall, Precision, Specificity and F1-Score. All
discuss the performance of various machine learning algorithm is evaluated using a real-
supervised machine learning algorithms that world credit card transaction to identity fraud or non-
exist in literature against the super classifier fraud transaction. The main motive of this paper to
apply supervised learning method on the real-world
that we implemented in this paper.
dataset [14].
123
5. Conclusion • We can work on top ten features and find-out
• Under-sampling is done for balancing the the accuracy, Recall, Precision, Confusion
unbalanced dataset. matrix and compare it with our old result.
• The learning model’s evaluation is based on
their accuracy, recall, precision, TPR, FPR,
specificity and G-mean.
• The result of all the purposed models were
superior in overall performance.
• Overall results show that stacking classifier
which is used LR as meta classifier is most
promising for predicting fraud transaction in
the dataset, followed by the random forest
and XGB classifier.
Reference
[1] S. Bhattacharyya, S. Jha, K. Tharakunnel,
and J. C. Westland, “Data mining for credit
card fraud: A comparative study,” Decis.
Support Syst., vol. 50, no. 3, pp. 602–613,
2011.
[2] K. Chaudhary, J. Yadav, and B. Mallick, “A
Figure 2. Classifier ranking based on Recall score
review of Fraud Detection Techniques:
Credit Card,” Int. J. Comput. Appl., vol. 45,
6. Future implications no. 1, pp. 975–8887, 2012.
[3] “Mining of Massive Datasets Second
• Future work will be conducting the using the Edition.”
voting classifier and check the performance [4] F. N. Ogwueleka, “Data Mining Application
with other ML learning methods, increase in Credit Card Fraud Detection System,” vol.
the size of training and testing dataset. 6, no. 3, pp. 311–322, 2011.
• We can work on using the all the machine [5] H. Nordberg, K. Bhatia, K. Wang, and Z.
learning algorithm to find out the features Wang, “BioPig: a Hadoop-based analytic
importance.
124
toolkit for large-scale sequence data,” [18] E. W. T. Ngai, Y. Hu, Y. H. Wong, Y. Chen,
Bioinformatics, vol. 29, no. 23, pp. 3014– and X. Sun, “The application of data mining
3019, Dec. 2013. techniques in financial fraud detection: A
[6] M. Hegazy, A. Madian, and M. Ragaie, classification framework and an academic
“Enhanced Fraud Miner: Credit Card Fraud review of literature,” Decis. Support Syst.,
Detection using Clustering Data Mining vol. 50, no. 3, pp. 559–569, 2011.
Techniques,” Egypt. Comput. Sci., no. 03, [19] C. C. Lin, A. A. Chiu, S. Y. Huang, and D.
pp. 72–81, 2016. C. Yen, “Detecting the financial statement
[7] M. Zareapoor and P. Shamsolmoali, fraud: The analysis of the differences
“Application of credit card fraud detection: between data mining techniques and experts’
Based on bagging ensemble classifier,” judgments,” Knowledge-Based Syst., vol. 89,
Procedia Comput. Sci., vol. 48, no. C, pp. pp. 459–470, 2015.
679–686, 2015. [20] N. Carneiro, G. Figueira, and M. Costa, “A
[8] O. S. Yee, S. Sagadevan, N. Hashimah, and data mining based system for credit-card
A. Hassain, “Credit Card Fraud Detection fraud detection in e-tail,” Decis. Support
Using Machine Learning As Data Mining Syst., vol. 95, pp. 91–101, 2017.
Technique,” vol. 10, no. 1, pp. 23–27. [21] A. C. Bahnsen, D. Aouada, and B. Ottersten,
[9] K. Randhawa, C. K. Loo, M. Seera, C. P. “Example-dependent cost-sensitive logistic
Lim, and A. K. Nandi, “Credit Card Fraud regression for credit scoring,” Proc. - 2014
Detection Using AdaBoost and Majority 13th Int. Conf. Mach. Learn. Appl. ICMLA
Voting,” IEEE Access, vol. 6, pp. 14277– 2014, pp. 263–269, 2014.
14284, 2018. [22] M. Carminati, R. Caron, F. Maggi, I. Epifani,
[10] N. Mahmoudi and E. Duman, “Detecting and S. Zanero, “BankSealer: A decision
credit card fraud by Modified Fisher support system for online banking fraud
Discriminant Analysis,” Expert Syst. Appl., analysis and investigation,” Comput. Secur.,
vol. 42, no. 5, pp. 2510–2516, 2015. vol. 53, pp. 175–186, 2015.
[11] A. Dal Pozzolo, O. Caelen, Y. A. Le Borgne, [23] C. Whitrow, D. J. Hand, P. Juszczak, D.
S. Waterschoot, and G. Bontempi, “Learned Weston, and N. M. Adams, “Transaction
lessons in credit card fraud detection from a aggregation as a strategy for credit card fraud
practitioner perspective,” Expert Syst. Appl., detection,” Data Min. Knowl. Discov., vol.
vol. 41, no. 10, pp. 4915–4928, 2014. 18, no. 1, pp. 30–55, Feb. 2009.
[12] M. A. Scholar, M. Ali, and P. Fellow, [24] G. Rushin, C. Stancil, M. Sun, S. Adams,
“Investigating the Performance of Smote for and P. Beling, “Horse race analysis in credit
Class Imbalanced Learning : A Case Study card fraud - Deep learning, logistic
of Credit Scoring Datasets,” vol. 13, no. 33, regression, and Gradient Boosted Tree,”
pp. 340–353, 2017. 2017 Syst. Inf. Eng. Des. Symp. SIEDS 2017,
[13] H. He, W. Zhang, and S. Zhang, “A novel pp. 117–121, 2017.
ensemble method for credit scoring: [25] R. J. Bolton, D. J. Hand, F. Provost, L.
Adaption of different imbalance ratios,” Breiman, R. J. Bolton, and D. J. Hand,
Expert Syst. Appl., vol. 98, pp. 105–117, “Statistical Fraud Detection: A
May 2018. ReviewCommentCommentRejoinder,” Stat.
[14] A. Dal Pozzolo, O. Caelen, R. A. Johnson, Sci., vol. 17, no. 3, pp. 235–255, 2002.
and G. Bontempi, “Calibrating Probability [26] X.-Y. Liu, J. Wu, and Z.-H. Zhou,
with Undersampling for Unbalanced “Exploratory Undersampling for Class-
Classification.” Imbalance Learning,” vol. 39, no. 2, 2009.
[15] Y. Sahin and E. Duman, “Detecting Credit [27] E. A. Mohammed, M. M. A. Mohamed, C.
Card Fraud by Decision Trees and Support Naugler, and B. H. Far, “Toward leveraging
Vector Machines,” Int. Multiconference Eng. big value from data: chronic lymphocytic
Comput. Sci., vol. I, pp. 442–447, 2011. leukemia cell classification,” Netw. Model.
[16] V. Van Vlasselaer et al., “APATE: A novel Anal. Heal. Informatics Bioinforma., vol. 6,
approach for automated credit card no. 1, p. 6, Dec. 2017.
transaction fraud detection using network- [28] G. H. John and P. Langley, “Estimating
based extensions,” Decis. Support Syst., vol. Continuous Distributions in Bayesian
75, pp. 38–48, 2015. Classifiers,” Feb. 2013.
[17] C. Phua, D. Alahakoon, and V. Lee,
“Minority report in fraud detection,” ACM
SIGKDD Explor. Newsl., vol. 6, no. 1, p. 50,
2004.
125