BY
ALUKO, SAYO ENOCH
AUGUST, 2017
CERTIFICATION
This is to certify that this research work titled “Electronic Banking Fraud Detection using
Data Mining Techniques and R for implementing Machine Learning Algorithms in
prevention of Fraud” was carried out by Aluko, Sayo Enoch with Matric Number: 070313043
and has been prepared in accordance with the regulations governing the presentation of projects
for the award of the Master of Science degree in Statistics, Department of Mathematics, School
of Postgraduate Studies, University of Lagos, Akoka, Lagos.
------------------------------------------- ------------------------------------
ALUKO, SAYO ENOCH DATE
Researcher
------------------------------------------- ------------------------------------
DR. (Mrs.) M. I. AKINYEMI DATE
Project Supervisor
------------------------------------------- ------------------------------------
Prof. J.O OLALERU DATE
Head of Department
------------------------------------------- ------------------------------------
DATE
External Examiner
DEDICATION
This research work is dedicated first to my Father in heaven (GOD) and Saviour, the Lord Jesus
Christ; to my mother, Mrs. Olufumilola Comfort Aluko; my father, Late Mr. Samuel
Oluwole Aluko; and my grandmother, Late Princess Alice Adeleye Aluko. My humble
gratitude goes to my siblings, Mr. Gbenga Aluko, Dr. Seun Aluko, Mr. James Aluko
(FNM), Aluko Cornel and Aluko Temilola Abidemi, and to my nieces and nephews, friends,
colleagues and well-wishers.
ACKNOWLEDGEMENTS
The fact that I am writing this sentence echoes my enormous indebtedness to Almighty God for His
grace, mercy and, most importantly, His unending blessings and favour towards me.
My profound gratitude goes to my parents: my mother, Mrs. Olufumilola Comfort Aluko; my father, Late
Mr. Samuel Oluwole Aluko; and my grandmother, Late Princess Alice Adeleye Aluko. My humble
gratitude also goes to my siblings, Mr. Gbenga Aluko, Dr. Seun Aluko, Mr. James Aluko (FNM), Aluko
Cornel and Aluko Temilola Abidemi, as well as my nieces and nephews, friends, colleagues and
well-wishers, for their encouragement, words of wisdom, financial support and so much more, and for
always supporting me and being my source of strength.
I also thank my project supervisor, Dr. (Mrs.) M. I. Akinyemi, for her encouragement, guidance and
support, and for her contribution and effort towards the success of this research work: for guiding my
work tirelessly, going through all my drafts, and providing valuable suggestions and constructive
criticism for the improvement of this dissertation.
Finally, I express my gratitude to my entire family, beloved colleagues and friends.
To all of us, God’s richest blessings are ours!
ABSTRACT
This research work deals with the procedures for computing the presence of outliers using various
distance measures, and with the general detection performance of unsupervised machine learning
methods such as K-Means Cluster Analysis and Principal Component Analysis. A comprehensive
evaluation of data mining techniques, machine learning and predictive modelling for unsupervised
anomaly detection algorithms was carried out on an electronic banking transaction data set covering a
period of six (6) months, April to September 2015, consisting of 9 variable data fields and 8,641
observations. On completion of the underlying system, I conclude that integrated techniques provide
better performance and efficiency than a single technique. Besides, in near real-time settings, where
faster computation is required for larger data sets, such as the unlabelled data set used for this research
work, a clustering-based method is preferred to a classification model.
TABLE OF CONTENTS
Title page
Certification
Dedication
Acknowledgements
Abstract
Table of contents
1.0 Introduction
3.0 Introduction
5.1 Conclusion
5.2 Recommendations
References
APPENDIX:
LIST OF FIGURES:
Figure 3.1.1
Figure 3.5.4
Figure 3.5.5
Figure 3.5.6
Figure 3.5.7
Figure 3.6.2
Figure 4.2.3
Figure 5.1.1
Figure 5.1.2
CHAPTER ONE
INTRODUCTION
In spite of the challenging economy, the use of e-channel platforms (Internet banking,
mobile banking, ATM, POS, Web, etc.) has continued to experience significant growth.
According to the NIBSS 2015 annual fraud report, transaction volume and value grew by 43.36%
and 11.57% respectively, compared to 2014. Although the e-fraud rate in terms of value reduced by
63% in 2015, due in part to the introduction of BVN and improved collaboration among banks
via the fraud desks, the total fraud volume increased significantly by 683% in 2015 compared to
2014. Similarly, data released recently by NITDA (Nigeria Information Technology
Development Agency) indicated that Nigeria experienced a total of 3,500 cyber-attacks with a
70% success rate, and a loss of $450 million within the last one year.
The sustained growth of e-transactions, as depicted by the increased transaction volume and value
in 2015, coupled with the rapidly evolving nature of technological advancements within the
e-channel ecosystem, continues to attract cybercriminals who continuously develop new schemes
to perpetrate e-fraud.
What is e-fraud? What is responsible for its growth in Nigeria? What are the major techniques
used by these criminals to commit fraud? Is e-fraud dying in Nigeria? Can it be mitigated?
What is e-fraud?
e-fraud can be briefly defined as electronic banking trickery and deception that affects the
entire society, impacting upon individuals, businesses and governments.
Why Is It Growing?
The following inherent factors fuel e-fraud in Nigeria:
i. Dissatisfied staff;
ii. Increased adoption of e-payment systems for transactions due to their convenience
and simplicity;
iii. Emerging payment products being adopted by Nigerian banks;
iv. Growing complexity of e-channel systems;
v. Abundance of malicious code, malware and tools available to attackers;
vi. Rapid pace of technological innovations;
vii. Casual security practices and knowledge gaps;
viii. The obscurity and anonymity the internet affords attackers;
ix. The increasing role of third-party processors in switching e-payment
transactions;
x. Passive approach to fraud detection and prevention;
xi. Lack of inter-industry collaboration in fraud prevention (banks, telecoms, police,
etc.).
Can it be alleviated?
Because of the risk inherent in the e-channel space, many organisations have attempted to
implement the following comprehensive strategies for detecting and preventing e-fraud:
Fraud Policies
Fraud Risk Assessment
Fraud Awareness and Training
Monitoring
Penetration Testing
Collaboration
In conclusion, increased revenue, optimised costs, innovation, regulation, convenience
and simplicity are the major factors driving the massive adoption of e-channel platforms in
Nigeria. Furthermore, the usage of these platforms has created opportunities for cyber-thieves
who continuously devise new and sophisticated schemes to perpetrate fraud.
e-fraud will continue to grow, and combating it requires effective fraud strategies, collaboration
and cooperation among many organisations in Nigeria, including government agencies, as well as
other countries. Otherwise, cybercriminals will keep getting richer from the hard work of others,
due to the lack of a united front on the part of everyone.
1.2 STATEMENT OF THE PROBLEM
Electronic banking is a driving force that is changing the landscape of the banking environment
fundamentally towards a more competitive industry. Electronic banking has blurred the
boundaries between different financial institutions, enabled new financial
products and services, and made existing financial services available in different packages
(Anderson S., 2000), but the influence of electronic banking goes far beyond this.
The developments in electronic banking, together with other financial innovations, are
constantly bringing new challenges to finance theory and changing people’s understanding of the
financial system. It is therefore not surprising that, in the application of electronic banking in
Nigeria, financial institutions have had to face a number of problems.
1.3 OBJECTIVES OF THE STUDY
The main objective of this study is to find a solution for controlling fraud, since fraud is a
critical problem in many organisations, including the government. Specifically, the objectives
of the study are to:
i. identify the factors that cause fraud;
ii. explore the various techniques of fraud detection;
iii. explore some major detection techniques based on the unlabelled data available for
analysis, which does not contain a useful indicator of fraud. Thus, unsupervised machine
learning and predictive modelling, with a major focus on Anomaly/Outlier Detection (OD),
will be considered as the major techniques for this project work.
1.4 RESEARCH QUESTIONS
Understand the different areas of fraud and their specific detection methods.
Identify anomalies and risk areas using data mining and machine learning techniques.
Carry out some major fraud detection techniques, as a model and an encouragement for
different banks to initiate fraud detection techniques and work together to achieve more
extensive and better results.
This work considers anomaly detection as its main theme. The following resources therefore
illustrate the variety of approaches, methods and tools for the task in each ecosystem. To ensure
that this study is successful, data mining and statistical methodology will be explored to detect
fraud and enable immediate action to minimise costs. Through the use of sophisticated data
mining tools, millions of transactions can be searched to spot patterns and detect fraudulent
transactions.
Such tools include decision trees (boosting trees, classification trees and random forests),
machine learning, association rules, cluster analysis and neural networks. Predictive models can
be generated to estimate quantities such as the probability of fraudulent behaviour or the naira
amount of fraud. These predictive models help to focus resources in the most efficient manner
to prevent or recover fraud losses.
In the course of this research work some constraints were encountered. For instance, it
does not make sense to describe fraud detection techniques in great detail in the public domain,
as this gives criminals the information they require in order to evade detection. Although
data sets are readily available, results are often censored, making them difficult to assess (for
example, Leonard 1993). Many fraud detection problems involve huge data sets that are
constantly evolving; besides, original data sets are modified so as not to infringe on clients'
personal information and for organisational security measures.
Data Source: Chartered Institute of Treasury Management, Abuja
http://www.cbn.gov.ng/neff%20annual%20report%2015.pdf
http://www.nibbs-plc.com/ng/report/2014fraud.report
https://statistics.cbn.gov.ng/cbn-ElectronicBankingstats/DataBrowser.aspx
I would like to first summarise the main characteristics of electronic banking fraud, and
then discuss the related work on different areas of fraud detection. Most published work on
fraud detection relates to the domains of credit card fraud, computer intrusion and
telecommunication fraud. I will therefore discuss each of these and explain the limitations of the
existing work when applied to detecting electronic banking fraud.
According to Linda D., Hussein A., and John P. (2009), in electronic banking the interval
between a customer making a payment and the payment being transferred to its destination
account is usually very short. To prevent instant money loss, a fraud detection alert should be
generated as quickly as possible. This requires a high level of efficiency in detecting fraud in
large and imbalanced data.
Fraud behaviour is dynamic. According to Masoumeh Zareapoor, fraudsters
continually advance their techniques to defeat electronic banking defences. Malware, which
accounts for the greater part of electronic banking fraud, has been reported to produce over
55,000 new malicious programs every day. This puts fraud detection in the position of having to
defend against an ever-growing set of attacks. This is far beyond the capability of any single
fraud detection model, and requires adaptive models and the possibility of engaging multiple
models to handle the challenges that cannot be addressed by any single model.
(Seeja, K. R., and M. Afshar Alam, 2012)
The forensic evidence for fraud detection is weak. For electronic banking transactions, it
is only possible to know the source account, destination account and currency value associated
with each transaction; other external information, for example the purpose of the spending, is
not available. Moreover, with the exception of ID theft, most electronic banking fraud is caused
not by the hijack of an electronic banking system but by attacks on customers’ computers. In
fraud detection, only the electronic banking activities recorded in banking systems can be
accessed, not the whole compromise process or solid forensic evidence (including labels showing
whether a transaction is fraudulent) which could be very useful for understanding the nature of
the deception. This makes it challenging to identify sophisticated fraud with very limited
information. (Adnan M. Al-Khatib, 2012)
Customer behaviour patterns are diverse. An electronic banking interface provides a
one-stop entry for customers to access most banking services and multiple accounts. In
conducting electronic banking business, every customer may behave very differently for
different purposes. This leads to a diversity of genuine customer transactions. In addition,
fraudsters simulate genuine customer behaviour and change their behaviour frequently to keep
pace with advances in fraud detection. This makes it difficult to characterise fraud and even
more difficult to distinguish it from genuine behaviour. (Tung-shou Chen, 2006)
The electronic banking system is fixed. The electronic banking process and system of
any bank are fixed: every customer accesses the same banking system and can only use the
services in a predefined way. This provides good references for characterising common genuine
behaviour sequences, and for identifying small suspicious deviations in fraudulent electronic
banking activity.
The above characteristics make it very difficult to detect electronic banking fraud, and
electronic banking fraud detection presents several major challenges to research, especially
for the mainstream data mining community: extremely imbalanced data, big data, model
efficiency in dealing with complex data, dynamic data mining, pattern mining with limited or no
labels, and discriminant analysis of data without clear differentiation. In addition, it is very
challenging to develop a single model to tackle all of the above aspects, which greatly challenges
the existing work in fraud detection. (Tung-shou Chen, 2006)
Thresholding has several disadvantages: thresholds may need to vary with time of day, type of
account and type of call in order to be sensitive to fraud without setting off too many false
alarms for legitimate traffic. (Fawcett, T., and Provost, F., 1996)
Fawcett and Provost developed an innovative method for choosing account-specific thresholds
rather than universal thresholds that apply to all accounts, or to all accounts in a segment. In
their experiment, fraud detection is based on tracking account behaviour. Fraud detection was
event-driven rather than time-driven, so that fraud can be detected as it is happening. Second,
fraud detection must be able to learn the calling pattern on an account and adapt to legitimate
changes in calling behaviour. Lastly, fraud detection must be self-initialising so that it can be
applied to new accounts that do not have enough data for training. The approach adopted
probability distribution functions to track legitimate calling behaviour.
Other models that have been developed in research settings with promising potential for real-world
applications include the Customer Relationship Model, Bankruptcy Prediction Model,
Inventory Management Model and Financial Market Model. (Fawcett, T., and Provost, F.,
1997)
Similarly, it was stated that many financial institutions see the value of Artificial Neural
Networks (ANNs) as a supporting mechanism for financial analysts and are actively investing in
this arena. The models described provide the knowledge needed to choose the type of neural
network to be used. The use of decision tree techniques, in conjunction with the CRISP-DM
management model, to help in the prevention of bank fraud has also been evaluated. That study
recognised the fact that it is almost impossible to eradicate bank fraud and focused on what can
be done to minimise frauds and prevent them. The research offered a study of decision trees, an
important concept in the field of artificial intelligence, and focused on discussing how
these trees are able to assist in the decision-making process of identifying frauds through the
analysis of information regarding bank transactions. This information is captured with the use of
data mining techniques and the CRISP-DM management model in large operational databases
logged from internet banking.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a model of a data mining
process used by experts to solve problems; it identifies the different stages in implementing a
data mining project. A decision tree, in turn, is both a data-representing structure and a method
used for data mining and machine learning. The same line of work also describes the use of
neural networks in analysing the great increase in credit card transactions, since credit card fraud
has become increasingly rampant in recent years, and investigates the efficacy of applying
classification models to credit card fraud detection problems.
Three different classification methods, i.e. decision tree, neural networks and logistic regression,
were tested for their applicability in fraud detection. The paper provides a useful framework for
choosing the best model to recognise credit card fraud risk. Detecting credit card fraud is a
difficult task when using normal procedures, so the development of credit card fraud
detection models has recently become significant in both the academic and business
communities.
These models are mostly statistics-driven or artificial intelligence-based, which has the
theoretical advantage of not imposing arbitrary assumptions on the input variables.
In this work, the irregularity detection system model seeks to reduce the risk level of
fraudulent transactions that take place in the Nigerian banking industry, thereby aiding in the
reduction of bank fraud. If implemented properly, this will bring about fewer fraudulent
transactions. Neural network technology is appropriate for detecting fraudulent transactions
because of its ability to learn and remember the characteristics of fraudulent transactions and
apply that “knowledge” when assessing new transactions. (Yuhas B. P., 1993)
The study reinforced the validity and efficiency of AutoNet as a research tool and provides
additional empirical evidence regarding the merits of suggested red flags for fraudulent financial
statements. A review of the various factors that lead to fraud in our banking system suggests
that several interrelated factors may have contributed to these fraudulent activities.
CHAPTER THREE
RESEARCH METHODOLOGY
3.1 Introduction
This chapter presents the analytical framework and the methodology for building electronic
banking fraud detection using data mining and R for implementing machine learning
algorithms for the detection of fraud. The methods of analysis were K-Means Cluster Analysis
and Principal Component Analysis. Accordingly, a predictive model was formulated and
adequate procedures and techniques for computing the presence of outliers, using various
distance measures, were adopted.
This technique follows the procedure below for electronic banking transactions to
demonstrate the fraud detection process. The process consists of the following steps:
Step 1 (read-untagged-data): Read the untagged data (the data object name before
preprocessing).
Step 2 (data-preprocessing): Preprocess and clean the data; group or aggregate the items
together based on the labelID, and split the data into behavioural transaction patterns.
Step 3 (create-risk-table): Build clusters which identify groups within the data set and its
numeric variables using the K-Means algorithm, and display the discriminant analysis plot.
Step 4 (Modelling): Model the principal component variables.
Step 5 (Visualisation): Highlight homogeneous groups of individuals with a Parallel
Coordinate Plot (PCP).
Step 6 (Prediction): Prediction on experimental sets.
Step 7 (Evaluation): Evaluate performance.
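As a rough illustration of this pipeline, the sketch below strings the steps together in R. It is a
minimal sketch only: the file name, the numeric field names and the choice of ten clusters are
assumptions for illustration, while the actual analysis in Chapter Four is carried out
interactively with R/Rattle.

# Minimal sketch of the detection pipeline (assumed file and column names)
trans_data <- read.csv("etransactions.csv")               # Step 1: read untagged data

num_vars <- c("transactionNairaAmount", "transactionTime",
              "transactionDate", "localHour")              # assumed numeric fields
clean  <- na.omit(trans_data[, num_vars])                  # Step 2: preprocess and clean
scaled <- scale(clean)                                     # standardise the variables

set.seed(42)
km  <- kmeans(scaled, centers = 10, nstart = 20)           # Step 3: build K-Means clusters
pca <- prcomp(scaled)                                      # Step 4: principal components

plot(pca$x[, 1:2], col = km$cluster,                       # Step 5: visualise the groups
     main = "Transactions in PC1-PC2 space by cluster")
new_scores <- predict(pca, newdata = scaled)               # Step 6: prediction on experimental sets
km$withinss                                                # Step 7: within-cluster sum of squares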
3.2.2 Credit Card Fraud Detection Methods
From the literature survey of various methods for fraud detection, I came to the conclusion
that there are many approaches to detecting credit card fraud, stated as follows:
Hybridization
Genetic Algorithm
Neural Network
Bayesian Network
Decision Tree
The first group of techniques deals with the supervised classification task at the transaction level.
In these methods, transactions are labelled as fraudulent or normal based on previous historical
data. This data set is then used to create classification models which can predict the state (normal
or fraud) of new records. There are numerous model creation methods for a typical two-class
classification task, such as rule induction, decision trees and neural networks. This approach has
proven to reliably detect most fraud tricks which have been observed before; it is also known as
misuse detection.
The second approach (anomaly detection) deals with unsupervised methodologies which are
based on account behaviour. In this method a transaction is flagged as fraudulent if it is in
contrast with the user’s normal behaviour. This is because we do not expect fraudsters to behave
the same as the account owner, or to be aware of the owner’s behaviour model. To this end, we
need to extract the legitimate user behavioural model (i.e. user profile) for each account and then
detect fraudulent activities according to it. Comparing new behaviours with this model,
sufficiently different activities are flagged as frauds. The profiles may contain the activity
information of the account, such as transaction types, amount, location and time of transactions.
This method is also known as anomaly detection. (Yeung, D., and Ding, Y., 2002)
It is important to highlight the key differences between the user behaviour analysis and fraud
analysis approaches. The fraud analysis method can detect known fraud tricks with a low false
positive rate (FPR). These systems extract the signature and model of fraud tricks presented in a
data set and can then easily determine exactly which frauds the system is currently experiencing.
If the test data does not contain any fraud signatures, no alarm is raised; thus, the false positive
rate can be reduced greatly. However, since learning of a fraud analysis system (i.e. a classifier)
is based on limited and specific fraud records, it cannot distinguish or detect novel frauds. As a
result, the false negative rate (FNR) may be extremely high, depending on how ingenious the
fraudsters are. User behaviour analysis, on the other hand, largely addresses the problem of
detecting novel frauds. These methods do not search for specific fraud patterns, but rather
compare incoming activities with the constructed model of legitimate user behaviour. Any
activity that is sufficiently different from the model will be considered a possible fraud.
Although user behaviour analysis approaches are powerful in detecting innovative frauds, they
suffer from high rates of false alarms. Moreover, if a fraud occurs during the training phase,
this fraudulent behaviour will be entered into the baseline model and assumed to be normal in
further analysis. (Yeung, D., and Ding, Y., 2002)
I will now briefly introduce some current fraud detection techniques which are applied to credit
card fraud detection tasks; the main advantages and disadvantages of each approach will also be
discussed.
An artificial neural network (ANN) is a set of interconnected nodes designed to imitate the
functioning of the human brain (Douglas, L., and Ghosh, S., 1994). Each node has a weighted
connection to several other nodes in adjacent layers. Individual nodes take the input received
from connected nodes and use the weights, together with a simple function, to compute output
values. Neural networks come in many shapes and architectures. The neural network
architecture, including the number of hidden layers, the number of nodes within a specific hidden
layer and their connectivity, must be specified by the user based on the complexity of the
problem. ANNs can be configured by supervised, unsupervised or hybrid learning methods.
In supervised learning, samples of both fraudulent and non-fraudulent records, together with
their labels, are used to create models. These techniques are often used in the fraud analysis
approach. One of the most popular supervised neural networks is the back-propagation network
(BPN). It minimises the objective function using a multi-stage dynamic optimisation method that
is a generalisation of the delta rule. The back-propagation method is often useful for feed-forward
networks with no feedback. The BPN algorithm is usually time-consuming, and parameters such
as the number of hidden neurons and the learning rate of the delta rule require extensive tuning
and training to achieve the best performance. In the domain of fraud detection, supervised neural
networks such as back-propagation are known as efficient tools with numerous applications.
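For illustration only, a small feed-forward network of the kind described above can be fitted in R
with the nnet package (which uses its own optimiser rather than classical back-propagation). The
data frame labeled_trans and its 0/1 fraud column are hypothetical, since the data set used in this
work is unlabelled; the predictor names follow the transaction fields used later in this project.

library(nnet)   # single-hidden-layer feed-forward neural network

# labeled_trans is a hypothetical labelled data frame with a 0/1 'fraud' column
labeled_trans$fraud <- factor(labeled_trans$fraud)

fit <- nnet(fraud ~ transactionNairaAmount + transactionTime + localHour,
            data = labeled_trans, size = 5, decay = 1e-3, maxit = 200)

p_hat <- predict(fit, newdata = labeled_trans, type = "raw")   # predicted fraud probabilities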
Raghavendra Patidar et al. used a data set to train a three-layer back-propagation neural
network in combination with genetic algorithms (GA) for credit card fraud detection. In that
work, the genetic algorithm was responsible for making decisions about the network architecture,
dealing with the network topology, the number of hidden layers and the number of nodes in each
layer. Also, Aleskerov et al. developed a neural network based data mining system for credit card
fraud detection. The proposed system (CARDWATCH) had a three-layer auto-associative
architecture. They used a set of synthesised data for training and testing the system, and the
reported results show very successful fraud detection rates.
Elsewhere, a P-RCE neural network was applied to credit card fraud detection. P-RCE is a type
of radial-basis function network usually applied to pattern recognition tasks. Krenker et al.
proposed a model for real-time fraud detection based on bi-directional neural networks, using a
data set of cell phone transactions provided by a credit card company. It was claimed that the
system outperforms rule-based algorithms in terms of false positive rate.
In another study, a parallel granular neural network (GNN) was proposed to speed up the data
mining and knowledge discovery process for credit card fraud detection. GNN is a kind of fuzzy
neural network based on knowledge discovery (FNNKD). The underlying data set was extracted
from a SQL Server database containing sample Visa card transactions and then preprocessed for
use in fraud detection. They obtained smaller average training errors in the presence of a larger
training data set.
According to Yamanishi, K., and Takeuchi, J. (2004), unsupervised techniques do not need
previous knowledge of fraudulent and normal records. These methods raise an alarm for those
transactions that are most dissimilar from the normal ones, and are often used in the user
behaviour approach. ANNs can produce acceptable results for sufficiently large transaction data
sets, but they need a long training data set. The self-organising map (SOM), introduced by
Kohonen, is one of the most popular unsupervised neural network learning methods; it provides a
clustering method which is appropriate for constructing and analysing customer profiles in credit
card fraud detection, as has been suggested. SOM operates in two phases: training and mapping.
In the former phase the map is built and the weights of the neurons are updated iteratively based
on input samples; in the latter, test data are classified automatically into normal and fraudulent
classes through the procedure of mapping. After training the SOM, new unseen transactions are
compared to the normal and fraud clusters; if a transaction is similar to all normal records, it is
classified as normal. New fraud transactions are also detected similarly.
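As a hedged sketch of the two-phase SOM procedure described above, the kohonen package in R
can train a map on scaled transaction features and then map observations onto their best-matching
units. The input data frame trans_features, the grid size and the random seed are assumptions for
illustration.

library(kohonen)

x <- scale(as.matrix(trans_features))        # assumed numeric transaction features
set.seed(1)
som_fit <- som(x, grid = somgrid(xdim = 6, ydim = 6, topo = "hexagonal"))   # training phase

# mapping phase: assign (new) scaled transactions to their best-matching units
units <- map(som_fit, newdata = x)$unit.classif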
One of the advantages of using unsupervised neural networks over similar techniques is that these
methods can learn from a data stream. The more data passed to a SOM model, the more
adaptation and improvement in the results is obtained; more specifically, the SOM adapts its
model as time passes. It can therefore be used and updated online in banks or other financial
corporations, and as a result the fraudulent use of a card can be detected quickly and effectively.
However, neural networks have some drawbacks and difficulties, mainly related to specifying a
suitable architecture on the one hand and the excessive training required to reach the best
performance on the other. (Williams, G., and Milne, P., 2004)
In addition to supervised and unsupervised learning models of neural networks, some researchers
have applied hybrid models. John Zhong Lei et al. proposed hybrid supervised (SICLN) and
unsupervised (ICLN) learning networks for credit card fraud detection. They improved the
reward-only rule of the ICLN model into the SICLN in order to update weights according to both
reward and penalty. This improvement appeared in terms of increased stability and reduced
training time. Moreover, the number of final clusters of the ICLN is independent of the number
of initial network neurons; as a result, inoperable neurons can be omitted from the clusters by
applying the penalty rule. The results indicated that both the ICLN and the SICLN have high
performance, but the SICLN outperforms well-known unsupervised clustering algorithms.
(R. Huang, H. Tawfik, A. Nagar, 2010)
Classification models based on decision trees and support vector machines (SVM) have also been
developed and applied to the credit card fraud detection problem. In this technique, each account
is tracked separately using suitable descriptors, and the transactions are identified and labelled as
legitimate or fraudulent (Sahin, Y., and Duman, E., 2011).
The identification is based on the suspicion score produced by the developed classifier model:
when a new transaction is processed, the classifier can predict whether the transaction is
normal or fraudulent.
In this approach, all the collected data is first pre-processed before the modelling phase starts.
Since the distribution of data with respect to the classes is highly imbalanced, stratified sampling
is used to under-sample the normal records so that the models have a chance to learn the
characteristics of both the normal and the fraudulent records' profiles. To do this, the variables
that are most successful in differentiating the legitimate and the fraudulent transactions are found.
These variables are then used to form stratified samples of the legitimate records. Later on, these
stratified samples of the legitimate records are combined with the fraudulent ones to form three
samples with different fraudulent-to-normal record ratios. The first sample set has a ratio of one
fraudulent record to one normal record; the second has a ratio of one fraudulent record to four
normal ones; and the last has a ratio of one fraudulent record to nine normal ones.
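A minimal sketch of this stratified under-sampling step, assuming a labelled data frame trans with
a 0/1 fraud indicator (the ratios follow the description above):

fraud_recs <- trans[trans$fraud == 1, ]      # assumed fraudulent records
legit_recs <- trans[trans$fraud == 0, ]      # assumed legitimate records

make_sample <- function(ratio, seed = 123) {
  set.seed(seed)
  n_legit <- min(nrow(legit_recs), ratio * nrow(fraud_recs))
  rbind(fraud_recs, legit_recs[sample(nrow(legit_recs), n_legit), ])
}

sample_1to1 <- make_sample(1)   # one fraudulent record to one normal record
sample_1to4 <- make_sample(4)   # one fraudulent record to four normal records
sample_1to9 <- make_sample(9)   # one fraudulent record to nine normal records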
The variables used make the difference in fraud detection systems. The main motive in defining
the variables used to form the data mart is to differentiate the profile of the fraudulent card user
from the profile of the legitimate card user. The results show that decision tree approaches
outperform SVM in solving the problem under investigation. However, as the size of the training
data sets becomes larger, the accuracy performance of SVM-based models becomes equivalent
to that of decision tree based models, but the number of frauds caught by SVM models is still
less than the number caught by decision tree methods. (Carlos Leon, Juan I. Guerrero, Jesus
Biscarri, 2012)
The purpose of fuzzy neural networks is to process large volumes of uncertain information, and
they are extensively applied in everyday life. Syeda et al. in 2002 proposed fuzzy neural
networks which run on parallel machines to speed up rule production for customer-specific
credit card fraud detection. This work can be associated with data mining and knowledge
discovery in databases (KDD). In this technique, a granular neural network (GNN) method was
used, which employs a fuzzy neural network based on knowledge discovery (FNNKD), to train
the network quickly and to determine how many customers can be processed for fraud detection
in parallel. There is a transaction table which includes various fields such as the transaction
amount, statement date, posting date, time between transactions, transaction code, day and
transaction description. For the implementation of this credit card fraud detection method,
however, only the significant fields from the database are extracted into a simple text file by
applying suitable SQL queries. In this detection method the transaction amount for any customer
is the key input. This preprocessing of the data helped in decreasing the data size and processing
time, which speeds up the training and makes the patterns more concise. In the process of the
fuzzy neural network, the data is classified into three categories.
In this work, the predictive model for the unsupervised machine learning detection system seeks
to reduce the risk level of fraudulent transactions that take place in the Nigerian banking
industry, thereby aiding in the reduction of bank fraud. If implemented properly, this will bring
about fewer fraudulent transactions. The efficiency is measured on the basis of the frequency of
detecting outliers or unusual behavioural user patterns.
3.3.1 Model for Data Reduction
According to Bruker Daltonics, Data reduction techniques can be applied to obtain a reduced
representation of the data set that is much smaller in volume, yet closely maintains the integrity
of the original data. That is, mining on the reduced data set should be more efficient yet produce
the same (or almost the same) analytical results. (D.L Massart, and Y. Vander Heyden., 2004)
The first principal component is the linear combination of the original variables
\[ Y_1 = e_{11}X_1 + e_{12}X_2 + \dots + e_{1p}X_p \]
that has maximum variance, and the second principal component
\[ Y_2 = e_{21}X_1 + e_{22}X_2 + \dots + e_{2p}X_p \]
is selected in the same way,
subject to the constraint that the sums of squared coefficients add up to one,
\[ e_{i1}^2 + e_{i2}^2 + \dots + e_{ip}^2 = 1, \]
along with the additional constraint that these two components will be uncorrelated with one
another:
\[ \operatorname{cov}(Y_1, Y_2) = 0. \]
All subsequent principal components have this same property; they are linear combinations
that account for as much of the remaining variation as possible and they are not correlated
with the other principal components. We will do this in the same way with each additional
component. For instance, the ith principal component
\[ Y_i = e_{i1}X_1 + e_{i2}X_2 + \dots + e_{ip}X_p \]
maximises the remaining variance,
subject to the constraint that the sums of squared coefficients add up to one, along with
the additional constraint that this new component will be uncorrelated with all the previously
defined components:
\[ \operatorname{cov}(Y_k, Y_i) = 0 \quad \text{for all } k < i. \]
We are also going to let \( \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \) denote the
eigenvalues of the variance-covariance matrix \( \Sigma \), and the vectors
\( e_1, e_2, \dots, e_p \) denote the corresponding
eigenvectors. It turns out that the elements of these eigenvectors will be the coefficients of our
principal components.
The variance for the ith principal component is equal to the ith eigenvalue:
\[ \operatorname{var}(Y_i) = \lambda_i. \]
The variance-covariance matrix may be written as a function of the eigenvalues and their
corresponding eigenvectors. This is determined by using the Spectral Decomposition Theorem,
and will become useful later when we investigate topics under factor analysis:
\[ \Sigma = \sum_{i=1}^{p} \lambda_i e_i e_i'. \]
The truncated version of this expression is a useful approximation if the remaining eigenvalues
\( \lambda_{k+1}, \dots, \lambda_p \) are small. We might then approximate \( \Sigma \) by
\[ \Sigma \approx \sum_{i=1}^{k} \lambda_i e_i e_i'. \]
Again, this will become more useful when we talk about factor analysis.
Note that we defined the total variation of X as the trace of the variance-covariance
matrix, or, if you like, the sum of the variances of the individual variables. This is
also equal to the sum of the eigenvalues:
\[ \operatorname{trace}(\Sigma) = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_p^2
   = \lambda_1 + \lambda_2 + \dots + \lambda_p. \]
This gives us an interpretation of the components in terms of the amount of the
full variation explained by each component. The proportion of variation explained
by the ith principal component is defined to be the eigenvalue for
that component divided by the sum of the eigenvalues. In other words, the ith
principal component explains the following proportion of the total variation:
\[ \frac{\lambda_i}{\lambda_1 + \lambda_2 + \dots + \lambda_p}. \]
Procedure:
Compute the eigenvalues \( \hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \dots \ge \hat{\lambda}_p \)
of the sample variance-covariance matrix S, and the corresponding eigenvectors
\( \hat{e}_1, \hat{e}_2, \dots, \hat{e}_p \); then define our estimated principal components
using the eigenvectors as our coefficients:
\[ \hat{Y}_i = \hat{e}_{i1}X_1 + \hat{e}_{i2}X_2 + \dots + \hat{e}_{ip}X_p,
   \quad i = 1, \dots, p. \]
Generally, we only retain the first k principal components. Here we must balance two
conflicting desires:
1. To obtain the simplest possible interpretation, we want k to be as small as
possible. If we can explain most of the variation with just two principal components,
then this gives us a much simpler description of the data. However, the smaller k is, the
smaller the amount of variation explained by the first k components.
2. To avoid loss of information, we want the proportion of variation explained
by the first k principal components to be large, ideally as close to one as possible;
i.e., we want
\[ \frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_p}
   \approx 1. \]
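The procedure above can be illustrated directly in R by taking the eigen-decomposition of the
sample covariance matrix and computing the proportion of variation explained; the matrix X
below is an assumed numeric matrix of transaction variables, and the 70% cut-off mirrors the
retention rule applied later in Chapter Four.

S      <- cov(X)                   # sample variance-covariance matrix (X assumed numeric)
eig    <- eigen(S)                 # eigenvalues (sorted decreasing) and eigenvectors
lambda <- eig$values

prop_var <- lambda / sum(lambda)   # proportion of total variation per component
cum_var  <- cumsum(prop_var)       # cumulative proportion

k <- which(cum_var >= 0.70)[1]     # retain the first k components explaining at least 70%

# estimated principal component scores: centred data multiplied by the eigenvectors
scores <- scale(X, center = TRUE, scale = FALSE) %*% eig$vectors[, 1:k]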
3.3.3 Standardize the Variables
According to Baxter, R., and Hawkins, S. (2002), if raw data is used, principal
component analysis will tend to give more emphasis to those variables that have
higher variances than to those variables that have very low variances. In effect the
results of the analysis will depend on the units of measurement used to measure
each variable. This implies that a principal component analysis should only be
used with the raw data if all variables have the same units of measure, and even in
this case only if we wish to give those variables with higher variances more
weight in the analysis.
Summary
The results of principal component analysis depend on the scales at which the
variables are measured. Variables with the highest sample variances will tend to be
emphasised in the first few principal components.
Principal component analysis using the covariance function should only be
considered if all of the variables have the same units of measurement.
If the variables either have different units of measurement (i.e., pounds, feet, gallons,
etc.), or if we wish each variable to receive equal weight in the analysis, then the
variables should be standardised before a principal components analysis is carried
out. We standardise each variable by subtracting its mean and dividing by its
standard deviation:
\[ Z_{ij} = \frac{X_{ij} - \bar{x}_j}{s_j}, \]
where \( \bar{x}_j \) and \( s_j \) are the sample mean and sample standard deviation of
variable j. We then let \( \hat{\lambda}_1 \ge \dots \ge \hat{\lambda}_p \) denote the
eigenvalues of the sample correlation matrix R, and \( \hat{e}_1, \dots, \hat{e}_p \) the
corresponding eigenvectors.
The estimated principal component scores are then calculated using formulas similar
to those before, but using the standardised data instead of the raw data:
\[ \hat{Y}_i = \hat{e}_{i1}Z_1 + \hat{e}_{i2}Z_2 + \dots + \hat{e}_{ip}Z_p. \]
3.3.4 Measures of Distance and Association
A natural measure of distance between two subjects with measurements
\( x = (x_1, \dots, x_p) \) and \( y = (y_1, \dots, y_p) \) is the Euclidean distance
\[ d(x, y) = \sqrt{\sum_{j=1}^{p} (x_j - y_j)^2}, \]
that is, the square root of the sum of the squared differences between the measurements
for each variable.
Some other distances use a similar concept. For instance, the Minkowski distance is
\[ d(x, y) = \left( \sum_{j=1}^{p} |x_j - y_j|^m \right)^{1/m}. \]
Here the square is replaced by raising the difference to a power of m and, instead of
taking the square root, we take the mth root.
Here are two other methods for measuring association:
Canberra Metric
Czekanowski Coefficient
For each of these distance measures, the smaller the distance, the more similar (more
strongly associated) are the two subjects.
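These measures can be computed directly in R with the built-in dist() function, which supports
the Euclidean, Minkowski and Canberra distances discussed above; the two measurement vectors
below are made up purely for illustration.

x <- rbind(subject1 = c(2.0, 5.5, 1.3),
           subject2 = c(3.1, 4.8, 0.9))      # hypothetical measurements on two subjects

dist(x, method = "euclidean")                # square root of the sum of squared differences
dist(x, method = "minkowski", p = 3)         # m-th power / m-th root version (here m = 3)
dist(x, method = "canberra")                 # Canberra metric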
For a measure to serve as a valid distance metric, it must satisfy the following properties:
1. Symmetry: \( d(P, Q) = d(Q, P) \);
i.e., the distance between subject one and subject two must be the same as the distance
between subject two and subject one.
2. Positivity: \( d(P, Q) \ge 0 \);
i.e., the distances must be non-negative, negative distances are not allowed!
3. Identity: \( d(P, P) = 0 \);
i.e., the distance between a subject and itself should be zero.
4. Triangle inequality: \( d(P, Q) \le d(P, R) + d(R, Q) \).
This follows from geometric considerations, where we learnt that the sum of two sides of a
triangle cannot be smaller than the third side.
3.3.5. Agglomerative Hierarchical Clustering
Combining Clusters in the Agglomerative Approach
In the agglomerative hierarchical approach, we start by defining each data point to be a
cluster and combine existing clusters at each step. Bates, S., and Saker, H., (2006)
1. Single Linkage: In single linkage, we define the distance between two clusters to be
the minimum distance between any single data point in the first cluster and any
single data point in the second cluster. On the basis of this definition of distance
between clusters, at each stage of the process we combine the two clusters that have
the smallest single linkage distance.
2. Complete Linkage: In complete linkage, we define the distance between two clusters
to be the maximum distance between any single data point in the first cluster and any
single data point in the second cluster. On the basis of this definition of distance
between clusters, at each stage of the process we combine the two clusters that have
the smallest complete linkage distance.
3. Average Linkage: In average linkage, we define the distance between two clusters
to be the average distance between data points in the first cluster and data points in
the second cluster. On the basis of this definition of distance between clusters, at
each stage of the process we combine the two clusters that have the smallest
average linkage distance.
4. Centroid Method: In centroid method, the distance between two clusters is the
distance between the two mean vectors of the clusters. At each stage of the process
we combine the two clusters that have the smallest centroid distance.
5. Ward’s Method: This method does not directly define a measure of distance between
two points or clusters; it is an ANOVA-based approach. At each stage, the two
clusters that merge are those which provide the smallest increase in the combined error sum of
squares from one-way univariate ANOVAs that can be done for each variable, with
groups defined by the clusters at that stage of the process.
According to, Pinheiro, R., and Bates, S., (2000), none of these methods is uniformly
the best. In practice, it’s advisable to try several methods and then compare the results
to form an overall judgment about the final formation of clusters.
Notationally, the average linkage distance is defined as the average of the distances between
all pairs of points, one from each cluster; this is also called the Unweighted Pair Group Method
with Arithmetic Mean (UPGMA). The centroid method, in turn, involves finding the mean
vector location for each of the clusters and taking the distance between these two centroids.
(Vesanto, J., & Alhoniemi, E., 2000)
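As a brief R sketch, the linkage methods described above can all be tried on the same distance
matrix with hclust(); the data matrix X is an assumed numeric transaction matrix, and in line
with the advice above several methods are compared rather than relying on a single one.

d <- dist(scale(X))                            # distances between transactions (X assumed)

hc_single   <- hclust(d, method = "single")    # single linkage
hc_complete <- hclust(d, method = "complete")  # complete linkage
hc_average  <- hclust(d, method = "average")   # average linkage (UPGMA)
hc_centroid <- hclust(d, method = "centroid")  # centroid method
hc_ward     <- hclust(d, method = "ward.D2")   # Ward's method

# compare the memberships obtained by cutting two of the trees into 10 groups
table(cutree(hc_average, k = 10), cutree(hc_ward, k = 10))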
Let us assume that each feature \( x_j \) is normally distributed, \( x_j \sim N(\mu_j, \sigma_j^2) \),
such that the joint probability density function of a transaction \( x = (x_1, \dots, x_n) \) is
given by:
\[ p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)
       = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}
         \exp\!\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right). \]
A transaction is flagged as an anomaly if \( p(x) < \varepsilon \) and treated as normal otherwise.
When developing the learning algorithm, that is, choosing features, setting the threshold
\( \varepsilon \), and so on, decision making is much easier if we have a way of evaluating our
learning algorithm. The available data is therefore split into a training set, a cross-validation
set and a test set.
Step 4. Predict: for each observation in the cross-validation or test set, predict an anomaly
whenever \( p(x) < \varepsilon \); performance is then assessed with the possible evaluation
metrics (Ref: 3.2.3).
3.4.7 Given the training, cross-validation and test sets, the algorithm evaluation is computed as
described above.
3.4.8 Suggestions and Guidelines on how to Design or Choose Features for Anomaly
Detection Algorithms:
1. Plot the histogram of the assumed features from the available data set, to confirm
whether each is normally distributed.
2. If normal, then fit the algorithm model; else, transform by taking the log or any other
appropriate function and check again whether the histogram plot validates the normality
assumption.
3. Define the new feature as the new X and replace the previous variable X with it.
4. Then fit the anomaly detection algorithm as stated earlier (a short R sketch of these steps
follows below).
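A minimal sketch of these guidelines in R, assuming x is a skewed numeric feature (for example
the transaction amount); the threshold epsilon is an arbitrary illustrative value, not a tuned one.

hist(x, breaks = 50, main = "Raw feature")              # 1. inspect the raw histogram

x_new <- log(x + 1)                                     # 2. transform the skewed feature
hist(x_new, breaks = 50, main = "Transformed feature")  #    and re-check normality

mu    <- mean(x_new)                                    # 3./4. fit the Gaussian model
sigma <- sd(x_new)                                      #       on the new feature
p     <- dnorm(x_new, mean = mu, sd = sigma)            # estimated density of each observation

epsilon   <- 0.01                                       # illustrative threshold
anomalies <- which(p < epsilon)                         # flag observations with very low density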
3.5.7 Data Pre-Processing for Fraud Detection
The deployment of an unsupervised K-Means clustering algorithm could be too demanding and
unrealistic, based on the mathematical and algorithmic steps and procedures suggested in the
various literature reviews and research works, even for an expert R user.
Consequently, I sourced a graphical user interface package, rattle, for easy
manipulation and implementation, based on the guidelines and suggestions in Williams,
Graham, Data Mining with Rattle and R (Springer).
Now, the problem at hand contains a large number of records with no prior known features that
can be used for classification. Clustering the data into different groups and trying to
understand the behaviour of each group is suggested as a methodology for modelling the user
behavioural pattern of the transaction data sets. Thus, I explore the dtrans_data and
Aggdtrans_data sets with R/Rattle to validate the legitimate user behavioural model. The
algorithm chosen for clustering the transaction data is the K-means algorithm, and the tools for
the implementation are R and Rattle. The following sections present the algorithm used for
clustering and the tools used for implementing the solution.
K-MEANS is the simplest algorithm used for clustering and is an unsupervised clustering
algorithm. This algorithm partitions the data set into k clusters using the cluster mean value,
so that the resulting clusters have high intra-cluster similarity and low inter-cluster similarity.
K-Means is iterative in nature and follows these steps:
1. Arbitrarily generate k points (cluster centres), k being the number of clusters desired.
2. Calculate the distance between each of the data points and each of the centres, and
assign each point to the closest centre.
3. Calculate the new cluster centre by calculating the mean value of all data points in the
respective cluster.
4. With the new centres, repeat step 2. If the assignment of clusters for the data points
changes, repeat step 3; else stop the process.
The distance between the data points is calculated using the Euclidean distance. The Euclidean
distance between two points or feature vectors
X1 = (x11, x12, ..., x1m) and X2 = (x21, x22, ..., x2m) is
\[ d(X_1, X_2) = \sqrt{\sum_{j=1}^{m} (x_{1j} - x_{2j})^2}. \]
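A minimal sketch of these steps using R's built-in kmeans() function, which iterates between the
assignment and centre-update steps described above. The object dtrans_numeric (the numeric
fields of dtrans_data) is an assumption, and k = 10 mirrors the analysis reported in Chapter Four.

scaled_trans <- scale(dtrans_numeric)       # standardised numeric transaction fields (assumed)

set.seed(42)
km <- kmeans(scaled_trans, centers = 10, nstart = 25, iter.max = 100)

km$centers    # cluster centres (mean vectors of each cluster)
km$size       # number of transactions in each cluster
km$withinss   # within-cluster sum of squares for each cluster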
Advantages
Gives the best results when the data set clusters are distinct or well separated from each other.
Disadvantages
The learning algorithm requires a priori specification of the number of cluster centres.
The learning algorithm may converge only to a local optimum of the squared error function.
Applicable only when the mean is defined, i.e. it fails for categorical data.
Data reduction techniques can be applied to obtain a reduced representation of the data set
that is much smaller in volume, yet closely maintains the integrity of the original data. That
is, mining on the reduced data set should be more efficient yet produce the same (or almost
the same) analytical results. Strategies for data reduction include the following:
Data aggregation, where aggregation operations are applied to the data in the construction of
optimal data variables and features for the analysis (Bruker Daltonics)
Dimensionality reduction, where encoding mechanisms are used to reduce the dataset size
Numerosity reduction, where the data are replaced or estimated by alternative, smaller data
representations such as parametric models (which need store only the model parameters
instead of the actual data) or nonparametric methods such as clustering, sampling, and the use
of histograms.
Discretization and concept hierarchy generation: where raw data values for attributes are
replaced by ranges or higher conceptual levels. Data discretization is a form of numerosity
reduction that is very useful for the automatic generation of concept hierarchies.
Discretization and concept hierarchy generation are powerful tools for data mining, in that
they allow the mining of data at multiple levels of abstraction.
From the above data reduction strategies, the attribute subset selection strategy has been
selected for the data cleaning and transformation step in Rattle's typical workflow.
CHAPTER FOUR
4.0 RESULTS AND ANALYSIS
In this chapter, I present the results of the experimental deployment and practical evaluation
of K-Means Cluster Analysis and Principal Component Analysis: the procedures for
computing the presence of outliers using various distance measures, the general detection
performance for unsupervised machine learning, and how to design and choose features and
carry out electronic transaction fraud detection.
The response variable was initially unsuitable for the proposed model since it was highly
skewed; I therefore needed to transform the transactionNairaAmount. The histogram of
the transformed variable, with a superimposed normality curve, is displayed above.
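A hedged sketch of this transformation step, assuming dtrans_data contains the
transactionNairaAmount field named in the text; the log transform is one common choice for a
highly right-skewed amount variable.

dtrans_data$logAmount <- log(dtrans_data$transactionNairaAmount + 1)   # reduce skewness

# histogram of the transformed variable with a superimposed normal curve
hist(dtrans_data$logAmount, breaks = 40, freq = FALSE,
     main = "Transformed transactionNairaAmount")
curve(dnorm(x, mean = mean(dtrans_data$logAmount), sd = sd(dtrans_data$logAmount)),
      add = TRUE)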
4.2 Iteration 1: Applying K-Means Cluster Analysis to the dtrans_data:
Cluster centres:
The cluster centre table above summarises the measures of association or linkage between
two clusters. This involves finding the mean vector location for each of the clusters and taking
the distance between the two centroids. First, the initial cluster centroids are randomly
selected from the four variables: the first row gives the initial cluster centres, and the procedure
then works iteratively. The within sum of squares table summarises the nearest neighbours
between two distinct clusters based on the initial table, the cluster centre table. For instance,
from the table above, it seems that cluster 3 is in the middle, because seven (7) of the clusters
(1, 2, 4, 6, 8, 9 and 10) are closest to cluster 3 and not to any other cluster.
Implication: The principal purpose is to look at the cluster means for the significance of the
explanatory transaction variables identified based on the cluster centres. We can see from row
3 of the cluster centre table that transactionTime has the highest cluster centre value,
followed by localHour, and so on. Besides, from the tables above it is now clear that
cluster 3 is the nearest neighbour to cluster 10, based on the best explanatory cluster variable
values (0.8541757 against 0.8846654, and 0.7489533 against 0.7172179). Furthermore, the
graphical display of the score plot in the later analysis will validate this more explicitly.
After the clusters have been built, the discriminant plot is displayed as shown below:
The discriminant coordinate figure above gives a visual representation of the cluster
sizes: ten clusters altogether, as previously explained, which account for 53.69% of the
point variability. Cluster sizes vary, with 426 being the size of the smallest cluster and 1133
the size of the biggest cluster. See the List of Figures, 4.4.2, for the remaining cluster sizes.
(Vesanto, J., & Alhoniemi, E., 2000)
4.4 The result of Iteration 2 on the transformed dtrans_data output is displayed:
Data means:
The data means table: Recall that the principal purpose is to look at the
cluster means for the significance of the best explanatory transaction variables identified based
on the cluster centres. This involves finding the mean vector location for each of the clusters
and taking the distance between the two centroids. Since the distance between two clusters is
the distance between the two mean vectors of the clusters, from the data means table we can
see that transactionTime and localHour have the shortest mean distance apart, with data
means values of 0.5011834 and 0.5020773, respectively. Generally, according to Vesanto, J.,
& Alhoniemi, E., at each stage we combine the two clusters that have the smallest centroid
distance.
Implication: This further confirms, from the tables above, that transactionTime and
localHour are key explanatory variables.
Cluster centres:
Implication: Since the principal purpose is to look at the cluster means for the significance of
the explanatory transaction variables identified based on the cluster centres, we can see from
row 3 of the cluster centre table that transactionTime has the highest cluster centre value,
followed by localHour, and so on. Besides, from the tables above it is now clear that
cluster 3 is the nearest neighbour to cluster 10, based on the best explanatory cluster variable
values (0.8967377 against 0.8836406, and 0.7156922 against 0.6828752). Similarly, the
graphical display of the score plot in the later analysis will validate this more explicitly.
PCA is a tool for reducing multidimensional data to lower dimensions while retaining most
of the information. Since PCA is a transformation of the old coordinate system (peaks)
into a new coordinate system (PCs), it can be estimated how much each of the old
coordinates (peaks) contributes to each of the new ones (PCs). These values are called
loadings. The higher the loading of a particular peak onto a PC, the more it contributes to that
PC. (Vesanto, J., & Alhoniemi, E., 2000)
4.7 Finalising the Desired Variables using Principal Component Analysis (PCA)
Note that principal components are calculated on only the numeric variables, so we
cannot use this approach to remove categorical variables from consideration. Any numeric
variables with relatively large rotation values (negative or positive) in any of the first few
components are generally variables that I may wish to include in the modelling. (List of
Figures, 4.3.2.) The explanation of the next three (3) tables is more constructive and
consequential when they are considered in view of one another. (D. L. Massart and Y. Vander
Heyden)
Standard deviations:
Rotation:
Interpretations:
The loadings for the principal components are represented in the Rotation table. This contains a
matrix with the loadings of each principal component, where the first column in the matrix
contains the loadings for the first principal component, the second column contains the loadings
for the second principal component, and so on. Now, from the Rotation table
above, the first principal component (PC1) has the highest (in absolute value) loading for
transactionTime. Similarly, the loadings for transactionDate and transactionTime are
negative, while that of localHour is positive, in view of the transactionNairaAmount.
Consequently, the implication of the first principal component is that transactionTime
contributes most to PC1, which gives the direction of the highest variance; similarly, PC1
represents a contrast between the explanatory variables (transactionDate and
transactionTime against localHour) in relation to the response variable, the
transactionNairaAmount. The second principal component, PC2, has the highest
loadings for transactionDate and localHour; thus, the contrast there is mainly between
transactionDate and localHour.
Implication: The original variables are represented in the PC1 and PC2 dimension spaces, as
will be explicitly demonstrated and confirmed in the score plot of the PCs in the later
analysis. PC1 represents the resultant of all values projected on the x-axis and is
dominated by transactionTime and, to a lesser extent, by localHour. In contrast, the y-axis
(PC2) is defined by the transactionNairaAmount and is dominated by transactionDate and, to
a lesser extent, by localHour. Consequently, the transactions would be ranked according to PC1,
with the highest-scoring explanatory variables probably being the best, at least in terms of
transactionTime and localHour.
4.8 Determining the Number of Components to Retain:
In practice, one rule is to retain components that individually account for at least 5% to 10% of
the total variance. Looking under "Importance of components" in the output, the row tagged
Proportion of Variance shows values greater than 10% in the PC1, PC2, PC3 and PC4 columns,
approximately 40%, 21%, 20% and 19% respectively.
Similarly, another rule is to retain components that in combination account for at least 70% of
the cumulative proportion. Looking under "Importance of components" in the output, the row
tagged Cumulative Proportion exceeds 70% by PC3, at approximately 80.79%, and reaches
approximately 100% if PC4 were to be included.
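These retention rules can be read directly from the summary of a fitted PCA object in R; the
object name pca_fit and the input matrix scaled_trans are assumptions, while the proportions
quoted above come from the actual output.

pca_fit <- prcomp(scaled_trans)      # scaled_trans: assumed standardised numeric matrix
summary(pca_fit)                     # prints the "Importance of components" table

imp <- summary(pca_fit)$importance
which(imp["Proportion of Variance", ] >= 0.10)      # components explaining at least 10% each
which(imp["Cumulative Proportion", ] >= 0.70)[1]    # first component reaching 70% cumulatively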
4.8.1 The Loading Plot below reveals the relationships between variables in the space of the
first two components. In the loading plot, we can see that transactionTime and localHour
have similarly heavy loadings for PC1 and PC2, while the others have heavy loadings for PC3
and PC4.
Now, the principal component variables can be expressed as linear combinations of the original
variables; the eigenvectors table above provides the coefficients for these equations.
I will only express PC1 and PC2 as linear combinations of the original variables, because
these two constitute the sets or combinations of predictor and response variable scores that
contributed the most information in the analysed data sets. (D. L. Massart and Y. Vander
Heyden)
Note that the principal component variables now represent the aggregation of the selected variables that are ultimately included in the final model and used to implement the electronic transaction fraud detection techniques. The primary multidimensional data set has thereby been reduced to a lower dimension while still retaining most of the information. The subsequent analysis of the score plot of the explanatory variables against the response variable will help validate the previous findings and give a better understanding of the principal component variables.
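As a quick sanity check of the linear-combination expressions above, the scores returned by prcomp can be reproduced by multiplying the scaled data by the loadings; num_vars and pca_model are assumed from the earlier sketches.

# Manually form the linear combinations and compare with prcomp's scores.
scaled_vars   <- scale(num_vars, center = TRUE, scale = TRUE)
manual_scores <- scaled_vars %*% pca_model$rotation
all.equal(unname(manual_scores), unname(pca_model$x))   # expected TRUE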
4.8.3 The score plot of the explanatory variables against the response variable:
The score plot of the explanatory variables against the response variable is displayed for visualising the relationship between the variables in the space of the principal components.
The interpretation of the axes comes from the analysis of this figure. The original variables are represented in the PC1 and PC2 dimensional spaces. PC1 can be interpreted as the resultant of all the values projected on the x-axis; the longer a projected vector is, the more important its contribution to that dimension. The origin of the new coordinate system is located at the centre of the data set. The first principal component, PC1, points in the direction of the highest variance and is dominated by transactionTime. In contrast, the y-axis (PC2) points in the direction of the second highest variance and is defined by the transactionNairaAmount, dominated by transactionDate and, to a lesser extent, by localHour, while the two axes remain perpendicular (D.L. Massart and Y. Vander Heyden).
The implication is that the transactions can be ranked according to PC1, with the highest-scoring observations being the most informative, at least in terms of transactionTime and localHour.
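A score plot of this kind could be produced, for instance, as follows (again a sketch, not necessarily how the figure above was generated):

# Project every transaction into the space of the first two components.
scores <- pca_model$x
plot(scores[, "PC1"], scores[, "PC2"],
     xlab = "PC1 (highest variance)", ylab = "PC2 (second highest variance)",
     pch = ".", col = "steelblue")
abline(h = 0, v = 0, lty = 2)

# biplot() overlays the variable loadings on the same scores, giving a
# combined view of observations and variables.
biplot(pca_model, scale = 0, cex = 0.5)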
This is the dataset overview for dtrans_data before applying IDEA (Interactive Data Exploration Analysis).
We can see that transactionNairaAmount, transactionTime and localHour have been dropped in favour of the corresponding PC variables. Points can now be identified by the unique identifier, labelID, and linked with brushing across multiple plots, in order to check for deviations from the conventional transaction behavioral model.
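One possible way of assembling such a reduced dataset in R is sketched below; the name dtrans_data follows the text, while the exact construction is an assumption. It keeps the unique identifier alongside the first two component scores.

# Bind the labelID to the retained principal component scores.
dtrans_data <- data.frame(labelID = trans_data$labelID,
                          pca_model$x[, c("PC1", "PC2")])
head(dtrans_data)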
In view of the research objectives, I have been able to explore several detection techniques for unsupervised machine learning, such as K-Means cluster analysis and principal component analysis. The analyses above have also helped in understanding user transaction behaviour patterns, identifying transactionTime and localHour as the two major explanatory attributes and key factors that can be worked with in e-banking fraud detection, and determining the threshold for identifying the relationship. Not only can we identify the direction of the slope of the relationship between the principal component variables, we can equally identify the strength of the relationship, that is, the degree of the slope. I shall now proceed to the final stage: computing the presence of outliers using the various distance measures and assessing the general detection performance based on the previous analysis.
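As one illustrative example of a distance measure applied at this stage (the study's exact measure may differ), the Mahalanobis distance of each transaction from the centre of the retained component scores can be used to rank candidate outliers:

# Distance of every transaction from the centre of the PC1-PC2 scores.
pc_scores <- pca_model$x[, c("PC1", "PC2")]
md <- mahalanobis(pc_scores, center = colMeans(pc_scores),
                  cov = cov(pc_scores))

# Flag transactions whose distance exceeds the 99.9% chi-square cutoff.
suspects <- dtrans_data$labelID[md > qchisq(0.999, df = ncol(pc_scores))]
head(suspects)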
Identification of Outliers:
4.9.4 3D Plot of the best explanatory variables and the Response Variable.
Outliers are observations which deviate so much from the other observations as to arouse suspicion that they were generated by a different mechanism (Abe, N., Zadrozny, B., and Langford, J.).
An inspection of the 3D plots displayed above shows how transactionNairaAmount varies with transactionDate and transactionTime. The datetime is subdivided into six transaction time periods or categories, namely April, May, June, July, August and September, since the original data set features transactions from April to September, that is, between 2015-04-02 01:44:50 and 2015-09-30 23:06:54 [Ref: 3.1.4].
The interactive graphic of these three variable components helps in viewing how transactionNairaAmount varies in time, or more precisely with respect to transactionDate and localHour. The following are the user account identities that deviate from the behavioral pattern according to the model, as shown above: LabelID = [641B6A70B816], [AB77E701417E], [C03089119C16], [AA39724E34AD], [973114BAAC2A], [91C33507469F].
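A 3D view of these three variables could be reproduced, for example, with the scatterplot3d package; this is a hedged sketch rather than the original plotting code, and it assumes trans_data as before.

# Three-dimensional scatter of date, hour and amount.
library(scatterplot3d)
with(trans_data,
     scatterplot3d(transactionDate, localHour, transactionNairaAmount,
                   xlab = "transactionDate", ylab = "localHour",
                   zlab = "transactionNairaAmount",
                   pch = 16, color = "steelblue", angle = 45))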
Scatterplot of transactionNairaAmount against transactionTime and localHour
The plot features the response variable, transactionNairaAmount, against the explanatory variables transactionDate, transactionTime and localHour respectively, validating the labelIDs listed above: [641B6A70B816], [AB77E701417E], [C03089119C16], [AA39724E34AD], [973114BAAC2A], [91C33507469F], in confirmation of the 3D plots demonstrated previously.
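The three panels described above could be drawn as follows; trans_data and the flagged labelIDs are taken from the preceding analysis, while the red highlighting is an added illustration.

# Highlight the suspected accounts in red across the three panels.
flagged <- c("641B6A70B816", "AB77E701417E", "C03089119C16",
             "AA39724E34AD", "973114BAAC2A", "91C33507469F")
cols <- ifelse(trans_data$labelID %in% flagged, "red", "grey40")

par(mfrow = c(1, 3))
plot(trans_data$transactionDate, trans_data$transactionNairaAmount,
     col = cols, pch = 16, xlab = "transactionDate", ylab = "Naira amount")
plot(trans_data$transactionTime, trans_data$transactionNairaAmount,
     col = cols, pch = 16, xlab = "transactionTime", ylab = "Naira amount")
plot(trans_data$localHour, trans_data$transactionNairaAmount,
     col = cols, pch = 16, xlab = "localHour", ylab = "Naira amount")
par(mfrow = c(1, 1))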
CHAPTER FIVE
SUMMARY OF FINDINGS, CONCLUSION AND RECOMMENDATION
5.1.2 The experimental research findings and outputs are summarised below:
Red-flag labelIDs: the following are the user account identities that deviate from the behavioral pattern based on the model.
At least 6 out of the 8,430 transaction records retained for analysis (of the 8,641 observations in the original dataset) are suspected and predicted to be fraudulent transactions:
LabelID = [641B6A70B816], [AB77E701417E], [C03089119C16], [AA39724E34AD], [973114BAAC2A], [91C33507469F]. See Figure 5.1.2 for the detailed reference listing.
5.2.0 Conclusion:
This research deals with the procedures for computing the presence of outliers using various distance measures. As a general detection performance result, I can conclude that nearest-neighbor based algorithms perform better than clustering algorithms in most cases on small data sets. The stability with respect to an imperfect choice of k is also much higher for the nearest-neighbor based methods. The higher variance of the clustering-based algorithms is very likely due to the non-deterministic nature of the underlying k-means clustering algorithm.
Despite this disadvantage, clustering-based algorithms have a lower computation time. In conclusion, I recommend preferring nearest-neighbor based algorithms when computation time is not an issue. If a faster computation is required for large data sets, such as the unlabelled data set used in this research work, or in a near real-time setting, clustering-based anomaly detection is, I observed, the method of choice.
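For illustration only, the two families of detectors compared above can be sketched side by side on the retained component scores: a k-nearest-neighbor distance score (using the FNN package) and a k-means distance-to-centroid score. This is not the study's exact implementation, and pca_model is assumed from the earlier analysis.

library(FNN)   # provides get.knn()

pc_scores <- pca_model$x[, c("PC1", "PC2")]
k <- 10

# Nearest-neighbor score: mean distance to the k nearest transactions.
knn_score <- rowMeans(get.knn(pc_scores, k = k)$nn.dist)

# Clustering score: distance of each transaction to its assigned centroid.
km           <- kmeans(pc_scores, centers = 8, nstart = 10)
kmeans_score <- sqrt(rowSums((pc_scores - km$centers[km$cluster, ])^2))

# Rank transactions by either score; the largest values are the candidates.
head(order(knn_score, decreasing = TRUE))
head(order(kmeans_score, decreasing = TRUE))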
Besides supporting the unsupervised anomaly detection research community, I also
believe that the study and its implementation are useful for researchers from
neighboring fields.
5.3.0 Recommendation:
On completion of the underlying system, I can conclude that the integrated-technique system provides far better performance efficiency than a singular system using k-means alone for outlier detection. Since the main focus is on finding fraudulent records in a credit-card transaction dataset, efficiency is measured by how reliably outliers, or unusual behavioral user patterns, are detected. For this purpose, the technique combines clustering-based detection with a k-nearest-neighbor algorithm to improve anomaly detection efficiency. The final product is therefore a system that efficiently detects unusual behavioral patterns.
List of Figures
Figure 3.1.1: "604 531 426 625 408 512 507 571 1133 584"
Figure 5.1.1: The first and last ten (10) rows of the primary transaction dataset, with seven (7) data field variables and 8,430 observations.
Figure 5.1.2: At least 6 out of the 8,430 transaction records are suspected and predicted to be fraudulent transactions: LabelID = [641B6A70B816], [AB77E701417E], [C03089119C16], [AA39724E34AD], [973114BAAC2A], [91C33507469F]. The R programming output snapshot is displayed below for detailed reference.
Figure 5.1.2