
Introduction:

Credit card fraud can be defined as “unauthorized account activity by a person for which the
account was not intended.” In simple terms, credit card fraud occurs when an individual
uses another individual’s credit card for personal reasons while the owner of the card and the
card issuer are unaware that the card is being used. The person using the card has no
connection with the cardholder or the issuer and has no intention of repaying the purchases
they make. When credit card fraud occurs, consumers may have trouble getting a fraudulent
charge reversed, while merchants lose the cost of the product sold, pay chargeback fees, and
risk having their merchant account closed. According to a recent survey, the rate at which
credit card fraud occurs is 12 to 15 times higher than that of physical-world fraud.
Credit card fraud detection is the process of identifying fraudulent transactions. The
technology for detecting credit card fraud is advancing at a rapid pace: rule-based systems,
neural networks, chip cards and biometrics are some of the popular techniques employed by
issuing and acquiring banks these days.
Credit card fraud detection using neural networks is based on the ‘statistical knowledge’
contained in extensive databases of historical transactions, and fraudulent ones in particular.
These neural network models are trained using examples of both legitimate and fraudulent
transactions and are able to correlate and weigh various fraud indicators (e.g. unusual
transaction amount, card history etc.) against the occurrence of fraud. The principles of
neural networks are motivated by the functions of the brain, especially pattern recognition
and associative memory. A neural network recognizes similar patterns and predicts future
values or events based on the associative memory of the patterns it has learned. Neural
networks can be created for supervised and/or unsupervised learning [3]. The user specifies
the number of hidden layers along with the number of nodes within each hidden layer.
The output layer of the neural network may contain one or several nodes depending on
the application. Recently, neural network researchers have incorporated several methods from
statistics and numerical analysis into their networks. Neural networks can learn and
summarize the internal regularities of data even without advance knowledge of the underlying
principles. According to Rumelhart (1986), neural network topologies, or architectures, are
formed by organizing nodes into layers and linking the layers of neurons with modifiable
weighted interconnections. A network can also adapt its behavior to a new environment,
evolving from the present environment to a new possible situation. Statistical methods are
sometimes unusual in practical research, even given the common advantages of neural networks
in credit card fraud detection. Neural networks are a recent technique used in different
areas because of their powerful learning and prediction capabilities.
In this project we try to apply this capability of neural networks to credit card fraud
detection. Because the back-propagation network (BPN) is the most popular learning algorithm
for training neural networks, BPN is used for training in this paper. To choose the
parameters (weights, network type, number of layers, number of nodes etc.) that play an
important role in making the neural network as accurate as possible, we use a genetic
algorithm. Using this combined Genetic Algorithm and Neural Network (GANN), we try to detect
credit card fraud successfully.

Analysis and Findings:


We have a dataset from a bank and have applied a supervised approach. This technique
aims to learn from historical information or observations to differentiate between normal and
fraudulent behavior. In the dataset, each transaction has a set of parameters or attributes,
together with a label that indicates whether the transaction is fraudulent or not. Because
the study uses supervised learning, when a new transaction with the same parameters is
carried out, the model should be able to tell with some probability whether it is fraudulent
or not. However, under the supervised approach we cannot detect a new kind of fraud that has
not been detected so far and is not present in the historical dataset. So we complement the
supervised method with an unsupervised one to develop a powerful Fraud Detection and
Prevention application.
Description of Dataset:
Since this is credit card transaction information, all the variables are encoded and Principal
Component Analysis has been carried out. To the naked eye the dataset does not make much
sense: all the parameters are simply named V1, V2 and so on. As this is sensitive
information, it would not be right for the bank to disclose it. In the dataset the
independent variables are Time, V1 to V28 and Amount. The dependent variable is Class.
• Time
Number of seconds elapsed between this transaction and the first transaction in
the dataset
• V1-V28
May be the result of a PCA dimensionality reduction to protect user identities and
sensitive features
• Amount
Transaction amount
• Class
1 for fraudulent transactions, 0 otherwise

Analysis:
str() function:
str() gives useful information about the structure of an R object. It compactly displays the
internal structure of the object and serves as a diagnostic function and an alternative to
summary(). Here its main purpose is to check whether categorical variables are treated as
factors (with levels) or not.

We see that all the variables are numeric except ‘Class’, which is read as an integer, but we
need ‘Class’ as a factor. The dependent variable can take only two values, 0 and 1. So
in the next step we convert ‘Class’ into a factor.
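A minimal sketch of this step is shown below. The data frame `toy` and its values are made up for illustration (they are not the bank's data); only the calls to str() and as.factor() correspond to what the text describes.

```r
# Hypothetical toy data standing in for the real transactions
toy <- data.frame(
  Time   = c(0, 1, 2, 4),
  V1     = c(-1.36, 1.19, -1.35, -0.97),
  Amount = c(149.62, 2.69, 378.66, 123.50),
  Class  = c(0L, 0L, 1L, 0L)
)

str(toy)                            # Class is reported as int

toy$Class <- as.factor(toy$Class)   # convert the 0/1 label into a factor
str(toy$Class)                      # now a factor with 2 levels
levels(toy$Class)                   # "0" "1"
```

After the conversion, modelling functions treat ‘Class’ as a two-level category rather than as a number.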
Splitting the Dataset:
set.seed() function
The set.seed() function in R is used to make results reproducible, i.e. it makes the random
number generator produce the same sample again and again. Any number can be assigned as the
seed value. The dataset has 284807 observations, which we have split into two separate
datasets, training and testing.

We build the model with the training dataset and use the testing dataset to see how good
the model is. We have split the entire dataset based on the ‘Class’ variable with a ratio
of 0.7. The split randomly selects 70% of the records and gives them a T (TRUE) value, and
gives the others an F (FALSE) value. Wherever it gives a T value we put the record in the
training dataset, and wherever it gives an F value we put it in the testing dataset, which
is named CV. We also checked how many observations there are in the training and testing
datasets using the nrow() function.
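The split described above can be sketched as follows. The text does not name the function used; the T/F vector resembles what sample.split() from the caTools package returns, but that is an assumption, so this sketch builds the logical vector with base R only. The data frame `toy`, the seed value 123 and the names `in_train`, `train` and `cv` are illustrative.

```r
set.seed(123)  # any number works; it only makes the random split repeatable

# Hypothetical toy data: 1000 transactions, 20 of them fraudulent
toy <- data.frame(
  Amount = round(runif(1000, 1, 500), 2),
  Class  = factor(c(rep(0, 980), rep(1, 20)))
)

# Logical vector that is TRUE for 70% of the rows within each class,
# so both datasets keep the same share of fraudulent transactions
in_train <- rep(FALSE, nrow(toy))
for (cls in levels(toy$Class)) {
  idx    <- which(toy$Class == cls)
  n_take <- round(0.7 * length(idx))
  in_train[idx[sample.int(length(idx), n_take)]] <- TRUE
}

train <- toy[in_train, ]    # rows marked TRUE
cv    <- toy[!in_train, ]   # rows marked FALSE (the testing dataset, CV)

nrow(train)  # 700
nrow(cv)     # 300
```

Splitting within each class keeps the rare fraudulent transactions represented in both datasets, which a plain random split would not guarantee.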
table() function:
Here we make a table to see the frequency of fraudulent transactions in testing dataset.

Out of the 85443 transactions in the testing dataset, only 148 are fraudulent. So the base
accuracy is about 99.83% (85295 / 85443). Even if the model does nothing and labels every
transaction as non-fraudulent, i.e. 0, it will still be about 99.83% accurate, because the
classes are heavily imbalanced: non-fraudulent transactions make up the overwhelming
majority of the 284807 records, 199364 of which are in the training dataset.
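The base accuracy can be checked with a short calculation. The sketch below rebuilds only the class labels of the testing dataset from the two counts reported above (85443 transactions, 148 fraudulent); the names `cv_class`, `counts` and `baseline_accuracy` are illustrative.

```r
# Class labels of the testing dataset, rebuilt from the reported counts
cv_class <- factor(c(rep(0, 85443 - 148), rep(1, 148)))

counts <- table(cv_class)
counts   # 85295 non-fraudulent (0), 148 fraudulent (1)

# Accuracy of a "model" that labels every transaction as non-fraudulent
baseline_accuracy <- as.numeric(counts["0"]) / sum(counts)
round(100 * baseline_accuracy, 2)   # 99.83 (99.827 before rounding)
```

Any model has to beat this figure to be useful, which is why accuracy alone is a weak measure on such imbalanced data.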
Logistic Regression:
The dependent variable is categorical, we expect 0 or 1 as the outcome, and the
independent variables are simply a set of parameters. So we run logistic regression.

‘glm’ stands for generalized linear model. We have taken all variables except Class as
independent variables. As our dependent variable is dichotomous in nature, the family is
‘binomial’. The error that we are getting is because of overfitting. The reason for this
error is the heavy class imbalance: the 199364 records used for training are almost all
non-fraudulent transactions.
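A minimal sketch of such a glm() call is given below. It uses simulated data rather than the bank's dataset: the data frame `toy`, the coefficients used to simulate it, the object names `fit`, `prob` and `pred`, and the 0.5 cut-off are all assumptions made for illustration.

```r
set.seed(1)  # reproducible toy data

# Hypothetical toy data: fraud becomes more likely for large amounts
n <- 500
toy <- data.frame(
  V1     = rnorm(n),
  Amount = runif(n, 1, 500)
)
p <- plogis(-4 + 0.01 * toy$Amount + 0.5 * toy$V1)
toy$Class <- factor(rbinom(n, 1, p))

# Class ~ . means: Class is the dependent variable,
# every other column is an independent variable
fit <- glm(Class ~ ., data = toy, family = binomial)
summary(fit)

# Predicted fraud probabilities, turned into 0/1 labels at an assumed 0.5 cut-off
prob <- predict(fit, newdata = toy, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
table(actual = toy$Class, predicted = pred)
```

In practice the model would be fitted on the training dataset and the predictions made on CV; the final table compares actual and predicted classes.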
