
Review Paper on Big Data Analytics in Banking Sectors

1. Sandeep S    2. Mr. A. V. Allin Geo

1. Student, Department of CSE, BIST, BIHER, Chennai.
2. Assistant Professor, Department of CSE, BIST, BIHER, Chennai.
1. sandeep.naidu9640@gmail.com    2. Allinegeo@gmail.com

Abstract:

Big Data analytics is the process of examining very large volumes of data. Data in the
banking sector has been growing rapidly for decades, and banks maintain huge amounts of
data every day. This data can be used to reveal the secrets of money movements, yet most
banks have failed to exploit the information held within their own relational databases
(RDBMSs). Big Data has the characteristics of volume, variety and velocity, and these
characteristics strengthen risk management and enable fraud detection. Big Data addresses
the new technological trends in the banking sector and solves its issues in an effective way;
this is the main objective of Big Data analytics in the banking sector.

Introduction

Big Data is about storing and analysing data in order to make sense of it for organizations
such as banks and educational institutions. For an application containing a limited amount
of data we normally use SQL or PostgreSQL. These are adequate for small applications, but
for large applications like Facebook, Google or YouTube the data is so large and complex
that no traditional database management system is able to store and process it. Big Data is
used to analyse data and improve the business of institutions. Data is a collection of
information, and it is divided into two categories.

Categories of Big Data:

1. Structured Data:

Data which can be stored and processed in a table, i.e. in the form of rows and columns, is
called structured data. This data is relatively simple to store, enter and analyse.

2. Unstructured Data:

Data whose form or structure is unknown is called unstructured data. Examples of
unstructured data are images, videos, customer service interactions, web pages, PDF files,
presentations, social media data, etc. Facebook generates more than 500 terabytes of data per
day as people upload images, videos, posts, advertisements and so on.

3V’s of Big Data:

1. Volume: The amount of data, which reaches sizes of terabytes, petabytes and exabytes.
Volume refers to maintaining very large datasets.

2. Variety: Data comes in all types of formats (text, audio, image, video), much of it
unstructured, i.e. data whose form and structure are unknown.

3. Velocity: Data is generated at a very fast rate; velocity measures how fast data arrives.
For time-critical applications faster processing is very important; share trading and video
sharing are examples of the velocity of Big Data.

Sources of Big Data:

Banking sectors: Data coming from banks such as RBI, ICICI, State Bank of India and other
banks.

Social media: Data coming from social media services such as Facebook likes, photos,
video uploads, comments, YouTube views and tweets.

Stock market: Stock exchanges generate huge amounts of data through daily transactions;
nowadays stock markets play an important role in Big Data.

E-commerce sites: E-commerce sites such as Flipkart, Amazon, Snapdeal and Myntra
generate huge amounts of data.

Application of Big Data:

Banking zones and fraud detection: Big Data is used heavily for fraud detection in the
banking sector. It helps find where damaging activity has taken place, detecting misuse of
credit and debit cards, and supports business clarity, public analytics for business, and IT
strategy implementation analytics.

1. Impact of Big Data on Banking Institutions and major areas of work

Banking industry experts describe Big Data as the set of tools that enables an organization to
create, manipulate and manage very large data sets in a given time frame, together with the
storage required to support that volume of data, characterized by variety, volume and speed.
Below we look at the main areas where Big Data is being used by financial institutions that
are augmenting their enterprise risk management frameworks to help improve enterprise
transparency, auditability and executive oversight of risk.
1.1 Customer centric
Customer experience closed feedback loop

Customer life event analysis

Best offer

Real-time, task-based configurations

Sentiment analysis-enabled service administration

Sentiment analysis-enabled lead/referral management

Lead quality analysis

Micro-segmentation

Customer gamification

Sentiment analysis-enabled sales forecasting

1.2 Risk management:


MIS / regulatory reporting

Disclosure reporting

Real-time key-word discussion tracking

Anti-money laundering

The following are the ways in which data analytics is being used to discover and assess
financial crime management (FCM) rules, through early identification of the relationship
between financial crime and the characteristics of a transaction, or of a series of transactions.

1.3 Transactions:

Transactions, when tracked over a period of time, tend to reveal a great deal of information
about the nature of the transaction, log analysis, trading patterns and other aspects. Banks
and other financial institutions use Big Data under this header in the following ways:

IVR analysis

B2B merchant insights

Real-time capital calculations

Log analytics

2. Use case from Banking Sector - Problem statement and available data
The bank under consideration is a bank in the Middle East. Its identity has been hidden to
keep private data from leaking out, and hence we will refer to it as XYZ Bank. It has been in
operation for the past 20 years and has had difficulty reviving its profit margins after the
2008 financial crisis. From 2011 onwards, it began collecting customer feedback in order to
understand and resolve issues with the working of the bank.
In 2013, it experienced a dip in its customer satisfaction measurement, alongside which its
customer retention also dropped. We have been given the task by the bank to perform the
following:
I. Determine the root cause of the drop in the customer satisfaction measurement.
II. Analyse the spending patterns of its cardholders (4 cardholders as a subset).
III. Analyse channel usage - debit/credit snapshots as well as payment modes (ATM, cards),
customer behaviour and product cross-selling.
For our case, the following data have been taken into consideration:
IV. Transactional data for the 4 cardholders (a set of around 5,000 records) for the period
January 2011 - June 2014, and access to 20,000 feedback records stored with a third party
responsible for collecting feedback for XYZ Bank.
2.1 Methodology
We start by analysing the customer satisfaction measurement data provided to us. This will
also enable us to understand whether the issues XYZ Bank was facing were caused by poor
service or by some other problem.
After narrowing down the issue with the help of feedback analysis, we will try to work out
why the issue occurred and propose improvements.
[Figure: monthly Service Quality and Service Speed ratings, 2011-2012, values roughly between 2 and 3.5]

We will likewise do client division and propose reasonable items which can be sold to a client,
based on their type.
3. Analysis and Inferences
3.1. Feedback Analysis
Feedback processes are important for any organization to help it understand emerging areas
of development and, if carried out on a consistent basis, they help to identify gaps in the
services rendered. XYZ Bank also started to collect feedback from its customers, both those
who visited bank branches and those who used online services.
3.1.1. Data Collection and Sample Size

The following data were gathered and accumulated over a period of 3 years and 6 months.
Customers visiting any branch of XYZ Bank were asked to rate the bank anonymously on a
scale of 1 - 5 on the parameters shown below (service quality, service speed and solution to
queries).
The analysis below is performed on the available subset of the total data, comprising
feedback from around 20,000 customers.
When we plot the data, there are some curious findings:
[Figure: CSI Value, Service Quality, Service Speed and Solution to Queries ratings plotted monthly, January 2011 - mid-2014, shown in two panels]

3.1.2 Feedback Analysis and Inference

The ratings received prior to February 2012 are fairly stable and low. Service quality, service
speed and effective addressing of queries were all rated with roughly equal weightage.
Inference - The customers rated the bank's services as average, and the bank did not take any
corrective measures during this period to improve its customer ratings.
[Figure: Service Quality and Solution to Queries ratings, January 2011 - January 2012, values roughly between 2.5 and 3.5, shown in two panels]
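For readers who want to reproduce a plot like the ones above, the following is a minimal pandas sketch under assumed file and column names (date, service_quality, service_speed, solution_to_queries); it illustrates the idea and is not the bank's actual pipeline, and the CSI value is assumed here to be the simple mean of the three ratings.

```python
# Illustrative sketch only: the CSV file name and the column names are assumptions,
# not XYZ Bank's actual schema.
import matplotlib.pyplot as plt
import pandas as pd

feedback = pd.read_csv("xyz_bank_feedback.csv", parse_dates=["date"])

# Average each 1-5 rating per calendar month.
monthly = (
    feedback.set_index("date")
    .resample("M")[["service_quality", "service_speed", "solution_to_queries"]]
    .mean()
)

# Assumption: the overall CSI is the mean of the three ratings.
monthly["csi_value"] = monthly.mean(axis=1)

monthly.plot(title="Monthly feedback ratings, Jan 2011 - May 2014")
plt.show()
```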

4. Transactional Analysis

The next section of the study will try to isolate the root cause of the drop in customer
satisfaction ratings for the bank, as well as evaluate various strategies used in analytics. As
mentioned above, the following will be the basis of this part of the study: the XYZ Bank
dataset comprises the transactional history of 4 cardholders from January 2011 to May 2014
and will be evaluated under the heads given below.
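As a small illustration of how such transactional heads might be computed, the sketch below groups an assumed transaction table (columns cardholder_id, date, amount, channel) by month and by payment channel; the schema and file name are hypothetical.

```python
# Illustrative sketch: spending patterns and channel usage for the 4 cardholders.
# The file name and columns (cardholder_id, date, amount, channel) are assumptions.
import pandas as pd

txns = pd.read_csv("xyz_bank_transactions.csv", parse_dates=["date"])

# Head II: monthly spend per cardholder.
monthly_spend = (
    txns.groupby(["cardholder_id", pd.Grouper(key="date", freq="M")])["amount"]
    .sum()
    .unstack("cardholder_id")
)

# Head III: share of transactions per payment channel (ATM, card, online, ...).
counts = txns.groupby(["cardholder_id", "channel"]).size()
channel_share = counts / counts.groupby(level="cardholder_id").transform("sum")

print(monthly_spend.tail())
print(channel_share)
```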

Big Data in banking applications:

Fraud detection: It helps banks detect and prevent internal and external fraud, as well as
reduce the associated costs. Big Data is mainly used to find mistakes and weaknesses in
banking operations (a minimal illustration follows this list).

Risk management: Banks analyse transaction data to determine risks and exposures based on
simulated market behaviour, scoring customers and prospective clients.

Contact centre efficiency optimization: It helps banks resolve customers' problems quickly
by allowing them to anticipate customers' needs ahead of time.

Customer segmentation for optimized offers: Provides a way to understand customers' needs
at a granular level, so that banks can deliver targeted offers more successfully.

Customer analysis: It helps banks retain their customers by analysing their behaviour and
finding patterns that lead to customer attrition.

Examine customer feedback: Customer opinions can be gathered in text form from various
social media sites. Once these sentiments have been gathered, they can be classified into
positive and negative, and by applying various filters they can be used to provide better
services to customers.

Detect when a customer is about to leave: As we know, the cost of acquiring new customers
is greater than that of retaining existing ones. When the bank addresses a customer's needs
by understanding the problem, attention must be given to finding a solution.
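To make the fraud-detection point above concrete, here is a toy, rule-based sketch that flags card transactions lying far outside a cardholder's usual spending. The 3-standard-deviation threshold, file name and column names are assumptions, not the method of any particular bank.

```python
# Toy fraud screen: flag transactions more than 3 standard deviations above a
# cardholder's own average spend. Threshold and column names are assumptions.
import pandas as pd

txns = pd.read_csv("xyz_bank_transactions.csv", parse_dates=["date"])

stats = (
    txns.groupby("cardholder_id")["amount"]
    .agg(avg_amount="mean", std_amount="std")
)
txns = txns.join(stats, on="cardholder_id")

# z-score of each transaction relative to that cardholder's own history.
txns["z_score"] = (txns["amount"] - txns["avg_amount"]) / txns["std_amount"]
suspicious = txns[txns["z_score"] > 3]

print(f"{len(suspicious)} transactions flagged for manual review")
```

In practice banks combine many such signals (location, merchant, transaction velocity) and feed them into supervised models, but this simple rule is enough to illustrate the idea.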

Hadoop:

Hadoop is an open-source framework with which we can analyse data more cheaply and
faster on a cluster of commodity hardware. It provides massive storage for any kind of data
together with enormous processing power. Hadoop is built around a distributed file system,
not a database; it simply uses the file system provided by Linux to store data. Hadoop has
five daemons: Name Node, Secondary Name Node, Data Node, Job Tracker and Task
Tracker. Using these daemons it operates on both structured and unstructured data. By using
Hadoop in the banking sector we can handle risk management and fraud detection.

HDFS (Hadoop Distributed File System): the Java-based, scalable file system that stores data
across multiple machines without prior organization.

MapReduce: A software programming model for processing large data sets in parallel (a
minimal Hadoop Streaming sketch is given below).
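To make the MapReduce model concrete, here is a minimal Hadoop Streaming sketch in Python that sums transaction amounts per account; the tab-separated input format (account_id, amount per line) is an assumption for illustration.

```python
# mapper.py - reads "account_id<TAB>amount" lines and re-emits them as key/value pairs.
import sys

for line in sys.stdin:
    parts = line.strip().split("\t")
    if len(parts) == 2:
        account_id, amount = parts
        print(f"{account_id}\t{amount}")
```

```python
# reducer.py - Hadoop delivers lines sorted by key, so each account's amounts arrive together.
import sys

current_account, total = None, 0.0
for line in sys.stdin:
    account_id, amount = line.strip().split("\t")
    if account_id != current_account:
        if current_account is not None:
            print(f"{current_account}\t{total}")
        current_account, total = account_id, 0.0
    total += float(amount)

if current_account is not None:
    print(f"{current_account}\t{total}")
```

Such a job would typically be submitted with the hadoop-streaming jar shipped with the cluster, passing mapper.py and reducer.py as the -mapper and -reducer arguments.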

Flexibility: We can store as much data as we want and decide how to use it later. That
includes unstructured data such as text, images and videos.
HDFS Architecture:
YARN child: a logical component of the Resource Manager which chops the data into the
standard block size of 128 MB.
YARN Scheduler: also a logical component, which schedules each block onto a particular
Data Node based on Resource Manager information.
Hadoop has five daemons: Name Node, Secondary Name Node, Data Node, Job Tracker and
Task Tracker.
Name Node: the Name Node is the central piece of HDFS. It keeps the directory tree of all
files in the file system, but it does not store the block data itself, which resides on the Data
Nodes. The Name Node maintains two important files, Edits.log and fs.image: Edits.log
stores the metadata changes and fs.image captures a checkpoint of the Edits.log information.
Secondary Name Node: the Secondary Name Node is also called the checkpoint node or
passive node. At frequent, equal intervals of time the metadata is transferred from the Name
Node to the checkpoint node.
Data Node: the Data Node is responsible for storing the actual data in the form of blocks in
HDFS. Each Data Node communicates with the Name Node constantly, every 3 seconds;
this message is called the heartbeat, and it updates the Name Node with the Data Node's
health information.
Cluster Management Calculation:
To calculate the number of Data Nodes for a given configuration, assume each Data Node
has 64 GB of RAM and 6 x 12 TB = 72 TB of HDD, and that the 2 PB of data to be stored
should occupy only about 30% of the total raw capacity:
2 PB x 100/30 = 6.7 PB; adding a 0.2 PB margin gives 6.9 PB, rounded to 7.0 PB.
7 PB x 1024 = 7168 TB; 7168 TB / 72 TB per node = approximately 100 Data Nodes.
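The same sizing arithmetic can be written out as a small script under the interpretation above (2 PB of data allowed to fill only about 30% of raw capacity, 0.2 PB of head-room, 72 TB of disk per node); the reading of the 30% figure is an assumption.

```python
# Rough Data Node count for the configuration in the worked example above.
# The interpretation of the 30% figure and the 0.2 PB margin are assumptions.
import math

data_pb = 2.0                 # data to be stored, in PB
utilisation = 0.30            # data should occupy only ~30% of raw capacity
margin_pb = 0.2               # extra head-room used in the example
hdd_per_node_tb = 6 * 12      # 6 disks x 12 TB = 72 TB per Data Node

total_pb = math.ceil(data_pb / utilisation + margin_pb)   # ~6.87 PB -> 7 PB
total_tb = total_pb * 1024                                # 7168 TB
data_nodes = math.ceil(total_tb / hdd_per_node_tb)        # ~100 Data Nodes

print(f"Approximately {data_nodes} Data Nodes required")
```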
Process of the Hadoop Cluster:
1. The Node Manager of Data Node 2 sends its information to the Resource Manager, which
in turn updates the Edits.log.
2. The Hadoop cluster (health) information includes the total size, total directories, total
files, block information such as total, over-, under- and mis-replicated blocks, the number of
Data Nodes, and the number of racks used in the cluster, as reported by the HDFS
architecture information.
3. Let us consider a file named bbc.txt of size 1 GB, which is split into 8 blocks of 128 MB
and, including replication, 24 blocks overall.
4. To write it on top of HDFS we run hadoop fs -put "bbc.txt" "user/cloudera" from an edge
node. Once the request reaches the Name Node, the Resource Manager hands it over to the
YARN child.
5. The YARN child chops the data into 128 MB blocks and sends them to the data queue in
FIFO order. From the data queue the blocks are sent towards the acknowledgement queue by
the data streamer (mapper).
6. In the acknowledgement queue the blocks are held. The acknowledgement queue is
entirely controlled by the YARN scheduler. The first (original) copy of a block is written
onto a particular Data Node under the guidance of the YARN scheduler. Once the original
block has been written successfully, its metadata is updated in the Edits.log.
7. Replica 1 is then written onto an adjacent Data Node with the help of the native C++
libraries; once the block is written successfully an acknowledgement is required. The third
replicated block is likewise written onto an adjacent Data Node. Once all replicas have been
written successfully, the Node Manager contacts the Resource Manager and the blocks'
metadata is updated.
Failover Chances of the Blocks:
Over-replicated block: if a replicated block has been written successfully but the Data Node
then fails, there is a chance of the replicated block being written again, leaving an extra copy.
Under-replicated block: if a block is almost (99%) written when the Data Node fails, the
Node Manager sends this information in the acknowledgement and the block is removed
from the acknowledgement queue, leaving fewer copies than required.

Benefits of Hadoop:

Computing power: Hadoop's distributed computing model processes Big Data quickly; the
more computing nodes we use, the more processing power we have.

Low cost: The open-source framework is free and uses commodity hardware to store large
quantities of data.

Scalability: We can easily grow the system simply by adding more nodes.
High Scale Computing Platform for Big Data Analytics:

[Figure: high-scale computing platform for Big Data analytics - structured data in RDBMS ingested via Sqoop, system/web logs via Flume, and online data streams feeding a real-time learning system; the data lands in HDFS and is processed on Hadoop with Pig, Hive and R for internal data transformation.]

Advantages of Big Data for banks

Using Big Data and technology, banks may be able to reap some of the following benefits:

1. Find out the root cause of issues and failures
2. Determine the most efficient channel for particular customers
3. Identify the most important and valuable customers
4. Prevent fraudulent behaviour
5. Analyse risk and perform risk profiling
6. Customised products and customised marketing communication
7. Optimise human resources
8. Customer retention

Literature review:

Banks are institutions within the financial industry and regional development, concerned
with activities like managing deposits and investing in capital markets, among others. The
banking system is vital for the economy, and it is a subject of great interest to researchers
across a wide range of areas, such as management science, marketing, finance and
information technology. Berger (2003) found evidence of an association between
technological progress and efficiency in banking.

The same author also points out that banks use applied statistical models built on their
financial data for different purposes, such as credit scoring and risk analysis. Financial sector
reforms allowed a rise in competition, turning bank lending into a major source of financing.
Credit risk analysis is by itself a huge area, encompassing a large variety of research
publications within banking spread over the last twelve years. Another banking-related topic
where research has been active is fraud prevention and detection, both in traditional banking
services and in the new communication channels that support e-banking services, of which
email spamming used to illegally obtain personal financial data is a particular case of
interest.
As a result of advances in information technology, almost all banking activities and
processes are automated, generating huge amounts of data. Thus, all of the topics mentioned
above are likely to benefit from Big Data solutions.

Proposed System:

By utilizing Big Data analytics in banking applications we can easily detect risks and frauds.
Using Hadoop we can correct all the issues of the banking sector and also satisfy customer
requirements. Nowadays Hadoop plays a major role in developing business scenarios. The
Spark framework is also being used for business requirements in areas such as banking and
educational organizations. Using Hadoop we can rectify all the issues of banking
applications.

Conclusion:

Big Data analytics is now being applied across different spheres of the banking sector and
helps banks deliver better services to their customers, both internal and external, while also
helping them improve their active and passive security systems. This study explored
transactional and sentiment analysis for the banking sector. We saw one of the ways in
which customer feedback is collected and used to review the working of the bank. There are
more ways in which banks and other financial institutions have started to collect
customer-related data for sentiment analysis, ranging from life events to various marketing
research channels.

References:

[1] Srivastava, U., & Gopalkrishnan, S. (2015). Impact of big data analytics on banking
sector: Learning for Indian banks. Procedia Computer Science, 50, 643-652.
[2] Pingale Murali Manish, Sheetal Kasale, Anit Dani, Simon. Banking & Big Data analytics.
IOSR Journal of Business and Management (IOSR-JBM), e-ISSN: 2278-487X, p-ISSN:
2319-7668, pp. 55-58. www.iosrjournals.org
[3] The Impact of Big Data Analytics on the Banking Industry.
[4] Chandani, A., Mehta, M., Neeraja, B., & Prakash, O. (2015). Banking on Big Data: A
Case Study, 10(5), 2066-2069. Retrieved from www.arpnjournals.com
[5] Sobolevsky, S., et al. (2014). Big Data of Bank Card Transactions as the New Proxy for
Human Mobility Patterns and Regional Delineation: The Case of Residents and Foreign
Visitors in Spain.
[6] Palmer, B. (2013). Getting the most out of big data and analytics. IBM White Paper.
Available from https://www.ibm.com/smarterplanet/global/files/sweden_none_banking_mostoutofbigdata.pdf
[7] Winter, R., Gilbert, R., & Davis, J. R. (2014). Big Data: What does it really cost?
Wintercorp. Available from http://www.wintercorp.com/tcod-report/
[8] Kathuria, A. (2016). Impact of Big Data analytics on banking sector. International
Journal of Science, Engineering and Technology Research (IJSETR), 5(11), November
2016.
[9] http://bigdata-madesimple.com/role-big-data-banking-industry/
