
BIG DATA ANALYSIS

SHUBHAM GUPTA
B.TECH COMPUTERS
BATCH: E3, ROLL NO.: E059

ABSTRACT

The need for methods other than traditional database management systems
What big data is
The factors by which big data can be characterized
The importance of big data
Big data tools
Issues and challenges faced by big data
Indexing methods
Applications
Brief details about data mining

BRIEF DESCRIPTION

DBMS and Big Data

A database management system (DBMS) is the primary and simplest method of storing data.
Data has been increasing rapidly for over a decade.
An RDBMS, being a simple system, is not efficient at this scale and collapses.
This huge pool of data is big data.

BIG DATA
Big data requires certain approaches: techniques, tools, and architecture.
The motive is to solve old and new problems in a better way.
Big data generates value from the storage and processing of very large quantities of digital information that cannot be analysed by traditional methods.

EXAMPLES

RFID readings
Transactions at Walmart
VISA transactions
Tweets
Active users on Facebook generating social interaction data, etc.

CHARACTERISTICS OF BIG DATA

Volume: data quantity
Velocity: data speed
Variety: data types

IMPORTANCE OF BIG DATA


The main motive is to take the large data, find the relevant information in it, and analyze it to find solutions that reduce time and cost.
It helps to develop new products and supports smarter decision making.
With the help of proper analytics tools we can also find the root causes of failures, issues, and defects in near real time, and optimize routes.

BIG DATA ANALYTICS


Examining large amounts of data
Extracting appropriate information
Identification of hidden patterns
Competitive advantage
Better business decisions, strategic and operational
Effective marketing, customer satisfaction, and increased revenue

BIG DATA TOOLS


NoSQL solutions: they provide a more dynamic model that is comparatively less rigid than the relational one.

Apache Cassandra: a better fit since it can handle high-velocity data easily and effectively, uses a schema that supports a larger variety of data sets, provides continuous availability, and is much cheaper than an RDBMS.

Apache Hadoop: it permits distributed processing of large datasets across clusters of systems; it handles both data storage and data processing.

BIG DATA TOOLS


SAP HANA: it helps process large amounts of data in a short period of time. As an in-memory database, HANA performs processing on data held in RAM, so we can get immediate results from user transactions and data analysis.

CHALLENGES & ISSUES


The challenges include capture, curation, storage, algorithms, search, sharing, transfer, analysis, and visualization.

Issues:
Analysis of the data and data quality
Searching the data through retrieval algorithms
Addressing data quality
Displaying meaningful results

Another major issue is security.

INDEXING METHOD: MapReduce

Map(): this function sorts the input into queues and generates intermediate keys.
Reduce(): this function is used to find occurrences or frequencies; it merges the occurrences of similar intermediate keys.
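As a sketch of how these two functions cooperate, here is a minimal single-process word-count simulation of the map, shuffle, and reduce phases. The names map_phase, shuffle, and reduce_phase are illustrative, not part of any Hadoop API:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit an intermediate (key, 1) pair for every word."""
    intermediate = []
    for doc in documents:
        for word in doc.split():
            intermediate.append((word.lower(), 1))
    return intermediate

def shuffle(intermediate):
    """Shuffle: group intermediate pairs by key before reducing."""
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: merge occurrences of each intermediate key into a count."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data needs big tools", "data tools"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"], counts["data"], counts["tools"])  # prints: 2 2 2
```

In a real MapReduce framework the shuffle is performed by the runtime across machines; here it is simulated in one process with a dictionary.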

INDEXING METHOD: Parallel Indexing

Two-level indexing
Global level: key -> node
Like a table, it should be partitioned and replicated
Problems: index partitioning, and update propagation for replicas
Local level (within a node): like a normal index
Easier with hash tables, bitmaps, and inverted files
More complex with trees or graphs
Index partitioning: replicate the top levels, partition the low levels
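A minimal sketch of the two-level scheme, under simplifying assumptions: the global level is plain hash partitioning of keys across nodes, each node's local index is an ordinary hash table, and replication and update propagation are omitted. The class TwoLevelIndex is hypothetical, not from any library:

```python
class TwoLevelIndex:
    """Toy two-level index: a global key -> node mapping over
    per-node local hash-table indexes (no replication)."""

    def __init__(self, num_nodes=4):
        # Local level: one ordinary hash index (dict) per node.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Global level: hash partitioning decides which node owns the key.
        return hash(key) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # Route through the global level, then probe the local index.
        return self.nodes[self._node_for(key)].get(key)

idx = TwoLevelIndex()
idx.put("user:42", {"name": "Ada"})
print(idx.get("user:42"))  # prints: {'name': 'Ada'}
```

With a tree-structured local index instead of a dict, the lookup within a node would be the "more complex" case the slide mentions.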

Applications
Maximum temperature: weather sensors collect data across the globe; all the data is gathered by the MapReduce method and sorted, and the maximum temperature can be calculated.
Word count: used to count the number of times each word appears in a document; a mapper emits each word, and a reducer sums the counts for each word and emits a single value.
Anagram: a word play in which all the letters of a word are each used exactly once to create a new word.
Election commission: an election requires a lot of data, such as a person's history, region, location, party details, candidate details, etc.
Natural disaster data: every year, details of damage, climate, and affected areas are gathered; together they form big data and need to be handled by big data tools.
Mutual friend problem: there are too many requests, so mutual friends are computed daily in advance and a quick response is sent.
University database: holds courses, student data, enrolment data, and employee data, which is fetched by various mining techniques, etc.
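The maximum-temperature application above can be sketched as a tiny in-process MapReduce. The station names and temperature values below are invented for illustration:

```python
from collections import defaultdict

# Each record: (station_id, temperature_celsius) from a weather sensor.
readings = [
    ("delhi", 41.0), ("delhi", 44.5), ("oslo", 19.2),
    ("oslo", 23.1), ("delhi", 39.8),
]

def map_reading(record):
    """Map: emit the station as the intermediate key, temperature as value."""
    station, temp = record
    return (station, temp)

def reduce_max(groups):
    """Reduce: keep only the maximum temperature per station."""
    return {station: max(temps) for station, temps in groups.items()}

# Shuffle: group the mapped pairs by station before reducing.
groups = defaultdict(list)
for key, value in (map_reading(r) for r in readings):
    groups[key].append(value)

print(reduce_max(groups))  # prints: {'delhi': 44.5, 'oslo': 23.1}
```

The same map/shuffle/reduce skeleton, with `sum` in place of `max`, gives the word-count application.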

DATA MINING
Big data is a term for a large data set.
Data mining refers to the activity of going through big data sets to look for relevant or pertinent information.

Data mining techniques

Cluster Detection.
Decision Trees.
Memory-Based Reasoning.
Neural Networks.
Genetic Algorithm.
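To illustrate the first of these techniques, cluster detection, here is a minimal one-dimensional k-means sketch; kmeans_1d is an illustrative helper written for this slide, not a library routine:

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: assign each point to the nearest centroid,
    then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # pick k initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups: values near 1 and values near 10.
data = [0.9, 1.0, 1.1, 9.8, 10.0, 10.2]
print(kmeans_1d(data, k=2))  # roughly [1.0, 10.0]
```

Real data-mining toolkits run the same idea in many dimensions and with smarter initialisation.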


ADVANTAGES & DISADVANTAGES

ADVANTAGES
Provides geographic diversity and redundancy.
Ample computing options are available.
Huge workloads can be handled by adding new, good-quality hardware along with the latest processing capability.
It can also work stand-alone, and can be integrated with SoftLayer solutions and services.
Multi-level security protects the big data against various physical and electronic threats, and a number of additional safety measures are taken.
Payments can be made according to the services availed by the application.

DISADVANTAGES
It encourages large data collections that hold on to incomprehensible data in case it proves to be useful.
People are unaware of what information is collected and where it is located.
Personal information of people is combined with vast data sets, so personal details can be inferred.
It permits the public to be manipulated.
Risk analysis can treat people unfairly and carelessly.
It is also used by spies and governments to find out about terrorist activities, but the major problem is that sometimes even the common public can face adverse conditions, and data mining cannot efficiently single out the actual terrorists.

INFERENCE

We can infer that big data is efficient since it uses certain indexing techniques to organise data.
We also performed a comparative study of different algorithms, as seen earlier.

INFERENCE
We got to know about the different indexing methods, their similarities and their differences.
For example, the MapReduce method sorts the queue and generates intermediate keys, whereas parallel indexing builds structures such as trees and graphs.
From this we can conclude that if we use proper algorithms and techniques, so that bulks of data are managed and handled in a timely manner, then that data will be of use; otherwise it will just occupy space and be of no use.

QUESTIONS ANSWERED

The questions answered are:

Why do we use big data?
How feasible is it, and at what scale do companies use it?
How is data processed immediately?
How are different kinds of data handled?
How is data indexing done?

QUESTIONS TO BE ANSWERED

Yet to be answered:
How do we maintain a balance between security and privacy?
How should risk analysis be done?
How do we build an analytics architecture that can balance real-time data and historic data?
How do we achieve efficient distributed processing?

CONCLUSION

As data is increasing day by day, databases and DBMSs are no longer enough to handle it.
So, to organise the pool of data, i.e. big data, and extract the real information from it, we use tools like Hadoop and MongoDB, two of the leading Big Data technologies for big data analysis.
Using the MapReduce model, a computation can process many terabytes of data on thousands of machines, and programmers find the system easy to use.
Data mining and big data are closely related: information can be retrieved from the pool of data by applying the most optimised data mining algorithm to the big data.

