
BIG DATA

Submitted by: Rajashree Rashmita

Reg. No.: 1825209016
MCA, 4th semester
Contents
 Introduction
 What is BIG DATA
 Examples of BIG DATA
 Characteristics of BIG DATA
 The structure of BIG DATA
 Why BIG DATA
 How it is different
 BIG DATA sources
 Tools used in BIG DATA
 Applications of BIG DATA
 Risks of BIG DATA
 Benefits of BIG DATA
 How BIG DATA impacts IT
 Future of BIG DATA
 Conclusion
Introduction
 BIG DATA is a term for data sets that are so large or complex
that traditional data processing applications are inadequate to
handle them.
 Working with BIG DATA involves capturing, analyzing, storing,
sharing, transferring, visualizing and querying data, as well as
protecting information privacy.
What is BIG DATA
Big Data literally means very large data: a collection of datasets so large that they
cannot be processed using traditional computing techniques. Big data is not merely data;
it has become a complete subject, which involves various tools, techniques
and frameworks.
Examples of BIG DATA
Ex: Facebook, Flickr, YouTube, Twitter, Google and
Google News. Facebook reports 2.5 billion
content items, 105 terabytes of data each half
hour, and 300M photos and 4M videos posted per day.
Twitter has over 651 million users, generating over
6,000 tweets per second. 300 hours of video are
uploaded to YouTube every minute, with more than
1 trillion video views. Google has 50 billion
pages indexed and handles more than 2.4 million queries
every minute. Google News aggregates articles from over
10,000 sources in real time. More than 4.5 million
photos are uploaded to Flickr in a day. It is estimated
that all the global data generated from the
Characteristics of BIG DATA
Volume
 The quantity of data generated and stored every second.
 Here we are talking about zettabytes of data or more.
 It is the task of big data technologies to convert such data
into valuable information.
 Data is generated by machines, networks and human
interaction on systems like social media.
 The volume of data to be analyzed is massive.
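When a data set is too large to load into memory at once, one common approach is to stream through it and keep only small running aggregates. The sketch below illustrates that idea; the record format (`user_id,bytes`) and the function names are hypothetical, chosen only for illustration.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield lists of at most chunk_size lines, so only one chunk is held in memory."""
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def aggregate_in_chunks(lines: Iterable[str], chunk_size: int = 1000) -> Dict[str, int]:
    """Sum the (hypothetical) bytes column per user_id, one chunk at a time."""
    totals: Dict[str, int] = defaultdict(int)
    for chunk in read_in_chunks(lines, chunk_size):
        for line in chunk:
            user_id, size = line.strip().split(",")
            totals[user_id] += int(size)
    return dict(totals)


# A file object is itself an iterable of lines, so a very large file
# could be passed here instead of this small in-memory sample.
sample = io.StringIO("alice,10\nbob,5\nalice,7\n")
print(aggregate_in_chunks(sample, chunk_size=2))  # {'alice': 17, 'bob': 5}
```

Memory use depends on the chunk size and the number of distinct keys, not on the total size of the input.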
Velocity
 The speed at which data is generated.
 The speed at which it must be processed and acted upon.
 The highest-velocity data normally streams directly into memory
instead of being written to disk.
 Some Internet of Things (IoT) applications require real-time
evaluation and action.
 E.g., in a single minute:
 almost 2.5 million queries are performed on Google;
 around 20 million photos are viewed;
 100 hours of video are uploaded to YouTube;
 300,000 tweets are sent;
 over 200 million emails are sent.
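High-velocity data is usually summarized in memory as it arrives, for instance to produce per-minute figures like those listed above. The sketch below keeps a sliding 60-second window of event timestamps in a deque. It assumes events carry numeric timestamps in seconds and arrive in order; the class `SlidingWindowCounter` is hypothetical and not part of any streaming framework.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Counts events seen during the last `window_seconds` seconds, kept in memory."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        # Events are assumed to arrive in non-decreasing timestamp order.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop events at or before now - window_seconds; the window is (now - w, now].
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:
    counter.record(t)
print(counter.count(now=70))  # 3 (events at 30, 65 and 70)
```

Each event is appended and removed once, so the cost per event stays constant regardless of how fast the stream is.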
Variety
 BIG DATA is not just numbers, dates and strings. BIG
DATA is also 3D data, geospatial data, audio and video, and
unstructured text, including log files and social media.
 Traditional database systems were designed to address
smaller volumes of structured data, fewer updates and a
predictable, consistent data structure.
 BIG DATA includes many different types of data.
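To make the different types of data concrete, the sketch below turns a structured CSV row, a semi-structured JSON document and an unstructured log line into one common record shape. It uses only Python's standard library; the log format and the function names are hypothetical.

```python
import csv
import io
import json
import re
from typing import Any, Dict

# Hypothetical log format: "<date> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(r"^(?P<date>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def from_csv_row(text: str) -> Dict[str, Any]:
    """Structured data: fixed columns known in advance."""
    row = next(csv.DictReader(io.StringIO(text)))
    return {"kind": "structured", "fields": dict(row)}


def from_json(text: str) -> Dict[str, Any]:
    """Semi-structured data: self-describing, fields may vary per record."""
    return {"kind": "semi-structured", "fields": json.loads(text)}


def from_log_line(text: str) -> Dict[str, Any]:
    """Unstructured text: structure must be extracted and may be missing."""
    match = LOG_PATTERN.match(text)
    fields = match.groupdict() if match else {"message": text}
    return {"kind": "unstructured", "fields": fields}


records = [
    from_csv_row("id,city\n42,Pune\n"),
    from_json('{"user": "u1", "tags": ["video", "music"]}'),
    from_log_line("2020-01-01 ERROR disk quota exceeded"),
]
for record in records:
    print(record["kind"], record["fields"])
```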
Veracity
 Veracity is an extension of the BIG DATA definition that refers to
data quality and data value.
 The quality of captured data can vary greatly, affecting
the accuracy of analysis.
 Data quality can be unreliable.
 Data often comes from uncontrolled environments.
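Because data quality varies, pipelines often check records before analysis and report how many fail. A minimal sketch, assuming hypothetical sensor readings with a `temperature` field and an assumed valid range of -40 to 85 degrees; invalid readings are only counted here, not repaired.

```python
from typing import Dict, List, Optional, Tuple

Reading = Dict[str, Optional[float]]


def validate(reading: Reading,
             valid_range: Tuple[float, float] = (-40.0, 85.0)) -> List[str]:
    """Return the quality problems found in one reading (the range is an assumption)."""
    problems: List[str] = []
    value = reading.get("temperature")
    if value is None:
        problems.append("missing temperature")
    elif not (valid_range[0] <= value <= valid_range[1]):
        problems.append(f"temperature {value} outside {valid_range}")
    return problems


def quality_report(readings: List[Reading]) -> Dict[str, float]:
    """Count readings that fail any check and the share that pass."""
    invalid = sum(1 for r in readings if validate(r))
    total = len(readings)
    return {"total": total, "invalid": invalid,
            "valid_ratio": (total - invalid) / total if total else 1.0}


readings = [{"temperature": 21.5}, {"temperature": None},
            {"temperature": 999.0}, {"temperature": 23.0}]
print(quality_report(readings))  # {'total': 4, 'invalid': 2, 'valid_ratio': 0.5}
```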
The Structure of BIG DATA
Nowadays the V's of BIG DATA are often extended to 8 V's or even 10 V's.
Why BIG DATA
 The growth of BIG DATA is driven by:
 the increase of storage capacities;
 the increase of processing power;
 the availability of data (different data types).
 Every day we create 2.5 quintillion bytes of data; 90% of the
data in the world today has been created in the last two years
alone.
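The figure of 2.5 quintillion bytes per day is easier to compare with the zettabyte scale mentioned under Volume once it is converted to larger units. A small sketch, assuming decimal (SI) prefixes where 1 KB = 1000 bytes; the helper name `human_readable` is illustrative only.

```python
# Decimal (SI) units, as commonly used when quoting data volumes.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.2f} {unit}"
        value /= 1000
    return f"{value:.2f} {UNITS[-1]}"


per_day = 2.5e18  # 2.5 quintillion bytes = 2.5 x 10^18 bytes
print(human_readable(per_day))        # 2.50 EB
print(human_readable(per_day * 365))  # 912.50 EB, almost one zettabyte per year
```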
How it is different
 Automatically generated by a machine (e.g., a sensor embedded in
an engine).
 Typically an entirely new source of data (e.g., use of the internet).
 Not designed to be friendly (e.g., text streams).
 May not have much value (need to focus on the important
part).
Tools used in BIG DATA
 Distributed servers / cloud (Amazon EC2): where processing is
hosted.
 Distributed storage (Amazon S3): where data is stored.
 Distributed processing (MapReduce): the programming model.
 High-performance schema-free database (MongoDB): where data is
stored and indexed.
 Analytic/semantic processing: where operations are performed on
the data.
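The MapReduce programming model splits a job into a map step, a shuffle (group-by-key) step and a reduce step. The classic word-count example can be sketched in plain single-process Python; this only imitates the data flow and is not Hadoop's actual Java API.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # In a real cluster each document (split) would be mapped on a different node.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In Hadoop the map and reduce functions run in parallel on many nodes and the shuffle moves intermediate pairs across the network; running everything in one process, as here, is only useful for following the flow.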
Continued…
 Hadoop: a free, Java-based programming framework
that supports the processing of large data sets in a distributed
computing environment.
 Facebook, LinkedIn, Twitter and eBay use Hadoop.
 HBase: a scalable, distributed database that supports
structured data storage for large tables.
 Hive: a data warehouse infrastructure that provides data
summarization and ad hoc querying.
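HBase stores data as a sparse map from sorted row keys to columns grouped into column families, where each cell can keep several timestamped versions. The toy class below mimics that data model in memory to show why sorted row keys make range scans cheap. It is a hypothetical illustration, not the HBase client API, and it ignores distribution, persistence and column-family configuration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


class TinyWideColumnTable:
    """Toy HBase-style table: row key -> 'family:qualifier' -> list of versions."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, List[Cell]]] = {}
        self._sorted_keys: List[str] = []  # row keys kept in sorted order

    def put(self, row: str, column: str, value: str, timestamp: int) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, []).append((timestamp, value))

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda cell: cell[0])[1]

    def scan(self, start: str, stop: str) -> List[str]:
        """Return row keys in [start, stop), using the sorted key order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        return self._sorted_keys[lo:hi]


table = TinyWideColumnTable()
table.put("user#002", "info:name", "Asha", timestamp=1)
table.put("user#001", "info:city", "Delhi", timestamp=1)
table.put("user#001", "info:city", "Mumbai", timestamp=2)
print(table.get("user#001", "info:city"))  # Mumbai (newest version wins)
print(table.scan("user#000", "user#002"))  # ['user#001']
```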
Applications of BIG DATA
Benefits of BIG DATA
 Real-time big data is not just a process for storing petabytes
or exabytes of data in a data warehouse; it is about the ability
to make better decisions and take meaningful actions at the
right time.
 Fast forward to the present: technologies like Hadoop
give you the scale and flexibility to store data before you
know how you are going to process it.
 Technologies such as MapReduce, Hive and Impala enable
you to run queries without changing the underlying data
structures.
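Storing data before deciding how to process it is often called schema-on-read: structure is applied when a query runs, not when data is written. A minimal sketch of the idea, assuming JSON events stored one per line; the event fields and the `read_with_schema` helper are hypothetical and only imitate the principle that engines like Hive or Impala apply at far larger scale.

```python
import json
from typing import Any, Callable, Dict, List

# Raw events are stored exactly as they arrived; no schema is enforced on write.
RAW_EVENTS: List[str] = [
    '{"user": "u1", "action": "view", "video_id": "v9"}',
    '{"user": "u2", "action": "like", "video_id": "v9", "device": "mobile"}',
    '{"user": "u1", "action": "view", "video_id": "v3", "device": "tv"}',
]

# A "schema" here maps output column names to functions that extract them.
Schema = Dict[str, Callable[[Dict[str, Any]], Any]]


def read_with_schema(raw_lines: List[str], schema: Schema) -> List[Dict[str, Any]]:
    """Apply a schema at query time, computing each column from the raw record."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({column: extract(record) for column, extract in schema.items()})
    return rows


# Two different questions asked later, over the same unchanged raw data.
views_schema: Schema = {"user": lambda r: r["user"],
                        "is_view": lambda r: r["action"] == "view"}
device_schema: Schema = {"video": lambda r: r["video_id"],
                         "device": lambda r: r.get("device", "unknown")}

print(read_with_schema(RAW_EVENTS, views_schema))
print(read_with_schema(RAW_EVENTS, device_schema))
```

The raw lines never change; only the functions that interpret them do, which is why a new question does not require restructuring the stored data.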
Continued…
 Recent industry research finds that organizations are using big
data to target customer-centric outcomes, tap into internal
data and build a better information ecosystem.
 BIG DATA is already an important part of the $64 billion
database and data analytics market.
 It offers commercial opportunities of a scale comparable to
enterprise software in the late 1980s, the internet boom of the
1990s and the social media explosion of today.
How BIG DATA impacts IT
 BIG DATA is a disruptive force, presenting opportunities as well as
challenges to IT organizations.
 By 2015, 4.4 million IT jobs were expected to be created globally to
support BIG DATA, 1.9 million of them in the US.
 India will require a minimum of 1 lakh (100,000) data scientists in the
next couple of years, in addition to data analysts and data
managers, to support the BIG DATA space.
Future of BIG DATA
 $15 billion has been spent on software firms specializing only in data
management and analytics.
 This industry on its own is worth more than $100 billion and is
growing at almost 10% a year, roughly twice as fast as the
software business as a whole.
 In February 2012, the open-source analyst firm Wikibon released
the first market forecast for BIG DATA, listing $5.1 billion in
revenue in 2012, growing to $53.4 billion in 2017.
 The McKinsey Global Institute estimates that data volume is
growing 40% per year and will grow 44x between 2009 and
2020.
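The growth figures above can be cross-checked with compound growth arithmetic. The sketch below computes the compound annual growth rate (CAGR) implied by a start and end value, and the multiple reached by compounding a fixed yearly rate; the function names are illustrative, and the inputs are the Wikibon and McKinsey figures quoted above.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate linking start to end."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_multiple(rate: float, years: float) -> float:
    """How many times larger a quantity becomes after compounding `rate` for `years`."""
    return (1 + rate) ** years


# Wikibon forecast: $5.1 billion (2012) to $53.4 billion (2017), i.e. 5 years.
print(f"{cagr(5.1, 53.4, 5):.0%}")          # 60%
# McKinsey: 40% growth per year compounded over 2009-2020 (11 years).
print(f"{growth_multiple(0.40, 11):.1f}x")  # 40.5x
# Yearly rate implied by a 44x increase over the same 11 years.
print(f"{cagr(1, 44, 11):.0%}")             # 41%
```

A steady 40% per year gives roughly 40x over 2009 to 2020, while 44x corresponds to about 41% per year, so the two McKinsey figures are roughly consistent. The Wikibon forecast implies market growth of about 60% per year.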
Conclusion
 BIG DATA is now a reality with huge profit potential.
 Tools and technologies are available as open source.
 Each one of us can benefit from working with BIG DATA,
both in its pure (dynamic) form and in its traditional (static) form.
THANK YOU
