
Big Data: The journey begins

Zubair Shaikh

Senior Technical Lead, IP & Science

Objective
Share a contemporary understanding of Big Data.
Create awareness and spark interest in exploring new avenues in Big Data trends and technologies.
Present Big Data initiatives at Thomson Reuters.

Content
The rise of the Bytes
Astonishing facts and figures
World data forecast
Broad classification of Big Data
Characteristics of Big Data: The 3 Vs of Big Data
Challenges of Big Data and next-generation tools
Big Data's impact on Thomson Reuters

The rise of the Bytes


1000^8 B = 1 YB (Yottabyte)
1000^7 B = 1 ZB (Zettabyte)
1000^6 B = 1 EB (Exabyte)
1000^5 B = 1 PB (Petabyte)
1000^4 B = 1 TB (Terabyte)
1000^3 B = 1 GB (Gigabyte)
1000^2 B = 1 MB (Megabyte)
1000 B = 1 KB (Kilobyte)
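The unit ladder above can be turned into a small converter. This is a minimal sketch assuming the decimal (factor of 1000) units used on this slide; the function names `to_bytes` and `humanize` are illustrative, not from any library.

```python
# Decimal byte units, as on the slide: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (1 KB = 1000 B)."""
    return value * 1000 ** UNITS.index(unit)

def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(humanize(to_bytes(5, "EB")))
```

Storage vendors and operating systems sometimes use factors of 1024 instead; that convention would need a different base here.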

Astonishing facts and figures


Eric Schmidt, Chairman of Google, said:

From the dawn of humanity to 2003, the human race produced 5 exabytes of data (1 EB = 1000^6 bytes), and now we are creating 2 exabytes of data every 2 days.

World data forecast

In 2010, the estimated amount of digital data in the world was 1.2 ZB.
In 2013, web data reached 4 ZB.
Data growth will be 44 times greater in 2020 than in 2009.
Data volume is doubling every 1.2 years.
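Growth factors and doubling times are two views of the same exponential, so the figures on this slide can be converted into each other. The sketch below assumes constant exponential growth, which real data volumes need not follow; the helper names are illustrative. The figures likely come from different estimates, so the results need not agree with one another.

```python
import math

def doubling_time(growth_factor, years):
    """Years needed to double, given the total growth over a period."""
    return years / math.log2(growth_factor)

def growth_factor(doubling_years, years):
    """Total growth over a period, given a constant doubling time."""
    return 2 ** (years / doubling_years)

# 44x growth between 2009 and 2020 (11 years), from the slide.
implied = doubling_time(44, 2020 - 2009)
# 1.2 ZB in 2010 growing to 4 ZB in 2013, from the slide.
observed = doubling_time(4 / 1.2, 2013 - 2010)

print(f"44x over 11 years implies doubling every {implied:.2f} years")
print(f"1.2 ZB -> 4 ZB in 3 years implies doubling every {observed:.2f} years")
print(f"Doubling every 1.2 years for 11 years gives {growth_factor(1.2, 11):.0f}x")
```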

Big Data: Broad classification

Big Data: Broad classification (contd.)
Structured data:
Fits into tables and is stored in an RDBMS.
Accounts for 20% of the world's data.

Semi-structured data:

Big Data: Broad classification (contd.)

Unstructured data:
80% of the world's data is semi-structured or unstructured.
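To make the three classes concrete, here is a minimal sketch that holds the same made-up facts in each form. The record contents (name, country, age, tags) are invented for illustration only.

```python
import json
import re

# Structured: fixed columns in a fixed order, like a row in an RDBMS table.
structured_row = ("Dan", "England", 34)  # (name, country, age)

# Semi-structured: self-describing fields that may differ from record to record.
semi_structured = json.loads(
    '{"name": "Dan", "country": "England", "tags": ["engineer"]}'
)

# Unstructured: free text; any structure has to be extracted from it.
unstructured = "Dan, an engineer from England, turned 34 last week."

match = re.search(r"from (\w+)", unstructured)
country_from_text = match.group(1) if match else None

print(structured_row[1], semi_structured["country"], country_from_text)
```

The structured row can be queried by column position, the semi-structured record by key, while the free text needs a pattern (or a more capable extractor) before it yields the same field.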

Big Data: Characteristics

The 3 Vs of Big Data

Big Data: Characteristics (contd.)

Volume: A huge volume of data is being generated by different sources.
Velocity: The speed at which data arrives in real time from different sources.
Variety: The different forms of data.
Machine-generated data: sensors, machines, satellites, weather data
User-generated data: social media sites, Facebook, Twitter
Operational data: stock market, application logs

Big Data: Significant data producers

NYSE trading per day produces 1 TB.
New websites created every minute: 571.
Google data processing per day: 20 petabytes.
Data uploaded daily to Facebook: 100 terabytes.
Aadhaar card for India:
UIDs for an Indian population of 1.5 billion.
Per resident: 5 MB
Daily I/O: 30 TB
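The Aadhaar figures above imply a total storage requirement that is easy to work out. This is a back-of-the-envelope sketch using only the slide's numbers and decimal units; it ignores replication, indexes, and growth.

```python
population = 1.5e9         # residents, figure from the slide
bytes_per_resident = 5e6   # 5 MB per resident, with 1 MB = 1000**2 bytes

total_bytes = population * bytes_per_resident
total_pb = total_bytes / 1000 ** 5  # 1 PB = 1000**5 bytes

print(f"Total UID storage: {total_pb:g} PB")
```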

Big Data: Challenges

Handle the variety of data.
Store the huge volumes of data that exist in different forms.
Process and analyze this huge volume of data.
E.g., decoding the human genome with the traditional RDBMS approach takes 10 years.

What next?
A next generation of data tools and techniques, such as Hadoop and NoSQL databases, is needed to handle Big Data.
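Hadoop's processing model is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below imitates that flow in plain Python on two tiny made-up documents. It does not use the Hadoop API and runs on one machine; it only illustrates the shape of the computation that Hadoop distributes across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

documents = ["big data big ideas", "data lake"]  # made-up input
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the model suit large volumes.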

Big Data's impact on Thomson Reuters

What Thomson Reuters intends

Thomson Reuters's Big Data strategy

BOLD: Big Object Linked Data

Thomson Reuters's Big Data initiative to place and link data under one common platform for analytics.
It's a data lake for all the content from TR.
Content is pumped in from Legal, IP & Science, F&R, and Tax and Accounting.
A Hadoop store of our content.
CORE GOAL: A Knowledge Graph that manages facts and relationships extracted from the Lake.

Linked Data - RDF


RDF (Resource Description Framework) is a standard model for data interchange on the Web.
It's the foundation upon which the web of semantic data is built.
Organized into triples: [Subject, Predicate, Object]

[Diagram: Subject --Predicate--> Object]

A predicate defines the relationship between the subject and object nodes.
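A triple store can be pictured as a set of (subject, predicate, object) tuples queried by pattern. The sketch below is a toy model under that assumption: the names and facts are hypothetical, and plain strings stand in for the URIs that real RDF uses to identify nodes and predicates.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

# Hypothetical facts, for illustration only.
graph = {
    Triple("Alice", "knows", "Bob"),
    Triple("Alice", "works_for", "ExampleCorp"),
    Triple("Bob", "works_for", "ExampleCorp"),
}

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    )

# Who works for ExampleCorp?
employees = [t.subject for t in match(graph, predicate="works_for", obj="ExampleCorp")]
print(employees)
```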

RDF Example
RDF/XML: an XML-based syntax for writing triples using URIs

Inferred relationships

Subject = Dan
Predicate = is_from
Object = England

This relationship isn't stated explicitly; it is inferred from the other two: new knowledge.
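Inference of this kind can be sketched as a rule applied over existing triples. The two asserted facts and the rule below are assumptions made up for illustration, since the slide's original pair of relationships is not reproduced in the text; only the inferred triple (Dan, is_from, England) comes from the slide.

```python
# Asserted facts: hypothetical stand-ins for "the other two" relationships.
facts = {
    ("Dan", "born_in", "London"),
    ("London", "is_in", "England"),
}

def infer_is_from(triples):
    """Assumed rule: X born_in Y and Y is_in Z  =>  X is_from Z."""
    inferred = set()
    for s1, p1, o1 in triples:
        for s2, p2, o2 in triples:
            if p1 == "born_in" and p2 == "is_in" and o1 == s2:
                inferred.add((s1, "is_from", o2))
    # Keep only triples that were not already asserted.
    return inferred - triples

new_knowledge = infer_is_from(facts)
print(new_knowledge)
```

Production systems usually express such rules in a schema or ontology language and let a reasoner apply them, rather than hand-coding each rule as done here.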

The Graph: Federated Machine-Readable Knowledge

Questions?
