Big Data Camp, Delhi, Sep 10, 2011

Introduction to Hadoop / Big Data

Good Times < Year 2000


Online Applications - OLTP

Web Users -> Web Servers -> RDBMS

Analytics and Reporting - OLAP

Report Users -> Reporting Servers -> RDBMS DW

Year 2000 +
Online Applications - OLTP

Web Users -> Web Servers -> RDBMS

Analytics and Reporting - OLAP

Report Users -> Reporting Servers -> RDBMS DW

Big Data- Problems to Solve

Storage

Fail
Scalability

The Knight in Shining Armor

Engine + Logic

File system

Video: What can Apache Hadoop Do for You?

Who Uses Hadoop?


Search: Yahoo, Amazon, Zvents
Log processing: Facebook, Yahoo
Recommendation Systems: Facebook
Data Warehouse: Facebook, AOL
Video and Image Analysis: New York Times, Eyealike
Indian Government: UUID project

HDFS: Design Principles

Hardware will Fail!

Petabyte Scale Store!

Map Reduce

Origin in Lisp!

Google: GFS and MapReduce papers!

Divide and Rule!

Map Reduce Programming Model

Borrows from functional programming

Users implement an interface of two functions:

map (in_key, in_value) -> (out_key, intermediate_value) list

reduce (out_key, intermediate_value list) -> out_value list
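
The two signatures above can be sketched in plain Python. This is only an in-memory illustration of the programming model, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_mapreduce`, the word-count task, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    # Hypothetical task: emit (word, 1) for every word in a line of text.
    for word in in_value.split():
        yield word.lower(), 1


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(intermediate_values)]


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in mapper(in_key, in_value):
            groups[out_key].append(value)  # the "shuffle" step
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Made-up input: (document name, document text) pairs.
records = [("doc1", "big data big cluster"), ("doc2", "Data node")]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

In a real cluster the grouping step is done by the framework across many machines; the user still supplies only the two functions.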

Hadoop Map Reduce

Hadoop Example
Weather sensors that collect data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce because it is semi-structured and record-oriented.
Data format: the data is stored in a line-oriented ASCII format, in which each line is a record. The format supports a rich set of meteorological elements, many of which are optional or have variable data lengths. For simplicity, we shall focus on the basic elements, such as temperature, which are always present and are of fixed width.
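
As a rough sketch of how such fixed-width records could be analyzed, the Python below finds the maximum temperature per year with a map step and a reduce step. The record layout is hypothetical (characters 0-3 hold the year, characters 4-8 hold a signed temperature in tenths of a degree, and 9999 marks a missing reading); the real format's offsets are not given here, and the sample lines are made up.

```python
from collections import defaultdict

MISSING = 9999  # assumed sentinel for "no reading" in this hypothetical layout


def map_record(line):
    """Emit (year, temperature) for one fixed-width record."""
    year = line[0:4]          # assumed columns 0-3
    temp = int(line[4:9])     # assumed columns 4-8, e.g. "+0111" or "-0011"
    if temp != MISSING:
        yield year, temp


def reduce_max(year, temps):
    """Collapse all readings for one year into the maximum."""
    return max(temps)


def max_temperature_by_year(lines):
    # Group map output by year, then reduce each group.
    grouped = defaultdict(list)
    for line in lines:
        for year, temp in map_record(line):
            grouped[year].append(temp)
    return {year: reduce_max(year, temps) for year, temps in grouped.items()}


# Made-up sample records in the assumed layout.
sample = ["1949+0111", "1949+0078", "1950-0011", "1950+0022", "1950+9999"]
max_by_year = max_temperature_by_year(sample)
print(max_by_year)
```

Because temperature is always present and fixed width, the map step needs only string slicing, which is why the slide restricts itself to those basic elements.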

Hadoop Ecosystem Map


[Diagram: numbered map of the Hadoop ecosystem (items 1-14). Recoverable labels: Workflow; Support; High Level Interfaces; More High Level Interfaces; Cascading; JAQL; Unstructured Data; Structured Data; Engine + Logic; File system; RDBMS; OLTP; Java Applications; Sqoop; hiho; Monitor/manage Hadoop ecosystem.]

How can You Contribute?

Apache Hadoop Projects


Learn more about Hadoop
Contribute to source code
Participate in Mailing Lists/Forums
Share blogs etc.

Impetus Open Source Projects


GitHub/Google Code hosted projects
Contribute to source code

Thank you
Visit bigdata.impetus.com

Big Data in EDW

Building Big Data Analytics Platform

Commercial

Open source

Hybrid

Teradata/ Netezza

CloverETL/ Kettle/ Talend

ETL - Open Source and Commercial

Greenplum/ Vertica/ Aster

Jaspersoft/ Pentaho Reporting

Hadoop

Apache Cassandra

Analytics - Open Source or Commercial

Informatica

Commercial Hadoop Versions

SAS/ MicroStrategy/ Business Objects

Pentaho/ Jasper

Web Analytics
