
INT312: BIG DATA FUNDAMENTALS
Lecture #0
Course details
LTP 004 [Four Practicals/week] [BYOD]

CA Category: A0304
Course Orientation: RESEARCH, SOFTWARE SKILL
Weightages: ATT: 5 CA: 25 MTT: 20 ETT: 50
TEXT BOOKS
No Textbook for this course.

REFERENCE BOOKS
1. BIG DATA by ANIL MAHESHWARI, MCGRAW HILL EDUCATION
2. BIG DATA AND ANALYTICS by SEEMA ACHARYA, SUBHASHINI CHELLAPPAN, WILEY
3. UNDERSTANDING BIG DATA: ANALYTICS FOR ENTERPRISE CLASS HADOOP AND
STREAMING DATA by PAUL C ZIKOPOULOS, IBM, CHRIS EATON, PAUL ZIKOPOULOS,
MCGRAW HILL
4. ORACLE BIG DATA HANDBOOK by TOM PLUNKETT, BRIAN MACDONALD, BRUCE
NELSON, MARK HORNICK, HELEN SUN, KHADER MOHIUDDIN, DEBRA HARDING,
GOKULA MISHRA, ROBERT STACKOWIAK, KEIT, MCGRAW HILL
5. PROFESSIONAL HADOOP SOLUTIONS by BORIS LUBLINSKY, KEVIN T. SMITH, ALEXEY
YAKUBOVICH, WILEY
Course Objectives
- Recognize the need for and importance of the fundamental concepts and principles of Big Data
- Examine the internal functioning of different modules of Big Data and Hadoop
- Conceptualize the big data ecosystem and appreciate its key components
What will you learn?

Big Data Fundamentals provides a path for:
- Introduction to Big Data
- Introduction to Hadoop
- Installation of Hadoop
- Hadoop Architecture
- Hadoop Ecosystem
- HIVE and HBASE

Course Prerequisites
- Java Programming / C++
- Database basics

What's Big Data?

There is no single definition; here is one from Wikipedia:

"Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."

The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Big Data: 3Vs



Volume (Scale)
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes to 35 ZB
Data volume is increasing exponentially

Exponential increase in collected/generated data:
- 12+ TBs of tweet data every day
- ? TBs of data every day
- 25+ TBs of log data every day
- 30 billion RFID tags today (1.3B in 2005)
- 4.6 billion camera phones world wide
- 100s of millions of GPS enabled devices sold annually
- 2+ billion people on the Web by end 2011
- 76 million smart meters in 2009, 200M by 2014
CERN's Large Hadron Collider (LHC) generates 15 PB a year
Maximilien Brice, CERN
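
To make the "44x" figure concrete, the sketch below works out the yearly growth rate implied by the slide's own numbers (0.8 ZB in 2009, 35 ZB in 2020). It assumes smooth compound growth over the whole period, which is a simplification; the variable names are illustrative only.

```python
# Figures quoted on the slide.
start_zb = 0.8          # zettabytes in 2009
end_zb = 35.0           # zettabytes in 2020
years = 2020 - 2009     # 11 years

# Overall growth factor: how many times larger the end value is.
growth_factor = end_zb / start_zb

# Implied compound annual growth rate, ASSUMING steady exponential growth:
# start * (1 + rate) ** years == end
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.2f}x")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under that assumption the data volume grows by roughly 40% every year, which is what "increasing exponentially" means in practice: the yearly increase is proportional to the amount already stored.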

Variety (Complexity)
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF),

Streaming Data
You can only scan the data once

A single application can be generating/collecting many types of data

Big Public Data (online, weather, finance, etc.)

To extract knowledge, all these types of data need to be linked together
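
The slide notes that streaming data can only be scanned once. A minimal sketch of what that constraint means for an algorithm: the function below computes a mean and variance in a single pass, using Welford's online update, without storing the stream. The function name and the sample readings are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Tuple


def running_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after one pass over the stream."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    # Hypothetical readings. A generator models a stream well:
    # once consumed, the values cannot be read again.
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield value


count, mean, variance = running_mean_variance(sensor_readings())
print(count, mean, variance)
```

The memory used stays constant no matter how long the stream is, which is the property that matters when the data cannot be stored and rescanned.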
A Single View to the Customer

[Diagram: Customer at the center, linked to Social Media, Banking Finance, Gaming, Entertain, Purchase, and Our Known History]

Velocity (Speed)

Data is being generated fast and needs to be processed fast
Online Data Analytics
Late decisions mean missed opportunities
Examples
- E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
- Healthcare monitoring: sensors monitor your activities and body; any abnormal measurements require immediate reaction
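
The healthcare example can be sketched as a check that runs on each reading as it arrives, rather than on a batch collected later. The version below compares every new value with the average of the last few normal readings. The window size, tolerance, and heart-rate values are assumptions made up for illustration, not clinical thresholds.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def abnormal_readings(
    stream: Iterable[float], window: int = 5, tolerance: float = 20.0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, baseline) when a reading deviates from the recent average."""
    recent: Deque[float] = deque(maxlen=window)
    for index, value in enumerate(stream):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > tolerance:
                yield index, value, baseline
                continue  # keep the outlier out of the baseline
        recent.append(value)


# Hypothetical heart-rate samples (beats per minute) with one spike.
heart_rate = [72, 74, 71, 73, 75, 74, 140, 76, 73]
alerts = list(abnormal_readings(heart_rate))
for index, value, baseline in alerts:
    print(f"Reading {index}: {value} bpm vs recent average {baseline:.1f} bpm")
```

Because the function is a generator, an alert is produced as soon as the abnormal value is seen, which is the "immediate reaction" the slide asks for.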

Real-time/Fast Data

- Social media and networks (all of us are generating data)
- Scientific instruments (collecting all sorts of data)
- Mobile devices (tracking all objects all the time)
- Sensor technology and networks (measuring all kinds of data)

Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion

Some Make it 4Vs



Harnessing Big Data

OLTP: Online Transaction Processing (DBMSs)


OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
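
The difference between the first two workloads can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: the `orders` table and its rows are hypothetical, and a single in-memory SQLite database stands in for what would normally be separate transactional and warehouse systems. RTAP is not shown because it typically relies on a stream-processing engine rather than a single database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)

# OLTP-style work: short transactions that each touch a few rows.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
        [
            ("alice", "north", 120.0),
            ("bob", "south", 80.0),
            ("carol", "north", 50.0),
            ("dave", "south", 40.0),
        ],
    )
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (130.0, 1))

# OLAP-style work: one read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()

for region, order_count, revenue in totals:
    print(region, order_count, revenue)

conn.close()
```

The transactional statements are small and frequent, while the analytical query reads the whole table to answer one question; that contrast is why the two workloads are usually served by different systems.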

The Model Has Changed


The model of generating/consuming data has changed:

Old Model: few companies generate data; all others consume data
New Model: all of us generate data, and all of us consume data

What's driving Big Data


- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

- Ad-hoc querying and reporting


- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

Big Data Technology
