Big Data Analytics

This course provides practical, foundation-level training that enables immediate and effective
participation in big data projects. By the end of the course, students will be familiar with the
fundamental concepts of Big Data management and analytics; will be competent in recognizing the
challenges faced by applications that deal with very large volumes of data and in proposing
scalable solutions to them; and will understand how Big Data impacts business intelligence,
scientific discovery, and our day-to-day life.

Core topics:
Introduction to the Big Data problem. Current challenges, trends, and applications

Technologies for Big Data management


Hands-on prototype projects to understand the actual working of these technologies

Module 1: Introduction to Big Data Analytics

The evolution of data management, defining Big Data, traditional and advanced analytics. History of Big Data,
its elements, career-related knowledge, advantages, and disadvantages. Application perspective of Big Data,
covering its use in marketing, analytics, retail, hospitality, consumer goods, defense, etc.

Module 2: Introduction to Big Data and the Hadoop Ecosystem

This module focuses on Data Explosion, Types of Data, Need for Big Data, Big Data and Its Sources,
Characteristics of Big Data Technology, Leveraging Multiple Sources of Data, and Hadoop/Spark-based
technologies for Handling Big Data.

Module 3: Interactive analysis


Hive: Introducing Hive, Getting Started with Hive, Hive Variables, Hive Properties, Data types in Hive, Loading
Files into Tables, Application in Hive, Inserting Data into Tables, Update in Hive.

Introduction to schema on write, dimensional models for exploring and analysing business metrics, data pond for
analysis, metrics and KPIs, drill-down/roll-up and slice/dice for big data, implementing a sales analysis system
with Hive.
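
As a rough illustration of the roll-up and slice operations named above, the following Python sketch aggregates a tiny in-memory fact table. The rows, the dimension names, and the helper functions `roll_up` and `slice_rows` are hypothetical and not part of the course material; in the module itself these operations would be expressed as Hive queries.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, sales_amount).
SALES = [
    ("North", "Laptop", "2024-01", 1200.0),
    ("North", "Phone", "2024-01", 800.0),
    ("South", "Laptop", "2024-01", 1000.0),
    ("South", "Phone", "2024-02", 600.0),
    ("North", "Laptop", "2024-02", 1500.0),
]
DIMS = ("region", "product", "month")  # assumed dimension order of each row


def roll_up(rows, by):
    """Sum the sales amount over the chosen dimensions (a roll-up)."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Keep only the rows where one dimension has a fixed value (a slice)."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


if __name__ == "__main__":
    print(roll_up(SALES, ["region"]))
    print(roll_up(slice_rows(SALES, "product", "Laptop"), ["month"]))
```

Rolling up by region collapses the product and month dimensions; slicing on product first and then rolling up by month answers a narrower question about a single product.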

Module 4: Advanced Analytics (Structured and Time Series Analysis)


Introduction to Analysis Base Tables, Dimension reduction, ETL for analysis, data pond for analytics,
Implementing a customer profiling system with SparkSQL

HBase: HBase Introduction, Characteristics of HBase, Companies Using HBase, HBase Architecture, Storage
Model of HBase, Row Distribution of Data between Region Servers, Data Storage in HBase, Data Model, HBase
vs. RDBMS, Implementing a time series analysis system for IoT

Module 5: Text Analytics


Pig: Introducing Pig, the Pig Architecture, Benefits of Pig, Installing Pig, Properties of Pig, Running Pig, Running
Pig Programs, Pig Latin Structure, Application Flow.

Text analysis, tokenizing, filtering, scoring, corpus creation, implementing a sentiment analysis system for
Tweets
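
As a rough illustration of the tokenizing, filtering, and scoring steps listed above, a minimal lexicon-based Python sketch might look like the following. The lexicon weights, the stop-word list, and the function names are hypothetical and chosen only for illustration; a course project would more likely build its corpus and scoring with Pig or a dedicated library.

```python
import re

# Hypothetical toy lexicon and stop-word list, for illustration only.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
STOP_WORDS = {"the", "a", "is", "this", "i", "it", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def filter_tokens(tokens):
    """Drop stop words, which carry no sentiment in this toy setup."""
    return [t for t in tokens if t not in STOP_WORDS]


def score(text):
    """Sum lexicon weights over the filtered tokens; unknown words count as 0."""
    return sum(LEXICON.get(t, 0) for t in filter_tokens(tokenize(text)))


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this phone, it is great",
                  "Terrible battery and bad screen",
                  "The phone arrived today"]:
        print(tweet, "->", label(tweet))
```

A fixed lexicon ignores negation and sarcasm, so this is only a starting point for discussing how scoring choices affect results.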

Module 6: Real Time Analytics

Introduction to Lambda architecture, aggregations, anomaly detection, CEP, batch layer models, real time
thresholds, implementing a real time analytics system for impressions
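
To make the idea of real time thresholds concrete, the Python sketch below flags an impression count as anomalous when it deviates from a rolling window mean by more than k standard deviations. The class name `ThresholdDetector`, the window size, the value of k, and the floor of 1.0 on the standard deviation are all assumptions made for illustration; they are not taken from the course material.

```python
from collections import deque
from statistics import mean, pstdev


class ThresholdDetector:
    """Rolling-window threshold check over a stream of counts (hypothetical helper)."""

    def __init__(self, window=5, k=3.0):
        self.values = deque(maxlen=window)  # most recent counts only
        self.k = k

    def observe(self, count):
        """Return True if count is more than k std devs from the window mean."""
        is_anomaly = False
        if len(self.values) == self.values.maxlen:
            mu = mean(self.values)
            # Floor sigma at 1.0 (an assumption) so a flat window does not flag tiny changes.
            sigma = max(pstdev(self.values), 1.0)
            is_anomaly = abs(count - mu) > self.k * sigma
        self.values.append(count)
        return is_anomaly


if __name__ == "__main__":
    detector = ThresholdDetector(window=5, k=3.0)
    stream = [100, 102, 98, 101, 99, 100, 500, 101]
    print([detector.observe(c) for c in stream])
```

No value is flagged until the window is full, and a flagged value still enters the window, which temporarily widens the threshold; whether to exclude anomalies from the baseline is a design choice worth discussing.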

Dr. Sridhar Vaithianathan
sridhar.v@imthyderabad.edu.in