Beruflich Dokumente
Kultur Dokumente
1
Module 1 Module 3
Understanding Big Data and Hadoop Hadoop MapReduce Framework
Learning Objectives - In this module, you will understand Big Learning Objectives - In this module, you will understand
Data, the limitations of the existing solutions for Big Data Hadoop MapReduce framework and the working of
problem, how Hadoop solves the Big Data problem, the MapReduce on data stored in HDFS. You will understand
common Hadoop ecosystem components, Hadoop concepts like Input Splits in MapReduce, Combiner &
Architecture, HDFS, Anatomy of File Write and Read, how Partitioner and Demos on MapReduce using different data
MapReduce Framework works sets.
Topics Topics
Big Data MapReduce Use Cases
Limitations and Solutions of existing Data Analytics Traditional way Vs MapReduce way
Architecture Why MapReduce
Hadoop Hadoop 2.x MapReduce Architecture
Hadoop Features Hadoop 2.x MapReduce Components
Hadoop Ecosystem YARN MR Application Execution Flow
YARN Workflow, Anatomy of MapReduce
Hadoop 2.x core components
Program
Hadoop Storage: HDFS, Hadoop
Demo on MapReduce. Input Splits,
Processing:
Relation between Input Splits and HDFS
MapReduce Framework Blocks
Hadoop Different Distributions MapReduce: Combiner & Partitioner
Demo on de-identifying Health Care Data
set
Module 2 Demo on Weather Data set.
Hadoop Architecture and HDFS
Learning Objectives - In this module, you will learn the Module 4
Hadoop Cluster Architecture, Important Configuration files Advanced MapReduce
in a Hadoop Cluster, Data Loading Techniques, how to setup
Learning Objectives - In this module, you will learn
single node and multi node Hadoop cluster
Advanced MapReduce concepts such as Counters,
Distributed Cache, MRunit, Reduce Join, Custom Input
Topics Format, Sequence Input Format and XML parsing.
Hadoop 2.x Cluster Architecture - Federation and
High Availability
Topics
A Typical Production Hadoop Cluster Counters
Hadoop Cluster Modes Distributed Cache
Common Hadoop Shell Commands MRunit
Hadoop 2.x Configuration Files Reduce Join
Single node cluster Custom Input Format
Multi node cluster set up Hadoop Administration Sequence Input Format
Module 5 Module 6
Pig Hive
Learning Objectives - In this module, you will learn Pig, Learning Objectives - This module will help you in
types of use case we can use Pig, tight coupling between Pig understanding Hive concepts, Hive Data types, loading and
and MapReduce, and Pig Latin scripting, PIG running Querying Data in Hive, running hive scripts and Hive UDF.
modes, PIG UDF, Pig Streaming, Testing PIG Scripts. Demo
on healthcare dataset. Topics
Hive Background
Topics Hive Use Case
About Pig About Hive Hive Vs Pig
MapReduce Vs Pig Hive Architecture and Components
Pig Use Cases Metastore in Hive
Programming Structure in Pig Limitations of Hive
Pig Running Modes Comparison with Traditional Database
Pig components Hive Data Types and Data Models
Pig Execution Partitions and Buckets
Pig Latin Program Hive Tables(Managed Tables and External Tables)
Data Models in Pig Importing Data
Pig Data Types Querying Data
Shell and Utility Commands Managing Outputs
Pig Latin : Relational Operators File Loaders, Group Hive Script
Operator, COGROUP Operator, Joins and Hive UDF
COGROUP, Union, Diagnostic Operators,
Retail use case in Hive
Specialized joins in Pig
Hive Demo on Healthcare Data set.
Built In Functions ( Eval Function, Load and Store
Functions, Math function, String Function, Date
Function, Pig UDF, Piggybank Module 7
Advanced Hive and HBase
Parameter Substitution ( PIG macros and Pig
Learning Objectives - In this module, you will understand
Parameter substitution )
Advanced Hive concepts such as UDF, Dynamic Partitioning,
Pig Streaming
Hive indexes and views, optimizations in hive. You will also
Testing Pig scripts with Punit acquire in-depth knowledge of HBase, HBase Architecture,
Aviation use case in PIG, Pig Demo on Healthcare running modes and its components.
Data set.
Topics
Hive QL: Joining Tables, Dynamic Partitioning,
Custom Map/Reduce Scripts
Hive Indexes and views
Hive query optimizers
Hive : Thrift Server, User Defined Functions
HBase: Introduction to NoSQL Databases and
HBase, HBase v/s RDBMS, HBase Components,
HBase Architecture, HBase Cluster Deployment.
Module 10
Module 8 Oozie and Hadoop Project
Advance HBase Learning Objectives - In this module, you will understand working of
Learning Objectives - This module wil multiple Hadoop ecosystem components together in a Hadoop
implementation to solve Big Data problems. We will discuss multiple
HBase concepts. We will see demo
data sets and specifications of the project. This module will also
Filters. You will also learn what Zooke
cover Flume & Sqoop demo, Apache Oozie Workflow Scheduler for
how it helps in monitoring a cluster
Hadoop Jobs, and Hadoop Talend integration.
Zookeeper.
Topics
Topics
Flume and Sqoop Demo
HBase Data Model
Oozie
HBase Shell
Oozie Components
HBase Client API
Oozie Workflow
Data Loading Techniques Scheduling with Oozie
ZooKeeper Data Model Demo on Oozie Workflow
Zookeeper Service Oozie Co-ordinator
Zookeeper Oozie Commands
Demos on Bulk Loading Oozie Web Console
Getting and Inserting Data Oozie for MapReduce PIG, Hive, and Sqoop,
Filters in HBase Combine flow of MR, PIG, Hive in Oozie
Hadoop Project Demo
Module 9 Hadoop Integration with Talend
Processing Distributed Data with
Apa
Topics
What is Apache Spark
Spark Ecosystem
Spark Components
History of Spark and Spark
Ver Spark a Polyglot What is
Scala?
Why Scala?
SparkContext
RDD
l so
,
wn
Project Work
Towards the end of the course, you will be working on a live project where you will be using PIG, HIVE, HBase and MapReduce to
perform Big Data analytics.
Here are the few Industry-wise Big Data case studies e.g. Finance, Retail, Media, and Aviation etc. which you can take up as your
project work: