Hadoop Syllabus
SUMMER TRAINING 2018
Instructor Information
General Information
Description
Learn the concepts and implementation of Hadoop and Java programming, and take the first step on your journey
to becoming a Hadoop developer!
Expectations and Goals
This is a comprehensive Hadoop Big Data training course, designed by industry experts around current industry job
requirements, that provides in-depth learning on Big Data and the Hadoop modules. It is an industry-recognized Big Data
certification training course that combines the training courses for Hadoop developer, Hadoop administrator,
and analytics.
Course Materials
Required Materials
• Laptop
• 6+ GB RAM (8 GB recommended)
Optional Materials
• Internet Connection
Course Syllabus
Introduction to Big Data, Hadoop and its Ecosystem, MapReduce and HDFS
What is Big Data, Where does Hadoop fit in, Hadoop Distributed File System (HDFS): Replication, Block Size, Secondary
NameNode, High Availability, Understanding YARN: ResourceManager, NodeManager, Differences between Hadoop 1.x and
2.x
Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Cluster Setup, Hadoop Cluster
Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single-Node Cluster
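To make the block size and replication topics above concrete, here is a minimal plain-Java sketch of the arithmetic involved. It assumes a 128 MB block size and a replication factor of 3, which are the Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication`; a real cluster may be configured differently. The class and method names (`HdfsBlockMath`, `blockCount`, `rawBytes`) are hypothetical and are not part of any Hadoop API.

```java
// Illustrative arithmetic only; this does not talk to a real HDFS cluster.
class HdfsBlockMath {
    // Assumed defaults (Hadoop 2.x): 128 MB blocks, 3 replicas.
    static final long DEFAULT_BLOCK_BYTES = 128L * 1024 * 1024;
    static final int DEFAULT_REPLICATION = 3;

    // Number of blocks a file is split into (ceiling division).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw disk space used across the cluster: every byte is stored once per replica.
    // The last block only occupies its actual length, so no padding is added.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println("blocks: " + blockCount(oneGiB, DEFAULT_BLOCK_BYTES));
        System.out.println("raw bytes: " + rawBytes(oneGiB, DEFAULT_REPLICATION));
    }
}
```

With these assumed defaults, a 1 GiB file is split into 8 blocks and occupies 3 GiB of raw storage across the cluster.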
Core Java Fundamentals
Deep Dive in MapReduce
How MapReduce Works, How Reducer Works, How Driver Works, Combiners, Partitioners, Input Formats, Output Formats,
Shuffle and Sort, Map Side Joins, Reduce Side Joins, Distributed Cache.
Toll Free: 1800-30000-893 | www.mtaeducation.in
Linux Fundamentals
Basic Linux commands, understanding the Linux environment, exercises for practice
Lab exercises:
Working with HDFS, Writing WordCount Program, Writing Custom Partitioner, MapReduce with Combiner, Map Side Join,
Reduce Side Joins, Running MapReduce in Local Job Runner Mode.
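As a preview of the WordCount and partitioner exercises, the following is a minimal sketch that simulates the map, partition, shuffle-and-sort, and reduce phases in plain Java, without the Hadoop libraries. The class and method names are hypothetical. The `partition` method applies the same formula as Hadoop's default `HashPartitioner`, but to a Java `String` hash, so the bucket assignments will not match those of a real job that uses `Text` keys.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the WordCount data flow; not a real Hadoop job.
class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Partitioner: choose the reducer for a key (non-negative hash modulo reducer count).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all values that were grouped under one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines, int numReducers) {
        // Shuffle and sort: one sorted key -> values map per reducer.
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            shuffled.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                shuffled.get(partition(kv.getKey(), numReducers))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        // Each simulated reducer processes its own partition; results are merged for display.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : shuffled) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("big data big", "data hadoop"), 2));
    }
}
```

In an actual Hadoop job these roles are filled by `Mapper`, `Partitioner`, and `Reducer` classes, and the framework performs the shuffle and sort between them.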
Understanding Pig
A. Introduction to Pig: Understanding Apache Pig, its features and various uses, and learning to interact with Pig
B. Deploying Pig for data analysis: The syntax of Pig Latin, the various definitions, data sort and filter, data types,
deploying Pig for ETL, data loading, schema viewing, field definitions, commonly used functions
C. Pig for complex data processing: Various data types including nested and complex, processing data with Pig,
grouped data iteration, practical exercise
D. Performing multi-dataset operations: Data set joining, data set splitting, various methods for data
set combining, set operations, hands-on exercise
Understanding Hive
A. Hive Introduction: Understanding Hive, comparison of Hive with traditional databases, Pig and Hive comparison,
storing data in Hive and Hive schema, Hive interaction and various use cases of Hive
B. Hive for relational data analysis: Understanding HiveQL, basic syntax, the various tables and databases, data types,
data set joining, various built-in functions, running Hive queries from scripts and the shell
C. Data management with Hive: The various databases, creation of databases, data formats in Hive, data loading,
changing databases and tables, storing query results, data access control, managing data with Hive
D. Hands-on exercises: Working with large data sets and extensive querying
Understanding Sqoop
Sqoop Installation and Basics, Importing Data from MySQL to HDFS, Advanced Imports, Real-Time Use Case, Exporting
Data from HDFS to MySQL, Running Sqoop in Cloudera
Understanding Flume
Overview of Apache Flume, Physically distributed data sources, Changing structure of data, Closer look, Anatomy of
Flume, Core concepts, Event, Clients, Agents, Source, Channels, Sinks, Interceptors, Channel selector, Sink processor,
Data ingest, Agent pipeline, Transactional data exchange, Routing and replicating, Why channels?, Use case: log
aggregation, Adding a Flume agent, Handling a server farm, Data volume per agent, Example describing a single-node Flume
deployment
Impala
A. Introduction to Impala: What is Impala?, How Impala Differs from Hive and Pig, How Impala Differs from Relational
Databases, Limitations and Future Directions, Using the Impala Shell
Avro Data Format: Selecting a File Format, Tool Support for File Formats, Avro Schemas, Using Avro with Hive
and Sqoop, Avro Schema Evolution, Compression
Apache HBase
What is HBase, Where does it fit, What is NoSQL, HBase Basics & Architecture, Creating Tables, Listing Tables,
Enabling & Disabling Tables, Describe, Alter, and Drop Tables, Scan, Insert, Update, Read, and Delete Data
Apache Spark
A. Why Spark? Working with Spark and Hadoop Distributed File System: What is Spark, Comparison between Spark and
Hadoop, Components of Spark
B. Running Spark on a Cluster, Writing Spark Applications using Java/Scala
ZooKeeper
Oozie
Why Oozie?, Running an example, Oozie workflow engine, Word count example, Oozie job processing, Job
submission
Quiz & Awards
Project