✓ Hadoop Features
✓ Hadoop Components
✓ Hadoop Processes
✓ Hadoop Architecture
✓ MapReduce Framework
✓ What is YARN
✓ What is ZooKeeper
➢ The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
➢ HDFS was originally built as infrastructure for the Apache Nutch web search engine project.
➢ Achieving data localization
   ❖ Moving the application to the place where the data is residing, OR
   ❖ Making data local to the application
➢ Two Masters
   • NameNode: if it is down, HDFS cannot be accessed
➢ Components used by HDFS
   ▪ NameNode
   ▪ DataNode
   ▪ Secondary NameNode
➢ Components used by the MapReduce framework
   ▪ Task Tracker
   ▪ Job Tracker
An HDFS cluster consists of a single NameNode, a master server that manages the
file system namespace and regulates access to files by clients.
In addition, there are a number of DataNodes, usually one per node in the cluster,
which manage storage attached to the nodes that they run on.
HDFS exposes a file system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks and these blocks are stored in a set of
DataNodes.
The NameNode executes file system namespace operations like opening, closing,
and renaming files and directories.
DataNodes are responsible for serving read and write requests from the file system’s
clients.
DataNodes also perform block creation, deletion, and replication upon instruction
from the NameNode.
HDFS is built using the Java language; any machine that supports Java can run the
NameNode or the DataNode software.
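As a small illustration of these namespace operations (not part of the original slides; the paths are made up), a client can drive the NameNode through the Java FileSystem API:

```java
// Minimal sketch, assuming a reachable HDFS and illustrative paths.
// Namespace calls (mkdirs, rename, listStatus) are handled by the NameNode.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsNamespaceDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);         // client handle to the file system

        Path dir = new Path("/user/demo/input");      // hypothetical directory
        fs.mkdirs(dir);                               // namespace operation on the NameNode
        fs.rename(dir, new Path("/user/demo/input-renamed"));

        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath() + " " + status.getLen());
        }
        fs.close();
    }
}
```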
✓ Blocks of data distributed across several machines are processed by map tasks in parallel
✓ Its only purpose is to take a snapshot of the NameNode's metadata and merge the edits log
contents into the metadata (fsimage) file on the local file system
✓ Over a period of time the edits file can become very big and the next NameNode restart can
take much longer
✓ The Secondary NameNode therefore periodically merges the edits file contents with the fsimage
file to keep the edits file size within a reasonable limit
✓ MapReduce master
✓ The client submits the job to the JobTracker
✓ The JobTracker talks to the NameNode to get the list of blocks
✓ The JobTracker locates TaskTrackers on the machines where the data is located
✓ Data Localization
✓ The JobTracker then schedules the mapper tasks first
✓ Once all the mapper tasks are over, it runs the reducer tasks (see the driver sketch below)
✓ The TaskTracker is responsible for running the tasks (map or reduce tasks) delegated by the JobTracker
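The submission flow above can be sketched with the MapReduce Job API. This is a minimal, hedged example, not taken from the slides: it uses the framework's built-in identity Mapper and Reducer and made-up input/output paths, purely to show how a client hands a job to the MapReduce master, which runs the map tasks first and the reduce tasks after all maps complete.

```java
// Minimal job-submission sketch; paths are illustrative only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJobDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "pass-through demo");
        job.setJarByClass(SubmitJobDemo.class);
        job.setMapperClass(Mapper.class);            // base Mapper = identity map
        job.setReducerClass(Reducer.class);          // base Reducer = identity reduce
        job.setOutputKeyClass(LongWritable.class);   // TextInputFormat key type (byte offset)
        job.setOutputValueClass(Text.class);         // TextInputFormat value type (the line)
        FileInputFormat.addInputPath(job, new Path("/user/demo/in"));    // illustrative paths
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/out"));
        // Blocks until all map tasks and then all reduce tasks have finished.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```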
What is heartbeat in Hadoop
[Figure: Block A, Block B, and Block C each replicated across multiple DataNodes]
✓ Ask the NN to give the list of DataNodes (DN) that are hosting the replicas of the blocks of the file
✓ The client then reads directly from the DataNodes without contacting the NN again (see the read sketch below)
✓ Along with the data, a checksum is also shipped for verifying data integrity
• If a replica is corrupt, the client informs the NN and tries to get the data from another
DataNode (DN)
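A minimal read-path sketch, assuming an illustrative file path: open() asks the NameNode for block locations, and the bytes (with their checksums, which the client verifies) then stream directly from the DataNodes.

```java
// Minimal HDFS read sketch; the path is made up for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // open() contacts the NameNode for block locations; the data itself
        // streams from DataNodes, with checksums verified on the client side.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```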
✓ What is mapper?
✓ What is reducer?
✓ What is job?
• Complete execution of mappers and reducers over the entire data set
✓ What is task?
• Single unit of execution (map or reduce task)
• Map task executes typically over a block of data (dfs.block.size)
• Reduce task works on mapper output
✓ What is “task attempt”?
• Instance of an attempt to execute a task (map or reduce task)
• If a task fails while working on a particular portion of data, another attempt
will run on that portion of data, possibly on that machine itself
• If a task fails 4 times, then the task is marked as failed and the entire job fails
• The framework makes sure that at least one attempt of the task is run on a different machine
✓ The MapReduce framework ensures that a map task is run close to the data to
avoid network traffic
• Several map tasks run in parallel on different machines, each working
on a different portion (block) of the data
✓ NOTE: all the values for a particular intermediate key go to one reducer (see the word-count sketch below)
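The classic word-count example makes the mapper and reducer roles above concrete. This is a standard sketch, not from the original slides: each map task runs over one block/split of the input and emits (word, 1) pairs, and each reducer receives all values for a given word.

```java
// Word-count sketch illustrating the map and reduce roles described above.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map task: runs once per block/split, close to the data; emits (word, 1).
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // intermediate key-value pair
            }
        }
    }

    // Reduce task: all values for a given intermediate key arrive at one reducer.
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

These classes would be wired into a job with setMapperClass/setReducerClass, as in the driver sketch shown earlier.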
Input Format                | Key                                          | Value
Text Input Format           | Offset of the line within the file           | Entire line till "\n"
Key Value Text Input Format | Part of the record till the first delimiter  | Remaining record after the first delimiter
Sequence File Input Format  | Key determined from the header               | Value determined from the header
Example:
Input file:
Hello, How are you
Hey, I am good
I am from Mphasis

Key   Value
0     Hello, How are you
1     Hey, I am good
2     I am from Mphasis
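In the driver, the input format is selected with setInputFormatClass. The sketch below is illustrative, not from the slides; the separator property name shown is the one used by KeyValueTextInputFormat in the MRv2 API.

```java
// Sketch of choosing an input format for the job; nothing is submitted here.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class InputFormatDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Each record is split at the first tab (or another delimiter) into key and value.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");

        Job job = Job.getInstance(conf, "input format demo");
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // Leaving the input format unset keeps the default TextInputFormat,
        // which delivers (byte offset, whole line) pairs as in the table above.
    }
}
```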
✓ Similar to reducer
✓ Runs on the same machine as the mapper task
✓ Runs the reducer code on the intermediate output of the mapper
✓ Thus minimizing the intermediate key-value pairs
✓ Combiner runs on intermediate output of each mapper
Advantages
✓ Minimizes data transfer across the network
✓ Speeds up execution
✓ Reduces the burden on the reducer
✓ The combiner is called after you emit your key-value pairs from the mapper with
context.write(key, value)
✓ A large number of mappers running will generate a large amount of intermediate data
• And if only one reducer is specified, then every intermediate key and its
list of values goes to that single reducer
✓ Copying will take a lot of time
✓ Sorting will also be time consuming
✓ Whether a single machine can handle that much intermediate data is doubtful; the
combiner, wired in as sketched below, cuts this volume down on the mapper side
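A minimal sketch of wiring a combiner into the job, reusing the hypothetical WordCountMapper/WordCountReducer classes from the earlier sketch (an assumption, not part of the slides); since word count's reduce function is associative and commutative, the reducer class can double as the combiner.

```java
// Driver sketch showing where the combiner plugs in; assumes the WordCount
// classes from the earlier sketch are on the classpath and the paths are made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinerDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setJarByClass(CombinerDemo.class);
        job.setMapperClass(WordCount.WordCountMapper.class);
        job.setCombinerClass(WordCount.WordCountReducer.class); // pre-aggregates on the mapper's machine
        job.setReducerClass(WordCount.WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/in"));     // illustrative paths
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```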
➢ If the Hadoop framework feels that a certain task (mapper or reducer) is taking longer on
average compared to the other tasks from the same job, it clones the "long running"
task and runs it on another node. This is called Speculative Execution.
➢ In other words, Hadoop is speculating that something is wrong with the "long running" task
and runs a clone of the task on another node
➢ The slowness of the "long running" task could be due to faulty hardware, network
congestion, or the node simply being busy, etc.
➢ Most of the time this is a false alarm and the task which was considered long
running or problematic completes successfully. In that case Hadoop kills the cloned
task and proceeds with the results from the completed task (see the configuration sketch below).
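Speculative execution can be tuned per job. The sketch below is an illustration under the assumption that the MRv2 property names mapreduce.map.speculative and mapreduce.reduce.speculative are in use; it only shows where such a switch would be set.

```java
// Sketch of toggling speculative execution for map and reduce tasks.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeExecutionDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.speculative", true);      // allow cloning slow map tasks
        conf.setBoolean("mapreduce.reduce.speculative", false);   // but not slow reduce tasks

        Job job = Job.getInstance(conf, "speculative execution demo");
        System.out.println("map speculative = "
                + job.getConfiguration().getBoolean("mapreduce.map.speculative", true));
    }
}
```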
What is YARN
✓ In MRv2 (YARN), the two major functionalities of the JobTracker are split apart
✓ Resource management and job scheduling/monitoring are handled by separate daemons (a global ResourceManager and a per-application ApplicationMaster)
What is ZooKeeper
https://en.wikipedia.org/wiki/Apache_Hadoop
https://hadoop.apache.org/
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html