Agenda
MR1 limitations
YARN architecture
MR1 limitations
Scalability:
- Maximum cluster size: ~4,500 nodes.
- Maximum concurrent tasks: 40,000.
Availability:
- A JobTracker failure kills all queued and running jobs.
Inflexible slots:
- Fixed map and reduce slots lead to low resource utilization.
Hadoop Versions
Beyond MR
Applications Run Natively in Hadoop
- BATCH (MapReduce)
- INTERACTIVE (Tez)
- ONLINE (HBase)
- STREAMING (Storm, S4, ...)
- GRAPH (Giraph)
- OTHER (Search, Weave)
YARN Components - RM
ResourceManager (RM): the master that arbitrates cluster resources among all applications. Its internal components:
- Scheduler
- AM Liveliness Monitor
- NM Liveliness Monitor
- ApplicationsManager
YARN Components - NM
NodeManager (NM): the per-machine agent responsible for launching and monitoring containers. Its internal components:
- NodeStatusUpdater
- ContainerManager
- ContainerExecutor
YARN Components - AM
Application Master (AM): a per-application process that negotiates resources from the RM and works with the NMs to execute and monitor the application's containers.
ContainerLaunchContext
The ApplicationMaster has to provide additional information to the NodeManager to actually launch the container:
- The command line to launch the process within the container.
- Environment variables, local resources necessary on the machine prior to launch, etc.
ApplicationSubmissionContext
- Application ID.
- Application User.
- Application Name.
- Application Priority.
- ContainerLaunchContext (for the container that will run the AM).
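As a rough sketch of how these fields are populated through the YarnClient API: the ResourceManager assigns the Application ID when the application is created, and the user is taken from the submitting process. The name "MyApp", the 1024 MB / 1 vcore AM container size, and the launch_am.sh script are illustrative placeholders, not values from the slides.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // The RM assigns the Application ID inside this context.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

        appContext.setApplicationName("MyApp");                // Application Name
        appContext.setPriority(Priority.newInstance(0));       // Application Priority
        appContext.setResource(Resource.newInstance(1024, 1)); // AM container size
        // ContainerLaunchContext for the AM's own container (placeholder command).
        appContext.setAMContainerSpec(ContainerLaunchContext.newInstance(
            Collections.emptyMap(), Collections.emptyMap(),
            Collections.singletonList("/path/to/launch_am.sh"), null, null, null));

        yarnClient.submitApplication(appContext); // hand the context to the RM
    }
}
```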
ContainerLaunchContext:
- Container ID.
- Resource allocated to the container.
- User to whom the container is allocated.
- Security tokens and local resources.
- Environment variables.
- Command to launch the container.
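On the API side, the AM fills in the launch-specific fields (local resources, environment variables, commands, tokens); the container ID, its resource, and the user come from the RM's allocation rather than from this object. A minimal sketch, with a placeholder /bin/date command and output paths:

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public class ClcSketch {
    static ContainerLaunchContext buildClc() {
        return ContainerLaunchContext.newInstance(
            Collections.emptyMap(),   // local resources to localize before launch
            Collections.emptyMap(),   // environment variables for the process
            Collections.singletonList("/bin/date 1>/tmp/stdout 2>/tmp/stderr"),
            null,                     // service data for auxiliary services
            null,                     // security tokens
            null);                    // application ACLs
    }
}
```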
[Diagram: submission flow: the ResourceManager directs a NodeManager to launch the MyApp Application Master in a container.]
ResourceRequest
It has the following form:
<resource-name, priority, resource-requirement, number-of-containers>
- resource-name is either a hostname, a rackname, or * to indicate no preference. In the future, we expect to support even more complex topologies for virtual machines on a host, more complex networks, etc.
- priority is the intra-application priority for this request (to stress, this isn't across multiple applications).
- resource-requirement is the required capabilities such as memory, CPU, etc. (at the time of writing YARN only supports memory and CPU).
- number-of-containers is just a multiple of such containers.
Resource Request fields:
- Request priority.
- Name of the machine or rack (* to signify any machine or rack).
- Resource required for each request.
- Number of containers.
- A boolean relax-locality flag.
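As an illustrative sketch using the AMRMClient helper library: the 1024 MB / 1 vcore capability, the priority value, and the two-container loop are assumptions for the example, not required values.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ResourceRequestSketch {
    static void requestContainers(AMRMClient<ContainerRequest> amrmClient) {
        Resource capability = Resource.newInstance(1024, 1); // 1024 MB, 1 vcore
        Priority priority = Priority.newInstance(1);         // intra-application priority
        String[] nodes = null;        // no specific host preference
        String[] racks = null;        // no specific rack preference
        boolean relaxLocality = true; // the relax-locality flag
        // One ContainerRequest per desired container; two containers here.
        for (int i = 0; i < 2; i++) {
            amrmClient.addContainerRequest(
                new ContainerRequest(capability, nodes, racks, priority, relaxLocality));
        }
        // The requests are carried to the RM on the next allocate() heartbeat.
    }
}
```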
[Diagram: the AM sends ResourceRequests to the ResourceManager and receives the allocated Containers: container IDs and the nodes they were placed on.]
AM & NM Communication
- Start containers by sending the CLC.
- Request container status.
- Status response.
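A minimal sketch of both steps using the NMClient helper library: the wrapper class and method are illustrative, while startContainer and getContainerStatus are the actual NMClient calls; the Container object comes from an earlier RM allocation.

```java
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.NMClient;

public class AmNmSketch {
    // 'container' comes from an RM allocation; 'clc' is built by the AM.
    static ContainerStatus launchAndPoll(NMClient nmClient, Container container,
                                         ContainerLaunchContext clc) throws Exception {
        // Start the container by sending its CLC to the hosting NodeManager.
        nmClient.startContainer(container, clc);
        // Request the container's status; the NM answers with a status response.
        return nmClient.getContainerStatus(container.getId(), container.getNodeId());
    }
}
```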
[Diagram: the AM starts and monitors containers across several NodeManagers.]
[Diagram: the AM keeps a continuous heartbeat with the ResourceManager, containers report status updates, and completed-job information is kept by the Job History Server.]
Containers Configurations
- yarn.scheduler.minimum-allocation-mb (default 1024): minimum memory, in MB, for every container request at the RM.
- yarn.scheduler.maximum-allocation-mb (default 8192): maximum memory, in MB, for every container request at the RM.
- yarn.scheduler.minimum-allocation-vcores (default 1): minimum number of virtual cores for every container request at the RM.
- yarn.scheduler.maximum-allocation-vcores (default 32): maximum number of virtual cores for every container request at the RM.
Containers Configurations
- yarn.nodemanager.resource.memory-mb (default 8192): amount of physical memory that can be allocated for containers.
- yarn.nodemanager.resource.cpu-vcores (default 8): number of CPU vcores that can be allocated for containers.
- yarn.nodemanager.vmem-pmem-ratio (default 2.1): ratio of virtual to physical memory for containers.
Containers Configuration Recommendations

Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory
4 GB    | 1 GB  | 1 GB
8 GB    | 2 GB  | 1 GB
16 GB   | 2 GB  | 2 GB
24 GB   | 4 GB  | 4 GB
48 GB   | 6 GB  | 8 GB
64 GB   | 8 GB  | 8 GB
72 GB   | 8 GB  | 8 GB
96 GB   | 12 GB | 16 GB
128 GB  | 24 GB | 24 GB
256 GB  | 32 GB | 32 GB
512 GB  | 64 GB | 64 GB
Containers Configuration Recommendations

Total RAM per Node | Recommended Minimum Container Size
< 4 GB | 256 MB
Between 4 GB and 8 GB | 512 MB
Between 8 GB and 24 GB | 1024 MB
Above 24 GB | 2048 MB
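For example, combining the two tables above for a 48 GB node that also runs HBase: reserve 6 GB for the system and 8 GB for HBase, leaving 48 - 6 - 8 = 34 GB (34816 MB) for yarn.nodemanager.resource.memory-mb; with the recommended 2048 MB minimum container size, that allows at most 34816 / 2048 = 17 containers per node.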
MapReduce on YARN
[Diagram: the ResourceManager launches the MRAppMaster in a container on one NodeManager; the map and reduce tasks then run in containers on other NodeManagers.]
MapReduce Configurations
yarn-site.xml:
- yarn.resourcemanager.hostname (default 0.0.0.0): hostname of the ResourceManager.
- yarn.nodemanager.aux-services: auxiliary services run by the NodeManager; must include mapreduce_shuffle to run MapReduce jobs.
mapred-site.xml:
- mapreduce.framework.name (default local): the runtime framework for executing MapReduce jobs. Can be one of local, classic, or yarn.
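As a minimal client-side sketch of these settings, done programmatically rather than in the XML files (rm.example.com is a placeholder hostname):

```java
import org.apache.hadoop.conf.Configuration;

public class MrOnYarnConfigSketch {
    static Configuration clientConf() {
        Configuration conf = new Configuration();
        // Run MapReduce jobs on YARN rather than the default local runner.
        conf.set("mapreduce.framework.name", "yarn");
        // Placeholder hostname: point clients at the ResourceManager.
        conf.set("yarn.resourcemanager.hostname", "rm.example.com");
        return conf;
    }
}
```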
MapReduce Configurations
- mapreduce.map.memory.mb (default 1024): amount of memory to request from the scheduler for each map task.
- mapreduce.map.cpu.vcores (default 1): number of virtual cores to request from the scheduler for each map task.
- mapreduce.reduce.memory.mb (default 1024): amount of memory to request from the scheduler for each reduce task.
- mapreduce.reduce.cpu.vcores (default 1): number of virtual cores to request from the scheduler for each reduce task.
MapReduce Configurations
- mapred.child.java.opts (default -Xmx200m): JVM options for the map and reduce task child processes; overridden by the two settings below when they are set.
- mapreduce.map.java.opts (default -Xmx200m): JVM options for map task processes.
- mapreduce.reduce.java.opts (default -Xmx200m): JVM options for reduce task processes.
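A common rule of thumb, an assumption here rather than anything Hadoop mandates, is to size the JVM heap at roughly 80% of the container memory so that non-heap usage does not push the process past the limit the NodeManager enforces. An illustrative sketch with made-up sizes:

```java
import org.apache.hadoop.conf.Configuration;

public class TaskMemorySketch {
    static void sizeTasks(Configuration conf) {
        // Container sizes requested from the scheduler (illustrative values).
        conf.setInt("mapreduce.map.memory.mb", 1536);
        conf.setInt("mapreduce.reduce.memory.mb", 3072);
        // Heaps at ~80% of the container size leave headroom for JVM
        // overhead; a container that exceeds its limit is killed by the NM.
        conf.set("mapreduce.map.java.opts", "-Xmx1228m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx2457m");
    }
}
```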
Shuffle Services
- Required for parallel MapReduce job operation: reducers fetch the output of all the maps by shuffling map output data from the nodes where the map tasks ran.
- Implemented as an auxiliary service in the NodeManager: the NodeManager starts a Netty web server in its address space that knows how to handle MapReduce-specific shuffle requests.
- Hadoop 2.0 adds Encrypted Shuffle, which runs the shuffle over HTTPS with optional client authentication.
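As a sketch, the shuffle auxiliary service is normally enabled in yarn-site.xml on every NodeManager; it is shown here programmatically for brevity. The property names and the ShuffleHandler class are the stock MapReduce ones; enabling SSL is optional.

```java
import org.apache.hadoop.conf.Configuration;

public class ShuffleServiceSketch {
    static void enableShuffle(Configuration conf) {
        // Register the MapReduce shuffle as a NodeManager auxiliary service.
        conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
        conf.set("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
                 "org.apache.hadoop.mapred.ShuffleHandler");
        // Optional: switch the shuffle to HTTPS (Encrypted Shuffle).
        conf.setBoolean("mapreduce.shuffle.ssl.enabled", true);
    }
}
```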