05 Scheduler 141206231404 Conversion Gate01 PDF

Scheduling
Hadoop as batch processing system

Hadoop was designed mainly for running large batch jobs such as web indexing and log mining. Users submitted jobs to a queue, and the cluster ran them in order.
Soon, another use case became attractive: sharing a MapReduce cluster between multiple users.
The benefits of sharing

With all the data in one place, users can run queries that they might never have been able to execute otherwise.
Costs go down because system utilization is higher than when each group builds its own separate Hadoop cluster.
However, sharing requires support from the Hadoop job scheduler to provide guaranteed capacity to production jobs and good response time to interactive jobs, while allocating resources fairly between users.
Approaches to Sharing
FIFO: In FIFO scheduling, the JobTracker pulls jobs from a work queue, oldest job first. This scheduler had no concept of the priority or size of a job.
Fair: Assign resources to jobs such that, on average over time, each job gets an equal share of the available resources. As a result, jobs that need less time can get task slots and finish while longer jobs are still executing. This allows some interactivity among Hadoop jobs and makes the cluster more responsive to the variety of job types submitted.
Capacity: In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (the overall capacity of the cluster is the sum of the queues' capacities). Queues are monitored; if a queue is not consuming its allocated capacity, the excess capacity can be temporarily allocated to other queues.
[Figure: FIFO scheduling of jobs from a single job queue]
Hadoop default scheduler (FIFO)

Problem: short jobs get stuck behind long ones
Separate clusters
Problem 1: poor utilization
Problem 2: costly data replication
Full replication across clusters nearly infeasible at
Facebook/Yahoo! scale
Partial replication prevents cross-dataset queries
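To make "short jobs get stuck behind long ones" concrete, here is a minimal Python sketch (not Hadoop code). It treats the whole cluster as one divisible resource, an assumption made for illustration, and compares job completion times under FIFO with idealized fair sharing, where all unfinished jobs split the cluster equally. The function names are hypothetical.

```python
def fifo_completion(durations):
    """Completion times when jobs run one after another in submission order."""
    t, finished = 0.0, []
    for d in durations:
        t += d
        finished.append(t)
    return finished


def fair_completion(durations):
    """Completion times when all unfinished jobs share the cluster equally."""
    remaining = list(durations)
    finished = [0.0] * len(durations)
    active = set(range(len(durations)))
    t = 0.0
    while active:
        # The job with the least remaining work finishes next.
        work = min(remaining[i] for i in active)
        # Each of the n active jobs runs at rate 1/n, so that work takes work * n time.
        t += work * len(active)
        for i in list(active):
            remaining[i] -= work
            if remaining[i] <= 1e-12:
                finished[i] = t
                active.remove(i)
    return finished


jobs = [100, 1, 1]  # one long job submitted just before two short ones
print(fifo_completion(jobs))  # [100.0, 101.0, 102.0]
print(fair_completion(jobs))  # [102.0, 3.0, 3.0]
```

Under fair sharing the two short jobs finish at time 3 instead of 101 and 102, while the long job still finishes at 102.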
[Figure: fair scheduling of jobs from a job queue]
Fair Scheduler Basics

Group jobs into pools
Assign each pool a guaranteed minimum share
Divide excess capacity evenly between pools
Pools
Determined from a configurable job property
Default in 0.20: user.name (one pool per user)
Pools have properties:

Minimum map slots
Minimum reduce slots
Limit on # of running jobs
Example Pool Allocations
[Figure: an entire cluster of 100 slots shared by pools emp1, emp2, emp6 and finance. Two pools have min shares of 30 and 40 slots; jobs 1, 2, 3 and 4 receive 30, 15, 15 and 40 slots respectively.]
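The example allocation on this slide (four jobs receiving 30, 15, 15 and 40 of 100 slots) can be reproduced with a small sketch. It assumes each pool holds one job that wants more slots than it can get, and it interprets "divide excess capacity evenly" as water-filling: every pool gets max(min share, r), capped at its demand, with one common level r chosen so the shares add up to the cluster. Splitting the 30 spare slots naively (7.5 per pool) would not give the slide's numbers. The pool names below are hypothetical, since the slide does not say which pool holds which job.

```python
def fair_shares(capacity, pools, iterations=100):
    """Split `capacity` slots among pools.

    pools maps a pool name to (min_share, demand). Each pool gets
    min(demand, max(min_share, r)), where the common level r is chosen so the
    shares add up to the capacity. Assumes the min shares fit in the capacity.
    """
    if sum(demand for _, demand in pools.values()) <= capacity:
        return {name: float(demand) for name, (_, demand) in pools.items()}

    def slots_used(r):
        return sum(min(demand, max(min_share, r)) for min_share, demand in pools.values())

    lo, hi = 0.0, float(capacity)
    for _ in range(iterations):  # binary search for the level r
        mid = (lo + hi) / 2
        if slots_used(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return {name: round(float(min(demand, max(min_share, hi))), 6)
            for name, (min_share, demand) in pools.items()}


pools = {"job1_pool": (30, 100), "job2_pool": (0, 100),
         "job3_pool": (0, 100), "job4_pool": (40, 100)}
print(fair_shares(100, pools))
# {'job1_pool': 30.0, 'job2_pool': 15.0, 'job3_pool': 15.0, 'job4_pool': 40.0}
```

The level settles at r = 15: the two pools whose min shares exceed 15 keep their guarantees, and the other two split what is left.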
Scheduling Algorithm
Split each pool's min share among its jobs
Split each pool's total share among its jobs
When a slot needs to be assigned:
If there is any job below its min share, schedule it
Else schedule the job that we've been most unfair to (based on deficit)
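The slot-assignment rule can be sketched in Python as below. This is an illustration under stated assumptions, not the Fair Scheduler's source: the `Job` fields are hypothetical, the deficit is modeled as (fair share minus running tasks) accumulated over time, and when several jobs are below their min share the sketch serves the one furthest below first, a choice the slide does not specify.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    running: int          # tasks currently running
    min_share: float      # this job's part of its pool's min share
    fair_share: float     # this job's part of its pool's total share
    deficit: float = 0.0  # accumulated (fair_share - running) over time


def update_deficits(jobs, dt):
    """Grow each job's deficit by how far it ran below its fair share for dt seconds."""
    for job in jobs:
        job.deficit += (job.fair_share - job.running) * dt


def pick_job(jobs):
    """Choose the job that should receive a free slot."""
    needy = [j for j in jobs if j.running < j.min_share]
    if needy:
        # Assumption: among jobs below their min share, serve the furthest below first.
        return max(needy, key=lambda j: j.min_share - j.running)
    # Otherwise pick the job we have been most unfair to.
    return max(jobs, key=lambda j: j.deficit, default=None)


jobs = [Job("a", running=10, min_share=10, fair_share=20),
        Job("b", running=20, min_share=0, fair_share=25)]
update_deficits(jobs, dt=1.0)  # a gains 10, b gains 5
print(pick_job(jobs).name)     # a
```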
Scheduler Dashboard
Change priority
FIFO mode (for testing)
Change pool
Additional Features
Weights for unequal sharing:
Job weights based on priority (each level = 2x)
Job weights based on size
Pool weights
Limits for # of running jobs:

Per user
Per pool
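A rough sketch of how weights could turn into unequal shares. The priority names follow Hadoop's `JobPriority` values, and the weights apply the slide's "each level = 2x" with NORMAL = 1.0. The size-based factor log2(1 + tasks) is an assumption for illustration; the slide only says weights can depend on size. The same proportional split applies to pool weights.

```python
import math

# Each priority level is worth twice the level below it (NORMAL = 1.0).
PRIORITY_WEIGHT = {"VERY_LOW": 0.25, "LOW": 0.5, "NORMAL": 1.0,
                   "HIGH": 2.0, "VERY_HIGH": 4.0}


def job_weight(priority, pending_tasks=None):
    """Weight from priority, optionally scaled by job size."""
    weight = PRIORITY_WEIGHT[priority]
    if pending_tasks is not None:
        # Assumed size-based factor: grows with the log of the task count.
        weight *= math.log2(1 + pending_tasks)
    return weight


def split_by_weight(slots, weights):
    """Divide slots among jobs (or pools) in proportion to their weights."""
    total = sum(weights.values())
    return {name: slots * w / total for name, w in weights.items()}


weights = {"etl": job_weight("HIGH"), "adhoc": job_weight("NORMAL")}
print(split_by_weight(60, weights))  # {'etl': 40.0, 'adhoc': 20.0}
```

Using a logarithm keeps a job with thousands of tasks from dwarfing small jobs, while still giving it somewhat more than an equal share.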
Installing the Fair Scheduler

Build it:
ant package
Place it on the classpath:
cp build/contrib/fairscheduler/*.jar lib
Configuration Files
Hadoop config (conf/mapred-site.xml)
Contains scheduler options, pointer to pools file
Pools file (pools.xml)

Contains min share allocations and limits on pools
Reloaded every 15 seconds at runtime
Minimal mapred-site.xml
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/path/to/pools.xml</value>
</property>
Minimal pools.xml
<?xml version="1.0"?>
<allocations>
</allocations>
Configuring a Pool
<allocations>
">
<pool name="emp4">
<minMaps>10</minMaps>
<minReduces>5</minReduces>
</pool>
</allocations>
Setting Running Job Limits

<allocations>
">
<pool name="emp4">
<maxRunningJobs>3</maxRunningJobs>
</pool>
<user name="emp1">
</user>
</allocations>
Default Per-User Running Job Limit

<allocations>
">
<pool name="emp4">
</pool>
<user name="emp1">
</user>
<userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>
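The running-job limits above can be modeled with a short Python sketch that parses an allocations file and decides whether another job may start. The per-user value of 1 for emp1 is made up for illustration (the slides leave that element empty), and treating a pool without `maxRunningJobs` as unlimited is an assumption. A job is allowed to run only if both its user and its pool are under their limits.

```python
import math
import xml.etree.ElementTree as ET

ALLOCATIONS = """<?xml version="1.0"?>
<allocations>
  <pool name="emp4"><maxRunningJobs>3</maxRunningJobs></pool>
  <user name="emp1"><maxRunningJobs>1</maxRunningJobs></user>
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>"""


def load_limits(xml_text):
    """Return (pool limits, user limits, default per-user limit)."""
    root = ET.fromstring(xml_text)

    def limits(tag):
        return {e.get("name"): int(e.findtext("maxRunningJobs"))
                for e in root.findall(tag) if e.findtext("maxRunningJobs")}

    default = root.findtext("userMaxJobsDefault")
    return limits("pool"), limits("user"), int(default) if default else math.inf


def can_start(user, pool, running_by_user, running_by_pool, limits):
    """True if one more job for this user and pool stays within both limits."""
    pool_limits, user_limits, user_default = limits
    user_cap = user_limits.get(user, user_default)
    pool_cap = pool_limits.get(pool, math.inf)  # assumption: no limit means unlimited
    return (running_by_user.get(user, 0) < user_cap
            and running_by_pool.get(pool, 0) < pool_cap)


limits = load_limits(ALLOCATIONS)
print(can_start("emp1", "emp4", {"emp1": 1}, {"emp4": 1}, limits))  # False
print(can_start("emp2", "emp4", {"emp2": 9}, {"emp4": 2}, limits))  # True
```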
Other Parameters
mapred.fairscheduler.assignmultiple:
Assign a map and a reduce on each heartbeat;
improves ramp-up speed and throughput;
recommendation: set to true
Other Parameters
mapred.fairscheduler.poolnameproperty:
Which JobConf property sets what pool a job is in
- Default: user.name (one pool per user)
- Can make up your own, e.g. pool.name, and pass it in the JobConf with conf.set("pool.name", "mypool")
Useful Setting
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>
<property>
<name>pool.name</name>
<value>${user.name}</value>
</property>
Makes pool.name default to user.name
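The "useful setting" relies on Hadoop configuration variable expansion: `${user.name}` is replaced by the submitting user's name unless the job sets `pool.name` itself. The Python sketch below models that lookup with plain dictionaries. It is a simplified, single-pass model, not Hadoop's `Configuration` class; the function names are hypothetical, and the fallback pool name "default" is an assumption.

```python
import re

VAR = re.compile(r"\$\{([^}$\s]+)\}")


def expand(value, conf, system_props):
    """Replace ${name} using conf first, then system properties (single pass)."""
    def lookup(m):
        key = m.group(1)
        if key in conf:
            return conf[key]
        return system_props.get(key, m.group(0))  # leave unknown variables as-is
    return VAR.sub(lookup, value)


def pool_for_job(job_conf, site_conf, system_props, pool_property="pool.name"):
    """Pool a job lands in: its own setting, else the site default, expanded."""
    raw = job_conf.get(pool_property, site_conf.get(pool_property, "default"))
    return expand(raw, {**site_conf, **job_conf}, system_props)


site = {"mapred.fairscheduler.poolnameproperty": "pool.name",
        "pool.name": "${user.name}"}
sysprops = {"user.name": "emp1"}
print(pool_for_job({}, site, sysprops))                        # emp1
print(pool_for_job({"pool.name": "finance"}, site, sysprops))  # finance
```

Jobs that do not set `pool.name` therefore land in a per-user pool, which preserves the 0.20 default while letting a job opt into a shared pool such as finance.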
Issues with Fair Scheduler

Fine-grained sharing at the level of map and reduce tasks
Predictable response times and user isolation
Problem: data locality
For efficiency, tasks must run near their input data
Strictly following any job queuing policy hurts locality: the job picked by the policy may not have data on the free nodes
Solution: delay scheduling
Relax the queuing policy for a limited time to achieve locality
The Problem
[Figure: a master schedules Job 1 and Job 2, in that order, onto six slaves that store the blocks of File 1 and File 2]
The Problem
[Figure: the same cluster with the scheduling order changed to Job 2, Job 1]
Problem: Fair decision hurts locality

Especially bad for jobs with small input files
Solution: Delay Scheduling

Relax queuing policy to make jobs wait for a limited
time if they cannot launch local tasks
Result: Very short wait time (1-5s) is enough to get
nearly 100% locality
Delay Scheduling Example
[Figure: delay scheduling example; with scheduling order Job 2, Job 1, the master tells a job to "Wait!" instead of launching a task on a slave that does not hold its input from File 1 or File 2]
Idea: Wait a short time to get data-local scheduling opportunities
Delay Scheduling Details
Scan jobs in the order given by the queuing policy, picking the first that is permitted to launch a task
Jobs must wait before being permitted to launch non-local tasks
If wait < T1, only allow node-local tasks
If T1 < wait < T2, also allow rack-local tasks
If wait > T2, also allow off-rack tasks
Increase a job's time waited when it is skipped
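The rules above fit in a few lines of Python. This is a minimal sketch, not Hadoop's implementation: the `Job` fields are hypothetical, each skip adds the time elapsed since the previous scheduling opportunity (`elapsed`), and resetting a job's wait to zero after it launches a task is an assumption the slide does not state.

```python
from dataclasses import dataclass

NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


@dataclass
class Job:
    name: str
    data_nodes: set    # nodes holding input blocks of its pending tasks
    data_racks: set    # racks holding those blocks
    wait: float = 0.0  # time waited since it last launched a task


def allowed_level(wait, t1, t2):
    """Most distant locality level a job may use after waiting `wait` seconds."""
    if wait < t1:
        return NODE_LOCAL
    if wait < t2:
        return RACK_LOCAL
    return OFF_RACK


def locality(job, node, rack):
    if node in job.data_nodes:
        return NODE_LOCAL
    if rack in job.data_racks:
        return RACK_LOCAL
    return OFF_RACK


def assign_slot(jobs, node, rack, t1, t2, elapsed):
    """Offer a free slot on `node` to jobs in queuing-policy order."""
    for job in jobs:
        if locality(job, node, rack) <= allowed_level(job.wait, t1, t2):
            job.wait = 0.0  # assumption: reset after launching a task
            return job
        job.wait += elapsed  # skipped: its time waited grows
    return None


job = Job("job1", data_nodes={"n1"}, data_racks={"r1"})
for _ in range(4):  # repeated free slots on node n9, rack r3
    chosen = assign_slot([job], node="n9", rack="r3", t1=1.0, t2=3.0, elapsed=1.0)
    print(chosen.name if chosen else None)  # None, None, None, job1
```

Because skipped jobs accumulate wait while earlier-ranked local jobs keep running, a brief delay is usually enough for a slot on a node holding the job's data to free up.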
Capacity Scheduler
Organizes jobs into queues

Queue shares as %s of cluster
FIFO scheduling within each queue
Supports preemption
Queues are monitored; if a queue is not consuming its
allocated capacity, this excess capacity can be temporarily
allocated to other queues.
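The capacity rule (guaranteed percentages, with idle capacity lent to busy queues) can be sketched as follows. Lending spare slots in proportion to each hungry queue's configured percentage is an assumption for illustration; the sketch also ignores preemption and the FIFO order inside each queue. Queue names are hypothetical.

```python
def allocate(total_slots, queues):
    """Give each queue min(demand, guaranteed capacity), then lend idle capacity.

    queues maps a queue name to (capacity_percent, demand_in_slots).
    Assumption: spare slots go to queues with unmet demand in proportion
    to their configured capacities.
    """
    alloc = {q: float(min(demand, total_slots * pct / 100))
             for q, (pct, demand) in queues.items()}
    spare = total_slots - sum(alloc.values())
    while spare > 1e-9:
        hungry = {q: pct for q, (pct, demand) in queues.items() if alloc[q] < demand}
        weight = sum(hungry.values())
        if weight == 0:
            break  # nobody with capacity wants more; slots stay idle
        lent = 0.0
        for q, pct in hungry.items():
            extra = min(spare * pct / weight, queues[q][1] - alloc[q])
            alloc[q] += extra
            lent += extra
        if lent <= 0:
            break
        spare -= lent
    return alloc


queues = {"prod": (50, 10), "research": (30, 60), "adhoc": (20, 40)}
print(allocate(100, queues))  # {'prod': 10.0, 'research': 54.0, 'adhoc': 36.0}
```

Here prod needs only 10 of its guaranteed 50 slots, so its 40 idle slots are lent to research and adhoc; with preemption, prod could reclaim them once its demand grows.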
End of session
Day 1: Scheduling
