
Job Scheduling in a High Performance Computing Environment
Robert C. Jackson
University of Texas-Pan American
rjack@utpa.edu

Abstract

With the evolution of higher-speed computational machines, clusters, and grids, the need for resource managers and job schedulers to manage tasks is at a premium. The original Portable Batch System (PBS) has evolved from OpenPBS into many variants that employ various techniques, mechanisms, and algorithms to maximize the effective use of these resources. This survey discusses job scheduling and demonstrates a few capabilities of popular scheduling products.

1. Introduction: Current Scheduling Trends and the Importance of Resource Management


Scheduling refers to the way processes are assigned to run on available processors. Job schedulers and resource managers are used on High Performance Computing (HPC) clusters and grids, which have multiple computers or nodes available as execution hosts. Not to be confused with OS process schedulers, job schedulers accept job submissions for execution on these execution hosts. A job scheduler in its most basic form can be represented as a queue with a single server (an M/M/1 queue) [4], where λ represents the rate at which customers (jobs) arrive and are placed into the queue, and μ is the rate at which the server services them. With these factors in mind, the system works best if λ < μ. This is also the case with the practical job scheduler: jobs need to be serviced in a timely manner. Another important feature of the job scheduler or resource manager is dispatching submissions to the required or requested resources according to scheduler policy, dictated by the scheduler's configuration and setup. This policy may also determine how and where results are formatted and retrieved. Using multiple policy-based schedulers is the current trend of resource management for High Performance Computing clusters, grids, and cloud computing.
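To make the queueing picture concrete, the following is a minimal discrete-event sketch of an M/M/1 job queue in Python; it is illustrative only, and all names in it are assumptions rather than part of any scheduler's API. It draws exponential inter-arrival times with rate lam and service times with rate mu, and reports the mean wait, which stays bounded only when lam < mu.

    import random

    def simulate_mm1(lam, mu, n_jobs=100_000, seed=1):
        """Simulate an M/M/1 FIFO queue; return the mean wait per job."""
        rng = random.Random(seed)
        clock = 0.0           # arrival clock
        server_free_at = 0.0  # time the single server next becomes idle
        total_wait = 0.0
        for _ in range(n_jobs):
            clock += rng.expovariate(lam)       # next job arrives
            start = max(clock, server_free_at)  # waits if server is busy
            total_wait += start - clock
            server_free_at = start + rng.expovariate(mu)  # service time
        return total_wait / n_jobs

    # Stable system (lam < mu) vs. an overloaded one:
    print(simulate_mm1(lam=0.8, mu=1.0))  # modest, bounded mean wait
    print(simulate_mm1(lam=1.2, mu=1.0))  # wait grows with the backlog

For λ = 0.8 and μ = 1.0, queueing theory predicts a mean wait of λ/(μ(μ − λ)) = 4, which the stable simulation should reproduce closely.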

2. Background: Scheduling Policy

Important to the optimal usage of HPC resources are the commodity CPU cycles, compute cycles, or allocations, which must not be wasted. Schedulers are aware of CPUs, nodes, and generally the time it takes to run parallel jobs. Speedup, the main motivation for using HPC resources, is described by Amdahl's law:

S = 1 / ((1 - P) + P/N)

where S is the speedup, P is the parallelizable fraction of the program, (1 - P) is the serial fraction, and N is the number of processors; for example, with P = 0.9 and N = 8000, S ≈ 10. Job schedulers and resource managers whose policy is not configured beyond its default values [6], or is not used correctly, can affect performance slightly or greatly. Several scheduling policies are currently used:

First Come First Served (FCFS/FIFO): a convenient, natural selection policy that provides a norm for the comparison of other policies. It handles all arriving jobs in single file, the first job first and the last job last. Drawbacks of FCFS processing are that 1) throughput can be low, because large jobs can hog resources, and 2) deadlines may not be met. Scheduling overhead is low and no starvation occurs, although wait times may be high.

Fair Share (FS): resource usage is distributed equally among users and groups rather than among processes. For example, with 4 users w, x, y, and z each concurrently executing 1 process, a scheduler operating in FS mode divides resources equally among the 4 users, giving each 25% of the whole (100% / 4 = 25%). If user x starts another process, each user still receives 25%, but each of user x's two processes uses 12.5%.

Round Robin (RR): every user in turn is given access to resources. This policy has low overhead, is easy to implement, and is starvation-free. An example of RR operation would assign resources to jobs for 100 ms at a time: job x takes a turn, and after 100 ms it is job y's turn. If job x does not complete its computation by the end of its 100 ms run, it is checkpointed and terminated so job y can take its turn; on job x's next turn it resumes its previously started computation.

Shortest Job First (SJF): tasks with the least estimated processing time are scheduled to run next. This method maximizes job throughput and minimizes wait time, but it has the potential to starve jobs with large processing times if jobs with short processing times are continuously added to the schedule. Aging is used to estimate the running times of queued jobs.

Priority Based (PB): the scheduler arranges jobs according to an assigned priority, with the highest-priority jobs getting access to resources first. Overhead is at a maximum, and starvation is possible.

Multilevel (ML): this scheduler class uses multiple queues to service tasks. Jobs are divided into different groups or categories and then supplied to a queue specifically maintained to support their resource needs.

Pre-emption (PE): this policy allows the scheduler to interrupt, checkpoint, and release the resources of a running job to another job. An example would be a hard limit of 5 on the number of jobs a specific user can run: when user1 submits a 6th job, that job stands the chance of being pre-empted if user2's job enters the queue and needs the resources currently used by user1's 6th job.

Backfilling (BF): allows the harvesting of idle compute cycles, giving those resources to lower-priority jobs whose needs fit those resources.

Service Level Agreement (SLA): this policy class determines the terms of service between jobs and resources. Specific agreements are made between jobs and resources as to how jobs will be serviced and which services will be accessed. It is a good policy for jobs that have a run deadline.

Global (GL): policy that is applicable throughout all scheduler resources.

Local (LOC): policy that is applied to specific scheduler resources and may or may not apply globally. An example would be a multi-queue scheduler consisting of 6 queues, where queues 1-3 operate as FCFS queues and queues 4-6 operate as priority-based queues.

Proposed Algorithm, Weighted: this scheduler policy assigns an estimated weight to jobs based on submission attributes such as walltime, memory required, number of CPUs requested, and a signature from similar runs over time. This weight determines what resources are made available to jobs, above or below the requested resources, to improve turnaround time, prevent starvation, and prevent system hogging. Jobs are reweighted on exit according to performance metrics determined by the run, so the weight value is learned and properly applied to arriving jobs. An example follows: J0 + Wxi = U0x, where J0 is the arriving job, Wxi is the weighted value added to the job index, and U0x is the value of the job, including the weight, as the job goes to run. Wxf = U0x ± V0x, where Wxf is the updated value for Wxi after the job has completed its run and V0x is the learned adjustment used to reset the weighted value.
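The following is a minimal sketch of how such a weighted policy might be implemented. The paper gives no implementation, so the attribute names, the weighting formula, and the learning rate here are all assumptions chosen only to illustrate the weigh-then-reweigh cycle.

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        walltime: float   # requested hours (assumed attribute)
        memory_gb: float  # requested memory (assumed attribute)
        cpus: int         # requested CPUs (assumed attribute)
        weight: float = 0.0

    learned = {}  # learned per-signature adjustments (V0x in the text's notation)

    def signature(job):
        # Group similar runs; this coarse bucketing is an assumption.
        return (round(job.walltime), round(job.memory_gb), job.cpus)

    def weigh(job):
        # U0x: initial weight from submission attributes plus learned history.
        base = job.walltime + 0.1 * job.memory_gb + 0.5 * job.cpus  # assumed formula
        job.weight = base + learned.get(signature(job), 0.0)
        return job.weight

    def reweigh(job, actual_walltime, rate=0.5):
        # Wxf = U0x +/- V0x: after the run, nudge the learned adjustment
        # toward observed behavior so future similar jobs weigh in better.
        error = actual_walltime - job.walltime
        sig = signature(job)
        learned[sig] = learned.get(sig, 0.0) + rate * error

    # Dispatch lightest-weight jobs first (the tie to resources is assumed).
    queue = [Job("a", walltime=4, memory_gb=16, cpus=8),
             Job("b", walltime=1, memory_gb=2, cpus=1)]
    queue.sort(key=weigh)
    print([j.name for j in queue])  # -> ['b', 'a']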

3. Popular Schedulers
Some common schedulers are:

PBS: the Portable Batch System, developed by NASA in 1993 to replace the NQS scheduler used on Cray supercomputers. Features of this scheduler are multi-queue capability, several scheduling algorithms, and feature-based scheduling. The Torque, MAUI, and PBSPro schedulers are descendants of PBS. OpenPBS cannot provide adequate service to large clusters, and PBS is no longer supported.

LSF: the Load Sharing Facility, made by Platform Computing (www.platform.com), is an in-depth resource manager employing several scheduling policies, including FCFS, fair share, pre-emption, backfilling, and SLA. Multi-queuing is also available. LSF is capable of scheduling all cluster resources, including application licensing, and workload suspension is another feature [14]. LSF can also co-operate with other workload managers such as MAUI/MOAB. The latest version of LSF offers a mechanism called Intelligent Scheduling, which in part allows for live (transparent) reconfiguration.

Figure: LSF Base Architecture [15]

LSF architecture is loosely coupled, where the base architecture is central [15]. The batch architecture, shown in the next figure, sits on top of the base architecture. The LSF Parallel Library handles the parallel interaction for applications; this can be seen in the third figure.

Figure: LSF Batch Architecture [15]
Figure: LSF Parallel Library [15], showing the MPI Library and the Parallel Application Manager (PAM)

LSF Architecture Components [15]:
Hosts: run an OS and interact with LSF.
LIM: Load Information Manager, a daemon that monitors the load of the host it runs on and interacts with the master LIM.
Master LIM: stores data collected by the LIMs running on cluster nodes.
RES: Remote Execution Server, runs on each LSF server and accepts remote execution requests. RES is similar to the remote shell daemon (rshd).
SBD: Slave Batch Daemon, runs on each node, receives batch requests from the MBD, and also enforces LSF policy.
MBD: Master Batch Daemon, receives and applies policy to all job requests from LSF clients; it is responsible for all jobs in the batch system.
PIM: Process Information Manager, runs on all LSF servers and monitors all jobs and processes.

Figure: LSF Job Submission and Scheduling Session [15]

Advantages of the LSF scheduler: robust, with vast capability and scalability.
Disadvantages of the LSF scheduler: complex administration and oversight; high cost per CPU.
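As a concrete illustration of the submission and scheduling session shown above, the short Python sketch below shells out to LSF's bsub command and parses the job ID it prints. The flags used (-n for slots, -q for queue, -o for the output file, with %J expanded to the job ID) are standard bsub options, but the queue name and executable are placeholders, and the output parsing assumes bsub's usual "Job <id> is submitted" message.

    import re
    import subprocess

    def submit_lsf(command, slots=4, queue="normal"):
        """Submit a job via bsub and return the LSF job ID."""
        out = subprocess.run(
            ["bsub", "-n", str(slots), "-q", queue, "-o", "job.%J.out", command],
            capture_output=True, text=True, check=True,
        ).stdout
        # bsub normally replies: Job <12345> is submitted to queue <normal>.
        match = re.search(r"Job <(\d+)>", out)
        if match is None:
            raise RuntimeError(f"unexpected bsub output: {out!r}")
        return match.group(1)

    job_id = submit_lsf("./my_mpi_app")  # placeholder executable
    print("submitted LSF job", job_id)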


PBSPro, developed by Altair Engineering (www.altair.com) since 2003, is a robust workload-management solution that has most of the common features but adds advanced scheduling algorithms, server failover, checkpointing, and automatic job recovery. License scheduling, GPU scheduling, and Green Provisioning are the latest offerings. The PBSPro architecture centers on the PBS Server [1], the Scheduler, and the Machine Oriented Mini-server (MOM). The scheduler and server reside on the main PBS host, and a MOM resides on each execution host.

PBSPro Server: pbs_server, executes on the main node.
PBSPro Scheduler: pbs_sched, executes on the main node.
PBSPro MOM: pbs_mom, executes on the execution nodes.
Scheduler config file: sched_config. MOM config file: mom_priv/config.

PBSPro scheduling behavior: the scheduler gets the list of MOMs from the PBS server, sorts resources based on the default scheduling policy, sorts the queue(s), then sorts the jobs from the first queue.

PBSPro job submission steps (see the sketch below):
1. User submits a job.
2. The PBS server returns a job ID.
3. The PBS scheduler requests the resource list from the server.
4. The scheduler sorts resources and jobs.
5. The scheduler informs the server which hosts the job can use.
6. The server pushes the job execution script to the selected compute nodes.
7. The PBS MOM on the selected nodes executes the job script.
8. The PBS MOM periodically reports job status back to the PBS server.
9. When the job completes, the PBS MOM kills the running job.
10. The PBS server removes the job from the scheduling service.

Advantages of PBSPro: robust and scalable; low administration cost; low cost.
Disadvantages of PBSPro: uses a global policy scheme; complex to customize.
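The sketch below drives steps 1-2 of this flow from Python by calling the standard PBS client commands qsub and qstat. The script name and resource request are placeholders, and the exact job-ID and qstat output formats vary between PBS variants.

    import subprocess

    def submit_pbs(script="job.sh"):
        """Steps 1-2: qsub submits the script; the server returns a job ID."""
        job_id = subprocess.run(
            ["qsub", "-l", "select=2:ncpus=8", "-l", "walltime=01:00:00", script],
            capture_output=True, text=True, check=True,
        ).stdout.strip()  # e.g. "1234.headnode" (format varies by variant)
        return job_id

    def status(job_id):
        """Poll the server for the job's state; step 8 is MOM-to-server,
        but clients observe it through qstat."""
        return subprocess.run(
            ["qstat", job_id], capture_output=True, text=True, check=True,
        ).stdout

    jid = submit_pbs()  # assumes job.sh exists and qsub is on PATH
    print(status(jid))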

MAUI/MOAB, from Adaptive Computing (www.adaptivecomputing.com), is by default an FCFS resource manager that has a vast array of scheduling policies, including backfilling. MOAB is the manager, while MAUI handles the job-scheduling tasks. An important feature of this solution is a simulation mode, which allows administrators to virtually tune the scheduler before making changes to its operation. MOAB's workload-manager status allows it to manage other resource managers, so a user submitting a job to it may have that job executed by several cluster resource managers on different clusters. MOAB supports MOAB domains, or grids, in which one domain consists of the MOAB workload manager supervising one or more clusters. Dynamic functionality is available to manage jobs and adjust QOS capability. MOAB is a cluster manager similar to ROCKS or OSCAR, but more powerful.

Figure: MOAB Scheduler Architecture [17]

MOAB server/scheduler: moabd. MOAB client: moabd. Config file: moab.cfg.

Advantages of MOAB: a powerful workload manager capable of handling the scheduling of resources and jobs for multiple clusters; robust and highly scalable; automated status notifications; can integrate with several resource managers; cluster analysis and simulation functions.
Disadvantages of MOAB: a complex set of configurables.
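For completeness, a submission through MOAB's own client commands might look like the following sketch. msub and showq are the standard MOAB/MAUI submission and queue-display commands; the script name is a placeholder, and the check at the end is only illustrative.

    import subprocess

    def submit_moab(script="job.sh"):
        """msub submits the script to the MOAB workload manager and
        prints the assigned job ID on stdout."""
        return subprocess.run(
            ["msub", script], capture_output=True, text=True, check=True,
        ).stdout.strip()

    jid = submit_moab()                    # assumes msub is on PATH
    queue = subprocess.run(["showq"],      # cluster-wide queue snapshot
                           capture_output=True, text=True, check=True).stdout
    print(jid in queue)                    # the new job should appear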

4. Comparison of Popular Schedulers


The aforementioned schedulers all have their pros and cons relating to ease of use, effectiveness, and the like. The following table compares the policy features of the schedulers.

Sch/Pol  FCFS  PE  FS  BF  RR  SJF  PR  ML  GL  LOC  SLA
PBS       Y    N   Y   N   Y   N    Y   N   Y   N    N
LSF       Y    Y   Y   Y   Y   Y    Y   Y   Y   Y    Y
PBSPro    Y    Y   Y   Y   Y   Y    Y   Y   Y   N    N
MOAB      Y    Y   Y   Y   Y   Y    Y   Y   Y   Y    Y

Table: Scheduler Policy Comparison
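Encoded programmatically, the table can drive simple capability queries. This is merely a convenience representation of the matrix above, not part of any scheduler's tooling.

    POLICIES = ["FCFS", "PE", "FS", "BF", "RR", "SJF", "PR", "ML", "GL", "LOC", "SLA"]

    SUPPORT = {  # Y/N cells from the comparison table above
        "PBS":    "Y N Y N Y N Y N Y N N".split(),
        "LSF":    "Y Y Y Y Y Y Y Y Y Y Y".split(),
        "PBSPro": "Y Y Y Y Y Y Y Y Y N N".split(),
        "MOAB":   "Y Y Y Y Y Y Y Y Y Y Y".split(),
    }

    def supports(scheduler, policy):
        return SUPPORT[scheduler][POLICIES.index(policy)] == "Y"

    # e.g. which schedulers offer local (per-queue) policy?
    print([s for s in SUPPORT if supports(s, "LOC")])  # ['LSF', 'MOAB']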

5. Conclusion
Scheduling is as much an art as it is a technological necessity [6, 7]. Configuring the scheduler to process jobs in the most efficient way, wasting as few CPU cycles as possible, is a daunting task. Job sizes and variety make it hard for any single policy to suffice. Starvation, backfilling, and checkpointing are some of the issues that scheduling must address to deliver the high availability that cluster users require for their CPU-intensive applications. As High Performance Computing and clusters continue to grow in size, availability, and scalability, new schedulers and scheduling policies will need to be invented to handle the work accurately and efficiently. New offerings from scheduler producers might include local policy, singular to a queue or group, rather than global policy; a learning mode allowing the scheduler to predict and optimize its own usage; and real-time failover capability for the execution nodes.

6. References

Manuals and Books
[1] PBS Professional Administrator's Guide 10.0, 2007.
[2] MOAB Administration Guide.
[3] LSF Administration Guide.
[4] Queueing Theory: A Problem Solving Approach, Leonard Gorney, Petrocelli Books.
[5] Theory of Scheduling, Richard W. Conway, William L. Maxwell, Louis W. Miller, Dover Publications Inc., 1967.

Publications
[6] A Short Survey of Commercial Cluster Batch Schedulers, Yoav Etsion, Dan Tsafrir, 2005.
[7] Xen and the Art of Cluster Scheduling, N. Fallenbeck, Hans-Joachim Picht, M. Smith, B. Freisleben, Dept. of Mathematics and Computer Science, University of Marburg, 2007.
[8] Quincy: Fair Scheduling for Distributed Computing Clusters, Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, Andrew Goldberg, Microsoft Research, Silicon Valley, 2010.
[9] Cluster Schedulers, Abhisek Gupta, 2007.
[10] Planning Considerations for Job Scheduling in HPC Clusters, Saeed Iqbal, Ph.D., Rinku Gupta, Yung-Chin Fang, Dell Power Solutions, February 2005.
[11] Design and Implementation of a Flexible Cluster Scheduling Framework, Esteban Tristan Pauli, UC Davis, 2006.
[12] A Common Substrate for Cluster Computing, Ben Hindman, A. Konwinski, Matei Zaharia, Ion Stoica, UC Berkeley, 2009.
[13] Beowulf Cluster Design for Scientific PDE Models, B. McGarvey, R. Cicconetti, N. Bushyager, E. Dalton, M. Tentzeris, Georgia Institute of Technology, 2003.
[14] Platform Pre-emption Management Solution, R. Leung, K. Ball, 2009.

Websites
[15] http://people.web.psi.ch/markushin/papers_html/lsf/lsf.htm
[16] http://rnirmal.com/review-of-moab-hpc-suite
[17] https://computing.llnl.gov/tutorials/moab
