
Introduction to Sun Grid Engine (SGE)

What is SGE?
Sun Grid Engine (SGE) is an open
source community effort to facilitate the
adoption of distributed computing
solutions, sponsored by Sun Microsystems.
Features:

Automatic computing resource selection

Resource Accounting
Support for parallel computing (MPI)
Support for Grid Computing

SGE Job Management

Job management in SGE

1. Each user submits their job to the SGE
scheduler. There is no need to wait for the job.
2. SGE chooses the node(s) to run the job.
3. The job's output and errors are placed
in an output file and an error file.

SGE Architecture


SGE Components
Host types
Master Host
Controls all jobs
Runs on the frontend node

Execution Host
Host that computes the job(s)
Runs on a compute node

Submit Host
Where users log in and submit their jobs
In ROCKS, the frontend is also the Submit Host

Administrative Host
Where the admin logs in and performs administrative tasks on SGE
In ROCKS, this is also the frontend.

SGE Components
SGE Software Components
sge_commd - Communication daemon. Centralizes
all communication. Runs on all nodes.
sge_qmaster - Entry point for all commands (qsub,
qstat, etc.). Runs on the Master Host (frontend).
sge_execd - Execution daemon. Runs only on remote
computing resources, i.e. the Execution Hosts (compute
nodes).
SGE utilities (qsub, qdel, qstat, etc.) - Utility
commands for user job submission and statistics.
Installed on the Submit Host and Administrative Host only.

SGE Components
Queue
A container for a class of jobs allowed to
execute on a host concurrently
A queue determines job types, for example by:

CPU (itanium.q, xeon.q)

Memory (himem.q)
Time (short.q, long.q)
Licences (Fluent.q)

No need to submit a job to a particular queue!

You only need to specify your job requirements
(OS, software, memory)

SGE will dispatch the job to a suitable queue on a lightly loaded host

ROCKS automatically sets up queues for you!


Basic SGE Command

qsub - Submit a job

qstat - View job statistics
qdel - Delete a job from the queue
qhost - Show the hosts currently online
qalter - Alter job parameters

Basic Job Submission

NOTE: You must submit the job as an ordinary
(non-root) user!
Example: Create a simple job script to
submit as a job
echo Hello world

Save it to a file named simplejob

Then submit the job using
qsub simplejob
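
The steps above can be sketched as one shell session. This is a minimal sketch, not part of the original slides: it writes simplejob, then submits it only if qsub is found on the PATH. The local fallback run is an assumption added so the sketch also works on a machine without SGE.

```shell
#!/bin/sh
# Create the job script from the slide.
cat > simplejob <<'EOF'
echo Hello world
EOF

# Submit through SGE when qsub is available; otherwise just run the
# script locally (fallback added for illustration only).
if command -v qsub >/dev/null 2>&1; then
    qsub simplejob
else
    echo "qsub not found; running simplejob locally instead:"
    sh simplejob
fi
```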


Basic job submission (cont)

The job id is shown after the job is submitted

After the job finishes, its output is placed in
simplejob.o<job id> and its errors in
simplejob.e<job id>
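
To make the naming rule concrete, the sketch below derives both file names from a job id. The message text and the job id 42 are assumptions used for illustration; check the exact wording that qsub prints on your installation before parsing it this way.

```shell
# Sample of the confirmation message qsub typically prints
# (assumed wording and an invented job id).
msg='Your job 42 ("simplejob") has been submitted'

# The job id is the third whitespace-separated word of that message.
job_id=$(printf '%s\n' "$msg" | awk '{print $3}')
job_name=simplejob

# SGE's default naming: <script name>.o<job id> and <script name>.e<job id>
out_file="$job_name.o$job_id"
err_file="$job_name.e$job_id"

echo "stdout goes to $out_file"
echo "stderr goes to $err_file"
```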


Job statistics
Now create another job script called
simplejob2 with the following content
echo sleep 1000 seconds
sleep 1000

Submit the job

qsub simplejob2


Job statistics (cont)

Now, let's see the status of our job with qstat

State qw means the job is waiting in the queue (SGE is
allocating a node for the job). Now try qstat again

State t means the job is starting; r means the job is running
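
The state codes mentioned above can be captured in a small lookup helper. This is only a sketch: describe_state is a hypothetical name, and only the three codes named on the slide are mapped, although qstat can report other states as well.

```shell
# Translate a qstat state code into the description given on the slide.
describe_state() {
    case "$1" in
        qw) echo "waiting in the queue" ;;
        t)  echo "starting" ;;
        r)  echo "running" ;;
        *)  echo "other state (see man qstat)" ;;
    esac
}

for code in qw t r; do
    echo "$code: $(describe_state "$code")"
done
```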


Job statistics (cont)

Important fields in job statistics
Job ID - ID of the job
Name - job script name
User name - owner of the job
State - job state
Queue - queue name (in ROCKS, it is usually a
node name)
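
As a rough illustration of where these fields sit, the sketch below splits one qstat-style line into its columns. The line itself is invented (user alice, queue compute-0-0.q, dates and priority are all made up), and the column order is an assumption that can differ between SGE versions, so verify it against real qstat output.

```shell
# One invented qstat line for a running job. Assumed column order:
# job id, priority, name, user, state, date, time, queue, slots.
line='     42 0.55500 simplejob2 alice        r     03/12/2024 15:00:00 compute-0-0.q     1'

# Split the line on whitespace into positional parameters.
set -- $line
job_id=$1
name=$3
user=$4
state=$5
queue=$8

echo "job $job_id ($name) owned by $user is in state $state on $queue"
```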


Job deletion
Use qstat to find the id of the job

Now, let's delete the job with

qdel <job id>


Job deletion (cont)

Job output and error (up to the point the job was
killed) will be placed in simplejob2.o<job id>
and simplejob2.e<job id>


What is Job Script?

A job script is a shell script that describes the job:
The program command
Some job parameters (a.k.a. qsub options)
It may include the command to start a parallel job
(such as mpirun)


More on job submission

Let's see what else we can do at job submission
Create a directory named myproject, then cd into it
mkdir myproject
cd myproject

Then, create the source file myprog.c for a program named myprog


Compile this program into myprog

gcc myprog.c -o myprog

More on job submission (cont)

Now let's create a job script named advancejob

Note the ./myprog line


More on job submission (cont)

Now, try submitting the job the same way as before
qsub advancejob

Now, let's see the output


More on job submission (cont)

SGE always runs the job in the user's home
directory
The output and error files are also placed in
the user's home directory
You need to supply -cwd, -o, and -e to
fix this problem
-cwd - Run the job from the current working directory
(the directory where qsub was invoked)
-o, -e - Specify the output and error file names (instead of
xx.{o,e}<job id>)

More on job submission (cont)

Now let's submit the job again with the
following command
qsub -cwd -o ./advancejob.out -e ./advancejob.err advancejob arg1 arg2 arg3

NOTE: you can pass arguments to the job script, such as
arg1 arg2 arg3 in this example
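
To see how the arguments reach the job script, here is a small sketch using a hypothetical script named showargs (it is not part of the slides). It is run locally here; under SGE the same arguments would simply follow the script name on the qsub command line, as in the comment at the end.

```shell
# A hypothetical job script that reports the arguments it receives.
cat > showargs <<'EOF'
#!/bin/sh
echo "got $# arguments: $*"
EOF

# Run it locally with the same arguments used on the slide.
local_result=$(sh showargs arg1 arg2 arg3)
echo "$local_result"

# On an SGE submit host the equivalent submission would be:
#   qsub -cwd -o ./showargs.out -e ./showargs.err showargs arg1 arg2 arg3
```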


More job options

qsub -N theadvancejob -a 03121500 -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3
-N - specify the job name
-a - specify the earliest start date and time of the job (here MMDDhhmm: March 12, 15:00)
-S - specify the shell interpreter for the job
-j y - merge standard error into the output file
(advance.out in this case)
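
The value given to -a is easy to misread, so the sketch below takes the example apart, assuming the eight-digit MMDDhhmm layout (month, day, hour, minute). It also builds a stamp for the current time in the same layout with the standard date command. The variable names are illustrative only.

```shell
# The start time from the example command, assumed to be MMDDhhmm.
stamp=03121500

month=$(printf '%s' "$stamp" | cut -c1-2)
day=$(printf '%s' "$stamp" | cut -c3-4)
hour=$(printf '%s' "$stamp" | cut -c5-6)
minute=$(printf '%s' "$stamp" | cut -c7-8)

echo "start no earlier than month $month, day $day, at $hour:$minute"

# A stamp for "now" in the same layout, usable as an argument to -a.
now_stamp=$(date +%m%d%H%M)
echo "current time as an -a value: $now_stamp"
```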

Try to submit the job and see the result!


Placing job option in the script

You can specify job options in the job script itself
by prefixing the line with #$
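
A sketch of what such a script might look like, reusing the options from the earlier qsub example. The file name advancejob_embedded and the way arguments are forwarded to ./myprog are assumptions, since the original script listing is not available. The grep at the end only lists the embedded options so you can check them before submitting.

```shell
# Write a job script whose qsub options are embedded as "#$" lines.
cat > advancejob_embedded <<'EOF'
#!/bin/sh
#$ -N theadvancejob
#$ -cwd
#$ -S /bin/sh
#$ -o advance.out
#$ -j y
./myprog "$@"
EOF

# List the embedded options (lines starting with "#$").
embedded_opts=$(grep '^#\$' advancejob_embedded | sed 's/^#\$ *//')
printf '%s\n' "$embedded_opts"
```

With the options embedded, the submission itself would shrink to qsub advancejob_embedded arg1 arg2 arg3.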


Altering the job

You can alter the job parameters after the job
has been queued
Only some parameters can be
altered after the job has been launched!
Use the qalter command to alter a job,
with the same arguments and options as qsub

Altering the job parameter

Please consult the man page (man qalter)
for the list of options that can be altered
after the job has launched (in t or r state)


Job suspension
You can suspend a job at any time
Suspending a queued job stops that job from being
started
When to suspend a job?

You need to run another, more important job,
but the old job consumes all the resources
The admin wants to suspend a job because it
consumes too many resources on the system

Job suspension (cont)

Use the qhold command to hold a job
qhold <job id>

Use the qrls command to release a hold

qrls <job id>
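
The hold and release pair can be rehearsed safely with a dry-run wrapper. In this sketch the run helper, the DRY_RUN variable and the job id 42 are all hypothetical. With DRY_RUN left at 1 the commands are only printed; set it to 0 on a real submit host to execute them.

```shell
# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

job_id=42   # hypothetical id; take the real one from qstat

run qhold "$job_id"   # place a hold on the job
run qrls "$job_id"    # release the hold again
```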


The qhost command

You can use the qhost command to see the
nodes online in SGE

Try supplying the -j option and see what
happens (try it after submitting some jobs)

qmon: SGE in Graphics Mode

The previous sections introduced using SGE
via the command line
We can also use SGE comfortably via a
Graphical User Interface (GUI), qmon
Among the facilities provided by qmon
are submitting jobs, managing jobs,
managing hosts, and managing job queues

Running qmon
qmon requires X-Windows to
provide its GUI
Start X-Windows with startx
Start qmon with the qmon command


Submitting a Job via QMON


Click the job submission icon; the submit job window will appear


Job Control via QMON

Click the job control icon to view job status and
control jobs


Queue Control
Usually a compute node has only
one queue, but you can add more
queues or remove existing queues
Slot management
The number of slots of a queue is the number of
concurrent jobs it can handle
A common choice: number of slots of a queue =
number of processors of the compute node
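
To apply the slots = processors rule you first need the processor count of the node. A minimal sketch, assuming getconf supports the _NPROCESSORS_ONLN query on the compute node (it does on common Linux systems); the fallback to 1 is an added safety net, not something from the slides.

```shell
# Ask the system how many processors are online.
slots=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || slots=1

# Guard against an empty or non-numeric answer.
case "$slots" in
    ''|*[!0-9]*) slots=1 ;;
esac

echo "suggested number of slots for this node's queue: $slots"
```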

Queue Control via SGE


Click the queue control icon to control queues


Queue Control via SGE

This icon represents a queue named compute0
set up for a host named comp-pvfs-0-0
This queue consists of only one slot
You can modify the properties of this queue by
highlighting its icon and clicking the Modify button
* Normal users cannot control queues


Queue Control via SGE

Modify the properties of a queue
Try to modify the number of slots


Lab 1: Batch scheduler

Write a small program that calculates the
multiplication table. Save the file in
The program takes one argument, which is the
number used to generate the multiplication table
Multab 2 - generates the multiplication table for the number 2

Print the multiplication table to standard output

Use SGE to submit the job. Calculate
the multiplication tables of 2 to 12
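
One possible starting point for the lab, written as a shell function rather than a compiled program. This is a sketch under assumptions: the table is taken to run from 1 to 12, and multabjob is a hypothetical job script that would call the program with its first argument. The loop only prints the qsub commands instead of running them.

```shell
# Print the multiplication table of the number given as first argument.
multab() {
    num=$1
    i=1
    while [ "$i" -le 12 ]; do
        echo "$num x $i = $((num * i))"
        i=$((i + 1))
    done
}

multab 2

# Build one submission per table, for the numbers 2 to 12.
submit_cmds=$(
    k=2
    while [ "$k" -le 12 ]; do
        echo "qsub -cwd multabjob $k"
        k=$((k + 1))
    done
)
printf '%s\n' "$submit_cmds"
```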


The End