Introduction to Sun Grid Engine (SGE)

What is SGE?
Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions. It is sponsored by Sun Microsystems.
Features:
Automatic computing resource selection
Resource accounting
Support for parallel computing (MPI)
Support for Grid computing

SGE Job Management

Job management in SGE


1. Each user submits their job to the SGE scheduler; there is no need to wait for the job to finish.
2. SGE chooses the node(s) to run the job.
3. The job's output and errors are written to an output file and an error file.

SGE Architecture & Components

SGE Components
Host type
Master Host
Controls all jobs
Runs on the frontend node

Execution Host
Host that computes the job(s)
Runs on the compute nodes

Submit Host
Where users log in and submit their jobs
In ROCKS, the frontend is also the Submit Host

Administrative Host
Where the admin logs in and performs administrative tasks on SGE
In ROCKS, this is also the frontend

SGE Components
SGE Software Components
sge_commd - Communication daemon that centralizes all communication. Runs on all nodes.
sge_qmaster - Entry point for all commands (qsub, qstat, etc.). Runs on the Master Host (frontend).
sge_execd - Execution daemon. Runs only on the computing resources, i.e. the Execution Hosts (compute nodes).
SGE utilities (qsub, qdel, qstat, etc.) - User commands for job submission and statistics. Installed on the Submit Host and Administrative Host only.

SGE Components
Queue
A container for a class of jobs allowed to execute on a host concurrently
A queue determines job types, for example by:
CPU (itanium.q, xeon.q)
Memory (himem.q)
Time (short.q, long.q)
Licenses (Fluent.q)

There is no need to submit a job to a particular queue!
You only need to specify your job requirements (OS, software, memory)
SGE dispatches the job to a suitable queue on a lightly loaded host

ROCKS automatically sets up the queues for you!

Basic SGE Commands

qsub - Submit a job
qstat - View job statistics
qdel - Delete a job from the queue
qhost - Show the hosts currently online
qalter - Alter job parameters

Basic Job Submission


NOTE: You must submit jobs as an ordinary user, not as root!
Example: create a simple job script to submit
#!/bin/sh
date
echo Hello world

Save it to a file named simplejob, then submit the job with
qsub simplejob

Basic job submission (cont)


The job ID is shown after the job is submitted

After the job finishes, its output is placed in simplejob.o<job id> and its errors in simplejob.e<job id>
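Because the job ID and the default file names follow a fixed pattern, they can be scripted. The sketch below assumes qsub prints a confirmation line of the form Your job <id> ("<name>") has been submitted; the exact wording may differ between SGE versions, and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the job ID from qsub's confirmation line.
# Assumes a message such as: Your job 42 ("simplejob") has been submitted
job_id_from_qsub() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print the default output and error file names
# (<job name>.o<job id> and <job name>.e<job id>).
default_output_files() {
    echo "$1.o$2 $1.e$2"
}

# Usage on the submit host:
#   id=$(qsub simplejob | job_id_from_qsub)
#   default_output_files simplejob "$id"
```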


Job statistics
Now create another job script called
simplejob2 with the following content
#!/bin/sh
date
echo sleep 1000 seconds
sleep 1000

Submit the job


qsub simplejob2


Job statistics (cont)


Now, let's check the status of our job with qstat

State qw means the job is waiting in the queue (SGE is allocating a node for it). Now try qstat again

State t means the job is being transferred to a node (starting); r means the job is running
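A small lookup can help when reading qstat output. This sketch covers the states mentioned above plus a few others described in the qstat man page (hqw, s, Eqw); the function name is hypothetical and the list is deliberately not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: describe a qstat state code in plain words.
# Only a few common codes are covered; see "man qstat" for the full list.
describe_state() {
    case "$1" in
        qw)  echo "queued, waiting for a node" ;;
        hqw) echo "queued, on hold" ;;
        t)   echo "being transferred to a node (starting)" ;;
        r)   echo "running" ;;
        s)   echo "suspended" ;;
        Eqw) echo "error, the job could not be started" ;;
        *)   echo "other state: $1" ;;
    esac
}

# Usage: describe_state qw
```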



Job statistics (cont)


Important fields in the job statistics
Job ID - The job's identifier
Name - Job name (by default, the job script name)
User name - Owner of the job
State - Job state
Queue - Queue name (in ROCKS, usually a node name)

Job deletion
Use qstat to see the job ID of simplejob2

Now, let's delete the job with
qdel <job id>
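With many jobs, picking the ID out of the qstat listing by hand gets tedious. The sketch below assumes the default qstat layout (two header lines, job ID in column 1, job name in column 3); qstat may truncate long job names, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of jobs with a given name, reading
# qstat output on stdin. Assumes two header lines, the job ID in
# column 1 and the job name in column 3.
job_ids_by_name() {
    awk -v name="$1" 'NR > 2 && $3 == name { print $1 }'
}

# Usage: delete every job named simplejob2.
#   qstat -u "$USER" | job_ids_by_name simplejob2 | while read -r id; do
#       qdel "$id"
#   done
```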


Job deletion (cont)


The job's output up to the point it was killed is placed in simplejob2.o<job id>, and its errors in simplejob2.e<job id>.

What is a Job Script?

A job script is a shell script that describes the job:
The program command(s)
Some job parameters (a.k.a. qsub options)
Possibly the command that starts a parallel job (such as mpirun)

More on job submission


Let's see what else we can do when submitting a job
Create a directory named myproject, then cd into it
mkdir myproject
cd myproject

Then, create a program myprog.c with the following content

Compile this program into myprog
gcc myprog.c -o myprog

More on job submission (cont)


Now let's create a job script named advancejob

Note the ./myprog line
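The original advancejob listing is not reproduced here. As a hypothetical stand-in, the commands below write a job script that prints its working directory and then runs ./myprog with the arguments given to the job script. Because ./myprog is a relative path, the job only finds it if it starts in the myproject directory.

```shell
#!/bin/sh
# Hypothetical reconstruction of advancejob (the original listing is missing).
cat > advancejob <<'EOF'
#!/bin/sh
echo "Job started in: $(pwd)"
# Relative path: only works if the job starts in the directory holding myprog.
./myprog "$@"
EOF
```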



More on job submission (cont)


Now, try submitting the job with the same command
qsub advancejob

Now, let's look at the output

More on job submission (cont)


By default, SGE runs the job in the user's home directory
The output and error files are also placed in the user's home directory
Supply -cwd, -o, and -e to fix this problem
-cwd - Run the job in the current working directory (where qsub was called) instead of the home directory
-o, -e - Specify the output and error file names (instead of <job name>.{o,e}<job id>)

More on job submission (cont)


Now let's submit the job again with the following command
qsub -cwd -o ./advancejob.out -e ./advancejob.err advancejob arg1 arg2 arg3

NOTE: arguments after the job script name (arg1 arg2 arg3 in this example) are passed on to the job script

More job options


qsub -N theadvancejob -a 03121500 -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3
-N - Specify the job name
-a - Specify the earliest date and time the job may start ([[CC]YY]MMDDhhmm[.SS]); 03121500 means March 12, 15:00
-S - Specify the shell interpreter for the job script
-j y - Merge standard error into the output file (advance.out in this case)

Try to submit the job and see the result!
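The -a value is simply a date and time with the separators removed. The hypothetical helper below builds the full CCYYMMDDhhmm form from a date and a time; it only strips separators and does not check that the date is valid.

```shell
#!/bin/sh
# Hypothetical helper: convert a date (YYYY-MM-DD) and a time (HH:MM)
# into the CCYYMMDDhhmm form accepted by qsub -a. No validation is done.
sge_start_time() {
    d=$(echo "$1" | sed 's/-//g')
    t=$(echo "$2" | sed 's/://g')
    echo "$d$t"
}

# Usage: make advancejob eligible to start at 15:00 on 12 March 2008.
#   qsub -a "$(sge_start_time 2008-03-12 15:00)" -cwd advancejob
```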



Placing job options in the script

You can specify job options inside the job script by prefixing each option line with #$
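For example, the options from the long qsub command line can be embedded in the script itself. This is a sketch: the file name optjob is hypothetical, and its body only prints the date and its arguments instead of running myprog. In SGE, options given on the qsub command line take precedence over embedded ones.

```shell
#!/bin/sh
# Write a hypothetical job script with embedded qsub options.
# When run directly with sh, the #$ lines are ordinary comments.
cat > optjob <<'EOF'
#!/bin/sh
#$ -N theadvancejob
#$ -S /bin/sh
#$ -cwd
#$ -o advance.out
#$ -j y
date
echo "Arguments: $*"
EOF
```

The script can then be submitted with qsub optjob arg1 arg2 arg3.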


Altering the job


You can alter a job's parameters after it has been queued
Only some parameters can be altered after the job has been launched!
Use the qalter command to alter a job; it takes the same options as qsub, followed by the job ID

Altering the job parameter

Please consult the man page (man qalter) for the list of options that can be altered after the job has launched (in the t or r state)

Job suspension
You can suspend a job at any time
Suspending a queued job stops it from being launched

When to suspend a job?
You need to run another, more important job, but the old job consumes all the resources
The admin wants to suspend a job because it consumes too many system resources

Job suspension (cont)


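Use the qhold command to put a job on hold
qhold <job id>

Use the qrls command to release a held job
qrls <job id>

Note that qhold only affects jobs that have not started yet; a running job can be suspended with qmod -sj <job id> (see man qmod)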
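To hold several waiting jobs at once, their IDs can be collected from qstat. The sketch below assumes the default qstat layout (two header lines, state in column 5) and selects only jobs in the qw state, since holds apply to jobs that have not started; the function name and the held_ids file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of jobs still waiting in the queue
# (state qw), reading qstat output on stdin. Assumes two header lines,
# the job ID in column 1 and the state in column 5.
waiting_job_ids() {
    awk 'NR > 2 && $5 == "qw" { print $1 }'
}

# Usage: hold all of your waiting jobs, and later release them.
#   qstat -u "$USER" | waiting_job_ids > held_ids
#   while read -r id; do qhold "$id"; done < held_ids
#   while read -r id; do qrls "$id"; done < held_ids
```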


The qhost command


Use the qhost command to see which nodes are online in SGE
qhost

Try adding the -j option and see what happens (try it after submitting some jobs)
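qhost prints one line per execution host. The sketch below assumes the plain qhost layout (two header lines, a global pseudo-host line, load in column 4) and treats a host whose load is shown as - as unreachable; that convention and the function name are assumptions, and the filter is not meant for qhost -j output.

```shell
#!/bin/sh
# Hypothetical helper: list hosts that report a load value, reading plain
# qhost output (without -j) on stdin. Assumes two header lines, a "global"
# pseudo-host line, and "-" in the load column for unreachable hosts.
online_hosts() {
    awk 'NR > 2 && $1 != "global" && $4 != "-" { print $1 }'
}

# Usage: qhost | online_hosts
```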

qmon: SGE in Graphical Mode

The previous section introduced using SGE from the command line
SGE can also be used comfortably through a Graphical User Interface (GUI) with qmon
Facilities provided by qmon include submitting jobs, managing jobs, managing hosts, and managing job queues

Running qmon
qmon requires the X Window System to provide its GUI
Start X with startx
Start qmon with qmon

Submitting a Job via QMON


Click the job submission icon; the Submit Job window will appear

Job Control via QMON


Click the job control icon to view job status and control jobs

Queue Control
Each compute node usually has one queue, but you can add more queues or remove existing ones
Slot management
A slot is one unit of a queue's capacity to run concurrent jobs
The number of slots of a queue is usually set to the number of processors on the compute node

Queue Control via QMON

Click the queue control icon to control queues

Queue Control via QMON (cont)
This icon represents a queue named compute0 set up for the host comp-pvfs-0-0
This queue has only one slot
You can modify this queue's properties by selecting its icon and clicking the Modify button
* Normal users cannot control queues

Queue Control via QMON (cont)
Modify the properties of a queue
Try modifying the number of slots

Lab 1: Batch scheduler


Write a small program that calculates a multiplication table. Save the file as multab.c
The program takes one argument: the number used to generate the multiplication table
multab 2 - Generates the multiplication table for 2
Print the multiplication table to standard output
Use SGE to submit the jobs that calculate the multiplication tables of 2 to 12
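One possible way to submit the lab jobs is one qsub call per number. This sketch assumes multab has already been compiled from multab.c in the current directory; the script name multabjob, the function submit_tables, and the QSUB override (set it to "echo qsub" for a dry run) are all hypothetical.

```shell
#!/bin/sh
# Hypothetical Lab 1 helper. Assumes ./multab was built from multab.c
# (e.g. gcc multab.c -o multab) in the current directory.

# Job script: runs multab with the number passed after the script name.
cat > multabjob <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
./multab "$1"
EOF

# Submit one job per number from 2 to 12. QSUB defaults to qsub;
# set QSUB="echo qsub" to only print the commands (dry run).
submit_tables() {
    n=2
    while [ "$n" -le 12 ]; do
        ${QSUB:-qsub} -N "multab$n" -o "multab$n.out" -j y multabjob "$n"
        n=$((n + 1))
    done
}
```

Calling submit_tables from the directory containing multab submits 11 jobs; each writes its table to multab<N>.out.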


The End