
GE COOKBOOK V3.0

Document identifier: GE_Cookbook-v3-0.odt

Date: 17/01/2012

Activity: SA1.5

Document status: FINAL

Document link:

Abstract: A quick reference for using GE batch system in an EMI environment.

This work is co-funded by the EC EMI project under the FP7 Collaborative Projects Grant Agreement Nr.
INFSO-RI-261611.

Copyright notice:

Copyright © EGI.eu. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To
view a copy of this license, see CreativeCommons license or send a letter to Creative Commons, 171 Second Street, Suite 300, San
Francisco, California, 94105, USA. The work must be attributed by attaching the following reference to the copied elements: “Copyright ©
EGI.eu (www.egi.eu). Using this document in a way and/or for purposes not foreseen in the license, requires the prior written permission
of the copyright holders. The information contained in this document represents the views of the copyright holders as of the date such
views are published.

Delivery Slip
              Name    Partner/Activity    Date    Signature

From

Verified

Reviewed by

Approved by

Document Log
Issue  Date        Comment                                                                  Author/Partner
0-1    28/06/2007  Initial Draft                                                            Javier Lopez - CESGA, Esteban Freire - CESGA
1-0    29/06/2007  Ready for comments                                                       Javier Lopez - CESGA
1-1    16/07/2007  Updated to reflect the comments from John Walsh (TCD)                    Esteban Freire - CESGA
1-2    20/07/2007  Minor changes                                                            Javier Lopez - CESGA
1-3    04/10/2007  Minor changes. Updated to reflect the comments from Pablo Rey (CESGA)    Esteban Freire - CESGA
2-0    16/02/2010  Update to adapt the document to the latest SGE and gLite versions        Esteban Freire - CESGA
2-1    28/04/2010  Added a new paragraph with installation notes for CREAM-CE with SGE      Esteban Freire - CESGA
3-0    17/01/2012  Changes for Open Grid Scheduler / Grid Engine installation and configuration instead of Sun Grid Engine, and adaptation to EMI    Roberto Rosende - CESGA

Document Change Record


Issue Item Reason for Change
Table of contents
1. INTRODUCTION
1.1. PURPOSE
1.2. DOCUMENT ORGANIZATION
1.3. APPLICATION AREA
1.4. REFERENCES
1.5. DOCUMENT AMENDMENT PROCEDURE
1.6. TERMINOLOGY
2. OVERVIEW

3. GE IN THE "LCG/GLITE" CONTEXT

4. INSTALLING GE
4.1. INTRODUCTION
4.2. INSTALL PROCESS FOR EMI CREAM-CE WITH GE
4.3. CREAM INSTALLATION FOR GE
4.4. STARTING GE
5. CONFIGURING GE ON CE
5.1. POLICY_HIERARCHY PARAMETER AND OTHER PARAMETERS ABOUT PRIORITIES
5.2. HOW TO CONFIGURE AN EPILOG OR PROLOG SCRIPT
5.3. HOW TO CONFIGURE RESOURCE QUOTAS
5.4. HOW TO CONFIGURE A SHADOW QMASTER
6. USEFUL ADMIN COMMANDS

7. HOW TO RUN ARRAY JOBS USING GE

8. HOW TO CONFIGURE A PARALLEL ENVIRONMENT

9. HOW TO ASSIGN PRIORITIES TO GROUPS AND USERS

10. ONE CONFIGURATION EXAMPLE

11. TROUBLESHOOTING

12. COMMANDS QUICK REFERENCE


1. INTRODUCTION

1.1. PURPOSE
To provide site administrators with a quick reference to help them use GE at their sites.

1.2. DOCUMENT ORGANIZATION


The document is divided into twelve sections.

1.3. APPLICATION AREA


This document is intended for site administrators.

1.4. REFERENCES
Table 1: Table of references
R1  Grid Engine: http://gridscheduler.sourceforge.net/
R2  Sun Grid Engine: http://www.oracle.com/us/sun/index.htm
R3  Sun Industry Standards Source License: http://gridengine.sunsource.net/project/gridengine/Gridengine_SISSL_license.html
R4  Upgrading Open Grid Scheduler Software: http://gridscheduler.sourceforge.net/howto/howto.html
R5  SGE Wiki Page: https://twiki.cern.ch/twiki/bin/view/LCG/ImplementationOfSGE
R6  SGE stress tests on LCG-CE: https://twiki.cern.ch/twiki/bin/view/LCG/SGE_Stress
R7  GE installation: http://gridscheduler.sourceforge.net/CompileGridEngineSource.html
R8  YAIM info: http://yaim.info/
R9  Job Manager releases: http://www.egee.cesga.es/lcgsge/releases/
R10 GE installation with CREAM-CE EMI: http://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI1
R11 SGE server YAIM interface: http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/org.glite.yaim.sge-server/4.1.1/noarch/glite-yaim-sge-server-4.1.1-1.noarch.rpm
R12 GE support mailing list: ge-support@listas.cesga.es
R13 Grid Engine Documentation: http://gridengine.sunsource.net/manpages.html
R14 Scheduling Policies: http://docs.sun.com/app/docs/doc/817-5677/6ml49n2bs?q=seq_no&a=view
R15 Resource Quota Specification: http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/ResourceQuotaSpecification.html
R16 RQS common uses: http://wiki.gridengine.info/wiki/index.php/RQS_Common_Uses
R17 Migrating the qmaster to Another Host: http://docs.sun.com/app/docs/doc/820-0698/6ncdvjcl4?a=view
R18 Site Configuration for MPI: http://grid.ifca.es/wiki/Middleware/MpiStart/MpiUtils
R19 Sun N1 Grid Engine 6.1 Collection: http://www.sun.com/blueprints/1005/819-4325.html
R20 Grid Engine Mail Lists: http://gridengine.sunsource.net/maillist.html
R21 SGE troubleshooting: http://gridscheduler.sourceforge.net/howto/troubleshooting.html

1.5. DOCUMENT AMENDMENT PROCEDURE


This document may be updated as changes appear in the GE software or the gLite middleware.
Amendments, comments, and suggestions should be sent to the authors.

1.6. TERMINOLOGY
This subsection provides the definitions of terms, acronyms, and abbreviations required to properly
interpret this document. A complete project glossary is provided in the EGEE glossary.

Glossary
Acronym       Meaning
SGE           Sun Grid Engine
JobID         Job number assigned by SGE
SGE Qmaster   Master Host
YAIM          A tool for configuring Grid services
JM            Job Manager
CE            Computing Element
WN            Worker Node (execution host)
RQS           Resource Quota Set
GE            Grid Engine

2. OVERVIEW
Grid Engine [R1] is an open source job management system initially developed by Sun and now
supported and developed by the Open Grid Scheduler project. There is also a commercial version,
including support, from Sun and Oracle [R2].
Some important features include:

 Extensive operating system support: RedHat, Debian, HP-UX, Solaris, etc.

 Flexible scheduling policies: priority, urgency, and ticket-based (share-based, functional,
override)
 Support for subordinate queues
 Support for array jobs
 Support for interactive jobs (qlogin)
 Support for complex resource attributes (e.g., defining the number of licenses available for a
program)
 Shadow master hosts (high availability)
 Accounting and Reporting Console (ARCo)
 Tight integration of parallel libraries
 Implements calendars for fluctuating resources
 Supports checkpointing and migration
 Supports DRMAA 1.0
 Supports resource quotas (e.g., limiting the maximum number of running jobs per user or user
group)
 Transfer-queue Over Globus (TOG)
 Intuitive graphical interface, used by users to manage jobs and by administrators to configure
and monitor their cluster
 Good documentation: Administrator's Guide, User's Guide, mailing lists, wiki, blogs
 Enterprise-grade scalability: up to 10,000 nodes per master

3. GE IN THE "LCG/GLITE" CONTEXT


Some advantages of GE in the EMI context:

 GE includes its own scheduler, 'sge_schedd', so it is not necessary to maintain a separate


scheduler.

 GE allows:

 Configuration of a wide range of policies for groups and users, and therefore for VOs.

 Resource reservation for a given user or group (and thus VO), in other words, making
sure that resources will be available for their jobs.

 Setting limits on the maximum resources a user/group can use, in order to establish, for
example, how many jobs a user/group can run at the same time.

 GE supports complex attributes in order to establish, for example, processor, memory, and disk
usage limits.

4. INSTALLING GE

4.1. INTRODUCTION
GE hosts are classified into four groups:

 Master host.

The master host (GE Qmaster) is central to the overall cluster activity. The master host runs
the master daemon 'sge_qmaster'. This daemon controls all GE components, such as
queues and jobs. The master host usually also runs the scheduler daemon 'sge_schedd',
which makes the scheduling decisions.

 Execution hosts.

Execution hosts are nodes that have permission to run jobs. These hosts run the execution
daemon 'sge_execd'. This daemon controls the queues local to the machine on which it is
running and executes the jobs sent by 'sge_qmaster' to those queues.

 Administration hosts.

Hosts other than the master host can be given permission to carry out any kind of
administrative activity.

 Submit hosts.

Submit hosts allow submitting and controlling batch jobs only. In particular, a user who is
logged into a submit host can use 'qsub' to submit jobs, 'qstat' to check the job status, or
the graphical user interface 'qmon'.

Note: A host can belong to more than one group.
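
For example, once the cluster is running you can check which hosts currently belong to each group
with the standard 'qconf' listing options (a quick sketch):

> qconf -sh   # show the administration hosts
> qconf -ss   # show the submit hosts
> qconf -sel  # show the execution hosts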

4.2. INSTALL PROCESS FOR EMI CREAM-CE WITH GE


To install GE, the source code can be downloaded from the official web page [R1] and compiled
following the steps on that page.

To install the compiled version:


 On CE:

 setenv SGE_ROOT < Your Target Directory > (or export SGE_ROOT=< Your Target
Directory > if your shell is sh, bash, ksh)
 mkdir $SGE_ROOT
 scripts/distinst -all -local -noexit
 cd $SGE_ROOT
 ./install_qmaster
Here we can accept the default answers to all questions, except for these three:
Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> n

Grid Engine TCP/IP service >sge_qmaster<


Please enter an unused port number >> 536

Grid Engine TCP/IP service >sge_execd<


Please enter an unused port number >> 537
 It is now necessary to define the appropriate environment variables; this can be done with
the provided script:
source /usr/local/ge2011.11/default/common/settings.sh
Note: check that /etc/profile.d/sge.sh contains the appropriate paths, for example:
# Define SGE_ROOT directory and SGE commands
export SGE_ROOT=/usr/local/ge2011.11
export SGE_CELL=default
. /usr/local/ge2011.11/default/common/settings.sh

 On WN:

 setenv SGE_ROOT < Your Target Directory > (or export SGE_ROOT=< Your Target
Directory > if your shell is sh, bash, ksh)
 mkdir $SGE_ROOT
 scripts/distinst -all -local -noexit
 cd $SGE_ROOT
 ./install_qmaster
Here we can accept the default answers to all questions, except for these three:
Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> n

Grid Engine TCP/IP service >sge_qmaster<


Please enter an unused port number >> 536

Grid Engine TCP/IP service >sge_execd<


Please enter an unused port number >> 537
 Next, install the execution daemon: ./install_execd
 It is now necessary to define the appropriate environment variables; this can be done with
the provided script:
source /usr/local/ge2011.11/default/common/settings.sh
Note: check that /etc/profile.d/sge.sh contains the appropriate paths, for example:
# Define SGE_ROOT directory and SGE commands
export SGE_ROOT=/usr/local/ge2011.11
export SGE_CELL=default
. /usr/local/ge2011.11/default/common/settings.sh
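
As a quick sanity check after the installation (a sketch, assuming the environment has been set up
with settings.sh on both the CE and the WN), you can verify that the qmaster already knows about
the new execution host:

> qhost            # lists all execution hosts known to the qmaster
> qstat -f -u '*'  # shows every queue instance and the jobs running in it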

4.3. CREAM INSTALLATION FOR GE


The roadmap followed to implement CREAM with GE can be found at link [R5]. The results of the
GE stress tests on the lcg-CE, which we ran in our CESGA-SA3 testbed to test GE capacity, can be
found at link [R6].

The administrator can choose between installing CREAM and the GE Qmaster on the same physical
machine or on different physical machines. It is recommended to install CREAM and the GE
Qmaster on different machines in order not to mix both services.

 CREAM and GE Qmaster in the same physical machine

Basically, the process consists of installing the necessary RPMs and configuring the services
with YAIM [R8]. The detailed installation instructions can be found at link [R7].

 CREAM and GE Qmaster in different physical machines


First of all, make sure that you are using the same GE version for the client and server tools,
and that the GE installation paths are the same on the CE and on the GE Qmaster server.
After installing, the site administrator will find one of the following two cases before
configuring with YAIM:

 Having control of the GE Qmaster (the administrator can make changes to the GE
configuration):
Make sure that the following setting is present in the Qmaster configuration:
“execd_params INHERIT_ENV=false”. This setting allows propagating the environment
of the CE into the WN. It should be there by default if the “sge-server yaim plugin” is
used. If not, it can be added using:

> qconf -mconf

- See section “5 Configuring GE on CE" to get more information.

It is necessary to declare the CE as an allowed submission machine on the GE Qmaster:

>qconf -as <CE.MY.DOMAIN>

- See section “6 Useful admin commands” for more information.

CREAM-CE installation for GE

The detailed installation instructions for CREAM-CE with GE in EMI can be found at link [R10].

The CREAM-CE should be installed on a separate node from the GE Qmaster machine in order
not to mix both services, and the same GE software version should be used in both cases.
However, the administrator can choose between installing the CREAM-CE and the GE Qmaster on
the same physical machine or on different physical machines:

 Installing CREAM-CE and GE Qmaster in the same physical machine


The detailed installation instructions can be found on link [R10].
Note: In this example we assume that no GE NFS installation will be used.

 Install GE as described in section 4.2 “INSTALL PROCESS FOR EMI CREAM-CE WITH GE”


 Install the cream-ce and ge-utils meta packages
>yum install emi-cream-ce
>yum install emi-ge-utils

 Configure the appropriate queues, VOs, and users for your batch system; see
section 5 “CONFIGURING GE ON CE” for how to do it.
 Set the following relevant variables in the site-info.def file:

BATCH_SERVER="GE Qmaster FQN"


BATCH_VERSION="GE version"
BATCH_BIN_DIR="Directory where the GE binary client tools are installed in the CE"
Example: /usr/local/sge/pro/bin/lx26-x86
BATCH_LOG_DIR="Path for the SGE accounting file".
Example: /usr/local/ge2011.11/default/common/accounting
SGE_ROOT="The GE installation directory"
SGE_CELL="SGE cell definition". Default: default
SGE_QMASTER="GE qmaster port". Default: 536
SGE_EXECD="GE execd port". Default: 537
SGE_SPOOL_METH="GE spooling method".
BLPARSER_WITH_UPDATER_NOTIFIER="true"
JOB_MANAGER=sge
CE_BATCH_SYS=sge

 Configure the CREAM-CE and SGE_utils services (in siteinfo/site-info.def the


‘BATCH_SERVER’ variable should point to the CREAM-CE machine):

>/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n creamCE -n SGE_utils

 The transfer of files between the WN and the CE is handled by a script called


sge_filestaging, which must be available on all WNs under /opt/glite/bin, and which you
can find in your CREAM-CE installation under /opt/glite/bin/sge_filestaging. This script
must be executed as the prolog and epilog of your jobs. Therefore, you should define
/opt/glite/bin/sge_filestaging --stagein and /opt/glite/bin/sge_filestaging --stageout
as prolog and epilog scripts, either in the GE global configuration “qconf -mconf" or in each
queue configuration "qconf -mq <QUEUE>", as sketched below.

 Configuring CREAM-CE and GE Qmaster in different physical machines

Note: In this example we assume that no GE NFS installation will be used.

 Install GE as described in section 4.2 “INSTALL PROCESS FOR EMI CREAM-CE WITH GE”


 Install the cream-ce and ge-utils meta packages
>yum install emi-cream-ce
>yum install emi-ge-utils

 Set the following relevant variables in the site-info.def file (also be sure to include the
VO and QUEUE information that you want to set up; a sketch with example values is shown
after the list):

BATCH_SERVER="GE Qmaster FQN"


BATCH_VERSION="GE version"
BATCH_BIN_DIR="Directory where the GE binary client tools are installed in the CE"
Example: /usr/local/ge2011.11/bin/linux-x64/
BATCH_LOG_DIR="Path for the GE accounting file". Accounting file should be
accessible from CREAM. If QMASTER is installed in a different machine share this file (by
NFS per example)
Example: /usr/local/ge2011.11/default/common/accounting
SGE_ROOT="The GE installation directory". Default: /usr/local/ge2011.11
SGE_CELL="GE cell definition". Default: default
SGE_QMASTER="GE qmaster port". Default: 536
SGE_EXECD="GE execd port". Default: 537
SGE_SPOOL_METH="GE spooling method"
BLPARSER_WITH_UPDATER_NOTIFIER="true"
JOB_MANAGER=sge
CE_BATCH_SYS=sge
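
A minimal site-info.def sketch with purely hypothetical values (the host name, paths, and spooling
method below are examples only and must be adapted to your site):

BATCH_SERVER="sgemaster.example.org"
BATCH_VERSION="GE2011.11"
BATCH_BIN_DIR="/usr/local/ge2011.11/bin/linux-x64"
BATCH_LOG_DIR="/usr/local/ge2011.11/default/common/accounting"
SGE_ROOT="/usr/local/ge2011.11"
SGE_CELL="default"
SGE_QMASTER="536"
SGE_EXECD="537"
SGE_SPOOL_METH="classic"
BLPARSER_WITH_UPDATER_NOTIFIER="true"
JOB_MANAGER=sge
CE_BATCH_SYS=sge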
 Configure the CREAM-CE service (in siteinfo/site-info.def the BATCH_SERVER variable
should point to the machine where your GE Qmaster will run):
>/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n creamCE -n SGE_utils
 For MPI support you must install the glite-mpi and openmpi packages:
>yum install glite-mpi
>yum install openmpi openmpi-devel

 Configure CREAM with the appropriate variables as described in link [R18], then run YAIM:
>/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n MPI_CE

 In the GE Qmaster, declare the CE as an allowed submission machine:

>qconf -as <CE.MY.DOMAIN>

 If you have control of the GE Qmaster, make sure that the Qmaster configuration contains
the following setting: execd_params INHERIT_ENV=false. This setting allows
propagating the environment of the submission machine (CE) into the execution machine
(WN). It can be set on the GE Qmaster using:

>qconf -mconf

 The transfer of files between the WN and the CE is handled by a script called


sge_filestaging, which must be available on all WNs under /opt/glite/bin, and which you
can find in your CREAM-CE installation under /opt/glite/bin/sge_filestaging. This script
must be executed as the prolog and epilog of your jobs. Therefore, you should define
/opt/glite/bin/sge_filestaging --stagein and /opt/glite/bin/sge_filestaging --stageout
as prolog and epilog scripts, either in the GE global configuration "qconf -mconf" or in each
queue configuration "qconf -mq <QUEUE>".

 Link CREAM-CE with a running GE Qmaster server


You should ensure that you are using the same GE version for the client and server tools, and
that the GE installation paths are the same on the CREAM-CE and on the GE Qmaster
server.

 If you are using a GE installation shared via NFS or equivalent, and you do not want YAIM
to change it, you must set the following variable in your site-info.def file. The default value
for this variable is "no", which means that the GE software WILL BE configured
by YAIM.

SGE_SHARED_INSTALL=yes

 To complete the installation, follow the steps explained in the previous
paragraph “Configuring CREAM-CE and GE Qmaster in different physical
machines”.

4.4. STARTING GE
 To start GE on the Master Host (CE), use the following command:
> /etc/init.d/sgemaster start

This command starts the scheduler 'sge_schedd' and the qmaster 'sge_qmaster'.

 To start GE on the Execution Hosts (WN), use the following command:

> /etc/init.d/sgeexecd start

This command starts the execution daemon 'sge_execd', making it possible to submit jobs
to this node.

Notes: There is no restart option. You can use start/stop instead.
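
For example, to restart the daemons on the Master Host you can simply chain the two actions:

> /etc/init.d/sgemaster stop
> /etc/init.d/sgemaster start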


If you get an error during the GE installation/configuration, you can send an e-mail to the GE
support mailing list [R12].

5. CONFIGURING GE ON CE
To configure the batch system, we will first consider the "Global Cluster Configuration" setup and
the "Scheduler Configuration" setup. These configurations are modified on the Master Host.
GE provides a GUI configuration tool called 'qmon', which can be launched by executing the
‘qmon’ command. In this document we will use the command-line based methods.
Qmon main control window (figure).

The 'qconf' command opens an editor for each of its modification options. The editor is either the
default 'vi' editor or the editor given by the EDITOR environment variable.
The "Global Cluster Configuration" settings may be displayed using the command 'qconf
-sconf'. These settings may be modified using the command 'qconf -mconf'.
Below is a sample of the configuration used on the testbed site "CESGA-SA3". We have included
some comments where we thought the variable settings needed some explanation.
#global:
execd_spool_dir /usr/local/ge2011.11/default/spool # Directory where the spool (and messages)
files are kept
mailer /bin/mail
xterm /usr/bin/X11/xterm
load_sensor none
prolog none # The exec path of a shell script that is started before execution of GE jobs
epilog none # The exec path of a shell script that is started after execution of GE jobs
shell_start_mode posix_compliant
login_shells sh,bash,ksh,csh,tcsh # The shells treated as login shells
min_uid 0
min_gid 0
user_lists none # Users who are allowed to run jobs in this cluster
xuser_lists none # Users who are not allowed to run jobs in this cluster
projects none
xprojects none
enforce_project false
enforce_user auto
load_report_time 00:00:40
max_unheard 00:05:00
reschedule_unknown 02:00:00
loglevel log_warning # Level of detail written to the messages log files
administrator_mail none
set_token_cmd none
pag_cmd none
token_extend_time none
shepherd_cmd none
qmaster_params none
execd_params none
reporting_params accounting=true reporting=false \
flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs 100 # The number of finished jobs shown by ‘qstat -s z -u '*'’
gid_range 20000-20100
qlogin_command builtin
qlogin_daemon builtin
rlogin_command builtin
rlogin_daemon builtin
rsh_command builtin
rsh_daemon builtin
max_aj_instances 2000
max_aj_tasks 75000
max_u_jobs 0
max_jobs 0
max_advance_reservations 0
auto_user_oticket 0
auto_user_fshare 0
auto_user_default_project none
auto_user_delete_time 86400
delegated_file_staging false
reprioritize 0
jsv_url none
jsv_allowed_mod ac,h,i,e,o,j,M,N,p,w

The "Scheduler Configuration" settings may be displayed using the command 'qconf -ssconf'
and these settings may be modified using the command 'qconf -msconf'.

[root@sa3-ce etc]# qconf -ssconf


algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method load
job_load_adjustments np_load_avg=0.50
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info false
flush_submit_sec 0
flush_finish_sec 0
params none
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 0
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 0.010000
weight_waiting_time 0.000000
weight_deadline 3600000.000000
weight_urgency 0.100000
weight_priority 1.000000
max_reservation 0
default_duration INFINITY

Notes: After modifying the Scheduler or Global Cluster configuration it is not necessary to restart
anything; you only need to consider which changes are best for your site.
The detailed documentation about these parameters can be found at link [R13], in section 5:

 "sge_conf" for the "Global Cluster Configuration"


 "sched_conf" for the "Scheduler Configuration"

5.1. POLICY_HIERARCHY PARAMETER AND OTHER PARAMETERS ABOUT PRIORITIES

The "POLICY_HIERARCHY" parameter can be a up to 4 letter combination of the first letters of the
4 policies S(hare-based), F(unctional), D(eadline) and O(verride). Basically the share-based
policy is equivalent to a fair-share policy that takes into account previous usage of the resources (in
GE terminology a share-tree policy), the functional policy is a fair-share policy that does not
consider historical information, the deadline policy takes into account the deadline value of the jobs
(if deadlines are set by the users), and the override policy allows a site admin to dynamically adjust
the priorities of the jobs in the system. You can combine all these policies and create your own
policy.

In our case the value OFS means that the override policy takes precedence over the functional
policy, which finally influences the share-based policy.

Then, we can assign this policy to a user or group. The final policy applied also depends on the
different weight values that we can assign in the Scheduler Configuration:
Scheduling algorithms:

 weight_user

The relative importance of the user shares in the functional policy


 weight_project

The relative importance of the project shares in the functional policy.

 weight_department

The relative importance of the department shares in the functional policy

 weight_job

The relative importance of the job shares in the functional policy

 weight_tickets_functional

The maximum number of functional tickets available for distribution by Grid Engine. By
default it is indefinite.

 weight_tickets_share

The maximum number of share-based tickets available for distribution by Grid Engine. By
default it is indefinite.

 share_override_tickets

If set to "true" or "1", override tickets of any override object instance are shared equally
among all running jobs associated with the object

 share_functional_shares

If set to "true" or "1", functional shares of any functional object instance are shared among all
the jobs associated with the object

 weight_ticket

The weight applied to the normalized ticket amount when determining the final priority
 weight_waiting_time

The weight applied to the job's waiting time since submission

 weight_deadline

The weight applied to the remaining time until a job's latest start time

 weight_urgency

The weight applied to the job's normalized urgency when determining the final priority
GE uses a weighted combination of these three policies to implement automated job scheduling
strategies:

 Share-based
 Functional
 Override

Tickets are a mixture of these three policies: each policy has a pool of tickets, and the tickets
weight the three policies against each other.
The detailed documentation about "Scheduling Policies" can be found on link [R13].

5.2. HOW TO CONFIGURE AN EPILOG OR PROLOG SCRIPT

There are two parameters that can be configured in the queue configuration:

 Prolog

A prolog script can be configured to be executed before the job starts.

 Epilog

An epilog script can be configured to be executed when the job finishes.

To configure a queue with an epilog script we should:

a) Inspect the queues using the command 'qconf -sql'


b) Edit the queue's settings using the command 'qconf -mq queue_name'
Example:
[esfreire@sa3-ce esfreire]$ qconf -mq cesga
[ ... ]
qname cesga
....
epilog /usr/local/sge/pro/default/common/epilog.sh # Directory to which we have copied
the epilog script
# By default, the value of this parameter is “none”; we replaced “none” with the path of the
directory to which we copied the script
[ ... ]
To avoid errors generated in the epilog, it is necessary to configure
“StrictHostKeyChecking=no” in the file “/etc/ssh/ssh_config” on each Execution Host; otherwise
the epilog fails when requesting the host verification.
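
As an illustration, a minimal epilog script could simply record where each job ran (this epilog.sh is
only a sketch; JOB_ID and HOSTNAME are assumed to be set by GE in the job environment):

#!/bin/bash
# epilog.sh - minimal example epilog (sketch)
# JOB_ID and HOSTNAME are assumed to be provided by GE in the job environment.
echo "Job $JOB_ID finished on $HOSTNAME at $(date)" >> /tmp/sge_epilog.log
exit 0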

5.3. HOW TO CONFIGURE RESOURCE QUOTAS


The Resource Quota feature allows administrators to apply limits to users, for example limiting
the number of jobs that a user or a user group can run on a given host or queue.
Resource quotas are configured and defined as sets. Any set can contain one or more resource
quota limits. Resource quota rule sets are processed like firewall rules on Linux
systems.

A resource quota set can be added by first creating it in a file and then loading that file with the
corresponding GE command, or by editing it directly with an editor such as vim through the
corresponding GE command:

 To add a resource quota set:

> qconf -arqs [name]

 To add a resource quota set from a file:

> qconf -Arqs file_name

 To modify a resource quota set:

> qconf -mrqs [name]

 To show a resource quota set:

> qconf -srqs [name_list]

 To show the list of resource quota sets:

> qconf -srqsl


 To delete a resource quota set:

> qconf -drqs [name_list]

 To view information about the current Grid Engine resource quotas:

> qquota -u '*'

One configuration example:

We created the following resource quota rule set for one of our CEs in production:
[root@svgd ~]# qconf -srqsl
maxujobs_svgd

The “maxujobs_svgd” content is:


[root@svgd ~]# qconf -srqs
{
name maxujobs_svgd
description NONE
enabled TRUE
limit users {*,!orballo,!@GRID_ops,!@GRID_opssgm} hosts \
compute-3-38.local to s_vmem=512M
limit users mosfet012 to num_proc=10
limit users @GRID_alice hosts @all_X86 to num_proc=20
limit users @GRID_atlas hosts @all_X86 to num_proc=20
limit users @GRID_atlassgm hosts @all_X86 to num_proc=10
limit users @GRID_biomed hosts @all_X86 to num_proc=80
limit users @GRID_cesga hosts @all_X86 to num_proc=40
limit users @GRID_cms hosts @all_X86 to num_proc=40
limit users @GRID_compchem hosts @all_X86 to num_proc=70
limit users @GRID_dteam hosts @all_X86 to num_proc=20
limit users @GRID_fusion hosts @all_X86 to num_proc=60
limit users @GRID_globus hosts @all_X86 to num_proc=10
limit users @GRID_imath hosts @all_X86 to num_proc=30
limit users @GRID_lhcb hosts @all_X86 to num_proc=10
limit users @GRID_lhcbprd hosts @all_X86 to num_proc=26
limit users @GRID_lhcbpil hosts @all_X86 to num_proc=30
limit users @GRID_ops hosts @all_X86 to num_proc=10
limit users @GRID_opssgm hosts @all_X86 to num_proc=10
limit users @GRID_alicesgm hosts @all_X86 to num_proc=30
limit users @GRID_alicesgm hosts @nodos_X86_2GB to num_proc=10
limit users @GRID_swetest hosts @all_X86 to num_proc=2
limit users @GRID_eelaprod_10 hosts @all_X86 to num_proc=20
limit users @GRID_EGEE_sgm_10 hosts @all_X86 to num_proc=10
limit users @GRID_EGEE_prd_10 hosts @all_X86 to num_proc=10
limit users @GRID_EGEE_prd_10 hosts @all_X86 to num_proc=4
limit users {*,!orballo} hosts @nodos_X86_1GB to num_proc=6
limit users {*,!orballo} hosts @nodos_X86_2GB to num_proc=4
}

The "@GRID_group" are users groups lists which we are defined previously. The "@all_X86",
"@nodos_X86_1GB" and "@nodos_X86_1GB" are hosts groups lists which we are defined
previously.

 In the example:
limit users {*,!orballo,!@GRID_ops,!@GRID_opssgm} hosts \
compute-3-38.local to s_vmem=512M

We are saying that on node "compute-3-38.local" only jobs requesting at most 512
Megabytes (s_vmem) can run; this rule applies to all users "*" except the "orballo" user and
the "GRID_ops" and "GRID_opssgm" user group lists.

 In the example:
limit users @GRID_lhcbprd hosts @all_X86 to num_proc=26

We are saying that the users defined in the "GRID_lhcbprd" list can only occupy a maximum of
twenty-six processors on the hosts defined in the "all_X86" list. The "all_X86" list groups
eighty nodes with one processor each; in other words, a user defined in the
"GRID_lhcbprd" list could occupy twenty-six of these eighty nodes.

 In the example:
limit users {*,!orballo} hosts @nodos_X86_2GB to num_proc=4
We are saying that all users '*' except the "orballo" user can only occupy a maximum of four
processors on the hosts defined in the "nodos_X86_2GB" list. The "nodos_X86_2GB" list
groups thirty nodes with one processor each; in other words, a user other than
"orballo" could only occupy four of these thirty nodes.

Note: Detailed information can be found at link [R16]. Information about how to create host groups
or user lists can be found in section “6 Useful admin commands”.

5.4. HOW TO CONFIGURE A SHADOW QMASTER

As explained in section "4.1 Introduction", a GE cluster is composed of execution hosts and


master hosts. The execution hosts run of the GE execution daemon "sge_execd". The master host
run the GE qmaster daemon "sge_qmaster". The qmaster daemon is central for the overall cluster
activity, and without it the jobs cannot be submitted or scheduled. In order to get fault tolerance, it is
possible define a machine so that in case of the master host fails this other one becomes the new
master host. This is known in GE like shadow qmaster host which runs the GE shadow daemon
"sge_shadowd", so in the event that the master host fails, the shadow daemon on one of the
shadow master machines will become the new master machine.

How to configure a shadow qmaster:

 Check that the new master host has read/write access:

The new master host must have read/write access to the qmaster spool directory and to the
common directory ($SGE_ROOT/default), as the current master does. For example, in the
CESGA case, we mounted the "$SGE_ROOT/default" directory from the master host on
the shadow qmaster host.

 Give administration permissions to the shadow qmaster:

On the master host we should give permission to the machine where the shadow qmaster
will be running so that it can act as an administrative and submit host; this is necessary for
the shadow qmaster host to be able to perform the same functions as the master host.

To add the new host as an administrative and submit host:

>qconf -ah shadow_qmaster_machine


>qconf -as shadow_qmaster_machine

 Add the new host to the shadow_masters file:

Check whether the “$SGE_ROOT/default/common/shadow_masters” file exists on the master
host. If the file exists, you can add the new qmaster host to this file. For example:
cat $SGE_ROOT/default/common/shadow_masters
svgd.local
test01.egee.cesga.es
If the “$SGE_ROOT/default/common/shadow_masters” file does not exist, it can be
created with the same user that installed GE on the machine and edited afterwards. For
example:
touch $SGE_ROOT/default/common/shadow_masters

 Start the shadow daemon on the shadow qmaster host. For example:

>$SGE_ROOT/default/common/sgemaster -shadowd
starting sge_shadowd

ps auxf | grep sge


root 3511 0.0 0.0 3660 652 pts/0 S+ 16:19 0:00 | \_ grep sge
root 3509 0.0 0.0 5284 796 ? S 16:19 0:00 /opt/cesga/sge62/bin/lx26-
x86/sge_shadowd

 This means that the shadow qmaster daemon is listening on the shadow qmaster host; if
for any reason the GE qmaster stops on the master host, for example because the
master host is being restarted, the GE qmaster will be started on the shadow qmaster
host.

 How to force the migration of the GE qmaster to the shadow qmaster host (it must be a node
in shadow_masters file):

On the qmaster shadow host execute the following command:

>$SGE_ROOT/default/common/sgemaster -migrate

Example:

>$SGE_ROOT/default/common/sgemaster -migrate
shutting down qmaster on host "svgd" ...
starting sge_qmaster

 This means that the GE qmaster is stopped on the master host and started on the
shadow qmaster host.

More information can be found at link [R17].

6. USEFUL ADMIN COMMANDS


 To start/stop GE daemons on the Master Host and on the Execution Host respectively:

> /etc/init.d/sgemaster
(no parameters): start qmaster and execution daemon if applicable
"start" start qmaster and scheduler
"stop" shutdown local Grid Engine processes and jobs
"-qmaster" only start/stop qmaster and scheduler (if applicable)

>/etc/init.d/sgeexecd
(no parameters): start sgeexecd
"start" start sgeexecd
"stop" shutdown local Grid Engine processes and jobs
"softstop" shutdown local Grid Engine processes (no jobs)

Notes: If we stop the "qmaster" and the "scheduler" on the Master Host (CE), we don't see the
jobs running with 'qstat' because the GE commands are not available when "qmaster" is stopped
(there is no connection between the Master Host and GE through "qmaster"), but when we restart
them, we see the jobs running again, and they finished without problems. However, if we stopped
the "sge_execd" on the Execution Host (WN), when we restart it the job is killed although if we
stopped the "sge_execd" with the option "sofstop" in principle the job is not killed.

GE supports various Roles such as:

 Managers:

Managers have full capabilities to manipulate the grid engine system.

 Operators:

Operators can perform many of the same commands as managers, except that operators
cannot add, delete, or modify queues.

 Owners:

Queue owners are restricted to suspending and resuming, or disabling and enabling, the
queues that they own.

 Users:

Users have certain access permissions, but they have no cluster or queue management
capabilities.

Modifications to the GE configuration are done by a “Manager”, i.e. a user who has
permission to modify these configurations. By default this user is root, but we can use another
user, created on the machine for this purpose and configured in a user
access list:
 To add one manager :

> qconf -am user-name # Add one or more users as managers

 To add one or more users to the specified access list:

> qconf -au user-name access-list-name

 To add/delete an administrative host, i.e. a host from which we can execute 'qconf':
> qconf -ah #Add
> qconf -dh #Delete

 To add/delete a submit host, i.e. a host from which we can execute 'qsub':

> qconf -as #Add


> qconf -ds #Delete

 To add/delete/modify/show an execution host, i.e. a host that will run jobs:

> qconf -ae #add


> qconf -de #Delete
> qconf -me execution_host #Modify
> qconf -se execution_host #Show

 To add/delete/modify/show a host group list:

> qconf -ahgrp group #Add


> qconf -dhgrp group #Delete
> qconf -mhgrp group #Modify
> qconf -shgrpl #Show

 To show the status of the jobs:

> qstat -u '*'


 Since the last GE updates it is necessary to execute the ‘qstat’ command with the ‘-u’
option and the wildcard ‘*’ between single quotes to see the status of all jobs in the cluster;
otherwise nothing is shown.

 To see the jobs which are queued:

> qstat -s p -u '*'

 To see the jobs which are running:

> qstat -s r -u '*'

 To see the jobs which have finished:

> qstat -s z -u '*'

 To see the finished jobs of one user, with the previous options, e.g.:
> qstat -s z -u user

 To see the jobs which are running, grouped by queue and node:

> qstat -f -u '*'

This output can also show the error state “E” or alarm state “A” for jobs and queues:

a) Error: clean the error state of a queue with 'qmod -cq queue' or of a
job with 'qmod -cj jobID'

b) Alarm: in principle nothing can be done, just wait (it is normally caused by a
high load on the execution node)

 To show all queues created:

> qconf -sql


 To add a new cluster queue

> qconf -aq queue (the queue argument is an existing queue that is used as a
template)

 To edit a queue:

> qconf -mq queue

 To show complex attributes:

> qconf -sc

Notes: Resource attribute definitions are stored in an entity called the grid engine system
"complex". Users can request "resource attributes" for jobs with 'qsub -l'. The "complex" builds
the framework for the system's "consumable resources" facility. The resource attributes that are
defined in the complex can be attached to the global cluster, to a host, or to a queue instance. The
attached attribute identifies a resource with the associated capability.

 To edit the complex:

> qconf -mc
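
For instance, a consumable attribute for software licenses could be defined in the complex and
attached to the cluster (a sketch; the attribute name 'matlab_lic' and the capacity of 10 are
hypothetical examples):

# Line added through 'qconf -mc' (columns: name shortcut type relop requestable consumable default urgency)
matlab_lic ml INT <= YES YES 0 0

# Attach 10 licenses to the whole cluster through 'qconf -me global':
complex_values matlab_lic=10

# Users then request one license at submission time:
> qsub -l matlab_lic=1 job.sh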


 To list all jobs in the queue and see the reasons why they do not enter execution:

> qstat -j
Sample output:
[root@sa3-ce root]# qstat -j
[ .... ]
queue instance "lhcb@sa3-wn001.egee.cesga.es" dropped because it is
overloaded: np_load_avg=2.380000 (= 1.450000 + 0.50 * 1.860000 with nproc=1) >= 1.75
queue instance "ops@sa3-wn001.egee.cesga.es" dropped because it is
overloaded: np_load_avg=2.380000 (= 1.450000 + 0.50 * 1.860000 with nproc=1) >= 1.75
Jobs can not run because no host can satisfy the resource requirements
1337760
There could not be found a queue instance with suitable access permissions
1337968
Jobs can not run because queue instance is not in queue list of PE
1337968

Jobs can not run because available slots combined under PE are not in range of job
1337968, 1337970

Jobs can not run because queue instance is not of type batch or transfer
1338110

Jobs can not run because the resource requirements cannot be satisfied
1338110

Notes: The following command shows the reasons why a particular job does not enter execution
or, if the job is running, its execution time and memory consumption:

> qstat -j jobID

 To show additional information about the priority for each job

> qstat -pri -u '*'

 To show accounting information about a finished job

> qacct -j jobID

The 'qacct' command can be used to obtain varying degrees of information about a job, for
example the queue name and the host the job was executed on, the status of the
finished job, how long the job took, and the maximum amount of memory used.
In the output of this command, we should watch the 'exit_status' and 'failed' fields. If
these fields have a value different from 0, it is a sign that the job had some error or that there
was some internal problem in GE. This is detailed in section “11 Troubleshooting".

Sample output:
[root@sa3-ce root]# qacct -j 41711
==============================================================
qname cesga
hostname sa3-wn001.egee.cesga.es # The host the job was executed on
group cesga
owner esfreire
project NONE
department defaultdepartment
jobname STDIN
jobnumber 41711
taskid undefined
account sge
priority 19
qsub_time Fri Jun 15 13:22:39 2007
start_time Fri Jun 15 13:22:54 2007
end_time Fri Jun 15 13:25:55 2007
granted_pe NONE
slots 1
failed 0 # If it is different from 0 it usually indicates a failure in the configuration
of SGE
exit_status 0 # Exit code for the job, if is different from 0 it usually indicates
a job failure
ru_wallclock 181 # How long the job took
ru_utime 0
ru_stime 1
ru_maxrss 0
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 12271
ru_majflt 14128
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 0
ru_nivcsw 0
cpu 1
mem 0.001
io 0.000
iow 0.000
maxvmem 20.379M # The maximum amount of memory used.

These parameters are kept in the accounting log file on the Master Host:
$SGE_ROOT/default/common/accounting

grep 41711 $SGE_ROOT/default/common/accounting


cesga:sa3-
wn001.egee.cesga.es:cesga:esfreire:STDIN:41711:sge:19:1181906559:1181906574:118190
6755:0:0:181:0:1:0.000000:0:0:0:0:12271:14128:0:0.000000:0:0:0:0:0:0:NONE:defaultdepart
ment:NONE:1:0:1.000000:0.001339:0.000000:-U cesga -q
cesga:0.000000:NONE:21368832.000000

The GE accounting log file has a simple format using “:” as the field separator. These are the fields
that you can find in the accounting log file:

qname:hostname:group:owner:jobname:jobnumber:account:priority:qsub_time:start_time:en
d_time:failed:exit_status:
ru_wallclock:ru_utime:ru_stime:ru_maxrss:ru_ixrss:ru_ismrss:ru_idrss:ru_isrss:
ru_minflt:ru_majflt:ru_nswap:ru_inblock:ru_oublock:ru_msgsnd:ru_msgrcv:ru_nsignals:
ru_nvcsw:ru_nivcsw:project:department:granted_pe:slots:UNKNOWN:cpu:mem:
UNKNOWN:command_line_arguments:UNKNOWN:UNKNOWN:maxvmem_bytes

The accounting file format is documented in the GE man pages [R13], section 5,
"accounting".

 To report accounting of Grid Engine usage:

> qacct

Sample output:
[root@sa3-ce root]# qacct

Total System Usage


WALLCLOCK    UTIME    STIME    CPU      MEMORY     IO       IOW
================================================================================
538426       36375    35599    75230    147.547    0.000    0.000
Notes: This command scans the accounting data file and produces a summary of
information on wall-clock time, CPU time, and system time for the categories
hostname, queue name, group name, owner name, job name, and job ID.

 To report all jobs that a given user has run during a certain time:

> qacct -d 10 -j -o user # Shows all jobs finished by the user in the last 10 days and the
output for each job

 To submit a job with GE on the CE without requesting complex attributes:

> qsub test.sh


Your job 42897 ("test.sh") has been submitted.

 To submit an array of 200 jobs with qsub without requesting complex attributes:

> qsub -t 1-200 test.sh


Your job 42909.1-200:1 ("test.sh") has been submitted.

Submitting an array of jobs requesting complex attributes:

> qsub -t 1-200 -l num_proc=1,s_rt=1:00:00,s_vmem=1G,h_fsize=1G test.sh


Your job 42908.1-200:1 ("test.sh") has been submitted.

 To delete a jobID:

> qdel jobID

 To delete the jobs of a user:

> qdel -u user

 To suspend/resume a job and stop its execution momentarily:

> qmod -sj jobID # Suspend


> qmod -usj jobID # Resume
Note: When we suspend a job we stop its execution, but we do not cancel it; when we
resume it, it continues again.

 To enable/disable a queue:

> qmod -e queue #Enable


> qmod -d queue #Disable

Note: When we enable a queue, we allow jobs to enter that queue for execution.
When we disable a queue that has jobs running, we do not cancel them; the jobs finish
their execution correctly, but no more jobs will enter that queue.

 To put jobs into the hold state:

> qhold -h {u|o|s} jobID

Note: Jobs in the hold state do not enter execution. This is useful, for example, when there are
jobs that we do not want to enter execution yet; we put them in the hold state.

 To change the hold state of jobs (see the example below):

> qalter -h {u|o|s} jobID
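
For example (a sketch; 'qrls' is the GE command that releases holds, and job 42909 is just the
array job submitted earlier):

> qhold -h u 42909   # place a user hold on job 42909
> qrls 42909         # release the hold so the job can be scheduled again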

 To give more priority to a jobID or to the queued jobs of a user:

> qalter -p 1024 jobID


> qalter -p 1024 -u user

7. HOW TO RUN ARRAY JOBS USING GE

GE allows submitting a collection of similar jobs together using only one job script. The scheduler
executes each job in the array when resources are available. To do this we use the array job
option.
Some advantages of using array jobs are:

 It is only necessary to write one shell script.

 You can keep track of the status of all the jobs in the array using only one job id.

 If you submit an array job, and realize you’ve made a mistake, you only have one job id to
'qdel', instead of figuring out how to remove 100s of them.

 The memory consumption is much lower on the Master Host compared to the situation where
all the jobs run independently.
For example, we may need to run a large number of jobs that are largely identical in terms of the
command to run. To do this, you could generate many shell scripts and submit them to the
queue, but instead you can use the job array option. For this you must execute 'qsub' with
the option:

-t min-max:interval

The -t option defines the task index range, where min is the lowest index, max the highest index,
and interval the step between consecutive task numbers.
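
For example, the following sketch creates an array job whose tasks are numbered 1, 11, 21, ..., 91
(ten tasks in total):

> qsub -t 1-100:10 test.sh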

GE runs the executable once for each number in the task index range and sets the
variable $SGE_TASK_ID, which we can use in the executable to determine the task
number of each run. It can be used to select input files or other options according to the task index
number. A very simple example:
Suppose that we have two input files, job1.in and job2.in, in /opt/exp_soft/cesga/, each containing a
column of numbers, and we want to read every number in both inputs; we can write a script like the
following:

jobArray.sh
#!/bin/bash

echo "Number of the job is $SGE_TASK_ID"

for n in `cat /opt/exp_soft/cesga/job$SGE_TASK_ID.in`

do

echo "___________ Reading number $n ___________"


done

We submit the job:


qsub -t 1-2 jobArray.sh

The contents of each output file are shown below.


cat jobArray.sh.o43447.1

Number of the job is 1

___________ Reading number 1 ___________

___________ Reading number 2 ___________


___________ Reading number 3 ___________

[ ... ]

cat jobArray.sh.o43447.2

Number of the job is 2

___________ Reading number 11 ___________

___________ Reading number 12 ___________

___________ Reading number 13 ___________

[ ... ]

8. HOW TO CONFIGURE A PARALLEL ENVIRONMENT

GE provides a flexible and powerful interface for running parallel jobs. For example, we can use
MPI, one of the best-known message passing environments, with GE.

Note that MPI support for the lcg-CE with GE is still under development; however, it is possible to
configure MPI on an lcg-CE with GE in production, at your own risk, following the instructions at
link [R18].

To make this possible, we should:

 Add a parallel environment

qconf -ap openmpi_egee


pe_name openmpi_egee
slots 16 # Total number of slots on which parallel jobs can run
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE # Full control over slave tasks
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
We should also configure a queue with this environment in order to have a queue in which
parallel jobs can run. For example:

[root@ce3 ~]# qconf -sq GRID_ops


qname GRID_ops
[ ... ]
pe_list openmpi_egee
[ ... ]

 Testing:
 Submitting a job directly to the GE batch system:

qsub -l num_proc=1 -pe openmpi_egee 2 test.sh

Note: In this example, 2 slots are requested, each with one processor.

test.sh content:
#!/bin/bash

MPI_FLAVOR=OPENMPI
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Path where the MPI package is installed


export MPI_PATH=/opt/i2g/openmpi

# Ensure the prefix is correctly set. Don't rely on the defaults.


eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

export X509_USER_PROXY=/tmp/x509up_u527

export I2G_TMP=/tmp
export I2G_LOCATION=/opt/i2g
#export I2G_OPENMPI_PREFIX=/opt/i2g/openmpi
export I2G_MPI_TYPE=openmpi
export I2G_MPI_FLAVOUR=openmpi

# PATH to the application that we want to run


## This application should be copied to the WN; in this example it was copied to the
/tmp directory
export I2G_MPI_APPLICATION=/tmp/cpi

export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_NP=2
export I2G_MPI_JOB_NUMBER=0
export I2G_MPI_STARTUP_INFO=/home/glite/dteam004
export I2G_MPI_PRECOMMAND=
export I2G_MPI_RELAY=

# PATH where the mpi-start RPM is installed


export I2G_MPI_START=/opt/i2g/bin/mpi-start

export I2G_MPI_START_DEBUG=1
export I2G_MPI_START_VERBOSE=1
$I2G_MPI_START

 Submitting the previous job using a WMS:

Building the JDL file:


esfreire@ui mpi_egee]$ cat lanzar_cpi_mpi.jdl
JobType = "Normal";
VirtualOrganisation = "dteam";
NodeNumber = 2;
# We use a wrapper script to starting the MPI job
Executable = "mpi-start-wrapper.sh";
Arguments = "cpi OPENMPI";
StdOutput = "cpi.out";
StdError = "cpi.err";
InputSandbox = {"cpi","mpi-start-wrapper.sh"};
OutputSandbox = {"cpi.out","cpi.err"};
# In this example, we are requesting an available queue on one of the CEs in CESGA-
EGEE production
Requirements = other.GlueCEUniqueID == "ce2.egee.cesga.es:2119/jobmanager-
lcgsge-GRID_dteam";

mpi-start-wrapper.sh content:
[esfreire@ui mpi_egee]$ cat mpi-start-wrapper.sh
#!/bin/bash
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1

MPI_FLAVOR=$2

# Convert flavor to lowercase for passing to mpi-start


MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.


eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
export MPI_PATH=$MPI_PATH

# Ensure the prefix is correctly set. Don't rely on the defaults.


eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# when it shouldn't.
touch $MY_EXECUTABLE
chmod +x $MY_EXECUTABLE

# Setup for mpi-start.


export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
# optional hooks
#export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
#export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# If these are set then you will get more debugging information.
#export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1

# Invoke mpi-start.
$I2G_MPI_START

Submitting the job:


glite-wms-job-submit -a -o mpi.job lanzar_cpi_mpi.jdl
Downloading the job results:
glite-wms-job-output -i mpi.job --dir ~/jobOutput/
Note: In both examples, we are using a compiled application called “cpi” which calculates the
number PI.

9. HOW TO ASSIGN PRIORITIES TO GROUPS AND USERS

If we want the jobs of a certain VO to have more priority when entering execution, we
can set this with the command 'qconf -mu VO_name' using the parameters fshare (the current
functional share of the department) and oticket (the amount of override tickets currently assigned
to the department). Before assigning these two parameters we must configure the type parameter
as "ACL DEPT", otherwise it will not be possible to assign priorities.

In the next example, we first show a list of all the groups created; there we can see the
departments. Then we edit the ops group so that the jobs sent to the ops VO have more priority than
other jobs. Priorities can be established for all the VOs, giving them, for example, a value
between 0 (low priority, the default value) and 9000 (high priority), so you can establish
priorities according to your site. Also, if we want to give more priority to a single user, we can create
a "userlist" with that user and then give that "userlist" more priority.

[esfreire@sa3-ce esfreire]$ qconf -sul


biomed
cesga
compchem
deadlineusers
defaultdepartment
dteam
fusion
lhcb
ops
swetest

[esfreire@sa3-ce esfreire]$ qconf -su ops


name ops
type ACL
fshare 1000
oticket 1000
entries ops001,ops002,ops003,ops004,ops005,ops006,ops007,ops008,ops009,ops010, \
ops011,ops012,ops013,ops014,ops015,ops016,ops017,ops018,ops019,ops020, \
ops021,ops022,ops023,ops024,ops025,ops026,ops027,ops028,ops029,ops030, \
ops031,ops032,ops033,ops034,ops035,ops036,ops037,ops038,ops039,ops040, \
ops041,ops042,ops043,ops044,ops045,ops046,ops047,ops048,ops049,ops050, \
ops051,ops052,ops053,ops054,ops055,ops056,ops057,ops058,ops059,ops060, \
ops061,ops062,ops063,ops064,ops065,ops066,ops067,ops068,ops069,ops070, \
ops071,ops072,ops073,ops074,ops075,ops076,ops077,ops078,ops079,ops080, \
ops081,ops082,ops083,ops084,ops085,ops086,ops087,ops088,ops089,ops090, \
ops091,ops092,ops093,ops094,ops095,ops096,ops097,ops098,ops099,opssgm, \
opsprd

[esfreire@sa3-ce esfreire]$ qconf -au opssgm opssgm


added "opssgm" to access list "opssgm"

[esfreire@sa3-ce esfreire]$ qconf -su opssgm


name opssgm
type ACL DEPT
fshare 0
oticket 1000
entries opssgm
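Following the same format as the outputs above, this is a minimal sketch (the VO name, entries and share values are only an example, not taken from a real site) of how a userset could look after editing it with 'qconf -mu dteam' to raise its priority:

name dteam
type ACL DEPT
fshare 500
oticket 500
entries dteam001,dteam002,dteam003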

10. ONE CONFIGURATION EXAMPLE

For this configuration example:

• We suppose that we have a cluster formed by 80 nodes.
• We have many users, each of whom submits on average 20 jobs.
• We give priority to short jobs.
• We give more priority to the jobs of the dteam and ops VOs.

An optimized "Global Cluster Configuration" could be:

[esfreire@sa3-ce~]$ qconf -sconf


global:
execd_spool_dir /usr/local/sge/pro/default/spool
mailer /bin/mail
xterm /usr/bin/X11/xterm
load_sensor none
prolog none
epilog none
shell_start_mode unix_behavior
login_shells sh,ksh,csh,tcsh,bash
min_uid 0
min_gid 0
user_lists none
xuser_lists none
projects none
xprojects none
enforce_project false
enforce_user auto
load_report_time 00:00:40
max_unheard 00:05:00
reschedule_unknown 00:00:00
loglevel log_info
administrator_mail egee-admin@cesga.es
set_token_cmd none
pag_cmd none
token_extend_time none
shepherd_cmd none
qmaster_params enabled_force_qdel=true
execd_params none
reporting_params accounting=true reporting=true \
flush_time=00:00:15 joblog=false \
sharelog=00:00:00
finished_jobs 80
gid_range 5000-5100
qlogin_command telnet
qlogin_daemon /usr/sbin/in.telnetd
rlogin_daemon /usr/sbin/in.rlogind
max_aj_instances 2000
max_aj_tasks 75000
max_u_jobs 80
max_jobs 0
auto_user_oticket 0
auto_user_fshare 0
auto_user_default_project none
auto_user_delete_time 86400
delegated_file_staging none
reprioritize 1

With this configuration, each time a job fails we will receive an e-mail with the job's error output. We also allow forced deletion of jobs with 'qdel -f' (qmaster_params enabled_force_qdel=true), which is useful when jobs get stuck or the node where they run stops responding. Since we have 80 nodes, we only allow each user to have 80 jobs in the queue (max_u_jobs 80); a user will not be able to submit more than 80 jobs.
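For example, a job that cannot be removed in the normal way can be force-deleted (the job id below is just an illustration):

qdel -f 42895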
An optimized Scheduler Configuration could be:

[esfreire@sa3-ce ~]$ qconf -ssconf


algorithm default
schedule_interval 00:2:00
maxujobs 5
queue_sort_method seqno
job_load_adjustments np_load_avg=0.50
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info true
flush_submit_sec 10
flush_finish_sec 10
params MONITOR=0
reprioritize_interval 0:15:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 10000
weight_tickets_share 10000
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 2.000000
weight_waiting_time 2.000000
weight_deadline 3600000.000000
weight_urgency 1.000000
weight_priority 1.000000
max_reservation 5
default_duration 0:10:0
With this configuration, the scheduler runs every two minutes (schedule_interval), trying not to overload the machine. Since we only have 80 nodes and many users, we only allow each user to have 5 jobs running at the same time (maxujobs), so that all users can have jobs running. We sort the queues by sequence number (queue_sort_method seqno): each VO has its own queue and each queue has a sequence number assigned. In addition, we have a queue that accepts jobs from all VOs but only jobs of up to one hour of execution time; this queue gets the lowest sequence number, so jobs try it first and the remaining queues are then sorted by their sequence numbers. The sequence number is specified with the seq_no parameter; for example, our "short queue" could look like this:

[esfreire@sa3-ce~]$ qconf -sq short_queue


qname short_queue
hostlist sa3-wn001.egee.cesga.es
seq_no 0 # With the value "0", this will be the first queue a job tries, provided no other queue has the same value
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH
ckpt_list NONE
pe_list make
rerun FALSE
slots 1
tmpdir /tmp
shell /bin/ksh
prolog NONE
epilog NONE
shell_start_mode unix_behavior
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt 1:00:00 # Jobs in this queue cannot request more than one hour of run time.
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize 400M
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem 400M
h_vmem INFINITY
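To complete the picture of the sequence-number sorting described above, the other queues would receive higher seq_no values. As a sketch (the queue names and numbers are only an example, not a recommendation), this could also be done non-interactively with qconf -mattr:

# Give the catch-all short queue the lowest sequence number and the
# per-VO queues higher ones (example queue names and values)
qconf -mattr queue seq_no 0 short_queue
qconf -mattr queue seq_no 10 dteam
qconf -mattr queue seq_no 10 ops
qconf -mattr queue seq_no 20 biomed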

In order to give more priority to the jobs of the "dteam" and "ops" VOs, and to minimize the time these jobs wait before entering execution, we always keep an additional slot reserved for ops and dteam.

We can allow a node to run a job from another VO and a dteam/ops job at the same time, because these jobs consume little memory and little CPU.

For example:

qconf -sq ops


[ ... ]
slots 2
[ ... ]
s_rt 1:00:00
[ .... ]
s_vmem 400M

qconf -sq dteam


[ ... ]
slots 2
[ .... ]
s_rt 1:00:00
[ ... ]
s_vmem 400M

Rest of queues:

qconf -sq '*'


[ ... ]
slots 1
[ ... ]

Note: In this case we work with two slots because we assume these nodes have only one processor each.
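If we prefer not to edit each queue configuration interactively, the same slot values could be set from the command line with qconf -mattr (a sketch; the queue names are simply the ones used in this example):

# Two slots on the ops and dteam queues, one slot everywhere else
qconf -mattr queue slots 2 ops
qconf -mattr queue slots 2 dteam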

11. TROUBLESHOOTING

We can check whether a job has finished correctly by running the command 'qacct -j jobid_of_the_job'. In the output of this command we must look at the 'exit_status' and 'failed' fields. If either of them has a value different from 0, it means the job had some error or there was some internal problem in GE, for example in the epilog script, or the node on which the job ran had some problem. If we see that these values are different from 0, we can look for clues about what may be failing in the following places (an illustrative qacct excerpt is sketched after the log examples below):

• In the files that are generated when the job finishes:

xxx.ojobID (where the standard output of the job is redirected)
xxx.ejobID (where the standard error of the job is redirected)

• Important logs that we should check; for example, we can grep for the jobid of the job:

a) On the master node (CE):

Qmaster's logs:

[esfreire@sa3-ce esfreire]$ grep 42895 $SGE_ROOT/default/spool/qmaster/messages

06/22/2007 07:06:31|qmaster|sa3-ce|I|job 42895.1 finished on host sa3-wn001.egee.cesga.es
To see which resources the job requested (if it requested any) and what it actually consumed, check the accounting file:

[esfreire@sa3-ce esfreire]$ grep 42895 $SGE_ROOT/default/common/accounting


dteam:sa3-
wn001.egee.cesga.es:dteam:dteam006:STDIN:42895:sge:19:1182488713:11824887
15:1182488790:0:0:75:17:22:0.000000:0:0:0:0:272462:463298:0:0.000000:0:0:0:0:0:
0:NONE:defaultdepartment:NONE:1:0:39.000000:0.233644:0.000000:-U dteam -q
dteam:0.000000:NONE:2068172800.000000
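The meaning of each colon-separated field in this record is described in the accounting(5) man page, which can be consulted on the CE with:

man 5 accounting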

b) On the execution host (WN):

Execution daemon (execd) logs:

[root@sa3-wn001 root]# grep 42895 $SGE_ROOT/default/spool/sa3-wn001/messages
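For reference, this is a minimal sketch of the kind of qacct output mentioned at the beginning of this section; the field names are real qacct fields, but the values shown are only an example:

[esfreire@sa3-ce esfreire]$ qacct -j 42895
==============================================================
qname        dteam
hostname     sa3-wn001.egee.cesga.es
jobnumber    42895
[ ... ]
failed       0
exit_status  0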

In addition, we can check the exit_status obtained and look up its meaning in the "N1 Grid Engine User Guide" manual [R19], and we can also subscribe to the mailing list where we can send our questions. It is possible to subscribe to the list at link [R20].

More information about the error messages and troubleshooting can be found at link [R21].
12. COMMANDS QUICK REFERENCE
TARGET                        ACTION            CMD. SWITCH
SCHEDULER                     SHOW              qconf -sss
                              TERMINATE         qconf -ks
SCHEDULER CONFIG              MODIFY            qconf -msconf
                              SHOW              qconf -ssconf
SHARE TREE NODE               ADD               qconf -astnode path
                              CREATE            qconf -astree
                              DELETE            qconf -dstree
                              SHOW              qconf -sstree
SUBMIT HOST                   CREATE            qconf -as name
                              DELETE            qconf -ds name
                              SHOW              qconf -ss
USER                          CREATE            qconf -auser
                              DELETE            qconf -duser name
                              MODIFY            qconf -muser name
                              SHOW              qconf -suser name
                              LIST              qconf -suserl
USER LIST                     CREATE            qconf -au user_name list_name
                              DELETE            qconf -dul list_name
                              DELETE USER       qconf -du user_name list_name
                              MODIFY            qconf -mu list_name
                              SHOW              qconf -su list_name
                              LIST              qconf -sul
USER SET                      SEE "USER LIST"
DEPARTMENT                    SEE "USER LIST"
EVENT CLIENT LIST             SHOW              qconf -secl
EXEC HOST                     STOP              qconf -ke name
                                                (if name is "all", all exec hosts will be killed)
EXEC HOST CONFIG              CREATE            qconf -ae [config]
                                                (name a config with "-ae" to import it as a template)
                              DELETE            qconf -de name | global
                              MODIFY            qconf -me name | global
                              SHOW              qconf -se name | global
GLOBAL EXEC HOST CONFIG       SEE "EXEC HOST CONFIG"
GLOBAL HOST CONFIG            SEE "HOST CONFIG"
HOST CONFIG                   CREATE            qconf -aconf name
                              DELETE            qconf -dconf name
                              MODIFY            qconf -mconf [name]
                              SHOW              qconf -sconf [name]
                              LIST              qconf -sconfl
                                                (Note: the "global" host configuration cannot be
                                                deleted; if [name] is not provided, GE assumes "global")
HOST GROUP                    CREATE            qconf -ahgrp @name
                              DELETE            qconf -dhgrp @name
                              MODIFY            qconf -mhgrp @name
                              SHOW              qconf -shgrp @name
                              LIST              qconf -shgrpl
PROJECT                       CREATE            qconf -aprj name
                              DELETE            qconf -dprj name
                              MODIFY            qconf -mprj name
                              SHOW              qconf -sprj name
                              LIST              qconf -sprjl
RESOURCE                      SEE "COMPLEX ENTRY"
RESOURCE QUOTA SET            CREATE            qconf -arqs [name]
                              DELETE            qconf -drqs name
                              MODIFY            qconf -mrqs name
                              SHOW              qconf -srqs [name]
                              SHOW              qquota -u '*'
                              LIST              qconf -srqsl
ADMIN HOST                    CREATE            qconf -ah name
                              DELETE            qconf -dh name
                              LIST              qconf -sh
ADVANCE RESERVATION           CREATE            qrsub (see manpage)
                              DELETE            qrdel res_id
                              SHOW              qrstat -ar ar_id
                              LIST              qrstat -u '*'
CALENDAR                      CREATE            qconf -acal name
                              DELETE            qconf -dcal name
                              MODIFY            qconf -mcal name
                              SHOW              qconf -scal name
                              LIST              qconf -scall
CHECKPOINT ENVIRONMENT        CREATE            qconf -ackpt name
                              DELETE            qconf -dckpt name
                              MODIFY            qconf -mckpt name
                              SHOW              qconf -sckpt name
                              LIST              qconf -sckptl
COMPLEX ENTRY                 CREATE            qconf -mc
                              DELETE            qconf -mc
                              MODIFY            qconf -mc
                              SHOW              qconf -sc
CONSUMABLE                    SEE "COMPLEX ENTRY"
JOB                           ALTER             qalter (see manpage)
                              CLEAR ERROR       qmod -cj jobID
                              HOLD              qhold jobID / qalter -hold_jid jobID
                              RELEASE           qrls -h n jobID
                              RESCHEDULE        qmod -rj jobID
                              SHOW              qstat -j jobID
                              LIST              qstat -u '*'
                              SUBMIT            qsub / qrsh (see manpage)
                              SUSPEND           qmod -sj jobID
                              TERMINATE         qdel jobID
                                                (Note: in most cases, the job name and wildcard (*)
                                                patterns can be used in place of jobID)
MANAGER / OPERATOR            CREATE            qconf -am name
                              DELETE            qconf -dm name
                              LIST              qconf -sm
PARALLEL ENVIRONMENT ("PE")   CREATE            qconf -ap name
                              DELETE            qconf -dp name
                              MODIFY            qconf -mp name
                              SHOW              qconf -sp name
                              LIST              qconf -spl
QMASTER                       STOP/TERMINATE    qconf -km
QUEUE                         CLEAR ERROR       qmod -cq name | '*'
                              CREATE            qconf -aq name
                              DELETE            qconf -dq name
                              MODIFY            qconf -mq name
                              RESUME            qmod -usq name
                              SHOW              qconf -sq [name]
                                                (-sq used without [name] prints the default template)
                              LIST              qstat -f [-u '*'] / qselect / qconf -sql
                              SUSPEND           qmod -sq name
