
GE COOKBOOK V3.0

Document identifier: GE_Cookbook-v3-0.odt

Date: 17/01/2012

Activity: SA1.5

Document status: FINAL

Document link:

Abstract: A quick reference for using GE batch system in an EMI environment.

This work is co-funded by the EC EMI project under the FP7 Collaborative Projects Grant Agreement Nr.
INFSO-RI-261611.

Copyright notice:

Copyright © EGI.eu. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To
view a copy of this license, see CreativeCommons license or send a letter to Creative Commons, 171 Second Street, Suite 300, San
Francisco, California, 94105, USA. The work must be attributed by attaching the following reference to the copied elements: “Copyright ©
EGI.eu (www.egi.eu). Using this document in a way and/or for purposes not foreseen in the license, requires the prior written permission
of the copyright holders. The information contained in this document represents the views of the copyright holders as of the date such
views are published.

Delivery Slip
              Name    Partner/Activity    Date    Signature

From

Verified

Reviewed by

Approved by

Document Log
Issue  Date        Comment                                                                  Author/Partner
0-1    28/06/2007  Initial Draft                                                            Javier Lopez - CESGA, Esteban Freire - CESGA
1-0    29/06/2007  Ready for comments                                                       Javier Lopez - CESGA
1-1    16/07/2007  Updated to reflect the comments from John Walsh (TCD)                    Esteban Freire - CESGA
1-2    20/07/2007  Minor changes                                                            Javier Lopez - CESGA
1-3    04/10/2007  Minor changes. Updated to reflect the comments from Pablo Rey (CESGA)    Esteban Freire - CESGA
2-0    16/02/2010  Update to adapt the document to the latest SGE and gLite versions        Esteban Freire - CESGA
2-1    28/04/2010  Added a new paragraph with installation notes for CREAM-CE with SGE      Esteban Freire - CESGA
3-0    17/01/2012  Changes for Open Grid Scheduler / Grid Engine installation and configuration instead of Sun Grid Engine, and adaptation to EMI    Roberto Rosende - CESGA

Document Change Record


Issue Item Reason for Change
Table of contents
1. INTRODUCTION
1.1. PURPOSE
1.2. DOCUMENT ORGANIZATION
1.3. APPLICATION AREA
1.4. REFERENCES
1.5. DOCUMENT AMENDMENT PROCEDURE
1.6. TERMINOLOGY
2. OVERVIEW

3. GE IN THE "LCG/GLITE" CONTEXT

4. INSTALLING GE
4.1. INTRODUCTION
4.2. INSTALL PROCESS FOR EMI CREAM-CE WITH GE
4.3. CREAM INSTALLATION FOR GE
4.4. STARTING GE
5. CONFIGURING GE ON CE
5.1. POLICY_HIERARCHY PARAMETER AND OTHER PARAMETERS ABOUT PRIORITIES
5.2. HOW TO CONFIGURE AN EPILOG OR PROLOG SCRIPT
5.3. HOW TO CONFIGURE RESOURCE QUOTAS
5.4. HOW TO CONFIGURE A SHADOW QMASTER
6. USEFUL ADMIN COMMANDS

7. HOW TO RUN ARRAY JOBS USING GE

8. HOW TO CONFIGURE A PARALLEL ENVIRONMENT

9. HOW TO ASSIGN PRIORITIES TO GROUPS AND USERS

10. ONE CONFIGURATION EXAMPLE

11. TROUBLESHOOTING

12. COMMANDS QUICK REFERENCE


1. INTRODUCTION

1.1. PURPOSE
To provide site administrators with a quick reference to help them use GE at their sites.

1.2. DOCUMENT ORGANIZATION


The document is divided into twelve sections.

1.3. APPLICATION AREA


This document is intended for site administrators.

1.4. REFERENCES
Table 1: Table of references
R1  Grid Engine: http://gridscheduler.sourceforge.net/
R2  Sun Grid Engine: http://www.oracle.com/us/sun/index.htm
R3  Sun Industry Standards Source License: http://gridengine.sunsource.net/project/gridengine/Gridengine_SISSL_license.html
R4  Upgrading Open Grid Scheduler Software: http://gridscheduler.sourceforge.net/howto/howto.html
R5  SGE Wiki Page: https://twiki.cern.ch/twiki/bin/view/LCG/ImplementationOfSGE
R6  SGE stress tests on LCG-CE: https://twiki.cern.ch/twiki/bin/view/LCG/SGE_Stress
R7  GE installation: http://gridscheduler.sourceforge.net/CompileGridEngineSource.html
R8  YAIM info: http://yaim.info/
R9  Job Manager releases: http://www.egee.cesga.es/lcgsge/releases/
R10 GE installation with CREAM-CE EMI: http://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI1
R11 SGE server YAIM interface: http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/org.glite.yaim.sge-server/4.1.1/noarch/glite-yaim-sge-server-4.1.1-1.noarch.rpm
R12 GE support mailing list: ge-support@listas.cesga.es
R13 Grid Engine Documentation: http://gridengine.sunsource.net/manpages.html
R14 Scheduling Policies: http://docs.sun.com/app/docs/doc/817-5677/6ml49n2bs?q=seq_no&a=view
R15 Resource Quota Specification: http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/ResourceQuotaSpecification.html
R16 RQS common uses: http://wiki.gridengine.info/wiki/index.php/RQS_Common_Uses
R17 Migrating the qmaster to Another Host: http://docs.sun.com/app/docs/doc/820-0698/6ncdvjcl4?a=view
R18 Site Configuration for MPI: http://grid.ifca.es/wiki/Middleware/MpiStart/MpiUtils
R19 Sun N1 Grid Engine 6.1 Collection: http://www.sun.com/blueprints/1005/819-4325.html
R20 Grid Engine Mail Lists: http://gridengine.sunsource.net/maillist.html
R21 SGE troubleshooting: http://gridscheduler.sourceforge.net/howto/troubleshooting.html

1.5. DOCUMENT AMENDMENT PROCEDURE


This document may be updated as changes appear in the GE software or the gLite middleware.
Amendments, comments, and suggestions should be sent to the authors.

1.6. TERMINOLOGY
This subsection provides the definitions of terms, acronyms, and abbreviations required to properly
interpret this document. A complete project glossary is provided in the EGEE glossary.

Glossary
Acronym       Meaning
SGE           Sun Grid Engine
JobID         Job number assigned by SGE
SGE Qmaster   Master Host
YAIM          A tool for configuring Grid services
JM            Job Manager
CE            Computing Element
WN            Worker Node (execution host)
RQS           Resource Quota Set
GE            Grid Engine

2. OVERVIEW
Grid Engine [R1] is an open source job management system initially developed by Sun and now
supported and developed by the Open Grid Scheduler project. There is also a commercial version,
including support, from Sun and Oracle [R2].
Some important features include:

 Extensive operating system support: RedHat, Debian, HP-UX, Solaris, etc.

 Flexible scheduling policies: priority, urgency, and ticket-based (share-based, functional,
override)
 Support for subordinate queues
 Support for array jobs
 Support for interactive jobs (qlogin)
 Support for complex resource attributes (e.g., defining the number of licenses available for a
program)
 Shadow master hosts (high availability)
 Accounting and Reporting Console (ARCo)
 Tight integration of parallel libraries
 Implements calendars for fluctuating resources
 Supports checkpointing and migration
 Supports DRMAA 1.0
 Supports resource quotas (e.g., limiting the maximum number of running jobs per user or user
group)
 Transfer-queue Over Globus (TOG)
 Intuitive graphical interface, used by users to manage jobs and by administrators to configure
and monitor their cluster
 Good documentation: Administrator's Guide, User's Guide, mailing lists, wiki, blogs
 Enterprise-grade scalability: up to 10,000 nodes per master

3. GE IN THE "LCG/GLITE" CONTEXT


Some advantages of GE in the EMI context:

 GE includes its own scheduler, 'sge_schedd', so it is not necessary to maintain a separate


scheduler.

 GE allows:

 Configuration of a wide range of policies for groups and users, and therefore for VOs.

 Resource reservation for a given user or group (and thus VO), in other words, making
sure that resources will be available for their jobs.

 Setting limits on the maximum resources a user/group can use, in order to establish, for
example, how many jobs a user/group can run at the same time.

 GE supports complex attributes in order to establish, for example, processor, memory, and disk
usage limits.

4. INSTALLING GE

4.1. INTRODUCTION
GE hosts are classified into four groups:

 Master host.

The master host (GE Qmaster) is central to the overall cluster activity. The master host runs
the master daemon 'sge_qmaster'. This daemon controls all GE components, such as
queues and jobs. The master host usually also runs the scheduler daemon 'sge_schedd',
which makes the scheduling decisions.

 Execution hosts.

Execution hosts are nodes that have permission to run jobs. These hosts run the execution
daemon 'sge_execd'. This daemon controls the queues local to the machine on which it is
running and executes the jobs sent by 'sge_qmaster' to those queues.

 Administration hosts.

Hosts other than the master host can be given permission to carry out any kind of
administrative activity.

 Submit hosts.

Submit hosts allow submitting and controlling batch jobs only. In particular, a user who is
logged into a submit host can use 'qsub' to submit jobs, 'qstat' to check the job status, or
the graphical user interface 'qmon'.

Note: A host can belong to more than one group.
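
For example, once the cluster is running you can check which hosts currently belong to each group
with the standard 'qconf' listing options (a quick sketch):

> qconf -sh   # show the administration hosts
> qconf -ss   # show the submit hosts
> qconf -sel  # show the execution hosts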

4.2. INSTALL PROCESS FOR EMI CREAM-CE WITH GE


To install GE, the source code can be downloaded from the official web page [R1] and compiled
following the steps on that page.

To install the compiled version:


 On CE:

 setenv SGE_ROOT < Your Target Directory > (or export SGE_ROOT=< Your Target
Directory > if your shell is sh, bash, ksh)
 mkdir $SGE_ROOT
 scripts/distinst -all -local -noexit
 cd $SGE_ROOT
 ./install_qmaster
Here we can accept the default answers to all questions, except for these three:
Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> n

Grid Engine TCP/IP service >sge_qmaster<


Please enter an unused port number >> 536

Grid Engine TCP/IP service >sge_execd<


Please enter an unused port number >> 537
 It is now necessary to define the appropriate environment variables; this can be done with
the provided script:
source /usr/local/ge2011.11/default/common/settings.sh
Note: check that /etc/profile.d/sge.sh contains the appropriate paths, for example:
# Define SGE_ROOT directory and SGE commands
export SGE_ROOT=/usr/local/ge2011.11
export SGE_CELL=default
. /usr/local/ge2011.11/default/common/settings.sh

 On WN:

 setenv SGE_ROOT < Your Target Directory > (or export SGE_ROOT=< Your Target
Directory > if your shell is sh, bash, ksh)
 mkdir $SGE_ROOT
 scripts/distinst -all -local -noexit
 cd $SGE_ROOT
 ./install_qmaster
Here we can accept the default answers to all questions, except for these three:
Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> n

Grid Engine TCP/IP service >sge_qmaster<


Please enter an unused port number >> 536

Grid Engine TCP/IP service >sge_execd<


Please enter an unused port number >> 537
 Next, install the execution daemon: ./install_execd
 It is now necessary to define the appropriate environment variables; this can be done with
the provided script:
source /usr/local/ge2011.11/default/common/settings.sh
Note: check that /etc/profile.d/sge.sh contains the appropriate paths, for example:
# Define SGE_ROOT directory and SGE commands
export SGE_ROOT=/usr/local/ge2011.11
export SGE_CELL=default
. /usr/local/ge2011.11/default/common/settings.sh
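
As a quick sanity check after the installation (a sketch, assuming the environment has been set up
with settings.sh on both the CE and the WN), you can verify that the qmaster already knows about
the new execution host:

> qhost            # lists all execution hosts known to the qmaster
> qstat -f -u '*'  # shows every queue instance and the jobs running in it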

4.3. CREAM INSTALLATION FOR GE


The roadmap followed to implement CREAM with GE can be found at link [R5]. The results of the
GE stress tests on the lcg-CE, which we ran in our CESGA-SA3 testbed to test GE capacity, can be
found at link [R6].

The administrator can choose between installing CREAM and the GE Qmaster on the same physical
machine or on different physical machines. It is recommended to install CREAM and the GE
Qmaster on different machines in order not to mix both services.

 CREAM and GE Qmaster in the same physical machine

Basically, the process consists of installing the necessary RPMs and configuring the services
with YAIM [R8]. The detailed installation instructions can be found at link [R7].

 CREAM and GE Qmaster in different physical machines


First of all, make sure that you are using the same GE version for the client and server tools,
and that the GE installation paths are the same on the CE and on the GE Qmaster server.
After installing, the site administrator will find one of the following two cases before
configuring with YAIM:

 Having control of the GE Qmaster (the administrator can make changes to the GE
configuration):
Make sure that the following setting is present in the Qmaster configuration:
“execd_params INHERIT_ENV=false”. This setting allows propagating the environment
of the CE into the WN. It should be there by default if the “sge-server yaim plugin” is
used. If not, it can be added using:

> qconf -mconf

- See section “5 Configuring GE on CE" to get more information.

It is necessary to declare the CE as an allowed submission machine on the GE Qmaster:

>qconf -as <CE.MY.DOMAIN>

- See section “6 Useful admin commands” for more information.

CREAM-CE installation for GE

The detailed installation instructions for CREAM-CE with GE in EMI can be found at link [R10].

The CREAM-CE should be installed on a separate node from the GE Qmaster machine in order
not to mix both services, and the same GE software version should be used in both cases.
However, the administrator can choose between installing the CREAM-CE and the GE Qmaster on
the same physical machine or on different physical machines:

 Installing CREAM-CE and GE Qmaster in the same physical machine


The detailed installation instructions can be found on link [R10].
Note: In this example we assume that no GE NFS installation will be used.

 Install GE as described in section 4.2 “INSTALL PROCESS FOR EMI CREAM-CE WITH GE”


 Install the cream-ce and ge-utils meta packages
>yum install emi-cream-ce
>yum install emi-ge-utils

 Configure the appropriate queues, VOs, and users for your batch system; see
section 5 “CONFIGURING GE ON CE” for how to do it.
 Set the following relevant variables in the site-info.def file:

BATCH_SERVER="GE Qmaster FQN"


BATCH_VERSION="GE version"
BATCH_BIN_DIR="Directory where the GE binary client tools are installed in the CE"
Example: /usr/local/sge/pro/bin/lx26-x86
BATCH_LOG_DIR="Path for the SGE accounting file".
Example: /usr/local/ge2011.11/default/common/accounting
SGE_ROOT="The GE installation directory"
SGE_CELL="SGE cell definition". Default: default
SGE_QMASTER="GE qmaster port". Default: 536
SGE_EXECD="GE execd port". Default: 537
SGE_SPOOL_METH="GE spooling method".
BLPARSER_WITH_UPDATER_NOTIFIER="true"
JOB_MANAGER=sge
CE_BATCH_SYS=sge

 Configure the CREAM-CE and SGE_utils services (in siteinfo/site-info.def the


‘BATCH_SERVER’ variable should point to the CREAM-CE machine):

>/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n creamCE -n SGE_utils

 The transfer of files between the WN and the CE is handled by a script called


sge_filestaging, which must be available on all WNs under /opt/glite/bin, and which you
can find in your CREAM-CE installation under /opt/glite/bin/sge_filestaging. This script
must be executed as the prolog and epilog of your jobs. Therefore, you should define
/opt/glite/bin/sge_filestaging --stagein and /opt/glite/bin/sge_filestaging --stageout
as prolog and epilog scripts, either in the GE global configuration “qconf -mconf" or in each
queue configuration "qconf -mq <QUEUE>", as sketched below.

 Configuring CREAM-CE and GE Qmaster in different physical machines

Note: In this example we assume that no GE NFS installation will be used.

 Install GE as described in section 4.2 “INSTALL PROCESS FOR EMI CREAM-CE WITH GE”


 Install the cream-ce and ge-utils meta packages
>yum install emi-cream-ce
>yum install emi-ge-utils

 Set the following relevant variables in the site-info.def file (also be sure to include the
VO and QUEUE information that you want to set up; a sketch with example values is shown
after the list):

BATCH_SERVER="GE Qmaster FQN"


BATCH_VERSION="GE version"
BATCH_BIN_DIR="Directory where the GE binary client tools are installed in the CE"
Example: /usr/local/ge2011.11/bin/linux-x64/
BATCH_LOG_DIR="Path for the GE accounting file". Accounting file should be
accessible from CREAM. If QMASTER is installed in a different machine share this file (by
NFS per example)
Example: /usr/local/ge2011.11/default/common/accounting
SGE_ROOT="The GE installation directory". Default: /usr/local/ge2011.11
SGE_CELL="GE cell definition". Default: default
SGE_QMASTER="GE qmaster port". Default: 536
SGE_EXECD="GE execd port". Default: 537
SGE_SPOOL_METH="GE spooling method"
BLPARSER_WITH_UPDATER_NOTIFIER="true"
JOB_MANAGER=sge
CE_BATCH_SYS=sge
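
A minimal site-info.def sketch with purely hypothetical values (the host name, paths, and spooling
method below are examples only and must be adapted to your site):

BATCH_SERVER="sgemaster.example.org"
BATCH_VERSION="GE2011.11"
BATCH_BIN_DIR="/usr/local/ge2011.11/bin/linux-x64"
BATCH_LOG_DIR="/usr/local/ge2011.11/default/common/accounting"
SGE_ROOT="/usr/local/ge2011.11"
SGE_CELL="default"
SGE_QMASTER="536"
SGE_EXECD="537"
SGE_SPOOL_METH="classic"
BLPARSER_WITH_UPDATER_NOTIFIER="true"
JOB_MANAGER=sge
CE_BATCH_SYS=sge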
 Configure the CREAM-CE service (in siteinfo/site-info.def the BATCH_SERVER variable
should point to the machine where your GE Qmaster will run):
>/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n creamCE -n SGE_utils
 For MPI support you must install the glite-mpi and openmpi packages:
>yum install glite-mpi
>yum install openmpi openmpi-devel

 Configure CREAM with the appropriate variables as described in link [R18], then run YAIM:
>/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n MPI_CE

 In the GE Qmaster, declare the CE as an allowed submission machine:

>qconf -as <CE.MY.DOMAIN>

 If you have control of the GE Qmaster, make sure that the Qmaster configuration contains
the following setting: execd_params INHERIT_ENV=false. This setting allows
propagating the environment of the submission machine (CE) into the execution machine
(WN). It can be set on the GE Qmaster using:

>qconf -mconf

 The transfer of files between the WN and the CE is handled by a script called


sge_filestaging, which must be available on all WNs under /opt/glite/bin, and which you
can find in your CREAM-CE installation under /opt/glite/bin/sge_filestaging. This script
must be executed as the prolog and epilog of your jobs. Therefore, you should define
/opt/glite/bin/sge_filestaging --stagein and /opt/glite/bin/sge_filestaging --stageout
as prolog and epilog scripts, either in the GE global configuration "qconf -mconf" or in each
queue configuration "qconf -mq <QUEUE>".

 Link CREAM-CE with a running GE Qmaster server


You should ensure that you are using the same GE version for the client and server tools, and
that the GE installation paths are the same on the CREAM-CE and on the GE Qmaster
server.

 If you are using a GE installation shared via NFS or equivalent, and you do not want YAIM
to change it, you must set the following variable in your site-info.def file. The default value
for this variable is "no", which means that the GE software WILL BE configured
by YAIM.

SGE_SHARED_INSTALL=yes

 To complete the installation, follow the steps explained in the previous
paragraph “Configuring CREAM-CE and GE Qmaster in different physical
machines”.

4.4. STARTING GE
 To start GE on the Master Host (CE), use the following command:
> /etc/init.d/sgemaster start

This command starts the scheduler 'sge_schedd' and the qmaster 'sge_qmaster'.

 To start GE on the Execution Hosts (WN), use the following command:

> /etc/init.d/sgeexecd start

This command starts the execution daemon 'sge_execd', making it possible to submit jobs
to this node.

Notes: There is no restart option. You can use start/stop instead.
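
For example, to restart the daemons on the Master Host you can simply chain the two actions:

> /etc/init.d/sgemaster stop
> /etc/init.d/sgemaster start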


If you get an error during the GE installation/configuration, you can send an e-mail to the GE
support mailing list [R12].

5. CONFIGURING GE ON CE
To configure the batch system, we will first consider the "Global Cluster Configuration" setup and
the "Scheduler Configuration" setup. These configurations are modified on the Master Host.
GE provides a GUI configuration tool called 'qmon', which can be launched by executing the
‘qmon’ command. In this document we will use the command-line based methods.
Qmon main control window (figure).

The 'qconf' command opens an editor for each of its modification options. The editor is either the
default 'vi' editor or the editor given by the EDITOR environment variable.
The "Global Cluster Configuration" settings may be displayed using the command 'qconf
-sconf'. These settings may be modified using the command 'qconf -mconf'.
Below is a sample of the configuration used on the testbed site "CESGA-SA3". We have included
some comments where we thought the variable settings needed some explanation.
#global:
execd_spool_dir /usr/local/ge2011.11/default/spool # Directory where the spool (and messages)
files are kept
mailer /bin/mail
xterm /usr/bin/X11/xterm
load_sensor none
prolog none # The exec path of a shell script that is started before execution of GE jobs
epilog none # The exec path of a shell script that is started after execution of GE jobs
shell_start_mode posix_compliant
login_shells sh,bash,ksh,csh,tcsh # The shells treated as login shells
min_uid 0
min_gid 0
user_lists none # Users who are allowed to run jobs in this cluster
xuser_lists none # Users who are not allowed to run jobs in this cluster
projects none
xprojects none
enforce_project false
enforce_user auto
load_report_time 00:00:40
max_unheard 00:05:00
reschedule_unknown 02:00:00
loglevel log_warning # Level of detail written to the messages log files
administrator_mail none
set_token_cmd none
pag_cmd none
token_extend_time none
shepherd_cmd none
qmaster_params none
execd_params none
reporting_params accounting=true reporting=false \
flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs 100 # The number of finished jobs shown by ‘qstat -s z -u '*'’
gid_range 20000-20100
qlogin_command builtin
qlogin_daemon builtin
rlogin_command builtin
rlogin_daemon builtin
rsh_command builtin
rsh_daemon builtin
max_aj_instances 2000
max_aj_tasks 75000
max_u_jobs 0
max_jobs 0
max_advance_reservations 0
auto_user_oticket 0
auto_user_fshare 0
auto_user_default_project none
auto_user_delete_time 86400
delegated_file_staging false
reprioritize 0
jsv_url none
jsv_allowed_mod ac,h,i,e,o,j,M,N,p,w

The "Scheduler Configuration" settings may be displayed using the command 'qconf -ssconf'
and these settings may be modified using the command 'qconf -msconf'.

[root@sa3-ce etc]# qconf -ssconf


algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method load
job_load_adjustments np_load_avg=0.50
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info false
flush_submit_sec 0
flush_finish_sec 0
params none
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 0
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 0.010000
weight_waiting_time 0.000000
weight_deadline 3600000.000000
weight_urgency 0.100000
weight_priority 1.000000
max_reservation 0
default_duration INFINITY

Notes: After modifying the Scheduler or Global Cluster configuration it is not necessary to restart
anything; you only need to consider which changes are best for your site.
The detailed documentation about these parameters can be found at link [R13], in section 5:

 "sge_conf" for the "Global Cluster Configuration"


 "sched_conf" for the "Scheduler Configuration"

5.1. POLICY_HIERARCHY PARAMETER AND OTHER PARAMETERS ABOUT PRIORITIES

The "POLICY_HIERARCHY" parameter can be a up to 4 letter combination of the first letters of the
4 policies S(hare-based), F(unctional), D(eadline) and O(verride). Basically the share-based
policy is equivalent to a fair-share policy that takes into account previous usage of the resources (in
GE terminology a share-tree policy), the functional policy is a fair-share policy that does not
consider historical information, the deadline policy takes into account the deadline value of the jobs
(if deadlines are set by the users), and the override policy allows a site admin to dynamically adjust
the priorities of the jobs in the system. You can combine all these policies and create your own
policy.

In our case the value OFS means that the override policy takes precedence over the functional
policy, which finally influences the share-based policy.

Then, we can assign this policy to a user or group. The final policy applied also depends on the
different weight values that we can assign in the Scheduler Configuration:
Scheduling algorithms:

 weight_user

The relative importance of the user shares in the functional policy


 weight_project

The relative importance of the project shares in the functional policy.

 weight_department

The relative importance of the department shares in the functional policy

 weight_job

The relative importance of the job shares in the functional policy

 weight_tickets_functional

The maximum number of functional tickets available for distribution by Grid Engine. By
default it is indefinite.

 weight_tickets_share

The maximum number of share-based tickets available for distribution by Grid Engine. By
default it is indefinite.

 share_override_tickets

If set to "true" or "1", override tickets of any override object instance are shared equally
among all running jobs associated with the object

 share_functional_shares

If set to "true" or "1", functional shares of any functional object instance are shared among all
the jobs associated with the object

 weight_ticket

The weight applied to the normalized ticket amount when determining the final priority
 weight_waiting_time

The weight applied to the job's waiting time since submission

 weight_deadline

The weight applied to the remaining time until a job's latest start time

 weight_urgency

The weight applied to the job's normalized urgency when determining the final priority
GE uses a weighted combination of these three policies to implement automated job scheduling
strategies:

 Share-based
 Functional
 Override

Tickets are a mixture of these three policies: each policy has a pool of tickets, and the tickets
weight the three policies against each other.
The detailed documentation about "Scheduling Policies" can be found on link [R13].

5.2. HOW TO CONFIGURE AN EPILOG OR PROLOG SCRIPT

There are two parameters that can be configured in the queue configuration:

 Prolog

A prolog script can be configured to be executed before the job starts.

 Epilog

An epilog script can be configured to be executed when the job finishes.

To configure a queue with an epilog script we should:

a) Inspect the queues using the command 'qconf -sql'


b) Edit the queue's settings using the command 'qconf -mq queue_name'
Example:
[esfreire@sa3-ce esfreire]$ qconf -mq cesga
[ ... ]
qname cesga
....
epilog /usr/local/sge/pro/default/common/epilog.sh # Directory to which we have copied
the epilog script
# By default, the value of this parameter is “none”; we replaced “none” with the path of the
directory to which we copied the script
[ ... ]
To avoid errors generated in the epilog, it is necessary to configure
“StrictHostKeyChecking=no” in the file “/etc/ssh/ssh_config” on each Execution Host; otherwise
the epilog fails when requesting the host verification.
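
As an illustration, a minimal epilog script could simply record where each job ran (this epilog.sh is
only a sketch; JOB_ID and HOSTNAME are assumed to be set by GE in the job environment):

#!/bin/bash
# epilog.sh - minimal example epilog (sketch)
# JOB_ID and HOSTNAME are assumed to be provided by GE in the job environment.
echo "Job $JOB_ID finished on $HOSTNAME at $(date)" >> /tmp/sge_epilog.log
exit 0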

5.3. HOW TO CONFIGURE RESOURCE QUOTAS


The Resource Quota feature allows administrators to apply limits to users, for example limiting
the number of jobs that a user or a user group can run on a given host or queue.
Resource quotas are configured and defined as sets. Any set can contain one or more resource
quota limits. Resource quota rule sets are processed like firewall rules on Linux
systems.

A resource quota set can be added by first creating it in a file and then loading that file with the
corresponding GE command, or by editing it directly with an editor such as vim through the
corresponding GE command:

 To add a resource quota set:

> qconf -arqs [name]

 To add a resource quota set from a file:

> qconf -Arqs file_name

 To modify a resource quota set:

> qconf -mrqs [name]

 To show a resource quota set:

> qconf -srqs [name_list]

 To show the list of resource quota sets:

> qconf -srqsl


 To delete a resource quota set:

> qconf -drqs [name_list]

 To view information about the current Grid Engine resource quotas:

> qquota -u '*'

One configuration example:

We created the following resource quota rule set for one of our CEs in production:
[root@svgd ~]# qconf -srqsl
maxujobs_svgd

The “maxujobs_svgd” content is:


[root@svgd ~]# qconf -srqs
{
name maxujobs_svgd
description NONE
enabled TRUE
limit users {*,!orballo,!@GRID_ops,!@GRID_opssgm} hosts \
compute-3-38.local to s_vmem=512M
limit users mosfet012 to num_proc=10
limit users @GRID_alice hosts @all_X86 to num_proc=20
limit users @GRID_atlas hosts @all_X86 to num_proc=20
limit users @GRID_atlassgm hosts @all_X86 to num_proc=10
limit users @GRID_biomed hosts @all_X86 to num_proc=80
limit users @GRID_cesga hosts @all_X86 to num_proc=40
limit users @GRID_cms hosts @all_X86 to num_proc=40
limit users @GRID_compchem hosts @all_X86 to num_proc=70
limit users @GRID_dteam hosts @all_X86 to num_proc=20
limit users @GRID_fusion hosts @all_X86 to num_proc=60
limit users @GRID_globus hosts @all_X86 to num_proc=10
limit users @GRID_imath hosts @all_X86 to num_proc=30
limit users @GRID_lhcb hosts @all_X86 to num_proc=10
limit users @GRID_lhcbprd hosts @all_X86 to num_proc=26
limit users @GRID_lhcbpil hosts @all_X86 to num_proc=30
limit users @GRID_ops hosts @all_X86 to num_proc=10
limit users @GRID_opssgm hosts @all_X86 to num_proc=10
limit users @GRID_alicesgm hosts @all_X86 to num_proc=30
limit users @GRID_alicesgm hosts @nodos_X86_2GB to num_proc=10
limit users @GRID_swetest hosts @all_X86 to num_proc=2
limit users @GRID_eelaprod_10 hosts @all_X86 to num_proc=20
limit users @GRID_EGEE_sgm_10 hosts @all_X86 to num_proc=10
limit users @GRID_EGEE_prd_10 hosts @all_X86 to num_proc=10
limit users @GRID_EGEE_prd_10 hosts @all_X86 to num_proc=4
limit users {*,!orballo} hosts @nodos_X86_1GB to num_proc=6
limit users {*,!orballo} hosts @nodos_X86_2GB to num_proc=4
}

The "@GRID_group" are users groups lists which we are defined previously. The "@all_X86",
"@nodos_X86_1GB" and "@nodos_X86_1GB" are hosts groups lists which we are defined
previously.

 In the example:
limit users {*,!orballo,!@GRID_ops,!@GRID_opssgm} hosts \
compute-3-38.local to s_vmem=512M

We are saying that on node "compute-3-38.local" only jobs requesting at most 512
Megabytes (s_vmem) can run; this rule applies to all users "*" except the "orballo" user and
the "GRID_ops" and "GRID_opssgm" user group lists.

 In the example:
limit users @GRID_lhcbprd hosts @all_X86 to num_proc=26

We are saying that the users defined in the "GRID_lhcbprd" list can only occupy a maximum of
twenty-six processors on the hosts defined in the "all_X86" list. The "all_X86" list groups
eighty nodes with one processor each; in other words, a user defined in the
"GRID_lhcbprd" list could occupy twenty-six of these eighty nodes.

 In the example:
limit users {*,!orballo} hosts @nodos_X86_2GB to num_proc=4
We are saying that all users '*' except the "orballo" user can only occupy a maximum of four
processors on the hosts defined in the "nodos_X86_2GB" list. The "nodos_X86_2GB" list
groups thirty nodes with one processor each; in other words, a user other than
"orballo" could only occupy four of these thirty nodes.

Note: Detailed information can be found at link [R16]. Information about how to create host groups
or user lists can be found in section “6 Useful admin commands”.

5.4. HOW TO CONFIGURE A SHADOW QMASTER

As explained in section "4.1 Introduction", a GE cluster is composed of execution hosts and


master hosts. The execution hosts run of the GE execution daemon "sge_execd". The master host
run the GE qmaster daemon "sge_qmaster". The qmaster daemon is central for the overall cluster
activity, and without it the jobs cannot be submitted or scheduled. In order to get fault tolerance, it is
possible define a machine so that in case of the master host fails this other one becomes the new
master host. This is known in GE like shadow qmaster host which runs the GE shadow daemon
"sge_shadowd", so in the event that the master host fails, the shadow daemon on one of the
shadow master machines will become the new master machine.

How to configure a shadow qmaster:

 Check that the new master host has read/write access:

The new master host must have read/write access to the qmaster spool directory and to the
common directory ($SGE_ROOT/default), as the current master does. For example, in the
CESGA case, we mounted the "$SGE_ROOT/default" directory from the master host on
the shadow qmaster host.

 Give administration permissions to the shadow qmaster:

On the master host we should give permission to the machine where the shadow qmaster
will be running so that it can act as an administrative and submit host; this is necessary for
the shadow qmaster host to be able to perform the same functions as the master host.

To add the new host as an administrative and submit host:

>qconf -ah shadow_qmaster_machine


>qconf -as shadow_qmaster_machine

 Add the new host to the shadow_masters file:

Check whether the “$SGE_ROOT/default/common/shadow_masters” file exists on the master
host. If the file exists, you can add the new qmaster host to this file. For example:
cat $SGE_ROOT/default/common/shadow_masters
svgd.local
test01.egee.cesga.es
If the “$SGE_ROOT/default/common/shadow_masters” file does not exist, it can be
created with the same user that installed GE on the machine and edited afterwards. For
example:
touch $SGE_ROOT/default/common/shadow_masters

 Start the shadow daemon on the shadow qmaster host. For example:

>$SGE_ROOT/default/common/sgemaster -shadowd
starting sge_shadowd

ps auxf | grep sge


root 3511 0.0 0.0 3660 652 pts/0 S+ 16:19 0:00 | \_ grep sge
root 3509 0.0 0.0 5284 796 ? S 16:19 0:00 /opt/cesga/sge62/bin/lx26-
x86/sge_shadowd

 This means that the shadow qmaster daemon is listening on the shadow qmaster host; if
for any reason the GE qmaster stops on the master host, for example because the
master host is being restarted, the GE qmaster will be started on the shadow qmaster
host.

 How to force the migration of the GE qmaster to the shadow qmaster host (it must be a node
in shadow_masters file):

On the qmaster shadow host execute the following command:

>$SGE_ROOT/default/common/sgemaster -migrate

Example:

>$SGE_ROOT/default/common/sgemaster -migrate
shutting down qmaster on host "svgd" ...
starting sge_qmaster

 This means that the GE qmaster is stopped on the master host and started on the
shadow qmaster host.

More information can be found at link [R17].

6. USEFUL ADMIN COMMANDS


 To start/stop GE daemons on the Master Host and on the Execution Host respectively:

> /etc/init.d/sgemaster
(no parameters): start qmaster and execution daemon if applicable
"start" start qmaster and scheduler
"stop" shutdown local Grid Engine processes and jobs
"-qmaster" only start/stop qmaster and scheduler (if applicable)

>/etc/init.d/sgeexecd
(no parameters): start sgeexecd
"start" start sgeexecd
"stop" shutdown local Grid Engine processes and jobs
"softstop" shutdown local Grid Engine processes (no jobs)

Notes: If we stop the "qmaster" and the "scheduler" on the Master Host (CE), we don't see the
jobs running with 'qstat' because the GE commands are not available when "qmaster" is stopped
(there is no connection between the Master Host and GE through "qmaster"), but when we restart
them, we see the jobs running again, and they finished without problems. However, if we stopped
the "sge_execd" on the Execution Host (WN), when we restart it the job is killed although if we
stopped the "sge_execd" with the option "sofstop" in principle the job is not killed.

GE supports various Roles such as:

 Managers:

Managers have full capabilities to manipulate the grid engine system.

 Operators:

Operators can perform many of the same commands as managers, except that operators
cannot add, delete, or modify queues.

 Owners:

Queue owners are restricted to suspending and resuming, or disabling and enabling, the
queues that they own.

 Users:

Users have certain access permissions, but they have no cluster or queue management
capabilities.

Modifications to the GE configuration are done by a “Manager”, i.e. a user who has
permission to modify these configurations. By default this user is root, but we can use another
user, created on the machine for this purpose and configured in a user
access list:
 To add one manager :

> qconf -am user-name # Add one or more users as managers

 To add one or more users to the specified access list:

> qconf -au user-name access-list-name

 To add/delete an administrative host, i.e. a host from which we can execute 'qconf':
> qconf -ah #Add
> qconf -dh #Delete

 To add/delete a submit host, i.e. a host from which we can execute 'qsub':

> qconf -as #Add


> qconf -ds #Delete

 To add/delete/modify/show an execution host, i.e. a host that will run jobs:

> qconf -ae #add


> qconf -de #Delete
> qconf -me execution_host #Modify
> qconf -se execution_host #Show

 To add/delete/modify/show a host group list:

> qconf -ahgrp group #Add


> qconf -dhgrp group #Delete
> qconf -mhgrp group #Modify
> qconf -shgrpl #Show

 To show the status of the jobs:

> qstat -u '*'


 Since the last GE updates it is necessary to execute the ‘qstat’ command with the ‘-u’
option and the wildcard ‘*’ between single quotes to see the status of all jobs in the cluster;
otherwise nothing is shown.

 To see the jobs which are queued:

> qstat -s p -u '*'

 To see the jobs which are running:

> qstat -s r -u '*'

 To see the jobs which have finished:

> qstat -s z -u '*'

 To see the finished jobs of one user, with the previous options, e.g.:
> qstat -s z -u user

 To see the jobs which are running, grouped by queue and node:

> qstat -f -u '*'

This output can also show the error state “E” or alarm state “A” for jobs and queues:

a) Error: clean the error state of a queue with 'qmod -cq queue' or of a
job with 'qmod -cj jobID'

b) Alarm: in principle nothing can be done, just wait (it is normally caused by a
high load on the execution node)

 To show all queues created:

> qconf -sql


 To add a new cluster queue

> qconf -aq queue (the queue argument is an existing queue that is used as a
template)

 To edit a queue:

> qconf -mq queue

 To show complex attributes:

> qconf -sc

Notes: Resource attribute definitions are stored in an entity called the grid engine system
"complex". Users can request "resource attributes" for jobs with 'qsub -l'. The "complex" builds
the framework for the system's "consumable resources" facility. The resource attributes that are
defined in the complex can be attached to the global cluster, to a host, or to a queue instance. The
attached attribute identifies a resource with the associated capability.

 To edit the complex:

> qconf -mc
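
For instance, a consumable attribute for software licenses could be defined in the complex and
attached to the cluster (a sketch; the attribute name 'matlab_lic' and the capacity of 10 are
hypothetical examples):

# Line added through 'qconf -mc' (columns: name shortcut type relop requestable consumable default urgency)
matlab_lic ml INT <= YES YES 0 0

# Attach 10 licenses to the whole cluster through 'qconf -me global':
complex_values matlab_lic=10

# Users then request one license at submission time:
> qsub -l matlab_lic=1 job.sh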


 To list all jobs in the queue and see the reasons why they do not enter execution:

> qstat -j
Sample output:
[root@sa3-ce root]# qstat -j
[ .... ]
queue instance "lhcb@sa3-wn001.egee.cesga.es" dropped because it is
overloaded: np_load_avg=2.380000 (= 1.450000 + 0.50 * 1.860000 with nproc=1) >= 1.75
queue instance "ops@sa3-wn001.egee.cesga.es" dropped because it is
overloaded: np_load_avg=2.380000 (= 1.450000 + 0.50 * 1.860000 with nproc=1) >= 1.75
Jobs can not run because no host can satisfy the resource requirements
1337760
There could not be found a queue instance with suitable access permissions
1337968
Jobs can not run because queue instance is not in queue list of PE
1337968

Jobs can not run because available slots combined under PE are not in range of job
1337968, 1337970

Jobs can not run because queue instance is not of type batch or transfer
1338110

Jobs can not run because the resource requirements cannot be satisfied
1338110

Notes: The following command shows the reasons why a particular job does not enter execution
or, if the job is running, its execution time and memory consumption:

> qstat -j jobID

 To show additional information about the priority for each job

> qstat -pri -u '*'

 To show accounting information about a finished job

> qacct -j jobID

The 'qacct' command can be used to obtain varying degrees of information about a job, for
example the queue name and the host the job was executed on, the status of the
finished job, how long the job took, and the maximum amount of memory used.
In the output of this command, we should watch the 'exit_status' and 'failed' fields. If
these fields have a value different from 0, it is a sign that the job had some error or that there
was some internal problem in GE. This is detailed in section “11 Troubleshooting".

Sample output:
[root@sa3-ce root]# qacct -j 41711
==============================================================
qname cesga
hostname sa3-wn001.egee.cesga.es # The host the job was executed on
group cesga
owner esfreire
project NONE
department defaultdepartment
jobname STDIN
jobnumber 41711
taskid undefined
account sge
priority 19
qsub_time Fri Jun 15 13:22:39 2007
start_time Fri Jun 15 13:22:54 2007
end_time Fri Jun 15 13:25:55 2007
granted_pe NONE
slots 1
failed 0 # If it is different from 0 it usually indicates a failure in the configuration
of SGE
exit_status 0 # Exit code for the job, if is different from 0 it usually indicates
a job failure
ru_wallclock 181 # How long the job took
ru_utime 0
ru_stime 1
ru_maxrss 0
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 12271
ru_majflt 14128
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 0
ru_nivcsw 0
cpu 1
mem 0.001
io 0.000
iow 0.000
maxvmem 20.379M # The maximum amount of memory used.

These parameters are kept in the accounting log file on the Master Host:
$SGE_ROOT/default/common/accounting

grep 41711 $SGE_ROOT/default/common/accounting


cesga:sa3-
wn001.egee.cesga.es:cesga:esfreire:STDIN:41711:sge:19:1181906559:1181906574:118190
6755:0:0:181:0:1:0.000000:0:0:0:0:12271:14128:0:0.000000:0:0:0:0:0:0:NONE:defaultdepart
ment:NONE:1:0:1.000000:0.001339:0.000000:-U cesga -q
cesga:0.000000:NONE:21368832.000000

The GE accounting log file has a simple format using “:” as the field separator. These are the fields
that you can find in the accounting log file:

qname:hostname:group:owner:jobname:jobnumber:account:priority:qsub_time:start_time:en
d_time:failed:exit_status:
ru_wallclock:ru_utime:ru_stime:ru_maxrss:ru_ixrss:ru_ismrss:ru_idrss:ru_isrss:
ru_minflt:ru_majflt:ru_nswap:ru_inblock:ru_oublock:ru_msgsnd:ru_msgrcv:ru_nsignals:
ru_nvcsw:ru_nivcsw:project:department:granted_pe:slots:UNKNOWN:cpu:mem:
UNKNOWN:command_line_arguments:UNKNOWN:UNKNOWN:maxvmem_bytes

The accounting file format is documented in the GE man pages [R13], section 5,
"accounting".

 To report accounting of Grid Engine usage:

> qacct

Sample output:
[root@sa3-ce root]# qacct

Total System Usage


WALLCLOCK    UTIME    STIME    CPU      MEMORY     IO       IOW
================================================================================
538426       36375    35599    75230    147.547    0.000    0.000
Notes: This command scans the accounting data file and produces a summary of
information on wall-clock time, CPU time, and system time for the categories
hostname, queue name, group name, owner name, job name, and job ID.

 To report all jobs that a given user has run during a certain time:

> qacct -d 10 -j -o user # Shows all jobs finished by the user in the last 10 days and the
output for each job

 To submit a job with GE on the CE without requesting complex attributes:

> qsub test.sh


Your job 42897 ("test.sh") has been submitted.

 To submit an array of 200 jobs with qsub without requesting complex attributes:

> qsub -t 1-200 test.sh


Your job 42909.1-200:1 ("test.sh") has been submitted.

Submitting an array of jobs requesting complex attributes:

> qsub -t 1-200 -l num_proc=1,s_rt=1:00:00,s_vmem=1G,h_fsize=1G test.sh


Your job 42908.1-200:1 ("test.sh") has been submitted.

 To delete a jobID:

> qdel jobID

 To delete the jobs of a user:

> qdel -u user

 To suspend/resume a job and stop its execution momentarily:

> qmod -sj jobID # Suspend


> qmod -usj jobID # Resume
Note: When we suspend a job we stop its execution, but we do not cancel it; when we
resume it, it continues again.

 To enable/disable a queue:

> qmod -e queue #Enable


> qmod -d queue #Disable

Note: When we enable a queue, we allow jobs to enter that queue for execution.
When we disable a queue that has jobs running, we do not cancel them; the jobs finish
their execution correctly, but no more jobs will enter that queue.

 To put jobs into the hold state:

> qhold -h {u|o|s} jobID

Note: Jobs in the hold state do not enter execution. This is useful, for example, when there are
jobs that we do not want to enter execution yet; we put them in the hold state.

 To change the hold state of jobs (see the example below):

> qalter -h {u|o|s} jobID
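
For example (a sketch; 'qrls' is the GE command that releases holds, and job 42909 is just the
array job submitted earlier):

> qhold -h u 42909   # place a user hold on job 42909
> qrls 42909         # release the hold so the job can be scheduled again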

 To give more priority to a jobID or to the queued jobs of a user:

> qalter -p 1024 jobID


> qalter -p 1024 -u user

7. HOW TO RUN ARRAY JOBS USING GE

GE allows submitting a collection of similar jobs together using only one job script. The scheduler
executes each job in the array when resources are available. To do this we use the array job
option.
Some advantages of using array jobs are:

 It is only necessary to write one shell script.

 You can keep track of the status of all the jobs in the array using only one job id.

 If you submit an array job, and realize you’ve made a mistake, you only have one job id to
'qdel', instead of figuring out how to remove 100s of them.

 The memory consumption is much lower on the Master Host compared to the situation where
all the jobs run independently.
For example, we may need to run a large number of jobs that are largely identical in terms of the
command to run. To do this, you could generate many shell scripts and submit them to the
queue, but instead you can use the job array option. For this you must execute 'qsub' with
the option:

-t min-max:interval

The -t option defines the task index range, where min is the lowest index, max the highest index,
and interval the step between consecutive task numbers.
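
For example, the following sketch creates an array job whose tasks are numbered 1, 11, 21, ..., 91
(ten tasks in total):

> qsub -t 1-100:10 test.sh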

GE runs the executable once for each number in the task index range and sets the
variable $SGE_TASK_ID, which we can use in the executable to determine the task
number of each run. It can be used to select input files or other options according to the task index
number. A very simple example:
Suppose that we have two input files, job1.in and job2.in, in /opt/exp_soft/cesga/, each containing a
column of numbers, and we want to read every number in both inputs; we can write a script like the
following:

jobArray.sh
#!/bin/bash

echo "Number of the job is $SGE_TASK_ID"

for n in `cat /opt/exp_soft/cesga/job$SGE_TASK_ID.in`

do

echo "___________ Reading number $n ___________"


done

We submit the job:


qsub -t 1-2 jobArray.sh

The contents of each output file are shown below.


cat jobArray.sh.o43447.1

Number of the job is 1

___________ Reading number 1 ___________

___________ Reading number 2 ___________


___________ Reading number 3 ___________

[ ... ]

cat jobArray.sh.o43447.2

Number of the job is 2

___________ Reading number 11 ___________

___________ Reading number 12 ___________

___________ Reading number 13 ___________

[ ... ]

8. HOW TO CONFIGURE A PARALLEL ENVIRONMENT

GE provides a flexible and powerful interface for running parallel jobs. For example, we can use
MPI, one of the best-known message passing environments, with GE.

Note that MPI support for the lcg-CE with GE is still under development; however, it is possible to
configure MPI on an lcg-CE with GE in production, at your own risk, following the instructions at
link [R18].

To make this possible, we should:

 Add a parallel environment

qconf -ap openmpi_egee


pe_name openmpi_egee
slots 16 # Total number of slots on which parallel jobs can run
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE # Full control over slave tasks
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
We should also configure a queue with this environment in order to have a queue in which
parallel jobs can run. For example:

[root@ce3 ~]# qconf -sq GRID_ops


qname GRID_ops
[ ... ]
pe_list openmpi_egee
[ ... ]

 Testing:
 Submitting a job directly to the GE batch system:

qsub -l num_proc=1 -pe openmpi_egee 2 test.sh

Note: In this example, 2 slots are requested, each with one processor.

test.sh content:
#!/bin/bash

MPI_FLAVOR=OPENMPI
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Path where the MPI package is installed


export MPI_PATH=/opt/i2g/openmpi

# Ensure the prefix is correctly set. Don't rely on the defaults.


eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

export X509_USER_PROXY=/tmp/x509up_u527

export I2G_TMP=/tmp
export I2G_LOCATION=/opt/i2g
#export I2G_OPENMPI_PREFIX=/opt/i2g/openmpi
export I2G_MPI_TYPE=openmpi
export I2G_MPI_FLAVOUR=openmpi

# PATH to the application that we want to run


## This application should be copied to the WN; in this example it was copied to the
/tmp directory
export I2G_MPI_APPLICATION=/tmp/cpi

export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_NP=2
export I2G_MPI_JOB_NUMBER=0
export I2G_MPI_STARTUP_INFO=/home/glite/dteam004
export I2G_MPI_PRECOMMAND=
export I2G_MPI_RELAY=

# PATH where the mpi-start RPM is installed


export I2G_MPI_START=/opt/i2g/bin/mpi-start

export I2G_MPI_START_DEBUG=1
export I2G_MPI_START_VERBOSE=1
$I2G_MPI_START

 Submitting the previous job using a WMS:

Building the JDL file:


esfreire@ui mpi_egee]$ cat lanzar_cpi_mpi.jdl
JobType = "Normal";
VirtualOrganisation = "dteam";
NodeNumber = 2;
# We use a wrapper script to starting the MPI job
Executable = "mpi-start-wrapper.sh";
Arguments = "cpi OPENMPI";
StdOutput = "cpi.out";
StdError = "cpi.err";
InputSandbox = {"cpi","mpi-start-wrapper.sh"};
OutputSandbox = {"cpi.out","cpi.err"};
# In this example, we are requesting an available queue on one of the CEs in CESGA-
EGEE production
Requirements = other.GlueCEUniqueID == "ce2.egee.cesga.es:2119/jobmanager-
lcgsge-GRID_dteam";

mpi-start-wrapper.sh content:
[esfreire@ui mpi_egee]$ cat mpi-start-wrapper.sh
#!/bin/bash
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1

MPI_FLAVOR=$2

# Convert flavor to lowercase for passing to mpi-start


MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.


eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
export MPI_PATH=$MPI_PATH

# Ensure the prefix is correctly set. Don't rely on the defaults.


eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# when it shouldn't.
touch $MY_EXECUTABLE
chmod +x $MY_EXECUTABLE

# Setup for mpi-start.


export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
# optional hooks
#export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
#export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# If these are set then you will get more debugging information.
#export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1

# Invoke mpi-start.
$I2G_MPI_START

Submitting the job:


glite-wms-job-submit -a -o mpi.job lanzar_cpi_mpi.jdl
Downloading the job results:
glite-wms-job-output -i mpi.job --dir ~/jobOutput/
Note: In both examples, we are using a compiled application called “cpi” which calculates the
number PI.

9. HOW TO ASSIGN PRIORITIES TO GROUPS AND USERS

If we want the jobs of a certain VO to have more priority when entering execution, we
can set this with the command 'qconf -mu VO_name' using the parameters fshare (the current
functional share of the department) and oticket (the amount of override tickets currently assigned
to the department). Before assigning these two parameters we must configure the type parameter
as "ACL DEPT", otherwise it will not be possible to assign priorities.

In the next example, we first show a list of all the groups created; there we can see the
departments. Then we edit the ops group so that the jobs sent to the ops VO have more priority than
other jobs. Priorities can be established for all the VOs, giving them, for example, a value
between 0 (low priority, the default value) and 9000 (high priority), so you can establish
priorities according to your site. Also, if we want to give more priority to a single user, we can create
a "userlist" with that user and then give that "userlist" more priority.

[esfreire@sa3-ce esfreire]$ qconf -sul


biomed
cesga
compchem
deadlineusers
defaultdepartment
dteam
fusion
lhcb
ops
swetest

[esfreire@sa3-ce esfreire]$ qconf -su ops


name ops
type ACL
fshare 1000
oticket 1000
entries ops001,ops002,ops003,ops004,ops005,ops006,ops007,ops008,ops009,ops010, \
ops011,ops012,ops013,ops014,ops015,ops016,ops017,ops018,ops019,ops020, \
ops021,ops022,ops023,ops024,ops025,ops026,ops027,ops028,ops029,ops030, \
ops031,ops032,ops033,ops034,ops035,ops036,ops037,ops038,ops039,ops040, \
ops041,ops042,ops043,ops044,ops045,ops046,ops047,ops048,ops049,ops050, \
ops051,ops052,ops053,ops054,ops055,ops056,ops057,ops058,ops059,ops060, \
ops061,ops062,ops063,ops064,ops065,ops066,ops067,ops068,ops069,ops070, \
ops071,ops072,ops073,ops074,ops075,ops076,ops077,ops078,ops079,ops080, \
ops081,ops082,ops083,ops084,ops085,ops086,ops087,ops088,ops089,ops090, \
ops091,ops092,ops093,ops094,ops095,ops096,ops097,ops098,ops099,opssgm, \
opsprd

[esfreire@sa3-ce esfreire]$ qconf -au opssgm opssgm


added "opssgm" to access list "opssgm"

[esfreire@sa3-ce esfreire]$ qconf -su opssgm


name opssgm
type ACL DEPT
fshare 0
oticket 1000
entries opssgm
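Following the same format as the outputs above, this is a minimal sketch (the VO name, entries and share values are only an example, not taken from a real site) of how a userset could look after editing it with 'qconf -mu dteam' to raise its priority:

name dteam
type ACL DEPT
fshare 500
oticket 500
entries dteam001,dteam002,dteam003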

10. ONE CONFIGURATION EXAMPLE

For this configuration example:

• We suppose that we have a cluster formed by 80 nodes.
• We have many users, each of whom submits on average 20 jobs.
• We give priority to short jobs.
• We give more priority to the jobs of the dteam and ops VOs.

An optimized "Global Cluster Configuration" could be:

[esfreire@sa3-ce~]$ qconf -sconf


global:
execd_spool_dir /usr/local/sge/pro/default/spool
mailer /bin/mail
xterm /usr/bin/X11/xterm
load_sensor none
prolog none
epilog none
shell_start_mode unix_behavior
login_shells sh,ksh,csh,tcsh,bash
min_uid 0
min_gid 0
user_lists none
xuser_lists none
projects none
xprojects none
enforce_project false
enforce_user auto
load_report_time 00:00:40
max_unheard 00:05:00
reschedule_unknown 00:00:00
loglevel log_info
administrator_mail egee-admin@cesga.es
set_token_cmd none
pag_cmd none
token_extend_time none
shepherd_cmd none
qmaster_params enabled_force_qdel=true
execd_params none
reporting_params accounting=true reporting=true \
flush_time=00:00:15 joblog=false \
sharelog=00:00:00
finished_jobs 80
gid_range 5000-5100
qlogin_command telnet
qlogin_daemon /usr/sbin/in.telnetd
rlogin_daemon /usr/sbin/in.rlogind
max_aj_instances 2000
max_aj_tasks 75000
max_u_jobs 80
max_jobs 0
auto_user_oticket 0
auto_user_fshare 0
auto_user_default_project none
auto_user_delete_time 86400
delegated_file_staging none
reprioritize 1

With this configuration, each time a job fails we will receive an e-mail with the job's error output. We also allow forced deletion of jobs with 'qdel -f' (qmaster_params enabled_force_qdel=true), which is useful when jobs get stuck or the node where they run stops responding. Since we have 80 nodes, we only allow each user to have 80 jobs in the queue (max_u_jobs 80); a user will not be able to submit more than 80 jobs.
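For example, a job that cannot be removed in the normal way can be force-deleted (the job id below is just an illustration):

qdel -f 42895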
An optimized Scheduler Configuration could be:

[esfreire@sa3-ce ~]$ qconf -ssconf


algorithm default
schedule_interval 00:2:00
maxujobs 5
queue_sort_method seqno
job_load_adjustments np_load_avg=0.50
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info true
flush_submit_sec 10
flush_finish_sec 10
params MONITOR=0
reprioritize_interval 0:15:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 10000
weight_tickets_share 10000
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 2.000000
weight_waiting_time 2.000000
weight_deadline 3600000.000000
weight_urgency 1.000000
weight_priority 1.000000
max_reservation 5
default_duration 0:10:0
With this configuration, the scheduler runs every two minutes (schedule_interval), trying not to overload the machine. Since we only have 80 nodes and many users, we only allow each user to have 5 jobs running at the same time (maxujobs), so that all users can have jobs running. We sort the queues by sequence number (queue_sort_method seqno): each VO has its own queue and each queue has a sequence number assigned. In addition, we have a queue that accepts jobs from all VOs but only jobs of up to one hour of execution time; this queue gets the lowest sequence number, so jobs try it first and the remaining queues are then sorted by their sequence numbers. The sequence number is specified with the seq_no parameter; for example, our "short queue" could look like this:

[esfreire@sa3-ce~]$ qconf -sq short_queue


qname short_queue
hostlist sa3-wn001.egee.cesga.es
seq_no 0 # With the value "0", this will be the first queue a job tries, provided no other queue has the same value
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH
ckpt_list NONE
pe_list make
rerun FALSE
slots 1
tmpdir /tmp
shell /bin/ksh
prolog NONE
epilog NONE
shell_start_mode unix_behavior
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt 1:00:00 # Jobs in this queue cannot request more than one hour of run time.
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize 400M
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem 400M
h_vmem INFINITY
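To complete the picture of the sequence-number sorting described above, the other queues would receive higher seq_no values. As a sketch (the queue names and numbers are only an example, not a recommendation), this could also be done non-interactively with qconf -mattr:

# Give the catch-all short queue the lowest sequence number and the
# per-VO queues higher ones (example queue names and values)
qconf -mattr queue seq_no 0 short_queue
qconf -mattr queue seq_no 10 dteam
qconf -mattr queue seq_no 10 ops
qconf -mattr queue seq_no 20 biomed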

In order to give more priority to the jobs of the "dteam" and "ops" VOs, and to minimize the time these jobs wait before entering execution, we always keep an additional slot reserved for ops and dteam.

We can allow a node to run a job from another VO and a dteam/ops job at the same time, because these jobs consume little memory and little CPU.

For example:

qconf -sq ops


[ ... ]
slots 2
[ ... ]
s_rt 1:00:00
[ .... ]
s_vmem 400M

qconf -sq dteam


[ ... ]
slots 2
[ .... ]
s_rt 1:00:00
[ ... ]
s_vmem 400M

Rest of queues:

qconf -sq '*'


[ ... ]
slots 1
[ ... ]

Note: In this case we work with two slots because we assume these nodes have only one processor each.
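If we prefer not to edit each queue configuration interactively, the same slot values could be set from the command line with qconf -mattr (a sketch; the queue names are simply the ones used in this example):

# Two slots on the ops and dteam queues, one slot everywhere else
qconf -mattr queue slots 2 ops
qconf -mattr queue slots 2 dteam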

11. TROUBLESHOOTING

We can check whether a job has finished correctly by running the command 'qacct -j jobid_of_the_job'. In the output of this command we must look at the 'exit_status' and 'failed' fields. If either of them has a value different from 0, it means the job had some error or there was some internal problem in GE, for example in the epilog script, or the node on which the job ran had some problem. If we see that these values are different from 0, we can look for clues about what may be failing in the following places (an illustrative qacct excerpt is sketched after the log examples below):

• In the files that are generated when the job finishes:

xxx.ojobID (where the standard output of the job is redirected)
xxx.ejobID (where the standard error of the job is redirected)

• Important logs that we should check; for example, we can grep for the jobid of the job:

a) On the master node (CE):

Qmaster's logs:

[esfreire@sa3-ce esfreire]$ grep 42895 $SGE_ROOT/default/spool/qmaster/messages

06/22/2007 07:06:31|qmaster|sa3-ce|I|job 42895.1 finished on host sa3-wn001.egee.cesga.es
To see which resources the job requested (if it requested any) and what it actually consumed, check the accounting file:

[esfreire@sa3-ce esfreire]$ grep 42895 $SGE_ROOT/default/common/accounting


dteam:sa3-
wn001.egee.cesga.es:dteam:dteam006:STDIN:42895:sge:19:1182488713:11824887
15:1182488790:0:0:75:17:22:0.000000:0:0:0:0:272462:463298:0:0.000000:0:0:0:0:0:
0:NONE:defaultdepartment:NONE:1:0:39.000000:0.233644:0.000000:-U dteam -q
dteam:0.000000:NONE:2068172800.000000
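The meaning of each colon-separated field in this record is described in the accounting(5) man page, which can be consulted on the CE with:

man 5 accounting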

b) On the execution host (WN):

Execution daemon (execd) logs:

[root@sa3-wn001 root]# grep 42895 $SGE_ROOT/default/spool/sa3-wn001/messages
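For reference, this is a minimal sketch of the kind of qacct output mentioned at the beginning of this section; the field names are real qacct fields, but the values shown are only an example:

[esfreire@sa3-ce esfreire]$ qacct -j 42895
==============================================================
qname        dteam
hostname     sa3-wn001.egee.cesga.es
jobnumber    42895
[ ... ]
failed       0
exit_status  0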

In addition, we can check the exit_status obtained and look up its meaning in the "N1 Grid Engine User Guide" manual [R19], and we can also subscribe to the mailing list where we can send our questions. It is possible to subscribe to the list at link [R20].

More information about the error messages and troubleshooting can be found at link [R21].
12. COMMANDS QUICK REFERENCE
TARGET                        ACTION            CMD. SWITCH
SCHEDULER                     SHOW              qconf -sss
                              TERMINATE         qconf -ks
SCHEDULER CONFIG              MODIFY            qconf -msconf
                              SHOW              qconf -ssconf
SHARE TREE NODE               ADD               qconf -astnode path
                              CREATE            qconf -astree
                              DELETE            qconf -dstree
                              SHOW              qconf -sstree
SUBMIT HOST                   CREATE            qconf -as name
                              DELETE            qconf -ds name
                              SHOW              qconf -ss
USER                          CREATE            qconf -auser
                              DELETE            qconf -duser name
                              MODIFY            qconf -muser name
                              SHOW              qconf -suser name
                              LIST              qconf -suserl
USER LIST                     CREATE            qconf -au user_name list_name
                              DELETE            qconf -dul list_name
                              DELETE USER       qconf -du user_name list_name
                              MODIFY            qconf -mu list_name
                              SHOW              qconf -su list_name
                              LIST              qconf -sul
USER SET                      SEE "USER LIST"
DEPARTMENT                    SEE "USER LIST"
EVENT CLIENT LIST             SHOW              qconf -secl
EXEC HOST                     STOP              qconf -ke name
                                                (if name is "all", all exec hosts will be killed)
EXEC HOST CONFIG              CREATE            qconf -ae [config]
                                                (name a config with "-ae" to import it as a template)
                              DELETE            qconf -de name | global
                              MODIFY            qconf -me name | global
                              SHOW              qconf -se name | global
GLOBAL EXEC HOST CONFIG       SEE "EXEC HOST CONFIG"
GLOBAL HOST CONFIG            SEE "HOST CONFIG"
HOST CONFIG                   CREATE            qconf -aconf name
                              DELETE            qconf -dconf name
                              MODIFY            qconf -mconf [name]
                              SHOW              qconf -sconf [name]
                              LIST              qconf -sconfl
                                                (Note: the "global" host configuration cannot be
                                                deleted; if [name] is not provided, GE assumes "global")
HOST GROUP                    CREATE            qconf -ahgrp @name
                              DELETE            qconf -dhgrp @name
                              MODIFY            qconf -mhgrp @name
                              SHOW              qconf -shgrp @name
                              LIST              qconf -shgrpl
PROJECT                       CREATE            qconf -aprj name
                              DELETE            qconf -dprj name
                              MODIFY            qconf -mprj name
                              SHOW              qconf -sprj name
                              LIST              qconf -sprjl
RESOURCE                      SEE "COMPLEX ENTRY"
RESOURCE QUOTA SET            CREATE            qconf -arqs [name]
                              DELETE            qconf -drqs name
                              MODIFY            qconf -mrqs name
                              SHOW              qconf -srqs [name]
                              SHOW              qquota -u '*'
                              LIST              qconf -srqsl
ADMIN HOST                    CREATE            qconf -ah name
                              DELETE            qconf -dh name
                              LIST              qconf -sh
ADVANCE RESERVATION           CREATE            qrsub (see manpage)
                              DELETE            qrdel res_id
                              SHOW              qrstat -ar ar_id
                              LIST              qrstat -u '*'
CALENDAR                      CREATE            qconf -acal name
                              DELETE            qconf -dcal name
                              MODIFY            qconf -mcal name
                              SHOW              qconf -scal name
                              LIST              qconf -scall
CHECKPOINT ENVIRONMENT        CREATE            qconf -ackpt name
                              DELETE            qconf -dckpt name
                              MODIFY            qconf -mckpt name
                              SHOW              qconf -sckpt name
                              LIST              qconf -sckptl
COMPLEX ENTRY                 CREATE            qconf -mc
                              DELETE            qconf -mc
                              MODIFY            qconf -mc
                              SHOW              qconf -sc
CONSUMABLE                    SEE "COMPLEX ENTRY"
JOB                           ALTER             qalter (see manpage)
                              CLEAR ERROR       qmod -cj jobID
                              HOLD              qhold jobID / qalter -hold_jid jobID
                              RELEASE           qrls -h n jobID
                              RESCHEDULE        qmod -rj jobID
                              SHOW              qstat -j jobID
                              LIST              qstat -u '*'
                              SUBMIT            qsub / qrsh (see manpage)
                              SUSPEND           qmod -sj jobID
                              TERMINATE         qdel jobID
                                                (Note: in most cases, the job name and wildcard (*)
                                                patterns can be used in place of jobID)
MANAGER / OPERATOR            CREATE            qconf -am name
                              DELETE            qconf -dm name
                              LIST              qconf -sm
PARALLEL ENVIRONMENT ("PE")   CREATE            qconf -ap name
                              DELETE            qconf -dp name
                              MODIFY            qconf -mp name
                              SHOW              qconf -sp name
                              LIST              qconf -spl
QMASTER                       STOP/TERMINATE    qconf -km
QUEUE                         CLEAR ERROR       qmod -cq name | '*'
                              CREATE            qconf -aq name
                              DELETE            qconf -dq name
                              MODIFY            qconf -mq name
                              RESUME            qmod -usq name
                              SHOW              qconf -sq [name]
                                                (-sq used without [name] prints the default template)
                              LIST              qstat -f [-u '*'] / qselect / qconf -sql
                              SUSPEND           qmod -sq name
