
Cluster Computing 5, 237–246, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Condor-G: A Computation Management Agent for Multi-Institutional Grids

JAMES FREY, TODD TANNENBAUM and MIRON LIVNY
Department of Computer Science, University of Wisconsin, Madison, WI 53706, USA

IAN FOSTER and STEVEN TUECKE
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA

Received January 2002

Abstract. In recent years, there has been a dramatic increase in the number of available computing and storage resources. Yet few tools exist
that allow these resources to be exploited effectively in an aggregated form. We present the Condor-G system, which leverages software
from Globus and Condor to enable users to harness multi-domain resources as if they all belong to one personal domain. We describe the
structure of Condor-G and how it handles job management, resource selection, security, and fault tolerance. We also present results from
application experiments with the Condor-G system. We assert that Condor-G can serve as a general-purpose interface to Grid resources, for
use by both end users and higher-level program development tools.

Keywords: Condor, Globus, distributed computing, Grid computing

1. Introduction

In recent years the scientific community has experienced a dramatic pluralization of computing and storage resources. The national high-end computing centers have been joined by an ever-increasing number of powerful regional and local computing environments. The aggregated capacity of these new computing resources is enormous. Yet, to date, few scientists and engineers have managed to exploit the aggregate power of this seemingly infinite Grid of resources. While in principle most users could access resources at multiple locations, in practice few reach beyond their home institution, whose resources are often far from sufficient for increasingly demanding computational tasks such as simulation, large-scale optimization, Monte Carlo computing, image processing, and rendering. The problem is the significant potential barrier associated with the diverse mechanisms, policies, size, failure modes, performance uncertainties, etc., that inevitably arise when we bring together large collections of resources that cross the boundaries of administrative domains.

Overcoming this potential barrier requires new methods and mechanisms that meet the following three key user requirements for computing in a Grid that comprises resources at multiple locations:

- They want to be able to discover, acquire, and reliably manage computational resources dynamically, in the course of their everyday activities.
- They do not want to be bothered with the location of these resources, the mechanisms that are required to use them, with keeping track of the status of computational tasks operating on these resources, or with reacting to failure.
- They do care about how long their tasks are likely to run and how much these tasks will cost.

In this article, we present an innovative distributed computing framework that addresses these three issues. The Condor-G system leverages the significant advances that have been achieved in recent years in two distinct areas: (1) security, resource discovery, and resource access in multi-domain environments, as supported within the Globus Toolkit [14], and (2) management of computation and harnessing of resources within a single administrative domain, specifically within the Condor system [24,26]. In brief, we combine the inter-domain resource management protocols of the Globus Toolkit and the intra-domain resource management methods of Condor to allow the user to harness large collections of resources across multiple domains as if they all belong to one personal domain. The user defines the tasks to be executed; Condor-G handles all aspects of discovering and acquiring appropriate resources, regardless of their location; initiating, monitoring, and managing execution on those resources; detecting and responding to failure; and notifying the user of termination and unrecoverable faults. The result is a powerful tool for managing a variety of parallel computations in Grid environments.

Condor-G's utility has been demonstrated via record-setting computations. For example, in one recent computation a Condor-G agent managed a mix of desktop workstations, commodity clusters, and supercomputer processors at ten sites to solve a previously open problem in numerical optimization [5]. In this computation, over 95,000 CPU hours were delivered over a period of less than seven days, with an average of 653 processors being active at any one time. In another case, resources at three sites were used to simulate and reconstruct 50,000 high-energy physics events, consuming 1200 CPU hours in less than a day and a half.

Condor-G has also been incorporated as a component of several Grid computing systems, including the NCSA Grid in a Box and GridChem, and the EU DataGrid Resource Broker [3,27,28].

In the rest of this article, we describe the specific problem we seek to solve with Condor-G, the architecture of the system, and the results and experience obtained to date.

2. Large-scale sharing of computational resources

We consider a Grid environment in which an individual user may, in principle, have access to computational resources at many sites. Answering why the user has access to these resources is not our concern. It may be because the user is a member of some scientific collaboration, or because the resources in question belong to a colleague, or because the user has entered into some contractual relationship with a resource provider [16]. The point is that the user is authorized to use resources at those sites to perform a computation. The question that we address is how to build and manage a multi-site computation that uses those resources.

Performing a computation on resources that belong to different sites can be difficult in practice for the following reasons:

- different sites may feature different authentication and authorization mechanisms, schedulers, hardware architectures, operating systems, file systems, etc.;
- the user has little knowledge of the characteristics of resources at remote sites, and no easy means of obtaining this information;
- due to the distributed nature of the multi-site computing environment, computers, networks, and subcomputations can fail in various ways;
- keeping track of the status of different elements of a computation involves tedious bookkeeping, especially in the event of failure and dependencies among subcomputations.

Furthermore, the user is typically not in a position to require uniform software systems on the remote sites. For example, if all sites to which a user had access ran DCE and DFS, with appropriate cross-realm Kerberos authentication arrangements, the task of creating a multi-site computation would be significantly easier. But it is not practical in the general case to assume such uniformity.

The Condor-G system addresses these issues via a separation of concerns between the three problems of remote resource access, computation management, and remote execution environments:

- Remote resource access issues are addressed by requiring that remote resources speak standard protocols for resource discovery and management. These protocols support secure discovery of remote resource configuration and state, and secure allocation of remote computational resources and management of computation on those resources. We use the protocols defined by the Globus Toolkit [14], a de facto standard for Grid computing.
- Computation management issues are addressed via the introduction of a robust, multi-functional user computation management agent responsible for resource discovery, job submission, job management, and error recovery. This Condor-G component is taken from the Condor system [24].
- Remote execution environment issues are addressed via the use of mobile sandboxing technology that allows a user to create a tailored execution environment on a remote node. This Condor-G component is also taken from the Condor system.

This separation of concerns between remote resource access and computation management has some significant benefits. First, it is significantly less demanding to require that a remote resource speak some simple protocols rather than to require it to support a more complex distributed computing environment. This is particularly important given that the deployment of production Grids [6,21,33] has made it increasingly common that remote resources speak these protocols. Second, as we explain below, careful design of remote access protocols can significantly simplify computation management.

3. Grid protocol overview

In this section, we briefly review the Grid protocols that we exploit in the Condor-G system: GRAM, GASS, MDS-2, and GSI. The Globus Toolkit provides open source implementations of each.

3.1. Grid security infrastructure

The Globus Toolkit's Grid Security Infrastructure (GSI) [15] provides essential building blocks for other Grid protocols and for Condor-G. This authentication and authorization system makes it possible to authenticate a user just once, using public key infrastructure (PKI) mechanisms to verify a user-supplied Grid credential. GSI then handles the mapping of the Grid credential to the diverse local credentials and authentication/authorization mechanisms that apply at each site. Hence, users need not re-authenticate themselves each time they (or a program acting on their behalf, such as a Condor-G computation management service) access a new remote resource.

GSI's PKI mechanisms require access to a private key that they use to sign requests. While in principle a user's private key could be cached for use by user programs, this approach exposes this critical resource to considerable risk. Instead, GSI employs the user's private key to create a proxy credential, which serves as a new private-public key pair that allows a proxy (such as the Condor-G agent) to make remote requests on behalf of the user. This proxy credential is analogous in many respects to a Kerberos ticket [32] or Andrew File System token.
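
To make the proxy idea concrete, the following Python sketch models a delegation chain as plain data: a short-lived credential derived from the user's long-lived one, with the agent holding only the short-lived member of the chain. This is only an illustration of the lifetime and delegation bookkeeping; the class, field, and function names are invented here, do not correspond to the GSI or Globus APIs, and no actual X.509 cryptography is performed.

    # Illustrative model of GSI-style proxy delegation (bookkeeping only; no
    # real cryptography and no Globus API calls).
    from dataclasses import dataclass
    from datetime import datetime, timedelta, timezone

    @dataclass
    class Credential:
        subject: str          # identity this credential asserts
        issuer: str           # who signed it (the user key or a parent proxy)
        not_after: datetime   # expiration time

    def make_proxy(parent: Credential, lifetime: timedelta) -> Credential:
        """Derive a short-lived proxy 'signed by' its parent credential."""
        expires = min(parent.not_after,
                      datetime.now(timezone.utc) + lifetime)
        return Credential(subject=parent.subject + "/proxy",
                          issuer=parent.subject,
                          not_after=expires)

    def still_valid(chain: list) -> bool:
        """A chain is usable only while every link in it is unexpired."""
        now = datetime.now(timezone.utc)
        return all(c.not_after > now for c in chain)

    if __name__ == "__main__":
        user = Credential("C=US/O=Grid/CN=Jane Doe", "CA",
                          datetime.now(timezone.utc) + timedelta(days=365))
        proxy = make_proxy(user, timedelta(hours=12))  # what the agent holds
        print(still_valid([user, proxy]))              # True until it expires

The key property mirrored here is that the agent never needs the user's long-lived private key: compromise of the proxy is bounded by its short lifetime.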

3.2. GRAM protocol and implementation

The Grid Resource Allocation and Management (GRAM) protocol [12] supports remote submission of a computational request ("run program P") to a remote computational resource, and subsequent monitoring and control of the resulting computation. Three aspects of the protocol are particularly important for our purposes: security, two-phase commit, and fault tolerance. The latter two mechanisms were developed in collaboration with the UW team and are part of the GRAM version included in the Globus Toolkit 2.0 release.

GSI security mechanisms are used in all operations to authenticate the requestor and for authorization. Authentication is performed using the supplied proxy credential, hence providing for single sign-on. Authorization implements local policy and may involve mapping the user's Grid id into a local subject name; however, this mapping is transparent to the user. Work in progress will also allow authorization decisions to be made on the basis of capabilities supplied with the request.

Each job request from a client is assigned a unique identifier by the server. This request ID is a critical piece of information for enabling fault tolerance in the protocol. It is the one thing shared between the client and server for a given request. With it, the client can reestablish communication with the server about the request after a failure. The client and server must store it persistently from before job submission to after job completion.

Two-phase commit [17] is important as a means of achieving "exactly once" execution semantics. It ensures that a client won't forget about an active request due to a failure, and that the client can reconnect to the server and obtain the current status of the request after a failure. When a job request is received by a server, the request ID is sent back to the client, but job execution (or submission to a local queuing system) is delayed until a commit message is received from the client. This lets the client store the request ID on disk before the job is submitted. When the job completes, the server notifies the client, but delays cleanup until it receives another commit message. This allows the client to ensure that the job's completion status and any staged output are on disk. By writing information to persistent storage before sending each commit message, the client can ensure that it won't lose track of a job due to a client-side crash. After a failure, the client can reconnect to the server and obtain the current status of the request.

Resource-side fault tolerance support addresses the fact that a single resource may often contain multiple machines (e.g., a cluster or Condor pool) with specialized interface machines running the GRAM server(s) that provide the connection between submitting clients and local schedulers. Consequently, failure or restart of an interface machine may result in the remote client losing contact with what is otherwise a correctly queued or executing job. This is also true for a single machine that has a persistent job queue. Hence, our GRAM implementation logs details of all active jobs to stable storage at the resource side, indexed by the request ID. This information is used after a GRAM server crashes and is restarted to resume monitoring active jobs.

Figure 1. GRAM protocol message exchange.

Figure 1 illustrates the messages exchanged in a typical GRAM request. First, the client connects to the server and sends the job request. The server prepares to run the job (by checking the request, staging files, creating a job record on disk, etc.). If the preparations are successful, it sends a ready message back to the client, including a unique identifier for the job. The client saves the job ID to disk and then sends a commit message (which is acknowledged by the server). The server then executes the job. When the job completes, the server sends a completion message to the client (which is acknowledged). The client ensures that the completion status and any staged output are safely on disk, and then sends a commit message back to the server (which is acknowledged). The server cleans up the files used by the job and then erases the job record (and all knowledge of the job). One problem not addressed by the current system is jobs whose client disappears and never returns. These jobs don't get cleaned up by the server. To do so, the server would need to judge when it is safe to assume that the client won't come back. One possibility is to have the client specify a time-out interval; if this interval passes without any contact from the client, the server is free to terminate and clean up the job.
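
The discipline behind this exchange is simple: the client never sends a commit until the state it would need to survive a crash is safely on disk. The sketch below walks through that client-side sequence against a toy, in-process stand-in for the GRAM server. The message names follow figure 1, but the classes and functions are invented for this illustration and are not the Globus GRAM API.

    # Client side of the two-phase-commit exchange of figure 1, run against a
    # toy in-process stand-in for the GRAM server (not the Globus GRAM API).
    import json, os

    class ToyGramServer:
        """Minimal stand-in: assigns request IDs, 'runs' jobs, honors commits."""
        def __init__(self):
            self.jobs, self.next_id = {}, 1
        def submit(self, request):              # returns the "ready" message
            job_id = "req-%d" % self.next_id; self.next_id += 1
            self.jobs[job_id] = "PREPARED"      # job record on stable storage
            return job_id
        def commit(self, job_id):               # 1st commit runs, 2nd erases
            if self.jobs[job_id] == "PREPARED":
                self.jobs[job_id] = "DONE"      # pretend the job ran
            else:
                del self.jobs[job_id]           # erase all knowledge of it
        def status(self, job_id):
            return self.jobs.get(job_id, "UNKNOWN")

    def save_state(path, state):
        """The client writes its view to disk *before* each commit message."""
        with open(path, "w") as f:
            json.dump(state, f)

    def run_job(server, request, state_file="condor_g_job.state"):
        job_id = server.submit(request)         # phase 1: request -> ready + ID
        save_state(state_file, {"job_id": job_id, "phase": "SUBMITTED"})
        server.commit(job_id)                   # commit 1: job may now execute
        while server.status(job_id) != "DONE":  # poll (GRAM also has callbacks)
            pass
        save_state(state_file, {"job_id": job_id, "phase": "COMPLETE"})
        server.commit(job_id)                   # commit 2: server cleans up
        os.remove(state_file)

    if __name__ == "__main__":
        run_job(ToyGramServer(), {"executable": "/bin/hostname"})
        print("job ran with (simulated) exactly-once semantics")

If the client crashes between the two commits, the persisted request ID is enough to reconnect, query the job's status, and resume the exchange where it left off.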

3.3. MDS protocols and implementation

The Globus Toolkit's MDS-2 provides basic mechanisms for discovering and disseminating information about the structure and state of Grid resources [11]. The basic ideas are simple. A resource uses the Grid Resource Registration Protocol (GRRP) to notify other entities that it is part of the Grid. Those entities can then use the Grid Resource Information Protocol (GRIP) to obtain information about resource status. These two protocols allow us to construct a range of interesting structures, including various types of directories that support discovery of interesting resources. GSI authentication is used as a basis for access control.

3.4. GASS

The Globus Toolkit's Global Access to Secondary Storage (GASS) service [9] provides mechanisms for transferring data to and from a remote HTTP, FTP, or GASS server. In the current context, we use these mechanisms to stage executables and input files to a remote computer. As usual, GSI mechanisms are used for authentication.

4. Computation management: the Condor-G agent

Next, we describe the Condor-G computation management service (or Condor-G agent). It makes the Grid appear like a local resource by using the previously described Grid protocols to handle remote execution, fault tolerance, credential management, and resource scheduling.

4.1. User interface

The Condor-G agent allows the user to treat the Grid as an entirely local resource, with an API and command line tools that allow the user to perform the following job management operations:

- submit jobs, indicating an executable name, input/output files, and arguments;
- query a job's status, or cancel the job;
- be informed of job termination or problems, via callbacks or asynchronous mechanisms such as e-mail;
- obtain access to detailed logs, providing a complete history of their jobs' execution.

There is nothing new or special about the semantics of these capabilities, as one of the main objectives of Condor-G is to preserve the look and feel of a local resource manager. The innovation in Condor-G is that these capabilities are provided by a personal desktop agent and supported in a Grid environment, while guaranteeing fault tolerance and exactly-once execution semantics. By providing the user with a familiar and reliable single access point to all the resources he/she is authorized to use, Condor-G empowers end users to improve the productivity of their computations by providing a unified view of dispersed resources.

The Condor-G agent can also be used by software systems that require a Grid-oriented job management service. Although there is not currently a full programming API for interacting with Condor-G, systems that use Condor-G through the user interface can and have been built. To submit jobs and manipulate them in the queue, such systems use the command line tools. To monitor the status and history of jobs, both current and past, Condor-G provides the user log. The user log provides a history of job events in a format that can be easily read by the user or parsed by a program. There are C++ and Perl APIs for reading the log file. Work is ongoing to represent the log as an XML document, which will make it easier for other programs to read. Each log entry contains an event id, job id, timestamp, and optional additional information. Table 1 shows the types of job events. A higher-level system can read the user log to watch the status of jobs it has submitted to Condor-G. When it sees a job termination event, it can then process the results of the job. If it sees a log event indicating an error, it can take appropriate action. For example, if a job request is denied by a resource, it can try submitting to a different resource.

Table 1
User log event types.

    Log event                          Additional information
    Job submitted to Condor-G          N/A
    Job submitted to Globus resource   Resource name
    Job started executing              N/A
    Job terminated                     Exit code (if available), execution time
    Job removed by user                N/A
    Globus submit failed               Error code
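
A minimal sketch of the higher-level pattern just described (watch the log, collect results on termination, resubmit on error) is shown below in Python. The real user log is a structured text file with C++ and Perl reader APIs; the one-line-per-event format and the event names used here are simplified stand-ins chosen for this illustration, not the actual log syntax.

    # Reacting to user-log events in the spirit of table 1. The line format and
    # event names are simplified stand-ins, not the real user-log syntax.
    def handle_event(job_id, event, info, resubmit, collect):
        """Dispatch one log event the way a higher-level system might."""
        if event == "TERMINATED":
            collect(job_id, info)             # job is done: process its results
        elif event == "GRID_SUBMIT_FAILED":
            resubmit(job_id, info)            # e.g., try a different resource
        # other events (SUBMIT, GRID_SUBMIT, EXECUTE, REMOVED) are
        # informational and would only update local bookkeeping

    def watch_log(lines, resubmit, collect):
        """Each simplified line: <timestamp> <job_id> <event> [extra info]."""
        for line in lines:
            ts, job_id, event, *rest = line.split(maxsplit=3)
            handle_event(job_id, event, rest[0] if rest else "",
                         resubmit, collect)

    if __name__ == "__main__":
        demo_log = [
            "2002-01-15T10:00:00 23.0 SUBMIT",
            "2002-01-15T10:00:05 23.0 GRID_SUBMIT gatekeeper.example.edu",
            "2002-01-15T10:02:00 23.0 EXECUTE",
            "2002-01-15T11:30:00 23.0 TERMINATED exit_code=0",
            "2002-01-15T10:00:06 24.0 GRID_SUBMIT_FAILED error_code=7",
        ]
        watch_log(demo_log,
                  resubmit=lambda j, why: print("resubmit", j, why),
                  collect=lambda j, info: print("collect output of", j, info))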

4.2. Supporting remote execution

Behind the scenes, the Condor-G agent executes user computations on remote resources on the user's behalf. It does this by using the Grid protocols described above to interact with machines on the Grid and mechanisms provided by Condor to maintain a persistent view of the state of the computation. In particular, it:

- stages a job's standard I/O and executable using GASS;
- submits a job to a remote machine using the revised GRAM job request protocol; and
- subsequently monitors job status and recovers from remote failures using the revised GRAM protocol and GRAM callbacks and status calls; while
- authenticating all requests via GSI mechanisms.

The Condor-G agent also handles resubmission of failed jobs, communications with the user concerning unusual and erroneous conditions (e.g., credential expiry, discussed below), and the recording of computation on stable storage to support restart in the event of its failure.

Figure 2. Remote execution by Condor-G on Globus-managed resources.

We have structured the Condor-G agent implementation as depicted in figure 2. The Scheduler responds to a user request to submit jobs destined to run on Grid resources by creating a new GridManager daemon to submit and manage those jobs. One GridManager process handles all jobs for a single user and terminates once all jobs are complete. Each GridManager job submission request (via the modified two-phase commit GRAM protocol) results in the creation of one Globus JobManager daemon. This daemon connects to the GridManager using GASS in order to transfer the job's executable and standard input files, and subsequently to provide real-time streaming of standard output and error. Next, the JobManager submits the jobs to the execution site's local scheduling system. Updates on job status are sent by the JobManager back to the GridManager, which then updates the Scheduler, where the job status is stored persistently as we describe below. When the job is started, a process environment variable points to a file containing the address/port (URL) of the listening GASS server in the GridManager process. If the address of the GASS server should change, perhaps because the submission machine was restarted, the GridManager requests the JobManager to update the file with the new address. This allows the job to continue file I/O after a crash recovery.

Condor-G is built to tolerate four types of failure: crash of the Globus JobManager, crash of the machine that manages the remote resource (the machine that hosts the GateKeeper and JobManager), crash of the machine on which the GridManager is executing (or crash of the GridManager alone), and failures in the network connecting the two machines.

The GridManager detects remote failures by periodically probing the JobManagers of all the jobs it manages. If a JobManager fails to respond, the GridManager then probes the GateKeeper for that machine. If the GateKeeper responds, then the GridManager knows that the individual JobManager crashed. Otherwise, either the whole resource management machine crashed or there is a network failure (the GridManager cannot distinguish these two cases). If only the JobManager crashed, the GridManager attempts to start a new JobManager to resume watching the job. Otherwise, the GridManager waits until it can reestablish contact with the remote machine. When it does, it attempts to reconnect to the JobManager. This can fail for two reasons: the JobManager crashed (because the whole machine crashed), or the JobManager exited normally (because the job completed during a network failure). In either case, the GridManager starts a new JobManager, which will resume watching the job or tell the GridManager that the job has completed.

To protect against local failure, all relevant state for each submitted job is stored persistently in the scheduler's job queue. This persistent information allows the GridManager to recover from a local crash. When restarted, the GridManager reads the information and reconnects to any of the JobManagers that were running at the time of the crash. If a JobManager fails to respond, the GridManager starts a new JobManager to watch that job.
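
The remote-failure handling just described reduces to a small decision procedure: probe the JobManager, fall back to probing the GateKeeper, and either restart a JobManager or wait out an outage. The Python sketch below captures only that control flow; probe_jobmanager, probe_gatekeeper, and start_jobmanager are placeholders standing in for the GRAM operations Condor-G actually uses, and the retry interval is an invented value.

    # Control flow of the GridManager's remote-failure recovery (placeholder
    # probe/start functions; not the actual GRAM calls).
    import time
    from types import SimpleNamespace

    def recover_job(job, probe_jobmanager, probe_gatekeeper,
                    start_jobmanager, retry_interval=60):
        """Invoked when a periodic probe of a job's JobManager gets no answer."""
        if probe_jobmanager(job):
            return "healthy"                   # false alarm; nothing to do
        if probe_gatekeeper(job.resource):
            # The machine is up, so only the JobManager died: start a new one
            # to resume watching the job (or learn that it already finished).
            start_jobmanager(job)
            return "jobmanager restarted"
        # Whole machine down or network partition; the two are indistinguishable
        # from here, so wait until the resource answers again, then reattach.
        while not probe_gatekeeper(job.resource):
            time.sleep(retry_interval)
        if not probe_jobmanager(job):          # reattach failed: JM really gone
            start_jobmanager(job)
        return "reconnected"

    if __name__ == "__main__":
        job = SimpleNamespace(id="23.0", resource="gatekeeper.example.edu")
        outcome = recover_job(job,
                              probe_jobmanager=lambda j: False,  # JM is gone
                              probe_gatekeeper=lambda r: True,   # machine is up
                              start_jobmanager=lambda j: print("restart JM", j.id))
        print(outcome)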

The current implementation of the GridManager daemon is non-threaded. We found that this has led to performance problems. Most GRAM API function calls block until the requested communication with the remote machine completes. During this time, the GridManager cannot do anything else. If the GridManager has many calls to make for multiple jobs, they have to be performed serially. If the remote machine can't be contacted (due to a crash or network failure), the call can take upwards of minutes to complete. One user experienced calls that were blocking for up to 10 min. The usual solution in Globus to deal with these problems is to move to a multi-threaded environment. However, modifying the event system in Condor-G to be multi-threaded was deemed too cumbersome. Instead, we are currently re-implementing the GridManager to use a multi-threaded helper program that handles calls to Globus functions. The GridManager issues requests to the helper program to make function calls over a pipe and occasionally checks if any of the requested calls have completed. The helper program, being multi-threaded, can issue all the calls in parallel, so calls that block for a long time won't stall other calls (or the rest of the system). We designed a simple protocol, the Globus ASCII Helper Protocol (GAHP) [36], for the communication between the GridManager and helper program, in the hope that others with needs similar to ours will find it useful.

This new design should also make it easy to port Condor-G to Windows 2000/XP. There is currently no C implementation of the GRAM API for Windows (Condor-G is written in C++). However, there is a Java implementation [23]. By writing the helper application in Java, we can then easily port Condor-G to Windows 2000/XP.
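
The helper-process design can be seen in miniature below: a single-threaded manager loop hands potentially blocking calls to a pool of worker threads and merely polls for completed results, so one stuck remote call never stalls the rest of the system. For brevity the "pipe" is modeled with in-process queues; the real GAHP is a line-oriented protocol spoken between two processes, and its actual commands are not reproduced here.

    # The helper pattern in miniature: a non-blocking "GridManager" loop plus a
    # multi-threaded helper. Queues stand in for the pipe; this is not the
    # actual GAHP command set.
    import queue, threading, time

    requests, results = queue.Queue(), queue.Queue()

    def start_helper(num_threads=4):
        """Each worker pulls a request, makes the (slow, blocking) 'remote
        call', and posts the result asynchronously."""
        def worker():
            while True:
                req_id, delay = requests.get()
                time.sleep(delay)              # stands in for a blocking call
                results.put((req_id, "DONE"))
        for _ in range(num_threads):
            threading.Thread(target=worker, daemon=True).start()

    def gridmanager_loop(num_jobs=4):
        start_helper()
        for i in range(num_jobs):              # issue all requests immediately
            requests.put((i, 0.5))
        finished = 0
        while finished < num_jobs:             # the loop never waits on any
            req_id, status = results.get()     # one particular remote call
            print("request", req_id, status)
            finished += 1

    if __name__ == "__main__":
        gridmanager_loop()                     # four 0.5 s calls overlap

The same division of labor is what allows the helper to be rewritten in another language (such as Java) without touching the manager.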

4.3. Credential management

A GSI proxy credential used by the Condor-G agent to authenticate with remote resources on the user's behalf is given a finite lifetime so as to limit the negative consequences of its capture by an adversary. A long-lived Condor-G computation must be able to deal with credential expiration. The Condor-G agent addresses this requirement by periodically analyzing the credentials for all users with currently queued jobs. (GSI provides query functions that support this analysis.) If a user's credentials have expired or are about to expire, the agent places the job in a hold state in its queue and sends the user an e-mail message explaining that their job cannot run again until their credentials are refreshed by using a simple tool. Condor-G also allows credential alarms to be set. For instance, it can be configured to e-mail a reminder when less than a specified time remains before a credential expires.

Credentials may have been forwarded to a remote location, in which case the remote credentials need to be refreshed as well. At the start of a job, the Condor-G agent forwards the user's proxy certificate from the submission machine to the remote GRAM server. When an expired proxy is refreshed, Condor-G not only needs to refresh the certificate on the local (submit) side of the connection, but it also needs to re-forward the refreshed proxy to the remote GRAM server.

We have not yet modified the GRAM protocol to handle forwarding of refreshed proxy credentials. Thus, in the current implementation of Condor-G, forwarding of the refreshed proxy to the JobManager is accomplished by having the JobManager exit when its proxy is about to expire. Then the GridManager starts a new JobManager to replace the old one, at which time a new proxy is delegated. We plan to extend the GRAM protocol to handle forwarding of refreshed proxies in the near future.

To reduce user overhead in dealing with expired credentials, Condor-G could be enhanced to work with an online credential repository such as MyProxy [29]. MyProxy lets a user store a long-lived proxy credential (e.g., a week) on a secure server. Remote services acting on behalf of the user can then obtain short-lived proxies (e.g., 12 h) from the server. Condor-G could use these short-lived proxies to authenticate with and forward to remote resources and refresh them automatically from the MyProxy server when they expire. This limits the exposure of the long-lived proxy (only the MyProxy server and Condor-G have access to it).
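
The periodic credential check described above amounts to a simple policy loop, sketched here in Python. The time_left, hold, and notify callables are placeholders for the GSI lifetime query and the corresponding Condor-G actions; the two-hour alarm threshold is an invented example of a configurable credential alarm.

    # Policy of the periodic credential check: hold jobs whose proxy has
    # expired and warn the owner when expiry is near. All callables and the
    # threshold are illustrative placeholders.
    from datetime import timedelta
    from types import SimpleNamespace

    ALARM = timedelta(hours=2)      # e-mail a reminder below this much lifetime

    def check_credentials(queued_jobs, time_left, hold, notify):
        for job in queued_jobs:
            remaining = time_left(job.owner)   # GSI can report proxy lifetime
            if remaining <= timedelta(0):
                hold(job)                      # cannot run until refreshed
                notify(job.owner, "proxy expired; job %s held until you "
                                  "refresh your credentials" % job.id)
            elif remaining < ALARM:
                notify(job.owner, "proxy for job %s expires in %s; please "
                                  "renew it soon" % (job.id, remaining))

    if __name__ == "__main__":
        jobs = [SimpleNamespace(id="30.0", owner="jdoe"),
                SimpleNamespace(id="31.0", owner="asmith")]
        lifetimes = {"jdoe": timedelta(hours=1), "asmith": timedelta(0)}
        check_credentials(jobs,
                          time_left=lambda user: lifetimes[user],
                          hold=lambda job: print("holding", job.id),
                          notify=lambda user, msg: print("mail", user + ":", msg))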

4.4. Resource discovery and scheduling

We have not yet addressed the critical question of how the Condor-G agent determines where to execute user jobs. A number of strategies are possible.

A simple approach, which we used in the initial Condor-G implementation, is to employ a user-supplied list of GRAM servers. This approach is a good starting point for further development.

A more sophisticated approach is to construct a personal resource broker that runs as part of the Condor-G agent and combines information about user authorization, application requirements and resource status (obtained from MDS) to build a list of candidate resources. These resources will be queried to determine their current status, and jobs will be submitted to appropriate resources depending on the results of these queries. Available resources can be ranked by user preferences such as allocation cost and expected start or completion time. One promising approach to constructing such a resource broker is to use the Condor Matchmaking framework [31] to implement the brokering algorithm. Such an approach is described by Vazhkudai et al. [34]. They gather information from MDS servers about Grid storage resources, format that information and user storage requests into ClassAds, and then use the Matchmaker to make brokering decisions. The DataGrid Resource Broker does the same thing for computation resources [3]. A similar approach could be taken for use with Condor-G.

In the case of high throughput computations, a simple but effective technique is to flood candidate resources with requests to execute jobs. These can be the actual jobs submitted by the user or Condor GlideIns as discussed below. Monitoring of actual queuing and execution times allows for the tuning of where to submit subsequent jobs and to migrate queued jobs.
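
A personal broker of the kind sketched in the previous paragraphs can be reduced to two steps: filter the candidate resources by the job's requirements, then rank the survivors by the user's preference. The Python sketch below uses plain dictionaries where a real broker would use ClassAds and MDS data; the attribute names, the ranking formula, and the candidate resources are all invented for illustration.

    # Two-step brokering: match on requirements, then rank by preference.
    # Dictionaries stand in for ClassAds; all attributes are illustrative.
    def matching(job, resources):
        return [r for r in resources
                if r["arch"] == job["arch"] and r["free_cpus"] >= job["min_cpus"]]

    def rank(job, resource):
        """User preference: earliest expected completion, then lowest cost."""
        expected = (resource["queue_wait_s"]
                    + job["runtime_s"] / resource["free_cpus"])
        return (expected, resource["cost_per_cpu_hour"])

    def choose_resource(job, resources):
        candidates = matching(job, resources)
        return min(candidates, key=lambda r: rank(job, r)) if candidates else None

    if __name__ == "__main__":
        job = {"arch": "INTEL-LINUX", "min_cpus": 16, "runtime_s": 3600}
        resources = [
            {"name": "pool-a", "arch": "INTEL-LINUX", "free_cpus": 64,
             "queue_wait_s": 600, "cost_per_cpu_hour": 0.0},
            {"name": "pool-b", "arch": "INTEL-LINUX", "free_cpus": 16,
             "queue_wait_s": 0, "cost_per_cpu_hour": 1.0},
            {"name": "sp2", "arch": "POWER-AIX", "free_cpus": 128,
             "queue_wait_s": 0, "cost_per_cpu_hour": 2.0},
        ]
        print("submit to:", choose_resource(job, resources)["name"])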

5. GlideIn mechanism

The techniques described above allow a user to construct, submit, and monitor the execution of a task graph, with failures and credential expirations handled seamlessly and appropriately. The result is a powerful management tool for Grid computations. However, we still have not addressed issues relating to what happens when a job executes on a remote platform where required files are not available and local policy may not permit access to local file systems. Local policy may also impose restrictions on the running time of the job, which may prove inadequate for the job to complete. These additional system and site policy heterogeneities can represent substantial barriers.

We address these concerns via what we call mobile sandboxing. In brief, we use the mechanisms described above to start on a remote computer not a user job, but a daemon process that performs the following functions:

- It uses standard Condor mechanisms to advertise its availability to a Condor Collector process, which is queried by the Scheduler to learn about available resources. Condor-G uses standard Condor mechanisms to match locally queued jobs with the resources advertised by these daemons and to remotely execute them on these resources [31].
- It runs each user task received in a "sandbox," using system call trapping technologies provided by the Condor system [24] to redirect system calls issued by the task back to the originating system. In the process, this both increases portability and protects the local system.
- It periodically checkpoints the job to another location (e.g., the originating location or a local checkpoint server) and migrates the job to another location if requested to do so (for example, when a resource is required for another purpose or the remote allocation expires) [25].

Figure 3. Remote job execution via GlideIn.

These various functions are precisely those provided by the daemon process that is run on any computer participating in a Condor pool. The difference is that in Condor-G, these daemon processes are started not by the user, but by using the GRAM remote job submission protocol. In effect, the Condor-G GlideIn mechanism uses Grid protocols to dynamically create a personal Condor pool out of Grid resources by "gliding in" Condor daemons to the remote resource. Daemons shut down gracefully when their local allocation expires or when they do not receive any jobs to execute after a (configurable) amount of time, thus guarding against runaway daemons. Our implementation of this GlideIn capability submits an initial GlideIn executable (a portable shell script), which in turn uses GSI-authenticated GridFTP to retrieve the Condor executables from a central repository, hence avoiding a need for individual users to store binaries for all potential architectures on their local machines. Figure 3 illustrates how GlideIn works.

Another advantage of using GlideIns is that they allow the Condor-G agent to delay the binding of an application to a resource until the instant when the remote resource manager decides to allocate the resource(s) to the user. By doing so, the agent minimizes queuing delays by preventing a job from waiting at one remote resource while another resource capable of serving the job is available. By submitting GlideIns to all remote resources capable of serving a job, Condor-G can guarantee optimal queuing times to its users. One can view the GlideIn as an "empty" shell script submitted to a queuing system that can be populated once it is allocated the requested resources.
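
The overall GlideIn flow can be summarized as: flood the candidate resources with glide-in submissions, let whichever daemons actually start advertise themselves, and bind user jobs only at that moment. The orchestration sketch below captures that sequence; gram_submit, advertised, and schedule are placeholders for the GRAM submission, the Collector query, and the Condor matchmaking step, and the script name and timing values are invented for illustration.

    # Late binding via glide-ins, as an orchestration sketch. The callables are
    # placeholders, not real Condor or Globus interfaces.
    import time

    IDLE_TIMEOUT_S = 20 * 60   # a started daemon with no work shuts itself down

    def glide_in(candidates, user_jobs, gram_submit, advertised, schedule,
                 poll_s=30, deadline_s=3600):
        for resource in candidates:            # flood every capable resource
            gram_submit(resource, executable="glidein_startup.sh")
        waited = 0
        while user_jobs and waited <= deadline_s:
            for slot in advertised():          # daemons that actually started
                if user_jobs:
                    schedule(user_jobs.pop(0), slot)   # binding happens here
            time.sleep(poll_s)
            waited += poll_s
        # Glide-ins that never receive work simply exit on the remote side
        # after IDLE_TIMEOUT_S and release their allocation.

    if __name__ == "__main__":
        jobs = ["job-1", "job-2"]
        glide_in(["siteA", "siteB"], jobs,
                 gram_submit=lambda r, executable: print("glide-in ->", r),
                 advertised=lambda: ["slot1@siteA", "slot2@siteA"],
                 schedule=lambda j, s: print(j, "matched to", s),
                 poll_s=1, deadline_s=1)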

6. Experiences

Three different examples illustrate the range and scale of application that we have already encountered for Condor-G technology.

An early version of Condor-G was used by a team of four mathematicians from Argonne National Laboratory, Northwestern University, and the University of Iowa to harness the power of over 2,500 CPUs at 10 sites (eight Condor pools, one cluster managed by PBS, and one supercomputer managed by LSF) to solve a very large optimization problem [4]. In less than a week the team logged over 95,000 CPU hours to solve more than 540 billion Linear Assignment Problems controlled by a sophisticated branch-and-bound algorithm. This computation used an average of 653 CPUs during that week, with a maximum of 1007 in use at any one time. Each worker in this Master-Worker application was implemented as an independent Condor job that used Remote I/O services to communicate with the Master.

A group at Caltech that is part of the CMS high energy physics and Grid Physics Network (GriPhyN) collaborations has been using Condor-G to perform large-scale distributed simulation and reconstruction of high-energy physics events. A two-node Directed Acyclic Graph (DAG) of jobs submitted to a Condor-G agent at Caltech triggers 100 simulation jobs on the Condor pool at the University of Wisconsin. Each of these jobs generates 500 events. The execution of these jobs is also controlled by a DAG that makes sure that local disk buffers do not overflow and that all events produced are transferred via GridFTP to a data repository at NCSA. Once all simulation jobs terminate and all data is shipped to the repository, the Condor-G agent at Caltech submits a subsequent reconstruction job to the PBS system that manages the reconstruction cluster at NCSA.

Condor-G has also been used in the GridChem project (formerly called GridGaussian) at NCSA to prototype a portal for running Gaussian98 jobs on Grid resources. This portal uses GlideIns to optimize access to remote resources and employs a shared Mass Storage System (MSS) to store input and output data. Users of the portal have two requirements for managing the output of their Gaussian jobs. First, the output should be reliably stored at MSS when the job completes. Second, the users should be able to view the output as it is produced. These requirements are addressed by a utility program called G-Cat that monitors the output file and sends updates to MSS as partial file chunks. G-Cat hides network performance variations from Gaussian by using local scratch storage as a buffer for Gaussian's output, rather than sending the output directly over the network. Users can view the output as it is received at MSS using a standard FTP client or by running a script that retrieves the file chunks from MSS and assembles them for viewing.

The Alliance Grid in a Box (GiB) distribution of Grid software incorporates Condor-G as a standard job submission client for Grid resources.

7. Related work

The management of batch jobs within a single distributed system or domain has been addressed by many research and commercial systems, notably Condor [24], DQS [20], LSF [35], LoadLeveler [19], and PBS [18]. Some of these systems were extended with restrictive and ad hoc capabilities for routing jobs submitted in one domain to a queue in a different domain. In all cases, both domains must run the same resource management software. With the exception of Condor, they all use a resource allocation framework that is based on a system-wide collection of queues, each representing a different class of service.

Condor flocking [13] supports multi-domain computation management by using multiple Condor flocks to exchange load. The major difference between Condor flocking and Condor-G is that Condor-G allows inter-domain operation on remote resources that require authentication, and uses standard protocols that provide access to resources controlled by other resource management systems, rather than the special-purpose sharing mechanisms of Condor.

Recently, various research and commercial groups have developed software tools that support the harnessing of idle computers for specific computations, via the use of simple remote execution agents ("workers") that, once installed on a computer, can download problems (or, in some cases, Java applications) from a central location and run them when local resources are available (i.e., SETI@home [22], Entropia, and Parabon). These tools assume a homogeneous environment where all resource management services are provided by their own system. Furthermore, a single master (i.e., a single submission point) controls the distribution of work amongst all available worker agents. Application-level scheduling techniques [7,8] provide "personalized" policies for acquiring and managing collections of heterogeneous resources. These systems employ resource management services provided by batch systems to make the resources available to the application and to place elements of the application on these resources. An application-level scheduler for high-throughput scheduling that takes data locality information into account in interesting ways has been constructed [10]. Condor-G mechanisms complement this work by addressing issues of uniform remote access, failure, credential expiry, etc. Condor-G could potentially be used as a backend for an application-level scheduling system.

Nimrod [2] provides a user interface for describing "parameter sweep" problems, with the resulting independent jobs being submitted to a resource management system; Nimrod-G [1] generalizes Nimrod to use Globus mechanisms to support access to remote resources. Condor-G addresses issues of failure, credential expiry, and interjob dependencies that are not addressed by Nimrod or Nimrod-G. Nimrod could reasonably adopt Condor-G as a backend, improving its functionality and reliability.

Acknowledgments

We would like to thank all the early adopters of Condor-G, who helped us to debug the software. They include Jeff Linderoth (MetaNEOS), Massimo Sgaravatto (EU DataGrid), Scott Koranda (GridChem, CMS), Paolo Mazzanti (EU DataGrid), Francesco Prelz (EU DataGrid), Jim Amundson (PPDG), Vladimir Litvin (CMS), and Jens Voeckler (GriPhyN). This research was supported by the NASA Information Power Grid program.

References

[1] D. Abramson, J. Giddy and L. Kotler, High performance parametric modeling with Nimrod/G: Killer application for the Global Grid?, in: IPDPS 2000 (IEEE Press, 2000).
[2] D. Abramson, R. Sosic, J. Giddy and B. Hall, Nimrod: A tool for performing parameterized simulations using distributed workstations, in: Proc. 4th IEEE Symp. on High Performance Distributed Computing (1995).
[3] C. Anglano et al., Integrating GRID tools to build a computing resource broker: Activities of DataGrid WP1, in: Computing in High Energy and Nuclear Physics (2001).
[4] K. Anstreicher, N. Brixius, J.-P. Goux and J. Linderoth, Solving large quadratic assignment problems on computational Grids, in: Mathematical Programming (2000).
[5] K. Anstreicher, N. Brixius, J.-P. Goux and J. Linderoth, Solving large quadratic assignment problems on computational Grids, in: Mathematical Programming (to appear).
[6] J. Beiriger, W. Johnson, H. Bivens, S. Humphreys and R. Rhea, Constructing the ASCI Grid, in: Proc. 9th IEEE Symposium on High Performance Distributed Computing (IEEE Press, 2000).
[7] F. Berman, High-performance schedulers, in: The Grid: Blueprint for a New Computing Infrastructure, eds. I. Foster and C. Kesselman (Morgan Kaufmann, 1999) pp. 279–309.
[8] F. Berman, R. Wolski, S. Figueira, J. Schopf and G. Shao, Application-level scheduling on distributed heterogeneous networks, in: Proc. Supercomputing '96 (1996).
[9] J. Bester, I. Foster, C. Kesselman, J. Tedesco and S. Tuecke, GASS: A data movement and access service for wide area computing systems, in: 6th Workshop on I/O in Parallel and Distributed Systems, 5 May 1999.
[10] H. Casanova, G. Obertelli, F. Berman and R. Wolski, The AppLeS parameter sweep template: User-level middleware for the Grid, in: Proc. SC2000 (2000).
[11] K. Czajkowski, S. Fitzgerald, I. Foster and C. Kesselman, Grid information services for distributed resource sharing, in: Proc. 10th IEEE Int. Symp. on High-Performance Distributed Computing (2001) pp. 181–184.
[12] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, A resource management architecture for metacomputing systems, in: Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing (1998).
[13] D.H.J. Epema, M. Livny, R.V. Dantzig, X. Evers and J. Pruyne, A worldwide flock of condors: Load sharing among workstation clusters, Future Generation Computer Systems 12 (1996).
[14] I. Foster and C. Kesselman, Globus: A toolkit-based Grid architecture, in: The Grid: Blueprint for a New Computing Infrastructure, eds. I. Foster and C. Kesselman (Morgan Kaufmann, 1999) pp. 259–278.
[15] I. Foster, C. Kesselman, G. Tsudik and S. Tuecke, A security architecture for computational Grids, in: ACM Conference on Computers and Security (1998) pp. 83–91.
[16] I. Foster, C. Kesselman and S. Tuecke, The anatomy of the Grid: Enabling scalable virtual organizations, Int. J. High Performance Computing Applications 15(3) (2001) 200–222, http://www.globus.org/research/papers/anatomy.pdf.
[17] J. Gray and A. Reuter, Two-phase commit: Making computations atomic, in: Transaction Processing: Concepts and Techniques (Morgan Kaufmann, 1993) pp. 562–573.
[18] R. Henderson and D. Tweten, Portable Batch System: External Reference Specification (1996).
[19] IBM, Using and Administering IBM LoadLeveler, Release 3.0, IBM Corporation SC23-3989 (1996).
[20] Institute S.C.R., DQS 3.1.3 User Guide, Florida State University, Tallahassee (1996).
[21] W.E. Johnston, D. Gannon and B. Nitzberg, Grids as production computing environments: The engineering aspects of NASA's Information Power Grid, in: Proc. 8th IEEE Symposium on High Performance Distributed Computing (IEEE Press, 1999).
[22] E. Korpela, D. Werthimer, D. Anderson, J. Cobb and M. Lebofsky, SETI@home: Massively distributed computing for SETI, Computing in Science and Engineering 3(1) (2001).
[23] G. von Laszewski, I. Foster, J. Gawor and J. Lane, A Java commodity Grid toolkit, Concurrency: Practice and Experience 13 (2001) (to appear).
[24] M. Litzkow, M. Livny and M. Mutka, Condor – A hunter of idle workstations, in: Proc. 8th Int. Conf. on Distributed Computing Systems (1988) pp. 104–111.
[25] M. Litzkow, T. Tannenbaum, J. Basney and M. Livny, Checkpoint and migration of UNIX processes in the Condor distributed processing system, University of Wisconsin-Madison Computer Sciences, Technical Report 1346 (1997).
[26] M. Livny, High-throughput resource management, in: The Grid: Blueprint for a New Computing Infrastructure, eds. I. Foster and C. Kesselman (Morgan Kaufmann, 1999) pp. 311–337.
[27] NCSA Alliance, Grid-in-a-Box, http://www.ncsa.uiuc.edu/TechFocus/Deployment/GiB.
[28] NCSA Alliance, GridGaussian, http://www.ncsa.uiuc.edu/Divisions/ACES/APG/grid_gaussian.htm.
[29] J. Novotny, S. Tuecke and V. Welch, An online credential repository for the Grid: MyProxy, in: Proc. 10th IEEE Int. Symp. on High-Performance Distributed Computing (2001).
[30] M. Papakhian, Comparing job-management systems: The user's perspective, IEEE Computational Science & Engineering (April–June) (1998), http://pbs.mrj.com.
[31] R. Raman, M. Livny and M. Solomon, Resource management through multilateral matchmaking, in: Proc. of the 9th IEEE Symposium on High Performance Distributed Computing (HPDC9), Pittsburgh, Pennsylvania (August 2000) pp. 290–291.
[32] J. Steiner, B.C. Neuman and J. Schiller, Kerberos: An authentication system for open network systems, in: Proc. Usenix Conference (1988) pp. 191–202.
[33] R. Stevens, P. Woodward, T. DeFanti and C. Catlett, From the I-WAY to the national technology Grid, Communications of the ACM 40(11) (1997) 50–61.
[34] S. Vazhkudai, S. Tuecke and I. Foster, Replica selection in the Globus data Grid, in: Proc. of the 1st IEEE/ACM Int. Conference on Cluster Computing and the Grid (CCGRID 2001) (IEEE Computer Society Press, 2001) pp. 106–113.
[35] S. Zhou, LSF: Load sharing in large-scale heterogeneous distributed systems, in: Proc. Workshop on Cluster Computing (1992).
[36] Condor Project, Globus ASCII Helper Protocol (GAHP), http://www.cs.wisc.edu/condor/gaph.

James Frey is a Research Assistant with the Condor Project, a distributed high-throughput and grid scheduling system, at the University of Wisconsin-Madison. He previously worked on distributed tomographic reconstruction at the National Center for Microscopy and Imaging Research (NCMIR), and the Application-Level Scheduling (AppLeS) group at the University of California, San Diego (UCSD). He received a B.S. in computer science from UCSD and is currently pursuing his Ph.D.
E-mail: jfrey@cs.wisc.edu

Todd Tannenbaum is an Associate Researcher in the Department of Computer Sciences at the University of Wisconsin-Madison (UW-Madison). He plays the role of technical lead for the Condor Project, a distributed high throughput and grid computing research group. Previous to his involvement with the Condor Project, Todd served as the Director of the Model Advanced Facility, an advanced visualization and high-performance computing center housed in the UW-Madison College of Engineering. Todd has also served as Technology Editor for Network Computing magazine, and as an officer of Coffee Computing Corp., a software development consulting company. In addition to research publications, Todd is a contributing author on books relating to cluster computing, and has published over 25 articles in several of the nation's mainstream software development and administration publications such as Dr. Dobb's Journal, Network Computing, and Information Week. He received a B.S. in computer science from the University of Wisconsin-Madison.
E-mail: tannenba@cs.wisc.edu

Ian Foster is Senior Scientist and Associate Director of the Mathematics and Computer Science Division at Argonne National Laboratory, and Professor of Computer Science at the University of Chicago. He has published four books and over 150 papers and technical reports. He co-leads the Globus project, which provides protocols and services used by industrial and academic distributed computing projects worldwide. He co-founded the influential Global Grid Forum and co-edited the book The Grid: Blueprint for a New Computing Infrastructure.
E-mail: foster@mcs.anl.gov

Miron Livny received a B.Sc. degree in physics and mathematics in 1975 from the Hebrew University and M.Sc. and Ph.D. degrees in computer science from the Weizmann Institute of Science in 1978 and 1984, respectively. Since 1983 he has been on the Computer Sciences Department faculty at the University of Wisconsin-Madison, where he is currently a Professor of computer sciences. Dr. Livny's research focuses on High Throughput Computing, Grid Computing, and data visualization environments. His recent work includes the Condor high throughput computing system, the DEVise data visualization and exploration environment, the ZOO experiment management framework, quality controlled lossy image compression, and data clustering.
E-mail: miron@cs.wisc.edu

Steven Tuecke is a Software Architect in the Distributed Systems Laboratory in the Mathematics and Computer Science Division at Argonne National Laboratory, and a Fellow with the Computation Institute at the University of Chicago. He plays a leadership role in many of Argonne's research and development projects in the area of high-performance, distributed, Grid computing, and directs the efforts of both Argonne staff and collaborators in the design and implementation of the Globus Toolkit. He is also the Co-director of the Global Grid Forum Security area. He received a B.A. in mathematics and computer science from St. Olaf College.
E-mail: tuecke@mcs.anl.gov
