
The Legion Project

Key words: parallel processing, high performance, object-oriented, distributed
systems, metasystems, wide area, gigabit networks

Legion is an object-based metasystems software project at the University of Virginia.
From the project's beginning in late 1993, the Legion Research Group's goal has been a
highly usable, efficient, and scalable system founded on solid principles. We have been
guided by our own work in object-oriented parallel processing, distributed computing,
and security, as well as by decades of research in distributed computing systems. Our
system addresses key issues such as scalability, programming ease, fault tolerance,
security, site autonomy, etc. Legion is designed to support large degrees of parallelism in
application code and manage the complexities of the physical system for the user. The
first public release was made at Supercomputing '97, San Jose, California, on November
17, 1997.

Legion is a work in progress: our team will not finish Legion but will create an "open"
system that allows and actively encourages third-party development of applications, run-
time library implementations, and core system components.

1.8 Release Notes -- 6/20/01


The notable changes are:

• We've changed the software's directory structure. This change is relevant to
developers and system administrators, but not to users. It does not affect any tools
or context space. Please note that you may need to update paths in makefiles or
change library paths. The complete Legion package now consists of five
packages:
1. Core: This is the basic Legion package and the minimum for running a
Legion system. It lets you start up and shut down Legion, work in context
space, run Legion security, etc.
2. Software development kit (SDK): This contains development-oriented
tools and libraries, such as the stub generator, Legion Grid library,
LegionArray library, etc.
3. High-performance computing (HPC): The HPC module lets you run your
programs in Legion. It contains PVM and MPI tools, the two-dimensional
FileObject interfaces, JobProxy and JobQueue objects, batch queue class
and host object, and legion_run and legion_run_multi.
4. Extra: This adds functionality to the basic Legion package. It contains the
round robin scheduler, simple k-copy class (SKCC), process control
daemon host objects, etc. It is not necessary, but it gives you more control
over your objects.
5. Applications: The Apps package also extends the basic Legion package.

When first starting a new system, you will need to initialize the HPC, Extra, and
Applications packages with the legion_init_HPC, legion_init_Extra, and
legion_init_Apps command-line tools.

• This restructuring has meant that you now need to download and install
OpenSSL on your own. Legion uses public key cryptography based on the RSA 2.0
algorithm, as implemented by OpenSSL. You will need to download OpenSSL
0.9.5 or higher from http://www.openssl.org. You'll need to untar, configure, and
compile it. Be sure that you set your $OPENSSL_INC and $OPENSSL_LIB
variables to the correct directory. Suggested values are:

(ksh or sh users)

export OPENSSL_INC=<OpenSSL installation directory>/include
export OPENSSL_LIB=<OpenSSL installation directory>/lib

(csh users)

setenv OPENSSL_INC <OpenSSL installation directory>/include
setenv OPENSSL_LIB <OpenSSL installation directory>/lib

• You can use the JobQueue, with the legion_nq, legion_manage_job, and
legion_manage_queue command-line tools, to start and monitor remote jobs.
• You can edit information about your user profile and security settings with
legion_configure_profile. You can modify the implicit parameter set for your
current session with legion_modify_parameters.
• Two new command-line tools, legion_skcc_set_class_vaults and
legion_skcc_set_defaults, let you set defaults for SKCC classes.
• The list of supported platforms has changed. We don't have a working binary for
the SGI Workstations/IRIX 6.5 n64 build although we're working on it. We are
dropping support for the x86/FreeBSD 4.2 platform, although we will consider
adding it back in if someone needs it. We may be adding a T3E platform in the
future. We are also not currently supporting Windows platforms. If you need any
of these platforms, please contact us at legion@virginia.edu.

1.7 Release Notes -- 10/27/00


The notable changes are:

• We've added simple K-copy classes (SKCC). This allows certain Legion objects
to use backup vaults to replicate their persistent state, in case their primary vault
crashes or is unavailable when an object needs to reactivate. This makes it easier
to tolerate host failures. There are four new commands associated with SKCC:
legion_set_backup_vaults, legion_synch_vaults, legion_set_worm, and
legion_unset_worm.
• We are now using OpenSSL to implement the RSA algorithm. Since the RSAREF
patent has expired, we can now export Legion abroad with full encryption.
• The 1.7 release now includes a set of GUIs for Windows 2000 machines. These
GUIs are collectively known as the Worldwide File Server (WWFS). The WWFS
is a discrete set of applications that you download and install on your Windows
machine. It connects your machine to an existing Legion net (such as NPACI-net)
and lets you work in your context space. The WWFS binary package includes
four GUIs to let you work in Legion context space and an FTP daemon, which
uses standard ftp protocols to transfer files between context space and any ftp
client (Legion credentials and full security are always managed by the daemon).
The binary package is available from Applied MetaComputing.
• For NPACI-net users, we've added a web-portal for running Amber on our Legion
web browser. The portal works on both IE 4 and Netscape Communicator, but for
best results we'd suggest you use IE.
• We've improved legion_run and legion_run_multi. We have added probe
objects, which allow you to check your runs while they are executing and move
files to and from the executing remote host(s). You can also start your jobs in
blocking or nonblocking mode. For more information, please see the updated
legion_run and legion_run_multi FAQs.
• We've added a new MPI tool, legion_mpi_probe. This tool allows you to check
your MPI runs.
• You can now use wildcards with legion_ls, legion_cp, and legion_rm. For
example, you could ask to remove all context names beginning with "Foo" by
entering:

$ legion_rm Foo\*

Note that you need to escape the "*" character.


• We've added the ability to temporarily lock down individual objects or all of a
class's objects. This makes it easier to shut down a Legion system or perform
class and system maintenance or upgrades. The legion_deactivate_object and
legion_deactivate_instances commands have a new -stay_down flag, which
causes the object or instances to stay inactive after being successfully deactivated.
They can only be reactivated with legion_allow_activation.
• We've reworked the binding agents to improve system caching. We've also
changed the default configuration so that each host now has its own local binding
agent (either on or near the host). Objects that are started on a host with its own
binding agent will automatically use that binding agent. You can also choose to
use a specific binding agent during a login session. Once you've logged in, run the
legion_set_binding_agent tool to set or unset a binding agent for the session.

Binding agents and the Legion library have been improved to cache more
information. Caching now includes object interfaces, context names, and contents.
Context information caching is only allowed for objects that export a
"context_contents_cacheble('YES')" attribute.

Upon login, we now cache some high-use objects' bindings and high-use contexts'
LOIDs. These bindings may become stale, so we have added a new tool,
legion_refresh_local_cache, to refresh them on request. We advise refreshing
your cache if you notice a consistent delay of around thirty seconds before and
after commands respond.

• Finally, we've improved the I/O library and updated the communication system
(with a UDP communication-layer sliding window protocol) so that version 1.7 is
remarkably faster, more scalable, and more flexible.

1.6.6 Release Notes -- 8/4/00


The notable changes are:

• You can use wildcards with legion_mpi_run's -in/-IN and -out/-OUT flags to
name groups of files to be used as input and output files. The following wildcards
can be used with -in/-out and -IN/-OUT:
  *     match 0 or more characters
  ?     match any one character
  [-]   match any character listed between the brackets (use these to specify a
        range of characters)
  \     treat the character as a literal

For example, if you wanted to identify done.1, done.2, done.3 ... done.9 as
your inputs, you could use square brackets to identify them as a group:

$ legion_mpi_run -n 2 -IN done.[0-9] /mpi/programs/mpiFoo

• You can use wildcards on the command line or in an option file. They can only be
used with file names, however, not with directories.
• The legion_native_mpi_run command now has a -debug flag.
• A new command, legion_make_hostlist, lets you create a host list for
legion_mpi_run.
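
The wildcard rules listed above resemble ordinary shell-style globbing. As a rough illustration only, the sketch below uses Python's standard fnmatch module, not Legion's own matcher, and the candidate file names are invented; only the pattern done.[0-9] comes from the example above. Note that fnmatch does not treat a backslash as an escape character, so that rule is not modeled here.

```python
from fnmatch import fnmatchcase

# Hypothetical file names; only the pattern done.[0-9] comes from the text.
candidates = ["done.1", "done.2", "done.9", "done.10", "done.a", "undone.1"]

def select(pattern, names):
    """Return the names that match a shell-style pattern, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]

# [0-9] stands for exactly one digit, so done.10 and done.a are left out.
inputs = select("done.[0-9]", candidates)
print(inputs)
```

A bracket range matches a single character, which is why a name such as done.10 would need a second pattern (for example done.[0-9][0-9]) to be picked up.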

1.6.5 Release Notes -- 6/13/00


This release contains bug fixes and updates for 1.6.4, most notably:

• We have added three new commands for subcollections. A subcollection is a
collection that is attached to a parent collection. The parent collection can query a
subcollection for resource data. These commands are
legion_add_sub_collection, legion_remove_sub_collection, and
legion_list_sub_collections.
• You can adjust a collection's polling frequency by setting the
collection_update_frequency_secs attribute on the collection object (use
legion_update_attributes). The default is currently 300 seconds.
• There are several changes to the legion_run_multi command. The specification
file now takes -in/out/constant flags as well as -IN/OUT/CONSTANT. It also
uses pattern specification holders. Please see the man page for further
information. One change that is not mentioned in the man page: the format for
specifying files for the CONSTANT variable in legion_run_multi has changed.
The old format was:

CONSTANT <file path>

The new format is:

CONSTANT <file name> <file path>

For example,

CONSTANT foo /home/my_files/foo_file

Note that the <file name> does not need to match the <file path>: in this case,
the program will copy the contents of /home/my_files/foo_file to a local file
and assign it the name foo.

• The legion_link command has a -FC flag. This flag allows you to specify a
Fortran compiler.

1.6.4 Release Notes -- 3/21/00


This release contains several bug fixes and improvements. Primary points are:

• The legion_create_user command has new flags that allow you to specify a
new user id's password from the command line and to specify the new user's
home context space. The <user id> parameter is also now a full path, which can
be given as a relative or absolute path.
• We have added new flags to the legion_mpi_run command. The new flags,
-in/-out/-stdin/-stdout/-stderr, -IN/-OUT/-STDIN/-STDOUT/-STDERR, and
-a/-A, give you more control over input and output data for your mpi program.
They resemble the legion_run flags.
• There is a new -f flag for legion_add_host_account. This allows you to set up
a mapping file that lists all of your Unix-Legion account mappings for that PCD
host.
• There are new keywords available for legion_run_multi: you can now specify
stdout/stderr/stdin for local file space.
• After you have started a PCD host object and (if necessary) the accompanying
vault, you must change the following file permissions on the node that is actually
running the PCD host:
o $LEGION_OPR should be set to 755
o $LEGION_OPR/LegionClass.config* should be set to 644
o $LEGION_OPR/BootstrapVaultOPR should be set to 777 (If your
bootstrap host is a PCD host)
o $LEGION_OPR/<vault_name>.OPA should be set to 777 (If the bootstrap
host is not a PCD host)

These changes should be made by the Legion administrator.

• Object migration is now fully implemented.
• We have added a TCP version of the Legion communication layer.
• We have added intelligent switching between UDP and TCP communication based
on message size and destination.
• We have improved the robustness of HostObject by making it more tolerant of
implementation cache failures.

1.6.3 Release Notes -- 1/13/00


This is an upgrade of 1.6, and contains several bug fixes. The primary fixes are listed
below.

• There are several MPI-related changes:


o In a secure net, the legion_mpi_register command now puts a new MPI
application's context name into the /home/<user_name>/mpi context
instead of /mpi EXCEPT when the command is run by admin or a guest
user (i.e., a user who isn't logged in). In an insecure net the new
application's context name will continue to be placed in /mpi.
o The legion_mpi_run command's -p flag (which names a context to hold
PIDs) is more flexible. Previously, only currently existing contexts could
be used. The flag will now create a new context to hold PIDs.
o The MPI libraries have been renamed: previously they used the form
libmpi.a or -lmpi and now they use the form libLegionMPI.a or
-lLegionMPI.
• There are also several small changes to the command-line utilities. All commands
now have -debug and -help flags. A new command, legion_set_vault,
migrates a Legion object to a specific vault.

1.6.2 Release Notes -- 11/1/99


This is an upgrade of 1.6, and contains several bug fixes. The primary fixes are listed
below.

• MPI is now integrated with vector create. This involved adding two new options to
the legion_mpi_run command, -hf and -HF.
• We've added two new command-line tools: legion_mkdir and legion_cd. These
two commands perform exactly the same functions as legion_context_create
and legion_set_context, respectively.
• The legion_ping tool now has a -timeout flag, which allows you to set a
timeout period for pinging a Legion object.
• The Legion libraries (libLegion1 and libLegion2) now use version numbers.
• The performance of passing messages in secure Legion systems has been greatly
improved.

1.6 Release Notes -- 8/27/99

• We have added two new platforms to the 1.6 release: a beta release for Windows
NT 4.0 and a release for FreeBSD 3.0 on x86 machines.
• We have also added tools for building virtual hosts. This allows you to run
programs on unsupported machines, such as a Cray T3E.
• The 1.6 release has tools to help debug and analyze Legion applications
(legion_record and legion_replay).
• We've added support for operating in environments that require Kerberos
authentication.
• We have added the legion_export_dir tool, which lets you link a local
directory to your context space.
• We have also added a checkpointing library for SPMD-style (Single Program
Multiple Data) applications to deal with MPI application failure.
• The legion_check_system tool has two new flags which will report errors in
command-line or context objects and then destroy the erring objects.
• We have updated the legion_run_multi tool, so that you can use keywords to
specify an input/output file's location.

1.5.15 Release Notes -- 5/26/99


This is an upgrade of 1.5, and contains several bug fixes. The primary fixes are listed
below.

• Host objects: We've put in a bug fix in the host object restart code, so that host
objects will restart more reliably. This is especially relevant for multi-host
systems, although it will apply to all Legion systems.
• MPI: 1.5.15 has MPI-2 conversion functions for C and Fortran interoperability. It
also has a bug fix for MPI_pack().
• Security: We've put in a bug fix for the legion_init_security command-line
utility. It should now run much more reliably.

1.5 Release Notes -- 4/30/99


Version 1.5 includes an updated GUI, several new command-line tools, improved
resource management, and the ability to connect Legion systems together to form larger,
multi-domained systems. We have also added a batch queue host object for running
Legion on local queueing systems and a process control daemon (PCD) host object for
better control over process ownership. To improve parallel application performance
we've added two-dimensional file interfaces.

1.2 Release Notes -- 7/7/98


Version 1.2 offers support for remote execution of arbitrary programs, from either the
command line or from the GUI, using the new legion_run command. This includes the
ability to run serial programs with multiple input files and multiple executions, with the
legion_flogger command. We have also improved the GUI, which can be run from the
command line or from Windows95.

1.0 Release Notes -- 2/9/98


The February 9, 1998 release contains bug fixes of the previous release (December 16,
1997). No new features have been added.
Legion Overview
The wide-area virtual environment of the future

As computer networks get larger, faster, and more powerful, they offer new
opportunities. Gigabit networks, connecting powerful high-performance machines and
workstations, have enormously powerful infrastructures that can solve complex problems
and distribute huge amounts of information. Linked together, these connected resources
make up a single, worldwide, virtual computer. We now need easy-to-use software that
can manage a complex physical system and support large degrees of parallelism so that a
virtual computer becomes a reliable, efficient, and real opportunity for a wide variety of
users.

Legion, an object-based metasystems software project at the University of Virginia, is
designed for a system of millions of hosts and trillions of objects tied together with
high-speed links. Users working on their home machines see the illusion of a single computer,
with access to all kinds of data and physical resources, such as digital libraries, physical
simulations, cameras, linear accelerators, and video streams. Groups of users can
construct shared virtual work spaces, to collaborate on research and exchange information.
This abstraction springs from Legion's transparent scheduling, data management, fault
tolerance, site autonomy, and a wide range of security options.

As new requirements and new opportunities for distributed computing emerge and future
users make unforeseen demands on resources and software, the demands placed on a
virtual computer will evolve and grow. What works today or even tomorrow will soon be
worse than useless, and we strongly believe that Legion should be a flexible tool that can
adapt to new needs. Legion is therefore an open system, designed to encourage
third-party development of new or updated applications, run-time library implementations, and
core components.

Legion sits on top of the user's operating system, acting as liaison between its own host(s)
and whatever other resources are required. The user isn't bogged down with time-
consuming negotiations with outside systems and system administrators, since Legion's
scheduling and security policies act on his or her behalf. Conversely, it can protect its
own resources against other Legion users, so that administrators can choose appropriate
policies for who uses which resources under what circumstances. To allow users to take
advantage of a wide range of possible resources, Legion offers a user-controlled naming
system called context space, so that users can easily create and use objects in far-flung
systems. Users can also run applications written in multiple languages, since Legion
supports interoperability between objects written in multiple languages.

Legion Objectives and Constraints


There are ten design objectives listed here, and three constraints. They were laid out
before any Legion code was written and have been carefully considered at each stage.

Objectives

• Site autonomy
Legion is not a monolithic system, but is composed of resources owned and
controlled by a variety of organizations. Since these organizations require control
over their own resources, Legion cannot dictate how much of a particular
resource can be used, when it can be used, or who can use it.
• Extensible core
We cannot predict all of the needs of current and future users. We must build
Legion with extensible and replaceable components that permit Legion to evolve
over time and allow users to construct their own mechanisms and policies.
• Scalable architecture
Legion cannot rely on a centralized structure. If the system is to eventually
encompass millions of hosts, it must use a scalable architecture.
• Easy-to-use, seamless, computational environment
We must mask the complexity of the hardware environment and the
communications synchronization of parallel processing. Users should not be
aware of machine boundaries. Compilers, in cooperation with run-time facilities,
should manage the environment.
• High-performance via parallelism
We must support easy-to-use parallel processing by means of large degrees of
parallelism (this includes task and data parallelism and their arbitrary
combinations).
• Single, persistent, name space
One of the most significant obstacles to wide-area parallel processing is the lack
of a single name space for file and data access. The existing multiple disjoint name
spaces make writing applications for multiple sites very difficult. Legion
therefore uses a single, persistent, name space.
• Security for users and resource owners
We cannot replace existing host operating systems (see below), but we can ensure
that existing mechanisms are not weakened by Legion. Legion does not define a
security policy or require a "trusted" Legion, but offers mechanisms for users to
manage their own security needs.
• Management and exploitation of resource heterogeneity
Legion must support interoperability between heterogeneous hardware and
software components, as well as exploit architectural strengths where possible when
making scheduling decisions and policy.
• Multiple language support and interoperability
Legion applications will be written in a variety of languages, and heterogeneous
source language application components must be integrated. We must also
support legacy codes.
• Fault tolerance
At any given moment in a large system, several hosts, communication links, and
disks will fail. Legion must be able to handle their failures and dynamic
reconfiguration.

Constraints

• We cannot replace host operating systems
Organizations cannot allow their operating systems to be replaced. That would
require rewriting applications and retraining users, as well as raising compatibility
problems with other machines in the organization.
• We cannot legislate changes to the interconnection network
We must assume that network resources and protocols currently in use will not
change. While this means accommodating operating system heterogeneity, we
must accept the available resources.
• We cannot require that Legion run as "root"
To protect their resources, most users will want to run Legion with the fewest
possible privileges.

Legion Applications
Adapting and Running Applications
Legion aims to provide an easy-to-use environment in which users have access to all the
resources of the worldwide metacomputing environment for their applications. The four
application examples illustrated here show how the advanced features of Legion, such as
flexible security and transparent file access, can be used to extend today's applications
onto larger sets of resources.

Transparent Remote Execution

Legion allows programs to execute transparently and securely on remote hosts, taking
advantage of Legion's distributed resources. Users can quickly and easily adapt their
programs to run in Legion, although modification may not even be necessary. Legion
also includes a remote "make" tool, which compiles binaries for other machine
architectures without requiring the user to log in to another machine.

Parameter Space Studies

Not all applications are best suited to simple remote execution, so we have extended this
capability to attack the class of problems known as parameter space searches. For
example, the NASA NAS effort demands thousands of combinations of CFD
computations, using a variety of wing designs, wing angles, and air speeds. A single
computation requires running five programs in sequence, each reading the output files
generated by the previous program. No individual program consumes significant CPU
time, but the total CPU time consumed by all runs adds up to tens of thousands of hours.

We have written a simple tool to facilitate running parameter space studies under Legion.
Users can specify input and output files, the maximum number of jobs to run on a
particular host, and the total number of jobs to be run at a time. Once the initial jobs have
started on each host, future jobs are sent to the hosts that have finished previous runs.
This dynamic scheduling allows for load balancing and faster processing time.
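
The dispatch policy described above can be sketched as a small simulation. This is a minimal sketch and not the Legion tool itself: the function name schedule, the host names, and the job durations are all hypothetical, and both limits are assumed to be positive. It honors a per-host job limit and a cap on the total number of jobs running at once, and hands each waiting job to whichever host frees a slot first.

```python
import heapq

def schedule(job_durations, host_limits, max_running):
    """Simulate dynamic dispatch of jobs to hosts with free slots.

    job_durations: run time of each job, in arbitrary (hypothetical) units.
    host_limits:   dict of host name -> max concurrent jobs on that host.
    max_running:   cap on jobs running at once across all hosts.
    Returns a list of (job_index, host, start_time, end_time).
    """
    free = dict(host_limits)      # remaining slots per host
    running = []                  # min-heap of (end_time, job_index, host)
    placements = []
    now = 0
    for job, duration in enumerate(job_durations):
        # Wait for the earliest completion if no slot is free
        # or the global cap has been reached.
        while len(running) >= max_running or not any(free.values()):
            now, _, done_host = heapq.heappop(running)
            free[done_host] += 1
        host = max(free, key=free.get)   # host with the most free slots
        free[host] -= 1
        heapq.heappush(running, (now + duration, job, host))
        placements.append((job, host, now, now + duration))
    return placements

plan = schedule([4, 1, 3, 2, 2], {"hostA": 1, "hostB": 1}, max_running=2)
for job, host, start, end in plan:
    print(f"job {job} on {host}: {start} -> {end}")
```

Because a host that finishes early immediately receives the next job, faster or less loaded hosts end up doing more of the work, which is the load-balancing effect the paragraph above refers to.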

Wide-Area Parallel Applications

Some existing parallel applications run more efficiently on more CPUs than are available
on a single machine. For example, a DSMC (Direct Simulation Monte-Carlo) code in use
at the University of Virginia to study vapor deposition onto surfaces has a low ratio of
communications to computation, so that it can run on a large number of widely separated
machines. This approach allows the solution of much larger problems, but can cause
difficulties with some conventional parallel tools. Most vendor-supplied versions of MPI,
a popular communications library, cannot run a single job across multiple supercomputers.
Successfully running this type of problem requires overcoming MPI's limitations as well
as transparent access to files and appropriate scheduling support. Legion provides these
capabilities today.

Another example of a wide-area parallel application that can benefit from Legion's
collection of metacomputing resources is neural network modeling. Some types of
models use relatively low amounts of communications and are limited only by the
amount of RAM or disk bandwidth of the machines that they run on. We have
parallelized such a model using the MPI interface of Legion, and hope soon to set a world
record for the largest neural network run.

Coupled Applications

The most difficult category of applications that Legion supports is coupled applications,
in which several initially distinct programs are glued together. One example we are
working with under the NPACI umbrella is a coupled system attempting to predict the
effect of the El Niño weather pattern on California. Global climate models do not have a
sufficiently fine resolution to accurately predict precipitation over the California
mountains, so we are coupling the global model with a regional model that more
accurately predicts area weather. However, the two models were written by different
scientific groups and in different languages. The global model is parallelized but the
regional model is not. Legion can couple these models in an easy-to-use fashion that can
be extended to include other models, such as an estuary model of San Diego Bay.

Legion Architecture
Legion's object-based system gives classes and metaclasses system-level
responsibility

Philosophy

Legion users will require a wide range of services in many different dimensions,
including security, performance, and functionality. No single policy or static set of
policies will satisfy every user, so, whenever possible, users must be able to decide what
trade-offs are necessary and desirable. Several characteristics of the Legion architecture
reflect and support this philosophy.

• Everything is an object: The Legion system will consist of a variety of hardware
and software resources, each of which will be represented by a Legion object,
which is an active process that responds to member function invocations from
other objects in the system. Legion defines the message format and high-level
protocol for object interaction, but not the programming language or the
communications protocol.
• Classes manage their instances: Every Legion object is defined and managed by
its class object, which is itself an active Legion object. Class objects are given
system-level responsibility; classes create new instances, schedule them for
execution, activate and deactivate them, and provide information about their
current location to client objects that wish to communicate with them. In this
sense, classes are managers and policy makers, not just definers of instances.
Classes whose instances are themselves classes are called metaclasses.
• Users can provide their own classes: Legion allows users to define and build
their own class objects; therefore, Legion programmers can determine and even
change the system-level mechanisms that support their objects. Legion 1.4 (and
future Legion systems) contains default implementations of several useful types
of classes and metaclasses. Users will not be forced to use these implementations,
however, particularly if they do not meet the users' performance, security, or
functionality requirements.
• Core objects implement common services: Legion defines the interface and basic
functionality of a set of core object types that support basic system services, such
as naming and binding, and object creation, activation, deactivation, and deletion.
Core Legion objects provide the mechanisms that classes use to implement
policies appropriate for their instances. Examples of core objects include hosts,
vaults, contexts, binding agents, and implementations.

The Model

Legion objects are independent, logically address-space-disjoint active objects that
communicate with one another via non-blocking method calls that may be accepted in
any order by the called object. Each method has a signature that describes the parameters
and return value, if any, of the method. The complete set of method signatures for an
object fully describes that object's interface, which is determined by its class. Legion
class interfaces can be described in an interface description language (IDL), several of
which will be supported by Legion.
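
To make the idea of a non-blocking method call concrete, here is a minimal sketch using Python's standard concurrent.futures module. It is an analogy only: real Legion objects live in separate address spaces and exchange messages, whereas this toy ActiveObject class (a hypothetical name, as are invoke and the _do_ methods) uses one worker thread inside a single process. It also accepts calls strictly in submission order, which is a simplification of "accepted in any order".

```python
from concurrent.futures import ThreadPoolExecutor

class ActiveObject:
    """Toy stand-in for an active object with its own thread of control."""

    def __init__(self):
        # A single worker thread, so state is only touched by that thread.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._total = 0

    def invoke(self, method_name, *args):
        """Non-blocking call: returns a Future instead of waiting."""
        method = getattr(self, "_do_" + method_name)
        return self._executor.submit(method, *args)

    def _do_add(self, amount):
        self._total += amount
        return self._total

    def _do_get(self):
        return self._total

    def shutdown(self):
        self._executor.shutdown(wait=True)

counter = ActiveObject()
pending = [counter.invoke("add", n) for n in (1, 2, 3)]  # returns at once
final = counter.invoke("get")
print(final.result())   # the caller blocks only when it needs the value
counter.shutdown()
```

The caller keeps working after each invoke and synchronizes only at result(), which is the property that lets many invocations be in flight at the same time.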

Legion implements a three-level naming system. At the highest level, users refer to
objects using human-readable strings, called context names. Context objects map context
names to LOIDs (Legion object identifiers), which are location-independent identifiers
that include an RSA public key. Since they are location independent, LOIDs by
themselves are insufficient for communication; therefore, a LOID is mapped to an LOA
(Legion object address) for communication. An LOA is a physical address (or set of
addresses in the case of a replicated object) that contains sufficient information to allow
other objects to communicate with the object (e.g., an <IP address, port number> pair).
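
The three naming levels can be pictured as two lookups in sequence. The sketch below is illustrative only: the LOID fields (class_id, instance_id, public_key), the context name, and the addresses are assumptions made for the example and do not reflect Legion's actual identifier layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LOID:
    """Location-independent identifier; a real one includes an RSA public key."""
    class_id: int
    instance_id: int
    public_key: bytes = b""   # placeholder, not a real key

@dataclass(frozen=True)
class LOA:
    """Physical address: a single <IP address, port number> pair here."""
    ip: str
    port: int

# Level 1 -> 2: a context maps human-readable names to LOIDs.
context = {"/home/alice/results": LOID(7, 42)}
# Level 2 -> 3: bindings map LOIDs to the objects' current LOAs.
bindings = {LOID(7, 42): LOA("192.0.2.10", 5000)}

def resolve(context_name):
    """Follow context name -> LOID -> LOA."""
    loid = context[context_name]
    return bindings[loid]

print(resolve("/home/alice/results"))
# If the object moves, only the binding changes; name and LOID stay the same.
bindings[LOID(7, 42)] = LOA("192.0.2.20", 6000)
print(resolve("/home/alice/results"))
```

Keeping the LOID free of location information is what allows the second lookup to change without touching any context that names the object.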

Legion will contain too many objects to simultaneously represent all of them as active
processes. Therefore, Legion requires a strategy for maintaining and managing the
representations of these objects on persistent storage. A Legion object can be in one of
two different states, active or inert. An inert object is represented by an OPR (object
persistent representation), which is a set of associated bytes that exists in stable storage
somewhere in the Legion system. The OPR contains state information that enables the
object to move to an active state. An active object runs as a process that is ready to accept
member function invocations; an active object's state is typically maintained in the
address space of the process (although this is not strictly necessary).
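
The active and inert states can be sketched as a round trip through stable storage. This is a toy model under stated assumptions: the Vault and Counter classes, the JSON encoding of the OPR, and the identifier "counter-1" are all invented for illustration, and a real OPR is simply a set of bytes whose format is up to the object.

```python
import json

class Vault:
    """Toy persistent store: maps an object id to its OPR (a byte string)."""
    def __init__(self):
        self._oprs = {}

    def store(self, object_id, opr):
        self._oprs[object_id] = opr

    def load(self, object_id):
        return self._oprs[object_id]

class Counter:
    """Example object whose entire state is one integer."""
    def __init__(self, value=0):
        self.value = value

    def increment(self):
        self.value += 1

    def to_opr(self):
        return json.dumps({"value": self.value}).encode("utf-8")

    @classmethod
    def from_opr(cls, opr):
        return cls(**json.loads(opr.decode("utf-8")))

def deactivate(object_id, obj, vault):
    """Active -> inert: save the state; the caller then drops the live object."""
    vault.store(object_id, obj.to_opr())

def activate(object_id, cls, vault):
    """Inert -> active: rebuild a live object from its OPR."""
    return cls.from_opr(vault.load(object_id))

vault = Vault()
counter = Counter()
counter.increment()
counter.increment()
deactivate("counter-1", counter, vault)
del counter                                   # no live process remains
restored = activate("counter-1", Counter, vault)
print(restored.value)
```

Only the OPR has to survive while the object is inert, so the number of objects a system holds is bounded by storage rather than by the number of processes it can run.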

Core objects
Several core object types implement the basic system-level mechanisms required by all
Legion objects. Like classes and metaclasses, core objects are replaceable system
components; users (and in some cases resource controllers) can select or implement
appropriate core objects.

• Host objects: Host objects represent processors in Legion. One or more host
objects run on each computing resource that is included in Legion. Host objects
create and manage processes for active Legion objects. Classes invoke the
member functions on host objects in order to activate instances on the computing
resources that the hosts represent. Representing computing resources with Legion
objects abstracts the heterogeneity that results from different operating systems
having different mechanisms for creating processes. Further, it provides resource
owners with the ability to manage and control their resources as they see fit.
• Vault objects: Just as a host object represents computing resources and maintains
active Legion objects, a vault object represents persistent storage, but only for the
purpose of maintaining the state, in OPRs, of the inert Legion objects that the
vault object supports.
• Context objects: Context objects map context names to
LOIDs, allowing users to name objects with arbitrary high-level string names, and
enabling multiple disjoint name spaces to exist within Legion. All objects have a
current context and a root context, which define parts of the name space in which
context names are evaluated.
• Binding agents: Binding agents are Legion objects that map LOIDs to LOAs. A
<LOID, LOA> pair is called a binding. Binding agents can cache bindings and
organize themselves in hierarchies and software combining trees, in order to
implement the binding mechanism in a scalable and efficient manner.
• Implementation objects: Implementation objects allow other Legion objects to
run as processes in the system. An implementation object typically contains
machine code that is executed when a request to create or activate an object is
made; more specifically, an implementation object is generally maintained as an
executable file that a host object can execute when it receives a request to activate
or create an object. An implementation object (or the name of an implementation
object) is transferred from a class object to a host object to enable the host to
create processes with the appropriate characteristics.
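
As an illustration of how binding agents might cache bindings and arrange themselves in a hierarchy, here is a minimal sketch. The `BindingAgent` class, its `bind` method, and the lookup counter are assumptions made for this example; they do not reproduce Legion's interfaces, and software combining trees are not modeled.

```python
class BindingAgent:
    """Maps LOIDs to LOAs, asking a parent agent on a cache miss."""

    def __init__(self, parent=None, authoritative=None):
        self.parent = parent
        self.cache = {}
        self.authoritative = authoritative or {}
        self.lookups = 0  # how many requests reached this agent

    def bind(self, loid):
        self.lookups += 1
        if loid in self.cache:
            return self.cache[loid]
        if loid in self.authoritative:
            loa = self.authoritative[loid]
        elif self.parent is not None:
            loa = self.parent.bind(loid)  # climb the hierarchy
        else:
            raise KeyError(loid)
        self.cache[loid] = loa            # remember the <LOID, LOA> binding
        return loa


# Hypothetical two-level hierarchy with made-up identifiers.
root = BindingAgent(authoritative={"loid-1": ("192.0.2.10", 7000)})
leaf = BindingAgent(parent=root)

first = leaf.bind("loid-1")   # miss at the leaf, answered by the root
second = leaf.bind("loid-1")  # served from the leaf's cache
print(first, root.lookups, leaf.lookups)
```

Because the second request never reaches the root, load on the upper levels of the hierarchy grows more slowly than the number of requests.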
Summary

Legion specifies functionality and interfaces, not implementations. Legion 1.4 provides
useful default implementations of class objects and of all the core system objects, but
users are never required to use our implementations. In particular, users can select (or
build their own) class objects, which are empowered by the object model to select or
implement system-level services. This feature of the system enables object services (e.g.,
creation, scheduling, security) to be appropriate for the object types on which they
operate, and eliminates Legion's dependence on a single implementation for its success.

Legion High Performance


Parallelism and resource selection

Legion achieves high-performance computing by selecting resources based on load and
job affinity and through parallel processing.

• High performance via resource selection
Even single-task jobs can get better performance when presented with a range of
possible execution sites. The user can, for example, choose the host with the
lowest load or the greatest power. Power, in this context, might be determined by
performance on the SPEC benchmarks adjusted for load, or by using the
application itself as a benchmark. Either way, Legion's flexible resource
management scheme lets user-level scheduling agents choose the right resource.
• High performance via parallelism
Parallel processing has been around for some time, on both tightly coupled MPPs
and on workstation and PC clusters. Legion supports a distributed memory
parallel computing model, but since Legion's objects are often on different hosts,
perhaps thousands of miles apart, communication overhead can run from single
digit milliseconds to tens of milliseconds. The result is that Legion is not
appropriate for fine-grain parallel programs.

Legion can be used for parallel processing in a variety of application styles. It can
execute a single application across geographically separate hosts or support meta-
applications (e.g., schedule the components of a single meta-application on the
nodes of an MPP).
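
Returning to resource selection, a user-level scheduling agent that picks the least loaded or the most powerful host could look roughly like the sketch below. The `Host` record, the `effective_power` formula (benchmark score scaled by idle capacity), and the sample figures are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    load: float       # fraction of capacity in use, 0.0 to 1.0
    benchmark: float  # benchmark score; higher means faster


def effective_power(host):
    """Benchmark score adjusted for current load (one possible definition)."""
    return host.benchmark * (1.0 - host.load)


def pick_host(hosts, key=effective_power):
    """Choose the host that maximizes the given criterion."""
    return max(hosts, key=key)


# Made-up hosts and measurements.
hosts = [
    Host("alpha", load=0.9, benchmark=100.0),
    Host("beta", load=0.2, benchmark=40.0),
    Host("gamma", load=0.5, benchmark=80.0),
]

most_powerful = pick_host(hosts)
least_loaded = min(hosts, key=lambda h: h.load)
print(most_powerful.name, least_loaded.name)
```

The two criteria pick different hosts here, which is why the choice of policy is left to the user or to an application-specific agent.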

Legion supports parallel processing in four ways:

1. Supporting popular parallel libraries, such as MPI
2. Supporting parallel languages, such as MPL
3. Wrapping parallel components
4. Exporting the run-time library interface to library, toolkit, and compiler writers

Support of parallel libraries

The vast majority of parallel applications today are written in MPI and PVM. Legion
supports both libraries, via emulation libraries that use the underlying Legion run-time
library. Existing applications only need to be recompiled and relinked in order to run on
Legion. MPI and PVM users can thus reap the benefits of Legion with existing
applications. In the future, libraries such as ScaLAPACK will also be supported.

Parallel language support

Legion supports MPL (Mentat Programming Language) and BFS (Basic Fortran
Support). MPL is a parallel C++ language in which the user specifies those classes that
are computationally complex enough to warrant parallel execution. Class instances are
then used like C++ class instances: the compiler and run-time system take over and
construct parallel computation graphs of the program and then execute the methods in
parallel on different processors. Legion is written in MPL. BFS is a set of
pseudo-comments for Fortran and a preprocessor that gives the Fortran programmer access to
Legion objects. It also allows parallel execution via remote asynchronous procedure calls
and the construction of program graphs. HPF may also be supported in the future.

Wrap parallel components

Object wrapping is a time-honored tradition in the object-oriented world. We have
extended the notion of encapsulating existing legacy codes into objects by encapsulating
parallel components into objects. To other Legion objects the encapsulated object appears
sequential but it executes faster. PVM, HPF, and shared memory threaded applications
can thus be encapsulated into a Legion object.

Export the run-time library

We do not expect to provide the full range of languages and tools that users require:
instead of developing everything here at the University of Virginia, we anticipate Legion
becoming an open community artifact, to which other tools and languages are ported.
To support these third party developments, the complete run-time library is available.
User libraries can directly manipulate the run-time library.

The library is completely reconfigurable. It supports basic communication,
encryption/decryption, authentication, and exception detection and propagation, as well
as parallel program graphs. Program graphs represent functions and are first class and
recursive. Graph nodes are member function invocations on Legion objects or
sub-graphs. Arcs model data dependencies. Graphs may be annotated with arbitrary
information, such as resource requirements, architecture affinities, etc. The annotations
may be used by schedulers, fault-tolerance protocols, and other user-defined services.
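
A program graph of this kind can be sketched as a small dataflow structure. The example below is an assumption-laden illustration: the `ProgramGraph` class, its `add_node` and `run` methods, and the annotation keys are hypothetical, the graph is assumed to be acyclic, and nodes are evaluated sequentially rather than in parallel.

```python
class ProgramGraph:
    """Nodes are invocations; arcs are data dependencies between them."""

    def __init__(self):
        self.nodes = {}        # name -> (callable, names of input nodes)
        self.annotations = {}  # name -> arbitrary metadata

    def add_node(self, name, func, inputs=(), **annotations):
        self.nodes[name] = (func, tuple(inputs))
        self.annotations[name] = annotations

    def run(self):
        """Evaluate every node once, after the nodes it depends on."""
        results = {}

        def evaluate(name):
            if name not in results:
                func, inputs = self.nodes[name]
                results[name] = func(*(evaluate(i) for i in inputs))
            return results[name]

        for name in self.nodes:
            evaluate(name)
        return results


graph = ProgramGraph()
graph.add_node("a", lambda: 2)
graph.add_node("b", lambda: 3)
graph.add_node("sum", lambda x, y: x + y, inputs=("a", "b"), arch="any")
graph.add_node("scaled", lambda s, x: s * x, inputs=("sum", "a"), min_memory_mb=64)

results = graph.run()
print(results["scaled"])
```

Nodes "a" and "b" have no arc between them, so a scheduler would be free to run them concurrently; the annotations are carried along for such a scheduler to inspect. A sub-graph could be modeled by passing another graph's `run`-based callable as a node's function.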

Legion Scheduling
Application-level scheduling and total site autonomy

Philosophy

The Legion scheduling philosophy is one of reservation through a negotiation process
between resource providers and resource consumers. We view autonomy as the single
most crucial aspect of this process.

• Site autonomy is crucial in attracting resource providers. In particular,
participating sites must be assured that their local policies will be respected by the
system at large. Therefore, final authority over the use of a resource is placed with
the resource itself.
• User autonomy is crucial to achieving maximum performance. A single
scheduling policy will not be the best answer for all problems and programs:
rather, users should be able to choose between scheduling policies, and select the
one which best fits the problem at hand or, in the extreme, provide their own
schedulers. A special, and vitally important, case of user-provided schedulers is
that of application-level scheduling. This allows users to provide per-application
schedulers that are specially tailored to match the needs of the application.
Application-level schedulers will be commonplace in high-performance
computing domains.

To paraphrase the 1992 Presidential election campaign, "It's the autonomy, stupid!"

Model

Legion presently provides two types of resources: hosts (computational resources) and
vaults (storage resources). We will incorporate network resources in the future. As seen
below, the Legion scheduling module consists of three major components: a resource
state information database, a module which computes request (object) mapping to
resources (hosts and vaults), and an activation agent responsible for implementing the
computed schedule. We call these items the Collection, Scheduler, and Enactor,
respectively.

The Collection interacts with resource objects to collect state information describing the
system (step 1). The Scheduler queries the Collection to determine a set of available
resources that match the Scheduler's requirements (step 2). After computing a schedule,
or set of desired schedules, the Scheduler passes a list of schedules to the Enactor for
implementation (step 3). The Enactor then makes reservations with the individual
resources (step 4), and reports the results to the Scheduler (step 5). Upon approval by the
Scheduler, the Enactor places objects on the hosts, and monitors their status (step 6).
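
The six steps above can be sketched as follows. This is a simplified illustration, not Legion's interfaces: the class and function names (`HostResource`, `Collection`, `make_schedules`, `enact`), the single-object schedules, and the "most free slots first" ranking are all assumptions. Note that each host keeps the final say over a reservation.

```python
class HostResource:
    def __init__(self, name, free_slots, accepts=True):
        self.name = name
        self.free_slots = free_slots
        self.accepts = accepts  # local policy: the resource decides
        self.placed = []

    def describe(self):
        return {"name": self.name, "free_slots": self.free_slots}

    def reserve(self):
        if self.accepts and self.free_slots > 0:
            self.free_slots -= 1
            return True
        return False

    def release(self):
        self.free_slots += 1

    def start(self, obj):
        self.placed.append(obj)


class Collection:
    def __init__(self):
        self.records = {}

    def update(self, host):      # step 1: gather resource state
        self.records[host.name] = host.describe()

    def query(self, predicate):  # step 2: answer the Scheduler
        return [r for r in self.records.values() if predicate(r)]


def make_schedules(collection, obj):
    """Scheduler: rank candidate hosts and emit schedules (step 3)."""
    candidates = collection.query(lambda r: r["free_slots"] > 0)
    ranked = sorted(candidates, key=lambda r: r["free_slots"], reverse=True)
    return [(obj, r["name"]) for r in ranked]


def enact(schedules, hosts, approve=lambda mapping: True):
    """Enactor: reserve (4), report for approval (5), place (6)."""
    for obj, host_name in schedules:
        host = hosts[host_name]
        if not host.reserve():         # the resource may refuse
            continue
        if approve((obj, host_name)):
            host.start(obj)
            return host_name
        host.release()                 # schedule rejected; give the slot back
    return None


hosts = {
    "alpha": HostResource("alpha", free_slots=4, accepts=False),
    "beta": HostResource("beta", free_slots=2),
}
collection = Collection()
for h in hosts.values():
    collection.update(h)

schedules = make_schedules(collection, "job-1")
chosen = enact(schedules, hosts)
print(chosen)
```

Here "alpha" looks best to the Scheduler but declines the reservation, so the Enactor falls back to the next schedule in the list.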

If the user does not wish to select or provide an external scheduler, the Legion system
(via the class mechanism) provides default scheduling behavior supplying general-
purpose support. Through the use of class defaults, sample schedulers, and application-
level schedulers, the user can balance the effort put into scheduling against the resulting
application performance gain.

Features in 1.4

• Resource reservations for Hosts and Vaults.
• Collection objects providing resource information for schedulers, using data
collection agents that push information.
• Enactor objects to implement schedules, by obtaining resource reservations and
starting objects.
• Support for application-level, per-object schedulers.
• Per-class default external schedulers and placements (these may be overridden at
the user's behest).
• Intelligent scheduling for stateless objects, which balances the workload across
available hosts.
• A pull model for Collection data gathering will be added in future releases, as
well as additional monitoring support and sample schedulers.
Legion Security
Security is built into the system from the beginning

Philosophy

Legion is a software infrastructure that unites large collections of heterogeneous
computing resources into single, coherent systems. With Legion, users can access
scattered resources easily, share data and computing power, and build new meta-
applications running across the network. While these possibilities are attractive, users
will only adopt Legion if they feel confident that it will protect the privacy and integrity
of their existing resources as well as the new resources they create within Legion.
Without security, Legion systems can offer some limited uses. But for the full Legion
vision of large-scale metacomputers to become a reality, security is essential.

Recognizing this fact, we have made security a part of the Legion design from the
beginning. There are five main requirements that must be satisfied:

• Do no harm. The installation of Legion at a site should not compromise that site's
security policies and goals. In general, Legion must not allow unauthorized access
to system resources, where resources can be broadly defined to range from user
files to root privileges.
• Adapt to local policies. In concert with the first requirement, Legion must be
configurable to the security needs of different organizations. Of course, more
stringent security constraints generally exact a price in performance and ease of
use.
• Provide an access control framework. All local resources are represented in
Legion as objects, and the fundamental Legion resource is the ability to call a
method on an object. Objects must have flexible access control mechanisms for
authorizing and denying method calls.
• Maintain and protect identities. Objects and users have identities that can be used
to independently authenticate and authorize one another. These identities are
represented through private keys and signed credentials of various types. In a
distributed object system, it is often necessary to delegate authority to other
objects. Legion should not only protect identities from theft and spoofing, but also
minimize the dispersion of authority that delegation causes.
• Protect communication. A Legion system may span public or semi-public
networks. Objects must be able to communicate with guaranteed integrity and
privacy as needed. Message replay must be detected and prevented.

Model

The security model for Legion differs significantly from that of conventional systems. A
Legion "system" is really a federation of resources from multiple administrative domains,
each with its own separately evaluated and enforced security policies. As such, there is no
central kernel or trusted code base that can monitor and control all interactions between
users and resources. Nor is there the concept of a superuser--no one person or entity
controls all of the resources in a Legion system.

Legion programs and objects run on top of host operating systems, in user space. They
are thus subject to the policies and administrative control of the local OS, and the Legion
objects running on a particular host must trust that host. However, there is no requirement
for Legion objects to trust other Legion objects. A critical aspect of Legion security is
that the security of the overall system does not rely on every host being trustworthy. A large
Legion system will include multiple trust domains, and even within one trust domain,
some of the hosts may be compromised or may even be malicious. For example, two
organizations might use Legion to share certain resources in specifically constrained
ways. Such sharing would clearly not be acceptable if one organization could subvert the
other's objects through its ownership of some part of a Legion system.

These aspects of Legion allow for considerable flexibility in the security policies
associated with various Legion objects, which in turn provides the foundation for
satisfying the first two security requirements. For example, local policy may require
Kerberos authentication, audit logs, resource usage accounting, encapsulation of critical
security functionality in small, easily vetted programs, etc. Any of these features can be
implemented without departing from the overall model and the minimal assumptions that
are made between Legion objects.

Access control for Legion objects first requires that the user determine the security policy
for an object by defining the object's rights and the method calls they allow. Access
control is then supported via a special member function called MayI present in every
object. MayI is Legion's traffic cop: All method calls to an object must first pass through
MayI before the target member function is invoked. Only if the caller has the appropriate
rights for the target method will MayI allow that method invocation to proceed. The
figure below shows a call from object A to object B.

To make rights available to a potential caller, the owner of an object gives it an
unforgeable credential that lists the rights granted. When the caller invokes a method on
the object, it presents the appropriate credential to MayI, which then checks the scope and
authenticity of the credential. Alternatively, the owner of an object can semipermanently
assign a set of rights to a particular caller or group. MayI's responsibility is then to
confirm the identity of the caller and its membership in one of the allowed groups,
followed by comparing the rights authorized with the rights required for the method call.
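
The credential-based path through MayI can be sketched as below. This is a minimal illustration under stated assumptions: the `GuardedObject` class and its method names are hypothetical, and an HMAC from Python's standard library stands in for the signed credentials described above; it is not the mechanism Legion itself uses.

```python
import hashlib
import hmac


class GuardedObject:
    """Object whose methods are reachable only through may_i()."""

    def __init__(self, owner_secret, required_rights):
        self._secret = owner_secret
        self._required = required_rights  # method name -> right needed

    def _tag(self, rights):
        payload = ",".join(rights).encode()
        return hmac.new(self._secret, payload, hashlib.sha256).hexdigest()

    def issue_credential(self, rights):
        """Owner side: bind a list of rights to an unforgeable tag."""
        rights = sorted(rights)
        return {"rights": rights, "tag": self._tag(rights)}

    def may_i(self, method, credential):
        """Check authenticity first, then the scope of the rights."""
        expected = self._tag(credential["rights"])
        authentic = hmac.compare_digest(expected, credential["tag"])
        return authentic and self._required[method] in credential["rights"]

    def invoke(self, method, credential, *args):
        if not self.may_i(method, credential):
            raise PermissionError(method)
        return getattr(self, "_do_" + method)(*args)

    def _do_read(self):
        return "contents"

    def _do_write(self, data):
        return "wrote " + data


obj = GuardedObject(b"owner-secret", {"read": "r", "write": "w"})
cred = obj.issue_credential(["r"])

read_result = obj.invoke("read", cred)     # allowed: credential grants "r"
write_allowed = obj.may_i("write", cred)   # denied: "w" was never granted
forged = dict(cred, rights=["r", "w"])     # caller edits the rights list
forged_allowed = obj.may_i("write", forged)
print(read_result, write_allowed, forged_allowed)
```

Editing the rights list invalidates the tag, so the forged credential fails the authenticity check before its scope is even considered.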

The means for establishing identity in Legion also address the requirement for protecting
communications between objects. Every Legion object has a public/private key pair; the public
key is part of the object's name. Objects can use the public key of a target object to
encrypt their communications to it. Likewise, an object's private key can be used to sign
messages, providing authentication and nonrepudiation. The integration of public keys
into object names allows Legion to avoid the need for a certification authority (although
such an authority is still useful for establishing user identities). If an intruder tries to
tamper with the public key of a known object, it will create a new name that is unknown.

The combined components of the security model encourage the creation of a large-scale
Legion system with multiple overlapping trust domains. Each domain can be separately
defined and controlled by the users it affects. When difficult problems arise such as
merging two trust domains, Legion provides a common and flexible context in which
they can be resolved.

Security Features

• Public-key cryptography based on RSAREF 2.0.
• Three message-layer security modes: private (encrypted communication),
protected (fast digested communication with unforgeable secrets to ensure
authentic replies to message calls), and no security.
• Caching of secret keys for faster encryption of multiple messages between
communicating parties.
• Auto-encrypted bearer credentials with free-form rights.
• Propagation of security modes and certificates through calling trees (e.g., if a
caller demands encryption, all downstream calls will use it automatically).
• Drop-in addition of MayI functionality to existing objects.
• Persistent authentication objects that serve as the representation for users in a
trust domain.
• Secure Legion shell to allow users to log in to their authentication objects and
obtain associated credentials and environment information.
• Isolation and protection of objects using local OS accounts.
• Easily checked Process Control Daemon for granting limited OS privileges to
Legion Host Objects.
• Context space configured with access control for multiple users.
