Legion is a work in progress: our team will not finish Legion but will create an "open"
system that allows and actively encourages third-party development of applications, run-
time library implementations, and core system components.
When first starting a new system, you will need to initialize the HPC, Extra, and
Applications packages with the legion_init_HPC, legion_init_Extra, and
legion_init_Apps command-line tools.
• This restructuring has meant that you now need to download and install
OpenSSL on your own. Legion uses public key cryptography based on the RSA 2.0
algorithm, as implemented by OpenSSL. You will need to download OpenSSL
0.9.5 or higher from http://www.openssl.org. You'll need to untar, configure, and
compile it. Be sure that you set your $OPENSSL_INC and $OPENSSL_LIB
variables to the correct directory. Suggested values are:
(ksh or sh users)
(csh users)
• You can use the JobQueue, with the legion_nq, legion_manage_job, and
legion_manage_queue command-line tools, to start and monitor remote jobs.
• You can edit information about your user profile and security settings with
legion_configure_profile. You can modify the implicit parameter set for your
current session with legion_modify_parameters.
• Two new command-line tools, legion_skcc_set_class_vaults and
legion_skcc_set_defaults, let you set defaults for SKCC classes.
• The list of supported platforms has changed. We don't have a working binary for
the SGI Workstations/IRIX 6.5 n64 build although we're working on it. We are
dropping support for the x86/FreeBSD 4.2 platform, although we will consider
adding it back in if someone needs it. We may be adding a T3E platform in the
future. We are also not currently supporting Windows platforms. If you need any
of these platforms, please contact us at legion@virginia.edu.
• We've added simple K-copy classes (SKCC). This allows certain Legion objects
to use backup vaults to replicate their persistent state, in case their primary vault
crashes or is unavailable when an object needs to reactivate. This makes it easier
to tolerate host failures. There are four new commands associated with SKCC:
legion_set_backup_vaults, legion_synch_vaults, legion_set_worm, and
legion_unset_worm.
• We are now using OpenSSL to implement the RSA algorithm. Since the RSAREF
patent has expired, we can now export Legion abroad with full encryption.
• The 1.7 release now includes a set of GUIs for Windows 2000 machines. These
GUIs are collectively known as the Worldwide File Server (WWFS). The WWFS
is a discrete set of applications that you download and install on your Windows
machine. It connects your machine to an existing Legion net (such as NPACI-net)
and lets you work in your context space. The WWFS binary package includes
four GUIs to let you work in Legion context space and an FTP daemon, which
uses standard ftp protocols to transfer files between context space and any ftp
client (Legion credentials and full security are always managed by the daemon).
The binary package is available from Applied MetaComputing.
• For NPACI-net users, we've added a web-portal for running Amber on our Legion
web browser. The portal works on both IE 4 and Netscape Communicator, but for
best results we'd suggest you use IE.
• We've improved legion_run and legion_run_multi. We have added probe
objects, which allow you to check your runs while they are executing and move
files to and from the executing remote host(s). You can also start your jobs in
blocking or nonblocking mode. For more information, please see the updated
legion_run and legion_run_multi FAQs.
• We've added a new MPI tool, legion_mpi_probe. This tool allows you to check
your MPI runs.
• You can now use wildcards with legion_ls, legion_cp, and legion_rm. For
example, you could ask to remove all context names beginning with "Foo" by
entering:
$ legion_rm Foo\*
• Binding agents and the Legion library have been improved to cache more
information. Caching now includes object interfaces, context names, and contents.
Context information caching is only allowed for objects that export a
"context_contents_cacheble('YES')" attribute.
• Upon login, we now cache some high-use objects' bindings and high-use contexts'
LOIDs. These bindings may become stale, so we have added a new tool,
legion_refresh_local_cache, to refresh them on request. We advise refreshing
your cache if you notice a consistent delay of around thirty seconds before and
after commands respond.
• Finally, we've improved the I/O library and updated the communication system
(with a UDP communication-layer sliding window protocol) so that version 1.7 is
remarkably faster, more scalable, and more flexible.
• You can use wildcards with legion_mpi_run's -in/-IN and -out/-OUT flags to
name groups of files to be used as input and output files. The following wildcards
can be used with -in/-out and -IN/-OUT:
*     match 0 or more characters
?     match any one character
[-]   match any character listed between the brackets (use these to specify a
      range of characters)
\     treat the next character as a literal
For example, if you wanted to identify done.1, done.2, done.3 ... done.9 as
your inputs, you could use square brackets to identify them as a group:
$ legion_mpi_run -n 2 -IN done.[0-9] /mpi/programs/mpiFoo
• You can use wildcards on the command line or in an option file. They can only be
used with file names, however, not with directories.
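The wildcard rules above can be tried out with a short Python sketch. It uses Python's standard fnmatch module as a stand-in, on the assumption that Legion's matcher behaves like ordinary shell globbing for *, ?, and bracket ranges; the backslash escape is not modeled, because fnmatch does not support it. The helper select_files and the file list are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names, used only for illustration.
candidates = ["done.1", "done.2", "done.9", "done.10", "done.x", "todo.1"]

# Bracket range: exactly one digit after "done."
print(select_files("done.[0-9]", candidates))
# Question mark: exactly one character of any kind after "done."
print(select_files("done.?", candidates))
# Asterisk: zero or more characters after "done."
print(select_files("done.*", candidates))
```

Note that done.10 is matched only by the asterisk pattern, since the bracket and question-mark forms each stand for a single character.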
• The legion_native_mpi_run command now has a -debug flag.
• A new command, legion_make_hostlist, lets you create a host list for
legion_mpi_run.
For example,
Note that the <file name> does not need to match the <file path>: in this case,
the program will copy the contents of /home/my_files/foo_file to a local file
and assign it the name foo.
• The legion_link command has a -FC flag. This flag allows you to specify a
Fortran compiler.
• The legion_create_user command has new flags that allow you to specify a
new user id's password from the command line and to specify the new user's
home context space. The <user id> parameter is also now a full path, which can
be given as a relative or absolute path.
• We have added new flags to the legion_mpi_run command. The new flags,
-in/-out/-stdin/-stdout/-stderr, -IN/-OUT/-STDIN/-STDOUT/-STDERR, and
-a/-A, give you more control over input and output data for your mpi program.
They resemble the legion_run flags.
• There is a new -f flag for legion_add_host_account. This allows you to set up
a mapping file that lists all of your Unix-Legion account mappings for that PCD
host.
• There are new keywords available for legion_run_multi: you can now specify
stdout/stderr/stdin for local file space.
• Once you have started a PCD host object and (if necessary) the accompanying
vault, you must change the following file permissions on the node that is
actually running the PCD host:
o $LEGION_OPR should be set to 755
o $LEGION_OPR/LegionClass.config* should be set to 644
o $LEGION_OPR/BootstrapVaultOPR should be set to 777 (If your
bootstrap host is a PCD host)
o $LEGION_OPR/<vault_name>.OPA should be set to 777 (If the bootstrap
host is not a PCD host)
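The permission list above could be scripted along the following lines. This is a sketch only: the functions pcd_permission_plan and apply_plan are hypothetical helpers, not Legion tools, and the sketch assumes that $LEGION_OPR is passed in as an ordinary directory path and that the modes are exactly the ones listed above.

```python
import glob
import os


def pcd_permission_plan(legion_opr, bootstrap_is_pcd, vault_name=None):
    """Build (glob pattern, mode) pairs that mirror the list above."""
    plan = [
        (legion_opr, 0o755),
        (os.path.join(legion_opr, "LegionClass.config*"), 0o644),
    ]
    if bootstrap_is_pcd:
        plan.append((os.path.join(legion_opr, "BootstrapVaultOPR"), 0o777))
    elif vault_name is not None:
        plan.append((os.path.join(legion_opr, vault_name + ".OPA"), 0o777))
    return plan


def apply_plan(plan):
    """chmod every existing path matched by the plan and report the changes."""
    changed = []
    for pattern, mode in plan:
        for path in sorted(glob.glob(pattern)):
            os.chmod(path, mode)
            changed.append((path, mode))
    return changed


# Print the plan without touching the file system.
for pattern, mode in pcd_permission_plan("/path/to/OPR", bootstrap_is_pcd=True):
    print(oct(mode), pattern)
```

Calling apply_plan on the returned plan would perform the changes; printing the plan first makes it easy to review before anything is modified.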
• MPI is now integrated with vector create. This involved adding two new options
to the legion_mpi_run command, -hf and -HF.
• We've added two new command-line tools: legion_mkdir and legion_cd. These
two commands perform exactly the same functions as legion_context_create
and legion_set_context, respectively.
• The legion_ping tool now has a -timeout flag, which allows you to set a
timeout period for pinging a Legion object.
• The Legion libraries (libLegion1 and libLegion2) now use version numbers.
• The performance of passing messages in secure Legion systems has been greatly
improved.
• We have added two new platforms to the 1.6 release: a beta release of Windows
NT 4.0 and a FreeBSD 3.0 for x86 machines.
• We have also added tools for building virtual hosts. This allows you to run
programs on unsupported machines, such as a Cray T3E.
• The 1.6 release has tools to help debug and analyze Legion applications
(legion_record and legion_replay).
• We've added support for operating in environments that require Kerberos
authentication.
• We have added the legion_export_dir tool, which lets you link a local
directory to your context space.
• We have also added a checkpointing library for SPMD-style (Single Program
Multiple Data) applications to deal with MPI application failure.
• The legion_check_system tool has two new flags which will report errors in
command-line or context objects and then destroy the erring objects.
• We have updated the legion_run_multi tool, so that you can use keywords to
specify an input/output file's location.
• Host objects: We've put in a bug fix in the host object restart code, so that host
objects will restart more reliably. This is especially relevant for multi-host
systems, although it will apply to all Legion systems.
• MPI: 1.5.15 has MPI-2 conversion functions for C and Fortran interoperability. It
also has a bug fix for MPI_pack().
• Security: We've put in a bug fix for the legion_init_security command-line
utility. It should now run much more reliably.
As computer networks get larger, faster, and more powerful, they offer new
opportunities. Gigabit networks, connecting powerful high-performance machines and
workstations, have enormously powerful infrastructures that can solve complex problems
and distribute huge amounts of information. Linked together, these connected resources
make up a single, worldwide, virtual computer. We now need easy-to-use software that
can manage a complex physical system and support large degrees of parallelism so that a
virtual computer becomes a reliable, efficient, and real opportunity for a wide variety of
users.
As new requirements and new opportunities for distributed computing emerge and future
users make unforeseen demands on resources and software, the demands placed on a
virtual computer will evolve and grow. What works today or even tomorrow will soon be
worse than useless, and we strongly believe that Legion should be a flexible tool that can
adapt to new needs. Legion is therefore an open system, designed to encourage third
party development of new or updated applications, run-time library implementations, and
core components.
Legion sits on top of the user's operating system, acting as liaison between its own host(s)
and whatever other resources are required. The user isn't bogged down with time-
consuming negotiations with outside systems and system administrators, since Legion's
scheduling and security policies act on his or her behalf. Conversely, it can protect its
own resources against other Legion users, so that administrators can choose appropriate
policies for who uses which resources under what circumstances. To allow users to take
advantage of a wide range of possible resources, Legion offers a user-controlled naming
system called context space, so that users can easily create and use objects in farflung
systems. Users can also run applications written in multiple languages, since Legion
supports interoperability between objects written in multiple languages.
Objectives
• Site autonomy
Legion is not a monolithic system, but is composed of resources owned and
controlled by a variety of organizations. Since these organizations require control
over their own resources, Legion cannot dictate how much of a particular
resource can be used, when it can be used, or who can use it.
• Extensible core
We cannot predict all of the needs of current and future users. We must build
Legion with extensible and replaceable components that permit Legion to evolve
over time and allow users to construct their own mechanisms and policies.
• Scalable architecture
Legion cannot rely on a centralized structure. If the system is to eventually
encompass millions of hosts, it must use a scalable architecture.
• Easy-to-use, seamless, computational environment
We must mask the complexity of the hardware environment and the
communications synchronization of parallel processing. Users should not be
aware of machine boundaries. Compilers, in cooperation with run-time facilities,
should manage the environment.
• High-performance via parallelism
We must support easy-to-use parallel processing by means of large degrees of
parallelism (this includes task and data parallelism and their arbitrary
combinations).
• Single, persistent, name space
One of the most significant obstacles to wide-area parallel processing is the lack
of a single name space for file and data access. Existing multiple disjoint name
spaces make writing applications for multiple sites very difficult. Legion
therefore uses a single, persistent, name space.
• Security for users and resource owners
We cannot replace existing host operating systems (see below), but we can ensure
that existing mechanisms are not weakened by Legion. Legion does not define a
security policy or require a "trusted" Legion, but offers mechanisms for users to
manage their own security needs.
• Management and exploitation of resource heterogeneity
Legion must support interoperability between heterogeneous hardware and
software components, and exploit architectural strengths where possible when
making scheduling decisions and policy.
• Multiple language support and interoperability
Legion applications will be written in a variety of languages, and heterogeneous
source language application components must be integrated. We must also
support legacy codes.
• Fault tolerance
At any given moment in a large system, several hosts, communication links, and
disks will fail. Legion must be able to handle their failures and dynamic
reconfiguration.
Constraints
Legion Applications
Adapting and Running Applications
Legion aims to provide an easy-to-use environment in which users have access to all the
resources of the worldwide metacomputing environment for their applications. The four
application examples illustrated here show how the advanced features of Legion, such as
flexible security and transparent file access, can be used to extend today's applications on
to larger sets of resources.
Legion allows programs to execute transparently and securely on remote hosts, taking
advantage of Legion's distributed resources. Users can quickly and easily adapt their
programs to run in Legion, although modification may not even be necessary. Legion
also includes a remote "make" tool, which compiles binaries for other machine
architectures without requiring the user to log in to another machine.
Not all applications are best suited to simple remote execution, so we have extended this
capability to attack the class of problems known as parameter space searches. For
example, the NASA NAS effort demands thousands of combinations of CFD
computations, using a variety of wing designs, wing angles, and air speeds. A single
computation requires running five programs in sequence, each reading the output files
generated by the previous program. No individual program consumes significant CPU
time, but the total CPU time consumed by all runs adds up to tens of thousands of hours.
We have written a simple tool to facilitate running parameter space studies under Legion.
Users can specify input and output files, the maximum number of jobs to run on a
particular host, and the total number of jobs to be run at a time. Once the initial jobs have
started on each host, future jobs are sent to the hosts that have finished previous runs.
This dynamic scheduling allows for load balancing and faster processing time.
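The dispatch policy described above can be sketched as a small simulation. This is not the Legion tool itself: the function run_parameter_study, its arguments, and the host names are hypothetical, and job run times are supplied up front only so that the simulation is deterministic.

```python
import heapq


def run_parameter_study(job_durations, host_slots, max_total):
    """Simulate dynamic dispatch of independent jobs.

    job_durations: run time of each job, in arbitrary units
    host_slots:    {host name: maximum concurrent jobs on that host}
    max_total:     maximum number of jobs running at once overall
    Returns (assignment, makespan); assignment[i] is the host that ran job i.
    """
    pending = list(enumerate(job_durations))[::-1]  # pop() yields job 0 first
    free = dict(host_slots)
    running = []  # heap of (finish time, job index, host)
    assignment = {}
    now = 0.0

    def start_jobs(hosts):
        for host in hosts:
            while pending and free[host] > 0 and len(running) < max_total:
                job, duration = pending.pop()
                free[host] -= 1
                assignment[job] = host
                heapq.heappush(running, (now + duration, job, host))

    start_jobs(host_slots)  # initial jobs on every host
    while running:
        now, _, host = heapq.heappop(running)  # earliest job to finish
        free[host] += 1
        start_jobs([host])  # the host that just finished gets the next job
    return assignment, now


assignment, makespan = run_parameter_study(
    [4, 1, 1, 1, 1], {"hostA": 1, "hostB": 1}, max_total=2
)
print(assignment, makespan)
```

In this run hostA is tied up with one long job, so hostB picks up all four short jobs as it finishes each one. That is the load-balancing effect the text describes.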
Some existing parallel applications run more efficiently on more CPUs than are available
on a single machine. For example, a DSMC (Direct Simulation Monte-Carlo) code in use
at the University of Virginia to study vapor deposition onto surfaces has a low ratio of
communications to computation, so that it can run on a large number of widely separated
machines. This approach allows the solution of much larger problems, but can cause
difficulties with some conventional parallel tools. Most vendor-supplied versions of MPI,
a popular communications library, cannot run a single job across multiple supercomputers.
Successfully running this type of problem requires overcoming MPI's limitations as well
as transparent access to files and appropriate scheduling support. Legion provides these
capabilities today.
Another example of a wide-area parallel application that can benefit from Legion's
collection of metacomputing resources is neural network modeling. Some types of
models use relatively low amounts of communications and are limited only by the
amount of RAM or disk bandwidth of the machines that they run on. We have
parallelized such a model using the MPI interface of Legion, and hope soon to set a world
record for the largest neural network run.
Coupled Applications
The most difficult category of applications that Legion supports is coupled applications,
in which several initially distinct programs are glued together. One example we are
working with under the NPACI umbrella is a coupled system attempting to predict the
effect of the El Niño weather pattern on California. Global climate models do not have a
sufficiently fine resolution to accurately predict precipitation over the California
mountains, so we are coupling the global model with a regional model that more
accurately predicts area weather. However, the two models were written by different
scientific groups and in different languages. The global model is parallelized but the
regional model is not. Legion can couple these models in an easy-to-use fashion that can
be extended to include other models, such as an estuary model of San Diego Bay.
Legion Architecture
Legion's object-based system gives classes and metaclasses system-level responsibility
Philosophy
Legion users will require a wide range of services in many different dimensions,
including security, performance, and functionality. No single policy or static set of
policies will satisfy every user, so, whenever possible, users must be able to decide what
trade-offs are necessary and desirable. Several characteristics of the Legion architecture
reflect and support this philosophy.
The Model
Legion implements a three-level naming system. At the highest level, users refer to
objects using human-readable strings, called context names. Context objects map context
names to LOIDs (Legion object identifiers), which are location-independent identifiers
that include an RSA public key. Since they are location independent, LOIDs by
themselves are insufficient for communication; therefore, a LOID is mapped to an LOA
(Legion object address) for communication. An LOA is a physical address (or set of
addresses in the case of a replicated object) that contains sufficient information to allow
other objects to communicate with the object (e.g., an <IP address, port number> pair).
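The three naming levels can be pictured with a minimal Python sketch. The field layout of LOID and LOA, the context name, and the addresses are all assumptions made for illustration; the text only states that a LOID includes an RSA public key and that an LOA can be an <IP address, port number> pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LOID:
    """Location-independent identifier; this field layout is illustrative."""
    class_id: int
    instance_id: int
    public_key: bytes


@dataclass(frozen=True)
class LOA:
    """Physical address, here an (IP address, port number) pair."""
    ip: str
    port: int


# Level 1 to level 2: a context object maps context names to LOIDs.
context = {"/home/alice/results": LOID(7, 42, b"placeholder-public-key")}

# Level 2 to level 3: bindings map a LOID to one or more LOAs
# (more than one address for a replicated object).
bindings = {context["/home/alice/results"]: [LOA("192.0.2.10", 5000)]}


def resolve(context_name):
    """Follow context name -> LOID -> LOA(s)."""
    loid = context[context_name]
    return bindings[loid]


print(resolve("/home/alice/results"))
```

Because only the last mapping mentions a location, an object can move to another host by updating its binding while its context name and LOID stay the same.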
Legion will contain too many objects to simultaneously represent all of them as active
processes. Therefore, Legion requires a strategy for maintaining and managing the
representations of these objects on persistent storage. A Legion object can be in one of
two different states, active or inert. An inert object is represented by an OPR (object
persistent representation), which is a set of associated bytes that exists in stable storage
somewhere in the Legion system. The OPR contains state information that enables the
object to move to an active state. An active object runs as a process that is ready to accept
member function invocations; an active object's state is typically maintained in the
address space of the process (although this is not strictly necessary).
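The active and inert states can be illustrated with a toy example. Everything here is hypothetical (the Vault and Counter classes, the JSON encoding of the OPR, the object id); the only points taken from the text are that an OPR is a set of bytes in stable storage and that it holds enough state to reactivate the object.

```python
import json


class Vault:
    """Stand-in for stable storage: keeps one OPR (a byte string) per object."""

    def __init__(self):
        self._oprs = {}

    def store(self, object_id, opr):
        self._oprs[object_id] = opr

    def load(self, object_id):
        return self._oprs[object_id]


class Counter:
    """Toy object whose whole state is a single integer."""

    def __init__(self, value=0):
        self.value = value

    def increment(self):
        self.value += 1
        return self.value

    def to_opr(self):
        return json.dumps({"value": self.value}).encode("utf-8")

    @classmethod
    def from_opr(cls, opr):
        return cls(json.loads(opr.decode("utf-8"))["value"])


def deactivate(vault, object_id, obj):
    """Active -> inert: write the object's state to the vault as an OPR."""
    vault.store(object_id, obj.to_opr())


def activate(vault, object_id):
    """Inert -> active: rebuild a live object from its OPR."""
    return Counter.from_opr(vault.load(object_id))


vault = Vault()
counter = Counter()
counter.increment()
deactivate(vault, "counter-1", counter)
revived = activate(vault, "counter-1")
print(revived.increment())
```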
Core objects
Several core object types implement the basic system-level mechanisms required by all
Legion objects. Like classes and metaclasses, core objects are replaceable system
components; users (and in some cases resource controllers) can select or implement
appropriate core objects.
• Host objects: Host objects represent processors in Legion. One or more host
objects run on each computing resource that is included in Legion. Host objects
create and manage processes for active Legion objects. Classes invoke the
member functions on host objects in order to activate instances on the computing
resources that the hosts represent. Representing computing resources with Legion
objects abstracts the heterogeneity that results from different operating systems
having different mechanisms for creating processes. Further, it provides resource
owners with the ability to manage and control their resources as they see fit.
• Vault objects: Just as a host object represents computing resources and maintains
active Legion objects, a vault object represents persistent storage, but only for the
purpose of maintaining the state, in OPRs, of the inert Legion objects that the
vault object supports.
• Context objects: Context objects map context names to
LOIDs, allowing users to name objects with arbitrary high-level string names, and
enabling multiple disjoint name spaces to exist within Legion. All objects have a
current context and a root context, which define parts of the name space in which
context names are evaluated.
• Binding agents: Binding agents are Legion objects that map LOIDs to LOAs. A
<LOID, LOA> pair is called a binding. Binding agents can cache bindings and
organize themselves in hierarchies and software combining trees, in order to
implement the binding mechanism in a scalable and efficient manner.
• Implementation objects: Implementation objects allow other Legion objects to
run as processes in the system. An implementation object typically contains
machine code that is executed when a request to create or activate an object is
made; more specifically, an implementation object is generally maintained as an
executable file that a host object can execute when it receives a request to activate
or create an object. An implementation object (or the name of an implementation
object) is transferred from a class object to a host object to enable the host to
create processes with the appropriate characteristics.
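As an illustration of the binding agents described in the list above, the following sketch shows agents that cache bindings and forward misses up a hierarchy. The BindingAgent class, its methods, and the sample LOID string are hypothetical; real binding agents are Legion objects with their own interfaces.

```python
class BindingAgent:
    """Maps LOIDs to LOAs, caching answers and asking a parent on a miss."""

    def __init__(self, parent=None, authoritative=None):
        self.parent = parent
        self.authoritative = dict(authoritative or {})  # bindings known first-hand
        self.cache = {}
        self.lookups = 0  # number of requests this agent has handled

    def get_binding(self, loid):
        self.lookups += 1
        if loid in self.cache:
            return self.cache[loid]
        if loid in self.authoritative:
            loa = self.authoritative[loid]
        elif self.parent is not None:
            loa = self.parent.get_binding(loid)
        else:
            raise KeyError(loid)
        self.cache[loid] = loa
        return loa

    def refresh(self, loid):
        """Drop a possibly stale binding so the next lookup goes upstream."""
        self.cache.pop(loid, None)


root = BindingAgent(authoritative={"loid-1": ("192.0.2.10", 5000)})
leaf = BindingAgent(parent=root)

leaf.get_binding("loid-1")  # miss: forwarded to the root agent
leaf.get_binding("loid-1")  # hit: answered from the leaf's cache
print(root.lookups)
```

The second lookup never reaches the root, which is how caching at the lower levels of a hierarchy keeps load off the agents above them.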
Summary
Legion specifies functionality and interfaces, not implementations. Legion 1.4 provides
useful default implementations of class objects and of all the core system objects, but
users are never required to use our implementations. In particular, users can select (or
build their own) class objects, which are empowered by the object model to select or
implement system-level services. This feature of the system enables object services (e.g.
creation, scheduling, security) to be appropriate for the object types on which they
operate, and eliminates Legion's dependence on a single implementation for its success.
Legion can be used for parallel processing in a variety of application styles. It can
execute a single application across geographically separate hosts or support meta-
applications (e.g., schedule the components of a single meta-application on the
nodes of an MPP).
The vast majority of parallel applications today are written in MPI and PVM. Legion
supports both libraries, via emulation libraries that use the underlying Legion run-time
library. Existing applications only need to be recompiled and relinked in order to run on
Legion. MPI and PVM users can thus reap the benefits of Legion with existing
applications. In the future, libraries such as ScaLAPACK will also be supported.
We do not expect to provide the full range of languages and tools that users require:
instead of developing everything here at the University of Virginia, we anticipate Legion
becoming an open community artifact, to which other tools and languages are ported.
To support these third party developments, the complete run-time library is available.
User libraries can directly manipulate the run-time library.
Legion Scheduling
Application-level scheduling and total site autonomy
Philosophy
To paraphrase the 1992 Presidential election campaign, "It's the autonomy, stupid!"
Model
Legion presently provides two types of resources: hosts (computational resources) and
vaults (storage resources). We will incorporate network resources in the future. As seen
below, the Legion scheduling module consists of three major components: a resource
state information database, a module which computes request (object) mapping to
resources (hosts and vaults), and an activation agent responsible for implementing the
computed schedule. We call these items the Collection, Scheduler, and Enactor,
respectively.
The Collection interacts with resource objects to collect state information describing the
system (step 1). The Scheduler queries the Collection to determine a set of available
resources that match the Scheduler's requirements (step 2). After computing a schedule,
or set of desired schedules, the Scheduler passes a list of schedules to the Enactor for
implementation (step 3). The Enactor then makes reservations with the individual
resources (step 4), and reports the results to the Scheduler (step 5). Upon approval by the
Scheduler, the Enactor places objects on the hosts, and monitors their status (step 6).
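The six steps can be walked through in a compact Python sketch. The class and method names below are hypothetical and the resource model (free CPUs per host) is a simplification chosen for illustration; the sketch only mirrors the division of labor between Collection, Scheduler, and Enactor described above.

```python
class Host:
    def __init__(self, name, free_cpus):
        self.name, self.free_cpus, self.objects = name, free_cpus, []

    def reserve(self, cpus):
        if cpus <= self.free_cpus:
            self.free_cpus -= cpus
            return True
        return False


class Collection:
    """Resource state database."""

    def __init__(self):
        self.records = {}

    def update(self, host_name, free_cpus):
        self.records[host_name] = free_cpus

    def query(self, min_free_cpus):
        return sorted(h for h, free in self.records.items() if free >= min_free_cpus)


class Scheduler:
    def make_schedules(self, collection, objects, cpus_each):
        hosts = collection.query(cpus_each)  # step 2
        # One candidate schedule per rotation of the matching host list.
        return [
            {obj: hosts[(i + shift) % len(hosts)] for i, obj in enumerate(objects)}
            for shift in range(len(hosts))
        ]


class Enactor:
    def __init__(self, hosts):
        self.hosts = hosts  # {host name: Host}

    def reserve(self, schedules, cpus_each):
        """Step 4: return the first schedule that can be fully reserved, or None."""
        for schedule in schedules:
            granted = []
            for host_name in schedule.values():
                if not self.hosts[host_name].reserve(cpus_each):
                    break
                granted.append(host_name)
            if len(granted) == len(schedule):
                return schedule
            for host_name in granted:  # undo a partial reservation
                self.hosts[host_name].free_cpus += cpus_each
        return None

    def place(self, schedule):
        for obj, host_name in schedule.items():
            self.hosts[host_name].objects.append(obj)


def schedule_objects(hosts, objects, cpus_each):
    collection = Collection()
    for host in hosts.values():  # step 1
        collection.update(host.name, host.free_cpus)
    schedules = Scheduler().make_schedules(collection, objects, cpus_each)  # steps 2-3
    enactor = Enactor(hosts)
    reserved = enactor.reserve(schedules, cpus_each)  # steps 4-5
    if reserved is not None:  # the Scheduler approves the reported result
        enactor.place(reserved)  # step 6
    return reserved


hosts = {"h1": Host("h1", 1), "h2": Host("h2", 2)}
print(schedule_objects(hosts, ["a", "b", "c"], cpus_each=1))
```

Here the first candidate schedule fails at reservation time because h1 has only one free CPU, so the Enactor falls back to the second candidate. Passing a list of schedules rather than a single one is what makes that fallback possible.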
If the user does not wish to select or provide an external scheduler, the Legion system
(via the class mechanism) provides default scheduling behavior supplying general-
purpose support. Through the use of class defaults, sample schedulers, and application-
level schedulers, the user can balance the effort put into scheduling against the resulting
application performance gain.
Features in 1.4
Philosophy
Recognizing this fact, we have made security a part of the Legion design from the
beginning. There are five main requirements that must be satisfied:
• Do no harm. The installation of Legion at a site should not compromise that site's
security policies and goals. In general, Legion must not allow unauthorized access
to system resources, where resources can be broadly defined to range from user
files to root privileges.
• Adapt to local policies. In concert with the first requirement, Legion must be
configurable to the security needs of different organizations. Of course, more
stringent security constraints generally exact a price in performance and ease of
use.
• Provide an access control framework. All local resources are represented in
Legion as objects, and the fundamental Legion resource is the ability to call a
method on an object. Objects must have flexible access control mechanisms for
authorizing and denying method calls.
• Maintain and protect identities. Objects and users have identities that can be used
to independently authenticate and authorize one another. These identities are
represented through private keys and signed credentials of various types. In a
distributed object system, it is often necessary to delegate authority to other
objects. Legion should not only protect identities from theft and spoofing, but also
minimize the dispersion of authority that delegation causes.
• Protect communication. A Legion system may span public or semi-public
networks. Objects must be able to communicate with guaranteed integrity and
privacy as needed. Message replay must be detected and prevented.
Model
The security model for Legion differs significantly from that of conventional systems. A
Legion "system" is really a federation of resources from multiple administrative domains,
each with its own separately evaluated and enforced security policies. As such, there is no
central kernel or trusted code base that can monitor and control all interactions between
users and resources. Nor is there the concept of a superuser--no one person or entity
controls all of the resources in a Legion system.
Legion programs and objects run on top of host operating systems, in user space. They
are thus subject to the policies and administrative control of the local OS, and the Legion
objects running on a particular host must trust that host. However, there is no requirement
for Legion objects to trust other Legion objects. A critical aspect of Legion security is
that the security of the overall system does not rely on every host being trustworthy. A large
Legion system will include multiple trust domains, and even within one trust domain,
some of the hosts may be compromised or may even be malicious. For example, two
organizations might use Legion to share certain resources in specifically constrained
ways. Such sharing would clearly not be acceptable if one organization could subvert the
other's objects through its ownership of some part of a Legion system.
These aspects of Legion allow for considerable flexibility in the security policies
associated with various Legion objects, which in turn provides the foundation for
satisfying the first two security requirements. For example, local policy may require
Kerberos authentication, audit logs, resource usage accounting, encapsulation of critical
security functionality in small, easily vetted programs, etc. Any of these features can be
implemented without departing from the overall model and the minimal assumptions that
are made between Legion objects.
Access control for Legion objects first requires that the user determine the security policy
for an object by defining the object's rights and the method calls they allow. Access
control is then supported via a special member function called MayI present in every
object. MayI is Legion's traffic cop: All method calls to an object must first pass through
MayI before the target member function is invoked. Only if the caller has the appropriate
rights for the target method will MayI allow that method invocation to proceed. The
figure below shows a call from object A to object B.
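The MayI check can also be pictured in code. The sketch below is an assumption-laden toy, not Legion's implementation: callers are identified by plain strings rather than signed credentials, and the class names, the may_i and invoke methods, and the rights table are all hypothetical.

```python
class AccessDenied(Exception):
    pass


class LegionObject:
    def __init__(self, name):
        self.name = name
        # The object's policy: method name -> set of callers holding that right.
        self.rights = {}

    def grant(self, method, caller):
        self.rights.setdefault(method, set()).add(caller)

    def may_i(self, caller, method):
        """Return True only if the caller holds the right for this method."""
        return caller in self.rights.get(method, set())

    def invoke(self, caller, method, *args):
        """Every call passes through may_i before the target method runs."""
        if not self.may_i(caller, method):
            raise AccessDenied(f"{caller} may not call {method} on {self.name}")
        return getattr(self, method)(*args)


class FileObject(LegionObject):
    def __init__(self, name, data):
        super().__init__(name)
        self.data = data

    def read(self):
        return self.data

    def write(self, data):
        self.data = data


b = FileObject("B", "hello")
b.grant("read", "A")  # object A may read B but not write it

print(b.invoke("A", "read"))
try:
    b.invoke("A", "write", "tampered")
except AccessDenied as error:
    print("denied:", error)
```

Because the policy lives in the target object, each object's owner can choose who may call which methods without any central authority being involved.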
The means for establishing identity in Legion also address the requirement for protecting
communications between objects. Every Legion object has a public key pair; the public
key is part of the object's name. Objects can use the public key of a target object to
encrypt their communications to it. Likewise, an object's private key can be used to sign
messages, providing authentication and nonrepudiation. The integration of public keys
into object names allows Legion to avoid the need for a certification authority (although
such an authority is still useful for establishing user identities). If an intruder tries to
tamper with the public key of a known object, it will create a new name that is unknown.
The combined components of the security model encourage the creation of a large-scale
Legion system with multiple overlapping trust domains. Each domain can be separately
defined and controlled by the users it affects. When difficult problems arise such as
merging two trust domains, Legion provides a common and flexible context in which
they can be resolved.
Security Features