Concept / Idea
Enrolling
A conscious decision on the part of a PC owner to sign up with a particular organization and allow the spare
computational cycles of the PC to be used by the selected project.
Upon enrollment, a small control program is downloaded to the PC. This program is responsible for
communicating with the central project server.
Harvesting
Use of the spare capacity of the machine by executing cause-related computations relayed by the central server.
Key Concepts
Resource Management
All Internet-based grids use passive resource management;
they rely on the enrolled PCs to initiate communication on a periodic basis.
This limits the degree to which the timeliness of results from such a grid can be predicted.
In addition, it limits the ability to reprioritize the computational behavior of the grid in a timely manner.
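A minimal sketch of why passive (pull-based) management limits timeliness, assuming each enrolled PC polls at a fixed interval. The function names and the polling intervals are hypothetical and only illustrate the reasoning above.

```python
def worst_case_pickup_delay(poll_interval_s: float) -> float:
    """Longest time new or reprioritized work can wait before one client asks for it."""
    # The server cannot push work; a change made just after a poll waits a full interval.
    return poll_interval_s


def mean_pickup_delay(poll_interval_s: float) -> float:
    """Average wait, assuming the change arrives at a uniformly random moment."""
    return poll_interval_s / 2.0


def time_until_all_clients_see_change(poll_intervals_s: list[float]) -> float:
    """Upper bound on when every enrolled PC has seen a priority change."""
    # Bounded by the slowest-polling client (ignoring PCs that are switched off).
    return max(poll_intervals_s)
```

With clients polling every 10 minutes, every hour, and once a day, a reprioritization is only guaranteed to reach the whole grid after a day, which is the limitation described above.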
Communication and Data Security
HTTP is the communication protocol between the PCs and the central server.
The data usually reside in an unencrypted format on the enrolled PC.
This limits the nature of the problems that can be attempted over the public Internet to those in which
compromise of the data is not a pressing issue.
In some cases, the answers produced on the enrolled PC may be vulnerable to tampering, causing the
confidence in the results to be lower than desired.
Machine Heterogeneity
Enrolled machines vary widely in CPU speed, RAM, hard-drive capacity, and operating system level.
The management infrastructure either needs to operate at the lowest common denominator or needs to be
aware of differences in the machines and assign tasks appropriately.
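The second option (being aware of machine differences and assigning tasks accordingly) can be sketched as follows. This is an illustrative matching rule, not the method of any particular grid product; the Machine and Task fields are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    cpu_mhz: int
    ram_mb: int
    disk_mb: int


@dataclass
class Task:
    name: str
    min_cpu_mhz: int
    min_ram_mb: int
    min_disk_mb: int


def can_run(machine: Machine, task: Task) -> bool:
    """True if the machine meets every minimum requirement of the task."""
    return (machine.cpu_mhz >= task.min_cpu_mhz
            and machine.ram_mb >= task.min_ram_mb
            and machine.disk_mb >= task.min_disk_mb)


def pick_machine(machines: list[Machine], task: Task) -> Optional[Machine]:
    """Return the least capable machine that still satisfies the task, or None."""
    eligible = [m for m in machines if can_run(m, task)]
    if not eligible:
        return None
    # Choosing the smallest adequate machine keeps larger ones free for larger tasks.
    return min(eligible, key=lambda m: (m.cpu_mhz, m.ram_mb, m.disk_mb))
```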
Resource Availability
The entire cause-computing paradigm relies on the idea of voluntary participation.
The availability and utility of any particular resource is subject to the whim of the person controlling the PC.
This adds a layer of unpredictability to the performance expectations that can be associated with such a grid.
Definition
Characteristics of Desktop Grid
A defined (named) collection of machines on a shared network, behind a single firewall, with all machines
running the Windows operating system.
A set of user-controlled policies describing the way in which each of these machines participates in the grid.
These policies should also support automated addition and removal of machines
A hub-and-spoke virtual network topology controlled by a dedicated, central server. The machines on the
grid are unaware of each other. More of a client-server architecture than a peer-to-peer architecture.
An actively managed mechanism for distribution, execution, and retrieval of work to and from the grid under
control of a central server
User Expectations
The user of the PC on the corporate desktop views it as a truly “personal” part of his work experience, much
like a telephone or a stapler.
It is often running many concurrent applications and needs to appear as if it is always and completely
available to serve that employee’s needs.
After a distributed computing component is deployed on an employee’s PC, that component will tend to be
blamed for every future fault that occurs—at least until the next new component or application is installed.
Security
The Desktop Grid must protect the integrity of the distributed computation.
Tampering with or disclosure of the application data and program must be prevented.
It must also protect the integrity of the underlying computing resources.
The Grid Client Executive must prevent distributed computing applications from accessing or modifying
data on the computing resources.
Application Level
The distributed application should run on the PC in an environment that is completely separate from the
PC’s normal operating environment.
The Grid Clients should receive executable programs only from Grid Servers, which should always be
authenticated.
System Level
The Grid Client Executive should prevent an application from using/misusing local or network resources.
Machine configuration, applications, and data should be unaffected.
Task Level
The Grid Client Executive must encrypt the entire work unit to protect the integrity of the application, the
input data, and the results.
Unobtrusiveness
Usage of desktop resources should be unobtrusive.
The Grid Client should cause no degradation in PC performance.
When the user’s tasks require any resources held by the Grid Client, the Grid Client Executive should yield
instantly and resume activity only as the resources become available again.
o The result is maximum utilization of resources with no loss of PC user productivity.
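A minimal sketch of a yield-and-resume rule of the kind described above: grid work stops the moment the user is active and resumes only after a quiet period. The class name, the resume delay, and the notion of a periodic "tick" are assumptions made for illustration.

```python
class YieldPolicy:
    """Decides, tick by tick, whether grid work may run on the PC."""

    def __init__(self, resume_after_s: float = 30.0):
        self.resume_after_s = resume_after_s
        # Start out as if the PC has already been quiet long enough.
        self.quiet_for_s = resume_after_s

    def update(self, user_active: bool, elapsed_s: float) -> bool:
        """Return True if grid work may run during this tick."""
        if user_active:
            self.quiet_for_s = 0.0  # yield instantly to the user
            return False
        self.quiet_for_s += elapsed_s
        # Resume only once the PC has been quiet for the full delay.
        return self.quiet_for_s >= self.resume_after_s
```

The delay before resuming is a design choice: a short delay harvests more cycles, a long one lowers the chance that the user ever notices the Grid Client.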
Robustness
The grid must complete computational jobs with minimal failures.
It must execute reliably, with fault tolerance, on heterogeneous resources that may be turned off or
disconnected during execution.
It should match and dispatch appropriately sized tasks to each machine.
Scalability
should be capable of scaling to tens of thousands of PCs to take advantage of the increased speed and
power a large grid can provide
should also scale downward, performing well even when the grid is limited in scope
Central Manageability
management capability that allows control of grid resources, application configuration, scheduling, and
software version management and upgrades
administrator should be able to manage all of the Grid Clients without requiring physical access to them
Management, queuing, and monitoring of the computational work should be easy and intuitive
Applications
Even the most technically advanced Desktop Grid deployment is of little use without applications that can execute on it.
Each application requires a Windows version of the application, all supporting files and environmental settings needed to
establish an execution environment for the application,
and appropriate licensing to permit multiple, concurrent copies of the application to be executed.
Applications fall along a continuum of “Application Suitability”.
Application Categories
Data Parallel
These applications process large input datasets in a sequential fashion with no application dependencies
between or among the records of the dataset.
Example: an application that examines a large file of integers (stored one integer per line) and counts the
number of items greater than a particular target value.
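The counting example can be sketched as below. The split into fixed-size work units and the function names are assumptions; in a real Desktop Grid each call to count_greater would run on a different Grid Client rather than in a local loop.

```python
def split_into_work_units(lines: list[str], unit_size: int) -> list[list[str]]:
    """Cut the input records into independent chunks of at most unit_size lines."""
    return [lines[i:i + unit_size] for i in range(0, len(lines), unit_size)]


def count_greater(lines: list[str], target: int) -> int:
    """Work-unit body: count integers (one per line) that exceed the target."""
    return sum(1 for line in lines if line.strip() and int(line) > target)


def grid_count_greater(lines: list[str], target: int, unit_size: int) -> int:
    """Split, process each unit independently, then add up the partial counts."""
    units = split_into_work_units(lines, unit_size)
    partial_counts = [count_greater(unit, target) for unit in units]
    return sum(partial_counts)
```

Because no record depends on any other, the partial counts can be computed in any order and simply summed.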
Parameter Sweep
These applications use an iterative approach to generate a multidimensional series of input values used to
evaluate a particular set of output functions
Example: an application that finds the maximum value of a function F(X,Y) over a specified range using an exhaustive
search approach that involves iteration of the parameters.
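A sketch of the exhaustive search as a parameter sweep, assuming a made-up function f whose maximum is 1.0 at (1, -2) and an evenly spaced grid of X and Y values. Each slice of X values stands for one work unit.

```python
from itertools import product


def f(x: float, y: float) -> float:
    """Hypothetical F(X,Y); its maximum is 1.0 at x = 1, y = -2."""
    return 1.0 - (x - 1.0) ** 2 - (y + 2.0) ** 2


def frange(lo: float, hi: float, steps: int) -> list[float]:
    """Evenly spaced values from lo to hi inclusive (steps must be at least 2)."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def sweep_unit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Work-unit body: evaluate every (x, y) pair and return (best value, x, y)."""
    return max((f(x, y), x, y) for x, y in product(xs, ys))


def grid_sweep(x_range, y_range, steps: int, n_units: int):
    """Deal the X values out to n_units work units, then keep the best result."""
    xs = frange(x_range[0], x_range[1], steps)
    ys = frange(y_range[0], y_range[1], steps)
    x_slices = [xs[i::n_units] for i in range(n_units)]
    return max(sweep_unit(s, ys) for s in x_slices if s)
```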
Probabilistic
These applications process a very large number of trials using randomized inputs (or other ab initio
processes) to generate input values used to evaluate a particular set of output functions
Example: an application that finds the maximum value of a function F(X,Y) over a specified range using a Monte Carlo
approach.
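The Monte Carlo variant can be sketched in the same way, again with a made-up function f (maximum 1.0 at (1, -2)). Giving each work unit its own seed is an assumption that keeps the units independent and the run repeatable.

```python
import random


def f(x: float, y: float) -> float:
    """Hypothetical F(X,Y); its maximum is 1.0 at x = 1, y = -2."""
    return 1.0 - (x - 1.0) ** 2 - (y + 2.0) ** 2


def monte_carlo_unit(seed: int, trials: int, x_range, y_range):
    """Work-unit body: sample random points and return (best value, x, y)."""
    rng = random.Random(seed)  # private generator, so units do not interfere
    best = None
    for _ in range(trials):
        x = rng.uniform(x_range[0], x_range[1])
        y = rng.uniform(y_range[0], y_range[1])
        candidate = (f(x, y), x, y)
        if best is None or candidate > best:
            best = candidate
    return best


def grid_monte_carlo(n_units: int, trials_per_unit: int, x_range, y_range):
    """Run n_units independent work units and keep the best sample overall."""
    return max(monte_carlo_unit(seed, trials_per_unit, x_range, y_range)
               for seed in range(n_units))
```

Unlike the exhaustive sweep, the result is only an estimate of the maximum; more trials or more work units tighten it.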
The Compute Intensity of a typical work unit reflects the time spent moving data to and
from the Desktop Grid Client relative to the time spent performing calculations on that data.
Example: a typical work unit executes in 15 minutes (900 seconds) on a hypothetical “average” grid client, consumes
2 MB (2,000 KB) of input data, and produces 0.4 MB (400 KB) of output data.
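A rough way to compare transfer time with compute time for the work unit above. The network bandwidth of 100 KB/s is an assumption (the notes give none), and the ratio computed here is only an illustration of the idea, not necessarily the exact Compute Intensity formula used in the reference.

```python
def transfer_seconds(input_kb: float, output_kb: float,
                     bandwidth_kb_per_s: float) -> float:
    """Time to move the input to the client and the output back."""
    return (input_kb + output_kb) / bandwidth_kb_per_s


def compute_to_transfer_ratio(compute_s: float, input_kb: float,
                              output_kb: float, bandwidth_kb_per_s: float) -> float:
    """Seconds of calculation per second of data movement (higher is better)."""
    return compute_s / transfer_seconds(input_kb, output_kb, bandwidth_kb_per_s)


def transfer_share_percent(compute_s: float, input_kb: float,
                           output_kb: float, bandwidth_kb_per_s: float) -> float:
    """Percentage of the total work-unit time spent moving data."""
    t = transfer_seconds(input_kb, output_kb, bandwidth_kb_per_s)
    return 100.0 * t / (t + compute_s)


# Work unit from the notes: 900 s of compute, 2,000 KB in, 400 KB out.
ASSUMED_BANDWIDTH_KB_PER_S = 100.0  # assumption, not from the notes
ratio = compute_to_transfer_ratio(900, 2000, 400, ASSUMED_BANDWIDTH_KB_PER_S)
share = transfer_share_percent(900, 2000, 400, ASSUMED_BANDWIDTH_KB_PER_S)
```

Under this assumed bandwidth the transfer takes 24 seconds, so the unit computes for 37.5 seconds per second of data movement and spends under 3% of its time on transfers.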
Computing Environment
A Desktop Grid can be considered as
o a lower-cost alternative to acquiring new, dedicated computing resources.
o a complementary addition to an existing infrastructure, in which problems with a Windows-based solution
can be executed on the Desktop Grid.
A partial list of environmental considerations includes:
o Archival and Cleanup—Of information regarding completed computing tasks and their results.
o Backup and Recovery—In the event of failure of a Grid Server
o Performance Monitoring and Tuning—Of the Grid Server and the Grid Clients
Culture
A number of cultural considerations will contribute to the success or failure of a Desktop Grid within any
particular organization.
Data Caching
The time needed to move data to and from the Grid Client enters into the calculation of Compute Intensity.
Advanced Desktop Grid systems provide various forms of data caching:
o data needed for a work unit can be placed in (or very close to) the Grid Client in advance of execution.
Caching can be either manually controlled or automatically administered (the Grid Server examines its queue of
work and ensures that any data needed for a work unit will be available at the Client).
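The automatically administered case can be sketched as a pass over the server's queue that lists which datasets must be staged on which clients. The queue layout (work unit, client, dataset) and the cache map are hypothetical structures chosen for the example.

```python
def plan_prefetch(queue: list[tuple[str, str, str]],
                  client_cache: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per client, the datasets that still need to be staged there.

    queue: (work_unit_id, client_name, dataset_name) entries awaiting execution.
    client_cache: datasets already present on each client.
    """
    plan: dict[str, set[str]] = {}
    for _work_unit, client, dataset in queue:
        if dataset not in client_cache.get(client, set()):
            # A set ensures a dataset shared by several work units is sent once.
            plan.setdefault(client, set()).add(dataset)
    return plan
```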
Security
Each function within the Grid Server user environment should include user-level security: which users may
add new applications, which users may submit jobs, which users may review job output, etc.
The Grid Server should also have its own security system for access to any of its components through direct methods.
System Interfaces
The Grid Server should support a variety of interfaces for its various user and administrative functions.
o a browser-based interface,
o a command-line interface (for scripting support),
o a Windows-based API (for invoking grid functionality from other Windows programs),
o an XML interface (as a general-purpose communication methodology).
Any system interfaces provided must also include a security protocol
Real-world Examples
Risk Management for Financial Derivatives - financial sector
Opportunity
A large North American brokerage organization used a series of interconnected Excel spreadsheets to
calculate various risk parameters
The concern was their ability to complete all the required calculations in the time window available.
Upper management also wanted to move the calculation windows from a once per day (batch) paradigm to
one in which complex risk-management calculations could be executed on a near-realtime basis
Desktop Grid Solution
Departmental grid: all of the PCs in the Risk Management group participated in a small-scale grid.
The application was grid-enabled with a combination of parameter sweep and probabilistic techniques.
A PC grid with fewer than 25 PCs was more than sufficient.
Architectural Rendering
Opportunity
challenge of generating realistic renderings of increasingly complex environments in a timely manner
Desktop Grid Solution
turning every designer workstation and each of the dedicated rendering PCs into a Grid Client
Clusters
HPC clusters use Smart System Software (SSS) to virtualize independent operating-system instances to
provide an HPC service.
SSS allows a number of distinct systems to appear as one—even though each runs its own instance of the
operating system.
there are two possibilities for SSS.
the Single System Image (SSI) is SSS that involves kernel modification.
the Single System Environment (SSE) is SSS that runs in user space as a layered service.
To varying degrees, these solutions enable computing for capacity (i.e., throughput of serial, parametric,
and embarrassingly parallel applications) and/or capability (i.e., multithreaded and distributed-memory
parallel applications).
A layered view of SSS opposite the operating system and end-user applications
This dynamic-load-state capability has significant implications, as task-placement advice is provided directly
to the application on dispatch for execution.
A remote-execution service is required to allow
o Authenticated communications over a network
o A high degree of transparency in maintaining the user’s execution environment
o Task control with respect to limits, signal passing, etc
Task-tracking mechanisms are required:
o a unique identifier for application control; cluster-wide identifiers can be used in monitoring,
manipulating, reporting, and accounting contexts.
A policy center is employed to manage all resources (e.g., jobs, hosts, users, queues, external events).
Through the use of a scheduler—and subject to predefined policies—demands for resources are mapped
against the supply for the same in order to facilitate specific activities
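A toy scheduler showing demand being mapped against supply under a policy. The job fields, the slot model, and the single policy (a cap on running jobs per user) are assumptions for illustration and do not reflect the behavior of any specific workload manager.

```python
from collections import Counter


def schedule(jobs: list[dict], host_slots: dict[str, int],
             max_running_per_user: int) -> dict[str, str]:
    """Place jobs on hosts, highest priority first; return {job id: host name}."""
    free = dict(host_slots)       # supply: free slots per host
    running = Counter()           # jobs already placed, per user
    placement: dict[str, str] = {}
    for job in sorted(jobs, key=lambda j: -j["priority"]):
        # Policy check: a user may not exceed the per-user limit.
        if running[job["user"]] >= max_running_per_user:
            continue
        # Supply check: first host with enough free slots.
        host = next((h for h, s in free.items() if s >= job["slots"]), None)
        if host is None:
            continue  # job stays queued
        free[host] -= job["slots"]
        running[job["user"]] += 1
        placement[job["id"]] = host
    return placement
```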
Industry Examples
Three application areas can be identified:
Capacity HPC
o serial-processing requirements,
o highly focused on throughput.
o individual tasks are loosely related, and a parametric processing approach can be applied.
o Compute farms based on SSE are of common use in this case.
Capability HPC
o parallel processing requirements
o focused more on the resources required to address challenging problems
o Traditional Grand Challenge problems fall into this category
o Solutions based on either SSI or SSE are common in this case.
Hybrid HPC.
o Capacity and capability tend to be idealized end points.
o Real-world examples involve some combination of capacity and capability requirements.
o SSE handles the mixed workloads found in this case
Bioinformatics
Five steps in genome sequence analysis via Platform ActiveCluster
Split sequences of interest into M files
Divide reference database of interest into N pieces
Dispatch NxM jobs to desktops
Collect NxM results
Merge and sort results
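The five steps can be sketched as a small pipeline. The search function here is a stand-in (plain substring matching) for a real sequence-comparison tool, and the round-robin split is an assumption; only the N x M job structure comes from the notes.

```python
from itertools import product


def split(items: list[str], parts: int) -> list[list[str]]:
    """Deal items round-robin into the requested number of pieces."""
    return [items[i::parts] for i in range(parts)]


def search(queries: list[str], db_piece: list[str]) -> list[tuple[str, str]]:
    """Stand-in for one job: report (query, record) pairs where the query occurs."""
    return [(q, rec) for q in queries for rec in db_piece if q in rec]


def run_pipeline(sequences: list[str], database: list[str], m: int, n: int):
    query_files = split(sequences, m)                    # 1. split sequences into M files
    db_pieces = split(database, n)                       # 2. divide database into N pieces
    jobs = list(product(query_files, db_pieces))         # 3. N x M jobs to dispatch
    results = [search(q, d) for q, d in jobs]            # 4. collect N x M results
    merged = sorted(hit for r in results for hit in r)   # 5. merge and sort
    return len(jobs), merged
```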
Industrial Manufacturing
Computer-aided engineering (CAE)
o obtaining high-fidelity results in a cost-effective fashion
Computational fluid dynamics (CFD)
Fluent, Inc. provides software technology for fluid-flow simulations in three dimensions,
with an MPI implementation of the solver that runs across a cluster of interconnected workstations.
Ford Motor Company’s enterprise VISTEON implemented an MPI version of Fluent across a cluster of
Hewlett-Packard workstations
Cluster Grids
Based on their success with clusters, customers in EDA and industrial manufacturing sought to take
virtualization to the next level.
The resulting federation of clusters had to:
Allow for selective sharing of resources
Preserve a high degree of local autonomy
Address availability on a cluster-by-cluster basis
Span geographic boundaries within the organization
Platform MultiCluster
Platform MultiCluster was introduced in 1996 to federate clusters based on Platform LSF.
Platform MultiCluster addresses the identified requirements, and is deployed in one of three
submission-execution topologies.
it’s clear that an HPC cluster is not a grid.
A grid needs to involve more than one cluster, address collaborative requirements, and span distance.
Three-point grid checklist. A grid:
(1) coordinates resources that are not subject to centralized control, using
(2) standard, open, general-purpose protocols and interfaces, to
(3) deliver nontrivial qualities of service.
Clusters Are Not Grids
Centrally controlled, but not distributed geographically
o In HPC clusters, BPROC and the master of a Platform LSF cluster are points of central control
o clusters tend to be tightly coupled architectures
Often built around proprietary technologies
o even in clustering for HPC via SSE, proprietary protocols and interfaces remain in use
o more pronounced in the case of clustering for High Availability and transactional processing
Delivering non-trivial QoS
o locality of concern—a tightly coupled LAN versus a geographically dispersed WAN environment
HPC Grids
Scientific insight
Five steps to scientific insight
Serial Applications
Most scientific problems are implemented initially as serial applications
These problems require that each step of the scientific calculation be performed in sequence.
Serial applications can be executed on compute architectures ranging from isolated desktops, servers, or
supercomputers to compute farms.
Compute farms are loosely coupled compute architectures in which system software is used to virtualize
compute servers into a single system environment (SSE).
Service Applications
focus is networking itself or Web Services
MPI applications would require tightly coupled architectures
networking applications can be applied in a variety of contexts,
loosely coupled architectures can be used in the instantiation of Web Services
HPC Grids
production HPC can be reinvented through the use of an integrated development and run-time environment
in which workload management system software plays a key role
Consider the integrated production HPC solution stack shown below
Data Grids
Characteristics of grid resources
numerous,
owned and managed by different organizations and individuals,
potentially faulty,
different security requirements and policies,
heterogeneous,
connected by heterogeneous, multilevel networks,
different resource management policies, separated geographically
A grid enables users to
Find and share data: be able to access it like data on their own system
Find and share applications
Share computing resources
Data Grids
A data grid provides transparent, secure, high-performance access to federated data sets across administrative
domains and organizations.
Data grids are used to provide secure access to remote data resources:
flat-file data,
relational data
streaming data
Examples
two collaborators at sites A and B need to share the results of a computation performed at site A,
design data for a new part needs to be accessible by multiple team members working on a new product at
different sites—and in different companies.
Secure Copy—scp/sftp
SCP/SFTP belong to the ssh family of tools.
SCP is basically a secure version of the Unix rcp command that can copy files to and from remote sites,
sftp is a secure version of ftp.
Both are command-line tools.
The syntax for scp resembles the standard Unix cp command, with a provision for naming a remote machine
and a user on it.
The syntax and usage of sftp resemble those of ftp.
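To make the cp-like syntax concrete, the sketch below builds scp argument lists of the form scp user@host:path destination. The helper names, user, host, and paths are hypothetical; the commands are only constructed, not executed.

```python
def scp_download_command(user: str, host: str,
                         remote_path: str, local_path: str) -> list[str]:
    """Build: scp user@host:remote_path local_path"""
    return ["scp", f"{user}@{host}:{remote_path}", local_path]


def scp_upload_command(user: str, host: str,
                       local_path: str, remote_path: str) -> list[str]:
    """Build: scp local_path user@host:remote_path"""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


# Hypothetical example: fetch results computed at site A.
cmd = scp_download_command("alice", "site-a.example.org",
                           "/results/run42.dat", "run42.dat")
```

A list like cmd could be handed to subprocess.run on a machine where scp is installed; as with cp, the source comes first and the destination last.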
Advantages
Their usage is similar to existing tools.
Passwords and data transfers are encrypted, and therefore secure.
Disadvantages
these tools must be installed specifically on the machines on which they will be used.
Installations for Windows are hard to come by.
scp/sftp do not solve several of the problems with ftp
Applications cannot take advantage of remote files using scp/sftp without significant modification
GridFTP
GridFTP is a tool for transferring files
It is built on top of the Globus Toolkit
It characterizes the Globus “sum of services” approach for a grid architecture.
Collaborators could use GridFTP to transfer files from one machine to another, much as they would use ftp.
Both parties must install the Globus Toolkit in order to use this service.
Advantages
Solves the privacy and integrity problems of ftp by encrypting passwords and data.
provides for high-performance, concurrent accesses by design.
An API enables accessing files programmatically
Data can be accessed in a variety of ways—for example, blocked and striped.
Part or all of a data file may be accessed, thus removing the all-or-nothing disadvantage of ftp.
Disadvantages
does not address the identity space problems with ftp.
Andrew File System (AFS)
Advantages
Users would not require accounts on each other’s machines.
They could control each other’s access to their cell using the fine-grained permissions provided by AFS.
AFS avoids the consistency problems with other approaches using copy-on-open semantics
AFS supports intelligent caching mechanisms for performance.
Because access to an AFS file system is almost identical to accessing a Unix file system, users have to learn few
new commands,
and legacy applications can run almost unchanged.
AFS implements strong security features.
o All data are encrypted in transit.
o Authentication uses Kerberos, and access control lists are supported.
Disadvantages
Use of Kerberos
All sites and organizations that want to connect using AFS must themselves use Kerberos authentication,
and all of the Kerberos realms must trust each other.
For many organizations this means changing the authentication mechanism in use.
o This is a non-trivial (and typically politically very difficult) step to accomplish.
The Kerberos security credentials time out eventually.
o Long-running applications must be changed to renew credentials using Kerberos’s API.
AFS requires that
all parties migrate to the same file system
o which would probably be a significant burden on them and their organizations
Grid Servers
A grid server performs the following grid-related tasks
Domain Creation
This grid server is also called a grid domain controller (GDC).
The GDC creates and defines a domain
A domain represents a single grid.
Every domain has exactly one GDC.
Multiple domains may be interconnected
Authentication
This grid server is responsible for receiving the user name and password and either
verifying the user’s identity using a built-in grid authentication service, or
delegating this process to a third-party authentication service such as NIS, Active Directory, or Netegrity.
Access Control
When a user requests access to an object, the grid server uses the user’s credentials to retrieve the user’s identity and then checks the access controls on the object to
determine whether the requested access is permissible.
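The two-step check (credentials to identity, then identity against the object's access controls) can be sketched as below. The dictionaries standing in for the identity store and the access control lists are assumptions made for the example.

```python
def is_permitted(credentials: str,
                 identities: dict[str, str],
                 acl: dict[str, dict[str, set[str]]],
                 obj: str, access: str) -> bool:
    """Return True if the holder of the credentials may perform `access` on `obj`."""
    # Step 1: resolve the credentials to an identity.
    user = identities.get(credentials)
    if user is None:
        return False  # unknown credentials are always refused
    # Step 2: check the object's access controls for that identity.
    return access in acl.get(obj, {}).get(user, set())
```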
Meta-data management
Every object in a grid has meta-data associated with it
o creation time, ownership information, modification time, etc
A grid server is responsible for storing the meta-data in an internal database,
performing searches on it when requested,
rehashing the information when it becomes stale
Monitoring
Monitoring typically involves determining the response time of other components to ping messages
Searching
Share Servers
responsible for bulk data transfer to and from a local disk on a machine
always associated with a grid server
The grid server is responsible for verifying whether a given read/write request is permissible or not.
If the request is permitted, the grid server passes a handle to the user as well as the share server.
o The user’s request is then forwarded to the share server along with this handle.
Subsequent requests are satisfied by the share server without the intervention of the grid server
A share server performs the actual bulk data transfers; the grid server performs grid-related tasks for it.
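The handle exchange between grid server and share server might look like the sketch below. The class and method names, the in-memory file table, and the random handle format are all assumptions; the notes describe only the division of labor, not an actual interface.

```python
import secrets


class ShareServer:
    """Performs the bulk data transfers for handles it has been told about."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files
        self.handles: dict[str, tuple[str, str]] = {}

    def register_handle(self, handle: str, path: str, mode: str) -> None:
        self.handles[handle] = (path, mode)

    def read(self, handle: str) -> bytes:
        path, mode = self.handles[handle]  # KeyError for an unknown handle
        if mode != "r":
            raise PermissionError("handle was not issued for reading")
        return self.files[path]


class GridServer:
    """Verifies each request once, then passes a handle to user and share server."""

    def __init__(self, share_server: ShareServer,
                 permissions: dict[tuple[str, str], set[str]]):
        self.share_server = share_server
        self.permissions = permissions

    def open(self, user: str, path: str, mode: str) -> str:
        if mode not in self.permissions.get((user, path), set()):
            raise PermissionError(f"{user} may not open {path} for {mode!r}")
        handle = secrets.token_hex(16)
        self.share_server.register_handle(handle, path, mode)
        return handle  # subsequent reads go straight to the share server
```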
Proxy Servers
enables accesses across a firewall
requires a single port in the firewall to be opened for TCP—specifically HTTP/HTTPS—traffic.
All Avaki traffic passes through this port
The proxy server accepts all Avaki traffic forwarded from the firewall and redirects the traffic to the
appropriate components running on machines within the firewall.
The responses of these machines are sent back to the proxy server, which forwards this traffic to the
appropriate destination through the open port on the firewall.
Disclaimer
Intended for educational purposes only. Not intended for any sort of commercial use
Purely created to help students with limited preparation time
Text and picture used were taken from the reference items
Reference
Grid Computing: A Practical Guide to Technology and Applications by Ahmar Abbas
Credits
Thanks to my family members, who supported me while I spent a considerable amount of time preparing these notes.
Feedback is always welcome at GHCRajan@gmail.com