
CS6703- GRID AND CLOUD COMPUTING

UNIT I- INTRODUCTION
Evolution of Distributed Computing: Scalable computing over the Internet -
Technologies for network-based systems - Clusters of cooperative computers -
Grid computing infrastructures - Cloud computing - Service-oriented architecture -
Introduction to Grid architecture and standards - Elements of Grid - Overview of
Grid architecture.

1. SCALABLE COMPUTING OVER THE INTERNET

THE AGE OF INTERNET COMPUTING

Billions of people use the Internet every day. As a result, supercomputer sites and
large data centers must provide high-performance computing services to huge
numbers of Internet users concurrently.
Because of this high demand, the Linpack Benchmark for high-performance
computing (HPC) applications is no longer optimal for measuring system
performance. The emergence of computing clouds instead demands high-throughput
computing (HTC) systems built with parallel and distributed computing
technologies, and high-bandwidth networks.

THE PLATFORM EVOLUTION

HIGH-PERFORMANCE COMPUTING

HPC needs large amounts of computing power for short periods of time.
HPC systems use a smaller number of more expensive processors, expensively
interconnected.
An HPC system focuses on how fast an individual job can complete.
Fine-grained problems need a high-performance system that enables rapid
synchronization between the parts that can be processed in parallel, and runs
the parts that are difficult to parallelise as fast as possible.
Tightly coupled problems are executed efficiently on such systems.
For many years, HPC systems have emphasized raw speed performance.
The metric used to evaluate HPC systems is FLOPS (floating-point operations per
second), the number of floating-point operations a computing system can perform
each second.
On the HPC side, supercomputers (massively parallel processors or MPPs) are
gradually replaced by clusters of cooperative computers out of a desire to share
computing resources.
The cluster is often a collection of homogeneous compute nodes that are physically
connected in close range to one another.
The speed of HPC systems has now increased from Gflops to Pflops.

HIGH-THROUGHPUT COMPUTING

The development of market-oriented high-end computing systems is undergoing a
strategic change from an HPC paradigm to an HTC paradigm.
This HTC paradigm pays more attention to high-flux computing.
HTC needs large amounts of computing power for much longer
times(months/years)
HTC systems use a large number of inexpensive processors, inexpensively
interconnected.
HTC system focuses on how many jobs can be completed over a long period of time
Coarse-grained problems can use HTC systems.
The main application for high-flux computing is in Internet searches and web services
by millions or more users simultaneously.
Loosely coupled systems are efficiently executed
On the HTC side, peer-to-peer (P2P) networks are formed for distributed file sharing
and content delivery applications.
A P2P system is built over many client machines. Peer machines are globally
distributed in nature. P2P, cloud computing, and web service platforms are more
focused on HTC applications than on HPC applications.
HTC systems measure their performance in terms of jobs completed per
month/year

COMPUTING PARADIGM DISTINCTIONS

Centralized computing
This is a computing paradigm by which all computer resources are
centralized in one physical system.
All resources (processors, memory, and storage) are fully shared and tightly coupled
within one integrated OS.
Many data centres and supercomputers are centralized systems, but they are used in
parallel, distributed, and cloud computing applications

Parallel computing
In parallel computing, all processors are either tightly coupled with centralized shared
memory or loosely coupled with distributed memory.
Interprocessor communication is accomplished through shared memory or via
message passing.
A computer system capable of parallel computing is commonly known as a parallel
computer
Programs running in a parallel computer are called parallel programs.
The process of writing parallel programs is often referred to as parallel programming
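The shared-memory flavour of parallel programming described above can be sketched in a few lines. This is a hypothetical illustration (the function names are ours, not from the text): several workers operate on one shared array and their partial results are combined.

```python
# A minimal sketch of shared-memory parallel programming: worker threads
# read slices of one shared list, and their partial results are combined.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(data, lo, hi):
    # Each worker reads its slice of the shared array directly (shared memory).
    return sum(x * x for x in data[lo:hi])

def parallel_sum_of_squares(data, workers=4):
    # Split the index range into one chunk per worker and run them in parallel.
    step = (len(data) + workers - 1) // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(partial_sum, data, i, i + step)
                   for i in range(0, len(data), step)]
        return sum(f.result() for f in futures)
```

The same decomposition with processes instead of threads would use distributed memory and message passing, which is the distributed-computing model described next.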

Distributed computing
A distributed system consists of multiple autonomous computers, each having its own
private memory, communicating through a computer network.
Information exchange in a distributed system is accomplished through message
passing. A computer program that runs in a distributed system is known as a
distributed program.
The process of writing distributed programs is referred to as distributed programming.
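Since distributed nodes share no memory, they cooperate purely by message passing. The sketch below (illustrative, not from the text) exchanges a request and a reply over a socket pair, which stands in for a real network link between autonomous computers.

```python
# Distributed programs have no shared memory; they cooperate only by
# message passing. Two endpoints exchange messages over a socket pair.
import socket
import threading

def serve(conn):
    request = conn.recv(1024).decode()        # receive the request message
    conn.sendall(f"echo:{request}".encode())  # send back a response message
    conn.close()

def exchange(message):
    client, server = socket.socketpair()      # stand-in for a network link
    worker = threading.Thread(target=serve, args=(server,))
    worker.start()
    client.sendall(message.encode())          # the only way to share data
    reply = client.recv(1024).decode()
    worker.join()
    client.close()
    return reply
```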

Cloud computing
An Internet cloud of resources can be either a centralized or a distributed computing
system.
The cloud applies parallel or distributed computing, or both.
Clouds can be built with physical or virtualized resources over large data centers that
are centralized or distributed
Some authors consider cloud computing to be a form of utility computing or service
computing

DEGREES OF PARALLELISM

Bit-level parallelism (BLP) converts bit-serial processing to word-level
processing gradually.
Instruction-level parallelism (ILP), in which the processor executes multiple
instructions simultaneously rather than only one instruction at a time.
ILP is practised through pipelining, superscalar computing, VLIW (very long
instruction word) architectures, and multithreading.
ILP requires branch prediction, dynamic scheduling, speculation, and compiler
support to work efficiently.
Data-level parallelism (DLP) was made popular through SIMD (single instruction,
multiple data) and vector machines using vector or array types of instructions.
DLP requires even more hardware support and compiler assistance to work properly.
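DLP can be illustrated with NumPy, whose whole-array operations are typically executed with the kind of SIMD/vector hardware support described above. A sketch (our example, not from the text), contrasting the element-at-a-time form with the vector form:

```python
# Data-level parallelism: one operation applied across a whole array at once,
# the way SIMD and vector machines process data, versus one element per step.
import numpy as np

def saxpy_loop(a, x, y):
    # Scalar version: one multiply-add per loop iteration.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_vector(a, x, y):
    # Vector version: the multiply-add is expressed over whole arrays, which
    # NumPy can map onto SIMD/vector instructions.
    return a * x + y
```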
In fact, BLP, ILP, and DLP are well supported by advances in hardware and compilers.

THE TREND TOWARD UTILITY COMPUTING

Utility computing is a service provisioning model in which a service provider
makes computing resources and infrastructure management available to the
customer as needed, and charges for specific usage rather than a flat rate.
It focuses on a business model in which customers receive computing
resources from a paid service provider.
All grid/cloud platforms are regarded as utility service providers
First, reliability and scalability are two major design objectives in these
computing models.
Second, they are aimed at autonomic operations that can be self-organized to
support dynamic discovery.
Finally, these paradigms are composable with QoS and SLAs (service-level
agreements).
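The pay-per-use idea behind utility computing can be made concrete with a toy metering sketch. The Meter class and the rate below are illustrative only, not any real provider's API:

```python
# Utility computing in miniature: usage is metered and the customer is billed
# for what was consumed, not a flat rate.
class Meter:
    def __init__(self, rate_per_cpu_hour):
        self.rate = rate_per_cpu_hour
        self.cpu_hours = 0.0

    def record(self, hours):
        self.cpu_hours += hours            # measured service: log actual usage

    def bill(self):
        return self.cpu_hours * self.rate  # pay-per-use, no flat fee

meter = Meter(rate_per_cpu_hour=0.5)
meter.record(10)   # a 10 CPU-hour job
meter.record(2)    # a smaller job
print(meter.bill())  # 6.0
```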

THE HYPE CYCLE OF NEW TECHNOLOGIES


Hype Cycles are graphical representations of the relative maturity of
technologies, IT methodologies and management disciplines.

2. TECHNOLOGIES FOR NETWORK-BASED SYSTEMS

MULTICORE CPUS AND MULTITHREADING TECHNOLOGIES


Processor speed is measured in millions of instructions per second (MIPS) and
network bandwidth is measured in megabits per second (Mbps) or gigabits per second
(Gbps). The unit GE refers to 1 Gbps Ethernet bandwidth.

MULTITHREADING TECHNOLOGY

The superscalar processor is single-threaded with four functional units. Each of the
three multithreaded processors is four-way multithreaded over four functional data
paths.
In the dual-core processor, assume two processing cores, each a single-threaded two-
way superscalar processor.
Instructions from five independent threads are distinguished by specific
shading patterns.
Typical instruction scheduling patterns are shown here. Only instructions from the
same thread are executed in a superscalar processor.
Fine-grain multithreading switches the execution of instructions from different
threads per cycle.
Coarse-grain multithreading executes many instructions from the same thread for
quite a few cycles before switching to another thread.
The multicore CMP executes instructions from different threads on separate cores.
The SMT allows simultaneous scheduling of instructions from different threads in the
same cycle.

GPU COMPUTING TO EXASCALE AND BEYOND

A GPU is a graphics coprocessor or accelerator mounted on a computer's graphics
card or video card.
A GPU offloads the CPU from tedious graphics tasks in video editing applications
Unlike CPUs, GPUs have a throughput architecture that exploits massive parallelism
by executing many concurrent threads slowly, instead of executing a single long
thread in a conventional microprocessor very quickly.
Lately, parallel GPUs or GPU clusters have been garnering a lot of attention as
an alternative to CPUs with their limited parallelism.

GPU PROGRAMMING MODEL

The CPU is the conventional multicore processor with limited parallelism to exploit.
The GPU has a many-core architecture that has hundreds of simple processing cores
organized as multiprocessors.
Each core can have one or more threads.
Essentially, the CPU's floating-point kernel computation role is largely offloaded
to the many-core GPU. The CPU instructs the GPU to perform massive data processing.
The bandwidth must be matched between the on-board main memory and the on-chip
GPU memory.
This process is carried out in NVIDIA's CUDA programming model using the GeForce
8800 or Tesla and Fermi GPUs.

VIRTUAL MACHINES AND VIRTUALIZATION MIDDLEWARE

A conventional computer has a single OS image. This offers a rigid architecture that
tightly couples application software to a specific hardware platform.
Some software running well on one machine may not be executable on another
platform with a different instruction set under a fixed OS.
Virtual machines (VMs) offer novel solutions to underutilized resources, application
inflexibility, software manageability, and security concerns in existing physical
machines.

In the figure, the host machine is equipped with the physical hardware, as shown
at the bottom of the figure.
An example is an x86 architecture desktop running its installed Windows OS, as
shown in part (a) of the figure.
The VM can be provisioned for any hardware system. The VM is built with virtual
resources managed by a guest OS to run a specific application. Between the VMs and
the host platform, one needs to deploy a middleware layer called a virtual machine
monitor (VMM).
Fig (b) shows a native VM installed with the use of a VMM called a hypervisor in
privileged mode. For example, the hardware has the x86 architecture running the
Windows system.
The guest OS could be a Linux system and the hypervisor is the XEN system
developed at Cambridge University. This hypervisor approach is also called bare-
metal VM, because the hypervisor handles the bare hardware (CPU, memory, and I/O)
directly.
Another architecture is the host VM shown in Fig(c). Here the VMM runs in non-
privileged mode. The host OS need not be modified. The VM can also be
implemented with a dual mode, as shown in Fig (d). Part of the VMM runs at the user
level and another part runs at the supervisor level. In this case, the host OS may have
to be modified to some extent.
Multiple VMs can be ported to a given hardware system to support the virtualization
process.
The VM approach offers hardware independence of the OS and applications. The user
application running on its dedicated OS could be bundled together as a virtual
appliance that can be ported to any hardware platform.
The VM could run on an OS different from that of the host computer.

VM Primitive Operations

First, the VMs can be multiplexed between hardware machines.
Second, a VM can be suspended and stored in stable storage.
Third, a suspended VM can be resumed or provisioned to a new hardware platform.
Finally, a VM can be migrated from one hardware platform to another.
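These four primitive operations can be sketched as a toy state machine. The class and state names below are illustrative, not any real hypervisor's API:

```python
# VM primitive operations as a toy state machine: provision/resume onto a
# host, suspend to stable storage, and migrate between hardware platforms.
class VM:
    def __init__(self, name):
        self.name = name
        self.host = None
        self.state = "stored"       # not yet provisioned anywhere

    def provision(self, host):      # resume/provision onto a hardware platform
        self.host = host
        self.state = "running"

    def suspend(self):              # suspend and store in stable storage
        self.state = "suspended"

    def migrate(self, new_host):    # move from one hardware platform to another
        self.suspend()
        self.provision(new_host)

vm = VM("vm1")
vm.provision("hostA")
vm.migrate("hostB")
print(vm.host, vm.state)  # hostB running
```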

3. CLUSTERS OF COOPERATIVE COMPUTERS

Cluster Architecture
The cluster is connected to the Internet via a virtual private network (VPN) gateway. The gateway IP
address locates the cluster. Most clusters have loosely coupled node computers.
All resources of a server node are managed by their own OS.
Single-System Image
Cluster designers desire a cluster operating system or some middleware to support SSI at various
levels, including the sharing of CPUs, memory, and I/O across all cluster nodes.
Hardware, Software, and Middleware Support
Special cluster middleware supports are needed to create SSI or high availability (HA).
Both sequential and parallel applications can run on the cluster, and special parallel environments are
needed to facilitate use of the cluster resources.
Major Cluster Design Issues
Unfortunately, a cluster-wide OS for complete resource sharing is not available yet.
A computing cluster consists of interconnected stand-alone computers which work
cooperatively as a single integrated computing resource.
In the past, clustered computer systems have demonstrated impressive results in
handling heavy workloads with large data sets.
Figure shows the architecture of a typical server cluster built around a low-latency,
high bandwidth interconnection network.
This network can be as simple as a SAN (e.g., Myrinet) or a LAN (e.g., Ethernet).
To build a larger cluster with more nodes, the interconnection network can be built
with multiple levels of Gigabit Ethernet, Myrinet, or InfiniBand switches.
Through hierarchical construction using a SAN, LAN, or WAN, one can build
scalable clusters with an increasing number of nodes.
The cluster is connected to the Internet via a virtual private network (VPN) gateway.
The gateway IP address locates the cluster.
The system image of a computer is decided by the way the OS manages the shared
cluster resources.
Most clusters have loosely coupled node computers.
All resources of a server node are managed by their own OS.
Thus, most clusters have multiple system images as a result of having many
autonomous nodes under different OS control

4. GRID COMPUTING INFRASTRUCTURES

Grid Computing is based on the concept of information and electricity sharing,
which allows us to access other types of heterogeneous and geographically
separated resources.

Grid gives the sharing of:

1. Storage elements
2. Computational resources
3. Equipment
4. Specific applications
5. Other

Thus, Grid is based on:

Internet protocols.
Ideas of parallel and distributed computing.

A Grid is a system that:

1) coordinates resources that are not subject to centralized control,

2) using standard, open, general-purpose protocols and interfaces,

3) to deliver nontrivial qualities of service.

5. CLOUD COMPUTING

Cloud Computing refers to manipulating, accessing, and configuring hardware and
software resources remotely. It provides online data storage, infrastructure,
and applications.

Cloud computing supports platform independence, as the software is not required
to be installed locally on the PC. Hence, cloud computing makes our business
applications mobile and collaborative.

INFRASTRUCTURE AS A SERVICE (IAAS)

This model puts together infrastructures demanded by users, namely servers,
storage, networks, and the data centre fabric.
The user can deploy and run multiple VMs with guest OSes for specific
applications.
The user does not manage or control the underlying cloud infrastructure, but can
specify when to request and release the needed resources.

PLATFORM AS A SERVICE (PAAS)

This model enables the user to deploy user-built applications onto a virtualized cloud
platform.
PaaS includes middleware, databases, development tools, and some runtime support
such as Web 2.0 and Java.
The platform includes both hardware and software integrated with specific
programming interfaces.
The provider supplies the API and software tools (e.g., Java, Python, Web 2.0, .NET).
The user is freed from managing the cloud infrastructure.

SOFTWARE AS A SERVICE (SAAS)

This refers to browser-initiated application software delivered over the
Internet to thousands of paying cloud customers.
The SaaS model applies to business processes, industry applications, customer
relationship management (CRM), enterprise resource planning (ERP), human
resources (HR), and collaborative applications.
On the customer side, there is no upfront investment in servers or software licensing.
On the provider side, costs are rather low, compared with conventional hosting of user
applications.

CHARACTERISTICS OF CLOUD COMPUTING

There are five key characteristics of cloud computing. They are shown in the
following diagram:
1. On Demand Self Service
Cloud Computing allows the users to use web services and resources on demand. One can
logon to a website at any time and use them.
2. Broad Network Access
Since cloud computing is completely web based, it can be accessed from anywhere and at any
time.
3. Resource Pooling
Cloud computing allows multiple tenants to share a pool of resources. One can share single
physical instance of hardware, database and basic infrastructure.
4. Rapid Elasticity
It is very easy to scale the resources vertically or horizontally at any time.
Scaling of resources means the ability of resources to deal with increasing or
decreasing demand. The resources being used by customers at any given point of
time are automatically monitored.
5. Measured Service
In this service model the cloud provider controls and monitors all aspects of the
cloud service. Resource optimization, billing, capacity planning, etc. depend on it.

BENEFITS OF CLOUD COMPUTING

1. One can access applications as utilities, over the Internet.
2. One can manipulate and configure the applications online at any time.
3. It does not require installing software to access or manipulate cloud
applications.
4. Cloud Computing offers online development and deployment tools, programming runtime
environment through PaaS model.
5. Cloud resources are available over the network in a manner that provides
platform-independent access to any type of client.
6. Cloud Computing offers on-demand self-service. The resources can be used without
interaction with cloud service provider.

6. SERVICE- ORIENTED ARCHITECTURE

Service-Oriented Architecture (SOA)


In grids/web services, Java, and CORBA, an entity is, respectively, a service, a Java object,
and a CORBA distributed object in a variety of languages.
These architectures build on the traditional seven Open Systems Interconnection (OSI) layers
that provide the base networking abstractions.
Layered Architecture for Web Services and Grids
The entity interfaces correspond to the Web Services Description Language (WSDL), Java
method, and CORBA interface definition language (IDL) specifications.
These interfaces are linked with customized, high-level communication systems: SOAP, RMI,
and IIOP. These communication systems are built on message-oriented middleware
infrastructure such as WebSphere MQ or Java Message Service (JMS), which provide
rich functionality and support virtualization of routing, senders, and recipients.
In the case of fault tolerance, the features in the Web Services Reliable
Messaging (WSRM) framework mimic the OSI layer capability, modified to match the
different abstractions at the entity levels. Security is a critical capability
that either uses or re-implements the capabilities seen in concepts such as
Internet Protocol Security (IPsec) and secure sockets in the OSI layers.
Web Services and Tools
Both web services and REST systems have very distinct approaches to building
reliable interoperable systems.
In web services, the service specification is carried with communicated messages
using the Simple Object Access Protocol (SOAP).
The hosting environment then becomes a universal distributed operating system
with fully distributed capability carried by SOAP messages.
REST can use XML schemas but not those that are part of SOAP; XML over HTTP is a
popular design choice in this regard.

Service-Oriented Architecture helps to use applications as a service for other
applications regardless of the vendor, product or technology. Therefore, it is
possible to exchange data between applications of different vendors without
additional programming or making changes to services.

The cloud computing service oriented architecture is shown in the diagram below.

Distributed computing such as Grid computing relies on causing actions to occur
on remote computers. Taking advantage of remote computers was recognized many
years ago, well before Grid computing. One of the underlying concepts is the
client-server model, as shown in the figure below. The client in this context is
a software component on one computer that makes an access to the server for a
particular operation.
The server responds accordingly. The request and response are transmitted through
the network from the client to the server.

An early form of client-server arrangement was the remote procedure call (RPC)
introduced in the 1980s. This mechanism allows a local program to execute a
procedure on a remote computer and get back results from that procedure. It is now
the basis of certain network facilities such as mounting remote files in a shared file
system.
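The RPC mechanism survives in Python's standard-library XML-RPC modules, which give a minimal working demonstration. For brevity this sketch runs the server and client in one process; the procedure name `add` and the loopback address are our choices for illustration:

```python
# A working remote procedure call: the client invokes add() as if it were a
# local procedure; the call is marshalled, sent over HTTP, and executed in
# the server, which returns the result.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")   # the remote procedure
port = server.server_address[1]                       # OS-assigned port
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.add(2, 3)   # looks local, executes remotely
server.shutdown()
print(result)  # 5
```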

For the remote procedure call to work, the client needs to:

Identify the location of the required procedure.
Know how to communicate with the procedure to get it to provide the actions
required.
The remote procedure call introduced the concept of a service registry to provide a means of
locating the service (procedure). Using a service registry is now part of what is called a
service-oriented architecture (SOA) as illustrated in the figure below. The sequence of events
is as follows:
First, the server (service provider) publishes its services in a service registry.
Then, the client (service requestor) asks the service registry to locate a service.
Finally, the client (service requestor) binds with the service provider to invoke
the service.
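The publish / find / bind sequence above can be sketched with an in-memory registry, standing in for a real service registry; the class and service names are illustrative only:

```python
# The three SOA steps: 1) provider publishes, 2) requestor finds,
# 3) requestor binds to the provider and invokes the service.
class ServiceRegistry:
    def __init__(self):
        self._services = {}

    def publish(self, name, endpoint):   # 1. provider publishes its service
        self._services[name] = endpoint

    def find(self, name):                # 2. requestor asks the registry
        return self._services[name]

registry = ServiceRegistry()
registry.publish("greet", lambda who: f"hello, {who}")  # toy service endpoint
service = registry.find("greet")
print(service("grid"))  # 3. bind and invoke -> hello, grid
```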

Service-oriented architecture.

Later forms of remote procedure calls in the 1990s introduced distributed objects,
most notably CORBA (Common Object Request Broker Architecture) and Java RMI
(Remote Method Invocation).
A fundamental disadvantage of remote procedure calls so far described is the need
for the calling programs to know implementation-dependent details of the remote
procedure call. A procedure call has a list of parameters with specific meanings
and types, and the return value(s) have specific meaning and type.

All these details need to be known by the calling program, yet each remote
procedure, provided by different programmers, could have different and
incompatible arrangements.
This led to improvements including the introduction of interface definition (or
description) languages (IDLs) that enabled the interface to be described in a language-
independent manner and would allow clients and servers to interact in different languages
(e.g., between C and Java). However, even with IDLs, these systems were not always
completely platform/language independent.

Some aspects for a better system include:


Universally agreed-upon standardized interfaces.
Inter-operability between different systems and languages.
Flexibility to enable different programming models and message patterns.
Agreed network protocols (Internet standards).
Web services with an XML interface definition language offer the solution.

7. INTRODUCTION TO GRID ARCHITECTURE AND STANDARDS

Grids provide protocols and services at five different layers as identified in the Grid protocol
architecture.

At the Fabric layer, Grids provide access to different resource types such as
compute and storage, for instance through local resource managers.
Connectivity layer defines core communication and authentication protocols for easy
and secure network transactions.
The GSI (Grid Security Infrastructure) protocol underlies every Grid transaction.
The Resource layer defines protocols for the publication, discovery, negotiation,
monitoring, accounting and payment of sharing operations on individual resource
The GRAM (Grid Resource Allocation and Management) protocol is used for
allocation of computational resources and for monitoring and control of
computation on those resources, and GridFTP for data access and high-speed
data transfer.

The Collective layer captures interactions across collections of resources:
directory services such as MDS (Monitoring and Discovery Service) allow for the
monitoring and discovery of VO resources; Condor-G and Nimrod-G are examples of
co-allocating, scheduling and brokering services; MPICH supports Grid-enabled
programming systems; and CAS (Community Authorization Service) enforces global
resource policies.

8. ELEMENTS OF GRID

COMPUTATIONAL & DATA GRIDS

Like an electric utility power grid, a computing grid offers an infrastructure that
couples computers, software/middleware, special instruments, and people and sensors
together. The grid is often constructed across LAN, WAN, or Internet backbone
networks at a regional, national, or global scale.
Enterprises or organizations present grids as integrated computing resources.
They can also be viewed as virtual platforms to support virtual organizations. The
computers used in a grid are primarily workstations, servers, clusters, and
supercomputers. Personal computers, laptops, and PDAs can be used as access
devices to a grid system.
The resource sites offer complementary computing resources, including workstations,
large servers, a mesh of processors, and Linux clusters to satisfy a chain of
computational needs.
The grid is built across various IP broadband networks including LANs and WANs
already used by enterprises or organizations over the Internet.

PEER-TO-PEER NETWORK FAMILIES

The P2P architecture offers a distributed model of networked systems. First, a P2P
network is client-oriented instead of server-oriented.
In a P2P system, every node acts as both a client and a server, providing part of the
system resources.
Peer machines are simply client computers connected to the Internet. All client
machines act autonomously to join or leave the system freely.
This implies that no master-slave relationship exists among the peers.
No central coordination or central database is needed.
In other words, no peer machine has a global view of the entire P2P system. The
system is self-organizing with distributed control.

STANDARDS FOR GRID ENVIRONMENT

OGSA

The Global Grid Forum has published the Open Grid Services Architecture (OGSA).
Addressing the requirements of grid computing in an open and standard way
requires a framework for distributed systems that supports integration,
virtualization and management. Such a framework requires a core set of
interfaces, expected behaviors, resource models, and bindings.

OGSA defines requirements for these core capabilities and thus provides a general
reference architecture for grid computing environments.

It identifies the components and functions that are useful if not required for a grid
environment.

OGSI

The Global Grid Forum extended the concepts defined in OGSA to define specific
interfaces to various services that would implement the functions defined by OGSA.
Open Grid Services Interface (OGSI) defines mechanisms for creating, managing,
and exchanging information among Grid services.

A Grid service is a Web service that conforms to a set of interfaces and behaviors that
define how a client interacts with a Grid service.

These interfaces and behaviors, along with other OGSI mechanisms associated with
Grid service creation and discovery, provide the basis for a robust grid environment.

OGSI provides the Web Services Description Language (WSDL) definitions for these
key interfaces.

OGSA-DAI

The OGSA-DAI (data access and integration) project is concerned with constructing
middleware to assist with access and integration of data from separate data sources
via the grid.

GRIDFTP

GridFTP is a secure and reliable data transfer protocol providing high
performance and optimized for wide-area networks that have high bandwidth.
As one might guess from its name, it is based upon the Internet FTP protocol and
includes extensions that make it a desirable tool in a grid environment.
The GridFTP protocol specification is a proposed recommendation document in the
Global Grid Forum (GFD-R-P.020).

GridFTP uses basic Grid security on both control (command) and data channels.

Features include multiple data channels for parallel transfers, partial file
transfers, third-party transfers, and more.

WSRF: The Web Services Resource Framework (WSRF) defines a set of specifications
for defining the relationship between Web services (which are normally stateless)
and stateful resources.

OVERVIEW OF GRID ARCHITECTURE

GENERAL DESCRIPTION

The Computing Element (CE) is a set of gLite services that provide access for Grid jobs to a
local resource management system (LRMS, batch system) running on a computer farm, or
possibly to computing resources local to the CE host. Typically the CE provides access to a
set of job queues within the LRMS.

BOOKING CONDITIONS

No particular booking is required to use this service. However, the user MUST have a
valid grid certificate of an accepted Certificate Authority and MUST be member of a
valid Virtual Organization (VO).
The service is initiated by respective commands that can be submitted from any gLite
User Interface either interactively or through batch submission.
To run a job on the cluster the user must install their own, or at least have
access to, a gLite User Interface. Certificates can be requested, for example,
at the German Grid Certificate Authority.

DEREGISTRATION

No particular deregistration is required for this service.
A user with an expired Grid certificate or VO membership is automatically blocked
from accessing the CE.

IT-SECURITY

The database and log files of the CEs contain information on the status and results of
the jobs and the certificate that was used to initiate the task.
The required data files themselves are stored on the worker nodes or in the Grid
Storage Elements (SEs). No other personal data is stored.
UNIT II GRID SERVICES

Introduction to Open Grid Services Architecture (OGSA) - Motivation -
Functionality requirements - Practical and detailed view of OGSA/OGSI -
Data-intensive grid service models - OGSA services.

1. INTRODUCTION TO OPEN GRID SERVICES ARCHITECTURE (OGSA)

The OGSA is an open source grid service standard jointly developed by academia and
the IT industry under coordination of a working group in the Global Grid Forum
(GGF).
The standard was specifically developed for the emerging grid and cloud service
communities.
The OGSA is extended from web service concepts and technologies.
The standard defines a common framework that allows businesses to build grid
platforms across enterprises and business partners.
The intent is to define the standards required for both open source and commercial
software to support a global grid infrastructure.
OPEN GRID SERVICES ARCHITECTURE (OGSA)

Open Grid Services Architecture (OGSA) describes a service-oriented architecture
for a grid computing environment for business and scientific use.
OGSA is a distributed interaction and computing architecture based around services,
assuring interoperability on heterogeneous systems so that different types of resources
can communicate and share information.
OGSA is based on several other Web service technologies, such as the Web Services
Description Language (WSDL) and the Simple Object Access Protocol (SOAP), but it
aims to be largely independent of transport-level handling of data.
OGSA has been described as a refinement of a Web services architecture, specifically
designed to support grid requirements

The Open Grid Services Architecture describes these capabilities:

Infrastructure services
Execution Management services
Data services
Resource Management services
Security services
Self-management services
Information services

OGSA DESIGN GOALS

Operations are grouped to form interfaces, and interfaces are combined to specify a service.

Encourages code-reuse
Simplifies application design
Ease of composition of services
Service virtualization: isolate users from details of service implementation
and location.

OGSA COMPONENTS

Open Grid Services Infrastructure (OGSI)
OGSA services
OGSA schemas
Built on Web services

A Web service that adheres to OGSI is called a Grid service.

OGSA FRAMEWORK

The OGSA was built on two basic software technologies: the Globus Toolkit,
widely adopted as a grid technology solution for scientific and technical
computing, and web services (WS 2.0), a popular standards-based
framework for business and network applications.
The OGSA is intended to support the creation, termination, management,
and invocation of stateful, transient grid services via standard interfaces and
conventions
The OGSA framework specifies the physical environment, security,
infrastructure profile, resource provisioning, virtual domains, and execution
environment for various grid services and API access tools
OGSA is service oriented.
A service is an entity that provides some capability to its clients by exchanging
messages.

OGSA INTERFACES

The OGSA is centered on grid services.
These services demand special well-defined application interfaces.
These interfaces provide resource discovery, dynamic service creation, lifetime
management, notification, and manageability.

GRID SERVICE HANDLE


A grid service handle (GSH) is a globally unique name that distinguishes a specific
grid service instance from all others.
The status of a grid service instance could be that it exists now or that it will exist in
the future.
These instances carry no protocol or instance-specific addresses or supported protocol
bindings. Instead, these information items are encapsulated along with all other
instance-specific information.
In order to interact with a specific service instance, a single abstraction is defined as a
grid service reference (GSR).
Unlike a GSH, which is time-invariant, the GSR for an instance can change over the
lifetime of the service.
The OGSA employs a handle-resolution mechanism for mapping from a GSH to a
GSR.
The GSH must be globally defined for a particular instance. However, the GSH may
not always refer to the same network address.
A service instance may be implemented in its own way, as long as it obeys the
associated semantics.
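The GSH-to-GSR mapping described above can be pictured as a simple lookup table that survives changes to the instance's address. The following is an illustrative Python sketch, not an OGSA API; the class and handle format are invented for the example.

```python
# Hypothetical sketch of OGSA handle resolution: a time-invariant GSH
# is mapped to the current GSR, which may change over the service lifetime.

class HandleResolver:
    """Maps globally unique grid service handles (GSHs) to current references (GSRs)."""

    def __init__(self):
        self._table = {}  # GSH -> GSR

    def register(self, gsh, gsr):
        # Record (or update) the current reference for a handle.
        self._table[gsh] = gsr

    def resolve(self, gsh):
        # Return the current GSR for a handle, or None if unknown.
        return self._table.get(gsh)

# The GSH stays the same while the GSR (here, a network address) changes.
resolver = HandleResolver()
gsh = "ogsa://example.org/services/job-manager/instance-42"
resolver.register(gsh, "http://node1.example.org:8080/wsrf/services/JobManager")
# The instance moves to another host; only the GSR is updated.
resolver.register(gsh, "http://node2.example.org:8080/wsrf/services/JobManager")
current = resolver.resolve(gsh)
```

Clients always hold the stable GSH and resolve it on demand, which is why the handle may not always refer to the same network address.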

GRID SERVICE MIGRATION

This is a mechanism for creating new services and specifying assertions regarding the
lifetime of a service.
The OGSA model defines a standard interface, known as a factory, to implement this
reference. Any service that is created must address the former services as the
reference of later services.

OGSA SECURITY MODELS

The OGSA supports security enforcement at various levels.


The grid works in a heterogeneous distributed environment, which is essentially open
to the general public.
At the security policy and user levels, we want to apply a service or endpoint policy,
resource mapping rules, authorized access of critical resources, and privacy
protection.
At the Public Key Infrastructure (PKI) service level, the OGSA demands security
binding with the security protocol stack and bridging of certificate authorities
(CAs), use of multiple trusted intermediaries, and so on.
Trust models and secure logging are often practiced in grid platforms.

DATA-INTENSIVE GRID SERVICE MODELS

Applications in the grid are normally grouped into two categories: computation-
intensive and data-intensive.
For data-intensive applications, we may have to deal with massive amounts of data.
The grid system must be specially designed to discover, transfer, and manipulate these
massive data sets.
Transferring massive data sets is a time-consuming task.
Efficient data management demands low-cost storage and high-speed data movement.

DATA REPLICATION AND UNIFIED NAMESPACE

This data access method is also known as caching, which is often applied to enhance
data efficiency in a grid environment. By replicating the same data blocks and
scattering them in multiple regions of a grid, users can access the same data with
locality of reference.
Furthermore, the replicas of the same data set can be a backup for one another. Some
key data will not be lost in case of failures. However, data replication may demand
periodic consistency checks.
The increase in storage requirements and network bandwidth may cause additional
problems.
Replication strategies determine when and where to create a replica of the data.
The factors to consider include data demand, network conditions, and transfer cost.
The strategies of replication can be classified into two types: dynamic and static.
For the static method, the locations and number of replicas are determined in advance
and will not be modified.
Although replication operations require little overhead, static strategies cannot adapt
to changes in demand, bandwidth, and storage availability.
Dynamic strategies can adjust locations and number of data replicas according to
changes in conditions (e.g., user behavior).
However, frequent data-moving operations can result in much more overhead than in
static strategies.
The replication strategy must be optimized with respect to the status of data replicas.
For static replication, optimization is required to determine the location and number
of data replicas.
For dynamic replication, optimization may be determined based on whether the data
replica is being created, deleted, or moved.
The most common replication strategies include preserving locality, minimizing
update costs, and maximizing profits.
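The dynamic strategy described above can be sketched as a simple rule: replicate data at a site when local demand is high, and reclaim the replica when demand falls away. The thresholds, site names, and function below are invented for illustration; real replication strategies also weigh network conditions and transfer cost.

```python
# Illustrative sketch (not a grid middleware API) of a dynamic replication
# strategy: create a replica where access counts are high, delete replicas
# where they are low, trading storage for locality of reference.

def update_replicas(replicas, demand, create_threshold=10, delete_threshold=2):
    """Return the new replica placement given per-site access counts."""
    new_replicas = set(replicas)
    for site, count in demand.items():
        if count >= create_threshold:
            new_replicas.add(site)          # hot data: replicate locally
        elif count <= delete_threshold and site in new_replicas:
            new_replicas.discard(site)      # cold replica: reclaim storage
    return new_replicas

placement = {"siteA"}                        # initial (static) placement
placement = update_replicas(placement, {"siteA": 50, "siteB": 12, "siteC": 1})
```

A static strategy would simply keep the initial placement fixed; the dynamic version adapts to user behavior at the cost of extra data-moving overhead.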

GRID DATA ACCESS MODELS


Multiple participants may want to share the same data collection. To retrieve any
piece of data, we need a grid with a unique global namespace. Similarly, we desire to
have unique file names.
To achieve these, we have to resolve inconsistencies among multiple data objects
bearing the same name.
Access restrictions may be imposed to avoid confusion.
Data needs to be protected to avoid leakage and damage.
Users who want to access data have to be authenticated first and then authorized for
access.

In general, there are four access models for organizing a data grid, as listed here

MONADIC MODEL:

This is a centralized data repository model.
All the data is saved in a central data repository.
When users want to access some data, they have to submit requests directly to the
central repository.
No data is replicated for preserving data locality.
This model is the simplest to implement for a small grid.
For a large grid, this model is not efficient in terms of performance and reliability.
Data replication is permitted in this model only when fault tolerance is demanded.

HIERARCHICAL MODEL:
The hierarchical model is suitable for building a large data grid which has only one
large data access directory.
The data may be transferred from the source to a second-level center. Then some data
in the regional center is transferred to the third-level center.
After being forwarded several times, specific data objects are accessed directly by
users.
Generally speaking, a higher-level data center has a wider coverage area. It provides
higher bandwidth for access than a lower-level data center.
PKI security services are easier to implement in this hierarchical data access model.

FEDERATION MODEL:

This data access model is better suited for designing a data grid with multiple sources
of data supplies.
Sometimes this model is also known as a mesh model. The data sources are
distributed to many different locations. Although the data is shared, the data items are
still owned and controlled by their original owners.
According to predefined access policies, only authenticated users are authorized to
request data from any data source.
This mesh model may cost the most when the number of grid institutions becomes
very large.

HYBRID MODEL:

This model combines the best features of the hierarchical and mesh models.
Traditional data transfer technology, such as FTP, applies for networks with lower
bandwidth.
Network links in a data grid often have fairly high bandwidth, and other data transfer
models are exploited by high-speed data transfer tools such as GridFTP, developed
with the Globus library.
The cost of the hybrid model can be traded off between the two extreme models for
hierarchical and mesh-connected grids.

OPEN GRID SERVICES INFRASTRUCTURE (OGSI)

The first attempt by the Grid computing community to specify how OGSA could be
implemented was the Open Grid Services Infrastructure (OGSI) standard, which was
introduced in 2002-2003. OGSI specifies the way that clients interact with services
(that is, service invocation, management of data, security mechanism, etc.).
The approach taken in OGSI to implement a stateful Web service was to modify the
Web Service Description Language, WSDL, to enable the state to be specified.

The modified language was called the Grid Web Service Definition Language (GWSDL).
OGSI also introduced and described the term Grid service as an extended Web service
that conforms to its OGSI standard.
GWSDL provided support for the extra features in Grid services that were not present
in Web services. In addition to a means of representing state, OGSI included
inheritance of portTypes (interfaces) and a way of addressing services using so-called
Grid Service References.

The word definition comes from the time when the Web Service Description Language
was called the Web Service Definition Language.

WS-RESOURCE FRAMEWORK

The term Grid service, which originally was introduced with OGSI, continued for a
while. A broad meaning of Grid service is any service that conforms to the interface
conventions of a Grid computing infrastructure.
A narrower meaning, under WSRF, is a service that conforms to WSRF. The term seems to
have lost favor, maybe because it is better to think of services in a Grid environment
simply as regular Web services.

OGSA also continues as an overall framework. OGSA requires a stateful Web service.

WSRF specifies how that stateful Web service is implemented. Stateful Web services
are extensions of the original stateless Web service and a WSRF Web service could be
stateless by simply not using the state features.

A stateful Web service is obtained in WSRF by having a stateless service and stateful
resources where the stateless Web service is a front-end to stateful resources; hence
the name WS-Resource Framework.

Resource Properties: Resource properties are the name given to data items in the resource.
They can consist of:

1. Data values about the current state of the service: results of calculations, etc.

2. Metadata: information about the data.

3. Information about the whole resource: termination time, etc.

The service interface is described in WSDL. The WSDL file serves the same purpose
as in the original stateless Web service. Using WSDL allows existing WSDL parsing
tools to be used.
A significant addition in the WSDL file is a specification of the resources. In WSRF,
the Web service is described in a WSDL document and the resource is specified in a
separate Resource Properties document (or merged into the WSDL document in the
case of simple resources).

A WS-Resource is a Web service and an associated resource.

The figure below shows a WSRF version of this stateful Web service. In this example,
there is a single resource property, data, acted upon by the add method. The add method adds
an integer x, given as an argument, to data. This service implements an operation on a WSRF
resource property, and so the WSDL will include definitions relating to the resource property.

WSRF stateful Web service
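The pattern in the figure (a stateless service acting as a front-end to a stateful resource) can be sketched in plain Python. This is only a structural sketch under invented class names, not GT4 code: the point is that all state lives in the resource object, while the service object itself holds none.

```python
# Minimal sketch of the WSRF pattern: a stateless service front-end
# operating on a stateful resource that holds the single resource
# property "data". Class and method names are illustrative.

class MathResource:
    """Stateful resource with one resource property."""
    def __init__(self):
        self.data = 0  # the resource property

class MathService:
    """Stateless service: all state lives in the resource it is handed."""
    def add(self, resource, x):
        # Add x to the resource property and return the new value.
        resource.data += x
        return resource.data

resource = MathResource()
service = MathService()
service.add(resource, 5)
result = service.add(resource, 3)
```

Because the service keeps no state of its own, the same service object could front any number of resources, which is exactly what makes WS-Resources possible.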

Each WSRF service needs an addressing mechanism that includes addressing to the
resources.
Pure stateless Web services are addressed with URIs (Uniform Resource Identifiers).

Typically, they are addressed by URLs (Uniform Resource Locators) as used for Web
sites.

URLs are a subset of URIs. The WSRF service addressing mechanism is defined in
the WS-addressing standard and uses a term called an endpoint reference (EPR),
which is an XML document that contains various information about the service and
resource. Specifically, the endpoint reference includes both the service address (URI)
and a resource identification called a key.

The key is a number. The service that accesses the resources and the resources are
paired together.

The client makes a request to the (stateless) Web service, addressing the service using
the EPR. The URI part of the EPR identifies the service.

The Web service selects the resource using the resource key inside the EPR, as
illustrated in the figure.

An endpoint reference has required and optional entries. An endpoint reference could
be used simply to address a Web service without an associated resource, i.e., a
stateless Web service.
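The EPR structure described above, a service address plus a resource key, can be illustrated with a simplified XML document. The element names below follow the general WS-Addressing shape but omit the real namespaces, so treat this as a sketch rather than a conformant EPR.

```python
# Simplified sketch of a WS-Addressing endpoint reference (EPR):
# an XML document combining the service address (URI) with a resource
# key that identifies the stateful resource behind the service.
import xml.etree.ElementTree as ET

epr_xml = """
<EndpointReference>
  <Address>http://example.org:8080/wsrf/services/MathService</Address>
  <ReferenceParameters>
    <ResourceKey>12345</ResourceKey>
  </ReferenceParameters>
</EndpointReference>
"""

root = ET.fromstring(epr_xml)
# The URI part identifies the (stateless) service...
address = root.findtext("Address")
# ...and the key selects the resource the service should act on.
key = root.findtext("ReferenceParameters/ResourceKey")
```

A client sends its request to the address; the service then uses the key to locate the resource, exactly as in the figure.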
WSRF Specifications: WSRF is actually a collection of four specifications (standards):

1. WS-ResourceProperties specifies how resource properties are defined and accessed.

2. WS-ResourceLifetime specifies mechanisms to manage resource lifetimes.

3. WS-ServiceGroup specifies how to group services or WS-Resources together.

4. WS-BaseFaults specifies how to report faults.

Additional related WS-* standards include:

1. WS-Notification: a collection of specifications that specify how to configure services as
notification producers or consumers.

2. WS-Addressing: specifies how to address Web services. Provides a way to address a
Web service/resource pair. Defines the endpoint reference.

GENERIC STATEFUL WSRF SERVICE

Here the code has been simplified, including not having specific operations or a
types element for defining the data format of the messages. However, as we shall see, a types
element will still be needed to define the resource properties.

WSDL Code: A generic Web service was outlined that performed a function funct1 that has
one integer argument arg1 and returns an integer result based only upon the supplied
argument. Now, that simple Web service will be extended so that the function acts upon an
integer, data, as shown in the figure. The actual function is still undefined at the WSDL level.
The integer data is separated out as a resource property.

The format of the stateless service WSDL 1.1 document was given, and this carries over to a
WSRF WSDL document. But in addition, the properties of the resources need to be specified.

Generic Stateful Web service with a WSDL document and a stored value.
OGSA SERVICES

Configuring and Managing Resources:

Previously, there was a single resource, although it could have multiple resource
properties inside the resource. Both the service code and the resource code were held
in a single file. Having one file is not the preferred way except for a simple service.
Ideally, there should be separate classes, and classes are provided for different
arrangements, including having one class for the arrangement combining the service
and resource, which is called the ReflectionResourceProperty class.

Resources are managed by a resource home, which provides resource management
functions, including locating the resources. The resource home class for a combined
service-resource pair is called the ServiceResourceHome class. The resource home class
for a single resource that is separate from the service is called the
SingletonResourceHome class.
Ideally, we would like to have multiple resources. In WSRF, resources can be created
(but not the services themselves) using the WS-Resource factory pattern, a
traditional object-oriented approach to creating resources using a factory service. The
factory service is responsible for creating WS-Resources and interacts with the
resource home that subsequently will manage the resources.

As mentioned earlier, each resource is assigned a unique key, which together with the
service URI identifies the WS-Resource pair (endpoint reference). The factory service is
requested to create a resource using the createResource method, and uses the resource
home to create the resource. The createResource method will return an EPR of the newly
created WS-Resource. Notice that the client making the request to the resource factory
needs to know the location of the resource factory. When a service needs access to the
resource, it will contact the resource home to find the resource.
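The factory/resource-home interaction above can be sketched as follows. The class names and the EPR-like (URI, key) pair are invented for the example; GT4's actual classes differ, but the division of labor is the same: the factory creates, the home manages and locates.

```python
# Illustrative sketch of the WS-Resource factory pattern: the factory
# creates resources through a resource home, which manages them by key.
# The returned (service URI, key) pair plays the role of the EPR.
import itertools

class ResourceHome:
    """Manages stateful resources and locates them by key."""
    def __init__(self):
        self._resources = {}
        self._keys = itertools.count(1)   # unique key generator

    def create(self):
        key = next(self._keys)
        self._resources[key] = {"data": 0}   # a new stateful resource
        return key

    def find(self, key):
        # Services call this to reach the resource behind an EPR key.
        return self._resources[key]

class FactoryService:
    def __init__(self, home, service_uri):
        self.home = home
        self.service_uri = service_uri

    def create_resource(self):
        # Returns an EPR-like pair identifying the new WS-Resource.
        key = self.home.create()
        return (self.service_uri, key)

home = ResourceHome()
factory = FactoryService(home, "http://example.org/wsrf/services/MathService")
epr = factory.create_resource()
```

The client only needs the factory's location up front; everything else it learns from the returned EPR.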

Lifecycle Mechanisms:

Mechanisms are available in WSRF to specify how long a resource exists.


GT4 provides mechanisms to specify when a resource is automatically destroyed. It
can be destroyed immediately by a client invoking the destroy operation of a WSRF
service. Notice that the factory is responsible for creating a resource, but a service
destroys it. A resource can also be destroyed by the GT4 command globus-wsrf-destroy
on the command line, e.g., globus-wsrf-destroy -e EPRfile.epr, where
EPRfile.epr is a file that contains the EPR of the resource.

Lease-Based Lifecycle:

In lease-based management, resources must be kept alive, otherwise the
resource dies after the set lifetime. Interested parties (clients) must renew the lease by
updating the termination time, or the resource will be destroyed. This approach
guarantees cleanup without having to use a destroy operation explicitly.
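The lease mechanism can be sketched with a termination time that clients extend and a sweep that drops expired resources. Times are plain numbers here rather than real clocks, and the class and function names are invented for illustration.

```python
# Sketch of lease-based lifetime management: each resource carries a
# termination time; clients renew the lease to keep the resource alive,
# and a periodic sweep destroys resources whose lease has expired.

class LeasedResource:
    def __init__(self, termination_time):
        self.termination_time = termination_time

    def renew(self, new_time):
        # Interested clients extend the lease to keep the resource alive.
        self.termination_time = max(self.termination_time, new_time)

def sweep(resources, now):
    """Keep only resources whose lease has not yet expired."""
    return [r for r in resources if r.termination_time > now]

a = LeasedResource(termination_time=10)
b = LeasedResource(termination_time=10)
a.renew(30)                      # only a's lease is renewed
alive = sweep([a, b], now=20)    # b's lease expired, so b is destroyed
```

No explicit destroy call is ever needed: a forgotten resource simply times out, which is the cleanup guarantee the text describes.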

INFORMATION SERVICES

Information services collect information from various sources in a Grid environment. They
can provide information on running jobs and on the availability of Grid resources (computers,
etc.). One of the special features of Grid computing is the possibility of discovering remote
resources in the Grid. Globus 4 information services are collectively called the Monitoring and
Discovery System (MDS4 in GT4) and consist of a set of WSRF information
components:

Index service
Trigger service

Registering a resource in an index service

Resource properties can be implemented in different ways but the information they
contain is converted to an XML document for exchange with other components. This
document is called a resource properties document. The index service also stores the resource
property information in an XML format. A resource properties document can be queried on
the command line with the GT4 command wsrf-query.
Index services support hierarchical structures. Information from various index
services can be aggregated into a higher-level index service, as shown in the figure below.
A community index service might contain information propagated from all the index services
of the virtual organization and provide a global view of the state of the Grid. Since the
resource properties include those associated with regular Globus services such as GRAM
services, the community index service can be searched for monitoring and discovery within
the Grid environment.

Hierarchical index services
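The aggregation step in the figure, merging site-level index contents into a community index that gives a global view, can be sketched as follows. The data layout (service names mapped to resource-property dictionaries) is invented for illustration; real index services exchange XML resource properties documents.

```python
# Sketch of hierarchical index aggregation: each site-level index holds
# resource-property entries, and a community index merges them so the
# whole virtual organization can be queried for monitoring and discovery.

def aggregate(*site_indexes):
    """Merge site-level index contents into one community index."""
    community = {}
    for index in site_indexes:
        community.update(index)
    return community

# Hypothetical resource properties published by two sites' GRAM services.
site_a = {"gram.siteA": {"freeNodes": 12}}
site_b = {"gram.siteB": {"freeNodes": 3}}

community = aggregate(site_a, site_b)
# Discovery query against the global view: sites with spare capacity.
matches = [name for name, props in community.items() if props["freeNodes"] > 5]
```

This is the sense in which the community index "provides a global view": a single query spans information propagated from every lower-level index.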

Trigger Service: The trigger service responds to specific conditions occurring within the
Grid environment and first appeared in GT4. It subscribes to an index service (or another
source of WSRF information) to be notified of changes, which are compared to the provided
XPath expression. When a match occurs, prescribed actions take place such as sending an
email or updating a Web page.
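The trigger service's match-then-act behavior can be illustrated with a small sketch: evaluate an XPath expression against index information and run a prescribed action on each match. Python's ElementTree supports only a subset of XPath, which is enough here; the XML snapshot and action are invented for the example.

```python
# Sketch of the trigger-service idea: compare incoming index information
# against an XPath expression and perform an action when a match occurs
# (a real trigger might send an email or update a Web page).
import xml.etree.ElementTree as ET

index_snapshot = """
<index>
  <service name="gram.siteA" status="up"/>
  <service name="gram.siteB" status="down"/>
</index>
"""

alerts = []

def on_match(element):
    # Prescribed action: record an alert for the matched service.
    alerts.append(element.get("name"))

root = ET.fromstring(index_snapshot)
# XPath expression: services whose status attribute is "down".
for hit in root.findall(".//service[@status='down']"):
    on_match(hit)
```

In GT4 the snapshot would instead arrive via a subscription to an index service, with the trigger re-evaluating the expression on each notification.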
