
PANIMALAR INSTITUTE OF TECHNOLOGY

(JAISAKTHI EDUCATIONAL TRUST)


CHENNAI 600 123

DEPARTMENT OF CSE

CS6712 GRID AND CLOUD COMPUTING LABORATORY

IV YEAR VII SEMESTER

LAB MANUAL

Academic Year: 2016-2017 (ODD SEM)
PANIMALAR INSTITUTE OF TECHNOLOGY DEPT OF CSE

PROGRAM EDUCATIONAL OBJECTIVES OF THE DEPARTMENT

PEO-I:

To excel in the Computer Science and Engineering program through quality education, and to

pursue higher studies or succeed in the profession.

PEO-II:

To acquire knowledge in the latest technologies and innovations and an ability to

identify, analyze and solve problems in computer engineering.

PEO-III:

To become recognized professional engineers with demonstrated commitment to life-

long learning and continuous self-improvement in order to respond to the rapid pace of

change in Computer Science Engineering.

PEO-IV:

To be capable of modeling, designing, implementing and verifying a computing system

to meet specified requirements for the benefit of society.

PEO-V:

To possess critical thinking, communication skills, teamwork, leadership skills and

ethical behavior necessary to function productively and professionally.



PROGRAM OUTCOMES OF THE DEPARTMENT

Engineering graduates will be able to:

PO1. Engineering Knowledge: Apply the knowledge of mathematics, science,

engineering fundamentals, and an engineering specialization to the solution of complex

engineering problems.

PO2. Problem Analysis: Identify, formulate, review research literature, and analyze

complex engineering problems reaching substantiated conclusions using first principles

of mathematics, natural sciences, and engineering sciences.

PO3. Design/Development of Solutions: Design solutions for complex engineering

problems and design system components or processes that meet the specified needs with

appropriate consideration for the public health and safety, and the cultural, societal, and

environmental considerations.

PO4. Conduct Investigations of Complex Problems: Use research-based knowledge

and research methods including design of experiments, analysis and interpretation of

data, and synthesis of the information to provide valid conclusions.

PO5. Modern Tool Usage: Create, select, and apply appropriate techniques, resources,

and modern engineering and IT tools including prediction and modeling to complex

engineering activities with an understanding of the limitations.

PO6. The Engineer and Society: Apply reasoning informed by the contextual

knowledge to assess societal, health, safety, legal and cultural issues and the consequent

responsibilities relevant to the professional engineering practice.

PO7. Environment and Sustainability: Understand the impact of the professional

engineering solutions in societal and environmental contexts, and demonstrate the

knowledge of, and need for sustainable development.



PO8. Ethics: Apply ethical principles and commit to professional ethics and

responsibilities and norms of the engineering practice.

PO9. Individual and Team Work: Function effectively as an individual, and as a

member or leader in diverse teams, and in multidisciplinary settings.

PO10. Communication: Communicate effectively on complex engineering activities

with the engineering community and with society at large, such as, being able to

comprehend and write effective reports and design documentation, make effective

presentations, and give and receive clear instructions.

PO11. Project Management and Finance: Demonstrate knowledge and understanding

of the engineering and management principles and apply these to one's own work, as a

member and leader in a team, to manage projects and in multidisciplinary environments.

PO12. Life-Long Learning: Recognize the need for, and have the preparation and

ability to engage in independent and life-long learning in the broadest context of

technological change.

PROGRAM SPECIFIC OUTCOMES OF THE DEPARTMENT

PSO1: An ability to apply knowledge of software development concepts to select and

apply software development processes, programming paradigms, and architectural

models appropriate to different applications.

PSO2: Familiarity with various programming languages and paradigms, with practical

competence in at least three languages and two paradigms.

PSO3: An ability to demonstrate knowledge in theoretical computer science and in

related areas such as algorithm design, compiler design, artificial intelligence and

information security.


CS6712 GRID AND CLOUD COMPUTING LABORATORY

OBJECTIVES:
The student should be made to:
1. Be exposed to tool kits for grid and cloud environment.
2. Be familiar with developing web services/Applications in grid framework
3. Learn to run virtual machines of different configuration.
4. Learn to use Hadoop

LIST OF EXPERIMENTS:
GRID COMPUTING LAB
Use Globus Toolkit or equivalent and do the following:
1. Develop a new Web Service for Calculator.
2. Develop new OGSA-compliant Web Service.
3. Using Apache Axis develop a Grid Service.
4. Develop applications using Java or C/C++ Grid APIs
5. Develop secured applications using basic security mechanisms available in Globus
Toolkit.
6. Develop a Grid portal, where user can submit a job and get the result. Implement it
with and without GRAM concept.
CLOUD COMPUTING LAB
Use Eucalyptus or Open Nebula or equivalent to set up the cloud and demonstrate.
1. Find procedure to run the virtual machine of different configuration. Check how many
virtual machines can be utilized at particular time.
2. Find procedure to attach virtual block to the virtual machine and check whether it
holds the data even after the release of the virtual machine.
3. Install a C compiler in the virtual machine and execute a sample program.
4. Show the virtual machine migration based on the certain condition from one node to
the other.
5. Find procedure to install storage controller and interact with it.
6. Find procedure to set up the one node Hadoop cluster.
7. Mount the one node Hadoop cluster using FUSE.
8. Write a program to use the API's of Hadoop to interact with it.
9. Write a word count program to demonstrate the use of Map and Reduce tasks
TOTAL: 45 PERIODS
REFERENCE: spoken-tutorial.org.
At the end of the course, the student should be able to:
Use the grid and cloud tool kits.
Design and implement applications on the Grid.
Design and Implement applications on the Cloud.
LIST OF EQUIPMENT FOR A BATCH OF 30 STUDENTS:
SOFTWARE:
Globus Toolkit or equivalent
Eucalyptus or Open Nebula or equivalent
HARDWARE
Standalone desktops 30 Nos


COURSE OUTCOMES:
C408.1 An ability to learn and use modern software and tools such as Open Nebula and the Globus toolkit.
C408.2 An ability to develop web service programs for Grid computing using Globus toolkit.
C408.3 An ability to implement applications using Grid APIs and the security mechanisms available in
the Globus toolkit, and to develop a Grid portal.
C408.4 An ability to work with virtual machine environments of different configurations, install
compilers in virtual machines, and perform virtual machine migration.
C408.5 An ability to learn Hadoop clusters and to develop programs using Hadoop APIs.

COURSE OUTCOMES AND PROGRAM OUTCOMES MAPPING


CO PO MATRIX:
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
C408.1 2 - - 3 2 - - - - 1 - -
C408.2 2 - - 3 2 - - - - 1 - -
C408.3 2 - - 3 2 - - - - 1 - -
C408.4 2 - - 3 2 - - - - 1 - -
C408.5 2 - - 3 2 - - - - 1 - -
C408 AVG 2 - - 3 2 - - - - 1 - -

COURSE OUTCOMES AND PROGRAM SPECIFIC OUTCOMES MAPPING

CO PSO MATRIX:
CO PSO1 PSO2 PSO3
C408.1 2 2 -
C408.2 2 2 -
C408.3 2 2 -
C408.4 2 2 -
C408.5 2 2 -
C408 AVG 2 2 -

INTRODUCTION TO GRID COMPUTING

What Is Grid Computing?

Grid computing can mean different things to different individuals. The grand vision is often presented as an
analogy to power grids where users (or electrical appliances) get access to electricity through wall sockets
with no care or consideration for where or how the electricity is actually generated. In this view of grid
computing, computing becomes pervasive and individual users (or client applications) gain access to
computing resources (processors, storage, data,
applications, and so on) as needed with little or no knowledge of where those resources are located or what
the underlying technologies, hardware, operating system, and so on are.

Though this vision of grid computing can capture one's imagination and may indeed someday become a
reality, there are many technical, business, political, and social issues that need to be addressed. If we
consider this vision as an ultimate goal, there are many smaller steps that need to be taken to achieve it.
These smaller steps each have benefits of their own.

Therefore, grid computing can be seen as a journey along a path of integrating various technologies and
solutions that move us closer to the final goal. Its key values are in the underlying distributed computing
infrastructure technologies that are evolving in support of cross-organizational application and resource
sharing; in a word, virtualization: virtualization across technologies, platforms, and organizations. This
kind of virtualization is only achievable through the use of open standards. Open standards help ensure that
applications can transparently
take advantage of whatever appropriate resources can be made available to them. An environment that
provides the ability to share and transparently access resources across a distributed and heterogeneous
environment not only requires the technology to virtualize certain resources, but also technologies and
standards in the areas of scheduling, security, accounting, systems management, and so on.

Grid computing could be defined as any of a variety of levels of virtualization along a continuum. Exactly
where along that continuum one might say that a particular solution is an implementation of grid computing
versus a relatively simple implementation using virtual resources is a matter of opinion. But even at the
simplest levels of virtualization, one could say that grid-enabling technologies are being utilized.

Grid computing involves an evolving set of open standards for Web services and interfaces that make
services, or computing resources, available over the Internet. Very often grid technologies are used on
homogeneous clusters, and they can add value on those clusters by assisting, for example, with scheduling
or provisioning of the resources in the cluster. The term grid, and its related technologies, applies across this
entire spectrum. If we focus our attention on distributed computing solutions, then we could consider one
definition of grid computing to be distributed computing across virtualized resources. The goal is to create
the illusion of a simple yet large and powerful virtual computer out of a collection of connected (and
possibly heterogeneous) systems sharing various combinations of resources.

Benefits of grid computing


Exploiting underutilized resources
Parallel CPU capacity
Virtual resources and virtual organizations for collaboration
Access to additional resources
Resource balancing
Reliability
Management

Creating a grid environment with the Globus Toolkit 4


Globus Toolkit 4 components

The Globus Alliance is made up of organizations and individuals that develop and make available
various technologies applicable to grid computing. The Globus Toolkit, the primary delivery vehicle for
technologies developed by the Globus Alliance, is an open source software toolkit used for building grid
systems and applications. Many companies and organizations are using the Globus Toolkit as the basis
for grid implementations of various types.

Overview of Globus Toolkit 4


Globus Toolkit 4 is a collection of open-source components. Many of these are based on existing
standards, while others are based on (and in some cases driving) evolving standards. Version 4 of the
toolkit is the first version to support Web service based implementations of many of its components.
(Version 3 had included an OGSI implementation of some components, and Version 2 was not service
based at all.)

Though many components have Web service based implementations, some do not, and for compatibility
and migration reasons, some have both implementations.
Globus Toolkit 4 provides components in the following five categories:
Common runtime components
Security
Data management
Information services
Execution management

11.4 Installation
Before installing Globus Toolkit 4, we need to install and configure some tools that are essential for
the installation. Globus Toolkit 4 is then installed using those tools.

11.4.1 Installing required software for Globus Toolkit 4 installation

Software name                                    Recommended version

Java SDK (IBM / Sun / BEA)                       1.4.2 or later
Apache Ant                                       1.5.1 or later
gcc                                              3.2.1 and 2.95.x are tested (avoid 3.2)
GNU tar                                          -
GNU sed                                          -
zlib                                             1.1.4 or later
GNU Make                                         -
sudo                                             -
PostgreSQL (or other JDBC-compliant database)    7.1 or later (if using PostgreSQL)


EXP.NO.1 Develop a new Web Service for Calculator.

AIM:

To develop a Web Service for performing arithmetic operations like a calculator.

PROCEDURE:

Create a new project


Select ASP.NET Empty Web Application
Give a name to your project and click the OK button
Go to Solution Explorer and right-click your project
Select Add New Item and select the Web Service application
Give it a name and click the OK button

Output:

Go to the test page by clicking "Calculate", enter the operands, and click the "Invoke" button.

It will show the result as in the figure below (after clicking the "Invoke" button).

Similarly, you can perform different operations like addition, subtraction, multiplication, division
and modulo division. Now we consume this web service in a Windows Forms application.

Create a Windows application and add a service reference. After adding the
reference, go to the design page and arrange the UI controls as in the figure below.
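For illustration, the arithmetic operations the service exposes can be sketched as a plain class. This is a sketch only, with hypothetical names; the manual's own service is an ASP.NET Web Service, where each operation would be a WebMethod.

```java
// Hypothetical sketch of the calculator logic behind the web service.
// In the real ASP.NET service each method would carry a [WebMethod] attribute.
class Calculator {
    int add(int a, int b)      { return a + b; }
    int subtract(int a, int b) { return a - b; }
    int multiply(int a, int b) { return a * b; }
    int divide(int a, int b)   { return a / b; }   // integer division
    int mod(int a, int b)      { return a % b; }   // modulo division
}
```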


Result:
Thus the program for performing arithmetic operations using a web service was successfully executed.


EXP.NO 2 Develop new OGSA-compliant Web Service.

AIM:
To develop a new OGSA-compliant Web service as a Grid Service using a .NET language.

PROCEDURE:

The aim is to implement an OGSA-compliant GIS by exploiting .NET technologies. This involves the

implementation of Grid Services with .NET technologies, management of the dynamic nature of the
information describing the resources (e.g. CPU usage, memory usage, etc.), and extension of the MS
directory service functionalities (e.g. Active Directory) in order to implement the OGSA Index Service
functionalities.

The first issue is related to the implementation of the Grid Service Specification prescribed in a MS .NET
language. In the framework of the GRASP project, we have selected the implementation of the Grid
Service Specification provided by the Grid Computing Group of the University of Virginia, named
OGSI.NET.

To manage the dynamic nature of information describing the resources, GT3 leverages on Service Data
Providers. In the MS environment, we rely on Performance Counters and Windows Management
Instrumentation (WMI) architecture to implement the Service Data Providers. For each component of a
MS system we have a performance object (e.g. Processor Object) gathering all the performance data of
the related entity. Each performance object provides a set of Performance Counters that retrieves specific
performance data regarding the resource associated to the performance object. For example, the
%Processor Time is a Performance Counter of the Processor Object representing the percentage of
time during which the processor is executing a thread. The performance counters are based on services at
the operating system level, and they are integrated in the .NET platform. In fact, .NET Framework
provides a set of APIs that allows the management of the performance counters.

To perform the collection and provisioning of the performance data to an index service, we leverage
on Windows Management Instrumentation (WMI) architecture. WMI is a unifying architecture that
allows the access to data from a variety of underlying technologies. WMI is based on the Common
Information Model (CIM) schema, which is an industry standard specification.

Fig. 3. WMI Architecture


The CIM schema is maintained by the Distributed Management Task Force (DMTF). WMI provides a
three-tiered approach for collecting and providing management data. This approach consists of a standard
mechanism for storing data, a standard protocol for obtaining and distributing management data, and a
WMI provider. A WMI provider is a Win32 Dynamic-Link Library (DLL) that supplies instrumentation
data for parts of the CIM schema. Figure 3 shows the architecture of WMI.

When a request for management information comes from a consumer (see Figure 3) to the CIM Object
Manager (CIMOM), the latter evaluates the request, identifies which provider has the information, and
returns the data to the consumer. The consumer only requests the desired information, and never knows
the information source or any details about the way the information data are extracted from the
underlying API. The CIMOM and the CIM repository are implemented as a system service, called

WinMgmt, and are accessed through a set of Component Object Model (COM) interfaces.
WMI provides an abstraction layer that offers access to much system information about hardware
and software, and many functions to make calculations on the collected values. The combination of
performance counters and WMI realizes a Service Data Provider that each resource in a VO should
provide.

To implement an OGSA-compliant Index Service, we exploit some Active Directory (AD) features.
AD is a directory service designed for distributed networking environments providing secure, structured,
hierarchical storage of information about interesting objects, such as users, computers, and services, inside an
enterprise network. AD provides rich support for locating and working with these objects, allowing
organizations to efficiently share and manage information about network resources and users. It acts as
the central authority for network security, letting the operating system readily verify a user's identity and
control his/her access to network resources.

Our goal is to implement a Grid Service that, taking the role of a consumer (see Figure 3), queries at
regular intervals the Service Data Providers of a VO (see Figure 5) to obtain resource information,
collects and aggregates this information, and allows searches to be performed among the resources of an
organization matching specified criteria (e.g. to search for a machine with a specified number of
CPUs). In our environment this Grid Service is called the Global Information Grid Service (GIGS) (see
Figure 5).


Fig. 5. Example of GIS in MS Environment

Fig 4

The hosts that run the GIGS have to be Domain Controllers (DC). A DC is a server computer,
running a Microsoft Windows NT, Windows 2000, or Windows Server 2003 family operating system,
that manages security for a domain. The use of a DC permits us to create a global catalog of all the
objects that reside in an organization, which is the primary goal of AD services. This scenario is depicted in
Figure 4 (a), where black-coloured machines run GIGS and the other ones are Service Data Providers.
Obviously, black-coloured hosts could also be Service Data Providers.

To prevent the catalog from growing too big and becoming slow and clumsy, AD is partitioned into
units, the triangles in Figure 4 (a). For each unit there is at least one domain controller. The AD partitioning
scheme emulates the Windows 2000 domain hierarchy (see Figure 4 (b)). Consequently, the unit of
partition for AD services is the domain. GIGS has to implement an interface in order to obtain, using a
publish/subscribe method, a set of data from Service Data Providers describing an Active Directory object.
Such data are then recorded in the AD by using the Active Directory Service Interface (ADSI), a COM-based
interface to perform common tasks, such as adding new objects.

After having stored those data in AD, the GIGS should be able to query AD for retrieving such data.
This is obtained exploiting the Directory Services.
Result:
Thus the program for developing an OGSA-compliant web service was successfully executed.
EXP.NO.3 Using Apache Axis develop a Grid Service.

AIM:
To develop a Grid service using Apache Axis.

Procedure:
1. Creating the New Level in the Package.
2. Edit the Configuration Files.
3. Modify the Service Code.
4. Modify the Client.
5. Compile and Deploy.
6. Starting the Container.
7. Compile the Client.
8. Run the Client.
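The service the client invokes in this experiment is stateful. Stripped of all Axis and grid plumbing, its core logic can be sketched as a plain Java class; the class and method names here are assumptions modeled on the standard GT4 MathService tutorial, not the manual's actual code.

```java
// Minimal sketch of a stateful math service (Axis/grid plumbing omitted).
// A real grid service would expose these operations through a WSDL portType.
class MathService {
    private double value = 0;                    // state held by the service

    void add(double a)      { value += a; }
    void subtract(double a) { value -= a; }
    void multiply(double a) { value *= a; }
    void divide(double a)   { value /= a; }
    double getValue()       { return value; }
}
```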

OUTPUT:
Addition was successful
Subtraction was successful
Multiplication was successful
Division was successful
Current value: 20.

RESULT:

Thus the program for Grid Service using Apache Axis was successfully executed.

EXP.NO.4 Develop applications using Java or C/C++ Grid APIs

AIM:

To develop an application in Java using Grid APIs.

Procedure:
1. Import all the necessary java packages and name the file as GridLayoutDemo.java
2. Set up components to preferred size.
3. Add buttons to experiment with Grid Layout.
4. Add controls to set up horizontal and vertical gaps.
5. Process the Apply gaps button press.
6. Create the GUI.
7. Create and set up the window, Set up the content pane and Display the window.
8. Schedule a job for the event dispatch thread.
9. Show the application's GUI.
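The layout these steps build can be sketched with Swing's GridLayout. This is a sketch with hypothetical names; frame creation and the gap controls are omitted so that only the grid itself is shown (and the sketch also runs headless).

```java
import java.awt.GridLayout;
import javax.swing.JButton;
import javax.swing.JPanel;

// Sketch of the GridLayoutDemo layout: a 2x3 grid of buttons with
// 5-pixel horizontal and vertical gaps between the cells.
class GridLayoutSketch {
    static JPanel buildGrid() {
        JPanel panel = new JPanel(new GridLayout(2, 3, 5, 5));
        for (int i = 1; i <= 6; i++) {
            panel.add(new JButton("Button " + i));  // cells fill left to right
        }
        return panel;
    }
}
```

In the full demo this panel becomes the frame's content pane, and the "Apply gaps" handler calls setHgap/setVgap on the layout and re-lays out the container.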

OUTPUT:

Figure 1: Horizontal, Left-to-Right Figure 2: Horizontal, Right-to-Left

Result:

Thus the program to develop an application in java using Grid APIs was successfully executed.


EXP.NO 5 Develop secured applications using basic security mechanisms
available in Globus Toolkit.

AIM:

To develop secured applications using the basic security mechanisms available in the Globus toolkit.

Procedure:

Basic security mechanisms available in Globus Toolkit.

1. Public Key Cryptography

The most important thing to know about public key cryptography is that, unlike earlier cryptographic
systems, it relies not on a single key (a password or a secret "code"), but on two keys. These keys are
numbers that are mathematically related in such a way that if either key is used to encrypt a message, the
other key must be used to decrypt it. Also important is the fact that it is next to impossible (with our
current knowledge of mathematics and available computing power) to obtain the second key from the
first one and/or any messages encoded with the first key.

By making one of the keys available publicly (a public key) and keeping the other key private (a private
key), a person can prove that he or she holds the private key simply by encrypting a message. If the
message can be decrypted using the public key, the person must have used the private key to encrypt the
message.

Important: It is critical that private keys be kept private! Anyone who knows the private key can easily
impersonate the owner.

2. Digital Signatures

Using public key cryptography, it is possible to digitally "sign" a piece of information. Signing
information essentially means assuring a recipient of the information that the information hasn't been
tampered with since it left your hands.

To sign a piece of information, first compute a mathematical hash of the information. (A hash is a
condensed version of the information. The algorithm used to compute this hash must be known to the
recipient of the information, but it isn't a secret.) Using your private key, encrypt the hash, and attach it
to the message. Make sure that the recipient has your public key.

To verify that your signed message is authentic, the recipient of the message will compute the hash of
the message using the same hashing algorithm you used, and will then decrypt the encrypted hash that
you attached to the message. If the newly-computed hash and the decrypted hash match, then it proves
that you signed the message and that the message has not been changed since you signed it.
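The sign-and-verify steps above can be sketched with the standard java.security API; the SHA256withRSA algorithm combines the hashing and the private-key encryption of the hash into one operation. This is a sketch, not the Globus implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Sketch of signing a message and verifying the signature, assuming RSA keys.
class SignatureDemo {
    static boolean signAndVerify(byte[] message) throws Exception {
        KeyPair pair = KeyPairGenerator.getInstance("RSA").generateKeyPair();

        // Sender: hash the message and encrypt the hash with the private key.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(pair.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();

        // Recipient: recompute the hash and check it against the signature
        // using the sender's public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(pair.getPublic());
        verifier.update(message);
        return verifier.verify(signature);
    }
}
```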

3. Certificates

A central concept in GSI authentication is the certificate. Every user and service on the Grid is identified
via a certificate, which contains information vital to identifying and authenticating the user or service.

A GSI certificate includes four primary pieces of information:

A subject name, which identifies the person or object that the certificate represents.
The public key belonging to the subject.
The identity of a Certificate Authority (CA) that has signed the certificate to certify that the public
key and the identity both belong to the subject.
The digital signature of the named CA.

Note that a third party (a CA) is used to certify the link between the public key and the subject in the
certificate. In order to trust the certificate and its contents, the CA's certificate must be trusted. The link
between the CA and its certificate must be established via some non-cryptographic means, or else the
system is not trustworthy.

GSI certificates are encoded in the X.509 certificate format, a standard data format for certificates
established by the Internet Engineering Task Force (IETF). These certificates can be shared with other
public key-based software, including commercial web browsers from Microsoft and Netscape.

4. Mutual Authentication

If two parties have certificates, and if both parties trust the CAs that signed each other's certificates, then
the two parties can prove to each other that they are who they say they are. This is known as mutual

authentication. GSI uses the Secure Sockets Layer (SSL) for its mutual authentication protocol, which is
described below. (SSL is also known by a new, IETF standard name: Transport Layer Security, or TLS.)

Before mutual authentication can occur, the parties involved must first trust the CAs that signed each
other's certificates. In practice, this means that they must have copies of the CAs' certificates (which
contain the CAs' public keys) and that they must trust that these certificates really belong to the CAs.

To mutually authenticate, the first person (A) establishes a connection to the second person (B).

To start the authentication process, A gives B his certificate.

The certificate tells B who A is claiming to be (the identity), what A's public key is, and what CA is being
used to certify the certificate.

B will first make sure that the certificate is valid by checking the CA's digital signature to make sure that
the CA actually signed the certificate and that the certificate hasn't been tampered with. (This is where B
must trust the CA that signed A's certificate.)

Once B has checked out A's certificate, B must make sure that A really is the person identified in the
certificate.

B generates a random message and sends it to A, asking A to encrypt it.

A encrypts the message using his private key, and sends it back to B.

B decrypts the message using A's public key.

If this results in the original random message, then B knows that A is who he says he is.

Now that B trusts A's identity, the same operation must happen in reverse.

B sends A her certificate, A validates the certificate and sends a challenge message to be encrypted.

B encrypts the message and sends it back to A, and A decrypts it and compares it with the original.

If it matches, then A knows that B is who she says she is.

At this point, A and B have established a connection to each other and are certain that they know each
others' identities.
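One leg of this exchange can be simulated with an RSA key pair: B issues a random challenge, A encrypts it with the private key, and B decrypts it with the public key taken from A's certificate. This is a simulation only; certificate handling is omitted.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;

// Simulation of one leg of mutual authentication: B challenges A, and A
// proves possession of the private key matching A's certificate.
class ChallengeResponse {
    static boolean authenticate() throws Exception {
        KeyPair aKeys = KeyPairGenerator.getInstance("RSA").generateKeyPair();

        // B generates a random challenge and sends it to A.
        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);

        // A encrypts the challenge with A's private key.
        Cipher encrypt = Cipher.getInstance("RSA");
        encrypt.init(Cipher.ENCRYPT_MODE, aKeys.getPrivate());
        byte[] response = encrypt.doFinal(challenge);

        // B decrypts with A's public key (taken from A's certificate) and
        // compares the result with the original challenge.
        Cipher decrypt = Cipher.getInstance("RSA");
        decrypt.init(Cipher.DECRYPT_MODE, aKeys.getPublic());
        return Arrays.equals(decrypt.doFinal(response), challenge);
    }
}
```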

5. Confidential Communication

By default, GSI does not establish confidential (encrypted) communication between parties. Once mutual
authentication is performed, GSI gets out of the way so that communication can occur without the
overhead of constant encryption and decryption.

GSI can easily be used to establish a shared key for encryption if confidential communication is desired.
Recently relaxed United States export laws now allow us to include encrypted communication as a
standard optional feature of GSI.

A related security feature is communication integrity. Integrity means that an eavesdropper may be able
to read communication between two parties but is not able to modify the communication in any way.
GSI provides communication integrity by default. (It can be turned off if desired). Communication
integrity introduces some overhead in communication, but not as large an overhead as encryption.

6. Securing Private Keys

The core GSI software provided by the Globus Toolkit expects the user's private key to be stored in a file
in the local computer's storage. To prevent other users of the computer from stealing the private key, the
file that contains the key is encrypted via a password (also known as a passphrase). To use GSI, the user
must enter the passphrase required to decrypt the file containing their private key.

We have also prototyped the use of cryptographic smartcards in conjunction with GSI. This allows users
to store their private key on a smartcard rather than in a file system, making it still more difficult for
others to gain access to the key.

7. Delegation, Single Sign-On and Proxy Certificates

GSI provides a delegation capability: an extension of the standard SSL protocol which reduces the
number of times the user must enter his passphrase. If a Grid computation requires that several Grid
resources be used (each requiring mutual authentication), or if there is a need to have agents (local or
remote) requesting services on behalf of a user, the need to re-enter the user's passphrase can be avoided
by creating a proxy.

A proxy consists of a new certificate and a private key. The key pair that is used for the proxy, i.e. the
public key embedded in the certificate and the private key, may either be regenerated for each proxy or
obtained by other means. The new certificate contains the owner's identity, modified slightly to indicate
that it is a proxy. The new certificate is signed by the owner, rather than a CA. (See diagram below.) The
certificate also includes a time notation after which the proxy should no longer be accepted by others.
Proxies have limited lifetimes.

Figure 1.1. The new certificate is signed by the owner, rather than a CA.

The proxy's private key must be kept secure, but because the proxy isn't valid for very long, it doesn't
have to be kept quite as secure as the owner's private key. It is thus possible to store the proxy's private key
in a local storage system without being encrypted, as long as the permissions on the file prevent anyone
else from looking at it easily. Once a proxy is created and stored, the user can use the proxy
certificate and private key for mutual authentication without entering a password.

When proxies are used, the mutual authentication process differs slightly. The remote party receives not
only the proxy's certificate (signed by the owner), but also the owner's certificate. During mutual
authentication, the owner's public key (obtained from her certificate) is used to validate the signature on
the proxy certificate. The CA's public key is then used to validate the signature on the owner's
certificate. This establishes a chain of trust from the CA to the proxy through the owner.
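The order of validation described above can be made concrete with a small, self-contained sketch. This is a toy model only: real GSI uses X.509 certificates and RSA signatures, whereas here a "signature" is simulated with a SHA-256 digest keyed by the signer's secret string, purely to illustrate the chain walk (the owner's key checks the proxy, the CA's key checks the owner). All names and keys below are made up.

```python
import hashlib

def toy_sign(signer_key, payload):
    """Toy signature: digest of key + payload (NOT real public-key crypto)."""
    return hashlib.sha256((signer_key + payload).encode()).hexdigest()

def make_cert(subject, issuer, issuer_key):
    """A 'certificate' binding a subject name, signed by its issuer."""
    payload = subject + "|" + issuer
    return {"subject": subject, "issuer": issuer,
            "payload": payload,
            "signature": toy_sign(issuer_key, payload)}

def verify(cert, signer_key):
    """Check that the certificate was signed with the given key."""
    return cert["signature"] == toy_sign(signer_key, cert["payload"])

# The CA signs the owner's certificate; the owner signs the proxy certificate.
ca_key = "ca-secret"
owner_key = "owner-secret"
owner_cert = make_cert("/O=Grid/CN=Alice", "/CN=Example CA", ca_key)
proxy_cert = make_cert("/O=Grid/CN=Alice/CN=proxy", "/O=Grid/CN=Alice", owner_key)

# Chain of trust: owner's key validates the proxy, CA's key validates the owner.
chain_ok = verify(proxy_cert, owner_key) and verify(owner_cert, ca_key)
print(chain_ok)  # True
```

Note that verifying the proxy against the CA's key fails, which is the point: trust in the proxy flows only through the owner.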

Result:

Thus the program to develop a security application available in Globus toolkit was successfully
executed.

EXP.NO 6 Develop a Grid portal, where user can submit a job and get the
result. Implement it with and without GRAM concept.
AIM:
To develop a Grid portal and implement it with and without GRAM concept.

Procedure:

1. Start ---> Control Panel ---> System ---> Advanced features ---> Environment variables

2. The few basic steps to acquire an account, register your resource, and then submit a job for execution
using the TIET Grid Portal are as follows:
Load the TIET Grid Portal.
To register as a user with the Grid and obtain an account click on UserRegistration.
To access the Grid resources and enter the Grid, the user has to enter a Grid Identification
Name (GIN), or user id, and a Grid Identification Password (GIP), or password.
If the user is already registered with the Grid, he can log in directly with his unique Grid
identification id and password; otherwise the user has to register with the Grid and will be assigned
a unique Grid identification id and password.
In the next step, you have to fill in your resource details in the provided form.
If the user wants to execute a job, he opens a portal that takes the job requirements
from the user.
From the Grid we can get the information about the current registered resources so that we can
match our job resource requirement with the available resources.
After submitting the job we can easily watch the current status of the job through the job status
portal after giving the unique job id that is provided to each job when the user submits the job.
At the end we can easily get the result of the job.
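In a typical Globus setup the portal dispatches the submitted job to a GRAM gatekeeper on the chosen resource. As a hedged sketch (the hostname is a placeholder), the same submission can be exercised from the command line with the pre-WS GRAM client, while "without GRAM" the executable is simply run on the resource directly, for example over ssh:

```
# With GRAM: run /bin/hostname through the gatekeeper on the resource
globus-job-run grid-node.example.org /bin/hostname

# Without GRAM: run the same command on the resource directly
ssh user@grid-node.example.org /bin/hostname
```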

Output:

RESULT

Thus the program to develop Grid Portal was successfully executed.


INTRODUCTION TO CLOUD COMPUTING

What is cloud computing?


Cloud computing means that instead of all the computer hardware and software you're using sitting on
your desktop, or somewhere inside your company's network, it's provided for you as a service by
another company and accessed over the Internet, usually in a completely seamless way. Exactly where
the hardware and software is located and how it all works doesn't matter to you, the user; it's just
somewhere up in the nebulous "cloud" that the Internet represents.

Cloud computing is a buzzword that means different things to different people. For some, it's just
another way of describing IT (information technology) "outsourcing"; others use it to mean any
computing service provided over the Internet or a similar network; and some define it as any bought-in
computer service you use that sits outside your firewall.

Types of cloud computing


IT people talk about three different kinds of cloud computing, where different services are being
provided for you. Note that there's a certain amount of vagueness about how these things are defined and
some overlap between them.

Infrastructure as a Service (IaaS) means you're buying access to raw computing hardware over the
Net, such as servers or storage. Since you buy what you need and pay-as-you-go, this is often
referred to as utility computing. Ordinary web hosting is a simple example of IaaS: you pay a
monthly subscription or a per-megabyte/gigabyte fee to have a hosting company serve up files for
your website from their servers.
Software as a Service (SaaS) means you use a complete application running on someone else's
system. Web-based email and Google Documents are perhaps the best-known examples. Zoho is
another well-known SaaS provider offering a variety of office applications online.
Platform as a Service (PaaS) means you develop applications using Web-based tools so they run on
systems software and hardware provided by another company. So, for example, you might develop
your own ecommerce website but have the whole thing, including the shopping cart, checkout, and
payment mechanism running on a merchant's server. App Cloud (from salesforce.com) and the
Google App Engine are examples of PaaS.

Advantages and disadvantages of cloud computing


Advantages

The pros of cloud computing are obvious and compelling. If your business is selling books or repairing
shoes, why get involved in the nitty gritty of buying and maintaining a complex computer system? If
you run an insurance office, do you really want your sales agents wasting time running anti-virus
software, upgrading word-processors, or worrying about hard-drive crashes? Do you really want them
cluttering your expensive computers with their personal emails, illegally shared MP3 files, and naughty
YouTube videos, when you could leave that responsibility to someone else? Cloud computing allows
you to buy in only the services you want, when you want them, cutting the upfront capital costs of
computers and peripherals. You avoid equipment going out of date and other familiar IT problems like
ensuring system security and reliability. You can add extra services (or take them away) at a moment's
notice as your business needs change. It's really quick and easy to add new applications or services to
your business without waiting weeks or months for the new computer (and its software) to arrive.

Drawbacks

Instant convenience comes at a price. Instead of purchasing computers and software, cloud computing
means you buy services, so one-off, upfront capital costs become ongoing operating costs instead. That
might work out much more expensive in the long-term.

If you're using software as a service (for example, writing a report using an online word processor or
sending emails through webmail), you need a reliable, high-speed, broadband Internet connection
functioning the whole time you're working. That's something we take for granted in countries such as the
United States, but it's much more of an issue in developing countries or rural areas where broadband is
unavailable.

An Introduction to Cloud Computing with OpenNebula


An OpenNebula Private Cloud provides infrastructure users with an elastic platform for fast delivery and
scalability of services to meet dynamic demands of service end-users. Services are hosted in VMs, and
then submitted, monitored and controlled in the Cloud by using Sunstone or any of the OpenNebula
interfaces:
Command Line Interface (CLI)
XML-RPC API
OpenNebula Ruby and Java Cloud APIs

The aim of a Private Cloud is not to expose to the world a cloud interface to sell capacity over the
Internet, but to provide local cloud users and administrators with a flexible and agile private
infrastructure to run virtualized service workloads within the administrative domain. OpenNebula
virtual infrastructure interfaces expose user and administrator functionality for virtualization,
networking, image and physical resource configuration, management, monitoring and accounting.

EXP.NO.1 Find procedure to run the virtual machine of different configuration.
Check how many virtual machines can be utilized at particular time.

AIM:
To write a procedure to run virtual machines of different configurations and to check how
many virtual machines can be utilized at a particular time.

Procedure:
1. Setting up a Private Cloud in Open Nebula.
2. Front-end installation.
3. Add the cluster nodes to the system.
4. Configure ssh access.
5. Install the nodes.
6. Setting authorization.
7. prepare a virtual network for our VM.
8. Create a text file vmnet.template containing the following to create a virtual network with just
one IP.
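The template itself is not reproduced in this manual; a minimal example for step 8 (the bridge name and IP address are assumptions for a typical setup) might look like:

```
NAME   = "VM LAN"
TYPE   = FIXED
BRIDGE = br0
LEASES = [ IP="192.168.0.5" ]
```

The network would then be registered with `onevnet create vmnet.template`, and a VM template can reference it by name.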

Output:
ID NAME STAT CPU MEM HOSTNAME TIME
0 one-0 runn 0 65536 aquila01 00 0:00:02

Result:
Thus the program to run virtual machines of different configurations was successfully
executed and their utilization at a particular time was checked.

EXP.NO.2 Find procedure to attach virtual block to the virtual machine and
check whether it holds the data even after the release of the virtual machine.

AIM:
To write a procedure to attach virtual block to the virtual machine and to check whether it
holds the data even after the release of the virtual machine.

Procedure:

1. Create and List Existing VMs


2. Pause VM Instances.
3. Reset VM Instances.
4. Delay VM Instances.
5. To attach to a running VM the Image named storage:

$ onevm disk-attach one-5 --image storage

6. To detach a disk from a running VM:

onevm disk-detach vm_id disk_id
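A complete persistence check can be sketched as the following transcript (the image name, size and VM ids are examples; the Image must be created as persistent for its data to survive the VM's release):

```
$ oneimage create --name storage --type DATABLOCK --size 1024 --persistent -d default
$ onevm disk-attach one-5 --image storage    # attach, then write some data inside the VM
$ onevm terminate one-5                      # release the VM
$ onevm disk-attach one-6 --image storage    # the data written earlier is still present
```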


OUTPUT

RESULT:

Thus the program to attach virtual block to virtual machine was successfully executed & checked
whether it holds the data after the release of the virtual machine.


EXP.NO.3 Install a C compiler in the virtual machine and execute a sample program.

AIM:
To install a C compiler in the virtual machine and to execute a sample program.

Procedure:
To install this tool:

Step 1: Install C/C++ compiler and related tools

If you are using Fedora, Red Hat, CentOS, or Scientific Linux, use the following yum command to install
GNU c/c++ compiler:
# yum groupinstall 'Development Tools'

If you are using Debian or Ubuntu Linux, type the following apt-get command to install GNU c/c++
compiler:
$ sudo apt-get update
$ sudo apt-get install build-essential manpages-dev

Step 2: Verify installation

Type the following command to display the version number and location of the compiler on Linux:
$ whereis gcc
$ which gcc
$ gcc --version


Sample outputs:

Fig. GNU C/C++ compilers on Linux

How to Compile and Run C/C++ program on Linux

Create a file called demo.c using a text editor such as vi, emacs or joe:

#include<stdio.h>
/* demo.c: My first C program on a Linux */
int main(void)
{
printf("Hello! This is a test program.\n");
return 0;
}
How do I compile the program on Linux?
Use any one of the following syntax to compile the program called demo.c:
cc program-source-code.c -o executable-file-name
OR
gcc program-source-code.c -o executable-file-name
OR
## assuming that executable-file-name.c exists ##
make executable-file-name

In this example, compile demo.c, enter:


cc demo.c -o demo
OR
## assuming demo.c exists in the current directory ##
make demo

If there is no error in your code or C program, then the compiler will successfully create an executable file
called demo in the current directory; otherwise you need to fix the code. To verify this, type:
$ ls -l demo*

How do I run or execute the program called demo on Linux?

Simply type the program name:


$ ./demo
OR
$ /path/to/demo
Sample session:


Animated gif 01: Compile and run C and C++ program demo

Compiling and running a simple C++ program

Create a program called demo2.C as follows:

#include <iostream>
// demo2.C - Sample C++ program
int main(void)
{
std::cout << "Hello! This is a C++ program.\n";
return 0;
}

To compile this program, enter:

g++ demo2.C -o demo2

## or use the following syntax ##
make demo2

To run this program, type:

./demo2

RESULT:
Thus the program to install a C complier is done and the sample program was executed
successfully.


EXP.NO.4 Show the virtual machine migration based on the certain condition from
one node to the other.

AIM:
To implement migration of virtual machine based on the certain conditions from one node to the
other.

Procedure:
1. migrate ---- the VM is migrating from one resource to another. This can be a live migration or a
cold migration (the VM is saved and the VM files are transferred to the new resource).

vMotion Performance in an Email/Messaging Environment

Email continues to be the key communication tool among organizations. Accordingly, IT departments regard
email systems as mission-critical applications. Microsoft Exchange Server is a widely used email platform in
business worldwide. Therefore, Microsoft Exchange Server 2010 was chosen as the email server to use to
study the impact of vMotion.
Test Methodology
Load-Generation Software
The Microsoft Exchange Load Generator 2010 tool (LoadGen), the official Exchange Server performance
assessment tool from Microsoft, was used to simulate the email users. LoadGen simulates a number of MAPI
(Mail Application Program Interface) clients accessing their email on Exchange Servers. Included with
LoadGen are profiles for light, medium and heavy workloads. In all of the tests, Outlook 2007 online clients
using a very heavy user profile workload (150 messages sent/received per day per user) were used for load
generation. Each mailbox was initialized with 100MB of user data.
Tests were configured on the commonly used Exchange Server deployment scenarios.
Exchange Server Configuration
The Exchange Server test environment consisted of two mailbox server role virtual machines and two client
access and hub transport combined-role virtual machines to support 8,000 very heavy users. These two types
of virtual machines were configured as follows:
The mailbox server role virtual machine was configured with four vCPUs and 28GB of memory to support
4,000 users. The mailbox server role had higher resource (CPU, memory and storage I/O) requirements.
Therefore, a mailbox server role virtual machine was used as a candidate for vMotion testing.
The client access and hub transport combined-role virtual machine was configured with four vCPUs and
8GB of memory.
The following test scenarios for vMotion tests were used:
Test scenario 1 (one virtual machine): Perform vMotion on a single mailbox server role virtual
machine (running a load of 4,000 very heavy users).
Test scenario 2 (two virtual machines): Perform vMotion on two mailbox server role virtual machines
simultaneously (running a combined load of 8,000 very heavy users).
In this study, the focus was on the duration of vMotion, and the impact on application performance when an
Exchange Server virtual machine was subjected to vMotion. To measure the application performance, the
following metrics were used:
Task Queue Length
The LoadGen task queue length is used as a popular metric to study the user experience and SLA trending in
Exchange Server benchmarking environments. The number of the tasks in the queue will increase if Exchange
Server fails to process the dispatched tasks expeditiously. So the rise in the task queue length directly
reflects a decline in the client experience.
Number of Task Exceptions
The LoadGen performance counter presents the number of task executions that resulted in a fatal exception,
typically due to lack of response from Exchange Servers.

Test Result
In the single mailbox server virtual machine test scenario, machine memory consumed and in use by the guest
was 28GB of memory when the migration of the mailbox server virtual machine was initiated. The vMotion
duration dropped from 71 seconds on vSphere 4.1 to 47 seconds on vSphere 5, a 33% reduction. In the two
mailbox server virtual machines scenario, the total machine memory consumed and in use by both mailbox
server virtual machines was 56GB, when vMotion was initiated. Once again, the vSphere 5 results were quite
impressive. The total duration dropped by about 49 seconds when using vSphere 5, a 34% reduction.
The following table compares the impact on the guest during vMotion on both vSphere 4.1 and vSphere 5
during the onevirtual machine test scenario

SCENARIO TASK QUEUE LENGTH (MAXIMUM) NUMBER OF TASK EXCEPTIONS


vSphere 4.1 294 0
vSphere 5 219 0


The table shows that the maximum size of the task queue length observed during vMotion on vSphere 5 was
219, much smaller than the 294 observed on vSphere 4.1. This confirms that Exchange Server users got a
better response time during the migration period in the vSphere 5 environment. There were no reported task
exceptions during migrations. This means that no Exchange Server task was dropped in either the vSphere 5
or the vSphere 4.1 environment.
Results from these tests clearly indicate that the impact of vMotion is minimal on even the largest memory-
intensive email applications.

RESULT:
Thus the program to implement migration of virtual machine was executed
successfully.


EXP.NO.5 Procedure to install storage controller and interact with it.

AIM:
To write a procedure to install storage controller and interact with it.

Procedure:

Storage Configuration

OpenNebula uses Datastores to manage VM disk Images. There are two configuration steps needed
to perform a basic set up:

First, you need to configure the system datastore to hold images for the running VMs; check
the System Datastore Guide for more details.
Then you have to set up one or more datastores for the disk images of the VMs; you can find
more information on setting up Filesystem Datastores here.

The suggested configuration is to use a shared FS, which enables most of OpenNebula VM
controlling features. OpenNebula can work without a Shared FS, but this will force the
deployment to always clone the images and you will only be able to do cold migrations.

The simplest way to achieve a shared FS backend for OpenNebula datastores is to export via NFS
from the OpenNebula front-end both the system (/var/lib/one/datastores/0) and the images
(/var/lib/one/datastores/1) datastores. They need to be mounted by all the virtualization nodes to be
added into the OpenNebula cloud.
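As an example, assuming the front-end exports the two datastores to nodes on a 192.168.0.0/24 private network (the subnet and export options here are illustrative, not mandatory), the /etc/exports entries on the front-end could be:

```
/var/lib/one/datastores/0  192.168.0.0/24(rw,sync,no_subtree_check,no_root_squash)
/var/lib/one/datastores/1  192.168.0.0/24(rw,sync,no_subtree_check,no_root_squash)
```

Each virtualization node then NFS-mounts these paths at the same location before being added to the cloud.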

Setting up the internal storage for host operating system installation

1.Enter the RAID controller BIOS by pressing Ctrl+R at the relevant prompt during boot.

2.Highlight Controller 0, and press F2.

3.Select Create New VD.


4.Select the first two drives, select RAID level 1, tab to the OK button, and press Enter. Accept the
warning regarding initialization.

5.Select the new virtual drive, press F2, and select Initialization ---> Start Init.

6.Wait for the initialization operation to complete.

7.Repeat steps 2 through 6 for the remaining internal volume, selecting drives three and four.

8.Press Escape, and choose Save and Exit to return to the boot sequence.

9.Repeat steps 1 through 8 on each server.

Setting up the external storage

1.Using the command-line console, via serial cable, reset the first Dell EqualLogic PS5000XV by
using the reset command.

2.Supply a group name, group IP address, and IP address for eth0 on the first of three arrays.

3.Reset the remaining two arrays in the same manner, supply the group name to join and IP address
created in Step 2, and supply an IP address in the same subnet for eth0 on each remaining tray.

4.After group creation, using a computer connected to the same subnet as the storage, use the Dell
EqualLogic Web interface to do the following:

a.Assign IP addresses on the remaining NICs (eth1 and eth2) on each array. Enable the NICs.

b.Verify matching firmware levels on each array and MTU size of 9,000 on each NIC on each array.

c.To create two storage pools, right-click Storage pools and choose Create storage pool. Designate
one storage pool for VM OS. Designate the other storage pool for VM SQL Server data, SQL Server
transaction logs, and the utility virtual disk.

d.Click each member (array), and choose Yes when prompted to configure the member. Choose
RAID 10 for each array.

e.Assign the two arrays containing 15K drives to the SQL Server data storage pool. Assign the one
array containing 10K drives to the VM OS storage pool.

f.Create eight 750GB volumes in the database storage pool: four for VMware vSphere 5 usage and
four for Microsoft Hyper-V R2 SP1 usage.

g.Create eight 460GB volumes in the OS storage pool: four for VMware vSphere 5 usage and four
for Microsoft Hyper-V R2 SP1 usage.

h.Enable shared access to the iSCSI target from multiple initiators on the volume.

i.Create an access control record for the volume without specifying any limitations.

j.During testing, offline the volumes not in use by the current hypervisor.

RESULT:
Thus the program to install storage controller was executed successfully.


EXP.NO.6 PROCEDURE TO SET UP ONE HADOOP CLUSTER

AIM:

To write a procedure to set up one hadoop cluster.

Procedure:
1. Main Installation
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

2. Starting your single-node cluster


hduser@ubuntu:~$ sudo chmod -R 777 /usr/local/hadoop

3. Run the command

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

4. Stopping your single-node cluster


hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
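The single-node setup above assumes HDFS has been pointed at localhost in conf/core-site.xml; a minimal configuration might look like the following (port 54310 is a common choice in single-node tutorials, not a requirement):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>Default file system URI for the single-node cluster</description>
  </property>
</configuration>
```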

Result:
Thus the procedure to set up one hadoop cluster was executed successfully.


EXP.NO.7 MOUNT THE ONE NODE HADOOP CLUSTER USING FUSE.

AIM:

To mount the one node hadoop cluster using FUSE.

PROCEDURE:

To install fuse-dfs on Ubuntu 12.04 and higher:

$ wget http://archive.cloudera.com/one-click-install/maverick/cdh3-repository_1.0_all.deb
$ sudo dpkg -i cdh3-repository_1.0_all.deb
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-fuse

Once fuse-dfs is installed, go ahead and mount HDFS using FUSE as follows.

$ sudo hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>

Once HDFS has been mounted at <mount_point>, you can use most of the traditional filesystem
operations (e.g., cp, rm, cat, mv, mkdir, rmdir, more, scp). However, random write operations such
as rsync, and permission-related operations such as chmod and chown, are not supported in
FUSE-mounted HDFS.
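Once mounted, the cluster's file system behaves like a local directory tree. For example (the mount point and NameNode port below are placeholders, not fixed values):

```
$ sudo mkdir -p /mnt/hdfs
$ sudo hadoop-fuse-dfs dfs://localhost:8020 /mnt/hdfs
$ ls /mnt/hdfs                               # browse HDFS like a local directory
$ cp /mnt/hdfs/user/hduser/test.txt /tmp/    # copy a file out of HDFS
```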

Result:
Thus the procedure to mount the one node hadoop cluster using FUSE was executed
successfully.


EXP.NO.8 Write a Program to use the APIs of Hadoop to Interact with it

AIM:
To write a program using APIs of Hadoop to interact with it.

Procedure:
1. Once you have downloaded a test dataset, we can write an application to read a file from the local
file system and write the contents to Hadoop Distributed File System.
2. Export the Jar file and run the code from terminal to write a sample file to HDFS.
3. Verify whether the file is written into HDFS and check the contents of the file.
4. Next, we write an application to read the file we just created in Hadoop Distributed File System
and write its contents back to the local file system.
5. Export the Jar file and run the code from terminal to write a sample file to HDFS.
6. Verify whether the file is written back into local file system.
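The verification in steps 3 and 6 can also be done from the shell with the standard HDFS commands (the paths are examples):

```
$ hadoop fs -put input.txt /user/hduser/input.txt   # local file system -> HDFS
$ hadoop fs -cat /user/hduser/input.txt             # inspect the contents in HDFS
$ hadoop fs -get /user/hduser/input.txt copy.txt    # HDFS -> local file system
```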

Result:
Thus the program using hadoop APIs was executed successfully.


EXP.NO.9 Word Count Program to demonstrate the use of Map and Reduce
Tasks

AIM:
To write a word count program to demonstrate the use of map and reduce tasks.

PROCEDURE:

A sample Map/Reduce job

Let's run a simple Map/Reduce job written in R and C++ (just for fun, we assume that all the nodes
run the same operating system and use the same CPU architecture).

$ su
# yum install readline-devel
# cd
# wget http://cran.rstudio.com/src/base/R-3.1.2.tar.gz
# tar -zxf R-3.1.2.tar.gz
# cd R-3.1.2
# ./configure --with-x=no --with-recommended-packages=no
# make
# make install
#R
R> install.packages('stringi')
R> q()

2. Edit yarn-site.xml (on all nodes)

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

3. Create script wc_mapper.R:

#!/usr/bin/env Rscript
library('stringi')
stdin <- file('stdin', open='r')

while (length(x <- readLines(con=stdin, n=1024L)) > 0) {
  x <- unlist(stri_extract_all_words(x))
  xt <- table(x)
  words <- names(xt)
  counts <- as.integer(xt)
  cat(stri_paste(words, counts, sep='\t'), sep='\n')
}

4. Create a source file wc_reducer.cpp:


5. Let's submit a map/reduce job via the Hadoop Streaming API

$ chmod 755 wc_mapper.R


$ hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
    -input /input/test.txt \
    -output /output \
    -mapper wc_mapper.R \
    -reducer wc_reducer \
    -file wc_mapper.R \
    -file wc_reducer
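The wc_reducer.cpp source is not reproduced in this manual. As a rough sketch of what the streaming pair does (Python here purely for illustration), the mapper tokenizes each input line and emits word/count pairs, and the reducer sums the counts for each word after the framework has sorted the pairs by key:

```python
from collections import Counter
from itertools import groupby

def map_lines(lines):
    """Mapper: tokenize each line and emit (word, count) pairs, sorted by word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return sorted(counts.items())

def reduce_pairs(pairs):
    """Reducer: sum the counts for each word; input must be sorted by word,
    as Hadoop guarantees between the map and reduce phases."""
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, key=lambda kv: kv[0])]

# Example: two input lines, as the streaming framework would feed a mapper
pairs = map_lines(["to be or not to be", "be"])
print(reduce_pairs(pairs))  # [('be', 3), ('not', 1), ('or', 1), ('to', 2)]
```

In the real job the mapper and reducer communicate through tab-separated lines on stdin/stdout, which is the whole Streaming contract.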

RESULT:

Thus the Word Count Program to demonstrate the use of Map and Reduce Tasks was
executed successfully.


Additional Programs in Grid Computing & Cloud Computing


Anonymous Transfer using Globus Grid


AIM:
To implement Anonymous Transfer using Globus Grid.

Procedure:

Install the GridFTP Server

http://www.gridftp.org/tutorials/

tar xvfz gt-gridftp*.tar.gz

cd gt-gridftp-binary-installer

./configure --prefix=/path/to/install


ignore any java/ant warnings

make gridftp install

Setup the environment (repeat for all globus sessions)


export GLOBUS_LOCATION=/path/to/install
source $GLOBUS_LOCATION/etc/globus-user-env.sh

globus-gridftp-server options
globus-gridftp-server --help

Start the server in anonymous mode


globus-gridftp-server -control-interface 127.0.0.1 -aa -p 5000

Run a two party transfer


globus-url-copy -v file:///etc/group ftp://localhost:5000/tmp/group

Run 3rd party transfer


globus-url-copy -v ftp://localhost:<port>/etc/group ftp://localhost:<port>/tmp/group2

Experiment with the -dbg, -vb and -fast options
globus-url-copy -dbg file:///etc/group ftp://localhost:5000/tmp/group
globus-url-copy -vb file:///dev/zero ftp://localhost:5000/dev/null

Kill the server

Examine debug output


TCP connection formed from client to server
Control connection authenticated
Several session establishment options sent
Data channel established
MODE E command
PASV sent to server
Server begins listening and replies to client with contact info
Client connected to the listener
File is sent across data connection

Security Options
Clear text (RFC 959)
Username/password
Anonymous mode (anonymous/<email addr>)
Password file
SSHFTP
Use ssh/sshd to form the control connection
GSIFTP
Authenticate control and data channels with GSI

User Permissions

File permissions are handled by the OS


inetd or daemon mode
Fork on TCP accept
User is mapped to a local account
Client connects and the server forks
The child process authenticates the connection
The child process setuid() to a local user

inetd/daemon Interactions using Globus Grid

xinetd

service gsiftp
{
socket_type = stream
protocol = tcp
wait = no
user = root
env += GLOBUS_LOCATION=<GLOBUS_LOCATION>
env += LD_LIBRARY_PATH=<GLOBUS_LOCATION>/lib
server = <GLOBUS_LOCATION>/sbin/globus-gridftp-server
server_args = -i
disable = no
}

inetd

gsiftp stream tcp nowait root /usr/bin/env env \


GLOBUS_LOCATION=<GLOBUS_LOCATION> \
LD_LIBRARY_PATH=<GLOBUS_LOCATION>/lib \
<GLOBUS_LOCATION>/sbin/globus-gridftp-server -i
Remember to add 'gsiftp' to /etc/services with port 2811.

Result:

Thus the Anonymous Transfer using Globus Grid was successfully executed.


Password file using Globus Grid

AIM:
To provide Password for a file.

Procedure:
Create a password file
gridftp-password.pl > pwfile

Run the server in password mode


globus-gridftp-server -p 5000 -password-file /full/path/of/pwfile

Connect with standard ftp program


ftp localhost 5000
ls, pwd, cd, etc...

Transfer with globus-url-copy


globus-url-copy file:///etc/group ftp://username:pw@localhost:5000/tmp/group

globus-url-copy -list ftp://username:pw@localhost:5000/

Result:

Thus the above program to give password to a file is successfully executed.



vMotion Performance in a Web Environment


AIM:
To implement vMotion Performance in a Web Environment.

Procedure:
Test Methodology

Load-Generation Software

SPECweb2005 is an industry-standard Web server workload defined by the Standard Performance


Evaluation Corporation (SPEC).

The SPECweb2005 architecture represents a typical Web architecture that consists of clients, Web
server software (that includes PHP or JSP support) and a back-end application and database server. The
SPECweb2005 benchmark comprises three component workloads including banking, e-commerce and
support. The support workload used in our tests is the most I/O intensive of the three workloads. It emulates a
vendor support site that provides downloads, such as driver updates and documentation, over HTTP. The
performance score of the workload is measured in terms of the number of simultaneous user/browser sessions
a Web server can handle while meeting the QoS requirements specified by the benchmark.
We used the following test scenario for our vMotion tests. Both the source and destination vSphere
hosts were configured with two 10GbE ports, one used for Web client traffic and the other for vMotion
traffic.
Test Scenario

The test scenario for this case study includes the following:

A Rock Web/JSP server deployed in a single virtual machine configured with four vCPUs and 12GB memory

SUSE Linux Enterprise Server 11 x64 as the guest OS

A benchmark load of 12,000 support users, which generated nearly 6Gbps Web traffic

The objectives of the tests were to measure the total migration time and to quantify the application
slowdown when a virtual machine is subjected to vMotion during the steady-state phase of the SPECweb2005
benchmark. The SPECweb2005 benchmark was configured to enable fine-grained performance tracking.
Specifically, the BEAT_INTERVAL test parameter was configured with a value of 2 seconds, which resulted
in the clients reporting the performance data every 2 seconds (default: 10 seconds). Two seconds was the
lowest granularity level that was supported by the benchmark driver. This fine-grained performance tracking
helped us quantify the application slowdown (the number of user sessions failing to meet QoS requirements)
during the different phases of the vMotion.

As described in the test scenario, the test used a load of 12,000 support users, which generated a
substantial load on the virtual machine in terms of CPU and network usage. During the steady-state period of
the benchmark, the client network traffic was close to 6Gbps and the CPU utilization (esxtop %USED
counter) of the virtual machine was about 325%.
Test Results

Figure 2 compares the total elapsed time for vMotion in both vSphere 4.1 and vSphere 5 for the following
configurations:

Both source and destination hosts running vSphere 4.1

Both source and destination hosts running vSphere 5

Figure 2. vMotion total duration (in seconds) on vSphere 4.1 and vSphere 5.
Both test scenarios used a dedicated 10GbE network adaptor for vMotion traffic. The total vMotion time
dropped from 30 seconds to 19 seconds when running vSphere 5, a 37% reduction, clearly showing vMotion
performance improvements made in vSphere 5 towards reducing vMotion transfer times. Our analysis
indicated that most of the gains were due to the optimizations in vSphere 5 that enabled vMotion to
effectively saturate the 10GbE bandwidth during the migration.
Figure 3 plots the performance of the Web server virtual machine before, during and after vMotion when
running vSphere 4.1.
The figure plots the number of SPECweb2005 user sessions that meet the QoS requirements (Time Good) at
a given time. In this graph, the first dip, observed at 17:09:30, corresponds to the beginning of the steady-state
interval of the SPECweb2005 benchmark, when the statistics are cleared. The figure shows that even though
the actual benchmark load was 12,000 users, due to the think time used in the benchmark, the number of
users submitting requests at any given time was about 2,750. During the steady-state interval, 100% of the
users were meeting the QoS requirements. The figure shows that the vMotion process started about 1
minute into the steady-state interval. The figure shows two dips in performance. The first noticeable dip
occurred during the guest trace phase, during which a trace is installed on all the memory pages. The
second dip was observed during the switchover phase, when the virtual machine is momentarily quiesced on the
source host and resumed on the destination host. In spite of these two dips, no network connections were
dropped or timed out, and the SPECweb2005 benchmark run continued.
Figure 4 plots the performance of the Web server virtual machine before, during, and after vMotion when
running vSphere 5 with a single 10GbE network adaptor configured for vMotion.

Figure 4 shows the beginning of the steady state at about 12:20:30 PDT, marked by a small dip. During the
steady-state interval, 100% of the users were meeting the QoS requirements. The figure shows that the
vMotion process started about 2 minutes into the steady-state interval. In contrast to the two dips observed on
vSphere 4.1, only a single noticeable dip in performance was observed during vMotion on vSphere 5. The dip
during the guest trace stage was insignificant, due to improvements made in vSphere 5 to minimize the impact
of memory tracing. The only noticeable dip in performance occurred during the switchover phase from the source
to the destination host. Even at such a high load level, the amount of time the guest was quiesced during the
switchover phase was about 1 second. It took less than 5 seconds to return to the normal level of
performance.

Result:
Thus the program to implement vMotion Performance in a Web Environment was successfully
executed.


Program to Sample Data Using MapReduce Framework


Aim:
To write MapReduce applications to have an understanding of how data is transformed as it executes
in the MapReduce framework.

Procedure:

From start to finish, there are four fundamental transformations. Data is:

1. Transformed from the input files and fed into the mappers

2. Transformed by the mappers

3. Sorted, merged, and presented to the reducers

4. Transformed by the reducers and written to output files
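The four transformations above can be sketched as a minimal, self-contained word-count job in Python. This is not Hadoop itself; it only mirrors what the framework does at each step, and all file contents and names here are illustrative:

```python
from itertools import groupby
from operator import itemgetter

# 1. Records are read from the input files and fed to the mappers
#    (here, each line of text is one input record).
input_records = ["hello grid", "hello cloud", "grid and cloud"]

# 2. Each mapper transforms a record into (key, value) pairs.
def mapper(line):
    for word in line.split():
        yield (word, 1)

mapped = [pair for line in input_records for pair in mapper(line)]

# 3. The framework sorts and merges the pairs, presenting each key
#    together with the list of all its values to a reducer.
mapped.sort(key=itemgetter(0))
grouped = [(key, [v for _, v in pairs])
           for key, pairs in groupby(mapped, key=itemgetter(0))]

# 4. Each reducer transforms a (key, values) group into final output,
#    which the framework writes to the output files.
def reducer(key, values):
    return (key, sum(values))

output = [reducer(key, values) for key, values in grouped]
print(output)  # [('and', 1), ('cloud', 2), ('grid', 2), ('hello', 2)]
```

In a real Hadoop job, steps 1 and 3 are handled entirely by the framework (input splitting and the shuffle/sort phase); the programmer supplies only the mapper and reducer logic.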

Result:
Thus the program for MapReduce applications to have an understanding of how data is transformed was
successfully executed.

