
GRIDSEED: A Virtual Training Grid Infrastructure

I. Gregori, M. Patil and S. Cozzini

SISSA eLab, Trieste, Italy
CNR/INFM Democritos National Simulation Center, Trieste, Italy

Lecture given at the Joint EU-IndiaGrid/CompChem Grid Tutorial on Chemical and Material Science Applications, Trieste, 15-18 September 2008

LNS0924003

cozzini@democritos.it

Abstract

GRIDSEED provides a simple tool to set up a portable, fully fledged gLite Grid infrastructure based on virtual machines. The GRIDSEED tool was developed to make it easy to deploy a training Grid infrastructure almost anywhere in the world, with a set of machines (simple PCs) connected to each other on a local network as the only requirement. It uses VMware, a standard and widely available virtualization tool. On top of the gLite middleware, GRIDSEED includes a set of demo applications deployed on the Grid infrastructure, together with some associated tools. All these extra features are fully documented, and tutorials are provided as well. GRIDSEED is therefore a complete training environment formed by a virtual infrastructure complemented by demo applications and training materials ready to be used in standard training events. In this short paper we present the motivations behind this tool, the way it can be installed and used for training events, and some technical details.

Contents
1 Introduction
2 Context
3 GRIDSEED architectures and technical implementation
3.1 Architecture
3.2 Deployment
3.3 Availability
4 GRIDSEED environment
5 Conclusions
References

Introduction

Grid training infrastructures play a fundamental role in promoting and spreading Grid computing technologies. A training infrastructure allows beginners to experiment and learn how to use a Grid without the burden of being officially enrolled in a production Grid environment. In the case of EGEE and related projects, a dedicated training infrastructure named GILDA [6] has been in place for a few years. Besides this permanent infrastructure, several efforts have been made to set up temporary Grid infrastructures for use during training and dissemination events.

There are several methods to set up such temporary Grid infrastructures. At present almost all proposed solutions are based on a virtualization approach. The availability of free virtualization software such as Xen [1] and VMware Server and Player [2] has recently boosted the use of virtualization for temporary infrastructures dedicated to training (see for instance [3]). The basic idea behind this approach is to have many different virtual machines, each serving a specific Grid service. These virtual machines can be hosted on one or more physical servers and then appropriately configured in order to start all the needed Grid services.

GRIDSEED goes one step further in this direction: based on virtual machines, it hides the complex task of configuring by hand all the services available on a Grid. In GRIDSEED a set of pre-configured virtual machines can be installed on a set of physical machines connected in a LAN; once booted, they offer all the Grid services without any further configuration. The system can then be monitored via a simple web interface. Our goal is to provide a full training environment in which the training infrastructure can be turned on with minimal effort and then easily used.

In the remainder of this paper we present our tool in detail. Section 2 describes the context in which GRIDSEED was developed, discussing the motivation and the evolution of the project. Section 3 introduces the general architecture and some details; we also discuss a few technical choices made during the development. Section 4 discusses the full GRIDSEED environment and the way it can be configured and used for standard training events. We also discuss advantages and limitations with respect to other experiences and tools. Finally, in Section 5 we set forth our future plans and draw some conclusions.

Context

The research group behind this project is based in Trieste and involves people from ICTP (initially belonging to the EGRID [4] team and later to the EU-IndiaGrid [5] one), from the Democritos National Simulation Center and from SISSA. The maintainer is now the recently established joint SISSA/Democritos laboratory for e-science (eLab for short), which is involved, in collaboration with ICTP, in many Grid computing projects and activities. eLab provides computational resources to the EGEE infrastructure, supporting the EU-India and CompChem Virtual Organizations.

Members of our groups recently promoted several training events on Grid technologies in various parts of the world. Some of the training activities took place in regions with very limited bandwidth. As a result, a remote training Grid infrastructure (such as GILDA) was sometimes inaccessible, making hands-on sessions impossible. These experiences convinced us that, in order to be productive, a fully fledged training Grid infrastructure needs to be set up at the location where the training event is organized; this approach is now followed in many training events at the European level as well.

The main objectives of GRIDSEED are the following:

Very small requirements: GRIDSEED should not require sophisticated hardware and/or advanced software setup in a generic computer lab. Requirements should be kept to a minimum so that local organizers have no difficulty in providing what is needed.

Ease of use: GRIDSEED should make the setup and the management of the training infrastructure as simple as possible. It should be possible for the complete infrastructure to be set up and then managed by people who are completely unaware of the complexity of configuring global and local Grid services.

Stability and reliability: The GRIDSEED infrastructure should be stable and reliable; performance is only a marginal concern here. The system must be responsive under demo loads to be useful, but it is acceptable for slowdowns to occur under heavy real load in favour of stability.

Expandable and portable: GRIDSEED should be easy to expand in terms of new services, new Grid sites and new kinds of middleware. Advanced users should be able to extend it easily according to their needs.

GRIDSEED was initially developed as a natural evolution of the EGRID Live CD [7], a pioneering solution for making gLite middleware installation and configuration simple. The kick-off of GRIDSEED activities took place during the EU-IndiaGrid tutorial organized in June 2007 in Kolkata (India): on that occasion a fully fledged computational EGEE/gLite Grid was set up based on VMware virtual machines, and this can be considered an alpha version of the product. Enhanced versions of GRIDSEED were used and tested in all the subsequent training activities of the EU-India project. From May 2008 the initiative was sponsored and supported by the eLab team, which became more and more actively involved in the development. Version 1.1 was successfully used during the Joint EU-IndiaGrid/CompChem Grid Tutorial on Chemical and Material Science Applications. We recently released version 1.3, and it is this latest version that we discuss in the rest of the paper.

GRIDSEED architectures and technical implementation

GRIDSEED consists of a set of VMware virtual machines (VMs), each hosting one or more gLite services and working in a coordinated way within a dedicated local network. We decided to use VMware virtualization tools because VMware Player and VMware Server are freely available on both Linux and Windows. This choice helps us to keep requirements to a minimum: for instance, a Windows-based laboratory can be used simply by installing the VMware software. VMware is also state of the art in virtualization, is not very invasive, and has a track record of industrial use, so GRIDSEED users can rely on industrial-strength virtualization products for free. The prerequisites for GRIDSEED are therefore the presence of VMware software on the target hosts and a LAN connecting them.

3.1 Architecture

Figure 1 shows the set of GRIDSEED VMs currently available.

Figure 1: GRIDSEED architecture: see text for discussion.

We can distinguish four kinds of VMs based on their logical roles within the architecture:

The master VMs: These VMs collect ad hoc GRIDSEED services to set up, manage and monitor the whole architecture. At the moment such services are all collected on just one machine (master), described in detail later.

Central Grid services VMs: Central services are hosted on three different VMs. The Central VM is configured to host both information services (top-level BDII) and data management services (LCG File Catalog, LFC). The WMS VM hosts only the Workload Management System. The VOMS and MyProxy central services are hosted on the master VM.

Site services: These are the VMs needed to operate a Grid site. A minimal GRIDSEED site is formed by a Storage Element, a Computing Element and two Worker Nodes (WNs). This configuration can be expanded to include up to 20 WNs. GRIDSEED currently provides two kinds of Grid sites. Site-1 features an LCG Computing Element and a StoRM Storage Element, while Site-2 features the recent CREAM-CE together with a DPM installation as Storage Element. This basic configuration can easily be expanded by adding other sites, depending of course on the hardware resources available. The maximum number of sites is currently set to 20.

UI services: These VMs host the User Interface middleware. Besides the standard UI that comes with the gLite middleware, GRIDSEED also provides the latest version of Milu, the Miramare Lightweight User Interface [8]. Again depending on the hardware and the number of people using the infrastructure, more UIs can be added, up to a maximum of 10. These UIs have a set of predefined accounts with a script to create Grid certificates on the fly.

The gLite middleware on all the nodes is configured using YAIM (version 4), a suite of shell scripts that set up the middleware according to a site-wide configuration file. The current version of GRIDSEED works with gLite version 3.1 and is kept constantly updated.

The core of the architecture is the master VM, which provides a set of services to keep all the VMs coordinated and easily manageable. We therefore installed on it the following services and tools, commonly used to manage clusters of workstations:

c3 tools, for remote management of all the VMs.

DNS, fully configured up to the maximum level of expandability.

iptables service, to provide outbound connectivity from the VMs to the external network.

Time synchronization service, to keep the VMs synchronized. Unfortunately VMware has trouble keeping time synchronized to within a second, an accuracy that is critical for Grids in general and especially for GSI, the security component of gLite. For this reason the standard NTP server could not be used, and a simple ad hoc shell script keeps the machines synchronized.

A web server that collects web applications to monitor the status of the VMs and of the most important Grid services. The functionality of Grid sites is tested by running Site Availability Monitor (SAM) tests, which periodically probe the various Grid service instances (CE, SE, UI, etc.) to check their status.
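The kind of status check that feeds such a monitoring page can be illustrated with a much simpler test than the real SAM suite. The Python sketch below only checks whether a TCP port answers; it is not how SAM works and is not taken from GRIDSEED. The host names are hypothetical, and the port numbers are commonly used gLite defaults assumed here for illustration, to be checked against the actual installation.

```python
import socket
from typing import Dict, Tuple

# Hypothetical host names; the ports are assumed defaults, not taken from the paper.
EXAMPLE_SERVICES: Dict[str, Tuple[str, int]] = {
    "top-bdii": ("central.gridseed.example", 2170),
    "lfc": ("central.gridseed.example", 5010),
    "wms": ("wms.gridseed.example", 7443),
    "myproxy": ("master.gridseed.example", 7512),
}


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_services(services: Dict[str, Tuple[str, int]],
                   timeout: float = 2.0) -> Dict[str, str]:
    """Map each service name to "UP" or "DOWN" according to a TCP probe."""
    return {
        name: "UP" if probe(host, port, timeout) else "DOWN"
        for name, (host, port) in services.items()
    }
```

Calling check_services(EXAMPLE_SERVICES) from the master VM would return one UP/DOWN entry per service, which a web page could then render. A reachable port does not prove that the service behaves correctly, which is why functional tests such as SAM are still needed.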

The above services are complemented by a fake Certification Authority (CA). It consists of an Apache web server and a set of CGI scripts that issue all the X.509 certificates required by the Grid infrastructure. Any user can generate a personal certificate and/or a host certificate, for instance to add a new VM host to the system. In practice, a user is not required to perform these steps manually: a script (ask cert.sh) available on the standard UI requests the user certificate and registers it automatically on the VOMS server, enrolling the user in the two Virtual Organizations pre-configured on the infrastructure: GRIDSEED and eLab. The master also hosts two gLite services, VOMS and MyProxy, in order to keep the number of VMs limited. In particular, we decided to host the VOMS service on the same VM where the CA is installed, to simplify the automatic procedures for handling both user and host certificates.
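The enrollment flow just described (request a certificate, then register its subject in both Virtual Organizations) can be modelled in a few lines. The Python sketch below is a toy in-memory model only: it issues plain records rather than real X.509 certificates, and the class names, the issuer string and the subject format are all hypothetical, not taken from the GRIDSEED scripts.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, Set

# The two Virtual Organizations named in the text.
VOS = ("GRIDSEED", "eLab")


@dataclass
class ToyCA:
    """Stand-in for the training CA; returns records, not real certificates."""
    issuer: str = "/O=GRIDSEED/CN=Training CA"  # hypothetical issuer name
    _serials: Iterator[int] = field(default_factory=lambda: count(1))

    def issue(self, common_name: str) -> Dict[str, object]:
        # The subject format is an assumption made for this example.
        return {
            "serial": next(self._serials),
            "subject": f"/O=GRIDSEED/CN={common_name}",
            "issuer": self.issuer,
        }


@dataclass
class ToyVOMS:
    """Stand-in for the VOMS server: one set of member subjects per VO."""
    members: Dict[str, Set[str]] = field(
        default_factory=lambda: {vo: set() for vo in VOS}
    )

    def register(self, subject: str, vo: str) -> None:
        self.members[vo].add(subject)


def enroll(ca: ToyCA, voms: ToyVOMS, username: str) -> Dict[str, object]:
    """Issue a certificate record and register its subject in every VO."""
    cert = ca.issue(username)
    for vo in VOS:
        voms.register(str(cert["subject"]), vo)
    return cert
```

Keeping the CA and VOMS on the same machine, as the text explains, is what allows a single script to perform both steps without any cross-host coordination.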

3.2 Deployment

All the VMs can be booted from the same physical host, or alternatively from different ones, depending on the hardware available. Clearly, the more VMs are booted on the same physical host, the greater the hardware requirements for that host. We tried to keep memory and disk size requirements to a minimum for all the VMs. The compressed size of each machine is on the order of one GB, and the minimal GRIDSEED set of machines takes about 7 GB compressed. The small size makes it very handy to copy it onto a USB pen drive and carry it to remote training locations where there is either no Internet connection or a very slow one. Once all the machines are decompressed, a few tens of gigabytes are needed to host all of them.

Memory requirements are also quite easily satisfied with recent hardware. The basic GRIDSEED setup can be managed with 8 GB of RAM: this is the minimum requirement in terms of RAM. Of course, this amount can be provided by more than one machine. A fairly comfortable setup, able to deal with 10 to 15 users at the same time, requires a total of 32 GB of RAM, easily available in any medium-sized modern computer lab. We currently keep the working version of GRIDSEED on 4 servers with 8 GB of RAM each: this allows us to comfortably keep up and running, besides all the central services and 2 UIs, 3 different Grid sites.

Figure 2 shows a possible GRIDSEED deployment on two physical machines. On machine-1 we have all three core VMs running (the Master, Central and WMS virtual machines) and one User Interface. On the second machine we have two Grid sites running. Each site has one Computing Element (CE), one Storage Element (SE) and one Worker Node. Such a configuration requires at least 4 GB of RAM on the first machine and 8 GB on the second one.

Figure 2: An example of GRIDSEED deployment on two servers.
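Deciding which VMs go on which physical host is a small bin-packing exercise. The following Python sketch uses a plain first-fit rule; it is not the procedure used by the authors. The VM names and the uniform 1024 MB of RAM per VM are assumptions chosen for illustration (the paper gives only per-host totals), and memory needed by the host operating system is ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical VM names; 1024 MB per VM is an assumed figure, not from the paper.
VM_RAM_MB = 1024
EXAMPLE_VMS: List[Tuple[str, int]] = [
    (name, VM_RAM_MB)
    for name in (
        "master", "central", "wms", "ui-1",
        "site1-ce", "site1-se", "site1-wn1",
        "site2-ce", "site2-se", "site2-wn1",
    )
]

# RAM of the two servers in the Figure 2 example, in MB.
EXAMPLE_HOSTS: Dict[str, int] = {"machine-1": 4096, "machine-2": 8192}


def first_fit(vms: List[Tuple[str, int]],
              hosts: Dict[str, int]) -> Dict[str, List[str]]:
    """Place each VM on the first host that still has enough free RAM."""
    free = dict(hosts)
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    for name, ram in vms:
        for host in hosts:
            if free[host] >= ram:
                free[host] -= ram
                placement[host].append(name)
                break
        else:
            # No host had room for this VM.
            raise ValueError(f"not enough RAM to place {name}")
    return placement
```

With these assumed figures, first_fit(EXAMPLE_VMS, EXAMPLE_HOSTS) puts the three core VMs and the UI on machine-1 and the six site VMs on machine-2, the same layout as in the two-server example.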

As mentioned, GRIDSEED is fairly flexible. It is designed to create and configure a gLite Grid infrastructure made up of up to 10 UIs and up to 20 sites, each one consisting of 1 CE + 1 SE + up to 20 WNs. This represents a formidable Grid to play with, and it serves well the needs of training for a large number of users; it does, however, require adequate hardware to run on. The ease of copying and rebuilding VMs also makes it suitable for advanced training dedicated to system administrators. Once the VMs are all booted, only a few network configuration steps (described in detail in the project wiki) are needed in order to allow users to reach the UI(s) via ssh. This is because GRIDSEED VMs are on an isolated network, and the only open gateway should be the one to reach the available User Interfaces. On each UI, 20 predefined accounts are available and ready to be used, together with a few scripts to facilitate the creation of jobs and the testing of the infrastructure.
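The limits quoted above translate directly into a count of VMs and user accounts for any planned configuration. The Python sketch below encodes them; the upper bounds come from the text, while the function names and the lower bounds (at least one UI, one site and one WN per site) are assumptions made for this example.

```python
from typing import List

# Upper limits stated in the text.
MAX_UIS = 10
MAX_SITES = 20
MAX_WNS_PER_SITE = 20
ACCOUNTS_PER_UI = 20
CORE_VMS = 3  # master, central and WMS


def count_vms(n_uis: int, wns_per_site: List[int]) -> int:
    """Total number of VMs for n_uis UIs and one entry per site giving its WN count."""
    # Lower bounds of 1 are an assumption of this sketch.
    if not 1 <= n_uis <= MAX_UIS:
        raise ValueError(f"number of UIs must be between 1 and {MAX_UIS}")
    if not 1 <= len(wns_per_site) <= MAX_SITES:
        raise ValueError(f"number of sites must be between 1 and {MAX_SITES}")
    for wns in wns_per_site:
        if not 1 <= wns <= MAX_WNS_PER_SITE:
            raise ValueError(f"WNs per site must be between 1 and {MAX_WNS_PER_SITE}")
    # Each site contributes one CE, one SE and its WNs.
    return CORE_VMS + n_uis + sum(2 + wns for wns in wns_per_site)


def count_accounts(n_uis: int) -> int:
    """Number of predefined user accounts across all UIs."""
    return n_uis * ACCOUNTS_PER_UI
```

Under these figures the largest configuration, count_vms(10, [20] * 20), comprises 453 VMs with count_accounts(10) = 200 predefined accounts, while the two-server example with one UI and two single-WN sites, count_vms(1, [1, 1]), needs 10 VMs.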

3.3 Availability

All the machines can be downloaded from the GRIDSEED web pages, together with detailed instructions for their installation and deployment. GRIDSEED is an open project, and we encourage interested people to play with it and contribute to it. The GRIDSEED software project is now hosted on the gforge portal for scientific software development at eLab [9], and we welcome people from the Grid community who want to contribute to the project. All configuration scripts and tools to deploy and develop the product will be made available on the gforge portal. It is also possible to request other features and/or Grid services not yet developed.

GRIDSEED environment

The GRIDSEED virtual training infrastructure is accompanied by a set of tutorials on Grid computing. These tutorials are available online on the GRIDSEED website and represent our latest effort to complement the infrastructure with materials ready to be used in training events and/or for self-training. The website is hosted in a wiki, so the materials can easily be improved, updated and completed by anybody interested in collaborating with the project. At the moment the GRIDSEED wiki offers some basic and advanced tutorials for users; these are standard tutorials prepared following the gLite User Guide. Used together with the GRIDSEED infrastructure, they offer some advantages with respect to a standard, generic guide, for instance:

The output obtained from the command line is exactly the same as that reported on the wiki, which avoids confusing beginners. There is a clear correspondence between what is written on the wiki and what is obtained on the system.

Cutting and pasting commands almost always works: for quite complex commands, this allows trainees to focus on the command itself and not on typing it.

There is no need to change important parameters in the commands, such as the Virtual Organization; again, this allows trainees (and tutors as well) to focus on the meaning of the commands and of the proposed exercises, without worrying about the many details that can cause confusion at a first pass.

All the software needed is already there, and no specific configuration is required.

The GRIDSEED environment is further enriched by case studies of scientific applications successfully ported within the Grid projects our team is involved in. Thus, besides the standard tutorials, there are also ad hoc tutorials intended to illustrate such porting activities. These tutorials are devoted to users eager to learn and understand the tricks and tools developed to port scientific applications to the EGEE/gLite Grid infrastructure. We consider this an important added value, because a real porting experience can be analyzed in detail with all the software needed installed and under the direct control of the trainees. We tried to keep our case studies of as general interest as possible for larger communities.

This GRIDSEED set of tutorials is also an important tool for self-training. It should be noted that people trained once on GRIDSEED are able to replicate their training infrastructure easily anywhere and are then ready to offer it to beginners. In this context GRIDSEED appears to be an efficient tool for allowing trainees to become future trainers. There are of course still some limitations: for absolute beginners, installing and using GRIDSEED from scratch can be difficult, and with a large number of users and applications the overhead of the virtual approach can be significant.

The GRIDSEED environment has been successfully used several times in different contexts and for different purposes. We note that in training events focused on specific user communities, a temporary infrastructure like GRIDSEED can easily be adapted and customized in order to provide the specific tools and services requested by the user community. We experienced the high degree of flexibility provided by such temporary infrastructures in the event already mentioned, dedicated to chemistry and material science, where we could easily add specific software to our Grid without much effort and without having to contact any external entity or person.

Conclusions

We presented GRIDSEED, a tool to set up a temporary training infrastructure using virtualization technologies. The system makes it easy to set up a training testbed and start experimenting with a Grid infrastructure based on gLite middleware. GRIDSEED users, and even trainers, are largely shielded from the intricate details of Grid middleware installation and configuration and can fully dedicate themselves to learning how to use the Grid rather than how to install the middleware. GRIDSEED also provides its own Certification Authority, so no administrative requests have to be issued to start playing within a Grid environment.

Our tools are under development and there is room for further improvement. We are currently considering the following: (i) implementing support for automatically adding more Grid sites, if needed and if hardware resources are available, without any manual intervention by the GRIDSEED manager; (ii) adding new services and features depending on the needs and requests we receive; in this context we plan to add elfi [10] support to our UI; (iii) installing and providing an updated repository of scientific applications already deployed on EGEE/gLite middleware, with all the tools that such applications require. Such an expansion is already under way with the ganga [11] and diane [12] developers, who plan to run a tutorial on top of the GRIDSEED infrastructure. Finally, we are interested in installing VMs using different Grid middleware, such as Globus GT4, and making them interoperable with the gLite infrastructure. This would expand the potential audience for the tool.

References
[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield, (2003). Xen and the Art of Virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. ACM. Energy and Nuclear Physics, March, 2003
[2] S. Devine, E. Bugnion and M. Rosenblum, (1998). Virtualization system including a virtual machine monitor for a computer with a segmented architecture. US Patent
[3] R. Berlich and M. Hardt, (2005). Grid in a box - virtualisation techniques in Grid training. Presented at EGEE conference, Athens. Available via: http://www.ep1.rub.de/ruediger/pandoraAthens.pdf
[4] See www.egrid.it
[5] See www.euindiagrid.eu
[6] GILDA (Grid INFN Laboratory for Dissemination Activities), see https://gilda.ct.infn.it/
[7] See http://www.egrid.it/sw/livecd
[8] See http://doc.escience-lab.org/index.php/Main/Milu
[9] See gforge.escience-lab.org
[10] EGEE Grid storage in a local filesystem interface, see: http://www.egrid.it/sw/elfi
[11] Ganga: A tool for computational-task management and easy access to Grid resources, arXiv:0902.2685v2
[12] See http://it-proj-diane.web.cern.ch/it-proj-diane/
