
An Inquiry Report On

GRID COMPUTING

Submitted By

Jeevan Kumar Vishwakarman

vishnudhoodhan@gmail.com | +91 9020 321 091

Jeevan Kumar Vishwakarman. All rights reserved. This is a collection of information drawn from various sources, including the internet, textbooks, and newspaper publications. This collection is the sole work of Mr. Jeevan Kumar Vishwakarman, and any reuse of this work in any form is prohibited. All rights to this collection are reserved to him, and violation of this prohibition is punishable under whichever laws are applicable.

1st M.C.A |M9 MCA AA 0010


Sree Saraswathy Thyagaraja College, Thippampatti, Pollachi.

For personal contact: Karampotta, Kozhinjampara, Palakkad, 678555

vishnudhoodhan@gmail.com | vishnudhoodhan@yahoo.

CONTENTS

1. A Gentle Introduction To Grid Computing And Technologies
2. What Is Grid Computing?
3. Grid Computing's Ancestors
4. The Architecture
5. Five Big Ideas
6. Grids Versus Conventional Supercomputers
7. Virtual Organizations
   From passenger jets to chemical spills
8. The Hardware
9. Design Considerations And Variations
10. CPU Scavenging
11. History
12. Father of the Grid
13. Current Projects And Applications
    Fastest virtual supercomputers
14. Definitions
    But what does "high performance" mean?
15. The Death Of Distance
    Faster! Faster!
16. Secure Access
    Security and trust
17. Resource Use
    Middleware to the rescue
18. Resource Sharing
19. But Would You Trust Your Computer To A Complete Stranger?
20. Open Standards
21. Who Is In Charge Of Grid Standards?
22. The Middleware
    Agents, brokers and striking deals
    Delving inside middleware
23. Globus Toolkit
    Globus includes programs such as:
24. National Grids
25. International Grids
26. High-Throughput Problems
27. High-Performance Problems
28. Grid Computing In 30 Seconds
29. The Dream
30. "Gridifying" Your Application
31. Computational Problems
    Parallel calculations
    Embarrassingly parallel calculations
    Coarse-grained calculations
    Fine-grained calculations
    High-performance vs. high-throughput
    And grid computing..?
32. Breaking Moore's Law?
    Nice Idea, But...
33. More On Moore's Law
34. Works Cited
Index

A Gentle Introduction To Grid Computing And Technologies


Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middleware provides users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems have been developed all over the world, most of which are the results of academic research projects. This report presents an introduction to Grid computing and discusses two complementary Grid technologies: Globus, developed by researchers from Argonne National Laboratory and the University of Southern California, USA; and Gridbus, developed by researchers from the University of Melbourne, Australia. Globus primarily focuses on providing core Grid services, whereas Gridbus focuses on providing user-level Grid services, in addition to a utility computing model for the management of grid resources.

What Is Grid Computing?


Grid computing allows the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to provide a single system image, granting users and applications access to vast IT capabilities. Although "the grid" is still just a dream... grid computing is already a reality. Imagine several million computers from all over the world, owned by thousands of different people. Imagine they include desktops, laptops, supercomputers, data vaults, and instruments like mobile phones, meteorological sensors and telescopes... Now imagine that all of these computers can be connected to form a single, huge and super-powerful computer! This huge, sprawling, global computer is what many people dream "the grid" will be. "The grid" takes its name from an analogy with the electrical "power grid". The idea was that accessing computer power from a computer grid would be as simple as accessing electrical power from an electrical grid.


Grid Computing's Ancestors


Grid computing didn't just come out of nowhere. It grew from previous efforts and ideas, such as those listed below:

Grid computing's immediate ancestor is "metacomputing", which dates back to around 1990. Metacomputing was used to describe efforts to connect US supercomputing centers. Larry Smarr, a former director of the National Center for Supercomputing Applications in the US, is generally credited with popularizing the term.

FAFNER and I-WAY were cutting-edge metacomputing projects in the US, both conceived in 1995. Each influenced the evolution of key grid technologies.

FAFNER (Factoring via Network-Enabled Recursion) aimed to factorize very large numbers, a challenge very relevant to digital security. Since this challenge could be broken into small parts, even fairly modest computers could contribute useful power. Many FAFNER techniques for dividing and distributing computational problems were forerunners of technology used for SETI@home and other "cycle scavenging" software.

I-WAY (Information Wide Area Year) aimed to link supercomputers using existing networks. One of I-WAY's innovations was a computational resource broker, conceptually similar to those being developed for grid computing today. I-WAY strongly influenced the development of the Globus Project, which is at the core of many grid activities, as well as the Legion project, an alternative approach to distributed supercomputing.

Grid computing was born at a workshop called "Building a Computational Grid", held at Argonne National Laboratory in September 1997. Following this, in 1998, Ian Foster of Argonne National Laboratory and Carl Kesselman of the University of Southern California published "The Grid: Blueprint for a New Computing Infrastructure", often called "the grid bible". Ian Foster had previously been involved in the I-WAY project, and the Foster-Kesselman duo had published a paper in 1997, called "Globus: A Metacomputing Infrastructure Toolkit", clearly linking the Globus Toolkit with its predecessor, metacomputing.


Grid computing is a phrase in distributed computing which can have several meanings:

- A local computer cluster which is like a "grid" because it is composed of multiple nodes.
- Offering online computation or storage as a metered commercial service, known as utility computing, "computing on demand", or "cloud computing".
- The creation of a "virtual supercomputer" by using spare computing resources within an organization.
- The creation of a "virtual supercomputer" by using a network of geographically dispersed computers. Volunteer computing, which generally focuses on scientific, mathematical, and academic problems, is the most common application of this technology.

These varying definitions cover the spectrum of "distributed computing", and sometimes the two terms are used as synonyms. This report focuses on distributed computing technologies outside traditional dedicated clusters.

Functionally, one can also speak of several types of grids:

- Computational grids (including CPU-scavenging grids), which focus primarily on computationally intensive operations.
- Data grids, for the controlled sharing and management of large amounts of distributed data.
- Equipment grids, which have a primary piece of equipment (e.g. a telescope), where the surrounding grid is used to control the equipment remotely and to analyze the data produced.

The Architecture
Grid architecture is the way in which a grid has been designed.


A grid's architecture is often described in terms of "layers", where each layer has a specific function. The higher layers are generally user-centric, whereas lower layers are more hardware-centric, focused on computers and networks.

The lowest layer is the network, which connects grid resources. Above the network layer lies the resource layer: actual grid resources, such as computers, storage systems, electronic data catalogues, sensors and telescopes that are connected to the network.

The middleware layer provides the tools that enable the various elements (servers, storage, networks, etc.) to participate in a grid. The middleware layer is sometimes called the "brains" behind a computing grid!

The highest layer of the structure is the application layer, which includes applications in science, engineering, business, finance and more, as well as portals and development toolkits to support the applications. This is the layer that grid users "see" and interact with. The application layer often includes the so-called serviceware, which performs general management functions like tracking who is providing grid resources and who is using them.

Five Big Ideas

Grid computing is driven by five big ideas:

1. Resource sharing: global sharing is the very essence of grid computing.
2. Secure access: trust between resource providers and users is essential, especially when they don't know each other. Sharing resources conflicts with security policies in many individual computer centers, and on individual PCs, so getting grid security right is crucial.
3. Resource use: efficient, balanced use of computing resources is essential.
4. The death of distance: distance should make no difference; you should be able to access computing resources from wherever you are.
5. Open standards: interoperability between different grids is a big goal, driven forward by the adoption of open standards for grid development, which make it possible for everyone to contribute constructively. Standardization also encourages industry to invest in developing commercial grid services and infrastructure.

Grids Versus Conventional Supercomputers

"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected by a conventional network, such as Ethernet or the internet. This is in contrast to the traditional notion of a supercomputer, which has many CPUs connected by a local high-speed computer bus. The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce similar computing resources to a many-CPU supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various CPUs and local storage areas do not have high-speed connections. This arrangement is thus well-suited to applications where multiple parallel computations can take place independently, without the need to communicate intermediate results between CPUs. The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public internet. Conventional supercomputers also create physical challenges in supplying sufficient electricity and cooling capacity in a single location. Both supercomputers and grids can be used to run multiple parallel computations at the same time, which might be different simulations for the same project, or computations for completely different applications. The infrastructure and programming considerations needed to do this on each type of platform are different, however. There are also differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs to run on multiple machines, each given a different part of the same problem. This makes it possible to write and debug programs on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
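To make the "thin layer" idea concrete, here is a minimal sketch (my own illustration, not from the report) of a coordinator that runs an unmodified program on several machines, each on a different slice of the problem. The host names and the ./simulate program are hypothetical, and plain ssh stands in for real grid middleware:

```python
# A minimal sketch of the "thin grid layer" idea: the same standalone program
# is launched on several machines, each with a different slice of the problem.
import subprocess

HOSTS = ["node01", "node02", "node03", "node04"]   # assumed reachable via ssh
TOTAL_WORK_UNITS = 1000

def launch(host: str, start: int, end: int) -> subprocess.Popen:
    """Run the unmodified program on a remote host for one slice of the work."""
    cmd = ["ssh", host, "./simulate", f"--from={start}", f"--to={end}"]
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Split the work evenly; each node runs the same binary on its own slice.
step = TOTAL_WORK_UNITS // len(HOSTS)
jobs = [launch(h, i * step, (i + 1) * step) for i, h in enumerate(HOSTS)]

# Collect the partial results; the program itself never needed changing.
for job in jobs:
    out, _ = job.communicate()
    print(out.decode().strip())
```

The point of the sketch is that the simulation program is written and debugged as an ordinary single-machine program; only the thin coordinating layer knows there are multiple machines.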

Virtual Organizations
Virtual organizations (VOs) are groups of people who share a data-intensive goal. To achieve their mutual goal, people within a VO choose to share their resources, creating a computing grid. This grid can give VO members direct access to each other's computers, programs, files, data, sensors and networks. This sharing must be controlled, secure, flexible, and usually time-limited.

From Passenger Jets To Chemical Spills

Many scientists form VOs to pursue their research. VOs exist for astronomy research, alternative energy research, biology research and more. The needs of each VO are different. For example, a VO formed to develop a next-generation passenger jet will need to run complex computer simulations, testing various combinations of components from different manufacturers, while keeping the proprietary know-how associated with each component hidden from the other consortium members. Another example is an environmental science VO, tasked with managing a chemical spill. This VO will need to analyze local weather and soil models to estimate the spread of the spill and determine its impact. It will need to create a short-term mitigation plan and help emergency response personnel to plan and coordinate the evacuation.

The Hardware
Grids must be built "on top of" hardware, which forms the physical infrastructure of a grid - things like computers and networks. This infrastructure is often called the grid "fabric".


Networks are an essential piece of the grid "fabric". Networks link the different computers that form part of a grid, allowing them to be handled as one huge computer. Networks are characterized by their size (local, national and international) and throughput (the amount of data transferred in a specific time). Throughput is measured in kbps (kilobits per second, where kilo means a thousand), Mbps (M for mega, a million) or Gbps (G for giga, a billion). One of the big ideas of grid computing is to take advantage of ultra-fast networks, which allow us to access globally distributed resources in an integrated, data-intensive way. Ultra-fast networks also help to minimize latency: the delays that build up as data are transmitted over the internet. Grids are built "on top of" high-performance networks, such as the intra-European GÉANT network, which has 10 Gbps performance on the network "backbone". This backbone links the major "nodes" on the grid (like national computing centres). One level down from the "backbone" are the network links which join individual institutions to nodes on the backbone; performance of these is typically 1 Gbps. A further level down are the 10 to 100 Mbps desktop-to-institution network links.
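As a rough worked illustration of these units (my own back-of-the-envelope figures, ignoring latency and protocol overhead), consider moving a 10 GB dataset over each class of link:

```latex
% Transfer time = data size / throughput.  10 GB = 8 \times 10^{10} bits.
t_{\text{backbone (10 Gbps)}} = \frac{8\times10^{10}\ \text{bits}}{10^{10}\ \text{bits/s}} = 8\ \text{s},
\qquad
t_{\text{institution (1 Gbps)}} = 80\ \text{s},
\qquad
t_{\text{desktop (100 Mbps)}} = 800\ \text{s}.
```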

Design Considerations And Variations


One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple administrative domains). This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks. One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit. Discrepancies would identify malfunctioning and malicious nodes.
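A minimal sketch of that verification scheme (my own illustration; compute_on stands in for whatever dispatch mechanism the grid actually uses):

```python
# Redundancy check: each work unit is sent to two (or more) randomly chosen
# nodes, and a result is only accepted when the independent answers agree.
import random
from collections import Counter

def validate(work_unit, nodes, compute_on, replicas=2):
    """Dispatch one work unit to several random nodes and compare answers."""
    chosen = random.sample(nodes, replicas)
    results = [compute_on(node, work_unit) for node in chosen]
    answer, votes = Counter(results).most_common(1)[0]
    if votes == replicas:
        return answer                      # all replicas agree: accept
    # Disagreement: flag the outlier nodes and reschedule the work unit.
    suspects = [n for n, r in zip(chosen, results) if r != answer]
    raise RuntimeError(f"result mismatch, suspect nodes: {suspects}")
```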


Due to the lack of central control over the hardware, there is no way to guarantee that nodes will not drop out of the network at random times. Some nodes (like laptops or dial-up internet customers) may also be available for computation but not network communications for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results as expected. The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors. In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system. For example, Parabon Computation produces grid computing software that operates in a Java sandbox. Public systems, or those crossing administrative domains (including different departments in the same organization), often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures. With many languages, there is a tradeoff between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this tradeoff, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform). Various middleware projects have created generic infrastructure to allow various scientific and commercial projects to harness a particular associated grid, or to set up new grids; BOINC is a common one for academic projects seeking public volunteers.

CPU Scavenging
CPU scavenging, cycle scavenging, cycle stealing, or shared computing creates a "grid" from the unused resources in a network of participants (whether worldwide or internal to an organization). Usually this technique is used to make use of instruction cycles on desktop computers that would otherwise be wasted at night, during lunch, or even in the scattered seconds throughout the day when the computer is waiting for user input or slow devices. Volunteer computing projects use the CPU scavenging model almost exclusively. In practice, participating computers also donate some supporting amount of disk storage space, RAM, and network bandwidth, in addition to raw CPU power. Nodes in this model are also more vulnerable to going "offline" in one way or another from time to time, as their owners use their resources for their primary purpose.
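The scavenging loop can be sketched in a few lines (an illustration only, assuming the third-party psutil library for the idle check; fetch_work, compute and report_result are hypothetical hooks into a project server, and the 25% threshold is an arbitrary choice):

```python
# A toy cycle-scavenging client: it repeatedly asks a server for work,
# but only computes while the local machine looks idle.
import time
import psutil   # third-party; pip install psutil

IDLE_CPU_THRESHOLD = 25.0   # percent; treat the machine as idle below this

def machine_is_idle() -> bool:
    return psutil.cpu_percent(interval=1.0) < IDLE_CPU_THRESHOLD

def scavenge(fetch_work, compute, report_result):
    while True:
        if machine_is_idle():
            unit = fetch_work()                # ask the project server for a work unit
            report_result(unit, compute(unit))
        else:
            time.sleep(30)                     # owner is using the machine; back off
```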

History
The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid, in Ian Foster and Carl Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure". CPU scavenging and volunteer computing were popularized beginning in 1997 by distributed.net and later in 1999 by SETI@home to harness the power of networked PCs worldwide, in order to solve CPU-intensive research problems. The ideas of the grid (including those from distributed computing, object-oriented programming, cluster computing, web services and others) were brought together by Ian Foster, Carl Kesselman and Steve Tuecke, widely regarded as the "fathers of the grid". They led the effort to create the Globus Toolkit, incorporating not just CPU management (examples: cluster management and cycle scavenging) but also storage management, security provisioning, data movement, monitoring, and a toolkit for developing additional services based on the same infrastructure, including agreement negotiation, notification mechanisms, trigger services and information aggregation. While the Globus Toolkit remains the de facto standard for building grid solutions, a number of other tools have been built that answer some subset of the services needed to create an enterprise grid.

Father Of The Grid


An exact extract of an article by Amy M. Braverman in the University of Chicago Magazine (April 2004, Volume 96, Number 4).


Computer scientist Ian Foster has developed the software to take shared computing to a global level.
In a bare Research Institutes building room with white, cinder-block walls, Ian Foster sits at a red table holding his laptop, blinds shut to block the window's glare, eyes glazed behind wire-rimmed glasses. "I might not be too articulate today," the Arthur Holly Compton distinguished service professor of computer science warns. "I'm on two hours' sleep." The previous night a West Coast student's paper was due at midnight, Pacific Time, and then, awake anyway, he worked online with some European colleagues. And because the father of grid computing is also - with wife Angela Smyth, MD'00, a hospital's psychiatry fellow - the father of a five- and a six-year-old, he rarely gets to sleep in. So when asked to predict how grid computing will change everyday life in five, ten, 15 years, he thinks for a moment but comes up short. "I'm not feeling very creative right now," he says in the quick cadence of a native New Zealander. But Foster, 45, who heads the distributed systems lab at Argonne National Laboratory, clearly has had more inspired moments, persuading the federal government to invest in several multimillion-dollar grid-technology projects and convincing companies such as IBM, Hewlett-Packard, Oracle, and Sun Microsystems that grids are the answer to complex computational problems - the next major evolution of the internet. Just as the internet is a tool for mass communication, grids are a tool for amplifying computer power and storage space. By linking far-flung supercomputers, servers, storage systems, and databases across existing internet lines, grids allow more numbers to be crunched faster than ever before. Several grid projects exist today, but eventually, Foster says, a huge global grid - "the grid," akin to "the internet" - will perform complex tasks such as designing semiconductors or screening thousands of potential pharmaceutical drugs in an hour rather than a year. Though corporations recently have begun to show interest in grids, research institutions have long been a ripe testing ground, in the same way that the internet sprouted in academia before blossoming in the commercial world. Large projects are already using the technology. The Sloan Digital Sky Survey - an effort at Chicago, Fermilab, and 11 other institutions to map a quarter of the night sky, determining more than 100 million celestial objects' positions and absolute brightness - harnesses computer power from labs nationwide to perform in minutes scans that previously took a week. The National Digital Mammography Archive (NDMA) in the United States and eDiaMoND in the United Kingdom are creating digital-image libraries to hold their respective countries'


scans. With an expected 35 million U.S. mammograms a year, at 160 megabytes per exam, the NDMA web site explains, the annual volume could exceed 5.6 petabytes [a petabyte is 1 million gigabytes] a year, and the minimal daily traffic is expected to be 28 terabytes [a terabyte is 1,024 gigabytes] - traffic that wouldn't be possible without a grid. By combining computer power and storage space from multiple locations, doctors can view a patient's progress over time, compare her with other patient populations, or access diagnostic tools. A similar venture, the Biomedical Informatics Research Network, compiles brain images from different databases so researchers can compare the brains of Alzheimer's patients, for example, to those of healthy people. Still another project is a grid for the Network for Earthquake Engineering Simulation (NEES). An $82 million program funded by the National Science Foundation, NEES seeks to advance earthquake-engineering research and reduce the physical havoc earthquakes create. The grid, to be completed in October, links civil engineers around the country with 15 sites containing equipment such as 4-meter-by-4-meter shake tables or tsunami simulators. Through the grid, engineers building San Francisco's new Bay Bridge tested their design structures remotely to make sure they met the latest earthquake-resistance standards. At Argonne, a NEESgrid partner, an 18-square-inch mini shake table, used for software development and demonstration, sits in material scientist Nestor Zaluzec's office. A researcher in, say, San Diego can activate the mini shake table, moving it quickly back and forth to agitate the 2-foot-tall plastic model sitting on it. Likewise, from his desktop Zaluzec can maneuver video cameras in places like Boulder, Colorado, or Champaign, Illinois, to watch or participate in experiments. At Argonne even some meetings about grids are held using grids. With the Access Grid, developed by Argonne's Futures Lab for remote group collaboration, scientists nationwide convene in a virtual conference room, from large groups such as a 2002 National Science Foundation meeting, where 28 sites popped in, Star Trek-like, on a white Argonne wall, to smaller Thursday test cruises held to keep the system bug-free. At these sessions Access Grid programmers Susanne Lefvert and Eric Olson sit at personal computers, talking with wall-projected images of scientists from other Energy Department labs, including the Princeton Plasma Physics Lab and Lawrence Berkeley National Lab. By now the Access Grid, first used in 1999, has more than 250 research nodes - rooms equipped to connect - on five continents. A major automobile company and some oil and gas companies have developed their own access grids, notes Futures Lab research manager and computer-science doctoral student Mike Papka, SM'02, and Chicago researchers also are experimenting with the technology. Last fall Jonathan Silverstein,


assistant professor of surgery and senior fellow in the joint Argonne/Chicago Computation Institute, along with Chicago anesthesiologist Stephen Small and Argonne/Chicago computer scientist Rick Stevens, won a National Institutes of Health contract to install Access Grid nodes at the U of C Hospitals. Connecting operating rooms, the emergency room, radiology, ambulances, and residents' hand-held tablet PCs, the three-year prototype project could change the way hospitals process information. Students will watch not only real-time operating-room video feeds but also feeds from laparoscopic devices and robotic surgeons. Radiologists will beam three-dimensional X-ray scans to surgeons - minus middlemen and waiting time. "We are in all these complex environments," Silverstein says. The grid allows medical workers literally to share environments, eliminate hand-offs, avoid phone tag - "instead of passing messages between multiple physicians or waiting before taking the next step, we could all meet for one moment and relay necessary information." Then there's the TeraGrid. Launched in 2001 by the National Science Foundation with $53 million, the TeraGrid aims to be "the world's largest, most comprehensive, distributed infrastructure for open scientific research," its web site declares. Beginning with five sites - Argonne; the University of Illinois Urbana-Champaign; the University of California, San Diego; the California Institute of Technology; and the Pittsburgh Supercomputing Center - the project has since picked up four more partners. To be finished by late September, TeraGrid executive director Charlie Catlett says, it will have 20 teraflops (a teraflop equals a trillion operations per second) of computing power and a petabyte of storage space. Many of its sites, the web page says, already boast a cross-country network backbone four times faster than the fastest research networks currently in existence. The TeraGrid aims to revolutionize the speed at which science operates. The multi-institutional MIMD Lattice Computation collaboration, for instance, which tests quantum chromodynamic theory and helps interpret high-energy accelerator experiments, uses more than 2 million processor hours of computer time per year - and needs more. Another project, NAMD, a parallel molecular dynamics code designed to simulate large biomolecular systems, has maxed out the fastest system available. On the TeraGrid, already used by some projects, such research can move forward. Sharing resources - a practice known as distributed computing - goes back to computing's early days. In the late 1950s and early 1960s researchers realized that the machines, then costing tens or even hundreds of thousands of dollars, needed to be more efficient. Because they spent much time idly waiting for human input, the researchers reasoned, multiple users could share them by doling out that unemployed power. Today


computers are cheaper, but they're still underutilized - five percent usage is normal, Foster says - which is one reason many companies connect their computers to form unified networks. In a sense grids are simply another variety of distributed computing, now used in many forms. Cluster computing, for example, links multiple PCs to replace unwieldy mainframes or supercomputers. In peer-to-peer computing, such as Napster, users who have downloaded specific software can connect to each other and share files. And there's internet computing, most notably SETI@home, a virtual supercomputer based at the University of California, Berkeley, that analyzes data from Puerto Rico's Arecibo radio telescope to find signs of extraterrestrial intelligence. PC users download SETI@home's screen-saver program, and when their computers are otherwise idle they retrieve data from the internet and send the results to a central processing system. But a lot had to happen between the grid's earliest inklings and its current test beds. Foster, who switched from studying math and chemistry to computer science at New Zealand's University of Canterbury before earning a doctorate in the field at London's Imperial College, came to Argonne in 1989. Programming specialized languages for computing chemistry codes, he used parallel networks, similar to clusters. "High-speed networks were starting to appear," he writes in the April 2003 Scientific American, "and it became clear that if we could integrate digital resources and activities across networks, it could transform the process of scientific work." Indeed research was occurring more and more on an international scale, with scientists from different institutions trying to share data that was growing exponentially. In 1994 Foster refocused his research to distributed computing. With Steven Tuecke, today the lead software architect in Argonne's distributed systems laboratory, and Carl Kesselman, now director of the Center for Grid Technologies at the University of Southern California's Information Sciences Institute, he began the Globus Project, a software system for international scientific collaboration. In the same way that internet protocols became standard for the web, creating a common language and tools, they envisioned Globus software that would link sites into a virtual organization, with standardized methods to authenticate identities, authorize specific activities, and control data movement. The concept was quickly put to use. At a 1995 supercomputing conference Rick Stevens, who also directs Argonne's math and computer-science division, and Thomas A. DeFanti, director of the University of Illinois-Chicago's Electronic Visualization Lab, headed a prototype project, called I-WAY (Information Wide Area Year), that linked 17 high-speed research networks for two weeks. Foster's team developed the software that, he writes in Scientific American, knitted the sites into a single virtual system, so users could log on once, locate suitable computers, reserve time, load application codes, and then monitor their execution. Scientists performed computationally complicated


simulations such as colliding neutron stars and moving cloud patterns around the planet. "It was the Woodstock of the grid," Larry Smarr, the conference's program chair and now director of the California Institute for Telecommunications and Information Technology, told the New York Times last July, "everyone not sleeping for three days, running around and engaged in a kind of scientific performance art." The experience inspired much enthusiasm - and funding. The U.S. Defense Advanced Research Projects Agency gave the Globus Project $800,000 a year for three years. In 1997 Foster's team unveiled the first version of the Globus Toolkit, the software that does the knitting. The National Science Foundation, NASA, and the Energy Department began grid projects, with Globus underlying them all. And while Foster and his crew have used an open-source approach to develop the technology, making the software freely available and its code open for outside programmers to read and modify, in 1998 he and his colleagues also began the Global Grid Forum, a group that meets three times a year to adopt basic language and infrastructure standards. Such standards, Foster writes in "What is the Grid?" (July 2002), allow users to collaborate with any interested party and thus to create something more than "a plethora of balkanized, incompatible, noninteroperable distributed systems." The Globus Toolkit, named the most promising new technology by R&D Magazine in 2002, a top-ten emerging technology by Technology Review in 2003, and given a Chicago Innovation Award last year by the Sun-Times, still needs work to perfect security and other measures. But the open-source model, much like that used to develop the internet, has proved useful in ferreting out bugs and making improvements. When physicists overloaded one grid system by submitting tens of thousands of tasks at once, for example, the University of Wisconsin helped design applications to manage a grid's many users. As the technology moves from research institutions, whose data is stored mostly in electronic files, to corporations, which favor databases, the UK's e-Science program is developing ways to handle the different systems. Without the open-source approach, Foster says, the software might not have become the de facto standard for most grid projects, and IBM, the Globus Toolkit's sole corporate funder for the past three years, wouldn't have taken such an active role. "Success of the grid depends on everyone adopting it," he says, "so it's counterproductive to work in private." Brokerage firm Charles Schwab uses a grid developed by IBM to give its clients real-time investment advice. The computer company also has projects under way with Morgan Stanley and Hewitt Associates. For Foster - the British Computer Society's 2002 Lovelace Medal winner and a 2003 American Association for the Advancement of Science fellow - such corporate ventures are a critical step in


making grids, already a powerful scientific tool, important in everyday life, when the grid will be as common as the internet - and as seamless. In the 1960s MIT's Fernando Corbató, whom Foster calls the father of time-sharing operating systems, described shared computing as a utility, meaning computer access would operate like water, gas, and electricity, where a client would connect and pay by usage amount. Today the grid is envisioned similarly, and "utility computing" is used synonymously. But when grids will become so ubiquitous remains a big question. Even on a full night's sleep Foster - today's father figure - hesitates to guess beyond "that's some way out," happily encouraging his virtual child but not wanting to impose unrealistic expectations. "It's a process," he says. Although large grids are running in both the United States and Europe, and Foster skipped the March Global Grid Forum meeting in Berlin to talk up grids in his homeland New Zealand, "we haven't nailed down all the standards. There's more to be done." It's a global, multi-industry path he's forging, and if he can't predict where the next generation will head, he's prepared the grid to lead the way.

Current Projects And Applications


Grids offer a way to solve grand challenge problems like protein folding, financial modeling, earthquake simulation, and climate/weather modeling. Grids offer a way of using information technology resources optimally inside an organization. They also provide a means for offering information technology as a utility to commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water. Grid computing is presently being applied successfully by the National Science Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb Co., and American Express. One of the most famous cycle-scavenging networks is SETI@home, which was using more than 3 million computers to achieve 23.37 sustained teraflops (979 lifetime teraflops) as of September 2001. As of May 2005, Folding@home had achieved peaks of 186 teraflops on over 160,000 machines. As of August 2009, Folding@home achieves more than 4 petaflops on over 350,000 machines.


The European Union has been a major proponent of grid computing. Many projects have been funded through the Framework Programme of the European Commission. Many of the projects are highlighted below, but two deserve special mention: BEinGRID and Enabling Grids for E-sciencE. BEinGRID (Business Experiments in Grid) is a research project partly funded by the European Commission as an integrated project under the Sixth Framework Programme (FP6) sponsorship program. Started on June 1, 2006, the project will run 42 months, until November 2009. The project is coordinated by Atos Origin. According to the project fact sheet, their mission is to establish effective routes to foster the adoption of grid computing across the EU and to stimulate research into innovative business models using grid technologies. To extract best practice and common themes from the experimental implementations, two groups of consultants are analyzing a series of pilots, one technical, one business. The results of these cross-analyses are provided by the website it-tude.com. The project is significant not only for its long duration, but also for its budget, which at 24.8 million euros is the largest of any FP6 integrated project. Of this, 15.7 million is provided by the European Commission and the remainder by its 98 contributing partner companies. Another well-known project is distributed.net, which was started in 1997 and has run a number of successful projects in its history. The NASA Advanced Supercomputing facility (NAS) has run genetic algorithms using the Condor cycle scavenger running on about 350 Sun and SGI workstations. Until April 27, 2007, United Devices operated the United Devices Cancer Research Project based on its Grid MP product, which cycle-scavenges on volunteer PCs connected to the internet. As of June 2005, the Grid MP ran on about 3,100,000 machines. The Enabling Grids for E-sciencE project, which is based in the European Union and includes sites in Asia and the United States, is a follow-up project to the European DataGrid (EDG) and is arguably the largest computing grid on the planet. This, along with the LHC Computing Grid (LCG), has been developed to support the experiments using the CERN Large Hadron Collider. The LCG project is driven by CERN's need to handle huge amounts of data, where storage rates of several gigabytes per second (10 petabytes per year) are required. A list of active sites participating within LCG can be found online, as can real-time monitoring of the EGEE infrastructure. The relevant software and documentation is also publicly accessible.


Fastest Virtual Supercomputers

BOINC: 525 teraflops (as of 4 June 2007)

Definitions
Today there are many definitions of grid computing:

The definitive definition of a grid is provided by Ian Foster in his article "What is the Grid? A Three Point Checklist". The three points of this checklist are:

- Computing resources are not administered centrally.
- Open standards are used.
- Non-trivial quality of service is achieved.

Plaszczak and Wellner define grid technology as "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations."

IBM defines grid computing as "the ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the internet. A grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across multiple administrative domains based on their (the resources') availability, capacity, performance, cost and users' quality-of-service requirements".

An earlier example of the notion of computing as a utility was given in 1965 by MIT's Fernando Corbató. Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company".

Buyya (Dr. Rajkumar Buyya is a Senior Lecturer and the StorageTek Fellow of Grid Computing in the Department of Computer Science and Software Engineering at the University of Melbourne, Australia) defines a grid as "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements".

CERN, one of the largest users of grid technology, talks of the grid as "a service for sharing computer power and data storage capacity over the internet."

Pragmatically, grid computing is attractive to geographically distributed non-profit collaborative research efforts like the NCSA bioinformatics grids such as BIRN: external grids. Grid computing is also attractive to large commercial enterprises with complex computational problems who aim to fully exploit their internal computing power: internal grids.

ServePath.com defines grid computing as "the simultaneous application of multiple computers to a problem that typically requires access to significant amounts of data or a large number of computer processing cycles. Grid computing is quickly gaining popularity due to its ability to maximize the efficiency of computing resources as well as its ability to solve large problems with considerably less computing power."

Grids can be categorized with a three-stage model of departmental grids, enterprise grids and global grids. These correspond to a firm initially utilizing resources within a single group, e.g. an engineering department connecting desktop machines, clusters and equipment. This progresses to enterprise grids, where non-technical staff's computing resources can be used for cycle-stealing and storage. A global grid is a connection of enterprise and departmental grids that can be used in a commercial or collaborative manner.

But What Does "High Performance" Mean?

Performance is measured in flops (floating-point operations per second). A flop corresponds to a basic computational operation, like adding two numbers together. A gigaflop is a billion flops, or a billion operations per second.
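To make that concrete, here is a small worked example (my own illustration; the 2n³ operation count is the standard figure for dense matrix multiplication):

```latex
% Multiplying two n x n matrices takes roughly 2n^3 floating-point operations.
% For n = 1000 on a machine sustaining 1 gigaflop (10^9 operations per second):
\text{operations} \approx 2n^{3} = 2\times(10^{3})^{3} = 2\times10^{9},
\qquad
t \approx \frac{2\times10^{9}}{10^{9}\ \text{ops/s}} = 2\ \text{seconds}.
```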

The Death Of Distance


Computing grids use international networks to link computing resources from all over the world. This means you can sit in France and use computers in the US, or work from Australia using computers in Taiwan. Such international grids are possible today because of the impressive development of networking technology. Ten years ago, it would have been impractical to send large amounts of data across the globe for processing on other computing resources, because of the time taken to transfer the data. Today, all this is possible and more! Pushed by the internet economy and the widespread penetration of optical fibers in telecommunications systems, the performance of wide area networks has been doubling every nine months or so over the last few years. That translates to a 3000x improvement in 15 years. Imagine if cars had made the same improvements in speed since 1985 - you could easily go into orbit by pressing down hard on the accelerator!

Faster! Faster!

Some researchers have computing needs that make even the fastest connections seem slow: some scientists need even higher-speed connectivity, up to tens of gigabits per second (Gbps); others need ultra-low "latency", meaning minimal delay when sending data to remote colleagues in "real time". Other researchers want "just-in-time" delivery of data across a grid, so that complicated calculations requiring constant communication between processors can be performed. To avoid communication bottlenecks, grid developers also have to find ways to compensate for failures, like transmission errors or PC crashes. To meet such critical requirements, several high-performance networking issues have to be solved, including the optimization of transport protocols and the development of technical solutions such as high-performance Ethernet switching.

Secure Access
Secure access to shared resources is one of the most challenging areas of grid development. To ensure secure access, grid developers and users need to manage three important things:


- Access policy: what is shared? Who is allowed to share? When can sharing occur?
- Authentication: how do you identify a user or resource?
- Authorization: how do you determine whether a certain operation is consistent with the rules?

Grids need to keep track of all this information efficiently, and it may change from day to day. This means that grids need to be extremely flexible, and have a reliable accounting mechanism. Ultimately, such accounting will be used to decide pricing policies for using a grid. These accounting challenges are not new - the same questions arise whenever you use your credit card in a café. But grid users must share resources, and so grids require new solutions. Imagine if the owner of a café were to lend some tables to another café... how would you securely track customers, orders and payments?
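The three questions above can be illustrated with a toy sketch (entirely hypothetical data structures; real grids answer these questions with certificates and middleware rather than in-memory dictionaries):

```python
# Toy separation of access policy, authentication and authorization.
from datetime import datetime

POLICY = {  # access policy: what is shared, by whom, and when (18:00-23:59)
    "cluster-A": {"users": {"alice", "bob"}, "hours": range(18, 24)},
}
CREDENTIALS = {"alice": "cert-123"}   # stand-in for X.509-style certificates

def authenticate(user, credential):
    """Authentication: is this user who they claim to be?"""
    return CREDENTIALS.get(user) == credential

def authorize(user, resource, when=None):
    """Authorization: does the policy allow this operation right now?"""
    when = when or datetime.now()
    rule = POLICY.get(resource)
    return bool(rule) and user in rule["users"] and when.hour in rule["hours"]

if authenticate("alice", "cert-123") and authorize("alice", "cluster-A"):
    print("job accepted")   # an accounting layer would record this use
```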

Security And Trust

The issue of security is linked to trust: you may trust the other users, but do you trust that your data and applications are securely protected on their shared machines? Without adequate security, someone could read or modify your data - hence the warnings about security when you use your credit card on the internet. The issue of security concerns all information technologies and is taken very seriously. New security solutions are constantly being developed, including sophisticated data encryption techniques. But it is a never-ending race to stay ahead of malicious hackers.

Resource Use
Grids allow you to efficiently and automatically spread your work across many computer resources. The result? Your jobs are finished much faster. Imagine if you had to do 1000 difficult maths questions. You could do them yourself, or you could use a computing grid. If you used a grid of 100 computers, you would give one question or "job" to each computer. When a computer finished one "job", it would automatically ask for another. In this way, your 1000 questions could be finished in a flash, with all 100 computers working at full efficiency. But grids are shared resources, right? So what happens when a queue of people is waiting to use a computing grid? How do you decide whose "job" is next in line?

Middleware To The Rescue

Computing grids rely on middleware - special grid computing software - to allocate jobs efficiently. Middleware uses information about the different "jobs" submitted to each queue to calculate the optimal allocation of resources. To do this, it ideally needs to know how many jobs are in each queue, and how long each job will take. This doesn't work perfectly yet, but then, neither did the web in its early days (remember when they called it the World Wide Wait?!)
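The 1000-questions example from the previous section can be sketched as a shared work queue (a toy illustration of my own, using threads on one machine to stand in for 100 grid computers):

```python
# 1000 "jobs" go onto a shared queue; 100 simulated "computers" each pull
# a new job as soon as they finish the previous one, so none sit idle.
import queue
import threading

jobs: "queue.Queue[int]" = queue.Queue()
for question in range(1000):
    jobs.put(question)

def worker():
    while True:
        try:
            q = jobs.get_nowait()      # ask for the next job
        except queue.Empty:
            return                     # queue drained; this worker is done
        _ = q * q                      # stand-in for a difficult maths question
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(100)]
for t in threads:
    t.start()
jobs.join()                            # blocks until all 1000 jobs are done
print("all jobs finished")
```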

Resource Sharing
Resource sharing is the crux of grid philosophy - but grid computing is not about getting something for nothing. Grid computing aims to involve everyone in the advantages of resource sharing and the benefits of increased efficiency.

- Grids give you shared access to extra computing power.
- A grid can also give you direct access to remote software, computers and data.
- A grid can even give you access and control of remote sensors, telescopes and other devices that do not belong to you.

But Would You Trust Your Computer To A Complete Stranger?


What about your car? A computing grid is a bit like a car pool - sometimes you share your car with other people; other times, they share their car with you. These people could be strangers, but if they are part of the same car pool organization as you, you will generally trust each other at some level. If you are always late, the others will complain and eventually kick you out of the car pool. So there is trust, and there are mechanisms to deal with breach of trust. Grids are kind of the same. Grid resources are owned by many different people who run different software, exist in different administrative domains, and use different systems for security and access. This presents a major challenge. For example, when someone decides to share their computing resources on a grid, they will normally put conditions on the use of those resources, specifying limits on which resources can be used when, and what can be done with them.

Open Standards
By standardizing the way we create computing grids, we're one step closer to making sure all the smaller grids can connect together to form larger, more powerful grid computing resources. "Standard" can often be equated with "average" or "boring": how can you innovate or invent when you're bound by standards and regulations? How can you push the boundaries when you're stuck inside a box? Yet how can you create something on a grand scale - something that can slot together with other grand things - unless you create something interoperable? Something standard. Adopting open, common standards for grid computing might sound obvious. But when was the last time you needed an inch-based screw and only had metric screws available? And have you ever blown up a 120 V machine by accidentally plugging it into 240 V mains? So much for "universal" standards! The sticky question is: which standards should be used for grid computing? There are hundreds of software developers working to create dozens of different grids, and each of these developers has their own views on what makes a good standard. While they work, technology continues to evolve and provides new tools that need to be integrated within the existing grid machinery, which may require revising the standards.

Who Is In Charge Of Grid Standards?


The Open Grid Forum is a standards body for the grid community. With more than 5,000 volunteer members, this body is a significant force for setting standards and driving community developments.

The Middleware
"middleware" is the software that organizes and integrates the resources in a grid. Middleware is made up of many software programs, containing hundreds of thousands of lines of computer code. Together, this code automates all the "machine to machine" (m2m) interactions that create a single, seamless computational grid.

Agents, Brokers And Striking Deals

Middleware automatically negotiates deals in which resources are exchanged, passing from a grid resource provider to a grid user. In these deals, some middleware programs act as "agents" and others as "brokers". Agent programs present "metadata" (data about data) that describes users, data and resources. Broker programs undertake the M2M negotiations required for user authentication and authorization, and then strike the "deals" for access to, and payment for, specific data and resources. Once a deal is set, the broker schedules the necessary computational activities and oversees the data transfers. At the same time, special "housekeeping" agents optimize network routings and monitor quality of service. All of this occurs automatically, in a fraction of the time that it would take humans at their computers to do manually.
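The sketch below illustrates this agent/broker division of labour. The class names, the "credential" check and the toy "cost" field are all invented for illustration; this is not the API of any real middleware:

```python
# Hypothetical sketch of the agent/broker pattern described above.

class Agent:
    """Presents metadata describing a user, dataset or resource."""
    def __init__(self, metadata):
        self.metadata = metadata            # e.g. {"cpus": 64, "cost": 0.1}

class Broker:
    """Negotiates access between a user agent and resource agents."""
    def __init__(self, resource_agents):
        self.resources = resource_agents

    def authenticate(self, user_agent):
        # Real middleware would verify certificates here (e.g. X.509).
        return "credential" in user_agent.metadata

    def strike_deal(self, user_agent, cpus_needed):
        if not self.authenticate(user_agent):
            raise PermissionError("authentication failed")
        # Pick the cheapest resource that satisfies the request.
        candidates = [r for r in self.resources
                      if r.metadata["cpus"] >= cpus_needed]
        if not candidates:
            return None
        return min(candidates, key=lambda r: r.metadata["cost"])

user = Agent({"credential": "alice"})
grid = Broker([Agent({"cpus": 32, "cost": 0.20}),
               Agent({"cpus": 128, "cost": 0.15})])
deal = grid.strike_deal(user, cpus_needed=64)
print(deal.metadata)   # {'cpus': 128, 'cost': 0.15}
```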

Delving Inside Middleware


The middleware layer itself contains many layers. For example, middleware includes a layer of "resource and connectivity protocols", and a higher layer of "collective services".


Resource and connectivity protocols handle all grid-specific network transactions between different computers and grid resources. For example, computers contributing to a particular grid must recognize grid-relevant messages and ignore the rest. This is done with communication protocols, which allow the resources to communicate with each other, enabling exchange of data, and authentication protocols, which provide secure mechanisms for verifying the identity of both users and resources.

The collective services are also based on protocols: information protocols, which obtain information about the structure and state of the resources on a grid, and management protocols, which negotiate uniform access to those resources. Collective services include:

- Updating directories of available resources
- Brokering resources (which, like stock broking, is about negotiating between those who want to "buy" resources and those who want to "sell")
- Monitoring and diagnosing problems
- Replicating data so that multiple copies are available at different locations for ease of use
- Providing membership/policy services for tracking who is allowed to do what, and when
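To make one of these services concrete, here is a toy sketch of a replica catalogue that records where copies of each dataset live. The function names, dataset name and site URLs are invented; this is not a real grid API:

```python
# Toy illustration of one collective service: a replica catalogue
# mapping each dataset to the sites that hold a copy of it.

replica_catalog = {}   # dataset name -> list of site URLs

def register_replica(dataset, site_url):
    """Record that a copy of `dataset` exists at `site_url`."""
    replica_catalog.setdefault(dataset, []).append(site_url)

def locate(dataset):
    """Return all known replicas of a dataset (empty list if none)."""
    return replica_catalog.get(dataset, [])

register_replica("lhc-run42.dat", "gsiftp://cern.example.org/data/run42")
register_replica("lhc-run42.dat", "gsiftp://fnal.example.org/data/run42")
print(locate("lhc-run42.dat"))
```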

Globus Toolkit
The Globus Toolkit is a popular example of grid middleware. It's a set of tools for constructing a grid, covering security measures, resource location, resource management, communications and so on. Many major grid projects use the Globus Toolkit, which is being developed by the Globus Alliance, a team primarily involving Ian Foster's team at Argonne National Laboratory and Carl Kesselman's team at the University of Southern California in Los Angeles. Many of the protocols and functions defined by the Globus Toolkit are similar to those in networking and storage today, but have been optimized for grid-specific deployments.


Globus Includes Programs Such As:

- GRAM (Globus Resource Allocation Manager): figures out how to convert a request for resources into commands that local computers can understand
- GSI (Grid Security Infrastructure): authenticates users and determines their access rights
- MDS (Monitoring and Discovery Service): collects information about resources such as processing capacity, bandwidth capacity, type of storage, and so on
- GRIS (Grid Resource Information Service): queries resources for their current configuration, capabilities and status
- GIIS (Grid Index Information Service): coordinates arbitrary GRIS services
- GridFTP (Grid File Transfer Protocol): provides a high-performance, secure and robust data transfer mechanism
- Replica Catalog: provides the location of replicas of a given dataset on a grid
- Replica Management System: manages the Replica Catalog and GridFTP, allowing applications to create and manage replicas of large datasets
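For a flavour of how these components are used in practice, the sketch below shells out to globus-url-copy, the toolkit's GridFTP transfer client. The host name and file paths are invented for illustration:

```python
# Sketch: invoking globus-url-copy (the Globus Toolkit's GridFTP client)
# to fetch a remote file. Host name and paths are made up.
import subprocess

subprocess.run(
    ["globus-url-copy",
     "gsiftp://grid.example.org/data/results.dat",   # remote GridFTP source
     "file:///tmp/results.dat"],                     # local destination
    check=True,
)
```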

There are two main reasons for the strength and popularity of the Globus Toolkit:

1. Grids need to support a wide variety of applications created according to different programming paradigms. Rather than providing a uniform programming model for grid applications, the Globus Toolkit takes an "object-oriented approach", providing a bag of services so that developers can choose the services that best meet their needs. The tools can also be introduced one at a time. For example, an application can use GRAM or GRIS without necessarily having to use the Globus security or replica management systems.

2. The Globus Toolkit is available under an "open-source" licensing agreement, which means anyone is free to use or improve the software. This is similar to the World Wide Web and the Linux operating system.


National Grids
National grids like those listed below combine national computing resources to create powerful grid computing resources: D-Grid, DutchGrid, EneaGrid, the Fermilab Computing Division, Grid-Ireland, HunGrid, the National Grid Service, NERSC, NorGrid, SweGrid, TeraGrid, the Thai National Grid and TWGrid.

D-Grid (Germany): The first D-Grid projects started in September 2005 with the goal of developing a distributed, integrated resource platform for high-performance computing and related services, to enable the processing of large amounts of scientific data and information.

DutchGrid (The Netherlands): DutchGrid is the platform for grid computing and technology in the Netherlands. Open to all institutions for research and test-bed activities, DutchGrid aims to coordinate various grid deployment efforts and to offer a forum for the exchange of experiences on grid technologies.

EneaGrid (Italy): EneaGrid uses grid technologies to provide an integrated production environment, including all the high-performance and high-throughput computational resources available in ENEA, the Italian national agency for new technologies, energy and the environment. Interoperability with other grid infrastructures is currently in operation.

Fermilab Computing Division (U.S.): FermiGrid united Fermilab's computing resources into a single grid infrastructure, changing the way that computing was done at the lab by improving efficiency and making better use of resources. It is now involved in developing and supporting innovative computing solutions and services for Fermilab.

Grid-Ireland (Ireland): Grid-Ireland fosters and promotes grid activities in Ireland, involving partners across the country.

HunGrid (Hungary): HunGrid is the first official Hungarian virtual organization (VO) of EGEE. Its goal is to allow grid users at Hungarian academic and educational institutes to perform the computing activities relevant to their research; HunGrid thus functions as a catch-all VO for all Hungarian participants that do not (yet) have an established VO in their field of research. It is also an EGEE testing environment for Hungarian research communities interested in starting their own virtual organizations.

National Grid Service (UK): The NGS aims to provide coherent electronic access for UK researchers to all computational and data-based resources and facilities.

NERSC (National Energy Research Scientific Computing Center, U.S.): Users can access several NERSC resources via Globus grid interfaces using X.509 grid certificates. NERSC is part of the Open Science Grid (OSG) and is available to select OSG virtual organizations for compute and storage resources.

NorGrid (Norway): NorGrid aims to establish and maintain a national grid infrastructure in Norway. NorGrid is the Norwegian component in the third phase of the EGEE project.

SweGrid (Sweden): SweGrid is a Swedish national computational resource, consisting of 600 computers in six clusters at six different sites across Sweden. The sites are connected through the high-performance GigaSunet network.

TeraGrid (U.S. supercomputing grid): TeraGrid aims to build and deploy the world's largest, fastest and most comprehensive distributed infrastructure for open scientific research. It involves partners across the U.S.

Thai National Grid Project (Thailand): The Thai National Grid Project is a national initiative on grid computing funded by the Royal Thai Government through the Software Industry Promotion Agency of the Ministry of Information and Communication Technology.

TWGrid (Taiwan): TWGrid is the Taiwanese grid and a member of global grid projects including EGEE and WLCG. Coordinated by Academia Sinica Grid Computing, TWGrid provides grid-related technology and infrastructure support for the LHC experiments in Taiwan, as well as working to produce new grid-powered science applications to further international e-science advances.

International Grids
International grids cross national boundaries, spanning cultures, languages, technologies and more to create international resources and power global science using global computing.


AP Grid, D4Science, DEISA, EELA, EGEE, EGI_DS, EUAsiaGrid, EU-IndiaGrid, GÉANT, GridPP, LCG, NextGRID, NorduGrid, OGF-Europe, Open Grid Forum, Open Science Grid, PRAGMA and WINDS

AP Grid (Asia-Pacific Grid): AP Grid is a partnership for grid computing in the Asia-Pacific region, aiming to share technologies, resources and knowledge in order to build, nurture and promote grid technologies and applications. Partners come from 15 countries in the Asia-Pacific and beyond.

D4Science (Distributed Collaboratories Infrastructure on Grid Enabled Technology 4 Science): D4Science aims to create grid-based and data-centric e-infrastructures to support scientific research. It is co-funded by the European Commission until 2010 and involves partners across Europe.

DEISA (Distributed European Infrastructure for Supercomputing Applications): DEISA combines the power of supercomputing centres across Europe to accelerate scientific research.

EELA (E-science for Europe and Latin America): EELA aims to provide grid facilities to promote scientific collaboration between Europe and Latin America, aiming to ensure the long-term sustainability of the e-infrastructure.

EGEE (Enabling Grids for E-sciencE): EGEE is the largest multi-disciplinary grid infrastructure in the world, bringing together more than 120 organisations to provide scientific computing resources to the European and global research community. EGEE comprises 250 sites in 48 countries and more than 68,000 CPUs available to some 8,000 users, 24 hours a day, 7 days a week.

EGI_DS (European Grid Initiative Design Study): The European Grid Initiative Design Study aims to establish a sustainable grid infrastructure in Europe. Driven by the needs and requirements of the research community, it is expected to enable the next leap in research infrastructures, thereby supporting collaborative scientific discoveries in the European Research Area. EGI_DS includes partners across Europe.

EUAsiaGrid (collaboration between Europe and Asia): EUAsiaGrid aims to pave the way towards an Asian e-science grid infrastructure, in synergy with the other European grid initiatives in Europe and Asia.

EU-IndiaGrid (collaboration between Europe and India): EU-IndiaGrid will bring together over 500 multidisciplinary organisations to build a grid-enabled e-science community aiming to boost R&D innovation across Europe and India.

GÉANT (pan-European gigabit research network): GÉANT provides networking infrastructure to support researchers, as well as an infrastructure for network research. GÉANT aims for high-speed connectivity, geographical expansion, global connectivity and guaranteed quality of service. It comprises 27 European national research and education networks.

GridPP (grid for UK particle physics): GridPP is a collaboration of particle physicists and computing scientists from the UK and CERN, who are building a grid for particle physics. The main objective is to develop and deploy a large-scale science grid in the UK for use by the worldwide particle physics community.

LCG (Worldwide LHC Computing Grid): The mission of the LHC Computing Grid project (LCG) is to build and maintain a data storage and analysis infrastructure for the entire high-energy physics community that will use the Large Hadron Collider.

NextGRID (supporting mainstream use of grids): NextGRID aims to enable the widespread use of grids by research, industry and the ordinary citizen, thus creating a dynamic marketplace for new services and products. It is an EU-funded project with multiple partners.

NorduGrid (grids in the Nordic region): NorduGrid is a grid research and development collaboration aiming at the development, maintenance and support of the free grid middleware known as the Advanced Resource Connector (ARC). The collaboration was established by five Nordic academic institutes and is based upon a memorandum of understanding.

Open Grid Forum (international grid standards): The Open Grid Forum is a community-initiated forum of more than 5,000 people interested in distributed computing and grid technologies. OGF aims to promote and support grid technologies via the creation and documentation of "best practices" - technical specifications, user experiences, and implementation guidelines. It involves more than 400 organizations from 50 countries.

OGF-Europe (European and international grid standards): OGF-Europe works closely with the Open Grid Forum and plays a key role in influencing the drive towards global standardisation efforts and in bringing best practices into the European computing environment.


Open Science Grid (open grid infrastructure for collaborative science): The Open Science Grid consortium provides an open grid infrastructure for science in the U.S. and beyond. OSG combines resources at many U.S. labs and universities and provides access to shared resources for the benefit of scientific applications.

PRAGMA (Pacific Rim Applications and Grid Middleware Assembly): PRAGMA is an open organization in which Pacific Rim institutions collaborate to develop grid-enabled applications and to deploy the infrastructure throughout the Pacific region. PRAGMA aims to enhance current collaborations and connections, build new collaborations, and formalize resource-sharing agreements.

WINDS (grid collaboration in Europe, Latin America and the Caribbean): The www.winds-lac.eu platform, maintained by the WINDS-LA and WINDS-Caribe projects, aims to further develop and support ICT research and development collaboration between Europe, Latin America and the Caribbean by identifying common needs, research issues and opportunities for cooperation, promoting research excellence from the regions in Europe, and proposing a long-term cooperation strategy in the field of ICT research.

High-Throughput Problems
High-throughput applications are problems that can be divided into many independent tasks. Computing grids can be used to schedule these tasks, dealing them out to the different computer processors in the grid. As soon as a processor finishes one task, the next task arrives. In this way, hundreds of tasks can be performed in a very short time.
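A minimal sketch of this task-farming pattern follows, with local worker processes standing in for grid nodes; the analyse function is a made-up placeholder for one independent unit of work:

```python
# Minimal task-farming sketch: a pool of workers pulls tasks from a
# queue, and each worker receives a new task as soon as it finishes one.
# Local processes stand in here for grid worker nodes.
from concurrent.futures import ProcessPoolExecutor, as_completed

def analyse(task_id):
    """Made-up placeholder for one independent unit of work."""
    return task_id, sum(i * i for i in range(100_000))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(analyse, t) for t in range(1000)]
        for future in as_completed(futures):
            task_id, result = future.result()
            # Results arrive in whatever order the workers finish.
```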

Examples of high-throughput applications include:

- The analysis of thousands of particle collisions in a bid to understand more about our universe, as in the Large Hadron Collider Computing Grid
- The analysis of thousands of molecules in a bid to discover a drug candidate against a specific malaria protein, as part of the grid-enabled WISDOM project
- The analysis of thousands of protein-folding configurations in a bid to discover more efficient ways of packaging drug proteins, using Rosetta software on the Open Science Grid
- The use of volunteer computing to power applications including SETI@home, which aids the search for extraterrestrial intelligence; FightAIDS@Home, which models the evolution of drug resistance and helps to design new anti-HIV drugs; and BRaTS@Home, which works on gravitational ray tracing

The tasks in these "@home" projects are totally independent, so it doesn't matter whether some tasks take a long time: after a "time-out" period, unfinished tasks are simply sent elsewhere to be processed.

High-Performance Problems
When people talk about "high-performance computing" or HPC, they're generally talking about supercomputing. Supercomputers are different from computing grids: where grids link computers that are distributed around an institution, country or the world, a supercomputer is one giant computer in a single room. Supercomputers generally deal with computer-centric problems; the secret to solving these problems is "teraflops": as many as possible. Grid computing allows large computational resources to be combined, helping scientists to tackle problems that cannot be solved on a single system, or to solve problems much more quickly.


Examples of these supercomputing grids are DEISA in Europe and TeraGrid in the U.S. Typical HPC grid applications include:

- Astrophysics (e.g., simulations of a supernova explosion or a black hole collision)
- Automotive/aerospace industry (e.g., simulations of a car crash or a new airplane design)
- Climate modeling (e.g., simulations of a tornado or climate prediction)
- Economics (e.g., modeling the world economy)

Grid Computing In 30 Seconds


Grid computing is a service for sharing computer power and data storage capacity over the internet. How is grid computing different from the World Wide Web? Simple: grid computing uses the internet to help us share computer power, while the web uses the internet to help us share information. Grid computing is making big contributions to scientific research, helping scientists around the world to analyze and store massive amounts of data.

The Dream
The grid computing dream began with talk of creating an all-powerful "grid": one grid made up of many smaller grids joined together, forming a global network of computers that can operate as one vast computational resource. In grid computing reality, there are already hundreds of grids around the world, each one created to help a specific group of researchers or a particular group of users. And across the world, researchers and software engineers are working to bring "the grid" closer to achieving the dream.


''Gridifying'' Your Application


An application that ordinarily runs on a stand-alone PC must be "gridified" before it can run on a grid. Just as applications are "webified" to run in a web browser, grid users need to "gridify" their applications to run on a grid. Once an application is gridified, thousands of people will be able to use it and run it trouble-free on interoperable grids (though, like most software, there will always be a few bugs here and there). "Gridification" means adapting an application to include new layers of grid-enabled software. For example, a gridified data analysis application will be able to:

- Obtain the necessary authentication credentials to open the files it needs
- Query a catalogue to determine where the files are and which grid resources are able to do the analysis
- Submit requests to the grid, asking to extract data, initiate computations, and provide results
- Monitor progress of the various computations and data transfers, notifying the user when analysis is complete, and detecting and responding to failures (collective services)
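Put together, a gridified analysis might follow the workflow sketched below. Every function here is a hypothetical stub standing in for real middleware calls; none of them belong to an actual grid toolkit:

```python
# Hypothetical sketch of a gridified analysis workflow; the helper
# functions are stubs standing in for real middleware calls.

def obtain_credential():
    return "x509-proxy"                                   # stand-in credential

def query_catalogue(dataset):
    return ["site-a.example.org", "site-b.example.org"]   # made-up replicas

def submit_job(site, dataset, credential):
    print(f"submitting analysis of {dataset} to {site}")
    return {"site": site, "done": True, "result": 42}     # pretend it ran

def run_gridified_analysis(dataset):
    credential = obtain_credential()                  # 1. authenticate
    replicas = query_catalogue(dataset)               # 2. locate data/resources
    job = submit_job(replicas[0], dataset, credential)  # 3. submit to the grid
    if not job["done"]:                               # 4. monitor; resubmit on failure
        job = submit_job(replicas[1], dataset, credential)
    return job["result"]

print(run_gridified_analysis("medical-images.tar"))
```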

Computational Problems
There are many different ways to describe computational problems. Here are a few that are important to grid technology:

Parallel Calculations:
Parallel calculations can be split into many smaller sub-calculations. This means that each sub-calculation can be worked on by a different processor, so that many sub-calculations can be worked on "in parallel". This allows you to speed up your computation.


Embarrassingly Parallel Calculations:


A calculation is embarrassingly parallel when each sub-calculation is independent of all the other calculations. For example, analyzing a large databank of medical images is embarrassingly parallel, since each image is independent of the others.
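A minimal sketch of that image example, with a made-up analyse_image function mapped over independent inputs:

```python
# Embarrassingly parallel sketch: each image is analysed independently,
# so the work maps cleanly onto separate processors.
from multiprocessing import Pool

def analyse_image(image_id):
    # Made-up stand-in for real medical-image analysis.
    return image_id, image_id % 2 == 0     # e.g. "anomaly found?"

if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(analyse_image, range(10_000))
```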

Coarse-Grained Calculations:
Coarse-grained calculations split into relatively large sub-calculations that need little or no communication with each other, and are often embarrassingly parallel. "Monte Carlo simulations", where you vary the parameters in a model and then study the results, are a classic example of coarse-grained calculations.

Fine-Grained Calculations:
In a fine-grained calculation, each sub-calculation is dependent on the result of another sub-calculation. For example, when calculating the weather, each calculation in one volume of atmosphere is affected by surrounding volumes. Fine-grained parallel calculations require very clever programming to make the most of their parallelism, so that the right information is available to processors at the right time.
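The toy one-dimensional "atmosphere" below shows why fine-grained problems are harder to distribute: each cell's next value depends on its neighbours, so processors working on different regions would have to exchange boundary values at every step. The numbers and update rule are invented for illustration:

```python
# Fine-grained sketch: each cell's next value depends on its neighbours,
# so sub-calculations cannot proceed independently. A 1-D toy "atmosphere".

def step(cells):
    """One time step: every interior cell is updated from its neighbours."""
    return [cells[0]] + [
        (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
        for i in range(1, len(cells) - 1)
    ] + [cells[-1]]

atmosphere = [0.0] * 50 + [100.0] + [0.0] * 49   # a single hot spot
for _ in range(100):
    atmosphere = step(atmosphere)                # diffusion spreads the heat
```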

High-Performance Vs. High-Throughput:


Fine-grained calculations are better suited to high-performance computing, which usually involves a big, monolithic supercomputer, or very tightly coupled computer clusters with lots of identical processors and an extremely fast, reliable network between the processors. Embarrassingly parallel calculations are ideal for high-throughput computing: more loosely coupled networks of computers where delays in getting results from one processor will not affect the work of the others.

And Grid Computing..?

Many interesting problems in science require a combination of fine- and coarse-grained calculations, and this is where grids can be particularly powerful:


For example, in the case of complex climate modeling, researchers launch many similar calculations to see how different parameters affect their models. Each calculation is a fine-grained parallel calculation that needs to run on a single cluster or supercomputer. Using a grid, these many independent calculations can be distributed over many different grid clusters, thus adding coarse-grained parallelism and saving a lot of time.

Breaking Moore's Law?


Moore's law was a statement made in 1965 by Gordon Moore, one of the founders of Intel. Moore noted that the number of transistors that could be squeezed on to a silicon chip was doubling every year. Over time, this has been revised to doubling every 18 months.
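To see what 18-month doubling implies, here is a quick back-of-the-envelope calculation; the starting transistor count is invented purely for illustration:

```python
# Illustrative arithmetic: exponential doubling every 18 months,
# i.e. count(t) = count(0) * 2 ** (t / 1.5) with t in years.
transistors = 10_000_000          # made-up starting chip
doubling_period_years = 1.5

for years in (3, 6, 15):
    count = transistors * 2 ** (years / doubling_period_years)
    print(f"after {years:>2} years: {count:,.0f} transistors")
# After 15 years that is 2**10 = 1024 times the starting count.
```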

Nice Idea, But...

Moore's law is one of the most misused concepts in computing. Even though Moore's statement was limited to a very specific quantity - the number of transistors on a chip - it is now used for just about everything else in computing. "Computing power doubles every 18 months" is one common misuse of Moore's observation.

Today, comparisons are made between different quantities that have nothing to do with Moore's law. For example, if "network performance is doubling every nine months" or "data storage density is doubling every 12 months", it might be said that these trends are "outperforming" Moore's law. This could be taken to mean that, somehow, computer processors are not keeping up with data storage and network capacity. But this ignores a number of trends which Moore's law does not take into account. For example, processor clock speeds have historically increased along with the number of transistors per chip, which means that processor power grows faster than Moore's law alone would suggest. Further, improvements in chip architecture and operating systems also make processors more powerful than the mere sum of their transistors.

In short, comparing different growth rates using Moore's law is often misleading. It is best to see Moore's law as simply a metaphor for exponential growth in the performance of IT hardware.


More On Moore's Law


As a result of this exponential growth, the grid concept becomes more feasible with every year that passes: networks become faster and distributed processors can be more tightly integrated. Individual computers also become more powerful, which means that computing grids are able to solve increasingly complex problems. All this computing power helps scientists find solutions to the big questions, like climate change and sustainable power.


Works Cited
GridCafe.org. (2010, January). Retrieved from http://www.gridcafe.org
Wikipedia. (2010, January). Retrieved from http://www.wikipedia.com
Braverman, A. M. (2004, April). Father of Grid Computing. University of Chicago Magazine. Retrieved from http://uchicago.edu
Buyya, R., & Venugopal, S. (2005, July). A Gentle Introduction to Grid Computing and Technologies.
Educause Learning Initiative. (2006, January). 7 Things You Should Know About... Grid Computing. Retrieved from www.educause.edu/eli
Jacob, B., Brown, M., Fukui, K., & Trivedi, N. (2005, December). Introduction to Grid Computing. IBM Redbooks. Retrieved from http://www.ibm.com/redbooks
Berstis, V. (2001). Fundamentals of Grid Computing. IBM Redbooks. Retrieved from ibm.com/redbooks
