
Cloud Computing BOF

OGF22 Birds of a Feather Session


Hyatt Regency Cambridge
February 27, 2008

Geoffrey Fox
Indiana University
gcf@indiana.edu

© 2007 Open Grid Forum


Cloud Agenda

• Geoffrey Fox (Indiana U.) Remarks on Cloud Computing
• Martin Swany (Internet2) Clouds and Dynamic Networking
• Steven Newhouse (Microsoft) Personal View on Clouds
• Kate Keahey (Argonne, Chicago) First Steps in the Clouds
• Next Steps



What are Clouds?
• Clouds are “Virtual Clusters” (“Virtual Grids”) of
possibly “Virtual Machines”
• They may cross administrative domains or may “just be a
single cluster”; the user cannot and does not want to know
• Clouds support access (lease of) computer instances
• Instances accept data and job descriptions (code) and return
results that are data and status flags
• Each Cloud is a “Narrow” (perhaps internally
proprietary) Grid
• When does the Cloud concept work?
• Parameter searches, LHC-style data analysis, ...
• Common case (most likely success case for clouds) versus
corner case?
• Clouds can be built from Grids
• Grids can be built from Clouds
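The instance model above (an instance accepts data and a job description, and returns data plus a status flag) can be sketched in a few lines. This is only an illustration under stated assumptions: `Job`, `Result`, `Instance`, and `parameter_search` are hypothetical names, not part of any cloud API, and the "instance" here simply runs the code locally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Job:
    code: Callable[[Any], Any]  # the job description (code)
    data: Any                   # the input data


@dataclass
class Result:
    data: Any    # output data; None if the job failed
    status: str  # status flag: "ok" or "failed"


class Instance:
    """Hypothetical leased compute instance; it runs the job locally here."""

    def run(self, job: Job) -> Result:
        try:
            return Result(data=job.code(job.data), status="ok")
        except Exception:
            return Result(data=None, status="failed")


def parameter_search(instance: Instance,
                     code: Callable[[Any], Any],
                     parameters: Iterable[Any]) -> List[Result]:
    """Run one independent job per parameter value (a "capacity" workload)."""
    return [instance.run(Job(code=code, data=p)) for p in parameters]
```

Because each job is independent, a parameter search of this kind needs nothing from the cloud beyond the lease-and-run interface, which is why it is a likely common case.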
Cloud References

• http://en.wikipedia.org/wiki/Cloud_computing
• Includes references to Amazon, Apple, Dell, Enomalism, Globus,
Google, IBM, KnowledgeTreeLive, Nature, New York Times, Zimdesk
• Others like Microsoft Windows Live Skydrive important
• http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud
• http://uc.princeton.edu/main/index.php?option=com_content&task
Policy Issues
• http://www.cra.org/ccc/home.article.bigdata.html
• Hadoop (MapReduce) and “Data Intensive Computing”
• See Data intensive computing minitrack at HICSS-42 January 2009
• http://ianfoster.typepad.com/blog/2008/01/theres-grid-in.html
• OGF Thought Leadership blog
• OGF22 talks by Charlie Catlett and Irving Wladawsky-Berger
Big-Data Computing Study Group

• CCC Role Versus OGF?
• Hadoop and MapReduce are “just” workflow?



Google MapReduce
Simplified Data Processing on Clusters/Clouds
• http://labs.google.com/papers/mapreduce.html
• This is a dataflow model between services where services can do useful
document oriented data parallel applications including reductions
• The decomposition of services onto cluster engines (clouds) is automated
• The large I/O requirements of datasets change the efficiency analysis in favor
of dataflow
• Services (count words in example) can obviously be extended to general
parallel applications
• There are many alternative languages for expressing dataflow and/or
parallel operations and/or workflow
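The word-count example mentioned above can be sketched on a single machine as follows. This is a minimal illustration of the map, group-by-key, and reduce steps, not Google's implementation; the function names are assumptions chosen for this sketch, and a real framework would distribute the map and reduce calls across cluster nodes automatically.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_words(doc))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())
```

For example, `word_count(["the cloud", "the grid"])` returns `{"the": 2, "cloud": 1, "grid": 1}`. Each `map_words` call touches only its own document and each `reduce_counts` call only its own key, which is what makes the decomposition onto cluster engines automatable.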



Technical Questions about Clouds I
• What is performance overhead?
• On individual CPU
• On system including data and program transfer
• What is cost gain
• From size efficiency; “green” location (rumor that Google
has purchased the Niagara Falls including Canada!)
• Is Cloud Security adequate: can clouds be trusted?
• Can one do parallel computing on clouds?
• Looking at “capacity” not “capability” i.e. lots of modest
sized jobs
• Marine Corps will use Petaflop machines – they just need
ssh and a.out
Technical Questions about Clouds II
• How is data compute affinity tackled in clouds?
• Co-locate data and compute clouds?
• Lots of optical fiber i.e. “just” move the data?
• What happens in clouds when demand for resources
exceeds capacity – is there a multi-day job input
queue?
• Are there novel cloud scheduling issues?
• Do we want to link clouds (or ensembles as atomic
clouds); if so, how and with what protocols?
• Is there an intranet cloud e.g. “cloud in a box”
software to manage a personal (cores on my future
128-core laptop), department, or enterprise cloud?
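The data/compute affinity question above comes down to simple arithmetic: how long does it take to "just" move the data compared with how long the computation runs? A rough sketch follows, assuming an idealized link with no latency, contention, or protocol overhead; the function names and the figures in the usage note are illustrative assumptions, not measurements.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to move a dataset over a link, ignoring latency and protocol overhead."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_s


def movement_fraction(size_bytes: float,
                      bandwidth_bits_per_s: float,
                      compute_seconds: float) -> float:
    """Fraction of total job time spent moving data rather than computing."""
    move_s = transfer_seconds(size_bytes, bandwidth_bits_per_s)
    return move_s / (move_s + compute_seconds)
```

Under these assumptions, moving 1 TB (1e12 bytes) over a 10 Gbit/s link takes 800 seconds. If the job then computes for 3200 seconds, about 20% of the total time goes to data movement, so whether to co-locate data and compute depends on how that fraction compares with the cost of replicating the data.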
Standards for Compute and Storage Clouds

• We no longer need interoperability of services and
messages (SOAP) but rather interoperability of clouds
• Maybe each cloud so big that interoperability between
clouds not so critical
• Interoperability certainly for application specific data
and perhaps also for job specifications
• WFS, GML for Geo-data; IVOA standards; DST LHC
experiment formats
• JSDL, BES etc.
• Each Cloud will be proprietary but they might want raw
infrastructure standards so they can easily swap in
and out different vendors’ disk drives
• Clouds very very loosely coupled; services loosely
coupled
MSI Challenge Problem
• There are > 330 MSIs – Minority Serving Institutions
• 2 examples
• ECSU is a small state university in North Carolina
• HBCU with 4000 students
• Working on PolarGrid (Sensors in Arctic/Antarctic linked to
“TeraGrid”)
• Navajo Tech in Crownpoint, NM is a community college with
technology leadership for Navajo Nation
• “Internet to the Hogan and Dine Grid” links Navajo communities by
wireless
• Wish to integrate TeraGrid science into Navajo Nation education
curriculum
• Current Grid technology too complicated if you are not an R1
institution
• Hard to deploy campus grids broadly into MSIs
• Clouds provide virtual campus resources?
Next Steps at OGF
• Clouds are just starting and build on/are related to
Grids
• Clear need for best practice in use and technology
• Likely to be need for new standards and novel use of
existing/projected standards

• New Cloud Community Group?
• Chairs, participants?
• Workshop?
• OGF23 activity?
• Identify key players not currently involved with OGF?
