
HiPIC

Introduction to Cloud Computing


2009 KOCSEA Symposium
Jongwook Woo, PhD
jwoo5@calstatela.edu
High-Performance Internet Computing Center (HiPIC)
Computer Information Systems Department
California State University, Los Angeles
Jongwook Woo

CSULA

HiPIC

Contents

• Cloud Computing
• Why now?
• Models
• Map/Reduce


Cloud Computing

• Provide services to S/W and H/W companies
  - Save the companies' development costs
  - Ex: Hadoop service by Amazon

• Web-based service
  - Web Office, DB, etc. by MS
  - http://www.nextgov.com/nextgov/ng_20090817_6429.php

Cloud Computing by NIST

Cloud computing is a model


for on-demand network access
to a shared pool of configurable computing resources
that can be rapidly provisioned and released
with minimal management effort or service provider interaction.


Why Cloud Computing now?


• Growth of Internet usage
  - Broadband networking
  - Mobile, location-aware services
  - Self-service

• Massive data, horizontal scale
  - User-generated content, digital media
  - Even more data ahead: environmental monitoring

• Moore's Law driving down the cost of computing and storage
  - Low-cost 1U servers, 1+ TB consumer disk drives
  - Consumer devices: smartphones, netbooks, gaming consoles
  - Enables new capabilities: speech, NLP, semantics

Benefits
• Efficiency
  - Pay-as-you-go
  - Op-ex vs. Cap-ex
  - Virtualization

• Flexibility
  - On-demand
  - Scalable
  - Services

• Speed
  - Rapid self-provisioning
  - Faster deployment
  - API-driven

Cloud Computing Models

Software as a Service
Applications on-demand

Platform as a Service
Developer platform for creating applications

Infrastructure as a Service
Storage and compute capabilities offered as a service

Software as a Service (SaaS)

Applications on demand:
Subscription-based, multi-tenant, nothing to download or manage

Google Apps (docs, email), Microsoft Exchange Online, Yahoo Mail, TurboTax Online, YouTube, Twitter, Flickr, Salesforce.com


Platform as a Service (PaaS)

Develop and deploy apps on demand:
Unique programming model, autoscaling
Often both a platform and a channel

Google AppEngine, Engine Yard


Infrastructure as a Service (IaaS)

On-demand virtual infrastructure:
Lowest level, most general, self-provisioning
Unlimited managed resources

Amazon AWS (EC2, S3, SQS), Microsoft Azure, RackSpace Cloud, Savvis, Terremark, Joyent


Issues

Security
• Security is still your responsibility
• Learn everything you can (attackers will)
• Ease of use often comes with greater risk
• Monitor: don't assume your provider will alert you
http://cloudsecurityalliance.org/

Public vs Private Clouds


• Public
  - Pay as you go, multitenant
  - Applications and services
  - Access to virtually unlimited resources

• Private
  - Cloud computing model in a company's own datacenter
  - Resources directly owned, but therefore constrained

• Hybrid
  - Mixed usage of both public and private clouds, often integrated into the same application

Public vs Private Clouds


• Virtual Machine Images
  - Complete, pre-configured image of application and OS
  - Pre-packaged or built by user
  - Amazon AMIs, DMTF's OVF

• Cloud APIs
  - Programmatic way to provision and manage compute, storage, and network resources
  - Access to scalable services (S3, SimpleDB)
  - XML, SOAP, RPC, RESTful + language bindings
  - Work underway to standardize for interoperability


Example Cloud APIs (infrastructure)


• Amazon's AWS
  - EC2, S3, SQS, SDB, ...

• ServePath
  - GoGrid API

• RackSpace
  - Cloud API

• Sun Microsystems
  - Open Cloud API

• VMware
  - vCloud API

Economics of Cloud Computing

• Capex (capital expense)
  - Typically a large upfront cost of purchasing equipment

• Opex (operating expense)
  - Monthly cost of renting equipment you don't own

• Pay as you go reduces startup costs and risk for the investor
  - Costs closely track business growth

Running the company


• Use SaaS for almost all business apps
  - Email, docs, collaboration, CRM, project planning, customer support, HR

• Use IaaS/PaaS for application development and delivery
  - Source repository, continuous integration, QA, load testing
  - Application delivery: IaaS, PaaS

• Create your own Virtual Data Center
• Use laptops and mobile phones for everything else

Functional Programming Review

• Functional operations do not modify data structures: they always create new ones
• Original data still exists in unmodified form
• Data flows are implicit in program design
• Order of operations does not matter

Functional Programming Review

fun foo(l: int list) = sum(l) + mul(l) + length(l)

The order of sum(), mul(), etc. does not matter: they do not modify l
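
The point of the slide can be checked with a minimal Python sketch. Python is used only for illustration here (the slide's notation is ML-style), `math.prod` stands in for the slide's mul(), and `foo_reordered` is a hypothetical helper added for comparison. Because none of the three calls modifies the list, evaluating them in a different order gives the same result.

```python
from math import prod  # stands in for the slide's mul(); needs Python 3.8+


def foo(numbers):
    """Counterpart of the slide's foo: sum + product + length."""
    return sum(numbers) + prod(numbers) + len(numbers)


def foo_reordered(numbers):
    """The same three pure calls, evaluated in the opposite order."""
    length = len(numbers)
    product = prod(numbers)
    total = sum(numbers)
    return total + product + length


data = [1, 2, 3, 4]
print(foo(data), foo_reordered(data))  # both calls give the same value
print(data)  # the input list is unchanged
```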


Map
map f lst: ('a -> 'b) -> ('a list) -> ('b list)

Creates a new list by applying f to each element of the input list; returns output in order.


Map Example

• Given a list of numbers, how can we generate a list containing the square of each number?

f: a -> a^2
a list = [1,2,3,4,5]
map f [1,2,3,4,5] -> [1,4,9,16,25]
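
For illustration, the squaring example can be written with Python's built-in `map`. This is a sketch in a different language from the slide's notation; the name `square` is chosen here and does not come from the slides.

```python
def square(a):
    """The function f from the slide: a -> a^2."""
    return a * a


original = [1, 2, 3, 4, 5]
squares = list(map(square, original))  # map returns a lazy iterator in Python 3

print(squares)   # [1, 4, 9, 16, 25]
print(original)  # [1, 2, 3, 4, 5]; the input list is not modified
```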



Fold (Reduce)

fold f x0 lst: ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b

• Moves across a list, applying f to each element plus an accumulator
• f returns the next accumulator value, which is combined with the next element of the list
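
A minimal sketch of fold in Python, assuming a left-to-right traversal and the argument order of the signature above (element first, accumulator second). The helper `fold` is written out here for illustration; Python's own `functools.reduce` does the same job but calls its function with the accumulator first.

```python
def fold(f, x0, lst):
    """Left fold: f takes (element, accumulator) and returns the next accumulator."""
    acc = x0
    for item in lst:
        acc = f(item, acc)
    return acc


total = fold(lambda x, acc: x + acc, 0, [1, 2, 3, 4, 5])
reversed_list = fold(lambda x, acc: [x] + acc, [], [1, 2, 3])

print(total)          # 15
print(reversed_list)  # [3, 2, 1]
```

The second call shows that the accumulator does not have to be a number: each step builds a new list rather than modifying an existing one.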


What is MapReduce?

• Restricted parallel programming model meant for large clusters
  - User implements Map() and Reduce()

• Parallel computing framework
  - Libraries take care of EVERYTHING else
    - Parallelization
    - Fault tolerance
    - Data distribution
    - Load balancing

• Useful model for many practical tasks



Map and Reduce

• Functions borrowed from functional programming languages (e.g., Lisp)

• Map()
  - Processes a key/value pair to generate intermediate key/value pairs

• Reduce()
  - Merges all intermediate values associated with the same key


Example: Counting Words

Map()
  - Input: <filename, file text>
  - Parses the file and emits <word, count> pairs
  - e.g., <hello, 1>

Reduce()
  - Sums all values for the same key and emits <word, TotalCount>
  - e.g., <hello, (3 5 2 7)> => <hello, 17>

Example Use of MapReduce

• Counting words in a large set of documents

map(string key, string value)
    // key: document name
    // value: document contents
    for each word w in value
        EmitIntermediate(w, "1");

reduce(string key, iterator values)
    // key: word
    // values: list of counts
    int result = 0;
    for each v in values
        result += ParseInt(v);
    Emit(AsString(result));
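
The pseudocode above can be turned into a runnable single-process sketch in Python. This is not how a real MapReduce library is invoked: the driver `word_count` and the names `map_fn` and `reduce_fn` are assumptions made for illustration, and the grouping step that a framework would perform across machines is simulated with a dictionary.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map over every document, group values by key, then reduce each key."""
    grouped = defaultdict(list)
    for doc_name, contents in documents.items():
        for word, count in map_fn(doc_name, contents):
            grouped[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


docs = {"a.txt": "hello world hello", "b.txt": "hello cloud"}  # made-up input
counts = word_count(docs)
print(counts)  # {'hello': 3, 'world': 1, 'cloud': 1}
```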

MapReduce

• Automatic parallelization & distribution
• Fault-tolerant
• Provides status and monitoring tools
• Clean abstraction for programmers

Programming Model

• Borrows from functional programming
• Users implement an interface of two functions:

map (in_key, in_value) -> (out_key, intermediate_value) list
reduce (out_key, intermediate_value list) -> out_value list

map

• Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as key*value pairs, e.g., (filename, line)
• map() produces one or more intermediate values along with an output key from the input

reduce

• After the map phase is over, all the intermediate values for a given output key are combined together into a list
• reduce() combines those intermediate values into one or more final values for that same output key (in practice, usually only one final value per key)
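
The grouping step described above can be sketched in a few lines of Python. The helper names `group_by_key` and `reduce_fn` and the sample pairs are hypothetical; the values are chosen to reproduce the <hello, (3 5 2 7)> example from the word-counting slide.

```python
from collections import defaultdict


def group_by_key(intermediate_pairs):
    """Collect every intermediate value under its output key."""
    grouped = defaultdict(list)
    for key, value in intermediate_pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_fn(key, values):
    """Combine one key's list of values into a single final value."""
    return sum(values)


# Hypothetical output of a finished map phase.
intermediate = [("hello", 3), ("world", 1), ("hello", 5), ("hello", 2), ("hello", 7)]

grouped = group_by_key(intermediate)
final = {key: reduce_fn(key, values) for key, values in grouped.items()}

print(grouped)  # {'hello': [3, 5, 2, 7], 'world': [1]}
print(final)    # {'hello': 17, 'world': 1}
```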

Parallelism

• map() functions run in parallel, creating different intermediate values from different input data sets
• reduce() functions also run in parallel, each working on a different output key
• All values are processed independently
• Bottleneck: the reduce phase can't start until the map phase is completely finished
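
The structure described above (parallel map tasks, a barrier, then parallel reduce tasks) can be sketched on one machine with Python's `concurrent.futures`. This is an illustration under assumptions, not a cluster implementation: the function names and input chunks are made up, and threads are used only to show the shape of the computation, not to demonstrate a speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(chunk):
    """Map task: turn one input chunk into a list of (word, 1) pairs."""
    return [(word, 1) for word in chunk.split()]


def reduce_fn(item):
    """Reduce task: sum the values collected for one output key."""
    key, values = item
    return key, sum(values)


def run(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map tasks run in parallel, one per input chunk.
        # list() waits for all of them: this is the barrier before reduce.
        mapped = list(pool.map(map_fn, chunks))

        # Group the intermediate values by output key.
        grouped = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                grouped[key].append(value)

        # Reduce tasks run in parallel, one per output key.
        return dict(pool.map(reduce_fn, grouped.items()))


result = run(["a b a", "b c", "a c c"])
print(result)  # {'a': 3, 'b': 2, 'c': 3}
```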


Example: Count occurrences of each word in a large collection of documents

map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
        EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
        result += ParseInt(v);
    Emit(AsString(result));

More Examples

• Distributed Grep
  - Map() emits a line if it matches a supplied pattern
  - Reduce() is an identity function that just copies the supplied intermediate data to the output

• Count of URL Access Frequency
  - Map() processes logs of web page requests and outputs (URL, 1)
  - Reduce() adds together all values for the same URL and emits (URL, total count)
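
Both examples can be sketched with one small single-process driver in Python. Everything named here is an assumption for illustration: the driver `run`, the four map/reduce functions, and the made-up log format in which the URL is the second field. One simplification to note: because the grep sketch uses the matching line itself as the key, identical lines collapse into one output entry.

```python
import re
from collections import defaultdict


def grep_map(line, pattern):
    """Emit the line if it matches the pattern."""
    if re.search(pattern, line):
        yield line, None


def grep_reduce(line, _values):
    """Identity: copy the intermediate data to the output."""
    return line


def url_map(log_line):
    """Emit (URL, 1) for one request; assumes the URL is the second field."""
    yield log_line.split()[1], 1


def url_reduce(url, counts):
    """Add together all values for the same URL."""
    return url, sum(counts)


def run(map_fn, reduce_fn, records):
    """Tiny single-process driver: map, group by key, reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return [reduce_fn(key, values) for key, values in grouped.items()]


log = ["GET /index.html", "GET /about.html", "POST /index.html"]  # made-up log lines

matches = run(lambda line: grep_map(line, r"about"), grep_reduce, log)
frequency = dict(run(url_map, url_reduce, log))

print(matches)    # ['GET /about.html']
print(frequency)  # {'/index.html': 2, '/about.html': 1}
```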

MapReduce Implementation


References

• Introduction to Cloud Computing - for Startups and Developers, Lew Tucker, Ph.D., CTO, Cloud Computing, Sun Microsystems, Inc.
• Introduction to Cloud Computing - for Enterprise Users, Lew Tucker, Ph.D., CTO, Cloud Computing, Sun Microsystems, Inc.
• Google's Parallel Programming Model and Implementation MapReduce, Klara Nahrstedt and Sam King, UIUC
• How to painlessly process terabytes of data, John R. Gilbert, UCSB
• MapReduce Theory and Implementation, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, University of Washington and Google