CSULA
HiPIC
Contents
Jongwook Woo
Cloud Computing
Benefits
Efficiency
Pay-as-you-go; op-ex vs. cap-ex; virtualization
Flexibility
Services that scale with demand
Speed
Rapid self-provisioning; faster deployment; API-driven
Software as a Service
Applications on-demand
Platform as a Service
Developer platform for creating applications
Infrastructure as a Service
Storage and compute capabilities offered as a service
Applications on demand:
Subscription-based, multi-tenant, nothing to download or manage
Google Apps (docs, email), Microsoft Exchange Online, Yahoo Mail, TurboTax Online, YouTube, Twitter, Flickr, Salesforce.com
Amazon AWS (EC2, S3, SQS), Microsoft Azure, Rackspace Cloud, Savvis, Terremark, Joyent
Issues
Security
Security is still your responsibility
Learn everything you can (attackers will)
Ease of use often comes with greater risk
Monitor; don't assume your provider will alert you
http://cloudsecurityalliance.org/
Public
Private
Cloud computing model in a company's own datacenter
Resources are directly owned, and therefore constrained
Hybrid
Mixed usage of both public and private clouds, often integrated into the same application
Cloud APIs
Programmatic way to provision and manage compute, storage, and network resources
Access to scalable services (S3, SimpleDB)
XML SOAP, RPC, RESTful + language bindings
Work underway to standardize for interoperability
Rackspace: Cloud API
Sun Microsystems: Open Cloud API
VMware: vCloud API
Pay as you go reduces startup costs and risk for the investor
Closely track business growth
Create your own Virtual Data Center
Use laptops and mobile phones for everything else
Functional operations do not modify data structures: they always create new ones
Original data still exists in unmodified form
Data flows are implicit in program design
Order of operations does not matter
The order in which sum(), mul(), etc. are evaluated does not matter, because they do not modify l
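A minimal Python sketch of this point. The function names foo and foo_reordered and the sample list are illustrative assumptions, and math.prod stands in for the slide's mul(). Because each call only reads l, evaluating them in a different order gives the same result and leaves l untouched.

```python
from math import prod  # stands in for mul() on the slide (assumption)


def foo(l):
    # sum, prod and len only read l; none of them modifies it
    return sum(l) + prod(l) + len(l)


def foo_reordered(l):
    # same three calls, evaluated in the opposite order
    return len(l) + prod(l) + sum(l)


numbers = [1, 2, 3, 4]
first = foo(numbers)
second = foo_reordered(numbers)
```

Both calls return the same value, and numbers is still [1, 2, 3, 4] afterwards.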
Map
map f lst: ('a->'b) -> ('a list) -> ('b list)
Creates a new list by applying f to each element of the input list; returns the output in order.
Map Example
Given an original list of numbers, how can we generate a list of the squares of those numbers?
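One possible answer, sketched in Python with the built-in map. The helper name square and the sample list are assumptions made for illustration.

```python
def square(x):
    # f in "map f lst": applied independently to each element
    return x * x


original = [1, 2, 3, 4, 5]

# map builds a new sequence in input order; the original list is not modified
squares = list(map(square, original))
```

Here squares is a new list and original is unchanged, which matches the functional style described earlier.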
Fold (Reduce)
fold f x0 lst: ('a*'b->'b)->'b->('a list)->'b
Moves across a list, applying f to each element plus an accumulator.
f returns the next accumulator value, which is combined with the next element of the list.
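A small Python sketch of fold, written to follow the signature above: f takes (element, accumulator) and x0 is the initial accumulator. The function body and the sample calls are illustrative assumptions. Python's own functools.reduce does the same job but passes the accumulator as the first argument.

```python
def fold(f, x0, lst):
    # acc starts at x0; each step feeds one element plus acc into f
    acc = x0
    for item in lst:
        acc = f(item, acc)
    return acc


values = [1, 2, 3, 4, 5]
total = fold(lambda x, acc: x + acc, 0, values)    # sum of the list
product = fold(lambda x, acc: x * acc, 1, values)  # product of the list
```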
What is MapReduce?
Reduce()
Merge all intermediate values associated with the same key
Map()
Input: <filename, file text>
Parses the file and emits <word, count> pairs
e.g., <hello, 1>
Reduce()
Sums all values for the same key and emits <word, TotalCount>
e.g., <hello, (3 5 2 7)> => <hello, 17>
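map(string key, string value)
  // key: document name
  // value: document contents
  for each word w in value
    EmitIntermediate(w, 1);

reduce(string key, iterator values)
  // key: word
  // values: list of counts
  int result = 0;
  for each v in values
    result += ParseInt(v);
  Emit(AsString(result));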
MapReduce
Automatic parallelization & distribution
Fault-tolerant
Provides status and monitoring tools
Clean abstraction for programmers
Programming Model
Borrows from functional programming
Users implement an interface of two functions:
map (in_key, in_value) -> (out_key, intermediate_value) list
reduce (out_key, intermediate_value list) -> out_value list
map
Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as key*value pairs: e.g., (filename, line).
map() produces one or more intermediate values along with an output key from the input.
reduce
After the map phase is over, all the intermediate values for a given output key are combined together into a list.
reduce() combines those intermediate values into one or more final values for that same output key (in practice, usually only one final value per key).
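The grouping step described here can be sketched in Python as follows. The helper names group_by_key and reduce_sum and the sample pairs are assumptions for illustration; a real MapReduce runtime performs this grouping across machines.

```python
from collections import defaultdict


def group_by_key(pairs):
    # Collect every intermediate value under its output key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_sum(key, values):
    # Combine the list of values for one key into a single final value
    return key, sum(values)


intermediate = [("hello", 3), ("world", 1), ("hello", 5), ("hello", 2), ("hello", 7)]
grouped = group_by_key(intermediate)
final = dict(reduce_sum(k, v) for k, v in grouped.items())
```

With this input, the key hello is grouped with the list [3, 5, 2, 7] and reduced to 17, mirroring the word-count example earlier in the deck.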
Parallelism
map() functions run in parallel, creating different intermediate values from different input data sets
reduce() functions also run in parallel, each working on a different output key
All values are processed independently
Bottleneck: the reduce phase cannot start until the map phase is completely finished.
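A rough single-machine sketch of this structure in Python, using a thread pool as a stand-in for a cluster. The function names, the sample chunks, and the use of ThreadPoolExecutor are all assumptions; threads in CPython illustrate the shape of the computation rather than real CPU parallelism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    # Each map task sees only its own input chunk
    return [(word, 1) for word in chunk.split()]


def reduce_key(item):
    # Each reduce task handles exactly one output key
    key, values = item
    return key, sum(values)


chunks = ["a b a", "b c", "a c c"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Map phase: list() blocks until every map task has finished (the barrier)
    mapped = list(pool.map(map_chunk, chunks))

    # Group intermediate values by key before any reduce task starts
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)

    # Reduce phase: one independent task per key
    reduced = dict(pool.map(reduce_key, grouped.items()))
```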
map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
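The pseudocode above can be approximated by a runnable single-process Python sketch. The driver run_word_count, the names map_fn and reduce_fn, and the sample documents are assumptions added for illustration; the string-valued counts follow the pseudocode's "1" and ParseInt/AsString calls.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    # input_key: document name, input_value: document contents
    for word in input_value.split():
        yield word, "1"


def reduce_fn(output_key, intermediate_values):
    # output_key: a word, intermediate_values: a list of counts (as strings)
    result = 0
    for v in intermediate_values:
        result += int(v)
    return str(result)


def run_word_count(documents):
    # Hypothetical driver: map every document, group by word, then reduce
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in intermediate.items()}


docs = {"a.txt": "hello world hello", "b.txt": "hello cloud"}
counts = run_word_count(docs)
```

In a real deployment the framework, not user code, would play the role of run_word_count.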
More Examples
Distributed Grep:
Map() emits a line if it matches a supplied pattern.
Reduce() is an identity function that just copies the supplied intermediate data to output.
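A minimal Python sketch of distributed grep on one machine. Keying the matches by filename, the function names, and the sample log text are assumptions; the slide only specifies that map emits matching lines and reduce is the identity.

```python
import re
from collections import defaultdict


def grep_map(filename, text, pattern):
    # Emit a line only if it matches the supplied pattern
    for line in text.splitlines():
        if re.search(pattern, line):
            yield filename, line


def identity_reduce(key, values):
    # Identity: copy the intermediate data to the output unchanged
    return list(values)


def run_grep(files, pattern):
    # Hypothetical driver standing in for the MapReduce runtime
    intermediate = defaultdict(list)
    for filename, text in files.items():
        for key, line in grep_map(filename, text, pattern):
            intermediate[key].append(line)
    return {key: identity_reduce(key, lines) for key, lines in intermediate.items()}


files = {
    "log1.txt": "ok start\nerror disk full\nok end",
    "log2.txt": "error timeout\nok",
}
matches = run_grep(files, r"error")
```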
MapReduce Implementation