CLOUD COMPUTING: BIG DATA IS THE FUTURE OF IT
Winter 2009 | Ping Li | ping@accel.com

Cloud computing has been generating considerable hype these
days. Every participant in the datacenter and IT ecosystem, from
hardware vendors, ISVs, and SaaS providers to Web 2.0 companies,
has been rolling out cloud initiatives and strategies; start-ups
and incumbents are equally active.
Cloud computing promises to transform IT infrastructure and
deliver scalability, flexibility, and efficiency, as well as new
services and applications that were previously unthinkable.
Despite all of this activity, cloud computing remains as
amorphous today as its name suggests. However, one critical
trend shines through the cloud: Big Data. Indeed, it is the core
driver of cloud computing and will define the future of IT.
BIG DATA: THE PERFECT STORM
Cloud computing has been driven fundamentally by the need to
process an exploding quantity of data. Data is no longer measured
in gigabytes but in exabytes as we are "Approaching the
Zettabyte Era." [1] Moreover, data types (structured, semi-
structured, and unstructured) continue to proliferate at an
alarming rate as more information is digitized, from family
pictures to historical documents to genome mapping to financial
transactions to utility metering. The list is truly unbounded. But
today, data is not only being generated by users and applications.
It is increasingly being machine-generated, and such data is
leading the exponential charge in the Big Data world. In a
recent article, The Economist called this phenomenon the "Data
Deluge" (http://www.economist.com/opinion/displaystory.cfm?
story_id=15579717).
One can argue that Web 2.0 companies have been pushing the
upper bounds of large-scale data processing more than anyone.
That said, this data explosion is not sparing any vertical
industry: financial services, health care, biotech, advertising,
energy, telecom, etc. All are grappling with this perfect storm.
Below are just a few stats:
- Two years ago, Google was already processing more than
  400PB of data per month in just one application
- The New York Times is processing an 11-million-story
  archive dating back to 1851
- eBay processes more than 50TB/day in its data warehouse
- CERN is processing 2GB/second for its most recent
  particle accelerator
- Facebook crunches 15TB/day into a 2.5PB data warehouse
Without question, data represents the competitive advantage of
any enterprise, and every organization is now encumbered with
the task of storing, managing, analyzing, and extracting value
from this exponential data growth as inexpensively as
possible.
Previous computing platform transitions involved technology
dislocations similar to cloud computing, but along different
dimensions. The shift from mainframe to client-server was
fueled by disruptive innovation in computing horsepower that
enabled distributed microprocessing environments. The
following shift to web applications/web services during the last
decade was enabled by the open networking of applications and
services through the internet buildout. While cloud computing
will leverage these prior waves of technology (computing and
networking), it will also embrace deep innovations in storage and
data management to tackle Big Data.
Along these lines, many of the early uses of cloud computing
have been focused less on computing and more on storage.
For example, a significant portion of the initial applications on
AWS primarily leveraged just S3, with the applications
executing behind the firewall. Popular storage applications, like
Jungle Disk and SmugMug, were early AWS customers. This
explosion of data has driven enterprises (and consumers, for
that matter) to seek cheap, on-demand storage in unlimited
quantities, which cloud storage promises to deliver. Until
now, massive tape archives in the middle of nowhere (like Iron
Mountain) have been the only means to achieve that cheap
storage. However, enterprises today need more: they need
quick-access data retrieval for multiple reasons, from
compliance to business analytics. It is simply no longer
sufficient to have cold data; rather, data needs to be online and
resilient (and cheap, of course); hence the accelerating shift
towards storing every piece of data in memory or on disk
(Data Domain smartly rode this trend).
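To make that early storage-first pattern concrete, below is a minimal sketch of the kind of S3 usage a Jungle Disk-style backup application relied on, written against the Python boto library of that era; the bucket, key, and file names are hypothetical placeholders, and AWS credentials are assumed to be set in the environment.

    # Push a backup to S3, then pull it back on demand (boto, Python 2 era).
    import boto

    conn = boto.connect_s3()                     # credentials from environment
    bucket = conn.create_bucket('example-backup-bucket')

    key = bucket.new_key('backups/photos-2009.tar.gz')
    key.set_contents_from_filename('photos-2009.tar.gz')    # upload

    key.get_contents_to_filename('photos-restored.tar.gz')  # retrieve later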
The need to balance data availability/usability and cost
effectiveness has prompted significant innovation in both on-
premise and hosted cloud storage: cloud storage systems
(Caringo, EMC Atmos, and ParaScale, to name just a few) and
flash-based storage systems (Fusion-io, Nimble Storage,
Pliant, etc.) are just some current examples. Furthermore,
hierarchical storage management (HSM, which has always
sounded great but has been implemented only rarely) will
become an important element in storage workflows.
Enterprises will require the seamless capability to move data
across different tiers of storage (both on-premise and in the
cloud) based on policy and data type in order to optimize retrieval
costs. As cloud computing matures, true cloud applications will
be (re)written to leverage hierarchical and cloud-like storage
tiers to retrieve data dynamically from different storage layers.
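A toy sketch of what such policy-driven tiering might look like follows; the three tiers and the age/access thresholds are assumptions for illustration, not any vendor's actual policy engine.

    # Hypothetical policy: place each object on the cheapest tier that still
    # satisfies its access pattern and compliance requirements.
    import time

    def choose_tier(last_access_ts, accesses_per_day, compliance_hold=False):
        age_days = (time.time() - last_access_ts) / 86400.0
        if compliance_hold or accesses_per_day > 10:
            return 'flash'          # hot data: fastest retrieval
        if age_days < 90:
            return 'disk'           # warm data: online and resilient
        return 'cloud-archive'      # cold data: cheapest per GB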

[1] Source: "Approaching the Zettabyte Era." Cisco, 16 June 2008. <http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481374_ns827_Networking_Solutions_White_Paper.html>

A NEW CLOUD STACK
In order for cloud computing to become a mainstream approach,
a new cloud stack (analogous to the mainframe stack or the OSI
model) will likely emerge. Just as in prior computing platform
transitions (client/server, web services, etc.), core platform
capabilities (security, access control, application management,
virtualization, systems management, provisioning, availability,
etc.) will be a prerequisite
before IT organizations are able to adopt the cloud completely.
Clearly, this stack will exist in a different representation than
prior platform layers to embrace a cloud environment. Simply
replicating the current computing stack but allowing it to reside
off-premise will not achieve the scale, capabilities, and
economies of cloud computing. In particular, this new cloud
framework needs the ability to process data at increasingly
greater orders of magnitude, and to do so at a fraction of the cost,
by leveraging commodity, multi-threaded servers for storage and
computing. In many ways, this cloud stack has been implemented
already, albeit in a primitive form, at large-scale internet
datacenters.
The challenge of processing terabytes of data daily drove Google,
Facebook, and Amazon to adopt a new data
architecture, one that is essentially Martian to traditional enterprise
datacenter architects. No longer are ACID-compliant relational
databases back-ending transactional applications. Internet
datacenters quickly encountered the scaling limitations of SQL
databases as the volume of data exploded. Instead, high-
performance, scalable/distributed non-SQL data stores are being
developed internally and implemented at scale. Bigtable and
Cassandra are among the many variants, and this "non-database
database" trend has proliferated to the point of having its own
conference: NoSQL. Database caching layers (e.g., NorthScale's
memcached) are also being implemented to further drive
application performance, and caching is now accepted as a standard
tier in datacenters.
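That caching tier typically follows a cache-aside pattern, sketched below with the python-memcached client; fetch_user_from_db is a hypothetical stand-in for a real SQL lookup, and the key scheme and TTL are illustrative.

    # Cache-aside: try memcached first, fall back to the database on a miss,
    # then repopulate the cache so subsequent reads skip the database.
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def fetch_user_from_db(user_id):
        # hypothetical stand-in for a real SQL query
        return {'id': user_id, 'name': 'user%d' % user_id}

    def get_user(user_id):
        key = 'user:%d' % user_id
        user = mc.get(key)
        if user is None:
            user = fetch_user_from_db(user_id)
            mc.set(key, user, time=300)   # cache for five minutes
        return user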
Managing non-transactional data has become even more
daunting. From log files to clickstream data to web indexing,
internet datacenters are collecting massive volumes of data that
need to be processed cheaply in order to drive monetization
value. Hadoop, an open source data management framework,
has become widely deployed for massively parallel
computation and distributed file systems in a cloud environment.
Hadoop has allowed the largest web properties (Yahoo!,
LinkedIn, Facebook, etc.) to store and analyze any data in near
real-time at a fraction of the cost that traditional data
management and data warehouse approaches could even
contemplate. Although the framework has roots in internet
datacenters, Hadoop is quickly penetrating broader enterprise use
cases. The diverse set of participants at Hadoop World NYC,
hosted by Cloudera, clearly points to this trend.
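To make the Hadoop model concrete, here is a minimal Hadoop Streaming job of the log-processing sort described above, counting requests per URL across a cluster. The log format (URL in the seventh whitespace-separated field, as in common web-server logs) is an assumption.

    # mapper.py -- emit one count per requested URL (Python 2, Hadoop Streaming)
    import sys
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 6:
            print '%s\t1' % fields[6]

    # reducer.py -- sum counts per URL (streaming input arrives sorted by key)
    import sys
    current_url, count = None, 0
    for line in sys.stdin:
        url, n = line.rsplit('\t', 1)
        if url != current_url:
            if current_url is not None:
                print '%s\t%d' % (current_url, count)
            current_url, count = url, 0
        count += int(n)
    if current_url is not None:
        print '%s\t%d' % (current_url, count)

A job of this shape would be launched with the stock streaming jar, along the lines of: hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input logs/ -output url_counts/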
SECURING THE CLOUD
Given this data-intensive nature, any widely adopted cloud
computing platform will inevitably account for richer security
requirements. The security challenges will be focused less on
point network- and data-level security, although high-bandwidth
encryption solutions and sophisticated key management will be
needed to match massively parallel cloud computing
environments. Instead, the primary security challenges will
stem from control. User authentication will become
increasingly challenging as applications are federated outside
the firewall through SaaS adoption. In addition, managing
and reconciling user identities across individual user directories
for each SaaS/cloud application will present further security
issues. Much like web applications in the '90s gave rise to an SSO
layer, cloud computing is essentially abstracting a web services
interface for infrastructure IT, and it will demand a similar
unified authentication/entitlement layer.
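One minimal sketch of what such a unified layer implies, assuming a shared-secret HMAC token rather than any particular vendor's protocol: an identity service issues one signed assertion, and each federated cloud/SaaS application verifies it instead of maintaining its own user directory.

    # Toy signed-token scheme (illustrative only, not production crypto).
    import hashlib, hmac, time

    SHARED_SECRET = 'rotate-me-regularly'   # provisioned out of band

    def issue_token(user, ttl_seconds=3600):
        msg = '%s|%d' % (user, int(time.time()) + ttl_seconds)
        sig = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
        return '%s|%s' % (msg, sig)

    def verify_token(token):
        msg, sig = token.rsplit('|', 1)
        expected = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
        user, expires = msg.split('|')
        return sig == expected and time.time() < int(expires)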
In addition to federated user authentication, cloud computing
will also require data authentication and security. Imperva's
database firewall is one example of an increasingly important
class of cloud security products. As applications reside in different
public and private clouds, it will be critical for the cloud
applications to be able to talk to each other. This will drive
the need for ensuring data authentication and policy control for
the volumes of data flowing between cloud applications.
Moreover, given the multi-tenancy paradigm of cloud
environments, policy granularity will be paramount to ensure
security and compliance. Data integration across cloud
platforms will be more of an obstacle than application
integration, as applications have become more open/standard.
Standard data APIs will emerge as part of the new cloud
stack to allow disparate environments to talk to each other and
avoid vendor lock-in. Data migration challenges are perhaps
the greatest factor locking users into a particular cloud
platform today.
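That portability argument suggests an interface like the hypothetical sketch below: applications code against one abstract data API and swap providers underneath, so moving off a platform becomes a configuration change rather than a rewrite. Only the S3-backed variant maps to a real library (boto); the class names and the in-memory stand-in are illustrative.

    # One abstract data API, multiple interchangeable back ends.
    class BlobStore(object):
        def put(self, key, data):
            raise NotImplementedError
        def get(self, key):
            raise NotImplementedError

    class S3Store(BlobStore):
        def __init__(self, bucket_name):
            import boto
            self.bucket = boto.connect_s3().create_bucket(bucket_name)
        def put(self, key, data):
            self.bucket.new_key(key).set_contents_from_string(data)
        def get(self, key):
            return self.bucket.get_key(key).get_contents_as_string()

    class InMemoryStore(BlobStore):
        """On-premise stand-in honoring the same contract."""
        def __init__(self):
            self.blobs = {}
        def put(self, key, data):
            self.blobs[key] = data
        def get(self, key):
            return self.blobs[key]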
Over time, these APIs and layers will harden and become
tailored to particular use cases, workloads, and
applications. The adoption of these new frameworks will
ultimately make cloud computing safe and broaden its
penetration into enterprises of all sizes.
WHAT'S BREWING IN A CLOUD?
Despite constant comparisons to grid and utility computing,
cloud computing has the potential to address a much broader
set of applications and use cases beyond the limited HPC
environments served traditionally by grid computing. This
breadth of cloud computing is engendered by a new set of
underlying technology forces. Virtualization technologies,
high-powered commodity servers, low-cost/high-bandwidth
connectivity, concurrent/multi-threaded programming models,
and open source software stacks are all technology building
blocks that can deliver the high performance and scalability of
grid/utility computing, but, importantly, do so concurrently
on underlying commodity resources.
These technology drivers enable applications and users to be
abstracted cleanly from particular IT infrastructure resources
(computing, storage, networking, etc.) in new and powerful
ways; location agnosticism and multi-tenancy, for example, are two
critical elements among others. Unlike traditional HPC grid
environments, which were designed for a specific application in a
single company, cloud computing enables disparate applications
and entities to harness a shared pool of resources. In addition,
applications can be decomposed in the cloud such that, as one
example, computing resources reside on the client while the
data is accessed portably from multiple cloud locations.
Many different definitions of cloud computing have surfaced.
Rather than posit yet another, several characteristics are resident
in any cloud instance: (i) self-provisioning (by user,
developer, or IT); (ii) elasticity (on-demand allocation of any
computing, storage, and networking resources); (iii) "multi-
anything" (multi-user, multi-application, multi-session, etc.);
and (iv) portability (applications are abstracted from physical
infrastructure and can be migrated easily). These capabilities
allow enterprises to shift IT resources from capex to opex, a
usage-based model that is particularly appealing under recent
economic constraints.
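Characteristics (i) and (ii) are easy to see in practice. In the hedged boto sketch below, a developer self-provisions elastic capacity through an API call and releases it when the work is done; the AMI ID and instance type are placeholders.

    # Self-provisioned elasticity: allocate instances on demand, release
    # them when finished, and pay only for the window in between.
    import boto

    ec2 = boto.connect_ec2()
    reservation = ec2.run_instances('ami-12345678',      # hypothetical image
                                    min_count=1, max_count=4,
                                    instance_type='m1.small')

    # ... run the bursty workload on reservation.instances ...

    ec2.terminate_instances([i.id for i in reservation.instances])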
These cloud prerequisites will yield a powerful set of use cases
beyond grid computing that are unique to cloud platforms. Cloud
computing will reach its full potential when a whole
new set of applications, never possible before, is created
purpose-built for the cloud. For example, one can envision
powerful collaboration applications emerging that enable internal
enterprise and external users to cooperate seamlessly, something
previously impossible with users and data isolated on
disparate enterprise islands. It's likely these innovative
applications will require new programming models and
potentially languages yet to be hardened.
STILL IN THE EARLY DAYS
Despite the high energy surrounding cloud computing and early
cloud offering successes, such as Amazon Web Services, cloud
computing for enterprise services is definitely still in its
formative stages. In contrast, however, consumers have already
adopted cloud computing technologies. One could argue that web
companies like Google, Yahoo!, Facebook, and Salesforce are
examples of consumers leveraging cloud computing. These Web
2.0/SaaS offerings clearly exhibit the core cloud characteristics
outlined above and, in turn, are delivering new, value-added
services previously considered unthinkable. Interestingly, this
time the consumers, via their use of Web 2.0 services, have been
teaching the enterprises, typically the early technology adopters,
the effectiveness of cloud computing.
Today, the enterprise use of cloud computing represents opposite
ends of the spectrum: (i) Web 2.0 start-ups seeking to launch
applications quickly and cheaply, and (ii) compute-intensive
enterprises that need batch processing for bursty, large-scale
applications. Although these users are driving the early adoption
of cloud technology, it's unlikely these limited use cases will
establish cloud computing as a pervasive platform. Instead, cloud
computing will need to penetrate mainstream IT
infrastructure gradually and offer a broader set of enterprise
applications.
It is important to note here that these Web 2.0 start-ups represent
a powerful trend: the role of developers in driving cloud
computing adoption. Many early users of cloud computing are
examples of developers launching applications without
requiring the involvement of IT (in the case of a Web 2.0 start-
up, there is no IT department to involve). Increasingly,
empowering developers and line-of-business owners to
innovate and deploy new applications without the shackles of
IT will be a motivating driver for cloud adoption. No longer do
users need IT's blessing, or IT's timeline, to get their jobs done.
This developer-centric nature was a primary motivator of
VMware's strategic acquisition of SpringSource. In addition to
inheriting significant Java technology, VMware now has a
distinct opportunity to transition SpringSource's dominant Java
developer mindshare onto VMware's private cloud
platform. Amazon Web Services has experienced tremendous
success from its developer-centric platform APIs. Unlike
traditional hosting providers that cater to IT/operations,
Amazon went after developers first and has only recently
begun to add the functionality that will appeal to broader
enterprise IT.

Within enterprises, there are early signs of developers (QA
environments, batch processing, and prototyping)
and lines of business/departments leveraging cloud computing.
It is not uncommon for new platform technologies to start at
the fringes of IT before mainstream adoption takes place.
Unlike typical three-tier traditional enterprise datacenters, the
internet datacenters of Facebook, Google, etc. were not
encumbered by legacy enterprise stacks, applications, and IT
rules, which in turn enabled them to be built from the ground
up with cloud stacks to elastically handle large-scale consumer
transactions for multiple applications. Unsurprisingly, then,
Amazon's internet datacenters were easily
adapted to become the first and leading public cloud
provider. It will certainly take significant time and effort for
enterprise IT infrastructure gatekeepers to evolve their current
architectures to embrace a new cloud platform. Luckily,
enterprises can reap the technology innovation from internet
datacenters (much of which is open source) to accelerate this
transition.
MORE THAN ONE FLAVOR
There have been analogies drawn between cloud computing
and public utilities (electric, gas, etc.), where the value is all
about economies of scale. According to this hypothesis, the
world will have only a few cloud providers that reach
maximum efficient scale. It is quite unlikely that this will
happen. Multiple cloud models will emerge depending on the
user, the workload, and the application. For example, certain
developers will prefer to interface with a cloud provider at a
higher level of abstraction, such as Google App Engine, as
opposed to a more bare-metal API, such as Rackspace's.
Alternatively, an application may choose to run on MSFT
Azure to leverage SQL/MSFT services, or on Salesforce's
Force.com for CRM integration and distribution advantages.
Today, one can break cloud platforms into roughly two camps:
developer-centric (Amazon, MSFT) and IT-centric (EMC, VMware).

Cloud platforms will remain distinct and diverse as long as they
continue to deliver unique value-add for their particular use cases
and users.
To drive this cloud diversity point further, the concept of a "cloud
within a cloud" is also emerging, where distinct services,
such as data warehousing, can be built atop a more generic cloud
platform to provide a higher-layer cloud service.
In addition, private clouds behind the firewall present yet
another flavor of cloud computing, as enterprises leverage the
benefits of cloud frameworks while maintaining the security,
control, and compliance of their internal datacenters. Lastly,
hybrid clouds that bridge private and public clouds on a
permanent or temporary basis (the latter also known as "cloud
bursting") will come to fruition for certain applications or as a
migration path for enterprises. Several start-ups (Cirtas,
CloudSwitch, and Zetta among them) are building products that
make the cloud safe for enterprises. Innovation will abound to
solve the specific issues in all of these various cloud environments.
LOOKING AHEAD
To further parse all this, I hosted a cloud computing panel with an
esteemed group of technology thought leaders at Accel's 15th
Stanford Technology Symposium. Needless to say, these
panelists had plenty of deep insights, opinions, and predictions
about cloud computing.
The panel brought together technologists who view cloud
computing through distinctly different lenses: private cloud
innovators, public cloud providers, cloud-enabling technology
solutions, and cloud infrastructure applications. In wrapping up
the panel session, I asked each speaker to conjure up a single
prediction for cloud computing in the next few years. Here's what
the experts said:
Jonathan Bryce, CTO/Founder, Mosso (Rackspace): "I think
cloud computing is going to be a mindshift; it's going to take a
while. But I think an economy like this is actually a huge
opportunity for entrepreneurs... I think this is a time when
resources are scarce; that's when great businesses end up getting
built. And I think part of what's going to enable some of those
businesses is cloud computing, and being able to get started with
a lower barrier to entry, a lower price point, all of those kinds of
things."
Mike Olson, CEO/Co-founder, Cloudera: "I think that a lot of
what's been said around here about data is really right on. I
predict that in the next 10 years, computer science as computer
science isn't really going to be the place that smart young guys
are going to find tremendously rewarding careers. I think that the
application of these new compute systems to large data in the
sciences will advance humankind substantially. I think that
science will be done maybe not even in the lab on the wet bench
anymore, but with data, with computer systems looking at vast
amounts of data."
Raghu Ramakrishnan, Chief Scientist for Audience and Research
Fellow, Yahoo! Research: "So a lot of the companies that are
out there today, Yahoo!, Facebook, Google, they're all
exposing data APIs. Imagine what's going to happen once
large clouds are routinely available to build their own
application and you start aggregating your own data, and you
have the opportunity to fuse that with all the data that's out
there. Someone's going to figure out the next big thing, by
taking 2 + 2 and coming up with 20."
Mike Schroepfer, VP Engineering, Facebook: "...one of the
things that is going to happen is that people are going to figure
out that we need a more blended workload between the cloud
and the client. We've been operating kind of in the cycle of
reincarnation in computer science, moving toward most of the
computing happening in the cloud, and my browser effectively
being its own terminal. You know, in the last 2 or 3 years, the
speed and capability of browsers has been outpacing that of
most chips. You're seeing 2x to 4x improvements in core
performance on the engines and VMs in those browsers year on
year, which is way outpacing the speed of chip design... So I
believe that there will be a couple of people who will figure out
ways to blend computation and storage on the client more
gracefully with that on the server, but still provide you with all
of the benefits of basically access to my data anywhere I need,
and the kind of reliability of the cloud."
Jayshree Ullal, President and CEO, Arista Networks: "Well,
there's a technology impact, but I actually think it's going to
really make CIOs rethink their jobs. Today, you can have a
server administrator, an application administrator, a network
administrator, and they're all silos, but you need your general
practitioner. And that's really missing right now in the cloud.
So if I had to make a prediction, less on the technology, more
on the operational side, I would say for the deployment of this,
it's got to be a generalized IT person, whether that's the CIO or
somebody he or she appoints..."
Rich Wolski, Professor of Computer Science, University of
California, Santa Barbara and CTO/Founder, Eucalyptus
Systems: "...there's another revolution coming that's going to
intersect the cloud revolution, and that has to do with data
simulation... Pretty much everything you own is going to be
trying to send you data. And you're going to need, personally,
a great deal of storage and compute capacity to be able to deal
with that. I think the cloud is going to make that revolution that
much quicker to come to us."
These predictions depict cloud computing as still being in its
formative phases, but suggest it will drive fundamental
breakthroughs in datacenter and IT infrastructure in the years to
come. Despite the current macro headwinds, deep innovation
and market opportunities in cloud computing will persist. Once
this economic storm passes, I'm convinced the sun will shine
through, and cloud computing is sure to have many silver
linings.
Ping Li is a partner at Accel Partners in Palo Alto
and focuses primarily on Information Technology
infrastructure and digital media platforms.
