
A Dissection of Cloud Computing

Panuelos, Kevin Matthew C. H.


ADVANDB S18 July 22, 2012

Abstract
In the interest of exploring the way cloud computing affects different areas in computer science, this paper examines the term by first finding a definition for it. It then dissects the layers on which cloud services can run, gives an overview of the fundamental concepts (workflows and virtualization) that are employed to make it possible, and rounds up the issues that come with current implementations of cloud offerings in the aspects of security, availability, scalability, portability and performability.

Categories and Subject Descriptors
H.0 [Information Systems]: General

Keywords
Cloud Computing, Virtualization, Databases, Concurrency, Distributed Systems, Security, Internet, Service-oriented Architecture

Figure 2.1. The various levels of the architecture usually found in cloud infrastructures. From top to bottom: User Front End (Everything as a Service); Software as a Service (consumer/enterprise applications); Platform as a Service (developer runtime or middleware); Infrastructure as a Service (raw computing power).

1. Introduction

The increasing spread and ubiquity of high-speed Internet, be it wired or wireless, has driven growth in the number and complexity of interactive web applications for the enterprise and consumer sectors, which in turn necessitates infrastructure that can handle the similarly growing number of users who access these applications every day. This infrastructure, now provided as a service to parties interested in hosting their data, has been instrumental to the movement that is Cloud Computing.

2. Cloud Computing

In serving the needs of the various parties that access online services, many providers have sprung up to offer data centers for the storage and maintenance needs of Internet start-ups. However, Cloud Computing is not limited to infrastructure; the term has also been used to describe the services that run on top of the infrastructure and are offered to interested parties over the Internet [3]. Pallis [10] even observes that cloud computing may well be considered a new form of distributed computing, brought about by the overwhelming demand for faster and cheaper computing solutions. The lowered cost of owning devices, along with improving Internet speeds, opens consumers and enterprises up to an unprecedented range of new business models and opportunities to use different kinds of services at a very low cost.

2.1 Different Layers of Cloud Computing

The term serves as an abstraction; it covers up an intricate and systematic set of hardware and software that work in tandem to deliver an experience over a network. [11] states that, ultimately, cloud computing involves the provision of an entire infrastructure that would otherwise be rather difficult for entities like small businesses to maintain. This infrastructure already includes services that help interface with the data center, or the cloud. Depending on the requirement, [10] provides a somewhat hierarchical view of the various layers involved in a cloud infrastructure, as depicted in Figure 2.1.

The front end describes the interface through which users actually interact with a service provided by the software as a service (SaaS) layer. It is by far the most abstracted layer: users do not need to know most of what goes on inside a cloud-powered service. Most interest from consumer and enterprise customers springs from the SaaS layer, where software solutions that use the lower layers are provided to those markets. These include applications that invisibly store user data on remote servers (e.g., Dropbox), as well as retrieve, process and present that data. In terms of research, if research centers did not have to run the infrastructure themselves, their researchers would not need to spend effort maintaining it and could focus on the research they are doing, thus increasing productivity significantly. It is also possible for enterprises to conduct their business in an all-electronic fashion while keeping their information technology costs very small, because they focus only on the business logic and not on managing the infrastructure [12].

Operations on user data are likely carried out through middleware, i.e., a platform (the platform as a service, or PaaS, layer) that lets software developers use common operations without regard to the various hardware involved in running their products. [11] uses the term resource virtualization to describe this, because the developer does not have a realistic view of the resources available for a service he might be developing, but can nonetheless use these resources without having to worry about those details. This middleware can also be considered an integrated development environment that can deploy web applications based on technologies like Java, Python, et al. The most important aspect of the middleware is how it allows software service developers to exploit the underlying data store (i.e., the database), which can be a traditional relational database that interfaces with SQL, or otherwise [7]. Companies like Google and Amazon have platform as a service solutions that give access to their actual infrastructure (the same hardware used in their in-house solutions), provide easy deployment of applications, and supply statistics on how users engage with those applications. Access to the infrastructure is also encapsulated through special APIs provided by both, so as to maintain the security of their systems and keep them free of exploits and malicious attacks.

The lowest-level layer, infrastructure as a service (IaaS), is the hardware itself, which system administrators can manage and interface with directly. It is provided so that anyone who wants to build a platform can manipulate the database itself, as opposed to middleware, which limits the database interactions exposed to developers. In simpler terms, anyone creating a compiler that runs in a browser (middleware) has access to the raw operations offered by the hardware provider, whereas anyone using that compiler is limited to the database interactions that the compiler's author chooses to expose. In this sense, the organization creating the compiler rents only the raw infrastructure so that it can build a service that it in turn sells to software authors [10].
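The contrast between renting raw infrastructure and building on a platform's middleware can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `RawDataStore` stands in for an IaaS-level data store and `PlatformDataAPI` for PaaS middleware; both are hypothetical classes, not any provider's real API. The point is only that the middleware author decides which operations developers can reach.

```python
from typing import Dict, Optional


class RawDataStore:
    """Hypothetical IaaS-level store: whoever rents it gets every operation."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)

    def delete(self, key: str) -> None:
        self._rows.pop(key, None)

    def drop_all(self) -> None:
        # A destructive administrative operation only the infrastructure renter sees.
        self._rows.clear()


class PlatformDataAPI:
    """Hypothetical PaaS middleware: exposes only the calls its author chooses."""

    def __init__(self, store: RawDataStore, app_id: str) -> None:
        self._store = store
        self._prefix = app_id + ":"  # namespace each hosted app's keys

    def save(self, key: str, value: str) -> None:
        self._store.put(self._prefix + key, value)

    def load(self, key: str) -> Optional[str]:
        return self._store.get(self._prefix + key)
    # No delete() or drop_all(): developers on the platform cannot reach them.


store = RawDataStore()                       # rented raw infrastructure
app = PlatformDataAPI(store, app_id="demo")  # what a developer on the platform sees
app.save("greeting", "hello")
print(app.load("greeting"))        # hello
print(store.get("demo:greeting"))  # hello
print(hasattr(app, "drop_all"))    # False
```

The key prefix is one possible design choice for keeping each hosted application's data apart in a shared store; real platforms may isolate tenants differently.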

3. Fundamental Concepts

Various concepts are employed in making on-demand access to computing resources possible. A fundamental architecture, called the Service-oriented Architecture, is used as a template for the structure of these services; it is not new, but it is receiving much attention due to the rise of interest in cloud computing [12]. A service-oriented architecture essentially defines a set of services that are loosely coupled (i.e., not dependent on each other to operate) and can be used across different domains [6]. These services can then be accessed through publicly published interfaces.

These services operate under several principles: they should be reusable, sustainable, extensible, scalable, customizable, composable, reliable, available and secure. To elaborate, reusability refers to how a component can be repeatedly incorporated in different workflows; sustainability refers to how a service can adapt to changes in technology, and how different implementations can be plugged in easily; extensible and scalable mean that functionality can be added and capabilities increased; customizable and composable mean that generic features of the service can be modified to fit a need, and that more complex solutions can then be built on top of the service, effectively making it a component; reliability, availability and security describe services that are stable and impervious to malicious user activity while remaining economical and low-cost [12]. Defining the term within the context of this architecture matters because the services that cloud computing applications have in common are what make cloud computing possible. The concepts of workflows and virtualization are among these fundamental concepts that most of these types of applications share.

3.1 Workflows

Usually represented by flowcharts, workflows give an overview of the way different components can be connected to each other to define a process [12]. This process can dictate the flow a cloud application follows through the infrastructure. For example, a platform as a service application, such as an application creation and deployment tool, may pass from the service that runs the user interface to a layer that manages code compilation. The user interface would ideally be able to call a procedure that retrieves a dump of any compile errors, or reports whether a compile error occurred at all. The rest of the process can then proceed in sequence; a sample is shown in Figure 3.2.

Figure 3.2. A sample workflow for a developer environment in the cloud. Components shown: User Interface Framework, Code Parser Service, Code Scanner Service, Code Compilation Service, Software Emulator Service and Code Deployment Service, all running on an Abstracted/Virtualized Infrastructure.

This is essential in laying out the encapsulated, structured manner in which a larger application will ultimately interact with the hardware. That interaction, however, should not be the concern of the client; the arrangement essentially provides a virtualized perspective on the hardware, as covered in the next subsection.
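The workflow of Figure 3.2 can be sketched as a chain of loosely coupled services that know each other only through one published interface, in the spirit of the service-oriented architecture described above. This is a minimal Python sketch, not a real platform: the service classes are hypothetical and named after components in Figure 3.2 (the parser and emulator are omitted), the step order is an assumption, and the "compiler" is a toy rule that flags any token that is not a valid identifier. The loop stops at the first step that reports errors, which plays the role of the compile-error dump the user interface would retrieve.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Job:
    """State passed along the workflow (hypothetical fields)."""
    source: str
    tokens: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    deployed: bool = False


class Service(Protocol):
    """Published interface every step implements; steps know nothing else about each other."""
    def run(self, job: Job) -> Job: ...


class CodeScannerService:
    def run(self, job: Job) -> Job:
        job.tokens = job.source.split()
        return job


class CodeCompilationService:
    def run(self, job: Job) -> Job:
        # Toy rule standing in for a real compiler: every token must be an identifier.
        job.errors = [t for t in job.tokens if not t.isidentifier()]
        return job


class CodeDeploymentService:
    def run(self, job: Job) -> Job:
        job.deployed = True
        return job


def run_workflow(steps: List[Service], job: Job) -> Job:
    for step in steps:
        job = step.run(job)
        if job.errors:  # stop early; the UI can fetch this list of compile errors
            break
    return job


pipeline: List[Service] = [CodeScannerService(), CodeCompilationService(), CodeDeploymentService()]
ok = run_workflow(pipeline, Job("print hello"))
bad = run_workflow(pipeline, Job("print 2x"))
print(ok.deployed, ok.errors)    # True []
print(bad.deployed, bad.errors)  # False ['2x']
```

Because each step depends only on the `Service` interface, a step can be swapped for another implementation (for example, a different compiler) without touching the others, which is the reusability and sustainability the architecture aims for.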

3.2 Virtualization

As shown in Figure 3.2, all the isolated services in the workflow ultimately interface with the hardware; however, the services should not be concerned with the real-world limitations of that hardware, and are only given a hint that, yes, hardware is present. The principle of virtualization is not new either, but what matters for its use in cloud computing is how it renders all sorts of low-level functionality into a unified, consistent interface, in effect hiding the hardware vendor-specific functionality that may be working behind the scenes. The concept has been around since the 1960s and has matured significantly, to the point where, as cloud computing proves, it is cost-efficient at scale. The ability to hide low-level functionality makes it easier to scale, because the entity interacting with the virtualized interface does not need to be aware that, for instance, the underlying infrastructure has been upgraded. Given its implementations in various forms, as consumer software such as VMware or Parallels shows, its ability to provide abstractions over all sorts of hardware (storage, memory, processing power, networking, et al.) can be exploited to offer, as far as any device with network access is concerned, the computing power of a supercomputer [12]. This is visualized in Figure 3.3.
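The interface-hiding principle can be shown with a minimal Python sketch, assuming hypothetical classes: two made-up vendor drivers report capacity in different ways, and a `VirtualVolume` presents them through one interface. The client only ever calls `capacity_gb()`, so the provider can add or replace hardware behind it without the client noticing, which is the property credited above with making scaling easier. Real hypervisors virtualize far more than storage capacity; this shows only the unified-interface idea.

```python
from abc import ABC, abstractmethod
from typing import List


class StorageBackend(ABC):
    """Unified interface the virtualization layer exposes (hypothetical)."""

    @abstractmethod
    def capacity_gb(self) -> int: ...


class VendorADisk(StorageBackend):
    # Stand-in for one vendor's driver, which natively reports sizes in megabytes.
    def __init__(self, megabytes: int) -> None:
        self._mb = megabytes

    def capacity_gb(self) -> int:
        return self._mb // 1024


class VendorBArray(StorageBackend):
    # Stand-in for another vendor's driver, built from several fixed-size disks.
    def __init__(self, disks: int, gb_per_disk: int) -> None:
        self._disks, self._gb = disks, gb_per_disk

    def capacity_gb(self) -> int:
        return self._disks * self._gb


class VirtualVolume:
    """What a client sees: one pool, with no idea which vendors sit underneath."""

    def __init__(self, backends: List[StorageBackend]) -> None:
        self._backends = backends

    def capacity_gb(self) -> int:
        return sum(b.capacity_gb() for b in self._backends)

    def add_backend(self, backend: StorageBackend) -> None:
        # The provider can upgrade hardware without changing the client's interface.
        self._backends.append(backend)


volume = VirtualVolume([VendorADisk(megabytes=512 * 1024), VendorBArray(disks=4, gb_per_disk=256)])
print(volume.capacity_gb())  # 1536
volume.add_backend(VendorBArray(disks=2, gb_per_disk=1024))
print(volume.capacity_gb())  # 3584
```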

Figure 3.3. Because of virtualization, even mobile devices can use network access to reach more computing resources than they come with. (The diagram shows a virtualization layer over Disk/Database, CPU and Network resources.)

4. Issues and Points of Interest

This promise of low-cost, usable computing power has become a reality; many publications have described it as a transformative development in the industry, because of how little capital customers with an idea now need to deploy a system aimed at consumers, and because companies can harness an existing infrastructure instead of building their own for computationally intensive tasks [3].

4.1 Scalability

Despite that, as stated in The Claremont Report on Database Research [1], there is a pressing need to address the scalability issues that come with the growing number of users who rely on cloud data services every day, and those who rent the infrastructure that runs the database should not have to worry about scalability, which is handled by the people responsible for managing the data center. Entities that give customers access to their data centers already offer pay-as-you-go, or usage-based, charges, and enable customers to save on the energy and maintenance costs associated with data centers [4]. If anything, cloud service providers, rather than their customers, are the only entities that need to worry about buying and decommissioning hardware. This flexible scaling allows those who use the infrastructure to avoid overprovisioning and underprovisioning, so the customer need not pass the expense of under- or over-compensating for hardware on to its own customer base. An example would be a retailer website that holds a big sale of its wares: if it relied on a cloud storage provider like Amazon, it would pay large sums only for peak utilization of its resources during the sale, and a low amount during seasonal lulls [4].

4.2 Availability

However, while relying on a cloud service provider can limit the amount of downtime for a client's service, even the best infrastructures still go down for some amount of time. Ironically, even though companies like Google have very elaborate networks of data centers around the globe, the search giant still experienced downtime for its own Gmail e-mail service and its App Engine PaaS. Amazon also experienced major outages in its S3 storage solution [4]. This weakness may be a valid reason for enterprises with mission-critical information to be hesitant about adopting cloud services; one of the main risks of relying on only one provider is the possible loss of data due to various circumstances: a provider could go out of business, or be hit by one of the other issues of cloud computing in general. Chief among those is security.

4.3 Security

Security of data is one of the biggest concerns regarding cloud services in general; the fact that customers do not know the precise locations where their data is stored can by itself raise red flags about whether their data is truly being safeguarded [13]. Even with the distributed nature of most major cloud service providers' data centers, there may still be a weakness in the software or network architecture that leaves other data centers prone to certain security attacks [4]. For instance, a malicious collective may cooperate to launch a Distributed Denial of Service (DDoS) attack, in which each participant makes requests almost simultaneously to the infrastructure of the company they wish to take down. Even though concurrency techniques are employed in the best of cloud infrastructures, the hardware still has limitations; there is no such thing as a truly infinite resource. When a service is too busy handling requests, further incoming requests are not served, hence the name denial of service attack.

While there have been reports of websites migrating to cloud services to protect themselves from this type of attack, it is still possible for a Distributed Denial of Service attack to affect a cloud service provider; the effects may then be felt not by just one site, but across multiple websites and/or services hosted by that same provider [8].
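The capacity argument can be illustrated with a toy model: a shared server that handles a fixed number of requests per tick in arrival order and denies the rest. This is a hypothetical Python sketch, assuming first-come-first-served handling, a made-up capacity, and that the flood arrives ahead of legitimate traffic; it is not how any particular provider schedules requests. It also shows why co-hosted sites suffer: the bots target only one site, yet requests for a second site on the same provider are denied too.

```python
from typing import List, Tuple

Request = Tuple[str, str]  # (site, sender); hypothetical request shape


def serve_tick(queue: List[Request], capacity: int) -> Tuple[List[Request], List[Request]]:
    """Serve at most `capacity` requests in arrival order; the rest are denied."""
    return queue[:capacity], queue[capacity:]


CAPACITY = 100  # hypothetical number of requests the shared hardware handles per tick

normal = [("shop", f"user-{i}") for i in range(30)] + [("blog", f"user-{i}") for i in range(30)]
served, denied = serve_tick(normal, CAPACITY)
print(len(served), len(denied))  # 60 0

# Bots flood only the shop, but their requests reach the shared queue first.
flood = [("shop", f"bot-{i}") for i in range(500)]
served, denied = serve_tick(flood + normal, CAPACITY)
blog_denied = sum(1 for site, _ in denied if site == "blog")
print(len(denied), blog_denied)  # 460 30
```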

4.4 Data Migration

One of the other problems in cloud computing is that there is no standard that enables users to switch from one cloud service provider to another, or to use multiple cloud services for the same data. This is made difficult by the fact that companies take different approaches to storing data; not all of them use relational databases (which are not really ideal for this sort of work, since it demands scalability more than data consistency [5]), giving way to all sorts of implementations without a single path for easy data migration. In simpler terms, users are locked into whichever cloud service provider they choose, and if the provider goes out of business or is bought by a bigger company, the way their data is presented and stored must change, because no standard governs the space [13]. This lack of standardization also means that users are effectively locked in to one provider and virtually unable to get fast performance when using different providers, because of the need to employ middleware, which somewhat defeats the purpose and advantages of cloud computing. This disparity in interfaces is apparent in the variations among implementations of the popular key-value pair databases by various entities, as presented in Table 4.1 and Table 4.2. Most of these key-value databases offer unstructured, flexible schemas that can change dynamically over time and allow significantly less overhead when deserializing to object-oriented environments.

Table 4.1. The features of these cloud services show a disparity in implementing standards for non-relational databases popular in this space, part 1. [5]

Amazon SimpleDB
  - Dynamic and flexible schema (no predefined schema)
  - Cannot retrieve large batches of results; queries are limited to 5 seconds of execution time
  - Each item is limited to 256 attributes
  - Domains (or tables) cannot be larger than 10GB
  - Uses an eventual consistency model, which is very similar to the optimistic technique, except that duplicated data are consolidated only when no one is updating a particular record any more (which does not bode well for anyone reading a record immediately after a write)

Google AppEngine Data Store
  - Interface for Google's proprietary BigTable database (presumably also a key-value pair table)
  - Supports strongly typed values like lists (dynamic array)
  - Use of the Data Store is not available to services not owned by Google

Microsoft SQL Data Services
  - Underlying infrastructure is based on relational databases
  - Developers still interface with this infrastructure like competing key-value pair databases

4.5 Performance Bottlenecks

The realized dream of cloud computing does not come without its current shortcomings; it is not yet at a state of perfection. One of the primary issues comes from the way cloud infrastructure is influenced by current trends in hardware: especially if the data center hosts a distributed system, its machines are, after all, the same kind of computers found in consumers' homes, since consumer-level computers cost significantly less than a supercomputer, yet combining several of them can yield about the same amount of computing power, a trend that is making its way into enterprise computing [?]. This also means that performance is still constrained by the components that can slow a computer down in doing a task, such as hard disk drives and I/O in general. Given that a distributed set-up can consist of various kinds of computers, each delegated part of a larger task, performance can be uncontrollable and unpredictable. Moreover, bugs that are hard to debug at such a large scale may appear from time to time [4].

This is also an issue on the consumer side, as consumers access the services of cloud providers through the Internet. Another performance bottleneck is the bandwidth available to the market being served: if a service seems far too slow to a customer, it could be due in part to too many requests being processed on the provider side, or to low Internet speeds on the customer side. In that case, the adoption of cloud computing in mainstream computing may be slowed down by the cost of wide area networks [4].
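The unpredictability of a task delegated across heterogeneous machines can be illustrated with a toy model: the work is split evenly, and the job finishes only when the slowest machine does. In this hypothetical Python sketch the per-unit costs are made-up numbers in milliseconds; a single machine with a slower disk is enough to make the whole job four times slower. Real frameworks may rebalance work or re-run slow tasks, which this sketch does not model.

```python
from typing import List


def split_work(total_units: int, nodes: int) -> List[int]:
    """Split work as evenly as possible across nodes."""
    base, extra = divmod(total_units, nodes)
    return [base + (1 if i < extra else 0) for i in range(nodes)]


def job_time_ms(units_per_node: List[int], ms_per_unit: List[int]) -> int:
    # A parallel job finishes only when its slowest node finishes.
    return max(u * s for u, s in zip(units_per_node, ms_per_unit))


work = split_work(total_units=1200, nodes=4)  # [300, 300, 300, 300]
uniform = [10, 10, 10, 10]                    # identical machines (hypothetical speeds)
mixed = [10, 10, 10, 40]                      # one node with a slow disk

print(job_time_ms(work, uniform))  # 3000
print(job_time_ms(work, mixed))    # 12000
```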

Table 4.2. The features of these cloud services show a disparity in implementing standards for non-relational databases popular in this space, part 2. [5]

CouchDB
  - Uses JavaScript and JavaScript Object Notation (JSON) to define schemas and return records
  - Operates on the principle of data replication and synchronization across servers
  - Each server is not part of a bigger set of data in a data warehouse; servers are only machines that hold the same data, so that any user can be placed on any server that can handle their requests (load balancing) [2]
  - Overwrites entire sets of data, because this data is contiguously allocated [9]

MongoDB
  - Document-oriented, JSON-based database, like CouchDB
  - Unlike CouchDB, aims to be a true database and not just a key-value store
  - Also unlike CouchDB, various servers can be a partition of a bigger set of data in a database
  - Only writes modified sections of data (update-in-place) [9]

Drizzle
  - Based on MySQL 6.0 source code
  - Strips features from MySQL to make it scalable and optimized for multi-core machines
  - Stores relational data

5. Conclusion

Cloud Computing, the realization of utility computing, continues to make waves in the world of computing, and relies on age-old concepts to deliver its benefits of offering customers affordable, scalable and abstracted resources that they otherwise would not be able to access. The significance of this area has affected many fields, not only databases but also distributed systems, virtualization, networking, et al., and has spurred renewed interest and fervent research in these specialized fields. This type of computing has yet to be perfected, especially because it still has many unresolved quirks and issues pertaining to lack of standardization, security, reliability, scalability and availability. That said, the rapid growth of this area in mainstream computing leaves room for the growth of the aforementioned concepts and the resolution of the aforementioned issues, especially when taking into consideration future advancements in hardware, architecture and infrastructure, opening up even more opportunities to make different kinds of services that go beyond infrastructure as a service, platform as a service and software as a service.

References
[1] Agrawal, R., Ailamaki, A., Bernstein, P. A., Brewer, E. A., Carey, M. J., Chaudhuri, S., Doan, A., Florescu, D., Franklin, M. J., Garcia-Molina, H., Gehrke, J., Gruenwald, L., Haas, L. M., Halevy, A. Y., Hellerstein, J. M., Ioannidis, Y. E., Korth, H. F., Kossmann, D., Madden, S., Magoulas, R., Ooi, B. C., O'Reilly, T., Ramakrishnan, R., Sarawagi, S., Stonebraker, M., Szalay, A. S., and Weikum, G. The Claremont report on database research. SIGMOD Rec. 37, 3 (Sept. 2008), 9-19.
[2] Anderson, J. C., Lehnardt, J., and Slater, N. CouchDB: The Definitive Guide, Jan. 2010.
[3] Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., and Zaharia, M. A view of cloud computing. Commun. ACM 53, 4 (Apr. 2010), 50-58.
[4] Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., Lee, G., Patterson, D. A., Rabkin, A., and Zaharia, M. Above the Clouds: A Berkeley View of Cloud Computing, 2009.
[5] Bain, T. Is the relational database doomed?, Feb. 2009.
[6] Cothren, R. Acronyms and terms, Mar. 2010.
[7] Lawton, G. Developing Software Online With Platform-as-a-Service Technology. Computer 41, 6 (June 2008), 13-15.
[8] Linthicum, D. Can cloud computing save you from DDoS attacks?, Dec. 2010.
[9] Merriman, D. Comparing MongoDB and CouchDB, Aug. 2011.
[10] Pallis, G. Cloud Computing: The New Frontier of Internet Computing. IEEE Internet Computing 14, 5 (Sept. 2010), 70-73.
[11] Stanoevska-Slabeva, K., and Wozniak, T. Cloud Basics: An Introduction to Cloud Computing. In Grid and Cloud Computing, K. Stanoevska-Slabeva, T. Wozniak, and S. Ristol, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, ch. 4, pp. 47-61.
[12] Vouk, M. A. Cloud computing: Issues, research and implementations. In ITI 2008 - 30th International Conference on Information Technology Interfaces (June 2008), IEEE, pp. 31-40.
[13] Walker, G. Cloud computing fundamentals, Dec. 2010.