Faculty of Informatics
Master Thesis
Database management
as a cloud-based service
for small and medium
organizations
Student:
Dime Dimovski
Brno, 2013
Statement
I declare that I have worked on this thesis independently, using only the sources listed in the
bibliography. All resources, sources, and literature that I used or drew upon in preparing this thesis
are properly cited, with full references to their sources.
Dime Dimovski
Resume
The goal of this thesis is to explore cloud computing, focusing mainly on database
management systems offered as a cloud service. It reviews some of the currently available
SQL and NOSQL based database management systems offered as a cloud service, the
advantages and disadvantages of cloud computing in general, and the common
considerations.
Keywords
Cloud computing, SaaS, PaaS, Database management, SQL, NOSQL, DBaaS, Database.com, SQL
Azure, Amazon Web Services, SimpleDB, DynamoDB, Google SQL, MongoDB, CouchDB, Google
Datastore.
Contents
1. Introduction ........................................................................ 8
2.3.1 Infrastructure ................................................................ 13
2.3.2 Platform ....................................................................... 14
2.3.4 Application ................................................................... 16
3. Scalability ......................................................................... 17
4. Elasticity .......................................................................... 18
6. Database.com ...................................................................... 21
6.13 Backup .......................................................................... 27
6.14 Pricing .......................................................................... 27
7.1 Subscriptions .................................................................... 28
7.2 Databases ........................................................................ 28
7.6 Network Topology ............................................................... 31
7.8 Failure Detection ............................................................... 33
7.9 Reconfiguration ................................................................. 33
7.12 Throttling ...................................................................... 34
7.13 Load Balancer .................................................................. 35
8. Amazon Web Services ............................................................. 37
8.5 Pricing ........................................................................... 39
9. Google Cloud SQL ................................................................. 40
9.1 Pricing ........................................................................... 41
11. NOSQL ............................................................................ 45
12.5 Pricing .......................................................................... 51
13.3 Transactions .................................................................... 53
13.4 Scalability ...................................................................... 53
13.5 High Availability .............................................................. 53
14.4 Scalability ...................................................................... 57
14.5 Querying ........................................................................ 57
14.8 Javascript ...................................................................... 58
14.9 REST ............................................................................ 58
15. What benefits cloud database and cloud computing brings for small and medium organizations? ... 62
17. Conclusion ........................................................................ 69
Appendix .............................................................................. 70
Case studies from the industry: Amazon RDS ....................................... 70
Case studies from the industry: Microsoft SQL Azure .............................. 70
Case studies from the industry: Amazon DynamoDB .................................. 70
Case studies from the industry: Amazon SimpleDB .................................. 71
References ............................................................................ 72
1. Introduction
The boom of cloud computing over the past few years has brought with it many innovations and new
technologies. It has become common for enterprises and individuals to use services offered in the
cloud and to recognize that cloud computing is a big deal, even if they are not entirely clear why that
is so. Even the phrase "in the cloud" has entered our colloquial language. A huge percentage of the
world's developers are currently working on cloud-related products. The cloud has thus become an
amorphous entity that is supposed to represent the future of modern computing.
In an attempt to gain a competitive edge, businesses are looking for new and innovative ways to cut costs
while maximizing value. They recognize the need to grow, but at the same time they are under pressure
to save money. The cloud gives businesses this opportunity, allowing them to focus on their core
business by offering hardware and software solutions that they do not have to develop on their own.
In this thesis I will give an overview of what cloud computing is. I will describe its main concepts and
architecture, take a look at the XaaS paradigm (anything/everything as a service) and the options
currently available in the cloud, focusing mostly on databases in the cloud, or Database as a Service. I will
take a closer look at how cloud computing in general, and database as a service in particular, can be used by small
and medium enterprises, what the main benefits are, and whether it will really help businesses to
reduce their budgets and focus on their core business.
Abstraction
Cloud computing is abstracting the details of the system implementation from the users and the
developers. Applications run on physical systems that aren't specified, data is stored in locations that
are unknown, administration of systems is outsourced to others, and access by users is ubiquitous.[1]
Virtualization
Cloud computing virtualizes systems by pooling and sharing resources. Systems and storage can be
provisioned as needed from a centralized infrastructure, costs are assessed on a metered basis, multitenancy is enabled, and resources are scalable with agility.
Cloud computing is an abstraction based on the notion of pooling physical resources and presenting
them as a virtual resource. It is a new model for provisioning resources, for staging applications, and
for platform-independent user access to services. Clouds can come in many different types, and the
services and applications that run on clouds may or may not be delivered by a cloud service provider.
Essential Characteristics:
Rapid elasticity - Capabilities can be elastically provisioned and released, in some cases
automatically, to scale rapidly outward and inward commensurate with demand. To the
consumer, the capabilities available for provisioning often appear to be unlimited and can be
appropriated in any quantity at any time.
Measured service - Cloud systems automatically control and optimize resource use by
leveraging a metering capability at some level of abstraction appropriate to the type of service
(e.g. storage, processing, bandwidth, and active user accounts). Resource usage can be
monitored, controlled, and reported, providing transparency for both the provider and
consumer of the utilized service.
Service Models:
Software as a Service (SaaS) - The capability provided to the consumer is to use the provider's
applications running on a cloud infrastructure. The applications are accessible from various
client devices through either a thin client interface, such as a web browser (e.g., web-based
email), or a program interface. The consumer does not manage or control the underlying
cloud infrastructure including network, servers, operating systems, storage, or even individual
application capabilities, with the possible exception of limited user specific application
configuration settings.
Platform as a Service (PaaS) - The capability provided to the consumer is to deploy onto the
cloud infrastructure consumer-created or acquired applications created using programming
languages, libraries, services, and tools supported by the provider. The consumer does not
manage or control the underlying cloud infrastructure including network, servers, operating
systems, or storage, but has control over the deployed applications and possibly configuration
settings for the application-hosting environment.
Infrastructure as a Service (IaaS) - The capability provided to the consumer is to provision
processing, storage, networks, and other fundamental computing resources where the
consumer is able to deploy and run arbitrary software, which can include operating systems
and applications. The consumer does not manage or control the underlying cloud
infrastructure but has control over operating systems, storage, and deployed applications; and
possibly limited control of select networking components (e.g., host firewalls).
Deployment Models:
Private cloud - The cloud infrastructure is provisioned for exclusive use by a single
organization comprising multiple consumers (e.g., business units). It may be owned, managed,
and operated by the organization, a third party, or some combination of them, and it may
exist on or off premises.
Community cloud - The cloud infrastructure is provisioned for exclusive use by a specific
community of consumers from organizations that have shared concerns (e.g., mission,
security requirements, policy, and compliance considerations). It may be owned, managed,
and operated by one or more of the organizations in the community, a third party, or some
combination of them, and it may exist on or off premises.
Public cloud - The cloud infrastructure is provisioned for open use by the general public; it is
usually an open system available over the Internet. It may be owned,
managed, and operated by a business, academic, or government organization, or some
combination of them. It exists on the premises of the cloud provider. Examples of public
clouds include Google App Engine, Amazon Elastic Compute Cloud (EC2), and Microsoft Azure.
Hybrid cloud - The cloud infrastructure is a composition of two or more distinct cloud
infrastructures (private, community, or public) that remain unique entities, but are bound
together by standardized or proprietary technology that enables data and application
portability (e.g., cloud bursting for load balancing between clouds). [2]
Applications in the cloud are usually composable systems: they use standard
components to assemble services that are tailored for a specific purpose. A composable component
should be modular and stateless.
In general, cloud computing does not require hardware and software to be composable, but composability is a
highly desirable characteristic. It makes system designs easier to implement and makes solutions more
portable and interoperable, which are among the main benefits of composable systems.
This trend toward designing composable systems in cloud computing is reflected in the widespread adoption
of what has come to be called Service-Oriented Architecture (SOA). The essence of a service-oriented
design is that services are constructed from a set of modules using standard communications
and service interfaces. An example of a set of widely used standards describes the services themselves
in terms of the Web Services Description Language (WSDL), data exchange between services using
some form of XML, and the communications between the services using the SOAP protocol. There are,
of course, alternative sets of standards.[1]
What isn't specified is the nature of the module itself; it can be written in any programming language
the developer wants. From the standpoint of the system, the module is a black box, and only the
interface is well specified. This independence of the internal workings of the module or component
means it can be swapped out for a different model, relocated, or replaced at will, provided that the
interface specification remains unchanged. That is a powerful benefit to any system or application
provider as their products evolve.
Essentially there are 3 tiers in a basic cloud computing architecture:
Infrastructure
Platform
Application
If we further break down the standard cloud computing architecture, there are really two areas to deal
with: the front end and the back end.
Front End - The front end includes all client (user) devices and hardware in addition to their computer
network and the application that they actually use to make a connection with the cloud.
Back End - The back end is populated with the various servers, data storage devices and hardware that
facilitate the functionality of a cloud computing network.
2.3.1 Infrastructure
The infrastructure of cloud computing architecture is essentially all the hardware, data storage devices
(including virtualized hardware), networking equipment, applications and software that operate and
support the cloud.
2.3.2 Platform
A cloud computing platform is the actual programming, code and implemented systems of interfacing
that help user-level devices (and applications) connect with the hardware and software resources of
the cloud. It is a software layer that is used to create a higher level of services.
A cloud computing platform is generally divided up between the front end and back end of a network.
Its job is to provide a communication and access portal for the client, so that they may effectively
utilize the resources of the cloud network. The platform may only be a set of directions, but it is in all
actuality the most integral part of a cloud computing network; without it cloud computing would not
be possible.
There are many different Platform as a Service (PaaS) providers.
All platform services offer the hosted hardware and software needed to build and deploy Web applications
or services that are custom built by the developers.
It makes sense for operating system vendors to move their development environments into the cloud
with the same technologies that have been successfully used to create Web applications. Thus, you
might find a platform based on an Oracle xVM hypervisor virtual machine that includes
a NetBeans Integrated Development Environment (IDE) and that supports the Oracle GlassFish
Web stack programmable using Perl or Ruby. For Windows, Microsoft would be similarly interested in
providing a platform that allows Windows developers to run on a Hyper-V VM, use the ASP.NET
application framework, support one of its enterprise applications such as SQL Server, and be
programmable within Visual Studio, which is essentially what the Azure Platform does. This approach
allows someone to develop a program in the cloud that can be used by others.
Platforms often come with tools and utilities to aid in application design and deployment. Depending
on the vendor, these can include tools for team collaboration, testing tools, versioning tools, database and
web service integration, and storage tools. Platform providers typically begin by creating a developer
community to support the work done in the environment.
The platform is exposed to users through an API, and an application built in the cloud using a platform
service typically encapsulates the service through its own API. An API can control data flow,
communications, and other important aspects of the cloud application. To date there is no
standard API, and each cloud vendor has its own.
Packages that are put together for designers are often much easier to use than most standardized
design tools. These packages often allow software development teams to integrate and share their
work more smoothly, as well as run a project from start to finish much faster than with other
systems. [3]
The global emergence of Application Platform as a Service (APaaS) will no doubt lead to the creation of a number of companies that will
utilize the tools of APaaS to create their own business model, especially one that seeks to provide yet
another proprietary service aimed at delivering timely solutions to business software issues. One
particular area that could use the help is enterprise software, for example. Enterprise software is
often hard to manage, difficult to customize and frequently falls short in its functionalities. When you
couple these shortcomings with the fact that it is often quite expensive, there is a serious problem. An
obvious solution for dealing with enterprise software problems would be the deployment of an
APaaS-style service. Not only would this greatly increase the overall functionality of expensive
enterprise business software, but it would also allow for a great range of customization, as well as the
option for integrating it with other cloud services and/or networking opportunities. APaaS was
created to make the lives of software designers, developers and investors much easier. It is through
the use of APaaS that many excellent next generation apps have been developed and many experts in
the field of cloud computing agree that it is APaaS that will produce some of the upcoming game
changing applications that will actually shape the future of cloud computing in general.
2.3.4 Application
This area is comprised of the client hardware and the interface used to connect to the cloud. Big
problems arise from the design of Internet protocols to treat each request to a server as an
independent transaction (stateless service) [1]. The standard HTTP commands are all atomic in nature.
While stateless servers are easier to architect and stateless transactions are more resilient and can
survive outages, much of the useful work that computer systems need to accomplish is stateful.
Transaction servers, message queuing servers and other similar middleware are meant to
bridge this problem, together with standard methods that are part of Service-Oriented Architecture
and that are also used in cloud computing.
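As a small illustration of this statelessness, the sketch below (with a hypothetical endpoint and field names, not tied to any particular provider) shows a client that carries everything the service needs, an authentication token and a record identifier, on every request, so that any server instance behind a load balancer can handle any call.

```python
import requests

API = "https://api.example-cloud-service.com"  # hypothetical service endpoint
TOKEN = "..."                                   # hypothetical bearer token

# Each HTTP request is an independent, atomic transaction: the server keeps no
# session, so the client resends the token and the cart id with every call.
headers = {"Authorization": f"Bearer {TOKEN}"}

# First request: create a cart; the returned id is the only state the client keeps.
cart = requests.post(f"{API}/carts", headers=headers, json={}).json()

# A later request may be served by a completely different server instance,
# which is fine because everything it needs travels with the request.
requests.post(f"{API}/carts/{cart['id']}/items",
              headers=headers,
              json={"sku": "ABC-123", "quantity": 2})
```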
There are many ways in which clients can connect to a cloud service. The most common are:
Web browser
Proprietary application
These applications can run on a number of different devices: PCs, servers, smartphones, and tablets. They
all need a secure way to communicate with the cloud. Some of the basic methods used to secure the
connection are:
Remote access protocols such as Microsoft RDP or Citrix ICA, which use a tunneling mechanism
Data encryption
3. Scalability
Scalability is the ability of a system to handle a growing amount of work in a capable manner, or its
ability to improve when additional resources are added.
The scalability requirement arises due to the constant load fluctuations that are common in the
context of Web-based services. In fact these load fluctuations occur at varying frequencies: daily,
weekly, and over longer periods. The other source of load variation is due to unpredictable growth (or
decline) in usage. The need for scalable design is to ensure that the system capacity can be
augmented by adding additional hardware resources whenever warranted by load fluctuations.
Thus, scalability has emerged both as a critical requirement as well as a fundamental challenge in the
context of cloud computing.[1][4]
Typically there are two ways to increase scalability:
Vertical scalability - adding hardware resources to an existing node, usually more CPU, memory, etc.
Vertical scaling (scaling up) lets providers use virtualization technologies more
effectively by providing more resources for the hosted operating systems and applications to
share.
Horizontal scalability - adding more nodes to a system, such as adding a new node to a
distributed software application or adding more access points within the current system.
Hundreds of small computers may be configured in a cluster to obtain aggregate computing
power. The horizontal scaling (scale-out) model also creates an increased demand for
shared data storage with very high I/O performance, especially where processing of large
amounts of data is required. In general, the scale-out paradigm has served as the
fundamental design paradigm for the large-scale data centers of today; a minimal sketch of the idea follows below.
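The following minimal sketch, with invented node names, illustrates only the scale-out idea: requests are spread over a pool of nodes by hashing a key, and horizontal scaling amounts to appending nodes to the pool (real systems typically prefer consistent hashing so that adding a node remaps fewer keys).

```python
import hashlib

class NodePool:
    """Toy dispatcher illustrating horizontal scaling (scale-out)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def add_node(self, node):
        # Horizontal scaling: capacity grows by adding another machine.
        # (Real systems prefer consistent hashing so fewer keys move.)
        self.nodes.append(node)

    def node_for(self, key):
        # Hash-based distribution of work across the current nodes.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

pool = NodePool(["node-1", "node-2"])
pool.add_node("node-3")          # scale out under increased load
print(pool.node_for("user-42"))  # routed to one of node-1 .. node-3
```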
Integrating multiple load balancers into the system is probably the best solution for dealing with
scalability issues. There are many different forms of load balancers to choose from: server farms,
software, and even hardware designed to handle and distribute increased traffic. Items
that interfere with scalability include, for example, too much software clutter (no organization) within the hardware stacks [3].
Creating a cloud network that offers the maximum level of scalability potential is entirely possible if
we apply a more diagonal solution. By incorporating the best solutions present in both vertical and
horizontal scaling, it is possible to reap the benefits of both models[3]. Once the servers reach the
limit of diminishing returns (no growth), we should simply start cloning them. This will allow us to
keep a consistent architecture when adding new components, software, apps and users. For most
individuals, problems arise from lack of resources not the inherent architecture of their cloud itself. A
more diagonal approach should help the business to deal with the current and growing demands that
it is facing.
4. Elasticity
Of all the attributes possessed by cloud computing in general, the most important is certainly its
elasticity: its ability to expand and instantly upgrade resources and/or capacities at a moment's notice.
Storage, processing and the scalability of applications are all elastic in the cloud. The really remarkable
thing about cloud computing is the real-time infrastructure that actively responds to user requests for
resources. Without the real-time monitoring and support behind this elasticity, the effectiveness,
adaptability and muscle of cloud computing would be greatly undermined. It is this elastic ability that
the service providers possess which allows them to offer their users access to cloud computing
services at such reduced costs. Since users only pay for what they use, they can save money. For
example, with a traditional grid computing network every user has its own intensive hardware setup,
of which most users rarely use more than 50% of the capacity. Their combined resource usage
might be 20-30% of the total resources available on a central cloud computing hardware stack.
What cloud computing is really offering is the ability for average users to retain their current
standards and expectations, while leaving the door open for instant expansion opportunities if they
desire it. This also gives a much more efficient way to use energy.
Elasticity offers the same computing experience to which we are accustomed, with the added benefit
of near-limitless resources, while at the same time offering a way to manage energy consumption. [1][3]
The elastic capabilities offered by cloud computing make it perfectly suited to handling certain
activities or processes.
Establishing an in office communication and online networking infrastructure (for
employees). Setting up a system that allows those in the organization a cleaner and
more efficient system for communicating and working often leads to greatly increased
profits.
Using cloud computing to handle overdrafting - high volume data transfer periods and
events. Some businesses only use cloud computing when they run out of their own
resources, or perhaps anticipate that they might lack needed functionalities.
This can be something that is scheduled for an annual or bi-annual basis; designed to
meet a seasonal demand for a particular product for example.
Assigning all customer data and transaction information to a cloud computing
element. This allows an organization to keep its customers' data safe even from
its own employees. Utilizing a third party to handle all customer data can also pay
off in the event of a catastrophic event. Cloud computing providers tend to keep
information more securely backed up than most are even aware of. [3]
In other words, elasticity allows both user and provider to do more with less.
Traditional database management systems have been successful largely because they offer:
Overall functionality - an intuitive and relatively simple model for modeling different
types of applications
Consistency - dealing with concurrent workloads without worrying about the data getting out
of sync
Performance - low latency and high throughput, combined with many years of engineering and
development
Reliability - persistence of data in the presence of different types of failures and ensuring
safety.
The main concern is that DBMSs and RDBMSs are not cloud-friendly, because they are not as
scalable as web servers and application servers, which can scale from a few machines to hundreds.
Traditional DBMSs are not designed to run on top of a shared-nothing architecture (where a set of
independent machines accomplish a task with minimal resource overlap), and they do not provide the
tools needed to scale out from a few to a large number of machines.
Technology leaders such as Google, Amazon, and Microsoft have demonstrated that data centers
comprising thousands to hundreds of thousands of compute nodes provide unprecedented economies
of scale, since multiple applications can share a common infrastructure. All three companies provide
frameworks, namely Amazon's AWS, Google's AppEngine and Microsoft's Azure, for hosting
third-party applications in their clouds (data-center infrastructures).
The RDBMSs, or transactional data management databases, that back banking, airline
reservation, online e-commerce, and supply chain management applications typically rely on the ACID
(Atomicity, Consistency, Isolation, Durability) guarantees that databases provide, and it is hard to
maintain ACID guarantees in the face of data replication over large geographic distances¹. For this reason
these companies have developed proprietary data management technologies referred to as key-value stores,
informally called NOSQL database management systems. [6] The need for web-based applications to support a
virtually unlimited number of users and to respond to sudden load fluctuations raises the
requirement to make them scalable in cloud computing platforms. Such scalability must be
provisioned dynamically without causing any interruption in the service. Key-value stores and
other NOSQL database solutions, such as the Google Datastore offered with Google AppEngine, Amazon
SimpleDB and DynamoDB, MongoDB and others, have been designed so that they can be elastic or can
be dynamically provisioned in the presence of load fluctuations. We will explain some of these systems
in more detail later on.
¹ The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to
simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the
system)
According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but
not all three.
As we move to the cloud-computing arena, which typically comprises data centers with thousands of
servers, the manual approach to database administration is no longer feasible. Instead, there is a
growing need to make the underlying data management layer autonomic or self-managing, especially
when it comes to load redistribution, scalability, and elasticity. [7]
This issue becomes especially acute in the context of pay-per-use cloud-computing platforms hosting
multi-tenant applications. In this model, the service provider is interested in minimizing its operational
cost by consolidating multiple tenants on as few machines as possible during periods of low activity
and distributing these tenants over a larger number of servers during peak usage [7]. Due to the above
desirable properties of key-value stores in the context of cloud computing and large-scale data-centers,
they are being widely used as the data management tier for cloud-enabled Web applications. Although
it is claimed that atomicity at a single key is adequate in the context of many Web-oriented
applications, evidence is emerging that indicates that in many application scenarios this is not enough.
In such cases, the responsibility to ensure atomicity and consistency of multiple data entities falls on
the application developers. This results in the duplication of multi-entity synchronization mechanisms
many times in the application software. In addition, as it is widely recognized that concurrent programs
are highly vulnerable to subtle bugs and errors, this approach impacts the application reliability
adversely. The realization of providing atomicity beyond single entities is widely discussed in developer
blogs. Recently, this problem has also been recognized by the senior architects from Amazon and
Google, leading to systems like MegaStore [10] that provide transactional guarantees on key-value
stores.
Both RDBMS and NOSQL DBMS offerings in the cloud will be explained in more detail: how they work,
who offers them, and how they are provisioned.
I will first focus on relational databases offered in the cloud, starting with one of the first
enterprise databases built for the cloud, Salesforce's Database.com.
6. Database.com
Database.com is a database management system built for cloud computing with multitenancy
inherent in its design. Traditional RDBMSs were designed to support on-premises deployments for one
organization. All core mechanisms, such as the system catalog, caching mechanisms and the query optimizer,
are built to support single-tenant applications and to run directly on a specifically tuned host operating
system and hardware. The only possible way to build a multi-tenant cloud database service with a standard
RDBMS is to use virtualization. Unfortunately, the extra overhead of the hypervisor typically hurts the
performance of the RDBMS. Database.com combines several different persistence technologies,
including a custom-designed relational database schema, which are innately designed for clouds and
multitenancy - no virtualization required.
6.1 Database.com Architecture
Database.com's core relational database technology uses a runtime engine that materializes all
application data from metadata - data about the data itself. In Database.com's metadata-driven
architecture, there is a clear separation between the compiled runtime database engine (kernel), tenant data,
and the metadata that describes each application's schema. These distinct boundaries make it possible
to independently update the system kernel and tenant-specific application schemas.
Every logical database object is internally managed using metadata. Objects (tables in traditional
relational database parlance), fields, stored procedures, and database triggers are all abstract
constructs that exist merely as metadata in Database.com's Universal Data Dictionary (UDD).
The terminology used by Database.com is shown in Table 1.
Relational Database Term    Database.com Term
Database                    Organization
Table                       Object
Column                      Field
Row                         Record
Table 1: Database.com terminology
When a new application object is defined or some procedural code is written, Database.com does not
create an actual table in a database or compile any code; it simply stores metadata that the system's
engine can use to generate the virtual application components at runtime. When modification or
customization of the application schema is needed, such as modifying an existing field in an
object, all that is required is a simple non-blocking update to the corresponding metadata [9].
In order to avoid performance-sapping disk I/O and code recompilations, and to improve application
response times, Database.com uses massive and sophisticated metadata caches that maintain the most
recently used metadata in memory. The system runtime engine must be optimized for metadata access,
because otherwise frequent metadata access would prevent the service from scaling.
At the heart of Database.com is its transaction database engine. Database.com uses a relational
database engine with a specialized schema built for multitenancy. It also employs a search engine
(separate from the transaction engine) that optimizes full-text indexing and searches. As applications
update data, the search service's background processes asynchronously update tenant- and user-specific
indexes in near real time. This separation of duties between the transaction engine
and the search service lets applications process transactions without the overhead of text index
updates [9].
When application schemas are created, the UDD keeps track of metadata concerning the objects, their
fields, their relationships, and other object attributes. A few large database tables store the structured
and unstructured data for all virtual tables. A set of related multitenant indexes, implemented as
simple pivot tables with denormalized data, makes the combined data set extremely functional.
Because Database.com manages object and field definitions as metadata rather than actual database
structures, the system can tolerate online multitenant application schema maintenance activities
without blocking the concurrent activity of other tenants and users [9].
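The sketch below is a deliberately simplified, hypothetical illustration of this metadata-driven storage model; apart from the MT_Data, MT_Indexes and ObjID names taken from the text, the column and object names are invented and do not reproduce Database.com's actual schema. A logical object exists only as metadata, every tenant's rows land in one shared data structure, and indexed field values are copied synchronously into a pivot structure.

```python
# Hypothetical, simplified sketch of a metadata-driven multitenant store.
# The logical object "Invoice" never becomes a physical table; it is metadata.
metadata = {
    ("org1", "Invoice"): {"obj_id": "a01", "fields": {"Amount": "flex_0"}},
}

mt_data = []     # one shared heap holding rows of every tenant and object
mt_indexes = []  # pivot structure holding copies of indexed field values

def insert(org_id, obj_name, values):
    meta = metadata[(org_id, obj_name)]
    row = {"org_id": org_id, "obj_id": meta["obj_id"]}
    for field, value in values.items():
        slot = meta["fields"][field]              # metadata maps field -> slot
        row[slot] = value
        # Indexed values are copied synchronously into the pivot structure.
        mt_indexes.append({"org_id": org_id, "obj_id": meta["obj_id"],
                           "slot": slot, "value": value})
    mt_data.append(row)

insert("org1", "Invoice", {"Amount": 250})
```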
6.3 Multitenant indexes
Database.com automatically indexes various types of fields to deliver scalable performance. Traditional
database systems rely on native database indexes to quickly locate specific rows in a database table
that have fields matching a specific condition. The index of MT_Data is managed by synchronously copying
field data marked for indexing into an appropriate column of a pivot table called MT_Indexes.
In some circumstances the external search engine can fail to respond to a search request. In these cases
Database.com falls back to a secondary search mechanism. A fallback search is implemented as a direct
database query with search conditions that reference the Name field of target records. To optimize
global object searches (searches that span tables) without having to execute potentially expensive
union queries, a pivot table called MT_Fallback_Indexes that records the Name of all records is
maintained. Updates to MT_Fallback_Indexes happen synchronously, as transactions modify records,
so that fall-back searches always have access to the most current database information [9].
6.4 Multitenant relationships
Database.com provides relationship datatypes that an organization can use to declare relationships
(referential integrity) among tables. When an organization declares an object's field with a relationship
type, the field is mapped to a Value field in MT_Data, and this field is then used to store the ObjID of a
related object [9].
6.5
Database.com provides history tracking for any field. When a tenant enables auditing for a specific
field, the system asynchronously records information about the changes made to the field (old and
new values, change date, etc.) using an internal pivot table as an audit trail [9].
The query optimizer need only consider accessing data partitions that contain a tenant's data rather
than an entire table or index. This common optimization is sometimes referred to as partition
pruning. [9]
6.7 Application development
Developers can declaratively build server-side application components using the Database.com
Console. This point-and-click interface supports all facets of the application schema building process,
including the creation of an applications data model (objects and their fields, relationships, etc.),
security and sharing model (users, profiles, role hierarchies, etc.), declarative logic (workflows), and
programmatic logic (stored procedures and triggers). The Console provides access to built-in system
features, which make it easy to implement application functionality without the need to write code
[9].
6.8 Data Access
Database.com provides the following tools to query and work with data.
Database.com REST API and Force.com Web Services API
The REST API and Web Services API can be used to interact with Database.com by creating, retrieving,
updating, and deleting records, maintaining passwords, performing searches, etc. These APIs can be used
with any language that supports Web services.
The SOAP-based API is optimized for real-time client applications that update small numbers of records
at a time [8][9].
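As a hedged example of this REST style (the instance URL, API version, object and field names are placeholders, not taken from the thesis), a record of a hypothetical custom object can be created, updated and deleted with plain HTTP calls:

```python
import requests

INSTANCE = "https://na1.salesforce.com"        # placeholder instance URL
HEADERS = {"Authorization": "Bearer ...",       # OAuth token obtained earlier
           "Content-Type": "application/json"}
BASE = f"{INSTANCE}/services/data/v26.0/sobjects/Invoice__c"  # placeholder object

# Create a record of a hypothetical custom object.
resp = requests.post(f"{BASE}/", headers=HEADERS,
                     json={"Amount__c": 250, "Status__c": "Open"})
record_id = resp.json()["id"]

# Update and then delete the same record.
requests.patch(f"{BASE}/{record_id}", headers=HEADERS, json={"Status__c": "Paid"})
requests.delete(f"{BASE}/{record_id}", headers=HEADERS)
```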
Force.com Bulk API
The Bulk API is based on REST principles, and is optimized for loading or deleting large sets of data. It
can be used to insert, update, delete, or restore a large number of records asynchronously by
submitting a number of batches that are processed in the background by Database.com. The Bulk
API is designed to simplify the processing of a few thousand to millions of records.
Apex Data Manipulation Language (DML)
DML statements are used to insert, delete, and update data from within your Apex code.
Apex Web Services
Apex methods can be exposed as Web service operations that can be called by external Web client
applications. This is a powerful tool for building efficient communication between the data service and
the application tier. By aggregating business logic onto Database.com, developers can:
Build more robust applications, since all of the logic implemented in Apex is executed within a
transaction on Database.com [9]
6.9 Query languages
Database.com uses the Salesforce Object Query Language (SOQL) to construct database queries.
Similar to the SELECT command in the Structured Query Language (SQL), SOQL allows you to specify
the source object, a list of fields to retrieve, and conditions for selecting rows in the source object.
Database.com also includes a full-text, multi-lingual search engine that automatically indexes all
text-related fields. Apps can leverage this pre-integrated search engine using the Salesforce Object Search
Language (SOSL) to perform text searches.
Unlike SOQL, which can only query one object at a time, SOSL can search text, email, and phone fields
for multiple objects simultaneously [9].
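To make the difference concrete, the sketch below issues one SOQL query and one SOSL search through the REST query and search resources; the object and field names are hypothetical and the API version is a placeholder.

```python
import requests

INSTANCE = "https://na1.salesforce.com"      # placeholder instance URL
HEADERS = {"Authorization": "Bearer ..."}     # OAuth token obtained earlier

# SOQL: one object at a time, with fields and row-selection conditions.
soql = "SELECT Id, Amount__c FROM Invoice__c WHERE Status__c = 'Open' LIMIT 10"
rows = requests.get(f"{INSTANCE}/services/data/v26.0/query",
                    headers=HEADERS, params={"q": soql}).json()["records"]

# SOSL: full-text search across several objects at once.
sosl = ("FIND {Acme*} IN ALL FIELDS "
        "RETURNING Invoice__c(Id, Amount__c), Contact(FirstName, LastName)")
hits = requests.get(f"{INSTANCE}/services/data/v26.0/search",
                    headers=HEADERS, params={"q": sosl}).json()
```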
The search engine receives data from the transactional engine, with which it creates search indexes.
The transactional engine forwards search requests to the search engine, which returns results that the
transaction engine uses to locate rows that satisfy the search request.
As applications update data in text fields (CLOBs, Name, etc.), a pool of background processes called
indexing servers is responsible for asynchronously updating the corresponding indexes, which the search
engine maintains outside the core transaction engine. To optimize the indexing process, Database.com
synchronously copies modified chunks of text data to an internal to-be-indexed table as
transactions commit, thus providing a relatively small data source that minimizes the amount of data
that indexing servers must read from disk. The search engine automatically maintains separate indexes
for each organization (tenant).
Depending on the current load and utilization of indexing servers, text index updates may noticeably
lag behind actual transactions. To avoid unexpected search results originating from stale indexes,
Database.com also maintains an MRU (most recently used) cache of recently updated rows that the
system considers when materializing full-text search results. In order to efficiently support possible
search scopes, MRU caches are maintained per-user and per-organization.
Database.com's search engine optimizes the ranking of records within search results using several
different methods. For example, the system considers the security domain of the user performing a
search and weighs those rows to which the current user has access more heavily. The system can also
consider the modification history of a particular row and rank more actively updated rows ahead of
those that are relatively static. The user can choose to weight search results as desired, for example,
placing more emphasis on recently modified rows.
6.13 Backup
Database.com uses a variety of methods to ensure that organizations do not experience any data loss.
Every transaction is stored to RAID disks in real-time with archive mode enabled, allowing the database
to recover all transactions prior to any system failure. Every night all data is backed up to a separate
backup server and automatic tape library. The backup tapes are cloned as an additional precautionary
measure, and the cloned tapes are transported to an off-site, fireproof vault twice a month [8].
6.14 Pricing
Database.com pricing is based on the number of users, records and transactions per month.
Registration of a new account is free and includes:
3 Standard Users
3 Administration Users
Additional storage and capacity can be purchased at any time with no downtime.
7. SQL Azure
7.1 Subscriptions
To use SQL Azure, a Windows Azure platform account is required. This account allows access to all
the Windows Azure-related services, such as Windows Azure, Windows Azure AppFabric, and SQL
Azure. The Windows Azure platform account is used to set up and manage subscriptions and to bill for
consumption of any of the Windows Azure services, including SQL Azure; running SQL Azure does
not require Windows Azure. With the Windows Azure platform account, the Windows Azure Platform
Management portal can be used to create SQL Azure servers, databases, and the associated
administrator accounts [11].
Each subscription allows one instance of SQL Server to be defined, which initially includes only a
master database. For each server, firewall settings have to be configured to determine which
connections will be allowed access.
7.2 Databases
Each SQL Azure server always includes a master database. Up to 149 additional databases can be
created for each SQL Azure server. Microsoft offers two editions of SQL Azure databases, Web
and Business, and when a database is created using the Windows Azure Platform Management
portal, the maximum size specified determines the edition. A Web Edition database can
have a maximum size of 1 GB or 5 GB. A Business Edition database can have a maximum size of up to
150 GB of data, in 10 GB increments up to 50 GB, and then in 50 GB increments [11][12]. If the size of the
database reaches its limit, it is not possible to insert data, update data, or create new database
objects. However, reading and deleting data, truncating tables, dropping tables and indexes, and rebuilding indexes
are still possible.
The SQL Azure data access model does not support cross-database queries; in the current version a
connection is made to a single database. If data from another database is needed, a new connection
must be created [11].
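A hedged sketch of how this looks from a client, assuming placeholder server, login and driver names: the database is created from the master database with the EDITION and MAXSIZE options, and, because cross-database queries are not supported, the application then opens a separate connection directly to the new database.

```python
import pyodbc

SERVER = "myserver.database.windows.net"    # placeholder logical server
ADMIN = "admin@myserver"                     # placeholder administrator login
PWD = "..."
DRIVER = "{SQL Server Native Client 10.0}"   # placeholder ODBC driver

# CREATE DATABASE must run against master, outside of a transaction.
master = pyodbc.connect(f"DRIVER={DRIVER};SERVER={SERVER};DATABASE=master;"
                        f"UID={ADMIN};PWD={PWD};Encrypt=yes", autocommit=True)
master.execute("CREATE DATABASE SalesDb (EDITION = 'business', MAXSIZE = 10 GB)")

# Cross-database queries are not supported, so all further work goes through
# a connection opened directly against the new database.
sales = pyodbc.connect(f"DRIVER={DRIVER};SERVER={SERVER};DATABASE=SalesDb;"
                       f"UID={ADMIN};PWD={PWD};Encrypt=yes")
```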
7.3
Most security issues for SQL Azure databases are managed by Microsoft within the SQL Azure data
center, with very little setup required by the users. A user must have a valid login and password in
order to connect to the SQL Azure database. Because SQL Azure supports only standard security, each
login must be explicitly created.
In addition, the firewall can be configured on each SQL Azure server to only allow traffic from
specified IP addresses to access the SQL Azure server. This helps to greatly reduce any chance of a
denial-of-service (DoS) attack. All communications between clients and SQL Azure must be SSL
encrypted, and clients should always connect with Encrypt = True to ensure that there is no risk of
man-in-the-middle attacks. DoS attacks are further reduced by a service called DoSGuard that actively
tracks failed logins from IP addresses and if it notices too many failed logins from the same IP address
within a period of time, the IP address is blocked from accessing any resources in the service [11].
The security model within a database is identical to that in SQL Server. Users are created and mapped
to login names. Users can be assigned to roles, and users can be granted permissions. Data in each
database is protected from users in other databases because the connections from the client
application are established directly to the connecting user's database.
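The sketch below illustrates the two client-visible pieces of this model, with invented names and addresses: a server-level firewall rule added through the sp_set_firewall_rule procedure in the master database, and an SSL-encrypted application connection using a standard login, since clients should always connect with Encrypt = True.

```python
import pyodbc

SERVER = "myserver.database.windows.net"    # placeholder logical server
DRIVER = "{SQL Server Native Client 10.0}"   # placeholder ODBC driver

# Server-level firewall rule: only the listed address range may connect.
master = pyodbc.connect(f"DRIVER={DRIVER};SERVER={SERVER};DATABASE=master;"
                        "UID=admin@myserver;PWD=...;Encrypt=yes", autocommit=True)
master.execute("EXEC sp_set_firewall_rule N'office', "
               "'203.0.113.0', '203.0.113.255'")

# Application connection: SSL-encrypted, using a standard (SQL) login.
conn = pyodbc.connect(f"DRIVER={DRIVER};SERVER={SERVER};DATABASE=SalesDb;"
                      "UID=app_user@myserver;PWD=...;"
                      "Encrypt=yes;TrustServerCertificate=no")
```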
7.4
Each SQL Azure database is associated with its own subscription. From the subscriber's perspective,
SQL Azure provides logical databases for application data storage. In reality, each subscriber's data is
replicated across three SQL Server databases that are distributed across three physical servers in a
single data center. Many subscribers may share the same physical database, but the data is presented
to each subscriber through a logical database that abstracts the physical storage architecture and uses
automatic load balancing and connection routing to access the data. The logical database that the
subscriber creates and uses for database storage is referred to as a SQL Azure database [11].
7.5
SQL Azure subscribers access the actual databases, which are stored on multiple machines in the data
center, through the logical server. The SQL Azure Gateway service acts as a proxy, forwarding the
Tabular Data Stream (TDS) requests to the logical server. It also acts as a security boundary providing
login validation, enforcing the firewall and protecting the instances of SQL Server behind the gateway
against denial-of-service attacks. The Gateway is composed of multiple computers, each of which
accepts connections from clients, validates the connection information and then passes on the TDS to
the appropriate physical server, based on the database name specified in the connection. Figure 8
shows the physical architecture represented by the single logical server.
Figure 7 Figure 8 A logical server and its databases distributed across machines in the data center [11]
The machines with the SQL Server instances are called data nodes. Each data node contains a single
SQL Server instance, and each instance has a single user database, divided into partitions. Each
partition contains one SQL Azure client database, either a primary or secondary replica. Each database
hosted in the SQL Azure data center has three replicas: one primary replica and two secondary
replicas. All reads and writes go through the primary replica, and any changes are replicated to the
secondary replicas asynchronously. The replicas are the central means of providing high availability
for your SQL Azure databases.
The partitions of other SQL Azure databases existing within the same SQL Server instances in the data
center are completely invisible to and unavailable to other subscribers [11].
For SQL Azure databases every commit needs to be a quorum commit. That is, the primary replica and
at least one of the secondary replicas must confirm that the log records have been written before the
transaction is considered to be committed.
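The following toy sketch illustrates only the quorum-commit rule just described (one primary, two secondaries, commit acknowledged once the primary and at least one secondary have hardened the log record); it is a conceptual illustration, not SQL Azure code.

```python
# Conceptual illustration of a quorum commit with one primary and two secondaries.
class Replica:
    def __init__(self):
        self.log = []

    def write_log(self, record):
        self.log.append(record)   # stands in for hardening the record to disk
        return True

def quorum_commit(log_record, primary, secondaries, quorum=2):
    acks = 1 if primary.write_log(log_record) else 0    # primary writes first
    for replica in secondaries:
        if replica.write_log(log_record):                # replicate to secondary
            acks += 1
        if acks >= quorum:                               # primary + >= 1 secondary
            return True                                  # transaction committed
    return acks >= quorum

committed = quorum_commit("T1: update row 42",
                          Replica(), [Replica(), Replica()])
print(committed)  # True once the primary and at least one secondary confirmed
```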
Each data node machine hosts a set of processes referred to as the fabric. The fabric processes
perform the following tasks:
Failure detection: notes when a primary or secondary replica becomes unavailable so that
the Reconfiguration Agent can be triggered
Engine Throttling: ensures that one logical server does not use a disproportionate amount of
the node's resources or exceed its physical limits
Ring Topology: manages the machines in a cluster as a logical ring, so that each machine has
two neighbors that can detect when the machine goes down
The machines in the data center are all commodity machines with components of low-to-medium
quality and low-to-medium performance capacity. The low cost and the easily available
configuration make it easy to quickly replace machines in case of a failure condition. In addition,
Windows Azure machines use the same commodity hardware, so all machines in the data center,
whether used for SQL Azure or for Windows Azure, are interchangeable.
In Figure 7, the logical server contains three databases: DB1, DB3, and DB4. The primary replica for
DB1 is on Machine 6 and the secondary replicas are on Machine 4 and Machine 5. For DB3, the
primary replica is on Machine 4, and the secondary replicas are on Machine 5 and on another
machine not shown in this figure. For DB4, the primary replica is on Machine 5, and the secondary
replicas are on Machine 6 and on another machine not shown in this figure. Note that this diagram is
a simplification. Most production Microsoft SQL Azure data centers have hundreds of machines with
hundreds of actual instances of SQL Server to host the SQL Azure replicas, so it is extremely unlikely
that if multiple SQL Azure databases have their primary replicas on the same machine, their secondary
replicas will also share a machine [11].
The physical distribution of databases that all are part of one logical instance of SQL Server means
that each connection is tied to a single database, not a single instance of SQL Server.
7.6 Network Topology
Four distinct layers of abstraction work together to provide the logical database for the subscriber's
application to use: the client layer, the services layer, the platform layer, and the infrastructure layer.
Figure 8 illustrates the relationship between these four layers.
The client layer resides closest to the application, and it is used by the application to communicate
directly with SQL Azure. The client layer can reside on-premises in a data center, or it can be hosted in
Windows Azure. Every protocol that can generate TDS over the wire is supported. Because SQL Azure
provides the same TDS interface as SQL Server, known and familiar tools and libraries can be used to
build client applications for data that is in the cloud.
The infrastructure layer represents the IT administration of the physical hardware and operating
systems that support the services layer.
Figure 8: Four layers of abstraction provide the SQL Azure logical database for a client application to use [11]
7.7
The goal for Microsoft SQL Azure is to maintain 99.9 percent availability for the subscribers'
databases. As stated earlier, this goal is achieved through the use of commodity hardware that can
be quickly and easily replaced in case of machine or drive failure, and through the management of the
replicas, one primary and two secondary, for each SQL Azure database [12].
7.8 Failure Detection
Management in the data centers needs to detect not only a complete failure of a machine, but also
conditions where machines are slowly degenerating and communication with them is affected. The
concept of quorum commit, discussed earlier, addresses these conditions. First, a transaction is not
considered to be committed unless the primary replica and at least one secondary replica can
confirm that the transaction log records were successfully written to disk. Second, if both a primary
replica and a secondary replica must report success, small failures that might not prevent a
transaction from committing but that might point to a growing problem can be detected [11].
7.9 Reconfiguration
The process of replacing failed replicas is called reconfiguration. Reconfiguration can be required
due to failed hardware or to an operating system crash, or to a problem with the instance of SQL
Server running on the node in the data center. Reconfiguration can also be necessary when an
upgrade is performed, whether for the operating system, for SQL Server, or for SQL Azure.
All nodes are monitored by six peers, each on a different rack than the failed machine. The peers
are referred to as neighbors. A failure is reported by one of the neighbors of the failed node, and
the process of reconfiguration is carried out for each database that has a replica on the failed node.
Because each machine holds replicas of hundreds of SQL Azure databases (some primary replicas
and some secondary replicas), if a node fails, the reconfiguration operations are performed
hundreds of times. There is no prioritization in handling the hundreds of failures when a node fails;
the Partition Manager randomly selects a failed replica to handle, and when it is done with that
one, it chooses another, until all of the replica failures have been dealt with.
If a node goes down because of a reboot, that is considered a clean failure, because the neighbors
receive a clear exception message.
Another possibility is that a machine stops responding for an unknown reason, and an ambiguous
failure is detected. In this case, an arbitrator process determines whether the node is really down.
Although this discussion centers on the failure of a single replica, it is really the failure of a node that is
detected and dealt with. A node contains an entire SQL Server instance with multiple partitions
containing replicas from up to 650 different databases. Some of the replicas will be primary and
some will be secondary. When a node fails, the processes described earlier are performed for each
affected database. That is, for some of the databases, the primary replica fails, and the arbitrator
chooses a new primary replica from the existing secondary replicas, while for other databases a failed
secondary replica has to be replaced.
7.12 Throttling
Because of the multitenant use of each SQL Server in the data center, it is possible that one
subscriber's application could render the entire instance of SQL Server ineffective by imposing
heavy loads. For example, under full recovery mode, inserting lots of large rows, especially ones
containing large objects, can fill up the transaction log and eventually the drive that the transaction
log resides on. In addition, each instance of SQL Server in the data center shares the machine with
other critical system processes that cannot be starved, most relevantly the fabric process that
monitors the health of the system.
To keep a data center server's resources from being overloaded and jeopardizing the health of the
entire machine, the load on each machine is monitored by the Engine Throttling component. In
addition, each database replica is monitored to make sure that statistics such as log size, log write
duration, CPU usage, the actual physical database size limit, and the SQL Azure user database size
are all below target limits. If the limits are exceeded, the result can be that a SQL Azure database
rejects reads or writes for 10 seconds at a time. Occasionally, violation of resource limits may result
in the SQL Azure database permanently rejecting reads and writes (depending on the resource type
in question) [11].
SQL Azure makes it possible to start with a small investment and add space as the business grows. SQL Azure provides two
different database editions, Business Edition and Web Edition. SQL Azure edition features apply to
the individual database, and different database editions can be mixed and matched within the same
SQL Azure server.
Both editions offer scalability, automated high availability, and self-provisioning.
The Web Edition Database is suited for small Web applications and workgroup or departmental applications. This edition supports a database with a maximum size of 1 or 5 GB of data.
The Business Edition Database is suited for independent software vendors (ISVs), line-of-business (LOB) applications, and enterprise applications. This edition supports a database of up to 150 GB of data, in 10 GB increments up to 50 GB, and then in 50 GB increments.
Both editions charge an additional bandwidth-based fee when the data transfer includes a client
outside the Windows Azure platform or outside the region of the SQL Azure database.
You specify the edition and maximum size of the database when you create it; you can also change the edition and maximum size after creation. The billing will be based on the new edition type (and the peak size the database reaches daily) [13].
Microsoft charges a monthly fee for each SQL Azure user database. The database fee is amortized over the month and charged daily. The daily fee depends on the peak size that each database reached that day, the edition of each database, and the number of databases you have. A 10 GB multiplier is used for pricing Business Edition databases and a 1 GB or 5 GB multiplier is used for pricing Web Edition databases. Users pay for the databases they have, for the days they have them [13].
Bandwidth used between SQL Azure and Windows Azure or Windows Azure AppFabric is free within the same sub-region or data center.
8. Amazon Web Services
Amazon is another company that offers a relational database service as part of their Amazon Web Services. In the next section I will first describe the Amazon Relational Database Service and later give an overview of their NoSQL databases, Amazon SimpleDB and DynamoDB, and other NoSQL solutions currently available.
8.1 Overview
Amazon Relational Database Service (Amazon RDS) is a web service that can operate, and to some level scale, a relational database in the cloud. It provides cost-efficient and resizable capacity while automating the administration tasks. Amazon RDS gives users access to the capabilities of a MySQL or Oracle database running on their own Amazon RDS database instance. This gives the advantage that code and applications that use an on-premises MySQL or Oracle database can be easily migrated to Amazon RDS.
8.2 Features
Amazon RDS takes a different approach than Database.com and SQL Azure. It offers the full capabilities of a MySQL or Oracle database running on a separate database instance. The features provided by Amazon RDS depend on the DB Engine you select. In general it offers:
- Monitoring and Metrics - Amazon RDS provides Amazon CloudWatch metrics for the DB Instance deployments. The AWS Management Console can be used to view key operational metrics for the DB Instance deployments, including compute/memory/storage capacity utilization, I/O activity, and DB Instance connections.
- Automatic Software Patching - Amazon RDS will make sure that the relational database software stays up-to-date with the latest patches.
- Isolation and Security - Using Amazon VPC, it is possible to isolate DB Instances in their own virtual network and connect them to an existing IT infrastructure using an industry-standard encrypted IPsec VPN. In addition, for both MySQL and Oracle, it allows controlling access to the DB Instances using database security groups (DB Security Groups). A DB Security Group acts like a firewall controlling network access to the DB Instance. By default, network access is turned off to the DB Instances. For applications to access a DB Instance, a DB Security Group must be set to allow access from EC2 Instances with specific EC2 Security Group membership or IP ranges [14]. A sketch of setting this up is given below.
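The following sketch is illustrative only; it uses boto3 (the AWS SDK for Python, which postdates parts of the service described here), and the group names, account ID, and region are assumptions:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Create a DB Security Group; by default it allows no network access.
    rds.create_db_security_group(
        DBSecurityGroupName="webapp-db-sg",
        DBSecurityGroupDescription="Access for the web tier",
    )

    # Allow access only to EC2 instances that belong to a specific EC2 Security Group.
    rds.authorize_db_security_group_ingress(
        DBSecurityGroupName="webapp-db-sg",
        EC2SecurityGroupName="webapp-ec2-sg",
        EC2SecurityGroupOwnerId="123456789012",  # AWS account that owns the EC2 group
    )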
8.3 Scalability
Amazon RDS gives the flexibility of scaling the compute resources or storage capacity associated with the relational database instance by using the Amazon RDS APIs or the AWS Management Console. The compute and memory resources can be scaled up or down by using predefined DB Instance Classes. Currently Amazon offers five supported DB Instance classes, including:
- Small DB Instance: 1.7 GB memory, 1 ECU (1 virtual core with 1 ECU), 64-bit platform, Moderate I/O Capacity
- Large DB Instance: 7.5 GB memory, 4 ECUs (2 virtual cores with 2 ECUs each), 64-bit platform, High I/O Capacity
- High-Memory Extra Large Instance: 17.1 GB memory, 6.5 ECUs (2 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity
For each DB Instance class, it is possible to select from 5 GB to 1 TB of associated storage capacity. Additional storage can be provisioned on the fly with no downtime.
One ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon
processor [14].
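As a hedged illustration (again using boto3; the instance identifier, instance class, and storage size are assumptions), scaling an existing DB Instance up and growing its storage is a single API call:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Move the instance to a larger class and grow its storage.
    # With ApplyImmediately=False the change would wait for the maintenance window.
    rds.modify_db_instance(
        DBInstanceIdentifier="shop-db",
        DBInstanceClass="db.m1.large",
        AllocatedStorage=100,   # in GB
        ApplyImmediately=True,
    )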
2 Amazon Virtual Private Cloud (Amazon VPC) - an isolated section of the Amazon Web Services (AWS) Cloud where you can launch AWS resources in a virtual network that you define, offering complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.
3 Amazon Elastic Compute Cloud (EC2) - a web service that provides resizable compute capacity in the cloud.
8.4 High Availability
Amazon RDS runs on the same highly reliable infrastructure as the other Amazon Web Services. It has multiple features that enhance availability for critical production databases. Currently it offers automatic host replacement and replication.
With automatic host replacement, Amazon RDS will automatically replace the compute instance powering the deployment in the event of a hardware failure.
Replication is at this time supported only for MySQL, although it is planned to be available for Oracle in the near future. For MySQL, Amazon RDS provides two replication features, Multi-AZ deployments and read replicas.
With Multi-AZ deployments, Amazon RDS will automatically provision and manage a standby replica in a different Availability Zone (independent infrastructure in a physically separate location). Database updates are made concurrently on the primary and standby resources to prevent replication lag. In the event of planned database maintenance, DB Instance failure, or an Availability Zone failure, Amazon RDS will automatically fail over to the up-to-date standby so that database operations can resume quickly without administrative intervention. Prior to failover you cannot directly access the standby, and it cannot be used to serve read traffic.
Read Replicas make it easy to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. It is possible to create one or more replicas of a given source DB Instance and serve high-volume application read traffic from multiple copies of the data, thereby increasing aggregate read throughput. Amazon RDS uses MySQL's native replication to propagate changes made to a source DB Instance to any associated Read Replicas. Since Read Replicas use standard MySQL replication, they may fall behind their sources, and they are therefore not intended to be used for enhancing fault tolerance in the event of source DB Instance failure or Availability Zone failure [14].
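A sketch of how these two features are requested through the API (boto3; identifiers, credentials, and sizes are assumptions, not values from the thesis) might look as follows:

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # MySQL instance with a synchronously updated standby in another Availability Zone.
    rds.create_db_instance(
        DBInstanceIdentifier="shop-db",
        Engine="mysql",
        DBInstanceClass="db.m1.large",
        AllocatedStorage=50,
        MasterUsername="admin",
        MasterUserPassword="change-me",
        MultiAZ=True,
    )

    # Asynchronous read replica used only to scale read traffic, not for failover.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="shop-db-read-1",
        SourceDBInstanceIdentifier="shop-db",
    )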
8.5 Pricing
As with the other previously mentioned DBMS services, Amazon RDS pricing is based on usage and the DB Instance class. It is possible to choose between hourly On-Demand pricing with no up-front or long-term commitments and a Reserved pricing option.
- On-Demand DB Instances let users pay for compute capacity by the hour with no long-term commitments. This frees them from the costs and complexities of planning, purchasing, and maintaining hardware, and transforms what are commonly large fixed costs into much smaller variable costs.
- Reserved DB Instances give users the option to make a low, one-time payment for each DB Instance they want to reserve and in turn receive a discount on the hourly usage charge for that DB Instance. Depending on usage, it is possible to choose between three Reserved DB Instance types (Light, Medium, and Heavy Utilization) and receive between 30% and 55% discount over On-Demand prices. Based on the application workload and the amount of time they will run, Amazon RDS Reserved Instances may provide substantial savings over running On-Demand DB Instances.
The prices differ depending on whether a standard or a Multi-AZ deployment is used. For both standard and Multi-AZ deployments, pricing is per DB Instance-hour consumed, from the time a DB Instance is launched until it is terminated.
There is no additional charge for backup storage up to 100% of the provisioned database storage for an active DB Instance. After the DB Instance is terminated, backup storage is billed per GB-month, and additional backup storage is also billable.
Data transferred between Amazon RDS and Amazon EC2 Instances in the same Availability Zone, and data transferred between Availability Zones for replication of Multi-AZ deployments, is free.
Amazon RDS DB Instances outside VPC: For data transferred between an Amazon EC2 instance and an Amazon RDS DB Instance in different Availability Zones of the same Region, there is no Data Transfer charge for traffic in or out of the Amazon RDS DB Instance. Charges apply only for the Data Transfer in or out of the Amazon EC2 instance, and standard Amazon EC2 Regional Data Transfer charges apply.
Amazon RDS DB Instances inside VPC: For data transferred between an Amazon EC2 instance and an Amazon RDS DB Instance in different Availability Zones of the same Region, Amazon EC2 Regional Data Transfer charges apply on both sides of the transfer.
Data transferred between Amazon RDS and AWS services in different regions is charged as Internet Data Transfer on both sides of the transfer.
Additionally, for the Oracle database there are two licensing models, License Included and Bring-Your-Own-License (BYOL). In the "License Included" service model, you do not need separately purchased Oracle licenses; the Oracle Database software has been licensed by AWS. Bring-Your-Own-License is suited for users that already own Oracle Database licenses. The BYOL model is designed for customers who prefer to use existing Oracle database licenses or purchase new licenses directly from Oracle [14].
9. Google Cloud SQL
Google Cloud SQL is a MySQL database in Google's cloud. It has all the capabilities and functionality of MySQL. Google Cloud SQL is currently available for Google App Engine applications that are written in Java or Python. It can also be accessed from a command-line tool.
As with the other database-as-a-service offerings, Google Cloud SQL is fully managed; patch management, replication, and other database management chores are handled by Google.
High availability is offered by built-in automatic replication across multiple geographic regions, so the service is available and data is preserved even when a whole data center becomes unavailable. When creating databases, users can choose synchronous or asynchronous replication in data centers in the EU or the US.
Google Cloud SQL is tightly integrated with Google App Engine and other Google services, which allows users to work across multiple products and get more value out of their data. The database instances are not restricted to a single App Engine application, allowing multiple applications to use the same instance and database. Data can be imported into the database using mysqldump. This allows users to easily move data, applications, and services in and out of the cloud. A short sketch of this integration is shown below.
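As a minimal, assumption-laden example (a Python App Engine application using the MySQLdb driver bundled with the runtime, with a hypothetical project and instance name), connecting to a Cloud SQL instance from App Engine looks roughly like this:

    import MySQLdb

    # Inside App Engine, Cloud SQL is reached through a Unix socket named after the instance.
    # "my-project:my-instance" is a hypothetical project/instance identifier.
    db = MySQLdb.connect(
        unix_socket="/cloudsql/my-project:my-instance",
        user="root",
        db="guestbook",
    )

    cursor = db.cursor()
    cursor.execute("SELECT id, title FROM entries ORDER BY id DESC LIMIT 10")
    for row in cursor.fetchall():
        print(row)
    db.close()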
As an initial trial, Google offers instances with a small amount of RAM and 0.5 GB of database storage. Additional RAM and storage can be purchased, up to 16 GB of RAM and 100 GB of storage [15].
9.1
Pricing
Google offers two billing plans for Google Cloud SQL, Packages or Per Use. The Packages offering is shown in the table below:
Tier   RAM      Included Storage   Included I/O
D1     0.5GB    1GB                850K
D2     1GB      2GB                1.7M
D4     2GB      5GB                4M
D8     4GB      10GB               8M
D16    8GB      10GB               16M
D32    16GB     10GB               32M
Each database instance is allocated the RAM shown above, along with an appropriate amount of CPU. Storage is measured as the file space used by the MySQL database. Bills are issued monthly, based on the number of days during which the database existed. Google does not charge for the storage of backups created using the scheduled backup service. The number of I/O requests to storage made by a database instance depends on the queries, workload, and data set. Cloud SQL caches data in memory to serve queries efficiently and to minimize the number of I/O requests. Use of storage or I/O over the included quota is charged at the Per Use rate. The maximum storage for any instance is currently 100 GB.
With the Per Use plan the same tiers as with the Packages are offered, with the difference that database instances are charged for periods of continuous use. Storage is charged per GB in hourly units (whether the database is active or not), measured as the largest number of bytes during that one-hour period, rounded up to the nearest GB, and I/O is charged by count, rounded to the nearest million.
Network use is charged under both the Packages and Per Use billing plans. Only outbound external traffic is charged; network usage between Google App Engine applications and Cloud SQL is not charged [15].
10. Comparison of the relational DBaaS offerings
As can be seen from the previous sections, relational Database as a Service (DBaaS) is currently found in the public marketplace in two broad forms: online general-purpose relational databases, and the ability to operate virtual machine images loaded with common databases such as MySQL, Oracle, or similar commercial databases.
Database.com offers a relational multitenant database specially built for the cloud using their metadata-driven architecture.
Microsoft SQL Azure offers an SQL Server-like relational database management system and controls many of the database configuration details, allowing the users to focus on the schema, data, and application layer.
Amazon RDS provides an implementation of MySQL or Oracle on a virtual machine built and tuned for that purpose, and Google also has Cloud SQL, providing MySQL for their App Engine PaaS.
While all of the presented RDBMS DBaaS offerings provide an opportunity to reduce cost, there are many considerations to take into account before moving the data to a cloud-based solution. Figure 11 presents the main considerations comparison.
Data Sizing - All of the RDBMS DBaaS offerings presented have limits on the size of the data set
that can be stored on their systems.
Portability - Portability and adherence to standards is a critical issue for ensuring Continuity of
Operations and to mitigate business risk (e.g., a provider going out of business or raising rates). The
ability to instantiate a replicated version of the data off-cloud or in another cloud offering can
provide the business owners with an extra level of assurance that they will not suffer a loss of data.
This can be facilitated by standards, such as the use of a standard database query language (SQL).
Transaction Capabilities - Transaction capabilities are an essential feature for databases that need
to provide guaranteed reads and writes (ACID).
Figure 11: Main considerations comparison

Maximum amount of data that can be stored:
- Salesforce Database.com: limited by the number of records per database, up to 22,300,000 records.
- Microsoft SQL Azure: 150 GB per database.
- Amazon RDS (MySQL or Oracle): 1 terabyte per database instance.
- Google Cloud SQL: 100 GB per database instance.

Ease of software portability with similar locally hosted capability:
- Salesforce Database.com: Low. Requires the database to be specially built and tested by Salesforce before deployment.
- Microsoft SQL Azure: -
- Amazon RDS (MySQL or Oracle): High. The MySQL/Oracle instantiation in the cloud is very similar to the locally instantiated version.
- Google Cloud SQL: Medium. The MySQL instance in the cloud is very similar to the local instance, but it is accessible only through Google App Engine.

Transaction capabilities:
- Salesforce Database.com: Yes
- Microsoft SQL Azure: Yes
- Amazon RDS (MySQL or Oracle): Yes
- Google Cloud SQL: Yes

Configurability and ability to tune databases:
- Salesforce Database.com: Low. It creates indexes automatically and keeps track of the most recently accessed records, but it does not allow control over this, nor over memory allocation and similar resources.
- Microsoft SQL Azure: Medium. Can create indexes and stored procedures, but no control over memory allocation or similar resources.
- Amazon RDS (MySQL or Oracle): High. MySQL/Oracle instantiation in the cloud on a virtual machine.
- Google Cloud SQL: Low. Automatically tuned.

Database accessible as a stand-alone offering:
- Salesforce Database.com: Yes
- Microsoft SQL Azure: Yes
- Amazon RDS (MySQL or Oracle): Yes
- Google Cloud SQL: No. Requires a Google App Engine application layer.

Possibility to designate where the data is stored (e.g. region or data center):
- Salesforce Database.com: No
- Microsoft SQL Azure: Yes
- Amazon RDS (MySQL or Oracle): Yes
- Google Cloud SQL: Yes

Replication:
- Salesforce Database.com: No
- Microsoft SQL Azure: Yes
- Amazon RDS (MySQL or Oracle): Yes
- Google Cloud SQL: Yes
Configurability - DBaaS offerings may provide capabilities that reduce the amount of configuration options available to database administrators. For some applications, if more configuration options are managed by the platform owner rather than by the customer's database administrator, this can be a benefit, and it can reduce the amount of effort expended to maintain the database. For others, the inability to tune and control all aspects of the database, such as memory management, can be a limiting constraint in obtaining performance.
Database Accessibility - Most DBaaS offerings provide a predefined set of connectivity mechanisms that will directly impact adoption and use. There are three general approaches. First, most RDBMS offerings are typically accessible through industry-standard database drivers such as Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC). These drivers allow applications external to the service to access the database through a standard connection, facilitating interoperability. Second, services typically provide interfaces that use standards-based, Service-Oriented Architecture (SOA) protocols, such as SOAP or REST, with Hypertext Transfer Protocol (HTTP) and a vendor-specific API definition. These services may provide software development kits in common source-code languages to facilitate adoption. Third, some databases may be restricted to accessing data through software running in the vendor's ecosystem. This approach may increase security, but it also significantly limits portability and interoperability. The contrast between the first two approaches is sketched below.
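The following Python sketch illustrates the first two approaches only; the host names, credentials, and REST endpoint are assumptions, and PyMySQL and the requests library stand in for whatever driver or SDK a given provider actually supports:

    import pymysql
    import requests

    # Approach 1: a standard driver connection, interchangeable across MySQL-compatible offerings.
    conn = pymysql.connect(
        host="shop-db.example-provider.com",
        user="app",
        password="secret",
        database="shop",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders")
        print(cur.fetchone())
    conn.close()

    # Approach 2: a vendor-specific REST/HTTP API, here a hypothetical endpoint returning JSON.
    resp = requests.get(
        "https://api.example-provider.com/v1/databases/shop/tables",
        headers={"Authorization": "Bearer <token>"},
    )
    print(resp.json())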
Availability and Replication - the ability to ensure that data is available and not lost will be a key consideration. Ensuring access to data can come through enforcement of service-level agreement (SLA) metrics such as uptime, replication across a cloud provider's regions, and replication or movement of the data across cloud providers or to the consuming organization's data center.
- Replication across a cloud provider's hardware within a region may ameliorate the effects of a localized hardware or software failure.
- Replication across a cloud provider's geographic regions may ameliorate the effects of a network outage, natural disaster, or other regional event.
Many providers, such as Microsoft and Amazon, offer replication of the data across hardware within a specific region as part of a packaged service. Within a given vendor, replication across geographies is usually more expensive and may result in significant data transfer fees.
11. NOSQL
While RDBMS databases are widely deployed and successful, they have shortcomings for some applications, which have been filled by the growing use of NoSQL databases. Rather than conforming to SQL standards and providing relational data modeling, NoSQL databases typically offer fewer transactional guarantees than RDBMSs in exchange for greater flexibility and scalability. NoSQL databases tend to be less complex than RDBMSs and scale horizontally across lower-cost hardware. Unlike RDBMSs, which share a common relational data model, several different types of databases, such as column-oriented, key-value, and document-oriented, are considered NoSQL databases. NoSQL databases tend to be used in applications that do not require the same level of data consistency guarantees that RDBMS systems provide, but that require throughput levels that would be very expensive for RDBMSs to support.
Dynamo does not require support for hierarchical namespaces (a norm in many file systems) or complex relational schemas (supported by traditional databases). Dynamo can be characterized as a zero-hop DHT, where each node maintains enough routing information locally to route a request to the appropriate node directly, in order to avoid routing requests through multiple nodes and to meet the needs of latency-sensitive applications that require at least 99.9% of read and write operations to be performed within a few hundred milliseconds [17].
While Dynamo gave developers a system that met their reliability, performance, and scalability needs, it did nothing to reduce the operational complexity of running large database systems. Since developers were responsible for running their own Dynamo installations, they had to become experts on the various components running in multiple data centers. They also needed to make complex trade-off decisions between consistency, performance, and reliability. This operational complexity was a barrier that kept them from adopting Dynamo [17].
Notice that the table has a name, "my table", but the item does not have a name. The primary key defines the item; the item with primary key "ImageID" = 1 [18].
Tables
Tables contain items and organize information into discrete areas. All items in a table have the same primary key scheme. The attribute name (or names) to be used for the primary key is designated when a table is created, and the table requires each item in the table to have a unique primary key value. The first step in writing data to DynamoDB is to create a table and designate a table name with a primary key. The following is a larger table that also uses the ImageID as the primary key to identify items.
DynamoDB also allows specifying a composite primary key, which enables specifying two attributes in a table that collectively form a unique primary index. All items in the table must have both attributes. One serves as a hash (partition) attribute and the other as a range attribute. For example, there might be a Status Updates table with a composite primary key composed of UserID (the hash attribute, used to partition the workload across multiple servers) and Time (the range attribute). A query can then be executed to fetch either: 1) a particular item uniquely identified by the combination of UserID and Time values; 2) all of the items for a particular hash bucket, in this case a UserID; or 3) all of the items for a particular UserID within a particular time range. Range queries against Time are only supported when the UserID hash bucket is specified [18]. A short sketch of such a table and query is given below.
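A hedged sketch of creating such a table and running a range query (boto3; the table name, attribute types, and capacity values are illustrative assumptions):

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

    # Composite primary key: UserID is the hash attribute, Time the range attribute.
    table = dynamodb.create_table(
        TableName="StatusUpdates",
        KeySchema=[
            {"AttributeName": "UserID", "KeyType": "HASH"},
            {"AttributeName": "Time", "KeyType": "RANGE"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "UserID", "AttributeType": "S"},
            {"AttributeName": "Time", "AttributeType": "N"},
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    table.wait_until_exists()

    # All updates for one user within a time range (case 3 in the text above).
    resp = table.query(
        KeyConditionExpression=Key("UserID").eq("user-123")
        & Key("Time").between(1260000000, 1270000000)
    )
    print(resp["Items"])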
Table: My Images

Primary Key   Other Attributes
ImageID = 1   ImageLocation = https://s3.amazonaws.com/bucket/img_1.jpg, Date = 1260653179, Title = flower, Tags = Flower, Jasmine, Width = 1024, Depth = 768
ImageID = 2   ImageLocation = https://s3.amazonaws.com/bucket/img_2.jpg, Date = 1252617979, Rated = 3, 4, 2, Tags = Work, Seattle, Office, Width = 1024, Depth = 768
ImageID = 3   ImageLocation = https://s3.amazonaws.com/bucket/img_3.jpg, Date = 1285277179, Price = 10.25, Tags = Seattle, Grocery, Store, Author = you, Camera = phone
ImageID = 4   ImageLocation = https://s3.amazonaws.com/bucket/img_4.jpg, Date = 1282598779, Title = Hawaii, Author = Joe
There are many applications that benefit from predictable performance as their workloads scale: online gaming, social graph applications, online advertising, and real-time analytics, to name a few. DynamoDB provides the ability to provision throughput: users can specify the request throughput capacity they require for a given table, and DynamoDB will allocate sufficient resources to the table to predictably achieve this throughput with low-latency performance. Throughput reservations are elastic and can be increased or decreased on demand using the AWS Management Console or the DynamoDB APIs, for example as sketched below. CloudWatch metrics provide the ability to make informed decisions about the right amount of throughput to dedicate to a particular table.
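A hedged sketch of such an on-demand throughput change (boto3; the table name and capacity values are assumptions):

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    # Raise the reserved capacity of an existing table without downtime.
    dynamodb.update_table(
        TableName="StatusUpdates",
        ProvisionedThroughput={"ReadCapacityUnits": 200, "WriteCapacityUnits": 100},
    )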
Amazon DynamoDB also integrates with Amazon Elastic MapReduce (Amazon EMR), which allows businesses to perform complex analytics on their large datasets using a hosted Hadoop framework on AWS [18].
Some of the ways in which EMR can be used with DynamoDB are as follows:
- Users can analyze data stored in DynamoDB using EMR and store the results of the analysis in S3, while leaving the original data in DynamoDB.
- Users can back up the data from DynamoDB to S3 using EMR.
- Customers can also use Amazon EMR to access data in multiple stores, do complex analysis over this combined dataset, and store the results of this work.
CustomerID   First name   Last name   Street address   City          State   Zip     Telephone
123          Bob          Smith       123 Main St      Springfield   MO      65801   222-333-4444
456          James        Johnson     456 Front St     Seattle       WA      98104   333-444-5555
Amazon SimpleDB differs from the tables of traditional databases in important ways. It offers the flexibility to easily go back later and add new attributes that apply only to certain records. For example, to add customers' email addresses in order to enable real-time alerts on order status, it is possible to add the new records and any additional attributes to the existing customers domain. The resulting domain might look something like this:
CustomerID   First name   Last name   Street address   City          State   Zip     Telephone      Email
123          Bob          Smith       123 Main St      Springfield   MO      65801   222-333-4444
456          James        Johnson     456 Front St     Seattle       WA      98104   333-444-5555
789          Deborah      Thomas      789 Garfield     New York      NY      10001   444-555-6666   dthomas@xyz.com
Domains have a finite capacity in terms of storage (10 GB) and request throughput, which is a considerable scaling limitation. Although it is possible to work around this limitation by partitioning workloads over many domains, this is not simple to implement. SimpleDB also fails to meet the requirement of incremental scalability, which is possible with DynamoDB.
Another limitation of SimpleDB is predictability of performance. SimpleDB indexes all attributes for each item stored in a domain. While this simplifies schema design and provides query flexibility, it has a negative impact on the predictability of performance. For example, every database write needs to update not just the basic record, but also all attribute indices (regardless of whether all indices are used for querying). Similarly, since the domain maintains a large number of indices, its working set does not always fit in memory. This impacts the predictability of a domain's read latency, particularly as dataset sizes grow.
SimpleDB's original implementation had taken the "eventually consistent" approach to the extreme and presented users with consistency windows that were up to a second in duration. This meant that developers used to a more traditional database solution had trouble adapting to it. The SimpleDB team eventually addressed this issue by enabling users to specify whether a given read operation should be strongly or eventually consistent. Because a consistent read can potentially incur higher latency and lower read throughput, it is best used only when an application scenario mandates that a read operation absolutely needs to see all writes that received a successful response prior to that read. For all other scenarios the default eventually consistent read yields the best performance [18]. Both the flexible attributes and the consistency choice are sketched below.
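As a hedged illustration (boto3 with the SimpleDB client; the domain, item, and attribute names are assumptions, and the domain is assumed to already exist), adding a new attribute to an existing item and reading it back with a strongly consistent read could look like this:

    import boto3

    sdb = boto3.client("sdb", region_name="us-east-1")

    # Attach a new attribute to an existing item; no schema change is required.
    sdb.put_attributes(
        DomainName="customers",
        ItemName="789",
        Attributes=[{"Name": "Email", "Value": "dthomas@xyz.com", "Replace": True}],
    )

    # Strongly consistent read: sees all writes acknowledged before this call.
    resp = sdb.get_attributes(DomainName="customers", ItemName="789", ConsistentRead=True)
    print(resp.get("Attributes", []))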
12.5 Pricing
Like the other services, DynamoDB and SimpleDB keep the pay-only-for-what-you-use model. The pricing is calculated based on the provisioned throughput capacity, indexed data storage, and data transfer.
When a DynamoDB table is created or updated, the capacity to be reserved for reads and writes is specified, and it is charged hourly based on the capacity reserved. A unit of Write Capacity enables users to perform one write per second for items of up to 1 KB in size. Similarly, a unit of Read Capacity enables users to perform one strongly consistent read per second (or two eventually consistent reads per second) of items of up to 1 KB in size. A small worked example follows.
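A minimal worked example of this arithmetic (the workload numbers are assumptions; items larger than 1 KB consume one capacity unit per started kilobyte):

    import math

    item_size_kb = 3          # average item size, assumed
    writes_per_second = 50    # assumed write rate
    reads_per_second = 200    # strongly consistent reads, assumed

    units_per_item = int(math.ceil(item_size_kb))      # 1 unit covers up to 1 KB
    write_capacity = writes_per_second * units_per_item
    read_capacity = reads_per_second * units_per_item

    # Eventually consistent reads cost half as much (two reads per unit).
    read_capacity_eventual = int(math.ceil(read_capacity / 2.0))

    print(write_capacity)           # 150 write capacity units
    print(read_capacity)            # 600 read capacity units (strongly consistent)
    print(read_capacity_eventual)   # 300 read capacity units (eventually consistent)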
Amazon DynamoDB is an indexed datastore, and the amount of disk space the data consumes will exceed the raw size of the uploaded data. Amazon DynamoDB measures the size of the billable data by adding up the raw byte size of the uploaded data, plus a per-item storage overhead of 100 bytes to account for indexing. The first 100 MB stored per month are offered free, and after that the price is calculated per GB depending on the region.
As with the other AWS offerings, there is no additional charge for data transferred between Amazon DynamoDB, SimpleDB, and other Amazon Web Services within the same Region. Data transferred across Regions (e.g. between Amazon DynamoDB in the US East (Northern Virginia) Region and Amazon EC2 in the EU (Ireland) Region) is charged at Internet Data Transfer rates on both sides of the transfer.
Amazon SimpleDB is billed based on machine hour utilization and data transfer, depending on the region where the SimpleDB domains are established.
Amazon SimpleDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request (SELECT, GET, PUT, etc.), normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor [18].
4 Eventually consistent - given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system, and all the replicas will be consistent.
an integer numeric ID
An optional ancestor path locating the entity within the Datastore hierarchy.
Entities in the Datastore form a hierarchically structured space similar to the directory structure of a file system. When an entity is created, it is possible to designate another entity as its parent; the new entity is a child of the parent entity. This creates the ancestor path [20].
Join operations are not supported by the Datastore.
All the queries in the Datastore are eventually consistent. A typical query includes the following:
- Zero or more filters based on the entities' property values, keys, and ancestors
In addition to retrieving entities from the Datastore directly by their keys, an application can perform a query to retrieve them by the values of their properties [20].
13.3 Transactions
The Datastore can execute multiple operations in a single transaction. By definition, a transaction cannot succeed unless every one of its operations succeeds. If any of the operations fails, the transaction is automatically rolled back. This is especially useful for distributed web applications, where multiple users may be accessing or manipulating the same data at the same time [20]. A minimal sketch is given below.
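A minimal sketch (using the Python ndb client library available to App Engine applications; the model and the transfer function are assumptions, not taken from the thesis):

    from google.appengine.ext import ndb

    class Account(ndb.Model):
        balance = ndb.IntegerProperty(default=0)

    # Cross-group transaction: either both accounts are updated, or neither is.
    @ndb.transactional(xg=True)
    def transfer(source_key, target_key, amount):
        source = source_key.get()
        target = target_key.get()
        source.balance -= amount
        target.balance += amount
        ndb.put_multi([source, target])

    # Example call: transfer(ndb.Key(Account, 'alice'), ndb.Key(Account, 'bob'), 10)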
13.4 Scalability
The App Engine Datastore is designed to scale, allowing applications to maintain high performance
as they receive more traffic:
Datastore reads scale because the only queries supported are those whose performance
scales with the size of the result set (as opposed to the data set). This means that a query
whose result set contains 100 entities performs the same whether it searches over a
hundred entities or a million. This property is the key reason some types of query are not
supported [20].
Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
- Each call to the Datastore API counts toward the Datastore API Calls quota.
- Data sent to the Datastore by the application counts toward the Data Sent to Datastore API quota.
- Data received by the application from the Datastore counts toward the Data Received from Datastore API quota.
The total amount of data currently stored in the Datastore for the application cannot exceed the Stored Data (billable) quota. This includes all entity properties and keys, as well as the indexes needed to support querying those entities. The following table shows the limits that apply specifically to the use of the Datastore [20]:
(Table of Datastore-specific limits and their amounts; the amounts include 1 MB, 10 MB, 2 MB, and 2000. See [20] for the full table.)
Comparison of CouchDB and MongoDB:
- Data Model: CouchDB - Document-Oriented (JSON); MongoDB - Document-Oriented (BSON)
- Interface: CouchDB - HTTP/REST; MongoDB - Native drivers, REST
- Large Objects (Files): CouchDB - Yes (attachments); MongoDB - Yes (GridFS)
- Horizontal Partitioning Scheme: CouchDB - BigCouch, CouchDB Lounge, Pillow; MongoDB - Auto-sharding
- Object Storage: CouchDB - Database contains documents; MongoDB - Database contains collections, collections contain documents
- Query Method: CouchDB - Map/Reduce (JavaScript and others) creating views, plus range queries; MongoDB - Map/Reduce (JavaScript) creating collections, plus an object-based query language
- Replication: CouchDB - Master-master with a custom conflict resolution function; MongoDB - Master-slave
- Concurrency: CouchDB - MVCC (Multi-Version Concurrency Control); MongoDB - Update in-place
- Consistency: CouchDB - Eventually consistent; MongoDB - Strong consistency, with eventually consistent reads from secondary replicas
- Written in: CouchDB - Erlang; MongoDB - C++
CouchDB is a good fit for:
- Problems which need intense versioning, and problems with offline databases that re-sync later.
- Problems where you want a large amount of master-master replication happening, although with MVCC there are some considerations.
MongoDB updates an object in-place when possible. Problems requiring high update rates of objects are a great fit, and compaction is not necessary. Mongo's replication, without the MVCC model, is more oriented towards master/slave and automatic failover configurations than towards master-master setups. MongoDB promises high write performance, especially for updates.
14.4 Scalability
One fundamental difference is that a number of Couch users use replication as a way to scale, while Mongo uses auto-sharding as its way of scaling. There are a couple of options for sharding CouchDB, available as open source or from third-party developers. The best known are CouchDB Lounge and BigCouch, used by cloudant.com [25, 26].
BigCouch can be seen as an Erlang/OTP application that allows creating a cluster of CouchDBs distributed across many nodes/servers [30]. Instead of one big honking CouchDB, the result is an elastic data store which is fully CouchDB API-compliant.
The clustering layer is most closely modeled after Amazon's Dynamo, with consistent hashing, replication, and quorum for read/write operations. CouchDB view indexing occurs in parallel on each partition, and can achieve impressive speedups as compared to standalone serial indexing [25].
14.5 Querying
CouchDB uses a view model which acts as an ongoing incremental map-reduce function, providing a constantly updated view of the database. Different views can be accessed from the HTTP interface, and data can be retrieved by key/index as well. The view model is well suited for statically definable queries and job-style operations. There is an elegance to the approach, although these structures must be pre-declared for each query to be executed. They can be thought of as materialized views [27].
Mongo uses traditional dynamic queries. As with, say, MySQL, it can run queries where an index does not exist, or where an index is helpful but only partially so. Mongo includes a query optimizer which makes these determinations. This is very nice for inspecting the data administratively, and this method is also good when indexes are not used, such as for insert-intensive collections. When an index corresponds perfectly to the query, the Couch and Mongo approaches are conceptually similar [24]. The two styles are sketched below.
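A hedged sketch of the two querying styles from Python (the database, collection, design document, and field names are assumptions; PyMongo and the requests library stand in for whatever client is used in practice):

    import json
    import requests
    from pymongo import MongoClient

    # MongoDB: a dynamic, ad hoc query; an index is helpful but not required.
    images = MongoClient("mongodb://localhost:27017")["demo"]["images"]
    images.create_index("Tags")
    for doc in images.find({"Tags": "Seattle"}):
        print(doc["_id"])

    # CouchDB: views must be declared first, then queried over HTTP.
    couch = "http://localhost:5984/images"
    requests.put(couch)  # create the database if it does not exist yet
    design = {"views": {"by_tag": {"map":
        "function(doc){ if (doc.Tags) { doc.Tags.forEach(function(t){ emit(t, null); }); } }"}}}
    requests.put(couch + "/_design/tags", data=json.dumps(design))
    resp = requests.get(couch + "/_design/tags/_view/by_tag",
                        params={"key": json.dumps("Seattle")})
    print(resp.json().get("rows", []))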
14.8 JavaScript
Both CouchDB and MongoDB make use of JavaScript. CouchDB uses JavaScript extensively, including in the building of views.
MongoDB supports the use of JavaScript, but more as an adjunct. In MongoDB, query expressions are typically expressed as JSON-style query objects; however, one may also specify a JavaScript expression as part of the query. MongoDB also supports running arbitrary JavaScript functions server-side and uses JavaScript for map/reduce operations.
14.9 REST
Couch uses REST as its interface to the database. MongoDB relies on language-specific database drivers for access to the database over a custom binary protocol. Of course, a REST interface can be added on top of an existing MongoDB driver at any time.
Figure: MongoLab shared-plan architecture. Each master VM host runs many customer databases (DB0 ... DBN) and is replicated for backup to a corresponding slave VM host.
The shared plan is offered for free for up to 250 MB, and there are three more options available, Small, Medium, and Large; additional storage is available as an option.
The dedicated plan is offered in two variants, with one dedicated node and with two or more dedicated nodes. The dedicated plan with one node is a single dedicated VM with automatic failover to a secondary on a shared VM. It offers high availability through the replicas, but it does not allow reading from the replicas as a means to increase read throughput. It also offers monitoring services through the MongoDB Monitoring Service (MMS). MMS is a web service from 10gen (the software company that develops and provides commercial support for the open source MongoDB database) that monitors and graphs the performance of MongoDB clusters, servers, and databases over time. It can monitor important statistics such as resident memory usage, rate of database operations, write-lock queue depth, and CPU, alongside any other MongoDB instances users might be running outside of MongoLab [28].
The dedicated plan with two or more nodes can scale to as many dedicated nodes of equal size as needed. In addition to providing high availability, it scales read throughput horizontally through the creation of a Replica Set cluster of more than one member. The architectures of both dedicated plans are shown in Figure 14. With dedicated plans, hosting is available on Amazon EC2 or in the Rackspace Cloud.
Cloudant.com offers multi-tenant and single-tenant (private) CouchDB database clusters that are hosted and scaled within or across multiple top-tier data centers around the globe. In all offered plans Cloudant automatically replicates the data across this network as needed to push it closer to the global user base, reduce network latency overhead, ensure 24x7 availability, and provide disaster recovery capabilities [27].
Cloudant provides a domain through which to access the data layer. Behind that domain, Cloudant stores the data in a horizontally scalable version of the CouchDB database. The horizontal scalability is achieved with BigCouch [30], as mentioned earlier. The data layer automatically handles load balancing, clustering, backup, growing/shrinking the clusters, and high availability. It also provides private, single-tenant clusters that exist entirely within a data center or that span data centers to provide real-time data distribution to multiple locations [29].
Regardless of whether it is a multi- or single-tenant data layer, data can be replicated and synchronized between Cloudant data centers and:
- other Cloudant data centers, for high availability, backup, or for scalability and performance
- edge databases such as data marts or spreadsheets, which is great for independent analytic projects
The Cloudant Data Layer also includes a number of dashboards that allow viewing and controlling the data layer's performance, usage, search indexing, billing, and other metrics [29].
Cloudant pricing is a little different than MongoLab's; it is based on data stored and millions of requests per month (MReq/mo). There is a free starting plan that includes 250 MB of storage and 0.5 MReq/mo [25, 28].
Data storage is counted in a way that includes only the size of the latest revision of all documents, plus the size of the view indexes. Older revisions and deleted documents do not count towards size quotas; they are purged automatically after a certain time.
Requests are approximately the number of document reads and writes from the database.
Lower Initial Investment - the only things needed to start using the cloud are a computer and an Internet connection; it is possible to take advantage of most cloud offerings without investing in any new hardware or specialized software, or adding to staff. This is one cloud computing advantage that has universal appeal regardless of the industry or the type of business. It allows organizations, and especially startups, to invest in new projects and ideas without the risk of a big loss.
Easier to manage - There are no power requirements or space considerations to think about
and users do not have to understand the underlying technology in order to take advantage
of it. There is no need for maintaining and updating any new hardware or software.
Planning time is considerably less as well since there are fewer logistical issues.
Pay as You Go - Large upfront fees are not the norm when it comes to cloud services. Most of the cloud services, as described earlier in this paper, are available on a month-to-month basis with no long-term contracts. This also gives the benefit of keeping multiple projects running without enormous expenses.
Scalability - Cloud computing can be scaled to match the changing needs of the small
business as it grows. Licenses, storage space, new instances and more can be added as
needed.
Deploy Faster - usually it is possible to get up and running significantly faster with cloud services than if there is a need to plan, buy, build, and implement in-house. With many software-as-a-service applications or other cloud offerings it is possible to start using the service within hours or days rather than weeks or months.
Location Independent - Because services are offered over the Internet, there are no limits
to using cloud software or services just at work or only on one computer. Access from
anywhere is a big advantage for people who travel a lot, like to be able to work from home,
or whose organization is spread out across multiple locations.
Device independent - Most web-based software and cloud services are not designed
specifically for any one browser or operating system. Many can be accessed via PC, Mac, on
tablets and through mobile phones.
Downtime - While we would like to think that our data or the cloud-based services we use are available on demand all day, every day, the truth is they are not. System uptime is entirely out of our hands with cloud services. There are two types of downtime:
Security Issues - This is maybe one of the most discussed issues when considering moving
to the cloud. You are turning over data about your business and your customers to a third
party and entrusting them to keep it safe. Without the proper level of security, your data
could be exposed to users outside your company or accessed by a hacker.
Less control over data loss prevention - With cloud services, you will have to give up some degree of control over the prevention of data loss; that is in the hands of the cloud service provider.
Integration and Customization - Some web-based software solutions and cloud services are offered as a one-size-fits-all solution. If you need to customize the application or service to fit specific needs or integrate it with your existing systems, doing so may be challenging, expensive, or not an option.
Prioritize applications
Focus on the applications that provide the maximum benefit for the minimum cost/risk. Measure
the business criticality, business risk, functionality of the services and impact to data sovereignty,
regulation and compliance. Prioritize which applications to migrate to the cloud and in which order.
Consumption models
As can be seen from the different pricing models used by the services and providers described
earlier, each provider has a different consumption model for how you procure and use the service.
These consumption models need to be considered carefully from two perspectives: frequency of change and volume.
Performance and availability
When moving to a distributed IT landscape with some functionality in the cloud, where there is integration between these cloud applications and on-premise applications, the performance of this distributed functionality needs careful consideration and potentially increased processing to ensure service delivery. Similarly, availability will need careful assessment, because an application that is all in the cloud, or distributed across the cloud and on-premise, will have different availability characteristics from the legacy on-premise application. Organizations also need to ensure
that their local and wide area networks are enabled for cloud and will support the associated
increase in bandwidth and network traffic.
Service integration
When moving an application to the cloud, continuity of service and service management needs to
be considered. The service management role changes to more of a service integration role. An alternative to having the in-house service management function provide this capability is to use an outsourcing organization.
Architecting for the cloud and cloud application maturity
Cloud Computing provides real benefits for organizations, but to realize these benefits the applications being utilized sometimes need to be architected to take advantage of the scalable nature of Cloud Computing. While new applications should be built with this in mind, legacy applications are often built to take advantage of legacy systems and hence may not be able to truly leverage the benefits the Cloud can bring without significant re-architecting. There are even differences in how much re-architecting is needed to move to a cloud provider in the first place and to move from one Cloud Computing provider to the next, so the Cloud provider selection process should include questions about the Cloud provider's technological underpinnings, so that if re-architecting is needed it does not come as a surprise. Currently, application maturity is extremely variable from one application to the next.
Exit strategy
Before adopting a cloud service provider or application, ensure you consider your exit strategy, e.g. data extraction, and put the costs for this strategy into your business case and service costs. Many people are rightly concerned about moving to Cloud Computing and being tied to one provider. This is indeed a concern and one which should not be brushed off lightly. That said, however, Cloud Computing tends to be much more transparent when it comes to lock-in, and so organizations should be able to accurately gauge the risks. Organizations should look at a number of different factors:
- Does the vendor provide quick and easy data extraction in the event that the customer wishes to shift?
- Does the vendor use open standards, or have they created their own ways of doing things?
- Can the Cloud Computing service be controlled by third-party control panels?
Data migration
Moving data into or out of a SaaS and DBaaS application may require considerable transformation
and load effort.
Service and transaction state
Maintaining continuity of the state of in-flight transactions at the point of transition into the cloud will need consideration. This will also be the case at the point of exit.
Service Level Agreement (SLA)
Small business owners usually don't have experience with these types of agreements, and not reviewing them carefully might open up a Pandora's box without their knowing it. The business impact of the SLA must be carefully considered and analyzed. Close attention should be paid to the availability guarantees and penalty clauses:
- What do you need to do to receive the credits when the hosting provider fails to achieve the guaranteed service levels?
- Are they processed automatically, or do you need to ask for them in writing?
Usually the cloud providers have one SLA for all users and do not provide customization of the SLA.
All these considerations must be evaluated carefully before moving to a cloud-based solution in order to mitigate the risk and be confident of choosing the right cloud services that will support and ensure the growth of the business.
this also improves morale. Travel time and costs are significantly reduced. Each employee who is given access to the software can even ask the cloud computing supplier's team for support with regard to problems which may arise while using the system. Management can even monitor each employee's activity remotely through the management consoles provided by the supplier.
17. Conclusion
Database management systems have for a long time been an integral part of computing. As the whole IT world is moving to the cloud, whether you are assembling, managing, or developing on a cloud computing platform, you need a cloud-compatible database. In this work I gave a short overview of cloud computing and presented a couple of the currently available companies that offer database as a service in the cloud. Although they differ from the most widely used traditional relational database systems, and most of them might require revision and recoding of existing applications, it is obvious that they bring a lot of benefits, especially with the offer of fully managed and automated database administration, tuning, and optimization.
Cloud database systems are built to use the power of the cloud; they are extremely scalable and elastic, giving the opportunity to start small and expand as needed, mitigating the risks and uncertainties of investing in IT equipment and professional IT support. Cloud computing in general, with its flexible pricing models and different plans, presents one of the best solutions for startups and small companies that are developing new products and do not have the financial power to risk investing in uncertain projects.
The cloud database solution is ideal for web and mobile applications. The fact that most of the DBaaS offerings are tightly integrated with other PaaS offerings gives organizations the opportunity to fully focus on developing their products without wasting resources on administration of the platform.
Despite the benefits offered by cloud-based DBMSs, many people still have apprehensions about them. This is most likely due to the various security issues that have yet to be dealt with. Storing critical business data in the cloud and entrusting its security to a third party, where the data will be spread over multiple hardware stacks and across multiple data centers, can be a big security issue. In my opinion, the cloud may still not be ready for moving critical enterprise applications which store highly sensitive data, but it is definitely ready to be used for testing and development of new projects.
Many companies, including some of the huge multinational corporations, have already moved to cloud computing because it is less costly, more efficient, and more agile compared to on-site IT systems. Therefore, small and medium scale enterprises should follow suit. If cloud computing is proven to work for these big enterprises, it will surely work for small and medium enterprises.
Appendix
Case studies from the industry - Amazon RDS
Airbnb, a vacation rental firm, kept its main database in Amazon RDS. The consistency between locally hosted MySQL and Amazon RDS MySQL facilitated the migration to AWS.
A significant architecture consideration for Airbnb was that Amazon provided the underlying replication infrastructure. "Amazon RDS supports asynchronous master-slave replication," wrote Tobi Knaup. Knaup added that the hot standby, which ran in a different AWS Availability Zone, was updated synchronously with no replication lag. Therefore, if the master database failed, the standby was promoted to the new master with no loss of data [32].
References
[1]. Cloud Computing Bible - Barrie Sosinsky, January 2012. ISBN: 978-0-470-90356-8
[2]. http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
[3]. Introduction to Cloud Computing - Ivanka Menken, Emereo Publishing, 2011
[4]. Understanding PaaS - Michael P. McGrath, O'Reilly Media, January 2012
[5]. Data Management Challenges in Cloud Computing Infrastructures - Divyakant Agrawal et al., University of California, Santa Barbara.
[6]. Database Scalability, Elasticity, and Autonomy in the Cloud - Divyakant Agrawal et al., Department of Computer Science, University of California at Santa Barbara.
[7]. Cloud Computing: Principles, Systems and Applications - N. Antonopoulos, L. Gillam (eds.), Springer, 2010
[8]. http://relationalcloud.com/index.php?title=Database_as_a_Service
[9]. The multitenant, metadata-driven architecture of Database.com - Database.com Getting Started Series White Paper
[10]. Megastore: Providing Scalable, Highly Available Storage for Interactive Services - Jason Baker et al. http://pdos.csail.mit.edu/6.824-2012/papers/jbaker-megastore.pdf
[11]. Inside SQL Azure. Microsoft TechNet. http://social.technet.microsoft.com/wiki/contents/articles/1695.inside-windows-azure-sql-database.aspx
[12]. https://www.windowsazure.com/en-us/home/features/data-management/
[13]. https://www.windowsazure.com/en-us/pricing/details/#storage
[14]. http://aws.amazon.com/rds/
[15]. https://developers.google.com/appengine/docs
[16]. http://en.wikipedia.org/wiki/Paxos_algorithm
[17]. Werner Vogels' weblog on building scalable and robust distributed systems. http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
[18]. http://aws.amazon.com/dynamodb/
[19]. http://www.databasejournal.com/features/mssql/article.php/3823471/Cloud-Computing-with-Google-DataStore.htm