
Database System in Cloud Computing

Harsh Shah, Electrical and Computer Engineering, University of Waterloo, Ontario, Canada

Abstract - The World Wide Web has evolved beyond what anyone envisioned and is still roughly doubling in size every year. With such tremendous growth came challenges in data management: scalability, consistency, availability and, most importantly, the security of data. The cloud computing model is an attractive answer to some of these challenges: it takes the utility model to the next level by offering Database-as-a-Service (DaaS). Today almost all traditional database vendors provide extensions of their products on cloud platforms hosted over the Internet by companies such as Amazon, IBM, Google, Microsoft and VMware. This paper describes the services offered by cloud computing and the advantages and disadvantages of the model. Cloud databases are then discussed, along with two important cloud database architectures: shared-nothing and shared-disk. The major challenges in deploying databases in the cloud are explained, together with the two most important classes of data management applications in the cloud. Finally, Amazon SimpleDB's data model, query execution, read consistency, transaction management and data security are discussed.

Index Terms - Database Management System, Cloud Computing, Cloud Data Management, Amazon SimpleDB

I. INTRODUCTION

The World Wide Web has evolved far beyond what anybody envisioned, and so has the volume of digital data associated with it.

Although its exact size is unknown, current estimates put it at about a hundred million hosts and more than 350 million active users. In fact, the pace of growth is such that the Web is effectively doubling in size each year. As the volume of data exploded, there arose a need to manage it efficiently and effectively, since data is considered very precious in the Web industry. Conventional relational database management systems were incapable of handling such huge data volumes, and non-relational databases emerged to relieve the RDBMS; these databases needed to be highly scalable. Both cloud and grid computing were built to scale horizontally very effectively, but because cloud computing supports continuous processing whereas grid computing is oriented toward batch processing, clouds were the more widely accepted. Clouds developed the types of resources available (Web services, databases and file storage) and extended them to Web and enterprise applications.

Cloud computing is ubiquitous. Pick up any tech journal or visit almost any information technology website or blog and you are very likely to see talk about cloud computing. The only difficulty is that not everyone agrees on what it is: ask ten different experts what cloud computing is, and you will get ten unique answers. Cloud computing is a tremendously successful model of service-oriented computing, and it has transformed the way computing infrastructure is used. The three most general cloud models are IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service). This notion can also be extended to DaaS, i.e., Database as a Service. Elasticity, pay-as-you-go pricing, low upfront cost, less time to market and transfer of risks are the key qualifying features that make cloud computing a universal model for deploying applications which may not be economically practical in a conventional enterprise infrastructure. This has led to a spread in the number of applications that leverage various cloud platforms, resulting in an incredible rise in the scale of the data produced and consumed by such applications. Scalable DBMSs, both for decision support and for update-intensive applications, are thus an important part of the cloud infrastructure.

II. CLOUD COMPUTING

The National Institute of Standards and Technology (NIST) defines cloud computing as "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."

Because a cloud-like shape is conventionally used to represent a network, the term cloud is often used as a metaphor for the Internet and for the fundamental building blocks it represents. IT departments are under pressure to deliver better service, faster and at lower cost, to the lines of business. Cloud computing offers a way to simultaneously lower costs, increase elasticity and responsiveness, and improve quality of service (QoS). It is a technology that lets users draw resources such as storage, servers and networks from large pooled computing resources. Rather than being provided as a product, these resources are provided as a service, which lets users plug into the cloud much as they plug into any other utility grid (e.g., electricity). Cloud computing relies on sharing of resources to achieve coherence and economies of scale, comparable to a utility over a network. At its basis is the broader notion of converged infrastructure and shared services. The main aim of cloud computing is to free users from the complex, critical and often tedious task of managing and provisioning the computational resources needed to run their applications. Cloud computing is also known as on-demand computing.

There are three service models in cloud computing that provide different levels of abstraction to the user:

(1) Software as a Service (SaaS): users are provided with a complete hardware and software stack, with pre-deployed applications to use, e.g. Salesforce.com, Google Docs, Yahoo! Mail, Hotmail, Dropbox.

(2) Platform as a Service (PaaS): developers code and deploy applications using a provided API; the infrastructure is managed automatically, e.g. Google AppEngine, Windows Azure, VMware CloudFoundry.

(3) Infrastructure as a Service (IaaS): hardware resources (servers and storage) are made available over the Internet in a scalable and flexible manner, e.g. Amazon EC2, Rackspace CloudFiles, Google Cloud Storage.

A. The Benefits of Cloud Computing

Cloud computing helps achieve economies of scale, increasing output with fewer resources and thereby decreasing the per-unit cost of production. It reduces capital expenditure on technological infrastructure, requiring minimal upfront expense under a pay-as-you-go plan. Cloud deployment enables a globalized workforce, since systems can be accessed from anywhere, at any time, over the Internet. More work can be done with fewer people in less time by streamlining processes, and projects can be monitored more effectively with less personnel training required. Cloud computing also minimizes the licensing of new software and provides much more flexibility in the allocation of resources.

B. The Downsides of Cloud Computing

As with any new technology, there are shortcomings, and a user needs to weigh the drawbacks before adopting it. One risk of cloud computing is having to trust that the service provider will not disrupt its service: the provider effectively holds the user hostage, because the only way the user can access its documents and remain productive is through the provider. In short, the provider can paralyze the user, leaving the user helpless. Another main drawback is the protection and security of the user's information and data, since it resides on the massive Internet as part of the business network. So even a large, well-established business moving to cloud computing needs to select its service provider carefully. Administration costs certainly decrease to a great extent with cloud deployment, but there is an increased risk of service interruption and of problems with privacy, online security and connectivity. Cloud computing also brings a loss of control over the system, and in the long run the cost incurred in a cloud deployment can be higher than setting up a system in the traditional way. Additionally, there are problems related to knowledge and its integration.

III. DBMS

Databases play an important role in any information system. Storage and data processing are critical for almost every company that is directly or indirectly driven by the IT industry, and storing data in a DBMS has proven to be cost-effective. Every type of data needs to be managed and arranged so that the database processes can be administered, and there has always been demand for good DBMSs to manage the ever-increasing volume of data stored in databases. A DBMS is a set of programs that allows users to manage information in the database, i.e., to insert, update and delete it. Its key role is to handle the requests of users and of the other programs installed on the computer or in the network.

IV. CLOUD DATABASE

Relational databases governed the database industry for almost four decades, but the last few years have seen drastic changes in the way data is interpreted and processed. In a traditional database deployment, every application runs in its own silo, and every silo is sized for its peak load. It is not possible to share capacity between silos, so each silo must carry sufficient capacity for the peak load of its application. Once the applications are moved onto shared, pooled resources, utilization rates increase and only enough capacity for the combined load across all workloads is needed.
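The capacity argument above can be made concrete with a small back-of-the-envelope calculation. The following sketch is illustrative only; the hourly workload numbers are invented for the example, not taken from any real deployment.

    # Illustrative comparison of siloed vs. pooled capacity sizing.
    # Hourly load traces for three applications (hypothetical numbers).
    loads = {
        "billing":   [10, 20, 90, 30],
        "reporting": [80, 10, 15, 20],
        "web":       [30, 70, 25, 85],
    }

    # Siloed: each application gets capacity equal to its own peak load.
    siloed_capacity = sum(max(trace) for trace in loads.values())

    # Pooled: capacity only has to cover the peak of the *combined* load,
    # because the applications rarely peak at the same time.
    combined = [sum(hour) for hour in zip(*loads.values())]
    pooled_capacity = max(combined)

    print(f"siloed: {siloed_capacity}, pooled: {pooled_capacity}")
    # siloed: 255, pooled: 135 -- pooling cuts provisioned capacity by ~47%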

The next sort of database is the cloud-based database. A cloud database system is a distributed database that provides computing as a service instead of as a product: resources, software and data are shared among multiple devices over a network, typically the Internet. The agreeable thing about cloud-based databases is that there is no need to own high-end hardware; a firm does not need to retain as many people to run the database, and cloud services make the data accessible from virtually anywhere. The drawback of cloud-based services is data security: if the owner can access the data from anywhere, it is also far easier for someone else to gain access. A further drawback is that such services are somewhat one-size-fits-all, usually not permitting much customization. Pooling resources in a cloud model leads to increased flexibility and faster innovation for dynamic business needs. All DBMSs, whether traditional or cloud-based, are fundamentally communicators that act as middlemen between the operating system and the database. What distinguishes cloud DBMSs from traditional ones is that they are extremely scalable: they can process volumes of data that would exhaust a traditional DBMS. Compared with established relational database servers, cloud databases may offer less querying capability and frequently weaker consistency guarantees, but they scale far larger by providing built-in support for load balancing, elasticity and availability.

V. DESIGN ARCHITECTURES

An important property of a cloud database is that the end user does not need to know the physical location where the data is stored; to the end user it is stored virtually, in a cloud. Technically, however, it is stored in large data centers.

The majority of the stored data is unstructured, as that is the fastest-growing data type, and unstructured data can contain images, videos, worksheets and other content at the same time. The next important question is how to manage this unstructured data. For this, cloud databases use two different parallel database design architectures: (1) the shared-nothing database architecture and (2) the shared-disk database architecture. Shared-nothing and shared-disk represent two methods of data access. Shared-nothing essentially requires data partitioning, so that every database server stores and processes its own piece of the database without concern for the other database servers and the data stored on them; the workload is divided among the servers and each is responsible for its own data, resources and memory. Shared-disk instead has the different nodes of the database system access a single large volume of data: every node has access to all the system resources, data and memory, which are shared among them. Many discussions, papers and debates can be found on these two architectures, stating advantages and disadvantages of one over the other; to find the real score between them it is essential to study them in detail.

A. Shared-Nothing Database Architecture

In the shared-nothing architecture, data is partitioned into discrete silos of data, one per server. Each subsystem in a typical shared-nothing database is provided with its own memory and one or more disks, and all the subsystems are interconnected through a network.

When requests arrive from clients, they are automatically directed to the subsystem that owns the required resources; at any instant only one clustered subsystem can own and access a particular resource, and in the event of a failure, ownership of the resources is transferred dynamically to another clustered subsystem. The shared-nothing architecture uses local disks for storage, whereas the shared-disk architecture uses pooled storage accessed over the network. Because storage was local and no network was needed to reach it, the shared-nothing architecture had better performance in an era when networking was slow.

The most important advantage of the shared-nothing architecture is its scalability. Theoretically, a shared-nothing system can scale out to any number of nodes, because the nodes do not interfere with each other's work: each has its own set of resources and nothing is shared. Hence shared-nothing is usually the preferred way to form clusters, and its scalability makes it an ideal choice for read-intensive database applications such as data warehousing.

An important concept in shared-nothing systems is data shipping. Because data is spread across multiple servers, some queries require that data be processed on one machine and then passed to another machine for further processing and joining. This passing of data is called data shipping, and it kills database performance: the more data shipping, the higher the latency and the more congested the network bandwidth becomes. For this reason, data partitioning must be done cautiously to minimize data shipping, and efficient partitioning requires a very high level of skill.

B. Shared-Disk Database Architecture

In a shared-disk database architecture, all the connected systems access the same disk devices. The processor of each system has its own private memory, but it can directly address all the disks shared among the systems. The shared-disk architecture has become quite common in the past few years with the growing popularity of the Storage Area Network (SAN). Administration cost is considerably lower in shared-disk than in shared-nothing systems, because database administrators (DBAs) do not have to partition the database across systems to achieve parallelism; however, as the database grows, partitioning becomes necessary anyway, and this difference becomes less pronounced. Another convincing feature of the shared-disk architecture is that if one DBMS node fails, the functioning of the other nodes and their ability to access the whole database are unaffected, whereas in a shared-nothing architecture at least some data is lost upon the failure of a node. Shared-disk is nevertheless still exposed to a single point of failure: if data is damaged or corrupted by a software or hardware fault before it reaches the storage system, all nodes will access the corrupt data, and if some redundancy technique is used, the system will hold copies of this corrupted data.

As mentioned earlier, the shared-disk architecture allows any node to access any data on the shared disk, so any node can serve any database request. This leads to better load balancing than in the shared-nothing architecture, and because of this ability the shared-disk architecture can withstand evolutionary and temporary changes in the usage pattern. Consequently, each system can run at a higher CPU utilization, as peak loads are well distributed across all the systems in the cluster. However, shared-disk suffers from the same latency issues as shared-nothing; these can be addressed by partitioning the disk into two separate shared-disk systems, much as is done in a shared-nothing system.

The two architectures can be summarized as follows:

  Shared-Disk                                  Shared-Nothing
  Quick adaptability to varying workloads      Can exploit simpler, cheaper hardware
  Dynamic load balancing                       Load balancing fixed by the partitioning scheme used
  Data need not be partitioned                 Data is partitioned across the cluster
  Works best in heavy-read environments        Performs well in high-volume, write/read environments
  High availability                            Unlimited read scalability (depends on partitioning)
  Messaging overhead limits the node count     Data-partitioning overhead; data shipping can kill scalability

Under conventional computing constraints, the shared-nothing architecture has been the leader in price-performance, because it requires relatively simple transaction and concurrency control, though it carries the overhead of data partitioning.

VI. DESIGN CHALLENGES

Cloud database management systems need to support the features of cloud computing along with those of traditional databases to gain wider acceptance, which is a Herculean task. The main challenges associated with cloud databases are as follows.

A. Scalability

The primary promise of the cloud model is very high scalability. Scalability means that resources can be dynamically scaled up or down without disrupting the ongoing service. Database developers are challenged to build databases that can handle a constantly growing number of users and volumes of data accessed concurrently; companies today deal with data running into petabytes and beyond. The problem of scalability can be solved by adding servers, but only if all the processes and workloads can be parallelized. The scalability requirement is higher for analytical data management than for transactional data management.
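To make the shared-nothing partitioning idea from Section V concrete, the following minimal sketch routes each key to the node that owns it via hash partitioning, and shows why a query touching two partitions forces data shipping. The node count and key names are invented for illustration.

    import hashlib

    NODES = ["node0", "node1", "node2"]  # hypothetical shared-nothing cluster

    def owner(key: str) -> str:
        """Hash partitioning: each key is owned by exactly one node."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return NODES[h % len(NODES)]

    # A single-partition query runs entirely on one node...
    print(owner("customer:42"))

    # ...but joining rows whose keys hash to different nodes requires
    # shipping one side of the join across the network (data shipping).
    a, b = owner("customer:42"), owner("order:9001")
    if a != b:
        print(f"join needs data shipping between {a} and {b}")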

B. High Availability and Fault Tolerance

High availability means the database is available and can be accessed 24x7. To achieve it, the data must be replicated across widely separated geographic locations; replication also provides durability and a good level of fault tolerance. High availability and fault tolerance are principal aims for any cloud database provider, because either the loss or the unavailability of data damages both the provider and the user, through missed targets in the service agreement and through harm to business reputation.

C. Heterogeneous Environment

Today users access different applications from different places and environments, such as phones, tablets and laptops. Since users' applications and data, whether structured or unstructured, vary so widely, it is hard to determine in advance how the data will be used, so the database must be able to function efficiently in various heterogeneous environments.

D. Data Consistency and Integrity

The integrity and consistency of the data in the database are the most crucial prerequisites for any business application deploying a cloud database, and they are maintained through database constraints; a lack of data consistency and integrity produces unexpected and false output. Traditional RDBMSs have ACID properties (Atomicity, Consistency, Isolation and Durability), while cloud databases have BASE properties (Basically Available, Soft state, Eventually consistent). Because cloud databases rely on eventual consistency, it takes a while before data becomes consistent across all the replicated, distributed copies. It is therefore very difficult to keep transactional data consistent when it changes very often, as in transaction-intensive data management. The BASE approach should thus be followed cautiously, keeping in mind the type of data management involved, analytics-intensive or transaction-intensive, because data integrity cannot be sacrificed to over-enthusiasm for switching to a cloud database.
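The BASE behaviour described above can be illustrated with a toy simulation: writes land on one replica and propagate to the others only when the replication step runs, so a read served by a lagging replica returns stale data. This is a didactic sketch, not any particular system's replication protocol.

    # Toy model of eventual consistency across three replicas.
    replicas = [{}, {}, {}]
    pending = []  # asynchronous replication queue of (key, value) pairs

    def write(key, value):
        replicas[0][key] = value      # the write lands on one replica first
        pending.append((key, value))

    def replicate():
        while pending:                # propagation happens asynchronously
            key, value = pending.pop(0)
            for r in replicas[1:]:
                r[key] = value

    def read(key, replica_id):
        return replicas[replica_id].get(key)

    write("balance", 100)
    print(read("balance", 2))   # None -- stale read before propagation
    replicate()
    print(read("balance", 2))   # 100  -- eventually consistent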

E. Simplified Query Interface

As stated earlier, cloud databases are essentially distributed databases, so the biggest challenge a cloud database developer faces is querying them: because the database is distributed, a query may have to execute across multiple nodes. It is therefore difficult, but necessary, to provide a simple and uniform query interface over the distributed database.

F. Database Security and Privacy

Although the data can be thought of abstractly as residing in some cloud, physically it is stored in data centers in particular countries and is subject to the data management rules and regulations of those countries. There may be rules giving the government access to data stored in the country; the US Patriot Act, for example, grants the government the right to demand access to any computer and the data stored on it. If a company needs to prevent unauthorized hosts from accessing its database in the cloud, the data must be encrypted before it is uploaded, with the key kept outside the cloud. There are real risks in storing transactional data on an untrusted host, so sensitive data is usually encrypted before being uploaded to the cloud database. But once the data is encrypted, an application running against the cloud database must decrypt it before using it, and this is a big challenge. There is thus a big question mark over the privacy and security of data stored in cloud databases. It is for this reason that Amazon S3 was the first cloud storage provider to give customers the choice between the United States and the European Union for their data storage, and lately quite a few providers let customers choose the country in which their data is stored.
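A minimal sketch of the encrypt-before-upload approach described above, using the symmetric Fernet scheme from the Python cryptography package: the key stays with the client, and only ciphertext would be sent to the cloud store. The upload call is a hypothetical placeholder, not a real provider API.

    from cryptography.fernet import Fernet

    # The key is generated and kept on the client side, never uploaded.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    record = b"account=12345;balance=100"
    token = cipher.encrypt(record)      # ciphertext is safe to store remotely

    # upload_to_cloud("item1", token)   # placeholder for a provider's API

    # The application must decrypt after download before it can use the data;
    # the cloud itself can neither read nor query the plaintext.
    assert cipher.decrypt(token) == record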

G. Data Portability and Interoperability

Technology advances quickly, with additions and modifications that suit customer needs, so customers want the freedom to switch from one provider to another without hassle. Switching providers is not easy, however, because of provider lock-in, and this is an important challenge that cloud databases need to address. To some extent it can be resolved with portable and interoperable components, but even then it is not trivial. Data portability is the capability to run components designed for one cloud database provider in another provider's environment; interoperability extends portability, allowing a component to run in multiple cloud environments irrespective of their differences. As of now, no standard API for storing and accessing cloud databases is available. For cloud databases to be widely accepted, more productivity features, such as interfaces to business intelligence tools, should be incorporated into them.

VII. DATA MANAGEMENT APPLICATIONS IN THE CLOUD

The design challenges described above have clear consequences for a user's choice of what type of data management to deploy in the cloud. Two types make up the largest portion of data management in general: transactional data management and analytical data management.

A. Transactional data management

Transactional data management is the bread and butter of the database industry. Such databases are the backbone of banking, online flight booking, e-commerce and other supply-chain management applications. These applications inherently need the database to be ACID-compliant, and their workloads are write-intensive. The challenges in deploying transactional data management on a cloud database are the following.

It is difficult to deploy transactional data management systems on a shared-nothing database architecture. The transactional database industry is ruled by Microsoft's SQL Server, IBM's DB2, Sybase and Oracle; of these four leading products, neither SQL Server nor Sybase can be deployed on a shared-nothing architecture. IBM released a shared-nothing implementation of DB2, but it was designed to benefit analysis-oriented applications such as data warehousing, not transactional data management, and the same holds for Oracle, whose shared-nothing implementation at the storage level is likewise meant for analytical data management. Moreover, because the data is partitioned across different locations, implementing a shared-nothing database for transactional data management leads to increased latency and possible network bandwidth bottlenecks as data shipping takes place across the network.

Furthermore, as data is replicated across different geographical locations, it is difficult to maintain the ACID properties. According to the CAP theorem (Consistency, Availability and Partition tolerance), a shared-data system can provide at most two of the three properties. Since the data is replicated over different geographical locations, the system is essentially left to choose between consistency and availability, and for the system to remain reasonably available, the consistency (C) part is compromised: strict enforcement of one property requires relaxing another. For example, both Amazon's SimpleDB and Yahoo's PNUTS implement a shared-nothing architecture over a WAN, and by relaxing the ACID properties they overcome the difficulties of distributed replication, whereas Google's Bigtable weakens the atomicity (A) property of ACID and does not offer a complete relational interface on its shared-nothing architecture. There are proposals such as the H-Store project that aim to build a wide-area shared-nothing architecture that strictly follows the ACID properties by carefully designing the database to minimize the number of cross-partition transactions; however, the practicability of such an approach on real-world data and query workloads is yet to be established.

Lastly, storing transactional data on an untrusted host carries high risks. Transactional data usually includes the data needed to run mission-critical business processes and often contains sensitive customer information, such as bank account details or contact numbers, at the lowest granularity. This opens the door to security threats and privacy violations, which is usually not acceptable. Because of these challenges, it is fair to say that transactional data management is not yet well suited to deployment in a cloud environment.
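The consistency-versus-availability trade-off described above can be sketched in a few lines: when a network partition separates two replicas, the system must either refuse writes (staying consistent but unavailable) or accept them and let the replicas diverge (staying available but inconsistent). This toy model is illustrative only.

    class Replica:
        def __init__(self):
            self.value = None

    a, b = Replica(), Replica()
    partitioned = True  # a network failure separates the two replicas

    def write(value, prefer_consistency: bool) -> bool:
        if partitioned and prefer_consistency:
            return False      # CP choice: reject the write, stay consistent
        a.value = value       # AP choice: accept locally, replicas diverge
        if not partitioned:
            b.value = value
        return True

    print(write("x", prefer_consistency=True))   # False -> unavailable
    print(write("x", prefer_consistency=False))  # True  -> replicas differ
    print(a.value, b.value)                      # x None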

B. Analytical data management

In some data management, the output of the queried data serves business analysis, planning, decision making and problem solving; this is termed analytical data management. Such analysis typically scans the entire data history from multiple operational databases, so the magnitude of analytical data management is usually larger than that of the transactional data management described above: analytical data has reached the order of petabytes, whereas for transactional data even a terabyte is large. Analytical workloads are read-intensive, with rare updates or inserts. The following discussion shows why analytical data management is well suited to deployment in a cloud environment.

Analytical data management works well on a shared-nothing architecture. Many database vendors use shared-nothing architectures for their analytical data management products, to name a few: Vertica, Netezza, Teradata, Greenplum and IBM DB2, and recently Oracle also added analytical products on a shared-nothing architecture. The primary driving force behind this choice is the ever-increasing volume of data involved in analysis, and shared-nothing is believed to be highly scalable. Additionally, it is relatively easy to parallelize work among the nodes of a shared-nothing system, as analytical workloads usually comprise star-schema joins, many large scans and multidimensional aggregations. Lastly, because write and update operations are rare, there is no need for commit protocols or complex distributed locking.

The ACID properties are usually not required. Write and update operations are infrequent in analytical data management, and since the analysis generally does not need to run on up-to-the-second data, it is fine to run it on a recent snapshot. This makes three of the four ACID properties easy to obtain, namely atomicity (A), consistency (C) and isolation (I); unlike transactional data management, there is no trade-off between consistency and availability for the distributed replication of the data.

Storing analytical data on untrusted hosts poses fewer problems, since sensitive data can be left out of the analysis for security reasons. In analytical data management it is usually possible to separate the chunk of data that would be most damaging if accessed by an unauthorized third party; this data can be left out of the analytical store, or included only after being encrypted or passed through an anonymizer function. Besides, analytical data management rarely needs the most detailed data at the lowest level; a less granular version is usually good enough to analyze.

To conclude, analytical data management is well suited for deployment in the cloud because of its typical data and workload characteristics. If the security risks are reduced, a shared-nothing architecture can easily leverage the computational elasticity and storage availability of the cloud model. In particular, the cloud could be the preferred option for data warehousing for medium-sized businesses that cannot afford the large upfront cost of a data warehouse, and for applications where security is not an issue, such as when a window of the data warehouse is intended to be publicly viewed.
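The claim that analytical scans parallelize naturally on shared-nothing hardware can be shown with a minimal partition-scan-and-combine sketch: each worker aggregates its own partition independently, and only the small partial results are merged. The data and partitioning are invented for the example.

    from concurrent.futures import ThreadPoolExecutor

    # Three partitions of a fact table, as a shared-nothing cluster would
    # hold them: each node scans only its local rows (hypothetical sales).
    partitions = [
        [("shoes", 10), ("hats", 5)],
        [("shoes", 7), ("socks", 2)],
        [("hats", 4), ("socks", 9)],
    ]

    def scan(partition):
        """Local aggregation: runs independently on each node."""
        totals = {}
        for product, amount in partition:
            totals[product] = totals.get(product, 0) + amount
        return totals

    # Each partial result is tiny, so merging them is cheap -- this is why
    # large scans and aggregations scale so well on shared-nothing systems.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(scan, partitions))

    merged = {}
    for totals in partials:
        for product, amount in totals.items():
            merged[product] = merged.get(product, 0) + amount
    print(merged)  # {'shoes': 17, 'hats': 9, 'socks': 11}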

VIII. AMAZON'S SIMPLEDB

Amazon SimpleDB is provided through the Amazon Web Services (AWS) cloud as an alternative to a conventional RDBMS. It is a hosted, cloud-based web service that works together with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2) to offer the ability to store, query and process data in the cloud. It is a highly scalable, available and elastic non-relational database. It provides only the essential functionality for storing and querying data, leaving out the complex and rarely used features of conventional database management systems, and thus takes a streamlined approach. Because its interface is XML-based, storing, retrieving and editing data is very quick, using web service API calls that can be written on many modern programming language platforms. One noted user of SimpleDB is the online video streaming giant Netflix, which moved core parts of its database from Oracle to Amazon's SimpleDB. SimpleDB has a flexible schema, indeed it is schemaless, and supports eventual consistency by default and immediate consistency on request. No database administration is required to scale the system, so it is a virtually zero-administration service: as the database grows, computational and storage capacity is allocated dynamically by Amazon. SimpleDB is effectively a distributed document-oriented database. Many of its characteristics match relational databases, but there are dissimilarities as well: as in a relational database, information in SimpleDB is stored as tuples, but unlike a relational database, SimpleDB does not order the data. Without the operational complications, Amazon SimpleDB offers the core functionality of a database: real-time lookup and simple querying of structured data.

A. SimpleDB Data Model

SimpleDB is a NoSQL database, though it stores data in a structured manner as name/value pairs; all data is stored in string format. To execute queries on this structured data, SimpleDB provides a simple interface of get, put, delete and select operations. The stored data is organized into domains, items, attributes and values: domains contain items, and items are defined by attribute name-value pairs. The data model is best understood through a spreadsheet analogy. A domain corresponds to a table or spreadsheet and is a container of nested key-value pairs; the rows of the spreadsheet are the items, and the items are defined by their attribute-value pairs. Unlike a table or spreadsheet, however, a cell may contain multiple values per entry. Every item can have its own attributes, and everything is indexed: item1 can have one set of attributes while item2 has another set of the same or different attributes; the items are not interrelated.
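A small sketch of this item/attribute model using the legacy boto 2 Python library; the domain and item names are invented, credentials are assumed to be configured in the environment, and the exact method signatures should be checked against the boto documentation.

    import boto  # boto 2.x, which shipped a SimpleDB binding

    conn = boto.connect_sdb()                # reads AWS credentials from env
    domain = conn.create_domain("products")  # a domain is like a spreadsheet

    # An item is a row; attributes are name/value pairs, and an attribute
    # may hold multiple values (unlike a spreadsheet cell).
    domain.put_attributes("item1", {"color": ["red", "blue"], "size": "L"})
    domain.put_attributes("item2", {"weight": "2kg"})  # different attributes

    print(domain.get_attributes("item1"))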


B. Query execution in SimpleDB

SimpleDB is implemented in the Erlang programming language. Queries are executed through a REST interface, and users store data in Amazon SimpleDB the same way: requests are sent over HTTP and all results are returned as XML documents that can be displayed in a web browser. A request can create or delete a domain, put, get or delete attributes, or select items. [Figure: the user sends a request to SimpleDB as REST or XML; SimpleDB returns the response as XML.]
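As a sketch of the query side, again with boto 2 and the hypothetical "products" domain from above; SimpleDB's select uses a SQL-like syntax over a single domain, and the exact call signature should be treated as an assumption to verify.

    import boto

    conn = boto.connect_sdb()
    domain = conn.get_domain("products")

    # Select uses a SQL-like dialect; backticks quote the domain name.
    query = "select * from `products` where color = 'red'"
    for item in domain.select(query):
        # Each result is an item name with its attribute/value pairs.
        print(item.name, dict(item))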

C. Read consistency in SimpleDB

To provide high availability and data durability, SimpleDB stores replicas of each domain at different geographical locations; a write operation succeeds only when all copies of the domain persist durably. Read consistency then depends on the manner and timing of the preceding successful write or update. SimpleDB lets the user specify the desired consistency characteristics for each read operation, offering two options: eventually consistent reads and consistent reads, with eventually consistent reads as the default. The two compare as follows:

  Eventually Consistent Read    Consistent Read
  Stale reads possible          No stale reads
  Lowest read latency           Potentially higher read latency
  Highest read throughput       Potentially lower read throughput
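In boto 2 this choice is exposed as a per-call flag, as in the sketch below; the domain and item names are the hypothetical ones used earlier, and the parameter name should be verified against the boto documentation.

    import boto

    conn = boto.connect_sdb()
    domain = conn.get_domain("products")

    # Default: eventually consistent -- fastest, but may return stale data.
    stale_ok = domain.get_attributes("item1")

    # Opt in to a consistent read when the application must see the result
    # of the latest successful write, at the cost of latency/throughput.
    fresh = domain.get_attributes("item1", consistent_read=True)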

D. Transaction Management in SimpleDB

SimpleDB uses an Optimistic Concurrency Control (OCC) mechanism to manage transactions, in place of the relational database transactions it does not support. The OCC mechanism permits a modification to proceed only if the item being modified is unchanged since it was last accessed. OCC can be implemented by keeping an additional version-number or timestamp attribute as part of the item and performing a conditional put or delete based on the value of that version number. For example, if a user application tries to increment a counter, the update is rejected if another transaction has modified the counter concurrently.
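A minimal sketch of this version-number pattern with boto 2's conditional put: the expected_value argument makes the write succeed only if the stored version still matches. The names and retry handling are illustrative, and the exact argument format and error behaviour should be verified against boto.

    import boto
    from boto.exception import SDBResponseError

    conn = boto.connect_sdb()
    domain = conn.get_domain("products")

    # Assumes the "counter" item was initialized with count=0, version=0.
    item = domain.get_attributes("counter", consistent_read=True)
    version, count = int(item["version"]), int(item["count"])

    try:
        # The write succeeds only if 'version' still holds the value we
        # read, i.e. no other transaction modified the item in between.
        domain.put_attributes(
            "counter",
            {"count": str(count + 1), "version": str(version + 1)},
            replace=True,
            expected_value=["version", str(version)],
        )
    except SDBResponseError:
        print("conditional check failed -- re-read and retry")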

E. Data Security in SimpleDB

SimpleDB authenticates users with standard cryptographic methods, and data is available only to its owner. If an application needs an extra level of security, data can be encrypted before being stored in the database, but the user must keep in mind that SimpleDB will not decode it while querying: queries must be written to work with the encryption-decryption scheme used.

F. SDB Explorer

SDB Explorer is an enhanced graphical user interface (GUI) for accessing Amazon SimpleDB conveniently, with advanced query-processing features. Using SDB Explorer, the items in SimpleDB domains can be managed very conveniently, and users can perform fast, parallel edits on data, saving time and money. As mentioned earlier, a domain is like a spreadsheet of items and attribute-value pairs; SDB Explorer visualizes this otherwise unstructured data in a tabular view, and data can be edited directly in the table cells without writing queries, as one would in a relational database management system. SDB Explorer also calculates the SimpleDB charges for the queries that are executed.

G. Advantages

Amazon SimpleDB offers a simple web services interface to create and store multiple data sets, query the data easily, and return the results. It automatically manages performance tuning, replication and indexing of data items, hardware and software maintenance, and infrastructure provisioning, which results in faster development. Being a cloud service, the user pays only for what is used (pay-as-you-go) and can scale resources in and out as needed (scale-as-you-go). A simple application programming interface (API) is used for access and storage, the overall operational complexity is reduced and, being non-relational, the database is essentially schemaless. The user can scale the database by partitioning the workload across several domains, and depending on the application can choose between consistent and eventually consistent reads. It is, however, difficult to maintain data integrity in complex applications.

H. Limitations

Data integrity is not always guaranteed by SimpleDB, and its small feature set means users must do more work themselves to meet their applications' needs. Because all data is stored as strings, queries involving other data types (dates and numbers) are not directly possible, and data inconsistency can cause serious flaws in an application. SimpleDB is also not particularly fast, and it has several other drawbacks, including weaker forms of consistency and storage limitations.

IX. CONCLUSION

It can be concluded that although SimpleDB has some limitations in scaling, it is a good fit for workloads that require simplicity and flexibility in query processing; SimpleDB offers these features as a trade-off against performance and scaling capability, and it is relatively good for smaller workloads. The bottom line for database systems in cloud computing is that while they are a good and easy way of offloading some or all of the responsibility for data storage and processing to a third-party provider, there are trade-offs that need to be considered before deploying to a cloud database, depending on the user's application needs.
