
PROJECT REPORT CS 670

Comparative NoSQL DBs for Online Shopping


Name                          ID
Aesam Abdulrahman Al Malky    433037907
Ayoob Al Essa                 2311044
Salah Al Shammary             432032124

Dr. Mohamed-Foued Sriti


Springer-Verlag Berlin Heidelberg 2011

Table of Contents

1 Introduction
  1.1 Project Overview
  1.2 Importance
  1.3 Scope
  1.4 Problematic
  1.5 Objective
2 Literature review
  2.1 Definition of the domain
  2.2 Similar works
3 Comparative NoSQL DBs for online shopping
  3.1 Research methodology
  3.2 Issues
  3.3 Database design and Modeling
  3.4 Document-oriented Database (CouchDB)
      CouchDB Data Model
      CouchDB Architecture
  3.5 Document-oriented database (MongoDB)
      MongoDB Data Model
      MongoDB Architecture
4 Discussion and Conclusion
5 References

1 INTRODUCTION

Electronic commerce, also known as e-commerce, is the buying of goods and services through the Internet. In other words, e-commerce is the use of electronic communications and digital information processing in transactions to create value among businesses and between businesses and customers. For many people, e-commerce is now something they use on a daily basis: paying bills online, purchasing goods from Amazon, or booking a plane ticket are all examples. E-commerce first appeared about 40 years ago and has grown continually since, with new innovations, technologies, and start-ups joining the online market every year. Even though it has been in widespread use for no more than two decades, life without e-commerce now seems difficult.

1.1 Project Overview

The project, titled "Comparative NoSQL DBs for Online Shopping", is an analytical study of current e-commerce portals and their issues. The objectives of this study are to improve the services of e-commerce portals by using NoSQL DBs instead of the relational DBs used in legacy web-based applications, and to find which class of NoSQL DBs is most suitable for web-based applications such as online shopping. We report a comparison between two NoSQL databases in order to achieve high responsiveness, design flexibility, and easy scalability for online shopping sites.

1.2 Importance

A Web database is an integrated system of Web servers and database servers that enables users to access online information in a platform-independent manner through Web browsers. Because the web server and the database server work simultaneously, the response time for a request to the database cannot be seen simply as the web server service time plus the database service time. Performance metrics and optimization suggestions are therefore based on an analysis of the relationship between the two. Good performance of a Web database gives a company a definite edge over its competition, while poor performance is a serious handicap. Ensuring good Web database performance is therefore essential for businesses and for any other type of enterprise.

Interactive applications have changed dramatically over the last 15 years. In the late 1990s, large web companies emerged with dramatic increases in scale on many dimensions. First, the number of concurrent users grew as applications increasingly became accessible via the web and on mobile devices. Second, the amount of data collected and processed soared as it became easier and increasingly valuable to capture all kinds of data. Third, the amount of unstructured or semi-structured data grew, and its use became integral to the value and richness of applications.

As e-commerce portals have grown worldwide, so has the size of their data, and Relational Database Management Systems (RDBMS) were no longer sufficient to store and handle such large volumes of data efficiently. Therefore, in this project, NoSQL is proposed as the database technology for maintaining the data of online shopping sites. After this comparative study we can conclude which types of NoSQL DBs are most suitable for such applications.

1.3 Scope

Due to time limitations, our project focuses on one class of NoSQL DBs, document-oriented databases, represented by CouchDB and MongoDB.

1.4 Problematic

Legacy e-commerce sites built on relational database technology suffer from several issues, chiefly response time. Figure 1 shows the test results for the same requests against databases of different sizes: a large database has a somewhat longer response time than a small one [3].

Fig. 1. Web database system response time versus result file size, from 10k to 100k, in a single-query test

1.5 Objective

The objectives of this study are to improve the services of e-commerce portals by using NoSQL DBs instead of the relational DBs used in legacy web-based applications, and to find which type of NoSQL DB is most suitable for web-based applications such as online shopping. In this study we aim to propose a type of database for online shopping sites that achieves: better application development productivity through a more flexible data model; greater ability to scale dynamically to support more users and data; and improved performance, both to satisfy users who expect highly responsive applications and to allow more complex processing of data.

2 Literature review

2.1 Definition of the domain

The biggest companies in the world, such as Google, Amazon, Facebook, and LinkedIn, were among the first to discover the serious limitations of relational database technology for supporting these new application requirements. Commercial alternatives did not exist, so they invented new data management approaches themselves. Their pioneering work generated tremendous interest because a growing number of companies faced similar problems. Open source NoSQL database projects formed to leverage the work of the pioneers, and commercial companies associated with these projects soon followed.

2.2 Similar works

Database challenges and issues for web-based applications have been discussed by various researchers. The authors of [2] discussed the new generation of e-commerce applications and their data schema requirements. Performance Issues of a Web Database [3] analyzed the performance of a typical web database system with different sizes of web pages and different sizes of database tables. A Survey and Comparison of Relational and Non-Relational Database [6] compared how the two leading types of database storage in industry (relational and non-relational) handle large amounts of data, and concluded that the major differences are that NoSQL offers higher data throughput and greater scalability than relational databases.

3 Comparative NoSQL DBs for online shopping

3.1 Research methodology

The project presented in this report is based on theoretical research into best practices for how a website should be developed. To achieve this, an analytical comparative methodology was followed.

3.2 Issues

Relational databases do not scale easily: up to a certain point better hardware can be employed, but beyond that point the database must be distributed. A major disadvantage is that data in a relational database is stored in tables, a structure that can become highly complex when the data cannot be easily encapsulated in a table. Many of the features provided by relational databases go unused and therefore simply add to the cost and complexity of the database. Relational databases use SQL, which is designed for structured data and can become highly complex when working with unstructured data. When the amount of data becomes very large, the database has to be partitioned across multiple servers, and this partitioning poses several problems because joining tables across distributed servers is not an easy task.

3.3 Database design and Modeling

In a relational database system we must define a schema before adding records to a database. The schema is the structure described in a formal language supported by the database; it provides a blueprint for the tables in a database and the relationships between tables of data. Within a table, we need to define constraints in terms of rows and named columns, as well as the type of data that can be stored in each column. In contrast, a document-oriented database contains documents, which are records that describe the data in the document as well as holding the actual data. Documents can be as complex as we choose; we can use nested data to provide additional subcategories of information about an object. We can also use one or more documents to represent a real-world object.
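As a minimal illustration of the contrast described above, the following sketch models one order as a single self-describing document with nested data, using only Python's standard json module. The field names (customer, items, sku, and so on) are hypothetical and chosen only for this example; they are not taken from the project's data model.

```python
import json

# Hypothetical order document: all field names are illustrative assumptions.
order = {
    "_id": "order-1001",
    "type": "order",
    "customer": {"name": "A. Customer", "city": "Riyadh"},
    "items": [
        {"sku": "book-42", "title": "NoSQL Basics", "qty": 1, "price": 30.0},
        {"sku": "pen-7", "title": "Blue Pen", "qty": 3, "price": 1.5},
    ],
}


def order_total(doc):
    """Sum qty * price over the nested line items of one order document."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


# The document serializes to JSON without any table schema being defined first.
serialized = json.dumps(order, indent=2)
restored = json.loads(serialized)

if __name__ == "__main__":
    print(serialized)
    print("total:", order_total(restored))
```

In a relational design the same order would typically be split across several tables (orders, customers, order lines) that must be declared up front; here the nesting carries that structure inside the record itself.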

3.4 Document-oriented Database (CouchDB)

In a document-oriented model, data objects are stored as documents; each document stores the data and enables us to update or delete it. Instead of columns with names and data types, we describe the data in the document and provide the value for that description.

CouchDB is a document-oriented database server accessible through REST APIs. Couch is an acronym for "Cluster Of Unreliable Commodity Hardware", emphasizing the distributed nature of the database. CouchDB is designed for document-oriented applications such as forums, bug tracking, wikis, and email. The CouchDB project is part of the Apache Foundation and is written in Erlang. Erlang was chosen as the programming language because it is very well suited to concurrent applications through its lightweight processes and functional programming paradigm. CouchDB is ad hoc and schema-free with a flat address space.

CouchDB is not only a NoSQL database but also a web server for applications written in JavaScript. The advantage of using CouchDB as a web server is that applications can be deployed simply by putting them into the database, and that they can access the database directly without the overhead of a query protocol.
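To make the REST access mentioned above concrete, the sketch below builds, without sending, the HTTP requests that would store and read one document. It assumes a CouchDB server at localhost on port 5984 and a database named "shop"; both the address and the database name are assumptions made for this example, as is the product document itself.

```python
import json
import urllib.request

# Assumptions: server address and database name are placeholders for this sketch.
BASE_URL = "http://localhost:5984"
DATABASE = "shop"


def build_put(doc_id, doc):
    """Build (but do not send) the HTTP request that stores a document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{DATABASE}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get(doc_id):
    """Build (but do not send) the HTTP request that reads a document."""
    return urllib.request.Request(url=f"{BASE_URL}/{DATABASE}/{doc_id}", method="GET")


put_req = build_put("product-1", {"type": "product", "name": "Blue Pen", "price": 1.5})
get_req = build_get("product-1")

if __name__ == "__main__":
    for req in (put_req, get_req):
        print(req.get_method(), req.full_url)
    # Actually sending a request, e.g. urllib.request.urlopen(put_req),
    # requires a running server and is deliberately left out here.
```

Each document is addressed by a URL of the form database/document id, so any HTTP client can act as a database client.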

CouchDB Data Model:

Data in CouchDB is organized into documents. Each document can have any number of attributes, and each attribute can itself contain lists or even objects. Documents are stored and accessed as JSON objects, which is why CouchDB supports the data types String, Number, Boolean, and Array. Each CouchDB document has a unique identifier and, because CouchDB uses optimistic replication on both the server side and the client side, each document also has a revision identifier. The revision id is updated by CouchDB every time a document is rewritten.

Update operations in CouchDB are performed on whole documents. If a client wants to modify a value in a document, it first has to load the document, make the modifications, and then send the whole document back to the database. CouchDB uses the revision id included in the document for concurrency control and can therefore detect whether another client has made an update in the meantime.

The query model of CouchDB consists of two concepts: Views, which are built using MapReduce functions, and the HTTP query API, which allows clients to access and query the views. A View in CouchDB is basically a collection of key-value pairs ordered by key. Views are built by user-specified MapReduce functions, which are called incrementally whenever a document in the database is created or updated. Views should be specified before runtime, because introducing a new View requires its MapReduce functions to be invoked for each document in the database. This is why CouchDB does not support dynamic queries.
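The whole-document update cycle and the revision check described above can be sketched with a small in-memory stand-in. This is not CouchDB code: the class name TinyDocStore, the ConflictError exception, and the use of a plain integer as revision id are all simplifying assumptions for illustration (real CouchDB revision ids are strings of the form "N-hash").

```python
class ConflictError(Exception):
    """Raised when a write carries a stale revision id."""


class TinyDocStore:
    """Hypothetical in-memory stand-in for a store with revision checks."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so each client edits its own whole-document version.
        return dict(self._docs[doc_id])

    def put(self, doc):
        doc_id = doc["_id"]
        current = self._docs.get(doc_id)
        # Reject the write if the client's revision is not the latest one.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(f"stale revision for {doc_id}")
        # Simplified revision id: an integer counter (an assumption, see above).
        next_rev = 1 if current is None else current["_rev"] + 1
        self._docs[doc_id] = dict(doc, _rev=next_rev)
        return next_rev


store = TinyDocStore()
store.put({"_id": "cart-1", "items": 1})

# Two clients load the same revision of the document.
client_a = store.get("cart-1")
client_b = store.get("cart-1")

client_a["items"] = 2
store.put(client_a)  # accepted: the revision matches the stored one

client_b["items"] = 5
try:
    store.put(client_b)  # rejected: client_b still holds the old revision
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

The second client would now have to reload the document, reapply its change, and send the whole document again.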


CouchDB Architecture:

Fig. 2. Simple Architecture of Apache CouchDB database

There are three main components of CouchDB: the Storage Engine, the View Engine, and the Replicator.

Storage Engine: It is B-tree based and forms the core of the system, managing the storage of internal data, documents, and views. Data in CouchDB is accessed by keys or key ranges, which map directly to the underlying B-tree operations. This direct mapping improves speed significantly.

View Engine: It is based on Mozilla SpiderMonkey and written in JavaScript. It allows the creation of ad hoc views that are made of MapReduce jobs. Definitions of the views are stored in design documents. When a user reads data in a view, CouchDB makes sure the result is up to date. Views can be used to create indices and extract data from documents.

Replicator: It is responsible for replicating data to a local or remote database and synchronizing design documents.
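The idea of a view as a key-ordered collection of key-value pairs, queried by key or key range, can be sketched as follows. CouchDB view functions are written in JavaScript; the Python functions here are stand-ins, the document fields are hypothetical, and the view is rebuilt from scratch rather than updated incrementally, so this illustrates only the data flow.

```python
import bisect


def map_order(doc):
    """Map function: emit (key, value) pairs for one document."""
    if doc.get("type") == "order":
        for item in doc["items"]:
            yield item["sku"], item["qty"]


def build_view(docs, map_fn):
    """Run map_fn over every document and keep the rows sorted by key."""
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: row[0])
    return rows


def query_range(view, start_key, end_key):
    """Return rows with start_key <= key <= end_key using binary search."""
    keys = [key for key, _ in view]
    low = bisect.bisect_left(keys, start_key)
    high = bisect.bisect_right(keys, end_key)
    return view[low:high]


def reduce_sum(rows):
    """Reduce function: total of all emitted values."""
    return sum(value for _, value in rows)


# Hypothetical documents; only the orders contribute rows to the view.
docs = [
    {"_id": "o1", "type": "order",
     "items": [{"sku": "pen", "qty": 3}, {"sku": "book", "qty": 1}]},
    {"_id": "o2", "type": "order", "items": [{"sku": "pen", "qty": 2}]},
    {"_id": "p1", "type": "product", "name": "pen"},
]

view = build_view(docs, map_order)
pen_rows = query_range(view, "pen", "pen")
pens_sold = reduce_sum(pen_rows)
```

Because the rows are kept in key order, a key-range query reduces to two binary searches, which mirrors why key and key-range access maps well onto a B-tree.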

Fig. 3. Simple database model for online shopping upon CouchDB database

3.5 Document-oriented database (MongoDB)

MongoDB is a schema-less document-oriented database developed by 10gen and an open source community. The name MongoDB comes from "humongous". The database is intended to be scalable and fast and is written in C++. In addition to its document-oriented database features, MongoDB can be used to store and distribute large binary files such as images and videos. It is fault tolerant and persistent, and it provides a complex query language as well as an implementation of MapReduce.

MongoDB Data Model:

MongoDB stores documents as BSON (Binary JSON) objects, which are binary-encoded JSON-like objects. Like JSON, BSON supports nested object structures with embedded objects and arrays. MongoDB supports in-place modification of attributes, so if a single attribute is changed by the application, only this attribute is sent back to the database. Each document has an ID field, which is used as a primary key.

To enable fast queries, the developer can create an index for each queryable field in a document. MongoDB also supports indexing over embedded objects and arrays. For arrays it has a special feature called "multikeys", which allows an array to be used as an index; the array could, for example, contain tags for a document. With such an index, documents can be searched by their associated tags.

Documents in MongoDB can be organized in so-called "collections". Each collection can contain any kind of document, but queries and indexes can only be made against one collection. Because of MongoDB's current restriction of 40 indexes per collection and the better performance of queries against smaller collections, it is advisable to use one collection for each type of document.

Relations in MongoDB can be modeled using embedded objects and arrays, which means the data model has to be a tree. When the data does not fit a tree, there are two options. The first is to denormalize the model, which implies that some documents are replicated inside the database; this solution should only be used if the replicated documents do not need very frequent updates. The second option is to use client-side joins for all relations that cannot be put into tree form. This requires more work in the application layer and increases the network traffic with the database.
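The multikey idea described above, where each element of an array field gets its own index entry, can be sketched in plain Python. This is an illustration of the concept only, not MongoDB's implementation; the function names and the product documents are hypothetical.

```python
from collections import defaultdict


def build_multikey_index(docs, field):
    """Index each element of an array field separately (one entry per element)."""
    index = defaultdict(set)
    for doc in docs:
        for element in doc.get(field, []):
            index[element].add(doc["_id"])
    return index


def find_by_tag(index, docs_by_id, tag):
    """Look up documents through the index instead of scanning the collection."""
    return [docs_by_id[doc_id] for doc_id in sorted(index.get(tag, ()))]


# Hypothetical product documents with a "tags" array.
products = [
    {"_id": 1, "name": "Blue Pen", "tags": ["office", "writing"]},
    {"_id": 2, "name": "Notebook", "tags": ["office", "paper"]},
    {"_id": 3, "name": "Novel", "tags": ["books"]},
]

docs_by_id = {doc["_id"]: doc for doc in products}
tag_index = build_multikey_index(products, "tags")
office_products = find_by_tag(tag_index, docs_by_id, "office")
```

A document with several tags appears under several index keys, which is what lets a single index answer "find all products tagged office" without reading every document.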


MongoDB Architecture:

A MongoDB cluster consists of three components: shard nodes, configuration servers, and routing services (mongos).

Shard nodes: Shard nodes are responsible for storing the actual data. Each shard can consist of either one node or a replication pair. In future versions of MongoDB, one shard may consist of more than two nodes for better redundancy and read performance.

Configuration servers: The config servers store the metadata and routing information of the MongoDB cluster and are accessed from the shard nodes and from the routing services.

Routing services (mongos): The mongos routing processes are responsible for performing the tasks requested by clients. Depending on the type of operation, the mongos send the requests to the necessary shard nodes and merge the results before returning them to the client. The mongos themselves are stateless and can therefore be run in parallel.

For storage, MongoDB uses memory-mapped files, which let the operating system's virtual memory manager decide which parts of the database are kept in memory and which only on disk. This is why MongoDB cannot control when the data is written to the hard disk. The motivation for using memory-mapped files is to use as much of the available memory as possible to boost performance. In some cases this might eliminate the need for a separate cache layer on the client side.
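The division of labour between shards, configuration metadata, and a stateless router can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not MongoDB code: the class names, the range-based chunk table, and the shard key "customer_id" are hypothetical, and replication within a shard is left out.

```python
class Shard:
    """Holds the documents for one slice of the data."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def find(self, predicate):
        return [doc for doc in self.docs.values() if predicate(doc)]


class Router:
    """Stateless router: consults the chunk table, forwards, and merges results."""

    def __init__(self, chunk_table, shards):
        # chunk_table: list of (exclusive upper bound on shard key, shard name),
        # sorted by bound. It stands in for the config servers' metadata.
        self.chunk_table = chunk_table
        self.shards = shards

    def shard_for(self, key):
        for upper_bound, shard_name in self.chunk_table:
            if key < upper_bound:
                return self.shards[shard_name]
        raise KeyError(f"no chunk covers key {key!r}")

    def insert(self, doc):
        self.shard_for(doc["customer_id"]).docs[doc["_id"]] = doc

    def find_by_key(self, key):
        # Targeted query: only the shard owning the key is contacted.
        return self.shard_for(key).find(lambda d: d["customer_id"] == key)

    def find_all(self, predicate):
        # Scatter-gather: ask every shard, then merge before returning.
        merged = []
        for shard in self.shards.values():
            merged.extend(shard.find(predicate))
        return sorted(merged, key=lambda d: d["_id"])


shards = {"shard-a": Shard("shard-a"), "shard-b": Shard("shard-b")}
router = Router([(1000, "shard-a"), (float("inf"), "shard-b")], shards)

for i, customer in enumerate([5, 1500, 42, 2000]):
    router.insert({"_id": i, "customer_id": customer, "total": 10 * (i + 1)})

large_orders = router.find_all(lambda d: d["total"] >= 20)
```

Because the router keeps no data of its own beyond the chunk table it reads, several such routers could serve clients side by side, which matches the stateless, parallel role described for the mongos.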


Fig. 4. Typical architecture of a MongoDB cluster


4 Discussion and Conclusion

First, we summarize the reasons to adopt a NoSQL database for web-based applications as follows:

1. Avoidance of Unneeded Complexity. Relational databases provide a variety of features and strict data consistency, but this rich feature set and the ACID properties implemented by RDBMSs might be more than is necessary for particular applications and use cases.

2. High Throughput. Some NoSQL databases provide a significantly higher data throughput than traditional RDBMSs. For instance, the column store Hypertable, which pursues Google's Bigtable approach, allows the local search engine Zvent to store one billion data cells per day. To give another example, Google is able to process 20 petabytes a day stored in Bigtable via its MapReduce approach.

3. Avoidance of Expensive Object-Relational Mapping. Most NoSQL databases (such as key/value stores and document stores) are designed to store data structures that are either simple or closer to those of object-oriented programming languages than relational data structures are. They therefore make expensive object-relational mapping unnecessary.


Finally, we can conclude that both document-oriented databases satisfy the web-based requirements considered, but MongoDB has an additional advantage: the capability to store and distribute large binary files such as images and videos. It therefore makes sense to use this database to handle the many pictures and clips for the different products on online shopping websites.

No.  Web issues                                   CouchDB  MongoDB
1    Scalability                                  OK       OK
2    Flexibility                                  OK       OK
3    Support Big user                             OK       OK
4    Support Big data                             OK       OK
5    Support distribution                         OK       OK
6    Support parallel computation                 OK       OK
7    Support Binary files for images and video    N/A      OK
8    High data throughput                         OK       OK
9    For write more than read                     N/A      OK

Table 1. Comparison between the two document-oriented databases


5 References

1. C. Strauch, "NoSQL Databases", Lecture Selected Topics on Software-Technology Ultra-Large Scale Sites.
2. R. Agrawal, A. Somani and Y. Xu, "Storage and Querying of E-Commerce Data", VLDB Conference, 2001.
3. Y.Li and K.L, "Performance Issues of a Web Database", chapter from Database and Expert Systems Applications, Springer, Volume 1873, 2000, pp 825-834.
4. Couchbase whitepapers, www.couchbase.com/
5. Couchbase Developer's Guide 2.0, www.couchbase.com/
6. N. Jatana, S. Puri, M. Ahuja, I. Kathuria and D. Gosain, "A Survey and Comparison of Relational and Non-Relational Database", International Journal of Engineering Research & Technology (IJERT), August 2012.
7. E. Rasys and N. Dawood, "Semi-Structured Data Modeling For A Web-Enabled Engineering Application", International Conference on Computing in Civil Engineering, June 2012.
8. R. Padhy, M. Patra and S. Satapathy, "RDBMS to NoSQL: Reviewing Some Next-Generation Non-Relational Database's", International Journal Of Advanced Engineering Sciences And Technologies, 2011.
