
Ravish Chawla
Technology Review
Database Systems Implementation
December 12, 2013

ArangoDB: A New NoSQL Database System

Introduction

I am writing a technology review of a NoSQL database implementation called ArangoDB. I chose this technology because it was released only last year, so there is not much information about it available online. ArangoDB is intriguing because it shares some attributes with other NoSQL database implementations, such as Cassandra and MongoDB, while other attributes make it stand out from them. ArangoDB is advertised as a multi-purpose NoSQL database that uses a flexible data model for storing documents, graphs, and key/value pairs. One of its most compelling advantages is its broad platform support: it runs not only on Windows, Linux, and Mac, but also on the Raspberry Pi, a newer ARM-based platform, which is a distinct advantage over other NoSQL databases.

About ArangoDB

Because ArangoDB was released only last year, relatively little information about it is available online. It is nevertheless an intriguing product, because it entered a market already heavily occupied by Cassandra and MongoDB. To compensate, its developers have published many resources that help consumers understand how the database works. For instance, the main website contains documentation covering the entire database and how it is used.

The developers have also made it easy for consumers to adopt the database, even if they come from a different background. For instance, ArangoDB's query language, AQL, is very similar to SQL. It has the features most users generally require, such as joins and filter conditions, which makes it easy to transition from a SQL background to a NoSQL one. ArangoDB also makes good use of JavaScript, a language widely used for developing single-page applications (SPAs). JavaScript is used extensively in ArangoDB, along with an integrated application framework called Foxx. Foxx is a framework for building single-page applications with web technologies such as HTML, CSS, and JavaScript. It lets users create their own APIs, which, as the website notes, is useful for consumers who want to access the database from an Android or iOS app. These features, especially the variety of supported languages and the integrated framework, make ArangoDB stand out for such a recently released system.

ArangoDB supports several data models, the most important of which is the key/value store. It also implements a document store and graph data. In a document store, data is encapsulated in documents (JSON documents, in ArangoDB's case); with graph data, a graph of vertices, edges, and attributes represents and stores the data. Graph data can be very useful, because a relationship between two records is most naturally represented as an edge between two vertices. ArangoDB goes further and allows documents in different collections to be linked as a graph. The database itself is implemented in C and C++, and it embeds Google's V8 engine to run JavaScript code on the server side; parts of the code base are written in JavaScript and Ruby.
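The data models described above can be illustrated with a small sketch in plain Python (this is not ArangoDB's API). Documents are stored under handles of the form "collection/key", so the same store also works as a key/value store, and edge documents link two documents through _from and _to attributes, which is how ArangoDB itself represents edges. The sample data and the functions outbound_neighbors and traverse_outbound are hypothetical. The traversal visits each vertex once, at the shallowest depth at which it is first reached; that uniqueness rule is an assumption of this sketch, not a documented ArangoDB default. Called from "users/john" with a minimum and maximum depth of 2, it mirrors the intent of the friends-of-friends AQL traversal discussed later in this review.

```python
from collections import deque

# Hypothetical sample data. Each document is addressed by a handle of the form
# "collection/key", so the whole dict doubles as a key/value store.
documents = {
    "users/john": {"_key": "john", "name": "John"},
    "users/mary": {"_key": "mary", "name": "Mary"},
    "users/ahmed": {"_key": "ahmed", "name": "Ahmed"},
    "users/lena": {"_key": "lena", "name": "Lena"},
}

# Edge documents link two documents through their _from and _to attributes.
friends = [
    {"_from": "users/john", "_to": "users/mary"},
    {"_from": "users/mary", "_to": "users/ahmed"},
    {"_from": "users/mary", "_to": "users/john"},
    {"_from": "users/ahmed", "_to": "users/lena"},
]


def outbound_neighbors(edges, handle):
    # Linear scan over all edges; a real database would use an edge index.
    return [e["_to"] for e in edges if e["_from"] == handle]


def traverse_outbound(edges, start, min_depth, max_depth):
    """Breadth-first traversal along outbound edges.

    Returns the handles of vertices first reached at a depth between
    min_depth and max_depth (max_depth=None means no upper bound).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    result = []
    while frontier:
        vertex, depth = frontier.popleft()
        if depth >= min_depth:
            result.append(vertex)
        if max_depth is not None and depth == max_depth:
            continue  # do not expand beyond the maximum depth
        for nxt in outbound_neighbors(edges, vertex):
            if nxt not in seen:  # each vertex is visited once, at its shallowest depth
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return result


# Key/value access by handle, then a friends-of-friends query returning keys.
john = documents["users/john"]
fof_keys = [documents[h]["_key"] for h in traverse_outbound(friends, "users/john", 2, 2)]
```

With these edges, the depth-2 traversal returns only "users/ahmed": Mary is excluded because she is first reached at depth 1 as John's direct friend, and Lena because she is three hops away. Passing None as the maximum depth also returns "users/lena", since a minimum depth alone is only a lower bound.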
ArangoDB performs background synchronization instead of immediate synchronization, so data modified by a transaction is not immediately written to disk. Instead, a separate background thread writes pending changes to disk in parallel with other work. Hence, there is a risk of data loss if the server fails before pending changes have been written. However, it is not clearly specified what kind of logging system or data recovery model the database uses. Because data may be lost in this way, there should be a technique for recovering data that has been stored but not yet written to disk.

Data files can also be imported directly into ArangoDB. The import process first transfers the data to the server, then imports the records into the database, and finally prints a status summary. The supported formats are CSV and JSON.

ArangoDB also has built-in authentication for security. Authentication restricts access to the database, its transactions, and its API to a selected group of users, and ArangoDB supports adding and removing users other than the root user.

ArangoDB is currently a free, open-source project released under the Apache 2.0 license. Its code is available on GitHub, so other programmers can contribute to it. Since the project is still a work in progress, it still has issues; however, because the code is so easily available, the database has a better chance of improving in the future.

The database has some great features compared to other NoSQL database systems. Its main advantage is that it was developed recently, which makes it more likely to be compatible with today's devices. It has already shown this through its built-in support for the Raspberry Pi. Its support for the different architectures in use today is important, and its support for building APIs for Android and iOS apps is another indicator of its versatility. ArangoDB has also been updated very frequently.
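The background synchronization described above can be sketched as a write-behind buffer: writes update memory and are queued, and a separate thread periodically moves queued changes to "disk". This is a minimal illustration under stated assumptions, not ArangoDB's implementation: the class WriteBehindStore, its flush_interval parameter, and the dict standing in for data files are all hypothetical. The crash method makes the risk of data loss concrete by discarding whatever the background thread has not yet flushed.

```python
import queue
import threading


class WriteBehindStore:
    """Toy key/value store whose changes reach 'disk' only via a background thread.

    Hypothetical illustration: `disk` is a dict standing in for data files, and
    `flush_interval` (seconds) is an assumed parameter, not an ArangoDB option.
    """

    def __init__(self, flush_interval=0.05):
        self.memory = {}               # state visible to readers immediately
        self.disk = {}                 # state that would survive a crash
        self._pending = queue.Queue()  # changes not yet persisted
        self._stop = threading.Event()
        self._interval = flush_interval
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def put(self, key, value):
        # The caller returns as soon as the change is queued; it never waits on disk I/O.
        self.memory[key] = value
        self._pending.put((key, value))

    def _drain(self):
        # Persist every queued change, in the order it was made.
        while True:
            try:
                key, value = self._pending.get_nowait()
            except queue.Empty:
                return
            self.disk[key] = value

    def _run(self):
        # Background thread: flush periodically until asked to stop.
        while not self._stop.wait(self._interval):
            self._drain()

    def close(self):
        """Orderly shutdown: stop the worker, then persist whatever is left."""
        self._stop.set()
        self._worker.join()
        self._drain()

    def crash(self):
        """Simulated crash: stop the worker and lose every unflushed change."""
        self._stop.set()
        self._worker.join()
        lost = self._pending.qsize()
        self.memory = dict(self.disk)  # after a restart, only persisted data remains
        return lost
```

For example, with WriteBehindStore(flush_interval=60), a single put followed immediately by crash() returns 1, because the background thread never got a chance to flush; calling close() instead would have persisted the change.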
Since its initial release, the database system has been updated many times, with important features arriving in each update. For instance, the latest update added support for multiple databases, and the previous update added asynchronous master-slave replication. Future releases should make ArangoDB a viable alternative to other NoSQL databases, as it gains features and updates that make it more competitive with MongoDB and CouchDB.

Detailed Analysis of the Product and Comparison with Other NoSQL Systems

Database systems can be compared along many attributes, such as performance, data organization, querying, and durability. In terms of data organization, ArangoDB stands out from MongoDB because it has built-in support for graph data, while MongoDB supports it only to a limited extent. Graph data makes it easy to store relationships between documents, even across collections. Two documents related by a single event can be stored as a graph with two vertices, one for each document, and an edge that connects them. Because ArangoDB stores documents in collections, the database supports graph storage in which documents are connected via edges. Both databases, however, support composite and secondary indexes.

As mentioned above, ArangoDB uses a separate background thread to synchronize changes between memory and disk. For data recovery, ArangoDB uses a technique similar to redo logging, called journaling. New data is appended to a journal instead of being written directly to the main data files. The journal is written to disk in parallel with other processes, so transactions do not wait for a commit log to be written to disk. Journaling resembles redo logging because, when the database comes back online after going offline, it rereads the journals and restores the database to its most recent state. According to the developers, journaling gives higher throughput than writing directly to the database, because the database does not have to switch back and forth between executing transactions and writing to disk.
This differs from MongoDB, which uses immediate synchronization, something ArangoDB tries to avoid. Both systems, however, ultimately persist data to disk permanently.
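The journaling idea described above, appending changes to a journal and replaying it on restart, can be sketched as follows. This is a simplified, hypothetical illustration in the spirit of redo logging, not ArangoDB's on-disk format: the class JournaledStore, the JSON-lines entry layout, and the "put"/"delete" operation names are assumptions. The sketch never calls fsync, so when the appended bytes actually reach the disk is left to the operating system, loosely mirroring the point that writers do not wait for the disk.

```python
import json
import os
import tempfile


class JournaledStore:
    """Hypothetical sketch of journaling: changes are appended to a journal file,
    and the in-memory state is rebuilt on restart by re-reading that journal."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Recovery: re-apply every journaled operation in the order it was written.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # a torn final entry left by a crash is ignored
                if entry["op"] == "put":
                    self.data[entry["key"]] = entry["value"]
                elif entry["op"] == "delete":
                    self.data.pop(entry["key"], None)

    def _append(self, entry):
        # Appending is a sequential write; existing entries are never rewritten.
        # No fsync is issued, so the OS decides when the bytes reach the disk.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "delete", "key": key})
        self.data.pop(key, None)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    store = JournaledStore(path)
    store.put("users/john", {"name": "John"})
    store.put("users/mary", {"name": "Mary"})
    store.delete("users/mary")
    restarted = JournaledStore(path)  # simulated restart: state comes only from the journal
    print(restarted.data)
```

Run as a script, the demo prints {'users/john': {'name': 'John'}}: the restarted store recovers the surviving document purely by replaying the journal.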

ArangoDB uses its own query language, AQL, because it supports data models beyond the key/value and document stores used in most database systems. With graph data in particular, SQL can be insufficient for querying. By contrast, MongoDB uses document-based queries, and CouchDB uses functions and JSON documents to query data. Compared to both, AQL is the most similar to SQL: it is declarative, and it supports many operations found in standard SQL databases, such as joins and filtering. Setting the syntax aside, both ArangoDB and MongoDB allow querying data and applying operations such as sorting, result projection, and aggregate functions. AQL's main additions are support for querying graph data and two-dimensional geo indexes. The developers of ArangoDB have compared the performance of MongoDB, CouchDB, and ArangoDB by testing bulk inserts. The following diagram shows the time taken to insert medium-sized datasets into each database:

[Diagram: time taken to bulk-insert medium-sized datasets into ArangoDB, MongoDB, and CouchDB]

Overall, ArangoDB's running time was on par with MongoDB's. In some cases one was slightly faster than the other, but overall both were faster than CouchDB. According to the benchmark results page, the datasets consisted of documents; the largest was the last one, consisting of 50,000 Wikipedia documents.

As mentioned earlier, the key distinguishing feature of the database system is its support for graph data. Graph data is a distinctive kind of data in which vertices represent elements that may or may not be related to each other. Related vertices are connected by an edge, which represents how the two are related. Standard databases store such data differently. In a standard relational database, for instance, graph data would typically be stored in two tables: one listing all vertices, and another listing all edges, where each row pairs the two vertices that an edge links. ArangoDB provides support for graphs through its key/value storage. Each document is identified by a key, which allows it to be stored in a key/value store, and a document can be connected to another document using built-in attributes (an edge document records the two documents it links in its _from and _to attributes). This connectedness allows for graph storage.

For actual disk storage, ArangoDB uses a specialized binary data format. A document's data is stored on disk in binary, together with metadata such as attribute names and types. If several documents have the same structure, they can share the same metadata, so less disk space is needed than if each document stored its own.

A distinct advantage that ArangoDB has over other NoSQL databases is that it supports both cross-collection transactions and document-level transactions, which read or write data within a single document. The latter are most useful with complex, nested documents. As is evident, ArangoDB is a multi-model database, unlike other NoSQL databases such as MongoDB, which supports only key/value and document stores. Like MongoDB, however, ArangoDB has its own query language, AQL. AQL is similar to SQL and supports many SQL features that most query languages lack, the most important being the ability to join collections (the equivalent of tables) to produce output that depends on both. Most important of all, AQL provides access to graph data. For graph access, AQL uses a technique called path traversal; path traversals are short programs written in JavaScript that can be invoked from AQL. For instance, consider the following query:
FOR t IN TRAVERSAL(users, friends, "users/john", "outbound", {minDepth: 2}) RETURN t.vertex._key

This query runs a path traversal over the vertex collection 'users' and the edge collection 'friends', starting at the vertex "users/john". From John, it follows outgoing (outbound) edges and returns the key of each vertex it reaches at a depth of at least 2, since minDepth sets a lower bound on the depth. The intent is to list John's friends of friends who are not directly his own friends. As this query shows, ArangoDB has native support for graphs. It is a much simpler way to express a computation that a naïve algorithm might perform in O(|V|^3) time: such an algorithm would examine every pair of vertices (u, w) and, for each pair, check every other vertex v for edges u → v and v → w, giving |V|^2 pairs times |V| checks. Of course, MongoDB also supports querying such data faster than the naïve approach; however, AQL's built-in integration with graph data gives it an advantage. Writing larger queries in JavaScript gives users much more control over how data is used and accessed, because a programming language offers more freedom of implementation. Support for both kinds of queries, programmatic and declarative (as in the example above), makes AQL more suitable for data access than many other query languages, including SQL. AQL already implements the best parts of SQL, such as joins, aggregate functions, list iteration, result projection, and set operations. Hence, AQL is a very powerful tool for data access in ArangoDB.

The most important aspect of any software is how users interact with it. For instance, Hadoop clusters have a shell environment that gives access to all parts of the system through commands. Different systems offer different kinds of interactivity, depending on what the application requires. ArangoDB has a shell environment, but also a secondary web interface. The shell gives users interactivity through command-line instructions; as with most terminal applications, command-line tools can be very powerful and are especially useful for large ad-hoc queries. The web interface, however, displays information more visually. This is especially useful for ArangoDB, because graph data is best displayed visually rather than descriptively. For instance, this is an image of a graph stored in ArangoDB:

[Screenshot: a graph stored in ArangoDB, displayed in the web interface]

As the screenshot shows, the web interface is very interactive, which makes it more useful than the command-line interface for viewing data. Many other systems also provide a web interface hosted on a localhost port, but it usually displays little that differs from what is shown in a shell. For instance, as mentioned above, Hadoop clusters have a shell interface. They also have a web interface, but it only displays a summary of the work on the cluster; because the same data is available through a shell command, the web interface is largely redundant. ArangoDB, by contrast, makes much better use of its web interface. A similar comparison can be made with other database systems, such as MongoDB and CouchDB.

ArangoDB also has an application framework, called Foxx, that allows the database to be controlled through programming languages. Foxx extends the database by letting users expose APIs for use in other applications. For example, mobile apps for Android and iOS can use ArangoDB as the backbone for storing their data, and users gain more interactivity and control by accessing the database through such apps. This is a major advantage that ArangoDB has over MongoDB and CouchDB. Because ArangoDB was designed very recently, such features have been part of the system from the start and do not require second-hand solutions.

Reflection and Concluding Remarks

Despite its many advantages over other systems, ArangoDB is not as widely used. This is partly because it has not been available to end users for long. Most companies and corporations also prefer established business solutions to something that has not matured for as long, and hence use MongoDB and CouchDB as their main solution. Although ArangoDB has some distinct features that make it better than other systems, most competing systems can replicate those features with second-hand solutions. Although these solutions are not as reliable, the systems they run on are, and this outweighs their disadvantages. ArangoDB may prove to be better software over time, as its distinct features begin to overshadow other systems.

As the performance benchmarking results showed, ArangoDB performs on par with MongoDB. For many uses, this is the most important statistic about the system, making ArangoDB an easy alternative. However, for people who already have a lot of data stored in MongoDB, migrating from one system to the other is almost impossible. Hence, I believe that ArangoDB should target new users and new companies as its audience.

ArangoDB has many compelling attributes, and it surpasses MongoDB in some of them. The most important of these is timing: because ArangoDB was developed recently, it uses new technology and is built on concepts and resources available today. As discussed earlier, it is also designed to meet the needs of companies today, and it offers broader support, for example for Android and iOS. These things may also be available on other systems, but they are not tailored to those systems the way they are in ArangoDB.

New systems always take time to catch on, especially technology like this. Most companies have dedicated a lot of resources to a particular system, so ArangoDB is not aimed at them. However, I believe that ArangoDB will see better adoption in the future, especially among new users. Although it has not yet been widely successful, some users are very dedicated to the system. This is evident from its Twitter page, which has 415 followers; although that is not many, some tweets show appreciation for the system. Even though it takes time, ArangoDB may become more noticed in the future and may replace database systems that prove to be much slower than it, such as CouchDB. What makes ArangoDB stand out is that it is new, and this may be the main reason it catches on as a more widespread solution.

References

"ArangoDB - the Multi-purpose NoSQL DB." ArangoDB. N.p., n.d. Web. 11 Dec. 2013. <www.arangodb.com>
"ArangoDB - the Multi-purpose NoSQL Database." ArangoDB. N.p., n.d. Web. 11 Dec. 2013. <www.arangodb.com/blog>
"ArangoDB Admin Interface." ArangoDB. N.p., n.d. Web. 11 Dec. 2013. <http://www.arangodb.org/admin-interface>
"ArangoDB Vs. MongoDB." ArangoDB vs. MongoDB Comparison. N.p., n.d. Web. 11 Dec. 2013. <http://vschart.com/compare/arangodb/vs/mongodb>
"ArangoDB 1.4.3." ArangoDB Free Download. N.p., n.d. Web. 11 Dec. 2013. <http://www.softpedia.com/get/Internet/Servers/Other-Servers/ArangoDB.shtml>
"@ArangoDB - Twitter." ArangoDB on Twitter. N.p., n.d. Web. 11 Dec. 2013. <http://twitter.com/arangodb>
"Comparing ArangoDB with CouchDB and MongoDB." ArangoDB. N.p., n.d. Web. 11 Dec. 2013. <http://www.arangodb.org/2012/11/13/comparing-arangodb-with-mongodb-andcouchdb>
"FAQs." ArangoDB. N.p., n.d. Web. 11 Dec. 2013. <www.arangodb.com/faqs>