978-1-5090-3477-2/16/$31.00
Abstract—Library Automation and Digital Archive (Lontar) is a library information system developed by Universitas Indonesia and used by its main library. With the library's collections growing rapidly, the query performance of the current SQL DBMS, MySQL, will soon be too slow to satisfy users and will need to be complemented by a NoSQL database, an emerging technology developed specifically for managing big data. The goal of this research is to implement and analyze the use of a NoSQL database to improve the query performance of Lontar. MongoDB is selected as the NoSQL DBMS, and the results show that MongoDB is significantly faster than MySQL in most of the tested scenarios.

Keywords—NoSQL Database, Digital Library, DBMS, MySQL, MongoDB

I. INTRODUCTION

Lontar (Library Automation and Digital Archive) is a library information system developed by Universitas Indonesia (UI) and used not only by UI but also by several other universities in Indonesia. Initially developed in 2004 and continuously developed since then, Lontar is accessed both by members of the UI community and by external visitors. It is used to search, manage, store, and analyze library collections, including books, magazines, theses, modules, journals, etc.

Lontar has 450,000 entries in the Collection database [1], with a size of 2 TB, and these figures keep growing each year. Lontar is developed in Java, with Hibernate used for the database connection. Hibernate can connect to several DBMSs, but the DBMS currently used by UI is MySQL. Given the current size of the database and its growth, there is a need to improve the query performance of Lontar.

An emerging approach to handling huge databases is Big Data, in which a NoSQL DBMS is used to manage the database. In this paper, we design and implement part of the Lontar database using MongoDB and compare its query performance with that of MySQL. We limit the scope of the query performance evaluation to the Collection database, since it is one of the most important databases in a library information system: it is used by other modules such as the Circulation, Cataloging, and Acquisition modules.

The rest of this paper is structured as follows. In Section 2, we briefly explain Big Data, NoSQL, and MongoDB. In Section 3, we describe the Lontar system. The development and implementation of this research are discussed in Section 4. The results of the query performance comparison are presented in Section 5. Section 6 contains the conclusions of our study.

II. LITERATURE REVIEW

In this section, we briefly explain Big Data, NoSQL, MongoDB, and the architecture of Lontar.

A. Big Data

Data has become very important for organizations, and as a result the amount of data produced by various sources is increasing rapidly [2]. Big Data is defined as a condition that requires advanced technologies to capture, store, distribute, manage, and analyze large amounts of data because of growth in volume, velocity, and veracity [3]. Moreover, Big Data is also a concept for managing large numbers of data sets that vary and that have a complex structure for storing, analyzing, and visualizing the data [2].

B. NoSQL

In general, NoSQL is a concept of data management in non-relational databases that emphasizes schema-less design [4] [5] [6] [7]. NoSQL databases generally do not support joins, which makes data manipulation more efficient. Popular NoSQL DBMSs include MongoDB, RavenDB, and CouchDB [6].

NoSQL has several data models, one of which is the document database. A document database uses documents to represent the entities stored in the DBMS. Each document has its own fields and associated values. Documents that share the same characteristics are grouped in a single collection. Documents can be represented in JSON, XML, BSON, etc. A collection may contain documents with different numbers of fields. This is not possible in a relational database, since every row in a table must have the same number of columns.
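As a minimal sketch of the schema flexibility of a document collection, the following Python snippet models a collection as a plain list of dictionaries and serializes each document as JSON. The field names, sample values, and helper functions are hypothetical and are not taken from the Lontar database.

```python
import json

# Hypothetical documents in one collection; the field names and values
# are illustrative assumptions, not actual Lontar data.
collection = [
    {"title": "Database Systems", "author": "A. Writer", "year": 2010},
    {"title": "Campus Magazine", "publisher": "UI Press",
     "year": 2012, "issue": 7, "language": "id"},
]


def field_counts(documents):
    """Return the number of fields in each document."""
    return [len(doc) for doc in documents]


def to_json_lines(documents):
    """Serialize each document independently, one JSON string per document."""
    return [json.dumps(doc, sort_keys=True) for doc in documents]


counts = field_counts(collection)
serialized = to_json_lines(collection)
```

Here the two documents carry three and five fields respectively, yet both sit in the same collection. A single relational table would need either nullable columns for the missing fields or a separate field/value table.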
IWBIS 2016 © 2016 IEEE
C. MongoDB

MongoDB is an example of a DBMS for document databases. It is a NoSQL DBMS developed by 10gen [8]. MongoDB is also an open-source DBMS that uses BSON as its data storage format [9]. BSON is a binary form of JSON whose format is mostly the same as JSON [9]. One difference between BSON and JSON lies in the data types: BSON offers a richer set of data types than JSON because it also supports a Date data type [8].

BSON represents data as fields and values [8]. Each field name has one corresponding value, so a value is queried by its field name. A document may have several pairs of fields and values, and a group of documents forms a collection. MongoDB emphasizes denormalized databases, so users should refrain from using too many collections. With only a few collections, only a few relationships are needed for a query. There are two ways of modelling collections in MongoDB [10]:

1) Embedded Documents
This modelling type embeds one or more documents in another, corresponding document. For example, a book_collection document must have at least one corresponding copy-of-book document. Therefore, the copy-of-book document is embedded in the book_collection document (“collection” in book_collection refers to a group of books).

2) Referenced Documents
For documents that cannot be embedded in any other document, referenced documents are used to relate documents. The concept of a referenced document is very similar to that of a relationship in a relational database. There are two types of reference: manual references and DBRefs. A manual reference saves the _id (the primary key) of another document and uses it to relate to that document. DBRefs are almost the same as manual references, but with DBRefs the user can relate documents from different databases.

D. Library Automation and Digital Archive (Lontar)

Lontar has several modules, such as the Circulation, Acquisition, and Cataloging modules. The Cataloging module is the main module for managing the collection data owned by a library. It has several functions, including insert, search, and delete.

A user's request is sent to the DBMS through Hibernate, which serves as the framework and object-relational mapper [11]. Hibernate then accesses MySQL and maps relational data into object data and vice versa. This mapping enables Lontar to read data stored in relational form. Figure 1 shows the architecture of the Cataloging module in Lontar.

Figure 1. Architecture of Cataloging Module in Lontar

III. SYSTEM AND DATA ANALYSIS

The Cataloging module uses JSP (JavaServer Pages) as its web programming technology and MySQL as its DBMS [11]. Two frameworks support it: Struts and Hibernate [11]. Struts is used to build the user interface, and Hibernate is used to map relational data into the object data used by JSP. The relational schema of the Cataloging module is shown in Figure 2.

Figure 2. Relational Schema of Cataloging Module

Each arrow shows a relationship between two tables. In general, the Collection_Type table holds the collection types of all collections in the library, and each collection type has some collections in the Collection table. A collection may have digital files in the Digital_File table and copies in the Collection_Copy table (by default, each collection has at least one copy).

When a librarian edits, inserts, or deletes collections in Lontar, the identity of the librarian is stored in the inserted_by, edited_by, and deleted_by columns of the Collection_Copy table. Each row in the User table belongs to a user group in the Group_User table. The fields of each collection, such as “title”, “author”, “publisher”, “year”, etc., are stored in the Collection_Field table. The value of each field is stored in the Collection_Data_Field table. With this kind of schema, Lontar can store collections that have different numbers of fields depending on their collection type.

As mentioned before, the Cataloging module is related to other modules in Lontar. Therefore, to ease the implementation of a non-relational database in the Cataloging module, MySQL is still used in the logic that accesses modules other than Cataloging. For example, the Circulation module has a relationship with the Cataloging module in the book circulation process. Figure 3 shows an updated architecture of Lontar that combines MySQL and MongoDB.

... anymore because the Collection collection already contains the field-document and its data. Furthermore, there are some embedded documents and manual references. Manual references are made through reference keys between collections.

The purpose of designing the MongoDB schema is to eliminate the joins that can affect query time. In the search function, at least 6 joins are used to retrieve a document, and these joins slow the response of the query. The delete and insert functions each use at least 3 joins, which also affects the response of each delete and insert made.
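To illustrate why replacing joins with embedded documents can shorten retrieval, the sketch below contrasts the two layouts using in-memory Python data. The table names follow the relational schema of the Cataloging module, but the column names (such as collection_id and field_id), the sample rows, and the helper functions are hypothetical. It is a simplified illustration and does not reproduce the schema actually used in Lontar.

```python
# Relational layout: one collection is spread over several tables.
# All rows below are hypothetical sample data.
collection_rows = [{"id": 1, "type_id": 10}]
collection_field_rows = [
    {"id": 100, "name": "title"},
    {"id": 101, "name": "author"},
]
collection_data_field_rows = [
    {"collection_id": 1, "field_id": 100, "value": "Database Systems"},
    {"collection_id": 1, "field_id": 101, "value": "A. Writer"},
]
collection_copy_rows = [
    {"collection_id": 1, "barcode": "C-001", "status": "AVAILABLE"},
    {"collection_id": 1, "barcode": "C-002", "status": "AVAILABLE"},
]


def load_relational(collection_id):
    """Assemble one collection by joining the four tables in Python."""
    field_names = {f["id"]: f["name"] for f in collection_field_rows}
    row = next(c for c in collection_rows if c["id"] == collection_id)
    doc = {"_id": row["id"], "type_id": row["type_id"]}
    # Join Collection_Data_Field to Collection_Field to recover field names.
    for data in collection_data_field_rows:
        if data["collection_id"] == collection_id:
            doc[field_names[data["field_id"]]] = data["value"]
    # Join Collection_Copy to attach the copies of this collection.
    doc["copies"] = [
        {"barcode": c["barcode"], "status": c["status"]}
        for c in collection_copy_rows
        if c["collection_id"] == collection_id
    ]
    return doc


# Document layout: the same data stored once, with the copies embedded.
document_store = {1: load_relational(1)}


def load_document(collection_id):
    """Retrieve one collection with a single key lookup and no joins."""
    return document_store[collection_id]
```

Both functions return the same data, but the document layout pays the assembly cost once at write time, whereas the relational layout repeats the joins on every read.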
D. Delete Function Testing Result

The Delete function test also ranges from one deleted collection to 100 deleted collections. As in the other tests, each scenario is run 10 times and the times are averaged. Deleting a collection also automatically deletes all copies of that collection; e.g., if Collection A has 2 copies and we delete Collection A, the 2 copies are also deleted automatically from the database. Table 3 and Figure 9 show the results of the Delete function test.

Table 3. Testing Result of Delete function

Figure 9. Query Time Comparison for Delete function

E. Testing Result Evaluation

Evaluation is done for each tested function. The search test shows that for fewer than 15 matched collections, MySQL has a faster query time than MongoDB. The reason is that MySQL is accessed through the Hibernate framework, which retrieves the data directly into a list. This cannot be done with MongoDB, which has to be accessed through the driver provided by MongoDB. The driver cannot map a collection directly into a list; instead, the provided API must be used to load the data into a DBCursor. Every entry of the DBCursor then has to be saved into a Collection object by iterating with the hasNext() function. In MongoDB, calling hasNext() can actually be slower than MySQL if only a few collections are matched. However, if a significant number of collections are matched, the accumulated cost of calling hasNext() does not noticeably affect the query time. On the other hand, accessing many matched collections in MySQL can be very slow because several tables have to be accessed, especially the Collection_Data_Field table, which has many rows. Furthermore, joins in MySQL add to the accumulated cost when a significant number of collections are matched.

Although MySQL is faster than MongoDB for fewer than 15 matched collections, the time difference is not significant: for one matched collection the difference is just 0.37 seconds, and for six matched collections it is 0.175 seconds. For 15 or more matched collections, MongoDB has a faster query time than MySQL. For 79,788 matched collections, the difference between MySQL and MongoDB is 3,199.51 seconds, or about 53 minutes.

MongoDB has another benefit over MySQL in how query time grows from 1 to 995 matched collections. Over that range, MongoDB's query time grows more slowly than MySQL's: it increases by less than a factor of two, whereas in MySQL it increases by a factor of one to seven. For example, from 98 to 995 matched collections, the query time in MongoDB increases by a factor of 1.96 (from about 3.3 seconds to 6.3966 seconds), whereas in MySQL it increases by a factor of about 7.2 (from 7.4879 seconds to 53.8038 seconds).

The next evaluation is for the Delete function. In general, the query time for the Delete function is faster in MongoDB than in MySQL. However, the query time grows faster in MongoDB than in MySQL. For example, going from deleting a collection with 10 copies to one with 100 copies, the query time increases by a factor of about 2.21 in MongoDB but by a factor of 1.62 in MySQL.

Deleting a collection together with its copies changes the status of the copies to “DELETED”. Because a copy document is embedded in a document of the Collection collection, the new system has to access the Collection collection first and then update the status field in the embedded copy document with the $set operator. In MySQL, the status of a copy can be changed to “DELETED” simply by updating the status column in the Collection_Copy table directly.

The next evaluation is for the Insert function. In general, MongoDB is faster than MySQL, and its query time also grows more slowly. This is because an insert in MySQL has to add rows to several tables, including the Collection, Collection_Data_Field, and Collection_Copy tables. Moreover, the more copies are inserted, the more rows are added to the Collection_Data_Field table, which affects query time. In MongoDB, an insert only requires inserting one Collection document, with its embedded Collection_Copy documents, into the Collection collection.

VI. CONCLUSION

The Cataloging module of Lontar has been implemented successfully using MongoDB. Based on the test results, MongoDB performs better than MySQL. This is generally because MySQL has to access the Collection_Data_Field table, which has many rows, and perform several joins between tables. In MongoDB, on the other hand, accessing a Collection_Data_Field table is no longer required, since the Collection_Data_Field data is stored as a field-document and several joins have been replaced by embedded documents.

For scenarios with fewer than 15 matched collections, MySQL has a faster query time than MongoDB. This is because MongoDB has to traverse a DBCursor using hasNext(), which is costly. Although MongoDB is slower in those scenarios, the difference is not significant. For 15 or more matched collections, MongoDB has better performance and a faster query time. For 79,788 matched collections, the difference between MySQL and MongoDB is 3,199.51 seconds, or about 53 minutes.

REFERENCES

[1] M. H. Virdhani, “Intip Perpustakaan UI yang Baru Yuk!,” 24 June 2011. [Online]. Available: http://news.okezone.com/read/2011/06/24/373/472323/intip-perpustakaan-ui-yang-baru-yuk. [Accessed 26 December 2014].
[2] S. Sagiroglu and D. Sinanc, “Big Data: A Review,” in 2013 International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, 2013.
[3] K. Ebner, T. Buhnen and N. Urbach, “Think Big with Big Data: Identifying Suitable Big Data Strategies in Corporate Environments,” in 2014 47th Hawaii International Conference on System Science, Hawaii, 2014.
[4] Y. Li and S. Manoharan, “A Performance Comparison of SQL and NoSQL Databases,” in 2013 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Victoria, BC, 2013.
[5] B. G. Tudorica and C. Bucur, “A Comparison Between Several NoSQL Databases with Comments and Notes,” in 2011 10th Roedunet International Conference (RoEduNet), Iasi, 2011.
[6] K. Kaur and R. Rani, “Modelling and Querying Data in NoSQL Databases,” in 2013 IEEE International Conference on Big Data, Silicon Valley, CA, 2013.
[7] J. Han, H. E, G. Le and J. Du, “Survey on NoSQL Database,” in 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), Port Elizabeth, 2011.
[8] R. Arora and R. R. Anggarwal, “Modeling and Querying Data in MongoDB,” International Journal of Scientific & Engineering Research, vol. 4, no. 7, pp. 141-144, 2013.
[9] G. Zhao, W. Huang, S. Liang and Y. Tang, “Modelling MongoDB with Relational Model,” in 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies (EIDWT), Xi'an, 2013.
[10] MongoDB Developers, MongoDB Release Notes 2.6.4, USA: MongoDB, Inc, 2014.
[11] Lontar UI Developers, Developer Guide for Library Automation and Digital Archive Universitas Indonesia (Lontar UI v3.0), Depok: Universitas Indonesia, 2006.
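The growth factors and the time difference quoted in the evaluation can be re-derived from the reported timings. The following sketch does this arithmetic in Python. The function and variable names are hypothetical, and only figures stated in the evaluation are used as inputs.

```python
def growth_factor(time_before, time_after):
    """Ratio by which a query time grows between two scenarios."""
    return time_after / time_before


def seconds_to_minutes(seconds):
    """Convert a duration in seconds to minutes."""
    return seconds / 60.0


# MySQL search time when going from 98 to 995 matched collections.
mysql_factor = growth_factor(7.4879, 53.8038)

# For MongoDB the evaluation states the end time and the growth factor,
# so the start time is back-computed here rather than taken as reported.
mongodb_start = 6.3966 / 1.96

# Search-time difference at 79,788 matched collections, in minutes.
difference_minutes = seconds_to_minutes(3199.51)
```

Under these inputs the MySQL growth factor comes out at roughly 7.2, the back-computed MongoDB start time at roughly 3.3 seconds, and the difference at roughly 53.3 minutes.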