
IWBIS 2016
978-1-5090-3477-2/16/$31.00 © 2016 IEEE

Enhancing Query Performance of Library Information Systems using NoSQL DBMS:
Case Study on Library Information Systems of Universitas Indonesia
Hermansyah, Yova Ruldeviyani and Rizal Fathoni Aji
Faculty of Computer Science, Universitas Indonesia
hermansyah@gmail.com, yova@cs.ui.ac.id, rizal@cs.ui.ac.id

Abstract—Library Automation and Digital Archive (Lontar) is a library information system developed by Universitas Indonesia and used by its main library. The rapid growth of library collections will soon make the query performance of the current SQL DBMS, MySQL, too slow to satisfy users, so it needs to be complemented by a NoSQL database, an emerging technology developed specifically for managing big data. The goal of this research is to implement and analyze the use of a NoSQL database to improve the query performance of Lontar. MongoDB is selected as the NoSQL DBMS, and the results show that MongoDB is significantly faster than MySQL.

Keywords—NoSQL Database, Digital Library, DBMS, MySQL, MongoDB

I. INTRODUCTION

Lontar (Library Automation and Digital Archive) is a library information system developed by Universitas Indonesia (UI) and used not only by Universitas Indonesia but also by several other universities in Indonesia. Initially developed in 2004 and continuously extended since then, Lontar is accessed by both members of UI and external visitors. Lontar is used to search, manage, store, and analyze library collections including books, magazines, theses, modules, journals, etc.

Lontar has 450,000 entries in the Collection database [1] with a size of 2 TB, and these figures keep growing each year. Lontar is developed in Java, and Hibernate is used for the database connection. Hibernate can connect to several DBMSs, but the DBMS currently used by UI is MySQL. Given the current size of the database and its growth, there is a need to improve the query performance of Lontar.

An emerging technology for handling huge databases is Big Data, where a NoSQL DBMS is used to manage the database. In this paper, we design and implement part of the Lontar database using MongoDB and compare its query performance with MySQL. We limit the scope of the query performance evaluation to the Collection database, since it is one of the most important databases in a library information system and is used by other modules such as the Circulation, Cataloging, and Acquisition modules.

The structure of the rest of this paper is as follows: in Section 2, we briefly explain Big Data, NoSQL, and MongoDB. In Section 3, we describe the Lontar system. The development and implementation of this research are discussed in Section 4. The result of the query performance comparison is presented in Section 5. Section 6 contains the conclusions of our study.

II. LITERATURE REVIEW

In this section, we briefly explain Big Data, NoSQL, MongoDB, and the architecture of Lontar.

A. Big Data

Nowadays, data has become really important for organizations, and this leads to a fast increase in the amount of data produced by many sources [2]. Big Data is defined as a condition that needs advanced technologies to capture, store, distribute, manage, and analyze large amounts of data because of their increasing volume, velocity, and veracity [3]. Moreover, Big Data is also a concept for managing large numbers of data sets which vary and have a complex structure for storing, analyzing, and visualizing the data itself [2].

B. NoSQL

In general, NoSQL is a concept of data management in a non-relational database which emphasizes a schema-less oriented database [4][5][6][7]. NoSQL does not support the join process, which results in a more efficient data manipulation process. Popular NoSQL DBMSs are MongoDB, RavenDB, and CouchDB [6].

NoSQL offers several data models, and one of them is the document database. A document database uses the concept of documents to represent the entities stored in the DBMS. Each document has its own fields and related values. Documents which have the same characteristics are collected in a single collection. A document database can be represented in JSON, XML, BSON, etc. Each collection may contain documents with a different number of fields. This cannot be done in a relational database, since the data in a table need to have the same number of columns.
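As an illustration of this flexibility, the following sketch (our own example, not taken from Lontar) inserts two documents with different field sets into one MongoDB collection using the legacy Java driver; the database, collection, and field names are hypothetical.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class SchemalessExample {
    public static void main(String[] args) {
        // Connection details are assumptions for illustration only.
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection books = mongo.getDB("exampleDb").getCollection("books");

        // Two documents in the same collection with different field sets;
        // a relational table would require identical columns for both rows.
        books.insert(new BasicDBObject("title", "Big Data")
                .append("author", "Hermansyah")
                .append("year", 2016));
        books.insert(new BasicDBObject("title", "Library Catalog Handbook")
                .append("publisher", "Example Press")); // no author/year fields

        mongo.close();
    }
}
```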


C. MongoDB

MongoDB is an example of a DBMS for document databases. MongoDB is a NoSQL DBMS developed by the company 10gen [8]. MongoDB is also an open source DBMS which uses BSON as its data storage format [9]. BSON is Binary JSON, whose format is mostly the same as JSON [9]. One of the differences between BSON and JSON lies in the data types: BSON has richer data types than JSON because BSON also supports a Date data type [8].

BSON represents data as fields and values [8]. One field name has one corresponding value, so querying a value is done by using the field name. A document may have several fields and values, and several documents may form a collection. MongoDB emphasizes a denormalized database, so users need to refrain from using too many collections. By using only a few collections, few relationships are needed for a query. There are two types of modelling collections in MongoDB, as follows [10]:
1) Embedded Documents
This modelling type focuses on embedding one or more documents in another corresponding document. For example, a book_collection document must have at least one corresponding book copy document. Therefore, the book copy document is embedded in the book_collection document ("collection" in book_collection refers to a group of books).

2) Referenced Documents
For documents which cannot be embedded in any other document, referenced documents are used to relate documents. The concept of a referenced document is essentially the same as the concept of a relationship in a relational database. Referenced documents are divided into two types, manual references and DBRefs. A manual reference is made by saving the _id or primary key of another document and using it to relate to that document. DBRefs are almost the same as manual references, but with DBRefs the user is able to relate documents from different databases.
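The following sketch (again illustrative, with hypothetical collection and field names) contrasts the two modelling types using the legacy MongoDB Java driver: a copy embedded inside a book_collection document versus a manual reference that stores only the _id of a document kept in another collection.

```java
import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.MongoClient;
import org.bson.types.ObjectId;

public class ModellingTypesExample {
    public static void main(String[] args) {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DB db = mongo.getDB("exampleDb");

        // 1) Embedded document: the copy lives inside its book_collection document.
        BasicDBList copies = new BasicDBList();
        copies.add(new BasicDBObject("barcode", "C-001").append("status", "AVAILABLE"));
        db.getCollection("book_collection").insert(
                new BasicDBObject("title", "Big Data").append("copies", copies));

        // 2) Manual reference: store only the _id of a document from another collection.
        ObjectId typeId = new ObjectId();
        db.getCollection("collection_type").insert(
                new BasicDBObject("_id", typeId).append("name", "Book"));
        db.getCollection("book_collection").insert(
                new BasicDBObject("title", "Library Catalog Handbook")
                        .append("collection_type_id", typeId)); // resolved with a second query

        mongo.close();
    }
}
```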
D. Library Automation and Digital Archive (Lontar)

Lontar has several modules, such as the Circulation module, the Acquisition module, and the Cataloging module. The Cataloging module is the main module for managing the collection data owned by a library. The Cataloging module has several functions, including insert, search, and delete.

A user's request is sent to the DBMS using Hibernate as the framework and relational mapper [11]. Hibernate then accesses MySQL and maps relational data into object data and vice versa. The mapping is done so that Lontar is able to read the data in relational form. Figure 1 shows the architecture of the Cataloging module in Lontar.

Figure 1. Architecture of Cataloging Module in Lontar

III. SYSTEM AND DATA ANALYSIS

Basically, the Cataloging module uses JSP (Java Server Pages) as the web programming technology and MySQL as the DBMS [11]. Two frameworks support it, Struts and Hibernate [11]. Struts is used to build the user interface, and Hibernate is used to map relational data into object data which is then used by JSP. The relational schema of the Cataloging module can be seen in Figure 2.

Figure 2. Relational Schema of Cataloging Module

The arrows show relationships between two tables. In general, the Collection_Type table holds all collection types of every collection in the library, and each collection type has some collections in the Collection table. A collection may have some digital files in the Digital_File table and copies in the Collection_Copy table (by default, each collection has at least one copy).

When a librarian edits, inserts, or deletes collections in Lontar, the identity of the librarian is stored in the inserted_by, edited_by, and deleted_by columns of the Collection_Copy table. Each row in the User table is part of a group of users in the Group_User table. The fields of each collection, such as "title", "author", "publisher", "year", etc., are stored in the Collection_Field table, and the value of each field is stored in the Collection_Data_Field table.


With this kind of schema, Lontar is able to store collections that have a different number of fields corresponding to each collection type.

As mentioned before, the Cataloging module is related to other modules in Lontar. Therefore, for an easier implementation of the non-relational database in the Cataloging module, MySQL is still used in some logic that accesses other modules besides the Cataloging module. For example, the Circulation module has a relationship with the Cataloging module in the process of book circulation. Figure 3 shows the updated architecture of Lontar combining the use of MySQL and MongoDB.

Figure 3. New Cataloging Module System Design

Searching in MySQL is not efficient because the fields of each collection are stored as rows in the Collection_Data_Field table. Consequently, the data in Collection_Data_Field grow and degrade the performance of the search process. This situation can be overcome by using MongoDB to store documents with a different number of fields in a single collection. MongoDB is used to optimize the Collection_Data_Field table in MySQL by turning the Collection_Data_Field rows into field-documents in a MongoDB collection. A field-document from Collection_Data_Field is stored as a "tag" represented by the code of a field, and each tag has its value. For example, the row "author" in the Collection_Data_Field table is stored as a "600" field in the Collection collection in MongoDB, and "Hermansyah" is stored as the value of field "600" (we use the term field-document to distinguish it from a field owned by a collection, such as the "author" of a book).

The same approach is also applied to the Collection_Copy table: there are field-documents specifically for the Collection_Copy table. The embedded document method is also implemented for easier searching in some collections; for example, Collection_Copy and Digital_File are embedded into Collection. Referenced documents are also used for several collections, such as the Collection and Collection_Type collections.
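A minimal sketch of how a Collection_Data_Field row could become a field-document with embedded copies, following the "600"/"Hermansyah" example above; the database name, the second tag code, and the copy fields are our assumptions and are not taken from the Lontar source.

```java
import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class FieldDocumentExample {
    public static void main(String[] args) {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection collection = mongo.getDB("lontar").getCollection("Collection");

        // Copies are embedded instead of being kept as Collection_Copy rows.
        BasicDBList copies = new BasicDBList();
        copies.add(new BasicDBObject("copy_id", 1).append("status", "AVAILABLE"));
        copies.add(new BasicDBObject("copy_id", 2).append("status", "AVAILABLE"));

        // Rows of Collection_Data_Field become tag fields of a single document:
        // the row ("author", "Hermansyah") is stored as field "600".
        BasicDBObject doc = new BasicDBObject("collection_id", 12345) // hypothetical id
                .append("600", "Hermansyah")                          // author tag (from the paper)
                .append("650", "NoSQL databases")                     // hypothetical second tag
                .append("copies", copies);

        collection.insert(doc);
        mongo.close();
    }
}
```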
As can be seen in the MongoDB schema (Figure 4), the Collection_Data_Field collection no longer exists, because the Collection collection already holds the field-documents and their data. Furthermore, there are some embedded documents and manual references. Manual references are made through reference keys between collections.

The purpose of designing the MongoDB schema this way is to eliminate the join processes which affect the query time. In the search function, at least 6 joins are used to retrieve a document, which slows down the response of the query. The delete and insert functions each use at least 3 joins, which also affects the response time of each delete and insert.

Figure 4. MongoDB Schema

IV. PROTOTYPE DEVELOPMENT AND IMPLEMENTATION

A notebook with an i5-4200U processor and 4 GB of RAM is used to build the prototype. Several software packages are also used, namely MySQL 5.7.9, XAMPP Control Panel 2, Apache 2.4.10, phpMyAdmin 4.2.7.1, and MongoDB 2.6.4. Prototype implementation and development are done in several steps: data preparation, data migration, and code modification.

A. Data Preparation

Before migrating data from MySQL to MongoDB, data quality checking has to be done first. Data quality checking is done in two steps: data integration checking (such as checking for orphan values) and data duplication checking.

B. Data Migration

The data migration process is done in several steps: exporting the data in MySQL to JSON format, importing the JSON into MongoDB, and embedding collections based on the MongoDB schema.
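The migration scripts themselves are not included in the paper; the sketch below only illustrates the described flow under our own assumptions (reading the Collection_Data_Field rows over JDBC, folding them into one document per collection, and inserting the documents into MongoDB). The column names and connection settings are hypothetical.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MigrationSketch {
    public static void main(String[] args) throws Exception {
        Connection mysql = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/lontar", "user", "password"); // assumed credentials
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection target = mongo.getDB("lontar").getCollection("Collection");

        // Read every field row and fold it into one document per collection_id.
        Statement st = mysql.createStatement();
        ResultSet rs = st.executeQuery(
                "SELECT collection_id, tag, value FROM Collection_Data_Field ORDER BY collection_id");
        BasicDBObject doc = null;
        int currentId = -1;
        while (rs.next()) {
            int id = rs.getInt("collection_id");
            if (id != currentId) {                  // new collection: flush the previous document
                if (doc != null) target.insert(doc);
                doc = new BasicDBObject("collection_id", id);
                currentId = id;
            }
            doc.append(rs.getString("tag"), rs.getString("value")); // e.g. "600" -> "Hermansyah"
        }
        if (doc != null) target.insert(doc);

        rs.close();
        st.close();
        mysql.close();
        mongo.close();
    }
}
```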

C. Code Modification

After the data has been migrated to MongoDB, the next step is to modify the code of the Cataloging module to use MongoDB syntax. The modification is done for the Insert, Delete, and Search functions, since MongoDB has a different syntax for retrieving data.
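As an example of the kind of modification involved (a sketch under our own assumptions, not the actual Lontar code), a search that previously required several joined tables can be expressed as a single find() on the Collection collection with the legacy Java driver; the field code used for the filter and the search term are illustrative.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

import java.util.regex.Pattern;

public class ModifiedSearchExample {
    public static void main(String[] args) {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection collection = mongo.getDB("lontar").getCollection("Collection");

        // Case-insensitive match on the "600" (author) field-document replaces a multi-join SQL query.
        BasicDBObject query = new BasicDBObject("600",
                Pattern.compile("hermansyah", Pattern.CASE_INSENSITIVE));
        DBCursor cursor = collection.find(query);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next()); // embedded copies come back with the same document
            }
        } finally {
            cursor.close();
        }
        mongo.close();
    }
}
```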

V. TESTING AND EVALUATION

In the testing and evaluation, the two systems (MySQL and MongoDB) are compared. Testing is done in several scenarios, and each scenario records the time the system needs to perform the query. The test results are then compared and analyzed to draw conclusions about the query performance of both systems.

A. Testing Step

The first step is to add a block of code which saves the query time needed to respond to the user's request for each system. This block of code is added to each class that accesses the DBMS, for every system and function (Insert, Delete, and Search). The query time is used as the comparison parameter to evaluate the performance of each system in responding to the user's request. Figure 5 shows the block of code used to save the query time.

Figure 5. Block Code to save query time
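Figure 5 itself is not reproduced in this text; the fragment below is only a sketch of what such a timing block might look like, assuming the measurement simply brackets the DBMS call with timestamps and logs the elapsed time.

```java
// Sketch of a timing block in the spirit of Figure 5 (assumed form, not the original code).
long start = System.nanoTime();

// ... call to the DBMS goes here (Hibernate query or MongoDB driver call) ...

long elapsedMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("Query time: " + elapsedMs + " ms"); // recorded for each scenario
```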


The second step is to define a comparison scenario for each function. The Search function is tested based on the number of matched collections in each scenario. Matched collections are the collections returned by the search function. For example, if a user searches for books titled "Big Data", the system shows the number of collections matched by the keyword input. The scenarios range from one matched collection to 79,788 matched collections. Delete and Insert function testing is done by deleting or inserting from one collection to 100 collections.

B. Search Function Testing Result

Testing is done from one matched collection to 79,788 matched collections. Each matched-collection scenario is tested ten times, and the average query time is calculated. Table 1, Figure 6, and Figure 7 show the results of the Search function testing.

Table 1. Testing Result of Search function

Figure 6. Query Time Comparison for Search function with 1, 6, 15, and 98 matched collections

Figure 7. Query Time Comparison for Search function with 995, 15,166, and 79,788 matched collections

C. Insert Function Testing Result

Insert function testing is done by inserting from one to 100 collections. Each scenario records the query time needed and is run 10 times, and the average of the 10 runs is calculated. Table 2 and Figure 8 show the results of the Insert function testing.

Table 2. Testing Result of Insert function

Figure 8. Query Time Comparison for Insert function


D. Delete Function Testing Result

Delete function testing is also done from one deleted collection to 100 deleted collections. Like the other tests, each scenario is run 10 times and the times are averaged. Deleting a collection also automatically deletes all copies that the collection has; e.g., if Collection A has 2 copies and we delete Collection A, the 2 copies are also deleted automatically from the database. Table 3 and Figure 9 show the results of the Delete function testing.

Table 3. Testing Result of Delete function

Figure 9. Query Time Comparison for Delete function

E. Testing Result Evaluation

Evaluation is done for each tested function. The search testing shows that for fewer than 15 matched collections, MySQL has a faster query time than MongoDB. The reason is that MySQL is supported by the Hibernate framework, which retrieves the data directly into a list. This cannot be done in MongoDB, which has to use the driver provided by MongoDB. The driver cannot map the collection directly into a list; it has to use the provided API to save the data into a DBCursor. Then, every entry of the DBCursor has to be saved into a Collection object using the hasNext() function. In MongoDB, calling hasNext() can actually be slower than MySQL when only a few collections are matched. However, if a significant number of collections are matched, the accumulated cost of calling hasNext() does not significantly affect the query time. On the other hand, accessing a lot of matched collections in MySQL can be very slow because it has to access several tables, especially the Collection_Data_Field table, which has a huge number of rows. Furthermore, the joins in MySQL add to the accumulated cost when a significant number of collections are matched. Although the MySQL query time is faster than MongoDB for fewer than 15 matched collections, the time difference between MySQL and MongoDB is not significant: for one matched collection the difference is just 0.37 seconds, and 0.175 seconds for six matched collections. For 15 or more matched collections, MongoDB has a faster query time than MySQL. For 79,788 matched collections, the difference between MySQL and MongoDB is 3,199.51 seconds, or almost 54 minutes.
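The following sketch illustrates the mapping step described above, assuming a simplified stand-in for Lontar's Collection entity (the real classes are not shown in the paper); each DBObject returned by the DBCursor is copied into a Java object through hasNext() and next().

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

import java.util.ArrayList;
import java.util.List;

public class SearchResultMapper {

    /** Minimal stand-in for Lontar's Collection entity (hypothetical fields). */
    public static class CollectionEntity {
        public Object id;
        public String author;
    }

    /** Iterates the DBCursor with hasNext()/next() and fills a list manually. */
    public static List<CollectionEntity> toList(DBCollection coll, BasicDBObject query) {
        List<CollectionEntity> result = new ArrayList<CollectionEntity>();
        DBCursor cursor = coll.find(query);
        try {
            while (cursor.hasNext()) {              // per-document cost discussed in the evaluation
                DBObject doc = cursor.next();
                CollectionEntity e = new CollectionEntity();
                e.id = doc.get("_id");
                e.author = (String) doc.get("600"); // author field-document, as in Section III
                result.add(e);
            }
        } finally {
            cursor.close();
        }
        return result;
    }
}
```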
MongoDB also has another benefit compared to MySQL in terms of the growth of query time from 1 to 995 matched collections. In that range, the query time of MongoDB increases more slowly than that of MySQL. The increase in MongoDB's query time is less than two times, whereas in MySQL it is one to seven times. For example, the increase in query time from 98 matched collections to 995 matched collections is less than 2 times in MongoDB (an increase of 1.96 times, from 3.6666 seconds to 6.3966 seconds), but in MySQL the increase reaches around 7 times (an increase of 7.2 times, from 7.4879 seconds to 53.8038 seconds).

The next evaluation is for the Delete function. In general, the query time for the Delete function in MongoDB is faster than in MySQL. However, in MongoDB the query time grows faster than in MySQL: for example, in MongoDB the query time for deleting a collection grows by around 2.21 times from 10 copies to 100 copies, but in MySQL the increase for the same scenario is 1.62 times.

Deleting a collection with its copies changes the status of the copies to "DELETED". Because the copy documents are embedded in the Collection collection, the new system has to update the status field of the embedded copy documents by accessing the Collection collection first and using the $set operator. In MySQL, changing the status of a copy to "DELETED" can be done by directly updating the status column in the Collection_Copy table.
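A sketch of the two delete paths described above: the MongoDB update shown is an assumption of how the status of one embedded copy could be flagged with $set in MongoDB 2.6 (using the positional operator), and the SQL statement mirrors the direct update on Collection_Copy. All field and column names are illustrative.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

import java.sql.Connection;
import java.sql.PreparedStatement;

public class DeleteStatusExample {

    /** MongoDB: mark one embedded copy as DELETED via the Collection document. */
    public static void markCopyDeletedMongo(DBCollection collection, int collectionId, int copyId) {
        BasicDBObject query = new BasicDBObject("collection_id", collectionId)
                .append("copies.copy_id", copyId);
        // The positional operator updates the matched element of the embedded "copies" array.
        BasicDBObject update = new BasicDBObject("$set",
                new BasicDBObject("copies.$.status", "DELETED"));
        collection.update(query, update);
    }

    /** MySQL: the same change is a direct update on the Collection_Copy table. */
    public static void markCopyDeletedMySql(Connection mysql, int copyId) throws Exception {
        PreparedStatement ps = mysql.prepareStatement(
                "UPDATE Collection_Copy SET status = 'DELETED' WHERE copy_id = ?"); // column names assumed
        ps.setInt(1, copyId);
        ps.executeUpdate();
        ps.close();
    }
}
```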
The next evaluation is for the Insert function. In general, MongoDB is faster than MySQL, and the query time in MongoDB also grows more slowly than in MySQL. This is because the inserting process in MySQL has to add rows to several tables, including the Collection, Collection_Data_Field, and Collection_Copy tables. Moreover, the more copies that are inserted, the more rows are inserted into the Collection_Data_Field table, which affects the query time and performance. In MongoDB, the inserting process only inserts a Collection document, with the Collection_Copy documents embedded, into the Collection collection.


VI. CONCLUSION

The Cataloging module of Lontar has been implemented successfully using MongoDB. Based on the testing results, MongoDB has a better performance than MySQL. This is generally because MySQL has to access the Collection_Data_Field table, which has a huge number of rows, and perform several joins between tables. On the other hand, accessing the Collection_Data_Field table is no longer required in MongoDB, since the Collection_Data_Field data is stored as field-documents in MongoDB and several joins have been replaced by embedded documents.

For scenarios with fewer than 15 matched collections, MySQL has a faster query time than MongoDB. This is because MongoDB has to iterate over a DBCursor using hasNext(), which is costly. Although MongoDB has a slower query time in those scenarios, the difference is not significant. For 15 or more matched collections, MongoDB has a better performance and a faster query time. For 79,788 matched collections, the difference between MySQL and MongoDB is 3,199.51 seconds, or almost 54 minutes.

REFERENCES

[1] M. H. Virdhani, "Intip Perpustakaan UI yang Baru Yuk!," 24 June 2011. [Online]. Available: http://news.okezone.com/read/2011/06/24/373/472323/intip-perpustakaan-ui-yang-baru-yuk. [Accessed 26 December 2014].
[2] S. Sagiroglu and D. Sinanc, "Big Data: A Review," in 2013 International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, 2013.
[3] K. Ebner, T. Buhnen and N. Urbach, "Think Big with Big Data: Identifying Suitable Big Data Strategies in Corporate Environments," in 2014 47th Hawaii International Conference on System Science, Hawaii, 2014.
[4] Y. Li and S. Manoharan, "A Performance Comparison of SQL and NoSQL Databases," in 2013 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Victoria, BC, 2013.
[5] B. G. Tudorica and C. Bucur, "A Comparison Between Several NoSQL Databases with Comments and Notes," in 2011 10th RoEduNet International Conference (RoEduNet), Iasi, 2011.
[6] K. Kaur and R. Rani, "Modelling and Querying Data in NoSQL Databases," in 2013 IEEE International Conference on Big Data, Silicon Valley, CA, 2013.
[7] J. Han, H. E, G. Le and J. Du, "Survey on NoSQL Database," in 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), Port Elizabeth, 2011.
[8] R. Arora and R. R. Aggarwal, "Modeling and Querying Data in MongoDB," International Journal of Scientific & Engineering Research, vol. 4, no. 7, pp. 141-144, 2013.
[9] G. Zhao, W. Huang, S. Liang and Y. Tang, "Modelling MongoDB with Relational Model," in 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies (EIDWT), Xi'an, 2013.
[10] MongoDB Developers, MongoDB Release Notes 2.6.4, USA: MongoDB, Inc., 2014.
[11] Lontar UI Developers, Developer Guide for Library Automation and Digital Archive Universitas Indonesia (Lontar UI v3.0), Depok: Universitas Indonesia, 2006.

