
An Open Source NoSQL solution for

Internet Access Logs Analysis


A practical case of why, what and how to use a NoSQL Database
Management System instead of a relational one
José Manuel Ciges Regueiro <jmanuel@ciges.net> - Student of the V Master on
Free Software Projects Development and Management 2011-2012
Copyright (cc) 2012 José Manuel Ciges Regueiro. Some rights reserved. This work is free and is licensed under the
conditions of the Creative Commons Attribution - Share Alike 3.0 Unported license. You can use, distribute and reuse this
work if the same license is applied and the author is quoted.
The full text of the license can be read at http://creativecommons.org/licenses/by-sa/3.0/deed.en
The author's personal web page is http://www.ciges.net and the preferred method for contacting him is by email at
jmanuel@ciges.net.
I am also available on social networks like Facebook¹, Google+², Twitter³ or Identi.ca⁴
1 http://www.facebook.com/ciges
2 https://plus.google.com/105050850707469524247/posts
3 https://twitter.com/ciges
4 http://identi.ca/ciges
Index
Foreword
Notation used for the references and bibliography
Introduction and description
  1. Author's description
  2. Company's description
  3. Project's objectives
  4. Realisation conditions
  5. Brief description of the work plan
NoSQL State of the Question
  1. What are we talking about
  2. Document-oriented databases
  3. Key-value databases
  4. Graph databases
  5. Tabular databases
  6. MySQL as NoSQL database
  7. Other interesting readings and feature comparisons
Detailed description of some NoSQL DBMS
  1. Cassandra
  2. CouchDB
  3. MongoDB
Looking for a NoSQL solution for our needs
  1. Introduction and initial approach
  2. Analysis of Internet access logs, describing the problem
  3. Oh my God! We have a lot of options!
  4. Questions we should answer before making a choice
  5. Description of the data for an Internet Access Log management system
  6. Choosing between several NoSQL products
  7. And the Winner is ..... MongoDB!
  8. What we will do from here
  9. Interesting readings
Installation of MongoDB
  1. Deploying MongoDB binaries
  2. Compiling and installing PHP driver
  3. Compiling and installing PyMongo, the Python driver for MongoDB
  4. Configuring the server
  5. Installing RockMongo, a PHP based administration tool
  6. Authentication in MongoDB
  7. Developed scripts for starting, stopping and verifying MongoDB status
NoSQL Schema Design for Internet Access Logs
  1. Analysis of logs with NoSQL
  2. Description of an equivalent MySQL database
  3. Defining a schema for MongoDB NoSQL
  4. Interesting articles and presentations
Comparative of a MySQL based solution versus MongoDB
  1. Work plan for the performance tests
  2. PHP Development for testing the database model
  3. Testing MongoDB vs MySQL performance
  4. Insertion tests
  5. Multi user concurrent tests
  6. Data analysis (aggregation) read tests
  7. Aggregation read tests code
  8. How to run these tests
Conclusions and last words
  1. Tests conclusions
  2. Initial planning and actual time spent on each task
  3. Problems and bugs found
  4. Future work
  5. Contributions to the community
  6. Personal evaluation of the practicum
Bibliography & References
Foreword
This work is part of the fifth edition of the Master on Free Software Projects Development and Management⁵,
created by the Galician open source consultancy Igalia⁶ and the Universidad Rey Juan Carlos⁷ in Madrid.
This master consists of live sessions given by professionals from different enterprises specialized in open source
solutions and by university researchers, practical works and a final project (called practicum) which can be done in
an enterprise.
As I am working in an open source department at PSA Peugeot Citroën⁸ (as a worker for the IT consulting company
Seresco⁹) I thought it could be a good idea to apply part of the knowledge acquired in the master to a project that is
interesting for PSA. Our department manages open source server solutions over Unix servers (mainly but not only
Linux) and at that time (early 2012) was considering making a study of NoSQL and how this class of database
management systems could be useful for enterprise needs.
So what is presented next is the result of studying a few open source NoSQL solutions, choosing and deploying one
of them on PSA's servers and making a comparison between the database management system used at this moment for
Internet access log management and the chosen one, MongoDB.
All of this would not have been possible, at least the way it has been done, without a lot of people, but especially:
- Igalia's and Libresoft's¹⁰ people, who made it possible that in a distant city located in the northwest
  corner of Spain eight students could enjoy the company of open source experts
- Seresco and PSA Peugeot Citroën's LAMP department, who have given all the collaboration possible to
  make the requirements of the Master compatible with everyday work
- My wife and my young daughter, who have had a lot of patience with that little guy behind his computer
  they are living with
- My father, who has always been there when I needed to delegate my father functions
“Be Free! Be Wild! Be Open!”
José M. Ciges
5 http://www.mastersoftwarelibre.com/
6 http://www.igalia.com/
7 http://www.urjc.es/
8 http://www.psa-peugeot-citroen.com
9 http://www.seresco.es/
10 Libresoft is the libre software and open communities research group from "Universidad Rey Juan Carlos" http://libresoft.es/
Notation used for the references and bibliography
Most of the statements made in this document are supported by Internet references (blogs of experts, web sites of the
enterprises which made the technologies cited), books or published papers.
I have used the following criteria to include the references:
- Studies shown in blog posts, books, papers or official documentation from supporting enterprises are shown
  as part of the bibliography (at the end of the document). The reference to the bibliography is made with a
  number between [ and ]
- URLs to the official web pages of products and other Internet links cited are shown as a little number, which leads to a
  note in the footer of the same page. The footnote shows what would normally be a link (in a digital
  document). I have preferred this format to avoid losing information in case this work is printed on paper
  (please don't kill trees only for a question of comfort)
In the following example we can see the use of both:
Apache Cassandra¹¹ is an open source distributed NoSQL database management system. It is an Apache Software
Foundation top-level project [1].
11 http://cassandra.apache.org/
Introduction and description
1. Author's description
I am a Spanish systems engineer born in 1976 who has been working for the last 9 years in the LAMP department of
PSA Peugeot Citroën (from now on PSA), hired by Seresco (an IT consulting enterprise).
This department gives support for some Open Source servers on Unix machines (mainly Linux) for the needs of any
PSA worker worldwide. The products we work with on a daily basis are Apache web server, MySQL, PHP, Tomcat,
FreeRadius, MediaWiki and Typo3.
I discovered Linux at University. At that time I knew nothing about Free Software; I installed Linux just because
somebody told me that with it I could do the Unix exercises at home. Later, when the Internet became part of my life, I
became interested in everything around Linux and now I am a GNU/Linux & Free Software fan boy.
Apart from that I am also a father and husband, and in my free time I try to do some sport, pay attention to what
happens around me and learn something new every day :-)
My personal data are:
- Name: José Manuel Ciges Regueiro
- Born date: 25 February 1976
- Education Title: Technical Engineering in Computer Systems at University of La Coruña (Spain)
- City in which I live: Vigo
2. Company's description
As I said I work for Seresco¹², a Spanish IT consulting company born in 1969 with around 500 employees in the
country. Seresco's main activities are software development, technical assistance and consultancy, and it is specialized in the
areas of human resources and geographical information. Clients of Seresco are other enterprises and Spanish regional
governments.
In Galicia one of these clients is PSA Peugeot Citroën¹³, as this multinational automobile and motorcycle
manufacturer has a factory at Vigo.
PSA is the second largest European automaker and the eighth largest in the world measured by 2010 unit production.
With its 32 manufacturing facilities PSA employs 186,000 people and makes around 3.5 million vehicles per year.
From an IT Systems point of view, from Vigo's factory we give service to every technical team to install, configure
and update some Open Source products such as Apache web server, MySQL, PHP, Tomcat, FreeRadius, MediaWiki and
Typo3.
The IT Systems at PSA employ 2,600 people in 26 different countries. A rough summary of the facilities could be:
- Servers: 6,500 instances of Unix, 3,200 instances of Windows servers, and also a few tens of Mainframe z/OS,
  VMS and Tandem / Guardian
- Office equipment: 87,000 client computers, most of them (over 70,000) Windows
12 http://www.seresco.es
13 http://www.psa-peugeot-citroen.com/
The project's tutor from Seresco will be Andrés Riveiro Sestelo, the director of the company's Galician area. On PSA's
side this work will be verified by David Fernández Fernández, the head of the department.
3. Project's objectives
The term NoSQL has become fairly popular in the last years to designate a class of database management systems (DBMS from
now on) where the relations between groups of data are not really important. NoSQL database systems rose alongside
major internet companies, such as Google, Amazon, Twitter, and Facebook¹⁴, which had significantly different
challenges that the traditional RDBMS solutions could not cope with.
The main reason to work with this kind of DBMS is to manage huge amounts of data where we need to generate reports
and make calculations but, due to the nature of the problem, the traditional data schema, where the data is stored in
records with identical structure grouped in tables and related between them, is not useful. We could think of
problems such as statistical data analysis, log information, document or web page indexing, geographical information
management …
When we say big amounts of data we mean:
- the data must be distributed between an undetermined number of computers and the architecture must be
  fault tolerant
- the performance of retrieving and appending operations must be very high, as read & write performance
  is critical
- operations over the data must be distributed also: the system must be able to coordinate different nodes and
  the results got by them to get the final results
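As an illustration of the last point, the following minimal Python sketch spreads log records over a few in-memory "nodes", lets each node compute a partial result on its own and then merges the partial results in a coordinator step. It is only a toy model under stated assumptions: the record fields (`user`, `domain`), the function names and the hash-based placement are hypothetical and are not taken from any of the products studied later.

```python
from collections import Counter
from zlib import crc32


def node_for(key, node_count):
    """Pick a node for a record by hashing its key (stable across runs)."""
    return crc32(key.encode("utf-8")) % node_count


def distribute(records, node_count):
    """Spread the records over node_count in-memory 'nodes' (plain lists)."""
    nodes = [[] for _ in range(node_count)]
    for record in records:
        nodes[node_for(record["user"], node_count)].append(record)
    return nodes


def local_hits(node_records):
    """Work done independently on one node: hits per domain."""
    return Counter(record["domain"] for record in node_records)


def total_hits(nodes):
    """Coordinator step: merge the partial results of every node."""
    total = Counter()
    for node_records in nodes:
        total.update(local_hits(node_records))
    return total


# Hypothetical sample data, invented for this sketch.
records = [
    {"user": "u1", "domain": "example.org"},
    {"user": "u2", "domain": "example.org"},
    {"user": "u3", "domain": "example.net"},
    {"user": "u1", "domain": "example.com"},
    {"user": "u4", "domain": "example.net"},
    {"user": "u5", "domain": "example.org"},
]
nodes = distribute(records, node_count=3)
print(total_hits(nodes))
```

Whatever the placement of the records, the merged counters equal the counters computed over the whole data set, which is what allows the work to be split between machines.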

4. Realisation conditions
The whole project has been carried out on PSA's IT installations, as part of my work in the office or at home using a
network connection via VPN.
The work began on the 5th of March with an estimated completion date of late July and an estimated workload of
300 hours. My time at PSA will not be fully dedicated to this NoSQL study, as there are other work tasks that require
my time. So the number of hours per day will vary between 0 and 8.
The hardware used for the tests will be two identical machines with the following specifications:
- RAM: 10 GB
- Cores: 32 (AMD Opteron Processor 6128)
- Other info: these machines are virtual machines hosted on a physical Dell™ PowerEdge™ R815¹⁵
  server
The initial study and information searching will be done reading articles and comparisons from the Internet. For the
development and tests phase almost all the technologies used will be server-side. In particular:
Server-side software technologies used:
- Operating System: Suse Linux Enterprise Server 10 (Virtual Server on Xen)
- The servers to compare will be MongoDB 2.2.0¹⁶ and MySQL¹⁷ 5.0.26 with MyISAM tables
- The development of scripts will be done in PHP, JavaScript and SQL. The code of the PHP classes will be
  documented using phpDocumentor¹⁸.
- The code editor will be VIM
14 Cassandra, now maintained by Apache, was initially a Facebook project. See "Cassandra - A structured storage system on a P2P Network" at
http://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9
15 http://www.dell.com/us/enterprise/p/poweredge-r815/pd
On the client side I will use:
- Shell scripts for automation tasks and single user load tests
- Notepad++ as code editor
- JMeter¹⁹ for making multiuser load tests, using the plugin Stepping Thread Group²⁰ for some of them
- The statistical software R²¹ to represent the results from JMeter graphically
- LibreOffice for writing the documentation
- ArgoUML²² for showing a diagram with the architecture of the PHP classes
I have made this work as part of the Master on Free Software Projects Development and
Management and I think it's important and coherent with its spirit that, when possible, the
documentation created, methodology used and conclusions reached should be publicly
available (and all the tools used should be Open Source). The only limit to this consideration is the
internal data of PSA and those tools imposed by the work place conditions (as Windows for the
workstation :-).
Internal data (as configurations or custom installation directories) has been carefully replaced in the
documentation by fake data (not real, but valid anyway).
The tools used to make this available have been the following:
- A summary of the information given in this document is available as posts on my personal web page²³ (in
  Spanish). This web site has been built with TextPattern²⁴, a PHP Content Management System. This document
  and a presentation are also available.
- The code developed in PHP is available on Github at Ciges/internet_access_control_demo²⁵. The HTML
  documentation created with phpDocumentor is at http://www.ciges.net/phpdoc/iacd/InternetAccessLog/
- The presentation has been uploaded to SlideShare²⁶
16 2.2.0 is the production release from August 2012 http://www.mongodb.org
17 MyISAM is the light engine for web-based http://www.mysql.com/
18 http://www.phpdoc.org/
19 http://jmeter.apache.org/
20 http://code.google.com/p/jmeter-plugins/wiki/SteppingThreadGroup
21 http://www.r-project.org/
22 http://argouml.tigris.org/
23 http://www.ciges.net
24 TextPattern is a PHP Based Content Management System http://textpattern.com/
25 https://github.com/Ciges/internet_access_control_demo
26 http://www.slideshare.net/Ciges/no-sql-projectmswlpresentation
5. Brief description of the work plan
The work plan is composed of four main parts:
NoSQL State of the Question
As at this moment we know really nothing about NoSQL technologies, first we will have to become familiar with this
kind of database management systems.
So this part is a summary of what I have learned reading articles and books, and the first general conclusions
reached.
We are studying how to apply NoSQL technologies to log information management, so in this part we will describe one
of the use cases, the management of the access log from the enterprise network to the Internet. From this use case we
will choose a NoSQL solution among the products available in the Open Source world.
Installation of chosen solution and database design for an Internet access control
software
This part will be more technical. It will have:
- An explanation of some details of how the product has been installed, its configuration and the scripts developed,
  if we have the need to develop some.
- A database schema design for NoSQL based on the current MySQL solution. The structure chosen and the
  information fields will be shown
Comparative of a MySQL based solution versus new NoSQL one
With a very similar database on MySQL and on the new NoSQL DBMS²⁷ I will try to do performance tests on both
using generated random data (I can not use real data because of confidentiality).
To make these tests valid the volume of data should be high. As we can not use real data some scripts will be
developed to make it possible:
- Generate fake random data similar to the real one (URLs, user ids, timestamps, IPs …)
- Create tables on the database and save all this data in a very similar structure regardless of the DBMS used
- The classes developed will have the same interface, so the DBMS chosen will be transparent to the
  applications that use them
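The two ideas in this list (fake but realistic data, and one interface shared by every backend) can be sketched as follows. This is a minimal Python illustration written for this text, not the PHP code of the project: the field names, the `AccessLogStore` interface and the in-memory backend are all hypothetical, and a real backend class would talk to MySQL or MongoDB behind the same methods.

```python
import random
from abc import ABC, abstractmethod
from datetime import datetime, timedelta


def fake_access_record(rng):
    """Build one fake access log entry (all field names are hypothetical)."""
    start = datetime(2012, 1, 1)
    return {
        "user": "user%05d" % rng.randrange(1, 1000),
        "ip": "10.%d.%d.%d" % (rng.randrange(256), rng.randrange(256),
                               rng.randrange(1, 255)),
        "url": "http://www.site%03d.example/page%d" % (rng.randrange(500),
                                                       rng.randrange(50)),
        "timestamp": start + timedelta(seconds=rng.randrange(365 * 24 * 3600)),
    }


class AccessLogStore(ABC):
    """Interface shared by every backend, so tests do not depend on the DBMS."""

    @abstractmethod
    def save(self, record):
        """Store one access log record."""

    @abstractmethod
    def count_by_user(self, user):
        """Return how many records belong to the given user."""


class InMemoryStore(AccessLogStore):
    """Stand-in backend used only to make this sketch runnable."""

    def __init__(self):
        self._records = []

    def save(self, record):
        self._records.append(record)

    def count_by_user(self, user):
        return sum(1 for r in self._records if r["user"] == user)


def load_fake_data(store, how_many, seed=42):
    """Fill any store with reproducible fake data (same seed, same records)."""
    rng = random.Random(seed)
    for _ in range(how_many):
        store.save(fake_access_record(rng))


store = InMemoryStore()
load_fake_data(store, 1000)
```

Because `load_fake_data` only uses the interface, the same call could fill two different backends with identical records, which is the property the comparison tests rely on.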
Then, using these classes, a list of write and read tests will be developed to compare both solutions. A description of
each test and how to run it will be given in this part
Conclusions and last words
Well, the objective of this study is to learn about NoSQL and to have concrete data to decide if it's a good idea to
use NoSQL for some of PSA's needs.
Here the results obtained, their limitations if there are any and the future work to do will be detailed.
27 Database Management System
NoSQL State of the Question
1. What are we talking about
As I said in the previous chapter NoSQL is a class of database management systems that differ from the classic
model of the relational database management systems:
- They do not use SQL
  NoSQL database systems rose alongside major internet companies, such as Google, Amazon, Twitter, and
  Facebook which had significantly different challenges in dealing with huge quantities of data that the traditional
  RDBMS solutions could not cope with.
  The kind of problems these databases are developed for is the management of really big amounts of data that do
  not necessarily follow a fixed schema. The data is partitioned between different machines (for performance
  reasons and due to its size) so JOIN operations are not usable and ACID guarantees are not given. Traditional
  relational DBMS and SQL are not valid tools here
- May not give full ACID guarantees
  Usually only eventual consistency is guaranteed, or transactions are limited to single data items. This means that
  given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate
  eventually through the system.
- Usually they have a distributed architecture and are fault tolerant
  Several NoSQL systems employ a distributed architecture, with the data held in a redundant manner on several
  servers. In this way, the system can easily scale out by adding more servers, and failure of a server can be
  tolerated.
  This type of databases typically scale horizontally and are used for managing big amounts of data, when
  the performance and real-time nature is more important than consistency (as indexing a large number of
  documents, serving pages on high-traffic websites, and delivering streaming media).
  NoSQL Databases are often highly optimized for retrieve and append operations and often offer little functionality
  beyond record storage (e.g. key-value stores). The reduced run time flexibility compared to full SQL systems is
  compensated by significant gains in scalability and performance for certain data models.
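The notion of eventual consistency described above can be illustrated with a small Python sketch. It is a deliberately simplified model written for this text: every replica keeps a version number per key, the newest version wins, and replicas exchange their data in pairs. The class and function names are hypothetical, and real systems use more elaborate mechanisms than a single integer version.

```python
class Replica:
    """One server holding its own copy of the data as key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica already has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def exchange(a, b):
    """Pairwise synchronisation: both replicas keep the newest version of each key."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
replicas[0].write("page:/home", "v1", version=1)   # first update reaches r1 only
replicas[2].write("page:/home", "v2", version=2)   # later update reaches r3 only

before = [r.read("page:/home") for r in replicas]  # the replicas disagree

# No more writes arrive; the replicas just keep exchanging data.
exchange(replicas[0], replicas[1])
exchange(replicas[1], replicas[2])
exchange(replicas[0], replicas[1])

after = [r.read("page:/home") for r in replicas]   # the replicas agree
print(before, after)
```

Between the two writes and the last exchange a reader may get a stale or missing value depending on the replica it asks; once the updates have propagated every replica answers with the newest one.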
In short, NoSQL database management systems are useful when we work with a huge quantity of data, and the data's
nature does not require a relational model for the data structure. The data could be structured, but the structure is minimal and what
matters is the ability to store and retrieve great quantities of data, and not the relations between the elements.
For example, we want to store millions of key-value pairs in one or a few associative arrays or we want to store
millions of data records. This is particularly useful for statistical or real-time analyses of growing lists of elements (think
of posts on Twitter or the logs of access to the Internet from a big group of users).
NoSQL databases are categorized according to the way they store the data. The main categories we could consider
are:
- Document-oriented databases
- Key-value databases
- Graph databases
- Tabular databases, also called Columnar databases
2. Document-oriented databases
A document-oriented database stores, retrieves, and manages semi structured data. The element of data is called
"document".
Different implementations offer different ways of organizing and/or grouping documents:
- Collections
- Tags
- Non-visible Metadata
- Directory hierarchies
Compared to relational databases we could say collections are like tables, and documents are like records. But there is
one big difference: every record in a table has the same number of fields, while documents in a collection could have
completely different fields.
Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office
documents (MS Word, Excel, and so on).
Documents are addressed in the database via a unique key that represents that document. One of the other defining
characteristics of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that you
can use to retrieve a document, the database will offer an API or query language that will allow you to retrieve
documents based on their contents.
Data example
For example, MongoDB uses a binary form of JSON to store data. An example MongoDB collection could be described
as
{
    "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
    "Last Name": "DUMONT",
    "First Name": "Jean",
    "Age": 43
},
{
    "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
    "Last Name": "PELLERIN",
    "First Name": "Franck",
    "Age": 29,
    "Address": "1 chemin des Loges",
    "City": "VERSAILLES"
}
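To make the difference between a lookup by key and a query by content concrete, here is a minimal Python sketch that treats a collection as a plain list of dictionaries. It does not use MongoDB or any driver; the helper names `find_by_id` and `find` are hypothetical, and the identifiers are kept as simple strings instead of ObjectId values.

```python
# A "collection" modelled as a list of documents with different fields.
people = [
    {"_id": "4efa8d2b7d284dad101e4bc9", "Last Name": "DUMONT",
     "First Name": "Jean", "Age": 43},
    {"_id": "4efa8d2b7d284dad101e4bc7", "Last Name": "PELLERIN",
     "First Name": "Franck", "Age": 29,
     "Address": "1 chemin des Loges", "City": "VERSAILLES"},
]


def find_by_id(collection, doc_id):
    """Key lookup: return the document with this unique key, or None."""
    for doc in collection:
        if doc["_id"] == doc_id:
            return doc
    return None


def find(collection, criteria):
    """Query by content: keep documents whose fields equal every criterion.

    A document that lacks one of the requested fields simply does not match,
    so documents with different shapes can live in the same collection.
    """
    return [doc for doc in collection
            if all(field in doc and doc[field] == value
                   for field, value in criteria.items())]


print(find_by_id(people, "4efa8d2b7d284dad101e4bc9")["Last Name"])
print([doc["First Name"] for doc in find(people, {"City": "VERSAILLES"})])
```

The second query only mentions the `City` field, which the first document does not have at all; no schema change is needed for the two documents to coexist.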
Some Open Source solutions
Examples of Open Source document-oriented NoSQL databases that we will study are:
- MongoDB: A 10gen²⁸ project which stores structured data as JSON-like documents with dynamic schemas
  (MongoDB calls the format BSON). MongoDB provides dynamic queries, indexes, geospatial indexes and
  master-slave replication with auto-failover. MongoDB is being developed as a business with commercial
  support available.
  - Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If
    you need good performance on a big DB. If you wanted CouchDB, but your data changes too much,
    filling up disks. [2]
    If your DB is 3NF and you don't do any joins (you're just selecting a bunch of tables and putting all
    the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for
    you. [3]
  - For example [2]: For most things that you would do with MySQL or PostgreSQL, but having
    predefined columns really holds you back.
- CouchDB: An Apache Software Foundation project which uses JSON over REST/HTTP. CouchDB provides
  master-master replication and versioning.
  - Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run.
    Places where versioning is important. [2]
  - For example: CRM, CMS systems. Master-master replication is an especially interesting feature,
    allowing easy multi-site deployments. [2]
3. Key-value databases
Data is stored as key-value pairs, in a schema-less way. A value could be of any data type or object.
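A minimal Python sketch of this model is shown below, assuming nothing more than an in-memory dictionary. The class name `KeyValueStore` and its methods are hypothetical and do not correspond to the API of any of the products listed in this section; the point is only that the store knows the key and treats the value as opaque.

```python
class KeyValueStore:
    """Toy key-value store: the only access path is the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is stored as-is; no schema is declared or checked.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "jdoe", "pages": 3})   # a nested structure
store.put("counter:hits", 1024)                          # a number
store.put("raw:logline", b"GET /index.html 200")         # raw bytes
print(store.get("counter:hits"), store.get("unknown", "not found"))
```

Unlike the document query shown for document-oriented databases, there is no way here to ask "which sessions belong to jdoe" without knowing the keys in advance.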
>4ample of 3pen So"rce ke!Bval"e 9oS:5 dataases that we will st"d! are=
8assandra=
61
#t is an 1pache Software >oundation top?level pro@ect >1?, initiall! developed ! Faceook* #t
is distri"ted and designed for "sing commodit! servers >7?* #t is possile to "se Map &ed"ce with .pache
Cadoop*
Best used= Dhen !o" write more than !o" read ,logging-* >2?
>or example= <anking, financial ind"str!* Drites are faster than reads, so one nat"ral niche is real
time data anal!sis* >2?
.pache Cadoop is a software framework that s"pports dataBintensive distri"ted applications which
is ecoming a standard for data storing and anal!se*
Membase=
E7
Memcache li7e compatible database ,it "ses Memcache protocol- "t with persistence to disk
and masterBmaster replication ,all nodes are identical-
28 http://10gen.com/
29 http://cassandra.apache.org/
30 http://www.couchbase.com/membase
Best used: Any application where low-latency data access, high concurrency support and high
availability are a requirement. [2]
For example: Low-latency use cases like ad targeting or highly concurrent web apps like online
gaming. [2]
Redis³¹:
A very fast NoSQL database that keeps most data in memory. Provides master-slave replication and
transactions. Like Memcached it does not scale out by itself: sharding has to be handled in the client, and therefore you
can't just start adding new servers and increase your throughput. Nor is it fault tolerant: if your Redis server dies,
there goes that data. Redis also supports replication. [3]
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
[2]
For example: Stock prices. Analytics. Real-time data collection. Real-time communication. [2]
Riak³²:
Riak is a NoSQL database implementing the principles from Amazon's Dynamo storage system.
Riak provides built-in Map Reduce support, full-text search, indexing & querying. Comes in "open source"
and "enterprise" editions.
Best used: If you want something Cassandra-like, but no way you're gonna deal with the bloat and
complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're
ready to pay for multi-site replication. [2]
For example: Point-of-sales data collection. Factory control systems. Places where even seconds of
downtime hurt. Could be used as a well-update-able web server. [2]
4. Graph databases
This kind of database is intended for data whose relations are well represented in a graph style (elements
interconnected with an undetermined number of relations between them). The data could be social relations,
public transport links, road maps or network topologies, for example.
Examples of Open Source Graph NoSQL databases are:
Neo4j³³:
Graph database with full ACID conformity, transactions, indexing of nodes and relationships and
advanced path-finding with multiple algorithms.
FlockDB³⁴:
Graph database created by Twitter for managing more than 13 billion relationships between its
users.
5. Tabular databases
In this kind of DBMS data is organized in columns. Each row has one or more values for a number of possible
columns.
Examples of Open Source Tabular NoSQL databases are:
HBase³⁵:
An alternative to Google's Big Table that uses Hadoop's HDFS as storage. Offers Map/Reduce with
Apache Hadoop.
Best used: When you use the Hadoop/HDFS stack. When you need random, real-time read/write
access to Big Table-like data. [2]
For example: For data that's similar to a search engine's data. [2]
Cassandra, mentioned above, could also be considered a tabular database because its keys map to multiple values,
which are grouped into column families.
31 http://redis.io/
32 http://wiki.basho.com/Riak.html
33 http://neo4j.org
34 https://github.com/twitter/flockdb
35 http://hbase.apache.org/
6. MySQL as NoSQL database
Recently a preview version of the next MySQL 5.6 [5] has been released by Oracle at MySQL Developer Zone. This
version has a NoSQL interface.
With this interface applications could write to and read from InnoDB storage using a Memcache-type API. The data
could be in memory or stored in the InnoDB Storage Engine, and multiple columns could be stored in the value.
This software is still experimental, but it could be interesting in the future. More info can be read in the following
articles:
MySQL 5.6 preview introduces a NoSQL interface, at The H web. [6]
NoSQL to InnoDB with Memcached, at the "Transactions on InnoDB" blog. [7]
7. Other interesting readings and feature comparisons
Consistency Models in Non-Relational Databases by Guy Harrison: A good explanation of the CAP Theorem,
eventual consistency and how consistency problems can be handled in distributed environments. [8]
The appendix of the O'Reilly book Cassandra: The Definitive Guide gives a very good description of the
current status of NoSQL. [9]
The following articles give good explanations for deciding which NoSQL solution is the right choice for the scenario
we are facing:
CouchDB vs MongoDB by Gabriele Lana: A good comparison between CouchDB and MongoDB with an
excellent explanation of Map/Reduce. [10]
Should I use MongoDB or CouchDB (or Redis)?, by Riyad Kalla. [11]
Detailed description of some NoSQL DBMS
I have read a lot about a few NoSQL Database Management Systems which could be interesting for our needs (log
management). In particular there are three that initially caught my attention: Cassandra, CouchDB and MongoDB.
In this section I will give a brief description of each one.
1. Cassandra
Apache Cassandra³⁶ is an open source distributed NoSQL
database management system. It is an Apache Software Foundation top-level project [1] designed to handle very large
amounts of data spread out across many commodity servers while providing a highly available service with no
single point of failure.
Cassandra provides a structured key-value store with tunable consistency. Keys map to multiple values, which are
grouped into column families. Different keys can have different numbers of columns. This makes Cassandra a hybrid
data management system between a key-value and a tabular database.
History
Apache Cassandra was developed at Facebook to power their Inbox Search feature by Avinash Lakshman (one of the
authors of Amazon's Dynamo) and Prashant Malik.
It was released as an open source project on Google Code in July 2008. In March 2009, it became an Apache
Incubator project. On February 17, 2010 it graduated to a top-level project. [12]
Facebook abandoned Cassandra in late 2010 when they built the Facebook Messaging platform on HBase. [13]
Licensing and support
Apache Cassandra is an Apache Software Foundation project, so it has an Apache License (version 2.0)³⁷.
There is professional grade support available from a few companies. The official wiki of the Apache Cassandra
project [14] mentions the following ones, which collaborate with the project's developers:
Acunu³⁸
Datastax³⁹
Main features
Decentralized
Every node in the cluster has the same role. There is no single point of failure. Data is distributed across the
cluster (so each node contains different data), but there is no master as every node can service any request.
Supports replication and multi datacenter replication
36 http://cassandra.apache.org/
37 http://www.apache.org/licenses/LICENSE-2.0.html
38 http://www.acunu.com/
39 http://datastax.com/
Replication strategies are configurable [15]. Cassandra is designed as a distributed system, for deployment of
large numbers of nodes across multiple data centers. Key features of Cassandra's distributed architecture are
specifically tailored for multiple datacenter deployment, for redundancy, for failover and disaster recovery.
Elasticity
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption
to applications.
Fault-tolerant
Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is
supported. Failed nodes can be replaced with no downtime.
Tunable consistency
Writes and reads offer a tunable level of consistency, all the way from "writes never fail" to "block for all
replicas to be readable", with the quorum level in the middle.
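As a rough illustration of the quorum level, the following Python sketch computes a majority quorum for a given replication factor and checks the usual overlap rule (a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor). The function names are hypothetical and this is not Cassandra's API; it only shows the arithmetic behind the idea.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


n = 3                      # assumed replication factor
q = quorum(n)              # majority of 3 replicas
print(q, read_sees_latest_write(n, r=q, w=q))   # quorum reads and writes overlap
print(read_sees_latest_write(n, r=1, w=1))      # weakest levels: no overlap guaranteed
```

With three replicas, reading and writing at quorum touches two replicas each, so at least one replica is common to both operations.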
Map Reduce support
Cassandra has Hadoop integration, with Map Reduce support. There is also support for Apache Pig⁴⁰ and
Apache Hive⁴¹. [16]
Query language
CQL (Cassandra Query Language) was introduced, an SQL-like alternative to the traditional RPC interface.
Language drivers are available for Java (JDBC) and Python (DBAPI2).
Enterprises who use Cassandra
A very brief list of known enterprises who use Cassandra could be:
Netflix uses Cassandra as the back-end database for its streaming services. [17]
Twitter announced it is planning to use Cassandra because it can be run on large server clusters and is capable
of taking in very large amounts of data at a time. [18]
Urban Airship uses Cassandra with its mobile service, hosting over 160 million application installs across
80 million unique devices. [19]
Constant Contact uses Cassandra in their social media marketing application. [20]
Rackspace [21]
Cisco's WebEx uses Cassandra to store user feed and activity in near real time. [22]
A more complete list can be looked up at the Datastax "Cassandra Users" page⁴².
Data manipulation: keys, row keys, columns and column families
As said in the NoSQL State of the Question section, we could consider Cassandra a hybrid between a key-value and
a tabular database.
Each key in Cassandra corresponds to a value which is an object. Each key has values as columns, and columns are
grouped together into sets called column families. Also, column families can be grouped into super column families.
40 http://pig.apache.org/
41 http://hive.apache.org
42 http://www.datastax.com/cassandrausers
So each key identifies a row with a variable number of elements. These column families could then be considered as tables.
A table in Cassandra is a distributed multi dimensional map indexed by a key.
Furthermore, applications can specify the sort order of columns within a Super Column or Simple Column family.
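The "multi dimensional map indexed by a key" can be pictured with nested dictionaries. This is only a minimal Python sketch of the data model, not Cassandra client code: the row keys, the column family names and the helper function are all invented for the illustration.

```python
# row key -> column family -> {column name: value}
rows = {
    "user:alice": {
        "profile": {"name": "Alice", "city": "Vigo"},
        "visits": {"2012-04-01T10:00": "/index", "2012-04-01T10:05": "/docs"},
    },
    "user:bob": {
        # Different keys can have different numbers of columns.
        "profile": {"name": "Bob"},
    },
}


def get_columns(rows, row_key, column_family, reverse=False):
    """Return the (column, value) pairs of one row, sorted by column name."""
    columns = rows.get(row_key, {}).get(column_family, {})
    return sorted(columns.items(), reverse=reverse)


print(get_columns(rows, "user:alice", "visits", reverse=True))
```

Sorting by column name stands in for the configurable column sort order mentioned above.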
Tools for Cassandra
The direct download of Cassandra includes built-in tools for accessing it, such as cassandra-cli and nodetool.
There are third party tools available, such as the following: [23]
Data browsers
Chiton⁴³, a GTK data browser.
cassandra-gui⁴⁴, a Swing data browser.
Administration tools
OpsCenter⁴⁵, a tool for management and monitoring of a Cassandra cluster. The Community
Edition of OpsCenter is free for anyone to download and use. There is also an Enterprise Edition of OpsCenter
that includes additional features.
Cassandra Cluster Admin⁴⁶, a GUI tool to help people administrate their Apache
Cassandra cluster, similar to PHPMyAdmin for MySQL administration.
Client interfaces and language support
Cassandra has a lot of high-level client libraries for Python, Java, .Net, Ruby, PHP, Perl, C++, etc. [24]
For a detailed list of client software go to the Client Options article on Cassandra's Wiki⁴⁷.
Integration with other tools
There are other tools worth mentioning like Solandra⁴⁸, a Cassandra backend for Apache Solr⁴⁹, a web application
built around Lucene, for full text indexing and search.
For monitoring purposes Cassandra is well integrated with Ganglia [25] and there are plugins for other monitoring
systems such as, for example, Nagios.
Conclusions
If we need to handle very big amounts of data, with more writes than reads (as for real time data analysis, for
example), Cassandra could be a good NoSQL solution.
I will emphasize the following:
Hadoop's utilisation for Map Reduce is integrated in Cassandra [16], but the architecture will be fairly complex
(simpler than HBase according to Dominic Williams [26])
43 http://github.com/driftx/chiton
44 http://code.google.com/p/cassandra-gui
45 http://www.datastax.com/products/opscenter
46 https://github.com/sebgiroux/Cassandra-Cluster-Admin
47 http://wiki.apache.org/cassandra/ClientOptions
48 https://github.com/tjake/Solandra Solandra source at Github
49 http://lucene.apache.org/solr/
Its tunable consistency and multi datacenter support
Interesting readings
Cassandra - A Decentralized Structured Storage System, a 2009 paper presenting Cassandra by its creators
Avinash Lakshman and Prashant Malik. [25]
4 Months with Cassandra, a love story, a chronicle and main reasons why Cassandra was adopted at
CloudKick. [27]
HBase vs Cassandra: why we moved, post from Dominic Williams' blog, where he explains why they moved
from HBase to Cassandra. [26]
HBase vs Cassandra, from the Adku blog. [28]
Cassandra vs. (CouchDB | MongoDB | Riak | HBase), from Brian O'Neill's blog. [29]
Introduction to Cassandra: Replication and Consistency, presentation by Benjamin Black. [30]
2. CouchDB
CouchDB⁵⁰ is an open source document-oriented NoSQL database
system. It is similar to MongoDB.
Created in 2005, it became an Apache Software Foundation project in 2008
and is part of the "new" NoSQL family of database systems. Instead of
storing data in tables as is done in a "classical" relational database,
CouchDB stores structured data as JSON documents with dynamic schemas, making the integration
of data in certain types of applications easier and faster.
CouchDB is interesting in part due to its Multi-Version Concurrency Control. This means that we have versioning
support for the data and that readers will not block writers and writers will not block readers.
CouchDB uses a RESTful JSON API for accessing data, which allows accessing data using HTTP requests.
Other features are ACID semantics with eventual consistency, Map Reduce, incremental replication, and
fault-tolerance. It comes with a web console.
History
CouchDB (Couch is an acronym for cluster of unreliable commodity hardware) [31] is a project created in April
2005 by Damien Katz, former Lotus Notes developer at IBM. Damien Katz defined it as a storage system for a large
scale object database. His objectives for the database were for it to become the database of the Internet and that it
would be designed from the ground up to serve web applications. He self-funded the project for almost two years and
released it as an open source project under the GNU General Public License.
In February 2008, it became an Apache Incubator project and the license was changed to the Apache License
[32]. A few months later, it graduated to a top-level project. [33]
Currently, CouchDB is maintained at the Apache Software Foundation with backing from IBM. Katz works on it
full-time as the lead developer.
The first stable version was released in July 2010 [34]. The latest version is 1.2, released in April 2012.
Licensing and support
CouchDB is an Apache Software Foundation project, and so it has an Apache License 2.0.
CouchDB has commercial support from the companies Couchbase⁵¹ and Cloudant⁵².
50 http://couchdb.apache.org/
51 http://www.couchbase.com
52 https://cloudant.com/
Main features
A summary of main features could be the following:
Document Storage
CouchDB stores data as "documents", as one or more field/value pairs expressed as JSON. Field values can
be simple things like strings, numbers, or dates. But you can also use ordered lists and associative arrays. Every
document in a CouchDB database has a unique id and there is no required document schema.
ACID Semantics
CouchDB provides ACID semantics [35]. It does this by implementing a form of Multi-Version Concurrency
Control, meaning that CouchDB can handle a high volume of concurrent readers and writers without conflict.
Map/Reduce Views and Indexes
The data stored is structured using views. In CouchDB, each view is constructed by a JavaScript function that
acts as the Map half of a map/reduce operation. The function takes a document and transforms it into a single
value which it returns. CouchDB can index views and keep those indexes updated as documents are added,
removed, or updated.
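To make the idea of a view more concrete, here is a small sketch of the Map half. Real CouchDB views are JavaScript functions run by the server; this Python version only imitates the mechanism, and the document fields (`user`, `bytes`) and function names are assumptions chosen for the example.

```python
def map_by_user(doc):
    """Map step: emit one (key, value) row per matching document."""
    if "user" in doc and "bytes" in doc:
        yield doc["user"], doc["bytes"]


def build_view(docs, map_fn):
    """Apply the map function to every document and keep the rows sorted by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows)


docs = [
    {"_id": "1", "user": "luis", "bytes": 50},
    {"_id": "2", "user": "ana", "bytes": 300},
    {"_id": "3", "note": "no user field, so nothing is emitted"},
    {"_id": "4", "user": "ana", "bytes": 120},
]

for key, value in build_view(docs, map_by_user):
    print(key, value)
```

Because the rows are kept sorted by key, a lookup for one key or a key range does not need to scan every document, which is the point of indexing a view.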
Distributed Architecture with Replication
CouchDB was designed with bi-directional replication (or synchronization) and off-line operation in mind.
That means multiple replicas can have their own copies of the same data, modify it, and then sync those changes
at a later time.
REST API
All items have a unique URI that gets exposed via HTTP. REST uses the HTTP methods POST, GET, PUT
and DELETE for the four basic (Create, Read, Update, Delete) operations on all resources.
Eventual Consistency
CouchDB guarantees eventual consistency to be able to provide both availability and partition tolerance.
Built for Offline
CouchDB can replicate to devices (like smartphones) that can go offline and handle data sync for you when the
device is back online.
CouchDB also offers a built-in admin interface, accessible via web, called Futon [36].
Use cases / production deployments
The replication and synchronization capabilities of CouchDB make it ideal for use on mobile devices, where the network
connection is not guaranteed but the application must keep on working offline.
CouchDB is well suited for applications with accumulating, occasionally changing data, on which pre-defined
queries are to be run and where versioning is important (CRM, CMS systems, for example). Master-master replication is
an especially interesting feature, allowing easy multi-site deployments [2].
Enterprises who use CouchDB
CouchDB is used in certain applications for Android like SpreadLyrics⁵³ and applications for Facebook like
Will you Kissme or Birthday Greeting Cards or webs like Friendpaste [37].
53 https://play.google.com/store/apps/details?id=br.com.smartfingers.spreadlyrics
A few examples of enterprises that used or are using CouchDB are:
Ubuntu⁵⁴, for its synchronization service Ubuntu One until November 2011 [38], but it was discontinued
because of scalability issues. [39]
The BBC⁵⁵, for its dynamic content platforms. [40]
Credit Suisse⁵⁶, for internal use at the commodities department for their marketplace framework. [37]
Meebo⁵⁷, for their social platform (web and applications)
For a complete list of software projects and web sites that use CouchDB, read the CouchDB in the wild [37]
article on the product's web.
Data manipulation: Documents and Views
CouchDB is similar to other document stores like MongoDB. CouchDB manages a collection of JSON documents.
The documents are organised via views. Views are defined with aggregate functions and filters are computed in
parallel, much like Map Reduce.
Views are generally stored in the database and their indexes updated continuously. CouchDB supports a view system
using external socket servers and a JSON-based protocol. [41] As a consequence, view servers have been developed in a
variety of languages (JavaScript is the default, but there are also PHP, Ruby, Python and Erlang).
Accessing data via HTTP
CouchDB provides a set of RESTful HTTP methods (e.g., POST, GET, PUT or DELETE). We could access the
data using cURL, for example.
A few examples of data access via HTTP could be:
For accessing CouchDB server info:
    curl http://127.0.0.1:5984/
The CouchDB server returns a response in JSON format:
    {"couchdb":"Welcome","version":"1.1.0"}
To create a database we could run:
    curl -X PUT http://127.0.0.1:5984/wiki
If the database does not exist, CouchDB will reply with
    {"ok":true}
or, with a different response message, if the database already exists:
    {"error":"file_exists","reason":"The database could not be created, the file already exists."}
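The same two requests can be built from a program. The sketch below uses only Python's standard `urllib`; the helper names are hypothetical, and the host and port are simply the ones used in the cURL examples. Building the request objects needs no server, while `send()` assumes a CouchDB instance is actually listening.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # same address as in the cURL examples


def server_info_request(base_url=BASE_URL):
    """GET / returns the server welcome document."""
    return urllib.request.Request(base_url + "/", method="GET")


def create_database_request(name, base_url=BASE_URL):
    """PUT /<name> asks the server to create a database."""
    return urllib.request.Request(f"{base_url}/{name}", method="PUT")


def send(request):
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = create_database_request("wiki")
print(request.get_method(), request.full_url)
```

Note that `urlopen` raises `urllib.error.HTTPError` for error status codes, so the "already exists" reply would have to be read from the exception rather than from the return value.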
Conclusions
To decide whether CouchDB is for us I will emphasize the following:
To get results we have to define views. This means that if our problem cannot be solved with a set
of predefined queries, CouchDB is not for us, as it lacks flexibility in the way data is queried
For this reason the initial learning curve is harder
CouchDB has master-master replication support, ideal for a multi-node setup across different data centers. It
can also replicate to devices that can go offline (like smartphones) and handle data sync for you when the device
is back online (making it a good solution for applications working on distributed environments with a non
guaranteed connection)
CouchDB has support for multiple versions
The interface is HTTP/REST, so it is easily accessible by any application/language/server
From what is said in a few blogs, CouchDB is not a very mature project. The latest version is 1.2.0 and it has breaking
changes with regard to the previous version. [42]
54 http://www.ubuntu.com/
55 http://www.bbc.co.uk/
56 http://www.credit-suisse.com
57 https://www.meebo.com/about/
Interesting readings
Why CouchDB?, from "CouchDB: The Definitive Guide" [43]
Comparing MongoDB and CouchDB, from the MongoDB web [44]
MongoDB or CouchDB - fit for production?, question and responses at StackOverflow [45]
3 CouchDB Case Studies, post on Alex Popescu's NoSQL blog [46]
CouchDB for access log aggregation and analysis, post on the UserPrimary.net blog [47]
3. MongoDB
MongoDB (from humongous) is an open source document-oriented
NoSQL database system.
MongoDB is part of the new NoSQL family of database
systems. Instead of storing data in tables as is done in a classical relational database, MongoDB stores structured data
as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the
integration of data in certain types of applications easier and faster.
Development of MongoDB began in October 2007 at 10gen⁵⁸. It is now a mature and feature rich database ready for
production use. It's used, for example, by MTV Networks [48], Craigslist [49] or Foursquare [50].
History
Development of MongoDB began at 10gen in 2007, when the company was building a Platform as a Service
similar to Google App Engine [51]. In 2009 MongoDB was open sourced as a stand-alone product [52], under the GNU
Affero General Public License (or AGPL⁵⁹).
In March 2011, from version 1.4, MongoDB has been considered production ready [53].
The last stable version is 2.2.0, released in August 2012.
Licensing and support
MongoDB is available for free under the GNU Affero General Public License [54]. The language drivers are
available under an Apache License.
MongoDB is being developed by 10gen as a business with commercial support available [55].
Main features
A summary of main features could be the following:
Ad hoc queries
MongoDB supports search by field, range queries and regular expression searches. Queries can return specific
fields of documents and also include user-defined JavaScript functions.
Indexing
Any field in a MongoDB document can be indexed (indexes in MongoDB are conceptually similar to those in
RDBMS). Secondary indexes and geospatial indexes are also available.
Replication
MongoDB supports "master-slave replication". A master can perform reads and writes. A slave copies data
from the master and can only be used for reads or backup (not writes). The slaves have the ability to elect a new
master if the current one goes down.
58 http://en.wikipedia.org/wiki/10gen
59 http://www.gnu.org/licenses/agpl.html
Load balancing
MongoDB scales horizontally using a system called sharding [56]. The developer chooses a shard key, which
determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key)
and distributed across multiple shards. (A shard is a master with one or more slaves.)
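The range-based split can be sketched in a few lines. This is not MongoDB's router, only an illustration of the idea: the split points, the shard names and the `shard_for` helper are assumptions, and the shard key is taken to be a user name.

```python
import bisect

# Assumed split points: keys below "h" go to shard0, keys from "h" up to
# (but not including) "p" go to shard1, and everything else to shard2.
SPLIT_POINTS = ["h", "p"]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(shard_key_value: str) -> str:
    """Pick the shard whose key range contains the given shard key value."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key_value)]


for user in ["alice", "maria", "zoe"]:
    print(user, shard_for(user))
```

A poorly chosen shard key (for example one where most values fall in a single range) would send most documents to one shard, which is why the choice is left to the developer.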
MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and
running in case of hardware failure. Automatic configuration is easy to deploy and it's possible to add new
machines to a running database.
File storage
MongoDB could be used as a file system, taking advantage of load balancing and data replication features over
multiple machines for storing files.
This function, called GridFS [57], is included with MongoDB drivers and is easily available from
development languages. MongoDB exposes functions to developers for manipulating files and their contents.
GridFS is used, for example, in plugins for NGINX [58] and lighttpd [59].
In a MongoDB system with multiple machines, files could be distributed and copied multiple times between machines
transparently, giving a load balanced & fault tolerant system.
Aggregation
Map Reduce can be used for batch processing of data and aggregation operations. The aggregation framework
enables users to obtain the kind of results SQL group-by is used for.
Server-side JavaScript execution
JavaScript can be used in queries, and aggregation functions (such as Map Reduce) are sent directly to the database
to be executed.
Capped collections
MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion
order and, once the specified size has been reached, behaves like a circular queue.
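The circular-queue behaviour is easy to picture with a bounded deque. The sketch below is only an analogy in plain Python, not MongoDB code: it caps the collection by number of documents for simplicity (an assumption; the size of a real capped collection is specified in bytes), and the class name is invented.

```python
from collections import deque


class CappedLog:
    """Keeps only the most recent max_docs entries, in insertion order."""

    def __init__(self, max_docs: int):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        # When the deque is full, appending silently drops the oldest entry.
        self._docs.append(doc)

    def find(self):
        return list(self._docs)


log = CappedLog(max_docs=3)
for n in range(1, 6):
    log.insert({"event": n})
print(log.find())
```

This is what makes capped collections attractive for logs: the newest entries are always kept and old ones are discarded without any explicit clean-up.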
Use cases / production deployments
According to the Use Cases article at the product's web [60], MongoDB is well suited for the following cases:
Archiving and event logging
Document and Content Management Systems. As a document-oriented (JSON) database, MongoDB's
flexible schemas are a good fit for this.
E-Commerce. Several sites are using MongoDB as the core of their e-commerce infrastructure (often in
combination with an RDBMS for the final order processing and accounting).
Gaming. High performance small read/writes are a good fit for MongoDB; also for certain games geospatial
indexes can be helpful.
High volume problems. Problems where a traditional DBMS might be too expensive for the data in question.
In many cases developers would traditionally write custom code to a file system instead, using flat files or other
methodologies.
Mobile. Specifically, the server-side infrastructure of mobile systems. Geospatial is key here.
Operational data store of a web site. MongoDB is very good at real-time inserts, updates, and queries.
Scalability and replication are provided, which are necessary functions for large web sites' real-time data stores.
Specific web use case examples
Projects using iterative/agile development methodologies. Mongo's BSON data format makes it very easy
to store and retrieve data in a document-style / schemaless format. Addition of new properties to existing
objects is easy and does not generally require blocking "ALTER TABLE" style operations.
Real-time stats/analytics
Enterprises who use MongoDB
Among the enterprises who use MongoDB there are: MTV Networks, Craigslist, Disney Interactive Media
Group, Wordnik, Diaspora, Shutterfly, foursquare, bit.ly, The New York Times, SourceForge, Business Insider,
Etsy, CERN LHC, Thumbtack, AppScale, Uber or The Guardian
For a complete list and references on each particular use case visit the article "Production Deployments" on
MongoDB's web [61]
Data manipulation: Collections and Documents
MongoDB stores structured data as JSON-like documents in a format called BSON, with no predefined
schema.
The elements of data are called documents and are stored in collections. One collection may have any number of documents.
Compared to relational databases we could say collections are like tables, and documents are like records. But there is
one big difference: every record in a table has the same number of fields, while documents in a collection could have
completely different fields.
A table of a few records with the fields Last name, First name and Age, and possibly others like Address
or City, could be described as the following MongoDB collection:
{
    "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
    "Last Name": "DUMON",
    "First Name": "Jean",
    "Age": 4(
},
{
    "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
    "Last Name": "PELLERIN",
    "First Name": "Franck",
    "Age": 29,
    "Address":
    {
        "Street" : "1 chemin des Loges",
        "City": "VERSAILLES"
    },
    "City": "VERSAILLES"
}
Documents in a MongoDB collection could have different fields (note: the _id field is obligatory and
automatically created by MongoDB; it's a unique index which identifies the document).
In a document, new fields could be added and existing ones suppressed, modified or renamed at any moment. There is
no predefined schema. A document structure is really simple and composed of key-value pairs, like associative arrays in
programming languages or the JSON format.
The key is the field name, the value is its content. As a value we could use numbers, strings and also binary data like
images or other key-value pairs.
Language Support
MongoDB has official drivers for: C⁶⁰, C++⁶¹, C# / .Net⁶², Erlang⁶³, Haskell⁶⁴, Java⁶⁵, JavaScript⁶⁶, Lisp⁶⁷, Perl⁶⁸,
PHP⁶⁹, Python⁷⁰, Ruby⁷¹ and Scala⁷².
There are also a large number of unofficial drivers for ColdFusion⁷³, Delphi⁷⁴, Lua⁷⁵, node.js⁷⁶, Ruby⁷⁷, Smalltalk⁷⁸
and many others.
60 http://github.com/mongodb/mongo-c-driver
61 http://github.com/mongodb/mongo
62 http://www.mongodb.org/display/DOCS/CSharp+Language+Center
63 https://github.com/TonyGen/mongodb-erlang
64 http://hackage.haskell.org/package/mongoDB
65 http://github.com/mongodb/mongo-java-driver
66 http://www.mongodb.org/display/DOCS/JavaScript+Language+Center
67 https://github.com/fons/cl-mongo
68 http://github.com/mongodb/mongo-perl-driver
69 http://github.com/mongodb/mongo-php-driver
70 http://github.com/mongodb/mongo-python-driver
71 http://github.com/mongodb/mongo-ruby-driver
72 https://github.com/mongodb/casbah
73 http://github.com/virtix/cfmongodb
74 http://code.google.com/p/pebongo/
75 http://code.google.com/p/luamongo/
76 http://www.mongodb.org/display/DOCS/node.JS
77 http://github.com/tmm1/rmongo
78 http://www.squeaksource.com/MongoTalk.html
Management and graphical frontends
MongoDB tools
In a MongoDB installation the following commands are available:
mongo
MongoDB offers an interactive shell called mongo [62], which lets developers view, insert, remove, and
update data in their databases, as well as get replication information, set up sharding, shut down servers,
execute JavaScript, and more.
Administrative information can also be accessed through a web interface, a simple web page that serves
information about the current server status. By default, this interface is 1000 ports above the database port
(28017).
mongostat
mongostat is a command-line tool that displays a simple list of stats about the last second: how many inserts,
updates, removes, queries, and commands were performed, as well as what percentage of the time the database
was locked and how much memory it is using.
mongosniff
mongosniff sniffs network traffic going to and from MongoDB.
Monitoring plugins
There are MongoDB monitoring plugins available for the following network tools: Munin⁷⁹, Ganglia⁸⁰, Cacti⁸¹,
Scout⁸²
Cloud-based monitoring services
MongoDB Monitoring Service⁸³ is a free, cloud-based monitoring and alerting solution for MongoDB deployments
offered by 10gen, the company that develops MongoDB
Web / Desktop Application GUIs
Several GUIs have been created by MongoDB's developer community to help visualize their data. Some popular
ones are:
Open Source tools
RockMongo⁸⁴: PHP based MongoDB administration GUI tool
phpMoAdmin⁸⁵: another PHP GUI that runs entirely from a single 95k self-configuring file
JMongoBrowser⁸⁶: a desktop application for all platforms
Mongo3⁸⁷: a Ruby-based interface
Meclipse⁸⁸: Eclipse plugin for interacting with MongoDB
Proprietary tools
MongoHub⁸⁹: a freeware native Mac OS X application for managing MongoDB
Database Master⁹⁰: development and administration tool for Oracle, SQL Server, MySQL, PostgreSQL,
MongoDB, SQLite ... which allows running SQL, LINQ and JSON queries over databases. Developed by Nucleon
Software⁹¹ for Windows systems
BI Studio: business intelligence and data analysis software which allows designing database reports, charts and
dashboards, from the same company as Database Master
79 http://github.com/erh/mongo-munin
80 http://github.com/quiiver/mongodb-ganglia
81 http://tag1consulting.com/blog/mongodb-cacti-graphs
82 http://scoutapp.com/plugin_urls/291-mongodb-slow-queries
83 http://www.10gen.com/mongodb-monitoring-service
84 http://code.google.com/p/rock-php/wiki/rock_mongo
85 http://www.phpmoadmin.com/
86 http://www.edgytech.com/jmongobrowser/
87 http://mongo3.com/
88 http://update.exoanalytic.com/org.mongodb.meclipse/
89 http://mongohub.todayclose.com/
90 http://www.nucleonsoftware.com
91 http://www.nucleonsoftware.com
Conclusions
To decide whether MongoDB is for us I will emphasize the following:
MongoDB has a query language, which makes getting the data dynamic and flexible
The interface is a custom protocol over TCP/IP, with native drivers for a lot of languages. The use of a
binary protocol makes the operations faster than in others (like CouchDB)
Replication is master-slave ONLY, as with MySQL. If you need multiple masters in a Mongo environment,
you have to set up sharding.
Other considerations I will make are:
the documentation on the product's web page is very good (for example, the Use cases documentation [60])
as said, there is commercial support available.
Interesting Readings
MongoDB Schema Design: How to Think Non-Relational, Jared Rosoff's presentation at YouTube [63]
Real-time Analytics with MongoDB, presentation by Jared Rosoff [64]
Web Analytics using MongoDB, from the book PHP and MongoDB Web Development Beginner's Guide [65]
Looking for a NoSQL solution for our needs
1. Introduction and initial approach
As I said, the main justification we have for looking at NoSQL technologies is log management.
In the company where I work, we have two use cases:
- Analysis of Internet access logs
  Right now these accesses are stored in MySQL tables. We're talking about tens of thousands of users with Internet access and a few logs that record every URL and download size. As several gigabytes are generated daily, the logs are stored in different tables, partitioning the data horizontally by month (a table for January, one for February, etc.)
- Log analysis of geographically distributed Unix servers
  Used for communication with sales offices. We are talking about the logs of the various services (FTP, SSH, Apache, MySQL, file transfer services developed by the company, etc.) of about 1,000 Linux servers
For this analysis we will choose the first one.
2. Analysis of Internet access logs, describing the problem
What we want to achieve is an efficient storage and analysis of the logs of communications made by employees (tens of thousands) with Internet access. At least more efficient than our current solution.
For now it has been decided to divide the data into different tables by month. This decision has been taken for reasons of volume. One immediate consequence is that it is quite complicated to make queries asking for data of different months, as the developer will have to think carefully how to design the queries and the server will have to access multiple tables.
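To make this difficulty concrete, here is a minimal sketch of what a query spanning several months needs when each month lives in its own table. The table naming (access_log_YYYYMM) and the column names (user, size) are assumptions made only for illustration, since the real table definitions are not shown here. The sketch is written in JavaScript and only builds the SQL text; it does not connect to any database.

```javascript
// Hypothetical naming: one table per month, e.g. access_log_201201.
function monthlyTable(year, month) {
  return "access_log_" + year + String(month).padStart(2, "0");
}

// Build one query that sums the traffic per user over several months.
// Every extra month adds another UNION ALL branch, i.e. another table to read.
function trafficPerUserSql(year, months) {
  const branches = months.map(
    (m) => "SELECT user, size FROM " + monthlyTable(year, m)
  );
  return (
    "SELECT user, SUM(size) AS volume FROM (\n  " +
    branches.join("\n  UNION ALL\n  ") +
    "\n) AS all_months GROUP BY user ORDER BY volume DESC"
  );
}

console.log(trafficPerUserSql(2012, [1, 2, 3]));
```

A report for a whole year would need twelve branches, which is the kind of query the developer has to design with care.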
The critical problem here is to handle huge amounts of data. Do we need relations? Yes and no.
If we gain performance we don't mind repeating certain data (the user name, for example).
We will get statistics and reports based on log analysis. Each record will include information such as:
- Date and time of access
- User name
- URL accessed
- Size in bytes of the network transmission
The questions we want to answer are of the following type:
- What are the most visited pages?
- And the most visited per month? And in the last week?
- Which users spend more time online?
- Which are the 100 users with the greatest traffic volume?
- What is the average daily volume of traffic from the corporate network to the Internet?
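As an illustration of the kind of computation behind these questions, the following JavaScript sketch answers two of them over a handful of in-memory records. The records and their values are made up; only the four fields listed above are taken from the text. It shows the logic of "group, total, sort", not the way the production system would do it over millions of entries.

```javascript
// Sample records with the fields listed above (values are made up).
const log = [
  { datetime: "2012-04-01T09:15:00", user: "alice", url: "http://example.org/a", size: 1200 },
  { datetime: "2012-04-01T09:16:00", user: "bob", url: "http://example.org/b", size: 300 },
  { datetime: "2012-04-02T10:00:00", user: "alice", url: "http://example.org/a", size: 800 },
  { datetime: "2012-04-02T11:30:00", user: "carol", url: "http://example.org/c", size: 5000 },
];

// Sum a numeric field (or count hits when field is null), grouped by a key field.
function totalsBy(records, key, field) {
  const totals = new Map();
  for (const r of records) {
    const add = field === null ? 1 : r[field];
    totals.set(r[key], (totals.get(r[key]) || 0) + add);
  }
  return totals;
}

// Return the n entries with the largest totals, biggest first.
function top(totals, n) {
  return [...totals.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

console.log(top(totalsBy(log, "user", "size"), 2)); // users with most traffic
console.log(top(totalsBy(log, "url", null), 1));    // most visited page
```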
Data size estimation
The size and number of records of the data stored each month are:
- Data size: between 150 and 300 GB
- Number of log entries: between 650 million and 1,700 million
This is for a user population of around 70,000 who access 1.5 million domains.
So, given the current traffic, in a year we could reach a stored volume of 3,600 GB for 20 billion entries in the log.
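The yearly figures follow from multiplying the monthly upper bounds by twelve. A short JavaScript check of that arithmetic, using only the numbers given above:

```javascript
// Monthly upper bounds taken from the figures above.
const maxGbPerMonth = 300;
const maxEntriesPerMonth = 1700e6; // 1,700 million

const gbPerYear = 12 * maxGbPerMonth;
const entriesPerYear = 12 * maxEntriesPerMonth;

console.log(gbPerYear + " GB per year");
console.log((entriesPerYear / 1e9).toFixed(1) + " billion entries per year");
```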
3. Oh my God! We have a lot of options!
Well, well, well ....
As said in the second chapter, NoSQL, State of the Question, we have a lot of NoSQL Database Management Systems and for a beginner it seems difficult to know where to begin.
After reading a lot on the subject (see the reading recommendations at the end of this section) the best known Open Source NoSQL DBMS are: MongoDB (note 92), CouchDB (note 93), Cassandra (note 94), Membase (note 95), Redis (note 96), Riak (note 97), Neo4J (note 98), FlockDB (note 99) and HBase (note 100), among others.
The first thing I would recommend is a quick read of the NoSQL article on the English Wikipedia (note 101) (with contributions from myself :-)) and of the first article in the series Picking the Right NoSQL Database Tool (note 102).
A quick summary (as a reminder; it is explained in more detail in the second chapter) would be that, depending on the way data is organized, NoSQL databases are divided into ...
Document oriented
  Such as MongoDB or CouchDB. Data is stored in structured formats (records) such as JSON. Each data unit is called a document (here this word has nothing to do with a file type).
Key-value
  Such as Cassandra, Membase, Redis or Riak. Data is stored in key-value pairs (a value might be an object)
Graph oriented
  Such as Neo4J or FlockDB. They store the elements and their relationships in a graph style (for social networks, transport networks, road maps or network topologies, for example)
Tabular
  Such as Cassandra or HBase. Data is stored in rows with several columns that correspond to a key, with a result similar to a table
92 http://www.mongodb.org/
93 http://couchdb.apache.org/
94 http://cassandra.apache.org
95 http://www.couchbase.com/membase
96 http://redis.io/
97 http://wiki.basho.com/
98 http://neo4j.org/
99 http://github.com/twitter/flockdb
100 http://hbase.apache.org/
101 http://en.wikipedia.org/wiki/NoSQL
102 http://blog.monitis.com/index.php/2011/05/22/picking-the-right-nosql-database-tool/
4. Questions we should answer before making a choice
There is no one-size-fits-all NoSQL solution, as this term applies to a wide range of database management systems. It's perfectly possible and reasonable to use several systems at the same time (a relational DBMS such as MySQL and one or more NoSQL DBMS), depending on the type of data to store and to query.
In the end the choice will depend on the nature of the problem we want to solve.
A NoSQL solution does not replace a relational database; it complements it for the kind of problems where relational DBMS do not perform well enough. Unless, of course, we're using a SQL database for the wrong problem.
We should be able to answer the following questions before looking for a product:
- What type of data will be handled? Could this data be naturally organized in associative arrays? Or in key-value pairs? Is it data which will fit in an XML or similar structure?
- Do we need transactions?
- Do we need to use Map Reduce?
And when reviewing the different options:
- Is the latest version considered stable?
- Does it have commercial support?
- What is the learning curve?
- Is good documentation available? Is there an active community?
5. Description of the data for an Internet Access Log management system
Description of the problem
As said, it will be the kind of data we manage, its structure and the nature of the problem which will lead us to one or another NoSQL solution.
The data we want to manage are access logs generated by several HTTP proxies of the company for several tens of thousands of users.
We have two different types of records: records from FTP access and from the rest (mainly HTTP):
- For each FTP access we will save
  - IP of the host that makes the request
  - date and time of access
  - Internet domain accessed
  - URI
  - size of the transfer
- For each non-FTP access:
  - IP of the host that makes the request
  - the user id
  - date and time of access
  - the HTTP method used
  - protocol
  - Internet domain accessed
  - URI
  - HTTP return code
  - size of the transfer
Besides storing the data, the following statistical reports will be created:
- Number of hits and volume of data transferred by Internet domain, daily and monthly
- Number of hits and volume of data transferred by user, daily and monthly
Definition of our needs
So we can reach the following first definition of our needs:
- each data entry could be represented as an associative array
- each record is unrelated to the others
- each entry is stored in a log table that grows indefinitely
- accesses to the database are mostly writes
- each access means a change in the statistical values which reflect daily and monthly access by domain and user
- the list of queries sent by our application is known (anyway, the schema should be defined so that new ones can easily be added)
This leads us to the following conclusions:
- The data are records with multiple fields, so we need a document-oriented or a tabular database (multiple columns for a record)
- Map Reduce is desired. To have reports in real time, each access will update the daily and monthly statistics for domain and user
- We don't need master-master replication (proxies in different geographic areas manage accesses from different users)
- We don't need support for multiple versions (there is no such thing in a log)
- We don't need strict data consistency
- We don't need transactions (data will be added one entry after another, in isolation)
And also the product chosen must be:
- Open Source
- Ready for production environments (stable)
- With professional support
Not bad.
6. Choosing between several NoSQL products
If we discard the databases that hold data in memory, as key-value pairs or as graphs, we are left with the following options: MongoDB, CouchDB, Cassandra, HBase and Riak.
To what is said in all the documents I have read I will add the following thoughts (see also the Interesting readings sections of the different chapters):
MongoDB (note 103)
PROS:
- It's a document-oriented database, therefore very flexible in structuring the data (uses JSON)
- It has a dynamic query language
- It has professional support from the company that developed the product, 10gen (note 104)
- It has a large and active community (present at conferences; I have seen them at FOSDEM (note 105) in Brussels this year)
- It has support for Map Reduce
- It's a mature product, considered production ready (current version is 2.2)
- The documentation on their website is really good
- There are native drivers for multiple languages made by 10gen
CONS:
- Although it is not difficult to install and run, MongoDB is not a simple product. An installation of MongoDB has several types of services: data servers, configuration servers and servers that route the client requests
- Replication is only master-slave
103 http://www.mongodb.org/
104 http://www.10gen.com/
105 https://fosdem.org
CouchDB (note 106)
PROS:
- It is a document-oriented database, so very flexible in structuring the data
- It has a concurrent versions system (multi-version concurrency control)
- It has master-master replication
CONS:
- To achieve versioning, data is never modified; each time a modification is made a new version is added. This takes a lot of disk space, and batch processes are necessary for data compaction
- It is not very mature. The latest version is 1.2.0 and has changes that make it incompatible with the previous versions
- To exploit the data it is necessary to define views, which means that queries must be defined in advance (not very flexible)
Riak (note 107)
PROS:
- It is a hybrid database, which stores documents and key-value pairs
- There is no central controller and therefore no single point of failure
- It has support for Map-Reduce
- It has support for transactions
CONS:
- It has two versions, an open source one and a commercial one with multi-site replication
Cassandra (note 108)
PROS:
- It is a top-level Apache project
- It is a tabular database (can store multiple key columns), making it flexible and valid for our case
- Designed for situations where there are more writes than reads. It scales very well in these cases (ideal for log analysis)
- Designed to replicate data between multiple data centers
- Provides integration with Hadoop (note 109) for Map Reduce
- The consistency level is configurable
- There is no central controller and therefore no single point of failure
- It has support for transactions (with ZooKeeper (note 110))
CONS:
- Maybe too complex
HBase (note 111)
PROS:
- It is an Apache project (subproject of Hadoop)
- Similar to Cassandra (can store multiple key columns)
- Provides integration with Hadoop for Map Reduce
CONS:
- Too complex
106 http://couchdb.apache.org/
107 http://wiki.basho.com/
108 http://cassandra.apache.org/
109 http://hadoop.apache.org/
7. And the Winner is ..... MongoDB!
After reading everything I opted for MongoDB, mainly because:
- it meets all the requirements stated at the beginning: document-oriented, with Map-Reduce, Open Source, stable and professionally supported
- support is given by the same company that developed the product, 10gen, which clearly knows it well :-)
- it has a complete website with extensive documentation
- they are very active and present at many conferences and lectures (as seen in the events page (note 112) of their website)
- comparatively, this product does not seem too complex
As this will be the first deployment of a NoSQL database in the company, made by someone with no previous experience, I consider the availability of documentation and comprehensive guides vital.
In particular, and for our use case, I will highlight the following articles from their website:
- MongoDB is Fantastic for Logging [66]
- Using MongoDB for Real-time Analytics [67]
There is a good collection of interesting presentations on 10gen's website (note 113).
110 http://zookeeper.apache.org/
111 http://hbase.apache.org/
112 http://www.mongodb.org/display/DOCS/Events
113 http://www.10gen.com/presentations
8. What we will do from here ....
Once the NoSQL management system has been chosen, what we have to do from here is:
1. Install and configure MongoDB
2. Design the MongoDB database schema for Internet Access Logs
3. Develop some code which will allow us to use the schema from MongoDB or MySQL transparently to applications
4. Make some performance tests under the same conditions to verify that using MongoDB is not only more flexible than MySQL, but also a better idea from a performance point of view
There are other alternatives that seem equally interesting and valid, such as Cassandra or Riak, and that could be interesting to test in a further study.
9. Interesting readings
- Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison, from Kristof Kovacs' blog [2]
- Picking the Right NoSQL Database Tool: post from Monitis' blog [68]
- "Consistency Models in Non-Relational Databases" by Guy Harrison: a good explanation of the CAP theorem, eventual consistency and how consistency problems can be handled in distributed environments [8]
Installation of MongoDB
In this chapter I will describe the technical details for deploying a MongoDB 2.2.0 server on the Linux machines of PSA (the Linux distribution is Suse Enterprise Linux Server).
PSA has developed an in-house software distribution system (they use neither .deb or .rpm packages nor a known standard for remote installs). The software installed on their servers must follow a strict directory structure.
For reasons of confidentiality, in this article we will assume that MongoDB will be installed under /opt/mongodb-2.2.0.
1. Deploying MongoDB binaries
After downloading the legacy static 64 bits version (version 2.2.0 as of September 2012) from the MongoDB Downloads page (note 114) the following commands have been executed:
    tar -xzvf mongodb-linux-x86_64-static-legacy-2.2.0.tgz
    mkdir /opt/mongodb-2.2.0
    mkdir /opt/mongodb-2.2.0/bin
    mkdir /opt/mongodb-2.2.0/docs
    mkdir /opt/mongodb-2.2.0/etc
    mkdir /opt/mongodb-2.2.0/logs
    mkdir /opt/mongodb-2.2.0/data
    mkdir /opt/mongodb-2.2.0/modules
    mkdir /opt/mongodb-2.2.0/tmp
    # Move the contents of bin (the target directory already exists)
    mv mongodb-linux-x86_64-static-legacy-2.2.0/bin/* /opt/mongodb-2.2.0/bin
    # The remaining files are documentation
    mv mongodb-linux-x86_64-static-legacy-2.2.0/* /opt/mongodb-2.2.0/docs
    chown -Rh root:root /opt/mongodb-2.2.0
    chown -Rh nobody:nobody /opt/mongodb-2.2.0/data
    chmod -R a+rX /opt/mongodb-2.2.0
2. Compiling and installing the PHP driver
To install mongo's PHP driver we use the pecl command, which gives access to the PECL repository for PHP extensions (note 115):
    pecl download mongo
    tar -xvf mongo-1.2.12.tar
    cd mongo-1.2.12
    phpize
    ./configure
    make
To deploy the PHP module in an Apache installation we will have to copy mongo.so into the extensions directory.
Last, we will have to add the following lines to the php.ini PHP configuration file:
    ; Driver for MongoDB
    extension=mongo.so
114 http://www.mongodb.org/downloads
115 http://pecl.php.net/
3. Compiling and installing PyMongo, the Python driver for MongoDB
Installing python-devel RPM
To add new Python modules we first need to install the python-devel Linux package (and its dependencies). On a PSA SLES 10 Linux server this means:
    rpm -i tk-8.4.12-14.12.x86_64.rpm
    rpm -i blt-2.4z-222.2.x86_64.rpm
    rpm -i python-tk-2.4.2-18.25.x86_64.rpm
    rpm -i python-devel-2.4.2-18.25.x86_64.rpm
Installing PyMongo
To install the Python driver under /opt/mongodb-2.2.0/modules/python2.4 (without affecting the system's Python) we first need to make a virtual Python installation, following the instructions on the virtualenv page at the Python Package Index (note 116).
Once the script "virtualenv.py" is downloaded, the commands for creating the virtual Python environment and installing pymongo, the Python driver for MongoDB, are:
    mkdir -p /opt/mongodb-2.2.0/modules/python2.4
    python virtualenv.py /opt/mongodb-2.2.0/modules/python2.4
    source /opt/mongodb-2.2.0/modules/python2.4/bin/activate
    pip install pymongo
    chown -Rh root:root /opt/mongodb-2.2.0/modules/python2.4
To use this environment, developers will have to set the following line as the first one in their Python scripts:
    #!/opt/mongodb-2.2.0/modules/python2.4/bin/python
4. Configuring the server
In addition to accepting command line parameters, MongoDB can also be configured using a configuration file (/opt/mongodb-2.2.0/etc/mongod.conf). The configuration included in our installation is the following:
    dbpath = /opt/mongodb-2.2.0/data
    logpath = /opt/mongodb-2.2.0/logs/mongod.logs
    logappend = true
    unixSocketPrefix = /opt/mongodb-2.2.0/tmp
    #verbose = true
    rest = true
    fork = true
    directoryperdb = true
In this configuration the most remarkable element is the directoryperdb directive, which means that we are going to use a different directory for each database, to make it easier to develop backup scripts later.
116 http://pypi.python.org/pypi/virtualenv
5. Installing RockMongo, a PHP based administration tool
There are a few front-ends for querying and managing MongoDB (a good list can be found in the MongoDB article of the English Wikipedia (note 117)).
After testing a few, the one I have found interesting is the PHP web administration tool "RockMongo" (note 118).
To install and deploy it we will download the file rockmongo-v1.1.2.zip from RockMongo's web page (note 119) and ...
    mkdir -p /opt/mongodb-2.2.0/web/html
    mkdir /opt/mongodb-2.2.0/logs/apache
    unzip rockmongo-v1.1.2.zip
    cd rockmongo
    mv * /opt/mongodb-2.2.0/web/html
Once the files are deployed we will have to configure Apache. An example of a name-based Virtual Host with the (fake) URL myrockmongo.ciges.net could be:
    <VirtualHost *:80>
        DocumentRoot "/opt/mongodb-2.2.0/web/html"
        DirectoryIndex index.php index.html
        ServerName myrockmongo.ciges.net
        ErrorLog /opt/mongodb-2.2.0/logs/apache/error_mng.log
        CustomLog /opt/mongodb-2.2.0/logs/apache/access_mng common
        <IfModule mod_php5.c>
            php_admin_flag safe_mode Off
        </IfModule>
    </VirtualHost>
After logging in with the default user admin and password admin we should see RockMongo's main screen.
117 http://en.wikipedia.org/wiki/MongoDB#Management_and_graphical_front-ends
118 http://code.google.com/p/rock-php/wiki/rock_mongo
119 http://rockmongo.com/?action=downloads
6. Authentication in MongoDB
The current version supports only basic security. We authenticate with a user name and password. By default a normal user has full read and write access to the database, so it's important to change this in the first connection.
We have to use a database called admin, which will contain the users who can have administration rights. The users created on this database will have administration rights on all databases.
Creation of a user with administrator rights
To create an administrator user called administrator the commands will be:
    > use admin
    > db.addUser("administrator", "administrator")
To check if the user was created properly:
    > show users
    {
        "_id" : ObjectId("4faa6a93416734d9996ef18d"),
        "user" : "administrator",
        "readOnly" : false,
        "pwd" : "819b3a97e0f6de9ca0b079460c9ea88a"
    }
To log in with the new user we have to use the client with the following parameters:
    mongo -u administrator -p administrator admin
Creation of a user with read only rights
We can have admin users with read only rights or users with read only rights for a specific database.
In this example we will create a read only user test with access to all the databases:
    > use admin
    > db.addUser("test", "test", true)
Users with access only to a database
This kind of user only has rights on its own database. To create one we need to log in as admin and connect to the database where we want to create the new user:
    > use test
    > db.addUser("test", "testpassword")
Activation of authentication support
We have two options to activate authentication support in MongoDB:
- We can start the mongod server with the option --auth
- We can add the following line to the configuration file mongod.conf
    auth = true
Activation of authentication support in RockMongo web interface
Finally, to activate user recognition in RockMongo we need to set the following variables in the config.php PHP file:
    $MONGO["servers"][$i]["mongo_auth"] = true;
    $MONGO["servers"][$i]["control_auth"] = false;
7. Developed scripts for starting, stopping and verifying MongoDB status
I have developed three shell scripts for starting, stopping and verifying the MongoDB status. Each one of these scripts reads common configuration and code from a file called profile. The main features are:
- The server processes will be run by user nobody
- To verify that MongoDB is running, the system is searched for a process owned by the nobody user with the id saved in the PID file
- The start and stop scripts write to a log each time they are run
- These scripts must be run as root; it is the starting script which runs the server as the configured user, using the command su
- To avoid problems with memory usage by MongoDB we must tell the operating system that the memory size of the process should be unlimited, using ulimit -v unlimited before starting it
Due to these last two reasons the daemon is started with the following line:
    su nobody -c "ulimit -v unlimited; /opt/mongodb-2.2.0/bin/mongod -f /opt/mongodb-2.2.0/etc/mongod.conf"
Configuration and common code
    #!/bin/ksh
    # Installation paths
    mng_binpath="/opt/mongodb-2.2.0"
    mng_datapath="/opt/mongodb-2.2.0"
    mng_dbpath="/opt/mongodb-2.2.0/data"
    mng_pidfile="$mng_dbpath/mongod.lock"
    mng_logpath="$mng_datapath/logs"
    mng_log="$mng_logpath/mongod.log"
    # Daemon properties
    mng_daemon="$mng_binpath/bin/mongod"
    mng_user="nobody"
    # CONSTANTS
    FMT_INFO="\t%s\n"
    FMT_ERROR="\tERROR: %s!\n"
    # FUNCTIONS
    # Returns 0 if a mongod process owned by $mng_user is running
    function mongod_process_running {
        /bin/ps -fu "$mng_user" | grep -v grep | grep -q "$mng_daemon"
    }
    # Returns 0 if the process whose id is saved in the PID file is running
    function mongod_started {
        if [ -e "$mng_pidfile" ]; then
            if [ -s "$mng_pidfile" ]; then
                mng_pid=`cat "$mng_pidfile"`
                /bin/ps -fu "$mng_user" | grep -v grep | grep "$mng_daemon" | grep -q "$mng_pid"
            else
                return 1
            fi
        else
            if mongod_process_running; then
                printf "$FMT_ERROR" "MongoDB instance running but no pid file found on \"$mng_pidfile\""
                exit 1
            else
                return 1
            fi
        fi
    }
Script to start the server
    #!/bin/ksh
    . /soft/mnd220/fileso/profile
    # Maximum time in seconds after starting the daemon and before returning an error
    MAX_TIME=60
    # This script must be run by root
    if [ `id -u` != 0 ]; then
        printf "$FMT_ERROR" "This script must be run by root"
        exit 1
    fi
    # Log file
    script_log="$mng_logpath/mnd_start.log"
    if ! [ -e "$script_log" ]; then
        touch "$script_log"
        chown $mng_user "$script_log"
    fi
    printf "\n$FMT_INFO\n" "Starting MongoDB Server at `date`" | tee -a $script_log
    if ! mongod_started; then
        su $mng_user -c "ulimit -v unlimited; $mng_daemon -f $mng_binpath/etc/mongod.conf | tee -a $script_log"
        sleep 1
        i=1
        # Wait until the server is up or MAX_TIME seconds have passed
        while [ $i -lt $MAX_TIME ] && ! mongod_started; do
            sleep 1
            i=$((i+1))
        done
        if ! mongod_started; then
            printf "$FMT_ERROR" "MongoDB server could not be started" | tee -a $script_log
        else
            printf "$FMT_INFO" "MongoDB server started ok and running" | tee -a $script_log
        fi
    else
        printf "$FMT_ERROR" "MongoDB server is already running" | tee -a $script_log
        exit 1
    fi
Script to stop the server
    #!/bin/ksh
    . /soft/mnd220/fileso/profile
    # Maximum time in seconds after stopping the daemon and before returning an error
    MAX_TIME=$((3*60))
    # This script must be run by root
    if [ `id -u` != 0 ]; then
        printf "$FMT_ERROR" "This script must be run by root"
        exit 1
    fi
    # Log file
    script_log="$mng_logpath/mnd_stop.log"
    if ! [ -e "$script_log" ]; then
        touch "$script_log"
        chown $mng_user "$script_log"
    fi
    printf "\n$FMT_INFO\n" "Stopping MongoDB Server at `date`" | tee -a $script_log
    if mongod_started; then
        su $mng_user -c "$mng_daemon -f $mng_binpath/etc/mongod.conf --shutdown | tee -a $script_log"
        sleep 1
        i=1
        # Wait until the server is down or MAX_TIME seconds have passed
        while [ $i -lt $MAX_TIME ] && mongod_started; do
            sleep 1
            i=$((i+1))
        done
        if mongod_started; then
            printf "$FMT_ERROR" "MongoDB server could not be stopped" | tee -a $script_log
        else
            printf "$FMT_INFO" "MongoDB server stopped" | tee -a $script_log
        fi
    else
        printf "$FMT_ERROR" "MongoDB server is not running" | tee -a $script_log
        exit 1
    fi
Script to verify the status
    #!/bin/ksh
    . /soft/mnd220/fileso/profile
    if mongod_started; then
        printf "$FMT_INFO" "MongoDB server is running"
        exit 0
    else
        printf "$FMT_INFO" "MongoDB server is NOT running"
        exit 1
    fi
NoSQL Schema Design for Internet Access Logs
1. Analysis of logs with NoSQL
The analysis of logs (in real time while the data is being received, or processing data already stored) is the type of problem for which NoSQL solutions are particularly suitable. We have a great (or even huge) amount of data that increases without end, and where the relationships are not really important (we don't need to normalise the elements of data).
In this article I will explain the design of the schema chosen for an equivalent solution implemented with MySQL and with MongoDB.
2. Description of an equivalent MySQL database
For our comparative tests I have defined the following MySQL tables:
Access Logs
The FTP connections are stored in a different table from the non-FTP ones (mostly HTTP).
Reports by month
Two totals are stored each month per domain and user: number of accesses (hits) and volume in bytes downloaded. A report is made per month, which means we have two tables for each month: one with the users' information and a second one with the domains' information.
3. Defining a schema for MongoDB NoSQL
We have the following elements to manage:
- Users
- Internet domains
- The non-FTP accesses (mainly HTTP)
- The accesses using FTP
In MongoDB data is grouped into collections (equivalent to tables) and each element of data is called a document (equivalent to a record). Unlike in relational databases, each document can have a different number of fields, and can also contain other documents.
In MongoDB it is not necessary to define the fields and the type of each field. The collection, if needed, and the structure of a document are created by the server at the time of saving the data.
We will work with the following collections:
- Two collections for the access log, one for each access log table
- Totals (number of hits and data transferred) will be calculated in real time using MongoDB functions (note 120). For each month we will have two collections: one by user and a second one by domain (so in a year we will have 24 collections with aggregation data)
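The count of 24 comes from two collections (users and domains) for each of the twelve months. A minimal JavaScript sketch of such a scheme follows; the collection names (Users_YYYYMM, Domains_YYYYMM) are an assumption made for illustration, as the text does not state how the collections are named.

```javascript
// Hypothetical naming scheme: <Kind>_<year><month>, e.g. "Users_201204".
function reportCollections(year, month) {
  const suffix = year + String(month).padStart(2, "0");
  return { users: "Users_" + suffix, domains: "Domains_" + suffix };
}

// All aggregation collections needed for one year.
function collectionsForYear(year) {
  const names = [];
  for (let month = 1; month <= 12; month++) {
    const c = reportCollections(year, month);
    names.push(c.users, c.domains);
  }
  return names;
}

console.log(collectionsForYear(2012).length); // 24
```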
Therefore, the structure of each collection is (shown in pseudocode) ...
NON FTP Connections Log
    {
        "userip": string,
        "user": string,
        "datetime": Date,
        "method": string,
        "protocol": string,
        "domain": string,
        "uri": string,
        "return_code": integer,
        "size": integer
    }
FTP Connections Log
    {
        "userip": string,
        "user": string,
        "datetime": Date,
        "method": string,
        "domain": string,
        "uri": string,
        "size": integer
    }
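To see the two structures with concrete values, the following JavaScript sketch builds one entry of each kind and checks it against the field list of the pseudocode. The field names and types are taken from the pseudocode; the values and the helper shapeErrors are made up for illustration (MongoDB itself does not require such a check, since it accepts documents of any shape).

```javascript
// Field names and types follow the pseudocode above; the values are made up.
const NON_FTP_FIELDS = {
  userip: "string", user: "string", datetime: "date", method: "string",
  protocol: "string", domain: "string", uri: "string",
  return_code: "integer", size: "integer",
};
const FTP_FIELDS = {
  userip: "string", user: "string", datetime: "date", method: "string",
  domain: "string", uri: "string", size: "integer",
};

// Return the list of fields that are missing or have the wrong type.
function shapeErrors(doc, fields) {
  const ok = {
    string: (v) => typeof v === "string",
    integer: (v) => Number.isInteger(v),
    date: (v) => v instanceof Date,
  };
  return Object.keys(fields).filter((name) => !ok[fields[name]](doc[name]));
}

const httpEntry = {
  userip: "10.0.0.15", user: "jsmith", datetime: new Date(2012, 3, 1, 9, 15),
  method: "GET", protocol: "HTTP", domain: "example.org", uri: "/index.html",
  return_code: 200, size: 5120,
};
const ftpEntry = {
  userip: "10.0.0.15", user: "jsmith", datetime: new Date(2012, 3, 1, 9, 20),
  method: "RETR", domain: "example.org", uri: "/pub/file.zip", size: 204800,
};

console.log(shapeErrors(httpEntry, NON_FTP_FIELDS)); // no errors expected
console.log(shapeErrors(ftpEntry, FTP_FIELDS));      // no errors expected
console.log(shapeErrors(ftpEntry, NON_FTP_FIELDS));  // fields an FTP entry lacks
```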
120 http://www.mongodb.org/display/DOCS/Updating
Totals calculated by user
As said, for monthly reports we will work with two collections: a collection for users and one for domains, with totals by month and day.
For each year and month we will have a collection.
Each collection will have one document per user. In addition to the user identifier, the number of visits and the volume of bytes transferred will be stored by month and by day. Daily totals will be stored as a subdocument (within the document corresponding to the user).
These totals will be updated in real time, as log data is being processed. That is, each time information is received from a visit, a new record will be created in the log (FTP or non-FTP), the number of visits will be incremented by one and the size transferred will be added to the total volume for the day and user in the collection of the corresponding month. The totals by month will be updated too.
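The update described above can be sketched in JavaScript as follows. It assumes that the totals are maintained with MongoDB's $inc update operator and dot notation for the daily subdocument, applied as an upsert; the text only says that MongoDB update functions are used, so this is one plausible reading, not necessarily the author's exact code. The field names ("Nb", "Volume", "Daily") and the numbering of days from 0 follow the document structure of this section. The function applyUpsert is a small in-memory stand-in, written only to show the effect without a server.

```javascript
// Build the update for one access. "$inc" with dot notation is MongoDB's
// update operator; the key names follow the structure used in this section.
function totalsUpdate(entry) {
  const day = entry.datetime.getDate() - 1; // days are numbered from 0
  const inc = { Nb: 1, Volume: entry.size };
  inc["Daily." + day + ".Nb"] = 1;
  inc["Daily." + day + ".Volume"] = entry.size;
  return { filter: { _id: entry.user }, update: { $inc: inc } };
}

// Tiny in-memory stand-in for an upsert, only to show the effect of $inc.
function applyUpsert(collection, { filter, update }) {
  const doc = collection[filter._id] || (collection[filter._id] = { _id: filter._id });
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const k of keys.slice(0, -1)) node = node[k] || (node[k] = {});
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const usersApril = {};
applyUpsert(usersApril, totalsUpdate({ user: "jsmith", datetime: new Date(2012, 3, 1), size: 500 }));
applyUpsert(usersApril, totalsUpdate({ user: "jsmith", datetime: new Date(2012, 3, 2), size: 250 }));
console.log(JSON.stringify(usersApril.jsmith, null, 2));
```

With a real server, the filter and update objects would be passed to the driver's update call with the upsert option enabled, so that the document for a user is created on the first access of the month.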
    {
        "_id": "Userid",
        "Nb": integer,
        "Volume": integer,
        "Daily": {
            "0": {
                "Nb": integer,
                "Volume": integer
            },
            "1": {
                "Nb": integer,
                "Volume": integer
            },
            "2": {
                "Nb": integer,
                "Volume": integer
            },
            "3": {
                "Nb": integer,
                "Volume": integer
            },
            ....
            "30": {
                "Nb": integer,
                "Volume": integer
            }
        }
    }
Totals calculated by Domain
This works exactly as for users, but each document corresponds to a domain instead of a user name.
4. Interesting articles and presentations
To understand how to design a NoSQL schema the following presentations have been very useful:
- Real-Time Analytics Schema Design and Optimization, by Ryan Nitz [69]
- MongoDB for Analytics, by John Nunemaker [70]
- Real Time Analytics with MongoDB Webinar, by Jared Rosoff [71]
From a broader perspective, considering the architecture that MongoDB would be part of, the following two links are also interesting:
- Real-Time Log Collection with Fluentd and MongoDB, article from the Treasure Data company (note 121), explaining how to use Fluentd (note 122) for real-time processing of server logs and storing them in MongoDB [72]
- Social Data and Log Analysis Using MongoDB, by Takahiro Inoue, describing the architecture deployed for a social game company using MongoDB, Hadoop and Cassandra [73]
121 http://treasure-data.com/
122 http://fluentd.org/
Comparison of a MySQL based solution versus MongoDB
1. Work plan for the performance tests
Now that we have the following elements ready:
- A package for the installation of MySQL 5.0.26
- A package for the installation of MongoDB 2.2.0
- A schema design for MySQL and the equivalent for MongoDB
it's time to make tests with data. As we can't use real data for confidentiality reasons we will have to:
- Develop code to fill the MySQL and MongoDB databases with "fake but realistic" data. This code will have classes with a common interface that will allow applications to use data from MySQL or MongoDB with no code changes
- Define a battery of tests to compare the performance of both solutions
- Run the tests and draw conclusions
2. PHP Development for testing the database model
The first thing to do is to fill the databases with a big volume of realistic data. So initially I have developed some code to create a volume of millions of log entries. For our tests we have created:
- 70,000 visiting users
- 70,000 visiting IPs
- 1,300,000 visited Internet domains
- 90,000,000 non-FTP log entries
- 4,500,000 FTP log entries
To create each log entry we will have to generate random data for the different elements: Internet domain, FTP method, HTTP method, IP, protocol, return code, size and user.
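The generation of one random entry can be sketched as follows. This is a JavaScript illustration of the idea, not the PHP code of the project: the small value pools, the function names (makeRandom, pick, randomNonFtpEntry) and the seeded generator are assumptions introduced here, the latter so that a run can be repeated with the same data.

```javascript
// Small deterministic pseudo-random generator (linear congruential), so that
// runs are repeatable. It is not meant to be statistically strong.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

const pick = (rand, list) => list[Math.floor(rand() * list.length)];

// Made-up value pools; a real run would draw from the generated lists.
const METHODS = ["GET", "POST", "HEAD"];
const CODES = [200, 301, 404, 500];
const DOMAINS = ["example.org", "example.com", "example.net"];
const USERS = ["user1", "user2", "user3"];

// Build one random non-FTP log entry dated between start and end.
function randomNonFtpEntry(rand, start, end) {
  const ip = [10, 0, 0, 0]
    .map((b, i) => (i === 0 ? b : Math.floor(rand() * 256)))
    .join(".");
  return {
    userip: ip,
    user: pick(rand, USERS),
    datetime: new Date(start.getTime() + Math.floor(rand() * (end - start))),
    method: pick(rand, METHODS),
    protocol: "HTTP",
    domain: pick(rand, DOMAINS),
    uri: "/page" + Math.floor(rand() * 1000),
    return_code: pick(rand, CODES),
    size: Math.floor(rand() * 1e6),
  };
}

console.log(randomNonFtpEntry(makeRandom(42), new Date(2012, 3, 1), new Date(2012, 3, 30)));
```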
So, I have developed three PHP classes:
- "RandomElements" class: with functions like getRandomDomain(), getRandomFTPMethod(), etc., which are used to generate the random elements of data
- "MongoRandomElements" and "MySQLRandomElements" classes, which are child classes of the previous one and add functions to work with each database management system. They have functions to:
  - Save a random user in the database
  - Create lists of random domains, IPs and users and save them in tables/collections
  - Delete tables/collections
  - Verify if a user exists in the list of random users
  - Send a query that returns one single value and return it
  - Send a query that returns a group of records and return them as an array
  - Get the number of users created
  - Get one random user/domain/IP from the list of created users/domains/IPs
  - Create a new FTP log entry, getting the (random) elements needed, and save it into the database
  - Create a new non-FTP log entry, getting the (random) elements needed, and save it into the database
  - ...
The interface of these two classes is the same, so they can be used with the same code, making the scripts which use them agnostic to the type of database used.
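The design can be illustrated with a reduced JavaScript sketch: a base class that generates data and two subclasses that expose the same methods over different storage. The two subclasses here are in-memory stand-ins with hypothetical names (DocumentStoreElements, TableStoreElements) and only two methods; the real classes are written in PHP and talk to MongoDB and MySQL.

```javascript
// Base class: generation of random data, independent of any database.
class RandomElements {
  getRandomUser() {
    return "user" + Math.floor(Math.random() * 70000);
  }
}

// Two stand-in back ends. Real subclasses would talk to MongoDB or MySQL;
// here each one keeps data in memory in a different internal layout.
class DocumentStoreElements extends RandomElements {
  constructor() {
    super();
    this.users = [];
  }
  saveUser(name) {
    this.users.push({ _id: name });
  }
  getUserNumber() {
    return this.users.length;
  }
}

class TableStoreElements extends RandomElements {
  constructor() {
    super();
    this.rows = new Map();
  }
  saveUser(name) {
    this.rows.set(this.rows.size + 1, name);
  }
  getUserNumber() {
    return this.rows.size;
  }
}

// Database-agnostic script: it only relies on the shared interface.
function createUsers(store, n) {
  for (let i = 0; i < n; i++) store.saveUser(store.getRandomUser());
  return store.getUserNumber();
}

for (const store of [new DocumentStoreElements(), new TableStoreElements()]) {
  console.log(store.constructor.name, createUsers(store, 100));
}
```

The function createUsers never needs to know which back end it received, which is the property the test scripts depend on.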
The %M5 diagram of this classes will e=
:se code eample
.n e4ample of "sing this classes co"ld e the following PCP code that, starting from an empt! dataase, creates
)7*777 random "sers, )7*777 random #Ps, 8*E77*777 random domains and after having this elements, generate E7
Page 51
millions of random log entries for the month of .pril and save them in the collection for 9on FTP log entries=
$mre = new MongoRandomElements();
$mre->createUsers(70000);
$mre->createIPs(70000);
$mre->createDomains(1300000);
// Example data for April
$start = mktime(0,0,0,4,1,2012);
$end = mktime(23,59,0,4,30,2012);
for ($i = 0; $i < 30000000; $i++) {
    $log = $mre->getRandomNonFTPLogEntry($start, $end);
    $mre->saveRandomNonFTPLogEntry($log);
}
for ($i = 0; $i < 1500000; $i++) {
    $log = $mre->getRandomFTPLogEntry($start, $end);
    $mre->saveRandomFTPLogEntry($log);
}
The code for making the same operations on MySQL is exactly the same, except that we should replace the first line
$mre = new MongoRandomElements();
by
$mre = new MySQLRandomElements();
Source code available at Github and ciges.net
The source code and the test scripts which use these classes are available at "Ciges / internet_access_control_demo" on Github [123]. It has also been documented with phpDocumentor [124], and the documentation is available on my web page www.ciges.net [125].
3. Testing MongoDB vs MySQL performance
As said in the first chapter, the tests will be run on one machine with the following hardware specifications:
RAM: 10 GB
Cores: 32 (AMD Opteron Processor 6128)
The comparison will be made between
MongoDB 2.2.0 (and 2.2.0-rc0, as the tests had begun before the final stable version was made available)
MySQL 5.0.26 (with MyISAM tables)
The goal is to have an objective measure of the performance of both solutions for a list of equivalent tests. The tests
developed are grouped in:
Insertion tests
Multiuser concurrent read tests
123 https://github.com/Ciges/internet_access_control_demo
124 http://www.phpdoc.org/
125 http://www.ciges.net/phpdoc/iacd/
Multiuser concurrent write tests
Complex (aggregation) queries read tests
A script (PHP, JavaScript or SQL) has been written for each one. These scripts are run with the Unix command time to
measure the time taken by each one.
Each test has been repeated three (or more) times to discard abnormal results.
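The repetition rule can be sketched in a few lines of JavaScript. The text does not say exactly how abnormal results are discarded, so this sketch assumes that the single lowest and the single highest timing are dropped before averaging; the function name averageWithoutExtremes is hypothetical.

```javascript
// Average of repeated timings after dropping the single lowest and highest value.
// Assumption: "discarding" means removing one minimum and one maximum.
function averageWithoutExtremes(seconds) {
  if (seconds.length < 3) {
    throw new Error("need at least three runs");
  }
  const sorted = [...seconds].sort((a, b) => a - b);
  const kept = sorted.slice(1, -1);
  const total = kept.reduce((sum, value) => sum + value, 0);
  return total / kept.length;
}
```

With three runs this keeps only the middle value, so one slow run caused by, say, a cold cache does not distort the reported figure.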
4. Insertion tests
The list of insertion tests is the following:
1. Generation and saving of 70.000 random users without using indexes and allowing repeated values
2. Generation and saving of 70.000 random IPs without using indexes and allowing repeated values
3. Generation and saving of 1.300.000 random domains without using indexes and allowing repeated values
4. Generation and saving of 70.000 random users using indexes and verifying (sending a read query) that the user
does not exist before sending the save command
5. Generation and saving of 70.000 random IPs using indexes and verifying (sending a read query) that the IP
does not exist before sending the save command
6. Generation and saving of 1.300.000 random domains using indexes and verifying (sending a read query) that
the domain does not exist before sending the save command
7. Generation and saving of 1 million non FTP log entries
8. Generation and saving of 5 millions of non FTP log entries
9. Generation and saving of 10 millions of non FTP log entries
10. Generation and saving of 30 millions of non FTP log entries
Insertion tests results
Results given are the average of the different results discarding extreme values.
Test                                      MongoDB      MySQL
70.000 users                              3s           12s
70.000 IPs                                3s           12s
1.300.000 domains                         58s          4m 36s
70.000 unique users with indexes          23s          28s
70.000 unique IPs with indexes            22s          31s
1.300.000 unique domains with indexes     8m27s        14m13s
1.000.000 log entries                     12m7s        26m14s
5.000.000 log entries                     1h03m53s     2h10m54s
10.000.000 log entries                    1h59m11s     3h27m10s
30.000.000 log entries                    5h55m25s     10h18m46s
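To compare two timings from this table it helps to convert them to seconds first. The following JavaScript sketch does that and returns how many times longer MySQL took; the helper names toSeconds and slowdown are hypothetical and the accepted duration format is only the one used in the table.

```javascript
// Converts a duration written like "10h18m46s", "4m 36s" or "58s" into seconds.
function toSeconds(text) {
  const trimmed = text.trim();
  const match = /^(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?$/.exec(trimmed);
  if (!match || trimmed === "") {
    throw new Error("unrecognised duration: " + text);
  }
  const [hours, minutes, seconds] = match.slice(1).map((part) => Number(part || 0));
  return hours * 3600 + minutes * 60 + seconds;
}

// How many times longer MySQL took than MongoDB for the same test.
function slowdown(mongoTime, mysqlTime) {
  return toSeconds(mysqlTime) / toSeconds(mongoTime);
}
```

For the 70.000 users row, slowdown("3s", "12s") gives 4, and for the 30.000.000 log entries row the ratio is 37126 / 21325, a little above 1,7.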
5. Multi user concurrent tests
The previous insertion tests are coded as a loop which makes one insertion after another. This means that there
will be only one query at a time.
For the following tests, instead of making a loop (which makes little sense for reading tests), I have used
the open source tool JMeter [126] with the plugin Stepping Thread Group [127] to simulate concurrent users.
JMeter is a powerful tool that allows simulating use cases and loads with virtual users to measure the performance of
a web application.
I will simulate virtual users that simultaneously access a collection of scripts which make simple read and
write operations. These are PHP scripts which will be made available via web.
The tests consist of six scripts, which perform the following three tests for MongoDB and for MySQL:
Search and show data for a random user (read test)
Add a random user (write test)
Search and show data for a random user or add a random user (read & write test, 80% of times will
read and 20% of times will write)
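The 80% / 20% split of the mixed test can be sketched as a single random draw per request. How the actual PHP scripts decide is not shown in the text, so this JavaScript function (chooseAction, a hypothetical name) is only one plausible way to do it; the random source is passed in so the behaviour can be checked deterministically.

```javascript
// Picks the action for one request: "read" with probability readShare, otherwise "write".
function chooseAction(readShare = 0.8, rng = Math.random) {
  return rng() < readShare ? "read" : "write";
}

// Counts how many reads and writes result from a given number of requests.
function countActions(requests, readShare = 0.8, rng = Math.random) {
  const totals = { read: 0, write: 0 };
  for (let i = 0; i < requests; i++) {
    totals[chooseAction(readShare, rng)] += 1;
  }
  return totals;
}
```

Over many requests countActions(100000) should return roughly 80.000 reads and 20.000 writes, although any single run will deviate slightly.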
We will simulate two scenarios:
An incrementing load from 0 to 50 users rising by five. This load will be kept for a few minutes
A load of 50 users sending all queries from the beginning. This load will also be kept for a few minutes
before stopping.
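The first scenario is a staircase: the number of simulated users grows in steps of five until it reaches 50. A minimal JavaScript sketch of that schedule follows. The five second interval is taken from the captions of the result graphics; the assumption that the first group starts after one full interval, and the name usersStarted, are mine.

```javascript
// Number of simulated users that have been started after `elapsedSeconds`.
// Assumption: the first five users start once one interval has passed.
function usersStarted(elapsedSeconds, step = 5, intervalSeconds = 5, maxUsers = 50) {
  const stepsDone = Math.floor(elapsedSeconds / intervalSeconds);
  return Math.min(maxUsers, Math.max(0, stepsDone * step));
}
```

Under these assumptions the full load of 50 users is reached after 50 seconds, and the second scenario corresponds to starting directly at that plateau.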
The list of tests configured (two times, one for MongoDB and another for MySQL) is the following
1. Concurrent reads, incrementing users from 0 to 50
2. Concurrent reads, 50 users
3. Concurrent writes, incrementing users from 0 to 50
4. Concurrent writes, 50 users
5. Concurrent reads (80%) & writes (20%), incrementing users from 0 to 50
6. Concurrent reads (80%) & writes (20%), 50 users
Each one of these tests has been run three or more times, stopping and starting the server before each run.
For each one we will get:
Number of queries sent (value samples)
Statistical values for response time (in milliseconds): average, median, minimum, maximum, standard
deviation
Percentage of errors
Throughput in queries/second
126 http://jmeter.apache.org/
127 http://code.google.com/p/jmeter-plugins/wiki/SteppingThreadGroup
KBytes/second received and average bytes per query
A CSV file with the response time for all the queries, which I will use to make a graphical representation
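JMeter computes these values itself, but it can be useful to see what they mean. The sketch below is plain JavaScript over an array of response times; the function name summarise and the choice of the population standard deviation are assumptions, and JMeter's own estimator may differ slightly.

```javascript
// Summarises response times (milliseconds) collected over `durationSeconds`.
function summarise(responseTimesMs, durationSeconds) {
  const sorted = [...responseTimesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  const middle = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[middle] : (sorted[middle - 1] + sorted[middle]) / 2;
  // Population standard deviation (an assumption, see the lead-in).
  const variance = sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  return {
    samples: n,
    average: mean,
    median: median,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev: Math.sqrt(variance),
    throughput: n / durationSeconds, // queries per second
  };
}
```

A few very slow requests raise the maximum and the standard deviation a lot while barely moving the median, which is why the result tables report the median next to both.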
For the generation of the graphics I have
Reduced the number of values obtained (and the impact of aberrant ones) by taking the mean grouped by second
(155 values for MongoDB and the same number for MySQL)
Represented a local regression curve with the R function loess [128]
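The reduction step, one mean value per second, can be sketched as follows. This is an illustration in JavaScript and not the R script used for the graphics; the sample shape { timestampMs, elapsedMs } and the name meanPerSecond are hypothetical.

```javascript
// Groups samples by the whole second in which they were taken and averages each group.
function meanPerSecond(samples) {
  const groups = new Map();
  for (const { timestampMs, elapsedMs } of samples) {
    const second = Math.floor(timestampMs / 1000);
    const group = groups.get(second) || { total: 0, count: 0 };
    group.total += elapsedMs;
    group.count += 1;
    groups.set(second, group);
  }
  // One point per second, ordered by time, ready to be plotted or smoothed.
  return [...groups.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([second, g]) => ({ second, meanMs: g.total / g.count }));
}
```

Tens of thousands of samples collapse into one point per second of test, and the smoothing curve is then fitted to those points.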
The JMeter configuration, the CSV files with the sample results and the R scripts are all available in
the Github repository "Ciges / internet_access_control_demo" [129]
Concurrent read tests results
Concurrent reads, incrementing users from 0 to 50 (each thread will be kept for 100 seconds)
           Samples    Med     Min     Max        Std. Dev.   Throughput   KB/sec
MongoDB    112.165    48ms    43ms    3.090ms    18,72ms     728,3 q/s    184,99 kb/s
MySQL      81.179     67ms    45ms    3.140ms    40,92ms     528 q/s      134,08 kb/s
Represented graphically, this load test looks as follows:
128 http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html
129 https://github.com/Ciges/internet_access_control_demo
[Figure: Concurrent reads incrementing users from 0 to 50. Incrementing by five each five seconds. Each thread is kept for 100 seconds. X axis: second. Y axis: response time mean in ms. Series: MySQL, MongoDB]
Concurrent reads, 50 users (each thread will be kept for 50 seconds)
           Samples    Med     Min     Max        Std. Dev.   Throughput   KB/sec
MongoDB    37.497     54ms    50ms    5.419ms    161,04ms    635,4 q/s    122,88 kb/s
MySQL      32.273     62ms    52ms    5.136ms    156,90ms    547,9 q/s    114,54 kb/s
Represented graphically, this load test looks as follows:
[Figure: Concurrent reads for 50 users. Test made is with 50 threads from the beginning. Each thread is kept for 50 seconds. X axis: second. Y axis: response time mean in ms. Series: MySQL, MongoDB]
Concurrent writes tests results
Concurrent writes, incrementing users from 0 to 50 (each thread will be kept for 10 minutes)
           Samples    Med     Min     Max        Std. Dev.   Throughput   KB/sec
MongoDB    464.853    54ms    49ms    3.105ms    27,71ms     710,9 q/s    148,67 kb/s
MySQL      383.700    70ms    51ms    4.105ms    25,58ms     586,7 q/s    122,64 kb/s
Represented graphically, this load test looks as follows:
[Figure: Concurrent writes incrementing users from 0 to 50. Incrementing by five each five seconds. Each thread is kept for 10 minutes. X axis: second. Y axis: response time mean in ms. Series: MySQL, MongoDB]
Concurrent writes, 50 users (each thread will be kept for 50 seconds)
           Samples    Med     Min     Max        Std. Dev.   Throughput   KB/sec
MongoDB    37.497     54ms    50ms    5.419ms    161,04ms    635,4 q/s    132,88 kb/s
MySQL      32.273     62ms    52ms    5.136ms    156,90ms    547,9 q/s    114,54 kb/s
Represented graphically, this load test looks as follows:
Concurrent reads & writes tests results
Concurrent reads & writes, incrementing users from 0 to 50 (each thread will be kept for 10 minutes)
           Samples    Med     Min     Max        Std. Dev.   Throughput   KB/sec
MongoDB    462.740    55ms    48ms    3.111ms    26,24ms     707,7 q/s    173,45 kb/s
MySQL      373.484    71ms    52ms    3.152ms    27,98ms     571,1 q/s    139,91 kb/s
Represented graphically, this load test looks as follows:
[Figure: Concurrent writes for 50 users. Test made is with 50 threads from the beginning. Each thread is kept for 50 seconds. X axis: second. Y axis: response time mean in ms. Series: MySQL, MongoDB]
Concurrent reads & writes, 50 users (each thread will be kept for 50 seconds)
           Samples    Med     Min     Max        Std. Dev.   Throughput   KB/sec
MongoDB    39.096     54ms    50ms    4.753ms    142,72ms    665,0 q/s    162,93 kb/s
MySQL      31.272     64ms    52ms    5.964ms    176,64ms    530,0 q/s    129,8 kb/s
Represented graphically, this load test looks as follows:
[Figure: Concurrent reads & writes, incrementing users from 0 to 50. Incrementing by five each five seconds. Each thread is kept for 10 minutes. X axis: second. Y axis: response time mean in ms. Series: MySQL, MongoDB]
[Figure: Concurrent reads & writes for 50 users. Test made is with 50 threads from the beginning. Each thread is kept for 50 seconds. X axis: second. Y axis: response time mean in ms. Series: MySQL, MongoDB]
6. Data analysis (aggregation) read tests
These tests are made to compare the aggregation capabilities of both database management systems. I have designed
complex queries that read all the data (90 millions of log records) and obtain different results. For MongoDB I have
used the aggregation framework (simpler than Map Reduce functions and enough if we don't need to get a large list of
results).
The queries tested are the following:
Which are the 10 most visited domains and how many visits has each one?
Which are the 10 most visited domains in the second half of June?
Which are the 10 users that have more Internet accesses?
What is the average Internet traffic for June?
In the real world (and with databases that could hold terabytes) this type of questions would be
answered in real time, updating collections created for storing the results (as shown in chapter
NoSQL Schema Design for Internet Access Logs), or with batch scripts.
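The real time alternative mentioned above amounts to keeping a running total that is updated as each log entry is processed. A minimal in-memory sketch in JavaScript follows; the class name DomainVisitCounter and its methods are hypothetical, and a real deployment would keep the counters in a database collection rather than in a Map.

```javascript
// Keeps per-domain visit totals up to date as each log entry arrives,
// so the "top N" question never has to scan the raw log again.
class DomainVisitCounter {
  constructor() {
    this.visits = new Map();
  }
  // Call once per processed log entry.
  record(domain) {
    this.visits.set(domain, (this.visits.get(domain) || 0) + 1);
  }
  // Returns the n most visited domains, highest first (ties broken by name).
  top(n) {
    return [...this.visits.entries()]
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, n)
      .map(([domain, count]) => ({ domain, count }));
  }
}
```

The cost moves from query time to write time: every insertion does one extra counter update, and the report only has to sort the counters, whose number is the number of distinct domains and not the number of log records.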
To compare the performance of MongoDB and MySQL we will compare the time taken by each one with
the Unix command time, as with the insertion tests. Each test will also be repeated three or more times. Results given
are the average of the different results discarding extreme values.
Aggregation read tests results
Test                                                   MongoDB    MySQL
10 most visited domains with visit totals              13m13s     2m37s
10 most visited domains in the second half of June     52m39s     17m43s
10 users with more Internet accesses                   24m02s     3m53s
Average Internet traffic for June                      12m05s     2m42s
7. Aggregation read tests code
I think it's interesting to show the code used for the aggregation scripts in MongoDB and MySQL. The MySQL part is
SQL, so it will be familiar to most readers; the MongoDB part uses the aggregation framework. The code is mostly PHP
code except for one of the tests, where I have used JavaScript for MongoDB and a SQL script for MySQL.
Which are the 10 most visited domains and how many visits has each one?
MongoDB (JavaScript)
db.NonFTP_Access_log.aggregate(
    { $group: {
        _id: "$domain",
        visits: { $sum: 1 }
    }},
    { $sort: { visits: -1 } },
    { $limit: 10 }
).result.forEach(printjson)
MySQL (SQL script)
drop table if exists NonFTP_Access_log_domain_visits;
create table NonFTP_Access_log_domain_visits (
    `domain` varchar(255) NOT NULL,
    `value` int unsigned not null,
    PRIMARY KEY (`domain`),
    KEY `value_index` (`value`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
select domain, count(*) as value from NonFTP_Access_log group by domain;
select * from NonFTP_Access_log_domain_visits order by value desc limit 10;
Which are the 10 most visited domains in the second half of June?
MongoDB (PHP code)
$mre = new MongoRandomElements("mongodb", "mongodb", "localhost", "InternetAccessLog");
// First, we get the minimum value of the 10 highest visits per domain
$start = new MongoDate(strtotime("2012-06-15 00:00:00"));
$end = new MongoDate(strtotime("2012-06-30 23:59:59"));
$min_value = $mre->getOne(array(
    array('$match' => array('datetime' => array( '$gt' => $start, '$lt' => $end ))),
    array('$group' => array('_id' => '$domain', 'visits' => array( '$sum' => 1 ))),
    array('$group' => array('_id' => '$visits')),
    array('$sort' => array('_id' => -1)),
    array('$limit' => 10),
    array('$sort' => array('_id' => 1)),
    array('$limit' => 1),
), "NonFTP_Access_log");
// Now, we obtain all the domains with at least that value
$data = $mre->getResults(array(
    array('$match' => array('datetime' => array( '$gt' => $start, '$lt' => $end ))),
    array('$group' => array('_id' => '$domain', 'visits' => array( '$sum' => 1 ))),
    array('$match' => array('visits' => array( '$gte' => $min_value)))
), "NonFTP_Access_log");
foreach($data as $doc) {
    print_r($doc);
}
MySQL (SQL code)
$mre = new MySQLRandomElements("mysqldb", "mysqldb", "localhost", "InternetAccessLog");
// First, we get the minimum value of the 10 highest visits per domain
$start = "2012-06-15 00:00:00";
$end = "2012-06-30 23:59:59";
$query = "select * from (select distinct(count(*)) as visits from NonFTP_Access_log where "
    . "datetime between \"".$start."\" and \"".$end."\" group by domain order by visits desc limit "
    . "10) as topten_visits_by_domain order by visits limit 1";
$min_value = $mre->getOne($query);
// Now, we obtain all the domains with at least that value
$query = "select * from (select domain, count(*) as visits from NonFTP_Access_log where "
    . "datetime between \"".$start."\" and \"".$end."\" group by domain) as visits_by_domain where "
    . "visits >= ".$min_value;
$results = $mre->getResults($query);
while($row = $results->fetch_assoc()) {
    print_r($row);
}
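Both versions of this query follow the same two step idea: first find the lowest of the ten highest distinct visit counts, then return every domain that reaches it, so that domains tied on a count are all kept. The following JavaScript sketch shows that logic on a plain object of counts; the function names thresholdOfTop and keysAtLeast and the sample data in the note are hypothetical.

```javascript
// Step 1: the lowest of the `n` highest distinct visit counts.
function thresholdOfTop(countsByKey, n) {
  const distinct = [...new Set(Object.values(countsByKey))].sort((a, b) => b - a);
  const top = distinct.slice(0, n);
  return top[top.length - 1];
}

// Step 2: every key whose count reaches that threshold (so ties are all kept).
function keysAtLeast(countsByKey, threshold) {
  return Object.keys(countsByKey).filter((key) => countsByKey[key] >= threshold);
}
```

With counts { a: 5, b: 3, c: 3, d: 1 } and n = 2 the threshold is 3 and three keys are returned (a, b and c), which is why the result of the real queries can contain more than ten rows.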
Which are the 10 users that have more Internet accesses?
MongoDB (PHP code)
$mre = new MongoRandomElements("mongodb", "mongodb", "localhost", "InternetAccessLog");
// First, we get the minimum value of the 10 highest visits per user
$min_value = $mre->getOne(array(
    array('$group' => array('_id' => '$user', 'visits' => array( '$sum' => 1 ))),
    array('$group' => array('_id' => '$visits')),
    array('$sort' => array('_id' => -1)),
    array('$limit' => 10),
    array('$sort' => array('_id' => 1)),
    array('$limit' => 1),
), "NonFTP_Access_log");

// Now, we obtain all the users with at least that value
$data = $mre->getResults(array(
    array('$group' => array('_id' => '$user', 'visits' => array( '$sum' => 1 ))),
    array('$match' => array('visits' => array( '$gte' => $min_value)))
), "NonFTP_Access_log");
foreach($data as $doc) {
    print_r($doc);
}
MySQL (SQL code)
$mre = new MySQLRandomElements("mysqldb", "mysqldb", "localhost", "InternetAccessLog");
// First, we get the minimum value of the 10 highest visits per user
$query = "select * from (select distinct(count(*)) as visits from NonFTP_Access_log group "
    . "by user order by visits desc limit 10) as topten_visits_by_user order by visits limit 1";
$min_value = $mre->getOne($query);
// Now, we obtain all the users with at least that value
$query = "select * from (select user, count(*) as visits from NonFTP_Access_log group by "
    . "user) as visits_by_user where visits >= ".$min_value;
$results = $mre->getResults($query);
while($row = $results->fetch_assoc()) {
    print_r($row);
}
What is the average Internet traffic for June?
MongoDB (PHP code)
$mre = new MongoRandomElements("mongodb", "mongodb", "localhost", "InternetAccessLog");
$start = new MongoDate(strtotime("2012-06-01 00:00:00"));
$end = new MongoDate(strtotime("2012-06-30 23:59:59"));
$result = round($mre->getOne(array(
    array('$match' => array('datetime' => array( '$gte' => $start, '$lte' => $end ))),
    array('$project' => array('_id' => 0, 'day' => array ( '$dayOfMonth' => '$datetime' ),
        'size' => 1)),
    array('$group' => array('_id' => '$day', 'volume' => array( '$sum' => '$size'))),
    array('$group' => array('_id' => 'all', 'average' => array( '$avg' => '$volume'))),
    array('$project' => array('_id' => '$average'))
), "NonFTP_Access_log"));
printf("Traffic volume mean by day in bytes for June: %.0f\n", $result);
MySQL (SQL code)
$mre = new MySQLRandomElements("mysqldb", "mysqldb", "localhost", "InternetAccessLog");
$start = "2012-06-01 00:00:00";
$end = "2012-06-30 23:59:59";
$query = "select round(avg(volume)) from (select sum(size) as volume from NonFTP_Access_log "
    . "where datetime between \"".$start."\" and \"".$end."\" group by dayofmonth(datetime)) as "
    . "sizebyday";
$result = $mre->getOne($query);
printf("Traffic volume mean by day in bytes for June: %.0f\n", $result);
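Both versions compute the same thing in two stages: the bytes transferred are summed per day of the month, and those daily totals are then averaged. A small JavaScript sketch of that calculation follows; the function name meanDailyVolume is hypothetical and the entries used in the note are made up.

```javascript
// Sums the transferred bytes per day of month, then averages those daily totals.
// Each entry is { day, size }, mirroring the fields used by the queries.
function meanDailyVolume(entries) {
  const volumeByDay = new Map();
  for (const { day, size } of entries) {
    volumeByDay.set(day, (volumeByDay.get(day) || 0) + size);
  }
  if (volumeByDay.size === 0) {
    return 0;
  }
  let total = 0;
  for (const volume of volumeByDay.values()) {
    total += volume;
  }
  return Math.round(total / volumeByDay.size);
}
```

For the entries (day 1, 100 bytes), (day 1, 300 bytes) and (day 2, 200 bytes) the daily totals are 400 and 200, so the mean is 300. As in the queries, a day without any log entry produces no group and therefore does not lower the average.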
8. How to run these tests
Database and users creation
In the Github repository Ciges / internet_access_control_demo [130] there is a collection of scripts to run the tests.
These scripts use the following database names and users by default:
For MySQL: database InternetAccessLog, with user and password mysqldb
For MongoDB: database InternetAccessLog, with user and password mongodb
So before starting we will have to create both and grant the permissions:
Creating database and user in MySQL
mysql> create database InternetAccessLog;
mysql> grant all privileges on InternetAccessLog.* to mysqldb@localhost identified by 'mysqldb';
130 https://github.com/Ciges/internet_access_control_demo
Creating database and user in MongoDB
> use InternetAccessLog
> db.addUser("mongodb", "mongodb")
Generating random data
The following PHP scripts create 3 months of random data as explained before:
createData_3months_mongo.php
createData_3months_mysql.php
To run them we can use the console PHP interpreter with
php -c path/to/my/php.ini phpscript
List of runnable scripts
As with the data generation scripts, to run them we use the console PHP interpreter. There are two PHP scripts per
test: one for MySQL and the second one for MongoDB. The list of scripts is the following:
Scripts                                   Test
test1_1_mongo.php, test1_1_mysql.php      Generation and saving of 70.000 random users without using indexes and allowing repeated values
test1_2_mongo.php, test1_2_mysql.php      Generation and saving of 70.000 random IPs without using indexes and allowing repeated values
test1_3_mongo.php, test1_3_mysql.php      Generation and saving of 1.300.000 random domains without using indexes and allowing repeated values
test2_1_mongo.php, test2_1_mysql.php      Generation and saving of 70.000 random users using indexes and verifying (sending a read query) that the user does not exist before sending the save command
test2_2_mongo.php, test2_2_mysql.php      Generation and saving of 70.000 random IPs using indexes and verifying (sending a read query) that the IP does not exist before sending the save command
test2_3_mongo.php, test2_3_mysql.php      Generation and saving of 1.300.000 random domains using indexes and verifying (sending a read query) that the domain does not exist before sending the save command
test3_1_mongo.php, test3_1_mysql.php      Generation and saving of 1 million non FTP log entries
test3_2_mongo.php, test3_2_mysql.php      Generation and saving of 5 millions of non FTP log entries
test3_3_mongo.php, test3_3_mysql.php      Generation and saving of 10 millions of non FTP log entries
test3_4_mongo.php, test3_4_mysql.php      Generation and saving of 30 millions of non FTP log entries
test9_1_mongo.php, test9_1_mysql.php      Analysis query: gets the 10 most visited domains and the number of visits for each one
test9_2_mongo.php, test9_2_mysql.php      Analysis query: gets the 10 most visited domains in the second half of June and the number of visits for each one
test9_3_mongo.php, test9_3_mysql.php      Analysis query: gets the 10 users with most hits
test9_4_mongo.php, test9_4_mysql.php      Analysis query: gets the mean by day of the traffic volume in June
Multi-user concurrent tests
These scripts, under the web directory, are meant to be hosted in a web server. Once we have configured our web
server to make them available I have used:
Apache JMeter [131] with the plugin "Stepping Thread Group" [132] to run the load tests
131 http://jmeter.apache.org/
132 http://code.google.com/p/jmeter-plugins/wiki/SteppingThreadGroup
R [133] to create graphical representations from the CSV files with the data created with JMeter
The scripts available under the web directory [134] are:
Scripts                              Function                                    Test
test4_mongo.php, test4_mysql.php     Search and show data for a random user      Concurrent reads
test5_mongo.php, test5_mysql.php     Write a random user                         Concurrent writes
test6_mongo.php, test6_mysql.php     Read/write test. These scripts make one of two actions: search and show data for a random user (read test) or write a new random user in the database (write test). The read test is made 80% of times, the write one 20%.     Concurrent reads & writes
Once the web server is configured, if we access the URL corresponding to the directory we should see a description
message with links to the different scripts
Using JMeter to run load tests
As shown before we have defined two scenarios for each test and three types of tests. So we have six different
tests:
Concurrent reads, incrementing users from 0 to 50
Concurrent reads, 50 users
Concurrent writes, incrementing users from 0 to 50
Concurrent writes, 50 users
Concurrent reads (80%) & writes (20%), incrementing users from 0 to 50
Concurrent reads (80%) & writes (20%), 50 users
The file "MongoDB vs MySQL.jmx" contains all the configuration needed for JMeter
To run each test we should
Change the URL of the server to our address
In View Results in Table change the path where the CSV file should be saved
Enable in JMeter only the test we want to run, disabling the rest (if not, more than one test will be run)
133 http://www.r-project.org/
134 https://github.com/Ciges/internet_access_control_demo/tree/master/web
Example of incrementing user configuration with JMeter
Getting a graphical representation of load tests with R
Each load test will generate tens of thousands of samples that will be stored in CSV files. We have two files for
each test type, one with MySQL response times and the second one with MongoDB response times.
For each test type I have developed an R script that reads these two files, represents graphically a summary of the
samples and draws a line that shows the evolution of the response time for both types of servers.
These scripts are also available in the web directory and to run them you simply have to use the command
source. Their names are self explanatory. As we have six tests we have six R scripts, one for showing the
comparative results of each one.
To load a script in R and show the graphic you simply have to source it. For example, for loading the first one:
source("Concurrent reads 50 users.R")
Conclusions and last words
1. Tests conclusions
Looking at the numbers and graphics we arrive at the following conclusions:
Write performance:
MongoDB is faster in pure write performance
In a series of continuous simple writes MongoDB is from 2 to 4 times faster. In general, for high numbers
(millions of record savings) its simple writing performance is double that of MySQL
In concurrent writes MongoDB is faster (15% and 30% in our tests)
MongoDB is much more scalable, meaning that when the user load increases the response time keeps
stable. Response time in MySQL, instead, gets worse as the number of users grows
Read performance:
MongoDB is faster in pure read performance
In concurrent reads MongoDB is faster (15% and 40% in our tests)
Also, MongoDB is more scalable
Aggregation performance:
Here MySQL wins over MongoDB's native aggregation framework. MySQL is much faster in aggregating
data, 3 to 6 times faster for the 4 tests we have done
In these aggregation queries no relations are involved. MySQL GROUP BY queries have a very high
performance
So we could say that, as expected, for intensive reading and writing data operations MongoDB is a
better option than MySQL when neither relations nor aggregation query performance are important and
the data reading/writing performance is critical.
It must be said that running aggregation queries on tens of millions of records is not a good idea; it would be
better to calculate the values needed in real time as records are processed (which means read & write
operations). So for problems such as log analysis NoSQL technologies are much better.
2. Initial planning and actual time spent on each task
The initial workload estimated was 300 hours. The actual time spent has been more than 400 hours,
divided as follows:
Tasks                                                                   Time spent (rounded to hours)
Study of NoSQL articles & books                                         65
MongoDB installation, configuration, package creation & updates         58
Development of a schema for Internet Access Log                         20
Scripts development (PHP, SQL, JavaScript, MySQL stored procedures)     68
Load tests                                                              75
Documentation (memory, posts on ciges.net & presentation)               93
Incidents analysis & resolution                                         18
Planning, coordination & communication                                  43
Total                                                                   440
To keep track of the time spent on each task I have used the web application Paymo [135], an easy to use time tracking
software that allows creating projects and tasks and comfortably starting and stopping timers for each one.
It was also initially planned to make tests with the sharding capabilities of MongoDB (using more than one machine),
but due to lack of time we will do them later.
3. Problems and bugs found
The reasons for the time excess with respect to what was initially planned are:
At first I developed a script to import real data from the production server into MongoDB for the tests. But using
real data is not allowed, so I could not use it
We have used three versions of MongoDB. The MongoDB study was started with version 2.0.6.
Meanwhile 2.2.0 release candidate 0 and 2.2.0 final have been published.
We upgraded the version because with 2.2.0 a native aggregation framework is available, and it's
easier and performs better than using Map-Reduce to aggregate data.
Also the second version change was forced by a bug found in 2.2.0-rc0 that produced an
integer overflow in some aggregation functions [136]
Initially I tried to make some load test scripts using directly JavaScript for MongoDB and stored
procedures for MySQL. I also tried to use map-reduce. Both initiatives, using natively supported
JavaScript / stored procedures and map-reduce, were wrong.
The time invested in learning how to develop with these technologies has been useless due to
limitations of the JavaScript MongoDB API. Also, developing MySQL stored procedures was more
complex than thought, so in the end I have used PHP for most of the scripts
The Map-reduce functionality included by default in MongoDB is terribly slow [137] and not useful.
MongoDB's aggregation framework was a valid option and the one chosen for the data analysis tests.
Due to the version change and some errors made when running load tests I had to repeat the test battery two or
three times
135 http://www.paymo.biz/
136 https://jira.mongodb.org/browse/SERVER-6166
137 http://stackoverflow.com/questions/12139149/mapreduce-with-mongodb-really-really-slow-30-hours-vs-20-minutes-in-mysql-for
We have found some configuration problems and product bugs
Documentation work time has been underestimated
Bugs found on MongoDB
Apart from the problems just mentioned, while we were preparing and testing the MongoDB package for
distribution on PSA's servers we found the following notable bugs in MongoDB or problems in our initial
configuration:
Memory problems when inserting big amounts of data
When I began with the insertion tests I got the following errors after a few millions of records saved
ERROR: mmap() failed for /users/mng00/instances/primary/data/local/local.4 len:2146435072
errno:12 Cannot allocate memory
ERROR: mmap failed with out of memory. (64 bit build)
assertion 10085 can't map file memory ns:local.system.replset query:{}
The problem here is that MongoDB was reserving memory as it was needed, and there comes a moment when the
operating system does not allow the process to consume more memory.
After a lot of tests and consulting with my colleagues, one of them showed me the way to go. In our MongoDB starting
script we have to tell Linux not to limit the amount of memory with the command:
ulimit -v unlimited
Integer overflow when using aggregation functions
When calculating the average of traffic volume the result was a negative number. It was obviously an overflow. After
searching in MongoDB's JIRA [138] I found that it is a known problem of version 2.2.0-rc0.
After upgrading the product to the stable 2.2.0 the problem was solved.
Map-Reduce operations on MongoDB are really, really slow
To get the number of visits by domain I tried to use map-reduce functions but I found they were terribly slow (30
hours vs 20 minutes in MySQL for the same kind of test).
After asking in MongoDB's JIRA and on Stack Overflow [139] I quickly received support from Adam Comerford [140], a
technical support manager from 10gen (the company which makes MongoDB), who explained that it could be normal.
MongoDB uses the JavaScript engine "SpiderMonkey" to compute Map-Reduce functions, and JavaScript is
slow and single threaded.
We have three options for this kind of operations
Use the "Aggregation Framework" [141] included from MongoDB's version 2.1 (this framework has the
limitation of not being able to return more than 16 Megabytes of data)
Use Apache Hadoop [142] to make map-reduce operations with the MongoDB Hadoop Connector [143]. In this
case MongoDB will be the database from where to read and save the data and Apache Hadoop will do the
calculations
Another option (to test) could be to use Google's JavaScript engine V8, which can be integrated in
MongoDB when compiling the product [144]. This engine is faster and multi-threaded.
For our tests I have used the first possibility, the Aggregation Framework with an updated version of MongoDB.
138 https://jira.mongodb.org/browse/SERVER-6166
139 http://stackoverflow.com/questions/12139149/mapreduce-with-mongodb-really-really-slow-30-hours-vs-20-minutes-in-mysql-for
140 http://www.linkedin.com/in/acomerford
141 http://docs.mongodb.org/manual/applications/aggregation/
142 http://hadoop.apache.org/
143 http://api.mongodb.org/hadoop/MongoDB+Hadoop+Connector.html
4. Future work
This work is not really complete. In this project I have compared MongoDB with MySQL for a concrete use case and
with a limited number of tests.
To complete this work the following should be done
Repeat the tests with a huge quantity of data (hundreds of millions of records instead of only 90 millions,
with a used disk size of hundreds of gigabytes instead of tens)
Add tests with a multi-machine configuration using sharding
Other future lines of work could be:
Test map-reduce operations with the V8 JavaScript engine
Test map-reduce operations with Hadoop integration (well, being realistic, MongoDB's and Hadoop's
integration could be the subject of another work like the one presented in this document).
Add a few more aggregation tests
5. Contributions to the community
All this work, made as part of my paid work as system administrator at PSA, is intended to be publicly available. So
the contribution to the community is documentation and source code.
In particular, while this project was being made the following contributions have been done:
Contributions to the Wikipedia
Rewriting of English wikipedia articles: MongoDB [145], CouchDB [146]
Rewriting of French wikipedia article: MongoDB [147]
Minor edits on other articles like English CAP theorem [148], NoSQL [149], Textile (markup language) [150],
Apache Cassandra [151]
Creation of a personal blog http://www.ciges.net
Series of posts with a summary of the work done, problems found and solutions given:
144 http://www.mongodb.org/display/DOCS/Building+with+V8
145 http://en.wikipedia.org/wiki/MongoDB
146 http://en.wikipedia.org/wiki/CouchDB
147 http://fr.wikipedia.org/wiki/MongoDB
148 http://en.wikipedia.org/wiki/CAP_theorem
149 http://en.wikipedia.org/wiki/NoSQL
150 http://en.wikipedia.org/wiki/Textile_%28markup_language%29
151 http://en.wikipedia.org/wiki/Apache_Cassandra
    - Looking for a NoSQL solution for log analysis (1 of 4) (152)
    - Looking for a NoSQL solution for log analysis (2 of 4) (153)
    - Looking for a NoSQL solution for log analysis (3 of 4) (154)
    - Looking for a NoSQL solution for log analysis (4 of 4) (155)
    - Installing MongoDB and drivers on SUSE Linux SLES 10, some notes (156)
    - NoSQL data schema for the analysis of Internet access logs (157)
- All source code (PHP classes, scripts and configuration files)
  - In the GitHub repository Ciges / internet_access_control_demo (158)
  - The documentation created with phpDocumentor is on my web page (159)
- This document and detailed instructions to repeat the tests are included in GitHub and on my web page
- Questions opened and answered on Stack Overflow (and also in MongoDB's JIRA)
  - Map Reduce with MongoDB really, really slow (30 hours vs 20 minutes in MySQL for an equivalent
    database) (160)
  - Simple tool for web server benchmarking? (161)
  - Simultaneous users for web load tests in JMeter? (162)
The documentation (LibreOffice documents and posts on my web page) is licensed under a Creative Commons
Attribution - Share Alike 3.0 Unported license. The source code is licensed under the GPL version 3.0.
152 http://www.ciges.net/buscando-nosql-1
153 http://www.ciges.net/buscando-nosql-2
154 http://www.ciges.net/buscando-nosql-3
155 http://www.ciges.net/buscando-nosql-4
156 http://www.ciges.net/apuntes-sobre-la-instalacion-de-mongodb
157 http://www.ciges.net/esquema-de-datos-nosql-para-el-analisis-de-logs-de-acceso-a-internet
158 https://github.com/Ciges/internet_access_control_demo
159 http://www.ciges.net/phpdoc/iacd/
160 http://stackoverflow.com/questions/12139149/mapreduce-with-mongodb-really-really-slow-30-hours-vs-20-minutes-in-mysql-for
161 http://stackoverflow.com/questions/12249895/simple-tool-for-web-server-benchmarking
162 http://stackoverflow.com/questions/1292644/simultaneous-users-for-web-load-tests-in-jmeter
Personal evaluation of the practicum
This work has been, from a system administrator's point of view, very interesting. What I have tried to do here is
apply the knowledge acquired in the Master on Free Software Projects Development and Management (163).
In particular I have considered the following very important:
- The availability of (almost) all the documentation, code and configuration files under an open license
- The openness of the whole process followed to justify the options chosen and to compare MongoDB and MySQL,
  so that anyone can repeat it
- The use of Open Source products (instead of the proprietary ones used by default in my enterprise). I mean
  particularly
  - LibreOffice instead of Microsoft Office
  - Apache JMeter and R instead of HP LoadRunner
The real influence on this work of the philosophy and technologies shown at the Master on Free Software is clear.
It would have been different if it had been simply another project to complete at work.
It has also been the first time I have used, from the beginning, tools to increase and measure productivity and to
track the work time dedicated to each of the project's tasks. I have used:
- Paymo to measure the real time spent on each part
- Thinking Rock as the tool to define tasks and subtasks and to take quick notes about them
- TiddlyWiki as a portable notepad to take notes
The initial planning was optimistic, as usual, but now I have objective data and I hope to improve my work time
estimation for future projects.
This work has also been useful as a first approach to performance testing, a domain where I had never worked
before and of whose complexity I now have a better idea.
I hope that the Master on Free Software Projects Development and Management will make me a better
professional with a broader knowledge of the Free Software world. In my humble opinion, at least this time
it has been successful.
Regards
José M. Ciges, in Vigo (Spain), October 2012
163 http://www.mastersoftwarelibre.com/