A practical case of why, what and how to use a NoSQL Database Management System instead of a relational one

José Manuel Ciges Regueiro <jmanuel@ciges.net> - Student of the V Master on Free Software Projects Development and Management 2011-2012

Copyright (cc) 2012 José Manuel Ciges Regueiro. Some rights reserved.

This work is free and is licensed under the conditions of the Creative Commons Attribution - Share Alike 3.0 Unported license. You can use, distribute and reuse this work if the same license is applied and the author is quoted. The full text of the license can be read at http://creativecommons.org/licenses/by-sa/3.0/deed.en

The author's personal web page is http://www.ciges.net and the preferred method for contacting him is by email at jmanuel@ciges.net. I am also available at social networks like Facebook¹, Google+², Twitter³ or Identi.ca⁴

1 http://www.facebook.com/ciges
2 https://plus.google.com/105050850707469524247/posts
3 https://twitter.com/ciges
4 http://identi.ca/ciges

Index

Foreword
Notation used for the references and bibliography
Introduction and description
  1. Author's description
  2. Company's description
  3. Project's objectives
  4. Realisation conditions
  5. Brief description of the work plan
NoSQL State of the Question
  1. What are we talking about
  2. Document-oriented databases
  3. Key-value databases
  4. Graph databases
  5. Tabular databases
  6. MySQL as NoSQL database
  7. Other interesting readings and feature comparisons
Detailed description of some NoSQL DBMS
  1. Cassandra
  2. CouchDB
  3. MongoDB
Looking for a NoSQL solution for our needs
  1. Introduction and initial approach
  2. Analyse of Internet access logs, describing the problem
  3. Oh my God! We have a lot of options!
  4. Questions we should answer before making a choice
  5. Description of the data for an Internet Access Log management system
  6. Choosing between several NoSQL products
  7. And the Winner is ..... MongoDB!
  8. What we will do from here
  9. Interesting readings
Installation of MongoDB
  1. Deploying MongoDB binaries
  2. Compiling and installing PHP driver
  3. Compiling and installing PyMongo, the Python driver for MongoDB
  4. Configuring the server
  5. Installing RockMongo, a PHP based administration tool
  6. Authentication in MongoDB
  7. Developed scripts for starting, stopping and verifying MongoDB status
NoSQL Schema Design for Internet Access Logs
  1. Analysis of logs with NoSQL
  2. Description of an equivalent MySQL database
  3. Defining a schema for MongoDB NoSQL
  4. Interesting articles and presentations
Comparative of a MySQL based solution versus MongoDB
  1. Work plan for the performance tests
  2. PHP Development for testing the database model
  3. Testing MongoDB vs MySQL performance
  4. Insertion tests
  5. Multi user concurrent tests
  6. Data analyse (aggregation) read tests
  7. Aggregation read tests code
  8. How to run this tests
Conclusions and last words
  1. Tests conclusions
  2. Initial planning and actual time spent on each task
  3. Problems and bugs found
  4. Future work
  5. Contributions to the community
  6. Personal evaluation of the practicum
Bibliography & References

Foreword

This work is part of the fifth edition of the Master on Free Software Projects Development and Management⁵, created by the Galician open source consultancy Igalia⁶ and the Universidad Rey Juan Carlos⁷ in Madrid. The master is composed of live sessions given by professionals from different enterprises specialized in open source solutions and by university researchers, practical works and a final project (called practicum) which can be carried out in an enterprise.

As I work in an open source department at PSA Peugeot Citroën⁸ (as an employee of the IT consultancy Seresco⁹) I thought it would be a good idea to apply part of the knowledge acquired in the master to a project that is interesting for PSA. Our department manages open source server solutions on Unix servers (mainly, but not only, Linux) and at that time (early 2012) it was considering a study of NoSQL and of how this class of database management systems could be useful for enterprise needs.

So what is presented next is the result of studying a few open source NoSQL solutions, choosing and deploying one of them on PSA's servers and comparing the database management system currently used for Internet access log management with the chosen one, MongoDB.

All of this would not have been possible, at least the way it has been done, without a lot of people, but especially:
- Igalia's and Libresoft's¹⁰ people, who have made it possible for eight students in a distant city located in the northwest corner of Spain to enjoy the company of open source experts
- Seresco and PSA Peugeot Citroën's EAMP department, who have given all the collaboration possible to make the requirements of the Master compatible with everyday work
- My wife and my young daughter, who have had a lot of patience with that little guy behind his computer they are living with
- My father, who has always been there when I needed to delegate my duties as a father

"Be Free! Be Wild! Be Open!"

José M. Ciges

5 http://www.mastersoftwarelibre.com/
6 http://www.igalia.com/
7 http://www.urjc.es/
8 http://www.psa-peugeot-citroen.com
9 http://www.seresco.es/
10 Libresoft is the libre software and open communities research group from "Universidad Rey Juan Carlos" http://libresoft.es/

Notation used for the references and bibliography

Most of the statements made in this document are supported by Internet references (blogs of experts, web sites of the enterprises which made the technologies cited), books or published papers. I have used the following criteria to include the references:
- Studies shown in blog posts, books, papers or official documentation from supporting enterprises are listed in the bibliography (at the end of the document). A reference to the bibliography is made with a number between [ and ]
- URLs of official product web pages and other Internet links cited are shown as a little number, which leads to a note in the footer of the same page. The footnote shows what would normally be a link in a digital document. I have preferred this format to avoid losing information in case this work is printed on paper (please don't kill trees only for a question of comfort)

In the following example we can see both in use:

Apache Cassandra¹¹ is an open source distributed NoSQL database management system. It is an Apache Software Foundation top-level project [1].

11 http://cassandra.apache.org/

Introduction and description

1. Author's description

I am a Spanish systems engineer born in 1976 who has been working for the last 9 years in the EAMP department of PSA Peugeot Citroën (from now on PSA), hired by Seresco (an IT consulting enterprise). This department gives support for some Open Source servers on Unix machines (mainly Linux) for the needs of any PSA worker worldwide. The products we work with on a daily basis are Apache web server, MySQL, PHP, Tomcat, FreeRadius, MediaWiki and Typo3.

I discovered Linux at University. At that time I knew nothing about Free Software; I installed Linux just because somebody told me that with it I could do the Unix exercises at home. Later, when the Internet became part of my life, I became interested in everything around Linux and now I am a GNU/Linux & Free Software fan boy.

Apart from that I am also a father and a husband, and in my free time I try to do some sport, pay attention to what happens around me and learn something new every day :-)

My personal data are:
Name: José Manuel Ciges Regueiro
Born date: 25 February 1976
Education Title: Technical Engineering in Computer Systems at University of La Coruña (Spain)
City in which I live: Vigo

2. Company's description

As I said, I work for Seresco¹², a Spanish IT consulting company founded in 1969 with around 500 employees in the country. Seresco's main activities are software development, technical assistance and consultancy, and it is specialized in the areas of human resources and geographical information. Seresco's clients are other enterprises and Spanish regional governments.

In Galicia one of these clients is PSA Peugeot Citroën¹³, as this multinational manufacturer of automobiles and motorcycles has a factory in Vigo. PSA is the second largest European automaker and the eighth largest in the world measured by 2010 unit production. With its 32 manufacturing facilities PSA employs 186,000 people and makes around 3.5 million vehicles per year.

From an IT Systems point of view, from the Vigo factory we give service to every technical team to install, configure and update some Open Source products such as Apache web server, MySQL, PHP, Tomcat, FreeRadius, MediaWiki and Typo3. IT Systems at PSA employs 2,600 people in 26 different countries. A rough summary of the facilities could be:
- Servers: 6,500 instances of Unix, 3,200 instances of Windows servers, and also a few tens of Mainframe z/OS, VMS and Tandem / Guardian
- Office equipment: 87,000 client computers, most of them (over 70,000) running Windows

12 http://www.seresco.es
13 http://www.psa-peugeot-citroen.com/

The project's tutor from Seresco will be Andrés Riveiro Sestelo, the director of the company's Galician area. On PSA's side this work will be verified by David Fernández Fernández, the head of the department.

3. Project's objectives

The term NoSQL has become fairly popular in recent years to designate a class of database management systems (DBMS from now on) where the relations between groups of data are not really important. NoSQL database systems rose alongside major internet companies, such as Google, Amazon, Twitter, and Facebook¹⁴, which had significantly different challenges that the traditional RDBMS solutions could not cope with.

The main reason to work with this kind of DBMS is to manage huge amounts of data on which we need to generate reports and make calculations, but where, due to the nature of the problem, the traditional data schema (data stored in records with an identical structure, grouped in tables and related to each other) is not useful. We could think of problems such as statistical data analysis, log information, document or web page indexing, geographical information management …

When we say big amounts of data we mean that:
- the data must be distributed between an undetermined number of computers and the architecture must be fault tolerant
- the performance of retrieving and appending operations must be very high, as read and write performance are critical
- operations over the data must be distributed also: the system must be able to coordinate different nodes and combine the results obtained by them to get the final results
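The third requirement above (distributing the operations and then combining the partial results) can be illustrated with a minimal sketch. It is not taken from the project's code: plain Python lists stand in for the nodes, and the record fields (`user`, `domain`) and the function names are hypothetical, chosen only to resemble an Internet access log.

```python
from collections import Counter
from zlib import crc32


def partition(records, node_count):
    """Distribute the records between nodes by hashing the user id."""
    nodes = [[] for _ in range(node_count)]
    for record in records:
        index = crc32(record["user"].encode("utf-8")) % node_count
        nodes[index].append(record)
    return nodes


def local_hits_per_domain(node_records):
    """Work done independently on one node: count hits per domain."""
    return Counter(record["domain"] for record in node_records)


def merge(partials):
    """Work done by the coordinator: add up the partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical access log records (fake data)
records = [
    {"user": "u1", "domain": "example.org"},
    {"user": "u2", "domain": "example.org"},
    {"user": "u3", "domain": "example.net"},
    {"user": "u1", "domain": "example.net"},
    {"user": "u4", "domain": "example.org"},
]

nodes = partition(records, node_count=3)
totals = merge(local_hits_per_domain(node) for node in nodes)
print(totals)
```

Whatever the way the records end up spread over the nodes, the merged totals are the same as those of a single machine counting everything, which is the property a distributed aggregation needs.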
4. Realisation conditions

The whole project has been carried out on PSA's IT installations, as part of my work in the office or at home using a network connection via VPN. The work began on the 5th of March with an estimated completion date of late July and an estimated workload of 300 hours. My time at PSA will not be fully dedicated to this NoSQL study, as there are other work tasks that require my time, so the number of hours per day will vary between 0 and 8.

The hardware used for the tests will be two identical machines with the following specifications:
- RAM: 10 GB
- Cores: 32 (AMD Opteron Processor 6128)
- Other info: these machines are virtual machines hosted on a physical Dell™ PowerEdge™ R815¹⁵ server

The initial study and information search will be done by reading articles and comparisons from the Internet. For the development and tests phase almost all the technologies used will be server-side. In particular:

Server-side software technologies used
- Operating System: Suse Linux Enterprise Server 10 (Virtual Server on Xen)
- The servers to compare will be MongoDB 2.2.0¹⁶ and MySQL¹⁷ 5.0.26 with MyISAM tables
- The development of scripts will be done in PHP, JavaScript and SQL. The code of the PHP classes will be documented using phpDocumentor¹⁸. The code editor will be VIM

On the client side I will use
- Shell scripts for automation tasks and single user load tests
- Notepad++ as code editor
- JMeter¹⁹ for making multiuser load tests, using the plugin Stepping Thread Group²⁰ for some of them
- The statistical software R²¹ to represent graphically the results from JMeter
- LibreOffice for writing the documentation
- ArgoUML²² for showing a diagram with the architecture of the PHP classes

I have made this work as part of the Master on Free Software Projects Development and Management and I think it is important and coherent with its spirit that, when possible, the documentation created, the methodology used and the conclusions reached should be publicly available (and all the tools used should be Open Source). The only limits to this consideration are the internal data of PSA and those tools imposed by the work place conditions (such as Windows for the workstation :-). Internal data (such as configurations or custom installation directories) has been carefully replaced in the documentation by fake data (not real, but valid anyway).

What has been made publicly available is the following:
- A summary of the information given in this document is available as posts at my personal web page²³ (in Spanish). This web site has been built with TextPattern²⁴, a PHP Content Management System. This document and a presentation are also available there.
- The code developed in PHP is available in Github at Ciges/internet_access_control_demo²⁵. The HTML documentation created with phpDocumentor is at http://www.ciges.net/phpdoc/iacd/InternetAccessLog/
- The presentation has been uploaded to SlideShare²⁶

14 Cassandra, now maintained by Apache, was initially a Facebook project. See "Cassandra - A structured storage system on a P2P Network" at http://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9
15 http://www.dell.com/us/enterprise/p/poweredge-r815/pd
16 2.2.0 is the production release from August 2012 http://www.mongodb.org
17 MyISAM is the light engine for web-based http://www.mysql.com/
18 http://www.phpdoc.org/
19 http://jmeter.apache.org/
20 http://code.google.com/p/jmeter-plugins/wiki/SteppingThreadGroup
21 http://www.r-project.org/
22 http://argouml.tigris.org/
23 http://www.ciges.net
24 Textpattern is a PHP Based Content Management System http://textpattern.com/
25 https://github.com/Ciges/internet_access_control_demo
26 http://www.slideshare.net/Ciges/no-sql-projectmswlpresentation

5. Brief description of the work plan

The work plan is composed of four main parts:

NoSQL State of the Question

As at this moment we know really nothing about NoSQL technologies, first we will have to become familiar with this kind of database management systems. So this part is a summary of what I have learned reading articles and books, and the first general conclusions reached.

We are studying how to apply NoSQL technologies to log information management, so in this part we will describe one of the use cases: the management of the log of accesses from the enterprise network to the Internet. Starting from this use case we will choose a NoSQL solution among the products available in the Open Source world.

Installation of chosen solution and database design for an Internet access control software

This part will be more technical. It will contain:
- An explanation of some details of how the product has been installed, its configuration and the scripts developed, if we need to develop some
- A database schema design for NoSQL based on the current MySQL solution. The structure chosen and the information fields will be shown

Comparative of a MySQL based solution versus new NoSQL one

With a very similar database on MySQL and on the new NoSQL DBMS²⁷ I will try to run performance tests on both using generated random data (I cannot use real data because of confidentiality). For these tests to be valid the volume of data should be high. As we cannot use real data, some scripts will be developed to make this possible:
- Generate fake random data similar to the real data (URLs, user ids, timestamps, IPs …)
- Create tables in the database and save all this data in a very similar structure regardless of the DBMS used
- The classes developed will have the same interface, so the DBMS chosen will be transparent to the applications that use them
- Then, using these classes, a list of write and read tests will be developed to compare both solutions

A description of each test and of how to run it will be given in this part.

Conclusions and last words

The objective of this study is to learn about NoSQL and to have concrete data to decide whether it is a good idea to use NoSQL for some of PSA's needs. Here the results obtained, their limitations if there are any and the future work to do will be detailed.

27 Database Management System

NoSQL State of the Question

1. What are we talking about

As I said in the previous chapter, NoSQL is a class of database management systems that differ from the classic model of the relational database management systems:

- They do not use SQL
NoSQL database systems rose alongside major internet companies, such as Google, Amazon, Twitter, and Facebook, which had significantly different challenges in dealing with huge quantities of data that the traditional RDBMS solutions could not cope with. The kind of problem these databases are developed for is the management of really big amounts of data that do not necessarily follow a fixed schema. The data is partitioned between different machines (for performance reasons and due to its size), so JOIN operations are not usable and ACID guarantees are not given. Relational DBMS and SQL are therefore not valid tools.

- May not give full ACID guarantees
Usually only eventual consistency is guaranteed, or transactions are limited to single data items. This means that, given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system.

- Usually they have a distributed architecture and are fault tolerant
Several NoSQL systems employ a distributed architecture, with the data held in a redundant manner on several servers. In this way, the system can easily scale out by adding more servers, and the failure of a server can be tolerated.

This type of database typically scales horizontally and is used for managing big amounts of data when performance and real-time behaviour are more important than consistency (as when indexing a large number of documents, serving pages on high-traffic websites or delivering streaming media).

NoSQL databases are often highly optimized for retrieve and append operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run time flexibility compared to full SQL systems is compensated by significant gains in scalability and performance for certain data models.

In short, NoSQL database management systems are useful when we work with a huge quantity of data and the nature of the data does not require a relational model for the data structure. The data may be structured, but the structure is minimal, and what matters is the ability to store and retrieve great quantities of data, not the relations between the elements. For example, we want to store millions of key-value pairs in one or a few associative arrays, or we want to store millions of data records. This is particularly useful for statistical or real-time analyses of growing lists of elements (think of posts at Twitter or the logs of access to the Internet from a big group of users).

NoSQL databases are categorized according to the way they store the data. The main categories we could consider are:
- Document-oriented databases
- Key-value databases
- Graph databases
- Tabular databases, also called Columnar databases

2. Document-oriented databases

A document-oriented database stores, retrieves, and manages semi structured data. The element of data is called "document". Different implementations offer different ways of organizing and/or grouping documents:
- Collections
- Tags
- Non-visible Metadata
- Directory hierarchies

Compared to relational databases we could say that collections are like tables and documents are like records. But there is one big difference: every record in a table has the same number of fields, while documents in a collection could have completely different fields.

Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).

Documents are addressed in the database via a unique key that represents that document. One of the other defining characteristics of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that you can use to retrieve a document, the database will offer an API or query language that will allow you to retrieve documents based on their contents.

Data example

For example, MongoDB uses a binary form of JSON to store data. An example MongoDB collection could be described as

    {
        "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
        "Last Name": "DUMONT",
        "First Name": "Jean",
        "Age": 43
    },
    {
        "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
        "Last Name": "PELLERIN",
        "First Name": "Franck",
        "Age": 29,
        "Address": "1 chemin des Loges",
        "City": "VERSAILLES"
    }

Some Open Source solutions

Examples of Open Source document-oriented NoSQL databases that we will study are:

MongoDB: A 10gen²⁸ project which stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON). MongoDB provides dynamic queries, indexes, geospatial indexes and master-slave replication with auto-failover. MongoDB is being developed as a business with commercial support available.
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks. [2]
If your DB is 3NF and you don't do any joins (you're just selecting a bunch of tables and putting all the objects together, AKA what most people do in a webapp), MongoDB would probably kick ass for you. [3]
For example [2]: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.

CouchDB: An Apache Software Foundation project which uses JSON over REST/HTTP. CouchDB provides master-master replication and versioning.
Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important. [2]
For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments. [2]

3. Key-value databases

Data is stored as key-value pairs, in a schema-less way. A value could be of any data type or object. Examples of Open Source key-value NoSQL databases that we will study are:

Cassandra²⁹: It is an Apache Software Foundation top-level project [1], initially developed by Facebook. It is distributed and designed for using commodity servers [4]. It is possible to use Map Reduce with Apache Hadoop.
Best used: When you write more than you read (logging). [2]
For example: Banking, financial industry. Writes are faster than reads, so one natural niche is real time data analysis. [2]
Apache Hadoop is a software framework that supports data-intensive distributed applications and which is becoming a standard for data storage and analysis.

Membase³⁰: Memcache-compatible database (it uses the Memcache protocol) but with persistence to disk and master-master replication (all nodes are identical).
Best used: Any application where low-latency data access, high concurrency support and high availability is a requirement. [2]
For example: Low-latency use-cases like ad targeting or highly-concurrent webapps like online gaming. [2]

Redis³¹: A very fast NoSQL database that keeps most data in memory. Provides master-slave replication and transactions. Like Memcached it does not scale: it can do sharding by handling it in the client, and therefore you can't just start adding new servers and increase your throughput. Nor is it fault tolerant. Your Redis server dies, and there goes that data. Redis also supports replication. [3]
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory). [2]
For example: Stock prices. Analytics. Real-time data collection. Real-time communication. [2]

Riak³²: Riak is a NoSQL database implementing the principles from Amazon's Dynamo storage system. Riak provides built-in Map Reduce support, full-text search, indexing & querying. Comes in "open source" and "enterprise" editions.
Best used: If you want something Cassandra-like, but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication. [2]
For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server. [2]

28 http://10gen.com/
29 http://cassandra.apache.org/
30 http://www.couchbase.com/membase
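The remark about sharding handled in the client can be made concrete with a small sketch. It is only an illustration under stated assumptions: plain Python dictionaries stand in for the key-value servers, the `ShardedStore` class and its method names are hypothetical, and no real Redis or Memcached client is involved.

```python
from zlib import crc32


class ShardedStore:
    """Client-side sharding over in-memory dicts standing in for servers."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def shard_index(self, key):
        # The client, not the server, decides where each key lives
        return crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_index(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.shard_index(key)].get(key, default)


store = ShardedStore(shard_count=3)
for i in range(1000):
    store.put(f"user:{i}", {"hits": i})

# How many keys would be looked up on a different server
# if a fourth server were simply added to the list?
moved = sum(
    1
    for i in range(1000)
    if crc32(f"user:{i}".encode("utf-8")) % 3
    != crc32(f"user:{i}".encode("utf-8")) % 4
)
print(f"{moved} of 1000 keys would change shard")
```

With this naive modulo scheme a change in the number of servers sends many keys to a different shard, so the existing data would have to be moved or would appear to be lost. That is one way to read the warning that you cannot just add servers; schemes such as consistent hashing are commonly used to reduce the number of keys that move.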
4. Graph databases

This kind of database is intended for data whose relations are well represented as a graph (elements interconnected with an undetermined number of relations between them). The data could be social relations, public transport links, road maps or network topologies, for example.

Examples of Open Source Graph NoSQL databases are:

Neo4j 33: Graph database with full ACID conformity, transactions, indexing of nodes and relationships and advanced path-finding with multiple algorithms.

FlockDB 34: Graph database created by Twitter for managing more than 13 billion relationships between its users.

31 http://redis.io/
32 http://wiki.basho.com/Riak.html
33 http://neo4j.org
34 https://github.com/twitter/flockdb
35 http://hbase.apache.org/

5. Tabular databases

In this kind of DBMS data is organized in columns. Each row has one or more values for a number of possible columns.

Examples of Open Source Tabular NoSQL databases are:

HBase 35: An alternative to Google's Big Table that uses Hadoop's HDFS as storage. Offers Map/Reduce with Apache Hadoop.

Best used: When you use the Hadoop/HDFS stack. When you need random, real-time read/write access to Big Table-like data. [2]

For example: For data that's similar to a search engine's data. [2]

Cassandra, mentioned above, can also be considered a tabular database, because keys map to multiple values, which are grouped into column families.

6. MySQL as NoSQL database

Oracle has recently released a preview version of the next MySQL 5.6 [5] at MySQL Developer Zone. This version has a NoSQL interface. With this interface applications can write and read to an InnoDB storage using a Memcache-type API. The data can be in memory or stored in the InnoDB Storage Engine, and multiple columns can be stored in the value.

This software is still experimental, but it could be interesting in the future. More information can be found in the following articles:

- MySQL 5.6 preview introduces a NoSQL interface, at The H web. [6]
- NoSQL to InnoDB with Memcached, at the "Transactions on InnoDB" blog. [7]

7. Other interesting readings and feature comparisons

- Consistency Models in Non-Relational Databases, by Guy Harrison: A good explanation of the CAP Theorem, eventual consistency and how consistency problems can be handled in distributed environments. [8]
- The appendix of the O'Reilly book Cassandra: The Definitive Guide gives a very good description of the current status of NoSQL. [9]

The following articles give good explanations that help to decide which NoSQL solution is the right choice for the scenario we are facing:

- CouchDB vs MongoDB, by Gabriele Lana: A good comparison between CouchDB and MongoDB with an excellent explanation of Map/Reduce. [10]
- Should I use MongoDB or CouchDB (or Redis)?, by Riyad Kalla. [11]

Detailed description of some NoSQL DBMS

I have read a lot about a few NoSQL Database Management Systems which could be interesting for our needs (log management). In particular there are three that initially caught my attention: Cassandra, CouchDB and MongoDB. In this section I will give a brief description of each one.

1. Cassandra

Apache Cassandra 36 is an open source distributed NoSQL database management system. It is an Apache Software Foundation top-level project [1] designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure.

Cassandra provides a structured key-value store with tunable consistency. Keys map to multiple values, which are grouped into column families. Different keys can have different numbers of columns. This makes Cassandra a hybrid data management system between a key-value and a tabular database.

History

Apache Cassandra was developed at Facebook to power their Inbox Search feature by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik. It was released as an open source project on Google code in July
2008. In March 2009, it became an Apache Incubator project. On February 17, 2010 it graduated to a top-level project. [12]

Facebook abandoned Cassandra in late 2010 when they built the Facebook Messaging platform on HBase. [13]

Licensing and support

Apache Cassandra is an Apache Software Foundation project, so it has an Apache License (version 2.0) 37.

Professional-grade support is available from a few companies. The official wiki of the Apache Cassandra project [14] mentions the following ones, which collaborate with the developers of the project:

- Acunu 38
- Datastax 39

36 http://cassandra.apache.org/
37 http://www.apache.org/licenses/LICENSE-2.0.html
38 http://www.acunu.com/
39 http://datastax.com/

Main features

Decentralized
Every node in the cluster has the same role. There is no single point of failure. Data is distributed across the cluster (so each node contains different data), but there is no master as every node can service any request.

Supports replication and multi datacenter replication
Replication strategies are configurable [15]. Cassandra is designed as a distributed system, for deployment of large numbers of nodes across multiple data centers. Key features of Cassandra's distributed architecture are specifically tailored for multiple datacenter deployment, for redundancy, for failover and disaster recovery.

Elasticity
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

Fault-tolerant
Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

Tunable consistency
Writes and reads offer a tunable level of consistency, all the way from "writes never fail" to "block for all replicas to be readable", with the quorum level in the middle.

Map Reduce support
Cassandra has Hadoop integration, with Map Reduce support. There is also support for Apache Pig 40 and Apache Hive 41. [16]
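
The tunable consistency feature listed above is often reasoned about with the replica-overlap rule of Dynamo-style systems: with N replicas, a read that waits for R replicas is guaranteed to see the latest write acknowledged by W replicas when R + W > N. The Python sketch below only illustrates that arithmetic; the function names are hypothetical and it does not call any Cassandra API. ONE, QUORUM and ALL are used here as labels for the corresponding consistency levels.

```python
def quorum(replicas):
    """Smallest majority of `replicas` nodes."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replicas


N = 3  # replication factor assumed for this example
levels = {
    "write ONE / read ONE": (1, 1),
    "write QUORUM / read QUORUM": (quorum(N), quorum(N)),
    "write ALL / read ONE": (N, 1),
}

results = {
    name: reads_see_latest_write(N, w, r) for name, (w, r) in levels.items()
}
for name, ok in results.items():
    print(f"{name}: overlap guaranteed = {ok}")
```

With three replicas, writing and reading at ONE favours availability and speed but gives no overlap guarantee, while QUORUM on both sides keeps the guarantee and still tolerates one unavailable replica.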
Query language
CQL (Cassandra Query Language), an SQL-like alternative to the traditional RPC interface, was introduced. Language drivers are available for Java (JDBC) and Python (DBAPI2).

Enterprises who use Cassandra

A very brief list of known enterprises that use Cassandra could be:

- Netflix uses Cassandra as the back-end database for their streaming services. [17]
- Twitter announced it is planning to use Cassandra because it can be run on large server clusters and is capable of taking in very large amounts of data at a time. [18]
- Urban Airship uses Cassandra with the mobile service hosting for over 160 million application installs across 80 million unique devices. [19]
- Constant Contact uses Cassandra in their social media marketing application. [20]
- Rackspace. [21]
- Cisco's WebEx uses Cassandra to store user feed and activity in near real time. [22]

A more complete list can be found at the Datastax "Cassandra Users" page 42.

40 http://pig.apache.org/
41 http://hive.apache.org
42 http://www.datastax.com/cassandrausers

Data manipulation: keys, row keys, columns and column families

As said in the NoSQL State of the Question section, we can consider Cassandra a hybrid between a key-value and a tabular database.

To each key in Cassandra corresponds a value which is an object. Each key has values as columns, and columns are grouped together into sets called column families. Column families can in turn be grouped in super column families.

So each key identifies a row with a variable number of elements. These column families can then be considered as tables. A table in Cassandra is a distributed multi-dimensional map indexed by a key.

Furthermore, applications can specify the sort order of columns within a Super Column or Simple Column family.

Tools for Cassandra

The Cassandra download includes built-in tools for accessing Cassandra, such as cassandra-cli and nodetool. There are also third party tools available, such as the following: [23]
Data browsers
- Chiton 43, a GTK data browser.
- cassandra-gui 44, a Swing data browser.

Administration tools
- OpsCenter 45 is a tool for management and monitoring of a Cassandra cluster. The Community Edition of OpsCenter is free for anyone to download and use. There is also an Enterprise Edition of OpsCenter that includes additional features.
- Cassandra Cluster Admin 46 is a GUI tool to help people administer their Apache Cassandra cluster, similar to PHPMyAdmin for MySQL administration.

Client interfaces and language support
Cassandra has a lot of high-level client libraries for Python, Java, .Net, Ruby, PHP, Perl, C++, etc. [24]
For a detailed list of client software go to the Client Options article on Cassandra's Wiki 47.

Integration with other tools
There are other tools worth mentioning like Solandra 48, a Cassandra backend for Apache Solr 49, a web application built around Lucene for full text indexing and search.
For monitoring purposes Cassandra is well integrated with Ganglia [25] and there are plugins for other monitoring systems such as Nagios.

43 http://github.com/driftx/chiton
44 http://code.google.com/p/cassandra-gui
45 http://www.datastax.com/products/opscenter
46 https://github.com/sebgiroux/Cassandra-Cluster-Admin
47 http://wiki.apache.org/cassandra/ClientOptions
48 https://github.com/tjake/Solandra (Solandra source at Github)
49 http://lucene.apache.org/solr/

Conclusions

If we need to handle very big amounts of data, with more writes than reads (as for real time data analysis, for example), Cassandra could be a good NoSQL solution.

I will emphasize the following:

- Hadoop's utilisation for Map Reduce is integrated in Cassandra [16], but the architecture will be fairly complex (simpler than HBase according to Dominic Williams [26])
- Its tunable consistency and multi datacenter support

Interesting readings

- Cassandra - A Decentralized Structured Storage System, a 2009 paper presenting Cassandra by
its creators Avinash Lakshman and Prashant Malik. [25]
- 4 Months with Cassandra, a love story, a chronicle and the main reasons why Cassandra was adopted at CloudKick. [27]
- HBase vs Cassandra: why we moved, post from Dominic Williams' blog, where he explains why they moved from HBase to Cassandra. [26]
- HBase vs Cassandra, from the Adku blog. [28]
- Cassandra vs. (CouchDB | MongoDB | Riak | HBase), from Brian O'Neill's blog. [29]
- Introduction to Cassandra: Replication and Consistency, presentation by Benjamin Black. [30]

2. CouchDB

CouchDB 50 is an open source document-oriented NoSQL database system. It is similar to MongoDB.

Created in 2005, it became an Apache Software Foundation project in 2008 and is part of the "new" NoSQL family of database systems. Instead of storing data in tables as is done in a "classical" relational database, CouchDB stores structured data as JSON documents with dynamic schemas, making the integration of data in certain types of applications easier and faster.

CouchDB is interesting in part due to its Multi-Version Concurrency Control. This means that we have versioning support for the data and that readers will not block writers and writers will not block readers.

CouchDB uses a RESTful JSON API for accessing data, which allows accessing data using HTTP requests.

Other features are ACID semantics with eventual consistency, Map Reduce, incremental replication, and fault-tolerance. It comes with a web console.

History

CouchDB (Couch is an acronym for cluster of unreliable commodity hardware) [31] is a project created in April 2005 by
Damien Katz, former Lotus Notes developer at IBM. Damien Katz defined it as a storage system for a large scale object database. His objectives for the database were for it to become the database of the Internet and that it would be designed from the ground up to serve web applications. He self-funded the project for almost two years and released it as an open source project under the GNU General Public License.

In February 2008, it became an Apache Incubator project and the license was changed to the Apache License [32]. A few months after, it graduated to a top-level project. [33]

Currently, CouchDB is maintained at the Apache Software Foundation with backing from IBM. Katz works on it full-time as the lead developer.

The first stable version was released in July 2010 [34]. The last version is 1.2, released in April 2012.

Licensing and support

CouchDB is an Apache Software Foundation project, and so it has an Apache License 2.0.

CouchDB has commercial support from the enterprises Couchbase 51 and Cloudant 52.

50 http://couchdb.apache.org/
51 http://www.couchbase.com
52 https://cloudant.com/

Main features

A summary of the main features could be the following:

Document Storage
CouchDB stores data as "documents", as one or more field/value pairs expressed as JSON. Field values can be simple things like strings, numbers, or dates. But you can also use ordered lists and associative arrays. Every document in a CouchDB database has a unique id and there is no required document schema.

ACID Semantics
CouchDB provides ACID semantics [35]. It does this by implementing a form of Multi-Version Concurrency
Control, meaning that CouchDB can handle a high volume of concurrent readers and writers without conflict.

Map/Reduce Views and Indexes
The stored data is structured using views. In CouchDB, each view is constructed by a JavaScript function that acts as the Map half of a map/reduce operation. The function takes a document and transforms it into a single value which it returns. CouchDB can index views and keep those indexes updated as documents are added, removed, or updated.

Distributed Architecture with Replication
CouchDB was designed with bi-directional replication (or synchronization) and off-line operation in mind. That means multiple replicas can have their own copies of the same data, modify it, and then sync those changes at a later time.

REST API
All items have a unique URI that gets exposed via HTTP. REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic (Create, Read, Update, Delete) operations on all resources.

Eventual Consistency
CouchDB guarantees eventual consistency to be able to provide both availability and partition tolerance.

Built for Offline
CouchDB can replicate to devices (like smartphones) that can go offline and handle data sync for you when the device is back online.

CouchDB also offers a built-in admin interface, accessible via web, called Futon [36].

Use cases & production deployments

The replication and synchronization capabilities of CouchDB make it ideal for mobile devices, where the network connection is not guaranteed but the application must keep on working offline.

CouchDB is well suited for applications with accumulating, occasionally changing data, on which pre-defined queries are to be run and where versioning is important (CRM, CMS systems, for example). Master-master replication is an especially interesting feature, allowing easy multi-site deployments [2].

Enterprises who use CouchDB

CouchDB is used in certain applications for Android like SpreadLyrics 53 and applications for Facebook like Will you Kissme or Birthday
Greeting Cards or web sites like Friendpaste [37].

53 https://play.google.com/store/apps/details?id=br.com.smartfingers.spreadlyrics

A few examples of enterprises that used or are using CouchDB are:

- Ubuntu 54, for its synchronization service Ubuntu One until November 2011 [38], but it was discontinued because of scalability issues. [39]
- The BBC 55, for its dynamic content platforms. [40]
- Credit Suisse 56, for internal use at the commodities department for their marketplace framework. [37]
- Meebo 57, for their social platform (web and applications).

For a complete list of software projects and web sites that use CouchDB, read the CouchDB in the wild [37] article on the product's web site.

Data manipulation: Documents and Views

CouchDB is similar to other document stores like MongoDB. CouchDB manages a collection of JSON documents. The documents are organised via views. Views are defined with aggregate functions and filters and are computed in parallel, much like Map Reduce.

Views are generally stored in the database and their indexes updated continuously. CouchDB supports a view system using external socket servers and a JSON-based protocol. [41] As a consequence, view servers have been developed in a variety of languages (JavaScript is the default, but there are also PHP, Ruby, Python and Erlang).

Accessing data via HTTP

CouchDB provides a set of RESTful HTTP methods (e.g., POST, GET, PUT or DELETE). We can access the data using cURL, for example. A few examples of data access via HTTP could be:

For accessing CouchDB server info:

    curl http://127.0.0.1:5984/

The CouchDB server returns a response in JSON format:

    {"couchdb":"Welcome","version":"1.1.0"}

For creating a database we could run:

    curl -X PUT http://127.0.0.1:5984/wiki

If the database does not exist, CouchDB will reply with

    {"ok":true}

or, with a different response message, if the database already
exists:

    {"error":"file_exists","reason":"The database could not be created, the file already exists."}

54 http://www.ubuntu.com/
55 http://www.bbc.co.uk/
56 http://www.credit-suisse.com
57 https://www.meebo.com/about/

Conclusions

To know whether CouchDB is for us I will emphasize the following:

- To get results we have to define views. This means that if our problem cannot be resolved with a set of predefined queries CouchDB is not for us, as it lacks flexibility in the way of querying data
- For this reason the initial learning curve is harder
- CouchDB has master-master replication support, ideal for a multi-node setup across different data centers. It can also replicate to devices that can go offline (like smartphones) and handle data sync for you when the device is back online (making it a good solution for applications working in distributed environments with non guaranteed connection)
- CouchDB has multiple versions support
- The interface is HTTP/REST, so it is easily accessible by any application/language/server
- According to what is said in a few blogs, CouchDB is not a very mature project. The last version is 1.2.0 and it has breaking changes with regard to the previous version. [42]

Interesting readings

- Why CouchDB?, from "CouchDB: The Definitive Guide" [43]
- Comparing MongoDB and CouchDB, from the MongoDB web site [44]
- MongoDB or CouchDB - fit for production?, question and responses at StackOverflow [45]
- 3 CouchDB Case Studies, post on Alex Popescu's NoSQL blog [46]
- CouchDB for access log aggregation and analysis, post on the UserPrimary.net blog [47]

3. MongoDB

MongoDB (from humongous) is an open source document-oriented NoSQL database system.

MongoDB is part of the new NoSQL family
of database systems. Instead of storing data in tables as is done in a classical relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

Development of MongoDB was started in October 2007 by 10gen 58. It is now a mature and feature rich database ready for production use. It is used, for example, by MTV Networks [48], Craigslist [49] or Foursquare [50].

History

Development of MongoDB began at 10gen in 2007, when the company was building a Platform as a Service similar to Google App Engine [51]. In 2009 MongoDB was open sourced as a stand-alone product [52], under the GNU Affero General Public License (or AGPL 59).

In March 2011, from version 1.4, MongoDB has been considered production ready [53].

The last stable version is 2.2.0, released in August 2012.

Licensing and support

MongoDB is available for free under the GNU Affero General Public License [54]. The language drivers are available under an Apache License.

MongoDB is being developed by 10gen as a business with commercial support available [55].

Main features

A summary of the main features could be the following:

Ad hoc queries
MongoDB supports search by field, range queries and regular expression searches. Queries can return specific fields of documents and also include user-defined JavaScript functions.

Indexing
Any field in a MongoDB document can be indexed (indexes in MongoDB are conceptually similar to those in RDBMS). Secondary indexes and geospatial indexes are also available.

Replication
MongoDB supports "master-slave replication". A master can perform reads and writes. A slave copies data from the master and can only be used for reads or backup (not writes). The slaves have the ability
to elect a new master if the current one goes down.

58 http://en.wikipedia.org/wiki/10gen
59 http://www.gnu.org/licenses/agpl.html

Load balancing
MongoDB scales horizontally using a system called sharding [56]. The developer chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more slaves.)
MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure. Automatic configuration is easy to deploy and it is possible to add new machines to a running database.

File storage
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.
This function, called GridFS [57], is included with the MongoDB drivers and is readily available for development languages. MongoDB exposes functions for manipulating files and their contents to developers. GridFS is used, for example, in plugins for NGINX [58] and lighttpd [59].
In a MongoDB system with multiple machines, files can be distributed and copied multiple times between machines transparently, giving a load balanced and fault tolerant system.

Aggregation
Map Reduce can be used for batch processing of data and aggregation operations. The aggregation framework enables users to obtain the kind of results SQL group-by is used for.

Server-side JavaScript execution
JavaScript can be used in queries and aggregation functions (such as Map Reduce), which are sent directly to the database to be executed.

Capped collections
MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion order and, once the specified size has been reached, behaves like a circular queue.

Use cases & production deployments

According to the Use Cases article on the product's web site, MongoDB [60]
is well suited for the following cases:

- Archiving and event logging
- Document and Content Management Systems. As a document-oriented (JSON) database, MongoDB's flexible schemas are a good fit for this.
- E-Commerce. Several sites are using MongoDB as the core of their e-commerce infrastructure (often in combination with an RDBMS for the final order processing and accounting).
- Gaming. High performance small read/writes are a good fit for MongoDB; also for certain games geospatial indexes can be helpful.
- High volume problems. Problems where a traditional DBMS might be too expensive for the data in question. In many cases developers would traditionally write custom code to a file system instead using flat files or other methodologies.
- Mobile. Specifically, the server-side infrastructure of mobile systems. Geospatial key here.
- Operational data store of a web site. MongoDB is very good at real-time inserts, updates, and queries. Scalability and replication are provided which are necessary functions for large web sites' real-time data stores. Specific web use case examples.
- Projects using iterative/agile development methodologies. Mongo's BSON data format makes it very easy to store and retrieve data in a document-style / schemaless format. Addition of new properties to existing objects is easy and does not generally require blocking "ALTER TABLE" style operations.
- Real-time stats/analytics

Enterprises who use MongoDB

Among the enterprises who use MongoDB there are: MTV Networks, Craigslist, Disney Interactive Media Group, Wordnik, Diaspora, Shutterfly, foursquare, bit.ly, The New York Times, SourceForge, Business Insider, Etsy, CERN LHC, Thumbtack, AppScale, Uber or The Guardian.

For a complete list and references on each particular use case visit the article "Production Deployments" on MongoDB's web site [61].
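
The capped collections described under Main features are a natural fit for the archiving and event logging use case listed above. Their circular-queue behaviour can be pictured with the following Python sketch, which uses the standard library's bounded deque as an analogy. This is only an assumption-laden illustration, not MongoDB code: a real capped collection is created through the database with a maximum size, while here the limit is simply a number of entries.

```python
from collections import deque

# Hypothetical "capped collection" holding at most 3 log entries
capped_log = deque(maxlen=3)

# Insert 5 entries; once the limit is reached the oldest ones are discarded
for line_no in range(1, 6):
    capped_log.append({"line": line_no, "msg": f"event {line_no}"})

# Entries are kept in insertion order, oldest surviving entry first
kept = [entry["line"] for entry in capped_log]
print(kept)
```

After five inserts only the three most recent entries remain, in insertion order, which is the behaviour wanted for a rolling log.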
Data manipulation: Collections and Documents

MongoDB stores structured data as JSON-like documents (in a format called BSON) with dynamic schemas, that is, with no predefined schema.

The unit of data is called a document, and documents are stored in collections. One collection may have any number of documents.

Compared to relational databases, we could say that collections are like tables and documents are like records. But there is one big difference: every record in a table has the same number of fields, while documents in a collection can have completely different fields.

A table of a few records with the fields Last name, First name and Age, and possibly others like Address or City, could be described as the following MongoDB collection:

    {
      "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
      "Last Name": "DUMON",
      "First Name": "Jean",
      "Age": 43
    },
    {
      "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
      "Last Name": "PELLERIN",
      "First Name": "Franck",
      "Age": 29,
      "Address": {
        "Street": "1 chemin des Loges",
        "City": "VERSAILLES"
      }
    }

Documents in a MongoDB collection can have different fields (note: the _id field is obligatory and automatically created by MongoDB; it is a unique index which identifies the document).

In a document, new fields can be added and existing ones removed, modified or renamed at any moment. There is no predefined schema. A document structure is really simple and composed of key-value pairs, like associative arrays in programming languages, in JSON format. The key is the field name, the value is its content. As values we can use numbers, strings and also binary data like images, or other key-value pairs.

Language Support

MongoDB has official drivers for: C 60, C++ 61, C# / .Net 62, Erlang 63, Haskell 64, Java 65, JavaScript 66, Lisp 67, Perl 68, PHP 69, Python 70, Ruby 71 and Scala 72.

There are also a large number of unofficial drivers for ColdFusion 73, Delphi 74, Lua 75, node.js 76, Ruby 77, Smalltalk 78
and many others.

Management and graphical frontends

MongoDB tools

A MongoDB installation includes the following commands:

mongo
MongoDB offers an interactive shell called mongo [62], which lets developers view, insert, remove, and update data in their databases, as well as get replication information, set up sharding, shut down servers, execute JavaScript, and more.
Administrative information can also be accessed through a web interface, a simple web page that serves information about the current server status. By default, this interface is 1000 ports above the database port (28017).

60 http://github.com/mongodb/mongo-c-driver
61 http://github.com/mongodb/mongo
62 http://www.mongodb.org/display/DOCS/CSharp+Language+Center
63 https://github.com/TonyGen/mongodb-erlang
64 http://hackage.haskell.org/package/mongoDB
65 http://github.com/mongodb/mongo-java-driver
66 http://www.mongodb.org/display/DOCS/JavaScript+Language+Center
67 https://github.com/fons/cl-mongo
68 http://github.com/mongodb/mongo-perl-driver
69 http://github.com/mongodb/mongo-php-driver
70 http://github.com/mongodb/mongo-python-driver
71 http://github.com/mongodb/mongo-ruby-driver
72 https://github.com/mongodb/casbah
73 http://github.com/virtix/cfmongodb
74 http://code.google.com/p/pebongo/
75 http://code.google.com/p/luamongo/
76 http://www.mongodb.org/display/DOCS/node.JS
77 http://github.com/tmm1/rmongo
78 http://www.squeaksource.com/MongoTalk.html

mongostat
mongostat is a command-line tool that displays a simple list of stats about the last second: how many inserts, updates, removes, queries, and commands were performed, as well as what percentage of the time the database was locked and how much memory
it is using.

mongosniff
mongosniff sniffs network traffic going to and from MongoDB.

Monitoring plugins

There are MongoDB monitoring plugins available for the following network tools: Munin 79, Ganglia 80, Cacti 81, Scout 82

Cloud-based monitoring services

MongoDB Monitoring Service 83 is a free, cloud-based monitoring and alerting solution for MongoDB deployments offered by 10gen, the company that develops MongoDB.

Web & Desktop Application GUIs

Several GUIs have been created by MongoDB's developer community to help visualize their data. Some popular ones are:

Open Source tools
- RockMongo 84: PHP based MongoDB administration GUI tool
- phpMoAdmin 85: another PHP GUI that runs entirely from a single 95kb self-configuring file
- JMongoBrowser 86: a desktop application for all platforms
- Mongo3 87: a Ruby-based interface
- Meclipse 88: Eclipse plugin for interacting with MongoDB

79 http://github.com/erh/mongo-munin
80 http://github.com/quiiver/mongodb-ganglia
81 http://tag1consulting.com/blog/mongodb-cacti-graphs
82 http://scoutapp.com/plugin_urls/291-mongodb-slow-queries
83 http://www.10gen.com/mongodb-monitoring-service
84 http://code.google.com/p/rock-php/wiki/rock_mongo
85 http://www.phpmoadmin.com/
86 http://www.edgytech.com/jmongobrowser/
87 http://mongo3.com/
88 http://update.exoanalytic.com/org.mongodb.meclipse/
89 http://mongohub.todayclose.com/
90 http://www.nucleonsoftware.com
91 http://www.nucleonsoftware.com

Proprietary tools
- MongoHub 89: a freeware native Mac OS X application for managing MongoDB
- Database Master 90: development and administration tool for Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, SQLite ... which allows running SQL, LINQ and JSON queries over databases. Developed by Nucleon Software 91 for Windows systems
- BI Studio: business intelligence and data analysis software which allows designing database reports, charts and dashboards, from the same company
as Database Master

Conclusions

To know whether MongoDB is for us I will emphasize the following:

- MongoDB has a query language, which makes getting the data dynamic and flexible
- The interface is a custom protocol over TCP/IP, with native drivers for a lot of languages. The use of a binary protocol makes the operations faster than in others (like CouchDB)
- Replication is master-slave ONLY, as with MySQL. If you need multiple masters in a Mongo environment, you have to set up sharding.

Other considerations I would add are:

- the documentation on the product's web page is very good (for example, the Use cases documentation [60])
- as said, there is commercial support available.

Interesting Readings

- MongoDB Schema Design: How to Think Non-Relational, Jared Rosoff's presentation at Youtube [63]
- Real-time Analytics with MongoDB, presentation by Jared Rosoff [64]
- Web Analytics using MongoDB, from the book "PHP and MongoDB Web Development Beginner's Guide" [65]

Looking for a NoSQL solution for our needs

1. Introduction and initial approach

As I said, the main justification we have for looking at NoSQL technologies is log management. In the enterprise where I work, we have two use cases:

Analysis of Internet access logs
Right now these accesses are stored in MySQL tables. We are talking about tens of thousands of users with Internet access and a few logs that record every URL and download size. As several Gigabytes are generated daily, logs are stored in different tables, partitioning the data by month (a table for January, one for February, etc ...)

Log analysis of geographically distributed Unix servers
Used for communication with sales offices, we are talking of logs of the various services (FTP, SSH, Apache, MySQL, file transfer services developed by the company
...) of about 1000 Linux servers

For this analysis we will choose the first one.

2. Analysis of Internet access logs, describing the problem

What we want to achieve is an efficient storage and analysis of the logs of communications made by employees (tens of thousands) with Internet access. At least more efficient than our current solution.

For now it has been decided to divide the data into different tables by month. This decision has been taken for reasons of volume. One immediate consequence is that it is quite complicated to make queries asking for data of different months, as the developer will have to think carefully about how to design the queries and the server will have to access multiple tables.

The critical problem here is to handle huge amounts of data. Do we need relations? Yes and no. If we gain performance we don't mind repeating certain data (such as the user name, for example).

We will get statistics and reports based on log analysis. Each one of these records will include information such as:

- Date and time of access
- User name
- URL accessed
- Size in bytes of the network transmission

The questions we want to answer are of the following type:

- What are the most visited pages? And the most visited per month? And in the last week?
- Which users spend more time online?
- Which are the 100 users with the greatest traffic volume?
- What is the average daily volume of traffic from the corporate network to the Internet?

Data size estimation

The size and number of records of data that is being stored by
month are:
- Data size: between 150 and 300 GB
- Number of log entries: between 650 million and 1,700 million

This is for a user population of around 70,000 who access 1.5 million domains.

So, given the current traffic size, in a year we could reach a stored volume of 3,600 GB for 20 billion entries in the log.

3. Oh my God! We have a lot of options!

Well, well, well ... As said in the second chapter, NoSQL State of the Question, there are a lot of NoSQL Database Management Systems and for a beginner it seems difficult to know where to begin.

After reading a lot on the subject (see the recommended readings at the end of this section), the best known Open Source NoSQL DBMS are: MongoDB (92), CouchDB (93), Cassandra (94), Membase (95), Redis (96), Riak (97), Neo4J (98), FlockDB (99) and HBase (100), among others.

The first thing I would recommend is to give a quick read to the article NoSQL on English Wikipedia (101)
(with contributions from myself :-)) and the first article in the series Picking the Right NoSQL Database Tool (102).

A quick summary (as a reminder, it is explained in more detail in the second chapter) would be that, depending on the way data is organized, NoSQL databases are divided into ...
- Document oriented: as MongoDB or CouchDB. Data is stored in structured formats (records) such as JSON. Each data unit is called a document (here this word has nothing to do with a file type).
- Key-value: as Cassandra, Membase, Redis or Riak. Data is stored in key-value pairs (a value might be an object)
- Graph oriented: as Neo4J or FlockDB. They store the elements and their relationships in a graph style (for social networks, transport networks, road maps or network topologies, for example)
- Tabular: as Cassandra or HBase. Data is stored in rows with several columns that correspond to a key, with a result similar to a table

(92) http://www.mongodb.org/
(93) http://couchdb.apache.org/
(94) http://cassandra.apache.org
(95) http://www.couchbase.com/membase
(96) http://redis.io/
(97) http://wiki.basho.com/
(98) http://neo4j.org/
(99) http://github.com/twitter/flockdb
(100) http://hbase.apache.org/
(101) http://en.wikipedia.org/wiki/NoSQL
(102) http://blog.monitis.com/index.php/2011/05/22/picking-the-right-nosql-database-tool/

4. Questions we should answer before making a choice

There is no one-size-fits-all NoSQL solution, as this term applies to a wide range of database management systems. It's perfectly possible and reasonable to use several systems at the same time (a relational DBMS such as MySQL and one or more NoSQL DBMS), depending on the type of data to store and to query.

In the end the choice will depend on the nature of the problem we want to solve. A
NoSQL solution does not replace a relational database; it complements it for the kind of problems where a relational DBMS does not perform well enough. Unless, of course, we're using a SQL database for the wrong problem.

We should be able to answer the following questions before looking for a product:
- What type of data will be handled? Could this data be naturally organized in associative arrays? Or in key-value pairs? Is it data which will fit in an XML or similar structure?
- Do we need transactions?
- Do we need to use Map Reduce?

And when reviewing the different options:
- Is the latest version considered stable?
- Does it have commercial support?
- What is the learning curve?
- Is good documentation available? Is there an active community?

5. Description of the data for an Internet Access Log management system

Description of the problem

As said, it is the kind of data we manage, its structure and the nature of the problem which will lead us to one NoSQL solution or another.

The data we want to manage are access logs generated by several HTTP proxies of the company for several tens of thousands of users. We have two different types of records: records from FTP access and from the rest (mainly HTTP).

For each FTP access we will save:
- IP of the host that makes the request
- date and time of access
- Internet domain accessed
- URI
- size of the transfer

For each non-FTP access:
- IP of the host that makes the request
- the user id
- date and time of access
- the HTTP method used
- protocol
- Internet domain accessed
- URI
- HTTP return code
- size of the transfer

Besides storing the data, the following statistical reports will be created:
- Number of hits and volume of data transferred by Internet domain, daily and monthly
- Number of hits and volume of data transferred by user, daily and monthly

Definition of our needs

So we can reach the following first definition of our needs:
- each data entry could be represented as an associative array
- each record is unrelated to the others
- each entry
  is stored in a log table, which grows indefinitely
- accesses to the database are mostly writes
- each access means a change in the statistical values which reflect daily and monthly access by domain and user
- the list of queries sent by our application is known (anyway, the schema should be defined so that new ones can be easily added)

This leads us to the following conclusions:
- The data are records with multiple fields, so we need a document-oriented or a tabular database (multiple columns for a record)
- Map Reduce is desired. To have reports in real time, each access will update the daily and monthly statistics for domain and user
- We don't need master-master replication (proxies in different geographic areas manage accesses from different users)
- We don't need support for multiple versions (there is no such thing in a log)
- We don't need real data consistency
- We don't need transactions (data will be added one entry after another, isolated)

And also the product chosen must be:
- Open Source
- Ready for production environments (stable)
- With professional support

Not bad!

6. Choosing between several NoSQL products

If we discard the databases that hold data in memory, as key-value pairs and as graphs, we are left with the following options: MongoDB, CouchDB, Cassandra, HBase and Riak.

To what is told in all the documents read I will add the following thoughts (read also the Interesting readings sections of the different chapters):

MongoDB (103)

PROS:
- It's a document-oriented database, therefore very flexible in structuring the data (uses JSON)
- It has a dynamic query language
- It has professional support by the company that developed the product, 10gen (104)
- It has a large and active community (present at conferences, I have seen them at FOSDEM (105) in Brussels this year)
- It has support for Map Reduce
- It's a mature product, considered production ready (current version is 2.2)
- The documentation on their website is really good
- There are native drivers for multiple languages made by
  10gen

CONS:
- Although it is not difficult to install and run, MongoDB is not a simple product. An installation of MongoDB has several types of services: data servers, configuration servers and servers that route the client requests
- Replication is only master-slave

(103) http://www.mongodb.org/
(104) http://www.10gen.com/
(105) https://fosdem.org

CouchDB (106)

PROS:
- It is a document-oriented database, so very flexible in structuring the data
- Multi-version concurrency control
- It has master-master replication

CONS:
- To achieve versioning, data is not modified: each time a modification is done a new version is added. This takes a lot of disk space, and batch processes are necessary for data compaction operations
- It is not very mature. The latest version is 1.2.0 and has changes that make it incompatible with the previous versions
- To exploit the data it is necessary to define views, which means that queries must be defined in advance (not very flexible)

Riak (107)

PROS:
- It is a hybrid database, which stores documents and key-value pairs
- There is no central controller and therefore no single point of failure
- It has support for Map Reduce
- It has support for transactions

CONS:
- It has two versions, one open source and a commercial one with multi-site replication

Cassandra (108)

PROS:
- It is a top-level Apache project
- It is a tabular database (can store multiple key columns), making it flexible and valid for our case
- Designed for situations where there are more writes than reads. It scales very well in these cases (ideal for log analysis)
- Designed to replicate data between multiple data centers
- Provides integration with Hadoop (109) for Map Reduce
- The consistency
  level is configurable
- There is no central controller and therefore no single point of failure
- It has support for transactions (with ZooKeeper (110))

CONS:
- Maybe too complex

HBase (111)

PROS:
- It is an Apache project (subproject of Hadoop)
- Similar to Cassandra (can store multiple key columns)
- Provides integration with Hadoop for Map Reduce

CONS:
- Too complex

(106) http://couchdb.apache.org/
(107) http://wiki.basho.com/
(108) http://cassandra.apache.org/
(109) http://hadoop.apache.org/
(110) http://zookeeper.apache.org/
(111) http://hbase.apache.org/

7. And the Winner is ..... MongoDB!

After reading everything I opted for MongoDB, mainly because:
- it meets all the requirements stated at the beginning: document-oriented, with Map Reduce, Open Source, stable and professionally supported
- support is given by the same company that developed the product, 10gen, which is clearly in the know :-)
- it has a complete website with extensive documentation
- its developers are very active; they are present in many conferences and lectures (as seen in the events article (112) of their web)
- comparatively, this product does not seem too complex

As this will be the first deployment of a NoSQL database in the company, made by someone with no previous experience, I consider vital the availability of documentation and comprehensive guides.

In particular, and for our use case, I will highlight the following articles from their web:
- MongoDB is Fantastic for Logging [66]
- Using MongoDB for Real-time Analytics [67]

There is a good collection of interesting presentations on 10gen's web (113).

(112) http://www.mongodb.org/display/DOCS/Events
(113) http://www.10gen.com/presentations

8. What we will do from here ....

Once the NoSQL management system is chosen, what we have to do from here is:
  1. Install and configure MongoDB
  2. Design the MongoDB database schema for Internet Access Logs
  3. Develop some code which will allow us to use the schema from MongoDB or MySQL transparently
     to applications
  4. Make some performance tests under the same conditions to verify that using MongoDB is not only more flexible than MySQL, but also a better idea from a performance point of view

There are other alternatives that seem equally interesting and valid, such as Cassandra or Riak, and that could be interesting to test in a further study.

9. Interesting readings

- Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison, from Kristof Kovacs' blog [2]
- Picking the Right NoSQL Database Tool: post from Monitis' blog [68]
- "Consistency Models in Non-Relational Databases" by Guy Harrison: a good explanation of the CAP Theorem, eventual consistency and how consistency problems can be handled in distributed environments [8]

Installation of MongoDB

In this chapter I will describe the technical details for deploying a MongoDB 2.2.0 server on the Linux machines of PSA (the Linux distribution is Suse Enterprise Linux Server).

PSA has developed an in-house software distribution system (they use neither .deb nor .rpm packages, nor a known standard for remote installs). The software installed on their servers must meet a strict directory structure. For reasons of confidentiality, in this article we will assume that MongoDB will be installed under /opt/mongodb-2.2.0.

1. Deploying MongoDB binaries

After downloading the legacy static 64 bits version (version 2.2.0 as of September 2012) from the MongoDB Downloads page (114) the following commands have been executed:

    tar -xzvf mongodb-linux-x86_64-static-legacy-2.2.0.tgz
    mkdir /opt/mongodb-2.2.0
    mkdir /opt/mongodb-2.2.0/bin
    mkdir /opt/mongodb-2.2.0/docs
    mkdir /opt/mongodb-2.2.0/etc
    mkdir /opt/mongodb-2.2.0/logs
    mkdir /opt/mongodb-2.2.0/data
    mkdir /opt/mongodb-2.2.0/modules
    mkdir /opt/mongodb-2.2.0/tmp
    # bin/ already exists in the target, so move its contents, not the directory
    mv mongodb-linux-x86_64-static-legacy-2.2.0/bin/* /opt/mongodb-2.2.0/bin
    mv mongodb-linux-x86_64-static-legacy-2.2.0/* /opt/mongodb-2.2.0/docs
    chown -Rh root:root /opt/mongodb-2.2.0
    chown -Rh \
        nobody:nobody /opt/mongodb-2.2.0/data
    chmod -R a+rX /opt/mongodb-2.2.0

2. Compiling and installing PHP driver

For the installation of mongo's PHP driver we use the pecl command, which gives access to the PECL repository for PHP extensions (115):

    pecl download mongo
    tar -xvf mongo-1.2.12.tar
    cd mongo-1.2.12
    phpize
    ./configure
    make

To deploy the PHP module in an Apache installation we will have to copy the file mongo.so under the extensions directory.

Last, we will have to add the following lines to the php.ini PHP configuration file:

    ; Driver for MongoDB
    extension=mongo.so

(114) http://www.mongodb.org/downloads
(115) http://pecl.php.net/

3. Compiling and installing PyMongo, the Python driver for MongoDB

Installing python-devel RPM

To add new Python modules we first need to install the python-devel Linux package (and its dependencies). In a PSA SLES 10 Linux server it means:

    rpm -i tk-8.4.12-14.12.x86_64.rpm
    rpm -i blt-2.4z-222.2.x86_64.rpm
    rpm -i python-tk-2.4.2-18.25.x86_64.rpm
    rpm -i python-devel-2.4.2-18.25.x86_64.rpm

Installing PyMongo

To install the Python driver under /opt/mongodb-2.2.0/modules/python2.4 (without affecting the system's Python) we first need to make a virtual Python installation, following the instructions from the virtualenv page at the Python Package Index (116).

Once the script "virtualenv.py" is downloaded, the commands for creating the virtual Python environment and installing pymongo, the Python driver for MongoDB, are:

    mkdir -p /opt/mongodb-2.2.0/modules/python2.4
    python virtualenv.py /opt/mongodb-2.2.0/modules/python2.4
    source /opt/mongodb-2.2.0/modules/python2.4/bin/activate
    pip install pymongo
    chown -Rh \
        root:root /opt/mongodb-2.2.0/modules/python2.4

To use this environment the developers will have to set the following line as the first one in their Python scripts:

    #!/opt/mongodb-2.2.0/modules/python2.4/bin/python

4. Configuring the server

In addition to accepting command line parameters, MongoDB can also be configured using a configuration file (/opt/mongodb-2.2.0/etc/mongod.conf). The configuration included in our installation is the following:

    dbpath = /opt/mongodb-2.2.0/data
    logpath = /opt/mongodb-2.2.0/logs/mongod.logs
    logappend = true
    unixSocketPrefix = /opt/mongodb-2.2.0/tmp
    #verbose = true
    rest = true
    fork = true
    directoryperdb = true

In this configuration the most remarkable element is the directoryperdb directive, which means that we are going to use a different directory for each database, to make it easier to develop backup scripts later.

(116) http://pypi.python.org/pypi/virtualenv

5. Installing RockMongo, a PHP based administration tool

There are a few front-ends for querying and managing MongoDB (a good list can be found in the English Wikipedia MongoDB article (117)). After testing a few, I have found the PHP web administration tool "RockMongo" (118) interesting. So to install and deploy it we will download the file rockmongo-v1.1.2.zip from RockMongo's web page (119)
and ...

    mkdir -p /opt/mongodb-2.2.0/web/html
    mkdir /opt/mongodb-2.2.0/logs/apache
    unzip rockmongo-v1.1.2.zip
    cd rockmongo
    mv * /opt/mongodb-2.2.0/web/html

Once the files are deployed we will have to configure Apache. One example for a name-based Virtual Host with the (fake) URL myrockmongo.ciges.net could be:

    <VirtualHost *:80>
        DocumentRoot "/opt/mongodb-2.2.0/web/html"
        DirectoryIndex index.php index.html
        ServerName myrockmongo.ciges.net
        ErrorLog /opt/mongodb-2.2.0/logs/apache/error_mng.log
        CustomLog /opt/mongodb-2.2.0/logs/apache/access_mng common
        <IfModule mod_php5.c>
            php_admin_flag safe_mode Off
        </IfModule>
    </VirtualHost>

We should see the following screen after logging in with the default user admin with password admin:

[Figure: RockMongo screen shown after login]

(117) http://en.wikipedia.org/wiki/MongoDB#Management_and_graphical_front-ends
(118) http://code.google.com/p/rock-php/wiki/rock_mongo
(119) http://rockmongo.com/?action=downloads

6. Authentication in MongoDB

The current version supports only basic security. We authenticate with a user name and password. By default a normal user has full read and write access to the database, so it's important to change this in the first connection.

We have to create a database called admin, which will contain the users who can have administration rights. The users created on this database will have administration rights on all collections.

Creation of a user with administrator rights

To create an administrator user called administrator the commands will be:

    > use admin
    > db.addUser("administrator", "administrator")

To check if the user was created properly:

    > show users
    {
        "_id" : ObjectId("4faa6a93416734d9996ef18d"),
        "user" : "administrator",
        "readOnly" : false,
        "pwd" : "819b3a97e0f6de9ca0b079460c9ea88a"
    }

To log in with the new user we have to use the client with the following parameters:

    mongo -u administrator -p administrator admin

Creation of a user with read only rights

We can have admin users with read only rights or users with read only
rights for a specific collection.

In this example we will create a read only user test with access to all the collections:

    > use admin
    > db.addUser("test", "test", true)

Users with access only to a collection

This kind of user only has rights on its own database. To create one we need to log in as admin and connect to the database where we want to create the new user:

    > use test
    > db.addUser("test", "testpassword")

Activation of authentication support

We have two options to activate authentication support in MongoDB:
- We can run the mongod start script with the option --auth
- We can add the following line to the configuration file mongod.conf

    auth = true

Activation of authentication support in RockMongo web interface

Finally, to activate user recognition in RockMongo we need to set the following variables in the config.php PHP file:

    $MONGO["servers"][$i]["mongo_auth"] = true;
    $MONGO["servers"][$i]["control_auth"] = false;

7. Developed scripts for starting, stopping and verifying MongoDB status

I have developed three shell scripts for starting, stopping and verifying the MongoDB status. Each one of these scripts reads common configuration and code from a file called profile. The main features are:
- The processes of the server will be run by the user nobody
- To verify that MongoDB is running, a process of the nobody user with the id saved in the PID file is searched for in the system
- The start and the stop scripts record in a log each time they are run
- These scripts must be run as root; it's the starting script which runs the server with the configured user, using the command su
- To avoid problems with memory usage by MongoDB we must tell the operating system that the memory size of the process should be unlimited, using ulimit -v unlimited before starting it

Due to these last two reasons the daemon is started with the following line:

    su nobody -c "ulimit -v unlimited; /opt/mongodb-2.2.0/bin/mongod -f /opt/mongodb-2.2.0/etc/mongodb.conf"

Configuration and common code

    #!/bin/ksh
    # Installation paths
    mng_binpath="/opt/mongodb-2.2.0"
    mng_datapath="/opt/mongodb-2.2.0"
    mng_dbpath="/opt/mongodb-2.2.0/data"
    mng_pidfile="$mng_dbpath/mongod.lock"
    mng_logpath="$mng_datapath/logs"
    mng_log="$mng_logpath/mongod.log"

    # Daemon properties
    mng_daemon="$mng_binpath/bin/mongod"
    mng_user="nobody"

    # CONSTANTS
    FMT_INFO="\t%s\n"
    FMT_ERROR="\tERROR: %s!\n"

    # FUNCTIONS

    # Succeeds if a mongod process owned by $mng_user exists.
    # The ps user selector is not legible in the source; "-fu" is an assumption.
    function mongod_process_running {
        /bin/ps -fu "$mng_user" | grep -v grep | grep -q "$mng_daemon"
    }

    # Succeeds if the process whose id is saved in the PID file is running
    function mongod_started {
        if [ -e "$mng_pidfile" ]; then
            if [ -s "$mng_pidfile" ]; then
                mng_pid=`cat "$mng_pidfile"`
                /bin/ps -fu "$mng_user" | grep -v grep | grep "$mng_daemon" | grep -q $mng_pid
            else
                return 1
            fi;
        else
            if mongod_process_running; then
                printf "$FMT_ERROR" "MongoDB instance running but no pid file found on \"$mng_pidfile\""
                exit 1
            else
                return 1
            fi;
        fi;
    }

Script to start the server

    #!/bin/ksh
    source /soft/mnd220/fileso/profile

    # Maximum time in seconds after starting daemon and before returning an error
    MAX_TIME=60

    # This script must be run by root
    if [ `id -u` != 0 ]; then
        printf "$FMT_ERROR" "This script must be run by root"
        exit 1
    fi;

    # Log file
    script_log="$mng_logpath/mnd_start.log"
    if ! [ -e "$script_log" ]; then
        touch "$script_log"
        chown $mng_user "$script_log"
    fi;

    printf "\n$FMT_INFO\n" "Starting MongoDB Server at `date`" | tee -a $script_log
    if ! mongod_started; then
        su $mng_user -c "ulimit -v unlimited; $mng_daemon -f $mng_dbpath/mongodb.conf | tee -a $script_log"
        sleep 1
        i=1
        # Wait until the server is up or MAX_TIME seconds have passed
        while [ $i -lt $MAX_TIME ] && ! mongod_started; do
            sleep 1
            i=$((i+1))
        done;

        if ! mongod_started; then
            printf "$FMT_ERROR" "MongoDB server could not be started" | tee -a $script_log
        else
            printf "$FMT_INFO" "MongoDB server started ok and running" | tee -a $script_log
        fi;
    else
        printf "$FMT_ERROR" "MongoDB server is already running" | tee -a $script_log
        exit 1
    fi;

Script to stop the server

    #!/bin/ksh
    source /soft/mnd220/fileso/profile

    # Maximum time in seconds after stopping daemon and before returning an error
    MAX_TIME=$((3*60))

    # This script must be run by root
    if [ `id -u` != 0 ]; then
        printf "$FMT_ERROR" "This script must be run by root"
        exit 1
    fi;

    # Log file
    script_log="$mng_logpath/mnd_stop.log"
    if ! [ -e "$script_log" ]; then
        touch "$script_log"
        chown $mng_user "$script_log"
    fi;

    printf "\n$FMT_INFO\n" "Stopping MongoDB Server at `date`" | tee -a $script_log
    if mongod_started; then
        su $mng_user -c "$mng_daemon -f $mng_dbpath/mongodb.conf --shutdown | tee -a $script_log"
        sleep 1
        i=1
        # Wait until the server is down or MAX_TIME seconds have passed
        while [ $i -lt $MAX_TIME ] && mongod_started; do
            sleep 1
            i=$((i+1))
        done;

        if mongod_started; then
            printf "$FMT_ERROR" "MongoDB server could not be stopped" | tee -a $script_log
        else
            printf "$FMT_INFO" "MongoDB server stopped" | tee -a $script_log
        fi;
    else
        printf "$FMT_ERROR" "MongoDB server is not running" | tee -a $script_log
        exit 1
    fi;

Script to verify the status

    #!/bin/ksh
    source /soft/mnd220/fileso/profile

    # The exit status tells the caller whether the server is running
    if mongod_started; then
        printf "$FMT_INFO" "MongoDB server is running"
        exit 0
    else
        printf "$FMT_INFO" "MongoDB server is NOT running"
        exit 1
    fi;

NoSQL Schema Design for Internet Access Logs

1. Analysis of logs with NoSQL

The analysis of logs (in real time while the data is being received, or processing data already stored) is the type of problem for which NoSQL solutions are particularly suitable. We have a great (or even huge) amount of data that increases without end, and where the relationships are not really important (we don't need to normalise the elements of data).

In this article I will explain the design of the schema chosen for an equivalent solution implemented with MySQL and with MongoDB.

2. Description of an equivalent MySQL database

For our comparative tests I have defined the following MySQL tables:

Access Logs

The FTP connections are stored in a different table than the non-FTP ones (mostly
HTTP).

Reports by month

Two totals are stored each month per domain and user: number of accesses (hits) and volume in bytes downloaded. A report is made by month, which means we have two tables for each month: one with the users information and a second one with the domains information.

3. Defining a schema for MongoDB NoSQL

We have the following elements to manage:
- Users
- Internet domains
- The non-FTP accesses (mainly HTTP)
- The accesses using FTP

In MongoDB data is grouped into collections (equivalent to tables) and each element of data is called a document (equivalent to records). Unlike in relational databases, each document could have a different number of fields, and also contain other documents.

In MongoDB it is not necessary to define the fields and the type of each field. The collection, if needed, and the structure of a document are created by the server at the time of saving the data.

We will work with the following collections:
- Two collections for access logs, one for each access log table
- Totals (number of hits and data transferred) will be calculated in real time using MongoDB functions (120). For each month we will have two collections: one by user and a second one by domain (so in a year we will have 24 collections with aggregation data)

Therefore, the structure of each collection is (shown in pseudocode) ...

Non-FTP Connections Log

    {
        "userip": string,
        "user": string,
        "datetime": Date,
        "method": string,
        "protocol": string,
        "domain": string,
        "uri": string,
        "return_code": integer,
        "size": integer
    }

FTP Connections Log

    {
        "userip": string,
        "user": string,
        "datetime": Date,
        "method": string,
        "domain": string,
        "uri": string,
        "size": integer
    }

(120) http://www.mongodb.org/display/DOCS/Updating

Totals calculated by user

As said, for monthly reports we will work with two collections: a collection for users and one for the domains, with totals by
month and day.
- For each year and month we will have a collection
- Each collection will have one document per user. In addition to the user identifier, the number of visits and volume of bytes transferred will be stored by month and by day
- Daily totals will be stored as a subdocument (within the document corresponding to the user)

These totals will be updated in real time, as log data is being processed. That is, each time information is received from a visit a new record will be created in the log (FTP or non-FTP), the number of visits will be incremented by one and the size transferred will be added to the total volume for the day and user in the collection of the corresponding month. There will be totals by month that will be updated too.

    {
        "_id": "Userid",
        "Nb": integer,
        "Volume": integer,
        "Daily": {
            "0": { "Nb": integer, "Volume": integer },
            "1": { "Nb": integer, "Volume": integer },
            "2": { "Nb": integer, "Volume": integer },
            "3": { "Nb": integer, "Volume": integer },
            ....
            "30": { "Nb": integer, "Volume": integer }
        }
    }

Totals calculated by Domain

This will work exactly as for users, but each document will correspond to a domain instead of a user name.

4. Interesting articles and presentations

To understand how to design a NoSQL schema the following presentations have been very useful:
- Real-Time Analytics Schema Design and Optimization, by Ryan Nitz [69]
- MongoDB for Analytics, by John Nunemaker [70]
- Real Time Analytics with MongoDB Webinar, by Jared Rosoff [71]

From a broader perspective, considering the architecture MongoDB would form part of, the following two links are also interesting:
- Real-Time Log Collection with Fluentd and MongoDB, article from the Treasure Data company (121), explaining how to use Fluentd (122) for real-time processing of server logs and storing them in MongoDB [72]
- Social Data and Log Analysis Using MongoDB, by Takahiro Inoue, describing the architecture deployed for a social game company using MongoDB, Hadoop and Cassandra [73]
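
The real-time update of the totals described in this chapter can be illustrated with a short example. What follows is only a minimal sketch, not the code of the project: the function buildUserTotalsUpdate, the collection naming scheme "UsersTotals_YYYYMM" and the sample user "jsmith" are assumptions made for illustration. It builds the query and the $inc modifier that would increment, in one operation, the monthly and the daily counters ("Nb" and "Volume") of a user. It is written in JavaScript, the language of the mongo shell, and needs no database connection to run.

```javascript
// Hypothetical helper (not part of the original project): builds the
// arguments of an update that keeps one user's totals up to date.

// Pads a month number to two digits, e.g. 4 -> "04".
function twoDigits(n) {
  return (n < 10 ? "0" : "") + n;
}

// entry: { user: string, datetime: Date, size: integer (bytes) }
function buildUserTotalsUpdate(entry) {
  var date = entry.datetime;

  // Assumed naming scheme: one collection per year and month.
  var collection = "UsersTotals_" + date.getUTCFullYear() +
    twoDigits(date.getUTCMonth() + 1);

  // The schema numbers the days "0" to "30", so day 1 maps to key "0".
  var day = String(date.getUTCDate() - 1);

  // Monthly totals at the top level, daily totals in the subdocument.
  var inc = { Nb: 1, Volume: entry.size };
  inc["Daily." + day + ".Nb"] = 1;
  inc["Daily." + day + ".Volume"] = entry.size;

  return {
    collection: collection,
    query: { _id: entry.user },
    update: { $inc: inc }
  };
}

// Example: one 2048-byte access made by "jsmith" on 15 April 2012.
var op = buildUserTotalsUpdate({
  user: "jsmith",
  datetime: new Date(Date.UTC(2012, 3, 15, 10, 30, 0)),
  size: 2048
});
console.log(JSON.stringify(op));
```

In the mongo shell the result could then be applied with something like db.getCollection(op.collection).update(op.query, op.update, true), where the third argument asks for an upsert, so the document of a user is created on the first access of the month. Using $inc keeps each counter change atomic on the server, which avoids reading the document before writing it.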
(121) http://treasure-data.com/
(122) http://fluentd.org/

Comparative of a MySQL based solution versus MongoDB

1. Work plan for the performance tests

Now that we have the following elements ready:
- A package for the installation of MySQL 5.0.26
- A package for the installation of MongoDB 2.2.0
- A schema design for MySQL and the equivalent for MongoDB

It's time to make tests with data. As we can't use real data for confidentiality reasons we will have to:
- Develop code to fill the MySQL and MongoDB databases with "fake but realistic" data. This code will have classes with a common interface that will allow applications to use data from MySQL or MongoDB with no code changes
- Define a battery of tests to compare the performance of both solutions
- Run the tests and draw conclusions

2. PHP Development for testing the database model

The first thing to do is to fill the databases with a big volume of realistic data. So initially I have developed some code to create a volume of millions of log entries. For our tests we have created:
- 70,000 visiting users
- 70,000 visiting IPs
- 1,300,000 visited Internet domains
- 90,000,000 non-FTP log entries
- 4,500,000 FTP log entries

To create each log entry we will have to generate random data for the different elements: Internet domain, FTP method, HTTP method, IP, protocol, return code, size and user.

So, I have developed three PHP classes:
- "RandomElements" class: with functions like getRandomDomain(), getRandomFTPMethod() ... which are used to generate the random elements of data
- "MongoRandomElements" and "MySQLRandomElements" classes, which are child classes of the previous one and add functions to work with each database management system. They have functions to:
  - Save a random user in the database
  - Create lists of random domains, IPs and users and save them in tables/collections
  - Delete tables/collections
  - Verify if a user exists in the list of random users
  - Send a query
    that returns a single value, and return it
  - Send a query that returns a group of records, and return them as an array
  - Get the number of users created
  - Get one random user/domain/IP from the list of created users/domains/IPs
  - Create a new FTP log entry, getting the (random) elements needed, and save it into the database
  - Create a new non-FTP log entry, getting the (random) elements needed, and save it into the database
  - ...

The interface of these two classes is the same, so they can be used with the same code, making the scripts which use them agnostic to the type of database used.

The UML diagram of these classes will be:

[Figure: UML diagram of the RandomElements, MongoRandomElements and MySQLRandomElements classes]

Use code example

An example of using these classes could be the following PHP code which, starting from an empty database, creates 70,000 random users, 70,000 random IPs and 1,300,000 random domains and, once these elements exist, generates 30 million random log entries for the month of April and saves them in the collection for non-FTP log entries:

    $mre = new MongoRandomElements();
    $mre->createUsers(70000);
    $mre->createIPs(70000);
    $mre->createDomains(1300000);

    // Example data for April
    $start = mktime(0,0,0,4,1,2012);
    $end = mktime(23,59,0,4,30,2012);
    for ($i = 0; $i < 30000000; $i++) {
        $log = $mre->getRandomNonFTPLogEntry($start, $end);
        $mre->saveRandomNonFTPLogEntry($log);
    }
    for ($i = 0; $i < 1500000; $i++) {
        $log = $mre->getRandomFTPLogEntry($start, $end);
        $mre->saveRandomFTPLogEntry($log);
    }

The code for making the same operations on MySQL is exactly the same, except that we should change the first line

    $mre = new MongoRandomElements();

to

    $mre = new MySQLRandomElements();

Source code available at Github and ciges.net

The source code and the test scripts which use these classes are available at "Ciges / internet_access_control_demo" on Github (123). It has also been documented with phpDocumentor (124)
and the documentation is available at my web page www.ciges.net (125).

3. Testing MongoDB vs MySQL performance

As said in the first chapter, the tests will be run on one machine with the following hardware specifications:
- RAM: 10 GB
- Cores: 32 (AMD Opteron Processor 6128)

The comparison will be made between:
- MongoDB 2.2.0 (and 2.2.0-rc0, as the tests had begun before the final stable version was made available)
- MySQL 5.0.26 (with MyISAM tables)

The goal is to have an objective measure of the performance of both solutions for a list of equivalent tests. The tests developed are grouped in:
- Insertion tests
- Multiuser concurrent read tests
- Multiuser concurrent write tests
- Complex (aggregation) queries read tests

A script (PHP, JavaScript or SQL) has been written for each one. These scripts are run with the Unix command time to measure the time taken by each one. Each test has been repeated three (or more) times to discard abnormal results.

(123) https://github.com/Ciges/internet_access_control_demo
(124) http://www.phpdoc.org/
(125) http://www.ciges.net/phpdoc/iacd/

4. Insertion tests

The list of insertion tests is the following:
  1. Generation and saving of 70,000 random users without using indexes and allowing repeated values
  2. Generation and saving of 70,000 random IPs without using indexes and allowing repeated values
  3. Generation and saving of 1,300,000 random domains without using indexes and allowing repeated values
  4. Generation and saving of 70,000 random users using indexes and verifying (sending a read query) that the user does not exist before sending the save command
  5. Generation and saving of 70,000 random IPs using indexes and verifying (sending a read query) that the IP does not exist before sending the save command
  6. Generation and saving of 1,300,000 random domains using indexes and verifying (sending a read query) that the domain does not exist before sending the save command
  7. Generation and saving of 1 million non-FTP log entries
  8. Generation and saving of 5 million non-FTP
log entries 1* Feneration and saving of 87 millions of non FTP log entries 87* Feneration and saving of E7 millions of non FTP log entries Insertion tests results &es"lts given are the average of the different res"lts discarding e4treme val"es* MongoDB M"S#L )7*777 "sers Es 86s )7*777 #Ps Es 86s 8*E77*777 domains 50s Lm E$s )7*777 "ni2"e "sers with inde4es 6Es 60s )7*777 "ni2"e #Ps with inde4es 66s E8s 8*E77*777 "ni2"e domains with inde4es 0m6)s 8Lm8Es 8*777*777 log entries 86m)s 6$m8Ls 5*777*777 log entries 8h7Em5Es 6h87m5Ls 87*777*777 log entries 8h51m88s Eh6)m87s E7*777*777 log entries 5h55m65s 87h80mL$s Page 5 -# Multi user concurrent tests The previo"s insertion tests are coded as a loop which makes an insertion one after the other* This means that there will e onl! one 2"er! at a time* For the following tests instead of making a loop ,it makes little sense for reading tests- # have "sed the open so"rce tool (Meter 86$ with the plugin Stepping 5hread 6roup 86) to sim"late conc"rrent "sers* 'Meter is a powerf"l tool that allows to sim"late "se cases and loads with virt"al "sers to meas"re the performance of a we application* # will sim"late virt"al "sers that will access sim"ltaneo"sl! to a collection of scripts which make simple read and write operations* This scripts are PCP scripts which will e made availale via we* The tests are composed ! 
six scripts, which will perform the following three tests for MongoDB and for MySQL:
- Search and show data for a random user (read test)
- Add a random user (write test)
- Search and show data for a random user or add a random user (read & write test; 80% of the time it will read and 20% of the time it will write)
We will simulate two scenarios:
- An incrementing load from 0 to 50 users, rising by five. This load will be kept for a few minutes
- A load of 50 users sending all queries from the beginning. This load will also be kept for a few minutes before stopping
The list of tests configured (twice, once for MongoDB and once for MySQL) is the following:
1. Concurrent reads, incrementing users from 0 to 50
2. Concurrent reads, 50 users
3. Concurrent writes, incrementing users from 0 to 50
4. Concurrent writes, 50 users
5. Concurrent reads (80%) & writes (20%), incrementing users from 0 to 50
6. Concurrent reads (80%) & writes (20%), 50 users
Each one of these tests has been run three or more times, stopping and starting the server before each run. For each one we will get:
- Number of queries sent (value samples)
- Statistical values for response time (in milliseconds): average, median, minimum, maximum, standard deviation
- Percentage of errors
- Throughput in queries/second
- KBytes/second received and average bytes per query
- A CSV file with the response time for all the queries, which I will use to make a graphical representation
For the generation of the graphics I have:
- Reduced the number of values (and the impact of aberrant ones) by taking the mean grouped by second (155 values for MongoDB and the same number for MySQL)
- Represented a smoothed trend line with the R function loess 128 (local regression)
The JMeter configuration, the CSV files with the sample results and the R scripts are all available at "Ciges / internet_access_control_demo" on Github 129

126 http://jmeter.apache.org/
127 http://code.google.com/p/jmeter-plugins/wiki/SteppingThreadGroup
128 http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html
129 https://github.com/Ciges/internet_access_control_demo

Concurrent read tests results
Concurrent reads, incrementing users from 0 to 50 (each thread will be kept for 100 seconds)

           Samples    Med      Min      Max         Std. Dev.    Throughput    KB/sec
MongoDB    112,165    48 ms    43 ms    3,090 ms    18.72 ms     728.3 q/s     184.99
MySQL       81,179    67 ms    45 ms    3,140 ms    40.92 ms     528 q/s       134.08

This load test is represented graphically in the figure "Concurrent reads incrementing users from 0 to 50".
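The per-second averaging used to prepare the graphics can be sketched as follows. This is only an illustration in JavaScript, not the R script used for the tests; the sample layout (`elapsedMs`, `responseMs`) and the function name `meanBySecond` are assumptions made for the example.

```javascript
// Hypothetical sample layout: elapsedMs is the time since the test started,
// responseMs is the measured response time of one query.
function meanBySecond(samples) {
  const buckets = new Map(); // second -> { sum, count }
  for (const { elapsedMs, responseMs } of samples) {
    const second = Math.floor(elapsedMs / 1000);
    const bucket = buckets.get(second) || { sum: 0, count: 0 };
    bucket.sum += responseMs;
    bucket.count += 1;
    buckets.set(second, bucket);
  }
  // One averaged point per second, ordered by time.
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([second, { sum, count }]) => ({ second, meanMs: sum / count }));
}

// Made-up samples: two queries in second 0, one in second 1.
const points = meanBySecond([
  { elapsedMs: 120, responseMs: 48 },
  { elapsedMs: 870, responseMs: 52 },
  { elapsedMs: 1300, responseMs: 60 },
]);
console.log(points);
```

Averaging per second keeps one point per second of test, which limits the weight of a single aberrant response before the trend line is fitted.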
[Figure: Concurrent reads incrementing users from 0 to 50. Incrementing by five each five seconds. Each thread is kept for 100 seconds. X axis: second; Y axis: response time mean in ms; series: MySQL, MongoDB.]

Concurrent reads, 50 users (each thread will be kept for 50 seconds)

           Samples    Med      Min      Max         Std. Dev.    Throughput    KB/sec
MongoDB    37,497     54 ms    50 ms    5,419 ms    161.04 ms    635.4 q/s     122.88
MySQL      32,273     62 ms    52 ms    5,136 ms    156.90 ms    547.9 q/s     114.54

This load test is represented graphically in the figure "Concurrent reads for 50 users".
[Figure: Concurrent reads for 50 users. The test is made with 50 threads from the beginning. Each thread is kept for 50 seconds. X axis: second; Y axis: response time mean in ms; series: MySQL, MongoDB.]

Concurrent writes tests results
Concurrent writes, incrementing users from 0 to 50 (each thread will be kept for 10 minutes)

           Samples    Med      Min      Max         Std. Dev.    Throughput    KB/sec
MongoDB    464,853    54 ms    49 ms    3,105 ms    27.71 ms     710.9 q/s     148.67
MySQL      383,700    70 ms    51 ms    4,105 ms    25.58 ms     586.7 q/s     122.64

This load test is represented graphically in the figure "Concurrent writes incrementing users from 0 to 50".
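To make the comparison between the two medians in the table above explicit, the relative difference can be computed as below. This is a minimal sketch; the function name `percentSlower` is hypothetical, and the inputs are the medians reported in the table (54 ms for MongoDB, 70 ms for MySQL).

```javascript
// How much larger (in percent) is otherMs than baselineMs?
function percentSlower(baselineMs, otherMs) {
  return ((otherMs - baselineMs) / baselineMs) * 100;
}

// Medians from the concurrent writes table (incrementing users).
const mongoMedianMs = 54;
const mysqlMedianMs = 70;
const writesDiff = percentSlower(mongoMedianMs, mysqlMedianMs);
console.log(writesDiff.toFixed(1) + "% higher median response time for MySQL");
```

The same function can be applied to the medians of the other load tests.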
[Figure: Concurrent writes incrementing users from 0 to 50. Incrementing by five each five seconds. Each thread is kept for 10 minutes. X axis: second; Y axis: response time mean in ms; series: MySQL, MongoDB.]

Concurrent writes, 50 users (each thread will be kept for 50 seconds)

           Samples    Med      Min      Max         Std. Dev.    Throughput    KB/sec
MongoDB    37,497     54 ms    50 ms    5,419 ms    161.04 ms    635.4 q/s     132.88
MySQL      32,273     62 ms    52 ms    5,136 ms    156.90 ms    547.9 q/s     114.54

This load test is represented graphically in the figure "Concurrent writes for 50 users".

Concurrent reads & writes tests results
Concurrent read & writes, incrementing users from 0 to 50 (each thread will be kept for 10 minutes)

           Samples    Med      Min      Max         Std. Dev.    Throughput    KB/sec
MongoDB    462,740    55 ms    48 ms    3,111 ms    26.24 ms     707.7 q/s     173.45
MySQL      373,484    71 ms    52 ms    3,152 ms    27.98 ms     571.1 q/s     139.91

This load test is represented graphically in the figure "Concurrent read & writes, incrementing users from 0 to 50".
[Figure: Concurrent writes for 50 users. The test is made with 50 threads from the beginning. Each thread is kept for 50 seconds. X axis: second; Y axis: response time mean in ms; series: MySQL, MongoDB.]

Concurrent read & writes, 50 users (each thread will be kept for 50 seconds)

           Samples    Med      Min      Max         Std. Dev.    Throughput    KB/sec
MongoDB    39,096     54 ms    50 ms    4,753 ms    142.72 ms    665.0 q/s     162.93
MySQL      31,272     64 ms    52 ms    5,964 ms    176.64 ms    530.0 q/s     129.8

This load test is represented graphically in the figure "Concurrent reads & writes for 50 users".
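The read & write scripts perform a read about 80% of the time and a write the rest of the time. A minimal sketch of that selection is shown below, assuming a uniform random number generator. `chooseOperation`, `simulateMix` and the injectable `random` parameter are hypothetical names used for illustration; this is not the code of the test scripts, which are written in PHP.

```javascript
// Returns "read" with probability readRatio, otherwise "write".
// `random` is injectable so the choice can be made deterministic in tests.
function chooseOperation(readRatio = 0.8, random = Math.random) {
  return random() < readRatio ? "read" : "write";
}

// Simulate a batch of requests and count how the mix turns out.
function simulateMix(requests, readRatio = 0.8, random = Math.random) {
  const counts = { read: 0, write: 0 };
  for (let i = 0; i < requests; i++) {
    counts[chooseOperation(readRatio, random)] += 1;
  }
  return counts;
}

// With the default generator the split is only approximately 80/20.
console.log(simulateMix(10000));
```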
[Figure: Concurrent read & writes, incrementing users from 0 to 50. Incrementing by five each five seconds. Each thread is kept for 10 minutes. X axis: second; Y axis: response time mean in ms; series: MySQL, MongoDB.]
[Figure: Concurrent reads & writes for 50 users. The test is made with 50 threads from the beginning. Each thread is kept for 50 seconds. X axis: second; Y axis: response time mean in ms; series: MySQL, MongoDB.]

6. Data analyse (aggregation) read tests
These tests are made to compare the aggregation capabilities of both database management systems. I have designed complex queries that read all the data (90 million log records) and obtain different results. For MongoDB I have used the aggregation framework (simpler than Map Reduce functions, and enough if we don't need to get a large list of results).
The queries tested are the following:
- Which are the 10 most visited domains and how many visits has each one?
- Which are the 10 most visited domains in the second half of June?
- Which are the 10 users that have more Internet accesses?
- What is the average Internet traffic for June?
In the real world (and with a database that could hold terabytes) this type of question would be answered in real time, updating collections created for storing the results (as shown in the chapter NoSQL Schema Design for Internet Access Logs), or with batch scripts.
To compare the performance of MongoDB and MySQL we will compare the time taken by each one with the Unix command time, as with the insertion tests. Also, each test will be repeated three or more times. Results given are the average of the different results, discarding extreme values.

Aggregation read tests results

Test                                                  MongoDB    MySQL
10 most visited domains with visit totals             13m13s     2m37s
10 most visited domains in the second half of June    52m39s     17m43s
10 users with more Internet accesses                  24m02s     3m53s
Average Internet traffic for June                     12m05s     2m42s

7. Aggregation read tests code
I think it's interesting to show the code used for the aggregation scripts in MongoDB and MySQL. The MySQL part is SQL, so it will be familiar to most readers; the MongoDB part uses the aggregation framework. The code is mostly
PHP code, except one of the tests where I have used JavaScript for MongoDB and a SQL script for MySQL.

Which are the 10 most visited domains and how many visits has each one?

MongoDB (JavaScript)

    db.NonFTP_Access_log.aggregate(
        { $group: { _id: "$domain", visits: { $sum: 1 } } },
        { $sort: { visits: -1 } },
        { $limit: 10 }
    ).result.forEach(printjson);

MySQL (SQL script)

    drop table if exists NonFTP_Access_log_domain_visits;
    create table NonFTP_Access_log_domain_visits (
        `domain` varchar(255) NOT NULL,
        `value` int unsigned not null,
        PRIMARY KEY (`domain`),
        KEY `value_index` (`value`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8
        select domain, count(*) as value from NonFTP_Access_log group by domain;
    select * from NonFTP_Access_log_domain_visits order by value desc limit 10;

Which are the 10 most visited domains in the second half of June?

MongoDB (PHP code)

    $mre = new MongoRandomElements("mongodb", "mongodb", "localhost", "InternetAccessLog");
    // First, we get the minimum value of the 10 highest visits per domain
    $start = new MongoDate(strtotime("2012-06-15 00:00:00"));
    $end = new MongoDate(strtotime("2012-06-30 23:59:59"));
    $min_value = $mre->getOne(array(
        array('$match' => array('datetime' => array( '$gt' => $start, '$lt' => $end ))),
        array('$group' => array('_id' => '$domain', 'visits' => array( '$sum' => 1 ))),
        array('$group' => array('_id' => '$visits')),
        array('$sort' => array('_id' => -1)),
        array('$limit' => 10),
        array('$sort' => array('_id' => 1)),
        array('$limit' => 1),
    ), "NonFTP_Access_log");
    // Now, we obtain all the domains with at least that value
    $data = $mre->getResults(array(
        array('$match' => array('datetime' => array( '$gt' => $start, '$lt' => $end ))),
        array('$group' => array('_id' => '$domain', 'visits' => array( '$sum' => 1 ))),
        array('$match' => array('visits' => array( '$gte' => $min_value)))
    ), "NonFTP_Access_log");
    foreach($data as $doc) {
        print_r($doc);
    }

MySQL (SQL code)

    $mre = new MySQLRandomElements("mysqldb", "mysqldb", "localhost", "InternetAccessLog");
    // First, we get the minimum value of the 10 highest visits per domain
    $start = "2012-06-15 00:00:00";
    $end = "2012-06-30 23:59:59";
    $query = "select * from (select distinct(count(*)) as visits from NonFTP_Access_log where datetime between \"".$start."\" and \"".$end."\" group by domain order by visits desc limit 10) as topten_visits_by_domain order by visits limit 1";
    $min_value = $mre->getOne($query);
    // Now, we obtain all the domains with at least that value
    $query = "select * from (select domain, count(*) as visits from NonFTP_Access_log where datetime between \"".$start."\" and \"".$end."\" group by domain) as visits_by_domain where visits >= ".$min_value;
    $results = $mre->getResults($query);
    while($row = $results->fetch_assoc()) {
        print_r($row);
    }

Which are the 10 users that have more Internet accesses?

MongoDB (PHP code)

    $mre = new MongoRandomElements("mongodb", "mongodb", "localhost", "InternetAccessLog");
    // First, we get the minimum value of the 10 highest visits per user
    $min_value = $mre->getOne(array(
        array('$group' => array('_id' => '$user', 'visits' => array( '$sum' => 1 ))),
        array('$group' => array('_id' => '$visits')),
        array('$sort' => array('_id' => -1)),
        array('$limit' => 10),
        array('$sort' => array('_id' => 1)),
        array('$limit' => 1),
    ), "NonFTP_Access_log");
    // Now, we obtain all the users with at least that value
    $data = $mre->getResults(array(
        array('$group' => array('_id' => '$user', 'visits' => array( '$sum' => 1 ))),
        array('$match' => array('visits' => array( '$gte' => $min_value)))
    ), "NonFTP_Access_log");
    foreach($data as $doc) {
        print_r($doc);
    }

MySQL (SQL code)

    $mre = new MySQLRandomElements("mysqldb", "mysqldb", "localhost", "InternetAccessLog");
    // First, we get the minimum value of the 10 highest visits per user
    $query = "select * from (select distinct(count(*)) as visits from NonFTP_Access_log group by user order by visits desc limit 10) as topten_visits_by_user order by visits limit 1";
    $min_value = $mre->getOne($query);
    // Now, we obtain all the users with at least that value
    $query = "select * from (select user, count(*) as visits from NonFTP_Access_log group by user) as visits_by_user where visits >= ".$min_value;
    $results = $mre->getResults($query);
    while($row = $results->fetch_assoc()) {
        print_r($row);
    }

What is the average Internet traffic for June?

MongoDB (PHP code)

    $mre = new MongoRandomElements("mongodb", "mongodb", "localhost", "InternetAccessLog");
    $start = new MongoDate(strtotime("2012-06-01 00:00:00"));
    $end = new MongoDate(strtotime("2012-06-30 23:59:59"));
    $result = round($mre->getOne(array(
        array('$match' => array('datetime' => array( '$gte' => $start, '$lte' => $end ))),
        array('$project' => array('_id' => 0, 'day' => array ( '$dayOfMonth' => '$datetime' ), 'size' => 1)),
        array('$group' => array('_id' => '$day', 'volume' => array( '$sum' => '$size'))),
        array('$group' => array('_id' => 'all', 'average' => array( '$avg' => '$volume'))),
        array('$project' => array('_id' => '$average'))
    ), "NonFTP_Access_log"));
    printf("Traffic volume mean by day in bytes for June: %.0f\n", $result);

MySQL (SQL code)

    $mre = new MySQLRandomElements("mysqldb", "mysqldb", "localhost", "InternetAccessLog");
    $start = "2012-06-01 00:00:00";
    $end = "2012-06-30 23:59:59";
    $query = "select round(avg(volume)) from (select sum(size) as volume from NonFTP_Access_log where datetime between \"".$start."\" and \"".$end."\" group by dayofmonth(datetime)) as sizebyday";
    $result = $mre->getOne($query);
    printf("Traffic volume mean by day in bytes for June: %.0f\n", $result);

8. How to run these tests
Database and users creation
In the Github repository Ciges / internet_access_control_demo 130 there is a collection of scripts to run the tests. These scripts use the following database names and users by default:
- For MySQL: database InternetAccessLog, with user and password mysqldb
- For MongoDB: database InternetAccessLog, with user and password mongodb
So before starting we will have to create both and give the permissions:

Creating database and user in MySQL

    mysql> create database InternetAccessLog;
    mysql> grant all privileges on InternetAccessLog.* to mysqldb@localhost identified by 'mysqldb';

Creating collection and user in MongoDB

    > use InternetAccessLog
    > db.addUser("mongodb", "mongodb")

130 https://github.com/Ciges/internet_access_control_demo

Generating random data
The following PHP scripts create 3 months of random data as explained before:
- createData_3months_mongo.php
- createData_3months_mysql.php
To run them we can use the console PHP interpreter with

    php -c path/to/my/php.ini phpscript

List of runnable scripts
As with the data generation scripts, to run them we use the console PHP interpreter.
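When the scripts are run several times from the console under the Unix command time, as was done for the tests, the figure reported is the average of the runs after discarding extreme values. One possible way to compute it is sketched below, assuming that "extreme values" means the single fastest and the single slowest run; the exact rule is not stated in this document, and the names `parseDuration` and `trimmedMeanSeconds` are hypothetical.

```javascript
// Parse durations such as "12m7s", "1h03m53s" or "58s" into seconds.
function parseDuration(text) {
  const match = /^(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?$/.exec(text.trim());
  if (!match || match[0] === "") {
    throw new Error("Unrecognised duration: " + text);
  }
  const [, h = 0, m = 0, s = 0] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Average of the runs, in seconds, after dropping the extremes.
function trimmedMeanSeconds(durations) {
  const seconds = durations.map(parseDuration).sort((a, b) => a - b);
  // Assumption: with three or more runs, drop the fastest and the slowest one.
  const kept = seconds.length >= 3 ? seconds.slice(1, -1) : seconds;
  return kept.reduce((sum, value) => sum + value, 0) / kept.length;
}

// Made-up timings for four runs of the same script.
const runs = ["12m7s", "12m1s", "12m13s", "19m40s"];
console.log(trimmedMeanSeconds(runs));
```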
There are two PHP scripts per test: one for MySQL and the second one for MongoDB. The list of scripts is the following:

Scripts                                   Test
test1_1_mongo.php, test1_1_mysql.php      Generation and saving of 70,000 random users without using indexes and allowing repeated values
test1_2_mongo.php, test1_2_mysql.php      Generation and saving of 70,000 random IPs without using indexes and allowing repeated values
test1_3_mongo.php, test1_3_mysql.php      Generation and saving of 1,300,000 random domains without using indexes and allowing repeated values
test2_1_mongo.php, test2_1_mysql.php      Generation and saving of 70,000 random users using indexes and verifying (sending a read query) that the user does not exist before sending the save command
test2_2_mongo.php, test2_2_mysql.php      Generation and saving of 70,000 random IPs using indexes and verifying (sending a read query) that the IP does not exist before sending the save command
test2_3_mongo.php, test2_3_mysql.php      Generation and saving of 1,300,000 random domains using indexes and verifying (sending a read query) that the domain does not exist before sending the save command
test3_1_mongo.php, test3_1_mysql.php      Generation and saving of 1 million non FTP log entries
test3_2_mongo.php, test3_2_mysql.php      Generation and saving of 5 million non FTP log entries
test3_3_mongo.php, test3_3_mysql.php      Generation and saving of 10 million non FTP log entries
test3_4_mongo.php, test3_4_mysql.php      Generation and saving of 30 million non FTP log entries
test9_1_mongo.php, test9_1_mysql.php      Analyse query: gets the 10 most visited domains and the number of visits for each one
test9_2_mongo.php, test9_2_mysql.php      Analyse query: gets the 10 most visited domains in the second half of June and the number of visits for each one
test9_3_mongo.php, test9_3_mysql.php      Analyse query: gets the 10 users with most hits
test9_4_mongo.php, test9_4_mysql.php      Analyse query: gets the mean by day for traffic volume in June

Multi-user concurrent tests
These scripts, under the web directory, are meant to be hosted on a web server. Once we have configured our web server to make them available, I have used:
- Apache JMeter 131 with the plugin "Stepping Thread Group" 132 to run the load tests
- R 133 to create graphical representations from the CSV files with the data created with JMeter

131 http://jmeter.apache.org/
132 http://code.google.com/p/jmeter-plugins/wiki/SteppingThreadGroup

The scripts available under the web directory 134 are:

Scripts                            Function                                  Test
test4_mongo.php, test4_mysql.php   Search and show data for a random user    Concurrent reads
test5_mongo.php, test5_mysql.php   Write a random user                       Concurrent writes
test6_mongo.php, test6_mysql.php   Read/write test (see below)               Concurrent reads & writes

The read/write scripts make one of two actions: search and show data for a random user (read test) or write a new random user in the database (write test). The read test is made 80% of the time, the write one 20% of the time.
Once the web server is configured, if we go to the URL corresponding to the directory we should see a description message with links to the different scripts.

Using JMeter to run load tests
As shown before, we have defined two scenarios for each test and three types of tests. So we have six different tests:
- Concurrent reads, incrementing users from 0 to 50
- Concurrent reads, 50 users
- Concurrent writes, incrementing users from 0 to 50
- Concurrent writes, 50 users
- Concurrent reads (80%) & writes (20%), incrementing users from 0 to 50
- Concurrent reads (80%) & writes (20%), 50 users
The file "MongoDB vs MySQL.jmx" contains all the configuration needed for JMeter.
To run each test we should:
- Change the URL of the server to our address
- In View Results in Table, change the path where the CSV file should be saved
- Enable in JMeter only
the test we want to run, disabling the rest (otherwise more than one test will be run)

133 http://www.r-project.org/
134 https://github.com/Ciges/internet_access_control_demo/tree/master/web

[Figure: Example of incrementing user configuration with JMeter]

Getting a graphical representation of load tests with R
Each load test will generate tens of thousands of samples that will be stored in CSV files. We have two files for each test type, one with MySQL response times and the second one with MongoDB response times.
For each test type I have developed an R script that reads these two files, represents graphically a summary of the samples and draws a line that shows the evolution of the response time for both types of servers.
These scripts are also available in the web directory, and to run them you simply have to use the command source. Their names are self explanatory.
As we have six tests, we have six R scripts, one for showing the comparative results of each one. To load one in R and show the graphic you simply have to load the script. For
example, for loading the first one:

    source("Concurrent reads 50 users.R")

Conclusions and last words

1. Tests conclusions
Looking at the numbers and graphics we arrive at the following conclusions:

Write performance:
- MongoDB is faster in pure write performance
- In a series of continuous simple writes MongoDB is from 2 to 4 times faster. In general, for high numbers (millions of records saved), simple writing performance is double that of MySQL
- In concurrent writes MongoDB is faster (15% and 30% in our tests)
- MongoDB is much more scalable, meaning that when the user load increases the response time stays stable. Response time in MySQL, instead, gets worse as the number of users grows

Read performance:
- MongoDB is faster in pure read performance
- In concurrent reads MongoDB is faster (15% and 40% in our tests)
- Also, MongoDB is more scalable

Aggregation performance:
- Here MySQL wins over MongoDB's native aggregation framework. MySQL is much faster in aggregating data, 3 to 6 times faster for the 4 tests we have done
- In these aggregation queries no relations are involved. MySQL GROUP BY queries have a very high performance

So we could say that, as expected, for intensive reading and writing data operations MongoDB is a better option than MySQL when neither relations nor aggregation query performance are important and the data reading/writing performance is critical.
It must be said that running aggregation queries on tens of millions of records is not a good idea; it would be better to calculate the values needed in real time as records are processed (which means read & write operations). So for problems like log analysis, NoSQL technologies are much better.

2. Initial planning and actual time spent on each task
The initial workload estimated was 300 hours. The actual time spent has been more than 400 hours, divided as follows:

Tasks                                                                     Time spent (rounded to hours)
Study of NoSQL articles & books                                           65
MongoDB installation, configuration, package creation & updates           58
Development of a schema for Internet Access Log                           20
Scripts development (PHP, SQL, JavaScript, MySQL stored procedures)       68
Load tests                                                                75
Documentation (memory, posts on ciges.net & presentation)                 93
Incidents analysis & resolution                                           18
Planning, coordination & communication                                    43
Total                                                                     440

To keep track of the time spent on each task I have used the web application Paymo 135, an easy to use time tracking software that allows creating projects and tasks and comfortably starting/stopping timers for each one.
It was also initially planned to make tests with the sharding capabilities of MongoDB (using more than one machine), but due to lack of time we will do them later.

3. Problems and bugs found
The reasons for the time excess with respect to what was initially planned are:
- At first I developed a script to import real data from the production server into MongoDB for the tests. But using real data is not allowed, so I could not use it
- We have used three versions of MongoDB. The MongoDB study was started with version 2.0.6. Meanwhile 2.2.0 release candidate 0 and 2.2.0 final have been published. We upgraded because with 2.2.0 a native aggregation framework is available, and it's easier and performs better than using Map-Reduce to aggregate data. The second version change was forced by a bug found on 2.2.0rc0 that produced an integer overflow on some aggregation functions 136
- Initially I tried to write some load test scripts directly in JavaScript for MongoDB and as stored procedures for MySQL. I also tried to use map-reduce. Both initiatives, using natively supported JavaScript/stored procedures and map-reduce, were wrong.
  - The time invested in learning how to develop with these technologies has been useless due to limitations of the JavaScript MongoDB API. Also, developing MySQL stored procedures was more complex than thought, so in the end I have used PHP for most of the scripts
  - The map-reduce functionality included by default in MongoDB is terribly slow 137 and not useful
  - MongoDB's aggregation framework was a valid option and the one chosen for the data analysis tests
- Due to the version change and some errors made when running load tests I had to repeat the test battery two or three times
- We have found some configuration problems and product bugs
- Documentation work time has been underestimated

135 http://www.paymo.biz/
136 https://jira.mongodb.org/browse/SERVER-6166
137 http://stackoverflow.com/questions/1219149/mapreduce-with-mongodb-really-really-slow-30-hours-vs-20-minutes-in-mysql-for

Bugs found on MongoDB
Apart from the problems just commented, while we were preparing and testing the MongoDB package for distribution on PSA's servers we found the following notable bugs in MongoDB or problems in our initial configuration:

Memory problems when inserting big amounts of data
When I began with the insertion tests I got the following errors after a few million records saved:

    ERROR: mmap() failed for /users/mng00/instances/primary/data/local/local.4 len:2146435072 errno:12 Cannot allocate memory
    ERROR: mmap failed with out of memory. (64 bit build)
    assertion 10085 can't map file memory ns:local.system.replset query:{}

The problem here is that MongoDB was reserving memory as it was needed, and there comes a moment when the operating system does not allow the process to consume more memory.
After a lot of tests and consulting with my colleagues, one of them showed me the way to go. In our MongoDB starting script we have to tell Linux not to limit the amount of memory
with the command:

    ulimit -v unlimited

Integer overflow when using aggregation functions
When calculating the average of traffic volume the result was a negative number. It was obviously an overflow. After searching in MongoDB's JIRA 138 I found that it is a known problem for version 2.2.0rc0. After upgrading the product to the stable 2.2.0 the problem was solved.

Map-Reduce operations on MongoDB are really, really slow
To get the number of visits by domain I tried to use map-reduce functions but I found they were terribly slow (30 hours vs 20 minutes in MySQL for the same kind of test).
After asking in MongoDB's JIRA and in Stack Overflow 139 I quickly received support from Adam Comerford 140, a technical support manager from 10gen (the company that makes MongoDB), who explained that this could be normal. MongoDB uses the JavaScript engine "SpiderMonkey" to compute Map-Reduce functions, and JavaScript is slow and single threaded.

138 https://jira.mongodb.org/browse/SERVER-6166
139 http://stackoverflow.com/questions/1219149/mapreduce-with-mongodb-really-really-slow-30-hours-vs-20-minutes-in-mysql-for
140 http://www.linkedin.com/in/acomerford

We have three options for this kind of operation:
- Use the "Aggregation Framework" 141 included from MongoDB's version 2.1 (this framework has the limitation of not being able to return more than 16 Megabytes of data)
- Use Apache Hadoop 142 to make map-reduce operations with the MongoDB Hadoop Connector 143. In this case MongoDB will be the database from where to read and save the data and Apache Hadoop will do the
calculations
- Another option (to test) could be to use Google's JavaScript engine V8, which can be integrated in MongoDB when compiling the product 144. This engine is faster and multi-threaded.
For our tests I have used the first possibility, the Aggregation Framework with an updated version of MongoDB.

141 http://docs.mongodb.org/manual/applications/aggregation/
142 http://hadoop.apache.org/
143 http://api.mongodb.org/hadoop/MongoDB+Hadoop+Connector.html

4. Future work
This work is not really complete. In this project I have compared MongoDB with MySQL for a concrete use case and with a limited number of tests. To complete this work the following should be done:
- Repeat the tests with a huge quantity of data (hundreds of millions of records instead of only 90 million, with a used disk size of hundreds of gigabytes instead of tens)
- Add tests with a multi-machine configuration using sharding
Other future lines of work could be:
- Test map-reduce operations with the V8 JavaScript engine
- Test map-reduce operations with Hadoop integration (well, being realistic, the integration of MongoDB and Hadoop could be the subject of another work like the one presented in this document)
- Add a few more aggregation tests

5. Contributions to the community
All this work, made as part of my paid work as system administrator at PSA, is intended to be publicly available. So, the contribution to the community
is documentation and source code. In particular, while this project was being made the following contributions have been done:

Contributions to the Wikipedia
- Rewriting of English Wikipedia articles: MongoDB 145, CouchDB 146
- Rewriting of French Wikipedia article: MongoDB 147
- Minor editions on other English articles like CAP theorem 148, NoSQL 149, Textile (markup language) 150, Apache Cassandra 151

144 http://www.mongodb.org/display/DOCS/Building+with+V8
145 http://en.wikipedia.org/wiki/MongoDB
146 http://en.wikipedia.org/wiki/CouchDB
147 http://fr.wikipedia.org/wiki/MongoDB
148 http://en.wikipedia.org/wiki/CAP_theorem
149 http://en.wikipedia.org/wiki/NoSQL
150 http://en.wikipedia.org/wiki/Textile_%28markup_language%29
151 http://en.wikipedia.org/wiki/Apache_Cassandra

Creation of a personal blog http://www.ciges.net
Series of posts (in Spanish) with a summary of the work done, problems found and solutions given:
- Buscando una solución NoSQL para el análisis de logs (1 de 4) 152
- Buscando una solución NoSQL para el análisis de logs (2 de 4) 153
- Buscando una solución NoSQL para el análisis de logs (3 de 4) 154
- Buscando una solución NoSQL para el análisis de logs (4 de 4) 155
- Instalando MongoDB + drivers en SUSE Linux SLES 10, algunos apuntes 156
- Esquema de datos NoSQL para el análisis de logs de acceso a Internet 157

All source code (PHP classes, scripts and configuration files)
- At the Github repository Ciges / internet_access_control_demo 158
- The documentation created with phpDocumentor is on my web page 159
- This document and detailed instructions to repeat the tests done are included in Github and on my web

Questions opened and answered on Stack Overflow (and also in MongoDB's JIRA)
- Map Reduce with MongoDB really, really slow (30 hours vs 20 minutes in MySQL for an equivalent database) 160
- Simple tool for web server benchmarking? 161
- Simultaneous users for web load tests in JMeter? 162

The documentation (LibreOffice documents and posts on my
web) has a Creative Commons Attribution - Share Alike 3.0 Unported license. The source code is licensed under the GPL version 3.0

152 http://www.ciges.net/buscando-nosql-1
153 http://www.ciges.net/buscando-nosql-2
154 http://www.ciges.net/buscando-nosql-3
155 http://www.ciges.net/buscando-nosql-4
156 http://www.ciges.net/apuntes-sobre-la-instalacion-de-mongodb
157 http://www.ciges.net/esquema-de-datos-nosql-para-el-analisis-de-logs-de-acceso-a-internet
158 https://github.com/Ciges/internet_access_control_demo
159 http://www.ciges.net/phpdoc/iacd/
160 http://stackoverflow.com/questions/1219149/mapreduce-with-mongodb-really-really-slow-30-hours-vs-20-minutes-in-mysql-for
161 http://stackoverflow.com/questions/12249895/simple-tool-for-web-server-benchmarking
162 http://stackoverflow.com/questions/1292644/simultaneous-users-for-web-load-tests-in-jmeter

6. Personal evaluation of the practicum
This work has been, from a system administrator's point of view, very interesting. What I have tried here is to apply the knowledge acquired in the Master on Free Software Projects Development and Management 163
In particular I have considered very important:
- The availability of (almost) all the documentation, code and configuration files under an open license
- The openness in all the process followed, to justify the options chosen and to compare MongoDB and MySQL in a way that allows anyone to repeat it
- The use of Open Source products (instead of the proprietary ones used by default in my company). I mean particularly
  - LibreOffice instead of Microsoft Office
  - Apache JMeter and R instead of HP LoadRunner
The real influence on this work of the philosophy and technologies shown at the Master on Free Software is clear; it would have been different if it had been simply another project to complete at work.
It has also been the first time I have used from the beginning tools to increase and measure productivity
and to track the working time dedicated to each of the project's tasks. I have used:
- Paymo to measure the real time spent on each part
- Thinking Rock as the tool to define tasks and subtasks and to take quick notes about them
- TiddlyWiki as a portable notepad to take notes

The initial planning was optimistic, as usual, but now I have objective data and I hope to improve my work time estimates for future projects.

This work has also been useful as a first approach to performance testing, a domain where I had never worked before and whose complexity I now understand better.

I hope that the Master on Free Software Projects Development and Management will make me a better professional with a broader knowledge of the Free Software world. In my humble opinion, at least this time it has been successful.

Regards,
José M. Ciges, in Vigo (Spain), October 2012

163 http://www.mastersoftwarelibre.com/