"#$%&'( )&$*&+%,'-. /,0(1' 21$%-,0,34 5-$6 7&-8&'4 9:;< 1bls stoJy explotes tbe evolotloo of blq Joto ooolytlcs ooJ lts mototlty wltblo tbe eotetptlse. 1be loltlol focos wos oo tbe opptoocbes ooJ ecooomlcs to osloq 1etoJoto Astet ulscovety llotfotm ooJ Apocbe noJoop wltblo tbe some ooolytlcol otcbltectote. 1btee compooles ooooymoosly sboteJ tbelt expetleoces ooJ ptoctlces wltb blq Joto ooolytlcs ooJ blqbllqbteJ tbe beoeflts ooJ lssoes of tbelt Jool Astet-noJoop otcbltectotes. 1be stoJy olso ootlloes tbe ttoJeoffs ooJ loslqbts of bow tbe ptoctlce of blq Joto ooolytlcs ls mototloq lo compooles toJoy. compooles ote mokloq sobstootlol lovestmeots lo ooolytlcol otcbltectotes ooJ most be koowleJqeoble of tbese ttoJeoffs. 1be tecbooloqy ls movloq vety fost ooJ, beoce, compooles most sotlsfy cotteot tepoltemeots wblle coostootly ploooloq fot fotote tepoltemeots. 1be toke-owoy of tbls stoJy ls o setles of loslqbts tbot o compooy sboolJ petloJlcolly Jebote os lts ooolytlcol otcbltectote mototes.
Challenges
Company A - Innovative e-Commerce Retailer
Company B - Global Advertising Agency
Company C - Large Healthcare Provider
Tradeoffs
Insights
Endnotes
About the Methodology
About Bolder Technology
About Our Sponsor
Bolder Technology, Inc. 2013
Challenges

This study investigates the challenges facing professionals who are blending big data and analytics within their enterprises. In particular, the focus is how to determine the best analytical architecture that will serve both the current and future demands of their company. Amid the rapid changes in technology and its applications, several critical tradeoffs should drive the evolution of this architecture.

Big data is big! Analytics using big data is hot! The new technologies for big data analytics are incredible and evolving rapidly. New business applications are emerging weekly and are revolutionizing the way that many industries do business with their customers, suppliers and other partners. This summarizes the feelings voiced in many industry publications. However, what is really happening? What are the cutting-edge companies doing today?

In this study, the term 'big data analytics' is defined as: Analytics [1] using big data (as characterized by volume, velocity, and variety [2]) within an enterprise architecture (across multiple functional areas) to support critical operational processes (as contrasted with one-time ad-hoc analyses).

In the past, analytics was reserved for back-room deliberations by data geeks generating monthly reports on how things are going. Today, analytics make a difference in how the company does business, day by day, and even minute by minute. The analytical applications are 'mission-critical' components in the overall business processes. If the analytics break, executives and customers are upset! This is a dramatic change in the role of big data analytics.

In this study, the term 'analytical architecture' refers to the information technology (IT) architecture used to support big data analytics in a company. A similar term is 'discovery platform' to emphasize the data discovery process (of predictive analytics, data mining and the like), as contrasted with the pre-defined reporting and analysis processes in BI suites.
Further, the term is used in this study to distinguish it from the larger and more inclusive 'enterprise architecture.' Several issues about the relationship between the two types of IT architectures are discussed later.

To understand these changes, we interviewed three innovative companies, in different industries, who are pioneers with big data analytics. In these interviews, each of the companies mentioned the following challenges.

Tale of Dual Platforms

The first challenge is to compare and contrast two platforms suitable for processing big data volumes, which are Apache Hadoop [3] and Teradata Aster [4]. There are many companies that have one but not both. To solicit knowledgeable opinions, the study selected companies that have incorporated both platforms into their analytical architecture. The objective is to document how the companies leverage the unique strengths of the platforms to generate business value.

It was apparent that each company appreciated both platforms for a variety of reasons and found them complementary in several ways. In general, the opinion was that Hadoop provided a creative and increasingly stable mechanism to acquire and retain big data, while the Teradata Aster platform provided better productivity for discovering patterns and exploring data. Both overlap in functionality when filtering and transforming data. The release of the Teradata Aster SQL-H connector was pivotal in shifting toward an architecture that blends both platforms. More details are given in the three case studies.

Data Tsunami

The second challenge is to turn the fears of the incoming data tsunami into realistic expectations about the benefits of big data analytics to the company. For instance, the interviewees often discussed the volumes and velocities of acquired data. In some cases, companies are acquiring several terabytes daily, implying that the total data store must handle petabytes to maintain a multi-year history.
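As a rough check on that arithmetic, the following minimal Python sketch converts a daily intake into a retained volume. The 3 TB/day rate, the 3-year history, and the function name are illustrative assumptions, not figures reported by the interviewed companies; compression and replication are ignored.

```python
# Back-of-the-envelope sizing: daily intake -> multi-year data store.
# The intake rate and retention period used below are illustrative
# assumptions, not figures reported by the interviewed companies.

TB_PER_PB = 1000  # decimal units, assumed for simplicity


def retained_petabytes(tb_per_day: float, years: float) -> float:
    """Raw volume retained, before any compression or replication."""
    return tb_per_day * 365 * years / TB_PER_PB


if __name__ == "__main__":
    # Hypothetical company: 3 TB acquired per day, 3 years of history.
    print(f"{retained_petabytes(3, 3):.1f} PB retained")
```

Under these assumptions, three terabytes a day kept for three years comes to a little over three petabytes of raw data.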
In addition, there were interesting discussions centered on new data sources to support new business applications. Further, the (often devilishly) multi-structured nature of the data is driving analytics to higher levels of sophistication.

From the interviews, it was apparent that the tsunami analogy is becoming inappropriate. Instead of a large remote force flooding a company with overwhelming data, companies are finding big data opportunities in every corner of their business. Lift a rug in any room, and there are terabytes awaiting analysis!

Data to Action

The next challenge is to document the ways that companies managed the analytical processes throughout the entire analytics value chain from raw data to business actions, as illustrated in Figure 1.
Figure 1 - The Analytic Value Chain. [5]
Analytics generate business value only when they can improve business processes through specific changed actions. In the past, the emphasis was on generating information, which was distributed to the 'right' people in a timely fashion. The assumption was that those people would consume this information and perform their job functions in a more effective (or at least efficient) manner. However, things happen too quickly in today's companies for this paradigm to be sufficient. Analytics must be integrated directly into business processes, with the proper human oversight.

With big data, the analytical architecture must focus on how structure (meta-data) enhances the raw data, illuminating the myriad of relationships that link one data element to another. That is why cross-functional data linkages into the corporate data warehouse are critical for a viable analytic value chain.

Total Cost of Ownership

The final challenge concerns the total cost of ownership (TCO) of an analytical architecture. The challenge is determining the proper method for calculating the total cost to acquire, maintain and extend the analytical architecture for the company.

In the interviews, it was difficult to obtain specific amounts for the various costs associated with analytics. However, there was considerable opinion on what cost factors a company should consider, such as use of current technical skills, flexibility in prototyping new applications, pay-as-used cloud-based services, and depth of analytics. In other words, a serious TCO investigation for an analytical architecture will involve cost factors more diverse than those of a traditional IT architecture. Companies that mandate an adherence to a traditional IT assessment will be at a disadvantage in pursuing innovative applications involving big data analytics. The responsible executives must think strategically and broadly about TCO assessments of potential analytical architectures.
- - - - -

To understand these challenges, the next section describes how these companies applied big data analytics in their businesses:

Company A - Innovative e-Commerce Retailer
Company B - Global Advertising Agency
Company C - Large Healthcare Provider

Company A - Innovative e-Commerce Retailer

Company A is an innovative e-commerce retailer that sells a diverse set of products and services. Their distinction is to provide their customers with an exciting online shopping experience tailored to their needs, along with superb service. The success factor for Company A is to know more about their customers so they can cater to them better.

The Director of Data Engineering for Company A was interviewed. He has a team of twelve persons, who are part of a technology group supporting data warehousing, business intelligence, and analytics. He described their focus as "ensuring that everyone in the company has better access to the relevant data for marketing optimization and the A/B testing platform." His group is "growing very well and am totally psyched about it."

Background

He continued by explaining the business problem and their approach.

"Our goal is to be more responsive to the customer - to have a finger on the pulse of what's going on. By integrating and analyzing data about the website behavior of customers, we can draw relationships that others may not have seen. It is more than data integration; it is pattern recognition."

His team integrates clickstream information with email logs, ad viewing, and operational data to "figure out what is going on with our customer." This data was used to support A/B testing and multivariate testing to track the customer's entire experience - from searching for the product, ordering it, receiving the package, and even product returns, all to optimize the experience for the customer.

Architecture

He described the evolution of their architecture over the past two years.
"The company was just a hundred people two years ago. We had no analytic tools. Our team was just two people and me. Given our tradition of doing things in an innovative way, we started using MapReduce with Apache Hadoop for analytics. However, we knew that this architecture would not be sufficient. We needed to build a data warehouse. In addition, Cognos and Spotfire were previously purchased and had a user base that required support. So, we investigated other analytic platforms, such as Aster Data (before its acquisition by Teradata), Oracle, Vertica, Greenplum, and MongoDB."

About a year ago, the team acquired Teradata Aster to complement their Hadoop platform. The Teradata Aster platform is hosted on the Amazon EC2 cloud and is being used as a reporting platform, along with its analytic processing for discovering new insights beyond reporting. When asked about the reasons for deciding on Teradata Aster, he explained, "I explain what I need to the Aster folks. We immediately had a good working relationship with some quality Aster folks. By using the Amazon EC2 cloud, the project quickly gelled into place and has worked successfully over the past year."

Teradata Aster Database allows the team to analyze more of these potential behavior patterns in a stable and scalable way. In particular, the distributed parallel nature of the Teradata Aster discovery platform allows reasonable computation times with large data sets.

In Figure 2, the architecture shows the data sources, analytic platforms, and data delivery, from left to right.
Figure 2 - Architecture for Company A.

As indicated, there are three paths through the architecture. First, there is a direct load into the Hadoop platform for ad-hoc analyses using Hadoop MapReduce. Second, there is filtering and cleansing processing into the Aster platform, which drives the A/B testing for the retail website. Third, there is a direct load into the Hadoop platform as cold storage, and then data is extracted and loaded using Hadoop MapReduce into the Aster platform for the analytical processing. The Hadoop cold storage contains data that is infrequently accessed.

By using Amazon EC2, the company is able to absorb "large data volumes with fluctuating demands" and then, as needed, transfers data sets to Amazon Elastic MapReduce.

He offered some general observations about the Hadoop platform.

"Hadoop Distributed File System is the sexy part of the Apache Hadoop project. It is open-source and allows the problem solving qualities of MapReduce to really shine. It is a brilliantly simple framework that works well."

On the other hand, he concludes that the Aster platform is good at discovering new insights in the data since, as he remarked, "Analytics, like R, do not behave nicely in Hadoop."

Raw data is extracted from various sources and loaded onto Hadoop for initial exploration. Later, the data is filtered/transformed and retained for 6 months on the Aster platform for rapid ad-hoc analysis. He explained that not all of the raw data is stored on the Aster platform, mainly for cost reasons. "Not every piece of data flows to Aster, since we need to do some digging ahead of time using MapReduce in Hadoop."

His team has developed many MapReduce functions in the Aster SQL-MapReduce framework, which is much easier than in standard SQL. One analytic function that they support on the Aster platform is a random forest analysis [6] used in several business areas.
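The map, shuffle, and reduce steps behind the MapReduce framework he praises can be illustrated in a few lines of plain Python. This is a single-process sketch of the pattern only, not Hadoop or Aster SQL-MapReduce code; the clickstream records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical clickstream records: (customer_id, page) pairs.
CLICKS = [("c1", "/home"), ("c2", "/home"), ("c1", "/cart"), ("c3", "/home")]


def map_phase(records):
    """Emit one (key, 1) pair per click, keyed by page."""
    for _customer, page in records:
        yield page, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def page_views(records):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(page_views(CLICKS))
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data between them; the sketch keeps only the logical structure.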
He described the Teradata Aster platform as "our Swiss army knife" within their IT architecture, since it enables his team "to cover all the bases needed to solve the business problem." He noted that the Aster platform has the flexibility to manage a variety of data types. In addition, it has linkages to the Hadoop platform and support for the newer analytic tools like Spotfire.

Although Hadoop is successful in some areas of analytics, he feels that the industry is now finding that it does not cover a wide enough breadth of analytical processing for an enterprise. "The Hadoop platform cannot do all the things I need. However, I have the Aster platform to cover those areas." Over time, he will shift more of the workload onto the Aster platform, using the Hadoop platform less for analytic work and more for mass storage.

He then offered three examples of recent projects that utilized these features:

"We did an analysis of browser usage by Internet Explorer IE6 against our website. The raw clickstream was initially stored in Hadoop, summarized, and then extracted into the Aster platform for use with Spotfire and R. A valuable feature of the Aster platform is the compatibility to output MapReduce results to Cognos with ODBC connectors, Spotfire with JDBC connectors, or standard SQL queries."

"A really cool application was a MapReduce program that hits Google Analytics as a web service, pulling its information into the Aster platform. Then, we used SQL-MapReduce to query Google Analytics with the proper joins, without permanently storing this data. In addition, the application executes within a reasonable timeframe. The developer wrote this application in a few days. We cannot do this with Oracle or other traditional tools."

"Another application that I like was... We tokenize and parse large volumes of text from Twitter feeds to gauge sentiment toward the company."

Cost Factors
His team had considerable experience with the PostgreSQL database (from which the Aster platform was derived). Hence, the adoption of Teradata Aster was an easy transition for our developers and database administrators, thus avoiding the cost of acquiring specialized database skills. Because of cost factors, the Aster platform only retains data used by the analytics, which amounts to a few terabytes rather than petabytes. "We are a large company so the IT costs are critical." In contrast, he was more concerned with the support costs of Cognos, which "demands a lot of infrastructure."

The adoption of Amazon EC2 is easy because of its incremental cost structure and reasonable technical skills. Provisioning Hadoop via Amazon Elastic MapReduce is also an easy migration path for his team. A Hadoop cluster needs at least six machines, but they can ask for 40 machines if the MapReduce job requires it. However, Hadoop has long latency times to 'spin up.' He remarked, "Hadoop takes time to start and do its work."

He noted that Hadoop MapReduce "can solve many problems, but you need a Ph.D. to parallelize the processing efficiently," implying that analytics with Hadoop require expensive and scarce persons with a deep skill set. Further, it is hard to find persons with the skills to maintain the Hadoop platform and to avoid the costs of enterprise support for Hadoop from Cloudera or Hortonworks.

Lessons Learned

He was asked to share some lessons that his team has learned over the last two years. He mentioned the following points:

Design a proof-of-concept (POC) exercise using a specific problem set to be solved by Hadoop, Aster, or another analytic platform. Prove to yourself whether each platform can provide the proper answers for that problem set and can handle its estimated workload.

We decided not to create our own data center with on-site hardware. Instead, we are using Amazon EC2 services where we can provision extra resources only when needed.
Thus, we have avoided the usual problems and costs associated with operating a data center. And, we avoid paying on-going operational/support costs that are configured for periods of peak demands.

The Aster platform behaves like a typical database system. We are not worried about, or limited by, its infrastructure. Database operation can be assigned to a normal database administrator. In other words, "we get it!" Moreover, we are not worried about the entire system going down.

Any new and innovative application of new technology will have its problems. Expect it. Be patient. Your vendor should collaborate with your company to resolve those problems, as Teradata Aster did with us.

Summary

In just two years, Company A matured their IT architecture in data warehousing and in analytics in innovative ways. With Amazon Web Services (AWS), traditional data center infrastructure is traded for rapid scalability and business flexibility by using cloud-based services. Cost factors like time-to-value and risk assessment weighed more heavily than staffing rates and software licensing. The use of the Teradata Aster platform was the 'Swiss army knife' in his toolkit, allowing him to cover many data analysis requirements with minimal infrastructure. And, the use of Hadoop (also in AWS) provides continuing business value as 'cold storage' for the big data tsunami.

Company B - Global Advertising Agency

Company B is a large full-service advertising agency with digital marketing and technology at its core. The company creates digital media that build business identity through web development, media planning and buying, technology and innovation, emerging media, analytics, mobile, advertising creative, social influence marketing, and search. Their clients are some of the largest global corporations.

Background

The Vice President of Business Intelligence was interviewed.
He described his company and its use of big data analytics as follows:

"Our core usage today is focused on a multi-attribution model that emphasizes the segmentation of media audiences. In contrast to the last touch approach, we are now analyzing a comprehensive view of all the touch points of customers across all multi-media channels, such as interest on an advertiser's website in addition to behavior on the vendor's on-site website. For instance, when customers received an email, do they look at it, and what do they do, such as click for additional information? And, when customers search on related topics, do they click through on displayed ad banners? And, when a customer purchases a product, what was the sequence of touches that led to that purchase?"

The heritage of Company B is ad serving. In other words, their focus is optimizing multi-channel digital marketing by understanding customers' behavior across all media channels, enhancing this data with third-party information, and assessing the impact that specific media has on driving value to the business. In addition, they segment the customers based on how media influences their actual purchases, thus optimizing the targeting of ad channels.

This customer-segmentation analytics provides critical information to the company's internal teams who support clients. They manage the whole range of services to their clients, from analyzing the on-line experiences of their customers, purchasing the media, and managing Search Engine Optimization (SEO) work. "The company covers the whole gambit!" The internal client teams use the insights emerging from the media segmentation analytics to optimize digital marketing campaigns from the client. The bottom line for value to our clients is to optimize how they spend their media funds on digital marketing.
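The difference between the last touch approach and a multi-touch view can be sketched as follows. The equal-share (linear) weighting, the journeys, and the channel names are assumptions chosen for illustration; the study does not describe how Company B actually weights touches.

```python
from collections import defaultdict

# Hypothetical journeys: ordered channel touches that each ended in a purchase.
JOURNEYS = [
    ["email", "search", "display"],
    ["search", "display"],
]


def last_touch(journeys):
    """Give each purchase's full credit to the final touch only."""
    credit = defaultdict(float)
    for touches in journeys:
        if not touches:
            continue  # no recorded touches, nothing to credit
        credit[touches[-1]] += 1.0
    return dict(credit)


def linear_multi_touch(journeys):
    """Split each purchase's credit equally across every touch (assumed weighting)."""
    credit = defaultdict(float)
    for touches in journeys:
        if not touches:
            continue
        share = 1.0 / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)


if __name__ == "__main__":
    print("last touch: ", last_touch(JOURNEYS))
    print("multi-touch:", linear_multi_touch(JOURNEYS))
```

With these sample journeys, last touch credits only the display channel, while the multi-touch view also surfaces the contribution of email and search earlier in the sequence.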
Architecture

Company B processes 3.3 terabytes or 36 billion rows of clickstream data every day from digital media, site behavior, social media, and offline media. In media speak, that's about 400 billion media impressions per year. Most of the IT infrastructure resides in the cloud under Amazon Web Services (AWS). The daily process uses Amazon Elastic MapReduce (EMR) to cleanse and aggregate the clickstream data into transactional cookie-level (or session-level) data, which passes to Aster Database for advanced analysis. In addition, the large data sets are retained in Amazon S3 cold storage for future analysis.

The transactional data is partitioned among their clients. Clients have their separate custom data marts so that the client data is not co-mingled. The client data is optimized and structured as custom OLAP cubes for use by our internal client teams. Reporting to the clients about digital marketing performance is performed from these cubes.

Company B built a data mart customized to the unique business requirements for each client. Access to the client data mart is primarily by our internal teams, though there is high-level reporting directly to clients so that they can understand why we decided on certain advertising optimizations. Clients need to know why Company B has moved their funds from one advertising channel to another. In addition, the Aster Database platform (which also resides in AWS) performs advanced analyses on the total transactional data set to generate the customer segmentation definitions, which are used for media targeting at the customer instance by each client.

The AWS cloud provides considerable elasticity in compute resources. The Hadoop MapReduce processing requires hundreds of machine instances to process that high-volume data. Using Java to invoke MapReduce, Company B may spin up 30 machine instances for a job and 20 instances for another job, much like the old days of mainframe job processing.
However, when the processing is complete, those instances are terminated. He summarized, "We pay only for what we use."
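The aggregation of raw clickstream rows into cookie-level sessions mentioned above can be sketched in plain Python. This is not the Amazon EMR job or an Aster function; the 30-minute inactivity timeout, the record layout, and the function name are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout between sessions


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (cookie_id, timestamp_seconds, url) rows into sessions.

    Returns a list of (cookie_id, session_index, [urls]) tuples. A new
    session starts whenever a cookie is idle for longer than `gap`.
    """
    sessions = []
    ordered = sorted(events, key=itemgetter(0, 1))  # by cookie, then time
    for cookie, rows in groupby(ordered, key=itemgetter(0)):
        index, current, last_ts = 0, [], None
        for _cookie, ts, url in rows:
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((cookie, index, current))
                index, current = index + 1, []
            current.append(url)
            last_ts = ts
        sessions.append((cookie, index, current))
    return sessions


if __name__ == "__main__":
    # Hypothetical rows: cookie "a" returns after a long pause.
    sample = [("a", 0, "/home"), ("a", 600, "/cart"),
              ("a", 5000, "/home"), ("b", 100, "/ad")]
    for session in sessionize(sample):
        print(session)
```

Sorting by cookie and timestamp first lets the sessions be built in one pass over the ordered rows, which is the property that makes this kind of aggregation a natural fit for a partitioned, parallel job.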
Figure 3 - Architecture for Company B.

Teradata Aster

The Teradata Aster platform provides an analytical library of SQL-MapReduce functions that we use as building blocks for our processing. Over the long term, we are consolidating the Amazon EMR jobs onto the Aster platform to minimize the data hops. The less we touch the data to get it into its final form and to its final resting place, the more efficient we will be.

The focus of the Aster platform is to manage the media information at the aggregate cookie level so that we can see who is consuming what media. At the moment, we are 'kicking the tires' to learn the capabilities of the Aster platform, with the intent to consolidate other processing onto Aster.

The segmentation analytics use the Aster SQL-MapReduce functions for sessionization and pathing, along with our custom matching algorithm to match session identifiers and create a centralized identifier for the customer. This processing is performed with a single pass of the data, which is more efficient than normal SQL processing. Depending on the client, specific data about the client's business is brought into their data mart because of their unique data needs.

Cloud-Based Infrastructure

By using the AWS cloud services, Company B is able to partition usage and storage by client and to identify accurate costs per client. In the media services business, pricing is based on the markup of the various volume-based advertising costs, such as billions of impressions per month [7] and gigabytes of client storage. By using the detailed cost accounting of each resource based on client, the company has good visibility and accuracy into the pricing aspect of their business. He summarized, "When we operated a single large data center, we did not have this understanding of cost allocation among clients."

Core Competences

As a core competence, Company B knows all about digital marketing and the technology driving digital marketing. Our value to clients is to question clients about whether they have looked at
this technology or that technology. And, if we have successfully applied a new technology to a client's problem, we then would suggest to other clients to look at that same technology.

We focus on our core competence in digital marketing optimization. If we need resources outside our core competence, we will purchase them outside. In fact, we have leveraged offshore resources to reduce our costs. However, if it is part of our core competence, then we will build or acquire it internally.

The build-versus-buy decisions with a software project are often judged from this perspective. He gave the following example:

"We could have built those analytic components in the Aster platform. However, I also know that developing those components is not our core competence, so we purchase them from Teradata Aster. However, using those components in unique ways to satisfy our clients' needs, this is our core competence."

Leveraging Open-Source Software

There is pressure to leverage open-source software, like Apache Hadoop. Under the right circumstances, open-source software is very attractive. As this technology matures and more people use it in new ways, the packaged software vendors will be pressured to extend their functionality to maintain their competitive advantage over open-source alternatives.

Aster SQL-H Connector

To leverage open-source software, Teradata Aster offers a connection layer, called Aster SQL-H, to the Hadoop Distributed File System (HDFS) data based on HCatalog meta-data. This feature enables HDFS files to be read as native Aster tables, thus leveraging the Aster Database analytic functions. Although the intent is to move daily processing to the Aster platform, the company will continue to store and manage some data in HDFS, probably using SQL-H.
Since Teradata Aster charges by the data volume managed by the Aster platform, the company feels that it needs the flexibility to utilize both options.

The current data retention policy is at least 12 months. Since more than five terabytes are acquired per day, more than one petabyte of data is being stored in Amazon EC2 every year. The company is working on a long-term retention policy to determine the business value of archiving years of acquired data.

Lessons Learned

He shared lessons learned about the ups and downs of adopting new cutting-edge technologies.

"Being aggressive with new technologies is risky since there are often unexpected obstacles. Various features and functions that you assume with traditional technology may not be present in new technology. Surprise! You do pay for being on the bleeding edge!"

Summary

As an ad agency for digital media, Company B uses analytics to understand the impacts that digital media is having on the marketplace of their clients and to optimize the 'spend' for that digital media. The analytics for their multi-attribution model requires large volumes of continuous data from numerous sources. Custom data marts for each client manage the results and support internal teams in providing services to their clients. Cloud-based infrastructure provides elasticity in computing resources and, especially, accurate cost accounting per client. The Aster platform provides packaged analytics for agile analyses and a way of consolidating core data to "minimize hops." Apache Hadoop will continue to provide low-cost storage and filtration of raw data. The use of Aster SQL-H is being considered as a means of increasing their data retention period to greater than 12 months.

Company C - Large Healthcare Provider

Company C is a large healthcare provider with dozens of hospitals, many medical clinics, and thousands of caregivers.
As a strategic IT initiative, they are serious about handling their big data properly so that managers can access detailed operational data to improve healthcare delivery, to reduce unnecessary expenses and minimize medical liabilities.

Background

The Director of Business Intelligence was interviewed. He described their experience with analytical platforms. After considering several options, they decided to invest in the Teradata Aster Database platform for reporting. By consolidating all their data marts onto this one platform, the overhead of loading and transforming data from diverse sources and deploying query tools to diverse users is simplified, thereby reducing costs. With 18 months of production experience with the new platform, the company is "tickled pink with the performance."

Potential of Big Data

The director described healthcare as a business that generates voluminous detailed operational data, which is usually not retained for more than a few days. The Electronic Health Record system tracks the minute-by-minute event flow of medications and procedures given to patients. Most of this information comes from specialized medical equipment that generates real-time data, such as an MRI or CAT scan. Although some images are retained within the Picture Archival System (PAS), most of the data stored on medical devices soon "drops on the floor" (i.e., deleted to accommodate new data).

An important use case is managing patients' notes created by physicians. The physicians submit their medical reports as voice recordings, which are transcribed into unstructured text. These reports contain very important information about the patient, such as future diagnoses. "The physician is extremely candid about the patient, more than anywhere else." There has been work on designing dictation templates into which the unstructured text could be mapped during transcription.
However, this approach is a partial solution since the templates cannot differentiate among the diversity of medical situations. Long discussions are often placed in a single cell of the template. Text analysis could provide more structure and hence more value to the physician's report.

To better understand the dynamics and economies of their caregiving business, medical analysts would like "to get their hands on that data." The only way to access this detailed data now is to retrieve the record and extract the appropriate data - all manually.

Future Plans

As a future project, Company C is planning to develop a Hadoop platform to eventually collect as much of this unstructured data as is possible. It will be stored as raw unstructured data in the Hadoop Distributed File System (HDFS), as shown in Figure 4.
Figure 4 - Aster SQL-H Connector to Hadoop HDFS.

There are various methods for working with the HDFS data, all of which use the Apache HCatalog table/storage management to share schema information for interoperability. First, the Hadoop MapReduce engine supports job submission and job scheduling as part of the Apache Hadoop package. Second, Apache Hive supports data aggregation functions with its HiveQL language, which are compiled and submitted as MapReduce jobs. Third, Apache Pig creates MapReduce programs with a higher-level procedural language. Finally, Teradata Aster supports an extension to their declarative SQL-MapReduce language called Aster SQL-H, which accesses the Hadoop HCatalog and directly joins only the data required. Hence, business analysts can work with Hadoop data like another flat table through standard ANSI SQL and various BI query/reporting tools.

Aster SQL-H to Access Hadoop Data

He remarked that SQL-H is being considered as the link from the Hadoop platform into the Aster platform since the two environments are remarkably complementary. Teradata Aster provides a managed MPP analytic discovery platform that blends SQL and MapReduce, while Hadoop is a low-cost open-source platform that also uses MapReduce. Aster provides SQL-MapReduce to integrate SQL and MapReduce on the Aster platform and SQL-H to integrate SQL with Hadoop HDFS data.

The BI director realized that SQL-H enables his development staff to "remain with the SQL" paradigm and "avoid learning and operating another system to access this data." He could also "leverage the current SQL skills to analyze Hadoop data." The developer would need to know the syntax and functional differences in SQL-H, but he judged that this is "not that far of a stretch." He estimated that the ROI on this project would be greater since all the data in Hadoop is accessible using their existing analytics environment.
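The idea of joining "only the data required" can be illustrated with a small sketch. This is not SQL-H syntax and not Teradata's implementation; it is a plain Python toy, with hypothetical names (`Note`, `remote_scan`, `make_store`, `local_patients`), that assumes the filter runs where the bulk data lives so that only matching rows are moved and joined to a local table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Note:
    """Hypothetical record standing in for one row stored in Hadoop."""
    patient_id: int
    department: str
    text: str


def remote_scan(rows: Iterable[Note],
                predicate: Callable[[Note], bool]) -> Iterator[Note]:
    """Apply the filter at the source and yield only the matching rows."""
    for row in rows:
        if predicate(row):
            yield row


def make_store(n: int) -> List[Note]:
    """Build a toy in-memory store; a real HDFS table would be far larger."""
    departments = ["cardiology", "oncology", "radiology", "pediatrics"]
    return [Note(i, departments[i % 4], f"note {i}") for i in range(n)]


store = make_store(1000)

# Only rows that pass the filter leave the "remote" side.
moved = list(remote_scan(store, lambda r: r.department == "oncology"))

# Hypothetical local table on the analytics platform, joined to the result set.
local_patients = {1: "patient A", 5: "patient B"}
joined = [(local_patients[r.patient_id], r.text)
          for r in moved if r.patient_id in local_patients]

print(f"rows stored: {len(store)}, rows moved: {len(moved)}")
print(joined)
```

In this toy run, a quarter of the stored rows are moved, and the join touches only those rows. The ratio is an artifact of the made-up data, not a measurement.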
Thus, he surmised that this project had "a clean business use case" with minimal risk in both financial investment and resource constraints. "We could look at data that we don't look at today" with a quick development effort. "At the end of the day, we have to survive on adding value" for the company.

When asked about the cost of Hadoop storage, he estimated that it would be considerably less because of the use of thousands of low-cost fault-tolerant commodity servers to handle data volumes in the petabyte range. By using SQL-H to filter the Hadoop data, only the result set, which should be in the gigabyte range, would be pushed back to the Aster platform. He concluded that, "Only the actionable data should be managed by the Aster platform."

He described their status as "in the thinking stage" with no SQL-H activity "yet, but we are certainly looking into it" as it is "on our technology roadmap." He finished with, "If we could provide this data access, we would win the ballgame!"

Summary

Company C is a large healthcare provider that is seriously exploring ways of managing and exploiting detailed operational data. Over eighteen months, the company consolidated various data marts into the Teradata Aster platform. If the voluminous operational data could be economically acquired and stored, the analytics of the Aster platform (such as text analysis on patient notes by physicians) has the potential to provide insights into improving their healthcare delivery. In particular, they are currently investigating the Aster SQL-H Connector as a channel to large data sets stored under HDFS.

Tradeoffs

In planning analytical architectures, companies should consider the following tradeoffs, such as scalability, time-to-value, skill inventory, and multiple data channels.

Planning for Scalability

A critical tradeoff in making investments in analytical architectures is setting the priority on scalability.
In past decades, only the big companies worried about IT investments in equipment purchases, data center cooling, and the like. Small companies did without. Now many start-up companies are exploiting analytics to open new business niches having explosive growth. From day one, those small innovative companies must think deeply about scaling their IT capabilities to service big clients, who internally are unable to exploit the same analytics. Hence, rapid scalability of analytical architectures (petabyte data volumes, sub-second operational analytics, thousands of concurrent users) is a hard requirement, not a lofty dream. By contrasting analytical requirements as current versus future, the company can assess the priority of investing in scalability.

A recent development that influenced this tradeoff in the companies interviewed is cloud-based virtual computing services from Amazon and other vendors. Analytical processing often requires considerable parallel processing resources for short periods. Purchasing and maintaining the necessary equipment would be too expensive. However, cloud-based services can provide the resources as needed for a cost based on only the resources consumed.

Monetizing Time-To-Value

Another difficult tradeoff is monetizing the time-to-value factor. In other words, what is the business value that results from faster 'discovery' cycles? Analytical processing often consists of an analyst who is continually going through cycles of collecting, refining, modeling, and testing data. It may take many such cycles before the analyst produces valid and practical results for the business. With current technology, it is possible to reduce those cycle times from days to minutes, thus increasing the productivity of the analyst by a hundred-fold. Further, these faster discovery cycles can enable the analyst to work as fast as they can think, permitting a high level of uninterrupted concentration on the analysis problem.
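A rough way to see where a figure like "a hundred-fold" can come from is to count how many discovery cycles fit into a work week before and after the cycle time shrinks. The sketch below uses illustrative assumptions only (a 40-hour week, a cycle that once took a full working day and now takes five minutes); none of these numbers come from the companies interviewed, and the function names are hypothetical.

```python
def cycles_per_week(cycle_minutes: float,
                    work_minutes_per_week: float = 40 * 60) -> float:
    """Number of discovery cycles an analyst can complete in one work week."""
    return work_minutes_per_week / cycle_minutes


def speedup(old_cycle_minutes: float, new_cycle_minutes: float) -> float:
    """Ratio of old to new cycle time, i.e. the productivity multiplier."""
    return old_cycle_minutes / new_cycle_minutes


# Assumed cycle times, for illustration only.
old_cycle = 8 * 60  # one working day per cycle, in minutes
new_cycle = 5       # five minutes per cycle

print(cycles_per_week(old_cycle))     # 5.0 cycles per week
print(cycles_per_week(new_cycle))     # 480.0 cycles per week
print(speedup(old_cycle, new_cycle))  # 96.0, roughly a hundred-fold
```

Under these assumptions the analyst moves from five cycles a week to several hundred, which is the order of magnitude the text describes. Putting a monetary value on those extra cycles remains a business judgment.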
How should the company monetize the productivity increase of the analyst? In the past, executives would decide that the analyst could take a week or two and avoid the additional expense. In the present, many companies live and die based on leveraging 'now' information that guides business processes, minute by minute, as they unfold. Several years ago, a major retail company boasted that their point-of-sale data was in their data warehouse by the time the shopper left the parking lot. That same company considers that boast so past season and is seeking ways of up-selling to their customers as they wander the store floor, long before they leave the parking lot.

Assess and Manage Skill Inventory

Much of the technology surrounding big data analytics has evolved in recent years. Hence, persons (either employees or contractors) with the proper skills are often in short supply. Further, the technology is changing fast so that acquiring new skills should be a continual learning process.

Juggling Multiple Channels

The initial applications of big data analytics acquired, cleansed, filtered and transformed one data source, such as a Twitter text stream. As analytical applications, like marketing optimization, became more sophisticated, it was apparent that many data channels were required for analytics on the total customer experience involving multiple touch points (like the corporate website, Twitter, Facebook and others). The manual effort to acquire, cleanse, filter and transform one data channel is huge. Multiple channels require N times 'huge' effort plus generating cross-channel linkages so that all the touch points for a specific customer are interrelated.

Insights

This section summarizes and extracts practical insights for professionals pursuing big data analytics within their companies.

Maturing into Enterprise Architectures
The maturing phase that big data analytics is undergoing currently is similar to the evolution of data warehousing from a departmental scope into enterprise architectures. Even today, there are certainly many reasons for a department DW, such as a clear business objective, restricted user base, and single functional emphasis. However, we have learned over the decades that there are huge business motivations for extending and integrating departmental DWs into an enterprise DW, such as cross-functional business opportunities, pervasive support of users (even customers), and operationalizing analytics into business processes.

The same is currently happening to big data analytics. This implies that there will continue to be good business reasons for isolated, single-purpose analytics projects. However, as soon as the business viability of those projects is established, the debate of how to integrate that functionality into the enterprise architecture should start with vigor. The company cannot ignore the tough issues, such as security, data governance, privacy, and scalability, without curtailing future business potential. Data ownership is one of these tough issues that has traditionally been a tug-of-war between business users and IT managers. IT wants to own all the 'official' data about the corporation, while business users want to use that data with flexibility and responsiveness.

Clash of Tech Cultures

There is considerable debate about whether emerging technical alternatives, such as procedural versus set operations, no-SQL versus SQL, and the like, will dominate over their older cousins. History clearly shows that tech waves never obviate previous practices. After the initial hype subsides, the savvy professionals find various optima in the blending of those tech waves. Over time, best practices emerge from continuous incremental integration of new technologies into coherent architectures.
The same is happening with big data analytics and was especially apparent in the companies interviewed. With any initial proof-of-concept project, the driver is to get the job done, as soon as possible, as cheaply as possible, with whatever tool and skill is available. Once business viability is established, the driver should shift to determining the best way to accomplish the job. Hence, a company may need to acquire new tools, new skills, and a whole new technology to properly satisfy the required objectives.

The point is: Do not constrain your futures by choosing between A or B. Be knowledgeable about both A and B. And constantly search for the complementary blending of A and B.

For example, an early issue for this study was whether Apache Hadoop or Teradata Aster was better. The interviews quickly illustrated that this issue was a fluid comparison that changed over time. The companies were influenced in their technology decisions by several factors, such as prior skill sets, depth of analytics, frequency of discovery cycles, and support of operational business processes. The pivotal change was the introduction by Teradata of the Aster SQL-H connector, which allowed Aster Analytics to access HDFS data directly. The debate shifted from choosing between A or B to minimizing data movement and thus decreasing the time to perform analyses.

Moderate TCO Debates

In the first section on Challenges, the point about TCO was that executives should think strategically and broadly about assessing analytical architectures. One way of justifying this statement is to review the business objectives for big data analytics for the three companies.

Company A: E-commerce retailer that provides customers with a customized online shopping experience by being more responsive to customers' behavior. Hence, the company needs to acquire cyber-data about all touch points with the customer.
Company B: Digital-media ad agency that optimizes all digital marketing for clients. Hence, the company acquires data on all customer touch points across all media channels related to their client.

Company C: Healthcare provider that improves healthcare delivery and reduces medical expenses. Hence, the company acquires data on medications and procedures for patients and wants to expand data collection to all medical devices and from patient notes by physicians.

Note that these business objectives strike close to the core business competence of the companies. Could the company continue in business if their analytical applications cease to operate properly? For one day? For one week? These questions paint the nature of a TCO assessment in very practical and realistic terms. Therefore, TCO debates within a company should be moderated by skilled executives knowledgeable about the business reasons for creating and maintaining analytical applications.

Riding the Tech Waves

We have seen many tech waves over the past decades. However, the tech wave behind big data analytics is obviously one of the more challenging and volatile. To understand the implications and trends with the many disciplines comprising big data analytics requires a unique individual. Companies that are making substantial investments in analytical architectures must be knowledgeable of these implications and trends. The technology is moving very fast. Companies must satisfy current requirements while constantly planning for future requirements. Therefore, companies require the skills of a competent CTO-like person to ride this tech wave and to mentor others to do the same.

- - - - -

In summary, any company that incorporates big data analytics into its business processes should face and resolve these questions:

1. Will you be able to support the complete analytical value chain from data to action, thus achieving tangible business results?
2. What is the plan to mature the current/proposed analytical applications across the scope of the enterprise?
3. How will you leverage the best from various tech cultures to produce the desired business results?
4. Do you have a constructive way of debating the TCO for new innovative analytical applications?
5. Do you have a good surfing instructor so that you can ride the volatile tech wave of analytics?
Endnotes
1 "The discovery and communication of meaningful patterns in data" is the definition in Wikipedia at http://en.wikipedia.org/w/index.php?title=Analytics&oldid=310946342 retrieved 6 September 2012. It is a weak entry but covers the basics.
2 As originally defined by Doug Laney in 2001. For more background, see http://en.wikipedia.org/wiki/Big_data retrieved 6 September 2012. It is a good entry with many examples.
3 For background on Apache Hadoop, see http://en.wikipedia.org/w/index.php?title=Apache_Hadoop&oldid=310910492 retrieved 7 September 2012. It is a good overview of the entire Hadoop ecosystem and community. Note the funky component names! This is changing on a monthly basis, driven by Google, Yahoo, Amazon, IBM, Cloudera, Hortonworks and others.
4 For background about Teradata Aster, see the Resource section at http://www.asterdata.com/, which has good whitepapers, analyst reports, webcasts and more.
5 This value chain was adapted from a presentation by Mayank Bawa of Teradata Aster. It is also a tip of the hat to The Corporate Information Factory by Inmon, Imhoff & Sousa, Second Edition, 2001. http://www.amazon.com/Corporate-Information-Factory-W-Inmon/dp/0471399612
6 http://en.wikipedia.org/wiki/Random_forest
7 The measure used in advertising is CPM or cost per thousand impressions, whether the impression appears in radio, TV, newspaper, or magazine. In this case, it refers to digital advertising where impressions are web page images and the like.

About the Methodology

The methodology of this study is to listen carefully to several pioneering companies in the big data analytics area. The intent is to contribute to professional education, to share the insights with other IT professionals so that we can mature as an industry, amid escalating business challenges and rapidly evolving technology.

The sample of Teradata customers was small so these conclusions are tenuous but insightful.
By leveraging the open access to Teradata customers, we have a glimpse into the complex issues involved. Based on the quality of the interviews, this sample is representative of the issues and trends in this emerging area.

The primary author is Richard Hackathorn of Bolder Technology with substantive contributions from several Teradata colleagues: Steve Wooledge, Manan Goel, Mayank Bawa, and Kevin Pratt. We are appreciative of the companies and professionals who were willing to share openly their experiences. Finally, we are appreciative of Teradata Corporation for their assistance and sponsorship of this study.

About Bolder Technology

Bolder Technology Inc. is a twenty-year-old consultancy focused on Business Intelligence and Data Warehousing. The founder and president is Dr. Richard Hackathorn, who has more than thirty years of experience in the Information Technology industry as a well-known industry analyst, technology innovator, and international educator. He has pioneered many innovations in database management, decision support, client-server computing, database connectivity, and data warehousing.

Richard was a member of Codd & Date Associates and Database Associates, early pioneers in relational database management systems. In 1982, he founded MicroDecisionware Inc. (MDI), one of the first vendors of database connectivity products, growing the company to 180 employees. Sybase, now part of SAP, acquired MDI in 1994. He is a member of the IBM Gold Consultants and the Boulder BI Brain Trust. He has written three books and has taught at the Wharton School and the University of Colorado. He received his degrees from the California Institute of Technology and the University of California, Irvine.

About Our Sponsor

Teradata is the world's largest company focused on analytic data solutions through integrated data warehousing, big data analytics, and business applications.
Only Teradata gives organizations the advantage to transform data across the organization into actionable insights empowering leaders to think boldly and act decisively for the best decisions possible.
The Best Decision Possible and SQL-H are trademarks, and Teradata, the Teradata logo, and Aster SQL-MapReduce are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide. EB-7471 > 0113