
A BTI Study

Evolution of Big Data Analytics:


Experiences with Teradata Aster and Apache Hadoop

Richard Hackathorn, Bolder Technology Inc.
January 2013
This study explores the evolution of big data analytics and its maturity within the
enterprise. The initial focus was on the approaches to, and economics of, using the
Teradata Aster Discovery Platform and Apache Hadoop within the same
analytical architecture. Three companies anonymously shared their experiences
and practices with big data analytics and highlighted the benefits and issues of
their dual Aster-Hadoop architectures.
The study also outlines the tradeoffs and insights of how the practice of big data
analytics is maturing in companies today. Companies are making substantial
investments in analytical architectures and must be knowledgeable of these
tradeoffs. The technology is moving very fast and, hence, companies must satisfy
current requirements while constantly planning for future requirements. The
take-away of this study is a series of insights that a company should periodically
debate as its analytical architecture matures.

Challenges
Company A - Innovative e-Commerce Retailer
Company B - Global Advertising Agency
Company C - Large Healthcare Provider
Tradeoffs
Insights
Endnotes
About the Methodology
About Bolder Technology
About Our Sponsor

Bolder Technology, Inc. 2013

Challenges
This study investigates the challenges facing professionals who are blending big data and
analytics within their enterprises. In particular, the focus is how to determine the best analytical
architecture that will serve both the current and future demands of their company. Amid the
rapid changes in technology and its applications, several critical tradeoffs should drive the
evolution of this architecture.
Big data is big! Analytics using big data is hot! The new technologies for big data analytics are
incredible and evolving rapidly. New business applications are emerging weekly and are
revolutionizing the way that many industries do business with their customers, suppliers and
other partners. This summarizes the feelings voiced in many industry publications. However,
what is really happening? What are the cutting-edge companies doing today?
In this study, the term 'big data analytics' is defined as:
Analytics[1] using big data (as characterized by volume, velocity, and variety[2])
within an enterprise architecture (across multiple functional areas) to support
critical operational processes (as contrasted with one-time ad-hoc analyses).
In the past, analytics was reserved for back-room deliberations by data geeks generating
monthly reports on how things are going. Today, analytics make a difference in how the
company does business, day by day, and even minute by minute. The analytical applications are
'mission-critical' components in the overall business processes. If the analytics break,
executives and customers are upset! This is a dramatic change in the role of big data analytics.
In this study, the term 'analytical architecture' refers to the information technology (IT)
architecture used to support big data analytics in a company. A similar term is 'discovery
platform,' which emphasizes the data discovery process (of predictive analytics, data mining and the
like), as contrasted with the pre-defined reporting and analysis processes in BI suites. Further,
the term is used in this study to distinguish it from the larger and more inclusive 'enterprise
architecture.' Several issues concerning the relationship between the two
types of IT architectures are discussed later.
To understand these changes, we interviewed three innovative companies, in different
industries, who are pioneers with big data analytics. In these interviews, each of the
companies mentioned the following challenges.
Tale of Dual Platforms
The first challenge is to compare and contrast two
platforms suitable for processing big data volumes, which are Apache Hadoop[3] and Teradata
Aster[4]. There are many companies that have one but not both. To solicit knowledgeable
opinions, the study selected companies that have incorporated both platforms into their
analytical architecture. The objective is to document how the companies leverage the unique
strengths of the platforms to generate business value.
It was apparent that each company appreciated both platforms for a variety of reasons and
found them complementary in several ways. In general, the opinion was that Hadoop provided a
creative and increasingly stable mechanism to acquire and retain big data, while the Teradata
Aster platform provided better productivity for discovering patterns and exploring data. Both
overlap in functionality when filtering and transforming data. The release of the
Teradata Aster SQL-H™ connector was pivotal in shifting toward an architecture that blends
both platforms. More details are given in the three case studies.
Data Tsunami
The second challenge is to turn the fears of the incoming data tsunami into
realistic expectations about the benefits of big data analytics to the company.
For instance, the interviewees often discussed the volumes and velocities of
acquired data. In some cases, companies are acquiring several terabytes daily, implying that the
total data store must handle petabytes to maintain a multi-year history. In addition, there were
interesting discussions centered on new data sources to support new business applications.
Further, the (often devilishly) multi-structured nature of the data is driving analytics to higher
levels of sophistication.
From the interviews, it was apparent that the tsunami analogy is becoming inappropriate.
Instead of a large remote force flooding a company with overwhelming data, companies are
finding big data opportunities in every corner of their business. Lift a rug in any room, and there
are terabytes awaiting analysis!
Data to Action
The next challenge is to document the ways that companies managed the analytical processes
throughout the entire analytics value chain from raw data to business actions, as illustrated in
Figure 1.

Figure 1 - The Analytic Value Chain.[5]

Analytics generate business value only when they improve business processes through specific
changed actions. In the past, the emphasis was on generating information, which was
distributed to the 'right' people in a timely fashion. The assumption was that those people
would consume this information and perform their job functions in a more effective (or at least
efficient) manner. However, things happen too quickly in today's companies for this paradigm
to be sufficient. Analytics must be integrated directly into business processes, with the proper
human oversight. With big data, the analytical architecture must focus on how structure
(meta-data) enhances the raw data, illuminating the myriad of relationships that link one data
element to another. That is why cross-functional data linkages into the corporate data
warehouse are critical for a viable analytic value chain.
Total Cost of Ownership
The final challenge concerns the total cost of ownership (TCO) of an analytical architecture. The
challenge is determining the proper method for calculating the total cost to acquire, maintain
and extend the analytical architecture for the company. In the interviews, it was difficult to
obtain specific amounts for the various costs associated with analytics. However, there was
considerable opinion on what cost factors a company should consider, such as use of current
technical skills, flexibility in prototyping new applications, pay-as-used cloud-based services,
and depth of analytics. In other words, a serious TCO investigation for an analytical architecture
will involve cost factors more diverse than those of a traditional IT architecture. Companies that
mandate an adherence to a traditional IT assessment will be at a disadvantage in pursuing
innovative applications involving big data analytics. The responsible executives must think
strategically and broadly about TCO assessments of potential analytical architectures.
- - - - -
To understand these challenges, the next section describes how these companies applied big
data analytics in their businesses:
• Company A - Innovative e-Commerce Retailer
• Company B - Global Advertising Agency
• Company C - Large Healthcare Provider
Company A - Innovative e-Commerce Retailer
Company A is an innovative e-commerce retailer that sells a diverse set of
products and services. Their distinction is to provide their customers with an exciting online
shopping experience tailored to their needs, along with superb service. The success factor for
Company A is to know more about their customers so they can cater to them better.
The Director of Data Engineering for Company A was interviewed. He has a team of twelve
persons, who are part of a technology group supporting data warehousing, business
intelligence, and analytics. He described their focus as, "ensuring that everyone in the company
has better access to the relevant data for marketing optimization and the A/B testing platform."
His group is, "Growing very well and am totally psyched about it."
Background
He continued by explaining the business problem and their approach.
"Our goal is to be more responsive to the customer - to have a finger on the pulse of what's
going on. By integrating and analyzing data about the website behavior of customers, we can
draw relationships that others may not have seen. It is more than data integration, it is pattern
recognition."
His team integrates clickstream information with email logs, ad viewing, and operational data
"to figure out what is going on with our customer." This data was used to support A/B testing
and multivariate testing to track the customer's entire experience - from searching for the
product, ordering it, receiving the package, and even product returns, all to optimize the
experience for the customer.
Architecture
He described the evolution of their architecture over the past two years.
"The company was just a hundred people two years ago. We had no analytic tools. Our team
was just two people and me. Given our tradition of doing things in an innovative way, we
started using MapReduce with Apache Hadoop for analytics. However, we knew that this
architecture would not be sufficient. We needed to build a data warehouse. In addition, Cognos
and Spotfire were previously purchased and had a user base that required support. So, we
investigated other analytic platforms, such as Aster Data (before its acquisition by Teradata),
Oracle, Vertica, Greenplum, and MongoDB."
About a year ago, the team acquired Teradata Aster to complement their Hadoop platform. The
Teradata Aster platform is hosted on the Amazon EC2 cloud and is being used as a reporting
platform, along with its analytic processing for discovering new insights beyond reporting.
When asked about the reasons for deciding on Teradata Aster, he explained, "I explain what I
need to the Aster folks. We immediately had a good working relationship with some quality
Aster folks. By using the Amazon EC2 cloud, the project quickly gelled into place and has worked
successfully over the past year."
Teradata Aster Database allows the team to analyze more of these potential behavior patterns
in a stable and scalable way. In particular, the distributed parallel nature of the Teradata Aster
discovery platform allows reasonable computation times with large data sets.
In Figure 2, the architecture shows the data sources, analytic platforms, and data delivery, from
left to right.

Figure 2 - Architecture for Company A.
As indicated, there are three paths through the architecture. First, there is a direct load into the
Hadoop platform for ad-hoc analyses using Hadoop MapReduce. Second, there is filtering and
cleansing processing into the Aster platform, which drives the A/B testing for the retail website.
Third, there is a direct load into the Hadoop platform as cold storage, and then data is extracted
and loaded using Hadoop MapReduce into the Aster platform for the analytical processing.
The Hadoop cold storage contains data that is infrequently accessed. By using Amazon EC2, the
company is able to "absorb large data volumes with fluctuating demands" and then, as needed,
transfers data sets to Amazon Elastic MapReduce. He offered some general observations about
the Hadoop platform.
"Hadoop Distributed File System is the sexy part of the Apache Hadoop project. It is open-source
and allows the problem solving qualities of MapReduce to really shine. It is a brilliantly simple
framework that works well."
On the other hand, he concludes that the Aster platform is good at discovering new insights in
the data since, as he remarked, "Analytics, like R, do not behave nicely in Hadoop."
Raw data is extracted from various sources and loaded onto Hadoop for initial exploration. Later,
the data is filtered/transformed and retained for 6 months on the Aster platform for rapid
ad-hoc analysis. He explained that not all of the raw data is stored on the Aster platform, mainly
for cost reasons. "Not every piece of data flows to Aster, since we need to do some digging
ahead of time using MapReduce in Hadoop."
His team has developed many MapReduce functions in the Aster SQL-MapReduce framework,
which is much easier than in standard SQL. One analytic function that they support on the Aster
platform is a random forest analysis[6] used in several business areas.
He described the Teradata Aster platform as "our Swiss army knife" within their IT architecture,
since it enables his team "to cover all the bases needed to solve the business problem." He
noted that the Aster platform has the flexibility to manage a variety of data types. In addition,
it has linkages to the Hadoop platform and support for the newer analytic tools like Spotfire.
Although successful in some areas of analytics, he feels that the industry is now finding that
Hadoop does not cover a wide enough breadth of analytical processing for an enterprise. "The
Hadoop platform cannot do all the things I need. However, I have the Aster platform to cover
those areas." Over time, he will shift more of the workload onto the Aster platform, using the
Hadoop platform less for analytic work and more for mass storage.
He then offered three examples of recent projects that utilized these features:
• We did an analysis of browser usage by Internet Explorer IE6 against our website. The
  raw clickstream was initially stored in Hadoop, summarized, and then extracted into the
  Aster platform for use with Spotfire and R. A valuable feature of the Aster platform is the
  compatibility to output MapReduce results to Cognos with ODBC connectors, Spotfire
  with JDBC connectors, or standard SQL queries.
• A really cool application was a MapReduce program that hits Google Analytics as a web
  service, pulling its information into the Aster platform. Then, we used SQL-MapReduce to
  query Google Analytics with the proper joins, without permanently storing this data. In
  addition, the application executes within a reasonable timeframe. The developer wrote
  this application in a few days. We cannot do this with Oracle or other traditional tools.
• Another application that I like: we tokenize and parse large volumes of text from
  Twitter feeds to gauge sentiment toward the company.
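The Twitter example can be illustrated with a minimal Python sketch. The study does not describe how Company A actually scores sentiment, so the regular-expression tokenizer, the tiny word lists, and the function names below are assumptions made purely for illustration.

```python
import re

# Hypothetical word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"love", "great", "fast", "awesome"}
NEGATIVE = {"hate", "slow", "broken", "late"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Return (positive tokens - negative tokens); above zero leans positive."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives

# Hypothetical tweets mentioning the company.
tweets = [
    "Love the new site, checkout was fast!",
    "My order is late and the tracking page is broken",
]
scores = [sentiment_score(t) for t in tweets]
```

Averaging such scores over a day's tweets would give a rough sentiment trend; a production system would likely need a richer lexicon or a trained model.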
Cost Factors
His team had considerable experience with the PostgreSQL database (from which the Aster
platform was derived). Hence, the adoption of Teradata Aster was an easy transition for his
developers and database administrators, thus avoiding the cost of acquiring specialized
database skills.
Because of cost factors, the Aster platform only retains data used by the analytics, which are a
few terabytes rather than petabytes. "We are a large company so the IT costs are critical." In
contrast, he was more concerned with the support costs of Cognos, which "demands a lot of
infrastructure."
Amazon EC2 was easy to adopt because of its incremental cost structure and the reasonable
technical skills it requires. Provisioning Hadoop via Amazon Elastic MapReduce is also an easy
migration path for his team. A Hadoop cluster needs at least six machines, but they can ask for
40 machines if the MapReduce job requires it. However, Hadoop has long latency times to
'spin up.' He remarked, "Hadoop takes time to start and do its work."
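The economics of asking for machines only when a job needs them can be sketched with simple arithmetic. The six-machine baseline and the 40-machine peak come from the interview; the hourly rate, the 720-hour month, and the single 20-hour burst are hypothetical values chosen only to make the comparison concrete.

```python
HOURLY_RATE = 1.0  # hypothetical cost per machine-hour, not an actual cloud price

def fixed_cost(peak_machines, hours, rate=HOURLY_RATE):
    """Cost of keeping a peak-sized cluster running for the whole period."""
    return peak_machines * hours * rate

def elastic_cost(baseline, hours, bursts, rate=HOURLY_RATE):
    """Cost of a baseline cluster plus extra machines only during bursts.

    `bursts` is a list of (extra_machines, burst_hours) pairs.
    """
    extra = sum(machines * burst_hours for machines, burst_hours in bursts)
    return (baseline * hours + extra) * rate

month_hours = 720  # assumed 30-day month
always_peak = fixed_cost(40, month_hours)
# Six machines all month, plus 34 more (40 in total) for one 20-hour job.
on_demand = elastic_cost(6, month_hours, [(34, 20)])
```

Under these assumed numbers the elastic option costs a fraction of the always-on peak cluster; the real ratio depends on how often, and for how long, the large jobs run.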
He noted that Hadoop MapReduce can "solve many problems, but you need a Ph.D. to
parallelize the processing efficiently," implying that analytics with Hadoop require expensive
and scarce persons with a deep skill set. Further, it is hard to find persons with the skills to
maintain the Hadoop platform and to avoid the costs of enterprise support for Hadoop from
Cloudera or Hortonworks.
Lessons Learned
He was asked to share some lessons that his team has learned over the last two years. He
mentioned the following points:
• Design a proof-of-concept (POC) exercise using a specific problem set to be solved by
  Hadoop, Aster, or another analytic platform. Prove to yourself whether each platform
  can provide the proper answers for that problem set and can handle its estimated
  workload.
• We decided not to create our own data center with on-site hardware. Instead, we are
  using Amazon EC2 services where we can provision extra resources only when needed.
  Thus, we have avoided the usual problems and costs associated with operating a data
  center. And, we avoid paying on-going operational/support costs that are configured
  for periods of peak demands.
• The Aster platform behaves like a typical database system. We are not worried about, or
  limited by, its infrastructure. Database operation can be assigned to a normal database
  administrator. In other words, "we get it!" Moreover, we are not worried about the
  entire system going down.
• Any new and innovative application of new technology will have its problems. Expect it.
  Be patient. Your vendor should collaborate with your company to resolve those
  problems, as Teradata Aster did with us.
Summary
In just two years, Company A matured their IT architecture in data warehousing and in analytics
in innovative ways. With Amazon Web Services (AWS), traditional data center infrastructure is
traded for rapid scalability and business flexibility by using cloud-based services. Cost factors
like time-to-value and risk assessment weighed more heavily than staffing rates and software
licensing. The Teradata Aster platform was the 'Swiss army knife' in his toolkit,
allowing him to cover many data analysis requirements with minimal infrastructure. And, the
use of Hadoop (also in AWS) provides continuing business value as 'cold storage' for the big
data tsunami.
Company B - Global Advertising Agency
Company B is a large full-service advertising agency with digital marketing and
technology at its core. The company creates digital media that build business identity through
web development, media planning and buying, technology and innovation, emerging media,
analytics, mobile, advertising creative, social influence marketing, and search. Their clients are
some of the largest global corporations.
Background
The Vice President of Business Intelligence was interviewed. He described his company and its
use of big data analytics as follows:
"Our core usage today is focused on a multi-attribution model that emphasizes the
segmentation of media audiences. In contrast to the last touch approach, we are now
analyzing a comprehensive view of all the touch points of customers across all multi-media
channels, such as interest on an advertiser's website in addition to behavior on the vendor's
on-site website. For instance, when customers received an email, do they look at it, and what do
they do, such as click for additional information? And, when customers search on related topics,
do they click through on displayed ad banners? And, when a customer purchases a product,
what was the sequence of touches that led to that purchase?"
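The contrast between the last-touch approach and a multi-touch view can be sketched in a few lines of Python. Company B's actual multi-attribution model is not described in the study, so the equal-weight (linear) rule below is a stand-in assumption, and the channel names and purchase value are hypothetical.

```python
from collections import defaultdict

def last_touch(path, value):
    """Give all credit for a purchase to the final touch point."""
    return {path[-1]: value}

def linear_multi_touch(path, value):
    """Split credit equally across every touch in the path."""
    credit = defaultdict(float)
    share = value / len(path)
    for channel in path:
        credit[channel] += share
    return dict(credit)

# Hypothetical sequence of touches that led to one purchase worth 100.0.
path = ["email", "search", "display", "search"]
lt = last_touch(path, 100.0)          # only the final search touch is credited
mt = linear_multi_touch(path, 100.0)  # email and display also receive credit
```

Under last touch, the email and display channels appear worthless even though they were part of the path; any multi-touch rule makes their contribution visible, which is what allows media spend to be rebalanced across channels.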
The heritage of Company B is ad serving. In other words, their focus is optimizing multi-channel
digital marketing by understanding customers' behavior across all media channels, enhancing
this data with third-party information, and assessing the impact that specific media has on driving
value to the business. In addition, they segment the customers based on how media influences
their actual purchases, thus optimizing the targeting of ad channels.
This customer-segmentation analytics provides critical information to the company's internal
teams who support clients. They manage the whole range of services to their clients, from
analyzing the on-line experiences of their customers to purchasing the media and managing
Search Engine Optimization (SEO) work. "The company covers the whole gambit!" The internal
client teams use the insights emerging from the media segmentation analytics to optimize
digital marketing campaigns from the client. The bottom line for value to our clients is to
optimize how they spend their media funds on digital marketing.
Architecture
Company B processes 3.3 terabytes or 36 billion rows of clickstream data every day from digital
media, site behavior, social media, and offline media. In media speak, that's about 400 billion
media impressions per year.
Most of the IT infrastructure resides in the cloud under Amazon Web Services (AWS). The daily
process uses Amazon Elastic MapReduce (EMR) to cleanse and aggregate the clickstream data
into transactional cookie-level (or session-level) data, which passes to Aster Database for
advanced analysis. In addition, the large data sets are retained in Amazon S3 cold storage for
future analysis.
The transactional data is partitioned among their clients. Clients have their separate custom
data marts so that the client data is not co-mingled. The client data is optimized and structured
as custom OLAP cubes for use by our internal client teams. Reporting to the clients about digital
marketing performance is performed from these cubes.
Company B built a data mart customized to the unique business requirements for each client.
Access to the client data mart is primarily by our internal teams, though there is high-level
reporting directly to clients so that they can understand why we decided on certain advertising
optimizations. Clients need to know why Company B has moved their funds from one
advertising channel to another.
In addition, the Aster Database platform (which also resides in AWS) performs advanced
analyses on the total transactional data set to generate the customer segmentation definitions,
which are used for media targeting at the customer instance by each client.
The AWS cloud provides considerable elasticity in compute resources. The Hadoop MapReduce
processing requires hundreds of machine instances to process that high-volume data. Using
Java to invoke MapReduce, Company B may spin up 30 machine instances for a job and 20
instances for another job, much like the old days of mainframe job processing. However, when
the processing is complete, those instances are terminated. He summarized, "We pay only for
what we use."

Figure 3 - Architecture for Company B.
Teradata Aster
The Teradata Aster platform provides an analytical library of SQL-MapReduce functions that we
use as building blocks for our processing. Over the long term, we are consolidating the Amazon
EMR jobs onto the Aster platform to minimize the data hops. The less we touch the data to get
it into its final form and to its final resting place, the more efficient we will be.
The focus of the Aster platform is to manage the media information at the aggregate cookie
level so that we can see who is consuming what media. At the moment, we are 'kicking the
tires' to learn the capabilities of the Aster platform, with the intent to consolidate other
processing onto Aster.
The segmentation analytics use the Aster SQL-MapReduce functions for sessionization and
pathing, along with our custom matching algorithm to match session identifiers and create a
centralized identifier for the customer. This processing is performed with a single pass of the
data, which is more efficient than normal SQL processing. Depending on the client, specific data
about the client's business is brought into their data mart because of their unique data needs.
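Sessionization and pathing can be illustrated with a short Python sketch. This is plain Python, not the Aster SQL-MapReduce functions themselves; the 30-minute inactivity timeout, the record layout (cookie, timestamp in seconds, page), and the function names are all assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout

def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Assign a session number to each (cookie_id, timestamp, page) click.

    A new session starts whenever the gap since the same cookie's
    previous click exceeds `gap` seconds.
    """
    ordered = sorted(clicks, key=itemgetter(0, 1))
    result = []
    for cookie, group in groupby(ordered, key=itemgetter(0)):
        session, previous = 0, None
        for _, ts, page in group:
            if previous is not None and ts - previous > gap:
                session += 1
            result.append((cookie, session, ts, page))
            previous = ts
    return result

def paths(sessionized):
    """Collapse each session into its ordered list of pages (pathing)."""
    out = {}
    for cookie, session, _, page in sessionized:
        out.setdefault((cookie, session), []).append(page)
    return out

# Hypothetical clickstream records, deliberately out of order.
clicks = [
    ("c1", 0, "home"), ("c1", 600, "product"), ("c1", 4000, "home"),
    ("c2", 100, "search"), ("c1", 4100, "cart"),
]
session_paths = paths(sessionize(clicks))
```

Once the clicks are ordered by cookie and time, both the session boundaries and the page paths fall out of one sequential pass, which mirrors the single-pass property the interviewee valued.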
Cloud-Based Infrastructure
By using the AWS cloud services, Company B is able to partition usage and storage by client and
to identify accurate costs per client. In the media services business, pricing is based on the
markup of the various volume-based advertising costs, such as billions of impressions per
month[7] and gigabytes of client storage. By using the detailed cost accounting of each resource
based on client, the company has good visibility and accuracy into the pricing aspect of their
business. He summarized, "When we operated a single large data center, we did not have this
understanding of cost allocation among clients."
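Per-client cost accounting of this kind amounts to tagging each metered resource with a client and summing. The sketch below assumes a simple usage-record layout and unit rates; the client names, resource names, and rates are hypothetical and are not actual AWS prices or billing fields.

```python
from collections import defaultdict

# Hypothetical unit rates, not actual cloud prices.
RATES = {"instance_hours": 0.5, "storage_gb_months": 0.25}

def cost_by_client(usage_records, rates=RATES):
    """Sum metered usage into a total cost per client."""
    totals = defaultdict(float)
    for client, resource, quantity in usage_records:
        totals[client] += quantity * rates[resource]
    return dict(totals)

# Hypothetical usage records: (client, resource, quantity).
usage = [
    ("client_a", "instance_hours", 200),
    ("client_a", "storage_gb_months", 1000),
    ("client_b", "instance_hours", 40),
]
costs = cost_by_client(usage)
```

With a shared on-site data center, the quantities in such records are hard to attribute to a single client, which is the loss of visibility the interviewee described.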
Core Competences
As a core competence, Company B knows all about digital marketing and the technology driving
digital marketing. Our value to clients is to question clients about whether they have looked at
this technology or that technology. And, if we have successfully applied a new technology to a
client's problem, we then would suggest to other clients to look at that same technology.
We focus on our core competence in digital marketing optimization. If we need resources
outside our core competence, we will purchase them outside. In fact, we have leveraged offshore
resources to reduce our costs. However, if it is part of our core competence, then we will build
or acquire it internally. The build-versus-buy decisions with a software project are often judged
from this perspective.
He gave the following example:
"We could have built those analytic components in the Aster platform. However, I also know that
developing those components is not our core competence, so we purchase them from Teradata
Aster. However, using those components in unique ways to satisfy our clients' needs, this is our
core competence."
Leveraging Open-Source Software
There is pressure to leverage open-source software, like Apache Hadoop. Under the right
circumstances, open-source software is very attractive. As this technology matures and more
people use it in new ways, the packaged software vendors will be pressured to extend their
functionality to maintain their competitive advantage over open-source alternatives.
Aster SQL-H Connector
To leverage open-source software, Teradata Aster offers a connection layer, called Aster
SQL-H™, to the Hadoop Distributed File System (HDFS) data based on HCatalog meta-data. This
feature enables HDFS files to be read as native Aster tables, thus leveraging the Aster Database
analytic functions. Although the intent is to move daily processing to the Aster platform, the
company will continue to store and manage some data in HDFS, probably using SQL-H™. Since
Teradata Aster charges by the data volume managed by the Aster platform, the company feels
that it needs the flexibility to utilize both options.
The current data retention policy is at least 12 months. Since more than five terabytes are
acquired per day, more than one petabyte of data is being stored in Amazon EC2 every year.
The company is working on a long-term retention policy to determine the business value of
archiving years of acquired data.
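The retention arithmetic is easy to check. The five-terabytes-per-day figure comes from the interview; the decimal conversion (1,000 terabytes per petabyte), the function name, and the three-year scenario are assumptions added for illustration, and the sketch ignores compression and storage replication.

```python
TB_PER_PB = 1000  # decimal units; an assumption, since the study does not specify

def stored_petabytes(tb_per_day, retention_days):
    """Raw volume held if every day's intake is kept for the retention window."""
    return tb_per_day * retention_days / TB_PER_PB

one_year = stored_petabytes(5, 365)         # 12-month retention
three_years = stored_petabytes(5, 3 * 365)  # hypothetical longer retention
```

At five terabytes per day, a single year already approaches two petabytes, so each additional year of retention adds a similar amount; that is the cost side of the long-term retention decision.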
Lessons Learned
He shared that the lessons learned are about the ups and downs of adopting new cutting-edge
technologies.
"Being aggressive with new technologies is risky since there are often unexpected obstacles.
Various features and functions that you assume with traditional technology may not be present
in new technology. Surprise! You do pay for being on the bleeding edge!"
Summary
As an ad agency for digital media, Company B uses analytics to understand the impacts that
digital media is having on the marketplace of their clients and to optimize the 'spend' for that
digital media. The analytics for their multi-attribution model requires large volumes of
continuous data from numerous sources. Custom data marts for each client manage the results
and support internal teams in providing services to their clients. Cloud-based infrastructure
provides elasticity in computing resources and, especially, accurate cost accounting per client.
The Aster platform provides packaged analytics for agile analyses and a way of consolidating
core data to minimize "hops." Apache Hadoop will continue to provide low-cost storage and
filtration of raw data. The use of Aster SQL-H™ is being considered as a means of increasing
their data retention period to greater than 12 months.
Company C - Large Healthcare Provider
Company C is a large healthcare provider with dozens of hospitals, many medical
clinics, and thousands of caregivers. As a strategic IT initiative, they are serious about handling
their big data properly so that managers can access detailed operational data to improve
healthcare delivery, to reduce unnecessary expenses and minimize medical liabilities.
Background
The Director of Business Intelligence was interviewed. He described their experience with
analytical platforms. After considering several options, they decided to invest in the Teradata
Aster Database platform for reporting. By consolidating all their data marts onto this one
platform, the overhead of loading and transforming data from diverse sources and deploying
query tools to diverse users is simplified, thereby reducing costs. With 18 months of production
experience with the new platform, the company is "tickled pink with the performance."
Potential of Big Data
The director described healthcare as a business that generates voluminous detailed operational
data, which is usually not retained for more than a few days. The Electronic Health Record
system tracks the minute-by-minute event flow of medications and procedures given to
patients. Most of this information comes from specialized medical equipment that generates
real-time data, such as an MRI or CAT scan. Although some images are retained within the
Picture Archival System (PAS), most of the data stored on medical devices soon "drops on the
floor" (i.e., is deleted to accommodate new data).
An important use case is managing patients' notes created by physicians. The physicians submit
their medical reports as voice recordings, which are transcribed into unstructured text. These
reports contain very important information about the patient, such as future diagnoses. "The
physician is extremely candid about the patient, more than anywhere else." There has been
work on designing dictation templates into which the unstructured text could be mapped
during transcription. However, this approach is a partial solution since the templates cannot
differentiate among the diversity of medical situations. Long discussions are often placed in a
single cell of the template. Text analysis could provide more structure and hence more value to
the physician's report.
To better understand the dynamics and economies of their caregiving business, medical
analysts would like "to get their hands on that data." The only way to access this detailed data
now is to retrieve the record and extract the appropriate data - all manually.
Future Plans
As a fuLure pro[ecL, Company C ls plannlng Lo develop a Padoop plaLform Lo evenLually collecL
as much of Lhls unsLrucLured daLa as ls posslble. lL wlll be sLored as raw unsLrucLured daLa ln Lhe
Padoop ulsLrlbuLed llle SysLem (PulS), as shown ln llgure 4.

I|gure 4 - Aster SL-n Connector to nadoop nDIS.
There are various methods for working with the HDFS data, all of which use the Apache HCatalog
table/storage management to share schema information for interoperability. First, the Hadoop
MapReduce engine supports job submission and job scheduling as part of the Apache Hadoop
package. Second, Apache Hive supports data aggregation functions with its HiveQL language,
which are compiled and submitted as MapReduce jobs. Third, Apache Pig creates MapReduce
programs with a higher-level procedural language. Finally, Teradata Aster supports an
extension to their declarative SQL-MapReduce language called Aster SQL-H™, which accesses
the Hadoop HCatalog and directly joins only the data required. Hence, business analysts can
work with Hadoop data like another flat table through standard ANSI SQL and various BI
query/reporting tools.
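
To make the MapReduce model behind these methods concrete, the following is a minimal single-process sketch in Python of the kind of aggregation a GROUP BY query expresses (a count of records per procedure). It is an illustration only, not how Hadoop, Hive, or Aster implement it; the sample records and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records (patient_id, procedure), invented for illustration.
RECORDS = [
    ("p1", "MRI"),
    ("p2", "CAT"),
    ("p1", "MRI"),
    ("p3", "MRI"),
]


def map_phase(records):
    """Emit one (key, 1) pair per record, keyed by procedure."""
    for _patient_id, procedure in records:
        yield procedure, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to obtain a count per key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_procedure(records):
    """Run the three phases in sequence on an in-memory list."""
    return reduce_phase(shuffle_phase(map_phase(records)))


print(count_by_procedure(RECORDS))
```

In a real cluster the map and reduce phases run in parallel across many servers and the shuffle moves data between them; the sketch only shows how the work is divided.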
Aster SQL-H to Access Hadoop Data
He remarked that SQL-H™ is being considered as the link between the Hadoop platform and
the Aster platform since the two environments are remarkably complementary. Teradata Aster
provides a managed MPP analytic discovery platform that blends SQL and MapReduce, while
Hadoop is a low-cost open-source platform that also uses MapReduce. Aster provides
SQL-MapReduce to integrate SQL and MapReduce on the Aster platform and SQL-H™ to integrate
SQL with Hadoop HDFS data.
The BI director realized that SQL-H™ enables his development staff to "remain with the SQL
paradigm and avoid learning and operating another system to access this data." He could also
"leverage the current SQL skills to analyze Hadoop data." The developer would need to know
the syntax and functional difference in SQL-H™, but he judged that this is "not that far of a
stretch." He estimated that the ROI on this project would be greater since all the data in
Hadoop is accessible using their existing analytics environment. Thus, he surmised that this
project had "a clean business use case" with minimal risk in both financial investment and
resource constraints. "We could look at data that we don't look at today" with a quick
development effort. "At the end of the day, we have to survive on adding value" for the
company.
When asked about the cost of Hadoop storage, he estimated that it would be considerably less
because of the use of thousands of low-cost fault-tolerant commodity servers to handle data
volumes in the petabyte range. By using SQL-H™ to filter the Hadoop data, only the result set,
which should be in the gigabyte range, would be pushed back to the Aster platform. He
concluded that, "Only the actionable data should be managed by the Aster platform."
He described their status as "in the thinking stage" with no SQL-H™ activity "yet, but we are
certainly looking into it" as it is "on our technology roadmap." He finished with, "If we could
provide this data access, we would win the ballgame!"
Summary
Company C is a large healthcare provider that is seriously exploring ways of managing and
exploiting detailed operational data. Over eighteen months, the company consolidated various
data marts into the Teradata Aster platform. If the voluminous operational data could be
economically acquired and stored, the analytics of the Aster platform (such as text analysis on
patient notes by physicians) has the potential to provide insights into improving their
healthcare delivery. In particular, they are currently investigating the Aster SQL-H™ Connector
as a channel to large data sets stored under HDFS.
Tradeoffs
In planning analytical architectures, companies should consider tradeoffs in four areas:
scalability, time-to-value, skill inventory, and multiple data channels.
Planning for Scalability
A critical tradeoff in making investments in analytical architectures is setting the priority on
scalability. In past decades, only the big companies worried about IT investments in equipment
purchases, data center cooling, and the like. Small companies did without. Now many start-up
companies are exploiting analytics to open new business niches having explosive growth. From
day one, those small innovative companies must think deeply about scaling their IT capabilities
to service big clients, who internally are unable to exploit the same analytics. Hence, rapid
scalability of analytical architectures (petabyte data volumes, sub-second operational
analytics, thousands of concurrent users) is a hard requirement, not a lofty dream. By
contrasting analytical requirements as current versus future, the company can assess the
priority of investing in scalability.
A recent development that influenced this tradeoff in the companies interviewed is cloud-based
virtual computing services from Amazon and other vendors. Analytical processing often
requires considerable parallel processing resources for short periods. Purchasing and
maintaining the necessary equipment would be too expensive. However, cloud-based services
can provide the resources as needed for a cost based on only the resources consumed.
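
The arithmetic behind this tradeoff can be sketched as follows. Every figure below (prices, node counts, hours) is hypothetical and chosen only to show the shape of the comparison; none comes from the companies interviewed or from any vendor's price list.

```python
def owned_cost(purchase_price, annual_upkeep, years):
    """Total cost of buying equipment and maintaining it for a number of years."""
    return purchase_price + annual_upkeep * years


def on_demand_cost(hourly_rate, nodes, hours_per_burst, bursts_per_year, years):
    """Total cost when paying only for the node-hours actually consumed."""
    return hourly_rate * nodes * hours_per_burst * bursts_per_year * years


# Hypothetical scenario: a 100-node cluster needed for 8 hours, 50 times a year.
owned = owned_cost(purchase_price=500_000, annual_upkeep=50_000, years=3)
cloud = on_demand_cost(hourly_rate=0.5, nodes=100, hours_per_burst=8,
                       bursts_per_year=50, years=3)

print(f"owned: {owned}, on demand: {cloud}")
```

With short, infrequent bursts the on-demand total stays far below the owned total; as utilization approaches continuous use, the comparison can reverse, which is why the priority on scalability has to be assessed against actual usage patterns.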
Monetizing Time-To-Value
Another difficult tradeoff is monetizing the time-to-value factor. In other words, what is the
business value that results from faster 'discovery' cycles? Analytical processing often consists of
an analyst who is continually going through cycles of collecting, refining, modeling, and testing
data. It may take many such cycles before the analyst produces valid and practical results for
the business. With current technology, it is possible to reduce those cycle times from days to
minutes, thus increasing the productivity of the analyst by a hundred-fold. Further, these faster
discovery cycles can enable the analyst to work as fast as they can think, permitting a high level
of uninterrupted concentration on the analysis problem.
How should the company monetize the productivity increase of the analyst? In the past,
executives would decide that the analyst could take a week or two and avoid the additional
expense. In the present, many companies live and die based on leveraging 'now' information
that guides business processes, minute by minute, as they unfold. Several years ago, a major
retail company boasted that their point-of-sale data was in their data warehouse by the time
the shopper left the parking lot. That same company considers that boast so past season and is
seeking ways of upselling to their customers as they wander the store floor, long before they
leave the parking lot.
Assess and Manage Skill Inventory
Much of the technology surrounding big data analytics has evolved in recent years. Hence,
persons (either employees or contractors) with the proper skills are often in short supply.
Further, the technology is changing fast so that acquiring new skills should be a continual
learning process.
Juggling Multiple Channels
The initial applications of big data analytics acquired, cleansed, filtered and transformed one
data source, such as a Twitter text stream. As analytical applications, like marketing
optimization, became more sophisticated, it was apparent that many data channels were
required for analytics on the total customer experience involving multiple touch points (like the
corporate website, Twitter, Facebook and others).
The manual effort to acquire, cleanse, filter and transform one data channel is huge. Multiple
channels require N times 'huge' effort, plus the work of generating cross-channel linkages so
that all the touch points for a specific customer are interrelated.
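
As a minimal sketch of what a cross-channel linkage means, the Python below groups events from several channels by customer. It assumes, hypothetically, that every channel already carries a shared customer identifier; in practice, establishing that shared key is much of the manual effort described above. The channel names, events, and function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical, already-cleansed events per channel, each tagged with a customer key.
CHANNEL_EVENTS = {
    "website": [
        {"customer": "c1", "action": "viewed product"},
        {"customer": "c2", "action": "searched"},
    ],
    "twitter": [
        {"customer": "c1", "action": "mentioned brand"},
    ],
    "facebook": [
        {"customer": "c2", "action": "liked page"},
        {"customer": "c1", "action": "commented"},
    ],
}


def link_touch_points(channel_events):
    """Collect every (channel, action) touch point under the customer it belongs to."""
    linked = defaultdict(list)
    for channel, events in channel_events.items():
        for event in events:
            linked[event["customer"]].append((channel, event["action"]))
    return dict(linked)


linked = link_touch_points(CHANNEL_EVENTS)
for customer, touch_points in linked.items():
    print(customer, touch_points)
```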
Insights
This section summarizes and extracts practical insights for professionals pursuing big data
analytics within their companies.
Maturing into Enterprise Architectures
The maturing phase that big data analytics is undergoing currently is similar to the evolution of
data warehousing from a departmental scope into enterprise architectures. Even today, there
are certainly many reasons for a department DW, such as a clear business objective, restricted
user base, and single functional emphasis. However, we have learned over the decades that
there are huge business motivations to extending and integrating departmental DWs into an
enterprise DW, such as cross-functional business opportunities, pervasive support of users
(even customers), and operationalizing analytics into business processes.
The same is currently happening to big data analytics. This implies that there will continue to be
good business reasons for isolated, single-purpose analytics projects. However, as soon as the
business viability of those projects is established, the debate of how to integrate that
functionality into the enterprise architecture should start with vigor. The company cannot
ignore the tough issues, such as security, data governance, privacy, and scalability, without
curtailing future business potential.
Data ownership is one of these tough issues that has traditionally been a tug-of-war between
business users and IT managers. IT wants to own all the 'official' data about the corporation,
while business users want to use that data with flexibility and responsiveness.
Clash of Tech Cultures
There is considerable debate about whether emerging technical alternatives, such as
procedural versus set operations, No-SQL versus SQL, and the like, will dominate over their
older cousins. History clearly shows that tech waves never obviate previous practices. After the
initial hype subsides, the savvy professionals find various optima in the blending of those tech
waves. Over time, best practices emerge from continuous incremental integration of new
technologies into coherent architectures.
The same is happening with big data analytics and was especially apparent in the companies
interviewed. With any initial proof-of-concept project, the driver is to get the job done, as soon
as possible, as cheaply as possible, with whatever tool and skill is available. Once business
viability is established, the driver should shift to determining the best way to accomplish the
job. Hence, a company may need to acquire new tools, new skills, and a whole new technology
to properly satisfy the required objectives. The point is: do not constrain your future by
choosing between A or B. Be knowledgeable about both A and B. And constantly search for the
complementary blending of A and B.
For example, an early issue for this study was whether Apache Hadoop or Teradata Aster was
better. The interviews quickly illustrated that this issue was a fluid comparison that changed
over time. The companies were influenced in their technology decisions by several factors, such
as prior skill sets, depth of analytics, frequency of discovery cycles, and support of operational
business processes. The pivotal change was the introduction by Teradata of the Aster SQL-H™
connector, which allowed Aster Analytics to access HDFS data directly. The debate shifted from
choosing between A or B to minimizing data movement and thus decreasing the time to
perform analyses.
Moderate TCO Debates
In the first section on Challenges, the point about TCO was that executives should think
strategically and broadly about assessing analytical architectures. One way of justifying this
statement is to review the business objectives for big data analytics for the three companies.
Company A
E-commerce retailer that provides customers with a customized online shopping experience
by being more responsive to customers' behavior. Hence, the company needs to acquire
cyber-data about all touch points with the customer.
Company B
Digital-media ad agency that optimizes all digital marketing for clients. Hence, the company
acquires data on all customer touch points across all media channels related to their client.
Company C
Healthcare provider that improves healthcare delivery and reduces medical expenses. Hence,
the company acquires data on medications and procedures for patients and wants to expand
data collection to all medical devices and to patient notes by physicians.
Note that these business objectives strike close to the core business competence of the
companies. Could the company continue in business if their analytical applications cease to
operate properly? For one day? For one week? These questions paint the nature of a TCO
assessment in very practical and realistic terms. Therefore, TCO debates within a company
should be moderated by skilled executives knowledgeable about the business reasons for
creating and maintaining analytical applications.
Riding the Tech Waves
We have seen many tech waves over the past decades. However, the tech wave behind big
data analytics is obviously one of the more challenging and volatile. To understand the
implications and trends with the many disciplines comprising big data analytics requires a
unique individual. Companies that are making substantial investments in analytical
architectures must be knowledgeable of these implications and trends. The technology is
moving very fast. Companies must satisfy current requirements while constantly planning for
future requirements. Therefore, companies require the skills of a competent CTO-like person to
ride this tech wave and to mentor others to do the same.
- - - - -
In summary, any company that incorporates big data analytics into its business processes
should face and resolve these questions:
1. Will you be able to support the complete analytical value chain from data to action, thus
achieving tangible business results?
2. What is the plan to mature the current/proposed analytical applications across the
scope of the enterprise?
3. How will you leverage the best from various tech cultures to produce the desired
business results?
4. Do you have a constructive way of debating the TCO for new innovative analytical
applications?
5. Do you have a good surfing instructor so that you can ride the volatile tech wave of
analytics?

Endnotes

1. "The discovery and communication of meaningful patterns in data" is the definition in Wikipedia at
http://en.wikipedia.org/w/index.php?title=Analytics&oldid=310946342 retrieved 6 September 2012. It is a weak
entry but covers the basics.
2. As originally defined by Doug Laney in 2001. For more background, see http://en.wikipedia.org/wiki/Big_data
retrieved 6 September 2012. It is a good entry with many examples.
3. For background on Apache Hadoop, see
http://en.wikipedia.org/w/index.php?title=Apache_Hadoop&oldid=310910492 retrieved 7 September 2012. It is a
good overview of the entire Hadoop ecosystem and community. Note the funky component names! This is
changing on a monthly basis, driven by Google, Yahoo, Amazon, IBM, Cloudera, Hortonworks and others.
4. For background about Teradata Aster, see the Resource section at http://www.asterdata.com/, which has good
whitepapers, analyst reports, webcasts and more.
5. This value chain was adapted from a presentation by Mayank Bawa of Teradata Aster. It is also a tip of the hat
to The Corporate Information Factory by Inmon, Imhoff & Sousa, Second Edition, 2001.
http://www.amazon.com/Corporate-Information-Factory-W-Inmon/dp/0471399612
6. http://en.wikipedia.org/wiki/Random_forest
7. The measure used in advertising is CPM or cost per thousand impressions, whether the impression appears in
radio, TV, newspaper, or magazine. In this case, it refers to digital advertising where impressions are web page
images and the like.
About the Methodology
The methodology of this study is to listen carefully to several pioneering companies in the big data
analytics area. The intent is to contribute to professional education, to share the insights with other IT
professionals so that we can mature as an industry, amid escalating business challenges and rapidly
evolving technology.
The sample of Teradata customers was small so these conclusions are tenuous but insightful. By
leveraging the open access to Teradata customers, we have a glimpse into the complex issues involved.
Based on the quality of the interviews, this sample is representative of the issues and trends in this
emerging area.
The primary author is Richard Hackathorn of Bolder Technology with substantive contributions from
several Teradata colleagues: Steve Wooledge, Manan Goel, Mayank Bawa, and Kevin Pratt. We are
appreciative of the companies and professionals who were willing to share openly their experiences.
Finally, we are appreciative of Teradata Corporation for their assistance and sponsorship of this study.
About Bolder Technology
Bolder Technology Inc. is a twenty-year-old consultancy focused on Business Intelligence and Data
Warehousing. The founder and president is Dr. Richard Hackathorn, who has more than thirty years of
experience in the Information Technology industry as a well-known industry analyst, technology
innovator, and international educator. He has pioneered many innovations in database management,
decision support, client-server computing, database connectivity, and data warehousing.
Richard was a member of Codd & Date Associates and Database Associates, early pioneers in relational
database management systems. In 1982, he founded MicroDecisionware Inc. (MDI), one of the first
vendors of database connectivity products, growing the company to 180 employees. Sybase, now part
of SAP, acquired MDI in 1994. He is a member of the IBM Gold Consultants and the Boulder BI Brain
Trust. He has written three books and has taught at the Wharton School and the University of Colorado.
He received his degrees from the California Institute of Technology and the University of California,
Irvine.
About Our Sponsor
Teradata is the world's largest company focused on analytic data solutions through integrated data
warehousing, big data analytics, and business applications. Only Teradata gives organizations the
advantage to transform data across the organization into actionable insights empowering leaders to
think boldly and act decisively for the best decisions possible.

The Best Decision Possible and SQL-H are trademarks, and Teradata, the Teradata logo, and Aster
SQL-MapReduce are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and
worldwide.
EB-7471 > 0113
