
WHITEPAPER

Building a Big Data Analytics Platform: Going beyond the Traditional Enterprise Data Warehouse

Abstract
In this white paper, Impetus Technologies focuses on the need for building a Big Data analytics platform for better business insights.

It also looks at why organizations need to design an Enterprise Data Warehouse (EDW) to support the business analytics derived from Big Data.

Additionally, it discusses the options and challenges of building a successful EDW architecture to meet the new Big Data business requirements. It talks about why such an architecture may include extreme integration with semi-structured and unstructured data sources, which could be very large in size or could be streaming data, accessed through Hadoop as well as massively parallel databases.

Impetus Technologies, Inc.
www.impetus.com
August 2011


Table of Contents

Introduction
Limitations of traditional EDWs
The key features of a Big Data Analytics platform
Options available for building the Big Data platform
Using Open Source to build Big Data solutions
Opting for a Hybrid solution
Harnessing existing investments in building a Big Data Analytics platform
Summary

Introduction

In the post-recession world, organizations are under pressure to maximize profits and reduce expenditure. Business owners need to find the right target users, figure out the distribution channels, and successfully sell their offerings, as well as keep all the stakeholders happy.

Moreover, every time the business comes up with new products or campaigns, or wishes to evaluate its existing business performance, it has to deal with the following questions: What kind of products are my customers interested in? Where should I open my new store next year? What is the most effective distribution channel?

Traditionally, businesses have used Enterprise Data Warehouse (EDW) solutions for providing analytics and gaining deeper insights to address their business requirements and expansion plans.

An EDW can play a pivotal role in an enterprise IT strategy. A comprehensive EDW plan provides companies the following benefits:

- Enables disciplined data integration within a large enterprise
- Generates output and facilitates effective representations of all business processes


It is important to examine how the traditional EDW works. Traditional data sources include an operational DB, old archived data, flat/XML files or ERP systems. Here, the data is extracted, cleaned and transformed into the desired format and then loaded into the data warehouse storage system. This data can be further divided into marts. Once the data is available in the central EDW, query or reporting tools are used for analytics. However, for deeper or forecast-based analysis, data mining tools are used.

The question, however, is whether such data warehouses are ready to deal with Big Data and, more importantly, what is Big Data?

The term Big Data is used to describe datasets which cannot be managed or processed by traditionally used software tools within an agreed elapsed time. The size of Big Data is constantly increasing, and can range from a few terabytes to many petabytes. However, it is expected to reach around 35 zettabytes by the year 2020!

Traditional Enterprise Data Warehouses have fallen short of expectations when it comes to handling Big Data, on account of the following reasons:

- Inability to handle large data sizes
- Storing and managing the Big Data
- Gaining insights from this data
- Costs involved in dealing with Big Data


Limitations of traditional EDWs


Let us examine the limitations of traditional EDWs.

Traditionally, Enterprise Data Warehouses focused only on transactional or archived data. However, in the last few years, the need to capture additional data for deeper insights has come up. This includes real-time data, which may be the low-latency operational data, or customer behavior data, which captures the sub-transactional processes. At the same time, additional data sources such as devices and sensors have also emerged.

Social Media also provides valuable information on product preferences and user sentiments. The large volume of unstructured data generated by Web applications is extremely useful for generating business intelligence.

It is clear that traditional EDWs cannot gain meaningful insights from Big Data. This is possibly because traditional EDWs were just not meant to handle TBs and PBs of data. Most of these systems were designed in the 1990s using the database technologies of that time.

Another difference is that in place of Extract-Transform-Load (ETL), the Big Data warehouses need ELTL, which is Extract-Load-Transform-Load. The new system needs a staging area where data is uploaded before the cleansing/transformation operations.

Traditional relational database solutions are not suitable for a majority of these data sets. The data is too unstructured and/or too voluminous for a traditional RDBMS to handle. Such data cannot easily be analyzed with SQL or similar technologies. In fact, a database schema does not allow complex unstructured formats to be defined and managed in these data warehouses. Moreover, the costs involved in handling these new data sets by using traditional technologies are also very high.

Clearly, existing EDW environments, which were designed decades ago, lack the ability to capture and process the new forms of data within reasonable processing times. Moreover, these traditional EDWs have limited capabilities when it comes to analyzing user behavioral data.

Cost is another important factor. Currently, organizations are spending hundreds of thousands of dollars per terabyte per year for producing and replacing data in their existing environments, which is huge. Additionally, the models in use tend to require specialized hardware, which in turn results in a high cost per terabyte, making large-scale deployments expensive. It is also really hard to predict the infrastructure workload for managing this Big Data.
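The Extract-Load-Transform-Load (ELTL) flow mentioned in this section can be sketched in a few lines of Python. This is only an illustrative, in-memory sketch under stated assumptions: the record layout (JSON lines with `user` and `amount` fields), the function names and the use of a list and a dictionary as stand-ins for the staging area and the warehouse are all hypothetical and do not come from any particular product.

```python
import json


def extract(raw_lines):
    """Extract: pull raw records from a source without interpreting them."""
    return list(raw_lines)


def load_to_staging(records, staging):
    """Load (first L): store the raw records unchanged in the staging area."""
    staging.extend(records)


def transform(staging):
    """Transform: cleanse staged records, skipping anything unusable."""
    cleaned = []
    for line in staging:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input remains in staging only
        if not isinstance(event, dict) or "user" not in event:
            continue  # records without a user cannot be attributed
        cleaned.append({
            "user": str(event["user"]).strip().lower(),
            "amount": float(event.get("amount", 0)),
        })
    return cleaned


def load_to_warehouse(rows, warehouse):
    """Load (second L): accumulate the cleansed amounts per user."""
    for row in rows:
        warehouse[row["user"]] = warehouse.get(row["user"], 0.0) + row["amount"]


def run_eltl(raw_lines):
    """Run the four ELTL steps in order and return both data stores."""
    staging, warehouse = [], {}
    load_to_staging(extract(raw_lines), staging)
    load_to_warehouse(transform(staging), warehouse)
    return staging, warehouse
```

The point of the sketch is the ordering: the raw data, including malformed records, is kept in the staging area, and cleansing happens only afterwards, so the transformation can be rerun or changed without going back to the source systems.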


The key features of a Big Data Analytics platform


To manage the Big Data trend, a new breed of Big Data Open Source and proprietary technologies that leverage commodity hardware has come up. A Big Data Analytics platform helps capture and analyze these new data sets.

The ideal Big Data Analytics platform needs to match up to these key characteristics:

- It should have the ability to scale easily to support large data, which will typically be in terabytes or petabytes.
- The system should ideally be distributed across geographically unaware processors.
- It should enable quick response to highly complex queries as well as support a wide variety of data types.
- It should be able to incorporate machine learning, provide recommendations, and execute analytics on real-time incoming data such as logs, as well as provide domain-specific canned reports.
- It should be able to handle data from heterogeneous data sources, while providing a high rate for loading and analysis, as well as the ability to handle failover.


Options available for building the Big Data platform


It is important to understand that for building a Big Data analytics platform, any single vendor technology may not be sufficient. The platform should have certain capabilities to address specific sets of requirements.

There are two different approaches that are being used to address Big Data analytics.

The first one is using Massively Parallel Processing (MPP) and columnar databases. This solution can help address scaling, distribution, load management, response time and failover management issues. Additionally, it may also have some domain-specific capabilities to provide a ready-made solution.

The second option is using MapReduce implementations. This framework was initially used by Google to build its Web search index and is now easily available as the Open Source Apache project called Hadoop.

Companies, therefore, have the option to choose between Open Source solutions and commercial options. However, they can also build a hybrid solution, which has a mix of different capabilities that handle the Big Data paradigm.

The commercial tools of today have strong analytical proficiencies as well as sophisticated reporting and OLAP cube capabilities. There are a large number of vendors in the market who are offering solutions for the main components of the EDWs, which are ETL, query tools and BI.

Some of the commercial options for MPP are Greenplum, Teradata, etc. Informatica is an example of an ETL tool. A few commercial solutions for BI and Analytics are Pentaho, Business Objects and MicroStrategy, among others. It is possible to build a Big Data warehouse solution using these commercial products together.
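To make the MapReduce option more concrete, the following single-process Python sketch shows the three conceptual steps of the model: map, shuffle (group by key) and reduce. It is a word count over lines of text, chosen only for illustration. It does not use the Hadoop API, and all function names are hypothetical; in a real deployment the framework distributes the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit one (word, 1) pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Because each map call looks at only one input record and each reduce call looks at only one key, both steps can be run in parallel on commodity hardware, which is the property the approach relies on for scaling.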

Using Open Source to build Big Data solutions


Every organization, big or small, is now focused on cutting IT expenditure. Despite this, business analytics remains a major business driver for these companies. If the commercial solutions are scaled to really huge volumes and deeper BI, it can result in exorbitant licensing costs.

This is clearly not a viable proposition. Companies can instead choose from the numerous Open Source implementations that are available. Lower costs, extensibility, and integration are some of the benefits that organizations realize from Open Source solutions. The good news is that the community is continuously making efforts to enhance these features and add new functionalities to these solutions.

Some of the Open Source solution stacks in the analytics world are Jaspersoft and Pentaho Reporting, while the ETL tools are CloverETL, Talend, etc. Pentaho also provides commercial extensions of its solution, while Apache Hadoop provides an implementation of the MapReduce framework and Apache Cassandra provides a distributed data store that can be used alongside it. These products solve huge data storage issues and provide ETL and analytics support.

Opting for a Hybrid solution


In this scenario, it is possible to use an Open Source solution for ETL or BI and a commercial solution for Analytics, or vice versa. Hadoop and MPP solutions, for instance, can work together as ETL pipes along with a commercial Analytics tool. Alternatively, MPP and columnar databases can be chosen along with MapReduce to provide another workable hybrid solution.

When there are larger volumes of data to be analyzed, organizations are better off using Open Source solutions. Hadoop is one of the best available Open Source solutions that can help them in handling their Big Data in a cost-effective manner. It also makes sense to use parallel processing or other fast mechanisms while trying to import from the source system or export to the destination system.

Incidentally, real-time is a myth in Big Data. The data warehouse system has to be carefully designed so that real-time data can be limited by size or by time. It is possible to reuse some of the existing EDW investments in building a Big Data platform.
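One way to read the advice that real-time data should be limited by size or by time is as a bounded window over the incoming events. The Python sketch below is a minimal illustration under stated assumptions: the class name, its parameters and the use of plain numeric timestamps are hypothetical, and events are assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque


class BoundedWindow:
    """Keeps only recent events, limited both by count and by age."""

    def __init__(self, max_events, max_age_seconds):
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        self._events = deque()  # (timestamp, event) pairs, oldest first

    def add(self, timestamp, event):
        """Add an event, then evict whatever falls outside the limits."""
        self._events.append((timestamp, event))
        self._evict(now=timestamp)

    def _evict(self, now):
        # Size limit: drop the oldest events beyond max_events.
        while len(self._events) > self.max_events:
            self._events.popleft()
        # Time limit: drop events older than max_age_seconds.
        while self._events and now - self._events[0][0] > self.max_age_seconds:
            self._events.popleft()

    def events(self):
        """Return the retained events, oldest first."""
        return [event for _, event in self._events]
```

Analytics that run against such a window work on a predictable amount of data, while the full history can still be written to the Big Data store for batch analysis.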


The Impetus solution


Based on its project experiences, Impetus Technologies has built a Big Data Analytics platform for its clients that can help them roll out their Big Data Analytics initiatives. The platform is called iLaDaP, which is short for Impetus Large Data Analytics Platform.

The core of the iLaDaP platform is built using SOA, and incorporates all the key characteristics of an ideal Big Data Analytics platform discussed earlier. iLaDaP is designed to derive intelligence and operate on huge datasets collected from numerous data sources in multiple data formats. It is powered by Hadoop and can therefore scale linearly up to thousands of nodes using commodity hardware. This spells a significant cost advantage in the long run. iLaDaP also comes with a set of pre-canned and customized reports.

Recognizing that it is important for businesses to track down and take advantage of an opportunity as it happens, the Impetus platform enables them to react to events as they occur. iLaDaP is also capable of collecting data from a range of disparate sources. This unstructured data can be transformed and utilized for strategic business decisions.

iLaDaP can be seamlessly integrated with current platforms, without the need for major changes. The core iLaDaP platform is built using Open Source technologies, where the components can be replaced with other commercial technologies, in accordance with requirements.

Harnessing existing investments in building a Big Data Analytics platform


It is possible to reuse investments made in the traditional data warehouse to build a Big Data Analytics platform. It is possible to reuse most of the hardware, since the Big Data solutions can run on commodity-grade hardware. Therefore, an existing RDBMS-based infrastructure can be reutilized. The existing code logic and algorithms can also be used after minor modifications to enable them to run in a stateless architectural environment. In this scenario, tools like MATLAB can be integrated with Hadoop-like technologies.

Another way of utilizing the data warehouse investments is by extending or enhancing their capacity by plugging them together with a Big Data warehouse solution. Hadoop, for example, is a cost-effective option for storing archival data, performing deeper analytics and providing summarized reporting data to an existing data warehouse. This strategy can also help in reusing the reporting tools. Similarly, ETL tools can be modified to use the Big Data warehouse as sinks. Tools like Talend or Informatica provide connectors for using Hadoop and commercial MPPs as data sinks.

The development and testing strategy can also be reused. Most of the new Big Data warehouse solutions support SQL or Java or scripting languages and allow the reuse of existing development and testing investments.
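The idea of providing summarized reporting data to an existing data warehouse can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not a description of any specific product: an in-memory SQLite database stands in for the existing warehouse, a plain Python roll-up stands in for the Big Data job, and the table name `daily_sales`, the event layout and the function names are all hypothetical.

```python
import sqlite3
from collections import Counter


def summarize(raw_events):
    """Roll up raw (day, product) events into counts per day and product."""
    return Counter((day, product) for day, product in raw_events)


def load_summary(conn, summary):
    """Write the summarized rows into the (stand-in) existing warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(day TEXT, product TEXT, units INTEGER)"
    )
    rows = [(day, product, units) for (day, product), units in sorted(summary.items())]
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def report(conn):
    """Query the summary with ordinary SQL, as an existing reporting tool would."""
    cursor = conn.execute(
        "SELECT day, product, units FROM daily_sales ORDER BY day, product"
    )
    return cursor.fetchall()
```

Only the compact summary reaches the relational store, so existing SQL-based reporting tools can keep working unchanged while the detailed data stays on the Big Data side.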

Organizations can deploy iLaDaP on premise, as well as in a Cloud-supported deployment setup.


Summary

In conclusion, it can be said that traditional Enterprise Data Warehouses do not have the ability to keep up with the growing demands of Big Data. The need of the hour is to effectively strategize and build a Big Data analytics platform to manage, store and derive insights from this digital data.

Also, no single vendor technology will be sufficient. It is recommended that organizations go for a hybrid solution made up of commercial and Open Source options to build their Big Data analytics platform.

When there is a large volume of data to be analyzed, it is suggested that an Open Source solution be used, and Hadoop is the best option. The success of a Big Data platform depends entirely on the tools that are chosen. Therefore, the most appropriate tools must be selected from the available options. Companies can additionally reuse existing EDW investments for their Big Data analytics platform.

About Impetus

Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base comprising large-scale ISVs and technology innovators to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media among others.

Impetus Technologies, Inc.
5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
Tel: 408.252.7111 | Email: inquiry@impetus.com
Regional Development Centers - INDIA: New Delhi, Bangalore, Indore, Hyderabad
To know more visit: http://www.impetus.com

Disclaimers

The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc.
