
Table of Contents

1. Dremel
2. Borg
3. Plx

Dremel:

Dremel is an interactive, fast, SQL-based query engine for analyzing huge quantities of protocol
buffer data stored in log files, ColumnIO files, Bigtable, and many other sources. Internally, Dremel
is used to query all kinds of data at Google, including logs and financial data. Dremel is
externalized as part of Google's cloud offering under the name BigQuery.

Dremel offers a variety of interfaces, including the command-line shell, web UIs such as #plx UI,
Contour, and PowerDrill, and client libraries for C++, Java, Python, and R.

To create reports and dashboards for data analysis

To create scheduled pipelines that clean and transform data

Dremel provides two SQL dialects for writing queries: GoogleSQL and Dremel SQL. Dremel SQL
has been in use at Google since Dremel started. GoogleSQL is a new dialect. If you're new to
Dremel, use GoogleSQL.

GoogleSQL Features

Language is compliant with the SQL 2011 Standard

    SELECT DISTINCT
    Non-equality JOIN conditions (also known as theta joins)
    WITH clause
    Correlated subqueries
    EXISTS predicate
    UNION ALL is supported (a comma in the FROM clause no longer means
    UNION ALL; it means CROSS JOIN)

Exact and scalable distinct aggregates

    SUM(DISTINCT) and COUNT(DISTINCT)

Better query optimization

    Filter pushdown through JOIN and other operators
    Auto sharding (no more JOIN EACH/ALL or GROUP EACH)

Improved handling of structured data

    New STRUCT data type as a container for ordered, typed fields
    New ARRAY data type to represent repeated data
    Protocol Buffers are first-class data types, with the ability to operate on and
    return full proto messages
    Non-leaf fields can be referenced directly in your SQL statements

Shared across query engines at Google, including F1 and Spanner
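Several of the constructs listed above are standard SQL, so they can be tried in any SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not Dremel or GoogleSQL, and the users and events tables are hypothetical. It shows a WITH clause, an EXISTS predicate with a correlated subquery, COUNT(DISTINCT), and a comma in the FROM clause acting as a CROSS JOIN.

```python
import sqlite3

# SQLite is only a stand-in engine here; the tables are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, country TEXT);
    CREATE TABLE events (user_id INTEGER, action TEXT);
    INSERT INTO users VALUES (1, 'DE'), (2, 'US'), (3, 'US');
    INSERT INTO events VALUES (1, 'click'), (1, 'view'), (2, 'click'), (2, 'click');
""")

# WITH clause, EXISTS with a correlated subquery, and COUNT(DISTINCT).
active_by_country = conn.execute("""
    WITH active AS (
        SELECT u.user_id, u.country
        FROM users AS u
        WHERE EXISTS (SELECT 1 FROM events AS e WHERE e.user_id = u.user_id)
    )
    SELECT country, COUNT(DISTINCT user_id) AS active_users
    FROM active
    GROUP BY country
    ORDER BY country
""").fetchall()

# A comma in FROM is a CROSS JOIN: every user row paired with every event row.
pair_count = conn.execute("SELECT COUNT(*) FROM users, events").fetchone()[0]

print(active_by_country)
print(pair_count)
```

User 3 has no events, so the EXISTS predicate filters that row out before the distinct count is taken.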

Dremel SQL Features

Allows column aliases which do not match the underlying record structure.

Uses a comma as a UNION operator for more terse queries over logs data.

Has some difficulties handling independently repeating fields.

Implicitly flattens results when using:

    ORDER BY
    GROUP BY
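To make "flattening" concrete, here is a simplified model in Python. It is a sketch of the general idea only, not Dremel's actual semantics, and the flatten helper and the sample record are hypothetical. Flattening one repeated field yields one row per value; flattening two independently repeating fields yields their cross product, which hints at why such fields are awkward to handle.

```python
from itertools import product

def flatten(record, repeated_fields):
    """Expand one nested record into flat rows, one per combination of repeated values."""
    scalars = {k: v for k, v in record.items() if k not in repeated_fields}
    # An empty or missing repeated field contributes a single None,
    # so the record itself is not dropped.
    value_lists = [record.get(f) or [None] for f in repeated_fields]
    return [
        {**scalars, **dict(zip(repeated_fields, combo))}
        for combo in product(*value_lists)
    ]

# Hypothetical log record with two independently repeating fields.
record = {"query_id": 7, "tables": ["a", "b"], "errors": ["timeout", "quota"]}

rows_one = flatten(record, ["tables"])             # one row per table
rows_both = flatten(record, ["tables", "errors"])  # cross product of both fields

print(len(rows_one), len(rows_both))
```

In this model the row count multiplies with every additional repeated field that is flattened, so aggregates computed over the flattened rows can count the same record more than once.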

Advanced users may wish to access Dremel programmatically through various client libraries
instead of using the Dremel client.

C++

Python

Java

DremelR

Borg:
An architecture for scheduling and managing applications across all Google datacenters. A master
server (BorgMaster) and its replicas control each Borg "cell" (typically a cluster) in a datacenter.
Within a cell, the master schedules applications onto machines that have the appropriate resources
available. Borg also monitors application and machine health and restarts them in case of failure.

Borg is a key component of Google's cluster management system. It controls the distribution of
jobs within a machine cluster, assigning jobs to machines in a way that satisfies constraints and
requirements (e.g., memory requirements), and reassigning jobs to other machines as
necessary (e.g., when a machine fails). Borg was designed to perform such activities on a
massive scale.

Borg significantly reduces the amount of management overhead required to keep clusters up
and running. This is a huge issue when considering the scale at which Google operates. Also,
by establishing a common pool of machines controlled by a master scheduler, there are greater
opportunities for resource sharing between jobs, and resource utilization problems are easier to
identify.
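The assignment and reassignment behavior described above can be illustrated with a toy model. The sketch below is a first-fit placement loop with memory as the only constraint; it is an assumption-laden illustration, not Borg's actual scheduling algorithm, and Machine, place, and handle_failure are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    free_mem_gb: int
    healthy: bool = True
    jobs: dict = field(default_factory=dict)  # job name -> reserved memory in GB

def place(job, mem_gb, machines):
    """First fit: put the job on the first healthy machine with enough free memory."""
    for m in machines:
        if m.healthy and m.free_mem_gb >= mem_gb:
            m.free_mem_gb -= mem_gb
            m.jobs[job] = mem_gb
            return m.name
    return None  # no machine satisfies the constraint; the job stays pending

def handle_failure(failed, machines):
    """Mark a machine as failed and try to re-place each of its jobs elsewhere."""
    failed.healthy = False
    evicted, failed.jobs = failed.jobs, {}
    return {job: place(job, mem, machines) for job, mem in evicted.items()}

# A tiny hypothetical cell with three machines.
cell = [Machine("m1", 8), Machine("m2", 16), Machine("m3", 16)]
first = place("web", 6, cell)
second = place("index", 12, cell)
moved = handle_failure(cell[0], cell)

print(first, second, moved)
```

Because all machines sit in one shared pool, the job evicted from the failed machine can land on any other machine that still has room, which is the resource-sharing benefit the paragraph above describes.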

A physical datacenter facility houses one or more clusters of machines sharing administrative
resources (a lock server namespace, security services, machine repair services, etc.). Each
such cluster typically also corresponds to one Borg cell. (They also often share the same
two-letter name.) A Borg cell contains a Borgmaster (and its replicas) and a large number of
slave machines, each of which runs a Borglet daemon process. One cell may contain anywhere
from a few hundred to tens of thousands of machines.

We manage Borg jobs and monitor them from Sigma.

All Bigtable servers run on Borg.

Plx:
#plx is a complete big data analysis and visualization platform. You can search our data catalog for
relevant data, run interactive queries on billion-row datasets, visualize the results in configurable
dashboards, and create data pipelines to automatically import and process your data. The goal of
#plx is to unify and simplify access to your data, whether it be in Dremel, Tenzing, F1, or some
other system.

#plx surfaces four objects:

#plx Tables are the data sources

#plx Scripts analyze #plx Tables and can be used to create new tables

#plx Workflows schedule the automated execution of #plx Scripts and other processing
required to create #plx Tables

#plx Dashboards visualize #plx Tables and the results of #plx Scripts. You can access these
through the #plx site or the Dremel or Tenzing command-line tools or programming APIs.

You can also arrange to run your SQL scripts automatically on a schedule.
