1. Dremel
2. Borg
Dremel:
Dremel is an interactive, fast, SQL-based query engine for analyzing huge quantities of protocol
buffer data stored in log files, ColumnIO files, Bigtable, and many other sources. Internally, Dremel is
used to query all kinds of data at Google, including logs and financial data. Dremel is externalized as
part of Google's cloud offering under the name BigQuery.
Dremel offers a variety of interfaces, including the command-line shell, web UIs such as
#plx UI, Contour, and PowerDrill, and client libraries for C++, Java, Python, and R.
To create reports and dashboards for data analysis
To create scheduled pipelines that clean and transform data
Dremel provides two SQL dialects for writing queries: GoogleSQL and DremelSQL. DremelSQL
has been in use at Google since Dremel started. GoogleSQL is a new dialect. If you're new to
Dremel, use GoogleSQL.
GoogleSQL Features
Language is compliant with the SQL 2011 Standard
  SELECT DISTINCT
  Non-equality JOIN conditions (also known as theta joins)
  WITH clause
  Correlated subqueries
  EXISTS predicate
  UNION ALL (a comma in the FROM clause no longer means UNION ALL; it means CROSS JOIN)
Exact and scalable distinct aggregates
  SUM(DISTINCT) and COUNT(DISTINCT)
Better query optimization
  Filter pushdown through JOIN and other operators
  Auto-sharding (no more JOIN EACH/ALL or GROUP EACH)
Improved handling of structured data
  New STRUCT data type as a container for ordered, typed fields
  New ARRAY data type to represent repeated data
  Protocol Buffers are first-class data types, with the ability to operate on and
  return full proto messages
  Non-leaf fields can be referenced directly in your SQL statements
Shared across query engines at Google, including F1 and Spanner
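
Several of the standard SQL constructs listed above can be tried in any engine that implements them. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not Dremel or GoogleSQL, and the tables users and events are made up for illustration. It exercises a WITH clause, a correlated EXISTS subquery, COUNT(DISTINCT), the comma in FROM acting as CROSS JOIN, and an explicit UNION ALL.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(id INTEGER, country TEXT);
    CREATE TABLE events(user_id INTEGER, kind TEXT);
    INSERT INTO users VALUES (1, 'DE'), (2, 'US'), (3, 'US');
    INSERT INTO events VALUES (1, 'click'), (1, 'view'), (2, 'click');
""")

# WITH clause plus a correlated EXISTS subquery:
# keep only users that have at least one event.
active = conn.execute("""
    WITH active_users AS (
        SELECT u.id FROM users AS u
        WHERE EXISTS (SELECT 1 FROM events AS e WHERE e.user_id = u.id)
    )
    SELECT id FROM active_users ORDER BY id
""").fetchall()

# Exact distinct aggregate.
distinct_countries = conn.execute(
    "SELECT COUNT(DISTINCT country) FROM users"
).fetchone()[0]

# A comma in FROM is a CROSS JOIN: 3 users x 3 events.
cross_rows = conn.execute("SELECT COUNT(*) FROM users, events").fetchone()[0]

# UNION ALL has to be written explicitly and keeps duplicate values.
union_rows = conn.execute(
    "SELECT id FROM users UNION ALL SELECT user_id FROM events"
).fetchall()

print(active, distinct_countries, cross_rows, len(union_rows))
```

The comma query returns the product of the two row counts, while the UNION ALL query returns their sum, which is the practical difference between the two meanings of the comma.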
DremelSQL Features
Allows column aliases which do not match the underlying record structure.
Uses a comma as a UNION operator for more terse queries over logs data.
Has some difficulties handling independently repeating fields.
Implicitly flattens results when using:
  ORDER BY
  GROUP BY
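
To make "flattening" concrete, here is a minimal Python sketch of one plausible reading: a record with a repeated field is expanded into one flat row per repeated value. The function flatten and the sample record are hypothetical and are not taken from Dremel; its exact flattening semantics are not described here. The sketch also hints at why independently repeating fields are awkward: flattening two of them at once multiplies the row count.

```python
from itertools import product

def flatten(record, repeated_fields):
    """Yield one flat row per combination of values of the repeated fields."""
    scalars = {k: v for k, v in record.items() if k not in repeated_fields}
    # An empty or missing repeated field contributes a single None,
    # so the record itself is not dropped.
    value_lists = [record.get(f) or [None] for f in repeated_fields]
    for combo in product(*value_lists):
        yield {**scalars, **dict(zip(repeated_fields, combo))}

# Hypothetical record with two independently repeating fields.
record = {"user": "a", "urls": ["x", "y"], "tags": ["t1", "t2", "t3"]}

rows_one = list(flatten(record, ["urls"]))          # one row per URL
rows_two = list(flatten(record, ["urls", "tags"]))  # cross product of both

print(len(rows_one), len(rows_two))
```

Flattening only urls gives two rows; flattening urls and tags together gives six, even though the record holds only five repeated values in total.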
Advanced users may wish to access Dremel programmatically through various client libraries
instead of using the Dremel client.
C++
Python
Java
DremelR
Borg:
An architecture for scheduling and managing applications across all Google data centers. A master
server (BorgMaster) and its replicas control each Borg "cell" (typically a cluster) in a data center.
Within a cell, the master schedules applications onto machines that have the appropriate resources
available. Borg also monitors application and machine health and restarts them in case of failure.
Borg is a key component of Google's cluster management system. It controls the distribution of
jobs within a machine cluster, assigning jobs to machines in a way that satisfies constraints and
requirements (e.g., memory requirements), and reassigning jobs to other machines as
necessary (e.g., when a machine fails). Borg was designed to perform such activities on a
massive scale.
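
The assign-and-reassign behaviour described above can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not Borg's actual algorithm: it uses a simple first-fit policy with memory as the only constraint, and the names Machine, schedule, and handle_failure are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    free_mem_gb: int
    healthy: bool = True
    jobs: dict = field(default_factory=dict)  # job name -> memory in GB

def schedule(machines, job, mem_gb):
    """First-fit: place the job on the first healthy machine with enough memory."""
    for m in machines:
        if m.healthy and m.free_mem_gb >= mem_gb:
            m.free_mem_gb -= mem_gb
            m.jobs[job] = mem_gb
            return m.name
    return None  # no machine satisfies the memory requirement

def handle_failure(machines, failed_name):
    """Mark a machine as failed and try to place its jobs on other machines."""
    failed = next(m for m in machines if m.name == failed_name)
    failed.healthy = False
    displaced, failed.jobs = failed.jobs, {}
    return {job: schedule(machines, job, mem) for job, mem in displaced.items()}

# A hypothetical two-machine cell.
cell = [Machine("m1", 8), Machine("m2", 32)]
first = schedule(cell, "web", 6)      # fits on m1
second = schedule(cell, "batch", 12)  # too large for what is left on m1, goes to m2
moved = handle_failure(cell, "m1")    # "web" is reassigned to m2

print(first, second, moved)
```

A real cluster scheduler weighs many more constraints and priorities; the point here is only the shape of the loop: match requirements to available resources, and rerun placement when a machine drops out.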
Borg significantly reduces the amount of management overhead required to keep clusters up
and running. This is a huge issue when considering the scale at which Google operates. Also,
by establishing a common pool of machines controlled by a master scheduler, there are greater
opportunities for resource sharing between jobs, and resource utilization problems are easier to
identify.
A physical data center facility houses one or more clusters of machines sharing administrative
resources (a lock server namespace, security services, machine repair services, etc.). Each
such cluster typically also corresponds to one Borg cell. (They also often share the same
two-letter name.) A Borg cell contains a Borgmaster (and its replicas) and a large number of
slave machines, each of which runs a Borglet daemon process. One cell may contain anywhere
from a few hundred to tens of thousands of machines.
We shall manage Borg jobs and watch them from Sigma.
All Bigtable servers run on Borg.
Plx:
#plx is a complete big data analysis and visualization platform. You can search our data catalog for
relevant data, run interactive queries on billion-row datasets, visualize the results in configurable
dashboards, and create data pipelines to automatically import and process your data. The goal of
#plx is to unify and simplify access to your data, whether it be in Dremel, Tenzing, F1, or some other
system.
#plx surfaces four objects:
#plx Tables are the data sources
#plx Scripts analyze #plx Tables and can be used to create new tables
#plx Workflows schedule the automated execution of #plx Scripts and other processing
required to create #plx Tables
#plx Dashboards visualize #plx Tables and the results of #plx Scripts
You can access these through the #plx site or the Dremel or Tenzing command-line tools or programming APIs.
You can also arrange to run your SQL scripts automatically on a schedule.