For Architects
Nick Dimiduk, Member of Technical Staff, HBase
Strata/Hadoop World, 2013-10-29
Agenda
- Background (how did we get here?)
- TL;DR (don't waste my time!)
- High-level Architecture (where are we?)
- Anatomy of a RegionServer (how does this thing work?)
- By Example (how do I use it?)
- Resources (where do we go from here?)
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Background
Key Features:
- Distributed storage across a cluster of machines
- Random, online read and write data access
- Schemaless data model (NoSQL)
- Self-managed data partitions
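To make "schemaless" and "random read and write access" concrete, here is a minimal Python sketch of the logical data model as nested maps (row to column to timestamped versions). It is an illustration only: `TinyTable`, its methods, and the sample rows are hypothetical names chosen for this example, not HBase APIs.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a schemaless table: row -> column -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # No schema: any row may carry any set of columns.
        self.rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, column, value) for start_row <= row < stop_row, in row order."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                for column in sorted(self.rows[row]):
                    yield row, column, self.get(row, column)


# Hypothetical usage: two rows with different columns, one cell with two versions.
t = TinyTable()
t.put("user1", "cf1:name", "Ann", ts=1)
t.put("user1", "cf1:name", "Anna", ts=2)
t.put("user2", "cf1:age", 30, ts=1)
```

Rows are kept in sorted key order here because range scans over row keys are what make the partitioning into regions, covered later in the deck, workable.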
TL;DR
[Figure: LSM-tree with a memory component and a disk component]
Figure 2.1 reproduced from O'Neil, Patrick, et al. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33.4 (1996): 351-385.
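The idea behind the LSM-tree figure can be sketched in a few lines of Python. This is a toy under stated assumptions, not HBase's implementation: writes go to a mutable in-memory component (C0), which is flushed to immutable sorted runs standing in for the on-disk component (C1); reads check C0 first, then the runs from newest to oldest. In HBase terms these correspond roughly to the MemStore and HFiles. `TinyLSM` and `flush_threshold` are hypothetical names for this example.

```python
import bisect


class TinyLSM:
    """Toy two-level LSM-tree: a mutable in-memory C0 and immutable sorted C1 runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical knob: max entries in C0
        self.c0 = {}        # in-memory component (write buffer)
        self.c1_runs = []   # stand-in for disk: sorted (key, value) lists, oldest first

    def put(self, key, value):
        self.c0[key] = value
        if len(self.c0) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write C0 out as one sorted, immutable run and start a fresh C0."""
        if self.c0:
            self.c1_runs.append(sorted(self.c0.items()))
            self.c0 = {}

    def get(self, key):
        # Newest data wins: check C0 first, then runs from newest to oldest.
        if key in self.c0:
            return self.c0[key]
        for run in reversed(self.c1_runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        merged = {}
        for run in self.c1_runs:      # oldest first, so later runs overwrite
            merged.update(run)
        self.c1_runs = [sorted(merged.items())] if merged else []


# Hypothetical usage: a threshold of 2 forces a flush on every second write.
db = TinyLSM(flush_threshold=2)
db.put("a", 1)
db.put("b", 2)    # C0 is full, so it is flushed to the first run
db.put("a", 10)   # newer version of "a" lives in C0 for now
db.put("c", 3)    # second flush
```

The design trade-off this shows: writes never modify existing runs, so they stay sequential and cheap, while reads may have to consult several runs until a compaction merges them.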
So what is HBase anyway?

[Figure: progressive build of the LSM-tree diagram. Each RegionServer keeps a C0 tree in memory and a C1 tree on disk, plus a cache; each RegionServer runs on the same host as a DataNode.]

Figure 3.6 A table consists of multiple smaller chunks called regions.

... store/access data on HDFS. The master process does the distribution of regions among RegionServers, and each RegionServer typically hosts multiple regions. Given that the underlying data is stored in HDFS, which is available to all clients as a single namespace, all RegionServers have access to the same persisted files in the file system and can therefore host any region (figure 3.8). By physically collocating DataNodes and RegionServers, you can use the data locality property; that is, RegionServers can theoretically read and write to the local DataNode as the primary DataNode.

You may wonder where the TaskTrackers are in this scheme of things. In some HBase deployments, the MapReduce framework isn't deployed at all if the workload is primarily random reads and writes. In other deployments, where the MapReduce processing is also a part of the workloads, TaskTrackers, DataNodes, and HBase RegionServers can run together.

Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

[Figure residue, sample cell labels as extracted: "bar" 1368394261 "hello"; cf1 1368394583 22; "foo" 1368394925 13.6; 1368394583 7; a]
[Figure: LSM-tree labels repeated for each DataNode/RegionServer pair: Memory (C0 tree), Disk (C1 tree), Cache]
Page 12
High-level Architecture
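Page 13
rowkey  family  qualifier     version     value
a       cf1     "bar"         1368394583  7
                              1368394261  "hello"
                "foo"         1368394583  22
                              1368394925  13.6
                              1368393847  "world"
        cf2     "2011-07-04"  1368396302  "fourth of July"
                1.0001        1368387684  "almost the loneliest number"
Rows: a
Column Families: cf1, cf2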
Page 15
User API
! {rowkey => {family => {qualifier => {version => value}}}}
! Think: nested SortedDictionary (C#), TreeMap (Java)
! Basic data operations: GET, PUT, DELETE
! SCAN over a range of key-values
! a benefit of keeping rowkeys sorted
! this is how you implement any kind of "complex query"
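The nested-map view of the User API can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `SketchTable` and its methods are hypothetical names, not the HBase client API, ordering is applied with `sorted()` at scan time instead of a real sorted map, and `delete` drops a whole row, which is simpler than what HBase does. Rowkeys "b" and "c" and their values are made up for illustration.

```python
class SketchTable:
    """Toy model of {rowkey => {family => {qualifier => {version => value}}}}."""

    def __init__(self):
        # Plain nested dicts; rowkey ordering is applied when scanning.
        self._rows = {}

    def put(self, rowkey, family, qualifier, version, value):
        cells = (self._rows.setdefault(rowkey, {})
                           .setdefault(family, {})
                           .setdefault(qualifier, {}))
        cells[version] = value

    def get(self, rowkey, family, qualifier):
        """Return the value with the highest version, or None if absent."""
        versions = self._rows.get(rowkey, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None
        return versions[max(versions)]

    def delete(self, rowkey):
        # Simplification: remove the whole row at once.
        self._rows.pop(rowkey, None)

    def scan(self, start, stop):
        """Yield (rowkey, row) in sorted order for start <= rowkey < stop."""
        for rowkey in sorted(self._rows):
            if start <= rowkey < stop:
                yield rowkey, self._rows[rowkey]


table = SketchTable()
table.put("a", "cf1", "bar", 1368394261, "hello")
table.put("a", "cf1", "bar", 1368394583, 7)
table.put("b", "cf1", "foo", 1, "made-up value")   # hypothetical row
table.put("c", "cf1", "foo", 1, "made-up value")   # hypothetical row

latest = table.get("a", "cf1", "bar")                      # newest version wins
scanned = [rowkey for rowkey, _ in table.scan("a", "c")]   # range scan, stop excluded
```

Because rows come back in rowkey order, a range scan over a well-chosen rowkey prefix is the building block for the "complex query" patterns the slide mentions.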
Logical Architecture
Table A
! rowkeys: a b c d e f g h i j k l m n o p
! split into Region 1, Region 2, Region 3, Region 4
Region Server 7
! Table A, Region 1
! Table A, Region 2
! Table G, Region 1070
! Table L, Region 25
Region Server 86
! Table A, Region 3
! Table C, Region 30
! Table F, Region 160
! Table F, Region 776
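One way to picture how a rowkey maps to a region and then to a Region Server is a binary search over region start keys. The sketch below is an assumption-laden illustration, not HBase's lookup code: the split points ("e", "i", "m") are hypothetical, chosen to divide the sixteen keys on the slide evenly, and the host of Region 4 is not shown on the slide, so it is left as a placeholder.

```python
import bisect

# Hypothetical split points: each region covers [start key, next start key).
REGION_START_KEYS = ["", "e", "i", "m"]
REGION_NAMES = ["Region 1", "Region 2", "Region 3", "Region 4"]

# Regions 1-3 follow the slide; Region 4's server is not shown there.
REGION_TO_SERVER = {
    "Region 1": "Region Server 7",
    "Region 2": "Region Server 7",
    "Region 3": "Region Server 86",
    "Region 4": "(not shown)",
}


def locate(rowkey):
    """Return (region, server) for the region whose key range contains rowkey."""
    index = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    region = REGION_NAMES[index]
    return region, REGION_TO_SERVER[region]


first = locate("a")
boundary = locate("e")   # a start key belongs to the region it opens
middle = locate("k")
```

Since the regions of a table are contiguous, sorted rowkey ranges, finding the right region is a search over start keys and never needs to touch the data itself.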
Page 17
Physical Architecture
[Figure: Java Apps, the HBase Shell, and a REST/Thrift Gateway, each with an HBase Client; three ZooKeeper nodes; an HBase Master; a Name Node; Region Servers, each paired with a Data Node]
Page 18
Anatomy of a RegionServer
Page 19
[Figure: a RegionServer collocated with a DataNode, showing Memory (C0 tree), Disk (C1 tree), and Cache]
Page 20
Storage Machinery
[Figure: a RegionServer (HBase) contains a BlockCache, an HLog (WAL), and HRegions; each HRegion contains HStores; each HStore has a MemStore and StoreFiles; StoreFiles are backed by HFiles]
Page 21
Storage Machinery
[Figure: the same RegionServer hierarchy annotated with LSM-tree roles: BlockCache as the Cache, each MemStore as C0, the HFiles as C1]
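The C0/C1 mapping can be sketched as a toy store. Everything here is a simplification under stated assumptions: `SketchStore` is a hypothetical class, not HBase code; the lists and dicts merely stand in for the HLog (WAL), MemStore, HFiles, and BlockCache; the cache holds single values where the real BlockCache holds blocks; and compactions are left out.

```python
class SketchStore:
    """Toy store: an in-memory C0 that flushes to immutable sorted C1 files."""

    def __init__(self, flush_threshold=2):
        self.wal = []            # stands in for the HLog (WAL)
        self.memstore = {}       # stands in for the MemStore (C0)
        self.hfiles = []         # stands in for HFiles (C1), newest last
        self.block_cache = {}    # stands in for the BlockCache
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))      # log the edit first, for recovery
        self.memstore[key] = value         # then apply it in memory
        self.block_cache.pop(key, None)    # drop any stale cached value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as one sorted, immutable file and clear it."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore, then the cache, then files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        if key in self.block_cache:
            return self.block_cache[key]
        for hfile in reversed(self.hfiles):
            for file_key, value in hfile:
                if file_key == key:
                    self.block_cache[key] = value
                    return value
        return None


store = SketchStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # second put reaches the threshold and triggers a flush
store.put("a", 3)   # newer value for "a" lives only in the MemStore
newest_a = store.get("a")
flushed_b = store.get("b")
```

Writes only ever append to the log and update memory, which is why the write path is cheap; reads pay for that by consulting memory first and then the on-disk files, with the cache absorbing repeated lookups.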
Page 22
[Figure: storage hierarchy with step 3 marked near the HFiles]
Page 23
[Figure: the storage hierarchy (HLog (WAL), BlockCache, HRegion, HStore, MemStore, StoreFile) annotated with numbered steps 2, 3, and 4]
Page 24
By Example
Page 25
Database Dichotomy
[Chart: tuning options arranged by Write vs. Read and Latency vs. Throughput]
Latency: Lossy WAL Durability, Compression, Smart Rowkey Design, Smaller Block Size, Smart Rowkey Design, Compression, Row+Column Bloomfilter, Larger BlockCache
Throughput: Compression, Smart Rowkey Design, Less Frequent Compactions, Larger Block Size, Increased Scanner Caching
Page 26
Web-scale Database
[Figure: several App servers; data labels: Counters, Sessions, User profiles, Social Media, Application Data; chart axes: Write vs. Read, Latency vs. Throughput]
Page 27
BigIndex
[Figure: Search and App servers around a "BigIndex" document store; chart axes: Write vs. Read, Latency vs. Throughput]
Page 28
Materialized View
[Figure: several App servers; chart axes: Write vs. Read, Latency vs. Throughput]
Page 29
ETL Assist
[Chart axes: Write vs. Read, Latency vs. Throughput]
Page 30
Lambda Architecture
[Figure: several App servers]
Page 31
Resources
Page 32
! In person
! HBaseCon, hbasecon.com
! Hadoop Summit, hadoopsummit.org
! Local meetup near you
Page 33
HBase 0.96.x
! Stability
! Horizontal Scalability (1000s of machines)
! Speed of Recovery (MTTR)
! Operations
! Future-proofing RPC (protobuf)
! Improved Multi-tenancy
! Hadoop1, Hadoop2
! Plenty more enhancements, new features:
! https://blogs.apache.org/hbase/entry/hbase_0_96_0_released
! *Backward incompatible*
Page 34
Thanks!
Nick Dimiduk
! github.com/ndimiduk
! @xefyr
! n10k.com
! HBase in Action, by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack (Manning), hbaseinaction.com
Page 35