
Big Data Hadoop and Spark Developer

HBase Support

© Simplilearn. All rights reserved.


Learning Objectives

Explain the meaning and need of NoSQL

Understand the HBase Data Model

Describe the HBase Architecture

Discuss Region Server Components


HBase Support
Topic 1: Meaning of NoSQL
Need for NoSQL: Why NoSQL?

• Relational databases such as Microsoft SQL Server, Oracle, etc. have long been the mainstay of businesses.
• These RDBMSs have handled relational, transactional, structured data efficiently.
• However, the advent of web-based applications and social networks has made much of the data unstructured.
• This has changed the landscape of data management.
• Hence, RDBMSs are no longer a natural fit, because the data is no longer confined to a fixed structure.
Need for NoSQL: What is NoSQL?

• NoSQL stands for "Not Only SQL."
• The term NoSQL was introduced by Carlo Strozzi in 1998 to name his file-based database.
• It was reintroduced in 2009 by Eric Evans, at an event organized to discuss open-source distributed databases.
Need for NoSQL: What is NoSQL?

• NoSQL databases do not use the relational model.
• They run well on clusters.
• Many of them are open source.
• They are schema-less.
HBase Support
Topic 2: HBase Data Model
HBase Data Model: A “Simple Sorted” Map

{
    "1" : "x",
    "aaaaa" : "y",
    "aaaab" : "world",
    "xyz" : "hello",
    "zzzzz" : "woot"
}
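HBase keeps row keys in sorted order and compares them as raw bytes. A minimal Python sketch, assuming UTF-8 keys and using a hypothetical helper `sorted_map` (not an HBase API), orders the example keys the same way:

```python
# Hypothetical illustration: model the "simple sorted map" with a plain dict
# and sort the keys the way HBase orders row keys (byte-wise, lexicographically).
def sorted_map(data: dict[str, str]) -> list[tuple[str, str]]:
    """Return (key, value) pairs ordered by the UTF-8 bytes of each key."""
    return sorted(data.items(), key=lambda kv: kv[0].encode("utf-8"))

table = {
    "xyz": "hello",
    "zzzzz": "woot",
    "1": "x",
    "aaaab": "world",
    "aaaaa": "y",
}

for key, value in sorted_map(table):
    print(key, value)  # 1 x / aaaaa y / aaaab world / xyz hello / zzzzz woot
```

Because the comparison is byte-wise, the key "10" sorts before "9", which is why numeric row keys are often zero-padded or stored as fixed-width binary values.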
HBase Data Model: Row and Column Families

{
    "1" : {
        "A" : "x"
    },
    "aaaaa" : {
        "A" : "y"
    },
    "aaaab" : {
        "A" : "world"
    },
    "xyz" : {
        "A" : "hello"
    }
}
• The top-level key/map pair is called a row.
• "A" is called a column family.
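The example above is a nested map: the row key selects a row, and the column family selects a value inside it. A minimal Python sketch of that two-level lookup, with a hypothetical `get_family` helper (not an HBase API):

```python
# Hypothetical sketch: each row key maps to a dict of column families.
# Only column family "A" is used, mirroring the example above.
rows: dict[str, dict[str, str]] = {
    "1": {"A": "x"},
    "aaaaa": {"A": "y"},
    "aaaab": {"A": "world"},
    "xyz": {"A": "hello"},
}

def get_family(rows: dict[str, dict[str, str]], row_key: str, family: str):
    """Return the value stored under `family` for `row_key`, or None if absent."""
    return rows.get(row_key, {}).get(family)

print(get_family(rows, "xyz", "A"))  # hello
print(get_family(rows, "xyz", "B"))  # None (no family "B" in this sketch)
```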
HBase Data Model: Columns and Column Families

{
    "1" : {
        "A" : {
            "foo" : "x"
        }
    },
    "aaaaa" : {
        "A" : {
            "foo" : "y"
        }
    },
    "aaaab" : {
        "A" : {
            "foo" : "world"
        }
    },
    "xyz" : {
        "A" : {
            "foo" : "hello"
        }
    }
}
• "foo" is a column of family "A", addressed as A:foo.
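HBase names a column by its family and qualifier, written family:qualifier (here A:foo). The sketch below adds the qualifier as a third level of the nested map; the `get_cell` helper is hypothetical, not an HBase API:

```python
# Hypothetical sketch: row key -> column family -> column qualifier -> value.
rows = {
    "1": {"A": {"foo": "x"}},
    "aaaaa": {"A": {"foo": "y"}},
    "aaaab": {"A": {"foo": "world"}},
    "xyz": {"A": {"foo": "hello"}},
}

def get_cell(rows: dict, row_key: str, column: str):
    """Look up a cell by row key and a "family:qualifier" column name."""
    family, _, qualifier = column.partition(":")
    return rows.get(row_key, {}).get(family, {}).get(qualifier)

print(get_cell(rows, "aaaaa", "A:foo"))  # y
```

Column families are fixed in the table schema, but qualifiers are not: any row can gain a new qualifier when it is written.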
HBase Data Model: Logical Representation

• Data that is accessed together is stored together.
• The RowKey is the primary index.
• Column families group related data for each row key.
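Because rows are kept sorted by row key, the row key works as the primary index: a contiguous range of rows can be located with two binary searches. A minimal sketch, assuming ASCII keys and a hypothetical `scan` helper whose inclusive start row and exclusive stop row mirror the default behavior of an HBase scan:

```python
import bisect

# Hypothetical sketch: rows kept sorted by row key. Python compares str by code
# point, which matches HBase's byte order for the ASCII keys used here.
row_keys = sorted(["xyz", "1", "zzzzz", "aaaab", "aaaaa"])

def scan(keys: list[str], start_row: str, stop_row: str) -> list[str]:
    """Return the keys in [start_row, stop_row): start inclusive, stop exclusive."""
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    return keys[lo:hi]

print(scan(row_keys, "aaaaa", "xyz"))  # ['aaaaa', 'aaaab']
```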
HBase Data Model: Physical Representation
HBase Data Model: Logical Vs. Physical Representation
HBase Data Model: Version Concepts
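Physically, HBase stores data as individual cells: each cell's key combines the row key, column family, column qualifier, and a timestamp that acts as the version, and a read returns the newest version unless more are requested. A minimal Python sketch of versioned cells, with hypothetical `put`/`get_versions` helpers and integer timestamps assumed:

```python
from collections import defaultdict

# Hypothetical sketch: each cell is keyed by (row, family, qualifier) and keeps
# several timestamped versions, newest first.
cells: dict[tuple[str, str, str], list[tuple[int, str]]] = defaultdict(list)

def put(row: str, family: str, qualifier: str, value: str, ts: int) -> None:
    """Store a new version of a cell; versions stay sorted newest first."""
    versions = cells[(row, family, qualifier)]
    versions.append((ts, value))
    versions.sort(key=lambda v: v[0], reverse=True)

def get_versions(row: str, family: str, qualifier: str, max_versions: int = 1):
    """Return up to `max_versions` (timestamp, value) pairs, newest first."""
    return cells.get((row, family, qualifier), [])[:max_versions]

put("xyz", "A", "foo", "hello", ts=100)
put("xyz", "A", "foo", "hi", ts=200)

print(get_versions("xyz", "A", "foo"))                  # [(200, 'hi')]
print(get_versions("xyz", "A", "foo", max_versions=2))  # [(200, 'hi'), (100, 'hello')]
```

Real HBase also limits how many versions a column family retains (its VERSIONS setting); this sketch keeps every version.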
HBase Support
Topic 3: HBase Architecture
HBase Architecture

• HBase has three major components:


• Client
• HBase Masters
• Region Servers
HBase Architecture

• Region Servers can be added or removed as required.
• When accessing data, clients communicate with the Region Servers directly.
• Region assignment and DDL operations (creating, altering, and deleting tables) are handled by the HBase Master server process.
• ZooKeeper, a separate coordination service used by HBase (it is not part of HDFS), maintains the live cluster state.
HBase Architecture: Regions and Region Servers

• HBase tables are divided horizontally by row key range into "regions."
• A region contains all rows in the table from the region's start key (inclusive) up to its end key (exclusive).
• Regions are assigned to nodes in the cluster, which run on commodity machines, and are managed by Region Server daemons. These daemons serve data reads and writes.
• A region server can serve about 1,000 regions.
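Because each region covers a contiguous row key range, finding the region for a row key amounts to a binary search over the regions' start keys. A minimal sketch, assuming three hypothetical regions with start keys "", "g" and "p" (the empty start key marks the table's first region) and hypothetical server names:

```python
import bisect

# Hypothetical regions of one table: each region starts at its start key
# (inclusive) and ends where the next region starts (exclusive).
region_start_keys = ["", "g", "p"]
region_servers = ["rs1", "rs2", "rs3"]  # hypothetical server hosting each region

def find_region(row_key: str) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1

for key in ["apple", "grape", "zebra"]:
    i = find_region(key)
    print(key, "-> region", i, "on", region_servers[i])
```

Using `bisect_right` makes a row key equal to a start key (for example "g") land in the region that starts there, matching the inclusive start key.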
HBase Architecture: HBase Master

• Region assignment and DDL operations are handled by the HBase Master.
• The Master is also responsible for coordinating the region servers, for example assigning regions at startup and reassigning regions for recovery or load balancing.
• It also performs admin functions such as creating, updating, and deleting tables.
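To make the Master's assignment role concrete, here is a deliberately simple round-robin sketch. It is not HBase's actual load balancer, and the function and server names are hypothetical:

```python
# Hypothetical sketch of region assignment using plain round-robin.
def assign(regions: list[str], servers: list[str]) -> dict[str, str]:
    """Spread regions across servers in round-robin order (needs at least one server)."""
    return {region: servers[i % len(servers)] for i, region in enumerate(regions)}

def reassign_after_failure(assignment: dict[str, str], failed: str,
                           live_servers: list[str]) -> dict[str, str]:
    """Keep healthy assignments and move the failed server's regions to live servers."""
    orphaned = [r for r, s in assignment.items() if s == failed]
    result = {r: s for r, s in assignment.items() if s != failed}
    result.update(assign(orphaned, live_servers))
    return result

plan = assign(["r1", "r2", "r3", "r4"], ["rs1", "rs2"])
print(plan)  # {'r1': 'rs1', 'r2': 'rs2', 'r3': 'rs1', 'r4': 'rs2'}
print(reassign_after_failure(plan, "rs2", ["rs1", "rs3"]))
```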
HBase Architecture: ZooKeeper

• ZooKeeper is a distributed coordination service that maintains server state in the cluster.
• It tracks which servers are alive and available, and provides server failure notification.
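HBase relies on ZooKeeper sessions for this: each region server keeps its session alive with heartbeats, and if the session expires, the server's ephemeral node disappears and watchers are notified. A minimal simulation of that idea, using a hypothetical `LivenessTracker` class with explicit timestamps instead of a real clock; it is not the ZooKeeper API:

```python
from typing import Callable

# Hypothetical sketch of session-based liveness tracking: a server whose last
# heartbeat is older than the session timeout is considered dead, and every
# registered listener receives a failure notification.
class LivenessTracker:
    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self.last_heartbeat: dict[str, float] = {}
        self.listeners: list[Callable[[str], None]] = []

    def heartbeat(self, server: str, now: float) -> None:
        self.last_heartbeat[server] = now

    def on_failure(self, callback: Callable[[str], None]) -> None:
        self.listeners.append(callback)

    def check(self, now: float) -> list[str]:
        """Expire timed-out servers, notify listeners, and return live servers."""
        for server, seen in list(self.last_heartbeat.items()):
            if now - seen > self.session_timeout:
                del self.last_heartbeat[server]
                for callback in self.listeners:
                    callback(server)
        return sorted(self.last_heartbeat)

tracker = LivenessTracker(session_timeout=30.0)
failed: list[str] = []
tracker.on_failure(failed.append)
tracker.heartbeat("rs1", now=0.0)
tracker.heartbeat("rs2", now=0.0)
tracker.heartbeat("rs1", now=25.0)
print(tracker.check(now=40.0))  # ['rs1']
print(failed)                   # ['rs2']
```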
HBase Architecture: Read and Write

• The META table holds the location of regions in the cluster, mapping row key ranges to the Region Servers that host them.
• ZooKeeper stores the location of the META table.
• The client gets from ZooKeeper the Region Server that hosts the META table.
• It then queries the META table to find the Region Server that serves the row key it wants to access.
• It caches this information along with the META table location.
• The client then reads (or writes) the data directly on that Region Server.
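The lookup steps above can be sketched with in-memory stand-ins for ZooKeeper and the META table. Everything here (`ZOOKEEPER`, `META`, `query_meta`, `Client`, the server names) is hypothetical; the point is that the client consults META only on a cache miss:

```python
import bisect
from typing import Optional

# Hypothetical stand-ins: ZooKeeper knows which server hosts META; META lists
# regions as (start key, end key, server), where an empty end key means "no upper bound".
ZOOKEEPER = {"meta_server": "rs0"}
META = [("", "g", "rs1"), ("g", "p", "rs2"), ("p", "", "rs3")]

def query_meta(row_key: str) -> tuple[str, str, str]:
    """Simulate a META lookup: return the region entry whose range contains row_key."""
    starts = [start for start, _, _ in META]
    return META[bisect.bisect_right(starts, row_key) - 1]

class Client:
    def __init__(self) -> None:
        self.meta_server: Optional[str] = None               # cached META location
        self.region_cache: list[tuple[str, str, str]] = []   # cached region entries
        self.meta_queries = 0                                 # META lookups performed

    def locate(self, row_key: str) -> str:
        """Return the region server for row_key, consulting META only on a cache miss."""
        for start, end, server in self.region_cache:
            if start <= row_key and (end == "" or row_key < end):
                return server                                 # cache hit
        if self.meta_server is None:
            self.meta_server = ZOOKEEPER["meta_server"]       # step 1: ask ZooKeeper
        entry = query_meta(row_key)                           # step 2: query META
        self.meta_queries += 1
        self.region_cache.append(entry)                       # step 3: cache the location
        return entry[2]                                       # step 4: read/write here

client = Client()
print(client.locate("apple"), client.locate("banana"), client.locate("zebra"))  # rs1 rs1 rs3
print(client.meta_queries)  # 2
```

A real client must also drop a cached entry when a region moves; this sketch never invalidates its cache.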
HBase Support
Topic 4: Region Server Components
Region Server Components
HBase Writes
MemStore
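In HBase, a write is first appended to the write-ahead log (WAL) and then added to the MemStore, an in-memory sorted buffer; when the MemStore grows large enough, it is flushed to disk as a new immutable HFile. A minimal Python sketch of that path for a single store, assuming a hypothetical `Store` class that flushes after a fixed number of entries (real HBase flushes by size in bytes and keeps the MemStore sorted at all times, while this sketch sorts only at flush time):

```python
from typing import Optional

# Hypothetical sketch of the write path for one column family's store:
# append to a WAL, buffer in a MemStore, flush to a sorted immutable "HFile".
class Store:
    def __init__(self, flush_threshold: int) -> None:
        self.flush_threshold = flush_threshold
        self.wal: list[tuple[str, str]] = []           # append-only log for recovery
        self.memstore: dict[str, str] = {}             # in-memory buffer of recent writes
        self.hfiles: list[list[tuple[str, str]]] = []  # flushed, sorted, immutable files

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))  # 1. log the write first
        self.memstore[row_key] = value     # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as one sorted file and clear it."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row_key: str) -> Optional[str]:
        """Check the MemStore first, then HFiles from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None

store = Store(flush_threshold=2)
store.put("xyz", "hello")
store.put("aaaaa", "y")   # MemStore full: flushed to the first HFile
store.put("xyz", "hi")    # newer value stays in the MemStore
print(store.hfiles)       # [[('aaaaa', 'y'), ('xyz', 'hello')]]
print(store.get("xyz"))   # hi
```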
This concludes the lesson “HBase Support.”
