HBase Integration with Hive

Setup HBase Integration with Hive:


To set up HBase integration with Hive, a few jar files must be present in the
$HIVE_HOME/lib or $HBASE_HOME/lib directory. The required jar files are:
zookeeper-*.jar
hive-hbase-handler-*.jar
guava-*.jar
hbase-*.jar

Add the paths of these jar files to the value of hive.aux.jars.path in the hive-site.xml configuration file:

<property>
<name>hive.aux.jars.path</name>
<value>file:///home/training/apache-hive-0.13.1-bin/lib/hive-hbase-handler-0.13.1.jar,
file:///home/training/apache-hive-0.13.1-bin/lib/zookeeper-3.4.5.jar,
file:///home/training/apache-hive-0.13.1-bin/lib/guava-11.0.2.jar,
...
</value>
<description>A comma separated list (with no spaces) of the jar files required for Hive-HBase
integration</description>
</property>
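Because the value must be a comma-separated list of file:// URIs with no spaces, it can be easier to generate it than to type it by hand. The following is a minimal Python sketch of that idea; the helper name build_aux_jars_value is hypothetical and is not part of Hive, and the jar paths are the ones shown in the property above.

```python
from pathlib import PurePosixPath


def build_aux_jars_value(jar_paths):
    """Join absolute jar paths into a comma-separated list of file:// URIs.

    Hypothetical helper for illustration only; Hive does not ship it.
    """
    uris = []
    for raw in jar_paths:
        path = PurePosixPath(raw.strip())
        if not path.is_absolute():
            raise ValueError(f"expected an absolute path, got: {raw!r}")
        # "file://" + "/home/..." gives the "file:///home/..." form used above.
        uris.append("file://" + str(path))
    # No spaces or line breaks between entries.
    return ",".join(uris)


if __name__ == "__main__":
    lib = "/home/training/apache-hive-0.13.1-bin/lib"
    print(build_aux_jars_value([
        f"{lib}/hive-hbase-handler-0.13.1.jar",
        f"{lib}/zookeeper-3.4.5.jar",
        f"{lib}/guava-11.0.2.jar",
    ]))
```

The printed string can be pasted between the <value> tags. The remaining jars elided above would have to be added to the list in the same way.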

Verify HBase Integration with Hive: (Managing HBase tables using Hive)

Let's create a new HBase table from the Hive shell. To test HBase table creation, the
Hadoop and HBase daemons must be running:
start-all.sh
start-hbase.sh
Below is a sample DDL statement for creating an HBase-backed table. It creates the
hbase_table_emp table in Hive and the emp table in HBase. The Hive table has three
columns: id int, name string and role string. The name and role columns are mapped to
the name and role columns of the cf1 column family. The :key entry at the beginning of the
hbase.columns.mapping property maps the HBase row key to the first column (id int) of the
Hive table.

CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
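The entries of hbase.columns.mapping are matched to the Hive columns by position. The Python sketch below only illustrates that pairing for the statement above; pair_columns is a hypothetical helper and does not reproduce how the storage handler itself parses the property.

```python
def pair_columns(hive_columns, columns_mapping):
    """Pair each Hive column with its hbase.columns.mapping entry, by position.

    Illustration only: the row key is returned as ":key", every other entry
    as a (column family, column qualifier) tuple.
    """
    entries = columns_mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    pairs = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            pairs[column] = ":key"
        else:
            family, _, qualifier = entry.partition(":")
            pairs[column] = (family, qualifier)
    return pairs


if __name__ == "__main__":
    mapping = pair_columns(["id", "name", "role"], ":key,cf1:name,cf1:role")
    for hive_column, target in mapping.items():
        print(hive_column, "->", target)
```

Read this way, id carries the HBase row key, while name and role land in the cf1 column family.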

Let's verify the emp table in the HBase shell and view its metadata.
$ hbase shell
hbase> list
hbase> describe 'emp'

We cannot load data directly into the HBase table emp with Hive's LOAD DATA INPATH
command. Instead, we have to copy the data into it from another Hive table. Let's create a
test Hive table with the same schema as hbase_table_emp and load records into it with the
LOAD DATA LOCAL INPATH command.

hive> create table testemp(id int, name string, role string) row format delimited
fields terminated by '\t';
hive> load data local inpath '/home/siva/sample.txt' into table testemp;
hive> select * from testemp;
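The contents of sample.txt are not shown here. Since testemp is declared with tab-terminated fields, the file needs one record per line with id, name and role separated by tabs. The Python sketch below produces text in that layout; the helper format_rows and the three example records are invented purely for illustration.

```python
def format_rows(rows):
    """Render rows as tab-delimited text, one record per line.

    Hypothetical helper: fields containing a tab or newline are rejected,
    because a plain delimited Hive table would misread them.
    """
    lines = []
    for row in rows:
        fields = [str(field) for field in row]
        for field in fields:
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up records matching the (id int, name string, role string) schema.
    example_rows = [
        (1, "alice", "developer"),
        (2, "bob", "tester"),
        (3, "carol", "manager"),
    ]
    # Redirect the output to a file to obtain something like sample.txt.
    print(format_rows(example_rows), end="")
```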

Let's copy the contents of testemp into the hbase_table_emp table and verify the result.

hive> insert overwrite table hbase_table_emp select * from testemp;
hive> select * from hbase_table_emp;

Let's look at the contents of the emp table from the HBase shell:
$ hbase shell
hbase> scan 'emp'

So we have successfully integrated HBase with Hive by creating and populating a new HBase
table from the Hive shell.

Mapping Existing HBase Tables to Hive:

Besides creating new HBase tables, we can also map existing HBase tables to Hive. To
give Hive access to an existing HBase table with multiple columns and families, use
CREATE EXTERNAL TABLE. Again, hbase.columns.mapping is required (and is
validated against the existing HBase table's column families), whereas hbase.table.name
is optional.
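To make the validation step concrete, the Python sketch below checks that every column family named in a mapping string exists in a given set of families. It only illustrates the kind of check described above; unknown_families is a hypothetical helper, not the code Hive runs.

```python
def unknown_families(columns_mapping, existing_families):
    """Return the column families used in the mapping that the table lacks.

    Illustration only. Entries that start with ":" (such as :key) do not
    name a column family and are skipped.
    """
    unknown = []
    for entry in columns_mapping.split(","):
        if entry.startswith(":"):
            continue
        family = entry.split(":", 1)[0]
        if family not in existing_families and family not in unknown:
            unknown.append(family)
    return unknown


if __name__ == "__main__":
    families = {"cf1", "cf2"}
    print(unknown_families("cf1:a,cf1:b,cf2:c", families))  # nothing missing
    print(unknown_families(":key,cf1:a,cf3:x", families))   # cf3 is missing
```

A mapping that names a family missing from the HBase table is the kind of mistake this validation is meant to catch.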

To test this, we will create a user table in HBase as shown below and then map it to a Hive
table.
hbase(main):002:0> create 'user', 'cf1', 'cf2'
hbase(main):003:0> put 'user', 'row1', 'cf1:a', 'value1'
hbase(main):004:0> put 'user', 'row1', 'cf1:b', 'value2'
hbase(main):005:0> put 'user', 'row1', 'cf2:c', 'value3'
hbase(main):006:0> put 'user', 'row2', 'cf2:c', 'value4'
hbase(main):007:0> put 'user', 'row2', 'cf1:b', 'value5'
hbase(main):008:0> put 'user', 'row3', 'cf1:a', 'value6'
hbase(main):009:0> put 'user', 'row3', 'cf2:c', 'value7'
hbase(main):010:0> describe 'user'
hbase(main):011:0> scan 'user'

Let's create the corresponding Hive table for the user HBase table above. Below is the DDL
for creating the external table hbase_table_user:
$ hive
hive> CREATE EXTERNAL TABLE hbase_table_user(key string, val1 string, val2 string, val3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:a,cf1:b,cf2:c")
TBLPROPERTIES("hbase.table.name" = "user");
Verify the contents of the hbase_table_user table:
hive> DESCRIBE hbase_table_user;
hive> SELECT * FROM hbase_table_user;
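Not every row of the user table has a cell for every mapped column, so some fields of the Hive result are expected to come back as NULL. The Python sketch below lays the cells written by the put commands out as Hive-style rows, with None standing in for NULL. The helper to_hive_rows is hypothetical and only models the shape of the result, not how Hive reads HBase.

```python
# Cells written by the put commands: (row key, "family:qualifier") -> value.
CELLS = {
    ("row1", "cf1:a"): "value1",
    ("row1", "cf1:b"): "value2",
    ("row1", "cf2:c"): "value3",
    ("row2", "cf2:c"): "value4",
    ("row2", "cf1:b"): "value5",
    ("row3", "cf1:a"): "value6",
    ("row3", "cf2:c"): "value7",
}


def to_hive_rows(cells, mapped_columns):
    """Build one tuple per row key: (key, val1, val2, ...).

    A cell that is absent for a row becomes None, modelling a NULL in Hive.
    """
    row_keys = sorted({row for row, _ in cells})
    return [
        (row,) + tuple(cells.get((row, column)) for column in mapped_columns)
        for row in row_keys
    ]


if __name__ == "__main__":
    for hive_row in to_hive_rows(CELLS, ["cf1:a", "cf1:b", "cf2:c"]):
        print(hive_row)
```

In this model row2 has no cf1:a cell and row3 has no cf1:b cell, so val1 is empty for row2 and val2 is empty for row3.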

So, we have successfully mapped an existing HBase table to a Hive external table.

Hive MAP to HBase Column Family

Here's how a Hive MAP datatype can be used to access an entire column family. Each row
can have a different set of columns, where the column names correspond to the map keys
and the column values correspond to the map values.

CREATE TABLE hbase_table_1(value map<string,int>, row_key int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:,:key"
);
INSERT OVERWRITE TABLE hbase_table_1 SELECT map(bar, foo), foo FROM pokes
WHERE foo=98 OR foo=100;
(This example also demonstrates using a Hive column other than the first as the HBase row
key.)
Here's how this looks in HBase (with different column names in different rows):
hbase(main):012:0> scan "hbase_table_1"
ROW          COLUMN+CELL
100          column=cf:val_100, timestamp=1267739509194, value=100
98           column=cf:val_98, timestamp=1267739509194, value=98
2 row(s) in 0.0080 seconds


And when queried back into Hive:
hive> select * from hbase_table_1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
OK
{"val_100":100} 100
{"val_98":98} 98
Time taken: 3.808 seconds

Note that the key of the MAP must have datatype string, since it is used for naming the
HBase column.
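The relationship between a MAP value and the cells of a column family can be sketched in a few lines of Python. This is a simplified model under the assumption that each map entry becomes one cell named family:key; map_to_cells is a hypothetical helper and does not reflect the storage handler's actual serialization.

```python
def map_to_cells(row_key, value_map, family="cf"):
    """Turn one (row key, map) pair into (row, column, value) cells.

    Simplified model: every map key becomes a column qualifier inside the
    given family, which is why the key has to be a string.
    """
    cells = []
    for qualifier, value in value_map.items():
        if not isinstance(qualifier, str):
            raise TypeError(f"map key must be a string, got: {qualifier!r}")
        cells.append((str(row_key), f"{family}:{qualifier}", str(value)))
    return cells


if __name__ == "__main__":
    # The two rows produced by the INSERT statement above.
    for key, value_map in [(100, {"val_100": 100}), (98, {"val_98": 98})]:
        for cell in map_to_cells(key, value_map):
            print(cell)
```

Each row ends up with its own column name inside cf, matching the scan output shown earlier.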