
Hive Log In

[cloudera@quickstart ~]$ hive


Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
Create database
hive> create database liamdatabase;
OK
Time taken: 1.389 seconds
hive> use liamdatabase;
OK
Time taken: 0.068 seconds
Create table schema
hive> CREATE TABLE u_user(ID int, AGE int, Gender string, Occupation string,
ZipCode int) row format delimited fields terminated by '|' lines terminated by
'\n' stored as textfile;
OK
Time taken: 0.084 seconds
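One thing worth noting about this schema: declaring ZipCode as int drops leading zeros (New England zip codes such as 01040 and 01701 show up later in the query output as 1040 and 1701) and turns non-numeric codes into NULL (\N). A sketch of an alternative schema that keeps zip codes intact, assuming the same delimited layout (the table name u_user_str is made up for illustration):

```sql
-- Hypothetical variant of the table above: ZipCode as string so
-- leading zeros and non-numeric codes survive the load unchanged.
CREATE TABLE u_user_str(
  ID int,
  AGE int,
  Gender string,
  Occupation string,
  ZipCode string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```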
Load data into the table
hive> load data local inpath '/home/cloudera/Desktop/ml-100k/u.user' overwrite
into table u_user;
Loading data to table liamdatabase.u_user
Table liamdatabase.u_user stats: [numFiles=1, numRows=0, totalSize=22628,
rawDataSize=0]
OK
Time taken: 0.79 seconds
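The stats line above reports numRows=0 because LOAD DATA only moves the file into place; Hive does not scan it on load. If accurate row counts in the metastore matter, a quick table scan can compute them, a sketch assuming the table above:

```sql
-- Scan the table and record basic statistics (including the real
-- row count) in the metastore, replacing the numRows=0 placeholder.
ANALYZE TABLE u_user COMPUTE STATISTICS;
```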
hive> SHOW DATABASES;
OK
default
la
la_foreclosures
liamdatabase
movies
vaers
Time taken: 0.167 seconds, Fetched: 6 row(s)
hive> USE MOVIES;
OK
Time taken: 0.014 seconds
hive> SHOW TABLES;
OK
data
item
user
Time taken: 0.04 seconds, Fetched: 3 row(s)

Question 5
Code:
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive' ROW
FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE select * from
user where job = 'entertainment';
Query ID = cloudera_20160512144545_0e43b5dc-8b67-4098-be8b-024b274834e4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1463088231602_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1463088231602_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1463088231602_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-12 14:45:38,949 Stage-1 map = 0%, reduce = 0%
2016-05-12 14:45:46,357 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1463088231602_0001
Copying data to local directory /home/cloudera/Desktop/Hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.84 sec   HDFS Read: 26389 HDFS Write: 637 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
Time taken: 21.125 seconds
hive>
Output:
16	21	M	entertainment	10309
39	41	M	entertainment	1040
75	24	M	entertainment	8816
92	32	M	entertainment	80525
145	31	M	entertainment	\N
179	15	M	entertainment	20755
255	23	M	entertainment	7029
331	33	M	entertainment	91344
375	17	M	entertainment	37777
387	33	M	entertainment	37412
422	26	M	entertainment	94533
432	22	M	entertainment	50311
448	23	M	entertainment	10021
567	24	M	entertainment	10003
721	24	F	entertainment	11238
839	38	F	entertainment	90814
915	50	M	entertainment	60614
926	49	M	entertainment	1701

Question 5
Which gender is the least represented in the data? Females.
Code:
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive' ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE SELECT Max(zip) FROM
user;
Output:
Which zip code is the most common? 99835, referenced a total of 9 times.
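Max(zip) returns the numerically largest zip code, which only answers "most common" if the two happen to coincide. A grouped count gives the most frequent zip directly; a sketch against the same user table:

```sql
-- Count occurrences of each zip code and keep the most frequent one.
SELECT zip, count(*) AS cnt
FROM user
GROUP BY zip
ORDER BY cnt DESC
LIMIT 1;
```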

Code:
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive' ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE SELECT job,
count(gender) FROM user WHERE gender = "M" GROUP BY job;
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive' ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE SELECT job,
count(gender) FROM user WHERE gender = "F" GROUP BY job;
Output:
Give the count of all the users based on occupation for each gender.
Males

Females

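The two per-gender queries above can also be collapsed into a single table scan with conditional aggregation, a sketch assuming the same user table and columns:

```sql
-- One scan instead of two: count males and females per occupation.
SELECT job,
       SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS males,
       SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS females
FROM user
GROUP BY job;
```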
Load u.data. Determine which gender gave the most 1-star ratings. Males gave the most 1-star ratings. The output from the join is as follows:
F	1894
M	4216

INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive' ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE SELECT user.gender,
count(data.rating) FROM user JOIN data ON (user.userid = data.userid) WHERE
rating = 1 GROUP BY gender;

Determine the percentage breakdown of ratings in u.data.


Following what Jorge posted for his Hive homework, I edited my query and
proceeded to get this result:
1	6.11
2	11.37
3	27.145000000000003
4	34.174
5	21.201

INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive' ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE SELECT rating,
(count(rating)/100000)*100 FROM data GROUP BY rating;
I would have liked to write the query in a way that doesn't hard-code the total
count, but I couldn't find a way to make it work.
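One way to avoid hard-coding the 100000 total is a windowed aggregate, which Hive supports from version 0.11: an empty OVER clause sums the per-rating counts across the whole result. A sketch, not verified against this dataset:

```sql
-- Percentage per rating without hard-coding the row count:
-- SUM(count(rating)) OVER () totals the group counts across all groups.
SELECT rating,
       count(rating) * 100.0 / SUM(count(rating)) OVER () AS pct
FROM data
GROUP BY rating;
```

If windowing over an aggregate is unavailable, a cross join against a one-row COUNT(*) subquery achieves the same thing.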
