Question 5
Code:
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > SELECT * FROM user WHERE job = 'entertainment';
Query ID = cloudera_20160512144545_0e43b5dc-8b67-4098-be8b-024b274834e4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1463088231602_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1463088231602_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1463088231602_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-12 14:45:38,949 Stage-1 map = 0%, reduce = 0%
2016-05-12 14:45:46,357 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1463088231602_0001
Copying data to local directory /home/cloudera/Desktop/Hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.84 sec   HDFS Read: 26389 HDFS Write: 637 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
Time taken: 21.125 seconds
hive>
Output:
16      21      M       entertainment   10309
39      41      M       entertainment   1040
75      24      M       entertainment   8816
92      32      M       entertainment   80525
145     31      M       entertainment   \N
179     15      M       entertainment   20755
255     23      M       entertainment   7029
331     33      M       entertainment   91344
375     17      M       entertainment   37777
387     33      M       entertainment   37412
422     26      M       entertainment   94533
432     22      M       entertainment   50311
448     23      M       entertainment   10021
567     24      M       entertainment   10003
721     24      F       entertainment   11238
839     38      F       entertainment   90814
915     50      M       entertainment   60614
926     49      M       entertainment   1701
Question 5
Which gender is the least represented in the data? Females
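The query behind this answer is not shown. One way to check it is to count users per gender; the following is a minimal sketch, assuming the same `user` table and its `gender` column as used in the other queries (the alias `num_users` is illustrative only).

```sql
-- Count users per gender; the least represented gender is the first row.
-- Assumption: on newer Hive versions USER may be a reserved word, in which
-- case the table name may need backticks (`user`).
SELECT gender, count(*) AS num_users
FROM user
GROUP BY gender
ORDER BY num_users ASC;
```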
Code:
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT Max(zip) FROM user;
Output:
Which city represents the most common zip code? 99835, which appears a total
of 9 times.
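Note that `Max(zip)` returns the largest zip value in the table rather than the most frequent one. A frequency count is one way to find the most common zip code; this is a minimal sketch, assuming the same `user` table and `zip` column (the alias `num_users` is illustrative only).

```sql
-- Count how many users share each zip code and keep the most frequent one.
-- If several zip codes tie for first place, LIMIT 1 returns only one of them.
SELECT zip, count(*) AS num_users
FROM user
GROUP BY zip
ORDER BY num_users DESC
LIMIT 1;
```

Raising the limit (for example `LIMIT 5`) makes any ties at the top visible.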
Code:
-- Number of male users per job
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT job, count(gender) FROM user WHERE gender = "M" GROUP BY job;

-- Number of female users per job
-- (both statements target the same directory, so this one overwrites the
-- output of the previous statement)
INSERT OVERWRITE LOCAL DIRECTORY '/home/cloudera/Desktop/Hive'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT job, count(gender) FROM user WHERE gender = "F" GROUP BY job;
Output:
Give the count of all the users based on occupation for each gender.
Males
Females
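The per-gender counts can also be produced in a single pass by grouping on both columns. This is a sketch of that alternative, not the query used for the outputs above; it assumes the same `user` table, and the alias `num_users` is illustrative only.

```sql
-- One row per (job, gender) pair, so male and female counts come back together.
SELECT job, gender, count(*) AS num_users
FROM user
GROUP BY job, gender
ORDER BY job, gender;
```

A single query like this also avoids writing two result sets to the same output directory.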
Load u.data. Determine which gender gave the most 1-star ratings. Males gave
the most 1-star ratings. The output from the join is as follows:
F       1894
M       4216
6.11
11.37
27.145000000000003
34.174
21.201
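The load and join used for the 1-star comparison are not shown. The following is a minimal sketch under stated assumptions: `u.data` is taken to be tab-separated with the columns user id, item id, rating, and timestamp; the table name `ratings`, its column names, the `id` column on `user`, and the path `/path/to/u.data` are all hypothetical placeholders.

```sql
-- Hypothetical ratings table matching the assumed u.data layout.
CREATE TABLE IF NOT EXISTS ratings (
  user_id INT,
  item_id INT,
  rating  INT,
  ts      BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Placeholder path; replace with the actual location of u.data.
LOAD DATA LOCAL INPATH '/path/to/u.data' OVERWRITE INTO TABLE ratings;

-- Count 1-star ratings per gender by joining each rating to its user.
-- Assumption: the user table's key column is named id.
SELECT u.gender, count(*) AS one_star_ratings
FROM ratings r
JOIN user u ON (r.user_id = u.id)
WHERE r.rating = 1
GROUP BY u.gender;
```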