
CLOUDERA

Hadoop 2.0 – Hadoop Lab


Linux | HDFS Commands
Sriram Balasubramanian

2016

CALIFORNIA, UNITED STATES OF AMERICA


HADOOP LAB

Table of Contents
Hadoop Lab
    Data Node Calculation
    Linux Commands
    HDFS Commands



Hadoop Lab Assignment
Data Node Calculation
Let's assume you have 100 TB of data to store and process with Hadoop. Each available DataNode has the following configuration:
- 8 GB RAM
- 10 TB HDD
- 100 MB/s read-write speed

You have a Hadoop Cluster with replication factor = 3 and block size = 64 MB.
In this case, the number of DataNodes required for storage would be (a quick scripted check follows this list):
- Total amount of data * replication factor / disk space available on each DataNode
- 100 TB * 3 / 10 TB
- 30 DataNodes
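As a quick sanity check, this arithmetic can be scripted. Below is a minimal bash sketch; the figures (100 TB of data, replication factor 3, 10 TB per node) are just the assumptions given above.

    #!/bin/bash
    # Storage sizing: DataNodes = total data * replication / disk per node.
    TOTAL_TB=100         # total data to store, in TB
    REPLICATION=3        # HDFS replication factor
    DISK_PER_NODE_TB=10  # usable disk per DataNode, in TB

    NODES=$(( TOTAL_TB * REPLICATION / DISK_PER_NODE_TB ))
    echo "DataNodes required for storage: $NODES"   # prints 30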

Now, let's assume you need to process this 100 TB of data using MapReduce. Reading 100 TB at a speed of 100 MB/s using only one node would take:
- Total data / read-write speed
- (100 * 1024 * 1024) MB / 100 MB/s
- 1048576 seconds
- 291.27 hours

So, with 30 DataNodes reading in parallel, you would be able to finish this MapReduce job in:
- 291.27 / 30
- 9.71 hours

1. Problem Statement
How many such DataNodes would you need to read the 100 TB of data in 5 minutes in your Hadoop cluster?

2. Problem Solution
2.1 Time required to read the data using a single DataNode

One DataNode takes:

Total data / read-write speed
= (100 * 1024 * 1024) MB / 100 MB/s
= 1048576 seconds, or 291.27 hours, to read the 100 TB of data

2.2 DataNodes required to read the data in FIVE minutes

Number of DataNodes required to read 100 TB in 5 minutes:

Time taken by 1 DataNode to read the 100 TB / total time allowed for the read
= 1048576 seconds / (5 * 60) seconds
= 3495.25 DataNodes

So, rounding up (a fraction of a node is not usable), you would need ~3496 such DataNodes to read the 100 TB of data in 5 minutes.
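Again as a sketch, the required node count is the single-node read time divided by the 300-second budget, rounded up; the ceiling rounding is an assumption that partial nodes are not available.

    #!/bin/bash
    # DataNodes needed to read 100 TB within 5 minutes (300 seconds).
    SINGLE_NODE_SECS=1048576   # from the single-DataNode calculation above
    BUDGET_SECS=$(( 5 * 60 ))

    # Integer ceiling division: (a + b - 1) / b rounds the quotient up.
    NODES=$(( (SINGLE_NODE_SECS + BUDGET_SECS - 1) / BUDGET_SECS ))
    echo "DataNodes required: $NODES"   # prints 3496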



Linux Commands
Basic Linux Commands
ls command:
Lists files and directories.
Syntax: ls /directory_path
E.g.: ls /home/Bigdata

cp command:
Copies files.
Syntax: cp src_file_path /destination_directory_path
E.g.: cp student /home/Bigdata/Desktop

mv command:
Renames a file or moves it from one directory to another.
Syntax: mv src_file_path /destination_directory_path
E.g.: mv myfile.txt /newdirectory
Syntax: mv current_filename new_filename
E.g.: mv computer.txt computer_hope.txt

rm command:
Deletes files.
Syntax: rm file_path
E.g.: rm /home/Bigdata/Desktop/file_name

ln command:
Links files.
Syntax: ln file_path /destination_directory_path
E.g.: ln student /home/Bigdata/Desktop

cd command:
Changes the current directory.
Syntax: cd /directory_path
E.g.: cd /home/Bigdata/Desktop

pwd command:
Prints the current directory name (Present Working Directory).
Syntax: pwd

mkdir command:
Creates a directory.
Syntax: mkdir /new_directory_path
E.g.: mkdir /home/Bigdata/Desktop/new_directory

rmdir command:
Deletes an empty directory.
Syntax: rmdir /directory_path
E.g.: rmdir /home/Bigdata/Desktop/directory

cat command:
Views files.
Syntax: cat /file_path
E.g.: cat /home/Bigdata/file_name

nl command:
Numbers the lines of a file.
Syntax: nl /file_path
E.g.: nl studentRoll

gedit command:
Opens a file in the gedit text editor.
Syntax: gedit /file_path
E.g.: gedit studentRoll

stat command:
Displays file attributes (properties).
Syntax: stat /file_path
E.g.: stat studentRoll

wc command:
Counts lines, words, and bytes.
Syntax: wc /file_path
E.g.: wc studentRoll

chown command:
Changes the file owner.
Syntax: chown user_name file_path
E.g.: chown Bigdata /home/Bigdata/Desktop/README.txt

chgrp command:
Changes the file group.
Syntax: chgrp group_name /file_path
E.g.: chgrp admin /home/Bigdata/Desktop/README.txt

ifconfig command:
Sets or displays network information (IP address).
Syntax: ifconfig

chattr command:
Changes advanced file attributes.
Syntax: sudo chattr +i/-i /file_path
E.g.: sudo chattr +i /home/Bigdata/Desktop/README.txt
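To tie several of these commands together, here is a short illustrative bash session; the file and directory names are hypothetical and assume a file named studentRoll in /home/Bigdata.

    mkdir /home/Bigdata/Desktop/demo   # create a working directory
    cd /home/Bigdata/Desktop/demo      # move into it
    pwd                                # confirm the present working directory
    cp /home/Bigdata/studentRoll .     # copy a file into the current directory
    nl studentRoll                     # view the file with numbered lines
    wc studentRoll                     # count its lines, words, and bytes
    stat studentRoll                   # display its attributes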

HDFS Commands

Command to find the version of Hadoop:
Command: hadoop version

LS command:
Displays the list of files and directories in the given HDFS path.
Command: hadoop fs -ls /

MKDIR command:
Creates a directory in HDFS.
Syntax: hadoop fs -mkdir /directory_name
E.g.: hadoop fs -mkdir /Bigdata

DU command:
Displays a summary of file lengths.
Syntax: hadoop fs -du -s /path/to/file_in_hdfs
Command: hadoop fs -du -s /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.

TOUCHZ command:
Creates a file of size 0 bytes in HDFS.
Syntax: hadoop fs -touchz /directory/filename
E.g.: hadoop fs -touchz /Bigdata/sample
Note: Here we create a zero-byte file named "sample" in the Bigdata directory of HDFS.

CAT command:
Copies source paths to stdout.
Syntax: hadoop fs -cat /path/to/file_in_hdfs
Command: hadoop fs -cat /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.

TEXT command:
Takes a source file and outputs the file in text format (like the cat command, but it can also decode formats such as compressed files into text).
Syntax: hadoop fs -text /path/to/file_in_hdfs
Command: hadoop fs -text /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.

copyFromLocal command:
Copies a file from the local file system to HDFS.
Syntax: hadoop fs -copyFromLocal <localsrc> URI
E.g.: hadoop fs -copyFromLocal /home/Bigdata/Desktop/test /Bigdata
Note: Here test is a file present in the local directory /home/Bigdata/Desktop.

copyToLocal command:
Copies a file from HDFS to the local file system.
Syntax: hadoop fs -copyToLocal URI <localdst>
Command: hadoop fs -copyToLocal /Bigdata/test /home/Bigdata
Note: Here test is a file present in the Bigdata directory of HDFS.

PUT command:
Copies a single source, or multiple sources, from the local file system to the destination file system.
Syntax: hadoop fs -put <localsrc> ... <dst>
Command: hadoop fs -put /home/Bigdata/Desktop/test /user
Note: copyFromLocal is similar to the put command, except that its source is restricted to a local
file reference.

GET command:
Copies files from HDFS to the local file system.
Syntax: hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
E.g.: hadoop fs -get /user/test /home/Bigdata
Note: copyToLocal is similar to the get command, except that its destination is restricted to a
local file reference.

COUNT command:
Counts the number of directories, files, and bytes under the paths that match the specified file
pattern.
Command: hadoop fs -count /user
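For reference, -count prints one line per path with four columns: directory count, file count, content size in bytes, and the path itself. The numbers below are illustrative only, not output from a real cluster.

    hadoop fs -count /user
    #        4           10         1048576 /user
    # columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME (values illustrative)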

RM command:
Removes a file from HDFS.
Syntax: hadoop fs -rm /path/to/file_in_hdfs
Command: hadoop fs -rm /Bigdata/test

RMR command:
Removes a directory from HDFS, recursively deleting its contents.
Syntax: hadoop fs -rmr /path/to/directory_in_hdfs
Command: hadoop fs -rmr /Bigdata/
Note: In Hadoop 2, -rmr is deprecated in favor of hadoop fs -rm -r.
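Putting several HDFS commands together, a minimal end-to-end session might look like the following; the paths are illustrative and assume a local file named test on the Desktop.

    hadoop fs -mkdir /Bigdata                           # create a directory in HDFS
    hadoop fs -put /home/Bigdata/Desktop/test /Bigdata  # copy a local file in
    hadoop fs -ls /Bigdata                              # confirm it arrived
    hadoop fs -cat /Bigdata/test                        # print its contents
    hadoop fs -get /Bigdata/test /home/Bigdata          # copy it back out
    hadoop fs -rm /Bigdata/test                         # remove the HDFS copy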

