
CLOUDERA

Hadoop 2.0 – Hadoop Lab


Linux | HDFS Commands
Sriram Balasubramanian

2016

CALIFORNIA, UNITED STATES OF AMERICA


HADOOP LAB

Table of Contents
Hadoop Lab
    Data Node Calculation
    Linux Commands
    HDFS Commands



Hadoop Lab Assignment
Data Node Calculation
Let's assume you have 100 TB of data to store and process with Hadoop. Each available DataNode has the following configuration:
- 8 GB RAM
- 10 TB HDD
- 100 MB/s read-write speed

You have a Hadoop Cluster with replication factor = 3 and block size = 64 MB.
In this case, the number of DataNodes required for storage would be (a quick scripted check follows this list):
- Total amount of data * replication factor / disk space available on each DataNode
- 100 TB * 3 / 10 TB
- 30 DataNodes
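As a quick sanity check, this arithmetic can be scripted. Below is a minimal bash sketch; the figures (100 TB of data, replication factor 3, 10 TB per node) are just the assumptions given above.

    #!/bin/bash
    # Storage sizing: DataNodes = total data * replication / disk per node.
    TOTAL_TB=100         # total data to store, in TB
    REPLICATION=3        # HDFS replication factor
    DISK_PER_NODE_TB=10  # usable disk per DataNode, in TB

    NODES=$(( TOTAL_TB * REPLICATION / DISK_PER_NODE_TB ))
    echo "DataNodes required for storage: $NODES"   # prints 30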

Now, let's assume you need to process this 100 TB of data using MapReduce. Reading 100 TB at a speed of 100 MB/s using only one node would take:
- Total data / read-write speed
- (100 * 1024 * 1024) MB / 100 MB/s
- 1048576 seconds
- 291.27 hours

So, with 30 DataNodes reading in parallel, you would be able to finish this MapReduce job in:
- 291.27 / 30
- 9.71 hours

1. Problem Statement
How many such DataNodes would you need to read the 100 TB of data in 5 minutes in your Hadoop cluster?

2. Problem Solution
2.1 Time required to read the data using a single DataNode

One DataNode takes:

Total data / read-write speed
= (100 * 1024 * 1024) MB / 100 MB/s
= 1048576 seconds, or 291.27 hours, to read the 100 TB of data

2.2 DataNodes required to read the data in FIVE minutes

Number of DataNodes required to read 100 TB in 5 minutes:

Time taken by 1 DataNode to read the 100 TB / total time allowed for the read
= 1048576 seconds / (5 * 60) seconds
= 3495.25 DataNodes

So, rounding up (a fraction of a node is not usable), you would need ~3496 such DataNodes to read the 100 TB of data in 5 minutes.
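Again as a sketch, the required node count is the single-node read time divided by the 300-second budget, rounded up; the ceiling rounding is an assumption that partial nodes are not available.

    #!/bin/bash
    # DataNodes needed to read 100 TB within 5 minutes (300 seconds).
    SINGLE_NODE_SECS=1048576   # from the single-DataNode calculation above
    BUDGET_SECS=$(( 5 * 60 ))

    # Integer ceiling division: (a + b - 1) / b rounds the quotient up.
    NODES=$(( (SINGLE_NODE_SECS + BUDGET_SECS - 1) / BUDGET_SECS ))
    echo "DataNodes required: $NODES"   # prints 3496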



Linux Commands
Basic Linux Commands
ls command:
Lists files and directories.
Syntax: ls /directory_path
E.g.: ls /home/Bigdata

cp command:
Copies files.
Syntax: cp src_file_path /destination_directory_path
E.g.: cp student /home/Bigdata/Desktop

mv command:
Renames a file or moves it from one directory to another.
Syntax: mv src_file_path /destination_directory_path
E.g.: mv myfile.txt /newdirectory
Syntax: mv current_filename new_filename
E.g.: mv computer.txt computer_hope.txt

rm command:
Deletes files.
Syntax: rm file_path
E.g.: rm /home/Bigdata/Desktop/file_name

ln command:
Links files.
Syntax: ln file_path /destination_directory_path
E.g.: ln student /home/Bigdata/Desktop

cd command:
Changes the current directory.
Syntax: cd /directory_path
E.g.: cd /home/Bigdata/Desktop

pwd command:
Prints the current directory name (Present Working Directory).
Syntax: pwd

mkdir command:
Creates a directory.
Syntax: mkdir /new_directory_path
E.g.: mkdir /home/Bigdata/Desktop/new_directory

rmdir command:
Deletes an empty directory.
Syntax: rmdir /directory_path
E.g.: rmdir /home/Bigdata/Desktop/directory

cat command:
Views files.
Syntax: cat /file_path
E.g.: cat /home/Bigdata/file_name

nl command:
Numbers the lines of a file.
Syntax: nl /file_path
E.g.: nl studentRoll

gedit command:
Opens a file in the gedit text editor.
Syntax: gedit /file_path
E.g.: gedit studentRoll

stat command:
Displays file attributes (properties).
Syntax: stat /file_path
E.g.: stat studentRoll

wc command:
Counts lines, words, and bytes.
Syntax: wc /file_path
E.g.: wc studentRoll

chown command:
Changes the file owner.
Syntax: chown user_name file_path
E.g.: chown Bigdata /home/Bigdata/Desktop/README.txt

chgrp command:
Changes the file group.
Syntax: chgrp group_name /file_path
E.g.: chgrp admin /home/Bigdata/Desktop/README.txt

ifconfig command:
Sets or displays network information (IP address).
Syntax: ifconfig

chattr command:
Changes advanced file attributes.
Syntax: sudo chattr +i/-i /file_path
E.g.: sudo chattr +i /home/Bigdata/Desktop/README.txt
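To tie several of these commands together, here is a short illustrative bash session; the file and directory names are hypothetical and assume a file named studentRoll in /home/Bigdata.

    mkdir /home/Bigdata/Desktop/demo   # create a working directory
    cd /home/Bigdata/Desktop/demo      # move into it
    pwd                                # confirm the present working directory
    cp /home/Bigdata/studentRoll .     # copy a file into the current directory
    nl studentRoll                     # view the file with numbered lines
    wc studentRoll                     # count its lines, words, and bytes
    stat studentRoll                   # display its attributes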

HDFS Commands

Command to find the version of Hadoop:
Command: hadoop version

LS command:
Displays the list of files and directories in the given HDFS path.
Command: hadoop fs -ls /

MKDIR command:
Creates a directory in HDFS.
Syntax: hadoop fs -mkdir /directory_name
E.g.: hadoop fs -mkdir /Bigdata

DU command:
Displays a summary of file lengths.
Syntax: hadoop fs -du -s /path/to/file_in_hdfs
Command: hadoop fs -du -s /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.

TOUCHZ command:
Creates a file of size 0 bytes in HDFS.
Syntax: hadoop fs -touchz /directory/filename
E.g.: hadoop fs -touchz /Bigdata/sample
Note: Here we create a zero-byte file named "sample" in the Bigdata directory of HDFS.

CAT command:
Copies source paths to stdout.
Syntax: hadoop fs -cat /path/to/file_in_hdfs
Command: hadoop fs -cat /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.

TEXT command:
Takes a source file and outputs the file in text format (like the cat command, but it can also decode formats such as compressed files into text).
Syntax: hadoop fs -text /path/to/file_in_hdfs
Command: hadoop fs -text /Bigdata/test
Note: Here test is a file that exists in HDFS in the directory Bigdata.

copyFromLocal command:
Copies a file from the local file system to HDFS.
Syntax: hadoop fs -copyFromLocal <localsrc> URI
E.g.: hadoop fs -copyFromLocal /home/Bigdata/Desktop/test /Bigdata
Note: Here test is a file present in the local directory /home/Bigdata/Desktop.

copyToLocal command:
Copies a file from HDFS to the local file system.
Syntax: hadoop fs -copyToLocal URI <localdst>
Command: hadoop fs -copyToLocal /Bigdata/test /home/Bigdata
Note: Here test is a file present in the Bigdata directory of HDFS.

PUT command:
Copies a single source, or multiple sources, from the local file system to the destination file system.
Syntax: hadoop fs -put <localsrc> ... <dst>
Command: hadoop fs -put /home/Bigdata/Desktop/test /user
Note: copyFromLocal is similar to the put command, except that its source is restricted to a local
file reference.

GET command:
Copies files from HDFS to the local file system.
Syntax: hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
E.g.: hadoop fs -get /user/test /home/Bigdata
Note: copyToLocal is similar to the get command, except that its destination is restricted to a
local file reference.

COUNT command:
Counts the number of directories, files, and bytes under the paths that match the specified file
pattern.
Command: hadoop fs -count /user
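For reference, -count prints one line per path with four columns: directory count, file count, content size in bytes, and the path itself. The numbers below are illustrative only, not output from a real cluster.

    hadoop fs -count /user
    #        4           10         1048576 /user
    # columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME (values illustrative)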

RM command:
Removes a file from HDFS.
Syntax: hadoop fs -rm /path/to/file_in_hdfs
Command: hadoop fs -rm /Bigdata/test

RMR command:
Removes a directory from HDFS, recursively deleting its contents.
Syntax: hadoop fs -rmr /path/to/directory_in_hdfs
Command: hadoop fs -rmr /Bigdata/
Note: In Hadoop 2, -rmr is deprecated in favor of hadoop fs -rm -r.
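Putting several HDFS commands together, a minimal end-to-end session might look like the following; the paths are illustrative and assume a local file named test on the Desktop.

    hadoop fs -mkdir /Bigdata                           # create a directory in HDFS
    hadoop fs -put /home/Bigdata/Desktop/test /Bigdata  # copy a local file in
    hadoop fs -ls /Bigdata                              # confirm it arrived
    hadoop fs -cat /Bigdata/test                        # print its contents
    hadoop fs -get /Bigdata/test /home/Bigdata          # copy it back out
    hadoop fs -rm /Bigdata/test                         # remove the HDFS copy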

