
Aim: Set up a single-node Hadoop cluster backed by the Hadoop Distributed File System (HDFS),
running on Ubuntu Linux. After successful installation on one node, configure a
multi-node Hadoop cluster (one master and multiple slaves).

Hadoop installation

First, refresh the package index so that the latest package versions are available:

$ sudo apt-get update

Prerequisites
Java (OpenJDK 7)

$ sudo apt-get install default-jdk

After installation, make a quick check whether the JDK is correctly set up:

$ java -version

Adding a dedicated Hadoop system user


We will use a dedicated Hadoop user account for running Hadoop.
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add the user hduser and the group hadoop to your local machine.
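
Later steps in this guide run sudo commands while logged in as hduser. If sudo on your machine is restricted to members of the sudo group (the Ubuntu default), you may also want to add hduser to that group; this is optional and not required by Hadoop itself:

$ sudo adduser hduser sudo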

Configuring SSH

$ sudo apt-get install openssh-server

Now, first log in as hduser:

$ su hduser

The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is
a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs
to be set up to allow password-less login for the Hadoop user from machines in the cluster. The
simplest way to achieve this is to generate a public/private key pair and share it across
the cluster.

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine.
For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for
the hduser user we created earlier.
We have to generate an SSH key for the hduser user.
$ ssh-keygen -t rsa -P ""
-P "" indicates an empty passphrase.

You have to enable SSH access to your local machine with this newly created key, which is done
with the following command:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

$ ssh localhost

Now, download the Hadoop release tarball:


$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz

Now, extract the downloaded archive:

$ tar xvzf hadoop-2.7.2.tar.gz

Now, let's move hadoop-2.7.2 to a directory of our choice; we will choose /usr/local/hadoop.

$ sudo mv hadoop-2.7.2 /usr/local/hadoop

Let's make hduser the owner of the Hadoop directory:

$ sudo chown -R hduser /usr/local/hadoop
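
As a quick optional check, confirm that hduser now owns the directory:

$ ls -ld /usr/local/hadoop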

Setup Configuration Files

The following files will have to be modified to complete the Hadoop setup:
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml (copied from mapred-site.xml.template)
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
6. /usr/local/hadoop/etc/hadoop/yarn-site.xml
1. ~/.bashrc

Now let's edit the .bashrc file and append the Hadoop environment variables to the end of the file:

$ sudo nano ~/.bashrc


export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"

$ source ~/.bashrc
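
After reloading .bashrc, you can optionally confirm that the hadoop binary is now on the PATH:

$ hadoop version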

2. hadoop-env.sh
Now let's set the Java path that Hadoop will use:

$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Before setting the Java path, the line looks like:

export JAVA_HOME=${JAVA_HOME}

After setting the Java path, the line looks like:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
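
If you are unsure of the exact JDK directory on your machine (the path above is the usual location of OpenJDK 7 on 64-bit Ubuntu, but it may differ), you can look it up first:

$ readlink -f /usr/bin/java

This prints the full path of the java binary, typically ending in jre/bin/java; JAVA_HOME is the directory above jre/bin.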

3. core-site.xml

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up. This file can be used to override the default settings that Hadoop starts with.

$ sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

add code between <configuration>...</configuration>.

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
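
For reference, a minimal sketch of how core-site.xml looks once the property has been added (the <configuration> tags already exist in the file; only the <property> block is new). The remaining XML files below are edited the same way:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>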

4. mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains a
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
file, which has to be copied to a file named mapred-site.xml:

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

The mapred-site.xml file is used to specify which framework is being used for MapReduce.

$ sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>

5. hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used. It is used to specify the directories which will be used as the namenode and the datanode on that host.

$ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

add code between <configuration>...</configuration>.

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

6. yarn-site.xml

$ sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

add code between <configuration>...</configuration>.

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Now, let's create the folders where HDFS will store the NameNode and DataNode data (these must match the paths configured in hdfs-site.xml above):

$ sudo mkdir -p /usr/local/hadoop_store

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

Assign hduser ownership of the folder:

$ sudo chown -R hduser /usr/local/hadoop_store

Formatting the HDFS filesystem via the NameNode

The hdfs namenode -format command should be executed once before we start using Hadoop.
If this command is executed again after Hadoop has been used, it will destroy all the data on the
Hadoop file system.

$ hdfs namenode -format

Starting Hadoop

$ start-dfs.sh

start-dfs.sh starts the HDFS daemons: the NameNode, SecondaryNameNode and DataNode.

$ start-yarn.sh

start-yarn.sh starts the YARN daemons: the ResourceManager and NodeManager.

$ jps
jps lists the running Java processes and verifies whether the cluster daemons have started properly.
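
For a working single-node setup, the jps output should list entries similar to the following (the process IDs will differ on your machine):

1234 NameNode
1456 DataNode
1678 SecondaryNameNode
1890 ResourceManager
2012 NodeManager
2234 Jps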

Now, to check whether Hadoop is installed completely, follow these steps:
1. Open your browser.
2. Enter localhost:8088 in the address bar.

If the YARN ResourceManager web interface loads, the YARN daemons are running.

3. Then enter localhost:50070 in the address bar.

If the NameNode web interface loads, your Hadoop installation was successful.
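
When you are finished, the control scripts mentioned earlier can stop all the daemons again:

$ stop-yarn.sh

$ stop-dfs.sh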
