
Oozie Installation in Hadoop 2.x cluster
(https://acadgild.com/blog/beginners-guide-for-oozie-installation/)

STEP 1 :-

First we need to download the oozie-4.1.0 tar file from the below link:

Oozie-4.1.0 tar file (https://drive.google.com/file/d/0ByJLBTmJojjzNVcyMzhhQVg0ak0/view)

By default it will be downloaded in the Downloads folder.

STEP 2 :-

We need to move into the Downloads folder using the below command:

cd Downloads

STEP 3 :-

We need to extract the tar file using the below command:

tar -xzvf oozie-4.1.0.tar.gz

The tar file will be extracted and we will get the oozie-4.1.0 directory.

STEP 4 :-

Maven Installation

Before setting up Oozie, install Maven on your system.

Oozie uses Maven to download the dependencies required for your
Hadoop cluster, based on the Hadoop version.

If you are using CentOS, type the below command to install Maven:

Command: yum install maven

If you are using Ubuntu, type the below command to install Maven:

Command: sudo apt-get install maven
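If you are not sure which of the two commands applies to your machine, the choice can be sketched as a small shell helper. The function names maven_install_cmd and detect_pkg_manager are made up for this example; the helper only prints the command from this guide and does not install anything.

```shell
# Print the Maven install command from this guide for a package manager.
maven_install_cmd() {
    case "$1" in
        yum)     echo "yum install maven" ;;
        apt-get) echo "sudo apt-get install maven" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Detect which of the two package managers is available on this machine.
detect_pkg_manager() {
    if command -v yum >/dev/null 2>&1; then
        echo yum
    elif command -v apt-get >/dev/null 2>&1; then
        echo apt-get
    else
        return 1
    fi
}

# Show (but do not run) the command for this machine.
if pm=$(detect_pkg_manager); then
    echo "Run: $(maven_install_cmd "$pm")"
else
    echo "Neither yum nor apt-get was found" >&2
fi
```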


STEP 5 :-

After the installation of Maven, check the installed version by using the below
command:

mvn -version

You should get the output as shown in the below screenshot.
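If you only want the version number from that output, a minimal sketch is shown below. It assumes the output contains a line starting with "Apache Maven" followed by the version; maven_version_from_output is a name made up for this example.

```shell
# Read `mvn -version` output on stdin and print only the Maven version.
# Assumes a line of the form "Apache Maven <version> ...".
maven_version_from_output() {
    awk '/^Apache Maven/ { print $3; exit }'
}

# Usage:
#   mvn -version | maven_version_from_output
```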

STEP 6 :-

Oozie distro creation

Now open the extracted oozie-4.1.0 directory and open the pom.xml file.
In the pom.xml file, update the target Java version to match your Java version.
Here we are using Java 7, so we have updated the target version to 1.7.

If you are using Hadoop 2.x, update the Hadoop version to 2.3 so that Maven
pulls the dependencies Oozie requires to run on a Hadoop 2.x cluster.
Hadoop 2.3 is the latest version for which Oozie has added dependencies.

Now comment out the codehaus repository. Codehaus has stopped its
services, so dependencies can no longer be downloaded from this
repository.

After making the above specified changes, save and close the file.
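To find the lines to edit in pom.xml, a simple search can help. This is only a sketch: the patterns targetJavaVersion and hadoop.version are assumptions about how the entries are named in pom.xml (the steps above do not name them), and pom_lines_to_review is a name made up for this example.

```shell
# List, with line numbers, the pom.xml lines that mention the settings
# edited in this step. The patterns are assumptions about the entry names.
pom_lines_to_review() {
    grep -n -i -E 'targetJavaVersion|<hadoop\.version>|codehaus' "$1"
}

# Usage, from the extracted oozie-4.1.0 directory:
#   pom_lines_to_review pom.xml
```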

STEP 7 :-

Now move into the bin folder of the extracted oozie-4.1.0 directory

and then type the below command:

./mkdistro.sh -DskipTests -X

The above command builds the distro. -DskipTests skips the tests and -X turns
on Maven debug output.

Note: the distro command will download from Maven the dependencies that Oozie
requires for a Hadoop 2.x cluster.

The process will take some time, as it downloads all the dependencies required for
your project.

While the distro is being built you will see some dots as shown below. Don't panic
at that point.

Finally you will get a success message as shown in the below figure.
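With -X the output is very long, so the success message is easy to miss. One option is to save the output to a file and search it for Maven's "BUILD SUCCESS" line. The file name mkdistro.log and the function name build_succeeded are made up for this sketch.

```shell
# Return success if a saved Maven log contains the "BUILD SUCCESS" marker.
build_succeeded() {
    grep -q 'BUILD SUCCESS' "$1"
}

# Usage, from the bin folder:
#   ./mkdistro.sh -DskipTests -X 2>&1 | tee mkdistro.log
#   build_succeeded mkdistro.log && echo "distro built"
```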

STEP 8 :-

A target folder will be created in the distro folder of your Oozie directory.

Now open the target folder inside the distro folder.

Inside the target folder you can see the oozie-4.1.0-distro folder.
Open the oozie-4.1.0-distro folder; inside you will find the oozie-4.1.0 folder.

This is the oozie-4.1.0 folder which contains all the dependencies that are
required to run in a Hadoop cluster.

Copy this oozie-4.1.0 folder into your Hadoop user's home. In our case we are making
an oozie directory in the home folder ($HOME) and then pasting the obtained
oozie-4.1.0 folder into the path $HOME/oozie
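The copy described above can also be done from the terminal. Below is a minimal sketch; copy_oozie_distro is a name made up for this example, and the source path in the usage line assumes the tar file was extracted in the Downloads folder.

```shell
# Copy the built oozie-4.1.0 folder out of distro/target into a destination.
# $1: extracted source directory (where mkdistro.sh was run)
# $2: destination directory, created if it does not exist
copy_oozie_distro() {
    src="$1/distro/target/oozie-4.1.0-distro/oozie-4.1.0"
    [ -d "$src" ] || { echo "not found: $src" >&2; return 1; }
    mkdir -p "$2" && cp -r "$src" "$2/"
}

# Usage:
#   copy_oozie_distro "$HOME/Downloads/oozie-4.1.0" "$HOME/oozie"
```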

STEP 9 :-

Now change into the newly obtained oozie-4.1.0 directory and create a
directory named libext (library extension) using the command mkdir libext.

In the below screenshot we can see that the libext directory has been created in the
path $HOME/oozie/oozie-4.1.0

Move into the libext directory using the command cd libext

Now copy the jar files of Hadoop-2.3.0 into the newly created libext folder.
You can find the libraries of Hadoop-2.3.0 in the following path.

oozie-4.1.0/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/

Please refer to the below screenshot for the same.


STEP 10 :-

Copy the jar files inside hadooplib-2.3.0.oozie-4.1.0 to the newly
created libext folder.

Now download the ext-2.2 zip file from the below link

ext-2.2.zip
( https://drive.google.com/file/d/0ByJLBTmJojjzcDhxQUsyNEFSQm8/view)

Copy this downloaded ext-2.2.zip file into the newly created libext folder

This ext-2.2.zip file is required for WebUI.

Refer to the below screenshot to see the hadooplib-2.3.0.oozie-4.1.0
jar files and the ext-2.2.zip file inside the libext folder.

Now, after setting these things up, move into the bin folder of the newly
obtained oozie-4.1.0 in the path $HOME/oozie/oozie-4.1.0/bin
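The libext setup from steps 9 and 10 can be sketched as one helper. populate_libext is a name made up for this example, and the paths in the usage lines assume the locations used in this guide (source extracted in Downloads, ext-2.2.zip downloaded to Downloads).

```shell
# Create libext inside the Oozie folder and fill it with the Hadoop jars
# and the ext-2.2.zip file.
# $1: Oozie folder, $2: folder holding the Hadoop jars, $3: path to ext-2.2.zip
populate_libext() {
    mkdir -p "$1/libext" || return 1
    cp "$2"/*.jar "$1/libext/" || return 1
    cp "$3" "$1/libext/"
}

# Usage:
#   populate_libext "$HOME/oozie/oozie-4.1.0" \
#       "$HOME/Downloads/oozie-4.1.0/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0" \
#       "$HOME/Downloads/ext-2.2.zip"
```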

STEP 11 :-

Preparing a War file

Now prepare a war file by using the below command

sudo ./oozie-setup.sh prepare-war

The above command will prepare a war file for Oozie.

After the war file is prepared successfully, you will get the output as shown in
the below image.

STEP 12 :-

Now, open the core-site.xml file in your Hadoop's etc folder and add the below
properties, replacing hadoop_user_name with your Hadoop user name.

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

<property>
<name>hadoop.proxyuser.hadoop_user_name.hosts</name>
<value>*</value>
</property>

<property>
<name>hadoop.proxyuser.hadoop_user_name.groups</name>
<value>*</value>
</property>

After making the changes, save and close the file.
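The two proxyuser property names contain your Hadoop user name. As a sketch, the helper below prints both properties for a given user so they can be pasted into core-site.xml. proxyuser_properties is a name made up for this example.

```shell
# Print the two proxyuser properties for the given Hadoop user name.
proxyuser_properties() {
    user="$1"
    cat <<EOF
<property>
<name>hadoop.proxyuser.${user}.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.${user}.groups</name>
<value>*</value>
</property>
EOF
}

# Usage, for the user you are logged in as:
#   proxyuser_properties "$(whoami)"
```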

STEP 13 :-

Now open the oozie-site.xml file present in the conf directory of the newly obtained
oozie-4.1.0 folder.

In the oozie-site.xml file, edit the below specified properties.

In oozie.service.HadoopAccessorService.hadoop.configurations, specify
your Hadoop configuration directory path.

Please refer to the below for the same.

<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/home/kiran/hadoop-2.7.1/etc/hadoop</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative is looked within
the Oozie configuration directory; though the path can be absolute (i.e. to point
to Hadoop client conf/ directories in the local filesystem.
</description>
</property>

In oozie.service.WorkflowAppService.system.libpath, give your NameNode port
number. Please refer to the below for the same.

<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>hdfs://localhost:9000/user/${user.name}/share/lib</value>
<description>
System library path to use for workflow applications.
This path is added to workflow application if their job properties sets
the property 'oozie.use.system.libpath' to true.
</description>
</property>

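Now give ownership of the oozie folder to your Hadoop user by using the below
command, where hadoop_user_name is your Hadoop user name and oozie_folder_path is
the path of the oozie folder (in our case it is $HOME/oozie):

sudo chown hadoop_user_name oozie_folder_path

Add the -R option to change the ownership of everything inside the folder as well.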

Creating Sharelib directory in HDFS

Note: Make sure that all your hadoop daemons are started properly.

Move into the bin folder of the newly created oozie-4.1.0.

Now create a directory in HDFS with the name sharelib for storing the Oozie
contents, using the below command:

./oozie-setup.sh sharelib create -fs hdfs://localhost:9000

The above command will create a folder with the name sharelib in HDFS.

You will get a message as follows:

the destination path for sharelib is: hdfs://localhost:9000/user/kiran/share/lib/
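The destination in that message matches the pattern of the system libpath value set in oozie-site.xml, with the user name filled in. As a small worked example (system_libpath is a name made up for this sketch, and kiran is the user seen in the message above):

```shell
# Build the share lib path from the NameNode URI and a user name,
# following the pattern hdfs://HOST:PORT/user/<user>/share/lib.
system_libpath() {
    printf '%s/user/%s/share/lib\n' "$1" "$2"
}

# Usage:
#   system_libpath hdfs://localhost:9000 kiran
```

If the path in your message uses a different host, port or user than you expect, recheck the value in oozie-site.xml.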

Creating Oozie DB

Before creating the Oozie DB, make sure that you have installed mysql-server on your
system.

If you haven't installed MySQL, install it by using one of the below commands.

Command to install mysql-server in CentOS

sudo yum install mysql-server

Command to install mysql-server in Ubuntu

sudo apt-get install mysql-server

After the installation of the MySQL server, move into the bin folder of the newly
created oozie-4.1.0 and then type the below command

./ooziedb.sh create -sqlfile oozie.sql -run

After running this command successfully, you will get the below output

setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '4.1.0'

The SQL commands have been written to: oozie.sql

With this step, your oozie installation is completed.


Now add the bin path of the newly created oozie-4.1.0 to your PATH in the .bashrc file.
Open the .bashrc file from your home folder by using the below command

gedit .bashrc
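The line to add to .bashrc is not shown here. A minimal sketch, assuming Oozie was copied to $HOME/oozie/oozie-4.1.0 as in the earlier steps:

```shell
# Append the Oozie bin folder to PATH so that oozied.sh and the other
# scripts can be run from any directory. Adjust the path if your Oozie
# folder is somewhere else.
export PATH="$PATH:$HOME/oozie/oozie-4.1.0/bin"
```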
After editing the .bashrc file, save and close it. Now reload the .bashrc
file by using the below command

source .bashrc

Now your Oozie is successfully configured with your Hadoop cluster. Now start Oozie
by using the command

oozied.sh start

Now your Oozie is successfully started. You can also check the same with the web UI.
Open your browser and type localhost:11000/oozie; 11000 is the default port for
Oozie.
All the active and suspended jobs can be seen in the web UI.
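The server can also be checked from the terminal with the oozie command line client, which takes the server URL. The sketch below only builds that URL; oozie_url is a name made up for this example, and the defaults assume the server runs on localhost with the default port.

```shell
# Build the Oozie server URL. Defaults: host localhost, port 11000.
oozie_url() {
    printf 'http://%s:%s/oozie\n' "${1:-localhost}" "${2:-11000}"
}

# Usage, once the server is up:
#   oozie admin -oozie "$(oozie_url)" -status
```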

We have successfully installed Oozie-4.1.0 on hadoop 2.x cluster.

Hope this blog helped you in installing Oozie in your Hadoop cluster. Keep visiting
our website Acadgild for more updates on Big Data and other technologies.