
Dinesh's Blog: Image processing using OpenCV on Hadoop using HIPI http://dinesh-malav.blogspot.co.id/2015/05/image-processing-using-op...

Monday, May 18, 2015

Image processing using OpenCV on Hadoop using HIPI


HIPI meets OpenCV

Problem

This project addresses the problem of processing large collections of images on
Apache Hadoop, using the Hadoop Image Processing Interface (HIPI) for storage and
efficient distributed processing, combined with OpenCV, an open source library of
rich image processing algorithms. A program that counts the number of faces in a
collection of images is demonstrated.

Background

Processing a large set of images on a single machine can be very time consuming
and costly. HIPI is an image processing library designed to be used with
Apache Hadoop MapReduce, a software framework for sorting and processing
big data in a distributed fashion on large clusters of commodity hardware. HIPI
facilitates efficient and high-throughput image processing with MapReduce-style
parallel programs typically executed on a cluster. It provides a solution for
storing a large collection of images on the Hadoop Distributed File System (HDFS)
and making them available for efficient distributed processing.

1 of 38 03/03/16 16:15

OpenCV (Open Source Computer Vision) is an open source library of rich image
processing algorithms, mainly aimed at real-time computer vision. Starting with
version 2.4.4, OpenCV provides desktop Java bindings, which can be used with
Apache Hadoop.

Goal

This project demonstrates how HIPI and OpenCV can be used together to count
the total number of faces in a large image dataset.

Overview of Steps

1. Download VMWare Fusion


2. Download and Setup Cloudera Quickstart VM 5.4.x
3. Getting Started with HIPI
Setup Hadoop
Install Apache Ant
Install and build HIPI
Running a sample HIPI MapReduce Program
4. Getting Started with OpenCV Using Java
Download OpenCV source
CMake build system
Build OpenCV for Java
Running OpenCV Face Detection Program
5. Configure Hadoop for OpenCV
6. HIPI with OpenCV
7. Build FaceCount.java as facecount.jar
8. Run FaceCount MapReduce job

Big Data Set


Test images for face detection

Input: image dataset containing 158 images (34 MB)

Format: PNG image files
Source: http://vasc.ri.cmu.edu/idb/images/face/frontal_images/

The downloaded images were in GIF format; I used the Mac OS X Preview
application to convert them to PNG.
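If you would rather script the conversion than click through Preview, the same GIF-to-PNG step can be done with Java's built-in ImageIO. This is a minimal sketch; the `GifToPng` class name and directory-argument convention are made up for illustration:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class GifToPng {
    // Converts every .gif file in the given directory to a .png next to it.
    public static int convertDir(File dir) throws Exception {
        int converted = 0;
        for (File f : dir.listFiles()) {
            if (f.getName().toLowerCase().endsWith(".gif")) {
                BufferedImage img = ImageIO.read(f); // decode GIF
                String name = f.getName().replaceAll("(?i)\\.gif$", ".png");
                ImageIO.write(img, "png", new File(dir, name)); // encode PNG
                converted++;
            }
        }
        return converted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convertDir(new File(args[0])) + " images converted");
    }
}
```

Run it with the image directory as the only argument, e.g. `java GifToPng ~/Project/images`.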

Other sources for face detection image datasets:


https://www.bioid.com/About/BioID-Face-Database
https://facedetection.com/datasets/

Technologies Used:

VMWare Fusion: software hypervisor for running the Cloudera Quickstart VM

Cloudera Quickstart VM: VM with a single-node Hadoop cluster for testing and
running map/reduce programs

IntelliJ IDEA 14 CE: Java IDE for editing and compiling Java code

Hadoop Image Processing Interface (HIPI): image processing library designed to
be used with the Apache Hadoop MapReduce parallel programming framework, for
storing large collections of images on HDFS and efficient distributed processing

OpenCV: an image processing library aimed at real-time computer vision

Apache Hadoop: distributed processing of large data sets

References:
http://hipi.cs.virginia.edu/index.html
http://docs.opencv.org/doc/tutorials/introduction/desktop_java/java_dev_intro.html
http://radar.oreilly.com/2013/12/how-to-analyze-100-million-images-for-624.html

Steps

1. Download VMWare Fusion

VMware Fusion is a software hypervisor developed by VMware for computers
running OS X with Intel processors. Fusion allows Intel-based Macs to run
operating systems such as Microsoft Windows, Linux, NetWare, or Solaris in
virtual machines, alongside their Mac OS X operating system, using a
combination of paravirtualization, hardware virtualization, and dynamic
recompilation. (http://en.wikipedia.org/wiki/VMware_Fusion)

Download and install VMWare Fusion from the following URL; it will be used to
run the Cloudera Quickstart VM.

https://my.vmware.com/web/vmware/details?downloadGroup=FUS-711&productId=450&rPId=7446

2. Download and Setup Cloudera Quickstart VM 5.4.x

The Cloudera QuickStart VMs contain a single-node Apache Hadoop cluster,
complete with example data, queries, scripts, and Cloudera Manager to manage
your cluster. The VMs run CentOS 6.4 and are available for VMware, VirtualBox,
and KVM. This gives us all the tools needed to run image processing on Hadoop.

Download Cloudera Quickstart VM 5.4.x from the following URL. The Quickstart VM
5.4 has Hadoop 2.6 installed, which is needed for HIPI.

http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-4-x.html

Open VMWare Fusion and open the VM

3. Getting started with HIPI

The following steps demonstrate how to set up HIPI to run a MapReduce job on
Apache Hadoop.

Setup Hadoop
The Cloudera Quickstart VM 5.4.x comes pre-installed with Hadoop 2.6, which is
needed for running HIPI.

Check that Hadoop is installed and has the correct version:

[cloudera@quickstart Project]$ which hadoop


/usr/bin/hadoop
[cloudera@quickstart Project]$ hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r

c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-
cdh5.4.0.jar

Install Apache Ant


Install Apache Ant and check that it is added to PATH:

[cloudera@quickstart Project]$ which ant


/usr/local/apache-ant/apache-ant-1.9.2/bin/ant

Install and build HIPI


There are two ways to install HIPI:
1. Clone the latest HIPI distribution from GitHub and build from source.
(https://github.com/uvagfx/hipi)
2. Download a precompiled JAR from the downloads page.
(http://hipi.cs.virginia.edu/downloads.html)

Clone HIPI GitHub repository

The best way to check and verify that your system is properly setup is to clone
the official GitHub repository and build the tools and example programs.

[cloudera@quickstart Project]$ git clone https://github.com/uvagfx


/hipi.git
Initialized empty Git repository in /home/cloudera/Project
/hipi/.git/
remote: Counting objects: 2882, done.
remote: Total 2882 (delta 0), reused 0 (delta 0), pack-reused 2882
Receiving objects: 100% (2882/2882), 222.33 MiB | 7.03 MiB/s, done.
Resolving deltas: 100% (1767/1767), done.

Download Apache Hadoop tarball


Download the Apache Hadoop tarball from the following URL and untar it; this is
needed to build HIPI.
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics
/cdh_vd_cdh_package_tarball.html

[cloudera@quickstart Project]$ tar -xvzf /mnt/hgfs/CSCEI63/project

/hadoop-2.6.0-cdh5.4.0.tar.gz
[cloudera@quickstart Project]$ ls
hadoop-2.6.0-cdh5.4.0 hipi

Build HIPI binaries

Change directory to hipi repo and build HIPI.

[cloudera@quickstart Project]$ cd hipi/


[cloudera@quickstart hipi]$ ls
3rdparty data examples license.txt release
build.xml doc libsrc README.md util

Before building HIPI, the hadoop.home and hadoop.version properties in the
build.xml file should be updated with the path to the Hadoop installation and
the version of Hadoop we are using. Change:

build.xml

<!-- IMPORTANT: You must update the following two properties


according to your Hadoop setup -->
<!-- <property name="hadoop.home" value="/usr/local/Cellar
/hadoop/2.6.0/libexec/share/hadoop" /> -->
<!-- <property name="hadoop.version" value="2.6.0" /> -->

to

<!-- IMPORTANT: You must update the following two properties


according to your Hadoop setup -->
<property name="hadoop.home" value="/home/cloudera/Project
/hadoop-2.6.0-cdh5.4.0/share/hadoop" />
<property name="hadoop.version" value="2.6.0-cdh5.4.0" />

Build HIPI using ant

[cloudera@quickstart hipi]$ ant


Buildfile: /home/cloudera/Project/hipi/build.xml

hipi:
[javac] Compiling 30 source files to /home/cloudera/Project
/hipi/lib
[jar] Building jar: /home/cloudera/Project/hipi/lib/hipi-
2.0.jar
[echo] Hipi library built.

compile:
[javac] Compiling 1 source file to /home/cloudera/Project
/hipi/bin
[jar] Building jar: /home/cloudera/Project/hipi/examples
/covariance.jar
[echo] Covariance built.

all:

BUILD SUCCESSFUL
Total time: 36 seconds

Make sure it builds tools and examples:

[cloudera@quickstart hipi]$ ls
3rdparty build.xml doc lib license.txt release util
bin data examples libsrc README.md tool
[cloudera@quickstart hipi]$ ls tool/
hibimport.jar
[cloudera@quickstart hipi]$ ls examples/
covariance.jar hipi runCreateSequenceFile.sh
createsequencefile.jar jpegfromhib.jar runDownloader.sh
downloader.jar rumDumpHib.sh runJpegFromHib.sh
dumphib.jar runCovariance.sh testimages.txt

Sample MapReduce Java Program

Create a SampleProgram.java class in the sample folder to run a simple
map/reduce program using the sample.hib image bundle (created below with the
hibimport tool):

[cloudera@quickstart hipi]$ mkdir sample


[cloudera@quickstart hipi]$ vi sample/SampleProgram.java

SampleProgram.java

import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class SampleProgram extends Configured implements Tool {

public static class SampleProgramMapper extends Mapper<ImageHeader,


FloatImage, IntWritable, FloatImage> {
public void map(ImageHeader key, FloatImage value, Context context)
throws IOException, InterruptedException {

// Verify that image was properly decoded, is of sufficient


size, and has three color channels (RGB)
if (value != null && value.getWidth() > 1 && value.getHeight()
> 1 && value.getBands() == 3) {

// Get dimensions of image


int w = value.getWidth();
int h = value.getHeight();

// Get pointer to image data


float[] valData = value.getData();

// Initialize 3 element array to hold RGB pixel average


float[] avgData = {0,0,0};

// Traverse image pixel data in raster-scan order and


update running average
for (int j = 0; j < h; j++) {

for (int i = 0; i < w; i++) {


avgData[0] += valData[(j*w+i)*3+0]; // R
avgData[1] += valData[(j*w+i)*3+1]; // G
avgData[2] += valData[(j*w+i)*3+2]; // B
}
}

// Create a FloatImage to store the average value


FloatImage avg = new FloatImage(1, 1, 3, avgData);

// Divide by number of pixels in image


avg.scale(1.0f/(float)(w*h));

// Emit record to reducer


context.write(new IntWritable(1), avg);

} // If (value != null...

} // map()
}

public static class SampleProgramReducer extends Reducer<IntWritable,


FloatImage, IntWritable, Text> {
public void reduce(IntWritable key, Iterable<FloatImage> values,
Context context)
throws IOException, InterruptedException {

// Create FloatImage object to hold final result


FloatImage avg = new FloatImage(1, 1, 3);

// Initialize a counter and iterate over


IntWritable/FloatImage records from mapper
int total = 0;
for (FloatImage val : values) {
avg.add(val);
total++;
}

if (total > 0) {
// Normalize sum to obtain average
avg.scale(1.0f / total);
// Assemble final output as string
float[] avgData = avg.getData();
String result = String.format("Average pixel value: %f %f
%f", avgData[0], avgData[1], avgData[2]);
// Emit output of job which will be written to HDFS

context.write(key, new Text(result));


}

} // reduce()
}

public int run(String[] args) throws Exception {


// Check input arguments
if (args.length != 2) {
System.out.println("Usage: firstprog <input HIB> <output
directory>");
System.exit(0);
}

// Initialize and configure MapReduce job


Job job = Job.getInstance();
// Set input format class which parses the input HIB and spawns
map tasks
job.setInputFormatClass(ImageBundleInputFormat.class);
// Set the driver, mapper, and reducer classes which express the
computation
job.setJarByClass(SampleProgram.class);
job.setMapperClass(SampleProgramMapper.class);
job.setReducerClass(SampleProgramReducer.class);
// Set the types for the key/value pairs passed to/from map and
reduce layers
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(FloatImage.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);

// Set the input and output paths on the HDFS


FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Execute the MapReduce job and block until it completes


boolean success = job.waitForCompletion(true);

// Return success or failure


return success ? 0 : 1;
}

public static void main(String[] args) throws Exception {


ToolRunner.run(new SampleProgram(), args);
System.exit(0);
}
}

Add a new build target to hipi/build.xml to build SampleProgram.java and create a


sample.jar library:

build.xml

<target name="sample">
<antcall target="compile">
<param name="srcdir" value="sample" />
<param name="jarfilename" value="sample.jar" />
<param name="jardir" value="sample" />
<param name="mainclass" value="SampleProgram" />
</antcall>
</target>
...

Build SampleProgram

[cloudera@quickstart hipi]$ ant sample


Buildfile: /home/cloudera/Project/hipi/build.xml
...
compile:
[jar] Building jar: /home/cloudera/Project/hipi/sample
/sample.jar

BUILD SUCCESSFUL
Total time: 16 seconds

Running a sample HIPI MapReduce Program

Create a sample.hib file on HDFS from the sample images provided with HIPI
using the hibimport tool; this will be the input to the MapReduce program.

[cloudera@quickstart hipi]$ hadoop jar tool/hibimport.jar data/test


/ImageBundleTestCase/read/
0.jpg 1.jpg 2.jpg 3.jpg
[cloudera@quickstart hipi]$ hadoop jar tool/hibimport.jar data/test
/ImageBundleTestCase/read examples/sample.hib
** added: 2.jpg
** added: 3.jpg
** added: 0.jpg
** added: 1.jpg
Created: examples/sample.hib and examples/sample.hib.dat

[cloudera@quickstart hipi]$ hadoop fs -ls examples


Found 2 items
-rw-r--r-- 1 cloudera cloudera 80 2015-05-09 22:19
examples/sample.hib
-rw-r--r-- 1 cloudera cloudera 1479345 2015-05-09 22:19
examples/sample.hib.dat

Running Hadoop MapReduce program

[cloudera@quickstart hipi]$ hadoop jar sample/sample.jar


examples/sample.hib examples/output
15/05/09 23:05:10 INFO client.RMProxy: Connecting to
ResourceManager at /0.0.0.0:8032

15/05/09 23:07:05 INFO mapreduce.Job: Job job_1431127776378_0001
running in uber mode : false
15/05/09 23:07:05 INFO mapreduce.Job: map 0% reduce 0%
15/05/09 23:08:55 INFO mapreduce.Job: map 57% reduce 0%
15/05/09 23:09:04 INFO mapreduce.Job: map 100% reduce 0%
15/05/09 23:09:38 INFO mapreduce.Job: map 100% reduce 100%
15/05/09 23:09:39 INFO mapreduce.Job: Job job_1431127776378_0001
completed successfully

File Output Format Counters
Bytes Written=50

Check the program output:


[cloudera@quickstart hipi]$ hadoop fs -ls examples/output
Found 2 items
-rw-r--r-- 1 cloudera cloudera 0 2015-05-09 23:09
examples/output/_SUCCESS
-rw-r--r-- 1 cloudera cloudera 50 2015-05-09 23:09
examples/output/part-r-00000

The average pixel value calculated over all the images is:

[cloudera@quickstart hipi]$ hadoop fs -cat examples/output/part-


r-00000
1 Average pixel value: 0.420624 0.404933 0.380449
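The averaging behind this output happens in two stages: each mapper emits the mean RGB of one image, and the reducer averages those per-image means. The stages can be reproduced in plain Java (a standalone sketch; `TwoStageAverage` and its method names are made up, and note that averaging per-image means weights every image equally regardless of its size):

```java
public class TwoStageAverage {
    // Stage 1 (mapper): mean RGB of one image whose pixels are stored as
    // interleaved floats in raster-scan order, as HIPI's FloatImage does.
    public static float[] imageMean(float[] pixels) {
        float[] sum = new float[3];
        int n = pixels.length / 3; // number of pixels
        for (int p = 0; p < n; p++)
            for (int c = 0; c < 3; c++)
                sum[c] += pixels[p * 3 + c];
        for (int c = 0; c < 3; c++) sum[c] /= n;
        return sum;
    }

    // Stage 2 (reducer): mean of the per-image means.
    public static float[] meanOfMeans(float[][] means) {
        float[] avg = new float[3];
        for (float[] m : means)
            for (int c = 0; c < 3; c++) avg[c] += m[c];
        for (int c = 0; c < 3; c++) avg[c] /= means.length;
        return avg;
    }
}
```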

4. Getting started with OpenCV using Java

OpenCV is an image processing library. It contains a large collection of image
processing functions. Starting from version 2.4.4, OpenCV includes desktop Java
bindings. We will use version 2.4.11 to build the Java bindings and use them
with HIPI to run image processing.

Download OpenCV source:

The zip bundle of the OpenCV 2.4.11 source can be downloaded from the following
URL; unzip it to the ~/Project/opencv directory.
http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.4.11/

[cloudera@quickstart Project]$ mkdir opencv && cd opencv


[cloudera@quickstart opencv]$ unzip /mnt/hgfs/CSCEI63/project
/opencv-2.4.11.zip

CMake build system


CMake is a family of tools designed to build, test and package software. CMake
is used to control the software compilation process using simple platform and
compiler independent configuration files. CMake generates native makefiles and
workspaces that can be used in the compiler environment of your choice.
(http://www.cmake.org/)

OpenCV is built with CMake. Download the CMake binaries from
http://www.cmake.org/download/ and untar the bundle to the ~/Project/opencv
directory:

[cloudera@quickstart opencv]$ tar -xvzf cmake-3.2.2-Linux-


x86_64.tar.gz

Build OpenCV for Java

The following steps detail how to build OpenCV on Linux.

Configure OpenCV for builds on Linux

[cloudera@quickstart opencv-2.4.11]$ ../cmake-3.2.2-Linux-x86_64


/bin/cmake -DBUILD_SHARED_LIBS=OFF
.
.
. Target "opencv_haartraining_engine" links to itself.
This warning is for project developers. Use -Wno-dev to suppress
it.

-- Generating done

-- Build files have been written to: /home/cloudera/Project/opencv


/opencv-2.4.11

Build OpenCV

[cloudera@quickstart opencv-2.4.11]$ make


.
.
.
[100%] Building CXX object apps/traincascade/CMakeFiles
/opencv_traincascade.dir/imagestorage.cpp.o
Linking CXX executable ../../bin/opencv_traincascade
[100%] Built target opencv_traincascade
Scanning dependencies of target opencv_annotation
[100%] Building CXX object apps/annotation/CMakeFiles
/opencv_annotation.dir/opencv_annotation.cpp.o
Linking CXX executable ../../bin/opencv_annotation
[100%] Built target opencv_annotation

This will create a jar containing the Java interface (bin/opencv-2411.jar) and a
native dynamic library containing the Java bindings and all the OpenCV
functionality (lib/libopencv_java2411.so). We'll use these files to build and
run the OpenCV program.

[cloudera@quickstart opencv-2.4.11]$ ls lib | grep .so


libopencv_java2411.so
[cloudera@quickstart opencv-2.4.11]$ ls bin | grep .jar
opencv-2411.jar

Running OpenCV Face Detection Program

The following steps are run to make sure OpenCV is set up correctly and works
as expected.

Create a new directory named sample and create an ant build.xml file in it.

[cloudera@quickstart opencv]$ mkdir sample && cd sample


[cloudera@quickstart sample]$ vi build.xml

build.xml

<project name="Main" basedir="." default="rebuild-run">

<property name="src.dir" value="src"/>

<property name="lib.dir" value="${ocvJarDir}"/>


<path id="classpath">
<fileset dir="${lib.dir}" includes="**/*.jar"/>
</path>

<property name="build.dir" value="build"/>


<property name="classes.dir" value="${build.dir}/classes"/>
<property name="jar.dir" value="${build.dir}/jar"/>

<property name="main-class" value="${ant.project.name}"/>

<target name="clean">
<delete dir="${build.dir}"/>
</target>

<target name="compile">
<mkdir dir="${classes.dir}"/>
<javac includeantruntime="false" srcdir="${src.dir}"
destdir="${classes.dir}" classpathref="classpath"/>
</target>

<target name="jar" depends="compile">


<mkdir dir="${jar.dir}"/>
<jar destfile="${jar.dir}/${ant.project.name}.jar"
basedir="${classes.dir}">
<manifest>
<attribute name="Main-Class" value="${main-class}"/>
</manifest>
</jar>
</target>

<target name="run" depends="jar">


<java fork="true" classname="${main-class}">
<sysproperty key="java.library.path"
path="${ocvLibDir}"/>
<classpath>
<path refid="classpath"/>
<path location="${jar.dir}/${ant.project.name}.jar"/>
</classpath>
</java>
</target>

<target name="rebuild" depends="clean,jar"/>

<target name="rebuild-run" depends="clean,run"/>

</project>

Write a program using OpenCV to detect the number of faces in an image.

DetectFaces.java

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.highgui.*;
import org.opencv.core.MatOfRect;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.objdetect.CascadeClassifier;

import java.io.File;

/**
* Created by dmalav on 4/30/15.
*/
public class DetectFaces {

public void run(String imageFile) {


System.out.println("\nRunning DetectFaceDemo");

// Create a face detector from the cascade file in the resources


// directory.
String xmlPath = "/home/cloudera/project/opencv-examples
/lbpcascade_frontalface.xml";
System.out.println(xmlPath);
CascadeClassifier faceDetector = new CascadeClassifier(xmlPath);
Mat image = Highgui.imread(imageFile);

// Detect faces in the image.


// MatOfRect is a special container class for Rect.
MatOfRect faceDetections = new MatOfRect();
faceDetector.detectMultiScale(image, faceDetections);

System.out.println(String.format("Detected %s faces",
faceDetections.toArray().length));

// Draw a bounding box around each face.


for (Rect rect : faceDetections.toArray()) {
Core.rectangle(image, new Point(rect.x, rect.y), new
Point(rect.x + rect.width, rect.y + rect.height), new Scalar(0, 255, 0));
}

File f = new File(imageFile);


System.out.println(f.getName());
// Save the visualized detection.
String filename = f.getName();
System.out.println(String.format("Writing %s", filename));
Highgui.imwrite(filename, image);

}
}

Main.java

import org.opencv.core.Core;

import java.io.File;

public class Main {

public static void main(String... args) {

System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

if (args.length == 0) {
System.err.println("Usage Main /path/to/images");
System.exit(1);
}

File[] files = new File(args[0]).listFiles();


showFiles(files);
}

public static void showFiles(File[] files) {


DetectFaces faces = new DetectFaces();
for (File file : files) {
if (file.isDirectory()) {
System.out.println("Directory: " + file.getName());
showFiles(file.listFiles()); // Calls same method again.

} else {
System.out.println("File: " + file.getAbsolutePath());
faces.run(file.getAbsolutePath());
}
}
}
}

Build Face Detection Java Program

[cloudera@quickstart sample]$ ant -DocvJarDir=/home/cloudera/Project


/opencv/opencv-2.4.11/bin -DocvLibDir=/home/cloudera/Project/opencv
/opencv-2.4.11/lib jar
Buildfile: /home/cloudera/Project/opencv/sample/build.xml

compile:
[mkdir] Created dir: /home/cloudera/Project/opencv/sample/build
/classes
[javac] Compiling 2 source files to /home/cloudera/Project
/opencv/sample/build/classes

jar:
[mkdir] Created dir: /home/cloudera/Project/opencv/sample
/build/jar
[jar] Building jar: /home/cloudera/Project/opencv/sample/build
/jar/Main.jar

BUILD SUCCESSFUL
Total time: 3 seconds

This build creates a build/jar/Main.jar file which can be used to detect faces from
images stored in a directory:

[cloudera@quickstart sample]$ java -cp ../opencv-2.4.11/bin/opencv-


2411.jar:build/jar/Main.jar -Djava.library.path=../opencv-
2.4.11/lib Main /mnt/hgfs/CSCEI63/project/images2
File: /mnt/hgfs/CSCEI63/project/images2/addams-family.png

Running DetectFaceDemo
/home/cloudera/Project/opencv/sample/lbpcascade_frontalface.xml
Detected 7 faces
addams-family.png
Writing addams-family.png

OpenCV detected faces

OpenCV does a fairly good job of detecting front-facing faces when the
lbpcascade_frontalface.xml classifier is used. There are other classifiers
provided by OpenCV that can detect rotated faces and other face orientations.

5. Configure Hadoop for OpenCV

To run OpenCV Java code, the native library (Core.NATIVE_LIBRARY_NAME) must be
added to Hadoop's java.library.path. The following steps detail how to set up
the OpenCV native library with Hadoop.

Copy OpenCV native library libopencv_java2411.so to /etc/opencv/lib

[cloudera@quickstart opencv]$ pwd


/home/cloudera/Project/opencv
[cloudera@quickstart opencv]$ ls
cmake-3.2.2-Linux-x86_64 opencv-2.4.11 sample test
[cloudera@quickstart opencv]$ sudo cp opencv-2.4.11/lib
/libopencv_java2411.so /etc/opencv/lib/

Set JAVA_LIBRARY_PATH in the /usr/lib/hadoop/libexec/hadoop-config.sh file to
point to the OpenCV native library.

[cloudera@quickstart Project]$ vi /usr/lib/hadoop/libexec/hadoop-


config.sh
.
.
# setup 'java.library.path' for native-hadoop code if necessary

if [ -d "${HADOOP_PREFIX}/build/native" -o -d
"${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR" ]; then

if [ -d "${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR" ]; then
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then

JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_PREFIX}/$HADOOP
_COMMON_LIB_NATIVE_DIR
else

JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DI
R
fi
fi
fi

# setup opencv native library path


JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:/etc/opencv/lib

.
.

6. HIPI with OpenCV

This step details Java code for combining HIPI with OpenCV.

HIPI uses the HipiImageBundle class to represent a collection of images on
HDFS, and FloatImage to represent an image in memory. The FloatImage must be
converted to the OpenCV Mat format for image processing, in this case counting
faces.

The following method is used to convert a FloatImage to a Mat:

// Convert HIPI FloatImage to OpenCV Mat


public Mat convertFloatImageToOpenCVMat(FloatImage floatImage) {

// Get dimensions of image


int w = floatImage.getWidth();
int h = floatImage.getHeight();

// Get pointer to image data


float[] valData = floatImage.getData();

// Initialize a 3 element array to hold the RGB values of one pixel
double[] rgb = {0.0,0.0,0.0};

Mat mat = new Mat(h, w, CvType.CV_8UC3);

// Traverse image pixel data in raster-scan order, scaling each channel
// from HIPI's [0,1] floats to 8-bit [0,255] values
for (int j = 0; j < h; j++) {
for (int i = 0; i < w; i++) {
rgb[0] = (double) valData[(j*w+i)*3+0] * 255.0; // R
rgb[1] = (double) valData[(j*w+i)*3+1] * 255.0; // G
rgb[2] = (double) valData[(j*w+i)*3+2] * 255.0; // B
mat.put(j, i, rgb);
}
}

return mat;
}

To count the number of faces in an image we need to create a CascadeClassifier,
which uses a classifier file. This file must be present on HDFS; this can be
accomplished with the Job.addCacheFile method, and the file is later retrieved
in the Mapper class.

public int run(String[] args) throws Exception {

....

// Initialize and configure MapReduce job


Job job = Job.getInstance();

....

// add cascade file


job.addCacheFile(new URI("/user/cloudera
/lbpcascade_frontalface.xml#lbpcascade_frontalface.xml"));

// Execute the MapReduce job and block until it completes


boolean success = job.waitForCompletion(true);

// Return success or failure


return success ? 0 : 1;
}

Override the Mapper setup method to load the OpenCV native library and create
the CascadeClassifier for face detection:

public static class FaceCountMapper extends Mapper<ImageHeader,


FloatImage, IntWritable, IntWritable> {

// Create a face detector from the cascade file in the resources


// directory.
private CascadeClassifier faceDetector;

public void setup(Context context)


throws IOException, InterruptedException {

// Load OpenCV native library


try {
System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
} catch (UnsatisfiedLinkError e) {
System.err.println("Native code library failed to load.\n"
+ e + Core.NATIVE_LIBRARY_NAME);
System.exit(1);
}

// Load cached cascade file for front face detection and


create CascadeClassifier
if (context.getCacheFiles() != null &&
context.getCacheFiles().length > 0) {
URI mappingFileUri = context.getCacheFiles()[0];

if (mappingFileUri != null) {
faceDetector = new
CascadeClassifier("./lbpcascade_frontalface.xml");

} else {
System.out.println(">>>>>> NO MAPPING FILE");
}
} else {
System.out.println(">>>>>> NO CACHE FILES AT ALL");
}

super.setup(context);
} // setup()
....
}

Full listing of FaceCount.java.

Mapper:
1. Load OpenCV native library
2. Create CascadeClassifier
3. Convert HIPI FloatImage to OpenCV Mat
4. Detect and count faces in the image
5. Write number of faces detected to context

Reducer:
1. Count number of files processed
2. Count number of faces detected
3. Output number of files and faces detected
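Stripped of the Hadoop plumbing, the reducer's aggregation amounts to the following standalone sketch. `FaceCountAggregate` and its method name are hypothetical; the real reducer would run this loop inside reduce() over the IntWritable values, where each incoming value is the face count one mapper emitted for one image:

```java
public class FaceCountAggregate {
    // Mirrors the reduce() loop: count records (files processed) and
    // sum the per-image face counts, returning {files, faces}.
    public static int[] aggregate(Iterable<Integer> faceCounts) {
        int files = 0;
        int faces = 0;
        for (int count : faceCounts) {
            files++;
            faces += count;
        }
        return new int[]{files, faces};
    }

    public static void main(String[] args) {
        int[] totals = aggregate(java.util.Arrays.asList(7, 0, 3));
        System.out.println(totals[0] + " files and " + totals[1] + " faces");
    }
}
```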

FaceCount.java

import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

import org.opencv.core.*;
import org.opencv.objdetect.CascadeClassifier;

import java.io.IOException;
import java.net.URI;

public class FaceCount extends Configured implements Tool {

public static class FaceCountMapper extends Mapper<ImageHeader, FloatImage, IntWritable, IntWritable> {

// Create a face detector from the cascade file in the resources directory.
private CascadeClassifier faceDetector;

// Convert HIPI FloatImage to OpenCV Mat
public Mat convertFloatImageToOpenCVMat(FloatImage floatImage) {

// Get dimensions of image
int w = floatImage.getWidth();
int h = floatImage.getHeight();

// Get pointer to image data
float[] valData = floatImage.getData();

// Initialize 3 element array to hold the RGB values of one pixel
double[] rgb = {0.0, 0.0, 0.0};

Mat mat = new Mat(h, w, CvType.CV_8UC3);

// Traverse image pixel data in raster-scan order and copy each pixel into the Mat
for (int j = 0; j < h; j++) {
for (int i = 0; i < w; i++) {
rgb[0] = (double) valData[(j*w+i)*3+0] * 255.0; // R
rgb[1] = (double) valData[(j*w+i)*3+1] * 255.0; // G
rgb[2] = (double) valData[(j*w+i)*3+2] * 255.0; // B

mat.put(j, i, rgb);
}
}

return mat;
}

// Count faces in image


public int countFaces(Mat image) {

// Detect faces in the image.


// MatOfRect is a special container class for Rect.
MatOfRect faceDetections = new MatOfRect();
faceDetector.detectMultiScale(image, faceDetections);

return faceDetections.toArray().length;
}

public void setup(Context context) throws IOException, InterruptedException {

// Load OpenCV native library
try {
System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
} catch (UnsatisfiedLinkError e) {
System.err.println("Native code library failed to load.\n" + e + Core.NATIVE_LIBRARY_NAME);
System.exit(1);
}

// Load cached cascade file for frontal face detection and create CascadeClassifier
if (context.getCacheFiles() != null && context.getCacheFiles().length > 0) {
URI mappingFileUri = context.getCacheFiles()[0];

if (mappingFileUri != null) {
faceDetector = new CascadeClassifier("./lbpcascade_frontalface.xml");
} else {
System.out.println(">>>>>> NO MAPPING FILE");
}
} else {
System.out.println(">>>>>> NO CACHE FILES AT ALL");
}

super.setup(context);
} // setup()

public void map(ImageHeader key, FloatImage value, Context context) throws IOException, InterruptedException {

// Verify that image was properly decoded, is of sufficient size, and has three color channels (RGB)
if (value != null && value.getWidth() > 1 && value.getHeight() > 1 && value.getBands() == 3) {

Mat cvImage = this.convertFloatImageToOpenCVMat(value);

int faces = this.countFaces(cvImage);

System.out.println(">>>>>> Detected Faces: " + Integer.toString(faces));

// Emit record to reducer
context.write(new IntWritable(1), new IntWritable(faces));

} // If (value != null...

} // map()
}

public static class FaceCountReducer extends Reducer<IntWritable, IntWritable, IntWritable, Text> {
public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {

// Initialize counters and iterate over the IntWritable records from the mapper
int total = 0;
int images = 0;
for (IntWritable val : values) {
total += val.get();
images++;
}

String result = String.format("Total face detected: %d", total);
// Emit output of job which will be written to HDFS
context.write(new IntWritable(images), new Text(result));
} // reduce()
}

public int run(String[] args) throws Exception {


// Check input arguments
if (args.length != 2) {
System.out.println("Usage: firstprog <input HIB> <output directory>");
System.exit(0);
}

// Initialize and configure MapReduce job


Job job = Job.getInstance();
// Set input format class which parses the input HIB and spawns map tasks
job.setInputFormatClass(ImageBundleInputFormat.class);
// Set the driver, mapper, and reducer classes which express the computation
job.setJarByClass(FaceCount.class);
job.setMapperClass(FaceCountMapper.class);
job.setReducerClass(FaceCountReducer.class);
// Set the types for the key/value pairs passed to/from map and reduce layers
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);

// Set the input and output paths on the HDFS


FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// add cascade file


job.addCacheFile(new URI("/user/cloudera/lbpcascade_frontalface.xml#lbpcascade_frontalface.xml"));

// Execute the MapReduce job and block until it completes


boolean success = job.waitForCompletion(true);

// Return success or failure


return success ? 0 : 1;
}

public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new FaceCount(), args);
System.exit(exitCode);
}
}

7. Build FaceCount.java as facecount.jar

Create a new facecount directory in the hipi folder (where HIPI was built) and copy FaceCount.java from the previous step into it.

[cloudera@quickstart hipi]$ pwd


/home/cloudera/Project/hipi
[cloudera@quickstart hipi]$ mkdir facecount
[cloudera@quickstart hipi]$ cp /mnt/hgfs/CSCEI63/project/hipi/src/FaceCount.java facecount/
[cloudera@quickstart hipi]$ ls
3rdparty data facecount libsrc README.md sample
bin doc hipiwrapper license.txt release tool
build.xml examples lib my.diff run.sh util
[cloudera@quickstart hipi]$ ls facecount/
FaceCount.java

Modify the HIPI build.xml Ant script to link against the OpenCV jar file, and add a new build target, facecount.

build.xml

<project basedir="." default="all">

<target name="setup">

....

<!-- opencv dependencies -->
<property name="opencv.jar" value="../opencv/opencv-2.4.11/bin/opencv-2411.jar" />

<echo message="Properties set."/>


</target>

<target name="compile" depends="setup,test_settings,hipi">

<mkdir dir="bin" />

<!-- Compile -->


<javac debug="yes" nowarn="on" includeantruntime="no"
srcdir="${srcdir}" destdir="./bin"
classpath="${hadoop.classpath}:./lib/hipi-${hipi.version}.jar:${opencv.jar}">
<compilerarg value="-Xlint:deprecation" />
</javac>

<!-- Create the jar -->


<jar destfile="${jardir}/${jarfilename}" basedir="./bin">
<zipfileset src="./lib/hipi-${hipi.version}.jar" />
<zipfileset src="${opencv.jar}" />
<manifest>
<attribute name="Main-Class" value="${mainclass}" />
</manifest>
</jar>

</target>
....

<target name="facecount">
<antcall target="compile">
<param name="srcdir" value="facecount" />
<param name="jarfilename" value="facecount.jar" />
<param name="jardir" value="facecount" />
<param name="mainclass" value="FaceCount" />
</antcall>
</target>

<target name="all"
depends="hipi,hibimport,downloader,dumphib,jpegfromhib,createsequencefile,covariance" />

<!-- Clean -->


<target name="clean">
<delete dir="lib" />
<delete dir="bin" />
<delete>
<fileset dir="." includes="examples/*.jar,experiments/*.jar"
/>
</delete>
</target>

</project>

Build FaceCount.java

[cloudera@quickstart hipi]$ ant facecount


Buildfile: /home/cloudera/Project/hipi/build.xml

facecount:

setup:
[echo] Setting properties for build task...
[echo] Properties set.

test_settings:
[echo] Confirming that hadoop settings are set...
[echo] Properties are specified properly.

hipi:
[echo] Building the hipi library...

hipi:
[javac] Compiling 30 source files to /home/cloudera/Project/hipi/lib
[jar] Building jar: /home/cloudera/Project/hipi/lib/hipi-2.0.jar
[echo] Hipi library built.

compile:
[jar] Building jar: /home/cloudera/Project/hipi/facecount/facecount.jar

BUILD SUCCESSFUL
Total time: 12 seconds

Check that facecount.jar was built under the facecount directory:

[cloudera@quickstart hipi]$ ls facecount/


facecount.jar FaceCount.java

8. Run FaceCount MapReduce job

Set up input images

[cloudera@quickstart hipi]$ ls /mnt/hgfs/CSCEI63/project/images-png/


217.png eugene.png patio.png
221.png ew-courtney-david.png people.png
3.png ew-friends.png pict_28.png
.

Create HIB

The primary input type to a HIPI program is a HipiImageBundle (HIB), which stores a collection of images on the Hadoop Distributed File System (HDFS). Use the hibimport tool to create a HIB (project/input.hib) from the collection of images on your local file system in /mnt/hgfs/CSCEI63/project/images-png/ by executing the following commands from the HIPI root directory:

[cloudera@quickstart hipi]$ hadoop fs -mkdir project


[cloudera@quickstart hipi]$ hadoop jar tool/hibimport.jar /mnt/hgfs/CSCEI63/project/images-png/ project/input.hib
** added: 217.png
** added: 221.png
** added: 3.png
** added: addams-family.png
** added: aeon1a.png
** added: aerosmith-double.png
.
.
.
** added: werbg04.png
** added: window.png
** added: wxm.png
** added: yellow-pages.png
** added: ysato.png
Created: project/input.hib and project/input.hib.dat

Run MapReduce

Create a run-facecount.sh script to remove the previous output directory and execute the MapReduce job:

run-facecount.sh

#!/bin/bash
hadoop fs -rm -R project/output
hadoop jar facecount/facecount.jar project/input.hib project/output

[cloudera@quickstart hipi]$ bash run-facecount.sh


15/05/12 16:48:06 INFO fs.TrashPolicyDefault: Namenode trash
configuration: Deletion interval = 0 minutes, Emptier interval = 0
minutes.
Deleted project/output
15/05/12 16:48:17 INFO client.RMProxy: Connecting to
ResourceManager at /0.0.0.0:8032
15/05/12 16:48:20 WARN mapreduce.JobSubmitter: Hadoop command-line
option parsing not performed. Implement the Tool interface and
execute your application with ToolRunner to remedy this.
15/05/12 16:48:21 INFO input.FileInputFormat: Total input paths to
process : 1
Spawned 1map tasks
15/05/12 16:48:22 INFO mapreduce.JobSubmitter: number of splits:1
15/05/12 16:48:22 INFO mapreduce.JobSubmitter: Submitting tokens
for job: job_1431127776378_0049
15/05/12 16:48:25 INFO impl.YarnClientImpl: Submitted application
application_1431127776378_0049
15/05/12 16:48:25 INFO mapreduce.Job: The url to track the job:
http://quickstart.cloudera:8088/proxy
/application_1431127776378_0049/
15/05/12 16:48:25 INFO mapreduce.Job: Running job:
job_1431127776378_0049
15/05/12 16:48:58 INFO mapreduce.Job: Job job_1431127776378_0049
running in uber mode : false
15/05/12 16:48:58 INFO mapreduce.Job: map 0% reduce 0%
15/05/12 16:49:45 INFO mapreduce.Job: map 3% reduce 0%
15/05/12 16:50:53 INFO mapreduce.Job: map 5% reduce 0%
15/05/12 16:50:57 INFO mapreduce.Job: map 8% reduce 0%
15/05/12 16:51:01 INFO mapreduce.Job: map 11% reduce 0%
15/05/12 16:51:09 INFO mapreduce.Job: map 15% reduce 0%
15/05/12 16:51:13 INFO mapreduce.Job: map 22% reduce 0%
15/05/12 16:51:18 INFO mapreduce.Job: map 25% reduce 0%
15/05/12 16:51:21 INFO mapreduce.Job: map 28% reduce 0%
15/05/12 16:51:29 INFO mapreduce.Job: map 31% reduce 0%
15/05/12 16:51:32 INFO mapreduce.Job: map 33% reduce 0%
15/05/12 16:51:45 INFO mapreduce.Job: map 38% reduce 0%
15/05/12 16:51:57 INFO mapreduce.Job: map 51% reduce 0%
15/05/12 16:52:07 INFO mapreduce.Job: map 55% reduce 0%
15/05/12 16:52:10 INFO mapreduce.Job: map 58% reduce 0%
15/05/12 16:52:14 INFO mapreduce.Job: map 60% reduce 0%
15/05/12 16:52:18 INFO mapreduce.Job: map 63% reduce 0%
15/05/12 16:52:26 INFO mapreduce.Job: map 100% reduce 0%
15/05/12 16:52:59 INFO mapreduce.Job: map 100% reduce 100%
15/05/12 16:53:01 INFO mapreduce.Job: Job job_1431127776378_0049
completed successfully
15/05/12 16:53:02 INFO mapreduce.Job: Counters: 49

File System Counters


FILE: Number of bytes read=1576
FILE: Number of bytes written=226139
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=35726474
HDFS: Number of bytes written=27
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=205324
Total time spent by all reduces in occupied slots (ms)=29585
Total time spent by all map tasks (ms)=205324
Total time spent by all reduce tasks (ms)=29585
Total vcore-seconds taken by all map tasks=205324
Total vcore-seconds taken by all reduce tasks=29585
Total megabyte-seconds taken by all map tasks=210251776
Total megabyte-seconds taken by all reduce tasks=30295040
Map-Reduce Framework
Map input records=157
Map output records=157
Map output bytes=1256
Map output materialized bytes=1576
Input split bytes=132
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=1576
Reduce input records=157
Reduce output records=1
Spilled Records=314
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=11772
CPU time spent (ms)=45440
Physical memory (bytes) snapshot=564613120
Virtual memory (bytes) snapshot=3050717184
Total committed heap usage (bytes)=506802176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=35726342
File Output Format Counters

Bytes Written=27

Check results:

[cloudera@quickstart hipi]$ hadoop fs -ls project/output


Found 2 items
-rw-r--r-- 1 cloudera cloudera 0 2015-05-12 16:52
project/output/_SUCCESS
-rw-r--r-- 1 cloudera cloudera 27 2015-05-12 16:52
project/output/part-r-00000
[cloudera@quickstart hipi]$ hadoop fs -cat project/output/part-r-00000
157 Total face detected: 0

9. Summary

OpenCV provides a very rich set of image processing tools. Combined with HIPI's efficient, high-throughput parallel image processing, it can be a great solution for processing very large image datasets quickly. Together, these tools can help researchers and engineers alike achieve high-performance image processing.

10. Issues

The wrapper function that converts a HIPI FloatImage to an OpenCV Mat did not work correctly: the converted image is wrong, so no faces are detected. I contacted the HIPI team but did not receive a reply before finishing this project. This bug is why my results show 0 faces detected.

11. Benefits (Pros/Cons)

Pros: HIPI is a great tool for processing very large volumes of images on a Hadoop cluster, and combined with OpenCV it can be very powerful.
Cons: Converting the HIPI image format (FloatImage) to OpenCV's Mat format is not straightforward, and the conversion issues prevented OpenCV from processing the images correctly.

12. Lessons Learned

1. Setting up the Cloudera QuickStart VM
2. Using HIPI to run MapReduce on a large volume of images
3. Using OpenCV in a Java environment
4. Setting up Hadoop to load native libraries
5. Using cached files on HDFS

Posted by Dinesh Malav at 6:59 PM


Labels: cloudera, hadoop, hipi, image processing, map reduce, opencv

11 comments:

Nam Vo May 24, 2015 at 11:12 AM

yes, I like it. I have question for Dinesh Malav. You can use SIFTDectector (Opencv) to find
the same image on Hadoop using HIPI?

Reply

Dinesh Malav May 26, 2015 at 10:23 AM

I am not familiar with SIFTDectector but a wrapper to convert HIPI FloatImage to any
required format can be written and used with OpenCV.

Reply

Replies

Nam Vo May 29, 2015 at 11:04 AM

Hi Dinesh Malav,

Subject: Image matching with hipi

I want to know whether is it possible to search a image in the bundle that match
to the image I give. Thanks

Reply


Prasad Almeida August 10, 2015 at 12:07 AM

Hello Dinesh

Did you try to run this MapReduce prog. in Hipi using eclipse by having separate classes like
driver, mapper and reducer. If yes, please give me the steps of eclipse. Creating with single
file.java works fine with me.. any help!

-Prasad

Reply




shubham shukla February 6, 2016 at 4:18 AM

hi,dinesh
i have one project in which i have to give image of one person it will search into a video
frame by frame,match and give the result like count of face detection,timing of appearance
etc. so is it possible with HiPi and opencv??

Reply

abirami jeyagobi February 27, 2016 at 5:20 AM

I have my hipi with build.gradle file..!!! Where do i need to specify the opencv
dependencies in hipi instead of build.xml........??????

Reply
