
Robot Practical

Basic Image Processing for Robotics

Maja Rudinac and Xin Wang

Table of contents
Foreword
Chapter 1: Introduction
1.1 Matlab
1.1.1 Matlab Introduction
1.1.2 Using Matlab Editor to Create M-file
1.1.3 Getting Help
1.1.4 Starting with Image Processing Toolbox
1.2 DIPimage
1.2.1 DIPimage Introduction
1.2.2 Getting Started
Chapter 2: Image representation and manipulation
2.1 Digital Image Representation
2.1.1 Image Sampling and Quantization
2.1.2 Data Types and Image Types
2.2 Graphic Geometric Transformation
2.3 Basic Operations on Image Array
2.3.1 Image Operations
2.3.2 Array Operations
Chapter 3: Intensity Transformation
3.1 Contrast Manipulation
3.2 Histogram Processing
3.2.1 Histogram calculation
3.2.2 Histogram equalization
3.3 Threshold
Chapter 4: Spatial Filtering
4.1 Basic Ideas
4.2 Smoothing (Blurring) Filters
4.3 Sharpening Filter
4.4 Edge Filter
4.5 Nonlinear Filter
Chapter 5: Frequency based Processing
5.1 The Basics of Fourier Transform
5.2 Computing the 2-D DFT
5.3 Filtering in the Frequency Domain
5.3.1 Fundamentals of Filtering in Frequency Domain
5.3.2 Lowpass Frequency Domain Filters
5.3.3 Highpass Frequency Domain Filters
Chapter 6: Binary image processing
6.1 Neighborhood relations
6.2 Binary morphology
6.2.1 Structuring element
6.2.2 Erosion
6.2.3 Dilation
6.2.4 Opening and closing
6.2.5 Finding boundary pixels
6.3 Morphological reconstruction
6.4 Gray scale morphology
6.4.1 Morphological smoothing
6.4.2 Morphological sharpening
6.4.3 Compensating for non-uniform illumination (Top-hat transformation)
Chapter 7: Measurements
7.1 Selecting objects
7.2 Measuring in binary images
7.2.1 Area
7.2.2 Perimeter
7.2.3 Centroid (Centre of mass)
7.2.4 Euler number
7.3 Errors Introduced by Binarization
7.4 Measuring in Gray-Value images
7.4.1 Thresholding
7.4.2 Calculating Area, Perimeter and Euler number (in DIPimage)
7.5 Linking measurements to real world
7.5.1 Pinhole camera model
7.5.2 Thin Lens law
7.5.3 Calculating real size of the object
Chapter 8: Color Image Processing
8.1 RGB color model
8.2 HSV (HSI) color model
8.3 Conversion between different color models
8.4 Spatial Filtering of color images
8.4.1 Color smoothing
8.4.2 Color sharpening
8.4.3 Motion blur in color images
8.4.4 Color edge detection
8.4.5 Color morphology
8.5 Segmentation of color images
Chapter 9: Advanced topics
9.1 Scale-Spaces
9.2 Hough Transform
9.3 Mean Shift
Chapter 10: Image Segmentation
10.1 Point, line and edge detection
10.1.1 Detection of isolated points
10.1.2 Line detection
10.1.3 Edge detection
10.2 Extracting corners and blobs
10.2.1 Corners
10.2.2 Blobs
10.3 Clustering methods in segmentation
Chapter 11: Image description (Advanced)
11.1 Global descriptors
11.1.1 Color descriptors
11.1.2 Texture descriptors
11.1.3 Shape descriptors
11.2 Local descriptors
11.2.1 SIFT Detector
11.2.2 SIFT Descriptor
11.2.3 SIFT Matching

Foreword:

This manual is part of the Robot Practicals lab course. The aim of the Robotics Practicals is to familiarize yourself with the software tools that are necessary to program the robots at the Delft Biorobotics Lab (DBL). The lab courses are very much hands-on; you will be able to practice everything you learn immediately by doing the exercises contained in the text. Because the aim is to understand the tooling rather than a particular application, we especially encourage you to explore and to combine the things you've learned in new and hopefully exciting ways!
In this practicum we aim to explain the basics of image processing required for solving simple problems in robotic vision and to give you the background needed for reading state-of-the-art vision books and papers. For more advanced explanations please refer to the given literature. All exercises are given in Matlab, so a basic understanding of the Matlab environment is a prerequisite for this practical.
There is no exam at the end of the course, but students are required to solve all the assignments in the practical and to come to the instructors to discuss their solutions.
Note that some parts of the Practicum are marked as advanced. We advise you to read these parts and solve their assignments only if you would like a deeper understanding of the topic.
If you still have problems understanding parts of the Practicum after checking the given literature, please contact the authors.
We wish you pleasant work!
Authors:
Maja Rudinac m.rudinac@tudelft.nl
Xin Wang xin.wang@tudelft.nl
Delft, 2010


Chapter 1: Introduction
Image processing is an area that needs a lot of experimental work to establish the
proposed solution to a given problem. The main goal of this Practicum is to get hands-on
experience with image processing. To do so, you will have to learn the image processing
environment: Matlab and the DIPimage toolbox for scientific image processing. To
facilitate a quick start, there will not be an in-depth explanation of all the features in this
software. We will briefly present the things as you need them.
In this Practicum, the details of image processing theory are left out. For these please
refer to:

Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, second edition, Prentice Hall, 2002
Rafael C. Gonzalez, Richard E. Woods and Steven L. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2004
Young, I.T., Gerbrands, J.J. and van Vliet, L.J., The Fundamentals of Image Processing, Department of Applied Physics, Delft University of Technology. Available online at
http://www.tnw.tudelft.nl/live/pagina.jsp?id=223d8f77-7ef1-4d41-b336-28ae2f9a83ef&lang=en

1.1 Matlab
This subsection serves to get you acquainted with Matlab, so if you are already familiar with Matlab, you can skip it. If you have never worked with Matlab before, it is necessary to go through this section and the related materials.
Matlab is an abbreviation of Matrix Laboratory. It is a numerical computing environment
and fourth-generation programming language that allows matrix manipulations, plotting
of functions and data, implementation of algorithms, creation of user interfaces, and
interfacing with programs written in other languages, including C, C++, and Fortran.
1.1.1 Matlab Introduction
First, let us have a look at the Matlab desktop shown in Fig. 1.1. The desktop is the main Matlab application window and it contains five sub-windows: the Command Window, the Workspace Browser, the Current Directory Window, the Command History Window, and one or more Figure Windows, which are used to display graphics. Their functions are as follows:

The Command Window is the place where the user can type Matlab commands at the prompt (>>) and where the outputs of those commands are displayed.
The Workspace stores the set of variables that the user creates during a work session.
The Command History Window contains a record of the commands the user has entered in the Command Window. We can re-execute history commands by right-clicking on them in the Command History Window.
The Current Directory Window shows the contents of the current directory.

Fig 1.1 Desktop of Matlab (Command Window, Workspace, Current Folder and Command History panels)


Any file that runs in Matlab must reside in the current directory or in a directory that is on the search path. To check which directories are on the search path, or to add or modify a search path, select Set Path from the File menu on the desktop, and then use the Set Path dialog box.
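Alternatively (a minimal sketch; the folder name is purely hypothetical), the search path can also be changed from the command line:

>>addpath('C:\robot_practical\images')   % add a folder to the search path
>>path                                   % list the current search path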
1.1.2 Using Matlab Editor to Create M-file
An m-file is a simple text file in which you can write Matlab commands and save them for later use. When the file runs, Matlab reads the commands and executes them exactly as it would if you had typed each command sequentially at the Matlab prompt. All m-file names must end with the extension '.m' (e.g. imageprocessing.m).
To create an m-file, choose New from the File menu and select M-File. This brings up a text editor window in which you can enter Matlab commands. Alternatively, simply type edit at the prompt in the Command Window to create a new m-file. To save the m-file, go to the File menu and choose Save (remember to save it with the '.m' extension). To open an existing m-file, go to the File menu, choose Open and select the .m file that you want.
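For example, a minimal m-file (the file name imbasics.m is only an illustration) could contain:

% imbasics.m - read and display an image
clear all; clc;
I = imread('football.png');   % football.png is used again later in this chapter
imshow(I);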
1.1.3 Getting Help
The principal way to get help online is to use the Matlab Help Browser, opened by clicking on the question mark symbol on the desktop toolbar, or by typing helpbrowser at the prompt in the Command Window. For example, help on a specific function is obtained by selecting the Search tab, selecting Function Name as the search type, and then typing the function name in the search field. Another way to obtain help for a specific function is by typing doc followed by the function name at the command prompt.
M-functions have two types of information that can be displayed to the user. One is the H1 line, which contains the function name and a one-line description, and the other is the Help text block. Typing help at the prompt followed by a function name displays both the H1 line and the help text for that function. If we type lookfor followed by a keyword at the prompt, we get all the H1 lines that contain that keyword. This is especially useful when looking for a particular topic without knowing the name of the function.
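For instance, a few help queries to try (any function name or keyword works here):

>>help imread        % H1 line plus help text of imread
>>doc imread         % opens the full documentation page
>>lookfor histogram  % lists all functions whose H1 line contains 'histogram'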
1.1.4 Starting with Image Processing Toolbox
The Image Processing Toolbox software is a collection of functions that extend the
capability of the MATLAB numeric computing environment for the purpose of image
processing.
Before we start, it is better to clear variables and figures before commands are executed.
This avoids undetected errors.
>>clear all
>>clc

Example 1.1 Read and display an image (Matlab)

Here we give an example of how to use the functions in the Image Processing Toolbox to read and display images. Matlab uses the function imread to read images, whose syntax is:

I = imread(filename)

filename is a string containing the complete name of the image file (including any applicable extension). Table 1.1 lists some of the image/graphics formats supported by imread in Matlab.
Table 1.1 Image formats supported by imread in Matlab

Format name   Description                        Recognized extensions
TIFF          Tagged image file format           .tif, .tiff
JPEG          Joint photographic experts group   .jpg, .jpeg
GIF           Graphics interchange format        .gif
BMP           Windows bitmap                     .bmp
PNG           Portable network graphics          .png
XWD           X window dump                      .xwd

To display images, Matlab uses the function imshow:

imshow(I, p)

I is an image array, and p is the number of intensity levels used to display it. If p is omitted, it defaults to 256 levels.
>>I=imread('football.png');
>>imshow(I);

Fig 1.2 Football


In order to get additional information about an image we use the following functions:

size(I);  whos I

Example 1.2 Obtain additional information about an image (Matlab)


>>size(I)
ans =
   280   280     3

>>whos I
  Name      Size          Bytes    Class    Attributes
  I         280x280x3     235200   uint8

As shown in the Command Window, the size function gives the row and column dimensions as well as the depth of the image (1 for grayscale images, 3 for color images), and whos displays more information about the image array, such as the number of bytes and the data type.
1.2 DIPimage
1.2.1 DIPimage Introduction
DIPimage is an additional toolbox that is used under MATLAB for scientific image processing. At this point, we will review only the most relevant features. You will get to know this environment better throughout the course.
The toolbox can be downloaded from: http://www.diplib.org/download
and the DIPimage user guide from: http://www.diplib.org/documentation
By typing the command dipimage in the Command Window in Matlab, we enter the DIPimage environment.

In the top-left corner of Matlab the DIPimage GUI (Graphical User Interface) will appear, as shown in Fig 1.3. The other six windows are used to display images.
The GUI contains a menu bar. Spend some time exploring the menus. When you choose one of the options, the area beneath the menu bar changes into a dialog box that allows you to enter the parameters for the function you have chosen. Most of these functions correspond to image processing tasks, as we will see later.

Fig 1.3 GUI of DIPimage


There are two ways of using the functions in this toolbox. The first one is through the
GUI, which makes it easy to select parameters and functions. We will be using it very
often at first, but gradually less during the course. The other method is through the
command line.
When solving the problems in this laboratory course, we will be making text files that
contain all commands we used to get to the result. This makes the results reproducible. It
will also avoid lots of repetitive and tedious work. We recommend you make a new file
for each exercise, and give them names that are easy to recognize. The file names should
end in .m, and should not conflict with existing function or variable names.
Similar to Matlab, if you start each file with the commands below, the variables and the figure windows will be cleared before your commands are executed. This avoids undetected errors.
>>clear
>>dipclf

1.2.2 Getting Started


Read and display an image (DIPimage)
We need to load an image (from file) into a variable before we can do any image processing. The left-most menu in the GUI is called File I/O, and its first item is Read Image (readim). Select it. Press the Browse button, and choose the file twocells.tif. Now press the Execute button.


Fig 1.4 Two cells


Two things should happen:
1. The image twocells.tif is loaded into the variable image_out, and displayed in a figure window (see Figure 1.4).
2. The following lines (or something similar) appear in the command window:

image_out = readim('twocells.tif')
Displayed in figure 15

This shows that exactly the same would have happened if you had typed that command directly in the command window.
Note that if we omit the extension of the filename, readim can still find the file without specifying the file type. We also didn't specify the second argument to the readim function. Finally, by not specifying a full path to the file, we asked the function to look for it either in the current directory or in the default image directory.

>>image_out = readim('body1');

Note that the contents of the variable image_out change, but the display is not updated. To update the display, simply type:
>>image_out

You will have noticed that the image in the variable image_out is always displayed in the top-left window. This window is linked to that variable. Images in all other variables are displayed in the sixth window. This can be changed; see the DIPimage manual for instructions.
A grey-value image is displayed by mapping each pixel's value in some way to one of the 256 grey-levels supported by the display (ranging from black (0) to white (255)). By default, pixel values are rounded, with negative values mapped to 0 and values larger than 255 mapped to 255. This behavior can be changed through the Mappings menu on each figure window. Note that these mappings only change the way an image is displayed, not the image data itself.
Another menu on the figure window, Actions, contains some tools for interactive
exploration of an image. The two tools we will be using are Pixel testing and Zoom.
Pixel testing allows you to click on a pixel in the image (hold down the button) to
examine the pixel value and coordinates. Note that you can drag the mouse over the
image to examine other pixels as well. The right mouse button does the same thing as the
left, but changes the origin of the coordinate system and also shows the distance to the
selected origin (right-click and drag to see this).
The Zoom tool is to enlarge a portion of the image to see it more clearly.
Assignment 1.1:
Create an m-file with a function called ImBasic, which performs read, display, save and write operations. Read the image boy.bmp into the Matlab environment, display it on the screen, check its additional parameters and finally save it in your assignment directory under the name boy_assignment with the jpg extension.


Chapter 2: Image representation and manipulation


2.1 Digital Image Representation
An image can be defined as a two-dimensional function, f(x, y), where x and y are spatial
(plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called
intensity.
2.1.1 Image Sampling and Quantization
An image is continuous with respect to the x and y coordinates as well as the amplitude. For computer applications, we need to convert such an image to a digital form. Digitizing the coordinate values is called sampling, and digitizing the amplitude is called quantization. When x, y and the intensity values are all finite, we call the image a digital image. The whole process is shown in Fig 2.1.

Fig 2.1 Sampling and quantization process

Fig 2.2 Image sampling


As shown in Fig 2.2, we take a monochrome image as an example. Image sampling selects a set of points from a 2-D grid, transforming the infinitely many points of the continuous image into a finite set of points of the digital image. Image quantization then uses a finite set of grey levels to map the continuous amplitude onto finite grey values. The result is the image array F(i, j):
A = [ a_{0,0}      a_{0,1}      ...  a_{0,N-1}
      a_{1,0}      a_{1,1}      ...  a_{1,N-1}
      ...
      a_{M-1,0}    a_{M-1,1}    ...  a_{M-1,N-1} ]

F(i, j) = [ f(0,0)      f(0,1)      ...  f(0,N-1)
            f(1,0)      f(1,1)      ...  f(1,N-1)
            ...
            f(M-1,0)    f(M-1,1)    ...  f(M-1,N-1) ]

Each element of this array is called an image element, picture element or pixel. A digital image can be represented as a Matlab matrix:

f(i, j) = [ f(1,1)    f(1,2)    ...  f(1,N)
            f(2,1)    f(2,2)    ...  f(2,N)
            ...
            f(M,1)    f(M,2)    ...  f(M,N) ]

Note that Matlab differs from C and C++ in that its first index is (1, 1) instead of (0, 0). Keep this in mind when programming.
Example 2.1 Downsample an image by factor 2 then upsample by factor 2 (Matlab)
>>img=imread('football.png');
>>figure
>>imshow(uint8(img));
>>title('original image');
>>subimg = imresize(img, 0.5);
>>figure
>>imshow(uint8(subimg))
>>title('first subsampling');
>>subsubimg = imresize(subimg, 0.5);
>>figure
>>imshow(uint8(subsubimg))
>>title('second subsampling');
>>upimg=imresize(subsubimg,2);
>>figure
>>imshow(uint8(upimg));
>>title('first upsampling');
>>upupimg=imresize(upimg,2);
>>figure
>>imshow(uint8(upupimg))
>>title('second upsampling');

The resulting images are shown in Figure 2.3 and Figure 2.4. We can see that if we first downsample an image and then upsample it back to its original size, the image loses some information. This shows the importance of spatial resolution.


Figure 2.3 Downsample an image by factor 2

Figure 2.4 Upsample an image by factor 2


Example 2.2 Downsample the amplitude resolution in an image by factor 2 (DIPimage)
>>image_out = readim('twocells.tif','')
>>sub1 = subsample(image_out,2)
>>sub2 = subsample(sub1,2)

The results of reducing the amplitude resolution (the number of grey levels) can be seen in Figure 2.5.

Figure 2.5 Reducing amplitude resolution: 256, 32 and 16 grey levels
Assignment 2.1:
Write your own m-file with a downsample function, without using any resize functions provided by Matlab. Read the image kickoff.jpg into the Matlab environment, use for or while loops to downsample this image and then use imshow to display the final result.



2.1.2 Data Types and Image Types
Data types and conversion
The values of pixels are not restricted to be integers in Matlab. Table 2.1 lists the data types supported by Matlab.
All numeric computations in Matlab are done using the double data type. uint8 is also encountered frequently, especially when reading data from storage devices, as 8-bit images are the most common representation in practice.
There are two different ways to convert between data types. One is very straightforward and has the general syntax

B = data_type_name(A)

The other one uses the functions listed in Table 2.2.


Table 2.1 Data types in Matlab

Name      Description
double    Double-precision floating-point number in the range of about +/-10^308 (8 bytes per element)
uint8     Unsigned 8-bit integer in the range [0, 255] (1 byte per element)
uint16    Unsigned 16-bit integer in the range [0, 65535] (2 bytes per element)
uint32    Unsigned 32-bit integer in the range [0, 4294967295] (4 bytes per element)
int8      Signed 8-bit integer in the range [-128, 127] (1 byte per element)
int16     Signed 16-bit integer in the range [-32768, 32767] (2 bytes per element)
int32     Signed 32-bit integer in the range [-2147483648, 2147483647] (4 bytes per element)
single    Single-precision floating-point number in the range of about +/-10^38 (4 bytes per element)
char      Characters (2 bytes per element)
logical   Values are 0 or 1 (1 byte per element)

Table 2.2 Data type conversion functions

Name        Converts input to        Valid input data types
im2uint8    uint8                    logical, uint8, uint16 and double
im2uint16   uint16                   logical, uint8, uint16 and double
mat2gray    double (range [0, 1])    double
im2double   double                   logical, uint8, uint16 and double
im2bw       logical                  uint8, uint16 and double

Note that the behavior of im2uint8 is different from the data-type conversion function uint8, which converts the raw values directly (clipping out-of-range values) without rescaling. im2uint8 sets all values in the input array that are less than 0 to 0 and all values greater than 1 to 1, multiplies all the other values by 255, and rounds the results of the multiplication to the nearest integer to complete the conversion.
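A minimal sketch of the difference (the values are chosen purely for illustration):

>>A = [-0.5 0.2 0.8 1.5];
>>uint8(A)      % direct conversion, no rescaling:   0   0   1   2
>>im2uint8(A)   % clip to [0,1], then scale by 255:  0  51  204 255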



Image types
The toolbox supports four types of images: intensity images, binary images, indexed images and RGB images.
It is worth mentioning that a binary image is a logical array of 0s and 1s. A numeric array is converted to binary using the function logical:

B = logical(A)

2.2 Graphic Geometric Transformation

Basic graphic geometric transformations include scaling, rotation, mirroring and shifting, which are useful for image preprocessing.
There are two main methods in Matlab for geometric transformations:
1. Use the figure window menu tools, which have limited applications such as zoom in and zoom out.
2. Use the Matlab transformation commands, which are more flexible for incorporating into existing code.
Example 2.3 Image rotation (Matlab)
>>I = imread('football.png');
>>J = imrotate(I,-50,'bilinear','crop');
>>figure, imshow(I)
>>figure, imshow(J)

The result is shown in Figure 2.6.

Figure 2.6 Image rotation


In the DIPimage environment, it is very easy to apply graphic geometric transformations. Go to the Manipulation menu in the GUI, choose the transformation that you want, execute it, and then you can see the results. Here we give an example of mirroring an image about the X axis. The result is shown in Figure 2.7.
Example 2.4 Image Mirror based on X axis (DIPimage)
>>image_out = readim('bear.tif','')



>> out = mirror(image_out,'x-axis',1)

Fig 2.7 Image mirror (DIPimage)


2.3 Basic Operations on Image Array
2.3.1 Image Operations

The basic image array operations are listed in Table 2.3.


The basic image array operations are very useful; for example, the imadd function can be used to add noise to an image, and imsubtract can be used to detect motion in a video sequence. We now give an example of how to use basic image array operations.

Table 2.3 Basic image array operations

Function       Description
imadd          Add two images, or add a constant to an image
imsubtract     Subtract one image from another, or subtract a constant from an image
immultiply     Multiply two images pixel by pixel, or multiply an image by a constant
imdivide       Divide one image by another pixel by pixel, or divide an image by a constant
imabsdiff      Calculate the absolute difference between two images
imcomplement   Calculate the complement of an image
imlincomb      Calculate a linear combination of two images pixel by pixel
Example 2.5: Use imsubtract to detect motion (Matlab).
Here we assume that the background does not change.
>>figure
>>I1 = imread('moving face1.bmp');
>>imshow(I1);
>>title('video image 1');
>>I2 = imread('moving face2.bmp');
>>figure
>>imshow(I2);
>>title('video image 2');
>>I3 = imsubtract(double(I2),double(I1));
>>figure
>>imagesc(I3);
>>title('subtraction');


Figure 2.8 Image subtraction for motion detection: video image 1, video image 2 and their subtraction


Assignment 2.2:
In Graphic Geometric Transformation we explained image rotation and image mirroring. Try to work out other transformations, such as image cropping, image scaling and image shifting, using the image lena.tif.
2.3.2 Array Operations
Array operations mainly consist of Matlab indexing operations, which are very useful for array manipulation and program optimization. Please check the related literature on how to use array indexing. It is also useful to generate some simple image arrays to try out ideas; a short indexing sketch follows the list below.

zeros(M, N) generates an M x N matrix of 0s of double type.
ones(M, N) generates an M x N matrix of 1s of double type.
true(M, N) generates an M x N logical matrix of 1s.
false(M, N) generates an M x N logical matrix of 0s.
magic(M) generates an M x M magic square in which the sum along any row, column or main diagonal is the same.
rand(M, N) generates an M x N matrix whose entries are uniformly distributed random numbers in the interval (0, 1).
randn(M, N) generates an M x N matrix whose entries are normally distributed random numbers with mean 0 and variance 1.
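A minimal sketch (the sizes and values are arbitrary) combining these generators with array indexing:

>>A = zeros(8, 8);             % 8 x 8 black image
>>A(3:6, 3:6) = 1;             % set a 4 x 4 region to white using index ranges
>>B = A + 0.1*randn(8, 8);     % add Gaussian noise to the whole array
>>B(1, :) = 0;                 % indexing can also address a single row
>>imshow(B, [])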

Assignment 2.3:
Use basic array operations to add image noise to an image region. Read the image drop.tif, and use array indexing, the imadd function and the randn function to add noise only to the drop region.


Chapter 3: Intensity Transformation

In this chapter, most of the examples are related to image enhancement, which is usually the first step in image processing. Intensity transformations are conducted in the spatial domain, which means the operations work directly on the pixels of the image.
Spatial domain processes are denoted by the expression:

g(x, y) = T[f(x, y)]

where f(x, y) is the input image, g(x, y) is the output image, and T is an operator on f, defined over a specified neighborhood about the point (x, y), as shown in Figure 3.1.

Figure 3.1 Spatial domain based image operation

The principal approach for defining a spatial neighborhood around a point (x, y) is to use a square or rectangular region centered at (x, y), usually called a window or mask. Only the pixels in the neighborhood are used in computing the value of g at (x, y).
There are two ways to perform intensity transformation: contrast manipulation and histogram modification.
3.1 Contrast Manipulation
There are lots of ways to change contrast of an image. Some common operations are:
linear scaling and clipping
power-law
logarithmic transformation
reverse function
inverse function
In Matlab, imadjust function is used for intensity transformation of grey-scale images:
g = imadjust (f, [low_in high_in], [low_out high_out], gamma)



The values between low_in and high_in map into values between low_out and high_out. Values below low_in and above high_in are clipped. The input image can be uint8, uint16 or double, and the output image has the same data type as the input. The parameter gamma determines the shape of the curve that maps the intensity values in f to create g. If gamma is less than 1, the mapping is weighted toward higher (brighter) output values. If gamma is greater than 1, the mapping is weighted toward lower (darker) output values.
Example 3.1 Intensity transformation (Matlab)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>

f = imread('mary.bmp');
figure
imshow(f)
g1 = imadjust(f,[],[],0.5);
figure
imshow(g1)
g2 = imadjust(f,[],[],2);
figure
imshow(g2)
g3 = imadjust(f,[0 1],[1 0]);
figure
imshow(g3)

The results are shown in Figure 3.2.

Figure 3.2 Intensity transformation


We also give an example in DIPimage. In the DIPimage GUI, the intensity transformation functions are found in the Point menu.
Example 3.2 Intensity stretching (DIPimage)
>> image_out = readim('blood.tif','')
>> a = stretch(image_out,0,100,0,255)

Assignment 3.1:
Use a logarithmic transformation to enhance the dark image qdna.tif. The transformation function is:
g = c log(1 + f)
Assignment 3.2:
Use a power-law transformation to enhance the light image sca.tif. The transformation function is:
g = c f^gamma



3.2 Histogram Processing
Histograms are widely used in image processing for operations such as enhancement, compression and segmentation.
3.2.1 Histogram calculation
For an image with gray levels in [0, L-1] and M x N pixels, the histogram is a discrete function given by

h(r_k) = n_k

where r_k is the k-th intensity value and n_k is the number of pixels in the image with intensity r_k.
It is common practice to normalize the histogram by the number of pixels in the image:

p(r_k) = n_k / (MN)

The normalized histogram can be used as an estimate of the probability density function of the image. For enhancement, histograms can be used to infer the type of image quality: dark, bright, low or high contrast.
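As a quick illustration (a minimal sketch; f is assumed to hold a grayscale image already loaded with imread), the normalized histogram can be computed as:

>>h = imhist(f);        % the counts n_k for the 256 intensity bins
>>p = h / numel(f);     % normalize by the number of pixels M*N
>>sum(p)                % equals 1, since p estimates a probability density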
Example 3.3 Histogram calculation (Matlab)
>>h1 = imhist(f);
>>h1_plot = h1(1:5:256);
>>horz = 1:5 :256;
>>bar(horz, h1_plot)
>>axis([0 255 0 500])
>>set(gca, 'xtick',0:10:255)
>>set(gca,'ytick',0:20:1000)
>>h1 = imhist(f);
>>h1_plot = h1(1:5:256);
>>horz = 1:5 :256;
>>bar(horz, h1_plot)
>>axis([0 255 0 3800])
>>set(gca, 'xtick',0:10:255)
>>set(gca,'ytick',0:200:3800)

Resulting histograms of images from Figure 3.2 are shown in Figure 3.3

Figure 3.3 Histogram calculation



As we can see from Figure 3.3, if the histogram distribution is concentrated in the low part of the intensity range, the image will be darker; on the contrary, if the histogram distribution is concentrated in the high part, the image will be brighter.
In DIPimage, the histogram is calculated as follows: in the Point menu, choose Histogram (diphist).
>>a = stretch(image_out,0,100,0,255)
>>diphist(a,[0 255],256,'bar')

3.2.2 Histogram equalization

High-contrast images tend to have flat histograms (close to a uniform distribution). Histogram equalization attempts to transform the original histogram into a flat one with the goal of achieving better contrast.
Histogram equalization is implemented in the Image processing toolbox by function
histeq, which has the syntax:
g = histeq(f, nlev)

Parameter f is the input image and nlev is the number of intensity levels specified for the output image. If nlev is equal to L (the total number of possible levels in the input image), histeq directly implements the transformation function. If nlev is less than L, histeq attempts to distribute the levels so that they approximate a flat histogram.
Histogram equalization produces a transformation function that is adaptive, in the sense that it is based on the histogram of the given image. However, once the transformation function for an image has been computed, it does not change unless the histogram of the image changes. Histogram equalization, which spreads the levels of the input image over a wider range of the intensity scale, cannot always enhance an image. In particular, in some applications it is useful to be able to specify the shape of the histogram that we wish the processed image to have. Generating an image with a specified histogram is called histogram matching or histogram specification. In Matlab, we also use histeq to implement histogram matching, with the syntax:
g = histeq(f, hspec)

Parameter f is the input image, hspec is the specified histogram, and g is the output image
whose histogram approximates the specified histogram.
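A minimal sketch of both uses (pout.tif ships with the Image Processing Toolbox; the ramp-shaped target histogram hspec is chosen purely for illustration):

>>f = imread('pout.tif');          % a low-contrast grayscale image
>>g = histeq(f, 256);              % histogram equalization
>>hspec = linspace(0, 1, 256);     % illustrative target histogram shape (a ramp)
>>g2 = histeq(f, hspec);           % histogram matching / specification
>>figure, imshow(f), title('original')
>>figure, imshow(g), title('equalized')
>>figure, imshow(g2), title('matched to ramp')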
3.3 Threshold
Threshold technique is widely used in image segmentation to segment images based on
intensity levels.
Let us now have a look at the histogram in Figure 3.4, which corresponds to the image named coins, composed of light objects on a dark background, in such a way that object and background pixels have intensity levels grouped into two dominant modes. One obvious way to extract the objects from the background is to select a threshold T that separates these modes. The threshold function can be described as:

g(x, y) = 1 if f(x, y) >= T
g(x, y) = 0 if f(x, y) < T

When T is a constant, this approach is called global thresholding. One way to choose a threshold is by visual inspection of the image histogram. If the histogram has two distinct modes, it is not difficult to choose a threshold T. Global thresholding methods can fail when the background illumination is uneven. A common practice in such situations is to preprocess the image to compensate for the illumination problems and afterwards apply a global threshold, as given in the expression below. This process is called local thresholding:

g(x, y) = 1 if f(x, y) >= T(x, y)
g(x, y) = 0 if f(x, y) < T(x, y)
The threshold T(x,y) is obtained using morphological processing of an image which will
be described in Chapter 6.
Figure 3.4 Image histogram of the coins image


Example 3.5 Image thresholding (Matlab)
>>f = imread('cameraman.tif');
>>subplot(1,3,1);
>>imshow(f); title('original image');
>>subplot(1,3,2);
>>imhist(f);
>>t1 = 90;    % read the value of the valley from the histogram
>>[m, n] = size(f);
>>f1 = zeros(m, n);
>>for i = 1:m
      for j = 1:n
          if f(i,j) > t1
              f1(i,j) = 1;
          else
              f1(i,j) = 0;
          end
      end
  end
>>subplot(1,3,3);
>>imshow(f1);
>>title('thresholded image')


Figure 3.6 Result of Image thresholding


The DIPimage toolbox provides several functions that can do thresholding. First we should choose an image in which the objects are clearly defined and easy to segment. The Segmentation menu contains the function threshold, which assigns 'object' or 'background' to each pixel depending on its grey-value. The automatic threshold-selection methods are not needed for this image; select 'fixed' for the Type parameter, which requires you to provide a threshold yourself. Another solution is to do the thresholding in an alternative way, using relational operators. Recall that thresholding returns an image that is true where the input image is larger than some threshold. This can be accomplished with the 'larger than' (>) operator, as in the sketch below. The result of image thresholding is shown in Figure 3.7.

Figure 3.7 Result of Image thresholding displayed in DIPimage
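A minimal sketch (image_out is assumed to hold a grey-value image, and 100 is a hand-picked threshold read from its histogram):

>>binary_out = image_out > 100    % true where the grey value exceeds the threshold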


Assignment 3.3
Combine the histogram operation with the thresholding method to segment an object from its background. Read the image cookies.tif into Matlab, use imhist to get the histogram of this image, read the best threshold value from the histogram and use it to threshold the image. Output the final result using the function imshow. Try to use the subplot function to display the original image and the processed image in the same figure.


Chapter 4: Spatial Filtering


There are two main types of filtering applied to images: spatial domain filtering and
frequency domain filtering. The frequency domain filtering will be discussed in Chapter
5. This chapter is mainly about spatial filtering.
4.1 Basic Ideas
Spatial filtering is sometimes also known as neighborhood processing. Neighborhood processing is a method that defines a center point and performs an operation (or applies a filter) on only those pixels in a predetermined neighborhood of that center point. The result of the operation is one value, which becomes the value at the center point's location in the modified image. Each point in the image is processed together with its neighbors. The general idea is that of a "sliding filter" that moves throughout the image to calculate the value at each center location. The main process follows the steps below:
1. Define a center point, (x, y);
2. Perform an operation that involves only the pixels in a predefined neighborhood around the center point;
3. Let the result of that operation be the response of the process at that point;
4. Repeat the process for every point in the image. The process is shown in Fig 4.1, and a small code sketch follows the figure.

Fig 4.1 Neighborhood processing
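As an illustration of these steps, a minimal sketch of a 3 x 3 mean filter written directly with loops (f is assumed to be a grayscale image of class double; border pixels are simply copied from the input):

[M, N] = size(f);
g = f;                                   % output image, borders kept from the input
for x = 2:M-1
    for y = 2:N-1
        window = f(x-1:x+1, y-1:y+1);    % 3 x 3 neighborhood around the center point
        g(x, y) = mean(window(:));       % response of the process at (x, y)
    end
end

In practice the library function imfilter, used in the next section, performs the same kind of neighborhood operation much faster.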


There are two types of filters: linear and non-linear filters. Linear filters can be implemented through a convolution, whereas non-linear filters cannot. Therefore, linear filters are easier to implement, and are important for Fourier analysis. All filters discussed in this section (both linear and non-linear) can be implemented as a combination of the values in a neighborhood; that is, each pixel in the output image is computed from a fixed set of pixels in the neighborhood of the corresponding pixel in the input image.


Typical neighborhood shapes are round or square. We call this neighborhood operator a kernel or mask, and it is very important in image filtering. Table 4.1 lists several common kernel types.

Table 4.1 Filter kernels

Low-pass / Averaging. Sample 3x3 kernel: [a a a; a b a; a a a], or the 4-neighbour variant [0 a 0; a b a; 0 a 0]. Smoothing, noise reduction and blurring filter (focal mean).

Low-pass / Gaussian. Sample 3x3 kernel: [c a c; a b a; c a c]. Smoothing, noise reduction and blurring filter (focal weighted mean).

High-pass / Sharpening. Kernels with positive entries near the center and negative entries in the outer periphery. Mean-effect removal / sharpening filter (focal sum); provides limited edge detection. Typically the entries sum to 1 but may be greater; 3x3 Laplacian kernels typically add to 1, while larger Laplacian kernels (e.g. 7x7) may be more complex and sum to more than 1.

Edge detection / Gradient. Sample 3x3 kernels: [a b a; 0 0 0; -a -b -a] and its transpose. Applied singly or as a two-pass process; these kernels highlight horizontal and vertical edges. When used in combination they are known as gradient or order-1 derivative filters. Typically a = 1 and b = 1 or 2, and the entries sum to 0.

Edge detection / Embossing. Edge-detecting filters that enhance edges in a selected compass direction to provide an embossed effect; a typical example is a north-east kernel.

Edge detection / Directional. Simple computation of the gradient in one of the 8 compass directions (e.g. [1 1 1; 1 -2 1; -1 -1 -1] for the north direction); the east and north directional derivatives are the most common examples.



4.2 Smoothing (Blurring) Filters

MATLAB:
Smoothing filters are used to smooth (blur) edges and noise in images. In Matlab, it is easiest to use imfilter to generate smoothed images. First the smoothing mask is defined and afterwards it is applied to the image.
Example 4.1 Apply average filter to an image (Matlab)
>>w =[1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9];
>>car =imread('car.bmp');
>>figure
>>imshow(car)
>>g =imfilter(car, w);
>>figure
>>imshow(g)

Figure 4.1 Average filtering


DIPimage
Go to Filters menu of the GUI and there you will find a lot of filters. Let us take one
smoothing filter as example.
Example 4.2 Apply Gaussian filter to image (DIPimage)
Select Gaussian Filter (gaussf) in the menu of the GUI. We need to choose the size of the
Gaussian filter: the standard deviation in pixels. Try out different values for it, and see
what happens. The filter size (scale) is very important, as will be shown in Figure 4.2.
>>gaussian1 = gaussf(image_out,1,'best')
>>gaussian2 = gaussf(image_out,2,'best')
>>gaussian3 = gaussf(image_out,3,'best')
>>gaussian4 = gaussf(image_out,4,'best')

Figure 4.2 Gaussian filter



4.3 Sharpening Filter

Sharpening filters are used to enhance the edges of objects and to adjust the contrast and shading characteristics. In combination with thresholding they can be used as edge detectors. Sharpening or high-pass filters let high frequencies pass, reduce the lower frequencies, and are extremely sensitive to shot noise. To construct a high-pass filter, the kernel coefficients should be set positive near the center of the kernel and negative in the outer periphery.
MATLAB:
We will now use another Matlab function, fspecial, to generate a sharpening filter. We take the Laplacian filter as an example, which enhances discontinuities. First we compute the Laplacian of the image, and afterwards we subtract the Laplacian image from the original to obtain the sharpened image. The result is shown in Figure 4.3.
Example 4.3 Sharpening image (Matlab)
>>f = imread('kickoff_grayscale.jpg');
>>w = fspecial('laplacian',0);
>>g =imfilter(f, w, 'replicate');
>>g = f - g;
>>imshow(g)

Figure 4.3 Sharpening: a) original image, b) Laplacian image, c) sharpened image

DIPimage
The menu item Differential Filters in filters menu contains a general derivative filter, a
complete set of first and second order derivatives, and some combinations like gradient
magnitude and Laplace filters. Let us try to use Laplace filter in DIPimage.
Example 4.4 Apply Laplace filter to an image (DIPimage)
>>image_out = readim('schema.tif','')
>>g = laplace(image_out, 1)

4.4 Edge Filter


Edges characterize object boundaries and are therefore of fundamental importance in image processing. Edges in images are areas with strong intensity contrasts. Edge detection significantly reduces the amount of irrelevant information while preserving the important structural properties of an image. Most methods are grouped into two categories, gradient and Laplacian. The gradient method detects edges by looking for the maximum and minimum in the first derivative of the image. The Laplacian method searches for zero crossings in the second derivative of the image to find edges. More will be explained in Chapter 10, in the edge segmentation section.
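As a quick illustration (a minimal sketch reusing kickoff_grayscale.jpg from the earlier examples), both families are available through the Image Processing Toolbox function edge:

>>f = imread('kickoff_grayscale.jpg');
>>e1 = edge(f, 'sobel');   % gradient-based: thresholds the first-derivative magnitude
>>e2 = edge(f, 'log');     % Laplacian of Gaussian: finds zero crossings of the second derivative
>>figure, imshow(e1), title('Sobel edges')
>>figure, imshow(e2), title('LoG edges')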
4.5 Nonlinear Filters

A nonlinear filter is a filter whose output is a nonlinear function of its input. By definition, any filter that is not a linear filter is a nonlinear filter. One practical reason to use nonlinear filters instead of linear filters is that linear filters may be too sensitive to a small fraction of anomalously large observations at the input. One of the most commonly used nonlinear filters is the median filter, which is very effective at reducing salt-and-pepper noise in an image.
MATLAB:
There are two ways to apply a median filter, one using the general order-statistic function ordfilt2 and another using the dedicated function medfilt2 directly. The result is shown in Figure 4.4.
Example 4.5 Apply median filter to an image (Matlab)
>>f = imread('kickoff_grayscale.jpg');
>>fnoise = imnoise(f, 'salt & pepper', 0.2);
>>figure
>>imshow(fnoise)
>>g = medfilt2(fnoise);
>>figure
>>imshow(g);

Figure 4.4 Median filtering


DIPimage
Choose median filter(medif) in filter menu to apply median filter to an image.
Example 4.6 Apply median filter to image (DIPimage)
>>image_out = readim('blood.tif','')
>>out = noise(image_out,'saltpepper',0.2,0.2)
>>result = medif(out,5,'elliptic')



Assignment 4.1

1. Load the image shading, which contains some text on a shaded background. To remove this shading, we need to estimate the background. Once this is done, we can correct the original image. This is a common procedure, often required to correct for uneven illumination or dirt on a lens. There are several background shading estimation methods:
- The most used one is the low-pass filter (gaussf). Try to find a suitable parameter for this filter to obtain an estimate of the background, and then correct the original image.
- Another method uses maximum and minimum filtering. Since the text is black, a maximum filter with a suitable size will remove all text from the image, bringing those pixels to the level of the background. The problem is that each background pixel has now also been assigned the value of the maximum in its neighborhood. To correct this, we apply a minimum filter with the same size parameter. This will bring each background pixel back to its former value, but the foreground pixels won't come back!
Use this estimate of the background to correct the original image. Use only Matlab.


Chapter 5: Frequency based Processing

Before you start this section, look up the Fourier Transform in your book. Note that the Discrete Fourier Transform (DFT) is not the same as the continuous version: since the image is sampled, the Fourier Transform is periodic, but the Fourier Transform itself must also be sampled (hence discrete), thus the image must be assumed periodic too.
The Fourier transform offers considerable flexibility in the design and implementation of filtering solutions in areas such as image enhancement, image restoration and image data compression. However, in robotic applications we usually process images in the spatial domain.
5.1 The Basics of Fourier Transform

First, let us introduce some basic ideas about the Fourier Transform.

Let f(x, y), for x = 0, 1, ..., M-1 and y = 0, 1, ..., N-1, denote an M x N image. The 2-D discrete Fourier Transform (DFT) of f, denoted by F(u, v), is given by the equation:

F(u, v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x, y) exp( -j 2*pi (ux/M + vy/N) ),   for u = 0, 1, ..., M-1 and v = 0, 1, ..., N-1.

Conversely, the inverse discrete Fourier transform is given by:

f(x, y) = (1/MN) sum_{u=0}^{M-1} sum_{v=0}^{N-1} F(u, v) exp( j 2*pi (ux/M + vy/N) ),   for x = 0, 1, ..., M-1 and y = 0, 1, ..., N-1.

Note that even if f(x, y) is real, its transform is in general complex. The principal method of visually analyzing a transform is to compute its spectrum. The Fourier spectrum is defined as:

|F(u, v)| = sqrt( R^2(u, v) + I^2(u, v) )

The power spectrum is defined as:

P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v)

The phase angle is defined as:

phi(u, v) = arctan( I(u, v) / R(u, v) )

where R(u, v) and I(u, v) are the real and imaginary parts of F(u, v). For purposes of visualization it is typically immaterial whether we view |F(u, v)| or P(u, v).



5.2 Computing the 2-D DFT

MATLAB
The FFT of an M x N image array f is obtained in the toolbox with the function fft2, which has the syntax:

F = fft2(f)

This function returns a Fourier transform that is also of size M x N. Here we give an example of computing the 2-D DFT using Matlab.
Example 5.1 Compute 2-D DFT (Matlab)
>>img=imread('girl.bmp');
>>subplot(2,2,1);
>>imshow(img);
>>title('original image');
>>F=fft2(img);
>>S=abs(F);
>>subplot(2,2,2);
>>imshow(S,[ ]);
>>title('Fourier spectrum');
>>Fc=fftshift(F);
>>subplot(2,2,3);
>>imshow(abs(Fc),[ ]);
>>title('Centered spectrum');
>>S2=log(1+abs(Fc))
>>subplot(2,2,4);
>>imshow(S2,[ ]);
>>title('log transformation enhanced spectrum');

Figure 5.1 Fourier computation (Matlab)


The result is shown in Figure 5.1. Note that due to the periodicity property, the four
bright spots are in the corners of the image. Thus we need fftshift to move the origin of
the transform to the center of the frequency rectangle.

DIPimage
In DIPimage, choose menu transforms and menu item fourier transform(ft).
Example 5.2 Compute 2-D DFT (DIPimage)
>>image_out = readim('acros.tif','')
>>result = ft(image_out)

The result is shown in Figure 5.2.

Figure 5.2 Fourier computation (DIPimage)


5.3 Filtering in the Frequency Domain
5.3.1 Fundamentals of Filtering in Frequency Domain

The foundation for linear filtering in both the spatial and frequency domains is the
convolution theorem, which may be written as:

f(x, y) * h(x, y) ⇔ H(u, v) F(u, v)

And conversely,

f(x, y) h(x, y) ⇔ H(u, v) * F(u, v)

Here the symbol * indicates convolution of the two functions. From these two equations we see
that filtering in the spatial domain consists of convolving an image
f(x, y) with a filter mask h(x, y). Now, we give the basic steps in DFT filtering (a short sketch follows the list):
1. Obtain the padding parameters using function paddedsize: PQ = paddedsize(size(f));
2. Obtain the Fourier transform with padding: F = fft2( f, PQ(1), PQ(2) );
3. Generate a filter function, H, of size PQ(1) × PQ(2);
4. Multiply the transform by the filter: G = H .* F;
5. Obtain the real part of the inverse FFT of G: g = real( ifft2(G) );
6. Crop the top, left rectangle to the original size: g = g( 1:size(f,1), 1:size(f,2) ).
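Below is a minimal sketch of these six steps put together (assumptions: f is a grayscale image already in the workspace, paddedsize is the helper function used elsewhere in this chapter, and the filter H here is just an all-pass placeholder so the sketch stays self-contained):

>> PQ = paddedsize(size(f));              % 1. padding parameters
>> F  = fft2(f, PQ(1), PQ(2));            % 2. padded Fourier transform
>> H  = ones(PQ(1), PQ(2));               % 3. filter transfer function (placeholder; see the examples below for real filters)
>> G  = H .* F;                           % 4. apply the filter in the frequency domain
>> g  = real(ifft2(G));                   % 5. back to the spatial domain
>> g  = g(1:size(f,1), 1:size(f,2));      % 6. crop to the original image size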
Why padding?
Images and their transforms are automatically considered periodic if we choose to work
with DFTs to implement filtering. It is not difficult to visualize that convolving periodic
functions can cause interference between adjacent periods if the periods are close with
respect to the duration of the nonzero parts of the functions. This interference, called
wraparound error, can be avoided by padding the functions with zeros.
5.3.2 Lowpass Frequency Domain Filters

An ideal lowpass filter (ILPF) has the transfer function

H(u, v) = 1   if D(u, v) ≤ D0
H(u, v) = 0   if D(u, v) > D0

where D0 is a specified nonnegative number and D(u, v) is the distance from point (u, v)
to the center of the filter.
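As a brief sketch of how such a transfer function can be built (this reuses the dftuv helper that also appears in Example 5.3 below, and assumes PQ and D0 are already defined):

>> [U, V] = dftuv(PQ(1), PQ(2));    % frequency-coordinate grids
>> D = sqrt(U.^2 + V.^2);           % distance from the origin of the (unshifted) transform
>> H = double(D <= D0);             % ideal lowpass transfer function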
MATLAB
In Matlab, there are two ways to generate a frequency domain filter: one is obtaining the filter
from a spatial filter, and the other is generating the filter directly in the frequency domain.
For the former we use the function freqz2. Here we focus on the latter way of
generating filters.
Example 5.3 Lowpass filtering (Matlab)
>>f=imread('girl.bmp');
>>PQ=paddedsize(size(f));
>>[U,V]=dftuv(PQ(1),PQ(2));
>>D0=0.05*PQ(2);
>>F=fft2(f,PQ(1),PQ(2));
>>H=exp(-(U.^2+V.^2)/(2*(D0^2)));
>>g=dftfilt(f,H);
>>subplot(2,2,1);
>>imshow(f,[])
>>title('original image');
>>subplot(2,2,2);
>>imshow(fftshift(H),[])
>>title('the low pass filter in image form');
>>subplot(2,2,3);
>>imshow(log(1+abs(fftshift(F))),[])
>>title('spectrum');
>>subplot(2,2,4);
>>imshow(g, [])
>>title('the image after low pass filter');

The result is shown in Figure 5.3.


DIPimage
Choose the menu 'Filters' and pick different filters; in the 'method used for computation'
combo box choose 'ft', as shown in Figure 5.4.

Figure 5.3 Lowpass filtering (Matlab)

Figure 5.4 DIPimage lowpass filter setting


Example 5.4 Lowpass filtering (DIPimage)
>> image_out = readim('trui.tif','')
>> result = gaussf(image_out,5,'best')

5.3.3 Highpass Frequency Domain Filters

Given the transfer function H_lp(u, v) of a lowpass filter, we obtain the transfer function
of the corresponding highpass filter by using the simple relation H_hp(u, v) = 1 − H_lp(u, v).
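As a brief illustration of this relation (a sketch reusing the Gaussian lowpass transfer function H and the dftfilt helper from Example 5.3 above; this is not the Butterworth filter asked for in the assignment below):

>> Hhp = 1 - H;               % highpass transfer function obtained from the lowpass one
>> g_hp = dftfilt(f, Hhp);    % apply it in the frequency domain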
Assignment 5.1:
Design a Butterworth high pass filter and apply it to the image kickoff_grayscale.jpg.
Look on the web for the equation of the Butterworth low pass filter, then use it to obtain the
high pass filter and apply it to this image. (Use Matlab)

Chapter 6: Binary image processing


A binary image is a digital image that has only two possible intensity values for each
pixel. The two most commonly used values are 0 for black and 1 (or
255) for white. A binary image is produced by thresholding a gray scale or color image and
is often the output of operations such as image segmentation, masking, silhouette estimation,
etc. The advantages of binary image processing are the simplicity of the processing and the low
memory requirements. Observe the example of a binary image in Figure 6.1. A binary image
can be created using the graythresh function in Matlab, which computes an optimal threshold:
level = graythresh(Grayscale_image);
Binary_image = im2bw(Grayscale_image, level);

Figure 6.1: Binary image (color image, grayscale image, binary image)

6.1 Neighborhood relations

In binary images, an object is defined as a connected set of pixels. The neighborhood of
a pixel is the set of pixels that touch it; thus, for 2D images we can define the
4-neighbourhood and 8-neighbourhood of a pixel, as shown in the image below. In 3D, we
can define the 6-connected, 18-connected or 26-connected neighborhood. Two pixels are
said to be connected if they belong to the neighborhood of each other. Several pixels are
said to be connected if there is some chain of connections between any two of them.

Figure 6.2: Pixel connectivity (4-connected pixels, 8-connected pixels)

In DIPimage, these connectivities are specified as 1 and 2 for the 4- and 8-connected steps. In
3D, the 6-connected, 18-connected and 26-connected neighborhoods are represented in
DIPimage with connectivities of 1, 2 and 3 respectively.
6.2 Binary morphology
The basis of mathematical morphology is the description of image regions as sets. The
standard set notation can be used to describe image operations:

Table 6.1: Set operations

  Set operation      MATLAB expression   Name of function
  A                  A                   Image itself
  Ac (complement)    ~A                  NOT (image complement)
  A ∪ B              A | B               OR (union of two images)
  A ∩ B              A & B               AND (intersection of two images)
  A − B              A & ~B              DIFFERENCE (difference of two images; the pixels in A that aren't in B)

All morphological operations can be described using this notation. Main morphological
operations that we will cover in this practicum are dilation, erosion, opening and closing.
6.2.1 Structuring element

A structuring element is used in morphological operations to examine the image. It is a
matrix consisting of only 0's and 1's that can have any arbitrary shape and size. The pixels
with values of 1 define the neighborhood. It should be carefully chosen depending on the
image that is processed.
MATLAB:
In order to determine the structuring element in Matlab function strel is used:
SE = strel (shape, parameters)

This function creates a structuring element, SE, where shape is a string specifying the desired
shape. Depending on shape, strel can take additional parameters of the type specified by
shape. The table below lists all the supported shapes.
Table 6.2: Flat structuring element shapes supported by strel
'arbitrary', 'diamond', 'disk', 'line', 'octagon', 'pair', 'periodicline', 'rectangle', 'square'
Examples of structuring elements:

  1 1 1
  1 1 1      (origin at the center element)
  1 1 1
SE = strel('rectangle', [3 3])

    1
  1 1 1
    1
SE = strel('diamond', 1)

  1 0 0
  1 0 0
  1 0 1
SE = strel('arbitrary', [1 0 0; 1 0 0; 1 0 1])

The structuring element consists of a pattern specified as the coordinates of a number of
discrete points relative to some origin. Normally Cartesian coordinates are used and so a
convenient way of representing the element is as a small image on a rectangular grid.

The examples above show a number of different structuring elements of various sizes.
The origin does not have to be in the center of the
structuring element, but often it is. As these examples suggest, structuring elements that
fit into a 3×3 grid with the origin at the center are the most commonly seen type.
DipImage:
In DipImage the structuring element is defined using the parameters filterShape ('rectangular',
'elliptic', 'diamond', 'parabolic') and filterSize within the morphological functions.
6.2.2 Erosion

The basic effect of erosion on a binary image is to gradually erode away the boundaries of
regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels
shrink in size while holes within those regions become larger. The erosion operator
takes two pieces of data as inputs. The first is the image which is to be eroded; the second
is a structuring element. It is the structuring element that determines the precise effect of
the erosion on the input image.
The following functions are used:
MATLAB: image_out = imerode (binary_image, structuring_element)
Dip_image: image_out = erosion (image_in, filterSize, filterShape)
6.2.3 Dilation

Dilation is the dual operation to erosion. The basic effect of dilation on a binary image
is to gradually enlarge the boundaries of regions of foreground pixels (i.e. white pixels,
typically). Thus areas of foreground pixels grow in size while holes within those regions
become smaller. The dilation operator takes two pieces of data as inputs. The first is the
image which is to be dilated; the second is a structuring element. It is the structuring
element that determines the precise effect of the dilation on the input image.
The following functions are used:
MATLAB: image_out = imdilate (binary_image, structuring_element)
Dip_image: image_out = dilation (image_in, filterSize, filterShape)
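A minimal usage sketch in Matlab (assumes a binary image bw is already in the workspace; the disk size is only illustrative):

>> se = strel('disk', 3);            % structuring element
>> bw_eroded  = imerode(bw, se);     % foreground shrinks, holes grow
>> bw_dilated = imdilate(bw, se);    % foreground grows, holes shrink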
Assignment 6.1
Load image sky.jpg and create a binary image from it. Choose a structuring element and
perform erosion. Try several different structuring elements, vary their size and generate
images similar to Figure 6.4. What can you observe? Now perform dilation. Can you
observe the difference in the images? Results are shown in Figure 6.3.

Figure 6.3: Effects of erosion and dilation (original image, binary image, erosion, dilation)

Figure 6.4: Effects of the size of the structuring element on erosion (binary image; erosion with SE sizes 3, 5 and 7)

6.2.4 Opening and closing

The sequence of erosion followed by dilation from the previous example is called an
opening. It is used to remove small objects from an image while preserving the shape and
size of larger objects in the image. The inverse sequence, dilation followed by erosion, is
a closing, and it serves to remove small holes in the objects.
Following functions are used:
MATLAB: image_out = imopen (binary_image, structuring_element)
image_out = imclose (binary_image, structuring_element)

Dip_image: image_out = bopening (image_in, filterSize, filterShape)


image_out = bclosing (image_in, filterSize, filterShape)

Assignment 6.2
Load image sky and perform an opening operation. Compare with the results shown in
Figure 6.3. Is there a difference?
6.2.5 Finding boundary pixels

The boundary pixels can be found by first dilating the object and subtracting the original
from it, or by first eroding the image and subtracting the result from the original:
1. Boundary_image = imdilate(Original_image, SE) - Original_image
2. Boundary_image = Original_image - imerode(Original_image, SE)


Assignment 6.3
Load image sky and extract the boundary pixels. Choose the structuring element carefully and
try to get a result similar to Figure 6.5. Can you improve the result? Is the quality of the
result dependent on the size of the structuring element? Explain why.

Figure 6.5: Extracting boundary pixels (original binary image, extracted pixel boundaries)
6.3 Morphological reconstruction

It is convenient to be able to reconstruct an image that has survived several erosions, or
to fill an object that is defined, for example, by a boundary. The formal mechanism for
this has several names, including region-filling, reconstruction, and propagation.
Reconstruction is a morphological transformation involving two images and a structuring
element. The first image, the seed image, is the result of a morphological operation (e.g.
erosion). The other image is the original binary image, often called the mask image. The
structuring element defines connectivity.
One commonly used method is opening by reconstruction. It restores the original shapes of the
objects that remain after erosion. The following commands can be used:
MATLAB:
For reconstruction:
reconstructed_image = imreconstruct(seed_image,mask_image,connectivity)

For filling holes:


image_out=imfill(binary_image, connectivity, holes)

For clearing border objects:


image_out =imclearborder(binary_image, connectivity)

For defining the image skeleton:

image_skeleton = bwmorph(binary_image, 'skel', Inf);

For defining branch pixels in the image skeleton:

image_branchpoints = bwmorph(image_skeleton, 'branchpoints');

NOTE: default connectivity is 8 or 4 for a 2D image
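A minimal sketch of opening by reconstruction in Matlab (assumes a binary image bw already exists; the structuring element size is only illustrative):

>> se   = strel('disk', 5);                         % structuring element
>> seed = imerode(bw, se);                          % the eroded image acts as the seed (marker)
>> bw_reconstructed = imreconstruct(seed, bw, 8);   % grow the seed inside the original mask, 8-connected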

DipImage:
For reconstruction:
image_out=bpropagation(image_seed,image_mask,iterations,connectivity,
edgeCondition)

Iterations: the number of steps taken defines the size of the structuring element (0 is the
same as Inf, meaning repeat until no further changes occur).
EdgeCondition: Make sure to set the edge condition to 0. This is the value of the pixels
just outside the boundary of the image. If you set it to 1, all objects touching the border
will also be reconstructed.
For clearing border objects:
image_out = brmedgeobjs(image_in,connectivity)

For defining image skeleton:


image_out = bskeleton(image_in, edgeCondition, endPixelCondition);

For defining branch pixels in image skeleton:


image_out = getbranchpixel(image_in)

Assignment 6.4:
Load the image text.jpg and, using different morphological operations, obtain the results
shown in Figure 6.7. Explain the difference between the different images.

Figure 6.7: Examples of different morphological operations (original binary image; eroded with a vertical line; opened with a vertical line; reconstructed with a vertical line; holes filled; border characters removed)
Assignment 6.5: Quality control of incandescent lamps
Load the image lamps. It contains an image of six bulbs, two of which are to be
discarded. The bulbs must have a contact at the bottom, and it must not touch the outer
edge, which is the other contact. Threshold at a low value, such that the bulb is merged
with the background (we are only interested in the fitting, which is characterized by the
two black rings). Now remove the background using imclearborder or brmedgeobjs.

Now devise some operations using not (or ~), dilation, erosion, imreconstruct (or
bpropagation) and/or imclearborder (brmedgeobjs) to detect either the good or bad bulbs
(either make a program that rejects bad bulbs or accepts good bulbs).
The colored images were generated with the command overlay. It overlays a grey-value
or color image with a binary image. The third (optional) parameter determines the color of the
binary objects. It is possible to apply this function several times, each with a different
binary image, which can thus be used to mark the image using several colors.

Figure 6.8: Selecting different objects (image lamps; exercise goal; alternate goal)
Assignment 6.6: Distinguishing nuts from bolts (only DipImage)
Now load the image nuts_bolts and threshold it to get a binary image. Note that the
threshold operation chooses the background as the object (because it is lighter), so you will
need to invert the image before or after the thresholding. Use the bskeleton function to
create a skeleton of the objects. What is the influence of the Edge Condition? What
does the End-Pixel Condition control?
With looseendsaway we can transform the nuts seen from the top (with the hole in them)
into a form that is distinguishable from the other objects in the image. Now use the
function getsinglepixel to extract the objects without a hole in them. This new image can
be used as a seed image in bpropagation. The mask image is the original binary image.
The objects with holes are retrieved with b & ~c (b and not c) if the output image of
bpropagation was c. Try extracting the last nut using the bskeleton and
getbranchpixel functions. Now solve the assignment in Matlab without using the DipImage
toolbox. Find the Matlab equivalents of the functions used in DipImage.

Figure 6.9: Selecting different objects (image nuts_bolts; exercise goal)


6.4 Gray scale morphology

All morphological operations explained so far can also be applied to grayscale
images. Morphological filtering is very useful and often applied to gray scale images,
since it can severely reduce noise while preserving the edges in the image, in contrast
to linear filters. It is able to distinguish structures based on size, shape or contrast
(whether the object is lighter or darker than the background). It can be employed to
remove some structures, leaving the rest of the image unchanged. In this sense,
morphology is one step ahead of other image processing tools towards image
interpretation.
6.4.1 Morphological smoothing

Because opening suppresses bright details smaller than the structuring element and
closing suppresses dark details smaller than the structuring element they are often used in
combination for image smoothing and noise removal. Note that the size of the structuring
element is an important parameter. A morphological smoothing with a small structuring element
is an ideal tool to reduce noise in an image.

MATLAB:
Image_opening=imopen(imclose(Image, SE), SE);
Image_closing=imclose(imopen(Image, SE), SE);

DipImage:
Image_opening = bopening (bclosing (image_in, filterSize, filterShape),
filterSize, filterShape)
Image_closing = bclosing (bopening (image_in, filterSize, filterShape),
filterSize, filterShape)

The results of the morphological smoothing are shown in Figure 6.10.

Figure 6.10: Morphological smoothing (original image; open-close filtering; close-open filtering)
Assignment 6.7
Load the image erika and construct a smoothing filter that removes most of the hair
but leaves the girl's face recognizably human. In the process, use both the open-close and the
close-open filtering and explain the difference between them.


6.4.2 Morphological sharpening

As we already saw in Chapter 4, for sharpening we should use edge detectors. The
morphological gradient magnitudes are:

Edge1 = dilation(A) − A
Edge2 = A − erosion(A)

In a similar way, we can construct a morphological second derivative (MSD):

MSD = (Edge1 − Edge2) / 2 = (dilation(A) − 2·A + erosion(A)) / 2

Note that it has a similar form as the Laplace operator, (1, −2, 1) across the edge.

A sharper version of the morphological Laplacian can be computed by taking the
minimum value of the two edge detectors. Note that the sign of the morphological
Laplacian is used for this purpose (use the function sign):

MSDsharp = sign(Edge1 − Edge2) · min(Edge1, Edge2)
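A brief Matlab sketch of these formulas (assumptions: A is a grayscale image in double format, the 3×3 square structuring element is only an illustrative choice, and applying the result by subtraction is our assumption here, by analogy with Laplacian sharpening):

>> se    = strel('square', 3);
>> Edge1 = imdilate(A, se) - A;                % outer morphological gradient
>> Edge2 = A - imerode(A, se);                 % inner morphological gradient
>> MSD      = (Edge1 - Edge2) / 2;             % morphological second derivative
>> MSDsharp = sign(MSD) .* min(Edge1, Edge2);  % sharper morphological Laplacian
>> A_sharp  = A - MSDsharp;                    % sharpen by subtraction (assumption, analogous to Laplacian sharpening)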
Assignment 6.8
Observe the results obtained by morphological sharpening in the figure below. Compare the
results with the result obtained by using the regular Laplace operator. What is the
difference?

Figure 6.10: Morphological sharpening (original image; linear Laplace; morphological Laplace; sharp morphological Laplace)

6.4.3 Compensating for non-uniform illumination (Top-hat transformation)

Openings can be used to compensate for non-uniform background illumination, which
occurs very often in real world scenes. Figure 6.11 (a) shows an image f in which the
background is darker towards the bottom than in the upper portion of the image. The uneven
illumination makes image thresholding difficult. Figure 6.11 (b) is a thresholded version
in which the grains at the top of the image are well separated from the background, but the
grains in the middle are improperly extracted from the background. Opening of the
image can produce an estimate of the background across the image, if the structuring
element is larger than the rice grains. The estimated background is shown in Figure 6.11 (c).


>> f = imread('rice.tif');     % original image
>> se = strel('disk', 10);
>> fo = imopen(f, se);         % estimate of the background

By subtracting background estimate from the original image, we can produce the image
with reasonably even background. Figure 6.11 (d) shows the result, while the Figure 6.11
(e) shows the new thresholded image. The improvement is apparent.
Subtracting an opened image from the original is called the top-hat transformation.
>> f2 = imtophat(f, se);       % image with even background
(equivalent to f2 = imsubtract(f, fo);)

A related function, imbothat(f, se), performs the bottom-hat transformation, defined as the
closing of the image minus the image itself. These two functions can be used for contrast
enhancement using commands such as:
>> se = strel('disk',3);
>> g =imsubtract(imadd(f, imtophat(f,se)), imbothat(f,se));

Figure 6.11: Top-hat transformation. a) Original image, b) thresholded image, c) background estimation, d) top-hat transformation, e) thresholded top-hat image
Assignment 6.10
Load the image plugs and create a binary image from it. Improve the quality of the
thresholded image by compensating for the uneven illumination. How many connected
components can you count? Using morphological operations, try to separate all the plugs
in the image. How many connected components can you count now? Extract the
boundary of each connected component. Can you suggest a method to measure the size
of each plug? Explain it in your own words.

Chapter 7: Measurements
In this chapter we describe various ways to measure objects in the image. In previous
chapters we saw different ways for image processing but in order to progress to image
analysis we first need to learn how to select objects in images and measure their basic
properties.
7.1 Selecting objects

In a binary image, an object is considered a connected set of pixels. As discussed in
Section 6.1, there are different connectivity modes that define which pixels are
considered connected (and thus belonging to the same object). Before we can measure a
property of such an object (say, the number of pixels that define it), we need to extract
the object from the image. The common way of doing this is to label all objects. Labeling
involves finding any foreground pixel in the image, giving it a value (the label ID), and
recursively giving the same value to all pixels that are connected to it. This process is
repeated until all foreground pixels have been assigned to an object. To extract one
object, all we now need to do is get all pixels with its ID.
MATLAB:
Function bwlabel computes all the connected components in a binary image.
[L, num] = bwlabel(binary_image, n)

It returns a matrix L, of the same size as binary_image, containing labels for the
connected objects in binary_image. The variable n can have a value of either 4 or 8,
where 4 specifies 4-connected objects and 8 specifies 8-connected objects. If the
argument is omitted, it defaults to 8.
The number of connected objects found in binary_image is returned in num.
You can use the MATLAB find function in conjunction with bwlabel to return vectors
of indices for the pixels that make up a specific object. For example, to return the
coordinates for the pixels in object 2, enter the following:
>>[r, c] = find(bwlabel(binary_image)==2)

In order to show the labeled components in different colors, as in Figure 7.1, we can use the
following function:
>> RGB = label2rgb(L);    % convert label matrix into RGB image

And in order to display just the selected object:

>> mask = zeros(size(L));
>> mask(sub2ind(size(L), r, c)) = 1;    % set exactly the pixels of object 2 (r, c come from find above)
>> imshow(label2rgb(L.*mask))

Labeled objects and the selected one are shown in Figure 7.1.

DipImage:
The equivalent function is label, where connectivity is the same as in the bwlabel function and
minSize, maxSize represent the minimum and maximum size of objects to be labeled.
image_out = label(image_in,connectivity,minSize,maxSize)

A specific object can be easily extracted using:

>> image_out == 2

Figure 7.1: Selecting and measuring objects. a) Labeled components, b) selected object, c) perimeter of the selected object, d) centroids of all components
Assignment 7.1:
Load the image cerment, label all the objects and show different components with
different colors. Now extract only the largest object. The area of the object can be easily
obtained by summing all the pixels belonging to that object.
7.2 Measuring in binary images

For basic measurements in Matlab we can use the function regionprops. It measures a set of
properties for each connected component (object) in the binary image. All the results are
obtained in image pixels and represent only relative measurements. Linking these
measurements to real world values is explained in Section 7.5.
STATS = regionprops(binary_image, properties)

The supported properties are listed in Table 7.1.

Table 7.1: List of property arguments of the regionprops function
Area, BoundingBox, Centroid, ConvexArea, ConvexHull, ConvexImage, Eccentricity, EquivDiameter,
EulerNumber, Extent, Extrema, FilledArea, FilledImage, Image, MajorAxisLength, MinorAxisLength,
Orientation, Perimeter, PixelIdxList, PixelList, Solidity, SubarrayIdx

Later in the text we will see how to use this function.

In DipImage there is a nice function called measure, available from the Analysis menu.
ms=measure(object_in,gray_in,measurementIDs,objectIDs,connectivity,
minSize,maxSize)

PARAMETERS:
object_in: binary or labelled image holding the objects.
gray_in: (original) gray value image of object_in. It is needed for several types of
measurements. Otherwise you can use [].
measurementIDs: measurements to be performed, either a single string or a cell array
of strings (e.g. {'Size','Perimeter','Mean','StdDev'}). See measurehelp.
objectIDs: labels of the objects to be measured. Use [] to measure all objects.
minSize, maxSize: minimum and maximum size of objects to be measured. By default use
0.
To extract all measurements for one object, index the returned measurement object using
the label ID of the object you are interested in. The next example illustrates the four types
of indexing:
>> data(2)        % properties of the object with ID 2
>> data.size      % size of all objects
>> data(2).size   % size of the object with ID 2
>> data.size(2)   % size of the 2nd element (not necessarily the object with ID 2)

The same function is used for all measurements performed in DipImage, so we won't
mention it again.
The main object properties that we can measure are the area, perimeter, length, centre of mass
and the number of objects in the image, so explanations of how to measure these properties
follow.
7.2.1 Area

The most important characteristic of an object is its area, which is simply the number of
pixels. It's also the first measurement of importance of the object. Whatever the "real" (or
physical) object is, its area computed this way will be as close as we like to its "true" area
as the resolution increases.
Solution 1:
In Matlab, we can use the command bwarea to estimate the area of objects in binary
image. Result is a total number of pixels with value 1 in the image.
Area=bwarea(binary_image);

If we want to estimate the area of the object in Figure 7.1 b), we can calculate it using:
>>[L, num] = bwlabel(binary_image, n)

>>[r, c] = find(bwlabel(binary_image)==2);
>>mask = zeros(size(L));
>>mask(sub2ind(size(L), r, c)) = 1;             % select exactly the pixels of object 2
>>Area_selected_object = bwarea(L.*mask)        % ≈ 189.38 pixels
>>Area_all_labeled_components = bwarea(L)       % ≈ 17788 pixels

Solution 2:
Another way to do it is by using regionprops function.
>>Rice_Area = regionprops(binary_image, 'Area')
% Rice_Area = 99x1 struct array with fields: Area

Rice_Area is now a structure array with the number of elements equal to the number of
connected components in the image. To access the area of the selected object from Figure 7.1 b)
(object 2) we can use the following command:
>>Rice_Area(2,1).Area      % 187 pixels

Solution 3:
In DipImage, just specify Area as the argument in the measure function.
7.2.2 Perimeter

The perimeter of an object is the number of edges around the object. The perimeter is
easy to compute but the result depends on the orientation of the object with respect to the
grid. Therefore it is not a good way to measure objects in digital images.
Solution 1:
In Matlab there is a function bwperim that returns a binary image containing only the
perimeter pixels of objects in the input image.
Perimeter=bwperim(binary_image);

Let's calculate the perimeter of the selected object in Figure 7.1 b). The length of the perimeter
can be calculated using the previously mentioned function bwarea. The result is shown in
Figure 7.1 c).
>>Perimeter_selected_object = bwperim(L.*mask);
>>Perimeter_length = bwarea(Perimeter_selected_object)    % ≈ 64.12 pixels

Solution 3:
In DipImage, just specify Perimeter as the argument in the measure function.
Assignment 7.2
Load the image rice and create a binary image from it. Calculate the perimeter and the
length of the perimeter for all objects in the image, as well as for object 5. Display the result
and verify that it is the same as with Solution 1. Use the regionprops function.

7.2.3 Centroid (Centre of mass)

An object in a digital image is a collection of pixels. The coordinates of the center of
mass are then the average of the coordinates of all its pixels.
Solution 1:
In Matlab, it can be computed using the regionprops function.
>>Rice_Centroid = regionprops(binary_image, 'centroid');

Displaying the centroids can be done in the following way. The result is shown in Figure 7.1 d).
>>centroids = cat(1, Rice_Centroid.Centroid);
>>imshow(binary_image)
>>hold on
>>plot(centroids(:,1), centroids(:,2), 'b*')
>>hold off

The centroid of the selected object 2 is:

>>Rice_Centroid(2,1).Centroid      % [131.1497  61.1818]

Solution 2:
In DipImage, just specify Gravity as the argument in the measure function.
7.2.4 Euler number

The Euler number is a measure of the topology of an image. It is defined as the total
number of objects in the image minus the number of holes in those objects.
In Matlab, it can be computed using bweuler function.
>>Rice_Euler = bweuler(binary_image,8)
>>Rice_Euler = 98

Since there are no holes in image rice we can conclude that there are 98 objects in
image.
Assignment 7.3
Load the image rice and create binary image from it. Calculate the total number of
objects in image using regionprops function.
Assignment 7.4
Load the image nuts_bolts and separate the nuts from the bolts. Display their perimeters in
two different images. How many components are there in total in each image? Select one
bolt and estimate its centre of mass. In your opinion, are the values realistic?

7.3 Errors Introduced by Binarization

Note that the area, perimeter, etc. you measured earlier are not the exact measurements
you could have done on the objects in the real world. Because of the binarization, the
object boundary was discretized, introducing an uncertainty in its location. The true
boundary is somewhere between the last on-pixel and the first off-pixel. The pixel pitch
(distance between pixels) determines the accuracy of any measurement.
Assignment 7.5: Thought experiment
Imagine you drive a car with an odometer that indicates the distance travelled in 100
meter units. You plan to use this car to measure the length of a bridge. When you cross
the bridge, sometimes the odometer advances one unit, sometimes two. Can you use this
set-up to measure the length of the bridge accurately? How can you determine the
accuracy? What special measures do you need to take to make sure your measurement is
not biased?
Assignment 7.6: Errors in area measurement
The object area is computed counting the number of pixels that comprise the object. The
error made depends on the length of the contour. Quantify this error for round objects of
various sizes. What happens with the accuracy as a function of the radius? Why?
Hint: make sure you generate the objects with a random offset to avoid a biased result.
To do so, use the function rand:

a = ((xx+rand)^2 + (yy+rand)^2) < 64^2

Hint: on these images, you can use the function sum instead of regionprops, since you
only have one object in each image.
7.4 Measuring in Gray-Value images

In the previous sections we threw out a lot of information by binarizing the images before
measuring the objects. The original grey-value images contain a lot more information
(assuming correct sampling) than the binary images derived from them (which are not
correctly sampled). As long as we only apply operations that are sampling-error free, we
can perform measurements on grey-value images as if we were applying them directly to
analog images. In this case, measurement error is only limited by the resolution of the
image, noise, and imperfections caused by the acquisition system.
7.4.1 Thresholding

However, there are some limitations. If the continuous image is binary, the pixel sum
directly yields its size. To all other images we need to apply a form of nonlinear scaling
to accomplish this, without violating the sampling constraint. Shaping all edges into a
normalized error-function is an example of soft-clipping that fulfills the requirements.
DipImage has the function erfclip that performs this grey-value error-function soft clipping.

Using this function we make the edges more pronounced and suppress intensity fluctuations
in the foreground and background.
image_out = erfclip(image_in,threshold,range)

DEFAULTS: threshold = 128; range = 64;

Clipping is done between threshold ± range/2.
7.4.2 Calculating Area, Perimeter and Euler number (in DipImage)

The basic measurements in gray value images can be calculated in the following way. If
we let 0 ≤ B(x,y) ≤ 1 be a continuous function of image position (x,y), we can define the
Area, Perimeter and Euler number as:

Area: A = ∫∫ B(x, y) dx dy

Perimeter: P = ∫∫ sqrt( Bx² + By² ) dx dy

Euler number: E = (1/2π) ∫∫ ( Bxx·By² − 2·Bxy·Bx·By + Byy·Bx² ) / ( Bx² + By² ) dx dy

Here (Bx, By) is the brightness gradient and Bxx, Bxy, and Byy are the second partial
derivatives of brightness (check Section 10.1.3 for an explanation of the gradient and second
derivative).
Correspondingly, in the continuous gray-level image domain:
(1) the area can be computed using the image values directly;
(2) the perimeter requires first partial derivatives (gradient);
(3) the Euler number requires second order partial derivatives.
In order to compute the different properties, let's first generate a gray value image of a simple
disk with height 255 and radius 64. It is shown in Figure 7.2 a).
>> disk = testobject(a, 'ellipsoid', 255, 64);

The area can now be computed as the sum of all pixels:

>> Area = sum(disk)/255          % 1.2868e+004

To compute the perimeter displayed in Figure 7.2 b), we need the first partial derivatives
of the image:
>> Perimeter = sum(gradmag(disk,2))/255     % 402

Assignment 7.7
Create binary image of image disk, and calculate its area and perimeter. Compare results
with the ones obtained by measuring in gray level images. Comment on the error
introduced by binarization.

Assignment 7.8
Compute the Euler number of the gray level image of simple disk. Verify that there is
only one object on the image.

Figure 7.2: Measuring in gray level images. a) Gray level image of a disk, b) perimeter


7.5 Linking measurements to real world

All the measurements we have performed so far gave results in image pixels. If we need
to obtain these values in real world coordinates (meters), there are several parameters of the
image sensor and optics that we need to know in advance.
7.5.1 Pinhole camera model

Images are captured using cameras. A simple camera model is shown in Figure 7.3.

Figure 7.3: Pinhole camera model (a real-world point P = (X, Y, Z) is projected through the pinhole onto the point P' = (X', Y', Z') on the image plane at distance f)


The pinhole camera is the simplest, and the ideal, model of camera function. It has an
infinitesimally small hole through which light enters before forming an inverted image on
the camera surface facing the hole. To simplify things, we usually model a pinhole
camera by placing the image plane between the focal point of the camera and the object,
so that the image is not inverted. This mapping of three dimensions onto two is called a
perspective projection and perspective geometry is fundamental to any understanding of
image analysis. Assuming the model from Figure 7.3, the following relations can be derived.

Note that O represents the centre of projection and f the focal length of the camera. For a
world point P = (X, Y, Z) and its projection P' = (X', Y', Z') onto the image plane (Z' = f),
the vectors OP' and OP are parallel, so

X'/X = Y'/Y = Z'/Z,   with Z' = f.

This gives the image coordinates

x = X' = f·X/Z,   y = Y' = f·Y/Z,

so the perspective projection maps

(X, Y, Z) → (x, y, 1) = (f·X/Z, f·Y/Z, 1).
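A minimal numeric sketch of this projection in Matlab (all values are made up for illustration; the pixel size is only used to express the result in pixels relative to the principal point):

>> f  = 4.5e-3;               % focal length [m]
>> P  = [0.4; 0.2; 5.0];      % world point (X, Y, Z) [m]
>> x  = f * P(1) / P(3);      % image-plane coordinate [m]
>> y  = f * P(2) / P(3);
>> pixel_size = 3.17e-6;      % [m/pixel], e.g. from the sensor specification
>> uv = [x; y] / pixel_size   % position in pixels relative to the principal point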

7.5.2 Thin Lens law

In reality, one must use lenses to focus an image onto the camera's focal plane. The
limitation of lenses is that they can only bring into focus those objects that lie on one
particular plane parallel to the image plane. Assuming the lens is relatively thin
and that its optical axis is perpendicular to the image plane, it operates according to the
following lens law:

1/OA' + 1/OA = 1/f

OA is the distance of an object point from the plane of the lens, OA' is the distance of the
focused image from this plane, and f is the focal length of the lens. These distances can
be observed in Figure 7.4.

Figure 7.4: Camera model with a lens

7.5.3 Calculating real size of the object

If we now take into account the formulas mentioned so far, we can write:

1/OA' + 1/OA = 1/f   and   AB / A'B' = OA / OA'

so

AB = (OA / OA') · A'B' = (OA/f − 1) · A'B'

So if we know the size of the object in the image (A'B', i.e. its size in pixels times the pixel size),
the focal length of the camera that captured the image and the precise distance from which
the image was taken (OA), we can estimate the real size of the object (AB).
If we want to calculate the size of the robot in Figure 7.4, we need the following parameters:
f = 4.503 mm; focal length of the lens (check the lens specifications)
Pixel size: 3.17 µm (check the sensor specifications)
|A'B'| = 343 pixels; measured from the image
|OA| = 5 m; distance from which the image was taken
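Putting these numbers into the formula above gives a quick estimate (a sketch; only the parameter values listed above are used):

>> f  = 4.503e-3;                 % focal length [m]
>> px = 3.17e-6;                  % pixel size [m]
>> AB_image = 343 * px;           % object size on the sensor, A'B' [m]
>> OA = 5;                        % object distance [m]
>> AB = (OA/f - 1) * AB_image     % estimated real size, roughly 1.2 m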
Assignment 7.9
Using your own camera, fix the distance to 1 m, capture an image of a ruler and measure
how many µm are in one pixel.
Assignment 7.10
Place a uniformly colored object on a uniform background and capture an image of the
object with your own camera. Fix the distance from which the image is taken to 1 m.
Create a binary image and compute the perimeter of the object in the image. If you know the
object perimeter in pixels, how large is it in cm? Measure the perimeter of the real object
with a ruler and compare with the estimated value. How large is the measurement error?

Chapter 8: Color Image Processing


Until now all images that we have seen were represented by a single value for each pixel,
by a function f : Rⁿ → R where each pixel of an n-dimensional image is represented by a
single value. A more general representation of an image is a function f : Rⁿ → Rᵐ where
each pixel of an n-dimensional image is represented by m values. Examples of these images
are color images and multispectral images.
Color is a very powerful descriptor in image processing and is often used for object
identification and scene segmentation. Humans can discern about 10 million different colors,
compared to only a few dozen shades of gray. In image processing, color is
represented in a standard, generally accepted way by a specific color model. A color model
is a specification of a 3D coordinate system and a subspace within that system where
each color is represented by a single point. The two most commonly used color models are
the RGB and HSV color models.
8.1 RGB color model

The RGB color model is based on a Cartesian coordinate system and each color appears
in its primary spectral components of red, green and blue. An RGB image may be viewed
as a stack of three gray scale images that, when fed into the red, green and blue inputs of
a color monitor, produce a color image on the screen. By convention, the three images
forming an RGB color image are referred to as the red, green and blue component images.
The representation of the RGB model using the color cube is shown in Figure 8.1.

Figure 8.1 Representation of the RGB color model


The data class of the component images determines their range of values, and the number
of bits used to represent the pixel values of the component images determines the bit depth
of the image. If we observe the values shown in Table 8.1, we can conclude that the data
type that best matches the attributes of the human visual system is uint8. It can distinguish
a similar number of colors as the human visual system and is therefore the most commonly
used. The 16-bit representation (so-called Highcolor) is used only for displaying the
image on the screen.
Table 8.1: Influence of data type on color representation

  Data class          Range of values   Bit depth   Number of colors (2^b)^3
  double (b=2 bits)   [0, 1]            3*2 = 6     64
  uint8 (b=8 bits)    [0, 255]          3*8 = 24    16.777 * 10^6
  uint16 (b=16 bits)  [0, 65535]        3*16 = 48   2.814 * 10^14
Let's load the image ball and observe the red, green and blue components of this image.

>> I = imread('ball.jpg');
>> I_red   = I(:,:,1);
>> I_green = I(:,:,2);
>> I_blue  = I(:,:,3);
>> subplot(1,3,1); imshow(I_red)
>> subplot(1,3,2); imshow(I_green)
>> subplot(1,3,3); imshow(I_blue)

What are the differences between these images? In your opinion, how difficult would it be
to segment the ball from the field based on these three images?
RGB is a linear representation, since it directly maps different colors to light intensities
of the various frequencies. However, human vision is logarithmic, in the sense that the
perceived contrast is based on the ratio of two intensities, not their difference. Thus, RGB
is a (perceptually) non-uniform color space. This can create problems when observing
images with illumination changes.
Assignment 8.1:
Let's assume that a robot observes a ball on an outdoor football field. The ball has a
specific orange color and can be described using the RGB color model. However, since the
illumination constantly changes, the appearance of the color of the ball changes as well. It
appears much lighter when it is sunny and very dark orange when clouds appear. Observe the
different appearances of the ball in Figure 8.2. If we describe this ball only using RGB
values, they will differ significantly depending on the illumination and no connection can be
made between two different appearances of the same ball. For that reason an illumination
independent model of the color must be used for the ball representation. One such model
is the HSV model described below.

Figure 8.2: Influence of an illumination change on the ball appearance (parts of the R component under the three illuminations: 131 131 132 133 134; 210 211 211 214 215; 70 70 71 71 72)


8.2 HSV (HSI) color model

Another way to represent colors is using their Hue, Saturation and Value (Intensity)
attributes, as shown in Figure 8.3. The HSV model is based on a polar coordinate system.
The hue is a color attribute that describes the pure color (e.g. pure yellow, orange or red),
whereas the saturation gives a measure of the intensity of a specific hue value. In other
words, saturation gives a measure of the degree to which a pure color is diluted by white
light. The value represents the intensity of the light in the image. Observe the difference
between these three components in Figure 8.4 and how their levels change. The main
advantage of the HSV space over RGB is the fact that the intensity information (V or I) is
decoupled from the color information in the image, which allows an illumination invariant
representation of the image. Second, the hue and saturation components are closely related
to the way that humans perceive color. For these reasons HSV is a very commonly used
model in image processing.

Figure 8.3 Representation of the HSV color model


Figure 8.4: Influence of changes in the hue, saturation and value attributes on the image (each attribute shown at a high and a low value)


8.3 Conversion between different color models

The color models mentioned so far are not the only ones in use. In Table 8.2 we list more
color models supported by Matlab together with short descriptions.

Table 8.2: Additional color spaces supported by Matlab
- XYZ (encoding: uint16 or double): the original 1931 color space specification.
- xyY (encoding: double): the specification that provides normalized chromaticity# values. The capital Y value represents luminance and is the same as in XYZ.
- uvL (encoding: double): the specification that attempts to make the chromaticity plane more visually uniform. L is the luminance and is the same as Y in XYZ.
- L*a*b* (encoding: uint8, uint16 or double): the specification that attempts to make the luminance scale more perceptually uniform. L* is a nonlinear scaling of L, normalized to a reference white point.

# Hue and saturation taken together are called chromaticity.

In Matlab, the following commands are used to convert between different color spaces:
I_rgb = hsv2rgb(I_hsv) converts an HSV image to an RGB image.
I_hsv = rgb2hsv(I_rgb) converts an RGB image to an HSV image.

To convert to other color spaces besides HSV and RGB, we use the following:
C = makecform(type) creates the color transformation structure C that defines the color
space conversion specified by type. Some values that type can take are 'srgb2lab',
'srgb2xyz', ...
Lab = applycform(A, C) converts the color values in A to the color space specified in the
color transformation structure C.
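As a brief sketch of the makecform/applycform route (assuming the ball image used elsewhere in this chapter; the variable names are only illustrative):

>> RGB = imread('ball.jpg');
>> C   = makecform('srgb2lab');      % sRGB -> L*a*b* transformation structure
>> Lab = applycform(RGB, C);         % apply the conversion
>> L = Lab(:,:,1); a = Lab(:,:,2); b = Lab(:,:,3);   % separate color planes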
In DIPimage, we can use the following commands to establish the current color space and
change to a different one. col can take any of the values in Table 8.3.

Table 8.3: Supported color models in DIPimage
L*a*b*, L*u*v*, L*C*H*, R'G'B', RGB, XYZ, Yxy, CMY, CMYK, HCV, HSV


col = colorspace(I) returns the name of the color space of the image I.
I = colorspace(I,col), with I a color image, changes the color space of image I to col,
converting the pixel values as required.


Example 8.3
Let's compare the RGB and HSV color spaces.

Matlab:
>> RGB = imread('ball.jpg');
>> HSV = rgb2hsv(RGB);
>> H = HSV(:,:,1);
>> S = HSV(:,:,2);
>> V = HSV(:,:,3);
>> subplot(2,2,1), imshow(H); title('Hue')
>> subplot(2,2,2), imshow(S); title('Saturation')
>> subplot(2,2,3), imshow(V); title('Intensity')
>> subplot(2,2,4), imshow(RGB); title('RGB')

DIP Image:
>> RGB = readim('ball.jpg');
>> HSV = colorspace(RGB,'HSV');
>> H = HSV{1};
>> S = HSV{2};
>> V = HSV{3};

The resulting image is shown below.

Figure 8.5 Hue, saturation and value attributes of the image


Assignment 8.2

1. Describe in detail the L*a*b* model.
2. What are the differences between that model and the RGB model? And between it and the
HSV model?
3. Load the image ball, convert it from RGB to the L*a*b* color space using only
Matlab code and show the separate color planes in a diagram.
4. Increase the values of the L component by 30% and observe how that change
influences the original image.
5. Vary the values of a and b and inspect how the original image is affected.


8.4 Spatial Filtering of color images

Since a color is represented as a vector, addition, subtraction and scalar multiplication
can be performed on pixels. Furthermore, these operations are all performed on an
element-per-element basis, which means that they can be performed on the
different color channels separately. Thus, all linear filters (which only require these three
operations) can be applied to each channel separately. This reduces the complexity of
filtering a 3-component vector image to filtering three grey-value images.
However, the fact that these operations are linear does not imply that they are perceptually
linear. For that reason it is suggested to do all the filtering in a perceptual color space.
8.4.1 Color smoothing

One way to smooth a monochrome image is to define a filter mask over a pixel
neighborhood, multiply all pixel values by the coefficients in the spatial mask and divide
the result by the sum of the elements in the mask. For color images the process is the
same, except that instead of scalar pixel values we now deal with vector values.
Let's now check functions that perform color smoothing.
MATLAB
>>color_image=imread('flower.jpg');
>>w=fspecial('average', 25);
>>color_image_blurred=imfilter(color_image, w, 'replicate');

DIP Image
In DIPimage the color images are not supported directly by the filters. To apply the same
filter to each of the color channels, use the function iterate:
>> color_image=readim('flower.jpg');
>> color_image_blurred = iterate('gaussf',color_image,10);

The result of the smoothing is shown in Figure 8.6.


Assignment 8.3 (use Matlab)
Load the image flower, convert it to the HSV space and smooth only the intensity
component. Reassemble the image using the command cat and display it in the RGB
color space. Compare the result with the resulting image from the previous example. What
can you observe?
8.4.2 Color sharpening

Sharpening the image follows the same steps as smoothing, with the difference that we
use the Laplacian filter.
Let's now check functions that perform color sharpening.
MATLAB
>>lapmask=[1 1 1; 1 -8 1; 1 1 1];    % Laplacian filter mask
>>image_sharpen=color_image_blurred-imfilter(color_image_blurred, lapmask, 'replicate');
>>imshow(image_sharpen)

The result of the sharpening is shown in Figure 8.6.


Assignment 8.4
Perform the unsharp masking of image flower in DIP Image.

Figure 8.6: Effect of smoothing and sharpening on the image (original image, smoothed image, sharpened image)
8.4.3 Motion blur in color images

One problem that is constantly present in robot vision is motion blur due to camera
motion. It is important to be able to approximate the effect of camera motion on an image
and to learn how to correct such an image.
MATLAB
Create a filter, h, that can be used to approximate linear camera motion:
h = fspecial('motion', 50, 45);

Apply the filter to the image originalRGB to create a new image, filteredRGB, shown in
Figure 8.7:
>>originalRGB = imread('room.jpg');
>>imshow(originalRGB)
>>filteredRGB = imfilter(originalRGB, h);
>>figure; imshow(filteredRGB)

Assignment 8.5 (use Matlab)

Restore the original image from the blurred image from the previous example. You can use
the Wiener filter and the function deconvwnr.

Figure 8.7: Effect of blurring on the image


8.4.4 Color edge detection

In monochromatic images we use gradients for edge detection. However, computing the
gradient of a color image by combining the gradients of its individual color planes gives a
different result from computing the gradient directly in the RGB color space.
MATLAB
Computation of the gradient directly in the RGB color space is performed with the following
function, which is supplied with the practicum:
[VG, A, PPG]= colorgrad (image_RGB);

VG is the RGB vector gradient; A is the angle image in radians; PPG is a gradient image
formed by summing the 2-D gradient images of the individual color planes.
DIP Image
a = readim('robot_field.jpg')
b = norm(iterate('gradmag',a))

Assignment 8.6 (use either DIP Image or Matlab)

Calculate the gradient of the color image by combining the gradients of the separate color
planes. Use the HSV color space for the representation. Compare the results. Is there a
difference? When is it preferable to use the combined method of calculation and when the
direct one?
For a hint, check the code of the colorspace function.
8.4.5 Color morphology

For non-linear filters, it is often not clear how they should be applied to color
images. For example, the morphological operations, which should never introduce new
colors (the values of the output are selected from the input by maximum or minimum
operations), are particularly difficult to implement.

Assignment 8.7
Observe what happens if we apply the morphological operation dilation to the
simple test image in Figure 8.8. What can you conclude?

Figure 8.8: Effect of dilation on the color image


8.5 Segmentation of color images

Segmentation is a process that partitions the image into regions. Some images are easily
segmented when the correct color space has been chosen. This is very often L*a*b* or
HSV, and this exercise and the next will show why.
Assignment 8.8 (use either Matlab or DIP Image)
Read in the image robosoccer_in. This is an image recorded by a soccer-playing robot.
You'll see the dark green floor, greyish walls, a yellow goal, a black robot (the goal
keeper), and two orange balls (of different shades of orange). You should write an
algorithm to find these balls.
Look at the R, G and B components. You'll notice that it is not easy to segment the balls
using any one of these three images. One problem is that the bottom side of the balls is
darker than the top part. We need to separate color from luminance, as the L*a*b*
color space does.
Convert the image into L*a*b*. The a* channel (red-green) makes the segmentation very
easy (by chance: we are looking for objects with lots of red, and the balls are the only
such objects in the image). A triangle threshold will extract the balls. Use the function
[out, th_value] = threshold(in,type,parameter) for DIP_Image and simple thresholding of
the values for Matlab.
Note that the thin lines along strong edges are caused by incorrect sampling in the
camera. This is a common problem with single-chip CCD cameras, where the three
colors of a single pixel are actually recorded at different spatial locations. If you zoom in
on such a strong edge in the input image, you'll notice the color changes. These thin lines
in our thresholded image are easy to filter out using some simple binary filtering.
The image robosoccer_in_dark contains the same scene recorded with a smaller
diaphragm (less light reaches the detector). Examine whether the algorithm still works under
these worse lighting conditions.

Figure 8.9: Results of the ball segmentation from Assignment 8.8
Assignment 8.9 (use Matlab)
The a* channel provides a good solution to our problem. However, if there were a red or
purple object in the scene (like a robot adversary), this technique wouldn't work. We
want to be able to differentiate orange not only from yellow, but also from red and
purple. The hue should provide us with a nice tool for this purpose.
Compute the hue image from robosoccer_in and use it to segment the balls. Try your
program on the other images in the series.

Chapter 9: Advanced topics


In this chapter we focus on several topics that are very useful for understanding
image segmentation and robust object representation in Chapters 10 and 11. Our advice is
to read this chapter and solve the assignments if you want a deeper understanding of the
subject.
9.1 Scale-Spaces

The multi-scale nature of objects is quite common in nature. Most objects in our world
exist and can be processed only over a limited range of scales. A simple example is the
concept of a branch of a tree, which makes sense only at scales from a few centimeters to a
few meters (take a look at Figure 9.1). It is meaningless to discuss the tree concept at the
nanometer or kilometer level. At those scales it is more relevant to talk about the
molecules that form the leaves of the tree, or the forest in which the tree grows. Once we want
to process the image of an object, we need to know what the relevant scales of that
object are. Another way would be to create a representation of the object that is
independent of scale. Scale space theory provides us with answers to these questions.

Figure 9.1: Multiscale structure of a tree (complete tree at scale 1; branch at scale 2; leaf at scale 3)

Such a multiscale representation of an image is called an image pyramid and is in fact a
collection of images at different resolutions. At low resolutions there are no details in the
image and it contains only very low frequencies (it is very blurred). At high resolution we
can observe both high and low frequencies and process even the smallest details in the
image.
We can reduce the resolution of an image by limiting its frequencies, which is the same as
applying a low pass filter. The most common filter to use is the Gaussian filter, since Gaussian
derivatives can be easily scaled due to the explicit scale parameter sigma, σ. Observe the
formula of the Gaussian filter (kernel) below:

G(x, y, σ) = 1/(2πσ²) · e^(−(x² + y²)/(2σ²))

Now, blurring the image is done by convolving the image with the Gaussian kernel, as we
saw in Chapter 4:

L(x, y, σ) = G(x, y, σ) * I(x, y)

L represents the blurred image and the parameter σ determines the amount of blur.
Higher values of σ yield a higher amount of blur in the image and lower the resolution
of the image. The scale space of the image Lena is shown in Figure 9.2.

Figure 9.2 Scale space of image Lena with 9 scales in total


In Matlab, we use the following function as a Gaussian filter; it returns a rotationally
symmetric Gaussian lowpass filter of size hsize with standard deviation sigma (positive).
>>h = fspecial('gaussian', hsize, sigma)
>>image_out = imfilter(image_in,h,'replicate');

In order to create the scale space of an image we need to calculate a sequence of images,
where each image is filtered with a Gaussian filter with a progressively higher value of sigma,
for example as sketched below.
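A minimal sketch of such a sequence (assumptions: a grayscale image img is already loaded, and the number of scales, base and starting sigma are only illustrative choices):

>> nscales = 9; base = sqrt(2); sigma0 = 1;
>> scale_space = cell(1, nscales);
>> for i = 1:nscales
>>     sigma = sigma0 * base^(i-1);                        % sigma grows by a constant factor per scale
>>     h = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma); % kernel size grows with sigma
>>     scale_space{i} = imfilter(img, h, 'replicate');
>> end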
In DipImage we can use the function gaussf to filter the image with a Gaussian filter:
image_out = gaussf(image_in, sigma, method)

For the calculation of the Gaussian scale space of an image, a separate function in the Analysis
menu is implemented:
[sp,bp,pp] = scalespace(image_in,nscales,base)

Output parameters are:
sp - scale pyramid; bp - difference between scales; pp - variance between scales
DEFAULTS:
number of scales (nscales) is 7,
base = sqrt(2), and
images are smoothed with base^i, i = 0:nscales-1


Assignment 9.1: Design scale space
Load the image lena and create the scale space of that image. In total use 9 scales as
shown in Figure 9.2. Use Matlab for implementation. Now check the scalespace function
in DipImage. Compare the results.

Scale spaces are often used to create scale-invariant object representations. One way to do
this is to combine the scale space with image resizing and to calculate object features in each
image. This method is used as the first step of one of the currently best image descriptors,
the Scale Invariant Feature Transform (SIFT).
The process goes as follows:
1. First we take the original image and generate progressively blurred images
(the scale space of that image).
2. Then we resize the original image to half its size and generate the blurred images
again.
3. We resize the original image to one quarter and generate the blurred images again.
4. We keep repeating this up to a defined threshold (we resize and construct the scale
space 4 times in the case of SIFT).

Figure 9.3 SIFT scale space


The scale space of one image in this process is called an octave, and in SIFT the author
proposed to use 4 octaves in total. The number of scales in each octave (the number of
blurred images) is set to 5. These values were set manually based on experimental results;
more or fewer octaves can be generated as well, depending on the problem. Take a look at
the constructed SIFT scale space of the image cat in Figure 9.3.
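The octave construction can be sketched in Matlab as follows. This is only an illustration
under assumed settings (base sigma 1.6, scale step sqrt(2), file name cat.jpg), not the
original SIFT implementation.

% Minimal sketch: 4 octaves with 5 Gaussian scales each (assumed parameters)
I = im2double(imread('cat.jpg'));             % hypothetical file name
if size(I,3) == 3, I = rgb2gray(I); end
noctaves = 4;  nscales = 5;
k = sqrt(2);  sigma0 = 1.6;                   % assumed base sigma and scale step
octaves = cell(noctaves, nscales);
for o = 1:noctaves
    for s = 1:nscales
        sigma = sigma0 * k^(s-1);
        h = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
        octaves{o, s} = imfilter(I, h, 'replicate');
    end
    I = imresize(I, 0.5);                     % halve the resolution for the next octave
end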
Assignment 9.2: Design SIFT scale space
Load the image lena and create the SIFT scale space of that image. In the implementation,
use 5 scales and 4 octaves. Use Matlab for the implementation. Why, in your opinion, is such
a representation good for scale-invariant recognition?
9.2 Hough Transform

The Hough Transform is a technique to detect pre-defined shapes. In image analysis,
when we want to detect simple shapes, we often encounter the problem of missing
points. In that case we need a technique such as the Hough transform to group the extracted
points into an appropriate set of lines, circles or ellipses.
The original Hough Transform is used to detect straight lines; the detection of other
shapes can be done in a similar way. A line can be parameterized by (see Figure 9.4)
p_0 = x\cos(\theta_0) + y\sin(\theta_0)
where p is the algebraic length of the normal of the line that passes through the origin,
and θ is the angle that this normal makes with the x-axis.

Figure 9.4 Line parameterization


To demonstrate the Hough Transform, we first have to make the vectors x and y that
together compose a line.
>> x = 0:30;
>> p0 = 20; theta0 = pi/3;
>> y = (p0-x*cos(theta0))/sin(theta0);
>> plot(x,y)


There are a lot of lines that go through a point (x1,y1). However, there is only one line that
goes through all points (xi,yi). At each point we determine all lines (combinations of p and
θ) that go through that point:
>> theta = 0:pi/256:2*pi;
>> p = x(1) * cos(theta) + y(1) * sin(theta);
>> plot(theta,p)

This results in a parameter space as shown in Figure 9.5. The axes of this space are
the parameters you are looking for (in this case p and θ).

Figure 9.5 Parameter space of the Hough transform (left: parameter space, right: sampled parameter space)


Assignment 9.3: Understanding the parameter space
In Figure 9.5, you can see two points where all lines come together.
- What do these points represent?
- Why are there two points?
- Compare this result to a parameter space of another line.
- How could you reduce the size of the parameter space?
- Are there any advantages or disadvantages of reducing the size?
Assignment 9.4: Implementing the Hough Transform
Now that the basic idea of the Hough Transform has been explained, we have to
implement the Hough transform so that you can apply it to binary images and do
measurements in the parameter space.

1. Make a binary input image of size 32x32 containing one or more lines.
2. Determine the necessary size of the parameter space if you want to measure θ
from 0 to 2π with an accuracy of π/128, and p from 0 to 32√2 with an accuracy of 1.
3. Make an empty parameter space image of the determined size.
4. Fill the parameter space:
- For each object point in the image determine all possible combinations of p and θ.
- For each combination of p and θ determine the corresponding pixel in the parameter
space image and increment the value of that pixel by one.
5. Find the maximum in the parameter space.

6. Determine the corresponding values of p and θ.
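A minimal sketch of one possible implementation of steps 3-6 is given here; it assumes a
binary image bw of size 32x32 is already available (step 1) and simply ignores negative p
values. It is an illustration, not the required solution.

% Minimal sketch: fill the (p, theta) parameter space and find its maximum
theta = 0:pi/128:2*pi;
pmax  = ceil(32*sqrt(2));
H     = zeros(pmax+1, numel(theta));            % empty parameter space
[yy, xx] = find(bw);                            % coordinates of object points
for k = 1:numel(xx)
    p     = xx(k)*cos(theta) + yy(k)*sin(theta);% all lines through this point
    valid = p >= 0 & p <= pmax;
    idx   = round(p(valid)) + 1 + (find(valid)-1)*size(H,1);  % linear indices
    H(idx) = H(idx) + 1;                        % increment the corresponding cells
end
[~, imax] = max(H(:));                          % step 5: maximum of the parameter space
[pbin, tbin] = ind2sub(size(H), imax);          % step 6: corresponding p and theta
p_detected = pbin - 1;  theta_detected = theta(tbin);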


Even for small images, the Hough transform is a time consuming process. Smart
programming will decrease the execution time dramatically. For shorter execution times,
the number of for-loops has to be reduced.
Assignment 9.5: Reducing execution time
Compare the calculation times t1 and t2 of the following two code fragments:
nx = 100; ny = 100; x = 0:99;
a = zeros(nx,ny);
b = zeros(nx,ny);
tic
for q = 5:5:30
    y = round((1+cos((x+q)/25))*40+10);
    for ii = 1:length(x)
        a(y(ii), x(ii)+1) = a(y(ii), x(ii)+1) + 1;
    end
end
t1 = toc
tic
for q = 5:5:30
    y = round((1+cos((x+q)/25))*40+10);
    I = y + x*ny;
    b(I) = b(I) + 1;
end
t2 = toc

Use this to speed up your Hough Transform. The variable I in the second part is an array
containing linear indices into the image b. Note how it is computed: the column number
multiplied by the height of each column, plus the row number. MATLAB arrays (and thus
also images) are stored column-wise.

Figure 9.6: Detecting lines in soccer field (Assignment 9.6)


Assignment 9.6: Apply the Hough Transform
Load the image soccer_filed, create a binary image and, using the Hough transform
implemented in Assignment 9.5, detect the lines on the field.

The Hough transform can also be used for detecting circles, ellipses, etc. For example, the
equation of a circle is (x - x0)² + (y - y0)² = r². In this case there are three parameters,
(x0, y0) and r, and the transformation that parameterizes this equation is called the Circular
Hough Transform. In general, we can use the Hough transform to detect any curve which can
be described analytically by an equation of the form g(v,C) (v: vector of coordinates, C:
parameters). Detecting arbitrary shapes with no analytical description is also possible,
and that transformation is called the Generalized Hough Transform.
We will further elaborate the usage of the Hough transform for line detection in Chapter 10.
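As an illustration, a circular Hough accumulator for a single, known radius could be
sketched as follows. The radius value and the binary edge image bw are assumptions; for
unknown radii the accumulator would gain a third dimension over a range of r values.

% Minimal sketch: circular Hough accumulator for one fixed radius r (assumed known)
r   = 20;                                    % assumed radius in pixels
phi = 0:pi/64:2*pi;
[rows, cols] = size(bw);
A = zeros(rows, cols);                       % accumulator over candidate centres (x0, y0)
[ye, xe] = find(bw);                         % edge points
for k = 1:numel(xe)
    x0 = round(xe(k) - r*cos(phi));          % centres of circles through this edge point
    y0 = round(ye(k) - r*sin(phi));
    ok = x0 >= 1 & x0 <= cols & y0 >= 1 & y0 <= rows;
    A  = A + accumarray([y0(ok).' x0(ok).'], 1, size(A));   % vote for candidate centres
end
[~, imax] = max(A(:));
[yc, xc]  = ind2sub(size(A), imax);          % most likely circle centre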
9.3 Mean Shift

In order to perform image segmentation, we first need to group the extracted points (features
in feature space) into clusters. One of the techniques used for that is Mean Shift.
Mean shift considers the feature space as an empirical probability density function. If the
input is a set of points, then Mean shift considers them as samples from the underlying
probability density function. If dense regions (or clusters) are present in the feature space,
then they correspond to the modes (local maxima) of the probability density function.
Let's now look at the example in Figure 9.7. The detected points (in yellow) correspond to
objects in the image and we want to cluster close points together and assign them to objects.
Notice that a large number of points close to each other yields a large peak in the
underlying probability density function. Observe the three large peaks for the dense regions
(e.g. region 1 and region 2), and no peak at all in the case of the single point in region 3.
All the points lying underneath peak one will correspond to object one. We will use Mean
Shift to detect which points lie underneath which peak.

Figure 9.7 Probability density function of detected points: a) scene with detected points (regions 1-4), b) underlying probability density function


For each data point, Mean Shift associates it with the nearby peak of the dataset's
probability density function. For each data point, Mean Shift defines a window around it
and computes the mean of the data points inside the window. Then it shifts the center of the
window to the mean and repeats the procedure until it converges. After each iteration, the
window shifts to a denser region of the dataset.


At the high level, we can specify Mean Shift as follows:


1. Fix a window around each data point.
2. Compute the mean of data within the window.
3. Shift the window to the mean and repeat till convergence.
In order to compute the mean shift we first need to estimate the probability density function
of the points in the image. The function that we use to estimate the probability is called a
kernel.
Kernel density estimation is a non-parametric way to estimate the probability density
function of a random variable. This is usually called the Parzen window technique. Given a
kernel K and a bandwidth parameter h, the kernel density estimator for a given set of
d-dimensional points x_i is
f(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)

Mean shift can be considered to be based on gradient ascent on the estimated probability
density. To find a local maximum of a function using gradient ascent, one takes steps
proportional to the gradient (or an approximation of the gradient) of the function at the
current point. The generic formula for gradient ascent is
x_1 = x_0 + \eta \, f'(x_0)
where η is the step size. Applying it to the kernel density estimator,
f'(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K'\left(\frac{x - x_i}{h}\right)
Setting this to 0 we get
\sum_{i=1}^{n} K'\left(\frac{x - x_i}{h}\right) x = \sum_{i=1}^{n} K'\left(\frac{x - x_i}{h}\right) x_i
Finally, we obtain the formula for the gradient ascent step:
x = \frac{\sum_{i=1}^{n} K'\left(\frac{x - x_i}{h}\right) x_i}{\sum_{i=1}^{n} K'\left(\frac{x - x_i}{h}\right)}
The stationary points obtained via gradient ascent represent the modes of the density
function. All points associated with the same stationary point belong to the same cluster.
Now let's apply these formulas to calculate the mean shift, m(x):


Assuming g(x) = -K'(x), it follows that
m(x) = \frac{\sum_{i=1}^{n} g\left(\frac{x - x_i}{h}\right) x_i}{\sum_{i=1}^{n} g\left(\frac{x - x_i}{h}\right)} - x
So the mean shift procedure can be summarized as follows. For each point x_i:
1. Compute the mean shift vector at iteration t, m(x_i^t).
2. Move the density estimation window by m(x_i^t).
3. Repeat till convergence.
The parameter h is the bandwidth parameter that defines the radius of the kernel and it must
be manually tuned to an optimal value. A compromise must be found, because a
small value of h leads to a large number of small clusters, while a large value of h gives a
small number of too-large clusters, so that nearby objects are grouped together.
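A minimal sketch of the iteration for a single starting point, using a flat (uniform) kernel of
radius h, could look as follows; the data matrix X (n points by d dimensions), the bandwidth
h and the convergence tolerance are assumed to be given.

% Minimal sketch: mean shift iteration for one starting point with a flat kernel
x = X(1, :);                                          % start from the first data point
for iter = 1:100
    d = sqrt(sum(bsxfun(@minus, X, x).^2, 2));        % distances to all points
    m = mean(X(d <= h, :), 1);                        % mean of the points inside the window
    if norm(m - x) < 1e-3                             % converged to a mode
        break
    end
    x = m;                                            % shift the window to the mean and repeat
end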
Assignment 9.7: Understanding Mean Shift
Load m file MeanShiftCluster.m. Map different parts of the code to steps in calculation of
Mean Shift. How would you speed up this function?

Mean shift has many practical applications, especially in the computer vision field. It is used
to perform clustering, for the segmentation of images (Chapter 10), for object tracking in
videos, etc. The most important application is using Mean Shift for clustering. The fact
that Mean Shift makes no assumptions about the number of clusters or the shape of the
clusters makes it ideal for handling clusters of arbitrary shape and number.
Now that you understand how the algorithm works, let's try to apply it to solve the problem
of point clustering and object localization in a given image.
Assignment 9.8: Applying Mean Shift algorithm to clustering problem
Load the image table.jpg that shows 3 objects on a uniform background, from Figure
2.6 a. We detected interesting points on that image using the SIFT technique explained in
Chapter 11, and calculated the probability function of these points. You will be given a matrix
with the set of points and the probability values that correspond to these points. To
obtain these values load the file Assignment_9.8 and check the matrix prob_values.
Group the close points together using the MeanShift code from Assignment 9.7. Did you
manage to group points belonging to different objects? Vary the parameter h (radius of the
kernel) and comment on how the result is affected. Try to establish the optimal value for the
parameter h. Now calculate the rectangular box around the clustered points, similar to
Figure 9.8.


Figure 9.8 Clustered points using Mean Shift (result of Assignment 9.8)
The following chapter (Chapter 10) deals with image segmentation; there is a special
section devoted to clustering where we will further explain this topic.


Chapter 10: Image Segmentation


In order to efficiently describe objects in an image we first need to estimate their location
and to segment them from the background. However, segmenting nontrivial images is
one of the most difficult problems in image processing, and the performance of state-of-the-art
approaches is still far from perfect. To further understand the problem, observe
the state-of-the-art results in Figure 10.1.

Figure 10.1 State of the art results in image segmentation


Segmentation subdivides the image into meaningful regions or objects. For our visual
system it is very easy to group pixels in images into meaningful objects, e.g. we will easily
detect a person's face or a shirt among all the colorful pixels in Figure 10.1. But how to
teach this to a system, and how to tell when the algorithm should stop further dividing the
image, remain open challenges.
There are many ways to perform segmentation. It can be done based on color, texture, or
motion in the case of video material, but in this chapter we will only focus on the
segmentation of monochrome images.
Segmentation algorithms for monochrome images are based on two basic properties of
image intensity values: discontinuity and similarity. In the first category the approach is
to partition an image based on abrupt changes in intensity, such as edges. The approaches in
the second category are based on partitioning an image into regions that are similar
according to a set of predefined criteria. For the explanation of all the methods in this
chapter we will only use the Image Processing Toolbox of Matlab.
10.1 Point, line and edge detection
10.1.1 Detection of isolated points

It is very easy to detect isolated points located in areas of constant or nearly constant
intensity. The most common way to look for discontinuities is to apply a mask to the
image, as described in Chapter 4. The response of a 3x3 mask at any point in the image
is shown below, where z_i is the intensity of the pixel associated with mask coefficient w_i.
The response of the mask is defined at its center.
R = w_1 z_1 + w_2 z_2 + \ldots + w_9 z_9 = \sum_{i=1}^{9} w_i z_i

Now we can say that an isolated point has been detected at the location on which the
mask is centered if |R| ≥ T, where T is a nonnegative threshold. If T is given, the point can
be detected using the following command:
image_out = abs(imfilter(tofloat(image_in),w)) >= T

w is the filter mask and image_out is the binary image containing the detected points.
If T is not given, its value is often chosen based on the filtered result, just like in the following
example:
>> w = [-1 -1 -1; -1 8 -1; -1 -1 -1];
>> image_out = abs(imfilter(tofloat(image_in),w));
>> T = max(image_out(:));
>> image_out = image_out >= T;
>> imshow(image_out)

The assumption that we used in this example is that isolated points are located on a constant
or nearly constant background. If we choose T to be the maximum value in image_out, there
can be no points in image_out with values greater than T.
Another approach to point detection is to find points in all neighbourhoods of size m x n
for which the difference between the maximum and minimum pixel values exceeds a specified
value T. For this approach the function ordfilt2 is used.
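A minimal sketch of this approach with ordfilt2 is given below, assuming a grayscale input
image image_in and a 3x3 neighbourhood, with T chosen in the same way as in the example
above.

% Minimal sketch: point detection via the local max-min difference (ordfilt2)
f    = double(image_in);
fmax = ordfilt2(f, 9, ones(3,3));      % maximum in each 3x3 neighbourhood
fmin = ordfilt2(f, 1, ones(3,3));      % minimum in each 3x3 neighbourhood
g    = fmax - fmin;                    % local intensity range
T    = max(g(:));
points = g >= T;                       % binary image with the detected points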
Assignment 10.1
Create an image that has an isolated point and, using the ordfilt2 function, detect that
isolated point in the image. Set T in the same way as in the previous example.

In the literature we often find terms such as interest point detection or salient point detection;
they will be treated later in the text.



10.1.2 Line detection

The next level of complexity is line detection. Lines can be detected in two different ways:
using masks and using the Hough transform.
The first way is by using different masks. Figure 10.2 shows the most common masks
used to detect horizontal, vertical and 45-degree oriented lines.
R1 (Horizontal):        R2 (+45 degrees):       R3 (Vertical):          R4 (-45 degrees):
 -1 -1 -1                -1 -1  2                -1  2 -1                 2 -1 -1
  2  2  2                -1  2 -1                -1  2 -1                -1  2 -1
 -1 -1 -1                 2 -1 -1                -1  2 -1                -1 -1  2
Figure 10.2: Masks for line detection

To automatically detect all the lines in the image we need to run all 4 masks
independently over the image and threshold the absolute value of the results; R_i is
the result for mask i. If at a certain point in the image R_i > R_j for all j ≠ i, that point
is considered more likely to be associated with the direction favored by mask i. The final
response is equal to:
R(x, y) = max(|R1(x, y)|, |R2(x, y)|, |R3(x, y)|, |R4(x, y)|)
If R(x, y) > T, then a discontinuity (line pixel) is detected.
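A minimal sketch of this procedure is shown below, assuming a grayscale image I; the
threshold choice 0.5*max(R(:)) is an assumption for illustration, and the mask values are the
ones from Figure 10.2.

% Minimal sketch: run the four line masks and keep the strongest response
w1 = [-1 -1 -1;  2  2  2; -1 -1 -1];          % horizontal
w2 = [-1 -1  2; -1  2 -1;  2 -1 -1];          % +45 degrees
w3 = [-1  2 -1; -1  2 -1; -1  2 -1];          % vertical
w4 = [ 2 -1 -1; -1  2 -1; -1 -1  2];          % -45 degrees
f  = double(I);
R1 = abs(imfilter(f, w1));  R2 = abs(imfilter(f, w2));
R3 = abs(imfilter(f, w3));  R4 = abs(imfilter(f, w4));
R  = max(max(R1, R2), max(R3, R4));           % strongest response at each pixel
T  = 0.5*max(R(:));                           % assumed threshold choice
line_pixels = R > T;                          % binary map of detected line pixels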

Let's now apply the mentioned approach and show that the image in Figure 10.3 is in
fact just an illusion. We want to prove that all lines are in fact straight.
>> I = imread('optical_illusions.png');
>> w1 = [-1 -1 -1; 2 2 2; -1 -1 -1];   % horizontal mask
>> w2 = [-1 2 -1; -1 2 -1; -1 2 -1];   % vertical mask
>> g1 = imfilter(I, w1); imshow(g1);
>> g2 = imfilter(I, w2); imshow(g2)

Figure 10.3: Line detection examples (original image, detected horizontal lines, detected vertical lines)

Assignment 10.2
Load the image Zollner_illusions and show that the diagonal lines are in fact parallel.
Try to detect other lines in the image as well.



Line detection using Hough Transform

In practice, results obtained by edge detection or simple line detection often have many
discontinuities because of noise, breaks in edges due to non-uniform lighting, etc. For that
reason, edge detection algorithms are followed by a linking procedure to assemble
edge pixels into meaningful edges. For that we use the Hough transform. Before we continue,
please read carefully the Hough transform section of Chapter 9.
In Chapter 9 we designed our own Hough transform function. Here we will see how to
use the functions already provided within the Matlab Image Processing Toolbox.
To compute the Hough transform we can use the function hough.
[H, theta, rho] = hough(image_in)
[H, theta, rho] = hough(image_in, 'ThetaResolution', val1, 'RhoResolution', val2)

H is the Hough transform matrix, and theta (in degrees) and rho are the vectors from
Figure 9.4. The input image_in is a binary image.
Let's now compute the Hough transform of the image from Figure 10.4 a). The result is
shown in Figure 10.4 b).
>> [H, theta, rho] = hough(image_in, 'ThetaResolution', 0.2);
>> imshow(H, [], 'XData', theta, 'YData', rho, 'InitialMagnification', 'fit');
>> axis on, axis normal
>> xlabel('\theta'); ylabel('\rho');

The following step in line detection is finding high peaks in the Hough transform. The
function houghpeaks locates peaks in the Hough transform matrix H. numpeaks is a
scalar value that specifies the maximum number of peaks to identify. If you omit
numpeaks, it defaults to 1.
peaks = houghpeaks(H, numpeaks)
peaks = houghpeaks(H, numpeaks, 'Threshold', val1, 'NHoodSize', val2)

'Threshold' is the threshold value at which values of H are considered to be peaks. It can vary
from 0 to Inf and has a default value of 0.5*max(H(:)).
Let's now compute the peaks in the Hough transform of the image from Figure
10.4 a). The result is shown in Figure 10.4 c).
>> peaks=houghpeaks(H,5)
>>hold on
>>plot(theta(peaks(:,2)), rho(peaks(:,1)))

The final step is to determine whether there are meaningful line segments associated with the
detected peaks, as well as where the lines start and end. For that we use the function:
lines = houghlines(image_in, theta, rho, peaks)


Now let's find and link the line segments in the image from Figure 10.4 a). The final result is
shown in Figure 10.4 d); the detected lines are superimposed as thick gray lines.
>> lines = houghlines(image_in, theta, rho, peaks);
>> figure, imshow(image_in), hold on
>> for k = 1:length(lines)
       xy = [lines(k).point1; lines(k).point2];
       plot(xy(:,1), xy(:,2), 'LineWidth', 4)
   end

Figure 10.4 Line detection using the Hough Transform: a) original image, b) Hough transform, c) peaks in the Hough transform, d) line segments corresponding to the peaks

Assignment 10.3:
Load the image soccer filed, create a binary image and, using the functions for the Hough
Transform explained above, detect the lines on the field.



10.1.3 Edge detection

The most common approach to detecting sudden changes (discontinuities) in an image is edge
detection. Most semantic and shape information is contained in edges. Take an artist's
sketch as an example: we can imagine and understand the outline of the entire picture just by
looking at its sketch. In practice, edges are caused by a variety of different factors; observe
Figure 10.5 for an explanation.
Figure 10.5 Factors that cause edges (surface normal discontinuity, depth discontinuity, illumination discontinuity)
Discontinuities can be detected using first- and second-order derivatives. The first-order
derivative is the gradient, while the second-order derivative is the Laplacian.
Gradient:

The gradient of a 2D function f(x,y) is defined as the vector:
\nabla f = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}
The magnitude of this vector is shown below and it determines the strength of the edge:
\nabla f = \mathrm{mag}(\nabla f) = [g_x^2 + g_y^2]^{0.5} = \left[ \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 \right]^{0.5} \approx |g_x| + |g_y|
A fundamental property of the gradient vector is that it points in the direction of the
maximum change of f at coordinates (x,y). The angle at which this maximum rate of
change occurs is:
\alpha(x, y) = \tan^{-1}\left(\frac{g_x}{g_y}\right)

If we use the gradient for edge detection, we need to find places in the image where the first
derivative of the intensity is greater in magnitude than a specified threshold.


We can approximate the gradient using convolution with image masks. In Matlab, the
Image Processing Toolbox provides the function edge.
[g, t] = edge(f, method, parameters)

For the calculation of the first derivative there are 3 possible methods that we can use: Sobel,
Prewitt and Roberts. Sobel is the most common one and it approximates the gradient
using differencing, where z represents the 3x3 image neighborhood:
\nabla f = \sqrt{g_x^2 + g_y^2} = \left\{ \left[ (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3) \right]^2 + \left[ (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7) \right]^2 \right\}^{0.5}

The general syntax for the Sobel operator is:

[g, t] = edge(f, 'sobel', T, dir)

T is a specified threshold that determines the edge pixels: if ∇f ≥ T, the pixel is an edge pixel.
dir specifies the preferred direction of the edges: horizontal, vertical or both. The resulting
image g is a binary image. An example of the image room with edges detected using Sobel is
shown in Figure 10.6 b).
Laplace:

Second-order derivatives are seldom used directly for edge detection because of their
sensitivity to noise and their inability to detect edge direction. They are used in combination
with first derivatives to find places where the second derivative of the intensity crosses zero.
If we consider the Gaussian function used for smoothing,
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}
we can define the Laplacian of Gaussian (LoG) that is often used as an edge detector:
\nabla^2 G(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2} = \left[ \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \right] e^{-\frac{x^2 + y^2}{2\sigma^2}}

The general syntax for the Laplacian of Gaussian operator is:

[g, t] = edge(f, 'log', T, sigma)

sigma is the standard deviation and its default value is 2. An example of the image with
edges detected using LoG is shown in Figure 10.6 c).

Finally, the best result can be obtained by combining the two different methods, and that is the
case in the Canny edge detector. The syntax is the following:
[g, t] = edge(f, 'canny', T, sigma)

An example of the image with edges detected using Canny is shown in Figure 10.6 d).


Figure 10.6 Comparison of different edge detectors: a) original image, b) Sobel edge detector, c) LoG edge detector, d) Canny edge detector
Assignment 10.4:
Explain the Canny algorithm. Point out the parts of the algorithm that use first-order
derivatives and the ones that use Laplacians.
Assignment 10.5:
Load the image kickoff and extract the edges separately using the Sobel, LoG and Canny
operators. Compare the results. Now vary the threshold value T. How is the result
affected?
10.2 Extracting corners and blobs

When image segmentation and object recognition have to be performed on real-world
scenes, using simple lines or edges is no longer sufficient. The algorithm must cope with
occlusions and heavy background clutter, so we need to process the image locally. For
that, most state-of-the-art algorithms use the detection of interest points (also called
keypoints). Interest points are points that exhibit some kind of salient (distinctive)
characteristic, like a corner. Subsequently, for each interest point a feature vector called a
region descriptor is calculated. Each region descriptor characterizes the image
information available in the local neighborhood of a point. Region descriptors and object
representation are treated in Chapter 11.
All keypoint detectors can be divided into two major groups:
Corner-based detectors: used for the detection of structured regions; they rely on the
presence of sufficient gradient information (sufficient change in intensity).
Region-based detectors (blobs): used for the detection of uniform regions and regions with
smooth brightness transitions.
10.2.1 Corners

We can detect a corner by exploiting its main property: in the region around a corner, the
image gradient has two or more dominant directions. Observe Figure 10.7 where this
property is shown.

Figure 10.7 Region conditions resulting in a corner: flat region, no change in all directions; edge, no change along the edge direction; corner, significant change in all directions
The most common method is the Harris detector. In order to detect Harris corners we need to
observe the intensity values of image pixels within a small window (usually a neighborhood of
a pixel). At a corner point, shifting the window in any direction should yield a large change
in appearance. Using mathematical equations we can describe the change of intensity for
a shift (u,v) in the following way:
E(u, v) = \sum_{x,y} w(x, y)\,[I(x+u, y+v) - I(x, y)]^2 \approx [u \; v]\, M \begin{bmatrix} u \\ v \end{bmatrix}
M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
I(x+u, y+v) is the shifted intensity, I(x,y) the intensity at point (x,y), while w(x,y) represents
the window function (or mask, as we called it). As we saw in 10.1.3, we can use the gradient to
measure abrupt changes, so the approximation using the matrix M can be used. I_x and I_y are
the first-order image derivatives (gradients of the image).
For very distinctive patches the change in intensity will be large, hence E(u,v) will be
large. We can now check whether the corner response at each pixel is large:
R = det(M) - k (trace M)^2


det(M) is the determinant of the matrix M, while trace(M) represents the sum of the eigenvalues
of M. k is an empirically defined constant; the smaller the value of k, the more likely it is that the
algorithm will detect sharp corners. The value of R determines whether a point is a corner point:

R is large for a corner;
R is negative with a large magnitude for an edge;
R is small for a flat region.

In the Matlab Image Processing Toolbox there is a function cornermetric that detects
corners in a grayscale image. k is the same parameter as explained above; if omitted, the
default value is 0.04, with range 0 < k < 0.25.
c = cornermetric(image_in, 'Harris', 'SensitivityFactor', k)

Let's now calculate the corners in the image corners. The results are shown in Figure 10.8.
>>I = imread('corners.png');
>>imshow(I);
>>C = cornermetric(I, 'Harris', 'SensitivityFactor', 0.04);
% Displaying corner metric
>>C_adjusted = imadjust(C);
>>figure;
>>imshow(C_adjusted);
% Detecting corner peaks
>>corner_peaks = imregionalmax(C);
% Displaying corners in the image
>>corner_idx = find(corner_peaks == true);
>> [r g b] = deal(I);
>>r(corner_idx) = 255;
>>g(corner_idx) = 255;
>>b(corner_idx) = 0;
>>RGB = cat(3,r,g,b);
>>figure
>>imshow(RGB);
>>title('Corner Points');

Figure 10.8 Corner detection: original image, corner metric, detected corners

Assignment 10.6:
Calculate the corners of the image room. Vary the sensitivity factor. What can you observe?


10.2.2 Blobs

Blob detection algorithms detect points that are either brighter or darker than their
surroundings. They were developed to provide additional information to edge and corner
detectors. They are used efficiently in image segmentation, region detection for tracking,
object recognition, etc., and they serve as interest points for stereo matching.
One of the most used blob detectors is based on the Laplacian of Gaussian (LoG) edge
detector explained in 10.1.3. The magnitude of the Laplacian response achieves a
maximum at the center of the blob. The Laplacian operator usually results in strong
positive responses for dark blobs of a given extent and strong negative responses for bright
blobs of similar size.
\nabla^2 G(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2} = \left[ \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \right] e^{-\frac{x^2 + y^2}{2\sigma^2}}
A main problem when applying this operator at a single scale, however, is that the
operator response strongly depends on the relationship between the size of the blob
structures and the size of the Gaussian kernel used for pre-smoothing. To overcome this and
make the detector automatic, a multi-scale approach similar to the one from Chapter 9.1 must
be used. This approach is further developed in Chapter 11.

Another approach that uses a multi-scale strategy and gives a scale-invariant detector is
Maximally Stable Extremal Regions (MSER). Its principle is based on thresholding the
image with a variable brightness threshold t. All pixels with gray value below t are set to
0/dark, and all pixels with gray value equal to or above t are set to 1/bright. The threshold is
increased successively; in the beginning the thresholded image is completely bright.
As t increases, black areas appear in the binarized image, and they grow and
merge together. Black areas that stay stable for a long period of time are the MSER regions.
They provide the position (center point) as well as the characteristic scale, derived from the
region size, as input data for the region descriptor calculation. Altogether, all regions of the
scene image are detected which are significantly darker than their surroundings. Inverting the
image and repeating the same procedure with the inverted image reveals the characteristic
bright regions.
There is no function in Matlab or DIPimage that can calculate these features, so we will
install and use a new toolbox, VLFeat, that is very useful for segmentation and description.
Assignment 10.7:
Download and install the VLFeat toolbox in Matlab from:
http://www.vlfeat.org/download.html

Now we can use the command vl_mser to calculate MSER regions.

[r,f] = vl_mser(I)


In r are the centres of the regions, while f contains elliptical frames around the centres. Let's
now compute the MSER regions of the image lena. The result is shown in Figure 10.9.
>> I = uint8(rgb2gray(imread('lena.jpg')));
>> [r,f] = vl_mser(I);
>> imshow(I); hold on
>> f = vl_ertr(f);
>> vl_plotframe(f);

Figure 10.9 Blob detection: original image and detected MSER regions

Assignment 10.8:
Another way to use the MSER function is:
[r,f] = vl_mser(I,'MinDiversity',val1,'MaxVariation',val2,'Delta',val3)

Explain in your own words the meaning of the parameters MinDiversity,
MaxVariation and Delta. Vary the values of these parameters and show their effect on
the image lena.
10.3 Clustering methods in segmentation

One of the easiest methods to perform image segmentation is to start from the image pixels
and to group these pixels into meaningful regions (this is the so-called bottom-up
approach).
Clusters can be formed based on pixel intensity, color, texture, location, or some
combination of these. It is very important to choose the number of clusters correctly, since
the solution highly depends on that choice.
The K-means algorithm is an iterative technique that is used to partition an image into K
clusters. The basic algorithm is:
1. Pick K cluster centers, either randomly or based on some heuristic.


2. Assign each pixel in the image to the cluster that minimizes the distance between
the pixel and the cluster center. The Euclidean distance is often used as the distance
function.
3. Re-compute the cluster centers by averaging all of the pixels in the cluster.
4. Repeat steps 2 and 3 until convergence is attained (e.g. no pixels change clusters).
In Matlab, the function kmeans is already implemented.
IDX = kmeans(X,k)

IDX is an (n,1) vector containing the cluster index of each point. By default, kmeans
uses squared Euclidean distances. k is the predefined number of clusters.

There is also a version of the function with more parameters:

[IDX,C,sumd,D] = kmeans(X,k,'distance',val1)
OUTPUT:
D is an (n,k) matrix that returns the distances from each point to every centroid.
sumd is a (1,k) vector that returns the within-cluster sums of point-to-centroid distances.
val1 chooses one of the possible distance functions: 'sqEuclidean', 'cityblock', 'Hamming'.
Let's now segment the color image from Figure 10.10 a). The algorithm that we will use is
the following:

Step 1: Read the image.
Step 2: Convert the image from the RGB color space to the L*a*b* color space.
Step 3: Classify the colors in 'a*b*' space using K-means clustering.
Step 4: Label every pixel in the image using the results from kmeans.
Step 5: Create images that segment the image by color.
Step 6: Segment the gloves in the image into a separate image.

Step 1: Read Image


>>I=imread('girl.jpg');
>>figure (1)
>>imshow(I)

Step 2: Convert image from RGB color space to L*a*b* color space
Note: Please refer to Chapter 7 for explanation of L*a*b* color space
>> cform = makecform('srgb2lab');
>> lab_he = applycform(I,cform);

Step 3: Classify the colors in L*a*b*' space using K-Means Clustering


We are now interested only in the color information, so we can eliminate the luminosity
component. The objects that we will cluster are now pixels with 'a*' and 'b*' values. By
observing the image we concluded that there are approximately 10 different colors in the
image, so we set the number of clusters to 10. Now we will use kmeans to cluster the
objects into 10 clusters using the Euclidean distance metric.
% Extract only 'a*' and 'b*' values
>>ab = double(lab_he(:,:,2:3));
>>nrows = size(ab,1);
>>ncols = size(ab,2);
>>ab = reshape(ab,nrows*ncols,2);
% Set value for number of clusters
>>nColors = 10;
% Repeat the clustering 3 times to avoid local minima
[cluster_idx cluster_center] = kmeans(ab, nColors, 'distance',
'sqEuclidean', 'Replicates',3);

Step 4: Label every pixel in the image using the results from kmeans
>>pixel_labels = reshape(cluster_idx,nrows,ncols);
>>figure(2)
>>imshow(pixel_labels,[]), title('image labeled by cluster index');

The result of the labeling is shown in Figure 10.10 b).


Step 5: Create images that segment the image by color.
>> segmented_images = cell(1,10);
>> rgb_label = repmat(pixel_labels,[1 1 3]);
>> for k = 1:nColors
       color = I;
       color(rgb_label ~= k) = 0;
       segmented_images{k} = color;
   end
>> imshow(segmented_images{1}), title('objects in cluster 1');

As we can see from the previous example, the K-means algorithm is fairly simple to
implement and the results for image segmentation are very good for simple images. As
can be seen from the results in Figure 10.10, the number of partitions used in the
segmentation has a very large effect on the output. By using more partitions, more possible
colors are available to show up in the output and the result is much better. The need to set the
number of clusters manually is the main drawback of this method.
Assignment 10.9:
As we can see from Figure 10.10, the glove is found in cluster 10. Extract only the pixels
belonging to the glove and perform step 6 of the algorithm.
Assignment 10.10: Design system for Skin Detection


Segment only the face in the image girl. Use the previous example as a reference. Once you
have labeled all the regions, verify each region with the criteria for identifying skin pixels,
and identify the skin-pixel objects among the different color objects. As criteria you can
use the following rule:
an (R,G,B) value is classified as skin if it satisfies:
(R > 95) & (G > 40) & (B > 20)
& ((Max{R,G,B} - Min{R,G,B}) > 15)
& (|R-G| > 15) & (R > G) & (R > B)
Now assign 0 to the regions which are not skin. Display the segmented image which
contains the skin pixels in RGB format.
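A minimal per-pixel sketch of this skin rule is given below; in the assignment the rule should
be applied to the clustered regions, and the input I is assumed to be the uint8 RGB image
girl.

% Minimal sketch: apply the skin rule per pixel and display the skin pixels in RGB
R = double(I(:,:,1));  G = double(I(:,:,2));  B = double(I(:,:,3));
skin = (R > 95) & (G > 40) & (B > 20) ...
     & ((max(max(R,G),B) - min(min(R,G),B)) > 15) ...
     & (abs(R - G) > 15) & (R > G) & (R > B);
skin_rgb = I;
skin_rgb(repmat(~skin, [1 1 3])) = 0;         % set non-skin pixels to 0
imshow(skin_rgb)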

Figure 10.10 Color segmentation using kmeans clustering: original image, pixel labels, and selected clusters (1, 2, 4, 5, 8, 10)


Chapter 11: Image description (Advanced)


Images are described using descriptors that are suited to a specific type of images and
applications. For example, the description of medical MRI images is quite different from the
description of surveillance data from a highway or the description of face images. There are
also two different approaches to the way these descriptors are applied to the image:
local and global.
Global approaches treat the image as a whole and extract some global characteristic of the image,
such as the global color distribution, texture or global shape of the image. This approach is
widely used in image retrieval to group images that come from the same type of location. For
example, as shown in Figure 11.1, most beach images are characterized by sea, sky
and sand, so they have a very distinctive color distribution that can help us distinguish
these images from all others and group them together.

Figure 11.1 Images suitable for global description


Now let's take a look at the images in Figure 11.2. Using a global descriptor we will easily
detect that these are images of a beach, but what if we want to detect the children in these
images, or to describe the images as "kids playing on a beach"? In that case the global approach
won't help and we need to process the image locally. Using the local approach, we divide
the image into regions and process every region separately. Regions (also called patches)
can vary from very small, almost the neighborhood of a pixel, to very large, e.g. a quarter of
the image.

Figure 11.2 Images suitable for local description


The process of describing an image and finding similarity between images is similar whether
we use global or local description. It consists of several steps:
Offline steps:


1. Collect images that will form the database.
2. Choose descriptors that will be used, based on the database properties.
3. Calculate these descriptors for all images from the database.
4. Normalize the database (sometimes used).
5. Choose a distance measure suited to the descriptor, to be used for the similarity
calculation.

Online steps:
6. Calculate the descriptor for the unknown image (query image).
7. Compare the descriptor of the query image with all other descriptors from the
database using the chosen similarity measure.
8. The image(s) with the smallest distance(s) are most similar to our query image. From
that result we can conclude to which category the query image belongs or which
objects it contains.
11.1 Global descriptors

Global descriptors are used when we want to describe some global property of an image.
We can divide them into color, texture and shape descriptors.
11.1.1 Color descriptors

There are many color descriptors in use in state-of-the-art methods. Here we will focus on
one simple and fast, but still very efficient, color descriptor based on image statistics:
color moments.
Color moments:

The basis of color moments lies in the assumption that the distribution of color in an
image can be interpreted as a probability distribution. It therefore follows that if the color
in an image follows a certain probability distribution, the moments of that distribution
can be used as descriptors to identify that image based on color. For the color
distribution a perceptual color system such as HSV should be used. Moments are then calculated
for each of these channels in the image: H, S and V. In total there are 3 moments (mean,
standard deviation and skewness), and an image is therefore characterized by 9 values,
3 moments for each of the 3 color channels. We define the ith color channel at the jth image
pixel as p_ij (in total there are N pixels). The three color moments can then be defined as:
Moment 1, Mean:
E_i = \frac{1}{N} \sum_{j=1}^{N} p_{ij}
The mean can be understood as the average color value in the image.

Moment 2, Standard deviation:
\sigma_i = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} (p_{ij} - E_i)^2 }
The standard deviation is the square root of the variance of the distribution.

Moment 3, Skewness:
s_i = \sqrt[3]{ \frac{1}{N} \sum_{j=1}^{N} (p_{ij} - E_i)^3 }
Skewness can be understood as a measure of the degree of asymmetry in the distribution.


We can now define a feature matrix that uniquely describes the image:
FV = \begin{bmatrix} E_1 & E_2 & E_3 \\ \sigma_1 & \sigma_2 & \sigma_3 \\ s_1 & s_2 & s_3 \end{bmatrix}
Now we should match our image (query image) with the other images from the database and
calculate the similarity between them. For that, different similarity functions are used, e.g. the
Euclidean distance or the cosine distance. For color moments it is best to use the following
distance:
d(FV_1, FV_2) = \sum_{i=1}^{3} \left( w_{i1} \, |E_{i1} - E_{i2}| + w_{i2} \, |\sigma_{i1} - \sigma_{i2}| + w_{i3} \, |s_{i1} - s_{i2}| \right)

Here we calculate the distance between two feature vectors, FV_1 and FV_2. E_{i1}
represents the mean of FV_1, E_{i2} the mean of FV_2, etc.
The weights w_{i1}, w_{i2} and w_{i3} are used to emphasize the more important components,
for example to reduce the influence of the Value component (brightness) and increase the
influence of the Hue component (color type). These weights are usually pre-calculated from
the entire database of images and suited to the images in that database.
Once we have calculated all the distances, the image with the smallest distance is the most
similar to our query image.
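A minimal sketch of the moment computation is given below, assuming an RGB input image
I that is first converted to HSV as suggested above; the cube root is taken with nthroot so that
negative skewness values are handled.

% Minimal sketch: the nine colour moments (mean, std, skewness per HSV channel)
hsv = rgb2hsv(im2double(I));
FV  = zeros(3, 3);                            % rows: E, sigma, s; columns: H, S, V
for i = 1:3
    p = hsv(:,:,i);  p = p(:);  N = numel(p); % channel i as a vector of N pixel values
    E = sum(p) / N;                           % moment 1: mean
    s = sqrt(sum((p - E).^2) / N);            % moment 2: standard deviation
    k = nthroot(sum((p - E).^3) / N, 3);      % moment 3: skewness (cube root)
    FV(:, i) = [E; s; k];
end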
Assignment 11.1:
Design a function that will compute the color moments of an image. Load the images
beach1, beach2 and rainforest. Calculate the color moments for all three images. Now
compare the image beach1 with the other two. Is the descriptor successful?
11.1.2 Texture descriptors


Texture descriptors are very useful to group similar textures or patterns in images. Figure
11.3 shows examples of different textures.

Figure 11.3 Examples of textures (wall, bush, hair, electrical chip)

When computing texture it is very important to consider the distribution of gray-level
intensities as well as the relative positions of pixels in the image. One such method is the Gray
Level Co-occurrence Matrix (GLCM).
The GLCM is a matrix of frequencies, where each element (i, j) is the number of times
that a pixel with value i was at a certain distance and angle from a pixel with intensity j.
The number of rows and columns of the GLCM is determined by the number of grayscale
intensity values in the image. Then, 4 statistical features are calculated from the GLCM:
1. Contrast - a measure of the relative intensity contrast between a pixel and its neighbor
in a relative location:
Contrast = \sum_{i,j} (i - j)^2 \, P(i, j)

2. Homogeneity - a measure of the closeness of the distribution of elements in the GLCM
to the diagonal:
Homogeneity = \sum_{i,j} \frac{P(i, j)}{1 + |i - j|}

3. Energy - a measure related to entropy, which measures the orderliness in an image. The
range of energy is [0, 1], where 1 represents a constant image:
Energy = \sum_{i,j} P(i, j)^2

4. Correlation - an index of how correlated a reference pixel is to a pixel in the given
direction and distance, over the entire image:
Correlation = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j) \, P(i, j)}{\sigma_i \sigma_j}
where \mu_i = \sum_{i,j} i \, P(i, j) and \sigma_i^2 = \sum_{i,j} (i - \mu_i)^2 \, P(i, j), and analogously for \mu_j and \sigma_j.

Combining these features, one feature vector is formed and used as the texture
representation of an image.
In the Matlab Image Processing Toolbox, there is a function that computes the GLCM:


glcms = graycomatrix(I, 'NumLevels', val1, 'Offset', val2)

PARAMETERS:
'NumLevels' is the number of gray levels to use. By default, if I is a binary image, the
function scales the image to two gray levels; if I is an intensity image, it scales the image
to eight gray levels.
'Offset' is a p-by-2 array of integers specifying the distance between the pixel of interest
and its neighbor. Each row in the array is a two-element vector, [row_offset, col_offset],
that specifies the relationship, or offset, of a pair of pixels.
To calculate the descriptors from the GLCM, we use the following function:
stats = graycoprops(glcm, properties)

As properties we can use 'all' for all descriptors, or only some of them by stating
{'Contrast', 'Homogeneity', 'Energy', 'Correlation'}.
Let's now calculate the descriptors for the image wall.
>> I = imread('wall.jpg');
>> GLCM = graycomatrix(I, 'NumLevels', 256);
>> GLCM_n = GLCM/sum(GLCM(:)); % Normalized matrix
>> stats = graycoprops(GLCM, 'all');
>> contrast = stats.Contrast;
>> corr = stats.Correlation;
>> energy = stats.Energy;
>> hom = stats.Homogeneity;

These values can be joined into one feature vector of the image and used for similarity
comparison with other images. As the distance function it is best to use the Euclidean
distance. To calculate the Euclidean distance of two vectors we can use the function norm(x-y)
in Matlab.
Assignment 11.2:
Load the images brown-hair and blonde-hair and extract the texture descriptors. Observe
the obtained values and compare them with the values obtained from the image wall. What can
you conclude? Measure the distance between the different descriptors using the Euclidean
distance. Is the descriptor effective?
11.1.3 Shape descriptors

There are two main approaches to describing shapes in images: description of the shape
region and description of the boundary (contour) around the shape. Most of the 2D shape
descriptors are computed on a binary image. Please read again
section 7.2 in the chapter Measurements, Measuring in binary images. There we already
explained different properties of shapes that can be used as shape descriptors, like
perimeter, area, curvature, etc. We also introduced the regionprops function that describes
different shape properties of regions in an image.
For the description of the contour of a shape we would like to introduce a very efficient
method called Fourier descriptors. They have several properties that make them valuable
for shape description: they are insensitive to translation, rotation, and scale changes.

Figure 11.4 Contour definition


The contour can be represented, as in Figure 11.4, using a sequence of coordinates:
s(k) = [x(k), y(k)], \quad k = 0, 1, 2, \ldots, K-1
or, treated as a complex number, s(k) = x(k) + j\,y(k).

Now we can calculate the 1D Fourier transform of the contour s(k), like in Chapter 4:
a(u) = \sum_{k=0}^{K-1} s(k) \, e^{-j 2\pi u k / K}, \quad u = 0, 1, 2, \ldots, K-1

The complex coefficients a(u) are called the Fourier descriptors of the boundary. To
restore s(k) we need to calculate the inverse Fourier transform:
s(k) = \frac{1}{K} \sum_{u=0}^{K-1} a(u) \, e^{j 2\pi u k / K}
Using the inverse Fourier transform, we can also restore s(k) using far fewer coefficients than
originally used; we can limit the reconstruction to only the first P coefficients. In that
case we use the following equation for the reconstruction:
\hat{s}(k) = \frac{1}{P} \sum_{u=0}^{P-1} a(u) \, e^{j 2\pi u k / K}
Notice that we are still using the same number of boundary points (K) but only P terms
for the reconstruction of each point. By this we are limiting the high frequencies that are
responsible for detail, while the low ones that determine the global shape remain.

To calculate the Fourier descriptors in Matlab, we can use the functions frdescp and ifrdescp
provided with this practicum.
To compute the Fourier descriptors of the boundary of an image we use the function frdescp:


z = frdescp(s)

To compute the inverse transformation, we use ifrdescp:


s = ifrdescp(z, nd)

OUTPUT:
z are the Fourier descriptors computed using frdescp.
nd is the number of descriptors used to compute the inverse; nd must be an even
integer no greater than length(z).
Let's now describe the shape of the object from Figure 11.5 a). The first thing we need to do
is to extract the contour of the object. We can do it in the following way:
>> b = bwboundaries(I, 'noholes'); % To extract the contour
>> b = b{1}; % There is only one boundary
>> bim = bound2im(b, size(I,1), size(I,2)); % Create an image from that contour

Once we have extracted the boundary, Figure 11.5 (b), we can see that it has 840 points
(K = 840). We can now calculate the Fourier descriptors from that boundary.
>> z = frdescp(b);

Now, to check whether the computed values are correct, let's calculate the inverse
transformation using only half of the descriptors (P = 420).
>> half_points = 420;
>> s_half_points = ifrdescp(z, half_points);
>> I_half_points = bound2im(s_half_points, size(I,1), size(I,2));

If we now vary the number of Fourier descriptors used for the reconstruction, we obtain
the results shown in Figure 11.5 (c-f). We can conclude that even a much reduced number
of Fourier descriptors manages to accurately describe the shape.

Figure 11.5 Shape contour reconstruction using Fourier descriptors: a) original shape, b) shape contour, c) reconstructed contour using 420 Fourier descriptors (840 max), d) using 84, e) using 28, f) using 8 Fourier descriptors



Assignment 11.3:
Load the image fish and describe it using Fourier descriptors. Now reconstruct the
original boundary, varying the number of descriptors from 8, 14, 28, 56, ..., up to half the
image size. What can you conclude from the result?
11.2 Local descriptors

The main idea of the local approach is to divide the image into small regions and process each
of these regions separately. In that way we can handle problems like occlusion of objects
and background clutter in images. For the similarity calculation, each region votes for
similar images, and the images that get the most votes are considered most similar to the
query image. The descriptors applied to each region can also be the color, texture or
shape descriptors explained before, but the most common technique is keypoint description. As
keypoints we consider the corners and blobs explained in Chapter 10.2. Currently the best
keypoint descriptor is the Scale Invariant Feature Transform (SIFT), already mentioned in
Chapter 9. Please read section 9.1 carefully before continuing.
11.2.1 SIFT Detector

As we already mentioned in Chapter 9.1, the first step in the calculation of SIFT features is
generating the scale space, in order to make the features scale invariant. For that we use the
Gaussian filter. The second step is to extract interest points, such as corners or MSER regions.
For that, SIFT uses the Difference of Gaussians (DoG), with the idea of detecting all local
maxima and minima across neighboring images in the scale space. We can describe it using the
following formula, where * denotes convolution:
D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)

Local maxima and minima are detected by comparing D(x, y, σ) at location (x, y, σ)
with its eight neighbors at the same scale and the 3x3 regions of the two neighboring scales
centered at the same x and y (9 + 9 + 8 = 26 neighbors), as in Figure 11.6.

Figure 11.6 Detection of local maxima and minima in the SIFT detector


If we would like to design our own scale-invariant detector, instead of local maxima and
minima we could also extract corners or use some other detector on all images in the scale
space. This technique is used in the Harris-Laplacian detector, so we suggest the reader check
the difference between this method and SIFT.
In order to make the SIFT detector rotation invariant, a dominant orientation is calculated for
every minimum and maximum (keypoint). To do this we measure the orientation
θ(x, y) and magnitude m(x, y) in the Gaussian image L(x, y) at the scale closest to the
keypoint's scale:
m(x, y) = \sqrt{ (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 }
\theta(x, y) = \tan^{-1}\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)

Now a gradient orientation histogram is computed in the neighborhood of the keypoint.
The contribution of each neighboring pixel is weighted by the gradient magnitude
m(x, y) and by a Gaussian window with σ that is 1.5 times the scale of the keypoint. Peaks
in the histogram correspond to dominant orientations. All the properties of the keypoint
are measured relative to the keypoint orientation; this provides invariance to rotation.
Finally, every SIFT interest point is represented by its position coordinates x and y, the scale
and the orientation of that region.
11.2.2 SIFT Descriptor

The descriptor is calculated for each interest point/region. It is important to mention that if
we want to make our method invariant to a certain property (e.g. scale or rotation
invariant), both the detector and the descriptor must be invariant to that property. For that reason
the SIFT descriptor must be scale and rotation invariant as well. To accomplish this, a 16x16
window is created around the keypoint. It is then split into sixteen 4x4
windows and an 8-bin histogram of orientations is generated in each window, as in the
Figure 11.7. The histogram and gradient values, calculated in the same way as in the
detector, are then interpolated to produce a 128-dimensional feature vector (16*8) which
is invariant to scale and rotation. To obtain illumination invariance, the descriptor is
normalized by the square root of the sum of its squared components.

Figure 11.7 SIFT descriptor calculation


There is no command for SIFT calculation in Matlab, so we need to use the external toolbox
VLFeat mentioned in Chapter 10. We can calculate the features using:
[f,d] = vl_sift(I)

I must be a single-precision grayscale image, while f represents the matrix of interest points. It
contains a column (frame) for each interest point, where a frame is a disk with center f(1:2),
scale f(3) and orientation f(4). The matrix d contains the descriptors, with one
128-dimensional column per interest point.
Let's now calculate and plot the SIFT features of an image. The keypoints are shown in Figure
11.8 a) and the descriptors in Figure 11.8 b).
>> I1 = rgb2gray(imread('scene.JPG'));
>> P1 = single(I1);
>> [f1,d1] = vl_sift(P1);
% To show only 150 keypoints out of the ~1200 calculated
>> perm1 = randperm(size(f1,2));
>> sel1 = perm1(1:150);
>> imshow(I1); hold on
>> h11 = vl_plotframe(f1(:,sel1));
>> h21 = vl_plotframe(f1(:,sel1));
>> set(h11,'color','k','linewidth',3);
>> set(h21,'color','y','linewidth',2);
% To overlay the descriptors
>> h31 = vl_plotsiftdescriptor(d1(:,sel1),f1(:,sel1));
>> set(h31,'color','g')

Figure 11.8 SIFT features: a) detected keypoints, b) descriptors
Assignment 11.4:
Load the image room and calculate the SIFT features. Check the additional parameters that
can be set in the vl_sift function and show how they affect the result.
11.2.3 SIFT Matching


In order to calculate the similarity between two images we need to match a large number of
128-dimensional descriptors. For that, the regular Euclidean distance is used, where
d1 and d2 represent the descriptors from the two different images:
dist(j, k) = \sqrt{ \sum_{i=1}^{128} \left( d_1(i, j) - d_2(i, k) \right)^2 }, \quad j = 1, \ldots, \mathrm{num\_keypoints\_d1}, \; k = 1, \ldots, \mathrm{num\_keypoints\_d2}

For each descriptor in image I1 the distance to all descriptors in image I2 is calculated
and the following condition is checked:
\frac{\text{distance to the second closest}}{\text{distance to the closest}} \geq 1.5

If the condition is true, the descriptor from I1 is matched to the closest one from I2;
otherwise the descriptor in I1 is not matched at all. This criterion avoids having too
many false matches for points in image I1 which are not present in image I2.
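A minimal sketch of this ratio test, assuming descriptor matrices d1 and d2 as returned by
vl_sift (one 128-dimensional column per keypoint, with at least two keypoints in d2), could
look as follows.

% Minimal sketch: match each descriptor of I1 to I2 using the 1.5 ratio criterion
D1 = double(d1);  D2 = double(d2);
matches = zeros(2, 0);
for j = 1:size(D1, 2)
    dist = sqrt(sum((D2 - repmat(D1(:,j), 1, size(D2,2))).^2, 1));  % distances to all d2
    [sorted, order] = sort(dist);
    if sorted(2)/sorted(1) >= 1.5          % second closest clearly worse than the closest
        matches(:, end+1) = [j; order(1)]; % keep the match (indices into d1 and d2)
    end
end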

Figure 11.9 Matching of SIFT points


In VLFeat a special function is provided for matching SIFT features.
[matches, scores] = vl_ubcmatch(d1, d2)

For each descriptor in d1, the function finds the closest descriptor in d2 (using the Euclidean
distance between them). The index of the original match and of the closest descriptor is
stored in each column of matches, and the distance between the pair is stored in scores.
The matches can be plotted, as in Figure 11.9, using the function:
plotmatches(I1,I2,d1,d2,matches)

Now let's select one of the objects from the image scene and match the model of that object
with the original image.
>>I2=rgb2gray(imread('model.jpg'));



>>P2 = single(I2) ;
>>[f2,d2] = vl_sift(P2) ;
>>[matches, scores] = vl_ubcmatch(d1, d2) ;
>>figure
>>plotmatches(I1,I2,f1,f2,matches)

The result is presented in Figure 11.9 and shows that many keypoints are matched correctly.
However, we can also see many false matches: keypoints from the model that are matched
elsewhere in the scene. For that reason, a further refinement of the results, also called
outlier rejection, is needed. Also, keypoints belonging to one object need to be clustered,
and for that the Hough transform is used. Finally, the exact position of the object can be
obtained.
Assignment 11.5:
Load the image scene2 and extract the SIFT features. Match that image with the image
model, which is one of the objects in the scene. Show the matched points, group them
together using the Hough transform, and try to estimate the position of the object model in the
scene.
