1 s2.0 S0262885614001413 Main PDF

Image and Vision Computing 32 (2014) 10071020
Contents lists available at ScienceDirect
Image and Vision Computing

journal homepage: www.elsevier.com/locate/imavis
Model-based graph-cut method for automatic ower segmentation with

spatial constraints
Ezzeddine Zagrouba , Siwar Ben Gamra, Asma Najjar
Team of research SIIVA Lab. Riadi, Higher Institute of Computer Science, University of Tunis Elmanar, Tunisia
a r t i c l e
i n f o
Article history:
Received 22 March 2013
Received in revised form 9 August 2014
Accepted 11 August 2014
Available online 30 September 2014
Keywords:
Automatic ower image segmentation
Graph-cut
Spatial prior
a b s t r a c t
In this paper, we present an accelerated system for segmenting ower images based on graph-cut technique which
formulates the segmentation problem as an energy function minimization. The contribution of this paper consists to
propose an improvement of the classical used energy function, which is composed of a data-consistent term and a
boundary term. For this, we integrate an additional data-consistent term based on the spatial prior and we add gradient information in the boundary term. Then, we propose an automated coarse-to-ne segmentation method composed mainly of two levels: coarse segmentation and ne segmentation. First, the coarse segmentation level is based
on minimizing the proposed energy function. Then, the ne segmentation is done by optimizing the energy function
through the standard graph-cut technique. Experiments were performed on a subset of Oxford ower database and
the obtained results are compared to the reimplemented method of Nilsback et al. [1]. The evaluation shows that our
method consumes less CPU time and it has a satisfactory accuracy compared with the mentioned method above [1].
2014 Elsevier B.V. All rights reserved.
1. Introduction
Automatic ower classication systems are important for a wide
range of application including pharmacy research, environment
protection and perfume production. Thanks to computer vision,
image processing and pattern recognition techniques, automatic
recognition systems make the identication of the ower category
easier by analyzing color images. Image segmentation is generally
considered an important component of the recognition or classication processes, and affects the quality of the image analysis. Automatic ower segmentation allows the extraction of the object of
interest (foreground) from the rest of the image (background) without any manual interaction.
The majority of ower images present natural scenes with complex
background. The areas surrounding the owers have generally large variety of colors and textures. It can contain several entities distributed
separately or together such as stones, leaves, turf grass, green foliage
and soil. Fig. 1 illustrates different types of elements that can be
contained in the area surrounding the ower. As the owers from different species may look very similar both in shape and color, the use of the
background information to generate the image features can increase
this similarity and consequently reduce the classication accuracy.
Therefore, we believe that the extraction of features only from the
object of interest provides more meaningful and accurate information
This paper has been recommended for acceptance by Thomas Brox.
Corresponding author at: Higher Institute of Computer Science (ISI), 2 rue
Abourraihan Al Bayrouni, 2080 Ariana, Tunisia. Fax: +216 71 706 698.
E-mail address: ezzeddine.zagrouba@fsm.rnu.tn (E. Zagrouba).
http://dx.doi.org/10.1016/j.imavis.2014.08.012
0262-8856/ 2014 Elsevier B.V. All rights reserved.
than the one obtained from the whole image. Although many ower
image segmentation methods have been proposed in the literature, it
remains difcult to nd a general solution that is applicable to all
types of owers and gives accurate results. In the next paragraph, we
present the state of the art on ower image segmentation.
Das et al. [2] proposed an iterative segmentation algorithm using color
and spatial domain knowledge-driven feedback. Their method mapped
the RGB color space to commonly used color names in order to delete
pixels which belong to background color classes like black, brown,
green or gray. The foreground region represented by the remaining colors
is accepted if it is included in the ower area. In order to dene the ower
region, some hypotheses were made such as the ower centroid should
fall within the central region of the image. Saitoh et al. [3] presented
the Normalized Cost (NC) method to extract ower regions. It is based
on a Dynamic Programming method known as the intelligent scissors
[4] for extracting the boundary of the object of interest. The image is represented as a directed weighted graph where nodes are pixels and arcs
between neighboring pixels represent the 8-connectivity information.
This method consists in computing the local minimum cost given by a
path between two seeds. The obtained cost is normalized by the length
of this path. The shortest path in the graph gives the object edges. In
this work, the authors assume that the ower is at the center of the
image and the background occupies the peripheral area. Based on this hypothesis, the authors determine some local minimum points of each local
cost prole along the straight line from the starting point to all the middle
points of four sides. Then, they extract the boundary for each local minimum point based on the NC and they select the one that has the smallest
normalized cost and contains the center point. Another interesting automatic algorithm can be found in [1]. The rst step of this algorithm aims
1008
E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020
(a)
(b)
(c)
(d)
Fig. 1. Different types of elements in the area surrounding the ower. (a) leaves, (b) stones, (c) turf grass, and (d) soil.
to obtain an initial ower segmentation using the contrast dependent

prior MRF (Markov Random Field) cost function, which is optimized by
using graph-cut [5] based on general foreground and background color
distributions. These distributions are learned by labeling pixels in few
training images for each ower class as foreground or background.
Then, these distributions are averaged over all classes. A generic owershaped model is then tted to this initial segmentation in order to detect
petals which have a loose geometric consistency using an afne invariant
Hough-like procedure. The image region corresponding to the petals is
used to obtain image-specic foreground and background color distributions which will be employed to obtain new color distributions by blending them with general ones. A new segmentation using MRF-based
graph-cut is repeated using these new color distributions. The shapemodel tting and the segmentation will be iterated until convergence.
In [6], Aydin and Ugur proposed a ower image segmentation algorithm
based on ant colony optimization (ACO) [7]. First, RGB color space is converted to CIE-LAB color space in order to provide accuracy and perceptual
approach in color difference determination. Second, the histogram of each
color component is built (n bins for each component) to determine the
center points of each bin and three bin-centers are obtained to generate
all possible combinations of center points (n3) which constitute the potential candidate cluster's center points. Then, ant colony optimization algorithm is applied to select optimum cluster center points. Finally, each
pixel is classied in the nearest cluster. Recently, Fukuda et al. [8] proposed an automatic ower segmentation model based on graph-cut.
They set the region penalty in graph-cut based on posterior probability,
and not on likelihood as in the classical graph-cut. The posterior probabilities are determined by using AdaBoost [9] and saliency map [10]. In fact,
the AdaBoost classier provides a rectangular window representing the
approximate ower location used to compute foreground and background color distributions (likelihoods). Then, the saliency map is used
to provide prior probabilities. Using the Bayes' theorem, posterior probabilities are computed and integrated in graph-cut. More recently, Chai
et al. in [11] addressed the co-segmentation of owers which consists of
segmenting a similar object (ower) from a pair of images. Their algorithm iterates at two levels: segmenting each image independently with
GrabCut [12] at the pixel level and classifying the obtained superpixels
into foreground and background using SVM learning algorithm.
To conclude, all of these approaches aim the automatic segmentation
of ower images but there are some differences between them in terms of
quality and time-consuming. In fact, the approach proposed by [2] doesn't
extract all ower regions in the image, but isolates a region as the best description of the ower color. Besides, the fact of eliminating pixels belonging to non-ower color classes will fail with the images containing brown
or gray owers which will be considered in the evaluation of our
algorithm. All these factors can reduce the segmentation performance of

this method [2]. As for the method proposed by [3], the main disadvantage is the high computational time due to the research of the local cost
paths. As demonstrated in [1], the method of Nilsback et al. achieves
good performances (93% in quality measure [13]) but it is very slow to
be run in real-time applications. In this method [1], tting a generic
shape model needs the determination of the corners, the petals and the
center of the owers which requires high computational time. The method proposed by [6] yields accurate segmentation (87% in quality measure)
but we have no information about the processing time of the algorithm.
The method reviewed by [11] achieves about 94% segmentation accuracy
not on the original data split of the Oxford-17 ower dataset. As cited in
[14], the algorithm proposed by iteChai11 needs more than 30s as runtime for segmentation. Finally, in [8], it is demonstrated that the proposed
method can reduce the error rate in segmenting ower images with no
mention of the consuming time. In terms of evaluation settings, all aforementioned ower segmentation methods have been performed on different datasets and evaluated with different accuracy measure as shown in
Table 1.
After this study, one can say that the model proposed by Nilsback
et al. [1] yields the best performance of automatic ower image segmentation even if it is very time-consuming. That is why we take it as
a state-of-the-art in order to improve it or to propose a new fast algorithm offering similar accuracy.
Recently, it has been an important interest in image segmentation approaches based on graph-cut [1518]. Many works use graph-cut technique since it gives positive results on both medical and natural
images [19,12]. Unfortunately, the standard graph-cut algorithm suffers
from some limits: (i) it fails to give a smooth segmentation result by labeling some object pixels as background, or vice versa. (ii) It fails to dene
the desired boundary of the object. In fact, if object pixels have color distribution similar to the image background then these pixels will be labeled as background, and they won't be considered as part of the object.
This failure can be explained by the fact that the traditional energy function to be minimized uses only the color information. In order to alleviate
these problems, some solutions have been proposed to integrate a priori
information other than color information. It has been demonstrated that
the segmentation results of graph-cut methods can be improved by introducing novel constraints in the segmentation process such as shape constraints [20,21] and spatial constraints [22]. For example, authors in [21]
proposed to incorporate a generic star-shaped prior in the energy function minimized by graph-cut. The shape prior is represented as an object
mask form using the distance transform of the star shape. Although this
method [21] gives encouraging results with the assumption that the center of the shape is given by the user, the extracted object shape tends to be
1009
Table 1
Differences between segmentation methods.
Segmentation method
Flower dataset
Accuracy measure
Das et al. [2]

Saitoh et al. [3]
Fukuda et al. [8]
Nilsback et al. [1]
Aydin and Ugur [6]
Chai et al. [11]
300 Flower images

600 Flower images
150 Flower images
Oxford-17 owers (with the original data split 20 images per class)
Oxford-17 owers (with the original data split 20 images per class)
Oxford-17 owers (with the alternative data split 15 images per class)
Recall-precision
Boundary extraction rate
Error rate
Overlap score
Overlap score
Overlap score & percent correct score
star-aligned. While shape-based graph-cut works can be robust, they

have a shortcoming to segment object with high shape variability [23].
In this paper, our aim is to modify the standard graph energy function to enhance the segmentation result. Therefore, we introduce spatial
constraints through an additional term in the formulation of the graphcut energy function. Thus, we obtain a better classication of pixels as
foreground despite their high probabilities of belonging to the background if we consider only their colors and vice versa. We integrate
also gradient information in the traditional graph energy function to
better express boundary constraints. Consequently, true boundaries
can be determined based on both gradient and color information. The
rest of the paper is organized as follows: In Section 2, rst, we present
the mathematical theory of graph-cut technique and its use for energy
minimization. Then, we introduce our proposed new energy function
to be minimized by graph-cut. Finally, our automatic ower segmentation algorithm is described. In Section 3, we present the experimental
results and the evaluation of the proposed segmentation method.
and represents the weight of t-link edges. The second term Bp,q is the
boundary or smoothness term that measures the cost for two neighboring
pixels p and q being different and represents the weight of n-link edges.
The constant in Eq. (1) controls the relative importance of the boundary
term versus the region term. Since Ip is the intensity of pixel p, the terms
Rp( Obj ) and Rp( Bkg ) are equal to negative log-likelihood of foreground and background intensity models, respectively (Eqs. (2) and (3)).
2. The proposed ower segmentation method
Bp;q

Rp Obj Rp f p 0 lnPr I p jObj

Rp Bkg Rp f p 1 ln Pr Ip jBkg :
The smoothness term Bp,q is commonly expressed by the Eq. (4).

0
2 1

I
I

C
p
q
1
B
exp@
A
2
dist p; q
2
2.1. Image segmentation by classical energy minimization using graph-cut

The graph-cut optimization technique proposed by [5] is one of the
energy minimization algorithms which solve the object segmentation
problem [24]. The algorithm denes a non-oriented graph G = (V, E)
where V is a set of nodes corresponding to the image pixels and E is a
set of non-oriented edges that connect those nodes. There are two terminal nodes called source s and sink t added to V in order to represent
foreground and background, respectively. Each pixel node is connected to each one of the two terminal nodes by a link called t-link
and each pair of neighboring pixels is connected by an edge called nlink. A cut is a partition of V into disjoint subsets S and T such that the
source s is in S and the sink t is in T as shown in Fig. 2.
The cost of the cut is dened as the sum of weights of all edges that are
severed by the cut. The optimal segmentation is obtained by nding the
cut that has the minimum cost among all cuts [25,26]. This graph-cut
technique is used to minimize the standard energy function formulated
as shown in Eq. (1).
E f

X
Rp f p
Bp;q f p ; f q
pV
p;qC
where is the standard deviation of the norm of the image gradient

[27]. For grayscale images, Ip and Iq are the intensities of pixels p and
q. For color images, they can be the RGB color vectors of pixels p and
q. The dist(p, q) is the Euclidean spatial distance between the pixels p
and q.
2.2. Improved model for the energy minimization problem
We propose a new modeling of the energy function to be optimized by
graph-cut. As we search to consider not only color information, but also
the spatial one, we add a new term to the energy function. We also improved the boundary constraints by modifying the standard smoothness
term. The proposed energy function consists of two new terms, namely,
ep;q as illustrated
data-consistent term Dp and modied boundary term B
in Eq. (5).
E f
where C is the set of pairs of adjacent pixels representing the 4- (or 8-)
neighborhood system, f = fp V is the labeling function which associates
each pixel p with a label fp ( fp = 0 if it belongs to the foreground and fp =
1 otherwise). The rst term Rp is called the region or data dependant
term which evaluates the penalty for assigning any label to a pixel p

X
e
B
Dp f p
p;q f p ; f q
pP
p;qC
Table 2
Experimental results of different images.
Image
Image size (pixels)
Fig.
Fig.
Fig.
Fig.
500
560
500
502
14 Daffodil
14 Crocus
14 Dandelion
14 Tiger Lily
665
731
666
500
Computing time (s)

41.7580
52.4330
48.9170
40.3300
cut
t
Fig. 2. A 2D graph for 3 3 image and its cut.
1010
The additional term BGp,q used to improve boundary regularity is dened by Eq. (12).
where

Dp f p Rp f p 1Sp f p :
The rst term Dp is a barycentric combination (Eq. (6)) of a priori

color-dependent term Rp (Eqs. (2) and (3)) and a priori spatialdependant term Sp (Eqs. (7) and (8)). In fact, we add the spatial distribution to the color distribution in the segmentation process in order
to improve the quality. The spatial terms Sp( Obj ) and Sp( Bkg ) represent respectively the penalities for assigning a pixel p to the classes
foreground and background according to spatial distributions
(Prs(p| Obj ) and Prs(p| Bkg )). Thus, the spatial term Sp is dened
as the negative log-likelihood of foreground and background spatial distributions such as described in Eqs. (7) and (8).
Sp Obj ln Pr s pjObj
Sp Bkg ln1Prs pjObj:
The spatial distribution of foreground is determined by the approximate spatial location Rs of the ower zone in the image. Inside the region Rs, the probability to consider a pixel as an object is maximum.
Whereas, for every pixel outside the region Rs, this probability decreases
according to a Gaussian of the distances dist(p, Rs) that separate the
pixel p to the boundary of the region Rs. The formulation of these probabilities is given by Eq. (9).
Prs pjObj
8
>
<1
dist p; Rs
>
: exp 2 2
RS
if pRs :
if pRs :
where RS is the standard deviation of the distances dist(p, Rs).

In standard graph-cut, the expression of the smoothness term is not
enough to describe the boundary constraints. In fact, the area surrounding the ower is characterized by a large variety of colors and texture,
but it is generally a repetitive texture pattern or elements. Using only
the intensity difference in the boundary term (Eq. (4)) penalizes the discontinuity between two adjacent pixels within the same region. So, it is
likely that the two neighboring pixels will not have the same label not
only when they belong to different regions but also when they belong
to the same textured area. Since adding the gradient norm difference
will provide the intensity variation within a window of pixels, both of
the two pixels will have a high value of gradient norm which will reduce
the difference and increase the smoothness term in order to avoid the
ep;q is then given by Eq. (10).
cut. The modied boundary term B
e
B
p;q
8
<

1
BIp;q BGp;q
if f p f q :
dist p; q
:
0
otherwise
10
where BIp,q and BGp,q are two neighborhood interaction functions that
can penalize intensity difference and gradient norm difference between
two neighboring pixels p and q, respectively. By performing an optimization of a parameter combining BIp,q and BGp,q, we have not obtained
an efcient estimation. So we chose to minimize the proposed energy
function with no parameter between BIp,q and BGp,q. The term BIp,q is obtained using the old boundary term (Eq. (4)) and is expressed by
Eq. (11).
BI p;q
0
2 1

I I
B p q C
exp@
A:
2 2I
11
BGp;q
0 2 1

p q
B
C
exp@
A
2
2 I
12
where p and q are gradient norms of the image at neighboring

pixels p and q.Since we have optimized this new energy function
using graph-cut, we have called it Extended graph-cut.
2.3. Proposed coarse-to-ne ower segmentation algorithm
We propose a new coarse-to-ne method which devises a two-level
scheme to segment ower image. In our work, we use general foreground
and background color distributions (Gfg and Gbg) which are learned as in
[1] using the Gaussian mixture model [28]. At the rst level, our goal is
to perform a coarse segmentation based on the minimization of the proposed energy function by the graph-cut. At the second level, we enhance
the obtained result by rening color distributions in order to apply the
standard graph-cut. In the next section, we will detail the components
of the proposed algorithm followed by the complexity analysis. The proposed automatic segmentation framework is depicted in Fig. 3.
2.3.1. Coarse segmentation level
At this rst level, a huge part of the background is eliminated in
order to approximate the location of the ower zone. The idea is to select automatically several background seeds: two seeds are selected in
each window corner of the image as mentioned in Section 2.3.1.2, and
then we gather all pixels similar to those seeds in terms of color. The obtained area will correspond to a large part of the background. The remaining area of the image will contain necessary the ower region
and will be served to determine the spatial model for applying extended
graph-cut. The process of this level is described by Algorithm 1.
Algorithm 1 Coarse segmentation (). Input: general background Gbg
and foreground Gfg color models
-Input image Iimg
Output: coarse segmentation result.
1. Estimate background probability density function PDFbg of the
border of Iimg.
2. For each corner = {pixel(x, y)} in Iimg do
d
3. set seed1
x; y arg maxpixelx;y PD F bg pixelx; y
d
4. set seed2
x; y arg maxpixelx;y Gbg pixelx; y
5. End for
6. Set Iq = Quantization(Iimg, 12).
7. Label the pixels from Iq having the same color as background
seeds to 0 and the other pixels to 1.
8. Compute Euclidean distance map EDM of the previous result
(step 7).
9. Build object and background spatial models using EDM.
10 Use graph-cut algorithm to segment Iimg using spatial and color
models.
11. Return the coarse segmentation result Rescoarse.
2.3.1.1. Estimation of foreground and background distributions. We have
estimated a specic background distribution calculated from the
image to be segmented. So, as illustrated in Fig. 4, we divide our input
image into two areas, an internal area (IA) representing the kernel of
the image and an external one (EA) representing the border. We can estimate the specic background distribution by computing the multivariate Gaussian distribution in EA because we made the assumption that
the internal area will contain the ower region and the external one
will correspond to the background.
Input Image
Border
PDFbg
Kernel
Background
seeds extraction
Gfg
Gbg
Background pixels gathering

Offline learning
by GMM
Learning process
Coarse Segmentation
Color
Specific background
quantization distribution estimation
Spatial models determination
Extended Graph-cut segmentation
Fine Segmentation
1011
Training
set
Dominant background colors identification
Color distributions reestimation
Standard Graph-cut segmentation
Segmentation Output
Fig. 3. Our automatic segmentation framework for ower images.
2.3.1.2. Background seed extraction. In each corner of the input image, we

choose two pixel seeds within 20 20 window, as shown in Fig. 4. The
rst and the second seeds are chosen among the pixels contained in the
window corner as they have the maximum background likelihood value
based on background color distributions, respectively, learned by GMM
(Gbg) and computed on the external area EA (PDFbg).
2.3.1.3. Color quantization. Due to complex mixtures of colors in natural

scenes and the fact that texture features tend to be ambiguous and not
discriminative enough [29], image color quantization is performed in
order to reduce the color number. Consequently, we can easily reduce
the visual difference between pixels having similar colors as shown in
Fig. 5.
In order to perform the color image quantization, we compute the
class-map of the image by changing each original pixel color to his corresponding quantized value. Typically, the number of levels needed for
the quantization of natural scenes is between 10 and 20 [30]. We have
used an empirical value of quantization level which is xed to 12.
Fig. 5 shows some results of 12-level color quantization on four images
from Daffodil, Iris, Pansy and Fritillary classes.
Fig. 4. Illustration of the background probability density function estimation.
2.3.1.4. Background pixel gathering. This step consists in gathering the

pixels having the same color level as the background seeds. As shown in
(Fig. 6b), the gathered pixels represent a background area Zbg while the
remaining part Zo contains the ower and it is considered its spatial
location.
2.3.1.5. Spatial model determination. A distance map [31] of the previous
obtained result is computed (Fig. 6c) in order to provide a prior probability of the spatial object model which will be incorporated in the energy
function minimized by graph-cut. This map labels each background
pixel with the shortest Euclidean distance to the nearest foreground
pixel, unlike the foreground pixels which will take the value zero as a
label. Fig. 6c illustrates the Euclidean distance map of the background
pixels gathering result.
2.3.1.6. Extended graph-cut segmentation. The nal step at the coarse segmentation is the application of the extended graph-cut. Fig. 6d shows an
example of the obtained result on two ower images, namely crocus
and iris. The result may not be perfect, but it is sufcient to detect a
large part of the object with some parts of the background and a few object pixels misclassied as background. In the next section, we will explain in simple steps how to rene this result. The modied boundary
term of the energy function optimized by the extended graph-cut enhances the boundary smoothness.
Fig. 7 shows how the integration of the gradient information improves the coarse segmentation result. Using the new boundary term,
we obtain better coarse segmentation (Fig. 7c) than the result obtained
using the standard boundary term (Fig. 7b).
2.3.2. Fine segmentation
At this level, we try to rene the color distributions used at the
coarse segmentation level. Then, we apply standard graph-cut using
these updated color models. The pseudocode of the ne segmentation
is shown in Algorithm 2.
Algorithm 2 Fine segmentation(). Input: coarse segmentation result
Rescoarse
Output: ne segmentation result
1: Compute color histogram BCH of the pixels classied as background in Rescoarse.
2: Threshold BCH to identify dominant background colors.
3: Find the set of object pixels OP from Rescoarse that has a dominant
background colors.
4: Modify the labels of all pixels in OP in order to be assigned as
background.
5: Reestimate the object hobject and background hbackground color
models according to the labeling result obtained in step 4.
6: Use graph-cut algorithm to segment Iimg using the updated color
models hobject and hbackground.
7: Return the ne segmentation result Resnal.
1012
Fig. 5. Image quantization. Top row: Example of ower images belonging to (left to right) Daffodil, Iris, Pansy and Fritillary classes. Bottom row: Results of 12-levels color
quantization.
2.3.2.1. Dominant background color identication. Considering the result

of the coarse segmentation, we identify the background dominant
colors by computing the background color histogram BCH and
thresholding it. These colors will be used to rectify the classication of
background pixels labeled as foreground in the coarse segmentation result. So, we modify the labels of these pixels classied as foreground and
having a dominant background color. The obtained result is denoted by
imagecleaned where the foreground pixels form a region denoted by
objectcleaned.
2.3.2.2. Color distributions reestimation. In order to apply standard graphcut segmentation, we have to re-estimate color distributions. These distributions should be ner than those used at the coarse segmentation
level. We can rectify the color likelihoods used at the coarse
(a)
(b)
segmentation level by updating general color models. The new foreground color model hobject is a barycentric linear combination of the
color distribution hOC of the objectcleaned and the general foreground distribution Gfg as mentioned in Eq. (13).
hobject hOC 1 Gfg :
13
In some cases, the ower is not accurately cut at the coarse segmentation level. In fact, the foreground pixels marked as background inhibit
the ne segmentation to extract the ower object accurately if we include them in the background color model. To avoid this, we consider
an uncertainty zone around the objectcleaned by applying morphological
dilation using a 5 5 disk-shaped structuring element. We called
objectcleaned dilated the new object region. The background model
(c)
(d)
Fig. 6. Coarse segmentation. (a) original images, (b) gathering results, (c) Euclidean distance map results, and (d) coarse segmentation results.
(a)
(b)
1013
(c)
Fig. 7. Inuence of gradient information incorporation into the boundary term. (a) original image, (b) coarse segmentation without gradient information, and (c) coarse segmentation with
the modied boundary term.
hbackground is, therefore, a barycentric linear combination of the general

background distribution Gbg and the color distribution hNOC of the pixels
which are not part of the region objectcleaned dilated (Eq. (14)).
hbackground hNOC 1 Gbg :
14
Since the new color distributions provide more accurate modeling of

both background and foreground than those used at the rst level, we
give them more condence with a high weight. In our experiments,
the empirical value used is 0.8.
2.3.2.3. Standard graph-cut segmentation. The standard graph-cut is then

performed using the updated color models. At this step, the use of the
extended graph-cut makes hard to precisely segment the ower even
if the object and background models have been updated. In fact, the
background pixels contained in the ower zone will be used to compute
the object spatial model and they will be classied again as foreground
due to their high object probabilities. Therefore, we use the standard
graph-cut algorithm at the ne segmentation level. Fig. 8 shows how
the ne segmentation accurately cleans the coarse segmentation result.
In fact, it is difcult to segment the center pixels of some ower (e.g.
Sunower), having a dark background color (e.g. brown and black), as
foreground. That's why we ll the center hole into the foreground.
2.3.3. Complexity evaluation
Our method applies, at each segmentation level, the graph-cut technique. Since we have two levels, the computational complexity of our
algorithm will be 2 O(m n2 |c|). It is twice the running time complexity of the max-ow min-cut algorithm in the worst case; where n is
the number of nodes, m is the number of edges and |c| is the cost of the
minimum cut in the graph. However, the complexity of the algorithm in
[1] is NB O(m n2 |c|) where NB is the number of iterations required
until convergence of the algorithm. In [1], it was mentioned that the algorithm can need ve iterations to converge. This means that NB is
greater than or equal to 5 in the worst case. Since the ower images in
the Oxford database vary in size, at least 500 500 pixels, the constant
NB which is relatively high (5) increases the computational time for
large images. Thus, our proposed algorithm is faster than that proposed
in [1].
3. Experimental results
(a)
(b)
We have implemented the segmentation algorithm using MATLAB

software. We used maxow library1 for computing the mincut/maxow
of a graph. Our approach was evaluated on the Oxford ower dataset2 as
1
Fig. 8. Segmentation results of images illustrated in Fig. 6a after tting ne segmentation.
The library is available online at http://vision.csd.uwo.ca/code/.

The dataset is available online at [32].
1014
OS = 0.8812
OS = 0.8929
OS = 0.8889
OS = 0.6134
OS = 0.6556
OS = 0.6841
a) =1
b) =4
c) =100
Fig. 9. Segmentation results under different values of . The overlap scores are given below the images.
it was used in [1]. This dataset contains 17 species of ower with 80 images per category having large variations in viewpoint, scale and illumination. There is a ground truth segmentation provided with this dataset.
As in [1], we will remove four classes, i.e. snowdrops, lilies of the valley, cowslips and bluebells, because they either haven't sufcient
images or haven't segmentation ground truth. Therefore, it remains
753 images in the dataset representing 13 classes which are split into
a training set containing 260 images (20 images per class) and a test
set containing 493 images. We have also tested our method on the alternative data split used in [11] having 15 training and 65 test images per
class.
3.1. Evaluation protocol
The evaluation protocol proceeds as follows: rst, we automatically
estimate optimum values of our method's parameters using the training
0.8
Average overlap score
0.78
0.76
0.74
set and the ground truth segmentations. Next, we present ower segmentation results on the test set and evaluate our proposed method. Finally, our method is confronted with other existing methods. We rst
compare our method to the co-segmentation method of Chai et al.
[11] for the alternative data split (Table 1). Then, we provide a comparative study in terms of performance and computational complexity of
our segmentation results with those obtained by our reimplementation
of Nilsback's method [1] for the original data split. In fact, since authors
[1] did not provide the source code of their method, we reimplemented
the corresponding version, denoted as Nilsback's reimplemented method and, in doing so, we choose the different techniques that they did not
mention in the paper [1]. For example we have used the Gaussian Mixture Models (GMMs) [28] for learning the general foreground and background distributions. Furthermore, in order to detect corners in [1], it is
mentioned that two parameters, namely the worm length and the minimum distance from a potential boundary point to the straight line between the worm's head and tail, are required. However, only an
interval of values for the rst parameter (worm length) is provided as
a variable value to ensure optimal performance. That's why, in our
reimplementation of the Nilsback's method we have xed the worm
length parameter to 25 empirically and we have used the Phillips
Rosenfeld algorithm [33] to set the value of the second parameter (minimum distance) which is calculated based on the xed worm length. Indeed, we have not many other values of other several optimized
parameters to execute the reimplemented algorithm.
0.72
3.2. Parameter estimation
0.7
0.68
0.66
0
10
20
30
40
50
60
70
Parameter Lambda
Fig. 10. The optimization of the parameter .
80
90
100
Our method depends on some parameters that must be set to perform segmentation. The parameters and are estimated to minimize
the energy function via graph-cut (Eq. (5)) and the background threshold parameter bgth is used to determine the background dominant
colors. The best values of the parameters are chosen so that they maximize a segmentation quality measure called the overlap score OS,
1015
OS = 0.6166
OS = 0.6568
OS = 0.7988
OS = 0.8809
OS = 0.7643
OS = 0.7676
OS = 0.8767
OS = 0.9122
(c) =0.8
(d) =0.9
(a) =0.1
(b) =0.6
Fig. 11. Coarse segmentation results under different values of . The overlap scores are given below the images.
between the segmentation and the ground truths. The performance

measure OS is dened as indicated in Eq. (15).
OS
True foregroundSegmented foreground

:
True foregroundSegmented foreground
reaches 67.2% for = 100. So, we xed the value 4 for the parameter
as the best value that corresponds to the highest average overlap
score over all classes.
15
3.2.1. Optimization of the parameter

It is known [26] that a low value leads to an over-segmentation
and a high value of gives an under-segmentation. Fig. 9 shows the impact of on segmentation results. For this reason we have to be careful
about the choice of the parameter .
In order to nd the optimal value, we performed segmentation of
the 260 training images using different values of in the range from 1 to
10 by step 1 and also in the range from 10 to 100 by step 10 as shown in
Fig. 10.
For each xed , we compute the average overlap score over all classes. Fig. 10 illustrates that the average overlap score starts at 78.2% for
= 1, then goes up to 79.3% for = 4 and nally decreases until it
3.2.2. Optimization of the parameter

Because the parameter is the weight of the spatial model which
was used only at the rst level (coarse segmentation), it is estimated according to this level result. Fig. 11 shows coarse segmentation results of
two different images under different values of .
The parameter have to ensure a trade-off between the color model
and the spatial one in order to minimize the number of misclassied
background pixels considered foreground as illustrated in Fig. 11. To
choose the best , we performed segmentation of 260 training images
using different values of in the range from 0.1 to 0.92. Fig. 12 shows
the evolution of the segmentation quality according to the values of
the parameter .
As illustrated in Fig. 12, when the value of is less than 0.9, the segmentation quality has increased sharply. Then, it falls slightly from
Average coarse segmentation overlap score
0.8
0.75
0.7
0.65
0.6
0.55
0.1
0.2
0.3
0.4
0.5
0.6
Parameter Beta
Fig. 12. The parameter selection.
0.7
0.8
0.9
0.92
1016

0.7862
0.786
0.7858
0.7856
0.7854
0.7852
0.785
0.7848
60
100
80
Parameter Background threshold
Pansy
Tiger Lily
Dandelion
Fritillary
Crocus
Daffodil
Fig. 13. The parameter bgth selection.
(a)
(b)
(c)
(d)
Fig. 14. Segmentation results of ower images. (a) original image, (b) coarse segmentation, (c) ne segmentation and (d) ground truth.
1017
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
Daffodil
Crocus
Iris
Tigerlily
Wildtulip Fritillary
Sunflower
Daisy
Colts foot Dandelion Buttercup Windflower Pansy
Flower classes
Fig. 15. Segmentation quality.
equal to 0.9. Thus, we x the parameter to 0.9, which corresponds to

the rst value with which the segmentation quality has decreased.
3.2.3. Optimization of the parameter bgth
Finally, we have estimated the best background threshold used to
nd the dominant background colors from the coarse segmentation result as shown in Fig. 13.
If bgth is small, foreground colors will be considered dominant background colors; on the other hand, if bgth is large, we cannot obtain a satisfactory list of dominant background colors. By varying this parameter,
we note that the best segmentation is obtained with bgth xed at 80 as
illustrated in Fig. 13.
3.3. Evaluation of segmentation results
After setting the parameters, we applied our method on the test set.
Fig. 14 shows some ower images and their segmentation results at
each level.
Our method works under an assumption that the coarse segmentation result should rectify the distributions which will be used at the
ne segmentation level. Yet, this assumption may not be detained in
some cases when a large part of the foreground is similar in color to
Classical graph-cut
Classical graph-cut + modified boundary term
Classical graph-cut + additional spatial term
Classical graph-cut + modified boundary term + additional spatial term
0.9
the background pixels. As shown in Fig. 14, our method fails to separate
the ower from the background under such cases.
Our proposed segmentation algorithm is executed on a machine
with an AMD Athlon processor. Table 2 shows execution times of the algorithm on different images (presented in Fig. 14). The running time required to perform our algorithm is around 4055 s, while the
reimplemented Nilsback's algorithm takes far more than 1 min.
In order to objectively evaluate the accuracy of our segmentation
method, we compute the average overlap score for each ower class
as indicated in Fig. 15.
The red bars represent the condence intervals. Upper bounds are
the overlap scores obtained in the best case and lower bounds are the
overlap scores obtained in the worst case. The best and the worst overlap scores are achieved when we choose the 20 best and the 20 worst
segmentation results from each class, respectively. In the best case, the
average of our segmentation score is 85.28%. As Fig. 15 indicates, the average overlap score (OS) can reach 92% and it is never below 55%. This
score indicates that our method offers encouraging and good results
for automatic ower segmentation. Our method gives the worst results
for segmenting Crocus class images due to having ower colors similar to those of the background.
In order to illustrate the contributions of our proposed ower segmentation system, we performed these four tests at the coarse
0.8
0.7
0.6
0.5
0.4
0.3
0.2
Daffodil
Crocus
Iris
Tigerlily
Wildtulip Fritillary Sunflower Daisy Colts foot Dandelion Buttercup Windflower Pansy
Flower classes
Fig. 16. Inuence of the energy function modication on the segmentation performance.
1018
Fig. 17. Comparison of our segmentation results with the results of Nilsback's reimplemented method. The top row shows original images. The second and the third rows show segmentation results of Nilsback's reimplemented method and our method, respectively.
segmentation level using obviously the classical graph-cut at the ne

segmentation level:
Test using the standard graph-cut;

Test using the standard graph-cut with the modied boundary term;
Test using the standard graph-cut by incorporating the spatial term;
Test using the standard graph-cut with the additional spatial term and
the modied boundary term.
The results of these experimental tests are shown in Fig. 16. We note
that the quality of segmentation is improved thanks to the addition of
spatial constraints and the modication of the boundary term in the formulation of the energy function minimized by graph-cut. For example,
the fact that ower leaves for Fritillary and Pansy classes containing
an obvious difference in color makes the fail of the segmentation using
only the classical graph-cut (24% for Fritillary and 56% for Pansy). It
can be seen (Fig. 16) that the integration of the gradient information in
the boundary term improves the overlap score for Fritillary and
Pansy classes to 67% and 73%, respectively. Furthermore, adding the
spatial term improves the overlap score for each ower class. Thus,
the segmentation performance is improved over most ower classes
using our proposed algorithm.
3.4. Comparison study

In order to place our segmentation results, we compare experimentally our work to those of Chai et al. [11]. Indeed, we obtain an average
OS equal to 77% whereas the average OS for Chai et al. method is about
94%. Although it can be noticed that our method achieves 91.96%
Fig. 18. Comparison of our segmentation results with the results of Nilsback et al. [1]. The top row shows original images. The second and the third rows show segmentation results of
Nilsback's method and our method, respectively.
1019
0.95
Our method
Nilsbacks reimplemented method
0.9
0.85
0.85
0.75
0.7
0.65
Daffodil
Crocus
Iris
Tigerlily
Wildtulip
Fritillary
Sunflower
Daisy
Colts foot
Dandelion Buttercup
Windflower
Pansy
Flower classes
Fig. 19. Segmentation quality comparison of our method with Nilsback's reimplemented method.
segmentation accuracy with the alternative datasplit used in [11] in the

best case. We compare also our method to Nilsback's method [1] in
terms of segmentation quality and running time. We note that the unavailability of the parameter values and the adopted techniques in the
implementation does not allow us to have exactly the same results obtained in [1]. So, we compare our proposed method segmentation results with those obtained by our Nilsback's reimplemented method
(Fig. 17). As shown, the proposed method can give more accurate segmentation results for a variety of ower images. In Fig. 18 we also compare our segmentation results with those obtained by the method
originally implemented by Nilsback [1] available online at [32]. We
note that our method delivers again accurate segmentation results
with images that fail using Nilsback's method [1].
(Dandelion) and lower for four classes (Crocus, Iris, Daisy and
Windower). So, our method can achieve better results in most cases.
3.4.1. Segmentation quality

We compute the average overlap score for each ower class by the
two methods as indicated in Fig. 19. Comparing our values with those
obtained by the reimplemented method of Nilsback et al., we note
that our results are better for eight classes, very similar for one class
4. Conclusion
3.4.2. Running time

We have also evaluated the performance of our method in terms of
computational time in comparison with the reimplemented method of
Nilsback et al. [1]. We notice that the running time of our algorithm is
faster than that of Nilsback as shown in Fig. 20. Our running time is obviously justied by our time complexity previously calculated. So, we
consider that the method presented in [1] is not well suited to such
real-time application because it requires extraprocessing time to
achieve accurate segmentation (over one minute). Our method is then
more effective in practical.
In this work, we present a method for ower segmentation based on

the minimization of MRF energy by graph-cut. We have made two important contributions. At rst, we have added a spatial term in the
800
Our method
Nilsbacks reimplemented method
700
Time (second)
600
500
400
300
200
100
0
Daffodil
Crocus
Iris
Tigerlily
Wildtulip
Fritillary
Sunflower
Daisy
Colts foot Dandelion Buttercup Windflower Pansy
Flower classes
Fig. 20. Segmentation running time comparison of our method with Nilsback's reimplemented method.
1020
energy function. Then, we have incorporated the gradient information

in the boundary term of the energy function. In summary, we have described a two-level segmentation scheme. At its rst level, the scheme
extracts a large part of the object from the input image and at the second
one renes the result obtained above. The experimental study shows
that our proposed method has a high accuracy and overcomes the limitations of the standard graph-cut. While our accuracy is not greater
than that of the studied method of Nilsback et al. [1], our method is
more efcient and more faster. Future work will concentrate on generalizing our method by testing it on another benchmark database of
ower images.
Acknowledgments
We thank Vladimir Kolmogorov for graph-cut software which we
used in our work.
References
[1] M.E. Nilsback, A. Zisserman, Delving deeper into the whorl of ower segmentation,
Image Vis. Comput. 28 (2010) 10491062.
[2] M. Das, R. Manmatha, E.M. Riseman, Indexing ower patent images using domain
knowledge, IEEE Intell. Syst. 14 (1999) 2433.
[3] T. Saitoh, K. Aoki, T. Kaneko, Automatic recognition of blooming owers, volume 1,
in: International Conference on Pattern Recognition (ICPR), UK, Cambridge, 2004.
pp. 2730.
[4] E.N. Mortensen, W.A. Barrett, Intelligent scissors for image composition, in: International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH),
Los Angeles, CA, USA, 1995. pp. 191198.
[5] Y. Boykov, M. Jolly, Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images, volume 1, in: International Conference on Computer
Vision (ICCV), Vancouver, British Columbia, Canada, 2001. pp. 105112.
[6] D. Aydin, A. Ugur, Extraction of ower regions in color images using ant colony optimization, Procedia Comput. Sci. 3 (2011) 530536.
[7] M. Dorigo, M. Birattari, T. Stitzle, Ant colony optimization: articial ants as a computational intelligence technique, IEEE Comput. Intell. Mag. 1 (2006) 2839.
[8] K. Fukuda, T. Takiguchi, Y. Ariki, Automatic segmentation of object region using
graph cuts based on saliency maps and adaboost, volume 36, in: IEEE International
Symposium on Consumer Electronics (ISCE), Kyoto, Japan, 2009. pp. 3637.
[9] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: International Conference on Machine Learning (ICML), Bari, Italy, 1996. pp. 148156.
[10] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene
analysis, IEEE Trans. Pattern. Anal. Mach. Intell. 20 (1998) 12541259.
[11] Y. Chai, V. Lempitsky, A. Zisserman, Bicos: a bi-level co-segmentation method for
image classication, in: IEEE International Conference on Computer Vision, Barcelona, Spain, 2011. pp. 25792586.
[12] C. Rother, V. Kolmogorov, A. Blake, Grabcut: interactive foreground extraction using

iterated graph cuts, ACM Trans. Graph. 23 (2004) 309314.
[13] Y. Chai, Recognition Between a Large Number of Flower Species, 2010.
[14] A. Angelova, S. Zhu, Efcient object detection and segmentation for ne-grained recognition, in: CVPR13: IEEE Conference on Computer Vision and Pattern Recognition,
Portland, OR, 2013. pp. 2328.
[15] B. Peng, L. Zhang, D. Zhang, J. Yang, Image segmentation by iterated region merging
with localized graph cuts, Pattern Recogn. 44 (2011) 25272538.
[16] C. Jung, B. Kim, C. Kim, Automatic segmentation of salient objects using iterative reversible graph cut, in: IEEE International Conference on Multimedia and Expo
(ICME), Singapore, 2013. pp. 590595.
[17] B. Peng, L. Zhang, J. Yang, Iterated graph cuts for image segmentation, in: Asian Conference on Computer Vision (ACCV), Xian, China, 2009. pp. 677686.
[18] W. Tao, H. Jin, Y. Zhang, L. Liu, D. Wang, Image thresholding using graph cuts, IEEE
Trans. Syst. Man Cybern. Syst. Hum. 38 (2008) 11811195.
[19] D. Freedman, T. Zhang, Interactive graph cut based segmentation with shape priors,
volume 1, in: Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA,
2005. pp. 755762.
[20] V. Kolmogorov, Y. Boykov, What metrics can be approximated by geo-cuts, or global
optimization of length/area and ux, in: IEEE International Conference on Computer
Vision, Beijing, China, 2005. pp. 564571.
[21] O. Veksler, Star shape prior for graph-cut image segmentation, in: European Conference on Computer Vision, Marseille, France, 2008. pp. 454467.
[22] A. Criminisi, G. Cross, A. Blake, V. Kolmogorov, Bilayer segmentation of live video, in:
IEEE Computer Vision and Pattern Recognition, New York, NY, USA, 2006. pp. 5360.
[23] B. Yangel, D. Vetrov, Image segmentation with a shape prior based on simplied
skeleton, in: EMMCVPR11: Energy Minimization Methods in Computer Vision
and Pattern Recognition, St. Petersburg, Russia, 2011. pp. 247260.
[24] G. Slabaugh, G. Unal, Graph cuts segmentation using an elliptical shape prior, volume 2, in: IEEE International Conference on Image Processing (ICIP), Genoa, Italy,
2005. pp. 12221225.
[25] Y. Boykov, O. Veksler, Graph Cuts in Vision and Graphics: Theories and Applications,
Handbook of Mathematical Models in Computer Vision, Springer, 2005, pp. 7996.
[26] S. Candemir, Y.S. Akgul, Adaptive regularization parameter for graph cut segmentation, in: International Conference on Image Analysis and Recognition (ICIAR), volume 1, Povoa de Varzim, Portugal, 2010. pp. 117126.
[27] M. Fradet, Contributions la Segmentation de Squences D'images au Sens du
Mouvement Dans un Contexte Semi-automatique, 2010.
[28] D. Reynolds, Speaker identication and verication using gaussian mixture speaker
models, Speech Comm. 17 (1995) 91108.
[29] G. Jugui, Multi-resolution Region-preserving Segmentation for Color Images of Natural Scene, 2004.
[30] Y. Deng, B.S. Manjunath, H. Shin, Color image segmentation, volume 2, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Ft.
Collins, CO, USA, 1999. pp. 446451.
[31] P.E. Danielsson, Euclidean distance mapping, Comput. Graph. Image Process. 14
(1980) 227248.
[32] M. Nilsback, A. Zisserman, Flower datasets2009. (http://www.robots.ox.ac.uk/ vgg/
data/owers/).
[33] M. Marji, R. Klette, P. Siy, Corner detection and curve partitioning using arc-chord
distance, in: International Workshop on Combinatorial Image Analysis, Auckland,
New Zealand, 2004. pp. 512521.

1 s2.0 S0262885614001413 Main PDF

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

1 s2.0 S0262885614001413 Main PDF

Hochgeladen von

Copyright:

Verfügbare Formate

Image and Vision Computing 32 (2014) 10071020

Contents lists available at ScienceDirect

Image and Vision Computing

Model-based graph-cut method for automatic ower segmentation with

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

to obtain an initial ower segmentation using the contrast dependent

algorithm. All these factors can reduce the segmentation performance of

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

Das et al. [2]

300 Flower images

star-aligned. While shape-based graph-cut works can be robust, they

2. The proposed ower segmentation method

The smoothness term Bp,q is commonly expressed by the Eq. (4).

2.1. Image segmentation by classical energy minimization using graph-cut

where is the standard deviation of the norm of the image gradient

Image size (pixels)

Computing time (s)

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

The rst term Dp is a barycentric combination (Eq. (6)) of a priori

Sp Bkg ln1Prs pjObj:

where RS is the standard deviation of the distances dist(p, Rs).

where p and q are gradient norms of the image at neighboring

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

Background pixels gathering

Spatial models determination

Extended Graph-cut segmentation

Dominant background colors identification

Color distributions reestimation

Standard Graph-cut segmentation

2.3.1.2. Background seed extraction. In each corner of the input image, we

2.3.1.3. Color quantization. Due to complex mixtures of colors in natural

Fig. 4. Illustration of the background probability density function estimation.

2.3.1.4. Background pixel gathering. This step consists in gathering the

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

2.3.2.1. Dominant background color identication. Considering the result

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

hbackground is, therefore, a barycentric linear combination of the general

Since the new color distributions provide more accurate modeling of

2.3.2.3. Standard graph-cut segmentation. The standard graph-cut is then

We have implemented the segmentation algorithm using MATLAB

Fig. 8. Segmentation results of images illustrated in Fig. 6a after tting ne segmentation.

The library is available online at http://vision.csd.uwo.ca/code/.

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

Average overlap score

3.2. Parameter estimation

Fig. 10. The optimization of the parameter .

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

between the segmentation and the ground truths. The performance

True foregroundSegmented foreground

3.2.1. Optimization of the parameter

3.2.2. Optimization of the parameter

Average coarse segmentation overlap score

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

Average overlap score

Parameter Background threshold

Fig. 13. The parameter bgth selection.

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020

Average overlap score

Colts foot Dandelion Buttercup Windflower Pansy

equal to 0.9. Thus, we x the parameter to 0.9, which corresponds to

Average overlap score

E. Zagrouba et al. / Image and Vision Computing 32 (2014) 10071020