Article info
Article history:
Received 22 March 2013
Received in revised form 9 August 2014
Accepted 11 August 2014
Available online 30 September 2014
Keywords:
Automatic flower image segmentation
Graph-cut
Spatial prior
Abstract
In this paper, we present an accelerated system for segmenting flower images based on the graph-cut technique, which formulates the segmentation problem as an energy function minimization. The contribution of this paper is an improvement of the classical energy function, which is composed of a data-consistent term and a boundary term. To this end, we integrate an additional data-consistent term based on a spatial prior, and we add gradient information to the boundary term. We then propose an automated coarse-to-fine segmentation method composed of two levels: coarse segmentation and fine segmentation. First, the coarse segmentation level is based on minimizing the proposed energy function. Then, the fine segmentation is done by optimizing the energy function through the standard graph-cut technique. Experiments were performed on a subset of the Oxford flower database and the obtained results are compared to our reimplementation of the method of Nilsback et al. [1]. The evaluation shows that our method consumes less CPU time and achieves satisfactory accuracy compared with the aforementioned method [1].
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Automatic flower classification systems are important for a wide range of applications, including pharmacy research, environment protection and perfume production. Thanks to computer vision, image processing and pattern recognition techniques, automatic recognition systems make the identification of the flower category easier by analyzing color images. Image segmentation is generally considered an important component of the recognition or classification process, and it affects the quality of the image analysis. Automatic flower segmentation allows the extraction of the object of interest (foreground) from the rest of the image (background) without any manual interaction.
The majority of flower images present natural scenes with complex backgrounds. The areas surrounding the flowers generally contain a large variety of colors and textures, and may include several entities, distributed separately or together, such as stones, leaves, turf grass, green foliage and soil. Fig. 1 illustrates different types of elements that can appear in the area surrounding the flower. As flowers from different species may look very similar both in shape and color, using background information to generate the image features can increase this similarity and consequently reduce the classification accuracy. Therefore, we believe that extracting features only from the object of interest provides more meaningful and accurate information
than information obtained from the whole image. Although many flower image segmentation methods have been proposed in the literature, it remains difficult to find a general solution that is applicable to all types of flowers and gives accurate results. In the next paragraph, we present the state of the art on flower image segmentation.
Das et al. [2] proposed an iterative segmentation algorithm using color and spatial domain knowledge-driven feedback. Their method maps the RGB color space to commonly used color names in order to delete pixels that belong to background color classes such as black, brown, green or gray. The foreground region, represented by the remaining colors, is accepted if it is included in the flower area. In order to define the flower region, some hypotheses were made, such as that the flower centroid should fall within the central region of the image. Saitoh et al. [3] presented the Normalized Cost (NC) method to extract flower regions. It is based on a dynamic programming method known as intelligent scissors [4] for extracting the boundary of the object of interest. The image is represented as a directed weighted graph where nodes are pixels and arcs between neighboring pixels represent the 8-connectivity information. This method computes the local minimum cost given by a path between two seeds, and the obtained cost is normalized by the length of this path. The shortest path in the graph gives the object edges. In this work, the authors assume that the flower is at the center of the image and the background occupies the peripheral area. Based on this hypothesis, the authors determine some local minimum points of each local cost profile along the straight line from the starting point to all the middle points of the four sides. Then, they extract the boundary for each local minimum point based on the NC, and they select the one that has the smallest normalized cost and contains the center point. Another interesting automatic algorithm can be found in [1]. The first step of this algorithm aims
Fig. 1. Different types of elements in the area surrounding the flower: (a) leaves, (b) stones, (c) turf grass, and (d) soil.
Table 1. Differences between segmentation methods, comparing the flower dataset and the accuracy measure used by each method (recall-precision, boundary extraction rate, error rate, overlap score, or overlap score & percent correct score).
The segmentation is obtained by minimizing the energy function of Eq. (1):

E(f) = \sum_{p \in V} R_p(f_p) + \lambda \sum_{(p,q) \in C} B_{p,q}(f_p, f_q)    (1)

where C is the set of pairs of adjacent pixels representing the 4- (or 8-) neighborhood system, and f = \{f_p\}_{p \in V} is the labeling function which associates each pixel p with a label f_p (f_p = 0 if it belongs to the foreground and f_p = 1 otherwise). The first term R_p is called the region or data-dependent term; it evaluates the penalty for assigning a label to a pixel p and represents the weight of t-link edges. The second term B_{p,q} is the boundary or smoothness term; it measures the cost for two neighboring pixels p and q being labeled differently and represents the weight of n-link edges. The constant \lambda in Eq. (1) controls the relative importance of the boundary term versus the region term. With I_p the intensity of pixel p, the terms R_p(Obj) and R_p(Bkg) are equal to the negative log-likelihood of the foreground and background intensity models, respectively (Eqs. (2) and (3)):

R_p(Obj) = R_p(f_p = 0) = -\ln \Pr(I_p \mid Obj)    (2)

R_p(Bkg) = R_p(f_p = 1) = -\ln \Pr(I_p \mid Bkg)    (3)

B_{p,q} = \exp\!\left(-\frac{(I_p - I_q)^2}{2\sigma^2}\right) \cdot \frac{1}{dist(p, q)}    (4)

The proposed energy function keeps this structure, but replaces the region term with an extended data term D_p and the boundary term with a modified term \tilde{B}_{p,q} (Eq. (5)):

E(f) = \sum_{p \in P} D_p(f_p) + \sum_{(p,q) \in C} \tilde{B}_{p,q}(f_p, f_q)    (5)
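To make the graph construction of Eqs. (1)-(4) concrete, here is a minimal sketch (not the authors' implementation) using the PyMaxflow library; the 4-neighborhood, the Gaussian sigma, and the per-pixel likelihood maps passed in as pr_obj and pr_bkg are assumptions for illustration, with lambda defaulting to the value 4 selected later in the paper.

import numpy as np
import maxflow  # PyMaxflow; an assumption, any s-t min-cut library would do

def graph_cut_segment(img, pr_obj, pr_bkg, lam=4.0, sigma=10.0):
    """Minimize Eq. (1): E(f) = sum_p R_p(f_p) + lam * sum_(p,q) B_pq(f_p, f_q).
    img: 2D array of intensities I_p; pr_obj, pr_bkg: per-pixel likelihoods
    Pr(I_p|Obj) and Pr(I_p|Bkg). Returns True where a pixel is foreground."""
    eps = 1e-10
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(img.shape)
    # n-links (Eq. (4)): B_pq = exp(-(I_p - I_q)^2 / (2 sigma^2)) / dist(p, q),
    # restricted here to the 4-neighborhood, where dist(p, q) = 1.
    for axis in (0, 1):
        diff = np.diff(img.astype(float), axis=axis)
        w = lam * np.exp(-diff ** 2 / (2.0 * sigma ** 2))
        pad = ((0, 1), (0, 0)) if axis == 0 else ((0, 0), (0, 1))
        structure = np.zeros((3, 3))
        structure[2, 1] = 1.0          # neighbor below ...
        if axis == 1:
            structure = structure.T    # ... or to the right
        g.add_grid_edges(nodes, weights=np.pad(w, pad), structure=structure,
                         symmetric=True)
    # t-links (Eqs. (2)-(3)): source side = object, sink side = background.
    g.add_grid_tedges(nodes, -np.log(pr_bkg + eps), -np.log(pr_obj + eps))
    g.maxflow()
    return ~g.get_grid_segments(nodes)  # True = foreground (source side of cut)

The t-links carry the region costs of Eqs. (2)-(3) and the n-links the boundary costs of Eq. (4), so the minimum s-t cut realizes the minimum of Eq. (1).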
Table 2
Experimental results of different images.

Image               Size (pixels)
Fig. 14 Daffodil    500 x 665
Fig. 14 Crocus      560 x 731
Fig. 14 Dandelion   500 x 666
Fig. 14 Tiger Lily  502 x 500
Fig. 2. A 2D graph for a 3 x 3 image and its cut.
where

D_p(f_p) = \beta R_p(f_p) + (1 - \beta) S_p(f_p).

The spatial distribution of the foreground is determined by the approximate spatial location R_s of the flower zone in the image. Inside the region R_s, the probability of considering a pixel as an object is maximum, whereas for every pixel outside the region R_s, this probability decreases according to a Gaussian of the distance dist(p, R_s) separating the pixel p from the boundary of the region R_s. The formulation of these probabilities is given by Eq. (9):

\Pr_s(p \mid Obj) = \begin{cases} 1 & \text{if } p \in R_s \\ \exp\!\left(-\frac{dist(p, R_s)^2}{2\sigma_{R_s}^2}\right) & \text{if } p \notin R_s \end{cases}    (9)

The modified boundary term is defined by Eq. (10):

\tilde{B}_{p,q} = \begin{cases} (BI_{p,q} + BG_{p,q}) \cdot \frac{1}{dist(p, q)} & \text{if } f_p \neq f_q \\ 0 & \text{otherwise} \end{cases}    (10)

where BI_{p,q} and BG_{p,q} are two neighborhood interaction functions that penalize the intensity difference and the gradient norm difference between two neighboring pixels p and q, respectively. Optimizing a parameter weighting BI_{p,q} against BG_{p,q} did not yield an efficient estimation, so we chose to minimize the proposed energy function with no parameter between BI_{p,q} and BG_{p,q}. The term BI_{p,q} is obtained from the old boundary term (Eq. (4)) and is expressed by Eq. (11):

BI_{p,q} = \exp\!\left(-\frac{(I_p - I_q)^2}{2\sigma_I^2}\right)    (11)

The additional term BG_{p,q}, used to improve boundary regularity, is defined by Eq. (12):

BG_{p,q} = \exp\!\left(-\frac{(\nabla I_p - \nabla I_q)^2}{2\sigma_{\nabla I}^2}\right)    (12)

where \nabla I_p denotes the gradient norm at pixel p.
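For illustration, the spatial prior of Eq. (9) can be computed densely with a Euclidean distance transform, and the interaction functions of Eqs. (11)-(12) from finite differences; this is a sketch in which the sigma values and the Sobel gradient estimator are our own assumptions.

import numpy as np
from scipy.ndimage import distance_transform_edt, sobel

def spatial_prior(rs_mask, sigma_rs=50.0):
    """Pr_s(p|Obj) of Eq. (9): 1 inside the flower zone R_s, Gaussian falloff
    with dist(p, R_s) outside. rs_mask is a boolean image of R_s."""
    d = distance_transform_edt(~rs_mask)  # distance to the nearest R_s pixel
    return np.where(rs_mask, 1.0, np.exp(-d ** 2 / (2.0 * sigma_rs ** 2)))

def interaction_weights(img, sigma_i=10.0, sigma_g=10.0, axis=0):
    """BI_pq + BG_pq of Eqs. (11)-(12) for 4-neighbor pairs along one axis;
    with dist(p, q) = 1 this is the nonzero branch of Eq. (10)."""
    img = img.astype(float)
    grad = np.hypot(sobel(img, axis=0), sobel(img, axis=1))  # gradient norm
    di = np.diff(img, axis=axis)   # I_p - I_q over adjacent pixel pairs
    dg = np.diff(grad, axis=axis)  # gradient-norm difference for the same pairs
    bi = np.exp(-di ** 2 / (2.0 * sigma_i ** 2))   # Eq. (11)
    bg = np.exp(-dg ** 2 / (2.0 * sigma_g ** 2))   # Eq. (12)
    return bi + bg

A spatial penalty S_p can then be derived from the prior (e.g. S_p(f_p = 0) = -ln Pr_s(p | Obj)) and mixed into the data term as D_p(f_p) = \beta R_p(f_p) + (1 - \beta) S_p(f_p).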
Fig. 3. Our automatic segmentation framework for flower images: a learning process builds the general foreground and background models G_fg and G_bg from the training set; the coarse segmentation level performs border background-seed extraction, kernel-based background PDF estimation, color quantization and specific background distribution estimation; the fine segmentation level then produces the segmentation output.
Fig. 5. Image quantization. Top row: examples of flower images belonging to (left to right) the Daffodil, Iris, Pansy and Fritillary classes. Bottom row: results of 12-level color quantization.
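The quantization algorithm itself is not specified in the surviving text; a common stand-in that reproduces the flavor of the bottom row of Fig. 5 is k-means over RGB pixels with 12 clusters, sketched below (the use of scikit-learn is an assumption).

import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(img, levels=12, seed=0):
    """Map an H x W x 3 RGB image onto `levels` representative colors.
    (k-means is an assumption; the paper's quantizer may differ.)"""
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=levels, n_init=4, random_state=seed).fit(pixels)
    labels = km.labels_.reshape(h, w)              # color index per pixel
    palette = km.cluster_centers_.astype(img.dtype)
    return labels, palette                         # palette[labels] rebuilds the image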
segmentation level by updating the general color models. The new foreground color model h_object is a barycentric linear combination, with weight \alpha, of the color distribution h_OC of the object_cleaned region and the general foreground distribution G_fg, as given in Eq. (13):

h_object = \alpha h_OC + (1 - \alpha) G_fg    (13)
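Below is a minimal sketch of the update of Eq. (13), assuming the color models are histograms over quantized color indices; the weight alpha stands in for the mixing symbol, which was lost in extraction.

import numpy as np

def update_foreground_model(quant_labels, object_mask, g_fg, alpha=0.5, levels=12):
    """Eq. (13): h_object = alpha * h_OC + (1 - alpha) * G_fg, where h_OC is the
    color histogram of the cleaned object region found at the coarse level."""
    h_oc = np.bincount(quant_labels[object_mask], minlength=levels).astype(float)
    h_oc /= max(h_oc.sum(), 1.0)   # normalize to a distribution
    return alpha * h_oc + (1.0 - alpha) * np.asarray(g_fg, dtype=float)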
In some cases, the flower is not accurately cut at the coarse segmentation level. In fact, the foreground pixels marked as background would prevent the fine segmentation from extracting the flower object accurately if we included them in the background color model. To avoid this, we consider an uncertainty zone around the object_cleaned region, obtained by applying a morphological dilation with a 5 x 5 disk-shaped structuring element. We call the new object region object_cleaned_dilated. The background model
Fig. 6. Coarse segmentation. (a) original images, (b) gathering results, (c) Euclidean distance map results, and (d) coarse segmentation results.
Fig. 7. Influence of gradient information incorporation into the boundary term. (a) original image, (b) coarse segmentation without gradient information, and (c) coarse segmentation with the modified boundary term.
Fig. 9. Segmentation results under different values of λ: (a) λ = 1, (b) λ = 4, (c) λ = 100. The overlap scores given below the images are 0.8812, 0.8929 and 0.8889 for the first example, and 0.6134, 0.6556 and 0.6841 for the second.
it was used in [1]. This dataset contains 17 flower species with 80 images per category, exhibiting large variations in viewpoint, scale and illumination. A ground truth segmentation is provided with this dataset. As in [1], we remove four classes, i.e. snowdrops, lilies of the valley, cowslips and bluebells, because they either have insufficient images or lack a segmentation ground truth. This leaves 753 images in the dataset, representing 13 classes, which are split into a training set containing 260 images (20 images per class) and a test set containing 493 images. We have also tested our method on the alternative data split used in [11], with 15 training and 65 test images per class.
3.1. Evaluation protocol
The evaluation protocol proceeds as follows: first, we automatically estimate optimum values of our method's parameters using the training set and the ground truth segmentations. Next, we present flower segmentation results on the test set and evaluate our proposed method. Finally, our method is compared with existing methods. We first compare our method to the co-segmentation method of Chai et al. [11] on the alternative data split (Table 1). Then, we provide a comparative study, in terms of performance and computational complexity, of our segmentation results against those obtained by our reimplementation of Nilsback's method [1] on the original data split. In fact, since the authors of [1] did not provide the source code of their method, we reimplemented it, denoting the result Nilsback's reimplemented method, and in doing so we chose the techniques for the steps that were not specified in the paper [1]. For example, we used Gaussian Mixture Models (GMMs) [28] for learning the general foreground and background distributions. Furthermore, for corner detection, [1] mentions that two parameters are required, namely the worm length and the minimum distance from a potential boundary point to the straight line between the worm's head and tail. However, only an interval of values is provided for the first parameter (worm length), as a variable value to ensure optimal performance. Therefore, in our reimplementation of Nilsback's method we fixed the worm length parameter to 25 empirically, and we used the Phillips-Rosenfeld algorithm [33] to set the value of the second parameter (minimum distance), which is calculated from the fixed worm length. Indeed, we did not have the values of several other optimized parameters needed to execute the reimplemented algorithm.
Fig. 10. The λ parameter selection: average overlap score as a function of λ (0 to 100).
Our method depends on some parameters that must be set to perform segmentation. The parameters λ and β are estimated to minimize the energy function via graph-cut (Eq. (5)), and the background threshold parameter bg_th is used to determine the dominant background colors. The best values of the parameters are chosen so that they maximize a segmentation quality measure called the overlap score OS.
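Since the definition of OS is cut off in the source, we assume the standard overlap (intersection-over-union) score commonly used with this dataset; parameter selection then amounts to a sweep maximizing the mean score over the training set, as in this sketch (the candidate grid and the segment callback are placeholders).

import numpy as np

def overlap_score(seg, gt):
    """OS between a binary segmentation and its ground truth, taken here as
    intersection-over-union (an assumed definition)."""
    seg, gt = np.asarray(seg, bool), np.asarray(gt, bool)
    union = np.logical_or(seg, gt).sum()
    return np.logical_and(seg, gt).sum() / union if union else 1.0

def select_parameter(train_pairs, segment, candidates=(1, 2, 4, 10, 50, 100)):
    """Return the candidate value (e.g. for lambda) maximizing the mean OS.
    `segment(img, value)` is a placeholder for the full segmentation routine."""
    return max(candidates,
               key=lambda v: np.mean([overlap_score(segment(img, v), gt)
                                      for img, gt in train_pairs]))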
Fig. 11. Coarse segmentation results under different values of β: (a) β = 0.1, (b) β = 0.6, (c) β = 0.8, (d) β = 0.9. The overlap scores given below the images are 0.6166, 0.6568, 0.7988 and 0.8809 for the first example, and 0.7643, 0.7676, 0.8767 and 0.9122 for the second.
The average overlap score reaches 67.2% for λ = 100. Therefore, we fixed the parameter λ to 4, the value corresponding to the highest average overlap score over all classes.
Fig. 12. The β parameter selection: average overlap score as a function of β (0.1 to 0.9).
Fig. 14. Segmentation results of flower images (Pansy, Tiger Lily, Dandelion, Fritillary, Crocus and Daffodil examples). (a) original image, (b) coarse segmentation, (c) fine segmentation and (d) ground truth.
Fig. 15. Segmentation quality: average overlap score for each flower class.
As shown in Fig. 14, our method fails to separate the flower from the background when the flower colors are too similar to those of the background pixels.
Our proposed segmentation algorithm is executed on a machine with an AMD Athlon processor. Table 2 shows the execution times of the algorithm on different images (presented in Fig. 14). The running time of our algorithm is around 40-55 s, while the reimplemented Nilsback's algorithm takes far more than 1 min.
In order to objectively evaluate the accuracy of our segmentation method, we compute the average overlap score for each flower class, as indicated in Fig. 15.
The red bars represent the confidence intervals: upper bounds are the overlap scores obtained in the best case, and lower bounds those obtained in the worst case. The best and worst overlap scores are computed by choosing the 20 best and the 20 worst segmentation results from each class, respectively. In the best case, the average of our segmentation scores is 85.28%. As Fig. 15 indicates, the average overlap score (OS) can reach 92% and is never below 55%. These scores indicate that our method offers encouraging results for automatic flower segmentation. Our method gives the worst results when segmenting Crocus class images, whose flower colors are similar to those of the background.
In order to illustrate the contributions of our proposed flower segmentation system, we performed four tests at the coarse segmentation level: classical graph-cut; classical graph-cut with the modified boundary term; classical graph-cut with the additional spatial term; and classical graph-cut with both the modified boundary term and the additional spatial term.
Fig. 16. Influence of the energy function modification on the segmentation performance: average overlap score per flower class for the four configurations.
Fig. 17. Comparison of our segmentation results with the results of Nilsback's reimplemented method. The top row shows original images. The second and third rows show segmentation results of Nilsback's reimplemented method and our method, respectively.
The results of these experimental tests are shown in Fig. 16. We note that the quality of segmentation is improved thanks to the addition of spatial constraints and the modification of the boundary term in the formulation of the energy function minimized by graph-cut. For example, the obvious color differences within the flower leaves of the Fritillary and Pansy classes make segmentation using only the classical graph-cut fail (24% for Fritillary and 56% for Pansy). It can be seen (Fig. 16) that integrating the gradient information into the boundary term improves the overlap score for the Fritillary and Pansy classes to 67% and 73%, respectively. Furthermore, adding the spatial term improves the overlap score for every flower class. Thus, the segmentation performance is improved over most flower classes using our proposed algorithm.
Fig. 18. Comparison of our segmentation results with the results of Nilsback et al. [1]. The top row shows original images. The second and third rows show segmentation results of Nilsback's method and our method, respectively.
Fig. 19. Segmentation quality comparison of our method with Nilsback's reimplemented method: average overlap score per flower class.
(Dandelion) and lower for four classes (Crocus, Iris, Daisy and Windflower). So, our method achieves better results in most cases.
4. Conclusion
Fig. 20. Segmentation running time comparison of our method with Nilsback's reimplemented method (time in seconds, per flower class).