
SEGMENTATION OF FACES IN VIDEO FOOTAGE USING HSV COLOR
FOR FACE DETECTION AND IMAGE RETRIEVAL


Osamu Ikeda
Faculty of Engineering, Takushoku University
815-1 Tate, Hachioji, Tokyo, 193-0985 Japan

ABSTRACT

Previous studies on face detection in video footage show that segmenting faces accurately and reliably is often difficult, leading instead to laborious and tedious interactive manipulation. This paper presents a segmentation method using controlled weights on the three HSV components and constructs a face detection and image retrieval system. First, it is shown that HSV color has advantages over RGB or YCbCr when segmenting a face and generating a binary pattern that retains as many features of the face as possible from the original color picture. Then, a face detection and image retrieval system is constructed using HSV color: each time a significant scene change is detected, segmentation is carried out for the beginning frame using a few sets of weights on the HSV components, and the resulting patterns are checked against some face requirements and correlated with a typical face pattern. Computer experiments show that the successful detection rate is more than 95 percent and that images can be retrieved from an input face image in a short time.

1. INTRODUCTION

Research on face detection in images and related areas has been carried out extensively in recent years, especially in the fields of image processing and computer vision [1]. Rowley et al. [2] and Feraud et al. [3] used neural network-based methods, Schneiderman et al. developed a Naive Bayes classifier [4], Osuna et al. proposed an algorithm to train Support Vector Machines [5], and Turk et al. proposed to use eigenfaces [6]. Those algorithms are aimed at detecting or recognizing the face in an image. In computer vision the detection is required to run in real time [7], possibly at the sacrifice of reliability for each frame, though not for a sequence of frame images.

In the field of multimedia, on the other hand, the focus has been not just on detection or recognition but also on identification of faces, people, or specific objects in video images or video footage. Satoh et al., for example, tried to retrieve the name from the face or the face from the name using the video, video captions and the transcripts [8]. Since the segmentation accuracy affects the identification and the images may be available for only a limited duration of time, several improvements have been reported. They combine temporal segmentation or tracking with spatial segmentation [9] or adopt manual segmentation [10]. Long et al., for example, presented a method that uses three consecutive frames to take motion into account, together with user interaction when automatic detection fails [11]. The same may be the case when retrieving visual information from video footage, where accuracy may also be crucial.

In this paper, we present a segmentation method using HSV color and construct a face detection and image retrieval system. First, it is shown that HSV color has advantages over RGB or YCbCr in segmentation capability and ease of manipulation. Then, a face detection and image retrieval system is constructed based on the method: each time a significant scene change is detected in the video footage, segmentation is carried out for the beginning frame by applying a few sets of weights to the HSV components, and the resulting patterns are checked against some face requirements and correlated with an input face pattern. Computer experiments show that the successful detection rate is more than 95 percent and that images in the video footage can be retrieved from the input face image in a short time.

2. COLOR SEGMENTATION

We compare RGB, HSV and YCbCr colors experimentally to choose the most reliable and convenient color for segmentation. Segmentation here means extracting the image region that is connected to a sampled point and whose color stays within a given error of the sampled color, as expressed by Eq. (1), where the suffix s denotes the sampled point, each color component value is normalized so as to vary between 0 and unity, and the w's are weights on the components.

Face images in video footage may have been illuminated with a variety of light sources and with a variety of color adjustments, and may include objects in the background whose colors are similar to that of the face. In addition, faces vary in skin color, texture, frontal direction and size. So a single set of weights on the color components may not be enough to successfully segment faces for such a variety of images.
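
As a concrete illustration, the following sketch (not taken from the paper) grows a region from a sampled point using a weighted HSV difference. Since Eq. (1) is not reproduced in this copy, the sum-of-weighted-absolute-differences criterion, the hue wrap-around handling and the 8-neighborhood connectivity are assumptions.

    from collections import deque

    import numpy as np

    def segment_from_point(hsv, seed, weights=(1.0, 1.0, 1.0), error=0.11):
        """Grow a segmented region from a sampled point on an HSV image.

        hsv     : H x W x 3 array with every component normalized to [0, 1].
        seed    : (row, col) of the sampled point (suffix s in the text).
        weights : (w_h, w_s, w_v) applied to the per-component differences.
        error   : segmentation error threshold (the text uses 0.07 ... 0.17).
        """
        height, width, _ = hsv.shape
        ref = hsv[seed].astype(float)            # color at the sampled point
        wvec = np.asarray(weights, dtype=float)
        mask = np.zeros((height, width), dtype=bool)
        mask[seed] = True
        queue = deque([seed])
        neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                     (0, 1), (1, -1), (1, 0), (1, 1)]   # assumed 8-connectivity
        while queue:
            r, c = queue.popleft()
            for dr, dc in neighbors:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width and not mask[rr, cc]:
                    diff = np.abs(hsv[rr, cc] - ref)
                    diff[0] = min(diff[0], 1.0 - diff[0])   # hue wrap (assumed)
                    if float(np.dot(wvec, diff)) < error:   # assumed form of Eq. (1)
                        mask[rr, cc] = True
                        queue.append((rr, cc))
        return mask

With a set such as (1, 1, 5) the V component dominates the criterion, which is one way of reading the "controlled weights" used later in the experiments.
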



The desired segmented image may depend on the application, for example on whether such features as wrinkles are to be reflected in the segmented image. In this paper, we require the segmented image to include as many features as possible, not only to evaluate the segmentation capability but also to construct an image retrieval system.

With this in mind, image regions which satisfy Eq. (1) but are not continuous with the sampled point are also included in the segmented image, provided that even the slightest parts of those regions are longitudinally sandwiched within a certain distance by the segmented part, as shown in Fig. 1. This extension may be useful in making the segmented image more informative, possibly enough to recognize the person.

Fig. 1 The segmented region RA is continuous with the sampled point S, while the region RB, in which Eq. (1) is satisfied, is not. If there exists a pixel in RB whose distances D1 and D2 to RA are within a certain value, RB is included in the segmented image.
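
The extension of Fig. 1 might be realized along the following lines. This is only a sketch; the column-wise search and the single distance threshold standing in for D1 and D2 are assumptions rather than the paper's exact rule.

    import numpy as np

    def merge_sandwiched_regions(main_mask, candidate_masks, max_dist=10):
        """Add disconnected candidate regions to the main region when some pixel
        of the candidate is vertically sandwiched by the main region.

        main_mask       : boolean H x W mask grown from the sampled point (RA).
        candidate_masks : masks of regions satisfying Eq. (1) but not connected
                          to the sampled point (RB in Fig. 1).
        max_dist        : largest allowed distance (stand-in for D1, D2).
        """
        merged = main_mask.copy()
        height, _ = main_mask.shape
        for cand in candidate_masks:
            keep = False
            for r, c in zip(*np.nonzero(cand)):
                above = main_mask[max(0, r - max_dist):r, c]
                below = main_mask[r + 1:min(height, r + 1 + max_dist), c]
                if above.any() and below.any():   # RA lies both above and below
                    keep = True
                    break
            if keep:
                merged |= cand
        return merged
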
3. FACE DETECTION/RETRIEVAL SYSTEM

The face detection or image retrieval is carried out based on the segmentation method as follows (a schematic code sketch is given after the list):
(a) A significant scene change is detected in the video footage.
(b) A sampling point is scanned over the beginning frame of the new scene.
(c) If the color at a sampled point is within a color window, segmentation is carried out for a number of errors and for a few sets of the weights on the HSV components.
(d) The segmented image is made binary, and the binary pattern is checked against some requirements for being a face.
(e) The pattern is then correlated with an input face pattern.
(f) The segmented image with the largest correlation is output as the face for the frame; it is also possible to detect multiple faces.
(g) Segmented face images are displayed according to their correlation values.
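
A schematic rendering of steps (a)-(g) follows. All helper functions are hypothetical stand-ins for the operations described in the text, not an API defined by the paper.

    def detect_faces_in_footage(frames, weight_sets, errors, color_window,
                                input_pattern, detect_scene_change, sample_points,
                                in_color_window, segment, make_binary,
                                looks_like_face, correlate):
        """Schematic version of steps (a)-(g); every helper is passed in."""
        results = []
        for n, frame in enumerate(frames):
            if n > 0 and not detect_scene_change(frames[n - 1], frame):   # (a)
                continue
            best = None
            for point in sample_points(frame):                            # (b)
                if not in_color_window(frame, point, color_window):       # (c)
                    continue
                for weights in weight_sets:
                    for error in errors:
                        region = segment(frame, point, weights, error)
                        pattern = make_binary(region)                      # (d)
                        if not looks_like_face(pattern):
                            continue
                        score = correlate(pattern, input_pattern)          # (e)
                        if best is None or score > best[0]:
                            best = (score, pattern)                        # (f)
            if best is not None:
                results.append((n, best[0], best[1]))
        # (g) rank / display the segmented faces by correlation value
        return sorted(results, key=lambda item: item[1], reverse=True)
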
The scene change is detected by evaluating the difference D_Δn(n) between neighboring frames (Eq. (2)), where n is the frame index, i is the image intensity, and Δn ≥ 1 is the transition number of frames from one scene to another. The scene is judged to have changed if D_Δn > D_th holds.
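
Since Eq. (2) is not reproduced in this copy, the following sketch uses a normalized mean absolute intensity difference as a plausible stand-in for D_Δn(n); only the thresholding against D_th is taken directly from the text.

    import numpy as np

    def scene_changed(frame_a, frame_b, threshold=0.2):
        """Decide whether a scene change occurred between two frames.

        frame_a, frame_b : 2-D arrays of image intensity i(x, y).
        threshold        : D_th in the text (value not quoted; 0.2 is a placeholder).
        The mean absolute difference below is an assumed stand-in for Eq. (2).
        """
        a = np.asarray(frame_a, dtype=float)
        b = np.asarray(frame_b, dtype=float)
        d = np.mean(np.abs(b - a)) / max(np.mean(a), 1e-6)   # normalized difference
        return d > threshold
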
The spacing of the sampled points on the frame is determined by the size of the faces that we want to detect. Hoping to detect faces larger than 8x12 pixels on a reduced frame with (Xr, Yr) = (76, 58), we use 132 sampling points, 12 in the x direction and 11 in y. At each point, the color is first checked to see whether it lies in a relatively large (H, S, V) subspace of the face image class in the case of face detection, or in a small subspace of a specific input image in the case of image retrieval. Then segmentation is carried out using a few sets of the weights and six segmentation errors of 7, 9, 11, 13, 15, and 17 percent. The weights are determined in advance based on the results of manual segmentation, as will be shown.

The segmented image is required to satisfy Y < 2.5X and X < 1.25Y in size. It is then made binary, and we impose the following conditions on the pattern for it to qualify as a face:

N_e / XY > c_e  for the eyes
N_m / XY > c_m  for the mouth and nose    (3)

where N_e and N_m are the numbers of non-segmented null pixels in the regions designated as eye and mouth, respectively, in the segmented image, as shown in Fig. 2, and c_e and c_m are constants.

Fig. 2 Regions of eyes and mouth for a segmented image of size X by Y.
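
A sketch of the size check and the Eq. (3) conditions follows; the eye and mouth bands are expressed as assumed fractions of the bounding box (the paper defines them only through Fig. 2), and the default constants c_e and c_m are placeholders.

    import numpy as np

    def satisfies_face_conditions(binary, c_e=0.02, c_m=0.02):
        """Check the size requirement and the Eq. (3) conditions on a binary pattern.

        binary   : Y x X array with 1 for segmented pixels and 0 for null pixels.
        c_e, c_m : placeholder constants; the paper does not quote their values.
        """
        Y, X = binary.shape
        if not (Y < 2.5 * X and X < 1.25 * Y):               # required size ratio
            return False
        null = (binary == 0)
        eye_band = null[int(0.2 * Y):int(0.45 * Y), :]       # assumed eye region
        mouth_band = null[int(0.55 * Y):int(0.85 * Y), :]    # assumed nose/mouth region
        n_e = int(eye_band.sum())                            # N_e in Eq. (3)
        n_m = int(mouth_band.sum())                          # N_m in Eq. (3)
        return n_e / (X * Y) > c_e and n_m / (X * Y) > c_m
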
Then the binary pattern of the segmented image, P_s(x, y), x_s ≤ x ≤ x_s + X - 1, y_s ≤ y ≤ y_s + Y - 1, is correlated with an input face pattern, P_k(x, y), 0 ≤ x ≤ X_k - 1, 0 ≤ y ≤ Y_k - 1, as in Eq. (4), with scale factors α = (X_k - 1)/(X - 1) and β = (Y_k - 1)/(Y - 1), where P_s and P_k take the values 1 or -1 so that -1 ≤ C_s ≤ 1. The segmented pattern is fitted in size to the input pattern. In the case of Y > 2X, where a neck part may often be included in the segmented image, the fitting ratio in y is made the same as in x, and only the top part of the segmented pattern is used for the correlation.
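
The correlation step might look as follows. The exact form of Eq. (4) is not reproduced in this copy, so the nearest-neighbour resampling driven by the scale factors α and β and the averaged product of ±1 patterns are assumptions consistent with the stated range -1 ≤ C_s ≤ 1.

    import numpy as np

    def correlate_patterns(p_seg, p_key):
        """Correlate a segmented binary pattern with the input face pattern.

        p_seg : Y x X array with values +1 / -1 (segmented pattern).
        p_key : Yk x Xk array with values +1 / -1 (input face pattern).
        step_x and step_y below are the inverses of alpha and beta in the text.
        """
        Y, X = p_seg.shape
        Yk, Xk = p_key.shape
        step_x = (X - 1) / max(Xk - 1, 1)
        step_y = (Y - 1) / max(Yk - 1, 1)
        if Y > 2 * X:            # a neck part is probably included: use the same
            step_y = step_x      # ratio in y as in x, i.e. only the top part
        xs = np.clip(np.round(np.arange(Xk) * step_x).astype(int), 0, X - 1)
        ys = np.clip(np.round(np.arange(Yk) * step_y).astype(int), 0, Y - 1)
        fitted = p_seg[np.ix_(ys, xs)]                 # nearest-neighbour fitting
        return float(np.mean(fitted * p_key))          # in [-1, 1] for +/-1 inputs
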
In the case of image retrieval, average R, G, B values are also obtained for each segmented image; these are correlated with those of the input key face image as in Eq. (5), where σ is set to 0.1 to 0.2.

The segmented image with the largest value of C_s is displayed for each new scene in the case of face detection, while the images with the largest values of C_s·C_c are displayed for the whole video footage in the case of image retrieval.
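
Eq. (5) is likewise not reproduced here; the sketch below uses a Gaussian of the mean-color distance with σ in the quoted 0.1-0.2 range as one plausible reading. Ranking candidates by C_s·C_c then reproduces the display rule above.

    import numpy as np

    def color_similarity(mean_rgb_seg, mean_rgb_key, sigma=0.15):
        """Similarity C_c between the average R, G, B of a segmented image and
        those of the input key face image (components normalized to [0, 1]).

        The Gaussian form is an assumption; only the sigma range 0.1-0.2 is
        taken from the text.
        """
        d = np.asarray(mean_rgb_seg, float) - np.asarray(mean_rgb_key, float)
        return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))
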

4. EXPERIMENTS

A video footage consisting of 100 frames, shown in Fig. 3, was mostly used in the experiments; it may equivalently be regarded as consisting of 100 different scenes. First, manual segmentation was carried out using HSV, RGB and YCbCr colors so as to extract as many features of the faces as possible. Some of the results are given in Fig. 4, which shows that HSV color segments the faces most successfully, while RGB or YCbCr color fails to do so depending on the image. It also shows that HSV color extracts more features than the others. Control of the weights also seems easier for HSV color than for the other two, since the HSV components have clear meanings. Figure 5 shows four different cases of the optimal weights on the HSV components, with a different ordering of w_h, w_s and w_v in each case. That is, the optimal weights vary depending not only on the face image but also on the surroundings.

We used three sets of weights, W1 = (1, 1, 1), W2 = (1, 1, 5) and W3 = (2, 2, 1), to segment the 100 images in the face detection and image retrieval system. Manually segmented faces of the 100 images had mean values ranging from -20 to 61 on the scale of 360 for H, from 0.11 to 0.79 for S, and from 0.28 to 0.93 for V. So we set the color window to the subspace [-20, 61] for H, [0.11, 0.79] for S, and [0.28, 0.93] for V in the case of face detection, and to [h-8, h+8] for H, [s-0.1, s+0.1] for S, and [v-0.1, v+0.1] for V in the case of image retrieval, where (h, s, v) are the mean values of the input face image.
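
The two kinds of color windows could be set up as follows. The numerical ranges are those quoted above, while the treatment of the hue wrap-around (mapping hues above 340 degrees to negative values so that the [-20, 61] window is contiguous) is an assumption.

    def detection_window():
        """Fixed color window used for face detection (values from the text)."""
        return {"H": (-20.0, 61.0), "S": (0.11, 0.79), "V": (0.28, 0.93)}

    def retrieval_window(h, s, v):
        """Color window centred on the mean (h, s, v) of the input face image."""
        return {"H": (h - 8.0, h + 8.0), "S": (s - 0.1, s + 0.1), "V": (v - 0.1, v + 0.1)}

    def in_window(h, s, v, window):
        """Check a sampled color; h in degrees on a 360 scale, s and v in [0, 1]."""
        if h > 340.0:            # assumed handling of the hue wrap-around
            h -= 360.0
        lo, hi = window["H"]
        if not (lo <= h <= hi):
            return False
        return (window["S"][0] <= s <= window["S"][1]
                and window["V"][0] <= v <= window["V"][1])

For the first input pattern of the experiments, retrieval_window(12, 0.19, 0.69) reproduces the window described above.
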
The results are summarized in Table 1 for four input face patterns. The first and second patterns are generated from image No. 3 in Fig. 3 with different segmentation errors and have similar mean color values (12, 0.19, 0.69); the third pattern, generated from image No. 12, has the means (39, 0.79, 0.50); and the fourth, generated from image No. 45, has the means (16, 0.35, 0.77). It is seen from the results that more than 95 faces out of 100 are successfully segmented, that the rate tends to increase with an increasing number of weight sets, and that using multiple weight sets is more crucial for image retrieval than for face detection.

The 100 segmented images for the second pattern and weight set (2, 1, 1) in Table 1 are shown in Fig. 6. It may be seen that some segmented faces are cut off because the size of the segmentation region around the sampled point is restricted, and that part of the background is included in some of the segmented images. The latter may be insignificant for face detection but may significantly affect image retrieval. Figure 7 shows an example of detecting multiple faces, where the two segmented regions are discriminated. The face detection for the 100 face images took about 37 sec when using a single set of the weights on a 1.2 GHz Pentium III PC, and the image retrieval took one to eight seconds depending on the number of weight sets used and the input image.

5. CONCLUSIONS

The segmentation method using HSV color was shown to be more accurate and easier to control for face images than the ones using RGB or YCbCr color. Based on this method, we constructed a face detection/image retrieval system. Using a few sets of weights on the three HSV components, it was able to detect more than 95 faces out of 100 successfully, and it could retrieve images from an input face image in a short time.

Fig. 3 A video footage consisting of 100 frames.

REFERENCES

[1] M.-H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting Faces in Images: A Survey," IEEE Trans. PAMI, vol. 24, pp. 34-58, 2002.
[2] H. A. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection," IEEE Trans. PAMI, vol. 20, pp. 23-38, 1998.
[3] R. Feraud, O. J. Bernier, J.-E. Viallet, and M. Collobert, "A Fast and Accurate Face Detector Based on Neural Networks," IEEE Trans. PAMI, vol. 23, pp. 42-53, 2001.
[4] H. Schneiderman and T. Kanade, "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition," Proc. CVPR, pp. 45-51, 1998.
[5] E. Osuna, R. Freund, and F. Girosi, "Training Support Vector Machines: An Application to Face Detection," Proc. CVPR, pp. 130-136, 1997.
[6] M. A. Turk and A. P. Pentland, "Eigenfaces for Recognition," J. Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
[7] B. Fröba, A. Ernst, and C. Küblbeck, "Real-Time Face Detection," Proc. 4th IASTED International Conf. Signal and Image Processing, pp. 497-502, 2002.
[8] S. Satoh, Y. Nakamura, and T. Kanade, "Name-It: Naming and Detecting Faces in News Videos," IEEE MultiMedia, pp. 22-35, 1999.
[9] D. Wang, "Unsupervised Video Segmentation Based on Watersheds and Temporal Tracking," IEEE Trans. Circuits and Systems for Video Technology, vol. 8, pp. 539-545, 1998.
[10] C. Toklu et al., "Simultaneous Alpha Map Generation and 2D Mesh Tracking for Multimedia Applications," Proc. ICIP, vol. 1, pp. 113-116, 1997.
[11] F. Long, D. Feng, H. Peng, and W. Siu, "Extracting Semantic Video Objects," IEEE Computer Graphics and Applications, pp. 48-55, 2001.

Table 1. Detection rates for the 100 face images in Fig. 3 for four input patterns, and identification results for the same data, where the color subspaces differ between the two experiments.

Fig. 4 Comparison of segmented images obtained using HSV, RGB and YCbCr colors for four face images. The weights on the three components are given in parentheses and the segmentation error is given in percent.

Fig. 5 Four cases of optimal weights on HSV: from top-left to bottom-right, (w_h, w_s, w_v) = (1,10,1), (10,1,10), (1,1,10), (1,1,1), (10,10,1), (10,1,1), (1,10,10) for (a), (b) and (d), and (1,3,1), (10,3,10), (1,1,3), (1,1,1), (10,10,3), (10,3,3), (1,3,3) for (c).

Fig. 6 Segmented images for the 100 face images, where the pattern for HSV in Fig. 4(a) was used as the input face pattern.

Fig. 7 Example of multi-face detection on the 42nd frame in Fig. 3.

