
Real Time Face Pose Estimation based on HAAR for

Human-Robot Interaction

Andrés Maldonado1, Ana Oña1, Marlon Ramírez1


1
Departamento de Ciencias de la Energía y Mecánica, Universidad de las Fuerzas Armadas-
ESPE, Sangolquí, Ecuador.
jamaldonado12@espe.edu.ec , agoaq@espe.edu.ec, msramrezc@espe.edu.ec

Abstract. In this article we present a real-time application in which a robot mimics the movements of a human head captured with an RGB camera; that is, it recovers the position of the head and its orientation relative to the camera. For this, a unified model for face detection, facial point detection, and pose estimation is used. We apply different threshold levels for motion recognition and different types of image capture for optimal control of the robot's movement. We propose a frame-based approach to head pose estimation built on the Viola and Jones (VJ) Haar-type face detector, based on feature extraction with Haar filters and cascaded classifiers. As a result of this research, a real-time algorithm for estimating the 3D pose with the help of Haar cascade classifiers is presented. To evaluate this algorithm, several tests are performed with variations of posture in order to reduce the error. In addition, smile recognition is implemented, since the intention is to create a social robot that interacts with children who have socialization problems.

Keywords: algorithms, Haar-like features, AR, pose estimation

1. Introduction

While human beings learn to evaluate the orientation of a head quickly and effortlessly, doing the same with computer vision systems has posed persistent problems for researchers and scientists [1].

For a head pose estimation system (HPES) to work correctly, it must be sufficiently robust to factors such as occlusion, noise, lighting and distortion. Beyond that, it must handle the different facial expressions of each person, as well as the large differences between faces in gender, ethnicity and age. Estimating the 3D pose of a head from static images is therefore a difficult problem, but one with many applications.

There are multiple applications related to the proposed topic, namely knowing how the head is inclined with respect to a camera. In virtual reality, the head pose is used to render the view of a scene. In driver assistance, a camera captures the driver in the vehicle, and the position of the head indicates whether the driver is paying attention to the road. The head pose can also be used for hands-free games [2].

In general, there are two categories of algorithms for head pose estimation: coarse head pose estimation, which tends to be more robust but not necessarily accurate, and fine head pose estimation, which achieves higher precision [3]. Precisely locating the head and its orientation is either the explicit goal of systems such as human-computer interfaces, or a necessary pre-processing step for further analysis, such as identification or recognition of facial expressions. From the point of view of image representation, head pose estimation methods can be classified into two types: appearance-based methods and methods based on geometric features [4].

In feature-based methods, the head pose is inferred from extracted features, which include features common to all poses, pose-dependent features, and discriminative features combined with appearance information.

There are several works related to the use of Haar classifiers. Vatahska et al. use a face detector to roughly classify the pose as frontal, left, or right profile. After this, they detect the eyes and nose tip using AdaBoost classifiers, and the detections are fed into a neural network that estimates the head orientation. Whitehill et al. [5] present a discriminative approach to frame-by-frame head pose estimation. Their algorithm relies on the detection of the nose tip and both eyes, thereby limiting the recognizable poses to those where both eyes are visible.

Our model focuses on head pose estimation, implemented on a robot that tracks head movement with the help of an RGB camera. Our intention is to work toward a social robot that helps children with socialization problems: a mood detector will allow the child to interact with the robot. For this purpose, we use predefined cascade classifiers based on Haar features, an effective object detection method due to Paul Viola and Michael Jones. It is a machine learning approach in which a cascade function is trained with positive (face) and negative (non-face) images.

Features are then extracted from the image. For this, edge, line, and four-rectangle features are used, which act as convolutional kernels. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.

We also present a proposal for an integrated method for face detection, tracking and head pose estimation. The head pose is computed using the coordinates of the left eye, the right eye and the mouth relative to the nose, which serves as the origin of the coordinate system. These coordinates are obtained from the face tracker.

2. Face tracking in 3D in real time


a. Haar Like Features
Haar-like features are one of the methods for face detection. This method works by converting a part of the image (a region of interest) into a single value, and allows faces to be detected with high speed and precision.

Basically, Haar-like features work by comparing adjacent rectangular pixel regions of the image according to the classifier. The intensity of each pixel region is summed, and the resulting difference is used to categorize each area for object detection.
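The rectangle differences described above are normally computed over an integral image, so each feature costs only a handful of array lookups. A minimal sketch (using NumPy, which the paper does not mention, as an assumption):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows and columns: any rectangle sum can
    then be read with at most four array lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle whose top-left corner is
    (x, y), using the integral image ii."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

def edge_feature(ii, x, y, w, h):
    """Two-rectangle (edge) Haar feature: sum under the left half of
    the window minus sum under the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A vertical intensity edge inside the window yields a large magnitude, while a flat region yields a value near zero.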

b. Head Pose Estimation


Haar-like features are used for face detection, an active model is fitted to obtain the user's facial feature points, and the POSIT algorithm is used to recover the pose of the user's face. The poses are based on the characteristic points of the face generated by the active model [6].

3. Related Works
In the following sections some important works related to head pose estimation are described; we focus on methods related to our work. Stiefelhagen et al. extract first-order horizontal and vertical image derivatives, then train a neural network to discriminate between poses. Other authors propose an algorithm for yaw estimation using the symmetry principle: their method considers an effective combination of Gabor filters and covariance descriptors, and exploits the relationship between the head pose and the symmetry of the face image. Kota et al. propose a regression-based method for head pose and car direction estimation. According to the authors, the proposed model is more flexible in the sense that it does not rely on a trial-and-error process for finding the best splitting rules from an already defined set of rules [7].

The algorithm of Vatahska et al. initially detects facial features such as the tip of the nose and both eyes. Based on the position of these features, a neural network estimates the three rotation angles from frontal, left and right profile images. The method of Hatem et al. also uses facial features for head pose estimation: Haar-like features are used first for face localization, then the coordinates of the eyes and mouth with respect to the nose are located. Schödl et al. combine a 3D model with a texture; they transform an extracted texture of the face to the frontal view and project it onto the model. Dornaika and Ahlberg [8] apply an active appearance model and use a very detailed description of the contours and features of the face.

In this paper, the face is first located in the image using Haar cascade classifiers. This yields an initial approximation of the head pose.

4. The Proposed Approach


One of the proposals to be implemented is a change of camera: we want to move from conventional 2D cameras to the RGB-D sensor technology that researchers currently use, which provides depth information in addition to the 2D color images. The Kinect sensor offers high-resolution depth sensing, which helps overcome several problems such as separating foreground from background pixels, unknown object scales, and some lighting problems. First, the face is automatically located in the 2D image. Afterwards, different types of features are extracted from the face patch detected in the 2D image and from its corresponding 3D point cloud. These features encode the spatial distribution of the texture of the face within a box enclosing the detected face; in addition, they encode the depth variation across the face. Finally, the extracted features are concatenated to construct a feature vector that is passed to support vector machine regressors (SVM-R) to return a continuous estimate of the head position [9].

5. Our Approach
There are different techniques, methods and algorithms that allow motion detection. As in other areas of artificial vision, there are no generic solutions; which one to use depends on each situation. The method used here is a cascade based on Haar features, whose procedure is briefly as follows:

 Positive images (images of faces) and negative images (images without faces) are used to train the classifier.
 Feature extraction: each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
 All possible sizes and locations of each kernel are used to compute many features.
 For the calculation of each feature, the pixels of the black and white rectangles are summed (many features will be irrelevant).
 For each feature, the best threshold that separates face images from non-face images is found (selected with a minimum error rate).
 The final set contains about 6000 features.
 The 6000 features are grouped into different classifier stages, applied one by one. A window that passes all the stages is a face region.
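The threshold-selection step listed above amounts to fitting a weighted decision stump per feature. A minimal sketch, assuming precomputed feature values and NumPy:

```python
import numpy as np

def best_stump(feature_values, labels, weights):
    """Weighted decision stump for one Haar feature: scan all candidate
    thresholds and return (threshold, polarity, weighted_error).
    Polarity +1 means values above the threshold are called faces."""
    order = np.argsort(feature_values)
    f, y, w = feature_values[order], labels[order], weights[order]

    total_pos = w[y == 1].sum()
    total_neg = w[y == 0].sum()
    # Weight of positives / negatives at or below each candidate cut.
    pos_below = np.cumsum(w * (y == 1))
    neg_below = np.cumsum(w * (y == 0))

    # Error if everything at or below the cut is labeled non-face...
    err_above = pos_below + (total_neg - neg_below)
    # ...and if everything at or below the cut is labeled face.
    err_below = neg_below + (total_pos - pos_below)

    errs = np.minimum(err_above, err_below)
    i = int(np.argmin(errs))
    polarity = 1 if err_above[i] <= err_below[i] else -1
    return float(f[i]), polarity, float(errs[i])
```

For a feature that cleanly separates the two classes, the returned weighted error is zero.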
6. Analysis
The classifier is responsible for assigning a given set of features to the class with which there is greatest similarity, according to a model built during training. It is a classification method that combines several basic classifiers to form one that is more complex and precise, provided there are sufficient training samples. The algorithm used in the methodology is presented below.

The cascade classifier process is explained mathematically [9]:

Algorithm: proposed cascade of Haar classifiers

1. Input: a set of labeled examples (x_1, y_1), ..., (x_N, y_N), where y_n = 1 for positive examples and y_n = 0 for negative examples.
2. Let m be the number of negative examples and l be the number of positive examples. Initialize the weights w_{1,n} = 1/(2m) or 1/(2l), depending on the value of y_n.
3. For t = 1, ..., T:
    Normalize the weights to get a probability distribution D_t on the training set:
       D_t(i) = w_{t,i} / (sum_{n=1..N} w_{t,n})
    Generate a weak classifier h_j for each feature f_j.
    Determine the error eps_j of classifier h_j with respect to D_t:
       eps_j = sum_{n=1..N} w_{t,n} |h_j(x_n) - y_n|
    Choose the classifier h_j with the lowest error eps_j and set (h_t, eps_t) = (h_j, eps_j).
    Update the weights w_{t+1,n} = w_{t,n} * beta_t^(1 - e_n), where beta_t = eps_t / (1 - eps_t), and e_n = 0 if example x_n is classified correctly by h_t and e_n = 1 otherwise.
4. The final strong classifier is given by:
       h(x) = 1 if sum_{t=1..T} log(1/beta_t) h_t(x) >= (1/2) sum_{t=1..T} log(1/beta_t), and h(x) = 0 otherwise.
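The boosting loop above can be sketched in Python as follows. The weak learners are assumed to be precomputed callables, and the small clamp on the error is our addition to avoid division by zero when a weak classifier is perfect:

```python
import numpy as np

def adaboost(X, y, weak_learners, T):
    """Viola-Jones style boosting. weak_learners is a list of callables
    h(X) -> 0/1 array (per-feature stumps); X, y are the training
    samples and their 0/1 labels."""
    l = int(np.sum(y == 1))          # number of positives
    m = int(np.sum(y == 0))          # number of negatives
    w = np.where(y == 1, 1.0 / (2 * l), 1.0 / (2 * m))   # step 2
    chosen, alphas = [], []
    for _ in range(T):
        w = w / w.sum()              # normalize to the distribution D_t
        preds = [h(X) for h in weak_learners]
        errs = [float(np.sum(w * np.abs(p - y))) for p in preds]
        j = int(np.argmin(errs))     # classifier with the lowest error
        eps = max(errs[j], 1e-10)    # clamp: keeps beta_t > 0
        beta = eps / (1.0 - eps)
        e = np.abs(preds[j] - y)     # 0 if classified correctly, else 1
        w = w * beta ** (1 - e)      # weight update of step 3
        chosen.append(weak_learners[j])
        alphas.append(np.log(1.0 / beta))
    def strong(Xq):                  # final strong classifier (step 4)
        score = sum(a * h(Xq) for a, h in zip(alphas, chosen))
        return (score >= 0.5 * sum(alphas)).astype(int)
    return strong
```

On a toy 1D problem where one stump already separates the classes, the returned strong classifier reproduces the labels exactly.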

Algorithm: proposed face pose estimation

1. Load the required XML classifiers.
2. Start video capture.
3. Load the input image in grayscale mode.
4. Use the classifiers to find faces or eyes.
5. If faces are found, return the positions of the detected faces as Rect(x, y, w, h).
6. If movement to the left is detected, then set the turning of the simulated robot to the left.
7. If movement to the right is detected, then set the turning of the simulated robot to the right.
8. End.

7. Results
In our proposed method, the efficiency of face detection is demonstrated using Haar-like features. The Haar classifiers were then applied, significantly reducing the training time and the number of features required in our system.

For this project we focus on the face, since it is our region of interest. The program detects only the face of the person, whatever its position, estimates the position of the face, and follows it by determining how much it has moved. If the face has not moved, the program sends dot characters to the serial port, for which the Arduino program performs no action.

Figure 1 shows the operation of the application following the different movements made during the tests, whether to the left, to the right, up or down.

Figure 1: Face detection. (a) Upward movement, (b) downward movement, (c) movement to the left, (d) movement to the right.

If a movement is made, then depending on the calibration made in the program, characters are sent to the serial port, each with its own meaning:

Points that represent the movement of the head:
Right: 2a
Left: 2b
Up: 2c
Down: 2d

Table 1: Results of the movement of the head

These symbols are understood by the Arduino in order to move the robot.
8. Conclusion
In this article we presented a feature-based system that estimates the pose of the head from monocular images. Face pose estimation is needed for several applications such as facial expression recognition, head gesture recognition, gaze recognition and driver monitoring. A frame-based approach was proposed that estimates continuous head posture; for this purpose, the Haar-type face detector and the cascade classifier are used.

Acceptable detection percentages are obtained and processing times are low, thanks to the combination of Haar classifier techniques for the detection of the face and the facial features, which makes it possible to take the system to a real-time application.

9. References
[1] E. Murphy-Chutorian and M. Trivedi, "Head pose estimation in computer vision: A survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4):607–626, (2009).

[2] V. Blanz and T. Vetter, "Face recognition based on fitting a 3D morphable model", IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1063–1074, (2003).

[3] B. Czupryński and A. Strupczewski, "High Accuracy Head Pose Tracking Survey", Active Media Technology, (2014).

[4] J. Whitehill and J. R. Movellan, "A discriminative approach to frame-by-frame head pose tracking", 8th IEEE International Conference on Automatic Face and Gesture Recognition, Amsterdam, (2008).

[5] E. Setyati, D. Alexandre and D. Widjaja, "Face Tracking Implementation with Pose Estimation Algorithm in Augmented Reality Technology", (2014).

[6] B. O. Han, Y. N. Chae, Y.-H. Seo and H. S. Yang, "Head Pose Estimation Based on Image Abstraction for Multiclass Classification", Information Technology Convergence, (2013).

[7] K. Khan, M. Mauro, P. Migliorati and R. Leonardi, "Head Pose Estimation through Multi-Class Face Segmentation", (2014).

[8] T. Vatahska, M. Bennewitz and S. Behnke, "Feature-based Head Pose Estimation from Images".

[9] A. Saeed, A. Al-Hamadi and A. Ghoneim, "Head Pose Estimation on Top of Haar-Like Face Detection: A Study Using the Kinect Sensor", (2015).
