Human-Robot Interaction
Abstract. In this article we present a real-time application in which a robot mimics the
movements of a human head, tracked with an RGB camera. That is, the system estimates
the position of the head and its orientation relative to the camera. For this, a unified
model for face detection, facial landmark detection, and pose estimation is used.
We apply different threshold levels for motion recognition and different types of image
capture for optimal control of the robot's movement. In this paper, we propose a
frame-based approach to head pose estimation built on the Viola and Jones (VJ)
Haar-type face detector, based on feature extraction with Haar filters and cascaded
classifiers. As a result of this research, a real-time algorithm for estimating the 3D
pose with the help of Haar cascade classifiers is presented. To evaluate this algorithm,
several tests are performed with pose variations in order to reduce the error. In
addition, smile recognition is implemented, since the goal is to create a social robot
that interacts with children who have socialization problems.
1. Introduction
While humans learn to evaluate the orientation of a head quickly, estimating it with
computer vision systems has remained a challenge for researchers and scientists [1].
For a head pose estimation system (HPES) to work correctly, it must be sufficiently
robust to factors such as occlusion, noise, lighting and distortion. Moreover, it must
handle the different facial expressions of each person, as well as the large differences
between faces, such as gender, ethnicity and age. Estimating the 3D pose of a head from
static images is a difficult problem, but one with several applications.
There are multiple applications related to the proposed topic of knowing how the head
is inclined with respect to a camera. In virtual reality, the head pose is used to
render the view of a scene. In driver assistance, a camera captures the driver in the
vehicle, and the head position can be used to determine whether the driver is paying
attention to the road. The head pose can also be used for hands-free games [2].
In general, there are two categories of head pose estimation algorithms: coarse head
pose estimation, which tends to be more robust but not necessarily accurate, and fine
head pose estimation, which achieves high precision [3]. Precisely locating the head
and its orientation is either the explicit goal of systems such as human-computer
interfaces, or a necessary pre-processing step for further analysis, such as
identification or facial expression recognition. From the point of view of image
representation, head pose estimation methods can be classified into two types:
appearance-based methods and methods based on geometric features [4].
In feature-based methods, the head pose is inferred from extracted features, which
include features visible in all poses, pose-dependent features, and discriminative
features combined with appearance information.
There are several works related to the use of Haar classifiers. Vatahska et al. use a
face detector that roughly classifies the pose as frontal, left, or right profile.
After this, they detect the eyes and nose tip using AdaBoost classifiers, and the
detections are fed into a neural network that estimates the head orientation.
Whitehill et al. [5] present a discriminative approach to frame-by-frame head pose
estimation. Their algorithm relies on the detection of the nose tip and both eyes,
thereby limiting the recognizable poses to those in which both eyes are visible.
Our model focuses on head pose estimation, implemented on a robot that tracks the
movement with the help of an RGB camera. With this, we intend to work toward a social
robot that helps children with socialization problems; a mood detector will allow the
child to interact with the robot. For this purpose, the system uses predefined cascaded
classifiers based on Haar features, the effective object detection method of Paul Viola
and Michael Jones. It is a machine learning approach in which the cascade function is
trained with positive (face) and negative (non-face) images.
The classifier then extracts features from the image. For this, edge, line and
four-rectangle features are used, which act as our convolutional kernels. Each feature
is a single value obtained by subtracting the sum of the pixels under the white
rectangle from the sum of the pixels under the black rectangle.
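These rectangle-difference features can be evaluated in constant time with an integral image, which is what makes Viola-Jones detection fast. A minimal numpy sketch; the 4x3 test image and the two-rectangle (edge-type) layout are illustrative, not the actual filters used in the system:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y+1, :x+1] (cumulative sum over rows and columns)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    """Sum of pixels in the h-by-w rectangle with top-left corner (y, x),
    using at most four lookups into the integral image."""
    total = ii[y + h - 1, x + w - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]
    return total

def two_rect_feature(ii, y, x, h, w):
    """Edge-type Haar feature: black (right half) minus white (left half),
    matching the subtraction order described in the text."""
    half = w // 2
    white = rect_sum(ii, y, x, h, half)
    black = rect_sum(ii, y, x + half, h, half)
    return black - white

img = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]], dtype=np.int64)
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 3, 4))  # right half (45) minus left half (33) = 12
```

Because each rectangle sum costs only four array lookups, evaluating thousands of such features per window remains cheap.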
We also present a proposal for an integrated method for face detection, tracking and
head pose estimation. The head pose is obtained using the coordinates of the left eye,
the right eye and the mouth relative to the nose, which serves as the origin of the
coordinate system. These coordinates are obtained from the face tracker.
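The idea of reading a coarse head direction from landmark coordinates, with the nose as reference, can be sketched as follows. The geometry and the thresholds here are our own illustrative assumptions, not the exact formulation used in the paper:

```python
def head_direction(left_eye, right_eye, mouth, nose, thresh=0.15):
    """Coarse head direction from 2D landmarks (x, y) in image coordinates.

    Offsets are normalized by the inter-eye distance so the result is
    independent of face size. Thresholds are illustrative assumptions.
    """
    eye_dist = ((right_eye[0] - left_eye[0]) ** 2 +
                (right_eye[1] - left_eye[1]) ** 2) ** 0.5
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    mid_y = (left_eye[1] + right_eye[1]) / 2.0
    # Horizontal offset of the nose from the eye midline suggests yaw.
    yaw = (nose[0] - mid_x) / eye_dist
    # Vertical offset of the nose from the point midway between the eye
    # line and the mouth suggests pitch.
    pitch = (nose[1] - (mid_y + mouth[1]) / 2.0) / eye_dist
    if yaw > thresh:
        return "right"
    if yaw < -thresh:
        return "left"
    if pitch > thresh:
        return "down"  # image y grows downward
    if pitch < -thresh:
        return "up"
    return "frontal"

# Frontal face: nose sits on the eye midline, halfway to the mouth.
print(head_direction((30, 40), (70, 40), (50, 80), (50, 60)))  # frontal
```

With the nose shifted toward one eye the same function reports a left or right turn, which is the kind of coarse classification the tracking stage needs.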
Essentially, Haar-like features operate on the intensity change between nearby regions
of an image, depending on the classifier. The next step computes the intensity of each
pixel area, and the resulting difference is used to categorize each region for object
detection.
2. Related Works
In the following, some important works related to head pose estimation are presented.
The algorithm of Vatahska et al. initially detects facial features such as the tip of
the nose and both eyes. Based on the positions of these features, a neural network
estimates three rotation angles, i.e., frontal, left and right profile images. The
method of Hatem et al. also uses facial features for head pose estimation: Haar-like
features are first used for face localization, then the coordinates of the eyes and
mouth with respect to the nose are located. Schodl et al. combine a 3D model with a
texture: they transform an extracted texture of the face to the frontal view and
project it onto the model. Dornaika and Ahlberg [8] apply an active appearance model
and use a very detailed description of the contours and features of the face.
In this work, the face is first located in the image using Haar cascade classifiers.
In this way an approximation of the head pose is obtained.
3. Our Approach
There are different techniques, methods and algorithms for motion detection. As in
other areas of artificial vision, there is no generic solution; which one to use
depends on each situation. The method used here is the cascade of classifiers based on
Haar features, whose procedure is briefly explained below:
Positive images (images of faces) and negative images (images without faces)
are used to train the classifier.
Feature extraction: each feature is a single value obtained by subtracting the
sum of the pixels under the white rectangle from the sum of the pixels under the
black rectangle.
All possible sizes and locations of each kernel are used to calculate many features.
For the calculation of the features, the pixels of the black and white rectangles
are summed (there will be irrelevant features).
For each feature, the best threshold that separates face images from negatives is
found (the features with the minimum error rate are selected).
The final set contains about 6000 features.
The 6000 features are grouped into different stages of classifiers and applied
one by one. A window that passes all the stages is a face region.
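The staged evaluation in the last step can be sketched as follows. The weak classifiers and stage thresholds below are toy assumptions on a scalar "window"; the real cascade evaluates Haar features of image sub-windows:

```python
def passes_cascade(window, stages):
    """Return True only if the window passes every classifier stage.

    Each stage is (weak_classifiers, threshold): the stage passes when the
    sum of its weak-classifier votes meets the threshold. Most non-face
    windows are rejected by the first, cheapest stages, which is what makes
    the cascade fast.
    """
    for weak_classifiers, threshold in stages:
        score = sum(h(window) for h in weak_classifiers)
        if score < threshold:
            return False  # early rejection: later stages are never evaluated
    return True

# Toy cascade with hypothetical weak classifiers.
stages = [
    ([lambda w: 1 if w > 0 else 0], 1),         # stage 1: one cheap check
    ([lambda w: 1 if w > 5 else 0,
      lambda w: 1 if w % 2 == 0 else 0], 2),    # stage 2: stricter checks
]
print(passes_cascade(6, stages))   # passes both stages
print(passes_cascade(-1, stages))  # rejected immediately by stage 1
```

In practice the same pass/reject logic runs inside a pretrained detector such as OpenCV's `CascadeClassifier`, sliding a window over the image at several scales.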
4. Analysis
The classifier is responsible for assigning a given set of features to the class with
which it has the greatest similarity, according to a model built during training. It is
a classification method that combines several basic classifiers to form a more complex
and precise one, provided there are sufficient training samples. The weighted error
used in the methodology is presented below:
ε_j = Σ_{n=1}^{N} w_{t,n} |h_j(x_n) − y_n|
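At each boosting round t, this weighted error is evaluated for every candidate weak classifier h_j, and the classifier with the minimum error is selected. A small sketch with hypothetical 1-D decision stumps in place of Haar-feature classifiers:

```python
def weighted_error(h, samples, labels, weights):
    """epsilon_j = sum_n w_n * |h(x_n) - y_n|, with labels in {0, 1}."""
    return sum(w * abs(h(x) - y)
               for x, y, w in zip(samples, labels, weights))

def best_weak_classifier(classifiers, samples, labels, weights):
    """One AdaBoost selection round: pick the weak classifier with the
    minimum weighted error over the current sample weights."""
    errors = [weighted_error(h, samples, labels, weights)
              for h in classifiers]
    j = min(range(len(errors)), key=errors.__getitem__)
    return j, errors[j]

# Toy data: 1-D samples, label 1 for "face-like" values.
samples = [0.2, 0.4, 0.6, 0.8]
labels  = [0, 0, 1, 1]
weights = [0.25, 0.25, 0.25, 0.25]  # uniform weights to start
stumps = [lambda x: 1 if x > 0.5 else 0,   # perfect on this data
          lambda x: 1 if x > 0.3 else 0]   # misclassifies x = 0.4
j, err = best_weak_classifier(stumps, samples, labels, weights)
print(j, err)  # → 0 0.0
```

In full AdaBoost, the weights w_{t,n} of misclassified samples would then be increased before the next round, steering later weak classifiers toward the hard examples.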
5. Results
In our proposed method, the efficiency of face detection is demonstrated using
Haar-like features. The Haar cascade classifiers were then applied, significantly
reducing the training time and the number of features required in our system.
For this project we focus on the face, since it is our region of interest. The program
detects only the face of the person, whatever its position, estimates the position of
the face and follows it by determining how much it has moved. If the face has not
moved, the program sends points to the serial port, with which the Arduino program
performs no action.
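A minimal sketch of this decision logic; the pixel threshold, the letter codes and the serial port name are assumptions for illustration, not the program's calibrated values:

```python
def movement_letter(prev_center, curr_center, thresh=20):
    """Map face-center displacement (in pixels) to a command letter, or
    None when the movement is below the calibrated threshold.

    The 20 px threshold and the letter codes are illustrative assumptions.
    """
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    if abs(dx) < thresh and abs(dy) < thresh:
        return None  # no significant movement: nothing is sent
    if abs(dx) >= abs(dy):
        return "r" if dx > 0 else "l"
    return "d" if dy > 0 else "u"  # image y grows downward

# The chosen letter would then be written to the Arduino, e.g. with
# pyserial (port name is hypothetical):
#   import serial
#   port = serial.Serial("/dev/ttyUSB0", 9600)
#   letter = movement_letter((320, 240), (360, 238))
#   if letter:
#       port.write(letter.encode())
```

Comparing |dx| against |dy| keeps the command unambiguous when the head moves diagonally: only the dominant axis is reported.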
In figure 1 we can see the operation of the application and the sequence of the
different movements made during the tests, whether to the left, right, up or down.
Figure 1: Face Detection (a) Upward movement, (b) Downward movement, (c) Movement to
the left, (d) Movement to the right
If a movement is made, then depending on the calibration set in the program, letters
with the following meanings are sent over the serial port:
Right: 2a.
Left: 2b.
Up: 2c.
Down: 2d.
These symbols are interpreted by the Arduino in order to move the robot.
6. Conclusion
In this article we presented a feature-based system that estimates the head pose from
monocular images. Facial pose estimation is needed for several applications such as
facial expression recognition, head gesture recognition, gaze recognition and driver
monitoring. A frame-based approach that estimates continuous head pose was proposed;
for this purpose, the Haar-type face detector and the cascade classifier are used.
Acceptable detection rates are obtained and the processing times are low, thanks to the
combination of Haar classifier techniques for the detection of the face and the facial
features, making a real-time application of the system possible.
7. References
[1] E. Murphy-Chutorian and M. Trivedi, "Head pose estimation in computer vision: A
survey," IEEE Transactions on Pattern Analysis and Machine Intelligence,
31(4):607–626, 2009.
[2] V. Blanz and T. Vetter, "Face recognition based on fitting a 3D morphable model,"
PAMI, 25(9):1063–1074, 2003.
[5] E. Setyati, D. Alexandre, and D. Widjaja, "Face Tracking Implementation with Pose
Estimation Algorithm in Augmented Reality Technology," 2014.
[6] B. O. Han, Y. N. Chae, Y.-H. Seo and H. S. Yang, "Head Pose Estimation Based on
Image Abstraction for Multiclass Classification," Information Technology Convergence,
2013.
[7] K. Khan, M. Mauro, P. Migliorati and R. Leonardi, "Head Pose Estimation through
Multi-Class Face Segmentation," 2014.
[8] T. Vatahska, M. Bennewitz and S. Behnke, "Feature-based Head Pose Estimation from
Images."
[9] A. Saeed, A. Al-Hamadi and A. Ghoneim, "Head Pose Estimation on Top of Haar-Like
Face Detection: A Study Using the Kinect Sensor," 2015.