
Face recognition based on 3D mesh model

Chenghua Xu¹*, Yunhong Wang¹, Tieniu Tan¹, Long Quan²

¹ Center for Biometric Authentication and Testing, National Laboratory of Pattern Recognition,
Institute of Automation, CAS, Beijing, China, 100080
² Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong


Abstract

This paper proposes an automatic face recognition method based on representing the face with a 3D mesh, which precisely reflects the geometric features of a specific subject. The mesh model is generated with a nonlinear subdivision scheme and fitted to the 3D point cloud, and it describes the depth information of the human face accurately. An effective method for matching two mesh models is developed to decide whether they come from the same person. We test the proposed algorithm on the 3D_RMA database, and the experimental results and comparisons with other work show the effectiveness and competitive performance of the proposed method.
Keywords: 3D face model, face recognition, identification, verification

1. Introduction

Biometric identification is one of the most important demonstrations of machine intelligence and is gradually entering the life of ordinary people. Of all the biometric features, the face is so common and accessible that face recognition remains one of the most active research issues in pattern recognition and image processing. In the past decades, most research has focused on feature extraction from 2D intensity or color images. The recognition accuracy is sensitive to lighting conditions, expressions, viewing position and accessories such as hair and glasses. So far, it remains very difficult to develop a robust automatic face recognition system.
3D facial data can provide more information for recognition than 2D images and has the potential to improve system performance. We attempt to develop an automatic face verification and identification scheme based on 3D geometric information. The face surface is represented with a triangulated mesh, which, like a fishing net covering the face, describes the geometric features of the individual accurately. By calculating the difference between two mesh models, we decide whether they come from the same person.
As we know, faces are characterized by their shape and texture. Most successful algorithms for face recognition integrate shape information with texture information extracted from 2D images. Because of this source, they are strongly influenced by illumination and facial variations. Although some methods have been proposed to tackle variations of pose and illumination, as summarized in [1], they do not work well under arbitrary conditions. In this paper, we reconsider 3D geometric information, which has some distinct advantages: it provides sufficient geometrical information, the measured features are invariant to transformation, and 3D capture with some equipment is immune to illumination variation. 3D information therefore has the potential to improve the performance of face recognition systems.
However, 3D face recognition is still only sparsely reported [4,5,17,18,19,21] in the published literature. A main reason hindering its development is that equipment for 3D capture is usually expensive and slow. Some research on curvature analysis [2,3,14,21] focused on high-quality data from 3D laser scanners. Gordon [2,21] extracted facial curvature features in two stages. First, high-level features were identified in terms of points, lines, and regions on the surface, which could describe aspects of the eyes, nose, and head. Second, low-level features were extracted in terms of distance or curvature measurements. The set of measurements for each face described a single point in feature space. This method usually depended on high-quality 3D data that could characterize delicate features. Beumier et al. [17,18,19] developed a 3D acquisition prototype based on structured light and built a 3D face database. They also proposed two methods, surface matching and central/lateral profile matching, to compare two instances. Both methods constructed central and lateral profiles to represent the individual, and obtained the matching value by minimizing the distance between the profiles. In [4,5], a 3D morphable model was described as a linear

*
Email: chxu@nlpr.ia.ac.cn; phone: 8610-62647441; fax: 8610-62551993

combination of the shape and texture of multiple exemplars. This model could be fitted to a single image to obtain the individual parameters, which were used to represent the personal features. Their results seemed very promising, except that the modeling process was very time-consuming. In this paper, we use the database 3D_RMA [17], whose data is too noisy to capture delicate facial features. We therefore abandon local feature detection and represent the human face with a regular mesh fitted to the point cloud. Recognition and verification are realized by comparing the global distance between two mesh models, which avoids the influence of local noise.
Much work has been reported on 3D face modeling, especially for facial animation. Since faces are characterized by both shape and texture, automatic shape modeling and texture mapping techniques have been developed to obtain realistic 3D virtual faces. The general approach is to begin with a generic model, which encodes prior knowledge of the human face and provides the animation structure. The model is then deformed into the individual model according to images [6,7,8,9], range data [10,11] from a laser scanner, or other data sources such as anthropometric data [12]. In these fitting processes, realistic face modeling is a complicated task, and human intervention is usually required for initialization to avoid local minima. In this paper, we have an easier modeling task because texture and structure information is ignored. We begin with a simple mesh, refine it with a nonlinear subdivision scheme and bring the mesh close to the point cloud level by level. The proposed method is robust enough to overcome the effect of noise and small holes in the point cloud to a certain extent and to obtain a smooth mesh model.
Our whole process includes registration and recognition stages, as shown in Fig.1. In both stages, the input data is a facial 3D point cloud. Using the prominent nose tip, which can be detected easily and robustly, the basic mesh is aligned to the data. Then, the fitting method and the subdivision scheme are combined to build the individual mesh model. The gallery set contains mesh models of different persons, and during recognition we measure the difference between the gallery model and the testing model. In this work, we pay particular attention to two aspects: how to model the face robustly and how to compare the gallery model with the testing model. In summary, we propose the new idea of describing 3D facial geometric features with a 3D mesh surface instead of a feature vector. To support this idea, we develop a robust and practical modeling method to generate the individual models, as well as an effective scheme for measuring the difference between mesh models.
The remainder of this paper is organized as follows. Section 2 describes the method of building the individual 3D mesh model. The matching method between two mesh models is introduced in Section 3. Section 4 presents experimental results and comparisons with other work. Finally, Section 5 gives the conclusions.

2. 3D face modeling

The 3D point clouds of different persons in 3D_RMA are noisy as a whole, and the points in different clouds have no direct correspondence, so it is very difficult to compute the difference between two point clouds. We therefore seek an alternative: building a regular mesh with a fixed number of nodes and facets to represent the shape of the human face. Comparing two such mesh models is much easier than comparing two irregular point clouds.
We can imagine that a human face is covered with a flexible, quadrate fishing net in which each node touches the face lightly. Such a net describes the individual geometric information well. In this section, we develop a method to weave this kind of net: beginning with a simple mesh, a regular and dense mesh model is generated based on the 3D scattered point cloud. We focus on two main problems: subdivision of the mesh and fitting of the subdivided mesh.

[Figure 1. Framework of the automatic face recognition system based on 3D data: the registration stage (initialization, mesh refining and fitting, model database) and the recognition stage (registration, matching).]

2.1 Nonlinear subdivision
Our generic model is a triangulated mesh comprising a vertex list and a facet list. We develop an efficient nonlinear scheme to realize the subdivision.
We represent a basic mesh M_0 by (T_0, V_0), where T_0 is the set of triangular facets and V_0 is the set of basic vertices. For a given mesh M_i, each triangular facet is divided into four smaller triangles at each subdivision. Our refinement scheme takes two phases, as illustrated in Fig.2. First, the mesh M_i = (T_i, V_i) is refined into an intermediate mesh \hat{M}_{i+1} = (\hat{T}_{i+1}, \hat{V}_{i+1}) by interpolating the middle point of each edge, as shown in Fig.2b. The new vertex set \hat{V}_{i+1} consists of V_i and the middle points of all edges, and the number of triangles in \hat{M}_{i+1} is four times that of M_i. Second, each middle vertex is moved to a suitable position along a certain direction to keep the surface approximately smooth. For each middle point m_0, the offset depends on the normals of the two corresponding vertices and the length of the corresponding edge, formulated as follows:

    \Delta m = k d \| \vec{n}_1 - \vec{n}_2 \| (\vec{n}_1 + \vec{n}_2)    (1)

where d is the length of the edge, k is a positive factor, \vec{n}_1 and \vec{n}_2 are the normals of the two corresponding vertices, which can be computed easily as the area-weighted average of the neighboring facet normals [13], \| \vec{n}_1 - \vec{n}_2 \| is the magnitude of the difference between the two normal vectors, and \vec{n}_1 + \vec{n}_2 determines the offset direction of each middle point. Thus, the final interpolated point becomes

    m = m_0 + \Delta m.    (2)

All the middle points are adjusted according to Equs. 1 and 2, as shown in Fig.2c, so that the mesh \hat{M}_{i+1} becomes M_{i+1} = (T_{i+1}, V_{i+1}). This completes one subdivision.
Beginning with the basic mesh M_0, we iteratively execute the subdivision to obtain the hierarchical meshes M_1, M_2, M_3 and so on. Fig.3 shows the basic mesh and the series of hierarchical meshes after four refinements.
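The two-phase refinement above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (the function name, data layout and default value of k are ours, and per-vertex unit normals are assumed to be precomputed), not the authors' implementation:

```python
import numpy as np

def subdivide_once(vertices, faces, normals, k=0.1):
    """One nonlinear subdivision step (a sketch of Equs. 1-2).

    vertices: (V, 3) array; faces: list of vertex-index triples;
    normals: (V, 3) unit vertex normals; k: positive offset factor.
    Returns the refined vertex array and face list.
    """
    verts = list(vertices)
    new_faces = []
    midpoint_of = {}                      # edge (i, j) -> new vertex index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            v1, v2 = vertices[i], vertices[j]
            n1, n2 = normals[i], normals[j]
            d = np.linalg.norm(v2 - v1)   # edge length
            m0 = 0.5 * (v1 + v2)          # plain midpoint (Fig. 2b)
            # Equ. 1: offset along n1 + n2, scaled by k * d * ||n1 - n2||
            dm = k * d * np.linalg.norm(n1 - n2) * (n1 + n2)
            verts.append(m0 + dm)         # Equ. 2: m = m0 + dm
            midpoint_of[key] = len(verts) - 1
        return midpoint_of[key]

    for a, b, c in faces:                 # split each triangle into four
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(verts), new_faces
```

Note that when the two vertex normals coincide, the offset of Equ. 1 vanishes and the scheme reduces to ordinary midpoint subdivision, which matches the intent of keeping a locally flat surface flat.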

2.2 Fitting the subdivided mesh
In this section, we will introduce a universal fitting algorithm for regulating the hierarchy meshes to be conformed to the
3D points. We use the fitted mesh to represent 3D shape of the individual approximately. This process includes two steps:
initialization of the basic mesh and fitting of the hierarchy meshes.
(1) Initialization of basic mesh
Our basic idea for modeling is to use optimization to tune the position of mesh nodes so that they are close to the point
clouds perfectly. To keep the fitted result reasonable, one of the most pivotal problems is to give this fitting process a good
initial value. Thus, it is necessary to detect some features correctly in the 3D scattered point clouds.
[Figure 2. The process of the subdivision: (a) one facet to be refined; (b) refinement using middle vertices; (c) the regulated middle vertices, each middle point m_0 offset along \vec{n}_1 + \vec{n}_2.]
[Figure 3. Basic mesh and hierarchical meshes of four levels: (a) M_0; (b) M_1; (c) M_2; (d) M_3; (e) M_4.]

Some algorithms [14,21] have been proposed to label facial features based on high-quality data obtained from laser scanners. They do not work on the 3D_RMA data because of its limited quality, as demonstrated in the literature [17,18,19]. Beumier et al. [17] also observed that the nose seems to be the only facial feature providing robust geometrical features for limited effort. We localize the prominent nose in the point cloud and use it to initialize the basic mesh, as shown in Fig.4a.
In most cases, there is much noise around the brim of the point cloud. To avoid its effect, we ignore the points whose projection on the X-Y plane falls outside the basic mesh, as shown in Fig.4b.
(2) Hierarchical mesh fitting
After initialization, the basic mesh is aligned with the real data. Nevertheless, the basic mesh is so coarse that it cannot describe even the basic contour of the human face. The subdivision scheme is therefore used to refine the basic mesh, and the refined mesh is regulated according to the data at each level. As refinement and regulation proceed, the mesh represents the individual better level by level.
Here, we describe the regulation at one refinement level; the same procedure extends to all levels. During the regulation, the regulated nodes should move toward the 3D point cloud, while the whole surface is kept as smooth as possible. To meet these two requirements, we define the following energy function:

    E(a) = E_{dis}(x, a) + \lambda E_{smooth}(a)    (3)

where x_i is a 3D point, a is a parameter vector consisting of the coordinates of the mesh nodes, and \lambda is a positive weighting factor. The two terms on the right stand for the distance and smoothness constraints respectively, which are balanced by \lambda.
The distance term E_{dis} is the sum of weighted squared distances from the 3D data points to the model:

    E_{dis}(x, a) = \sum_{i=1}^{N} w_i d^2(x_i, a)    (4)

where w_i is a weighting factor set to 1.0 / (1 + d^2(x_i, a)), and d(x_i, a) is the distance from the point x_i to the model along the Z-axis for the fitted parameter vector a. The reason for using the weighting factor w_i is that the data has outliers, and such a weighting scheme is an effective way to avoid severe warping [15].
The smoothness term E_{smooth} attempts to keep each local area planar, as described by Marschner [11]. It is formulated as

    E_{smooth}(a) = \sum_{k=1}^{n} \left( v_k - \frac{1}{m_k} \sum_{i=1}^{m_k} v_{ki} \right)^2    (5)

where n is the number of mesh nodes, v_k is the regulated vertex, v_{ki} are the neighboring vertices of v_k, and m_k is the number of neighbors of v_k.
Minimizing E(a) by regulating the fitted parameters a is a global optimization problem, and there are many methods to solve such a problem; here we select the Levenberg-Marquardt method [16]. To reduce the dimension of the fitted parameters, we regulate only the Z coordinate of each node instead of all the coordinates. This is reasonable since the normalized point cloud approximately faces the positive direction of the Z-axis.
The number of fitted parameters differs between levels: the basic mesh has five fitted parameters, the level-one mesh has thirteen, and the level-four mesh has several hundred (545). Although the Levenberg-Marquardt method can handle this number of fitted parameters easily, more fitted parameters cost more time. The fitted mesh of a lower level gives a good initialization for the optimization at the next higher level, which speeds up the optimization and avoids the effect of noise to some extent.
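For illustration, the energy of Equs. 3-5 can be evaluated with a short Python sketch. This is our own simplification, not the paper's code: the node X-Y positions are fixed and only Z varies, each data point is paired with its nearest node in the X-Y plane (a crude stand-in for the point-to-model distance), and all names are assumptions. In the paper this energy is minimized with Levenberg-Marquardt; here we only show its evaluation:

```python
import numpy as np

def fitting_energy(z, grid_xy, neighbors, points, lam=0.5):
    """Energy of Equ. 3 for a mesh whose nodes move only along Z.

    z: (n,) node Z coordinates (the parameter vector a);
    grid_xy: (n, 2) fixed X-Y node positions;
    neighbors: list of neighbor-index lists, one per node;
    points: (N, 3) data points; lam: smoothness weight (lambda).
    """
    # Pair each data point with its nearest node in the X-Y plane
    # (a crude stand-in for the point-to-model distance along Z).
    idx = np.argmin(np.linalg.norm(points[:, None, :2] - grid_xy[None], axis=2), axis=1)
    d = points[:, 2] - z[idx]                    # Z-distance to the model
    w = 1.0 / (1.0 + d ** 2)                     # robust weights of Equ. 4
    e_dis = np.sum(w * d ** 2)
    # Smoothness term of Equ. 5: deviation of each node from the
    # mean of its neighbors.
    e_smooth = sum((z[k] - np.mean(z[nb])) ** 2 for k, nb in enumerate(neighbors) if nb)
    return e_dis + lam * e_smooth
```

A function of this shape could then be handed to any least-squares or general-purpose minimizer over the vector z.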
[Figure 4. Initialization of the basic mesh: (a) the basic mesh is moved to the position of the nose tip in the X-Y plane; (b) the points outside the basic mesh are ignored.]

Fig.5 shows the mesh after regulation at different refinement levels. We can see that the nodes change only along the Z-axis during regulation, so the projection on the X-Y plane remains regular and unchanged. The coarse mesh does not describe the human face well even though it approaches the point cloud, while the mesh of level four is dense enough to represent the face surface. Of course, the denser the net, the better the face is represented; obviously, a denser mesh also costs more time and space. In this paper, we use the mesh refined four times.

3. 3D mesh model matching

To realize identification and verification, it is important to find a distance measure that quantifies the difference between two 3D meshes. Our matching process is executed directly in 3D space and includes two main phases: regulating the models to the same pose and calculating the difference between the two models, as illustrated in Fig.6.
Different shots involve up/down or left/right rotations, which carry over to the pose of the built model. To obtain a correct comparison, it is essential to rotate the mesh models to the same pose. We adopt optimization to bring two models into the same pose. In general, six parameters (translation and rotation) must be considered, but this number can be reduced in our case. We can translate both models so that their nose tip nodes (the central nodes of the original mesh) lie at the origin of the coordinate system; thus the translation can be ignored. In addition, since the point clouds in 3D_RMA have only left/right rotation (about the Y-axis) and up/down rotation (about the X-axis), we can also ignore the rotation about the Z-axis. In summary, we only need two parameters: rotation about the X-axis and rotation about the Y-axis.
We again use the classic optimization algorithm, the Levenberg-Marquardt method [16], to solve this problem. The energy function is similar to Equ.4, and the two rotation parameters, initially zero, are regulated during the optimization. Finally, models with different poses can be regulated to face the same view (Section 4.3 gives more analysis). In this process, the nose tip node gives a good initialization, which effectively avoids local minima; moreover, the small dimensionality of the search space speeds up the optimization.
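As a sketch of this pose regulation, the following Python searches the two rotation angles directly. The paper uses Levenberg-Marquardt; a plain grid search is substituted here only to keep the illustration self-contained (all names, the angular span and the assumption of corresponding node rows are ours):

```python
import numpy as np

def rot_xy(rx, ry):
    """Rotation matrix: rotate by rx about X, then by ry about Y (radians)."""
    cx, sx, cy, sy = np.cos(rx), np.sin(rx), np.cos(ry), np.sin(ry)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Ry @ Rx

def align_pose(nodes_a, nodes_b, span=np.radians(40), steps=81):
    """Find (rx, ry) that best aligns model B to model A.

    Both arrays are (n, 3) with corresponding rows and the nose tip
    already translated to the origin.  Returns (rx, ry, diff), where
    diff is the summed Z-distance after rotation.
    """
    angles = np.linspace(-span, span, steps)
    best = (0.0, 0.0, np.inf)
    for rx in angles:
        for ry in angles:
            # Summed |Z| difference between the rotated B and A
            diff = np.abs(((rot_xy(rx, ry) @ nodes_b.T).T - nodes_a)[:, 2]).sum()
            if diff < best[2]:
                best = (rx, ry, diff)
    return best
```

The limited rotations in 3D_RMA (under about 30 degrees, per Section 4.3) keep such a bounded search feasible, though a gradient-based minimizer converges far faster.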
Here, we define the distance measure between two models as the sum of the distances along the Z-axis from the nodes of one model to the facets of the other model, as illustrated in Fig.7b. This distance measure can be described as follows:

    diff = \sum_{i=1}^{n} d_Z(v_i)    (6)

where n is the number of mesh nodes and d_Z(v_i) is the distance along the Z-axis of the node v_i. After regulating the models from their different poses to one common pose, we calculate the matching value diff; a smaller value means the two models are more similar.
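A minimal Python sketch of Equ. 6 follows (our own illustration; the names and the barycentric point-in-triangle test are assumptions, and nodes that project outside the other mesh are simply skipped):

```python
import numpy as np

def z_distance_to_facets(p, vertices, faces):
    """|Z| distance from point p to the mesh surface along the Z-axis.

    Finds a facet whose X-Y projection contains p and interpolates its
    Z there with barycentric coordinates; returns None if p projects
    outside every facet.
    """
    for a, b, c in faces:
        A, B, C = vertices[a], vertices[b], vertices[c]
        # Barycentric coordinates of p in the X-Y projection
        M = np.array([[B[0] - A[0], C[0] - A[0]],
                      [B[1] - A[1], C[1] - A[1]]])
        try:
            u, v = np.linalg.solve(M, [p[0] - A[0], p[1] - A[1]])
        except np.linalg.LinAlgError:
            continue                      # degenerate facet
        if u >= -1e-9 and v >= -1e-9 and u + v <= 1 + 1e-9:
            z = A[2] + u * (B[2] - A[2]) + v * (C[2] - A[2])
            return abs(p[2] - z)
    return None

def mesh_diff(nodes1, verts2, faces2):
    """Equ. 6: sum of Z-distances from the nodes of model 1 to the
    facets of model 2 (nodes projecting outside model 2 are skipped)."""
    dists = (z_distance_to_facets(p, verts2, faces2) for p in nodes1)
    return sum(d for d in dists if d is not None)
```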

[Figure 5. The regulated mesh models at different levels: (a) basic mesh; (b) level one; (c) level two; (d) level three; (e) level four.]
[Figure 6. Matching process: (a) two mesh models in different poses; (b) the two mesh models regulated to the same pose; (c) distance measure diff = ||mesh_1 - mesh_2||.]
[Figure 7. Distance measure of two models: sum of distances from point to facet.]

4. Experimental results

To evaluate the performance of the proposed method, we implement the algorithm on the 3D_RMA database. In this section, detailed experimental results on face modeling, model rotation, and face identification and verification are presented. All tests are run on a PC with a 1.3 GHz Pentium CPU, 128 MB RAM and an Nvidia GeForce2 MX 100/200 display card. The main experiments are implemented in C++ and OpenGL, except that the ROC and CMS curves are plotted with Matlab.

4.1 3D face database
Our experiments are conducted on the 3D face database 3D_RMA [17,18,19], collected under the framework of the M2VTS project (AC102) of the ACTS European program. The 3D data is obtained with a projector and a camera using structured light. Each face surface is described by a 3D point cloud containing more than 3,000 points. Compared with a laser scanner, this system is cheap, but the obtained data has limited quality.
The database includes 120 persons and two sessions: Nov. 97 (session 1) and Jan. 98 (session 2). In each session, three shots are sampled for each person, corresponding to central, limited left/right and up/down poses. People sometimes wear their spectacles, and beards and moustaches are also represented. From these sessions, two databases are built: the automatic DB (120 persons) and the manual DB (30 persons). Fig.8 shows some 3D point clouds of one subject from session 1; the top and bottom rows are from the manual DB and the automatic DB respectively, and each shot is shown in front and profile views. We can see that the quality of the manual DB is better than that of the automatic DB.

4.2 Experimental results of 3D face modeling
We build a 3D mesh model for every point cloud in 3D_RMA in the way described in Section 2. This process is completely automatic.
Fig.9 shows some modeling results for different persons from the different databases. From these models, we can see that the built models describe the geometric features well. Different models are clearly distinguishable, which can be exploited to identify different persons.
In some cases, a small point cloud lies far from the main point cloud (see Fig.10a). Affected by this separated data, the normalization is incorrect, so the nose tip falls outside the search area. An incorrect nose tip is then detected, which leads to a wrong modeling result (see the top row of Fig.10(b, c)). After removing the separated data, we obtain a better result, as shown in the bottom row of Fig.10(b, c).
No quantitative rule for estimating the quality of a built mesh model exists in the literature, so we evaluate the models subjectively: if a model describes the basic shape of the human face, we consider it successful; if not, the model is incorrect.
On further observation, we found that the main cause of modeling failure is wrong localization of the nose tip, which results in incorrect initialization and thus in the final error. The wrong detection of the nose tip can be explained by the following reasons.
[Figure 8. The point clouds for one individual from session 1 (first, second and third shots).]

Reason 1: The up/down rotation is too large. If the jaw is in front of the nose tip, a jaw point is taken as the nose tip; the mesh then models the shape of the jaw, not of the frontal face. We can rotate the point cloud about the X-axis by a small angle, after which the nose tip detection and face modeling are correct.
Reason 2: The data contains too much noise. The quality of the data in the automatic DB is not as good as that in the manual DB, and the noise causes failure of the nose tip detection. There is no effective method to eliminate the noise in the 3D data, so we have to initialize the nose tip manually, and the modeling result is barely acceptable.
Reason 3: The data is too sparse. There is not enough information to guide the modeling process, and the result is so bad that the face shape is not described. Fortunately, this case is very rare.
Table 1 summarizes the ratio of incorrect models due to the different reasons in the different databases. Because the data in the manual DB has better quality than that in the automatic DB, all the point clouds in the manual DB can be modeled successfully. Noise is the main factor affecting the models in the automatic DB. We can adopt some schemes to remedy the modeling distortion due to noise (Reason 2), but there is no good method to compensate for a shortage of points (Reason 3); fortunately, such data is fairly rare.
In summary, the proposed method is robust enough to overcome small holes and can avoid the influence of noise to a certain extent. The modeling results are sufficient for the recognition and verification experiments.

4.3 3D mesh model rotation
The three shots of each person in 3D_RMA have different poses. With respect to the defined coordinate system, the first shot is central, approximately facing the positive direction of the Z-axis; the second shot has a limited rotation about the Y-axis; and the third shot has a limited rotation about the X-axis. To get a correct matching value, it is necessary to rotate the compared models to the same pose. Section 3 proposed the rotation algorithm, and we design the following experiment to
Table 1. Ratio of incorrect models
Database                  Reason 1   Reason 2   Reason 3   Total
Automatic DB, session 1   0          1.1%       1.1%       2.2%
Automatic DB, session 2   1.1%       2.2%       1.1%       4.4%
Manual DB, session 1      0          0          0          0
Manual DB, session 2      0          0          0          0
[Figure 9. Face mesh models: (a) and (b) show mesh models from different persons.]
[Figure 10. The scheme for dealing with separated data: (a) points separated from the main point cloud; (b) nose tip detection; (c) mesh models.]

test its validity.
We duplicate one mesh model and rotate the duplicate by a known angle about the X-axis and the Y-axis. The rotated model is then regulated back using the proposed algorithm. If the algorithm succeeds, the rectified angle equals the known angle and the matching value is approximately zero. In one experiment, we rotate the model only about the X-axis, as shown in Fig.11, and Table 2 shows the experimental results. In another experiment, the model is rotated about both the X-axis and the Y-axis, as shown in Table 3. Because the normalized data in 3D_RMA is approximately vertical, rotation about the Z-axis is not considered.
In both tables, the rx and ry rows give the angles by which the duplicated model is rotated about the X-axis and Y-axis, and the diff row gives the distance between the two models before regulation; the rx', ry' and diff' rows show the regulated angles and the distance after regulation. From Table 2, we can see that a large angle (about 80 degrees) about the X-axis is rectified well and the matching value becomes close to zero, although a small error appears about the Y-axis. From Table 3, rotations within 30 degrees about both the X-axis and the Y-axis can be rectified simultaneously. Since the rotation of the shots in 3D_RMA is less than 30 degrees, our rotation algorithm is effective in our application.

4.4 Recognition performance
In this section, we will test the verification performance, identification performance and comparison performance with
Table 2. Rotation along X-axis
rx              10     20     30     40     50     60     70     80     90
rx'             9.67   19.55  29.42  38.64  48.13  58.66  68.60  78.19  82.81
ry'             0.07   0.03   -0.12  0.05   0.07   -1.40  -0.25  -0.59  6.10
diff (x10^4)    2.790  40.14  159.1  818.9  2911   6114   12718  27088  70114
diff' (x10^4)   0.00   0.00   0.00   0.02   0.13   0.24   0.13   0.12   0.86
* rx: manual rotation angle about X; rx': regulated angle about X; ry': regulated angle about Y; diff: distance before regulation; diff': distance after regulation.

Table 3. Rotation along X-axis and Y-axis
rx              1      5      10     20      30      40
ry              1      5      10     20      30      40
rx'             0.97   4.83   9.90   20.53   31.50   44.10
ry'             0.97   4.89   9.82   18.96   26.34   30.83
diff (x10^4)    0.00   1.37   19.12  252.46  131.99  2775.08
diff' (x10^4)   0.00   0.00   0.00   0.13    2.04    16.57
* rx: manual rotation angle about X; rx': regulated angle about X; ry: manual rotation angle about Y; ry': regulated angle about Y; diff: distance before regulation; diff': distance after regulation.

[Figure 11. Experiment for rotating two models to the same pose. The left image shows the original model and the duplicated, rotated model; in the right image, the two models have been rotated to the same pose and the matching value is approximately zero.]

the common evaluation methods; at the same time, a comparative analysis with previous work is given. In all the experiments of this section, the incorrect models in the automatic DB are retained in order to measure the performance of the fully automatic recognition system.
1) Verification performance
We test the verification performance on the single-session sets and the mixed-session sets of 3D_RMA respectively. In the single-session tests, the gallery set contains the mesh models from the first shot of all subjects, and the probe set contains the remaining models. In the session 1-2 test, the gallery set contains the models from the first shot of session 1, and the probe set contains all the models of session 2 plus the remaining models of session 1. Fig.12 shows the ROC curves for the different sets.
In Fig.12, the ROC (Receiver Operating Characteristic) curve, which shows the trade-off between the False Rejection Rate (FRR) and the False Acceptance Rate (FAR), illustrates the verification performance of the proposed algorithm. Compared with the algorithm reported by Beumier et al. [17], our method performs better on the manual DB in the single-session tests.
From Fig.12, we can also see that the manual DB gives better performance than the automatic DB, since the quality of its data is better; the incorrect mesh models strongly affect the performance of the whole system. The set combining the two sessions has a higher EER than the single sessions because the subjects' appearance differs between the sessions (glasses, moustaches, etc.).
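For reference, FAR and FRR at a given decision threshold can be computed from the genuine and impostor matching values as follows (a sketch with hypothetical names, assuming a comparison is accepted when its matching value falls below the threshold, since a smaller diff means more similar models):

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one decision threshold.

    genuine: matching values of same-person comparisons;
    impostor: matching values of different-person comparisons.
    A comparison is accepted when its value is below the threshold.
    """
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    far = np.mean(impostor < threshold)   # impostors wrongly accepted
    frr = np.mean(genuine >= threshold)   # genuine pairs wrongly rejected
    return far, frr

def roc_points(genuine, impostor):
    """Sweep the threshold over all observed values to trace the ROC."""
    ts = np.unique(np.concatenate([genuine, impostor, [np.inf]]))
    return [far_frr(genuine, impostor, t) for t in ts]
```

The EER is then the point on this curve where FAR and FRR are (approximately) equal.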
2) Identification performance
Here we use the CMS curve to evaluate the recognition performance. The Cumulative Match Score (CMS) with respect to rank was proposed in the FERET evaluation as a standard for face recognition [20]. The rank is an integer N denoting the first N best matches in one recognition test, and the CMS is the ratio of the number of correct recognition tests to the total number of tests.
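Under these definitions, a CMS curve can be computed from a matrix of matching values as follows (a sketch with hypothetical names, assuming smaller matching values mean more similar models as in Equ. 6, and that every probe identity is present in the gallery):

```python
import numpy as np

def cms_curve(dist, probe_ids, gallery_ids):
    """Cumulative Match Score: cms[r-1] is the fraction of probes whose
    correct gallery identity appears among the r best matches.

    dist: (P, G) matching values (smaller = more similar);
    probe_ids / gallery_ids: identity labels of the probes and gallery.
    """
    G = len(gallery_ids)
    hits = np.zeros(G)
    for p, row in enumerate(dist):
        order = np.argsort(row)          # best match first
        ranks = [r for r, g in enumerate(order) if gallery_ids[g] == probe_ids[p]]
        hits[min(ranks)] += 1            # rank of the first correct match
    return np.cumsum(hits) / len(probe_ids)
```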
[Figure 12. Verification performance in the form of ROC curves: (a) Automatic DB and (b) Manual DB, each for session 1, session 2 and session 1-2.]
[Figure 13. Identification performance in the form of CMS curves: (a) Automatic DB and (b) Manual DB, each for session 1, session 2 and session 1-2.]

The recognition experiments use the same gallery and probe sets as the verification experiments. The experimental results on the different databases are plotted as CMS curves in Fig.13. Overall, the identification performance on the manual DB is better than that on the automatic DB.
Compared with the recognition results of the 2D image-based algorithms reported by Phillips et al. [20], our recognition performance is better, given that the data is sampled under varying conditions and that the algorithm is completely automatic. However, the size of our gallery and probe sets is rather small.
3) Comparison performance
We adopt the method described by Gordon [21] to estimate the comparison performance. For successful recognition, the difference between two instances of the same face should be smaller than the difference between two instances of different faces. If there are m subjects and n instances of each subject in a database, we can divide it into two sets:

    A = {a_1, a_2, ..., a_n},    B = {b_1, b_2, ..., b_{(m-1)n}}.

Set A contains all the instances of one individual, and set B contains all the instances of the other subjects. A comparison is correct if, for j ≠ i,

    diff(a_i, a_j) < diff(a_i, b_k),

where i, j ≤ n and k ≤ (m-1)n. We test how often this relationship holds over all comparisons. For instance, if two instances of the same person are compared and 10 instances of different persons have lower matching values, this counts as 10 incorrect comparisons. In our case, we use the first shot as the target, that is, i = 1. Thus the total number of comparisons is m(n-1)(m-1)n. Table 4 shows the comparison results on the different sets of the manual DB.
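This counting procedure can be sketched as follows (our own illustration; the names are hypothetical, and the full matrix of pairwise matching values is assumed to be precomputed):

```python
import numpy as np

def comparison_success_rate(diff, subject):
    """Fraction of correct comparisons in the sense of Gordon [21].

    diff: (T, T) symmetric matching values between all instances;
    subject: length-T identity labels.  For each subject the first
    instance is the target; every (same-subject j, other-subject k)
    pair with diff[target, j] < diff[target, k] counts as correct.
    """
    subject = np.asarray(subject)
    correct = total = 0
    for s in np.unique(subject):
        target = np.flatnonzero(subject == s)[0]     # first shot
        same = [j for j in np.flatnonzero(subject == s) if j != target]
        other = np.flatnonzero(subject != s)
        for j in same:
            for k in other:
                total += 1
                correct += diff[target, j] < diff[target, k]
    return correct / total
```

With m subjects and n instances each, the inner loops visit exactly m(n-1)(m-1)n pairs, matching the count above.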
In their work, Gordon [21] only reported comparison performance on a smaller 3D database, and their results are better than ours. In our opinion, there are three main reasons: (a) their 3D data was obtained with a Cyberware laser scanner, so its quality is much better than ours; (b) their database is small, only 8 persons and 24 instances; (c) their data contains no eyeglasses or beards, and no pose differences. Furthermore, when they tested on a somewhat larger set of about 48 instances, the performance dropped slightly. Unfortunately, we cannot obtain their database, so a quantitative comparison is not possible. Our algorithm is tested on a much bigger database, which makes the results more convincing.
4.5 Discussion
The main obstacle to 3D face recognition is the shortage of 3D face databases. Although 3D capture techniques are
improving quickly, public databases for testing algorithms are not as widely available as 2D intensity image databases.
Our algorithm is tested on 3D_RMA, perhaps the largest publicly available 3D database, so the results are more
convincing than those in [21]. Further, we are also trying to build a larger database for testing algorithms.
Algorithms dealing with 3D data are usually time-consuming, which also reduces the appeal of 3D recognition. In our
recognition framework, the computational time is tolerable: on our PC, modeling each individual takes about 0.5 s on
average, and comparing two models takes about 0.7 s. The surface matching approach in [17] uses 15 profiles to
describe the 3D shape and adjusts six parameters during matching, which costs much time. In [4,5], a generic model
and a single image are used to build the individual model; although their matching method is fast, the modeling
process is very slow.
The size of the mesh model affects the performance of identification and verification. We find that recognition accuracy
is best when the basic mesh has size 0.7 x 0.9 (normalized) and is symmetric with respect to the central nodes. This is
reasonable, since such a mesh covers the nose, eyes and mouth in most instances, which characterize the individual,
while excluding the noisy points around the edge of the point cloud. Furthermore, some facial areas, such as the mouth
and jaw, are strongly affected by expression variation. We can tune the size or shape of the mesh so that these areas
are not included in the mesh model, making the method largely immune to expression. Of course, more experiments
are needed to validate this.
Table 4. Rate of incorrect comparisons on different sets of Manual DB

Database                  Total No. of comparisons   No. of error comparisons   Rate of incorrect comparisons
Manual DB, session 1      5220                       151                        2.89%
Manual DB, session 2      5220                       191                        3.66%
Manual DB, sessions 1-2   26100                      1047                       4.01%
Another factor affecting recognition accuracy is noise, as seen from the identification and verification results. The
quality of the data affects the precision of the mesh model and, in turn, the recognition accuracy. The data in Manual
DB has better quality than that in Automatic DB, which explains the better recognition performance shown in the
sections above. As 3D capture devices improve, data quality is increasing and prices are falling, so this factor may
not be a key problem in future applications.
In addition, other factors such as glasses, hairstyle, moustaches and masks affect 3D capture and thus recognition
performance. Obviously, all face recognition methods are confronted with these common problems. However, the
proposed method is robust to makeup and illumination to a certain extent, since it does not depend on grey-level
information.
In this paper, our main aim is to test the performance of comparing mesh models, so we use only the simplest
classification method. Adopting a better classifier could improve verification and identification performance, and the
gallery set could also be organized more effectively.
5. Conclusion
In this paper, a new idea of describing the human face with a 3D mesh is proposed. To realize this idea, a robust
modeling method and matching scheme are developed. Tested on 3D_RMA, perhaps the largest publicly available 3D
database, the proposed algorithm appears promising. In the future, we will focus on noise reduction and modeling,
which are the key problems strongly affecting recognition accuracy.
Acknowledgements
This work is supported by research funds from the Natural Science Foundation of China (Grant Nos. 60121302 and
60332010) and the Outstanding Overseas Chinese Scholars Fund of CAS (No. 2001-2-8). We would like to thank Dr. C.
Beumier for the database 3D_RMA, and our colleagues for their constructive suggestions.
References

[1] W. Zhao, R. Chellappa, A. Rosenfeld and P.J. Phillips, Face recognition: a literature survey, ACM Computing
Surveys, Vol. 35, No.4, pp.399-458, 2003.
[2] G.G. Gordon, Face recognition based on depth maps and surface curvature, SPIE Proc., Vol 1570: Geometric
Methods in Computer Vision, pp.234-247, 1991.
[3] J.C. Lee, and E. Milios, Matching range images of human faces, Proc. ICCV90, pp.722-726, 1990.
[4] S. Romdhani, V. Blanz and T. Vetter, Face identification by matching a 3D morphable model using linear shape and
texture error functions, Proc. ECCV02, Vol.4, pp.3-19, 2002.
[5] V. Blanz, S. Romdhani and T. Vetter, Face identification across different poses and illumination with a 3D morphable
model, Proc. IEEE International Conference on Automatic Face and Gesture Recognition, pp.202-207, 2002.
[6] Y. Shan, Z. Liu and Z. Zhang, Model-based bundle adjustment with application to face modeling, Proc. ICCV01,
pp.644-651, July 2001.
[7] Z. Liu, Z. Zhang, C. Jacobs, and M. Cohen, Rapid modeling of animated faces from video, Technical Report,
MSR-TR-2000-11, Microsoft Research, Microsoft Corporation, 2000.
[8] W.S. Lee and N.M. Thalmann, Head modeling from pictures and morphing in 3D with image metamorphosis based
on triangulation, Captech98 (Modelling and Motion Capture Techniques for Virtual Environments), pp.254-267,
1998.
[9] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin, Synthesizing realistic facial expressions from
photographs, SIGGRAPH Proceedings, pp.75-84, 1998.
[10] Y.C. Lee, D. Terzopoulos, and K. Waters, Realistic face modeling for animation, SIGGRAPH Proceedings,
pp.55-62, 1995.
[11] S.R. Marschner, B. Guenter, and S. Raghupathy, Modeling and rendering for realistic facial animation, Proc. 11th
Eurographics Rendering Workshop, pp.231-242, 2000.
[12] D. DeCarlo, D. Metaxas, and M. Stone, An anthropometric face model using variational techniques, SIGGRAPH
Proceedings, pp.67-74, 1998.
[13] R. Franke, Scattered data interpolation: tests of some methods, Mathematics of Computation, Vol.38, pp.181-200,
1982.
[14] Y. Yacoob, and L.S. Davis, Labeling of human face components from range data, CVGIP: Image Understanding,
60(2):168-178, 1994.
[15] P. Fua, Using model-driven bundle-adjustment to model heads from raw video sequences, Proc. ICCV99, pp.46-53,
1999.
[16] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical recipes in C: the art of scientific
computation, Cambridge University Press, Cambridge, second edition, 1992.
[17] C. Beumier and M. Acheroy, Automatic 3D face authentication, Image and Vision Computing, 18(4):315-321, 2000.
[18] C. Beumier and M. Acheroy, Automatic face authentication from 3D surface, BMVC98, pp.449-458, 1998.
[19] C. Beumier and M. Acheroy, 3D facial surface acquisition by structured light, International Workshop on
Synthetic-Natural Hybrid Coding and Three dimensional Imaging, pp.103-106, 1999.
[20] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, The FERET evaluation methodology for face-recognition
algorithms, IEEE Transactions on PAMI, 22(10):1090-1104, 2000.
[21] G.G. Gordon, Face recognition based on depth and curvature features, Proc. CVPR92, pp.108-110, 1992.