
Coding, Analysis, Interpretation, and Recognition of Facial Expressions

Irfan A. Essa

College of Computing, GVU Center,


Georgia Institute of Technology,
Atlanta, GA 30332-0280, USA.
irfan@cc.gatech.edu.

Abstract

We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion.

Previous efforts at analysis of facial expression have been based on the Facial Action Coding System (FACS), a representation developed to allow human psychologists to code expression from static pictures. To avoid the use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+.

Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.

Keywords: Facial Expression Analysis, Expression Recognition, Face Processing, Emotion Recognition, Facial Analysis, Motion Analysis, Perception of Action, Vision-based HCI.

1. Introduction

Faces are much more than keys to individual identity: they play a major role in communication and interaction, which makes machine understanding, perception and modeling of human expression an important problem in computer vision. There is a significant amount of research on facial expressions in computer vision and computer graphics (see [10, 23] for a review). Perhaps the most fundamental problem in this area is how to categorize active and spontaneous facial expressions to extract information about the underlying emotional states [6]. Although a large body of work dealing with human perception of facial motion exists, there have been few attempts to develop objective methods for quantifying facial movements.

Perhaps the most important work in this area is that of Ekman and Friesen [9], who have produced the most widely used system for describing visually distinguishable facial movements. This system, called the Facial Action Coding System or FACS, is based on the enumeration of all "action units" of a face which cause facial movements.

However, a well-recognized limitation of this method is the lack of temporal and detailed spatial information (at both local and global scales) [10, 23]. Additionally, the heuristic "dictionary" of facial actions originally developed for FACS-based coding of emotion has, after initial experimentation, proven quite difficult to adapt for machine recognition of facial expression.

To improve this situation we would like to objectively quantify facial movements using computer vision techniques. Consequently, the goal of this paper is to provide a method for extracting an extended FACS model (FACS+) by coupling optical flow techniques with a dynamic model of motion, be it a physics-based model of both skin and muscle, a geometric representation of a face, or a motion-specific model.

We will show that our method is capable of detailed, repeatable facial motion estimation in both time and space, with sufficient accuracy to measure previously unquantified muscle coarticulations, and that it relates facial motions to facial expressions. We will further demonstrate that the parameters extracted using this method provide improved accuracy for analysis, interpretation, coding and recognition of facial expression.

1.1 Background

Representations of Facial Motion: Ekman and Friesen [9] have produced a system for describing "all visually distinguishable facial movements", called the Facial Action Coding System or FACS. It is based on the enumeration of all "action units" (AUs) of a face that cause facial movements. There are 46 AUs in FACS that account for changes in facial expression. The combinations of these action units result in a large set of possible facial expressions.

For example, a smile expression is considered to be a combination of "pulling lip corners (AU 12+13) and/or mouth opening (AU 25+27) with upper lip raiser (AU 10) and a bit of furrow deepening (AU 11)." However, this is only one type of smile; there are many variations of the above motions, each having a different intensity of actuation. Despite its limitations, this method is the most widely used method for measuring human facial motion for both human and machine perception.

Tracking facial motion: There have been several attempts to track facial expressions over time. Mase and Pentland [20] were perhaps the first to track action units using optical flow. Although their method was simple, without a physical model and formulated statically rather than within a dynamic optimal estimation framework, the results were sufficiently good to show the usefulness of optical flow for observing facial motion.

Terzopoulos and Waters [29] developed a much more sophisticated method that tracked linear facial features to estimate corresponding parameters of a three-dimensional wireframe face model, allowing them to reproduce facial expressions. A significant limitation of this system is that it requires that facial features be highlighted with make-up for successful tracking.

Haibo Li, Pertti Roivainen and Robert Forchheimer [18] describe an approach in which a control feedback loop between what is being visualized and what is being analyzed is used for a facial image coding system. Their work is the most similar to ours, but both our goals and implementation are different. The main limitation of their work is the lack of detail in motion estimation, as only large, predefined areas were observed and only affine motion was computed within each area. These limits may be an acceptable loss of quality for image coding applications. However, for our purposes this limitation is severe; it means we cannot observe the "true" patterns of dynamic model changes (i.e., muscle actuations) because the method assumes the FACS model as the underlying representation. We are also interested in developing a representation that is not dependent on FACS and is suitable not just for tracking, but for recognition and analysis.

Recognition of Facial Motion: Recognition of facial expressions can be achieved by categorizing a set of such predetermined facial motions as in FACS, rather than determining the motion of each facial point independently. This is the approach taken by several researchers [19, 20, 33, 4] for their recognition systems. Yacoob and Davis, who extend the work of Mase, detect motion (only in eight directions) in six predefined and hand-initialized rectangular regions on a face, and then use simplifications of the FACS rules for the six universal expressions for recognition. The motion in these rectangular regions, from the last several frames, is correlated to the FACS rules for recognition. Black and Yacoob extend this method, using local parameterized models of image motion to deal with large-scale head motions. These methods show about 90% accuracy at recognizing expressions in a database of 105 expressions [4, 33]. Mase [19], on a smaller set of data (30 test cases), obtained an accuracy of 80%. In many ways these are impressive results, considering the complexity of the FACS model and the difficulty in measuring facial motion within small windowed regions of the face.

In our view, perhaps the principal difficulty these researchers have encountered is the sheer complexity of describing human facial movement using FACS. Using the FACS representation, there are a very large number of AUs, which combine in extremely complex ways to give rise to expressions. Moreover, there is now a growing body of psychological research that argues that it is the dynamics of the expression, rather than detailed spatial deformations, that is important in expression recognition. Several researchers [1, 2, 6, 7, 8, 17] have claimed that the timing of expressions, something that is completely missing from FACS, is a critical parameter in recognizing emotions. This issue was also addressed in the NSF workshops and reports on facial expressions [10, 23]. To us this strongly suggests moving away from a static, "dissect-every-change" analysis of expression (which is how the FACS model was developed), towards a whole-face analysis of facial dynamics in motion sequences.

2. Visual Coding of Facial Motion

2.1 Vision-based Sensing: Visual Motion

We use optical flow processing as the basis for perception and measurement of facial motion. We have found that Simoncelli's [28] method for optical flow computation, which uses a multi-scale, coarse-to-fine, Kalman filtering-based algorithm, provides good motion estimates and error-covariance information. Using this method we compute the estimated mean velocity vector \hat{v}_i(t), which is the estimated flow from time t to t+1. We also store the flow covariances \Lambda_v between different frames for determining confidence measures and for error corrections in observations for the dynamic model (see Section 2.3 and Figure 3 [observation loop (a)]).
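For concreteness, the sketch below illustrates this sensing step. It is not Simoncelli's Kalman-filter-based flow [28]: it substitutes OpenCV's coarse-to-fine Farneback estimator and derives a crude per-pixel 2x2 covariance from the windowed gradient structure tensor to play the role of \Lambda_v; the window size and the noise/prior variances are illustrative assumptions.

```python
# Illustrative sketch only (not the algorithm of [28]): coarse-to-fine flow via
# OpenCV's Farneback estimator, plus a per-pixel covariance derived from the
# windowed structure tensor to serve as the flow confidence Lambda_v.
import cv2
import numpy as np

def flow_and_covariance(frame_t, frame_t1, win=7, noise_var=1e-2, prior_var=1e2):
    g0 = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2GRAY)

    # Multi-scale, coarse-to-fine estimate of the mean flow v_hat_i(t) from t to t+1.
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 4, win, 3, 5, 1.1, 0)

    # Windowed structure tensor of frame t: textured regions give a small
    # covariance (high confidence), flat regions give a large one.
    f0 = g0.astype(np.float32) / 255.0
    Ix = cv2.Sobel(f0, cv2.CV_32F, 1, 0, ksize=3)
    Iy = cv2.Sobel(f0, cv2.CV_32F, 0, 1, ksize=3)
    Axx = cv2.boxFilter(Ix * Ix, -1, (win, win))
    Axy = cv2.boxFilter(Ix * Iy, -1, (win, win))
    Ayy = cv2.boxFilter(Iy * Iy, -1, (win, win))

    H, W = f0.shape
    A = np.stack([Axx, Axy, Axy, Ayy], axis=-1).reshape(H, W, 2, 2)
    # Posterior covariance of a linear-Gaussian flow model with a broad prior.
    cov = np.linalg.inv(A / noise_var + np.eye(2, dtype=np.float32) / prior_var)
    return flow, cov
```

In a setup of this kind, the flow plays the role of the observations Y(t) in Figure 3, while the covariance weights the corrections applied to the dynamic model.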

2.2 Facial Modeling

A priori information about facial structure is required for our framework. We use a face model which is an elaboration of the facial mesh developed by Platt and Badler [27]. We extend this into a topologically invariant physics-based model by adding anatomically-based muscles to it [11].

In order to conduct analysis of facial expressions and to define a new suitable set of control parameters (FACS+) using vision-based observations, we require a model with time-dependent states and state evolution relationships. FACS and the related AU descriptions are purely static and passive, and therefore the association of a FACS descriptor with a dynamic muscle is inherently inconsistent.

By modeling the elastic nature of facial skin and the anatomical nature of facial muscles we develop a dynamic muscle-based model of the face, including FACS-like control parameters (see [11, 32] for implementation details).

Figure 1. Using the geometric mesh to determine the continuum mechanics parameters of the skin using Finite Element Methods. (The element diagram shows a triangular shell element of thickness t with interpolation weights h_i = A_i / A, where A = A_i + A_j + A_k is the total area and h_i + h_j + h_k = 1.0, assembled over the whole mesh.)

A physically-based dynamic model of a face may be constructed by use of Finite Element methods. These methods give our facial model an anatomically-based facial structure by modeling facial tissue/skin, and muscle actuators, with a geometric model to describe force-based deformations and control parameters [3, 15, 21].

By defining each of the triangles on the polygonal mesh as an isoparametric triangular shell element (shown in Figure 1), we can calculate the mass, stiffness and damping matrices for each element (using dV = t_el dA, where t_el is the thickness), given the material properties of skin (acquired from [26, 30]). Then, by the assemblage process of the direct stiffness method [3, 15], the required matrices for the whole mesh can be determined. As the integration to compute the matrices is done prior to the assemblage of matrices, each element may have a different thickness t_el, although large differences in thickness of neighboring elements are not suitable for convergence [3].

The next step in formulating this dynamic model of the face is the combination of the skin model with a dynamic muscle model. This requires information about the attachment points of the muscles to the face, or in our geometric case the attachment to the vertices of the geometric surface/mesh. The work of Pieper [26] and Waters [32] provides us with the required detailed information about muscles and muscle attachments.
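As a rough illustration of this assemblage step (and only that: the sketch below uses flat constant-strain triangles rather than the isoparametric shell elements of [3, 15], and the material constants and damping coefficients are placeholder values), the element matrices can be scaled by the per-element thickness t_el and accumulated into global matrices by the direct stiffness method:

```python
# A minimal sketch, under simplifying assumptions, of direct-stiffness assembly:
# each mesh triangle is treated as a flat constant-strain element of thickness
# t_el (dV = t_el dA). E, nu, rho are placeholder material values.
import numpy as np

def element_matrices(xy, t_el, E=15e3, nu=0.45, rho=1.1e3):
    """xy: (3, 2) node coordinates of one triangle. Returns (6, 6) Ke and lumped Me."""
    (x1, y1), (x2, y2), (x3, y3) = xy
    A = 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))   # triangle area
    b = np.array([y2 - y3, y3 - y1, y1 - y2])
    c = np.array([x3 - x2, x1 - x3, x2 - x1])
    B = np.zeros((3, 6))                      # strain-displacement matrix
    B[0, 0::2], B[1, 1::2] = b, c
    B[2, 0::2], B[2, 1::2] = c, b
    B /= 2.0 * A
    D = (E / (1.0 - nu ** 2)) * np.array([[1, nu, 0], [nu, 1, 0], [0, 0, (1 - nu) / 2]])
    Ke = t_el * A * B.T @ D @ B               # element stiffness
    Me = np.eye(6) * (rho * t_el * A / 3.0)   # lumped element mass
    return Ke, Me

def assemble(nodes, tris, thickness):
    """Assemble global matrices over the whole mesh; damping as Rayleigh D = a*M + b*K."""
    n = 2 * len(nodes)
    K, M = np.zeros((n, n)), np.zeros((n, n))
    for tri, t_el in zip(tris, thickness):    # per-element thickness allowed
        Ke, Me = element_matrices(nodes[list(tri)], t_el)
        dofs = np.ravel([[2 * v, 2 * v + 1] for v in tri])
        K[np.ix_(dofs, dofs)] += Ke
        M[np.ix_(dofs, dofs)] += Me
    Dmat = 0.01 * M + 0.001 * K               # placeholder Rayleigh damping
    return M, Dmat, K
```

In a formulation of this kind, the muscle actuations would enter the resulting second-order dynamics as applied nodal forces at the attachment vertices.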
2.3 Dynamic Modeling and Estimation

Initialization of Model on an image

In developing a representation of facial motion, and then using it to compare to new data, we need to locate a face and the facial features in the image, followed by a registration of these features for all faces in the database. Initially we started our estimation process by manually translating, rotating and deforming our 3-D facial model to fit a face in an image. To automate this process we are now using the View-based and Modular Eigenspace methods of Pentland and Moghaddam [22, 24].

Using this method we can automatically extract the positions of the eyes, nose and lips in an image, as shown in Figure 2(b). These feature positions are used to warp the face image to match the canonical face mesh (Figure 2(c) and (d)). This allows us to extract the additional "canonical feature points" on the image that correspond to the fixed (non-rigid) nodes on our mesh (Figure 2(f)). After the initial registering of the model to the image, the coarse-to-fine flow computation methods presented by Simoncelli [28] and Wang [31] are used to compute the flow. The model on the face image tracks the motion of the head and the face correctly as long as there is not an excessive amount of rigid motion of the face during an expression. This limitation can be addressed by using methods that attempt to track and stabilize head movements (e.g., [12, 4]).

Figure 2. Initialization on a face image using the Modular Eigenfeatures method with a canonical model of a face. (Panels: (a) original image; (b) eyes, lips and nose extracted; (c) face model; (d) mask of model; (e) warped and masked image; (f) canonical points extracted.)
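A sketch of eigenfeature-style localization in the spirit of [22, 24] is given below: a PCA subspace is learned for each facial feature (eye, nose, lips) from example patches, and the feature is placed where the reconstruction error (the "distance from feature space") is smallest. The training patches, patch sizes and subspace dimension are assumed; this is an illustration, not the initialization code used here.

```python
# Illustrative eigenfeature localization: learn a per-feature PCA subspace from
# example patches, then place the feature at the location with the smallest
# distance-from-feature-space (reconstruction error). Grayscale float images assumed.
import numpy as np

def train_eigenfeature(patches, k=10):
    """patches: (N, h, w) example crops of one feature. Returns (mean, basis, size)."""
    N, h, w = patches.shape
    X = patches.reshape(N, -1).astype(np.float64)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k], (h, w)               # top-k orthonormal basis vectors

def locate_feature(image, model, stride=2):
    """Scan the image; return the (row, col) minimizing reconstruction error."""
    mean, basis, (h, w) = model
    best, best_rc = np.inf, (0, 0)
    H, W = image.shape
    for r in range(0, H - h, stride):
        for c in range(0, W - w, stride):
            x = image[r:r + h, c:c + w].reshape(-1) - mean
            coeff = basis @ x                              # project into feature space
            dffs = np.sum(x * x) - np.sum(coeff * coeff)   # distance from feature space
            if dffs < best:
                best, best_rc = dffs, (r, c)
    return best_rc
```

The recovered eye, nose and lip positions can then drive the warp of the face image to the canonical mesh shown in Figure 2(c) and (d).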

Images to face model

Simoncelli's [28] coarse-to-fine algorithm for optical flow computations provides us with an estimated flow vector \hat{v}_i. Now, using a mapping function M, we would like to compute velocities for the vertices of the face model, v_g. Then, using the physically-based modeling techniques and the relevant geometric and physical models described earlier, we can calculate the forces that caused the motion. Since we are mapping global information from an image (over the whole image) to a geometric model, we have to concern ourselves with translations (vector T) and rotations (matrix R). The Galerkin polynomial interpolation function H and the strain-displacement function B, used to define the mass, stiffness and damping matrices on the basis of the finite element method, are applied to describe the deformable behavior of the model [15, 25, 3].

We would like to use only a frontal view to determine facial motion and model expressions, and this is only possible if we are prepared to estimate the velocities and motions along the third axis (going into the image, the z-axis). To accomplish this, we define a function that does a spherical mapping, S(u, v), where u and v are the spherical coordinates. The spherical function is computed by use of a prototype 3-D model of a face with a spherical parameterization; this canonical face model is then used to wrap the image onto the shape. In this manner, we determine the mapping equation:

    v_g(x, y, z) \approx M(x, y, z) \hat{v}_i(x, y) = H S R ( \hat{v}_i(x, y) + T ).    (1)

For the rest of the paper, unless otherwise specified, whenever we talk about velocities we will assume that the above mapping has already been applied.

Estimation and Control

Driving a physical system with the inputs from noisy motion estimates can result in divergence or a chaotic physical response. This is why an estimation and control framework needs to be incorporated to obtain stable and well-proportioned results. Similar considerations motivated the control framework used in [18]. Figure 3 shows the whole framework of estimation and control of our active facial expression modeling system. The next few sections discuss these formulations.

Figure 3. Block diagram of the control-theoretic approach, showing the estimation and correction loop (a), the dynamics loop (b), and the feedback loop (c). (The diagram connects the control input U(t), the state dynamics producing the state estimates X(t), the observations Y(t) of facial expressions / the motion field, the observation errors weighted by the Kalman gain L, and the feedback gain -G driving the physics-based and geometry-based shape parameters and muscle activations.)

The continuous time Kalman filter (CTKF) allows us to estimate the uncorrupted state vector, and produces an optimal least-squares estimate under quite general conditions [5, 16]. The Kalman filter is particularly well suited to this application because it is a recursive estimation technique, and so does not introduce any delays into the system (keeping the system active). The CTKF for the above system is:

    \dot{\hat{X}} = A \hat{X} + B U + L ( Y - C \hat{X} ),   where   L = \Lambda_e C^T \Lambda_m^{-1},    (2)

where \hat{X} is the linear least squares estimate of X based on Y(\tau) for \tau < t, and \Lambda_e is the error covariance matrix for \hat{X}. The Kalman gain matrix L is obtained by solving the following Riccati equation for the optimal error covariance matrix \Lambda_e:

    (d/dt) \Lambda_e = A \Lambda_e + \Lambda_e A^T + G \Lambda_p G^T - \Lambda_e C^T \Lambda_m^{-1} C \Lambda_e.    (3)

We solve for \Lambda_e in Equation (3) assuming a steady-state system (i.e., (d/dt) \Lambda_e = 0). The Kalman filter, Equation (2), mimics the noise-free dynamics and corrects its estimate with a term proportional to the difference (Y - C\hat{X}), which is the innovations process. This correction is between the observation and our best prediction based on previous data. Figure 3 shows the estimation loop (the bottom loop) which is used to correct the dynamics based on the error predictions.

The optical flow computation method has already established a probability distribution (\Lambda_v(t)) with respect to the observations. We can simply use this distribution in our dynamic observation relationships. Hence we obtain:

    \Lambda_m(t) = M(x, y, z) \Lambda_v(t),   and   Y(t) = M(x, y, z) \hat{v}_i(t).    (4)

Control, Measurement and Correction of Dynamic Motion

Now, using a control theory approach, we will obtain the muscle actuations. These actuations are derived from the observed image velocities. The control input vector U is therefore provided by the control feedback law U = -G X, where G is the control feedback gain matrix. We assume the instance of control under study falls into the category of an optimal regulator (as we need some optimality criteria for an optimal solution [16]). Hence, the optimal control law U^* is given by:

    U^* = - R^{-1} B^T P_c X^*,    (5)

where X^* is the optimal state trajectory and P_c is given by solving yet another matrix Riccati equation [16]. Here Q is a real, symmetric, positive semi-definite state weighting matrix and R is a real, symmetric, positive definite control weighting matrix. Comparing with the control feedback law, we obtain G = R^{-1} B^T P_c. This control loop is also shown in the block diagram in Figure 3 (upper loop (c)).
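Numerically, both Riccati solutions reduce to algebraic Riccati equations once the steady-state assumption is made. The sketch below obtains the steady-state Kalman gain L of Equation (2) and the regulator gain of Equation (5) with SciPy's CARE solver; the system matrices A, B, C, the noise-input matrix G, the covariances \Lambda_p, \Lambda_m and the weights Q, R are assumed to come from the skin/muscle model and are not specified here (the feedback gain is named G_fb to avoid clashing with the noise-input matrix).

```python
# A hedged sketch of Equations (2)-(5) using SciPy: the steady-state filter
# covariance Lambda_e and the regulator matrix P_c both come from algebraic
# Riccati equations; all matrices are assumed to be small and dense.
import numpy as np
from scipy.linalg import solve_continuous_are

def estimator_and_control_gains(A, B, C, G, Lam_p, Lam_m, Q, R):
    # Steady-state filter Riccati (Eq. 3 with d(Lambda_e)/dt = 0), solved in dual form.
    Lam_e = solve_continuous_are(A.T, C.T, G @ Lam_p @ G.T, Lam_m)
    L = Lam_e @ C.T @ np.linalg.inv(Lam_m)       # Kalman gain of Eq. (2)

    # Optimal-regulator Riccati for Eq. (5); feedback gain of U = -G_fb X.
    P_c = solve_continuous_are(A, B, Q, R)
    G_fb = np.linalg.inv(R) @ B.T @ P_c
    return L, G_fb

def estimator_step(x_hat, U, Y, A, B, C, L, dt):
    """One Euler step of the CTKF of Eq. (2): x_hat_dot = A x_hat + B U + L (Y - C x_hat)."""
    innov = Y - C @ x_hat                        # innovations process
    return x_hat + dt * (A @ x_hat + B @ U + L @ innov)
```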

2.4 Motion Templates from the Facial Model

So far we have discussed how we can extract the muscle actuations of an observed expression. These methods have relied on a detailed geometric and/or physics-based description of facial structure. However, our control-theoretic approach can also be used to extract the "corrected" or "noise-free" 2-D motion field that is associated with each facial expression. In other words, although the dynamics of motion are implicit in our analysis, the analysis does not explicitly require a geometric and/or physical model of the structure. The detailed models are there so that we can back-project the facial motion onto these models and use them to extract a representation in the state-space of these models. We could just use the motion and velocity measurements for analysis, interpretation and recognition without using the geometric/physical models. This is possible by using 2-D motion energy templates that encode just the motion. This encoded 2-D motion is then used as the representation for facial action.

The system shown in Figure 3 employs optimal estimation within an optimal control and feedback framework. It maps 2-D motion observations from images onto a dynamic model, and then the estimates of corrected 2-D motion (based on the optimal dynamic model) are used to correct the observation model. Figure 9 and Figure 10 show the corrected flow for the expressions of raise eyebrow and smile, and also show the corrected flow after it has been applied to a dynamic face model. Further corrections are possible by using deformations of the facial skin (i.e., the physics-based model) as constraints in a state-space that only measures image motion.

Figure 4. Determining expressions from video sequences. (a) and (b) show expressions of smile and surprise, (c) and (d) show a 3-D model with surprise and smile expressions, and (e) and (f) show the spatio-temporal motion energy representation of facial motion for these expressions.

By using this methodology without the detailed 3-D geometric and physical models, and back-projecting the facial motion estimates into the image, we can remove the complexity of physics-based modeling from our representation of facial motion. Then, learning the "ideal" 2-D motion views (e.g., motion energy) for each expression, we can characterize spatio-temporal templates for those expressions. Figure 4(e) and (f) shows examples of this representation of facial motion energy. It is this representation of facial motion that we use for generating spatio-temporal "templates" for coding, interpretation and recognition of facial expressions.
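A minimal sketch of this template construction, assuming the flow function from the Section 2.1 sketch, is given below; accumulating squared flow magnitude over the sequence is one simple choice of motion-energy measure, not necessarily the one used to produce Figure 4.

```python
# Illustrative motion-energy templates: accumulate per-frame flow magnitude over
# an expression and average over training sequences to get one template per
# expression. `flow_and_covariance` is the flow sketch from Section 2.1.
import numpy as np

def motion_energy(frames):
    """frames: list of images of one expression. Returns an (H, W) energy map."""
    energy = None
    for f0, f1 in zip(frames[:-1], frames[1:]):
        flow, _ = flow_and_covariance(f0, f1)
        mag2 = flow[..., 0] ** 2 + flow[..., 1] ** 2
        energy = mag2 if energy is None else energy + mag2
    return energy / (len(frames) - 1)

def build_template(sequences):
    """Average motion energy over several example sequences of the same expression."""
    maps = [motion_energy(seq) for seq in sequences]
    return np.mean(maps, axis=0)
```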

3. Analysis and Representations

The goal of this work is to develop a new representation of facial action that more accurately captures the characteristics of facial motion, so that we can employ it in recognition, coding and interpretation of facial motion. The current state of the art for facial descriptions (either FACS itself or muscle-control versions of FACS) has two major weaknesses:

- The action units are purely local spatial patterns. Real facial motion is almost never completely localized; Ekman himself has described some of these action units as an "unnatural" type of facial movement. Detecting a unique set of action units for a specific facial expression is not guaranteed.

- There is no time component of the description, or only a heuristic one. From EMG studies it is known that most facial actions occur in three distinct phases: application, release and relaxation. In contrast, current systems typically use simple linear ramps to approximate the actuation profile. Coarticulation effects are not accounted for in any facial movement.

Other limitations of FACS include the inability to describe fine eye and lip motions, and the inability to describe the coarticulation effects found most commonly in speech. Although the muscle-based models used in computer graphics have alleviated some of these problems [32], they are still too simple to accurately describe real facial motion. Our method lets us characterize the functional form of the actuation profile, and lets us determine a basis set of "action units" that better describes the spatial properties of real facial motion.

Evaluation is an important part of our work, as we need to experiment extensively on real data to measure the validity of our new representation. For this purpose we have developed a video database of people making expressions; the results presented here are based on 52 video sequences of 8 users making 6 different expressions. These expressions were all acquired at 30 frames per second at full NTSC video resolution. Currently these subjects are video-taped while making an expression on demand. These "on demand" expressions have the limitation that the subject's emotion generally does not relate to his/her expression. However, we are for the moment more interested in measuring facial motion, not human emotion. In the next few paragraphs, we will illustrate the resolution of our representation using the smile and eyebrow raising expressions. Questions of repeatability and accuracy will also be briefly addressed.

3.1 Spatial Patterning

To illustrate that our new parameters for facial expressions are more spatially detailed than FACS, comparisons of the expressions of raising eyebrow and smile produced by standard FACS-like muscle activations and by our visually extracted muscle activations are shown in Figure 5 and Figure 6.

Figure 5. FACS/Candide deformation vs. observed deformation for the Raising Eyebrow expression. Surface plots show the magnitude of control point deformation over time for FACS action AU 2 (top) and for an actual video sequence of raising eyebrows (bottom).

Figure 6. FACS/Candide deformation vs. observed deformation for the Happiness expression. Surface plots show the magnitude of control point deformation over time for FACS action AU 12 (top) and for an actual video sequence of happiness (bottom).

The top row of Figure 5 shows AU 2 ("Raising Eyebrow") from the FACS model and the linear actuation profile of the corresponding geometric control points. This is the type of spatio-temporal patterning commonly used in computer graphics animation. The bottom row of Figure 5 shows the observed motion of these control points for the expression of raising eyebrow by Paul Ekman. This plot was obtained by mapping the motion onto the FACS model and measuring the actuations of the control points. As can be seen, the observed pattern of deformation is very different from that assumed in the standard implementation of FACS. There is a wide distribution of motion through all the control points, not just around the largest activation points.

Similar plots for the smile expression are shown in Figure 6. These observed distributed patterns of motion provide a detailed representation of facial motion that, as we will show, is sufficient for accurate characterization of facial expressions.

3.2 Temporal Patterning

Another important observation about facial motion that is apparent in Figure 5 and Figure 6 is that the facial motion is far from linear in time. This observation becomes much more important when facial motion is studied with reference to muscles, which are in fact the effectors of facial motion and the underlying parameters for differentiating facial movements using FACS.

The top rows of Figure 5 and Figure 6, which show the development of the FACS expressions, can only be represented by a muscle actuation that has a step-function profile. Figure 7 and Figure 8 show plots of facial muscle actuations for the observed smile and eyebrow raising expressions. For the purpose of illustration, in these figures the 36 face muscles were combined into seven local groups on the basis of their proximity to each other and to the regions they affected. As can be seen, even the simplest expressions require multiple muscle actuations.

Of particular interest is the temporal patterning of the muscle actuations. We have fit exponential curves to the activation and release portions of the muscle actuation profile to suggest the type of rise and decay seen in EMG studies of muscles. From this data we suggest that the relaxation phase of muscle actuation is mostly due to passive stretching of the muscles by residual stress in the skin.
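The sketch below shows one way such a fit can be carried out, assuming per-frame actuation samples for a single muscle group and a known peak frame separating the application and release phases; the two functional forms, a(e^{bx} - 1) and a(e^{(c - bx)} - 1), are those overlaid in Figures 7 and 8.

```python
# Illustrative fit of the exponential rise/decay forms to a muscle actuation
# profile; the phase boundary (peak frame) is assumed to be given.
import numpy as np
from scipy.optimize import curve_fit

def rise(x, a, b):
    return a * (np.exp(b * x) - 1.0)           # application phase: a(e^{bx} - 1)

def decay(x, a, b, c):
    return a * (np.exp(c - b * x) - 1.0)       # release phase: a(e^{(c - bx)} - 1)

def fit_actuation_profile(actuation, peak_frame):
    t = np.arange(len(actuation), dtype=float)
    (a1, b1), _ = curve_fit(rise, t[:peak_frame + 1], actuation[:peak_frame + 1],
                            p0=(0.5, 0.5), maxfev=5000)
    (a2, b2, c2), _ = curve_fit(decay, t[peak_frame:], actuation[peak_frame:],
                                p0=(0.5, 0.5, float(peak_frame)), maxfev=5000)
    return (a1, b1), (a2, b2, c2)
```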

Figure 7. Actuations over time of the seven main muscle groups for the expression of raising brow. The plot shows actuations over time for the seven muscle groups and the expected profile of the application, release and relax phases of muscle activation; curves of the form a(e^{bx} - 1) and a(e^{(c - bx)} - 1) are fit to the application and release portions.

Figure 8. Actuations over time of the seven main muscle groups for the expression of smiling (lip motion). The plot shows actuations over time for the seven muscle groups, a second delayed peak of actuation, and the expected profile of the application, release and relax phases of muscle activation.

Note that Figure 8 for the smile expression also shows a second, delayed actuation of muscle group 7, about 3 frames after the peak of muscle group 1. Muscle group 7 includes all the muscles around the eyes and, as can be seen in Figure 7, is the primary muscle group for the raising eyebrow expression. This example illustrates that coarticulation effects can be observed by our system, and that they occur even in quite simple expressions. By using these observed temporal patterns of muscle activation, rather than simple linear ramps or heuristic approaches to representing temporal changes, we can more accurately characterize facial expressions.

3.3 Characterization of Facial Expressions

One of the main advantages of the methods presented here is the ability to use real imagery to define representations for different expressions. As we discussed in the last section, we do not want to rely on pre-existing models of facial expression, as they are generally not well suited to our interests and needs. We would rather observe subjects making expressions and use the measured motion, either muscle actuations or 2-D motion energy, to accurately characterize each expression.

Our initial experimentation on automatic characterization of facial expression is based on 52 image sequences of 8 people making the expressions of smile, surprise, anger, disgust, raise brow, and sad. Some of our subjects had problems making the expression of sad, therefore we have decided to exclude that expression from our study. Complete details of our work on expression recognition using the representations discussed here appear elsewhere [14]. Using two different methods, one based on our detailed physical model and the other on our 2-D spatio-temporal motion energy templates, both showed recognition accuracy rates of 98%.
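For illustration only, a nearest-template decision over the motion-energy maps could be sketched as below; the Euclidean distance and the template set are assumptions here, and the actual recognition procedures (for both the physical-model and motion-energy methods) are described in [14].

```python
# Illustrative nearest-template classification over motion-energy maps.
# `motion_energy` is the sketch from Section 2.4; `templates` maps expression
# names to (H, W) motion-energy templates built from training sequences.
import numpy as np

def classify_expression(frames, templates):
    probe = motion_energy(frames)
    scores = {name: np.linalg.norm(probe - tmpl) for name, tmpl in templates.items()}
    return min(scores, key=scores.get)
```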

Figure 9. The left figure shows a motion field for the raise eyebrow expression from the optical flow computation, and the right figure shows the motion field after it has been mapped to a dynamic face model using the control-theoretic approach of Figure 3.

Figure 10. The left figure shows a motion field for the smile expression from the optical flow computation, and the right figure shows the motion field after it has been mapped to a dynamic face model using the control-theoretic approach of Figure 3.

4. Discussion and Conclusions

We have developed a mathematical formulation and implemented a computer vision system capable of detailed analysis of facial expressions within an active and dynamic framework. The purpose of this system is to analyze real facial motion in order to derive an improved model (FACS+) of the spatial and temporal patterns exhibited by the human face.

This system analyzes facial expressions by observing expressive articulations of a subject's face in video sequences. The visual observation (sensing) is achieved by using an optimal optical flow method. This motion is then coupled to a physical model describing the skin and muscle structure, and the muscle control variables are estimated.

By observing the control parameters over a wide range of facial motions, we can then extract a minimal parametric representation of facial control. We can also extract a minimal parametric representation of facial patterning, a representation useful for static analysis of facial expression.

We have used this representation in real-time tracking and synthesis of facial expressions [13] and have experimented with expression recognition. Currently our expression recognition accuracy is 98% on a database of 52 sequences, using either our muscle models or 2-D motion energy models for classification [14].

We are working on expanding our database to cover many other expressions and also expressions with speech. Categorization of human emotion on the basis of facial expression is an important topic of research in psychology, and we believe that our methods can be useful in this area. We are at present collaborating with several psychologists on this problem and procuring funding to undertake controlled experiments in the area with more emphasis on evaluation and validity.

Acknowledgments

We would like to thank Eero Simoncelli, John Wang, Trevor Darrell, Paul Ekman, Steve Pieper, Keith Waters, Steve Platt, Norm Badler and Nancy Etcoff for their help with various parts of this project.

References

[1] J. N. Bassili. Facial motion in the perception of faces and of emotional expression. Journal of Experimental Psychology, 4:373–379, 1978.
[2] J. N. Bassili. Emotion recognition: The role of facial motion and the relative importance of upper and lower areas of the face. Journal of Personality and Social Psychology, 37:2049–2059, 1979.
[3] K.-J. Bathe. Finite Element Procedures in Engineering Analysis. Prentice-Hall, 1982.
[4] M. J. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In Proceedings of the International Conference on Computer Vision, pages 374–381. IEEE Computer Society, Cambridge, MA, 1995.
[5] R. G. Brown. Introduction to Random Signal Analysis and Kalman Filtering. John Wiley & Sons Inc., 1983.
[6] V. Bruce. Recognising Faces. Lawrence Erlbaum Associates, 1988.
[7] J. S. Bruner and R. Taguiri. The perception of people. In Handbook of Social Psychology. Addison-Wesley, 1954.
[8] P. Ekman. The argument and evidence about universals in facial expressions of emotion. In H. Wagner and A. Manstead, editors, Handbook of Social Psychophysiology. Lawrence Erlbaum, 1989.
[9] P. Ekman and W. V. Friesen. Facial Action Coding System. Consulting Psychologists Press Inc., 577 College Avenue, Palo Alto, California 94306, 1978.
[10] P. Ekman, T. Huang, T. Sejnowski, and J. Hager (editors). Final Report to NSF of the Planning Workshop on Facial Expression Understanding. Technical report, National Science Foundation, Human Interaction Lab., UCSF, CA 94143, 1993.
[11] I. Essa. Analysis, Interpretation, and Synthesis of Facial Expressions. PhD thesis, Massachusetts Institute of Technology, MIT Media Laboratory, Cambridge, MA 02139, USA, 1994.
[12] I. Essa, S. Basu, T. Darrell, and A. Pentland. Modeling, tracking and interactive animation of faces and heads using input from video. In Proceedings of Computer Animation Conference 1996, pages 68–79. IEEE Computer Society Press, June 1996.
[13] I. Essa, T. Darrell, and A. Pentland. Tracking facial motion. In Proceedings of the Workshop on Motion of Nonrigid and Articulated Objects, pages 36–42. IEEE Computer Society, 1994.
[14] I. Essa and A. Pentland. Facial expression recognition using a dynamic model and motion energy. In Proceedings of the International Conference on Computer Vision, pages 360–367. IEEE Computer Society, Cambridge, MA, 1995.
[15] I. A. Essa, S. Sclaroff, and A. Pentland. A unified approach for physical and geometric modeling for graphics and animation. Computer Graphics Forum, The International Journal of the Eurographics Association, 2(3), 1992.
[16] B. Friedland. Control System Design: An Introduction to State-Space Methods. McGraw-Hill, 1986.
[17] C. E. Izard. Facial expressions and the regulation of emotions. Journal of Personality and Social Psychology, 58(3):487–498, 1990.
[18] H. Li, P. Roivainen, and R. Forchheimer. 3-D motion estimation in model-based facial image coding. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(6):545–555, June 1993.
[19] K. Mase. Recognition of facial expression from optical flow. IEICE Transactions, Special Issue on Computer Vision and its Applications, E74(10), 1991.
[20] K. Mase and A. Pentland. Lipreading by optical flow. Systems and Computers, 22(6):67–76, 1991.
[21] D. Metaxas and D. Terzopoulos. Shape and nonrigid motion estimation through physics-based synthesis. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(6):581–591, 1993.
[22] B. Moghaddam and A. Pentland. Face recognition using view-based and modular eigenspaces. In Automatic Systems for the Identification and Inspection of Humans, volume 2277. SPIE, 1994.
[23] C. Pelachaud, N. Badler, and M. Viaud. Final Report to NSF of the Standards for Facial Animation Workshop. Technical report, National Science Foundation, University of Pennsylvania, Philadelphia, PA 19104-6389, 1994.
[24] A. Pentland, B. Moghaddam, and T. Starner. View-based and modular eigenspaces for face recognition. In Computer Vision and Pattern Recognition Conference, pages 84–91. IEEE Computer Society, 1994.
[25] A. Pentland and S. Sclaroff. Closed form solutions for physically based shape modeling and recovery. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(7):715–729, July 1991.
[26] S. Pieper, J. Rosen, and D. Zeltzer. Interactive graphics for plastic surgery: A task level analysis and implementation. Computer Graphics, Special Issue: ACM SIGGRAPH 1992 Symposium on Interactive 3D Graphics, pages 127–134, 1992.
[27] S. M. Platt and N. I. Badler. Animating facial expression. ACM SIGGRAPH Conference Proceedings, 15(3):245–252, 1981.

[28] E. P. Simoncelli. Distributed Representation and Analysis of Visual
Motion. PhD thesis, Massachusetts Institute of Technology, 1993.
[29] D. Terzopoulos and K. Waters. Analysis and synthesis of facial image
sequences using physical and anatomical models. IEEE Trans. Pattern
Analysis and Machine Intelligence, 15(6):569–579, June 1993.
[30] S. A. Wainwright, W. D. Biggs, J. D. Curry, and J. M. Gosline. Me-
chanical Design in Organisms. Princeton University Press, 1976.
[31] J. Y. A. Wang and E. Adelson. Layered representation for motion anal-
ysis. In Proceedings of the Computer Vision and Pattern Recognition
Conference, 1993.
[32] K. Waters and D. Terzopoulos. Modeling and animating faces using
scanned data. The Journal of Visualization and Computer Animation,
2:123–128, 1991.
[33] Y. Yacoob and L. Davis. Computing spatio-temporal representations
of human faces. In Proceedings of the Computer Vision and Pattern
Recognition Conference, pages 70–75. IEEE Computer Society, 1994.
