
Affective Communication Aid using Wearable Devices based on Biosignals
Yuji Takano

Kenji Suzuki

University of Tsukuba
1-1-1 Tennodai
Tsukuba, Japan

University of Tsukuba/JST
1-1-1 Tennodai
Tsukuba, Japan


communication skill common to all humankind and does not
depend much on culture. Understanding facial expressions
correctly is very important for communication with other
people. Daily communication between parents and children
is very important for building their relationship and plays a key
role in children's mental and social development. However,
there are some cases where it is difficult for parents or caregivers to consistently recognize their children's facial expressions. For example, children with autism spectrum disorders
(ASD) have difficulties with communicating and socially interacting through facial expressions, even with their parents.
Autism comprises a wide range of neurodevelopmental disorders, and its intensity differs greatly among individuals. Therefore, setting a clear boundary between healthy and autistic
people is difficult, and the mechanisms of autism have not
yet been clarified. Typical examples of the communication
difficulties in autism include a lack of facial
expressions and eye contact [5, 9]. Facial expressions play
an important role in communication with others, and we
want to know when and how much a child's facial expression
changes in response to events in daily life. Previously, we
reported on the relationship between smiles and positive social behavior [4]. The smiles of children with ASD can be
quantitatively measured and analyzed by using a wearable
device [6]. There are many situations where reading and
understanding a child's facial expressions is desirable.
Various classification methods for facial expressions have
been proposed based on different features. The facial action
coding system (FACS) [3] describes facial expressions based
on physical and anatomical criteria, and many researchers
have adopted FACS to classify facial expressions [8]. There
are also many approaches to capturing facial expressions.
One method is to extract physical variations in facial features from video by means of image processing. This
non-contact method is the most commonly used to recognize facial expressions; it is also easy to use, with little
effort needed to install the equipment. However, it has the
disadvantage of spatial limitations because it depends on the camera position and field of view, and its accuracy is affected by
head posture, so the target user has to face the camera
constantly. In actual situations outside a controlled
environment, the subject is unlikely to stay in the same position, especially in the case of children
who are moving and playing around. Thus, using image
processing is difficult. Another possible approach is to use
motion-capture technology to extract the three-dimensional
shape of the face from the markers' coordinates and measure
the physical features more precisely. However, placing the

We propose a novel wearable interface for sharing facial expressions between children with autism spectrum disorders
(ASD) and their parents, therapists, and caregivers. The developed interface is capable of recognizing facial expressions
based on physiological signal patterns taken from facial bioelectrical signals and displaying the results in real time. The
physiological signals are measured from the forehead and
both sides of the head. We verified that the proposed classification method is robust against facial movements, blinking,
and head posture. This compact interface can support
the perception of facial expressions between children with
ASD and others to help improve their communication.

Categories and Subject Descriptors

I.5.5 [Implementation]: Interactive systems; K.4.2 [Social
Issues]: Assistive technologies for persons with disabilities

General Terms

Facial expression, Smile sharing, Autism Spectrum Disorder

In this paper, we propose a novel interaction method for
sharing children's facial expressions with their parents in
order to facilitate communication. In human communication, facial expressions carry some of the most important
non-verbal information. Facial expressions include psychological information, such as emotions, which are very important aspects of communication. People can read a person's
thoughts simply by observing their expression. Psychological studies have found that facial expressions can project
emotions such as disgust, sadness, happiness, fear, anger,
and surprise [2]. Expressing these emotions is a universal
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from
IDC'14, June 17-20, 2014, Aarhus, Denmark.
Copyright 2014 ACM 978-1-4503-2272-0/14/06 ...$15.00.








Figure 2: Overview of head-mounted interface (labeled part: LED Display)

Figure 1: Smile sharing: proposed interaction

positions on top of facial muscles. In this study, we measured
sEMG on the sides of the head and forehead to avoid inhibiting the physical movements of the face and developed an
easy-to-wear interface. We used patterns of acquired sEMG
signals to classify the facial expressions. By regarding facial expressions as specific patterns of activity across several
facial muscles, the interface can classify them without needing to identify individual muscle activity. A support vector
machine (SVM) was used for pattern classification to
differentiate smiles from other facial expressions.
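
To make the classification step concrete, the following is a minimal sketch of a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. The paper does not specify the kernel, the features, the number of channels, or the training procedure, so the three-element feature vectors (one hypothetical amplitude value per electrode site), the toy data, the function names, and the hyperparameters are all assumptions made for illustration.

```python
# Minimal linear SVM sketch (hypothetical features and hyperparameters).
# Label +1 = smile, -1 = any other expression.

def decision_value(w, b, x):
    """Signed score of feature vector x under the linear model (w, b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=300):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    dim = len(samples[0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(samples, labels):
            if y * decision_value(w, b, x) < 1.0:  # margin violated
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def is_smile(w, b, x):
    """Classify a feature vector: True for smile, False otherwise."""
    return decision_value(w, b, x) > 0.0


# Toy calibration data: [left side, right side, forehead] amplitudes (made up).
smile_windows = [[0.9, 0.8, 0.3], [1.0, 0.9, 0.2], [0.8, 0.85, 0.35]]
other_windows = [[0.1, 0.15, 0.1], [0.2, 0.1, 0.15], [0.15, 0.2, 0.05]]
X = smile_windows + other_windows
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [is_smile(w, b, x) for x in X]
```

Because the classifier only sees the joint pattern across channels, it never needs to attribute a signal to an individual muscle, which matches the motivation given above. A practical system would more likely rely on an established SVM library than on a hand-written trainer.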

markers requires preparation, which makes this approach laborious, and the markers can be easily occluded. Thus, the
development of a method for capturing facial expressions
that is easy to use and does not depend on spatial orientation is still a difficult challenge. We have been developing
a wearable tool that accurately and continuously detects the facial expressions of a person who has
difficulty expressing their intent. This
allows users not only to capture facial expressions but
also to share them with others, even if the face is not always
observable by sensors installed in the environment, such as
cameras or depth sensors. In this paper, we propose the concept of smile sharing, in which a wearable device that meets the above
criteria, namely an affective communication aid, is used
to communicate facial expressions. We evaluated the
device to verify its performance through several case studies.


Smile Sharing

We propose a method for sharing facial expressions so that
a child's ambiguous or hidden expressions can be perceived
in real time. In the current implementation, we only classify the child's smile and communicate it to their parents
through various modalities. A smile is a facial expression
that represents happiness [1], and the perception of smiling
facilitates communication between children and their parents. Specifically, perceiving a child's smile helps in understanding what makes the child happy. By knowing what a
child is interested in, the parents can communicate with him
or her more intensely and feel more encouraged in their understanding. For smile sharing, we used both light-emitting
and vibration devices. Using a light-emitting device helps
the parents perceive a child's smile even if the child turns
his or her face away. The parents can also perceive the
child's smile by using a wrist-mounted vibration device even
if they are not close to the child, which can happen during play. These methods are also viable for autistic children and
their parents when the parents cannot look at their child's
face directly.

The proposed system provides a novel method of interaction, particularly between children and their parents, that
is designed for use in daily life. Figure 1 shows a conceptual diagram of the proposed interaction framework. We first describe
the method for capturing and classifying facial expressions independent of spatial orientation and then the sharing
of the facial expressions.

2.1 Wearable Interface

Our proposed wearable interface can capture facial expressions independent of spatial orientation. To realize
this system, we use surface electromyography (sEMG) on
the forehead and sides of the face. sEMG can be captured
by using small electrodes to measure the bioelectrical signals
emitted from muscles that are activated to generate facial
expressions. Conventionally, for sEMG measurement of the face, the electrodes must be accurately attached to the skin on top of facial muscles, including the orbicular muscles of the mouth and eyes.
However, attaching electrodes to
the skin has some disadvantages: the process takes a long
time, and the electrodes are prone to interference from facial movements. A possible approach to overcoming these
obstacles is measuring sEMG on the sides of the face, i.e., distal EMG [6]. We have previously shown that signals from distal electrode
locations on areas of low facial mobility have a strong amplitude and are correlated with signals captured in the traditional


The system consists of an interface unit and a signal processing unit. The interface unit measures signals and outputs the classification results. The signal processing unit
classifies facial expressions based on the measured sEMG
signals and sends the result to the interface unit via Bluetooth wireless communication.


Interface unit

We developed two different wearable interfaces: head-mounted and wrist-mounted devices. Figure 2 shows an
overview of the head-mounted interface. The wrist-mounted
interface is a simple vibration device that vibrates
when a smile is detected by the head-mounted interface.


by the SVM based on their patterns. Figure 4 shows an
overview of the signal processing. The sEMG signals are
acquired every 1 ms, and facial expression recognition is performed
within a time window of 150 ms. sEMG signals
vary depending on individual differences and electrode position. The system first needs to be calibrated for each user
by recording some facial expressions in advance and learning the wearer's signal pattern and intensity. However, it is
difficult for children with ASD to participate in this calibration session. In such cases, the system user simply specifies a
period of smiling time as a reference, which is used as the
basis for smile recognition.
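
As an illustration of the windowing described above, the sketch below splits a single-channel signal sampled every 1 ms into consecutive 150 ms windows and computes one value per window. Two assumptions are not from the paper: the windows are taken as non-overlapping, and the root mean square (RMS) amplitude is used as the per-window feature. The name `window_rms` is a hypothetical helper.

```python
import math

SAMPLE_PERIOD_MS = 1   # from the text: one sample every 1 ms
WINDOW_MS = 150        # from the text: 150 ms analysis window


def window_rms(samples, window_ms=WINDOW_MS, sample_period_ms=SAMPLE_PERIOD_MS):
    """Return the RMS amplitude of each full, non-overlapping window.

    Non-overlapping windows and the RMS feature are assumptions;
    trailing samples that do not fill a whole window are ignored.
    """
    size = window_ms // sample_period_ms  # samples per window (150 here)
    features = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        features.append(math.sqrt(sum(v * v for v in chunk) / size))
    return features


# Synthetic example: 150 ms of silence followed by 150 ms of activity.
signal = [0.0] * 150 + [2.0, -2.0] * 75
features = window_rms(signal)
```

With one such value per electrode channel and window, the resulting vectors could serve as input to the classifier, and values recorded during the reference smiling period could be used for per-user calibration.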

Figure 3: Appearances of LED interface









[Figure 4 diagram labels: Comb, Band Pass, Smoothing, Signal Processing; signals x(t), b(t), c(t)]

Evaluation of classification accuracy

In order to evaluate the classification accuracy of the proposed system, we compared it to the human cognitive ability to recognize smiling in an experimental setting. We
recorded videos of three people (persons A-C); each alternately smiled and had a neutral expression two or three
times over about 20 s while wearing the headband interface.
Nine subjects (eight male, one female) in their twenties and
thirties were recruited for the experiment. Informed consent was obtained from the participants in advance. Videos
of the smiling/neutral faces (A-C) were shown to the subjects, and they were asked to mark the smile intervals by
clicking a button to indicate the start and stop of smiles.
We covered the LED in the videos to avoid influencing the
subjects' judgment. We calculated the maximum, minimum,
and median values of precision and recall based on the classifications by the subjects and the proposed system. Figure 5
shows the results.
As shown in the figure, the precision for each subject was
above 0.95, but the recall varied among subjects depending
on who created the facial expressions. The differences in recall may have been due to differences in facial features, some
of which are more difficult to recognize than others. This
made it more difficult to set a threshold for smiling (as for
subject B), which lowered the recall. However, in terms of
classification accuracy, the results were positive because the
average precision was sufficient for potential applications.
We also calculated the intra-class correlation coefficient to
evaluate the degree of agreement between the classification
by the interface and the judgment of the subjects. The average intra-class correlation coefficient was more than 0.936,
which is also sufficient.
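
The precision and recall values discussed above can be computed from two aligned sequences of smile/no-smile labels, as in the following sketch. The frame-by-frame comparison, the choice of which sequence serves as the reference, and the example data are assumptions for illustration; the paper does not state how the marked intervals were discretized.

```python
def precision_recall(predicted, reference):
    """Frame-wise precision and recall of `predicted` against `reference`.

    Both arguments are equal-length sequences of truthy/falsy smile labels.
    """
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: the detector reports the smile later than the reference marks it.
system_labels = [0, 0, 0, 1, 1, 0]
human_labels = [0, 1, 1, 1, 1, 0]
precision, recall = precision_recall(system_labels, human_labels)
```

In this made-up case every detected frame is a true smile, so precision is 1.0, while only half of the reference smile frames are detected, so recall is 0.5. This mirrors the pattern reported above, in which precision stays high while recall varies.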








We conducted two experiments to evaluate the performance of the proposed system. In this section, we present
the classification accuracy of the system and its robustness against head motion.

[Figure 4 diagram label: Light Emitting]

Figure 4: Overview of facial recognition by using head-mounted interface
The head-mounted interface is used both to acquire facial
sEMG and to display the resulting facial expression classification. It comprises dry electrodes and an LED embedded in a headband. sEMG is acquired by the interface and
sent to the signal processing unit through Bluetooth wireless communication. We decided to use dry electrodes in
the interface, although they are prone to noise contamination in the case of unstable contact with the skin, because
they are much easier to apply to the skin and enable fast
measurement of sEMG with minimal preparation time. The
headband is made of elastic material, and the position of the
electrodes inside the headband is adjustable. Therefore, the
interface can accommodate different head sizes and shapes, and
it holds the dry electrodes steadily in place to provide better
stability. Figure 3 shows the appearance of the LED, which can light up in white or red. The LED
emits a red light if the wearer is smiling and a white light
for any other facial expression. The LED is fitted in a small
tube as shown in Figure 3 to make it easily noticeable by
The wrist-mounted interface comprises a vibration motor
and presents the facial expression of the headband wearer
through vibration. This interface vibrates if the person
wearing the head-mounted interface is smiling. By using
this interface, parents can perceive their child's smile even
if they cannot look at his or her face directly. In particular,
the wrist-mounted interface can help the parents of autistic
children perceive their child's smile.
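
The output behavior of the two interfaces described above can be summarized in a short sketch. The function name `interface_outputs` and the dictionary keys are hypothetical; only the mapping itself (red LED and vibration for a smile, white LED and no vibration otherwise) comes from the text.

```python
def interface_outputs(is_smiling):
    """Map one classification result to the LED color and vibration state."""
    return {
        "led_color": "red" if is_smiling else "white",  # head-mounted LED
        "vibrate": bool(is_smiling),                    # wrist-mounted motor
    }


smile_state = interface_outputs(True)
other_state = interface_outputs(False)
```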


Evaluation of robustness

We then conducted an experiment to investigate the robustness of the system against head motion artifacts. In
this experiment, we investigated whether the system is capable of classifying facial expressions when there are disturbances such as head motions. The classification accuracy
during head nodding (forward and back head movement),
head tilting (left and right head tilting), head shaking (right
and left rotation), and blinking was checked to evaluate the
robustness of the system. The three motions we investigated
(nod, tilt, and shake) correspond to all possible motions

3.2 Signal processing unit

The signal processing unit handles digital filtering and
pattern recognition processes. The sEMG signals acquired
from the interface unit are pre-processed and then classified



the acquired signals from the sides of the head and forehead
can be used for facial expression classification. Through several experiments, we verified the classification accuracy of
the developed system. The results demonstrated that the
interface can be used in real environments with some disturbances to classify facial expressions with high accuracy
and to present smiles in real time. Further investigation will
include the implementation of adaptive filtering to remove
motion artifacts.
So far, we have presented the concept of a novel interaction design between children and their parents and developed interfaces that enable the realization of such interaction. We have already conducted a feasibility study with
children with ASD during robot-assisted activities and
confirmed that the proposed device is acceptable [7]. In the
future, we plan to conduct a user study with children and
families to verify that the interfaces can support the sharing
and perception of facial expressions in the given scenario.





Figure 5: Maximum, minimum, and median values of precision and recall












[1] P. Ekman. An argument for basic emotions. Cognition
and Emotion, 6(3):169-200, 1992.
[2] P. Ekman. Emotions Revealed: Recognizing Faces and
Feelings to Improve Communication and Emotional
Life. Times Books, 2003.
[3] P. Ekman and W. Friesen. Facial Action Coding
System: A Technique for the Measurement of Facial
Movement. Consulting Psychologists Press, 1978.
[4] A. Funahashi, A. Gruebler, T. Aoki, H. Kadone, and
K. Suzuki. The smiles of a child with autism spectrum
disorder during an animal-assisted activity may
facilitate social positive behaviors - quantitative
analysis with smile-detecting interface. J Autism Dev
Disord, 44(3):685-693, 2014.
[5] K. Gray and B. Tonge. Are there early features of
autism in infants and preschool children? J Paediatr
Child Health, 37(3):221-226, June 2001.
[6] A. Gruebler and K. Suzuki. Design of a wearable device
for reading positive expressions from facial EMG signals.
IEEE Trans. on Affective Comput., (in press).
[7] M. Hirokawa, A. Funahashi, and K. Suzuki. A doll-type
interface for real-time humanoid teleoperation in
robot-assisted activity: A case study. In ACM/IEEE
Intl. Conf. on Human-Robot Interaction, pages
174-175, 2014.
[8] J. J. Lien, T. Kanade, J. F. Cohn, and C. C. Li.
Automated facial expression recognition based on FACS
action units. In Proceedings of FG '98, IEEE,
April 1998.
[9] F. R. Volkmar and L. C. Mayes. Gaze behavior in
autism. Development and Psychopathology, 2(1):61-69,
January 1990.


Figure 6: Maximum, minimum, and median values of robustness
(roll, pitch, and yaw); therefore, a positive result means that
the system is likely to be robust against any combination of
head motions. We asked the eight subjects to perform this
experiment while wearing the headband. Each subject performed each motion for about 5 s while smiling or having
a neutral face. "Blink" represents 10 blinks, and "Nod" and
"Tilt" were done twice each. "Shake" represents random head
shaking about the yaw axis. Figure 6 shows the maximum,
minimum, and median values of the classification accuracy
for each motion in the experiment. The results showed that
the system classified the neutral expression with no motion with
an accuracy of 100%. The system was able to classify the
neutral expression of most subjects with an accuracy of more
than 95% even when there were some disturbances. In the
case of smiles, there were some cases where the smile was
not detected properly. In the most prominent
case, subjects reported that it was difficult to smile and
blink at the same time, which probably contributed to the
classification accuracy for blinking being lower than for the other motions.
However, the interface was capable of classifying smiles by
the majority of the subjects with an accuracy of more than

In this study, we considered the scenario of daily communication between children and their parents and focused
on facial expressions, which are non-verbal information that
is important for facilitating communication. We proposed
wearable interfaces to classify facial expressions based on
facial muscle activities and share them through light and
vibration. We evaluated the classification accuracy and robustness of the system through experiments and verified that