
SDT: Maintaining the Communication Protocol Through Mixed Feedback Strategies


Youssef Khaoula(B), P. Ravindra De Silva, and Michio Okada
Interaction and Communication Design Lab, Toyohashi University of Technology,
1-1 Hibarigaoka, Tempaku, Toyohashi, Aichi, Japan
youssef@icd.cs.tut.ac.jp, {ravi,okada}@tut.jp
http://www.icd.cs.tut.ac.jp

Abstract. In our previous work, we studied how humans establish a protocol of communication in a context that requires mutual adaptation, using our robot Sociable Dining Table (SDT). SDT integrates a dish robot placed on the table, which behaves according to the knocks that the human emits, guessing the meaning of each knocking pattern. Based on previous experiments, we remarked that a personalized communication protocol is established incrementally. In fact, the communication protocol is personalized not only to the human-robot pair but also to the particular human-robot interaction (HRI) instance. In the current study, we change the robot's feedback modality (the way the robot communicates back with the human) so that the communication protocol can be maintained over different HRI instances. As a new feedback modality, we proposed 2 mixed-feedback strategies that integrate inarticulate utterances (IU) with the robot's visible behavior, in order to help the human guess the robot's internal state. The first strategy consists in anticipating the robot's executed behavior using a static IU combined with the robot's movement (St1); the other consists in a genuinely adaptive IU generation method, also combined with the robot's movement (St2). In the current work, we conducted an HRI experiment to explore whether the communication protocol can be maintained on a long-term basis by integrating the 2 proposed methods. The results provide confirmatory evidence that using IU helps in establishing stable communication protocols. In addition, it increases attachment and the robot's overall subjective ratings. Another important finding is that, among the two methods, the adaptive mixed-feedback strategy (St2) affords better subjective results and objective performance.
Keywords: Inarticulate utterance · Adaptation · Protocol of communication · Persuasiveness · Recall

1 Introduction

The current study draws on previous research where the goal was to explore how
people can incrementally establish a communication protocol within a simpler

setting [1]. In our previous study, we used knocking as the only communication channel that the human can use to express his intention (knocking on the table), while the robot used visible movement as the only feedback channel to communicate back with the human [1]. We showed, based on an HRI experiment, that the human and the robot cooperated and, as a result, a personalized communication protocol emerged after each HRI instance. We also remarked that humans tend to forget the previously established communication protocol (PECP) and instead keep creating new communication protocols in each HRI instance [1]. Since we want the HRI to occur smoothly and implicitly (the human has to feel spontaneous), we need to make the human remember the PECP indirectly, so as to avoid anticipated unpleasant consequences such as threatening the human's social face or the robot being abandoned. Consequently, in our study we address the issue of indirectly triggering the human to reuse the same previously established communication protocol (PECP) rather than creating a new communication protocol in each HRI instance. We proposed 2 mixed-feedback strategies that communicate back to the human by integrating the robot's visible movement with IU. Thus, the challenge to be resolved is to investigate the effect of using these mixed-feedback strategies on the long-term recall (remembrance) of the PECP, in addition to exploring the impact of our strategy on the human's subjective evaluation of the robot's performance. We also want to determine which of the two proposed mixed-feedback strategies is better in terms of objective and subjective results.

2 Background

There are some occasions in HRI when special reasons dictate inconsistent behavior. For example, we can cite humans' forgetfulness of the previously formed communication protocol, just as we remarked in the previous study [1]. In such a circumstance, the robot is forced to tell the human that he is wrong, which may be perceived as challenging by the human partner. When the robot disagrees with the human, the human is confronted with a face-threatening act, placing the user at risk of being bothered by the robot's opposition [2]. The concept of face-threatening acts was initially proposed by social scientists such as Goffman [2]. Moreover, even if the robot assumes that it is itself the faulty party and tries to apologize, users may lose their trust in the robot and use it as a scapegoat to avoid any responsibility [2], assuming of course that they are even aware of their faulty behaviors. In this context, Lee et al. [3] utilize different strategies, including apologies, compensation, and options for the user, to reduce the negative consequences of communication protocol breakdowns. In another study, Torrey et al. [4] highlight that adding hedges and discourse markers diffuses the human's sensation that the robot takes control over him when it tries to tell the human, nicely, that he is wrong. Takayama et al. [5] confirmed through their study that distancing the voice from the robot makes humans tolerate the fact that the robot sometimes disagrees with them. We believe that these strategies are useful, but simpler methods can be integrated within HRI to resolve this challenge. We draw inspiration from the child-caregiver interaction scenario. We want


Fig. 1. In (a), a user interacting with SDT; in (b), SDT's architecture.

to integrate a more implicit way that helps prevent post-interaction forgetfulness of the PECP and triggers the human's memory indirectly, since we expect that an IU may lead the human to remember the related behavior, just as Paivio [6] indicates in his dual coding concept¹. For example, in a baby-caregiver interaction, a baby does not use such complex analytical methods (such as compensating, apologizing, etc.) in order to communicate back with the caregiver, but combines each of its behaviors with a special IU. The communication goes through different breakdowns, and still both parties (the baby and the caregiver) can establish a long-term communication protocol. In fact, based on the simple IU combined with the behaviors used during previous interactions, the baby and the caregiver are capable of recalling each of the behaviors, which makes the establishment of a stable, long-term communication protocol easier for both parties [7]. On these grounds, this study is an attempt to address the issue of how a robot can implicitly express its disagreement about some inconsistency during the HRI without threatening the human partner's social face. More specifically, our goal is to use what we anticipate to be a threat-free method to drive the human, during the first HRI instance (coding phase), to memorize the communication protocol. By using the proposed methods, we hope that users will become aware of their faulty indications, focus more when they establish the communication protocol, and that, as a result, a long-term communication protocol can be established whereby users feel comfortable communicating with the robot.

¹ According to the dual coding concept, each trigger (visual or audio) that is combined with a concept learned during the coding phase (the learning phase or, by analogy to our problem, the communication protocol establishment) may facilitate the remembrance (recall) of the information (in our case, the PECP) once the trigger is exposed to the human.

3 Architecture of the SDT

SDT uses 4 microphones to localize the source of the knocks based on a weighted regression algorithm. It communicates with the human through a sound output, and with the host computer through Wi-Fi using its control unit (a microcontroller chip (AVR ATMEGA128)). It employs a servomotor that helps it exhibit the different behaviors: right, forward, left, and back. Finally, 5 photo reflectors are utilized to automatically detect the boundaries of the table and avoid falling (Fig. 1).

4 Robot's Action Selection Strategy

We conceived an actor/critic architecture that incrementally helps the robot to choose between 4 actions (left, right, back, forward).
4.1 Actor Learning

Each knocking pattern (a pattern of x contiguous knocks, e.g., 2 knocks, 3 knocks, etc.) has its own distribution $\Phi_{X(s_t)} = N(\mu_{X(s_t)}, \sigma_{X(s_t)})$, where $X(s_t)$ is defined as the knocking pattern, and $\mu_{X(s_t)}$ and $\sigma_{X(s_t)}$ are the mean value and the variance. We chose 2 seconds (s) as a threshold for the user's reaction time, based on previously conducted experiments. When the robot observes the state $s_t$, the behavior is picked according to the probabilistic policy $\pi(s_t)_{nbknocks}$. If within 2 s there is no knocking pattern, we suppose that the robot has succeeded in choosing the right behavior, and the critic reinforces the value of the executed behavior in the state $s_t$; the system then switches to the state $s_{t+1}$. If a new knocking pattern is composed before the 2 s elapse, the state of the interaction changes to the state $s_{t+1}$, indicating that the knocker disagrees with the behavior that was executed. The critic thus updates the value function before any new behavior is chosen. As long as the knocker keeps interrupting the robot's behavior before the 2 s elapse, the actor chooses the action by pure exploration (until an agreement state is met: no knocking during 2 s), based on (1). The random values vary in the ranges $0 \le rnd_1 \le 1$ and $0 \le rnd_2 \le 3$; these ranges were decided so as to bring the value of the action between 0 and 3 (the numerical codes of the behaviors (forward, right, back, left)). We assume in such a case that the knocker randomly composes patterns just to switch the robot's behavior.

$$A(s_t) = \mu_{X(s_t)} + \sigma_{X(s_t)} \sqrt{-2 \log(rnd_1)}\, \sin(2\pi \cdot rnd_2) \qquad (1)$$
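As an illustration, the exploratory draw of (1) can be read as a Box-Muller-style perturbation of the pattern's Gaussian. The Python sketch below is only our reading of the method: the clamping of the sampled value to the behavior code range and the exact sampling ranges of rnd1 and rnd2 are assumptions.

```python
import math
import random

BEHAVIORS = ["forward", "right", "back", "left"]  # numerical codes 0..3

def sample_exploratory_action(mu, sigma):
    """Draw an action value from the pattern's distribution as in Eq. (1),
    then map it to one of the four behaviors."""
    rnd1 = 1.0 - random.random()       # in (0, 1], keeps log() defined
    rnd2 = random.uniform(0.0, 3.0)    # range taken from the text (assumption)
    a = mu + sigma * math.sqrt(-2.0 * math.log(rnd1)) * math.sin(2.0 * math.pi * rnd2)
    a = min(max(a, 0.0), 3.0)          # clamp to the valid code range (assumption)
    return BEHAVIORS[int(round(a))]
```

With a mean near the code of a given behavior and a small variance, the draw concentrates on that behavior; a larger variance spreads the exploration over neighboring codes.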
4.2 Critic Learning

The critic calculates the TD error $\delta_t$, which is the difference between the real value function of the newly gathered state, $V(s_{t+1})$, and that of the expected state, $V(s_t)$, as in (2):
$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \qquad (2)$$

where $\gamma$ is the discount rate, with $0 \le \gamma \le 1$. According to the TD error, the critic updates the state value function $V(s_t)$ based on (3):

$$V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t \qquad (3)$$


where $0 \le \alpha \le 1$ is the learning rate. As long as the knocker disagrees with the executed behavior before the 2 s elapse, we refine the distribution $N(\mu_{X(s_t)}, \sigma_{X(s_t)})$ that is relevant to the pattern $X(s_t)$ (for each pattern, we have a specific distribution $\Phi_{X(s_t)}$), which helps us choose the action according to (4):
$$\mu_{X(s_t)} = \frac{\mu_{X(s_t)} + A_{s_t}}{2}, \qquad \sigma_{X(s_t)} = \frac{\sigma_{X(s_t)} + |A_{s_t} - \mu_{X(s_t)}|}{2} \qquad (4)$$
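To make the critic's bookkeeping concrete, the following sketch applies Eqs. (2)-(4) in a single disagreement step; V is a plain dictionary of state values, and the α and γ values are illustrative rather than the ones used in our experiments.

```python
def critic_step(V, s_t, s_next, r, mu, sigma, a_executed, alpha=0.1, gamma=0.9):
    """One critic update after a disagreement: TD error (2), value update (3),
    and refinement of the pattern's distribution (4)."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s_t, 0.0)   # Eq. (2)
    V[s_t] = V.get(s_t, 0.0) + alpha * delta                   # Eq. (3)
    new_mu = (mu + a_executed) / 2.0                           # Eq. (4), mean
    new_sigma = (sigma + abs(a_executed - mu)) / 2.0           # Eq. (4), spread
    return delta, new_mu, new_sigma
```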

5 Feedback Strategies

In the current study, we have 3 feedback strategies that help the robot communicate with the human.
5.1 Visible Movement-Based Feedback Strategy

This strategy corresponds to the feedback modality that we used in our previous work, in which the robot can only execute the action after guessing the meaning of the knocking pattern. Based on the robot's visible movement, the human has to understand implicitly how the robot pairs each knocking pattern with a relevant action.
5.2 Static Mixed Feedback Strategy

This method consists in announcing the label of the intended behavior before the robot executes it (e.g., if the robot has to go right, the robot generates the IU "go right" before it executes the action). We believe that by using this method, the user will have more time to help the robot avoid wrong steps.
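A minimal sketch of this anticipatory announcement follows, assuming a hypothetical robot object with say and move methods and an assumed half-second reaction gap:

```python
import time

IU_LABELS = {"forward": "go forward", "right": "go right",
             "back": "go back", "left": "go left"}

def execute_with_static_iu(robot, action):
    """St1: announce the behavior's label, leave a short gap so the user
    can interrupt a wrong step, then execute the movement."""
    robot.say(IU_LABELS[action])   # anticipatory IU
    time.sleep(0.5)                # reaction gap (assumed value)
    robot.move(action)
```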
5.3 Adaptive Mixed Feedback Strategy

We opted for the SARSA algorithm in order to generate, in real time and in an adaptive manner, different IUs combined with the robot's visible behaviors. SARSA (so called because it uses state-action-reward-state-action experiences to update the Q-values) is an on-policy reinforcement learning algorithm that estimates the value of the policy being followed [8]. An experience in SARSA is of the form (S, A, R, S', A'), which means that the robot was in state S, did action A, received the reward R, and ended up in state S', from which it decided to do action A'. This provides a new experience with which to update Q(S, A); the new value that this experience provides is $r + vQ(S', A')$. In this context, as the robot uses the actor-critic to choose the future action (based on the actor component), a reward is generated that the critic uses for its updates. That same action helps to choose the appropriate label (e.g., if the robot has to go left, then the robot generates the label "left" as IU, the method used in the static mixed feedback strategy), and that same reward helps to update the


Q function, because if the human made an error, it should also be linked to his misunderstanding of the IU.
A state S in our case is the combination of the actual interaction status (agreement or disagreement) and the number of knocks received (e.g., a state S can be S = (agreement, 2 knocks), which means that the actual interaction status is agreement and the robot received 2 knocks). In our context, we have 2 interaction statuses: the agreement status (when the human does not knock for 2 seconds (s), we assume that the last action was correct) and the disagreement status otherwise. So, given that each state can have the 2 statuses, agreement and disagreement, and that we have 4 types of patterns, we have 8 states.
We assign to each (state, utterance) pair a value function Q(S, U), which we initialize arbitrarily. It helps compare the outcomes of the different (state, utterance) couples. In the disagreement status, whenever there is a knocking pattern, there are 3 possible actions which precede the robot's movement:
- indicating the chosen behavior using an IU (to reduce wrong steps);
- repeating the received number of knocks (indicating that the input is processed);
- combining both.
The second possible status is agreement, where we have the following possible actions:
- repeating the label of action A that was previously executed by naming it, taking a small pause, and then telling the robot's future action B that is intended to be executed (to make the human aware that the action will shift from A to B);
- repeating the previously received number of knocks A in addition to the actually received number of knocks B (to indicate to the human that the robot is aware the knocking shifted from A to B);
- indicating using IU the (previous knocking pattern, label of the previously executed action), a pause, and then the robot's future action that is intended to be executed (to consolidate the lately learnt rule and make the human aware of the new action);
- a high-pitched inarticulate utterance showing enthusiasm (to engage the user more in the interaction).
The update of the value function follows equation (5):

$$Q(S, U) \leftarrow Q(S, U) + w\,(r + v\,Q(S', U') - Q(S, U)) \qquad (5)$$

where $0 < w < 1$ is the learning rate, $r$ is the gathered reward, and $0 < v < 1$ is the discount factor.
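The sketch below shows the Q-value bookkeeping of Eq. (5) over (state, utterance) pairs. How the utterance is selected from Q is not detailed above, so the ε-greedy choice, as well as the w, v, and ε values, are assumptions.

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # (state, utterance) -> value, arbitrary initialization

def pick_utterance(state, options, epsilon=0.1):
    """Choose an IU among the options available in this state
    (epsilon-greedy selection is an assumption)."""
    if random.random() < epsilon:
        return random.choice(options)
    return max(options, key=lambda u: Q[(state, u)])

def sarsa_update(s, u, r, s_next, u_next, w=0.1, v=0.9):
    """On-policy update of Eq. (5) using one (S, U, R, S', U') experience."""
    Q[(s, u)] += w * (r + v * Q[(s_next, u_next)] - Q[(s, u)])
```

Here a state would be a tuple such as ("agreement", 2), and options would hold the utterance types enumerated above for the current interaction status.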

6 Experimental Setup

Fig. 2. An example of 2 configurations: each configuration is designed for a trial and is formed by different points marked on the table.

Participants take part in 2 trials, one by one, and each participant cooperates with the robot in order to lead it to the different checkpoints (Fig. 2). Each participant was informed that the robot can execute 4 behaviors (right, left, back,

forward), while he has to knock on the table in order to convey his intention of making the robot choose a specific direction. Each user participates a first time (trial 1) and then answers a survey (described in Section 7). After 3 days, the participant comes again to the laboratory to redo the same task, except that this time (trial 2) we propose a different configuration: the former points marked on the table are changed, to guarantee the diversity of the patterns suggested by the participants (Fig. 2). In the current experiment, we have 3 conditions, to which we assigned the same number of participants (7 per condition). The task is the same for all participants, except that the HRI is designed differently during trials 1 and 2 for each condition. For the first condition (MM), we used a simple feedback strategy (Movement (M)) during both trials: the robot stays silent and the human can only watch the robot's movement. We call this simple feedback strategy the movement-based feedback strategy. For condition 2 (MI), the robot uses the mixed-feedback strategy (M+IU): it uses the static method combined with the robot's movement during both trials. Finally, in condition 3 (AMI), the robot uses an adaptive method for utterance generation (M+adaptive IU) combined with the robot's movement during both trials. In total, 21 participants (aged 21-30 years) took part in our experiment.

7 Survey Procedure

After finishing each trial, we asked each participant to fill out 7-point Likert scale questionnaires so that we could measure: the attachment that may evolve (5 factors: adaptability, stress-free, perceived closeness, cooperation, and achievement [1]); the robot's credibility (using a standard instrument [9] consisting of 3 factors: competence, trust, and caring); and the social face support (2 factors: positive human face support (PHFS) and negative human face support (NHFS)), to verify whether the user's social faces were supported during the HRI (inspired from [10]). We also asked each participant to fill out the Godspeed questionnaire [11] to measure the robot's likeability, perceived intelligence, animacy, and anthropomorphism. Moreover, we evaluated the user's mood using the SAM scale [12] (2 factors: pleasure and arousal). After answering the questionnaires, the user had to arrange a list of words in increasing order of the priority he assigns to them according to his own opinion. This list is used to measure persuasiveness, based on the Kendall tau distance and the method indicated in [13].
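As a reference sketch (not a reproduction of the exact belief-change measure of [13]), the Kendall tau distance between two priority rankings can be computed as the number of discordant item pairs:

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs ordered differently in the two rankings.
    Each ranking is a list of the same items in priority order."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(1 for x, y in combinations(rank_a, 2)
               if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0)
```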
After that, the user describes his experience with the robot in an open-ended way and then evaluates his expected future frequency of use of the robot. Then, we asked the user to play a game for 15 minutes before determining whether he remembers the interaction rules with SDT: he has to enumerate the different interaction rules that he still remembers (short-term recall rate). We noted the communication protocol established in trial 1 and compared it to the communication protocol used at the beginning of trial 2, so that we could calculate the reuse percentage of the rules that belong to the previously established communication protocol (long-term recall rate). We also computed the chi-square statistic and Cramer's V, evaluating the stability of the communication protocol and the relationship between the knocking patterns and the robot's behaviors. Finally, we computed the minimal Euclidean distance between the robot's trajectory and the different checkpoints (CPs) marked on the table, so that we could verify how precisely the user accomplished the task.
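Both objective measures can be computed along the following lines; this is a generic sketch using scipy's chi-square test on the knocking-pattern by behavior contingency table, not the code used in our analysis.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(contingency):
    """Cramer's V for a knocking-pattern x behavior contingency table."""
    contingency = np.asarray(contingency, dtype=float)
    chi2, _, _, _ = chi2_contingency(contingency)
    n = contingency.sum()
    k = min(contingency.shape)
    return float(np.sqrt(chi2 / (n * (k - 1))))

def min_distance_to_checkpoint(trajectory, checkpoint):
    """Minimal Euclidean distance between the robot's (x, y) trajectory
    and one checkpoint marked on the table."""
    traj = np.asarray(trajectory, dtype=float)
    return float(np.linalg.norm(traj - np.asarray(checkpoint, dtype=float),
                                axis=1).min())
```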

8 Challenges of Using the Robot's Movement as a Feedback Strategy

Table 1 summarizes the comparison between the trial 1 and trial 2 results for the condition MM. Based on Table 1, we remark that users found the robot more competent and that the level of focus was higher (mutual attention, arousal) during trial 2. This may explain the longer time needed to achieve the task, which was achieved more precisely than in the first trial. We expected the time needed to be shorter during trial 2; however, this was not the case. Figure 3 shows that all the participants have a long-term recall rate under 0.5, which explains why people took more time during trial 2: they forgot the PECP.

Table 1. Comparison of the trial 1 and trial 2 results for the condition MM.

Metric               | t-test | df | P-value
Time Needed          | 2.417  | 12 | 0.032
Distance From the CP | 2.942  | 12 | 0.012
Arousal              | 2.25   | 12 | 0.044
Mutual Attention     | 4.66   | 12 | 0.001
Competence           | 2.219  | 12 | 0.047

Fig. 3. Long-term recall rate for the different participants in trial 2 of the condition MM.


9 Mixed-Feedback Strategy vs. Movement-Based Feedback Strategy

We compared the trial 2 results of the conditions MI and MM to investigate the effect of adding the IU to the robot's visible behavior as an expectedly more explicit feedback strategy. Table 2 shows the results of the comparison (comparison of MI and MM). Based on Table 2, we conclude that the mixed-feedback strategy of the condition MI helped users feel more attached to the robot, believe in its credibility, and feel no threat of loss of face. The robot was judged more animate (animacy factor), anthropomorphic (anthropomorphism factor), likeable (likeability factor), smart (perceived intelligence factor), attentive (mutual attention factor), and persuasive (persuasiveness factor). This led to a shift to a positive mood for the user and a higher expected long-term use of the robot in the condition MI. We also noticed that the condition MI boosts the robot's objective performance: during trial 2, we observed more stable protocols (Cramer's V). We also remark that users could achieve the task more precisely (distance from the checkpoints) and in a shorter
Table 2. Comparison of the conditions MM and MI on the trial 2 factor values, and comparison of the conditions MI and AMI on the trial 2 factor values (df = 12 for all tests).

Factors                | MI vs MM        | MI vs AMI
                       | t-test  P-value | t-test  P-value
achievement            | 3.216   0.007   | 3.216   0.007
cooperation            | 2.263   0.043   | 2.263   0.043
friendliness           | 2.29    0.041   | 2.291   0.041
stress-free            | 2.846   0.015   | 2.846   0.015
adaptability           | 2.982   0.011   | 2.982   0.011
trust                  | 2.642   0.021   | 2.642   0.021
competence             | 2.449   0.031   | 2.449   0.031
caring                 | 2.744   0.018   | 2.744   0.018
animacy                | 3.753   0.003   | 3.753   0.003
anthropomorphism       | 3.392   0.005   | 3.392   0.005
likeability            | 3.545   0.004   | 3.545   0.004
perceived intelligence | 2.219   0.047   | 2.219   0.047
arousal                | 4.157   0.001   | 4.157   0.001
pleasure               | 2.717   0.019   | 2.717   0.019
persuasiveness         | 4.261   0.001   | 4.26    0.001
PHFS                   | 2.425   0.032   | 2.425   0.032
NHFS                   | 2.58    0.024   | 2.58    0.024
attention allocation   | 2.489   0.029   | 2.489   0.029
long term use          | 3.897   0.002   | 3.897   0.002
Distance from CPs      | 2.411   0.033   | 2.411   0.033
Cramer V               | 19.297  <0.0001 | 19.297  <0.0001
Task Completion Time   | 2.284   0.041   | 2.284   0.041
short-term recall      | 14.899  <0.0001 | 0.612   0.552
long-term recall       | 4.7555  0.0004  | 0.816   0.43


time. Furthermore, the results suggest that the short-term recall rate of the condition MI is significantly higher than the short-term recall rate of the condition MM. Correspondingly, we find that the long-term recall rate was higher in the condition MI. These results afford insights for HRI, as they show that using a mixed-feedback strategy is subjectively preferred, leads to better objective performance, and yields a relatively better maintenance of the PECP.

10 Static Mixed-Feedback Strategy vs. Adaptive Mixed-Feedback Strategy

Table 2 exposes the comparison results of the conditions MI and AMI. Based on Table 2, we can see that users attribute better subjective evaluations to the HRI when the robot uses the adaptive mixed-feedback strategy. We also remark that there were no significant differences between the conditions MI and AMI in terms of short- and long-term recall rates, which means that both methods guarantee the remembrance (recall) of the PECP on a short- and long-term basis. Furthermore, objective performance was higher in the AMI condition: users achieved the task more precisely, in shorter time, and succeeded in establishing stable communication protocols. Consequently, while using the adaptive mixed-feedback strategy during an HRI guarantees better subjective ratings, a more stable communication protocol, and a better achievement of the task in a shorter time, the adaptive IU generation strategy does not lead to a better recall of the PECP.

11 Conclusion

We proposed 2 mixed-feedback strategies that help to improve users' subjective ratings and their remembrance of the PECP on a long-term basis. The results suggest that the adaptive mixed-feedback strategy affords better subjective ratings and objective performance than the static mixed-feedback strategy, while both strategies comparably support recall of the PECP. In our future work, we intend to use affective inarticulate utterances and to investigate whether they can arouse users' affective and cognitive empathy.
Acknowledgments. This research is supported by a Grant-in-Aid for Scientific Research, KIBAN-B (26280102), from the Japan Society for the Promotion of Science (JSPS).

References
1. Youssef, K., De Silva, P.R.S., Okada, M.: Investigating the mutual adaptation process to build up the protocol of communication. In: Social Robotics, pp. 571-576 (2014)
2. Brown, P., Levinson, S.C.: Toward a more civilized design: studying the effects of computers that apologize. Journal of Human-Computer Studies, 319-345 (2004)
3. Lee, M.K., Kiesler, S., Forlizzi, J., Srinivasa, S., Rybski, P.: Gracefully mitigating breakdowns in robotic services. In: Conference on Human-Robot Interaction, pp. 203-210 (2010)
4. Torrey, C., Fussell, S.R., Kiesler, S.: How a robot should give advice. In: Conference on Human-Robot Interaction, pp. 275-282 (2013)
5. Takayama, L., Groom, V., Nass, C.: I'm sorry, Dave: I'm afraid I won't do that: social aspects of human-agent conflict. In: Conference on Human Factors in Computing Systems, pp. 2099-2108 (2009)
6. Paivio, A.: Mental Representations: A Dual Coding Approach (1986)
7. Breazeal, C.: Designing Sociable Robots. MIT Press (2002)
8. Szepesvari, C.: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning (2010)
9. McCroskey, J.C., Teven, J.J.: Goodwill: a reexamination of the construct and its measurement. Communication Monographs 66(1), 90-103 (1999)
10. Erbert, L.A., Floyd, K.: Affectionate expressions as face-threatening acts: receiver assessments. Communication Studies 55(2), 254-270 (2004)
11. Bartneck, C., Kulic, D., Croft, E., Zoghbi, S.: Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. International Journal of Social Robotics 1(1), 71-81 (2009)
12. Bradley, M.M., Lang, P.J.: Measuring emotion: the self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry 25(1), 49-59 (1994)
13. Andrews, P., Manandhar, S.: Measure of belief change as an evaluation of persuasion. In: Proceedings of the Persuasive Technology and Digital Behaviour Intervention, pp. 12-18 (2009)
