Introduction
© Springer International Publishing Switzerland 2015
A. Tapus et al. (Eds.): ICSR 2015, LNAI 9388, pp. 348–358, 2015. DOI: 10.1007/978-3-319-25554-5_35

The current study draws on previous research that explored how people can incrementally establish a communication protocol in a simpler setting [1]. In our previous study, knocking on the table was the only communication channel through which the human could express his intention, while visible movement was the only feedback channel through which the robot could communicate back [1]. Based on an HRI experiment, we showed that the human and the robot cooperated, and that a personalized communication protocol emerged from each HRI instance. We also remarked that humans tend to forget the previously established communication protocol (PECP) and instead keep creating new communication protocols in each HRI instance [1]. Since we want the HRI to proceed smoothly and implicitly (the human has to feel spontaneous), we need to indirectly help the human remember the PECP, so as to avoid anticipated unpleasant consequences such as threatening the human's social face or the robot being abandoned. Consequently, our study addresses the issue of indirectly triggering the human to reuse the previously established communication protocol (PECP) rather than creating a new one in each HRI instance. We propose two mixed-feedback strategies that communicate back to the human by integrating the robot's visible movement with inarticulate utterances (IU). The challenge is thus to investigate the effect of these mixed-feedback strategies on the long-term recall (remembrance) of the PECP, and to explore their impact on the human's subjective evaluation of the robot's performance. We also want to determine which of the two proposed mixed-feedback strategies is better in terms of objective and subjective results.
Background
There are occasions in HRI when special reasons dictate inconsistent behavior. One example is the human's forgetfulness of the previously formed communication protocol, as we remarked in our previous study [1]. In such circumstances, the robot is forced to tell the human that he is wrong, which may be perceived as challenging by the human partner. When the robot disagrees with the human, the human is confronted with a face-threatening act, placing the user at risk of being bothered by the robot's opposition [2]. The concept of face-threatening acts was initially proposed by social scientists such as Goffman [2]. Moreover, even if the robot assumes that it is itself the faulty party and tries to apologize, users may lose their trust in the robot and use it as a scapegoat to avoid responsibility [2], assuming of course that they are even aware of their own faulty behavior. In this context, Lee et al. [3] utilize different strategies, including apologies, compensation, and options for the user, to reduce the negative consequences of communication protocol breakdowns. In another study, Torrey et al. [4] highlight that adding hedges and discourse markers defuses the human's sensation that the robot is taking control when it literally, but nicely, tells the human that he is wrong. Takayama et al. [5] confirmed through their study that distancing the voice from the robot makes humans more tolerant of the fact that the robot sometimes disagrees with them. We believe that these strategies are useful, but simpler methods can be integrated within HRI to resolve this challenge. We draw inspiration from child-caregiver interaction scenarios. We want
Y. Khaoula et al.
Fig. 1. In (a), a user interacting with SDT; in (b), SDT's architecture
SDT uses 4 microphones to localize the knock's source based on a weighted regression algorithm. It communicates with the human through sound output, and with the host computer through Wi-Fi using its control unit (a microcomputer chip (AVR ATMEGA128)). It employs a servomotor that helps it exhibit the different behaviors: right, forward, left, and back. Finally, 5 photo reflectors are utilized to automatically detect the boundaries of the table and avoid falling (Fig. 1).

¹ According to the dual coding concept, each trigger (visual or audio) that is combined with a concept during the coding phase (the learning phase, or by analogy to our problem, during the communication protocol establishment) may facilitate the remembrance (recall) of the information (in our case, the PECP) once the trigger is presented to the human.
Actor Learning
Critic Learning
The critic calculates the TD error $\delta_t$, the difference between the observed return estimate $r_t + \gamma V(s_{t+1})$ of the newly gathered state and the expected value $V(s_t)$ (2):

    $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$    (2)

where $\gamma$ is the discount rate, $0 \le \gamma \le 1$. According to the TD error, the critic updates the state value function $V(s_t)$ based on (3):

    $V(s_t) \leftarrow V(s_t) + \alpha \delta_t$    (3)
where $0 \le \alpha \le 1$ is the learning rate. Whenever the knocker disagrees with the executed behavior before 2 s have elapsed, we refine the distribution $N(\mu_{X(s_t)}, \sigma_{X(s_t)})$ associated with the pattern $X(s_t)$ (for each pattern, we have a specific distribution), which helps us choose the action according to (4):

    $\mu_{X(s_t)} = \dfrac{\mu_{X(s_t)} + A_{s_t}}{2}, \qquad \sigma_{X(s_t)} = \dfrac{\sigma_{X(s_t)} + |A_{s_t} - \mu_{X(s_t)}|}{2}$    (4)
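The critic update (2)–(3) and the distribution refinement (4) can be sketched in Python. This is a minimal sketch: the values chosen for γ and α are illustrative (the text only bounds them in [0, 1]), actions are treated as scalars, and all names are placeholders.

```python
import random

GAMMA = 0.9  # discount rate gamma (illustrative; only 0 <= gamma <= 1 is stated)
ALPHA = 0.1  # learning rate alpha (illustrative; only 0 <= alpha <= 1 is stated)

V = {}      # state value function V(s), lazily initialized to 0
mu = {}     # per-pattern mean mu_X
sigma = {}  # per-pattern spread sigma_X

def critic_update(s, r, s_next):
    """TD error (2) followed by the value update (3)."""
    delta = r + GAMMA * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + ALPHA * delta
    return delta

def refine_distribution(pattern, action):
    """Refinement (4) of N(mu_X, sigma_X) after a disagreement within 2 s."""
    m = mu.get(pattern, action)
    sg = sigma.get(pattern, 1.0)
    mu[pattern] = (m + action) / 2.0
    sigma[pattern] = (sg + abs(action - m)) / 2.0

def sample_action(pattern):
    """Actor: draw the next action from the pattern's Gaussian."""
    return random.gauss(mu.get(pattern, 0.0), sigma.get(pattern, 1.0))
```

Averaging the mean toward the last action and the spread toward the last deviation shrinks the distribution around behaviors the knocker stops contesting.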
Feedback Strategies
In the current study, we have 3 feedback strategies that help the robot communicate with the human.
5.1 Movement-Based Feedback Strategy
This strategy corresponds to the feedback modality that we used in our previous work, where the robot can only execute the action after guessing the knocking pattern's meaning. Based on the robot's visible movement, the human has to implicitly understand how the robot associates a relevant action with each knocking pattern.
5.2 Static Mixed-Feedback Strategy
This method consists of announcing the label of the intended behavior before the robot executes it (e.g., if the robot has to go right, it generates the IU "go right" before executing the action). We believe that with this method, the user will have more time to help the robot avoid wrong steps.
5.3 Adaptive Mixed-Feedback Strategy
We opted for the SARSA algorithm in order to generate different IUs, combined with the robot's visible behaviors, in real time and in an adaptive manner. SARSA (so called because it uses state-action-reward-state-action experiences to update the Q-values) is an on-policy reinforcement learning algorithm that estimates the value of the policy being followed [8]. An experience in SARSA is of the form (S, A, R, S', A'), which means that the robot was in state S, did action A, received reward R, and ended up in state S', from which it decided to do action A'. This provides a new experience for updating Q(S, A); the new value that this experience provides is r + γQ(S', A'). In this context, the robot uses the actor-critic to choose the future action (based on the actor component), and a reward is generated that the critic uses for its updates. That same action helps to choose the appropriate label (e.g., if the robot has to go left, then it generates the label "left" as an IU, the method used in the static mixed-feedback strategy), and that same reward helps to update the Q function, because if the human made an error, the error should also be linked to his misunderstanding of the IU.
A state S in our case is the combination of the actual interaction status (agreement or disagreement) and the number of knocks received (e.g., a state S can be S = (agreement, 2 knocks), which means that the actual interaction status is agreement and the robot received 2 knocks). In our context, we have two interaction statuses: the agreement status (when the human does not knock for 2 seconds (s), we assume that the last action was correct) and the disagreement status otherwise. So, if each state can have the two statuses, agreement and disagreement, and we have 4 types of patterns, then we have 8 states.
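As a sketch, the 8 states can be enumerated as (status, pattern) pairs; encoding the four knocking patterns as knock counts 1–4 is an assumption for illustration, since the text does not fix a representation.

```python
from itertools import product

STATUSES = ("agreement", "disagreement")
PATTERNS = (1, 2, 3, 4)  # 4 knocking patterns, encoded as knock counts (assumed)

# Each state S combines the interaction status with the received pattern.
STATES = list(product(STATUSES, PATTERNS))

print(len(STATES))  # -> 8
```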
We assign to each (state, utterance) pair a value function Q(S, U), which we initialize arbitrarily. It helps compare the outcomes of the different (state, utterance) pairs. In the disagreement status, whenever a knocking pattern is received, there are 3 possible actions that precede the robot's movement:

- indicating the chosen behavior using an IU (to reduce wrong steps).
- repeating the received number of knocks (indicating that the input is processed).
- combining both.

The second possible status is agreement, for which we have the following possible actions:

- repeating the label of the previously executed action A, taking a small pause, and then telling the future robot action B that is intended to be executed (to make the human aware that the action will shift from A to B).
- repeating the previously received number of knocks A in addition to the actually received number of knocks B (to indicate to the human that the robot is aware the knocking shifted from A to B).
- indicating using an IU the (previous knocking pattern, label of previously executed action) pair, pausing, and then indicating the future robot action that is intended to be executed (to consolidate the lately learnt rule and make the human aware of the new action).
- emitting a high-pitched inarticulate utterance showing enthusiasm (to engage the user more in the interaction).
The update of the value function follows equation (5):

    $Q(S, U) \leftarrow Q(S, U) + w\,(r + v\,Q(S', U') - Q(S, U))$    (5)

where $0 < w < 1$ is the learning rate, $r$ is the gathered reward, and $0 < v < 1$ is the discount factor.
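Update (5) can be sketched in Python as follows; a minimal sketch assuming a dictionary-keyed Q-table lazily initialized to zero and illustrative values for w and v (the text only constrains both to (0, 1)).

```python
W = 0.5       # learning rate w (illustrative; 0 < w < 1)
V_DISC = 0.8  # discount factor v (illustrative; 0 < v < 1)

Q = {}  # Q(S, U), initialized arbitrarily (here: lazily to 0)

def sarsa_update(s, u, r, s_next, u_next):
    """On-policy SARSA update (5): Q(S,U) <- Q(S,U) + w (r + v Q(S',U') - Q(S,U))."""
    q = Q.get((s, u), 0.0)
    q_next = Q.get((s_next, u_next), 0.0)
    Q[(s, u)] = q + W * (r + V_DISC * q_next - q)
```

Because the update bootstraps from Q(S', U') of the utterance actually chosen next, it evaluates the utterance policy the robot is really following, which is the on-policy property the text relies on.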
Experimental Setup
Participants take part in 2 trials one-by-one, each cooperating with the robot in order to lead it to the different checkpoints (Fig. 2). Each participant was informed that the robot can execute 4 behaviors (right, left, back, forward), while he has to knock on the table in order to convey his intention of making the robot choose a specific direction. Each user participates a first time (trial 1) and then answers a survey (indicated in section 7). After 3 days, the participant comes back to the laboratory to redo the same task, except that this time (trial 2) we propose a different configuration: the points previously marked on the table are changed to guarantee the diversity of the patterns suggested by the participants (Fig. 2). In the current experiment, we have 3 conditions, to which we assigned the same number of participants (7 per condition). The task is the same for all participants, except that the HRI is designed differently during trials 1 and 2 for each condition. In the first condition (MM), we used a simple feedback strategy (Movement (M)) during both trials: the robot is silent and the human can only observe the robot's movement. We call this simple feedback strategy the movement-based feedback strategy. In condition 2 (MI), the robot uses the mixed-feedback strategy (M+IU), employing only the static method combined with the robot's movement during both trials. Finally, in condition 3 (AMI), the robot uses an adaptive method for utterance generation (M+adaptive IU) combined with the robot's movement during both trials. In total, 21 participants ([21-30] years) took part in our experiment.
Survey Procedure
After finishing each trial, we asked each participant to fill out 7-point Likert scale questionnaires so that we could measure: the attachment that may evolve (5 factors: adaptability, stress-free, perceived closeness, cooperation, and achievement [1]); the robot's credibility (using the standard instrument indicated in [9], which consists of 3 factors: competence, trust, and caring); and the social face support (2 factors: human positive face support (HPFS) and human negative face support (HNFS)), to verify whether the user's social face was supported during the HRI (inspired from [10]). We also asked each participant to fill out the Godspeed questionnaire [11] to measure the robot's likeability, perceived intelligence, animacy, and anthropomorphism. Moreover, we evaluated the user's mood using the SAM scale [12] (2 factors: pleasure and arousal).

After answering the questionnaires, the user was asked to arrange a list of words in increasing order of priority, assigned according to his own opinion. This list is used to measure persuasiveness based on the Kendall tau distance and the method indicated in [13]. After that, the user describes his experience with the robot in an open-ended way and then estimates his future frequency of use of the robot. Then, we asked the user to play a game for 15 minutes in order to determine whether he remembers the interaction rules with SDT; he has to enumerate the different interaction rules that he still remembers (short-term recall rate). We noted the communication protocol established in trial 1 and compared it to the communication protocol used at the beginning of trial 2, so that we could calculate the reuse percentage of rules belonging to the previously established communication protocol (long-term recall rate). We also computed the chi-square statistic and Cramer's V, evaluating the stability of the communication protocol and the relationship between the knocking patterns and the robot's behaviors. Finally, we computed the minimal Euclidean distance between the robot's trajectory and the different checkpoints (CPs) marked on the table, so that we could verify how precisely the user performed the task.
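The two measures above can be sketched as follows. This is a minimal sketch: it assumes complete rankings over the same word list and a knocking-pattern × behavior contingency table of counts, both of which are hypothetical inputs here.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count the item pairs ordered differently in the two rankings."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is discordant when its relative order differs between rankings.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant

def cramers_v(table):
    """Cramer's V from a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(rows), len(cols)) - 1
    return (chi2 / (n * k)) ** 0.5
```

A Cramer's V near 1 would indicate that each knocking pattern maps consistently to one behavior, i.e., a stable protocol.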
time. Furthermore, results suggest that the short-term recall rate of condition MI is significantly higher than that of condition MM. Correspondingly, we find that long-term recall was higher in condition MI. These results afford insights for HRI, showing that using a mixed-feedback strategy is subjectively preferred, leads to better objective performance, and yields relatively better maintenance of the PECP.
10
Table 2 presents the comparison results for conditions MI and AMI. Based on Table 2, we can see that users gave better subjective evaluations of the HRI when the robot used the adaptive mixed-feedback strategy. We also remark that there were no significant differences between conditions MI and AMI in terms of short- and long-term recall rates, which means that both methods guarantee the recall (remembrance) of the PECP on a short- and long-term basis. Furthermore, objective performance was higher in the AMI condition: users achieved the task more precisely and in a shorter time, and succeeded in establishing stable communication protocols. Consequently, while using the adaptive mixed-feedback strategy during an HRI guarantees better subjective ratings, a more stable communication protocol, and better achievement of the task in a shorter time, the adaptive IU generation strategy does not lead to better recall of the PECP.
11 Conclusion
We proposed 2 mixed-feedback strategies that help improve users' subjective ratings and the long-term remembrance of the PECP. Results suggest that the adaptive mixed-feedback strategy leads to better subjective ratings and objective performance than the static mixed-feedback strategy, although not to better recall rates. In our future work, we intend to use affective inarticulate utterances and investigate whether they can arouse users' affective and cognitive empathy.
Acknowledgments. This research is supported by a Grant-in-Aid for Scientific Research, KIBAN-B (26280102), from the Japan Society for the Promotion of Science (JSPS).
References
1. Youssef, K., De Silva, P.R.S., Okada, M.: Investigating the mutual adaptation process to build up the protocol of communication. In: Social Robotics, pp. 571–576 (2014)
2. Brown, P., Levinson, S.C.: Toward a more civilized design: studying the effects of computers that apologize. Journal of Human-Computer Studies, 319–345 (2004)
3. Lee, M.K., Kiesler, S., Forlizzi, J., Srinivasa, S., Rybski, P.: Gracefully mitigating breakdowns in robotic services. In: Conference on Human-Robot Interaction, pp. 203–210 (2010)
4. Torrey, C., Fussell, S.R., Kiesler, S.: How a robot should give advice. In: Conference on Human-Robot Interaction, pp. 275–282 (2013)
5. Takayama, L., Groom, V., Nass, C.: I'm sorry, Dave: I'm afraid I won't do that: social aspects of human-agent conflict. In: Conference on Human Factors in Computing Systems, pp. 2099–2108 (2009)
6. Paivio, A.: Mental Representations: A Dual Coding Approach (1986)
7. Breazeal, C.: Designing Sociable Robots. MIT Press (2002)
8. Szepesvari, C.: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning (2010)
9. McCroskey, J.C., Teven, J.J.: Goodwill: A Reexamination of the Construct and its Measurement. Communication Monographs 66(1), 90–103 (1999)
10. Erbert, L.A., Floyd, K.: Affectionate expressions as face-threatening acts: Receiver assessments. Communication Studies 55(2), 254–270 (2004)
11. Bartneck, C., Kulic, D., Croft, E., Zoghbi, S.: Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. International Journal of Social Robotics 1(1), 71–81 (2009)
12. Bradley, M.M., Lang, P.J.: Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry 25(1), 49–59 (1994)
13. Andrews, P., Manandhar, S.: Measure of belief change as an evaluation of persuasion. In: Proceedings of the Persuasive Technology and Digital Behaviour Intervention, pp. 12–18 (2009)