Robotic Control
Shafkat Kibria
Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN
Abstract
1 Introduction 1
2 Literature Review 3
2.1 About Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 VUI (Voice user interface) in Robotics . . . . . . . . . . . . . . . . . . . 9
4 Implementation 15
4.1 General Robotic Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.1 Behaviors Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Hardware Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2.1 System Component . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2.2 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2.3 Algorithm Description . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Software Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.1 System Component . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.2 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.3 Algorithm Description . . . . . . . . . . . . . . . . . . . . . . . . 33
5 Evaluation 35
5.1 Test Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.1 Hardware approach . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.2 Software approach . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.3 Experience from the Technical Fair . . . . . . . . . . . . . . . . . 36
6 Discussion 45
7 Conclusions 47
7.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8 Acknowledgements 49
References 51
B Installation guide 61
B.1 Developer guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
B.1.1 Speech Recognition software product installation . . . . . . . . . 61
B.1.2 The Source code files . . . . . . . . . . . . . . . . . . . . . . . . . 62
B.2 User guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
C User Questionnaire 65
D Glossary 67
List of Figures
3.1 A context-free grammar for simple expressions (i.e., a+b or ab+ba etc.) 13
5.5 The histogram shows the participating users' information on the basis of age
and occupation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.6 The users' comments about controlling the CARO. . . . . . . . . . . . . 41
5.7 The users' comments about CARO's efficiency. . . . . . . . . . . . . . . 42
5.8 The users' comments about flexibility. . . . . . . . . . . . . . . . . . . 43
5.9 The users' comments about their preferences. . . . . . . . . . . . . . . . 44
B.1 The available software products and their file names in the SpeechStudio
Developer Bundle Package. . . . . . . . . . . . . . . . . . . . . . . . . . 61
Chapter 1
Introduction
The theme of social interaction and intelligence is important and interesting to the Artificial Intelligence and Robotics community [9]. It is one of the challenging areas in Human-Robot Interaction (HRI). Speech recognition technology is a great aid in meeting this challenge, and it is a prominent technology for future Human-Computer Interaction (HCI) and Human-Robot Interaction (HRI).
Humans are used to interacting through Natural Language (NL) in the social context. This idea leads roboticists to build NL interfaces through speech for HRI. Natural Language (NL) interfaces are now starting to appear in standard software applications. This benefits novices, who can more easily interact with standard software in the HCI field. It also encourages roboticists to use Speech Recognition (SR) technology for HRI. Perceiving the world is important knowledge for a knowledge-based agent or robot performing a task. It is also a key factor in acquiring initial knowledge about an unknown world. In the social context, a robot can easily interact with humans through SR to gain initial knowledge about the unknown world and also information about the task to accomplish.
Several robotic systems with SR interfaces have been presented [30, 6, 22, 20, 11, 17]. Most of these projects emphasize mobile robots - nowadays this type of robot is getting popular as a service robot, both indoors and outdoors1. The goal of the service robot is to help people in everyday life in a social context. It is important for a mobile robot to communicate with the users (humans) of its world. Speech Recognition (SR) is an easy way of communicating with humans, and it also offers the advantage of interacting with novice users without special training. Uncertainty is a major problem for navigation systems in mobile robots - interaction with humans in a natural way, using English rather than a programming language, would be a means of overcoming difficulties with localization [30].
In this project our main target is to add SR capabilities to a mobile robot and to investigate the use of a natural language (NL) such as English as a user interface for interacting with the robot. We chose a small mobile robot (the Khepera) for this investigation. We tried both a hardware Speech Recognition (SR) device and software PC-based SR to achieve our goal; which technology is used for the SR system depends on the vocabulary size and the complexity of the grammar. We defined several requirements for our prototype system. Interaction with the robot should be in natural spoken English (within the application domain); we chose English because it is the most widely recognized international language. The robot should understand its task from the spoken dialogue. The system should be user independent.
1 World Robotics survey 2004 - issued by UNECE: United Nations Economic Commission for Europe.
In the following chapters we discuss the SR system and, most importantly, how to introduce an SR system to robotics for interaction purposes. We start with a literature review of SR systems and Voice User Interface (VUI) systems (Chapter 2 on page 3). Then we discuss the important components of language and speech in Chapter 3 (on page 11); this includes speech, speech synthesizers, speech recognition grammars, etc. Chapter 4 (on page 15) describes the implementation part of our project; there we discuss the components we used to implement the system and also the mechanism of the system. We then present our test results in Chapter 5 (on page 35) and discuss those results in Chapter 6 (on page 45). We conclude in Chapter 7 (on page 47), where we also discuss limitations as well as future work.
Chapter 2
Literature Review
Worldwide investment in industrial robots up 19% in 2003. In first half of 2004, orders
for robots were up another 18% to the highest level ever recorded. Worldwide growth
in the period 2004-2007 forecast at an average annual rate of about 7%. Over 600,000
household robots in use - several millions in the next few years.
UNECE issues its 2004 World Robotics survey [36]
From the above press release we can easily see that household (service) robots are getting popular. This gives researchers more interest in working with service robots to make them more user friendly in the social context. Speech Recognition (SR) technology gives researchers the opportunity to add Natural Language (NL) communication with robots in a natural and easy way in the social context. So the promise of robots that behave more like humans (at least from the perception-response point of view) is starting to become a reality [28]. Brooks' research [5] is also an example of developing a humanoid robot, and it raised some research issues. Among these, one of the important issues is to develop machines that have human-like perception.
factors in a speech recognition system. Language models or artificial grammars are used to constrain word combinations in a series of words or sounds. The size of the vocabulary should also be kept at a suitable number; a large vocabulary or many similar-sounding words make recognition difficult for the system.
The most popular and dominant technique of the last two decades is the Hidden Markov Model (HMM). Other techniques are also used for SR systems - Artificial Neural Networks (ANN), the Back Propagation Algorithm (BPA), the Fast Fourier Transform (FFT), Learning Vector Quantization (LVQ), Neural Networks (NN) [7].
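As a rough illustration of how an HMM scores speech, the following toy sketch implements the standard forward algorithm for a two-state model. The states, transition and emission probabilities are invented for illustration and are not taken from any particular SR engine.

```python
# Toy HMM forward algorithm: computes P(observation sequence | model).
# An SR engine uses this kind of scoring to rank candidate words.
STATES = 2
INIT  = [0.6, 0.4]                  # initial state distribution
TRANS = [[0.7, 0.3], [0.4, 0.6]]    # TRANS[p][s] = P(next=s | current=p)
EMIT  = [[0.5, 0.5], [0.1, 0.9]]    # EMIT[s][o]  = P(observe o | state s)

def forward(obs):
    """Return the probability of the observation sequence under the toy HMM."""
    alpha = [INIT[s] * EMIT[s][obs[0]] for s in range(STATES)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * TRANS[p][s] for p in range(STATES)) * EMIT[s][o]
                 for s in range(STATES)]
    return sum(alpha)
```

Summing `forward` over all observation sequences of a fixed length yields 1, which is a quick sanity check that the model is a proper probability distribution.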
Both Speech Recognition Software Programs (SRSPs) and Speech Recognition Hardware Modules (SRHMs) are now available on the market. The SRSPs are more mature than the SRHMs, but they are available for a limited number of languages [12]. See Table 2.2 for a complete list of available languages for Speech Recognition Software Programs (SRSPs). Table 2.3 shows the available SR programs for developers and their vendors.
The SRHMs are also maturing; previously, most commercial SRHMs only supported the speaker-dependent SR technique and isolated words. Now some SRHMs on the market support the speaker-independent SR technique and also continuous listening. Table 2.4 shows some of the SR hardware modules (SRHMs).
For our project we have used the SpeechStudio Suite for the PC-based Voice User Interface (VUI) and the Voice Extreme™ Module for the stand-alone embedded VUI for robotic control.
Table 2.3: Some of the available SR programs for developers and their vendors.
SR Module                                 Manufacturer
Voice Extreme™ Module                     Sensory, Inc., http://www.sensoryinc.com/
VR Stamp™ Module                          Sensory, Inc., http://www.sensoryinc.com/
HM2007 Speech Recognition Chip            HUALON Microelectronic Corp., USA
OKI VRP6679 Voice Recognition Processor   OKI Semiconductor and OKI Distributors, Corporate Headquarters, 785 North Mary Avenue, Sunnyvale, CA 94086-2909
Speech Commander                          Verbex Voice Systems, 1090 King Georges Post Rd., Bldg 107, Edison, NJ 08837, USA
Voice Control Systems                     Voice Control Systems, Inc., 14140 Midway Rd., Dallas, TX 75244, USA, http://www.voicecontrol.com/
VCS 2060 Voice Dialer                     Voice Control Systems, 14140 Midway Rd., Dallas, TX 75225, USA, http://www.voicecontrol.com/
Table 2.4: Some of the available SR hardware modules and their manufacturers.
2.3 VUI (Voice User Interface) in Robotics
After nearly three decades of research, SR systems are mature enough to use as a User Interface (UI). Scientists are still working to overcome the remaining problems of SR systems. Several projects are now underway to introduce SR systems as a UI in robotics [30, 6, 22, 20, 11, 17]. Most of these projects work on service robots and focus on the novice user controlling or instructing the robot. Such an interface is easier to introduce to the novice user than GUI, keyboard, joystick and similar technologies. This is because humans are used to giving voice instructions (like "Go to the office room and bring the file for me.") in everyday life. But the challenge of HRI is that the novice user only knows how to give instructions to a human; so the research goal is to make the robot capable enough that it can understand the same high-level instruction or command.
In software development, the normal practice is to design the UI at an early stage of the design process and then design and develop the software based on the UI design. In robotics, the UI concept depends on the robot's sensors. The spoken interface is a very new component in the HRI field. In the social context people expect the robot/machine to understand unconstrained spoken language, so the question of interface needs to be considered prior to robot design [6]. For example, if a mobile robot needs to understand the command "turn right at the blue sign", it will need to be provided with color vision [6]. Another important point is that the instructions should be related to the robot's structure or shape; for example, if the robot has a car shape, then the instructions should correspond to the car-driving environment. People have already adopted the scenario of giving instructions from the social context, so when they see the car environment, they naturally interact with the car (robot/machine) depending on that environment. Continuous testing with users is extremely important in the design process for service robots. The instruction design for the robot should not focus only on the individual user; other members of the environment can be seen as secondary users or bystanders who tend to relate to the robot actively in various ways [17]. Knowing about the objects in the environment is one of the important criteria in robot navigation.
When the user gives an instruction like "Go to my office", the robot should understand the object "my office"; this is the natural description of an object in the social context [30]. From the HRI point of view, the robot should understand its environment and its task.
Another component is the speaker (loudspeaker). If anything goes wrong, the robot can inform the user through the loudspeaker using a speech synthesizer (see the Speech section for details). For example, if the robot doesn't understand the command, it can give feedback to the user through speech or dialogue - like "I don't understand" - using the speech synthesizer.
Figure 2.2 shows a general overview of a Spoken Natural Language Interface for Robotic Control.
In the beginning, researchers worked with simple-grammar sentence instructions, like "Move", "Go ahead", "Turn left". One example is VERBOT (an isolated-word, speaker-dependent voice recognition robot), a hobbyist robot sold in the early 1980s - it is no longer available on the market [13]. Now researchers emphasize complex-grammar sentence instructions, which people normally use in their daily life [30, 6, 22, 20, 11, 17]. We have organized our project work in the same way. Roboticists have also used speech synthesizers for error feedback. LEDs or colored lights can also be used for user feedback, but they are not suitable enough as feedback for a human user.
Chapter 3
Language and Speech
"A language is the system of communication in speech and writing that is used by people of a particular country or area." [26]
In short, we can say a language is a systematic way of communication using sounds and symbols. From the above definition it is clear that speech is one of the important media of communication, but it should be used in a systematic way - meaning it should follow rules, or grammar - before we can call it a language. So grammar is also an important part of a language.
The way we communicate through speech is called spoken language; more specifically, "(language) communication by word of mouth" [37]. In spoken-language communication there are two important things - one is speech and the other is speech understanding. "Something spoken" [37] is called speech, and if, after hearing, a person understands what was spoken, that is speech understanding. In the social context we use natural language as a spoken language. Now the question arises - what is natural language? People are social beings and language is the way people communicate; we normally call it Natural Language, more specifically "a language that has developed in a natural way and is not designed by humans" [26].
3.1 Speech
Speech is an essential component of spoken language. From the earlier discussion of spoken language, we see that speech understanding and speech are two important components of spoken language. In terms of machines, scientists define these two components as the speech recognition system and the speech synthesizer. Below we continue our discussion of these two components.
The earliest speech synthesizer was invented by Thomas Edison in 1878 [21]. He introduced the record player, or Phonograph ("talking machine"), which is one kind of speech synthesizer. The mechanism of a record player is to record voice/speech and also to play it back. Due to advances in technology, you can now even create voice/speech from text. This technique is called Text-to-Speech synthesis, in short TTS.
TTS is computer software that converts text into audible speech [3]. It is a separate technology from speech recognition: TTS is for talking and SR is for listening. The two share some technology; that is why manufacturers and developers construct combined products. TTS is available only for the SRSP technology. For the SR Hardware Module (SRHM), the speech synthesizer normally uses a digitized voice recording mechanism. The main advantage of the digitized voice recording mechanism is that the sound/voice can be stored in the computer's memory. [13]
3.2 Grammar
One of the key components of a language is grammar. A grammar is "the rules in a language for changing the form of words and joining them into sentences" [26]. In other words, "grammar is a body of statements of fact - a science; but a large portion of it may be viewed as consisting of rules for practice, and so as forming an art" [25]. The main point is that it is a way of structuring words to make sentences meaningful.
In a speech recognition system, a grammar is "a collection of phrases for which the speech recognition engine should be listening" [34].
The simplest artificial grammars can be specified through finite automata, and more general artificial grammars (approximating natural language) are specified in terms of a context-sensitive grammar [8]. Most SR systems use a context-free grammar (CFG) for natural language processing, since CFGs are widely studied and well understood, and efficient parsing mechanisms have been developed for them [23]. The theory of context-free languages has been extensively developed since the 1960s [16]. A CFG is a way of describing a language by recursive rules called productions [16]. A CFG (G) is represented by four components, G = (V, T, P, S), where V is the set of variables (called non-terminals), T a finite set of symbols (called terminals), P the set of productions, and S the start symbol [16].
1. S → I
2. S → S + S
3. S → (S)
4. I → a
5. I → b
6. I → Ia
7. I → Ib
Figure 3.1: A context-free grammar for simple expressions (i.e., a+b or ab+ba etc.)
The above grammar for expressions is stated formally as G = ({S, I}, T, P, S), where T is the set of symbols {+, (, ), a, b} and P is the set of productions shown in Figure 3.1. In Figure 3.1, Rule (1) is the basis rule for expressions: it states that an expression can be a single identifier. Rules (2) and (3) are the inductive cases: Rule (2) states that an expression can be built from two expressions connected by a plus sign, and Rule (3) says that an expression may have parentheses around it. Rules (4) through (7) describe identifiers I. The basis rules are (4) and (5); they state that a and b are identifiers. The remaining two rules are the inductive case: if we have an identifier, it can be followed by a or b and the result is another identifier. [16]
In a CFG, a production such as S → S + S can be applied wherever the non-terminal S appears. But in the case of context-sensitive grammars (CSG), the productions are restricted to rewrite rules of the form
uXv → uYv
that is, the non-terminal X may be rewritten as Y only in the context of u and v.
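The Figure 3.1 grammar can be recognized by a small top-down parser. The sketch below first removes the left recursion in rules (2), (6) and (7) by refactoring the grammar into the equivalent form S → T ('+' T)*, T → I | '(' S ')', I → [ab]+; both the refactoring and the code are illustrative, not part of the thesis implementation.

```python
# Recursive-descent recognizer for the expression grammar of Figure 3.1.
def parse(s: str) -> bool:
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def term() -> bool:               # T -> I | '(' S ')'
        nonlocal pos
        if peek() == '(':
            pos += 1
            if not expr() or peek() != ')':
                return False
            pos += 1
            return True
        if peek() in ('a', 'b'):      # I -> a | b | Ia | Ib
            while peek() in ('a', 'b'):
                pos += 1
            return True
        return False

    def expr() -> bool:               # S -> T ('+' T)*
        nonlocal pos
        if not term():
            return False
        while peek() == '+':
            pos += 1
            if not term():
                return False
        return True

    ok = expr()
    return ok and pos == len(s)       # accept only if all input consumed
```

For example, the strings "a+b" and "ab+ba" from the figure caption are accepted, while malformed input such as "a+" is rejected.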
One of the complexity measures of an SR system is the size of the vocabulary and the complexity of the artificial grammars. The SR tools give developers the opportunity to create grammars for their system's context. From the roboticist's point of view, the grammar should be created in the context of the robot's environment and related to the robot's tasks. So, before creating the grammar for the SR engine, the roboticist needs to study the task definition and the users.
Chapter 4
Implementation
The main goal of our project is to introduce a spoken natural language interface for robotic control. We also set some requirements, which were mentioned in the Introduction chapter:
- The robot should give some user feedback; for example, if the robot doesn't understand the user's commands, it gives the feedback "I don't understand".
- The robot should understand the dialogues mentioned in Tables 4.1, 4.2 and 4.3.
Tables 4.1, 4.2 and 4.3 show the sentences/dialogues we have chosen to evaluate our system. These sentences/dialogues are arranged in the tables on the basis of grammar complexity and robotic activities.
Note: The underlined words are variables; e.g., in "Move 10 centimeters", any number can be used in the sentence.
Table 4.1 shows simple sentences/dialogues for simple, limited robotic activities; Table 4.2 shows simple sentences/dialogues for complex robotic activities in a limited scope; and Table 4.3 shows complex sentences/dialogues for simple robotic activities in a limited scope.
To achieve our goal, we organized our project in two stages. In Stage I we studied the related work and found suitable components (software and hardware components - see details in Appendix A) for the implementation stage. In Stage II we did the implementation, in two phases: in the first phase we worked with the SRHM, and in the second phase with the SRSP. In both phases we worked with the same small mobile robot, the Khepera.
4.1 General Robotic Design
The reactive paradigm became popular at the end of the 1980s because of its fast execution time, but limitations remain, caused by eliminating planning. To overcome these limitations, the hybrid deliberative/reactive paradigm emerged in the 1990s [24]. Purely reactive robotics is not appropriate for all robotic applications [2]. The hybrid paradigm is capable of integrating deliberative reasoning and a reactive control system. This permits the robot to reconfigure the reactive control system based on world knowledge, through deliberative reasoning over a world model.
To create a hybrid-paradigm system, we must identify the behaviors for our robotic control system. For our project we defined the behaviors mentioned in Table 4.4.
Behavior         Purpose
Move             Straight robot movement
Turn             For turning
Avoid-Obstacle   Avoid obstacles
Follow-wall      Follow the wall
Move-to-goal     Find and follow the goal heading
Obstruction      Identify the obstacle
At-goal          Identify the goal position
Table 4.4: The behaviors defined for our robotic control system.
These behaviors are reactive behaviors, and they are switched according to user commands. Tables 4.1, 4.2 and 4.3 list the users' sentences/dialogues by robotic activity. We now describe the relation between these robotic-activity sentences and the behaviors mentioned above.
If the user gives a command related to the Move robotic activity, like "Move", the Move behavior is switched on; by default it makes the robot move forward, but the user can also give a distance (in centimeters) that makes the robot move that specific distance. For sentences of the Turn robotic activity, the Turn behavior is switched on; it makes the robot turn, and takes the direction (right or left) or a number of degrees as input to turn the robot in a specific direction. The Avoid-Obstacle behavior helps the robot avoid obstacles in its arena. This behavior also toggles with other behaviors whenever there is an obstacle in front, to make the motion safe. The command sentences of the Follow-wall activity make the robot switch on the Follow-wall behavior, which makes the robot follow a wall or an obstacle. For the "Initiate a location" activity, the robot stores the current position in global memory. For the "Find a location" activity, the Move-to-goal, At-goal, Obstruction and Follow-wall behaviors toggle each other depending on the situation. Move-to-goal makes the robot turn in the goal direction (i.e., toward the location it is looking for) and move towards the target. The Obstruction behavior helps the robot detect an obstruction whenever one comes in front of it in the goal direction; it then switches on the Follow-wall behavior. The At-goal behavior helps the robot identify the goal position and, if it is positively identified, stops the robot.
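The behavior switching described above can be sketched as a simple dispatcher: the user command selects a base behavior, and Avoid-Obstacle suppresses its output whenever the front IR sensors report an obstacle. All function names, sensor indices and the threshold value below are illustrative assumptions, not the actual Khepera interface.

```python
OBSTACLE_THRESHOLD = 600   # assumed IR reading indicating a near obstacle

def move(ir):       return (5, 5)     # straight ahead
def turn_left(ir):  return (-3, 3)
def turn_right(ir): return (3, -3)

def avoid_obstacle(ir):
    # steer away from the side with the stronger IR reflection
    left, right = ir[0], ir[5]
    return (4, -4) if left < right else (-4, 4)

BEHAVIORS = {"move": move, "turn left": turn_left, "turn right": turn_right}

def step(command, ir):
    """Return (left_speed, right_speed) for one control cycle."""
    base = BEHAVIORS.get(command, lambda ir: (0, 0))   # unknown command: stop
    if max(ir[2], ir[3]) > OBSTACLE_THRESHOLD:         # front sensors blocked
        return avoid_obstacle(ir)                      # suppress base output
    return base(ir)
```

The suppression branch mirrors the "S" node of Figure 4.1: the base behavior is still selected, but its output is replaced by Avoid-Obstacle's while an obstacle is in front.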
After identifying the behaviors, our next step is to organize the behaviors for the hybrid paradigm. In general, a hybrid architecture has five components or modules [24]:
Sequencer - The agent which generates the set of behaviors to use in order to accomplish a subtask, and determines any sequences and activation conditions.
Resource manager - Allocates resources to behaviors, for example selecting which sensors feed a behavior.
Cartographer - Responsible for creating, storing and maintaining map or spatial information, plus methods for accessing the data.
Mission planner - This agent interacts with the human, operationalizes the commands into robot terms, and constructs a mission plan.
Performance monitoring and problem solving - This module allows the robot to notice whether or not it is making progress.
We have followed these common components to create the hybrid architecture for our project. Table 4.5 summarizes our hybrid architecture (Figure 4.1) in terms of the common components and the style of emergent behavior.
Table 4.5: The summary of the hybrid architecture (Figure 4.1) in terms of the common components and style of emergent behavior.
Figure 4.1 presents the hybrid architecture of our prototype. In this architecture the Reactive planner module works as the Sequencer as well as the Performance monitoring and problem solving agent: it selects behaviors from the behavior library, sends them to the Reactive behaviors module, and constantly monitors the inputs of the VUI, Position identifier and Object recognition modules to solve the current problem. The Voice User Interface (VUI) module, which acts as the Mission planner, interacts with the human and sends the mission plan to the Reactive planner. The Position identifier and Object recognition modules act as the Cartographer: the Position identifier continuously records the current position, and the Object recognition module identifies the goal object. The Reactive behaviors module acts as the Resource manager. In the reactive layer, the Avoid-Obstacle module suppresses (marked in Figure 4.1 with an S) the output from the Reactive behaviors module: the Reactive behaviors module is still executing, but its output doesn't go anywhere; instead, the output from Avoid-Obstacle goes to the actuators whenever the robot encounters an obstacle in front.
The left and right motor speeds are computed as weighted sums of the IR sensor readings:

m_L = \sum_{i=1}^{8} w_i r_i + w_0, \qquad m_R = \sum_{i=1}^{8} v_i r_i + v_0

Here w_i, w_0, v_i and v_0 are weights, r_i are the IR sensor readings, and m_L and m_R are the speeds of the left and right motors of the Khepera. These equations help us create the Avoid-Obstacle and Follow-wall behaviors.
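A minimal sketch of these weighted sums, with invented weight values (the thesis does not list the actual weights used on the Khepera):

```python
# Illustrative weight vectors for the 8 IR sensors; positive bias terms
# give forward motion when no sensor is excited.
W = [ 2,  3,  5, -5, -3, -2, 0, 0]   # left-motor weights w_i (assumed)
V = [-2, -3, -5,  5,  3,  2, 0, 0]   # right-motor weights v_i (assumed)
W0, V0 = 5, 5                        # bias terms w_0, v_0

def motor_speeds(ir):
    """Map the 8 IR readings to (m_L, m_R) via the weighted sums above."""
    mL = sum(w * r for w, r in zip(W, ir)) + W0
    mR = sum(v * r for v, r in zip(V, ir)) + V0
    return mL, mR
```

With opposite-signed weight vectors such as these, an obstacle seen on one side speeds up the near wheel and slows the far one, turning the robot away - the Braitenberg-style basis of the Avoid-Obstacle behavior.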
Odometry: Odometry is used to determine the current Khepera position (x-coordinate, y-coordinate, theta). In this algorithm, the set position function is called to set the initial Khepera values for x, y and theta. The read position function is used to obtain the tick counts. These tick-count values are used to compare the kinematic movement of the left and right wheels of the Khepera. We followed the equations below to calculate the position from the tick counts [15].
R = \frac{l}{2}\,\frac{n_l + n_r}{n_r - n_l}

\omega\,\delta t = \frac{(n_r - n_l)\,\mathrm{step}}{l}

ICC = [ICC_x, ICC_y] = [x - R\sin\theta,\; y + R\cos\theta]

\begin{bmatrix} x' \\ y' \\ \theta' \end{bmatrix} =
\begin{bmatrix} \cos(\omega\,\delta t) & -\sin(\omega\,\delta t) & 0 \\
                \sin(\omega\,\delta t) & \cos(\omega\,\delta t) & 0 \\
                0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x - ICC_x \\ y - ICC_y \\ \theta \end{bmatrix} +
\begin{bmatrix} ICC_x \\ ICC_y \\ \omega\,\delta t \end{bmatrix}

where (x, y, θ) is the previous robot position and (x', y', θ') is the newly calculated position; ICC is the Instantaneous Center of Curvature, ω the angular velocity and δt the time interval. The wheel encoders give the tick counts n_r and n_l; step is the length (mm) of one encoder tick and l the distance between the wheels. (See Figure 4.2)
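The odometry update above can be sketched in code. The constants STEP and L (tick length and wheel separation) are placeholder values, not the Khepera's real parameters; the straight-line case n_l = n_r is handled separately because R is undefined there.

```python
import math

STEP = 0.08   # assumed mm per encoder tick
L = 53.0      # assumed wheel separation in mm

def update_pose(x, y, theta, nl, nr):
    """Advance the pose (x, y, theta) given left/right tick counts."""
    if nl == nr:                        # straight-line motion: no rotation
        d = nl * STEP
        return x + d * math.cos(theta), y + d * math.sin(theta), theta
    wdt = (nr - nl) * STEP / L          # rotation omega * dt
    R = (L / 2) * (nl + nr) / (nr - nl)
    iccx, iccy = x - R * math.sin(theta), y + R * math.cos(theta)
    c, s = math.cos(wdt), math.sin(wdt)
    nx = c * (x - iccx) - s * (y - iccy) + iccx   # rotate about the ICC
    ny = s * (x - iccx) + c * (y - iccy) + iccy
    return nx, ny, theta + wdt
```

Equal tick counts translate the robot along its heading; opposite counts rotate it in place about its own center, matching the ICC formulation above.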
Bug algorithm: This algorithm is used to make the robot navigate from the source position to the destination position.
Figure 4.3: The robot can handle this kind of situation using the Bug algorithm [14].
In the algorithm, there is a while loop that checks whether the goal has been reached. As long as the goal position is not reached, the Khepera checks for obstacles. If it meets an obstacle, it follows the obstacle using the followobstacle function. If it does not encounter an obstacle, it uses the move2goal function to move towards the goal direction. The speeds of the left and right wheels are obtained from either the followobstacle function or the move2goal function. Then the set speed function is called to make the Khepera move with the obtained wheel speeds. The current position is updated, and the Khepera stops when it reaches the goal. [14, 10]
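The loop described above can be sketched as follows; the robot methods mirror the functions named in the text (followobstacle, move2goal, set speed) but are hypothetical placeholders, not the real Khepera API.

```python
def navigate(robot, goal):
    """Bug-style navigation: head for the goal, detour around obstacles."""
    while not robot.at_goal(goal):
        if robot.obstacle_ahead():
            left, right = robot.follow_obstacle()   # trace the obstacle edge
        else:
            left, right = robot.move2goal(goal)     # head toward the goal
        robot.set_speed(left, right)
        robot.update_position()                     # odometry update
    robot.set_speed(0, 0)                           # stop at the goal
```

Any object providing these six methods can be driven by the loop, which makes the control logic easy to test against a simulated robot.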
In the beginning we studied the above-mentioned software and hardware components (see details in Appendix A). After that we designed a work outline for this development phase. We defined simple grammars for the spoken dialogues for the SRHM, since it is not capable of loading a large vocabulary; the reason is its limited memory space. First the mechanisms of the Khepera and the VE Module were investigated, and after that the interface and means of communication between the VE Module and the Khepera.
As for the Khepera hardware, it has 8 IR and ambient-light sensors, a microcontroller, and 2 DC brushed servo motors with incremental encoders and wheels [19]. With the help of these IR sensors and other hardware components, we implemented the behaviors mentioned in Table 4.4. After studying the General I/O Turret, we found a way of communicating with an external device from the Khepera. Through the General I/O we can only transfer/receive 8 bits (1 byte) of data from the Khepera. (See details in Appendix A.)
to carry signals between the General I/O Turret and the VE Module. Of the 11 I/O pins, we selected P1-0 to P1-6 as output pins; P0-1, P0-3 and P0-4 as red, yellow and green LED outputs; and P0-7 as a training-mode selection pin (it is also set as an input pin); pin 4 is for MIC IN (the default pin for microphone input). (See the detailed pin configuration in Appendix A.)
To start writing the project application for the VE Module, we needed to get used to the Voice Extreme™ Toolkit. This Toolkit has some hardware components and some software components, which we mentioned at the beginning of this section. Now we discuss some details about their usage.
We used the first two data files for our application. The *.ves data file was used for the speech synthesis technique; it is a speech table. Quick Synthesis™ was used to produce a speech file (*.ves). A *.veo data file is used for sentence generation from one or more speech tables (*.ves files). We used a *.veo file for speech synthesis in the training session. [32]
Figure 4.5 shows an overview of the interface between the Khepera General I/O Turret and the VE Module. Four areas are marked there:
1. Serial line (S) connector - for interfacing with the PC.
2. I/O connections area - we only use the input pins.
3. Free connections area - we have set up the LEDs there.
4. Module connector - used for interfacing with other devices.
We have intended to use LED to give the developer feedback about the communication
status and the device status. Red LED informs the status about CL feature of the
SR module, Yellow LED gives the developer status whether the device is ready for
the listening or not. The Green LED gives the status of Recognition or not. As a
consequence of using the SD feature, we have needed a pin for mode selection. In the
above we mention it as a Training mode selection pin. To use the SD feature we need
24 Chapter 4. Implementation
Figure 4.5: The circuit diagram of the interface between Khepera General I/O Turret
and VE Module .
a training session to store the user's voice templates for every word or phrase.
When this pin is HIGH, it sets the device to the training session; LOW sets it to the
SR mode. Figure 4.6 shows a picture of the Khepera with the VE Module after
implementing the circuit design.
4.2. Hardware Approach 25
Communication Protocol
For data communication between the Khepera and the VE Module, we have chosen a
packet-based technique. The maximum size of a command-sentence packet is 6 bytes,
starting and ending with the same number, 127 or 126. Which of these two numbers is
used depends on the previous packet's start/end number: if the previous packet
started and ended with 127, the next newly generated packet starts and ends with 126.
When the power is switched on, the first command-sentence packet recognized through
the VE Module starts and ends with 126. (See Figure 4.7.)
The starting and ending numbers let us identify a packet's boundaries. The reason we
have chosen two alternating numbers is to identify the last generated packet,
because the last generated packet is the new command for the Khepera.
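The framing rules above can be sketched in a few lines. This is an illustrative sketch; the helper names are ours, not from the implementation:

```python
def build_packet(word_indexes, prev_marker):
    """Frame a command sentence as described above: at most 6 bytes,
    bracketed by a marker that alternates between 127 and 126 from one
    packet to the next, so the receiver can spot a freshly generated packet."""
    marker = 126 if prev_marker == 127 else 127
    assert len(word_indexes) <= 4                     # 6 bytes minus the two markers
    assert all(0 <= i <= 125 for i in word_indexes)   # 126/127 are reserved
    return [marker] + list(word_indexes) + [marker], marker

def parse_packet(packet):
    """Check the markers and return the word indexes in between."""
    assert packet[0] == packet[-1] and packet[0] in (126, 127)
    return packet[1:-1]
```

Starting with `prev_marker=127` reproduces the power-on behavior, where the first recognized packet starts and ends with 126.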
Language Model
A language model, or artificial grammar, is an important issue for a speech recognition
system. The problem with an SRHM (here, the VE Module) is that the developers
have to take care of this matter themselves during design and implementation.
We have also designed a language model for our system, with a limited scope:
first we selected some words/phrases that fulfill our goal for the system, and
then designed a Lexicon table and the artificial grammars, which are presented below.
Grammar
Semantic Analysis
Check the mapping between Unit/Object and Command to find the proper meaning
of the sentence and the proper function to run. For example, from the lexicon we
find a mapping such as U2=U2, meaning that if the word degrees occurs in a
sentence, the word turn should occur in the same sentence.
Table 4.6 shows the words/phrases selected for the system design; these are also used
in the training session. The user of the system has to train the system following this
Lexicon table. Some marks are used next to a word or phrase, like U1, U2 and O1;
these marks are used for semantic analysis (see Figure 4.9).
Figure 4.8 presents the artificial grammars for the SR system. Using these artificial
grammars we perform the syntactic analysis in the VE Module when it recognizes
a sentence for the system. An example of syntactic and semantic analysis is given below:
Move 1 centimeter is an example of a command sentence which the user can say to
the robot. The system recognizes the sentence as a sequence of words: Move, 1 and
centimeter. After recognizing the words, the system looks up their types in the
Lexicon table (move - Command, 1 - Parameter, centimeter - Unit), keeping the
types in the same order as the recognized words, and then matches the type sequence
against the artificial grammars, i.e., Command + Parameter + Unit.
The system also does the semantic analysis, i.e., (move) U1 = (centimeter) U1.
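The two analysis steps can be sketched as follows. The lexicon fragment and its U1/U2 marks only loosely follow Table 4.6; this is an illustrative sketch, not the thesis code:

```python
# Illustrative fragment of the Lexicon table: word -> (type, semantic mark).
LEXICON = {
    "move": ("Command", "U1"), "turn": ("Command", "U2"), "stop": ("Command", None),
    "1": ("Parameter", None), "30": ("Parameter", None),
    "centimeter": ("Unit", "U1"), "degrees": ("Unit", "U2"),
}
# Artificial grammars: the allowed sequences of word types.
GRAMMAR = [("Command",), ("Command", "Parameter", "Unit")]

def analyze(words):
    """Syntactic analysis: the word-type sequence must match a grammar.
    Semantic analysis: all semantic marks present must agree,
    e.g. (move) U1 = (centimeter) U1."""
    if any(w not in LEXICON for w in words):
        return False
    entries = [LEXICON[w] for w in words]
    if tuple(t for t, _ in entries) not in GRAMMAR:
        return False
    marks = {m for _, m in entries if m is not None}
    return len(marks) <= 1
```

For instance, "move 30 degrees" passes the grammar check but fails the semantic check, since U1 (move) does not match U2 (degrees).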
Training Mode
We need to train the VE Module because we are using the speaker-dependent feature,
in which the user must store his/her voice patterns through a training session.
The Training mode selection pin activates the training session when it is HIGH;
otherwise the system uses the previously stored patterns, if it has been trained
before. We have divided the training session into four steps: in the first step the
user trains the VE Module with Stop or a similar word, and the consecutive steps
are trained with the Command, Parameter and Unit words. The reason behind these
steps is that the language model of this implementation consists of Command,
Parameter and Unit words, like Move 1 centimeter (Command + Parameter + Unit),
and the VE Module returns the index number of the recognized pattern from the
storage table. The training session thus lets us identify the index range of the
three types of trained words, e.g., indexes 0-5 are Command-type words. These
ranges are helpful for the syntactic analysis of the recognized sentence.
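Because the VE Module returns only an index into the pattern table, the training-step boundaries double as a type map. A minimal sketch; the concrete range boundaries here are illustrative, not taken from the thesis:

```python
# Index ranges fixed by the four training steps; the boundaries below are
# illustrative - in the real system they follow from how many words were
# stored in each step.
RANGES = [(0, 0, "Stop"), (1, 4, "Command"), (5, 14, "Parameter"), (15, 24, "Unit/Object")]

def word_type(index):
    """Map a pattern index returned by the VE Module back to its word type."""
    for low, high, kind in RANGES:
        if low <= index <= high:
            return kind
    raise ValueError(f"index {index} is outside the trained table")
```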
Khepera (Robot)
We have followed the general robotic design structure to make the robot intelligent.
First we implemented the behaviors mentioned in Table 4.4. To implement these
behaviors, we have followed the Braitenberg vehicle technique [4], odometry [15]
and the Bug algorithm [10].
The Braitenberg vehicle technique [4] helps us to implement the Avoid-obstacle and
Follow-wall behaviors (see more details in section 4.1.1).
The odometry gives the Khepera position (x, y, θ), where x, y are the coordinates and
θ is the heading of the Khepera, and the Bug algorithm [10] helps to move to the goal
position (see more details in section 4.1.1).
After building the behaviors mentioned in Table 4.4, we manage them by following the
hybrid architecture shown in Figure 4.1. According to the architecture, the program
selects behaviors based on the voice command recognized through SR and activates them.
To avoid collisions, we have implemented a mechanism that switches on the
Avoid-obstacle behavior whenever an obstacle is nearby.
We have the Lexicon table (see Table 4.6) of words in the Khepera function/module,
which is identical to the stored voice patterns for words in the VE Module. Here
identical means that if an index represents a voice pattern for a word in the VE
Module, the same index represents the same word in the Lexicon table - so the index
numbers we read out from a packet map directly to words in the Lexicon table.
After identifying the words, we do the semantic analysis to verify the meaning of
the sentence. For example, the identified command sentence could be Move A cm;
the sentence follows the grammar perfectly, i.e., Command+Parameter+Unit, but A is
not a correct parameter for the Move command - it should be a number-type parameter,
e.g., 10. If the sentence is meaningful, the command is sent to activate the related
behaviors.
1 The VE Module's 7 I/O pins are connected to the Khepera for sending data. Through 7 I/O pins
we are able to generate any number within 0-127. We have reserved the numbers 127 and 126 for the
packet start/end byte only; the other numbers are used for representing the indexes of the words
stored in the VE Module.
4.3. Software Approach 29
First we check whether the Training mode pin is HIGH or LOW. If it is HIGH we call
the training function. In the training mode, we save the user's voice patterns in the
flash memory of the VE Module. At the beginning of the training session we allocate
the memory for the voice patterns to be saved. There are four steps in the training
session. The first word of the training session should be Stop or a similar word,
and the system then automatically switches to the next step. We suggest that the
user choose Stop or a similar word because, according to our design, the user can
use this word both to finish the other consecutive steps and as a command word to
stop the robot's movement. In each of the following steps the user has the option
to train a maximum of 20 words. In the second step the user trains the system with
Command words; according to our Lexicon table 4.6 he/she can only train 4 Command
words, so after training these four Command words, he/she can proceed to the next
step by simply saying the word recorded in the first step, i.e., Stop. To collect
a voice-pattern sample, we first collect a sample of a word from the user by
requesting it through speech synthesis, i.e., Say word one. After collecting the
first sample, we request another sample, again through speech synthesis, i.e.,
Repeat. Then we check the similarity of the two samples; if they match each other
we store the average of the two samples, otherwise we ask for another sample
through the Repeat request. In the third step the user trains the module with
Parameter words, and in the last step with Unit words and Object words.
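The collect/compare/average dialogue can be summarized as below. The actual pattern matching happens inside the VE Module firmware; the distance measure and threshold here are placeholders of our own:

```python
def train_word(get_sample, say, max_attempts=5, threshold=10.0):
    """Collect two matching samples of one word and store their average,
    mirroring the 'Say word' / 'Repeat' dialogue described above.
    get_sample() returns a feature vector; say() drives speech synthesis.
    Both, and the distance measure, are hypothetical stand-ins."""
    say("Say word")
    first = get_sample()
    for _ in range(max_attempts):
        say("Repeat")
        second = get_sample()
        # Placeholder similarity check: sum of absolute differences.
        distance = sum(abs(a - b) for a, b in zip(first, second))
        if distance <= threshold:
            # Samples match: store their element-wise average as the template.
            return [(a + b) / 2 for a, b in zip(first, second)]
    raise RuntimeError("samples never matched")
```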
After collecting the lexicon through the training session, the VE Module is ready for
speech recognition. We have applied the Continuous Listening (CL) feature for SR.
To implement the CL feature, we have used a built-in function that recognizes a word
pattern from the lexicon and returns the index number of the word in the table. We
set this built-in function to listen for a 2-second duration and then time out; if it
hears a word within this duration it waits for another word, and so on as long as the
word sequence follows the grammar (see Figure 4.8). While the module waits for a
word it blinks the yellow LED. When the function hears words it does two things:
it recognizes the pattern and checks the grammar. If any recognition or grammar
error occurs during processing, it turns on the red LED; if everything goes fine it
gives the green signal through the green LED. After recognizing a sentence, it
builds a command-sentence packet using the protocol (see Figure 4.7) and then
retransmits the packet every 2 seconds through the output pins until a new packet
is generated.
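One pass of this listening loop can be sketched as a small state machine. `recognize` and the LED callbacks are hypothetical stand-ins for the VE built-ins:

```python
def listen_sentence(recognize, grammar_next, yellow, red, green):
    """One pass of the continuous-listening loop described above.
    recognize() stands in for the VE built-in: it returns a word index,
    or None on the 2-second timeout. grammar_next(words, index) says
    whether the grammar allows index after the words heard so far."""
    words = []
    while True:
        yellow(True)            # blinking yellow: waiting for a word
        index = recognize()     # blocks for up to 2 seconds
        yellow(False)
        if index is None:       # timeout: the sentence is finished
            break
        if not grammar_next(words, index):
            red(True)           # recognition/grammar error
            return None
        words.append(index)
    green(True)                 # sentence accepted
    return words
```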
Software Component: Visual Basic 6.0 (VB6), SpeechStudio Developer Bundle (SpeechStudio,
SpeechRunner, Lexicon Builder, Lexicon Lite, SpeechPlayer, Profile Manager)
There are several SR software products (SRSPs) available on the market, and they are
used commercially in many products' user interfaces. These SRSPs are more mature
than the SRHM and also support large vocabularies and complex grammars. That is why
we have chosen to implement another prototype using an SRSP. In this implementation
phase our first step was to get to know the chosen components. We chose the
SpeechStudio Developer Bundle as the SR interface, because it works with the
Microsoft Speech API and our development environment was Microsoft Windows.
We have done this implementation in two steps. One has been tested with simple
sentences - i.e., we presented it as a Candy Robot at the Stockholm International
Fair - and the other has been tested with more complex sentences for controlling
the robot. (See details in Chapter 5.)
Khepera
In section 4.2.1 we mentioned two approaches for programming the Khepera: one through
the sercom protocol, the other through the GNU C cross compiler [19]. In the previous
phase (the hardware approach) we used both, but for this phase we have only used the
sercom protocol, which allows the user to control the robot from any standard
computer using ASCII commands [19], together with VB6.0 to communicate with the
Khepera through the sercom protocol.
We have implemented the behaviors following the same strategy mentioned in
section 4.2. The difference is that we implement all behaviors using VB6.0 and
the sercom protocol.
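The sercom protocol drives the robot with short ASCII commands over the serial line. A sketch of the string handling; the command letters ("D" to set wheel speeds, the "n,…" proximity reply) follow the Khepera sercom documentation [19], but should be verified against your firmware's manual:

```python
def set_speed_cmd(left, right):
    """Format the sercom command that sets the two wheel speeds.
    The 'D' command letter is as documented for the Khepera sercom
    protocol [19]; verify against your manual."""
    return f"D,{left},{right}\r\n"

def parse_proximity_reply(reply):
    """Split the ASCII reply to a proximity-sensor read (a lowercase
    command echo followed by eight values) into eight integers."""
    return [int(v) for v in reply.strip().split(",")[1:]]
```

In the VB6.0 implementation the same strings would be written to and read from the serial port object.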
Here we have not needed the General I/O Turret, because there is no external
hardware device to interface with the Khepera.
SpeechStudio
The SpeechStudio Developer Bundle has six components (mentioned above) for the
developers to handle. Among these [34]:
Profile Manager is used for adjusting the microphone and creating a user profile.
The SR system normally responds to any user - i.e., it is speaker independent - but
because of the noise factor it sometimes needs to be trained by the user to adjust
to the environment; that is why the user profile is important.
Lexicon Builder is used to add new words to the SR system's dictionary, and Lexicon
Lite is used to back up the dictionary.
Figure 4.11 shows the interfacing between the SpeechStudio SR system and VB6.0.
SpeechStudio Suite is an environment for the development of voice user interfaces
(VUIs) in Microsoft Visual Basic. SpeechStudio Suite has an authoring component
called SpeechStudio, which helps the developer design grammars to describe
conversations and to connect these grammars to actions in his/her programs. The
resulting grammar data is invoked at runtime via instances of the SpeechStudio
Control, which communicate as clients of the SpeechPlayer runtime system.
SpeechRunner is the SpeechStudio Suite's debugger and testing tool.
Figure 4.12: An example of Option Button and Text Box use for Move and
Turn behaviors.
Figures 4.12 and 4.13 give examples of how to control behaviors via an Option Button
and a Text Box through SpeechStudio (the SR system). Figure 4.13 shows a portion of
the grammar file named Task.gram, which is written to control the system through
speech. This figure also shows an example of how the developer can create a grammar
pattern to control the system components; this pattern specifies that when the
application (Speech Khepera), which controls and communicates with the robot, has
the attention
Figure 4.13: An example of creating a grammar to activate an Option Button and to
send a parameter to a Text Box for the Turn behavior.
of SpeechPlayer, our system user can say Khepera Please Turn 30 degrees; recognition
of this phrase will choose the option button Turn, named opttask(1) (shown on the
left side in Figure 4.12), and 30 will be set in the Text Box named txtparam (shown
on the right side in Figure 4.12). To activate the Turn option button in Figure 4.12
we have used the Press() function and sent the integer parameter to the Text Box by
simply using the SetWindowText(integer) function within the pattern
<action>. . . </action>; both are built-in functions of the SpeechStudio program.
The grammar file is an XML file. XML is a general language for exchanging
information. Each piece of XML is bracketed by a start token, such as <pattern>,
and a matching end token, in this case </pattern>. Empty pieces can be abbreviated
to <myToken/> instead of <myToken></myToken> [35].
In the example of Figure 4.13 (the Task.gram), the grammar pattern has two parts:
a Phrase part and an Action part. The Phrase part starts with the start token
<pattern> and ends with the end token </pattern>. The phrase which can be spoken
to control the system is written within <pattern>. . . </pattern>. In our example,
the phrase is ?Khepera ?Please Turn <integer/> degrees. Here <integer/> means it
can be any whole number - e.g., the user can say Turn 60 degrees - and the ? sign
before a word means the word is optional: it can be said with the other words in
the phrase, but is not necessary. The other words must be said to trigger the
action for which the grammar pattern is written; for the example of Figure 4.13,
this grammar pattern is written to activate the Turn behavior option with the
degrees parameter (like 60 degrees), so the user can say Turn 80 degrees, Please
Turn 80 degrees or Khepera Please Turn 80 degrees. The Action part starts with the
start token <action> and ends with the end token </action>; the action which will
be taken - choosing the Turn option button (opttask(1)) and setting the
integer-type variable in the Text Box (txtparam) after the phrase is spoken - is
written within <action>. . . </action>. The first line, opttask#1.Press();, means
that after recognition of the phrase written in the Phrase part, the SR system will
choose opttask(1) (the Turn option button) and then go to the second line,
txtparam.SetWindowText(integer);, which sets the integer-type variable (whole
number) recognized by the SR system from the phrase.
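Putting the pieces together, the grammar pattern described above would look roughly like this in Task.gram. This is our reconstruction from the description of Figure 4.13, not a verbatim copy of the file:

```xml
<pattern>?Khepera ?Please Turn <integer/> degrees</pattern>
<action>
  opttask#1.Press();
  txtparam.SetWindowText(integer);
</action>
```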
We have created the SpeechStudio component, which sends the recognized sentence to
the robotics application part. For coding simplicity we divide the robotics
application program into two main modules, which are further divided into more
modules (functions).
Of these two modules, one has the task of activating the components, making them
ready to communicate with each other, switching the behaviors whenever the system
needs to, and taking care of the user interface; we have named it frmcom. In the
other module we have written the general functions and behavior functions for the
Khepera; we have named it Khepcom. These functions can be called from the other
modules of the system. The algorithm is described below on the basis of the two
major modules of the system:
Khepcom module: In this module we have written the functions to communicate with the
Khepera. Here we have implemented the behaviors through the Braitenberg vehicle
technique [4], odometry [15] and the Bug algorithm [10], in the same way as
described in section 4.1.1.
The sercom protocol [19] for communication with the Khepera is also implemented in
this module through several small functions, such as F_Khepcom for sending and
receiving data from the Khepera through the serial cable, Set_speed for setting the
Khepera's wheel speeds, KStop for stopping the Khepera's movement, and Read_prox
for reading the proximity sensor data. The system has some global memory and some
search functions, which are also implemented here - find_obj is a search function
that finds the position of an object previously stored in the global memory through
an object identification command like This is room A; the global memory holds,
e.g., the Khepera's previous position.
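The global-memory lookup behind find_obj can be sketched as below; the names and the (x, y) representation are our own illustration of the idea, not the thesis code:

```python
# Hypothetical stand-in for the system's global memory: object name -> (x, y).
object_memory = {}

def store_object(name, position):
    """Handle an identification command like 'This is room A' by storing
    the robot's current position under that name."""
    object_memory[name] = position

def find_obj(name):
    """Search function: return the stored position of a named object, so a
    later navigation command has a goal for the Bug algorithm, or None."""
    return object_memory.get(name)
```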
Chapter 5
Evaluation
In our second approach we have used both complex and simple sentences, together with
some more complex robotic activities, though within a limited scope (see Tables 4.2
and 4.3). To design the test plan, we have considered the grammars and behaviors
implemented in the implementation stage.
To present our system at the fair, we have used the simple activities from Table 4.1
and also one activity from Table 4.2, the Back behavior. We limited the
sentence-making scope to these robotic activities, but we have not limited the
sentences themselves. We did this to give the user flexibility; e.g., a user can
form any sentence, like Robot, please move or Go forward, without using the
sentences mentioned in Tables 4.1 and 4.2. To achieve this goal, one of our duties
at the fair was to observe the users and record their sentences - whenever a new
sentence was used, we introduced it to the system afterward. We have also performed
a usability test of the system at the fair. For this usability test we have made a
user questionnaire (see Appendix C).
5.2 Results
Here we discuss our testing experience and the test results in detail. We executed
the testing phase according to our test plan, so we follow the same sequence to
present the results and the experiences.
The results of the test are not so impressive. The VE Module's (SR module's)
speaker-dependent feature is very sensitive. For example, if you train the module
from a particular distance (between the microphone and the user), then to get a
better SR result - in our case, to control the robot's activities - you have to
maintain the same distance from the microphone and also the same tone. Otherwise
it does not recognize the command sentences properly. We have also found that
sentences with three words are not always recognized, and that the LEDs are not a
suitable interface for user feedback.
We presented our system as a candy-picker robot, CARO. The idea behind it was to
give pleasure to the users and make them use the SR interface for robotic control
to gain a candy - like a fun game. We set a plow on the front of the robot, with
which the robot can push a candy along a plain surface, and we also made a cage of
plastic glass. That way, when we put the robot and candies inside the cage, the
users can see it from the outside; the cage also has a little door in the front
through which a candy can come out easily. The task of the user is to navigate the
robot to bring a candy to him/her through this little door.
From day one of the fair, the visitors gave as much response as we had expected.
People were curious about CARO and interested in trying for a candy. To learn the
users' impressions and to do the usability evaluation with real-time users, we had
prepared a user questionnaire, and we got a lot of users to fill in the
questionnaires.
Figures 5.1, 5.2 and 5.3 show pictures of CARO from the technical fair. These
pictures give an overview of CARO's arena.
Usability evaluation
For the usability evaluation of the SR interface for robotic control, we first
identified the usability factors by which we can evaluate the usability of this
system. Our chosen factors are:
Learnability - one of the most important factors for any system. Learnability can
be defined as how easy it is to learn the system; for our project, how easy it is
to learn to control the robot through speech. To assess the learnability factor we
have asked the users the following three questions:
Figure 5.3: Curious visitors watching CARO (picture from the technical fair).
Efficiency - if the system gives output that is accepted by the user, then we can
say that the system works efficiently. In this case: does the system respond
correctly to the user's speech? To investigate the system's efficiency factor we
have asked the users the following questions:
Flexibility - we can define flexibility as how well the system enables users to do
more things. Our investigation point is whether the commands are flexible enough
for the users to navigate the robot. To assess the flexibility factor:
User satisfaction - the main goal of any system is to satisfy the user. If the user
can do all the things he/she wants with a system, it is satisfying the user
perfectly. It is hard to measure user satisfaction through a few specific
questions, so to investigate this factor we have considered the answers to the
whole questionnaire (see Appendix C), but with more emphasis on the following
questions:
Before discussing the questionnaire results we present some information about the
users who participated in testing CARO and filled in the user questionnaire,
because user information is an important factor in a usability test. The
conclusions we draw from this user information and the questionnaires may not
reflect all people in society; they only reflect the participants at the fair, and
we do not know which types of people were in the majority at this technical fair.
We have analyzed the users by age, sex and occupation, all taken from the
questionnaire sheets. The user information is presented as histograms in
Figures 5.4 and 5.5.
Figure 5.4 shows that young males were the most interested in participating in the test. Of
Figure 5.4: The histogram shows the users' information by age and sex.
Figure 5.5: The histogram shows the participating users' information by age and
occupation.
the females, mostly older persons (all above 35 years) participated. According to
Figure 5.5, most of the participating users were students and PhD students. From
these two histograms we can also see that different kinds of people participated in
testing our system. Our project goal is to make a user interface for a service
robot which will work in a social context, and the interface should suit novice
users. This usability test data is therefore helpful to us, because of the
participation of different kinds of people (especially novice users).
System efficiency is an important factor in the usability test; it gives us
information about the problems and limitations of the system. To investigate the
efficiency, our main focus is: Is the robot responding correctly to the user's
speech? Based on this we have asked Questions 5, 7 and 8 (see Appendix C). The
answers are shown as pie diagrams a, b and c in Figure 5.7. According to diagram
(a), after giving a command to the robot, the delay time is not seen as a problem
by the users. Only 17% of the users found that it takes a long time to understand
the commands; the majority felt that it is not a big problem, and the rest found it
OK. The second diagram (b) shows that 61% of the users found that CARO responds
Often to the commands, 22% answered Seldom, and the rest answered Always. The third
diagram shows what CARO does when it does not understand a command: most of the
users say it does nothing, 52% say it does something else, and
only 4% say it does the right thing, but not perfectly. From these diagrams we can
say that CARO understands the commands often, and when it understands, it acts
correctly. Our finding is that the system behaves this way because of the SR
system's recognition problems; from the SR documentation [34] we know that the
noise factor affects SR system performance. A fair is a gathering of people, so the
noise factor makes the system respond Often, not Always.
Another usability factor is the flexibility of the system from the users' point of
view. We have evaluated the flexibility of our system by asking Question 6 (see
Appendix C). Our main focus is to find out whether the commands are flexible enough
to navigate CARO in its arena - are the commands sufficient, or do we need to add
more? Figure 5.8 presents the result and shows that 61% of the users believe that
the commands are sufficient to control CARO in its arena, 13% say they don't know,
17% believe that the existing commands are not sufficient and more need to be
added, like Fetch the Candy, and 9% say that they need training to control CARO.
From this result we can conclude that the commands are flexible enough to control
CARO.
The most important usability factor, and also the hardest to judge from the users'
answers, is user satisfaction. To investigate this factor we have considered the
answers to all questions, but with more emphasis on Questions 1, 7, 8 and 9 (see
Appendix C). We have already discussed the answers to Questions 7 and 8 when
investigating efficiency; now we discuss the answers to Questions 1 and 9.
Question 1 is mainly about how it feels to talk with CARO. Figure 5.9 presents the
results as pie diagrams. From Figure 5.9 (a), 43% find it fun to talk to the
system, 22% feel it is unusual, 17% find it funny, 9% say it is OK, and the
remaining users comment that CARO sometimes does not recognize the commands, that
they need training to control CARO, or that it is hard to know what to say. We have
also found the users' preferences for controlling the robot in Figure 5.9 (b): 70%
of the users like to use speech to control the robot, 22% prefer a
joystick/keyboard, 9% say it depends on the situation, and 4% say they don't know.
After evaluating all the answers, we have found that the majority of the users gave
positive answers about CARO, so we can conclude that our system satisfied our users.
Chapter 6
Discussion
The test results give us facts about our successes, problems and limitations in
introducing an SR system as an interface for robotic control. Here we mainly
discuss the overall test results, which we presented in Chapter 5; this discussion
gives the reader an overview of them. First we discuss the hardware approach, then
the software approach test results. We also discuss the achievements at the
technical fair.
In the hardware approach we have used the VE Module (SR module). From the test
results we have found that the VE Module's speaker-dependent feature is very
sensitive: not only to noise, but also to voice-tone changes and the microphone
position. We have also found that sentences with three words are not always
recognized, because the user has to maintain an even tone on every word in the
sentence when giving a command to the robot. The LEDs are also not a suitable
interface for user feedback, because they demand too much of the user's attention;
sometimes the users simply miss the feedback.
With the software approach, we have obtained better results. Here we have used the
software module named SpeechStudio as the SR module. We have found some limitations
in this SR module: we have to keep the noise at a minimal level when we use the
system. Another observation is that when we are not planning to communicate with
the system, we have to mute or switch off the microphone connection, because the
surrounding noise can make the system malfunction. To prevent the robot from
getting hurt or crashing into a wall if the user forgets to mute the microphone
when not using it, we have introduced the Avoid-obstacle behavior. Sometimes the
system cannot respond to the user's speech; the reason is mainly noise, unclear
speech, or the user saying something the system is not designed to respond to.
Presenting our system at the Stockholm International Fair 2005 (Tekniska mässan
2005) was also a great experience. It was a technical fair, so people gathered
there to learn about new technology, and we found different kinds of people
participating in our system testing. Our project goal is to make an interface for a
service robot which will work in a social context, and the interface should suit
novice users. Almost all of the participants were novice users, so the test results
help us to know their comments about our system. Another interesting thing is that near
The noise factor affects our system's performance quite a lot, so we find that CARO
understands the commands often, but when it understands, it acts correctly. From
the SR documentation [34] we have found that the noise factor affects SR system
performance, and speech is the key element of our system's user interface. A fair
is a gathering of people, so the noise makes the system respond Often, not Always.
From the users' comments, we have found that the commands are flexible enough to
control CARO.
After evaluating all the usability test results, we have found that the majority of
the users gave positive responses about CARO, so we can conclude that our system
satisfied the users.
Chapter 7
Conclusions
The main target of our project was to add SR capabilities to a mobile robot and
investigate the use of a natural language (NL) such as English as a user interface
for interacting with the robot. We have implemented the SR interface successfully
with a hardware speech recognition (SR) device as well as a software PC-based SR
system, using a small mobile robot named Khepera. We have done laboratory tests
with expert users and real-time tests with novice users. After all the
implementation and testing sessions, we have gained a lot of experience and also
found the problems and limitations of introducing an SR system as a user interface
to a robot. From this experience we have reached some conclusions. Our first
finding is that the hardware SR device is not as mature as the software PC-based SR
system: the hardware SR module does not support complex grammar sentences, which
are a normal part of spoken natural languages. Another finding is that LEDs are not
a suitable interface for user feedback. After testing the system with novice users
at the technical fair, we have found that an SR user interface is a promising aid
for interaction with a robot: it makes users learn quickly how to control it. We
have also found limitations of the software PC-based SR system; the noise factor
affects the SR performance of the SRSP (Speech Recognition Software Program) and
thereby the robot's performance - the robot malfunctions. Another point is that
when the user is not planning to control the robot, he/she should mute the
microphone. The SRSP supports complex sentences; this gave us the opportunity to
try complex sentences to control the robot, and we have successfully
7.1 Limitations
In the implementation stage, we followed the requirements that we set at the beginning. Accordingly, our system supports only the English language, and the robot's activities are limited to those listed in Tables 4.1, 4.2 and 4.3.
Acknowledgements
I would like to thank my supervisor, Thomas Hellstrom, for his valuable insights and comments during my Master's thesis project. I could not have completed this project work without the help of a number of people, even though I cannot put everyone's name here. I would especially like to thank Per Lindstrom, International Student Coordinator, and my other course teachers, who helped me throughout my academic life at Umea University. I am grateful to my supervisor for giving me the opportunity to participate in the Stockholm International Fair 2005 (Tekniska mässan 2005), and I also thank my fellow colleagues who participated and helped me at this technical fair.
References
[11] Dominique Estival. Adding language capabilities to a small robot. Technical report, University of Melbourne, Australia, 1998.
[16] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Boston, second edition, 2001.
[17] Helge Huttenrauch, Anders Green, Michael Norman, Lars Oestreicher, and Kerstin Severinson Eklund. Involving users in the design of a mobile office robot. IEEE Transactions on Systems, Man and Cybernetics, Part C, 34(2):113-124, May 2004. ftp://ftp.nada.kth.se/IPLab/TechReports/IPLab-209.pdf (visited 2005-10-20).
[18] K-Team Corporation, Rue Galile 9 - Y-Parc, 1400 Yverdon, SWITZERLAND. Tel: +41 (24) 423 89 50, Fax: +41 (24) 423 89 60. Khepera Documentation & Software. http://www.k-team.com/download/khepera.html (visited 2005-11-13).
[19] K-Team Corporation, Rue Galile 9 - Y-Parc, 1400 Yverdon, SWITZERLAND. Tel: +41 (24) 423 89 50, Fax: +41 (24) 423 89 60. Khepera User Manual. http://www.k-team.com/download/khepera.html (visited 2005-11-13).
[22] Mathias Haage, Susanne Schotz, and Pierre Nugues. A prototype robot speech interface with multimodal feedback. In Proceedings of the 2002 IEEE Int. Workshop on Robot and Human Interactive Communication, pages 247-252, Berlin, Germany, September 2002.
[23] Hossein Motallebipour and August Bering. A spoken dialogue system to control
robots. Technical report, Dept. of Computer Science, Lund Institute of Technology,
Lund, Sweden, 2003.
[37] WordNet - a lexical database for the English language. Cognitive Science Laboratory, Princeton University, Princeton, NJ, USA. http://wordnet.princeton.edu/ (visited 2005-10-28).
Appendix A
Hardware & Software Components
The Voice Extreme™ (VE) Module packages a speech recognition product into a simplified design on a single board. It is a reprogrammable module, which can be programmed and downloaded to using the Voice Extreme™ Toolkit. After the program has been downloaded, the module can be unplugged from the Development Board and wired into the final product. The module has a 34-pin connector; 11 of the pins are I/O lines, and the rest provide power, microphone, speaker, and a logic-level RS-232 interface. Figure A.1 shows a picture (top view) of the Voice Extreme™ (VE) Module. [31]
The module has six features: speaker-independent speech recognition, speaker-dependent speech recognition and word spotting, high-quality speech synthesis and sound effects, speaker verification, four-voice music synthesis,
Figure A.2 shows the pin configuration of the Voice Extreme™ (VE) Module. If an application is stand-alone, the two serial I/O pins, P0.0 and P0.1, and the serial port enable, P1.7, may be used for other purposes; however, programs are downloaded via asynchronous serial I/O. Since I/O pins P0.5 and P0.6 are connected to the address bus of the Flash memory, they should not be used under any circumstances. [31]
The Voice Extreme™ Development Board has several features. We discuss the important ones here:
- Speaker - there is an on-board speaker with fixed volume and an output jack for an external speaker; plugging in an external speaker disables the on-board one. The speaker can be used for debugging purposes.
- Prototyping Area - a grid of 0.1" through-holes for the application developer to add external circuitry.
- RS-232 Port - a 9-pin connector for connecting to the PC through an RS-232 serial cable.
- I/O Port - standard 20-pin I/O lines, which can be routed from the development board to the target application (see the I/O pin configuration in Figure A.4).
- Voice Extreme™ Module - the heart of the system; after the program has been downloaded to the module, it can be unplugged from the board and wired into the target application.
- Microphone - there is an on-board microphone and also an option to use an external microphone through a jack; the microphone is mainly used for debugging or training purposes.
- Reset Switch - performs a hardware reset of the VE Module.
- Download Switch - puts the VE Module into a state in which it waits for a program to be downloaded from the development PC.
- LED 1, 2 and 3 - can be used during development to observe output from the VE Module.
- Switch A, B and C - can be used for development purposes. [32]
Figure A.4: Voice Extreme™ (VE) Development Board I/O pin configuration [32].
A.1.3 Khepera
Khepera is a small mobile robot intended for use in research and education. It is a product of the K-Team company. The robot is 70 mm in diameter. For motion it has two DC brushed servo motors with incremental encoders (roughly 12 pulses per mm of robot motion). For perception it has eight infra-red proximity and ambient-light sensors with a range of up to 100 mm. External sensors can be added through the General I/O turret (see Figure A.6). Developers can find development guidelines and environment information on the K-Team website (http://www.k-team.com/robots/khepera/index.html). [18]
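As an aside, the Khepera is driven over its serial line with short ASCII commands; the sketch below formats such a command in Python. The command letter "D" for setting wheel speeds and the framing follow our reading of the Khepera manual [19], so treat the exact protocol details as assumptions to be checked against that document:

```python
def speed_command(left: int, right: int) -> bytes:
    """Format a Khepera 'set motor speed' command.

    The 'D,left,right' ASCII form is taken from the Khepera manual [19];
    the accepted speed range here is an assumption for illustration.
    """
    for v in (left, right):
        if not -127 <= v <= 127:
            raise ValueError("speed out of range")
    return f"D,{left},{right}\n".encode("ascii")

# Example: drive both wheels forward at speed 5.
cmd = speed_command(5, 5)  # b"D,5,5\n", written to the serial port
```

In a real setup the returned bytes would be written to the serial port connected to the robot (e.g. with a serial library); keeping the formatting in a pure function makes it easy to test without hardware.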
needs a training set of templates; after training, the templates are stored in flash memory, and recognition is then performed against the trained set. In the training phase, the PatGen function is used to generate patterns, the TrainSD function is used to average two templates to increase recognition accuracy, and the PutTemplate and GetTemplate functions are used to transfer templates between temporary and permanent storage. In the recognition phase, PatGen is again used to generate a template, and the RecogSD function is used to perform the recognition. [32]
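To make the train-then-match flow concrete, here is a minimal Python sketch of the same idea. The function names mirror the VE API functions named above, but the feature extraction and distance measure are simplified stand-ins of our own, not the module's actual signal processing:

```python
import math

def pat_gen(samples):
    # Stand-in for PatGen: reduce an utterance to a fixed-length
    # template of per-frame average magnitudes (illustrative only).
    n = 4  # template length, chosen arbitrarily for the sketch
    frame = max(1, len(samples) // n)
    return [sum(abs(s) for s in samples[i * frame:(i + 1) * frame]) / frame
            for i in range(n)]

def train_sd(t1, t2):
    # Stand-in for TrainSD: average two templates of the same word.
    return [(a + b) / 2 for a, b in zip(t1, t2)]

def recog_sd(template, stored):
    # Stand-in for RecogSD: return the stored word whose template is
    # nearest (Euclidean distance) to the incoming one.
    def dist(t):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(template, t)))
    return min(stored, key=lambda word: dist(stored[word]))

# Training phase: generate and store one template per word.
stored = {"move": pat_gen([1.0] * 8), "stop": pat_gen([5.0] * 8)}
# Recognition phase: generate a template for a new utterance and match.
word = recog_sd(pat_gen([1.2] * 8), stored)  # nearest to "move"
```

The real module additionally moves templates between temporary and flash storage (PutTemplate/GetTemplate); here a plain dictionary plays that role.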
A.2.2 SpeechStudio
We have used SpeechStudio to create our project's Voice User Interface, and the most important part of SpeechStudio is grammar creation. Here we therefore focus on grammar creation through SpeechStudio.
If we right-click on frmMain's Menu under the Menus folder, or on frmcom under the Forms folder, a popup menu appears from which we can choose Create Grammar to create a grammar file for the application. If the developer wants to create a grammar for a menu item, he/she should right-click under the Menus folder; for a forms item/object, under the Forms folder. So before creating a grammar, the developer has to plan a system design in which the application can be controlled through the graphical interface, then design the VUI and modify the GUI according to the VUI design. For our project, we created the GUI using Option buttons and a Text Box for robot control, and created the grammar from these Forms components. The example in Figure A.8 shows a grammar file Task.grm (Task with a G-in-a-box icon appears under frmcom in the Forms folder). Figure A.9 shows the Task.grm file opened in the right side of the workspace. The developer can find the grammar syntax under Start → Programs → SpeechStudio → Tutorials → Introduction/Changing Grammar to create a grammar for the VUI in an application.
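Since SpeechStudio builds on Microsoft SAPI 5, a command grammar of the kind described above conceptually resembles the SAPI 5 XML form sketched below. The exact .grm syntax SpeechStudio uses is documented in its tutorials, and the command words here are only illustrative, not taken from our project files:

```xml
<GRAMMAR LANGID="409">
  <!-- Top-level rule listing the spoken robot commands (illustrative). -->
  <RULE NAME="RobotCommand" TOPLEVEL="ACTIVE">
    <L>
      <P>move</P>
      <P>stop</P>
      <P>turn left</P>
      <P>turn right</P>
    </L>
  </RULE>
</GRAMMAR>
```

A list element (L) of phrases (P) like this is what lets the recognizer constrain its search to the small command vocabulary, which is why grammar-based recognition is robust for this kind of control task.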
Appendix B
Installation guide
Welcome to the installation guide for the Voice User Interface (VUI) for Robotic Control. Here we present only the software installation guideline for the Software-approach system, for both the developer and the user. In the user installation, the source files are not accessible; only the *.exe file is available. We assume that the user follows the Khepera User Manual [19] to connect the Khepera to the PC.
Table B.1: The available software products and their file names in the SpeechStudio Developer Bundle Package.
During installation, you will be prompted for a license key. You will also need a separate
user/license key for installing Profile Manager, which is included in Profile Developer.
Separately, you can browse the grammar files by opening the SpeechStudio program from the menu: Start → All Programs → SpeechStudio. The grammar files are in the same directory as the VB project and have the *.grm extension.
Note:
- If the system gives the error "you do not have a speech engine installed", you have to install Microsoft SAPI 5 English. You can download the free SAPI 5 engine from www.microsoft.com/Speech/download/sdk51 as part of the SAPI 5.1 SDK. [33]
- You may see a "Server Busy" message box, indicating that SpeechPlayer is still initializing the speech engine; if so, just click Retry. [33]
- After starting the system, look at the bottom of the SpeechPlayer window. The lower-left window shows the status going from "Starting..." to "Not Listening" to "Listening" when the engine is ready. The lower right-hand window is a microphone level meter. If you have a microphone plugged in and working, you should now be able to talk to the system. Try the simple word "move"; it should work: the Khepera moves forward, and the system message window shows the command. If the word is not recognized, you should perform a training session through Profile Manager to increase SR performance; you can find it under Start → All Programs → SpeechStudio Tools → Profile Manager. [33]
Appendix C
User Questionnaire
Your age: . . . . . .
.................................................................................
.................................................................................
7. When you told CARO to do something - did it act like you expected?
1) Always
2) Often
3) Seldom
4) Never
9. Would you prefer to control the robot with speech instead of a joystick or keyboard?
.................................................................................
10. Did you get enough help from CARO when it got stuck?
.................................................................................
Appendix D
Glossary
67