The Simon Handbook

Peter H. Grasch
Contents
1 Introduction
2 Overview
2.1 Architecture
2.2 Required Resources for a Working Simon Setup
2.2.1 Scenarios
2.2.2 Acoustic model
2.2.2.1 Backends
2.2.2.2 Types of base models
2.2.2.2.1 Static base model
2.2.2.2.2 Adapted base model
2.2.2.2.3 User-generated model
2.2.2.3 Requirements
2.2.2.4 Where to get base models
2.2.2.5 Phoneme set issues
3 Using Simon: Typical user
3.1 First run wizard
3.1.1 Scenarios
3.1.2 Base models
3.1.3 Server
3.1.4 Sound configuration
3.1.5 Volume calibration
3.2 The Simon Main Window
3.2.1 Main window: Scenarios
3.2.2 Main window: Training
3.2.3 Main window: Acoustic model
3.2.4 Main window: Recognition
3.3 Scenarios
3.3.1 Import Scenario
3.3.2 Delete Scenario
3.4 Recordings
3.4.1 Volume
3.4.1.1 Simon Calibration
3.4.1.2 Audacity Calibration
3.4.2 Silence
3.4.3 Content
3.4.4 Microphone
3.4.5 Sample Quality Assurance
3.5 Contribute Samples
3.6 Manage training data
3.6.1 Modifying samples
3.6.2 Clear training data
3.6.3 Importing Training Samples
3.7 Configuration
3.7.1 General Configuration
3.7.2 Recordings
3.7.2.1 Device Configuration
3.7.2.2 Voice Activity Detection
3.7.2.3 Training settings
3.7.2.4 Postprocessing
3.7.2.5 Context
3.7.3 Speech Model
3.7.3.1 Base model
3.7.3.2 Training data
3.7.3.3 Language Profile
3.7.4 Model Extensions
3.7.5 Recognition
3.7.5.1 Server
3.7.5.1.1 General
3.7.5.1.2 Network
3.7.5.2 Synchronization and Model Backup
3.7.6 Actions
3.7.6.1 Recognition
3.7.6.2 Dialog font
3.7.6.3 Lists
3.7.7 Text-to-speech
3.7.7.1 Backends
3.7.7.2 Recordings
3.7.7.3 Webservice
3.7.8 Social desktop
3.7.9 Webcam configuration
3.7.10 Advanced: Adjusting the recognition parameters manually
3.7.10.1 Julius
4 Advanced: Creating new scenarios with Simon
4.1 Introduction
4.2 Speech recognition: background
4.2.1 Language Model
4.2.1.1 Vocabulary
4.2.1.1.1 Active Dictionary
4.2.1.1.2 Shadow Dictionary
4.2.1.1.3 Language profile
4.2.1.2 Grammar
4.2.2 Acoustic Model
4.3 Scenarios
4.3.1 Scenario hierarchies
4.3.2 Adding a new Scenario
4.3.3 Edit Scenario
4.3.4 Export Scenario
4.4 Vocabulary
4.4.1 Adding Words
4.4.1.1 Defining the Word
4.4.1.1.1 Manually Selecting a Category
4.4.1.1.2 Manually Providing the Phonetic Transcription
4.4.1.2 Training the Word
4.4.2 Editing a word
4.4.3 Removing a word
4.4.4 Special Training
4.4.5 Importing a Dictionary
4.4.5.1 HADIFIX Dictionary
4.4.5.2 HTK Dictionary
4.4.5.3 PLS Dictionary
4.4.5.4 SPHINX Dictionary
4.4.5.5 Julius Dictionary
4.4.6 Create language profile
4.5 Grammar
4.5.1 Import a Grammar
4.5.2 Renaming Categories
4.5.3 Merging Categories
4.6 Training
4.6.1 Storage Directories
4.6.2 Adding Texts
4.6.2.1 Add training texts
4.6.2.2 Local text files
4.6.3 On-The-Fly Training
4.7 Context
4.7.1 Scenario selection
4.7.2 Sample groups
4.7.3 Context conditions
4.7.3.1 Active window
4.7.3.2 D-Bus
4.7.3.3 Face detection
4.7.3.4 File content
4.7.3.5 Lip detection
4.7.3.6 Or condition association
4.7.3.7 Process opened
4.8 Commands
4.8.1 Executable Commands
4.8.1.1 Importing Programs
4.8.2 Place Commands
4.8.2.1 Importing Places
4.8.3 Shortcut Commands
4.8.4 Text-Macro Commands
4.8.5 List Commands
4.8.5.1 List Command Display
4.8.5.2 Configuring list elements
4.8.6 Composite Commands
4.8.7 Desktop grid
4.8.8 Input Number
4.8.9 Dictation
4.8.10 Artificial Intelligence
4.8.11 Calculator
4.8.12 Filter
4.8.13 Pronunciation Training
4.8.14 Keyboard
4.8.15 Dialog
4.8.15.1 Dialog design
4.8.15.2 Dialog: Bound values
4.8.15.3 Template options
4.8.15.4 Avatars
4.8.15.5 Output
4.8.16 Akonadi
4.8.17 D-Bus
4.8.18 JSON
5 Questions and Answers
6 Credits and License
A Installation
List of Tables
2.1 Ways to an acoustic model
2.2 Base model requirements
3.1 Julius Configuration Files
4.1 Sample Vocabulary
4.2 Sample Vocabulary
4.3 Improved Sample Vocabulary
4.4 Sample Vocabulary
4.5 Improved Sample Vocabulary
Abstract
Simon is an open source speech recognition solution.
Chapter 1
Introduction
Simon is the main front end for the Simon open source speech recognition solution. It is a Simond
client and provides a graphical user interface for managing the speech model and the commands.
Moreover, Simon can execute all sorts of commands based on the input it receives from the server,
Simond.
In contrast to existing commercial offerings, Simon provides a unique do-it-yourself approach to
speech recognition. It does not ship with predefined, pre-trained speech models, or with
any model whatsoever. Instead, it provides an easy-to-use end-user interface to create language
and acoustic models from scratch.
Additionally, end users can easily download use cases created by other users and share
their own.
The current release can be used to set up command-and-control solutions especially suitable
for disabled people. However, because of the amount of training necessary, continuous, free
dictation is neither supported nor reasonable with current versions of Simon.
Because of its architecture, the same version of Simon can be used with all languages and dialects.
One can even mix languages within one model if necessary.
Chapter 2
Overview
2.1 Architecture
The main recognition architecture of Simon consists of three applications.
Simon
This is the main graphical interface.
It acts as a client to the Simond server.
Simond
The recognition server.
KSimond
A graphical front-end for Simond.
These three components form a real client / server solution for the recognition. That means that
there is one server (Simond) for one or more clients (Simon; this application). KSimond is just a
front-end for Simond which means it adds no functionality to the system but rather provides a
way to interact with Simond graphically.
In addition to Simon, Simond and KSimond, other, more specialized applications are also
part of the integrated Simon distribution.
Sam
Provides more in-depth control over your speech model and lets you test the acoustic model.
SSC / SSCd
These two applications make it easier to collect large amounts of speech samples from
different persons.
Afaras
This simple utility allows users to quickly check large corpora of speech data for erroneous
samples.
Please refer to the individual handbooks of those applications for more details.
Simon is used to create and maintain a representation of your pronunciation and language. This
representation is then sent to the server Simond which compiles it into a usable speech model.
Simon then records sound from the microphone and transmits it to the server which runs the
recognition on the received input stream. Simond sends the recognition result back to the client
(Simon).
Simon then uses this recognition result to execute commands like opening programs, following
links, etc.
Simond identifies its connections with a user / password combination which is completely
independent of the underlying operating system and its users. By default, a standard user is set
up in both Simon and Simond so the typical use case of one Simond server per Simon client will
work out of the box.
Every Simon client logs onto the server with a user / password combination which identifies a
unique user and thus a unique speech model. Every user maintains their own speech model but
may use it from different computers (different physical Simon instances) simply by accessing the
same Simond server. One Simond instance can of course also serve multiple users.
If you want to open up the server to the Internet or use multiple users on one server, you will
have to configure Simond. Please see the Simond manual for details.
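The division of labor described in this section can be pictured with a small toy model. The following Python sketch is only an illustration: the classes ToyServer and ToyClient, their methods, and the user name and password are hypothetical and are not part of Simon or Simond, and plain text stands in for recorded audio.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpeechModel:
    """Toy stand-in for a compiled speech model: utterance -> command."""
    commands: dict[str, str] = field(default_factory=dict)


class ToyServer:
    """Hypothetical stand-in for Simond: one speech model per user."""

    def __init__(self) -> None:
        self._passwords: dict[str, str] = {}
        self._models: dict[str, SpeechModel] = {}

    def add_user(self, user: str, password: str) -> None:
        # These accounts are unrelated to operating system accounts.
        self._passwords[user] = password
        self._models[user] = SpeechModel()

    def login(self, user: str, password: str) -> bool:
        return self._passwords.get(user) == password

    def compile_model(self, user: str, commands: dict[str, str]) -> None:
        # The client sends its representation; the server "compiles" it.
        self._models[user] = SpeechModel(dict(commands))

    def recognize(self, user: str, audio: str) -> str | None:
        # Real recognition works on an audio stream; text stands in for it here.
        return self._models[user].commands.get(audio)


class ToyClient:
    """Hypothetical stand-in for Simon: logs in, sends input, runs commands."""

    def __init__(self, server: ToyServer, user: str, password: str) -> None:
        if not server.login(user, password):
            raise PermissionError("login rejected")
        self.server = server
        self.user = user

    def say(self, audio: str) -> str:
        result = self.server.recognize(self.user, audio)
        return f"execute: {result}" if result else "not recognized"


if __name__ == "__main__":
    server = ToyServer()
    server.add_user("alice", "secret")
    desktop = ToyClient(server, "alice", "secret")
    laptop = ToyClient(server, "alice", "secret")
    server.compile_model("alice", {"open browser": "firefox"})
    # Both clients log in as the same user and therefore share one model.
    print(desktop.say("open browser"))
    print(laptop.say("open browser"))
```

Two clients that log in with the same user share one speech model, which mirrors the statement that a user may use their model from different computers.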
2.2 Required Resources for a Working Simon Setup
NOTE
For background information about speech models, please refer to the Speech Recognition:
Background section.
To get Simon to recognize speech and react to it you need to set up a speech model.
Speech models describe how your voice sounds, what words exist, how they sound and which
word combinations (sentences or structures) exist.
A speech model basically consists of two parts:
Language model: Describes all existing words and what sentences are grammatically correct
Acoustic model: Describes how words sound
You need both these components to get Simon to recognize your voice.
In Simon, the language model will be created from your active scenarios and the acoustic model
will be either built solely through your voice recordings (training) or with the help of a base model.
2.2.1 Scenarios
One scenario makes up one complete use case of Simon. To control Firefox, for example, the user
just installs the Firefox scenario.
In other words, scenarios tell Simon what words and phrases to listen for and what to do when
they are recognized.
Because scenarios do not contain information about how these words and phrases actually sound,
they can be shared and exchanged between different Simon users without problems. To
accommodate this community-based repository, a category for Simon scenarios has been created
on kde-files.org where the scenarios, which are just simple text files (XML format), can be
exchanged easily.
In most cases scenarios are tailored to work best with a specific base model to avoid issues with
the phoneme set.
For information on how to use scenarios in Simon, please refer to the Scenarios section in the
Using Simon chapter.
2.2.2 Acoustic model
As mentioned above, you need an acoustic model to activate Simon.
You can either create your own or use and even adapt a base model. Base models are
pre-generated, usually speaker-independent acoustic models that can be used with Simon.
The following table shows what is required, depending on your Simon configuration:

                      Training required  Base model required  Model creation backend required
Static base model     No                 Yes                  No
Adapted base model    Yes                Yes                  Yes
User-generated model  Yes                No                   Yes

Table 2.1: Ways to an acoustic model
2.2.2.1 Backends
Simon uses external software to build acoustic models and to recognize speech.
Usually, these backends can be split into two distinct components: The model compiler or
model generation backend used to create or adapt acoustic models and the recognizer used
to recognize speech with the help of these models.
Not all operation modes of Simon require a model compiler backend. Please refer to the next
section for details on when this is the case.
Two different backends are supported:
Julius / HTK
Models will be created with the HTK. Julius will be used as recognizer.
To use this backend, please make sure that you have an up-to-date version of both these tools
installed.
CMU SPHINX
This backend, also often simply referred to as SPHINX backend, uses the PocketSphinx
recognizer and the SphinxTrain model generation backend. Please refer to the CMU SPHINX
website for more details.
The CMU SPHINX backend requires that Simon is built with the optional SPHINX support. If
you have not compiled Simon from source, please refer to your distribution for more
information.
If you are using base models, Simon will automatically select the appropriate backend for you.
However, if you want to build your own models from scratch (user-generated model, see below)
and have a certain preference, please refer to the Simond configuration for more information.
Base models created for one backend are not compatible with any other backend. Please refer to
the compatibility matrix for details.
2.2.2.2 Types of base models
There are three types of base models:
Static base model
Adapted base model
User-generated model
For information on how to use base models in Simon, please refer to the Base Models section in
the Use Simon chapter.
2.2.2.2.1 Static base model
Static base models simply use a pre-compiled acoustic model without modifying it.
Any training data collected through Simon will not be used to improve the recognition accuracy.
This type of model does not require the model creation backend to be installed.
2.2.2.2.2 Adapted base model
Adapting a pre-compiled acoustic model to your voice can improve accuracy.
Collected training data will be compiled into an adaptation matrix which will then be applied to
the selected base model.
This type of model requires the model creation backend to be installed.
2.2.2.2.3 User-generated model
When using user-generated models, the user is responsible for training his own model. No base
model will be used.
The training data will be used to compile your own acoustic model allowing you to create a
system which directly reflects your voice.
This type of model requires the model creation backend to be installed.
2.2.2.3 Requirements
To build, adapt or use acoustic models of different types, certain software needs to be installed.

                      CMU SPHINX                 Julius / HTK
Static base model     PocketSphinx               Julius
Adapted base model    SphinxTrain, PocketSphinx  HTK, Julius
User-generated model  SphinxTrain, PocketSphinx  HTK, Julius

Table 2.2: Base model requirements
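As a quick way to read the table, the requirements can be written down as a lookup. The following Python sketch only transcribes Table 2.2. The key spellings ("sphinx", "julius", "static" and so on) and the function name missing_tools are chosen for this illustration and are not identifiers used by Simon.

```python
# Data transcribed from Table 2.2; the dictionary keys are assumptions
# made for this sketch, not names used by Simon itself.
REQUIRED_TOOLS = {
    "sphinx": {
        "static": {"PocketSphinx"},
        "adapted": {"SphinxTrain", "PocketSphinx"},
        "user-generated": {"SphinxTrain", "PocketSphinx"},
    },
    "julius": {
        "static": {"Julius"},
        "adapted": {"HTK", "Julius"},
        "user-generated": {"HTK", "Julius"},
    },
}


def missing_tools(backend, model_type, installed):
    """Return the tools from Table 2.2 that are required but not installed."""
    required = REQUIRED_TOOLS[backend][model_type]
    return sorted(required - set(installed))


if __name__ == "__main__":
    # An installation with Julius, PocketSphinx and SphinxTrain but no HTK.
    installed = ["Julius", "PocketSphinx", "SphinxTrain"]
    print(missing_tools("julius", "adapted", installed))  # ['HTK']
    print(missing_tools("sphinx", "adapted", installed))  # []
```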
All four tools, HTK, Julius, PocketSphinx and SphinxTrain, can safely be installed at the same
time.
SPHINX support in Simon must be enabled during compile time and might not be available on
your platform. Please refer to your distribution.
NOTE
The Simon Windows installer includes Julius, PocketSphinx and SphinxTrain but not the HTK. Please
refer to the installation section for information on how to install it should you find the need for it.
2.2.2.4 Where to get base models
Simon base models are packaged as .sbm files. If you happen to have raw model files for your
backend, you can package them into a compatible SBM container within Simon. Please refer to
the speech model configuration for details.
Not all SBM models may work for you. Please refer to the model backends section for details.
For an up-to-date list of available base models, please refer to our online wiki.
2.2.2.5 Phoneme set issues
In order for base models to work, both your scenarios and your base model need to use the same
set of phonemes.
In practice, this often just means that you need to match scenarios to your base model. The name
of Simon base models will most likely start with a tag like [EN/VF/JHTK]. Try to download
scenarios that start with the same tag.
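Matching by tag can also be done mechanically. The following Python sketch assumes that the tag is the bracketed text at the very start of a name, as in the example above. The function names and the scenario names in the usage example are made up for illustration.

```python
import re

# Bracketed text at the start of a name, e.g. "[EN/VF/JHTK] ...".
TAG_PATTERN = re.compile(r"^\s*\[([^\]]+)\]")


def leading_tag(name):
    """Return the leading tag of a name in upper case, or None if there is none."""
    match = TAG_PATTERN.match(name)
    return match.group(1).strip().upper() if match else None


def matching_scenarios(base_model_name, scenario_names):
    """Return the scenario names whose leading tag equals the base model's tag."""
    tag = leading_tag(base_model_name)
    if tag is None:
        return []
    return [name for name in scenario_names if leading_tag(name) == tag]


if __name__ == "__main__":
    scenarios = [
        "[EN/VF/JHTK] Example scenario A",
        "[DE/VF/JHTK] Example scenario B",
        "Untagged example scenario",
    ]
    print(matching_scenarios("[EN/VF/JHTK] Example base model", scenarios))
```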
You cannot use scenarios designed for a different phoneme set (that is, a different base model).
If Simon recognizes this error, it will try to disable affected words by removing them from the
created speech model. These words will be marked with a red background in the vocabulary of the
scenario. To re-enable them, transcribe them with the proper phoneme set or use a user-generated
model.
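The idea behind this check can be sketched in a few lines of Python. This is not Simon's actual implementation. It assumes that a transcription is a string of space-separated phoneme symbols, and the phoneme symbols and words in the example are invented.

```python
def split_by_phoneme_set(vocabulary, model_phonemes):
    """Split a vocabulary into usable words and words that must be disabled.

    vocabulary maps each word to its transcription (space-separated phonemes).
    Returns (usable_words, disabled), where disabled maps each affected word
    to the phonemes that the base model does not know.
    """
    usable = []
    disabled = {}
    for word, transcription in vocabulary.items():
        unknown = [p for p in transcription.split() if p not in model_phonemes]
        if unknown:
            disabled[word] = unknown
        else:
            usable.append(word)
    return usable, disabled


if __name__ == "__main__":
    # Invented phoneme symbols, for illustration only.
    model_phonemes = {"k", "ae", "t", "d", "ao", "g"}
    vocabulary = {"cat": "k ae t", "dog": "d ao g", "Haus": "h aU s"}
    usable, disabled = split_by_phoneme_set(vocabulary, model_phonemes)
    print(usable)    # ['cat', 'dog']
    print(disabled)  # {'Haus': ['h', 'aU', 's']}
```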
HINT
If you design a new scenario, it is therefore a good idea to use the dictionary that was used to create
the base model as the shadow dictionary. This way, Simon will automatically suggest the correct
phonemes when you add words.
Chapter 3
Using Simon: Typical user
The following sections will describe how to use Simon.
3.1 First run wizard
On the first start of Simon, this assistant will guide you through the initial configuration of Simon.
The configuration consists of five easy steps which are outlined below. You can skip each step
and even the whole wizard if you want to. In that case, the system will be set up with default
values.
However, please note that without any configuration, there won't be any recognition.
3.1.1 Scenarios
In this step you can add or download scenarios.
To download scenarios from the online repository, select Open → Download to open the
download dialog pictured below.
Especially for new users it is recommended to try some scenarios first to see how the system
works before diving into configuring it exactly for your use case.
After completing the assistant, you can change the scenario configuration in the
scenario management dialog.
If you are planning to use a base model, make sure that you download matching scenarios.
3.1.2 Base models
In this step you can set up Simon to use base models.
Again, you can download base models from an online repository through Open model →
Download.
To use a user-generated model, select Do not use a base model.
After completing or aborting the first run wizard you can change the configuration options defined
here in the Simon configuration.
3.1.3 Server
Internally, Simon is a server / client application. If you want to take advantage of a network
based installation, you can provide the server address here.
The default configuration is sufficient for a normal installation and will assume that you use a
local Simond server that will automatically be started and stopped with Simon.
After completing or aborting the first run wizard you can change the configuration options defined
here in the server configuration.
3.1.4 Sound configuration
Because Simon recognizes sound from one or more microphones, you have to tell Simon which
devices you want to use for recognition and training.
Simon can use one or more input and output devices for different tasks. You can find more
information about Simon's multiple device capabilities in the Simon sound configuration section.
If you don't have at least one working input device for recognition, you will not be able to activate
Simon.
After completing or aborting the first run wizard you can change the configuration options defined
here in the sound configuration.
3.1.5 Volume calibration
For Simon to work correctly, you need to configure your microphone's volume to a sensible level.
For more details on this, please see the general section about Volume Calibration.
3.2 The Simon Main Window
The Simon main window is split into four logical sections. On the top left, you can see the
scenario section, to its right you nd the training section, on the bottom left is the acoustic model
and nally, on the right of that, the recognition section.
The Simon main window can be hidden at any time by clicking on the Simon logo in the system
tray (usually next to the system clock in the task bar) which will minimize Simon to the tray.
Click it again to show the main window again.
3.2.1 Main window: Scenarios
A list of scenarios shows the currently loaded scenarios. You can manage this selection by clicking
Manage scenarios which will open the scenario management dialog.
To modify a scenario, select it from the list and open it by pressing Open <name>.
3.2.2 Main window: Training
This section shows all training texts from all currently active scenarios.
Selecting a training text will highlight the parent scenario in the scenario section.
You can start to train the recognition by selecting a text and clicking on Start training. Please note
that, depending on your selected model type, training may or may not improve your recognition
accuracy. The acoustic model section (see below) in the Simon main window tells you if training
will have an effect for your specific configuration. For background information on this subject,
please refer to the base model section.
The gathered training corpus can be managed by selecting Manage training data which will open
the sample management dialog.
To help build a general, open speech corpus, please consider contributing your training corpus
to the Voxforge project by selecting File → Contribute samples to bring up the sample upload
assistant.
3.2.3 Main window: Acoustic model
Here, Simon shows information about the currently used base model and active model.
Select Configure acoustic model to configure the base model.
3.2.4 Main window: Recognition
This section shows information about the recognition status.
If Simon is connected to the server, you can activate and deactivate the recognition by toggling
the Activate button. If this control element is not available, make sure you are connected by
selecting File → Connect from Simon's menu.
An integrated volume calibration widget monitors the configured recognition devices. The
sound setup can be modified by selecting Configure audio to bring up the sound configuration.
3.3 Scenarios
This section describes how to import scenarios into your Simon configuration and how to remove
them. For general information about scenarios, please refer to the background chapter. If you
want to create, edit or export scenarios, please refer to the advanced usage section.
To modify your scenario configuration, first open the scenario management dialog by pressing
Manage scenarios in the Simon main window.
To activate or deactivate a scenario you can use the arrow buttons between the two lists or
simply double click the option you want to load / unload.
More information about individual scenarios can be found in the tooltips of the list items.
3.3.1 Import Scenario
Scenarios can be imported from a local file in Simon's XML scenario file format but can also be
directly downloaded and imported from the internet.
When downloading scenarios, the list of scenarios is retrieved from the Simon Scenarios subsection
of the OpenDesktop site kde-files.org.
If you create a scenario that might be valuable for other Simon users, please consider uploading
it to this online repository and help other Simon users.
3.3.2 Delete Scenario
To delete a scenario, select the scenario and click the Delete button.
Because scenarios are synchronized with the recognition server, you can restore deleted scenarios
through the model synchronization backup.
3.4 Recordings
If you are using user-generated or adapted models, Simon builds its acoustic model based on
transcribed samples of the user's voice. Because of this, the recorded samples are of vital
importance for the recognition performance.
3.4.1 Volume
It is important that you check your microphone volume before recording any samples.
3.4.1.1 Simon Calibration
The current version of Simon includes a simple way of ensuring that your volume is configured
correctly.
By default the volume calibration is displayed before starting any recording in Simon.
To calibrate, simply read the text displayed.
The calibration will monitor the current volume and tell you to either raise or lower the volume,
but you have to do that manually in your system's audio mixer.
During calibration, try to talk normally. Don't yell but don't be overly quiet either. Take into
account that you should generally use the same volume setting for all your training and for the
recognition too. You might speak a little bit louder (unconsciously) when you are upset or at
another time of the day, so try to raise your voice a little bit to anticipate this. It is much
better to have slightly quieter samples than to start clipping.
In the Simon settings, both the text displayed and the levels considered correct can be changed. If
you leave the text empty, the default text will be displayed. In the options you can also deactivate
the calibration completely. See the training section for more details.
3.4.1.2 Audacity Calibration
Alternatively you can use an audio editing tool like the free Audacity to monitor the recording
volume.
Too quiet:
Too loud:
Perfect volume:
3.4.2 Silence
To help Simon with the automatic segmentation it is recommended to leave about one or two
seconds of silence on the recording before and after reading the prompted text.
Current Simon versions include a graphical notice on when to speak during recording. The
message will tell the user to wait for about half a second:
... before telling the user to speak:
This method of visual feedback proved especially valuable when recording with people who
cannot read the prompted text for themselves and therefore need someone to tell them what they
have to say. The colorful visual cue tells them when to start repeating what the facilitator said
without the need of unreliable hand gestures.
3.4.3 Content
Generally, we recommend recording roughly the same sentences that Simon should recognize
later.
(Obviously, that does not apply to massive sample acquisitions where other properties like
phonetic balance are more important.)
Care should be taken to avoid recordings like "One One One" made only to quickly ramp up the
recognition rate property. Such recordings often decrease recognition performance because the
pronunciation differs greatly from saying the word in isolation.
3.4.4 Microphone
For Simon to work well, a high quality microphone is recommended.
However, even relatively cheap headsets (around 30 euros) achieve very good results, orders of
magnitude better than internal microphones.
For maximum compatibility we recommend USB headsets as they usually support the necessary
sample rate of 16 kHz, are very well supported by both Microsoft Windows and GNU/Linux, and
normally don't require special, proprietary drivers to operate.
3.4.5 Sample Quality Assurance
Simon will check each recording against certain criteria to ensure that the recorded samples are
not erroneous or of poor quality.
If Simon detects a problematic sample, it will warn the user to re-record the sample.
Currently, Simon checks the following criteria:
Sample peak volume
If the volume is too loud and the microphone started to clip (see Clipping on Wikipedia), Simon
will display a warning message urging the user to lower the microphone volume.
Signal to noise ratio (SNR)
Simon will automatically determine the signal to noise ratio of each recording. If the ratio is
below a configurable threshold, a warning message will be displayed.
The default value of 2300 % means that for Simon to accept a sample as correctly recorded, the
peak volume has to be 23 times louder than the noise baseline (the lowest average over 50 ms).
A ratio below the threshold is often the result of a very low quality microphone, high levels of
ambient noise, or a low microphone gain coupled with a microphone boost option in the system mixer.
SNR warning message triggered by an empty sample. This information dialog is displayed when
clicking on the More information button on the recording widget.
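The signal to noise check described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions, not Simon's actual implementation: the function names are hypothetical, volume is taken as the absolute sample amplitude, and the 50 ms windows are assumed to be non-overlapping.

```python
def noise_baseline(samples, sample_rate, window_ms=50):
    """Lowest average absolute amplitude over any full window.

    Assumption: windows are non-overlapping and window_ms long.
    """
    window = max(1, sample_rate * window_ms // 1000)
    averages = [
        sum(abs(s) for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]
    return min(averages) if averages else 0.0


def snr_percent(samples, sample_rate):
    """Peak volume relative to the noise baseline, in percent."""
    baseline = noise_baseline(samples, sample_rate)
    peak = max((abs(s) for s in samples), default=0)
    if baseline == 0:
        # Pure digital silence: no meaningful ratio unless there is a peak.
        return float("inf") if peak > 0 else 0.0
    return 100.0 * peak / baseline


def passes_snr_check(samples, sample_rate, threshold_percent=2300):
    """True if the peak is at least threshold_percent of the baseline."""
    return snr_percent(samples, sample_rate) >= threshold_percent
```

With the default threshold, a recording whose quietest 50 ms window averages an amplitude of 10 needs a peak of at least 230 to pass. A completely empty sample yields 0 % in this sketch and is rejected.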
3.5 Contribute Samples
The base models that can be used with Simon to augment or replace training are built from other
people's speech samples. In order to create high quality base models, a large number of training
samples is necessary.
If you trained your local Simon installation, you gathered valuable voice samples that could
improve the quality of the general model.
Through Simon's Contribute Samples dialog you can upload those recordings to benefit the
Voxforge project and help it create high quality open source base models.
After connecting to the server, Simon will ask for some basic meta-information. This information
contains no personal data. Instead, it will later be used to group together
samples of similar speaker groups to build more accurate acoustic models.
The duration of the upload process itself will depend on your internet connection. Generally
speaking, this only transmits relatively little data because the audio samples collected by Simon
are generally very small: around 0.1 MB per sample.
3.6 Manage training data
To view and modify your personal training corpus, you can access the training data management
dialog by selecting Manage training data in the Simon main window or the training section of
any opened scenario.
3.6.1 Modifying samples
To listen to or re-record a sample, select it from the list and select Open Sample.
In this dialog you can also modify the sample's group after it was recorded.
If you remove the opened sample and do not re-record it, Simon will offer to remove it from the
corpus.
3.6.2 Clear training data
After a confirmation dialog, this will remove all personal training data of the user.
3.6.3 Importing Training Samples
Using the import training data field you can import previously gathered training samples from
previous Simon versions or manual training.
NOTE
This feature is very specific. Please use it with caution and make sure that you know exactly what you
are doing before you continue.
You can either provide a separate prompts file or let Simon extract the transcriptions from the
filenames.
When using prompts based transcriptions, your prompts file (UTF-8) needs to contain lines of
the following form: [filename] [content]. Filenames are given without file extensions and the
content has to be uppercase. For example: demo_2007_03_20 DEMO imports the file
demo_2007_03_20.wav containing the spoken word "Demo".
Because prompts files do not contain a file extension, Simon will try wav, mp3, ogg and flac (in
that order). If one of those matches, no other extension will be tested and only the first file will be
imported (in contrast to file based transcription where all files would be imported).
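The prompts based lookup described above could be sketched like this. It is a minimal illustration, not Simon's import code: parse_prompts and find_sample are hypothetical helper names, and splitting each line at the first space is an assumption about the format.

```python
import os

# Extensions are tried in this order; the first match wins.
EXTENSIONS = ("wav", "mp3", "ogg", "flac")


def parse_prompts(text):
    """Turn '[filename] [content]' lines into (filename, content) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, content = line.partition(" ")
        entries.append((name, content.strip()))
    return entries


def find_sample(folder, name, exists=os.path.exists):
    """Return the first existing '<name>.<ext>' in folder, or None."""
    for ext in EXTENSIONS:
        candidate = os.path.join(folder, f"{name}.{ext}")
        if exists(candidate):
            return candidate
    return None
```

For the example line demo_2007_03_20 DEMO, parse_prompts returns the pair ("demo_2007_03_20", "DEMO"), and find_sample then looks for demo_2007_03_20.wav before any other extension.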
When using file based transcriptions, a file called this_is_a_test.wav must contain "This is a test"
and nothing else. Numbers and special characters (., -, ...) in the filename are ignored and
stripped.
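As a rough sketch of the file name extraction, the following helper derives a transcription from a path. The handbook does not spell out the exact rules, so treating underscores as word separators and leaving the letter case unchanged are assumptions, and transcription_from_filename is a hypothetical name.

```python
import os


def transcription_from_filename(path):
    """Derive the spoken text from a sample's file name.

    Assumptions: underscores separate words; digits and other
    special characters are dropped; letter case is left unchanged.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    spaced = stem.replace("_", " ")
    letters_only = "".join(ch for ch in spaced if ch.isalpha() or ch.isspace())
    return " ".join(letters_only.split())
```

Under these assumptions, this_is_a_test.wav yields "this is a test", while demo_2007_03_20.wav yields just "demo" because the date is stripped.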
Files recorded by Simon 0.2 follow this naming scheme so you can safely import them using
the file name extraction method. Files generated by previous Simon versions should not be
imported using this function but you can use the prompts based import for them.
Imported files and their transcriptions are then added to the training corpus.
To import a folder containing training samples, just select the folder to import and, depending on
your import type, also the prompts file.
The folder will be scanned recursively. This means that the given folder and all its subfolders
will be searched for .wav, .flac, .mp3 and .ogg files. All files found will be imported.
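A recursive scan of this kind might look like the sketch below. It is illustrative only: find_audio_files is a hypothetical name, and matching extensions case-insensitively is an assumption.

```python
import os

AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}


def find_audio_files(root):
    """Collect all audio files in root and every subfolder below it."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: the extension check ignores upper/lower case.
            if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```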
When importing the sound files, all configured postprocessing filters are applied.
If you import anything other than WAV files, you are responsible for decoding them during the
import process (for example through postprocessing filters) or the model creation will fail.
3.7 Configuration
Simon was designed with high configurability in mind. Because of this, there are many
parameters that can be fine-tuned to your specific requirements.
You can access Simon's configuration dialog through the application's main menu: Settings →
Configure Simon....
3.7.1 General Configuration
The general configuration page lists some basic settings.
If you want to show the first run assistant again, deselect Disable configuration wizard.
Please note that the option to start Simon at login works on Microsoft Windows and, on Linux,
when you are using KDE. Other desktop environments like Gnome, XFCE,
etc. might require manually placing Simon in the session autostart (please refer to the respective
manuals of your desktop environment).
When the option to start Simon minimized is selected, Simon will minimize to the system tray
immediately after starting.
Deselecting the option to warn when there are problems with samples deactivates the sample
quality assurance.
3.7.2 Recordings
Simon uses fairly sophisticated internal sound processing to enable complex multi-device setups.
3.7.2.1 Device Configuration
The sound device configuration allows you to choose which sound device(s) to use, configure
them and define additional recording parameters.
Use the Refresh devices button if you have plugged in additional sound devices since you started
Simon.
Most of the time you will want to use 1 channel and 16 kHz (which is also the default) because the
recognition only works on mono input and works best at 16 kHz (8 kHz and 22 kHz being other
viable options). Some low-cost sound cards might not support this particular mode, in which case
you can enable automatic Resampling in the device's advanced configuration.
NOTE
Only change the channel and the samplerate if you really know what you are doing. Otherwise the
recognition will most likely not work.
You can use Simon with more than one sound device at the same time. Use Add device to add a
new device to the configuration and Remove device to remove it from your configuration. The
first device in your sound setup cannot be removed.
For each device you can determine what you want the device to be used for: training or
recognition (the latter is only applicable to input devices).
If you use more than one device for training, you will create multiple sound files for each
utterance. When using multiple devices for recognition, each one feeds a separate sound input stream
to the server, resulting in recognition results for each stream.
If you use multiple output devices, the playback of the training samples will play on all configured
audio devices.
When using different sample rates for your input devices, the output will only play on matching
output devices. If you, for example, have one input device configured to use 16 kHz and the other
to use 48 kHz, the playback of samples generated by the first one will only play on 16 kHz outputs,
the other one only on 48 kHz devices.
In the device's advanced configuration, you can also define the sample group tag of the produced
training samples and set activation context conditions.
If you set up this device to be used for recognition and (any of) its activation requirements are
not met, the device will not record. This can be used to augment or even replace the traditional
voice activity detection with context information.
For example, add a face detection condition to the recording device's activation requirements to
only enable the recognition when you're looking at the webcam.
3.7.2.2 Voice Activity Detection
The recognition is done on the Simond server. See the architecture section for more details.
The sound stream is not continuous but is segmented by the Simon client. This is done by
voice activity detection.
Here you can configure this segmentation through the following parameters:
Cutoff level
Everything below this level is considered silence (background noise).
Head margin
Simon caches the input for as long as the head margin before it starts to consider it a real
sample. During this whole time the input level needs to be above the cutoff level.
Tail margin
After the recording went below the cutoff level, Simon will wait for as long as the tail margin
before it considers the current recording a finished sample.
Skip samples shorter than
Samples that are shorter than this value are not considered for recognition. (coughs, etc.)
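How the four parameters interact can be illustrated with a small segmenter. This is a simplified sketch, not Simon's implementation: it assumes the input arrives as a list of volume levels, one per fixed-length frame, and the function name segment is hypothetical.

```python
def segment(levels, frame_ms, cutoff, head_margin_ms, tail_margin_ms, min_length_ms):
    """Return (start_frame, end_frame) pairs; end_frame is exclusive.

    A sample that is still running when the input ends is not reported,
    because a live stream simply keeps going.
    """
    head_frames = max(1, head_margin_ms // frame_ms)
    tail_frames = max(1, tail_margin_ms // frame_ms)
    segments = []
    start = None        # first frame of the current candidate sample
    loud_run = 0        # consecutive loud frames while waiting for a sample
    quiet_run = 0       # consecutive quiet frames inside a sample
    recording = False
    for i, level in enumerate(levels):
        loud = level > cutoff
        if not recording:
            if loud:
                if loud_run == 0:
                    start = i  # cache from the first loud frame
                loud_run += 1
                if loud_run >= head_frames:
                    recording = True  # head margin satisfied
                    quiet_run = 0
            else:
                loud_run = 0  # dropped below cutoff too early, discard cache
        elif loud:
            quiet_run = 0
        else:
            quiet_run += 1
            if quiet_run >= tail_frames:
                end = i - quiet_run + 1  # cut off the trailing silence
                if (end - start) * frame_ms >= min_length_ms:
                    segments.append((start, end))
                recording = False
                loud_run = 0
    return segments
```

A short burst such as a cough either fails the head margin or, if it passes, is dropped by the minimum length check, so only longer utterances are reported.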
3.7.2.3 Training settings
When the option Default to power training is selected, Simon will, when training, automatically
start and stop the recording when displaying and hiding (respectively) the recording prompt.
This option only sets the default value; the user can change it at any time before
beginning a training session.
The configurable font here refers to the text that is recorded to train the acoustic model (through
explicit training or when adding a word).
This option was introduced after we had worked with a few clients with spastic
disabilities. While we used the mouse to control Simon during the training, they had to read what
was on the screen. At first this was very problematic as the regular font size is relatively small
and they had trouble making out what to read. This is why we made the font and the font size of
the recording prompt configurable.
Here you can also define the required signal to noise ratio for Simon to consider a training sample
to be correct. See the Sample Quality Assurance section for more details.
On this configuration page you can also set the parameters for the volume calibration.
It can be deactivated for both the add word dialog and the training wizard by unchecking the
group box itself.
The calibration itself uses the voice activity detection to score your sound configuration.
The prompted text can be configured by entering text in the input field below. If the field is empty,
a default text will be used.
3.7.2.4 Postprocessing
All recorded (training) and imported (through the import training data) samples can be processed
using a series of postprocessing commands. Postprocessing chains are an advanced feature and
shouldn't be needed by the average user.
The postprocessing commands can be seen as a chain of filters through which the recordings have
to pass. Using these filters one could define commands to suppress background noise
in the training data or normalize the recordings.
Given the program process_audio which takes the input and output files as its arguments (e.g.:
process_audio in.wav out.wav), the postprocessing command would be: process_audio %1
%2. The two placeholders %1 and %2 will be replaced by the input filename and the output
filename respectively.
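The placeholder substitution can be pictured with the following sketch. How Simon expands and runs the command internally is not described here, so splitting the template with shell-like rules before substituting is an assumption, and build_command is a hypothetical name.

```python
import re
import shlex


def build_command(template, input_path, output_path):
    """Expand %1 and %2 in a postprocessing command template.

    Assumption: the template is split into arguments first, so paths
    containing spaces remain a single argument after substitution.
    """
    paths = {"1": input_path, "2": output_path}
    return [
        re.sub(r"%([12])", lambda match: paths[match.group(1)], arg)
        for arg in shlex.split(template)
    ]
```

The resulting argument list could then be passed to subprocess.run, with the output file of one filter serving as the input file of the next filter in the chain.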
The switch to apply filters to recordings recorded with Simon enables the postprocessing chains
for samples recorded during the training (including the initial training while adding the word).
If you don't select this switch, the postprocessing commands are only applied to imported samples
(through the import training data wizard).
3.7.2.5 Context
Every sample recorded with Simon is assigned a sample group.
When creating the acoustic model from the training samples Simon can take the current situation
into account to only use a subset of all gathered training data.
For example, in a system where multiple, very different speakers use one shared setup, context
conditions can be set up to automatically build separate models for both users depending on the
current situation.
The above screenshot, for example, shows a setup where, given that all samples of peter were
tagged peters_samples and all samples of mathias were tagged mathias_samples (refer
to the device configuration for more information on how to set up sample groups), the active
acoustic model will only contain the current user's own samples as long as the file
/home/bedahr/.username contains either peter or mathias.
Another example use-case would be to switch to a more noise-resistant acoustic model when the
user starts playing music.
3.7.3 Speech Model
Here you can adjust the parameters of the speech model.
3.7.3.1 Base model
You can optionally use base models to limit / circumvent the training or to avoid installing a
model creation backend. Please refer to the general base model section for more details about
base models.
To use a user-generated model, select Do not use a base model. To use a static base model,
select a base model and do not select Adapt base model using training samples. To instead use
an adapted base model, check Adapt base model using training samples after selecting a base
model.
Simon base models are packaged in .sbm files.
To add base models to the selection, you can either import local models (Open model → Import),
download them from an online repository (Open model → Download) or create new ones from
raw files (Open model → Create from model files).
If you have raw model files produced by either supported model creation backend, you can
package them into an SBM container for use with Simon.
You can also export your currently active model by selecting Export active model. The exported
SBM file will contain your full acoustic model (ignoring the current context) that can be shared
with other Simon users.
3.7.3.2 Training data
This section allows you to configure the training samples.
The sample rate set here is the target sample rate of the acoustic model. It has nothing to do
with the recording sample rate and it is the responsibility of the user to ensure that the samples
are actually made available in that format (usually by recording in that exact sample rate or by
defining postprocessing commands that resample the files; see the sound configuration section
for more details).
Usually either 16 kHz or 8 kHz models are built / used. 16 kHz models will have higher accuracy
than 8 kHz models. Going higher than 16 kHz is not recommended as it is very CPU-intensive and
in practice probably won't result in higher recognition rates.
Moreover, the path to the training samples can be adjusted. However, be sure that the previously
gathered training samples are also moved to the new location. If you use automatic
synchronization, Simond would also provide Simon with any missing samples, but copying
them manually is still recommended for performance reasons.
3.7.3.3 Language Profile
In the language profile section you can select a previously built or downloaded language profile
to aid with the transcription of new words.
3.7.4 Model Extensions
Here you can configure the base URL that is going to be used for the automatic BOMP import.
The default points to the copy on the Simon listens server.
3.7.5 Recognition
Here you can configure the recognition and model synchronization with the Simond server.
3.7.5.1 Server
Using the server configuration you can set parameters of the connection to Simond.
3.7.5.1.1 General
The Simon main application connects to the Simond server (see the architecture section for more
information).
To identify individual users of the system (one Simond server can of course serve multiple Simon
clients), Simon and Simond use user accounts. Every user has their own speech model. The username /
password combination given here is used to log in to Simond. If Simond does not know the
username or the password is incorrect, the connection will fail. See the Simond manual on how
to set up users for Simond.
The recognition itself - which is done by the server - might not be available at all times. For
example, it would not be possible to start the recognition as long as the user does not have a
compiled acoustic and language model, which has to be created first (during synchronization
when all the ingredients - vocabulary, grammar, training - are present). Using the option to start
the recognition automatically once it is available, Simon will request to start the recognition when
it receives the information that it is ready (all required components are available).
Using the Connect to server on startup option, Simon will automatically start the connection to
the configured Simond servers after it has finished loading the user interface.
3.7.5.1.2 Network
Simon connects to Simond using TCP/IP.
As of Simon 0.4, encryption is not yet supported.
The timeout setting specifies how long Simon will wait for a first reply when contacting the
hosts. If you are on a very slow network and/or use connect on start on a very slow
machine, you may want to increase this value if you keep getting timeout errors and can resolve
them by trying again repeatedly.
Simon can be configured to use more than one Simond server. This is very useful if you, for
example, are going to use Simon on a laptop which connects to a different server depending on
where you are. You could, for example, add the server you use when you are at home and the server
used when you are at work. When connecting, Simon will try to connect to each of the servers
(in order) until it finds one server that accepts the connection.
To add a server, just enter the host name or IP address and the port (separated by :) or use the
dialog that appears when you select the blue arrow next to the input field.
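The "try each server in order" behaviour can be sketched as follows. This is an illustration rather than Simon's networking code: parse_server and connect_to_first are hypothetical names, and the timeout is applied to the plain TCP connection attempt here, which only approximates waiting for a first reply.

```python
import socket


def parse_server(entry):
    """Split a 'host:port' entry into (host, port)."""
    host, separator, port = entry.rpartition(":")
    if not separator or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {entry!r}")
    return host, int(port)


def connect_to_first(servers, timeout_s, connect=socket.create_connection):
    """Try the configured servers in order; return the first that accepts.

    Returns ((host, port), connection), or None if every server failed.
    """
    for entry in servers:
        address = parse_server(entry)
        try:
            return address, connect(address, timeout_s)
        except OSError:
            continue  # unreachable or timed out: move on to the next server
    return None
```

With a list such as a home server followed by a work server, the laptop ends up connected to whichever one is reachable from its current network.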
3.7.5.2 Synchronization and Model Backup
Here you can configure the model synchronization and restore older versions of your speech
model.
Simon creates the speech input files which are then compiled and used by the Simond server (see
the architecture section for more details).
The process of sending the speech input files, compiling them and receiving the compiled
versions is called synchronization. Only after the speech model is synchronized do the changes take
effect and a new restore point is set. This is why, by default, Simon will always synchronize the
model with the server when it changes. This is called Automatic Synchronization and is the
recommended setting.
However, if you want more control you can instruct Simon to ask you before starting the
synchronization after the model has changed or to rely on manual synchronization altogether. When
selecting the manual synchronization you have to manually use the Actions → Synchronize menu
item of the Simon main window every time you want to compile the speech model.
The Simon server will maintain a copy of the last five iterations of model files. This
only includes the source files (the vocabulary, grammar, etc.), not the compiled model. However,
the compiled model will be regenerated from the restored source files automatically.
After you have connected to the server, you can select one of the available models and restore it
by clicking on Choose Model.
3.7.6 Actions
In the actions configuration you can configure the reactions to recognition results.
3.7.6.1 Recognition
The recognition of Simon computes not only the most likely result but the top ten results.
Each of the results is assigned a confidence score between 0 and 1 (where 1 is 100% sure).
Using the Minimum confidence setting you can set a minimum confidence for recognition results to be
considered valid.
If more than one recognition result is rated higher than the minimum confidence score, Simon
will provide a popup listing the most likely options for you to choose from.
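The effect of the minimum confidence can be sketched like this. The function names are hypothetical, results are modelled as (sentence, confidence) pairs, and two details are assumptions: a score exactly equal to the minimum counts as valid, and the best result is accepted when the popup is disabled.

```python
def select_results(results, minimum_confidence, max_results=10):
    """Keep the best-rated results that reach the minimum confidence."""
    ranked = sorted(results, key=lambda result: result[1], reverse=True)
    return [r for r in ranked[:max_results] if r[1] >= minimum_confidence]


def decide(results, minimum_confidence, show_popup=True):
    """Return ('reject' | 'ask' | 'accept', candidate list)."""
    valid = select_results(results, minimum_confidence)
    if not valid:
        return ("reject", [])
    if len(valid) > 1 and show_popup:
        return ("ask", valid)      # ambiguous: let the user choose
    return ("accept", valid[:1])   # assumption: take the best result
```

Raising the minimum confidence therefore reduces both false activations and the number of popups, at the cost of rejecting more correct but uncertain results.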
This popup can be disabled using the Display selection popup for ambiguous results check box.
3.7.6.2 Dialog font
Many plugins of Simon have a graphical user interface.
The fonts of these interfaces can be configured centrally and independently of the system's font
settings here.
3.7.6.3 Lists
Here you can find the global list element configuration. This serves as a template for new
scenarios but is also directly used for the popup for ambiguous recognition results.
3.7.7 Text-to-speech
Some parts of Simon, most notably the dialog command plugin, employ text-to-speech (or TTS)
to read text aloud.
3.7.7.1 Backends
Multiple external TTS solutions can be used to allow Simon to talk. Multiple backends can be
enabled at the same time and will be queried in the configured order until one is found that can
synthesize the requested message.
The following backends are available:
Recordings
Instead of an engine to convert arbitrary text into speech, text-snippets can be pre-recorded
and will be simply played back.
Jovie
Uses the Jovie TTS system. This requires a valid Jovie set-up.
Webservice
The webservice backend can be used to talk to any TTS engine that has a web front-end that
returns .wav files.
3.7.7.2 Recordings
Instead of using an external TTS engine, you can also record yourself or other speakers reading
the texts aloud. Simon can then play back these pre-recorded snippets when they are requested of
its text-to-speech engine.
These recorded sound bites are organized into sets of different speakers which can also be
imported and exported to share them with other Simon users.
3.7.7.3 Webservice
Through the webservice backend, Simon can use web-based TTS engines like MARY.
You can provide any URL. Simon will replace any instance of %1 within the configured URL
with the text to synthesize. The backend expects the queried webservice to return a .wav file
that will be streamed and played through Simon's sound layer, respecting the sound device
configuration.
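The URL substitution can be illustrated with a short sketch. Whether Simon percent-encodes the text before inserting it is not stated in this handbook, so the encoding step is an assumption; build_tts_url and the example host tts.example are hypothetical.

```python
from urllib.parse import quote


def build_tts_url(url_template, text):
    """Replace every %1 in the template with the text to synthesize.

    Assumption: the text is percent-encoded so that spaces and special
    characters survive as part of a URL.
    """
    return url_template.replace("%1", quote(text, safe=""))
```

For instance, the template http://tts.example/say?text=%1 and the text "Hello world" produce http://tts.example/say?text=Hello%20world.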
3.7.8 Social desktop
Scenarios can be uploaded and downloaded from within Simon.
For this we use KDE's social desktop facilities and our own category for Simon scenarios on
kde-files.org.
If you already have an account on opendesktop.org you can input the credentials there. If you
don't, you can register directly in the configuration module.
The registration is of course free of charge.
3.7.9 Webcam configuration
In the webcam configuration, you can configure the frames per second (fps) and select the webcam to use
when multiple webcams are connected to your system.
Frames per second is the rate at which the webcam will produce unique consecutive images called
frames. A value between 5 and 15 fps is recommended for proper performance.
3.7.10 Advanced: Adjusting the recognition parameters manually
Simon is targeted towards end-users. Its interface is designed to allow even users without any
background in speech technology to design their own language and acoustic models by provid-
ing reasonable default values for simple uses.
In special cases (severe speech impairments for example), special configuration might be needed.
This is why the raw configuration files for the recognition are also respected by Simon and can of
course be modified to suit your needs.
3.7.10.1 Julius
There are basically two parts of the Julius conguration that can be adjusted:
adin.jconf
This is the configuration of the sound stream sent from the Simon client to the Simond
server. This file is read directly by the adinstreamer.
Simon ships with a default adin.jconf without any special parameters. You can change this
system wide configuration, which will affect all users if there are different user accounts on
your machine who all use Simon. To change the configuration of just one of those users, copy
the file to the user path (see below) and edit this copy.
julius.jconf
This is a configuration of the Simond server and directly influences the recognition. This file is
parsed by libjulius and libsent directly.
Simond ships with a default julius.jconf. Whenever a new user is added to the Simond
database, Simond will automatically copy this system wide configuration to the new user.
After that the user is of course free to change it but it won't affect the other users. This way the
template (the system wide configuration) can be changed without affecting existing users.
The path to the Julius configuration files will depend on your platform:
File                      Microsoft Windows                                                    GNU/Linux
adin.jconf (system)       (installation path)\share\apps\simon\adin.jconf                      kde4-config --prefix/share/apps/simon/adin.jconf
adin.jconf (user)         %appdata%\.kde\share\apps\simon\adin.jconf                           ~/.kde/share/apps/simon/adin.jconf
julius.jconf (template)   (installation path)\share\apps\simond\default.jconf                  kde4-config --prefix/share/apps/simond/default.jconf
julius.jconf (user)       %appdata%\.kde\share\apps\simond\models\(user)\active\julius.jconf   ~/.kde/share/apps/simond/models/(user)/active/julius.jconf
Table 3.1: Julius Configuration Files
Chapter 4
Advanced: Creating new scenarios with Simon
The following chapter is aimed towards more experienced users who want to design their own
scenarios.
For general usage instructions, please refer to the chapter Using Simon: Typical user.
4.1 Introduction
To add a new scenario, you first create a new scenario shell by adding a new scenario object
and then open it in the Simon main window.
To modify an existing scenario instead, you just have to open it.
A Simon scenario contains the following components:
Vocabulary
Grammar
Training texts
Context
Commands
Before describing how to configure these elements in Simon, the next section provides
background information that will help you understand the basic principles of speech modelling. This
fundamental knowledge is necessary to design sensible scenarios.
4.2 Speech recognition: background
NOTE
Before explaining exactly how you can create new scenarios with Simon, this section introduces some
fundamentals of speech recognition in general.
Speech recognition systems take voice input (often from a microphone) and try to translate it
into written text. To do that, they rely on statistical representations of human voice. To put it into
simple terms: The computer learns how words - or more correctly the sounds that make up those
words - sound.
A speech model consists of two distinct parts:
Language Model
Acoustic Model
4.2.1 Language Model
The language model defines the vocabulary and the grammar you want to use.
4.2.1.1 Vocabulary
The vocabulary defines what words the recognition process should recognize. Every word you
want to be able to use with Simon should be contained in your vocabulary.
One entry in the vocabulary defines exactly one word. In contrast to the common use of the
word "word", in Simon a word means one unique combination of the following:
Wordname
(The written word itself)
Category
(Grammatical category; for example: Noun, Verb, etc.)
Pronunciation
(How the word is pronounced; Simon accepts any kind of phonetic notation as long as it does not use
special characters or numbers)
That means that plurals or even different cases are different words to Simon. This is an
important design decision to allow more control when using a sophisticated grammar.
In general, it is advisable to keep your vocabulary as lean as possible. The more words, the
higher the chance that Simon might misunderstand you.
Example vocabulary (please note that the categories here are deliberately set to Noun / Verb to
aid understanding; please refer to the grammar section to see why this might not be the best
idea):
Word       Category   Pronunciation
Computer   Noun       k ax m p y uw t er
Internet   Noun       ih n t er n eh t
Mail       Noun       m ey l
close      Verb       k l ow s
Table 4.1: Sample Vocabulary
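The idea that a word is one unique combination of name, category and pronunciation can be modelled as a simple tuple. This is only an illustration of the concept, not Simon's internal data structure; the Word class and the extra Adjective entry are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    name: str           # the written word itself
    category: str       # grammatical category, e.g. Noun or Verb
    pronunciation: str  # phonetic transcription


vocabulary = {
    Word("Computer", "Noun", "k ax m p y uw t er"),
    Word("Internet", "Noun", "ih n t er n eh t"),
    Word("Mail", "Noun", "m ey l"),
    Word("close", "Verb", "k l ow s"),
}

# Same written word, different category: a separate vocabulary entry.
vocabulary.add(Word("close", "Adjective", "k l ow s"))
# Adding an identical combination again changes nothing.
vocabulary.add(Word("Mail", "Noun", "m ey l"))
```

After these two additions the set holds five entries, because only the complete combination identifies a word.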
4.2.1.1.1 Active Dictionary
The vocabulary used for the recognition is referred to as active dictionary or active vocabulary.
4.2.1.1.2 Shadow Dictionary
As said above, the user should keep his vocabulary / dictionary as lean as possible. However,
as a word in your vocabulary has to also have information about its pronunciation, it would
also be good to have a large dictionary where you could look up the pronunciation and other
characteristics of the words.
Simon provides this functionality. We refer to this large reference dictionary as shadow dic-
tionary. This shadow dictionary is not created by the user but can be imported from various
sources.
As Simon is a multi-language solution, we do not ship shadow dictionaries with Simon. However,
it is very easy to import them yourself using the import dictionary wizard. This is described in
the Import Dictionary section.
4.2.1.1.3 Language profile
In addition to a shadow dictionary, Simon can use a language profile to provide help with
transcribing words.
A language profile consists of rules describing how words are pronounced in the target language. It can be
likened to the way that humans can often pronounce a word they never heard just because they
know some implicit pronunciation rules of the language.
Just as with humans, this process is not perfect but can provide a solid starting ground.
This automatic deduction of a phoneme transcription from a written word is called grapheme
to phoneme conversion.
Simon requires the Sequitur G2P grapheme to phoneme converter to be installed and set up for
language profiles to work.
If you have selected a pre-built language profile or built your own, Simon will automatically
transcribe new words with it when they are not found in your shadow dictionary.
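The lookup order described above (shadow dictionary first, language profile as a fallback) can be sketched roughly as follows. This is only an illustration and not Simon's actual implementation: the dictionary contents, the function names and the toy converter standing in for a real grapheme to phoneme tool are all assumptions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory shadow dictionary: word -> list of phonemes.
# Upper-case keys are an assumption made for this sketch.
SHADOW: Dict[str, List[str]] = {
    "FIRE": ["f", "ay", "r"],
    "FOX": ["f", "ao", "k", "s"],
}


def toy_g2p(word: str) -> List[str]:
    """Stand-in for a real converter: emits one 'phoneme' per letter."""
    return list(word.lower())


def transcribe(
    word: str,
    shadow: Dict[str, List[str]],
    g2p: Optional[Callable[[str], List[str]]] = None,
) -> Optional[List[str]]:
    """Prefer the shadow dictionary; fall back to the language profile."""
    entry = shadow.get(word.upper())
    if entry is not None:
        return entry
    if g2p is not None:
        # Automatic guess only; the user should still review it.
        return g2p(word)
    return None
```

Without a language profile (no `g2p` argument), unknown words simply yield no suggestion, which corresponds to the case where the fields have to be filled out manually.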
4.2.1.2 Grammar
The grammar defines which combinations of words are correct.
Let's look at an example: You want to use Simon to launch programs and close those windows
when you are done. You would like to use the following commands:
Computer, Internet
To open a browser
Computer, Mail
To open a mail client
Computer, close
To close the current window
Following English grammar, your vocabulary would contain the following:
Word Category
Computer Noun
Internet Noun
Mail Noun
close Verb
Table 4.2: Sample Vocabulary
To allow the sentences defined above, Simon would need the following grammar:
Noun Noun for sentences like Computer Internet
Noun Verb for sentences like Computer close
While this would work, it would also allow the combinations Computer Computer, Internet
Computer, Internet Internet, etc., which are obviously bogus. To improve the recognition
accuracy, we can try to create a grammar that better reflects what we are trying to do with Simon.
It is important to remember that you define your own language when using Simon. That means
that you are not bound to the grammar rules of whatever language you want to use Simon
with. For a simple command and control use case, for example, it is advisable to invent
new grammatical rules that eliminate differences between commands imposed by
grammatical information not relevant for this use case.
In the example above, it is not relevant that close is a verb or that Computer and
Internet are nouns. Instead, why not define them as something that better reflects what we want
them to be:
Word Category
Computer Trigger
Internet Command
Mail Command
close Command
Table 4.3: Improved Sample Vocabulary
Now we change the grammar to the following:
Trigger Command
This allows all the combinations described above. However, it also limits the possibilities to
exactly those three sentences. Especially in larger models, a well thought-out grammar and vocabulary
can make a huge difference in recognition results.
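To make the difference between the two grammars concrete, the following sketch enumerates every sentence each one allows. It is a simplified model of how categories expand into words, not Simon's code; the function name and data layout are assumptions.

```python
from itertools import product


def expand(grammar, vocabulary):
    """List every sentence the grammar allows for the given vocabulary."""
    by_category = {}
    for word, category in vocabulary:
        by_category.setdefault(category, []).append(word)
    sentences = []
    for structure in grammar:
        # One list of candidate words per category slot in the structure.
        slots = [by_category.get(cat, []) for cat in structure.split()]
        sentences.extend(" ".join(words) for words in product(*slots))
    return sentences


english = [("Computer", "Noun"), ("Internet", "Noun"),
           ("Mail", "Noun"), ("close", "Verb")]
improved = [("Computer", "Trigger"), ("Internet", "Command"),
            ("Mail", "Command"), ("close", "Command")]

loose = expand(["Noun Noun", "Noun Verb"], english)
strict = expand(["Trigger Command"], improved)
```

With the sample vocabulary, `loose` contains twelve sentences (including bogus ones such as Internet Computer), while `strict` contains only the three intended commands.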
4.2.2 Acoustic Model
The acoustic model represents your pronunciation in a machine readable format.
Let's look at the following sample vocabulary:
Word      Category  Pronunciation
Computer  Noun      k ax m p y uw t er
Internet  Noun      ih n t er n eh t
Mail      Noun      m ey l
close     Verb      k l ow s
Table 4.4: Sample Vocabulary
The pronunciation of each word is composed of individual sounds which are separated by spaces.
For example, the word close consists of the following sounds:
k
l
ow
s
The acoustic model uses the fact that spoken words are composed of sounds much like written
words are composed of letters. Using this knowledge, we can segment words into sounds (repre-
sented by the pronunciation) and assemble them back when recognizing. These building blocks
are called phonemes.
Because the acoustic model actually represents how you speak the phonemes of the words, train-
ing material is shared among all words that use the same phonemes.
That means if you add the word clothes to the language model, your acoustic model already
has an idea how the clo part is going to sound, because close and clothes share the same
phonemes (k, l, ow) at the beginning.
To train the acoustic model (in other words, to tell it how you pronounce the phonemes) you
have to train words from your language model. That means that Simon displays a word which
you read out loud. Because the word is listed in your vocabulary, Simon already knows what
phonemes it contains and can thus learn from your pronunciation of the word.
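The idea that training material is shared between words can be illustrated by counting how many recorded samples cover each phoneme. This is only a bookkeeping sketch, not how the acoustic model is actually built; the transcription of clothes is an assumption made for the example.

```python
from collections import Counter

# Pronunciations as in the sample vocabulary above;
# the entry for "clothes" is an assumed transcription.
PRONUNCIATIONS = {
    "close": "k l ow s",
    "Mail": "m ey l",
    "clothes": "k l ow dh z",
}


def phoneme_counts(recorded_words):
    """Count how many recorded samples contain each phoneme."""
    counts = Counter()
    for word in recorded_words:
        counts.update(PRONUNCIATIONS[word].split())
    return counts


# Two recordings of "close" and one of "Mail" have been made so far.
counts = phoneme_counts(["close", "close", "Mail"])

# How much training material already exists for the new word "clothes"?
coverage = {p: counts[p] for p in PRONUNCIATIONS["clothes"].split()}
```

Here `coverage` shows existing material for k, l and ow, but none yet for the remaining phonemes of the new word.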
4.3 Scenarios
This section extends the previous one about basic scenario management and tells you how to
create, edit and export scenarios.
4.3.1 Scenario hierarchies
You can create scenario hierarchies by dragging and dropping active scenarios on top of each
other.
Scenario hierarchies serve two purposes:
The context system respects scenario hierarchies: If the parent scenario gets deactivated, all
child scenarios will become deactivated as well.
If you attempt to export a scenario that has children, Simon will allow you to export them in a
joint scenario package. This way, you can share multiple logically co-dependent scenarios (e.g.
one Office scenario that contains sub-scenarios for Word, Excel, etc.).
4.3.2 Adding a new Scenario
To add a new scenario, select the Add button. A new dialog will be displayed.
When creating a new scenario, please give it a descriptive name. For the later upload to KDE
files we kindly ask you to follow a certain naming scheme, although this is of course not
a requirement: [<language>/<base model>] <name>. If, for example, you create a scenario
in English that works with the Voxforge base model and controls Mozilla Firefox, this becomes:
[EN/VF] Firefox. If your scenario is not specifically tailored to one phoneme set (base model),
just omit the second tag like this: [EN] Firefox.
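If you generate scenario names in a script, the naming scheme above could be applied with a small helper like the following. The function and its parameters are hypothetical and not part of Simon.

```python
def scenario_name(name, language, base_model=None):
    """Build a scenario title following the suggested naming scheme."""
    tag = language.upper()
    if base_model:
        # The base model tag is optional and omitted for generic scenarios.
        tag += "/" + base_model.upper()
    return "[{}] {}".format(tag, name)
```

For example, `scenario_name("Firefox", "en", "vf")` yields the tagged form for a Voxforge based scenario, while omitting the third argument yields the language-only form.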
The scenario version is just an incremental version number that makes it easier to distinguish
between different revisions of a scenario.
If your scenario needs a specific feature of Simon (for example because you use a new plugin),
you can define minimum and maximum version numbers of Simon here.
The license of your scenario can be set through the drop-down. You can of course also add an
arbitrary license text directly in the input field.
You can then add your name (or alias) to the list of scenario authors. There you will also be
asked for contact information. This field is purely provided as a convenient way to contact a
scenario author for changes, problems, fan mail etc. If you don't feel comfortable providing your
e-mail address, you can simply enter a dash (-) to denote that you are not willing to divulge this
information.
4.3.3 Edit Scenario
To edit scenarios, just select Edit from the Manage scenarios dialog.
The dialog works exactly the same as the add scenario dialog.
4.3.4 Export Scenario
Scenarios can be exported to a local file in Simon's XML scenario file format and uploaded
directly to the Simon Scenarios subsection of the OpenDesktop site KDE-files.org.
To upload to OpenDesktop sites, you need an account on the site. Registration is very easy and
of course free of charge.
Simon allows you to upload new content directly from within Simon (Export > Publish).
To use this functionality, simply enter your account credentials in the social desktop configuration
in the Simon configuration.
4.4 Vocabulary
The vocabulary module defines the set of words of the scenario.
By default, the active vocabulary is shown. To display the shadow vocabulary, select the tab
Shadow Vocabulary.
Every word states its recognition rate, which at the moment is just a counter of how often the
word has been recorded (alone or together with other words).
4.4.1 Adding Words
To add new words to the active vocabulary, use the add word wizard.
Adding words to Simon is basically a two step procedure:
Defining the word
Initial training
4.4.1.1 Defining the Word
First, the user is asked which word to add.
When the user proceeds to the next page, Simon automatically tries to find as much information
about the word in the shadow dictionary as possible.
If the word is listed in the shadow dictionary, Simon automatically fills out all the needed fields
(Category and Pronunciation).
All suggestions from the shadow dictionary are listed in the table Similar words. By default only
exact word matches are shown. However, this can be changed by checking the Include similar
words check box below the suggestion table. Using similar words you can quickly deduce the
correct pronunciation of the word you are actually trying to add. See below for details.
Of course this really depends on your shadow dictionary. If the shadow dictionary does not
contain the word you are trying to add, the required fields have to be filled out manually.
Some dictionaries that can be imported with Simon (SPHINX, HTK) do not differentiate between
upper and lower case. Suggestions based on those dictionaries will always be uppercase. You
are of course free to change these suggestions to the correct case.
Some dictionaries that can be imported with Simon (SPHINX, PLS and HTK) provide no gram-
matical information at all. These will assign all the words to the category Unknown. You should
change this to something appropriate when adding those words.
4.4.1.1.1 Manually Selecting a Category
The category of the word is defined as the grammatical category the word belongs to. This
might be Noun, Verb or completely new categories like Command. For more information see
the grammar section.
The list contains all categories used in both your active and your shadow lexicon and in your
grammar.
You can add new categories to the drop-down menu by using the green plus sign next to it.
4.4.1.1.2 Manually Providing the Phonetic Transcription
The pronunciation is a bit trickier. Simon does not need a certain type of phonetics so you are
free to use any method as long as it uses only ASCII characters and no numbers. However, if you
want to use a shadow dictionary and want to use it to its full potential you should use the same
phonetics as the shadow dictionary.
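The two constraints named above (ASCII characters only, no numbers) are easy to check before entering a transcription. The helper below is a hypothetical sketch of such a check; Simon itself may validate input differently.

```python
def is_valid_transcription(transcription):
    """Check the constraints named above: ASCII only and no digits."""
    if not transcription.strip():
        return False  # an empty transcription is not useful
    has_digit = any(ch.isdigit() for ch in transcription)
    return transcription.isascii() and not has_digit
```

A transcription such as f ay r f ao k s passes this check, while notations that carry stress digits or IPA symbols would have to be converted first.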
If you do not know how to transcribe a word yourself, you can use your shadow dictionary
to help you with the transcription, even if the word is not listed in it. Let's say we want to add
the word Firefox (to launch Firefox), which is of course not listed in our shadow dictionary.
(I imported the English Voxforge HTK lexicon available from Voxforge as a shadow dictionary.)
Firefox is not listed in our shadow dictionary so we do not get any suggestion at all.
However, we know that firefox sounds like fire and fox put together. So let's just open the
vocabulary (you can keep the wizard open) by selecting Vocabulary from your Simon main toolbar.
Switch to the shadow vocabulary by clicking on the tab Shadow Vocabulary.
Use the Filter box above the list to search for Fire.
We can see that the word Fire is transcribed as f ay r. Now filter for fox instead of Fire and
we can see that Fox is transcribed as f ao k s. We can assume that firefox should be transcribed
as f ay r f ao k s.
Using this approach of deducing the pronunciation from parts of the word has the distinct ad-
vantage that we not only get a high quality transcription but also automatically use the same
phoneme set as the other words which were correctly pulled out of the shadow dictionary.
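The manual procedure above amounts to concatenating the transcriptions of known parts. A minimal sketch, assuming a shadow dictionary held as a plain mapping with upper-case keys (both assumptions for illustration):

```python
def compose(parts, shadow):
    """Concatenate the transcriptions of known word parts."""
    phonemes = []
    for part in parts:
        # Raises KeyError if a part is not in the shadow dictionary.
        phonemes.extend(shadow[part.upper()].split())
    return " ".join(phonemes)


shadow = {"FIRE": "f ay r", "FOX": "f ao k s"}
firefox = compose(["fire", "fox"], shadow)
```

The result should still be checked by ear, since compound words are not always pronounced exactly like their parts.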
We can now enter the pronunciation and change the category to something appropriate.
4.4.1.2 Training the Word
To complete the wizard we can now train the word twice. If you don't want to do this or, for
example, use a static base model, you can skip these two pages.
Because you are about to record some training samples, Simon will display the volume
calibration to make sure that your microphone is set up correctly. For more information please refer to
the volume calibration section.
Simon will try to prompt you for real-world examples. To do that, Simon will automatically fetch
grammar structures using the category of the word and substitute the generic categories with
example words from your active lexicon.
For example: You have the grammar structure Trigger Command and have the word Com-
puter of the category Trigger in your vocabulary. You then add a new word Firefox of the
category Command. Simon will now automatically prompt you for Computer Firefox as it is -
according to your grammar - a valid sentence.
If Simon is unable to find appropriate sentences using the word (e.g. no grammar, not enough
words in your active lexicon, etc.) it will just prompt you for the word alone.
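The prompt generation described above can be sketched as follows. This is a simplified model under stated assumptions (grammar structures as space-separated category names, vocabulary as word and category pairs); the function name is hypothetical and Simon's own selection logic may differ.

```python
import random


def training_prompt(new_word, category, grammar, vocabulary, rng=random):
    """Return an example sentence containing new_word, or the word alone."""
    by_category = {}
    for word, cat in vocabulary:
        by_category.setdefault(cat, []).append(word)
    for structure in grammar:
        slots = structure.split()
        if category not in slots:
            continue
        target = slots.index(category)
        words = []
        for i, slot in enumerate(slots):
            if i == target:
                words.append(new_word)
            elif by_category.get(slot):
                words.append(rng.choice(by_category[slot]))
            else:
                break  # no example word available for this category
        else:
            return " ".join(words)  # every slot could be filled
    return new_word  # fallback: prompt for the word alone
```

With the grammar structure Trigger Command and Computer as the only Trigger word, adding Firefox as a Command produces the prompt Computer Firefox.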
Although Simon ensures that the automatically generated examples are valid, you can always
override its suggestions. Just switch to the Examples tab on the Define Word page.
You are free to change those examples to anything you like. You can even go so far and use words
that are not yet in your active lexicon as long as you add them before you synchronize the model,
although this is not recommended.
All that is left is to record the examples.
Make sure you follow the guidelines listed in the recording section.
4.4.2 Editing a word
To edit a word, simply select it from the vocabulary, and click on Edit.
There you can change name, category and pronunciation of the selected word.
4.4.3 Removing a word
To remove a word from your language model, select it in the vocabulary view and click on Re-
move.
The dialog offers four choices:
Move the word to the Unused category.
Because you (hopefully) don't use the category Unused in your grammar, the word will no
longer be considered for recognition. In fact, it will be removed from the active vocabulary
before compiling the model because no grammar sentence references it.
If you want to use the category Unused in your grammar, you can of course use a different
category for unused words. Just set the category through the Edit word dialog.
To use the word again, just set the right category again. No data will be lost.
Move the word to the shadow lexicon
This will remove the selected word from the active lexicon (and thus from the recognition) but
will keep a copy in the shadow vocabulary. All the recordings containing the word will be
preserved.
To use the word again, add it again to the active vocabulary. When adding a new word with
the same name the values of the moved word will be suggested to you. Therefore, no data will
be lost.
Delete the word but keep the samples
Removes the word completely but keeps the associated samples. Whenever you add another
word with the same word name the samples will be re-associated.
Be careful with this option as the new word you add again might be transcribed differently
and this difference cannot be taken into account automatically (Simon will then try to force the
new transcription on the old recordings during the model compilation).
Do not use this option if the samples you recorded for this word were erroneous.
Remove the word completely
Just remove the word. All the recordings containing the word will be removed too.
This option leaves no trace of either the word itself or the associated samples.
Because samples are global (not assigned to scenarios), even samples recorded from training
sessions of other scenarios might be removed as well if they contain the word. Use this option
carefully.
4.4.4 Special Training
Please see the special training section in the training section.
4.4.5 Importing a Dictionary
Simon provides the functionality to import large dictionaries as a reference. This reference dic-
tionary is called shadow dictionary.
When the user adds a new word to the model, the following characteristics have to be
defined:
Wordname
Category
Phonetic definition
These characteristics are taken out of the shadow dictionary if it contains the word in question.
A large, high quality shadow dictionary can thus help the user to easily add new words to the
model without keeping track of the phoneme set or - in many cases - even let him forget that the
phonetic transcription is needed at all.
Since version 0.3 you can also import dictionaries directly to the active dictionary. This option is
mostly there to make it easier to move to Simon from custom solutions and to encourage import-
ing of older models (for example one used with Simon 0.2). You will almost never want to import
a very large dictionary as active dictionary.
You can find a list of available dictionaries that work with Simon on the Simon wiki.
Simon is able to import five different types of dictionaries:
HADIFIX
HTK
PLS
SPHINX
Julius
4.4.5.1 HADIFIX Dictionary
Simon can import HADIFIX dictionaries.
One example of a HADIFIX dictionary is the German HADIFIX BOMP.
HADIFIX dictionaries provide both categories and pronunciation.
Due to a special exemption in its license, the Simon listens team is proud to offer the
excellent HADIFIX BOMP for download directly from within Simon.
Using the automatic BOMP import you can, after providing your name and e-mail address for the
team of the University of Bonn, directly download and import the dictionary from the Simon listens
server.
4.4.5.2 HTK Dictionary
Simon can import HTK lexica.
One example of an HTK lexicon is the English Voxforge dictionary.
HTK dictionaries provide pronunciation information but no categories. All words will be
assigned to the category Unknown.
4.4.5.3 PLS Dictionary
Simon can import PLS dictionaries.
One example of a PLS dictionary is the German GPL dictionary from Voxforge.
PLS dictionaries provide pronunciation information but no categories. All words will be assigned
to the category Unknown.
4.4.5.4 SPHINX Dictionary
Simon can import SPHINX dictionaries.
One example is a SPHINX dictionary for Mexican Spanish.
SPHINX dictionaries provide pronunciation information but no categories. All words will be
assigned to the category Unknown.
4.4.5.5 Julius Dictionary
Simon can import Julius vocabularies.
Examples of Julius vocabularies are the word lists of Simon 0.2.
Julius dictionaries provide pronunciation information as well as category information.
4.4.6 Create language profile
Here, you can build a language profile from your shadow dictionary.
After selecting Create profile, Simon will analyze your current shadow dictionary and try to
deduce the transcription rules from it.
This is generally a very lengthy process and can, depending on the size of your shadow dictionary,
take up to several hours.
The created profile will be selected automatically after the process completes.
4.5 Grammar
Simon provides an easy to use text based interface to change the grammar. You can simply list
all the allowed sentences (without any punctuation marks) as described above.
When you select a sentence on the left, the right pane will automatically show possible real
sentences built from the words of your vocabulary.
The example section will list at most 35 examples, so if more sentences than that match
the selected grammar entry, the list might not be complete.
4.5.1 Import a Grammar
In addition to entering your desired grammar sentence by sentence, Simon is able to
automatically deduce allowed grammar structures by reading plain text using the Import Grammar
wizard.
Simon can read and import text files but also provides an input field if you want to simply type
the text into Simon.
Say we have a vocabulary like in the general section above:
Word Category
Computer Trigger
Internet Command
Mail Command
close Command
Table 4.5: Improved Sample Vocabulary
We want Simon to recognize the sentence Computer Internet!. So we either enter the text using
the Import text option or create a simple text file with the content Computer Internet! (any
punctuation mark would work) and save it as simongrammar.txt to use the Import files option.
Simon will then read the entered text or all the given text files (in this case the only given text file
is simongrammar.txt) and look up every single word in both the active and the shadow dictionary (the
definition in the active dictionary takes precedence if the word is available in both). It will
then replace the word with its category.
In our example this would mean that Simon would find the sentence Computer Internet. Simon
would find out that Computer is of the category Trigger and Internet of the category
Command. Because of this, Simon would learn that Trigger Command is a valid sentence and add
it to its grammar.
The import automatically segments the input text by punctuation marks (., -, !, etc.) so any
natural text should work. The importer will automatically merge duplicate sentence structures
(even across different files) and add multiple sentences (all possible combinations) when a word
has multiple categories assigned to it.
The import will ignore sentences where one or more words could not be found in the language
model unless you tick the Also import unknown sentences check box, in which case those words
are replaced with Unknown.
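The import steps described above (segmenting, replacing words with categories, merging duplicates, expanding multiple categories and handling unknown words) can be sketched like this. It is an illustrative approximation only: the function name, the exact set of segmenting characters and the single merged category mapping are assumptions, and the precedence of the active over the shadow dictionary is left out.

```python
import re
from itertools import product


def import_grammar(text, categories, import_unknown=False):
    """Derive grammar structures from plain text.

    categories maps each known word to the list of its categories.
    """
    structures = []
    # Assumed segmenting characters: . ! ? - and line breaks.
    for sentence in re.split(r"[.!?\-\n]+", text):
        words = sentence.split()
        if not words:
            continue
        options = []
        for word in words:
            cats = categories.get(word)
            if cats is None:
                if not import_unknown:
                    options = None  # skip the whole sentence
                    break
                cats = ["Unknown"]
            options.append(cats)
        if options is None:
            continue
        # One structure per combination of categories.
        for combo in product(*options):
            structure = " ".join(combo)
            if structure not in structures:  # merge duplicates
                structures.append(structure)
    return structures


cats = {
    "Computer": ["Trigger"],
    "Internet": ["Command"],
    "Mail": ["Command"],
    "close": ["Command", "Verb"],  # hypothetical word with two categories
}
```

Calling `import_grammar("Computer Internet! Computer Mail.", cats)` yields a single Trigger Command structure because both sentences map to the same categories.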
4.5.2 Renaming Categories
The rename category wizard allows you to rename categories in your active vocabulary,
your shadow dictionary and the grammar.
4.5.3 Merging Categories
The merge category wizard allows you to merge two categories into one new category in
your active vocabulary, your shadow dictionary and the grammar.
This functionality is especially useful if you want to simplify your grammar structures.
4.6 Training
Using the Training module, you can improve your acoustic model.
The interface lists all installed training texts in a table with three columns:
Name
A descriptive name for the text.
Pages
The number of pages the text consists of. Each page represents one recording.
Recognition Rate
Analogous to the vocabulary; represents how likely Simon is to recognize the words (higher is
better). The recognition rate of the training text is the average recognition rate of all the words
in the text.
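As a worked example of the Recognition Rate column: since the recognition rate of a word is currently a counter of recordings, the rate of a text is the mean of those counters. The sketch below assumes the average is taken over every word occurrence in the text and that untrained words count as zero; both are assumptions, not confirmed behavior.

```python
def text_recognition_rate(text, word_rates):
    """Average the per-word recognition rates over all words of a text."""
    words = text.split()
    if not words:
        return 0.0
    return sum(word_rates.get(word, 0) for word in words) / len(words)


rates = {"Computer": 4, "Internet": 2}
```

With these counters, a text consisting of Computer Internet averages the two values, while a text containing a word that was never recorded is pulled down accordingly.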
To improve the acoustic model - and thus the recognition rate - you have to record training texts.
This means that Simon gets essentially two needed parts:
Samples of your speech
Transcriptions of those samples
The active dictionary is used to transcribe the words (mapping them from the actual word to its
phonetic transcription) that make up the text so every word contained in the training text you
want to read (train) has to be contained in your active dictionary. Simon will warn you if this is
not the case and provide you with the possibility to add all the missing words in one go.
The procedure is the same as when adding a single word, but the wizard will prompt you for
details and recordings for all the missing words automatically. This procedure can be aborted at
any time and Simon will provide both a way to add the already completely defined words and
to undo all changes done so far. When the user has added all the words he is prompted for (all
the words missing), the changes to the active dictionary / vocabulary are saved and the training
of the previously selected text starts automatically.
The training (reading) of the training text works exactly the same as the initial training when
adding a new word.
Make sure you follow the guidelines listed in the recording section.
4.6.1 Storage Directories
Training texts are stored in two different locations:
Linux: ~/.kde/share/apps/simon/texts
Windows: %appdata%\.kde\share\apps\simon\texts
The texts of the current user. Can be deleted and added with Simon (see below).
Linux: `kde4-config --prefix`/share/apps/simon/texts
Windows: (install folder)\share\apps\simon\texts
System-wide texts. They will appear on every user account using Simon on this machine
and cannot be deleted from within Simon because of the obvious permission restrictions on
system-wide files.
This folder can be used by system administrators to provide a common set of training texts for
all the users on one system.
The XML files (one for each text) can just be moved from one location to the other but this will
most likely require admin privileges.
4.6.2 Adding Texts
The add texts wizard provides a simple way to add new training texts to Simon.
When importing text files, Simon will automatically try to recognize individual sentences and
split the text into appropriate pages (recordings). The algorithm treats text between normal
punctuation (., !, ?, ..., ,...) and line breaks as sentences. Each sentence will be on its
own page.
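The page splitting described above can be approximated with a regular expression. This sketch assumes that only ., !, ? (including runs such as ...) and line breaks end a sentence, and it drops the punctuation itself; the real importer may treat more characters as separators.

```python
import re


def split_into_pages(text):
    """Split a text into one sentence per page at punctuation and line breaks."""
    pieces = re.split(r"[.!?]+|\n", text)
    return [piece.strip() for piece in pieces if piece.strip()]


pages = split_into_pages("Hello world. How are you?\nFine... thanks!")
```

Each element of `pages` corresponds to one recording in the resulting training text.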
Simon supports two different sources for new training texts.
4.6.2.1 Add training texts
Simply enter the training text in an input field.
4.6.2.2 Local text files
Simon can import normal text files to use them as training texts.
4.6.3 On-The-Fly Training
In addition to training texts, Simon also allows you to train individual words or word combinations
from your dictionary on-the-fly.
This feature is located in the vocabulary menu of Simon.
Select the words to train from the vocabulary on the left and simply drag them to the selection
list to the right (you could also select them in the table on the left and add them by clicking Add
to Training).
Start the training by selecting Train selected words. The training itself is exactly the same as if it
were a pre-composed training text.
If there are more than 9 words to train Simon will automatically split the text evenly across mul-
tiple pages.
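One plausible way to split a word list evenly, assuming a limit of 9 words per page, is shown below. The exact distribution Simon uses is not documented here, so treat the function and its `max_per_page` parameter as assumptions.

```python
import math


def paginate(words, max_per_page=9):
    """Distribute words across as few pages as possible, as evenly as possible."""
    if not words:
        return []
    pages = math.ceil(len(words) / max_per_page)
    base, extra = divmod(len(words), pages)
    result, start = [], 0
    for i in range(pages):
        # The first `extra` pages take one additional word.
        size = base + (1 if i < extra else 0)
        result.append(words[start:start + size])
        start += size
    return result
```

Under these assumptions, ten words end up on two pages of five words each rather than on one full page and one page with a single word.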
Of course you are free to add words from the shadow lexicon to the list of words to train, but
Simon will prompt you to add the words before the training starts, just like it would if you
trained a text that contains unknown words (see above).
4.7 Context
Simon includes a context layer that allows you to let Simon automatically adjust its configuration
depending on its context.
For example, you could set up Simon to only allow commands like New tab if Mozilla Firefox
is running and is the currently active window.
There are three major areas that contextual information can influence:
Scenario selection
Sample groups
Active microphones
4.7.1 Scenario selection
Scenarios can specify to only be active during certain contextual situations. If these situations are
not met, Simon will temporarily deactivate the affected scenario.
The local context conditions of this scenario are shown in the list of Activation Requirements
and can be added, edited and deleted through the respective buttons.
The context conditions respect a possible hierarchy of scenarios: The activation requirements
of all direct or indirect parent scenarios also apply to the child scenario(s). This condition
inheritance is shown on the right side.
The Simon main window also shows a list of currently used scenarios. Scenarios that are deac-
tivated because of their activation requirements (context conditions) are listed in light gray and
italic. The screenshot below, for example, shows a temporarily deactivated Amarok scenario.
The same visual hints (gray, italic font for unmet activation criteria) also apply to the individual
context conditions in the context menu.
4.7.2 Sample groups
Every sample recorded with Simon is assigned a sample group. Sample groups can be configured
to only be used for the building of the acoustic models if certain contextual conditions are met.
If this is not the case, all samples tagged with the deactivated sample group will be temporarily
removed from the training corpus.
For more information, an example use case and instructions on how to work with sample groups,
please refer to the section on sample groups.
4.7.3 Context conditions
In Simon, context is monitored through a set of context condition plugins.
In general, context conditions are combined through an and association. For example, if the
activation of a resource is bound by two conditions A and B, it will only be activated if both A and
B see their conditions met. To instead model alternatives (A or B or both), use an Or Condition
Association.
All conditions can optionally be inverted. Inverting a condition means that it will evaluate to
true if it would otherwise evaluate to false and vice versa.
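The combination rules above (and by default, or through an association, optional inversion) can be modeled in a few lines. The classes below are a hypothetical sketch for illustration and do not mirror Simon's plugin interface.

```python
class Condition:
    """A single context condition with optional inversion."""

    def __init__(self, check, inverted=False):
        self.check = check        # callable returning the raw state
        self.inverted = inverted

    def is_satisfied(self):
        # Inversion flips the raw result.
        return bool(self.check()) != self.inverted


class OrAssociation(Condition):
    """Satisfied as soon as at least one child condition is satisfied."""

    def __init__(self, children, inverted=False):
        super().__init__(
            lambda: any(child.is_satisfied() for child in children),
            inverted,
        )


def is_active(conditions):
    """Top-level conditions are combined with a logical and."""
    return all(condition.is_satisfied() for condition in conditions)


a = Condition(lambda: True)
b = Condition(lambda: False)
```

With these two conditions, `is_active([a, b])` fails, whereas wrapping them in an `OrAssociation` or inverting `b` makes the combined requirement hold.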
4.7.3.1 Active window
True, if the title of the currently active foreground window matches the provided window title.
4.7.3.2 D-Bus
The D-Bus condition plugin allows you to monitor third-party applications that export state
information on D-Bus.
The monitored application needs to provide two things: a signal that notifies of changes and
a method that returns the current state.
The screenshot above, for example, configures a D-Bus condition that will evaluate to true while
the music player Tomahawk is playing and to false otherwise.
4.7.3.3 Face detection
The face detection condition will evaluate to true if Simon's vision layer has identified a person
sitting in front of the configured webcam.
4.7.3.4 File content
This condition plugin will return true if the given file contains the provided content.
The file will be monitored for changes.
4.7.3.5 Lip detection
The lip detection condition will evaluate to true if Simon's vision layer has identified a person
sitting in front of the configured webcam who is speaking (lip movements).
The lip detection training will try to determine the optimal sensitivity of the detection
by monitoring your lip movements. For better accuracy of the lip detection condition, stop training
when the sensitivity value on the slider becomes almost constant.
4.7.3.6 Or condition association
The or condition association allows you to configure a meta-condition that reports to be satisfied
as soon as one or more of its child conditions evaluate to true.
Or condition associations can have an arbitrary number of child conditions, which may themselves
be or condition associations.
4.7.3.7 Process opened
Is satisfied if there is a running process with the provided executable name.
4.8 Commands
When Simon is active and recognizes something, the recognition result is given to the loaded
command plug-ins (in order) for processing.
The command system can be compared with a group of factory workers. Each one of them knows
how to perform one task (e.g. Karl knows how to start a program and Joe knows how to open
a folder, etc.). Whenever Simon recognizes something, it is given to Karl who then checks if this
instruction is meant for him. If he doesn't know what to do with it, it is handed over to Joe and
so on. If none of the loaded plugins knows how to process the input, it is ignored. The order in
which the recognition result is given to the individual commands (people) is configurable in the
command options (Commands > Manage plugins).
Each plugin can be associated with a trigger. Using triggers, the responsibility of each plugin
can easily be divided.
Using the factory workers abstraction from above it could be compared to stating the name of
who you mean to process your request. So instead of Open my home folder you say Joe, open
my home folder and Joe (the plugin responsible for opening folders) will instantly know that
the request is meant for him.
In practice you could have commands like the executable command Firefox to open the popular
browser and the place command Google to open the web search engine. If you assign the trigger
Start to the executable plugin and the trigger Open to the place plugin, you would have to
say Start Firefox (instead of just Firefox if you don't use a trigger for the executable plugin)
and Open Google to open the search engine (instead of just Google).
Triggers are of course no requirement and you can easily use Simon without dening any plugin
triggers (although many plugins come with a default trigger of Computer set which you would
have to remove). But even if you use just one trigger for all your commands (like Computer to
say Computer, Firefox and Computer, Google like) it has the advantage of greatly limiting the
number of false-positives.
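The hand-over between plugins and the effect of triggers can be sketched in a few lines of Python.
This is a simplified model, not Simon's implementation: the Plugin class, the dispatch function and
the action strings are hypothetical, and a trigger is assumed to be a plain prefix followed by a
space.

```python
class Plugin:
    """Hypothetical command plugin with an optional trigger."""

    def __init__(self, name, trigger, commands):
        self.name = name
        self.trigger = trigger      # e.g. "Start", or "" for no trigger
        self.commands = commands    # maps command name to an action label

    def process(self, result):
        """Return the action for this result, or None if it is not ours."""
        if self.trigger:
            prefix = self.trigger + " "
            if not result.startswith(prefix):
                return None
            result = result[len(prefix):]
        return self.commands.get(result)


def dispatch(plugins, result):
    """Offer the result to each plugin in order; the first taker wins."""
    for plugin in plugins:
        action = plugin.process(result)
        if action is not None:
            return plugin.name, action
    return None  # nobody knew what to do with it: the result is ignored


plugins = [
    Plugin("Executable", "Start", {"Firefox": "launch firefox"}),
    Plugin("Place", "Open", {"Google": "open search engine"}),
]
print(dispatch(plugins, "Start Firefox"))
print(dispatch(plugins, "Firefox"))
```

With the triggers set, the bare word "Firefox" is no longer accepted by any plugin, which is how
triggers reduce false positives.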
Simon's command dialog displays the complete phrase associated with a command in the upper
right corner of the command configuration.
You can load multiple instances of one plugin even in one scenario. Each instance can of course
also have a different plugin trigger.
Each command has a name (which will trigger its invocation), an icon and more fields depending
on the type of the plugin (see below).
Some command plugins might provide a configuration of the plugin itself (not the commands it
contains). These configuration pages will be plugged directly into the action configuration dialog
(below the General menu item) when you load the associated plugin.
Plugins that provide a graphical user interface (like for example the input number command
plugin) can be configured by configuring Voice commands. You can, for example, change the
associated word that will trigger the button, but also change the displayed icon, etc. If you remove
all voice interface commands from a graphical element, the element will be hidden automatically.
Voice interface commands are added just like normal commands through the command
configuration.
To add a new interface command to a function, just select the action you want to associate with a
command, click Create from Action template and adapt the resulting command to your needs.
Some plugins (for example the desktop grid or the calculator) might also provide a menu item in
the Actions menu.
Scenarios can optionally define one command that will immediately be run when the scenario
is initialized. If you require more than one command to run automatically, consider the use of a
composite command.
Command triggers can contain placeholders in the form of %<index>, referring to any one
word, or %%<index>, standing for one or more words. For example the recognition
result "Next window" will be matched by the triggers "Next %1", "Next %%1" and "%%1" but
not by the triggers "%1", "Next window %1" or "%%1 Next window".
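The matching rules for placeholders can be reproduced with a regular expression. The sketch below
is a minimal Python model of the rules as stated in this section; the function names are
hypothetical, and details such as case sensitivity are assumptions.

```python
import re

# "%%<index>" (one or more words) or "%<index>" (exactly one word)
_PLACEHOLDER = re.compile(r"(%%?)(\d+)")


def trigger_to_regex(trigger):
    """Translate a trigger with placeholders into a compiled regex."""
    parts = []
    pos = 0
    for match in _PLACEHOLDER.finditer(trigger):
        parts.append(re.escape(trigger[pos:match.start()]))
        if match.group(1) == "%%":
            parts.append(r"\S+(?:\s+\S+)*")   # one or more words
        else:
            parts.append(r"\S+")              # exactly one word
        pos = match.end()
    parts.append(re.escape(trigger[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def matches(trigger, result):
    return trigger_to_regex(trigger).match(result) is not None


for trigger in ["Next %1", "Next %%1", "%%1", "%1",
                "Next window %1", "%%1 Next window"]:
    print(trigger, matches(trigger, "Next window"))
```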
4.8.1 Executable Commands
Executable commands are associated with an executable file (Program) which is started when
the command is invoked.
Arguments to the commands are supported. If either the path to the executable or the arguments
contain spaces they must be wrapped in quotes.
Given the executable file C:\Program Files\Mozilla Firefox\firefox.exe and the local HTML file
C:\test file.html, the correct line for the Executable would be:
"C:\Program Files\Mozilla Firefox\firefox.exe" "C:\test file.html"
The working folder defines where the process should be launched from. Given the working
folder C:\folder, the command "C:\Program Files\Mozilla Firefox\firefox.exe" file.html
would cause Firefox to search for the file C:\folder\file.html.
The working folder usually does not need to be set and can be left blank most of the time.
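The quoting rule and the effect of the working folder can be illustrated with a short Python
sketch. The helper names are hypothetical; the code only builds strings and does not start any
program, and it is not how Simon itself parses the Executable line.

```python
from pathlib import PureWindowsPath


def quote_if_needed(part):
    """Wrap a path or argument in quotes if it contains a space."""
    return '"' + part + '"' if " " in part else part


def build_command_line(executable, *arguments):
    """Join executable and arguments into one Executable line."""
    return " ".join(quote_if_needed(p) for p in (executable,) + arguments)


def resolve_argument(working_folder, argument):
    """Show where a relative file argument is looked up."""
    return str(PureWindowsPath(working_folder) / argument)


firefox = r"C:\Program Files\Mozilla Firefox\firefox.exe"
print(build_command_line(firefox, r"C:\test file.html"))
print(resolve_argument(r"C:\folder", "file.html"))
```

An absolute argument such as C:\test file.html is unaffected by the working folder, which is why
the working folder can usually be left blank.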
4.8.1.1 Importing Programs
For even easier configuration Simon provides an import dialog which allows you to select
programs directly from the KDE menu.
NOTE
This option is not available on Microsoft Windows.
The dialog will list all programs that have an entry in your KDE menu in their respective category.
Subcategories are not supported and are thus listed on the same level as top-level categories.
Just select the program you wish to start with Simon and press Ok. The correct values for the
executable and the working folder as well as an appropriate command name and description will
automatically be filled out for you.
4.8.2 Place Commands
With place commands you can allow Simon to open any given URL. Because Simon just hands the
address over to the platform's URL handler, special protocols like remote:/ (on Linux/KDE) or
even KDE's Web-Shortcuts are supported.
Instead of folders, files can also be set as the command's URL which will cause the file to be
opened with the application which is associated with it when the command is invoked.
To associate a specific URL with the command you can manually enter it in the URL field (select
Manual first) or import it with the import place wizard.
4.8.2.1 Importing Places
The import place dialog allows you to easily create the correct URL for the command.
To add a local folder, select Local Place and choose the folder or file with the file selector.
To add a remote URL (HTTP, FTP, etc.) choose Remote URL.
Please note that for URLs with authentication information the password will be stored in clear
text.
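To make the two kinds of places and the password warning concrete, here is a small Python sketch
using only the standard library. The function names and the example addresses are hypothetical;
the sketch shows how a local folder maps to a file URL and how to spot a URL that carries a
password, and it does not reflect how Simon stores its commands.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_place_to_url(path):
    """Turn an absolute local path into a file:// URL."""
    return PurePosixPath(path).as_uri()


def has_embedded_password(url):
    """True if the URL contains a password (user:password@host)."""
    return urlsplit(url).password is not None


print(local_place_to_url("/home/peter/Documents"))
print(has_embedded_password("ftp://peter:secret@example.org/pub"))
print(has_embedded_password("http://example.org/"))
```

A check like has_embedded_password could be used to review place commands before sharing a
scenario with others.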
4.8.3 Shortcut Commands
Using shortcut commands the user can associate commands with key-combinations.
The command will simulate keyboard input to trigger shortcuts like Ctrl-C or Alt-F4.
The plugin can press, release or press and release the configured key combination.
To select the shortcut you wish to simulate just toggle the shortcut button and press the key
combination on your keyboard.
Simon will capture the shortcut and associate it with the command.
Due to technical limitations there are several shortcuts on Microsoft Windows that cannot be
captured by Simon (this includes e.g. Ctrl-Alt-Del and Alt-F4). These special shortcuts can be
selected from a list below the aforementioned shortcut button.
NOTE
This selection box is not visible in the screenshot above as the list is only displayed in the Microsoft
Windows version of Simon.
4.8.4 Text-Macro Commands
Using text-macro commands, the user can associate text with a command. When the command
is invoked, the associated text will be written by simulating keystrokes.
4.8.5 List Commands
The list command is designed to combine multiple commands (all types of commands are
supported) into one list. The user can then select the n-th entry by saying the associated number
(1-9).
This is very useful to limit the amount of training required and makes it possible to keep
the vocabulary to a minimum.
List commands are especially useful when using commands with difficult triggers or commands
that can be grouped under a general theme. A typical example would be a command "Startmenu"
to present a list of programs to launch. That way the specific executable commands can still retain
very descriptive names (like "OpenOffice.org Writer 3.1") without the user having to include these
words in his vocabulary and consider them in the grammar just to trigger them.
Commands of different types can of course be mixed.
4.8.5.1 List Command Display
When invoked, the command will display the list centered on the screen. The list will
automatically expand to accommodate its items.
The user can invoke the commands contained in the list by simply saying their associated number
(in this example: "One" to launch Mozilla Firefox).
While a list command is active (displayed), all input that is not directed at the list itself (other
commands, etc.) will be rejected. The process can be canceled by pressing the Cancel button or
by saying "Cancel".
If there are more than 9 items Simon will add "Next" and "Back" options to the list ("Zero" will be
associated with "Back" and "Nine" with "Next").
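One way to picture the numbering is the following Python sketch. It rests on assumptions that the
text above does not confirm: that a long list is split into pages of eight entries ("One" to
"Eight"), and that "Back" and "Next" appear on every page. The function name build_pages is
hypothetical.

```python
NUMBER_WORDS = ["Zero", "One", "Two", "Three", "Four",
                "Five", "Six", "Seven", "Eight", "Nine"]


def build_pages(items):
    """Map spoken number words to list entries, page by page."""
    if len(items) <= 9:
        # Short list: "One" to "Nine" select the entries directly.
        return [dict(zip(NUMBER_WORDS[1:], items))]
    pages = []
    for start in range(0, len(items), 8):
        # Assumption: eight entries per page, "Zero"/"Nine" reserved.
        page = dict(zip(NUMBER_WORDS[1:9], items[start:start + 8]))
        page["Zero"] = "Back"
        page["Nine"] = "Next"
        pages.append(page)
    return pages


programs = ["Program %d" % n for n in range(1, 11)]
for page in build_pages(programs):
    print(page)
```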
4.8.5.2 Configuring list elements
By default the list command uses the following trigger words. To use list commands to their full
potential, make sure that your language and acoustic model contain and allow for the following
sentences:
Zero
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
Cancel
Of course you can also configure these words in your Simon configuration:
Commands > Manage plugins > General > Lists for the scenario-wide list configuration.
Settings > Configure Simon... > Actions > Lists for the global configuration. When creating
a new scenario, the scenario configuration will be initialized with a copy of this list
configuration.
List commands are also used internally by other plugins, for example the desktop grid. The
configuration of the triggers also affects their displayed lists.
4.8.6 Composite Commands
Composite commands allow the user to group multiple commands into a sequence.
When invoked, the commands will be executed in order. Delays between commands can be
inserted.
Composite commands can also work as transparent wrappers by selecting Pass recognition
result through to other commands. In that case, the recognition result will be treated as
unprocessed even if the composite command was executed.
For example, suppose you have a command to turn on the light in one scenario. In addition
to turning on the light, you now want to add some kind of reporting to the activity by invoking
a script through a program plugin. You could then set up a reporting scenario that contains a
transparent composite command with the same trigger as the command to turn on the light and
make sure that this scenario is placed before the original one in the scenario list. You can then
activate and deactivate the reporting simply by loading and unloading this scenario.
Using the composite command the user can compose complex macros. The screenshot above,
for example, does the following:
Start Kopete (Executable Command)
Wait 2000ms for Kopete to be started
Type "Mathias" (Text-Macro Command) which will select "Mathias" in my contact list
Press Enter (Shortcut Command)
Wait 1000ms for the chat window to appear
Write "Hi!" (Text-Macro Command); the text associated with this command contains a newline at
the end so that the message will be sent.
Press Alt-F4 (Shortcut Command) to close the chat window
Press Alt-F4 (Shortcut Command) to close the Kopete main window
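The macro above is essentially an ordered list of steps with optional delays. The Python sketch
below models it as data plus a small runner. It is purely illustrative: the step kinds and the
run_composite function are hypothetical, and the runner only records what it would do instead of
starting programs or simulating keystrokes.

```python
import time

# The Kopete macro from above as (kind, value) pairs; delays are in ms.
KOPETE_MACRO = [
    ("executable", "kopete"),
    ("delay", 2000),
    ("text", "Mathias"),
    ("shortcut", "Enter"),
    ("delay", 1000),
    ("text", "Hi!\n"),
    ("shortcut", "Alt-F4"),
    ("shortcut", "Alt-F4"),
]


def run_composite(steps, sleep=time.sleep):
    """Execute the steps in order and return a log of the actions."""
    log = []
    for kind, value in steps:
        if kind == "delay":
            sleep(value / 1000.0)   # wait before the next command
        else:
            log.append((kind, value))
    return log


# Pass a no-op sleep function to run the macro without waiting.
for entry in run_composite(KOPETE_MACRO, sleep=lambda seconds: None):
    print(entry)
```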
4.8.7 Desktop grid
The desktop grid allows the user to control his mouse with his voice.
The desktop grid divides the screen into nine parts which are numbered from 1-9. Saying one of
these numbers will again divide the selected field into 9 fields, again numbered from 1-9, etc. This
is repeated 3 times. After the fourth time the desktop grid will be closed and Simon will click in
the middle of the selected area.
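The repeated subdivision boils down to simple arithmetic, sketched here in Python. The function
grid_click_point is hypothetical, and the sketch assumes that the fields are numbered row by row
starting at the top left (1 is top left, 9 is bottom right), which the text above does not state.

```python
def grid_click_point(width, height, selections):
    """Return the center of the field reached by the spoken numbers."""
    x, y = 0.0, 0.0
    w, h = float(width), float(height)
    for number in selections:
        if not 1 <= number <= 9:
            raise ValueError("grid fields are numbered from 1 to 9")
        row, column = divmod(number - 1, 3)
        # Shrink to one ninth of the current area and move to that field.
        w /= 3
        h /= 3
        x += column * w
        y += row * h
    return (x + w / 2, y + h / 2)


# Four selections on a 1920x1080 screen narrow the area down to a
# field of roughly 24x13 pixels before the click is performed.
print(grid_click_point(1920, 1080, [5, 1, 9, 5]))
```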
The exact click action is configurable but defaults to asking the user: you will be
presented with a list of possible click modes. When selecting Drag and Drop, the desktop grid will
be displayed again to select the drop point.
While the desktop grid is active (displayed), all input that is not directed at the desktop grid itself
(other commands, etc.) will be rejected. Say Cancel at any time to abort the process.
The desktop grid plugin registers a configuration screen right in the command configuration
when it is loaded.
The trigger that invokes the desktop grid is of course completely configurable. Moreover the user
can use real or fake transparency. If your graphical environment allows for compositing effects
(desktop effects) then you can safely use real transparency which will make the desktop grid
transparent. If your platform does not support compositing Simon will simulate transparency
by taking a screenshot of the screen before displaying the desktop grid and displaying that picture
behind the desktop grid.
If the desktop grid is configured to use real transparency and the system does not support
compositing it will display a solid gray background.
However, nearly all up-to-date systems will support compositing (real transparency).
This includes:
Microsoft Windows 2000 or higher (XP, Vista, 7)
GNU/Linux using a composite manager like Compiz, KWin4, xcompmgr, etc.
By default the desktop grid uses numbers to select the individual fields. To use the desktop
grid, make sure that your language and acoustic model contain and allow for the following
sentences:
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
Cancel
To configure these triggers, just configure the commands associated with the plugin.
4.8.8 Input Number
Using the input number plugin the user can input large numbers easily.
Using the Dictation or the Text-Macro plugin one could associate the number words with their digits
and use that as an input method. However, to input larger numbers there are two ways that both
have significant disadvantages:
Adding the words "eleven", "twelve", etc.
While this seems like the most elegant solution as it would enable the user to say
"fivehundredseventytwo", we can easily see that it would be quite a problem to add all these words,
let alone train them. What about "twothousandninehundredtwo"? Where to stop?
Spelling out the number using the individual digits
While this is not as elegant as stating the complete number it is much more practical.
However, many applications (like the great mouseless browsing Firefox add-on) rely on the
user to input large numbers without too much time passing between the individual keystrokes
(mouseless browsing for example will wait exactly 500ms by default before it considers the
input of the number complete). So if you want to enter 52 you would first say "Five", (pause), "Two".
Because of the needed pause, the application (like the mouseless browsing add-on) would
consider the input of "Five" complete.
The input number plugin, when triggered, presents a calculator-like interface for inputting a
number. The input can be corrected by saying "Back". It features a decimal point accessible by
saying "Comma". When saying "Ok" the number will be typed out. As all the voice input and the
correction is handled by the plugin itself, the application that finally receives the input will only
see a couple of milliseconds between the individual digits.
While the input number plugin is active (the user is currently entering a number), all input that is
not directed at the input number plugin (other commands, etc.) will be rejected. Say "Cancel" at any
time to abort the process.
As no command instances can be created for this plugin, it is not listed in the New Command
dialog. However, the input number plugin registers a configuration screen right in the command
configuration when it is loaded.
The trigger defines what word or phrase will trigger the display of the interface.
By default the input number plugin uses numbers to select the individual digits and a couple
of control words. To use the input number plugin, make sure that your language and acoustic
model contain and allow for the following sentences:
Zero
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
Back
Comma
Ok
Cancel
To configure these triggers, just configure the commands associated with the plugin.
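How the control words interact can be sketched as a small buffer that is only typed out at the
end. The following Python model is an illustration under assumptions: the function input_number is
hypothetical, the decimal point is written as "." and a second "Comma" is ignored; Simon's actual
behavior may differ.

```python
DIGITS = {"Zero": "0", "One": "1", "Two": "2", "Three": "3", "Four": "4",
          "Five": "5", "Six": "6", "Seven": "7", "Eight": "8", "Nine": "9"}


def input_number(words):
    """Return the text typed out on "Ok", or None if aborted."""
    buffer = []
    for word in words:
        if word in DIGITS:
            buffer.append(DIGITS[word])
        elif word == "Comma":
            if "." not in buffer:       # assumption: only one decimal point
                buffer.append(".")
        elif word == "Back":
            if buffer:
                buffer.pop()            # correct the last entry
        elif word == "Ok":
            return "".join(buffer)      # typed out in one go
        elif word == "Cancel":
            return None
    return None  # input ended without "Ok"


print(input_number(["Five", "Three", "Back", "Two", "Ok"]))
```

Because nothing is sent before "Ok", pauses between the spoken digits never reach the receiving
application.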
4.8.9 Dictation
The dictation plugin writes the recognition result it gets using simulated keystrokes.
Assuming you didn't define a trigger for the dictation plugin it will accept all recognition results
and just write them out. The written input will be considered as processed input and thus not
be relayed to other plugins. This means that if you loaded the dictation plugin and defined no
trigger for it, all plugins below it in the Selected Plug-Ins list in the command configuration will
never receive any input.
As no command instances can be created for this plugin, it is not listed in the New Command
dialog.
The dictation plugin can be configured to append text after recognition results, for example to
add a space after each recognized word.
4.8.10 Artificial Intelligence
The Artificial Intelligence is a just-for-fun plugin that emulates a human conversation.
Using the text-to-speech system, the computer can talk with the user.
The plugin uses AIML sets for the actual intelligence. Most AIML sets should be supported. The
popular A. L. I. C. E. bot and a German version work and are shipped with the plugin.
The plugin registers a configuration screen in the command configuration menu where you can
choose which AIML set to load.
Simon will look for AIML sets in the following folder:
GNU/Linux: `kde4-config --prefix`/share/apps/ai/aimls/
Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by default)]\share\apps\ai\aimls\
To add a new set just create a new folder with a descriptive name and copy the .aiml files into it.
To adjust your bot's personality have a look at the bot.xml and vars.xml files in the following
folder:
GNU/Linux: `kde4-config --prefix`/share/apps/ai/util/
Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by default)]\share\apps\ai\util\
As no command instances can be created for this plugin, it is not listed in the New Command
dialog.
It is recommended to not use any trigger for this plugin to provide a more natural feel for the
conversation.
4.8.11 Calculator
The calculator plugin is a simple, voice-controlled calculator.
The calculator extends the Input Number plugin by providing additional features.
When loading the plugin, a configuration screen is added to the plugin configuration.
There you can also configure the control mode of the calculator. Setting the mode to something
other than Full calculator will hide options from the displayed widget.
However, the hidden controls will, in contrast to simply removing all associated commands from
the functions, still react to the configured voice commands.
When selecting Ok, the calculator will by default ask you what to do with the generated result.
You can for example output the calculation, the result, both, etc. Instead of selecting this
from the displayed list every time after selecting the Ok button, this can also be set in the
configuration options.
4.8.12 Filter
Using the filter plugin, you can prevent recognition results from being passed on to further
command plugins. Using this plugin you can for example disable the recognition by voice.
The filter command plugin registers a configuration screen in the command configuration where
you can change what results should be filtered.
The pattern is a regular expression that will be evaluated each time the plugin receives a
recognition result for processing.
The plugin also registers voice interface commands for activating and deactivating the filter.
In total, the filter therefore has three states:
Inactive
The default state. All recognition results will be passed through.
Half-active (if Two stage activation is selected)
If the next command is the Deactivate filter command, the filter will enter the Inactive
state.
If, however, the next result is something else and Relay results in stage one of two stage
activation is selected, this result will be passed on to other plugins. The filter will reset to
Active afterwards.
Active
When activated, the filter will eat all results that match the configured pattern. By default
this means every result that Simon recognizes will be accepted by the filter and therefore not
relayed to any of the plugins following the filter plugin.
If Two stage activation is enabled and the filter plugin receives the command to directly enter
the Inactive state, this command is ignored. In other words: If two stage activation is enabled,
the filter can only be disabled by going through the intermediate stage.
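The three states can be summarized as a small state machine. The Python sketch below follows the
description above but fills several gaps with assumptions: the command phrases ACTIVATE, STAGE_ONE
and DEACTIVATE are made up, the default pattern is assumed to be ".*", and in the Half-active
state a result that is not relayed is assumed to be swallowed. It is not Simon's implementation.

```python
import re
from enum import Enum

# Hypothetical phrases for the plugin's voice interface commands.
ACTIVATE = "Activate filter"
STAGE_ONE = "Prepare to deactivate filter"
DEACTIVATE = "Deactivate filter"


class State(Enum):
    INACTIVE = 1
    HALF_ACTIVE = 2
    ACTIVE = 3


class Filter:
    def __init__(self, pattern=".*", two_stage=False, relay_in_stage_one=False):
        self.pattern = re.compile(pattern)
        self.two_stage = two_stage
        self.relay_in_stage_one = relay_in_stage_one
        self.state = State.INACTIVE

    def process(self, result):
        """Return True if the result is consumed, False if it is passed on."""
        if self.state is State.INACTIVE:
            if result == ACTIVATE:
                self.state = State.ACTIVE
                return True
            return False                      # everything passes through
        if self.state is State.HALF_ACTIVE:
            if result == DEACTIVATE:
                self.state = State.INACTIVE
                return True
            self.state = State.ACTIVE         # anything else resets the filter
            return not self.relay_in_stage_one
        # State.ACTIVE
        if self.two_stage and result == STAGE_ONE:
            self.state = State.HALF_ACTIVE
            return True
        if not self.two_stage and result == DEACTIVATE:
            self.state = State.INACTIVE
            return True
        # With two stage activation a direct DEACTIVATE is not honored here
        # and is treated like any other result.
        return self.pattern.search(result) is not None


f = Filter(two_stage=True, relay_in_stage_one=True)
for spoken in ["Firefox", ACTIVATE, "Firefox", DEACTIVATE,
               STAGE_ONE, DEACTIVATE, "Firefox"]:
    print(spoken, f.process(spoken), f.state.name)
```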
4.8.13 Pronunciation Training
The pronunciation training, when combined with a good static base model, can be a powerful
tool to improve your pronunciation of a new language.
Essentially, the plugin will prompt you to say specific words. The recognition will then recognize
your pronunciation of the word and compare it to your speech model which should be a base
model of native speakers for this to work correctly. Then Simon will display the recognition rate
(how similar your version was to the stored base model).
The closer to the native speaker, the higher the score.
The plugin adds an entry to your Commands menu to launch the pronunciation training dialog.
The training itself consists of multiple pages. Each page contains one word fetched from your
active vocabulary. They are identified by a category which needs to be selected in the command
configuration before starting the training.
4.8.14 Keyboard
The keyboard plugin displays a virtual, voice-controlled keyboard.
The keyboard consists of multiple tabs, each possibly containing many keys. The entirety of tabs
and keys is collected in sets.
You can select sets in the configuration but also create new ones from scratch in the keyboard
command configuration.
Keys are usually mapped to single characters but can also hold long texts and even shortcuts.
Because of this, keyboard sets can contain special keys like a "select all" key or a "Password" key
(typing your password).
Next to the tabs that hold the keys of your set, the keyboard may also show special keys like Ctrl,
Shift, etc. Those keys are provided as voice interface commands and are displayed regardless of
what tab of the set is currently active.
As with all voice interface commands, removing the associated command hides the button as well.
Moreover, the keyboard provides a numpad that can be shown by selecting the appropriate
option in the keyboard configuration.
Next to the number keys and the delete key for the number input field (Number backspace), the
numpad provides two options on what to do with the entered number.
When selecting Write number, the entered number will be written out using simulated key
presses. Selecting Select number tries to find a key or tab in the currently active set that has
this number as a trigger. This way you can control a complete keyboard just using numbers.
The keys on the numpad are configurable voice interface commands.
4.8.15 Dialog
The dialog plugin enables users to engage in a scripted dialog with Simon.
4.8.15.1 Dialog design
Simon treats dialogs as a succession of different states. Each state can have a text and several
associated options.
Dialogs can have more than one text variant, one of which will be randomly picked when
the dialog is displayed. This can help to make dialogs feel more natural by providing several
alternative formulations.
The texts can use bound values and template options.
Dialog options encapsulate the logic of the conversation. They are the active components of the
dialog.
Similar to commands, dialog options have a name (trigger) that, when recognized while the
dialog is active and in the option's parent state, will cause this option to activate. Alternatively,
options can also be configured to trigger automatically after a set time period. This time is relative
to when the state is entered.
Dialog options, when shown through the graphical output module, can show an arbitrary text
(that will most likely be equivalent to the trigger but doesn't have to be) and, optionally, an icon.
If the text-to-speech output module is used, the text (not the trigger) will be read aloud unless
this is disabled by selecting the Silent option.
Every state can also optionally have an avatar that will be displayed when using the graphical
output module.
4.8.15.2 Dialog: Bound values
The text of dialog states can contain variables, so-called bound values, that will be filled in
at runtime.
For example, in the dialog text "This is a $variable$", $variable$ would be replaced with the result
of a bound value called "variable".
There are four types of bound values:
Static
Static bound values will always be resolved to the same text. They are useful to provide
configuration options to be filled in to personalize the dialog (e.g., the name of the user).
QtScript
QtScript bound values resolve to the result of the entered QtScript code.
Command arguments
If the dialog trigger command (the Simon command that initiates the dialog) uses placeholders,
they can be accessed through command argument bound values. The Argument number refers
to the index of the placeholder you want to access.
For example, if your dialog is started with the command "Call %1", and "name" is a command
argument bound value, then launching the dialog by recognizing "Call Peter" will turn the
dialog text "Are you sure you want to call $name$?" into "Are you sure you want to call
Peter?".
Plasma data engine
This type of bound value can readily access a wide array of high-level information through
Plasma data engines.
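The substitution of bound values can be sketched with a regular expression. The Python code below
covers only static and command argument bound values; the function names are hypothetical, only
single-word %<index> placeholders are handled, and leaving unknown variables untouched is an
assumption rather than documented Simon behavior.

```python
import re


def static_value(text):
    """Static bound value: always resolves to the same text."""
    return lambda: text


def command_argument(trigger, result, number):
    """Bound value resolving to the word matched by %<number>."""
    pieces = re.split(r"%(\d+)", trigger)   # literal, index, literal, ...
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)
        else:
            regex += r"(?P<arg%s>\S+)" % piece
    match = re.fullmatch(regex, result)
    if match is None:
        raise ValueError("recognition result does not match the trigger")
    value = match.group("arg%d" % number)
    return lambda: value


def resolve_bound_values(text, bound_values):
    """Replace every $name$ in text with the result of its bound value."""
    def replace(match):
        resolver = bound_values.get(match.group(1))
        # Unknown variables are left untouched (assumption).
        return match.group(0) if resolver is None else str(resolver())
    return re.sub(r"\$(\w+)\$", replace, text)


values = {"name": command_argument("Call %1", "Call Peter", 1)}
print(resolve_bound_values("Are you sure you want to call $name$?", values))
```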
4.8.15.3 Template options
Dialog texts can further be parametrized through template options.
These boolean values choose between different or optional text snippets.
For example, the template option "formal" above would change the dialog text "Would you
please {{formal}}be quiet{{elseformal}}shut up{{endformal}}" to "Would you please be quiet" or
"Would you please shut up" depending on whether the template option is set to true or false. The
else-path can be omitted if it is not required (e.g. "Would you {{formal}}please {{endformal}}be
quiet").
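The template syntax shown above can be evaluated with a single regular expression. This Python
sketch is a minimal model of that syntax; the function render_template is hypothetical, nested
template options are not handled, and treating an unset option as false is an assumption.

```python
import re

# {{name}}text{{elsename}}alternative{{endname}}; the else part is optional.
_TEMPLATE = re.compile(
    r"\{\{(\w+)\}\}(.*?)(?:\{\{else\1\}\}(.*?))?\{\{end\1\}\}",
    re.DOTALL,
)


def render_template(text, options):
    """Keep the first or the else snippet depending on each option."""
    def replace(match):
        name = match.group(1)
        if_true = match.group(2)
        if_false = match.group(3) or ""
        return if_true if options.get(name, False) else if_false
    return _TEMPLATE.sub(replace, text)


text = "Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}"
print(render_template(text, {"formal": True}))
print(render_template(text, {"formal": False}))
```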
4.8.15.4 Avatars
Every state can potentially show a different avatar.
These images can range from the picture of a (simulated) speaker to an image of something
topically appropriate.
To use an avatar, first add it here and later define where to use it in the dialog design section.
4.8.15.5 Output
Dialogs can be displayed graphically, use text-to-speech or combine both approaches.
The Separator to options will be spoken between the dialog text and the current state's options
(if there are any). If there are no options for this state or all are configured to be silent, it will
not be said. The option to listen to the whole announcement again is triggered by saying
one of the configured Repeat on trigger phrases. Additionally, the text-to-speech output can
optionally be configured to repeat the listing of the available options (including the configured
separator) when the user says a command that does not match any of the available dialog options.
4.8.16 Akonadi
The Akonadi plugin allows Simon to plug into KDE's PIM infrastructure.
The plugin fulfills two major purposes:
Execute Simon commands at scheduled times
The Akonadi plugin can monitor a specific collection (calendar) and react to entries whose
summary starts with a specific prefix. By default, this prefix is [simon-command], meaning
that events of the form [simon-command] <plugin name>//<command name> will trigger
the appropriate Simon command at the start time of the event.
The names of the plugins and commands are equivalent to the ones shown in the command
dialog and do not necessarily need to reference commands in the same scenario as the Akonadi
plugin instance.
Show reminders for events in the given calendar
If configured to do so, the Akonadi plugin can show reminders for calendar events with a set
alarm flag. These reminders will be shown through the Simon dialog engine.
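The event summary format can be taken apart with a few string operations. The Python sketch below
is illustrative only: the function parse_event_summary and the example plugin and command names
are hypothetical, and tolerating extra whitespace around the names is an assumption.

```python
def parse_event_summary(summary, prefix="[simon-command]"):
    """Return (plugin name, command name), or None for ordinary events."""
    if not summary.startswith(prefix):
        return None
    rest = summary[len(prefix):].strip()
    plugin, separator, command = rest.partition("//")
    plugin, command = plugin.strip(), command.strip()
    if not separator or not plugin or not command:
        return None   # malformed: "<plugin name>//<command name>" expected
    return plugin, command


print(parse_event_summary("[simon-command] Executable//Firefox"))
print(parse_event_summary("Dentist appointment"))
```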
4.8.17 D-Bus
With the D-Bus command, Simon can call exported methods in 3rd party applications directly.
The screenshot below, for example, calls the Pause method of the MPRIS interface of the
Tomahawk music playing software.
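For readers who want to try such a call outside of Simon, the same kind of method call can be
issued with the dbus-send command line tool. The Python sketch below only assembles the argument
list. The service name, object path and interface follow the MPRIS 2 naming convention and are
assumptions; the values used in the handbook's screenshot may differ.

```python
def build_dbus_call(service, path, interface, method):
    """Build the argument list for a dbus-send method call."""
    return [
        "dbus-send",
        "--session",
        "--type=method_call",
        "--print-reply",
        "--dest=" + service,
        path,
        interface + "." + method,
    ]


# Assumed MPRIS 2 names for Tomahawk; adjust them to your player.
argv = build_dbus_call(
    "org.mpris.MediaPlayer2.tomahawk",
    "/org/mpris/MediaPlayer2",
    "org.mpris.MediaPlayer2.Player",
    "Pause",
)
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run(argv) on a system where dbus-send and the
player are available.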
4.8.18 JSON
Similar to the D-Bus command plugin, the JSON plugin also allows Simon to contact 3rd party
applications to directly invoke functionality (instead of simulating user activity).
Chapter 5
Questions and Answers
In an effort to keep this section always up-to-date it is available at our online wiki.
Chapter 6
Credits and License
Simon
Program copyright 2006-2009 Peter Grasch peter.grasch@bedahr.org, Phillip Goriup, Tschernegg
Susanne, Bettina Sturmann, Martin Gigerl
Documentation Copyright (c) 2009 Peter Grasch peter.grasch@bedahr.org
This documentation is licensed under the terms of the GNU Free Documentation License.
This program is licensed under the terms of the GNU General Public License.
Appendix A
Installation
Please see our wiki for install instructions.