The Simon Handbook

by Peter H. Grasch
Copyright © 2008-2010 Peter Grasch
simon is an open source speech recognition solution.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
Table of Contents
1. Introduction
2. Overview
Architecture
Speech Recognition: Background
Language Model
Acoustic Model
Scenarios
Base models
Where to get base models
Types of base models
Static base model
Adapted base model
User generated model
Phoneme set issues
3. Guidelines
Recordings
Volume
simon Calibration
Audacity Calibration
Silence
Microphone
Sample Quality Assurance
4. Using simon
The simon Main Window
Required Resources for a Working simon Setup
Language Model
Acoustic Model
First run wizard
Scenarios
Base models
Server
Sound configuration
Volume calibration
Scenarios
Using scenarios
Managing scenarios
Adding a new Scenario
Edit Scenario
Delete Scenario
Import Scenario
Export Scenario
Base models
Vocabulary
General
Active Dictionary
Shadow Dictionary
Maintaining the Vocabulary
Adding Words
Defining the Word
Manually Selecting a Terminal
Manually Providing the Phonetic Transcription
Training the Word
Editing a word
Removing a word
Special Training
Importing a Dictionary
HADIFIX Dictionary
HTK Dictionary
PLS Dictionary
SPHINX Dictionary
Julius Dictionary
Grammar
General
Defining your Grammar
Import a Grammar
Renaming Terminals
Merging Terminals
Training
Storage Directories
Adding Texts
Add trainings-texts
Local text files
On The Fly Training
Importing Trainings Samples
Commands
Executable Commands
Importing Programs
Place Commands
Importing Places
Shortcut Commands
Text-Macro Commands
List Commands
List Command Display
Configuring list elements
Composite Commands
Desktopgrid
Input Number
Dictation
Artificial Intelligence
Calculator
Filter
Pronunciation Training
Keyboard
Configuration
General Configuration
Sound Configuration
Device Configuration
Voice Activity Detection
Training settings
Postprocessing
Speech Model
Model Settings
General
Extensions
Recognition
Server
General
Network
Synchronization and Model Backup
Actions
Recognition
Plugin base font
Lists
Social desktop
Adjusting the recognition parameters manually
5. Questions and Answers
6. Credits and License
A. Installation
List of Tables
2-1. Sample Vocabulary
4-3. Improved Sample Vocabulary
4-4. Improved Sample Vocabulary
4-5. Julius Configuration Files
Chapter 1. Introduction
simon is the main front end for the simon open source speech recognition solution. It is a simond client and provides a graphical user interface for managing the speech model and the commands. Moreover, simon can execute all sorts of commands based on the input it receives from the server, simond.
In contrast to existing commercial offerings, simon provides a unique do-it-yourself approach to speech recognition. Rather than shipping predefined, pre-trained speech models, simon comes without any model whatsoever. Instead, it provides an easy-to-use end-user interface to create language and acoustic models from scratch.
Additionally, end users can easily download use cases created by other users and share their own.
The current release can be used to set up command-and-control solutions especially suitable for disabled
people. However, because of the amount of training necessary, continuous, free dictation is neither
supported nor reasonable with current versions of simon.
Because of its architecture, the same version of simon can be used with all languages and dialects. One can even mix languages within one model if necessary.
Chapter 2. Overview
Architecture
The main recognition architecture of simon consists of three applications.
• simon
This is the main graphical interface.
It acts as a client to the simond server.
• simond
The recognition server.
• ksimond
A graphical front-end for simond.
These three components form a real client/server solution for the recognition. That means that there is one server (simond) for one or more clients (simon, this application). ksimond is just a front-end for simond: it adds no functionality to the system but rather provides a way to interact with simond graphically.
In addition to simon, simond and ksimond, other, more specialized applications are also part of this integrated simon distribution.
• sam
Provides more in-depth control over your speech model and allows you to test the acoustic model.
• ssc / sscd
These two applications can be used to collect large amounts of speech samples from different people more easily.
Please refer to the individual handbooks of those applications for more details.
simon is used to create and maintain a representation of your pronunciation and language. This
representation is then sent to the server simond which compiles it into a usable speech model.
simon then records sound from the microphone and transmits it to the server which runs the recognition
on the received input stream. simond sends the recognition result back to the client (simon).
simon then uses this recognition result to execute commands like opening programs, following links,
etc.
simond identifies its connections with a user / password combination which is completely independent
from the underlying operating system and its users. By default a standard user is set up in both simon
and simond so the typical use case of one simond server per simon client will work "out of the box".
Every simon client logs onto the server with a user / password combination which identifies a unique
user and thus a unique speech model. Every user maintains their own speech model but may use it from
different computers (different, physical simon instances) simply by accessing the same simond server.
One simond instance can of course also serve multiple users.
If you want to open up the server to the internet or use multiple users on one server, you will have to
configure simond. Please see the simond manual (help:/simond) for details.
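The user-to-model mapping described above can be pictured with a toy Python model. `ToySimond`, `SpeechModel` and the example credentials are hypothetical illustrations, not simond's network protocol or API; the sketch only shows that one server keeps one speech model per account and that several clients logging in as the same account work on the same model.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechModel:
    """Stands in for one user's vocabulary, grammar, training data, and so on."""
    vocabulary: set[str] = field(default_factory=set)


class ToySimond:
    """Toy stand-in for simond: one server, many accounts, one model each.

    Accounts live only on the server and are unrelated to operating-system
    users. Passwords are kept in plain text because this is only a toy.
    """

    def __init__(self) -> None:
        self._passwords: dict[str, str] = {}
        self._models: dict[str, SpeechModel] = {}

    def add_user(self, user: str, password: str) -> None:
        self._passwords[user] = password
        self._models[user] = SpeechModel()

    def login(self, user: str, password: str) -> SpeechModel:
        # The user/password pair selects exactly one speech model.
        if self._passwords.get(user) != password:
            raise PermissionError(f"login failed for {user!r}")
        return self._models[user]


server = ToySimond()
server.add_user("default", "secret")  # hypothetical credentials
server.add_user("alice", "pw")

# Two simon instances (e.g. on two computers) using the same account
# share one model; a different account gets a separate model.
laptop = server.login("default", "secret")
desktop = server.login("default", "secret")
laptop.vocabulary.add("Computer")
print("Computer" in desktop.vocabulary)                      # True
print("Computer" in server.login("alice", "pw").vocabulary)  # False
```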
Speech Recognition: Background

Speech recognition systems take voice input (often from a microphone) and try to translate it into
written text. To do that, they rely on statistical representations of human voice. To put it into simple
terms: the computer learns how words (or, more precisely, the sounds that make up those words) sound.
A speech model consists of two distinct parts:
• Language Model
• Acoustic Model
Language Model
The language model defines the vocabulary and the grammar you want to use.
For more information see the vocabulary section and the grammar section.
Acoustic Model
The acoustic model represents your pronunciation in a machine readable format.
Let's look at the following sample vocabulary:
Table 2-1. Sample Vocabulary

Word      Terminal  Pronunciation
Computer  Noun      k ax m p y uw t er
Internet  Noun      ih n t er n eh t
Mail      Noun      m ey l
close     Verb      k l ow s
The pronunciation of each word is composed of individual sounds which are separated by spaces. For
example, the word "close" consists of the following sounds:
• k
• l
• ow
• s
The acoustic model uses the fact that spoken words are composed of sounds much like written words
are composed of letters. Using this knowledge, we can segment words into sounds (represented by the
pronunciation) and assemble them back when recognizing. These building blocks are called
"phonemes".
Because the acoustic model actually represents how you speak the phonemes of the words, training material is shared among all words that use the same phonemes.
That means if you add the word "clothes" to the language model, your acoustic model already has an idea of how the "clo" part is going to sound, as both words share the same phonemes ("k", "l", "ow") at the beginning.
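The sharing effect can be made concrete with a few lines of Python. The pronunciations are copied from Table 2-1; the transcription "k l ow dh z" for "clothes" is an assumption chosen for illustration (the handbook does not give one), and the helper names are hypothetical.

```python
# Pronunciations from Table 2-1.
lexicon = {
    "Computer": "k ax m p y uw t er",
    "Internet": "ih n t er n eh t",
    "Mail": "m ey l",
    "close": "k l ow s",
}


def phonemes(pronunciation: str) -> list[str]:
    """Split a space-separated pronunciation into its phonemes."""
    return pronunciation.split()


def shared_prefix(a: list[str], b: list[str]) -> list[str]:
    """Return the phonemes two words start with in common."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


clothes = phonemes("k l ow dh z")  # assumed transcription of "clothes"
print(shared_prefix(phonemes(lexicon["close"]), clothes))  # ['k', 'l', 'ow']

# Phonemes of "clothes" that no existing word covers yet.
known = {p for pron in lexicon.values() for p in phonemes(pron)}
untrained = [p for p in clothes if p not in known]
print(untrained)  # ['dh', 'z']
```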
To train the acoustic model (in other words, to tell it how you pronounce the phonemes), you have to "train" words from your language model. This means that simon displays a word which you read out loud. Because the word is listed in your vocabulary, simon already knows which phonemes it contains and can thus "learn" from your pronunciation of the word.
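A rough way to picture training is as a tally: every recording of a word adds evidence for each phoneme in its transcription. The sketch below assumes this simplified counting view; a real acoustic model trainer aligns the audio to the phonemes and estimates statistical models instead of merely counting, and `train` is a hypothetical helper.

```python
from collections import Counter

# Pronunciations from Table 2-1.
lexicon = {
    "Computer": "k ax m p y uw t er",
    "Internet": "ih n t er n eh t",
    "Mail": "m ey l",
    "close": "k l ow s",
}


def train(coverage: Counter, word: str) -> None:
    """Credit one recording of `word` to every phoneme it contains.

    Simplification: only counts how often each phoneme was heard.
    """
    coverage.update(lexicon[word].split())


coverage = Counter()
train(coverage, "close")
train(coverage, "Computer")
print(coverage["k"], coverage["ow"])  # 2 1

# "Mail" was never recorded, yet most of its phonemes already have data.
unheard_in_mail = [p for p in lexicon["Mail"].split() if coverage[p] == 0]
print(unheard_in_mail)  # ['ey']
```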
Scenarios
One scenario makes up one complete use case of simon. To control Firefox, for example, the user just installs the Firefox scenario.
Each scenario consists of the following components:
• Vocabulary
• Grammar
• Training texts
• Commands
Scenarios only cover the language model of the recognition system; the acoustic model is completely independent.
However, in most cases scenarios are tailored to work best with a specific base model to avoid issues
with the phoneme set.
Because scenarios are not bound to a specific acoustic model, they can be shared and exchanged between different simon users without problems. To accommodate this community-based repository pool, a category for simon scenarios has been created on kde-files.org (http://kde-files.org/index.php?xcontentmode=692), where the scenarios, which are just simple text files (XML format), can be exchanged easily.
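To illustrate why a scenario travels so easily, the sketch below models its four components as a Python dataclass and writes them to XML with the standard `xml.etree.ElementTree` module. The element and attribute names, the grammar string "Verb Noun" and the `Scenario` class are all hypothetical; they do not reproduce simon's actual scenario schema. The point is only that the file carries language-model data and commands, not acoustic data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical in-memory view of one scenario's four components."""
    name: str
    # word -> (terminal, pronunciation), as in Table 2-1
    vocabulary: dict[str, tuple[str, str]] = field(default_factory=dict)
    # illustrative sentence structures built from terminals
    grammar: list[str] = field(default_factory=list)
    training_texts: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize to XML text; element names are made up for this sketch."""
        root = ET.Element("scenario", name=self.name)
        vocab = ET.SubElement(root, "vocabulary")
        for word, (terminal, pronunciation) in self.vocabulary.items():
            ET.SubElement(vocab, "word", name=word, terminal=terminal,
                          pronunciation=pronunciation)
        grammar = ET.SubElement(root, "grammar")
        for structure in self.grammar:
            ET.SubElement(grammar, "structure").text = structure
        texts = ET.SubElement(root, "trainingtexts")
        for text in self.training_texts:
            ET.SubElement(texts, "text").text = text
        commands = ET.SubElement(root, "commands")
        for trigger in self.commands:
            ET.SubElement(commands, "command", trigger=trigger)
        return ET.tostring(root, encoding="unicode")


demo = Scenario(
    name="demo",
    vocabulary={"close": ("Verb", "k l ow s"), "Mail": ("Noun", "m ey l")},
    grammar=["Verb Noun"],
    training_texts=["close Mail"],
    commands=["Mail"],
)
xml_text = demo.to_xml()
parsed = ET.fromstring(xml_text)
print(parsed.find("vocabulary/word").get("pronunciation"))  # k l ow s
```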
For information on how to use scenarios in simon, please refer to the Scenarios section in the Using simon chapter.
Base models
Base models are pre-generated, usually speaker-independent acoustic models that can be used with simon.
Using base models, the user can greatly reduce or even eliminate the need for personalized training.
When using a static base model (see below), installation of the HTK is not necessary.
Base models usable by simon consist of four files, which you will find in the archive when you download base models from their respective websites.
• hmmdefs
• tiedlist
• macros
• stats
The latter two files, macros and stats, are not required when using a static base model and might, in that
case, be replaced with empty files if they are not available.
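A quick sanity check of a downloaded base model directory might look like the Python sketch below. It assumes only what the text states: all four files are needed for adapted and user generated setups, while a static model can get by with empty `macros` and `stats` files. The function names are hypothetical, and the sketch checks presence only, not whether the files are valid HTK models.

```python
import tempfile
from pathlib import Path

BASE_MODEL_FILES = ("hmmdefs", "tiedlist", "macros", "stats")
# Not required for a static base model (may be empty placeholders).
NOT_NEEDED_WHEN_STATIC = ("macros", "stats")


def missing_files(model_dir: Path, static: bool) -> list[str]:
    """List the base model files that are absent from model_dir."""
    present = {p.name for p in model_dir.iterdir() if p.is_file()}
    needed = [name for name in BASE_MODEL_FILES
              if not (static and name in NOT_NEEDED_WHEN_STATIC)]
    return [name for name in needed if name not in present]


def add_empty_placeholders(model_dir: Path) -> None:
    """Create empty macros/stats files; only acceptable for a static model."""
    for name in NOT_NEEDED_WHEN_STATIC:
        (model_dir / name).touch(exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp)
    for name in ("hmmdefs", "tiedlist"):  # stand-ins for real downloaded files
        (model / name).write_text("placeholder\n")
    static_missing = missing_files(model, static=True)    # []
    adapted_missing = missing_files(model, static=False)  # ['macros', 'stats']
    add_empty_placeholders(model)
    placeholder_sizes = [(model / n).stat().st_size for n in NOT_NEEDED_WHEN_STATIC]
```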
Where to get base models

For an up-to-date list, please refer to our online wiki (http://www.simon-listens.org/wiki/index.php/English:_Base_models#Where_to_get_base_models).
Types of base models

There are three types of base models:
• Static base model
• Adapted base model
• User generated model
For information on how to use base models in simon, please refer to the Base models section in the Using simon chapter.
Static base model

Static base models simply use a pre-compiled acoustic model without modifying it.
Any training data collected through simon will not be used to improve the recognition accuracy.
This type of model does not require the HTK to be installed.
Adapted base model

Adapting a pre-compiled acoustic model to your voice can improve recognition accuracy. Collected training data will be compiled into an adaptation matrix, which will then be applied to the selected base model.
This type of model does require the HTK to be installed.
User generated model

When using user generated models, the user is responsible for training their own model. No base model will be used.
The training data will be used to compile your own acoustic model allowing you to create a system
which directly reflects your voice.
This type of model does require the HTK to be installed.
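The three model types differ in just three properties, all taken from the descriptions above. The Python enum below is only a compact summary of those properties; `BaseModelType` and its property names are hypothetical and do not correspond to any simon setting.

```python
from enum import Enum


class BaseModelType(Enum):
    # (uses a pre-compiled base model, uses your training data, needs HTK)
    STATIC = (True, False, False)
    ADAPTED = (True, True, True)
    USER_GENERATED = (False, True, True)

    @property
    def uses_base_model(self) -> bool:
        return self.value[0]

    @property
    def uses_training_data(self) -> bool:
        return self.value[1]

    @property
    def needs_htk(self) -> bool:
        return self.value[2]


no_htk = [t.name for t in BaseModelType if not t.needs_htk]
personalized = [t.name for t in BaseModelType if t.uses_training_data]
print(no_htk)        # ['STATIC']
print(personalized)  # ['ADAPTED', 'USER_GENERATED']
```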
Phoneme set issues

Because the statistical comparison happens at the phoneme level, the base models describe how the individual phonemes sound.
Your scenarios (language model) on the other hand describe what phonemes a word is composed of.
In order for this association to work, both your scenarios and your base model need to use the same set
of phonemes.
If you design a new scenario, it is therefore a good idea to use the dictionary that was used to create the base model as the shadow dictionary. This way, simon will automatically suggest the "correct" phonemes when you add words.
If you try to use scenarios designed for a different phoneme set (a different base model), you will get an error when starting the recognition that lists the affected phonemes and words. To fix this, either transcribe the words according to the base model's phoneme set, use a different base model, or use a user generated model.
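The mismatch described above boils down to a set comparison. The Python sketch below mirrors the idea behind simon's error message (which phonemes are missing and which words use them); it is not simon's implementation, and the base model phoneme inventory is a hypothetical set in which "ax" and "er" happen not to exist.

```python
def phoneme_mismatches(dictionary: dict[str, str],
                       base_phonemes: set[str]) -> dict[str, list[str]]:
    """Map each phoneme unknown to the base model to the words using it."""
    affected: dict[str, set[str]] = {}
    for word, pronunciation in dictionary.items():
        for phoneme in pronunciation.split():
            if phoneme not in base_phonemes:
                affected.setdefault(phoneme, set()).add(word)
    return {ph: sorted(words) for ph, words in affected.items()}


# Scenario vocabulary from Table 2-1.
scenario = {
    "Computer": "k ax m p y uw t er",
    "Internet": "ih n t er n eh t",
    "Mail": "m ey l",
    "close": "k l ow s",
}
# Hypothetical phoneme set of a differently transcribed base model.
base = {"k", "ah", "m", "p", "y", "uw", "t", "r",
        "ih", "n", "eh", "ey", "l", "ow", "s"}

result = phoneme_mismatches(scenario, base)
print(result)  # {'ax': ['Computer'], 'er': ['Computer', 'Internet']}
```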
Chapter 3. Guidelines
This chapter lists some general guidelines that are relevant for different parts of simon.
Recordings
If you are using user generated or adapted models, simon builds its acoustic model based on transcribed samples of the user's voice. Because of this, the recorded samples are of vital importance for the recognition performance.
Volume
It is important that you check your microphone volume before recording any samples.
simon Calibration
The current version of simon includes a simple way of ensuring that your volume is configured
correctly.
By default the volume calibration is displayed before starting any recording in simon.
To calibrate simply read the text displayed.

The calibration will monitor the current volume and tell you to either raise or lower it, but you have to change the volume manually in your system's audio mixer. Once you have changed the volume in any way (while the calibration is running), press the "Volume changed" button next to the affected device. This will reset the volume status.
During calibration, try to talk normally. Don't yell, but don't be overly quiet either. Keep in mind that you should generally use the same volume setting for all your training and for the recognition too. You might (unconsciously) speak a little louder when you are upset or at another time of the day, so raise your voice slightly during calibration to anticipate this. Slightly quieter samples are much better than samples that clip.
In the simon settings, both the text displayed and the levels considered correct can be changed. If you
leave the text empty, the default text will be displayed. In the options you can also deactivate the
calibration completely. See the training section for more details.
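The raise/lower advice can be sketched as a comparison of the peak level against two thresholds. In this Python sketch, the thresholds (20 % and 80 % of 16-bit full scale) and the function name are hypothetical values chosen for illustration; simon's actual calibration logic and its configurable defaults are not reproduced here.

```python
def calibration_advice(samples: list[int],
                       too_quiet: float = 0.2,
                       too_loud: float = 0.8) -> str:
    """Return 'raise', 'lower' or 'ok' for a block of 16-bit samples.

    The peak is expressed as a fraction of full scale (32768).
    Threshold defaults are illustrative assumptions.
    """
    peak = max((abs(s) for s in samples), default=0) / 32768
    if peak < too_quiet:
        return "raise"
    if peak > too_loud:
        return "lower"
    return "ok"


print(calibration_advice([100, -200]))      # raise
print(calibration_advice([30000, -5]))      # lower
print(calibration_advice([16000, -12000]))  # ok
```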
Audacity Calibration
Alternatively you can use an audio editing tool like the free Audacity (http://audacity.sourceforge.net) to
monitor the recording volume.
Screenshot: too quiet
Screenshot: too loud
Screenshot: perfect volume
Silence
To help simon with the automatic segmentation it is recommended to leave about one or two seconds of
silence on the recording before and after reading the prompted text.
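Whether a recording leaves enough silence around the prompt can be estimated by locating the first and last loud sample. This Python sketch assumes 16-bit samples at 16 kHz, treats any sample quieter than an illustrative amplitude threshold of 500 as silence, and uses the one-second lower bound from the recommendation above; the function names are hypothetical.

```python
def silence_margins(samples: list[int], rate: int = 16000,
                    threshold: int = 500) -> tuple[float, float]:
    """Return (leading, trailing) silence in seconds.

    A sample counts as silent when |amplitude| < threshold (assumed value).
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:  # nothing but silence
        total = len(samples) / rate
        return total, total
    leading = loud[0] / rate
    trailing = (len(samples) - 1 - loud[-1]) / rate
    return leading, trailing


def has_enough_silence(samples: list[int], rate: int = 16000,
                       minimum: float = 1.0) -> bool:
    leading, trailing = silence_margins(samples, rate)
    return leading >= minimum and trailing >= minimum


# 1 s silence, 0.5 s "speech", 1 s silence at 16 kHz.
good = [0] * 16000 + [2000] * 8000 + [0] * 16000
# Speaking after only 0.5 s of silence.
rushed = [0] * 8000 + [2000] * 8000 + [0] * 16000
print(silence_margins(good))       # (1.0, 1.0)
print(has_enough_silence(rushed))  # False
```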
Current simon versions include a graphical notice telling the user when to speak during recording. The message will first tell the user to wait for one second before telling the user to speak.
This method of visual feedback proved especially valuable when recording with people who can't read
the prompted text for themselves and therefore need someone to tell them what they have to say. The
colorful visual cue tells them when to start repeating what the facilitator said without the need of
unreliable hand gestures.
Microphone
For simon to work well, a high quality microphone is recommended.
However, even relatively cheap headsets (around 30 Euros) achieve very good results, far better than internal microphones.
For maximum compatibility we recommend USB headsets, as they usually support the necessary sample rate of 16 kHz, are very well supported by both Microsoft Windows and GNU/Linux, and normally don't require special, proprietary drivers to operate.
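To confirm that a test recording was actually made at 16 kHz, the sample rate stored in a WAV file's header can be read with Python's standard `wave` module. In the sketch below, `make_wav` builds a short silent file in memory as a stand-in for a real headset recording, and both helper names are hypothetical.

```python
import io
import wave


def make_wav(rate: int) -> io.BytesIO:
    """Build a 0.1 s silent mono 16-bit WAV in memory (stand-in recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate // 10))
    buf.seek(0)  # rewind so the file can be read back
    return buf


def is_16khz(source) -> bool:
    """Check the sample rate recorded in a WAV file (path or file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate() == 16000


headset_ok = is_16khz(make_wav(16000))
other_ok = is_16khz(make_wav(44100))
print(headset_ok, other_ok)  # True False
```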
Sample Quality Assurance

simon will check each recording against certain criteria to ensure that the recorded samples are not erroneous or of poor quality.
If simon detects a problematic sample, it will warn the user to re-record the sample.
Currently, simon checks the following criteria:
• Sample peak volume
If the volume is too loud and the microphone started to "clip" (see Clipping on Wikipedia (http://en.wikipedia.org/wiki/Clipping_%28audio%29)), simon will display a warning message urging the user to lower the microphone volume.
• Signal to noise ratio (SNR)
simon will automatically determine the signal to noise ratio of each recording. If the ratio is below a
configurable threshold, a warning message will be displayed.
The default value of 5000 % means that, for simon to accept a sample as correctly recorded, the peak volume has to be 50 times louder than the noise baseline (the lowest average over 50 ms).
A low signal to noise ratio is often the result of a very low quality microphone, high levels of ambient noise, or a low microphone gain coupled with a "microphone boost" option in the system mixer.
SNR warning message triggered by an empty sample. This information dialog is displayed when clicking on the "More information" button visible in the background.
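Both criteria can be sketched in a few lines of Python. The SNR here follows the definition above: the peak volume divided by the noise baseline (the lowest average over any 50 ms window), expressed as a percentage. Measuring volume as the absolute value of 16-bit samples, treating full-scale samples as clipped, and the function names are assumptions of this sketch rather than simon's exact implementation.

```python
FULL_SCALE_MAX = 32767
FULL_SCALE_MIN = -32768


def snr_percent(samples: list[int], rate: int = 16000,
                window_ms: int = 50) -> float:
    """Peak volume relative to the quietest 50 ms average, in percent."""
    window = max(1, rate * window_ms // 1000)  # 800 samples at 16 kHz
    if len(samples) < window:
        raise ValueError("recording shorter than one averaging window")
    mags = [abs(s) for s in samples]
    # Sliding-window sum of absolute amplitudes; keep the smallest.
    running = sum(mags[:window])
    lowest = running
    for i in range(window, len(mags)):
        running += mags[i] - mags[i - window]
        lowest = min(lowest, running)
    noise_floor = lowest / window
    peak = max(mags)
    if noise_floor == 0:
        # Digitally silent baseline: infinite SNR if anything was said,
        # 0 for an empty sample so that the check fails.
        return float("inf") if peak else 0.0
    return peak / noise_floor * 100


def is_clipping(samples: list[int]) -> bool:
    return any(s >= FULL_SCALE_MAX or s <= FULL_SCALE_MIN for s in samples)


def sample_warnings(samples: list[int],
                    min_snr_percent: float = 5000.0) -> list[str]:
    warnings = []
    if is_clipping(samples):
        warnings.append("clipping: lower the microphone volume")
    if snr_percent(samples) < min_snr_percent:
        warnings.append("signal to noise ratio below threshold")
    return warnings


clean = [10] * 1600 + [5000] * 800    # SNR 50000 %
noisy = [100] * 1600 + [3000] * 800   # SNR 3000 %
empty = [0] * 1600                    # SNR 0 %
print(sample_warnings(clean))  # []
print(sample_warnings(noisy))  # ['signal to noise ratio below threshold']
```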
Chapter 4. Using simon
The following sections will describe how to use simon.
The simon Main Window
The simon main window provides quick access to most of its features through the main toolbar.
There are 9 main actions listed:
• simond connection
This menu item has several states:
• Connect
When simon is not connected to simond the option says "Connect". When activated, simon will
start to connect to simond and change to the "Connecting" state.
Upon connecting to the server from simon for the first time, you might be prompted for a username
and a password. If you haven't done so already, set up a user for simond (see the simond manual
(help:/simond) for details) before continuing and enter the same username and corresponding
password in the login dialog from simon. If you choose to store the password, you can still change
it in the server configuration at any time.
• Connecting
When simon is currently connecting to the configured simond server(s) the option says
"Connecting" and is pressed down. When activated, simon will stop trying to connect to simond
and go back to the "Connect" state.
• Activate
When simon has established a connection to the server, the option will say "Activate" and will not be pressed down. When activated (or automatically, if simon is configured to start the recognition as soon as it is available), simon will try to start the recognition.
An option to close the connection to simond ("Disconnect") is available through the small down arrow next to it.
• Activated
When simon has established a connection to the server and has successfully started the recognition, the
option will say "Activated" and will be pressed down. When activated, simon will deactivate the
recognition but not close the connection to simond; it changes back to the previous "Activate"
state.
An option to close the connection to simond ("Disconnect") is available through the small down-
arrow next to it.
• Add Word
Displays the add word wizard.
• Vocabulary
Displays the vocabulary.
• Grammar
Displays the grammar.
• Training
Displays the training.
• Commands
Displays the commands.
• Synchronize
This option is available when simon is connected to simond.
simon creates the speech input files, which are then compiled and used by the simond server (see the
Architecture section for more details).
The process of sending the speech input files, compiling them and receiving the compiled versions is
called "synchronization". By default, simon will initiate a synchronization immediately after the
connection has been established and whenever the model changes (please see the configure
synchronization section for information on how to change that).
Using this menu option the synchronization can be triggered manually at any time.
• Scenario selection
This selection box allows you to select the currently displayed scenario. Each subsection (vocabulary,
grammar, commands, training) will then adapt to the currently displayed scenario. Selecting a
different scenario here does not affect the recognition.
• Manage scenarios
Shows the scenario management dialog. There you can manage your scenarios and change the options
of the scenario selection box.
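The four states of the simond connection button described at the beginning of this list behave like a small state machine. The sketch below models the transitions as described; the class and function names are hypothetical, and it assumes that starting the recognition always succeeds.

```python
from enum import Enum


class ConnState(Enum):
    CONNECT = "Connect"        # not connected to simond
    CONNECTING = "Connecting"  # connection attempt running (button pressed down)
    ACTIVATE = "Activate"      # connected, recognition not running
    ACTIVATED = "Activated"    # connected, recognition running (button pressed down)


# Effect of clicking the button in each state.
_ON_CLICK = {
    ConnState.CONNECT: ConnState.CONNECTING,   # start connecting
    ConnState.CONNECTING: ConnState.CONNECT,   # stop trying to connect
    ConnState.ACTIVATE: ConnState.ACTIVATED,   # start the recognition
    ConnState.ACTIVATED: ConnState.ACTIVATE,   # stop recognition, keep connection
}


def click(state):
    return _ON_CLICK[state]


def connection_established(state, auto_activate=False):
    """Called when a connection attempt succeeds."""
    if state is not ConnState.CONNECTING:
        raise ValueError("no connection attempt in progress")
    return ConnState.ACTIVATED if auto_activate else ConnState.ACTIVATE


def disconnect(state):
    """The "Disconnect" entry behind the small down-arrow."""
    if state not in (ConnState.ACTIVATE, ConnState.ACTIVATED):
        raise ValueError("only available while connected")
    return ConnState.CONNECT
```

The `auto_activate` flag stands in for the option that starts the recognition automatically once it is available.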
The simon main window can be hidden at any time by clicking on the simon logo in the system tray
(usually next to the system clock in the task bar), which will minimize simon to the tray. Click it again to
restore the main window.
Required Resources for a Working simon Setup

Note: For more information about speech models, please refer to the Speech Recognition:
Background section in the Overview chapter.
To get simon to recognize speech and react to it you need a speech model.
Speech models describe how your voice sounds, what words exist, how they sound and what word
combinations ("sentences" or "structures") exist.
A speech model basically consists of two parts:
• Language model: Describes all existing words and what sentences are grammatically correct
• Acoustic model: Describes how words sound
You need both these components to get simon to recognize your voice.
Language Model
In most cases you only need to install the appropriate scenario for your use case to set up your language
model.
To create your own language model, you can use simon to add / edit / remove words and grammar
structures.
To make adding words easier, you can import a shadow dictionary.
Acoustic Model
To create your own acoustic model, you can simply read the training texts that come with your selected
scenarios a couple of times.
If you are creating your own scenario, you can easily create training texts yourself.
You can, however, use static or adapted base models to avoid using the HTK or to improve the
recognition rate.
First run wizard

On the first start of simon, this wizard is displayed to guide you through the initial configuration of
simon.
The configuration consists of five easy steps which are outlined below. You can skip each step and even
the whole wizard if you want to - in that case, the system will be set up with default values.
Scenarios
In this step you can download scenarios from the internet and import them into simon.
Pressing Get scenarios displays the download dialog.
If you import some scenarios here (or later on in the scenario management dialog), you don't need to set
up the vocabulary, grammar, commands, etc. yourself. New users in particular are encouraged to
try some scenarios first to see how the system works before configuring it exactly for their
use case.
Base models
In this step you can set up simon to use base models.
The configuration page opened is the same one that is described in the base model usage section.
After completing or aborting the first run wizard you can change configuration options defined here in
the simon configuration.
Server
Internally, simon is a server / client application. If you want to take advantage of a network based
installation, you can provide the server address here.
The default configuration is sufficient for a "normal" installation and assumes that you use a local
simond server that is automatically started and stopped with simon.
After completing or aborting the first run wizard you can change these options in
the server configuration.
Sound configuration
Because simon recognizes sound from one or more microphones, you have to tell simon which devices
you want to use for recognition and training.
simon can use one or more input and output devices for different tasks. You can find more information
about simon's multiple device capabilities in the simon sound configuration section.
If you don't set at least one input device to be used for recognition, you will not be able to activate
simon.
When the option Default to power training is selected, simon will automatically start and stop the
recording when displaying and hiding (respectively) the recording prompt during training. This option only
sets the default value; the user can change it at any time before beginning a training
session.
After completing or aborting the first run wizard you can change these options in the sound configuration.
Volume calibration
For simon to work correctly, you need to set the volume of your microphones to a sensible level.
For more details on this, please see the Volume Calibration section in the Guidelines chapter.
Scenarios
This section describes how to use scenarios from within simon. For general information about scenarios,
please refer to the Scenarios section in the Overview chapter.
Using scenarios
Beginning with simon 0.3, each word you add will be added to the currently active scenario. The same
goes for grammar sentences, commands, etc.
Apart from that, working with a scenario is just like using simon 0.2.
By default, simon ships with an empty scenario named "Standard", so your configuration will be stored
in this scenario.
To select which of your currently active scenarios should be changed (for example before adding new
words), just select it from the drop down list in the upper right corner of the simon main window.
To change the available options, click on the Manage scenarios button right next to it or use the menu
entry Scenarios > Manage scenarios.
Managing scenarios
The scenario management dialog allows you to load scenarios from your scenario pool as well as to
import and export scenarios to files or directly from / to an online repository.
To load or unload a scenario, you can use the arrow buttons between the two lists or simply double-click
the scenario you want to load / unload.
More information about individual scenarios can be found in the tooltips of the list items.
Adding a new Scenario
To add a new scenario, select the Add button. A new dialog will be displayed.
When creating a new scenario, please give it a descriptive name. For the later upload on KDE files
(http://kde-files.org/index.php?xcontentmode=692) we would kindly ask you to follow a certain naming
scheme, although this is of course not a requirement: "[<language>/<base model>] <name>". If, for
example, you create a scenario in English that works with the Voxforge base model and controls Mozilla
Firefox, this becomes: "[EN/VF] Firefox". If your scenario is not specifically tailored to one phoneme set
(base model), just omit the second tag like this: "[EN] Firefox".
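The naming scheme is easy to apply mechanically. The helper below is hypothetical and only illustrates the format; simon does not require it.

```python
def scenario_name(language, name, base_model=None):
    """Format a scenario name as "[<language>/<base model>] <name>".

    Leave `base_model` out for scenarios that are not tailored to one
    phoneme set; the second tag is then omitted.
    """
    tag = language if base_model is None else f"{language}/{base_model}"
    return f"[{tag}] {name}"


print(scenario_name("EN", "Firefox", "VF"))  # [EN/VF] Firefox
print(scenario_name("EN", "Firefox"))        # [EN] Firefox
```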
The scenario version is just an incremental version number that makes it easier to distinguish between
different revisions of a scenario.
If your scenario needs a specific feature of simon (for example because you use a new plugin), you can
define minimum and maximum version numbers of simon here.
The license of your scenario can be set through the drop down. You can of course also add an arbitrary
license text directly in the input field.
You can then add your name (or alias) to the list of scenario authors. There you will also be asked for
contact information. This field is purely provided as a convenient way to contact a scenario author for
changes, problems, fan mail, etc. If you don't feel comfortable providing your e-Mail address, you can
simply enter a dash "-" denoting that you are not willing to divulge this information.
Edit Scenario
To edit scenarios, just select "Edit" from the "Manage scenarios" dialog.
The dialog works exactly the same as the add scenario dialog.
Delete Scenario
To delete a scenario, select the scenario and click the "Delete" button.
Because scenarios are synchronized with the recognition server, you can restore deleted scenarios
through the model synchronization backup.
Import Scenario
Scenarios can be imported from a local file in simon's XML scenario file format, but can also be
downloaded and imported directly from the internet.
When downloading scenarios, the list of scenarios is retrieved from the simon Scenarios
(http://kde-files.org/index.php?xcontentmode=692) subsection of the OpenDesktop site KDE-files.org
(http://kde-files.org).
If you create a scenario that might be valuable for other simon users, please consider uploading it to this
online repository and help other simon users.
Export Scenario
Scenarios can be exported to a local file in simon's XML scenario file format and directly uploaded to
the simon Scenarios (http://kde-files.org/index.php?xcontentmode=692) subsection of the OpenDesktop
site KDE-files.org (http://kde-files.org).
To upload to OpenDesktop sites, you need an account on the site. Registration
(http://opendesktop.org/usermanager/new.php) is very easy and of course free of charge.
simon allows you to upload new content directly from within simon (Export > Publish).
To use this functionality, simply enter your account credentials in the social desktop configuration in the
simon configuration.
Base models
This section describes how to use base models from within simon. For general information about base
models, please refer to the Base models section in the Overview chapter.
To configure simon to use base models, simply select the base model type you want to use and point
simon to the valid files in simon's configuration: Settings > Configure simon > Model Settings
Load the files with the appropriate "Load" button next to the file you want to set. The files will be
copied to an internal location so the source files can be removed once you have selected them here.
For static models you don't need macros or stats but simon will not start a model compilation (which is
needed even for setups using static base models to generate the language model) without them. If your
base model doesn't provide them you can simply point simon to empty files instead.
Vocabulary
The vocabulary lets the user manage the available words.
General
The vocabulary defines what words the recognition process should recognize. Every word you want to
be able to use with simon should be contained in your vocabulary.
One entry in the vocabulary defines exactly one "word". In contrast to the common use of the word
"word", in simon "word" means one unique combination of the following:
• Wordname
(The written word itself)
• Terminal
(Grammatical category; For example: "Noun", "Verb", etc.)
• Pronunciation
(How the word is pronounced; simon accepts any kind of phonetic alphabet as long as it does not use special
characters or numbers)
That means that plurals or even different cases are different "words" to simon. This is an important
design decision to allow more control when using a sophisticated grammar.
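The rule that a "word" is the unique combination of name, terminal and pronunciation maps naturally onto an immutable value type. This sketch is hypothetical (simon's own data structures are not described here), and the pronunciation check only approximates the rule stated above (ASCII only, no digits).

```python
from dataclasses import dataclass


def is_valid_pronunciation(pronunciation):
    """Simplified rule from this section: ASCII characters only, no digits."""
    return pronunciation.isascii() and not any(c.isdigit() for c in pronunciation)


@dataclass(frozen=True)
class Word:
    """One vocabulary entry: the unique combination of all three fields."""
    name: str           # the written word itself
    terminal: str       # grammatical category, e.g. "Noun"
    pronunciation: str  # phonemes separated by spaces

    def __post_init__(self):
        if not is_valid_pronunciation(self.pronunciation):
            raise ValueError(f"invalid pronunciation for {self.name!r}")


vocabulary = {
    Word("Computer", "Noun", "k ax m p y uw t er"),
    Word("Computers", "Noun", "k ax m p y uw t er z"),  # plural: a separate word
    Word("Computer", "Noun", "k ax m p y uw t er"),     # exact duplicate, merged by the set
}
print(len(vocabulary))  # 2
```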
In general, it is advisable to keep your vocabulary as lean as possible. The more words, the higher the
chance that simon might misunderstand you.
Example vocabulary (please note that the terminals here are deliberately set to Noun / Verb to aid
understanding; please refer to the grammar section for why this might not be the best idea):

Word        Terminal    Pronunciation
Computer    Noun        k ax m p y uw t er
Internet    Noun        ih n t er n eh t
Mail        Noun        m ey l
close       Verb        k l ow s
Active Dictionary
The vocabulary used for the recognition is referred to as active dictionary or active vocabulary.
Shadow Dictionary
As said above, the user should keep his vocabulary / dictionary as lean as possible. However, as every word
in your vocabulary also needs information about its pronunciation, it would be good to have a
large dictionary where you could look up the pronunciation and other characteristics of the words.
simon provides this functionality. We refer to this large reference dictionary as "shadow dictionary".
This shadow dictionary is not created by the user but can be imported from various sources.
As simon is a multi-language solution we do not ship shadow dictionaries with simon. However, it is
very easy to import them yourself using the import dictionary wizard. This is described in the Import
Dictionary section.
Maintaining the Vocabulary

simon provides a "Vocabulary" menu which lists the current vocabulary.
By default, the active vocabulary is shown. To display the shadow vocabulary, select the tab Shadow
Vocabulary.
Every word states its "recognition rate", which at the moment is just a counter of how often the word has
been recorded (alone or together with other words).
When this number is only one or zero, the word entry is colored red (1: light red; 0: dark red) as a
visual warning. When a word contains a phoneme combination that is not covered by any other word
and the word with this unusual phoneme combination is never recorded (recognition rate = 0), the model
will fail to compile. simon will display an appropriate error message when the compilation of
the model fails because of such an issue, but in general it is a good idea to record each word at least once or
twice (ideally when adding the word) to avoid such problems.
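The color warning and the compilation problem described above can be approximated as follows. This is a sketch under a loud simplification: a "phoneme combination" is modelled as a pair of adjacent phonemes, whereas the real model compilation works with its own units. All function names are hypothetical.

```python
def entry_color(count):
    """Warning color for a word's "recognition rate" (its recording counter)."""
    if count == 0:
        return "dark red"
    if count == 1:
        return "light red"
    return None  # enough recordings, no warning


def phoneme_pairs(pronunciation):
    """Adjacent phoneme pairs, a simplified stand-in for phoneme combinations."""
    phones = pronunciation.split()
    return set(zip(phones, phones[1:]))


def words_at_risk(vocabulary, counts):
    """Unrecorded words containing a phoneme pair no recorded word covers.

    `vocabulary` maps word name -> pronunciation and `counts` maps word
    name -> number of recordings (missing names count as 0).
    """
    covered = set()
    for word, pron in vocabulary.items():
        if counts.get(word, 0) > 0:
            covered |= phoneme_pairs(pron)
    return sorted(
        word for word, pron in vocabulary.items()
        if counts.get(word, 0) == 0 and not phoneme_pairs(pron) <= covered
    )


vocabulary = {"Mail": "m ey l", "close": "k l ow s", "Fox": "f ao k s"}
print(words_at_risk(vocabulary, {"Mail": 3, "close": 1}))  # ['Fox']
```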
Because the shadow dictionary is not used for the recognition, there are of course no training samples
for words in the shadow dictionary. So don't be alarmed if all the entries in the shadow dictionary are
colored dark red; this is perfectly normal.
Adding Words
To add new words to the active vocabulary, use the add word wizard.
Adding words to simon is basically a two step procedure:
• Defining the word
• Initial training
Defining the Word

Firstly, the user is asked which word he wants to add.
When the user proceeds to the next page, simon automatically tries to find as much information about
the word in the shadow dictionary as possible.
If the word is listed in the shadow dictionary, simon automatically fills out all the needed fields
(Terminal and Pronunciation).
All suggestions from the shadow dictionary are listed in the table "Similar words". By default only
exact word matches are shown. However, this can be changed by checking the "Also show non-perfect
matches" checkbox below the suggestion table. Using similar words you can quickly deduce the correct
pronunciation of the word you are actually trying to add. See below for details.
Of course this really depends on your shadow dictionary. If the shadow dictionary does not contain the
word you are trying to add, the required fields have to be filled out manually.
Some dictionaries that can be imported with simon (SPHINX, HTK) do not differentiate between upper
and lower case. Suggestions based on those dictionaries will always be uppercase. You are of course
free to change these suggestions to the correct case.
Some dictionaries that can be imported with simon (SPHINX, PLS and HTK) provide no grammatical
information at all. These will assign all the words to the terminal "Unknown". You should change this to
something appropriate when adding those words.
Manually Selecting a Terminal

The terminal of the word is defined as the grammatical category the word belongs to. This might be
"Noun", "Verb" or completely new categories like "Command". For more information see the grammar
section.
The list contains all terminals used in both your active and your shadow lexicon and in your grammar.
You can add new terminals to the drop-down menu by using the green plus sign next to it.
Manually Providing the Phonetic Transcription

The pronunciation is a bit trickier. simon does not need a certain type of phonetics, so you are free to use
any method as long as it uses only ASCII characters and no numbers. However, if you want to use a
shadow dictionary to its full potential, you should use the same phonetics as the
shadow dictionary.
If you don't know how to transcribe a word yourself, you can easily use your shadow dictionary to help
you with the transcription, even if the word is not listed in it. Let's say we want to add the word
"Firefox" (to launch Firefox), which is of course not listed in our shadow dictionary.
(I imported the English Voxforge HTK lexicon available from Voxforge
(http://voxforge.org/home/downloads) as a shadow dictionary.)
"Firefox" is not listed in our shadow dictionary, so we don't get any suggestion at all.
However, we know that Firefox sounds like "fire" and "fox" put together. So let's just open the
vocabulary (you can keep the wizard open) by selecting "Vocabulary" from your simon main toolbar.
Switch to the shadow vocabulary by clicking on the tab Shadow Vocabulary.
Use the "Filter"-Box above the list to search for "Fire":
We can see that the word "Fire" is transcribed as "f ay r". Now filter for "fox" instead of "Fire" and we
can see that "Fox" is transcribed as "f ao k s". We can assume that Firefox should be transcribed as
"f ay r f ao k s".
Using this approach of deducing the pronunciation from parts of the word has the distinct advantage that
we not only get a high quality transcription but also automatically use the same phoneme set as the other
words which were correctly pulled out of the shadow dictionary.
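The manual lookup above amounts to concatenating the transcriptions of the word's parts. The helper and the two-entry dictionary below are hypothetical illustrations (the transcriptions are the ones found above); simon does not offer this as a function.

```python
def deduce_pronunciation(parts, shadow):
    """Concatenate the shadow-dictionary transcriptions of a word's parts.

    `shadow` maps lower-case word names to pronunciations.
    """
    missing = [p for p in parts if p.lower() not in shadow]
    if missing:
        raise KeyError(f"not in shadow dictionary: {missing}")
    return " ".join(shadow[p.lower()] for p in parts)


shadow = {"fire": "f ay r", "fox": "f ao k s"}
print(deduce_pronunciation(["Fire", "fox"], shadow))  # f ay r f ao k s
```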
We can now enter the pronunciation and change the terminal to something appropriate.
Training the Word
To complete the wizard we can now train the word twice. If you don't want to do this, or if you use
a static base model, for example, you can skip these two pages.
Because you are about to record some training samples, simon will display the volume calibration to
make sure that your microphone is set up correctly. For more information please refer to the volume
calibration section.
simon will try to prompt you for real-world examples. To do that, simon will automatically fetch
grammar structures using the terminal of the word and substitute the generic terminals with example
words from your active lexicon.
For example: You have the grammar structure "Trigger Command" and have the word "Computer" of
the terminal "Trigger" in your vocabulary. You then add a new word "Firefox" of the terminal
"Command". simon will now automatically prompt you for "Computer Firefox" as it is - according to
your grammar - a valid sentence.
If simon is unable to find appropriate sentences using the word (i.e.: No grammar, not enough words in
your active lexicon, etc.) it will just prompt you for the word alone.
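The example generation can be sketched as filling every terminal of a matching grammar structure. This is a minimal sketch, assuming the grammar is a list of space-separated structures and the lexicon maps each terminal to its words; the function name is hypothetical and simon may choose or limit examples differently.

```python
from itertools import product


def example_sentences(word, terminal, grammar, lexicon):
    """Sentences that use `word` in every grammar structure containing `terminal`.

    `grammar` is a list of structures such as "Trigger Command" and `lexicon`
    maps a terminal to the words of the active lexicon that carry it.
    """
    sentences = []
    for structure in grammar:
        terminals = structure.split()
        if terminal not in terminals:
            continue
        # The new word fills its own terminal; other slots come from the lexicon.
        slots = [[word] if t == terminal else lexicon.get(t, []) for t in terminals]
        sentences.extend(" ".join(combo) for combo in product(*slots))
    return sentences or [word]  # fall back to the word alone


grammar = ["Trigger Command"]
lexicon = {"Trigger": ["Computer"], "Command": ["Internet", "Mail"]}
print(example_sentences("Firefox", "Command", grammar, lexicon))  # ['Computer Firefox']
```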
Although simon ensures that the automatically generated examples are valid, you can always override
its suggestions. Just switch to the "Examples" tab on the "Define Word" page.
You are free to change those examples to anything you like. You can even go so far as to use words that
are not yet in your active lexicon as long as you add them before you synchronize the model, although
this is not recommended.
All that is left is to record the examples.
Make sure you follow the guidelines listed in the recording section.
Editing a word
To edit a word, simply select it from the vocabulary, and click on Edit word.
There you can change name, terminal and pronunciation of the selected word.
Removing a word
To remove a word from your language model, select it in the vocabulary view and click on "Remove
selected word".
The dialog offers four choices:

• Move the word to the "Unused" terminal.
Because you (hopefully) don't use the terminal "Unused" in your grammar, the word will no longer be
considered for recognition. In fact, it will be removed from the active vocabulary before compiling
the model because no grammar sentence references it.
If you want to use the terminal "Unused" in your grammar, you can of course use a different terminal
for unused words. Just set the terminal through the Edit word dialog.
To use the word again, just set the right terminal again. No data will be lost.
• Move the word to the shadow lexicon
This will remove the selected word from the active lexicon (and thus from the recognition) but will
keep a copy in the shadow vocabulary. All the recordings containing the word will be preserved.
To use the word again, add it again to the active vocabulary. When adding a "new" word with the
same name the values of the moved word will be suggested to you. Therefore, no data will be lost.
• Delete the word but keep the samples
Removes the word completely but keeps the associated samples. Whenever you add another word
with the same word name the samples will be re-associated.
Be careful with this option, as the new word you add might be transcribed differently and this
difference cannot be taken into account automatically (simon will then try to force the new
transcription on the old recordings during the model compilation).
Do not use this option if the samples you recorded for this word were erroneous.
• Remove the word completely
Just remove the word. All the recordings containing the word will be removed too.
This option leaves no trace of either the word itself or the associated samples.
Because samples are global (not assigned to scenarios), even samples recorded in training sessions
of other scenarios might be removed as well if they contain the word. Use this option carefully.
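The first option works because words whose terminal appears in no grammar structure are left out before compiling. This minimal sketch models only that filtering step, under the simplifying assumption that it is the sole criterion; the function name is hypothetical.

```python
def words_for_compilation(vocabulary, grammar):
    """Keep only the words whose terminal occurs in at least one grammar structure.

    `vocabulary` is a list of (name, terminal) pairs and `grammar` a list of
    structures such as "Trigger Command".
    """
    used = {t for structure in grammar for t in structure.split()}
    return [(name, terminal) for name, terminal in vocabulary if terminal in used]


vocabulary = [("Computer", "Trigger"), ("Internet", "Command"), ("close", "Unused")]
print(words_for_compilation(vocabulary, ["Trigger Command"]))
# [('Computer', 'Trigger'), ('Internet', 'Command')]
```

Setting the word's terminal back to one used in the grammar makes it pass the filter again, which is why no data is lost.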
Special Training
Please see the special training section in the training section.
Importing a Dictionary
simon provides the functionality to import large dictionaries as a reference. This reference dictionary is
called shadow dictionary.
When the user adds a new word to the model, he has to define the following characteristics:
• Wordname
• Terminal
• Phonetic definition
These characteristics are taken out of the shadow dictionary if it contains the word in question. A large,
high quality shadow dictionary can thus help the user to easily add new words to the model without
keeping track of the phoneme set or, in many cases, even let him forget that a phonetic transcription is
needed at all.
Since version 0.3 you can also import dictionaries directly to the active dictionary. This option is mostly
there to make it easier to move to simon from custom solutions and to encourage importing of older
models (for example one used with simon 0.2). You will almost never want to import a very large
dictionary as active dictionary.
You can find a list of available dictionaries that work with simon on the simon wiki (http://www.simon-
listens.org/wiki/index.php/English:_Shadow_dictionary).
simon is able to import five different types of dictionaries:
• HADIFIX
• HTK
• PLS
• SPHINX
• Julius
HADIFIX Dictionary
simon can import HADIFIX dictionaries.
One example of a HADIFIX dictionary is the German HADIFIX BOMP (http://www.sk.uni-
bonn.de/forschung/phonetik/sprachsynthese/bomp).
Hadifix dictionaries provide both terminals and pronunciation.
Due to a special exemption in its license, the simon listens team is proud to be able to offer the
excellent HADIFIX BOMP for download directly from within simon.
Using the automatic BOMP import, you can directly download and import the dictionary from the simon
listens server after providing your name and e-Mail address for the team of the University of Bonn.
HTK Dictionary
simon can import HTK lexica.
One example of an HTK lexicon is the English Voxforge dictionary
(http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Lexicon/).
HTK dictionaries provide pronunciation information but no terminals. All words will be assigned to
the terminal "Unknown".
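HTK lexicon lines typically list the word, optionally an output symbol in square brackets, and then the phonemes, for example "FIRE [FIRE] f ay r". The sketch below parses that common form and assigns the "Unknown" terminal as described above. It is not simon's importer: the function name is hypothetical, and optional pronunciation probabilities, which the HTK dictionary format also allows, are not handled.

```python
def parse_htk_lexicon(lines):
    """Parse HTK-style lexicon lines into (word, terminal, pronunciation).

    The optional bracketed output symbol is skipped. HTK lexica carry no
    grammatical information, so every word gets the terminal "Unknown".
    """
    entries = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        word, rest = tokens[0], tokens[1:]
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]  # drop the output symbol, e.g. "[FIRE]"
        entries.append((word, "Unknown", " ".join(rest)))
    return entries


print(parse_htk_lexicon(["FIRE [FIRE] f ay r", "FOX f ao k s"]))
# [('FIRE', 'Unknown', 'f ay r'), ('FOX', 'Unknown', 'f ao k s')]
```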
PLS Dictionary
simon can import PLS dictionaries.
One example of a PLS dictionary is the German GPL dictionary from Voxforge
(http://www.repository.voxforge1.org/downloads/de/Trunk/Lexicon/).
PLS dictionaries provide pronunciation information but no terminals. All words will be assigned to the
terminal "Unknown".
SPHINX Dictionary
simon can import SPHINX dictionaries.
One example of a SPHINX dictionary is this dictionary for Mexican Spanish
(http://speech.mty.itesm.mx/~jnolazco/proyectos.htm).
SPHINX dictionaries provide pronunciation information but no terminals. All words will be assigned to
the terminal "Unknown".
Julius Dictionary
simon can import Julius vocabularies.
Examples of Julius vocabularies are the word lists of simon 0.2.
Julius dictionaries provide pronunciation information as well as terminal information.
Grammar
The grammar defines which combinations of words are correct.
General
Let's look at an example: You want to use simon to launch programs and close those windows when you
are done. You would like to use the following commands:
• "Computer, Internet"
To open a browser
• "Computer, Mail"
To open a mail client
• "Computer, close"
To close the current window
Following English grammar, your vocabulary would contain the following:

Word        Terminal
Computer    Noun
Internet    Noun
Mail        Noun
close       Verb
To allow the sentences defined above, simon would need the following grammar:
• "Noun Noun" for sentences like "Computer Internet"
• "Noun Verb" for sentences like "Computer close"
While this would work, it would also allow the combinations "Computer Computer", "Internet
Computer", "Internet Internet", etc. which are obviously bogus. To improve the recognition accuracy,
we can try to create a grammar that better reflects what we are trying to do with simon.
It is important to remember that you define your own "language" when using simon. That means that
you are not bound to the grammar rules of whatever language you want to use simon with. For a
simple command and control use case, for example, it would be advisable to invent new grammatical
rules that eliminate distinctions between commands which stem from grammatical information not
relevant for this use case.
In the example above, it is not relevant that "close" is a verb or that "Computer" and
"Internet" are nouns. Instead, why not define them as something that better reflects what we want them
to be:
Table 4-3. Improved Sample Vocabulary

Word        Terminal
Computer    Trigger
Internet    Command
Mail        Command
close       Command
Now we change the grammar to the following:

• "Trigger Command"
This allows all the sentences described above. However, it also limits the possibilities to exactly
those three sentences. Especially in larger models, a well thought-out grammar and vocabulary can make a
huge difference in recognition results.
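Enumerating every sentence a grammar allows makes the difference between the two vocabularies concrete. This is an illustrative sketch with a hypothetical function name, using the words and terminals from the tables above.

```python
from itertools import product


def expand(grammar, vocabulary):
    """List every sentence a set of grammar structures allows.

    `vocabulary` maps word -> terminal; each structure is a space-separated
    sequence of terminals.
    """
    by_terminal = {}
    for word, terminal in vocabulary.items():
        by_terminal.setdefault(terminal, []).append(word)
    sentences = []
    for structure in grammar:
        slots = [by_terminal.get(t, []) for t in structure.split()]
        sentences.extend(" ".join(combo) for combo in product(*slots))
    return sentences


english = {"Computer": "Noun", "Internet": "Noun", "Mail": "Noun", "close": "Verb"}
improved = {"Computer": "Trigger", "Internet": "Command", "Mail": "Command", "close": "Command"}
print(len(expand(["Noun Noun", "Noun Verb"], english)))  # 12
print(expand(["Trigger Command"], improved))
# ['Computer Internet', 'Computer Mail', 'Computer close']
```

The English grammar admits 12 sentences, including bogus ones like "Internet Computer", while "Trigger Command" admits exactly the three intended ones.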
Defining your Grammar

simon provides an easy to use text based interface to change the grammar. You can simply list all the
allowed sentences (without any punctuation marks, obviously) like described above.
When selecting a sentence on the left, the right pane will automatically show possible real sentences
built from the words of your vocabulary.
The example section will list at most 35 examples so if more than that amount of sentences match the
selected grammar entry, the list might not be complete.
Import a Grammar
In addition to entering your desired grammar sentence by sentence, simon is able to
automatically deduce allowed grammar structures by reading plain text using the Import Grammar
wizard.
simon can read and import text files but also provides an input field if you want to simply type the text
into simon.
Say we have a vocabulary like in the general section above:
Table 4-4. Improved Sample Vocabulary

Word        Terminal
Computer    Trigger
Internet    Command
Mail        Command
close       Command
We want simon to recognize the sentence "Computer Internet!". So we either enter the text using the
Import text option or create a simple text file with this content "Computer Internet!" (any punctuation
mark would work) and save it as "simongrammar.txt" to use the Import files option.
simon will then read the entered text or all the given text files (in this case the only given text file is
"simongrammar.txt") and look up every single word in both active and shadow dictionary (the definition
in the active dictionary has more importance if the word is available in both). It will then replace the
word with its terminal.
In our example this would mean that simon would find the sentence "Computer Internet". It would find
out that "Computer" is of the terminal "Trigger" and "Internet" of the terminal "Command". Because of
this, simon would "learn" that "Trigger Command" is a valid sentence and add it to its grammar.
The import automatically segments the input text by punctuation marks (".", " - ", "!", etc.), so any
natural text should work. The importer will automatically merge duplicate sentence structures (even
across different files) and add multiple sentences (all possible combinations) when a word has multiple
terminals assigned to it.
The import will ignore sentences where one or more words could not be found in the language model
unless you tick the "Also import unknown sentences" checkbox in which case those words are replaced
with "Unknown".
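The import procedure can be sketched as: split into sentences, replace every word by its terminals, and collect the distinct structures. This is a minimal sketch, assuming the lookup already gives active dictionary entries precedence over the shadow dictionary; the sentence separators used here only approximate simon's, and the function names are hypothetical.

```python
import re
from itertools import product


def _terminal_options(words, lookup, import_unknown):
    """Terminals for each word, or None if the sentence has to be skipped."""
    options = []
    for w in words:
        terminals = lookup.get(w)
        if not terminals:
            if not import_unknown:
                return None
            terminals = ["Unknown"]
        options.append(terminals)
    return options


def import_grammar(text, lookup, import_unknown=False):
    """Deduce grammar structures from plain text.

    `lookup` maps a word to its list of terminals. Sentences are split on
    ".", "!", "?" and " - ".
    """
    structures = []
    for sentence in re.split(r"[.!?]+| - ", text):
        words = re.findall(r"[\w']+", sentence)
        options = _terminal_options(words, lookup, import_unknown) if words else None
        if options is None:
            continue
        for combo in product(*options):  # one structure per terminal combination
            structure = " ".join(combo)
            if structure not in structures:  # merge duplicate structures
                structures.append(structure)
    return structures


lookup = {"Computer": ["Trigger"], "Internet": ["Command"], "Mail": ["Command"]}
print(import_grammar("Computer Internet! Computer Mail.", lookup))  # ['Trigger Command']
```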
Renaming Terminals
The rename terminal wizard allows you to rename terminals in your active vocabulary, your
shadow dictionary and the grammar.
Merging Terminals
The merge terminal wizard allows you to merge two terminals into one new terminal in your active
vocabulary, your shadow dictionary and the grammar.
This functionality is especially useful if you want to simplify your grammar structures.
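Merging terminals rewrites every occurrence of the two old terminals and can collapse grammar structures into one. The sketch below covers a vocabulary and a grammar; applying the same mapping to the shadow dictionary works the same way. The function name is hypothetical.

```python
def merge_terminals(vocabulary, grammar, first, second, merged):
    """Merge two terminals into a new one in a vocabulary and a grammar.

    `vocabulary` maps word -> terminal; `grammar` is a list of structures.
    Structures that become identical after the merge are kept only once.
    """
    rename = {first: merged, second: merged}
    new_vocabulary = {w: rename.get(t, t) for w, t in vocabulary.items()}
    new_grammar = []
    for structure in grammar:
        merged_structure = " ".join(rename.get(t, t) for t in structure.split())
        if merged_structure not in new_grammar:
            new_grammar.append(merged_structure)
    return new_vocabulary, new_grammar


vocab = {"Computer": "Noun", "close": "Verb"}
print(merge_terminals(vocab, ["Noun Noun", "Noun Verb"], "Noun", "Verb", "Word"))
# ({'Computer': 'Word', 'close': 'Word'}, ['Word Word'])
```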
Training
Using the Training-module, you can improve your acoustic model.
The interface lists all installed trainings-texts in a table consisting of three columns:
• Name
A descriptive name for the text.
• Pages
The number of "pages" the text consists of. Each page represents one recording.
• Recognition Rate
Analogous to the vocabulary; represents how likely simon is to recognize the words (higher is better).
The recognition rate of the trainings-text is the average recognition rate of all the words in the text.
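The averaging can be sketched as follows. The section does not say whether repeated words count more than once, so this sketch assumes every occurrence counts and treats words without an entry as 0; the function name is hypothetical.

```python
import re


def text_recognition_rate(pages, word_rates):
    """Average "recognition rate" over the words of a training text.

    Assumptions: every occurrence of a word counts (not each distinct word
    once), and words without an entry in `word_rates` count as 0.
    """
    words = [w for page in pages for w in re.findall(r"[\w']+", page)]
    if not words:
        return 0.0
    return sum(word_rates.get(w, 0) for w in words) / len(words)


rates = {"Computer": 4, "Internet": 2, "close": 0}
print(text_recognition_rate(["Computer Internet", "Computer close"], rates))  # 2.5
```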
To improve the acoustic model (and thus the recognition rate) you have to record trainings-texts. This
gives simon the two things it needs:
• Samples of your speech
• Transcriptions of those samples
The active dictionary is used to transcribe the words (mapping them from the actual word to its phonetic
transcription) that make up the text so every word contained in the trainings-text you want to read (train)
has to be contained in your active dictionary. simon will warn you if this is not the case and provide you
with the possibility to add all the missing words in one go.
The procedure is the same as when adding a single word, but the wizard will prompt you for details
and recordings for all the missing words automatically. This procedure can be aborted at any time, and
simon will provide both a way to add the already completely defined words and a way to undo all changes
done so far. When the user has added all the words he is prompted for (all the words missing) the
changes to the active dictionary / vocabulary are saved and the training of the previously selected text
starts automatically.
The training (reading) of the trainings-text works exactly the same as the initial training when adding a
new word.
Make sure you follow the guidelines listed in the recording section.
Storage Directories
Trainings-texts are stored at two different locations:
• Linux: ~/.kde/share/apps/simon/texts
Windows: %appdata%\.kde\share\apps\simon\texts
The texts of the current user. Can be deleted and added with simon (see below).
• Linux: `kde4-config --prefix`/share/apps/simon/texts
Windows: (install directory)\share\apps\simon\texts
System wide texts. They will appear on every user account using simon on this machine and can not
be deleted from within simon because of the obvious permission restrictions on system wide files.
This folder can be used by system administrators to provide a common set of trainings-texts for all the
users on one system.
The XML files (one for each text) can simply be moved from one location to the other, but this will most
likely require administrator privileges.
Adding Texts
The add texts wizard provides a simple way to add new trainings-texts to simon.
When importing text files, simon will automatically try to recognize individual sentences and split the
text into appropriate "pages" (recordings). The algorithm treats text between "normal" punctuation (".",
"!", "?", "...", """,...) and line breaks as "sentences". Each "sentence" will be on its own page.
simon supports two different sources for new trainings-texts.
Add trainings-texts
Simply enter the trainings-text in an input field.
Local text files
simon can import normal text files to use them as trainings-texts.
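The sentence splitting that simon applies when importing text files (described at the beginning of this section) can be approximated with a minimal Python sketch. It is not simon's implementation: the function name split_into_pages and the exact delimiter set are assumptions, and real text (abbreviations like "Mr.") would need more care.

```python
import re

# Hypothetical approximation of simon's "sentence" detection: split after
# ".", "!", "?", "..." or a closing double quote that is followed by
# whitespace, and at every line break.
_SENTENCE_END = re.compile(r'(?<=[.!?"])\s+|\n+')


def split_into_pages(text: str) -> list:
    """Return one "page" (recording prompt) per detected sentence."""
    pieces = _SENTENCE_END.split(text)
    # Drop empty fragments produced by consecutive delimiters.
    return [p.strip() for p in pieces if p.strip()]
```

For example, "Hello world. How are you?" followed by a line break and "Fine..." yields three pages.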
On The Fly Training

In addition to trainings-texts, simon also allows you to train individual words or word combinations from
your dictionary on-the-fly.
This feature is located in the vocabulary-menu of simon.
Select the words to train from the vocabulary on the left and simply drag them to the selection list on the
right (you could also select them in the table on the left and add them by clicking "Add to Training").
Start the training by selecting "Train selected words". The training itself is exactly the same as if it were
a pre-composed trainings-text.
If there are more than 9 words to train, simon will automatically split the text evenly across multiple
pages.
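The even distribution can be sketched as follows, assuming a page holds at most 9 words and page sizes may differ by at most one word. The helper split_evenly is hypothetical and not part of simon.

```python
import math

MAX_WORDS_PER_PAGE = 9  # limit mentioned above


def split_evenly(words: list, max_per_page: int = MAX_WORDS_PER_PAGE) -> list:
    """Distribute the words over the fewest pages such that no page exceeds
    the limit and page sizes differ by at most one word."""
    if not words:
        return []
    pages = math.ceil(len(words) / max_per_page)
    base, extra = divmod(len(words), pages)
    result, start = [], 0
    for i in range(pages):
        size = base + (1 if i < extra else 0)  # the first `extra` pages get one more word
        result.append(words[start:start + size])
        start += size
    return result
```

With 10 words this gives two pages of 5 words each rather than one page of 9 and one of 1.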
Of course you are free to add words from the shadow lexicon to the list of words to train, but simon will
prompt you to add the words before the training starts, just like it would if you trained a text that
contains unknown words (see above).
Importing Trainings Samples

Using the import trainings-data field one can import previously gathered trainings-samples from
previous simon versions or manual trainings without copying the whole dictionary.
This feature is very specific. Please use it with caution and make sure that you know exactly what you
are doing before you continue.
You can either provide a separate prompts file or let simon extract the transcriptions from the filenames.
When using prompts based transcriptions, your prompts file (UTF-8) needs to contain lines of the
following form: "[filename] [content]". Filenames are given without file extensions and the content has to
be uppercase. For example: "demo_2007_03_20 DEMO" to import the file "demo_2007_03_20.wav"
containing the spoken word "Demo".
Because the entries in prompts files do not contain a file extension, simon will try wav, mp3, ogg and flac
(in that order). If one of those matches, no other extension will be tested and only the first file will be
imported (in contrast to file based transcription where all files would be imported).
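A minimal Python sketch of reading a prompts file and resolving each entry to an audio file in the order described above. The function names are hypothetical, and the file is assumed to have been read as UTF-8 text already (e.g. with Path("prompts").read_text(encoding="utf-8")).

```python
from pathlib import Path
from typing import Dict, Optional

# Extensions in the order in which simon tries them for prompts entries.
EXTENSIONS = ("wav", "mp3", "ogg", "flac")


def parse_prompts(text: str) -> Dict[str, str]:
    """Parse lines of the form "[filename] [CONTENT]" into {filename: content}."""
    prompts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, _, content = line.partition(" ")
        prompts[name] = content.strip()
    return prompts


def resolve_sample(directory: Path, name: str) -> Optional[Path]:
    """Return the first existing audio file for `name`, or None."""
    for ext in EXTENSIONS:
        candidate = directory / f"{name}.{ext}"
        if candidate.is_file():
            return candidate  # later extensions are not tested
    return None
```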
When using file based transcriptions, a file called this_is_a_test.wav must contain "This is a test" and
nothing else. Numbers and special characters (".", "-",...) in the filename are ignored and stripped.
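The filename rule can be approximated like this. Treating underscores as word separators is an assumption based on the this_is_a_test.wav example; how simon treats letter case is not described here, so the sketch leaves it unchanged. The function name is hypothetical.

```python
import re
from pathlib import Path


def transcription_from_filename(filename: str) -> str:
    """Derive a transcription from a sample filename,
    e.g. "this_is_a_test.wav" -> "this is a test"."""
    stem = Path(filename).stem          # drop the directory and the extension
    words = stem.split("_")             # assumption: "_" separates words
    # Remove digits and special characters such as "." or "-" from each word.
    cleaned = [re.sub(r"[\W\d_]", "", w) for w in words]
    return " ".join(w for w in cleaned if w)
```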
Files recorded by simon 0.2 will follow this naming scheme so you can safely import them using the file
name extraction method. Files generated by previous simon versions should not be imported using this
function but you can use the prompts based import for that.
Imported files and their transcription are then added to the trainings corpus.
To import a directory containing trainings-samples, select the folder to import and, depending on
your import type, also the prompts file.
The folder will be scanned recursively. This means that the given folder and all its subfolders will be
searched for .wav, .flac, .mp3 and .ogg files. All files found will be imported.
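A sketch of such a recursive scan using Python's os.walk. The function name and the case-insensitive extension check are assumptions.

```python
import os
from pathlib import Path
from typing import List

AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}


def find_samples(root: str) -> List[Path]:
    """Collect all audio files in `root` and all of its subfolders."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in AUDIO_EXTENSIONS:
                found.append(path)
    return sorted(found)
```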
When importing the sound files, all configured post processing filters are applied.
If you import anything other than WAV files you are responsible for decoding them during the import
process (for example through post processing filters) or the model creation will fail.
Commands
When simon is active and recognizes something, the recognition result is given to the loaded command
plug-ins (in order) for processing.
The command system can be compared with a group of factory workers. Each one of them knows how
to perform one task (e.g. "Karl" knows how to start a program and "Joe" knows how to open a folder,
etc.). Whenever simon recognizes something, it is given to "Karl", who then checks whether this instruction is
meant for him. If he doesn't know what to do with it, it is handed over to "Joe" and so on. If none of the
loaded plugins knows how to process the input, it is ignored. The order in which the recognition result is
given to the individual commands (people) is configurable in the command options (Commands >
Manage plugins).
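The factory worker analogy is essentially a chain of responsibility. The following sketch models each plugin as a function that returns True when it has handled the result; this is a simplification for illustration, not simon's plugin interface, and karl and joe are toy examples.

```python
from typing import Callable, List, Optional

# Simplified stand-in for a command plugin: returns True if it handled the result.
Plugin = Callable[[str], bool]


def dispatch(result: str, plugins: List[Plugin]) -> Optional[int]:
    """Offer `result` to each plugin in order and return the index of the
    first plugin that handles it, or None if nobody does (input is ignored)."""
    for index, plugin in enumerate(plugins):
        if plugin(result):
            return index
    return None


def karl(result: str) -> bool:
    """Knows how to start a program (just "Firefox" in this toy example)."""
    return result == "Firefox"


def joe(result: str) -> bool:
    """Knows how to open a folder."""
    return result == "Home folder"


workers: List[Plugin] = [karl, joe]
```

Reordering the list corresponds to changing the plugin order in Commands > Manage plugins.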
Each plugin can be associated with a "trigger". Using triggers, the responsibilities of the plugins can
easily be divided.
Using the factory workers analogy from above, a trigger is like addressing the worker you want to handle
your request by name. So instead of "Open my home folder" you say "Joe, open my home
folder" and "Joe" (the plugin responsible for opening folders) will instantly know that the request is
meant for him.
In practice you could have commands like the executable command "Firefox" to open the popular
browser and the place command "Google" to open the web search engine. If you assign the trigger
"Start" to the executable plugin and the trigger "Open" to the place command you would have to say
"Start Firefox" (instead of just "Firefox" if you don't use a trigger for the executable plugin) and "Open
Google" to open the search engine (instead of just "Google").
Triggers are of course not a requirement and you can easily use simon without defining any plugin triggers
(although many plugins come with a default trigger of "Computer" set, which you would have to
remove). But even using just one trigger for all your commands (like "Computer" to say "Computer,
Firefox" and "Computer, Google") has the advantage of greatly limiting the number of false
positives.
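How a trigger narrows down which plugin is responsible can be sketched with a hypothetical helper that strips the trigger from the front of a recognition result. Case-insensitive comparison and recognition results without punctuation are assumptions.

```python
from typing import Optional


def strip_trigger(result: str, trigger: str) -> Optional[str]:
    """Return the part of `result` after `trigger`, or None if the result
    does not start with the trigger. An empty trigger accepts everything."""
    if not trigger:
        return result
    words = result.split()
    trigger_words = trigger.split()
    head = [w.lower() for w in words[:len(trigger_words)]]
    if head == [t.lower() for t in trigger_words]:
        return " ".join(words[len(trigger_words):])
    return None
```

A plugin with the trigger "Start" would then only look at strip_trigger(result, "Start") and stay silent whenever it returns None.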
simon's command dialog displays the complete phrase associated with a command in the upper right
corner of the command configuration.
You can load multiple instances of one plugin even in one scenario. Each instance can of course also
have a different plugin trigger.
Each Command has a name (which will trigger its invocation), an icon and more fields depending on the
type of the plugin (see below).
Some command plugins might provide a configuration of the plugin itself (not the commands it
contains). Their configuration pages will be plugged directly into the action configuration dialog (below
the General menu item) when you load the associated plugin.
Plugins that provide a graphical user interface (for example the input number command plugin) can
be configured by configuring "Voice commands". You can change the associated word that will trigger
the button, for example, but also change the displayed icon, etc. If you remove all voice interface
commands from a graphical element, the element will be hidden automatically.
Voice interface commands are added just like normal commands through the command configuration.
To add a new interface command to a function, just select the action you want to associate with a
command, click Create from template and adapt the resulting command to your needs.
Some plugins (for example the desktopgrid or the calculator) might also provide a menu item in the
Commands menu.
Executable Commands
Executable commands are associated with an executable file ("Program") which is started when the
command is invoked.
Arguments to the commands are supported. If either the path to the executable or the parameters contain
spaces, they must be wrapped in quotes.
To open the local HTML file C:\test file.html with the executable file C:\Program Files\Mozilla
Firefox\firefox.exe, the correct line for the "Executable" would be: "C:\Program Files\Mozilla
Firefox\firefox.exe" "C:\test file.html".
The working directory defines where the process should be launched from. Given the working directory
C:\folder, the command "C:\Program Files\Mozilla Firefox\firefox.exe" file.html
would cause firefox to search for the file C:\folder\file.html.
The working directory does not normally need to be set and can be left blank most of the time.
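The quoting rule can be expressed as a small helper. build_command_line is hypothetical: it only reproduces the example above and neither launches anything nor handles arguments that themselves contain quotes.

```python
def build_command_line(executable: str, *arguments: str) -> str:
    """Join an executable and its arguments into one "Executable" line,
    wrapping every part that contains a space in double quotes."""
    def quote(part: str) -> str:
        return f'"{part}"' if " " in part else part
    return " ".join(quote(p) for p in (executable, *arguments))
```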
Importing Programs
For even easier configuration simon provides an import dialog which allows you to select programs
directly from the KDE menu.
Note: This option is not available on Microsoft Windows.
The dialog will list all programs that have an entry in your KDE menu in their respective category.
Sub-Categories are not supported and are thus listed on the same level as top-level categories.
Just select the program you wish to start with simon and press Ok. The correct values for the executable
and the working directory as well as an appropriate command name and description will automatically
be filled out for you.
Place Commands
With place commands you can allow simon to open any given URL. Because simon just hands the
address over to the platform's URL handler, special protocols like "remote:/" (on Linux/KDE) or even
KDE's "Web-Shortcuts" are supported.
Instead of folders, files can also be set as the command's URL, which will cause the file to be opened
with its associated application when the command is invoked.
To associate a specific URL with the command you can manually enter it in the URL field (select
Manual first) or import it with the import place wizard.
Importing Places
The import place dialog allows you to easily create the correct URL for the command.
To add a local folder, select Local Place and choose the folder or file with the file selector.
To add a remote URL (HTTP, FTP, etc.), choose Remote URL.
Please note that for URLs with authentication information the password will be stored in clear text.
Shortcut Commands
Using shortcut commands the user can associate commands with key-combinations.
The command will simulate keyboard input to "press" shortcuts like "Ctrl+C" or "Alt+F4".
To select the shortcut you wish to simulate just toggle the shortcut button and press the key combination
on your keyboard.
simon will capture the shortcut and associate it with the command.
Due to technical limitations there are several shortcuts on Microsoft Windows that can not be captured
by simon (this includes e.g. Ctrl+Alt+Del and Alt+F4). These special shortcuts can be selected from a
list below the aforementioned shortcut button.
Note: This selection box is not visible in the screenshot above as the list is only displayed in the
Microsoft Windows version of simon.
Text-Macro Commands
Using text-macro commands, the user can associate text with a command. When the command is
invoked, the associated text will be "written" by simulating keystrokes.
List Commands
The list command is designed to combine multiple commands (all types of commands are supported)
into one list. The user can then select the n-th entry by saying the associated number (1-9).
This is very useful to limit the amount of training required and makes it possible to keep the
vocabulary to a minimum.
List commands are especially useful when using commands with difficult triggers or commands that can
be grouped under a general theme. A typical example would be a command "Startmenu" to present a list
of programs to launch. That way the specific executable commands can still retain very descriptive
names (like "OpenOffice.org Writer 3.1") without the user having to include these words in his
vocabulary and consider them in the grammar just to trigger them.
Commands of different types can of course be mixed.
List Command Display

When invoked, the command will display the list centered on the screen. The list will automatically
expand to accommodate its items.
The user can invoke the commands contained in the list by simply saying their associated number (In
this example: "One" to launch Mozilla Firefox).
While a list command is active (displayed), all input that is not directed at the list itself (other
commands, etc.) will be rejected. The process can be canceled by pressing the "Cancel" button or by
saying "Cancel".
If there are more than 9 items, simon will add "Next" and "Back" options to the list ("Zero" will be
associated with "Back" and "Nine" with "Next").
Configuring list elements
By default the list command uses the following trigger words. To use list commands to their full
potential, make sure that your language and acoustic models contain and allow for the following
"sentences":
• "Zero"
• "One"
• "Two"
• "Three"
• "Four"
• "Five"
• "Six"
• "Seven"
• "Eight"
• "Nine"
• "Cancel"
Of course you can also configure these words in your simon configuration:
• Commands > Manage plugins > General > Lists for the scenario wide list configuration.
• Settings > Configure simon... > Actions > Lists for the global configuration. When creating a new
scenario, the scenario configuration will be initialized with a copy of this list configuration.
List commands are also used internally by other plugins, for example the desktopgrid. The
configuration of the triggers also affects their displayed lists.
Composite Commands
Composite commands allow the user to group multiple commands into a sequence.
When invoked the commands will be executed in order. Delays between commands can be inserted.
Using the composite command the user can compose complex "macros". The screenshot above, for
example, does the following:
• Start Kopete (Executable Command)
• Wait 2000ms for Kopete to be started
• Type "Mathias" (Text-Macro Command) which will select Mathias in my contact list
• Press Enter (Shortcut Command)
• Wait 1000ms for the chat window to appear
• Write "Hi!" (Text-Macro Command); the text associated with this command contains a newline at the
end so that the message will be sent.
• Press Alt+F4 (Shortcut Command) to close the chat window
• Press Alt+F4 (Shortcut Command) to close the Kopete main window
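The composite command above is an ordered list of commands and delays. A minimal sketch, with commands modelled as plain Python callables and delays as integers in milliseconds (both assumptions; run_composite is hypothetical):

```python
import time
from typing import Callable, List, Union

# A step is either a command to execute or a delay in milliseconds.
Step = Union[Callable[[], None], int]


def run_composite(steps: List[Step],
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Execute the steps in order, pausing for every integer step."""
    for step in steps:
        if isinstance(step, int):
            sleep(step / 1000)  # time.sleep expects seconds
        else:
            step()
```

Passing a different `sleep` function makes it possible to test the sequence without actually waiting.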
Desktopgrid
The desktopgrid allows the user to control his mouse with his voice.
The desktopgrid divides the screen into nine parts which are numbered from 1-9. Saying one of these
numbers divides the selected field into nine fields, again numbered from 1-9, and so on. This is repeated
three times. After the fourth selection the desktopgrid will be closed and simon will click in the middle
of the selected area.
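The repeated subdivision can be sketched as follows. The numbering of the fields (row by row, starting at the top left) is an assumption for illustration; rectangles are (x, y, width, height) tuples and both functions are hypothetical.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def select_field(rect: Rect, number: int) -> Rect:
    """Return field `number` (1-9) of a 3x3 grid laid over `rect`.
    Assumption: fields are numbered row by row from the top left."""
    if not 1 <= number <= 9:
        raise ValueError("field number must be between 1 and 9")
    x, y, w, h = rect
    row, col = divmod(number - 1, 3)
    return (x + col * w / 3, y + row * h / 3, w / 3, h / 3)


def click_point(screen: Rect, numbers: Iterable[int]) -> Tuple[float, float]:
    """Narrow the screen down by each spoken number and return the centre
    of the final field, where the click would happen."""
    rect = screen
    for n in numbers:
        rect = select_field(rect, n)
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)
```

After four selections each field is 1/81 of the screen's width and height, roughly 23.7 pixels on a 1920 pixel wide screen.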
The exact click action is configurable but defaults to asking the user. Therefore you will be presented
with a list of possible click modes. When selecting Drag and Drop, the desktopgrid will be displayed
again to select the drop point.
While the desktopgrid is active (displayed), all input that is not directed at the desktopgrid itself (other
commands, etc.) will be rejected. Say "Cancel" at any time to abort the process.
The desktopgrid plugin registers a configuration screen right in the command configuration when it is
loaded.
The trigger that invokes the desktopgrid is of course completely configurable. Moreover the user can use
"real" or "fake" transparency. If your graphical environment allows for compositing effects ("desktop
effects"), you can safely use "real" transparency, which will make the desktopgrid transparent. If your
platform does not support compositing, simon will simulate transparency by taking a screenshot of the
screen before displaying the desktopgrid and displaying that picture behind it.
If the desktopgrid is configured to use real transparency and the system does not support compositing,
it will display a solid gray background.
However, nearly all up-to-date systems will support compositing (real transparency).
This includes:
• Microsoft Windows 2000 or higher (XP, Vista, 7)
• GNU/Linux using a composite manager like Compiz, KWin4, xcompmgr, etc.
By default the desktopgrid uses numbers to select the individual fields. To use the desktopgrid, make
sure that your language and acoustic models contain and allow for the following "sentences":
• "One"
• "Two"
• "Three"
• "Four"
• "Five"
• "Six"
• "Seven"
• "Eight"
• "Nine"
• "Cancel"
To configure these triggers, just configure the commands associated with the plugin.
Input Number
Using the input-number plugin the user can input large numbers easily.
Using the Dictation or the Text-Macro plugin, one could associate the number words with their digits and use
that as an input method. However, to input larger numbers there are two ways, both of which have
significant disadvantages:
• Adding the words "eleven", "twelve", etc.
While this seems like the most elegant solution as it would enable the user to say
"fivehundredseventytwo" we can easily see that it would be quite a problem to add all these words -
let alone train them. What about "twothousandninehundredtwo"? Where to stop?
• Spell out the number using the individual digits
While this is not as elegant as stating the complete number it is much more practical.
However, many applications (like the great mouseless browsing Firefox add-on) rely on the user to
input large numbers without too much time passing between the individual keystrokes (mouseless
browsing, for example, waits exactly 500ms by default before it considers the input of the number
complete). So if you want to enter 52, you would say "Five (pause) Two". Because of the needed
pause, the application (like the mouseless browsing add-on) would already consider the input of "Five"
complete.
The input number plugin - when triggered - presents a calculator-like interface for inputting a number.
The input can be corrected by saying "Back". It features a decimal point accessible by saying "Comma".
When saying "Ok", the number will be typed out. As all the voice input and the correction are handled by
the plugin itself, the application that finally receives the input will only see a couple of milliseconds
between the individual digits.
While the input number plugin is active (the user currently inputs a number), all input that is not directed
at the input number plugin (other commands, etc.) will be rejected. Say "Cancel" at any time to abort the
process.
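The behaviour described above can be modelled as a small state machine. This is a sketch, not simon's code: the class name is hypothetical, "Comma" is assumed to insert "." as the decimal separator, and only one decimal point is allowed.

```python
from typing import Optional

DIGITS = {"Zero": "0", "One": "1", "Two": "2", "Three": "3", "Four": "4",
          "Five": "5", "Six": "6", "Seven": "7", "Eight": "8", "Nine": "9"}


class NumberInput:
    """Collects a number word by word; only "Ok" releases the finished text."""

    def __init__(self) -> None:
        self.buffer = ""

    def hear(self, word: str) -> Optional[str]:
        """Process one recognized word. Returns the complete number when the
        user says "Ok", otherwise None."""
        if word in DIGITS:
            self.buffer += DIGITS[word]
        elif word == "Comma" and "." not in self.buffer:
            self.buffer += "."               # assumed decimal separator
        elif word == "Back":
            self.buffer = self.buffer[:-1]   # remove the last character
        elif word == "Ok":
            number, self.buffer = self.buffer, ""
            return number                    # typed out in one quick burst
        elif word == "Cancel":
            self.buffer = ""
        # Any other input is rejected while the plugin is active.
        return None
```

Because the digits are only released on "Ok", the receiving application never sees the pauses between the spoken words.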
As no command instances can be created from this plugin, it is not listed in the "New Command"
dialog. However, the input number plugin registers a configuration screen right in the command
configuration when it is loaded.
The trigger defines the word or phrase that will trigger the display of the interface.
By default the input number plugin uses numbers to select the individual digits and a couple of control
words. To use the input number plugin, make sure that your language and acoustic models contain and
allow for the following "sentences":
• "Zero"
• "One"
• "Two"
• "Three"
• "Four"
• "Five"
• "Six"
• "Seven"
• "Eight"
• "Nine"
• "Back"
• "Comma"
• "Ok"
• "Cancel"
To configure these triggers, just configure the commands associated with the plugin.
Dictation
The dictation plugin writes the recognition result it gets using simulated keystrokes.
Assuming you didn't define a trigger for the dictation plugin, it will accept all recognition results and just
write them out. The written input will be considered "processed input" and thus not be relayed to
other plugins. This means that if you loaded the dictation plugin and defined no trigger for it, all plugins
below it in the "Selected Plug-Ins" list in the command configuration will never receive any input.
The dictation plugin can be configured to append text after each recognition result, for example to add a
space after each recognized word.
Artificial Intelligence
The Artificial Intelligence is a just-for-fun plugin that emulates a human conversation.
Using the festival text-to-speech technology, the computer can "talk" with the user and answer questions
or chat about the weather.
The plugin uses AIMLs for the actual "intelligence". Most AIML sets should be supported. The popular
A. L. I. C. E. bot (http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aa1) and a German
version work and are shipped with the plugin.
The plugin registers a configuration screen in the command configuration menu where you can choose
which AIML set to load.
simon will look for AIML sets in the following directory:
• GNU/Linux: `kde4-config --prefix`/share/apps/ai/aimls/
• Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by
default)]\share\apps\ai\aimls\
To add a new set just create a new directory with a descriptive name and copy the .aiml files into it.
To adjust your bot's personality, have a look at the bot.xml and vars.xml files in the following directory:
• GNU/Linux: `kde4-config --prefix`/share/apps/ai/util/
• Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by
default)]\share\apps\ai\util\
The plugin will use mbrola voices (http://tcts.fpms.ac.be/synthesis/mbrola.html) if they are installed.
It is recommended to not use any trigger for this plugin to provide a more natural "feel" for the
conversation. The AI plugin will pass any input through to the other plugins, even though it reacts to
any input given. This makes it possible to add a "conversation" to the command & control use-case by
developing custom AIMLs sets (e.g.: User: "Computer, open Firefox"; Computer: "Certainly, Sir!
Starting Firefox..."; Firefox opens).
Please keep in mind that the AI plugin will only work if festival is installed, set up correctly and in
your system path.
Calculator
The calculator plugin is a simple, voice controlled calculator.
The calculator extends the Input Number plugin by providing additional features.
When loading the plugin, a configuration screen is added to the plugin configuration.
There you can also configure the control mode of the calculator. Setting the mode to something other than
Full calculator will hide options from the displayed widget.
However, in contrast to simply removing all associated commands from the functions, the hidden
controls will still react to the configured voice commands.
When selecting Ok, the calculator will by default ask you what to do with the generated result. You can
for example output the calculation, the result, both, etc. Besides always selecting this from the displayed
list after selecting the Ok button, this can also be set in the configuration options.
Filter
Using the filter plugin, you can prevent recognition results from being passed on to further command
plugins. This can be used, for example, to disable the recognition by voice.
The filter has two states:
• Inactive
The default state. All recognition results will be passed through.
• Active
When activated, the filter will "eat" all results that match the configured pattern. By default this
means every result that simon recognizes will be accepted by the filter and therefore not relayed to
any of the plugins following the filter plugin.
The filter command plugin registers a configuration screen in the command configuration where you can
change what results should be filtered.
The pattern is a regular expression that will be evaluated each time a recognition result reaches the
plugin for processing.
The plugin also registers voice interface commands for activating and deactivating the filter.
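A sketch of the filter logic, assuming the pattern has to match the whole recognition result (whether simon uses a full or partial match is not specified here) and that the default pattern is ".*", which matches everything. The class is hypothetical.

```python
import re


class Filter:
    """Swallows matching recognition results while active."""

    def __init__(self, pattern: str = ".*") -> None:
        self.pattern = re.compile(pattern)
        self.active = False  # inactive by default: everything passes

    def passes(self, result: str) -> bool:
        """Return True if `result` should be relayed to the following plugins."""
        if not self.active:
            return True
        return self.pattern.fullmatch(result) is None
```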
Pronunciation Training
The pronunciation training, when combined with a good static base model, can be a powerful tool to
improve your pronunciation of a new language.
Essentially, the plugin will prompt you to say specific words. The recognition will then recognize your
pronunciation of the word and compare it to your speech model which should be a base model of native
speakers for this to work correctly. Then simon will display the recognition rate (how similar your
version was to the stored base model).
The closer to the native speaker, the higher the score.
The plugin adds an entry to your Commands menu to launch the pronunciation training dialog.
The training itself consists of multiple pages. Each page contains one word fetched from your active
vocabulary. They are identified by a terminal which needs to be selected in the command configuration
before starting the training.
Keyboard
The keyboard plugin displays a virtual, voice controlled keyboard.
The keyboard consists of multiple tabs, each possibly containing many keys. The entirety of tabs and
keys are collected in "sets".
You can select sets in the configuration but also create new ones from scratch in the keyboard command
configuration.
Keys are usually mapped to single characters but can also hold long texts and even shortcuts. Because of
this, keyboard sets can contain special keys like a "select all" key or a "Password" key (typing your
password).
Next to the tabs that hold the keys of your set, the keyboard may also show special keys like Ctrl, Shift,
etc. Those keys are provided as voice interface commands and are displayed regardless of what tab of the
set is currently active.
As with all voice triggers, removing the associated command hides the buttons as well.
Moreover, the keyboard provides a numpad that can be shown by selecting the appropriate option in the
keyboard configuration.
Next to the number keys and the delete key for the number input field (Number backspace), the
numpad provides two options on what to do with the entered number.
When selecting Write number, the entered number will be written out using simulated key presses.
Selecting Select number tries to find a key or tab in the currently active set that has this number as a
trigger. This way you can control a complete keyboard just using numbers.
The keys on the numpad are configurable voice interface commands.
Configuration
simon was designed with high configurability in mind.
General Configuration
The general configuration page lists some basic settings.
Please note that the option to start simon at login will work on both Microsoft Windows and when you
are using KDE on Linux. Support for other desktop environments like Gnome, XFCE, etc. might require
manually placing simon in the session autostart (please refer to the respective manuals of your desktop
environment).
When the option to start simon minimized is selected, simon will minimize to the system tray
immediately after starting.
Deselecting the option to warn when there are problems with samples deactivates the sample quality
assurance.
Sound Configuration
simon uses QtMultimedia to record and play sound. QtMultimedia is also used to gather data from the
microphone which is then sent to the simond server for recognition.
Device Configuration
The sound device configuration allows you to choose which sound device(s) to use, how many channels
to use and at which samplerate to record.
Most of the time you will want to use 1 channel and 16kHz (which is also the default) because the
recognition only works on mono input and works best at 16kHz (8kHz being the other option).
However, some low-cost sound cards might not support this particular mode, in which case simon can in
many cases work around this limitation by using postprocessing chains and 3rd party software. Please
see the postprocessing section for more details.

Bottom line: Only change the channel and the samplerate if you really know what you are doing.
Otherwise the recognition will most likely not work.
Use the selection boxes to change the device. Use the Refresh devices button if you have changed the
sound configuration since you started simon.
You can use simon with more than one sound device at the same time. Use Add device to add a new
device to the configuration and Remove device to remove it from your configuration.
The first device in your sound setup can not be removed.
For each device you can determine what you want the device to be used for: training or recognition (the
latter only applies to input devices).
If you use more than one device for training, you will create multiple sound files for each utterance.
When using multiple devices for recognition each one feeds a separate sound input stream to the server
resulting in recognition results for each stream.
If you use multiple output devices the playback of the trainings samples will play on all configured
audio devices.
When using different sample rates for your input devices, the output will only play on matching output
devices. If you for example have one input device configured to use 16kHz and the other to use 48kHz,
the playback of samples generated by the first one will only play on 16kHz outputs, the other one only
on 48kHz devices.
Voice Activity Detection
The recognition is done on the simond server. See the architecture section for more details.
The sound stream is not continuous but is segmented by the simon client. This is done by something
called "voice activity detection".
Here you can configure this segmentation through the following parameters:
• Cutoff level
Everything below this level is considered "silence" (background noise).
• Head margin
simon caches the input for the duration of the head margin before it considers it a real sample. During
this whole time the input level needs to be above the cutoff level.
• Tail margin
After the recording went below the cutoff level, simon will wait for the duration of the tail margin before
it considers the current recording a finished sample.
• Skip samples shorter than
Samples that are shorter than this value (coughs, etc.) are not considered for recognition.
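The interplay of the four parameters can be sketched on a sequence of per-frame input levels. Measuring the margins and the minimum length in frames rather than milliseconds, and dropping the trailing silence from each sample, are simplifications; the function is hypothetical and not simon's implementation.

```python
from typing import List, Optional, Sequence, Tuple


def segment(levels: Sequence[float], cutoff: float, head_margin: int,
            tail_margin: int, min_length: int) -> List[Tuple[int, int]]:
    """Split per-frame input levels into (start, end) frame ranges, end exclusive.

    cutoff      -- frames at or below this level count as silence
    head_margin -- frames the level must stay above cutoff before a sample starts
    tail_margin -- silent frames (at least 1) after which a sample is finished
    min_length  -- samples shorter than this many frames are skipped
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    loud_run = 0        # consecutive loud frames while waiting for a sample
    silent_run = 0      # consecutive silent frames inside a sample
    recording = False
    for i, level in enumerate(levels):
        loud = level > cutoff
        if not recording:
            if not loud:
                loud_run = 0          # head margin interrupted: start over
                continue
            if loud_run == 0:
                start = i             # cached frames become part of the sample
            loud_run += 1
            if loud_run >= head_margin:
                recording, silent_run = True, 0
        else:
            silent_run = 0 if loud else silent_run + 1
            if silent_run >= tail_margin:
                end = i + 1 - silent_run   # drop the trailing silence
                if end - start >= min_length:
                    segments.append((start, end))
                recording, loud_run = False, 0
    if recording:                      # stream ended in the middle of a sample
        end = len(levels) - silent_run
        if end - start >= min_length:
            segments.append((start, end))
    return segments
```

A short silent gap inside an utterance (shorter than the tail margin) does not end the sample, while a loud blip shorter than the head margin never starts one.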
Training settings
When the option Default to power training is selected, simon will, when training, automatically start
and stop the recording when displaying and hiding the recording prompt, respectively. This option only
sets the default value of the option; the user can change it at any time before beginning a training
session.
The configurable font here refers to the text that is recorded to train the acoustic model (through explicit
training or when adding a word).
This option was introduced after we had worked with a few clients suffering from spastic disabilities.
While we used the mouse to control simon during the training, they had to read what was on the screen.
At first this was very problematic as the regular font size is relatively small and they had trouble making
out what to read. This is why we made the font and the font size of the recording prompt configurable.
Here you can also define the required signal to noise ratio for simon to consider a training sample to be
correct. See the Sample Quality Assurance section for more details.
On this configuration page you can also set the parameters for the volume calibration.
It can be deactivated for both the add word dialog and the trainings wizard by unchecking the group box
itself. As long as the volume does not exceed the minimum volume, simon will prompt the user to raise
the microphone volume. If the recording hits the maximum volume once, simon will tell the user to
lower the volume.
Clipping (hitting the maximum amplitude) will always cause a "too loud" warning.
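A minimal sketch of these calibration checks, assuming 16-bit PCM samples and that "volume" means the peak absolute amplitude of the recording; simon's actual measurement and messages may differ, and the function is hypothetical.

```python
from typing import Optional, Sequence


def volume_feedback(samples: Sequence[int], minimum: int, maximum: int,
                    clip_level: int = 32767) -> Optional[str]:
    """Return a hint for the user, or None if the volume is acceptable.

    samples    -- 16-bit PCM amplitudes of the calibration recording
    minimum    -- peak level the recording has to exceed
    maximum    -- peak level that must never be reached
    clip_level -- full-scale amplitude; hitting it always means "too loud"
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak >= clip_level or peak >= maximum:
        return "too loud: lower the microphone volume"
    if peak <= minimum:
        return "too quiet: raise the microphone volume"
    return None
```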
The prompted text can be configured by entering text in the input field below. If the field is empty, a
default text will be used.
Postprocessing
All recorded (training) and imported (through the import training data) samples can be processed using a
series of postprocessing commands. Postprocessing chains are an advanced feature and shouldn't be
needed by the average user.
The postprocessing commands can be seen as a chain of filters through which the recordings have to
pass. Using these "filters" one could define commands to suppress background noise in the
training data or normalize the recordings.
For example, given a program "process_audio" that takes the input and output files as its arguments (e.g. "process_audio in.wav out.wav"), the postprocessing command would be "process_audio %1 %2". The two placeholders %1 and %2 will be replaced by the input filename and the output filename, respectively.
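The placeholder substitution and the chaining of filters can be sketched as follows. This is a minimal illustration under stated assumptions: each command reads %1 and writes %2, the output of one command becomes the input of the next, and intermediate files live in a temporary directory. The helpers expand and run_chain are hypothetical names.

```python
import re
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Sequence

_PLACEHOLDER = re.compile(r"%([12])")


def expand(template: str, infile: str, outfile: str) -> List[str]:
    """Split a command template into arguments and substitute %1 / %2."""
    def substitute(arg: str) -> str:
        # Single pass with a function replacement: filenames are inserted
        # literally, so a name containing "%2" or backslashes is left alone.
        return _PLACEHOLDER.sub(lambda m: infile if m.group(1) == "1" else outfile, arg)
    return [substitute(arg) for arg in shlex.split(template)]


def run_chain(templates: Sequence[str], source: Path, destination: Path) -> None:
    """Pass one recording through every command in order (a chain of filters)."""
    with tempfile.TemporaryDirectory() as tmp:
        current = source
        for step, template in enumerate(templates):
            output = Path(tmp) / f"step{step}.wav"
            subprocess.run(expand(template, str(current), str(output)), check=True)
            current = output  # the next filter reads this filter's output
        shutil.copyfile(current, destination)
```

For instance, expand("process_audio %1 %2", "in.wav", "out.wav") yields the argument list ["process_audio", "in.wav", "out.wav"]. Splitting the template before substituting keeps filenames with spaces intact as single arguments.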
The switch "apply filters to recordings recorded with simon" enables the postprocessing chains for samples recorded during training (including the initial training while adding a word). If you don't select this switch, the postprocessing commands are only applied to imported samples (through the import training data wizard).
One common use case for postprocessing chains is resampling audio to work around hardware limitations. Given a sound card that does not support mono 16kHz recordings but only 44100Hz stereo ("CD") recordings, one could use the free command line sound processing utility SoX (http://sox.sourceforge.net) to resample the recorded files after the recording.
This example would require the following postprocessing command:

• sox %1 -c 1 -r 16000 %2
Using this command you can safely record at 44100Hz with 2 channels and, assuming the option to apply the filters to recordings recorded with simon is selected, simon will automatically downsample the recordings to 16000Hz and 1 channel after recording them. Make sure to adjust your sound device configuration accordingly.
Speech Model
Here you can adjust the parameters of the speech model.
The sample rate set here is the target sample rate of the acoustic model. It is independent of the recording sample rate, and it is the responsibility of the user to ensure that the samples are actually available in that format (usually by recording at exactly that sample rate or by defining postprocessing commands that resample the files; see the sound configuration section for more details).
Usually either 16kHz or 8kHz models are built and used. 16kHz models will have higher accuracy than 8kHz models. Going higher than 16kHz is not recommended: it is very CPU-intensive and in practice probably won't result in higher recognition rates.
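Because the samples have to be available at the target sample rate already, it can be worth checking a samples folder before synchronizing. The sketch below uses Python's standard wave module to list WAV files whose sample rate or channel count differs from the target. The flat folder layout and the assumption that the model expects mono audio are illustrative; mismatched_samples is a hypothetical helper.

```python
import wave
from pathlib import Path
from typing import List, Tuple


def mismatched_samples(folder: Path, target_rate: int = 16000,
                       target_channels: int = 1) -> List[Tuple[str, int, int]]:
    """Return (filename, sample rate, channels) for every WAV file that does not match."""
    problems = []
    for path in sorted(folder.glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate, channels = wav.getframerate(), wav.getnchannels()
        if rate != target_rate or channels != target_channels:
            problems.append((path.name, rate, channels))
    return problems
```

A folder recorded at 44100Hz stereo without a resampling postprocessing command would show up here, with every file reported as (name, 44100, 2).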
Moreover, the path to the training samples can be adjusted. However, be sure to also move the previously gathered training samples to the new location. If you use automatic synchronization, simond can also provide simon with the missing samples, but copying them manually is still recommended for performance reasons.
Model Settings
General
Please see the base model section.
Extensions
Here you can configure the base URL that is going to be used for the automatic bomp import. The
default points to the copy on the simon listens server.
Recognition
Here you can configure the recognition and model synchronization with the simond server.
Server
Using the server configuration you can set parameters of the connection to simond.
General
The simon main application connects to the simond server (see the architecture section for more
information).
To identify individual users of the system (one simond server can of course serve multiple simon clients), simon and simond use user accounts. Every user has his own speech model. The username / password combination given here is used to log in to simond. If simond does not know the username or the password is incorrect, the connection will fail. See the simond manual (help:/simond) on how to set up users for simond.
The recognition itself, which is done by the server, might not be available at all times. For example, the recognition cannot be started as long as the user does not have a compiled acoustic and language model; these have to be created first during synchronization, once all the ingredients (vocabulary, grammar, training) are present. With the option to start the recognition automatically once it is available, simon will request to start the recognition as soon as it is informed that the recognition is ready (i.e. that the acoustic and language model are available).
Using the "Connect automatically on simon start" option, simon will automatically start the connection
to the configured simond servers after it has finished loading the user interface.
Network
simon connects to simond using TCP/IP.
As of simon 0.3, encryption is not supported.

The timeout setting specifies how long simon will wait for a first reply when contacting the hosts. If you are on a very slow network and/or use "connect on start" on a very slow machine, you may want to increase this value if you keep getting timeout errors that go away when you simply try again.
simon can be configured to use more than one simond. This is very useful if, for example, you are going to use simon on a laptop that connects to a different server depending on where you are: you could add both the server you use at home and the server you use at work. When connecting, simon will try to connect to each of the servers (in order) until it finds one that accepts the connection.
To add a server, just enter the hostname or IP and the port (separated by ":") or use the dialog that
appears when you select the blue arrow next to the input field.
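The failover behaviour described above (try each configured server in order, give up on a server once the timeout expires, stop at the first one that accepts) can be sketched with Python's socket.create_connection. This is an illustration rather than simon's own code: entries are assumed to be plain "host:port" strings (IPv6 addresses are not handled), and the connect parameter exists only so the logic can be exercised without a network. parse_server and connect_first are hypothetical names.

```python
import socket
from typing import Callable, List, Sequence, Tuple


def parse_server(entry: str) -> Tuple[str, int]:
    """Split a "host:port" entry as typed into the server list."""
    host, sep, port = entry.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {entry!r}")
    return host, int(port)


def connect_first(servers: Sequence[str], timeout: float = 5.0,
                  connect: Callable = socket.create_connection):
    """Try each server in order; return (entry, connection) for the first that accepts."""
    errors: List[str] = []
    for entry in servers:
        host, port = parse_server(entry)
        try:
            # The timeout bounds how long we wait for this host before moving on.
            return entry, connect((host, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, timed out, unknown host
            errors.append(f"{entry}: {exc}")
    raise ConnectionError("no server accepted the connection: " + "; ".join(errors))
```

With a list such as ["home.example:4444", "work.example:4444"] (placeholder hosts), the work server is only contacted after the home server has refused the connection or timed out.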
Synchronization and Model Backup

Here you can configure the model synchronization and restore older versions of your speech model.
Changes only take effect, and a new restore point is only set, once the speech model has been synchronized. This is why, by default, simon will always synchronize the model with the server whenever it changes. This is called "Automatic Synchronization" and is the recommended setting.
However, if you want more control, you can instruct simon to ask you before starting the synchronization after the model has changed, or to rely on manual synchronization altogether. With manual synchronization, you have to use the "Synchronization" menu item of the simon main window (also see the section simon main window) every time you want to compile the speech model.
The simon server will maintain a copy of the last five iterations of the model files. This only includes the "source files" (the vocabulary, grammar, etc.), not the compiled model; the compiled model will, however, be regenerated automatically from the restored source files.
After you have connected to the server, you can select one of the available models and restore it by
clicking on "Restore Model".
Please note that the synchronization will only accept complete source models (containing a vocabulary, a grammar and some training samples), so incomplete models will not be stored on the server and thus will not be backed up.
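The backup policy described above (keep the last five versions of the source files, and store only complete models) maps naturally onto a bounded queue. This is a hypothetical sketch of the policy, not simond's storage code; the SourceModel fields simply mirror the three ingredients named in the text, and ModelBackups is an illustrative name.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class SourceModel:
    """The "source files" of a speech model; the compiled model is not kept."""
    vocabulary: Tuple[str, ...]
    grammar: Tuple[str, ...]
    training_samples: Tuple[str, ...]

    def is_complete(self) -> bool:
        return bool(self.vocabulary and self.grammar and self.training_samples)


class ModelBackups:
    def __init__(self, keep: int = 5) -> None:
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._versions: Deque[SourceModel] = deque(maxlen=keep)

    def store(self, model: SourceModel) -> bool:
        """Record a synchronized model; incomplete models are rejected."""
        if not model.is_complete():
            return False
        self._versions.append(model)
        return True

    def versions(self) -> List[SourceModel]:
        """Stored versions, oldest first, e.g. to offer for "Restore Model"."""
        return list(self._versions)
```

After seven complete synchronizations, only the last five versions remain available; a model without a grammar is never stored.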
Actions
In the actions configuration you can configure the reactions to recognition results.
Recognition
simon's recognition computes not only the most likely result but the ten most likely results.
Each of the results is assigned a confidence score between 0 and 1 (where 1 means 100% sure).
Using Minimum confidence, you can set the minimum confidence a recognition result must have to be considered valid.
If more than one recognition result is rated higher than the minimum confidence score, simon will show a popup listing the most likely options for you to choose from.
This popup can be disabled using the Display selection popup for ambiguous results checkbox.
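The interplay of the minimum confidence and the selection popup can be sketched as follows. This illustrates the behaviour described above rather than simon's implementation: the Result type and the choose function are hypothetical. The text does not say what happens when the popup is disabled and several results qualify, so using the most confident one is our assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Result:
    sentence: str
    confidence: float  # 0.0 (no confidence) .. 1.0 (100% sure)


def choose(results: Sequence[Result], min_confidence: float,
           show_popup: bool = True) -> Tuple[str, List[Result]]:
    """Decide what to do with the (up to ten) recognition results.

    Returns ("reject", []), ("execute", [best]) or ("ask", candidates).
    """
    valid = sorted((r for r in results if r.confidence >= min_confidence),
                   key=lambda r: r.confidence, reverse=True)
    if not valid:
        return "reject", []
    if len(valid) == 1 or not show_popup:
        # Assumption: without the popup, the most confident result is used.
        return "execute", valid[:1]
    return "ask", valid
```

With results "open firefox" (0.9), "open terminal" (0.8) and "close" (0.2) and a minimum confidence of 0.7, the first two would be offered in the popup, most confident first.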
Plugin base font

Many simon plugins have a graphical user interface.
The fonts of these interfaces can be configured here, centrally and independently of the system's font settings.
Lists
Here you can find the global list element configuration. This serves as a template for new scenarios but
is also directly used for the popup for ambiguous recognition results.
Social desktop
Scenarios can be uploaded and downloaded from within simon.
For this we use KDE's social desktop facilities and our own category for simon scenarios on kde-files.org (http://kde-files.org/index.php?xcontentmode=692).
If you already have an account on opendesktop.org (http://opendesktop.org), you can enter your credentials here. If you don't, you can register directly in the configuration module.
The registration is of course free of charge.
Adjusting the recognition parameters manually

simon is targeted towards end users. Its interface is designed to allow even users without any background in speech technology to design their own language and acoustic models by providing reasonable default values for simple uses.
In special cases (severe speech impairments, for example), special configuration might be needed. This is why the raw configuration files for the recognition are also respected by simon and can of course be modified to suit your needs.

There are basically two parts of the Julius configuration that can be adjusted:
• adin.jconf
This is the client-side configuration of the sound stream that simon sends to simond. This file is read directly by the adinstreamer.
simon ships with a default adin.jconf without any special parameters. You can change this system-wide configuration, which will affect all users on your machine who use simon. To change the configuration for just one of those users, copy the file to the user path (see below) and edit this copy.
• julius.jconf
This is a configuration of the simond server and directly influences the recognition. This file is parsed directly by libjulius and libsent.
simond ships with a default julius.jconf. Whenever a new user is added to the simond database, simond will automatically copy this system-wide configuration to the new user. After that, the user is of course free to change it without affecting the other users. This way, the "template" (the system-wide configuration) can be changed without affecting existing users.
The path to the Julius configuration files will depend on your platform:
Table 4-5. Julius Configuration Files

adin.jconf (system)
    Microsoft Windows: (installation path)\share\apps\simon\adin.jconf
    GNU/Linux: `kde4-config --prefix`/share/apps/simon/adin.jconf
adin.jconf (user)
    Microsoft Windows: %appdata%\.kde\share\apps\simon\adin.jconf
    GNU/Linux: ~/.kde/share/apps/simon/adin.jconf
julius.jconf (template)
    Microsoft Windows: (installation path)\share\apps\simond\default.jconf
    GNU/Linux: `kde4-config --prefix`/share/apps/simond/default.jconf
julius.jconf (user)
    Microsoft Windows: %appdata%\.kde\share\apps\simond\models\(user)\active\julius.jconf
    GNU/Linux: ~/.kde/share/apps/simond/models/(user)/active/julius.jconf
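On GNU/Linux, the relationship between the system-wide adin.jconf and a per-user copy in Table 4-5 can be expressed in a few lines. This sketch rests on assumptions: it takes the installation prefix (what `kde4-config --prefix` prints) and the home directory as parameters, treats an existing user copy as taking precedence over the system file, and creates the user copy from the system file on demand, much as the text says simond does with default.jconf for new users. All function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Tuple


def adin_jconf_paths(prefix: Path, home: Path) -> Tuple[Path, Path]:
    """(system, user) locations of adin.jconf on GNU/Linux, following Table 4-5."""
    system = prefix / "share" / "apps" / "simon" / "adin.jconf"
    user = home / ".kde" / "share" / "apps" / "simon" / "adin.jconf"
    return system, user


def effective_config(system: Path, user: Path) -> Path:
    """Assumption: a per-user copy, if present, takes precedence over the system file."""
    return user if user.is_file() else system


def ensure_user_copy(system: Path, user: Path) -> Path:
    """Copy the system-wide file to the user path once; later edits stay per-user."""
    if not user.exists():
        user.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(system, user)
    return user
```

Editing the copy returned by ensure_user_copy changes the configuration for that user only, while the system-wide file stays untouched for everyone else.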
Chapter 5. Questions and Answers
To keep this section up to date, it is maintained in our online wiki (http://www.cyber-byte.at/wiki/index.php/English:_Troubleshooting).
Chapter 6. Credits and License
simon
Program copyright 2006-2009 Peter Grasch <grasch@simon-listens.org>, Phillip Goriup,
Tschernegg Susanne, Bettina Sturmann, Martin Gigerl
Documentation Copyright (c) 2009 Peter Grasch <grasch@simon-listens.org>
This documentation is licensed under the terms of the GNU Free Documentation License (common/fdl-license.html).
This program is licensed under the terms of the GNU General Public License (common/gpl-license.html).
Appendix A. Installation
Please see our wiki (http://www.cyber-byte.at/wiki/index.php/English:_Setup) for install instructions.